Abstraction in Artificial Intelligence and Complex Systems

Lorenza Saitta · Jean-Daniel Zucker

(Pierre Soulages, Peinture 260 × 202 cm, 19 juin 1963)
"A painting is an organization, a set of relations between forms (lines, colored surfaces), upon which the meanings we give to it are made and unmade" (Französische Abstrakte Malerei, exhibition catalogue, Stuttgart, 1948)

Lorenza Saitta: Dipartimento di Scienze e Innovazione Tecnologica, Università degli Studi del Piemonte Orientale, Alessandria, Italy
Jean-Daniel Zucker: International Research Unit UMMISCO 209, Research Institute for Development (IRD), Bondy, France

Preface
When we started writing this book we were aware of the complexity of the task,
but we did not imagine that it would take us almost three years to complete it.
Furthermore, during the analysis and comparison of the literature from different
fields, it clearly emerged that important results have been achieved, but that much
more important ones are still out of reach. The spirit of the book thus changed, almost by itself, from the intended assessment of the past into a stimulus for the future. We would be happy if the reader, instead of being content with the ideas we propose, took them as a motivation and starting point to go beyond them.
We present a large selection of works on abstraction in several disciplines;
nonetheless many relevant contributions to the field have been necessarily left out,
owing to the sheer amount of pages they would fill. We apologize for the missing
citations.
In this book we present a model of abstraction, the KRA model, but this is not
the core of the book. It has a limited scope and serves two main purposes: on the
one hand it shows that several previous proposals of abstraction theories have a
common root and can be handled inside a unified framework, and, on the other, it
offers a computational environment for performing abstraction by applying a set of
available, domain-independent operators (programs). In fact, there is still a gap
between general abstraction theories, mostly elegant logical formulations of rep-
resentation changes, and concrete approaches that heavily rely on specific domain
characteristics. The KRA model is meant to be something in between: the
domain-independence of the abstraction operators achieves both generality (it can
cover a broad spectrum of applications and application domains), and synergy (by
instantiating in different contexts some code written just once).
Independently of the model, we believe that the basic ideas on which it relies
are more important than the model itself. These ideas are certainly debatable; some readers might find that our view of abstraction is exactly what they have always looked for, whereas others might think that abstraction is something else entirely. Both reactions are welcome: what matters is to trigger interest in the subject
and stimulate more research.
The book is not intended to be a textbook: it is targeted to scientists working on
or using abstraction techniques, without limitation of fields. Computer scientists,
Artificial Intelligence researchers, artists, cognitive scientists, mathematicians, and
curious minds can read the book. Some parts are more formalized, and they may
look complex at first sight. However, we believe that the greater part of the content can be grasped by intuition.
Finally, we mention that we have set up a companion Web site (http://
www.abstractionthebook.com), where implemented operators are uploaded.
Anyone interested in abstraction is welcome to contribute to it.
The authors would like to thank Yann Chevaleyre for his invaluable help and
expertise in abstraction in Reinforcement Learning, Laurent Navarro and Vincent
Corruble for their help in abstraction in multi-agent systems, and Nicolas Reg-
nauld, from Edinburgh University, for providing them with both test data and his
expertise on relevant operations and measures on buildings in Cartography.
Lorenza would like to thank her husband Attilio, who contributed, with insightful discussions, to shaping the content of the book, and who also provided two of the appendices.
Jean-Daniel would like to thank his wife (Miao), children (Zoé, Théo, Arthur
and Nicolas), family, colleagues, and friends (especially Jeffrey, Bernard, Joël,
Vincent, Laurent and Alexis) for encouraging and tolerating him through the long
hours of writing and longer hours of rewriting, and Pierre Encrevé for his unfailing
availability and communicative passion for Soulages.
Finally, the authors are deeply grateful to Pierre Soulages, to the Escher
Company, and to the Museo Larco (Lima, Peru), who allowed them to illustrate
their idea with magnificent works, and also to all the authors who granted per-
mission to publish some of their figures, contributing to the visual and conceptual
enrichment of this book.
Lastly, Lorenza and Jean-Daniel are grateful to Melissa Fearon and Courtney Clark, at Springer, for their patience in waiting for this book to be completed.
Contents

1 Introduction
  1.1 Summary
4 Definitions of Abstraction
  4.1 Giunchiglia and Walsh's Theory
  4.2 Abstraction in Philosophy
    4.2.1 Wright and Hale's Abstraction Principles
    4.2.2 Floridi's Levels of Abstraction
  4.3 Abstraction in Computer Science
  4.4 Abstraction in Databases
    4.4.1 Miles Smith and Smith's Approach
    4.4.2 Goldstein and Storey's Approach
    4.4.3 Cross' Approach
  4.5 Granularity
    4.5.1 Hobbs' Approach
    4.5.2 Imielinski's Approach
    4.5.3 Fuzzy Sets
    4.5.4 Rough Sets
  4.6 Syntactic Theories of Abstraction
    4.6.1 Plaisted's Theory of Abstraction
    4.6.2 Tenenberg's Theory
    4.6.3 De Saeger and Shimojima's Theory
  4.7 Semantic Theories of Abstraction
    4.7.1 Nayak and Levy's Theory
    4.7.2 Ghidini and Giunchiglia's Theory
  4.8 Reformulation
    4.8.1 Lowry's Theory
    4.8.2 Choueiry et al.'s Approach
    4.8.3 Subramanian's Approach
  4.9 Summary
13 Conclusion
  13.1 Ubiquity of Abstraction
  13.2 Difficulty of a Formal Definition
  13.3 The Need for an Operational Theory of Abstraction
  13.4 Perspectives of Abstraction in AI
References
Index
Chapter 1
Introduction
1 See Chap. 2.
the aim of capturing what they might have in common. From this comparison one has the feeling that coming up with a theory of abstraction, both sufficiently general to cover all of its uses and, at the same time, "operational", is a task doomed to fail from the outset.
Given the acknowledged importance of abstraction in human reasoning, it is likely
that an analogously basic role should be played by abstraction in the design of “intel-
ligent” artefacts. Researchers in Artificial Intelligence (AI) have indeed proposed
various theories of abstraction,2 based on different principles. However, the difficulty of transforming these theories into procedures able to generate, in a possibly automatic way, abstractions useful in practice has suggested targeting less ambitious
goals. As we are interested, in the end, in computational models of abstraction (even
though limited in scope), the work done in AI may be a primary source of inspiration,
as well as a term of reference to match theories proposed elsewhere.
Even amid the multiplicity of interpretations, there is a general agreement that
abstraction plays a key role in representing knowledge and in reasoning. The first
intuitive idea of abstraction that comes to mind, especially in everyday life, is that
of something which is far from the sensory world, and can only exist in the realm of
thought. For instance, most people think of Mathematics as an essentially abstract
discipline, and a branch of modern art has assumed abstraction as its very defini-
tion (see, as an example, Fig. 1.1). This interpretation complies with the etymological
meaning, in the sense that “to abstract” is to take away all aspects that can be captured
with our senses. In abstract art objects are stripped of their mundane concreteness to
leave their bare essence. An important work relating art, abstraction and neurophys-
iology has been done by Zeki [581], who tried to explain how the brain perceives
art. In doing so, he claims that abstraction is the common ability that underlies the
functioning of most cells in the visual system, where abstraction is, in this context,
“the emphasis on the general property at the expense of the particular”.
2 See Chap. 3.
Fig. 1.2 Satellite image of the center of Torino (left): buildings and monuments are visible. The
same area can be described by considering just the street network (right): this abstract map is more
convenient for moving around the city
At each step of refinement more details are possibly taken into account, generating
a sequence of solutions, each one more detailed than the previous one. In this case
we may speak of a hierarchy of levels of abstraction, with the highest levels poor
in details, and the lowest ones very rich. An example of a hierarchy is reported in
Fig. 1.3.
As we will see, the hierarchical approach is widespread in Computer Science and
Artificial Intelligence. However, choosing the correct level of detail to work with,
on any given problem, is a crucial step; in fact, a poor choice may be harmful to the
solution.
A sensitive issue in defining abstraction, one which is at the core of a hot debate, is its relation with generalization, defined as the process of extracting common properties from a set of objects or situations. Sometimes abstraction and generalization have simply been equated. It is clear that, being a matter of definition, nothing prevents us, in principle, from defining abstraction as generalization. However, this equation
does not allow one to see possibly useful differences, which can be observed if the
two concepts are taken apart. Then, hypothesizing a certain type of relation between
generalization and abstraction is not a question of correctness or truth, but of conve-
nience. The discussion on the links between generalization and abstraction should
also include the notion of categorization; this triad is fundamental for the conceptu-
alization of any domain of thought.7
Another dimension along which abstraction can be considered is related to infor-
mation content. Abstracting, from this perspective, corresponds to reducing the
amount of information that an event or object provides. This information can be
hidden or lost, according to the view of abstraction as a reversible or irreversible
process. Clearly this aspect of abstraction is strictly related to the ideas of levels of detail and hierarchies introduced above.

Fig. 1.3 Example of hierarchy in the field of animal classification. The lower the level, the more details are added to the characterization of the animals
Fig. 1.4 a The components of a computer are perceived as constituting a unique object.
b Abstraction substitutes a single object for a set of objects, thus reducing their number
the problem of abstraction definition, even though he makes use of an intuitive notion
thereof, but clearly his work can be put in relation with this fundamental problem.8
Also Archer et al. [21] link abstraction to information handling. They claim that
“Abstraction is probably the most powerful tool available to managing complexity”.
To tame the complexity of a problem they see two ways: reducing information or
condensing it. Reducing information can be related to the previously introduced idea
of selecting the most relevant aspects of a problem and deleting details, whereas
condensation is a form of aggregation. As in Brooks’ perspective, abstraction is the
bridge between an extremely rich sensory input and what we actually keep of it.
Globally, all the perspectives on the definition of abstraction outlined above converge on a change of representation. In fact, it is often true that finding an adequate
representation for a problem may be the hardest part of getting a solution. The generic
process of abstraction is represented in Fig. 1.5. Of course, the change of representa-
tion must be goal-oriented, i.e., useful to solve a given problem, or to perform a task
more easily. Moreover, not any change of representation is an abstraction, and it is
necessary to circumscribe abstraction’s scope. Intuitively, an abstract representation
should be "simpler" than the original one. In this way, abstraction is strictly related
to the notion of simplicity; however, this link does not make its definition any easier,
as simplicity seems to be an equally elusive notion.9
8 See Chap. 3.
9 See Chap. 10.
Fig. 1.5 Abstraction process for Problem Solving. Step 1 concerns a representation change justi-
fied by the need to reduce the computational complexity to solve a ground problem. Step 2 involves
solving the abstract problem. Step 3 refines the abstract solution to obtain one in the ground repre-
sentation space. The overhead of the representation changes (Steps 1 and 3) needs to be taken into
account to assess the abstraction usefulness
In order to make sense of the various definitions, theories, and practices of abstrac-
tion in different disciplines and contexts, it is necessary to go further, carrying out a
comparative analysis of the alternative approaches, with the aim of identifying com-
monalities (however vague) and differences (either superficial or essential ones), in
order to possibly define a set of properties that abstraction should satisfy for a given
class of tasks. Furthermore, it is useful and clarifying to contrast/compare abstraction
with the notions of generalization, categorization, approximation, and reformulation
in general.10
Based on the results of the comparison, we come up with a model of abstraction,
the KRA model, which tries to bring back this notion to its perceptive source.11 The
model does not have the ambition to be universal; on the contrary, it is targeted to
the task of conceptualizing the domain of a given application field. In essence, it is
first and foremost suited to model abstraction in systems that can be experimentally
observed.
As we have already said, it is essential, in order to fully exploit the power of
abstraction, that this notion becomes “operational”; in other words, even though
finding a “good” abstraction is still a matter of art, it should nevertheless be possible
to identify a set of operators that can be (semi-)automatically applied when a given
pattern of pre-conditions is discovered in the problem at hand. Operators may be
12 See Chap. 7.
13 See Chap. 12.
14 See Chap. 10.
15 See Chaps. 5, and 9.
an important way in which this concept can be instantiated is the reduction of the
computational complexity of programs. Clearly, if, on the one hand, abstraction
reduces the complexity of problem solution, on the other hand, its application has a
cost; this cost has to be traded off against the beneficial effects of the simplification. Then, choosing the "correct" abstraction requires finding a delicate balance between
different costs.16
Even in the absence of a general theory of abstraction, there are significant appli-
cations of this notion in different fields and domains. It is thus interesting to look at a
set of selected applications, in order to show the advantages that abstraction
provides, both in terms of simplification of the conceptualization of a domain and of
problem solving.17
1.1 Summary
In this chapter the book’s content is outlined. Investigating abstraction and its
computational properties involves a sequence of steps, the first one being collecting
and comparing various notions of abstraction used in a variety of disciplines, from
Philosophy to Art, from Computer Science to Artificial Intelligence. Then, in view
of building a computational model, it is necessary to set some boundaries around
the notion, distinguishing it from generalization, approximation, reformulation, and
so on. A computational model of abstraction tries to capture its essential properties,
and makes abstraction operational by means of operators.
As abstraction is often linked to simplicity, the relations between different defin-
itions of abstraction and different definitions of simplicity (or complexity) must be
investigated. In general, abstraction is employed, in problem solving, for reducing
the computational complexity of a task. As abstracting has a cost in itself, a balance has to be struck between this cost and the cost reduction obtained by finding an abstract solution.
Abstraction is not only interesting per se, but we believe it is at the basis of other
forms of reasoning, for instance analogy. Finally, in order to show the utility of
using abstraction in general, the existing models are compared, and some domains of application are described in detail.
Chapter 2
Abstraction in Different Disciplines

2.1 Philosophy
1 http://plato.stanford.edu/entries/abstract-objects/
One of the first attempts to pin down the idea of abstraction was made in Greek philosophy, most notably by Plato, who proposed a distinction between
the forms or ideas (abstract, ideal entities that capture the essence of things) and the
objects in the world (which are instantiations of those ideas) [420]. According to
him, abstraction is simple: ideas do not exist in the world, they do not have substance
or spatial/temporal localization, but their instantiations do. In this approach we may
recognize the basic reflex of associating abstraction with being far from the sensible
world, and of capturing the "essence" of things; however, Plato's ideas still have their own kind of existence in some other realm, like "idols in a cavern", from where they shape reality and have causal power.
The foundation of abstract reasoning was set later on by Aristotle, who perfected
the symbolic methods of reasoning, and whose views dogmatically entered the whole
body of Medieval Philosophy. According to Aristotle, there are three types of abstrac-
tion:
• Physical abstraction—Concrete objects are deprived of their specific attributes
but keep their material nature. For instance, starting from the physical reality
of an individual man, the physical, universal characteristics of all men can be
apprehended.
• Mathematical abstraction—Sensory characteristics of embodied objects are
ignored, and only the intelligible ones are kept.
• Metaphysical abstraction—Entities are considered as disembodied, leaving apart
any connotation linked to their realizations. Metaphysics starts not from things, but
from the idea of things (res or aliquid) and tries to discover the essence contained
in that idea.
In Philosophy the idea of abstraction has been mainly related to two aspects of
reasoning: on the one hand, generalization, understood as a process that reduces the
information content of a concept or an observable phenomenon, typically in order
to retain only information which is relevant for a particular purpose. Abstraction,
thus, results in the reduction of a complex idea to a simpler concept, which allows
the understanding of a variety of specific scenarios in terms of basic ideas.
On the other hand, abstraction has been investigated in connection with the very
nature or essence of things, specifically in order to ascertain their epistemological or
ontological status. Abstract things are sometimes defined as those things that do not
exist in reality, do not have a spatio/temporal dimension, and are causally inert. By contrast, a physical object is concrete because it is a particular individual that is
located at a particular place and time.
Originally, the “abstract/concrete” distinction was a distinction between words or
terms. Traditionally, grammar distinguishes the abstract noun “whiteness” from the
concrete noun “white” without implying that this linguistic contrast corresponds to
a metaphysical distinction. In the seventeenth century this grammatical distinction
was transposed to the domain of ideas. Locke supported the existence of abstraction
[339], recognizing the ability to abstract as the quality that distinguishes humans
from animals and makes language possible. Locke speaks of the general idea of a
triangle which is “neither oblique nor rectangle, neither equilateral nor scalenon,
but all and none of these at once”.
Locke’s conception of an abstract idea, as one that is formed from concrete ideas
by the omission of distinguishing details, was immediately rejected by Berkeley, and
then by Hume. Berkeley argued that the concept of an abstract idea is incoherent
because it requires both the inclusion and the exclusion of one and the same property
[53]. An abstract idea would have to be general and precise at the same time, general
enough to include all instances of a concept, yet precise enough to exclude all non-
instances. The modern empiricism of Hume and Berkeley denies that the mind can attain knowledge of the universals through the generalization process. The mind does not perform any abstraction but, on the contrary, selects a particular and makes, out of it, a template of all particular occurrences that are the only possible realities.
For Kant there is no doubt that all our knowledge begins with experience [280],
i.e., it has a concrete origin. Nevertheless, by no means does it follow that everything derives from experience. For, on the contrary, it is possible that our knowledge is a
compound of sensory impressions (phenomena) and of something that the faculty of
cognition supplies from itself a priori (noumena). By the term “knowledge a priori”,
therefore, Kant means something that does not come from the sensory input, and that is independent of all experience. Opposed to this is "empirical knowledge",
which can be obtained only a posteriori, namely through experience. Knowledge
a priori is either pure or impure. Pure a priori knowledge is not mixed up with
any empirical element. Even though not set by Kant himself in these terms, the
counterposition between a priori (or pure) knowledge and a posteriori or empirical
knowledge mirrors the dichotomy between abstract and concrete knowledge. From
this point of view, abstraction is not (directly) related to generalization or concept
formation, but represents some sort of a priori category of human thinking.
The Kantian Enlightenment, with its predilection for the intellect, was strongly criticized by Hegel [240], who considered it as the philosophical abstraction of
everything, both real and phenomenological. According to Hegel, the philosophers
of his time had so abstracted the physical world that nothing was left. Hegel rejected
this line of reasoning, concluding in contrast that “What is real is rational—what
is rational is real”. He set out to reverse this trend, moving away from the abstract
and toward the concrete. Hegel viewed the phenomenological world (what can be
sensed by humans or manmade instruments) and the conceptual (thoughts and ideas)
as equal parts of existence. Hegel thought that abstraction inherently leads to the
isolation of parts from the whole. Eventually, abstraction leads to the point where
physical items and phenomenological concepts have no value.
Abstraction plays a central role also in Marx’s philosophy. By criticizing Hegel,
Marx claims that his own method starts from the “real concrete” (the world) and
proceeds through “abstraction” (intellectual activity) to the “thought concrete” (the
whole present in the mind) [355]. In one sense, the role Marx gives to abstraction is
the simple recognition of the fact that all thinking about reality begins by breaking
it down into manageable parts. Reality may be in one piece when lived, but to be
thought about and communicated it must be parceled out. We “see” only some of what
lies in front of us, “hear” only part of the noises in our vicinity; in each case, a focus
is established, and a kind of boundary set within our perceptions, distinguishing what
is relevant from what is not. Likewise, in thinking about any subject, we focus on
only some of its qualities and relations. The mental activity involved in establishing
such boundaries, whether conscious or unconscious, is the process of abstraction.
A complication in grasping Marx’s notion of abstraction arises from the fact that
Marx uses the term in four different senses. First, and most important, it refers to the
mental activity of subdividing the world into the mental constructs with which we
think about it, which is the process that we have been describing. Second, it refers to
the results of this process, the actual parts into which reality has been apportioned.
That is to say, for Marx, as for Hegel before him, “abstraction” functions as a noun
as well as a verb, the noun referring to what the verb has brought into being. But
Marx also uses "abstraction" in a third sense, where it refers to particularly ill-fitting mental constructs. Whether because they are too narrow, or take in too
little, or focus too exclusively on appearances, these constructs do not allow an
adequate grasp of their subject matter. Taken in this third sense, abstractions are the
basic unit of ideology, the inescapable ideational result of living and working in an
alienated society. “Freedom”, for example, is said to be such an abstraction whenever
we remove the real individual from “the conditions of existence within which these
individuals enter into contact” [356]. Omitting the conditions that make freedom
possible makes “freedom” a distorted and obfuscated notion.
Finally, Marx uses the term “abstraction” in a fourth sense, where it refers to a
particular organization of elements in the real world (having to do with the functioning
of capitalism). Abstractions in this fourth sense exist in the world and not, as in
the case with the other three, in the mind. In these abstractions, certain spatial and
temporal boundaries and connections stand out, just as others are obscure or invisible, making what is in practice inseparable appear separate. It is in this way that
commodities, value, money, capital are likely to be misconstrued from the start.
Marx labels these objective results of capitalist functioning "real abstractions", and it is to these abstractions that he refers when he says that in capitalist society "people are governed by abstractions" [356]. In conclusion, we can say that
Marx’s abstractions are not things but rather processes. These processes are also,
of necessity, systemic relations. Consequently, each process acts as an aspect, or
subordinate part, of other processes, grasped as clusters of relations.
In today’s Philosophy the abstract/concrete distinction aims at marking a line
in the domain of objects. An important contribution was given by Frege [181].2
Frege’s way of drawing this distinction is an instance of what Lewis calls the Way
of Negation [329]. Abstract objects are defined as those that lack certain features
possessed by paradigmatic concrete things. Contemporary supporters of the Way of Negation now modify Frege's criterion by requiring that abstract objects be non-spatial and/or causally inefficacious. Thus, an abstract entity can be defined as a
non-spatial (or non-spatio/temporal), causally inert thing.
The most important alternative to the Way of Negation is what Lewis calls the
Way of Abstraction [329]. According to the tradition in philosophical Psychology,
abstraction is a specific mental process in which new ideas or conceptions are formed
by considering several objects or ideas and omitting the features that distinguish
them. Nothing in this tradition requires that ideas formed in this way represent or
correspond to a distinctive class of objects. But it might be maintained that the
distinction between abstract and concrete objects should be explained by reference
to the psychological process of abstraction or something like it. The simplest version
of this strategy would be to say that an object is abstract if it is (or might be) the
referent of an abstract idea, i.e., an idea formed by abstraction.
Starting from an observation by Frege, Wright [568] and Hale [230] have devel-
oped a “formal” account of abstraction. Frege points out that terms that refer to
abstract entities are often formed by means of functional expressions, for instance,
the direction of a line, the number of books. When such a function f (a) can be
defined, there is typically an equation of the form:

f(a) = f(b) ⟺ a ≅ b,   (2.1)

where ≅ is an equivalence relation over the arguments of f (for directions, the parallelism of lines). These equations are called abstraction principles,4 and appear to have a special
meaning: in fact, they are not exactly definitions of the functional expression that
occurs on the left-hand side, but they hold in virtue of the meaning of that expression.
To understand the term “direction” requires to know that “the direction of a” and
“the direction of b” refer to the same entity if and only if the lines a and b are
parallel. Moreover, the equivalence relation that appears on the right-hand side of
the equation comes semantically before the functional expression on the left-hand
side [403]. Mastery of the concept of “direction” presupposes mastery of the concept
of parallelism, but not vice versa. In fact, the direction is what a set of parallel lines
have in common.
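A minimal computational sketch of this principle (our illustration; the representation of lines by point pairs and the canonical slope are assumptions, not part of Wright and Hale's account): two lines receive the same "direction" object exactly when they are parallel, mirroring f(a) = f(b) ⟺ a ≅ b.

from fractions import Fraction

def direction(line):
    # A line is given by two points ((x1, y1), (x2, y2)); its "direction"
    # is represented by a canonical slope, so that all parallel lines
    # (one equivalence class) share one and the same object.
    (x1, y1), (x2, y2) = line
    if x1 == x2:
        return 'vertical'
    return Fraction(y2 - y1, x2 - x1)

a = ((0, 0), (2, 2))
b = ((1, 5), (3, 7))   # parallel to a
c = ((0, 0), (1, 3))   # not parallel to a
assert direction(a) == direction(b)   # f(a) = f(b), since a and b are parallel
assert direction(a) != direction(c)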
An in-depth discussion of the concrete/abstract distinction in Philosophy, with
a historical perspective, is provided by Laycock [321].5 He starts by considering
the two dichotomies “concrete versus abstract”, and “universal versus particular”,
which are commonly presented as being mutually exclusive and jointly exhaustive
categories of objects. He claims that “the abstract/concrete, universal/particular
… distinctions are all prima facie different distinctions, and to thus conflate them
can only be an invitation to further confusion”. For this reason he suggests that the
first step to clarify the issues involved with the dichotomies is to investigate the
relationship between them.
Regarding the dichotomy of concrete and abstract objects, he notices that “this
last seems particularly difficult. On the one hand, the use of the term “object” in this
context strongly suggests a contrast between two general ontic categories. On the
other hand, though, the adjective abstract is closely cognate with the noun “abstrac-
tion”, which might suggest “a product of the mind”, or perhaps even “unreal” or
“non-existent” …”. This dichotomy has “at least two prominent but widely divergent
interpretations. On the one hand, there is an ontic interpretation, and there is a purely
semantic or non-objectual interpretation, on the other hand. Construed as ontic, the
concrete/abstract dichotomy is commonly taken to simply coincide with that of uni-
versal and particular.” This interpretation has been adopted, for instance, by Quine
[437]. On the contrary, the semantic interpretation of the dichotomy was accepted
by Mill [373] and applied to names: “A concrete name is a name which stands for a
thing; an abstract name is a name which stands for an attribute of a thing.”
According to Barsalou and Wiemer-Hastings [37], concrete and abstract concepts
differ in their focus on situational contexts: concrete concepts focus on specific
objects and their properties in situations, whereas abstract concepts focus on events
and introspective properties.
Once the distinction between concrete and abstract has been introduced, it is a
small step ahead to think of varying degrees of abstraction, organized into a hierarchy.
The study of reality at different levels has been the object of various kinds of "levelism", from epistemological to ontological. Even though some of the past
hierarchical organizations of reality seem obsolete, Floridi claimed recently [175]
that the epistemological one is tenable, and proposed a “theory of the levels of
abstraction”. At the basis of this theory there is the notion of “observable”. Given
a system to be analyzed, an observable is a variable whose domain is specified,
together with the feature of the system that the variable represents.6 Defining an
observable in a system corresponds to a focalization on some specific aspect of
the system itself, obtaining, as a result, a simplification. It is important to note that
an observable is properly defined only with respect to its context and use. A level
of abstraction (LoA) is nothing else than a finite and non-empty set of observables.
Different levels of abstraction for the same system are appropriate for different goals.
Each level “sees” the system under a specific perspective. The definition of a level of
abstraction is only the first step in the analysis of a system. In fact, taken in isolation,
each observable might take on values that are incompatible with those assumed by
some others. Then, Floridi introduces a predicate over the observables, which is true
only if the values assumed by the observables correspond to a feasible behavior of
the system. A LoA with an associated behavior is called a moderated LoA.
As previously said, different LoAs correspond to different views of a system. It
is thus important to establish relations among them. To this end, Floridi introduces
the concept of Gradient of Abstraction (GoA), which is a finite set {Li | 1 ≤ i ≤ n} of moderated LoAs, and a set of relations relating the observables belonging to pairs
of LoAs. A GoA can be disjoint or nested. Informally, a disjoint GoA is a collection
of unrelated LoAs, whereas a nested one contains a set of LoAs that are refinements of one another.
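A hedged sketch of these definitions in Python (the traffic-light system and all names below are ours, purely illustrative): an observable is a named variable with a domain, a LoA is a set of observables, and a moderated LoA adds a behavior predicate ruling out infeasible joint values.

# A LoA as a finite, non-empty set of observables (name -> domain).
traffic_light_loa = {
    'color': {'red', 'amber', 'green'},
    'walk_sign': {'walk', 'dont_walk'},
}

def behavior(values):
    # Moderation predicate: true only for feasible joint values, e.g.
    # pedestrians never see "walk" while the cars' light is "green".
    return not (values['color'] == 'green' and values['walk_sign'] == 'walk')

def is_feasible(loa, values):
    in_domains = all(values[name] in domain for name, domain in loa.items())
    return in_domains and behavior(values)

print(is_feasible(traffic_light_loa, {'color': 'red', 'walk_sign': 'walk'}))    # True
print(is_feasible(traffic_light_loa, {'color': 'green', 'walk_sign': 'walk'}))  # False

A nested GoA would then relate such LoAs, for instance by mapping the finer domain {red, amber, green} onto a coarser one such as {stop, go}.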
6 An observable does not necessarily correspond to a physically measurable entity, because the
system under analysis may be a conceptual one.
(they viewed the world through the senses) [47]. As an example, he mentions that
the abstract word “anger” corresponds, in the ancient Hebrew, to “nose”, because a
Hebrew sees anger as “the flaring of the nose”.
In a sense, language as a whole is an abstraction, because it substitutes a "name" for the real thing. And this is another way of considering abstraction in language.
By naming an entity, we associate with the name a bundle of attributes and functions
characterizing the specific instances of the entity. For example, when we say car, we think of a closed vehicle with four wheels and a steering wheel, even though many details
may be left unspecified, such as the color, the actual shape, and so on. The ontological
status of the “universal” names has been debated, especially in the late Medieval time,
with positions ranging from the one of Roscelin,7 who claimed that universals are
nothing more than verbal expressions, to that of Guillaume de Champeaux,8 who, on the contrary, maintained that the universals are the real thing.
Independently of their ontological status, words stand for common features of perceived entities, and they are considered abstractions derived from extracting the
characterizing properties of classes of objects. The word tree, for instance, represents
all the concrete trees that can exist. This view is based on a referential view of the
meaning of words. Kayser [283] challenges this view, proposing an inferential view of word semantics: words are premises of inference rules, and they end up
denoting classes of objects only as a side-effect of the role they play. Barsalou sees
the process of naming an object as a way to simplify its representation, by endowing
it with invisible properties that constitute its very nature [38]. An interesting aspect
of naming is the interaction between vision and language [144, 165]. Assigning a
name to a seen object implies recognizing its shape, identifying the object itself and
retrieving a suitable word for it. The name can then act as the semantics of an image.
The role of the name as an abstraction of the concrete thing also plays a relevant
role in magics. According to Cavendish [89], “the conviction that the name of a thing
contains the essence of its being is one of the oldest and most fundamental of magical
beliefs.... For the magical thinker the name sums up all the characteristics which make
an animal what it is, and so the name is the animal’s identity.” For instance, burying
a piece of lead, with the name of an enemy written on top together with a curse,
was, supposedly, a way of killing the enemy. Viewed from this perspective, the name is quite dangerous to a person, who can easily be harmed through his/her name. For this reason, in many primitive societies a man had two names: one to be used in everyday life, and another, the real one, kept secret. For similar reasons
also the names of gods and angels were often considered secret. An Egyptian myth
tells that the goddess Isis, in order to take over the power of the sun-god Ra, had to
discover his name. Magical power or not, a name is, after all, a shortcut allowing a
complex set of properties to be synthesized into a word.
7 French philosopher, who lived in France in the second half of the XII century. His work is lost, but references to it can be found in the works of Saint Anselm and Peter Abelard.
8 French philosopher, who lived in the late XII century in Paris. He was also a teacher of Peter Abelard, who later convinced him to change his opinion about universals.
An approach relevant to both abstraction and language, even though not explicitly
stated so, is described by Gärdenfors [193]. He discusses the representations needed
for language to evolve, and he identifies two main types: cued and detached. A cued
representation “stands for something that is present in the current external situation
of the representing organism”. On the contrary, a detached representation may stand
for objects or events that are neither present in the current situation nor triggered by
some recent situation. Strictly connected with these representations are the notions of
symbol, which refers to a detached representation, and signal, which refers to a cued
one. Languages use mostly symbols. Animals may show even complex patterns of
communication, but these are patterns of signals, not symbols. Gärdenfors’ distinc-
tion closely resembles the distinction between abstract and concrete communication;
in this context an abstract communication may involve things that have been, that
could be, or that are not localized in time and space. A signal system, instead, can
only communicate what is here and now.
In natural language, abstraction also enters as a figure of style. In fact, abstraction
is a particular form of metonymy, which replaces a qualifying adjective by an abstract
name. For example, in La Fontaine’s fable Les deux coqs (VII, 13) the sentence “tout
cet orgueil périt … (all this pride dies)", refers, actually, to the dying cock.
2.3 Mathematics
9 See http://www.wordiq.com/definition/Abstraction_(mathematics).
visible experiment.” Then, Roşu shows that the notion of behavioral abstraction is a
special case of a more general abstraction technique, namely information hiding.
Another technical notion of abstraction is presented by Antonelli in a recent paper
[20]. Starting from the abstraction principle (2.1), introduced by Wright [569] and
Hale [230] and reported in Sect. 2.1, he defines an abstraction operator, which assigns
an object—a “number”—to the equivalence classes generated by the equinumerosity
relation, in such a way that each class is associated with a different object. According
to Antonelli, this principle is what is needed to formalize arithmetic following the
"traditional Frege-Russell strategy of characterizing the natural numbers as abstracta
of the equinumerosity relation.”
More precisely, numbers, as abstract objects, are obtained by applying an abstrac-
tion operator to a concept (in Frege's sense). However, in order to be an abstraction, such a mapping from concepts to objects must respect a given equivalence relation
[19]. In the case of numbers, the principle of numerical abstraction, or Hume’s Prin-
ciple, postulates an operator Num assigning objects to concepts in such a way that concepts P and Q are mapped to the same object exactly when as many objects fall under P as fall under Q. The object Num(P) can be regarded as "the number
of P”.
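As a toy illustration (ours, not Antonelli's formalism), Hume's Principle can be mimicked in Python for finite concepts, modeling a concept extensionally as the set of objects falling under it:

def Num(P):
    # Toy rendering of Hume's Principle for finite concepts: equinumerous
    # concepts receive one and the same "number" object.
    return len(P)   # the cardinality acts as a canonical proxy object

P = {'Mercury', 'Venus', 'Earth'}
Q = {'red', 'green', 'blue'}
R = {'yes', 'no'}
assert Num(P) == Num(Q)   # a bijection exists between P and Q
assert Num(P) != Num(R)

Here len plays the role of the ordinary object "recruited" as a proxy for each equivalence class of concepts under equinumerosity.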
Antonelli calls this view of abstraction deflationary [19], because it denies
that objects obtained via abstraction enjoy a special status: they are “just ordinary
objects, recruited for the purpose of serving as proxies for the equivalence classes
of concepts generated by the given equivalence relation.” Abstraction principles are
linguistically represented by introducing a “term-forming” operator Φ(P), which
stands for the possibly complex predicate expression P.
An interesting overview of the notion of abstraction in Mathematics is given
by Ferrari, who tries to establish connections with other fields, such as Cognitive
Science, Psychology, and mathematical education practice [166]; the reason is that
“abstraction has been early recognized as one of the most relevant features of Math-
ematics from a cognitive viewpoint as well as one of the main reasons for failure in
Mathematics learning.”
By looking at the history of Mathematics, Ferrari acknowledges that abstract
objects have been characterized by a certain degree of both generalization and decon-
textualization. However, he points out that maybe their primary role is in creating
new concepts, when, specifically, a (possibly complex) process or relation is reinterpreted as a (possibly simpler) object, as in Antonelli's approach [19, 20]. An example
is provided by the arithmetic operations, which, at the beginning, are learned as pro-
cedures, but then become objects whose properties (for instance, associativity) can
be investigated. This transition is called encapsulation [142] or reification [482].
Ferrari argues that generalization, decontextualization and reification are all basic
components of abstraction in Mathematics, but that abstraction cannot be identi-
fied with any single one of them. For instance, generalization, defined as an exten-
sional inclusion relation, cannot exhaust the abstraction process, which also includes
recognition of common properties, adoption of a compact axiom set, and definition of a notation system to deal with newly defined concepts. Even though generaliza-
tion and decontextualization do not coincide, generalization implies a certain degree
Fig. 2.1 Specification templates for procedural and data abstraction. When assigning a name to the
procedure, its inputs and outputs are defined. For data, their structure and the applicable operations
are defined
For Liskov and Guttag [333], “abstraction is a many-to-one map.” It ignores irrel-
evant details; all its realizations must agree on the relevant details, but may differ on
the irrelevant ones. Abstraction is defined by Liskov and Guttag by means of speci-
fications. They introduce templates for procedural and data abstraction, examples of
which are reported in Fig. 2.1.
As we will see in Chap. 7, abstraction operators can be represented with Abstract
Procedural Types. Let us now introduce examples of procedural and data abstraction
in order to clarify these notions.
Example 2.1 Suppose that we want to write a procedure for searching whether an
element y appears in a vector X without specifying the actual program to do it. We
can define the following abstract procedure:
pname = Search(X, y) returns({true, false})
requires X is a vector, y is of the same type as the elements of X
modifies ∅
effects Searches through X, and returns true if y occurs in X, else returns false
end pname
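The specification leaves the implementation free; any program agreeing with the stated effects realizes the abstraction. One possible realization, sketched here in Python:

def search(X, y):
    # One concrete implementation of the abstract procedure Search(X, y):
    # scan through X and report whether y occurs in it.
    # Nothing is modified, as required by the "modifies" clause.
    for element in X:
        if element == y:
            return True
    return False

print(search([3, 1, 4, 1, 5], 4))   # True
print(search([3, 1, 4, 1, 5], 9))   # False

A binary search over a sorted vector would be an equally valid realization: callers relying only on the specification cannot tell the difference.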
Data abstraction, on the other hand, consists in defining a type of data and the
operations that manipulate it. Data abstraction makes a clear separation between the
abstract properties of a data type and its concrete implementation.
Example 2.2 Let us define the data type complex number z as a pair (x, y) of real numbers, with some associated operations, such as, for example, Real(z), Imaginary(z), Modulus(z) and Phase(z).
dname = complex is pair of reals (x, y)
Overview A complex number has a real part, x, and an imaginary one, y, such that z = x + iy, where i = √−1. In polar coordinates z has a modulus and a phase.
Operations Real(z) = x
Imaginary(z) = y
Modulus(z) = √(x² + y²)
Phase(z) = arctg(y/x)
end dname
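This data abstraction can be realized, for instance, by the following Python class (a sketch of ours; the internal pair of reals stays hidden behind the four operations):

import math

class Complex:
    # Concrete realization of the data abstraction "complex": clients use
    # only the operations below, never the underlying representation.
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def real(self):
        return self._x

    def imaginary(self):
        return self._y

    def modulus(self):
        return math.sqrt(self._x ** 2 + self._y ** 2)

    def phase(self):
        # atan2 refines arctg(y/x) by handling x = 0 correctly
        return math.atan2(self._y, self._x)

z = Complex(3.0, 4.0)
print(z.modulus())   # 5.0

A polar-coordinate representation could replace the pair (x, y) without any change visible to clients, which is precisely the separation the abstraction guarantees.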
Data and procedural abstractions have been reunited in the concept of Abstract Data
Type (ADT), which is at the core of object-oriented programming languages. An ADT
defines a data structure, with associated methods, i.e., procedures for manipulating
the data. An ADT offers the programmer an interface, used to trigger methods, which is separated from the actual implementation, which the programmer does not
need to see. Thus, abstraction, in this context, realizes information hiding. Even
though the notion of ADT has been around for a while, a modern description of it is
provided by Gabbrielli and Martini [186]. ADTs are only one step in the evolution
of object-oriented programming, because they are passive entities, which can only
be acted upon by a controlling program; on the contrary, the notion of object goes
further, by introducing interaction possibilities via message passing, and a sort of
autonomy in letting an object invoke operations on other objects. The relationship
between classes, objects and data abstraction has been investigated by Fisher and
Mitchell [170], who compare three approaches to class-based programming, namely,
one called “premethods”, and two others called “prototype”. The authors claim that
object-based methods are superior to class-based ones.
Introducing an ADT leads spontaneously to the idea of several nested layers
of abstraction [170]. A data type may be part of an is-a hierarchy, organized as a
tree, where each node has one father, but may have several children. The advantage of defining such a hierarchy is that it is not necessary to define every
node; on the contrary, a child node automatically inherits the properties of the father
(unless specified otherwise) through downward inheritance rules, but, at the same
time, it may have some more specific properties added. For instance, if an animal
is defined as a being that eats, moves, and reproduces, a bird can inherit all the
above properties, with the addition of has-wings. The is-a relation between a type
and a sub-type is called by Goldstein and Storey an inclusion abstraction [217].
These authors define other types of abstraction as well, which will be described in
Sect. 4.4.2.
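The animal example above translates directly into such a hierarchy; a minimal Python sketch (our rendering of downward inheritance):

class Animal:
    # Properties defined once at the "animal" node of the is-a tree.
    def eats(self):
        return True

    def moves(self):
        return True

    def reproduces(self):
        return True

class Bird(Animal):
    # A child node: inherits eats, moves and reproduces downward,
    # and adds a more specific property of its own.
    def has_wings(self):
        return True

tweety = Bird()
print(tweety.eats(), tweety.has_wings())   # True True (inherited + added)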
Colburn and Shute [111] make a point of differentiating Computer Science from empirical sciences, because the latter have concrete models in the form of
experimental apparata as well as abstract mathematical models, whereas the former
has only software models, which are not physically concrete. Going further along
this line, the authors claim that the fundamental nature of abstraction in Computer
Science is quite different also from the one in Mathematics with respect to both the
primary product (i.e., the use of formalism), and the objectives.
The main products of abstraction in Mathematics are inference structures (theorems and their proofs), while in Computer Science they are interaction patterns (pieces of soft-
ware). Interactions can be considered at many levels, starting from the basic ones
between instruction and data in memory, up to the complex interactions occurring in
multi-agent systems, or even those between human users and computers. For what
concerns formalism, the one of Mathematics is rather “monolithic”, based on set the-
ory and predicate calculus, whereas formalism in Computer Science is “pluralistic”
and “multilayered”, involving programming languages, operating systems [481], and
networks [100]. Looking at the objectives of abstraction, Colburn and Shute make an
interesting distinction: in Mathematics the construction of models involves getting
rid of inessential details, which they call an act of information neglect, whereas in
Computer Science writing programs involves information hiding: the details that are invisible at a given level of abstraction cannot really be eliminated, because they are essential at some lower level. This is true for programming languages, but
also for operating systems, and network architectures.
A teaching perspective in considering abstraction in Mathematics and Computer
Science is taken by both Leron [326] and Hill et al. [249]. Leron claims that in
Mathematics “abstraction is closely related to generalization, but each can also
occur without the other.” In order to support his claim, he offers two examples; the
first is the formula (a + b)² = a² + 2ab + b², which is generalized (but not abstracted)
when its validity is extended from natural numbers (a and b) to rational ones. On
the other hand, the same formula is abstracted when it is considered to hold for any
two commuting elements in a ring. The second example consists in the description
“all prime numbers less than 20”, which is more abstract (but not more general)
than “the numbers 2, 3, 5, 7, 11, 13, 17, 19”. In Computer Science the separation
between the high level concepts, used to solve a problem, and the implementation
details constitutes what Leron calls an abstraction barrier. Above the barrier the
problem is solved using suitably selected abstraction primitives, whereas, below
the barrier, one is concerned with the implementation of those primitives. Looking
at the mathematical examples we may see that Leron attributes to generalization
an extensional nature. Moreover, he notices that proofs of abstractly formulated
theorems gain in simplicity and insights. Finally, he makes a distinction between
descriptions of objects in terms of their structure and in terms of their functionalities,
and claims that abstraction is more often linked to the functional aspects.
For their part, Hill et al. [249] claim that “abstraction is a context-dependent, yet
widely accepted aspect of human cognition that is vitally important for success in
the study of Computer Science, computer programming and software development.”
They distinguish three types of abstraction: conceptual, formal, and descriptive. Con-
ceptual abstraction is the ability to move forward and backward between a big picture
and small details. Formal abstraction allows details to be removed and attention to
be focalized in order to obtain simplifications. Finally, descriptive abstraction is the
ability to perceive the essence of things, focalizing on their most important character-
istics; this type of abstraction also allows “salient unification and/or differentiation”,
namely it is related to generalization.
Abstraction not only plays a fundamental role in Computer Science in general
(namely, in discussing programming philosophy), but it also offers powerful tools
to specific fields. One is software testing, where abstraction has been proposed as
a useful mechanism for model-based software testing [345, 428]. Another one is
Database technology. In databases three levels of abstraction are usually considered:
the conceptual level, where the entities that will appear in the database are defined, as
well as their inter-relationships; the logical level, where the attributes of the entities
and the keys are introduced, and the physical level, which contains the actual details of
the implementation. Abstraction increases from the physical to the conceptual level.
Beyond this generic stratification, in a database it is often crucial to select an
appropriate level of abstraction for the very data to be stored. If data are stored at too fine a grain the database may reach an excessive size, whereas at too coarse a grain important distinctions might be masked. The issue
is discussed, among others, by Calders et al. [87], who say that “a major problem
… is that of finding those abstraction levels in databases that allow significant
data aggregation without hiding important variations.” For instance, if a department
store has recorded every day the number and type of items sold, storing these raw data over a period of three years may mask some trends that could have been
apparent if the data were aggregated, say, by weeks or months. In order to select the
appropriate level, database designers exploit hierarchies over the values of variables.
For instance, for a time variable, hour, day, week, month, and year constitute a
hierarchy of values of increasing coarseness. In an analogous way, city, region,
country constitute a hierarchy for a location variable.
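As a small illustration of climbing such a value hierarchy (the data below are invented), the following Python fragment aggregates daily counts up to the month level, hiding daily variation to make coarser trends visible:

from collections import defaultdict

daily_sales = {                      # fine-grained data: items sold per day
    '2013-01-03': 12, '2013-01-17': 20,
    '2013-02-05': 7,  '2013-02-21': 9,
}

monthly_sales = defaultdict(int)
for day, count in daily_sales.items():
    month = day[:7]                  # climb the hierarchy: day -> month
    monthly_sales[month] += count    # aggregation hides the daily detail

print(dict(monthly_sales))           # {'2013-01': 32, '2013-02': 16}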
In relational algebra, several of the operators can be interpreted, in an intuitive sense, as abstraction operators. For instance, given a relational table R with attributes (A1, …, An) on the columns, the projection operator π_{Ai1,…,Air}(R) hides R's columns that are not mentioned in the operator. In an analogous way, the selection operator σ_φ(R) selects only those tuples for which the logical formula φ is true, hiding the remaining ones. These operators clearly obey the principles of information hiding, because the omitted columns or rows of R are not deleted, but only hidden: they may be visualized again at any time.
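With a table encoded as a list of dictionaries, the two operators can be written non-destructively, as in the following sketch (ours): the original table R survives intact, so the hidden rows and columns can be shown again at will.

R = [
    {'name': 'Anna',  'city': 'Torino', 'age': 34},
    {'name': 'Marco', 'city': 'Milano', 'age': 51},
]

def project(table, *attributes):
    # The projection pi: keep only the named columns; R itself is untouched
    return [{a: row[a] for a in attributes} for row in table]

def select(table, phi):
    # The selection sigma: keep only the tuples satisfying the formula phi
    return [row for row in table if phi(row)]

print(project(R, 'name'))                      # hides the city and age columns
print(select(R, lambda row: row['age'] < 40))  # hides the tuples with age >= 40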
Miles Smith and Smith [372] address the issue of abstraction in databases directly.
They say that “an abstraction of some system is a model of that system in which cer-
tain details are deliberately omitted. The choice of the details to omit is made by
considering both the intended application of the abstraction and also its users. The
objective is to allow users to heed details of the system which are relevant to the
application and to ignore other details.” As in some systems there may be too many
relevant details for a single abstraction, a hierarchy can be built up, in which some
details are temporarily ignored at any given level. In Codd’s model of a relational
database [109] abstraction requires two steps: first, a relational representation com-
patible with the intended abstraction’s semantics must be found. Second, the meaning
of this representation must be explicitly described in terms of data dictionary entries
and procedures. As we will see in Sect. 4.7.1, a similar approach is adopted by Nayak
and Levy [395] for their semantic model of abstraction.
In Computer Science another important aspect is software verification. According
to Yang et al. [572] formal program verification must cope with complex computa-
tions by means of approximations. Abstract interpretation [117] is a theory for defin-
ing sound approximations, and also a unifying framework for different approximate
methods of program verification tools. Therefore, abstract interpretation is widely
exploited in several fields, such as static analysis, program transformation, debugging,
and program watermarking. In their paper the authors describe the foundations of
abstracting a program’s fixpoint semantics, and present a state of the art on the subject.
Fig. 2.2 “La trahison des images” (1928–29). Magritte’s painting is an “image” of a pipe, not the
“real thing”
Whatever “art” might be, according to Gortais [219] “as a symbolic device, art,
whether figurative or not, is an abstraction”. This statement is well illustrated by
Magritte’s picture of a pipe (see Fig. 2.2), where the sentence “Ceci n’est pas une
pipe”10 refers to the fact that the painting “represents” a pipe but it is not the “real
thing”. Certainly, if we look at a person, an event, or a landscape in the world, any
attempt to reproduce it, be it through a painting, a sculpture, a novel, or a piece of music,
leaves out something existing in the original. In this respect, art tries to get at the
essence of its subject, and hence it is indeed an abstraction of reality, if abstraction
is intended as a process of getting rid of irrelevancies. On the other hand, a work of art
is exactly such because it makes present something that was not present before, and
may reveal what was not visible before. Moreover, art’s true value is in the emotional
relation with the public. “Each work of art will resonate in its own way over the whole
range of human emotions and each person will be touched in a different way” [219].
Art involves an abstract process, exploiting a communication “language” using a
set of symbols. In visual arts, this language is based on colors, forms, lines, and so
on. The art language of Western cultures had, in the past, a strict link with the reality
that was to be communicated: arts were figurative. Later on, the language acquired
more and more autonomy, and (at least parts of) the arts became abstract [219].
Thus, abstract art does not aim at representing the world as it appears, but rather at
composing works that are purposefully non-representational and subjective. The use
of non-figurative patterns is not new, as many of them appear on pottery and textiles
from pre-historical times. However, these patterns were elements of decoration, and
did not have necessarily the ambition to be called “art”. A great impulse to the
abandon of faithfulness to reality, especially in painting, was given by the advent of
photography. In fact, paintings were also intended to transmit to posterity the faces
of important persons, or memories of historical events. Actually, a complex interplay
exists among figurative works, abstract works, and photography. All three may show
different degrees of abstraction, and all three may or may not be classified as art at
all: history, context, culture, and social constraints, all play a role in this evaluation.
Even before photography, some painters, such as James McNeill Whistler, stressed
the importance of transmitting visual sensations rather than precise representations of
objects. His work Nocturne in Black and Gold, reported in Fig. 2.3, is often considered
a first step toward abstract art.
A scientific approach to abstract art was proposed by Kandinsky [279], who
defined some primitives (points, lines, surfaces) of a work of art, and associated
an emotional content with them. In this way it was possible to define a syntax and a
language for art, free from any figurative meaning. However, the primitives were
fuzzy (when does a point start to be perceived as a surface?), and the proposed
language proved difficult to apply. Kandinsky, with Malevich, is considered
a father of abstract pictorial art. An example of Malevich’s work is reported in
Fig. 2.4.
In Fig. 2.5 an even more abstract painting, by the contemporary French painter
Pierre Soulages, is reported. He says: “J’aime l’autorité du noir. C’est une couleur
qui ne transige pas. Une couleur violente mais qui incite pourtant à l’intériorisation.
A la fois couleur et non-couleur. Quand la lumière s’y reflète, il la transforme, la
transmute. Il ouvre un champ mental qui lui est propre.”11
Since the eighteenth century it has been thought that an artist would use abstraction
to uncover the essence of a thing [377, 588]. The essence was reached by throwing
away the peculiarities of instances, and keeping the universal and essential aspects.
At the beginning, this idea of abstraction did not necessarily imply moving away from
the figurative. But, once it was accepted that the goal of art was to attain the essence,
and not to faithfully represent reality, the door to non-figurative art was open.
An example of this process is reported in Fig. 2.6, due to Theo van Doesburg, an
early abstract painter who, together with Piet Mondrian, founded the journal De
Stijl. In 1930 he published the Concrete Art Manifesto, in which he explicitly denied
that art should take inspiration from nature or feelings. The text of the Manifesto is
reported in Appendix A. Actually, it sounds rather surprising that the type of totally
abstract art delineated in the Manifesto should be called “concrete art”.
11 “I love the authority of black. It is a color that does not make compromises. A violent color,
but one that stimulates interiorization. At the same time a color and a non-color. When the light is
reflected on it, it is transformed, transmuted. It opens a mental field which is its own.”
Fig. 2.5 Painting by Pierre Soulages (2008). Bernard Jacobson Gallery (Printed with the author’s
permission)
Fig. 2.6 Studies by Theo van Doesburg (1919). From nature to composition
2.6 Cognition
In everyday language, abstraction is often described as the process of moving from
the concrete to “the abstract”. However, the name stands for a large variety of different
cognitive phenomena, so that it is difficult to come up with a unifying view.
In Cognitive Science the term “abstraction” occurs frequently; even though with
different meanings and in different contexts, it is mostly associated with two other
notions, namely, category formation and/or generalization. Barsalou and co-workers
have handled the subjects in several papers (see, for instance, [34]). In particular, a
direct investigation of the concept of abstraction led Barsalou to identify six different
meanings of the word [35]:
• Abstraction as categorical knowledge, meaning that knowledge of a specific cat-
egory has been abstracted out of experience (e.g., “Ice cream tastes good”).
• Abstraction as the behavioral ability to generalize across instances, namely the
ability to summarize behaviorally the properties of a category’s members (e.g.,
“Bats live in caves”).
• Abstraction as summary representation of category instances in long-term memory
(for instance, the generation of a template for a category).
• Abstraction as schematic representation, i.e., keeping critical properties of a cate-
gory’s members and discarding irrelevant ones, or distorting some others to obtain
an idealized or caricaturized description (e.g., generating a “line drawing” carica-
ture starting from a person’s picture).
• Abstraction as flexible representation, i.e., making a representation suitable to a
large variety of tasks (categorization, inference, …).
• Abstraction as an abstract concept, referring to the distance of a concept from the
tangible world (“chair” is less abstract than “truth”).
In connection with the above classification of abstraction types, Barsalou intro-
duces three properties of abstraction: Interpretation, Structured Representation, and
Dynamic Realization. Regarding interpretation, Barsalou agrees with Pylyshyn [435]
on the fact that cognitive representations are not recordings, but interpretations of
experience, a process based on abstraction: “Once a concept has been abstracted
from experience, its summary representation enables the subsequent interpretation
of later experiences.” Moreover, concepts are usually not interpreted in isolation,
but they are connected via relationships; then, abstractions assemble components of
experience into compound representations that interpret complex structures in the
world. Finally, abstraction offers dynamic realization, in the sense that it manifests
itself in a variety of ways, which makes it difficult to define univocally.
Similar to the notion of category is the one of concept. And, in fact, abstraction is
also viewed as the process of concept formation, i.e., the process aimed at identifying
the “essence” in the sensorial input [522].
An interesting discussion concerns the comparison between abstraction theories
in classical Artificial Intelligence (where Barsalou sees them based on predicate
calculus), and in connectionism. Barsalou identifies an abstraction as an attractor
for a statistical combination of properties; here the abstraction is represented by
the active units that characterize the attractor. The connectionist view of abstraction
suffers from the problem of concept complexity, as neural nets have difficulties in
representing structured scenarios.
a fast one, where children seem to use general rules about the nature of words and
lexical categories, and they become able to perform second-order generalization,
namely distinctions not between categories but between features allowing category
formation.13
The idea of an increasing children’s ability to handle abstraction agrees with
Piaget’s genetic epistemology [417], where he distinguishes empirical abstraction,
focusing on objects, and reflective abstraction, in which the mental concepts and
actions are the focus of abstraction. Young children primarily use empirical abstrac-
tion to organize the world, and then they increasingly use reflective abstraction to
organize mental concepts. The basis for Piaget’s notion of abstraction is the ability
to find structures, patterns or regularities in the world.
An interesting point is made by Halford, Wilson, and Phillips [231], who draw
attention to the role relational knowledge plays in the process of abstraction and in
analogy. In their view, the ability to deal with relations is the core of abstract
thinking, and this ability increases with the phylogenetic level, and also with age
in childhood. The reason is that the cognitive load imposed by processing relational
knowledge depends on the complexity of the relations themselves; actually, the num-
ber of arguments of a relation makes a good metric for conceptual complexity. In fact,
the cost of instantiating a relation is exponential in the number of arguments. These
observations, corroborated by experimental findings, led the authors to conclude that
associative processing is not noticeably capacity limited, but that there are, on the
contrary, severe capacity limitations on relational processing.
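A quick way to see the claimed exponential growth is to count the candidate instantiations of a k-ary relation over a domain of n objects, which number n^k. The fragment below (illustrative only, with an invented four-object domain) makes the count explicit:

```python
from itertools import product

# Candidate instantiations of a k-ary relation over n objects: n ** k,
# i.e., exponential in the arity k.
objects = ["a", "b", "c", "d"]  # n = 4
for k in (1, 2, 3, 4):
    print(k, len(list(product(objects, repeat=k))))
# 1 4 / 2 16 / 3 64 / 4 256
```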
According to Welling, abstraction is also a critical aspect of creativity [556].
He claims that the “abstraction operation, which has often been neglected in the
literature, constitutes a core operation for many instances of higher creativity”.
On a very basic level, abstraction can be uncovered in the principles of perceptual
organization, such as grouping and closure. In fact “it is a challenging hypothesis
that these perceptual organizations may have formed the neurological matrix for
abstraction in higher cognitive functions”. Abstract representation is a prerequisite
for several cognitive operations such as symbolization, classification, generalization
and pattern recognition.
An intriguing process, in which abstraction is likely to play a fundamental role, is
fast categorization of animals in natural scenes [132, 158, 211]. It has been observed
that humans and non-human primates are able to classify a picture as containing a
living being (or some similar task) after an exposure to the picture of only 30 ms, and
with a time constraint of at most 1 s (the median is actually 400 ms) for manifesting
recognition. The speed at which humans and monkeys can perform the task (answers
may be reached within 250 ms, with a minimum of 100 ms [211]) is puzzling, because
it suggests that the visual analysis of the pictures must occur in a single feed-forward
wave. One explanation is that recognition happens on the basis of a dictionary of
generic features, but how these features are represented and combined in the visual
system is not clear. We have here a typical case of abstraction, where the important
discriminant features are selected and used to achieve quick decisions. The specific
features involved may have been learned during the evolution of the species, as
recognizing a living being (typically, a predator or a prey) may be crucial for survival.
It is interesting to note that color (which requires a rather long analysis) does
not play a significant role in the recognition, as the same recognition accuracy is
reached with gray-scale images. The fact that color does not play an essential part
suggests that the sensory computations necessary to perform the task rely on the first
visual information available for processing. In fact, color information travels along
a relatively slow visual pathway (the parvocellular system), and the decision might
be taken even before it gains access to mental representations.
13 For instance they learn that solid things are named by their shapes (e.g., a glass “cube”), and
According to recent findings [132], recognition might exploit both global aspects
of the target and some intermediate diagnostic features. An important one is the
size of the animal’s body in the picture; in fact, humans are quite familiar with the
processing of natural photographs, so that they may have an implicit bias about
the scale of an animal target within a natural scene. However this does not seem to
be true for monkeys.
A hypothesis about the nature of the processing was investigated very recently by
Girard and Koenig-Robert [211]. They argue that fast categorization could rely on
the quantity of relevant information contained in the low spatial frequencies, because
the latter allow a quick hypothesis about the content of the image to be built
up. It would be very interesting to come up with a theory of abstraction capable of
explaining (or, at least, describing) such a challenging phenomenon.
Another curious cognitive phenomenon, in which abstraction plays a crucial role,
is “change blindness” [327, 452, 491, 492], first mentioned by the psychologist
W. James in his book The Principles of Psychology [274]. This phenomenon arises
when some distracting element prevents an observer from noticing even big changes
occurring in a scene he/she is looking at. Change blindness occurs both in the
laboratory and in real-world situations, when changes are unexpected. It is a symptom
of a large abstraction, performed on a scene, which has the effect of discarding a
large portion of the perceptual visual input, deemed to be inessential to one’s current
goal. For example, in an experiment a video shows some kids playing with a ball;
asked to count how many times the ball bounces, all observers failed to see a man
who traverses the scene holding an open umbrella.14 Clearly, abstraction is strongly
connected to attention, on the one hand, and to the goal, on the other.
Recent studies on the phenomenon include neurophysiological approaches
[11, 85], investigation of social effects (changes between images are noticed more
easily when individuals work in teams than when they work individually) [530], and level of exper-
tise of the observer (experts are less prone to change blindness, because they can reach
a deeper level in analyzing a problem than a novice) [161].
A field where the development of computational models of abstraction could be
very beneficial is spatial cognition. According to Hartley and Burgess, “the term
spatial cognition covers processes controlling behaviors that must be directed at […]”
14 psych.ubc.ca/~rensink/flicker/download/.
The notion of granularity has been addressed also by Euzenat [154–156] in the
context of object representation in relational systems. He defined some operators for
changing granularity, subject to suitable conditions, and used this concept to define
approximate representations, particularly in the time and space domains.
A very interesting link between abstraction and the brain’s functioning is provided
by Zeki [580–582], who gives to the first part of his book, Splendors and Miseries
of the Brain, the title “Abstraction and the Brain”. Zeki suggests that behind the
large variety of functions performed by the cells in the brain on inputs of differ-
ent modalities there is a unifying functionality, which is the ability to abstract. By
abstraction Zeki means “the emphasis on the general property at the expense of the
particular”. As an example, a cell endowed with orientation selectivity responds to
a visual stimulus along a given direction, for instance the vertical one. Then, the cell
will respond to any object vertically oriented, disregarding what the object actually
is. The cell has abstracted the property of verticality, without being concerned with
the particulars. The ability to abstract is not limited to the cells in the visual system,
but extends to all sensory areas of the brain, as well as to higher cognitive properties
and judgmental levels.
According to Zeki [582], the brain performs another type of abstraction, which
is the basis for perceptual constancy. Perceptual constancy allows an object to
be recognized under various points of view, luminance levels, distances, and so on.
Without this constancy, the recognition of objects would be an almost impossible task.
An excellent example is color constancy: even though the amounts of red, green, and
blue of a given surface change under different illuminations, our brain attributes the
same color to the surface. Abstraction, in this context, is then the capability of the brain
to capture the essence of an object, independently of the contextual conditions of
the observation. As a conclusion, Zeki claims that “a ubiquitous function of the
cerebral cortex, one in which many if not all of its areas are involved, is that of
abstraction” [582].
2.7 Vision
Vision is perhaps the field where abstraction is most fundamental and ubiquitous,
both in human perception and in artificial image processing. Without the ability to
abstract, we could not make sense of the enormous number of pixels continuously
arriving at our retina. It is abstraction that allows us to group pixels into objects, to
discard irrelevant details, to visually organize in a meaningful way the world around
us. Then, abstraction necessarily enters into any account of vision, either explicitly
or implicitly. In the following we will just mention those works that make more or
less explicit reference to some kind of abstraction.
One of the fundamental approaches to vision, strictly related to abstraction, is the
Gestalt theory [558]. “Gestalt” is a German word that roughly means “form”, and
the Gestalt Psychology investigates how visual perception is organized, particularly
concerning the part-whole relationship. Gestalt theorists state that the “whole” is
Fig. 2.7 a A case of clear separation between foreground and background. b A case of ambiguous
background in Sky and Water II, Escher, 1938 (Permission to publish granted by The M.C. Escher
Company, Baarn, The Netherlands)
greater than the sum of its parts, i.e., the “whole” carries a greater meaning than
its individual components. In viewing the “whole”, a cognitive process takes place
which consists of a leap from comprehending the parts to realizing the “whole”.
Abstraction is exactly the process by which elements are grouped together to
form meaningful units, reducing thus the complexity of the perceived environment.
According to Simmons [489], parts are grouped together according to function as
well; in this way the functional salience of parts [538] determines the granularity
level from the functional point of view, which often, but not always, coincides with
the level suggested by the perceptual one (gestalt).
The Gestalt theory proposes six grouping principles, which appear to underlie the
cognitive organization of the visual input. More precisely:
• Foreground/Background—Visual processing has the tendency to separate figures
from the background, on the basis of some feature (color, texture, …). In complex
images several figures can become foreground in turn. In some cases, the relation
fore/background is stable, whereas in others the mind oscillates between alternative
states (see Fig. 2.7).
• Similarity—Things that share visual characteristics (shape, size, color, texture, …)
will be seen as belonging together, as in Fig. 2.8a. The same happens for elements
that show a repetition pattern. Repetition is perceived as a rhythm, producing a
pleasing effect, as in Fig. 2.8b.
• Proximity—Objects that are close to one another appear to form a unit, even
if their shapes or sizes radically differ. This principle also concerns the effect
generated when a collection of elements becomes more meaningful than their
separate presence. Examples can be found in Fig. 2.9.
Fig. 2.8 a The set of circles in the middle of the array is perceived as a unit even though the
surrounding squares have the same color and size. b A pleasant repeated arrangement of bicycles
in Paris
Fig. 2.9 a The set of squares is perceived as two separate entities (left and right), even though the
squares are all identical. b A ground covered by leaves, where the individual leaves do not matter
singularly, but only their ensemble is perceived
One of the first and most influential works, with very strict links to abstraction, is
Marr’s proposal of vision as a process going through a series of representation
stages [352, 353]. Particularly relevant for our purposes is the sketchy 3-D repre-
sentation by means of a series of “generalized cones”, as illustrated in Fig. 2.13.
The successive stages of a scene representation, from the primal sketch to the 3-D
description, can be considered as a series of levels of abstraction. Another fundamen-
tal contribution to the modeling of human vision was provided by Biederman [59],
who introduced the idea that object recognition may occur via segmentation into
regions of deep concavity and the spatial arrangement of the latter. Components can
be represented by means of a small set of geons, i.e., generalized cones detectable
in the image through their curvature, collinearity, symmetry, parallelism, and co-
termination. As the geons are free to combine with one another, a large variety of
objects can be represented. A Principle of Componential Recovery asserts that the
identification of two or three geons in an object representation allows the whole object
to be recovered, even in the presence of occlusion, rotation, and severe degradation.
Fig. 2.12 The symmetry of Notre Dame de Paris appeals to our sense of beauty
Fig. 2.13 Organization of shape information in a 3-D model description of an object based on
generalized cone parts. Each box corresponds to a 3-D model, with its model axis on the left side
of the box and the arrangement of its component axes on the right. In addition, some component
axes have 3-D models associated with them, as indicated by the way the boxes overlap (Reprinted
from Marr [353])
Fig. 2.14 The abstraction technique combines structural information (left) with feature information
(right) (Reprinted with permission from de Goes et al. [125])
Fig. 2.16 Example of a picture (left) and its rendering with lines and color regions (right) (Reprinted
with permission from DeCarlo and Santella [129])
observer to easily extract the core meaning of a picture, leaving aside details. A
human user interacts with the system, and simply looks at an image for a short
period of time, in order to identify meaningful content of the image itself. Then, a
perceptual model translates the data gathered from an eye-tracker into predictions
about which elements of the image representation carry important information.
In order to cope with the increased resolution of modern cameras, image
processing requires a large amount of memory to store the original pictures. Hence,
different techniques of image compression are routinely used. Image compression can
be lossy or lossless. Lossy compression methods exploit, among other approaches,
color reduction, Fourier (or other) transforms, or fractals. They throw away part of
the content of an image to accomplish a trade-off between memory requirements and
fidelity. For instance, in natural images the loss of some details can go unnoticed,
but allows a large economy in memorization space. Lossy compression can be seen
as an abstraction process, which (irreversibly) reduces the information content of
an image.
When reduction in information is not acceptable, a lossless compression is suit-
able. There are many methods that can be used, including run-length encoding, chain
codes, deflation, predictive coding, or the well-known Lempel-Ziv-Welch algorithm.
Lossless compression is a process of image transformation, because the content of
the image is preserved, while its representation is made more efficient.
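As an illustration of the lossless case, here is a minimal sketch of run-length encoding, one of the methods mentioned above; the data are invented, and real codecs are of course far more elaborate:

```python
def rle_encode(data):
    """Run-length encoding: a simple, lossless compressor for pixel rows."""
    runs, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((data[i], j - i))  # (value, run length)
        i = j
    return runs

def rle_decode(runs):
    return [value for value, length in runs for _ in range(length)]

row = [0, 0, 0, 255, 255, 0]               # a row of pixel intensities
assert rle_decode(rle_encode(row)) == row  # lossless: content is preserved
print(rle_encode(row))                     # [(0, 3), (255, 2), (0, 1)]
```

The representation becomes more compact for long uniform runs, while the round trip through decode recovers the image exactly, matching the characterization of lossless compression as a pure transformation of the representation.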
A technique related to abstraction, which is widely used in graphics, is the Level of
Detail (LOD) approach, described by Luebke et al. [348]. In building up graphic sys-
tems, there is always a conflict between speed and fluidity of rendering, and realism
and richness of representation. The field of LOD is an area of interactive computer
graphics that tries to bridge the gap between performance and complexity by accu-
rately selecting the precision with which to represent the world. Notwithstanding
the great increase in the power of the machines devoted to computer graphics,
the problem is still current, because the complexity of the needed models has
increased even faster.
The idea underlying LOD, illustrated in Fig. 2.17, is extremely simple: in render-
ing, objects that are far away, small, or less important are given much less detail than
closer or more important ones. Concretely, several versions of the same object are
created, each one faster to render and with fewer details than the preceding one. When composing
a scenario, for each object the most suitable LOD is selected.
The creation of the various versions starts from the most detailed representation
of an object, the one with the greatest number of polygons. Then, an abstraction
mechanism reduces progressively this number, trying to keep as much resemblance as
possible with the original one. In recent years several algorithms have been described
to automate this simplification process, which in the past was performed manually.
As the generated scenes are to be seen by humans, an important issue is to investigate
what principles of visual perception may suggest the most effective simplification
strategies.
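A minimal sketch of the selection step might look as follows (the version thresholds and polygon counts are invented for illustration; real engines use screen-space error metrics and smooth transitions between levels):

```python
# A hypothetical object with pre-built versions of decreasing polygon count.
lod_versions = [
    {"max_distance": 10.0,  "polygons": 10000},      # close: full detail
    {"max_distance": 50.0,  "polygons": 1500},
    {"max_distance": 200.0, "polygons": 200},
    {"max_distance": float("inf"), "polygons": 20},  # far: crude silhouette
]

def select_lod(distance, versions):
    """Pick the coarsest version still adequate for the viewing distance."""
    for version in versions:
        if distance <= version["max_distance"]:
            return version

print(select_lod(3.0, lod_versions)["polygons"])    # 10000
print(select_lod(120.0, lod_versions)["polygons"])  # 200
```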
An approach inspired by the LOD has been described by Navarro et al. [394]
to model and simulate very large multi-agent systems. In this case the trade-off is
between the amount of details that must be incorporated into each agent’s behav-
ior and the computational power available to run the simulation. Instead of a pri-
ori choosing a given level of detail for the system, the authors propose a dynamic
approach, where the level of detail is a parameter that can be adjusted dynamically
and automatically during the simulation, taking into account the current focus and/or
special events.
2.8 Summary
Chapter 3
Abstraction in Artificial Intelligence
type of mapping. However, a given mapping is not necessarily an abstraction: some
additional constraints are needed for it to qualify as one. Usually,
the constraints require that the solution of the problem at hand, in the abstract space,
be “easier”, in some sense, than the solution in the ground one.
As already mentioned in Chap. 1, it has been well known since the beginning of AI
[14, 371, 462, 490] (and even before [424]) that a “good” representation is key to
solving problems successfully. From another perspective, AI theories of abstraction
can be embedded in the framework of representation changes.
Representation changes considered in abstraction theories broadly fall into one
of four categories:
• perceptive/ontological (mapping between sensory signals/objects)
• syntactic (mapping between predicates, namely words of formal languages)
• semantic (mapping between semantic interpretations in logical languages)
• axiomatic (mapping between logical theories)
Historically, the first explicit theory of abstraction started at the axiomatic level.
Plaisted [419] provided a foundation of theorem proving with abstraction, which he
sees as a mapping from a set of clauses to another one that satisfies some properties
related to the deduction mechanism. Plaisted introduced more than one abstraction,
including a mapping between literals and a semantic mapping. A more detailed
description of his work will be given in Chap. 4. Later on Tenenberg [526] pointed
out some limitations in Plaisted’s work, and defined abstraction at a syntactic level
as a mapping between predicates, which preserves logical consistency. Giunchiglia
and Walsh [214] have extended Plaisted’s approach and reviewed most of the work
done at the time in reasoning with abstraction. They informally define abstraction as
a mapping, at both axiomatic and syntactic level, which preserves certain desirable
properties, and leads to a simpler representation. Recently, Kinoshita and Nishizawa
have provided an algebraic semantics of predicate abstraction in the Pointer Manip-
ulation Language [288].
Nayak and Levy [395] have proposed a theory of abstraction defined as a mapping
at the semantic level. Their theory defines abstraction as a model level mapping
rather than predicate mapping, i.e., abstraction is defined at the level of formula
interpretation. More recently, De Saeger and Shimojima [464] have proposed a theory
of abstraction based on the notion of channeling. This theory considers abstractions
as theories themselves, allowing the nature of the mapping at the different levels to
be defined formally (axiomatic, syntactic, and semantic).
For abstractions at the ontological level we can mention the seminal works by
Hobbs [252] and Imielinski [269]. Hobbs’ approach aims at generating, out of an
initial theory, a computationally more tractable one, by focusing on the granularity
of objects or observations. Similarly, Imielinski proposed an approximate reasoning
framework for abstraction, by defining an indistinguishability relation among objects
of a domain. While all the aforementioned models rely on symbolic representation,
Saitta and Zucker [468] proposed a theory which adds the possibility of explicitly
defining abstraction at the observation (perception) level. The associated KRA model
will be described in Chap. 6.
In the following we will describe the basic aspects of the theoretical approaches
to abstraction mentioned so far, with the aim of providing the reader with an intuition
of the ideas behind them. The formal treatment of some of the models will be presented
in Chap. 4.
As Logic has been the formalism most used to represent knowledge since the start
of AI research, it is not surprising that the first models of abstraction were defined
within some logical formalism. Plaisted considered clauses in First Order Predi-
cate Logic (FOL) as a knowledge representation formalism [419]. His definition
of abstraction consists in a mapping f from a clause A to a “simpler” set of clauses
B = f (A). The mapping is such that B has always a solution if A does, but not
vice-versa. The idea is that a solution for B may act as a guide for finding a solution
for A with a reduced effort. To be valid, the mapping must satisfy some properties
which are considered “desirable” from the resolution mechanism point of view.
Plaisted provided several examples of abstractions; for instance, the ground
abstraction associates to a clause the set of all its ground instances (which may
be infinite), whereas the deleting argument abstraction reduces the arity of a clause.
Among other cases, Plaisted also considers predicate mapping, where two or more
predicates are mapped to a single one.
Example 3.1 (Predicate mapping). Let us consider a simple problem represented in
FOL, namely making plane ticket reservations for a family. Let us suppose that there
are two predicates in the ground theory, namely Son and Daughter, which describe
the family relationship between a father John and two of his kids, Zoe and Theo. An
abstraction may consist in replacing the two predicates of the ground theory by a
single one in the abstract theory, for example, Kid. In the abstract theory John would
simply have two kids. Indeed, the abstract representation contains fewer predicates, and
supports faster processing when the distinction between girl and boy is not relevant,
as is the case for both place occupancy and ticket price.
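A predicate mapping of this kind is easy to emulate; the following sketch (a toy Python rendering of Example 3.1, not Plaisted’s formal machinery) rewrites ground facts through the mapping:

```python
# Ground facts about John's family (Example 3.1), as predicate tuples.
ground_facts = {("Son", "Theo", "John"), ("Daughter", "Zoe", "John")}

# The abstraction: a mapping between predicate symbols.
predicate_map = {"Son": "Kid", "Daughter": "Kid"}

def abstract(facts, mapping):
    """Rewrite each fact with its abstract predicate symbol."""
    return {(mapping.get(p, p), *args) for p, *args in facts}

print(abstract(ground_facts, predicate_map))
# {('Kid', 'Theo', 'John'), ('Kid', 'Zoe', 'John')}
```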
Even though the notion of predicate mapping is very intuitive and appealing,
Tenenberg [526] pinpointed a problem in it, which is related to the generation of
false evidence. In fact, reasoning with the corresponding abstract theory may lead
to inconsistencies. For instance, in Example 3.1, the presence of an axiom stating
that a Son is a Boy may lead to the conclusion, in the abstract theory, that a Kid is a Boy,
thus implying that Zoe is a Boy as well. Plaisted was aware of this problem, which he
called the “false proof” problem. Tenenberg tried to solve the problem by defining an
abstraction as a predicate mapping (or a clause mapping), in which only consistent
clauses are kept. Unfortunately, checking consistency is only semi-decidable.
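The false-proof phenomenon can be reproduced in the same toy setting; the sketch below (illustrative only) naively maps the axiom “a Son is a Boy” along with the facts and derives the false fact Boy(Zoe):

```python
# Abstract facts obtained from the mapping Son, Daughter -> Kid.
abstract_facts = {("Kid", "Theo", "John"), ("Kid", "Zoe", "John")}

# Naive syntactic image of the ground axiom "a Son is a Boy":
# Kid(x, y) -> Boy(x).
abstract_rule = ("Kid", "Boy")

derived = {(abstract_rule[1], args[0])
           for p, *args in abstract_facts if p == abstract_rule[0]}
print(derived)
# {('Boy', 'Theo'), ('Boy', 'Zoe')} -- Boy(Zoe) is false evidence

# Nayak and Levy's semantic construction, discussed later in this section,
# would instead build the weaker axiom Kid(x) -> Boy(x) OR Girl(x),
# which blocks this inference.
```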
About ten years after Plaisted’s seminal work, Giunchiglia and Walsh [214] have
proposed a more general theory of abstraction, which integrates both predicate and
clause mapping. According to these authors, the majority of the abstractions in prob-
lem solving and theorem proving may be represented as a mapping between formal
systems. They also note that most abstractions modify neither the axioms nor the infer-
ence rules, and are therefore in most cases a pure mapping between languages. Giunchiglia
and Walsh’s goals in introducing a theory of abstraction included understanding the
meaning of abstraction, investigating the formal properties of the operators for a
practical implementation, and suggesting ways to build abstractions.
A useful distinction among abstractions, introduced by Giunchiglia and Walsh, is
among Theorem-Decreasing (TD), Theorem-Increasing (TI), and Theorem-Constant
(TC) abstractions. In a TI-abstraction the abstract space has more theorems than the
ground one, while the opposite happens for a TD-abstraction. In TC-abstractions,
instead, ground and abstract spaces have exactly the same theorems. Giunchiglia
and Walsh have argued that the abstractions useful for the resolution of problems are
the TI-abstractions, because they preserve all existing theorems.
Example 3.2 (TI-Abstraction). Going back to Example 3.1, let us consider the book-
ing of a plane ticket for the family between Hanoi and Paris. If the axiom represent-
ing the constraint that the whole family is traveling on the same plane is dropped,
an online booking system could find more possible flights. Therefore, the abstrac-
tion consisting in dropping the axiom that the family should fly together is a TI-
abstraction.
Example 3.3 (TD-Abstraction). Let us consider now the axiom that states that two
cities A and B are connected by a flight if there is a city C that is directly connected
bidirectionally to A and B. In the ground space this axiom allows trips that have
several stops. A flight from Hanoi to Paris may then be booked even though there is
a stop in Saigon or Frankfurt. The abstraction consisting in removing such an axiom is
a TD-abstraction. Indeed, in the abstract space there will be far fewer possible flights,
as only direct flights will be considered.
Examples 3.2 and 3.3 show what might not be intuitive at first thought, i.e., that by
simplifying a representation from a syntactical point of view, one can either increase
or decrease the number of solutions (described in the theory as theorems).
The shortcoming of the syntactic theory of abstraction described above is that,
while it captures the final result of an abstraction, it does not explicitly
capture the underlying justifications or assumptions that lead to the abstraction, nor
the mechanism that generates the abstraction itself.
Nayak and Levy [395], extending Tenenberg’s work, have proposed a seman-
tic theory of abstraction to address the shortcomings of syntactic theories. Indeed,
they view abstraction as a two step process. The first step consists in abstracting
the “domain” of interpretation, and the second one in constructing a set of abstract
formulas that best capture the abstracted domain. This semantic theory yields abstrac-
tions that are weaker than the base theory, i.e., they are a proper subset of TD-
abstractions. Nayak and Levy introduce two important notions: Model Increasing
abstractions (MI), which are a proper subset of TD-abstractions, and Simplifying
Assumptions (SA), which allow abstractions to be evaluated according to the relia-
bility of the assumptions themselves.
Example 3.4 Going back to Example 3.1, a simplifying assumption could be that
for the task of finding an airplane route, the difference between daughter and son
is not relevant, because what counts is that they are kids. The sets of models of the
two predicates Son and Daughter would then be merged into a single set of models,
corresponding to a new predicate, namely Kid. As for the axioms stating that a “Son
is a Boy” and a “Daughter is a Girl”, their abstract counterparts would both be
constructed from the models of the ground predicates, and not mapped syntactically.
One possible outcome of this construction is an abstract axiom stating that a “Kid is
a Boy OR a Girl”. Such mapping of models does not introduce false evidence.
Abstractions of the ontological type deal with the objects in a domain, and aim
at reducing the number of different objects by declaring some equivalence rela-
tion among subsets of them. The best known approaches to this type of abstraction
are those by Hobbs [252] and Imielinski [269]. Hobbs introduces the concept of
granularity, linked to an indistinguishability relation. Two objects are said to be
indistinguishable (and hence treated as the same object) if they satisfy exactly the
same predicates in a set R of relevant ones, defined a priori on the basis of domain
knowledge. R partitions the objects into equivalence classes, each one represented by
a single symbol. Equivalence between objects is used to make FOL theories tractable.
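Hobbs’ construction is easy to emulate on a finite domain; the following sketch (with an invented domain and invented relevant predicates) partitions objects by the subset of predicates in R they satisfy:

```python
def indistinguishability_classes(objects, relevant_predicates):
    """Partition objects by the subset of relevant predicates they satisfy."""
    classes = {}
    for obj in objects:
        # The "signature" records which relevant predicates hold for obj.
        signature = tuple(p(obj) for p in relevant_predicates)
        classes.setdefault(signature, []).append(obj)
    return list(classes.values())

# Toy domain: integers; relevant predicates chosen a priori.
R = [lambda n: n % 2 == 0, lambda n: n > 10]
print(indistinguishability_classes(range(1, 15), R))
# [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10], [11, 13], [12, 14]]
```

Each resulting class can then be represented by a single symbol in the abstract theory, exactly as described above.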
Imielinski starts from the notion of error in numerical values, and tries to extend it
to FOL knowledge bases used to answer queries. One way of introducing an “error”
in a logical knowledge base is to apply Hobbs’ indistinguishability relation, making
objects in the domain collapse into equivalence classes. Reasoning with the abstract
knowledge base is called limited reasoning by Imielinski.
The syntactic theories presented before are good at characterizing and classifying
existing abstractions, but they fail to offer a constructive approach to the problem of
creating abstractions. Moreover, these approaches manipulate symbols that are not
related to real-world objects: they are not grounded. The semantic theory of abstraction
proposed by Nayak and Levy [395] provides the basis for a constructive approach,
but is substantially limited to one type of abstraction, namely, predicate mapping.
There are other approaches to abstraction, which address the grounding problem and
consider non logical representations, such as images or signals. These approaches are
at the perception level as opposed to more formal levels, where information is already
encoded in symbols. Such type of perception mapping is particularly important in
the signal analysis and media community.
Example 3.5 (Perceptual Abstraction). A very common example of perceptual
abstraction is changing the resolution of a digital screen. When the resolution is
lowered, some details, which were visible at the higher resolution, are no longer visible.
Another example of perceptual abstraction, easy to implement with simple image
processing tools, is to change a color image into a black and white one.
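Both operations can be realized, for instance, with the Pillow imaging library; in the sketch below the file names are placeholders:

```python
from PIL import Image  # requires the Pillow package

img = Image.open("scene.jpg")  # "scene.jpg" is a placeholder path

# Resolution abstraction: hide fine detail by downsampling.
low_res = img.resize((img.width // 4, img.height // 4))

# Color abstraction: discard chromatic information, keep luminance.
gray = img.convert("L")

low_res.save("scene_low.jpg")
gray.save("scene_gray.jpg")
```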
Although the theories mentioned above are all “theories of abstraction”, they are quite
different in their content and applicability, depending on the representation level at
which abstraction is considered. In Fig. 3.2 a summary of the types of approaches
proposed in the literature is presented, together with their associated seminal papers.
Notwithstanding the large use of abstraction in different tasks and domains, the-
oretical frameworks to model abstraction are only a few, and also rather old. More
general theories of representation change, such as Korf’s [297], are too general,
as they do not consider any particular representation language or formalism. As
such, they are not easily made applicable. One of the difficulties comes from the fact
that choosing a “good” abstraction is not easy, given that the very notion of “good”
is strongly task-dependent. It is then important to understand the reasons that justify
the choice of a particular abstraction, and the search for better ones. A comprehen-
sive theory of the principles underlying abstraction would be useful for a number of
reasons; in fact, from a practical point of view it may provide:
• the means for clearly understanding the different types of abstraction and their
computational cost,
• the semantic and computational justifications for using abstraction,
• the framework to support the transfer of techniques between different domains,
• suggestions to automatically construct useful abstractions.
Finally, we may also mention that, in AI, abstraction has often been associated with the
idea of problem reformulation, which goes back to Lowry [344]. Lowry is concerned
with design and implementation of algorithms; both specifications and algorithms
are viewed as theories, and reformulation is defined as a mapping between theories.
His system STRATA works in three steps: first, it removes superfluous distinctions
in the initial conceptualization of the problem supplied by the user. Then, it designs
an abstract algorithm, exploiting pre-defined algorithm schemas, which are seen
Fig. 3.3 In ABSTRIPS a complete plan is developed at each level of abstraction before descending
to a more detailed level. First, a plan that uses actions with the highest criticalities is found. Then
this plan is iteratively refined to reach one that satisfies all the less critical preconditions. Each
abstract plan is an esquisse of a final plan (Reprinted from Nilsson [402])
language is the basis of most languages for expressing automated planning inputs in
today’s solvers.
A STRIPS problem instance is composed of an initial state, a set of goal states, and
a set of actions. Each action description includes a set of preconditions (facts which
must be established before the action can be performed), and a set of postconditions
(facts that are added or removed after the action is performed). The search space can
be modeled as a graph, where nodes correspond to states, and arcs correspond to
actions. A plan is a path, that is, a sequence of states together with the arcs linking
them. ABSTRIPS first constructs an abstraction hierarchy for a given problem space,
and then uses it for hierarchical planning, as illustrated in Fig. 3.3. More precisely,
ABSTRIPS assigns criticalities (i.e., relative difficulties) to the preconditions of each
action. First, a plan that uses actions with the highest criticality is found. Then this
plan is iteratively refined to reach one that satisfies all the less critical preconditions.
The degree of criticality induces thus a hierarchy of abstract spaces. Each abstract
plan is an esquisse of a final plan.
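The criticality mechanism can be sketched as follows (the action, criticality values, and predicate names are invented; ABSTRIPS assigns criticalities automatically rather than taking them as given):

```python
# A STRIPS-like action with criticalities attached to its preconditions.
action = {
    "name": "move(box, roomA, roomB)",
    "preconditions": {"door_open(A,B)": 1,
                      "at(robot, roomA)": 2,
                      "holding(box)": 3},
    "add": {"at(box, roomB)"},
    "delete": {"at(box, roomA)"},
}

def abstract_action(action, threshold):
    """Keep only the preconditions at least as critical as `threshold`."""
    kept = {p for p, c in action["preconditions"].items() if c >= threshold}
    return {**action, "preconditions": kept}

# At level 3 the planner only worries about the most critical precondition;
# the resulting abstract plan is then refined at levels 2 and 1.
print(abstract_action(action, 3)["preconditions"])  # {'holding(box)'}
```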
Another historical system using a hierarchical approach has been PRODIGY,
in which abstraction has been integrated with Explanation-Based Learning (EBL)
[290].
Although abstraction and hierarchical planning seem central to controlling com-
plexity in real life, several authors argue that it has not been used as extensively as it
could have been. The reason might lie in the fact that there is still much work to be
done in order to better understand all the different ways of doing optimal abstraction
planning. As an example, even though ABSTRIPS was proposed in 1974, it is only
twenty years later that Knoblock did a thorough analysis of the algorithm [289]. As
a result it became clear that ABSTRIPS implicitly assumed that the low criticality
2 A constraint C is often called global when processing C as a whole gives better results than
processing any conjunction of constraints that is semantically equivalent to C [56].
More recently, the framework for abstraction in CSPs has been extended to Soft
Constraints [63]. Soft constraints, as opposed to hard constraints, are represented
as inequalities, and may correspond to preferences [458]. Although very flexible
and expressive, they are also very complex to handle. The authors have shown that
“processing the abstracted version of a soft constraint problem can help us in finding
good approximations of the optimal solutions, or also in obtaining information that
can make the subsequent search for the best solution easier”.
The semiring-based CSP framework proposed by Bistarelli et al. [63] has been
extended by Li and Ying [332], who propose an abstraction scheme for soft con-
straints that uses semiring homomorphism. To find optimal solutions of the concrete
problem, one works first on the abstract problem for finding its optimal solutions,
and then uses them to solve the concrete problem. In particular, the authors find
conditions under which optimal solutions are preserved under mapping.
A method for abstracting CSPs represented as graphs has been proposed by
Epstein and Li [152]. Through a local search, they find clusters of tightly connected
nodes, which are then abstracted and exploited by a global searcher.
An improvement of the scalability of CSPs has been obtained via reformulation.
Bayer et al. [44] describe four reformulation techniques that operate on the vari-
ous components of a CSP, by modifying one or more of them (i.e., query, variable
domains, constraints) and detecting symmetrical solutions to avoid generating them.
Reformulation for speeding up solving CSPs has also been proposed by Charnley
et al. [93].
A very interesting, but isolated, approach to CSPs is described by Schrag and
Miranker [479]. They start by considering the phase transition between solvability and
unsolvability existing in CSPs, and try to apply domain abstraction to circumvent
it. Domain abstraction is an efficient method for solving CSPs, which is sound and
incomplete with respect to unsatisfiability; hence, its application is useful only when
both the ground and the abstract problems are unsatisfiable. The authors have char-
acterized the effectiveness of domain abstraction, and found that this effectiveness
itself undergoes a phase transition, dropping suddenly when the loosening of con-
straints, generated by the abstraction, increases. Finally, they developed a series
of analytical approximations to predict the location of the phase transition of the
abstraction effectiveness.
some property. Examples of this type of abstraction in graphs are given by Saitta
et al. [466], Bauer et al. [43], Boneva et al. [67], Bulitko et al. [84], Harry and
Lindquist [234], whereas abstraction in hierarchical Hidden Markov Models has
been handled by Galassi et al. [187], Fine et al. [169], and Murphy and Paskin [387].
3.6 Summary
The brief survey of abstraction in AI gives an overview of the different concepts that
are frequently associated or used to define abstraction. Although this chapter does
not account for all the research carried out on abstraction in AI,3 it allows the main
concepts that are common to many studies to be identified.
The notion of “detail” is often associated with that of relevance for a class of tasks
[64, 320, 513]. Details which are hidden are indeed defined as being “less relevant”
to these tasks. In Machine Learning, for example, Blum and Langley [64] have given
several definitions of attribute relevance. Their definitions are related to measures
that quantify the information brought by an attribute with respect to other attributes,
or the class, or the sample distribution.
In practice, the choice among sets of alternative abstractions may be difficult,
given their large number and under-constrainedness, and the fact that abstraction must
preserve some additional “desirable property”. These “desirable” properties differ
according to the field where abstraction is used. In problem solving, for example, a
classical desired property is the “monotonicity” [289, 462]. This property states that
operator pre-conditions do not interfere once abstracted. Another useful property is
the “downward refinement property” [27], which states that no backtracking in a hierarchy
of abstract spaces is necessary to build the refined plan. In theorem proving, a desirable property
states that for each theorem in the ground representation there exists a corresponding
abstract one in the abstract representation (TI-abstraction). In Machine Learning,
a desirable property states that generalization order between generated hypotheses
should be preserved [208]. In Constraint Satisfaction Problems a desirable property
is that the set of variables that are abstracted into one have “interchangeable supports”
[182, 183]. A domain independent desirable property states that some order relation
is preserved [252].
Finally, the notion of simplicity is essential to characterize abstract representations.
All these notions will be useful to establish a definition of abstraction in the next
chapter.
3 For example abstraction in games [205, 486], or abstraction in networks [466, 472, 586], or
abstraction in Multiple Representation Modeling (MRM) [13, 41, 123, 192], or many others.
Chapter 4
Definitions of Abstraction
them, because it stresses one or a few aspects over all the others. In conclusion,
the important issues in abstraction can be summarized as follows:
1. Simplicity—There is a general agreement that abstraction should reduce the com-
plexity of tasks. Even though simplicity is not easy to define as well (see Chap. 10),
an intuitive notion may nevertheless help us in several contexts.
2. Relevance—Abstraction is largely supposed to capture relevant aspects of prob-
lems, objects, or perceptions. It is then a mechanism suitable to select those
features that are useful to solve a task.
3. Granularity—An entity can be described at different levels of detail; the fewer
details a description provides, the more abstract it is. By progressively reducing
the amount of detail kept in a description, a hierarchy of abstractions is obtained.
Details or features may be hidden, and hence removed from descriptions (selective
abstraction), or aggregated into larger units (constructive abstraction). Granularity
is also linked to the notion of scale.
4. Abstract/concrete status—Abstraction is connected with the idea of taking a dis-
tance from the sensory world. The dichotomy applies to ideas, concepts, or words,
which can be classified as either concrete or abstract. This issue is also related to
the nature of abstraction as a state or a process.
5. Naming—When a name is given to an entity, this name stands for the properties
and attributes characterizing the entity; in a sense, the name captures its essence.
Abstraction is also naming.
6. Reformulation—Abstraction can be achieved through a change of representation.
Even though reformulation is most often used for problem formalization, it can
also be applied to data. Representation changes may involve either description
languages, or the described content, or both.
7. Information content—Abstraction is related to the amount of information an
entity (object, event, …) provides.
When introducing a formal definition, we will analyze which among the above issues
are specifically targeted. In this chapter the review of the theoretical models proposed
for abstraction shall mostly follow a chronological order. An exception is the first
work by Giunchiglia and Walsh [214], which is described first. The reason is that
this work introduces some notions that are useful to classify and compare abstraction
models, and we will use them for this purpose.
Some foundations of abstraction have been set by Giunchiglia and Walsh [213, 214],
who tried to provide a unified framework for handling abstraction in reasoning, at
the same time defining their own theory. The authors’ central goal was to provide a
general environment for the use of abstraction in automated deduction. Giunchiglia
and Walsh start from the definition of a formal system:
Definition 4.1 (Formal system) A formal system is a pair Σ = (Θ, L), where Θ is
a set of well-formed formulas (wff) in the language L.
Abstraction is then defined as a mapping between formal systems, preserving some
desirable properties (specified later on).
Definition 4.2 (Abstraction) An abstraction f : Σ1 → Σ2 is a pair of formal
systems (Σ1 , Σ2 ), with languages L1 and L2 , respectively, and an effective, total
function fL : L1 → L2 . Σ1 is called the “ground” space and Σ2 the “abstract” one,
whereas fL is called the “mapping function”.
In a formal system the set of theorems of Σ, denoted by TH(Σ), is the minimal set of
well formed formulas, including the axioms, that is closed under the inference rules
(used to perform deduction). Being oriented to theorem proving, the authors choose
provability as the central notion, and classify abstraction mappings with respect to
this notion.
Definition 4.3 (T ∗ -Abstraction) An abstraction f : Σ1 → Σ2 is said to be:
1. Theorem Constant (TC) iff, for any wff α, α ∈ TH(Σ1 ) iff fL (α) ∈ TH(Σ2 ),
2. Theorem Decreasing (TD) iff, for any wff α, if fL (α) ∈ TH(Σ2 ) then α ∈
TH(Σ1 ),
3. Theorem Increasing (TI) iff, for any wff α, if α ∈ TH(Σ1 ) then fL (α) ∈ TH(Σ2 ).
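For finite toy theories the three conditions can be checked mechanically; the fragment below (with an invented three-formula language) is only meant to make the quantifiers in Definition 4.3 concrete:

```python
def is_TI(th_ground, th_abstract, f):
    """Every ground theorem maps to an abstract theorem."""
    return all(f(a) in th_abstract for a in th_ground)

def is_TD(th_ground, th_abstract, f):
    """Every wff whose image is an abstract theorem is a ground theorem."""
    return all(a in th_ground for a in WFF if f(a) in th_abstract)

def is_TC(th_ground, th_abstract, f):
    return is_TI(th_ground, th_abstract, f) and is_TD(th_ground, th_abstract, f)

# Toy "languages": wffs are strings; f maps both p and q onto r.
WFF = {"p", "q", "r"}
f = {"p": "r", "q": "r", "r": "r"}.get
print(is_TI({"p"}, {"r"}, f), is_TD({"p"}, {"r"}, f))  # True False
```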
A graphical representation of the various types of abstraction is reported in Fig. 4.1.
Fig. 4.1 Classification of abstraction mappings according to provability preservation. The set of
theorems TH(Σ2) can be either identical, or a proper subset, or a proper superset of the abstractions
of the theorem set TH(Σ1)
Giunchiglia and Walsh do not consider TC-abstractions any further, because they
are too strong, and hence not very useful in practice. Furthermore, Giunchiglia and
Fig. 4.2 The combinations of the Deductive/Abductive and Positive/Negative modes of using
abstraction
4.2 Abstraction in Philosophy
The first model that we consider (see Sect. 2.1) was proposed by Wright [569] and
Hale [230], following an idea of Frege, and is based on the notion of Abstraction
Principle.
Definition 4.8 (Abstraction Principle) Let f (x) be a function, defined on a variable
x ranging over items of a given sort. An Abstraction Principle is an equation of the form:
f (x) = f (y) iff E(x, y),
where E(x, y) is an equivalence relation over items of the sort. For instance, with the
function train(x),1 the principle reads:
train(x) = train(y) iff (x and y are carriages) and (x and y are connected).
In words, the above principle states that two carriages, connected together, share
(belongs to) the same train. By applying the above definition we may conclude that
the train is an abstract entity. Hale and Wright have proposed more sophisticated
accounts of Abstraction Principles (see also [168]), but it is still unclear whether
their new approaches are free from counterexamples.
1 The function train(x) is defined as train : CARS → TRAINS, and train(x) provides the identifier
of the train that contains car x.
2 See Sect. 2.1
of the system at hand, as it only focuses on some specific aspects; on the other,
LoAs are intended to capture exactly those aspects that are relevant to the current
goal. For instance, if we want to choose a wine for a special dish, we may define a
“tasting LoA”, including bouquet, sweetness, color, acidity; if, instead, we want to
buy a bottle of wine, a “purchasing LoA”, including maker, vintage, price, and so
on, is more useful. LoAs allow multiple views of a system, but they are not sufficient
to completely describe it; the additional notion of behavior is needed.
Definition 4.13 (Behavior) The behavior of a system, at a given LoA, consists of a
predicate Π , whose free variables are observables at that LoA. The substitutions of
values for the observables that make the predicate true are called the system behaviors.
A moderated LoA is a LoA together with a behavior.
When the observables of a LoA are defined, it is usually the case that not all the
combinations of possible values for the observables are realizable. The behavior
aims at capturing only those combinations that are actually possible.
LoAs are then linked to the notion of granularity in describing systems, and
Floridi takes a further step by allowing multiple LoAs. To this aim, the notion of
relation must be recalled.
Definition 4.14 (Relation) Given a set A and a set C, a relation R from A to C is a
subset of the Cartesian product A × C. The reverse of R is the set {(y, x)|(x, y) ∈ R},
where x ∈ A and y ∈ C.
A relation R from A to C translates any predicate p(x) on A to a predicate qR[p] on
C, such that qR[p](y) is true at just those y ∈ C that are the image through R of some
x ∈ A satisfying p, namely:

qR[p](y) ≡ ∃x ∈ A [R(x, y) ∧ p(x)]
In order to see more precisely the meaning of the introduced relation, let us define
the cover of R:
COV (R) = {(x, y)| x ∈ A, y ∈ C and R(x, y)}
Then:
COV (R−1 ) = {(y, x)| x ∈ A, y ∈ C and R(x, y)}
qR [p] is a predicate whose instances are in that subset of C that contains the images
of the points in COV (p(x)).
Example 4.1 Let A be the set of men and C the set of women. Let x ∈ A, y ∈ C,
and Son be the relation Son ⊆ A × C, linking a male person with his mother. Let
moreover p be the predicate Student. Then:

qSon[Student](y) ≡ ∃x ∈ A [Son(x, y) ∧ Student(x)]

We have that COV(qSon[Student](y)) = {mothers (a subset of C) whose sons are
students (a subset of A)}.
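The translation of a predicate through a relation is easy to operationalize when R and p are given extensionally. The following Python fragment is only an illustrative sketch (the individuals and the helper q are our own, hypothetical choices):

# Relation Son ⊆ A × C as (son, mother) pairs, and the predicate Student on A
son = {("Tom", "Mary"), ("Bob", "Jane"), ("Tim", "Jane")}
student = {"Tom"}

def q(p, R, y):
    # qR[p](y) holds at just those y that are images through R of some x satisfying p
    return any(x in p for (x, yy) in R if yy == y)

# The cover of qSon[Student]: mothers whose sons are students
print({y for (_, y) in son if q(student, son, y)})  # {'Mary'}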
Finally, the main notion of Floridi’s account of abstraction is provided by the
following definition:
Definition 4.15 (Gradient of abstraction) A Gradient of Abstraction (GoA) is a
finite set of moderated LoAs Li (0 ≤ i ≤ n), together with a family of relations
Ri,j ⊆ Li × Lj (0 ≤ i ≠ j < n), relating the observables of each pair (Li, Lj) of
distinct LoAs in such a way that:
1. relation Ri,j is the reverse of relation Rj,i for i ≠ j,
2. the behavior pj at Lj is at least as strong as the translated behavior.
The meaning of Definition 4.15 can be better understood by looking at Fig. 4.4. We
have two LoAs, namely Li, with observables {X1, · · · , Xn}, and Lj, with observables
{Y1, · · · , Ym}. Observable Xr takes values xr in Λr (1 ≤ r ≤ n), whereas observable
Ys takes values ys in Λs (1 ≤ s ≤ m). Given a relation between pairs of observables,
Ri,j(Xr, Ys) ⊆ Li × Lj, the first condition of Definition 4.15 simply says that
Rj,i(Ys, Xr) = R⁻¹i,j(Xr, Ys), i.e., the relation between Lj and Li is the reverse of the
one between Li and Lj. The second condition is a bit more complex. Let Πi(X1, · · · , Xn)
be a behavior of Li, and let Πj(Y1, · · · , Ym) be a behavior of Lj. Transforming Πi
with Ri,j yields the translated behavior qRi,j[Πi](Y1, · · · , Ym), which the second
condition requires to be implied by Πj.
Fig. 4.4 Correspondence between LoAs established by the relation Ri,j(Xr, Ys)

Fig. 4.5 Correspondence between behaviors established by the relation Ri,j(Xr, Ys)
The situation is described in Fig. 4.5, where we can see that, according to Giunchiglia
and Walsh’ classification, Floridi’s abstraction is theorem decreasing. In fact, each
true behavior in Lj (the “concrete” LoA), which implies Πi ’s transformed predicate
qRi,j [Πi ], has a corresponding behavior Πi which is true.
For the sake of illustration, let us introduce an example.
Example 4.2 Let Li = {X} and Lj = {X, Y } be two LoAs, where X and Y assume
values in R+ . The relation Ri,j is a relation of inclusion, namely Li ⊂ Lj . Let Πi be
a behavior of Li , whose cover is COV (Πi ) = DX = [c1 , c2 ]. Any instantiation of
Πi , let us say Πi (a), is transformed into a vertical line x = a in Lj , as described in
Fig. 4.6.
Then, the cover of the predicate qRi,j[Πi] is the vertical strip defined by
c1 ≤ x ≤ c2. For each behavior Πj inside the strip, the corresponding Πi is true.
Fig. 4.6 States of a LoA Li, consisting of a single observable X, can be represented by points on the
X axis, whereas states of a LoA Lj, consisting of the pair of observables X and Y, can be represented
by points on the (X, Y) plane. A state corresponding to a true behavior in Li is, for example, the
point a ∈ DX = [c1, c2]. As all values of Y are compatible with X = a, all points on the vertical
line x = a correspond to true behaviors of Li. As long as the behavior Πj has a cover included in
the strip c1 ≤ x ≤ c2, there is always a corresponding behavior in Li which is true. However, if
this is not the case, as for Π′j(X, Y), a corresponding true behavior on Li may not exist
If Πj does not imply qRi,j[Πi], as happens, for instance, for Π′j, there may not
exist an a′ such that Πi(a′) is true.
Two GoAs are equal iff they have the same moderated LoAs, and their families
of relations are equal. Condition (1) in Definition 4.15 states that the behavior mod-
erating each lower level LoA is consistent with the one of higher level LoAs. This
property links among each other the LoAs of the GoA. Definition 4.15 only asserts
that the LoAs are related, but it does not specify how. There are two special cases of
relatedness: “nestedness” and “disjointness”.
Definition 4.16 (Disjoint GoA) A GoA is called disjoint if and only if the Li ’s are
pairwise disjoint (i.e., taken two at a time, they have no observable in common), and
the relations are all empty.
A disjoint GoA is useful to describe a system under different, non-overlapping
points of view.
Definition 4.17 (Nested GoA) A GoA is called nested if and only if the only non-
empty relations are those between Li and Li+1 (0 ≤ i < n − 1), and, moreover, the
reverse of each Ri,i+1 is a surjective function3 from the observables of Li+1 to those
of Li.
3 We recall that a surjective function is a function whose image is equal to its codomain. In other
words, a function f with domain X and codomain Y is surjective if for every y ∈ Y there exists at
least one x ∈ X such that f (x) = y.
(λ⁽¹⁾red ≤ Wavelength ≤ λ⁽²⁾red) ∨ (Wavelength = yellow) ∨ (Wavelength = green)
The sequence consisting of the LoAs La and Lg forms a nested GoA. Informally, the
smaller, abstract space {red, yellow, green} is a projection of the larger, concrete
one. The relevant relation associates to each value c ∈ {red, yellow, green}
a band of wavelengths perceived as that color. Formally, R(Color, Wavelength) is
defined to hold if and only if each time the color is red the wavelength is in the
appropriate, corresponding interval:
Color = red ⟺ λ⁽¹⁾red ≤ Wavelength ≤ λ⁽²⁾red
they are not descriptive but operational: in other words, they define the operations
that can be done on T. From this point of view, the definition of an ADT is in line
with the cognitive approach of classifying/recognizing objects by their functions (see
Sect. 2.6).
The typical way of working with ADTs is encapsulation, meaning that (imple-
mentation) details are not lost, but hidden inside the higher level definition of the
type. A generic definition of an ADT is as follows:
type name
definition
scalar or structured type definition
operations
procedures and functions
end name
Liskov and Guttag’s templates, reported in Sect. 2.4, are instances of this one [333].
The definition of an ADT involves the abstraction issues of simplicity (no implemen-
tation details included in the definition), of relevance (only the defining aspects are
considered), of granularity (the type can be defined at different levels of detail), and
of naming (the name stands for all its properties).
Example 4.5 (ADT queue) An example of an ADT is provided by the type Q =
queue:
type queue
definition
Finite sequence q of elements of type T
operations
Init(q): Initialize the queue to be empty
Empty(q): Determine whether the queue is empty
Full(q): Determine whether the queue is full
Append(q, x): Add a new item at the end of the queue (if not full)
Head(q): Retrieve the first item in the queue (if not empty)
Remainder(q): Delete the first item in the queue (if not empty)
end queue
The type Q is a composite one, because it makes reference to the type T of its
elements. Using the ADT queue, a sequence of operations on a queue q can be
described without the need to specify any programming language.
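When a concrete implementation is eventually needed, the queue ADT can be realized, for instance, in Python. The class below is only a minimal sketch: the method names mirror the operations listed above, and the fixed capacity is an assumption we introduce to give Full(q) a meaning:

class Queue:
    """A bounded FIFO queue realizing the ADT operations of Example 4.5."""

    def __init__(self, capacity=10):   # Init(q); the capacity is our assumption
        self.items = []
        self.capacity = capacity

    def empty(self):                   # Empty(q)
        return len(self.items) == 0

    def full(self):                    # Full(q)
        return len(self.items) >= self.capacity

    def append(self, x):               # Append(q, x): add at the end (if not full)
        if not self.full():
            self.items.append(x)

    def head(self):                    # Head(q): retrieve the first item (if not empty)
        if not self.empty():
            return self.items[0]

    def remainder(self):               # Remainder(q): delete the first item (if not empty)
        if not self.empty():
            self.items.pop(0)

Any client manipulating queues only through these six operations is insulated from the implementation: replacing the internal list with, say, a circular buffer would leave the client code untouched, which is precisely the encapsulation discussed above.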
The view of abstraction in ADTs is shared by Floridi, in that both accounts of
abstraction move from the abstract to the concrete: first an ADT is defined, then
concrete implementations follow. The relation between the ADT and the
implementation is again a TD-abstraction.
4.4 Abstraction in Databases
The models proposed by Miles Smith and Smith [371, 372] for abstraction in
databases, even though quite old, are still fundamental, as mentioned in Sect. 2.4. In a
couple of papers they have defined two kinds of abstraction to be used in relational
databases: "aggregation" abstraction and "generalization" abstraction.
Aggregation Abstraction This type of abstraction transforms a relation among
several named objects into a higher-level named object. For example, a relation
between a person, a hotel, a room, and a date may be abstracted as the
aggregated object “reservation”. This transformation is realized through the
introduction of a new type of entities, i.e., aggregate, defined as follows (using
Hoare's structures [251]):
type name = aggregate[key]
A1 : Attribute1
A2 : Attribute2
...............
An : Attributen
end
Component objects of the type name appear in the aggregate as attributes, whereas
the content inside the squared parentheses denotes the key, i.e., the subset of attributes
that uniquely identify the aggregate. The type name defines a table scheme, whose
columns are the attributes A1 , · · · , An .
Example 4.6 (Miles Smith and Smith [371]) Let us define an aggregate reservat-
ion as follows:
type reservation = aggregate [Number]
# : [key] Number
P : Person
H : Hotel
nR : Number of rooms
SD : Starting date
D : Duration
end
The aggregate has a key, the attribute Number, which uniquely identifies the reservation.
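In a modern language the aggregate can be mimicked, for instance, by a record type. The sketch below is only illustrative (the field names and types are our own choices):

from dataclasses import dataclass
from datetime import date

@dataclass
class Reservation:
    number: int          # the key: uniquely identifies the aggregate
    person: str          # component objects appear as attributes
    hotel: str
    n_rooms: int
    starting_date: date
    duration: int        # length of the stay, in days

# The type defines a table scheme; the key indexes its rows
reservations = {}
r = Reservation(42, "A. Smith", "Hotel du Lac", 1, date(2013, 7, 1), 3)
reservations[r.number] = r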
Fig. 4.8 A hierarchy of generic objects including various kinds of vehicles. The root is at level 0.
(Derived with permission from Miles Smith and Smith [371])
var R : generic
sk1 : (R1,1 , . . . , R1,p1 )
...............
skm : (Rm,1 , . . . , Rm,pm )
of aggregate [key]
s1 : R1
...............
sn : Rn
end
For instance, the generic object vehicle of Fig. 4.8 can be defined as:

var vehicle : generic
MC : (Land, Air, Water)
PC : (Motorized, Man-powered, Wind-propelled)
of aggregate [ID]
ID : Identifier
M : Manufacturer
P : Price
W : Weight
MC : Medium category
PC : Propulsion category
end
As we can see, the generic object “vehicle” is defined as an aggregate of the entities
ID, M, P, W, MC, and PC. Then, two of the components of the aggregate, namely
MC and PC, have been selected to form generalizations. More precisely, they have
been chosen to create a double partition of the vehicles according to their “Medium
category”, with values {Land, Air, Water}, and according to their “Propulsion cat-
egory”, with values {Motorized, Man-powered, Wind-propelled}. In this way two
clusters have been created, and each instance of vehicle belongs to one of the
cluster elements. The other attributes, i.e., ID, M, P, W are common to all instances,
and are assigned to the generic object. Figure 4.9 shows the resulting tabular repre-
sentation.
In order to encompass both cases of aggregation abstraction and generalization
abstraction, the two conditions for well-definedness introduced before are substi-
tuted by the following five:
• Each R-individual4 must determine a unique Ri -individual.
• No two R-individuals determine the same set of Ri -individuals for all Ri in key.
• Each Ri,j -individual must also be an R-individual.
• Each R-individual classified as Ri,j must also be an Ri,j -individual.
• No Ri,j-individual is also an Ri,k-individual for j ≠ k.
Aggregation and generalization abstractions can only model simple systems, if
used in isolation, while their power greatly increases when used in combination.
Taking inspiration from the work just described, Goldstein and Storey [217] provide a
model for data abstraction which is a refinement of Miles Smith and Smith's one [371, 372].

Fig. 4.9 Relation corresponding to the generic object vehicle. This generic object has two
clusters of nodes as alternative partitions of its instances:

ID   M        P       W      C1 (Medium)   C2 (Propulsion)
v1   Mazda    65.4    10.5   Land          Motorized
v2   Schwin   3.5     0.1    Land          Man-powered
v3   Boeing   7,900   840    Air           Motorized
v4   Acqua    12.2    1.9    Water         Wind-propelled

They keep the aggregation and generalization (renamed inclusion) mechanisms
for abstraction, and add one more, i.e., association. The model specifies a number of
dimensions that have to be defined for each type of abstraction mechanism: Semantic
meaning, Property set, Roles, Transitivity, and Mapping.
Let us look more closely at the abstraction operations.
Inclusion Inclusion describes an is-a relation between a supertype (generic) and a
subtype (specific). The most important property is inheritance (anything true for
the generic is also true for the specific). Abstraction via inclusion is transitive.
There is a many-to-one mapping between specifics and generic. An example is
“Secretary is-a Employee”. Inclusion acts on classes of entities, i.e., the generic
types defined by Miles Smith and Smith.
Aggregation A relationship among objects is considered as a higher level (aggre-
gate) object. There are three kinds of aggregation: (1) An attribute can be an aggre-
gation of other attributes; (2) An entity can be an aggregation of entities and/or
attributes, (3) A relationship can be an aggregation of entities and attributes.
Aggregates have the property of partial inheritance from the components, and
may have emergent properties (properties that do not pertain to components but
only to the aggregate). Each component plays a peculiar role in the aggregate,
and it may be relevant (interesting, but optional), characteristic (required), or
identifying (defining the aggregate). Aggregation is transitive. An example is the
computer of Fig. 2.4.
Association A collection of members is considered as a higher-level (more
abstract) set, called an entity set type. Details of the member objects are sup-
pressed and properties of the object set emphasized. For association there is no
inheritance property, but there are derived properties. Members of an associa-
tion are not required to have different roles, and the mapping between members
and the set entity type is unrestricted. Association is transitive. An example is a
“forest” with respect to its component trees.
The approaches to abstraction proposed by Miles Smith and Smith and by
Goldstein and Storey can be labelled as semantic. Proposed in the mainstream
of Computer Science, they address the issues of feature selection
(generalization/inclusion), feature construction (aggregation), and hierarchical
representation at different levels of detail. Marginally, they also address the issue
of naming (ADT definition).
More recently, Cross [118] defined, in the context of object-oriented methods, some
dimensions along which abstraction mechanisms can be considered. Indeed, these
methods provide support for important abstraction principles, as illustrated in the
following.
Classification/Instantiation In classification, things which are similar are
grouped together. The properties that make the things alike are abstracted out
In case the group is created by a user according to his/her opinions about mem-
bership, the slot Predicate in the above definition is absent.
Generalization/Specialization Generalization is the process of creating a type
that has properties that are common to several other more specific types. The
generalized type, referred to as the supertype, defines these common properties.
Each one of the more specific types, referred to as a subtype, contains those
properties that are essential for its definition as a specialization of the super-
type. Specialization is the reverse of generalization, i.e., it creates a subtype with
more specific properties than the supertype. An important characterization of the
generalization/specialization principle is that it supports a hierarchical structur-
ing mechanism for conceptual specialization. Generalization abstraction can be
defined as follows:
Interface name
Extent: instances-of-name
A1 : attribute1
..............
An : attributen
Relationship Set : {R1 , . . . , Rm }
end
Here is an example:
Interface professor: employee
Extent: professors
A1 : rank {full, associate, assistant}
A2 : research-keywords
Relationship Set : <student> supervises
inverse: student :: supervised-by
end
Aggregation/Decomposition Aggregation creates a unique entity starting from
its components, so that the grouped entities are linked to the type via a part-of
relation. The inverse relation is composed-of. The components may have different
types among themselves, and do not have the same type as the aggregate object.
The aggregation type may be hierarchical as well. For example, a car type can be
defined as an aggregation of other types such as wheel, body and engine. The
type’s body may be created as an aggregation of other types such as hood, door
and handle. The intensional definition of car would still include a classification
group, but it also includes the various types that are parts of a car.
Even though the operations of Classification and Generalization look quite sim-
ilar, there is nevertheless a substantial difference, in that Classification acts on
instances and forms a type, whereas Generalization acts on types and builds up
a super-type.
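In an object-oriented language these principles surface directly. The following Python fragment is only a schematic illustration (all class names are our own):

class Employee:                      # generic (supertype)
    def __init__(self, name):
        self.name = name

class Professor(Employee):           # generalization/specialization: is-a Employee
    def __init__(self, name, rank):
        super().__init__(name)
        self.rank = rank             # property essential to the specialization

class Car:                           # aggregation: components linked via part-of
    def __init__(self, wheels, body, engine):
        self.wheels = wheels         # components need not share the aggregate's type
        self.body = body
        self.engine = engine

p = Professor("Ada", "full")         # classification/instantiation: p is an instance
assert isinstance(p, Employee)       # inheritance: what holds for Employee holds for p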
4.5 Granularity
The classical works on granularity, in the context of abstraction, go back to the late
80’s, with Hobbs’ [252] and Imielinski’s [269] approaches.
Hobbs is concerned with tractability of models of (parts of) the world. He assumes
that the world is described by a global, First Order Logic theory T0 , and his goal is
to extract from this theory a smaller, more computationally tractable, local one. Let
P0 be the set of predicates of T0 , and S0 their domain of interpretation. Let moreover
R be that subset of P0 which is relevant to the situation at hand. The idea behind
Hobbs’ approach is that elements of S0 can be partitioned into equivalence classes,
by defining an indistinguishability relation on S0:

(x ∼ y) iff |x − y| < ε    (4.7)

where ε is a small, positive number, which can, for example, quantify the precision of
measurement. If definition (4.7) has to be traced back to definition (4.3), some set of
relevant predicates must be identified. To this aim, definition (4.3) must be extended
by allowing partial predicates to be relevant as well; the new definition reads then:
(x ∼ y) iff ∀p ∈ R : (p(x) and p(y) are both defined) ⇒ (p(x) ≡ p(y)) (4.8)
In order to distinguish between x and y, we must find a relevant partial predicate that
is true for one and false for the other. Finally, a system can be described at different
granularity levels, each one articulated with the others.
Example 4.10 (Hobbs [252]) Let T be a body’s temperature in ◦ C, and let p(x, t) be
the relevant predicate “temperature x is around t”, with t a real number. By varying t
in R we obtain an infinite set of relevant predicates. Suppose furthermore that p(x, t)
is true for t − 3 ≤ x ≤ t + 3, false for x < t − 3 − ε and for x > t + 3 + ε, and undefined
otherwise (see Fig. 4.10).
Two temperatures x1 and x2 are distinguishable if there exists a t such that p(x1, t)
is true and p(x2, t) is false (or vice versa, owing to the symmetry between x1 and
x2). Both p(x1, t) and p(x2, t) must be defined, so that the intervals AB and CD in
Fig. 4.10 The relevant predicate, for a given t, is true if x lies on the segment BC, it is false if x
lies to the right of D or to the left of A, and it is undefined otherwise. In other words, in order to
be distinguishable two temperatures x1 and x2 must lie one in the "true" interval and the other in
one of the two "false" intervals. The remaining intervals AB and CD do not count, because the
predicate is undefined in them. Then, in order to distinguish x1 from x2, it must be |x2 − x1| > ε
Fig. 4.10 are irrelevant. As a consequence, one of the two, say x1, must be inside the
interval BC, whereas x2 must be either to the left of point A or to the right of point D.
In both cases it must be:

|x2 − x1| > ε

If |x2 − x1| ≤ ε, no t exists for which x1 and x2 can be distinguished.
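Hobbs' construction lends itself to a direct sketch. Below, relevant predicates are modeled as Python functions returning True, False, or None (for "undefined"); the predicate family and the finite sampling of t are our own illustrative choices:

EPS = 0.5  # width of the "undefined" buffer around the truth interval

def around(t):
    """Partial relevant predicate 'temperature x is around t' (cf. Example 4.10)."""
    def p(x):
        if t - 3 <= x <= t + 3:
            return True
        if x < t - 3 - EPS or x > t + 3 + EPS:
            return False
        return None  # undefined in the buffer zones
    return p

def indistinguishable(x, y, predicates):
    """Definition (4.8): x ~ y iff no predicate defined on both tells them apart."""
    for p in predicates:
        px, py = p(x), p(y)
        if px is not None and py is not None and px != py:
            return False
    return True

preds = [around(t / 10) for t in range(0, 400)]   # a finite sample of values of t
print(indistinguishable(36.6, 36.8, preds))  # True:  |x2 - x1| <= EPS
print(indistinguishable(36.0, 37.0, preds))  # False: |x2 - x1| >  EPS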
Imielinski's research [269] had the same motivations as Hobbs' with respect to
simplifying reasoning; in fact, he called his approach to abstraction "limited" reasoning
(i.e., weaker than First Order Logic). In his view, one of the problems in achieving
this simplification is that, differently from numerical computations, logical reasoning
lacks a proper notion of error, and hence that of approximation. Imielinski proposes
then a definition of error sufficiently general to cover both automated reasoning and
numerical computations.
Imielinski starts from a knowledge base containing some formulas of the type
M(X, v), denoting a measure v for the variable X. Then, taking into account the fact
that measurements may be affected by precision limits, he substitutes those formulas
with weakened ones, such as M(X, int), where int is an interval of possible values for v.
Given a property p of the original formulas (for instance, being true), some of the
substitutions preserve p and some do not.
Definition 4.18 (Error in a knowledge base) The error in a knowledge base is the
set of all formulas that do not preserve a given property p.
Example 4.11 (Imielinski [269]) Let us suppose that we measure the volume of a
body X in m3 with a maximum error of 1 m3 . The formula Vol(X, 124) may or may not
be exactly true in the real world, because it just tells us that, due to the measurement
error, the actual value of the volume will be in the interval [123, 125] m3 (property
p). Then, the approximate formula ϕ1 = ∃v[Vol(X, v ∈ [123, 125])] preserves p.
Actually, p is preserved by any formula ∃v[Vol(X, v ∈ [a, b])] with a ≤ 123 and
b ≥ 125. On the other hand, the formula ϕ2 = ∃v[Vol(X, v ∈ [123.9, 124.1])] does
not preserve p, because the true value could be, for instance, 123.2, and hence the
formula is part of the error.
Imielinski calls local the notion of error just introduced. He also defines a global
error, which results from the replacement of the whole knowledge base by the
“rounded up” one. The global error is simply the set of all formulas that are not
guaranteed to preserve the properties (usually the truth) of the original knowledge
base.
Imielinski’s notion of error is general, but not easy to apply to generic approxima-
tion schemes; then, he concentrates on the same type of abstraction as Hobbs, namely
domain abstraction. More precisely, he defines a knowledge base KB as a finite set
of formulas in some First Order Logic language, a query as an open formula ϕ, an
answer to the query as the set of all substitutions of domain constants for variables
in ϕ, such that the resulting closed formula is a logical consequence of KB. The
domain D of the knowledge base is the set of all objects occurring in the KB. On the
domain D an equivalence relation R (reflexive, symmetric, and transitive) is defined.
Relation R may represent the relevant features of the domain, or is supposed to hide
some features of the external world, or may correspond to the error in measurements.
The equivalence relation R induces a partition of the constant names into equiva-
lence classes, denoted by [a], where [a] can be considered the representative of the
class. If we substitute all constants in KB with their equivalence classes, a simplified
KB is obtained, from which approximate answers to queries can be derived.
Example 4.12 (Imielinski [269]) Let a knowledge base contain the following long
disjunction:
P(a, b) ∨ P(a1, b1) ∨ · · · ∨ P(an, bn)
If, for instance, connect(New York, San Francisco) is true, then the predicate will
be rewritten as connect(New York State, California). Suppose we want
to answer the query Q =“Find all direct or one-stop flights from New York to Seattle”.
If the original KB contained connect(New York, San Francisco) and con-
nect(Los Angeles, Seattle), the abstract one contains connect(New York
State, California) and connect(California, Washington). In the
“liberal” interpretation the trip (New York State, California, Washing-
ton) will be added (incorrectly) to ANS(Q), while in the conservative interpretation
it will not. On the other hand, if the original KB contained connect(New York,
San Francisco) and connect(San Francisco, Seattle), this trip will not
be added (even though correct) to ANS(Q), because it is known that each state con-
tains several cities, and the connection could potentially be incorrect.
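The city-to-state abstraction above can be sketched directly. In the fragment below the knowledge base is reduced to a set of connect facts, and the "liberal" interpretation answers the one-stop query on the abstract KB (the encoding and helper names are ours):

# Equivalence relation on constants: each city collapses into its state
state = {"New York": "New York State", "San Francisco": "California",
         "Los Angeles": "California", "Seattle": "Washington"}

kb = {("New York", "San Francisco"), ("Los Angeles", "Seattle")}

# Abstract KB: every constant is replaced by its equivalence class
abstract_kb = {(state[a], state[b]) for (a, b) in kb}

def one_stop_liberal(src, dst):
    """Answer the one-stop query on the abstract KB (liberal interpretation)."""
    return any((state[src], m) in abstract_kb and (m, state[dst]) in abstract_kb
               for m in set(state.values()))

# The liberal interpretation adds the trip, although no ground trip exists:
print(one_stop_liberal("New York", "Seattle"))  # True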
In addition to the abstraction operations reported in Sect. 4.4.3, Cross [118] shows
how the Fuzzy Set theory [576] can be used to implement them. Independently from
other definitions, fuzzy sets can be considered an abstraction per se, in at least two
respects. First of all, they allow a form of attribute value abstraction. As an
example, let us consider a numerical variable X, assuming values x in the interval
[0, ∞), and let “Big” be a fuzzy set whose membership function μ(x) is reported in
Fig. 4.11. When we say that “X is Big”, we abstract from the actual value taken on
by X, and we just retain the compatibility measure of X with the concept Big. For
X in the interval (10,100) we still can make some difference between values, as the
membership μ(x) differentiates the degree of compatibility of x with the meaning
of Big. For values of X either in the interval [0,10] or in the interval [100, ∞),
we consider all the values of X as equivalent with respect to the relevant predicate
Fig. 4.11 Fuzzy set “Big”, whose membership function μ(x) is defined over the real axis where
the variable X takes values
membership(X, μ(x)) ("the membership of X in the fuzzy set Big is μ(x)"). Then:

(x1 ∼ x2) iff μ(x1) = μ(x2) ∈ {0, 1}
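A minimal sketch of the membership function of Fig. 4.11 follows; the breakpoints 10 and 100 come from the text, whereas the linear ramp between them is our assumption, the exact shape of μ being given only graphically:

def mu_big(x):
    """Membership of x in the fuzzy set 'Big' (cf. Fig. 4.11)."""
    if x <= 10:
        return 0.0            # all such values are equivalent: definitely not Big
    if x >= 100:
        return 1.0            # all such values are equivalent: definitely Big
    return (x - 10) / 90.0    # graded compatibility in between (assumed linear)

print(mu_big(5), mu_big(55), mu_big(500))  # 0.0 0.5 1.0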
Another, even more interesting way of achieving abstraction with fuzzy sets is the
use of linguistic variables, introduced by Zadeh [577]. A linguistic variable is a
variable that takes as values words or sentences in a language. For example, Age is
a linguistic variable if its values are linguistic rather than numerical, i.e., young,
very young, old, …rather than 20, 21, 22, 80. Formally:
Definition 4.19 (Linguistic variable) A linguistic variable is a quintuple ⟨L, T(L),
U, G, M⟩, where L is the name of the variable, T(L) is the term-set of L, i.e., the
collection of its linguistic values, U is a universe of discourse, G is a syntactic rule
which generates the terms in T (L), and M is a semantic rule that associates with
each linguistic value x its meaning μ(x), which is a fuzzy subset of U.
As Zadeh puts it [577], “the concept of a linguistic variable provides a means of
approximate characterization of phenomena which are too complex or too ill-defined
to be amenable to description in conventional quantitative terms”. Using a linguistic
variable can be considered a special case of discretization.
A role similar to the fuzzy sets’ one, but more oriented to approximation, is played by
the Rough Set theory, introduced by Pawlak [414]. The idea is based on information
systems and a notion of indiscernability.
Definition 4.20 (Information system) An information system S is a 4-tuple (U, A,
Λ, f), where U is the universe, i.e., a finite set of N objects, A is a finite set of
attributes, Λ = ∪A∈A ΛA is the union of the domains of the attributes (ΛA is the
domain of A), and f : U × A → Λ is a total decision function such that f(x, A) ∈ ΛA
for every A ∈ A, x ∈ U.
In Definition 4.20 f is a function that associates to an object x a value f (x, A) of
attribute A. A subset X of U is called a concept. We can now introduce the indis-
cernability relation.
Definition 4.21 (Indiscernability) A subset of attributes Aind ⊆ A defines an
equivalence relation, called indiscernability relation, IND(Aind), on U², such that:

(x, y) ∈ IND(Aind) iff f(x, A) = f(y, A) for every A ∈ Aind
Indiscernability, in the rough set theory, originates from the fact that removing the
attribute subset A-Aind from A leaves some objects with the same description; hence,
for those objects, the function f assumes the same value for every attribute A ∈ Aind ,
making them indistinguishable.
Given an information system S and an indiscernability relation IND(Aind), let
AS = ⟨U, IND(Aind)⟩ be the approximation space in S.
Definition 4.22 (Upper/Lower bounds) Given an approximation space AS and a
concept X, the Aind-lower approximation LXAind and the Aind-upper approximation
UXAind of the set X can be defined as:

LXAind = {x ∈ U : [x]Aind ⊆ X}
UXAind = {x ∈ U : [x]Aind ∩ X ≠ ∅}
In Definition 4.22, [x]Aind denotes the equivalence class of x with respect to the
attribute set Aind . Finally, we can define a rough set.
Definition 4.23 (Rough set) A rough set R is the pair (LX Aind , UX Aind ).
A rough set is thus a pair of crisp sets, one representing a lower bound and the other
representing an upper bound of a concept X .
Example 4.14 Let U be the set of points P in a plane, and let A = {X, Y } be the
set of attributes, representing the coordinates X and Y in the plane. Let moreover
ΛX = ΛY = (−∞, +∞). Then, Λ = ΛX ∪ ΛY .
The function f : U × A → Λ will be:
f (P, X) = x ∈ ΛX
f (P, Y ) = y ∈ ΛY
Fig. 4.12 Upper (pink + yellow regions) and lower (yellow region) approximations of a concept
X = Oval, defined as a region in the 2D plane. [A color version of this figure is reported in Fig. H.4
of Appendix H]
Let us choose as Aind the whole A. Then we can define equivalence classes among
points as follows:

(P1 ∼ P2) iff ⌊x1/Δ⌋ = ⌊x2/Δ⌋ and ⌊y1/Δ⌋ = ⌊y2/Δ⌋

with a given Δ. The plane will be divided into squares of side Δ, such that all points
inside a square are considered equivalent, as represented in Fig. 4.12.
inside a square are considered equivalent, as represented in Fig. 4.12.
Given a concept X , defined extensionally as an oval region of the plane, the lower
approximation consists of all squares that are totally inside the oval, whereas the
upper approximation consists of all the squares that, at least partially, overlap the
oval.
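The construction of Example 4.14 can be sketched as follows: the squares of side Δ act as equivalence classes, and an oval concept is approximated from below and from above. The oval's parameters and the finite sampling of the plane are our own simplifications:

from math import floor

DELTA = 0.5  # side of the squares (the equivalence classes)

def square_of(p):
    """Equivalence class of a point: the grid square containing it."""
    x, y = p
    return (floor(x / DELTA), floor(y / DELTA))

def in_oval(p):
    """The concept X: an oval region of the plane (parameters are illustrative)."""
    x, y = p
    return (x / 3.0) ** 2 + (y / 2.0) ** 2 <= 1.0

# A finite sample of the universe, fine enough to probe every square
universe = [(i * 0.05, j * 0.05) for i in range(-80, 81) for j in range(-80, 81)]

classes = {}
for p in universe:
    classes.setdefault(square_of(p), []).append(p)

lower = {s for s, pts in classes.items() if all(in_oval(p) for p in pts)}
upper = {s for s, pts in classes.items() if any(in_oval(p) for p in pts)}

assert lower <= upper   # the rough set: a pair of crisp sets bounding X
print(len(lower), len(upper))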
Even though the rough set theory is based on a notion of indiscernability similar to
the one used by Hobbs [252] and Imielinski [269] for granularity, its use is different,
because it is not used per se, but as a first step to provide approximations of sets.
4.6 Syntactic Theories of Abstraction

Within the realm of logical representations, Plaisted [419] was the first to propose
a general theory of abstraction oriented to theorem proving, and, specifically, to
resolution.
Plaisted considered a First Order Logic language in clausal form (see Appendix
D), and defined a generic abstraction mapping as a mapping between a (ground)
clause C and a set f (C) of (abstract) clauses. The idea is to transform a problem A
into a simpler one B, such that B has certainly a solution if A does, but B may also
have additional solutions. According to Giunchiglia and Walsh’s classification [214]
this mapping is a TI-abstraction.
A cardinal notion in Plaisted’s approach is subsumption. Let x denote a vector
of variables {x1, x2, . . . , xn} and let A be a set of constants. A substitution θ is an
assignment xi = ai (1 ≤ i ≤ n), with ai ∈ A. Given a clause C(x), the notation Cθ
stands for C(a) = C(a1, a2, . . . , an).
Definition 4.24 (Subsumption) A clause C1 subsumes a clause C2 (denoted by
C1 ≤ C2) if there exists a substitution θ such that C1θ is a subset of C2.
Fig. 4.13 a Resolution in First Order Logic. b Abstract resolution after removing all variables from
clauses. The trees in (a) and (b) have the same structure
f : P1 → P2
into containers, because they certainly have some property distinguishing them
(otherwise they would be the same object). A weakening of the requirement is to
allow abstractions of the form bottle(x) ∨ glass(x) → container(x).
The intuitive notions discussed so far are then formalized by Tenenberg. First
of all, let Cf be the set of predicates (or clauses) that map into C under predicate
mapping f , i.e.:
Cf = {D|f (D) = C}
Definition 4.28 simply states that, among all the correspondences between predi-
cates from the ground to the abstract language, only those that preserve consistency
are kept. In fact, Tenenberg proves that g(Φ) preserves consistency across predicate
mapping. However, restricted predicate mappings are no longer abstraction mappings
according to Plaisted’s definition, because the upward-solution property is not pre-
served. On the contrary, restricted mappings do have a downward-solution property,
since for every clause derivable from the abstract theory there is a specialization
of it derivable from the original theory. Restricted mappings are TD-abstractions,
according to Giunchiglia and Walsh [214], because a solution may not exist in the
abstract theory, but, if it does, the original problem has a corresponding solution.
It has to be noted that g(Φ), as defined above, is undecidable, since it requires
determining whether Φ ⊢ D for every clause D mapping to each candidate clause in
the abstract clause set. In practice, the search for derivability can always be arbitrarily
bounded, and if no proof is obtained within this bound, it can be assumed that the
clause is not derivable. In this way, consistency is still preserved between the original
and the abstract theory, the abstract theory being simply weaker than it theoretically
could be (it has fewer theorems). Let us see an example of a restricted predicate
mapping.
Example 4.16 (Tenenberg [526]) Let a be a constant and Φ be the set of clauses
reported in Fig. 4.14. Let f be a predicate mapping associating each predicate in
Φ to itself, except for bottle(x) → glass-container(x) and glass(x) →
glass-container(x).
The abstracted clauses are the following ones:
1’) glass-container(x) ⇒ made-of-glass(x)
2’) glass-container(x) ⇒ graspable(x)
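Applying a predicate mapping to a clause set is mechanical, as the following sketch shows; clauses are encoded as (antecedent, consequent) pairs of predicate names, and the clause set is only a partial reconstruction of Fig. 4.14:

# Predicate mapping f of Example 4.16 (identity on unmapped predicates)
f = {"bottle": "glass-container", "glass": "glass-container"}

clauses = [("bottle", "made-of-glass"),   # 1) bottle(x) => made-of-glass(x)
           ("bottle", "graspable"),       # 2) bottle(x) => graspable(x)
           ("glass", "made-of-glass"),
           ("glass", "graspable")]

def abstract(clause):
    """Rewrite every predicate occurring in the clause through f."""
    return tuple(f.get(p, p) for p in clause)

abstract_clauses = {abstract(c) for c in clauses}  # duplicates collapse
for a, b in abstract_clauses:
    print(f"{a}(x) => {b}(x)")
# glass-container(x) => made-of-glass(x)
# glass-container(x) => graspable(x)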
tokens; tokens are equivalent if they are of the same types. A classification is type-
extensional if there are no two distinct equivalent types, and it is token-extensional
if there are no two distinct equivalent tokens.
A classification can be seen as a table in a very simple database, where only two
columns are available: tokens and types. However, unlike a row in a relational database,
channel theory treats each token as a first-class object,6 and hence each token is the
key of itself. By treating tokens as first-class objects, relationships can be modeled
using an infomorphism.
Definition 4.30 (Infomorphism) Given two classifications A = ⟨typ(A),
tok(A), ⊨A⟩ and B = ⟨typ(B), tok(B), ⊨B⟩, an infomorphism f : A ⇄ B
from A to B is a pair of functions (f∧, f∨), such that f∧ : typ(A) → typ(B)
and f∨ : tok(B) → tok(A), satisfying the following property:

For every type α in A and every token b in B: b ⊨B f∧(α) iff f∨(b) ⊨A α
An infomorphism formalizes the correspondence in the information structure of two
classifications; it states that the regularities in a domain, captured by classifications,
are compatible. More precisely, an infomorphism is intended to model transfer of
information from one view of a system to another one; for instance knowing that “a
mountain’s side has a particular distribution of flora can carry information about the
local micro-climate” [480]. An infomorphism is more general than an isomorphism
between classifications. For example, an infomorphism between classifications A
and B might map two or more types in A onto a single type β in B, provided that,
from B's point of view, the two types are indistinguishable; this means that, for any
two such types α and α′ in A and all tokens b in B, f∨(b) ⊨A α if and only if
f∨(b) ⊨A α′. It must be noticed that two types indistinguishable for B may be
distinguishable in A. In fact, there may be tokens in A outside the range of f∨ for
which, for example, a ⊨A α but not a ⊨A α′.
Dually, two tokens of B may be mapped onto the same token in A, provided that
those tokens in B are indistinguishable with respect to the set of types β in B for
which there exists some α such that f ∧ (α) = β. Again, this does not mean that these
same tokens in B are wholly indistinguishable in B. For example, there may be types
outside the range of f ∧ classifying them differently. Thus, “an infomorphism may be
thought of as a kind of view or filter into the other classification” [40].
In practice, it may be difficult to find infomorphisms between arbitrary classifica-
tions. If the correspondence is too easy, then the morphism would not be interesting.
If it is too stringent, it would not be applicable (Fig. 4.15).
An example of infomorphism can be taken from chess.
Example 4.17 (Seligman [40]) Consider a game of chess, observed and analysed by
a group of experts. The observations can be represented by an event classification
G in which the actual moves of the game are classified by the experts into types of
varying precision: “White now has a one-pawn advantage”, “Black has control of
6 A first class item is one that has an identity independent of any other item.
Fig. 4.15 A graphical representation of an infomorphism (Adapted with permission from De Saeger
and Shimojima [464])
Fig. 4.16 A channel connecting two infomorphisms
Fig. 4.17 A channel representing the telegraph (Reprinted with permission from Seligman [480])
π : Interpretations(Lb ) → Interpretations(La )
BMW(x) ⇒ EuropeanCar(x)
Let us first consider a (syntactic) predicate abstraction that associates the abstract
predicate ForeignCar(x) to both EuropeanCar(x) and JapaneseCar(x).
Then, Ta will be:
ForeignCar(x) ⇒ Car(x)
Toyota(x) ⇒ ForeignCar(x)
BMW(x) ⇒ ForeignCar(x)
This abstraction considers the difference between a European and a Japanese car
irrelevant to the goal of the current reasoning. Let us now add to Tb the following
axioms:
EuropeanCar(x) ⇒ Fast(x)
JapaneseCar(x) ⇒ Reliable(x)
Applying the previous abstraction, we obtain:
ForeignCar(x) ⇒ Fast(x)
ForeignCar(x) ⇒ Reliable(x)
These last axioms, added to the previously obtained ones, may lead to the conclusion
that a Toyota is fast and that a BMW is reliable, even though these conclusions are
not warranted in the base theory.
Let us now consider how this example can be handled in Nayak and Levy’s theory.
As π preserves the universe of discourse, we have that π∀ = (v1 = v1 ), which is
satisfied by all elements. The extension of the predicate ForeignCar(x) is the
union of the extensions of the predicates JapaneseCar(x) and EuropeanCar(x).
Hence:
πForeignCar (v1 ) =JapaneseCar(v1 ) ∨ EuropeanCar(v1 )
The extension of the other predicates (except JapaneseCar and EuropeanCar,
which are not in the abstract theory) is unchanged.
Another interesting point raised by Nayak and Levy is that TI-abstractions, which
admit false proofs, are better viewed as MI-abstractions in conjunction with a set
of simplifying assumptions; false proofs emerge when the assumptions are violated.
We may see how this happens in the following example.
Example 4.19 Let us consider Imielinski’s domain abstraction [269], and let the
base theory contain the axioms {p(a, b), ¬p(c, d)}. If we assume that a and c
are equivalent, and that b and d are equivalent, then the abstract theory becomes
{p(a, b), ¬p(a, b)}, which is inconsistent. As the base theory is consistent, this
abstraction violates Theorem 4.2 and cannot be an MI-abstraction.
Suppose now that the equivalence relation is a congruence, i.e., for every n-ary
relation p and terms ti, t′i (1 ≤ i ≤ n) such that ti and t′i are equivalent, the base
theory entails p(t1, . . . , tn) ⇔ p(t′1, . . . , t′n). In this case domain abstraction is indeed
an MI-abstraction, and the simplifying assumption is that the "equivalence relation
is a congruence".
A few years after the publication of Giunchiglia and Walsh’ paper [214], Ghidini and
Giunchiglia [203] proposed a “model-theoretic formalization of abstraction, where
abstraction is modeled as two representations, the ground and the abstract ones,
modeling the same phenomenon at different levels of detail”.
In this revisited approach abstraction is simply defined as a mapping function
f : L0 → L1 between a ground language L0 and an abstract language L1 , where
a language is a set of wffs. The function f preserves the names of variables, and is
total, effective, and surjective, i.e.:
• for each symbol s ∈ L0, f(s) is defined,
• for each symbol s′ ∈ L1, there is a symbol s ∈ L0 such that s′ = f(s),
• if f(s) = s0 and f(s) = s1, then s0 = s1.
Moreover, f only maps atomic formulas in the languages, keeping the logical
structure unmodified; for this reason it is called an “atomic abstraction”.
Definition 4.38 (Atomic abstraction) The function f : L0 → L1 is an atomic
abstraction iff:

• f(α ◦ β) = f(α) ◦ f(β) for all binary connectives ◦,
• f(□α) = □f(α) for all unary connectives □,
• f(Qx.α) = Qx.f(α) for all quantifiers Q.
Atomic abstractions can be further classified as term abstractions, which operate
on term symbols, and formula abstractions, which operate on predicates, and map
ground formulas to abstract ones. A typical atomic abstraction is symbol abstraction,
which lets different ground constants (or function symbols, or predicates) collapse
into a single abstract one. Another one is arity abstraction, which reduces the number
of arguments in functions or predicates.
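Applied to a formula tree, an atomic abstraction only rewrites the leaves, leaving connectives and quantifiers untouched. A sketch, with formulas encoded as nested tuples and the symbol abstraction as a dictionary (the encoding is our own):

f = {"bottle": "container", "glass": "container"}  # a symbol abstraction

def abstract(formula):
    """Homomorphic application of f: only atoms are rewritten."""
    op = formula[0]
    if op == "atom":                      # ('atom', predicate, variable)
        _, pred, var = formula
        return ("atom", f.get(pred, pred), var)
    if op in ("forall", "exists"):        # ('forall', variable, body)
        _, var, body = formula
        return (op, var, abstract(body))
    return (op,) + tuple(abstract(a) for a in formula[1:])  # connectives

phi = ("forall", "x", ("or", ("atom", "bottle", "x"), ("atom", "glass", "x")))
print(abstract(phi))
# ('forall', 'x', ('or', ('atom', 'container', 'x'), ('atom', 'container', 'x')))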
The meaning of an abstraction is then defined in terms of Local Model Semantics
(LMS) [202]. In order to make this section self-contained, we briefly recall the
underlying theory, proposed by Ghidini and Giunchiglia [203]. The theory tries to
formalize the notion of context, i.e., of the environment where some reasoning is
performed according to a partial view of a system. If a system is observed under
different points of view, each observer can reason using the information he/she has
gathered. However, as the observed system is the same, the partial views of the various
observers must agree to some extent. Indeed, not all the system’s states collected by
one observer are a priori compatible with all the system’s states collected by another.
This problem is, for instance, a classical one in Medical Informatics, when a 3-D
image has to be reconstructed from a series of 2-D images.
In order to illustrate the problem, we introduce the same example that Ghidini
and Giunchiglia used themselves [203] for the same purpose.
Example 4.20 (Ghidini and Giunchiglia [203]) Let us suppose that we have a trans-
parent box, subdivided into a 2 × 3 grid of sectors; in each sector a ball can be put.
Fig. 4.19 a Two observers look at the same transparent box from orthogonal directions. b Edges
connect states that are compatible in O1 and O2's views (Reprinted with permission from Ghidini
and Giunchiglia [202])
An observer O1 looks at the box in one direction, whereas another observer O2 looks
along an orthogonal one. As aligned balls cover one another, O1 can only observe
four states of the box, namely, “ball to the left”, “ball to the right”, “no ball”, or “balls
to the left and to the right”. Observer O2 can see eight different states, according to
the presence of no ball, one ball, two balls, or three balls in the three visible sectors.
The various states are reported in Fig. 4.19.
Let L1 and L2 be the languages in which O1 and O2 describe their observations.
These are propositional languages, where P1 = {l, r} and P2 = {l, c, r} are the sets
of propositional variables in L1 and L2, respectively. Let Mi be the set of models of
Li (i = 1, 2). Models in Mi are called local models, because their truth is assessed
independently from other views of the system. Let now ci (i = 1, 2) be a subset of Mi .
The set ci belongs to the power set of Mi . Let c = (c1 , c2 ) be a compatibility pair,
i.e., a pair of subsets of models that are compatible in the two views. A compatibility
relation C = {c} is a set of compatibility pairs. Then:
C ⊆ 2M1 × 2M2
Finally, a model is a compatibility relation which is not empty and does not contain
the empty pair. A special case occurs when |ci | = 1 (i = 1, 2); in this case, C ⊆
M1 × M2 .
In the example of the box, the local models of L1 are:

M1 = {∅, (l), (r), (l, r)}

whereas those of L2 are:

M2 = {∅, (l), (c), (r), (l, c), (l, r), (c, r), (l, c, r)}
r ⊆ domg × doma
A domain relation r represents the relation between the domains of the ground and
abstract models. All domain relations are considered total and surjective functions;
in other words, for all d1 , d2 ∈ doma , if (d, d1 ) ∈ r and (d, d2 ) ∈ r, then d1 = d2 .
Moreover, Ghidini and Giunchiglia assume that all local models in Mi (i = g, a)
agree on the interpretation of terms. This means that elements in Mi may only differ
in the interpretation of predicates.
The preservation of meaning across abstraction is formalized by means of a com-
patibility relation.
Definition 4.41 (Compatibility relation) Given Mg and Ma and a domain relation
r ⊆ domg × doma , a compatibility pair c = (cg , ca ) is defined as either a pair of
local models in Mg and Ma , or the empty set ∅. Moreover, a compatibility relation
C is a set C = {c} of compatibility pairs.
It is easy to see that a model satisfies a term abstraction if the domain relation maps
all the ground terms (tuples of terms) into the corresponding abstract terms (tuples
of terms). A graphical representation of term abstraction is reported in Fig. 4.21. The
fact that constants c1 and c2 are abstracted into the same constant c in La is captured,
at the semantic level, by imposing that both the interpretations d1 and d2 of c1 and
c2 in domg are mapped into the interpretation d of c in doma .
Abstraction of functions is similar, but presents an additional difficulty: if functions
g1(x) and g2(x) are collapsed into g(x), it is not clear what value should be attributed
to g(x). Different choices are available to the user (max, min, or other aggregation
operations).
The last notion to introduce is the satisfiability of formula abstraction.
Definition 4.43 (Satisfiability of formula abstraction) Let f : Lg → La be a for-
mula abstraction. Let C be a model over Mg , Ma , and r ⊆ domg × doma . C is said
to satisfy the formula abstraction f if for all compatibility pairs (cg , ca ) in C:
• For all p1, . . . , pn ∈ Lg and p ∈ La such that f(pi) = p (1 ≤ i ≤ n):

if cg ⊨ pi(x1, . . . , xm, . . . , xn)[d1, . . . , dm, . . . , dn]
then ca ⊨ p(x1, . . . , xm)[r(d1), . . . , r(dm)]

if cg ⊭ pi(x1, . . . , xm, . . . , xn)[d1, . . . , dm, . . . , dn]
then ca ⊭ p(x1, . . . , xm)[r(d1), . . . , r(dm)]
Definition 4.43 states that a model satisfies a formula abstraction if the satisfiability
of formulas (and of their negations) is preserved throughout abstraction. Finally, given
a model C and an abstraction f, C is said to satisfy f if it satisfies all the term and
formula abstractions.
For the sake of exemplification we report the following example, provided by
Ghidini and Giunchiglia themselves, in which they show how Hobbs’s example,
reported in Example 4.9, can be reformulated in their approach.
Example 4.21 (Ghidini and Giunchiglia [203]) Let Lg and La be the ground and
abstract languages introduced in Example 4.9, and let f : Lg → La be an abstraction
mapping. Let domg and doma be two domains of interpretation, containing
all the constants of Lg and La, respectively; let moreover r ⊆ domg × doma be a
domain relation that follows directly from f, i.e., a domain relation that satisfies the
constraint:

r = {(c, f(c)) | c ∈ domg}
Let mg and ma be a pair of local models over domg and doma , which interpret
each constant c into itself. Let C be any model over r containing these compatibility
pairs. Then C satisfies the granularity abstraction on constants “by construction”. Let
us now restrict ourselves to a C that satisfies also the granularity abstraction on the
predicate symbol on. It is easy to see that if mg satisfies the formula on(b, x, y, z),
and the block b is on the table, then ma satisfies the formula on(b, int(x),
int(y)).
4.8 Reformulation
The notion of abstraction has been often connected to the one of reformulation, with-
out equating the two. In this section we mention three approaches on reformulation,
which are explicitly linked to abstraction.
Fig. 4.22 Problem reformulation scheme: a user specification is abstracted into an abstract
specification, from which an abstract algorithm is designed and then implemented as a concrete
algorithm (Reprinted with permission from Lowry [342])
One of the first theories on reformulation, in connection with abstraction, has been
proposed by Lowry [342, 344], who described the system STRATA, which reformu-
lates problem class descriptions, targeting algorithm synthesis.
A problem class description consists of input–output specifications, and a domain
theory describing the semantics of objects, functions and relations in the specifica-
tions. Data structures are Abstract Data Types (ADT), generated by STRATA. ADTs
are considered as theories, whose symbols denote the functions of interest, and whose
axioms are given abstractly. An ADT hides implementation details, while making
the essential properties of the functions explicit. Figure 4.22 graphically describes
the reformulation process.
Reformulation is a representation mapping between theories. Given a problem
specification in a problem domain theory, STRATA finds an equivalent problem spec-
ification in a more abstract problem domain theory. This type of abstraction is called
behavioral abstraction, because it concerns input–output (IO) behavior, and the refor-
mulation involved is similar to Korf’s homomorphism [297]. Behavioral abstraction
occurs by merging models of the concrete theory that are identical with respect to
IO behavior. Abstractions are searched for in a space with a semi-lattice structure,
where more and more abstract (tractable) formulations are found moving toward the
top, whereas implementations are at the bottom.
In order to apply behavioral abstraction, behavioral equivalence schemas are used,
such as:
In1 ≅beh In2 iff ∀Out : R(In1, Out) ↔ R(In2, Out)
Out1 ≅beh Out2 iff ∀In : R(In, Out1) ↔ R(In, Out2)
Methods for generating behavioral equivalence theorems are the kernel method and
the homomorphism method [343].
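When the IO relation R is finite, behavioral equivalence can be computed directly: inputs standing in R with exactly the same outputs are merged. The toy relation below is our own illustration:

from collections import defaultdict

# A toy IO relation R, given extensionally as (input, output) pairs
R = {(1, "x"), (2, "x"), (3, "y"), (4, "x"), (4, "y")}

out_sets = defaultdict(set)
for i, o in R:
    out_sets[i].add(o)

# In1 ~beh In2 iff forall Out: R(In1, Out) <-> R(In2, Out)
classes = defaultdict(list)
for i, outs in sorted(out_sets.items()):
    classes[frozenset(outs)].append(i)

print(list(classes.values()))  # [[1, 2], [3], [4]]: inputs 1 and 2 merge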
4.9 Summary
Not so many theories of abstraction have been proposed in the literature in the
last decades. Even though abstraction has a fundamental role in many disciplines,
only in Computer Science and Artificial Intelligence have some computational models
been put forward. In Artificial Intelligence most models exploit some logical
context/language.
After an initial enthusiasm and optimism, the complexity of the aspects of abstraction
and the variety of the contexts of its use have dissuaded researchers from looking
for general theories, and suggested concentrating the efforts on more limited, but
practically useful notions. In fact, to the best of our knowledge, none of the general
logical theories proposed went beyond some simple, didactical examples: the
elegant formulations at the theoretical level fail to cope with all the details that must
be specified for actual application to real-world problems.
The attitude toward applicability was, since the beginning, at the core of the idea
of Abstract Data Types, which had pragmatic goals and limited scope, and were not
presented as a general theory, but as an effective and useful tool. Something similar
can be said for abstraction in databases, which is the other subfield where abstraction
has been treated somewhat formally.
The confinement to the realm of theory also affected approaches to irrelevance,
even though they were promising. Actually, an effective theory of irrelevance
could be exactly the missing link between a problem and the identification of the
fundamental aspects needed to solve it.
Chapter 5
Boundaries of Abstraction
In this chapter we come up with the properties that, in our view, abstraction
should have, as they emerge from the analysis in the previous chapters. We will also
relate abstraction to its cognate notions, mainly generalization and approximation.
As we have seen, notwithstanding the recognized role abstraction plays in many
disciplines, there are very few general theories of abstraction, and most of them are
quite old and difficult to apply. Abstraction is an elusive and multi-faceted concept,
difficult to pin down and formalize. Its ubiquitous presence, even in everyday life,
contributes to overloading its meaning. Thus, we are aware that finding general
properties and a definition of abstraction that covers all its meanings and usages is
likely to be an impossible task; hence, we focus on a notion of abstraction targeted
to domains and tasks whose conceptualization is largely grounded on observations.
A definition of abstraction may be useful for several reasons, such as:
• Clarifying the exact role abstraction plays in a given domain.
• Defining abstraction operators with a clear semantics.
• Establishing what properties are or are not preserved when modifying a system
description.
• Eliciting knowledge, by establishing clear relations between differently detailed
layers of knowledge.
This chapter will be kept at an informal level, as it is meant to provide an intuitive
and introductory understanding of the issues, whereas a more formal treatment will
be presented in the next chapters.
5.1 Characteristic Aspects of Abstraction
In order to be at the same time useful in practice and sufficiently general with respect
to its intended goal, abstraction should comply with the following characterizing
features:
• It should be related to the notion of information, and, specifically, of information
hiding.
• It should be an intensional property and not an extensional one of a system.
• It should be a relative notion and not an absolute one.
• It should be a process and not a state.
Each one of the above features, which are not independent from one another, will be
elaborated upon in the remainder of this chapter.
As we have seen in Chaps. 2 and 3, there are mainly three broad and intertwined
points of view from which to start for a definition of abstraction:
• To move away from the physical world, ignoring concrete details.
• To restructure a problem aiming at its simplification, which is the basis for the
reformulation view of abstraction.
• To forget or factorize irrelevant details, which is linked to the notion of relevance,
information hiding, and aggregation.
Even though the first of these points of view matches our intuition, it does not
lend itself to a computational treatment, as it includes philosophical, linguistic, and
perceptive aspects which are hard to assess and quantify. Concerning the second
perspective, simplicity is as hard to define as abstraction itself, and grounding the
definition of abstraction on it turns out to be somewhat circular. Then, we remain
with the last point of view, relating abstraction to information reduction. As we
will see, taking this perspective is, prima facie, less satisfying for our intuition than
considering abstract "what does not fall under our senses"; on closer inspection,
however, it does not conflict with that intuition. Moreover, the information-based
view nicely integrates with the simplicity-based one.
Clearly, we first have to define information. We may resort to readily available formal notions of information, choosing among alternative ones according to our needs (see Grünwald and Vitányi [223] for a discussion). The two
classical definitions are Shannon's “information” [483] and Kolmogorov's “algorithmic information” (or complexity) [295], which seem quite different, but are nevertheless related by strong links. Both definitions aim at measuring “information”
in bits. In both cases, the amount of information of an entity is the length of its
description. But in Shannon’s approach entities are supposed to be the outcomes
of a known random source, and the length of the entity description is determined
solely by the stochastic characteristics of the source and not by the individual entities.
Concretely, Shannon information of a message (a string of symbols) is the minimum
expected number of bits to transmit the message from the random source through an
error-free channel. On the contrary, Kolmogorov algorithmic information depends
exclusively upon the individual features, and it is defined as the length of the shortest
(universal) computer program that generates it and then halts. In Shannon’s words:
“The fundamental problem of communication is that of reproducing at one point,
either exactly or approximately, a message selected at another point ”. In Kolmogorov
complexity the quantity of interest is the minimum number of bits from which a
particular message can effectively be reconstructed. The link between Shannon’s
and Kolmogorov’s notions is established by a theorem that states that the average
Kolmogorov complexity over all the possible strings is equal to Shannon information.
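To make the contrast tangible, here is a minimal Python sketch (ours, not part of the original text): the empirical Shannon estimate depends only on symbol frequencies, i.e., on an assumed random source, whereas a general-purpose compressor such as zlib exploits the individual regularity of the string, giving a crude upper bound on its algorithmic information.

import math
import zlib
from collections import Counter

def shannon_bits(message: str) -> float:
    # Expected total code length in bits, assuming the symbols are drawn
    # i.i.d. from the empirical distribution of the message itself.
    n = len(message)
    counts = Counter(message)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy * n

def kolmogorov_upper_bound_bits(message: str) -> int:
    # Kolmogorov complexity is uncomputable; the length of any compressed
    # version of the string is only an upper bound on it.
    return 8 * len(zlib.compress(message.encode("utf-8")))

msg = "abab" * 100
print(shannon_bits(msg))                 # 400.0: one bit per symbol
print(kolmogorov_upper_bound_bits(msg))  # far fewer bits: the string is regular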
Notwithstanding their elegance and appropriateness, the two definitions just mentioned are not suitable to be used, as they stand, in an approach to abstraction. In fact, the Kolmogorov information measure is uncomputable, whereas we need a notion that can be easily understood and computed in concrete systems. On the other
hand, Shannon's probabilistic notion of information, based on an ensemble of possible messages (objects), does not suit our needs either. In fact, we would like a notion of information that depends on single objects (as Kolmogorov's does) and is not probabilistic (as Shannon's is). However, the notion of information
that we search for should, in some sense, be reducible to Shannon’s or Kolmogorov’s
definitions in special cases. The reason why the probabilistic approach is not well
suited to our approach is twofold. First of all, the set of messages from the source
should be known in advance. When messages are transmitted, this requirement is
usually met. But when the “source” is the world, and the “message” is a perceived
part of it, the definition of the set of messages and of a superimposed probability
distribution is out of reach. In addition, our intuition tells us that information need not always be probabilistic; when we read a book or speak with someone, we may acquire new pieces of information, something we were not aware of before, that must be integrated with our current knowledge, and which is not stochastic. Clearly, if the part of the world of interest can be reduced to a known set of
possible alternatives, Shannon’s information notion may apply.
Based on the above discussion, we need a definition of information suggested by
pragmatic reasons: it has to be well suited to support the definition of abstraction, but
it does not need to be general outside its intended application. Moreover, it should
reduce to either Shannon’s or Kolmogorov’s definitions, when applicable.
The notion of information that we will use in this book starts from the consideration
of a system S. We observe this system with a set of sensors Σ. The system can
be described as being in a (usually very large) set of possible states, Ψ , and the
observations tell us which one is appropriate to describe S. The states ψ ∈ Ψ can
be represented in terms of the measurements captured by the sensors Σ. Knowledge
of ψ provides information about the various parts of S and of their interactions, and
allows some distinctions to be made between entities and their properties.
If we use a different set of observation tools, Σ′, obtained by changing the sensors, the system S does not change, but the image that we have of it does; in particular, S can now be described as being in a different set of possible states Ψ′. The state ψ ∈ Ψ, describing S when observed with Σ, becomes a state ψ′ ∈ Ψ′ when observed with Σ′. If the sensors in Σ′ are less sensitive, more coarse-grained, or are a subset of those in Σ, some distinctions in ψ will no longer be possible, and the information we gather on S is smaller in amount and/or less detailed. If this is the case, we say that knowing the state ψ′ is less informative, or provides less information, than knowing the state ψ. As a consequence, we say that (the description of) state ψ′ is “more abstract” than (the description of) state ψ.
Abstraction is then linked to the change in the amount of information that can
be obtained by “measuring” the system S with different observation tools.1 We
recall that information may be reduced either by omitting (hiding) part of it, or by
aggregating details into larger units. In the next chapter we will introduce the formal
definition of abstraction based on these ideas.
To make more intuitive the notions introduced so far, let us consider a simple
example.
Example 5.1 Suppose that we have a system S including geometrical figures, such as polygons. Two objects a and b in the system may be known to be polygons. Then, S is in the state ψ = (“a is a polygon”, “b is a polygon”). If we observe more carefully, we may notice that a is a square and b is a hexagon. Then, we may consider a new state ψ′ = (“a is a square”, “b is a hexagon”). In this case, state ψ is less informative, and hence more abstract, than state ψ′, because ψ′ allows squares to be distinguished from hexagons, which was not the case in ψ.
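The example can be turned into a small Python sketch (the “is-a” map below is our own illustrative assumption): abstracting a state rewrites each property upward in the hierarchy, and the more abstract state supports fewer distinctions.

# Hypothetical "is-a" map: both shapes abstract to "polygon".
IS_A = {"square": "polygon", "hexagon": "polygon"}

def abstract(state):
    # Replace each property by its parent in the hierarchy, if any.
    return tuple((obj, IS_A.get(prop, prop)) for obj, prop in state)

psi_prime = (("a", "square"), ("b", "hexagon"))  # the more detailed state ψ′
psi = abstract(psi_prime)                        # the more abstract state ψ

# ψ can no longer distinguish a from b, whereas ψ′ can.
distinct = lambda s: len({prop for _, prop in s})
assert distinct(psi) < distinct(psi_prime)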
Linking abstraction to information reduction is at least coherent with the view of
abstraction as a mechanism that “distills the essential”, i.e., that selects those features
of a problem or of an object that are more relevant, and ignores the others. This
corresponds to the cognitive mechanism of focus of attention, also described by
Gärdenfors,2 and well known in Cognitive Sciences.
The informal definition of abstraction introduced earlier is far from satisfying; in fact, we did not say anything about the link between the states ψ and ψ′. One
may expect, in fact, that a precise definition of abstraction ought to limit the possi-
ble transformations between ground and abstract states. Nevertheless, even though
informal, the definition is sufficient to go ahead with the other aspects characterizing
abstraction.
When we observe a system in the world, we have to deal with entities (objects, events,
actions, . . .) and with their descriptions. The entities themselves are the extensional
1 As will be discussed in the next chapter, the system S need not be a physical one, and
“measuring” is to be taken in a wide sense, as discussed also by Floridi [175].
2 See Sect. 2.2.
Fig. 5.1 Incas used quipus to memorize numbers. A quipu is a cord with knots that assume position-dependent values. An example of the complexity a quipu may reach (Reprinted with permission from Museo Larco, Pueblo Libre, Lima, Peru) [A color version of this figure can be found in Fig. H.5 of Appendix H]
part of the system, characterized by their number N; the size of the system increases linearly with N. On the other hand, the description of the system is its intensional part, which, for a description to be useful, should increase at most sub-linearly with N. In Kolmogorov's theory of complexity, an object whose description has the size of the object itself is incompressible.
When we say that abstraction is an intensional property, we mean that it pertains
to the description of the observed entities, and not to collections thereof. During
evolution, humans, in order to organize the inputs they receive from the world into a cognitively coherent body of knowledge, faced the problem of going beyond an extensional apprehension of the world, by “inventing” concepts. A typical example,
as discussed in Sect. 2.3, has been to move from physically counting objects (see
Fig. 5.1) to the notion of “number”.
Without entering into a disquisition on the subtleties of the concept of “concept”, we will equate a concept C with a “set Z of sufficient properties”. Any object
satisfying Z is declared to be an instance of C. For example, the Oxford Dictionary defines the concept C = vehicle as “a thing used for transporting people or goods”. As we can see, a concept can also be defined in terms of functionalities.
Concepts can be more or less detailed, according to the defining properties, and they
may form hierarchies. For example, a vehicle can be terrestrial, marine, or aerial. An
example of a possible hierarchical organization for vehicle is reported in Fig. 5.2.
In this figure we may notice that there are two types of nodes and edges. Oval nodes
correspond to descriptions, whereas rectangle nodes correspond to instances, i.e.,
particular objects that satisfy the descriptions. Each oval node adds some new property to the description of its father node, and is linked to it by an “is-a” relation. Thus, nodes lower in the hierarchy are more detailed than nodes higher up, and they provide more information about the objects that satisfy the properties. The lowest level contains the objects themselves, which are, in fact, the most detailed descriptions.
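The double reading of such a hierarchy can be sketched in a few lines of Python (the node names and plates below are placeholders, not those of the actual figure): read as concepts, the extension of a node is the union of the extensions of its children, so it grows going up.

CHILDREN = {
    "vehicle": ["terrestrial", "marine", "aerial"],
    "terrestrial": ["car", "truck"],
}
# Leaf instances, identified by (hypothetical) plates.
INSTANCES = {"car": {"AB123CD"}, "truck": {"EF456GH"}}

def extension(node):
    # The instances covered by a node: its own plus all of its children's.
    own = INSTANCES.get(node, set())
    return own.union(*(extension(c) for c in CHILDREN.get(node, [])))

print(extension("car"))      # {'AB123CD'}
print(extension("vehicle"))  # both plates: a more general concept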
Fig. 5.2 A possible hierarchical organization of the concept vehicle = “thing used for
transporting people or goods”. Transportation may occur on land, sea, or air. A vehicle can be
used to transport people or goods, and so on. The instances of car are specific cars, uniquely
identified with their (Italian, in the figure) plate
part-of
part-of
part-of
Fig. 5.3 A computer is constituted by several parts, such as screen, keyboard, loudspeakers, mouse
and body. The body has many components inside, among which there is the internal hard disk,
which, in turn, has its own components. Also the mouse has several parts in its interior. Then,
compound objects can be decomposed into parts at several nested levels
A “part-of” hierarchy, like the one in Fig. 5.3, also suggests to us what to “see”. Notice, however, that a “part-of” hierarchy does not have the same extensional interpretation as the “is-a” hierarchy.
In summary, abstraction acts on an object description: we cannot abstract the
object, because we cannot change what it is, but we can abstract its description.
Fig. 5.4 a Picture of a poppy field. If we only have this picture, it is impossible to say whether it is concrete or abstract. b The same picture in black and white. By comparison, the latter is less informative than the colored one, because the information referring to the color has been removed; then picture (b) is more abstract than picture (a) [A color version of this figure is reported in Fig. H.6 of Appendix H]
From the above considerations it follows that labeling an entity as abstract or concrete in absolute terms is beyond hope, and that we can only speak of abstraction as a relative notion, and not as an absolute one. In other words, all we can say is that something is more abstract than something else. Then, abstraction has to be considered as a relation that induces a partial order on entities.
In order to explain our choice, let us look at an example. In Fig. 5.4a a picture of
a poppy field is reported. There are no clear and undisputable grounds for labeling
this picture as abstract or concrete. In fact, if we reason from the point of view of
closeness with the reality, the picture is not the “true” poppy field, and then it should
be abstract (see also Fig. 2.2). On the other hand, if we judge from the point of
view of abstract art, it has a close resemblance with the original, and then it should
be rather labeled as concrete. From the point of view of the ability to capture the
essential aspects of the original, again we do not know what to say: maybe there
are important details that the picture did not capture (for instance, the pistils), or the image may even be too detailed (maybe only the perception of a red field, as in impressionist art, would matter). But, if we look at the picture in Fig. 5.4b, and
we compare picture (a) with picture (b), we are immediately able to say that picture
(b) is more abstract. In fact, the information about the color has been removed,
leaving the rest unchanged. We want to stress that only the pictures are compared with respect to the “more abstract than” relation, because the original poppy field, of course, did not change, as discussed in Sect. 5.1.2. We may notice that
5.1 Characteristic Aspects of Abstraction 125
picture (b) in Fig. 5.4 is more abstract than picture (a) even according to the notion
of abstraction as taking a distance from the senses. In fact picture (b) has a poorer
sensory quality than its colored counterpart.
Defining abstraction as a relative notion agrees with Floridi’s idea of Gradient of
Abstraction (GoA), discussed in Sect. 4.2.2, and, specifically, with nested GoAs.
Moving to a subtler domain, let us consider concepts. We have seen that concepts, too, are labeled as “abstract” or “concrete”, according to whether they
refer to abstract or concrete things in the world. However, also in this case the distinction is not easy to make. In fact, if considering a chair concrete (a classical example of a concrete thing) looks quite reasonable, classifying freedom as abstract (a classical example of an abstract thing) might be challenged: one could say that experiencing freedom (or its opposite, slavery) deeply affects one's life in a very concrete way. Clearly, this discussion involves the notion of abstraction as distance from the sensorial world, which is unable to provide an always uncontroversial labeling.
The examples of abstraction as moving up and down an “is-a” or a “part-of”
hierarchy, described in the preceding section, are good instances of the relativity of
the notion itself. In fact, while it is impossible to label any concept node (is-a) or
any structural node (part-of) as abstract or concrete, it is very natural to compare
two of them if they are linked by the same relation. Clearly, abstraction induces
only a partial order among entities, and the nodes car and truck, in Fig. 5.2, are
incomparable. Given two entities, the means that can be used to actually compare
them according to the more-abstract-than relation will be introduced in Chap. 6.
If abstraction is not an absolute but a relative notion, the process with which a more
abstract state is generated from a more detailed one becomes crucial.
Let S be a system (with objects, events, . . .), and let Ψ be the set of its possible states, determined by the sensors used. Each state ψ corresponds to a description of S.
Taking two of these states, say ψ1 and ψ2 , we would like to compare them with respect
to the more-abstract-than relation. Except in cases where the relation is obvious (as,
for instance, in the cases of Figs. 5.2 and 5.3), it would not be easy, or it would be even
impossible, to make a comparison. In fact, if abstraction were an absolute notion,
for any two states we could say whether they are comparable (and then which one
is the more abstract) or whether they are incomparable, only by looking at the states
themselves. In fact, it would be necessary to define a function I(ψ), depending only
on the state ψ, which represents the information that the description corresponding
to ψ conveys about the system under analysis. In this case, any pair of states ψ1
and ψ2 such that I(ψ2 ) < I(ψ1 ) would imply that ψ2 is more abstract than ψ1 ,
even though ψ1 and ψ2 are unrelated. On the contrary, with the generative view of
abstraction that we propose, ψ2 must be obtained from ψ1 through an identifiable
process. By taking the position that the comparison among two states, with respect
to their relative abstraction level, depends on the path followed for going from one
to the other, the comparison may require additional information.
In order to understand the importance of taking into account the abstraction process itself, let us consider two examples. In Fig. 5.5 Escher's beautiful lithograph Liberation is reported. Suppose that we only have access to the view of the birds at the
top and of the triangles at the bottom of the drawing. If we want to link them according
to their information content, we cannot but conclude that they are unrelated. In fact,
birds and triangles, taken in isolation, do not have anything meaningful in common.
On the contrary, if we have access to the whole drawing, we see quite clearly how the
triangles are obtained from the birds through a sequence of detail eliminations and
approximations. Then, knowing the process of transformation from one state into
another allows the birds and the triangles to be related, by saying that the triangles
are indeed modified representations of the birds.
Another example can be found in Fig. 5.6, taken from Il vero modo et ordine per
dissegnar tutte le parti ie membra del corpo humano,3 by Odoardo Fialetti, printed in
Venice in 1608. Here a study on the techniques for drawing a human eye is illustrated.
3 “The true way and order for drawing all parts and members of the human body”.
Fig. 5.6 From Fialetti’s “Il vero modo et ordine per dissegnar tutte le parti ie membra del corpo
humano”, 1608. One among a set of studies for drawing eyes
In the series of eyes it is really hard, without looking at the intermediate steps, to relate the top-left-most and the bottom-right-most drawings. However, the relation between the two clearly appears if we consider the whole process of stepwise transformations.
Abstraction has been considered a process also in Mathematics, where the concept
of number is reached, according to Husserl, through a counting process leaving
aside all the properties of a set of objects, except their numerosity. Lewis [329] explicitly defines abstraction as a process of removing details from the concrete.4 Finally,
Staub and Stern's approach to abstraction5 combines, as we do, the idea of abstraction as a process with that of abstraction as a relative notion; in fact, these authors claim that concepts are obtained by reasoning, starting from the concrete world. Along the
reasoning chain abstraction increases, so that the farther from the concrete world
a concept is along the chain, the more abstract it is. As an example, real numbers
are more abstract than integers. Even though this approach shares with our view the
ideas of process and relativity of abstraction, we do not reach the same conclusions
as Staub and Stern, regarding numbers, because they do not acknowledge the role of
information reduction along the abstraction process.
Considering abstraction as a process raises two important issues. The first one
is to investigate whether the process has a preferential direction, and whether it is
reversible. The second one is the identification of the abstraction processes themselves. Concerning the first issue, we must remember that we have defined abstraction
as an information reduction mechanism, whatever this means. A part of the world,
namely a system S, contains, in nuce, all the features and details that can possibly
be detected. It is then necessary to decide what features of the system are to be
considered and measured, and which ones are not. The result of this selection is
the most detailed description dg of the system that we decide to keep, and also the
Fig. 5.7 A color picture has been transformed into a black and white one. If the color is to be added again, there is no clue for performing this addition correctly, if it is not known how the color was originally removed [A color version of this figure is reported in Fig. H.7 of Appendix H]
The last aspect of abstraction to be discussed in this chapter is its effect on the information that is removed. According to the view of Abstract Data Types in Computer Science, the idea behind abstraction is that information is not deleted from a description at a given level, but only hidden, so that it can be seen at lower (more detailed) levels, and also recovered when needed. This information hiding
is also called encapsulation [142]. For example, in Roşu’s approach, abstraction is
explicitly defined as information hiding.7 As we have discussed in Sect. 2.4, Colburn
and Shute [111] contrast information hiding in writing programs with information
neglect in Mathematics.
If we think of the reversibility problem mentioned earlier, the loss of the information removed at a given level would completely hinder the concretion process. In fact, any lost information cannot be recovered without seeking it again in the real world.
5.2 Boundaries of Abstraction
Having now introduced our basic ideas about abstraction, we can see how they help in setting boundaries between abstraction and cognate notions. In particular, we will discuss the relations between abstraction, on the one hand, and generalization, categorization, approximation, simplification, and reformulation, on the other.
Throughout Chaps. 2 and 3 we have seen that abstraction is very often linked to the
notion of generalization, and sometimes even identified with it, as, for instance, by Colunga and Smith [112] or by Thinus-Blanc [528]. Of course the relation between
abstraction and generalization depends on their respective definitions. If abstraction
is defined as generalization, the identification cannot but be correct. However, we claim that this identification is not appropriate, because it masks distinctions that are important and useful to preserve.
The first and most important distinction between abstraction and generalization
is that the first is an intensional property, i.e., it pertains to descriptions, whereas
generalization is extensional, i.e., it pertains to instances. In order to clarify this
distinction, we have to start from somewhat afar, and precisely from Frege's notion of concept [181]. Given a First Order Logic language L, let ϕ(x) be a formula with free variables x = (x1, . . . , xn). This formula represents a “concept”, i.e., the set of all n-tuples of objects a = (a1, . . . , an), in the chosen domain A, which satisfy ϕ(x).
Fig. 5.8 Examples of concepts according to Frege. a COV(ϕ) is the extension of the concept Mother(x, y), i.e., the set of pairs of people such that y is the mother of x. b COV1 is the extension of the concept Mother(x, b), i.e., the set of b's children, and COV2 is the extension of the concept ∃x[Mother(x, y)], i.e., the set of women that have at least one child in A. COV2 is the projection of COV(ϕ) onto the y axis
Let us call this set COV (ϕ).9 Formula ϕ does not have a truth value associated to it,
but an “extension”. It is not necessary that all the variables in ϕ are free: some may
be bound, but at least one must remain free.
We notice that this definition of concept is coherent with the view of concept
as a set of properties introduced in Sect. 2.6. In fact, formula ϕ(x) specifies what
properties the instances must satisfy.
Example 5.2 Let ϕ(x, y) = Mother(x, y) be a concept and let A be a given set of people. Let the upper right quadrant of the space (x, y) contain the set of all pairs (x, y) ∈ A × A, and let COV(ϕ) be the extension of ϕ(x, y), i.e., the set of all pairs (x, y) of people such that y is the mother of x, as shown in Fig. 5.8a. We may
bind a variable either by instantiating it to a constant, or by using a quantifier. Let us
see what concepts we obtain through these operations.
Let us first set y = b; the concept Mother(x, b) has the only free variable x, and represents the set of children of b, some of whom (the subset COV1 on the x axis) belong to the set A. On the contrary, if we bind x to a, we obtain the concept
Mother(a, y), which represents the set of mothers of a; as there is only one mother
for each person, this concept has either an extension consisting of a unique point (if
a and his/her mother belong to A), or it is void.
Consider now the existential quantifier applied to x, i.e., ∃x[Mother(x, y)]. This is a concept with free variable y, and represents the subset COV2 of persons y that are mothers of some children. On the contrary, the concept ∃y[Mother(x, y)] represents
the set of people whose mother is included in the set A.
Finally, let us consider the universal quantifier applied to x, i.e., ∀x[Mother(x, y)].
This concept has y as free variable, and represents the set of women y that are mothers of all persons in A: clearly a void concept, because y belongs to A but cannot be the mother of herself. On the contrary, the concept ∀y[Mother(x, y)] represents the set of people x whose mothers are the whole population of A; again, clearly a void concept.
When a formula does not have any free variable, it becomes a sentence; it does
not have an extension associated to it, but has a truth value.
Example 5.3 Let the set A be the whole of humanity at a given time instant. The formula ∀x ∃y [Mother(x, y)] is not a concept, because it does not have any free variable, but is a sentence that has value true, because it asserts that each person has
a mother.
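Frege's distinction between concepts (extensions) and sentences (truth values) is easy to mimic in Python over a finite domain; the tiny domain and the Mother pairs below are our own assumptions, chosen only to mirror Examples 5.2 and 5.3.

A = {"ann", "bea", "carl"}
MOTHER = {("bea", "ann"), ("carl", "ann")}  # (x, y): y is the mother of x

# Concept Mother(x, y): two free variables, extension COV(ϕ) ⊆ A × A.
COV = {(x, y) for x in A for y in A if (x, y) in MOTHER}

# Bind y to a constant: concept Mother(x, ann), free variable x.
COV1 = {x for x in A if (x, "ann") in MOTHER}

# Bind x existentially: concept ∃x Mother(x, y), free variable y.
COV2 = {y for y in A if any((x, y) in MOTHER for x in A)}

# Bind everything: the sentence ∀x ∃y Mother(x, y), with a truth value.
sentence = all(any((x, y) in MOTHER for y in A) for x in A)

print(COV1, COV2, sentence)  # {'bea', 'carl'} {'ann'} False (ann's mother is not in A)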
Concepts can be compared according to their extension: a concept C1 is more
general than concept C2 iff COV (C1 ) ⊇ COV (C2 ). We notice that the more-general-
than relation, in order to be assessed, needs to compare sets of concept instances.
Hence, generality cannot be attributed to sentences, which do not have associated
extensions. On their part, sentences, with their truth value, provide information. They
are the intensional counterpart of concepts. Being an intensional property, abstraction
is related to sentences. As a conclusion, the following differences between abstraction
and generalization can be assessed:
• Abstraction is an intensional notion and has to do with information, whereas generalization is an extensional notion and has to do with instance covering.
• Abstraction can be applied to a single entity, generalization only to sets of entities.
• Abstraction is related to sentences, generalization to concepts (in Frege’s sense).
• Abstraction and generalization can be performed by means of different operators.10
Abstraction and generalization not only can be distinguished, but they are, in a
sense, orthogonal. In Fig. 5.9 we see that they can be combined in all possible ways,
generating a bi-dimensional space, where one axis corresponds to abstraction and the
other to generalization. It is possible to find descriptions that are general and abstract,
general and concrete, specific and abstract, or specific and concrete. The separation of
abstraction from generalization “solves”, to a certain extent, Berkeley’s objection to
the idea of abstraction (see Sect. 2.1); in fact, a concept can, at the same time, be very
precise and cover many instances. Moreover, the separation agrees with Laycock’s
view, reported in Sect. 2.1, that there are two dichotomies: “abstract/concrete” and
“universal/particular”. The first one directly corresponds to the Abstraction axis in
Fig. 5.9, whereas the second one can be mapped onto the Generalization axis in
the same figure. Unfortunately, starting from the ontic view of abstraction, other
philosophers, such as Quine (see Sect. 2.1), made the two dichotomies coincide again.
10 For instance, aggregation is only meaningful for abstraction, whereas omitting details pertains to both abstraction and generalization.
Fig. 5.9 Abstraction and generalization can be combined in every possible way. In the bottom-left corner there is a picture of one of the authors, which is specific (only one instance) and concrete (all the skin, hair, face, . . . details are visible). In the bottom-right corner there is a version of the picture which is specific (only one instance, as the person is still recognizable) and abstract (most details of the appearance are hidden). In the top-left corner the chimpanzee-human last common ancestor is represented with many physical details, thus making the picture still concrete; however, many monkeys or humans satisfy the same description, so that this is an example of a concrete but general concept. Finally, in the top-right corner there is a representation of a human head according to Marr [353] (see Fig. 2.13); the head is abstract (very few details of the appearance) and general (any person could be an instance) [A color version of this figure is reported in Fig. H.8 of Appendix H]
Figure 5.9 has to be read along two dimensions: the pictures may be viewed either
as concepts, or as descriptions of concepts (sentences). In the first interpretation, they
must be compared according to their extension, which increases from bottom to top.
In the second interpretation, they must be compared according to the amount of
information they provide, which decreases from left to right.
Clearly, even though Fig. 5.9 shows that it is not always the case, a more abstract description very frequently corresponds to a more general concept.
In fact, during the process of abstraction, details are increasingly removed, and the
set of properties that instances must satisfy shrinks. This concomitance might be a
reason for the confusion between abstraction and generalization.
A second aspect that differentiates abstraction from generalization consists in
the possibly different nature of their related operators. If we consider the hier-
archy in Fig. 5.2, we can make two observations. The first is that, if nodes are
viewed as concepts, by going up the hierarchy, more and more general concepts are
found, because their extension is the union of the extensions of the children nodes.
On the other hand, if the nodes are viewed as descriptions, they become more and more abstract going up, because the information that they provide about the instances of the associated concepts is less and less detailed. Then, in this case, an increase in generality goes together with an increase in abstractness.
If we now look at Fig. 5.3, we see that what was said for the hierarchy in Fig. 5.2
is not applicable here. In fact, the nodes in the “part-of” hierarchy can only be
interpreted as descriptions, whose abstraction level increases going up, and not as
concepts whose generality increases. The nodes in this hierarchy are incomparable
from the more-general-than relation point of view.
In several disciplines where abstraction is used and deemed important this notion
has been related to (and sometimes defined on the basis of) a mechanism for extracting
common features from a variety of instances. By considering what has been said
earlier in this chapter, this mechanism might underlie generalization, rather than
abstraction. In fact, the abstraction process, in order to be performed, does not need to
look at several instances, but it can be applied to single objects, so that commonalities
with other objects do not matter. In addition, abstraction is a process that hides features instead of searching for them. Nevertheless, since searching for common features means deleting the differing ones, both searching for shared features across instances and forgetting irrelevant ones end up producing a more abstract description. In fact, abstraction ignores irrelevant features, which are likely to be those that are accidental to an instance and do not belong to its essence. Then, even though generalization and abstraction are different mechanisms with different goals, their results may sometimes coincide, which explains again why they are often confused.
After discussing the differences between generalization and abstraction, we may
look into their possible links. To this aim, let us consider a question that has been discussed at some length in the Machine Learning literature: Is the assertion s1 = “Yves lives in France” more general than the assertion s2 = “Yves lives in Paris”? [172].
Actually, this question, as it is formulated, is ill-posed. First of all, both s1 and s2
are sentences and not concepts; as such, they do not have an extension associated to
them, but a truth value. As we have discussed earlier, generalization is an extensional
property and, then, it makes no sense to assign to either s1 or s2 a generality status.
On the other hand, assertion s1 provides much less information about the place where
Yves lives than s2 , and then, we are willing to say that s1 is a more abstract description
of Yves’ domicile than s2 . Now, let us consider the set of people living in Europe, and
let lives(x, France) be the concept whose extension COV (lives(x, France)) is
the subset of all people living in France. In an analogous way, let lives(x, Paris)
be the concept whose extension COV (lives(x, Paris)) is the subset of all people
living in Paris. As Paris is in France, then

COV(lives(x, Paris)) ⊆ COV(lives(x, France))

and, hence, the concept lives(x, France) is more general than the concept lives(x, Paris).
After investigating the connections between abstraction and generalization, let us try
to relate abstraction and approximation. The task here is more difficult, because the
precise definition of the more-general-than relation in terms of extensions is not paralleled by anything similar for approximation. We can decompose the problem into two
parts: defining approximation first, and discussing the link approximation/abstraction
later.
The Oxford Dictionary defines approximation as “a value or quantity that is nearly
but not exactly correct ” or “a thing that is similar to something else, but is not exactly
the same ”. Then, intuitively, approximation is related to the notion of controlled error.
When describing a system, some part of its description or behavior is replaced by
another one. In principle, any part could be replaced with anything else, depending
on the reasons underlying the substitution. When considering approximation in the
context of abstraction, the main reason is usually simplification. We may recall here
Hobbs’ proposal [252] of considering approximate values in a numerical interval as
indistinguishable.
Example 5.4 (Pendulum) Let us consider the simple pendulum represented in
Fig. 5.10, embedded in the gravitational field. If we suppose that the pendulum starts
from the position θ = θ0 > 0 with null velocity, and that there are no dissipative
forces, it will oscillate between position θ = θ0 and θ = −θ0 with a period T .
Solving the equation of motion

θ̈ = −(g/r) sin θ,

where r is the length of the pendulum and g the gravity acceleration, provides the exact period

T = 4 √(r/g) K(sin(θ0/2)),    (5.1)

where K is the Complete Elliptic Integral of the First Kind.
If we assume that θ0 is small, i.e., that the pendulum swings in the vicinity of the
position θ = 0, an approximate (but simpler to solve) equation of motion is obtained,
namely

θ̈ = −(g/r) θ
This linearized equation provides an approximate value for the period, i.e.,

Ta = 2π √(r/g)    (5.2)
When θ0 = 0, K(0) = π/2, and hence the error is 0, as it must be. When θ0 = π/2, K(sin π/4) ≈ 1.854, and the relative error is about 0.15, namely the approximate value does not differ from the true one by more than 15 % of the latter.
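The numbers can be checked with a few lines of Python, assuming the standard exact-period formula T = 4√(r/g)·K(sin(θ0/2)); note that SciPy's ellipk expects the parameter m = k², not the modulus k.

import numpy as np
from scipy.special import ellipk

def exact_period(theta0, r=1.0, g=9.81):
    # T = 4*sqrt(r/g)*K(k), with modulus k = sin(theta0/2).
    k = np.sin(theta0 / 2.0)
    return 4.0 * np.sqrt(r / g) * ellipk(k**2)

def approx_period(r=1.0, g=9.81):
    # Small-angle approximation, Eq. (5.2).
    return 2.0 * np.pi * np.sqrt(r / g)

T, Ta = exact_period(np.pi / 2), approx_period()
print((T - Ta) / T)  # ≈ 0.15: the relative error at theta0 = pi/2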
Approximation may also be done in discrete systems, where it is not always easy
to describe what replaces what, and how good the approximation is. This was actually
Fig. 5.11 A running man has been approximated by replacing his body parts with polygons. The approximation still allows both the human body and its running posture to be recognized
the starting point of the abstraction theory proposed by Imielinski [269], as described
in Sect. 4.5.2.
The following example clarifies this case.
Example 5.5 Let us consider the running man in Fig. 5.11. The parts of the body
have been approximated by means of polygons, and yet the body and its running
attitude are clearly recognizable. However, in this case it is very difficult to define
an approximation error.
The schematic representation of a man in Fig. 5.11 recalls Marr's 3-D representation with generalized cones (geons), reported in Fig. 2.13, which has to be considered an approximation of the human body as well.
A last example comes from Computer Graphics, where 3-D objects are represented
as networks of meshes (see Fig. 5.12). The mesh representation strikes a compromise between realism in object rendering and computational complexity: increasing the number of meshes increases the realism of the image, but also the computational cost.
In summary, approximation occurs when some part (a variable or a value) of
a system is not hidden but replaced with something else, usually with the aim of
achieving simplicity: a simpler description, a simpler behavior, a simpler solution.
The approximation of Example 5.4 reduces the complexity of solving the equation
of motion, whereas the ones of Figs. 5.11 and 5.12 generate simplified descriptions.
5.3 Summary
Abstraction has been related, in the literature, to other notions, such as generalization,
approximation, and reformulation. It is thus important to try to set boundaries among
these notions (or mechanisms) in such a way that the effects and properties of the modifications applied to system descriptions are clearly understood.
A recurring theme in defining all the above mechanisms is simplicity: all of them aim at simplifying a problem or its solution. This common striving for simplicity has sometimes generated confusion. Even though it is true that simplification is the ultimate goal of abstraction, generalization, approximation, and reformulation, the workings of these mechanisms may nonetheless be very different from one another. In addition, as there is no precise definition of simplicity either, things become even more intricate, because different notions of simplicity may be implicitly or explicitly invoked.
In our approach we use information as a common denominator to all notions. This
allows a clear characterization of the various notions to be provided, even though only
focused on the problem of knowledge representation. With this choice, all the above
mechanisms can be described as acting on spaces containing states of a dynamical
system. The mechanism, be it abstraction or any of the others, is modeled as a process
moving from one state to another.
Concerning abstraction, it has been identified as a mechanism that handles system descriptions, providing information about the system itself, and modifying the amount of information provided by hiding or aggregating details. Only changes in the information are considered, so that it is not necessary to assess whether an entity is abstract or concrete; all that matters is a partial order among entities, generated by the more-abstract-than relation.
Abstraction is not viewed as a mapping between two existing spaces, but as a
generative process, which, starting from one space (called “ground”), generates the
other (the abstract one) with less information. As a consequence, in the more abstract
space there are only states with at least one antecedent in the ground one. In this case,
no “spurious” state [587] may appear.
Chapter 6
The KRA Model
Fig. 6.1 a A task to be performed (a query) requires both measurements (observations) from the
world and a theory. b In order to perform the task of detecting the presence of a person in a corridor,
a camera is used. The output of the camera (the observations) is processed by suitable algorithms
(theory)
As sketched in Fig. 6.1, the task (query) requires (at least) two sources of
information: measurements of observables in S, obtained through a set of sensors,1
and a task-dependent theory, namely a priori information about the structure of S,
its functioning, and its relations with the rest of the world. An additional source of
information may include some general background knowledge. As we will see in the
following, tracking the sources of the information is very important. Q may assume
various formats, according to the nature of S. For example, in symbolic systems
Q may be a closed logical formula (a sentence) to be proved true, or an open one
(a “concept”) whose extension has to be found. In continuous systems Q may be a
set of variables whose values have to be computed. Analogous observations can be made for the measurements and the theory, which must comply with the format of
the query.
Concerning the source of all the components in Fig. 6.1a, the measures clearly
come from the world, whereas the theory and the query itself are usually provided
by a user, who will receive the answers. For the moment we just consider abstraction
from the representation point of view, leaving a discussion of the interaction between
theory and observations for a later chapter.
1As already mentioned, the term “sensor” has to be intended in a wide sense, not only as a physical
mechanism or apparatus. Acquiring information consists in applying a procedure that supplies the
basic elements of the system under consideration.
6.1 Query Environment, Description Frame, and Configuration Space
Let us start from the query. As already said, the query Q represents the task to
be performed on a system S, and it may assume different formats. The query is
provided by the user, and, in order to answer it, we need to reason and/or execute
some procedure on data observed in S.
The choice of the sensors Σ (either natural or manmade measurement apparata) needed to acquire information about S biases all that we can know of S, both directly (through measurements) and indirectly (through inference). The outputs from the sensors are the observations. We assume that observations consist of the specification
of the objects that can be present in S, of the values of some attributes of the objects,
of functional relations among objects (functions), and of some inter-relationships
among sets of objects (relations). We are now in a position to introduce the notion of description frame.
Definition 6.1 (Description frame) Given a set of sensors Σ, let ΓTYPE, ΓO, ΓA, ΓF, and ΓR be the sets of all types, identifiers of objects, attributes, functions, and relations, respectively, potentially observable in a system S by means of Σ. The description frame of Σ is the 5-ple Γ = ⟨ΓTYPE, ΓO, ΓA, ΓF, ΓR⟩.
The set ΓO includes labels of the objects that can possibly be detected by Σ in
the system S. Here, the notion of object is considered as an undefined primitive, and
we rely on an intuitive definition of objects as elementary units (physical objects,
images, words, concepts, . . .) appearing in the system S to be described. Objects
are typed, i.e., assigned to different classes, each one characterized by potentially
different properties. For instance, an object can be of type human, whereas another
of type book. If no specific type is given, then objects will be of the generic type
obj. We will denote by ΓTYPE the set of types that objects can have, and by ΓO,t
the subset of objects of type t ∈ ΓTYPE.
The set of attributes ΓA = {(A1, Λ1), (A2, Λ2), . . . , (AM, ΛM)} consists of descriptors of the objects. Each attribute Am (1 ≤ m ≤ M) may take values either in a discrete set Λm = {v1^(m), . . . , v|Λm|^(m)}, or in a continuous one, i.e., a (proper or improper) subset of the real axis R, namely Λm = [a, b] ⊆ R. When suitable, we will consider the type of the objects as an attribute A0, whose domain is Λ0 = ΓTYPE, so that |Λ0| is the number of types. For each type of objects only a subset of the attributes defined in ΓA is usually applicable. Let ΓA,t = {(A1^(t), Λ1^(t)), . . . , (AMt^(t), ΛMt^(t))} ⊆ ΓA be the subset of attributes applicable to objects of type t. The set Λi^(t) ⊆ Λi (1 ≤ i ≤ Mt) is the set of values that objects of type t can take on. Let |ΓA,t| = Mt ≤ M. Associating the attributes and their domains to the types also has the advantage of allowing some specific values, characteristic of a type,
to be specified. For instance, given the attribute Color with domain ΛColor =
{yellow, red, white, orange, pink, blue, green}, we may associate to flowers of type poppy the attribute (Color^(poppy), {red}). This is an easy
way to represent ontologies, where the attributes characterizing each node can be
specified.
The set ΓF = {f1, f2, . . . , fH} contains some functions fh (1 ≤ h ≤ H), with arity th, such that:

fh : DOM(fh) → CD(fh)
The domain DOM(fh) contains a set of th-ples, each one with an associated value in the co-domain CD(fh). We assume that all arguments of fh take values in ΓO, so that DOM(fh) = ΓO^th. The co-domain can be either ΓO or another discrete or continuous value set. Notice that functions are, at this point, only empty shells, because ΓO
value set. Notice that functions are, at this point, only empty shells, because ΓO
contains only a set of identifiers, as previously mentioned. Then, function fh has to
be intended as a procedure that, once the values of its tuple of arguments is actually
instantiated, associates to the tuple a value in CD(fh ). As an example, let Mother:
ΓO → ΓO be a function. The semantics of this function is that, once the identifier
of a particular person x is given, the procedure Mother provides the identifier of x’s
mother.
Finally, the set ΓR contains some relations Rk (1 ≤ k ≤ K), each one of arity tk, such that:

Rk ⊆ ΓO^tk
Each argument of a relation can only take values in ΓO . As it happens for functions,
also Rk is a procedure that, given an instantiation of the tuple of its arguments, is able
to ascertain whether the tuple satisfies the relation. As an example, let us consider
relation RFatherOf ⊆ ΓO × ΓO . To each pair (x1 , x2 ) of persons, RFatherOf determines
whether x1 is actually the father of x2 .
When an attribute, function, or relation is not applicable to an object (or set of objects), we denote its value by NA (Not Applicable). It is also possible that some value, even though applicable, is not known; in this case we set it to UN (UNknown).
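Definition 6.1 translates naturally into a small data structure; the Python sketch below is our own rendering (the field names and the example values mirror Example 6.1 further on, and are not prescribed by the model).

from dataclasses import dataclass

NA, UN = "NA", "UN"  # not-applicable / unknown markers

@dataclass
class DescriptionFrame:
    """A rendering of the 5-ple Γ = ⟨Γ_TYPE, Γ_O, Γ_A, Γ_F, Γ_R⟩."""
    types: set[str]              # Γ_TYPE
    objects: dict[str, str]      # Γ_O: object identifier -> its type
    attributes: dict[str, set]   # Γ_A: attribute name -> value domain Λ_m
    functions: dict[str, int]    # Γ_F: function name -> arity t_h ("empty shells")
    relations: dict[str, int]    # Γ_R: relation name -> arity t_k

gamma = DescriptionFrame(
    types={"obj"},
    objects={"a": "obj", "b": "obj", "c": "obj"},
    attributes={"A1": {0, 1}, "A2": {True, False}},
    functions={"f": 1},          # f: Γ_O -> Γ_O, still uninstantiated
    relations={"R": 2},          # R ⊆ Γ_O × Γ_O
)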
The description frame Γ generates the totality of the descriptions that can be formed with the elements specified in it (i.e., objects, attributes, functions, and relations). The set of these descriptions is the configuration space. In order to formally define the configuration space, we have to look in more detail at how a description can be built up with the elements of Γ.
First of all, any object has its type t associated to it, and has an identifier o ∈ ΓO,t. An object is described by a vector:

(o, t, v1^(t), . . . , vMt^(t))

Concerning functions, let FSET(fh) denote the set of all th-ples that can potentially satisfy fh; not all the tuples in FSET(fh) are usually observed, but only a subset of v of them (0 ≤ v ≤ |FSET(fh)|). Then, in order to capture the actual situation, it is useful to introduce the following definition.
Definition 6.2 (FCOV) Given a function fh of arity th, let FCOV(fh) be a cover of fh, namely a set of tuples satisfying the function. Then:

FCOV(fh) ⊆ FSET(fh)
An analogous reasoning can be made for relations. We define by RSET(Rk) the set of all tk-ples (x1, x2, . . . , xtk) that can potentially verify Rk in a system with N objects. In analogy with functions, we introduce the following definition:
Definition 6.3 (RCOV) Given a relation Rk of arity tk, let RCOV(Rk) be a cover of Rk, namely a set of tk-tuples (x1, . . . , xtk) satisfying Rk. It is:

RCOV(Rk) ⊆ RSET(Rk)

A configuration, namely a complete description of the system, is then a triple:

ψ = ⟨{(on, tn, v1^(tn), . . . , vMtn^(tn)) | 1 ≤ n ≤ N}, {FCOV(fh) | 1 ≤ h ≤ H}, {RCOV(Rk) | 1 ≤ k ≤ K}⟩
The description frame and the configuration space are defined before any observation is made in the world. Let us now consider a system S, and let us collect all
the measures performed on it in a structure, called a P-Set, and denoted P. The
name comes from the fact that P is a perception, i.e., it contains the measures and
information “perceived” in the world. As we assign a primary role to P, we call our
model of abstraction “perception-based”.
Definition 6.5 (P-Set) Given a system S and a set of sensors Σ, let Γ be its
associated description frame, and Ψ the corresponding configuration space.
A P-Set P, containing the specific observations made on the system S, is a 4-ple P = ⟨O, A, F, R⟩, where O is the actual set of identifiers of (typed) objects observed in S, and A, F, and R are specific instantiations, on the actual objects belonging to O, of the attributes, functions, and relations defined in Γ.
In particular:

A = {(on, tn, vj1^(tn)(on), . . . , vjMtn^(tn)(on)) | 1 ≤ n ≤ N},

F = {FCOV(f1), . . . , FCOV(fH)}

Analogously:

R = {RCOV(R1), . . . , RCOV(RK)}
The relation between a P-Set and a configuration lies in the possibility of leaving
some values unspecified. If no UN appears in a P-Set, then the P-Set is exactly one configuration. If some values are set to UN, then the P-Set corresponds to a set of configurations, precisely the set of all those configurations obtained by replacing each UN with any legal value.
In order to clarify the links between the description frame, the configuration space,
and a P-Set, let us introduce a simple example.
Example 6.1 Let Σ be a set of sensors allowing N objects, all of the same type, to be observed in a system S. Then, ΓTYPE = {obj} and ΓO = {o1, . . . , oN | N ≥ 1}. The set ΓA = {(A1, {0, 1}), (A2, {true, false})} includes two attributes with sets of values Λ1 = {0, 1} and Λ2 = {true, false}, respectively. The set of functions, ΓF = {f : ΓO → ΓO}, includes a single function, and the same holds for the set of relations, namely ΓR = {R(x, y) ⊆ ΓO^2}.
In this simple case we can find all possible configurations. The possible combinations of attribute values are four; each one can be assigned to any of the N objects, obtaining the set ΨA(N). Hence, |ΨA| = 4^N. In the description of objects, the type has been omitted, as it is the same for all.
2 Notice that the names of objects are unique, so that they are the key to themselves.
Since FSET(f) contains N pairs (x, f(x)), a cover FCOV(f) can be any subset of them, and then:

|ΨF| = Σ_{v=0}^{N} (N choose v) = 2^N
Analogously, RSET(R) contains the N² pairs of ΓO × ΓO, and then:

|ΨR| = Σ_{v=0}^{N²} (N² choose v) = 2^(N²)
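Putting the three counts together — on the assumption, suggested by the definition of configuration, that the attribute, function, and relation components vary independently — a worked check for N = 3 gives:

\[
|\Psi| = |\Psi_A|\cdot|\Psi_F|\cdot|\Psi_R| = 4^N \cdot 2^N \cdot 2^{N^2}
= 4^3 \cdot 2^3 \cdot 2^9 = 64 \cdot 8 \cdot 512 = 262144 \quad (N = 3)
\]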
Suppose now that we take only partial observations of S. We may obtain, for instance, the following P-Set:

P = ⟨ A = {(a, 1, UN), (b, 0, true), (c, 0, false)},
      FCOV(f) = {(b, UN)},
      RCOV(R) = {(a, b)} ⟩
P corresponds to the set of six configurations {ψ1, ψ2, ψ3, ψ4, ψ5, ψ6}, where:

ψ1 = ⟨{(a, 1, false), (b, 0, true), (c, 0, false)}, {(b, a)}, {(a, b)}⟩
ψ2 = ⟨{(a, 1, false), (b, 0, true), (c, 0, false)}, {(b, c)}, {(a, b)}⟩
ψ3 = ⟨{(a, 1, false), (b, 0, true), (c, 0, false)}, {(b, b)}, {(a, b)}⟩
ψ4 = ⟨{(a, 1, true), (b, 0, true), (c, 0, false)}, {(b, a)}, {(a, b)}⟩
ψ5 = ⟨{(a, 1, true), (b, 0, true), (c, 0, false)}, {(b, c)}, {(a, b)}⟩
ψ6 = ⟨{(a, 1, true), (b, 0, true), (c, 0, false)}, {(b, b)}, {(a, b)}⟩
Obviously, one of the possible configurations coincides with the exact one, namely ψ2.
A description frame more sophisticated than the previous one is described in the
following example.
Example 6.2 Let Σ be a set of sensors that recognize geometric objects in a plane, their attributes, and their relative positions. We define a description frame3

Γ = ⟨ΓTYPE, ΓO, ΓA, ΓF, ΓR⟩, with ΓTYPE = {point, segment, figure}.
Objects of type point do not have dimensions, objects of type segment are uni-dimensional, whereas objects of type figure are 2-dimensional.
The sensors provide four attributes, i.e., ΓA = {(Color, ΛColor), (Shape, ΛShape), (Size, ΛSize), (Length, ΛLength)}. Color captures the wavelength reflected by the objects, and the corresponding sensor is able to distinguish four shades:

ΛColor = {black, red, blue, green}

Attribute Shape captures the spatial structure of the objects, and can distinguish among four values (the set ΛShape).
Attribute Size captures the spatial extension of the objects, and can assume three
values:
ΛSize = {small, medium, large}
3 We may notice that the perception of the world does not provide names to the percepts, but it
limits itself to register the outcomes of a set of sensors Σ, grouping together those that come from
the same sensors, and classifying them accordingly. This is an important point, because it allows
the information about a system to be decoupled from its linguistic denotation; for instance, when we
see an object on top of another, we capture their relative spatial position, and this relation is not
affected by the name (ontop, under, supporting, . . .) that we give to the relation itself. Or, if we see
some objects all the same color (say, red) we can perceptually group those objects, without knowing
the name (“red”) of the color, nor even that the observed property has name “color”. The names are
provided from outside the system.
Attribute Length captures the linear extension of the objects, and can assume positive
real values:
ΛLength = R+ .
The set R contains the real numbers, of type real. This type does not come from the
observation process, but it is known a priori, and is part of the background knowledge
about the sensors.
Attribute Color is applicable to objects of type segment and figure, attributes
Shape and Size are applicable to objects of type figure, whereas attribute Length
is applicable to objects of type segment. Moreover, all segments are black, and no
figure is black. Then:
ΓA,point = ∅
ΓA,segment = {(Color, {black}), (Length, R+)}
ΓA,figure = {(Color, {red, blue, green}), (Shape, ΛShape), (Size, ΛSize)}
The functions Radius and Center capture functional links between objects of type figure and some of their elements. Finally, we consider three binary relations among objects. In the scene of Fig. 6.2 the following objects are observed:

Opoint = {A, B, C, D, E, F, G, H, O}
Osegment = {AB, BD, DC, CA, EF, HF, GH, GE, OP}
Ofigure = {a, b, c, d}
Out of the many possible combinations of values for the attributes Color, Shape and
Size, the following ones have been associated to the objects of type figure:
Fig. 6.2 A geometrical scenario with various geometrical elements [A color version of this figure
is reported in Fig. H.9 of Appendix H]
We may notice that the sides of triangle a have not been observed as single entities.
In the above assignments, ℓ, h, and r are numbers in R+. The functions Radius and Center are observed on a unique point of their domain; then, FCOV(Radius) = {(c, OP)}, FCOV(Center) = {(c, O)}, and F = {FCOV(Radius), FCOV(Center)}.
Finally, the covers of the observed relations are collected in the set R, completing the P-Set of the scene.

A further description frame can be defined for a linguistic domain, where the sensors detect the words of sentences and their grammatical roles. For instance:

ΛArticle-Kind = {definite, indefinite}
ΛNoun-Kind = {common, proper}
ΛVerb-Kind = {transitive, intransitive, auxiliary}
Oarticle = {a, the}
Overb = {adopting, mean, deny, . . . , depends}
The sensor set Σ is the source of any experience and information about a system
S under analysis, where concrete objects (the “real things”) reside. However, most
often the world is not really known, because we only have a mediated access to it,
through our “perception” (or some measurement apparata). Then, given a specific
system S, what is important, for an observer, is not the world per se, but how s/he
perceives it. During the act of perceiving the percepts “exist” only for the observer,
and only during the period in which they are observed. Their reality consists in
some stimuli generated in the observer. As an example, let us consider looking at a
landscape: as long as we look at it, the landscape “exists” for us, but when we turn our head or close our eyes, it is no longer there. Then, the simple perception of an
object is something that cannot be used outside the perception act itself.
In order to let the stimuli become available over time, they must become data,
organized in a structure DS. The first and very basic one can be the observer’s
memory, where stimuli of the same kind, coming from the same type of experience,
are put together: images with images, sounds with sounds, color with color, and so
on. The content of memory can be recalled, but can neither be shared with others nor
acted upon, as it is. Clearly, for an artificial agent, an “artificial” memory structure
must be considered. This structure is an extensional representation [545] of the
perceived world, in which those stimuli, perceptively related one to another, are
stored together. In the case of symbolic systems, information pieces can be stored in
tables. Then, the memory consists of a set of tables, i.e., a relational database scheme,
DS, where relational algebra operators can be applied.4 The query environment is unable to provide answers to the query without actual data; this is why we have introduced the notions of P-Set and of configuration space. The actual observations populate the
structure DS, generating an actual dataset D. If a relational database is used, then
DS is its scheme, whereas D is the populated database. Then, the relation between
DS and D is analogous to that between Γ and P.
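To make the analogy concrete, the following Python sketch (our own illustration, anticipating the procedure BUILD-DATA of Fig. 6.8, not the book's implementation) derives a relational scheme DS from a toy description frame; the dictionary layout and all names are assumptions made only for this example:

```python
# A minimal sketch: a description frame as plain Python data, and the
# relational scheme DS derived from it (one OBJ table, one t-ATTR table
# per type, one table per function and one per relation).
gamma = {
    "types": ["point", "segment", "figure"],
    "attributes": {"segment": ["Color", "Length"],
                   "figure": ["Color", "Shape", "Size"]},
    "functions": {"Radius": 1, "Center": 1},     # name -> arity
    "relations": {"R_ontop": 2, "R_leftof": 2, "R_sideof": 2},
}

def build_data_scheme(gamma):
    """Mimic BUILD-DATA: derive the scheme DS from the frame Gamma."""
    ds = {"OBJ": ["ID", "Type"]}
    for t, attrs in gamma["attributes"].items():
        ds[t.upper() + "-ATTR"] = ["ID"] + attrs
    for f, arity in gamma["functions"].items():
        ds[f.upper()] = [f"X{i}" for i in range(1, arity + 1)] + [f]
    for r, arity in gamma["relations"].items():
        ds[r.upper()] = [f"X{i}" for i in range(1, arity + 1)]
    return ds

print(build_data_scheme(gamma))
```

Populating these tables with actual observations then yields the database D, exactly as Γ, once observed, yields P.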
Data can be accessed by the observer, but cannot be communicated to other agents,
nor reasoned upon. To this aim, it is necessary to assign to the elements of Γ , and, as
a consequence, to the elements of DS, names which are both sharable among users
and meaningful to them. These names constitute a vocabulary V of a language L,
which has a double role: on the one hand, it offers an intensional view of the perceived
world, as a single symbol can stand for a whole table. On the other, it offers the
building blocks for expressing a theory. Notice that L must be able to express all the
information specified by DS.
Even though there may be a large choice of languages, depending on the nature
of the system under consideration and of its properties, we will concentrate here
on (some subset of) Predicate Logic.5 Hence, the elements of V enter the definition
of a language L = ⟨C, X, O, P, F⟩. In L, C is a set of constants associated, in
a one-to-one correspondence, with the objects in ΓO (namely, CO ), and with the
elements of Λ = ⋃_{m=1}^{M} Λm (namely, CA ). If continuous attributes exist, then the
set R is considered as well. X is a set of variables. F is the set of names of functions,
associated, in a one-to-one mapping, to the functions in ΓF .
For the set P of predicates, things are a little more complex. The predicates are
the basic elements that allow the theory to be expressed and inferences to be made.
Then, they should be able to describe in an intensional way DS. In this case data are
to be expressed as ground logical formulas to be manipulated by a logical engine.
For this reason, the set P is the union of four subsets, each corresponding to one
component of the P-Set:
P = PTYPE ∪ PA ∪ PF ∪ PR
The set PTYPE contains predicates referring to the types of objects present in the
system, namely:
PTYPE = {type(x)|∀ type ∈ ΓTYPE }
The set PA contains predicates referring to the values of attributes that can be assigned
to objects:
PA = ⋃_{m=1}^{M} {am (x, v) | v ∈ Λm }
The meaning of am (x, v) is “object x has value v for attribute Am ”. The set PF
contains a predicate for each fh ∈ ΓF of arity th :
PF = ⋃_{h=1}^{H} {fh (x1 , . . . , xth , y)}
The meaning of fh (x1 , . . . , xth , y) is “y is the value of fh (x1 , . . . , xth )”. The set PR
contains a predicate for each Rk ∈ ΓR of arity tk :
PR = ⋃_{k=1}^{K} {rk (x1 , . . . , xtk )}
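As an illustrative sketch (ours, reusing the hypothetical frame encoding of the previous sketch), the four predicate families can be enumerated mechanically from Γ:

```python
# Sketch: derive the predicate set P = P_TYPE, P_A, P_F, P_R (name -> arity)
# from a description frame stored as in the previous sketch.
def build_predicates(gamma):
    p_type = {t: 1 for t in gamma["types"]}                   # type(x)
    p_attr = {a.lower(): 2 for attrs in gamma["attributes"].values()
              for a in attrs}                                 # a_m(x, v)
    p_fun = {f.lower(): k + 1 for f, k in gamma["functions"].items()}
    p_rel = {r.lower(): k for r, k in gamma["relations"].items()}
    return {**p_type, **p_attr, **p_fun, **p_rel}

# e.g. build_predicates(gamma)["radius"] == 2, since radius(x, y)
```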
For the sake of exemplification, let us show, in the next example, how a DS can
be obtained once Γ is defined, in the case in which DS is a relational database.
Example 6.5 Let us consider again the description frame introduced in Example 6.2,
and let us build up the corresponding data structure DS.
The first table to be defined is the OBJ table, assigning a unique identifier to each
object considered in the scene, specifying, at the same time, its type. The scheme
of this table is then OBJ = [ID, Type]. As objects of different types have usually
different attributes associated to them, we define a table of attributes for each type.
With respect to a single table with all objects and all attributes, this choice has the
advantage that it avoids considering a possibly large number of entries with value NA.
As a consequence, we define a set of tables t-ATTR (∀t ∈ ΓTYPE ), each one of them
following the scheme t-ATTR = [ID, Aj1(t) , . . . , AjMt(t) ].
Regarding functions, each one generates a table corresponding to its cover; for
each function fh ∈ ΓF , a table F-H, with scheme F-H = [X1 , . . . , Xth , fh ], will be
created. The first th columns correspond to the arguments of fh , and the last one contains
the associated value of the function. In an analogous way, each relation Rk ∈ ΓR
is associated to a table representing its cover; more precisely, RK = [X1 , . . . , Xtk ],
where the columns correspond to the arguments of relation Rk .
When actual observations are taken, a specific description of a system is acquired
in P. As previously said, these observations are to be memorized in a populated
database D.
Example 6.6 Let Γ be the description frame introduced in Example 6.2. In Γ objects
are partitioned into three types, namely point, segment, and figure. Table OBJ
will thus be the one reported in Fig. 6.3. Objects of a given type can be extracted
from OBJ using the relational algebra selection operator; for instance, σType=figure (OBJ) returns the four rows of type figure.
Fig. 6.3 The table OBJ assigns to each object in the scene a unique identifier, ID, as well as its
type
Fig. 6.4 Tables SEGMENT-ATTR and FIGURE-ATTR, reporting the attribute values of the objects
of type segment and figure, respectively, occurring in the scenario. The objects of type point
do not have attributes associated, and hence there is no corresponding table. The segment OP does
not have a color, as it is not a true segment, but it only denotes the radius of the circle c. The values
ℓ, b, h, r stand for generic real numbers
Fig. 6.5 Tables RADIUS and CENTER, corresponding to the FCOV s of the functions Radius and
Center, defined in the scenario. Each function is unary, i.e., it has arity 1
Fig. 6.6 For each relation in the set ΓR = {Rontop , Rleftof , Rsideof } a table has been built up to
collect the tuples satisfying them. Each of these tables is an RCOV
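In code, the selection mentioned above is a one-line filter; the following sketch (ours, not the book's machinery) applies σ with Type = figure to an OBJ table stored as a list of rows:

```python
# Sketch: the OBJ table of Fig. 6.3 as a list of dicts, and the relational
# algebra selection operator sigma applied to it.
OBJ = ([{"ID": o, "Type": "point"} for o in "ABCDEFGHO"] +
       [{"ID": o, "Type": "segment"}
        for o in ["AB", "BD", "DC", "CA", "EF", "HF", "GH", "GE", "OP"]] +
       [{"ID": o, "Type": "figure"} for o in "abcd"])

def select(table, **conditions):
    """sigma: keep the rows satisfying all the given equality conditions."""
    return [row for row in table
            if all(row[k] == v for k, v in conditions.items())]

print(select(OBJ, Type="figure"))   # the rows of a, b, c, d
```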
For the scenario at hand, the set PA contains the predicates referring to the values of attributes
that can be assigned to objects, while the set PF contains the predicate radius(x, y), with the semantics “object x has radius y”, and
the predicate center(x, z), with the semantics “object x has center z”.
Finally, the set PR contains predicates associated to each Rk ∈ ΓR .
The semantics of the predicate ontop(x, y) is that “object x is located on top of object
y”, the one of the predicate leftof(x, y) is that “object x is located to the left of object
y”, and the semantics of the predicate sideof(x, y) is that “object x belongs to the
contour of object y”.
All the above introduced predicates represent what can be said with the language
L referring to the chosen P-Set. The actual instantiations of the predicates that are
true in P are the following ones (for the P of Example 6.3):
The above ground atoms form the subset of the Herbrand base containing the
predicates true in S.
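A small sketch (again with our own, hypothetical table encoding) shows how the populated tables can be turned into the ground atoms true in P:

```python
# Sketch: generate ground atoms from the populated tables; values equal
# to "UN" are skipped, since they carry no information.
def ground_atoms(obj_table, attr_tables):
    atoms = [f'{row["Type"]}({row["ID"]})' for row in obj_table]
    for rows in attr_tables.values():
        for row in rows:
            for attr, val in row.items():
                if attr != "ID" and val != "UN":
                    atoms.append(f'{attr.lower()}({row["ID"]}, {val})')
    return atoms

figure_attr = [{"ID": "a", "Color": "red", "Shape": "triangle", "Size": "UN"}]
print(ground_atoms([{"ID": "a", "Type": "figure"}],
                   {"FIGURE-ATTR": figure_attr}))
# ['figure(a)', 'color(a, red)', 'shape(a, triangle)']
```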
The examples reported above illustrate what a description frame looks like, inde-
pendently from any task. Then, the aspects of a system captured by it may or may
not be relevant, as shown in the following example.
Example 6.8 Let us suppose that, in the scenario composed of geometric elements,
each figure x has associated the ratio ζ(x) between its contour length and its surface
area. This ratio has dimension [length−1 ]; it does not matter what unit is used to
measure length, but this unit has to be the same for all figures. We want to answer
the query:
Q = Argmax_{x ∈ Ofigure} ζ(x)
The answer is an object (or a set of objects) o∗ ∈ Ofigure , whose ratio ζ(o∗ ) is the
maximum over all objects in Ofigure .
In order to answer the query, we need to define, first of all, the functions Area(x),
Contour-length(x), and ζ(x), which provide, respectively, the area of a figure, the
length of its contour (i.e., the perimeter for a polygonal figure, and the circumference
for a circle), and the ratio between the latter and the former.
The function Sum(x, y) (Sum : R² → R) computes the sum of the two
numbers x and y, whereas predicate diff(x, y) states that x and y are to be bound to
different constants. In an analogous way, the constants 2 and 4 are integer numbers
that belong to N, and have type natural, whereas π ∈ R and has type real.
Finally, the function ζ is simply defined as ζ(x) = Divide(Contour-length(x), Area(x)).
Summing up, the descriptive elements needed to answer Q are:
Types = {figure, segment}
Attributes = {Shape, Length}
Functions = {Radius(x), Area(x), Contour-length(x), ζ(x), Prod(z, w),
Power2(z), Sum(z, w), Divide(z, w)}
Relations = {Rsideof , Rbaseof , Rheightof , Rdiff }
Both the types and the attributes are to be inserted in Γ , because the type of an object
and its attribute values must be observed in the world. Then:
ΓTYPE = {figure, segment} and ΓA = {(Shape, ΛShape ), (Length, ΛLength )}
The needed functions are in part to be observed, and in part are given a priori. More
precisely, the function Radius(x) must be observed, i.e.,
ΓF = {Radius(x)},
whereas the remaining functions are inserted into the theory T . In fact, they are
either computed from more elementary information, or provided by the background
knowledge. Then:
T = {Area(x), Contour − length(x), ζ(x), Prod(z, w),
Power2(z), Sum(z, w), Divide(z, w)}
Concerning the relations, three of them must be observed, whereas Rdiff is only added
to the theory:
ΓR = {Rsideof , Rbaseof , Rheightof }
T = T ∪ {Rdiff }
All the introduced descriptive elements are inserted into the language L = ⟨C, X, O, P, F⟩.
When a specific P-Set is observed, the corresponding sensor outcomes are inserted
into the database D. The semantics of the functions and relations not grounded on
the observations (such as Rdiff or Prod(x, y)) is considered implicitly given.
Finally, we have to provide, in the theory, the means to answer the query, namely,
the algorithm SOLVE(Q, QE), reported in Fig. 6.7. SOLVE returns ANS(Q), i.e.,
the set of objects whose ζ’s value is the largest. Notice that more than one object
may have the same maximum value of ζ.
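A sketch of SOLVE for this particular Q (ours; it assumes ζ has been made computable, via the theory T, for every observed figure, and the name zeta is hypothetical) could be:

```python
# Sketch: SOLVE for Q = Argmax of zeta over the observed figures.
# Returns ANS(Q): possibly more than one object attains the maximum.
def solve(figures, zeta):
    best = max(zeta(x) for x in figures)
    return {x for x in figures if zeta(x) == best}

# Usage: solve({"b", "c", "d"}, zeta), with zeta(x) = contour(x) / area(x)
```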
By considering Example 6.8, we may notice that the description frame Γ , chosen
in Example 6.2, has some measures, such as the color, that are not useful to solve
the task described in Example 6.8, and then, are irrelevant. On the other hand, the
base and the height of triangle a (see Fig. 6.2) are not observed, so that the query
of Example 6.8 cannot be solved exactly. Then, in Γ of Example 6.2 some relevant
aspects of the problem have been overlooked. In order to solve a task, all and only
the relevant aspects of the problem at hand should be captured. In practice, it may
not be possible to obtain a perfect match between query and observations, so that the
query might be solved only partially, or approximately, or not at all. Unfortunately,
it is not always possible to select observations and theory in a coherent way. Then,
one usually tries to collect more information than the one actually used.
Fig. 6.8 Procedure BUILD-DATA automatically generates the database scheme DS starting
from Γ , and the database D starting from P
Fig. 6.9 After generating the database scheme DS , algorithm BUILD-LANG(DS ) constructs the
language L
Pr(ψ) = (1/Z) e^{−E(ψ)/(kB T)}
where E(ψ) is the “energy” of ψ, kB is Boltzmann’s constant (kB =
1.38066 · 10⁻²³ J/K), and T is a “temperature”. For such a situation, it makes sense
to speak of the entropy S of ψ and related notions. In this book, however, we only
consider the deterministic case, where no probability is assigned to the outcomes of
the sensors.7 On the other hand, observations may not identify exactly the state of
system S, if some value is UN (unknown).
In order to introduce our definition of abstraction, based on information reduction,
we need first to make more precise the notion of information that we will rely upon.
Luckily enough, we do not need to come up with an absolute value of information, but
we only need some tool to determine whether, in some transformation, a reduction
of information occurred. In order to reach this goal, we make use of the relationship
between informativeness and generality discussed in Sect. 5.2.1.
Given a set of sensors, Σ, used to observe a system S, let Γ and Ψ be the
description frame and the configuration space associated to Σ, respectively. Ψ usually
contains a large number of states. When we apply Σ to S, our ignorance about the
state of S is reduced, because we gather, in a P-Set P, some information about S.
In the ideal case, when no variable takes on the UN (unknown) value, a single state
is left, and the system’s state is perfectly identified, i.e., P corresponds to a unique
configuration (or state) ψ ∈ Ψ . On the contrary, if some of the variables in P assume
the value UN, P selects a subset of states in Ψ . Then, we can say that, in general,
P ⊂ Ψ . We have now to introduce some definitions.
Definition 6.7 (State compatibility) Given a configuration space Ψ , containing the
possible descriptions of a system S, and an actual set of observations P, a config-
uration ψ ∈ Ψ is compatible with P iff no value in ψ contradicts any of the values
specified by P for the variables. If a variable x (an attribute, a function’s argument, . . .)
may take value in Λ but has, instead, a value UN in P, then any value in Λ for x is
compatible with UN.
Definition 6.8 (COMP) Given a description frame Γ and the configuration
space Ψ associated to the set of sensors Σ, let COMP(P) be the subset of configu-
rations (states) in Ψ that are compatible with an observed P-Set P.
When P is completely specified, the corresponding COMP(P) contains a unique
configuration, the state ψ corresponding to P itself. We can now introduce a funda-
mental definition.
Definition 6.9 (Same space “Less-Informative-Than” relation between P-Sets)
Given two P-Sets P1 and P2 , belonging to a configuration space Ψ , we will say that
P1 is less informative than P2 (denoted P1 ⪯ P2 ) iff COMP(P2 ) ⊂ COMP(P1 ). If
COMP(P1 ) ≡ COMP(P2 ), the two P-Sets are equally informative.
Definition 6.9 allows two P-Sets belonging to the same configuration space to be
compared with respect to the amount of information they convey. Two configurations
ψ1 and ψ2 in the same space are either equally informative, if they coincide, or
incomparable if they are distinct.
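These definitions are directly executable on small, finite configuration spaces. The following sketch (an illustration under our own encoding of configurations as value tuples, one position per variable) computes COMP(P) and the less-informative-than test:

```python
from itertools import product

# Sketch: a configuration is a tuple of values; a P-Set is such a tuple
# where some positions may hold the unknown value "UN".
def comp(p, domains):
    """COMP(P): all configurations compatible with the observations P."""
    choices = [dom if v == "UN" else [v] for v, dom in zip(p, domains)]
    return set(product(*choices))

def less_informative(p1, p2, domains):
    """P1 less informative than P2 iff COMP(P2) is a proper subset."""
    return comp(p2, domains) < comp(p1, domains)

domains = [["red", "blue", "green"], ["small", "large"]]
print(less_informative(("UN", "small"), ("red", "small"), domains))  # True
```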
Suppose now that we observe a system S with a given set of sensors, Σg , which
defines a configuration space Ψg . Let QE g = ⟨Qg , Γg , DS g , Tg , Lg ⟩ be the query
environment built on Σg . Abstracting amounts to generating a new description frame Γa ,
different from Γg ; in this way the model KRA offers a unifying and operational
framework to previous models of abstraction.
7 The probabilistic setting is a possible extension of the KRA model, very briefly mentioned in
Chap. 10.
We will now formalize the concepts introduced in the above discussion. Let us
start from the process used to generate Γa .
Definition 6.10 (Generative process) Given a description frame Γg , with the asso-
ciated configuration space Ψg , let Π be a process that generates Γa starting from Γg
(denoted Γg →Π Γa , or Γa = Π (Γg )), and the configuration space Ψa starting from
Ψg (denoted Ψg →Π Ψa , or Ψa = Π (Ψg )), respectively. Π is said to be a generative
process for (Γa , Ψa ) with origin (Γg , Ψg ).
Process Π acts on the description frame, before any observation is made. It is
sufficient to define the kind of abstraction one wants to perform, but not to fill in
all the details that are needed for actually abstracting a set of observations Pg . In
fact, Π establishes what are the descriptive elements that a simplified set of sensors
allows to be observed on a system S, independently of any actual S. In addition,
we have to provide a program which implements the modifications defined by Π .
For instance, Π may state that two ground functions f1 and f2 collapse to the same
abstract function f . This is sufficient to define Γa . However, when implementing this
abstraction on a Pg we need to specify what value f shall take on. The implementation
program is embedded in Π , as will be discussed in the next chapter. By assuming
that the program is given, we can use the notation ψg →Π ψa to indicate that a ground
configuration ψg can be transformed into a more abstract one using the program
embedded in Π . Then, we can introduce the definition that follows.
Definition 6.11 (Configuration space generation) The notation Ψg →Π Ψa , or Ψa =
Π (Ψg ), is equivalent to saying that:
Ψa = {ψa | ∃ψg ∈ Ψg : ψg →Π ψa }
Notice that all ψa ∈ Ψa have at least one source ψg in Ψg , and possibly more than
one. Concerning P-Sets, we use the following definition:
Definition 6.12 (P-Set generation) A P-Set Pa is obtained from a P-Set Pg through
Π in the following way:
Pa = {ψa | ∃ψg ∈ Pg : ψg →Π ψa }
Fig. 6.10 Graphical illustration of the link between Ψg and Ψa . The P -Set Pg contains a single
configuration ψg ; then, COMPg (Pg ) = ψg . The transformed Pa has COMPa (Pa ) = ψa in the
space Ψa . Given Pa , more than one configuration in Ψg is compatible with it. Then, COMPg (Pa )
is a proper superset of ψg
The compatibility set of an abstract configuration ψa is the set of its possible ground sources:
COMPg (ψa ) = {ψg | ψg →Π ψa }
A graphical illustration of COMPg (ψa ) is reported in Fig. 6.10. We can extend the
notion of compatibility set from configurations to P-Sets as follows.
Definition 6.15 (Compatibility set for P-Sets) Given a generative process Π and
an “abstract” P-Set Pa , the compatibility set COMPg (Pa ) of Pa is the set of ground
configurations which are compatible with Pa , i.e.:
COMPg (Pa ) = ⋃_{ψa ∈ Pa} COMPg (ψa )
Given a generative process Π that transforms Pg into Pa , the quantity
ξ(Pa , Pg ) = log₂ (|COMPg (Pa )| / |COMPg (Pg )|)
is the abstraction ratio of the transformation. The values of ξ(Pa , Pg ) are always
positive, and higher values correspond to higher degrees of abstraction. This ratio
is meaningful only for P-Sets.
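Under the same finite encoding used earlier, the ratio is immediate to compute (a sketch, with our own function names and toy sets):

```python
from math import log2

# Sketch: abstraction ratio xi(Pa, Pg) from the two ground compatibility
# sets; since COMPg(Pg) is contained in COMPg(Pa), the ratio is >= 0.
def abstraction_ratio(comp_g_pa, comp_g_pg):
    return log2(len(comp_g_pa) / len(comp_g_pg))

print(abstraction_ratio(set(range(8)), set(range(2))))   # 2.0
```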
Before proceeding any further, we provide an example to informally illustrate the
concepts introduced so far.
Example 6.9 Let us consider a camera Σ, which provides pictures of resolution
256 × 256 pixels, each with a gray level in the integer interval [0, 255]. Objects are
all of type pixel, and they have three attributes associated to them, namely the
X coordinate, with domain ΛX = [1, 256], the Y coordinate, with domain ΛY =
[1, 256], and the intensity I, with domain ΛI = [0, 255]. Neither functions nor
relations are considered. We can define the following description frame:
Γg = ⟨{pixel}, {pi,j | 1 ≤ i, j ≤ 256}, {(X, ΛX ), (Y, ΛY ), (I, ΛI )}, ∅, ∅⟩
If we want to lower the resolution of the taken picture, we can aggregate non over-
lapping groups of four adjacent pixels to form one, called a square z, where square
is a new type of object. Then, the generated description frame is as follows:
Γa = ⟨{square}, {p(a)i,j | 1 ≤ i, j ≤ 128}, {(X (a) , {1, . . . , 128}), (Y (a) , {1, . . . , 128}),
(I (a) , {0, . . . , 255})}, ∅, ∅⟩
Several choices are possible for the attributes of the new squares; for instance, the abstract
X coordinate could be that of the leftmost ground pixel of the square, or the rightmost one. All these choices
are contained in the definition of the program associated to Π . One possibility is the
following:
Process Π (Pg )
for h = 0 to 127 do
  for k = 0 to 127 do
    z2h+1,2k+1 ← (p2h+1,2k+1 , p2h+2,2k+1 , p2h+1,2k+2 , p2h+2,2k+2 )
    p(a)h+1,k+1 = z2h+1,2k+1
    X (a) (p(a)h+1,k+1 ) = h + 1
    Y (a) (p(a)h+1,k+1 ) = k + 1
    I (a) (p(a)h+1,k+1 ) = (1/4) [I(p2h+1,2k+1 ) + I(p2h+2,2k+1 ) + I(p2h+1,2k+2 ) + I(p2h+2,2k+2 )]
  end
end
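The process is easy to reproduce with array code; the sketch below (our own, keeping the averaged intensity as a floating point value for simplicity) performs the same 2 × 2 aggregation:

```python
import numpy as np

# Sketch: Process Pi as NumPy code; each non-overlapping 2x2 block of
# pixels becomes one abstract pixel whose intensity is the block mean.
def downsample(img):                      # img: (256, 256) intensities
    blocks = img.reshape(128, 2, 128, 2)  # pair up rows and columns
    return blocks.mean(axis=(1, 3))       # I^(a) = mean of 4 intensities

img = np.random.randint(0, 256, (256, 256))
assert downsample(img).shape == (128, 128)
```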
Given a generic I (a) , the intensities of the original pixels satisfy the equation:
I1 + I2 + I3 + I4 = 4 I (a) (6.3)
One can then count the number n(I (a) ) of different 4-tuples of intensities in [0, 255] satisfying (6.3).
Fig. 6.11 The pixels in the grid are grouped four by four. Rules are given to compute the attributes
of the so formed squares
The number n(I (a) ) is very large, and the process is indeed an abstraction.
An abstraction operator is a special case of an abstraction process with just one
step. By concatenating or composing abstraction operators we obtain more complex
abstraction processes. Composition of abstractions, and their properties, have also
been considered, among others, by Giunchiglia and Walsh [214], and by Plaisted
[419].
Definition 6.22 (Abstraction process) An abstraction process is a simultaneous or
sequential composition of abstraction operators.
Abstraction processes will be considered in more detail in Chap. 7.
We may observe that abstraction processes can be applied repeatedly. In the fol-
lowing, when moving from one level of abstraction to the next, we may, for the sake
of simplicity, speak of a “ground” and of an “abstract” space. The sequential appli-
cation of abstractions makes in turn “ground” the space that was “abstract” before
and so on, obtaining a multi-level hierarchy of more and more abstract representation
frames. This possibility complies with a relative notion of abstraction, because one
representation is “abstract” only with respect to the previous ones.
Let us go back to the KRA model, and see what role an (elementary)
abstraction operator ω plays in it. By applying ω to Γg a new description frame Γa is
obtained. Γa specifies what descriptive elements can be used to represent systems
in the abstract space. The space Ψa is generated accordingly from Ψg . Given an
actual observation Pg of a system, the application of ω to it generates an abstract
“observation” Pa . We write, symbolically, Pa = ω(Pg ); the information hidden by ω,
i.e., the difference Δ between Pg and Pa , is memorized, so that Pg can be recovered
from Pa and Δ. Analogous operators act on the other components of the query environment:
Pa = ω(Pg )
DS a = δ(DS g )
La = λ(Lg )
Ta = τ (Tg )
The overall abstraction is thus described by the 4-tuple Ω = (ω, δ, λ, τ ).
In Fig. 6.12 the full abstraction model KRA is reported. There are two reasons why
we do not define an operator for the query Q: on the one hand, the query remains,
usually, the same, modulo some syntactic modifications dictated by the more abstract
language La . On the other hand, if another query (related to Qg ) is to be solved in
Fig. 6.12 The KRA model of abstraction. The reason why abstraction on Γg is set apart is that the
model assumes that abstraction must be defined first on Γg , by applying operator ω. In essence, the
basic act of abstracting is performed when selecting a particular set of descriptors for a system S .
In fact, after observation is performed, and a Pg is acquired, no other information comes from
the world. If the observations are too detailed, the user may decide to simplify them by removing
some of the information provided, obtaining thus a new, more abstract set of observations. An
automated procedure BUILD-DATA generates the component DS g starting from Γg , and BUILD-LANG
the component Lg . Once the more abstract Γa is obtained, the same procedures can build up DS a and
La starting from Γa and Pa . However, to avoid wasting computational effort, it is possible
to apply suitable operators to each pair of the corresponding QE ’s components. Theory Ta has to
be generated directly, because it is not derivable from Pa . The same is true for the query
the more abstract space, then only the user can do such a modification, and hence an
abstraction operator is not needed.
The role of the operators δ and λ is to avoid going through BUILD-DATA and
BUILD-LANG to build DS a and La starting from Γa . Instead, these operators can
be applied directly to the corresponding components of the observation frame. Notice
that Ta has to be generated directly from Tg by means of τ in any case, because it
cannot be derived from Γa . Of course, the operators acting on the various components
are not independent from one another.
Before moving ahead, we can make some comments about the whole approach.
As we may see from Fig. 6.12, the first and basic abstraction process actually takes
place in the transition from the “true” world to the “perceived” one. After that,
the construction of DS and L does not require any further abstraction, because DS
and L do not contain less information than the perception itself. The theory is not
derived from Γ , because it is independent. The schema of Fig. 6.12 can be contrasted
with the one of Fig. 6.13, which looks superficially quite similar to the former, but, in
fact, is orthogonal to it. The schema in Fig. 6.13 depicts a notion of abstraction based
on the stepwise process of moving away from the sensory world. As we have seen
in Chap. 2, this idea of abstraction is widely shared among many investigations on
abstraction, and it is intuitive and reasonable. However, it is not practical, especially
Fig. 6.13 The knowledge spectrum. (Reprinted with permission from Pantazi et al. [412])
Fig. 6.14 Application of an operator that aggregates two objects, one lying on top of the other,
into a new object, called a tower. In the left-side configuration the operator can be applied in two
mutually exclusive ways, namely forming s from a and b, or forming s from b and c
Notice that a single Pg may generate two Pa ’s, each one being an abstraction of Pg according
to Definition 6.19, but only one is actually performed.
In the next chapter abstraction operators will be classified and described in detail.
6.5 Summary
The KRA model is based on the acknowledgement that solving a problem (or
performing a task) usually requires two sources of information: observations and
“theory”. The observations have originally a perceptive connotation, and are, hence,
not immediately exploitable: they need to be transformed into “data”, i.e., struc-
tured information usable by the theory. The link between the data and the theory
is ensured by a language, which allows both the theory and the data to be expressed
for communication purposes. There may be a complex interplay between data and
theory, especially regarding their mutual compatibility and the order in which they
are acquired, which biases the obtainable solutions. The model is not concerned with
this interplay, but assumes that all the information that is needed to solve a problem
(i.e., the “ground” description frame Γg ) has been acquired in some way. Instead, the
model is aimed at capturing the transformations that Γg undergoes under abstraction,
namely when the information contained in it is reduced.
A description frame Γ defines a space of possible configurations Ψ , i.e., a space
of descriptions that can be applied to systems. Γ does not refer to any concrete
Fig. 6.15 The Rubik’s cube can be described in terms of the 26 small component cubes, which give
rise to the description frame Γ . Each arrangement of the cubes generates a specific configuration ψ;
the configuration set, Ψ , is very large. A configuration is a complete description of the positions
of the small cubes, so that it is unique. If Rubik’s cube is observed only partially, for instance by
looking only at one face, the observation corresponds to many configurations, each one obtained
by completing the invisible faces of the cube in a different way; in this case we have a P -Set P ,
which is a set of configurations. The query Q can be represented by a particular configuration to
be reached starting from an initial one [A color version of this figure is reported in Fig. H.12 of
Appendix H]
system, but only establishes what are the elements usable to describe one. When
an actual system is observed, the signals captured on it are collected in a P-Set P.
Abstraction is defined on Γ , and then it is uniformly applied to all the potentially
observed systems. The relations between the various elements involved in modeling
abstraction with KRA are illustrated in Fig. 6.15.
The KRA model allows reformulation to be distinguished from abstraction; in
fact, some transformations reduce the amount of information provided by a descrip-
tion, and some only change the form in which information is represented.
Abstraction is defined in terms of information reduction. This view of abstrac-
tion allows two configurations (descriptions) to be compared with respect to the
more-abstract-than relation, even though they may belong to different configuration
spaces. The information is not lost, but simply hidden or encapsulated.
An important aspect of the view of abstraction captured by KRA is that moving
across abstraction levels should be easy, in order to be able to try many abstractions,
when solving a problem. For this reason, all the hidden information is memorized
during the process of abstracting, so that it can be quickly retrieved.
Finally, only transformations generated by a precisely defined set of abstraction
operators are considered in the model. This is done to avoid the costly process of
checking the more-abstract-than relation on pairs of configurations.
Chapter 7
Abstraction Operators and Design Patterns
Abstraction operators can be subdivided into classes according to their basic func-
tioning. In particular, we consider four categories:
• Operators that mask information
– by hiding elements of a system description.
• Operators that make information less detailed
– by building equivalence classes of elements,
– by generating hierarchies of element descriptions,
– by combining existing elements into new ones.
The definitions given in this chapter concern the operator’s component ω that acts
on description frames.
In order to describe abstraction operators, we use the encapsulation approach
exploited in Abstract Data Types, by only providing formal definitions, and encap-
sulating implementation details inside the definitions. More precisely, we will use
Liskov and Guttag’s formalism [333] for Procedural Data Type (PDT), described in
Sect. 2.4 and reported (adapted to our approach) below:
begin NAME = proc ω
described as : % function and goal
requires : % identifies inputs
generates : % identifies outputs
method : meth[Pg , ω] (program that performs abstraction on Pg )
end NAME
The above schema describes, in an “abstract” way, what changes in the descrip-
tion frame, whereas the actual implementation on a perception P is realized by the
method meth.
Each instantiation of the PDT corresponds to a specific operator. Actually,
the procedural data type is nested, in that meth[Pg , ω] is in turn a PDT, as
will be described later on. In this chapter it is always assumed that oper-
ators of type ω take as input elements of a ground description frame Γg =
⟨ΓTYPE(g) , ΓO(g) , ΓA(g) , ΓF(g) , ΓR(g) ⟩ and give as output elements of an abstract descrip-
tion frame Γa = ⟨ΓTYPE(a) , ΓO(a) , ΓA(a) , ΓF(a) , ΓR(a) ⟩. Instead, meth[Pg , ω] describes
how ω has to be applied to any Pg to obtain Pa .
Operators of the first group hide an element of the description frame. The operator’s
generic name is ωh , and it may act on types, objects, attributes, functions or relations.
The class is described in the following PDT:
begin NAME = proc ωh
described as : Removing from view an element of Γg
requires : X (g) (set of involved elements)
y (element to hide)
generates : X (a) = X (g) − {y}
method : meth[Pg , ωh ]
end NAME
By instantiating X (g) and y, we obtain specific operators. In particular:
• ωhobj hides an object of a description frame,
• ωhtype hides a type of a description frame,
• ωhattr hides an attribute of a description frame,
• ωhfun hides a function of a description frame,
• ωhrel hides a relation of a description frame.
Among these operators, we will only detail the first and third ones.
If X (g) = ΓO(g) and y = o, the object with identifier o is no more part of the set
of objects that can be observed in a system. For the sake of notational simplicity, we
define:
ωhobj (o) =def ωh (ΓO(g) , o)
and we obtain:
ΓO(a) = ΓO(g) − {o}
The method meth[Pg , ωhobj (o)], applied to the observed description Pg of a system,
removes from view the object o, if it is actually observed, as well as all its attribute
values and its occurrences in the covers of functions and relations. An example is
reported in Fig. 7.1. “Removing from view” means that ωhobj (o) replaces in Pg every
occurrence of o by UN.
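A sketch of this behavior (with a hypothetical dictionary encoding of Pg chosen only for illustration):

```python
import copy

# Sketch of meth[Pg, omega_hobj(o)]: replace every occurrence of the
# object o in the P-Set by "UN" (objects, attribute rows, relation covers).
def hide_object(pg, o):
    pa = copy.deepcopy(pg)
    hide = lambda x: "UN" if x == o else x
    pa["objects"] = [hide(x) for x in pa["objects"]]
    for rows in pa["attributes"].values():
        for row in rows:
            row["ID"] = hide(row["ID"])
    for name, cover in pa["relations"].items():
        pa["relations"][name] = [tuple(hide(x) for x in t) for t in cover]
    return pa
```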
The reported operator is the most basic one. It is easy to think of extending it to
more complex situations, for instance to hiding a set of objects that satisfy a given
formula, for example to hide all “red objects”.
Fig. 7.1 Example of application of the method meth[Pg , ωhobj (o)]. In the right-hand picture
object o is hidden behind a cloud of smoke. Its shape, color, and position are hidden as well
If X (g) = ΓA(g) and y = (Am , Λm ), the attribute Am is no longer observable. We define:
ωhattr (Am , Λm ) =def ωh (ΓA(g) , (Am , Λm ))
Then:
ΓA(a) = ΓA(g) − {(Am , Λm )}
The second group of operators of this class hides a value taken on by a variable. Its
generic PDT is the following:
begin NAME = proc ωhval
described as : Removing from view an element of the domain of
a variable.
requires : X (g) (set of involved elements)
(x, Λx ) (variable and its domain)
v (value to hide)
generates : Λx(a) = Λx − {v}
method : meth[Pg , ωhval ]
end NAME
Fig. 7.2 Example of method meth[Pg , ωhattr (Am , Λm )]. The attribute Am = Color is hidden from
the left picture, giving a gray-level picture (right). Each pixel shows a value of the light intensity, but
this intensity is no longer distributed over the R, G, B channels [A color version of the figure is reported in
Fig. H.10 of Appendix H]
By instantiating X (g) , (x, Λx ), and v, we obtain four specific operators. In particular,
if X (g) = ΓA(g) , (x, Λx ) = (Am , Λm ), and v = vi ∈ Λm , then the operator
ωhattrval ((Am , Λm ), vi ) =def ωh (ΓA(g) , (Am , Λm ), vi )
hides the value vi from the domain of attribute Am .
Fig. 7.3 Example of application of the method meth [Pg , ωhattrval ((Color, ΛColor ),
turquoise)]. The value turquoise is hidden from the left picture; a less colorful picture
is obtained (right), where objects of color turquoise become transparent (UN) [A color version
of this figure is reported in Fig. H.11 of Appendix H]
The third group of operators of this class hides an argument of a function or a relation.
Its generic PDT is the following:
begin NAME = proc ωharg
described as : Removing from view an argument of a function or
a relation.
requires : X (g) (set of involved elements)
y (element to be modified)
x (argument to be hidden)
generates : X (a)
method : meth[Pg , ωharg ]
end NAME
By instantiating X (g) , y, and x, we obtain specific operators:
• ωhfunarg hides an argument of a function,
• ωhrelarg hides an argument of a relation.
We only detail here the first one of these operators.
If X (g) = ΓR(g) , y = Rk , and x = xj , then the operator
ωhrelarg (Rk , xj ) =def ωh (ΓR(g) , Rk , xj )
reduces the arity of relation Rk by hiding its argument xj . If the arity of Rk is tk , then
an abstract relation Rk(a) , with arity tk − 1, is created. Moreover:
ΓR(a) = ΓR(g) − {Rk } ∪ {Rk(a) }
Method meth[Pg , ωhrelarg (Rk , xj )] acts on the cover RCOV (Rk ) of Rk , replacing in
each tuple the argument in the j-th position with UN.
As an example, let us consider a description frame Γg such that Rontop (x1 , x2 ) ∈
ΓR(g) , with x1 , x2 ∈ ΓO(g) . We want to hide the first argument, obtaining thus Rontop(a) (x2 ).
Again, meth[Pg , ωhrelarg (Rk , xj )] provides rules for constructing RCOV (Rontop(a) ).
For instance:
∀σ ≡ (o1 , o2 ) ∈ RCOV (Rontop ) : Add σ (a) ≡ (o2 ) to RCOV (Rontop(a) )
In Example 6.3 the cover of Rontop(a) will be {b, d}. With this kind of abstraction we
still know that both b and d have some objects on top of them, but we do not know
any more which one.
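In code, the construction of the abstract cover amounts to a projection of the tuples; the sketch below (ours) drops the hidden column, which yields the same cover as replacing it with UN:

```python
# Sketch of meth[Pg, omega_hrelarg(R_k, x_j)]: project the cover of a
# relation on all arguments except the j-th one (0-based index).
def reduce_arity(cover, j):
    return {t[:j] + t[j + 1:] for t in cover}

ontop = {("a", "b"), ("c", "d")}      # a on top of b, c on top of d
print(reduce_arity(ontop, 0))         # {('b',), ('d',)}
```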
This operator is the same as the arity reduction one, defined by Ghidini and
Giunchiglia [203], and the propositionalization operator defined by Plaisted [419].
Some operators may merge pieces of information, reducing the level of detail of a sys-
tem description. More specifically, these operators group description elements into
equivalence classes. The equivalence classes are formed by defining a formula ϕeq ,
which the elements of the same class must satisfy (intensional specification) or sim-
ply by enumerating the elements y1 , . . . , ys that belong to the class (extensional
specification). The extensional definition is a special case of the intensional one,
when ϕeq is the disjunction of Kronecker delta functions.
When a set of elements {a1 , . . . , ak } is equated by building up an equivalence
class, the class can be defined in two ways: either it is denoted with a generic name
such as, for instance, [a], or all the values in the set are equated to any one among
them, say, a1 . Abstraction operators building up equivalence classes must use the first
method, whereas those that use the second one are actually approximation operators,
and will be discussed in Sect. 7.6.
The abstraction operators building equivalence classes are partitioned into three
groups, according to the type of elements they act upon.
• ωeqelem builds equivalence classes of elements,
• ωeqval builds equivalence classes of values of elements,
• ωeqarg builds equivalence classes of arguments of elements.
Building equivalence classes has been a much studied abstraction, due to its sim-
plicity and large applicability. For instance, Roşu describes behavioral abstraction as
an extension of algebraic specification [455]: “two states are behaviorally equivalent
if and only if they appear to be the same under any visible experiment.”
This operator implements the partition of the domain objects into equivalence
classes, as defined by Hobbs [252], Imielinski [269], and Ghidini and Giunchiglia
[203].
In a recent paper Antonelli [20], starting from the abstraction principle (4.1),
defines an abstraction operator, which assigns an object—“number”—to the equiv-
alence classes generated by the equinumerosity relation, in such a way that a different
object is associated to each class.
The operators that we consider in this section build a single equivalence class out
of a number of elements. It is an immediate extension to define operators that build
several equivalence classes at the same time, using an equivalence relation.
The first group of operators builds equivalence classes of elements (objects, attributes,
functions, or relations) of a description frame. Their generic PDT is the following:
begin NAME = proc ωeqelem
described as : Making some elements indistinguishable
requires : X (g) (set of involved elements)
ϕeq (indistinguishability condition)
y (a) (name of the equivalence class)
generates : X (a)
Xeq (set of indistinguishable elements)
method : meth[Pg , ωeqelem ]
end NAME
In the above PDT ϕeq represents the condition stating the equivalence among a set
of elements x1 , . . . , xk ∈ X (g) . In a logical context it may be expressed as a logical
formula. Moreover, y (a) is the name of the class, given by the user. By applying
ϕeq to objects belonging to X (g) the set of equivalent (indistinguishable) elements is
computed, obtaining thus Xeq .
By instantiating X (g) and ϕeq , specific operators are obtained:
• ωeqobj builds equivalence classes of objects,
If X (g) = ΓO(g) and y (a) = o(a) , the operator defines the granularity of the
description. All tuples of objects (o1 , . . . , ok ) satisfying ϕeq are considered indis-
tinguishable. Then:
ωeqobj (ϕeq , o(a) ) =def ωeqelem (ΓO(g) , ϕeq , o(a) )
The method meth[Pg , ωeqobj (ϕeq , o(a) )] generates first the set ΓO,eq ; then, it
replaces each element of ΓO,eq by o(a) , obtaining:
ΓO(a) = ΓO(g) − ΓO,eq ∪ {o(a) }
As an example, let ϕeq (o) = “o ∈ ΓO,chair ”. Then, ΓO,eq = ΓO,chair , and all chairs are replaced by some abstract “schema” of
them, as illustrated in Fig. 7.4. In this case o(a) may have Color = UN and Use = UN.
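A sketch of the method (with a hypothetical predicate phi_eq passed as a Python function):

```python
# Sketch of meth[Pg, omega_eqobj(phi_eq, o_a)]: all objects satisfying
# phi_eq are removed and replaced by the single class name o_a.
def eq_objects(objects, phi_eq, o_a):
    gamma_o_eq = {o for o in objects if phi_eq(o)}
    return (objects - gamma_o_eq) | {o_a}

furniture = {"chair1", "chair2", "chair3", "table", "lamp"}
print(eq_objects(furniture, lambda o: o.startswith("chair"), "[chair]"))
# {'table', 'lamp', '[chair]'}
```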
Fig. 7.4 Example of application of meth[Pg , ωeqobj (ϕeq , o(a) )], where ϕeq (o) = “o ∈ ΓO,chair ”.
The different chairs (on the left) might be considered equivalent to each other, and the class
can be represented by an abstract schema o(a) of a chair (on the right)
If X (g) = ΓTYPE(g) and y (a) = t(a) , the operator makes all types satisfying ϕeq indis-
tinguishable. Then, type t(a) is applied to all objects in the equivalence class. We
define:
ωeqtype (ϕeq , t(a) ) =def ωeqelem (ΓTYPE(g) , ϕeq , t(a) )
The method meth[Pg , ωeqtype (ϕeq , t(a) )] generates first the set Xeq = ΓTYPE,eq of
indistinguishable types, and then it applies t(a) to the obtained class. All types in
ΓTYPE,eq become t(a) , obtaining:
ΓTYPE(a) = ΓTYPE(g) − ΓTYPE,eq ∪ {t(a) }
The method meth[Pg , ωeqtype (ϕeq , t(a) )] specifies what properties are to be assigned
to t(a) , considering the ones of the equated types. For instance, if the types in ΓTYPE,eq
have different sets of attributes, t(a) could have the intersection of these sets, or their
union, by setting some values to NA, depending on the choice of the user.
If X (g) = ΓA(g) , Y = (Am , Λm ), and Veq = Λm,eq ⊆ Λm , then the operator makes
indistinguishable a subset Λm,eq of the domain Λm of Am . We define:
ωeqattrval ((Am , Λm ), Λm,eq , v(a) ) =def ωeqval (ΓA(g) , (Am , Λm ), Λm,eq , v(a) )
We obtain an abstract attribute Am(a) such that Λm(a) = Λm − Λm,eq ∪ {v(a) }, and
ΓA(a) = ΓA(g) − {(Am , Λm )} ∪ {(Am(a) , Λm(a) )}
The third class of operators contains those that act on arguments of functions or
relations. Their generic PDT is the following one:
begin NAME = proc ωeqarg
described as : Making indistinguishable arguments of functions
or relations
requires : X (g) (set of involved elements)
Y (element to be modified)
Zeq (set of indistinguishable arguments)
z(a) (name of the equivalence class)
generates : X (a)
method : meth[Pg , ωeqarg ]
end NAME
In the above PDT we have assumed, for the sake of simplicity, that the set of indis-
tinguishable arguments is given extensionally, by enumeration; it is easy to extend
the case to indistinguishable arguments satisfying a given equivalence predicate or
formula. By instantiating X (g) , Y , and Zeq specific operators are obtained:
• ωeqfunarg makes indiscernible arguments of a function,
• ωeqrelarg makes indiscernible arguments of a relation.
As these operators have a reduced applicability, we do not give here their details.
Hierarchy generating operators replace some set of description elements with a more
general one, reducing thus the level of detail of a system description. More specif-
ically, these operators reduce the information in a description by generating hierar-
chies, in which the ground information in lower level nodes is replaced by higher
level ones (more generic and in smaller number). Objects, per se, cannot be orga-
nized into hierarchies, because they are just instances of types. Then, only “types” of
objects can form hierarchies. Moreover, function and relation arguments may only
have objects as values. Then, no operator is defined for hierarchies over argument
values of functions and relations. The generic PDT corresponding to this group of
operators is the following:
begin NAME = proc ωhier
described as : Replacing a set of description elements with a single,
more general one
requires : X (g) (set of involved elements)
Y (element to be modified, if any)
Ychild (set of elements to be replaced)
y (a) (name of the new node)
generates : X (a)
method : meth[Pg , ωhier ]
end NAME
The considered operator builds up one higher level node at a time. For generating
a complete hierarchy, the operator must be reapplied several times, or a composite
abstraction process must be defined. By instantiating X (g) , Y , Ychild , y (a) we obtain
specific operators:
• ωhiertype builds up a hierarchy of types,
• ωhierattr builds up a hierarchy of attributes,
• ωhierfun builds up a hierarchy of functions,
• ωhierrel builds up a hierarchy of relations,
• ωhierattrval builds up a hierarchy of attribute values,
• ωhierfuncodom builds up a hierarchy of values of a function co-domain.
The elements of the set Ychild are linked to y (a) via an is-a relation.
If X (g) = ΓTYPE(g) , Y = ∅, Ychild = ΓTYPE,child(g) , and y (a) = t(a) , then the operator
builds a type hierarchy, where a set of nodes, those contained in ΓTYPE,child(g) , are
replaced by t(a) . We define:
ωhiertype (ΓTYPE,child(g) , t(a) ) =def ωhier (ΓTYPE(g) , ΓTYPE,child(g) , t(a) )
and we obtain:
ΓTYPE(a) = ΓTYPE(g) − ΓTYPE,child(g) ∪ {t(a) }.
The original types are all hidden, because only the new one can now label the objects.
As an example, let us consider a set of types denoting specific polygons, such as
triangle, square, and rectangle. Each of these types can be replaced by the more general type polygon, thus
losing the information about the shape and the number of sides. The method
meth[Pg , ωhiertype (ΓTYPE,child(g) , t(a) )] specifies which attributes can still be asso-
ciated to polygons, and what to do with those that cannot. This operator typically
implements predicate mapping.
An operator that is similar to this one is ωeqtype . However, the latter makes a set
of types indistinguishable and interchangeable, without merging the corresponding
instances; simply, any instance of each of the types in the set can be labelled with
any other equivalent type. On the contrary, ωhiertype explicitly builds up hierarchies,
merging also instances. In addition, attributes of the new type can be defined differ-
ently by meth(Pg , ωhiertype ) and meth(Pg , ωeqtype ).
If X (g) = ΓA(g) , Y = (Am , Λm ), Ychild = Λm,child , and y (a) = v(a) , then the operator
builds up a hierarchical structure by replacing all values in Λm,child with the single,
more general value v(a) . We define:
ωhierattrval ((Am , Λm ), Λm,child , v(a) ) =def ωhier (ΓA(g) , (Am , Λm ), Λm,child , v(a) )
and we obtain:
Λm(a) = Λm − Λm,child ∪ {v(a) }
Then:
ΓA(a) = ΓA(g) − {(Am , Λm )} ∪ {(Am(a) , Λm(a) )}
As an example, let Color be an attribute that takes values in the palette ΛColor =
{lawn-green, light-green, dark-green, sea-green, olive-green,
white, yellow, blue, light-blue, aquamarine, cyan, magenta, red,
pink, orange, black}.
We can consider, as illustrated in Fig. 7.5, the set of values ΛColor,child = {lawn-
green, light-green, dark-green, sea-green, olive-green} and
replace them with v(a) = green. Notice that the operator ωhierattrval builds
up a new node for one set of old values at a time. Moreover, when the hierarchy is
climbed upon, the lower level nodes disappear.
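The climbing step itself is a simple value mapping; a sketch (ours, with the sets of the example):

```python
# Sketch of meth[Pg, omega_hierattrval(...)]: values in the child set are
# replaced by the more general parent value v_a; the others are untouched.
def climb(value, children, v_a):
    return v_a if value in children else value

greens = {"lawn-green", "light-green", "dark-green",
          "sea-green", "olive-green"}
print(climb("sea-green", greens, "green"))   # 'green'
print(climb("magenta", greens, "green"))     # 'magenta'
```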
Fig. 7.5 Example of the method meth[Pg , ωhierattrval ((Color, ΛColor ), ΛColor,child , green)],
where ΛColor and ΛColor,child are given in the text, and v(a) = green
This operator builds up a “collective” object of type t(a) out of a number of objects
of type t. Its PDT is the following:
begin NAME = proc ωcoll
described as : Building a “collective” object using objects
of the same type
requires : ΓTYPE(g)
t (original type)
t(a) (new type)
generates : ΓTYPE(a)
method : meth[Pg , ωcoll ]
end NAME
Fig. 7.6 Example of application of the method meth[Pg , ωcoll (tree, forest)]. A set of trees
(left) is abstracted into a forest (right) represented by an icon with four concentric levels. The other
trees are left unaltered
We define:
ωcoll (t, t(a) ) =def ωcoll (ΓTYPE(g) , t, t(a) )
We have then:
ΓTYPE(a) = ΓTYPE(g) ∪ {t(a) }
The original type is not hidden in the abstract space, because there may be objects
of that type which are not combined. The details of the abstraction are specified in
the method meth[Pg , ωcoll (t, t(a) )]. In particular, the method states what objects are to be
combined and what properties are to be associated to them, based on the properties
of the constituent objects. The combined objects are removed from view, so that their
attribute values are no longer accessible, nor are their occurrences in the covers of
functions and relations. The role of each member in the collection is the same.
An example of this abstraction operator is the definition of a type t(a) = forest
out of an ensemble of objects of type tree, as illustrated in Fig. 7.6. The original
objects, collected into the new one, are hidden.
The relation between the collected objects and the collective one is an individual-
of relation.
A related operator, ωaggr , builds an “aggregated” object of type t(a) out of objects
of different types t1 , . . . , ts . We have then:
ΓTYPE(a) = ΓTYPE(g) ∪ {t(a) }
The original types are not hidden in the abstract space, because there may be objects
of those types which are not combined. The details of the abstraction are specified by
the method meth[Pg , ωaggr ((t1 , . . . , ts ), t(a) )], which states what objects in a Pg
are to be aggregated, and what properties are to be associated to the new one, based
on the properties of the original objects. The constituent objects have different roles
or functions inside the aggregate, whose properties cannot be just the sum of those
of the components. Usually, the abstract type has emerging properties or functions.
The combined objects are removed from view, so that their attribute values are no
longer accessible, nor are their occurrences in the covers of functions and relations.
Method meth[Pg , ωaggr ] also describes the procedure (physical or logical) to be
used to aggregate the component objects.
As an example, we may build up a t(a) = computer starting from objects
of type body, monitor, mouse, and keyboard,1 or a t(a) = tennis-set
by (functionally) aggregating an object of type tennis-racket and one of type
tennis-ball. An example of an aggregation that uses a unique type of component
objects is a chain, formed by a set of rings. The aggregated objects are removed
from further consideration.
The relation between the component objects and the aggregate is a part-of relation.
This operator forms a group of objects that may not have any relation among each
other: it may be the case that we just want to put them together for some reason.
The grouped objects satisfy some condition ϕgroup , which can simply be an enumer-
ation of particular objects. Its PDT is the following:
begin NAME = proc ωgroup
described as : Building a group of heterogeneous objects
requires : ΓO(g) , ΓTYPE(g)
ϕgroup (condition for grouping)
group (new type)
G(a) (group’s name)
generates : ΓTYPE(a) , ΓO,group , ΓO(a)
method : meth[Pg , ωgroup ]
end NAME
We define:
ωgroup (ϕgroup , G(a) ) =def ωgroup (ΓO(g) , ΓTYPE(g) , ϕgroup , G(a) )
and we obtain:
ΓTYPE(a) = ΓTYPE(g) ∪ {group}
ΓO(a) = ΓO(g) − ΓO,group ∪ {G(a) }
A group simply has the generic type group. As an example, we may want to put
together all the pieces of furniture existing in a given office room. In this way, we
form a group-object G(a) = office-furniture of type group. Notice that
this operator is defined on objects, not on types. Hence, it is neither a collection, nor
an aggregate, nor a hierarchy. The relation between the component objects and the
group is a member-of relation.
This operator constructs a new description element starting from attributes, relations,
or functions. Depending on the input and output, different specific operators can be
defined. The PDT of the operator is the following:
begin NAME = proc ωconstr
described as : Constructing a new description element starting from
elements chosen among attributes, functions and relations
requires : ΓA(g) , ΓF(g) , ΓR(g) , y
Constr (function that builds up the new element)
generates : ΓA(a) , ΓF(a) , ΓR(a)
method : meth[Pg , ωconstr ]
end NAME
where:
Constr : ΓA(g) × ΓF(g) × ΓR(g) → ΓA(a) ∪ ΓF(a) ∪ ΓR(a)
y ∈ ΓA(a) ∪ ΓF(a) ∪ ΓR(a)
The corresponding meth[Pg , ωconstr (Constr, y)] states how a new description ele-
ment is built up and what its properties are.
An example of this operator is the combination of attributes to form a new attribute.
For instance, given an object x of type rectangle, let Long be a binary attribute,
which assumes value 1 if x is long and 0 otherwise, and Wide be a binary attribute,
which assumes value 1 if x is wide and 0 otherwise. Then, we can construct a new
attribute, Big (a) , defined as Big (a) = Long ∧ Wide. The attribute Big (a) is a binary
one, and assumes the value 1 only if x is both long and wide. As usual, the attributes
Long and Wide do not enter ΓA(a) , which will only contain (Big (a) , {0, 1}).
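A sketch of this particular instantiation of ωconstr (the table layout and function name are ours, assumed only for the example):

```python
# Sketch: Constr = logical AND of two binary attributes, producing the
# abstract attribute Big = Long AND Wide; Long and Wide are then dropped.
def construct_big(attr_rows):
    return [{"ID": r["ID"], "Big": r["Long"] & r["Wide"]} for r in attr_rows]

rows = [{"ID": "r1", "Long": 1, "Wide": 1},
        {"ID": "r2", "Long": 1, "Wide": 0}]
print(construct_big(rows))   # r1 is big, r2 is not
```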
In this section we briefly discuss the relation between approximation and abstraction.
The view that we present is far from being general, as it is suggested by pragmatic
issues. In our view, approximation occurs when some element of a system descrip-
tion is replaced on purpose with another one. The new element is, according to some
defined criterion, “simpler” than the original one. As abstraction also aims at sim-
plification, abstraction and approximation look similar, and it is a sensible question
whether or when approximation is also an abstraction. We have tried to answer the
question within the framework we propose, based on the notion of configuration and
information reduction, and we have come up with a distinction. This distinction has a
meaning only inside the KRA framework, and it may well be possible that different
conclusions could be drawn in other contexts.
Let us first consider approximation applied to a specific configuration ψ. If
ψ is a configuration belonging to a configuration space Ψ , any change of a value
v of a variable into v′ changes ψ into another configuration ψap . Intuitively, we
are willing to say that the approximation v′ ≈ v is indeed an abstraction if
the set COMP(ψap ) in Ψ contains ψ. However, this is never the case, because
COMP(ψ) ≡ ψ, COMP(ψap ) ≡ ψap , and ψ ≠ ψap , because they differ in the
values v and v′ for the variable of interest.
The notion can be extended to P-Sets. Let COMP(P) be the set of configurations
compatible with P. Modifying a variable in P lets P become a Pap . Again, if the
approximation v′ ≈ v should be an abstraction, then it must be COMP(P) ⊆
COMP(Pap ). As before, this is impossible, because, even though P may have some
values set to UN, approximation is always made on a variable that is different from
UN, otherwise it would be an addition of information, and not just a change, and,
then, certainly not an abstraction.
As a conclusion, we can say that approximation performed on a P-Set, i.e., on an
observed system description, is never an abstraction, per se, even though it generates
a possibly simpler description. The original and the approximated configurations are
incomparable with respect to our notion of abstraction as information reduction. On
the other hand, this is an intuitive result; in fact, modifying a value is not reducing
information but changing it.
As an example, let us suppose that a configuration consists of a single object,
a, with attributes Length = 1.3 m and Color = red. Let us approximate the real
number “1.3” with 1. The original configuration ψ = (a, obj, 1.3, red) becomes
now ψap = (a, obj, 1, red), and the two are incomparable with respect to the
information they provide.
Let now Q be a query and ANS(Q) be the set of obtainable answers in the orig-
inal space. For the abstract query, things are different, as approximation may lead
to a superset, to a subset, or to a partially or totally disjoint set of answers with
respect to ANS(Q). If we consider the example of the pendulum, reported in Fig. 5.10,
and the query is Q ≡ “Compute T ”, ANS(Q) contains the unique, correct value (5.1).
If we approximate the function sin θ by θ in the equation of motion, a new set ANS′ (Q)
containing the only solution (5.2) is obtained. Then, ANS(Q) ∩ ANS′ (Q) = ∅.
Let us see whether the notion of approximation might be extended to description
frames, and, if yes, what the effect would be. Applying an approximation to Γ
means that some element of the potential descriptions is systematically modified on
purpose in all possible observed systems. For instance, we could change any real
value v ∈ R into its floor, namely ⌊v⌋ ≈ v, or expand any function
f ∈ ΓF into a Taylor series and take only the terms of order 0 and 1 (linearization). In
so doing, approximation operators can be defined, in much the same way as for
abstraction operators. In particular, a description frame Γg can be transformed into an
approximate description frame Γap . A substantial difference between an abstraction
operator and an approximation one is that in abstraction all the information needed is
contained in Γg , as the user only provides names; for instance, in building a node in a
hierarchy, the nodes to be replaced are only selected by the user, but they already exist
in Γg . Moreover, the user provides just the name of the new node. In approximation,
the user introduces some new element; for instance, the linearized version of the
functions in ΓF are usually not already present in it.
In any case, at the level of Γg , approximation operators can be defined by spec-
ifying a procedure Prox, which describes what has to be replaced in Γg and how.
The effect is to build up an approximate description frame Γap . Knowledge of Prox
allows a process similar to the inversion of abstraction to be performed, and then it
allows the ground and approximate configuration spaces to be related.
If we consider the abstraction operators introduced in this chapter, we may see that
those generating equivalence classes of elements have their counterpart in approxi-
mation. In fact, approximation occurs when the representative of the class is one of
its elements, instead of a generic name. In this way all the class elements are replaced
by one of them, generating thus approximate configurations. Another way to per-
form approximation is the definition of a specific operator that replaces an element.
In the following we will consider this operator and two others, for the sake of exem-
plification. In order to distinguish approximation from abstraction, we will denote
approximation operators by the letter ρ.
The replacement operator is the fundamental one for approximation. It takes any
element of a description frame Γg and replaces it with another one. Its PDT is the
following:
begin NAME = proc ρrepl
  described as: Replacing a description element with another
  requires: X(g) (set of involved elements)
            y (element to be approximated)
            y(ap) (approximation)
  generates: X(ap)
  method: prox[Pg, ρrepl]
end NAME
By instantiating X(g) and y, different operators are obtained. The element to be replaced
can be an object, a type, a function, a relation, an argument, or a value. As an example,
let us consider the case of replacing a function.
Let X(g) = ΓF(g) and y = fh. Let moreover y(ap) = gh be the function that replaces fh. We can define ρreplfun(fh, gh), and we obtain:

ΓF(ap) = ΓF(g) − {fh} ∪ {gh}
This operator changes uniformly the function fh into gh whenever it occurs, in any
perception Pg .
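To make the mechanics concrete, here is a minimal Python sketch of such a replacement operator, over a dictionary-based encoding of a description frame; the encoding and the small-angle example are our own illustrative assumptions, not part of any KRA implementation.

import math

def rho_replfun(frame, name, g):
    """Approximation operator: replace the function `name` by `g` in a
    description frame, uniformly for every perception that uses it."""
    approx = dict(frame)                   # shallow copy of the frame
    functions = dict(approx["F"])          # Gamma_F encoded as {name: callable}
    if name not in functions:
        raise KeyError(f"{name} not in Gamma_F")
    functions[name] = g                    # f_h is replaced by g_h everywhere
    approx["F"] = functions
    return approx

# Example: replace sin(theta) by its linearization theta (small angles)
frame_g = {"F": {"Sin": math.sin}}
frame_ap = rho_replfun(frame_g, "Sin", lambda theta: theta)
print(frame_g["F"]["Sin"](0.3), frame_ap["F"]["Sin"](0.3))  # 0.2955... vs 0.3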
As we have discussed when dealing with abstraction operators, there are two ways
of handling classes of equivalent objects: either the class is denoted by a generic
name, which can be instantiated to any element of the class, or all elements of the
class are made equal to one of them; in the former case (“equation”) we have an
abstraction, whereas in the latter one (“identification”) we have an approximation.
The PDT corresponding to identification is the following:
Let X(g) = ΓO(g) and ϕid some condition on objects. Then, all tuples of objects (o1, . . . , ok) satisfying ϕid are considered indistinguishable, and are equated to y(a) ∈ {o1, . . . , ok}. We define:

ρidobj(ϕid) =def ρid(ΓO(g), ϕid)

The method prox[Pg, ρidobj(ϕid)] first generates the set ΓO,id; then, it replaces each element of ΓO,id by y(a) = o(a), where o(a) ∈ ΓO,id, obtaining:

ΓO(a) = ΓO(g) − ΓO,id ∪ {o(a)}

The element o(a) can be given by the user or selected in ΓO,id according to a given procedure.
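The identification mechanism can be sketched in Python as follows; the object names and the random selection policy are our own illustrative assumptions.

import random

def rho_idobj(objects, phi_id, pick=random.choice):
    """Approximation operator: all objects satisfying phi_id become
    indistinguishable and are equated to a single representative."""
    cls = [o for o in objects if phi_id(o)]      # the set Gamma_O,id
    if not cls:
        return set(objects), None
    rep = pick(cls)                              # o^(a), here chosen at random
    return (set(objects) - set(cls)) | {rep}, rep

furniture = ["chair1", "chair2", "folding_chair", "table"]
approx, rep = rho_idobj(furniture, lambda o: "chair" in o)
print(approx, "representative:", rep)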
Fig. 7.7 Example of application of method prox[Pg, ρidobj(ϕid)], where ϕid(o) = “o ∈ ΓO,chair”. The different chairs (on the left) might be considered equivalent to each other, and equal to one of them (on the right)
As an example, let us consider again the furniture of Fig. 7.4, and let us equate again all chairs. Whereas in Sect. 7.3.1.1 the class of chairs was represented by a generic schema of a chair (thus obtaining an abstraction), in this case all chairs will be equated to one of them, extracted randomly from the set of all chairs. Suppose that the extraction provided an instance of a folding chair. Then, all other chairs are considered equal to it, producing the approximation reported in Fig. 7.7.
Let us now go back to the discretization of real intervals. Let us consider the interval [0, 100), and let us divide it into 10 subintervals {[10k, 10(k + 1)) | 0 ≤ k ≤ 9}. Numbers falling inside one of the intervals are considered equivalent. As a representative of each subinterval we may take its middle point, i.e., 10(k + 0.5) for 0 ≤ k ≤ 9. Then, any value in a specific subinterval will be replaced by the interval’s middle point, obtaining an approximate value for each number in the bin. On the contrary, we remember that by assigning to each bin a linguistic value, an abstraction was obtained.
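The two treatments of the same binning can be contrasted in a few lines of Python; the function names and bin labels are ours, with the labels standing in for linguistic values.

def approximate(v):
    """Approximation: replace v in [0, 100) by the middle point of its bin."""
    k = int(v // 10)
    return 10 * (k + 0.5)              # e.g., 37.2 -> 35.0

def abstract(v):
    """Abstraction: replace v by a generic label denoting its bin."""
    k = int(v // 10)
    return f"bin_{k}"                  # e.g., 37.2 -> 'bin_3'

print(approximate(37.2), abstract(37.2))   # 35.0 bin_3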
7.7 Reformulation
For the sake of completeness, we add here a few words on reformulation. Considering a description frame Γ, and the configuration space Ψ associated to it, it is natural to extend the definition that we have given for abstraction and approximation, in terms of information content, to the case of reformulation.
In other words, P and its image under Π provide exactly the same information about
the system under analysis.
Unfortunately, it does not seem feasible to define reformulation operators that are at the same time generic and meaningful, as was the case for approximation and abstraction, because they depend too strongly on the context. However, Definition 7.1 itself allows us to say that, according to our view, reformulation is never an abstraction. Again, as in the case of approximation, the result of a reformulation may be “simpler” than the original one, so that simplicity is the common denominator of all three mechanisms.
Abstraction, approximation, and reformulation are three facets of knowledge rep-
resentation which are complementary, and often work in synergy, to allow complex
changes to be performed.
All operators introduced so far are summarized in Table 7.1. They are grouped according to the elements of the description frame they act upon, and the underlying abstraction mechanism. Even though they are quite numerous, we notice that several among them can be “technically” applied in the same way, exploiting synergies. For instance, equating values of a variable can be implemented with the same code for attributes, for argument values in functions and relations, and in function co-domains. Nevertheless, we have kept them separate, because they differ in meaning, and also in the impact they have on the Γ’s. In fact, their PDTs are different, but they share the same method.
As was said at the beginning, the listed operators are defined at the level of
description frames, because they correspond to changing the perception provided by
sensors that are used to analyze the world. A method corresponding to each operator
acts on specific P-Sets according to rules that guide the actual process of abstraction.
The list of operators introduced in this chapter is by no means intended to exhaust
the spectrum of abstractions that can be thought of. However, they are sufficient to
describe most of the abstractions proposed in the past in a unified way. Moreover,
they provide a guide for defining new ones, better suited to particular fields. The
complete list of currently available operators is reported in Appendix E.
Table 7.1 Summary of the elementary abstraction and approximation operators, classified according to the elements of the description frame they act upon and their mechanism

Operators | Elements | Arguments | Values
Hiding | ωhobj, ωhtype, ωhattr, ωhrel, ωhfun | ωhfunarg, ωhrelarg | ωhattrval, ωhfunargval, ωhfuncodom, ωhrelargval
Equating | ωeqobj, ωeqtype, ωeqattr, ωeqfun, ωeqrel | ωeqfunarg, ωeqrelarg | ωeqattrval, ωeqfunargval, ωeqfuncodom, ωeqrelargval
Building hierarchy | ωhierattr, ωhierfun, ωhierrel, ωhiertype | — | ωhierattrval, ωhierfuncodom
Combining | ωcoll, ωaggr, ωgroup | — | ωconstr
Approximation (replacing) | ρreplobj, ρrepltype, ρreplfun, ρreplrel | ρreplfunarg, ρreplrelarg | ρreplattrval, ρreplfunargval, ρreplrelargval, ρreplfuncodom
Approximation (identifying) | ρidobj, ρidtype, ρidattr, ρidfun, ρidrel | ρidfunarg, ρidrelarg | ρidattrval, ρidfunargval, ρidfuncodom, ρidrelargval
The abstraction operators introduced in this chapter are irreducible to simpler ones. However, using abstraction operators alone may not be effective in practice. In fact, if we want to apply several operators to a ground description frame, we must build up a hierarchy of more and more abstract spaces, each one obtained by the application of a single operator. For instance, if we would like to build up a hierarchy, we should apply ωhier as many times as the number of new nodes we want to add, thus creating a possibly long chain of spaces, very close to one another.
In such cases it would be more convenient to combine the operators into sets and/or chains, and apply them all at once. The combination of operators is an abstraction process, according to Definition 6.22. Clearly, not every composition of operators is allowed. In particular, the result of the whole process must be the same as the result obtained by applying all the composing operators in parallel or in sequence, one at a time.
Definition 7.2 (Parallel abstraction process) A parallel abstraction process Π =
{ω1 , . . . , ωi , . . . , ωr } is a set of r operators to be applied simultaneously. The process
Π is admissible iff any permutation of the r operators generates the same final Γa .
In other words, if Π is admissible, there exists a corresponding method M =
{meth[Pg , ω1 ], . . . , meth[Pg , ωi ], . . . , meth[Pg , ωr ]} such that, for any Pg ⊆ Ψg ,
contains the actual program body, which, for the sake of generality, will be described in pseudo-code.
In the following we will provide examples of methods for some operators. We start with one of the simplest operators, ωhobj(o), the one that hides object o from view, and was described in Sect. 7.2.1.1. Let us consider a P-Set describing an observed system S, namely Pg = ⟨Og, Ag, Fg, Rg⟩. In order to obtain the abstract description Pa = ⟨Oa, Aa, Fa, Ra⟩, we have to apply meth[Pg, ωhobj(o)], reported in Table 7.3.
The NAME slot simply contains meth[Pg, ωhobj(o)]. The method requires in input Pg and the object to hide, o, and provides in output Pa. In order for meth[Pg, ωhobj(o)] to be applied, object o must have been actually observed in the system S; hence, this condition appears in the field APPL-CONDITIONS. The field MEMORY is filled during the execution of the method. The code for the method, which is executed only if the application conditions are met, is reported in Table 7.4.
The method modifies in turn Og, Ag, Fg, Rg, and memorizes the changes. In order to obtain Oa, it simply deletes the object o from Og. For the attributes, it deletes from Ag the assignment of attribute values to o. Then, it looks at each function defined in Fg; hiding an object has the effect of transforming a function fh(g), with cover FCOV(fh(g)), into another one, whose cover no longer contains tuples containing o.² The new set Fa is the collection of all the modified functions.

² This is one among several possible choices. For instance, the tuples can be kept, and a value UN can replace object o.
Table 7.4 Pseudo-code for the method meth[Pg, ωhobj(o)]

METHOD meth[Pg, ωhobj(o)]
  Oa = Og − {o}
  ΔO(P) = {o}
  Let t be the type of o
  Aa = Ag − {(o, t, v1(t), . . . , vMt(t))}
  ΔA(P) = {(o, t, v1(t), . . . , vMt(t))}
  Fa = ∅
  forall fh(g) ∈ Fg with arity th do
    ΔF(P)(h) = ∅
    FCOV(fh(a)) = FCOV(fh(g))
    forall σ ∈ FCOV(fh(a)) do
      if o ∈ σ
        then FCOV(fh(a)) = FCOV(fh(a)) − {σ}
             ΔF(P)(h) = Append(ΔF(P)(h), σ)
      endif
    end
    Define fh(a) corresponding to FCOV(fh(a))
    Fa = Fa ∪ {fh(a)}
  end
  Ra = ∅
  forall Rk(g) ∈ Rg with arity tk do
    ΔR(P)(k) = ∅
    RCOV(Rk(a)) = RCOV(Rk(g))
    forall σ ∈ RCOV(Rk(a)) do
      if o ∈ σ
        then RCOV(Rk(a)) = RCOV(Rk(a)) − {σ}
             ΔR(P)(k) = Append(ΔR(P)(k), σ)
      endif
    end
    Define Rk(a) corresponding to RCOV(Rk(a))
    Ra = Ra ∪ {Rk(a)}
  end
In an analogous way, relation Rk(g) is transformed into a relation whose cover no longer contains tuples in which o occurs. Again, the new set Ra is the collection of all the modified relations. Notice that all the hidden information is stored in Δ(P):

Δ(P) = ΔO(P) ∪ ΔA(P) ∪ ⋃h=1…H ΔF(P)(h) ∪ ⋃k=1…K ΔR(P)(k)
It is immediate to see that, by applying meth[Pg, ωhobj(o)], the P-Set Pa is less informative than Pg, i.e., Pa ⊑ Pg. In fact, in Pa we do not know anything more
about object o. Then, any configuration in Pg specifying any value for the type and attributes of o is compatible with Pa. As tuples of objects are also hidden in some FCOV(fh(g)) and RCOV(Rk(g)), information about these tuples is no longer available either. In order to reconstruct Pg we have to apply the following operations:
Og = Oa ∪ ΔO(P) = Oa ∪ {o}
Ag = Aa ∪ ΔA(P)
Fg = {fh(g) | FCOV(fh(g)) = FCOV(fh(a)) ∪ ΔF(P)(h), 1 ≤ h ≤ H}
Rg = {Rk(g) | RCOV(Rk(g)) = RCOV(Rk(a)) ∪ ΔR(P)(k), 1 ≤ k ≤ K}
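A minimal Python sketch of this method and of its inversion follows, with sets and dictionaries standing in for the components of a P-Set; the encoding is our own, chosen only to show that hiding is non-destructive.

def hide_object(p_set, o):
    """A sketch of meth[Pg, omega_hobj(o)]: remove o from O and A, and drop
    every tuple containing o from the covers of functions and relations.
    Everything removed is memorized in delta, so that Pg can be rebuilt."""
    O, A, F, R = p_set
    delta = {"O": {o},
             "A": {t for t in A if t[0] == o},
             "F": {h: {s for s in cov if o in s} for h, cov in F.items()},
             "R": {k: {s for s in cov if o in s} for k, cov in R.items()}}
    Oa = O - {o}
    Aa = A - delta["A"]
    Fa = {h: cov - delta["F"][h] for h, cov in F.items()}
    Ra = {k: cov - delta["R"][k] for k, cov in R.items()}
    return (Oa, Aa, Fa, Ra), delta

def unhide(p_abs, delta):
    """Inverse process: merge the memorized delta back into the P-Set."""
    Oa, Aa, Fa, Ra = p_abs
    return (Oa | delta["O"], Aa | delta["A"],
            {h: cov | delta["F"][h] for h, cov in Fa.items()},
            {k: cov | delta["R"][k] for k, cov in Ra.items()})

# The hide-b example of this section, in miniature
Pg = ({"a", "b", "c", "d"},
      {("b", "square", "blue", "large")},
      {},
      {"ontop": {("a", "b"), ("c", "d")}, "leftof": {("b", "c"), ("b", "d")}})
Pa, delta = hide_object(Pg, "b")
assert unhide(Pa, delta) == Pg       # abstraction is non-destructive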
Opoint(a) = Opoint(g)
Osegment(a) = Osegment(g)
Ofigure(a) = Ofigure(g) − {b} = {a, c, d}
The two functions Radius and Center are not affected, because b occurs neither in their domain nor in their image. Then Fa = Fg. For what concerns relations, the abstract set Ra = {Rontop(a), Rleftof(a)} contains:

RCOV(Rontop(a)) = {(c, d)}
RCOV(Rleftof(a)) = {(a, c), (a, d)}
When the method meth[Pg, ωhobj(b)] has been applied to Pg, the hidden information can be found in Δ(P) = ΔO(P) ∪ ΔA(P) ∪ ΔF(P) ∪ ΔR(P), where:

ΔO(P) = {b}
ΔA(P) = {(b, square, blue, large)}
ΔF(P) = ∅
ΔR(P) = {ΔR(P)(Rontop), ΔR(P)(Rleftof)}, where:
ΔR(P)(Rontop) = {(a, b)} and ΔR(P)(Rleftof) = {(b, c), (b, d)}.
We will now describe the method for aggregating objects, which is one of the
most complex.
Table 7.5 Method meth[Pg, ωaggr((t1, . . . , ts), t(a))]

NAME             meth[Pg, ωaggr((t1, . . . , ts), t(a))]
INPUT            Pg, (Ot1, . . . , Ots), t(a), g : Ot1 × · · · × Ots → Ot(a)
OUTPUT           Pa, z, Rpartof ⊆ ∏i=1…s Oti × Ot(a)
APPL-CONDITIONS  ∃oi ∈ Oti (1 ≤ i ≤ s)
PARAMETERS       See Table 7.6
MEMORY           Δ(P), RCOV(Rpartof)
BODY             See Table 7.7
Table 7.6 Parameters of the method meth[Pg, ωaggr((t1, . . . , ts), t(a))]

α(x)  α(x) = (α1(x), . . . , αM(x))
      for m = 1, M do
        if αm(x) then Am(a)(z) = vj ∈ Λm ∪ {UN} ∪ {NA} endif
      end
β(x)  if β(x) then Transform Fg into Fa according to given rules endif
γ(x)  if γ(x) then Transform Rg into Ra according to given rules endif
The method meth[Pg, ωaggr((t1, . . . , ts), t(a))] takes as input Pg, the sets of objects of the types to be aggregated, and the new type to be generated. It also takes in input a function g : Ot1 × · · · × Ots → Ot(a), which tells how the new object is obtained from the old ones. The original objects are removed from Oa, whereas the new object is added. For this method, the field PARAMETERS is very important, because it contains the rules for the aggregation of the input objects; the relevant parameters are presented in Table 7.6. The rules of transformation must be provided by the user. The body of meth[Pg, ωaggr((t1, . . . , ts), t(a))], which is reported in Table 7.7, performs two separate tasks: hiding the information regarding the original objects {o1, . . . , os}, and transferring information from {o1, . . . , os} to the new object c.
While hiding information is easy, and can be done unambiguously once the objects to hide are given, the transfer of information from the components to the aggregated object requires the use of the rules specified in the PARAMETERS field. The transfer of information from the component objects to the composite one is not unconditional, because it might not always be meaningful. First of all, we must provide an abstraction function g that constructs, starting from the typed objects (o1, t1), . . . , (os, ts), the new object (c, t(a)) of the given type. Then, the parameters of the method include sets of conditions, α(o1, . . . , os), β(o1, . . . , os), and γ(o1, . . . , os), which tell whether the corresponding attributes, functions, or relations are applicable to the new object, and, if yes, how.
To clarify the working of the aggregation operator we introduce an example.
Example 7.2 Let us consider again the geometric scenario of Fig. 6.2. We want to
aggregate two objects which are one on top of another to form a new object of type
t(a) = tower.
Given the description frame Γg = ⟨ΓTYPE(g), ΓO(g), ΓA(g), ΓF(g), ΓR(g)⟩ of Example 6.2, we apply to it the operator ωaggr((figure, figure), tower). If we consider the scenario Pg of Fig. 6.2, we can apply to it the method meth[Pg, ωaggr((figure, figure), tower)]. The instantiation of this method is reported in Table 7.8, whereas the PARAMETERS field has the content reported in Table 7.9.
The function α generates the attribute values for the new object. Specifically, if the objects x1 and x2 have the same color, then the composite object will have the same color as well. If x1 and x2 do not have the same color, then the composite object assumes the color of the largest component. Obviously, this choice is one among the many that the user can make.
Table 7.7 Pseudo-code of method meth[Pg, ωaggr((t1, . . . , ts), t(a))]

METHOD meth[Pg, ωaggr((t1, . . . , ts), t(a))]
  Let Rpartof ⊆ ∏i=1…s Oti × Ot(a) be a new relation
  Let σ = (o1, . . . , os) with oi ∈ Oti (1 ≤ i ≤ s)
  Let B = {σ | ∀σ′, σ″ ∈ B : σ′ ∩ σ″ = ∅}
  Oa = Og, Aa = Ag, Fa = Fg, Ra = Rg
  ΔO(P) = ΔA(P) = ΔF(P) = ΔR(P) = ∅
  RCOV(Rpartof) = ∅
  forall σ ∈ B do
    Build up c = g(σ)
    forall oj ∈ σ do
      RCOV(Rpartof) = RCOV(Rpartof) ∪ {(oj, c)}
    end
    Oa = Oa − {o1, . . . , os} ∪ {c}
    ΔO(P) = (o1, . . . , os, c)
    Aa = Aa − {(oi, ti, v1(ti), . . . , vMti(ti)) | 1 ≤ i ≤ s}
    Aa = Aa ∪ {(c, t(a), v1, . . . , vM)},
      where vm (1 ≤ m ≤ M) is determined by the rules α(o1, . . . , os) specified in PARAMETERS
    ΔA(P) = ΔA(P) ∪ {(oi, ti, v1(ti), . . . , vMti(ti)) | 1 ≤ i ≤ s}
    forall fh ∈ ΓF do
      forall tuples τ ∈ FCOV(fh) such that at least one of the oi occurs in τ do
        FCOV(fh(a)) = FCOV(fh) − {τ}
        ΔF(P) = ΔF(P) ∪ {(fh, τ)}
      end
    end
    Transform some FCOV(fh) ∈ Fg into FCOV(fh(a)) according to the rules β(o1, . . . , os) and add them to Fa
    forall Rk ∈ ΓR do
      forall tuples τ ∈ RCOV(Rk) such that at least one of the oi occurs in τ do
        RCOV(Rk(a)) = RCOV(Rk) − {τ}
        ΔR(P) = ΔR(P) ∪ {(Rk, τ)}
      end
    end
    Transform some RCOV(Rk) ∈ Rg into RCOV(Rk(a)) according to the rules γ(o1, . . . , os) and add them to Ra
  end
  Δ(P) = ΔO(P) ∪ ΔA(P) ∪ ΔF(P) ∪ ΔR(P) ∪ RCOV(Rpartof)
For instance, the color of z could be set to UN or to NA. For Size, two objects generate a large object if at least one of them is large, or if both are of medium size. In all other cases the resulting object is of medium size. The attributes Shape and Length are no longer applicable to z.
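As an illustration, the α rules of Table 7.9 can be rendered in Python roughly as follows; the attribute dictionaries and the explicit size ordering are our own encoding of the example.

def alpha(x1, x2):
    """A sketch of the attribute rules for the tower aggregate z."""
    order = {"small": 0, "medium": 1, "large": 2}
    # Color: keep the common color, otherwise take the color of the larger part
    if x1["Color"] == x2["Color"]:
        color = x1["Color"]
    elif order[x1["Size"]] >= order[x2["Size"]]:
        color = x1["Color"]
    else:
        color = x2["Color"]
    # Size: large if one part is large, or if both parts are medium
    sizes = (x1["Size"], x2["Size"])
    size = "large" if "large" in sizes or sizes == ("medium", "medium") else "medium"
    return {"Color": color, "Size": size, "Shape": "NA", "Length": "NA"}

a = {"Color": "green", "Size": "small"}
b = {"Color": "blue", "Size": "large"}
print(alpha(a, b))   # {'Color': 'blue', 'Size': 'large', 'Shape': 'NA', 'Length': 'NA'}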
Table 7.8 Method meth[Pg, ωaggr((figure, figure), tower)]

NAME             meth[Pg, ωaggr((figure, figure), tower)]
INPUT            Pg, Ofigure, tower, g : Ofigure × Ofigure → Otower
                 g(x1, x2) = if [x1 ∈ Ofigure] ∧ [x2 ∈ Ofigure] ∧ [(x1, x2) ∈ RCOV(Rontop)] then z
OUTPUT           Pa, {c}, Rpartof ⊆ Ofigure × Otower
APPL-CONDITIONS  ∃o1, o2 ∈ Ofigure, with o1 ≠ o2 and (o1, o2) ∈ RCOV(Rontop)
PARAMETERS       See Table 7.9
MEMORY           Δ(P), RCOV(Rpartof)
BODY             See Table 7.7
Table 7.9 Parameters of the method meth[Pg, ωaggr((figure, figure), tower)]

α(x1, x2) ⇒  if [Color(x1) = v1] ∧ [Color(x2) = v2] ∧ [v1 = v2]
               then [Color(a)(z) = v1]
               else if [Size(x1) = v3] ∧ [Size(x2) = v4] ∧ [v3 ≥ v4]
                 then [Color(a)(z) = v1]
                 else [Color(a)(z) = v2]
               endif
             endif
             if [Size(x1) = v1] ∧ [Size(x2) = v2] ∧ [(v1 = large) ∨ (v2 = large)]
               then [Size(a)(z) = large]
               else if [v1 = medium] ∧ [v2 = medium]
                 then [Size(a)(z) = large]
                 else [Size(a)(z) = medium]
               endif
             endif
             Shape(a)(z) = NA
             Length(a)(z) = NA

β(x1, x2) ⇒  if [Shape(x1) = circle] ∧ [Center(c1, x1)] ∧ [Radius(y1, x1)]
               then Delete (c1, x1) from FCOV(Center(a))
                    Delete (y1, x1) from FCOV(Radius(a))
             if [Shape(x2) = circle] ∧ [Center(c2, x2)] ∧ [Radius(y2, x2)]
               then Delete (c2, x2) from FCOV(Center(a))
                    Delete (y2, x2) from FCOV(Radius(a))

γ(x1, x2) ⇒  if ∃u s.t. (u, x1) ∈ RCOV(Rontop) then (u, z) ∈ RCOV(Rontop(a))
             if ∃v s.t. (x2, v) ∈ RCOV(Rontop) then (z, v) ∈ RCOV(Rontop(a))
             if ∃u s.t. [(x1, u) ∈ RCOV(Rleftof) ∨ (x2, u) ∈ RCOV(Rleftof)] then (z, u) ∈ RCOV(Rleftof(a))
             if ∃v s.t. [(v, x1) ∈ RCOV(Rleftof) ∨ (v, x2) ∈ RCOV(Rleftof)] then (v, z) ∈ RCOV(Rleftof(a))
             forall u s.t. [(u, x1) ∈ RCOV(Rsideof)] ∨ [(u, x2) ∈ RCOV(Rsideof)] do
               Remove (u, x1) or (u, x2) from RCOV(Rsideof)
             end
Fig. 7.8 Application of method meth[Pg, ωaggr((figure, figure), tower)]. Objects a and b are aggregated to obtain object c1, and objects c and d are aggregated to obtain object c2. The color of c1 is blue, because b is larger than a, whereas the color of c2 is green. Both composite objects are large. The new object c1 is at the left of c2 [A color version can be found in Fig. H.13 of Appendix H]
Regarding functions, neither Center nor Radius is applicable to z; then, if one of the two objects is a circle, its center and radius disappear from the corresponding covers.
Regarding relations, if there is an object u which is on top of x1, then u is also on top of z. If there is an object v which is under x2, then z is on top of v. Moreover, if x1 or x2 is at the left of an object u, then z is at the left of u; if there is an object v which is at the left of both x1 and x2, then v is at the left of z as well. Finally, the relation Rsideof is not considered applicable to z, and hence all the original sides of x1 and x2 are hidden.
The application of the method to the ground scenario Pg generates the following Pa:
ΔO(P) = {(a, figure), (b, figure), (c, figure), (d, figure), (O, point), (OP, segment)}
ΔA(P) = {(a, green, triangle, small), (b, blue, square, large), (c, red, circle, medium), (d, green, rectangle, large), (OP, black, r)}
ΔF(P) = {FCOV(Center), FCOV(Radius)}
ΔR(P)(Rontop) = RCOV(Rontop)
ΔR(P)(Rleftof) = {(a, c), (a, d), (b, c), (b, d)}
ΔR(P)(Rsideof) = {(AB, b), (AC, b), (BD, b), (CD, b), (EG, d), (EF, d), (GH, d), (HF, d)}
RCOV(RPartOf) = {(a, c1), (b, c1), (c, c2), (d, c2)}
Table 7.10 Methods meth[Dg, δhobj], meth[Lg, λhobj], and meth[Tg, τhobj(o)]

NAME             meth[Dg, δhobj]   meth[Lg, λhobj]   meth[Tg, τhobj]
INPUT            Dg, o             Lg, o             Tg, o
OUTPUT           Da                La                Ta
APPL-CONDITIONS  σID=o(OBJ) ≠ ∅    o ∈ Cg            ∅
PARAMETERS       ∅                 ∅                 ∅
MEMORY           Δ(D)              Δ(L)              Δ(T)
BODY             See Table 7.11    Ca = Cg − {o}     ∀ϕ ∈ Tg s.t. o ∈ ϕ do Ta = Tg − {ϕ} end
has the same structure as the one reported in Table 7.2. For the sake of simplicity, only the methods and not the operators are described in the following, because the methods are the ones used in practice to perform the abstraction. For the operators δhobj(o), λhobj(o), and τhobj(o), the associated methods, meth[Dg, δhobj(o)], meth[Lg, λhobj(o)], and meth[Tg, τhobj(o)], are reported in the same Table 7.10.
The input to meth[Dg, δhobj(o)] is the database Dg, as well as the object o to hide. The application condition states that the object must be present in the table OBJ, so that a query to this table does not return an empty set. The body of the method is reported in Table 7.11.
The method meth[Dg, δhobj(o)] takes as input the database Dg and the object to be hidden, o, and outputs Da. No internal parameter is needed. When execution terminates, the hidden information is stored in Δ(D). The action of the method consists in removing from all tables in Dg all tuples containing o. The database Dg can simply be recovered from Da and Δ(D) using operations similar to the ones reported for Pg.
Table 7.11 Pseudo-code for the method meth[Dg, δhobj(o)]

METHOD meth[Dg, δhobj(o)]
  forall tables Tg ∈ Dg do
    Ta = Tg
    Δ(D)(Tg) = ∅
    forall σ ∈ Ta do
      if o occurs in σ
        then Ta = Ta − {σ}
             Δ(D)(Tg) = Append(Δ(D)(Tg), σ)
      endif
    end
  end
  Da = {Ta}
  Δ(D) = ⋃Tg∈Dg Δ(D)(Tg)
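A runnable Python counterpart of this method might look as follows, with a dictionary of row lists standing in for the database tables; the encoding is our own.

def delta_hobj(database, o):
    """A sketch of meth[Dg, delta_hobj(o)]: drop from every table the rows
    in which object o occurs, memorizing them in delta_D."""
    abstract_db, delta_D = {}, {}
    for name, rows in database.items():
        abstract_db[name] = [row for row in rows if o not in row]
        delta_D[name] = [row for row in rows if o in row]
    return abstract_db, delta_D

Dg = {"OBJ": [("a", "figure"), ("b", "figure")],
      "ONTOP": [("a", "b")]}
Da, delta_D = delta_hobj(Dg, "b")
print(Da)        # {'OBJ': [('a', 'figure')], 'ONTOP': []}
print(delta_D)   # {'OBJ': [('b', 'figure')], 'ONTOP': [('a', 'b')]}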
The method meth[Lg, λhobj(o)] works on the language Lg = ⟨Cg, X, O, Pg, Fg⟩, defined in Sect. 6.2. As already mentioned, we assume that the unique name of object o in the set Cg of constants is simply its identifier o. As Lg only provides names for attributes, functions, and relations, nothing changes in it except the removal of o from the set of constants, yielding Ca.
Regarding the theory, there may be two cases for operator τhobj(o): either the constant (object) o does not occur explicitly in any of the formulas in Tg, or it occurs in some of them. In the former case nothing happens, and Ta = Tg. In the latter case, we have to remove all formulas in which the constant occurs explicitly. Let, for instance, Tg contain the formula

and let Bob be the hidden constant (person). Hence, the above formula cannot be applied anymore, and it has to be hidden from Tg. The method meth[Tg, τhobj(o)] is reported in the BODY field of Table 7.10.
Again, the choice of hiding all formulas in Tg in which the object to be hidden occurs is one among many possible choices. It is up to the user to choose one, according to the nature of the query and the context. For instance, the constant Bob could have been replaced, in expression (7.1), by an existentially quantified variable. The choice is encoded in the rules α in the PARAMETERS field.
In order to clarify the operators defined above, we introduce an example.
Example 7.3 Let us consider the situation described earlier in this section, where we have hidden object b. From the P-Set described in Example 6.3, we have built up the database Dg described in Example 6.5. Dg consists of the tables OBJ, SEGMENT-ATTR, FIGURE-ATTR, RADIUS, CENTER, ONTOP, LEFTOF, and SIDEOF. By applying the selection operation σID=b(OBJ) to the table OBJ, we observe that the object is present, and then we can apply the operator. The abstract table OBJ(a) then becomes:
As the two functions Radius and Center are not affected, because b is neither a circle
nor a point, then:
For what concerns relations, object b occurs in all tables ONTOP, LEFTOF, and
SIDEOF. Then:
Ca = Cg − {b},
Pa = Pg ,
Fa = Fg .
Then:
La = Ca , X, Og , Pg , Fg
The hidden information can be found in Δ(L)(C) = {b}.
The method meth[Tg, τhobj(b)] does not modify the theory, because the theory does not explicitly mention the object b. Notice that when the functions Area and Contour-length are instantiated on the scenario in the more abstract space, they will simply not be applied to b, which is hidden.
tions. This situation has been addressed, in Software Engineering, with the notion of
Design Patterns. In this section we would like to propose Abstraction Patterns as an
analogue to design patterns, to be used when the same type of abstraction is required
in different domains and/or applications. In the next subsection a brief introduction
to the concept of design patterns is presented, for the sake of self-containedness.
When introduced by Gamma et al. [190], Design Patterns were meant to capture
the “intent behind a design by identifying objects, their collaborations, and the dis-
tribution of responsibilities. Design patterns play many roles in the object-oriented
development process: they provide a common vocabulary for design, they reduce
system complexity by naming and defining abstractions, they constitute a base of
experience for building reusable software, and they act as building blocks from which
more complex designs can be built ”. But Design Patterns are also motivated by the
fact that they can speed up the development process, and improve the quality of
developed software. Indeed, they provide general documented solutions to particular
representation problems but are not tied to a particular context or formalism. Finally,
patterns allow developers to communicate using well-known, well understood names
for software interactions. Common Design Patterns also benefit from the accumulated experience of their use over time, which makes them more robust than ad-hoc “creative” designs that reinvent solutions.
Design Patterns have become widely used and many books have specified how to
implement them in different programming languages such as JAVA, C++, or Ajax.
Beyond programming languages, there have also been attempts to codify design
patterns in particular domains as domain specific Design Patterns. Such attempts
include business model design, user interface design, secure design, Web design,
and so on. There is not a unique way to describe a pattern, but the notion of Design
Pattern Template is widely used to provide a coherent and systematic description
of its properties. Within the context of this book we are neither concerned by a
particular language nor by software engineering per se. The key idea we want to retain
from Design Patterns is that of building a documented list of abstraction operators
and algorithms that support their implementation, and of defining a template for a
common language to describe them.
According to Gamma [190], the use of Design Patterns can be a suitable conceptu-
alization framework to design effective systems, because it allows the experience of
many people to be reused to increase productivity and quality of results. The same
can be said for abstraction. In fact, designing a good abstraction for a given task may be difficult and is still a matter of art, and it would be very useful to exploit the past experience of several people. By analyzing a number of applications, abstraction patterns might emerge; they could act as a starting point for a new application, to be adapted to specific requirements. A Design Pattern has three components:
1. An abstract description of a class or object and its structure.
2. The issue addressed by the abstract structure, which determines the conditions of
pattern applicability.
3. The effects of the pattern’s application on the system’s architecture, which suggest its suitability.
As emphasized by Rising [454], there is now a community in Software Development, called the patterns community, formed around the questions of identifying and documenting design patterns. In the field of AI and in Knowledge Engineering, what corresponds to the pivotal role of Design in Software Development is the central notion of Knowledge Representation. By analogy to Software Development we have chosen to describe the abstraction operators as a kind of Abstraction Patterns (AP). Informally, an abstraction pattern corresponds to a generic type of abstraction and to its impact, but also to a concrete approach for making it operational.
More precisely, we will identify four components in an AP:
1. An abstract description of the operator.
2. The issue addressed by the operator, which determines the conditions of pattern
applicability.
3. The effects of the operator on the system’s performance, which suggest pattern suitability.
4. An operationalization of the operator.
Behind the introduction of abstraction patterns is the very idea of “abstraction” itself: a user looks first at the available patterns to identify the operator class that seems best suited to his/her problem, without bothering with the operator details. Then, after the choice is made, the actual operators are analyzed and tried.
To homogenize the description of these components for a generic Abstraction Pattern (AP) we will use the template, adapted from a Design Pattern Template [267], reported in Table 7.12. In this template fields can be added or removed as needed.
Making a parallel with the classification introduced in Sect. 7.1, we subdivide
abstraction patterns into groups, as reported in Table 7.13.
Table 7.13 Classification of abstraction patterns according to their effects, and to the elements of the description frame they act upon

Type of abstraction ↓ | Elements (objects, types, attributes, functions, relations) | Arguments (of a function or relation) | Values (of an attribute, of a function’s argument or co-domain, or of a relation’s argument)
Hiding | Hiding elements | Hiding arguments | Hiding values
Equating | Equating elements | Equating arguments | Equating values
Hierarchy building | Building a hierarchy of elements | Building a hierarchy of arguments | Building a hierarchy of values
In this section the abstraction patterns for hiding components of a description frame
are provided for the sake of illustration.
The abstraction pattern describing the act of hiding an element of a description
frame Γg aims at simplifying the description of a system by removing from view
an element, be it an attribute or a function or a relation. The corresponding generic
operator is ωhy , where y ∈ {obj, type, attr, fun, rel}.
The abstraction pattern for hiding an argument acts only on functions and relations,
and corresponds to the operators ωhyarg , where y ∈ {fun, rel}. Hiding an argument in
a function or relation reduces its arity. It is necessary to provide rules for computing
the cover of the abstract function/relation, because this is not usually automatically
determined.
The abstraction pattern concerned with hiding a value in a description frame corresponds to the operator ωhyval, where y ∈ {attr, funarg, relarg} (Table 7.14).
Other APs are given in Appendix F.
7.13 Summary
For approximation, let us apply the two operators ρrepl((X, [0, ∞)), (X′, N)) and ρrepl((Y, [0, ∞)), (Y′, N)), which replace the attributes X and Y, assuming real values, with the attributes X′ and Y′, assuming integer values. In particular, the corresponding methods meth[P0, ρrepl((X, [0, ∞)), (X′, N))] and meth[P0, ρrepl((Y, [0, ∞)), (Y′, N))] state that x′ = ⌊x⌋ and y′ = ⌊y⌋. The effect of these operators, to be applied simultaneously, is to round down all the real coordinates to the largest integers not greater than the coordinates themselves.
Finally, in order to exemplify reformulation, we change the coordinate system from the Cartesian pair (X, Y) to the polar coordinates (ρ, θ), with ρ ≥ 0 and 0 ≤ θ ≤ π/2. We have then:

ρ = √(x² + y²)
θ = arctan(y/x)
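Both transformations are easy to express in Python; the sketch below uses only the standard library, and the printed values can be checked against the worked example that follows.

import math

def floor_approx(x, y):
    """rho_repl on the coordinates: round down to the largest integers
    not greater than x and y."""
    return math.floor(x), math.floor(y)

def to_polar(x, y):
    """Reformulation: the same point in a different coordinate system."""
    return math.hypot(x, y), math.atan2(y, x)   # atan2 computes arctan(y/x) here

print(floor_approx(0.60, 1.13))         # (0, 1)
rho, theta = to_polar(0.60, 1.13)
print(round(rho, 2), round(theta, 2))   # 1.28 1.08, the first entry of Ar below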
Let us suppose now that we observe a set of 10 points in the plane. Then P0 = ⟨O0, A0, ∅, ∅⟩, with:

O0 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A0 = {(1, point, 0.60, 1.13), (2, point, 1.34, 9.24), (3, point, 2.63, 8.56), (4, point, 6.05, 3.86), (5, point, 8.35, 9.05), (6, point, 9.80, 7.87), (7, point, 12.11, 3.03), (8, point, 14.29, 9.58), (9, point, 17.41, 5.89), (10, point, 19.11, 3.73)}
F0 = R0 = ∅
In Fig. 8.1 the three transformed P-Sets Pa, Pap, and Pr are reported. By applying meth[P0, ωhobj(ϕhide)] to P0, we obtain the following P-Set Pa:
Oa = {1, 3, 5, 7, 9, UN, . . . , UN}
Aa = {(1, point, 0.60, 1.13), (3, point, 2.63, 8.56), (5, point, 8.35, 9.05),
(7, point, 12.11, 3.03), (9, point, 17.41, 5.89),
(UN, point, UN, UN), . . . , (UN, point, UN, UN)}
Fa = Ra = ∅
The set COMP0 (Pa ) consists of all the configurations in which the UN’s in Oa and
Aa are replaced with precise values (with 2 decimal digits). COMP0 (Pa ) contains
P0 , and hence the transformation is indeed an abstraction. Notice that the UN values
are set here to denote the places where an abstraction took place, but they are ignored
when reasoning in the abstract space.
By applying the methods meth[P0, ρrepl((X, [0, ∞)), (X′, N))] and meth[P0, ρrepl((Y, [0, ∞)), (Y′, N))], the following P-Set Pap is obtained:
Oap = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Aap = {(1, point, 0, 1), (2, point, 1, 9), (3, point, 2, 8), (4, point, 6, 3), (5, point, 8, 9), (6, point, 9, 7), (7, point, 12, 3), (8, point, 14, 9), (9, point, 17, 5), (10, point, 19, 3)}
Fap = Rap = ∅
Fig. 8.1 Transformation of a P-Set P0 into Pa, via abstraction, into Pap, via approximation, and into Pr, via reformulation. In Pa all the points with even identifiers are hidden. In Pap all points have their coordinates approximated by their floor values. In Pr the points are in one-to-one correspondence with the points in P0
The set COMP0(Pap) ≡ Pap consists of a single configuration. On the other hand, P0 ∉ COMP0(Pap), and then COMP0(Pap) ∩ COMP0(P0) = ∅; hence the transformation is indeed an approximation.
Finally, by changing the coordinate system from the Cartesian to the polar one
(where angles are measured in radians), the following Pr is obtained:
Or = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Ar = {(1, point, 1.28, 1.08), (2, point, 9.34, 1.43), (3, point, 8.95, 1.27), (4, point, 7.18, 0.57), (5, point, 12.31, 0.83), (6, point, 12.57, 0.68), (7, point, 12.48, 0.25), (8, point, 17.20, 0.59), (9, point, 18.38, 0.33), (10, point, 19.47, 0.19)}
Fr = Rr = ∅
In this case we have COMP0 (Pr ) ≡ Pr and COMP0 (P0 ) ≡ P0 ; on the other hand
Pr is functionally related to P0 , and then the two sets COMP0 (Pr ) and COMP0 (P0 )
coincide, as it must be for a reformulation. In fact, from any pair (ρ, θ) a single point
(the original one) is recovered in the (X, Y ) plane.
As an example, let us consider hiding an object, which is one of the most complex hiding operations. If o is the identifier of the hidden object in ΓO(g), then we will have, for every Pg in which o occurs:

Og = {o1, o2, . . . , oN}
Oa = {o′1, . . . , o′N−1} ∪ {UN}

where UN stands for one of the identifiers in {o1, o2, . . . , oN}. Moreover:

Aa = Ag − {(o, t, v1(o), . . . , vM(o))} ∪ {(UN, UN, . . . , UN)}
Finally, in all the covers of functions and relations all occurrences of o are replaced with UN. Clearly, when reasoning in the abstract space, all the UN values are ignored (and the corresponding tuples as well), but the user is always aware that they denote something hidden. As the above derivation is valid for all Pg ⊆ Ψg, the operator ωhobj is an abstraction operator.
It is equally easy to show that operator ωhattrval is an abstraction operator. In fact,
it simply replaces some value in the domain Λm of an attribute Am with UN. This UN
stands for any value in Λm , including the “correct” one. Analogous reasoning can
be done when the value is hidden from the codomain of a function.
A little trickier is ωharg, which hides an argument in a function (or relation). In fact, all functions and relations have arguments that belong to ΓO(g); then, declaring an argument of a function (or relation) to be UN does not change the function (or relation), because the unknown argument can take on any value in ΓO(g). However, given a Pg and a function f(x1, . . . , xj, . . . , xt) (or relation R(x1, . . . , xj, . . . , xt)), the cover of f (or R) in Pg becomes less informative, as any observed value of argument xj is replaced by UN, thus introducing tuples that were not present in it. As this is true for any Pg, any function f or relation R, and any argument xj, operator ωharg is indeed an abstraction operator. Hiding an argument of a relation or function can be implemented, in a database, with the projection operator of relational algebra.
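A small Python sketch of this operator on the cover of a binary relation follows; the relation and the UN marker encoding are our own illustration.

UN = "UN"

def hide_argument(cover, j):
    """A sketch of omega_harg: every observed value of argument j is
    replaced by UN; in a database this amounts to a projection."""
    return {t[:j] + (UN,) + t[j + 1:] for t in cover}

leftof = {("a", "c"), ("a", "d"), ("b", "c")}
print(hide_argument(leftof, 1))   # {('a', 'UN'), ('b', 'UN')}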
Moving to the group of operators ωhier that create hierarchies, they hide a set of elements or values, and replace each one of them by a more abstract element or value. Let us consider the operator that builds up a more abstract type, namely ωhiertype(ΓTYPE,child(g), t(a)). The link between each element tj ∈ ΓTYPE,child(g) and t(a) is an is-a link. Any configuration in the abstract space, where t(a) occurs, corresponds to the set of configurations in which t(a) can be replaced by any one of the tj ∈ ΓTYPE,child(g). In other words, any abstract element corresponds to the set of ground elements from which it has been defined. Then, operators of this group are indeed abstraction operators.
Finally, let us consider the composition operators, which combine description elements to form new ones. As we have seen in Sect. 7.5, there are four operators in this group, namely ωcoll, ωaggr, ωgroup, and ωconstr. The first three act exclusively on objects.¹ Let us consider the operators one at a time.

¹ This is a choice that we have made for the sake of simplicity. It is easy to envisage, however, that combination can be defined on other descriptors or values.
Operator ωcoll(t, t(a)) builds up a collective object of type t(a), which is the ensemble of many objects of the same type t. In more detail, we have:

ΓTYPE(a) = ΓTYPE(g) ∪ {t(a)}
ΓO(a) = ΓO(g) ∪ ΓO,t(a)(g)
ΓA(a) = ΓA(g)
ΓF(a) = ΓF(g)
ΓR(a) = ΓR(g)
Notice that, at the level of the description frame, nothing changes except ΓTYPE(a) and ΓO(a). In fact, type t remains in ΓTYPE(a), because not all objects of type t necessarily enter a collective object. Even though the original identifiers in ΓO(g) could be used to denote the abstract objects, it may be convenient to introduce some specific identifiers for them. All ground attributes are still valid, and the functions and relations do not change, as such. The real difference between ground and abstract representations only appears when the corresponding methods are applied to Pg = ⟨Og, Ag, Fg, Rg⟩. For the sake of simplicity, we assume that just a single abstract object can be created from Pg. Denoting the collective object by c, we obtain, in this case:

Oa = Og − {o1, . . . , ok} ∪ {c} (k ≥ 2)
Aa = Ag − {(o1, t, v1(o1), . . . , vM(o1)), . . . , (ok, t, v1(ok), . . . , vM(ok))} ∪ {(c, t(a), v1(c), . . . , vM(c))}
In the above expressions {o1, . . . , ok} is the set of objects of type t entering the abstract object c of type t(a). Given that t(a) is a new type, the ground attributes may or may not be applicable to it; then, attribute values are set to UN, or NA, or to some specific value, depending on the specific meaning of t(a). Each original object oj (1 ≤ j ≤ k) is linked to the newly created one, c, via an individual-of relation.
Concerning functions, Fg contains the covers of the functions fh (1 ≤ h ≤ H) defined in ΓF(g). In the cover FCOVg(fh) all the tuples where at least one of the objects oj (1 ≤ j ≤ k) occurs are hidden. The same can be done for relations.
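The following Python sketch applies the method to a toy perception; the object names, the forest example, and the tuple encoding of attributes are our own assumptions.

def omega_coll(objects, attrs, members, c, t_abs):
    """A sketch of meth[Pg, omega_coll(t, t^(a))]: the objects in `members`
    are replaced by a single collective object c of the new type t_abs;
    the ground attribute values of c default to UN."""
    Oa = (objects - set(members)) | {c}
    Aa = {t for t in attrs if t[0] not in members}
    Aa |= {(c, t_abs, "UN", "UN")}               # attribute values unknown
    individual_of = {(o, c) for o in members}    # links kept for de-abstraction
    return Oa, Aa, individual_of

Og = {"t1", "t2", "t3", "lake"}
Ag = {("t1", "tree", "green", "tall"), ("t2", "tree", "green", "short"),
      ("t3", "tree", "brown", "tall"), ("lake", "lake", "blue", "NA")}
print(omega_coll(Og, Ag, ["t1", "t2", "t3"], "forest1", "forest"))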
According to the above considerations, we can conclude that ωcoll(t, t(a)) is indeed an abstraction operator. In order to see this, let us consider a Pg which includes a single configuration ψg. Also Pa includes a single configuration, ψa, namely the one obtained from Pg via ωcoll. Configuration ψa consists of the sets Oa and Aa, reported above, plus the sets Fa = {FCOV′(fh) | 1 ≤ h ≤ H} and Ra = {RCOV′(Rk) | 1 ≤ k ≤ K}, containing the abstracted covers of functions and relations. The generic cover FCOV′(fh) is the cover FCOV(fh), where some argument values have been replaced by UN. This new cover corresponds to the set of configurations, in the ground space, that can be obtained by replacing each UN with any admissible value, clearly including the “correct” one. The same can be said for RCOV′(Rk). The only critical point is
the introduction of the new collective object with identifier c. However, this does not increase the information, because c is functionally determined by the application of an abstraction function coll(o1, . . . , ok) to the set of objects {o1, . . . , ok}. Then, the abstract configuration ψa is compatible, in Ψg, with all the configurations that involve k-tuples of objects generating c.
Let us now consider operator ωaggr((t1, . . . , ts), t(a)), which generates an object of a new type by putting together objects of different types, on a perceptive or functional basis, and operator ωgroup(ϕgroup, G(a)), which groups objects according to a given (possibly extensional) criterion. By following the same reasoning as for ωcoll, it can be proved that ωaggr and ωgroup are abstraction operators as well.
The role of the last combination operator, namely ωconstr(Constr), is somewhat special. It is used to construct abstract descriptors, be they attributes or functions or relations, starting from ground ones. It is difficult to precisely analyze its behavior without defining the function Constr. What we can say, in general, is that, in order for the operator to be an abstraction, no new information has to be added in the abstract space, and the new descriptor’s values must be deducible from the ground ones. Given the codomain of the function

Constr : ΓA(g) × ΓF(g) × ΓR(g) → ΓA(g) ∪ ΓF(g) ∪ ΓR(g),

each value in it may be, in principle, obtained from more than one tuple in the domain. Each tuple corresponds to a ground configuration consistent with the value in the abstract one.
Apart from its use as a stand-alone operator, ωconstr can be used in connection with the other combination operators. In fact, when building up a new type, one may think of defining derived attributes from those of the constituents, without losing the property of obtaining an abstract space.² Then, the application of one of the operators ωcoll, ωaggr, or ωgroup can be followed by one or more applications of the operator ωconstr, obtaining a more complex abstraction process.
The methodology described in this section can be applied to all the operators introduced in Chap. 7 and Appendix E, which can all be proved to be abstraction operators.
In this section we analyze some of the approximation operators defined in Sect. 7.6 with respect to their information content. In Chap. 7 we defined as approximation operators those that identify sets of elements, and replace these elements with one of them, namely ρidelem, ρidval, and ρidarg.³ Moreover, we have defined a special replacement operator ρrepl.

² A conservative attitude would assign to all existing attributes an NA value for the new object.
³ We recall that operators that build up equivalence classes and denote them by generic names are abstraction operators, instead.
Let us consider, for instance, ρidobj(ϕid). This operator searches for k-tuples of objects satisfying ϕid, selects one of them, say o(a) = oj (1 ≤ j ≤ k), and sets the others equal to o(a). The way in which o(a) is chosen (for instance, randomly) is specified by meth[Pg, ρidobj(ϕid)]. In particular, given Pg = ⟨Og, Ag, Fg, Rg⟩, let us suppose, for the sake of simplicity, that there is just one tuple satisfying formula ϕid (the extension to multiple tuples is straightforward). Then, if ϕid(o1, . . . , ok) is true, and t is the type of the oj (1 ≤ j ≤ k), we have:

In the above expression we notice that o(a) has, of course, the same type as the equated objects. Moreover, each of the objects oj (1 ≤ j ≤ k) becomes o(a) and assumes the same attribute values as o(a). Finally, all occurrences of each of the oj (1 ≤ j ≤ k) in the covers of functions and relations are replaced by o(a).
As we can see, we are dealing with an approximation, because some objects and the corresponding attributes are all replaced with other ones, but not with an abstraction, because the ground configurations corresponding to an abstract one are not a superset of the original ones. Then Pg and Pa are incomparable with respect to their information content. From the syntactic point of view this approximation is also a simplification, because the number of distinct elements in Pa is reduced; nevertheless, this fact does not imply, in our definition, that Pa ⊑ Pg.
Analogous considerations hold for the other approximation operators that we have defined. Let us look, for instance, at ρidtype, which is one of the most useful and interesting approximation operators. Equating two types means that all objects, instances of these types, are declared to be of just one type, chosen among the two. For example, we can identify the types square and rectangle, and let them be represented by rectangle. With this approximation all instances of squares and rectangles, in any Pg, are declared of type rectangle, and then they will have the attributes of the selected type, possibly with some NA or UN values. As an example, let Color and Side be the attributes of type square, whereas those of type rectangle are Color, Height, and Width. Let moreover (a, square, blue, 20) be an instance of square, and (b, rectangle, yellow, 15, 10) be an instance of rectangle. If the only type considered is rectangle, then the description of object b is unchanged, whereas the one of object a becomes (a, rectangle, blue, 20, 20). Clearly, the method meth[Pg, ρidtype] specifies that Height(a) = Side, Width(a) = Side, and Color(a) = Color. On the other hand, if we equate the type square with circle, and the attributes of a circle are Color and Has-Center, then the new description of the square will be (a, circle, blue, NA). Notice that the objects a and b are kept distinct. In this case, as well, the operator is not an abstraction operator.
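This attribute remapping can be sketched in a few lines of Python; the tuple encoding of object descriptions is our own.

def rho_idtype_square_to_rect(desc):
    """A sketch of equating square with rectangle: squares are re-declared
    as rectangles, with Height and Width both taken from Side."""
    obj, type_, color, *rest = desc
    if type_ == "square":
        side = rest[0]
        return (obj, "rectangle", color, side, side)
    return desc

print(rho_idtype_square_to_rect(("a", "square", "blue", 20)))
# ('a', 'rectangle', 'blue', 20, 20)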
Given a query environment QE = ⟨Q, Γ, DS, T, L⟩, we have seen that the observations cannot usually be exploited as they are, because they consist of “instantaneous signals”, and then they must be rewritten into a format exploitable by the computational devices specified in T. The exploitable observations constitute the data structure DS. We will now prove that the transformation from Γ to DS (and hence, from P to D) is a reformulation, in the sense that it does not change the configuration space, and, hence, Γ and DS are equivalent from the point of view of information potential.⁴

⁴ This justifies the name of the model KRA as Knowledge Reformulation and Abstraction.
ψP = ⋃o∈O (o, t(o), vj1(t(o)), . . . , vjMt(t(o))) ∪ ⋃h=1…H FCOV(fh) ∪ ⋃k=1…K RCOV(Rk)

Moreover, each table in DS, associated to a function fh, contains exactly one row for each tuple occurring in FCOV(fh), and the same is true for each table corresponding to a relation Rk. Then:

ψP = ψD
If some information is missing, the same UN values occur at the same places in both P and D. Hence, if D is constructed from P using the algorithm BUILDATA(P), P and D generate the same set of compatible configurations, i.e.:

COMP(P) = COMP(D)

As the above reasoning holds for any P, the equivalence holds for Γ and DS as well.
As we explained in Chap. 6, in order to answer a query Q, a theory must usually
be provided and applied to D. Let ANSg (Q) be the set of answers to the query
obtained in the ground space, and let ANSa(Q) be the one obtained in the more abstract one. Following Giunchiglia and Walsh [214], we extend their classification of abstractions as theorem increasing, theorem decreasing, and theorem constant, in such a way that the classification can be applied also to contexts other than theorem proving.
Definition 8.2 (A∗-Abstraction) Given a query Q and a query environment QEg = ⟨Qg, Γg, DSg, Tg, Lg⟩, let ANSg(Q) be the set of answers obtained by applying theory Tg to DSg. Let QEa = ⟨Qa, Γa, DSa, Ta, La⟩ be the more abstract query environment, obtained by applying an operator ω to Γg. We say that Ω = (ω, δ, λ, τ) is:
• Answer Increasing (AI), if ANSa(Q) ⊃ ANSg(Q),
• Answer Decreasing (AD), if ANSa(Q) ⊂ ANSg(Q),
• Answer Constant (AC), if ANSa(Q) = ANSg(Q).
If the query is the proof of a theorem, the notion of an A∗-Abstraction becomes exactly Giunchiglia and Walsh’s definition of a T∗-Abstraction.
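For finite sets of answers the classification can be computed directly, as in the following Python sketch (our own encoding, assuming hashable answers).

def classify(ans_g, ans_a):
    """Classify an operator by comparing answer sets (Definition 8.2)."""
    if ans_a == ans_g:
        return "AC"   # answer constant
    if ans_a > ans_g:
        return "AI"   # answer increasing (proper superset)
    if ans_a < ans_g:
        return "AD"   # answer decreasing (proper subset)
    return "incomparable"

print(classify({1, 2}, {1, 2, 3}))   # AI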
In the KRA model of abstraction there are several interacting components. First of all, there is the relation between a description frame Γ and a P-Set P, the former delimiting the boundaries of what could be observed with a given set of sensors, the latter collecting actual measures performed on a concrete system. An abstraction operator ω applied to Γ constrains all the potential descriptions of any system observable with a given set of sensors Σ, whereas the method meth[Pg, ω], applied to Pg, implements, on a particular system, the changes specified by ω.
Another fundamental point we have highlighted is that abstraction acts on descrip-
tions of a system, and not on the system itself. Then, whatever its description (coherent
with a given Γ ), the system is always the same and the information about it can be
shown/hidden at will. In fact, abstraction is a non-destructive process, and the hidden
information is memorized to be retrieved later, if needed.
Finally, the presence of a query is crucial. In fact, there is no point in abstracting without any target, and the effectiveness of an abstraction is strictly linked to the way the query can be answered. Then, any abstraction theory should take the query into account. As we have already mentioned, in order to perform the task of answering a query, we need information coming from two sources, namely observations from the world, described in Γ, and a “theory” T, providing the tools to perform the task. These are the two primary components of QE, because the data structure DS and the language L are determined by these two. In fact, DS simply formalizes the content of Γ, as shown in the previous section, and L is simply a tool to express T, Q, and DS.
Given the query Q, we know that T and Γ should agree, in order to answer Q.
This means that the observations collected in any P, describable in Γ , should at
least include the information needed by T . There are basically three alternatives to
provide T and Γ :
• The theory T and the description frame Γ can be chosen independently. In this
case, to provide them is easier, but agreement is not guaranteed. If there is no other
way to proceed, it would be better to collect from the world as much information
as possible, and to gather a large theory, in such a way that the relevant parts can
be extracted later on. In this type of behavior there may be a waste of resources.
• The theory T is given first, biased by Q, and the observables will only include
aspects that are necessary to use T .
• The set of sensors is given first, and hence, Γ . The theory T shall then agree
with Γ .
In practice, the three alternatives may be combined. Nevertheless, it may happen that complete agreement is not reachable, and only approximate/partial answers can be given to Q. We may also notice that the process of selecting the observations and the theory is often a spiral: one may start from one of the two, and then adjust T and/or Γ in order to reach agreement gradually. Our model of abstraction is not concerned with the possible ways of letting T and Γ agree; what we are interested in is simply how the application of an abstraction operator affects agreement.
As an example, let us consider a concept learning task. We may provide, as
observations (Γ ), a set of learning examples described by continuous features. For
performing the task the algorithm ID3 [440] is available (T ). However, ID3 only
works on discrete features, so that T and Γ do not agree, because ID3 cannot be
used on the learning examples. This is a case of disagreement that can be solved in
various ways; for instance, by searching for another learning algorithm (changing
T ), or by searching for discrete features characterizing the examples (changing Γ ),
or by discretizing the available features (abstracting Γ ).
Going into some more detail, the observations (i.e., the “percepts” P) play, in our view, the primary role; in fact, the measures that can be collected on a system are often constrained by physical impossibility, unavailability, cost, risk, and so on. Then, in many cases, we are not free to collect all the information we want or need. Moreover, owing to the mere fact of existing, the sensory world is intrinsically coherent, and, then, the information arriving from the sensors is internally consistent. Clearly, this is true assuming that the sensors are working correctly, and that they are used in a proper way; if not, other issues beyond abstraction emerge, and we are not dealing with them. As we briefly discussed in Sect. 6.2, the “percepts” must be acquired and then memorized, to be available later on. The memorization is structured according to stimuli “similarity”, in such a way that stimuli arriving from the same sensor are kept together.
If DS is a database, the tables in it are originally anonymous; each one is semantically identified by the corresponding sensor and not by a “name”. However, the manipulation and the communication of the content of the tables require that they indeed receive names; these names are provided by the language L. The fact that the tables in DS do not have a name, per se, is an important aspect of the model. In fact, the table content corresponds to the outcome of some sensor, which does not depend
on the language used to describe it. For instance, let us consider the case where we
group together stimuli corresponding to the color of objects; the table reporting the
color of objects can be assigned the name colore, if one speaks Italian, or color
if one speaks English. In fact, the name of the table does not change the perception of
colors. The same observations can be done for the objects, which receive a posteriori
their name. In this view the language is ancillary to the perception via the database.
On the other hand, the “theory” needed to reason about S is, in some sense,
independent of the perception, because it states generic properties holding in any
S. When the theory is instantiated on an actual system S, it provides specific
information about it. From this point of view, even though the theory T could
be selected independently of P, a theory that mentions properties that cannot be
acquired from S is of no use. A theory may, of course, predict aspects of S not yet
known, as in the striking case of the Higgs boson [248]. Also, a "theory" may be the
starting point when generating worlds in virtual reality. However, in the problems
we have to face in everyday life, we need a theory that can be applied immediately.
Then, the perception and the theory must be compatible.
Concerning the language L, once its nature (logical, mathematical, …) has been
decided, its constituent elements are a consequence of the perception (through DS)
and the theory. L must be such that the theory can be expressed in it, and all the
tables in DS and constants in O receive their name.
Once the theory is added to QE, some inferences can be done, leading to the
potential definition of new tables, which, in turn, may offer new material for further
inferences. For economy reasons, the inferences are not done at definition time, nor
are new tables added to DS; both activities will be performed on demand, only if
and when needed.
Another issue that should be discussed is the bottom-up (abstraction) versus top-
down (concretion) use of any abstraction model (including KRA). By definition,
abstraction is a process that goes from more detailed representations to less detailed
ones. In this case, which is the one proposed in most previous models in Artificial
Intelligence (except planning), abstraction is constrained by the observations, and Pa
must comply with the ground truth provided by Pg , in the sense that, by de-abstracting
Pa , Pg should be obtained again.
However, one may think as well of a top-down process, where an original description
of a system is made more precise by adding details step by step. This is the
case of design, invention, creation, where there is no a priori ground truth. This is,
for instance, Floridi’s approach [176], discussed in the next section. The same idea
is behind Abstract Data Types (ADT) in Computer Science, where an ADT is first
introduced at its maximum abstraction level, and implementation details are added
later on. An interesting top-down approach is proposed by Schmidhuber [477], who
creates even complex pictures starting from a single large circumference and adding
smaller and smaller ones, as illustrated in Fig. 8.2.
We notice that the top-down abstraction process described above is different from
the one used, for example, in planning. Here there is a ground truth, Pg , consisting
of detailed states and actions. When Pg is abstracted, less detailed states and actions
are obtained, which allow a schematic plan to be formulated. When the abstract plan
Fig. 8.2 Schmidhuber proposes a top-down approach to create abstract pictures. Starting from a
single circumference, smaller and smaller circumferences are added. By keeping only parts of the
circumferences, even complex figures can be created (Reprinted with permission from Schmidhuber
[477])
is refined, the details to be included are the ones originally present in Pg . In the case
of a true top-down abstraction process, the result may be totally new.
Even though the KRA model is primarily intended for a bottom-up use, which
is also the most common one in Artificial Intelligence tasks, it can also be used
top-down, by inverting the abstraction operators defined in Chap. 7. An example will
be provided in Sect. 8.8.1.1, when discussing Floridi’s approach.
Another aspect of abstraction that deserves to be discussed emerges when com-
bination operators are applied, in particular those that generate a collective or aggre-
gated object, or that create a group. In such cases, combined objects and new ones
are related by special relations, namely individual-of for collection, part-of for
aggregation, and member-of for grouping.
As we have seen in the definition of the operators’ body (see Sect. 7.10.1), these
relations are neither in Pg nor in Pa , but, instead, they are created during the oper-
ator application, and are stored in the operator’s memory. In fact, each one of these
relations establishes a link between objects that belong to different spaces, situated
at different levels of abstraction. Then, we must use either the former or the latter.
This fact may appear strange at first sight, because we see at the same time both the
components and the whole. This is due to the fact that when we see some special
arrangement of objects (for instance the parts of the computer in Fig. 5.3), the past
experience tells us that their association or aggregation brings some conceptual or
computational advantage, reinforced each time we see it anew. Then, in those par-
ticular cases, we automatically know, on the basis of past learning, what is a suitable
abstraction, without searching for it. As a consequence, we (humans) are able to rea-
son moving quickly and seamlessly without apparent effort between two (or more)
abstraction spaces at the same time.
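As an illustration of how combined objects and the special relations linking them can be kept in the operator's memory, rather than in Pg or Pa, consider the following Python sketch; the encoding (plain lists and a dictionary) is hypothetical and only meant to show where the part-of links live.

def aggregate(ground_objects, parts, new_name, memory):
    # Replace `parts` by a single aggregate object `new_name`.
    abstract_objects = [o for o in ground_objects if o not in parts]
    abstract_objects.append(new_name)
    # The part-of links are stored in the operator memory, not in Pa,
    # because they connect objects across two abstraction levels.
    memory[new_name] = {"relation": "part-of", "components": list(parts)}
    return abstract_objects

memory = {}
ground = ["screen", "keyboard", "tower", "mouse", "desk"]
abstract = aggregate(ground, ["screen", "keyboard", "tower", "mouse"],
                     "computer", memory)
print(abstract)            # ['desk', 'computer']
print(memory["computer"])  # de-abstraction can recover the components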
In the abstraction literature it is well known that some types of abstraction may
generate inconsistencies in the abstract space, as discussed, for instance, by Plaisted
[419], Giunchiglia and Walsh [214], Zilles and Holte [587], and Tenenberg [526].
As will be shown in the following, this problem might not be as severe as it appears
at first sight. In fact, the consistency or inconsistency of the abstract space may or
may not be an issue, because the only important thing is whether the given query can
or cannot be answered in the abstract space. For instance, if we plan a trip by car from
one city to another, color and make of the car do not matter, whereas speed does.
Then, if there is an inconsistency about color or make, they can be ignored, as even
an inconsistent theory can be fruitfully used (avoiding checking for inconsistencies).
On the other hand, if a car has to be bought, color and make are relevant, and the
presence of an inconsistency may affect the results.
The reason why inconsistency arises in abstraction is that logical theories assume
that abstracting amounts to deducing all that can be deduced from the ground space
and the abstraction mapping. Actually, abstraction should not be primarily concerned
with deduction, and hence with theory correctness or completeness, but with usefulness.
In fact, the very idea of abstraction is deciding what information to keep and what
information to ignore. It is the user who has to decide how much he/she is ready to
bet, risking wrong results against obtaining useful ones at a reduced cost.
Then, in the abstract space we have to preserve what we decide to keep, not what
can be deduced.
Then, a crucial issue in abstraction is the ability to go back to the ground space,
when abstraction proves to be unsatisfactory, and either try another abstraction, or
give up abstracting altogether. From this perspective a very important issue is the ease
of going up and down across abstraction levels. This is the reason why we keep ready in
memory what has been hidden from one level to another, in order to facilitate coming
back to the ground space. The next example, provided by Tenenberg, and reported
in Example 4.15, illustrates the issue.
Example 8.2 (Tenenberg [526]) Let us go back to Example 4.15. In Tenenberg’s
formulation, all the information provided for reasoning is put together in a single
"theory". In order to handle the same example in KRA, we have to describe the
ground description frame Γg = ⟨ΓTYPE(g), ΓO(g), ΓA(g), ΓF(g), ΓR(g)⟩, where:

ΓTYPE(g) = {glass, bottle}
ΓO(g) = {a, b, . . .}
ΓA(g) = ΓF(g) = ΓR(g) = ∅
In the KRA model predicate mapping is realized by an operator that constructs a
node in a type hierarchy. Then, we apply to Γg the operator
ωhiertype({glass, bottle}, container),
which maps types glass and bottle to a new, more abstract type, container.
The result of this operator is the more abstract description frame
Γa = ⟨ΓTYPE(a), ΓO(a), ΓA(a), ΓF(a), ΓR(a)⟩, where:

ΓTYPE(a) = {container}
ΓO(a) = ΓO(g)
ΓA(a) = ΓF(a) = ΓR(a) = ∅
By using KRA’s typing facility, there is no need to explicitly say that a glass is
not a bottle, and vice-versa, which is the rule generating the inconsistency. The
description frame Γa is neither consistent nor inconsistent; simply, in Γa there is no
more distinction between glasses and bottles, and this has only effect on the possibility
to answer the query. In Γa , any question that requires glasses to be distinguished from
bottles cannot be answered anymore, because we simply ignore that there are bottles
and glasses in the world.
Tenenberg shows that an inconsistency arises when a bottle a is observed, and he
makes the following derivation:
bottle(a) ⇒ container(a)
bottle(a) ⇒ ¬glass(a)
glass(a) ⇒ container(a)
container(a) ⇒ ¬container(a)
In our model we have the following ground and abstract observations:
Og = {a}
Ag = {(a, bottle)}
Fg = Rg = ∅
and
Oa = {a}
Aa = {(a, container)}
Fa = Ra = ∅
and there is no inconsistency in the abstract space, but only a less detailed description.
In fact, in our model the derivation that generates the inconsistency is not allowed,
because it involves elements across two different abstraction levels.
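The effect of ωhiertype in this example can be made concrete with a small Python sketch; the dictionary encoding of the description frame is our own simplification, not the KRA data structures.

def hiertype(frame, children, parent):
    # Collapse the types in `children` into the new abstract type `parent`.
    abstract = dict(frame)
    abstract["TYPE"] = [t for t in frame["TYPE"] if t not in children]
    abstract["TYPE"].append(parent)
    # object-type assignments are rewritten accordingly
    abstract["A"] = {o: (parent if t in children else t)
                     for o, t in frame["A"].items()}
    return abstract

gamma_g = {"TYPE": ["glass", "bottle"], "A": {"a": "bottle"}}
gamma_a = hiertype(gamma_g, {"glass", "bottle"}, "container")
print(gamma_a)  # {'TYPE': ['container'], 'A': {'a': 'container'}}
# "Is a a bottle?" can no longer be answered in the abstract frame:
# the distinction has been hidden, not contradicted.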
A more complex example is also provided by Tenenberg (reported in Example 4.16),
and we show here how it can be handled in the KRA model.
Example 8.3 (Tenenberg [526]) Tenenberg provides another case of predicate map-
ping, in which, besides types, there are also attributes of objects, which we partition
into observable and deducible. The observable attributes (Made-of-glass, Graspable,
Open) are inserted in Γg, whereas the deducible ones (Movable, Breakable, Pourable)
are inserted in the theory Tg.5
5 There are actually alternative ways to represent this example in the KRA model. We have chosen
the one that gives the closest results to Tenenberg’s.
By using the 18 rules reported in Example 4.16, we can define the following
observation frame Γg :
ΓTYPE(g) = {glass, bottle, box}
ΓO(g) = {a, b, . . .}
ΓA(g) = {⟨Made-of-glass, {yes, no}⟩, ⟨Graspable, {yes, no}⟩, ⟨Open, {yes, no}⟩}
ΓA,bottle(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓA,glass(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes}⟩}
ΓA,box(g) = {⟨Made-of-glass, {yes, no}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓF(g) = ΓR(g) = ∅
As we may see, the domains of the attributes reported in ΓA(g) are generic, and contain
all the possible values the corresponding attributes can take. On the contrary, some
of the attributes associated with specific types may have fixed values, common to all
instances of the type. The values of the attributes have been derived from rules (1–6).
We may note that rule (13) of Example 4.16 is not used, because it actually refers
to a less abstract description frame, where the basic types are milk-bottle and
wine-bottle, instead of bottle. Then, the frame called Γg is already a result
of a previous abstraction, where an operator creating a hierarchy has been applied.
As this rule is not used in the following, we may ignore this previous abstraction and
start from Γg .
Rules (7–12) are implicitly taken into account by the typing mechanism, and we
do not need to make them explicit. The same can be said for rule (17). The theory contains
rules (14–16), namely:
Tg = {graspable(x) ⇒ movable(x),
made-of-glass(x) ⇒ breakable(x),
open(x) ∧ breakable(x) ⇒ pourable(x)}
Let us apply now the abstraction operator that builds up a node in a type hierarchy,
starting from bottle and glass, i.e.,
ωhiertype ({glass, bottle}, glass-container).
The method meth⟨Pg, ωhiertype({bottle, glass}, glass-container)⟩
must specify what to do with the attributes of the original types. This is a choice
that the user has to make. For instance, he/she may decide to be conservative, and thus
selects, for the type glass-container, the following:
• only the attributes that are common to both bottle and glass,
• for each selected attribute, the smallest superset of the values appearing in both
types bottle and glass.
With this choice, the abstract description frame Γa becomes:
ΓTYPE(a) = {glass-container, box}
ΓO(a) = ΓO(g)
ΓA(a) = ΓA(g)
ΓA,glass-container(a) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓA,box(a) = ΓA,box(g)
ΓF(a) = ΓR(a) = ∅
As both types bottle and glass have attributes Made-of-glass, Graspable, and
Open, all three attributes are associated with the type glass-container. Moreover,
both glasses and bottles are made of glass and are graspable, so that the attributes
Made-of-glass and Graspable have only the value yes. On the contrary, glasses are
open, but bottles may not be, so that attribute Open, in glass-container, may
take values in the whole domain {yes, no}. Moreover, the theory Ta is equal to Tg.
In Tenenberg’s example there is only one observation, namely open(a). In our
model this is represented by the following Pg :
Og = {a}
Ag = {(a, UN, UN, UN, yes)}
Fg = Rg = ∅
The values of a's attributes are reported in the same order in which they appear in
ΓA(g). More precisely, the first UN stands for the type, which is not specified, the
second and third UN stand for Made-of-glass = UN and Graspable = UN, whereas the
last yes tells that object a has been observed to be open.
The observation Pg consists of several configurations, all the ones where each UN
is replaced by any value in the attribute’s domain. In particular:
Pg = {ψ1 , ψ2 , ψ3 , ψ4 }
where:
ψ1 = (a, glass, yes, yes, yes)
ψ2 = (a, bottle, yes, yes, yes)
ψ3 = (a, box, yes, yes, yes)
ψ4 = (a, box, no, yes, yes)
By abstracting the possible configurations, we obtain:
ψ1(a) = (a, glass-container, yes, yes, yes)
ψ2(a) = (a, glass-container, yes, yes, yes)
ψ3(a) = (a, box, yes, yes, yes)
ψ4(a) = (a, box, no, yes, yes)
Configurations ψ1(a) and ψ2(a) collapse together, as it must be, having equated types
bottle and glass. Then, the abstract Pa contains:

ψ1(a) = (a, glass-container, yes, yes, yes)
ψ3(a) = (a, box, UN, yes, yes)
In the ground theory Tg the predicate open(a) is true, but also predicate graspable(a)
is true, because attribute Graspable is equal to yes in all Pg ’s configurations. As a
consequence, movable(a) and pourable(a) are also true.
As we may see, there are no inconsistencies in the abstract space, per se. There is
only a question of utility. For instance, consider the question Q1 = “pourable(a)?”.
The answer in the ground space is yes, as it is in the abstract one. Then the
performed abstraction is AC (Answer Constant). On the contrary, question Q2 =
“breakable(a)?”, cannot be answered in either space, because the information that
a is open and graspable is not sufficient to ascertain whether a is also breakable.
In the considered example, the types bottle and glass are almost the same,
except for the attribute Open. As this is exactly the attribute that is observed to be
true, configurations ψ1 and ψ2 become identical. Then, any question about a glass-
container involves equally a bottle and a glass.
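The expansion of UN values into configurations, and their collapse under the type mapping, can be reproduced with the following Python sketch; the attribute domains are transcribed from the frames above, while the encoding itself is our own.

from itertools import product

DOMAINS = {  # per-type attribute domains in the ground frame
    "glass":  {"made": ["yes"], "grasp": ["yes"], "open": ["yes"]},
    "bottle": {"made": ["yes"], "grasp": ["yes"], "open": ["yes", "no"]},
    "box":    {"made": ["yes", "no"], "grasp": ["yes"], "open": ["yes", "no"]},
}
MAP = {"glass": "glass-container", "bottle": "glass-container", "box": "box"}

def configurations(obj, observed_open):
    # All ground configurations compatible with the observation open(a).
    confs = []
    for t, dom in DOMAINS.items():
        for made, grasp, op in product(dom["made"], dom["grasp"], dom["open"]):
            if op == observed_open:
                confs.append((obj, t, made, grasp, op))
    return confs

ground = configurations("a", "yes")
abstract = {(o, MAP[t], m, g, op) for (o, t, m, g, op) in ground}
print(len(ground), len(abstract))  # 4 3: psi1 and psi2 collapse together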
The situation would be different if we assumed the Closed World Assumption. In
this case, as it is not said that bottle(x) ⇒ open(x) nor that box(x) ⇒ open(x), we
have to assume that Open = no for all instances of types bottle and box.
Then, we will have in Γg (all the rest being equal):

ΓA,bottle(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {no}⟩}
ΓA,glass(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes}⟩}
ΓA,box(g) = {⟨Made-of-glass, {yes, no}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {no}⟩}
In the theory we can also derive movable(a) and pourable(a). By abstracting Γg ,
we obtain, in this case (all the rest being equal):
ΓTYPE(a) = {glass-container, box}
ΓA,glass-container(a) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓA,box(a) = ΓA,box(g)
Then, Pg would correspond to the following configurations:
ψ1 = (a, glass, yes, yes, yes)
ψ2 = (a, bottle, yes, yes, no)
ψ3 = (a, box, yes, yes, no)
ψ4 = (a, box, no, yes, no)
Using the rules introduced at the beginning of the example, we obtain the following
Pa :
ψ1(a) = (a, glass-container, yes, yes, yes)
ψ2(a) = (a, glass-container, yes, yes, no)
ψ3(a) = (a, box, yes, yes, no)
ψ4(a) = (a, box, no, yes, no)
Again, Pa is not inconsistent, per se, but it simply contains less information than the
original one; in fact, it collapses into the following one:

ψ1(a) = (a, glass-container, yes, yes, UN)
ψ3(a) = (a, box, UN, yes, no)
Pa derives from a set of sensors that are no longer able to establish whether an object
is open or not, and hence any query involving attribute Open cannot be answered
anymore. In fact, even though query Q1 can be answered in the ground space, it
can no longer be answered in the abstract one. The same is true for Q2.
As a conclusion, we can say that it is neither worth checking the consistency of the
abstract space a priori, as Tenenberg suggests [525], nor using complex derivations,
as proposed by De Saeger and Shimojima [464]. Nayak and Levy [395] have suggested
a method of dealing with inconsistencies which has something in common
with our approach. It will be discussed later on in this chapter.
where:
ΓTYPE(g) = {greenlight}
ΓO(g) = {o, o′, o′′, . . .}
ΓA(g) = {⟨Wavelength, {green, yellow, [λred(1), λred(2)]}⟩}
When a particular system is described, the values of the instantiated variables are
inserted into the behavior Πg , which contains the actual observations. Then, Πg
corresponds to a Pg :
Pg = {o}, {(o, greenlight, x)}, ∅, ∅
where x is the wavelength of o. The set A = {(o, greenlight, x)} specifies the
measured value x of the variable Wavelength on object o.
Floridi’s method of abstraction involves generic relations among LoAs, but we are
only interested in those that can be clearly identified as abstraction in KRA’s sense.
For this reason, we only consider nested Gradients of Abstraction (GoA). Given two
LoAs L1 and L2 , let L1 = Γa and L2 = Γg be the more abstract and more concrete
LoAs, respectively. We recall that Floridi’s method proceeds from the abstract to the
concrete by progressively refining descriptions. In order to map this method to the
KRA Model, we have to invert its process. Moreover, even though not explicitly
stated, his goal in abstracting is to find behaviors, so that the query can be formulated
as follows:
Q = “Given a behavior Πa , is there a corresponding behavior Πg such that
a relation R between L1 and L2 is satisfied?”
In principle, relation R can be any relation, provided that for each Pa in the abstract
space there is at least one Pg in the concrete one. In particular, Floridi considers two
cases: in the first one, the range of values of a variable is refined, and, in the second
Table 8.1 Pseudo-code for meth⟨Pg, ωeqattrval((Wavelength, ΛWavelength), [λred(1), λred(2)], red)⟩

METHOD meth⟨Pg, ωeqattrval((Wavelength, ΛWavelength), [λred(1), λred(2)], red)⟩
  Oa = Og
  Fa = ∅
  Ra = ∅
  if Wavelength(o) = yellow or green
    then Aa = Ag
    else Aa = {(o, greenlight, red)}
  endif
  Δ(P) = Wavelength(o)
one, a variable is added going from the abstract to the concrete. Refining a variable’s
domain is described in Example 4.17, where the color of a greenlight is considered.
The abstract LoA is then:
Γa = L1 = ⟨{greenlight}, ΓO(a), {(Color, {red, yellow, green})}, ∅, ∅⟩
Then:

Γa = ωhattr(Y) = ⟨{obj}, ΓO, {(X, ΛX)}, ∅, ∅⟩

The corresponding method meth⟨Pg, ωhattr(Y)⟩ generates an abstract Pa for each
Pg. Both Tg and Ta are empty, and the construction of Da and La is straightforward.
As we have seen in Sect. 4.10, Hobbs partitions the set of objects in a domain into
equivalence classes, such that two objects belong to the same class iff they satisfy
a subset of the "relevant" predicates in the domain. Hobbs starts from a ground
theory Tg, which contains everything, namely the observations, the language, and the
theory. Data and observations are not distinguished from each other, and are both
expressed in logical form. If Sg is the set of ground objects, Pg the set of predicates,
R ⊆ Pg the subset of relevant predicates, and Sa the set of equivalence classes, then
Hobbs defines an abstraction function f : Sg → Sa such that
ΓTYPE(g) = {agent, block, table, location, event}
ΓO(g) = ΓO,agent(g) ∪ ΓO,block(g) ∪ ΓO,table(g) ∪ ΓO,location(g) ∪ ΓO,event(g)
ΓA,location(g) = {(X, R+), (Y, R+), (Z, R+)}
ΓA,agent(g) = ΓA,block(g) = ΓA,table(g) = ∅
ΓA,event(g) = {(te, R+), (T, 2^R+)}
ΓA(g) = ΓA,location(g) ∪ ΓA,event(g)
ΓF(g) = ∅
ΓR(g) = {ROn ⊆ ΓO,block(g) × (ΓO,block(g) ∪ {tab}) × R+}
In the above definitions ΓO,location(g) is the continuous set of points (locations ℓ) in
the Euclidean space, and X, Y, Z are the Cartesian coordinates of a location ℓ. Objects
of type agent, block, and table do not have attributes. Events are described
by their end time te and duration T . Relation ROn (x, y, t) states that block x is on
another block y or on the table at time t.
The theory contains the predicate move. If one tries to apply the above theory
to some real case, it is immediately clear that insufficient details are provided. For
instance, it is not clear how move works, namely whether it acts directly on
blocks ("move block b1 onto block b2") or on blocks through locations ("move block
b from location ℓ1 to location ℓ2").
Another point needing clarification is the relation between actions in the world and
events. As the only allowed action consists in moving a block, at each application of
the move predicate a corresponding event should be defined. Events must be created
on the spot, because a priori it is not known which moves will be made. How events
are created must be specified in Tg as well.
In the abstraction mapping all agents are indistinguishable (equivalence class EA)
except agent A, all blocks are indistinguishable (equivalence class EB) except those
on the table. For locations, only locations (xi , yi ) with xi , yi ∈ [0, . . . , 100] are kept
as different, whereas all other locations are collapsed into the same equivalence class
and labelled EL. Table tab does not change. Moreover, Hobbs defines a mapping
function κ, which maps the predicate move onto itself.
We will show now how the above transformations can be realized in the KRA
model. Regarding locations, two different operators are applied in sequence: the first
one makes points on the same vertical line (Z axis) indistinguishable, and, afterwards
the continuous values x and y are transformed into their floors. This second operation
can be interpreted as an approximation, which substitutes k for any value of the
attributes X or Y included in the interval [k, k + 1), for 0 ≤ k ≤ 99.
In the KRA model, the above transformations can be obtained with the set of
operators specified in the following:
• For events, the end time and the duration are equated, so that events become
instantaneous in the abstract space. This is achieved by operator
ωeqattr((te, R+), (T, R+)).
• For agents, operator ωeqobj(ϕeq(u), EA), with ϕeq(u) ≡ [u ∈ ΓO,agent(g)] ∧ [u ≠ A],
makes all agents different from A equivalent to each other.
• For blocks, only those that lie on the (unique) table maintain their identity, whereas
all the others are made equivalent. This is achieved by operator ωeqobj(ϕeq(u), EB),
with ϕeq(u) ≡ [u ∈ ΓO,block(g)] ∧ ¬∃te[(u, tab, te) ∈ RCOV(ROn)].
• For locations, operator ωeqobj(ϕeq(ℓ), ℓ0) makes all points on the same vertical
line indistinguishable, collapsing each location ℓ = (x, y, z) onto ℓ0 = (x, y, 0).
• Once the Z coordinate has been hidden, we want to reduce the surface of the
table (which is located at z = 0) to a grid corresponding to integer values of
X and Y. In this way, any value x or y is reduced to its floor, thus resulting in an
approximation of the true value. This approximation is performed by operator
ρidobj(ϕid(ℓ0), ℓ(a)), applied one time to X and one time to Y; the whole
transformation of locations is thus the process
ωeqobj(ϕeq(ℓ), ℓ0) ⊗ ρeqobj(ϕeq,X(ℓ0), ℓ(a)) ⊗ ρeqobj(ϕeq,Y(ℓ0), ℓ(a)).
Fig. 8.3 (a, b) Ground and abstract spaces corresponding to the scenario described in Example 8.4
Clearly the obtained abstract theory is simpler than the original one, as in Hobbs’
intentions, but its usefulness cannot be ascertained per se. In fact, if we consider the
starting scenario of Fig. 8.3a, and a query Q1, consisting of a state to be reached,
namely "Six blocks on the table", this query can be answered in 3003 different ways
(choosing 6 objects out of 14) in the ground space, whereas it cannot be solved in the abstract
one. On the contrary, the query Q2 "Four blocks on the table" can be solved in 1001
different ways (choosing 4 objects out of 14) in the ground space, and in 5 ways in
the abstract one. Then the applied abstraction process is AD (Answer Decreasing)
for both Q1 and Q2 .
The query we want to answer is Q = "Flights with no more than 1 stop between
New York and Seattle”. Using the rule in Tg , the set of answers to the query is
ANSQg (Q) = {(New York → Seattle),
(New York → San Francisco → Seattle)}.
If we now apply the grouping operator, we obtain:
ΓTYPE(a) = {state}
ΓO(a) = {California, Washington, NY State, North Carolina, Massachusetts}
ΓA(a) = ΓF(a) = ∅, ΓR(a) = {Rdirconnect ⊆ ΓO(a) × ΓO(a)}.

For the connections we have:

RCOV(Rdirconnect(a)) = {(NY State, California), (NY State, Washington),
(California, Washington), (Washington, Massachusetts),
(California, North Carolina)}
The content of Pa is translated into two tables in Da, namely OBJ(a) and
DIRCONNECT(a), the first containing the states and their type state, and the
second the direct flights between states. The language La contains the constants
Ca = {California, Washington, NY State, North Carolina, Massachusetts},
the predicates state(x), dirconnect(x, y) and connect(x, y). There
are no functions. Theory Ta contains the same rules (8.5) and (8.6), where x and y are
now of type state.
If we try to answer the query Q in the abstract space, we obtain the following
answer set:
ANSQa(Q) = {(NY State → Washington),
(NY State → California → Washington)}

Then, the applied abstraction is an AI (Answer Increasing) abstraction; in fact, the
answer set contains the correct direct connection (NY State → Washington),
which corresponds, in the ground space, to (New York → Seattle), and the additional
(NY State → California → Washington), which corresponds to
the existing connection (New York → San Francisco → Seattle) but also
to the non-existing connection (New York → Los Angeles → Seattle).
Clearly, if no flight exists between two states, no flight will exist between any two
cities in the state.
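The answer-increasing behavior of this grouping abstraction can be checked with a short Python sketch; the ground flight set below is assumed from the prose of the example (including a New York → Los Angeles flight with no onward leg to Seattle) and is not a verbatim transcription of the book's tables.

GROUND = {("New York", "Seattle"), ("New York", "San Francisco"),
          ("San Francisco", "Seattle"), ("New York", "Los Angeles")}
STATE = {"New York": "NY State", "Seattle": "Washington",
         "San Francisco": "California", "Los Angeles": "California"}

def at_most_one_stop(direct, src, dst):
    # All connections from src to dst with at most one stop.
    answers = [(src, dst)] if (src, dst) in direct else []
    for (a, b) in direct:
        if a == src and (b, dst) in direct:
            answers.append((src, b, dst))
    return answers

ABSTRACT = {(STATE[a], STATE[b]) for (a, b) in GROUND}
print(at_most_one_stop(GROUND, "New York", "Seattle"))
print(at_most_one_stop(ABSTRACT, "NY State", "Washington"))
# The abstract answer (NY State, California, Washington) covers both an
# existing and a non-existing ground connection: an AI abstraction.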
Among the operators defined on perception, we may also include those of fuzzy set
and rough set theories.
Fuzzy Sets
In the theory of fuzzy linguistic variables (see Sect. 4.5.3), the range U of a numerical
variable X is mapped onto a set T (L) of linguistic terms taken on by a corresponding
linguistic variable L. The association is done by a semantic rule M, specifying the
membership function μ(x). In the KRA model variable X can be either an attribute
or the codomain of a function. In the case of an attribute, we have ΓA(g) = {(X, ΛX)},
and the corresponding operator is one that discretizes U, i.e.:
ωeqattrval((X, ΛX), [xi, xj], Lij)
The corresponding method meth⟨Pg, ωeqattrval((X, ΛX), [xi, xj], Lij)⟩ will assign a
linguistic term Lij to the interval [xi , xj ] ⊆ U. By considering a set of such oper-
ators, the domain of X is transformed into a linguistic domain of attribute L. As a
consequence, the abstract set of attributes will be:
ΓA(a) = (ΓA(g) − {(X, ΛX)}) ∪ {(L, T(L))}
and the memory Δ(P ) will contain the semantic rule M.
Example 8.6 Let X be the attribute Age of a person, and let ΛAge = [0, 140] be
its domain. We can define a linguistic variable LAge with domain ΛLAge = {very-young,
young, middle-age, rather-old, old, very-old}. By
suitably defining fuzzy membership functions, we may say, for instance, that
Age = 18 and Age = 22 are both abstracted to LAge = very-young.
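A possible semantic rule M for this example is sketched below in Python; the triangular membership functions (and the restriction to two terms) are made up for illustration.

def mu_very_young(age):
    # Membership in very-young: 1 below 20, decreasing to 0 at 30.
    return max(0.0, min(1.0, (30 - age) / 10))

def mu_young(age):
    # Membership in young: triangular, peaking at 30.
    if 20 <= age <= 30:
        return (age - 20) / 10
    if 30 < age <= 45:
        return (45 - age) / 15
    return 0.0

def linguistic_term(age):
    # Abstract a numeric Age to the best-matching linguistic term.
    terms = {"very-young": mu_very_young(age), "young": mu_young(age)}
    return max(terms, key=terms.get)

print(linguistic_term(18), linguistic_term(22))  # both: very-young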
Rough Sets
Concerning rough sets, objects in a domain are made identical by an indiscernibility
relation, which produces a tessellation of the object space into equivalence regions.
As modeling rough sets raises interesting issues, we handle this case in some detail.
In fact, the use of rough sets involves both abstraction and approximation.
Let us consider, for the sake of exemplification, a simple case, where Γg contains
a single variable u of type t and with domain U. Let moreover concept be a type
denoting subsets of U. Let us consider a set A = {A1 , . . . , Am } of attributes, with the
associated set Λ = {Λ1 , . . . , Λm } of values. Then:
ΓTYPE(g) = {t, concept}
ΓO(g) = U
ΓA(g) = {(Ai, Λi) | 1 ≤ i ≤ m}
Let moreover Aind be the subset of A containing the attributes that make objects
indistinguishable (see Sect. 7.3.1.1). We apply to Γg the following abstraction
process Π, consisting of a chain of three operators:
Π = ωeqobj(ϕeq(Aind), [u]Aind) ⊗ ωhtype(t) ⊗ ωaggr((concept, concept), approx)

ylw = {[u]Aind ∈ ΓO,concept(a) | [u]Aind ⊆ y}
yup = {[u]Aind ∈ ΓO,concept(a) | [u]Aind ∩ y ≠ ∅}
This operator performs a tessellation of the plane. Suppose now that we observe a
Pg , consisting of all points in the upper-right quadrant of the plane, and a region c,
corresponding to the oval in Fig. 4.12. Then:
Oa,point = {p | X(p) ≥ 0, Y(p) ≥ 0}
Oa,region = {c}
By applying the process Π first, and the approximation operator afterward, we obtain
a final Pap :
ΓTYPE(ap) = {region, approx}
ΓO,region(ap) = ∪i,j∈N [r]ij
ΓO,approx(ap) = {(clw, cup)}
The concepts clw and cup are the lower and upper approximations, respectively, of
c, and they are reported in Fig. 4.12.
As a conclusion, the procedure of approximating a set (a "concept", in Pawlak's
terms [414]) with two other, less detailed sets involves both abstraction and
approximation.
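The lower and upper approximations defined above are easy to compute once the indiscernibility classes are given; the following Python sketch, with an explicitly enumerated toy universe, is only meant to make the two definitions operational.

def approximations(classes, concept):
    # Lower/upper approximations of `concept` w.r.t. equivalence `classes`.
    lower = [c for c in classes if c <= concept]   # classes inside y
    upper = [c for c in classes if c & concept]    # classes meeting y
    return lower, upper

# four indiscernibility classes tessellating a toy universe
classes = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
concept = {3, 4, 5}                                # the set y
lower, upper = approximations(classes, concept)
print(lower)   # [{3, 4}]          -> y_lw
print(upper)   # [{3, 4}, {5, 6}]  -> y_up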
In this section we consider the theories of abstraction defined primarily at the level
of models (or data structure, in our terminology). Figure 8.4 highlights the primary
operator used to define these types of abstractions, namely δ.
Fig. 8.4 Abstraction defined primarily at the data level. The other operators, λ and τ, are derived
from δ
By semantic operators we mean all those originated in the database field
(reviewed in Sect. 4.4), and the operators acting on (logical) models (reviewed in
Sect. 4.7).
In this section we consider the models proposed by Miles-Smith and Smith [371],
Goldstein and Storey [217], and Cross [118]. The abstraction operators proposed by
these authors can be grouped into five categories:
• Aggregation between objects of different types [118, 217, 371], among attributes
[217], or among entities and attributes [217]. In an aggregation a relation becomes
a new entity with a new meaning and possibly emerging properties. The basic
abstraction relation is the “part-of”. Aggregation is defined at the level of database
schema.
• Generalization of types into supertypes, forming an “is-a” hierarchy [118, 371].
Generalization is also called Inclusion by Goldstein and Storey [217]. The basic
aspect of generalization is inheritance, because a node transmits its properties to
its children. Generalization is defined at the level of database schema.
• Association of entities of the same type to form a new collective type [217]. The
collective type may have emerging properties. The basic relation in association is
“individual-of”. Association is defined at the level of database scheme as well.
• Classification is a particular case of Generalization, where an “instance-of”
relation is defined between instances of a type and the type itself [118]. In Classi-
fication, the type has properties which are common to all its instances.
• Grouping is another way of linking individuals to a class [118]. Unlike
Classification, which has an intensional basis, Grouping has an extensional one.
In fact, a group may be created simply out of the will of the user, and individuals
in the group do not need to share any property. In addition to simply collecting
together individuals in a subjective way, a group can also be created by defining
a predicate, and collecting together all individuals satisfying the predicate. The
type corresponding to the group is simply a name without a description, used as
a short-cut to denote a set of individuals. The relevant relation for grouping is
“member-of”.
The phase of data acquisition is implicitly assumed to have been done previously;
hence, there is no explicit notion of perception or observation. The language used to
manipulate the data is an SQL-like language. The "theory" consists of the set of
relational algebra operators, and the task to be solved is usually answering a query
expressed in SQL. In terms of the KRA model, the operators of interest are those of
kind δ (cf. Fig. 8.4).
In the database approach abstraction is mostly a reorganization of data, consisting
in creating new tables from existing ones. As the original tables are not hidden, the
global process is not actually an “abstraction” in the sense considered in this book,
because no information is hidden. However, using the view mechanism, the old tables
can be easily hidden.
Between the abstraction operations defined for databases and the abstraction operators
in KRA there is a two-way link; in fact, on the one hand, KRA operators can be
used to model database abstractions, but, on the other, the latter could be used as
methods of the δ operators themselves. Before exploring this correspondence, we
need to make an association between an Entity-Relationship (ER) model and KRA.
The schema of a database involves entities, which are objects or events or anything
that can be considered as stand-alone and uniquely identified, attributes, which are
associated with entities (or relations), and relations, which link entities to each
other. In terms of KRA, the database schema corresponds to DS. Let us look at
each operation in turn.
AGGREGATION—Aggregation among entities is a widespread operation. Aggregation
operates at the level of the database schema, by relating types of entities
rather than single entities. If the types to be aggregated are {E1, . . . , En}, and the
new type is E, Aggregation creates a new table scheme, where the Ei (1 ≤ i ≤ n)
become attributes of type E. This scheme is added to the database scheme DS, and
the corresponding populated table is added to D. The matching operator in KRA is
the aggregation operator of kind δ.
Beyond aggregating entities (objects), Goldstein and Storey [217] suggest aggre-
gation of attributes. For instance, let Street, Civic number, City, and Country be
attributes of a type person. We can build up with them a new, aggregate attribute
Address. In order to model this type of aggregation in KRA we use an analogous δ operator.
GENERALIZATION—The corresponding KRA operator is δhiertype(ΓTYPE,child, type(a)),
where ΓTYPE,child is the subset of types that are to be generalized, and type(a) is the
new type. In order to model G : (A1, . . . , An, C1, . . . , Cm) we have to select, in G, the
types C1, . . . , Cm that become children in a hierarchy, whose father is type(a) = G;
then, we can use
δhiertype ({C1 , . . . , Cm }, G)
for i = 1, m do
  Let Ai ⊆ A be the subset of attributes meaningful for type Ci
end
Assign to G the set of attributes AG = ∪i=1,...,m Ai
We have left this choice to the implementation level, because other choices could
also be made without changing the semantics of δhiertype.
As we have noticed for Aggregation, Generalization too is not an abstraction, in our
sense, if the original types are not hidden. In Miles-Smith and Smith's, and Goldstein
and Storey's approaches the new type (with the corresponding table for the generic
object) is simply added to Dg. As for Aggregation, meth⟨Dg, δhiertype⟩ also performs
the following operations:
the following operations:
• In table OBJ, all rows corresponding to objects of type Ci (1 ≤ i ≤ m) are hidden.
• All rows in all tables of Dg, where an instance of Ci (1 ≤ i ≤ m) occurs, must be
removed (hidden).
• All tables Ci-ATTR (1 ≤ i ≤ m) must be hidden in Da.
• If ΓO,i is the set of objects of type Ci in Dg, then ΓG(a) = ∪i=1,...,m ΓO,i.
The new table, containing the actually performed aggregations, is not added to Da,
because it contains items across the two levels of abstraction. On the contrary, it is
stored in Δ(D) as the is-a relation between aggregate and components.
In the abstract description frame Γa = ⟨ΓTYPE(a), ΓO(a), ΓA(a), ΓF(a), ΓR(a)⟩ we have:

ΓTYPE(a) = (ΓTYPE(g) − ∪i=1,...,m Ci) ∪ {G}
ΓO,G(g) = ∪i=1,...,m ΓO,i(g)
ΓO(a) = (ΓO(g) − ∪i=1,...,m ΓO,i(g)) ∪ ΓO,G(g)
ΓA(a) = ΓA(g)

For what concerns functions and relations, the set ΓO,G(a) replaces, in their domain
or codomain, any occurrence of one of the objects in the ΓO,i(g)'s.
The above discussion applies without changes to the Inclusion abstraction defined
by Goldstein and Storey [217].
GROUPING—The grouping operation corresponds in KRA to the grouping operator
ωgroup(ϕgroup, G). For this operation, considerations analogous to those made
for the preceding ones apply.
As described in Sect. 4.7.1, Nayak and Levy proposed a theory of abstraction based
on the models of a logical theory [396], motivated by the desire to solve the inconsistency
problem. In essence, they suggest manipulating the tables generated by an
interpretation of a ground logical theory, and then trying to find an "abstract" formula
that reflects the modification. Then, in KRA terms, they start with a δ operator, and
then move to λ (modifying the logical language), and to τ, as a side-effect of λ (see
Fig. 8.4).
As Nayak and Levy notice, complex manipulations can be easily done on the mod-
els, but finding the abstract theory that implements them may be difficult. To this aim,
the authors describe an automated procedure, Construct-Abstraction(Tg , N , V),
which constructs the abstract theory for the special case in which the abstract lan-
guage can be obtained from the ground one by dropping some predicates and adding
new ones. This type of abstraction includes predicate mapping, dropping arguments
in predicates, and taking the union or intersection of predicates. The procedure con-
sists in deriving the abstract theory from the ground one (Tg ), the set of rules (N )
defining the new predicates, and the set (P) of predicates to be dropped. For com-
plex theories this procedure may be computationally costly. A similar approach was
taken by Giordana et al. [208, 210] earlier on, when proposing a semantic theory of
abstraction for relational Machine Learning.
If we look more closely into the procedure Construct-Abstraction(Tg, N, V), we
notice that it actually realizes a syntactic abstraction, because it consists of
logical derivations at the level of the theory. In order to illustrate how Nayak and Levy's
semantic model can be represented in KRA, let us limit ourselves to predicate
mapping.
Given a model of a ground theory, let T1 and T2 be two tables, sharing the same
schema. T1 contains the set of objects of type t1 , and T2 those of type t2 . We may
construct a new table T = T1 ∪ T2 . T contains the set of objects of either type t1 or
type t2 . The two types can be expressed, in the ground language, as two predicates
t1 (x) and t2 (x), and the resulting formula, associated to t, is t(x) = t1 (x) ∨ t2 (x).
In terms of the KRA model, the whole process can be represented as follows.
Let Dg be a database, where the table OBJ = [ID, Type] contains the identifiers of
N objects, each one with its associated type. Let t1 and t2 be two types. The objects
belonging to these types can be extracted from OBJ by means of the relational algebra
operator of selection, namely:
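The selection formula itself is not reproduced here; as an illustration, the following Python sketch shows the δ-level manipulation just described (selection of the two type tables, their union, and the abstract predicate read off from the new table), with a table layout that is our own and not Nayak and Levy's.

OBJ = [("a", "t1"), ("b", "t2"), ("c", "t1"), ("d", "t3")]

def select(table, type_name):
    # Relational-algebra selection: rows of OBJ with the given type.
    return [row for row in table if row[1] == type_name]

T1, T2 = select(OBJ, "t1"), select(OBJ, "t2")
T = T1 + T2                 # T = T1 union T2 (disjoint here)

def t(x):
    # Abstract predicate induced by the new table: t(x) = t1(x) or t2(x).
    return any(row[0] == x for row in T)

print([x for x, _ in OBJ if t(x)])   # ['a', 'b', 'c']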
As described in Sect. 4.7.2, Ghidini and Giunchiglia [203] define a semantic abstrac-
tion that encompasses several of the operators defined in this book. For example, they
consider symbol abstraction, in which different ground constants (domain abstrac-
tion), or functions, or predicates (predicate mapping) collapse into a single one in
the abstract space. These abstractions can be modeled, in KRA, with the operators:
• ωeqobj ({c1 , . . . , cn }, c) for domain abstraction.
• ωeqfun ({f1 , . . . , fn }, f ) for function mapping.
• For predicates, more than one of KRA's operators can be used. For example,
ωhiertype ({t1 , . . . , tn }, t) can be used when predicates represent types, and they
are replaced by a more general type (usually this is a predicate mapping), or
ωhierrel ({R1 , . . . , Rn }, R), when predicates are associated to relations and not types,
or still ωeqrel ({R1 , . . . , Rn }, [R]) when an equivalence class of predicates is built
up.
Another kind of abstraction described by Ghidini and Giunchiglia is arity abstraction,
which reduces the number of arguments of a function or relation. In our model, we can
map arity reduction of a function to the operator ωhfunarg (fh , xj ), which hides argu-
ment xj of function fh , and arity reduction of a relation to the operator ωhrelarg (Rk , xj ),
which hides argument xj of relation Rk . For relations, if all arguments are hidden, the
propositionalization operator, defined by Plaisted [419], is obtained. In our model it
is possible to define a composite operator, namely ωhrelarg (Rk , {xj1 , . . . , xjk }), which
hides several arguments (in the limit, all) at the same time.
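Arity reduction is essentially a projection of the relation's tuples, as the following Python sketch shows; hiding all argument positions yields the propositional case, where only the fact that the relation holds somewhere survives.

def hide_args(tuples, positions):
    # Project away the argument positions listed in `positions`.
    keep = lambda t: tuple(v for i, v in enumerate(t) if i not in positions)
    return {keep(t) for t in tuples}

R = {("a", "b", 1), ("a", "c", 2)}   # a ternary relation R(x, y, t)
print(hide_args(R, {2}))             # binary: {('a', 'b'), ('a', 'c')}
print(hide_args(R, {0, 1, 2}))       # propositionalization: {()}, i.e. R holds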
Finally, the authors introduce a truth abstraction, which maps a set of predicates
to the truth symbol ⊤. In our model this abstraction corresponds to an abstraction
process consisting of a sequence of two operators: the first removes all arguments of
the predicate, and the second builds up a hierarchy with the father node equal to ⊤.
As in our model a predicate may correspond to different description elements, such
as types, attribute values, relations, etc., we consider here the case of a predicate
corresponding to a relation, which is the most common case. Then, the first operator
is ωhrelarg(Rk, {x1, . . . , xn}), which generates Rk(a), and the second is ωhierrel(Rk(a), ⊤).
Another way to look at this type of abstraction is to hide the arguments of Rk, and
then approximate it with ⊤.
The different kinds of abstraction described above are special cases of the atomic
abstractions introduced by Ghidini and Giunchiglia, mostly targeting theorem
proving. In fact, they are all TI-abstractions, and offer abstract simplified proofs,
whose structure may guide the search for proofs in the ground space. The properties
required by an atomic abstraction, reported in Sect. 4.7.2, are not meaningful in our
model, at least for the part that concerns observations. As we have said, our model
does not have the ambition to exhaust all the aspects of abstraction, and it is explicitly
not targeted to theorem proving. It is much better suited to domains where observa-
tions play the most important role, complemented by a theory specifically designed
to handle the observations.
Historically, models of abstraction were first proposed at the syntactic level,
as mappings between languages, with the works by Plaisted [419], Tenenberg [526],
Giunchiglia and Walsh [214], and, more recently, De Saeger and Shimojima [464].
These models have sound foundations in logic, but they fail to offer concrete tools
to perform abstraction in practice. In fact, to the best of our knowledge, none of
them went beyond a simple explicative example. In the following we will show, for
some of them, how they can be translated into abstraction operators in the KRA
model, making them applicable in practice.
Even though several theories of abstraction are defined as mappings between
complex formulas or sets of clauses, predicate mapping is one of the most investigated
abstractions, owing to its potential applicability. Given a predicate in the ground language,
its renaming in the abstract one is clearly not an abstraction if it is done in isolation,
because it simply corresponds to changing its name, and hence to a reformulation.
The interesting case is when two different predicates in the ground language are
renamed onto a unique one in the abstract language (Giunchiglia and Walsh's Predicate
Mapping and Plaisted's Renaming).
Plaisted points out the difficulty of generating abstractions in general, and he offers
a number of methods to concretely build up abstraction mappings that preserve
some required properties. Among these there are mappings between sets of clauses,
ground abstraction, propositional abstraction, changing signs to literals, permuting
and deleting arguments. One way to simplify the generation of abstractions between
clauses or sets of clauses is to reduce the abstraction to a mapping between literals.
The basic properties required from an abstraction between literals are reported in
Theorem 4.1. We want now to match Plaisted’s approach to the KRA model. Because
of Theorem 4.1 we limit ourselves to abstractions between literals. Actually, as
Plaisted's abstractions preserve negation (and instances), we can further reduce our
analysis to positive literals, i.e., predicates.
In the following, we consider a theory consisting of a set of clauses Tg = {C1(x),
. . . , Cn(x)}. The theory is expressed in a language Lg = ⟨Cg, Xg, Og, Fg, Pg⟩, where
Cg = {a1, a2, . . .}, Xg = {x1, x2, . . .}, Og is the set of standard logical connectives,
Fg = ∅, and Pg = {p1(x), . . . , pm(x)}. Using Pg and Cg we can build Herbrand's
universe. There is neither a notion of observation nor of data structure.

Given a clause C, let C(a) be its abstraction. If C = {ℓ1, . . . , ℓk}, by definition:

C(a) = τ(C) = {ℓj(a) | 1 ≤ j ≤ k}
The operator τ is then expressed in terms of operators on the predicates in Pg, which
we have called λ. In conclusion, in Plaisted's approach we can handle abstraction
between theories in terms of abstraction between languages. Let us consider some
of the proposed abstractions.
GROUND ABSTRACTION—This kind of abstraction replaces a predicate
p(x) with the set of all its groundings with the constants in Cg. This set can be
infinite. Without losing the essence of the abstraction, let us suppose that p is a
unary predicate p(x). We can thus define an operator λground(p, Cg), such that:
to hide the set {xj1, . . . , xjr−1} of arguments. This transformation is indeed an
abstraction according to our model.
CHANGING SIGN OF A LITERAL or PERMUTING ARGUMENTS—As
negating a literal or systematically changing the order of its arguments is uniquely
determined by the original literal, these abstractions too are actually reformulations,
according to our criteria.
As we may see, in Plaisted’s approach there is no need for observations or a database,
because the query to be answered is always a theorem to be proved. This task only
requires theory and language.
The most recent model aimed at capturing general properties of abstraction has been
proposed by De Saeger and Shimojima [464]. As described in Sect. 4.6.3, it uses the
notions of classification and infomorphism. Notwithstanding its elegance, the model
does not solve the problems that we have analyzed in this chapter, and, moreover,
it is hard to use it to model abstractions more complex than predicate mappings. One
interesting aspect of the model is that, by considering abstraction as a local logic on
a channel that connects two classifications, abstraction becomes a two-way process;
this may form the basis for achieving the flexibility in abstracting/de-abstracting that
we consider a fundamental aspect of any applicable abstraction theory.
Moreover, the model includes in a natural way both the syntactic and the semantic
aspects of abstraction. In terms of KRA, abstraction based on channel theory includes
the theory and language components of a query environment, in the syntactic part,
and the data structure in the semantic one. Possible observations do not play any role
in the model, currently, but the authors themselves acknowledge their importance
and believe that observations could be added as a further classification in the whole
schema.
The KRA model appears particularly well suited to describing systems that have a
strong experimental component (perception and observation). Often this kind of
system is investigated in non-logical contexts, where the primary source of information
is the world. One of the fields where this is true is Cognitive Science, where
abstraction plays an essential role, but is rarely formalized.
One of the researchers who explicitly acknowledges this role is Barsalou, who,
together with his co-workers, investigated in detail abstraction and its properties
in cognition [216]. They define three abstraction “operators”, namely selectivity,
blurring, and productivity. These operators are all defined on perception, namely on
our Γ , as the ones in KRA. The selectivity operator lets the attention concentrate
on particular aspects of a perception, and then it can be modeled with operators that
select features, namely of the kind ωh , the most common being ωhattr , which selects
attributes of objects. The blurring operation is mostly relevant in acoustic or visual
perception, and it consists in lowering the resolution of an image or sound, by making
it less detailed. This operation corresponds, in the KRA model, to more than one
operator; in particular blurring can be obtained by replacing groups of pixels (or of
sounds) within smaller regions, obtained, for instance, with the aggregation operator
ωaggr , or the operator ωeqobj , which forms equivalence classes among objects; even
ωhobj can be applied, because it hides sounds or pixels, realizing thus a sampling of
the input signal. Finally, productivity corresponds to our aggregation operator ωaggr ,
which generates new objects from parts.
Goldstone and Barsalou [216] have also introduced the object-to-variable binding
operation, which they label as abstraction. This operation might correspond to the
operator that builds up equivalence classes among objects. If {a1, . . . , an} is the
set of constants that are replaced by the variable x, the operator ωeqobj({a1, . . . , an}, x)
does the job. Replacing constants with a variable is a typical generalization operation,
which, in this case, also corresponds to an abstraction.
Operator ωhattr, which performs feature selection, is also at the basis of Schyns'
investigation [220]. In fact, he tried to ascertain which features the human eye focuses
on when looking at a face and trying to decide its gender and/or mood.
Behind the phenomenon of fast categorization of animals in natural scenes [132,
158, 211] there is likely a complex, but very rapid, process of abstraction, consisting
of a mix of feature selection (ωhattr), feature construction (ωconstr), and aggregation
(ωaggr), possibly wired in the brain. One of the components is certainly the removal
of colors (ωhattr(Color, ΛColor)), as this feature does not appear to influence the
performance in the task.
In spatial cognition, the KRA model allows the formation of spatial aggregates,
as proposed by Yip and Zhao [575], to be easily modeled. These aggregates are
equivalence classes of locations, built on adjacency relations. Then, we can use the
operator ωeqobj (ϕeq , region), where the objects are of type location, and ϕeq
involves spatial relations among them. Operators that change the granularity of a
representation, such as ωeqobj , are also able to model Euzenat’s notion of granularity
[154–156].
As Zeki affirms [580–582], abstraction is a crucial aspect of the whole brain activity.
In particular, perceptual constancy could be modeled with a composite process
of abstraction and approximation, consisting of several kinds of operators combined
together. For instance, feature selection (ωhattr ) may play a relevant role, but also
aggregation (ωaggr ), and some approximation operator that generates a schematic
view. For explaining a complex phenomenon such as perceptual constancy, perception
alone is likely not to be sufficient, and a theory is also needed. The same is true for
other cognitive aspects relevant to abstraction.
In processing images, a key role is played by the Gestalt theory, as was discussed
in Sect. 2.7, and abstraction is crucial within it. Particularly important are the
constructive operators ωcoll , ωaggr , ωgroup , which allow a matrix of pixels to become
a scenario with meaningful objects. If we consider the six grouping principles at the
base of the theory, we can make the following considerations:
• Operators ωaggr and ωgroup are useful to distinguish "objects" from the background,
because objects are often structured, whereas the background is formed by more
or less homogeneous regions.
• The principle of similarity can be implemented by operators of the kind ωeqelem
or ωeqval, because equivalence of content or of attribute values implies similarity of
objects.
• Proximity derives from spatial arrangements, and can be implemented by operators
of the kind ωgroup , where the condition for grouping involves spatial closeness.
Repetitive patterns may be discovered by abstracting either with ωaggr, as, for
instance, in Fig. 1.4, or with ωcoll, as, for instance, in Fig. 2.9, where a lot of leaves
are perceived as a uniform ground cover.
• Both closure and continuity might be explained with a process of acquisition of
abstract schemes (a "square", as in Fig. 2.10), which are then used to bias subsequent
perceptions. The scheme can be generated, for example, by an ωaggr operator, and
then reinforced by further observations. When a part of the scheme is observed, the
whole scheme is retrieved from memory and used to interpret the incoming sig-
nal. Moreover, this is in accordance with Biederman’s Principle of Componential
Recovery [59].
If we move from cognition in general to the more specialized field of vision, we can
safely say that every abstraction operator is useful. In fact, we see because we
abstract. With feature selection (ωhattr) we focus our attention on relevant aspects of
an image, with aggregation (ωaggr) meaningful objects are detected in a scene, and with
the identification of equivalence classes (ωeqelem or ωeqattrval) we find homogeneous
regions. Of particular importance in vision is the ability to move across several
levels of abstraction at the same time. The KRA model is particularly well suited
to this aim, because it keeps the relevant information separated at each level, yet
allows more than one level to be used at the same time.
Connected with the idea of multi-level image representation is Luebke
et al.'s Level of Detail (LOD) approach [348], which adapts the amount of
detail of an image to its distance from the observer or to its size: the farther away or the
smaller, the less detailed. Also in this case the KRA model can easily capture the
rendering process. First of all, the LOD approach has a preferential direction, from
the most detailed version of the image (the one acquired from the world) to the less
detailed ones, as happens in KRA. Then, one or more ωhide operators are applied
in sequence, obtaining more and more abstract image representations, while remembering
what details have been overlooked at each step. Actually, the operator could be
parametrized, so that the process of generating the sequence of images can be totally
automated.
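A parametrized hiding operator of this kind could look as follows in Python; the 1-D signal and the halving scheme are stand-ins for a real image pyramid, and the point of the sketch is only that each step stores its Δ, keeping de-abstraction possible.

def hide_detail(signal):
    # Keep every other sample; the hidden ones become this level's Delta.
    return signal[::2], signal[1::2]

def lod_pyramid(signal, levels):
    # Apply the hiding operator `levels` times, keeping all memories.
    memories = []
    for _ in range(levels):
        signal, delta = hide_detail(signal)
        memories.append(delta)
    return signal, memories

coarse, deltas = lod_pyramid(list(range(16)), levels=3)
print(coarse)   # [0, 8]: the least detailed representation
print(deltas)   # one Delta per level, enabling the way back down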
During the analysis of abstraction in different disciplines we have come across some
examples of processes, labelled as abstractions, which are interesting to discuss.
The first is offered by Leron [327], who states that the formula
ϕ(x, y) ≡ (x + y)² = x² + 2xy + y², with x and y natural numbers, is generalized,
but not abstracted, when its validity is extended from natural numbers to rational ones.
In our model, the alternatives can be modeled by two description frames Γ1 and Γ2, such that:
Fig. 8.5 Mandelbrot set, generated by the recursive equation Z = Z² + C, where C and Z are
complex numbers. For a fixed Imax neither the equation nor the picture is more abstract than the
other: they are exact reformulations of one another
ΓTYPE(1) = {integer-pair}        ΓTYPE(2) = {rational-pair}
ΓO(1) = {(i, j) ∈ N²}            ΓO(2) = {pairs (x, y) of rationals}
ΓA(1) = ΓF(1) = ∅                ΓA(2) = ΓF(2) = ∅
ΓR(1) = {Rϕ}                     ΓR(2) = {Rϕ(a)}

The cover RCOV(Rϕ) is the set of pairs of integers, whereas RCOV(Rϕ(a)) is the set
of pairs of rationals. Then, RCOV(Rϕ) ⊆ RCOV(Rϕ(a)), and the transformation from
Γ1 to Γ2 is indeed a generalization.
The set Ψ1 of configurations associated with Γ1 is the whole N², whereas the set
Ψ2 of configurations associated with Γ2 is the set of pairs of rational numbers. Then,
the transformation from Γ1 to Γ2 is not an abstraction, as Ψ2 contains more information
than Ψ1; actually it is the other way around, and Γ1 is an abstraction of Γ2, obtained
by hiding all points in ΓO(2) that do not have integer coordinates.
The second example discussed by the same author is that the description “all prime
numbers less than 20” is more abstract, but not more general, than “the numbers
2, 3, 5, 7, 11, 13, 17, 19”. It is immediate to see that, in our approach, the two
descriptions are reformulations of one another, and hence neither abstraction
nor generalization is involved.
A last interesting case to consider is the description of a fractal, such as the
Mandelbrot set, reported in Fig. 8.5, generated by the recursive equation Z = Z² + C,
where C and Z are complex numbers.
By considering the equation and the picture, one is tempted to say that the equation
is an abstraction of the figure. However, according to the amount of information they
convey, the equation and the picture are reformulations of one another. This is true
for each maximum number of iterations Imax allowed. Abstraction intervenes when
Imax is changed. In fact, by increasing Imax, more and more detailed pictures are
obtained. Then, abstraction corresponds to a decrease of Imax, as for each Imax the
set of generated points is a subset of those generated with a higher Imax.
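As a toy illustration of the role of Imax (a minimal sketch of ours), membership of a point C can be approximated by iterating the equation at most Imax times; lowering Imax coarsens the resulting picture:

```r
# Approximate membership of C in the Mandelbrot set with at most Imax
# iterations of Z <- Z^2 + C; once |Z| > 2 the point surely escapes.
in_mandelbrot <- function(C, Imax) {
  Z <- 0 + 0i
  for (i in seq_len(Imax)) {
    Z <- Z^2 + C
    if (Mod(Z) > 2) return(FALSE)   # escaped: C is outside the set
  }
  TRUE                              # not escaped within Imax iterations
}

C <- complex(real = -0.75, imaginary = 0.1)   # a point near the boundary
sapply(c(10, 100, 1000), function(I) in_mandelbrot(C, I))
```

A point classified as belonging to the set at a low Imax may be recognized as escaping at a higher one, which is exactly the sense in which decreasing Imax abstracts the picture.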
8.11 Summary

By specializing KRA’s operators, we have shown that they can implement all the operators
proposed so far.
The KRA model does not just mimic previously proposed abstraction theories,
but could also be used in other domains, to make abstraction operational where it
was only informally defined. The possibility of translating an operator into a program
could allow the exploration of the role of abstraction in several disciplines, typically
cognitive science. In fact, a systematic use of the operators, possibly inserted into a
wrapper approach, would allow the ones that best explain the experimental findings
to be discovered. This could be the case, for instance, in explaining why animals are
recognized so rapidly in pictures.
Chapter 9
Abstraction in Machine Learning
Around the same period, Zucker and Ganascia [591, 592] developed an approach to
FOL learning based on an abstraction and a reformulation of the learning examples.
Merging their previous approaches, Saitta and Zucker have proposed a
unifying model of abstraction, the KRA model, which moves the semantic view of
abstraction a step further toward observations and perception [468, 590]. The KRA
model has been applied to various domains, such as relational learning [208], car-
tographic generalization [393], robotic vision [470], complex networks [466, 471],
and model-based diagnosis [467]. The authors have also shown that their abstrac-
tion framework can alleviate the computational complexity of relational learning
originating from the phase transition emerging in the covering test in FOL learning
[209, 469].
Afterwards, abstraction has been addressed directly in learning languages [120], in
learning abstraction hierarchies [289], in abstracting data for decision tree induction
[306], in grouping objects by conceptual clustering [6], in relational learning [58,
164], in Reinforcement Learning [185], and in Support Vector Machines (SVM).
Apart from the works cited above, there are several subfields of Machine Learning
and Data Mining in which abstraction is largely present under other names, namely
the following:
• Data discretization—When continuous data are discretized, values in the same
interval are considered equivalent and replaced by a symbol [140, 292].
• Feature and Instance selection—Data collected for learning are often described
with a set of features, some of which are irrelevant to the goal. These features
increase the complexity of learning, and may also mask useful knowledge. The
literature on feature selection is immense [291]. A recent overview is provided
by Liu et al. [335]. Instance selection is a similar task, where only a subset of
the available data is kept [64, 334, 358]. Both feature and instance selection are
clearly related to the idea of abstraction as the process of focusing on the important
aspects of observations, forgetting the irrelevant ones.
• Constructive induction and Predicate invention—Constructive induction and Predicate
invention are techniques for introducing descriptors that are combinations of
the original ones. The goal is to obtain more meaningful features, able to facilitate
learning. There is a rich literature in this field, starting from the early paper by
Michalski and Stepp [365], followed by Muggleton [381], who described the system
DUCE, in which propositional formulas were grouped by means of operators
to form new ones, and by Rendell and co-workers [362, 357], who tried to provide
some principles for introducing new terms. An incremental, instance-based
approach was described by Aha [5], whereas Japkowicz and Hirsh [275] presented
a bootstrapping method starting from human-suggested features. Except for Michalski’s
proposal, all the others mentioned so far dealt with learning in propositional
settings. More recently, constructive induction moved to FOL learning, under the
name of predicate invention [172, 294, 354, 592]. Constructive induction is thus a
change of the representation language rather than a mere selection within it.
We will focus on the first and third types of learning to explore abstraction in
Machine Learning. In fact, most methods that are applicable in supervised learning
may also be applied to unsupervised learning. Addressing these two fundamentally
different types of learning will support exploring a wide range of abstractions. In both
cases the role of the representation is critical to the success of learning.
In supervised learning, the task relies on training examples of a function that are
given in an initial representation based on “raw” input variables.2 The “features”
used by the learning algorithms, which are constructed from the raw input variables,
1 Semi-supervised learning also learns a function that maps inputs to desired outputs, using both
labeled and unlabeled examples, the latter ones providing information about the distribution of the
observations.
2 Guyon et al. [229] suggest calling the raw input variables “variables”, and the variables
constructed from the input variables “features”. The distinction is necessary in the case of kernel
methods, for which features are not explicitly computed.
Fig. 9.1 Schemes of supervised and unsupervised learning from examples and observations
Fig. 9.2 Schematic representation of Reinforcement Learning. An agent receives rewards from the
environment it belongs to and that it perceives. The agent performs actions that may modify itself
or the environment
are deeply related to the complexity and success of learning. Abstracting the representation
of each example is a step that has a paramount impact on the complexity
and accuracy of supervised learning algorithms. Another component of the learning process
where abstraction can take place is the hypothesis space that is explored. There is
a classical simplification, often adopted in supervised learning, consisting in
representing examples in the same language used to describe the function to be learned;
it is called the “single representation trick”. Abstracting the representation of the
function to learn is thus often coupled with the abstraction of the examples.
In Reinforcement Learning (RL) there are no examples, but one or more agents
interacting with an environment. An agent performs actions that (most often) modify
its state, and, while it receives rewards from the environment, it builds value functions
that guide its search for the best action to take so as to maximize its cumulative
reward. As Dietterich argues, all basic Reinforcement Learning algorithms are “flat”
methods inasmuch as they treat the state space as one very large flat search space
[137]. The paths from a start state to a generic state may be very long, and their length
has a direct impact on the cost of learning, because information about future rewards
must be propagated backward along these paths. The representation of the states, of
the actions, and of the value functions plays therefore a key role in the complexity
and success of the learning task. To scale up Reinforcement Learning to real-life
problems there is thus a need to introduce various mechanisms for abstraction, either
by abstracting the state space (be it flat and/or factored) or the actions [185], or any
of the functions that apply to states or actions [137].
The first two broad classes of learning introduced above (supervised and unsupervised
learning), illustrated in Fig. 9.1, can be more precisely defined as follows:
• In supervised learning the goal is to learn a function f that maps a vector xj =
(vj,1, . . . , vj,n) of feature values Ai = vj,i into a set of values (be it Boolean,
discrete or continuous), by looking at several pairs (xj, f(xj)), called examples or
training data of the function. The quality of an approximation h of f is measured
by their differences on some testing data [238, 375, 475].
• In unsupervised learning there are only observations (xj), and the goal is to find
a “good” clustering of a given set of observations into groups (be it a partition,
a hierarchy, a pyramid, a lattice, …), so that objects of the same group are more
similar to each other than to objects of different groups. Different measures have
been proposed in the literature to address the problem of determining the optimal
number of clusters and the quality of the clustering [273].
As an illustrative supervised learning problem, we have chosen the task of deciding
whether an adult woman of Pima Indian heritage shows signs of diabetes according to
the World Health Organization criteria. A public database [179] illustrating this task,
namely the Pima Indians Diabetes Data Set, is often used to test Machine Learning
algorithms. This data set contains 768 observations with 8 real-valued features that have
hopefully self-explanatory names: “Number of times pregnant”, “Plasma glucose
concentration”, “Diastolic blood pressure (mm Hg)”, “Triceps skin fold thickness
(mm)”, “2-Hour serum insulin (mu U/ml)”, “Body mass index (weight/height²)”,
“Diabetes pedigree function” and “Age (years)”. The class variable is Boolean,
and expresses whether the considered Pima Indian woman shows signs of diabetes
or not.
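In the examples that follow we assume the Pima data loaded in an R session; a minimal way to do so (using the copy shipped with the mlbench package—an assumption of ours; the original UCI file could equally be read with read.csv) is:

```r
# Load the Pima Indians Diabetes data: 768 observations, 8 numeric
# features, plus the Boolean class 'diabetes' (neg/pos).
library(mlbench)
data(PimaIndiansDiabetes)
pima <- PimaIndiansDiabetes
str(pima)
table(pima$diabetes)
```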
As already mentioned, abstraction takes several forms in Machine Learning, which
are related to either features or instances. The main areas of research that include
abstraction in learning are discussed in the following.
The last four types of representation changes are grouped under the term Constructive
Induction, whose goal is to construct new objects, features, functions, relations
or predicates for the language used to describe the data and the hypothesis language.
In Machine Learning and Data Mining, data are often described by a large set of
features. In Metagenomics prediction tasks, for example, there might be up to several
million features for a set of hundreds of examples [98]; in many cases all but a
few are in fact irrelevant to the learning task. The presence of irrelevant features has
numerous drawbacks. First, these features increase the computational complexity.
Second, irrelevant features might spuriously correlate with other, more meaningful
features, and thus be preferred to the latter although not related to the task. Finally,
and more generally, they are known to increase the storage of data, deteriorate
performance, diminish the understandability of models, introduce a wide range of errors,
and reduce the robustness of algorithms [229, 475]. In this context, feature selection is
thus simply the process of hiding features from the initial representation of the
instances. The difficult part of the process is obviously to define the criterion for
keeping some features and discarding the others.
The literature on the task of feature selection, which goes back to the field of
Pattern Recognition, is very large, as the topic has been an active field of research
in Machine Learning, Data Mining and Statistics for many years [64, 229, 291, 294,
465, 537, 562]. It has been widely applied in many fields, such as bioinformatics
[160, 229, 270, 465], text mining [177, 276], text retrieval [224], and music classification
[409], to cite only a few.
In feature selection the data used can be either labeled, unlabeled or partially
labeled, leading to the development of supervised, unsupervised and semi-supervised
feature selection algorithms, respectively. We will briefly present the principles of
feature selection in the framework of supervised learning, our goal being to analyze
abstraction in Machine Learning rather than to exhaustively list all the methods that
have been used; to the latter aim excellent reviews already exist [64, 229, 291, 294,
465, 562]. In the supervised learning case, there is a class label associated with each
instance (see, for example, the column “Class” in Table 9.1). The feature selection
process can be an iterative one, and different information can be used to guide it,
including the class label.
Fig. 9.3 A view of a feature selection process distinguishing Filter, Wrapper and Embedded
approaches
Fig. 9.4 A taxonomy of feature selection techniques. For each feature selection type, we highlight
a set of characteristics which can guide the choice for a technique suited to the goals and resources
of practitioners in the field. (Reprinted with permission from Saeys et al. [465] with a few updates)
The overall feature selection process depends on the search strategy, the relevance index
or predictive power, and the assessment method, as described in Fig. 9.5.
Filter approaches select variables by ranking them with coefficients, based on
correlations between the features (or subsets thereof) and the class, on the usefulness of
a feature to differentiate neighboring instances with different classes, and so on.
They can be very efficient (whether individual variables or subsets of variables are ranked).
Wrapper methods require one or more predetermined learning algorithms, and use
their performance on the provided features to assess relevant subsets of them. Finally,
Embedded approaches incorporate feature selection as a part of the learning process,
and directly optimize a two-part objective function with a goodness-of-fit term and
a penalty for a large number of variables [229]. The lasso introduced by Tibshirani
[529], a shrinkage and selection method for linear regression, is a good example of
such embedded systems. Whatever approach is chosen to search for the best subset
of variables, tractability is an issue, as this problem is known to be NP-complete [12].
Many software packages are currently available to perform feature selection.
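As a package-free sketch of the Filter idea, the Pima features loaded earlier can be ranked by the absolute correlation between each feature and the 0/1 class; the specific scoring criterion is our illustrative choice, not a prescribed one:

```r
# Filter-style ranking: score each feature independently of any learner.
class01 <- as.numeric(pima$diabetes == "pos")
scores  <- sapply(pima[, -ncol(pima)], function(f) abs(cor(f, class01)))
sort(scores, decreasing = TRUE)   # keep, e.g., only the top-ranked features
```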
Fig. 9.5 The three principal approaches to feature selection: a Filters, b Wrappers, c Embedded
methods. The shades show the components used by each of the three approaches; cross-validation,
for example, is used by both the Embedded and Wrapper methods but not by Filters. (Reprinted
with permission from Guyon et al. [228])
Example 9.1. For the sake of illustration we present a very simple implementation
of a procedure that supports selecting one feature out of the initial set.3 The R code—
hopefully self-explanatory—is reported in Fig. 9.6. The results of applying this code
to the Pima dataset are reported in Fig. 9.7.
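Since the code itself appears only in the figure, the following sketch is merely consistent with the caption of Fig. 9.6: the feature with the greatest number of zero values is hidden (it is "insulin" on these data).

```r
# Hide the feature having the most zero values in the Pima data.
zeros_per_feature <- sapply(pima[, -ncol(pima)], function(f) sum(f == 0))
to_hide <- names(which.max(zeros_per_feature))
abstract_data <- pima[, setdiff(names(pima), to_hide)]
names(abstract_data)   # the hidden feature no longer appears
```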
In Machine Learning there is also a mirror-image approach to feature selection, in which,
given a set of features, each of them is added one at a time to the example description.
3 There exist various R packages that support a wide variety of feature selection methods (for
example the FSelector package, which provides functions for selecting attributes from a given
dataset: http://cran.r-project.org/web/packages/FSelector/index.html). Several approaches to feature
selection are also available in the WEKA [565] package (http://www.cs.waikato.ac.nz/ml/
weka/).
Fig. 9.6 Short R code for performing feature selection on the Pima Database. The feature that has
the greatest number of zero values (i.e., the feature “insulin”) is hidden
Fig. 9.7 Result of the execution of the R code of Fig. 9.6. In the Abstract Data, the “insulin” feature
is hidden from the ground data
This approach is not considered here in more detail: it is a kind of “inverse”
abstraction, or “concretion”. On the other hand, it corresponds to Floridi’s notion,
where, in hierarchical GoAs, observables are added rather than removed from one
layer to the next.
Instance selection is in some sense a dual task with respect to feature selection,
because instances rather than features are hidden [64, 80, 334, 358]. Just as some
attributes may be more relevant than others, some examples may likewise be more
relevant than others for the learning task [64]. Instance selection has been widely
studied in the field of outlier or anomaly detection [92, 253, 380, 484]. Here the
removed instances correspond to anomalous observations in the data. Outliers may
arise because of human error, instrument error, fraudulent behavior, or simply through
natural deviations in populations. Today, principled and systematic techniques are
used to detect and remove outliers.
Instance selection has also been studied in the field of Instance-Based Learning
(IBL), but it also finds application in Data Mining, where the number of observations
is potentially huge. It is also of critical importance in the fields of online learning [60,
560], learning from data streams [189] and, recently, learning with limited memory
[131]. The Forgetron, for example, is a family of kernel-based online classification
algorithms that restrict themselves to a predefined memory budget [131].
A first reason to reduce the number of examples required to learn, or to learn
with a predefined memory budget, is to reduce the computational cost of learning.
Another reason is related to the cost of labeling. Be it because it must be obtained
from experts, or because the technique to obtain labeled examples is itself expensive,4
reducing the number of examples required to learn is important. A third reason is to
increase the rate of learning by focusing attention on informative examples.5
As Blum and Langley note, one should distinguish between examples that are relevant
from the viewpoint of information, and ones that are relevant from the viewpoint
of the algorithm [64]. Most works emphasize the latter, though information-based
measures are sometimes used for this purpose. Instance selection approaches can be
classified, like feature selection methods, into Filter, Wrapper or Embedded.
Example 9.2. For illustration purposes, we present a very simple implementation of
a procedure that supports selecting one instance out of the initial training set. The R
code is presented in Fig. 9.8.6 The results of the run of the code reported in Fig. 9.8
are collected in Fig. 9.9.
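Again, as the code of Fig. 9.8 appears only in the figure, here is a sketch consistent with its caption, assuming the pima data frame loaded earlier: the hidden instance is the one with the greatest number of zero values in its description.

```r
# Hide the single instance with the most zero-valued features.
zeros_per_instance <- rowSums(pima[, -ncol(pima)] == 0)
worst <- which.max(zeros_per_instance)   # index of the hidden instance
abstract_data <- pima[-worst, ]          # the ground data minus one woman
```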
4 For example, the price for sequencing one individual genome to support personalized medicine.
5 That is, by selecting the most informative instances. Active learning, also called optimal
experimental design in Statistics, is a form of supervised Machine Learning in which the learning
algorithm is able to interactively make requests to obtain the desired outputs at new data points.
As a consequence, the number of examples needed to learn a concept can often be much lower
than the number required in normal supervised learning.
6 There exist R packages that support a wide variety of instance selection methods, such as, for
example, the “outliers” package, which provides functions for selecting out instances from a given
dataset: http://cran.r-project.org/web/packages/outliers/index.html
Fig. 9.8 Short R code for performing instance selection on the Pima Dataset. The hidden instance
is the one that has the greatest number of zero values in its description (i.e., the woman with Id427)
Fig. 9.9 Result of the execution of the R code of Fig. 9.8: the woman Id427 is hidden from the
GroundData
Fig. 9.11 Classification of discretization methods and several illustrative algorithms. (Reprinted
with permission from Liu et al. [335])
Determining the “best” number of clusters is a difficult problem, which is often cast as a
problem of model selection [273]. In supervised learning, the search for an optimal K is
usually guided by the quality of learning, operating on the wrapper principle defined
earlier, whereas in unsupervised learning several measures have been proposed to
identify an optimal number of clusters (BIC, AIC, MML, Gap statistics, ...) [273].
One of the effects of discretization for learning is that it reduces information,
thus offering a satisfactory trade-off between gain in tractability and loss in accuracy.
Many studies show that induction tasks benefit from discretization, be it in terms of
accuracy, time required for learning (including discretization), or understandability
of the learning result. The majority of discretization algorithms found in the literature
perform an iterative greedy heuristic search in the space of candidate discretizations,
using different types of scoring functions to evaluate their quality [298]. Finally, we
should mention that when the feature to discretize is either time or space (the term
aggregation is also used), dedicated approaches have been developed [8, 575].
Example 9.3. For illustration purposes, we present a very simple implementation of
a procedure that supports discretizing one feature of the initial representation of the
Pima Data. The R code is presented in Fig. 9.12.7
7 There exist packages in R that support a wide variety of discretization methods, such as, for
example, the “discretization” package, which provides functions for discretizing features: http://cran.r-project.
org/web/packages/discretization/index.html. Several approaches to feature discretization are also
available in the WEKA [565] package (http://www.cs.waikato.ac.nz/ml/weka/).
Fig. 9.12 Short R code for performing a discretization of the feature “Age”, from the Pima Dataset,
into two bins of the same width (here, 30 years). The results are reported in Fig. 9.13
Fig. 9.13 Result of the execution of the code of Fig. 9.12. The “Age” of the GroundData has two
possible values after discretization
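A minimal sketch consistent with the caption of Fig. 9.12, assuming pima as before, could read:

```r
# Discretize "age" into two equal-width bins (the ages span 60 years,
# so each bin is 30 years wide); a copy is used to keep 'pima' intact.
pima_d <- pima
rng <- range(pima_d$age)
pima_d$age <- cut(pima_d$age, breaks = c(rng[1], rng[1] + 30, rng[2]),
                  include.lowest = TRUE)
table(pima_d$age)   # the continuous ages collapse into two abstract values
```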
Constructive induction changes the representation of the data or the hypotheses [136, 141,
173, 286, 319, 381]. Whereas feature selection is a form of dimensionality reduction,
the purpose of constructive induction is to change the language of representation. It
might sound counter-intuitive to consider as an abstraction an approach that constructs
new features. However, we must first notice that the features do not really add “new”
information, because they are built using existing features, which are removed from
the description.
In this section we will focus on four types of such representation changes, and
describe the nature of the abstraction at stake in each:
• Feature construction
• Predicate invention
• Term abstraction
• Propositionalization
Feature construction is without a doubt the most widely used approach to constructive
induction. It can be applied to almost all representations used in Machine
Learning or Pattern Recognition. As opposed to feature construction, the last three
representation changes mentioned above apply mostly inside the frameworks of structural
learning, Inductive Logic Programming (ILP [384, 385]) or Statistical Relational
Learning (SRL), all three relying on First Order Logic representations.
Feature construction is a very popular way of generating features that are combinations
of the original ones, and that may be more appropriate, as they preserve crucial
information for the task while removing redundant features [141, 173, 286, 319,
381]. The term feature generation [324, 351, 409] is also used to characterize the
creation of new features.
Feature construction may be used as an approach to reduce dimensionality,
when it projects the data onto a lower-dimensional space. Among the many statistical
approaches used to construct features, Principal Component Analysis (PCA),
a particular case of the family of Karhunen-Loève transformations, is
without a doubt one of the most commonly used. Principal Component Analysis uses
an orthogonal transformation to convert a set of observations of possibly correlated
features into a set of linearly uncorrelated features, called principal components,
which “explain” most of the variance in the data. However, in general only a subset
of the most significant components is kept, thus obtaining a more abstract
representation of the original function.
Another approach that performs abstraction on numerical data is the use of the
Discrete Fourier Transform. Also in this case a function is expressed as a series of
trigonometric functions, each with an associated coefficient. If all the coefficients
are kept (there may be infinitely many), no abstraction occurs, but only
reformulation.
In Machine Learning, feature construction was first introduced by Michalski [366,
367], and then formally defined by Matheus and Rendell [357] as the application of
a set of constructive operators to a set of existing features; the result consists in the
construction of one or more new features intended for use in describing the target
concept to be learned. Similarly to feature selection, the building of abstract features
can take place before induction (as in Filter mode), after induction (as in Wrapper
mode) or during induction (as in Embedded mode). Feature construction has been
widely used in Machine Learning in all kinds of representations, from propositional
to relational learning: PLSO [449], DUCE [381], STAGGER [476], BACON [315],
Fringe [409], MIRO [141], KLUSTER [286], MIR-SAYU [122], Feat-KNN [232], to
cite only a few.
Wnek [566] offers a review of feature construction, distinguishing (a) Deductive
Constructive Induction (DCI), which takes place before induction, and (b) Hypothesis
Constructive Induction (HCI), which takes place after it. We add to this list the notion
of ECI (Embedded Constructive Induction), to account for constructive induction that
is embedded in the Machine Learning algorithm.
• In DCI (corresponding to a Filter approach) the space of possible new features is
combinatorial, and a priori knowledge must be used to choose the types of features
to construct (e.g., product fi ∗ fj, ratio fi/fj, Boolean formula M-of-N, . . .).
Constructed features may also use expert knowledge; for instance, Hirsh and Japkowicz
[250] presented a bootstrapping method starting from human-suggested
features. The main problem of DCI approaches is clearly the combinatorial explosion
of the possible features that may be generated.
• In HCI (corresponding to a Wrapper approach) the new features are built after a
learning process has taken place. Some typical approaches are listed below.
– FRINGE [410] is a decision tree-based feature construction algorithm. It
reduces the size of the learned decision tree by iteratively conjoining features at
the fringe of branches in the decision tree. It addresses the replication problem
in trees (i.e., many similar subtrees appearing in the tree) and provides more
and more compact decision trees [306, 418]. The features built this way correspond
to particular Boolean formulas over the initial or previously constructed
features. Continuous features can also be created.
– In Feat-KNN [232] the new features are density estimator functions, learnt
from the projection of the data onto the 2-D space formed by a pair of original
features. Only the best newly built features are then kept for subsequent learning.
– Pachet et al. [409] present a feature construction system designed to create audio
features for supervised classification tasks. They build analytical features (AFs)
based on raw audio signal processing features. AFs can express arbitrary signal
functions, and might not make obvious sense. MaxPos(FFT(Split(FFT(LpFilter(x,
MaxPos(FFT(x)))), 100, 0))) is, for instance, an AF that computes a
complex expression involving a low-pass filter, whose cutoff frequency is itself
computed as the maximum peak of the signal spectrum. AFs are evaluated
individually using a wrapper approach.
• In ECI (corresponding to an Embedded approach) the new features are built during
the learning process, as is the case with OC1, which learns oblique decision trees
[388]. Decision splits that are ratios of the initial features are considered during the
building of the tree. Motoda proposes an algorithm that extracts features from a
feed-forward neural network [379]. Indeed, feed-forward neural networks might be
considered as dynamically building features [542], because they construct layered
features from the input to the hidden layer(s) and further to the output layer.
Finally, let us mention that feature construction has some natural links with the field
of Scientific Discovery [145, 316, 543]. BACON [315], for example, is a program
that discovers relationships among real-valued features of instances in data using
two operators, multiply and divide.
In conclusion, we can say that, according to Definition 6.19, feature construction
is an abstraction only if the original features are removed from the “abstract” space.
Otherwise, it is a simple reformulation; in fact, the set of configurations remains
the same, as both the newly created feature and its components are visible in each
example.
In Fig. 9.14 a simple R code to perform the construction of two features is reported.
The results on the Pima dataset are reported in Fig. 9.15.
Fig. 9.14 Short R code for performing the construction of two features from the eight features of
the Pima Database
Fig. 9.15 Result of the execution of the R code of Fig. 9.14 (apart from the last three lines): the
two new features PC1 and PC2 are abstracted from the 8 initial ones
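In the same spirit as Figs. 9.14 and 9.15, a minimal sketch (our own) constructs PC1 and PC2 by Principal Component Analysis and removes the eight original features, so that the change is an abstraction in the sense of Definition 6.19, and not a mere reformulation:

```r
# Construct two abstract features from the eight initial ones via PCA.
pca <- prcomp(pima[, -ncol(pima)], center = TRUE, scale. = TRUE)
abstract_data <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                            class = pima$diabetes)
head(abstract_data)   # the original features are no longer visible
```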
Predicate invention (PI) refers to the automatic introduction of new relations or predicates
directly in the hypothesis language [172, 302, 504, 505]. Extending the initial
set of predicates or relations (sometimes called the vocabulary) may be useful to
either speed up or ease the learning task. However, inventing relevant new predicates
is one of the hardest tasks in Machine Learning, because they are difficult to
evaluate, and there may potentially be so many of them [174]. As an example, let us
consider the predicates StepMotherOf and MotherOf. It might be useful to consider
a new predicate MomOf, true whenever StepMotherOf or MotherOf is true. This
predicate would account for the maternal relationship that holds for both mothers
and stepmothers. Formally:
∀x, y : MomOf(x, y) ← MotherOf(x, y) ∨ StepMotherOf(x, y)
Fig. 9.16 The V-operator inverting the resolution. On the right, a new clause stating that if Leia is
a daughter of X then X may be the parent of Leia is a possible result of the resolution inversion
8 There are also scheme-driven methods, which define new intermediate predicates as combinations
of known literals that match one of the schemes provided initially for useful literal combinations
[504].
Fig. 9.17 The W-operator of intra-construction. It consists here of two simultaneous V-operators
based on a factorization, where a new predicate that subsumes father and mother is invented (called
“new”, assuming that the “parent” concept had not been given before)
Such operator-driven methods, however, tend to over-generate predicates, many of which
are not useful. Predicates can also be invented by instantiating second-order templates
[488], or introduced to represent exceptions to learned rules [501]. Relational
predicate invention approaches suffer from a limited ability to handle noisy data.
As noted by Khan et al. [285], “... surprisingly little, if any, experimental evidence
exists to indicate that learners which employ PI perform better than those
which do not. On the face of it, there are good reasons to believe that since increasing
the learner’s vocabulary expands the hypothesis space, PI could degrade both the
learner’s predictive accuracy and learning speed”. But, as recently underlined by
Kok and Domingos [293], there are only a few systems able to invent new predicates,
and only weak or no results about the properties of their operators. The crucial problems
concerning the introduction of new predicates have not yet been satisfactorily
solved. Nevertheless, the need for predicate invention is undoubted.
In the past few years, the Statistical Relational Learning (SRL) community has
recognized the importance of combining the strengths of statistical learning and
relational learning, and developed several novel representations, as well as algorithms
to learn their parameters and structure [199]. However, the problem of statistical
predicate invention (SPI) has so far received little attention in the community. SPI
is the discovery of new concepts, properties and relations from data, expressed in
terms of the observable ones, using statistical techniques to guide the process, and
explicitly representing the uncertainty in the discovered predicates. These can in turn
be used as a basis for discovering new predicates, which is potentially much more
powerful than learning based on a fixed set of simple primitives. Essentially, all the
concepts used by humans can be viewed as invented predicates, with many levels
of discovery between them and the sensory percepts they are ultimately based on.
A recent proposal for statistical predicate invention has been put forward by Kok and Domingos [293].
Comments similar to those made for feature construction apply to predicate invention.
If the new predicate is meaningful for the task at hand, then learning will
occur with less computational effort and better results. Otherwise, a good concept
can be lost, because it may be masked by too many other predicates.
Fig. 9.18 a A learning example (a cart) with structural noise. b A new description of the cart,
obtained by hiding objects a and d, and merging objects h and m into p, as well as objects e, f, and
g into n. (Reprinted with permission from Giordana et al. [210])
Fig. 9.19 Example of two different term abstractions of a molecule: in the one on top, a unique
term abstracts the whole molecule, whereas in the one at the bottom several terms abstract pairs of
atoms
Term abstraction can lead to a significant speed-up. Finding such terms may be related to
approaches that attempt to detect common substructures, like the SUBDUE system [256].
In relational learning, examples may have internal parts (e.g., a carcinogenic molecule
may be described by its components and their relations [241, 287]), and a
possible abstraction aims at simplifying their representation by hiding part of the
complexity of their structure [416]. This aggregation hides the structural complexity of
objects, and as such it leads to a very significant speed-up in Machine Learning, at the
expense of using a simplified representation.
9.2.4.4 Propositionalization
Although some learning systems can directly operate on relations, the vast majority of
them operate only on attribute-value representations. Since the beginning of Machine
Learning there have been approaches to reformulating relational representations into
propositional ones. Such a reformulation requires first that new features be built up
(e.g., LINUS [318]), and then that the relational data be re-expressed in the new
representation. This process, originally called selective reformulation [318, 591,
592], was later called propositionalization [9, 72, 304, 578, 583].
The issue then becomes the translation of the relational learning task into a propositional
one, in such a way that efficient algorithms for propositional learning can be
applied [10, 303, 305, 592, 591]. Propositionalization involves identifying the terms
(also called morion [592]) that will be used as the individual objects for learning.
9 To simplify the treatment, we will not explicitly represent the starting state probability distribution.
Fig. 9.20 a A simple Markov decision problem used to introduce Reinforcement Learning, after
Dietterich [137]. A passenger is located in Y at (0,0), and wishes to go by taxi to location B at (3,0).
b The optimal value function V for the taxi agent in the case described in a
Fig. 9.21 The six primitive actions of the taxi agent in the taxi problem
The taxi agent receives positive rewards when it delivers the passenger at his/her destination,
and negative ones when it attempts to pick up a non-existent passenger or to put down
the passenger anywhere except at one of the four special spots [137].
In spite of several success stories of RL, in many cases tackling difficult tasks
using RL may be slow or infeasible, the difficulty usually resulting from the
combination of the size of the state space with the lack of an immediate reinforcement
signal. Thus, a significant amount of RL research is focused on improving the speed
of learning by using background knowledge, either to reuse solutions from similar
problems (using transfer or generalization techniques [524]) to bootstrap the value
of V π(s), or to abstract along the different dimensions of knowledge representation
in RL [137, 363, 426, 516]. To represent large MDPs, Boutilier et al. [75] were
precursors in proposing the use of factored models in planning. Factored MDPs are
a representation language that supports exploiting problem structure to represent
exponentially large MDPs in a compact way [226]. In a factored MDP, the set of
states is described via a set of random variables. There is in fact a wide spectrum
of abstractions that have been explored in the Reinforcement Learning and
planning literatures. Both positive and negative results are known [330]. There are
mainly four dimensions (summarized in Fig. 9.22), along which these representation
changes involving abstraction have been explored:
in the second case there are algorithms that search for regularities in the state space
(symmetry, equivalence, irrelevant features, etc.); in the third case a surrogate of
the value function V or Q is built up, using a classifier system (decision tree,
SVM, …) to learn a model of V or Q; and, similarly, in the fourth case a surrogate
of the policy function is built up, again using a classifier [322].
There is a large literature on factored MDPs that is relevant to this question [226].
• How to guarantee the convergence of algorithms in the abstracted state space. This
issue has prompted a lot of work, both theoretical and empirical [523].
Several strategies for state aggregation have been proposed; they are overviewed by Li
et al. [330] (see Fig. 9.23). Symmetry of the state space arises when states, actions or
a combination of both can be mapped to an equivalent reduced or factored MDP that
has fewer states and/or actions. An example of state symmetry is learning to exit similar
rooms that differ only in irrelevant properties. A more subtle kind of symmetry
arises when the state space can be partitioned into blocks such that the inter-block
transition probabilities and reward probabilities are constant for all actions.
Early work by Boutilier et al. [76] introduced a stochastic dynamic programming
algorithm that automatically builds aggregation trees in the abstract space to
create an abstract model, where states with the same transition and reward functions,
under a fixed policy, are grouped together.
Fig. 9.23 Different strategies for state aggregation in Reinforcement Learning. The column “MDP
given” states whether an MDP is given or not before learning. (Reprinted with permission from Li
et al. [330])
This principle has been formalized using the notion of bisimulation homogeneity
by Dean et al. [128]. The elimination of an irrelevant random variable in a state
description is an example of such homogeneity. Givan et al. [215] have proposed
an algorithm that generalizes Boutilier’s approach [76], based on iterative methods
for finding a bisimulation in the semantics of concurrent processes. This algorithm,
in which states with the same transition probability and reward function are
automatically aggregated, supports building the abstraction of an MDP in polynomial
time [215].
Andre and Russell [18] propose a state abstraction that maintains optimality
among all policies consistent with the partial program, which they call hierarchical
optimality. They have demonstrated that their approach, on variants of the taxi problem,
shows faster learning of better policies, and enables the transfer of learned skills
from one problem to another. Fitch et al. [171] consider using homomorphism as
an algebraic formalism for modeling abstraction in the framework of MDPs and semi-
MDPs [444]. They explore abstraction in the context of multi-agent systems, where
the state space also grows exponentially in the number of agents. They also investigate
several classes of abstractions specific to multi-agent RL; in these abstractions
agents act one at a time as far as learning is concerned, but they are assumed to be
able to execute actions jointly in the real world.
Li et al. [329] have proposed a framework to unify previous work on the subject
of abstraction in RL. They consider abstraction as a mapping φ between MDPs, and
distinguish abstractions from the finer to the coarser ones (a toy illustration follows
the list):
• φmodel gives the opportunity to recover essentially the entire model (e.g.,
bisimulation [215]);
• φQπ preserves the state-action value function for all policies;
• φQ∗ preserves the optimal state-action value function (e.g., stochastic dynamic
programming with factored representations [76], mentioned above);
• φa∗ preserves the optimal action and its value, and thus does not guarantee
learnability of the value function for suboptimal actions, but does allow for planning
(i.e., value iteration);
• φπ∗ attempts to preserve the optimal action, but optimal planning is generally lost,
although an optimal policy is still representable [277].
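As a toy illustration of such a mapping (all numbers below are hypothetical, and the corridor environment is our own), tabular Q-learning can be run directly on abstract states obtained by aggregating ground states through φ:

```r
# Ground states 1..12 of a corridor are aggregated into 4 abstract
# states; the Q-learning updates operate on phi(s) instead of s.
phi   <- function(s) ceiling(s / 3)
Q     <- matrix(0, nrow = 4, ncol = 2)   # abstract states x {left, right}
alpha <- 0.1; gamma <- 0.9; eps <- 0.1
step  <- function(s, a) {                # deterministic dynamics
  s2 <- max(1, min(12, s + (if (a == 2) 1 else -1)))
  list(s2 = s2, r = if (s2 == 12) 1 else 0)  # reward at the right end
}
s <- 1
for (t in 1:5000) {
  a   <- if (runif(1) < eps) sample(2, 1) else which.max(Q[phi(s), ])
  out <- step(s, a)
  Q[phi(s), a] <- Q[phi(s), a] +
    alpha * (out$r + gamma * max(Q[phi(out$s2), ]) - Q[phi(s), a])
  s <- if (out$s2 == 12) 1 else out$s2
}
round(Q, 2)   # values grow toward the rewarding end of the corridor
```

Here φ happens to preserve the optimal action (always move right), so learning on the abstract states still finds a good policy; an unlucky aggregation would not enjoy this property.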
Ponsen et al. [425] present an interesting survey that summarizes the most important
techniques available to achieve both generalization and abstraction in Reinforcement
Learning, and illustrate them with examples. They rely on the KRA model presented
in Chap. 6.
Fig. 9.24 a Example of function approximation, where the tabular state-action representation is
described using a decision tree or a neural network. b A decision tree dynamically abstracts the
state space. (Adapted from Pyeatt and Howe [434])
Fig. 9.25 A task directed graph for the Taxi problem. The leaves of this pyramid are primitive
actions (see Fig. 9.21). Root is the whole taxi task. The nodes represent individual subtasks that are
believed to be important for solving the overall task. Navigate(t), for example, is a subtask whose
goal is to move the taxi from its current location to one of the four target locations (indicated by the
formal parameter t). Get is a subtask whose goal is to move the taxi from its current location to
the passenger’s current location and pick up the passenger. Put is a subtask whose goal is to move
the taxi from the current location to the passenger’s destination location and drop off the passenger.
The directed links represent task dependencies. According to the pyramid, the Navigate(t) subtask
uses the four primitive actions North, South, East, and West. (Reprinted with permission from
Dietterich [137])
Several techniques have been developed for reducing the degree of suboptimality. The most
interesting of these involves using the hierarchical value function to construct a non-hierarchical
policy that is provably better than the hierarchical one [519].
Hengst et al. [246] developed an algorithm that discovers sub-tasks automatically.
They introduce two completion functions, which jointly decompose the value
function hierarchically, solving sub-problems simultaneously and reusing sub-tasks with
discounted value functions. The significance of this result is that the benefits of HRL
can be extended to discounted value functions and to continuous Reinforcement
Learning. Lasheng et al. [317] present the SVI algorithm, which uses a dynamic
Bayesian network model to construct an influence graph indicating relationships
between state variables. Their work is also related to state abstraction, as is most work
in HRL. SVI performs state abstraction for each subtask by ignoring irrelevant state
variables and lower-level subtasks. Experimental results show that the decomposition
of tasks introduced by SVI can significantly speed up the construction of a near-optimal
policy. The authors argue that this can be applied to a broad spectrum of complex
real-world problems, such as robotics, industrial manufacturing, and games.
The last type of abstraction we consider here is temporal abstraction, which has
been analyzed, in particular by Sutton et al. [519], within the framework of both
Reinforcement Learning and Markov Decision Processes. The main idea is to extend
the usual notion of action to include options, namely closed-loop policies for taking
actions over a period of time. Examples of options include picking up an object,
going to lunch, and traveling to a distant city, as well as primitive actions such as
muscle twitches and joint torques.
In previous works Sutton et al. [517, 518] used other terms, including “macro-actions”,
“behaviors”, “abstract actions”, and “sub-controllers”, for structures closely
related to options. The term “option” is meant as a generalization of “action”, which
is used formally only for primitive choices. It might at first seem inappropriate that
“option” does not connote a course of non-primitive action, but this is exactly the
authors’ intention. They showed that options enable temporally abstract knowledge
and actions to be included in the Reinforcement Learning framework in a natural
and general way. In particular, options may be used interchangeably with primitive
actions in planning methods, such as dynamic programming, and in learning methods,
such as Q-learning. Formally, a set of options defined over an MDP constitutes
a semi-Markov decision process. One of the tricks for treating temporal abstraction as
a minimal extension of the Reinforcement Learning framework is to build the theory
of options on the theory of semi-Markov decision processes (SMDPs, see Fig. 9.26).
Temporal abstraction provides the flexibility to greatly reduce computational
complexity, but can also have the opposite effect if used indiscriminately.
Representing knowledge flexibly at different levels of temporal abstraction has
the potential to greatly speedup planning and learning on large problems [317, 350].
Fig. 9.26 The state trajectory of an MDP is made up of small, discrete-time transitions, whereas
that of a SMDP comprises large, continuous-time transitions. Options enable an MDP trajectory to
be analyzed in either way [516]
Table 9.2 Several operators used in Machine Learning (focusing on Concept Learning and Reinforcement
Learning), classified according to the elements they act upon, and to the type of abstraction
performed

Operators            | Objects                                  | Features                                     | Predicates & Functions
Hiding               | Instance selection                       | Feature selection, factored state aggregation | Predicate selection
Equating             | Clustering, Macro-action, flat state aggregation | Feature discretization               | Value/Function approximation
Hierarchy Generation |                                          | Climbing hierarchy of features or values      | Climbing hierarchy of tasks
Aggregating          | Term construction, state space aggregation | Feature construction                        | Predicate invention
Let CL = {c1 , ..., cS } be a set of given “concepts” (classes), X the (possibly infinite)
set of instances of the classes, and L a language for representing hypotheses. The
set X contains the identifiers of the examples, whose description is provided by the
choice of their attributes. Let moreover LS (with cardinality N) be the learning set.
The discriminant learning task can be formulated as the following query:
Q = Given a language L, a set of “concepts” (classes), a criterion for evaluating
the quality of a hypothesis, and a learning set LS, find the hypothesis belonging to L
that correctly assigns classes to previously unseen examples.
The examples are to be observed, and it is the task designer who decides what features
(attributes) are to be measured on the examples, as well as the granularity of the attribute
values. Usually, neither functions nor relations are included in the observations. Once
the attributes and their domains have been selected, a description frame Γ can be defined for
representing the examples:
Then:
ψ = {xi | 1 ≤ i ≤ N}    (9.3)
From expression (9.3) we see that, with this formalization, one configuration corresponds
to a set of examples, as illustrated in Fig. 9.27.
If all examples are totally specified, i.e., there are no missing values, then the
P-Set P, containing all the observations necessary to answer Q, coincides with just
one configuration. If some examples have some missing values, then P corresponds
to the set of configurations consistent with it (see Definition 6.7).
In more detail, we have P = ⟨O, A, F, R⟩, where:
O = LS, with |O| = N
Then:
A = {xi | 1 ≤ i ≤ N}
V = {pm,j ≡ [Am = vj] | 1 ≤ m ≤ M, 1 ≤ j ≤ ℓm},
where ℓm = |Λm|.
The theory T contains, first of all, a learning algorithm LEARN. Then, we must
provide a criterion to compare candidate hypotheses, for instance the Information
Gain, and another criterion to stop the search in the hypothesis space. A hypothesis
is of the form:
ϕh(xi) ⇒ ci,
Solving Q consists in applying LEARN to LS, and searching for a ϕ∗ using the given
criteria for hypothesis comparison and for stopping. For the sake of illustration, let
us consider a simple example, taken from Quinlan [440].
Example 9.4. Suppose that we want to decide whether to play or not to play tennis
on a given day, based on the day’s weather. We define ΓTYPE = {example}, and,
for instance, ΓO = {1, 2, . . . , 365}, i.e., the days of a year. Each day is described
by the attributes:
ΓA = {(Outlook, {sunny, overcast, rain}),
(Humidity, {high, normal}),
(Temperature, {hot, mild, cool}), (Windy, {true, false})}
Γ = ⟨ΓTYPE, ΓO, ΓA, ∅, ∅⟩
All attributes are applicable to all examples. For learning, we consider a P = ⟨O, A,
∅, ∅⟩, including in O a learning set LS of N = 14 examples. Then, O = {x1, . . . , x14}.
The database D contains two tables, namely OBJ and ATTR, reported in Tables 9.3
and 9.4, respectively.11
The language L consists of decision trees. Each node of the tree has an attribute
associated with it, and the edges outgoing from the node are labelled with the values
11 As all objects have the same type, the type specification is superfluous, but we have kept it for
the sake of completeness.
taken on by that attribute. Each path from the root to a node ν represents a conjunctive
description ϕ, and then the set of examples verifying ϕ is “associated” with ν as well.
Examples of more than one class can verify ϕ, but leaf nodes contain examples of
just one class.12
The theory T includes the learning algorithm LEARN = ID3 [440], and the
information gain IG as a hypothesis evaluation criterion. The stop criterion states
that learning stops when the frontier to be expanded in the decision tree T consists of
only leaf nodes. The examples in O are assigned by a teacher to one of the two classes
contained in CL = {Yes, No}. The labeling by the teacher (class No to examples
x1 , . . . , x5 , and class Yes to examples x6 , . . . , x14 ) adds a column to table OBJ,
reporting the correct classification. The new table in D is reported in Table 9.5.
In order to answer the query Q, algorithm ID3 is run on LS, and the resulting “best”
decision tree is output.
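As an illustration, the rpart package can serve as a stand-in for ID3 (it grows classification trees with a different splitting criterion); the miniature weather table below is a hypothetical fragment shaped like the data described in the text, not Quinlan's original table:

```r
# Learn a small decision tree on weather attributes; each root-to-leaf
# path of the result is a conjunctive description, as discussed below.
library(rpart)
weather <- data.frame(
  Outlook     = factor(c("sunny", "sunny", "overcast", "rain",
                         "rain", "rain", "overcast")),
  Humidity    = factor(c("high", "high", "high", "high",
                         "normal", "normal", "normal")),
  Temperature = factor(c("hot", "hot", "hot", "mild",
                         "cool", "cool", "cool")),
  Windy       = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE),
  Class       = factor(c("No", "No", "Yes", "Yes", "Yes", "No", "Yes"))
)
tree <- rpart(Class ~ ., data = weather, method = "class",
              control = rpart.control(minsplit = 2, cp = 0))
print(tree)
```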
After having formalized the propositional learning problem inside the KRA model,
we address the task of feature selection. Feature selection, in this context, means
hiding one or more attributes, say Aj, from the description of the examples, by means
of the operator ωhattr.
By applying ωhattr, some subsets of examples in X collapse. In fact, all those
examples differing only in the value of attribute Aj become identical.
Let us now consider a specific learning task.
As already mentioned, if LS contains two examples that differ only in the values
of attribute Aj, then the descriptions of those examples now coincide.
However, the examples are still distinguishable, owing to their unique identifiers. The
associated method shall specify whether duplicate examples are to be removed or not.
Let COMPg(Pg) be the set of configurations in Ψg consistent with Pg. If no
example has missing values, then COMPg(Pg) contains a unique configuration
Pg ≡ ψg, otherwise |COMPg(Pg)| > 1. An abstract example xi(a) corresponds
to ℓj = |Λj| ground examples.
In relational learning, the representation requires the addition, to each elementary
object o, of two pieces of information, namely the values of two
special attributes: Example and Class. Attribute Example specifies which example o
belongs to, and Class specifies which class the example (and, hence, also object o) is
an instance of (see the illustrative example in Table 9.1). These two attributes are added
as columns to the table OBJ in D. Depending on the learning algorithm, the standard
format of the database D, as it is built up in the KRA model, may or may not be used.
Then, data in D might be reformulated, without changing the information content they
provide, into another format. One commonly used transformation is a reformulation
of the data into ground logical formulas, to be used by an ILP algorithm. Another
way is to use relational algebra operators to regroup the data into a different set of
tables, one for each example. Relational learning is one of the cases where multiple
formulations of the same data can be present in D, to be used according to the nature
of the theory.
Finally, in the theory T we have to provide a means to evaluate learning and,
as usual, a stopping criterion. In relational learning the language for representing
hypotheses is a subset of First Order Logic, so that hypotheses have variables that
must be bound to objects. Remember that an example is here a composite object. In
order to see whether an example satisfies a formula, a restricted form of deduction,
called θ-subsumption, is frequently used [384, 423].
Definition 9.1. {θ-Subsumption in DATALOG} Let h(x1, x2, ..., xn) be a First
Order Logic formula, with variables xi (1 ≤ i ≤ n), and e the description of an
example. We say that h subsumes e if there exists a substitution θ for the variables
in h which makes h a subset of e.
The θ-subsumption relation offers a means not only to perform the covering test between
a hypothesis h and an example e, but also to test the more-general-than relation.
An informal but intuitive way of testing whether a hypothesis h covers a learning
example e is to consider each atomic predicate in h as a test to be performed on the
example e. This can be done by binding the variables in h to components of e, and then
ascertaining whether the selected bindings verify, in the example e, the predicates
appearing in h. The binding procedure must be repeated for every possible choice of
the objects in e. The procedure stops as soon as a binding satisfying h is found,
thus reporting true, or it stops reporting false after all possible bindings have
been tried. The procedure is called, as in the propositional case, “matching h to e”.
Matching h to e has the advantage that it avoids the need to translate the example
e into a ground logical formula, because examples normally come in tabular form.
In practice, several learners use this matching approach for testing coverage [52, 69,
439].
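To make the matching procedure concrete, the following is a minimal Python sketch of the brute-force coverage test just described: every binding of the variables in h to the constants of e is tried, and the search stops at the first binding that makes h a subset of e. The atom encoding (tuples of predicate name and arguments, with variables starting with an uppercase letter) is an illustrative assumption, not the book's notation.

```python
from itertools import product

def theta_subsumes(h, e):
    """Brute-force theta-subsumption: h subsumes e iff some substitution of
    e's constants for h's variables makes every atom of h appear in e.
    Atoms are tuples (predicate, arg1, ..., argn); variables are strings
    starting with an uppercase letter."""
    variables = sorted({a for atom in h for a in atom[1:] if a[0].isupper()})
    constants = sorted({a for atom in e for a in atom[1:]})
    for binding in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, binding))
        grounded = {(atom[0],) + tuple(theta.get(a, a) for a in atom[1:])
                    for atom in h}
        if grounded <= e:        # h, after substitution, is a subset of e
            return True          # stop at the first satisfying binding
    return False

# Does "some object on top of a blue one" cover this scene?
h = {("ontop", "X", "Y"), ("color", "Y", "blue")}
e = {("ontop", "c1", "a1"), ("color", "a1", "blue"), ("girder", "c1")}
print(theta_subsumes(h, e))     # True, with X -> c1, Y -> a1
```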
For the sake of exemplification, let us consider an example.
Example 9.5. Let Γ = ⟨Γ_TYPE, Γ_O, Γ_A, Γ_F, Γ_R⟩ be a description frame, where:
Γ_A = {(Shape, {triangle, rectangle}),
       (Color, {yellow, green, blue, red})}
Γ_F = ∅
Γ_R = {R_ontop, R_adjacent}
O = O_girder ∪ O_support
O_girder = {c1, c2, c4}
O_support = {a1, a2, a3, a4, b1, b2, b3}
A = {(a1, support, rectangle, blue),
     (a2, support, rectangle, yellow),
     (a3, support, rectangle, blue),
     (a4, support, rectangle, blue),
     (b1, support, rectangle, blue),
     (b2, support, rectangle, red),
     (b3, support, rectangle, yellow),
     (c1, girder, rectangle, red),
     (c2, girder, triangle, green),
     (c4, girder, rectangle, yellow)}
R = {RCOV(R_ontop), RCOV(R_adjacent)}
RCOV(R_ontop) = {(c1, a1), (c1, b1), (c2, a2), (c2, b2), (c4, a4)}
RCOV(R_adjacent) = {(a3, b3)}
The encoding of P in the database generates the set of tables reported in Fig. 9.29.
From the theory we also know the distribution of the objects among the examples
e1, e2, e3, e4, as described in Fig. 9.28. Moreover, we have two classes, i.e., CL =
{Arch, NoArch}, and the teacher tells us that e1 and e2 are arches, whereas e3 and
e4 are not. Then, the OBJ table in D is modified as in Fig. 9.30.
The language L is a DATALOG language L = ⟨C, X, O, F, P⟩, where:
C = {a1, a2, a3, a4, b1, b2, b3, b4, c1, c2, c3, c4, rectangle, . . . ,
     small, . . . , yellow, . . .}
P = {girder(x), support(x), shape(x, rectangle), shape(x, triangle),
     color(x, red), color(x, blue), color(x, yellow), color(x, green),
     ontop(x, y), adjacent(x, y), arch(x), noarch(x), example(x)}
Fig. 9.31 The 10-trains original East-West challenge, after Michalski [367]
Γ_TYPE^(g) = {engine, car, load}
Moreover:
Γ_O,engine^(g) = {g1, g2, . . .}
Γ_O,car^(g) = {c_{i,j} | i, j ≥ 1}
Γ_O,load^(g) = {ℓ_{i,j,k} | i, j, k ≥ 1}
and
Γ_O^(g) = Γ_O,engine^(g) ∪ Γ_O,car^(g) ∪ Γ_O,load^(g)
Cars and loads have attributes associated with them, whereas engines do not:
Γ_A,car^(g) = {(Cshape, Λ_Cshape), (Clength, {long, short}),
               (Cwall, {single, double}), (Cwheels, {2, 3})},
where Λ_Cshape is the value set of the car shape attribute, and
Γ_A^(g) = Γ_A,car^(g) ∪ Γ_A,load^(g)
No function is considered, so that Γ_F^(g) = ∅. Finally, Γ_R^(g) = {R_Infrontof, R_Inside}
contains the considered relations between pairs of elementary objects. More precisely:
R_Infrontof ⊆ (Γ_O,engine^(g) ∪ Γ_O,car^(g)) × Γ_O,car^(g)
R_Inside ⊆ Γ_O,load^(g) × Γ_O,car^(g)
Og,car = {c1,1 , c1,2 , c1,3 , c1,4 , c2,1 , c2,2 , c2,3 , c3,1 , c3,2 , c3,3 , c4,1 , c4,2 , c4,3 ,
c4,4 , c5,1 , c5,2 , c5,3 , c6,1 , c6,2 , c7,1 , c7,2 , c7,3 , c8,1 , c8,2 , c9,1 , c9,2 ,
c9,3 , c9,4 , c10,1 , c10,2 }
O_g,load = {ℓ_{1,1,1}, ℓ_{1,1,2}, ℓ_{1,1,3}, ℓ_{1,2,1}, . . . , ℓ_{10,1,1}, ℓ_{10,2,1}, ℓ_{10,2,2}}
The name c_{i,j} denotes the jth car in train i, counted starting from the one directly
connected to the engine. Similarly, ℓ_{i,j,k} denotes the kth load (from the engine to the
rear of the train) in car c_{i,j}. Notice that the meaning of the indices is intended only
for the reader's convenience.
The set A_g contains the specification of the attributes for each object. The covers
of the two relations are:
RCOV(R_Infrontof) = {(g1, c_{1,1}), (c_{1,1}, c_{1,2}), (c_{1,2}, c_{1,3}), . . . , (g10, c_{10,1}),
                     (c_{10,1}, c_{10,2})}
RCOV(R_Inside) = {(ℓ_{1,1,1}, c_{1,1}), (ℓ_{1,1,2}, c_{1,1}), (ℓ_{1,1,3}, c_{1,1}), . . . ,
                  (ℓ_{10,2,1}, c_{10,2}), (ℓ_{10,2,2}, c_{10,2})}
Fig. 9.32 Tables in the Dg of Michalski’s “trains” problem, referring to the objects and their
attributes
The theory Tg contains the learning algorithm LEARN, and the criteria for stopping
the search and for evaluating hypotheses. Moreover, the query Q specifies that there
are two classes, namely CL = {East, West} and the teacher labels all elementary
objects with respect to the classes and the examples.13
The database Dg contains the tables reported in Figs. 9.32 and 9.33, where OBJ
has already incorporated the information provided by the teacher, referring to the
query.
In order to apply LEARN it is often more convenient to reformulate the content of
D_g in such a way as to group together all the information referring to a single example.
It is sufficient to make a selection on table OBJ on the basis of the condition
“Example = e_i”, and then to select from the other tables the rows corresponding to
the IDs of the objects extracted from OBJ. As an example, we report in Fig. 9.34 the
reformulation of example e6.
Fig. 9.34 Reformulation of the database D_g in such a way that information regarding a single
example (train 6) is grouped
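The regrouping just described can be sketched with standard relational operations; the following toy Python/SQLite fragment is a minimal illustration. The concrete schema (columns ID, Type, Example, Class, and the car-attribute table) is an assumption made for the example, not the actual layout of D_g.

```python
import sqlite3

# A toy OBJ table (ID, Type, Example, Class) plus one attribute table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE OBJ (ID TEXT, Type TEXT, Example TEXT, Class TEXT)")
db.execute("CREATE TABLE carATTR (ID TEXT, Clength TEXT, Cwall TEXT)")
db.executemany("INSERT INTO OBJ VALUES (?,?,?,?)",
               [("g6", "engine", "e6", "West"),
                ("c6_1", "car", "e6", "West"),
                ("c6_2", "car", "e6", "West"),
                ("g7", "engine", "e7", "East")])
db.executemany("INSERT INTO carATTR VALUES (?,?,?)",
               [("c6_1", "long", "single"), ("c6_2", "short", "double")])

# Select the rows of OBJ with Example = 'e6', then join the attribute
# table on the extracted IDs: all information about example e6 is grouped.
rows = db.execute("""SELECT OBJ.ID, OBJ.Type, carATTR.Clength, carATTR.Cwall
                     FROM OBJ LEFT JOIN carATTR ON OBJ.ID = carATTR.ID
                     WHERE OBJ.Example = 'e6'""").fetchall()
for r in rows:
    print(r)
```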
The ground language we consider (denoted L_g) is a DATALOG language L_g =
⟨C_g, X, O, F_g, P_g⟩, where:
C_g = O_g
F_g = ∅
P_g = {engine(x), car(x), load(x)} ∪ {example(x, y) | x ∈ O_g, y ∈ {e1, . . . , e10}}
      ∪ {class(x, z) | x ∈ O_g, z ∈ CL} ∪ {cshape(x, v) | x ∈ O_g,car, v ∈ Λ_Cshape}
      ∪ {clength(x, v) | x ∈ O_g,car, v ∈ Λ_Clength}
      ∪ {cwall(x, v) | x ∈ O_g,car, v ∈ Λ_Cwall}
      ∪ {cwheels(x, v) | x ∈ O_g,car, v ∈ Λ_Cwheels}
      ∪ {lshape(x, v) | x ∈ O_g,load, v ∈ Λ_Lshape}
We recall that throughout the book we have used the convention of giving the same
names to the objects in O and the constants in L.
By using LEARN, several sets of rules distinguishing trains going East from trains
going West can be found. Let us consider the following ones:
Γ_TYPE^(a) = Γ_TYPE^(g) ∪ {loadedcar}
Γ_O^(a) = Γ_O^(g) ∪ Γ_O,loadedcar^(a)
Γ_R^(a) = Γ_R^(g) − {R_Infrontof^(g)} ∪ {R_Infrontof^(a)}
Relation R_Infrontof^(a) is defined as follows:
The hierarchy operator can be applied, in this case, independently of the aggregation
operator. It only affects Γ_A^(a), changing the value of the car's attribute Cshape.
Now we will show how the construction operator can be applied to generate a new
attribute for objects of type loadedcar, which does not add new information, as it
can be derived from the ground space. The corresponding operator is ω_constr(Count),
where:
Count: Γ_O,car^(g) × (Γ_O,load^(g))^k → Γ_A^(a)
Count takes as input a loaded car and its loads, and counts how many loads there
are. If the number of loads is 3, the car is declared Heavy. Then we define a new
attribute for cars, namely (Heavy, {true, false}), associated with objects of type
loadedcar.
The operator ω_constr(Count) can be applied independently of both the aggregation
and the hierarchy-building operators. It can easily be applied with a relational algebra
operation in D_g.
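A minimal sketch of how ω_constr(Count) could be computed is given below; the data layout (a map from each loadedcar identifier to the identifiers of its loads) and the function names are illustrative assumptions.

```python
def count_constructor(loads_of, heavy_threshold=3):
    """Sketch of the attribute-construction operator omega_constr(Count):
    derive, for each loadedcar, the new attribute Heavy from the number of
    loads it aggregates (3 loads make a car Heavy, as in the text)."""
    heavy = {}
    for car, loads in loads_of.items():
        heavy[car] = (len(loads) >= heavy_threshold)  # Heavy iff 3 loads
    return heavy

loads_of = {"lc1_1": ["l1_1_1", "l1_1_2", "l1_1_3"], "lc1_2": ["l1_2_1"]}
print(count_constructor(loads_of))  # {'lc1_1': True, 'lc1_2': False}
```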
The application of the three above-mentioned operators generates three abstract
description frames. In order not to multiply these, we define a parallel/sequential
abstraction process Π. If we call Γ_a^(1) the description frame generated by ω_aggr,
Γ_a^(2) the one generated by ω_hierattrval, and Γ_a the final one obtained by Π, we
have the following relations with respect to the relative abstraction levels:
Fig. 9.35 Database Da corresponding to the perception Pa obtained by applying the abstraction
process Π
By analyzing the above rules, we may notice that aggregation has reduced the
complexity of the description without affecting the quality of the classification rules. The
hierarchy-building operator has simplified rule r1, but has negatively affected rule r3;
in fact, by replacing the raggedtop value with closedtop, rule r3 also covers
trains 3 and 5, which are bound East. Rule r3 then becomes more complex, as it was
necessary to add that there is also a U-shaped or trap-shaped car in the train.
As a last step, we would like to add a new level of abstraction, where only the trains
as single entities are present. We have to apply again an aggregation operator, namely
ω_aggr((engine, loadedcar^k), train). The aggregation rule is the following:
f(y, x_1, . . . , x_k) = If [y ∈ Γ_O,engine^(g)] ∧ [x_1, . . . , x_k ∈ Γ_O,loadedcar^(a)] ∧
                        [(y, x_1), (x_1, x_2), . . . , (x_{k−1}, x_k) ∈ RCOV(R_Infrontof^(a))]
                        Then the tuple (y, x_1, . . . , x_k) is aggregated into a new object t of type train.
We have to decide what attributes (if any) are to be transferred to the trains. Only the
length, called Tlength, is applicable. None of the relations is applicable anymore. We
then obtain a new description frame Γ_a′, more abstract than Γ_a, which contains:
Γ_O^(a′) = Γ_O,train^(a′)
Γ_A^(a′) = {(Tlength, {long, short})}
Γ_F^(a′) = ∅
Γ_R^(a′) = ∅
The value long of the attribute Tlength is assigned to a train if it has 3 or more
loaded cars; otherwise the train is short. In this abstract space the trains clearly
cannot be distinguished anymore; in fact, even though it is true that all trains going East
are long, two of those going West are long as well. We have thus removed too much
information to still be able to answer our question, i.e., to learn to distinguish the
two sets of trains.
The associated optimal value function is denoted V*(s), and it is the unique solution
of the Bellman equation:
V*(s) = max_a Σ_{s′} P(s′|s, a) [R(s′|s, a) + γ V*(s′)]   (9.5)
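For illustration, the following sketch solves Eq. (9.5) by plain value iteration over a finite MDP given as explicit arrays; it is a generic textbook procedure, shown here under assumed data structures, not a method prescribed by the text.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality operator until convergence.
    P[a][s, s2] = transition probability, R[a][s, s2] = reward."""
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Expected one-step return plus discounted value, for each action.
        Q = np.array([(P[a] * (R[a] + gamma * V)).sum(axis=1)
                      for a in range(n_actions)])
        V_new = Q.max(axis=0)          # greedy choice over actions
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new

# Tiny 2-state, 2-action example (illustrative numbers).
P = [np.array([[0.9, 0.1], [0.0, 1.0]]), np.array([[0.2, 0.8], [0.5, 0.5]])]
R = [np.zeros((2, 2)), np.array([[0.0, 1.0], [0.0, 0.0]])]
print(value_iteration(P, R))
```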
Let us now model this query Q in the KRA model. The observable states S of the
environment are objects of type state. The observable actions A are objects of type
action. The parameter γ is a constant of type R⁺. States are to be observed, and it
is the task designer who decides what features are to be measured in each state and
their value sets. Each state is thus described by a set of attributes and corresponding
values (A_m, Λ_m) (1 ≤ m ≤ M). The probability distribution P(s′|s, a) is represented
by a function whose domain is S × A × S and whose co-domain is [0, 1]. The reward R
is a function that has domain S × A, but its co-domain is given by the designer. Let
us add to the description frame two other functions, i.e., the current policy π and
a value function V^π(s) (the function Q^π(s, a) could have been chosen instead). No
relations are used.
Once the attributes of the states, their values, and the actions are all selected, a
description frame Γ = ⟨Γ_TYPE, Γ_O, Γ_A, Γ_F, Γ_R⟩ can be defined, where14:
Γ_TYPE = {state, action, real},
Γ_O,state = S, Γ_O,action = A, Γ_O = S ∪ A ∪ R,
Γ_A = Γ_A,state = {(A_m, Λ_m) | 1 ≤ m ≤ M}
Γ_F = {V^π: S → R, π: S → A, P: S × A × S → [0, 1], R: S × A → R},
Γ_R = ∅.
14 In principle, attributes for the actions could also be envisaged. They can be added if needed.
ψ = {(s_i, state, v_i^(1), v_i^(2), . . . , v_i^(M)) | 1 ≤ i ≤ |S|} ∪
    {(a_j, action) | 1 ≤ j ≤ |A|} ∪ FCOV(V^π) ∪ FCOV(π) ∪ FCOV(P) ∪ FCOV(R)   (9.6)
where FCOV(V^π) contains pairs of the form (s_i, V_i^π), FCOV(π) contains pairs of
the form (s_i, π_i = a_j), FCOV(P) contains 4-tuples of the form (s_i, a_j, s_k, p_{ijk}), and
FCOV(R) contains triplets of the form (s_i, a_j, r_{ij}), with:
• s_i is the identifier of a state,
• v_i^(m) ∈ Λ_m (1 ≤ m ≤ M) is the value of attribute A_m of state s_i,
• V_i^π is the value function in state s_i,
• π_i is the value of the policy in state s_i, i.e., π_i = π(s_i) = a_j,
• p_{i,j,k} is the probability of obtaining state s_k by applying action a_j in state s_i,
• r_{i,j} is the reward received by the agent choosing action a_j in state s_i.
The theory T contains the discount parameter γ ∈ R, the Bellman equation (9.5),
and an algorithm Algo that chooses the action to apply at each step. In addition,
T must provide a criterion to stop Algo. As a learning algorithm we can consider
Q-learning; after state s has been observed, action a has been chosen, a reward r has
been gathered, and the next state s′ has been observed as well, Q-learning performs
the following update:
Q(s, a) ← Q(s, a) + α_t [r + γ max_{a′} Q(s′, a′) − Q(s, a)]   (9.7)
where α_t is a learning rate parameter. For choosing the action we may consider a
classic ε-greedy policy, which chooses a random action with probability ε instead of
trying to choose the “best” action in terms of the highest value of the successor states
(chosen with probability 1 − ε). Equation (9.7) and the parameters α_t and ε are to be
added to the theory.
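A compact sketch of the tabular Q-learning loop with the ε-greedy policy just described follows; the toy corridor environment and its reset/step interface are assumptions made so the fragment is self-contained.

```python
import random
from collections import defaultdict

class ChainEnv:
    """A tiny 5-state corridor: moving 'right' eventually reaches the goal
    (reward +10); every step costs -1. Purely illustrative."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, 4) if a == "right" else max(self.s - 1, 0)
        done = (self.s == 4)
        return self.s, (10.0 if done else -1.0), done

def q_learning(env, actions, alpha=0.1, gamma=0.9, eps=0.1, episodes=300):
    Q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:           # explore w.p. eps
                a = random.choice(actions)
            else:                               # exploit w.p. 1 - eps
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            target = r + (0.0 if done else
                          gamma * max(Q[(s2, x)] for x in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])   # update (9.7)
            s = s2
    return Q

Q = q_learning(ChainEnv(), ["left", "right"])
print(max(["left", "right"], key=lambda a: Q[(0, a)]))   # 'right'
```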
With this formalization, one configuration corresponds to a complete description
of the knowledge of the agent at time t. As opposed to the modeling of propositional
concept learning, Reinforcement Learning cannot easily be modeled as an inference
from a set of known facts. As a matter of fact, in Reinforcement Learning the agent
both explores the world and learns from it at the same time.
Example 9.6. For the sake of illustration, let us consider the simple Taxi example
(taken from [137], and described in Fig. 9.20), where a taxi has to navigate a 5-by-5
grid world, picking up a passenger and delivering him/her to a destination. There are
six primitive actions in this domain: (a) four navigation actions that move the taxi one
square North, South, East, or West, (b) a Pickup action (only possible if
the taxi is at the passenger’s location), and (c) a Putdown action (only possible if
a passenger is in the taxi at his/her destination). The agent receives a reward of −1
for each action, and a final +20 for successfully delivering the passenger to his/her
destination. There is a reward of −10 if the taxi attempts to execute the Putdown
or Pickup actions illegally. If a navigation action would cause the taxi to hit a wall,
the action is a no-op, and there is only the usual reward of −1. The six primitive
actions are considered deterministic for the sake of simplicity. The query is to find a
policy that maximizes the total reward per episode.
First of all, we have objects of two types, namely state and action.
Then, we have to define the structure of the states and their attributes. A state is
a triple ((i, j), ℓ1, ℓ2) containing the location of the taxi, (i, j), the initial location
of the passenger, ℓ1, and the location ℓ2 of the final destination. We can define the
following attributes for a state: TaxiLocation, whose values are the 25 cells (i, j) of
the grid; PsgLocation, with 5 possible values (the four special locations of the grid
plus inTaxi); and PsgDestination, with 4 possible values (the four special locations).
Considering the attributes, there are in total N = 500 states, because there are 25
values for TaxiLocation, 5 values for PsgLocation, and 4 values for PsgDestination.
Actions may have applicability constraints associated with them, which we can model
as the values of an attribute Constr. Hence:
Γ_A,action = {(Constr, Λ_Constr)},
where Λ_Constr is the set of given constraints. The theory contains the values of the
parameters γ and α.
The six primitive actions are considered deterministic for the sake of simplicity,
and thus the transition probability function P takes values in {0, 1}. The set of
functions is then:
Γ_F = {V^π: S → R, π: S → A, P: S × A × S → {0, 1}, R: S × A → R}
Given the description frame built up as above, we consider now a specific RL task
for the Taxi problem, namely a P-Set P. As the world does not change from one
problem instance to another, a problem instance is specified by the initial location
ℓ1 of the passenger and his/her destination, ℓ2. Let, in our case, ℓ1 = Y and ℓ2 = B.
Then, P = ⟨O, A, F, R⟩ contains:
O_state = {o}
O_action = {North, South, East, West, Pickup, Putdown}
Notice that P contains a single, not completely specified state o, which corresponds
to 125 observable states {s_i | 1 ≤ i ≤ 125}, because the position of the taxi is not
observed. However, the algorithm Algo, given in the theory, may use non-observed
states, because the position of the passenger may change to inTaxi or to B.
For A we simply have:
A = {(o, state, (UN, UN), Y, B)}
Finally, R = ∅, and the covers of all functions are given by the designer. The observed information is
stored in a database D, where the table OBJ contains both the identifiers of the states
and those of the actions, the tables state-ATTR and action-ATTR contain the
attributes of the states and of the actions, respectively, and there is one table for each
function cover. The theory contains the parameters α = 0.1 and γ = 0.9.
There are several existing algorithms in RL that provide good solutions to large MDPs.
One of their limitations is that in most cases they consider S as a single “flat” search
space [137]. These methods have been successfully applied in several domains, such
as game playing, elevator control, and job-shop scheduling. Nevertheless, in order to
scale to more complex tasks, which have large state spaces and a complex structure,
abstraction mechanisms are required.
In Sect. 9.3 we have briefly introduced the four dimensions along which abstrac-
tion has been explored in the field of Reinforcement Learning: State aggregation,
15 This formulation is equivalent to saying that the probability is equal to δ_{a,North}, because we are in
the deterministic case.
Γ_TYPE^(g) = {state, action, real},
Γ_O,state^(g) = S, Γ_O,action^(g) = A, Γ_O^(g) = S ∪ A ∪ R,
Γ_A^(g) = Γ_A,state^(g) = {(A_m, Λ_m) | 1 ≤ m ≤ M}
Γ_F^(g) = {V^π: S → R, π: S → A, P: S × A × S → [0, 1], R: S × A → R},
Γ_R^(g) = ∅.
By applying the operator to several ground states, specified in the set Γ_O,state,aggr,
a new one is created. A more abstract description frame is obtained, where:
Γ_TYPE^(a) = {state, action, real, newstate},
Γ_O,state^(a) = S − Γ_O,state,aggr, Γ_O,action^(a) = A, Γ_O,newstate^(a) = {c, c1, · · ·}
Γ_O^(a) = Γ_O,state^(a) ∪ A ∪ R ∪ Γ_O,newstate^(a)
Γ_A,state^(a) = {(A_m^(a), Λ_m^(a)) | 1 ≤ m ≤ M}
16 As mentioned before, factored MDPs exploit problem structure to represent exponentially large
state spaces very compactly [76].
Fig. 9.36 a An abstracted state space (after [25]) with six states B1 to B6. If the taxi is in block B2, it
can go left to B1, right to B6 and down to B4. In B3, the taxi can only go up to B1. b A reformulation
of the abstract space that makes it similar to the ground formulation
All the details of the actual aggregation are defined in meth(P_g, ω_aggr). This
method determines what attributes are to be kept for the new aggregated states and
with what values, and describes how the abstract functions are to be computed.
For instance, suppose that a new state c is formed by aggregation of k original
states {s1, . . . , sk}. Then, the attribute TaxiLocation could be defined as the average
of the i's and j's of the component states. For the attribute PsgLocation, state c
could be labelled, for instance, R, if R is the passenger location in one of the states
{s1, . . . , sk}. The same can be done for PsgDestination. Actually, this abstraction
could also be realized by equating subsets of values in the domains of the attributes
characterizing the states.
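A toy sketch of such a method is given below; the dictionary-based data layout and the tie-breaking choices are illustrative assumptions, not the book's specification of meth(P_g, ω_aggr).

```python
def aggregate_states(states, attr_of):
    """Form one abstract state c from ground states s1..sk: TaxiLocation
    becomes the average of the component coordinates, and PsgLocation is
    labelled R if some component state has the passenger at R."""
    xs = [attr_of[s]["TaxiLocation"] for s in states]
    avg = (sum(i for i, _ in xs) / len(xs), sum(j for _, j in xs) / len(xs))
    locs = {attr_of[s]["PsgLocation"] for s in states}
    psg = "R" if "R" in locs else sorted(locs)[0]   # arbitrary tie-break
    return {"TaxiLocation": avg, "PsgLocation": psg}

attr_of = {"s1": {"TaxiLocation": (0, 0), "PsgLocation": "Y"},
           "s2": {"TaxiLocation": (1, 0), "PsgLocation": "R"}}
print(aggregate_states(["s1", "s2"], attr_of))
```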
Two instances of state abstraction are presented in Fig. 9.36 for the Taxi problem
of Example 9.6. A taxi can navigate between two points with or without a client in
a similar manner; then, all the states corresponding to the taxi with or without the
passenger can be considered equivalent. The same is true regarding the passenger
destination with respect to the “navigate” and “get a passenger” subtasks. In Fig. 9.36
the number k of abstract taxi locations is 6 instead of the 25 initial ones.
In the abstract representation, a passenger located in Y at (0,0) in the ground
space (see Fig. 9.20) is now in B3, and his/her destination (3,0) in the ground space is
now in state B4. A solution in the abstract space is for the taxi to go through state B3,
pick up the passenger, go through B1 and B2, and finally drop the passenger in B4.
9.5 Summary
Fig. 9.37 Three approaches to combining abstraction and learning. The idea has its roots in the Feature
Selection task, which uses the attribute-hiding operator, and can be extended to any set of abstraction
operators
The term “complex systems” does not have a precise, general definition, but it is agreed
that it applies to systems that have at least the following properties:
• They are composed of a large number of elements, interacting non-linearly with
each other.
• Their behavior cannot be determined from the behaviors of the individual components,
but emerges from their interactions as an ensemble.
Fig. 10.1 A bifurcation diagram for the logistic map: xn+1 = rxn (1 − xn ). The horizontal axis is
the r parameter, the vertical axis is the x variable. The map was iterated 1000 times. A series of
bifurcation points leads eventually to chaos
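For the reader who wants to reproduce a diagram like Fig. 10.1, a minimal sketch of the underlying computation follows: the logistic map is iterated until transients die out, and the surviving values are recorded for each r (the iteration counts are illustrative choices).

```python
def logistic_attractor(r, n_iter=1000, n_keep=100, x0=0.5):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n) and return
    the values visited after transients die out; sweeping r and plotting
    the results yields a bifurcation diagram like Fig. 10.1."""
    x = x0
    for _ in range(n_iter - n_keep):     # discard the transient
        x = r * x * (1 - x)
    out = []
    for _ in range(n_keep):              # record the attractor
        x = r * x * (1 - x)
        out.append(x)
    return out

# At r = 3.2 the map has settled on a 2-cycle.
print(sorted(set(round(v, 6) for v in logistic_attractor(3.2))))
```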
1 The degree of a vertex, in an undirected graph, is the number of edges connected to the vertex. In
a directed graph it is the sum of the numbers of ingoing and outgoing edges.
Fig. 10.4 a A graph with 7 vertices, with no edges. b A graph with 7 vertices, fully connected, i.e.,
with 21 edges. Which one is more complex?
I_vd = Σ_{i=1}^{V} d_i log₂ d_i   (10.1)
OC(G) = Σ_{k=1}^{E} OC_k(G)   (10.2)
Instead of counting edges, the complexity of a graph can be related to paths. Rücker
and Rücker [445] have proposed a measure called “total walk count” (TWC). Denoting
by w_i(ℓ) a generic path on G of length ℓ, the TWC is defined by:
TWC(G) = Σ_{ℓ=1}^{V−1} Σ_i w_i(ℓ)   (10.3)
The number of walks of length ℓ is obtained from the ℓth power of the adjacency
matrix A. In fact, the entry a_rs^(ℓ) in A^(ℓ) is equal to 1 iff there is a path of length ℓ in
the graph.
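Equation (10.3) can be computed directly from powers of the adjacency matrix; the following is a minimal numpy-based sketch for small graphs (the path-graph example is illustrative).

```python
import numpy as np

def total_walk_count(A):
    """Total walk count (Eq. 10.3): sum, over all walk lengths 1..V-1, of
    the number of walks of each length. Entry (r, s) of A^l counts the
    walks of length l from vertex r to vertex s."""
    A = np.asarray(A)
    V = A.shape[0]
    twc, P = 0, np.eye(V, dtype=np.int64)
    for _ in range(V - 1):
        P = P @ A          # next power of the adjacency matrix
        twc += P.sum()     # add the number of walks of this length
    return int(twc)

# A path graph on 3 vertices: 1-2-3.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(total_walk_count(A))   # 10
```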
By observing that highly complex networks are characterized by a high degree of
vertex-vertex connectedness and a small vertex-vertex separation, it seems logical to
use both quantities in defining the network complexity. Given a node xi in a network
G, let d_i be its degree and λ_i its distance degree [66], which is computed by
λ_i = Σ_{j=1}^{V} d(i, j),
where d(i, j) is the distance between node i and node j. The complexity index B is
then given by:
B = Σ_{i=1}^{V} d_i/λ_i   (10.4)
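A small sketch of the computation of B follows, using the networkx library for the shortest-path distances; the star-graph example is an illustrative choice.

```python
import networkx as nx

def complexity_index_B(G):
    """Complexity index B (Eq. 10.4): sum over vertices of the degree d_i
    divided by the distance degree lambda_i (sum of shortest-path
    distances from i to all other vertices). G must be connected."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    return sum(G.degree(i) / sum(dist[i].values()) for i in G.nodes)

G = nx.star_graph(4)            # one hub connected to 4 leaves
print(complexity_index_B(G))    # hub contributes 4/4, each leaf 1/7
```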
Analyzing the definition of the above measures of graph complexity, one may notice
that all of them capture only very marginally the “intricacies” of the topological
structure, which is actually the aspect that makes a graph “complex”. As an example,
one would like to have a measure that distinguishes a star-shaped network from a
small-world network, or from a scale-free one with large hubs. An effort in this
direction has been made by Kahle et al. [278], who tried to link network complexity
to patterns of interactions among nodes.
The analysis of complex networks is difficult because of their very large size (number
of vertices) and the intricacy of their interconnections (number of edges and topological
structure). In order to discover organizations, structures, or behaviors in these networks
we need tools that allow their simplification, without losing the essence.
Abstraction comes here into play. As a matter of fact, there are few approaches that
explicitly mention abstraction as a help in investigating large networks, but there are many
that use it without saying so, for instance under the name of multi-scale or hierarchical
analysis.
A case of network abstraction is presented by Poudret et al. [427], who used
a topology-based approach to investigate the Golgi apparatus. A network of com-
partments describes the static and dynamic characteristics of the system, with spe-
cial focus on the interaction between adjacent compartments. The Golgi apparatus is an
organelle whose role includes the transport of proteins synthesized by the cell from
the endoplasmic reticulum to the plasma membrane. Its structure is not completely
known, and the authors have built up two abstract topological models (the “plate
stack model” and the “tower model”) of the apparatus (represented in Fig. 10.5) in
order to discriminate between two existing alternative hypotheses: one supposes that
vesicles play a major role in the excretion of proteins, whereas the other one suggests
the presence of a continuous membrane flow.
The abstract models ignore the geometry of the apparatus components, and focus
on their interactions to better capture their dynamics. The building of these abstract
models allowed the authors to show that only one of them (namely, the “tower model”)
is consistent with the experimental results.
An interesting approach, developed for the analysis of biological networks (but
actually more general), has been recently presented by Cheng and Hu [97]. They
consider a complex network as a system of interacting objects, from which an itera-
tive process extracts meaningful information at multiple granularities.
Fig. 10.5 Abstract models of the Golgi apparatus. a Plate stack model. b Tower model. Only the
“tower model” proved to be consistent with the experimental results. (Reprinted with permission
from Poudret et al. [427])
To make this possible, the authors developed a network analysis tool, called “Pyramabs”, which
transforms the original network into a pyramid, formed by a series of n superposed
layers. At the lowest level (level n) there is the original network. Then, modules
(subgraphs) are identified in the net and abstracted into single nodes, which are reported
at the immediately higher level. As this process is repeated, a pyramid is built up: at
each horizontal layer a network of (more and more abstract) interconnected modules
is located, whereas the relationship between layers i + 1 and i is constituted by the
link between a module at layer i + 1 and the corresponding abstract node at layer i.
An example of such a pyramid is reported in Fig. 10.6.
Fig. 10.6 Structure of the abstraction pyramid built up from a complex network. Each circle
represents a module. Vertical relationships and horizontal relationships are denoted by dashed lines
and solid lines, respectively. The thickness of a solid line increases with the importance of the
connection. The original network is at the bottom (Level 4). Higher-level networks are abstractions,
to a certain degree, of the next lower network. (Reprinted with permission from Cheng and Hu [97])
In order to generate the pyramid, two tasks must be executed by two modules,
namely the discovery and the organization modules. A top-down/bottom-up clustering
algorithm identifies modules in a top-down fashion, and constructs the hierarchy
bottom-up, producing an abstraction of the network with different granularities at
different levels in the hierarchy. Basically, the method consists of three phases: (1)
computing the proximity between nodes; (2) extracting the backbone (a spanning
tree) from the network, and partitioning the network based on that backbone; (3)
generating an abstract network. By iteratively applying the same procedures to each
newly generated abstract network, the pyramid is built up.
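The sketch below gives a rough, illustrative rendition of this pyramid-building loop. It is not the actual Pyramabs tool: a generic modularity heuristic from the networkx library stands in for the proximity/backbone phases described above.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_pyramid(G, levels=3):
    """Repeatedly detect modules and abstract each module into a single
    node of the next layer up, keeping only inter-module links."""
    pyramid = [G]
    for _ in range(levels - 1):
        G = pyramid[-1]
        if G.number_of_nodes() <= 2:
            break
        modules = list(greedy_modularity_communities(G))
        node_of = {v: i for i, mod in enumerate(modules) for v in mod}
        H = nx.Graph()
        H.add_nodes_from(range(len(modules)))
        for u, v in G.edges:                 # inter-module edges survive
            if node_of[u] != node_of[v]:
                H.add_edge(node_of[u], node_of[v])
        pyramid.append(H)
    return pyramid

sizes = [g.number_of_nodes() for g in build_pyramid(nx.karate_club_graph())]
print(sizes)    # e.g. [34, 3, 1]: smaller networks at higher layers
```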
Other multi-scale or hierarchical approaches are proposed by Lambiotte [310],
Arenas, Fernández and Gómez [23], Bang [32], Binder and Plazas [61], Dorat et al.
[139], Kurant and Thiran [308], Oliveira and Seok [407], Ravasz and Barabasi [443],
and Sales-Pardo et al. [474].
2 Extensive presentations of the main indices characterizing networks are provided by Newman
[399], Boccaletti et al. [65], Costa et al. [116], Emmert-Streib and Dehmer [148], and Dehmer and
Sivakumar [130].
Fig. 10.7 Scheme of the abstraction mechanism to compute network measures. Starting from a
ground interaction graph G, some measures must be obtained on it, such as, for instance, the centrality
of every node or their betweenness. Let X = f(G) be one such measure, obtained by means of a
procedure f. In order to compute X at a reduced cost, an abstract network G′ is generated, and the
corresponding abstract measure X′ is computed. Finally, X′ is re-transformed into the value of X
again, by applying a function h and obtaining X_a = h(X′). (Reprinted with permission from
Saitta et al. [466])
Many methods for detecting communities inside complex networks have been proposed, for
instance by Arenas et al. [23], Expert et al. [157], Girvan and Newman [212], Fortunato and
Castellano [178], Lancichinetti et al. [312], Leicht and Newman [325], Lozano et al. [346],
Newman [400], Zhang et al. [585], and Lancichinetti and Fortunato [311].
An interesting approach to reduce the size of a weighted (directed or undirected)
complex network, while preserving its modularity structure, is presented and/or
reviewed by Arenas et al. [22]. A comparative overview, in terms of sensitivity
and computational cost, of methods for detecting communities in networks is pre-
sented by Danon et al. [121], whereas Lancichinetti et al. [312] propose a benchmark
network for testing algorithms [313].
The notion of simplicity has played an important role in Philosophy since its early
times. As mentioned by Baker [29] and Bousquet [74], simplicity was already invoked
by Aristotle [24], in his Posterior Analytics, as a merit for demonstrations with as
few postulates as possible. Aristotle's propensity for simplicity stemmed from his
ontological belief that nature is essentially simple and parsimonious; hence, reasoning
should imitate these characteristics.
However, the most famous statement in favor of simplicity in sciences is Ockham's
razor, a principle attributed to the philosopher William of Ockham.3 In the light of
today's studies, Ockham's razor appears to be a myth, in the sense that its reported
formulation “Entia non sunt multiplicanda, praeter necessitatem”4 does not appear
in any of his works nor in other medieval philosophical treatises; on the contrary, it
was coined in 1639 by John Ponce of Cork. Nevertheless, Ockham did actually share
the preference for simplicity with Scotus and other fellow medieval philosophers;
indeed, he said that “Pluralitas non est ponenda sine necessitate”,5 even though he
did not invent the sentence himself.
Independently of the paternity of the principle, it is a fact that the putative Ock-
ham’s razor does play an important role in modern sciences. On the other hand, the
intuition that a simpler law has a greater probability of being true has never been
proved. As Pearl [415] puts it: “One must resign to the idea that no logical argument
can possibly connect simplicity with credibility”. A step ahead was taken by Karl
Popper [426], who connected the notion of simplicity with that of falsifiability.
For him there is no a priori reason to choose a simpler scientific law; on the contrary,
the “best” law is the one that imposes most constraints. For such a law, in fact, it
will be easier to find a counterexample, if the law is actually false. According to this
3 William of Ockham (c. 1288–c. 1348) was an English Franciscan and scholastic philosopher, who
is considered to be one of the greatest medieval thinkers.
4 “Entities must not be multiplied without need”.
5 “Plurality is not to be supposed without necessity”.
H* = ArgMax_{H_i∈H} Pr(H_i|D) = ArgMax_{H_i∈H} [Pr(D|H_i) Pr(H_i) / Pr(D)]   (10.5)
The rationale behind Bayes' formula is that we may have an a priori idea of the relative
probabilities of the various hypotheses being true, and this idea is represented by the
prior probability distribution Pr(Hi ). However, the examination of the experimental
data D modifies the prior belief, in order to take into account the observations, and
produces a posterior probability distribution Pr(Hi |D).
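A minimal sketch of hypothesis selection via Eq. (10.5) follows; since Pr(D) does not depend on the hypothesis, the maximization can be carried out over Pr(D|H_i) Pr(H_i) alone. The toy coin example is an illustrative assumption.

```python
def map_hypothesis(prior, likelihood, data):
    """Pick H* maximizing Pr(H|D) (Eq. 10.5). Pr(D) is constant across
    hypotheses, so comparing Pr(D|H) * Pr(H) suffices."""
    return max(prior, key=lambda h: likelihood(h, data) * prior[h])

# Toy example: two coin-bias hypotheses, data = 3 heads out of 3 tosses.
prior = {"fair": 0.7, "biased": 0.3}
lik = {"fair": 0.5 ** 3, "biased": 0.9 ** 3}
print(map_hypothesis(prior, lambda h, d: lik[h], None))  # 'biased'
```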
Going a bit further along this line of reasoning, Solomonoff [498, 499] proposed
a formal definition of simplicity in the framework of inductive inference. In
his view, all the knowledge available in a domain at time t can be written in the
form of a binary string. When new experiences take place, their results contribute to
lengthening the string. Then, at any given time, we may see the initial part of a
potentially infinite string, representing the “perfect” knowledge. The induction problem
is to predict the next bit by extrapolating the known part of the string using Bayes'
formula. Solomonoff's contribution consists in showing that there exists an optimal prior
probability distribution on the infinite strings, called the universal distribution, which
determines the best possible extrapolation. By considering the strings as generated
by a program running on a universal computer, the simplicity of the string is defined
as the length of the shortest program that outputs it. This definition was independently
proposed by Kolmogorov [295] and Chaitin [90], and we will formally introduce it
later on.
According to Feldman [162],“simple patterns are compelling”. As a matter of fact,
there is no doubt that humans are attracted by simple or regular patterns. Intuitively,
we tend to ascribe to randomness the generation of complicated patterns, whereas
regular ones are believed to have been generated on purpose, and hence to bear some
meaning. As, in general, objects and events in the world are neither totally random
nor totally regular, it would be interesting to have some criterion to test whether a set
of observations can be considered random or not. Feldman provides such a criterion
by defining a test for checking the null hypothesis: H0 ≡ “observations are random”.
More precisely, let x = {x1 , ..., xn } be a set of objects or observations, each one
described by m attributes A1, ..., Am. Each observation is defined by a conjunction
of m literals, i.e., statements of the form “A_i = v” or “A_i ≠ v”. The goal of the
test is to allow a set of regularities to be extracted from the observations, where by
“regularity” is meant a lawful relation satisfied by all n objects, with the syntax:
where φ_k is a regularity of degree k, and A_0 is some label. Let S(x) be the minimum
set of (non-redundant) regularities obtainable from x. The function |φ_k|, giving
the number of regularities of each degree k contained in S(x), is called the power
spectrum of x. A useful measure of the overall complexity of x is its total weighted
spectral power:
C(x) = Σ_{k=0}^{m−1} (k + 1)|φ_k|   (10.6)
The test for x’s simplicity consists in computing the probability distribution of C and
setting a critical region on it.
Along the same lines, Dessalles6 discusses Simplicity Theory, a “cognitive model
that states that humans are highly sensitive to any discrepancy in complexity, i.e., their
interest is aroused by any situation which appears too simple”. Simplicity Theory
links simplicity to unexpectedness, and its main claim is that “an event is unexpected
if it is simpler to describe than to generate”. Formally, U = Cw − C, where U is
the unexpectedness, Cw is the generation complexity, namely the size of the minimal
description of parameter values the “world” needs to generate the situation, and C
is the description complexity, namely the size of the minimal description that makes
the situation unique, i.e., Kolmogorov complexity.
The same idea of surprise underlies the work by Itti and Baldi [272]. These
authors claim that human attention is attracted by features that are “surprising”, and
that surprise is a general, information-theoretic concept. Then, they propose a quan-
tification of the surprise using a Bayesian definition. The background information
of an observer about a certain phenomenon is given by his/her prior probability
distribution over the current set M of explaining models. Acquiring data D allows
the observer to change the prior distribution P(M) (M ∈ M) into the posterior
distribution P(M|D) via Bayes' theorem. Then, surprise can be measured by the
distance between the prior and posterior distributions, via the Kullback–Leibler (KL)
divergence:
S(D, M) = KL(P(M)‖P(M|D)) = Σ_{M∈M} P(M|D) log₂ [P(M|D)/P(M)]   (10.7)
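Equation (10.7) is straightforward to compute for a discrete model set; a minimal sketch follows, with a toy two-model example.

```python
import math

def surprise(prior, posterior):
    """Bayesian surprise (Eq. 10.7): KL divergence between the posterior
    P(M|D) and the prior P(M) over a discrete set of models."""
    return sum(q * math.log2(q / prior[m])
               for m, q in posterior.items() if q > 0)

prior = {"m1": 0.5, "m2": 0.5}
posterior = {"m1": 0.9, "m2": 0.1}            # data strongly favored m1
print(round(surprise(prior, posterior), 4))   # ~0.531 bits
```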
In order to test their theory, the authors have performed some eye-tracking experi-
ments on subjects looking at videos. They have compared the measured tracks with
those that would have been predicted if people’s attention were guided by surprise,
entropy or salience. The results show that surprise appears to be a more effective
measure of visually interesting points.
The attempt to link surprise, unexpectedness, and information theory is not new.
It was already proposed by Watanabe [552] in the late 60’s. He linked the notion of
surprise with the information provided by an event in Shannon’s theory: if an event
e with probability p(e) close to 0 does actually occur, we would be very surprised;
6 http://perso.telecom-paristech.fr/~jld/Unexpectedness/
on the contrary, we would not be surprised at all by the occurrence of an event with
probability close to 1. The surprise S(e) is then equated to Shannon's information:
S(e) = −log₂ p(e).
The complexity of an entity (an object, a problem, an event, ...) is a difficult notion to
pin down, because it depends both on the nature of the entity and on the perspective
under which the entity is considered. Often, complexity is linked with information,
itself not an easier notion to define.
The approaches to quantify complexity include symbolic dynamics, information
and ergodic theory, thermodynamics, generalized dimensions and entropies, theory
of computation, logical depth and sophistication, forecasting measures, topological
exponents, and hierarchical scaling. All the above perspectives help us to understand
this rich, important, and yet elusive concept.
In January 2011 the Santa Fe Institute for complex sciences organized a
workshop on “Randomness, Structure, and Causality: Measures of complexity from
theory to applications”, whose goal was to answer (among others) the following
questions:
1. Are there fundamental measures of complexity that can be applied across
disciplines, or are measures of complexity necessarily tied to particular domains?
2. Are there universal mechanisms at work that lead to increases in complexity, or
does complexity arise for qualitatively different reasons in different settings?
3. Can an agreement be reached about the general properties that a measure of
complexity must have?
By exploring the talks at the workshop, one may notice that, even without addressing the
above questions explicitly, most of them provide an implicit negative answer to all
three. This goes to show that the definition of complexity is really a tough task,
especially concerning applicability to concrete systems.
As a further confirmation of the fluidity of the field, we may quote a recent
comment by Shalizi7 :
“Every few months seem to produce another paper proposing yet another measure
of complexity, generally a quantity which can’t be computed for anything you’d
actually care to know about, if at all. These quantities are almost never related to
any other variable, so they form no part of any theory telling us when or how things
get complex, and are usually just quantification for quantification’s own sweet sake.”
In the literature there are many definitions of complexity measures, which can be
roughly grouped into four main strands:
• Predictive information and excess entropy,
7 http://cscs.umich.edu/~crshalizi/notabene/complexity-measures.html
One of the first proposed measures of complexity was Algorithmic Complexity,
which was independently defined by Solomonoff [498, 499], Kolmogorov [295],
and Chaitin [90], but is almost universally named after Kolmogorov (see also Li and
Vitànyi's [331] book for an extensive treatment). Kolmogorov complexity K makes
use of a universal computer (a Turing machine) U, which has a language L in which
programs p are written. Programs output sequences of symbols in the vocabulary
(usually binary) of L. Let x^n be one such sequence. The Kolmogorov complexity8
of x^n is defined as follows:
K_U(x^n) = Min{|p| : U(p) = x^n}   (10.8)
Equation (10.8) states that the complexity of x^n is the length |p|, in bits, of the shortest
program that, when run on U, outputs x^n and then halts. Generalizing (10.8), we can
consider a generic object x, described by strings generated by programs p on U.
There are small variants of formulation (10.8), but in all of them the complexity
captured by the definition is a descriptional complexity, quantifying the effort needed
to identify object x from its description. If an object is highly regular, it is
easy to describe it precisely, whereas a long description is required when the object
is random, as exemplified in Fig. 10.8. In its essence, Kolmogorov complexity K(x)
captures the randomness in x.
As all programs on a universal computer V can be translated into programs on
the computer U by q, one of U's programs, the complexity K_V(x) will not exceed
K_U(x) plus the length of q. In other words:
K_V(x) ≤ K_U(x) + |q|
Even though |q| may be large, it does not depend on x, and then we can say that
Kolmogorov complexity is (almost) machine-independent. We may then omit the
subscript indicating the universal machine, and write simply K(x). Whatever the
machine, K(x) captures all the regularities in x's description.
Kolmogorov complexity provides Solomonoff’s universal distribution over
objects x belonging to a set X :
8 Notice that in the literature, this measure is often called Algorithmic Information Content (AIC).
Fig. 10.8 a Very regular pattern, consisting of 48 tiles. b Irregular pattern, although not really
random, consisting of 34 tiles. Even though the left pattern has more tiles, its Kolmogorov complexity
is much lower than that of the right pattern
Pr(x) = (1/C) 2^{−K(x)},   (10.10)
where:
C = Σ_{x∈X} 2^{−K(x)}   (10.11)
The universal distribution (10.10) has the remarkable property of being able to mimic
any computable probability distribution Q(x), namely:
Pr(x) ≥ A Q(x),
Fig. 10.9 Kolmogorov complexity captures the “regular” part of data, whereas the irregular part
(the “noise”) must be described separately. The regular part, i.e., the box, requires K(x) bits for its
description, whereas the sparse objects require log2 |X | bits, where x ∈ X
One can then use a two-part encoding [546]: one part describing the regularities (the model) in the
data, and the other describing the random part (noise), as schematically represented
in Fig. 10.9. The irregular part describes the objects that are not represented by
the regularities simply by enumeration inside the set X. Then, an object x will be
described in (K(x) + log₂ |X|) bits.
The two-part encoding of data is the starting point of Vitànyi's definition of
complexity [546], called “Meaningful Information”, which is only the part encoding the
regularities; in fact, Vitànyi claims that this is the only useful part, separated from
accidental information. In a more recent work Cilibrasi and Vitànyi [106] have
introduced a relative notion of complexity, i.e., the “Normalized Compression Distance”
(NCD), which evaluates the complexity distance NCD(x, y) between two objects x
and y. The measure NCD is again derived from Kolmogorov complexity K(x), and
can be approximated by the following formula:
NCD(x, y) = [K(x, y) − Min{K(x), K(y)}] / Max{K(x), K(y)}   (10.12)
In (10.12) K(x) (resp. K(y)) is the compression length of string x (resp. y), while
K(x, y) is the compression length of the concatenation of strings x and y. These
lengths are obtained from compressors like gzip.
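The approximation of NCD with a real compressor can be sketched in a few lines; the following uses gzip compressed lengths in place of the (uncomputable) Kolmogorov terms.

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (Eq. 10.12), with gzip compressed
    lengths standing in for the Kolmogorov terms."""
    kx, ky = len(gzip.compress(x)), len(gzip.compress(y))
    kxy = len(gzip.compress(x + y))
    return (kxy - min(kx, ky)) / max(kx, ky)

x = b"abcabcabcabcabc" * 30
y = b"qrsqrtqruqrvqrw" * 30
print(round(ncd(x, x), 3), round(ncd(x, y), 3))  # near 0 vs. clearly larger
```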
The approach to complexity by Gell-Mann and Lloyd [195] follows Vitànyi’s
idea that complexity should only be related to the part of the description that encodes
regularities of an object. For this reason they define the effective complexity (EC) of
an entity as the length of a highly compressed description of its regularities. More-
over, they claim that the notion of complexity is context-dependent and subjective,
and that it depends on the description granularity and language, as well as on a
clear distinction between regularity and noise and between important and irrelevant
aspects. The authors justify their proposal by stating that EC is the definition that
most closely corresponds to what we mean by complexity in ordinary conversation
and in most scientific discourses.
Following Gell-Mann and Lloyd’s approach, Ay et al. [26] have proposed a defi-
nition of effective complexity in terms of algorithmic information theory. Then, they
have applied this notion to the study of discrete-time stochastic stationary (and, in
general, not computable) processes with binary state space, and they show that, under
not too strong conditions, long typical process realizations are effectively simple.
The NCD measure is the starting point for another relative measure of complex-
ity, the “statistical complexity” CS , proposed very recently by Emmert-Streib [147],
which provides a statistical quantification of the statement “x is similarly complex
as y”, where x and y are strings of symbols from a given alphabet A. Acknowledging
that “a commonly acknowledged, rigorous mathematical definition of the complexity
of an object is not available”, Emmert-Streib tries to summarize the conditions under
which a complexity measure is considered a good one:
1. The complexity of simple and random objects is less than the complexity of
complex objects.
2. The complexity of an object does not change if its size changes.
3. A complexity measure should quantify the uncertainty of the complexity value.
Whereas the first two had been formulated previously, the third one has been
added on purpose by Emmert-Streib. The novelty of this author’s approach consists
in the fact that he does not attribute complexity to a single object x, but rather to the
whole class of objects generated by the same underlying mechanism. The measure
CS is defined through the following procedure:
1. Let X be a process that generates values x, x , x , ... (denoted x ∼ X), and let
F̂X,X be the estimate of the empirical distribution of the normalized compression
distances between the x s from n1 samples, SX,X n1
={xi =NCD(x , x ) | x , x ∼
n1
X}i=1 .
2. Let Y be a process that generates values y, y , y , .... Let F̂X,Y be the estimate of
the empirical distribution of the normalized compression distances between the
x s and y s from n2 samples, SX,Y
n2
= {yi = NCD(x , y ) | x ∼ X, y ∼ Y }ni=1 2
,
from object x and y of size m from two different processes X and Y .
3. Compute T = Supx |F̂X,X − F̂X,Y | and p = Pr(T < t).
n1 n2
4. Define CS SX,X , SX,Y | X, Y , m, n1 , n2 .
The statistical complexity corresponds to the p-value of the underlying null hypothesis
H₀: F_{X,X} = F_{X,Y}.
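The four-step procedure can be sketched as follows, approximating step 3 with scipy's two-sample Kolmogorov–Smirnov test; the gzip-based NCD and the sample generators are illustrative assumptions.

```python
import gzip, random
from scipy.stats import ks_2samp

def ncd(x, y):
    kx, ky = len(gzip.compress(x)), len(gzip.compress(y))
    return (len(gzip.compress(x + y)) - min(kx, ky)) / max(kx, ky)

def statistic_complexity(gen_x, gen_y, n1=50, n2=50):
    """Steps 1-4 above: empirical NCD distributions within X and between
    X and Y, compared with a two-sample KS test; the p-value plays the
    role of C_S."""
    s_xx = [ncd(gen_x(), gen_x()) for _ in range(n1)]   # step 1
    s_xy = [ncd(gen_x(), gen_y()) for _ in range(n2)]   # step 2
    return ks_2samp(s_xx, s_xy).pvalue                  # steps 3-4

gen_x = lambda: bytes(random.choices(b"ab", k=256))        # process X
gen_y = lambda: bytes(random.choices(b"abcdefgh", k=256))  # process Y
print(statistic_complexity(gen_x, gen_y))   # small p: different complexity
```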
An interesting use of Kolmogorov complexity has been made by Schmidhuber
[477], who invoked simplicity as a means to capture the “essence” of depicted
objects (cartoon-like figures). For him the final design of a picture should have a low
Kolmogorov complexity, while still “looking right”. Schmidhuber generates figures
using only circles: any figure is composed of arcs of circles, as exemplified
in Fig. 8.2, and also in Fig. 10.10.
In order to make Kolmogorov complexity tractable, he selects a specific language
in which to write the programs encoding figures. As the circles are drawn in sequence
with decreasing radii, each circle has an integer associated with it, denoting
the order of drawing. Large circles are few and are coded by small numbers,
whereas small circles are many and are coded by larger numbers. For each arc in
the figure we need to specify the number c of the circle it belongs to, the start point
s and the end point e, and the line thickness w. Arcs are drawn clockwise from
s to e; point s (resp. e) can be specified by indicating the number of the circle
intersecting or touching c in s (resp. e) (plus an extra bit to discriminate between
two possible intersections). Thus, each pixel on an arc can be specified by a triple of
circle numbers, two bits for differentiating intersections, and a few more bits for the
line width.
Using very many very small circles, anything can be drawn. However, the
challenge is to come up with an acceptable drawing with only a few large circles, because
this would mean having captured the “essence” of the depicted object. Such
representations are difficult to obtain; Schmidhuber reports that he found it much easier to
obtain acceptable complex drawings than acceptable simple ones of given objects.
Making a step further, Schmidhuber uses Kolmogorov complexity also to
define “beauty”, assuming that a “beautiful” object is one that requires the minimum
effort to be processed by our internal knowledge representation mechanism, namely
by the shortest encoding program.
Lopez-Ruiz, Mancini, and Calbet [340] defined a complexity measure as the product
of a normalized entropy H* and a disequilibrium D:
C_LMC = H* · D,
where:
H* = −(1/log₂ N) Σ_{i=1}^{N} p_i log₂ p_i = H/log₂ N   and   D = Σ_{i=1}^{N} (p_i − 1/N)²
Notice that:
0 ≤ H* ≤ 1   and   0 ≤ D ≤ (N − 1)/N ≈ 1
This definition of complexity does satisfy the intuitive conditions mentioned above.
For a crystal, disequilibrium is large but the information stored is vanishingly small,
so C_LMC ≅ 0. On the other hand, H* is large for an ideal gas, but D is small,
so C_LMC ≅ 0 as well. Any other system will have an intermediate behavior and
therefore C > 0. A final remark is that C_LMC depends on the scale of the system
analysis; changing the scale, the value of C_LMC changes.
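The computation of C_LMC is elementary; the following sketch reproduces the crystal and ideal-gas limiting cases mentioned above.

```python
import math

def c_lmc(p):
    """Lopez-Ruiz-Mancini-Calbet complexity: C_LMC = H* x D, with H* the
    normalized Shannon entropy and D the disequilibrium of p."""
    n = len(p)
    h = -sum(pi * math.log2(pi) for pi in p if pi > 0) / math.log2(n)
    d = sum((pi - 1 / n) ** 2 for pi in p)
    return h * d

print(c_lmc([1.0, 0.0, 0.0, 0.0]))      # "crystal": H* = 0, so C = 0
print(c_lmc([0.25] * 4))                # "ideal gas": D = 0, so C = 0
print(c_lmc([0.5, 0.3, 0.1, 0.1]))      # intermediate case: C > 0
```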
Definition 10.1 The logical depth of a string x is D(x) = T(π*),
where π* is the shortest program generating x running on U, and T(π) is the time
taken by program π.
As it is possible that the machine U needs some string of data in order to compute
x, Definition (10.1) can be generalized to the following one:
Definition 10.2 Let x and w be any two strings, U a universal Turing machine, and s
a significance parameter. A string's depth relative to w at significance level s, denoted
D_s(x|w), is defined by
D_s(x|w) = Min{T(π) | π is an s-incompressible program relative to w and U(π, w) = x}.
According to Definition (10.2), x's depth relative to w is the minimum time required
to compute x from w by an s-incompressible program relative to w.
Lloyd and Pagels [338] take a view of complexity in physical systems similar to
Bennett's [49]. Given a system in a particular state, its complexity is not related
to the difficulty of describing the state, but rather to that of generating it. Then,
complexity is not a property of a state but of a process; it is a “measure of how hard
it is to put something together” [338].
More formally, let us consider a system in a macroscopic state d, and let σ1, ..., σn
be the set of trajectories, in the system's phase space, that lead to d. Let p_i be the
probability that the system has followed the ith trajectory. Then, the “depth” D of
the state d is
D = −k ln p_i ,
and the thermodynamic depth is
D_T = S̄ − S₀ = S̄ − k_B ln Ω₀ ,
Shiner et al. [487] start by observing that several of the complexity measures
proposed in the literature, even though interesting, are difficult to compute in practice
and, in addition, depend on the observation scale. On the contrary, they claim
that complexity should be easy to compute and be independent of the size of the
system under analysis.
Based on these considerations, Shiner et al. proposed a parameterized measure,
namely a simple measure of complexity Γ_αβ, which, by varying its two parameters α
and β, shows different types of behavior: complexity increasing with order, complexity
increasing with disorder, and complexity reaching a maximum in between order
and disorder. As required, the measure Γ_αβ is easy to compute and is independent
of the system's size.
In order to provide a precise definition of Γ_αβ, order and disorder are to be
introduced first. If n is the number of the system's states, and p_i (1 ≤ i ≤ n) the
probability that the system is in state i, then disorder is defined as:
Δ = S/S_max = −(k_B/S_max) Σ_{i=1}^{n} p_i ln p_i
Using the definitions of order and disorder, the simple complexity of disorder strength
α and order strength β is defined as:
Γ_αβ = Δ^α Ω^β   (10.13)
where, for non-equilibrium systems, the order is
Ω = (S_eq − S)/S_eq
In other words, Ω is a measure of the distance from equilibrium. Thus, for non-
equilibrium systems, the simple measure of complexity is a function of both the
“disorder” of the system and its distance from equilibrium.
Finally, the authors also show that, for the logistic map, Γ11 behaves like Grass-
berger’s effective complexity (see Sect. 10.3.7). Moreover, Γ11 is also related to
Lopez-Ruiz et al.’s normalized complexity [340].
10.3.6 Sophistication
Definition 10.3 H(σ) = Min{|p| + |D| : program p, run on input data D, outputs σ}.
This definition states that the complexity of σ is the sum of the size of the program
that generates it plus the size of the input data.
Definition 10.4 (c-Minimal Description) A description (p, D) of σ is c-minimal
if |p| + |D| ≤ H(σ) + c.
A description of σ is c-minimal if it does not exceed the minimal one by more than a
constant c.
Definition 10.5 (Sophistication) The c-sophistication of a finite string σ is
SOPH_c(σ) = Min{|p| : ∃ D such that (p, D) is a c-minimal description of σ}.
Koppel shows that there is a strict relation between sophistication and logical depth,
as stated by the following theorem.
Theorem 10.1 (Koppel [296]) SOPH(σ) is defined for all σ. Moreover, there exists
a c such that, for all σ, either SOPH(σ) = D(σ) = ∞ or [SOPH(σ) − D(σ)] < c.
Grassberger did not say how to find the maximally predictive models, nor how the
information required can be minimized. However, in Information Theory the data-processing
inequality says that, for any variables A and B, I[A, B] ≥ I[f(A), B]; in
other words, we cannot get more information out of data by processing it than was in
there to begin with. Since the state of the predictor is a function of the past, it follows
that I[X⁻, X⁺] ≥ I[f(X⁻), X⁺]. It could be assumed that, for optimal predictors,
the two informations are equal, i.e., the predictor's state is just as informative as the
original data. Moreover, for any variables A and B, it is the case that H[A] ≥ I[A, B],
namely no variable contains more information about another than it does about itself.
Then, for optimal models it is H[f(X⁻)] ≥ I[X⁻, X⁺]. The latter quantity is what
Grassberger calls Effective Measure Complexity (EMC), and it can be estimated
purely from data. This quantity, which is the mutual information between the past
and the future, has been rediscovered many times, in many contexts, and called
with various names (e.g., excess entropy, predictive information, and so on). Since
it quantifies the degree of statistical dependence between the past and the future, it
looks reasonable as a measure of complexity.
In a recent work Abdallah and Plumbley [3] have proposed yet another measure of
complexity, the “predictive information rate” (PIR), which is supposed to capture
some information that was not taken into account by previous measures, namely
temporal dependency. More precisely, let {..., X_{−1}, X₀, X₁, ...} be a bi-infinite stationary
sequence of random variables, taking values in a discrete set X. Let μ be a shift-invariant
probability measure, such that the probability distribution of any contiguous
block of N variables (X_{t+1}, ..., X_{t+N}) is independent of t. Then, the shift-invariant
block entropy function will be:
H(N) = H(X₁, ..., X_N) = −Σ_{x∈X^N} p_μ^N(x) log₂ p_μ^N(x),   (10.15)
where p_μ^N: X^N → [0, 1] is the unique probability mass function for any N consecutive
variables.
If we now consider the two contiguous sequences (X_{−N}, ..., X_{−1}) (the “past”) and
(X₀, ..., X_{M−1}) (the “future”), their mutual information can be expressed by
I(N, M) = H(N) + H(M) − H(N + M)
If both N and M tend to infinity, we obtain the excess entropy [119] or the effective
measure complexity E [221]:
E = lim_{N,M→∞} I(N, M)
On the other hand, for any given N, letting M go to infinity, Bialek et al.'s predictive
information I_pred [57] is obtained from:
I_pred(N) = lim_{M→∞} I(N, M)
Considering a time t, let X_t^← = (..., X_{t−2}, X_{t−1}) denote the variables before time t,
and X_t^→ = (X_{t+1}, X_{t+2}, ...) denote the variables after time t. The PIR I_t is defined as:
I_t = I(X_t; X_t^→ | X_t^←) = H(X_t | X_t^←) − H(X_t | X_t^←, X_t^→)   (10.20)
Equation (10.20) can be read as the average reduction in uncertainty about the future
on learning X_t, given the past. H(X_t | X_t^←) is the entropy rate h_μ, but H(X_t | X_t^←, X_t^→) is a
quantity not considered by other authors before. It is the conditional entropy of one
variable, given all the others in the sequence, future and past.
The PIR satisfies the condition of being low for both totally ordered and totally
disordered systems, as a complexity measure is required to be. Moreover, it captures
a different and non-trivial aspect of temporal dependency structure not previously
examined.
10.3.9 Self-Dissimilarity
A different approach to complexity is taken by Wolpert and Macready [568], who base their definition of complexity on experimental data. They start from the observation that complex systems, observed at different spatio-temporal scales, show unexpected patterns that cannot be predicted from one scale to another. Then, self-dissimilarity is a symptom (sufficient but not necessary) of complexity, and hence the parameter to be quantified. From this perspective, a fractal, which looks very complex to our eye, is actually a simple object, because it has a high degree of self-similarity.
Formally, let $\Omega_s$ be a set of spaces, indexed by the scales s. Given two scales $s_1$ and $s_2$, with $s_2 > s_1$, a set of mappings $\{\rho^{(i)}_{s_1 \leftarrow s_2}\}$, indexed by i, is defined: each mapping takes elements of $\Omega_{s_2}$ to elements of the smaller scale space $\Omega_{s_1}$. In Wolpert and Macready's approach scales do not refer to different levels of precision in a system measurement, but rather to the width of a masking window through which the system is observed. The index i denotes the location of the window.
Given a probability distribution $\pi_{s_2}$ over $\Omega_{s_2}$, a probability distribution $\pi^{(i)}_{s_1 \leftarrow s_2} = \rho^{(i)}_{s_1 \leftarrow s_2}(\pi_{s_2})$ over $\Omega_{s_1}$ is inferred for each mapping $\rho^{(i)}_{s_1 \leftarrow s_2}$. It is often convenient to summarize the measures from the different windows with their average, denoted by $\pi_{s_1 \leftarrow s_2} = \rho_{s_1 \leftarrow s_2}(\pi_{s_2})$. The idea behind the self-dissimilarity measure is to compare the probability structure at different scales. To this aim, the probabilities at scales $s_1$ and $s_2$ are both translated to a common scale $s_c$, such that $s_c \geq \max\{s_1, s_2\}$, and then compared. Comparison is made through a scalar-valued function $\Delta_s(Q_s, Q'_s)$ that measures a distance between probability distributions $Q_s$ and $Q'_s$ over a space $\Omega_s$. The function $\Delta_s(Q_s, Q'_s)$ can be defined according to the problem at hand; for instance it might be
the two sets $\Psi_g$ and $\Psi_a$, we obtain: $K(\psi_a) = \log_2 T_a < \log_2 T_g = K(\psi_g)$. In fact, we have $T_a < T_g$ by definition. Then, the Kolmogorov complexity of Ψ decreases when abstraction increases, as we would expect.
Clearly the above code is not effective, because Na and Ng cannot in general
be computed exactly. Moreover, this code does not say anything about the specific
ψg and ψa which are described, because Ig can be smaller, or larger, or equal to
Ia , depending on the order in which configurations are numbered inside Ψg and
Ψa . We consider then another code, which is more meaningful in our case. Let
$\Gamma_g = \langle \Gamma^{(g)}_{TYPE}, \Gamma^{(g)}_O, \Gamma^{(g)}_A, \Gamma^{(g)}_F, \Gamma^{(g)}_R \rangle$ be a description frame, and let $\psi_g$ be a configuration in the associated space $\Psi_g$. We can code the elements of $\Gamma_g$ by specifying the following set of parameters:
• Number of types, V. Specifying V requires $\log_2(V + 1)$ bits. List of types $\{t_1, \dots, t_V\}$. Specifying a type requires $\log_2(V + 1)$ bits for each object.
• Number of objects, N. Specifying N requires $\log_2(N + 1)$ bits. Objects are considered in a fixed order.
• Number of attributes, M. Specifying M requires $\log_2(M + 1)$ bits. For each attribute $A_m$ with domain $\Lambda_m \cup \{UN, NA\}$, the specification of a value for $A_m$ requires $\log_2(|\Lambda_m| + 2) = \log_2(\ell_m + 2)$ bits ($1 \leq m \leq M$).
• Number of functions, H. Specifying H requires $\log_2(H + 1)$ bits. Each function $f_h$ of arity $t_h$ has domain of cardinality $N^{t_h}$ and co-domain of cardinality $c_h$. Each tuple belonging to any $RCOV(f_h)$ needs $t_h \log_2(N + 1) + \log_2(c_h + 1)$ bits to be specified.
• Number of relations, K. Specifying K requires $\log_2(K + 1)$ bits. Each relation $R_k$ has domain of cardinality $N^{t_k}$. Each tuple belonging to any $RCOV(R_k)$ needs $t_k \log_2(N + 1)$ bits to be specified.
As a conclusion, given a configuration $\psi_g$, its specification requires:

$\cdots + \sum_{h=1}^{H} |FCOV(f_h)| \left(t_h \log_2 N + \log_2(c_h + 1)\right) + \sum_{k=1}^{K} |FCOV(R_k)| \, t_k \log_2 N$
As $N - 1 < N$, $|FCOV(f_h^{(a)})| \leq |FCOV(f_h)|$, and $|FCOV(R_k^{(a)})| \leq |FCOV(R_k)|$, we can conclude that, for each $\psi_g$ and $\psi_a$, $K(\psi_a) < K(\psi_g)$.
Similar conclusions can be drawn for Vitányi's meaningful information, Gell-
Mann and Lloyd’s effective complexity, and Koppel’s sophistication. In fact, the
first part of a two-part code for ψg is actually a program p on a Turing machine,
which describes regularities shared by many elements of Ψg , including ψg . But
ψg may contain features that are not covered by the regularities. Then, K(ψg ) =
|p| + Kirr (ψg ). Without entering into details, we observe that the same technique
used for Kolmogorov complexity can be adapted to this case, by applying it to both
the regular and the irregular part of the description.
Example 10.1 Let $\Gamma_g = \langle \Gamma^{(g)}_{TYPE}, \Gamma^{(g)}_O, \Gamma^{(g)}_A, \Gamma^{(g)}_F, \Gamma^{(g)}_R \rangle$ be a ground description frame, with:

$\Gamma^{(g)}_{TYPE}$ = {obj}
$\Gamma^{(g)}_O$ = {a, b, c}
$\Gamma^{(g)}_A$ = {(Color, {red, blue, green}), (Size, {small, medium, large}), (Shape, {square, circle})}
$\Gamma^{(g)}_F$ = ∅
$\Gamma^{(g)}_R$ = {$R_{On} \subseteq \Gamma^{(g)}_O \times \Gamma^{(g)}_O$}

In $\Gamma_g$ we have V = 1, N = 3, M = 3, $\ell_1 = 3$, $\ell_2 = 3$, $\ell_3 = 2$, H = 0, K = 1. Let $\psi_g$ be the following configuration:

$\psi_g$ = {(a, obj, red, small, circle), (b, obj, green, small, square), (c, obj, red, large, square)} ∪ {(a, b), (b, c)}
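To make the coding scheme concrete, the following minimal Python sketch (our own illustration, not part of the original example; all helper names are ours) computes the number of bits needed to specify $\psi_g$ according to the scheme above:

```python
import math

def bits(n):
    """Bits needed to select one element out of n alternatives."""
    return math.log2(n)

# Parameters of the description frame of Example 10.1
V, N = 1, 3                    # number of types and of objects
domain_sizes = [3, 3, 2]       # l_m = |Lambda_m| for Color, Size, Shape
t_R = 2                        # arity of the single relation R_On

def configuration_cost(n_relation_tuples):
    cost = N * bits(V + 1)                         # one type per object
    for l_m in domain_sizes:                       # one value per attribute and object,
        cost += N * bits(l_m + 2)                  # domains extended with {UN, NA}
    cost += n_relation_tuples * t_R * bits(N + 1)  # tuples in RCOV(R_On)
    return cost

# psi_g contains the relation tuples (a,b) and (b,c)
print(round(configuration_cost(2), 2))  # ~ 30.93 bits
```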
The analysis of Bennett's logical depth is more difficult. For simplicity, let us assume that, given a configuration ψ, its logical depth is the run time taken by the program p whose length is exactly its Kolmogorov complexity. Without specifying
the nature of the configurations, it is not possible to say whether pa will run faster
than pg . In fact, even though the abstract configurations are “simpler” (according to
Kolmogorov) than the ground ones, they may be more difficult to generate. Then,
nothing precise can be said, except that an abstract frame can be, according to Bennett,
either simpler or more complex than the ground one. This result comes from the fact
that the logical depth is a generative and not a descriptive measure of complexity.
In Eq. (10.22) the ψg ’s are disjoint, so that their probabilities can be summed up.
Notice that the sets COMPg (ψa ) are also disjoint for different ψa ’s, because each ψg
has a unique image in Ψa , for each abstraction process.
Let us consider now Lopez-Ruiz et al.'s normalized complexity $C_{LMC}$ [340], which is the product of the normalized entropy and the diversity of Ψ. In probabilistic contexts the comparison of complexities has to be done on configuration frames, because it makes no sense to consider single configurations. Then, the normalized complexity (10.13) of the ground space is:
where Na = |Ψa |. In order for the abstract configuration space to be simpler, it must
be:
$H^*(\Psi_a) \cdot D(\Psi_a) < H^*(\Psi_g) \cdot D(\Psi_g) \;\rightarrow\; \frac{H^*(\Psi_a)}{H^*(\Psi_g)} < \frac{D(\Psi_g)}{D(\Psi_a)}$   (10.23)
• $\Psi_g^{(0)}$, which contains those configurations in which no object has Size = medium; there are $m_0$ of them, and $|COMP_g(\psi_a)| = 1$. Each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{1}{N_g}$

• $\Psi_g^{(1)}$, which contains those configurations in which a single object has Size = medium. Their number is $m_1 = 62208$. In this case $|COMP_g(\psi_a)| = 3$. Then, each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{3}{N_g}$

• $\Psi_g^{(2)}$, which contains those configurations in which two objects have Size = medium ($m_2$ of them, with $|COMP_g(\psi_a)| = 9$). Each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{9}{N_g}$

• $\Psi_g^{(3)}$, which contains those configurations in which all three objects have Size = medium ($m_3$ of them, with $|COMP_g(\psi_a)| = 27$). Each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{27}{N_g}$

$H^*(\psi_a) = \frac{1}{\log_2 N_a}\left[\frac{m_0}{N_g}\log_2 N_g + \frac{m_1}{N_g}\log_2\frac{N_g}{3} + \frac{m_2}{N_g}\log_2\frac{N_g}{9} + \frac{m_3}{N_g}\log_2\frac{N_g}{27}\right] = 0.959$   (10.25)

$D(\psi_a) = m_0\left(\frac{1}{N_g} - \frac{1}{N_a}\right)^2 + \frac{m_1}{3}\left(\frac{3}{N_g} - \frac{1}{N_a}\right)^2 + \frac{m_2}{9}\left(\frac{9}{N_g} - \frac{1}{N_a}\right)^2 + \frac{m_3}{27}\left(\frac{27}{N_g} - \frac{1}{N_a}\right)^2 = 0.00001807$   (10.26)
Then:

$C_{LMC}(\Psi_g) = H^*(\Psi_g) \cdot D(\Psi_g) = 0, \qquad C_{LMC}(\Psi_a) = H^*(\Psi_a) \cdot D(\Psi_a) \approx 1.73 \cdot 10^{-5}$

As we may see, the abstract configuration space is slightly more complex than the ground one. The reason is that $\Psi_a$ is not very much diversified in terms of probability distribution.
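The computation in Eqs. (10.25) and (10.26) can be reproduced in a few lines of Python. In the sketch below (our own), the class sizes $m_0$, $m_2$ and $m_3$ do not appear in the surviving text and are inferred values, chosen so that the code reproduces the published figures 0.959 and 0.00001807:

```python
import math

Ng, Na = 139968, 73859     # sizes of the ground and abstract spaces
# (number of ground configurations, |COMP_g(psi_a)|) per class;
# m0, m2 and m3 below are inferred, only m1 = 62208 is given in the text
classes = [(41472, 1), (62208, 3), (31104, 9), (5184, 27)]

# Eq. (10.25): normalized entropy of the abstract space
H_star = sum(m / Ng * math.log2(Ng / c) for m, c in classes) / math.log2(Na)
# Eq. (10.26): diversity, summed over the m/c abstract configurations per class
D = sum(m / c * (c / Ng - 1 / Na) ** 2 for m, c in classes)

print(round(H_star, 3), D, H_star * D)   # 0.959, ~1.807e-05, ~1.73e-05
```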
The opposite result would have been obtained if a non-uniform probability distribution $\pi_g$ over $\Psi_g$ had generated a more uniform one over $\Psi_a$.
Let us consider now Shiner et al.’s simple complexity [487], reported in Sect. 10.3.5.
This measure is linked to the notion of order/disorder of the system under consider-
ation. Given a configuration space Ψ , its maximum entropy is reached when all con-
figurations have the same probabilities, i.e., Hmax (Ψ ) = log2 N. The actual entropy
of Ψ is given by
$H(\Psi) = -\sum_{\psi \in \Psi} \pi(\psi) \log_2 \pi(\psi)$

Then:

$\Delta(\Psi) = \frac{H(\Psi)}{H_{max}(\Psi)} = H^*(\Psi), \qquad \Omega(\Psi) = 1 - \Delta(\Psi)$
In Fig. 10.11 an example of the function $\Gamma_{\alpha\beta}(x)$ is reported. Let us consider a value $x_g = \frac{H(\Psi_g)}{\log_2 N_g}$, and let $x'_g$ be the other value of x corresponding to the same ordinate. Then, $\Gamma_{\alpha\beta}(x_a)$, where $x_a = \frac{H(\Psi_a)}{\log_2 N_a}$, will be lower than $\Gamma_{\alpha\beta}(x_g)$ if $x_a > x'_g$ or $x_a < x_g$. In other words it must be:
Example 10.3 Let us consider again the description frames of Example 10.2. In
this case Ng = 139968 and Na = 73859; then Hmax (Ψg ) = log2 Ng = 17.095
Fig. 10.11 Graph of the curve $\Gamma_{\alpha\beta}(x) = x^\alpha (1 - x)^\beta$ for the values α = 0.5 and β = 1.3. When α > 1 the curve has a horizontal tangent at x = 0. The x values (where $x = H/H_{max}$) are included in the interval [0, 1]. If $x_g$ is the value of x in the ground space, then the value $x_a$ in the abstract one should either be larger than $x'_g$ or be smaller than $x_g$
and Hmax (Ψa ) = log2 Na = 16.173. The normalized entropies of Ψg and Ψa were,
respectively, xg = H ∗ (Ψg ) = 1 and xa = H ∗ (Ψa ) = 0.959. The value of the function
Γαβ (x) for xg = 1 is Γαβ (1) = 0; then, in this special case, the abstract space Ψa
can only be more complex than the ground one. The reason is that Γαβ (x) only takes
into account the probability distribution over the states, and there is no distribution which has a greater entropy than the uniform one.
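As a quick check of this argument, one can evaluate $\Gamma_{\alpha\beta}$ at the two normalized entropies (a minimal sketch, using the α and β of Fig. 10.11):

```python
def gamma(x, alpha=0.5, beta=1.3):
    """Shiner-style order/disorder complexity Gamma(x) = x^alpha * (1-x)^beta."""
    return x**alpha * (1 - x)**beta

xg, xa = 1.0, 0.959        # normalized entropies of the ground and abstract spaces
print(gamma(xg))           # 0.0: the ground space sits at a zero of Gamma
print(gamma(xa))           # > 0: here the abstract space can only be "more complex"
```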
10.5 Summary
In conclusion, at the present state of the art, only the measures of simplicity based on Kolmogorov complexity are guaranteed to be co-variant with abstraction, as defined in our model. This is due to the fact that the notion of simplicity we are interested in is descriptive in nature, not probabilistic. This result can be extended to
other models of abstraction, because, as we have shown, they can be reconnected
to the KRA model. On the other hand, Kolmogorov complexity is hard to handle,
even in approximate versions, and appears far from the concreteness required by the
applications. However, this complexity measure could be a starting point for a more
operational definition of simplicity.
Chapter 11
Case Studies and Applications
discrepancy originates from a fault in the system. The task is then to uncover which
components are faulty, in order to account for the detected discrepancies.
Notwithstanding the increasing number of applications to real-world problems,
the routine use of MBD is still limited by the high computational complexity, residing
in the large number of hypothesized diagnoses. In order to alleviate this complexity
problem, abstraction has been often advocated as one of the most powerful remedies.
Typically, abstraction is used to construct a hierarchical representation of the
system, in terms of both structure and behavior. The pioneering work by Mozetič
[379] has established a connection between some of the computational theories of
abstraction proposed in other areas of Artificial Intelligence (for instance, [214]) and
a notion of abstraction that could be usefully exploited in the MBD field. Since then
novel proposals have been made to apply theories of abstraction in the context of the
MBD task (see, for instance, Friedrich [184], Console and Theseider Dupré [113],
Provan [432], Chittaro and Ranon [99], Torasso and Torta [531], Grastien and Torta
[222], and Saitta et al. [467]).
The approach described by Mozetič, and by some more recent authors [99, 432], mostly uses abstraction in order to focus the diagnostic process, and thus to improve its efficiency; in particular, the diagnosis of the system starts by considering the abstract level(s) and, whenever an (abstract) diagnosis is found, the detailed model is invoked in order to justify such diagnosis in terms of one or more detailed ones. Abstraction also makes it possible to return fewer and more concise abstract diagnoses when it is not possible to discriminate among detailed diagnoses. The works by Console and Theseider Dupré [113], Friedrich [184], and Grastien and Torta [222] accomplish this goal by including abstraction axioms in the Domain Theory and preferring diagnoses which are "as abstract as possible".
Recently, some authors have aimed at the automatic abstraction of the system
model (see, for instance, Sachenbacher and Struss [463], Torta and Torasso [532],
and Torasso and Torta [531]). If the available observables and/or their granularity
are too coarse to distinguish among two or more behavioral modes of a component,
or the distinction is not important for the considered system, a system model is auto-
matically generated where such behavioral modes are merged into an abstract one.
By using the abstract model for diagnosis there is no loss of (important) information, while the diagnostic process is more efficient and the returned diagnoses are fewer and more significant.
In MBD, abstraction takes the form of aggregating elementary system compo-
nents into an abstract component, with the effect of also merging combinations of
behaviors of the original components into a single abstract behavior of the abstract
component. Component aggregation can be modeled, inside the KRA framework,
through an aggregation operator, which automatically performs all the required trans-
formations, once the components to be aggregated are specified. These transforma-
tions change the original ground description of the system to be diagnosed into a
more abstract one involving macrocomponents, obtained by suitably aggregating
elementary components. In abstraction for MBD a key role is played by the notion
of indiscriminability, which refers to alternative observation patterns leading to the
same diagnosis. Indiscriminability corresponds, in the abstraction literature, to the
$DT \cup X \cup O \cup S \nvDash \bot$

$DT_\Sigma \cup X_\Sigma \cup S_\Sigma \cup P_\Sigma \vDash \bot \;\Leftrightarrow\; DT_\Sigma \cup X_\Sigma \cup S'_\Sigma \cup P_\Sigma \vDash \bot$
where PΣ is any set of ground atoms expressing measurements on the external ports
PΣ,ext of Σ.
According to Definition 11.5, two states $S_\Sigma$ and $S'_\Sigma$ of Σ are indiscriminable iff, given any set of values measured on the external ports of the subsystem, $S_\Sigma$ and $S'_\Sigma$ are either both consistent or both inconsistent with such measurements (under the constraints imposed by $DT_\Sigma$ and $X_\Sigma$).
Fig. 11.1 Fragment of a hydraulic system used as a running example. Two pipes P1 and P2 are connected to a valve V1 through a set of ports ($u_1, \dots, v'_2$). The valve can receive an external command s1
Fig. 11.2 Example of behavioral models of components. Fault PC denotes a partially clogged
pipe, BR denotes a broken pipe, whereas CL denotes a clogged pipe. Fault SO denotes a stuck open
valve, while SC denotes a stuck closed one
In the following we will use a running example to clarify the notions we introduce. In particular, we consider the fragment of a hydraulic system reported in Fig. 11.1. It includes two types of components, i.e., pipe (objects P1 and P2) and valve (object V1). Each component has two input and two output ports, where Δflow and Δpressure can be measured. Ports connect components with each other (as, for instance, $u_2$ and $u'_1$), or connect components to the environment (as, for instance, $u_1$ and $v'_2$).
In Fig. 11.2 the behavioral models for a generic valve and a generic pipe are
reported. The models are expressed in terms of qualitative deviations [510]; for
example, when the pipe is in behavioral mode pc (partially clogged), the qualitative
deviation Δfout of the flow at the end of the pipe is the same as the qualitative deviation
Δfin of the flow at the beginning of the pipe. For the pressure, instead, we have that
Δrin = Δrout ⊕ + where ⊕ is the addition in the sign algebra [510]. In other words,
the pressure is qualitatively increased at one end of the pipe with respect to the other
end.
Let us assume that pipe has one nominal behavior (ok) and three faulty modes
pc (partially clogged), cl (clogged) and br (broken), while the valve can be in the
ok mode (and, in this case, it behaves in a different way depending on the external
command s1 set to open or closed), in the so (stuck open) mode, or in the sc
(stuck closed) mode.
In order to formulate MBD within the KRA model, we must define the different entities needed for modeling the domain at the various abstraction levels. At the most detailed (ground) level we have:
$\Gamma^{(g)}_{TYPE}$ = {pipe, valve, port, control}
$\Gamma^{(g)}_{O,pipe}$ = {$P_1, P_2, \dots$},  $\Gamma^{(g)}_{O,valve}$ = {$V_1, V_2, \dots$}
$\Gamma^{(g)}_{O,port}$ = {$u_1, u_2, \dots$},  $\Gamma^{(g)}_{O,control}$ = {$s_1, s_2, \dots$}
$\Gamma^{(g)}_{A,pipe}$ = {(Status, {ok, pc, cl, br})}
$\Gamma^{(g)}_{A,valve}$ = {(Status, {ok, so, sc})}
$\Gamma^{(g)}_{A,port}$ = {(Observable, {yes, no}), (Direction, {in, out}), (Δr, {+, 0, −, NA}), (Δf, {+, 0, −, NA})}
$\Gamma^{(g)}_{A,control}$ = {(Status, {open, closed})}
The theory Tg contains two types of knowledge: an algebra over the qualitative
measurement values {+, 0, −}, and a set of rules specifying the component behaviors.
As an example of the first type of knowledge, the semantics of the qualitative sum
⊕ is given:
+ ⊕ + = +,  + ⊕ 0 = +,  + ⊕ − = {+, 0, −}
0 ⊕ 0 = 0,  0 ⊕ − = −,  − ⊕ − = −
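This qualitative sum admits a direct encoding; the following sketch (our own, not part of the book's implementation) returns a set of signs, so that the ambiguous case + ⊕ − mirrors the table above:

```python
def qsum(a, b):
    """Qualitative sum over {+, 0, -}; an ambiguous result is a set of signs."""
    if a == b:
        return {a}
    if '0' in (a, b):              # 0 is the neutral element
        return {a if b == '0' else b}
    return {'+', '0', '-'}         # opposite signs: the result is ambiguous

assert qsum('+', '+') == {'+'}
assert qsum('0', '-') == {'-'}
assert qsum('+', '-') == {'+', '0', '-'}
```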
The second type of knowledge contains, for instance, the table in Fig. 11.2. Using the algorithms BUILD-DATA and BUILD-LANG, reported in Sect. 6.3, a database structure $DB_g$ can be constructed, as well as a language $L_g$ to intensionally express $DB_g$ and the theory. The database schema $DB_g$ contains the OBJ table, specifying all objects and their type, the attribute tables PIPE-ATTR, VALVE-ATTR, PORT-ATTR, and CONTROL-ATTR, and a table for each of the relations $R_{isPortof}$, $R_{Connect}$, and $R_{Controls}$. As the construction of the tables is straightforward, we omit it.
The language $L_g = \langle C_g, X, O, P_g, F_g \rangle$ contains the set of constants corresponding to the identifiers of all objects of all types, and to the elements of the domains of the attributes. There are no functions, so that $F_g = \emptyset$.
For the sake of simplicity, we define in the theory a set of additional predicates, which
correspond to complex formulas, such as, for instance:
Let us consider now the specific hydraulic fragment $P_g$ of Fig. 11.1. In this fragment there are two pipes, P1 and P2, one valve, V1, a control device s1 acting on the valve, and several ports connecting pipes to the valve, and components of the fragment with the external world. The complete description of this system is contained in a database $D_g$, corresponding to the schema $DB_g$.
Using the KRA model, we apply operator ωaggr ({pipe, valve}, pv) to gener-
ate a new type of component, i.e., pv, starting from a pipe and a valve. Using
meth(Pg , ωaggr ) we first aggregate pipe P1 and valve V1 (see Fig. 11.3, left), obtain-
ing thus an abstract component PV1 , as described in the right part of Fig. 11.3. In PV1
the ports connecting P1 and V1 are hidden, whereas the others remain accessible; hid-
ing these ports aims at modeling a reduction of observability, and has the potential of
increasing the indiscriminability of the diagnoses of the subsystem involving P1 , V1 .
Fig. 11.3 New abstract component PV1 of type pv resulting from aggregating a pipe and a valve.
Ports u2 and u 1 are connected with each other and disappear, as well as ports v2 and v 1
370 11 Case Studies and Applications
The algorithm meth(Pg , ωaggr ) is applied to the input (P1 , V1 ) and generates
in output PV1 . It also computes the new attribute values of PV1 , according to the
rules inserted in its body by the user. The algorithm also modifies the cover of the
relations, and computes the abstract behaviors as well, starting from the information
reported in Fig. 11.2. Without entering into all the details of meth($P_g$, $\omega_{aggr}$), we can mention, among others, the following operations (a code sketch follows the list):
• P1 and V1 are removed from view, together with their attributes, and PV1 is
added, with attribute Status. For each value of Status(s1 ) and for each pair
(Status(P1 ), Status(V1 )) the value of Status(PV1 ) is computed according to the
rules for composing behaviors.
• Tuples containing any of the hidden ports are hidden in the covers of all relations.
The visible ports are connected to PV1 through relation RisPortof , and the control
device s1 is also associated to PV1 .
• In order to define the behavioral modes of a pv component, the instantiations of the
behavioral modes of the pipe and the valve are partitioned according to the indis-
criminability relation, and a new behavioral mode for pv is introduced for each
class of the partition (an algorithm for efficiently computing indiscriminability
classes is reported by Torta and Torasso [533]).
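The following Python fragment sketches how such an aggregation method could operate. It is a simplification under our own assumptions (components are plain dictionaries, connections are port pairs, and the indiscriminability test is supplied by the caller), not the actual KRA implementation:

```python
from itertools import product

def aggregate(c1, c2, connections, indiscriminable):
    """Sketch of meth(P_g, omega_aggr): aggregate two components into one."""
    # 1. Hide the ports that connect c1 and c2 with each other
    hidden = {(p, q) for (p, q) in connections
              if p in c1['ports'] and q in c2['ports']}
    hidden_ports = {p for pq in hidden for p in pq}
    visible = (set(c1['ports']) | set(c2['ports'])) - hidden_ports
    # 2. Partition the joint behavioral modes by indiscriminability
    classes = []
    for mode in product(c1['modes'], c2['modes']):
        for cl in classes:
            if indiscriminable(mode, cl[0]):
                cl.append(mode)
                break
        else:
            classes.append([mode])
    # 3. One abstract behavioral mode per equivalence class
    return {'ports': visible,
            'modes': [f'am{i}' for i, _ in enumerate(classes, 1)],
            'classes': classes}
```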
Once the behavioral modes of the abstract component have been identified, corre-
sponding predicates are introduced at the language level. In particular, assuming that
Status(s1 ) = open, we may name the new modes as Oam1 , . . . Oam4 . The modes
correspond to sets of instances of the subsystem components. As an example, we
provide the definition of mode Oam3 :
The above described abstraction process can be automated in the KRA model; starting from a pipe and a valve serially connected, in principle the abstract component could have 12 different behavioral modes. Actually, a large number of behavioral assignments to the components P1 and V1 collapse into the same abstract mode of the abstract component PV1; this is a strong indication that the abstraction is not only possible but really useful.
The synthesis of formulas describing the behaviors of an abstract component pv
can be performed automatically by taking into account the formulas describing the
behaviors of pipes and valves, and the method meth(Pg , ωaggr ). In particular, given
that behaviors are described with qualitative equations like the ones of Fig. 11.2, it
is possible to synthesize qualitative equations for the abstract level; for example,
formulas for the behaviors of a pv component (with exogenous command open) are
reported in Fig. 11.4. We recall that meth(Pg , ωaggr ) stores in Δ(P ) the modifications
done during abstraction.
normally given by the designer, there may be parameters in it, which lend themselves
to automated learning. The application of abstraction in the domain of Cartography,
which we present here, shows not only that performing abstraction in this domain
brings important improvements, but also that the methods associated to the operators
can be partially learned from examples, contributing thus substantially to achieve
those improvements.
In order to understand the role of abstraction in the Cartography domain, some
introduction is necessary. Automatically creating maps at different scales is an impor-
tant and still unsolved problem in Cartography [180]. The problem has been tackled
through a variety of approaches, including Artificial Intelligence (AI) techniques
[349, 553].
The creation of a map is a multi-step process, involving several intermediate
representations of the underlying geographic data. The first step is the acquisition
of a picture (cf. Fig. 11.5a), taken, usually, from an airplane or a satellite. From
this picture, an expert (a Photogrammetrist or a Stereoplotter) extracts a geographic
database (cf. Fig. 11.5b). This database contains the coordinates of all points and lines
that were identified by the stereoplotter on the picture. Moreover, s/he associates a
category (road, building, field, etc.) to the identified objects. Then, a third step consists
in defining, starting from the geographic database, the objects to be symbolized in
the map, e.g., which roads must be represented, their position, and their level of
detail (e.g., the sinuosity of the road). The obtained representation is a map (cf.
Fig. 11.5c). It is important to notice that the third step has to be repeated for each
desired scale, increasing the cost and time of producing maps at different levels of
detail and reducing flexibility. In fact, maintaining multiple databases is resource-
intensive, time consuming and cumbersome. Furthermore, the map may be completed
with various kinds of information corresponding to the type of thematic map desired
(geology, rainfall, population, tourism, history, and so on).
Fig. 11.6 a Part of a map at 1/25000 scale. b A 16-fold reduction of the map. c Cartographic
generalization of the map at the 1/100 000 scale. By comparing b and c the differences between
simply reducing and generalizing are clearly apparent. [A color version of this figure is reported in
Fig. H15 of Appendix H]
1 To avoid any confusion, we should mention here that the term generalization used in the field of
Cartography does not correspond to the same term used in Artificial Intelligence, but refers to the
process of generating a map while simplifying data.
Fig. 11.7 Correspondence between the KRA model and production steps of a map. The perception
of the world (the observations) corresponds to an aerial image of a zone of the surface. The stereo
plotting process generates a set of point coordinates, memorized in a geographical database. An
iconic language allows the numerical data containing in the databases to be transformed into the
symbols that appear in the map (rivers, roads, cities, buildings, and so on). A geographic theory,
which can be expressed in the defined language, allows the map to be targeted to a specific use and
interpreted
Fig. 11.8 Simplification and displacement of buildings during reduction of the map scale. In the upper-left corner there is the building as it appears in the initial representation (or, better, as it would have appeared had it been represented at the level of detail with which it is memorized in the GDB), and in the final representation (simple reduction). In the lower-left corner the object appears as it is after simplification (both in the original and in the final representation). In the right part of the figure a displacement of two buildings is described
Let us look at the two operations from the KRA point of view, with the aim of finding an abstraction operator implementing them. In the case of simplification, the effect is obtained by degrading the shape information in the GDB, whereas, in the displacement, by changing the location information. In both cases the understandability of the map, on the part of the reader, is improved with respect to what would be perceivable without transformation, even though part of the original information is lost (hidden).
Considering first the displacement operation, we can see that nothing changes in the buildings' shape representation from one scale to the other, except their relative positions. If a building's location is represented with the coordinates $(x_c, y_c)$ of its center of mass, in the ground map the distance between two buildings $b_1$ and $b_2$ is given by $d = \sqrt{(x_{c,1} - x_{c,2})^2 + (y_{c,1} - y_{c,2})^2}$. We take the occasion to observe
that, if the change of scale is realized by a proportional shrinking of all the linear
dimensions (as it happens in a photocopy) the translation from one scale to the other
would not involve any abstraction, but a simple reformulation. In fact, all the location
information in one map would be in a one-to-one correspondence with those in the
other, and no information is lost or hidden. Hence, the distance between the modified
buildings $b'_1$ and $b'_2$ would be

$d' = \sqrt{(x'_{c,1} - x'_{c,2})^2 + (y'_{c,1} - y'_{c,2})^2} = \sqrt{\alpha^2 (x_{c,1} - x_{c,2})^2 + \alpha^2 (y_{c,1} - y_{c,2})^2} = \alpha \cdot d$   (11.1)
On the contrary, the displacement operation changes, for instance, the new location of building $b'_1$ from $(\alpha x_{c,1}, \alpha y_{c,1})$ to $(x^{(ap)}_{c,1}, y^{(ap)}_{c,1})$, in such a way that

$d^{(ap)} = \sqrt{\left(x^{(ap)}_{c,1} - x'_{c,2}\right)^2 + \left(y^{(ap)}_{c,1} - y'_{c,2}\right)^2} > d'$   (11.2)
By recalling the discussion reported in Sect. 7.6, we observe that, according to our model, the displacement operation is a reformulation followed by an approximation. The reformulation changes all x into αx and all y into αy. If a building is an object of type building, and $X_c$ and $Y_c$ are two of its attributes with domain $\mathbb{R}$, then the reformulation constructs $b'$ with attributes $X'_c = \alpha X_c$ and $Y'_c = \alpha Y_c$. Afterward, the approximation process $\Pi_{ap} = \left\{\rho_{repl}\left((X'_c, \mathbb{R}), (X^{(ap)}_c, \mathbb{R})\right), \rho_{repl}\left((Y'_c, \mathbb{R}), (Y^{(ap)}_c, \mathbb{R})\right)\right\}$ generates $b^{(ap)}$ with attributes $X^{(ap)}_c$ and $Y^{(ap)}_c$, whose values are chosen in such a way that condition (11.2) holds.
In order to realize $\Pi_{ap}$, the methods meth$\left(P_g, \rho_{repl}\left((X'_c, \mathbb{R}), (X^{(ap)}_c, \mathbb{R})\right)\right)$ and meth$\left(P_g, \rho_{repl}\left((Y'_c, \mathbb{R}), (Y^{(ap)}_c, \mathbb{R})\right)\right)$ must be provided. These methods contain parameters; for instance, only buildings that are very close must be displaced, and only certain directions are useful to separate the buildings, and so on. These parameters can be learned from examples.
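A schematic rendition of the two steps (reformulation by scaling, then approximation by displacement) might look as follows. This is our own sketch: d_min is one of the learnable parameters mentioned above, and all names are hypothetical:

```python
import math

def scale(building, alpha):
    """Reformulation: shrink the center-of-mass coordinates by factor alpha."""
    x, y = building
    return (alpha * x, alpha * y)

def displace(b1, b2, d_min):
    """Approximation: push b1 away from b2 until their distance is at least d_min."""
    (x1, y1), (x2, y2) = b1, b2
    d = math.hypot(x1 - x2, y1 - y2)
    if d >= d_min or d == 0:
        return b1
    f = d_min / d                     # stretch the separation vector
    return (x2 + f * (x1 - x2), y2 + f * (y1 - y2))

b1, b2 = scale((120.0, 80.0), 0.25), scale((132.0, 80.0), 0.25)
print(displace(b1, b2, d_min=5.0))    # b1 moved so that d^(ap) > d'
```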
Moving to simplification, this operation is actually an abstraction, because some of the information regarding the perimeter of a building is hidden. In the KRA model, simplification can be modeled with more than one operator, depending on the way buildings are represented. For what follows, we refer to Fig. 11.9.
Suppose that a building is represented in the GDB as a sequence of points (specified by their Cartesian coordinates) belonging to the perimeter, such as (1, 2, ..., 9).
The theory shall contain an algorithm LINE that, given the coordinates of two points,
draws a straight line between them on the map. In this case, starting from the original
building (the leftmost in Fig. 11.9), we can obtain the final one (i.e., the rightmost),
by iteratively applying operator $\omega_{hobj}(j)$ to some point j in the perimeter: the choice of what point to hide at each step, and the criterion to stop, are provided in the associated meth(b, $\omega_{hobj}(j)$). For example, the final building in Fig. 11.9 can be obtained from the original one by means of the following combination of operator applications:

Fig. 11.9 Simplification of the perimeter of a building b into a building $b^{(a)}$. Given a sequence of points, the intermediate ones can be hidden by applying a $\omega_{hobj}$ operator. The method associated to $\omega_{hobj}$ contains an algorithm which draws a segment between the two extreme points
Notice that it is up to the user to say when to stop. For instance, hiding points could
have included a further step for hiding point 5, obtaining thus a quadrilateral.
Another (maybe more interesting) way of proceeding would be to define a domain-specific operator $\omega_{simplify}((p_1, \dots, p_k), (p_1, p_k))$ which, taking in input a sequence of k points, hides all the intermediate ones, leaving only the first, $p_1$, and the last one, $p_k$. This operator could be expressed as a "macro"-operator in terms of the elementary ones described above.
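A possible implementation of such an iterative point-hiding is sketched below (our own code, under our own assumptions: the least significant point is the one spanning the smallest triangle with its neighbors, and the stopping criterion is a simple point budget, itself one of the learnable parameters):

```python
def triangle_area(p, q, r):
    """Area of triangle pqr; a small area means q contributes little detail."""
    return abs((q[0]-p[0]) * (r[1]-p[1]) - (r[0]-p[0]) * (q[1]-p[1])) / 2

def omega_simplify(points, k_max):
    """Repeatedly hide the least significant intermediate point (one
    application of omega_hobj(j) per step) until at most k_max points
    remain; the extreme points are always kept."""
    pts = list(points)
    while len(pts) > k_max:
        j = min(range(1, len(pts) - 1),
                key=lambda i: triangle_area(pts[i-1], pts[i], pts[i+1]))
        del pts[j]
    return pts

perimeter = [(0,0), (4,0), (4,1), (5,1), (5,3), (2,3), (2,2), (0,2), (0,0)]
print(omega_simplify(perimeter, 5))
```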
In the cartographic domain there are other standard geometric transformations
[361], in addition to simplification and displacement:
• geometry enhancing (e.g., all angles of a building are drawn as right angles)
• deletion
• schematising (e.g., a group of objects is represented by a smaller set of objects,
but with the same repartition)
• caricaturing (e.g., an object or one of its parts is enlarged to give it more relevance)
• change of geometric type (e.g., a city, stored in the GDB with its exact surface
coverage, is represented as a point on the map).
Without describing the above operations in detail, we can say that all of them (like simplification and displacement) can be considered as combinations of abstractions, possibly followed by an approximation step.
In this section we briefly show how the parameters occurring in the methods of the above introduced operators can be automatically learned, with the goal of speeding up and improving the process of cartographic generalization.
The first idea is to use, for learning, the scheme of Fig. 11.10, where a transformation (an operator or set of operators, in our model) is learnt directly from the GDB. This is the approach taken by Werschlein and Weibel [557] and Weibel et al. [554], who propose to represent this function by means of a neural network. Even though interesting, this approach has the drawback of requiring a very large number of examples for learning, and is sensitive to the orientation of the object to be represented.
Fig. 11.10 Task of learning how to transform an element, i.e., learning the operator ω. The perimeter of the building on the right has six sides (therefore fewer than the twenty on the left) and is more regular

In order to overcome the above problems, we have used a two-step approach, where an abstract representation of the original object is first generated, by replacing the quantitative description provided by the GDB with a qualitative one. Then, operators are applied to this abstract description, and their parameters are automatically
learned. More precisely, abstract objects are no longer represented by a list of coordinates, but by a fixed number of measurements describing them (size, number of points, concavity). These measurements have been developed in the field of spatial analysis [360, 422]. The two phases are the following ones:
• Learning how to link the original, numerical object's attributes (such as Area, Length, ...) to abstract qualitative descriptors (such as the linguistic attribute Size), using abstraction operators (e.g., "Area < x" → "Size = small").
• Linking the new description to the operation to be performed on the object (e.g., "Size = small" and "Shape = very detailed" → Apply Simplification with parameters θ).
The abstraction step, involved in moving from a quantitative to a qualitative rep-
resentation of cartographic objects, can be modeled, in KRA, with operators similar
to those that generate linguistic variables in Fuzzy Sets, as described in Sect. 8.8.1.4.
Such operators establish an equivalence among subsets of values of attributes, and
label these subsets with linguistic terms, to be assumed by a predefined linguistic
variable. Each of these has the following form:
$\omega_{eqattrval}\left((X, \Lambda_X), [x_i, x_j], L_{ij}\right)$   (11.3)
where X is the original, numerical attribute, ΛX is its domain, [xi , xj ] is the interval of
values that are made indistinguishable, and Lij is the associated linguistic term, taken
on by a linguistic variable L (defined by the user). As the expert is usually unable
to reliably supply the interval [xi , xj ], then it is learned from a set of examples. For
instance, to describe a building an expert can say whether it is small, medium or big,
but s/he will not be able to say that a small building is one with an area smaller than
300 m². S/he is able to show examples of small buildings, but usually is unable to
provide a threshold on the surface measurement to characterize small buildings.
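A threshold such as the x in "Area < x → Size = small" can then be estimated from labeled examples with a one-dimensional split search. The sketch below is our own illustration, with made-up data; it simply picks the midpoint that best separates the expert's labels:

```python
def learn_threshold(examples):
    """examples: list of (area, label) pairs, label in {'small', 'big'};
    returns the midpoint split that misclassifies the fewest examples."""
    xs = sorted(examples)
    best = None
    for i in range(len(xs) - 1):
        t = (xs[i][0] + xs[i + 1][0]) / 2
        errors = sum((a < t) != (lab == 'small') for a, lab in xs)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

data = [(90, 'small'), (120, 'small'), (135, 'small'),
        (140, 'big'), (210, 'big'), (400, 'big')]
print(learn_threshold(data))   # 137.5, cf. the 137 m^2 split of Fig. 11.15
```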
The above learning approach has been tested on a task of abstracting buildings. In order to collect the necessary data and knowledge, an expert in spatial analysis and algorithms was asked to define:
• A set of measures describing buildings, reported in Fig. 11.11. The algorithms for
computing these measures have been defined by Regnauld [446].
• A set of “operations” applicable to buildings, listed in Fig. 11.12.
Each “operation” of Fig. 11.12 has been translated into a corresponding abstraction
operator (or combination thereof). Afterwards, a Cartography expert was asked to
define a set of qualitative abstract descriptors for a given building that are somehow
related to the above-mentioned measures; these descriptors are reported in Fig. 11.13.
Then, the Cartography expert provided a set of 80 observations of buildings, and
he was first asked to describe each building with the defined qualitative descrip-
tors (e.g., this building has Shape = L-shape, Size = medium, or Contains = no-big-wings). The same set of buildings, each one abstracted by each opera-
tion, was presented to him, and he was asked to say, for each abstract building, if the
result was acceptable or not. Meanwhile the set of measures chosen by the expert was computed on each building. Two examples of buildings are shown in Fig. 11.14.

Fig. 11.14 Two of the eighty examples of buildings used in the Machine Learning test
Eighty examples were used first to learn how to link measures to abstract descriptors, and then to link abstract descriptors to the applicability of each operation.
Figure 11.15 shows a decision tree learned by C4.5 to determine the linguistic
values of the abstract feature Size from a given set of measures.
In addition to the positive qualitative evaluation of meaningfulness, given by the
expert, the abstract descriptions proved to be effective in improving the following
step of choosing the transformation, as well. Finally, the automatically learned rules
[Fragment of the learned tree: a test on an area measure below 137 m² yields Size = small; a test on a shape measure above 0.93 yields Size = big]
Fig. 11.15 Learned decision tree for determining the values of the qualitative attribute Size
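Trees of this kind can be reproduced with any standard learner. The following minimal sketch uses scikit-learn's CART implementation rather than C4.5, and the measures (Area, Elongation) and labels are hypothetical placeholders of ours:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical (measure) -> Size examples: [area_m2, elongation]
X = [[90, 0.2], [120, 0.5], [150, 0.95], [300, 0.4], [500, 0.96], [800, 0.5]]
y = ['small', 'small', 'medium', 'medium', 'big', 'big']

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=['Area', 'Elongation']))
```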
Fig. 11.16 Different road representation. The use of abstraction produced the best compromise
between readability and amount of details kept
The difficulty, in this basic formulation, is that, when the set of states Q grows large,
the number of parameters to estimate (A and B) rapidly becomes intractable. As
a matter of fact, in many applications to Pattern Recognition, an HMM λ may be
required to have a very large set of states. One possibility to address this problem is
to impose a structure upon the automaton, by a priori limiting the number of state
transitions and the possible symbol emissions. This corresponds to setting to 0 some
entries in matrices A or B (cf. [143]).
Another way to face the structural and computational complexity of the HMM is to use abstraction, searching for groups of states to aggregate, subject to some constraints. The result is an extension of the basic HMM, namely the Hierarchical HMM (HHMM), first proposed by Fine, Singer and Tishby [169]. The extension immediately follows from the property of regular languages of being closed under substitution, which allows a large finite state automaton to be transformed into a hierarchy of simpler ones. More specifically, an HHMM is a hierarchy where, numbering the hierarchy levels with ordinals increasing from the lowest towards the highest level, the observations generated in a state $q^k_i$ by a stochastic automaton at level k are sequences generated by an automaton at level k − 1. The emissions at the lowest levels are again single tokens as in the basic HMM. Moreover, no direct transition
may occur between the states of different automata in the hierarchy. As in HMMs, in every automaton the transitions from state to state are governed by a probability distribution A, and the probability of a state being the initial one is governed by a distribution π. The constraint is that there is only one state that can be the terminal one. Figure 11.17 shows an example of HHMM.
The major advantage provided by the hierarchical structure is a strong reduction of the number of parameters to estimate. In fact, automata at the same level in the hierarchy do not share interconnections: every interaction between them is governed by transitions at the higher levels. This means that for two automata $\lambda_{l,k}$, $\lambda_{m,k}$ at level k the probability of moving from the terminal state of $\lambda_{l,k}$ to one state of $\lambda_{m,k}$
processed extracting from each one the sequence which includes the most likely
hypotheses and is compatible with the given constraints. The default constraint is
that hypotheses must not overlap. Finally, every sequence is transformed again into
a string of symbols in order to be able to process it with standard local alignment
algorithms in the next step.
After the phases of abstraction and refinement, a hierarchical structure such as the one reported in Fig. 11.17 is obtained. As we can see, the original 8 states are abstracted into 4, thus reducing the number of parameters in the probability distributions to be learned. The methodology sketched above has been applied to a real-world problem of user profiling, and was able to handle ground HMMs with some thousands of states [188]. This result could not have been achieved without abstraction.
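The parameter saving can be made concrete with a back-of-the-envelope count (our own sketch; we assume a flat HMM with N states and an alphabet of size S has on the order of N² transition and N·S emission parameters):

```python
def flat_params(n_states, n_symbols):
    """Parameters of a flat HMM: the A and B matrices."""
    return n_states**2 + n_states * n_symbols

def hierarchical_params(groups, n_symbols):
    """groups: sizes of the lower-level sub-automata; the upper level has
    one state per group and no transitions between sub-automata."""
    lower = sum(k**2 + k * n_symbols for k in groups)
    upper = len(groups)**2          # transitions among abstract states
    return lower + upper

# e.g., 8 ground states abstracted into 4 groups of 2 (cf. Fig. 11.17)
print(flat_params(8, 10))                 # 144
print(hierarchical_params([2] * 4, 10))   # 112
```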
11.4 Summary
In this chapter we have briefly illustrated how the KRA model of abstraction has been used in practice in three nontrivial applications. In Model-Based Diagnosis (MBD) the model offers a number of advantages in capturing the process of abstracting the architecture of the diagnosed system. In particular, the use of generic aggregation operators allows different subsystems to be automatically abstracted, with the only requirement of specifying the parameters in their associated methods. In other words, the aggregation operators work independently of the nature of the aggregated components, only exploiting the components' topological arrangement. This is an important feature of the model, because it frees the user from the need to redesign the procedure for abstracting different types of components.
In recent years, novel approaches have been proposed to try to solve the inverse
problem, i.e., to exploit the degree of observability for defining useful abstraction;
moreover, other works have started to investigate the problem of automatic synthesis
of abstract models. While these approaches have developed some interesting solu-
tions to the problem of abstracting models for MBD, the obtained results, even though
encouraging, are still limited, especially with respect to domain-independence. The use of the KRA model, which offers a large set of already working and tunable operators, may help improve the results. The application of the model to Cartography
proved very useful, so that it was extensively used in various tasks related to map
production.
The application of the KRA model to the automated acquisition of the structure
and parameters of an HHMM is perhaps the most natural one, as the very structure
of an HMM spontaneously suggests state aggregation.
An indication of the flexibility and usefulness of the KRA model comes from the
fact that it has been used in real-world applications which are very different from one
another, yet exploiting the same set of operators. Another significant application of
the model to the problem of grounding symbols in a robotic vision system has been
described by Saitta and Zucker [470].
Chapter 12
Discussion
12.1 Analogy
The first topic that we find interesting to further investigate at this point is how abstraction has been linked with analogy. The word analogy derives from the Greek ἀναλογία, which means proportion. Analogy is a powerful mechanism in both human and animal reasoning [534], to the point that Hofstadter puts it at the very core
of cognition [263]. Differently from other reasoning mechanisms, such as induction
or abduction, analogy is a mapping from a particular to a particular. Oxford dictionary
defines analogy as “a comparison between one thing and another, typically for the
purpose of explanation or clarification”, or “a correspondence or partial similarity”,
or still “a thing which is comparable to something else in significant respects”. From
the above definitions it appears that analogy is somewhat equated to “similarity”, but
we argue that there is more to analogy than simple resemblance. We think, instead,
that analogy has its root in abstraction, as schematically illustrated in Fig. 12.1.
A characteristic of analogical reasoning is that it involves not only a holistic analy-
sis of the concerned entities, but also an account of their behaviors and relations inside
their contextual environment. Then, it is a more complex process than recognition
or classification, or extracting common features, or computing a similarity measure.
Just to give a feeling of the difference that we see, we report here an anecdote.
Jaime Carbonell, at Carnegie Mellon University (Pittsburgh, PA), one time told his
undergraduate students that an even number can be written as 2n, with n integer. To the
question of how an odd number could be defined, several students answered 3n. This
erroneous answer can be (arguably) explained by assuming that the students have
Fig. 12.1 Scheme of analogical reasoning. From a particular case or object, C1 , an abstract structure
is determined, which captures the essence of C1 . The same structure describes also another particular
case or object C2 , usually in another domain, to which some of the properties of C1 can be transferred
via the abstraction, once the association between C1 and C2 is done
where the antecedent A1 plays (represented by a colon ":") for the consequent B1 the same role that A2 plays for B2. In the example above we would write:
1 An approach to analogy based on proportions has been presented by Miclet et al. [370].
However, finding the right (useful) proportion is not easy, as usually the analogy
cannot be deduced from the definition of the terms in it. On the contrary, it is necessary
to recur to some abstract schema linking antecedents and consequents. In this case,
we must know that an animal, in order to stay alive and acting, needs food. In the
same way, a machine needs electric power in order to operate. On the other hand,
there is no apparent similarity between food and electric power, as there is (usually)
none between an animal and a machine. Even though analogy appears to be allowed
by some kind of abstraction, it is not identical to this last. Abstraction only acts as
a bridge through which information can be transferred, realizing thus the analogy.
Form this point of view, the notion of abstraction based on channel theory [464],
could be a good candidate to model analogical reasoning.
Investigation on analogy was quite active in the Middle Ages, especially in the domain of Law. Already in earlier times, the Roman law contemplated the analogia
legis, which allowed a concrete case (target), for which no explicit norm was given,
to be judged using an existing norm for a case (source) that shared with the target
the "ratio", i.e., a legal foundation. Something similar has also been present since ancient times in Islamic jurisprudence, where the word qiyās ("analogy") denotes the process allowing a new injunction to be derived from an old one (nass). According to this method, the ruling of the Quran and sunnah may be extended to a new problem provided that the precedent (asl) and the new problem (far) share the same operative or effective cause (illah). The illah is the specific set of circumstances that triggers a certain law into action. As an example, the prohibition of drinking alcohol can be extended to the prohibition of consuming cocaine.
One of the scholars who dealt with analogy in the Middle Ages is Thomas de Vio Cardinalis Cajetanus, who in 1498 published the treatise De Nominum Analogia, which is a semantic analysis of analogy [86]. Cajetanus introduces three types of analogy:
• Analogy of inequality,
• Analogy of attribution,
• Analogy of proportionality.
Over the three alternative modes of analogy Cajetanus clearly favors analogy of
proportionality; he is mostly interested in languages, and shows that analogy is a
fundamental aspect of natural languages.
Other authors, such as Francis Bacon and John Stuart Mill, base analogy on induction, and offer the following reasoning scheme to account for it:
Premises
x is A B C D
y is A B C
Conclusion
y is probably D
In more recent times theories of analogy have been formulated in Philosophy and
Artificial Intelligence. A pioneer in the field has been Gentner [197, 198], who also
supports the claim that analogy is more than simple similarity or property sharing
among objects. Then, she proposes a structure-mapping theory, which is based on the
idea that “… analogy is an assertion that a relational structure that normally applies
in one domain can be applied in another domain”. Gentner uses a conceptualization
of a domain in terms of objects, attributes and relations, similar to the description
frame Γ of our KRA model. She considers a base domain, B, and a target domain, T, and her account of analogy consists of the following steps:
• Objects in domain B are set in correspondence with objects of domain T.
• Predicates in B are carried over to T, using the preceding correspondence to guide predicate instantiation.
• Predicates corresponding to object attributes are discarded, whereas relations are preferentially kept.
• In order to choose which relations to keep, the Systematicity Principle is used, exploiting a set of second order predicates.
Gentner's proposal relies heavily on the ability to let the "correct" objects be mapped across domains.
It is instructive to look at the differences that Gentner sees between literal simi-
larity, analogy, and abstraction:
• A literal similarity is a comparison in which a large number of predicates is mapped
from B to T (relatively to those that are not mapped). Mapped predicates include
both attributes and relations.
• An analogy is a comparison in which most relational predicates (with few or no
object attributes) are mapped from B to T .
• An abstraction is a comparison in which B is an abstract relational structure, where
“objects” are not physical entities, but “generalized” ones. All predicates from B
are mapped to T .
As we can see, in Gentner’s original formulation abstraction and analogy have
the same status, i.e., a bridge between a base and a target domain. However, in a
recent paper Gentner and Smith [199] attributed to abstraction the role of “possible
outcome of structural alignment of the common relational pattern”, and investigated
the psychological foundations of analogical reasoning.
Another well known approach to analogy was proposed by Holyoak and Thagard
[261], as a continuation of older works by Holyoak and co-workers (e.g., [204]).
These authors were mostly interested in the role of analogy in problem solving,
where analogy was viewed as playing a central role in human reasoning. Holyoak
and Thagard described a multiconstraint approach to interpret analogies, in which
similarity, structural parallelism, and pragmatic factors interact. They also developed
simulation models of analogical mapping. Holyoak and his group are still interested
in analogy, and their approach evolved, in very recent years, toward a Bayesian
approach to relational transformation [347], and to the neural basis for analogy in
human reasoning [42].
In the field of qualitative physical systems, Forbus and coworkers proposed the system (simulator) SME for analogical reasoning [159]. Given a base and a target domain, SME first computes one or more mappings, each consisting of a correspondence between items in the two domains; then, it generates candidate inferences,
which are statements about the base that are hypothesized to hold in the target by
virtue of these correspondences. The original model evolved along the years, and
it has been updated and improved in recent papers [341], in order to be applied to
human geometrical reasoning, where it obtained good experimental results.
The SME system was criticized by Chalmers et al. [91], who advocated a view of
analogy as “high-level perception”, i.e., as the process of making sense of complex
input data. These authors propose a model of high-level perception and analogical
reasoning in which perceptual processing is integrated with analogical mapping.
Later on, analogy has also been the subject of Hofstadter’s book Fluid Concepts and
Creative Analogies [255].
Several other authors have contributed to the research on analogy. For instance, in
relation to Linguistics, Itkonen [271] distinguishes analogy as a process from analogy
as a structure, and establishes links between analogy and other cognitive operations,
such as generalization and metaphoric reasoning. Keane [405] proposes a three-phase
model of analogical reasoning, encompassing the phases of retrieval, mapping and
inference validation, in the context of designing creative machines. Aamodt and Plaza
[1] set analogy in the context of case-based reasoning, whereas Ramscar and Yarlett
[442] describe a new model, EMMA, exploiting an “environmental” approach, which
relies on co-occurrence information provided by Latent Semantic Analysis. Sowa
and Majumdar [500] investigate the relationships between logical and analogical
reasoning, and describe a highly efficient analogy engine that uses conceptual graphs
as knowledge representation; these authors make the interesting claim that “before
any subject can be formalized to the stage where logic can be applied to it, analogies
must be used to derive an abstract representation from a mass of irrelevant detail”.
Finally, Turney [536] sees analogy as a general mechanism that works behind a
broad range of cognitive phenomena, including finding synonyms, antonyms, and
associations. A recent review of the topic can be found in Besold’s thesis [55].
In studying analogy there are two orthogonal issues: one is how to use analogical
reasoning for explanation, and the other is to “invent” analogies for discovery or
creativity. The first, which is the almost universally investigated issue, consists in
recognizing that some mapping exists between a base and a target domain, and
then using it to derive new properties in the target one. Typically, suppose that we
recognize that there is a correspondence between Coulomb’s Law of interaction
between electric charges and Newton’s Law of interaction between masses; then, it
would be sufficient to solve the equations of motion in just one domain, and then
transfer the results to the other. In this case, analogy is used a posteriori, to explain why some phenomenon happens. The second issue, namely to start from just the base
domain, and to suggest a possible target one, is much more difficult and less studied.
has been at the core of the investigation on abstraction since the beginning. The
estimation of the saving in computation due to abstraction is problem-dependent,
and hence cannot be handled in general. We can examine some examples, in order
to provide a feeling of how the issue has been handled in the literature.
In the early 1980s researchers were trying to automatically create admissible heuris-
tics for the A* search using abstraction from a given (“ground”) state space Sg to an
“abstract” state space Sa = φ(Sg ). The idea was to map each ground state S ∈ Sg to
an abstract state φ(S), and to estimate the distance h(S) from S to a goal state G in Sg
using the exact distance between φ(S) and φ(G) in Sa . It has been proved [429] that
such an h(S) is an admissible heuristic if the distance between every pair of states
in Sg is not lower than the distance between the corresponding states in Sa . At the
time, embedding and homomorphism were the most common abstraction techniques
used in search; an embedding is an abstraction transformation that adds some edges
to Sg (for instance by defining a macro-operator), whereas a homomorphism is an
abstraction that groups together a set of ground states to generate a single abstract
one.
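The mechanism can be sketched in a few lines: blind (breadth-first) search in the abstract space yields exact abstract distances, which then serve as h(S) in the ground space. This is our own illustration for a homomorphic abstraction φ; the neighbor function of the abstract space is assumed to be supplied by the caller, and h is admissible as long as φ never increases distances (the condition proved in [429]):

```python
from collections import deque

def abstract_distances(goal_a, neighbors_a):
    """Exact distances to the abstract goal, by BFS in the abstract space."""
    dist = {goal_a: 0}
    frontier = deque([goal_a])
    while frontier:
        s = frontier.popleft()
        for t in neighbors_a(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                frontier.append(t)
    return dist

def make_heuristic(phi, goal, neighbors_a):
    """Return h(S) = exact distance from phi(S) to phi(goal) in the abstract space."""
    dist = abstract_distances(phi(goal), neighbors_a)
    return lambda s: dist.get(phi(s), float('inf'))
```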
The goal of computing the heuristic function h(S) is to focus A*'s search; however, the cost of computing h(S) has to be included in the computation balance, and it may happen that the cost of obtaining h(S) outweighs its benefits in the search. An important result was obtained very early by Valtorta [544], who proved that if Sg is embedded in Sa and h(·) is computed by blindly searching Sa, then, using h(·), A* will expand every state that is expanded by blindly searching directly in Sg.
Several years later, Holte et al. called the number of nodes expanded when blindly
searching in a given space the “Valtorta’s Barrier” [260]. According to Valtorta’s
results, this barrier cannot be broken using any embedding transformation. Actually,
Holte et al. have generalized Valtorta’s results to any abstraction in the following
theorem.
Theorem 12.1 (Holte et al. [260]) Let Sg be a state space, (Start, Goal) a problem
in Sg , and φ : Sg → Sa an abstraction mapping. Let hφ (S) be computed by blindly
searching in Sa from φ(S) to φ(Goal). Finally, let S be any state necessarily expanded
when the problem (Start, Goal) is solved by blind search directly in Sg .
Then, if the problem is solved in Sg by A* search using hφ (.), it is true that:
• either S itself will be expanded,
• or φ(S) will be expanded.
Valtorta’s theorem is a special case of Holte et al.’s one, obtained when “embed-
dings” are considered; in fact, in this case φ(S) = S. Holte et al. showed that, using
homomorphic abstraction techniques, Valtorta’s Barrier can be broken in a large vari-
ety of search spaces. To speed up search, Holte et al. recursively generate a hierarchy
of state spaces; at each level, the state with the largest degree is grouped together
with its neighbors, within a certain distance, to form a single abstract state. This is
repeated until all states have been assigned to some abstract state. The top of the hier-
archy is the trivial search space, whereas the bottom is the original search space. The
authors applied this approach to A*, by showing that in several domains improve-
ments can be obtained, provided that two novel cashing techniques are added to A*,
and a suitable level of granularity is chosen for abstraction.
The results obtained by Holte et al. [260] did not exhaust the topic; rather, they started
a research activity that has evolved until recently [411], with the definition
of a novel type of abstraction, called multimapping abstraction, well suited to the
hierarchical version of the search algorithm IDA* (Iterative Deepening A*), which
eliminates the memory constraints of A* without sacrificing solution optimality.
Multimapping abstraction, which allows multiple heuristic values for a state to be
extracted from one abstract state space, consists in defining a function that maps a
state in the original state space to a set of states in the abstract space. Experimental
results show that multimapping abstraction is very effective in terms of both
memory usage and speed [260].
A task of Artificial Intelligence in which abstraction has been frequently and effectively
used to reduce complexity is the Constraint Satisfaction Problem (CSP). An
overview of the proposed approaches was provided by Holte and Choueiry [257].
Most of the research on abstraction in CSPs has focused on problem reformulation.
The most common abstraction involves the domains of the variables, and is
based on symmetry. When a symmetry is known in advance, it can be exploited
by the problem solver to avoid unnecessary exploration of equivalent solutions. A
set of values in the domain of a variable is said to be interchangeable if they all
produce identical results in the problem's solution. This notion of equivalence allows
a variable's domain to be partitioned into equivalence classes, where all the values in
a class are equivalent. This allows CSPs to be solved much more quickly because,
instead of considering all the different values in the domain, it is only necessary to
consider one representative for each class. This approach has been explored since the
beginning of the investigation on abstraction [104], and has been refined and extended
until now [103, 396, 521]. In particular, Choueiry and Davis [103] described the
Dynamic Bundling technique for efficiently discovering equivalence relations during
problem solving. It has been shown to yield multiple solutions to a CSP with
significantly less effort than is necessary to find a single solution.
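For instance, neighborhood interchangeability (two values are equivalent when they are consistent with exactly the same values of every neighboring variable) can be detected by a simple signature computation. A minimal Python sketch follows; it is ours, not the Dynamic Bundling algorithm of [103], and the toy constraint network at the bottom is invented.

def neighborhood_interchangeable(var, domain, neighbors, consistent):
    """Partition `domain` of `var` into classes of neighborhood-interchangeable
    values: a and b fall in one class iff, for every neighboring variable n,
    they are consistent with exactly the same values of n."""
    def signature(value):
        return tuple(
            frozenset(w for w in neighbors[n] if consistent(var, value, n, w))
            for n in sorted(neighbors)
        )
    classes = {}
    for v in domain:
        classes.setdefault(signature(v), []).append(v)
    return list(classes.values())

# Toy graph-coloring-like CSP: X constrained to differ from Y and Z.
neighbors = {"Y": {1, 2}, "Z": {2, 3}}
diff = lambda v1, a, v2, b: a != b
print(neighborhood_interchangeable("X", [1, 2, 3, 4, 5], neighbors, diff))
# [[1], [2], [3], [4, 5]]: values 4 and 5 are bundled into one class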
Another technique for abstracting variables in CSPs is aggregation [257]. For
instance, in a CSP reformulation of the graph coloring problem, nodes that are not
directly connected to one another but are connected to the same other nodes of
the graph can be given the same color in any solution to the problem. Thus, the
variables representing these nodes can be lumped together into a single variable. This
aggregation, which can be applied repeatedly, reduces the number of variables in the
CSP, consequently reducing the cost of finding a solution. This procedure will not
necessarily find all possible solutions, but it is guaranteed to find a solution to the
problem if one exists.
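A minimal sketch of this aggregation (ours, on an invented four-node graph; note that two nodes with identical neighbor sets are necessarily non-adjacent, so they can always share a color):

def aggregate_coloring_variables(adjacency):
    """Lump together nodes that have exactly the same neighbors: in any
    coloring they can share a color, so one variable can stand for all.
    `adjacency` maps node -> set of neighbors; returns representative -> group."""
    groups = {}
    for node, nbrs in adjacency.items():
        key = frozenset(nbrs)
        groups.setdefault(key, []).append(node)
    # keep one representative variable for each same-neighborhood group
    return {grp[0]: grp for grp in groups.values()}

# a and b are both connected to c and d only, so they can be merged (and so can c, d)
adjacency = {"a": {"c", "d"}, "b": {"c", "d"},
             "c": {"a", "b"}, "d": {"a", "b"}}
print(aggregate_coloring_variables(adjacency))
# {'a': ['a', 'b'], 'c': ['c', 'd']}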
Finally, a CSP can be made easier by hiding some of the constraints. If the abstract
problem is unsolvable, then the original problem is unsolvable too. On the other
hand, if a solution is found for the abstract problem, this solution can
be used to guide the search for a solution to the original problem. For continuous variables x
and y, a constraint C(x, y) can be represented by a 2-dimensional region in the (x, y)
plane. As this region can be very complex, C(x, y) could be replaced by the pair of
constraints C1 (x) and C2 (y), which are the projections of C(x, y) onto the axes x and
y, respectively. Again, if the abstract problem is unsolvable, so is the original one.
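A discrete toy version of this projection (ours; the book's example is stated for continuous variables, and the diagonal-band constraint below is invented):

def project_constraint(pairs):
    """Replace a binary constraint C(x, y), given as the set of allowed pairs,
    by its projections C1(x) and C2(y) on the two axes. The relaxed problem
    admits every solution of the original one, so unsolvability transfers back."""
    c1 = {x for x, _ in pairs}
    c2 = {y for _, y in pairs}
    return c1, c2

# allowed region of C(x, y): a diagonal band
C = {(x, y) for x in range(5) for y in range(5) if abs(x - y) <= 1}
C1, C2 = project_constraint(C)
print((4, 0) in C)        # False: forbidden by the original constraint...
print(4 in C1, 0 in C2)   # True True: ...but allowed by the relaxation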
Domain abstraction in CSPs has been investigated by Schrag and Miranker [479]
in an original context, namely the emergence of a phase transition between the
solvable and unsolvable phases [4, 196, 431, 495, 561, 570, 571]. Domain abstraction
is a sound but incomplete method with respect to unsatisfiability: if the abstract
problem is unsatisfiable, so is the original one, but the converse does not hold. Hence
it is informative only when both the abstract and the original problems are unsatisfiable,
the point being that the abstract problem is easier to solve.
Domain abstraction introduces a many-to-one mapping between values of the
variables, thus reducing the size of the variable domains from dg to da = dg /γ,
where γ is an integer divisor of dg . A sound and complete algorithm with
worst-case complexity O(d^n) will solve the abstract problem with a saving factor of
O(γ^n). Schrag and Miranker use the classical representation of a CSP as a
4-tuple (n, d, m, q), where n is the number of variables, d the domain size (assumed
to be equal for all variables), m the number of constraints, and q the number of
allowed tuples per constraint (assumed to be the same for all constraints). Each
CSP is then a point in a 4-dimensional space. Domain abstraction maps a point in
this space (the original problem (n, dg , m, qg )) to another one (the abstract problem
(n, da , m, qa )). Clearly, the abstraction process modifies d and, as a consequence,
q, but modifies neither n nor m. Whereas da is known, the effect of domain
abstraction on the tightness p2 = 1 − q/d² of the constraints is not; let Q be a random
variable that represents the number of tuples allowed by each constraint in the abstract
space.
By using a mean field approximation, Schrag and Miranker assume that Q is equal
to its mean value, i.e., Q = qa , and set out to predict both qa and the location of the
new problem in the problem space. A good estimate of qa appears to be
qa = da² [1 − (1 − qg /dg²)^(γ²)]    (12.1)
Equation (12.1) shows that, as qg increases, qa reaches a plateau where all the constraints
are loose (they allow all possible tuples) and any problem instance is almost surely
satisfiable, thus making domain abstraction ineffective. The greater γ, the earlier
the plateau is reached. Domain abstraction is then effective only when the constraints
are very tight. It is natural to ask what is the maximum value of γ for which
domain abstraction remains effective. Quite surprisingly, the effectiveness
of abstraction appears to exhibit a phase transition itself, because there is a
critical value γcr separating effective from ineffective behaviors. Starting from a set
of problem instances (most of which are unsatisfiable) at a point (n, dg , m, qg ), effectiveness
is computed as the fraction of problem instances still unsatisfiable at point
(n, da , m, qa ). This fraction jumps from almost 1 to almost 0 at γcr .
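To get a feeling for Eq. (12.1), here is a quick numeric sketch (ours; the parameter values are invented):

def abstract_tightness(d_g, q_g, gamma):
    """Mean-field estimate (12.1) of the number of allowed tuples per
    constraint after dividing the domain size by gamma."""
    d_a = d_g // gamma
    q_a = d_a ** 2 * (1 - (1 - q_g / d_g ** 2) ** (gamma ** 2))
    return d_a, q_a

# very tight ground constraints: only 10 of 400 tuples allowed
print(abstract_tightness(d_g=20, q_g=10, gamma=2))   # ~(10, 9.6): still tight
# loose ground constraints: the abstract ones allow almost every tuple
print(abstract_tightness(d_g=20, q_g=200, gamma=2))  # ~(10, 93.8): the plateau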
The authors consider their findings mixed results, in the sense that, on the one
hand, very significant reductions in complexity can be observed and, on the other,
the applicability of the approach seems rather limited, given the high degree of
tightness required of the original problem. They explain the negative results by
the absence of any "structure" in the variables' domains. If some structure can be
found, for instance interchangeability (cf. above), a hierarchical arrangement, or
other kinds of symmetries, then abstraction might have a more positive impact than
the one predicted by their theory.
If we consider abstraction under all the names it has been used under in Machine Learning
(feature selection and construction, term construction, motif discovery, state
aggregation, …), it is not possible to investigate its effects on
computational complexity in general. What can be said is only that all researchers who used
those techniques reported some minor or major computational advantage.
In this section we therefore concentrate on a specific, even though quite
esoteric, issue, namely the relation between abstraction and the presence of a phase
transition in the problem of matching a hypothesis against an example in symbolic
learning [209, 469]. There are at least two reasons for this: firstly, it is a topic at the
frontier of research (and not yet well understood); secondly, it has strong links
with the issue discussed in the preceding section.
Matching a hypothesis ϕ(x1 , . . . , xn ), generated by a learner, against an example e
allows one to check whether the example verifies the hypothesis. In a logical context
for learning, a hypothesis is a formula in some First Order Logic language (usually a
DATALOG language), and an example consists of a set of relations, each one being
the extension of a predicate in the language. An example can be found in Fig. 12.2.
A matching problem can be represented by a 4-tuple (n, L, m, N), where n is the
number of variables in ϕ, L is the size of the common domain of the variables, m
is the number of predicates in ϕ, and N is the number of allowed tuples in each
relation. It is immediate to see that the matching problem is equivalent to a CSP
(n, d, m, q), as defined in the previous section. The matching problem, which is
a decision problem, shows a phase transition with respect to any pair of the four
parameters. We consider as control parameters of the transition m (characterizing
the size of the hypothesis) and L (characterizing the size of the example), whereas
n and N are fixed. By considering as order parameter the probability Psol that a
solution exists, the phase transition appears as a sharp drop of Psol in the plane (m, L)
Fig. 12.2 An example e, given by (a) the extensions of the relations on(X, Y) and left(X, Y), and (b) the tuples satisfying the conjunction left(X, Y), on(Y, Z)
Fig. 12.3 3-Dimensional plot of the probability of solution Psol for n = 10 and N = 100. Some
contour level plots, corresponding to Psol values in the range [0.85 ÷ 0.15], have been projected
onto the plane (m, L)
Fig. 12.4 FOIL’s “competence map”: the success and failure regions, for n = 4 and N = 100.
The phase transition region is indicated by the dashed curves, corresponding, respectively, to the
contour plots for Psol = 90 %, Psol = 50 %, and Psol = 10 %. The crosses to the left of the phase
transition represent learning problems that could easily be solved exactly (i.e., the target concept
was found). The crosses to the right of the phase transition line represent learning problems that
could be approximated (i.e., hypotheses with low prediction error were found, but different from the
target concept). Dots represent problems that could not be solved. Successful learning problems
were those that showed at least 80 % accuracy on the test set. Failed problems are those that reached
almost 100 % accuracy on the learning set, but behaved almost like random predictors on the test
set (around 50 % accuracy)
(see previous section). Extensive experimentation has shown that, in this plane,
there is a large region where learning problems could not be solved by any available
top-down, hypothesize-and-test relational learner [70], as represented in Fig. 12.4 for
FOIL [438].
A large region (a "blind spot") emerges, located across the phase transition, where
no learner succeeded. In this region, in the vast majority of cases the hypotheses
learned were very accurate (accuracy close to 100 %) on the learning set, but behaved
like random predictors (accuracy close to 50 %) on the test set. The threshold of 80 %
accuracy, which we chose to declare a learner successful, could have
been any value between 95 % and 60 % without making any significant difference
in the shape of the blind spot. The plot in Fig. 12.4 had n = 4, which is a very small
value. In fact, things become much worse with increasing n. Thus, the number of
variables in the hypothesis is a crucial parameter for the complexity of learning.
Given this situation, one may wonder whether abstraction could be a way out
of this impasse. We have tried three abstraction operators, namely domain abstraction,
arity reduction, and term construction. We point out that for learning it is not
necessary to revert to the original problem: if good hypotheses can be found in the
abstract space, they can be used directly, forgetting about the original space. For
instance, when the operator that hides attributes is applied (performing feature selection),
there is no need to reintroduce the hidden attributes. Then, what counts in
learning is either to move a learning problem from a position in which it is unsolvable
to one in which it is solvable, or to move it from a position where it is solvable, but has
high complexity, to a position where it is still solvable and, in addition, requires a
lower computational effort.
Domain Abstraction Operator
Let us consider the simplest domain abstraction operator ωeqobj ({a1 , a2 }, b), which
makes constants a1 and a2 indistinguishable, both denoted by b. The effect of apply-
ing ωeqobj is that, in each of the m relations contained in any training example, each
occurrence of either a1 or a2 is replaced by an occurrence of b. The language
of the hypothesis space does not change. With the application of ωeqobj we obtain
na = n, ma = m, La = L − 1, and Na = N if we agree to keep possibly duplicate
tuples in the relations, or Na < N otherwise. The point Pg , corresponding to the
learning problem in the ground space, jumps down vertically to Pa^(eqobj), located on
the horizontal line La = L − 1. At the same time, as (possibly) Na ≤ N, the phase
transition line (possibly) moves downwards, as described in Fig. 12.5a.
The effect of ωeqobj is different depending on the original position of the learning
problem (Pg ). If Pg is in the NO-region, moving downwards may let Pa enter the
blind spot, unless Na recedes sufficiently to let the blind spot move downward as well.
If Pg is on the lower border of the blind spot, Pa may fall outside of it, becoming a
solvable problem. However, by noticing that the downward jump in L is of a single
unit, it is more likely that this type of abstraction does not help in easing the learning
task, especially if Na < N. In summary, the application of ωeqobj is beneficial, from
Fig. 12.5 a Application of operator ωeqobj to Pg . The learning problem moves downwards, toward
regions of (possibly) greater difficulty. b Application of operator ωhargrel or ωhargfun to Pg . The
learning problem moves left, toward regions of (possibly) lower difficulty
both the complexity and the learnability points of view, when Pg is located on or
below the phase transition, whereas it may have no effect, or even be harmful, when
Pg is located above it.
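As an illustration, a minimal sketch (ours; the relation names and constants are invented) of ωeqobj acting on the relational representation of a training example:

def omega_eqobj(example, equated, new_constant):
    """Apply the domain abstraction operator that makes the constants in
    `equated` indistinguishable, all renamed to `new_constant`. An example is
    a dict: relation name -> set of tuples of constants."""
    rename = lambda c: new_constant if c in equated else c
    return {rel: {tuple(rename(c) for c in t) for t in tuples}
            for rel, tuples in example.items()}

example = {"on":   {("a1", "c"), ("a2", "d")},
           "left": {("c", "d")}}
abstract = omega_eqobj(example, equated={"a1", "a2"}, new_constant="b")
print(abstract)
# e.g. {'on': {('b', 'c'), ('b', 'd')}, 'left': {('c', 'd')}}

Since the relations are sets, tuples made identical by the renaming collapse automatically, which is the Na < N case discussed above.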
Arity Reduction Operator
As the exponential complexity of matching is mostly due to the number of variables,
trying to reduce the latter could be beneficial, in principle, provided that good
hypotheses can still be learned.
Let us consider the case in which we want to hide a variable in all functions
and predicates where it occurs. Then, we can apply a set of operators of the type
ωhargrel (Rk (x1 , . . . , xn ), xj ) (1 ≤ k ≤ K), or of the type ωhargfun (fh (x1 , . . . , xn ),
xj ) (1 ≤ h ≤ H). Each operator hides the column corresponding to xj in the covers
RCOV (Rk ) or FCOV (fh ). At the same time, the hypothesis language is modified,
because the predicate rk (x1 , . . . , xn ), corresponding to relation Rk , becomes
rk^(a) (x1 , . . . , xj−1 , xj+1 , . . . , xn ).
In the abstract space, the number of constants remains the same (La = L), while the
number of variables decreases by 1 (i.e., na = n − 1). The number of predicates most
likely decreases (i.e., ma ≤ m); in fact, hiding a variable in a binary predicate makes
the latter unary, so that it no longer contributes to the exponential
increase in the computational cost of matching. For this reason it no longer counts
in the value of m. Finally, Na ≤ N, because some tuples may collapse.
As a consequence of arity abstraction, the point Pg jumps horizontally to Pa ,
located on the line La = L, whereas the phase transition moves down because of the
decrease in n. The application of ωhargrel is most likely to be beneficial, from both
the complexity and the learnability points of view, when Pg is located on or below
the phase transition, whereas it may have no effect, or even be harmful, when Pg
is located above it, especially if it is at the right border of, but outside, the blind spot.
However, considering that the curves for different values of n are quite close to one
another, it may be the case that the abstract problem jumps to the easy region without
entering the blind spot.
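The column-hiding step itself is elementary; a sketch (ours, with an invented ternary relation) acting on a single cover:

def omega_hargrel(cover, hidden_position):
    """Hide one argument of a relation: delete the corresponding column from
    its cover RCOV(R). Tuples that become identical collapse (so Na <= N)."""
    return {t[:hidden_position] + t[hidden_position + 1:] for t in cover}

# ternary relation between a block, a support, and a color; hide the color
cover = {("a", "table", "red"), ("a", "table", "blue"), ("b", "a", "red")}
print(omega_hargrel(cover, hidden_position=2))
# {('a', 'table'), ('b', 'a')}: two tuples collapsed into one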
Term Construction Operator
The term construction operator ωaggr ({t1 , . . . , tk }, t^(a)) aggregates k objects
(x1 , . . . , xk ) of types t1 , . . . , tk into an object y of a new type t^(a). To build the term
y it is necessary to first find all the solutions of a smaller matching problem, and
to assign a new constant to each of the tuples (a1 , . . . , ak ) in this solution. For the
sake of simplicity, let us suppose that there is a single tuple (a1 , . . . , ak ) that can be
aggregated, and let b be its new identifier. All objects a1 , . . . , ak must disappear from
the examples. In addition, a value UN will replace any occurrence of the ai 's in any
function and relation. Then, na = n − k + 1 and La = L + 1. The value Na is likely to
decrease, and the value of m may also decrease (i.e., ma ≤ m). In the plane (m, L)
the point Pg moves leftward and upwards, which is most often beneficial, unless
Pg is located in the region corresponding to very low L and very large m values.
From the learnability point of view, the application of ωaggr may be particularly
beneficial when Pg is located at the upper border of, but inside, the blind spot; in this
case, problems that were unsolvable in the ground space may become solvable in the
abstract one.
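The following toy sketch (ours; a drastic simplification of the method detailed in Appendix G, with invented relation names) shows the two steps, namely a fresh constant for each aggregated tuple and the masking of the aggregated objects by UN:

def omega_aggr(example, pattern_solutions, new_type, placeholder="UN"):
    """Aggregate each tuple of objects returned by the matching step into one
    new object of type `new_type`; occurrences of the aggregated objects in
    the remaining relations are masked by `placeholder`."""
    new_objects, replaced = {}, {}
    for i, tup in enumerate(pattern_solutions):
        fresh = f"{new_type}_{i}"           # new constant for the aggregate
        new_objects[fresh] = tup
        replaced.update({c: placeholder for c in tup})
    abstract = {rel: {tuple(replaced.get(c, c) for c in t) for t in tuples}
                for rel, tuples in example.items()}
    return abstract, new_objects

# one solution of the sub-matching problem: car c1 with its load l1
example = {"inside": {("l1", "c1")}, "infront": {("c1", "c2")}}
abstract, objs = omega_aggr(example, [("c1", "l1")], new_type="loadedcar")
print(objs)      # {'loadedcar_0': ('c1', 'l1')}
print(abstract)  # e.g. {'inside': {('UN', 'UN')}, 'infront': {('UN', 'c2')}}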
12.3 Extensions of the KRA Model

The basic KRA model has been extended in two ways: the first extension keeps the
basic structure of the model unchanged, while adding a new facility for handling
types [514, 515, 550, 551]; the second extension, while maintaining the spirit of
the model, modifies its structure [244]. The two extensions are described in the
following subsections.
The model G-KRA has been recently proposed by Ouyang, Sun and Wang [514,
515, 550, 551] with the aim of increasing KRA's generality and degree of automation.
The extension consists in the addition of a Concept Ontology to improve knowledge
sharing and reuse, since ontologies are nowadays a widely used tool for conceptualizing
a domain.
The basic idea is that an agent A gathers a Primary Perception directly from the
world, whereas other agents can use this "ground" perception to build more
abstract ones using abstraction operators. The difference with respect to the original
KRA model is that there are several ontologies, with different levels of abstraction,
which specify which objects and which object properties can be observed at each level.
The authors introduce the notion of Ontology Class OC = (E, L, T ), where E =
(OBJ, ATT , FUNC, REL), and L and T are a language and a theory, respectively.
This formulation shows the following differences/similarities with respect to the
original KRA model:
• OC corresponds to the query environment QE, except that there is no explicit
query; moreover, the database, even though used in practice, is not part of the
conceptualization of the domain.
• E is KRA’s P-Set P.
• Each ontology class specifies the types, attributes, functions, and relations concerning
a given set of observable objects. In the KRA model this information is
specified globally in the description frame Γ .
• The ontologies improve knowledge sharing and reuse, because they do not change
across applications in a given domain.
Among all the ontology classes that can be considered, one is called Fundamental
Ontology Class (FOC), and is the one corresponding to the actual ground world. The
ground observations are collected into a database S. Any abstract world is described
by the tuple R = (Pa , S, La , Ta ), where Pa = δ(Pg , OCa ) is the abstract perception,
obtained from Pg by mapping objects in the FOC to objects in OCa . This process
can be repeated, by applying again δ to OCa , and obtaining a hierarchy of more and
more abstract descriptions.
Given the ground perception and the ontologies, the abstraction hierarchy can be
built up automatically. The authors have applied the model to Model-Based Diagnosis
problems, and have exemplified the functioning of G-KRA using the same hydraulic
system described in Sect. 11.1.1.
Fig. 12.6 KRA model modified by Hendriks [244]. The world provides the perception P , which
is memorized in a database Sg and then abstracted into a database Sa . Afterwards the
content of Sa is expressed in a well-defined language La , and a theory Ta is added to complete the
conceptualization of the domain
12.4 Summary
Even though abstraction is an interesting process per se, its links with other reasoning
mechanisms and with computational issues make its role in natural
and artificial systems even more central. In fact, abstraction may be the basis for analogical
and metaphorical reasoning and for creating caricatures and archetypes, as well as an
important tool for forming categories. Putting abstraction at the basis of analogy allows
the latter to be distinguished from similarity-based reasoning, because abstraction is
able to build bridges between superficially very different entities.
Virtually all views of abstraction share the idea that abstraction should bring some
advantage in terms of simplification. In computational problems this advantage can be
quantified as a reduction of the resources needed to solve a problem. The
effective gain cannot be estimated in general, because it depends on the problem at
hand. Some generic considerations can nevertheless be made.
As an example, let us consider an algorithm with complexity Cg = O(f (n)),
where n is the number of objects in a Pg ; hiding m objects generates a Pa in which
the same algorithm will run in Ca = O(f (n − m)). If f (n) is a linear function, then
the abstract complexity will still be linear (not a large gain indeed). But if
f (n) is exponential, the abstract complexity becomes O(e^(n−m)), i.e., there is an
exponential saving by a factor e^m. However, the cost of
abstracting must be taken into account as well. In order to estimate this cost, let us
consider the database Dg , where Pg is memorized. Hiding m objects in the OBJ table
and in the attribute tables Aj -ATTR (1 ≤ j ≤ M) has a complexity 2O(n). Hiding
the objects in each FCOV (fh ) (1 ≤ h ≤ H) and in each RCOV (Rk ) (1 ≤ k ≤ K),
respectively, has complexity

∑_{h=1}^{H} O(|FCOV (fh )|) + ∑_{k=1}^{K} O(|RCOV (Rk )|)    (12.2)

Abstraction is thus convenient only when:

Cg ≥ Ca + 2O(n) + ∑_{h=1}^{H} O(|FCOV (fh )|) + ∑_{k=1}^{K} O(|RCOV (Rk )|)    (12.3)

In the worst case, when |FCOV (fh )| = O(n²) for each h and |RCOV (Rk )| = O(n²)
for each k, this becomes:

Cg ≥ Ca + 2O(n) + (H + K)O(n²)    (12.4)
For exponential problems Eq. (12.4) is likely to be satisfied. Similar generic considerations
can be made for other operators, but a realistic computation can only be
performed once the problem and the algorithms to solve it are given.
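The balance in (12.4) can be made tangible with a toy computation (ours; the constants hidden by the O(·) notation are simply dropped, so the numbers are only indicative):

import math

def abstraction_pays_off(n, m, H, K):
    """Check inequality (12.4) for an exponential ground algorithm
    Cg = O(e^n): is e^n >= e^(n-m) + 2n + (H + K) n^2 (constants dropped)?"""
    c_g = math.exp(n)
    c_a = math.exp(n - m)
    overhead = 2 * n + (H + K) * n ** 2
    return c_g >= c_a + overhead

print(abstraction_pays_off(n=30, m=5, H=3, K=3))  # True: the gain dwarfs the overhead
print(abstraction_pays_off(n=5, m=1, H=3, K=3))   # False: the overhead dominates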
Even if the issue of saving (be it computational, cognitive, or other) arises in
most fields where abstraction applies, we have provided examples of fields where
abstraction not only plays a fundamental role, but also interacts with complex phenomena
such as the emergence of phase transitions in computation. In principle,
working in an abstract space looks very promising as a way to circumvent the negative effects
that these phenomena generate, but the effective use of abstraction toward this goal
is still at a preliminary stage.
The KRA model of abstraction that we have described lends itself to improvements
and extensions. One possible direction is to consider a stochastic environment,
where the application of an operator does not generate a deterministic abstract state,
but only a probability distribution over a subset of states. A brief mention of this
was made when considering probabilistic complexity measures in Chap. 10. The
extension proposed by Ouyang, Sun, and Wang adds to the description frame Γ an
ontology, which allows different types of objects to be abstracted in a controlled way,
and has been applied with success to problems of model-based diagnosis. Finally,
Hendriks modifies the structure of the model, keeping its essential ideas, by performing
abstraction before defining a language.
Chapter 13
Conclusion
The variety of roles played by abstraction in many fields of science, art, and life shows
how pervasive, multifaceted, and elusive it may be. Nonetheless, the investigation
and comparison of these roles also revealed a largely shared basic understanding
of the notion, which can be broadly synthesized as a mechanism for changes of
representation aimed at simplicity. Illustrative examples of such changes of representation
are countless, and it is clear, by now, that they are a double-edged sword: if
well chosen they may lead to a dramatic increase in problem solving performance,
but, if not, they may even be severely harmful.
All the preceding considerations, together with the strong task-dependency of
abstraction effectiveness, called for a model of abstraction capable of going
beyond the idea of logical mappings (be they semantic or syntactic ones), to become
closer to the world, where, in order to solve a problem, observations are to be made
and knowledge has to be elicited. Mappings between formal languages then become
ancillary with respect to the idea of abstraction acting as a set of (physical and
conceptual) "lenses" with different resolutions, used to take in the world.
The KRA model, which we propose, is limited in scope, because it is mainly
targeted at those applications where the interaction with the environment (be it natural
or artifactual) plays a primary role; but, for those applications, it offers "concrete"
tools to perform abstraction in practice, in the form of a library of readily available
abstraction operators. As we have shown, in this type of application some of the
problems encountered in logical definitions of abstraction (such as inconsistency
in the abstract space) lose much of their harm, because they come down to simply
acknowledging that abstraction, by hiding information, does not put us in a state of
contradiction, but of ignorance.
An important aspect of the view of abstraction captured by KRA is that moving
across abstraction levels should be easy, in order to support trying different abstrac-
tions, when solving a problem. For this reason, all the hidden information ought to
be memorized during the process of abstracting, so that it can be quickly retrieved.
Finally, only transformations generated by a precisely defined set of abstraction
operators are considered in the model. This is done to avoid the costly process of
checking the more-abstract-than relation on pairs of generic configurations.
In order to make the model widely applicable, many operators have been introduced,
covering both abstraction and approximation. At first glance, they may appear
complicated, especially if contrasted with the neat logical theories previously proposed.
However, if abstraction operators are to be routinely used in real-world
domains, they must cope with the wealth of details that this implies. In order to
facilitate the task of a user, we took a layered approach, borrowed from the theory of
Abstract Data Types: high level templates describe broad classes of operators, specifying
their general aspects, intended use, effects, and computational characteristics. A
user can then be directed toward the class of operators that most likely suits his/her
problem. Once the class is chosen, a specific operator is identified and applied. Again,
the information about the operator is embedded: first "what" the operator does is
described, and then "how" it does it. The "how" is a method, i.e., a program that
implements all the transformations that the operator has to perform. The set of introduced
operators is intended to formalize many abstractions that were, implicitly or
explicitly, already present in the literature of several fields.
The operators that we have implemented are only the central core of a possible
collection, because they are domain-independent and, hence, may not be as
effective as domain-dependent operators could be. Practitioners using abstraction in
various domains are welcome to add and share new operators.
Furthermore, following a precise characterization of both abstraction and approximation,
it was also possible to define some approximation operators. Even though
reformulation could also be precisely characterized, no operator has been proposed for it,
because reformulation may be a complex process, formalizable only contextually.
The grounded approach to abstraction that we have contributed to trace still leaves open
fundamental questions. The most important is how to link the task to be performed to
the kind of abstraction operator that is best suited to it. As we have seen, the structure
of the query may sometimes suggest what details can be overlooked and/or what
information can be synthesized, but, in general, the process of selecting a "good"
abstraction is still a matter of art. In some fields, such as human vision, our brain
has evolved extremely powerful abstraction techniques; in fact, we are continuously
abstracting sets of pixels into meaningful "objects" in such an effortless way that we
are not even aware of doing it. The investigation of human abstraction mechanisms
could be extremely useful for the design of artificial vision systems.
Regarding the problem of selecting a good abstraction, the definition of a wide
set of operators may ease (even though not solve) the problem. In fact, the introduction
of a single framework, in which very different operators are unified and treated
uniformly, allows a systematic and automatic search in the space of possible abstractions,
without requiring the user to design and implement different operators manually.
In other words, the approaches used in Machine Learning for
feature selection could be extended to include other types of abstraction operators in
the same loop; for instance, feature selection, construction, and discretization could
be tried inside the same search. This is made possible by the uniform representation of all
the abstraction operators.
Another relevant question concerns the study of the properties that abstraction
operators ought to preserve across abstraction spaces. For instance, in Machine Learning
it would be very useful to design generality-preserving abstraction operators. The
study of this topic is complicated by the fact that useful properties are operator- and
task-dependent, so that it is not possible to obtain general results. Fortunately,
the identification of the properties to be preserved has to be done just once.
An exciting direction of research also includes the automatic change of representation
by composing abstraction and reformulation operators.
However, the most challenging task is to learn “good” abstractions. Learning an
abstraction should not be reduced to the problem of searching for a good operator.
On the contrary, learning a good abstraction should imply that the found operator
(a) is useful for a number of different tasks (according to the principle of cognitive
economy), and (b) its subsequent applications should become automatic, as soon
as the applicability conditions are recognized, without any further search. A typical
example of learning an aggregation operator occurs in human vision: when a matrix
of pixels arrives at our retina, we “see” in it known objects without any conscious
search; this very effective image processing is the result of a possibly long process
of learning and evolution, in which different abstraction operators have been tried,
and those that proved to be useful in each new task have been reinforced.
Certainly, studying abstraction both per se and in applications is one of the most
challenging directions of research in Artificial Intelligence and Complex Systems.
Significant results in the field would not only allow more efficient artifacts and models
to be built, but would also bring a better understanding of human intelligence and common sense.
Appendix A
Concrete Art Manifesto
In 1930 the Dutch painter Theo van Doesburg (a pseudonym of Christian
Emil Marie Küpper) published the Manifesto for Concrete Art, advocating the total
freedom of art from the need to describe or represent natural objects or sentiments.
The Manifesto is reported in Fig. A.1.
The translation of the Manifesto is the following one:
BASIS OF CONCRETE PAINTING
We say:
1. Art is universal.
2. A work of art must be entirely conceived and shaped by the mind before its exe-
cution. It shall not receive anything of nature’s or sensuality’s or sentimentality’s
formal data. We want to exclude lyricism, drama, symbolism, and so on.
3. The painting must be entirely built up with purely plastic elements, namely sur-
faces and colors. A pictorial element does not have any meaning beyond “itself”;
as a consequence, a painting does not have any meaning other than “itself”.
4. The construction of a painting, as well as that of its elements, must be simple and
visually controllable.
5. The painting technique must be mechanic, i.e., exact, anti-impressionistic.
6. An effort toward absolute clarity is mandatory.
Carlsund, Doesburg, Hélion, Tutundjian and Wantz.
Appendix B
Cartographic Results for Roads

This appendix shows, in Fig. B.1, some more roads and their different representations
with and without abstraction. The representations result from:
• a direct symbolization (initial),
• the cartographic result produced by the hand-crafted expert system GALBE,
specifically developed to generalize roads [389, 391],
• the result produced by the set of rules obtained by learning without abstraction,
• the result produced by the set of rules obtained by combining learning and abstraction.
Fig. B.1 Different road generalization results, for different roads. The improvements brought by
abstraction are clearly visible
Appendix C
Relational Algebra
A relation (or table) T over the domains X1 , X2 , . . . , Xn is a subset of their Cartesian product:

T ⊆ X1 × X2 × · · · × Xn    (C.1)
PhD
ID   SURNAME  AGE
23   Smith    38
40   Adams    39
132  Ross     32

MANAGERS
ID   SURNAME  AGE
72   Adams    50
40   Adams    39
132  Ross     32

Fig. C.1 Given the tables corresponding to the relations R1 = PhD and R2 = MANAGERS, we
can construct the tables PhD ∪ MANAGERS, PhD ∩ MANAGERS, and PhD − MANAGERS

Union
Given two relations R1 and R2 of the same arity, the union R = R1 ∪ R2 is a
relation containing the tuples occurring in at least one of the relations R1 and R2 .

Intersection
Given two relations R1 and R2 of the same arity, the intersection R = R1 ∩ R2 is a
relation obtained by only keeping the tuples occurring in both relations R1 and R2 .
Set difference
Given two relations R1 and R2 of the same arity, the difference S = R1 − R2 is
obtained by eliminating from R1 the tuples that occur in R2 .
In Fig. C.1 examples of the Union, Intersection, and Set Difference operators are
reported.
Cartesian product
Let R1 and R2 be two relations of arity n and m, respectively. The Cartesian product
R = R1 ×R2 is a relation of arity n+m, whose tuples have been obtained by chaining
one tuple of R1 with one tuple of R2 in all possible ways.
Projection
Let R1 and R2 be two relations of arity n and m, respectively, with n > m; the relation
R2 will be called a projection of R1 if it can be generated by taking the distinct tuples
obtained by deleting a choice of (n − m) columns in R1 . The projection is formally
written as R2 = πi1 ,i2 ,...,im (R1 ), where i1 , i2 , . . . , im denote the columns of R1 that
are to be kept in R2 .
Selection
Let R be a n-ary relation. A selection S = σθ (R) is obtained by selecting all tuples
in R satisfying a condition θ stated as a logical formula, built up using the usual
connectives ∧, ∨, ¬, the arithmetic predicates <, >, =, ≤, ≥ and the values of the
tuple’s components.
PhD
ID   SURNAME  AGE
23   Smith    38
40   Adams    39
132  Ross     32

LOCATION
CITY     REGION
Rome     Lazio
Milan    Lombardia
Bergamo  Lombardia

Fig. C.2 Given the relations R1 = PhD and R2 = LOCATION, the Cartesian product of R1 and
R2 contains 9 tuples, obtained by concatenating each tuple in R1 with each tuple in R2 . Relation
Proj-PhD is the projection of relation PhD over the attributes SURNAME and AGE, i.e., Proj-
PhD = π_SURNAME,AGE (PhD). Finally, relation Sel-PhD is obtained by selection from PhD, and
contains the tuples that satisfy the condition AGE ≤ 38, i.e., Sel-PhD = σ_AGE≤38 (PhD)
FATHERHOOD
FATHER  CHILD
John    Ann
Stuart  Jeanne
Robert  Albert

R-FATHERHOOD
PARENT  CHILD
John    Ann
Stuart  Jeanne
Robert  Albert

Fig. C.3 Given the relation R = FATHERHOOD, we can rename attribute FATHER as PARENT ,
obtaining the new relation R-FATHERHOOD, i.e., R-FATHERHOOD = ρ_PARENT←FATHER (R)
In Fig. C.2 examples of the Cartesian product, Projection, and Selection operators
are reported.
Renaming
If R is a relation, then R(B ← A) is the same relation, where column A is now named
B. The renaming operation is denoted by R(B ← A) = ρB←A (R). In Fig. C.3 an
example of the Renaming operator is reported.
Natural-join
Let R and S be two relations of arity n and m, respectively, such that k columns
A1 , A2 , . . . , Ak in S have the same name as in R. The natural join Q = R ⋈ S is the
(n + m − k)-ary relation defined as:

π_{A1,A2,...,A(n+m−k)} σ_{R.A1=S.A1 ∧ R.A2=S.A2 ∧ ··· ∧ R.Ak=S.Ak} (R × S).
Fig. C.4 Given the two relations AFFILIATION and RESEARCH, their natural join is obtained by
considering all tuples that have the UNIVERSITY value in common
In other words, each tuple of Q is obtained by merging two tuples of R and S such
that the corresponding values of the shared columns are the same.
In Fig. C.4 an example of the Natural-Join operator is reported.
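These operators are straightforward to emulate. The following minimal Python sketch is ours, not part of the book: relations are sets of tuples, columns are addressed by position rather than by name (a simplification of the named-column algebra above), and the PhD data of Fig. C.1 are reused.

def selection(R, theta):
    """sigma_theta(R): keep the tuples satisfying the predicate theta."""
    return {t for t in R if theta(t)}

def projection(R, columns):
    """pi_columns(R): keep only the chosen columns (duplicates collapse)."""
    return {tuple(t[i] for i in columns) for t in R}

def natural_join(R, S, common):
    """R |x| S on the column pairs in `common` = [(i_R, i_S), ...]."""
    drop = {j for _, j in common}
    return {r + tuple(s[j] for j in range(len(s)) if j not in drop)
            for r in R for s in S
            if all(r[i] == s[j] for i, j in common)}

PhD = {(23, "Smith", 38), (40, "Adams", 39), (132, "Ross", 32)}
print(selection(PhD, lambda t: t[2] <= 38))   # Smith and Ross (Sel-PhD)
print(projection(PhD, (1, 2)))                # SURNAME, AGE pairs (Proj-PhD)
R = {("Smith", "Cambridge")}; S = {("Cambridge", "AI")}
print(natural_join(R, S, [(1, 0)]))           # {('Smith', 'Cambridge', 'AI')}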
Appendix D
Basic Notion of First Order Logics
In this appendix we recall the basic notions of First Order Logic (FOL),
in particular those that have been used in this book. Readers interested in a deeper
understanding of the topic can find excellent introductions in many textbooks (see,
for instance, [496, 545]).
First Order Logic (also known as First Order Predicate Calculus) is a language
used in Mathematics, Computer Science, and many other fields for describing formal
reasoning. It is an extension of Propositional Logic with the manipulation of variables.
The definition of a logical language has two parts, namely the syntax of the language
and its semantics.
D.1 Syntax
D.1.1 Formulas
Logical formulas are expressions built over the dictionary defined by the logical
and non-logical symbols. Well-formed formulas (wffs) are the ones with the syntax
recursively defined in the following. We must define terms and formulas.
Terms
• A constant is a term.
• A variable is a term.
• If f is a function symbol of arity n and t1 , . . . , tn are terms, f (t1 , t2 , . . . , tn ) is a
term.
Formulas
• If p is a predicate symbol of arity n, and t1 , t2 , . . . , tn are terms, then p(t1 , t2 , . . . , tn )
is an atomic formula.
• If ϕ1 and ϕ2 are formulas, (ϕ1 ∧ ϕ2 ), (ϕ1 ∨ ϕ2 ), (ϕ1 → ϕ2 ), are formulas.
• If ϕ is a formula, then ¬ϕ is a formula.
• If ϕ is a formula and x is a variable occurring in ϕ, then ∀x(ϕ) and ∃x(ϕ) are
formulas.
Only expressions that can be obtained by finitely many applications of the above
rules are formulas.
D.2 Semantics
1 With x/A, y/B, z/C we mean that the variables x, y, and z are replaced by the constant values A,
B, and C, respectively.
Fig. D.1 Semantics of logical connectives AND (∧), OR(∨), NOT (¬), Implication (→), and
BI-Implication (↔)
true iff the tuple (B, C) belongs to the table defining the binary relation Rp , associated
to predicate p.
The truth of complex formulas can be evaluated in a universe U by combining the
truth of the single atomic formulas according to the classical semantics of the logical
connectives (see Fig. D.1). For instance, the formula ϕ(x, y) = q(x/A)∧p(x/A, y/B)
is true iff A belongs to relation Rq and (A, B) belongs to relation Rp .
By referring to the truth tables reported in Fig. D.1, it is easy to prove that, among
the five connectives ∧, ∨, ¬, →, and ↔, only three of them are essential because
implication and bi-implication can be expressed as a combination of the others. For
instance, formula ϕ → ψ (ϕ implies ψ), is semantically equivalent to ¬ϕ ∨ ψ,
while formula ϕ ↔ ψ (ϕ implies ψ and ψ implies ϕ) is semantically equivalent to
(¬ϕ ∨ ψ) ∧ (¬ψ ∨ ϕ).
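As a quick sanity check (ours, not from the book), these semantic equivalences can be verified mechanically by enumerating all truth assignments; a minimal Python sketch:

from itertools import product

# truth tables of the connectives of Fig. D.1
AND  = lambda p, q: p and q
OR   = lambda p, q: p or q
NOT  = lambda p: not p
IMPL = {(True, True): True,  (True, False): False,
        (False, True): True, (False, False): True}
BIIM = {(p, q): IMPL[(p, q)] and IMPL[(q, p)]
        for p, q in product([True, False], repeat=2)}

# -> and <-> are definable from the other three connectives
for p, q in product([True, False], repeat=2):
    assert IMPL[(p, q)] == OR(NOT(p), q)
    assert BIIM[(p, q)] == AND(OR(NOT(p), q), OR(NOT(q), p))
print("ok: -> and <-> reduce to AND, OR, NOT")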
In wffs quantifiers can be nested arbitrarily. However, it can be proved that any wff
can be syntactically transformed, while preserving its semantics, in such a way that
all quantifiers occur only at the outermost level. This syntactic form
is called prenex form. Moreover, the existential quantifier can be eliminated by
introducing so-called Skolem functions.
The prenex form of a formula can be a universally quantified formula of the
type ∀x1 ,x2 ,...,xn . ϕ(x1 , x2 , ..., xn ), where ϕ is a formula with only free variables,
built up by means of the connectives ∧, ∨, ¬, and, possibly, → and ↔. Finally,
any formula built up through the connectives ∨, ∧, and ¬ can be represented in
Conjunctive Normal Form (CNF), i.e., as a conjunction of disjunctions of atoms
(literals). In particular, any FOL sentence can always be written as follows:

∀x1 , x2 , . . . , xn . (L11 ∨ · · · ∨ L1k1) ∧ (L21 ∨ · · · ∨ L2k2) ∧ · · · ∧ (Lm1 ∨ · · · ∨ Lmkm)    (D.1)

where Lij denotes a positive or negative literal, with any subset of the variables
x1 , x2 , ...., xn as arguments.
Form (D.1) is usually referred to as clausal form (the word clause denotes a
disjunction of literals), and is the one most widely used for representing knowledge
in Relational Machine Learning.
For the sake of simplicity, notation (D.1) is usually simplified as follows:
• Universal quantification is implicitly assumed, and the quantifier symbol is omit-
ted.
• Symbol ∧ denoting conjunction is replaced by “,” or implicitly assumed.
Horn clauses. A Horn clause is a clause with at most one positive literal. Horn clauses
are named after the logician Alfred Horn [262], who investigated the mathematical
properties of similar sentences in the non-clausal form of FOL. The general form of
a Horn clause is then:

¬L1 ∨ ¬L2 ∨ . . . ∨ ¬Lk−1 ∨ Lk ,    (D.2)

which is logically equivalent to the implication L1 ∧ L2 ∧ · · · ∧ Lk−1 → Lk .
Horn clauses play a basic role in Logic Programming [299] and are important for
Machine Learning [382]. A Horn clause with exactly one positive literal is called a
definite clause. A definite clause with no negative literals is also called a fact.
DATALOG. DATALOG is a subset of the Horn clause language designed for querying
databases. It imposes several further restrictions on the clausal form:
• It disallows complex terms as arguments of predicates: only constants and variables
can be terms of a predicate.
• Variables are range restricted, i.e., each variable in the conclusion of a clause must
also appear in a non-negated literal in the premise.
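To make these restrictions concrete, here is a minimal sketch (ours, not the book's) of naive bottom-up DATALOG evaluation in Python; the encoding of rules as (head, body) pairs and the ancestor example are illustrative assumptions:

def match_body(body, known, binding):
    """Enumerate all variable bindings satisfying every literal of the body."""
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fpred, consts in known:
        if fpred != pred or len(consts) != len(args):
            continue
        b = dict(binding)
        if all(b.setdefault(a, c) == c for a, c in zip(args, consts)):
            yield from match_body(rest, known, b)

def datalog_fixpoint(facts, rules):
    """Naive bottom-up evaluation of DATALOG rules. A rule is (head, body),
    where head = (pred, vars) and body = [(pred, vars), ...]; facts are pairs
    (pred, tuple-of-constants). Range restriction guarantees that every head
    variable gets bound by the body."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            derived = {(head[0], tuple(b[v] for v in head[1]))
                       for b in match_body(body, known, {})}
            if not derived <= known:
                known |= derived
                changed = True
    return known

facts = {("parent", ("ann", "bob")), ("parent", ("bob", "cri"))}
rules = [(("anc", ("X", "Y")), [("parent", ("X", "Y"))]),
         (("anc", ("X", "Y")), [("parent", ("X", "Z")), ("anc", ("Z", "Y"))])]
print(sorted(f for f in datalog_fixpoint(facts, rules) if f[0] == "anc"))
# [('anc', ('ann', 'bob')), ('anc', ('ann', 'cri')), ('anc', ('bob', 'cri'))]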
Appendix E
Abstraction Operators
All the operators that we have defined so far are summarized in Table E.1. They
are grouped according to the elements of the description frame they act upon, and
to their abstraction mechanism. Even though there is quite a large number of them,
several operators can be "technically" applied in the same way, exploiting synergies.
For instance, equating values of a variable can be implemented with the same code for
attributes, argument values in functions and relations, and in a function's co-domain.
Nevertheless, we have kept them separate, because they differ in meaning, and also
in the impact they have on the Γ 's.
As was said at the beginning, the listed operators are defined at the level of
description frames, because they correspond to abstracting the observations that
are obtained from the sensors used to analyze the world. To each one of them a
corresponding method is associated, which acts on specific P-Sets according to
rules that guide the actual process of abstraction.
Table E.1 Summary of the elementary abstraction and approximation operators, classified accord-
ing to the elements of the description frame they act upon and their mechanism
Operators           Elements                                          Arguments                     Values
Hiding              ωhobj, ωhtype, ωhattr, ωhfun, ωhrel               ωhfunarg, ωhrelarg            ωhattrval, ωhfunargval, ωhfuncodom, ωhrelargval
Equating            ωeqobj, ωeqtype, ωeqattr, ωeqfun, ωeqrel          ωeqfunarg, ωeqrelarg          ωeqattrval, ωeqfunargval, ωeqfuncodom, ωeqrelargval
Building hierarchy  ωhierattr, ωhierfun, ωhierrel, ωhiertype                                        ωhierattrval, ωhierfuncodom
Combining           ωcoll, ωaggr, ωgroup                                                            ωconstr
Approximating       ρrepl, ρidobj, ρidtype, ρidattr, ρidfun, ρidrel   ρrepl, ρidfunarg, ρidrelarg   ρrepl, ρidattrval, ρidfunargval, ρidfuncodom, ρidrelargval
If X^(g) = ΓTYPE^(g) and y = t, type t can no longer be observed in a system, and
objects that were previously of this type become of type obj. We define:

ωhtype (t) ≝ ωh (ΓTYPE^(g), t)

and we obtain:

ΓTYPE^(a) = ΓTYPE^(g) − {t} ∪ {obj}
If X^(g) = ΓF^(g), y = (fh , CD(fh )), v ∈ CD(fh ), then the operator

ωhfuncodom (fh , CD(fh ), v) ≝ ωh (ΓF^(g), (fh , CD(fh )), v)

removes the value v from the codomain of fh . Then an abstract function fh^(a) is created,
whose codomain is given by:

CD(fh^(a)) = CD(fh ) − {v}

and

ΓF^(a) = ΓF^(g) − {fh } ∪ {fh^(a)}
For instance, let us consider the function Price, with codomain CD(Price) =
{cheap, moderate, fair, costly, very-costly}; if we want to remove the
value very-costly, we have to specify, in the associated method,
what happens to those tuples in FCOV (fh^(a)) that contain v. One possibility is that
the value is turned into UN.
If X^(g) = ΓR^(g) and y^(a) = R^(a), the operator makes indistinguishable all relations R
satisfying ϕeq (R1 , . . . , Rk ). Let

ΓR,eq = {(R1 , . . . , Rk ) | ϕeq (R1 , . . . , Rk )}

be the set of indistinguishable relations. We define:

ωeqrel (ϕeq (R1 , . . . , Rk ), R^(a)) ≝ ωeqelem (ΓR^(g), ϕeq (R1 , . . . , Rk ), R^(a))

The operator ωeqrel (ϕeq (R1 , . . . , Rk ), R^(a)) first generates the set ΓR,eq , obtaining:

ΓR^(a) = ΓR^(g) − ΓR,eq ∪ {R^(a)}

It is the method metheqrel (Pg , ϕeq (R1 , . . . , Rk ), R^(a)) that specifies how the cover
of R^(a) has to be computed.
As an example, let us suppose that the set of relations to be made indistinguishable
is defined extensionally, as in the case of functions. For instance, let

ΓR,eq = {RIsMotherof , RIsStepMotherof }

where:

RIsMotherof ⊆ Γwomen^(g) × Γpeople^(g)
RIsStepMotherof ⊆ Γwomen^(g) × Γpeople^(g)

If we state the equivalence between the two relations, we may keep only R^(a) in place
of the two. Again, the method metheqrel (Pg , ϕeq (R1 , . . . , Rk ), R^(a)) shall specify how
the cover RCOV (R^(a)) must be computed.
If X^(g) = ΓF^(g), Y = (fh , CD(fh )), Veq ⊆ CD(fh ), then the operator equates the values
in Veq of the codomain of function fh , setting them all equal to v^(a). We define:

ωeqfuncodom (fh , CD(fh ), Veq , v^(a)) ≝ ωeqval (ΓF^(g), (fh , CD(fh )), Veq , v^(a))

Then:

ΓF^(a) = ΓF^(g) − {fh } ∪ {fh^(a)}

Method metheqfuncodom (Pg , (fh , CD(fh )), Veq , v^(a)) handles the cover of fh^(a) by
replacing in FCOV (fh^(a)) all occurrences of members of Veq with v^(a).
For the sake of exemplification, let us consider a gray-level picture, in which the
attribute Intensity of a pixel x can take on a value in the integer interval [0, 255]. Let
τ be a threshold, such that:

I^(a)(x) = 255 if I(x) > τ, and I^(a)(x) = I(x) otherwise.    (E.1)

In Eq. (E.1) all values greater than the threshold are considered equivalent. An example
is reported in Fig. E.1.
Fig. E.1 Example of method metheqfuncodom (Pg , (fh , CD(fh )), Veq , v^(a)). The picture on the left
is a 256-level gray picture. By a thresholding operation, all pixels whose intensity is greater than τ
are considered white
If X^(g) = ΓA^(g), Y = ∅, Ychild = ΓA,child^(g), and y^(a) = (A^(a), Λ^(a)), then the operator
works on an attribute hierarchy, where a set of nodes, those contained in ΓA,child^(g),
are replaced by (A^(a), Λ^(a)). We define:

ωhierattr (ΓA,child^(g), (A^(a), Λ^(a))) ≝ ωhier (ΓA^(g), ΓA,child^(g), (A^(a), Λ^(a)))

and we obtain:

ΓA^(a) = ΓA^(g) − ΓA,child^(g) ∪ {(A^(a), Λ^(a))}.

The method methhierattr (Pg , ΓA,child^(g), (A^(a), Λ^(a))) states how the values in Λ^(a)
must be derived from those in the domains of the attributes in ΓA,child^(g).
As an example, let us consider the attributes Length and Width. We introduce
the abstract attribute LinearSize^(a), such that Length is-a LinearSize^(a) and
Width is-a LinearSize^(a). We have then ΓA,child = {Length, Width} and A^(a) =
LinearSize^(a). The values of the attribute LinearSize^(a) are still to be defined; for instance,
we may assume that, for an object x, the value of LinearSize^(a)(x) is computed from
Length(x) and Width(x).
If X^(g) = ΓR^(g), Y = ∅, Ychild = ΓR,child^(g), and y^(a) = R^(a), then the operator works
on a relation hierarchy, where a set of nodes, those contained in ΓR,child^(g), are replaced
by R^(a). We define:

ωhierrel (ΓR,child^(g), R^(a)) ≝ ωhier (ΓR^(g), ΓR,child^(g), R^(a))

and we obtain:

ΓR^(a) = ΓR^(g) − ΓR,child^(g) ∪ {R^(a)}

The method methhierrel (Pg , ΓR,child^(g), R^(a)) states how the cover of R^(a) must be
computed starting from those of the relations in ΓR,child^(g).

As an example, let RHorizAdjacent ⊆ ΓO^(g) × ΓO^(g) and RVertAdjacent ⊆ ΓO^(g) × ΓO^(g)
be two relations over pairs of objects. The former is verified when two objects touch
each other horizontally, whereas the latter is verified when two objects touch each
other vertically. We introduce the abstract relation RAdjacent^(a) ⊆ ΓO × ΓO , which does
not distinguish the modality (horizontal or vertical) of the adjacency. In this case we
have ΓR,child = {RHorizAdjacent , RVertAdjacent } and the new relation R^(a) = RAdjacent^(a).
Operator ωhierrel^(Ψ) (Pg , ΓR,child^(g), R^(a)) will establish that, for instance:

RCOV (RAdjacent^(a)) = RCOV (RHorizAdjacent ) ∪ RCOV (RVertAdjacent )
If X^(g) = ΓTYPE^(g) and y^(a) = t^(a), the operator makes all types satisfying ϕid (t1 , . . . , tk )
indistinguishable. Then type t^(a) is applied to all objects in the equivalence class.
We define:

ωidtype (ϕid (t1 , . . . , tk ), t^(a)) ≝ ωidelem (ΓTYPE^(g), ϕid (t1 , . . . , tk ), t^(a))

The operator ωidtype (ϕid (t1 , . . . , tk ), t^(a)) first generates the set ΓTYPE,id of indistinguishable
types, and then it applies t^(a) to the obtained class. All types in ΓTYPE,id
become t^(a), obtaining:

ΓTYPE^(a) = ΓTYPE^(g) − ΓTYPE,id ∪ {t^(a)}
It is the method methidtype (Pg , ϕid (t1 , . . . , tk ), t^(a)) that specifies what properties
are to be assigned to t^(a), considering those of the equated types. For instance, if
the types in ΓTYPE,id have different sets of attributes, t^(a) could have the intersection
of these sets, or the union, by setting some values to NA, depending on the choice of
the user.
As an example, we can consider the types chair and armchair and we can
equate them to be both chair^(a).
If X^(g) = ΓA^(g), Y = (A, ΛA ), and Vid = ΛA,id ⊆ ΛA , then the operator makes
indistinguishable a subset ΛA,id of the domain ΛA of A. We define:

ωidattrval ((A, ΛA ), ΛA,id , v^(a)) ≝ ωidval (ΓA^(g), (A, ΛA ), ΛA,id , v^(a))

We obtain an approximate attribute A^(a) such that ΛA^(a) = ΛA − ΛA,id ∪ {v^(a)}, and

ΓA^(a) = ΓA^(g) − {(A, ΛA )} ∪ {(A^(a), ΛA^(a))}
For the sake of exemplification, let us consider an attribute, say Color, which
takes values in the set:
{white, yellow, olive-green, sea-green, lawn-green, red,
pink, light-green, dark-green, blue, light-blue, aquamarine,
orange, magenta, cyan, black}.
We might consider equivalent all the shades of green, and identify them with
v^(a) = sea-green. In this case, the true shade of green is no longer known (see
Fig. E.2).
As another important example, let us consider the discretization of real numbers.
Let us consider the interval [0, 100), and let us divide it into 10 subintervals
{[10k, 10(k + 1)) | 0 ≤ k ≤ 9}. Numbers falling inside one of the intervals are
all considered equal to the mean value 10(k + 0.5).
Fig. E.2 Application of method meth(Pg , ωeqattrval ((Color, ΛColor ), Vid , v^(a))) to the figure
on the left. Let Vid = {olive-green, sea-green, lawn-green, light-green,
dark-green}. Objects o1 , o2 , and o3 have color dark-green, lawn-green, and
sea-green, respectively. After equating all shades of green to sea-green, the color of all
three objects becomes sea-green. [A color version of the figure is reported in Fig. H.16 of
Appendix H]
In Chap. 7 the methods associated to some operators have been described. In this
section we give some additional examples, whereas the complete set of methods is
provided in the book’s companion site.
Let us consider the operators that hide an attribute, a function, or a relation,
i.e., ωhattr ((Am , Λm )), ωhfun (fh ), and ωhrel (Rk ). Hiding an attribute, a function, or
a relation are all instantiations of the same PDT introduced in Sect. 7.2.1, and so
we group them together in Table E.2, whereas their bodies are reported in Tables E.3,
E.4, and E.5, respectively.
Also at the description level the operators ωhattr ((Am , Λm )), ωhfun (fh ), and
ωhrel (Rk ) are similar; in fact, they simply hide the concerned element (attribute,
function, or relation) from the appropriate set, as was illustrated in Sect. 7.2.1.
But when we must apply them to a specific Pg , some complications may arise. Let us
look first at Table E.2.
Operator ωhattr ((Am , Λm )) hides the attribute Am from the set of available ones and,
as a consequence, meth(Pg , ωhattr (Am , Λm )) hides the value of that attribute in each
object in Pg . Hiding an attribute may cause the descriptions of some objects to become
identical. However, as each object has a unique identity, they remain distinguishable.
Since neither functions nor relations can have an attribute as an argument, removing
Am does not have any further effect. As for the hidden information, it is not necessary
to store all the tuples hidden in Ag , but only the value of Am for each object.
Table E.2 Summary of methods meth(Pg , ωhattr (Am , ΛM )), meth(Pg , ωhfun (fh )), and
meth(Pg , ωhrel (Rk ))
NAME meth(Pg , ωhattr ) meth(Pg , ωhfun ) meth(Pg , ωhrel )
INPUT Pg , (Am , Λm ) Pg , f h Pg , R k
OUTPUT Pa Pa Pa
APPL-CONDITIONS Am ∈ Ag fh ∈ F g Rk ∈ Rg
PARAMETERS ∅ ∅ ∅
MEMORY Δ(P ) Δ(P ) Δ(P )
BODY See Table E.3 See Table E.4 See Table E.5
Hiding a function is a simple operation per se, but it may have indirect effects
on the set of functions and relations. In fact, if the co-domain of fh is the set of
objects, there may be in ΓF^(g) or ΓR^(g) some function or relation that has fh as one of
its arguments. Then, by hiding fh , these arguments disappear, and new abstract functions
or relations, with one argument less, are to be defined, thus increasing the degree of
abstraction. Hiding a relation has no side-effects.
Appendix F
Abstraction Patterns
In this appendix two more abstraction patterns are described, for the sake
of illustration. The complete set, corresponding to the full set of operators listed in
Appendix E, can be found in the book's companion Web site.
In Table F.1 the pattern referring to hiding an argument of a function or relation
is reported.
Table F.2 AGGREGATION—Aggregation pattern that forms new objects starting from existing
ones

NAME            AGGREGATION
ALSO KNOWN      In Machine Learning the aggregation operator is known as
                "predicate invention", "predicate construction", or "term
                construction", whereas in Data Mining it is related to "motif
                discovery". In general, it is the basis of the "constructive
                induction" approach to learning. In Planning, Problem
                Solving, and Reinforcement Learning it includes "state
                aggregation" and "spatial and/or temporal aggregation".
GOAL            It aims at working, in any field, with "high-level" constructs
                in the description of data and in theories, in order to reduce
                the computational cost and increase the meaningfulness of
                the results.
TYPICAL         Finding regions and objects in the visual input; representing
APPLICATIONS    physical apparatuses at various levels of detail by introducing
                composite components.
IMPLEMENTATION  Implementing the grouping operator may require complex
ISSUES          algorithms, and the cost of aggregation has to be weighed
                against the advantages of using the abstract representation.
KNOWN USES      Even though not always under the name of abstraction,
                aggregation and feature construction are widely used in
                computer vision, the description of physical systems, Machine
                Learning, Data Mining, and Artificial Intelligence in general.
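As a concrete, if minimal, illustration of the pattern, the following Python sketch replaces pairs of component objects satisfying a grouping condition with a single composite object whose attributes are derived from the components'; all names are illustrative assumptions, not part of the pattern itself.

```python
from itertools import count

_ids = count(1)

# Toy aggregation operator: pairs (a, b) satisfying `condition` are replaced
# by one composite object of type `new_type`; `derive` computes the
# composite's attributes from those of its components.
def aggregate(objects, condition, derive, new_type):
    used, composites = set(), []
    for i, a in enumerate(objects):
        for j in range(i + 1, len(objects)):
            if i in used or j in used or not condition(a, objects[j]):
                continue
            b = objects[j]
            comp = {"id": f"c{next(_ids)}", "type": new_type,
                    "components": [a["id"], b["id"]], **derive(a, b)}
            composites.append(comp)
            used.update((i, j))
    return [o for k, o in enumerate(objects) if k not in used] + composites

blocks = [{"id": "a", "on_top_of": "b", "size": 2},
          {"id": "b", "on_top_of": None, "size": 5}]
towers = aggregate(blocks,
                   condition=lambda x, y: x["on_top_of"] == y["id"],
                   derive=lambda x, y: {"size": x["size"] + y["size"]},
                   new_type="tower")
# -> a single composite "tower" object replaces blocks a and b
```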
Appendix G
Abstraction of Michalski’s “Train” Problem
This appendix details the abstraction of Michalski's “train” problem introduced in
Chap. 9. In Table G.1 the method meth(Pg, ωaggr({car, load}, loadedcar))
is reported.
The parameters, which are listed in Table G.2, specify how objects are actually
aggregated and how attributes and relations change as a consequence.
Finally, Table G.3 describes the actual algorithm performing the aggregation
abstraction.
APPL-CONDITIONS  ∃ c ∈ Ocar
                 ∃ (distinct) ℓ1, . . . , ℓn ∈ Oload
                 c, ℓ1, . . . , ℓn are labelled with the same example
                 (ℓi, c) ∈ RCOV(RInside) (1 ≤ i ≤ n)
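These conditions translate directly into a small test; the sketch below assumes an illustrative data layout (sets of object identifiers, a set of (load, car) pairs for the cover of RInside, and a map from objects to example labels) and is not the pseudo-code of Table G.3.

```python
# Toy applicability test for aggregating a car and the loads inside it into
# one loadedcar object: the car c and the distinct loads l1, ..., ln must
# carry the same example label, with (li, c) in R_COV(R_Inside).
def applicable_groups(cars, loads, inside, example_of):
    groups = []
    for c in cars:
        ls = sorted(l for l in loads
                    if (l, c) in inside and example_of[l] == example_of[c])
        if ls:                        # at least one load inside car c
            groups.append((c, ls))    # candidate loadedcar aggregate
    return groups

cars = {"car1"}
loads = {"load1", "load2"}
inside = {("load1", "car1"), ("load2", "car1")}
example_of = {"car1": "e1", "load1": "e1", "load2": "e1"}
print(applicable_groups(cars, loads, inside, example_of))
# [('car1', ['load1', 'load2'])] -> aggregate into a loadedcar object
```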
Table G.2 Parameters of the method meth(Pg, ωaggr({car, load}, loadedcar))
Table G.3 Pseudo-code for the method meth(Pg, ωaggr({car, load}, loadedcar))
Appendix H
Color Figures

In this appendix, some of the figures appearing in the book are reported
with their original colors.
Fig. H.1 Vasilij Kandinsky, Composition VII, 1913. The Tretyakov Gallery, Moscow
Fig. H.4 Upper (pink + yellow regions) and lower (yellow region) approximations of a concept X
= Oval, defined as a region in the 2D plane
Fig. H.5 The Incas used quipus to memorize numbers. A quipu is a cord with knots that assume
position-dependent values; the example shows the complexity a quipu may reach. (Reprinted with
permission from Museo Larco, Pueblo Libre, Lima, Peru.)
Fig. H.6 a Picture of a poppy field. If we only have this picture, it is impossible to say whether
it is concrete or abstract. b The same picture in black and white. By comparison, the latter is less
informative than the colored one, because the information referring to the color has been removed;
hence, picture b is more abstract than picture a
Fig. H.7 A color picture has been transformed into a black and white one. If the color is to be
added again, there is no clue for performing this addition correctly, unless it is known how the
color was originally removed
Fig. H.8 Abstraction and generalization can be combined in every possible way. In the bottom-left
corner there is a picture of one of the authors, which is specific (only one instance) and concrete (all
the skin, hair, face, ... details are visible). In the bottom-right corner there is a version of the picture
which is specific (only one instance, as the person is still recognizable) and abstract (most details
of the appearance are hidden). In the top-left corner the chimpanzee–human last common ancestor
is represented with many physical details, thus making the picture still concrete; however, many
monkeys and hominids satisfy the same description, so that this is an example of a concrete but
general concept. Finally, in the top-right corner there is a representation of a human head according
to Marr [353] (see Fig. 2.13); the head is abstract (very few details of the appearance) and general
(any person could be an instance)
Fig. H.10 Example of method meth(Pg, ωhattr((Am, Λm))). The attribute Am = Color is hidden
from the left picture, giving a gray-level picture (right). Each pixel shows a value of the light intensity,
but the latter is no longer distributed over the R, G, B channels
Fig. H.11 Example of application of the method meth(Pg, ωhattrval((Color, ΛColor),
turquoise)). The value turquoise is hidden from the left picture; a less colorful picture is
obtained (right), where objects of color turquoise become transparent
Fig. H.12 The Rubik's cube can be described in terms of the 26 small component cubes, which give
rise to the description frame Γ. Each arrangement of the cubes generates a specific configuration
ψ; the configuration set, Ψ, is very large. A configuration is a complete description of the positions
of the small cubes, so that it is unique. If the Rubik's cube is observed only partially, for instance by
looking only at one face, the observation corresponds to many configurations, each one obtained
by completing the invisible faces of the cube in a different way; in this case we have a P-Set P,
which is a set of configurations. The query Q can be represented by a particular configuration to be
reached starting from an initial one
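The distinction drawn in this caption between a configuration and a P-Set can be made tangible with a toy example, a 1-D "cube" of three colored cells (an assumption made purely for illustration): a partial observation determines the set of all configurations compatible with it.

```python
from itertools import product

COLORS = ("R", "G", "B")
PSI = set(product(COLORS, repeat=3))     # the configuration set (27 elements)

# A P-Set: all configurations compatible with a partial observation,
# given as a map from cell index to observed color.
def p_set(observation):
    return {psi for psi in PSI
            if all(psi[i] == col for i, col in observation.items())}

assert len(p_set({0: "R"})) == 9                  # "R??" matches 9 configurations
assert len(p_set({0: "R", 1: "G", 2: "B"})) == 1  # a complete description is unique
```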
Fig. H.13 Application of method meth(Pg, ωaggr((figure, figure), tower)). Objects a and
b are aggregated to obtain object c1, and objects c and d are aggregated to obtain object c2. The
color of c1 is blue, because b is larger than a, whereas the color of c2 is green. Both composite
objects are large. The new object c1 is to the left of c2
Fig. H.14 Examples of four structured objects, used to learn the concept of an “arch”. Each
component has a shape (rectangle or triangle) and a color (blue, red, yellow, or green). They are
linked by two relations, namely Rontop and Radjacent
Fig. H.15 a Part of a map at the 1/25 000 scale. b A 16-fold reduction of the map. c Cartographic
generalization of the map at the 1/100 000 scale. By comparing b and c, the differences between
simply reducing and generalizing are clearly apparent
Fig. H.16 Application of method meth(Pg, ωeqattrval((Color, ΛColor), Vid, v(a))) to the figure
on the left. Let Vid = {olive-green, sea-green, lawn-green, light-green,
dark-green}. Objects o1, o2, and o3 have color dark-green, lawn-green, and
sea-green, respectively. After equating all shades of green to sea-green, the color of all
three considered objects becomes sea-green
References
39. A.G. Barto, S. Mahadevan, Recent advances in hierarchical Reinforcement Learning. Discrete
Event Dyn. Syst. 13, 41–77 (2003)
40. J. Barwise, J. Seligman, Information Flow: The Logic of Distributed Systems (Cambridge
University Press, New York, 1997)
41. M. Basseville, A. Benveniste, K.C. Chou, S.A. Golden, R. Nikoukhah, A.S. Willsky, Modeling
and estimation of multiresolution stochastic processes. IEEE Trans. Inf. Theor. 38, 766–784
(1992)
42. M. Bassok, K. Dunbar, K. Holyoak, Introduction to the special section on the neural substrate
of analogical reasoning and metaphor comprehension. J. Exp. Psychol. Learn. Mem. Cogn.
38, 261–263 (2012)
43. J. Bauer, I. Boneva, M. Kurbán, A. Rensink, A modal-logic based graph abstraction. Lect.
Notes Comput. Sci. 5214, 321–335 (2008)
44. K. Bayer, M. Michalowski, B. Choueiry, C. Knoblock, Reformulating constraint satisfac-
tion problems to improve scalability. in Proceedings of the 7th International Symposium on
Abstraction, Reformulation, and Approximation, (Whistler, Canada, 2007), pp. 64–79.
45. A. Belussi, C. Combi, G. Pozzani, Towards a formal framework for spatio-temporal granular-
ities. in Proceedings of the 15th International Symposium on Temporal Representation and
Reasoning, (Montreal, Canada, 2008), pp. 49–53.
46. P. Benjamin, M. Erraguntla, D. Delen, R. Mayer, Simulation modeling and multiple levels of
abstraction. in Proceedings of the Winter Simulation Conference, (Piscataway, New Jersey,
1998), pp. 391–398.
47. J. Benner, The Ancient Hebrew Lexicon of the Bible (Virtualbookworm, College Station, 2005)
48. C. Bennett, Dissipation, information, computational complexity and the definition of organiza-
tion, in Emerging Syntheses in Science, ed. by D. Pines (Redwood City, USA, Addison-Wesley,
1987), pp. 215–234
49. C. Bennett, Logical depth and physical complexity, in The Universal Turing Machine: A
Half-Century Survey, ed. by R. Herken (Oxford University Press, Oxford, 2011), pp. 227–257.
50. A. Berengolts, M. Lindenbaum, On the performance of connected components grouping. Int.
J. Comput. Vis. 41, 195–216 (2001)
51. A. Berengolts, M. Lindenbaum, On the distribution of saliency. in Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition, (Washington, USA, 2004), pp.
543–549.
52. F. Bergadano, A. Giordana, L. Saitta, Machine Learning: An Integrated Framework and its
Application (Ellis Horwood, Chichester, 1991)
53. G. Berkeley, Of the Principles of Human Knowledge. (Aaron Rahmes for Jeremy Pepyat,
Skynner Row, 1710)
54. S. Bertz, W. Herndon, The similarity of graphs and molecules, in Artificial Intelligence Appli-
cations to Chemistry, ed. by T. Pierce, B. Hohne (ACS, USA, 1986), pp. 169–175
55. T. Besold, Computational Models of Analogy-Making: An Overview Analysis of Compu-
tational Approaches to Analogical Reasoning, Ph.D. thesis (University of Amsterdam, NL,
2011).
56. C. Bessiere, P.V. Hentenryck, To be or not to be ... a global constraint. in Proceedings of the 9th
International Conference on Principles and Practices of Constraint Programming, (Kinsale,
Ireland, 2003).
57. W. Bialek, I. Nemenman, N. Tishby, Predictability, complexity, and learning. Neural Comput.
13, 2409–2463 (2001)
58. M. Biba, S. Ferilli, T. Basile, N.D. Mauro, F. Esposito, Induction of abstraction operators
using unsupervised discretization of continuous attributes. in Proceedings of The International
Conference on Inductive Logic Programming, (Santiago de Compostela, Spain, 2006), pp.
22–24.
59. I. Biederman, Recognition-by-components: a theory of human image understanding. Psychol.
Rev. 94, 115–147 (1987)
60. A. Bifet, G. Holmes, R. Kirkby, B. Pfahringer, Moa: Massive online analysis. J. Mach. Learn.
Res. 99, 1601–1604 (2010)
61. P. Binder, J. Plazas, Multiscale analysis of complex systems. Phys. Rev. E 63, 065203(R)
(2001).
62. J. Bishop, Data Abstraction in Programming Languages, (Addison-Wesley, Reading, 1986).
63. S. Bistarelli, P. Codognet, F. Rossi, Abstracting soft constraints: Framework, properties, exam-
ples. Artif. Intell. 139, 175–211 (2002)
64. A. Blum, P. Langley, Selection of relevant features and examples in machine learning. Artif.
Intell. 97, 245–271 (1997)
65. S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D. Hwang, Complex networks: structure and
dynamics. Phys. Rep. 424, 175–308 (2006)
66. D. Bonchev, G. Buck, Quantitative measures of network complexity, in Complexity in Chem-
istry, Biology, and Ecology, ed. by D. Bonchev, D. Rouvray (Springer, USA, 2005), pp.
191–235
67. I. Boneva, A. Rensink, M. Durban, J. Bauer, Graph abstraction and abstract graph transfor-
mation. Technical report, Centre for Telematics and Information Technology, University of
Twente, Enschede, 2007
68. G. Booch, Object-Oriented Analysis and Design with Applications, (Addison-Wesley, Read-
ing, 2007).
69. M. Botta, A. Giordana, Smart+: A multi-strategy learning tool. in Proceedings of the 13th
International Joint Conference on Artificial Intelligence, (Chambéry, France, 1993), pp. 203–
207.
70. M. Botta, A. Giordana, L. Saitta, M. Sebag, Relational learning as search in a critical region.
J. Mach. Learn. Res. 4, 431–463 (2003)
71. P. Bottoni, L. Cinque, S. Levialdi, P. Musso, Matching the resolution level to salient image
features. Pattern Recogn. 31, 89–104 (1998)
72. I. Bournaud, M. Courtine, J.-D. Zucker, Propositionalization for clustering symbolic rela-
tional descriptions. in Proceedings of the 12th International Conference on Inductive Logic
Programming, (Szeged, Hungary, 2003), pp. 1–16.
73. E. Bourrel, V. Henn, Mixing micro and macro representations of traffic flow: a first theoretical
step. in Proceedings of the 9th Meeting of the Euro Working Group on, Transportation, pp.
10–13 (2002).
74. O. Bousquet, Apprentissage et simplicité. Diploma thesis, (Université de Paris Sud, Paris,
France, 1999), In French.
75. C. Boutilier, R. Dearden, M. Goldszmidt, Exploiting structure in policy construction. in Pro-
ceedings of the 14th International Joint Conference on Artificial Intelligence, (Montréal,
Canada, 1995), pp. 1104–1111.
76. C. Boutilier, R. Dearden, M. Goldszmidt, Stochastic dynamic programming with factored
representations. Artif. Intell. 121, 49–107 (2000)
77. J. Boyan, A. Moore, Generalization in reinforcement learning: safely approximating the value
function. Adv. Neural Inf. Process. Syst. 7, 369–376 (1995)
78. K. Brassel, R. Weibel, A review and conceptual framework of automated map generalization.
Int. J. Geogr. Inf. Syst. 2, 229–244 (1988)
79. N. Bredèche, Y. Chevaleyre, J.-D. Zucker, A. Drogoul, G. Sabah, A meta-learning approach
to ground symbols from visual percepts. Robot. Auton. Syst. 43, 149–162 (2003)
80. H. Brighton, C. Mellish, Advances in instance selection for instance-based learning algo-
rithms. Data Min. Knowl. Discov. 6, 153–172 (2002)
81. A. Brook, Approaches to abstraction: a commentary. Int. J. Educ. Res. 27, 77–88 (1997)
82. R. Brooks, Elephants don’t play chess. Robot. Auton. Syst. 6, 3–15 (1990)
83. R. Brooks, Intelligence without representation. Artif. Intell. 47, 139–159 (1991)
84. V. Bulitko, N. Sturtevant, J. Lu, T. Yau, Graph abstraction in real-time heuristic search. J.
Artif. Intell. Res. 30, 51–100 (2007)
85. N. Busch, I. Fruend, C. Herrmann, Electrophysiological evidence for different types of change
detection and change blindness. J. Cogn. Neurosci. 22, 1852–1869 (2010)
86. T.D.V. Cajetanus, De Nominum Analogia (1498) (Zammit, Rome, Italy, 1934)
87. T. Calders, R. Ng, J. Wijsen, Searching for dependencies at multiple abstraction levels. ACM
Trans. Database Syst. 27, 229–260 (2002)
88. G. Cantor, Contributions to the Founding of the Theory of Transfinite Numbers (Dover Pub-
lications, UK, 1915)
89. R. Cavendish, The Black Arts (Perigee Books, USA, 1967)
90. G. Chaitin, On the length of programs for computing finite binary sequences: statistical con-
siderations. J. ACM 16, 145–159 (1969)
91. D. Chalmers, R. French, D. Hofstadter, High-level perception, representation, and analogy: a
critique of Artificial Intelligence methodology. J. Exp. Theor. Artif. Intell. 4, 185–211 (1992)
92. V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey. ACM Comput. Survey 41,
1–58 (2009)
93. J. Charnley, S. Colton, I. Miguel, Automated reformulation of constraint satisfaction problems.
in Proceedings of the Automated Reasoning Workshop, (Bristol, UK, 2006), pp. 128–135.
94. A. Chella, M. Frixione, S. Gaglio, A cognitive architecture for artificial vision. Artif. Intell.
89, 73–111 (1997)
95. A. Chella, M. Frixione, S. Gaglio, Understanding dynamic scenes. Artif. Intell. 123, 89–132
(2000)
96. A. Chella, M. Frixione, S. Gaglio, Conceptual spaces for computer vision representations.
Artif. Intell. Rev. 16, 87–118 (2001)
97. C. Cheng, Y. Hu, Extracting the abstraction pyramid from complex networks. BMC Bioinform.
11, 411 (2010)
98. Y. Chevaleyre, F. Koriche, J.-D. Zucker, Learning linear classifiers with ternary weights
from metagenomic data. in Proceedings of the Conférence Francophone sur l’Apprentissage
Automatique, (Nancy, France, 2012), In French.
99. L. Chittaro, R. Ranon, Hierarchical model-based diagnosis based on structural abstraction.
Artif. Intell. 155, 147–182 (2004)
100. T. Chothia, D. Duggan, Abstractions for fault-tolerant global computing. Theor. Comput. Sci.
322, 567–613 (2004)
101. B. Choueiry, B. Faltings, R. Weigel, Abstraction by interchangeability in resource allocation.
in Proceedings of the 14th International Joint Conference on Artificial Intelligence, (Montreal,
Canada, 1995), pp. 1694–1701.
102. B. Choueiry, Y. Iwasaki, S. McIlraith, Towards a practical theory of reformulation for reason-
ing about physical systems. Artif. Intell. 162, 145–204 (2005)
103. B. Choueiry, A. Davis, Dynamic bundling: Less effort for more solutions. in Proceedings
of the 5th International Symposium on Abstraction, Reformulation and Approximation,
(Kananaskis, Alberta, Canada, 2002), pp. 64–82.
104. B. Choueiry, G. Noubir, On the computation of local interchangeability in discrete constraint
Satisfaction problems. in Proceedings of the 15th National Conference on Artificial Intelli-
gence, (Madison, USA, 1998), pp. 326–333.
105. J. Christensen, A hierarchical planner that creates its own hierarchies. in Proceedings of the
8th National Conference on Artificial Intelligence, (Boston, USA, 1990), pp. 1004–1009.
106. R. Cilibrasi, P. Vitànyi, Clustering by compression. IEEE Trans. Inform. Theor. 51, 1523–1545
(2005)
107. A. Cimatti, F. Giunchiglia, M. Roveri, Abstraction in planning via model checking. in Proceed-
ings of the 8th International Symposium on Abstraction, Reformulation, and Approximation,
(Asilomar, USA, 1998), pp. 37–41.
108. E. Clarke, B. Barton, Entropy and MDL discretization of continuous variables for Bayesian
belief networks. Int. J. Intell. Syst. 15, 61–92 (2000).
109. E. Codd, Further normalization of the data base relational model. in Courant Computer Science
Symposium 6: Data Base Systems, (Prentice-Hall, Englewood Cliff, 1971), pp. 33–64.
110. W. Cohen, Fast effective rule induction. in Proceedings of the 12th International Conference
on Machine Learning, (Lake Tahoe, USA, 1995), pp. 115–123.
111. T. Colburn, G. Shute, Abstraction in computer science. Minds Mach. 17, 169–184 (2007)
112. E. Colunga, L. Smith, The emergence of abstract ideas: evidence from networks and babies.
Phil. Trans. Roy. Soc. B 358, 1205–1214 (2003)
113. L. Console, D. Theseider-Dupré, Abductive reasoning with abstraction axioms. Lect. Notes
Comput. Sci. 810, 98–112 (1994)
114. S. Cook, The complexity of theorem proving procedures. in Proceedings of the 3rd Annual
ACM Symposium on Theory of Computing, (Shaker Heights, USA, 1971), pp. 151–158.
115. S. Coradeschi, A. Saffiotti, Anchoring symbols to sensor data: preliminary report. in Pro-
ceedings of the 17th National Conference on Artificial Intelligence, (Austin, USA, 2000), pp.
129–135.
116. L. Costa, F. Rodrigues, A. Cristino, Complex networks: the key to systems biology. Genet.
Mol. Biol. 31, 591–601 (2008)
117. P. Cousot, R. Cousot, Basic concepts of abstract interpretation. Build. Inf. Soc. 156, 359–366
(2004)
118. V. Cross, Defining fuzzy relationships in object models: Abstraction and interpretation. Fuzzy
Sets Syst. 140, 5–27 (2003)
119. J. Crutchfield, N. Packard, Symbolic dynamics of noisy chaos. Physica D 7, 201–223 (1983)
120. W. Daelemans, Abstraction considered harmful: lazy learning of language processing. in
Proceedings of 6th Belgian-Dutch Conference on Machine Learning, (Maastricht, NL, 1996),
pp. 3–12.
121. L. Danon, J. Duch, A. Diaz-Guilera, A. Artenas, Comparing community structure identifica-
tion. J. Stat. Mech. Theor. Exp. P09008 (2005).
122. J. Davis, V. Costa, S. Ray, D. Page, An integrated approach to feature invention and model
construction for drug activity prediction. in Proceedings of the 24th International Conference
on Machine Learning, (Corvallis, USA, 2007), pp. 217–224.
123. P. Davis, R. Hillestad, Families of models that cross levels of resolution: issues for design,
calibration and management. in Proceedings of the 25th Conference on Winter Simulation,
(Los Angeles, USA, 1993), pp. 1003–1012.
124. R. Davis, Diagnostic reasoning based on structure and behavior. Artif. Intell. 24, 347–410
(1984)
125. F. de Goes, S. Goldenstein, M. Desbrun, L. Velho, Exoskeleton: curve network abstraction
for 3D shapes. Comput. Graph. 35, 112–121 (2011)
126. J. de Kleer, B. Williams, Diagnosing multiple faults. Artif. Intell. 32, 97–130 (1987)
127. M. de Vries, Engineering science as a “discipline of the particular”? Types of generalization
in Engineering sciences, in Philosophy and Engineering: An Emerging Agenda, ed. by I. van
de Poel, D. Goldberg (Springer, 2010), pp. 83–93.
128. T. Dean, R. Givan, Model minimization in Markov decision processes. in Proceedings of the
National Conference on Artificial Intelligence, (Providence, USA, 1997), pp. 106–111.
129. D. DeCarlo, A. Santella, Stylization and abstraction of photographs. ACM Trans. Graph. 21,
769–776 (2002)
130. M. Dehmer, L. Sivakumar, Recent developments in quantitative graph theory: information
inequalities for networks. PLoS ONE 7, e31395 (2012)
131. O. Dekel, S. Shalev-Shwartz, Y. Singer, The Forgetron: A kernel-based perceptron on a fixed
budget, in Advances in Neural Information Processing Systems 18 (MIT Press, 2005), pp.
259–266.
132. A. Delorme, G. Richard, M. Fabre-Thorpe, Key visual features for rapid categorization of
animals in natural scenes. Front. Psychol. 1, 0021 (2010)
133. D. Dennett, The Intentional Stance (MIT Press, Cambridge, 1987)
134. K. Devlin, Why universities require computer science students to take math. Commun. ACM
46, 37–39 (2003)
135. T. Dietterich, R. Michalski, Inductive learning of structural description. Artif. Intell. 16, 257–
294 (1981)
136. T. Dietterich, R. Michalski, A comparative review of selected methods for learning from
examples, in Machine Learning: An Artificial Intelligence Approach, ed. by J. Carbonell, R.
Michalski, T. Mitchell (Tioga Publishing, Palo Alto, 1983).
137. T. Dietterich, An overview of MAXQ hierarchical reinforcement learning. Lect. Notes Com-
put. Sci. 1864, 26–44 (2000)
138. T. Dietterich, Machine Learning for sequential data: A review. in Proceedings of the Joint
IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition,
(London, UK, 2002), pp. 15–30.
139. R. Dorat, M. Latapy, B. Conein, N. Auray, Multi-level analysis of an interaction network
between individuals in a mailing-list. Ann. Telecommun. 62, 325–349 (2007)
140. J. Dougherty, R. Kohavi, M. Sahami, Supervised and unsupervised discretization of contin-
uous features. in Proceedings of the 12th International Conference on Machine Learning,
(Tahoe City, USA, 1995), pp. 194–202.
141. G. Drastal, G. Czako, S. Raatz, Induction in an abstraction space. in Proceedings of the 11th
International Joint Conference on Artificial Intelligence, (Detroit, USA, 1989), pp. 708–712.
142. E. Dubinsky, Reflective abstraction in advanced mathematical thinking, in Advanced Mathe-
matical Thinking, ed. by D. Tall (Kluwer, Dordrecht, 1991), pp. 95–123
143. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis (Cambridge Uni-
versity Press, Cambridge, 1998)
144. P. Duygulu, M. Bastan, Multimedia translation for linking visual data to semantics in videos.
Mach. Vis. Appl. 22, 99–115 (2011)
145. S. Dzeroski, P. Langley, L. Todorovski, Computational discovery of scientific knowledge.
Lect. Notes Artif. Intell. 4660, 1–14 (2007)
146. T. Ellman, Synthesis of abstraction hierarchies for constraint satisfaction by clustering approx-
imately equivalent objects. in Proceedings of the 10th International Conference on Machine
Learning, (Amherst, USA, 1993), pp. 104–111.
147. F. Emmert-Streib, Statistic complexity: combining Kolmogorov complexity with an ensemble
approach. PLoS ONE 5, e12256 (2010)
148. F. Emmert-Streib, M. Dehmer, Exploring statistical and population aspects of network com-
plexity. PlosOne 7, e34523 (2012)
149. H. Enderton, A Mathematical Introduction to Logic, (Academic Press, 1972).
150. E. Engbers, M. Lindenbaum, A. Smeulders, An information-based measure for grouping
quality. in Proceedings of the European Conference on Computer Vision, (Prague, Czech
Republic, 2004), pp. 392–404.
151. D. Ensley, A hands-on approach to proof and abstraction. ACM SIGCSE Bull. 41, 45–47
(2009)
152. S. Epstein, X. Li, Cluster graphs as abstractions for constraint satisfaction problems. in Pro-
ceedings of the 8th International Symposium on Abstraction, Reformulation, and Approxi-
mation, (Lake Arrowhead, USA, 2009), pp. 58–65.
153. J. Euzenat, On a purely taxonomic and descriptive meaning for classes. in Proceedings of the
IJCAI Workshop on Object-Based Representation Systems, (Chambéry, France, 1993), pp.
81–92.
154. J. Euzenat, Représentation des connaissances: De l’ Approximation à la Confrontation. Ph.D.
thesis, (Université Joseph Fourier, Grenoble, France, 1999).
155. J. Euzenat, Granularity in relational formalisms with application to time and space represen-
tation. Comput. Intell. 17, 703–737 (2001)
156. J. Euzenat, A. Montanari, Time granularity, in Handbook of Temporal Reasoning in Artificial
Intelligence, ed. by M. Fisher, D. Gabbay, L. Vila (Elsevier, Amsterdam, 2005), pp. 59–118
157. P. Expert, T. Evans, V. Blondel, R. Lambiotte, Beyond space for spatial networks. PNAS 108,
7663–7668 (2010)
158. M. Fabre-Thorpe, Visual categorization: accessing abstraction in non-human primates. Phil.
Trans. Roy. Soc. B 358, 1215–1223 (2003)
159. B. Falkenhainer, K. Forbus, D. Gentner, The structure-mapping engine: algorithm and exam-
ples. Artif. Intell. 41, 1–63 (1989)
160. J. Fan, R. Samworth, Y. Wu, Ultrahigh dimensional feature selection: Beyond the linear model.
J. Mach. Learn. Res. 10, 2013–2038 (2009)
161. A. Feil, J. Mestre, Change blindness as a means of studying expertise in Physics. J. Learn.
Sci. 19, 480–505 (2010)
162. J. Feldman, How surprising is a simple pattern? Quantifying “Eureka!”. Cognition 93, 199–
224 (2004)
163. A. Felner, N. Ofek, Combining perimeter search and pattern database abstractions. in Proceed-
ings of the 7th International Symposium on Abstraction, Reformulation and Approximation,
(Whistler, Canada, 2007), pp. 155–168.
164. S. Ferilli, T. Basile, N.D. Mauro, F. Esposito, On the learnability of abstraction theories from
observations for relational learning. in Proceedings of European Conference on Machine
Learning, (Porto, Portugal, 2005), pp. 120–132.
165. G. Ferrari, Vedi cosa intendo?, in Percezione, liguaggio, coscienza, Saggi di filosofia della
mente, ed. by M. Carenini, M. Matteuzzi (Quodlibet, Macerata, Italy, 1999), pp. 203–224. In
Italian.
166. P. Ferrari, Abstraction in mathematics. Phil. Trans. Roy. Soc. B 358, 1225–1230 (2003)
167. R. Fikes, N. Nilsson, Strips: a new approach to the application of theorem proving to problem
solving. Artif. Intell. 2, 189–208 (1971)
168. K. Fine, The Limit of Abstraction (Clarendon Press, Oxford, 2002)
169. S. Fine, Y. Singer, N. Tishby, The hierarchical hidden markov model: analysis and applications.
Mach. Learn. 32, 41–62 (1998)
170. K. Fisher, J. Mitchell, On the relationship between classes, objects, and data abstraction.
Theor. Pract. Object Syst. 4, 3–25 (1998)
171. R. Fitch, B. Hengst, D. Šuc, G. Calbert, J. Scholz, Structural abstraction experiments in
Reinforcement Learning. Lect. Notes Artif. Intell. 3809, 164–175 (2005)
172. P. Flach, Predicate invention in inductive data Engineering. in Proceedings of the European
Conference on Machine Learning, (Wien, Austria, 1993), pp. 83–94.
173. P. Flach, N. Lavrač, The role of feature construction in inductive rule learning. in Proceedings
of the ICML Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries,
(Stanford, USA, 2000), pp. 1–11.
174. P. Flener, U. Schmid, Predicate invention, in Encyclopedia of Machine Learning, ed. by C.
Sammut, G.I. Webb (Springer, USA, 2010), pp. 537–544
175. L. Floridi, The method of levels of abstraction. Minds Mach. 18, 303–329 (2008)
176. L. Floridi, J. Sanders, The method of abstraction, in Yearbook of the Artificial, vol. 2, ed. by
M. Negrotti (Peter Lang AG, Germany, 2004), pp. 178–220
177. G. Forman, An extensive empirical study of feature selection metrics for text classification.
J. Mach. Learn. Res. pp. 1289–1305 (2003).
178. S. Fortunato, C. Castellano, Community structure in graphs. Networks 814, 42 (2007)
179. A. Frank, A. Asuncion, UCI Machine Learning Repository (University of California, Irvine, 2010)
180. A. Frank, An operational meta-model for handling multiple scales in agent-based simulations.
in Proceedings of the Dagstuhl Seminar, (Dagstuhl, Germany, 2012), pp. 1–6.
181. G. Frege, Rezension von E. Husserl: Philosophie der Arithmetik. Zeitschrift für Philosophie
und Philosophische Kritik 103, 313–332 (1894).
182. E. Freuder, Eliminating interchangeable values in constraint satisfaction problems. in Proceed-
ings of the 9th National Conference of the American Association for Artificial Intelligence,
(Anaheim, USA, 1991), pp. 227–233.
183. E. Freuder, D. Sabin, Interchangeability supports abstraction and reformulation for multi-
dimensional constraint satisfaction. in Proceedings of 13th National Conference of the Amer-
ican Association for Artificial Intelligence, (Portland, USA, 1996), pp. 191–196.
184. G. Friedrich, Theory diagnoses: a concise characterization of faulty systems. in Proceedings of
the 13th International Joint Conference on Artificial Intelligence, (Chambéry, France, 1993),
pp. 1466–1471.
185. L. Frommberger, Qualitative Spatial Abstraction in Reinforcement Learning, (Springer, 2010)
186. M. Gabbrielli, S. Martini, Data abstraction, in Programming Languages: Principles and Par-
adigms, ed. by A. Tucker, R. Noonan (Springer, Heidelberg, 2010), pp. 265–276
187. U. Galassi, M. Botta, A. Giordana, Hierarchical hidden markov models for user/process profile
learning. Fundam. Informaticae 78, 487–505 (2007)
188. U. Galassi, A. Giordana, L. Saitta, Structured hidden markov models: a general tool for
modeling agent behaviors, in Soft Computing Applications in Business, ed. by B. Prasad
(Springer, Heidelberg, 2008), pp. 273–292
189. J. Gama, R. Sebastião, P. Rodrigues, Issues in evaluation of stream learning algorithms. in
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, (New York, USA, 2009), pp. 329–338.
190. E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design patterns: abstraction and reuse of
object-oriented design. Lect. Notes Comput. Sci. 707, 406–431 (1993)
191. E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-
Oriented Software (Addison-Wesley Professional, Boston, 2005)
192. M. Garland, Multiresolution Modeling: Survey and Future Opportunities. Eurographics ’99-
State of the Art Reports, pp. 111–131 (1999).
193. P. Gärdenfors, Language and the evolution of cognition. in Lund University Cognitive Studies,
vol. 41. (Lund University Press, 1995).
194. P. Gärdenfors, Conceptual Spaces (MIT Press, Cambridge, 2004)
195. M. Gell-Mann, S. Lloyd, Effective complexity, in Nonextensive Entropy-Interdisciplinary
Applications, ed. by M. Gell-Mann, C. Tsallis (Oxford University Press, Oxford, 2003), pp.
387–398
196. I. Gent, T. Walsh, Phase transitions from real computational problems. in Proceedings of
the 8th International Symposium on Artificial Intelligence, (Monterrey, Mexico, 1995), pp.
356–364.
197. D. Gentner, Structure-mapping: a theoretical framework for analogy. Cogn. Sci. 7, 155–170
(1983)
198. D. Gentner, Analogical reasoning, Psychology, in Encyclopedia of Cognitive Science, ed. by
L. Nadel (Nature Publishing Group, London, 2003), pp. 106–112
199. D. Gentner, L. Smith, Analogical reasoning, in Encyclopedia of Human Behavior, ed. by V.S.
Ramachandran 2nd, edn. (Elsevier, Oxford, 2012), pp. 130–136
200. L. Getoor, B. Taskar, Introduction to Statistical Relational Learning (The MIT Press, Cam-
bridge, 2007)
201. C. Ghezzi, M. Jazayeri, D. Mandrioli, Fundamentals of Software Engineering, 2nd edn. (Pear-
son, NJ, 2003)
202. C. Ghidini, F. Giunchiglia, Local models semantics, or contextual reasoning = locality +
compatibility. Artif. Intell. 127, 221–259 (2001)
203. C. Ghidini, F. Giunchiglia, A semantic for abstraction. in Proceedings of 16th European Conf.
on Artificial Intelligence, (Valencia, Spain, 2004), pp. 338–342.
204. M. Gick, K. Holyoak, Analogical problem solving. Cogn. Psychol. 12, 306–355 (1980)
205. A. Gilpin, T. Sandholm, Lossless abstraction of imperfect information games. J. ACM 54,
1–32 (2007)
206. A. Giordana, G. Lobello, L. Saitta, Abstraction in propositional calculus. in Proceedings of
the Workshop on Knowledge Compilation and Speed Up Learning, (Amherst, USA, 1993),
pp. 56–64.
207. A. Giordana, G. Peretto, D. Roverso, L. Saitta, Abstraction: An alternative view of concept
acquisition, in Methodologies for Intelligent Systems, vol. 5, ed. by Z.W. Ras, M.L. Emrich
(Elsevier, New York, 1990), pp. 379–387
208. A. Giordana, L. Saitta, Abstraction: a general framework for learning. in Working Notes of the
AAAI Workshop on Automated Generation of Approximations and Abstractions, (Boston,
USA, 1990), pp. 245–256.
209. A. Giordana, L. Saitta, Phase transitions in relational learning. Mach. Learn. 41, 217–251
(2000)
210. A. Giordana, L. Saitta, D. Roverso, Abstracting concepts with inverse resolution. in Pro-
ceedings of the 8th International Machine Learning Workshop, (Evanston, USA, 1991), pp.
142–146.
287. R.D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data. J. Comput.
Aided Mol. Des. 15, 173–181 (2001)
288. Y. Kinoshita, K. Nishizawa, An algebraic semantics of predicate abstraction for PML. Inform.
Media Technol. 5, 48–57 (2010)
289. C. Knoblock, Automatically generating abstractions for planning. Artif. Intell. 68, 243–302
(1994)
290. C. Knoblock, S. Minton, O. Etzioni, Integrating abstraction and explanation-based learning in
PRODIGY. in Proceedings of the 9th National Conference on Artificial Intelligence, (Menlo
Park, USA, 1991), pp. 541–546.
291. R. Kohavi, G. John, Wrappers for feature subset selection. Artif. Intell. 97, 273–324 (1997)
292. R. Kohavi, M. Sahami, Error-based and entropy-based discretization of continuous features.
in Proceedings of the 2nd Knowledge Discovery and Data Mining Conference, (Portland,
USA, 1996), pp. 114–119.
293. S. Kok, P. Domingos, Statistical predicate invention. in Proceedings of the 24th International
Conference on Machine Learning, (Corvallis, USA, 2007), pp. 433–440.
294. D. Koller, M. Sahami, Toward optimal feature selection. in Proceedings of the 13th Interna-
tional Conference on Machine Learning, (Bari, Italy, 1996), pp. 284–292.
295. A. Kolmogorov, Three approaches to the quantitative definition of information. Probl. Inf.
Trans. 1, 4–7 (1965)
296. M. Koppel, Complexity, depth, and sophistication. Complex Syst. 1, 1087–1091 (1987)
297. R. Korf, Toward a model of representation changes. Artif. Intell. 14, 41–78 (1980)
298. S. Kotsiantis, D. Kanellopoulos, Discretization techniques: a recent survey. GESTS Int. Trans.
Comput. Sci. Eng. 32, 47–58 (2006)
299. R. Kowalski, Logic for Problem-Solving (North-Holland Publising, Amsterdam, 1986)
300. O. Kozlova, O. Sigaud, C. Meyer, Texdyna: hierarchical Reinforcement Learning in factored
MDPs. Lect. Notes Artif. Intell. 6226, 489–500 (2010)
301. J. Kramer, Is abstraction the key to computing? Commun. ACM 50, 37–42 (2007)
302. S. Kramer, Predicate invention: A comprehensive view. Technical Report OFAI-TR-95-32,
Austrian Research Institute for Artificial Intelligence, (Vienna, 1995).
303. S. Kramer, N. Lavrač, P. Flach, Propositionalization approaches to relational data mining, in
Relational Data Mining, ed. by S. Dzeroski, N. Lavrač (Springer, Berlin, 2001), pp. 262–291
304. S. Kramer, B. Pfahringer, C. Helma, Stochastic propositionalization of non-determinate back-
ground knowledge. Lect. Notes Comput. Sci. 1446, 80–94 (1998)
305. M. Krogel, S. Rawles, F. Železný, P. Flach, N. Lavrač, S. Wrobel, Comparative evaluation of
approaches to propositionalization. in Proceedings of the 13th International Conference on
Inductive Logic Programming, (Szeged, Hungary, 2003), pp. 194–217.
306. Y. Kudoh, M. Haraguchi, Y. Okubo, Data abstractions for decision tree induction. Theor.
Comput. Sci. 292, 387–416 (2003)
307. P. Kuksa, Y. Qi, B. Bai, R. Collobert, J. Weston, V. Pavlovic, X. Ning, Semi-supervised
abstraction-augmented string kernel for multi-level bio-relation extraction. in Proceedings of
the European Conference on Machine Learning, (Barcelona, Spain, 2010), pp. 128–144.
308. M. Kurant, P. Thiran, Layered complex networks. Phys. Rev. Lett. 96, 138701 (2006)
309. R. López-Ruiz, Statistical complexity and Fisher-Shannon information: Applications, in
Statistical Complexity, ed. by K. Sen (Springer, New York, 2011), pp. 65–127
310. R. Lambiotte, Multi-scale modularity in complex networks. in Proceedings of the 8th Inter-
national Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Net-
works (Avignon, France, 2010), pp. 546–553.
311. A. Lancichinetti, S. Fortunato, Consensus clustering in complex networks. Sci. Rep. 2, 336–
342 (2012)
312. A. Lancichinetti, S. Fortunato, J. Kertesz, Detecting the overlapping and hierarchical com-
munity structure of complex networks. New J. Phys. 11, 033015 (2009)
313. A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detec-
tion algorithms. Phys. Rev. E 78, 046110 (2008)
314. T. Lang, Rules for robot draughtsmen. Geogr. Mag. 42, 50–51 (1969)
315. P. Langley, Scientific Discovery: Computational Explorations of the Creative Processes (MIT
Press, Cambridge, 1987)
316. P. Langley, The computer-aided discovery of scientific knowledge. Int. J. Hum-Comput. Stud.
53, 393–410 (2000)
317. Y. Lasheng, J. Zhongbin, L. Kang, Research on task decomposition and state abstraction in
Reinforcement Learning. Artif. Intell. Rev. 38, 119–127 (2012)
318. N. Lavrač, P. Flach, An extended transformation approach to Inductive Logic Programming.
ACM Trans. Comput. Log. 2, 458–494 (2001)
319. N. Lavrač, J. Fürnkranz, D. Gamberger, Explicit feature construction and manipulation for
covering rule learning algorithms, in Advances in Machine Learning I, ed. by J. Koronacki,
Z. Ras, S. Wierzchon (Springer, New York, 2010), pp. 121–146
320. N. Lavrač, D. Gamberger, P. Turney, A relevancy filter for constructive induction. IEEE Intell.
Syst. Their Appl. 13, 50–56 (1998)
321. H. Laycock, Notes to “Object”, in Stanford Encyclopedia of Philosophy (2010).
322. A. Lazaric, M. Ghavamzadeh, R. Munos, Analysis of a classification-based policy iteration
algorithm. in Proceedings of the 27th International Conference on Machine Learning (Haifa,
Israel, 2010), pp. 607–614.
323. H. Leather, E. Bonilla, M. O’Boyle, Automatic feature generation for Machine Learning based
optimizing compilation. in Proceedings of International Symposium on Code Generation and
Optimization (Seattle, 2009), pp. 81–91.
324. C. Lecoutre, Constraint Networks: Techniques and Algorithms (Wiley, 2009).
325. E. Leicht, M. Newman, Community structure in directed networks. Phys. Rev. Lett. 100,
118703 (2008)
326. U. Leron, Abstraction barriers in Mathematics and Computer Science. in Proceedings of 3rd
International Conference on Logo and Math Education (Montreal, Canada, 1987).
327. D. Levin, D. Simons, Failure to detect changes to attended objects in motion pictures. Psychon.
Bull. Rev. 4, 501–506 (1997)
328. A. Levy, Creating abstractions using relevance reasoning. in Proceedings of the 12th National
Conference on Artificial Intelligence (Seattle, 1994), pp. 588–594.
329. D. Lewis, On the Plurality of Worlds (Basil Blackwell, Oxford, 1986)
330. L. Li, T. Walsh, M.L. Littman, Towards a unified theory of state abstraction for MDPs. in
Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics
(Fort Lauderdale, 2010), pp. 531–539.
331. M. Li, P. Vitànyi, An Introduction to Kolmogorov Complexity and its Applications, 2nd edn.
(Springer, New York, 1997)
332. S. Li, M. Ying, Soft constraint abstraction based on semiring homomorphism. Theor. Comput.
Sci. 403, 192–201 (2008)
333. B. Liskov, J. Guttag, Abstraction and Specification in Program Development (MIT Press,
Cambridge, 1986)
334. H. Liu, H. Motoda, On issues of instance selection. Data Min. Knowl. Discov. 6, 115–130
(2002)
335. H. Liu, H. Motoda, R. Setiono, Z. Zhao, Feature selection: An ever evolving frontier in Data
Mining. J. Mach. Learn. Res. 10, 4–13 (2010)
336. H. Liu, R. Setiono, Feature selection via discretization. IEEE Trans. Knowl. Data Eng. 9,
642–645 (1997)
337. H. Liu, F. Hussain, C.L. Tan, M. Dash, Discretization: An enabling technique. Data Min.
Knowl. Discov. 6, 393–423 (2002)
338. S. Lloyd, H. Pagels, Complexity as thermodynamic depth. Ann. Phys. 188, 186–213 (1988)
339. J. Locke, Essay Concerning Human Understanding (Eliz Holt for Thomas Basset, London,
1690)
340. R. López-Ruiz, H. Mancini, X. Calbet, A statistical measure of complexity. Phys. Lett. A 209,
321–326 (1995)
341. A. Lovett, K. Forbus, Modeling multiple strategies for solving geometric analogy problems.
in Proceedings of the 34th Annual Conference of the Cognitive Science Society (Sapporo,
Japan, 2012).
415. J. Pearl, On the connection between the complexity and the credibility of inferred models.
Int. J. Gen. Syst. 4, 255–264 (1978)
416. C. Perlich, F. Provost, Distribution-based aggregation for relational learning with identifier
attributes. Mach. Learn. 62, 65–105 (2006)
417. J. Piaget, Genetic epistemology (Columbia University Press, New York, 1968)
418. S. Piramuthu, R.T. Sikora, Iterative feature construction for improving inductive learning
algorithms. Expert Syst. Appl. 36, 3401–3406 (2009)
419. D. Plaisted, Theorem proving with abstraction. Artif. Intell. 16, 47–108 (1981)
420. Plato, Πολιτεία (Republic), 7.514a, 380 BC
421. J. Platt, Prediction of isomeric differences in paraffin properties. J. Phys. Chem. 56, 328–336
(1952)
422. C. Plazanet, Enrichissement des bases de données géographiques : Analyse de la géométrie
des objets linéaires pour la généralisation cartographique (Application aux routes). Ph.D.
thesis, (University Marne-la-Vallée, France, 1996), In French.
423. G. Plotkin, A further note on inductive generalization. in Machine Intelligence, vol 6, (Edin-
burgh University Press, 1971).
424. G. Polya, How to Solve It: A New Aspect of Mathematical Methods (Princeton University
Press, Princeton, 1945)
425. M. Ponsen, M. Taylor, K. Tuyls, Abstraction and generalization in reinforcement learning: a
summary and framework. Lect. Notes Comput. Sci. 5924, 1–32 (2010)
426. K. Popper, The Logic of Scientific Discovery (Harper Torch, New York, 1968)
427. M. Poudret, A. Arnould, J. Comet, P.L. Gall, P. Meseure, F. Képès, Topology-based abstraction
of complex biological systems: application to the Golgi apparatus. Theor. Biosci. 127, 79–88
(2008)
428. W. Prenninger, A. Pretschner, Abstraction for model-based testing. Electron. Notes Theore.
Comput. Sci. 116, 59–71 (2005)
429. A. Prieditis, Machine discovery of admissible heuristics. Mach. Learn. 12, 117–142 (1993)
430. E. Prifti, J.D. Zucker, K. Clement, C. Henegar, Interactional and functional centrality in
transcriptional co-expression networks. Bioinform. 26(24), 3083–3089 (2010)
431. P. Prosser, An empirical study of phase transitions in constraint satisfaction problems. Artif.
Intell. 81, 81–109 (1996)
432. G. Provan, Hierarchical model-based diagnosis. in Proceedings of the 12th International Work-
shop on Principles of Diagnosis, (Murnau, Germany, 2001), pp. 167–174.
433. J. Provost, B.J. Kuipers, R. Miikkulainen, Developing navigation behavior through self-
organizing distinctive state abstraction. Connection Sci. 18, 159–172 (2006)
434. L. Pyeatt, A. Howe, Decision tree function approximation in Reinforcement Learning. in
Proceedings of the 3rd International Symposium on Adaptive Systems: Evolutionary Com-
putation and Probabilistic Graphical Models, (Havana, Cuba, 2001), pp. 70–77.
435. Z. Pylyshyn, What the mind’s eye tells the mind’s brain: a critique of mental imagery. Psychol.
Bull. 80, 1–24 (1973)
436. Z. Pylyshyn, Computation and Cognition: Toward a Foundation for Cognitive Science (MIT
Press, Cambridge, 1984)
437. W. Quine, Word and Object (MIT Press, Cambridge, 1960)
438. J. Quinlan, R. Cameron-Jones, Induction of logic programs: Foil and related systems. New
Gen. Comput. 13, 287–312 (1995)
439. J.R. Quinlan, R.M. Cameron-Jones, Foil: A midterm report. Lect. Notes Comput. Sci. 667,
3–20 (1993)
440. R. Quinlan, Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
441. L. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recog-
nition. Proc. IEEE 77, 257–286 (1989)
442. M. Ramscar, D. Yarlett, Semantic grounding in models of analogy: an environmental approach.
Cogn. Sci. 27, 41–71 (2003)
443. E. Ravasz, A. Barabasi, Hierarchical organization in complex networks. Phys. Rev. E 67,
026112 (2003)
470. L. Saitta, J.-D. Zucker, A model of abstraction in visual perception. Int. J. Appl. Intell. 80,
134–155 (2001)
471. L. Saitta, J.-D. Zucker, Abstraction and complexity measures. Lect. Notes Comput. Sci. 4612,
375–390 (2007)
472. L. Saitta, C. Vrain, Abstracting Markov networks. in Proceedings of the AAAI Workshop on
Abstraction, Reformulation, and Approximation, (Atlanta, Georgia, USA, 2010).
473. L. Saitta (ed.), The abstraction paths. Special Issue of the Philos. Trans. Roy. Soc. B 358,
1435 (2003).
474. M. Sales-Pardo, R. Guimerà, A. Moreira, L.N. Amaral, Extracting the hierarchical organiza-
tion of complex systems. PNAS 104, 15224–15229 (2007)
475. C. Sammut (ed.), Encyclopedia of Machine Learning (Springer, New York, 2011)
476. J. Schlimmer, Learning and representation change. in Proceedings of the 6th National Con-
ference on Artificial Intelligence (1987), pp. 511–535.
477. J. Schmidhuber, Low-complexity art. J. Int. Soc. Arts Sci. Technol. 30, 97–103 (1997)
478. H. Schmidtke, W. Woo, A size-based qualitative approach to the representation of spatial gran-
ularity. in Proceedings of the 20th International Joint Conference on Artificial Intelligence,
(Bangalore, India, 2007), pp. 563–568.
479. R. Schrag, D. Miranker, Abstraction in the CSP phase transition boundary. in Proceedings of
the 4th International Symposium on Artificial Intelligence and Mathematics, (Ft. Lauderdale,
USA, 1995), pp. 126–133.
480. J. Seligman, From logic to probability. Lect. Notes Comput. Sci. 5363, 193–233 (2009)
481. R. Serna Oliver, I. Shcherbakov, G. Fohler, An operating system abstraction layer for portable
applications in wireless sensor networks. in Proceedings of the ACM Symposium on Applied
Computing, (Sierre, Switzerland, 2010), pp. 742–748.
482. A. Sfard, L. Linchevsky, The gains and pitfalls of reification-the case of alegbra. Educ. Stud.
Math. 26, 191–228 (1994)
483. C. Shannon, The mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423
(1948)
484. A. Sharpanskykh, Agent-based modeling and analysis of socio-technical systems. Cybern.
Syst. 42, 308–323 (2011)
485. S. Shekhar, C. Lu, P. Zhang, A unified approach to detecting spatial outliers. GeoInformatica
7, 139–166 (2003)
486. J. Shi, M. Littman, Abstraction methods for game theoretic poker. in Proceedings of the 2nd
International Conference on Computers and Games, (Hamamatsu, Japan, 2001), pp. 333–345.
487. J. Shiner, M. Davison, P. Landsberg, Simple measure of complexity. Phys. Rev. E 59, 1459–
1464 (1999)
488. G. Silverstein, M. Pazzani, Relational clichés: Constraining constructive induction during
relational learning. in Proceedings of the 8th International Workshop on Machine Learning,
(Evanston, USA, 1991), pp. 203–207.
489. G. Simmons, Shapes, Part Structure and Object Concepts, in Proceedings of the ECAI Work-
shop on Parts and Wholes: Conceptual Part-Whole Relationships and Formal Mereology
(Nederlands, Amsterdam, 1994)
490. H. Simon, The Sciences of the Artificial, 3rd edn. (MIT Press, Cambridge, 1999)
491. D. Simons, Current approaches to change blindness. Vis. Cogn. 7, 1–15 (2000)
492. D. Simons, C. Chabris, T. Schnur, Evidence for preserved representations in change blindness.
Conscious. Cogn. 11, 78–97 (2002)
493. Ö. Simsek, Workshop summary: abstraction in reinforcement learning. in Proceedings of the
International Conference on Machine Learning, (Montreal, Canada, 2009), p. 170.
494. M. Sizintsev, R. Wildes, Coarse-to-fine stereo vision with accurate 3D boundaries. Image Vis.
Comput. 28, 352–366 (2010)
495. B. Smith, M. Dyer, Locating the phase transition in binary constraint satisfaction problems.
Artif. Intell. 81, 155–181 (1996)
496. R.M. Smullyan, First-Order Logic (Dover Publications, Mineola, 1995)
497. N.N. Soja, S. Carey, E. Spelke, Ontological categories guide young children’s inductions of
word meaning: Object terms and substance terms. Cognition 38, 179–211 (1991)
498. R. Solomonoff, A formal theory of inductive inference-Part I. Inf. Contl. 7, 1–22 (1964)
499. R. Solomonoff, A formal theory of inductive inference-Part II. Inf. Contl. 7, 224–254 (1964)
500. J. Sowa, A. Majumdar, Conceptual structures for knowledge creation and communication. in
Proceedings of the International Conference on Conceptual Structures, (Dresden, Germany,
2012), pp. 17–24.
501. A. Srinivasan, S. Muggleton, M. Bain, Distinguishing exceptions from noise in non-monotonic
learning. in Proceedings of the 2nd International Workshop on Inductive Logic Programming,
(Tokyo, Japan, 1992), pp. 203–207.
502. S. Srivastava, N. Immerman, S. Zilberstein, Abstract planning with unknown object quantities
and properties. in Proceedings of the 8th Symposium on Abstraction, Reformulation, and
Approximation, (Lake Arrowhead, USA, 2009), pp. 143–150.
503. M. Stacey, C. McGregor, Temporal abstraction in intelligent clinical data analysis: a survey.
Artif. Intell. Med. 39, 1–24 (2007)
504. I. Stahl, Predicate invention in ILP-An overview. in Proceedings of the European Conference
on Machine Learning, (Vienna, Austria, 1993), pp. 311–322.
505. I. Stahl, The appropriateness of predicate invention as bias shift operation in ILP. Mach. Learn.
20, 95–117 (1995)
506. F. Staub, E. Stern, Abstract reasoning with mathematical constructs. Int. J. Educ. Res. 27,
63–75 (1997)
507. M. Stefik, Planning with constraints (MOLGEN: Part 1). Artif. Intell. 16, 111–139 (1981)
508. M. Stolle, D. Precup, Learning options in reinforcement learning. Lect. Notes Comput. Sci.
2371, 212–223 (2002)
509. J.V. Stone, Computer vision: What is the object? in Proceedings of the Artificial Intelligence
and Simulation of Behaviour Conference (Birmingham, UK, 1993), pp. 199–208
510. P. Struss, A. Malik, M. Sachenbacher, Qualitative modeling is the key to automated diagnosis.
in Proceedings of the 13st World Congress of the International Federation of Automatic
Control, (San Francisco, USA, 1996).
511. S. Stylianou, M.M. Fyrillas, Y. Chrysanthou, Scalable pedestrian simulation for virtual cities.
in Proceedings of the ACM Symposium on Virtual Reality software and technology, (New
York, USA, 2004), pp. 65–72.
512. D. Subramanian, A theory of justified reformulations. in Change of Representation and Induc-
tive Bias, ed. by P. Benjamin (Kluwer Academic Press, 1990), pp. 147–168.
513. D. Subramanian, R. Greiner, J. Pearl, The relevance of relevance (editorial). Artif. Intell. 97,
1–2 (1997)
514. S. Sun, N. Wang, Formalizing the multiple abstraction process within the G-KRA model
framework. in Proceedings of the International Conference on Intelligent Computing and
Integrated Systems, (Guilin, China, 2010), pp. 281–284.
515. S. Sun, N. Wang, D. Ouyang, General KRA abstraction model. J. Jilin Univ. 47, 537–542
(2009). In Chinese.
516. R. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: a framework for temporal
abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999)
517. R. S. Sutton, E. J. Rafols, A. Koop, Temporal abstraction in temporal-difference networks. in
Proceedings of the NIPS-18, (Vancouver, Canada, 2006), pp. 1313–1320.
518. R. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse
coarse coding. Advances in Neural Information Processing Systems, pp. 1038–1044 (1996).
519. R. Sutton, A. Barto, Reinforcement Learning (MIT Press, Cambridge, 1998)
520. R. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy gradient methods for Reinforcement
Learning with function approximation. Adv. NIPS 12, 1057–1063 (2000)
521. A. Swearngin, B. Choueiry, E. Freuder, A reformulation strategy for multi-dimensional CSPs:
The case study of the SET game. in Proceedings of the 9th International Symposium on
Abstraction, Reformulation and Approximation, (Cardona, Spagna, 2011), pp. 107–116.
522. B. Sylvand, Une brève histoire du concept de “concept”. Ph.D. thesis, (Université La Sorbonne,
Paris, France, 2006), In French.
523. C. Szepesvári, Algorithms for Reinforcement Learning, (Morgan & Claypool, 2010).
524. M.E. Taylor, P. Stone, Transfer learning for reinforcement learning domains: a survey. J.
Mach. Learn. Res. 10, 1633–1685 (2009)
525. J. Tenenberg, Abstraction in Planning, Ph.D. thesis, (University of Rochester, USA, 1988).
526. J. Tenenberg, Preserving consistency across abstraction mappings. in Proceedings 10th Inter-
national Joint Conference on Artificial Intelligence (Milan, Italy, 1987), pp. 1011–1014.
527. B. ter Haar Romeny, Designing multi-scale medical image analysis algorithms. in Proceedings
of the International Conference on Pattern Recognition (Tutorial) (Istanbul, Turkey, 2010).
528. C. Thinus-Blanc, Animal Spatial Cognition (World Scientific Publishing, Singapore, 1996)
529. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B
(Methodological) 58, 267–288 (1996)
530. A. Tollner-Burngasser, M. Riley, W. Nelson, Individual and team susceptibility to change
blindness. Aviation Space Environ. Med. 81, 935–943 (2010)
531. P. Torasso, G. Torta, Automatic abstraction of time-varying system models for model based
diagnosis. Lect. Notes Artif. Intell. 3698, 176–190 (2005)
532. G. Torta, P. Torasso, Automatic abstraction in component-based diagnosis driven by system
observability. in Proceedings of the 18th International Joint Conference on Artificial Intelli-
gence, (Acapulco, Mexico, 2003), pp. 394–400.
533. G. Torta, P. Torasso, Qualitative domain abstractions for time-varying systems: an approach
based on reusable abstraction fragments. in Proceedings of the 17th International Workshop
on Principles of Diagnosis (Peñaranda de Duero, Spain, 2006), pp. 265–272.
534. V. Truppa, E.P. Mortari, D. Garofoli, S. Privitera, E. Visalberghi, Same/different concept
learning by Capuchin monkeys in matching-to-sample tasks. PLoS One 6, e23809 (2011)
535. J. Tsitsiklis, B. Van Roy, An analysis of temporal-difference learning with function approxi-
mation. IEEE Trans. Autom. Contl. 42, 674–690 (1997)
536. P. Turney, A uniform approach to analogies, synonyms, antonyms, and associations. in Pro-
ceedings of the International Conference on Computational Linguistics, vol. 1, (Manchester,
UK, 2008), pp. 905–912.
537. E. Tuv, A. Borisov, G. Runger, K. Torkkola, Feature selection with ensembles, artificial
variables, and redundancy elimination. J. Mach. Learn. Res. 10, 1341–1366 (2009)
538. B. Tversky, K. Hemenway, Objects, parts, and categories. J. Exp. Phychol. Gen. 113, 169–193
(1984)
539. J. Ullman, Principles of Databases (Computer Science, Baltimore, 1982)
540. J. Ullman, Implementation of logical query languages for databases. ACM Trans. Database
Syst. 10, 298–321 (1985)
541. S. Ullman, Visual routines. Cognition 18, 97–159 (1984)
542. P.E. Utgoff, D.J. Stracuzzi, Many-layered learning. Neural Comput. 14, 2497–2529 (2002)
543. R. Valdés-Pérez, Principles of human-computer collaboration for knowledge discovery in
science. Artif. Intell. 107, 335–346 (1999)
544. M. Valtorta, A result on the computational complexity of heuristic estimates for the A∗
algorithm. Inf. Sci. 34, 48–59 (1984)
545. D. van Dalen, Logic and Structure, 4th edn. (Springer, New York, 2004)
546. P. Vitányi, Meaningful information. in Proceedings of the 13th International Symposium on
Algorithms and Computation, (Vancouver, Canada, 2002), pp. 588–599.
547. D. Vo, A. Drogoul, J.-D. Zucker, An operational meta-model for handling multiple scales in
agent-based simulations. in Proceedings of the International Conference on Computing and
Communication Technologies, Research, Innovation, and Vision for the Future (Ho Chi Minh
City, Vietnam, 2012), pp. 1–6.
548. P. Vogt, The physical symbol grounding problem. Cogn. Syst. Res. 3, 429–457 (2002)
549. F. Wang, On the abstraction of conventional dynamic systems: from numerical analysis to
linguistic analysis. Inf. Sci. 171, 233–259 (2005)
574. Q. Yang, Intelligent Planning: A Decomposition and Abstraction Based Approach (Springer,
1997).
575. K. Yip, F. Zhao, Spatial aggregation: theory and applications. J. Artif. Intell. Res. 5, 1–26
(1996)
576. L. Zadeh, Fuzzy sets. Inf. Contl. 8, 338–353 (1965)
577. L. Zadeh, The concept of a linguistic variable and its application to approximate reasoning-I.
Inf. Sci. 8, 199–249 (1975)
578. M. Žáková, F. Železný, Exploiting term, predicate, and feature taxonomies in propositional-
ization and propositional rule learning. Lect. Notes Comput. Sci. 4701, 798–805 (2007)
579. S. Zeki, The visual image in mind and brain. Sci. Am. 267, 68–76 (1992)
580. S. Zeki, A Vision of the Brain (Blackwell, Oxford, 1993)
581. S. Zeki, Inner Vision (Oxford University Press, Oxford, 1999)
582. S. Zeki, Splendors and Miseries of the Brain (Wiley-Blackwell, Oxford, 2009)
583. F. Železný, N. Lavrač, Propositionalization-based relational subgroup discovery with RSD.
Mach. Learn. 62, 33–63 (2006)
584. C. Zeng, S. Arikawa, Applying inverse resolution to EFS language learning. in Proceedings
of the International Conference for Young Computer Scientists (Shanghai, China, 1999), pp.
480–487.
585. S. Zhang, X. Ning, X. Zhang, Graph kernels, hierarchical clustering, and network community
structure: experiments and comparative analysis. Eur. Phys. J. B 57, 67–74 (2007)
586. F. Zhou, S. Mahler, H. Toivonen, Review of network abstraction techniques. in Proceedings of
the ECML Workshop on Explorative Analytics of Information Networks (Bristol, UK, 2009).
587. S. Zilles, R. Holte, The computational complexity of avoiding spurious states in state space
abstraction. Artif. Intell. 174, 1072–1092 (2010)
588. R. Zimmer, Abstraction in art with implications for perception. Phil. Trans. Roy. Soc. B 358,
1285–1291 (2003)
589. L. Zuck, A. Pnueli, Model checking and abstraction to the aid of parameterized systems (a
survey). Comput. Lang. Syst. Struct. 30, 139–169 (2004)
590. J.-D. Zucker, A grounded theory of abstraction in artificial intelligence. Phil. Trans. Roy. Soc.
Lond. B 358, 1293–1309 (2003)
591. J.-D. Zucker, J.-G. Ganascia, Selective reformulation of examples in concept learning. in
Proceedings of the 11th International Conference on Machine Learning (New Brunswick,
USA, 1994), pp. 352–360.
592. J.-D. Zucker, J.-G. Ganascia, Changes of representation for efficient learning in structural
domains. in Proceedings of the 13th International Conference on Machine Learning (Bari,
Italy, 1996), pp. 543–551.
Index
D
de-abstracting, 236, 266
de-abstraction, 12
Deductive, 68
Deleting arguments, 264
Dennett, 18
Density, 59, 289
Design pattern, 8, 217–219
Desirable property, 57, 63
Detail. See Level of detail
Diagnostic, 35, 364–366, 371
Dietterich, 276, 295, 297, 301
Discrete, 20, 61, 143, 144, 170, 235, 277, 330, 345, 352, 365
Discretization, 60, 189, 201, 232, 273, 274, 278, 285, 286, 303, 326, 410
Discretize, 253, 274, 285, 286
Distance, 9, 32, 36, 37, 47, 66, 268, 333, 337, 340, 344, 345, 350, 353, 376, 386, 394, 395
Domain, 60, 364, 365, 396, 397, 400
Dorat, 335
Downward, 25, 57, 63, 400
Drastal, 273

E
Edge complexity, 332
Embedded, 279, 280, 283, 288, 289, 326
Emmert-Streib, 279, 280, 283, 288, 289, 326, 336, 345

F
Faltings, 59
Felner, 58
Ferrari, 22, 23, 43
Fialetti, 10
Fikes, 55
Filter, 279, 280, 283, 288, 289, 326
Fisher, 25
Flat, 303
Floridi, 4, 9, 16, 71–74, 76–78, 165, 236, 237, 244, 245, 247, 282
FOL, 51, 53, 155, 273–275
Forgetron, 283
Forward, 26, 34, 231, 247, 289, 293, 336, 368, 390
Fourier, 45, 288
Fractal, 45, 269, 353
Fragment, 332, 366, 369
Frame induction, 386
Francisco, 251, 252
Frege, 14–16, 20–22, 70
Frege-Russell, 22
Fringe, 288
Frixione, 42
Function approximation, 297, 299, 300, 324
Fuzzy sets, 55, 60, 253

G
Gabbrielli, 25
Gaglio, 42
L
Latent Semantic Analysis, 393
Laycock, 15, 16
Lecoutre, 59
Leicht, 338
Levy, 27, 50, 52, 53, 227, 244, 260–263
Lindenbaum, 43
LOD. See Level of detail, 45, 46, 268
Logical structure, 333, 334
Low complexity, 347, 351
Lower approximation
Lowry, 54
Lozano, 338

M
Macro-actions, 303, 324
Malevich, 29
Manifesto, 30
Mapping function, 67, 69, 248

O
Objective, 14, 25–27, 280
Object-to-variable binding, 33, 266
Odoardo, 10
Ofek, 58
Oliveira, 335
Compatibility, 164, 167, 175
Operators
    approximation, 179, 185, 198, 199, 221, 222, 227, 230–232, 249, 254, 255, 267, 409
    reformulation, 202, 232, 410
Options, 297, 301, 302

P
Pachet, 289
Pagels, 349
Paris, 18, 19, 52
Paskin, 62
Pawlak, 255
Simplification, 2, 8, 9, 14, 16, 17, 19, 22, 23, 26, 46, 71, 76, 165, 197, 231, 276, 334, 361, 373–378, 393, 404, 408
Simulation, 33, 46, 62, 299, 392
Skeleton, 42, 374
Sketch, 12, 40, 142, 387
Smith, 14, 27, 33, 79–84, 256–259, 392
Solving, 3, 5, 9, 17, 21, 52, 55, 59, 60, 63, 65, 175, 177, 221, 227, 270, 294, 371, 384, 392, 395, 407–409
Sophistication, 329, 341, 350, 351, 356
Sowa
Specialization, 13, 86, 168
Spectrum, 13, 202, 289, 302, 340
Srivastava, 58
Stepp, 274
Stern, 11, 20, 21
Stijl, 30
Stone, 43
STRATA, 54, 55
STRIPS, 55, 56
Struss, 364
Stuart, 391
Stylianou, 62
SUBDUE, 294
Subramanian, 55, 146
Subroutines, 301
Summarize, 32, 66, 176, 202, 221, 296, 299, 303, 345, 353, 407
Surrogate, 298
Sutton, 297, 301, 302
Symbol abstraction, 263
Syntactic, 18, 23, 50–53, 172, 231, 260, 264, 266, 408
Syntactic mapping

T
Tarskian
Tenenberg, 50–52, 57, 239, 240, 244, 264
Thagard, 392
Thinus-Blanc, 14, 36
Thiran, 335
Thomas, 391
Tibshirani, 280
Tractability, 217, 280, 286
Turney, 393

U
Ullman, 44
Upper approximation, 255
Upward, 57, 401
Utility, 23, 243, 374

V
Van, 154
Visual perception, 37, 46, 266
Vo, 33, 62
Vries, 17, 22

W
Wang, 55, 402
Watanabe, 340
Weigel, 59
Weight, 277, 338, 340, 394
Westbound, 312
Whistler, 29
Wiemer-Hastings, 16
Williams, 363
Wilson, 34
Wrapper, 279, 280, 283, 286, 288, 289, 326
Wright, 15, 22, 70, 71

Y
Ying, 60
Young, 34
Yves, 18

Z
Zambon
Zhang, 336, 338
Zoom, 14, 36, 336
Zooming. See Zoom
Zucker, 51, 274, 387