
Table of Contents of Knowledge Representation

1. Logic
1.1 Historical Background
1.2 Representing Knowledge in Logic
1.3 Varieties of Logic
1.4 Names, Types, and Measures
1.5 Unity Amidst Diversity
2. Ontology
2.1 Ontological Categories
2.2 Philosophical Background
2.3 Top-Level Categories
2.4 Describing Physical Entities
2.5 Defining Abstractions
2.6 Sets, Collections, Types, and Categories
2.7 Space and Time
3. Knowledge Representations
3.1 Knowledge Engineering
3.2 Representing Structure in Frames
3.3 Rules and Data
3.4 Object-Oriented Systems
3.5 Natural Language Semantics
3.6 Levels of Representation
4. Processes
4.1 Times, Events, and Situations
4.2 Classification of Processes
4.3 Procedures, Processes, and Histories
4.4 Concurrent Processes
4.5 Computation
4.6 Constraint Satisfaction
4.7 Change
5. Purposes, Contexts, and Agents
5.1 Purpose
5.2 Syntax of Contexts
5.3 Semantics of Contexts
5.4 First-Order Reasoning in Contexts
5.5 Modal Reasoning in Contexts
5.6 Encapsulating Objects in Contexts
5.7 Agents
6. Knowledge Soup
6.1 Vagueness, Uncertainty, Randomness, and Ignorance
6.2 Limitations of Logic
6.3 Fuzzy Logic
6.4 Nonmonotonic Logic
6.5 Theories, Models, and the World
6.6 Semiotics
7. Knowledge Acquisition and Sharing
7.1 Sharing Ontologies
7.2 Conceptual Schema
7.3 Accommodating Multiple Paradigms
7.4 Relating Different Knowledge Representations
7.5 Language Patterns
7.6 Tools for Knowledge Acquisition
Appendix A: Summary of Notations
A.1 Predicate Calculus
A.2 Conceptual Graphs
A.3 Knowledge Interchange Format
Appendix B: Ontology Base
B.1 Principles of Ontology
B.2 Top-Level Categories
B.3 Role and Relation Types
B.4 Thematic Roles
B.5 Placement of the Thematic Roles
Appendix C: Extended Examples
C.1 Hotel Reservation System
C.2 Library Database
C.3 ACE Vocabulary
C.4 Translating ACE to Logic

KRR UNIT 1

1. Sets, Bags, and Sequences


Elementary or "naive" set theory is used to define basic mathematical structures.
A set is an arbitrary collection of elements, which may be real or imaginary, physical
or abstract. In mathematics, sets are usually composed of abstract things like numbers
and points, but one can also talk about sets of apples, oranges, people, or canaries. In
computer science, sets are composed of bits, bytes, pointers, and blocks of storage. In
many applications, the elements are never defined, but are left as abstractions that
could be represented in many different ways in the human brain, on a piece of paper,
or in computer storage.

Curly braces are used to enclose a set specification. For small, finite sets, the
specification of a set can be an exhaustive list of all its elements:
{1, 97, 63, 12}.
This specifies a set consisting of the four integers 1, 97, 63, and 12. Since the order of
listing the elements is immaterial, the following specification is equivalent to the one
above:
{12, 63, 97, 1}.
If the set is very large, like the set of all mammals, a complete listing is impossible. It
is hard enough to enumerate all the people in a single city, let alone all the cats, dogs,
mice, deer, sheep, and kangaroos in the entire world. For such sets, the specification
must state some rule or property that determines which elements are in the set:
{x | vertebrate(x) and warmBlooded(x) and hasHair(x) and lactiferous(x)}
This specification may be read the set of all x such that x is vertebrate, x is warm
blooded, x has hair, and x is lactiferous. A given set may be specified in more than
one way. The following four specifications all determine the same set:
{1, 2, 3}
{x | x is an integer and 0<x<4}
{x | x is a positive integer, x divides 6, and x ≠ 6}
{x | x=1 or x=2 or x=3}
A set specification that lists all elements explicitly is called a definition by extension.
A specification that states a property that must be true of each element is called a
definition by intension. Only finite sets can be defined by extension. Infinite sets must
always be defined by intension or by some operations upon other infinite sets that
were previously defined by intension.

In any theory using sets, there are two privileged sets: the empty set {}, which
contains no elements at all, and the universal set U, which contains every element that
is being considered. In mathematical discussions, for example, the universal set may
be the set of all integers Z or the set of all real numbers R. In most discussions, the
universal set is usually defined at the beginning of the presentation. Thereafter, other
sets are built up from U: subsets of U, pairs of elements of U, sets of sets of elements
from U, etc.

Of all the operators that deal with sets, the most basic is ∈, which states whether a
particular element is in a set: the notation x ∈ S means that x is an element of the set S;
it may also be read x is a member of the set S or simply x is in S. All other operators
on sets can be defined in terms of ∈. Let A and B be any two sets. Following are the
common operators of set theory; listed for each one is its name, standard symbol,
informal English definition, and formal definition in terms of ∈:

• Union. A∪B is the set that contains all the elements in either A or B or both:
  A∪B = {x | x ∈ A or x ∈ B}.
• Intersection. A∩B is the set that contains all the elements that are in both A and B:
  A∩B = {x | x ∈ A and x ∈ B}.
• Complement. -A is the set that contains everything in the universal set that is not in A:
  -A = {x | x ∈ U and not x ∈ A}.
• Difference. A-B is the set that contains all the elements that are in A but not in B:
  A-B = {x | x ∈ A and not x ∈ B}.
• Subset. A ⊆ B means that every element of A is also an element of B:
  if x ∈ A, then x ∈ B.

In particular, every set is a subset of itself: A ⊆ A.

• Proper subset. A is a proper subset of B, written A ⊂ B, if A ⊆ B and there is at
least one element of B that is not in A:
  if x ∈ A, then x ∈ B; and there exists some b where b ∈ B and not b ∈ A.
• Superset. A is a superset of B, written A ⊇ B, if B is a subset of A.
• Empty set. The empty set has no elements: for every x, it is false that x ∈ {}.
The empty set is a subset of every set, including itself: for every set A, {} ⊆ A.
• Disjoint sets. Two sets A and B are said to be disjoint if they have no common
elements; i.e., their intersection is the empty set: A∩B = {}.

The operators for union, intersection, and complement satisfy several standard
identities. Some of the identities listed below are similar to the rules of ordinary
arithmetic. Addition and multiplication, for example, obey the rules of commutativity
and associativity, and the minus sign obeys the rule of double complementation.
Idempotency, absorption, and De Morgan's laws, however, do not hold for ordinary
arithmetic. Distributivity holds for multiplication over addition, but addition does not
distribute over multiplication.

• Idempotency. A∪A is identical to A, and A∩A is also identical to A.
• Commutativity. A∪B is identical to B∪A, and A∩B is identical to B∩A.
• Associativity. A∪(B∪C) is identical to (A∪B)∪C, and A∩(B∩C) is identical
to (A∩B)∩C.
• Distributivity. A∪(B∩C) is identical to (A∪B)∩(A∪C), and A∩(B∪C) is
identical to (A∩B)∪(A∩C).
• Absorption. A∪(A∩B) is identical to A, and A∩(A∪B) is also identical to A.
• Double complementation. -(-A) is identical to A.
• De Morgan's laws. -(A∪B) is identical to -A∩-B, and -(A∩B) is identical to
-A∪-B.
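
These identities can be checked mechanically for any particular choice of sets. The
following sketch uses Python's built-in set type; the universal set U and the sets A
and B are invented for the test:

    # Checking the identities on a small example; U is an invented universal set.
    U = set(range(10))
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}

    def complement(S):
        return U - S   # -S relative to U

    assert A | A == A and A & A == A                 # idempotency
    assert A | B == B | A and A & B == B & A         # commutativity
    assert A | (A & B) == A and A & (A | B) == A     # absorption
    assert complement(complement(A)) == A            # double complementation
    assert complement(A | B) == complement(A) & complement(B)   # De Morgan
    assert complement(A & B) == complement(A) | complement(B)   # De Morgan
    print("all identities hold for this example")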

For complex sets, the rule for determining which elements are in the set may be too
complex to state in a single expression. An example is the set of all grammatical
sentences in some language, natural or artificial. Such sets are typically specified by
a recursive definition:

• First a finite starting set of elements is given.
• Then some operations are specified for generating new elements of the set from
old elements.
• Finally, the set is defined to be the smallest set containing the starting elements
and all others that can be derived from them by repeated application of the
generating operations.

The set resulting from these operations is said to be the closure of the starting set
under the given generating operations. As an example of a recursive definition, the
set S of all positive integers not divisible by 3 could be specified by intension:
S = {x | x is an integer, x>0, and 3 does not divide x}.
But the property x is an integer depends on some prior definition of the set of all
integers. The following recursive definition depends only on the operation of adding
3:

• Let the set {1, 2} be a subset of S.
• If x is any element of S, then x+3 is also an element of S.
• S is the smallest set that has the above two properties; i.e., S is a proper subset
of any other set that has those properties.

All elements of S may be enumerated by starting with {1, 2}. The first stage of adding
3 generates the new elements 4 and 5, adding 3 to them gives 7 and 8, then 10 and 11,
and so on. The set S is the closure of the set {1, 2} under the operation of adding 3. A
recursive definition is a special kind of definition by intension. The formal grammars
presented in Section 10 define languages by a recursive definition in which the
generating operations are specified by a set of production rules. For a discussion and
comparison of various methods of definition, see the notes on definitions by Norman
Swartz.
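
This enumeration is easy to mechanize. A minimal Python sketch, with an arbitrary
cutoff since only a finite subset of S can be stored:

    # Enumerating the closure of {1, 2} under x -> x + 3, truncated at an
    # arbitrary bound, since only a finite subset can be stored.
    def closure_up_to(start, op, bound):
        result, frontier = set(start), set(start)
        while frontier:
            frontier = {op(x) for x in frontier if op(x) <= bound} - result
            result |= frontier
        return result

    S = closure_up_to({1, 2}, lambda x: x + 3, 20)
    print(sorted(S))   # [1, 2, 4, 5, 7, 8, 10, 11, ...]: no multiples of 3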

A set has no duplicate elements. Since all duplicates are discarded in computing the
union of two sets, the union operator is idempotent: A∪A = A. In some cases, one may
want to allow duplicates; therefore, a bag is a collection of things with possible
duplicates. Since there may be more than one occurrence of a given element x,
the count operator @ is a generalization of the element operator ∈. The
expression x@A is the number of times the element x occurs in the bag A. Bags are
useful for many purposes, such as taking averages: if four men have heights of 178cm,
184cm, 178cm, and 181cm, then the set of those numbers is {178, 181, 184} with the
average 181; but the bag of the numbers is {178, 178, 181, 184} with average 180.25.
A sequence is an ordered bag. To distinguish ordered sequences from unordered sets
and bags, the elements of a sequence are enclosed in angle brackets: ⟨178, 184, 178,
181⟩; the empty sequence is written ⟨⟩. If a sequence has n elements, the elements
are numbered from 1 to n (or alternatively from 0 to n-1). A sequence of two elements
is sometimes called an ordered pair; a sequence of three elements, a triple; a sequence
of four, a quadruple; a sequence of five, a quintuple; and a sequence of n elements,
an n-tuple. Historically, the theory of sets was first defined without considering order.
On a piece of paper or in computer storage, however, the elements of a set must be
listed in some order. Sequences are therefore easier to represent than bags, and bags
are easier to represent than sets: a bag is a sequence with the ordering ignored, and a
set is a sequence with both order and duplicates ignored.
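
The heights example can be reproduced with Python's collections.Counter, which
serves as a bag:

    from collections import Counter

    # The heights example: a sequence keeps order and duplicates, a bag keeps
    # duplicates only, and a set discards both.
    heights = [178, 184, 178, 181]   # sequence ⟨178, 184, 178, 181⟩
    bag = Counter(heights)           # bag: the count operator x@A is bag[x]
    as_set = set(heights)            # set {178, 181, 184}

    print(bag[178])                      # 2: occurrences of 178 in the bag
    print(sum(as_set) / len(as_set))     # 181.0: average over the set
    print(sum(heights) / len(heights))   # 180.25: average over the bag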

New sets may be created by combining elements from the universe U in various ways.
The cross product of two sets A and B, written A×B, is the set of all possible ordered
pairs with the first element of each pair taken from A and the second element from B.
If A is the set {1,2} and B is the set {x,y,z}, then A×B is the set

{⟨1,x⟩, ⟨1,y⟩, ⟨1,z⟩, ⟨2,x⟩, ⟨2,y⟩, ⟨2,z⟩}.

With the notation for defining a set by a property or rule, it is possible to give a
general definition for the cross product A×B:
{⟨x,y⟩ | x ∈ A and y ∈ B}.
The cross product can also be extended to three or more sets. The product A×B×C is
defined as
{⟨x,y,z⟩ | x ∈ A, y ∈ B, and z ∈ C}.
Since René Descartes introduced pairs of numbers for identifying points on a plane,
the cross product is also called the Cartesian product in his honor.
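
In Python, itertools.product computes exactly these cross products; a brief
illustration with the sets above:

    from itertools import product

    A = {1, 2}
    B = {'x', 'y', 'z'}
    C = {'p', 'q'}

    print(sorted(product(A, B)))         # the six ordered pairs of A×B
    print(len(list(product(A, B, C))))   # |A×B×C| = 2 * 3 * 2 = 12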

In this book, most sets are finite. Inside a computer or the human brain, all sets that
are explicitly stored must be finite. But mathematical definitions and proofs are
generally simpler if there is no upper limit on the size of sets. Therefore, definitions in
computer science often permit infinite sets, but with the understanding that any
implementation will only choose a finite subset. Most infinite sets discussed in
computer science are assumed to be countable: a countably infinite set is one whose
elements can be put in a one-to-one correspondence with the integers. The set of all
real numbers is uncountable, but such sets are far beyond anything that can be
implemented in computer systems.

The terminology for sets is quite standard, although some authors use the
word class for set and others make a distinction between classes and sets. Bags are not
used as commonly as sets, and the terminology is less standard. Some authors use the
word multiset for a bag. Sequences are sometimes called lists or vectors, but some
authors draw distinctions between them. Some authors use the symbol ∅ for the empty
set, but the notation {} is more consistent with the notation ⟨⟩ for the empty
sequence.

2. Functions
A function is a rule for mapping the elements of one set to elements of another set.
The notation f: A → B means that f is a function that maps any element x in the
set A to some element f(x) in the set B. The set A is called the domain of f, and B is
called the range of f. In mathematics, the element x is called the argument, and f(x) is
called the result or the image of x under the mapping f. In computer science, x is
called the input and f(x) is called the output.

Suppose Z is the set of all integers, and N is the set of non-negative integers (i.e. the
positive integers and zero). Then define a function square: Z → N with the mapping
rule,
square(x) = x².
The function square applies to all elements in its domain Z, but not all elements in its
range N are images of some element of Z. For example, 17 is not the square of any
integer. Conversely, some elements in N are images of two different elements of Z,
since square(3)=9 and square(-3)=9.

A function is onto if every element of its range is the image of some element of its
domain. As an example, define the absolute value function, abs: Z → N, with the
mapping:
abs(x) = +x if x ≥ 0,
         -x if x < 0.
Every element of N is the image of at least one element of Z under the mapping abs;
therefore abs is onto. Note that abs is onto only because its range is limited to N. If
the range had been Z, it would not be onto because no negative integer is the absolute
value of anything in Z.

A function is one-to-one if no two elements of its domain are mapped into the same
element of its range. The function abs is not one-to-one because all the elements
of N except 0 are the images of two different elements of Z. For example, abs(-3)
and abs(3) are both 3. As a more subtle example, consider the function g: Z → N with
the mapping,
g(x) = 2x² + x.
Then g(0)=0, g(1)=3, g(-1)=1, g(2)=10, g(-2)=6, etc. The function g is one-to-one
since no two elements of Z are mapped into the same element of N. However, g is not
onto because many elements of N are not images of any element of Z. Note that g is
one-to-one only over the domain Z of integers. If its domain and range were extended
to the set R of all real numbers, it would not be one-to-one: g(-0.5) and g(0), for
example, are both 0.

A function that is both one-to-one and onto is called an isomorphism. The two sets
that form the domain and range of the function are said to be isomorphic to each
other. Let E be the set of even integers, and let O be the set of odd integers. Then
define the function increment: E O with the mapping,
increment(x) = x + 1.
This function is an isomorphism from the set E to the set O because it is both one-to-
one and onto. Therefore, the sets E and O are isomorphic. Instead of the terms one-to-
one, onto, and isomorphic, many authors use the equivalent terms injective, surjective,
and bijective.
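
Over finite restrictions of the domain and range, both properties can be checked by
brute force. A sketch, using arbitrary finite stand-ins for the infinite sets Z and N:

    # Brute-force checks of the definitions over finite stand-ins for Z and N.
    def is_one_to_one(f, domain):
        images = [f(x) for x in domain]
        return len(images) == len(set(images))

    def is_onto(f, domain, rng):
        return set(rng) <= {f(x) for x in domain}

    Z = range(-50, 51)
    N = range(0, 51)

    print(is_one_to_one(abs, Z))                   # False: abs(-3) == abs(3)
    print(is_onto(abs, Z, N))                      # True over these finite sets
    print(is_one_to_one(lambda x: 2*x*x + x, Z))   # True: g is one-to-one on Z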

For many applications, isomorphic structures are considered equivalent. In old-
fashioned computer systems, for example, holes on a punched card could represent the
same data as magnetized spots on tape or currents flowing in transistors. Differences
same data as magnetized spots on tape or currents flowing in transistors. Differences
in the hardware are critical for the engineer, but irrelevant to the programmer. When
programmers copied data from cards to tape, they would blithely talk about "loading
cards to tape" as if the actual paper were moved. One mythical programmer even
wrote a suggestion for reducing the shipping costs in recycling old cards: load the
cards to tape and punch them out at the recycling works.

If f is an isomorphism from A to B, then there exists an inverse function, f⁻¹: B → A.
The inverse of the function increment is the function decrement: O → E with the
mapping,
decrement(x) = x - 1.
The composition of two functions is the application of one function to the result of the
other. Suppose that f: A → B and g: B → C are two functions. Then their
composition g(f(x)) is a function from A to C. The composition of a function with its
inverse produces the identity function, which maps any element to itself. For
any x in E, decrement(increment(x)) is the original element x.
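
A small Python illustration of composition and inverses; the helper compose is
invented for the example:

    increment = lambda x: x + 1   # E -> O
    decrement = lambda x: x - 1   # O -> E, the inverse of increment

    def compose(g, f):
        # The composition g(f(x)): a function from the domain of f
        # to the range of g.
        return lambda x: g(f(x))

    identity = compose(decrement, increment)
    print(identity(42))   # 42: a function composed with its inverse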

Functions may have more than one argument. A function of two arguments whose
first argument comes from a set A, second argument from a set B, and result from a
set C is specified f: A×B → C. A function with one argument is called monadic, with
two arguments dyadic, with three arguments triadic, and with n arguments n-adic.
Those terms are derived from Greek. Some authors prefer the Latin terms unary,
binary, ternary, and n-ary. The number of arguments of a function is sometimes
called its valence, adicity, or arity.

The rule that defines a function f: A → B as a mapping from a set A to a set B is called
the intension of the function f. The extension of f is the set of ordered pairs determined
by such a rule:

{⟨a1,b1⟩, ⟨a2,b2⟩, ⟨a3,b3⟩, ...},

where the first element of each pair is an element x in A, and the second is the
image f(x) in the set B. A definition by extension is only possible when the
domain A is finite. In all other cases, the function must be defined by intension. (One
could, of course, define the extension of a function as an infinite set, but the set itself
would have to be defined by intension.)

Since a function is a rule for mapping one set to another, the term mapping is
sometimes used as a synonym for function. Another synonym for function is the
term operator. Addition, subtraction, multiplication, and division are dyadic functions
defined over the real numbers, but they are usually called operators. A common
distinction is that functions have ordinary alphabetic names, but operators are
designated by special symbols like + or ÷. Traditional mathematical practice has
tended to use several different terms as informal synonyms for functions:

• If the domain of a function is a set of simple things like numbers and type
labels, it is usually called a function.
• If its domain and range are sets of complex structures like conceptual graphs, it
is often called a mapping.
• If its name is being spelled in full for readability, it may be written as an
alphanumeric string, such as increment(x) or add1(x).
• If it often occurs in complex expressions, it may be abbreviated by a single
symbol or Greek letter and be called an operator.

3. Lambda Calculus
Defining a function by a rule is more natural or intuitive than defining it as a set of
ordered pairs. But a question arises when functions defined by different rules or
intensions happen to have exactly the same sets of ordered pairs or extensions. In
developing his theory of lambda calculus, the logician Alonzo Church (1941)
distinguished equality by intension from equality by extension:
It is possible, however, to allow two functions to be different on the ground that the
rule of correspondence is different in meaning in the two cases although always
yielding the same result when applied to any particular argument. When this is done,
we shall say that we are dealing with functions in intension. The notion of difference
in meaning between two rules of correspondence is a vague one, but in terms of some
system of notation, it can be made exact in various ways.
Church's way of making the notion precise was to define lambda calculus as a notation
for defining functions and a method for converting any given definition into other
equivalent definitions.

In mathematics, the traditional way of defining a function is to specify the name of a
function and its formal parameter on the left side of an equation and to put the
defining expression on the right:
f(x) = 2x² + 3x - 2.
This method of definition makes it impossible to specify the name f of the function
independently of the name x of the formal parameter. To separate them, Church
adopted the Greek letter λ as a marker of the defining lambda expression:
f = λx(2x² + 3x - 2).
In this equation, the name f appears by itself on the left, and its definition is a separate
expression on the right. The symbol x that appears after λ is called the formal
parameter or bound variable, and the remainder of the expression is called the body.

Church's rules for lambda conversion are formal statements of the common
techniques for defining and evaluating functions. Whenever a function is applied to its
arguments, such as f(5), the function may be evaluated by replacing the name f with
the body of the definition and substituting the argument 5 for every occurrence of the
formal parameter x. Church also defined additional operators, which combined with
function evaluation to produce a computational system that is as general as a Turing
machine.
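
Python's lambda keyword descends directly from Church's notation, and applying
the function evaluates the body with the argument substituted for the formal
parameter. A one-line illustration:

    # Church's f = λx(2x² + 3x - 2), written with Python's lambda, which
    # likewise separates the name f from the defining expression.
    f = lambda x: 2*x**2 + 3*x - 2
    print(f(5))   # 63: the body evaluated with 5 substituted for x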

With such rules, Church answered the question about equality of functions: they are
equal by extension if they have the same sets of ordered pairs, and they are equal by
intension if their definitions are reducible to the same canonical form by the rules of
lambda conversion. An important result of the lambda calculus is the Church-Rosser
theorem: when an expression has more than one function application that can be
evaluated, the order of evaluation is irrelevant because the same canonical form would
be obtained with any sequence of evaluations.

In computer science, the clear separation of the name of a function from its defining
expression enables a lambda expression to be used anywhere that a function name
could be used. This feature is especially useful for applications that create new
functions dynamically and pass them as arguments to other functions that evaluate
them. John McCarthy (1960) adopted the lambda notation as the basis for defining
and evaluating functions in the LISP programming language. A common technique of
computational linguistics is to translate natural language phrases to lambda
expressions that define their semantics. William Woods (1968) used that technique for
defining the semantics of the English quantifiers every and some as well as extended
quantifiers such as more than two or less than seven. He implemented his definitions
in LISP programs that translated English questions to lambda expressions, which were
later evaluated to compute the answers. Richard Montague (1970) adopted a similar
technique for his treatment of quantifiers in natural language semantics.

4. Graphs
In diagrams, a graph is normally drawn as a network of nodes connected by arcs. Such
diagrams introduce arbitrary conventions that are irrelevant to the mathematical
definitions and theorems: Are the arcs drawn curved or straight? Short or long? Are
the nodes drawn as dots, circles, or other shapes? Is there any significance in having
node x above, below, to the right, or to the left of node y? To avoid such questions, a
graph is defined formally without any reference to a diagram. Diagrams are then
introduced as informal illustrations. Diagrams are essential for developing an intuitive
understanding, but the definitions and proofs are independent of any features of the
drawing that are not explicitly mentioned in the formal definitions.

Figure 1: A sample graph

Formally, a graph G consists of a set N of nodes and a set A of arcs. Every arc in A is
a pair of nodes from the set N. For the sample graph in Figure 1, the set of nodes is
{A, B, C, D, E}, and the set of arcs is {⟨A,B⟩, ⟨A,D⟩, ⟨B,C⟩, ⟨C,D⟩, ⟨D,E⟩}.
Notice that node D happens to be an endpoint of three different arcs. That property
can be seen instantly from the diagram, but it takes careful checking to verify it from
the set of pairs. For people, diagrams are the most convenient way of thinking about
graphs. For mathematical theories, a set of pairs is easier to axiomatize. And for
computer implementations, many different data structures are used, such as blocks of
storage for the nodes and pointers for the arcs.
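
As a concrete sketch, the graph of Figure 1 can be stored as a pair of sets, and the
property just mentioned can be verified from the pairs:

    # The sample graph of Figure 1 as a set of nodes and a set of arcs.
    N = {'A', 'B', 'C', 'D', 'E'}
    A = {('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D'), ('D', 'E')}

    # What the diagram shows at a glance must be computed from the pairs:
    # node D is an endpoint of three different arcs.
    print(sum(1 for arc in A if 'D' in arc))   # 3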

Figure 2: An alternate way of drawing the same graph as Figure 1.

Figure 2 is another way of drawing the same graph shown in Figure 1. The two
diagrams look very different, but their abstract representations as sets of nodes and
arcs are the same. Even when graphs are defined in a purely abstract way, questions
may arise about the order of the two nodes of an arc. If the order is irrelevant, the
notation {A,B} shows that the arc is an unordered set of two nodes. A graph whose
arcs are unordered pairs is said to be undirected. If the order is significant, ⟨A,B⟩
and ⟨B,A⟩ represent distinct arcs, and the graph is said to be directed. For the
directed graph represented in Figures 1 and 2, an arrowhead on each arc points to the
second node of each ordered pair.

Although graphs are defined abstractly, mathematicians normally visualize them as
diagrams. The common conventions for drawing graphs are reflected in descriptive
terms like endpoint, loop, path, and cycle. Let e be the arc ⟨a,b⟩. Then the
nodes a and b are called endpoints of e, and e is said to connect a and b. If e is an arc
of a directed graph, then the first endpoint a is called the source of e, and the second
endpoint b is called the target of e. The word target is easy to remember since that is
the direction the arrow points. A loop is an arc e whose endpoints are the same
node: e = ⟨a,a⟩.

Combinations of arcs are often named by the methods of traversing a graph.


A walk through a graph is a sequence of nodes a0, a1, ..., an for which any two
adjacent nodes ai and ai+1 are the endpoints of some arc. Any arc whose endpoints are
adjacent nodes of a walk is said to be traversed by the walk. A walk that contains n+1
nodes must traverse n arcs and is therefore said to be of length n. A path is a walk in
which all nodes are distinct. A walk with only one node a0 is a path of length 0. If
the first and last nodes of a walk are the same, but all other nodes are distinct, then the
walk is called a cycle. Every loop is a cycle of length 1, but cycles may traverse more
than one arc.

For the graph in Figure 2, the walk ⟨E, D, A, B⟩ is a path because all nodes are
distinct. The path is of length 3, which is equal to the number of arcs traversed by a
point that moves along the path. The walk ⟨D, C, B, A, D⟩ is a cycle because it starts
and ends at the same node.

If G is a directed graph, then a walk, path, or cycle through G may or may not follow
the same direction as the arrows. A walk, path, or cycle through G is said to
be directed if adjacent nodes occur in the same order in which they occur in some arc
of G: if ai and ai+1 are adjacent nodes on the walk, then the ordered pair ⟨ai,ai+1⟩ must
be an arc of G. An arc of a directed graph is like a one-way street, and a directed walk
obeys all the one-way signs (arrowheads). An undirected walk through a directed
graph is possible, simply by ignoring the ordering.
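
A directed walk can be checked mechanically against the set of arcs. A sketch, using
the arcs of Figures 1 and 2:

    # Checking a directed walk against the arcs of Figure 1.
    arcs = {('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D'), ('D', 'E')}

    def is_directed_walk(nodes, arcs):
        # Every adjacent pair of nodes must occur, in order, as an arc.
        return all((a, b) in arcs for a, b in zip(nodes, nodes[1:]))

    print(is_directed_walk(['A', 'D', 'E'], arcs))   # True: follows the arrowheads
    print(is_directed_walk(['E', 'D', 'A'], arcs))   # False: goes against them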

A graph is connected if there is a possible path (directed or undirected) between any
two nodes. If it is not connected, then it breaks down into disjoint components, each of
which is connected, but none of which has a path linking it to any other component.
A cutpoint of a graph is a node that, when removed, causes the graph (or the
component in which it is located) to separate into two or more disconnected
components.

Certain special cases of graphs are important enough to be given special names:
an acyclic graph is one that has no cycles, and a tree is an acyclic connected graph for
which the path between any two nodes is unique. The most commonly used trees
are rooted trees:

• The arcs of a rooted tree are directed.
• If ⟨a,b⟩ is an arc of the tree, the node a is called the parent of b, and b is
a child of a.
• There is a privileged node called the root, which has no parent.
• Every node except the root has exactly one parent.
• A node that has no child is called a leaf.

The terminology of trees is extended to related graphs: a forest is a collection of
disconnected trees; a chain is a tree with no branches, in which all the nodes lie along
a single path; and a seed has only one node and no arcs. Some authors require every
graph to have at least one node, but other authors include the empty graph or blank,
which has no nodes or arcs.
Figure 3: A binary tree

A binary tree is a rooted tree where every node that is not a leaf has exactly two
children (Figure 3). In a binary tree, the two children of each node are usually
designated as the left child and the right child. Since a tree has no cycles, a common
convention for simplifying the diagrams is to omit the arrowheads on the arcs, but to
draw the parent nodes at a higher level than their children. For Figure 3, the root A,
which has no parent, is at the top; and the leaves, which have no children, are
arranged along the bottom.

In computer applications, each node of a tree or other graph may have some
associated data. To process that data, a program can take a walk through the tree and
process the data at each node it visits. For the tree in Figure 3, imagine a walk that
starts at the root, visits every node at least once, and stops when it returns to the root.
Assume that the left child is always visited before the right child. Such a walk will
visit the leaves of the tree only once, but it will visit each of the branching nodes three
times: ⟨A, B, D, B, E, B, A, C, F, H, F, I, F, C, G, C, A⟩. There are therefore three
options for processing the data at the branching nodes:

• Preorder. Process the data at the first visit to the node. For Figure 3, the nodes
would be processed in the order A, B, D, E, C, F, H, I, G.
• Postorder. Process the data at the last visit to each node. For Figure 3, the
nodes would be processed in the order D, E, B, H, I, F, G, C, A.
• Inorder. Process the data at the middle visit to each node. For Figure 3, the
nodes would be processed in the order D, B, E, A, H, F, I, C, G.
These definitions can be generalized to trees with an arbitrary number of children at
each branching node. They can also be generalized to graphs by finding a spanning
tree, which includes all the nodes of the graph, but omits some subset of the arcs.
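
These three orders can be verified with a short sketch; the nested-tuple encoding of
Figure 3 is inferred from the traversal orders listed above:

    # The binary tree of Figure 3 encoded as nested tuples (node, left, right);
    # None marks a missing child.
    tree = ('A',
            ('B', ('D', None, None), ('E', None, None)),
            ('C', ('F', ('H', None, None), ('I', None, None)),
                  ('G', None, None)))

    def preorder(t):
        if t is None:
            return []
        node, left, right = t
        return [node] + preorder(left) + preorder(right)

    def inorder(t):
        if t is None:
            return []
        node, left, right = t
        return inorder(left) + [node] + inorder(right)

    def postorder(t):
        if t is None:
            return []
        node, left, right = t
        return postorder(left) + postorder(right) + [node]

    print(preorder(tree))    # A B D E C F H I G
    print(inorder(tree))     # D B E A H F I C G
    print(postorder(tree))   # D E B H I F G C A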

A common application of graph or tree walking algorithms is the translation of a parse
tree or a conceptual graph to some natural or artificial language. The patterns of word
order in various natural languages can be generated by different ways of walking
through a conceptual graph and translating the concept nodes to words of the target
language (Sowa 1984). Irish and Biblical Hebrew, for example, are preorder
languages that put the verb first, Latin and Japanese are postorder languages that put
the verb last, and English and Chinese are inorder languages that put the verb in the
middle.

The terminology for graphs in this section is fairly standard, but many of the ideas
have been developed independently by different people, who have introduced
different terms. Some authors use the terms vertex and edge instead of node and arc.
Others distinguish degrees of connectivity in a directed graph: it is strongly
connected if there is a directed path between any two nodes, and it is weakly
connected if there is only an undirected path between some pair of nodes. Some
authors use the term digraph as an abbreviation for directed graph, but that use is
confusing, since digraph should mean double graph. Occasionally, people introduce
fancy terms like arborescence for rooted tree, but the simpler terminology is more
descriptive.

5. Relations
A relation is a function of one or more arguments whose range is the set of truth
values {true,false}. An example of a dyadic or binary relation is the function less
than represented by the operator symbol <. Its domain is the set of ordered pairs of
real numbers, R×R:
<: R×R → {true,false}.
When applied to the numbers 5 and 12, 5<12 has the value true, but 12<5 has the
value false. Dyadic relations are often written as infix operators with special symbols,
such as x<y or x ∈ S. Sometimes, relations are represented by single letters, such as
R(x,y) or S(x,y,z). For improved readability, they may be represented by arbitrarily
long alphanumeric strings, such as Mother(x,y) or Between(x,y,z). Traditional
mathematics uses single letters or symbols, but programming languages and database
systems usually have so many variables, functions, and relations that longer names are
preferred. The term predicate is often used as a synonym for relation. Some authors,
however, say that relations must have two or more arguments and call a predicate with
one argument a property.

As with other functions, relations may be defined either by intension or by extension.


An intensional definition is a rule for computing a value true or false for each
possible input. An extensional definition is a set of all n-tuples of arguments for which
the relation is true; for all other arguments, the relation is false. One instance of a
relation being true is represented by a single n-tuple, called a relationship; the relation
itself is the totality of all relationships of the same type. A marriage, for example, is a
relationship between two individuals, but the relation called marriage is the totality of
all marriages.

The following table lists some common types of relations, an axiom that states the
defining constraint for each type, and an example of the type. The symbol ®
represents an arbitrary dyadic relation.

Type          | Axiom                              | Example of x®y
Reflexive     | (∀x) x®x                           | x is as old as y
Irreflexive   | (∀x) not(x®x)                      | x is the mother of y
Symmetric     | (∀x,y) x®y implies y®x             | x is the spouse of y
Asymmetric    | (∀x,y) x®y implies not(y®x)        | x is the husband of y
Antisymmetric | (∀x,y) x®y and y®x implies x=y     | x was present at y's birth
Transitive    | (∀x,y,z) x®y and y®z implies x®z   | x is an ancestor of y

The symbol ∀, called the universal quantifier, may be read for every or for all; it is
discussed further in Section 9 on predicate logic. Some important types of relations
satisfy two or more of the above axioms:

• A partial ordering, represented by the symbol ≤, is a dyadic relation that
satisfies three of the above axioms: reflexive, antisymmetric, and transitive.
The subset relation ⊆ is the most common partial ordering over sets. It is
antisymmetric because x ⊆ y and y ⊆ x imply that x=y. The subset relation is only
a partial ordering because there are many sets for which neither x ⊆ y nor y ⊆ x is
true.
• A linear ordering is a partial ordering where x ≤ y or y ≤ x for every pair x and y.
For real numbers, ≤ represents the linear ordering less than or equal to.
• An equivalence relation is reflexive, symmetric, and transitive. The archetype
of an equivalence relation is equality: it is reflexive, because x=x; it is
symmetric, because x=y implies y=x; and it is transitive,
because x=y and y=z imply x=z. As another example, born under the same sign
of the zodiac is an equivalence relation over the set of all people. Whenever an
equivalence relation is defined over a set, it divides the set into equivalence
classes. The zodiac relation divides the set of all human beings into 12
equivalence classes that have the traditional labels Aries, Taurus, ..., Pisces.
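
Given a relation by extension as a set of pairs, the axioms in the table above can be
tested by brute force over a finite domain. A sketch, applied to the subset relation
over the subsets of {1, 2}:

    from itertools import product

    # Brute-force tests of the axioms for a relation given by extension.
    def properties(pairs, domain):
        r = set(pairs)
        return {
            'reflexive':     all((x, x) in r for x in domain),
            'symmetric':     all((y, x) in r for (x, y) in r),
            'antisymmetric': all(not ((x, y) in r and (y, x) in r and x != y)
                                 for x, y in product(domain, repeat=2)),
            'transitive':    all((x, z) in r
                                 for (x, y) in r for (w, z) in r if w == y),
        }

    # The subset relation over the subsets of {1, 2}: a partial ordering.
    sets_ = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
    subset = [(a, b) for a in sets_ for b in sets_ if a <= b]
    print(properties(subset, sets_))
    # reflexive, antisymmetric, and transitive are True; symmetric is False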

In a relational database, the n-tuples of the relations are explicitly grouped in tables,
which are accessed by means of the Structured Query Language (SQL). A stored
relation is one whose values are physically stored by extension, and a virtual
relation is computed by some rule. In an object-oriented database, the n-tuples of the
relations are grouped with the entities or objects that occur in the n-tuples. The
resulting data structures resemble graphs rather than tables, but they can represent
exactly the same logical information (as shown in Section 6). In theory,
implementation issues are irrelevant since the equivalent logical operations can be
performed on either form. In practice, the question of how a relation is implemented
may have important implications on efficiency or ease of use. As an extreme example,
the relation x<y is trivial to compute, but it would require an infinite amount of space
to store.

6. Representing Relations by Graphs


Graphs and dyadic relations are mathematical structures that look different, but they
can represent the same information in logically equivalent ways. Historically, graphs
are more closely associated with geometric properties that can be seen from diagrams,
and relations are associated with more abstract mathematics and logic. But every
dyadic relation can be represented as a graph, and every graph defines a dyadic
relation.

Let G be a graph, and let the symbol ® represent the corresponding dyadic relation.
If x and y are nodes in the graph G, define x®y=true if the pair ⟨x,y⟩ is an arc of G,
and x®y=false if ⟨x,y⟩ is not an arc of G. If the graph is undirected, then ®
is symmetric because it satisfies the axiom x®y=y®x.

Although every dyadic relation can be represented by a graph, some extensions are
necessary to represent multiple relations in the same graph. A common technique is
to label the arcs of a graph with the names of the relations they represent. Informally,
a labeled graph can be viewed as a set of graphs, one for each relation, overlaid
on top of one another, with the labels showing which relation each arc has been
derived from. Formally, a labeled graph G can be constructed by the following steps:
1. Given a set {R1,...,Rn} of n dyadic relations, let {G1,...,Gn} be a set of graphs
that represent each of those relations.
2. The set of nodes N of the labeled graph G is defined as the union of the
nodes Ni of each graph Gi for each i from 1 to n.
3. The set of arcs A of G is the set of all triples of the form ⟨i,a,b⟩, where i is an
integer from 1 to n and ⟨a,b⟩ is an arc in the set Ai of the graph Gi. The
integer i is called the label of the arc.

This construction uses integers as labels, but it could be generalized to use character
strings or other symbols as labels. It could even be generalized to infinite graphs or
uncountably infinite graphs with real numbers used as labels. Such graphs could never
be stored in a computer, but it is possible to define them mathematically and prove
theorems about them.
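
The three construction steps translate directly into code. A sketch with two invented
relations R1 and R2 given by extension:

    # Building one labeled graph from two invented dyadic relations,
    # following the steps above: each arc becomes a triple (i, a, b).
    relations = {1: {('A', 'B'), ('B', 'C')},   # extension of R1 (hypothetical)
                 2: {('A', 'C')}}               # extension of R2 (hypothetical)

    nodes = {n for arcs in relations.values() for arc in arcs for n in arc}
    labeled_arcs = {(i, a, b)
                    for i, arcs in relations.items()
                    for (a, b) in arcs}

    print(nodes)          # {'A', 'B', 'C'}
    print(labeled_arcs)   # {(1, 'A', 'B'), (1, 'B', 'C'), (2, 'A', 'C')}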

Further extensions are needed to represent n-adic relations where n > 2. Two kinds of
generalized graphs may be used to represent n-adic
relations: hypergraphs and bipartite graphs. The unlabeled versions can represent a
single relation, and the labeled versions can represent an arbitrary collection of n-adic
relations, including the possibility of allowing different relations to have a
different valence or number of arguments.

• A hypergraph consists of a set N of nodes and a set A of arcs, but each arc
in A may have an arbitrary number of endpoints. Each arc of a hypergraph can
represent one n-tuple or relationship among n entities in the set of nodes N.
• A bipartite graph also consists of a set of nodes and a set of arcs, but the set of
nodes is split in two disjoint sets C and R. The nodes in C represent the same
entities as the nodes in N of the hypergraph, and the nodes in R represent the
same relationships as the arcs in A of the hypergraph.

Every arc of a bipartite graph has exactly two endpoints, but one must be
in C and the other must be in R. The set C is the same as the set N of the
hypergraph, but there is an isomorphism between the set R of the bipartite
graph and the set of arcs A of the hypergraph: for each arc
in A with n endpoints a1,...,an, the corresponding node r of R is an endpoint of
each of the n arcs ⟨r,a1⟩, ..., ⟨r,an⟩.

As these definitions illustrate, the mapping from relations to hypergraphs is somewhat
simpler than the mapping from relations to bipartite graphs. Formally, however,
hypergraphs and bipartite graphs are isomorphic, and they can represent the same
relations in equivalent ways.
Figure 4: A conceptual graph represented as a labeled bipartite graph.

When bipartite graphs are used to represent conceptual graphs (CGs), the nodes in
set C are called concepts and the nodes in set R are called conceptual relations. Figure
4 shows a conceptual graph that represents the English sentence John is going to
Boston by bus. The boxes in Figure 4 represent concept nodes, and the circles
represent conceptual relation nodes; every arc links a conceptual relation to a concept.
Although a CG could be defined formally as either a hypergraph or a bipartite graph,
the terminology of bipartite graphs maps more directly to the traditional way of
drawing CGs. For a tutorial, see the examples of conceptual graphs. For the formal
definitions, see the draft ISO standard for Common Logic.

7. Lattices
A mathematical structure is a set together with one or more relations or operators
defined over the set. A lattice is a structure consisting of a set L, a partial ordering ≤,
and two dyadic operators ∩ and ∪. If a and b are elements of L, a∩b is called
the greatest lower bound or infimum of a and b, and a∪b is called the least upper
bound or supremum of a and b. For any a, b, and c in L, these operators satisfy the
following axioms:

• a∩b ≤ a and a∩b ≤ b.
• If c is any element of L for which c ≤ a and c ≤ b, then c ≤ a∩b.
• a ≤ a∪b and b ≤ a∪b.
• If c is any element of L for which a ≤ c and b ≤ c, then a∪b ≤ c.

The symbols ∩ and ∪ are the same as the symbols for intersection and union of sets.
This similarity is not an accident because the set of all subsets of a universal
set U forms a lattice with the subset relation ⊆ as the partial ordering. A bounded
lattice is one with a top ⊤ and a bottom ⊥, where for any element a in the lattice,
⊥ ≤ a ≤ ⊤. All finite lattices are bounded, and so are many infinite ones. In a lattice of
subsets, the universal set U itself is ⊤, and the empty set {} is ⊥.

Since the ∩ and ∪ operators on lattices are so similar to the intersections and unions
of sets, they have many of the same properties. Following are the identities defined for
the set operators that also hold for lattice operators:

• Idempotency. A∪A is identical to A, and A∩A is also identical to A.
• Commutativity. A∪B is identical to B∪A, and A∩B is identical to B∩A.
• Associativity. A∪(B∪C) is identical to (A∪B)∪C, and A∩(B∩C) is
identical to (A∩B)∩C.
• Absorption. A∪(A∩B) is identical to A, and A∩(A∪B) is also identical to A.

These identities hold for all lattices. A distributive lattice is one for which the
distributive law also holds. A complemented lattice is one for which a complement
operator and De Morgan's laws apply. The lattice of all subsets of some set is an
example of a bounded distributive complemented lattice, for which all the identities
hold.

Graphs of mathematical lattices look like the lattices that one would expect a wisteria
vine to climb on. Figure 5 shows the graphs for three kinds of partial orderings: a tree,
a lattice, and a general acyclic graph. To simplify the drawings, a common convention
for acyclic graphs is to omit the arrows on the arcs, but to assume that the arcs are
directed either from the higher node to the lower node or from the lower node to the
higher. For the partial ordering a ≤ b, the arc would be directed from a lower node a to
a higher node b.

Figure 5: A lattice, a tree, and an acyclic graph

The term hierarchy is often used indiscriminately for any partial ordering. Some
authors use the term hierarchy to mean a tree, and tangled hierarchy to mean an
acyclic graph that is not a tree. In general, every tree is an acyclic graph, and every
lattice is also an acyclic graph; but most lattices are not trees, and most trees are not
lattices. In fact, the only graphs that are both trees and lattices are the simple chains
(which are linearly ordered).

All the type hierarchies discussed in the book Knowledge Representation are partial
orderings, and many of them are also lattices. Leibniz's Universal Characteristic,
which is discussed in Section 1.1 of that book, defines a lattice of concept types in
which the partial ordering operator ≤ is called subtype.

• Each attribute used to define concepts is assigned a unique prime number.
• Each concept type is represented by a product of the primes assigned to each of
its defining attributes.
• Concept type a is a subtype of b if the number for a is divisible by the number
for b.
• a∪b, the least upper bound, is the greatest common divisor of a and b.
• a∩b, the greatest lower bound, is the smallest integer divisible by both a and b
(their least common multiple).
• The universal concept type ⊤, which has no defining attributes, is represented
by 1.
• The bottom ⊥ of the lattice is the product of all the prime numbers that have
been assigned to possible attributes.
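
A sketch of this encoding; the attribute-to-prime assignments and the type names are
invented for illustration:

    from math import gcd

    # Leibniz's encoding with invented attribute-to-prime assignments.
    animate, vertebrate, warmBlooded, coldBlooded = 2, 3, 5, 7
    Vertebrate = animate * vertebrate              # number 6
    Dog = animate * vertebrate * warmBlooded       # number 30
    Reptile = animate * vertebrate * coldBlooded   # number 42

    def is_subtype(a, b):
        return a % b == 0   # a's number is divisible by b's

    def lcm(a, b):
        return a * b // gcd(a, b)

    print(is_subtype(Dog, Vertebrate))   # True: Dog is a subtype of Vertebrate
    print(gcd(Dog, Reptile))    # 6 = Vertebrate: most specific common supertype
    print(lcm(Dog, Reptile))    # 210: most general common subtype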

Another representation, which is isomorphic to Leibniz's products of primes, would
represent each concept type by a bit string. If n is the number of primitive attributes,
each product of primes could be mapped to a bit string of length n:

• An attribute represented by the i-th prime number is mapped to a string with 1
in the i-th position and all other bits 0.
• Each concept type is represented by the logical OR of all the bit strings for each
of its defining attributes.
• Concept type a is a subtype of b if each position that has a 1 bit for b has a 1 bit
for a.
• a∪b is the logical AND of the bit strings for a and b. (Since subtypes have more
defining attributes, the supremum, which is a common supertype, keeps only the
attributes shared by both.)
• a∩b is the logical OR of the bit strings for a and b.
• The top ⊤ is the string with all bits 0.
• The bottom ⊥ is the string with all bits 1.

Note the inverse relationship between the number of attributes that define a concept
type and the number of entities to which it applies. The type Dog applies to fewer
entities in the real world than its supertype Animal, but more attributes are required to
describe it. This inverse relationship between the number of attributes required to
define a concept and the number of entities to which it applies was first noted by
Aristotle. It is called the duality of intension and extension.
Leibniz's method generates lattices with all possible combinations of attributes, but
most combinations never occur in practice. The following table of beverages, which is
taken from a paper by Michael Erdmann (1998), illustrates a typical situation in which
many combinations do not occur. Some combinations are impossible, such as a
beverage that is simultaneously alcoholic and nonalcoholic. Others are merely
unlikely, such as hot and sparkling.

Concept Type  | nonalcoholic | hot | alcoholic | caffeinic | sparkling
HerbTea       |      x       |  x  |           |           |
Coffee        |      x       |  x  |           |     x     |
MineralWater  |      x       |     |           |           |     x
Wine          |              |     |     x     |           |
Beer          |              |     |     x     |           |     x
Cola          |      x       |     |           |     x     |     x
Champagne     |              |     |     x     |           |     x

Table of beverage types and attributes

To generate the minimal lattice for classifying the beverages in the above table,
Erdmann applied the method of formal concept analysis (FCA), developed by
Bernhard Ganter and Rudolf Wille (1999) and implemented in an automated tool
called Toscana. Figure 6 shows the resulting lattice; attributes begin with lower-case
letters, and concept types begin with upper-case letters.
Figure 6: Lattice constructed by the method of formal concept analysis

In Figure 6, beer and champagne are both classified at the same node, since they have
exactly the same attributes. To distinguish them more clearly, wine and champagne
could be assigned the attribute madeFromGrapes, and beer the
attribute madeFromGrain. Then the Toscana system would automatically generate a
new lattice with three added nodes:

• Wine would be alcoholic ∧ madeFromGrapes.
• Beer would be sparkling ∧ alcoholic ∧ madeFromGrain.
• Champagne would be sparkling ∧ alcoholic ∧ madeFromGrapes.

Figure 7 shows the revised lattice with the new nodes and attributes.
Figure 7: Revised lattice with new attributes

Note that the attribute nonalcoholic is redundant, since it is the complement of the
attribute alcoholic. If that attribute had been omitted from the table, the FCA method
would still have constructed the same lattice. The only difference is that the node
corresponding to the attribute nonalcoholic would not have a label. In a lattice for a
familiar domain, such as beverages, most of the nodes correspond to common words
or phrases. In Figure 7, the only node that does not correspond to a common word or
phrase in English is sparkling ∧ alcoholic.

Lattices are especially important for representing ontology and the techniques for
revising, refining, and sharing ontologies. Each addition of a new attribute results in a
new lattice, which is a refinement of the previous lattice. A refinement generated by
FCA contains only the minimal number of nodes needed to accommodate the new
attribute and its subtypes. Leibniz's method, which generates all possible
combinations, would introduce superfluous nodes, such as hot ∧ caffeinic ∧
sparkling ∧ madeFromGrapes. The FCA lattices, however, contain only the known
concept types and likely generalizations, such as sparkling ∧ alcoholic. For this
example, Leibniz's method would generate a lattice of 64 nodes, but the FCA method
generates only 14 nodes. A Leibniz-style lattice is the ultimate refinement for a
given set of attributes, and it may be useful when all possible combinations must be
considered. But the more compact FCA lattices avoid the nonexistent combinations.
A further study of ontology raises questions about the origin of the various attributes
and their relationships to one another. In Leibniz's method and the FCA method, the
attributes madeFromGrapes and madeFromGrain are treated as independent
primitives. Yet both of them could be analyzed as the dyadic madeFrom relation
combined with either grapes or grain. Then madeFrom could be further analyzed
into make and from, but the verb make would raise new questions about the difference
between making wine from grapes and making a milkshake from milk. The plural
noun grapes and the mass noun grain would also raise questions about quantification
and measurement. A lattice is an important structure for organizing concept types, but
a complete definition of those types leads to all the problems of language, logic, and
ontology. For further discussion of those issues, see the paper "Concepts in the
Lexicon".

8. Propositional Logic
Symbolic logic has two main branches: propositional calculus and predicate calculus,
which are also called propositional logic and predicate logic. Propositional logic deals
with statements or propositions and the connections between them. The symbol p, for
example, could represent the proposition Lillian is the mother of Leslie. Predicate
logic, however, would represent that proposition by a predicate Mother applied to the
names of the two individuals: Mother(Lillian,Leslie). Whereas propositional logic
represents a complete statement by a single symbol, predicate logic analyzes the
statement into finer components.

Besides symbols for propositions, propositional logic also includes Boolean
operators that represent logical relations such as and, or, not, and if-then. Let p be the
proposition The sun is shining, and let q be the proposition It is raining. The most
commonly used operators in propositional logic correspond to the English words and,
or, not, if-then, and if-and-only-if:

• Conjunction (and). p∧q represents the proposition The sun is shining, and it is
raining.
• Disjunction (or). p∨q represents The sun is shining, or it is raining.
• Negation (not). ~p represents The sun is not shining.
• Material implication (if-then). p⊃q represents If the sun is shining, then it is
raining.
• Equivalence (if-and-only-if). p≡q represents The sun is shining if and only if it
is raining.
The propositions represented in symbolic logic may be true or false. The rules of
propositional logic compute the truth value of a compound proposition from the truth
or falsity of the elementary propositions contained within it. They are therefore
called truth functions, whose inputs are the truth values T for true and F for false.
The following table, called a truth table, shows the outputs generated by the five truth
functions ∧, ∨, ~, ⊃, and ≡ for all possible combinations of the two inputs p and q.

p  q | p∧q  p∨q  ~p  p⊃q  p≡q
T  T |  T    T   F    T    T
T  F |  F    T   F    F    F
F  T |  F    T   T    T    F
F  F |  F    F   T    T    T

There are 16 possible truth functions of two arguments, but the five listed in this table
are the most commonly used. Another operator that is sometimes used is exclusive or,
which is equivalent to p or q, but not both. Two operators commonly used in
computer circuit design are nand and nor, with the symbols ↑ and ↓:

p↑q is equivalent to ~(p∧q),
p↓q is equivalent to ~(p∨q).

If one or two Boolean operators are taken as primitives, the others can be defined in
terms of them. One common choice of primitives is the pair ~ and ∧. The other
operators can be defined by the following combinations:

p∨q is equivalent to ~(~p∧~q),
p⊃q is equivalent to ~(p∧~q),
p≡q is equivalent to ~(p∧~q)∧~(~p∧q).

In fact, only one primitive operator, either ↑ or ↓, is necessary, since both ~ and ∧
can be defined in terms of either one of them:

~p is equivalent to (p↑p),
~p is equivalent to (p↓p),
p∧q is equivalent to (p↑q)↑(p↑q),
p∧q is equivalent to (p↓p)↓(q↓q).
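
Such definitions can be verified by exhausting the four combinations of truth values,
as in the following sketch:

    from itertools import product

    nand = lambda p, q: not (p and q)

    # Exhaustively check the nand definitions above over all truth values.
    for p, q in product([True, False], repeat=2):
        assert nand(p, p) == (not p)                         # ~p
        assert nand(nand(p, q), nand(p, q)) == (p and q)     # p AND q
    print("~ and AND are definable from nand alone")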

Peirce's existential graphs use negation and conjunction as the two Boolean
primitives. Peirce was also the first person to discover that all the other Boolean
operators could be defined in terms of ↑ or ↓. For Peirce's own tutorial on existential
graphs, see his Manuscript 514.

9. Predicate Logic
In propositional logic, the proposition All peaches are fuzzy may be represented by a
single symbol p. In predicate logic, however, the fine structure of the proposition is
analyzed in detail. Then p would be represented by an entire formula,

∀x(peach(x) ⊃ fuzzy(x)).

The symbol ∀ is called the universal quantifier, and the symbols peach and fuzzy are
predicates or relations, such as the ones described in Section 5 on relations. The
combination ∀x may be read for every x or for all x, the combination peach(x) may be
read x is a peach, and the combination fuzzy(x) may be read x is fuzzy. The entire
formula, therefore, may be read For every x, if x is a peach, then x is fuzzy. Since
predicates (or relations) are functions that yield truth values as results and since the
Boolean operators are functions that take truth values as their inputs, predicates can be
combined with the same operators used in propositional logic.

Besides the universal quantifier, predicate logic has an existential


quantifier represented as . The combination x may be read there exists an x such
that. The following formula uses an existential quantifier:

~ x(peach(x) ~fuzzy(x)).
This may be read It is false that there exists an x such that x is a peach and x is not
fuzzy. Formulas with more than one quantifier are possible. The English
statement For every integer x, there is a prime number greater than x is represented as
∀x∃y(integer(x) → (prime(y) ∧ x<y)).
Literally, this formula may be read For every x, there exists a y such that if x is an
integer, then y is prime and x is less than y.

The two kinds of quantifiers, Boolean operators, variables, predicates, and the rules
for putting them together in formulas make up the entire notation of first-order
predicate calculus, which is also known as first-order logic or FOL. It is called first
order because the range of quantifiers is restricted to simple, unanalyzable
individuals.

Higher-order logic or HOL goes beyond FOL by allowing function variables and
predicate variables to be governed by quantifiers. An example of a higher-order
formula is the axiom of induction:

∀P((P(0) ∧ ∀n(P(n) → P(n+1))) → ∀nP(n)).


This formula may be read For every predicate P, if P is true of 0, and for every n,
P(n) implies P(n+1), then P is true of every n. This is the only axiom for arithmetic
that requires more expressive power than first-order logic.

Any of the functions, operators, relations, and predicates of Sections 1 through 6 can
also be used in the formulas of first-order logic. Following are the formation rules that
define the syntax of formulas:

 A term is either a constant like 2, a variable like x, or an n-adic function


symbol applied to n arguments, each of which is itself a term.
 An atom is either a single letter like p that represents a proposition or an n-adic
predicate symbol applied to n arguments, each of which is a term.
A formula is either an atom, a formula preceded by ~, any two
formulas A and B together with any dyadic Boolean operator op in the
combination (A op B), or any formula A and any variable x in either of the
combinations ∀xA or ∃xA.

Completely formalized presentations of logic go into great detail about the syntactic
form of variables, constants, and functions. When those details are omitted, the nature
of the terms can be stated by a declaration, such as let x be a variable or let c be a
constant.

The formation rules of first-order logic are an example of a recursive definition. By


applying them repeatedly, any possible formula can be derived. Suppose that f is a
monadic function and + is a dyadic operator; then f(x) and 2+2 are terms. (By the
conventions of Section 2, functions written with single characters are called operators,
but they form terms just like other functions.) If P is a dyadic predicate and Q is a
monadic predicate, then P(f(x),2+2) and Q(7) are atoms. Since all atoms are formulas,
these two formulas can be combined by the Boolean operator ∧ to form a new
formula:
(P(f(x),2+2) ∧ Q(7)).
Since any formula may be preceded by ~ to form another formula, the following
formula may be derived:
~(P(f(x),2+2) ∧ Q(7)).
Putting the quantifier ∃y in front of it produces
∃y~(P(f(x),2+2) ∧ Q(7)).
Adding another quantifier ∀x produces
∀x∃y~(P(f(x),2+2) ∧ Q(7)).
And preceding this formula with ~ produces
~∀x∃y~(P(f(x),2+2) ∧ Q(7)).
In this formula, the occurrence of x in f(x) is bound by the quantifier ∀x, but the
quantifier ∃y has no effect on the formula since there is no other occurrence of the
variable y.
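
The formation rules can be mirrored directly in a program. The following Python
sketch (the class names are ours, chosen only for this illustration) represents terms,
atoms, and formulas as a recursive data structure and builds the final formula of the
derivation above by the same steps:

from dataclasses import dataclass

# A minimal abstract syntax that mirrors the formation rules.
@dataclass
class Var:   name: str                      # a variable like x
@dataclass
class Const: value: object                  # a constant like 2
@dataclass
class Fn:    name: str; args: tuple         # n-adic function symbol
@dataclass
class Atom:  pred: str; args: tuple         # n-adic predicate symbol
@dataclass
class Not:   body: object                   # ~A
@dataclass
class BinOp: op: str; left: object; right: object   # (A op B)
@dataclass
class Quant: q: str; var: str; body: object         # ∀xA or ∃xA

# Build ~∀x∃y~(P(f(x),2+2) ∧ Q(7)) step by step, as in the text.
a1 = Atom('P', (Fn('f', (Var('x'),)), Fn('+', (Const(2), Const(2)))))
a2 = Atom('Q', (Const(7),))
f  = Not(Quant('∀', 'x', Quant('∃', 'y', Not(BinOp('∧', a1, a2)))))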

The order of quantifiers in predicate logic makes a crucial difference, as it does in


English. Consider the sentence Every man in department C99 married a woman who
came from Boston, which may be represented by the formula,

∀x∃y((man(x) ∧ dept(x,C99)) →
(woman(y) ∧ hometown(y,Boston) ∧ married(x,y))).
This formula says that for every x there exists a y such that if x is a man and x works
in department C99, then y is a woman, the home town of y is Boston, and x married y.
Since the dyadic predicate married is symmetric, married(Ike,Mamie) is equivalent to
married(Mamie,Ike). Interchanging the arguments of that predicate makes no
difference, but interchanging the two quantifiers leads to the formula,
∃y∀x((man(x) ∧ dept(x,C99)) →
(woman(y) ∧ hometown(y,Boston) ∧ married(x,y))).
This formula says that there exists a y such that for every x, if x is a man and x works
in department C99, then y is a woman, the home town of y is Boston, and x married y.
In ordinary English, that would be the same as saying, A woman who came from
Boston married every man in department C99. If there is more than one man in
department C99, this sentence has implications that are very different from the
preceding one.

The first version of predicate logic was developed by Gottlob Frege (1879), but in a
notation that no one else ever used. The more common algebraic notation, which has
been presented in this section, was defined by Charles Sanders Peirce (1883, 1885),
but with a different choice of symbols for the quantifiers and Boolean operators.
Giuseppe Peano (1889) adopted the notation from Peirce and introduced the symbols
that are still in use today. That notation is sometimes called Peano-Russell notation,
since Alfred North Whitehead and Bertrand Russell popularized it in the Principia
Mathematica. But it is more accurate to call it Peirce-Peano notation, since the
extensions that Russell and Whitehead added are rarely used today. The notation
presented in this section is Peirce's algebraic notation with Peano's choice of symbols.
For a survey of other notations for logic, see the examples that compare predicate
calculus to conceptual graphs and the Knowledge Interchange Format (KIF). Aristotle
presented his original syllogisms in a stylized version of Greek, and modern computer
systems sometimes represent predicate calculus in a stylized or controlled natural
language.

10. Axioms and Proofs


Formation rules define the syntax of first-order logic. They do not, however, say
anything about the meaning of the formulas, their truth or falsity, or how they are
related to one another. Model theory, which is discussed in Section 13, defines
an evaluation function that determines whether a formula is true or false in terms of
some model. Proof theory, which is discussed in this section, determines how true
formulas can be derived from other true formulas. Since the time of Aristotle, many
different proof procedures have been developed for logic, but they all share some
common characteristics:

 Certain formulas or types of formulas, called axioms and definitions, are


assumed to be true.
 A proof is a sequence of formulas, each of which is derived from axioms,
definitions, or previous formulas in the sequence by applying some rule of
inference.
 The last line of a proof is called the conclusion.
 If the first line of a proof is a formula p, called a hypothesis, the conclusion of
the proof must be true whenever the hypothesis p and the starting axioms are
true.
 If the first line of a proof is an axiom, then the conclusion is called a theorem,
whose truth depends on nothing other than the truth of the original axioms.

As an example, the following two formulas are equivalent, since each one implies the
other:

∀x(peach(x) → fuzzy(x))

~∃x(peach(x) ∧ ~fuzzy(x))
To prove that these formulas are equivalent, it is necessary to find two proofs. One
proof would start with the first formula as hypothesis and apply the rules of inference
to derive the second. The second proof would start with the second formula as
hypothesis and derive the first.

Any equivalence, whether assumed as a definition or proved as a theorem, can be used


to derive one formula from another by the rule of substitution. For any formula A,
either of the following two equivalences may be assumed as a definition, and the other
can be proved as a theorem:

∀xA is equivalent to ~∃x~A
∃xA is equivalent to ~∀x~A

Since these equivalences are true when A is any formula whatever, they remain true
when any particular formula, such as (peach(x) → fuzzy(x)), is substituted for A. With
this substitution in both sides of the equivalence for ∀, the left side becomes identical
to the first peach formula above, and the right side becomes the following formula:

~∃x~(peach(x) → fuzzy(x)).
This formula can be transformed into other equivalent formulas by using any of the
equivalences given in Section 8. For any formulas p and q, the formula p → q was
defined as equivalent to the formula ~(p ∧ ~q). Therefore, the next formula can be
derived by substituting peach(x) for p and fuzzy(x) for q:
~∃x~~(peach(x) ∧ ~fuzzy(x)).
For any formula p, the double negation ~~p is equivalent to p. Therefore, the double
negation in the previous formula can be deleted to derive
~∃x(peach(x) ∧ ~fuzzy(x)),
which is identical to the second peach formula. Therefore, the second peach formula
can be proved from the first.

To prove that both peach formulas are equivalent, another proof must start with the
second formula as hypothesis and derive the first formula. For this example, the
second proof is easy to find because each step in the previous proof used an
equivalence. Therefore, each step can be reversed to derive the first formula from the
second. Most proofs, however, are not reversible because some of the most important
rules of inference are not equivalences.

Following are some rules of inference for the propositional logic. The symbols p, q,
and r represent any formulas whatever. Since these symbols can also represent
formulas that include predicates and quantifiers, these same rules can be used for
predicate logic. Of these rules, only the rule of conjunction is an equivalence; none of
the others are reversible.

Modus ponens. From p and p → q, derive q.
Modus tollens. From ~q and p → q, derive ~p.
Hypothetical syllogism. From p → q and q → r, derive p → r.
Disjunctive syllogism. From p ∨ q and ~p, derive q.
Conjunction. From p and q, derive p ∧ q.
Addition. From p, derive p ∨ q. This rule allows any formula whatever to be
added to a disjunction.
Subtraction. From p ∧ q, derive p. This rule simplifies formulas by throwing
away unneeded conjuncts.

Not all of these rules are primitive. In developing a theory of logic, logicians try to
minimize the number of primitive rules. Then they show that other rules,
called derived rules of inference, can be defined in terms of them. Nevertheless, for all
versions of classical propositional and predicate logic, these rules are either primitive
or derived rules of inference.

Following are some common equivalences. When two formulas are equivalent, either
one can be substituted for any occurrence of the other, either alone or as part of some
larger formula:

Idempotency. p ∧ p is equivalent to p, and p ∨ p is also equivalent to p.
Commutativity. p ∧ q is equivalent to q ∧ p, and p ∨ q is equivalent to q ∨ p.
Associativity. p ∧ (q ∧ r) is equivalent to (p ∧ q) ∧ r, and p ∨ (q ∨ r) is equivalent
to (p ∨ q) ∨ r.
Distributivity. p ∧ (q ∨ r) is equivalent to (p ∧ q) ∨ (p ∧ r), and p ∨ (q ∧ r) is
equivalent to (p ∨ q) ∧ (p ∨ r).
Absorption. p ∧ (p ∨ q) is equivalent to p, and p ∨ (p ∧ q) is equivalent to p.
Double negation. p is equivalent to ~~p.
De Morgan's laws. ~(p ∧ q) is equivalent to ~p ∨ ~q, and ~(p ∨ q) is equivalent
to ~p ∧ ~q.

These equivalences happen to have exactly the same form as the ones for union and
intersection of sets. This similarity is not an accident because George Boole used the
same operators of Boolean algebra for sets and propositions. Giuseppe Peano chose a
different selection of symbols because he wanted to use logic to prove theorems about
sets, and it would be confusing to use the same symbols for both kinds of operators. It
is helpful, however, to remember that the formulas for ∩ and ∪ can be used for ∧
and ∨ just by making the rounded ends pointy.
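
Because propositional formulas are truth functions, any of these equivalences can be
verified by brute force over all truth assignments. A minimal Python sketch, with a
helper name of our own choosing:

from itertools import product

def equivalent(f, g, nvars):
    """Check that two truth functions agree on every assignment."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=nvars))

# De Morgan's laws
assert equivalent(lambda p, q: not (p and q), lambda p, q: not p or not q, 2)
assert equivalent(lambda p, q: not (p or q), lambda p, q: not p and not q, 2)
# Distributivity
assert equivalent(lambda p, q, r: p and (q or r),
                  lambda p, q, r: (p and q) or (p and r), 3)
# Absorption
assert equivalent(lambda p, q: p and (p or q), lambda p, q: p, 2)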

For predicate logic, the rules of inference include all the rules of propositional logic
with additional rules about substituting values for quantified variables. Before those
rules can be stated, however, a distinction must be drawn between free
occurrences and bound occurrences of a variable:

 If A is an atom, then all occurrences of a variable x in A are said to be free.


 If a formula C was derived from formulas A and B by combining them with
Boolean operators, then all occurrences of variables that are free in A and B are
also free in C.
If a formula C was derived from a formula A by preceding A with either
∀x or ∃x, then all free occurrences of x in A are said to be bound in C. All free
occurrences of other variables in A remain free in C.

Once the definitions of free and bound occurrences have been stated, rules of
inference that deal with quantifiers can be given. Let φ(x) be a formula containing a
free occurrence of a variable x. Then φ(t) is the result of substituting a term t for
every free occurrence of x in φ. Following are the additional rules of inference:

Universal instantiation. From ∀xφ(x), derive φ(c), where c is any constant.
Existential generalization. From φ(c), where c is any constant, derive ∃xφ(x),
provided that every occurrence of x in φ(x) is free.
Dropping quantifiers. If the variable x does not occur free in φ, then from ∀xφ
derive φ, and from ∃xφ derive φ.
Adding quantifiers. From φ derive ∀xφ or derive ∃xφ, where x is any
variable whatever.
Substituting equals for equals. From φ(t) and t=u, derive φ(u), provided that
all free occurrences of variables in u remain free in φ(u).

In a sound theory, the rules of inference preserve truth: only true formulas can be
proved. In a complete theory, all true formulas can be proved. Kurt Gödel (1930)
showed that the rules of inference for first-order logic are both sound and complete. In
the following year, he showed that the rules of inference for higher-order logic are
incomplete; i.e., there exist true formulas that are not provable.

11. Formal Grammars


A formal grammar is a system for defining the syntax of a language by specifying the
strings of symbols or sentences that are considered grammatical. Since the set of
grammatical sentences of a language may be very large or infinite, they are usually
derived by a recursive definition. The derivations are generated by production rules,
which were developed by Axel Thue (1914) as a method for transforming strings of
symbols. Emil Post (1943) showed that production rules were general enough to
simulate a Turing machine. Andrei Andreyevich Markov (1954) developed a general
theory of algorithms based on Post production rules. Besides string transformations,
Markov showed how production rules could compute any mathematical function that
was computable by recursive functions or the lambda calculus.
Noam Chomsky (1956, 1957) used production rules for specifying the syntax of
natural languages, and John Backus (1959) used them to specify programming
languages. Although Chomsky and Backus both adopted their notations from Post,
they found that the completely unrestricted versions used by Thue, Post, and Markov
were more powerful than they needed. Backus limited his grammars to the context-
free rules, while Chomsky also used the more general, but still restricted context-
sensitive rules. The unrestricted rules can be inefficient or undecidable, but the more
restricted rules allow simpler, more efficient algorithms for analyzing or parsing a
sentence.

A grammar has two main categories of symbols: terminal symbols like the, dog,
or jump, which appear in the sentences of the language itself;
and nonterminal symbols like N, NP, and S, which represent the grammatical
categories noun, noun phrase, and sentence. The production rules state how the
nonterminal symbols are transformed in generating sentences of the language.
Terminal symbols are called terminal because no production rules apply to them:
when a derivation generates a string consisting only of terminal symbols, it must
terminate. Nonterminal symbols, however, keep getting replaced during the
derivation. A formal grammar G has four components:

 A set of symbols T, called the terminal symbols.


 A set of symbols N, called the nonterminal symbols, with the restriction
that T and N are disjoint: T ∩ N = {}.
 A special nonterminal symbol S, called the start symbol.
 A set of production rules P of the form:
A → B

where A is a sequence of symbols having at least one nonterminal, and B is the


result of replacing some nonterminal symbol in A with a sequence of symbols
(possibly empty) from T and N.

The start symbol corresponds to the highest level category that is recognized by the
grammar, such as sentence. The production rules generate sentences by starting with
the start symbol S and systematically replacing nonterminal symbols until a string
consisting only of terminals is derived. A parsing program applies the rules in reverse
to determine whether a given string is a sentence that can be generated from S.
Grammars of this form are called phrase-structure grammars because they determine
the structure of a sentence as a hierarchy of phrases.

Some convention is needed to distinguish terminals from nonterminals. Some people


write terminal symbols in lower case letters and nonterminals in upper case; other
people adopt the opposite convention. To be explicit, this section will enclose terminal
symbols in double quotes, as in "the" or "dog". To illustrate the formalism, the
following grammar defines a small subset of English:

 Terminal symbols T: {"the", "a", "cat", "dog", "saw", "chased"}


 Nonterminal symbols N: {S, NP, VP, Det, N, V}
 Start symbol S: S

The set T defines a 6-word vocabulary, and the set N defines the basic grammatical
categories. The starting symbol S represents a complete sentence. The symbol NP
represents a noun phrase, VP a verb phrase, Det a determiner, N a noun, and V a verb.
The following 9 production rules determine the grammatical combinations for this
language:
S → NP VP
NP → Det N
VP → V NP
Det → "the"
Det → "a"
N → "cat"
N → "dog"
V → "saw"
V → "chased"
This grammar may be used to generate sentences by starting with the symbol S and
successively replacing nonterminal symbols on the left-hand side of some rule with
the string of symbols on the right:
S
NP VP
Det N VP
a N VP
a dog VP
a dog V NP
a dog chased NP
a dog chased Det N
a dog chased the N
a dog chased the dog
Since the last line contains only terminal symbols, the derivation stops. When more
than one rule applies, any one may be used. The symbol V, for example, could have
been replaced by saw instead of chased. The same grammar could be used to parse a
sentence by applying the rules in reverse. The parsing would start with a sentence
like a dog chased the dog and reduce it to the start symbol S.
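
The generation procedure for this grammar is easy to program. The following Python
sketch (the rule table and function names are ours) derives a sentence by repeatedly
replacing nonterminals, choosing at random when more than one rule applies:

import random

# The nine production rules above; nonterminals are the dictionary keys,
# and any symbol that is not a key is a terminal.
RULES = {
    'S':   [['NP', 'VP']],
    'NP':  [['Det', 'N']],
    'VP':  [['V', 'NP']],
    'Det': [['the'], ['a']],
    'N':   [['cat'], ['dog']],
    'V':   [['saw'], ['chased']],
}

def generate(symbol='S'):
    """Replace nonterminals until only terminals remain. When more than
    one rule applies, any one may be used; here we choose at random."""
    if symbol not in RULES:                 # terminal: derivation stops
        return [symbol]
    expansion = random.choice(RULES[symbol])
    return [w for sym in expansion for w in generate(sym)]

print(' '.join(generate()))                 # e.g. "a dog chased the dog"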

The production rules for the above grammar belong to the category of context-free
rules. Other classes of grammars are ranked according to the complexity of their
production rules. The following four categories of complexity were originally defined
by Chomsky:
 A finite-state or regular grammar has production rules of the following two
forms:
A → x B
C → y

where A, B, and C are single nonterminal symbols, and x and y represent single
terminal symbols. Note that finite-state grammars may have recursive rules of
the form A → x A, but the recursive symbol A may only occur as the rightmost
symbol. Such recursions, which are also called tail recursions, can always be
translated to looping statements by an optimizing compiler.

 A context-free grammar has production rules of the following form:


A → B C ... D

where A is a single nonterminal symbol, and B C ... D is any sequence of one


or more symbols, either terminal or nonterminal. In addition to the tail
recursions permitted by a finite-state grammar, a context-free grammar allows
recursive symbols to occur anywhere in the replacement string. A recursive
symbol in the middle of the string, called an embedded recursion, cannot in
general be eliminated by an optimizing compiler. Such recursions require a
parsing program to use a pushdown stack or an equivalent technique to manage
temporary storage.

 A context-sensitive grammar has production rules of the following form:


a A z → a B C ... D z

where A is a single nonterminal symbol, a and z are strings of zero or more


symbols (terminal or nonterminal), and B C ... D is a string of one or more
terminal or nonterminal symbols. To analyze a context-sensitive grammar, a
parsing program requires a symbol table or an equivalent storage-management
technique to keep track of context dependencies.

 A general-rewrite grammar allows the rules to transform any string of symbols


to any other string of symbols. A single modification is sufficient to convert a
context-sensitive grammar to a general-rewrite grammar: allow any
nonterminal symbol A to be replaced by the empty string. Production rules that
generate empty strings cause parsing programs to become nondeterministic
because an empty string might occur anywhere in the string to be analyzed.

Each one of these classes of grammars is more general than the one before and
requires more complex parsing algorithms to recognize sentences in the language.
Every finite-state grammar is also context free, every context-free grammar is also
context sensitive, and every context-sensitive grammar is also a general-rewrite
grammar. But the converses do not hold. For both programming languages and natural
languages, intermediate levels of complexity have been defined for which parsing
algorithms of greater or lesser efficiency can be written.

Once a grammar is defined, all grammatical sentences in the language can be


generated by the following procedure:

1. Write the start symbol as the first line of a derivation.


2. To derive a new line, find some production rule whose left-hand side matches a
substring of symbols in the current line. Then copy the current line, replacing
the matching substring with the symbols on the right-hand side of the
production rule.
3. If more than one production rule has a matching left-hand side, then any one of
them may be applied.
4. If no production rule can be applied to the current line, then stop; otherwise, go
back to step #2.

The last line in a derivation is called a sentence of the language defined by the given
grammar.

For convenience, production rules may be written in an extended notation, which uses
some additional symbols. The new symbols do not increase the number of sentences
that can be derived, but they reduce the total number of grammar rules that need to be
written. The following extensions, which define regular expressions, are used in Unix
utilities such as grep and awk.

 The vertical bar "|" separates alternatives.


 Parentheses "(" and ")" indicate grouping.
 An asterisk "*" following any symbol or group indicates 0 or more occurrences
of the preceding.
 A plus sign "+" following any symbol or group indicates 1 or more occurrences
of the preceding.
 A question mark "?" following any symbol or group indicates 0 or 1 occurrence
of the preceding.

With these extensions, any finite-state or regular grammar can be expressed in a single
production rule whose right-hand side consists of a regular expression. The more
complex grammars require more than one production rule, but usually many fewer
than with the basic notation.

Some examples may help to show how the extended notation reduces the number of
rules. By using the vertical bar, the following two production rules,
N → "cat"
N → "dog"
can be combined in a single rule:
N → "cat" | "dog"
If the grammar permits a noun phrase to contain an optional adjective, it might use the
following two rules to define NP:
NP → Det N
NP → Det Adj N
Then both rules can be combined by using the question mark to indicate an optional
adjective:
NP → Det Adj? N
If an optional list of adjectives is permitted, then the question mark can be replaced
with an asterisk:
NP → Det Adj* N
This single rule in the extended notation is equivalent to the following four rules in
the basic notation:
NP → Det N
NP → Det AdjList N
AdjList → Adj
AdjList → Adj AdjList
The last production rule has a tail recursion, in which the leftmost symbol AdjList is
replaced by a string that includes the same symbol. To generate an unlimited number
of possible sentences, a grammar must have at least one rule that is directly or
indirectly recursive. Since the sample grammar for a fragment of English had no
recursive rules, it could only generate a finite number of different sentences (in this
case 32). With the addition of a recursive rule such as this, it could generate infinitely
many sentences (all of which would be rather boring).
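
The correspondence with regular expressions can be seen by treating the category
symbols themselves as tokens. A small Python sketch, using the standard re module:

import re

# Tokens are category names separated by blanks; the extended rule
# NP -> Det Adj* N then corresponds to this regular expression.
NP = re.compile(r'^Det (Adj )*N$')

for s in ['Det N', 'Det Adj N', 'Det Adj Adj N', 'Det Adj']:
    print(repr(s), bool(NP.match(s)))       # True, True, True, False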

By allowing embedded recursions, the more general context-free grammars can


enforce constraints that cannot be expressed in a finite-state or regular grammar. The
following two rules, for example, generate all strings consisting of n left parentheses
followed by n right parentheses:
S "(" S ")"
S "(" ")"
Since each rule adds a balanced pair of parentheses, the result of applying these rules
any number of times must always be balanced. A finite-state grammar or a regular
expression cannot guarantee that the number of parentheses on the right matches the
number on the left. The following regular expression, for example, generates too
many strings:
S "("+ ")"+
Besides the strings of balanced parentheses, it generates strings like "())))" and
"((((())". A context-free grammar can ensure that the two sides are balanced by
generating part of the right side and the corresponding part of the left side in the same
rule. A context-sensitive grammar can impose more general constraints that depend on
combinations of symbols that occur anywhere in the string. A general-rewrite
grammar can impose any constraint that can be formulated in any programming
language.
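
The difference is easy to demonstrate in Python. The sketch below (the function name
is ours) compares a direct recognizer for the context-free language with the
overgenerating regular expression:

import re

def balanced(s):
    """The language of the two context-free rules: n '(' followed by n ')'."""
    n = len(s) // 2
    return n >= 1 and s == '(' * n + ')' * n

# The regular expression S -> "("+ ")"+ accepts too many strings.
regex = re.compile(r'^\(+\)+$')
for s in ['(())', '((((()', '())))']:
    print(repr(s), balanced(s), bool(regex.match(s)))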

12. Game Graphs


Programs in artificial intelligence use trees and graphs to represent games. Chess,
checkers, and tic-tac-toe may be represented by directed graphs whose nodes
represent game positions or states and whose arcs represent moves from one state to
the next. A complete play of a game, called a game path, is a directed walk from some
starting state to an ending state that determines a win, loss, or draw.

This section describes a common type of games called two-person zero-sum perfect-
information games. They are called two-person games to distinguish them from games
like poker with many players; they are zero-sum games because anything that one
player loses the other player wins (as distinguished from negative-sum games where
the house takes a cut or positive-sum games where new values are created); and they
are perfect-information games because each player can see the complete state at all
times (as distinguished from poker or bridge where some of the most significant
information is hidden). Following are some basic definitions:

 A game G is a directed graph with a set of nodes P called states and a set of
arcs M called moves.
If ⟨P1,P2⟩ is a move in M, then state P2 is called a successor of P1.
Each state is an n-tuple ⟨p1,...,pn⟩ that describes the current position of the
game. The first element p1 is called the player on move and the remaining
elements p2,...,pn depend on the type of game.
 For a two-person game, the player on move is one of two symbols {A,B}. Each
player is called an opponent of the other player.
 There is a nonempty subset S of P called the starting states of the game.
 There is a nonempty subset E of P called the ending states of the
game. E consists of all states that have no successors.
 A game path is a directed walk from a starting state in S to an ending state in E.
 There is a function payoff that maps ending states into numbers. For every
ending state P, if payoff(P) is positive, then A has a win and B has a loss; if it is
negative, then B has a win and A has a loss; and if it is zero, then
both A and B have a draw.

This definition is not general enough to represent all possible games, but it is adequate
for typical board games. It allows the possibility of games with more than one starting
state, as in a game of chess where one player is given a handicap. For many games,
the payoff is +1 for a win, and -1 for a loss. Other games have a numerical score with
a wider range.

The play of the game consists of moves from state to state. If more than one move is
permitted in a given state, the player on move has the right to choose which one to
play. For a state ⟨p1,...,pn⟩, the first symbol p1 identifies the player on move, and the
remaining information p2,...,pn depends on the type of game. In chess, for example, the
current state describes the location of all the pieces on the board, but it also includes
control information about the possibility of castling or en passant pawn captures.

Since the payoff function is only defined for ending states, the value at nonending
states may be computed on the assumption that each player makes an optimal choice
at each move. In choosing moves, A's strategy is to maximize the payoff, and B's
strategy is to minimize the payoff. Therefore, the usual method for computing the
value at a state P of a game G is called a minimax algorithm because it alternately
tries to minimize or maximize the predicted value depending on which player is on
move. The value at state P is determined by a recursive function value(P):

 If P is an ending state, value(P)=payoff(P).


 If P is a nonending state with player A on move, let Q1,...,Qn be the successors
to state P. Then value(P) is the maximum of value(Q1),...,value(Qn).
 If P is a nonending state with player B on move, let Q1,...,Qn be the successors
to state P. Then value(P) is the minimum of value(Q1),...,value(Qn).
 If value(P) is positive, then A is said to have a winning strategy in state P. If it
is negative, then B is said to have a winning strategy in P.

The value function computes the expected payoff if both players make the best moves
at each turn. If some game paths are infinite, value(P) may be undefined for some
state P; if all game paths are finite, value(P) is defined for all P. Books on AI that
cover game-playing programs discuss algorithms for evaluating this function
efficiently or approximating it if an exact evaluation would take too much time. The
result of the value function is also called the backed-up value because it computes the
value of a nonending state P by looking ahead to the ending states that are reachable
from P and computing backward until it determines the value for P.
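
The recursive definition of value(P) translates almost line for line into code. The
following Python sketch is a minimal version, assuming the game supplies the three
callables described in the comments; practical game-playing programs add the
optimizations mentioned above:

def value(state, successors, payoff, on_move):
    """Backed-up minimax value of a game state, assuming all game paths
    are finite. The three callables describe the game:
      successors(state) -> list of successor states (empty at an ending state),
      payoff(state)     -> number, defined only for ending states,
      on_move(state)    -> 'A' (maximizer) or 'B' (minimizer)."""
    succ = successors(state)
    if not succ:                            # ending state
        return payoff(state)
    vals = [value(s, successors, payoff, on_move) for s in succ]
    return max(vals) if on_move(state) == 'A' else min(vals)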

13. Model Theory


In two famous papers, "The Concept of Truth in Formalized Languages" and "On the
Concept of Logical Consequence," Alfred Tarski (1935, 1936) began the development
of model theory. He described his theory as a formalization in symbolic logic of
Aristotle's classic definition (Metaphysics, 1011b27):

To say of what is that it is not, or of what is not that it is, is false,


while to say of what is that it is, or of what is not that it is not, is true.

For Aristotle, these conditions define what it means for a sentence in a natural
language to be true about the world. Tarski, however, made two important
simplifications. First, he restricted his definitions to formalized languages, in
particular, to first-order predicate logic. Second, he replaced the complexities of the
real world by an abstract "model of an axiom system." For an informal discussion of
Tarski's approach, see his paper "The Semantic Conception of Truth".

Formally, a model M consists of a set D of entities called a domain of individuals and


a set R of relations defined over the individuals in D. To determine whether a sentence
is true or false, model theory defines an evaluation function Φ, which can be applied
to any formula p in first-order logic and any model M = ⟨D,R⟩:

Φ(p,M) = true if and only if p is true about the individuals in D and their
relationships in R.
Φ(p,M) = false if and only if p is false about the individuals in D and their
relationships in R.

The technical material in Tarski's papers defines the evaluation function in terms of
the syntax of first-order logic.

One of the most complicated features of Tarski's original definition is his method of
assigning entities in the domain D to the variables in the formula p. Although his
definition is mathematically correct, it is inefficient to compute for finite domains and
impossible to compute for infinite domains. As an example, consider the
statement For every integer n, the integer 2n is even. In predicate calculus, that
statement could be expressed by the following formula:

(∀n)(∃m)(integer(n) → (times(2,n,m) ∧ even(m))).


According to Tarski's original definition, the truth of this formula is determined by
assigning all possible integers to the variable n and checking whether there exists a
value of m that makes the body of the formula true for each assignment. Such a
computation is impossible for infinite domains and extremely inefficient even for
finite domains.

To simplify the evaluation of Φ, the logician Jaakko Hintikka (1973, 1985)


developed game-theoretical semantics as a systematic method for assigning values to
the variables one at a time. Risto Hilpinen (1982) showed that game-theoretical
semantics is equivalent to the method of endoporeutic, which Peirce developed for
determining the truth value of an existential graph. Sowa (1984) adopted game-
theoretical semantics as the method for defining the semantics of conceptual graphs.
In an introductory textbook for teaching model theory, Barwise and Etchemendy
(1993) adopted game-theoretical semantics and wrote a computer program to teach
their readers how to play the game. Game-theoretical semantics is mathematically
equivalent to Tarski's original definition, but it is easier to explain, easier to generalize
to a wide variety of languages, and more efficient to implement in computer
programs.

In game-theoretical semantics, every formula p determines a two-person zero-sum


perfect-information game, as defined in Section 12. The two players in the evaluation
game are the proposer, who wins the game by showing that p is true, and the skeptic,
who wins by showing that p is false. To play the game, the two players analyze p from
the outside in, according to the following rules. The rules progressively simplify the
formula p by removing quantifiers and Boolean operators until the formula is reduced
to a single atom.

Existential. If p has the form (∃x)q, the proposer selects some individual in D,
gives it a name c, and substitutes c for every occurrence of x in q. Then the
game continues with the modified version of q as the new p.
Universal. If p has the form (∀x)q, the skeptic selects some individual in D,
gives it a name c, and substitutes c for every occurrence of x in q. Then the
game continues with the modified version of q as the new p.
Conjunction. If p has the form q1 ∧ q2, the skeptic selects either q1 or q2, which
becomes the new p as the game continues.
Disjunction. If p has the form q1 ∨ q2, the proposer selects either q1 or q2, which
becomes the new p as the game continues.
Negation. If p has the form ~q, then q becomes the new p, and the two players
change sides: the proposer becomes the new skeptic, and the skeptic becomes
the new proposer.
Implication. If p has the form q1 → q2, then the game continues with (~q1) ∨ q2 as
the new p.
 Atom. An ending state is any state in which p is an atom: an n-adic predicate
symbol applied to n arguments, each of which is the name of some element of
the domain D.

Each rule except implication simplifies the formula p by removing one quantifier or
Boolean operator. The rules for conjunction and disjunction may even throw away
half of the formula. Although the implication rule replaces one operator with two, the
new operators are simpler than →. Sooner or later, the game ends when p is reduced to
a single atom. Then the winner or loser is determined by checking whether that atom
consists of the name P of some n-adic relation in R applied to an n-tuple of
arguments ⟨c1,...,cn⟩ for which that relation is true.

The player who is the current proposer wins the game if the n-tuple ⟨c1,...,cn⟩ is
in the extension of the relation named by P.
The player who is the current skeptic wins if the n-tuple ⟨c1,...,cn⟩ is not in the
extension of the relation named by P.

If neither player made a bad move at any step, the original p is true if the player who
started the game as proposer is the winner and false if the player who started the game
as skeptic is the winner. In effect, the procedure for computing the evaluation
function Φ(p,M) is equivalent to determining which side has a winning strategy in the
game specified by the formula p for the given model M.

To illustrate the evaluation game, compute the denotation of the formula that
expresses the statement For every integer n, the integer 2n is even:

(∀n)(∃m)(integer(n) → (times(2,n,m) ∧ even(m))).


For this example, the domain D is the set of all integers and the set of
relations R would include the relations named times, integer, and even. Since the
extensions of these relations contain infinitely many tuples, they could not be stored
explicitly on a computer, but they could easily be computed as needed.

1. Since the first quantifier is ∀, the skeptic makes the first move by choosing any
element of D. Suppose that the skeptic chooses n=17. Then the new version
of p becomes
(∃m)(integer(17) → (times(2,17,m) ∧ even(m))).
2. Since the quantifier is ∃, the proposer makes the next move. By peeking ahead,
a clever proposer would guess m=34. Then the new version of p becomes
integer(17) → (times(2,17,34) ∧ even(34)).
3. By the rule for →, this formula is converted to
~integer(17) ∨ (times(2,17,34) ∧ even(34)).
4. By the rule for ∨, the proposer can choose either side of the ∨. Suppose that the
proposer chooses the right-hand side:
times(2,17,34) ∧ even(34).
5. By the rule for ∧, the skeptic can choose either side of the ∧. Choosing the left
side,
times(2,17,34).
6. Finally, the proposer and the skeptic search the table of triples for the times
relation. Since ⟨2,17,34⟩ is in the table, the proposer wins, and p is declared to
be true.
For the proposer to win one game is not sufficient to show that the formula has
denotation true. Instead, the proposer must have a winning strategy for any possible
move by the skeptic. Since the domain of integers is infinite, the skeptic has infinitely
many options at step 1. But for any choice n by the skeptic at step 1, the proposer can
always choose the value 2n for m at step 2. The next option for the skeptic is at step 5,
where the formula would become
times(2,n,2n) ∧ even(2n).
Since both sides of the ∧ operator must be true, the skeptic has no possibility of
winning. Therefore, the proposer is guaranteed to win, and the formula is true.
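
The rules of the evaluation game translate directly into a recursive procedure over a
finite model. The following Python sketch (the tuple encoding and all names are ours)
evaluates a finite analogue of the integer example; note how the rule for negation
appears as a change of sides, how the proposer's choices become any, and how the
skeptic's choices become all:

def play(p, model, env=None):
    """Evaluation game over a finite model = (domain, relations).
    Formulas are nested tuples: ('forall', x, q), ('exists', x, q),
    ('and', q1, q2), ('or', q1, q2), ('not', q), ('implies', q1, q2),
    or ('atom', name, args). Returns True iff the player who starts as
    proposer has a winning strategy, i.e. iff p is true in the model."""
    env = env or {}
    domain, rels = model
    tag = p[0]
    if tag == 'atom':                       # ending state of the game
        return rels[p[1]](*(env.get(a, a) for a in p[2]))
    if tag == 'not':                        # the two players change sides
        return not play(p[1], model, env)
    if tag == 'and':                        # skeptic picks a conjunct
        return play(p[1], model, env) and play(p[2], model, env)
    if tag == 'or':                         # proposer picks a disjunct
        return play(p[1], model, env) or play(p[2], model, env)
    if tag == 'implies':                    # q1 -> q2 becomes ~q1 v q2
        return play(('or', ('not', p[1]), p[2]), model, env)
    if tag == 'exists':                     # proposer chooses an individual
        return any(play(p[2], model, {**env, p[1]: c}) for c in domain)
    if tag == 'forall':                     # skeptic chooses an individual
        return all(play(p[2], model, {**env, p[1]: c}) for c in domain)
    raise ValueError(p)

# A finite analogue of the example: restrict 'integer' to 0..10 so the
# witness m = 2n stays inside the finite domain 0..20.
domain = range(21)
rels = {'integer': lambda n: n <= 10,
        'times':   lambda a, n, m: a * n == m,
        'even':    lambda m: m % 2 == 0}
f = ('forall', 'n', ('exists', 'm',
     ('implies', ('atom', 'integer', ('n',)),
      ('and', ('atom', 'times', (2, 'n', 'm')),
              ('atom', 'even', ('m',))))))
print(play(f, (domain, rels)))              # True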

This example shows that a winning strategy can often be found for models that allow
infinitely many possible moves. Leon Henkin (1959) showed that game-theoretical
semantics could also be applied to infinitely long formulas. In some cases, the truth
value of an infinite formula can be computed in just a finite number of steps. For
example, consider an infinite conjunction:
p1 ∧ p2 ∧ p3 ∧ ...
If this formula happens to be true, the evaluation game would take infinitely long,
since every pi would have to be tested. But if it is false, the game would stop when the
skeptic finds the first false pi. A similar optimization holds for infinite disjunctions:
p1 ∨ p2 ∨ p3 ∨ ...
If a disjunctive formula is false, the evaluation game would take infinitely long. But if
it is true, the game would stop when the proposer finds the first true pi. Although
nothing implemented in a computer can be truly infinite, a typical database or
knowledge base may have millions of options. Tarski's original definition would
require an exhaustive check of all possibilities, but the game-theoretical method can
prune away large branches of the computation. In effect, the optimizations are
equivalent to the optimizations used in computer game-playing programs. They are
also similar to the methods used to answer an SQL query in terms of a relational
database.

Conceptual Graphs
Conceptual graphs (CGs) are a system of logic based on
the existential graphs of Charles Sanders Peirce and the
semantic networks of artificial intelligence. They express
meaning in a form that is logically precise, humanly readable,
and computationally tractable. With a direct mapping to
language, conceptual graphs serve as an intermediate
language for translating computer-oriented formalisms to and
from natural languages. With their graphic representation,
they serve as a readable, but formal design and specification
language. CGs have been implemented in a variety of projects
for information retrieval, database design, expert systems, and
natural language processing.


Conceptual graphs are formally defined in an abstract syntax that is independent of
any notation, but the formalism can be represented in several different concrete
notations. This document illustrates CGs by means of examples represented in the
graphical display form (DF), the formally defined conceptual graph interchange
form (CGIF), and the compact, but readable linear form (LF). Every CG is
represented in each of these three forms and is translated to a logically equivalent
representation in predicate calculus and in the Knowledge Interchange Format (KIF).
For the formal definition of conceptual graphs and the various notations for
representing them, see the draft proposed American National Standard. For examples
of an English-like notation for representing logic, see the web page on controlled
English.

List of Examples
Following are some sample sentences that are represented in each of the notations:
CGs, KIF, and predicate calculus. Click on the sentence to go directly to its
representation.

1. A cat is on a mat.
2. Every cat is on a mat.
3. John is going to Boston by bus.
4. A person is between a rock and a hard place.
5. Tom believes that Mary wants to marry a sailor.

1. A cat is on a mat.
In the display form (DF), concepts are represented by rectangles: the concept [Cat]
represents an instance of a cat, and [Mat] represents an instance of a mat. Conceptual
relations are represented by circles or ovals: the conceptual relation (On) relates a cat
to a mat. The arcs that link the relations to the concepts are represented by arrows: the
first arc has an arrow pointing toward the relation, and the second arc has an arrow
pointing away from the relation. If a relation has more than two arcs, the arcs are
numbered.

In the linear form (LF), concepts are represented by square brackets instead of boxes,
and the conceptual relations are represented by parentheses instead of circles:
[Cat]→(On)→[Mat].
Both DF and LF are designed for communication with humans or between humans
and machines. For communication between machines, the conceptual graph
interchange form (CGIF) has a syntax that uses coreference labels to represent the
arcs:
[Cat: *x] [Mat: *y] (On ?x ?y)

The symbols *x and *y are called defining labels. The matching symbols ?x and ?y
are the bound labels that indicate references to the same instance of a cat x or a mat y.
To reduce the number of coreference labels, CGIF also permits concepts to be nested
inside the relation nodes:
(On [Cat] [Mat])

The display form in Figure 1 represents the abstract CG most directly. All the
variations of LF and CGIF represent stylistically different, but logically equivalent
ways of linearizing the same abstract graph. All these variations are accommodated by
the LF grammar and the CGIF grammar, which are defined in the CG standard.

CGIF is intended for transfer between computer systems that use CGs as their internal
representation. For communication with systems that use other internal
representations, CGIF can be translated to another logic-based formalism called the
Knowledge Interchange Format (KIF):
(exists ((?x Cat) (?y Mat)) (On ?x ?y))

Although DF, LF, CGIF, and KIF look very different, their semantics is defined by
the same logical foundations. They can all be translated to a statement of the
following form in typed predicate calculus:
(∃x:Cat)(∃y:Mat)on(x,y).
Any statement expressed in any one of these notations can be automatically
translated to a logically equivalent statement in any of the others. Formatting and
stylistic information, however, may be lost in translations between DF, LF, CGIF, KIF,
and predicate calculus.
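
As a toy illustration of such a translation (the function and its conventions are ours,
covering only this single-relation pattern, not full CGIF), the KIF form of Figure 1
can be generated mechanically:

def cg_to_kif(relation, types):
    """Translate a single-relation CG such as (On [Cat] [Mat]) into KIF:
    each concept contributes one existentially quantified, typed variable."""
    vs = ['?x%d' % (i + 1) for i in range(len(types))]
    decls = ' '.join('(%s %s)' % (v, t) for v, t in zip(vs, types))
    return '(exists (%s) (%s %s))' % (decls, relation, ' '.join(vs))

print(cg_to_kif('On', ['Cat', 'Mat']))
# (exists ((?x1 Cat) (?x2 Mat)) (On ?x1 ?x2))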

2. Every cat is on a mat.


The default quantifier in a concept is the existential ∃, which is normally represented
by a blank. The concept [Cat] without anything in the referent field is logically
equivalent to the concept [Cat: ∃], which asserts the proposition that there exists a cat.
Other quantifiers, such as the universal ∀, are called defined quantifiers because they
can be defined in terms of conceptual graphs containing only the default existential. In
Figure 2, the concept [Cat: ∀] represents the phrase every cat, and the complete CG
represents the sentence Every cat is on a mat.

In the linear form (LF), the universal quantifier may be represented by the symbol ∀ if
it is available. Otherwise, it is represented by the symbol @every. Both of the
following CGs are semantically identical:
[Cat: ∀]→(On)→[Mat].

[Cat: @every]→(On)→[Mat].

Since CGIF is expressible in the 7-bit subset of ASCII or Unicode, the
character ∀ must be represented by @every in CGIF:
[Cat: @every*x] [Mat: *y] (On ?x ?y)

As in Figure 1, CGIF permits concepts to be nested inside the relation nodes:


(On [Cat: @every] [Mat])

In all these examples, the universal quantifier @every or ∀ includes the default
existential quantifiers in the same context within its scope. The scope is enforced by
the definition of the quantifier @every in terms of the existential. When the definition
is expanded, the CG in Figure 2 is expanded to a CG that can be represented by the
following LF graph:
~[ [Cat: *x]
~[ [?x]→(On)→[Mat] ]].
Literally, this CG may be read It is false that there exists a cat x that is not on a mat.
An alternate reading treats the two nested negations as an implication: If there exists
a cat x, then x is on a mat. Following is a CGIF representation of the expanded CG:
~[ [Cat: *x] ~[ (On ?x [Mat]) ]].
All of these forms are logically equivalent to the original CG in Figure 2.
Since KIF has a universal quantifier, it is not necessary to expand the defined
quantifier @every before translating a CG to KIF. Following is the KIF translation of
Figure 2:
(forall ((?x Cat)) (exists ((?y Mat)) (On ?x ?y)))
This KIF statement is logically equivalent to the KIF statement that results from
translating the expanded CG:
(not (exists ((?x Cat)) (not (exists ((?y Mat)) (On ?x ?y)))))

The original CG in Figure 2 may be represented by the following formula in typed


predicate calculus:
("x:Cat)($y:Mat)on(x,y).
This formula is logically equivalent to the formula that represents the expanded CG:
~(∃x:Cat)~(∃y:Mat)on(x,y).

3. John is going to Boston by bus.


Figure 3 shows a conceptual graph with four concepts: [Go], [Person: John], [City:
Boston], and [Bus]. It has three conceptual relations: (Agnt) relates [Go] to the agent
John, (Dest) relates [Go] to the destination Boston, and (Inst) relates [Go] to the
instrument bus.

Since the concept [Go] is attached to three conceptual relations, the linear form cannot
be drawn in a straight line, as in Figure 1. Instead, a hyphen at the end of the first line
indicates that the relations attached to [Go] are continued on subsequent lines.
[Go]-
(Agnt)→[Person: John]
(Dest)→[City: Boston]
(Inst)→[Bus].
This example resembles frame notation, but LF also permits coreference labels to
represent the cross references needed to represent arbitrary graphs.

In the following CGIF representation for Figure 3, each concept has its own defining
label:
[Go: *x] [Person: John *y] [City: Boston *z] [Bus: *w]
(Agnt ?x ?y) (Dest ?x ?z) (Inst ?x ?w)

By nesting some of the concepts inside the relations, the CGIF form can be limited to
just a single defining label *x and a bound label ?x inside each relation node:
[Go *x] (Agnt ?x [Person: John]) (Dest ?x [City: Boston]) (Inst ?x [Bus])

The display form in Figure 3 represents the abstract CG most directly. All the
variations of LF and CGIF represent different, but logically equivalent ways of
linearizing the same abstract graph.

The version of CGIF that assigns a separate defining label to each concept usually has
the most direct mapping to KIF:
(exists ((?x Go) (?y Person) (?z City) (?w Bus))
(and (Name ?y John) (Name ?z Boston)
(Agnt ?x ?y) (Dest ?x ?z) (Inst ?x ?w)))

Following is the corresponding formula in typed predicate calculus:


(∃x:Go)(∃y:Person)(∃z:City)(∃w:Bus)
(name(y,'John') ∧ name(z,'Boston') ∧
agnt(x,y) ∧ dest(x,z) ∧ inst(x,w))
For a list of the relations that connect the concepts corresponding to verbs to the
concepts of their participants, see the web page on thematic roles.

4. A person is between a rock and a hard place.


The between relation (Betw) is a triadic relation, whose first two arcs are linked to
concepts of entities that occur on either side of the entity represented by the concept
linked to the third arc. For a conceptual relation with n arcs, the first n-1 arcs have
arrows that point toward the circle, and the n-th or last arc points away.
In LF, Figure 4 may be represented in the following form:
[Person]←(Betw)-
←1-[Rock]
←2-[Place]→(Attr)→[Hard].
The hyphen after the relation indicates that its other arcs are continued on
subsequent lines. The two arcs that point towards the relation are numbered 1 and
2. The arc that points away is the last or third arc; the number 3 may be omitted,
since it is implied by the outward pointing arrow. For monadic relations, both the
number 1 and the arrow pointing towards the circle are optional. For dyadic
relations, the arcs are either numbered 1 and 2, or the first arc points towards the
circle and the second arc points away.

CGIF allows any number of concepts to be nested inside the relations:


(Betw [Rock] [Place *x] [Person]) (Attr ?x [Hard])
For relations with more than 2 arcs, CGIF notation is more compact than the
multiline LF notation. Therefore, most LF implementations allow CGIF notation to be
mixed with the arrow notation:
[Place: *x]→(Attr)→[Hard] (Betw [Rock] ?x [Person]).
In the CG standard, the only notation that must be standardized is CGIF, since every
implementation must recognize exactly the same forms. The LF and DF notations are
specified only in an informative annex to the CG standard. Therefore, implementers
may experiment with different variations in an attempt to improve readability or
convenience. Nevertheless, too much variation may make it harder for human
readers to switch from one version to another.

Following is the KIF representation:


(exists ((?x person) (?y rock) (?z place) (?w hard))
(and (betw ?y ?z ?x) (attr ?z ?w)))
And following is the corresponding formula in predicate calculus:
(∃x:Person)(∃y:Rock)(∃z:Place)(∃w:Hard)
(betw(y,z,x) ∧ attr(z,w))
To emphasize the correspondence between CGs and KIF, it is possible to write CGIF
in a form in which the concept nodes with their quantifiers and coreference labels
precede the relation nodes:
[Person *x] [Rock *y] [Place *z] [Hard *w]
(betw ?y ?z ?x) (attr ?z ?w)

5. Tom believes that Mary wants to marry a sailor.


A context is a concept with a nested conceptual graph that describes the referent. In
Figure 5, the concept of type Proposition is a context that describes a proposition that
Tom believes. Inside that context is another context of type Situation, which describes
a situation that Tom believes Mary wants. The resulting CG represents the
sentence Tom believes that Mary wants to marry a sailor.

In Figure 5, Tom is the experiencer (Expr) of the concept [Believe], which is linked
by the theme relation (Thme) to a proposition that Tom believes. The proposition box
contains another conceptual graph, which says that Mary is the experiencer of [Want],
which has as theme a situation that Mary hopes will come to pass. That situation is
described by another nested graph, which says that Mary (represented by the concept
[T]) marries a sailor. The dotted line, called a coreference link, shows that the concept
[T] in the situation box refers to the same individual as the concept [Person: Mary] in
the proposition box. Following is the corresponding linear form:
[Person: Tom]←(Expr)←[Believe]→(Thme)-
[Proposition: [Person: Mary *x]←(Expr)←[Want]→(Thme)-
[Situation: [?x]←(Agnt)←[Marry]→(Thme)→[Sailor] ]].
Both the display form and the linear form follow the same rules for the scope of
quantifiers. The part of the graph outside the nested contexts contains three
concepts: [Person: Tom], [Believe], and the proposition that Tom believes. That part
of the graph asserts information that is assumed to be true of the real world.

Inside the proposition box are three more concepts: [Person: Mary], [Want], and the
situation that Mary wants. Since those three are only asserted within the context of
Tom's belief, the graph does not imply that they must exist in the real world. Since
Mary is a named individual, one might give her the benefit of the doubt and assume
that she also exists; but her desire and the situation she supposedly desires exist in the
context of Tom's belief. If his belief is false, the referents of those concepts might not
exist in the real world. Inside the context of the desired situation are the concepts
[Marry] and [Sailor], whose referents exist within the scope of Mary's desire, which
itself exists only within the scope of Tom's belief.

Following is the CGIF representation for Figure 5:


[Person: *x1 'Tom'] [Believe *x2] (Expr ?x2 ?x1)
(Thme ?x2 [Proposition:
[Person: *x3 'Mary'] [Want *x4] (Expr ?x4 ?x3)
(Thme ?x4 [Situation:
[Marry *x5] (Agnt ?x5 ?x3) (Thme ?x5 [Sailor]) ]) ])

Following is the KIF statement:


(exists ((?x1 person) (?x2 believe))
(and (expr ?x2 ?x1)
(thme ?x2
(exists ((?x3 person) (?x4 want) (?x8 situation))
(and (name ?x3 'Mary) (expr ?x4 ?x3) (thme ?x4 ?x8)
(dscr ?x8 (exists ((?x5 marry) (?x6 sailor))
(and (Agnt ?x5 ?x3) (Thme ?x5 ?x6)))))))))
Following is the predicate calculus formula:
(∃x1:Person)(∃x2:Believe)(expr(x2,x1) ∧
thme(x2, (∃x3:Person)(∃x4:Want)(∃x8:Situation)
(name(x3,'Mary') ∧ expr(x4,x3) ∧ thme(x4,x8) ∧
dscr(x8, (∃x5:Marry)(∃x6:Sailor)
(agnt(x5,x3) ∧ thme(x5,x6))))))
UNIT II ONTOLOGY

Words of Wisdom
There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.

William Shakespeare, Hamlet

The task of classifying all the words of language, or what's the same thing, all the
ideas that seek expression, is the most stupendous of logical tasks. Anybody but the
most accomplished logician must break down in it utterly; and even for the strongest
man, it is the severest possible tax on the logical equipment and faculty.

Charles Sanders Peirce, letter to editor B. E. Smith of the Century Dictionary

The art of ranking things in genera and species is of no small importance and very
much assists our judgment as well as our memory. You know how much it matters in
botany, not to mention animals and other substances, or again moral and notional
entities as some call them. Order largely depends on it, and many good authors write
in such a way that their whole account could be divided and subdivided according to
a procedure related to genera and species. This helps one not merely to retain things,
but also to find them. And those who have laid out all sorts of notions under certain
headings or categories have done something very useful.

Gottfried Wilhelm Leibniz, New Essays on Human Understanding

We must be systematic, but we should keep our systems open.

Alfred North Whitehead, Modes of Thought

Definition and Scope


The subject of ontology is the study of the categories of things that exist or may exist
in some domain. The product of such a study, called an ontology, is a catalog of the
types of things that are assumed to exist in a domain of interest D from the perspective
of a person who uses a language L for the purpose of talking about D. The types in the
ontology represent the predicates, word senses, or concept and relation types of the
language L when used to discuss topics in the domain D. An uninterpreted logic, such
as predicate calculus, conceptual graphs, or KIF, is ontologically neutral. It imposes
no constraints on the subject matter or the way the subject may be characterized. By
itself, logic says nothing about anything, but the combination of logic with an
ontology provides a language that can express relationships about the entities in the
domain of interest.

An informal ontology may be specified by a catalog of types that are either undefined
or defined only by statements in a natural language. A formal ontology is specified by
a collection of names for concept and relation types organized in a partial ordering by
the type-subtype relation. Formal ontologies are further distinguished by the way the
subtypes are distinguished from their supertypes: an axiomatized
ontology distinguishes subtypes by axioms and definitions stated in a formal language,
such as logic or some computer-oriented notation that can be translated to logic;
a prototype-based ontology distinguishes subtypes by a comparison with a typical
member or prototype for each subtype. Large ontologies often use a mixture of
definitional methods: formal axioms and definitions are used for the terms in
mathematics, physics, and engineering; and prototypes are used for plants, animals,
and common household items.

KR Ontology
The ontology presented on this web site is based on the book Knowledge
Representation by John F. Sowa. The basic categories and distinctions have been
derived from a variety of sources in logic, linguistics, philosophy, and artificial
intelligence. The two most important influences have been the philosophers Charles
Sanders Peirce and Alfred North Whitehead, who were pioneers in symbolic logic.
Peirce was also an associate editor of the Century Dictionary, for which he wrote,
revised, or edited over 16,000 definitions. In calling that task "stupendous," he was
looking beyond his personal experience of writing definitions in English to the task of
stating complete definitions in logic, which he said was "a labor for generations of
analysts, not for one." That labor, for which there was little practical application in the
19th century, is a major challenge for the 21st. Without it, there is no hope of merging
and integrating the ever expanding and multiplying databases and knowledge bases
around the world.

Yet as Shakespeare observed, any philosophy is destined to be incomplete. The
continuing advance of science and human experience inevitably leads to new words
and ideas that require extensions to any proposed system of categories. Whitehead's
motto is the best guideline for any philosopher or scientist: "We must be systematic,
but we should keep our systems open."
Hierarchies of Categories
To keep the system open-ended, the KR ontology is not based on a fixed hierarchy of
categories, but on a framework of distinctions, from which the hierarchy is generated
automatically. For any particular application, the categories are not defined by
drawing lines on a chart, but by selecting an appropriate set of distinctions. When the
application-dependent distinctions are added to the basic set, a new lattice of
categories can be created by pushing a button.

The icon in the upper left corner of this web page illustrates the lattice used to
represent the top-level categories, but lattices can also be used to represent categories
at any level. As an example of a lattice of lower-level types, Figure 1 shows beverage
types classified according to the attributes alcoholic, nonalcoholic, hot, sparkling,
caffeinic, madeFromGrapes, and madeFromGrain. This lattice was derived from the
attributes by the method of formal concept analysis. For more information, see
the FCA home page.

Figure 1: A lattice constructed by the method of formal concept analysis (FCA)

The FCA techniques belong to the general class of data mining procedures, which
find patterns in a relational database. The raw data used to generate FCA lattices is the
same kind of data that could be used for other data mining techniques, such as neural
networks. Each technique has its own advantages and disadvantages, depending on
how the result is going to be used. For ontology, the FCA technique produces a sublattice
that can be automatically merged with a more general lattice of categories. In the case
of Figure 1, the top node represents the type Beverage, which could be defined as
DrinkableLiquid in terms of higher-level categories. For further discussion of these
techniques, see the tutorial on lattices and the glossary of terminology about defining,
refining, merging, and sharing ontologies.
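
As a concrete illustration of the FCA construction, the following Python sketch derives the formal concepts of a toy beverage context. The object-attribute table here is a hypothetical fragment, not the full data behind Figure 1; attribute names follow the figure:

# A minimal sketch of formal concept analysis over a toy context.
from itertools import combinations

context = {
    "wine":   {"alcoholic", "madeFromGrapes"},
    "beer":   {"alcoholic", "sparkling", "madeFromGrain"},
    "coffee": {"nonalcoholic", "hot", "caffeinic"},
    "cola":   {"nonalcoholic", "sparkling", "caffeinic"},
}
ALL_ATTRS = set().union(*context.values())

def common_attributes(objs):
    """Attributes shared by every object in objs (the 'prime' operator)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(ALL_ATTRS)

def objects_with(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

# A formal concept is a pair (extent, intent) closed under both operators.
# Brute-force enumeration over subsets of objects is fine for a toy context.
concepts = set()
for r in range(len(context) + 1):
    for subset in combinations(context, r):
        intent = common_attributes(set(subset))
        extent = objects_with(intent)
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "<->", sorted(intent))

Ordering the resulting (extent, intent) pairs by set inclusion yields exactly the kind of lattice shown in Figure 1.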

Guide to this Web Site


The menu at the upper left of this web page lists other web pages that present the KR
ontology and related background information:

 Lexicon. An article, "Concepts in the Lexicon," which is based on material
taken from several published papers by John Sowa. Part I discusses the
problems and issues in defining a lexicon of words in natural language and
relating them to a semantic representation in logic.
 Top level. The top-level categories of the KR ontology with a discussion of the
distinctions from which they were derived and the basic axioms associated with
each category.
 Processes. The distinctions used to generate an ontology of process types and
their relationships to verbs in natural languages and methods of reasoning about
knowledge bases.
 Relations. The distinctions used to define roles and relations, including role
types such as Composite, Correlative, Component, Whole, Substrate, Part,
Piece, Participant, Stage, Property, Attribute, and Manner.
 Causality. Issues of causality and causation, their relation to processes and
time, and their representation in logic and Petri nets.
 Agents. Linguistically, an agent is represented by the subject of an active verb.
Socially, an agent is an animate being that takes responsibility for its actions in
the world. Computationally, an agent is a robot or softbot that can apply
general guidelines in deciding how to respond to a specific situation.
 Thematic roles. The distinctions used to define the subtypes of Participant
called thematic roles, which relate concepts of verbs to the concepts of the
entities that participate in the action of a verb.
 Glossary. A glossary of the terminology used for talking about ontologies and
the problems and techniques for defining, refining, merging, and sharing
ontologies.
 Math & logic. A 38-page tutorial of mathematical background on sets,
functions, relations, graphs, lambda calculus, lattices, logic, formal grammars,
and model theory.
 CG examples. Some sample sentences in English and their translation to three
different, but semantically equivalent notations for logic: conceptual graphs,
the Knowledge Interchange Format (KIF), and predicate calculus.
 CG standard. A web site containing the draft proposed ANSI standard for
conceptual graphs and related information about CG tools.

Top-Level Categories
This web page summarizes the top levels of the KR Ontology, which is defined in the
book Knowledge Representation by John F. Sowa. Figure 1 shows a lattice of the top-
level categories discussed in Chapter 2 of that book. These categories have been
derived from a synthesis of various sources, but the two major influences are the
semiotics of Charles Sanders Peirce and the categories of existence of Alfred North
Whitehead.

Figure 1: Hierarchy of top-level categories


Any category in Figure 1 can be abbreviated by the initials of the primitive categories
above it: Independent, Relative, or Mediating; Physical or Abstract; Continuant or
Occurrent. Actuality, for example, may be abbreviated as IP for Independent Physical,
and Purpose as MAO for Mediating Abstract Occurrent. The twelve categories
displayed in the center of the lattice and the primitives from which they are generated
can also be arranged in the matrix of Figure 2.

                Physical                        Abstract

             Continuant     Occurrent       Continuant     Occurrent

Independent  Object         Process         Schema         Script

Relative     Juncture       Participation   Description    History

Mediating    Structure      Situation       Reason         Purpose

Figure 2: Matrix of the twelve central categories

The lattice and the matrix are different ways of displaying the combinatorial structure
of the categories. The two kinds of diagrams highlight different aspects:

 The lattice can display intermediate categories that are formed from the
options taken two at a time, such as Actuality (IP), Proposition (RA), or Nexus
(MP).
 But a diagram of the full lattice may become too cluttered if all options are
displayed; some possible combinations, such as AbstractOccurrent (AO) or
RelativeContinuant (RC), have been omitted to simplify the diagram.
 When all possible combinations are meaningful, the matrix is often the
simplest way to display them. But if some combinations are ruled out because
of other constraints, some boxes in the matrix may be empty.

All the categories defined by such lattices and matrices can be represented as monadic
predicates defined by conjunctions of simpler monadic predicates. For example, the
category Participation (RPO) corresponds to a predicate participation(x), which is
defined by the following conjunction:

 participation(x) ≡ relative(x) ∧ physical(x) ∧ occurrent(x).


Categories defined by conjunctions of primitives are useful for generating the
structural backbone of the type hierarchy. But the more specialized categories in an
ontology may require more complex logical expressions. For further discussion of the
problems and issues of defining a large ontology of concepts with detailed semantic
representations, see the article "Concepts in the Lexicon".
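
The conjunctive definitions lend themselves to a simple computational encoding. In the Python sketch below (a minimal illustration, not code from the book), each category is identified with its set of primitive differentiae, the subtype test is superset inclusion, and the infimum of two categories is the union of their primitives:

# A sketch: categories as sets of primitive differentiae.
# Entity (the top) has no differentiae; adding differentiae specializes a type.
CATEGORIES = {
    "Entity":     set(),              # top: no differentiae
    "Actuality":  {"I", "P"},         # Independent Physical
    "Continuant": {"C"},
    "Object":     {"I", "P", "C"},    # IPC
    "Process":    {"I", "P", "O"},    # IPO
    "Purpose":    {"M", "A", "O"},    # MAO
}

def is_subtype(sub: str, sup: str) -> bool:
    # A subtype carries all the differentiae of its supertype.
    return CATEGORIES[sup] <= CATEGORIES[sub]

def infimum(a: str, b: str) -> set:
    # Greatest common subtype: conjunction (set union) of differentiae.
    return CATEGORIES[a] | CATEGORIES[b]

print(is_subtype("Object", "Actuality"))    # True
print(infimum("Actuality", "Continuant"))   # {'I', 'P', 'C'}, i.e. Object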

The categories in Figure 1 are listed below. Nine primitive categories have associated
axioms: ⊤, ⊥, Independent, Relative, Mediating, Physical, Abstract, Continuant, and
Occurrent. Each subtype is defined as the infimum (greatest common subtype,
represented by the symbol ∩) of two supertypes, whose axioms it inherits. For
example, the type Form is defined as Independent∩Abstract; it therefore inherits the
axioms of Independent and Abstract, and it is abbreviated IA to indicate its two
supertypes. See the glossary for definitions of the techniques and metalevel
conventions used to define these categories. See the tutorial for a review of the
definitions and notations for sets, functions, relations, graphs, lattices, and logic.

⊤ ().
The universal type, which has no differentiae. Formally, ⊤ is a primitive that
satisfies the following axioms:

 There exists something: (∃x)⊤(x).
 Everything is an instance of ⊤: (∀x)⊤(x).
 Every type is a subtype of ⊤: (∀t:Type)t ≤ ⊤.

All other types are defined by adding differentiae to ⊤ to show how they are
distinguished from ⊤ and from one another. The type Entity is a pronounceable
synonym for ⊤.

⊥ (IRMPACO).
The absurd type, which inherits all differentiae. Formally, ⊥ is a primitive that
satisfies the following axioms:

 Nothing is an instance of ⊥: ¬(∃x)⊥(x).
 Every type is a supertype of ⊥: (∀t:Type)⊥ ≤ t.

Since ⊥ is the inconsistent conjunction of all differentiae, it is not possible for
any existing entity to be an instance of ⊥. Two types s and t are said to
be incompatible if their only common subtype s∩t is ⊥. For example, Dog∩Cat
= ⊥ because it is not possible for anything to be both a dog and a cat at the
same time.

Abstract (A).
Pure information as distinguished from any particular encoding of the
information in a physical medium. Formally, Abstract is a primitive that
satisfies the following axioms:

 No abstraction has a location in space: ¬(∃x:Abstract)(∃y:Place)loc(x,y).
 No abstraction occurs at a point in time: ¬(∃x:Abstract)(∃t:Time)pTim(x,t).

As an example, the information you are now reading is encoded on a physical
object in front of your eyes, but it is also encoded on paper, magnetic spots, and
electrical currents at several other locations. Each physical encoding is said
to represent the same abstract information.

Absurdity (IRMPACO) = ⊥.
A pronounceable synonym for ⊥. It cannot be the type of anything that exists.
Actuality (IP) = Independent∩Physical.
A physical entity (P) whose existence is independent (I) of any other entity. As
instances, the category Actuality includes both objects and processes. The
term is taken from Whitehead, who used it as a synonym for actual entity,
which he considered the equivalent of Aristotle's ousia and Descartes's res
vera.
Continuant (C).
An entity whose identity continues to be recognizable over some extended
interval of time. Formally, Continuant is a primitive that satisfies the following
axioms:

 A continuant x has only spatial parts and no temporal parts. At any
time t when x exists, all of x exists at the same time t. New parts of a
continuant x may be acquired and old parts may be lost, as when a
snake sheds its skin. Parts that have been lost may cease to exist, but
everything that remains a part of x continues to exist at the same time
as x.
 The identity conditions for a continuant are independent of time. If c is
a subtype of Continuant, then the identity predicate Idc(x,y) for
identifying two instances x and y of type c does not depend on time.

A physical continuant is an object, and an abstract continuant is a schema that
may be used to characterize some object.

Description (RAC) = Proposition∩Continuant.
A proposition (RA) about a continuant (C). A description is a proposition that
states how some schema characterizes some aspect or configuration of a
continuant.
Entity () = ⊤.
A pronounceable synonym for ⊤. Entity can be used as the default type for
anything of any category.
Form (IA) = Abstract∩Independent.
Abstract information (A) independent (I) of any encoding or embodiment.
Forms can be said to exist in the same sense as mathematical objects such as
sets and relations, but instances of forms cannot exist at a particular place and
time without some physical encoding or embodiment. Whitehead called them
"eternal objects" because they are independent of space and time.
History (RAO) = Proposition∩Occurrent.
A proposition (RA) about an occurrent (O). A history is a proposition (RA) that
relates some script (IAO) to the stages of some occurrent (O). A computer
program, for example, is a script (IAO); a computer executing the program is a
process (IPO); and the abstract information (A) encoded in a trace of the
instructions executed is a history (RAO). Like any proposition, a history need
not be true, and it need not be predicated of the past: a myth is a history of an
imaginary past; a prediction is a history of an expected future; and a scenario
is a history of some hypothetical occurrent.
Independent (I).
An entity characterized by some inherent Firstness, independent of any
relationships it may have to other entities. Formally, Independent is a
primitive for which the has-test of Section 2.4 need not apply. If x is an
independent entity, it is not necessary that there exists an entity y such
that x has y or y has x:

 ("x:Independent)~o($y)(has(x,y) Ú has(y,x)).

Intention (MA) = Abstract∩Mediating.
Abstraction (A) considered as mediating (M) other entities. Examples of
intentions include the hopes, fears, wishes, and purposes that mediate some
agent's actions.
Juncture (RPC) = Prehension∩Continuant.
A prehension (RP) considered as a continuant (C) during some time interval.
The prehending entity is an object (IPC) in a stable relationship to some
prehended entity during that interval. An example of a juncture is the
relationship between two adjacent stones in an arch. The arch itself is a nexus
that both mediates and consists of the multiple junctures.
Mediating (M).
An entity characterized by some Thirdness that brings other entities into a
relationship. An independent entity need not have any relationship to
anything else, a relative entity must have some relationship to something else,
and a mediating entity creates a relationship between two other entities. An
example of a mediating entity is a marriage, which creates a relationship
between a husband and a wife.

According to Peirce, the defining aspect of Thirdness is "the conception of
mediation, whereby a first and a second are brought into relation." That
property could be expressed in second-order logic:

 (∀m:Mediating)(∀x,y:Entity)
((∃R,S:Relation)(R(m,x) ∧ S(m,y)) ⊃ □(∃T:Relation)T(x,y)).

This formula says that for any mediating entity m and any other entities x and y,
if there exist relations R and S that relate m to x and m to y, then it is necessarily
true that there exists some relation T that relates x to y. For example, if m is a
marriage, R relates m to a husband x, S relates m to a wife y, then T relates the
husband to the wife (or the wife to the husband).

Instead of a second-order formula, an equivalent first-order axiom could be
stated in terms of the primitive has relation, which is discussed in Section 2.4
of the book Knowledge Representation:

 (∀m:Mediating)(∀x,y:Entity)
((has(m,x) ∧ has(m,y)) ⊃ □(has(x,y) ∨ has(y,x))).

This formula says that for any mediating entity m and any other entities x and y,
if m has x and m has y, then it is necessary that x has y or y has x. In effect,
the has relation in this formula is a generalization of the relations R, S, and T in
the second-order formula. For example, if m is a marriage that has a
husband x and a wife y, then the husband has the wife or the wife has the
husband (or both).
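
Setting aside the modal operator (which quantifies over possible states of affairs), the first-order axiom can be checked mechanically against any finite set of has-facts. The Python sketch below is a hypothetical illustration; the fact names are invented:

# A sketch checking the first-order mediation constraint on invented facts,
# ignoring the necessity operator: whenever m has x and m has y, the pair
# x, y must be related by has in one direction or the other.
has = {
    ("marriage1", "husband1"),
    ("marriage1", "wife1"),
    ("husband1", "wife1"),     # the relationship the marriage mediates
}

def mediation_holds(m: str, facts: set) -> bool:
    """True if every pair of entities that m has are themselves related by has."""
    members = [y for (a, y) in facts if a == m]
    return all((x, y) in facts or (y, x) in facts
               for x in members for y in members if x != y)

print(mediation_holds("marriage1", has))   # True: husband1 has wife1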

Nexus (MP) = Physical∩Mediating.
A physical entity (P) mediating (M) two or more other entities. Each nexus is a
bundle of prehensions, which may be the junctures of an object or the
participants of a process. Examples include an arch that consists of junctures
of stones or an action that consists of what one participant called an agent is
doing to another participant called a patient.
Object (IPC) = Actuality∩Continuant.
Actuality (IP) considered as a continuant (C), which retains its identity over
some interval of time. Although no physical entity is ever permanent, an
object can be recognized by identity conditions that remain stable during its
lifetime. The type Object includes ordinary physical objects as well as the
instantiations of classes in object-oriented programming languages.
Occurrent (O).
An entity that does not have a stable identity during any interval of time.
Formally, Occurrent is a primitive that satisfies the following axioms:

 The temporal parts of an occurrent, which are called stages, exist at
different times.
 The spatial parts of an occurrent, which are called participants, may
exist at the same time, but an occurrent may have different participants
at different stages.
 There are no identity conditions that can be used to identify two
occurrents that are observed in nonoverlapping space-time regions.

A person's lifetime, for example, is an occurrent. Different stages of a life
cannot be reliably identified unless some continuant, such as the person's
fingerprints or DNA, is recognized by suitable identity conditions at each stage.
Even then, the identification depends on an inference that presupposes the
uniqueness of the identity conditions.

Participation (RPO) = Prehension∩Occurrent.
A prehension (RP) considered as an occurrent (O) during the interval of
interest. The prehending entity is a process (IPO), and the prehended entity is
called a participant.
Physical (P).
An entity that has a location in space-time. Formally, Physical is a primitive
that satisfies the following axioms:

 Anything physical is located in some place: (∀x:Physical)(∃y:Place)loc(x,y).
 Anything physical occurs at some point in time: (∀x:Physical)(∃t:Time)pTim(x,t).

More detailed axioms that relate physical entities to space, time, matter, and
energy would involve a great deal of physical theory, which is beyond the scope
of the KR book.

Process (IPO) = Actuality∩Occurrent.
Actuality (IP) considered as an occurrent (O) during the interval of interest.
Depending on the time scale and level of detail, the same actual entity may be
viewed as a stable object or a dynamic process. Even an entity as stable as a
diamond could be considered a process when viewed over a long time period
or at the atomic level of vibrating particles. For further discussion, see the web
page on processes.
Prehension (RP).
A physical entity (P) relative (R) to some entity or entities. The has-test is used
to check whether an entity x prehends an entity y. If so, the prehension may
be expressed as has(x,y).
Proposition (RA).
An abstraction (A) that relates (R) some entity or entities. In logic, the
assertion of a proposition is a claim that the abstraction corresponds to some
aspect or configuration of the entity or entities involved. As an example, the
statement cat(Yojo) expresses a proposition that the form labeled Cat
characterizes the entity named Yojo. According to Peirce and Whitehead,
more complex propositions are asserted by constructing a compound
predicate, such as a mathematical expression or a diagram, and using it to
characterize the prehensions that relate multiple entities.
Purpose (MAO) = Intention∩Occurrent.
Intention (MA) that has the form of an occurrent (O). As an example, the
words and notes of the song "Happy Birthday" constitute a script (IAO); a
description of how people at a party sang the song is a history (RAO); and the
intention (MA) that explains the situation (MPO) is a purpose (MAO). The basic
axioms for Purpose are inherited from its supertypes Mediating, Abstract, and
Occurrent. Lower-level axioms relate purposes to actions and agents:

 Time sequence. If an agent x performs an act y whose purpose is a
situation z, the start of y occurs before the start of z.
 Contingency. If an agent x performs an act y whose purpose is a
situation z described by a proposition p, then it is possible that z might
not occur or that p might not be true of z.
 Success or failure. If an agent x performs an act y whose purpose is a
situation z described by a proposition p, then x is said to
be successful if z occurs and p is true of z; otherwise, x is said to
have failed.

For further discussion, see the web page on agents.
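
These lower-level axioms can be rendered as simple executable checks. The Python sketch below is illustrative only; the class and field names are invented, not part of the KR ontology:

# An illustrative sketch of the Purpose axioms; names are hypothetical.
from dataclasses import dataclass

@dataclass
class Act:
    start: float             # when the agent begins the act

@dataclass
class Situation:
    start: float             # when the purposed situation begins
    occurred: bool           # contingency: the situation might not occur
    description_true: bool   # whether the proposition p is true of it

def time_sequence_ok(act: Act, purpose: Situation) -> bool:
    # Time sequence: the act must start before the purposed situation.
    return act.start < purpose.start

def outcome(purpose: Situation) -> str:
    # Success or failure: success iff the situation occurs and p holds of it.
    return ("success" if purpose.occurred and purpose.description_true
            else "failure")

party = Situation(start=2.0, occurred=True, description_true=True)
print(time_sequence_ok(Act(start=1.0), party), outcome(party))  # True success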

Reason (MAC) = Intention∩Continuant.
Intention (MA) that has the form of a continuant (C). Unlike a simple
description (Secondness), a reason explains an entity in terms of an intention
(Thirdness). For a birthday party, a description might list the presents, but a
reason would explain why the presents are relevant to the party.
Relative (R).
An entity in a relationship to some other entity. Formally, Relative is a
primitive for which the has-test must apply:

 ("x:Relative)o($y)(has(x,y) Ú has(y,x)).

For any relative x, there must exist some y such that x has y or y has x.

Schema (IAC) = Form∩Continuant.
A form (IA) that has the structure of a continuant (C). A schema is an abstract
form (IA) whose structure does not specify time or timelike relationships.
Examples include geometric forms, the syntactic structures of sentences in
some language, or the encodings of pictures in a multimedia system.
Script (IAO) = Form∩Occurrent.
A form (IA) that has the structure of an occurrent (O). A script is an abstract
form (IA) that represents time sequences. Examples include computer
programs, a recipe for baking a cake, a sheet of music to be played on a piano,
or a differential equation that governs the evolution of a physical process. A
movie can be described by several different kinds of scripts: the first is a
specification of the actions and dialog to be acted out by humans; but the
sequence of frames in a reel of film is also a script that determines a process
carried out by a projector that generates flickering images on a screen.
Situation (MPO) = Nexus∩Occurrent.
A nexus (MP) considered as an occurrent (O). A situation mediates the
participants of some process, whose stages may involve different participants
at different times.
Structure (MPC) = Nexus∩Continuant.
A nexus (MP) considered as a continuant (C). A structure mediates multiple
objects whose junctures constitute the structure.
The primitive categories of any theory are undefinable in terms of anything more
primitive. The axioms associated with the categories are not closed-form definitions,
but constraints on how instances of those categories are related to instances of
other categories, many of which are not primitives. The only two categories in this
list whose axioms are completely formalized are ⊤ and ⊥. The other axioms cannot
be stated formally until a great deal more has been fully formalized. The axioms for
Physical, for example, use the categories Place and Time and the predicates loc and
pTim. A complete formalization of those axioms would depend on a fully developed
Grand Unified Theory of physics -- a task that the physicists are far from completing.

The task of formalizing everything is like the construction of a medieval cathedral: it
takes centuries to complete, and when it is done, someone else will have a plan for an
even grander cathedral. Whitehead's motto is the best guideline: "We must be
systematic, but we should keep our systems open." For further discussion of the
problems and issues, see Chapter 6 on "Knowledge Soup" in the book Knowledge
Representation.

Processes
Processes can be described by their starting and stopping points and by the kinds of
changes that take place in between. Figure 1 shows the category Process subdivided
by the distinction of continuous change versus discrete change. In a continuous
process, which is the normal kind of physical process, incremental changes take place
continuously. In a discrete process, which is typical of computer programs or
idealized approximations to physical processes, changes occur in discrete steps
called events, which are interleaved with periods of inactivity called states. A
continuous process with an explicit starting point is called an initiation; one with an
ending point is a cessation; and one whose endpoints are not being considered is
a continuation.
Figure 1: Types of Processes

Beneath each of the five categories at the leaves of Figure 1 is an icon that illustrates
the kind of change: a vertical bar indicates an endpoint, a wavy horizontal line
indicates change, and a straight horizontal line indicates no change. A discrete process
is a sequence of states and events that may be symbolized by a chain of straight and
wavy lines separated by vertical bars. A continuous process may be symbolized by a
continuous wavy line with occasional vertical bars. Those bars do not indicate a break
in the physical process, but a discontinuity in the way the process is classified or
described. The weather, for example, varies by continuous gradations, but different
time periods may be classified as hot or cold, cloudy or sunny.

The categories in Figure 1 could be further differentiated according to the kinds of
participants that are involved in the process and how they interact with one another.
An action is an event caused by an agent, and an activity is an extended process
(continuous or discrete) caused by one or more agents over some time interval. The
agent's intentions may also be important for determining the classification: a cessation
that satisfies an agent's goals is considered a success, and one that does not satisfy the
goals is a failure. For more detail, see the web pages on causality, agents, roles and
relations, and thematic roles. The fundamental principles and techniques for doing the
classification are discussed in the book Knowledge Representation.

In natural languages, the features of tense and aspect relate the event described by a
verb to the type of process and to the reference times of one or more observers.
The simple tenses -- past, present, and future -- relate the time of an event to the time
of the speech. The compound tenses, such as past perfect or future perfect, involve an
additional reference time in some real or hypothetical past or future. Aspect describes
the initiation, continuation, or completion of some action with respect to the reference
times. Whether an action is continuing or completed may depend on some agent's
intentions. The definitions of many verbs depend on the intentions of their agents, and
the same physical process described by different verbs can be classified in very
different ways. Different classifications, in turn, have different implications.
Depending on the verb that is used to describe an action, the agent may be praised for
a success or blamed for a failure.

In his book Features and Fluents, Erik Sandewall presented a detailed taxonomy of
process types, their representation in logic, and the techniques for reasoning about
them. His taxonomy is determined by a list of distinctions based on the complexity of
the interactions and the kinds of representations needed to describe them:

 Discrete or continuous. In physics, continuous processes are represented
by differential equations. On digital computers, time is divided
into discrete time points, which may be called integer time when they are
numbered 0, 1, 2.... Continuous processes can be approximated by decreasing
the time step of a discrete process and thereby increasing the number of
points that must be represented, stored, and computed. Differential equations
represent the limiting case when the time step approaches zero and the
number of points approaches infinity.
 Linear or branching. A linear order of time points and their associated events
creates a deterministic process that is easy to represent and compute.
Conditional alternatives, however, create branching time, with a
nondeterministic increase in the future possibilities that must be represented
and analyzed.
 Independent or ramified. If the variables that describe a process
are independent, a change to one has no effect on the others. In
a ramified process, a change in one variable can cause changes in the others.
Sandewall drew a three-way distinction: local independence, where all the
variables are independent; local ramification, where an event that changes
one variable only changes other variables that are directly involved in the
same event; and structural ramification, where the effects of changing one
variable may propagate to other events and cause indirect changes to
remotely related events.
 Immediate or delayed. If the changes caused by an event occur immediately,
they can be represented or simulated during the same time step as the event.
A delayed effect, however, does not cause an observable change until some
subsequent state or event.
 Sequential or concurrent. In a sequential process, only one event occurs at any
instant. In a concurrent process, multiple independent events may occur in
parallel.
 Predictable or surprising. A predictable process follows a script that specifies
all the possible causes and effects of each event.
A surprising or exogenous event causes a change that is not anticipated by the
script.
 Normal or equinormal. Some processes have a highly probable or normal flow
of events that can be assumed as the default. In an equinormal process,
multiple courses of events are equally likely, and no single outcome can be
considered the default.
 Flat or hierarchical. In a flat process, each event is described by a short list of
the changes it can cause. In a hierarchical process, any event may be
composed of subevents. A third option is a recursively hierarchical process, in
which some events are composed of subevents of the same type.
 Timeless or time-bound. The simplest processes to analyze involve a fixed
number of timeless objects that are neither created nor destroyed during the
process. Time-bound objects, which may be created or destroyed during the
interval of interest, lead to processes with an ever-changing inventory of
objects.
 Forgetful or memory-bound. In a forgetful process, the future course of events
depends only on the current state, not on the history of how the current state
came about. A memory-bound process retains some information from earlier
states, which may affect future outcomes.

The product of these distinctions generates an ontology with 2⁸×3², or 2304,
categories of processes. The first option in each distinction leads to a process as
predictable as a clock; the later options lead to richer but more complex processes.
Unfortunately for computational purposes, most naturally occurring processes
involve some or all of the later options.
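
The count can be verified directly: eight of Sandewall's distinctions are binary, and two (ramification and hierarchy) are three-way. A quick Python check:

# Verifying the count of process categories: 2^8 * 3^2 = 2304.
binary_distinctions = [
    "discrete/continuous", "linear/branching", "immediate/delayed",
    "sequential/concurrent", "predictable/surprising",
    "normal/equinormal", "timeless/time-bound", "forgetful/memory-bound",
]
ternary_distinctions = [
    "local independence/local ramification/structural ramification",
    "flat/hierarchical/recursively hierarchical",
]
total = 2 ** len(binary_distinctions) * 3 ** len(ternary_distinctions)
print(total)   # 2304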

The icons shown in Figure 1 can be linked end to end in a timing diagram to represent
a history of the process. At the bottom of Figure 2 is a timing diagram for a clock,
which is a discrete, linear, independent, immediate, sequential, predictable, normal,
flat, timeless, forgetful process. The history of the clock process shows an open-ended
sequence of states separated by negligibly short events called ticks. Each tick causes
an integer, called a counter, to be incremented by one.
Figure 2: Concurrent processes timed with a clock

Above the timing diagram for the clock is a timing diagram for a concurrent process,
which has varying numbers of subprocesses, called threads, that run in parallel. The
events and states of the process can be timed by comparing their occurrences with the
numbered ticks of the clock process. The concurrent process shown in Figure 2
represents a history of firings of a Petri net, which is discussed in Chapter 4 of the
book Knowledge Representation. For further discussion of Petri nets and their use in
representing processes, see the web page on causality.

Sandewall's list does not exhaust all the possible distinctions for classifying processes.
A system designer, for example, might distinguish a process local to a single system
from a process distributed across a network. Sandewall did not consider that
distinction because the location of a process is independent of the methods for
reasoning about it. For different purposes, new categories can be generated by adding
more distinctions to the list or by deleting some that are not relevant.

For further discussion about continuous processes, discrete processes, and causal
influences, see the paper on Processes and Causality.
