

Chapter 1: Introduction

What is AI?
Definitions of AI from eight textbooks are shown below, grouped into four
categories:

Systems that think like humans:
• "The exciting new effort to make computers think ... machines with
  minds, in the full and literal sense." (Haugeland, 1985)
• "[The automation of] activities that we associate with human thinking,
  activities such as decision-making, problem solving, learning ..."
  (Bellman, 1978)

Systems that think rationally:
• "The study of mental faculties through the use of computational
  models." (Charniak and McDermott, 1985)
• "The study of the computations that make it possible to perceive,
  reason and act." (Winston, 1992)

Systems that act like humans:
• "The art of creating machines that perform functions that require
  intelligence when performed by people." (Kurzweil, 1990)
• "The study of how to make computers do things which, at the moment,
  people do better." (Rich and Knight, 1991)

Systems that act rationally:
• "Computational Intelligence is the study of the design of intelligent
  agents." (Poole et al., 1998)
• "AI ... is concerned with intelligent behavior in artifacts."
  (Nilsson, 1998)

The above views of AI fall into the following four categories:


• Thinking Humanly
• Thinking Rationally
• Acting Humanly
• Acting Rationally

Acting humanly: The Turing Test Approach


• "Can machines think?" Æ "Can machines behave intelligently?"
• Operational test for intelligent behavior: the Imitation Game

The Turing Test, proposed by Alan Turing (1950), was designed to


provide a satisfactory operational definition of intelligence. The
computer passes the test if a human interrogator, after posing some
written questions, cannot tell whether the written responses come from
a person or not. To pass the test the computer would need to possess
the following capabilities:
i. Natural Language Processing: To enable it to communicate
successfully in English.
ii. Knowledge Representation: To store what it knows or hears.
iii. Automated reasoning: To use the stored information to answer
questions and to draw new conclusions.
iv. Machine Learning: To adapt to new circumstances and to detect
and extrapolate patterns.

Turing’s test deliberately avoided direct physical interaction between


the interrogator and the computer, because physical simulation of a
person is unnecessary for intelligence. However, the so-called Total
Turing Test includes a video signal so that the interrogator can test
the subject’s perceptual abilities, as well as the opportunity for the
interrogator to pass physical objects “through the hatch”.

To pass the Total Turing Test, the computer will need:


i. Computer Vision: To perceive objects
ii. Robotics: To manipulate objects and move about.
Thinking humanly: The cognitive modeling approach

If we are going to say that a given program thinks like a human, we must
have some way of determining how humans think. We need to get inside the
actual workings of human minds. There are two ways to do this: through
introspection (trying to catch our own thoughts as they go by) and
through psychological experiments. The interdisciplinary field of
cognitive science brings together computer models from AI and
experimental techniques from psychology to try to construct precise and
testable theories of the workings of the human mind.

Thinking rationally: The “laws of thought” approach

The Greek philosopher Aristotle was one of the first to attempt to codify
“right thinking”, that is, irrefutable reasoning processes. His
syllogisms provided patterns for argument structures that always yielded
correct conclusions when given correct premises-for example, ”Socrates is
a man; all men are mortal; therefore, Socrates is mortal.” These laws of
thought were supposed to govern the operation of the mind; their study
initiated the field called logic.
Logicians in the 19th century developed a precise notation for
statements about all kinds of things in the world and about the relations
among them. By 1965, programs existed that could, in principle, solve any
solvable problem described in logical notation. The so-called logicist
tradition within artificial intelligence hopes to build on such programs
to create intelligent systems.
There are two main obstacles to this approach. First, it is not
easy to take informal knowledge and state it in the formal terms
required by logical notation, particularly when the knowledge is less than
100% certain. Second, there is a big difference between being able to
solve a problem “in principle” and doing so in practice.

Acting rationally: The rational agent approach

An agent is just something that acts (agent comes from the Latin agere,
to do). But computer agents are expected to have other attributes that
distinguish them from mere “programs”, such as operating under autonomous
control, perceiving their environment, persisting over a prolonged time
period, adapting to change, and being capable of taking on another’s
goals. A rational agent is one that acts so as to achieve the best
outcome or, when there is uncertainty, the best expected outcome.
Making correct inferences is sometimes part of being a rational
agent, because one way to act rationally is to reason logically to the
conclusion that a given action will achieve one’s goals and then to act
on that conclusion.

Abridged history of AI
• 1943 McCulloch & Pitts: Boolean circuit model of brain
• 1950 Turing's "Computing Machinery and Intelligence"
• 1956 Dartmouth meeting: "Artificial Intelligence" adopted
• 1952—69 Look, Ma, no hands!
• 1950s Early AI programs, including Samuel's checkers
program, Newell & Simon's Logic Theorist,
Gelernter's Geometry Engine
• 1965 Robinson's complete algorithm for logical reasoning
• 1966—73 AI discovers computational complexity
Neural network research almost disappears
• 1969—79 Early development of knowledge-based systems
• 1980-- AI becomes an industry
• 1986-- Neural networks return to popularity
• 1987-- AI becomes a science
• 1995-- The emergence of intelligent agents (systems)
State of the art
• Game playing: IBM’s Deep Blue defeated the reigning world chess
champion Garry Kasparov in 1997. The value of IBM’s stock increased
by $18 billion.
• Mathematics: Proved a mathematical conjecture (Robbins conjecture)
unsolved for decades.
• Autonomous control: The ALVINN computer vision system was trained
to steer a car to keep it following a lane. It was placed in CMU’s
NAVLAB computer-controlled minivan and used to navigate across the
United States from Pittsburgh to San Diego. For 2850 miles it was
in control of the vehicle 98% of the time. NAVLAB has video cameras
that transmit road images to ALVINN, which then computes the best
direction to steer, based on experience from previous training
runs.
• Diagnosis: Medical diagnosis programs based on probabilistic
analysis have been able to perform at the level of an expert
physician in several areas of medicine.
• Logistics planning: During the 1991 Gulf War, US forces deployed an
AI logistics planning and scheduling program (DART) to do automated
logistics planning and scheduling for transportation that involved
up to 50,000 vehicles, cargo, and people.
• Autonomous planning and scheduling: NASA's on-board autonomous
planning program controlled the scheduling of operations for a
spacecraft.
• Language understanding and problem solving: Proverb is a computer
program that solves crossword puzzles better than most humans.
• Robotics: Many surgeons now use robot assistants (like HipNav) in
microsurgery.
Chapter 2: Intelligent Agents

An agent is anything that can be viewed as perceiving its environment


through sensors and acting upon that environment through effectors. A
human agent has eyes, ears, and other organs for sensors, and hands,
legs, mouth, and other body parts for effectors. A robotic agent
substitutes cameras and infrared range finders for the sensors and
various motors for the effectors. A software agent has encoded bit
strings as its percepts and actions.

2.1 How Agents Should Act

A rational agent is one that does the right thing. Obviously, this is
better than doing the wrong thing, but what does it mean? As a first
approximation, we will say that the right action is the one that will
cause the agent to be most successful. That leaves us with the problem of
deciding how and when to evaluate the agent’s success.

We use the term performance measure for the how: the criteria that
determine how successful an agent is. As an example, consider the case
of an agent that is supposed to vacuum a dirty floor. A plausible
performance measure would be the amount of dirt cleaned up in a single
eight-hour shift. A more sophisticated performance measure would factor
in the amount of electricity consumed and the amount of noise generated
as well. A third performance measure might give highest marks to an
agent that not only cleans the floor quietly and efficiently, but also
finds time to go windsurfing at the weekend.

What is rational at any given time depends on four things:


• The performance measure that defines degree of success.
• Everything that the agent has perceived so far. We will call this
complete perceptual history the percept sequence.
• What the agent knows about the environment.
• The actions that the agent can perform.

This leads to a definition of an ideal rational agent: For each possible


percept sequence, an ideal rational agent should do whatever action is
expected to maximize its performance measure, on the basis of the
evidence provided by the percept sequence and whatever built-in knowledge
the agent has.

The notion of an agent is meant to be a tool for analyzing systems, not


an absolute characterization that divides the world into agents and non-
agents. Consider a clock. It can be thought of as just an inanimate
object, or it can be thought of as a simple agent. As an agent, most
clocks always do the right action: moving their hands (or displaying
digits) in the proper fashion. Clocks are a kind of degenerate agent in
that their percept sequence is empty; no matter what happens outside, the
clock’s action should be unaffected.

Well, this is not quite true. If the clock and its owner take a trip
from California to Australia, the right thing for the clock to do would
be to turn itself back six hours. We do not get upset at our clocks for
failing to do this because we realize that they are acting rationally,
given their lack of perceptual equipment.

Autonomy
There is one more thing to deal with in the definition of an ideal
rational agent: the “built-in knowledge” part. If the agent’s actions
are based completely on built-in knowledge, such that it need pay no
attention to its percepts, then we say that the agent lacks autonomy.

An agent’s behavior can be based on both its own experience and the
built-in knowledge used in constructing the agent for the particular
environment in which it operates. A system is autonomous to the extent
that its behavior is determined by its own experience.
Autonomy not only fits in with our intuition, but it is an example of
sound engineering practices. An agent that operates on the basis of
built-in assumptions will only operate successfully when those
assumptions hold, and thus lacks flexibility.

2.2 Structure of Intelligent Agents:

The job of AI is to design the agent program: a function that implements
the agent mapping from percepts to actions. We assume this program will
run on some sort of computing device, which we will call the
architecture.

The architecture might be a plain computer, or it might include special-


purpose hardware for certain tasks, such as processing camera images or
filtering audio input. In general, the architecture makes the percepts
from the sensors available to the program, runs the program, and feeds
the program’s action choices to the effectors as they are generated. The
relationship among agents, architectures, and programs can be summed up
as follows: agent = architecture + program
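A minimal Python sketch of this split (the names Architecture and
agent_program are illustrative, not part of the notes):

def agent_program(percept):
    """A placeholder agent program: maps a percept to an action choice."""
    return "noop"

class Architecture:
    """Makes percepts available to the program and forwards its action choices."""
    def __init__(self, program):
        self.program = program

    def run(self, percepts):
        # feed each action choice to the effectors as it is generated
        return [self.program(p) for p in percepts]

agent = Architecture(agent_program)
print(agent.run(["dirty-floor", "clean-floor"]))  # ['noop', 'noop']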

Before we design an agent program, we must have a pretty good idea of the
possible percepts and actions, what goals or performance measure the
agent is supposed to achieve, and what sort of environment it will
operate in. These come in a wide variety. The figure below shows the
basic elements for a selection of agent types.
AGENT: Medical diagnosis system
  PERCEPTS: Symptoms, findings, patient's answers
  ACTIONS: Questions, tests, treatments
  GOALS: Healthy patient, minimize costs
  ENVIRONMENT: Patient, hospital

AGENT: Satellite image analysis system
  PERCEPTS: Pixels of varying intensity, color
  ACTIONS: Print a categorization of scene
  GOALS: Correct categorization
  ENVIRONMENT: Images from orbiting satellite

AGENT: Part-picking robot
  PERCEPTS: Pixels of varying intensity
  ACTIONS: Pick up parts and sort into bins
  GOALS: Place parts in correct bins
  ENVIRONMENT: Conveyor belt with parts

AGENT: Refinery controller
  PERCEPTS: Temperature, pressure readings
  ACTIONS: Open, close valves; adjust temperature
  GOALS: Maximize purity, yield, safety
  ENVIRONMENT: Refinery

AGENT: Interactive English tutor
  PERCEPTS: Typed words
  ACTIONS: Print exercises, suggestions, corrections
  GOALS: Maximize student's score on test
  ENVIRONMENT: Set of students

An Example
At this point, it will be helpful to consider a particular environment,
so that our discussion can become more concrete. We will look at the job
of designing an automated taxi driver.

We must first think about the percepts, actions, goals and environment
for the taxi. They are summarized below.

AGENT TYPE: Taxi driver
  PERCEPTS: Cameras, speedometer, GPS, sonar, microphone
  ACTIONS: Steer, accelerate, brake, talk to passenger
  GOALS: Safe, fast, legal, comfortable trip, maximize profits
  ENVIRONMENT: Roads, other traffic, pedestrians, customers

What performance measure would we like our automated driver to aspire to?
Desirable qualities include getting to the correct destination;
minimizing violations of traffic laws and disturbances to other drivers;
maximizing safety and passenger comfort; maximizing profits. Obviously,
some of these goals conflict, so there will be trade-offs involved.

Now we have to decide how to build a real program to implement the


mapping from percepts to action. We will find that different aspects of
driving suggest different types of agent program. We will consider four
types of agent program:
• Simple reflex agents
• Agents that keep track of the world
• Goal-based agents
• Utility-based agents

Simple reflex agents:


For example, if the car in front brakes, and its brake lights come on, then
the driver should notice this and initiate braking. In other words, some
processing is done on the visual input to establish the condition we call
“The car in front is braking”; then this triggers some established
connection in the agent program to the action “initiate braking”. We call
such a connection a condition-action rule written as:
if car-in-front-is-braking then initiate-braking
Humans also have many such connections, some of which are learned
responses (as for driving) and some of which are innate reflexes (such as
blinking when something approaches the eye).

Figure gives the structure of a simple reflex agent in schematic form,


showing how the condition-action rules allow the agent to make the
connection from percept to action. We use rectangles to denote the current
internal state of the agent’s decision process, and ovals to represent
the background information used in the process. A simple reflex agent
works by finding a rule whose condition matches the current situation (as
defined by the percept) and then doing the action associated with that
rule.
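A minimal Python sketch of such an agent (the rule and percept format
here are illustrative assumptions, not part of the notes):

RULES = [
    # (condition on the percept, action to take)
    (lambda p: p.get("car_in_front_is_braking"), "initiate_braking"),
]

def simple_reflex_agent(percept):
    for condition, action in RULES:   # find a rule whose condition matches
        if condition(percept):
            return action             # do the action associated with that rule
    return "noop"

print(simple_reflex_agent({"car_in_front_is_braking": True}))  # initiate_braking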

Agents that keep track of the world / Reflex agent with internal
state:
The simple reflex agent described before will work only if the correct
decision can be made on the basis of the current percept. If the car in
front is a recent model, and has the centrally mounted brake light now
required in the United States, then it will be possible to tell if it is
braking from a single image. Unfortunately, older models have different
configurations of tail lights, brake lights, and turn-signal lights, and
it is not always possible to tell if the car is braking. Thus, even for
the simple braking rule, our driver will have to maintain some sort of
internal state in order to choose an action.

Updating this internal state information as time goes by requires two


kinds of knowledge to be encoded in the agent program. First, we need
some information about how the world evolves independently of the agent -
for example, that an overtaking car generally will be closer behind than
it was a moment ago. Second, we need some information about how the
agent's own actions affect the world - for example, that when the agent
changes lanes to the right, there is a gap (at least temporarily) in the
lane it was in before.

Figure gives the structure of the reflex agent, showing how the current
percept is combined with the old internal state to generate the updated
description of the current state.

A reflex agent with internal state works by finding a rule whose


condition matches the current situation (as defined by the percept and
the stored internal state) and then doing the action associated with that
rule.
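A rough Python sketch of this variant, assuming the designer supplies an
update_state function (encoding the two kinds of knowledge above) and a
rule set; the names are illustrative:

def reflex_agent_with_state(state, percept, update_state, rules):
    state = update_state(state, percept)  # combine old internal state with percept
    for condition, action in rules:       # conditions match the state, not the raw percept
        if condition(state):
            return state, action
    return state, "noop"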

Goal-based agents:

Knowing about the current state of the environment is not always enough
to decide what to do. For example, at a road junction, the taxi can turn
left, right, or go straight on. The right decision depends on where the
taxi is trying to get to. In other words, as well as a current state
description, the agent needs some sort of goal information, which
describes situations that are desirable - for example, being at the
passenger’s destination. The agent program can combine this with
information about the results of possible actions (the same information
as was used to update internal state in the reflex agent) in order to
choose actions that achieve the goal.

Notice that the decision-making of this kind is fundamentally different


from the condition-action rules described earlier, in that it involves
consideration of the future - both "What will happen if I do
such-and-such?" and "Will that make me happy?"

Utility-based agents:

Goals alone are not really enough to generate high-quality behavior.


Goals just provide a crude distinction between “happy” and “unhappy”
states, whereas a more general performance measure should allow a
comparison of different world states (or sequences of states) according
to exactly how happy they would make the agent if they could be achieved.
The customary terminology is to say that if one world state is preferred
to another, then it has higher utility for the agent.

Utility is therefore a function that maps a state onto a real number,


which describes the associated degree of happiness. A complete
specification of the utility function allows rational decisions in two
kinds of cases where goals have trouble. First, when there are
conflicting goals, only some of which can be achieved (for example, speed
and safety), the utility function specifies the appropriate trade-off.
Second, when there are several goals that the agent can aim for, none of
which can be achieved with certainty, utility provides a way in which the
likelihood of success can be weighed up against the importance of the
goals.
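A minimal sketch of decision making with a utility function, assuming an
outcome model that yields (probability, resulting_state) pairs; the
example states and numbers are purely illustrative:

def expected_utility(action, outcomes, utility):
    # outcomes(action) yields (probability, resulting_state) pairs
    return sum(p * utility(s) for p, s in outcomes(action))

def choose_action(actions, outcomes, utility):
    # pick the action whose expected utility is highest
    return max(actions, key=lambda a: expected_utility(a, outcomes, utility))

# Illustrative use: a fast route risks a crash, a slow route does not.
outcomes = lambda a: [(0.9, "arrived"), (0.1, "crash")] if a == "fast" else [(1.0, "arrived")]
utility = {"arrived": 10, "crash": -100}.get
print(choose_action(["fast", "slow"], outcomes, utility))  # prints: slow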

2.3 Environments

Properties of environments
Environments come in several flavors. The principal distinctions to be
made are as follows:

Accessible vs. inaccessible.


If an agent’s sensory apparatus gives it access to the complete state of
the environment, then we say that the environment is accessible to that
agent. An environment is effectively accessible if the sensors detect all
aspects that are relevant to the choice of action.

Deterministic vs. nondeterministic.


If the next state of the environment is completely determined by the
current state and the actions selected by the agents, then we say the
environment is deterministic.

Episodic vs. nonepisodic


In an episodic environment, the agent’s experience is divided into
“episodes.” Each episode consists of the agent perceiving and then
acting. The quality of its action depends just on the episode itself,
because subsequent episodes do not depend on what actions occur in
previous episodes.

Static vs. dynamic


If the environment can change while an agent is deliberating, then we say
the environment is dynamic for that agent; otherwise it is static.

Discrete vs. continuous


If there are a limited number of distinct, clearly defined percepts and
actions we say that the environment is discrete.
Chapter 3: Solving problems by searching

3.1 Solving problems by searching


Water jug problem: Refer class notes for the step by step solution.

State space search: The initial state and the successor function
implicitly define the state space of the problem--the set of all states
reachable from the initial state. The state space forms a graph in which
the nodes are states and the arcs between the nodes are actions.
Searching the state space to get a path from the initial state to the
final state is called state space search.

Uninformed search strategies:


This section covers six search strategies that come under the heading of
uninformed search (also called blind search). The term means that they
have no additional information about states beyond that provided in the
problem definition. All they can do is generate successors and
distinguish a goal state from a non-goal state. Strategies that know
whether one non-goal state is "more promising" than another are called
informed search or heuristic search strategies.

i. Breadth first search (BFS):


It is a simple strategy in which the root node is expanded first, then
all the successors of the root node are expanded next, then their
successors, and so on. In general, all the nodes are expanded at a given
depth in the search tree before any nodes at the next level are expanded.

Algorithm: Breadth-First Search


1. Create a variable called NODE-LIST and set it to the initial state.
2. Until a goal state is found or NODE-LIST is empty do:
a) Remove the first element from NODE-LIST and call it E. If
NODE-LIST was empty, quit.
b) For each way that each rule can match the state described in
E do:
i. Apply the rule to generate a new state.
ii. If the new state is a goal state, quit and return
this state.
iii. Otherwise, add the new state to the end of NODE-
LIST.
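A runnable Python rendering of this algorithm (a minimal sketch: the
helpers successors and is_goal are assumed to be supplied by the problem
definition, and a visited set is added so states are not re-expanded):

from collections import deque

def breadth_first_search(initial_state, successors, is_goal):
    if is_goal(initial_state):
        return initial_state
    node_list = deque([initial_state])   # NODE-LIST from the algorithm above
    visited = {initial_state}
    while node_list:
        e = node_list.popleft()          # remove the first element, call it E
        for new_state in successors(e):  # apply each applicable rule
            if is_goal(new_state):
                return new_state
            if new_state not in visited:
                visited.add(new_state)
                node_list.append(new_state)
    return None                          # NODE-LIST empty: no goal found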

Refer class notes for examples and step by step solution.

ii. Depth first search (DFS) :


It always expands the deepest node in the current branch of the search
tree. It proceeds to expand all the nodes in the current branch until
it reaches either the goal state or a dead end (a dead end is a state to
which the application of every rule generates a state that already exists
in the current solution path); in such a case it backtracks to
another branch.

Give the depth first search algorithm and state its advantages and disadvantages. [ BU Dec
2004 ]

Algorithm: Depth-First Search


1. If the initial state is a goal state, quit and return success.
2. Otherwise, do the following until success or failure is signaled:
a) Generate a successor, E, of the initial state. If there are
no more successors, signal failure.
b) Call Depth-First Search with E as the initial state.
c) If success is returned, signal success. Otherwise continue
in this loop.
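A recursive Python rendering of this algorithm (a sketch: successors and
is_goal are assumed helpers, and the current path doubles as the dead-end
check described above):

def depth_first_search(state, successors, is_goal, path=None):
    path = path or [state]    # states on the current solution path
    if is_goal(state):
        return path
    for e in successors(state):
        if e not in path:     # skip states already on the path (dead ends)
            result = depth_first_search(e, successors, is_goal, path + [e])
            if result is not None:
                return result
    return None               # signal failure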

Advantages
• Depth-first search requires less memory since only the nodes on the
current path are stored.
• By chance (or if care is taken in ordering the alternative
successor states), depth-first search may find a solution without
examining much of the search space at all.

Disadvantages
• It has overhead of backtracking.
• It may or may not return an optimal solution.

Refer class notes for examples and step by step solution.

iii. Uniform cost search (UCS) :


BFS always expands the shallowest unexpanded node (when all step costs
are equal), whereas UCS expands the node n with the lowest path cost. UCS
is a variation of BFS: it is identical to BFS if all step costs are
equal.
Refer class notes for examples and step by step solution.
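A minimal Python sketch of UCS, assuming successors(state) yields
(step_cost, next_state) pairs; a counter breaks ties so the heap never
compares states directly:

import heapq
from itertools import count

def uniform_cost_search(start, successors, is_goal):
    tie = count()
    frontier = [(0, next(tie), start)]       # ordered by path cost g
    best_g = {start: 0}
    while frontier:
        g, _, state = heapq.heappop(frontier)  # node with the lowest path cost
        if is_goal(state):
            return state, g
        for step_cost, nxt in successors(state):
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g, next(tie), nxt))
    return None, None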

iv. Depth limited search (DLS) :


DLS is a variation of DFS. It is identical to DFS with an exception that
it imposes a cutoff on the maximum depth of a path. If the depth limit
is set as l, then the nodes at depth l are treated as if they have no
successors. If any node at depth l is not the goal state then the search
backtracks. The difficulty in DLS is that the search will remain
incomplete if the depth limit l is too small.

v. Iterative deepening depth-first search :


Iterative deepening search ( or iterative deepening depth-first search )
is a general strategy, often used in combination with depth-first search,
that finds the best depth limit. It does this by gradually increasing
the limit-first 0, then 1, then 2, and so on-until a goal is found. This
will occur when the depth limit reaches d, the depth of the shallowest
goal node. Iterative deepening combines the benefits of depth-first and
breadth-first search. Like depth-first search, its memory requirements
are very modest. Like breadth-first search, it is complete when the
branching factor is finite and optimal when the path cost is a
nondecreasing function of the depth of the node.
Refer class notes for examples and step by step solution.
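A sketch of iterative deepening in Python (successors and is_goal are
assumed helpers; nodes at the cutoff are treated as having no successors,
exactly as in depth limited search):

def depth_limited(state, successors, is_goal, limit):
    if is_goal(state):
        return [state]
    if limit == 0:
        return None   # cutoff: treat this node as having no successors
    for nxt in successors(state):
        result = depth_limited(nxt, successors, is_goal, limit - 1)
        if result is not None:
            return [state] + result
    return None

def iterative_deepening_search(start, successors, is_goal, max_depth=50):
    for limit in range(max_depth + 1):   # gradually increase the depth limit
        result = depth_limited(start, successors, is_goal, limit)
        if result is not None:
            return result
    return None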

vi. Bidirectional search :


The idea behind bidirectional search is to run two simultaneous searches-
one forward from the initial state and the other backward from the goal,
stopping when the two searches meet in the middle.
3.2 Informed Search Methods:
Informed Search Strategy is the one that uses problem-specific knowledge
beyond the definition of the problem itself to find solutions more
efficiently than an uninformed strategy.
A heuristic function associates a value with every state in a given
problem space and thus helps us to compare any two states in a
goal-oriented search.

1. Hill Climbing:

Describe the Hill Climbing Algorithm. What are the problems in Hill Climbing Algorithms?
Suggest methods to overcome them. [ BU Dec 2004, May 2005, Dec 2005 ]

Algorithm: Steepest-Ascent Hill Climbing


1. Evaluate the initial state. If it is also a goal state, then
return it and quit. Otherwise, continue with the initial state as
the current state.
2. Loop until a solution is found or until a complete iteration
produces no change to current state:
a) Let SUCC be a state such that any possible successor of the
current state will be better than SUCC.
b) For each operator that applies to the current state do:
i. Apply the operator and generate a new state.
ii. Evaluate the new state. If it is a goal state, then
return it and quit. If not, compare it to SUCC. If
it is better, then set SUCC to this state. If it is
not better, leave SUCC alone.
c) If SUCC is better than the current state, then set current
state to SUCC.
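A compact Python sketch of the algorithm above, assuming neighbors(state)
lists all successor states and value(state) is the heuristic being
maximized:

def steepest_ascent_hill_climbing(initial_state, neighbors, value):
    current = initial_state
    while True:
        # SUCC: the best successor of the current state
        succ = max(neighbors(current), key=value, default=None)
        if succ is None or value(succ) <= value(current):
            return current   # no successor is better: stop (possibly a local maximum)
        current = succ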

Problems in Hill Climbing


• A local maximum is a state that is better than all its neighbors
but is not better than some other states farther away. At a local
maximum, all moves appear to make things worse. Local maxima are
particularly frustrating because they often occur almost within
sight of a solution. In this case, they are called foothills.
• A plateau is a flat area of the search space in which a whole set
of neighboring states have the same value. On a plateau, it is not
possible to determine the best direction in which to move by making
local comparisons.
• A ridge is a special kind of local maximum. It is an area of the
search space that is higher than surrounding areas and that itself
has a slope (which one would like to climb). But the orientation
of the high region, compared to the set of available moves and the
directions in which they move, makes it impossible to traverse a
ridge by single moves.

Methods to overcome the problems


• Backtrack to some earlier node and try going in a different
direction. This is particularly reasonable if at that node there
was another direction that looked as promising or almost as
promising as the one that was chosen earlier. To implement this
strategy, maintain a list of paths almost taken and go back to one
of them if the path that was taken leads to a dead end. This is a
fairly good way of dealing with local maxima.
• Make a big jump in some direction to try to get to a new section of
the search space. This is a particularly good way of dealing with
plateaus. If the only rules available describe single small steps,
apply them several times in the same direction.
• Apply two or more rules before doing the test. This corresponds to
moving in several directions at once. This is a particularly good
strategy for dealing with ridges.
2. Best First Search:

Algorithm: Best-First Search


1. Start with OPEN containing just the initial state.
2. Until goal is found or there are no nodes left on OPEN do:
a) Pick the best node on OPEN.
b) Generate its successors.
c) For each successor do:
i. If it has not been generated before, evaluate it,
add it to OPEN, and record its parent.
ii. If it has been generated before, change the parent
if this new path is better than the previous one.
In that case, update the cost of getting to this
node and to any successors that this node may
already have.

Refer class notes for examples.
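A simplified Python sketch of the OPEN-list algorithm above (it keeps the
parent record but, for brevity, omits step 2.c.ii's re-parenting of
already-generated nodes; successors, is_goal, and the heuristic h are
assumed helpers):

import heapq
from itertools import count

def best_first_search(start, successors, is_goal, h):
    tie = count()                                # tie-breaker for equal h values
    open_list = [(h(start), next(tie), start)]   # OPEN, ordered by heuristic value
    parent = {start: None}
    while open_list:
        _, _, state = heapq.heappop(open_list)   # pick the best node on OPEN
        if is_goal(state):
            return state, parent
        for nxt in successors(state):
            if nxt not in parent:                # not generated before
                parent[nxt] = state
                heapq.heappush(open_list, (h(nxt), next(tie), nxt))
    return None, parent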

Problem of Best-First Search:


It is unable to compare two states having the same heuristic value at
different levels in a tree.

The solution is to include the distance of these states from the initial
state, g, along with h'. So we have a new evaluation function:
f' = g + h'

3. A* Algorithm:
Refer class notes for execution and cases.
Refer class notes for the proof of admissibility of A* algorithm. [ BU May
2004, May 2005 ]
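A compact Python sketch of A* using f' = g + h' as defined above (a
minimal illustration, not the full graph-search variant from the class
notes; successors(state) is assumed to yield (step_cost, next_state)
pairs and h to be an admissible heuristic):

import heapq
from itertools import count

def a_star(start, successors, is_goal, h):
    tie = count()
    frontier = [(h(start), next(tie), 0, start, [start])]   # ordered by f' = g + h'
    best_g = {start: 0}
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        for step_cost, nxt in successors(state):
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):       # found a cheaper path
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h(nxt), next(tie), new_g, nxt, path + [nxt]))
    return None, None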

4. AO* Algorithm:
1. Start with the initial state. Also call it as the current state
for this iteration.
2. Loop the following until the problem is solved or futility limit
is exceeded.
i. If it is the first iteration then expand all successors of
the initial state else expand the nodes on the path
selected as best in the previous iteration.
ii. Propagate the values of all the newly generated nodes till
the root.
a. Let S be a set of nodes that have been newly generated.
b. Loop the following until S is empty
i. Select from S, a node, none of whose descendants
in graph, also occur in S.
ii. Compute the value of the selected node by:
Heuristic function ( For leaf nodes ) OR
Summation of cost of its children + cost of the
arcs
iii. In case of multiple values, select the best
(i.e. least)
iv. Remove the selected node from S and add its
ancestors to S.
3. At the root, make a choice for the best path for next iteration.

Refer class notes for execution of the algorithm.



Chapter 4: Knowledge And Reasoning

4.1 A KNOWLEDGE-BASED AGENT

The central component of a knowledge-based agent is its knowledge base,


or KB. Informally, a knowledge base is a set of representations of facts
about the world. Each individual representation is called a sentence.
(Here “sentence” is used as a technical term. It is related to the
sentences of English and other natural languages, but is not identical.)
The sentences are expressed in a language called a knowledge
representation language.

We can describe a knowledge-based agent at three levels:


• The knowledge level or epistemological level is the most abstract;
we can describe the agent by saying what it knows. For example, an
automated taxi might be said to know that the Golden Gate Bridge
links San Francisco and Marin County.
• The logical level is the level at which the knowledge is encoded
into sentences. For example the taxi might be described as having
the logical sentence Links(VashiBridge,Vashi,Mankhurd) in its
knowledge base.
• The implementation level is the level that runs on the agent
architecture. It is that level at which there are physical
representations of the sentences at the logical level.

4.2 The Wumpus World Environment

The wumpus world is a cave consisting of rooms connected by passageways.


Lurking somewhere in the cave is the wumpus, a beast that eats anyone who
enters its room. The wumpus can be shot by an agent, but the agent has
only one arrow. Some rooms contain bottomless pits that will trap anyone
who wanders into these (except for the wumpus, which is too big to fall
in). The only mitigating feature of living in this environment is the
possibility of finding a heap of gold.

To specify the agent’s task, we specify its percepts, actions, and goals.
In the wumpus world, these are as follows:

Percepts:
• In the square containing the wumpus and in the directly (not
diagonally) adjacent squares the agent will perceive a stench.
• In the squares directly adjacent to a pit, the agent will perceive
a breeze.
• In the square where the gold is, the agent will perceive a glitter.
• When an agent walks into a wall, it will perceive a bump.
• When the wumpus is killed, it gives out a woeful scream that can be
perceived anywhere in the cave.
• The percepts will be given to the agent in the form of a list of
five symbols; for example, if there is a stench, a breeze, and a
glitter but no bump and no scream, the agent will receive the
percept [Stench,Breeze,Glitter,None,None]. The agent cannot
perceive its own location.

Actions:
• Go forward, turn right by 90°, and turn left by 90°. In addition,
the action Grab can be used to pick up an object that is in the
same square as the agent. The action Shoot can be used to fire an
arrow in a straight line in the direction the agent is facing. The
arrow continues until it either hits and kills the wumpus or hits
the wall. The agent only has one arrow, so only the first Shoot
action has any effect. Finally, the action Climb is used to leave
the cave; it is effective only when the agent is in the start
square.
• The agent dies a miserable death if it enters a square containing a
pit or a live wumpus. It is safe (but smelly) to enter a square
with a dead wumpus.

Goals:
• The agent’s goal is to find the gold and bring it back to the start
as quickly as possible, without getting killed. To be precise, 1000
points are awarded for climbing out of the cave while carrying the
gold, but there is a 1-point penalty for each action taken, and a
10,000-point penalty for getting killed.

Let us watch a knowledge-based wumpus agent exploring the environment


shown in the figure. The agent’s initial knowledge base contains the
rules of the environment, as listed above; in particular, it knows that
it is in [1,1] and that [1,1] is a safe square. We will see how its
knowledge evolves as new percepts arrive and actions are taken.

The first percept is [None, None, None, None, None], from which the agent
can conclude that its neighboring squares are safe. From the fact that
there is no stench or breeze in [1,1], the agent can infer that [1,2] and
[2,1] are free of dangers. Let the agent decide to move forward to [2,1].
The agent detects a breeze in [2,1], so there must be a pit in a
neighboring square. The pit cannot be in [1,1], by the rules of the game,
so there must be a pit in [2,2] or [3,1] or both. At this point there is
only one square that is safe and has not been visited yet, i.e., [1,2]. So
the prudent agent will turn around, go back to [1,1], and then proceed to
[1,2]. The new percept in [1,2] is [Stench, None, None, None, None]. The
stench in [1,2] means that there must be a wumpus nearby. But the wumpus
cannot be in [1,1], by the rules of the game, and it cannot be in [2,2]
(or the agent would have detected a stench when it was in [2,1]).
Therefore, the agent can infer that the wumpus is in [1,3]. Moreover, the
lack of breeze in [1,2] implies that there is no pit in [2,2]. Yet we
already inferred that there must be a pit in either [2,2] or [3,1], so this
means it must be in [3,1]. This is a fairly difficult inference, because
it combines knowledge gained at different times in different places and
relies on the lack of percept to make a crucial step. The agent has now
proved to itself that there is neither a pit nor a wumpus in [2,2], so it
is safe to move there. Assuming the agent then moves to [2,3], it will
perceive the glitter and grab the gold.

In each case where the agent draws a conclusion from the available
information, that conclusion is guaranteed to be correct if the available
information is correct.
4.3 Propositional Logic: A Very Simple Logic

Syntax
The syntax of propositional logic is simple. The symbols of propositional
logic are the logical constants True and False, proposition symbols such as
P and Q, the logical connectives Λ, V, ⇔, =>, and ¬, and parentheses,
(). All sentences are made by putting these symbols together using the
following rules:
• The logical constants True and False are sentences by themselves.
• A propositional symbol such as P or Q is a sentence by itself.
• Wrapping parentheses around a sentence yields a sentence, for
example, (P Λ Q).
• A sentence can be formed by combining simpler sentences with one of
the five logical connectives:

Λ (and): A sentence whose main connective is Λ, such as P Λ (Q V R), is


called a conjunction; its parts are the conjuncts. (The Λ looks
like an "A" for "And.")

V (or): A sentence using V, such as R V (P Λ Q), is a disjunction of the


disjuncts R and (P Λ Q). (Historically, the V comes from the Latin “vel,”
which means “or.”)

=> (implies): A sentence such as (P Λ Q) => R is called an implication


(or conditional). Its premise or antecedent is P Λ Q, and its conclusion
or consequent is R. Implications are also known as rules or if-then
statements. The implication symbol is sometimes written in other books as
⊃ or →.

⇔ (equivalent): The sentence (P Λ Q) ⇔ (Q Λ P) is an equivalence (also
called a biconditional).

¬ (not): A sentence such as ¬P is called the negation of P. All the other


connectives combine two sentences into one; ¬ is the only connective that
operates on a single sentence.

Atomic sentences in propositional logic consist of a single symbol


(e.g., P), and complex sentences contain connectives or parentheses (e.g.
P Λ Q). The term literal means either an atomic sentence or a negated
atomic sentence.

Semantics
The semantics of propositional logic is also quite straightforward. We
define it by specifying the interpretation of the proposition symbols and
constants, and specifying the meanings of the logical connectives.

A proposition symbol can mean whatever you want. That is, its
interpretation can be any arbitrary fact. The interpretation of P might
be the fact that Paris is the capital of France or that the wumpus is
dead. A sentence containing just a proposition symbol is satisfiable but
not valid: it is true just when the fact that it refers to is the case.
With logical constants, you have no choice: the sentence True always has
as its interpretation the way the world actually is - the true fact. The
sentence False always has as its interpretation the way the world is not.

A complex sentence has a meaning derived from the meaning of its parts.
Each connective can be thought of as a function. Just as addition is a
function that takes two numbers as input and returns a number, so and is
a function that takes two truth values as input and returns a truth
value. We know that one way to define a function is to make a table that
gives the output value for every possible input value. For most functions
(such as addition), this is impractical because of the size of the table,
but there are only two possible truth values, so a logical function with
two arguments needs a table with only four entries. Such a table is
called a truth table. The truth tables for the logical connectives are
shown in Figure below.
P      Q      ¬P     P Λ Q   P V Q   P => Q   P ⇔ Q
False  False  True   False   False   True     True
False  True   True   False   True    True     False
True   False  False  False   True    False    False
True   True   False  True    True    True     True
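The connectives-as-functions view can be checked directly in Python; this
small sketch regenerates the truth table above (implies and iff are
illustrative helper names):

def implies(p, q):
    return (not p) or q   # P => Q is false only when P is true and Q is false

def iff(p, q):
    return p == q         # P <=> Q is true exactly when both sides agree

for p in (False, True):
    for q in (False, True):
        print(p, q, (not p), (p and q), (p or q), implies(p, q), iff(p, q))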

4.4 First Order Logic

First-order logic makes a stronger set of ontological commitments. The


main one is that the world consists of objects, that is, things with
individual identities and properties that distinguish them from other
objects.

Among these objects, various relations hold. Some of these relations are
functions—relations in which there is only one “value” for a given
“input.”

Term
A term is a logical expression that refers to an object.

Atomic Sentence
An atomic sentence is formed from a predicate symbol followed by a
parenthesized list of terms. For example,
Brother(Richard, John)
states, under the interpretation given before, that Richard the Lionheart
is the brother of King John. Atomic sentences can have arguments that are
complex terms:
Married(FatherOf(Richard),MotherOf(John))
states that Richard the Lionheart's father is married to King John's
mother.

Complex Sentences
We can use logical connectives to construct more complex sentences, just
as in propositional calculus. The semantics of sentences formed using
logical connectives is identical to that in that propositional case. For
example:
• Brother(Richard,John) Λ Brother(John,Richard) is true just when
John is the brother of Richard and Richard is the brother of John.
• Older(John,30) V Younger(John,30) is true when John is older than
30 or John is younger than 30.
• Older(John,30) => ¬Younger(John,30) states that if John is older
than 30, then he is not younger than 30.
• ¬Brother(Robin,John) is true just when Robin is not the brother of
John.

Quantifiers
Once we have a logic that allows objects, it is only natural to want to
express properties of entire collections of objects, rather than having to
enumerate the objects by name. Quantifiers let us do this. First-order
logic contains two standard quantifiers, called universal and
existential.

Universal Quantification (∀)

"All cats are mammals"
∀x Cat(x) => Mammal(x)
Hence ∀ is called the universal quantifier. We use the convention that all
variables start with a lowercase letter, and that all constant,
predicate, and function symbols are capitalized. A variable is a term all
by itself, and as such can also serve as the argument of a function, for
example, ChildOf(x). A term with no variables is called a ground term.

Existential Quantification (∃)

Universal quantification makes statements about every object. Similarly,
we can make a statement about some object in the universe without naming
it, by using an existential quantifier. To say, for example, that Spot
has a sister who is a cat, we write
∃x Sister(x,Spot) Λ Cat(x)

Nested Quantifiers
We will often want to express more complex sentences using multiple
quantifiers. The simplest case is where the quantifiers are of the same
type. For example, “For all x and all y, if x is the parent of y then y
is the child of x” becomes
∀x,y Parent(x,y) => Child(y,x)
∀x,y is equivalent to ∀x ∀y. Similarly, the fact that a person's brother
has that person as a sibling is expressed by
∀x,y Brother(x,y) => Sibling(y,x)
In other cases we will have mixtures. "Everybody loves somebody" means
that for every person, there is someone that person loves:
∀x ∃y Loves(x,y)
On the other hand, to say "there is someone who is loved by everyone" we
write
∃y ∀x Loves(x,y)

Connections Between ∀ and ∃


The two quantifiers are actually intimately connected with each other
through negation.
The De Morgan rules for quantified and unquantified sentences are as
follows:
∀x ¬P ≡ ¬∃x P          ¬(P Λ Q) ≡ ¬P V ¬Q
¬∀x P ≡ ∃x ¬P          ¬(P V Q) ≡ ¬P Λ ¬Q
∀x P ≡ ¬∃x ¬P          P Λ Q ≡ ¬(¬P V ¬Q)
∃x P ≡ ¬∀x ¬P          P V Q ≡ ¬(¬P Λ ¬Q)

Represent the following sentences in first-order logic, using a


consistent vocabulary (which you must define):
Assumptions:
• Took(x,y,z): true when student x took class y in term z
• score(x,y,z): true when student x got a score of z in class y
• passed(x,y): true when student x passed class y

a) Some student took French in spring 2001.

Ans) ∃x : took(x, French, Spring2001)

b) Every student who takes French passed it.

Ans) ∀x,y : took(x, French, y) => passed(x, French)

c) Only one student took Greek in spring 2001

Ans) ∃x : took(x, Greek, Spring2001) Λ (∀y : took(y, Greek, Spring2001) => (x = y))

d) The best score in Greek is always higher than the best score in
French.

Ans) ∃x,m : score(m, Greek, x) Λ (∀y,z : score(y, Greek, z) => (x >= z))
Λ (∀a,b : score(a, French, b) => (x > b))

e) Every person who buys a policy is smart.


Assumption: Sells(x,y,z): x sells y to z

Ans) ∀x,y : Buys(x, y) Λ Person(x) Λ Policy(y) => Smart(x)

f) No person buys an expensive policy.

Ans) ∀x,y : Buys(x, y) Λ Person(x) Λ Policy(y) => ¬Expensive(y)

g) There is an agent who sells policies only to people who are not
insured.

Ans) ∃x ∀y,z : Sells(x, y, z) Λ Person(z) Λ Policy(y) => ¬Insured(z)

h) There is a barber who shaves all men in town who do not shave
themselves. [B.U. Dec-04, May-05]

Ans) ∃y Barber(y) ∧ ∀x (Men(x) ∧ InTown(x) ∧ ¬Shave(x,x) ⇒ Shave(y,x))

i) Politicians can fool some of the people all of the time, and they
can fool all of the people some of the time, but they can’t fool
all of the people all of the time.

Ans) ∀p Politician(p) ⇒ ((∃m ∀t People(m) ∧ Time(t) ⇒ Fool(p,m,t)) ∧
(∀m ∃t People(m) ∧ Time(t) ⇒ Fool(p,m,t)) ∧
¬(∀m ∀t People(m) ∧ Time(t) ⇒ Fool(p,m,t)))

j) One more outburst like that and you’ll be in contempt of court.


[B.U. Dec-04, May-05]

Ans) ∀x One_More(Outburst_of(x, you)) ⇒ Contempt_of_Court(you)

k) Either the Red Sox win or I’m out of ten dollars

Ans) Win(RedSox) ∨ OutOfTenDollars(I)

l) It is not the case that if you attempt this exercise you will get
an F. Therefore, you will attempt this exercise.

Ans) ¬ (Attempt(exercise, you) => GetScore(F, you)) => Attempt(exercise, you)

m) James's father is married to King Johan's mother. [B.U. Dec-05]

Ans) Married(FatherOf(James), MotherOf(KingJohan))

n) There is someone who is loved by everyone. [B.U. Dec-05]


Ans) ∃ y ∀ x Loves(x,y)

o) There is no one who does not like Ice Cream. {This is the same as
"Everyone likes Ice Cream"} [B.U. Dec-05]
Ans) ∀x Likes(x, IceCream) OR
¬∃x ¬Likes(x, IceCream)

p) Spot has at least two sisters. [B.U. Dec-05]


Ans) ∃ x,y Sister(x,Spot) ∧ Sister(y,Spot) ∧ ¬(x=y)
Chapter 5: Building A Knowledge Base
The process of building a knowledge base is called knowledge engineering.
A knowledge engineer is someone who investigates a particular domain,
determines what concepts are important in that domain, and creates a
formal representation of the objects and relations in that domain. The
knowledge engineer will usually interview the real experts to become
educated about the domain and to elicit the required knowledge, in a
process called knowledge acquisition.

5.1 Properties Of Good And Bad Knowledge Bases

A good knowledge representation language should be expressive, concise,


unambiguous, context-insensitive, and effective. A knowledge base should,
in addition, be clear and correct. The relations that matter should be
defined, and the irrelevant details should be suppressed.

The question of efficiency is a little more difficult to deal with.


Ideally, the separation between the knowledge base and the inference
procedure should be maintained. The same answers should be obtainable by
the inference procedure, no matter how the knowledge is encoded.

Every knowledge base has two potential consumers: human readers and
inference procedures. A common mistake is to choose predicate names that
are meaningful to the human reader, and then be lulled into assuming that
the name is somehow meaningful to the inference procedure as well. The
sentence BearOfVerySmallBrain(Pooh) might be appropriate in certain
domains, but from this sentence alone, the inference procedure will not
be able to infer either that Pooh is a bear or that he has a very small
brain; that he has a brain at all; that very small brains are smaller
than small brains; or that this fact implies something about Pooh’s
behavior. The hard part is for the human reader to resist the temptation
to make the inferences that seem to be implied by long predicate names. A
knowledge engineer will often notice this kind of mistake when the
inference procedure fails to conclude, for example, Silly(Pooh). It is
compounding the mistake to write

∀b BearOfVerySmallBrain(b) => Silly(b)

because this expresses the relevant knowledge at too specific a level.

In a good knowledge base, BearOfVerySmallBrain(Pooh) would be replaced by


something like the following:

1. Pooh is a bear; bears are animals; animals are physical things.

Bear(Pooh)
∀b Bear(b) => Animal(b)
∀a Animal(a) => PhysicalThing(a)

These sentences help to tie knowledge about Pooh into a broader context.
They also enable knowledge to be expressed at an appropriate level of
generality, depending on whether the information is applicable to bears,
animals, or all physical objects.

2. Pooh has a very small brain.

RelativeSize(BrainOf(Pooh),BrainOf(TypicalBear)) = Very(Small)

This provides a precise sense of “very small,” which would otherwise be


highly ambiguous. Is Pooh’s brain very small compared to a molecule or a
moon?

3. All animals (and only animals) have a brain, which is a part of


the animal.

∀a Animal(a) ⇔ Brain(BrainOf(a))
∀a PartOf(BrainOf(a),a)

This allows us to connect Pooh’s brain to Pooh himself, and introduces


some useful, general vocabulary.
4. If something is part of a physical thing, then it is also a
physical thing:

∀x,y PartOf(x,y) Λ PhysicalThing(y) => PhysicalThing(x)

5.2 Knowledge Engineering

The knowledge engineer must understand enough about the domain in


question to represent the important objects and relationships. He or she
must also understand enough about the representation language to
correctly encode these facts. Moreover, the knowledge engineer must also
understand enough about the implementation of the inference procedure.

Decide what to talk about: Understand the domain well enough to know which
objects and facts need to be talked about, and which can be
ignored. For the early examples in this chapter, this step is easy. In
some cases, however, it can be the hardest step. Many knowledge
engineering projects have failed because the knowledge engineers started
to formalize the domain before understanding it.

Decide on a vocabulary of predicates, functions, and constants: That is,


translate the important domain-level concepts into logic-level names.
This involves many choices, some arbitrary and some important. Should
Size be a function or a predicate? Would Bigness be a better name than
Size? Should Small be a constant or a predicate? Is Small a measure of
relative size or absolute size? Once the choices have been made, the
result is a vocabulary that is known as the ontology of the domain.

Encode general knowledge about the domain: The ontology is an informal


list of the concepts in a domain. By writing logical sentences or axioms
about the terms in the ontology, we accomplish two goals: first, we make
the terms more precise so that humans will agree on their interpretation.
Second, we make it possible to run inference procedures to automatically
derive consequences from the knowledge base.

Encode a description of the specific problem instance: If the ontology is


well thought out, this step will be easy. It will mostly involve writing
simple atomic sentences about instances of concepts that are already part
of the ontology.

Pose queries to the inference procedure and get answers: This is where
the reward is: we can let the inference procedure operate on the axioms
and problem-specific facts to derive the facts we are interested in
knowing.

The Electronic Circuits Domain


We follow the five-step process for knowledge engineering for following
circuit (One-bit full adder).

Decide what to talk about


Digital circuits are composed of wires and gates. Signals flow along
wires to the inputs terminals of gates, and each gate produces a signal
on the output terminal that flows along another wire. There are four
types of gates: AND, OR, and XOR gates have exactly two input terminals,
and NOT gates have one. All gates have exactly one output terminal.
Circuits, which are composed of gates, also have input and output
terminals.

Our main purpose is to analyze the design of circuits to see if they


match their specification. Thus, we need to talk about circuits, their
terminals, and the signals at the terminals. To determine what these
signals will be, we need to know about individual gates, and gate types:
AND, OR, XOR, and NOT.

Decide on a vocabulary
We need to be able to distinguish a gate from other gates. This is
handled by naming gates with constants: X1, X2, and so on. Next, we need
to know the type of a gate. A function is appropriate for this: Type(X1)
= XOR. This introduces that constant XOR for a particular gate type; the
other constants will be called OR, AND, and NOT.

A gate or circuit can have one or more input terminals and one or more
output terminals. We could simply name each one with a constant, just as
we named gates. Thus, gate X1 could have terminals named X1In1, X1In2, and
X1Out1. Names as long and structured as these, however, are as bad as
BearOfVerySmallBrain. They should be replaced with a notation that makes
it clear that X1Out1 is a terminal for gate X1, and that it is the first
output terminal. A function is appropriate for this; the function
Out(1,X1) denotes the first (and only) output terminal for gate X1. A
similar function In is used for input terminals.

The connectivity between gates can be represented by the predicate


Connected, which takes two terminals as arguments, as in Connected(Out(1,
X1),In(1, X2)).

Finally, we need to know if a signal is on or off. One possibility is to


use a unary predicate, On, which is true when the signal at a terminal is
on. This makes it a little difficult, however, to pose questions such as
"What are all the possible values of the signals at the following
terminals...?" We will therefore introduce as objects two signal values, On
and Off, and a function Signal which takes a terminal as argument and
denotes a signal value.

Encode general rules


With our example, we need only seven simple rules:
1. If two terminals are connected, then they have the same signal:
∀t1,t2 Connected(t1,t2) => Signal(t1) = Signal(t2)
2. The signal at every terminal is either on or off (but not both):
∀t Signal(t) = On V Signal(t) = Off
On ≠ Off
3. Connected is a commutative predicate:
∀t1,t2 Connected(t1,t2) ⇔ Connected(t2,t1)
4. An OR gate's output is on if and only if any of its inputs are on:
∀g Type(g) = OR =>
Signal(Out(1,g)) = On ⇔ ∃n Signal(In(n,g)) = On
5. An AND gate's output is off if and only if any of its inputs are
off:
∀g Type(g) = AND =>
Signal(Out(1,g)) = Off ⇔ ∃n Signal(In(n,g)) = Off
6. An XOR gate's output is on if and only if its inputs are different:
∀g Type(g) = XOR =>
Signal(Out(1,g)) = On ⇔ Signal(In(1,g)) ≠ Signal(In(2,g))
7. A NOT gate's output is different from its input:
∀g (Type(g) = NOT) => Signal(Out(1,g)) ≠ Signal(In(1,g))

Encode the specific instance


The circuit shown in Figure is encoded as circuit C1 with the following
description. First we categorize the gates:
Type(X1)=XOR        Type(X2)=XOR
Type(A1)=AND        Type(A2)=AND
Type(O1)=OR
Then, the connection between them:
Connected(Out(1,X1),In(1,X2)) Connected(In(1,C1),In(1,X1))
Connected(Out(1,X1),In(2,A2)) Connected(In(1,C1),In(1,A1))
Connected(Out(1,A2),In(1,O1)) Connected(In(2,C1),In(2,X1))
Connected(Out(1,A1),In(2,O1)) Connected(In(2,C1),In(2,A1))
Connected(Out(1,X2),Out(1,C1)) Connected(In(3,C1),In(2,X2))
Connected(Out(1,O1),Out(2,C1)) Connected(In(3,C1),In(1,A2))
Pose queries to the inference procedure
What combinations of inputs would cause the first output of C1 (the sum
bit) to be off and the second output of C1 (the carry bit) to be on?

∃ i1,i2,i3 Signal(In(1,C1))=i1 Λ Signal(In(2,C1))=i2 Λ Signal(In(3,C1))=i3
Λ Signal(Out(1,C1))=Off Λ Signal(Out(2,C1))=On
The answer is
(i1=On Λ i2=On Λ i3=Off) V
(i1=On Λ i2=Off Λ i3=On) V
(i1=Off Λ i2=On Λ i3=On)
What are the possible sets of values of all the terminals for the adder
circuit?

∃ i1,i2,i3,o1,o2 Signal(In(1,C1))=i1 Λ Signal(In(2,C1))=i2 Λ
Signal(In(3,C1))=i3 Λ Signal(Out(1,C1))=o1 Λ Signal(Out(2,C1))=o2
This final query will return a complete input/output table for the
device, which can be used to check that it does in fact add its inputs
correctly. This is a simple example of circuit verification.
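
The same check can be sketched in ordinary Python (an illustration, not
part of the notes): the gate axioms become Boolean operators, the
connections follow the encoding of C1 above, and we brute-force the
query about the sum and carry bits.

from itertools import product

def adder_c1(i1, i2, i3):
    """Propagate signals through gates X1, X2, A1, A2, O1 as wired above."""
    x1 = i1 ^ i2           # Out(1,X1): XOR of In(1,C1), In(2,C1)
    sum_bit = x1 ^ i3      # Out(1,X2) -> Out(1,C1)
    a1 = i1 & i2           # Out(1,A1): AND of In(1,C1), In(2,C1)
    a2 = x1 & i3           # Out(1,A2): AND of Out(1,X1), In(3,C1)
    carry = a1 | a2        # Out(1,O1) -> Out(2,C1)
    return sum_bit, carry

for i1, i2, i3 in product([0, 1], repeat=3):
    s, c = adder_c1(i1, i2, i3)
    if s == 0 and c == 1:      # Signal(Out(1,C1))=Off, Signal(Out(2,C1))=On
        print(i1, i2, i3)      # prints 0 1 1, then 1 0 1, then 1 1 0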

5.3 General Ontology


(Explain General Ontology with reference to measures, composite objects,
events, times, and intervals, with suitable examples. (B.U. Dec 2004))

This section is about a general ontology that incorporates decisions
about how to represent a broad selection of objects and relations. It is
encoded within first-order logic, but makes many ontological commitments
that first-order logic does not make.

Consider again the ontology for circuits in the previous section. It
makes a large number of simplifying assumptions. For example, time is
omitted completely. Signals are fixed, and there is no propagation of
signals. The structure of the circuit remains constant. Now we could take
a step toward generality by considering signals at particular times, and
including the wire lengths and propagation delays in wires and devices.
This would allow us to simulate the timing properties of the circuit, and
indeed such simulations are often carried out by circuit designers. We
could also introduce more interesting classes of gates, for example by
describing the technology (TTL, MOS, CMOS, and so on) as well as the
input/output specifications.

If we look at the wumpus world, similar considerations apply. We also
used the constant symbol Pit to say that there was a pit in a particular
square, because all pits were identical. We could have allowed for
different kinds of pits by having several individuals belonging to the
class of pits but having different properties. Similarly, we might want
to allow for several different kinds of animals, not just wumpuses.

For any special-purpose ontology, it is possible to make changes
like these to move toward greater generality. There are two major
characteristics of general-purpose ontologies that distinguish them from
collections of special-purpose ontologies:

• A general-purpose ontology should be applicable in more or less any
special-purpose domain (with the addition of domain-specific
axioms). This means that as far as possible, no representational
issue can be finessed or brushed under the carpet.
• In any sufficiently demanding domain, different areas of knowledge
must be unified because reasoning and problem solving may involve
several areas simultaneously.

Our discussion of the general-purpose ontology is organized under the
following headings, each of which is really worth a chapter by itself:

Representing Categories
Rather than being an entirely random collection of objects, the world exhibits a good deal of
regularity. For example, there are many cases in which several objects have a number of properties in
common. It is usual to define categories that include as members all objects having certain properties.

The organization of objects into categories is a vital part of knowledge
representation. Although interaction with the world takes place at the
level of individual objects, much of reasoning takes place at the level
of categories. One infers the presence of certain objects from
perceptual input, infers category membership from the perceived
properties of the objects, and then uses category information to make
predictions about the objects. For example, from its green, mottled
skin, large size, and ovoid shape, one can infer that an object is a
watermelon; from this, one infers that it would be useful for fruit
salad. There are two main choices for representing categories in first-
order logic: the first is to represent categories by unary predicates.
The predicate symbol Tomato, for example, represents the unary relation
that is true only for objects that are tomatoes, and Tomato(x) means that
x is a tomato.
The second choice is to reify the category. Reification is the
process of turning a predicate or function into an object in the
language. In this case, we use Tomatoes as a constant symbol referring
to the object that is the set of all tomatoes. We use x ∈ Tomatoes to
say that x is a tomato. Reified categories allow us to make assertions
about the category itself, rather than about members of the category.
For example, we can say Population(Humans)=5,000,000,000, even though
there is no individual human with a population of five billion.
Categories perform one more important role: they serve to organize
and simplify the knowledge base through inheritance. If we say that all
instances of the category Food are edible, and if we assert that Fruit is
a subclass of Food and Apples is a subclass of Fruit, then we know that
every apple is edible. We say that the individual apples inherit the
property of edibility, in this case from their membership in the Food
category.
Subclass relations organize categories into a taxonomy or taxonomic
hierarchy. Taxonomies have been used explicitly for centuries in
technical fields.
First-order logic makes it easy to state facts about categories,
either by relating objects to categories or by quantifying over their
members:
• An object is a member of a category. For example:
Tomato12 ∈ Tomatoes
• A category is a subclass of another category. For example:
Tomatoes ⊂ Fruit
• All members of a category have some properties. For example:
∀ x x ∈ Tomatoes => Red(x) Λ Round(x)
• Members of a category can be recognized by some properties. For
example:
∀ x Red(Interior(x)) Λ Green(Exterior(x)) Λ x ∈ Melons => x ∈ Watermelons
• A category as a whole has some properties. For example:
Tomatoes ∈ DomesticatedSpecies
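
The inheritance machinery sketched above can be mimicked with a few
Python dictionaries (an illustration with made-up data, not part of the
notes): a property asserted on Food is inherited by every apple through
the subclass chain.

subclass = {"Apples": "Fruit", "Fruit": "Food"}          # subclass facts
members = {"Apple7": "Apples", "Tomato12": "Tomatoes"}   # membership facts
properties = {"Food": {"Edible"}}                        # properties of categories

def categories_of(x):
    """All categories x belongs to, following the subclass chain upward."""
    c = members.get(x)
    while c is not None:
        yield c
        c = subclass.get(c)

def has_property(x, prop):
    return any(prop in properties.get(c, set()) for c in categories_of(x))

print(has_property("Apple7", "Edible"))   # True: inherited from Food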

Measures
Many useful properties such as mass, age, and price relate objects to quantities of particular types,
which we call measures. We explain how measures are represented in logic, and how they relate to
units of measure.

In both scientific and commonsense theories of the world, objects have
height, mass, cost, and so on. The values that we assign for these
properties are called measures. Ordinary, quantitative measures are
quite easy to represent. We imagine that the universe includes abstract
“measure objects,” such as the length that is the length of this line
segment: ______________. We can call this length 1.5 inches, or 3.81
centimeters. Thus, the same length has different names in our language.
Logically, this can be done by combining a units function with a number.
If L1 is the name of the line segment, then we can write

Length(L1) = Inches(1.5) = Centimeters(3.81)

Conversion between units is done with sentences such as

∀ i Centimeters(2.54 × i) = Inches(i)
Similar axioms can be written for pounds and kilograms; seconds and days;
dollars and cents.
Measures can be used to describe objects as follows:

Diameter(Basketball12) = Inches(9.5)
ListPrice(Basketball12) = $(19)
∀ d d ∈ Days => Duration(d) = Hours(24)
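
One way to realize measure objects in code is to store each quantity in
a canonical unit, so that different unit functions name the same object.
A minimal Python sketch (the canonical-centimeters choice is an
assumption of this sketch):

import math

CM_PER_INCH = 2.54

def Inches(n):
    return n * CM_PER_INCH     # canonical value in centimeters

def Centimeters(n):
    return n

length_L1 = Inches(1.5)        # Length(L1) = Inches(1.5)
print(math.isclose(length_L1, Centimeters(3.81)))   # True: one length, two names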

Composite objects
It is very common for objects to belong to categories by virtue of their constituent structure. For
example, cars have wheels, an engine, and so on, arranged in particular ways; typical baseball games
have nine innings in which each team alternates pitching and batting.

The idea that one object can be part of another is a familiar one. One’s
nose is part of one’s head. We use the general PartOf relation to say
that one thing is part of another. PartOf is transitive and reflexive.
Objects can therefore be grouped into PartOf hierarchies, reminiscent of
the Subset hierarchy:
PartOf(Bucharest,Romania)
PartOf(Romania,EasternEurope)
PartOf(EasternEurope,Europe)
From these, given the transitivity of PartOf, we can infer that
PartOf(Bucharest,Europe).
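
Transitive reasoning over PartOf is easy to sketch in Python (an
illustration, not part of the notes): the inference
PartOf(Bucharest,Europe) falls out of a recursive walk over the facts.

part_of = {("Bucharest", "Romania"),
           ("Romania", "EasternEurope"),
           ("EasternEurope", "Europe")}

def is_part_of(x, y):
    """True if x equals y or a chain of PartOf facts links x to y."""
    if x == y:
        return True            # reflexivity
    return any(a == x and is_part_of(b, y) for a, b in part_of)

print(is_part_of("Bucharest", "Europe"))   # True, by transitivity
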
Any object that has parts is called a composite object. Categories
of composite objects are often characterized by the structure of those
objects, that is, the parts and how the parts are related. For example,
a biped has exactly two legs that are attached to its body:

∀ a Biped(a) => ∃ l1,l2,b Leg(l1) Λ Leg(l2) Λ Body(b) Λ
   PartOf(l1,a) Λ PartOf(l2,a) Λ PartOf(b,a) Λ
   Attached(l1,b) Λ Attached(l2,b) Λ l1 ≠ l2 Λ
   (∀ l3 Leg(l3) Λ PartOf(l3,a) => (l3 = l1 ∨ l3 = l2))
This general form of sentence can be used to define the structure of any
composite object. A generic description of this kind is often called a
schema or script, particularly in the area of natural language
understanding.

Representing change with events


Situation calculus is perfect for the wumpus world, or any world in which
a single agent takes discrete actions. Unfortunately, situation calculus
has two problems that limit its applicability. First, situations are
instantaneous points in time, which are not very useful for describing
the gradual growth of a kitten into a cat, the flow of electrons along a
wire, or any other process where change occurs continuously over time.
Second, situation calculus works best when only one action happens at a
time. When there are multiple agents in the world, or when the world can
change spontaneously, situation calculus begins to break down.
Because of these limitations, we now turn to a different approach
toward representing change, which we call the event calculus. Event
calculus is rather like a continuous version of the situation-calculus
“movie,” as shown in the figure below. We think of a particular universe as
having both a “spatial” and a temporal dimension. The “spatial”
dimension ranges over all of the objects in an instantaneous “snapshot”
or “cross-sections” of the universe. The temporal dimension ranges over
time. An event is, informally, just a “chunk” of this universe with both
temporal and spatial extent.
Let us look at an example: World War II, referred to by the symbol
WorldWarII. World War II has parts that we refer to as subevents:

SubEvent(BattleOfBritain,WorldWarII)

Similarly, World War II is a subevent of the twentieth century:

SubEvent(WorldWarII,TwentiethCentury)

The twentieth century is a special kind of event called an
interval. An interval is an event that includes as subevents all events
occurring in a given time period. Intervals are therefore entire
temporal sections of the universe, as the figure illustrates. In
situation calculus, a given fact is true in a particular situation. In
event calculus, a given event occurs during a particular interval. The
previous SubEvent sentences are examples of this kind of statement.

Times, intervals, and actions


In this section, we flesh out the vocabulary of time intervals. Because
it is a limited domain, we can be more complete in deciding on a
vocabulary and encoding general rules. Time intervals are partitioned
into moments and extended intervals. The distinction is that only
moments have zero duration:

Partition({Moments,ExtendedIntervals},Intervals)
∀ i i ∈ Intervals => (i ∈ Moments ⇔ Duration(i)=0)

Now we invent a time scale and associate points on that scale with
moments, giving us absolute times. The time scale is arbitrary; we will
measure it in seconds and say that the moment at midnight (GMT) on January
1, 1900, has time 0. The functions Start and End pick out the earliest
and latest moments in an interval, and the function Time delivers the
point on the time scale for a moment. The function Duration gives the
difference between the end time and the start time.
∀ i Interval(i) => Duration(i) = (Time(End(i)) – Time(Start(i)))
Time(Start(AD1990))=Seconds(0)
Time(Start(AD1991))=Seconds(2871694800)
Time(End(AD1991))=Seconds(2903230800)
Duration(AD1991)=Seconds(31536000)

To make these numbers easier to read, we also introduce a function Date,
which takes six arguments (hours, minutes, seconds, month, day, and year)
and returns a point on the time scale:

Time(Start(AD1991))=Date(00,00,00,Jan,1,1991)
Date(12,34,56,Feb,14,1993)=Seconds(2938682096)
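
The arithmetic of this time scale can be checked with Python's datetime
module (a sketch, not part of the notes; it takes the epoch to be
midnight GMT on January 1, 1900, and passes the month as a number rather
than a name):

from datetime import datetime

EPOCH = datetime(1900, 1, 1)

def Date(hours, minutes, seconds, month, day, year):
    """Seconds on the notes' time scale for the given moment."""
    t = datetime(year, month, day, hours, minutes, seconds)
    return int((t - EPOCH).total_seconds())

print(Date(0, 0, 0, 1, 1, 1992) - Date(0, 0, 0, 1, 1, 1991))  # 31536000 = Duration(AD1991)
print(Date(12, 34, 56, 2, 14, 1993))                          # 2938682096, as above
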
Chapter 6: Inference in First-Order Logic

6.1 Inference rules involving quantifiers:


The three new inference rules are as follows:
• Universal Elimination:
  For example, from ∀x Likes(x,IceCream), we can use the
  substitution {x/Ben} and infer Likes(Ben,IceCream).
• Existential Elimination:
  For example, from ∃x Kill(x,Victim), we can infer
  Kill(Murderer,Victim), as long as Murderer does not appear
  elsewhere in the knowledge base.
• Existential Introduction:
  For example, from Likes(Jerry,IceCream) we can infer
  ∃x Likes(x,IceCream).

6.2 An Example Proof


We will begin with the situation as it might be described in English:
The law says that it is a crime for an American to sell weapons to hostile nations. The
country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by
Colonel West, who is American.
What we wish to prove is that West is a criminal. We first represent
these facts in first-order logic, and then show the proof as a sequence
of applications of the inference rules.

“... it is a crime for an American to sell weapons to hostile nations”:

∀ x,y,z American(x) Λ Weapon(y) Λ Nation(z) Λ Hostile(z) Λ
Sells(x,z,y) => Criminal(x) ---(1)

“The country Nono ...”:


Nation(Nono) ---(2)

“Nono, an enemy of America ...”:


Enemy(Nono,America) ---(3)
Nation(America) ---(4)

“Nono ... has some missiles”:


∃ x Owns(Nono,x) Λ Missile(x) ---(5)

“All of its missiles were sold to it by Colonel West”:


∀ x Owns(Nono,x) Λ Missile(x) => Sells(West,Nono,x) ---(6)

“West, who is American ...”:


American(West) ---(7)

Common sense rules:

Missiles are weapons


∀ x Missile(x) => Weapon(x) ---(8)

Enemy of America is hostile


∀ x Enemy(x,America) => Hostile(x) ---(9)

The proof consists of a series of applications of the inference rules:

From (5) and Existential Elimination:


Owns(Nono,M1) Λ Missile(M1) ---(10)

From (10) and And-Elimination:


Owns(Nono,M1) ---(11)
Missile(M1) ---(12)

From (8) and Universal Elimination:


Missile(M1) => Weapon(M1) ---(13)
From (12), (13) and Modus Ponens:
Weapon(M1) ---(14)

From (6) and Universal Elimination:


Owns(Nono,M1)Λ Missile(M1) => Sells(West,Nono,M1) ---(15)

From (15), (10) and Modus Ponens:


Sells(West,Nono,M1) ---(16)

From (9) and Universal Elimination:


Enemy(Nono,America) => Hostile(Nono) ---(17)

From (3), (17) and Modus Ponens:


Hostile(Nono) ---(18)

From (1) and Universal Elimination:


American(West)Λ Weapon(M1)Λ Nation(Nono)Λ Hostile(Nono)Λ
Sells(West,Nono,M1)=> Criminal(West) ---(19)

From (7), (14), (2), (18), (16) and And-Introduction


American(West)Λ Weapon(M1)Λ Nation(Nono)Λ Hostile(Nono)Λ
Sells(West,Nono,M1) ---(20)

From (19), (20) and Modus Ponens:


Criminal(West) ---(21)

6.3 Generalized Modus Ponens

In this section, we introduce a generalization of the Modus Ponens
inference rule that does in a single blow what required an And-
Introduction, Universal Elimination, and Modus Ponens in the earlier
proof. The idea is to take a knowledge base containing, for example,
Missile(M1), Owns(Nono,M1), and Owns(Nono,x) Λ Missile(x) =>
Sells(West,Nono,x), and to infer in a single step the new sentence
Sells(West,Nono,M1).

Canonical Form
We are attempting to build an inferencing mechanism with one inference
rule – the generalized version of Modus Ponens. The canonical form for
Modus Ponens mandates that each sentence in the knowledge base be either
an atomic sentence or an implication with a conjunction of atomic
sentences on the left hand side and single atom on the right. Sentence
of this form are called Horn sentences, and a knowledge base consisting
of only Horn sentences is said to be in Horn Normal Form.

We convert sentences into Horn sentences when they are first entered into
the knowledge base, using Existential Elimination and And-Elimination.
For example, ∃x Owns(Nono,x) Λ Missile(x) is converted into the two
atomic Horn sentences Owns(Nono,M1) and Missile(M1). Once the
existential quantifiers are all eliminated, it is traditional to drop the
universal quantifiers, so that ∀y Owns(y,M1) would be written as
Owns(y,M1). This is just an abbreviation; the meaning of y is still a
universally quantified variable.

Unification
The job of the unification routine, UNIFY, is to take two atomic
sentences p and q and return a substitution that would make p and q look
the same. (If there is no such substitution, then UNIFY should return
fail.) Formally,
UNIFY(p,q) = θ where SUBST(θ,p) = SUBST(θ,q)

θ is called the unifier of the two sentences. Suppose we have a rule:


Knows(John,x) => Hates(John,x)

(“John hates everyone he knows”) and we want to use this with the Modus
Ponens inference rule to find out whom he hates. In other words, we need
to find those sentences in the knowledge base that unify with
Knows(John,x), and then apply the unifier to Hates(John,x). Let our
knowledge base contain the following sentences:
Knows(John,Jane)
Knows(y,Leonid)
Knows(y,Mother(y))
Knows(x,Elizabeth)
(Remember that x and y are implicitly universally quantified.) Unifying
the antecedent for the rule against each of the sentences in the
knowledge base in turn gives us:
UNIFY(Knows(John,x),Knows(John,Jane))={ x/Jane}
UNIFY(Knows(John,x),Knows(y,Leonid))={x/Leonid,y/John}
UNIFY(Knows(John,x),Knows(y,Mother(y))) = {y/John, x/Mother(John)}
UNIFY(Knows(John,x),Knows(x,Elizabeth))= fail

The last unification fails because x cannot take on the value John and
the value Elizabeth at the same time.

One way to handle this problem is to standardize apart the two sentences
being unified, which means renaming the variables of one (or both) to
avoid name clashes. After standardizing apart, we would have
UNIFY(Knows(John,x1),Knows(x2,Elizabeth))={ x1/Elizabeth, x2/John}
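
UNIFY itself is short enough to sketch in Python (an illustration, not
part of the notes): compound terms are tuples, variables are lower-case
strings, constants are capitalized strings, and the occurs check is
omitted for brevity.

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def unify(x, y, theta):
    """Return a substitution dict that unifies x and y, or None (fail)."""
    if theta is None:
        return None
    elif x == y:
        return theta
    elif is_var(x):
        return unify_var(x, y, theta)
    elif is_var(y):
        return unify_var(y, x, theta)
    elif isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
        return theta
    else:
        return None

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    elif is_var(x) and x in theta:
        return unify(var, theta[x], theta)
    else:
        return {**theta, var: x}

print(unify(('Knows','John','x'), ('Knows','John','Jane'), {}))    # {'x': 'Jane'}
print(unify(('Knows','John','x'), ('Knows','y','Leonid'), {}))     # {'y': 'John', 'x': 'Leonid'}
print(unify(('Knows','John','x'), ('Knows','y',('Mother','y')), {}))
# {'y': 'John', 'x': ('Mother', 'y')} -- applying the bindings again gives Mother(John)
print(unify(('Knows','John','x'), ('Knows','x','Elizabeth'), {}))  # None (fail)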

Let us solve our crime problem using Generalized Modus Ponens. To do
this, we first need to put the original knowledge base into Horn form.
Sentences (1) through (9) become

American(x) Λ Weapon(y) Λ Nation(z) Λ Hostile(z) Λ
Sells(x,z,y) => Criminal(x) ---(21)

Nation(Nono) ---(22)

Enemy(Nono,America) ---(23)

Nation(America) ---(24)

Owns(Nono,M1) ---(25)

Missile(M1) ---(26)

Owns(Nono,x) Λ Missile(x)=>Sells(West,Nono,x) ---(27)

American(West) ---(28)

Missile(x) => Weapon(x) ---(29)

Enemy(x,America) => Hostile(x) ---(30)

The proof involves just four steps.


From (26) and (29) using Modus Ponens:

Weapon(M1) ---(31)

From (23) and (30) using Modus Ponens:

Hostile(Nono) ---(32)

From (25), (26) and (27) using Modus Ponens:

Sells(West,Nono,M1) ---(33)

From (28), (31), (22), (32), (33) and (21) using Modus Ponens:

Criminal(West) ---(34)

This proof shows how natural reasoning with Generalized Modus Ponens can
be.

6.4 Forward and Backward Chaining

The Generalized Modus Ponens rule can be used in two ways. We can start
with the sentences in the knowledge base and generate new conclusions
that in turn can allow more inferences to be made. This is called
forward chaining. Forward chaining is usually used when a new fact is
added to the database and we want to generate its consequences.
Alternatively, we can start with something we want to prove, find
implication sentences that would allow us to conclude it, and then
attempt to establish their premises in turn. This is called backward
chaining, because it uses Modus Ponens backwards. Backward chaining is
normally used when there is a goal to be proved.
Forward-chaining algorithm
Forward chaining is normally triggered by the addition of a new fact p to
the knowledge base. The idea is to find all implications that have p as
a premise; then if the other premises are already known to hold, we can
add the consequent of the implication to the knowledge base, triggering
further inference.
We also need the idea of a composition of substitutions. COMPOSE(θ1,θ2)
is the substitution whose effect is identical to the effect of applying
each substitution in turn. That is,
SUBST(COMPOSE(θ1,θ2),p) = SUBST(θ2,SUBST(θ1,p))

Algorithm:

procedure FORWARD-CHAIN(KB, p)

  if there is a sentence in KB that is a renaming of p then return
  Add p to KB
  for each (p1 Λ ... Λ pn => q) in KB
      such that for some i, UNIFY(pi, p) = θ succeeds do
    FIND-AND-INFER(KB, [p1, ..., pi-1, pi+1, ..., pn], q, θ)
  end

procedure FIND-AND-INFER(KB, premises, conclusion, θ)

  if premises = [] then
    FORWARD-CHAIN(KB, SUBST(θ, conclusion))
  else for each p′ in KB such that
      UNIFY(p′, SUBST(θ, FIRST(premises))) = θ2 do
    FIND-AND-INFER(KB, REST(premises), conclusion, COMPOSE(θ, θ2))
  end

The forward-chaining inference algorithm adds to KB all the sentences that can be inferred from the sentence p. If p is
already in KB, it does nothing. If p is new, consider each implication that has a premise that matches p. For each such
implication, if all the remaining premises are in KB, then infer the conclusion. If the premises can be matched several
ways, then infer each corresponding conclusion. The substitution θ keeps track of the way things match.
We will use our crime problem again to illustrate how FORWARD-CHAIN
works. We will begin with the knowledge base containing only the
implications in Horn form:

American(x) Λ Weapon(y) Λ Nation(z) Λ Hostile(z) Λ
Sells(x,z,y) => Criminal(x) ---(1)

Owns(Nono,x) Λ Missile(x) => Sells(West,Nono,x) ---(2)

Missile(x) => Weapon(x) ---(3)

Enemy(x,America) => Hostile(x) ---(4)

Now we add the atomic sentences to the knowledge base one by one, forward
chaining each time and showing any additional facts that are added:
FORWARD-CHAIN(KB,American(West))

Add to the KB. It unifies with a premise of (1), but the other premises
of (1) are not known, so FORWARD-CHAIN returns without making any new
inferences.
FORWARD-CHAIN(KB,Nation(Nono))

Add to the KB. It unifies with a premise of (1), but there are still
missing premises, so FORWARD-CHAIN returns.
FORWARD-CHAIN(KB,Enemy(Nono,America))

Add to the KB. It unifies with the premise of (4), with unifier
{x/Nono}. Call
FORWARD-CHAIN(KB,Hostile(Nono))

Add to the KB. It unifies with a premise of (1). Only two other
premises are known, so processing terminates.
FORWARD-CHAIN(KB,Owns(Nono,M1))
Add to the KB. It unifies with a premise of (2), with unifier {x/M1}.
The other premise, now Missile(M1), is not known, so processing
terminates.
FORWARD-CHAIN(KB,Missile(M1))

Add to the KB. It unifies with a premise of (2) and (3). We will handle
them in that order.
• Missile(M1) unifies with a premise of (2) with unifier {x/M1}. The
other premise, now Owns(Nono,M1), is known, so call
FORWARD-CHAIN(KB,Sells(West,Nono,M1))

Add to the KB. It unifies with a premise of (1), with unifier
{x/West,y/M1,z/Nono}. The premise Weapon(M1) is unknown, so
processing terminates.

• Missile(M1) unifies with a premise of (3) with unifier {x/M1}.
  Call FORWARD-CHAIN(KB,Weapon(M1))

Add to the KB. It unifies with a premise of (1), with unifier
{y/M1}. The other premises are all known, with accumulated unifier
{x/West,y/M1,z/Nono}. Call
FORWARD-CHAIN(KB,Criminal(West))

Add to the KB. Processing terminates.
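
The trace above can be reproduced with a compact Python sketch of
forward chaining (illustrative only; it reuses the tuple representation
of the unification sketch in the previous section, with the helpers
repeated so the block runs on its own):

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def unify(x, y, th):
    if th is None: return None
    if x == y: return th
    if is_var(x):
        if x in th: return unify(th[x], y, th)
        if is_var(y) and y in th: return unify(x, th[y], th)
        return {**th, x: y}            # no occurs check in this sketch
    if is_var(y): return unify(y, x, th)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y): th = unify(a, b, th)
        return th
    return None

def subst(th, t):
    if is_var(t): return subst(th, th[t]) if t in th else t
    if isinstance(t, tuple): return tuple(subst(th, a) for a in t)
    return t

def matches(premises, facts, th):
    """Yield every substitution that matches all premises against facts."""
    if not premises:
        yield th
    else:
        for f in facts:
            th2 = unify(premises[0], f, th)
            if th2 is not None:
                yield from matches(premises[1:], facts, th2)

def forward_chain(facts, rules):
    """Add consequences of Horn rules until a fixed point is reached."""
    changed = True
    while changed:
        changed = False
        for prem, concl in rules:
            for th in list(matches(prem, list(facts), {})):
                fact = subst(th, concl)
                if fact not in facts:
                    facts.add(fact)
                    changed = True
    return facts

facts = {('American','West'), ('Nation','Nono'), ('Nation','America'),
         ('Enemy','Nono','America'), ('Owns','Nono','M1'), ('Missile','M1')}
rules = [([('American','x'), ('Weapon','y'), ('Nation','z'), ('Hostile','z'),
           ('Sells','x','z','y')], ('Criminal','x')),
         ([('Owns','Nono','x'), ('Missile','x')], ('Sells','West','Nono','x')),
         ([('Missile','x')], ('Weapon','x')),
         ([('Enemy','x','America')], ('Hostile','x'))]

print(('Criminal','West') in forward_chain(facts, rules))   # True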

Backward-chaining algorithm:
Backward chaining is designed to find all answers to a question posed to
the knowledge base. The backward-chaining algorithm BACK-CHAIN works by
first checking to see if answers can be provided directly from sentences
in the knowledge base. It then finds all implications whose conclusion
unifies with the query, and tries to establish the premises of those
implications, also by backward chaining. If the premise is a
conjunction, then BACK-CHAIN processes the conjunction conjunct by
conjunct, building up the unifier for the whole premise as it goes.

Figure 1 (backward-chaining proof tree for Criminal(x); each conjunct
below a goal must be proved, left to right):

Criminal(x)
  American(x)                Yes, {x/West}
  Weapon(y)
    Missile(y)               Yes, {y/M1}
  Nation(z)                  Yes, {z/Nono}
  Hostile(Nono)
    Enemy(Nono,America)      Yes, {}
  Sells(West,Nono,M1)
    Owns(Nono,M1)            Yes, {}
    Missile(M1)              Yes, {}

Figure 2 (a failing branch of the search, with z bound to America):

Criminal(x)
  American(x)                Yes, {x/West}
  Weapon(y)
    Missile(y)               Yes, {y/M1}
  Nation(z)                  Yes, {z/America}
  Hostile(America)           Fail
  Sells(West,America,M1)
Figure 1 is the proof tree for deriving Criminal(West) from sentences
(21) through (30). As a diagram of the backward chaining algorithm, the
tree should be read depth-first, left to right. To prove Criminal(x) we
have to prove the five conjuncts below it. Some of these are in the
knowledge base, and others require further backward chaining. When a leaf
node succeeds, its substitution is applied to subsequent branches.
Thus, by the time BACK-CHAIN gets to Sells(x,z,y), all the variables are
instantiated. Figure 1 can also be seen as a diagram of forward
chaining. In this interpretation, the premises are added at the bottom,
and conclusions are added once all their premises are in the KB. Figure
2 shows what can happen if an incorrect choice is made in the search, in
this case, choosing America as the nation in question. There is no way
to prove that America is a hostile nation, so the proof fails to go
through, and we have to back up and consider another branch in the search
space.
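
BACK-CHAIN can be sketched the same way (again illustrative; rule
variables are standardized apart with a counter, and the helpers and the
crime knowledge base are the same as in the forward-chaining sketch):

import itertools

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def unify(x, y, th):
    if th is None: return None
    if x == y: return th
    if is_var(x):
        if x in th: return unify(th[x], y, th)
        if is_var(y) and y in th: return unify(x, th[y], th)
        return {**th, x: y}            # no occurs check in this sketch
    if is_var(y): return unify(y, x, th)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y): th = unify(a, b, th)
        return th
    return None

def subst(th, t):
    if is_var(t): return subst(th, th[t]) if t in th else t
    if isinstance(t, tuple): return tuple(subst(th, a) for a in t)
    return t

FACTS = {('American','West'), ('Nation','Nono'), ('Nation','America'),
         ('Enemy','Nono','America'), ('Owns','Nono','M1'), ('Missile','M1')}
RULES = [([('American','x'), ('Weapon','y'), ('Nation','z'), ('Hostile','z'),
           ('Sells','x','z','y')], ('Criminal','x')),
         ([('Owns','Nono','x'), ('Missile','x')], ('Sells','West','Nono','x')),
         ([('Missile','x')], ('Weapon','x')),
         ([('Enemy','x','America')], ('Hostile','x'))]

counter = itertools.count()

def rename(t, n):
    """Standardize a rule's variables apart by tagging them with n."""
    if is_var(t): return f'{t}_{n}'
    if isinstance(t, tuple): return tuple(rename(a, n) for a in t)
    return t

def back_chain(goal, th):
    for f in FACTS:                    # answer directly from a fact
        th2 = unify(goal, f, th)
        if th2 is not None:
            yield th2
    for prem, concl in RULES:          # or via a rule whose conclusion unifies
        n = next(counter)
        th2 = unify(goal, rename(concl, n), th)
        if th2 is not None:
            yield from prove_all([rename(p, n) for p in prem], th2)

def prove_all(goals, th):
    """Prove a conjunction conjunct by conjunct, threading the unifier."""
    if not goals:
        yield th
    else:
        for th2 in back_chain(goals[0], th):
            yield from prove_all(goals[1:], th2)

th = next(back_chain(('Criminal', 'x'), {}), None)
print(subst(th, 'x'))                  # West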

6.5 Completeness
Suppose we have the following knowledge base:
∀ x P(x) => Q(x)
∀ x ¬ P(x) => R(x)
∀ x Q(x) => S(x)
∀ x R(x) => S(x)
Then we certainly want to be able to conclude S(A); S(A) is true if Q(A)
or R(A) is true, and one of those must be true because either P(A) is
true or ¬ P(A) is true.
Unfortunately, chaining with Modus Ponens cannot derive S(A) for us. The
problem is that ∀ x ¬ P(x) => R(x) cannot be converted to Horn form, and
thus cannot be used by Modus Ponens. That means that a proof procedure
using Modus Ponens is incomplete.

6.6 Resolution: A Complete Inference Procedure

Conversion to Normal Form


• Eliminate implications: Recall that p => q is the same as ¬ p ∨ q.
So replace all implications by the corresponding disjunctions.
• Move ¬ inwards: Negations are allowed only on atoms in conjunctive
normal form, and not at all in implicative normal form. We
eliminate negations with wide scope using de Morgan’s law, the
quantifier equivalences and double negation:
¬(p∨q) becomes ¬p∧¬q
¬(p∧q) becomes ¬p∨¬q
¬∀x,p becomes ∃x ¬p
¬∃x,p becomes ∀x ¬p
¬¬p becomes p
• Standardize variables: For sentences like (∀x P(x)) ∨ (∃x Q(x))
that use the same variable name twice, change the name of one of
the variables. This avoids confusion later when we drop the
quantifiers.
• Move quantifiers left: The sentence is now in a form in which all
the quantifiers can be moved to the left, in the order in which
they appear, without changing the meaning of the sentence. It is
tedious to prove this properly; it involves equivalences such as
(∀x p) ∨ q ≡ ∀x (p ∨ q), which holds when q does not mention x.
• Skolemize: Skolemization is the process of removing existential
quantifiers by elimination. In the simple case, it is just like
the Existential Elimination rule: translate ∃x P(x) into P(A), where
A is a constant that does not appear elsewhere in the KB. But
there is the added complication that some of the existential
quantifiers, even though moved left, may still be nested inside a
universal quantifier. Consider “Everyone has a heart”:
∀x Person(x) ⇒ ∃y Heart(y) ∧ Has(x,y)
If we just replaced y with a constant, H, we would get
∀x Person(x) ⇒ Heart(H) ∧ Has(x,H)
which says that everyone has the same heart H. We need to say that
the heart they have is not necessarily shared, that is, it can be
found by applying to each person a function that maps from person
to heart:
∀x Person(x) ⇒ Heart(F(x)) ∧ Has(x,F(x))
where F is a function name that does not appear elsewhere in the
KB. F is called a Skolem function.
• Distribute ∧ over ∨: (a∧b)∨c becomes (a∨c) ∧(b∨c).
• Flatten nested conjunction and disjunction: (a∨b)∨c becomes
(a∨b∨c), and (a∧b)∧c becomes (a∧b∧c).
At this point, the sentence is in conjunctive normal form (CNF): it
is a conjunction where every conjunct is a disjunction of literals.
• Convert disjunctions to implications: Optionally, you can take one
more step to convert to implicative normal form. For each conjunct,
gather up the negative literals into one list; for example,
(¬a∨¬b∨c∨d) becomes (a∧b ⇒ c∨d)

Example: (B.U. Dec 2004, May 2005)


From “Horses are animals”, it follows that ”The head of the
horse is the head of an animal”. Demonstrate that this inference
is valid by carrying out the following steps:
1. Translate the premise and the conclusion into the language
of FOL. Use three predicates HeadOf(h,x), Horse(x) and
Animal(x).
2. Negate the conclusion, and convert the premise and the
negated conclusion into conjunctive normal form.
3. Use resolution to show that the conclusion follows from the
premise.
Solution:
1) Premise: ∀x Horse(x) ⇒ Animal(x)
   Conclusion: ∀h (∃x HeadOf(h,x) ∧ Horse(x)) ⇒ (∃y HeadOf(h,y) ∧ Animal(y))

2) CNF for premise: ¬Horse(x) ∨ Animal(x)

   Negation of conclusion:
   ¬(∀h (∃x HeadOf(h,x) ∧ Horse(x)) ⇒ (∃y HeadOf(h,y) ∧ Animal(y)))
   CNF for the negated conclusion:
   1. eliminate implications:
      ¬(∀h ¬(∃x HeadOf(h,x) ∧ Horse(x)) ∨ (∃y HeadOf(h,y) ∧ Animal(y)))
   2. move ¬ inwards:
      ∃h (∃x HeadOf(h,x) ∧ Horse(x)) ∧ (∀y ¬HeadOf(h,y) ∨ ¬Animal(y))
   3. move quantifiers left:
      ∃h ∃x ∀y (HeadOf(h,x) ∧ Horse(x)) ∧ (¬HeadOf(h,y) ∨ ¬Animal(y))
   4. skolemize:
      (HeadOf(H,X) ∧ Horse(X)) ∧ (¬HeadOf(H,y) ∨ ¬Animal(y))
   5. flatten:
      HeadOf(H,X) ∧ Horse(X) ∧ (¬HeadOf(H,y) ∨ ¬Animal(y))
3) Resolution Tree:
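
Since the tree itself is a figure, here is a sketch of the derivation it
would show, using the clauses obtained above:

(1) ¬Horse(x) ∨ Animal(x)          [premise]
(2) HeadOf(H,X)                    [negated conclusion]
(3) Horse(X)                       [negated conclusion]
(4) ¬HeadOf(H,y) ∨ ¬Animal(y)      [negated conclusion]

Resolving (3) with (1) under {x/X} gives Animal(X).
Resolving (2) with (4) under {y/X} gives ¬Animal(X).
Resolving Animal(X) with ¬Animal(X) yields the empty clause, a
contradiction; hence the conclusion follows from the premise.
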
Chapter 7: Acting Logically

7.1 Planning

Planning and problem solving are considered different subjects because of
the differences in the representations of goals, states, and actions, and
the differences in the representation and construction of action
sequences.

The basic elements of a search-based problem-solver:


• Representation of actions. Actions are described by programs that
generate successor state descriptions.
• Representation of states. In problem solving, a complete
description of the initial state is given, and actions are
represented by a program that generates complete state
descriptions. Therefore, all state representations are complete.
• Representation of goals. The only information that a problem-
solving agent has about its goal is in the form of the goal test
and heuristic function.
• Representation of plans. In problem solving, a solution is a
sequence of actions, such as “Go from Arad to Sibiu to Fagaras to
Bucharest.” During the construction of solutions, search algorithms
consider only unbroken sequences of action beginning from the
initial state (or, in case of bidirectional search, ending at a
goal state).

7.1.1 Basic Representation for Planning

The “classical” approach that most planners use today describes states
and operators in a restricted language known as the STRIPS language, or
in extensions thereof. The STRIPS language lends itself to efficient
planning algorithms, while retaining much of the expressiveness of
situation calculus representations.

Representations for States and goals

In the STRIPS language, states are represented by conjunctions of
function-free ground literals, that is, predicates applied to constant
symbols, possibly negated. For example, the initial state for the milk-
and-bananas problem might be described as

At(Home) Λ ¬Have(Milk) Λ ¬Have(Bananas) Λ ¬Have(Drill)Λ...

Goals are also described by conjunctions of literals. For example, the
shopping goal might be represented as

At(Home) Λ Have(Milk) Λ Have(Bananas) Λ Have(Drill)

Representations for actions

Our STRIPS operators consist of three components:


• The action description is what an agent actually returns to the
environment in order to do something. Within the planner it serves
only as a name for a possible action.
• The precondition is a conjunction of atoms (positive literals) that
says what must be true before the operator can be applied.
• The effect of an operator is a conjunction of literals (positive or
negative) that describes how the situation changes when the
operator is applied.
Here is an example of the syntax we will use for forming a STRIPS
operator for going from one place to another:

Op(ACTION:Go(there), PRECOND:At(here) Λ Path(here,there),
   EFFECT:At(there) Λ ¬At(here))
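
A STRIPS operator is easy to render as a small data structure. The
Python sketch below is illustrative (the class and field names are
inventions of this sketch, not a standard planner API):

from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    action: str
    precond: frozenset     # atoms that must hold before applying
    add: frozenset         # positive effects
    delete: frozenset      # negated effects

go = Op(action="Go(there)",
        precond=frozenset({"At(here)", "Path(here,there)"}),
        add=frozenset({"At(there)"}),
        delete=frozenset({"At(here)"}))

def apply_op(state, op):
    """Progress a state (a set of ground atoms) through an operator."""
    assert op.precond <= state, "precondition not satisfied"
    return (state - op.delete) | op.add

print(apply_op({"At(here)", "Path(here,there)"}, go))
# {'At(there)', 'Path(here,there)'} (set order may vary)
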
7.1.2 A Partial-Order Planning Example

In this section, we sketch the outline of a partial-order regression
planner that searches through plan space. The planner starts with an
initial plan representing the start and finish steps, and on each
iteration adds one more step. If this leads to an inconsistent plan, it
backtracks and tries another branch of the search space. To keep the
search focused, the planner only considers adding steps that serve to
achieve a precondition that has not yet been achieved. The causal links
are used to keep track of this.
We illustrate the planner by returning to the problem of getting some
milk, a banana, and a drill, and bringing them back home. We will make
some simplifying assumptions. First, the Go action can be used to travel
between any two locations. Second, the description of the Buy action
ignores the question of money. The initial state is defined by the
following operator, where HWS means hardware store and SM means
supermarket:

Op(ACTION:Start,EFFECT:At(Home) Λ Sells(HWS,Drill)
   Λ Sells(SM,Milk) Λ Sells(SM,Bananas))

The goal state is defined by a Finish step describing the objects to be
acquired and the final destination to be reached:

Op(ACTION:Finish,
PRECOND:Have(Drill) Λ Have(Milk) Λ Have(Bananas) Λ At(Home))

Figure 1

The actions themselves are defined as follows:

Op(ACTION:Go(there),PRECOND:At(here),
   EFFECT:At(there) Λ ¬At(here))

Op(ACTION:Buy(x),PRECOND:At(store) Λ Sells(store,x),
EFFECT:Have(x))

Figure 2

Figure 3
In figure 2, we have selected three Buy actions to achieve three of the
preconditions of the Finish action. In each case there is only one
possible choice because the operator library offers no other way to
achieve these conditions.

The bold arrows in figure 2 are causal links. For example, the
leftmost causal link in the figure means that the step Buy(Drill) was added
in order to achieve the Finish step’s Have(Drill) precondition. The
planner will make sure that this condition is maintained by protecting
it: if a step might delete the Have(Drill) condition, then it will not be
inserted between the Buy(Drill) step and the Finish step. Light arrows in
the figure show ordering constraints.

Figure 3 shows the situation after the planner has chosen to achieve the
Sells preconditions by linking them to the initial state. Again, the
planner has no choice here because there is no other operator that
achieves Sells.

Figure 4

In figure 4 we extend the plan by choosing two Go actions to get us to
the hardware store and supermarket, thus achieving the At preconditions
of the Buy actions.

The two Go actions have unachieved preconditions that interact with each
other, because the agent cannot be At two places at the same time. Each
Go action has a precondition At(x), where x is the location that the
agent was at before that Go action. Suppose the planner tries to achieve
the preconditions of Go(HWS) and Go(SM) by linking them to the At(Home)
condition in the initial state. This results in the plan shown in Figure
5.

Figure 5

Unfortunately, this will lead to a problem. The step Go(HWS) adds the
condition At(HWS), but it also deletes the condition At(Home). So if the
agent goes to the hardware store, it can no longer go from home to the
supermarket.

On the other hand, if the agent goes to the supermarket first, it cannot
go from home to the hardware store. At this point, we have reached a dead
end in the search for a solution, and must back up and try another
choice. The interesting part is seeing how a planner could notice that
this partial plan is a dead end without wasting a lot of time on it. The
key is that the causal links in a partial plan are protected links. A
causal link is protected by ensuring that threats, that is, steps that
might delete (or clobber) the protected condition, are ordered to come
before or after the protected link. A causal link S1 => S2 protecting a
condition c is threatened by a new step S3 if one effect of S3 is to
delete c. The way to resolve the threat is to add ordering constraints
to make sure that S3 does not intervene between S1 and S2. If S3 is
placed before S1 this is called demotion, and if it is placed after S2,
it is called promotion.

In figure 5, there is no way to resolve the threat that each Go step
poses to the other. Whichever Go step comes first will delete the
At(Home) condition on the other step. Whenever the planner is unable to
resolve a threat by promotion or demotion, it gives up on the partial
plan and backs up to try a different choice at some earlier point in the
planning process.

Suppose the next choice is to try a different way to achieve the At(x)
precondition of the Go(SM) step, this time by adding a causal link from
Go(HWS) to Go(SM). In other words, the plan is to go from home to the
hardware store and then to the supermarket. This introduces another
threat. Unless the plan is further refined, it will allow the agent to go
from the hardware store to the supermarket without first buying the
drill. Technically, the Go(SM) step threatens the At(HWS) precondition of
the Buy(Drill) step, which is protected by a causal link. The threat is
resolved by constraining Go(SM) to come after Buy(Drill). Figure 7 shows
this.

Figure 7

Only the At(Home) precondition of the Finish step remains unachieved.
Adding a Go(Home) step achieves it, but introduces an At(x) precondition
that needs to be achieved. Again, the protection of causal links will
help the planner decide how to do this:
• If it tries to achieve At(x) by linking to At(Home) in the initial
state, there will be no way to resolve the threats caused by
Go(HWS) and Go(SM).
• If it tries to link At(x) to the Go(HWS) step, there will be no way
to resolve the threat posed by the Go(SM) step, which is already
constrained to come after Go(HWS).
• A link from Go(SM) to At(x) means that x is bound to SM, so that
now the Go(Home) step deletes the At(SM) condition. This results in
threats to the At(SM) preconditions of Buy(Milk) and Buy(Bananas),
but these can be resolved by ordering Go(Home) to come after these
steps.
Figure 8 shows the complete solution plan, with the steps redrawn to
reflect the ordering constraints on them. The result is an almost totally
ordered plan; the only ambiguity is that Buy(Milk) and Buy(Bananas) can
come in either order.

Figure 8

7.2 Practical Planning

Section 7.1 showed how a partial-order planner’s search through the space
of plans can be more efficient than a problem-solver’s search through the
space of situations. On the other hand, the POP planner can only handle
problems that are stated in the STRIPS language, and its search process
is so unguided that it can still only be used for small problems. In this
section we begin by surveying existing planners that operate in complex,
realistic domains. This will help to pinpoint the weaknesses of POP and
suggest the necessary extensions.

Spacecraft assembly, integration, and verification

OPTIMUM-AIV is a planner that is used by the European Space Agency to
help in the assembly, integration, and verification (AIV) of spacecraft.
The system is used both to generate plans and to monitor their execution.
During monitoring, the system reminds the user of upcoming activities,
and can suggest repairs to the plan when an activity is performed late,
cancelled, or reveals something unexpected. In fact, the ability to
quickly replan is the principal objective of OPTIMUM-AIV. The system does
not execute the plans; that is done by humans with standard construction
and test equipment.

In complex projects like this, it is common to use scheduling tools from
operations research (OR) such as PERT charts or the critical path method.
These tools essentially take a hand-constructed complete partial-order
plan and generate an optimal schedule for it. Actions are treated as
objects that take up time and have ordering constraints; their effects
are ignored. This avoids the need for knowledge engineering, and for one-
shot problems it may be the most appropriate solution.

The success of real-world AI systems requires integration into the
environment in which they operate. It is vital that a planner be able
to access existing databases of project information in whatever format
they might have, and that the planner’s input and output representations
be in a form that is both expressive and easily understood by users. The
STRIPS language is insufficient for the AIV domain because it cannot
express four key concepts:

1. Hierarchical plans
2. Complex conditions
3. Time
4. Resources

OPTIMUM-AIV is based on the open planning architecture O-PLAN. O-PLAN is
similar to the POP planner except that it is augmented to accept a more
expressive language that can represent time, resources, and hierarchical
plans. It also accepts heuristics for guiding the search and records its
reasons for each choice, which makes it easier to replan when necessary.
O-PLAN has been applied to a variety of problems, including software
procurement planning at Price Waterhouse, back axle assembly process
planning at Jaguar Cars, and complete factory production planning at
Hitachi.

Job Shop scheduling

The problem that a factory solves is to take in raw materials and
components, and assemble them into finished products. The problem can be
divided into a planning task (deciding what assembly steps are going to
be performed) and a scheduling task (deciding when and where each step
will be performed). In many modern factories, the planning is done by
hand and the scheduling is done with an automated tool.

O-PLAN is being used by Hitachi for job shop planning and scheduling in a
system called TOSCA. A typical problem involves a product line of 350
different products, 35 assembly machines, and over 2000 different
operations. The planner comes up with a 30-day schedule for three 8-hour
shifts a day. In general, TOSCA follows the partial-order, least-
commitment planning approach. It also allows for “low-commitment”
decisions: choices that impose constraints on the plan or on a particular
step. For example, the system might choose to schedule an action to be
carried out on a class of machine without specifying any particular one.

Buildings, aircraft carriers, and beer factories

SIPE (System for Interactive Planning and Execution monitoring) was the
first planner to deal with the problem of replanning, and the first to
take some important steps toward expressive operators. It has been used
in demonstration projects in several domains, including planning
operations on the flight deck of an aircraft carrier and job-shop
scheduling for an Australian beer factory. Another study used SIPE to plan
the construction of multistory buildings, one of the most complex domains
ever tackled by a planner.
Chapter 8: Uncertain Knowledge And Reasoning

8.1 Uncertainty

8.1.1 Acting Under Uncertainty:


One problem with first-order logic, and thus with the logical-agent
approach, is that agents almost never have access to the whole truth
about their environment. The agent must therefore act under uncertainty.
For example, a wumpus agent often will find itself unable to discover
which of the two squares contains a pit. If those squares are en route to
the gold, then the agent might have to take a chance and enter one of the
two squares. Uncertainty can also arise because of incompleteness and
incorrectness in the agent’s understanding of the properties of the
environment.

Handling uncertain knowledge

We will use a simple diagnosis example to illustrate the concepts
involved. Diagnosis, whether for medicine, automobile repair, or whatever,
is a task that almost always involves uncertainty. If we tried to build a
is a task that almost always involves uncertainty. If we tried to build a
dental diagnosis system using first-order logic, we might propose rules
such as
∀ p Symptom(p,Toothache) ⇒ Disease(p,Cavity)

The problem is that this rule is wrong. Not all patients with toothaches
have cavities; some of them may have gum disease, or impacted wisdom
teeth, or one of several other problems:

∀ p Symptom(p,Toothache) ⇒
Disease(p,Cavity) ∨ Disease(p,GumDisease) ∨ Disease(p,ImpactedWisdom)

Unfortunately, in order to make the rule true, we have to add an almost
unlimited list of possible causes. We could try turning the rule into a
causal rule:

∀ p Disease(p,Cavity) ⇒ Symptom(p,Toothache)

But this rule is not right either; not all cavities cause pain. The only
way to fix the rule is to make it logically exhaustive: to extend the
left-hand side to cover all possible reasons why a cavity might or might
not cause a toothache. Even then, for the purposes of diagnosis, one must
also take into account the possibility that the patient may have a
toothache and a cavity that are unconnected.

Trying to use first-order logic to cope with a domain like medical
diagnosis thus fails for three main reasons:
• Laziness: It is too much work to list the complete set of
antecedents or consequents needed to ensure an exceptionless rule,
and too hard to use the enormous rules that result.
• Theoretical ignorance: Medical science has no complete theory for
the domain.
• Practical ignorance: Even if we know all the rules, we may be
uncertain about a particular patient because all the necessary
tests have not been or cannot be run.

The connection between toothaches and cavities is just not a logical
consequence in either direction. This is typical of the medical domain,
as well as most other judgmental domains: law, business, design,
automobile repair, gardening, dating, and so on. The agent’s knowledge
can at best provide only a degree of belief in the relevant sentences.
Our main tools for dealing with degrees of belief will be probability
theory, which assigns a numerical degree of belief between 0 and 1 to
sentences. Probability provides a way of summarizing the uncertainty that
comes from our laziness and ignorance.

8.1.2 The Axioms of Probability:

In order to define properly the semantics of statements in probability
theory, we will need to describe how probabilities and logical
connectives interact. The following axioms are in fact sufficient:

1. All probabilities are between 0 and 1:
   0 ≤ P(A) ≤ 1
2. Necessarily true (i.e., valid) propositions have probability 1,
necessarily false (i.e., unsatisfiable) propositions have
probability 0.
P(True)=1 P(False)=0

3. The probability of a disjunction is given by
   P(A ∨ B) = P(A) + P(B) – P(A ∧ B)

Bayes’ rule and its use:

Recall the two forms of the product rule:


P(A∧B)=P(A|B)P(B)
P(A∧B)=P(B|A)P(A)

Equating the two right-hand sides and dividing by P(A), we get

P(B|A)= [P(A|B)P(B)] / P(A)

This equation is known as Bayes’ rule (also Bayes’ law or Bayes’
theorem). This simple equation underlies all modern AI systems for
probabilistic inference. The more general case of multivalued variables
can be written using the P notation as follows:

P(Y|X)= [P(X|Y)P(Y)] / P(X)

where again this is to be taken as representing a set of equations
relating corresponding elements of the tables. We will also have occasion
to use a more general version conditionalized on some background evidence
E:
P(Y|X,E)= [P(X|Y,E)P(Y|E)] / P(X|E)

Normalization:

Consider again the equation for calculating the probability of meningitis
given a stiff neck:

P(M|S)= [P(S|M)P(M)] / P(S)

Suppose we are also concerned with the possibility that the patient is
suffering from whiplash W given a stiff neck:

P(W|S)= [P(S|W)P(W)] / P(S)

Comparing these two equations, we see that in order to compute the
relative likelihood of meningitis and whiplash, given a stiff neck,
we need not assess the prior probability P(S) of a stiff neck. To put
numbers on the equations, suppose that P(S|M) = 0.5, P(M) = 1/50000,
P(S|W) = 0.8, and P(W) = 1/1000. Then

P(M|S)/P(W|S)= [P(S|M)P(M)]/[P(S|W)P(W)]=[0.5*1/50000]/[0.8*1/1000]=1/80

That is, whiplash is 80 times more likely than meningitis, given a stiff
neck.
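
A quick numeric check of this arithmetic in Python (illustrative only):

P_S_given_M, P_M = 0.5, 1 / 50000
P_S_given_W, P_W = 0.8, 1 / 1000

ratio = (P_S_given_M * P_M) / (P_S_given_W * P_W)
print(ratio)    # ~0.0125, i.e., 1/80: whiplash is 80 times more likely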

In some cases, relative likelihood is sufficient for decision making, but
when, as in this case, the two possibilities yield radically different
utilities for various treatment actions, one needs exact values in order
to make rational decisions. It is still possible to avoid direct
assessment of the prior probability of the “symptoms” by considering an
exhaustive set of cases. For example, we can write equations for M and
for ¬ M:

P(M|S)= [P(S|M)P(M)] / P(S)


P(¬M|S)= [P(S|¬M)P(¬M)] / P(S)

Adding these two equations, and using the fact that P(M|S)+ P(¬M|S)=1,
we obtain:
P(S)= P(S|M)P(M)+ P(S|¬M)P(¬M)

Substituting into the equation for P(M|S), we have


P(M|S)= [P(S|M)P(M)] /[P(S|M)P(M)+P(S|¬M)P(¬M)]

This process is called normalization, because it treats 1/P(S) as a
normalizing constant that allows the conditional terms to sum to 1. Thus,
in return for assessing the conditional probability P(S|¬M), we can avoid
assessing P(S) and still obtain exact probabilities from Bayes’ rule. In
the general, multivalued case, we obtain the following form from Bayes’
rule:
P(Y|X)= α P(X|Y)P(Y)

where α is the normalization constant needed to make the entries in the
table P(Y|X) sum to 1. The normal way to use normalization is to
calculate the unnormalized values, and then scale them all so that they
add to 1.
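
In code, normalization is just "scale the terms so they sum to 1". A
short Python sketch (the P(S|¬M) value is an assumed illustration, not
from the notes):

P_S_given_M, P_M = 0.5, 1 / 50000
P_S_given_notM = 0.05          # assumed for illustration
P_notM = 1 - P_M

unnormalized = [P_S_given_M * P_M, P_S_given_notM * P_notM]
alpha = 1 / sum(unnormalized)
P_M_given_S, P_notM_given_S = (alpha * u for u in unnormalized)
print(P_M_given_S + P_notM_given_S)    # 1.0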

8.2 Representing Knowledge In An Uncertain Domain:

In Section 8.1 we saw that the joint probability distribution can answer
any question about the domain, but can become intractably large as the
number of variables grows. Furthermore, specifying probabilities for
atomic events is rather unnatural and may be very difficult unless a
large amount of data is available from which to gather statistical
estimates.

We also saw that, in the context of using Bayes’ rule, conditional
independence relationships among variables can simplify the computation
of query results and greatly reduce the number of conditional
probabilities that need to be specified. We use a data structure called a
belief network to represent the dependence between variables and to give
a concise specification of the joint probability distribution. A belief
network is a graph in which the following holds:

1. A set of random variables makes up the nodes of the network.
2. A set of directed links or arrows connects pairs of nodes. The
intuitive meaning of an arrow from node X to node Y is that X has a
direct influence on Y.
3. Each node has a conditional probability table that quantifies the
effect that the parents have on the node. The parents of a node are
all those nodes that have arrows pointing to it.
4. The graph has no directed cycles (hence is a directed, acyclic
graph, or DAG).

Consider the following situation. You have a new burglar alarm installed
at home. It is fairly reliable at detecting a burglary, but also responds
on occasion to minor earthquakes. (This example is due to Judea Pearl, a
resident of Los Angeles; hence the acute interest in earthquakes.) You
also have two neighbors, John and Mary, who have promised to call you at
work when they hear the alarm. John always calls when he hears the alarm,
but sometimes confuses the telephone ringing with the alarm and calls
then, too. Mary, on the other hand, likes rather loud music and sometimes
misses the alarm altogether. Given the evidence of who has or has not
called, we would like to estimate the probability of a burglary. This
simple domain is described by the belief network of figure below.

Once we have specified the topology, we need to specify the conditional
probability table or CPT for each node. Each row in the table contains
the conditional probability of each node value for a conditioning case. A
conditioning case is just a possible combination of values for the parent
nodes (a miniature atomic event, if you like). For example, the
conditional probability table for the random variable Alarm might look
like this:
Burglary   Earthquake   P(Alarm|Burglary,Earthquake)
                        True        False
True       True         0.950       0.050
True       False        0.940       0.060
False      True         0.290       0.710
False      False        0.001       0.999

Each row in a conditional probability table must sum to 1, because the
entries represent an exhaustive set of cases for the variable. Hence only
one of the two numbers in each row shown above is independently
specifiable. In general, a table for a Boolean variable with n Boolean
parents contains 2^n independently specifiable probabilities. A node with
no parents has only one row, representing the prior probabilities of each
possible value of the variable.

The complete network for the burglary example is shown in the figure below,
where we show just the conditional probability for the True case of each
variable.
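
A CPT is naturally a lookup table keyed by the parents' values; a short
Python sketch (illustrative, using the numbers from the table above):

P_alarm = {(True, True): 0.950, (True, False): 0.940,
           (False, True): 0.290, (False, False): 0.001}

def p_alarm(alarm, burglary, earthquake):
    """P(Alarm=alarm | Burglary=burglary, Earthquake=earthquake)."""
    q = P_alarm[(burglary, earthquake)]
    return q if alarm else 1 - q

print(p_alarm(True, True, False))     # 0.94
print(p_alarm(False, False, False))   # 0.999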

8.3 Inference in Belief Networks:

A certainty factor (CF[h,e]) is defined in terms of two components:

• MB[h,e]: a measure (between 0 and 1) of belief in hypothesis h
given the evidence e. MB measures the extent to which the evidence
supports the hypothesis. It is zero if the evidence fails to
support the hypothesis.
• MD[h,e]: A measure (between 0 and 1) of disbelief in hypothesis h
given the evidence e. MD measures the extent to which the evidence
supports the negation of the hypothesis. It is zero if the evidence
supports the hypothesis.

From these two measures, we can define the certainty factor as:
CF[h,e]=MB[h,e]-MD[h,e]

Dempster-Shafer Theory
The Dempster-Shafer theory is designed to deal with the distinction
between uncertainty and ignorance. Rather than computing the probability
of a proposition, it computes the probability that the evidence supports
the proposition. This measure of belief is called a belief function, written
Bel(X).
Suppose a shady character comes up to you and offers to bet you $10 that
his coin will come up heads on the next flip. Given that the coin may or
may not be fair, what belief should you ascribe to the event of it coming
up heads? Dempster-Shafer theory says that because you have no evidence
either way, you have to say that the belief Bel(Heads) = 0, and also that
Bel(¬Heads) = 0. This makes Dempster-Shafer reasoning systems skeptical
in a way that has some intuitive appeal. Now suppose you have an expert
at your disposal who testifies with 90% certainty that the coin is fair
(i.e., he is 90% sure that P(Heads) = 0.5). Then Dempster-Shafer theory
gives Bel(Heads) = 0.9 × 0.5 = 0.45 and likewise Bel(¬Heads) = 0.45.
There is still a 0.1 “gap” that is not accounted for by the evidence.
“Dempster’s rule” shows how to combine evidence to give new values for
Bel, and Shafer’s work extends this into a complete computational model.
As with default reasoning, there is a problem in connecting beliefs to
actions. With probabilities, decision theory says that if P(Heads) =
P(¬Heads) = 0.5 then (assuming that winning $10 and losing $10 are
considered equal opposites) the reasoner will be indifferent between the
action of accepting and declining the bet. A Dempster-Shafer reasoner has
Bel(¬Heads) = 0, and thus no reason to accept the bet, but then it also
has Bel(Heads) = 0, and thus no reason to decline it.
Chapter 9: Learning
So far we have assumed that all the "intelligence" in an agent has been
built in by the agent's designer. The agent is then let loose in an
environment, and does the best it can given the way it was programmed to
act. But this is not necessarily the best approach, for the agent or the
designer. Whenever the designer has incomplete knowledge of the
environment that the agent will live in, learning is the only way that
the agent can acquire what it needs to know. Learning thus provides
autonomy.

9.1 Learning from Observations

A General Model of Learning Agents:


A learning agent can be divided into four conceptual components, as shown
in figure below. The most important distinction is between the learning
element, which is responsible for making improvements, and the
performance element, which is responsible for selecting external actions.
The performance element is what we have previously considered to be the
entire agent: it takes in percepts and decides on actions. The learning
element takes some feedback on how the agent is doing, and determines how
the performance element should be modified to (hopefully) do better in
the future. The design of the learning element depends very much on the
design of the performance element.

The critic is designed to tell the learning element how well the agent is
doing. The critic employs a fixed standard of performance. This is
necessary because the percepts themselves provide no indication of the
agent's success. For example, a chess program may receive a percept
indicating that it has checkmated its opponent, but it needs a
performance standard to know that this is a good thing; the percept
itself does not say so.

The last component of the learning agent is the problem generator. It is


responsible for suggesting actions that will lead to new and informative
experiences. If the agent is willing to explore a little, and do some
perhaps suboptimal actions in the short run, it might discover much
better actions for the long run. The problem generator’s job is to
suggest these exploratory actions.
To make the overall design more concrete, let us return to the automated
taxi example. The performance element consists of whatever collection of
knowledge and procedures the taxi has for selecting its driving actions
(turning, accelerating, braking, honking, and so on). The taxi goes out
on the road and drives, using this performance element. The learning
element formulates goals, for example, to learn better rules describing
the effects of braking and accelerating, to learn the geography of the
area, to learn how the taxi behaves on wet roads, and to learn what
causes annoyance to other drivers. The critic observes the world and
passes information along to the learning element. Occasionally, the
problem generator kicks in with a suggestion: try taking 7th Avenue
uptown this time, and see if it is faster than the normal route.

The learning element is also responsible for improving the efficiency of


the performance element. For example, when asked to make a trip to a new
destination, the taxi might take a while to consult its map and plan the
best route. But the next time a similar trip is requested, the planning
process should be much faster. This is called speedup learning.

Components of the performance element:

We have seen that there are many ways to build the performance element of
an agent. The components can include the following:
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept
sequence.
3. Information about the way the world evolves.
4. Information about the results of possible actions the agent can
take.
5. Utility information indicating the desirability of world states.
6. Action-value information indicating the desirability of particular
actions in particular states.
7. Goals that describe classes of states whose achievement maximizes
the agent’s utility.

For some components, such as the component for predicting the outcome of
an action, the available feedback generally tells the agent what the
correct outcome is. That is, the agent predicts that a certain action
(braking) will have a certain outcome (stopping in 10 feet), and the
environment immediately provides a percept that describes the actual
correct outcome (stopping in 15 feet). Any situation in which both the
inputs and outputs of a component can be perceived is called supervised
learning. (Often, the outputs are provided by a friendly teacher).

Learning when there is no hint at all about the correct outputs is called
unsupervised learning. An unsupervised learner can always learn
relationships among its percepts using supervised learning methods-that
is, it can learn to predict its future percepts given its previous
percepts. It cannot learn what to do unless it already has a utility
function.

9.2 Learning in Neural and Belief Networks

9.2.1 Introduction to neural networks:

How the Brain Works

The neuron, or nerve cell, is the fundamental functional unit of all


nervous system tissue, including the brain. Each neuron consists of a
cell body, or soma, that contains a cell nucleus. Branching out from the
cell body are a number of fibers called dendrites and a single long fiber
called the axon. Dendrites branch into a bushy network around the cell,
whereas the axon stretches out for a long distance. Eventually, the axon
also branches into strands and substrands that connect to the dendrites
and cell bodies of other neurons. The connecting junction is called a
synapse. Each neuron forms synapses with anywhere from a dozen to a
hundred thousand other neurons.

Signals are propagated from neuron to neuron by a complicated


electrochemical reaction. Chemical transmitter substances are released
from the synapses and enter the dendrite, raising or lowering the
electrical potential of the cell body. When the potential reaches a
threshold, an electrical pulse or action potential is sent down the axon.
The pulse spreads out along the branches of the axon, eventually reaching
synapses and releasing transmitters into the bodies of other cells.
Synapses that increase the potential are called excitatory, and those
that decrease it are called inhibitory. Neurons also form new
connections with other neurons, and sometimes entire collections of
neurons can migrate from one place to another. These mechanisms are
thought to form the basis for learning in the brain.

Neural Networks

A neural network is composed of a number of nodes, or units, connected by


links. Each link has a numeric weight associated with it. Weights are
the primary means of long-term storage in neural networks, and learning
usually takes place by updating the weights.

Each unit has a set of input links from other units, a set of output
links to other units, a current activation level, and a means of
computing the activation level at the next step in time, given its inputs
and weights. Each unit performs a simple computation: it receives
signals from its input links and computes a new activation level that it
sends along each of its output links. The computation of the activation
level is based on the values of each input signal received from a
neighboring node, and the weights on each input link. The computation is
split into two components. First is a linear component, called the input
function, in_i, that computes the weighted sum of the unit's input values.
Second is a nonlinear component called the activation function, g, that
transforms the weighted sum into the final value that serves as the
unit's activation value, a_i. Usually, all units in a
network use the same activation function.

The total weighted input is the sum of the input activations times their
respective weights:

in_i = Σ_j W_j,i a_j = W_i · a_i

where the final expression illustrates the use of vector notation. In


this notation, the weights on links into node i are denoted by the vector W_i, the
vector of input values is denoted a_i, and the dot product denotes the sum of the
pairwise products.

The elementary computation step in each unit computes the new activation
value for the unit by applying the activation function, g, to the result
of the input function:

a_i ← g(in_i) = g(Σ_j W_j,i a_j)

Different models are obtained by using different mathematical functions


for g. Three common choices are the step, sign, and sigmoid functions,
two of which are illustrated in figure below. The step function has a
threshold t such that it outputs a 1 when the input is greater than its
threshold, and outputs a 0 otherwise. The biological motivation is that
a 1 represents the firing of a pulse down the axon, and a 0 represents no
firing.

Figure: the step function (left) and the sigmoid function (right)
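
As a rough sketch (the function names are ours), the activation functions and the computation a_i = g(in_i) can be written as:

#include <cmath>
#include <cstddef>
#include <vector>

// The three common choices for g mentioned above.
double step_fn(double x, double t) { return x >= t ? 1.0 : 0.0; }   // threshold t
double sign_fn(double x)           { return x >= 0.0 ? 1.0 : -1.0; }
double sigmoid(double x)           { return 1.0 / (1.0 + std::exp(-x)); }

// Activation of unit i: the input function computes the weighted sum
// in_i = sum_j W_j,i * a_j; the activation function g maps it to a_i.
double activate(const std::vector<double>& w, const std::vector<double>& a) {
    double in = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j)
        in += w[j] * a[j];
    return sigmoid(in);   // here g is chosen to be the sigmoid
}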


9.2.2 Perceptrons:

Network structures

There are a variety of kinds of network structure, each of which results


in very different computational properties. The main distinction to be
made is between feed-forward and recurrent networks. In a feed-forward
network, links are unidirectional, and there are no cycles. In a
recurrent network, the links can form arbitrary topologies. Technically
speaking, a feed-forward network is a directed acyclic graph (DAG). We
will usually be dealing with networks that are arranged in layers. In a
layered feed-forward network, each unit is linked only to units in the
next layer; there are no links between units in the same layer, no links
backward to a previous layer, and no links that skip a layer.

Feedforward neural network:

The feedforward neural networks are the first and arguably simplest type
of artificial neural networks devised. In this network, the information
moves in only one direction, forward, from the input nodes, through the
hidden nodes (if any) and to the output nodes. There are no cycles or
loops in the network.

The perceptron is a type of artificial neural network invented in 1957 at


the Cornell Aeronautical Laboratory by Frank Rosenblatt.

The perceptron consists of one or more layers of artificial neurons; the


inputs are fed directly to the outputs via a series of weights. In this
way it can be considered the simplest kind of feedforward network. Each
neuron calculates a weighted sum of its inputs - that is, the sum for all
inputs of the product of an input and its corresponding weight. If this
value is above some threshold, the neuron is said to 'fire', outputting
the value 1; otherwise it takes the value -1. To simplify training, the
threshold is often represented as an extra weight attached to a constant
input, with the actual threshold function centred on 0.

More generally, after the sum of the previous layer times the weights is
computed for each neuron, it is passed through a nonlinearity function.
The sigmoid function is a popular choice, because of its simple
derivative. The nonlinearity function is necessary for multilayer
networks, because otherwise they are linear and equivalent to simple two-
layer perceptrons.

Artificial neurons with this kind of activation function are also called
McCulloch-Pitts neurons or threshold neurons. In the literature the term
perceptron sometimes also refers to networks consisting of just one of
these units. Perceptrons can be trained by a simple learning algorithm
that is usually called the delta rule. It calculates the errors between
calculated output and sample output data, and uses this to create an
adjustment to the weights, thus implementing a form of gradient descent.

Single-layer perceptron

The earliest kind of neural network is a single-layer perceptron network,


which consists of a single layer of output nodes; the inputs are fed
directly to the outputs via a series of weights. In this way it can be
considered the simplest kind of feed-forward network. The sum of the
products of the weights and the inputs is calculated in each node, and if
the value is above some threshold (typically 0) the neuron fires and
takes the activated value (typically 1); otherwise it takes the
deactivated value (typically -1). Neurons with this kind of activation
function are also called McCulloch-Pitts neurons or threshold neurons. In
the literature the term perceptron often refers to networks consisting of
just one of these units.

A perceptron can be created using any values for the activated and
deactivated states as long as the threshold value lies between the two.
Most perceptrons have outputs of 1 or -1 with a threshold of 0 and there
is some evidence that such networks can be trained more quickly than
networks created from nodes with different activation and deactivation
values.

Single-unit perceptrons are only capable of learning linearly separable


patterns. Although a single threshold unit is quite limited in its
computational power, it has been shown that networks of parallel
threshold units can approximate any continuous function from a compact
interval of the real numbers into the interval [-1,1]. A single-layer
neural network can compute a continuous output instead of a step
function. A common choice is the so-called logistic function:

g(x) = 1 / (1 + e^(-x))
With this choice, the single-layer network is identical to the logistic


regression model, widely used in statistical modelling.

9.3 Reinforcement Learning

The chess-playing agent receives some feedback, even without a friendly


teacher—at the end of the game, the agent perceives whether it has won or
lost. This kind of feedback is called a reward, or reinforcement. In
games like chess, the reinforcement is received only at the end of the
game. We call this a terminal state in the state history sequence. In
other environments, the rewards come more frequently—in ping-pong, each
point scored can be considered a reward. Sometimes rewards are given by
a teacher who says “nice move” or “uh-oh” (but does not say what the best
move is).

The task of reinforcement learning is to use rewards to learn a


successful agent function. This is difficult because the agent is never
told what the right actions are, nor which rewards are due to which
actions. A game-playing agent may play flawlessly except for one
blunder, and at the end of the game get a single reinforcement that says
“you lose.” The agent must somehow determine which move was the blunder.

In many complex domains, reinforcement learning is the only feasible way


to train a program to perform at high levels. For example, in game
playing, it is very hard for a human to provide the large number of
accurate example evaluations that would be needed to train an evaluation
function directly from examples. Instead, the program can be told when
it has won or lost, and can use this information to learn an evaluation
function that gives reasonably accurate estimates of the probability of
winning from any given position. Similarly, it is extremely difficult to
program a robot to juggle; yet given appropriate rewards every time a
ball is dropped or caught, the robot can learn to juggle by itself.

Reinforcement learning problems can vary along several dimensions:

• The environment can be accessible or inaccessible. In an


accessible environment, states can be identified with percepts,
whereas in an inaccessible environment, the agent must maintain
some internal state to try to keep track of the environment.
• The agent can begin with knowledge of the environment and the
effects of its actions; or it will have to learn this model as well
as utility information.
• Rewards can be received only in terminal states, or in any state.
• Rewards can be components of the actual utility (points for a ping-
pong agent or dollars for a betting agent) that the agent is trying
to maximize, or they can be hints as to the actual utility (“nice
move” or “bad dog”).
• The agent can be a passive learner or an active learner. A passive
learner simply watches the world going by, and tries to learn the
utility of being in various states; an active learner must also act
using the learned information, and can use its problem generator to
suggest explorations of unknown portions of the environment.
Genetic Algorithms and Evolutionary Programming:
Nature has a robust way of evolving successful organisms. The organisms
that are ill-suited for an environment die off, whereas the ones that are
fit live to reproduce. Offspring are similar to their parents so each
new generation has organisms that are similar to the fit members of the
previous generation. If the environment changes slowly, the species can
gradually evolve along with it, but a sudden change in the environment is
likely to wipe out a species.
It turns out that what’s good for nature is also good for artificial
systems. The GENETIC-ALGORITHM starts with a set of one or more
individuals and applies selection and reproduction operators to “evolve”
an individual that is successful, as measured by a fitness function.

function GENETIC-ALGORITHM(population,FITNESS-FN) returns an individual


inputs: population, a set of individuals
FITNESS-FN, a function that measures the fitness of an
individual
repeat
parents ← SELECTION(population,FITNESS-FN)
population ← REPRODUCTION(parents)
until some individual is fit enough

return the best individual in population, according to FITNESS-FN

Since the evolutionary process learns an agent function based on


occasional rewards (offspring) as supplied by the selection function, it
can be seen as a form of reinforcement learning. GENETIC-ALGORITHM
simply searches directly in the space of individuals, with the goal of
finding one that maximizes the fitness function. The search is parallel
because each individual in the population can be seen as a separate
search. It is hill climbing because we are making small genetic changes
to the individuals and using the best resulting offspring. The key
question is how to allocate the search resources: clearly, we should
spend most of our time on the most promising individuals.

Before we can apply GENETIC-ALGORITHM to a problem, we need to answer the


following four questions:
• What is the fitness function?
• How is an individual represented?
• How are individuals selected?
• How do individuals reproduce?
The fitness function depends on the problem, but in any case, it is a
function that takes an individual as input and returns a real number as
output.
In the “classic” genetic algorithm approach, an individual is represented
as a string over a finite alphabet. Each element of the string is called
a gene. In real DNA, the alphabet is AGTC (adenine, guanine, thymine,
cytosine), but in genetic algorithms, we usually use the binary
alphabet (0,1). Some authors reserve the term “genetic algorithm” for
cases where the representation is a bit string, and use the term
evolutionary programming when the representation is more complicated.
Other authors make no distinction, or make a slightly different one.

The selection strategy is usually randomized, with the probability of


selection proportional to fitness. That is, if individual X scores twice
as high as individual Y on the fitness function, then X is twice as likely to be
selected for reproduction as Y. Usually, selection is done with
replacement, so that a very fit individual will get to reproduce several
times.

Reproduction is accomplished by cross-over and mutation. First, all the


individuals that have been selected for reproduction are randomly paired.
Then for each pair, a cross-over point is randomly chosen. Think of the
genes of each parent as being numbered from 1 to N. The cross-over
point is a number in that range; let us say it is 10. That means that
one offspring will get genes 1 through 10 from the first parent, and rest
from the second parent. The second offspring will get genes 1 through 10
from the second parent, and the rest from the first. However, each gene
can be altered by random mutation to a different value, with small
independent probability.
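
A minimal sketch of these two reproduction operators on equal-length bit-string chromosomes (all names below are ours):

#include <random>
#include <string>
#include <utility>

std::mt19937 rng{std::random_device{}()};

// Single-point cross-over: the children swap tails after a random cut.
std::pair<std::string, std::string>
crossover(const std::string& p1, const std::string& p2) {
    std::uniform_int_distribution<std::size_t> pick(1, p1.size() - 1);
    std::size_t cut = pick(rng);   // the cross-over point
    return { p1.substr(0, cut) + p2.substr(cut),
             p2.substr(0, cut) + p1.substr(cut) };
}

// Mutation: flip each gene independently with small probability.
void mutate(std::string& genes, double p_mutation) {
    std::bernoulli_distribution flip(p_mutation);
    for (char& g : genes)
        if (flip(rng)) g = (g == '0') ? '1' : '0';
}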

EXTRA MATERIAL FROM THE LECTURES


Artificial Neural Networks
The neuron, or nerve cell, is the fundamental functional unit of all nervous system tissue, including
the brain. Each neuron consists of a cell body, or soma, that contains a cell nucleus. Branching out
from the cell body are a number of fibers called dendrites and a single long fiber called the axon.
Our brains are made up of about 100 billion tiny units (neurons). Each neuron is connected to
thousands of other neurons and communicates with them via electrochemical signals. Signals
coming into the neuron are received via junctions called synapses, these in turn are located at the
end of branches of the neuron cell called dendrites. The neuron continuously receives signals from
these inputs and then sums up the inputs to itself in some way and then, if the end result is greater
than some threshold value, the neuron fires. It generates a voltage and outputs a signal called an
action potential along the axon, eventually reaching the synapses which are connected to the other
neurons.

Neural networks are made up of many artificial neurons. An artificial neuron is simply an
electronically modelled biological neuron. How many neurons are used depends on the task at hand.
It could be as few as three or as many as several thousand. There are many different ways of
connecting artificial neurons together to create a neural network but I shall be concentrating on the
most common which is called a feedforward network.

Working of an Artificial Neuron:

Each input into the neuron has its own weight associated with it illustrated by the red circle. A
weight is simply a floating point number and it's these we adjust when we eventually come to train
the network. The weights in most neural nets can be both negative and positive, therefore providing
excitatory or inhibitory influences to each input. As each input enters the nucleus (blue circle) it's
multiplied by its weight. The nucleus then sums all these new input values which gives us the
activation (again a floating point number which can be negative or positive). If the activation is
greater than a threshold value - let's use the number 1 as an example - the neuron outputs a signal. If
the activation is less than 1 the neuron outputs zero. This is typically called a step function as shown
below:

A neuron can have any number of inputs from one to n, where n is the total number of inputs. The
inputs may be represented therefore as x_1, x_2, x_3 … x_n. And the corresponding weights for the inputs
as w_1, w_2, w_3 … w_n. Now, the summation of the weights multiplied by the inputs we talked about
above can be written as x_1w_1 + x_2w_2 + x_3w_3 + … + x_nw_n, which is the activation value. So a =
x_1w_1 + x_2w_2 + x_3w_3 + … + x_nw_n.
This can also be written as a summation:

a = Σ_i x_i w_i   (summing i from 1 to n)
Assuming an array of inputs and weights are already initialized as x[n] and w[n] then the code can
be written as:

double activation = 0;

// accumulate the weighted sum of the n inputs
for (int i = 0; i < n; i++)
    activation += x[i] * w[i];

Remember that if the activation > threshold we output a 1 and if activation < threshold we output a
0.

The following diagram illustrates everything discussed till now:

Feedforward neural network:

The feedforward neural networks are the first and arguably simplest type of artificial neural
networks devised. In this network, the information moves in only one direction, forward, from the
input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops
in the network.

Perceptron:

A single-layer perceptron network consists of one or more artificial neurons in parallel. The earliest
kind of neural network is a single-layer perceptron network, which consists of a single layer of
output nodes; the inputs are fed directly to the outputs via a series of weights. In this way it can be
considered the simplest kind of feed-forward network. The sum of the products of the weights and
the inputs is calculated in each node, and if the value is above some threshold (typically 0) the
neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value
(typically -1).

Perceptron Learning:

The delta rule is a gradient descent learning rule for updating the weights of the artificial neurons
in a single-layer perceptron. It calculates the errors between calculated output and sample output
data, and uses this to create an adjustment to the weights, thus implementing a form of gradient
descent. For a neuron j with activation function g(x), the delta rule for j's ith weight w_ji is given by:
Δw_ji = α (t_j − y_j) x_i

where α is a small constant called the learning rate, t_j is the target output, y_j is the actual
output, and x_i is the ith input.
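
As a minimal sketch under the convention used above for threshold units (outputs +1/-1, threshold 0; the function names are ours), one training step looks like:

#include <cstddef>
#include <vector>

// One unit's output: weighted sum of the inputs passed through a
// threshold at 0 (+1 if it fires, -1 otherwise).
double unit_output(const std::vector<double>& w, const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
        sum += w[i] * x[i];
    return sum > 0.0 ? 1.0 : -1.0;
}

// Delta rule: w_i <- w_i + alpha * (t - y) * x_i for every weight.
void delta_rule_step(std::vector<double>& w, const std::vector<double>& x,
                     double target, double alpha) {
    double y = unit_output(w, x);
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += alpha * (target - y) * x[i];
}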

Multi-Layer Perceptron:

Well, we have to link several of these neurons up in some way. One way of doing this is by
organising the neurons into a design called a feedforward network. It gets its name from the way the
neurons in each layer feed their output forward to the next layer until we get the final output from
the neural network. This is what a very simple feedforward network looks like:

Each input is sent to every neuron in the hidden layer and then each hidden layer’s neuron’s output
is connected to every neuron in the next layer. There can be any number of hidden layers within a
feedforward network but one is usually enough to suffice for most problems you will tackle. Also
the number of neurons I've chosen for the above diagram was completely arbitrary. There can be
any number of neurons in each layer, it all depends on the problem.

A real world example:

You probably know already that a popular use for neural nets is character recognition. So let's
design a neural network that will detect the number '4'. Given a panel made up of a grid of lights
which can be either on or off, we want our neural net to let us know whenever it thinks it sees the
character '4'. The panel is eight cells square and looks like this:

We would like to design a neural net that will accept the state of the panel as an input and
will output either a 1 or zero. A 1 to indicate that it thinks the character ‘4’ is being
displayed and 0 if it thinks it's not being displayed. Therefore the neural net will have 64
inputs, each one representing a particular cell in the panel and a hidden layer consisting of
a number of neurons (more on this later) all feeding their output into just one neuron in the
output layer. Please picture this in your head because it would be very difficult to draw all those little
circles and lines.

Once the neural network has been created it needs to be trained. One way of doing this is
initialize the neural net with random weights and then feed it a series of inputs which
represent, in this example, the different panel configurations. For each configuration we
check to see what its output is and adjust the weights accordingly so that whenever it sees
something looking like a number 4 it outputs a 1 and for everything else it outputs a zero.
This type of training is called supervised learning and the data we feed it is called a
training set. There are many different ways of adjusting the weights, the most common for
this type of problem is called backpropagation.

If you think about it, you could increase the outputs of this neural net to 10. This way the
network can be trained to recognize all the digits 0 through to 9. Increase them further and
it could be trained to recognize the alphabet too!

Applications for Neural Networks

• Detection of medical phenomena. A variety of health-related indices (e.g., a combination


of heart rate, levels of various substances in the blood, respiration rate) can be monitored.
The onset of a particular medical condition could be associated with a very complex (e.g.,
nonlinear and interactive) combination of changes on a subset of the variables being
monitored. Neural networks have been used to recognize this predictive pattern so that the
appropriate treatment can be prescribed.
• Stock market prediction. Fluctuations of stock prices and stock indices are another
example of a complex, multidimensional, but in some circumstances at least partially-
deterministic phenomenon. Neural networks are being used by many technical analysts to
make predictions about stock prices based upon a large number of factors such as past
performance of other stocks and various economic indicators.
• Credit assignment. A variety of pieces of information are usually known about an applicant
for a loan. For instance, the applicant's age, education, occupation, and many other facts may
be available. After training a neural network on historical data, neural network analysis can
identify the most relevant characteristics and use those to classify applicants as good or bad
credit risks.
• Monitoring the condition of machinery. Neural networks can be instrumental in cutting
costs by bringing additional expertise to scheduling the preventive maintenance of
machines. A neural network can be trained to distinguish between the sounds a machine
makes when it is running normally ("false alarms") versus when it is on the verge of a
problem. After this training period, the expertise of the network can be used to warn a
technician of an upcoming breakdown, before it occurs and causes costly unforeseen
"downtime."
• Engine management. Neural networks have been used to analyze the input of sensors from
an engine. The neural network controls the various parameters within which the engine
functions, in order to achieve a particular goal, such as minimizing fuel consumption.
• Hand written character recognition
Genetic Algorithms & Evolutionary Programming
Nature has a robust way of evolving successful organisms. The organisms that are ill-suited for
an environment die off, whereas the ones that are fit live to reproduce. Offspring are similar to
their parents so each new generation has organisms that are similar to the fit members of the
previous generation.

Every organism has a set of rules, a blueprint so to speak, describing how that organism is built up
from the tiny building blocks of life. These rules are encoded in the genes of an organism, which in
turn are connected together into long strings called chromosomes. Each gene represents a specific
trait of the organism, like eye colour or hair colour, and has several different settings. For example,
the settings for a hair colour gene may be blonde, black or auburn. These genes and their
settings are usually referred to as an organism's genotype. The physical expression of the genotype -
the organism itself - is called the phenotype.
When two organisms mate they share their genes. The resultant offspring may end up having half
the genes from one parent and half from the other. This process is called recombination. Very
occasionally a gene may be mutated. Normally this mutated gene will not affect the development of
the phenotype but very occasionally it will be expressed in the organism as a completely new trait.

Genetic Algorithms are a way of solving problems by mimicking the same processes
mother nature uses. They use the same combination of selection, recombination and
mutation to evolve a solution to a problem.

Reproduction:

During reproduction, recombination (or crossover) first occurs. Genes from parents combine to
form a whole new chromosome. The newly created offspring can then be mutated. Mutation means
that the elements of DNA are a bit changed. These changes are mainly caused by errors in copying
genes from parents.

The fitness of an organism is measured by success of the organism in its life (survival).

Outline of the Basic Genetic Algorithm:

1. [Start] Generate random population of n chromosomes (suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating the following steps until the new
population is complete
1. [Selection] Select two parent chromosomes from a population according to their
fitness (the better fitness, the bigger chance to be selected)
2. [Crossover] With a crossover probability cross over the parents to form new
offspring (children). If no crossover was performed, offspring is the exact copy of
parents.
3. [Mutation] With a mutation probability mutate new offspring at each locus (position
in chromosome).
4. [Accepting] Place new offspring in the new population
4. [Replace] Use new generated population for a further run of the algorithm
5. [Test] If the end condition is satisfied, stop, and return the best solution in current
population
6. [Loop] Go to step 2
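
Assuming the problem-specific pieces (fitness, selection, cross-over, mutation) are supplied as functions, the outline above can be sketched as the following loop (all names here are ours):

#include <functional>
#include <string>
#include <utility>
#include <vector>

using Chromosome = std::string;

// Skeleton of the outline above; returns the first chromosome whose
// fitness reaches good_enough.
Chromosome basic_ga(
    std::vector<Chromosome> pop,
    std::function<double(const Chromosome&)> fitness,
    std::function<Chromosome(const std::vector<Chromosome>&)> select,
    std::function<Chromosome(const Chromosome&, const Chromosome&)> cross,
    std::function<void(Chromosome&)> mutate,
    double good_enough) {
    while (true) {
        for (const Chromosome& c : pop)                         // [Fitness] + [Test]
            if (fitness(c) >= good_enough) return c;
        std::vector<Chromosome> next;                           // [New population]
        while (next.size() < pop.size()) {
            Chromosome child = cross(select(pop), select(pop)); // [Selection] + [Crossover]
            mutate(child);                                      // [Mutation]
            next.push_back(std::move(child));                   // [Accepting]
        }
        pop = std::move(next);                                  // [Replace], then [Loop]
    }
}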

Encoding of a Chromosome

A chromosome should in some way contain information about solution that it represents. The most
used way of encoding is a binary string. A chromosome then could look like this:

Chromosome 1 1101100100110110
Chromosome 2 1101111000011110

Each chromosome is represented by a binary string. Each bit in the string can represent some
characteristics of the solution. Of course, there are many other ways of encoding.
Crossover

Crossover operates on selected genes from parent chromosomes and creates new offspring. The
simplest way to do that is to randomly choose a crossover point and copy everything
before this point from the first parent and then copy everything after the crossover point from the
other parent.

Crossover can be illustrated as follows: ( | is the crossover point):

Chromosome 1 11011 | 00100110110


Chromosome 2 11011 | 11000011110
Offspring 1 11011 | 11000011110
Offspring 2 11011 | 00100110110

Mutation

After a crossover is performed, mutation takes place. Mutation is intended to prevent falling of all
solutions in the population into a local optimum of the solved problem. Mutation operation
randomly changes the offspring resulted from crossover.

Original offspring 1 1101111000011110


Original offspring 2 1101100100110110
Mutated offspring 1 1100111000011110
Mutated offspring 2 1101101100110110

Crossover and Mutation Probability

There are two basic parameters of GA - crossover probability and mutation probability.

Crossover probability: how often crossover will be performed. If there is no crossover, offspring
are exact copies of parents. If there is crossover, offspring are made from parts of both parent's
chromosome.
Crossover is made in hope that new chromosomes will contain good parts of old chromosomes and
therefore the new chromosomes will be better.

Mutation probability: how often parts of chromosome will be mutated. If there is no mutation,
offspring are generated immediately after crossover (or directly copied) without any change. If
mutation is performed, one or more parts of a chromosome are changed. If mutation probability is
100%, whole chromosome is changed, if it is 0%, nothing is changed.

Population size: how many chromosomes are in population (in one generation). If there are too few
chromosomes, the GA has few opportunities to perform crossover. On the other hand, if there are too
many chromosomes, the GA slows down.

Selection:
As you already know from the GA outline, chromosomes are selected from the population to be
parents for crossover. The problem is how to select these chromosomes. According to Darwin's
theory of evolution the best ones survive to create new offspring. There are many methods in
selecting the best chromosomes. Examples are roulette wheel selection, Boltzman selection,
tournament selection, rank selection, steady state selection and some others.

Roulette Wheel Selection


This is a way of choosing members from the population of chromosomes in a way that is
proportional to their fitness. It does not guarantee that the fittest member goes through to the next
generation, merely that it has a very good chance of doing so.
It works like this: Imagine that the population’s total fitness score is represented by a pie chart, or
roulette wheel. Now you assign a slice of the wheel to each member of the population. The size of
the slice is proportional to that chromosome's fitness score, i.e. the fitter a member is, the bigger the
slice of pie it gets. Now, to choose a chromosome all you have to do is spin the ball and grab the
chromosome at the point it stops.
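
A compact sketch of this scheme (the function name is ours; std::discrete_distribution draws indices with probability proportional to the weights it is given):

#include <cstddef>
#include <random>
#include <vector>

// Spin the wheel once: returns an index into the population, chosen
// with probability proportional to that member's fitness score.
std::size_t roulette_select(const std::vector<double>& fitness,
                            std::mt19937& rng) {
    std::discrete_distribution<std::size_t> wheel(fitness.begin(),
                                                  fitness.end());
    return wheel(rng);
}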

Crossover and mutation:


Crossover and mutation are the two basic operators of a GA. The performance of a GA depends on them very
much. There are many ways to perform crossover and mutation.

Crossover:

Single point crossover - one crossover point is selected, binary string from the beginning of the
chromosome to the crossover point is copied from the first parent, the rest is copied from the other
parent

11001011+11011111 = 11001111

Two point crossover - two crossover points are selected, binary string from the beginning of the
chromosome to the first crossover point is copied from the first parent, the part from the first to the
second crossover point is copied from the other parent and the rest is copied from the first parent
again

11001011 + 11011111 = 11011111

Uniform crossover - bits are randomly copied from the first or from the second parent

11001011 + 11011101 = 11011111

Arithmetic crossover - some arithmetic operation is performed to make a new offspring

11001011 + 11011111 = 11001011 (AND)

Mutation:

Bit inversion - selected bits are inverted

11001001 => 10001001

Solving a Real Time Problem using Genetic Algorithms:

Given the digits 0 through 9 and the operators +, -, * and /, find a sequence that will
represent a given target number. The operators will be applied sequentially from left to
right as you read.

So, given the target number 23, the sequence 6+5*4/2+1 would be one possible solution.

If 75.5 is the chosen number then 5/2+9*7-5 would be a possible solution.

Stage 1: Encoding

First we need to encode a possible solution as a string of bits… a chromosome. So how do


we do this? Well, first we need to represent all the different characters available to the
solution... that is 0 through 9 and +, -, * and /. This will represent a gene. Each
chromosome will be made up of several genes.

Four bits are required to represent the range of characters used:

0: 0000
1: 0001
2: 0010
3: 0011
4: 0100
5: 0101
6: 0110
7: 0111
8: 1000
9: 1001
+: 1010
-: 1011
*: 1100
/: 1101

The above shows all the different genes required to encode the problem as described. The
possible genes 1110 & 1111 will remain unused and will be ignored by the algorithm if
encountered.

So now you can see that the solution mentioned above for 23, '6+5*4/2+1', would be
represented by nine genes like so:

0110 1010 0101 1100 0100 1101 0010 1010 0001

6 + 5 * 4 / 2 + 1

These genes are all strung together to form the chromosome:

011010100101110001001101001010100001

A Quick Word about Decoding

Because the algorithm deals with random arrangements of bits it is often going to come
across a string of bits like this:
0010001010101110101101110010
Decoded, these bits represent:

0010 0010 1010 1110 1011 0111 0010

2 2 + n/a - 7 2

Which is meaningless in the context of this problem! Therefore, when decoding, the
algorithm will just ignore any genes which don’t conform to the expected pattern of:
number -> operator -> number -> operator …and so on. With this in mind the above
‘nonsense’ chromosome is read (and tested) as:

2 + 7
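
A rough sketch of this decoding step (the helper names are ours): read the chromosome four bits at a time, map each gene through the table above, and keep only genes that fit the number -> operator -> number pattern.

#include <cstddef>
#include <string>

// Map one 4-bit gene to its character; genes 1110 and 1111 are unused.
char gene_to_char(const std::string& bits) {
    static const std::string table = "0123456789+-*/";
    std::size_t v = std::stoul(bits, nullptr, 2);
    return v < table.size() ? table[v] : '?';
}

// Decode a chromosome, ignoring genes that break the expected
// number -> operator -> number -> ... pattern.
std::string decode(const std::string& chromosome) {
    std::string out;
    bool want_digit = true;
    for (std::size_t i = 0; i + 4 <= chromosome.size(); i += 4) {
        char c = gene_to_char(chromosome.substr(i, 4));
        bool is_digit = (c >= '0' && c <= '9');
        bool is_op = (c == '+' || c == '-' || c == '*' || c == '/');
        if (want_digit ? is_digit : is_op) {
            out += c;
            want_digit = !want_digit;
        }
    }
    if (!out.empty() && want_digit) out.pop_back();  // drop a trailing operator
    return out;   // the "nonsense" chromosome above decodes to "2+7"
}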
Stage 2: Deciding on a Fitness Function

This can be the most difficult part of the algorithm to figure out. It really depends on what
problem you are trying to solve but the general idea is to give a higher fitness score the
closer a chromosome comes to solving the problem.
E.g., a fitness score can be assigned that's inversely proportional to the difference between
the solution and the value a decoded chromosome represents.

If we assume the target number (i.e., the solution) is 42, the chromosome mentioned above

011010100101110001001101001010100001

has a fitness score of 1/(42-23) or 1/19.

As it stands, if a solution is found, a divide by zero error would occur as the fitness would
be 1/(42-42). This is not a problem however as we have found what we were looking for...
a solution. Therefore a test can be made for this occurrence and the algorithm halted
accordingly.
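
As a sketch (the names are ours), the fitness computation with the exact-solution case handled explicitly:

#include <cmath>

// Fitness inversely proportional to the distance from the target.
// An exact hit is flagged instead of dividing by zero, so the
// algorithm can be halted as described above.
double fitness(double decoded_value, double target, bool& solved) {
    double diff = std::fabs(target - decoded_value);
    solved = (diff == 0.0);
    return solved ? 0.0 : 1.0 / diff;   // return value unused once solved
}

For the chromosome above, fitness(23, 42, solved) returns 1/19 and leaves solved false.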

Stage 3: Solving the problem:


Generate an initial population; create each new generation by applying selection,
crossover and mutation; keep repeating until a chromosome that decodes to the target value is found.

Applications of GA

Genetic algorithms have been used for difficult problems (such as NP-hard problems), for machine learning
and also for evolving simple programs. They have been also used for some art, for evolving pictures and
music.

The advantage of GAs is in their parallelism. A GA travels the search space using many individuals (and
with a genotype rather than a phenotype representation), so it is less likely to get stuck in a local extreme than other
methods.

They are also easy to implement. Once you have the basic GA algorithm implemented, you have just to write
a new chromosome (just one object) to solve another problem. With the same encoding you just change the
fitness function - and you are done. However, for some problems, choosing and implementation of encoding
and fitness function can be difficult.

The disadvantage of GAs is in the computational time. GAs can be slower than other methods. But
since we can terminate the computation at any time, the longer run is acceptable (especially with
faster and faster computers).
Chapter 10: Agents that communicate
Communication is the intentional exchange of information brought about by
the production and perception of signs drawn from a shared system of
conventional signs.

Humans use a limited number of conventional signs (smiling, shaking


hands) to communicate in much the same way as other animals. Humans have
also developed a complex, structured system of signs known as language
that enables them to communicate most of what they know about the world.
Although chimpanzees, dolphins, and other mammals have shown vocabularies
of hundreds of signs and some aptitude for stringing them together,
humans are the only species that can reliably communicate an unbounded
number of qualitatively different messages.

10.1 Communication As Action

One of the actions available to an agent is to produce language. This is


called a speech act. “Speech” is used in the same sense as in “free
speech,” not “talking,” so typing, skywriting, and using sign language
all count as speech acts. English has no neutral word for an agent that
produces language, either by speaking or writing or anything else. We
will use speaker, hearer, and utterance as generic terms referring to any
mode of communication. We will also use the term words to refer to any
kind of conventional communicative sign.

Imagine a group of agents are exploring the wumpus world together. The
group gains an advantage (collectively and individually) by being able to
do the following:
• Inform each other about the part of the world each has explored, so
that each agent has less exploring to do. This is done by making
statements: There’s a breeze here in [3,4].
• Query other agents about particular aspects of the world. This is
typically done by asking questions: Have you smelled the wumpus
anywhere?
• Answer questions. This is a kind of informing. Yes, I smelled the
wumpus in [2,5].
• Request or command other agents to perform actions: Please help me
carry the gold. It can be seen as impolite to make direct
requests, so often an indirect speech act (a request in the form of a
statement or question) is used instead: I could use some help
carrying this or Could you help me carry this?
• Promise to do things or offer deals: I’ll shoot the wumpus if you
let me share the gold.
• Acknowledge requests and offers: OK.
• Share feelings and experiences with each other: You know, old chap,
when I get in a spot like that, I usually go back to the start and
head out in another direction, or Man, that wumpus sure needs some
deodorant!

Fundamentals of language

We distinguish between formal languages-the ones like Lisp and first-


order logic that are invented and rigidly defined-and natural languages-
the ones like Chinese, Danish, and English that humans use to talk to one
another. Although we are primarily interested in natural languages, we
will make use of all the tools of formal language theory.

A formal language is defined as a set of strings, where each string is a


sequence of symbols taken from a finite set called the terminal symbols.
For English, the terminal symbols include words like a, aardvark, aback,
abacus, and about 400,000 more.

Languages are based on the idea of phrase structure-that strings are


composed of substrings called phrases, which come in different
categories. For example, the phrases “the wumpus,” “the king,” and “the
agent in the corner” are all examples of the category noun phrase (or NP
for short). There are two reasons for identifying phrases in this way.
First, phrases are convenient handles on which we can attach semantics.
Second, categorizing phrases helps us to describe the allowable strings
of the language. We can say that any of the noun phrases can combine
with a verb phrase(or VP) such as “is dead” to form a phrase of category
sentence (or S).

Categories such as NP, VP, and S are called nonterminal symbols. In the
BNF notation, rewrite rules consist of a single nonterminal symbol on the
left-hand side, and a sequence of terminals or nonterminals on the right-
hand side. The meaning of a rule such as

S → NP VP

is that we can take any phrase categorized as a NP, append to it any


phrase categorized as a VP, and the result will be a phrase categorized
as an S.

The component steps of communication

A typical communication episode, in which speaker S wants to convey


proposition P to hearer H using words W, is composed of seven processes.
Three take place in the speaker:

Intention: S wants H to believe P (where S typically believes P)


Generation: S chooses the words W (because they express the meaning P)
Synthesis: S utters the words W (usually addressing them to H)

Four take place in the hearer:

Perception: H perceives W’ (ideally W’ = W, but misperception is


possible)
Analysis: H infers that W’ has possible meanings P1,...,Pn (words
and Phrases can have several meanings)
Disambiguation: H infers that S intended to convey Pi (where ideally
Pi = P, but misinterpretation is possible)
Incorporation: H decides to believe Pi (or rejects it if it is out of
line with what H already believes)

10.2 Types of Communicating Agents

In this section we consider agents that communicate in two different


ways. First are agents who share a common internal representation
language: they can communicate without any external language at all.
Then come agents that make no assumptions about each other’s internal
language, but share a communication language that is a subset of English.

1. Communicating using Tell and Ask

In this section, we study a form of communication in which agents share


the same internal representation language and have direct access to each
other’s knowledge bases through the TELL and ASK interface. That is,
agent A can communicate proposition P to agent B with TELL(KB_B,”P”), just
as A would add P to its own knowledge base with TELL(KB_A,”P”).
Similarly, agent A can find out if B knows Q with ASK(KB_B,”Q”). We will
call this telepathic communication. Figure shows a schematic diagram in
which each agent is modified to have an input/output port to its
knowledge base, in addition to the perception and action ports.

If agent A wanted to tell agent B that there is a pit in location[2,3],


all A would have to do is execute:

TELL(KB_B,”Pit(P_A1) ∧ At(P_A1,[2,3],S_A9)”)

where S_A9 is the current situation, and P_A1 is A’s symbol for the pit.

There are three difficulties:


1. There has to be a naming policy so that A and B do not
simultaneously introduce the same symbol to mean different things.
We have adopted the policy that each agent includes its own name as
part of the subscript to each symbol it introduces.
2. There has to be some way of relating symbols introduced by
different agents, so that an agent can tell whether P_A1 and, say,
P_B2 denote the same pit or not.
3. The final difficulty is in reconciling the differences between
different agents’ knowledge bases. If communication is free and
instantaneous, then all agents can adopt the policy of broadcasting
each new fact to everyone else as soon as they learn it. That way
everyone will have all the same knowledge. But in most
applications, the bandwidth between agents is limited, and they are
often completely out of touch with each other for periods of time.
When they come back into contact, they have the problem of deciding
what new information is worth communicating, and of discovering
what interesting facts the other agent knows.

Another problem with telepathic agents as we have described them is that


they are vulnerable to sabotage. Another agent could TELL lies directly
into the knowledge base and make our naïve telepathic agent believe
anything.

2. Communicating using formal language

Because of the sabotage problem, and because it is infeasible for


everyone to have the same internal language, most agents communicate
through language rather than through direct access to knowledge bases.
Figure shows a diagram of this type of communication. Agents can perform
actions that produce language, which other agents can perceive. The
external communication language can be different from the internal
representation language, and the agents can each have different internal
languages.

10.3 A Formal Grammar for a Subset of English

In this section, we define a formal grammar for a small subset of English


that is suitable for making statements about the wumpus world. We will call
this language ε0. In defining the language this way, we are implicitly
claiming that formal language techniques are appropriate for dealing with
natural language. In many ways, they are appropriate: natural make use
of a fixed set of letters (for written language) or sounds (for spoken
language) that combine into a relatively fixed set of words.

The Lexicon of ε0

The first step in defining a grammar is to define a lexicon, or list of


allowable vocabulary words. The words are grouped into the categories or
parts of speech familiar to dictionary users: nouns, pronouns, and names
to denote things, verbs to denote events, adjectives to modify nouns, and
adverbs to modify verbs. Categories that may be less familiar to some
readers are articles (such as the), prepositions (in), and
conjunctions (and).

Each of the categories ends in ... to indicate that there are other words
in the category. However, it should be noted that there are two distinct
reasons for the missing words. For nouns, verbs, adjectives, and
adverbs, it is in principle infeasible to list them all. Not only are
there thousands or tens of thousands of members in each class, but new
ones are constantly being added. For example, “fax” is now a very common
noun and verb, but it was coined only a few years ago. These four
categories are called open classes. The other categories (pronoun,
article, preposition, and conjunction) are called closed classes. They
have a small number of words (a few to a few dozen) that could in
principle be enumerated. Closed classes change over the course of
centuries, not months. For example, “thee” and “thou” were commonly used
pronouns in the seventeenth century, were on the decline in the
nineteenth, and are seen today only in poetry and regional dialects.

The Grammar of ε0

The next step is to combine the words into phrases. We will use five
nonterminal symbols to define the different kinds of phrases:
sentence (S), noun phrase (NP), verb phrase (VP), prepositional phrase (PP),
and relative clause (RelClause). Figure shows a grammar for ε0 with an
example for each rewrite rule. ε0 generates good English sentences such
as the following:

John is in the pit


The wumpus that stinks is in [2,2]
Mary is in Boston and John stinks

Unfortunately, the grammar overgenerates: that is, it generates sentences


that are not grammatical, such as “Me go Boston” and “I smell pit gold
wumpus nothing east.” It also undergenerates: there are many sentences
of English that it rejects, such as “I think the wumpus is smelly.”
(Another shortcoming is that the grammar does not capitalize the first
word of a sentence, nor add punctuation at the end. That is because it
is designed primarily for speech, not writing.)

Wumpus Lexicon:

Wumpus Grammar:
Chapter 11: Expert Systems

An expert system is a computer program that uses knowledge and inference


techniques to solve problems that are usually solved with human
expertise. An expert system stores a large body of facts, along with
rules about how these facts can be used to solve problems. This
collection of knowledge is called a knowledge base.

The biggest problem in the design of an expert system is abstracting the


knowledge of the expert and putting it into an objective form for the
database. A human expert normally approaches a problem at least
partially subjectively. The computer, in contrast, can only work with
objective representations. Until formal rules and facts exist, the
expert system cannot be designed.

Expert systems are designed by a knowledge engineer. It is the knowledge


engineer who must observe, talk to, and work with the human expert to
determine how to express the reasoning process of the human expert in an
objective form.

Two basic types of expert systems can be built with Prolog.

1. Production system.
2. Frame-based system.

Production System:

A production system is a type of expert system in which the knowledge is


stored as rules and fact. Each rule is said to be a production rule.
The rules express relationships between facts. A production system has
two primary components: the knowledge base and the inference engine.

The knowledge base is the data or knowledge used to make decisions. The
knowledge base contains the rules and facts about the domain. The
knowledge base consists of two parts: the working memory and the rule
base. The rule base consists of facts and rules that are compiled as
part of the problem. The second part of the knowledge base is the
working memory. The working memory corresponds to Prolog’s dynamic
database.

The inference engine has two functions: inference and control. Inference
is the basic formal reasoning process. It involves matching and
unification. The control function determines the order in which the
rules are tested and what happens when a rule succeeds or fails.
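
A minimal sketch of one such control loop, under simplifying assumptions of our own: facts are plain strings and conditions are matched by exact lookup (a real system, such as one written in Prolog, would use matching and unification instead):

#include <set>
#include <string>
#include <vector>

struct Rule {
    std::vector<std::string> conditions;   // all must hold for the rule to fire
    std::string conclusion;                // fact added to working memory when it fires
};

// Naive forward chaining: keep firing rules until the working memory
// (the dynamic part of the knowledge base) stops changing.
void forward_chain(std::set<std::string>& working_memory,
                   const std::vector<Rule>& rule_base) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Rule& r : rule_base) {
            bool all_hold = true;
            for (const std::string& c : r.conditions)
                if (!working_memory.count(c)) { all_hold = false; break; }
            if (all_hold && !working_memory.count(r.conclusion)) {
                working_memory.insert(r.conclusion);   // inference step
                changed = true;                        // control: rescan the rules
            }
        }
    }
}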

Frame-Based System:

A second type of expert system is the frame-based system. In this


system, the object is represented by a frame. The frame contains one or
more filler slots, with each slot representing an attribute. Attribute
values are stored in the slots. A slot can also contain limited values
for another slot.

The frame is a very complex and powerful form of knowledge representation.


Frames can be related in hierarchical relationships, expressing a
classification system. Slots can contain procedures that can be
activated to calculate a numerical value for an attribute.

Expert systems solve problems that are normally solved by human


“experts.” To solve expert-level problems, expert systems need access to
a substantial domain knowledge base, which must be built as efficiently
as possible. They also need to exploit one or more reasoning mechanisms
to apply their knowledge to the problems they are given.
11.1 Representing and Using Domain Knowledge

MYCIN system attempts to recommend appropriate therapies for patients


with bacterial infections. It interacts with the physician to acquire
the clinical data it needs.

MYCIN represents most of its diagnostic knowledge as a set of rules.


Each rule has associated with it a certainty factor, which is a measure
of the extent to which the evidence that is described by the antecedent
of the rule supports the conclusion that is given in the rule’s
consequent. A typical MYCIN rule looks like:

If:
(1) the stain of the organism is gram-positive, and
(2) the morphology of the organism is coccus, and
(3) the growth conformation of the organism is clumps,

then there is suggestive evidence (0.7) that the identity of the organism
is staphylococcus.

PROSPECTOR is a program that provides advice on mineral exploration.

The DESIGN ADVISOR gives advice to a chip designer, who can accept or
reject the advice. If the advice is rejected, the system can exploit a
justification-based truth maintenance system to revise its model of the
circuit.

11.2 Expert System Shells

Initially, each expert system that was built was created from scratch,
usually in LISP. But, after several systems had been built this way, it
became clear that these systems often had a lot in common. In
particular, since the systems were constructed as a set of declarative
representations (mostly rules) combined with an interpreter for those
representations, it was possible to separate the interpreter from the
domain-specific knowledge and thus to create a system that could be used
to construct new expert systems by adding new knowledge corresponding to
the new problem domain. The resulting interpreters are called shells.
One influential example of such a shell is EMYCIN which was derived from
MYCIN.

11.3 Explanation

In order for an expert system to be an effective tool, people must be
able to interact with it easily. To facilitate this interaction, the
expert system must have the following two capabilities in addition to the
ability to perform its underlying task:

• Explain its reasoning: In many of the domains in which expert
systems operate, people will not accept results unless they have
been convinced of the accuracy of the reasoning process that
produced those results. Thus it is important that the reasoning process
used in such programs proceed in understandable steps and that
enough meta-knowledge (knowledge about the reasoning process) be
available so that explanations of those steps can be generated.

• Acquire new knowledge and modifications of old knowledge: Since
expert systems derive their power from the richness of the
knowledge bases they exploit, it is extremely important that those
knowledge bases be as complete and as accurate as possible. But
often there exists no standard codification of that knowledge;
rather it exists only inside the heads of human experts. One way
to get this knowledge into a program is through interaction with
the human expert. Another way is to have the program learn expert
behavior from raw data.

11.4 Knowledge Acquisition

How are expert systems built? Typically, a knowledge engineer interviews
a domain expert to elucidate expert knowledge, which is then translated
into rules. After the initial system is built, it must be iteratively
refined until it approximates expert-level performance. This process is
expensive and time-consuming, so it is worthwhile to look for more
automatic ways of constructing expert knowledge bases. While no totally
automatic knowledge acquisition systems yet exist, there are many
programs that interact with domain experts to extract expert knowledge
efficiently. These programs provide support for the following
activities:

• Entering knowledge
• Maintaining knowledge base consistency
• Ensuring knowledge base completeness

MOLE is a knowledge acquisition system for heuristic classification
problems, such as diagnosing diseases.

MOLE interacts with a domain expert to produce a knowledge base that a
system called MOLE-p (for MOLE-performance) uses to solve problems. The
acquisition proceeds through several steps:

1. Initial knowledge base construction.
2. Refinement of the knowledge base.

MOLE has been used to build systems that diagnose problems with car
engines, problems in steel-rolling mills, and inefficiencies in coal-
burning power plants.

Four major problems facing current expert systems are:

• Brittleness: Because expert systems only have access to highly
specific domain knowledge, they cannot fall back on more general
knowledge when the need arises.

• Lack of Meta-Knowledge: Expert systems do not have very
sophisticated knowledge about their own operation. They typically
cannot reason about their own scope and limitations, making it even
more difficult to deal with the brittleness problem.

• Knowledge Acquisition: Despite the development of such tools,
knowledge acquisition still remains a major bottleneck in applying
expert systems technology to new domains.

• Validation: Measuring the performance of an expert system is
difficult because we do not know how to quantify the use of
knowledge.

EXTRA MATERIAL FROM THE LECTURES


Expert Systems
An expert system is a computer program that uses knowledge and inference techniques to solve
problems that are usually solved with human expertise.
Building an expert system therefore first involves extracting the relevant knowledge from the
human expert.

A knowledge engineer has the job of extracting this knowledge and
building the expert system knowledge base. It is the knowledge engineer
who must observe, talk to, and work with the human expert to determine
how to express the reasoning process of the human expert in an objective
form.

(A first attempt at building an expert system is unlikely to be very successful. This is partly because
the expert generally finds it very difficult to express exactly what knowledge and rules they use to
solve a problem. Much of it is almost subconscious, or appears so obvious they don't even bother
mentioning it. Knowledge acquisition for expert systems is a big area of research, with a wide
variety of techniques developed. However, generally it is important to develop an initial prototype
based on information extracted by interviewing the expert, then iteratively refine it based on
feedback both from the expert and from potential users of the expert system.)

The system should be able to explain its reasoning (to expert, user and knowledge engineer) and
answer questions about the solution process.

Designing an Expert System (Rule-based expert system/Production System):

• The user interacts with the system through a user interface which may use menus, natural
language or any other style of interaction.
• An inference engine is used to reason with both the expert knowledge (extracted from our
friendly expert) and data specific to the particular problem being solved. The expert knowledge
will typically be in the form of a set of IF-THEN rules. The case-specific data includes both data
provided by the user and partial conclusions (along with certainty measures) based on this data.
• The explanation subsystem allows the program to explain its reasoning to the user.
• Knowledge base editor helps the expert or knowledge engineer to easily update and check
the knowledge base.

Rule-based systems can be either goal driven using backward chaining to test
whether some hypothesis is true, or data driven, using forward chaining to
draw new conclusions from existing data. Expert systems may use either or
both strategies, but the most common is probably the goal driven/backward
chaining strategy.

Knowledge Acquisition:
There are many programs that interact with domain experts to extract expert knowledge efficiently.
These programs provide support for the following activities:
• Entering knowledge
• Maintaining knowledge base consistency
• Ensuring knowledge base completeness
MOLE is a knowledge acquisition system for heuristic classification problems, such as diagnosing
diseases.

Production Rules:

Suppose that we have the following rules:

1. IF engine_getting_petrol AND engine_turns_over
THEN problem_with_spark_plugs
2. IF NOT engine_turns_over AND NOT lights_come_on
THEN problem_with_battery
3. IF NOT engine_turns_over AND lights_come_on
THEN problem_with_starter
4. IF petrol_in_fuel_tank
THEN engine_getting_petrol

Our problem is to work out what's wrong with our car given some observable symptoms. There are
three possible problems with the car: problem_with_spark_plugs, problem_with_battery,
problem_with_starter.

We are assuming that we have been provided with no initial facts about the observable
symptoms.

In the simplest goal-directed system we would try to prove each hypothesised problem (with the
car) in turn. First the system would try to prove ``problem_with_spark_plugs''. Rule 1 is potentially
useful, so the system would set the new goals of proving ``engine_getting_petrol'' and
``engine_turns_over''. Trying to prove the first of these, rule 4 can be used, with the new goal of
proving ``petrol_in_fuel_tank''. There are no rules which conclude this (and the system doesn't
already know the answer), so the system will ask the user:

Is it true that there's petrol in the fuel tank?

Let's say that the answer is yes. This answer would be recorded, so that the user doesn't get asked
the same question again. Anyway, the system now has proved that the engine is getting petrol, so
now wants to find out if the engine turns over. As the system doesn't yet know whether this is the
case, and as there are no rules which conclude this, the user will be asked:

Is it true that the engine turns over?

Let's say this time the answer is no. There are no other rules which can be used to prove
``problem_with_spark_plugs'', so the system will conclude that this is not the solution to the
problem, and will consider the next hypothesis: problem_with_battery. It is true that the engine does
not turn over (the user has just said that), so all it has to prove is that the lights don't come on. It
will ask the user

Is it true that the lights come on?

Suppose the answer is no. It has now proved that the problem is with the battery. Some systems
might stop there, but usually there might be more than one solution, (e.g., more than one fault with
the car), or it will be uncertain which of various solutions is the right one. So usually all hypotheses
are considered. It will try to prove ``problem_with_starter'', but given the existing data (the lights
do not come on) the proof will fail, so the system will conclude that the problem is with the battery. A
complete interaction with our very simple system might be:

System: Is it true that there's petrol in the fuel tank?
User: Yes.
System: Is it true that the engine turns over?
User: No.
System: Is it true that the lights come on?
User: No.
System: I conclude that there is a problem with the battery.
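
The interpreter just traced can be sketched in a few lines of Python,
using the four rules from above (the ask-the-user behaviour and the
"not " convention are simplifications, not a full Prolog-style engine):

rules = {
    "problem_with_spark_plugs": [["engine_getting_petrol", "engine_turns_over"]],
    "problem_with_battery":     [["not engine_turns_over", "not lights_come_on"]],
    "problem_with_starter":     [["not engine_turns_over", "lights_come_on"]],
    "engine_getting_petrol":    [["petrol_in_fuel_tank"]],
}

known = {}   # askable facts the user has already answered

def holds(goal):
    """Try to prove a goal from the rules, else ask the user once."""
    if goal.startswith("not "):
        return not holds(goal[4:])
    for conditions in rules.get(goal, []):
        if all(holds(c) for c in conditions):
            return True
    if goal not in rules:            # an askable leaf fact
        if goal not in known:
            answer = input("Is it true that " + goal.replace("_", " ") + "? ")
            known[goal] = answer.strip().lower().startswith("y")
        return known[goal]
    return False

for hypothesis in ["problem_with_spark_plugs",
                   "problem_with_battery",
                   "problem_with_starter"]:
    if holds(hypothesis):
        print("I conclude that there is a", hypothesis.replace("_", " ") + ".")

Run with the answers yes/no/no, and it reproduces the dialogue above,
never asking the same question twice.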

Explanation facilities:

Most expert systems have explanation facilities that allow the user to ask why it asked some
question, and how it reached some conclusion. These questions are answered by referring to the
system goals, the rules being used, and any existing problem data. To illustrate the sorts of facilities
commonly provided, we'll elaborate on our simple example above, starting with a simple example
dialogue involving why and how questions and explanations.

System: Is it true that there's petrol in the fuel tank?
User: Yes.
System: Is it true that the engine turns over?
User: No.
System: Is it true that the lights come on?
User: Why?
System: I am trying to prove: problem with battery. It has been established
that it is not the case that the engine turns over. Therefore if it is not the
case that the lights come on then there is a problem with battery.

Is it true that the lights come on?
User: No.
System: I conclude that there is a problem with the battery.
User: How?
System: This follows from Rule 2:
IF NOT engine_turns_over AND NOT lights_come_on
THEN problem_with_battery
NOT engine_turns_over was given by the user.
NOT lights_come_on was given by the user.
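
One simple way to support such "How?" questions is to record, whenever
a rule succeeds, which rule established which conclusion. A minimal
sketch (the names and record format are illustrative):

derivations = {}   # conclusion -> (rule name, conditions that held)

def record(conclusion, rule_name, conditions):
    derivations[conclusion] = (rule_name, conditions)

def explain_how(conclusion):
    rule_name, conditions = derivations[conclusion]
    print(conclusion, "follows from", rule_name + ":")
    for c in conditions:
        print(" ", c, "was established earlier (by a rule or by the user).")

record("problem_with_battery", "Rule 2",
       ["NOT engine_turns_over", "NOT lights_come_on"])
explain_how("problem_with_battery")

"Why?" questions can be answered similarly, by reporting the goal the
system is currently trying to prove and the rule it is applying.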



Chapter 12: Applications

12.1 Natural language processing

Natural language processing (NLP) is a subfield of artificial
intelligence and linguistics. It studies the problems inherent in the
processing and manipulation of natural language, and natural language
understanding, which is devoted to making computers "understand"
statements written in human languages.

Early systems such as SHRDLU, working in restricted "blocks worlds" with
restricted vocabularies, worked extremely well, leading researchers to
excessive optimism which was soon lost when the systems were extended to
more realistic situations with real-world ambiguity and complexity.

Natural language understanding is sometimes referred to as an AI-complete
problem, because natural language recognition seems to require extensive
knowledge about the outside world and the ability to manipulate it. The
definition of "understanding" is one of the major problems in natural
language processing.

Some examples of the problems faced by natural language understanding
systems:

• The sentences We gave the monkeys the bananas because they were
hungry and We gave the monkeys the bananas because they were over-
ripe have the same surface grammatical structure. However, in one
of them the word they refers to the monkeys, in the other it refers
to the bananas: the sentence cannot be understood properly without
knowledge of the properties and behaviour of monkeys and bananas.

• A string of words may be interpreted in myriad ways. For example,
the string Time flies like an arrow may be interpreted in a variety
of ways:
o time moves quickly just like an arrow does;
o measure the speed of flying insects like you would measure
that of an arrow;
o measure the speed of flying insects like an arrow would;
o measure the speed of flying insects that are like arrows;
o a type of flying insect, "time-flies," enjoy arrows (compare
Fruit flies like a banana.)

The word "time" alone can be interpreted as three different parts of


speech, (noun in the first example, verb in 2, 3, 4, and adjective in 5).

English is particularly challenging in this regard because it has little
inflectional morphology to distinguish between parts of speech.

The major tasks in NLP:

• Text to speech
• Speech recognition
• Natural language generation
• Machine translation
• Question answering
• Information retrieval
• Information extraction
• Text-proofing
• Translation technology
• Automatic summarization

Speech synthesis:

Speech synthesis is the artificial production of human speech. A system
used for this purpose is termed a speech synthesizer, and can be
implemented in software or hardware. Speech synthesis systems are often
called text-to-speech (TTS) systems in reference to their ability to
convert text into speech. However, there exist systems that instead
render symbolic linguistic representations like phonetic transcriptions
into speech.

A text-to-speech system (or engine) is composed of two parts: a front end
and a back end. Broadly, the front end takes input in the form of text
and outputs a symbolic linguistic representation. The back end takes the
symbolic linguistic representation as input and outputs the synthesized
speech waveform. The naturalness of a speech synthesizer usually refers
to how much the output sounds like the speech of a real person. The
intelligibility of a speech synthesizer refers to how easily the output
can be understood.

The front end has two major tasks. First it takes the raw text and
converts things like numbers and abbreviations into their written-out
word equivalents. This process is often called text normalization, pre-
processing, or tokenization. Then it assigns phonetic transcriptions to
each word, and divides and marks the text into various prosodic units,
like phrases, clauses, and sentences. The process of assigning phonetic
transcriptions to words is called text-to-phoneme (TTP) or grapheme-to-
phoneme (GTP) conversion. The combination of phonetic transcriptions and
prosody information make up the symbolic linguistic representation output
of the front end.
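
As a small sketch of the text-normalization step (the abbreviation
table and digit-by-digit expansion are toy choices, not how any
particular TTS front end works):

import re

ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize(text):
    # expand abbreviations into their written-out word equivalents
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # spell out each digit ("42" -> "four two" in this toy version)
    text = re.sub(r"\d", lambda m: " " + DIGITS[int(m.group())] + " ", text)
    return " ".join(text.split())   # tidy up the spacing

print(normalize("Dr. Smith lives at 42 Elm St."))
# -> Doctor Smith lives at four two Elm Street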

The other part, the back end, takes the symbolic linguistic
representation and converts it into actual sound output. The back end is
often referred to as the synthesizer. The different techniques
synthesizers use are described below.

Synthesizer technologies:

There are two main technologies used for generating synthetic speech
waveforms: concatenative synthesis and formant synthesis.

Concatenative synthesis:

Concatenative synthesis is based on the concatenation (or stringing
together) of segments of recorded speech. Generally, concatenative
synthesis gives the most natural sounding synthesized speech. However,
natural variation in speech and automated techniques for segmenting the
waveforms sometimes result in audible glitches in the output, detracting
from the naturalness.

Formant synthesis:

Formant synthesis does not use any human speech samples at runtime.
Instead, the output synthesized speech is created using an acoustic
model. Parameters such as fundamental frequency, voicing, and noise
levels are varied over time to create a waveform of artificial speech.
This method is sometimes called Rule-based synthesis but some argue that
because many concatenative systems use rule-based components for some
parts of the system, like the front end, the term is not specific enough.

Speech recognition:

Speech recognition technologies allow computers equipped with a source of
sound input, such as a microphone, to interpret human speech, e.g., for
transcription or as an alternative method of interacting with a computer.

Classification:

Such systems can be classified according to:

• whether they require the user to "train" the system to recognise
their own particular speech patterns or not,
• whether the system is trained for one user only or is speaker
independent,
• whether the system can recognise continuous speech or requires
users to break up their speech into discrete words,
• whether the system is intended for clear speech material, or is
designed to operate on distorted transfer channels (e.g., cellular
telephones) and possibly background noise or other speaker talking
simultaneously, and
• whether the vocabulary the system recognises is small (in the order
of tens or at most hundreds of words), or large (thousands of
words).
• The context of recognition - digits, names, free sentences,
commands.

Approaches

The two most common approaches used to recognize a speaker’s response are
often called grammar constrained recognition and natural language
recognition. When ASR (automatic speech recognition) is used to
transcribe speech, it is commonly called dictation.

Natural language generation:

Natural Language Generation (NLG) is the natural language processing task
of generating natural language from a machine representation system such
as a knowledge base or a logical form.

Stages

The process to generate text can be as simple as keeping a list of canned
text that is copied and pasted, possibly linked with some glue text. The
results may be satisfactory in simple domains such as horoscope machines
or generators of personalised business letters. However, a sophisticated
NLG system needs to include stages of planning and merging of information
to enable the generation of text that looks natural and does not become
repetitive. Typical stages are:

Content determination: Determination of the salient features that are
worth being said. Methods used in this stage are related to data mining.

Discourse planning: Overall organisation of the information to convey.

Sentence aggregation: Merging of similar sentences to improve readability
and naturalness. For example, the sentences "The next train is the
Caledonian Express" and "The next train leaves Aberdeen at 10am" can be
aggregated to form "The next train, which leaves at 10am, is the
Caledonian express".

Lexicalisation: Putting words to the concepts.

Referring expression generation: Linking words in the sentences by
introducing pronouns and other types of means of reference.

Syntactic and morphological realisation: This stage is the inverse of
parsing: given all the information collected above, syntactic and
morphological rules are applied to produce the surface string.

Orthographic realisation: Matters like casing, punctuation, and
formatting are resolved.
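
As a small sketch of the aggregation stage alone (pure string
templating, not a real realiser; the function and its argument names
are invented):

def aggregate(subject, main_fact, extra_fact):
    """'X is A' + 'X leaves at T' -> 'The X, which leaves at T, is A'."""
    return "The " + subject + ", which " + extra_fact + ", " + main_fact + "."

print(aggregate("next train",
                "is the Caledonian Express",
                "leaves Aberdeen at 10am"))
# -> The next train, which leaves Aberdeen at 10am, is the Caledonian Express.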

Machine translation:

Machine translation (MT) is a form of translation where a computer
program analyses the text in one language — the "source text" — and then
attempts to produce another, equivalent text in another language — the
target text — without human intervention.

Currently the state of machine translation is such that it involves some
human intervention, as it requires a pre-editing and a post-editing
phase. Note that in machine translation, the translator supports the
machine and not the other way around.

Nowadays most machine translation systems produce what is called a
"gisting translation" — a rough translation that gives the "gist" of the
source text, but is not otherwise usable.

However, in fields with highly limited ranges of vocabulary and simple
sentence structure, for example weather reports, machine translation can
deliver useful results.

Question answering:

Question answering (QA) is a type of information retrieval. Given a
collection of documents (such as the World Wide Web or a local
collection) the system should be able to retrieve answers to questions
posed in natural language. QA is regarded as requiring more complex
natural language processing (NLP) techniques than other types of
information retrieval such as document retrieval, and it is sometimes
regarded as the next step beyond search engines.

QA research attempts to deal with a wide range of question types
including: factoid, list, definition, How, Why, hypothetical,
semantically-constrained, and cross-lingual questions. Search collections
vary from small local document collections, to internal organization
documents, to compiled newswire reports, to the world wide web.

• Closed-domain question answering deals with questions under a
specific domain (for example, medicine or automotive maintenance), and
can be seen as an easier task because NLP systems can exploit domain-
specific knowledge such as ontologies.
• Open-domain question answering deals with questions about nearly
everything, and can only rely on general ontologies and world
knowledge. On the other hand, these systems usually have much more
data available from which to extract the answer.

Information retrieval:

Information retrieval (IR) is the art and science of searching for
information in documents, searching for documents themselves, searching
for metadata which describe documents, or searching within databases,
whether relational stand alone databases or hypertext networked databases
such as the Internet or intranets, for text, sound, images or data. There
is a common confusion, however, between data retrieval, document
retrieval, information retrieval, and text retrieval, and each of these
have their own bodies of literature, theory, praxis and technologies.

IR is a broad interdisciplinary field that draws on many other
disciplines. Indeed, because it is so broad, it is normally poorly
understood, being approached typically from only one perspective or
another. It stands at the junction of many established fields, and draws
upon cognitive psychology, information architecture, information design,
human information behaviour, linguistics, semiotics, information science,
computer science and librarianship.

Automated information retrieval (IR) systems were originally used to
manage the information explosion in scientific literature in the last few
decades. Many universities and public libraries use IR systems to provide
access to books, journals, and other documents. IR systems are often
related to object and query. Queries are formal statements of information
needs that are put to an IR system by the user. An object is an entity
which keeps or stores information in a database. User queries are matched
to documents stored in a database. A document is, therefore, a data
object. Often the documents themselves are not kept or stored directly in
the IR system, but are instead represented in the system by document
surrogates.
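
As a small sketch of matching user queries to stored documents (a crude
tf-idf-style weighting over an invented three-document collection):

import math

docs = {
    "d1": "expert systems use rules and facts",
    "d2": "speech synthesis converts text to speech",
    "d3": "rules drive the inference engine of expert systems",
}

doc_words = {name: set(text.split()) for name, text in docs.items()}
doc_freq = {}
for words in doc_words.values():
    for w in words:
        doc_freq[w] = doc_freq.get(w, 0) + 1

def score(query, name):
    """Sum of idf weights of the query terms the document contains."""
    return sum(math.log(len(docs) / doc_freq[t])
               for t in query.split()
               if t in doc_words[name])

query = "expert rules"
ranked = sorted(docs, key=lambda name: score(query, name), reverse=True)
print(ranked)   # the documents about expert systems rank first
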
Information extraction:

Information extraction (IE) is a type of information retrieval whose goal
is to automatically extract structured or semistructured information from
unstructured machine-readable documents.

A typical application of IE is to scan a set of documents written in a
natural language and populate a database with the information extracted.
Current approaches to IE use natural language processing techniques that
focus on very restricted domains.

Typical subtasks of IE are:

• Named Entity Recognition: recognition of entity names (for people
and organizations), place names, temporal expressions, and certain
types of numerical expressions.
• Coreference: identification of chains of noun phrases that refer to
the same object. For example, anaphora are a type of coreference.
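
As a small sketch of both subtasks (regular expressions standing in for
a real recogniser, and a deliberately naive pronoun-resolution rule):

import re

text = "Ada Lovelace met Charles Babbage in 1833. She studied his engine."

# named entities: capitalised two-word names and four-digit years
people = re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", text)
years = re.findall(r"\b1[0-9]{3}\b", text)
print(people)   # ['Ada Lovelace', 'Charles Babbage']
print(years)    # ['1833']

# toy coreference: resolve a pronoun to the most recent person mention
print("'his' refers to", people[-1] if people else "nothing")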

Proofreading:

Proofreading means reading a proof copy of a text in order to detect and
correct any errors. A proof copy is traditionally a version of a
manuscript that has been typeset after copy editing, but the line between
copy editing and proofreading is thin. Proof manuscripts often contain
typographical errors introduced during typesetting by mistyping (hence
the word "typo" to refer to misplaced or incorrect characters). Earlier,
when handwritten originals were common, it was often easier for a copy
editor to review and mark up a manuscript after it had been typeset.
Today, proofreading usually refers to reviewing any text, hardcopy or
electronic copy (on a computer), and checking for all types of errors.

Translation:

Translation is an activity comprising the interpretation of the meaning
of a text in one language — the source text — and the production of a
new, equivalent text in another language — called the target text, or the
translation.

Traditionally, translation has been a human activity, although attempts
have been made to automate and computerise the translation of natural
language texts — machine translation — or to use computers as an aid to
translation — computer-assisted translation.

The goal of translation is to establish a relationship of equivalence
between the source and the target texts (that is to say to ensure that
both texts communicate the same message), while taking into account a
number of constraints. These constraints include context, the rules of
grammar of the source language, its writing conventions, its idioms and
the like.

Automatic summarization:

Automatic Summarization is the creation of a shortened version of a text
by a computer program. The product of this procedure still contains the
most important points of the original text.

Access to coherent and correctly-developed text summaries can be of great
use, especially in our time of information overload, in which the amount
of information electronically available to us grows every day. A good
example of the use of summarization technology could be search engines
such as Google.

Technologies that can make a coherent summary, of any kind of text, need
to take into account several variables such as length, writing-style and
syntax to make a useful summary.
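
As a small sketch of the extractive approach (score each sentence by the
frequency of its words in the whole text and keep the best ones; the
splitting and scoring here are deliberately crude):

import re
from collections import Counter

def summarize(text, n_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(sentences, reverse=True,
                    key=lambda s: sum(freq[w]
                                      for w in re.findall(r"[a-z']+", s.lower())))
    return " ".join(scored[:n_sentences])

sample = ("Robots help people. Robots do dull and dangerous work. "
          "Some robots learn from people.")
print(summarize(sample))   # prints the highest-scoring sentence
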
Some problems which make NLP difficult
Word boundary detection:
In spoken language, there are usually no gaps between words; where
to place the word boundary often depends on what choice makes the
most sense grammatically and given the context. In written form,
languages like Chinese do not signal word boundaries either.
Word sense disambiguation:
Many words have more than one meaning; we have to select the
meaning which makes the most sense in context.
Syntactic ambiguity:
The grammar for natural languages is not unambiguous, i.e. there
are often multiple possible parse trees for a given sentence.
Choosing the most appropriate one usually requires semantic and
contextual information.
Imperfect or irregular input:
Foreign or regional accents and vocal impediments in speech; typing
or grammatical errors, OCR errors in texts.
Speech acts and plans:
Sentences often don't mean what they literally say; for instance a
good answer to "Can you pass the salt" is to pass the salt; in most
contexts "Yes" is not a good answer, although "No" is better and
"I'm afraid that I can't see it" is better yet. Or again, if a
class was not offered last year, "The class was not offered last
year" is a better answer to the question "How many students failed
the class last year?" than "None" is.

12.2 Perception

In psychology and the cognitive sciences, perception is the process of
acquiring, interpreting, selecting, and organizing sensory information.
Methods of studying perception range from essentially biological or
physiological approaches, through psychological approaches to the often
abstract 'thought-experiments' of mental philosophy.

The senses

Human perception depends on the senses. The classical five senses are
sight, hearing, smell, taste and touch. Along with these there are at
least four other senses: proprioception (body awareness),
equilibrioception (balance), thermoception (heat) and nociception (pain).
Beyond these, some believe in the existence of other senses such as
precognition (or foretelling) or telepathy (direct communication between
human minds/brains without transmittance through any other medium). While
these are controversial, it is known that animals of other species
possess senses that are not found in humans: for example, some fish can
detect electric fields, while pigeons have been shown to detect magnetic
fields and to use them in homing.

History of the study of perception

Perception is one of the oldest fields within scientific psychology, and
there are correspondingly many theories about its underlying processes.
The oldest quantitative law in psychology is the Weber-Fechner law, which
quantifies the relationship between the intensity of physical stimuli and
their perceptual effects. It was the study of perception that gave rise
to the Gestalt school of psychology, with its emphasis on holistic
approaches.

Perception and reality

Many cognitive psychologists hold that, as we move about in the world, we
create a model of how the world works. That is, we sense the objective
world, but our sensations map to percepts, and these percepts are
provisional, in the same sense that scientific hypotheses are provisional
(cf. in the scientific method). As we acquire new information, our
percepts shift. Abraham Pais' biography refers to the 'esemplastic'
nature of imagination. In the case of visual perception, some people can
actually see the percept shift in their mind's eye. Others who are not
picture thinkers, may not necessarily perceive the 'shape-shifting' as
their world changes. The 'esemplastic' nature has been shown by
experiment: an ambiguous image has multiple interpretations on the
perceptual level. Just as one object can give rise to multiple percepts,
so an object may fail to give rise to any percept at all: if the percept
has no grounding in a person's experience, the person may literally not
perceive it.

This confusing ambiguity of perception is exploited in human technologies
such as camouflage, and also in biological mimicry, for example by
Peacock butterflies, whose wings bear eye markings that birds respond to
as though they were the eyes of a dangerous predator.

Cognitive theories of perception assume there is a poverty of stimulus.
This (with reference to perception) is the claim that sensations are, by
themselves, unable to provide a unique description of the world.
Sensations require 'enriching', which is the role of the mental model. A
different type of theory is the perceptual ecology approach of James J.
Gibson. Gibson rejected the assumption of a poverty of stimulus by
rejecting the notion that perception is based in sensations. Instead, he
investigated what information is actually presented to the perceptual
systems. He (and the psychologists who work within this paradigm)
detailed how the world could be specified to a mobile, exploring organism
via the lawful projection of information about the world into energy
arrays. Specification is a 1:1 mapping of some aspect of the world into a
perceptual array; given such a mapping, no enrichment is required and
perception is direct.

The philosophy of perception concerns how mental processes and symbols
depend on the world internal and external to the perceiver. Our
perception of the external world begins with the senses, which lead us to
generate empirical concepts representing the world around us, within a
mental framework relating new concepts to preexisting ones. Because
perception leads to an individual's impression of the world, its study
may be important for those interested in better understanding
communication, self, id, ego — even reality.

A major issue in the philosophy of perception is the possibility of
discrepancies between the external world and the perceiver's impressions,
which are sometimes referred to as qualia. While René Descartes concluded
that the question "Do I exist?" can only be answered in the affirmative
(cogito ergo sum), Freudian psychology suggests that self-perception is
an illusion of the ego, and cannot be trusted to decide what is in fact
real. Such questions are continuously reanimated, as each generation
grapples with the nature of existence from within the human condition.
The questions remain: Do our perceptions allow us to experience the world
as it "really is?" Can we ever know another point of view in the way we
know our own?

Categories of perception

We can categorize perception as internal or external.

• Internal perception (proprioception) tells us what's going on in
our bodies. We can sense where our limbs are, whether we're sitting
or standing; we can also sense whether we are hungry, or tired, and
so forth.
• External or Sensory perception (exteroception), tells us about the
world outside our bodies. Using our senses of sight, hearing,
touch, smell, and taste, we discover colors, sounds, textures, etc.
of the world at large.

The philosophy of perception is mainly concerned with exteroception. When
philosophers use the word perception they usually mean exteroception, and
the word is used in that sense everywhere.

Visual perception:

Visual perception is one of the senses, consisting of the ability to
detect light and interpret (see) it as the perception known as sight or
naked eye vision. Vision has a specific sensory system, the visual
system.

There is disagreement as to whether or not this constitutes one, two or
even three distinct senses. Some people make a distinction between "black
and white" vision and the perception of colour, and others point out that
vision using rod cells uses different physical detectors on the retina
from cone cells. Some argue that the perception of depth also constitutes
a sense, but others argue that this is really cognition (that is, post-
sensory) function derived from having stereoscopic vision (two eyes) and
is not a sensory perception as such. Many people are also able to
perceive the polarization of light.

12.3 Robotics

Introduction to Robotics:

The robots of the movies, such as C-3PO and the Terminator are portrayed
as fantastic, intelligent, even dangerous forms of artificial life.
However, robots of today are not exactly the walking, talking intelligent
machines of movies, stories and our dreams. Today, we find most robots
working for people in factories, warehouses, and laboratories. In the
future, robots may show up in other places: our schools, our homes, even
our bodies.

Robots have the potential to change our economy, our health, our standard
of living, our knowledge and the world in which we live. As the
technology progresses, we are finding new ways to use robots. Each new
use brings new hope and possibilities, but also potential dangers and
risks.

What is a Robot?

Joseph Engelberger, a pioneer in industrial robotics, once remarked "I
can't define a robot, but I know one when I see one." If you consider all
the different machines people call robots, you can see that it's nearly
impossible to come up with a comprehensive definition.

In practical usage, a robot is an autonomous or semi-autonomous device
which performs its tasks either by direct human control, partial control
with human supervision, or completely autonomously. Robots are typically
used to do tasks that are too dull, dirty, or dangerous for humans.
Industrial robots used in manufacturing lines used to be the most common
form of robots, but that has recently been replaced by consumer robots
cleaning floors and mowing lawns. Other applications include toxic waste
cleanup, underwater and space exploration, surgery, mining, search and
rescue, and mine finding. Robots are also finding their way into
entertainment and home health care.

Examples:

• R2D2 and C-3PO: The intelligent, speaking robots with loads of
personality in the Star Wars movies
• Sony's AIBO: A robotic dog that learns through human interaction
• Honda's ASIMO: A robot that can walk on two legs like a person
• Industrial robots: Automated machines that work on assembly lines
• Bomb-defusing robots
• NASA's Mars rovers
• The lawn-mowing robot from Friendly Robotics

Robot Basics:

Most robots are designed to be a helping hand. They help people with
tasks that would be difficult, unsafe, or boring for a real person to do
alone.

At its simplest, a robot is a machine that can be programmed to perform a
variety of jobs, which usually involve moving or handling objects. Robots
can range from simple machines to highly complex, computer-controlled
devices.

Many of today's robots are robotic arms. Here we will focus on one
very "flexible" kind of robot, which looks similar to a certain part of
your body: the jointed-arm robot.
The vast majority of robots do have several qualities in common. First of
all, almost all robots have a movable body. Some only have motorized
wheels, and others have dozens of movable segments, typically made of
metal or plastic. Like the bones in your body, the individual segments
are connected together with joints.

Robots spin wheels and pivot jointed segments with some sort of actuator.
Some robots use electric motors and solenoids as actuators; some use a
hydraulic system; and some use a pneumatic system (a system driven by
compressed gases). Robots may use all these actuator types.

A robot needs a power source to drive these actuators. Most robots either
have a battery or they plug into the wall. Hydraulic robots also need a
pump to pressurize the hydraulic fluid, and pneumatic robots need an air
compressor or compressed air tanks.

The actuators are all wired to an electrical circuit. The circuit powers
electrical motors and solenoids directly, and it activates the hydraulic
system by manipulating electrical valves. The valves determine the
pressurized fluid's path through the machine. To move a hydraulic leg,
for example, the robot's controller would open the valve leading from the
fluid pump to a piston cylinder attached to that leg. The pressurized
fluid would extend the piston, swiveling the leg forward. Typically, in
order to move their segments in two directions, robots use pistons that
can push both ways.

The robot's computer controls everything attached to the circuit. To move
the robot, the computer switches on all the necessary motors and valves.
Most robots are reprogrammable -- to change the robot's behavior, you
simply write a new program to its computer.

Not all robots have sensory systems, and few have the ability to see,
hear, smell or taste. The most common robotic sense is the sense of
movement -- the robot's ability to monitor its own motion. A standard
design uses slotted wheels attached to the robot's joints. An LED on one
side of the wheel shines a beam of light through the slots to a light
sensor on the other side of the wheel. When the robot moves a particular
joint, the slotted wheel turns. The slots break the light beam as the
wheel spins. The light sensor reads the pattern of the flashing light and
transmits the data to the computer. The computer can tell exactly how far
the joint has swiveled based on this pattern. This is the same basic
system used in computer mice.
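
As a small sketch of that counting arithmetic (the slot count is an
assumption, not a standard value):

SLOTS_PER_REVOLUTION = 360   # hypothetical wheel with one-degree slots

def joint_angle(pulse_count):
    """Degrees the joint turned, given pulses counted by the sensor."""
    return 360.0 * pulse_count / SLOTS_PER_REVOLUTION

print(joint_angle(90))   # 90 pulses -> the joint swiveled 90 degrees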

These are the basic nuts and bolts of robotics. Roboticists can combine
these elements in an infinite number of ways to create robots of
unlimited complexity. In the next section, we'll look at one of the most
popular designs, the robotic arm.

Main Parts:

For a machine to qualify as a robot, it usually needs these 5 parts:

• Controller

Every robot is connected to a computer, which keeps the pieces of the arm
working together. This computer is known as the controller. The
controller functions as the "brain" of the robot. The controller also
allows the robot to be networked to other systems, so that it may work
together with other machines, processes, or robots. Robots today have
controllers that are run by programs - sets of instructions written in
code. Almost all robots of today are entirely pre-programmed by people;
they can do only what they are programmed to do at the time, and nothing
else. In the future, controllers with artificial intelligence, or AI,
could allow robots to think on their own, even program themselves. This
could make robots more self-reliant and independent.

• Arm

Robot arms come in all shapes and sizes. The arm is the part of the robot
that positions the end-effector and sensors to do their pre-programmed
business. Many (but not all) resemble human arms, and have shoulders,
elbows, wrists, even fingers. This gives the robot a lot of ways to
position itself in its environment. Each joint is said to give the robot
1 degree of freedom. So, a simple robot arm with 3 degrees of freedom
could move in 3 ways: up and down, left and right, forward and backward.
Most working robots today have 6 degrees of freedom.

• Drive

The drive is the "engine" that drives the links (the sections between the
joints) into their desired position. Without a drive, a robot would just
sit there, which is not often helpful. Most drives are powered by air,
water pressure, or electricity.

• End Effector

The end-effector is the "hand" connected to the robot's arm. It is often


different from a human hand - it could be a tool such as a gripper, a
vacuum pump, tweezers, scalpel, blowtorch - just about anything that
helps it do its job. Some robots can change end-effectors, and be
reprogrammed for a different set of tasks. If the robot has more than one
arm, there can be more than one end-effector on the same robot, each
suited for a specific task.

• Sensor

Most robots of today are nearly deaf and blind. Sensors can provide some
limited feedback to the robot so it can do its job. Compared to the
senses and abilities of even the simplest living things, robots have a
very long way to go. The sensor sends information, in the form of
electronic signals back to the controller. Sensors also give the robot
controller information about its surroundings and lets it know the exact
position of the arm, or the state of the world around it. Sight, sound,
touch, taste, and smell are the kinds of information we get from our
world. Robots can be designed and programmed to get specific information
that is beyond what our 5 senses can tell us. For instance, a robot
sensor might "see" in the dark, detect tiny amounts of invisible
radiation or measure movement that is too small or fast for the human eye
to see.

The Robotic Arm:


The term robot comes from the Czech word robota, generally translated as
"forced labor." This describes the majority of robots fairly well. Most
robots in the world are designed for heavy, repetitive manufacturing
work. They handle tasks that are difficult, dangerous or boring to human
beings.

The most common manufacturing robot is the robotic arm. A typical robotic
arm is made up of seven metal segments, joined by six joints. The
computer controls the robot by rotating individual step motors connected
to each joint (some larger arms use hydraulics or pneumatics). Unlike
ordinary motors, step motors move in exact increments. This allows the
computer to move the arm very precisely, repeating exactly the same
movement over and over again. The robot uses motion sensors to make sure
it moves just the right amount.

An industrial robot with six joints closely resembles a human arm -- it
has the equivalent of a shoulder, an elbow and a wrist. Typically, the
shoulder is mounted to a stationary base structure rather than to a
movable body. This type of robot has six degrees of freedom, meaning it
can pivot in six different ways. A human arm, by comparison, has seven
degrees of freedom.
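
To see how joint angles determine where the end of the arm ends up, here
is the forward kinematics of a much simpler two-joint planar arm (the
link lengths are made-up; a real six-joint arm needs a 3-D version of
the same idea):

import math

def hand_position(theta1, theta2, l1=1.0, l2=0.8):
    """End-effector (x, y) for shoulder angle theta1 and elbow angle
    theta2, both in radians, with link lengths l1 and l2."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

print(hand_position(math.pi / 2, -math.pi / 2))   # (0.8, 1.0)
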
Your arm's job is to move your hand from place to place. Similarly, the
robotic arm's job is to move an end effector from place to place. You can
outfit robotic arms with all sorts of end effectors, which are suited to
a particular application. One common end effector is a simplified version
of the hand, which can grasp and carry different objects. Robotic hands
often have built-in pressure sensors that tell the computer how hard the
robot is gripping a particular object. This keeps the robot from dropping
or breaking whatever it's carrying. Other end effectors include
blowtorches, drills and spray painters.

Everyday robot tasks:

Although robots can't do every type of job, there are certain tasks
robots do very well:

• Assembling products
• Handling dangerous materials
• Spraying finishes
• Inspecting parts, produce, and livestock
• Cutting and polishing

In contemporary manufacturing, fewer people are doing these tasks, as
robots fill this niche.

The Future: AI and Robotics


Artificial intelligence (AI) is arguably the most exciting field in
robotics. It's certainly the most controversial: Everybody agrees that a
robot can work in an assembly line, but there's no consensus on whether a
robot can ever be intelligent.

Like the term "robot" itself, artificial intelligence is hard to define.


Ultimate AI would be a recreation of the human thought process -- a man-
made machine with our intellectual abilities. This would include the
ability to learn just about anything, the ability to reason, the ability
to use language and the ability to formulate original ideas. Roboticists
are nowhere near achieving this level of artificial intelligence, but
they have made a lot of progress with more limited AI. Today's AI
machines can replicate some specific elements of intellectual ability.

Computers can already solve problems in limited realms. The basic idea of
AI problem-solving is very simple, though its execution is complicated.
First, the AI robot or computer gathers facts about a situation through
sensors or human input. The computer compares this information to stored
data and decides what the information signifies. The computer runs
through various possible actions and predicts which action will be most
successful based on the collected information. Of course, the computer
can only solve problems it's programmed to solve -- it doesn't have any
generalized analytical ability. Chess computers are one example of this
sort of machine.
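
As a small sketch of that gather-compare-predict loop (the situations,
actions and success rates are all invented):

stored_outcomes = {   # (situation, action) -> past success rate
    ("obstacle_ahead", "turn_left"):  0.9,
    ("obstacle_ahead", "turn_right"): 0.4,
    ("obstacle_ahead", "go_forward"): 0.0,
}

def choose_action(situation, actions):
    """Predict the action most likely to succeed from stored data."""
    return max(actions, key=lambda a: stored_outcomes.get((situation, a), 0.0))

print(choose_action("obstacle_ahead",
                    ["turn_left", "turn_right", "go_forward"]))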

Some modern robots also have the ability to learn in a limited capacity.
Learning robots recognize if a certain action (moving its legs in a
certain way, for instance) achieved a desired result (navigating an
obstacle). The robot stores this information and attempts the successful
action the next time it encounters the same situation. Again, modern
computers can only do this in very limited situations. They can't absorb
any sort of information like a human can. Some robots can learn by
mimicking human actions. In Japan, roboticists have taught a robot to
dance by demonstrating the moves themselves.
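
The learning just described can be sketched as nothing more than
updating a table of experiences after each attempt (again, the
situations and actions here are invented):

from collections import defaultdict

experience = defaultdict(float)   # (situation, action) -> successes seen

def record_attempt(situation, action, succeeded):
    """After each trial, remember whether the action worked."""
    if succeeded:
        experience[(situation, action)] += 1

def best_known_action(situation, actions):
    return max(actions, key=lambda a: experience[(situation, a)])

record_attempt("step_obstacle", "lift_leg_high", True)
record_attempt("step_obstacle", "walk_normally", False)
print(best_known_action("step_obstacle", ["lift_leg_high", "walk_normally"]))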

The real challenge of AI is to understand how natural intelligence works.
Developing AI isn't like building an artificial heart -- scientists don't
have a simple, concrete model to work from. We do know that the brain
contains billions and billions of neurons, and that we think and learn by
establishing electrical connections between different neurons. But we
don't know exactly how all of these connections add up to higher
reasoning, or even low-level operations. The complex circuitry seems
incomprehensible. Because of this, AI research is largely theoretical.
Scientists hypothesize on how and why we learn and think, and they
experiment with their ideas using robots.
