IS Notes
What is AI?
Definitions of AI vary across textbooks; two representative examples:
“The study of how to make computers do things which, at the moment,
people do better.” (Rich and Knight, 1991)
“AI … is concerned with intelligent behavior in artifacts.” (Nilsson,
1998)
If we are going to say that a given program thinks like a human, we must
have some way of determining how humans think. We need to get inside the
actual workings of human minds. There are two ways to do this: through
introspection -trying to catch our own thoughts as they go by -and
through psychological experiments. The interdisciplinary field of
cognitive science brings together computer models from AI and
experimental techniques from psychology to try to construct precise and
testable theories of the workings of the human mind.
The Greek philosopher Aristotle was one of the first to attempt to codify
“right thinking”, that is, irrefutable reasoning processes. His
syllogisms provided patterns for argument structures that always yielded
correct conclusions when given correct premises-for example, ”Socrates is
a man; all men are mortal; therefore, Socrates is mortal.” These laws of
thought were supposed to govern the operation of the mind; their study
initiated the field called logic.
Logicians in the 19th century developed a precise notation for
statements about all kinds of things in the world and about the relations
among them. By 1965, programs existed that could, in principle, solve any
solvable problem described in logical notation. The so-called logicist
tradition within artificial intelligence hopes to build on such programs
to create intelligent systems.
There are two main obstacles to this approach. First, it is not
easy to take informal knowledge and state it in the formal terms
required by logical notation, particularly when the knowledge is less than
100% certain. Second, there is a big difference between being able to
solve a problem “in principle” and doing so in practice.
An agent is just something that acts (agent comes from the Latin agere,
to do). But computer agents are expected to have other attributes that
distinguish them from mere “programs”, such as operating under autonomous
control, perceiving their environment, persisting over a prolonged time
period, adapting to change, and being capable of taking on another’s
goals. A rational agent is one that acts so as to achieve the best
outcome or, when there is uncertainty, the best expected outcome.
Making correct inferences is sometimes part of being a rational
agent, because one way to act rationally is to reason logically to the
conclusion that a given action will achieve one’s goals and then to act
on that conclusion.
Abridged history of AI
• 1943 McCulloch & Pitts: Boolean circuit model of brain
• 1950 Turing's "Computing Machinery and Intelligence"
• 1956 Dartmouth meeting: "Artificial Intelligence" adopted
• 1952—69 Look, Ma, no hands!
• 1950s Early AI programs, including Samuel's checkers
program, Newell & Simon's Logic Theorist,
Gelernter's Geometry Engine
• 1965 Robinson's complete algorithm for logical reasoning
• 1966—73 AI discovers computational complexity
Neural network research almost disappears
• 1969—79 Early development of knowledge-based systems
• 1980-- AI becomes an industry
• 1986-- Neural networks return to popularity
• 1987-- AI becomes a science
• 1995-- The emergence of intelligent agents(systems)
State of the art
• Game playing: IBM’s Deep Blue defeated the reigning world chess
champion Garry Kasparov in 1997. The value of IBM’s stock increased
by $18 billion.
• Mathematics: Proved a mathematical conjecture (Robbins conjecture)
unsolved for decades.
• Autonomous control: The ALVINN computer vision system was trained
to steer a car to keep it following a lane. It was placed in CMU’s
NAVLAB computer-controlled minivan and used to navigate across the
United States from Pittsburgh to San Diego. For 2850 miles it was
in control of the vehicle 98% of the time. NAVLAB has video cameras
that transmit road images to ALVINN, which then computes the best
direction to steer, based on experience from previous training
runs.
• Diagnosis: Medical diagnosis programs based on probabilistic
analysis have been able to perform at the level of an expert
physician in several areas of medicine.
• Logistics planning: During the 1991 Gulf War, US forces deployed an
AI logistics planning and scheduling program (DART) to do automated
logistics planning and scheduling for transportation that involved
up to 50,000 vehicles, cargo, and people.
• Autonomous planning and scheduling: NASA's on-board autonomous
planning program controlled the scheduling of operations for a
spacecraft.
• Language understanding and problem solving: Proverb is a computer
program that solves crossword puzzles better than most humans.
• Robotics: Many surgeons now use robot assistants (like HipNav) in
microsurgery.
Chapter 2: Intelligent Agents
A rational agent is one that does the right thing. Obviously, this is
better than doing the wrong thing, but what does it mean? As a first
approximation, we will say that the right action is the one that will
cause the agent to be most successful. That leaves us with the problem of
deciding how and when to evaluate the agent’s success.
We use the term performance measure for the how: the criteria that
determine how successful an agent is. As an example, consider the case
of an agent that is supposed to vacuum a dirty floor. A plausible
performance measure would be the amount of dirt cleaned up in a single
eight-hour shift. A more sophisticated performance measure would factor
in the amount of electricity consumed and the amount of noise generated
as well. A third performance measure might give highest marks to an
agent that not only cleans the floor quietly and efficiently, but also
finds time to go windsurfing at the weekend.
Consider a clock: we might say it always “does the right thing” by
displaying the correct time. Well, this is not quite true. If the clock
and its owner take a trip from California to Australia, the right thing
for the clock to do would be to turn itself back six hours. We do not get
upset at our clocks for failing to do this, because we realize that they
are acting rationally, given their lack of perceptual equipment.
Autonomy
There is one more thing to deal with in the definition of an ideal
rational agent: the “built-in knowledge” part. If the agent’s actions
are based completely on built-in knowledge, such that it need pay no
attention to its percepts, then we say that the agent lacks autonomy.
An agent’s behavior can be based on both its own experience and the
built-in knowledge used in constructing the agent for the particular
environment in which it operates. A system is autonomous to the extent
that its behavior is determined by its own experience.
Autonomy not only fits in with our intuition, but it is also an example
of sound engineering practice. An agent that operates on the basis of
built-in assumptions will only operate successfully when those
assumptions hold, and thus lacks flexibility.
Before we design an agent program, we must have a pretty good idea of the
possible percepts and actions, what goals or performance measure the
agent is supposed to achieve, and what sort of environment it will
operate in. These come in a wide variety. The figure below shows the
basic elements for a selection of agent types.
AGENT | PERCEPTS | ACTIONS | GOALS | ENVIRONMENT
Medical diagnosis system | Symptoms, findings, patient’s answers | Questions, tests, treatments | Healthy patient, minimize costs | Patient, hospital
Satellite image analysis system | Pixels of varying intensity, color | Print a categorization of scene | Correct categorization | Images from orbiting satellite
Part-picking robot | Pixels of varying intensity | Pick up parts and sort into bins | Place parts in correct bins | Conveyor belt with parts
Refinery controller | Temperature, pressure readings | Open, close valves; adjust temperature | Maximize purity, yield, safety | Refinery
Interactive English tutor | Typed words | Print exercises, suggestions, corrections | Maximize student’s score on test | Set of students
An Example
At this point, it will be helpful to consider a particular environment,
so that our discussion can become more concrete. We will look at the job
of designing an automated taxi driver.
We must first think about the percepts, actions, goals and environment
for the taxi. They are summarized in Figure below.
What performance measure would we like our automated driver to aspire to?
Desirable qualities include getting to the correct destination;
minimizing violations of traffic laws and disturbances to other drivers;
maximizing safety and passenger comfort; maximizing profits. Obviously,
some of these goals conflict, so there will be trade-offs involved.
Agents that keep track of the world / Reflex agent with internal
state:
The simple reflex agent described before will work only if the correct
decision can be made on the basis of the current percept. If the car in
front is a recent model, and has the centrally mounted brake light now
required in the United States, then it will be possible to tell if it is
braking from a single image. Unfortunately, older models have different
configurations of tail lights, brake lights, and turn-signal lights, and
it is not always possible to tell if the car is braking. Thus, even for
the simple braking rule, our driver will have to maintain some sort of
internal state in order to choose an action.
Figure gives the structure of the reflex agent, showing how the current
percept is combined with the old internal state to generate the updated
description of the current state.
Goal-based agents:
Knowing about the current state of the environment is not always enough
to decide what to do. For example, at a road junction, the taxi can turn
left, right, or go straight on. The right decision depends on where the
taxi is trying to get to. In other words, as well as a current state
description, the agent needs some sort of goal information, which
describes situations that are desirable- for example, being at the
passenger’s destination. The agent program can combine this with
information about the results of possible actions (the same information
as was used to update internal state in the reflex agent) in order to
choose actions that achieve the goal.
Utility-based agents:
Goals alone are not enough to generate high-quality behavior. A utility
function maps a state (or sequence of states) onto a real number
describing the associated degree of happiness; this allows rational
decisions when goals conflict or when their achievement is uncertain.
2.3 Environments
Properties of environments
Environments come in several flavors. The principal distinctions to be
made are as follows: accessible vs. inaccessible, deterministic vs.
nondeterministic, episodic vs. nonepisodic, static vs. dynamic, and
discrete vs. continuous.
State space search: The initial state and the successor function
implicitly define the state space of the problem--the set of all states
reachable from the initial state. The state space forms a graph in which
the nodes are states and the arcs between the nodes are actions.
Searching the state space to get a path from the initial state to the
final state is called state space search.
Give the depth first search algorithm and state its advantages and disadvantages. [ BU Dec
2004 ]
Advantages
• Depth-first search requires less memory since only the nodes on the
current path are stored.
• By chance (or if care is taken in ordering the alternative
successor states), depth-first search may find a solution without
examining much of the search space at all.
Disadvantages
• It has overhead of backtracking.
• It may or may not return an optimal solution.
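For concreteness, a minimal C++ sketch of depth-first search over a state space represented as an adjacency list (the integer-node graph representation is an assumption made for illustration):

    #include <vector>
    #include <stack>

    // Depth-first search: the LIFO frontier always expands the most
    // recently generated node, driving the search deep before wide.
    bool depthFirstSearch(const std::vector<std::vector<int>>& adj,
                          int start, int goal) {
        std::vector<bool> visited(adj.size(), false);
        std::stack<int> frontier;
        frontier.push(start);
        while (!frontier.empty()) {
            int state = frontier.top();
            frontier.pop();
            if (state == goal) return true;      // a path to the goal exists
            if (visited[state]) continue;
            visited[state] = true;
            for (int succ : adj[state])          // expand successor states
                if (!visited[succ]) frontier.push(succ);
        }
        return false;                            // goal unreachable
    }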
1. Hill Climbing:
Describe the Hill Climbing Algorithm. What are the problems in Hill Climbing Algorithms?
Suggest methods to overcome them. [ BU Dec 2004, May 2005, Dec 2005 ]
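As a sketch of the basic idea (the neighbour-generation and evaluation functions here are hypothetical parameters, not fixed by the question): hill climbing keeps moving to the best-valued neighbour and stops when no neighbour improves on the current state, which is exactly where the problems of local maxima, plateaux, and ridges arise; random restarts, backtracking, or probabilistic moves as in simulated annealing are the usual remedies.

    #include <vector>
    #include <functional>

    // Steepest-ascent hill climbing: repeatedly move to the best neighbour
    // until none is better than the current state (a local maximum).
    template <typename State>
    State hillClimb(State current,
                    std::function<std::vector<State>(const State&)> neighbours,
                    std::function<double(const State&)> value) {
        while (true) {
            State best = current;
            for (const State& n : neighbours(current))
                if (value(n) > value(best)) best = n;
            if (value(best) <= value(current))
                return current;   // stuck: local maximum, plateau, or ridge
            current = best;
        }
    }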
3. A* Algorithm:
Refer class notes for execution and cases.
Refer class notes for the proof of admissibility of A* algorithm. [ BU May
2004, May 2005 ]
4. AO* Algorithm:
1. Start with the initial state. Also call it as the current state
for this iteration.
2. Loop the following until the problem is solved or futility limit
is exceeded.
i. If it is the first iteration then expand all successors of
the initial state else expand the nodes on the path
selected as best in the previous iteration.
ii. Propagate the values of all the newly generated nodes till
the root.
a. Let S be a set of nodes that have been newly generated.
b. Loop the following until S is empty or null
i. Select from S, a node, none of whose descendants
in graph, also occur in S.
ii. Compute the value of the selected node by:
Heuristic function ( For leaf nodes ) OR
Summation of cost of its children + cost of the
arcs
iii. In case of multiple values, select the best
(i.e. least)
iv. Remove selected node from S & add its ancestor
to S.
3. At the root, make a choice for the best path for next iteration.
To specify the agent’s task, we specify its percepts, actions, and goals.
In the wumpus world, these are as follows:
Percepts:
• In the square containing the wumpus and in the directly (not
diagonally) adjacent squares the agent will perceive a stench.
• In the squares directly adjacent to a pit, the agent will perceive
a breeze.
• In the square where the gold is, the agent will perceive a glitter.
• When an agent walks into a wall, it will perceive a bump.
• When the wumpus is killed, it gives out a woeful scream that can be
perceived anywhere in the cave.
• The percepts will be given to the agent in the form of a list of
five symbols; for example, if there is a stench, a breeze, and a
glitter but no bump and no scream, the agent will receive the
percept [Stench,Breeze,Glitter,None,None]. The agent cannot
perceive its own location.
Actions:
• Go forward, turn right by 90°, and turn left by 90°. In addition,
the action Grab can be used to pick up an object that is in the
same square as the agent. The action Shoot can be used to fire an
arrow in a straight line in the direction the agent is facing. The
arrow continues until it either hits and kills the wumpus or hits
the wall. The agent only has one arrow, so only the first Shoot
action has any effect. Finally, the action Climb is used to leave
the cave; it is effective only when the agent is in the start
square.
• The agent dies a miserable death if it enters a square containing a
pit or a live wumpus. It is safe (but smelly) to enter a square
with a dead wumpus.
Goals:
• The agent’s goal is to find the gold and bring it back to the start
as quickly as possible, without getting killed. To be precise, 1000
points are awarded for climbing out of the cave while carrying the
gold, but there is a 1-point penalty for each action taken, and a
10,000-point penalty for getting killed.
The first percept is [None, None, None, None, None], from which the agent
can conclude that its neighboring squares are safe. From the fact that
there is no stench or breeze in [1,1], the agent can infer that [1,2] and
[2,1] are free of dangers. Let the agent decide to move forward to [2,1].
The agent detects a breeze in [2,1], so there must be a pit in a
neighboring square. The pit cannot be in [1,1], by the rules of the game,
so there must be a pit in [2,2] or [3,1] or both. At this point there is
only one square that is safe and has not been visited yet, i.e. [1,2]. So
the prudent agent will turn around, go back to [1,1], and then proceed to
[1,2]. The new percept in [1,2] is [Stench, None, None, None, None]. The
stench in [1,2] means that there must be a wumpus nearby. But the wumpus
cannot be in [1,1], by the rules of the game, and it cannot be in [2,2]
(or the agent would have detected a stench when it was in [2,1]).
Therefore, the agent can infer that the wumpus is in [1,3]. Moreover, the
lack of breeze in [1,2] implies that there is no pit in [2,2]. Yet we
already inferred that there must be a pit in either [2,2] or [3,1], so this
means it must be in [3,1]. This is a fairly difficult inference, because
it combines knowledge gained at different times in different places and
relies on the lack of percept to make a crucial step. The agent has now
proved to itself that there is neither a pit nor a wumpus in [2,2], so it
is safe to move there. I am assuming here that the agent will then move
to [2,3] and perceive glitter and grab the gold.
In each case where the agent draws a conclusion from the available
information, that conclusion is guaranteed to be correct if the available
information is correct.
4.3 Propositional Logic: A Very Simple Logic
Syntax
The syntax of propositional logic is simple. The symbols of propositional
logic are the logical constants True and False, proposition symbols such as
P and Q, the logical connectives Λ, V, ⇔, =>, and ¬, and parentheses,
(). All sentences are made by putting these symbols together using the
following rules:
• The logical constants True and False are sentences by themselves.
• A propositional symbol such as P or Q is a sentence by itself.
• Wrapping parentheses around a sentence yields a sentence, for
example, (P Λ Q).
• A sentence can be formed by combining simpler sentences with one of
the five logical connectives: Λ (and), V (or), => (implication),
⇔ (equivalence), and ¬ (not).
Semantics
The semantics of propositional logic is also quite straightforward. We
define it by specifying the interpretation of the proposition symbols and
constants, and specifying the meanings of the logical connectives.
A proposition symbol can mean whatever you want. That is, its
interpretation can be any arbitrary fact. The interpretation of P might
be the fact that Paris is the capital of France or that the wumpus is
dead. A sentence containing just a proposition symbol is satisfiable but
not valid: it is true just when the fact that it refers to is the case.
With logical constants, you have no choice; the sentence True always has
its interpretation the way the world actually is – the true fact. The
sentence False always has as its interpretation the way the world is not.
A complex sentence has a meaning derived from the meaning of its parts.
Each connective can be thought of as a function. Just as addition is a
function that takes two numbers as input and returns a number, so and is
a function that takes two truth values as input and returns a truth
value. We know that one way to define a function is to make a table that
gives the output value for every possible input value. For most functions
(such as addition), this is impractical because of the size of the table,
but there are only two possible truth values, so a logical function with
two arguments needs a table with only four entries. Such a table is
called a truth table. The truth tables for the logical connectives are
shown in Figure below.
P     | Q     | ¬P    | P Λ Q | P V Q | P => Q | P ⇔ Q
False | False | True  | False | False | True   | True
False | True  | True  | False | True  | True   | False
True  | False | False | False | True  | False  | False
True  | True  | False | True  | True  | True   | True
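The table translates directly into code; a small C++ check (the row ordering matches the table above, and the implication and biconditional are the entries that most often surprise):

    #include <cstdio>

    int main() {
        const char* tf[] = {"False", "True"};
        printf("%-7s%-7s%-7s%-7s%-7s%-8s%-7s\n",
               "P", "Q", "notP", "and", "or", "=>", "<=>");
        for (int p = 0; p <= 1; ++p)
            for (int q = 0; q <= 1; ++q) {
                bool np  = !p;
                bool con = p && q;
                bool dis = p || q;
                bool imp = !p || q;   // P => Q is false only when P true, Q false
                bool iff = (p == q);  // P <=> Q is true exactly when both agree
                printf("%-7s%-7s%-7s%-7s%-7s%-8s%-7s\n",
                       tf[p], tf[q], tf[np], tf[con], tf[dis], tf[imp], tf[iff]);
            }
    }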
Among these objects, various relations hold. Some of these relations are
functions—relations in which there is only one “value” for a given
“input.”
Term
A term is a logical expression that refers to an object.
Atomic Sentence
An atomic sentence is formed from a predicate symbol followed by a
parenthesized list of terms. For example,
Brother (Richard, John)
states, under the interpretation given before, that Richard the Lionheart
is the brother of King John. Atomic sentences can have arguments that are
complex terms:
Married(FatherOf(Richard),MotherOf(John))
states that Richard the Lionheart’s father is married to King John’s
mother.
Complex Sentences
We can use logical connectives to construct more complex sentences, just
as in propositional calculus. The semantics of sentences formed using
logical connectives is identical to that in the propositional case. For
example:
• Brother(Richard,John) Λ Brother(John,Richard) is true just when
John is the brother of Richard and Richard is the brother of John.
• Older(John,30) V Younger(John,30) is true when John is older than
30 or John is younger than 30.
• Older(John,30) => ¬Younger(John,30) states that if John is older
than 30, then he is not younger than 30.
• ¬Brother(Robin,John) is true just when Robin is not the brother of
John.
Quantifiers
Once we have a logic that allows objects, it is only natural to want to
express properties of entire collections of objects, rather than having to
enumerate the objects by name. Quantifiers let us do this. First-order
logic contains two standard quantifiers, called universal and
existential.
Nested Quantifiers
We will often want to express more complex sentences using multiple
quantifiers. The simplest case is where the quantifiers are of the same
type. For example, “For all x and all y, if x is the parent of y then y
is the child of x” becomes
∀x,y Parent(x,y) => Child(y,x)
∀x,y is equivalent to ∀x ∀y. Similarly, the fact that a person’s brother
has that person as a sibling is expressed by
∀x,y Brother(x,y) => Sibling(y,x)
In other cases we will have mixtures. “Everybody loves somebody” means
that for every person, there is someone that person loves:
∀x ∃y Loves(x,y)
On the other hand, to say “there is someone who is loved by everyone” we
write
∃y ∀x Loves(x,y)
d) The best score in Greek is always higher than the best score in
French.
Ans) ∃x,m : score(m, Greek, x) Λ (∀y,z : score(y, Greek, z) => (x >= z))
Λ (∀a,b : score(a, French, b) => (x > b))
g) There is an agent who sells policies only to people who are not
insured.
h) There is a barber who shaves all men in town who do not shave
themselves. [B.U. Dec-04, May-05]
i) Politicians can fool some of the people all of the time, and they
can fool all of the people some of the time, but they can’t fool
all of the people all of the time.
l) It is not the case that if you attempt this exercise you will get
an F. Therefore, you will attempt this exercise.
o) There is no one who does not like Ice Cream.{This is the same as
“Everyone likes Ice Cream”} [B.U. Dec-05]
Ans) ∀x Likes(x, Ice Cream) OR, equivalently,
¬∃x ¬Likes(x, Ice Cream)
Every knowledge base has two potential consumers: human readers and
inference procedures. A common mistake is to choose predicate names that
are meaningful to the human reader, and then be lulled into assuming that
the name is somehow meaningful to the inference procedure as well. The
sentence BearOfVerySmallBrain(Pooh) might be appropriate in certain
domains, but from this sentence alone, the inference procedure will not
be able to infer either that Pooh is a bear or that he has a very small
brain; that he has a brain at all; that very small brains are smaller
than small brains; or that this fact implies something about Pooh’s
behavior. The hard part is for the human reader to resist the temptation
to make the inferences that seem to be implied by long predicate names. A
knowledge engineer will often notice this kind of mistake when the
inference procedure fails to conclude, for example, Silly(Pooh). It is
compounding the mistake to simply add Silly(Pooh) itself to the knowledge
base; the better remedy is to write sentences such as
Bear(Pooh)
∀b Bear(b) => Animal(b)
∀a Animal(a) => PhysicalThing(a)
These sentences help to tie knowledge about Pooh into a broader context.
They also enable knowledge to be expressed at an appropriate level of
generality, depending on whether the information is applicable to bears,
animals, or all physical objects.
RelativeSize(BrainOf(Pooh),BrainOf(TypicalBear)) = Very(Small)
∀a Animal(a) ⇔ Brain(BrainOf(a))
∀a PartOf(BrainOf(a),a)
Decide what to talk about: Understand the domain well enough to know
which objects and facts need to be talked about, and which can be
ignored. For the early examples in this chapter, this step is easy. In
some cases, however, it can be the hardest step. Many knowledge
engineering projects have failed because the knowledge engineers started
to formalize the domain before understanding it.
Pose queries to the inference procedure and get answers: This is where
the reward is: we can let the inference procedure operate on the axioms
and problem-specific facts to derive the facts we are interested in
knowing.
Decide on a vocabulary
We need to be able to distinguish a gate from other gates. This is
handled by naming gates with constants: X1, X2, and so on. Next, we need
to know the type of a gate. A function is appropriate for this: Type(X1)
= XOR. This introduces the constant XOR for a particular gate type; the
other constants will be called OR, AND, and NOT.
A gate or circuit can have one or more input terminals and one or more
output terminals. We could simply name each one with a constant, just as
we named gates. Thus, gate X1 could have terminals named X1In1, X1In2, and
X1Out1. Names as long and structured as these, however, are as bad as
BearOfVerySmallBrain. They should be replaced with a notation that makes
it clear that X1Out1 is a terminal for gate X1, and that it is the first
output terminal. A function is appropriate for this; the function
Out(1,X1) denotes the first (and only) output terminal for gate X1. A
similar function In is used for input terminals.
Representing Categories
Rather than being an entirely random collection of objects, the world exhibits a good deal of
regularity. For example, there are many cases in which several objects have a number of properties in
common. It is usual to define categories that include as members all objects having certain properties.
Measures
Many useful properties such as mass, age, and price relate objects to quantities of particular types,
which we call measures. We explain how measures are represented in logic, and how they relate to
units of measure.
Similar axioms can be written for pounds and kilograms; seconds and days;
dollars and cents.
Measures can be used to describe objects as follows:
Composite objects
It is very common for objects to belong to categories by virtue of their constituent structure. For
example, cars have wheels, an engine, and so on, arranged in particular ways; typical baseball games
have nine innings in which each team alternates pitching and batting.
The idea that one object can be part of another is a familiar one. One’s
nose is part of one’s head. We use the general PartOf relation to say
that one thing is part of another. PartOf is transitive and reflexive.
Objects can therefore be grouped into PartOf hierarchies, reminiscent of
the Subset hierarchy:
PartOf(Bucharest,Romania)
PartOf(Romania,EasternEurope)
PartOf(EasternEurope,Europe)
From these, given the transitivity of PartOf, we can infer that
PartOf(Bucharest,Europe).
Any object that has parts is called a composite object. Categories
of composite objects are often characterized by the structure of those
objects, that is, the parts and how the parts are related. For example,
a biped has exactly two legs that are attached to its body:
This general form of sentence can be used to define the structure of any
composite object. A generic event description of this kind is often
called a schema or script, particularly in the area of natural language
understanding.
SubEvent(BattleOfBritain,WorldWarII)
SubEvent(WorldWarII,TwentiethCentury)
Partition({Moments,ExtendedIntervals},Intervals)
∀i i ∈ Intervals => (i ∈ Moments ⇔ Duration(i)=0)
Now we invent a time scale and associate points on that scale with
moments, giving us absolute times. The time scale is arbitrary; we will
measure it in seconds and say that the moment at midnight (GMT) on January
1, 1900, has time 0. The functions Start and End pick out the earliest
and latest moments in an interval, and the function Time delivers the
point on the time scale for a moment. The function Duration gives the
difference between the end time and the start time.
∀i Interval(i) => Duration(i) = (Time(End(i)) – Time(Start(i)))
Time(Start(AD1990))=Seconds(0)
Time(Start(AD1991))=Seconds(2871694800)
Time(End(AD1991))=Seconds(2903230800)
Duration(AD1991)=Seconds(31536000)
Time(Start(AD1991))=Date(00,00,00,Jan,1,1991)
Date(12,34,56,Feb,14,1993)=Seconds(2938682096)
Chapter 6: Inference in First-Order Logic
Canonical Form
We are attempting to build an inferencing mechanism with one inference
rule – the generalized version of Modus Ponens. The canonical form for
Modus Ponens mandates that each sentence in the knowledge base be either
an atomic sentence or an implication with a conjunction of atomic
sentences on the left-hand side and a single atom on the right.
Sentences of this form are called Horn sentences, and a knowledge base
consisting of only Horn sentences is said to be in Horn Normal Form.
We convert sentences into Horn sentences when they are first entered into
the knowledge base, using Existential Elimination and And-Elimination.
For example, ∃x Owns(Nono,x) Λ Missile(x) is converted into the two
atomic Horn sentences Owns(Nono,M1) and Missile(M1). Once the
existential quantifiers are all eliminated, it is traditional to drop the
universal quantifiers, so that ∀y Owns(y,M1) would be written as
Owns(y,M1). This is just an abbreviation; the meaning is that y is still
a universally quantified variable.
Unification
The job of the unification routine, UNIFY, is to take two atomic
sentences p and q and return a substitution that would make p and q look
the same. (If there is no such substitution, then UNIFY should return
fail.) Formally,
UNIFY(p,q) = θ where SUBST(θ,p) = SUBST(θ,q)
Suppose the knowledge base contains the rule Knows(John,x) => Hates(John,x)
(“John hates everyone he knows”) and we want to use this with the Modus
Ponens inference rule to find out whom he hates. In other words, we need
to find those sentences in the knowledge base that unify with
Knows(John,x), and then apply the unifier to Hates(John,x). Let our
knowledge base contain the following sentences:
Knows(John,Jane)
Knows(y,Leonid)
Knows(y,Mother(y))
Knows(x,Elizabeth)
(Remember that x and y are implicitly universally quantified.) Unifying
the antecedent for the rule against each of the sentences in the
knowledge base in turn gives us:
UNIFY(Knows(John,x),Knows(John,Jane))={ x/Jane}
UNIFY(Knows(John,x),Knows(y,Leonid))={x/Leonid,y/John}
UNIFY(Knows(John,x),Knows(y,Mother(y)))={y/John, x/Mother(John)}
UNIFY(Knows(John,x),Knows(x,Elizabeth))= fail
The last unification fails because x cannot take on the value John and
the value Elizabeth at the same time.
One way to handle this problem is to standardize apart the two sentences
being unified, which means renaming the variables of one (or both) to
avoid name clashes. After standardizing apart, we would have
UNIFY(Knows(John,x1),Knows(x2,Elizabeth))={ x1/Elizabeth, x2/John}
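A compact C++ sketch of UNIFY over a toy term representation makes the examples above mechanical. The Term type and helper names are assumptions for illustration, and the occurs check a production unifier needs is omitted:

    #include <cstddef>
    #include <map>
    #include <optional>
    #include <string>
    #include <vector>

    // A term is a variable, or a functor applied to subterms
    // (constants are functors with no arguments).
    struct Term {
        bool isVar;
        std::string name;
        std::vector<Term> args;
    };
    using Subst = std::map<std::string, Term>;   // variable name -> term

    // SUBST: apply a substitution to a term.
    Term subst(const Subst& s, const Term& t) {
        if (t.isVar) {
            auto it = s.find(t.name);
            return it != s.end() ? subst(s, it->second) : t;
        }
        Term out = t;
        for (Term& a : out.args) a = subst(s, a);
        return out;
    }

    // UNIFY: return a substitution making p and q identical, or nothing.
    std::optional<Subst> unify(const Term& p, const Term& q, Subst s = {}) {
        Term a = subst(s, p), b = subst(s, q);
        if (a.isVar) {
            if (!(b.isVar && b.name == a.name)) s[a.name] = b;  // bind variable
            return s;
        }
        if (b.isVar) { s[b.name] = a; return s; }
        if (a.name != b.name || a.args.size() != b.args.size())
            return std::nullopt;        // clash, e.g. John vs. Elizabeth: fail
        for (std::size_t i = 0; i < a.args.size(); ++i) {
            auto next = unify(a.args[i], b.args[i], s);
            if (!next) return std::nullopt;
            s = *next;
        }
        return s;
    }

On Knows(John,x) against Knows(y,Mother(y)) this returns {y/John, x/Mother(John)}, and on Knows(John,x) against Knows(x,Elizabeth) it fails, matching the table above.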
Nation(Nono) ---(22)
Enemy(Nono,America) ---(23)
Nation(America) ---(24)
Owns(Nono,M1) ---(25)
Missile(M1) ---(26)
American(West) ---(28)
Weapon(M1) ---(31)
Hostile(Nono) ---(32)
Sells(West,Nono,M1) ---(33)
From (28), (31), (22), (32), (33) and (21) using Modus Ponens:
Criminal(West) ---(34)
This proof shows how natural reasoning with Generalized Modus Ponens can
be.
The Generalized Modus Ponens rule can be used in two ways. We can start
with the sentences in the knowledge base and generate new conclusions
that in turn can allow more inferences to be made. This is called
forward chaining. Forward chaining is usually used when a new fact is
added to the database and we want to generate its consequences.
Alternatively, we can start with something we want to prove, find
implication sentences that would allow us to conclude it, and then
attempt to establish their premises in turn. This is called backward
chaining, because it uses Modus Ponens backwards. Backward chaining is
normally used when there is a goal to be proved.
Forward-chaining algorithm
Forward chaining is normally triggered by the addition of a new fact p to
the knowledge base. The idea is to find all implications that have p as
a premise; then if the other premises are already known to hold, we can
add the consequent of the implication to the knowledge base, triggering
further inference.
We also need the idea of a composition of substitutions. COMPOSE(θ1,θ2)
is the substitution whose effect is identical to the effect of applying
each substitution in turn. That is,
SUBST(COMPOSE(θ1,θ2),p) = SUBST(θ2,SUBST(θ1,p))
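A small sketch of COMPOSE under a deliberately simplified representation, where each binding maps a variable to a single flat symbol (real substitutions bind variables to structured terms):

    #include <map>
    #include <string>

    using Subst = std::map<std::string, std::string>;  // variable -> symbol

    // Apply a substitution to one symbol.
    std::string apply(const Subst& s, const std::string& x) {
        auto it = s.find(x);
        return it != s.end() ? it->second : x;
    }

    // COMPOSE(s1,s2): applying the result equals applying s1, then s2.
    Subst compose(const Subst& s1, const Subst& s2) {
        Subst out;
        for (const auto& [var, term] : s1)
            out[var] = apply(s2, term);           // push s2 through s1's bindings
        for (const auto& [var, term] : s2)
            if (!out.count(var)) out[var] = term; // keep bindings unique to s2
        return out;
    }

For example, composing {x/y} with {y/John} yields {x/John, y/John}: applying the composition to x gives John, the same as applying {x/y} first and then {y/John}.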
Algorithm:
procedure FORWARD-CHAIN(KB,p)
  if there is a sentence in KB that is a renaming of p then return
  Add p to KB
  for each (p1 Λ … Λ pn => q) in KB such that for some i,
      UNIFY(pi,p) = θ succeeds do
    FIND-AND-INFER(KB,[p1,…,pi-1,pi+1,…,pn],q,θ)
end

procedure FIND-AND-INFER(KB,premises,conclusion,θ)
  if premises = [] then
    FORWARD-CHAIN(KB,SUBST(θ,conclusion))
  else for each p’ in KB such that UNIFY(p’,SUBST(θ,FIRST(premises))) = θ2 do
    FIND-AND-INFER(KB,REST(premises),conclusion,COMPOSE(θ,θ2))
end
The forward-chaining inference algorithm adds to KB all the sentences that can be inferred from the sentence p. If p is
already in KB, it does nothing. If p is new, consider each implication that has a premise that matches p. For each such
implication, if all the remaining premises are in KB, then infer the conclusion. If the premises can be matched several
ways, then infer each corresponding conclusion. The substitution θ keeps track of the way things match.
We will use our crime problem again to illustrate how FORWARD-CHAIN
works. We will begin with the knowledge base containing only the
implications in Horn form:
Now we add the atomic sentences to the knowledge base one by one, forward
chaining each time and showing any additional facts that are added:
FORWARD-CHAIN(KB,American(West))
Add to the KB. It unifies with a premise of (1), but the other premises
of (1) are not known, so FORWARD-CHAIN returns without making any new
inferences.
FORWARD-CHAIN(KB,Nation(Nono))
Add to the KB. It unifies with a premise of (1), but there are still
missing premises, so FORWARD-CHAIN returns.
FORWARD-CHAIN(KB,Enemy(Nono,America))
Add to the KB. It unifies with the premise of (4), with unifier
{x/Nono}. Call
FORWARD-CHAIN(KB,Hostile(Nono))
Add to the KB. It unifies with a premise of (1). Only two other
premises are known, so processing terminates.
FORWARD-CHAIN(KB,Owns(Nono,M1))
Add to the KB. It unifies with a premise of (2), with unifier {x/M1}.
The other premise, now Missile(M1), is not known, so processing
terminates.
FORWARD-CHAIN(KB,Missile(M1))
Add to the KB. It unifies with a premise of (2) and (3). We will handle
them in that order.
• Missile(M1) unifies with a premise of (2) with unifier {x/M1}. The
other premise, now Owns(Nono,M1), is known, so call
FORWARD-CHAIN(KB,Sells(West,Nono,M1))
Backward-chaining algorithm:
Backward chaining is designed to find all answers to a question posed to
the knowledge base. The backward-chaining algorithm BACK-CHAIN works by
first checking to see if answers can be provided directly from sentences
in the knowledge base. It then finds all implications whose conclusion
unifies with the query, and tries to establish the premises of those
implications, also by backward chaining. If the premise is a
conjunction, then BACK-CHAIN processes the conjunction conjunct by
conjunct, building up the unifier for the whole premise as it goes.
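The control structure is easy to see in a stripped-down propositional sketch; this deliberately omits unification and answer substitutions (and loop checking), which a real BACK-CHAIN needs:

    #include <set>
    #include <string>
    #include <vector>

    struct Rule { std::vector<std::string> premises; std::string conclusion; };

    // A goal holds if it is a known fact, or if some rule concludes it
    // and all of that rule's premises can themselves be established.
    bool backChain(const std::set<std::string>& facts,
                   const std::vector<Rule>& rules,
                   const std::string& goal) {
        if (facts.count(goal)) return true;          // answered directly from KB
        for (const Rule& r : rules) {
            if (r.conclusion != goal) continue;
            bool allHold = true;
            for (const std::string& p : r.premises)  // conjunct by conjunct
                if (!backChain(facts, rules, p)) { allHold = false; break; }
            if (allHold) return true;
        }
        return false;
    }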
[Figures 1 and 2: backward-chaining proof trees for the query Criminal(x);
leaves such as Owns(Nono,M1) and Missile(M1) are answered directly from
the knowledge base with “Yes, {}”.]
6.4 Completeness
Suppose we have the following knowledge base:
∀ x P(x) => Q(x)
∀ x ¬ P(x) => R(x)
∀ x Q(x) => S(x)
∀ x R(x) => S(x)
Then we certainly want to be able to conclude S(A); S(A) is true if Q(A)
or R(A) is true, and one of those must be true because either P(A) is
true or ¬ P(A) is true.
Unfortunately, chaining with Modus Ponens cannot derive S(A) for us. The
problem is that ∀ x ¬ P(x) => R(x) cannot be converted to Horn form, and
thus cannot be used by Modus Ponens. That means that a proof procedure
using Modus Ponens is incomplete.
7.1 Planning
The “classical” approach that most planners use today describes states
and operators in a restricted language known as the STRIPS language, or
in extensions thereof. The STRIPS language lends itself to efficient
planning algorithms, while retaining much of the expressiveness of
situation calculus representations.
Op(ACTION:Start,EFFECT:At(Home) Λ Sells(HWS,Drill)
Λ Sells(SM,Milk) Λ Sells(SM,Bananas))
Op(ACTION:Finish,
PRECOND:Have(Drill) Λ Have(Milk) Λ Have(Bananas) Λ At(Home))
Figure 1
Op(ACTION:Go(there),PRECOND:At(here),
EFFECT:At(there) Λ ¬At(here))
Op(ACTION:Buy(x),PRECOND:At(store) Λ Sells(store,x),
EFFECT:Have(x))
Figure 2
Figure 3
In figure 2, we have selected three Buy actions to achieve three of the
preconditions of the Finish action. In each case there is only one
possible choice because the operator library offers no other way to
achieve these conditions.
The bold arrows in figure 2 are causal links. For example, the
leftmost causal link in the figure means that the step Buy(Drill) was added
in order to achieve the Finish step’s Have(Drill) precondition. The
planner will make sure that this condition is maintained by protecting
it: if a step might delete the Have(Drill) condition, then it will not be
inserted between the Buy(Drill) step and the Finish step. Light arrows in
the figure show ordering constraints.
Figure 3 shows the situation after the planner has chosen to achieve the
Sells preconditions by linking them to the initial state. Again, the
planner has no choice here because there is no other operator that
achieves Sells.
Figure 4
The two Go actions have unachieved preconditions that interact with each
other, because the agent cannot be At two places at the same time. Each
Go action has a precondition At(x), where x is the location that the
agent was at before that Go action. Suppose the planner tries to achieve
the preconditions of Go(HWS) and Go(SM) by linking them to the At(Home)
condition in the initial state. This results in the plan shown in Figure
5.
Figure 5
Unfortunately, this will lead to a problem. The step Go(HWS) adds the
condition At(HWS), but it also deletes the condition At(Home). So if the
agent goes to the hardware store, it can no longer go from home to the
supermarket.
On the other hand, if the agent goes to the supermarket first, it cannot
go from home to the hardware store. At this point, we have reached a dead
end in the search for a solution, and must back up and try another
choice. The interesting part is seeing how a planner could notice that
this partial plan is a dead end without wasting a lot of time on it. The
key is that the causal links in a partial plan are protected links. A
causal link is protected by ensuring that threats---that is, steps that
might delete (or clobber) the protected condition---are ordered to come
before or after the protected link. A causal link S1 --c--> S2 is
threatened by a new step S3 if one effect of S3 is to delete c. The way
to resolve the threat is to add ordering constraints to make sure that S3
does not intervene between S1 and S2. If S3 is placed before S1 this is
called demotion, and if it is placed after S2, it is called promotion.
Suppose the next choice is to try a different way to achieve the At(x)
precondition of the Go(SM) step, this time by adding a causal link from
Go(HWS) to Go(SM). In other words, the plan is to go from home to the
hardware store and then to the supermarket. This introduces another
threat. Unless the plan is further refined, it will allow the agent to go
from the hardware store to the supermarket without first buying the
drill. Technically, the Go(SM) step threatens the At(HWS) precondition of
the Buy(Drill) step, which is protected by a causal link. The threat is
resolved by constraining Go(SM) to come after Buy(Drill). Figure 7 shows
this.
Figure 7
Figure 8
Section 7.1 showed how a partial-order planner’s search through the space
of plans can be more efficient than a problem-solver’s search through the
space of situations. On the other hand, the POP planner can only handle
problems that are stated in the STRIPS language, and its search process
is so unguided that it can still only be used for small problems. In this
section we begin by surveying existing planners that operate in complex,
realistic domains. This will help to pinpoint the weaknesses of POP and
suggest the necessary extensions.
1. Hierarchical plans
2. Complex conditions
3. Time
4. Resources
O-PLAN is being used by Hitachi for job shop planning and scheduling in a
system called TOSCA. A typical problem involves a product line of 350
different products, 35 assembly machines, and over 2000 different
operations. The planner comes up with a 30-day schedule for three 8-hour
shifts a day. In general, TOSCA follows the partial-order, least-
commitment planning approach. It also allows for “low-commitment”
decisions: choices that impose constraints on the plan or on a particular
step. For example, the system might choose to schedule an action to be
carried out on a class of machine without specifying any particular one.
SIPE (System for Interactive Planning and Execution monitoring) was the
first planner to deal with the problem of replanning, and the first to
take some important steps toward expressive operators. It has been used
in demonstration projects in several domains, including planning
operations on the flight deck of an aircraft carrier and job-shop
scheduling for an Australian beer factory. Another study used SIPE to plan
the construction of multistory buildings, one of the most complex domains
ever tackled by a planner.
Chapter 8: Uncertain Knowledge And Reasoning
8.1 Uncertainty
The problem is that this rule is wrong. Not all patients with toothaches
have cavities; some of them may have gum disease, or impacted wisdom
teeth, or one of several other problems:
∀ p Symptom(p,Toothache) ⇒
Disease(p,Cavity) ∨ Disease(p,GumDisease) ∨ Disease(p,ImpactedWisdom)
∀ p Disease(p,Cavity) ⇒ Symptom(p,Toothache)
But this rule is not right either; not all cavities cause pain. The only
way to fix the rule is to make it logically exhaustive: to extend the
left-hand side to cover all possible reasons why a cavity might or might
not cause a toothache. Even then, for the purposes of diagnosis, one must
also take into account the possibility that the patient may have a
toothache and a cavity that are unconnected.
Normalization:
Suppose we are also concerned with the possibility that the patient is
suffering from whiplash W, rather than from meningitis M, given a stiff
neck S:
P(M|S)/P(W|S) = [P(S|M)P(M)]/[P(S|W)P(W)] = [0.5 × 1/50000]/[0.8 × 1/1000] = 1/80
That is, whiplash is 80 times more likely than meningitis, given a stiff
neck.
Adding the Bayes’ rule equations for P(M|S) and P(¬M|S), and using the
fact that P(M|S) + P(¬M|S) = 1, we obtain:
P(S) = P(S|M)P(M) + P(S|¬M)P(¬M)
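The meningitis/whiplash arithmetic above can be checked mechanically; a tiny sketch using the example’s numbers:

    #include <cstdio>

    int main() {
        // Likelihoods and priors from the stiff-neck example.
        double pS_given_M = 0.5, pM = 1.0 / 50000;   // meningitis
        double pS_given_W = 0.8, pW = 1.0 / 1000;    // whiplash
        // Relative posterior: P(M|S)/P(W|S) = [P(S|M)P(M)] / [P(S|W)P(W)];
        // P(S) cancels, so no normalization constant is needed for the ratio.
        double ratio = (pS_given_M * pM) / (pS_given_W * pW);
        printf("P(M|S)/P(W|S) = %g (i.e. 1/%g)\n", ratio, 1.0 / ratio);  // 0.0125 = 1/80
    }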
In Section 8.1 we saw that the joint probability distribution can answer
any question about the domain, but can become intractably large as the
number of variables grows. Furthermore, specifying probabilities for
atomic events is rather unnatural and may be very difficult unless a
large amount of data is available from which to gather statistical
estimates.
Consider the following situation. You have a new burglar alarm installed
at home. It is fairly reliable at detecting a burglary, but it also
responds on occasion to minor earthquakes. (This example is due to Judea Pearl, a
resident of Los Angeles; hence the acute interest in earthquakes.) You
also have two neighbors, John and Mary, who have promised to call you at
work when they hear the alarm. John always calls when he hears the alarm,
but sometimes confuses the telephone ringing with the alarm and calls
then, too. Mary, on the other hand, likes rather loud music and sometimes
misses the alarm altogether. Given the evidence of who has or has not
called, we would like to estimate the probability of a burglary. This
simple domain is described by the belief network of figure below.
The complete network for the burglary example is shown in the figure below,
where we show just the conditional probability for the True case of each
variable.
From the two measures MB (measure of increased belief) and MD (measure of
increased disbelief), we can define the certainty factor as:
CF[h,e] = MB[h,e] - MD[h,e]
Dempster-Shafer Theory
The Dempster-Shafer theory is designed to deal with the distinction
between uncertainty and ignorance. Rather than computing the probability
of a proposition, it computes the probability that the evidence supports
the proposition. This measure of belief is called a belief function, written
Bel(X).
Suppose a shady character comes up to you and offers to bet you $10 that
his coin will come up heads on the next flip. Given that the coin may or
may not be fair, what belief should you ascribe to the event of it coming
up heads? Dempster-Shafer theory says that because you have no evidence
either way, you have to say that the belief Bel(Heads) = 0, and also that
Bel(¬Heads) = 0. This makes Dempster-Shafer reasoning systems skeptical
in a way that has some intuitive appeal. Now suppose you have an expert
at your disposal who testifies with 90% certainty that the coin is fair
(i.e., he is 90% sure that P(Heads) = 0.5). Then Dempster-Shafer theory
gives Bel(Heads) = 0.9 × 0.5 = 0.45 and likewise Bel(¬Heads) = 0.45.
There is still a 0.1 “gap” that is not accounted for by the evidence.
“Dempster’s rule” shows how to combine evidence to give new values for
Bel, and Shafer’s work extends this into a complete computational model.
As with default reasoning, there is a problem in connecting beliefs to
actions. With probabilities, decision theory says that if P(Heads) =
P(¬Heads) = 0.5 then (assuming that winning $10 and losing $10 are
considered equal opposites) the reasoner will be indifferent between the
action of accepting and declining the bet. A Dempster-Shafer reasoner has
Bel(¬Heads) = 0, and thus no reason to accept the bet, but then it also
has Bel(Heads) = 0, and thus no reason to decline it.
Chapter 9: Learning
So far we have assumed that all the "intelligence" in an agent has been
built in by the agent's designer. The agent is then let loose in an
environment, and does the best it can given the way it was programmed to
act. But this is not necessarily the best approach, for either the agent
or the designer. Whenever the designer has incomplete knowledge of the
environment that the agent will live in, learning is the only way that
the agent can acquire what it needs to know. Learning thus provides
autonomy.
The critic is designed to tell the learning element how well the agent is
doing. The critic employs a fixed standard of performance. This is
necessary because the percepts themselves provide no indication of the
agent's success. For example, a chess program may receive a percept
indicating that it has checkmated its opponent, but it needs a
performance standard to know that this is a good thing; the percept
itself does not say so.
We have seen that there are many ways to build the performance element of
an agent. The components can include the following:
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept
sequence.
3. Information about the way the world evolves.
4. Information about the results of possible actions the agent can
take.
5. Utility information indicating the desirability of world states.
6. Action-value information indicating the desirability of particular
actions in particular states.
7. Goals that describe classes of states whose achievement maximizes
the agent’s utility.
For some components, such as the component for predicting the outcome of
an action, the available feedback generally tells the agent what the
correct outcome is. That is, the agent predicts that a certain action
(braking) will have a certain outcome (stopping in 10 feet), and the
environment immediately provides a percept that describes the actual
correct outcome (stopping in 15 feet). Any situation in which both the
inputs and outputs of a component can be perceived is called supervised
learning. (Often, the outputs are provided by a friendly teacher).
Learning when there is no hint at all about the correct outputs is called
unsupervised learning. An unsupervised learner can always learn
relationships among its percepts using supervised learning methods-that
is, it can learn to predict its future percepts given its previous
percepts. It cannot learn what to do unless it already has a utility
function.
Neural Networks
Each unit has a set of input links from other units, a set of output
links to other units, a current activation level, and a means of
computing the activation level at the next step in time, given its inputs
and weights. Each unit performs a simple computation: it receives
signals from its input links and computes a new activation level that it
sends along each of its output links. The computation of the activation
level is based on the values of each input signal received from a
neighboring node, and the weights on each input link. The computation is
split into two components. First is a linear component, called the input
function, in_i, that computes the weighted sum of the unit’s input
values. Second is a nonlinear component, called the activation function,
g, that transforms the weighted sum into the final value that serves as
the unit’s activation value, a_i. Usually, all units in a network use
the same activation function.
The total weighted input is the sum of the input activations times their
respective weights:
in_i = Σj Wj,i aj
The elementary computation step in each unit computes the new activation
value for the unit by applying the activation function, g, to the result
of the input function:
a_i = g(in_i) = g(Σj Wj,i aj)
Network structures
The feedforward neural networks are the first and arguably simplest type
of artificial neural networks devised. In this network, the information
moves in only one direction, forward, from the input nodes, through the
hidden nodes (if any) and to the output nodes. There are no cycles or
loops in the network.
More generally, after the sum of the previous layer times the weights is
computed for each neuron, it is passed through a nonlinearity function.
The sigmoid function is a popular choice, because of its simple
derivative. The nonlinearity function is necessary for multilayer
networks, because otherwise they are linear and equivalent to simple two-
layer perceptrons.
Artificial neurons with this kind of activation function are also called
McCulloch-Pitts neurons or threshold neurons. In the literature the term
perceptron sometimes also refers to networks consisting of just one of
these units. Perceptrons can be trained by a simple learning algorithm
that is usually called the delta rule. It calculates the errors between
calculated output and sample output data, and uses this to create an
adjustment to the weights, thus implementing a form of gradient descent.
Single-layer perceptron
A perceptron can be created using any values for the activated and
deactivated states as long as the threshold value lies between the two.
Most perceptrons have outputs of 1 or -1 with a threshold of 0 and there
is some evidence that such networks can be trained more quickly than
networks created from nodes with different activation and deactivation
values.
Neural networks are made up of many artificial neurons. An artificial neuron is simply an
electronically modelled biological neuron. How many neurons are used depends on the task at hand.
It could be as few as three or as many as several thousand. There are many different ways of
connecting artificial neurons together to create a neural network but I shall be concentrating on the
most common which is called a feedforward network.
Each input into the neuron has its own weight associated with it, illustrated by the red
circle. A weight is simply a floating point number and it's these we adjust when we
eventually come to train the network. The weights in most neural nets can be both negative
and positive, therefore providing excitatory or inhibitory influences to each input. As each
input enters the nucleus (blue circle) it's multiplied by its weight. The nucleus then sums
all these new input values, which gives us the activation (again a floating point number
which can be negative or positive). If the activation is greater than a threshold value -
let's use the number 1 as an example - the neuron outputs a signal. If the activation is
less than 1 the neuron outputs zero. This is typically called a step function, as shown
below:
A neuron can have any number of inputs from one to n, where n is the total number of inputs. The
inputs may be represented therefore as x1, x2, x3… xn. And the corresponding weights for the inputs
as w1, w2, w3… wn. Now, the summation of the weights multiplied by the inputs we talked about
above can be written as x1w1 + x2w2 + x3w3 …. + xnwn, which is the activation value. So… a =
x1w1+x2w2+x3w3... +xnwn
This can also be written as:
a = Σ xi wi (summing over i = 1 to n)
Assuming an array of inputs and weights are already initialized as x[n] and w[n] then the code can
be written as:
double activation = 0;
for (int i = 0; i < n; ++i)
    activation += x[i] * w[i];   // accumulate each weighted input
Remember that if the activation > threshold we output a 1 and if activation < threshold we output a
0.
Perceptron:
A single-layer perceptron network consists of one or more artificial neurons in parallel. The earliest
kind of neural network is a single-layer perceptron network, which consists of a single layer of
output nodes; the inputs are fed directly to the outputs via a series of weights. In this way it can be
considered the simplest kind of feed-forward network. The sum of the products of the weights and
the inputs is calculated in each node, and if the value is above some threshold (typically 0) the
neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value
(typically -1).
Perceptron Learning:
The delta rule is a gradient descent learning rule for updating the weights of the artificial neurons
in a single-layer perceptron. It calculates the errors between calculated output and sample output
data, and uses this to create an adjustment to the weights, thus implementing a form of gradient
descent. For a neuron j with activation function g(x) the delta rule for j's ith weight wji is given by:
∆wji = α(tj − yj)xi
where α is a small constant called learning rate, tj is the target output, yj is the actual output, and xi
is the ith input.
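The rule translates into a few lines of C++; this sketch performs one delta-rule update for a single threshold unit (the step activation, the +1/-1 targets, and the learning rate are illustrative choices):

    #include <cstddef>
    #include <vector>

    // One delta-rule update: w_i <- w_i + alpha * (t - y) * x_i
    void deltaRuleUpdate(std::vector<double>& w,
                         const std::vector<double>& x,
                         double target, double alpha) {
        // Output of a step unit with threshold 0.
        double activation = 0;
        for (std::size_t i = 0; i < w.size(); ++i) activation += w[i] * x[i];
        double output = activation > 0 ? 1.0 : -1.0;
        // Adjust each weight in proportion to its own input and the error.
        double error = target - output;
        for (std::size_t i = 0; i < w.size(); ++i) w[i] += alpha * error * x[i];
    }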
Multi-Layer Perceptron:
Well, we have to link several of these neurons up in some way. One way of doing this is by
organising the neurons into a design called a feedforward network. It gets its name from the way the
neurons in each layer feed their output forward to the next layer until we get the final output from
the neural network. This is what a very simple feedforward network looks like:
Each input is sent to every neuron in the hidden layer, and then each hidden-layer neuron's
output is connected to every neuron in the next layer. There can be any number of hidden
layers within a feedforward network, but one is usually enough for most problems you will
tackle. Also, the number of neurons I've chosen for the above diagram was completely
arbitrary. There can be any number of neurons in each layer; it all depends on the problem.
You probably know already that a popular use for neural nets is character recognition. So let's
design a neural network that will detect the number '4'. Given a panel made up of a grid of lights
which can be either on or off, we want our neural net to let us know whenever it thinks it sees the
character '4'. The panel is eight cells square and looks like this:
We would like to design a neural net that will accept the state of the panel as an input and
will output either a 1 or zero. A 1 to indicate that it thinks the character ‘4’ is being
displayed and 0 if it thinks it's not being displayed. Therefore the neural net will have 64
inputs, each one representing a particular cell in the panel and a hidden layer consisting of
a number of neurons (more on this later) all feeding their output into just one neuron in the
output layer. Please picture this in your head because it would be very difficult to draw all those little
circles and lines.
Once the neural network has been created, it needs to be trained. One way of doing this is to
initialize the neural net with random weights and then feed it a series of inputs which
represent, in this example, the different panel configurations. For each configuration we
check to see what its output is and adjust the weights accordingly so that whenever it sees
something looking like a number 4 it outputs a 1 and for everything else it outputs a zero.
This type of training is called supervised learning and the data we feed it is called a
training set. There are many different ways of adjusting the weights, the most common for
this type of problem is called backpropagation.
If you think about it, you could increase the outputs of this neural net to 10. This way the
network can be trained to recognize all the digits 0 through to 9. Increase them further and
it could be trained to recognize the alphabet too!
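To give a feel for how such a detector might be trained, here is a compressed sketch of a 64-input network with one hidden layer and one output neuron, updated by plain backpropagation (the hidden-layer size, learning rate, sigmoid activation and weight ranges are all assumptions; real panel data would still have to be supplied):

import math, random

N_IN, N_HID = 64, 8        # 64 panel cells; hidden size chosen arbitrarily
def rnd(): return random.uniform(-0.5, 0.5)
w_hid = [[rnd() for _ in range(N_IN)] for _ in range(N_HID)]  # input -> hidden
w_out = [rnd() for _ in range(N_HID)]                         # hidden -> output

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(panel):
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, panel))) for ws in w_hid]
    return hidden, sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

def train_step(panel, target, alpha=0.5):
    # one supervised update: nudge every weight to reduce (target - output)
    hidden, y = forward(panel)
    d_out = (target - y) * y * (1.0 - y)          # output-layer error term
    for j in range(N_HID):
        d_hid = d_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for i in range(N_IN):
            w_hid[j][i] += alpha * d_hid * panel[i]
        w_out[j] += alpha * d_out * hidden[j]

# The training set would pair 64-element 0/1 panel states with the label
# 1 (the panel shows a '4') or 0 (it does not), e.g.:
# for panel, label in training_set: train_step(panel, label)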
Every organism has a set of rules, a blueprint so to speak, describing how that organism is built up
from the tiny building blocks of life. These rules are encoded in the genes of an organism, which in
turn are connected together into long strings called chromosomes. Each gene represents a specific
trait of the organism, like eye colour or hair colour, and has several different settings. For example,
the settings for a hair colour gene may be blonde, black or auburn. These genes and their
settings are usually referred to as an organism's genotype. The physical expression of the genotype -
the organism itself - is called the phenotype.
When two organisms mate they share their genes. The resultant offspring may end up having half
the genes from one parent and half from the other. This process is called recombination. Very
occasionally a gene may be mutated. Normally this mutated gene will not affect the development of
the phenotype but very occasionally it will be expressed in the organism as a completely new trait.
Genetic Algorithms are a way of solving problems by mimicking the same processes
mother nature uses. They use the same combination of selection, recombination and
mutation to evolve a solution to a problem.
Reproduction:
During reproduction, recombination (or crossover) first occurs. Genes from parents combine to
form a whole new chromosome. The newly created offspring can then be mutated. Mutation means
that the elements of DNA are slightly changed. These changes are mainly caused by errors in
copying genes from the parents.
The fitness of an organism is measured by the success of the organism in its life (survival).
1. [Start] Generate random population of n chromosomes (suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating the following steps until the new
population is complete
1. [Selection] Select two parent chromosomes from a population according to their
fitness (the better the fitness, the bigger the chance of being selected)
2. [Crossover] With a crossover probability, cross over the parents to form new
offspring (children). If no crossover is performed, the offspring are exact copies of
the parents.
3. [Mutation] With a mutation probability mutate new offspring at each locus (position
in chromosome).
4. [Accepting] Place new offspring in the new population
4. [Replace] Use the newly generated population for a further run of the algorithm
5. [Test] If the end condition is satisfied, stop, and return the best solution in the current
population
6. [Loop] Go to step 2
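A minimal sketch of this outline in Python (the fitness function here just counts 1-bits as a stand-in; the chromosome length, population size and probability values are placeholders to be adapted per problem):

import random

CHROM_LEN, POP_SIZE = 16, 20
P_CROSS, P_MUT = 0.7, 0.01

def fitness(chrom):
    return sum(chrom)          # placeholder: rewards chromosomes with many 1s

def select(pop):
    # tournament selection: fitter chromosomes have a bigger chance
    return max(random.sample(pop, 3), key=fitness)

def next_generation(pop):
    new_pop = []
    while len(new_pop) < POP_SIZE:                      # [New population]
        p1, p2 = select(pop), select(pop)               # [Selection]
        if random.random() < P_CROSS:                   # [Crossover]
            cut = random.randrange(1, CHROM_LEN)
            child = p1[:cut] + p2[cut:]
        else:
            child = p1[:]                               # exact copy of a parent
        child = [b ^ 1 if random.random() < P_MUT else b
                 for b in child]                        # [Mutation]
        new_pop.append(child)                           # [Accepting]
    return new_pop

pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
       for _ in range(POP_SIZE)]                        # [Start]
for _ in range(50):                                     # [Test]/[Loop], simplified
    pop = next_generation(pop)                          # [Replace]
print(max(pop, key=fitness))                            # best solution found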
Encoding of a Chromosome
A chromosome should in some way contain information about the solution it represents. The most
common way of encoding is a binary string. A chromosome could then look like this:
Chromosome 1 1101100100110110
Chromosome 2 1101111000011110
Each chromosome is represented by a binary string. Each bit in the string can represent some
characteristics of the solution. Of course, there are many other ways of encoding.
Crossover
Crossover operates on selected genes from parent chromosomes and creates new offspring. The
simplest way to do this is to choose a random crossover point, copy everything before this point
from the first parent, and then copy everything after the crossover point from the other parent.
Mutation
After a crossover is performed, mutation takes place. Mutation is intended to prevent all
solutions in the population from falling into a local optimum of the solved problem. The mutation
operation randomly changes the offspring resulting from crossover.
There are two basic parameters of a GA - crossover probability and mutation probability.
Crossover probability: how often crossover will be performed. If there is no crossover, offspring
are exact copies of the parents. If there is crossover, offspring are made from parts of both
parents' chromosomes.
Crossover is made in the hope that the new chromosomes will contain good parts of the old
chromosomes and therefore the new chromosomes will be better.
Mutation probability: how often parts of a chromosome will be mutated. If there is no mutation,
offspring are generated immediately after crossover (or directly copied) without any change. If
mutation is performed, one or more parts of a chromosome are changed. If the mutation probability
is 100%, the whole chromosome is changed; if it is 0%, nothing is changed.
Population size: how many chromosomes are in the population (in one generation). If there are too
few chromosomes, the GA has few possibilities to perform crossover. On the other hand, if there
are too many chromosomes, the GA slows down.
Selection:
As you already know from the GA outline, chromosomes are selected from the population to be
parents for crossover. The problem is how to select these chromosomes. According to Darwin's
theory of evolution, the best ones survive to create new offspring. There are many methods of
selecting the best chromosomes. Examples are roulette wheel selection, Boltzmann selection,
tournament selection, rank selection, steady-state selection and some others.
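As an illustration, roulette wheel selection can be sketched like this (assuming non-negative fitness values, a detail the notes do not specify):

import random

def roulette_select(population, fitnesses):
    # each chromosome owns a slice of the wheel proportional to its fitness
    spin = random.uniform(0, sum(fitnesses))
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if spin <= running:
            return chrom
    return population[-1]      # guard against floating-point rounding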
Crossover:
Single point crossover - one crossover point is selected, binary string from the beginning of the
chromosome to the crossover point is copied from the first parent, the rest is copied from the other
parent
11001011+11011111 = 11001111
Two point crossover - two crossover points are selected, binary string from the beginning of the
chromosome to the first crossover point is copied from the first parent, the part from the first to the
second crossover point is copied from the other parent and the rest is copied from the first parent
again
Uniform crossover - bits are randomly copied from the first or from the second parent
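The three operators can be sketched on binary strings as follows (the random choices are made with Python's random module):

import random

def single_point(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def two_point(a, b):
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]

def uniform(a, b):
    return ''.join(random.choice(pair) for pair in zip(a, b))

print(single_point('11001011', '11011111'))  # '11001111' when the cut falls at 4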
Mutation:
For binary encoding, mutation typically means bit inversion - randomly selected bits are
inverted, e.g. 11001001 => 10001001.
A Worked Example:
Given the digits 0 through 9 and the operators +, -, * and /, find a sequence that will
represent a given target number. The operators will be applied sequentially from left to
right as you read.
So, given the target number 23, the sequence 6+5*4/2+1 would be one possible solution.
Stage 1: Encoding
0: 0000
1: 0001
2: 0010
3: 0011
4: 0100
5: 0101
6: 0110
7: 0111
8: 1000
9: 1001
+: 1010
-: 1011
*: 1100
/: 1101
The table above shows all the different genes required to encode the problem as described. The
possible genes 1110 and 1111 will remain unused and will be ignored by the algorithm if
encountered.
So now you can see that the solution mentioned above for 23, '6+5*4/2+1', would be
represented by nine genes like so:
6    +    5    *    4    /    2    +    1
0110 1010 0101 1100 0100 1101 0010 1010 0001
Because the algorithm deals with random arrangements of bits it is often going to come
across a string of bits like this:
0010 0010 1010 1110 1011 0111 0010
Decoded, these bits represent:
2    2    +    n/a  -    7    2
This is meaningless in the context of this problem! Therefore, when decoding, the
algorithm will just ignore any genes which don't conform to the expected pattern of:
number -> operator -> number -> operator ...and so on. With this in mind the above
'nonsense' chromosome is read (and tested) as:
2 + 7
Stage 2: Deciding on a Fitness Function
This can be the most difficult part of the algorithm to figure out. It really depends on what
problem you are trying to solve but the general idea is to give a higher fitness score the
closer a chromosome comes to solving the problem.
E.g., a fitness score can be assigned that is inversely proportional to the difference between
the solution and the value a decoded chromosome represents.
If we assume the target number (i.e., the solution) is 42, the chromosome mentioned above,
011010100101110001001101001010100001
decodes to 23, so it would be assigned a fitness of 1/(42-23) = 1/19.
As it stands, if a solution is found, a divide by zero error would occur as the fitness would
be 1/(42-42). This is not a problem however as we have found what we were looking for...
a solution. Therefore a test can be made for this occurrence and the algorithm halted
accordingly.
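The decoding and fitness steps of this worked example can be sketched as follows (the left-to-right evaluation and the skip-invalid-genes rule follow the text; the helper names and the zero-fitness guard for empty chromosomes are assumptions):

GENES = {'0000': '0', '0001': '1', '0010': '2', '0011': '3', '0100': '4',
         '0101': '5', '0110': '6', '0111': '7', '1000': '8', '1001': '9',
         '1010': '+', '1011': '-', '1100': '*', '1101': '/'}

def decode(bits):
    # keep only genes fitting the pattern number -> operator -> number -> ...
    tokens, want_digit = [], True
    for i in range(0, len(bits) - 3, 4):
        tok = GENES.get(bits[i:i + 4])        # 1110 and 1111 give None: ignored
        if tok is not None and tok.isdigit() == want_digit:
            tokens.append(tok)
            want_digit = not want_digit
    if tokens and not tokens[-1].isdigit():
        tokens.pop()                          # cannot end with an operator
    return tokens

def evaluate(tokens):
    # apply the operators strictly left to right, as the text specifies
    value = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        n = int(num)
        if op == '+': value += n
        elif op == '-': value -= n
        elif op == '*': value *= n
        elif op == '/' and n != 0: value /= n   # quietly skip division by zero
    return value

def fitness(bits, target=42):
    tokens = decode(bits)
    if not tokens:
        return 0.0
    value = evaluate(tokens)
    if value == target:
        return float('inf')    # solution found: in practice, halt here
    return 1.0 / abs(target - value)

print(decode('0010001010101110101101110010'))           # ['2', '+', '7']
print(fitness('011010100101110001001101001010100001'))  # decodes to 23 -> 1/19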
Applications of GA
Genetic algorithms have been used for difficult problems (such as NP-hard problems), for machine learning
and also for evolving simple programs. They have also been used for some art, for evolving
pictures and music.
The advantage of GAs is their parallelism. A GA travels through the search space using many
individuals (and works with the genotype rather than the phenotype), so it is less likely to get
stuck in a local extremum than other methods.
They are also easy to implement. Once you have the basic GA implemented, you only have to write
a new chromosome representation (just one object) to solve another problem. With the same
encoding you just change the fitness function - and you are done. However, for some problems,
choosing and implementing the encoding and fitness function can be difficult.
The disadvantage of GAs is their computational time. GAs can be slower than other methods. But
since we can terminate the computation at any time, the longer run is acceptable (especially with
faster and faster computers).
Chapter 10: Agents that communicate
Communication is the intentional exchange of information brought about by
the production and perception of signs drawn from a shared system of
conventional signs.
Imagine a group of agents exploring the wumpus world together. The
group gains an advantage (collectively and individually) by being able to
do the following:
• Inform each other about the part of the world each has explored, so
that each agent has less exploring to do. This is done by making
statements: There’s a breeze here in [3,4].
• Query other agents about particular aspects of the world. This is
typically done by asking questions: Have you smelled the wumpus
anywhere?
• Answer questions. This is a kind of informing. Yes, I smelled the
wumpus in [2,5].
• Request or command other agents to perform actions: Please help me
carry the gold. It can be seen as impolite to make a direct
request, so often an indirect speech act (a request in the form of a
statement or question) is used instead: I could use some help
carrying this or Could you help me carry this?
• Promise to do things or offer deals: I’ll shoot the wumpus if you
let me share the gold.
• Acknowledge requests and offers: OK.
• Share feelings and experiences with each other: You know, old chap,
when I get in a spot like that, I usually go back to the start and
head out in another direction, or Man, that wumpus sure needs some
deodorant!
Fundamentals of language
Categories such as NP, VP, and S are called nonterminal symbols. In the
BNF notation, rewrite rules consist of a single nonterminal symbol on the
left-hand side, and a sequence of terminals or nonterminals on the right-
hand side. The meaning of a rule such as
S → NP VP
is that a sentence S can consist of a noun phrase NP followed by a verb
phrase VP. In the wumpus world, for example, an agent A that has perceived
a pit can inform agent B of it by executing
TELL(KB_B, ”Pit(P_A1) ∧ At(P_A1,[2,3],S_A9)”)
where S_A9 is the current situation, and P_A1 is A’s symbol for the pit.
The Lexicon of ε0
Each of the categories ends in ... to indicate that there are other words
in the category. However, it should be noted that there are two distinct
reasons for the missing words. For nouns, verbs, adjectives, and
adverbs, it is in principle infeasible to list them all. Not only are
there thousands or tens of thousands of members in each class, but new
ones are constantly being added. For example, “fax” is now a very common
noun and verb, but it was coined only a few years ago. These four
categories are called open classes. The other categories (pronoun,
article, preposition, and conjunction) are called closed classes. They
have a small number of words (a few to a few dozen) that could in
principle be enumerated. Closed classes change over the course of
centuries, not months. For example, “thee” and “thou” were commonly used
pronouns in the seventeenth century, were on the decline in the
nineteenth, and are seen today only in poetry and regional dialects.
The Grammar of ε0
The next step is to combine the words into phrases. We will use five
nonterminal symbols to define the different kinds of phrases:
sentence(S), noun phrase(NP), verb phrase(VP), prepositional phrase(PP),
and relative clause (RelClause). A grammar for ε0 supplies a rewrite rule
for each kind of phrase, together with an example of each, and ε0
generates good English sentences about the wumpus world.
Wumpus Lexicon:
Wumpus Grammar:
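A partial sketch of both, reconstructed along the lines of Russell and Norvig's ε0 (the original lexicon and grammar figures are not reproduced in these notes and may differ in detail):

Noun        → stench | breeze | glitter | wumpus | pit | gold | ...
Verb        → is | see | smell | shoot | feel | go | grab | ...
Adjective   → right | left | smelly | ...
Adverb      → here | there | nearby | ahead | ...
Pronoun     → me | you | I | it | ...
Article     → the | a | an | ...
Preposition → to | in | on | near | ...
Conjunction → and | or | but | ...
Digit       → 0 | 1 | 2 | ... | 9

S         → NP VP | S Conjunction S
NP        → Pronoun | Noun | Article Noun | Digit Digit | NP PP | NP RelClause
VP        → Verb | VP NP | VP Adjective | VP PP | VP Adverb
PP        → Preposition NP
RelClause → that VP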
Chapter 11: Expert Systems
Expert systems are commonly built in one of two ways:
1. Production system.
2. Frame-based system.
Production System:
The knowledge base is the data or knowledge used to make decisions. The
knowledge base contains the rules and facts about the domain. The
knowledge base consists of two parts: the working memory and the rule
base. The rule base consists of facts and rules that are compiled as
part of the problem. The second part of the knowledge base is the
working memory. The working memory corresponds to Prolog’s dynamic
database.
The inference engine has two functions: inference and control. Inference
is the basic formal reasoning process. It involves matching and
unification. The control function determines the order in which the
rules are tested and what happens when a rule succeeds or fails.
Frame-Based System:
A frame-based system represents domain knowledge as structured objects
(frames) with slots and values, rather than as a flat set of rules. By
contrast, a typical rule from MYCIN, the classic rule-based medical expert
system, has the form:
If:
(1) the stain of the organism is gram-positive, and
(2) the morphology of the organism is coccus, and
(3) the growth conformation of the organism is clumps,
then there is suggestive evidence (0.7) that the identity of the organism
is staphylococcus.
The DESIGN ADVISOR gives advice to a chip designer, who can accept or
reject the advice. If the advice is rejected, the system can exploit a
justification-based truth maintenance system to revise its model of the
circuit.
Initially, each expert system that was built was created from scratch,
usually in LISP. But, after several systems had been built this way, it
became clear that these systems often had a lot in common. In
particular, since the systems were constructed as a set of declarative
representations (mostly rules) combined with an interpreter for those
representations, it was possible to separate the interpreter from the
domain-specific knowledge and thus to create a system that could be used
to construct new expert systems by adding new knowledge corresponding to
the new problem domain. The resulting interpreters are called shells.
One influential example of such a shell is EMYCIN which was derived from
MYCIN.
11.3 Explanation
(A first attempt at building an expert system is unlikely to be very successful. This is partly because
the expert generally finds it very difficult to express exactly what knowledge and rules they use to
solve a problem. Much of it is almost subconscious, or appears so obvious they don't even bother
mentioning it. Knowledge acquisition for expert systems is a big area of research, with a wide
variety of techniques developed. However, generally it is important to develop an initial prototype
based on information extracted by interviewing the expert, then iteratively refine it based on
feedback both from the expert and from potential users of the expert system.)
The system should be able to explain its reasoning (to expert, user and knowledge engineer) and
answer questions about the solution process.
• The user interacts with the system through a user interface which may use menus, natural
language or any other style of interaction.
• An inference engine is used to reason with both the expert knowledge (extracted from our
friendly expert) and data specific to the particular problem being solved.
The expert knowledge will typically be in the form of a set of IF-THEN rules. The case specific data
includes both data provided by the user and partial conclusions (along with certainty measures)
based on this data.
• The explanation subsystem allows the program to explain its reasoning to the user.
• A knowledge base editor helps the expert or knowledge engineer to easily update and check
the knowledge base.
Rule-based systems can be either goal driven using backward chaining to test
whether some hypothesis is true, or data driven, using forward chaining to
draw new conclusions from existing data. Expert systems may use either or
both strategies, but the most common is probably the goal driven/backward
chaining strategy.
Knowledge Acquisition:
There are many programs that interact with domain experts to extract expert knowledge efficiently.
These programs provide support for the following activities:
• Entering knowledge
• Maintaining knowledge base consistency
• Ensuring knowledge base completeness
MOLE is a knowledge acquisition system for heuristic classification problems, such as diagnosing
diseases. MOLE has been used to build systems that diagnose problems with car engines, problems
in steel-rolling mills, and inefficiencies in coal-burning power plants.
Production Rules:
Our problem is to work out what's wrong with our car given some observable symptoms. There are
three possible problems with the car: problem_with_spark_plugs, problem_with_battery,
problem_with_starter.
We are assuming that we have been provided with no initial facts about the observable
symptoms.
In the simplest goal-directed system we would try to prove each hypothesised problem (with the
car) in turn. First the system would try to prove ``problem_with_spark_plugs''. Rule 1 is potentially
useful, so the system would set the new goals of proving ``engine_getting_petrol'' and
``engine_turns_over''. Trying to prove the first of these, rule 4 can be used, with the new goal of
proving ``petrol_in_fuel_tank''. There are no rules which conclude this (and the system doesn't
already know the answer), so the system will ask the user:
Is it true that there is petrol in the fuel tank?
Let's say that the answer is yes. This answer would be recorded, so that the user doesn't get asked
the same question again. Anyway, the system now has proved that the engine is getting petrol, so
now wants to find out if the engine turns over. As the system doesn't yet know whether this is the
case, and as there are no rules which conclude this, the user will be asked:
Is it true that the engine turns over?
Let's say this time the answer is no. There are no other rules which can be used to prove
``problem_with_spark_plugs'', so the system will conclude that this is not the solution to the
problem, and will consider the next hypothesis: problem_with_battery. It is true that the engine does
not turn over (the user has just said that), so all it has to prove is that the lights don't come on. It
will ask the user:
Is it true that the lights come on?
Suppose the answer is no. It has now proved that the problem is with the battery. Some systems
might stop there, but usually there might be more than one solution (e.g., more than one fault with
the car), or it will be uncertain which of various solutions is the right one, so usually all hypotheses
are considered. It will try to prove ``problem_with_starter'', but given the existing data (the lights
do not come on) the proof will fail, so the system will conclude that the problem is with the battery.
A complete interaction with our very simple system would therefore consist of the three questions
above and this final conclusion.
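Assuming the four rules behind this dialogue are as reconstructed below (the notes refer to them by number but never list them), a minimal goal-driven, backward-chaining interpreter might look like this:

# Reconstructed rule base (an assumption). Each rule: (conclusion, conditions),
# where every condition is a (proposition, required truth value) pair.
RULES = [
    ('problem_with_spark_plugs', [('engine_getting_petrol', True),
                                  ('engine_turns_over', True)]),      # rule 1
    ('problem_with_battery',     [('engine_turns_over', False),
                                  ('lights_come_on', False)]),        # rule 2
    ('problem_with_starter',     [('engine_turns_over', False),
                                  ('lights_come_on', True)]),         # rule 3
    ('engine_getting_petrol',    [('petrol_in_fuel_tank', True)]),    # rule 4
]

known = {}   # working memory: answers already given or conclusions derived

def prove(goal):
    if goal in known:                 # never ask the same question twice
        return known[goal]
    conds_list = [conds for concl, conds in RULES if concl == goal]
    if conds_list:                    # try every rule that concludes the goal
        result = any(all(prove(c) == want for c, want in conds)
                     for conds in conds_list)
    else:                             # no rule concludes it: ask the user
        answer = input('Is it true that ' + goal.replace('_', ' ') + '? (yes/no) ')
        result = (answer.strip().lower() == 'yes')
    known[goal] = result
    return result

# try each hypothesis in turn, as the dialogue above does
for hypothesis in ('problem_with_spark_plugs', 'problem_with_battery',
                   'problem_with_starter'):
    if prove(hypothesis):
        print('Diagnosis:', hypothesis)

With the answers from the dialogue above (yes, no, no), this sketch asks the same three questions and prints the same conclusion, problem_with_battery.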
Explanation facilities:
Most expert systems have explanation facilities that allow the user to ask why it asked some
question, and how it reached some conclusion. These questions are answered by referring to the
system goals, the rules being used, and any existing problem data. To illustrate the sorts of facilities
commonly provided, we'll elaborate on our simple example above, starting with a simple example
dialogue involving why and how questions and explanations.
In answer to a how question, for instance, the system might explain that the
user said that it is not the case that the engine turns over; therefore, if
it is not the case that the lights come on, then there is a problem with the
battery.
12.1 Natural Language Processing
Understanding natural language requires more than grammar; it also requires
knowledge of the world:
• The sentences We gave the monkeys the bananas because they were
hungry and We gave the monkeys the bananas because they were over-
ripe have the same surface grammatical structure. However, in one
of them the word they refers to the monkeys, in the other it refers
to the bananas: the sentence cannot be understood properly without
knowledge of the properties and behaviour of monkeys and bananas.
The major tasks in NLP include:
• Text to speech
• Speech recognition
• Natural language generation
• Machine translation
• Question answering
• Information retrieval
• Information extraction
• Text-proofing
• Translation technology
• Automatic summarization
Speech synthesis:
The front end has two major tasks. First it takes the raw text and
converts things like numbers and abbreviations into their written-out
word equivalents. This process is often called text normalization, pre-
processing, or tokenization. Then it assigns phonetic transcriptions to
each word, and divides and marks the text into various prosodic units,
like phrases, clauses, and sentences. The process of assigning phonetic
transcriptions to words is called text-to-phoneme (TTP) or grapheme-to-
phoneme (GTP) conversion. The combination of phonetic transcriptions and
prosody information makes up the symbolic linguistic representation that is
output by the front end.
The other part, the back end, takes the symbolic linguistic
representation and converts it into actual sound output. The back end is
often referred to as the synthesizer. The different techniques
synthesizers use are described below.
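As a toy illustration of the front end's text normalization step (the word tables and rules here are vastly simpler than in any real synthesizer):

# Expand a few numbers and abbreviations into written-out words, as a
# text-to-speech front end would during text normalization.
DIGITS = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
          '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'}
ABBREV = {'Dr.': 'Doctor', 'St.': 'Street'}

def normalize(text):
    out = []
    for token in text.split():
        if token in ABBREV:
            out.append(ABBREV[token])
        elif token.isdigit():                    # read numbers digit by digit
            out.append(' '.join(DIGITS[d] for d in token))
        else:
            out.append(token)
    return ' '.join(out)

print(normalize('Dr. Smith lives at 42 Elm St.'))
# -> 'Doctor Smith lives at four two Elm Street'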
Synthesizer technologies:
There are two main technologies used for generating synthetic speech
waveforms: concatenative synthesis and formant synthesis.
Concatenative synthesis:
Concatenative synthesis is based on stringing together (concatenating)
segments of recorded human speech.
Formant synthesis:
Formant synthesis does not use any human speech samples at runtime.
Instead, the output synthesized speech is created using an acoustic
model. Parameters such as fundamental frequency, voicing, and noise
levels are varied over time to create a waveform of artificial speech.
This method is sometimes called Rule-based synthesis but some argue that
because many concatenative systems use rule-based components for some
parts of the system, like the front end, the term is not specific enough.
Speech recognition:
Classification:
Approaches
The two most common approaches used to recognize a speaker’s response are
often called grammar constrained recognition and natural language
recognition. When ASR (automatic speech recognition) is used to
transcribe speech, it is commonly called dictation.
Stages
Machine translation:
Question answering:
Information retrieval:
Proofreading:
Translation:
Automatic summarization:
Technologies that can make a coherent summary, of any kind of text, need
to take into account several variables such as length, writing-style and
syntax to make a useful summary.
Some problems which make NLP difficult
Word boundary detection:
In spoken language, there are usually no gaps between words; where
to place the word boundary often depends on what choice makes the
most sense grammatically and given the context. In written form,
languages like Chinese do not signal word boundaries either.
Word sense disambiguation:
Many words have more than one meaning; we have to select the
meaning which makes the most sense in context.
Syntactic ambiguity:
The grammar for natural languages is ambiguous, i.e. there are
often multiple possible parse trees for a given sentence.
Choosing the most appropriate one usually requires semantic and
contextual information.
Imperfect or irregular input:
Foreign or regional accents and vocal impediments in speech; typing
or grammatical errors, OCR errors in texts.
Speech acts and plans:
Sentences often don't mean what they literally say; for instance a
good answer to "Can you pass the salt" is to pass the salt; in most
contexts "Yes" is not a good answer, although "No" is better and
"I'm afraid that I can't see it" is better yet. Or again, if a
class was not offered last year, "The class was not offered last
year" is a better answer to the question "How many students failed
the class last year?" than "None" is.
12.2 Perception
The senses
Human perception depends on the senses. The classical five senses are
sight, hearing, smell, taste and touch. Along with these there are at
least four other senses: proprioception (body awareness),
equilibrioception (balance), thermoception (heat) and nociception (pain).
Beyond these, some believe in the existence of other senses such as
precognition (or foretelling) or telepathy (direct communication between
human minds/brains without transmittance through any other medium). While
these are controversial, it is known that animals of other species
possess senses that are not found in humans: for example, some fish can
detect electric fields, while pigeons have been shown to detect magnetic
fields and to use them in homing.
In psychology and the cognitive
sciences, perception is the process of acquiring, interpreting,
selecting, and organizing sensory information. Methods of studying
perception range from essentially biological or physiological approaches,
through psychological approaches to the often abstract 'thought-
experiments' of mental philosophy.
Categories of perception
12.3 Robotics
Introduction to Robotics:
The robots of the movies, such as C-3PO and the Terminator, are portrayed
as fantastic, intelligent, even dangerous forms of artificial life.
However, robots of today are not exactly the walking, talking intelligent
machines of movies, stories and our dreams. Today, we find most robots
working for people in factories, warehouses, and laboratories. In the
future, robots may show up in other places: our schools, our homes, even
our bodies.
Robots have the potential to change our economy, our health, our standard
of living, our knowledge and the world in which we live. As the
technology progresses, we are finding new ways to use robots. Each new
use brings new hope and possibilities, but also potential dangers and
risks.
What is a Robot?
Examples:
Robot Basics:
Most robots are designed to be a helping hand. They help people with
tasks that would be difficult, unsafe, or boring for a real person to do
alone.
At its simplest, a robot is a machine that can be programmed to perform a
variety of jobs, which usually involve moving or handling objects. Robots
can range from simple machines to highly complex, computer-controlled
devices.
Many of today's robots are robotic arms. In this section, we will focus
on one very "flexible" kind of robot, which looks similar to a certain
part of your body. It is called a jointed-arm robot.
The vast majority of robots do have several qualities in common. First of
all, almost all robots have a movable body. Some only have motorized
wheels, and others have dozens of movable segments, typically made of
metal or plastic. Like the bones in your body, the individual segments
are connected together with joints.
Robots spin wheels and pivot jointed segments with some sort of actuator.
Some robots use electric motors and solenoids as actuators; some use a
hydraulic system; and some use a pneumatic system (a system driven by
compressed gases). Robots may use all these actuator types.
A robot needs a power source to drive these actuators. Most robots either
have a battery or they plug into the wall. Hydraulic robots also need a
pump to pressurize the hydraulic fluid, and pneumatic robots need an air
compressor or compressed air tanks.
The actuators are all wired to an electrical circuit. The circuit powers
electrical motors and solenoids directly, and it activates the hydraulic
system by manipulating electrical valves. The valves determine the
pressurized fluid's path through the machine. To move a hydraulic leg,
for example, the robot's controller would open the valve leading from the
fluid pump to a piston cylinder attached to that leg. The pressurized
fluid would extend the piston, swiveling the leg forward. Typically, in
order to move their segments in two directions, robots use pistons that
can push both ways.
Not all robots have sensory systems, and few have the ability to see,
hear, smell or taste. The most common robotic sense is the sense of
movement -- the robot's ability to monitor its own motion. A standard
design uses slotted wheels attached to the robot's joints. An LED on one
side of the wheel shines a beam of light through the slots to a light
sensor on the other side of the wheel. When the robot moves a particular
joint, the slotted wheel turns. The slots break the light beam as the
wheel spins. The light sensor reads the pattern of the flashing light and
transmits the data to the computer. The computer can tell exactly how far
the joint has swiveled based on this pattern. This is the same basic
system used in computer mice.
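A back-of-the-envelope version of that slot-counting computation (the slots-per-revolution figure is an assumed parameter):

# Joint rotation from a slotted-wheel sensor: each break of the light beam
# means one slot has passed the LED.
SLOTS_PER_REV = 360          # assumed wheel resolution

def joint_angle_degrees(beam_breaks):
    return beam_breaks * 360.0 / SLOTS_PER_REV

print(joint_angle_degrees(90))   # 90 breaks -> 90.0 degrees with this wheel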
These are the basic nuts and bolts of robotics. Roboticists can combine
these elements in an infinite number of ways to create robots of
unlimited complexity. In the next section, we'll look at one of the most
popular designs, the robotic arm.
Main Parts:
• Controller
Every robot is connected to a computer, which keeps the pieces of the arm
working together. This computer is known as the controller. The
controller functions as the "brain" of the robot. The controller also
allows the robot to be networked to other systems, so that it may work
together with other machines, processes, or robots. Robots today have
controllers that are run by programs - sets of instructions written in
code. Almost all robots of today are entirely pre-programmed by people;
they can do only what they are programmed to do at the time, and nothing
else. In the future, controllers with artificial intelligence, or AI,
could allow robots to think on their own, even program themselves. This
could make robots more self-reliant and independent.
• Arm
Robot arms come in all shapes and sizes. The arm is the part of the robot
that positions the end-effector and sensors to do their pre-programmed
business. Many (but not all) resemble human arms, and have shoulders,
elbows, wrists, even fingers. This gives the robot a lot of ways to
position itself in its environment. Each joint is said to give the robot
1 degree of freedom. So, a simple robot arm with 3 degrees of freedom
could move in 3 ways: up and down, left and right, forward and backward.
Most working robots today have 6 degrees of freedom.
• Drive
The drive is the "engine" that drives the links (the sections between the
joints) into their desired position. Without a drive, a robot would just
sit there, which is not often helpful. Most drives are powered by air,
water pressure, or electricity.
• End Effector
• Sensor
Most robots of today are nearly deaf and blind. Sensors can provide some
limited feedback to the robot so it can do its job. Compared to the
senses and abilities of even the simplest living things, robots have a
very long way to go. The sensor sends information, in the form of
electronic signals, back to the controller. Sensors also give the robot
controller information about its surroundings and let it know the exact
position of the arm, or the state of the world around it. Sight, sound,
touch, taste, and smell are the kinds of information we get from our
world. Robots can be designed and programmed to get specific information
that is beyond what our 5 senses can tell us. For instance, a robot
sensor might "see" in the dark, detect tiny amounts of invisible
radiation or measure movement that is too small or fast for the human eye
to see.
The most common manufacturing robot is the robotic arm. A typical robotic
arm is made up of seven metal segments, joined by six joints. The
computer controls the robot by rotating individual step motors connected
to each joint (some larger arms use hydraulics or pneumatics). Unlike
ordinary motors, step motors move in exact increments. This allows the
computer to move the arm very precisely, repeating exactly the same
movement over and over again. The robot uses motion sensors to make sure
it moves just the right amount.
Although robots can't do every type of job, there are certain tasks
robots do very well:
• Assembling products
• Handling dangerous materials
• Spraying finishes
• Inspecting parts, produce, and livestock
• Cutting and polishing
Computers can already solve problems in limited realms. The basic idea of
AI problem-solving is very simple, though its execution is complicated.
First, the AI robot or computer gathers facts about a situation through
sensors or human input. The computer compares this information to stored
data and decides what the information signifies. The computer runs
through various possible actions and predicts which action will be most
successful based on the collected information. Of course, the computer
can only solve problems it's programmed to solve -- it doesn't have any
generalized analytical ability. Chess computers are one example of this
sort of machine.
Some modern robots also have the ability to learn in a limited capacity.
Learning robots recognize if a certain action (moving its legs in a
certain way, for instance) achieved a desired result (navigating an
obstacle). The robot stores this information and attempts the successful
action the next time it encounters the same situation. Again, modern
computers can only do this in very limited situations. They can't absorb
any sort of information like a human can. Some robots can learn by
mimicking human actions. In Japan, roboticists have taught a robot to
dance by demonstrating the moves themselves.