
CS 5/7320
Artificial Intelligence

Search with Uncertainty
AIMA Chapters 4.3-4.5

Slides by Michael Hahsler
with figures from the AIMA textbook

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Types of uncertainty we consider for now*

• Nondeterministic actions: The outcome of an action in a state is uncertain.

• No observations: Sensorless problem.

• Partially observable environments: The agent does not know in what state the environment is.

• Exploration: Unknown environments and online search.

* We will quantify uncertainty with probabilities later.
Remember: Solving Search Problems under Certainty

No uncertainty:
• Deterministic actions with a known transition model, e.g., Result(s1, a) = s4
• Full observability (we have sensors to see the whole environment)

State space: A state completely describes the environment and the agent. The initial state and the goal states are given.

The solution of the planning phase is a sequence of actions, also called a plan, that can be blindly followed: [Suck, Right, Suck]
Consequence of Uncertainty

The solution is typically not a fixed precomputed plan (sequence of actions), but a conditional plan (also called a strategy or policy) that depends on percepts.

Nondeterministic Actions

Nondeterministic Actions
The outcome of actions in the environment is nondeterministic, so the transition model needs to describe this uncertainty.

Example transition:

Results(s1, a) = {s2, s4, s5}

i.e., action a in s1 can lead to one of several states.


Example:
Erratic Vacuum World
Regular fully observable vacuum world, but the action 'Suck' is more powerful and nondeterministic:

a) On a dirty square: it cleans the square and sometimes also cleans dirt on adjacent squares.
b) On a clean square: it sometimes deposits some dirt on the square.
Example: Erratic Vacuum World

Start state

Results(1, Suck) = {5, 7}

Goal states

We need a conditional plan:

[Suck, if State = 5 then [Right, Suck] else []]
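To make the transition model concrete, here is a minimal Python sketch (not from the slides or the AIMA code) of the erratic vacuum world's nondeterministic Results function. It assumes states are written as (agent_location, dirt_left, dirt_right) tuples rather than the numbers 1-8 used in the figure.

# Hypothetical encoding of the erratic vacuum world (not the textbook's code).
# A state is (agent_loc, dirt_left, dirt_right) with agent_loc in {"L", "R"}.

def results(state, action):
    """Return the set of possible successor states of a nondeterministic action."""
    loc, dirt_l, dirt_r = state
    if action == "Left":
        return {("L", dirt_l, dirt_r)}
    if action == "Right":
        return {("R", dirt_l, dirt_r)}
    if action == "Suck":
        here_dirty = dirt_l if loc == "L" else dirt_r
        if here_dirty:
            # Cleans the current square; sometimes also cleans the adjacent square.
            clean_here = (loc, False, dirt_r) if loc == "L" else (loc, dirt_l, False)
            return {clean_here, (loc, False, False)}
        # On a clean square, Suck sometimes deposits dirt.
        deposit = (loc, True, dirt_r) if loc == "L" else (loc, dirt_l, True)
        return {state, deposit}
    return {state}

# Suck in "state 1" (agent left, both squares dirty) has two possible outcomes,
# mirroring Results(1, Suck) = {5, 7} on the slide.
print(results(("L", True, True), "Suck"))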
Finding a Conditional Plan: AND-OR Search Tree

[Figure: AND-OR search tree for the erratic vacuum world. OR nodes choose one action (e.g., Suck, Right); AND nodes contain all possible outcomes. A LOOP leaf means there is no need to continue the search there: the solution is the same as the one above it.]

The solution is shown with bold arrows: [Suck, if State = 5 then [Right, Suck] else []]

A solution is a subtree that
1. has only GOAL leaf nodes,
2. specifies one action at each OR node (state), and
3. includes every outcome of AND nodes.
AND-OR Tree Search: Idea

• Descend the tree by trying an action in each OR node and considering all resulting states of the AND nodes.
• Remove branches (actions) if we cannot find a subtree below them that leads only to goal nodes (see failure in the code on the next slide). Loop nodes can be ignored.
• Stop when we find a subtree that has only goal states in all leaf nodes.
• Construct the conditional plan that represents the subtree starting at the root node:
  [Suck, if State = 5 then [Right, Suck] else []]
AND-OR Recursive DFS Algorithm
= nested if-then-else statements

Annotations on the pseudocode:
• path is only maintained for cycle checking (don't follow loops using path).
• OR nodes: try all possible actions; failure means we found no action that leads to a goal-only subtree.
• AND nodes: try all possible outcomes (= belief state); none may fail. Fail if we find any non-goal subtree.

Notes:
• The DFS search tree is implicitly created using the call stack (recursive algorithm).
• DFS is not optimal! BFS and A* search can be used to find better solutions (e.g., the smallest subtree).
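The slide above preserves only the comments of the AIMA pseudocode (Figure 4.11). Below is a hedged Python sketch of the same recursive AND-OR depth-first search; the problem interface (initial_state, is_goal, actions, results) and the plan representation (a list of actions with a dict for the conditional branch) are assumptions of this sketch, not the textbook's exact API.

def and_or_search(problem):
    """Depth-first AND-OR search; returns a conditional plan or None (= failure)."""
    return or_search(problem, problem.initial_state, [])

def or_search(problem, state, path):
    if problem.is_goal(state):
        return []                                  # empty plan: already at a goal
    if state in path:
        return None                                # loop: don't follow cycles (path)
    for action in problem.actions(state):          # try all possible actions
        plan = and_search(problem, problem.results(state, action), [state] + path)
        if plan is not None:
            return [action] + plan
    return None            # failure: no action leads to a goal-only subtree

def and_search(problem, states, path):
    """All possible outcomes (= the resulting belief state) must succeed."""
    plans = {}
    for s in states:                               # try all possible outcomes
        plan = or_search(problem, s, path)
        if plan is None:
            return None                            # fail on any non-goal subtree
        plans[s] = plan
    if len(plans) == 1:                            # single outcome: no branch needed
        return next(iter(plans.values()))
    return [plans]     # conditional step: "if state == s then follow plans[s]"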
Use of Conditional Plans
• Planning is done by a goal-based agent.
• The conditional plan can be executed by a model-based reflex agent.

Example: After the initial action "Suck", the agent's state (= program counter) points to step 2.

Step  Program
1     [Suck,
2       if State = 5 then
3         [Right,
4           Suck]
        else
4b        []
      ]
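As a hedged illustration of how a model-based reflex agent could execute such a plan, the sketch below walks the nested plan produced by the AND-OR search sketch above; the plan format and the perceive/do_action callbacks are assumptions of this example, not part of the slides.

def execute_plan(plan, perceive, do_action):
    """Walk a conditional plan: plain steps are executed as actions; a dict step
    branches on the currently perceived state (the recursion plays the role of
    the agent's program counter)."""
    for step in plan:
        if isinstance(step, dict):
            # Branch point: pick the subplan matching the perceived state.
            execute_plan(step.get(perceive(), []), perceive, do_action)
        else:
            do_action(step)        # an ordinary action such as "Suck" or "Right"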
Search with no
Observations
Using Actions to
“Coerce” the World into
Known States
No Observations

Sensorless problem = unobservable environment, also called a conformant problem.

Why is this useful?

• Example: A doctor prescribes a broad-spectrum antibiotic instead of performing time-consuming blood work for a more specific antibiotic. This saves time and money.

• Basic idea: Find a solution (a sequence of actions) that works (reasonably well) from any state and then just blindly execute it (open-loop system).
Belief State
• The agent does not know exactly which state it is in.
• However, it may know that it is in one of a set of possible states. This set is called the belief state of the agent.
• Example: b = {s2, s4, s6}
Actions to Coerce the World into States
• Actions can reduce the number of possible states.
• Example: Deterministic vacuum world. The agent does not know its position or the dirt distribution.
• Initial belief state: {1, 2, 3, 4, 5, 6, 7, 8}

[Figures: the belief state after applying Right, and after applying Suck; the goal states are marked.]
Actions to Coerce the World into States
• The action sequence [Right, Suck, Left, Suck] coerces the world into the goal state 7. It works from any initial state!
• There are no observations, so there is no need for a conditional plan.
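A small sketch of this coercion, assuming a deterministic two-square vacuum world encoded as (agent_location, dirt_left, dirt_right) tuples instead of the state numbers 1-8: applying the fixed action sequence shrinks the belief state until a single (goal) state remains.

def result(state, action):
    """Deterministic vacuum world: Left/Right move the agent, Suck cleans its square."""
    loc, dirt_l, dirt_r = state
    if action == "Right":
        return ("R", dirt_l, dirt_r)
    if action == "Left":
        return ("L", dirt_l, dirt_r)
    if action == "Suck":
        return (loc, False, dirt_r) if loc == "L" else (loc, dirt_l, False)
    return state

# Initial belief state: all 8 states (agent position and dirt distribution unknown).
belief = {(loc, dl, dr) for loc in "LR" for dl in (True, False) for dr in (True, False)}

for action in ["Right", "Suck", "Left", "Suck"]:
    belief = {result(s, action) for s in belief}       # apply the action to every state
    print(action, "->", len(belief), "possible states")

print(belief)   # a single state: agent on the left, both squares clean (state 7)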
Example: The Reachable Belief-State Space for the Deterministic, Sensorless Vacuum World

The size of the belief-state space depends on the number of states N:

|P(S)| = 2^N = 2^8 = 256

Only a small fraction (12 belief states) is reachable from the initial belief state.

There are no observations, so we get a solution sequence from an initial belief state:
[Right, Suck, Left, Suck]
Finding a Solution Sequence
Note: The size of the belief-state space makes this impractical for larger problems!

Formulate as a regular search problem and solve it with DFS, BFS, or A*:

• States: all belief states (= the powerset P(S) of the set of states, of size 2^N for N states).
• Initial state: often the belief state consisting of all states.
• Actions: the actions of a belief state are the union of the possible actions of all the states it contains.
• Transition model: b' = Results(b, a) = {s' : s' = Result(s, a) and s ∈ b}
• Goal test: are all states in the belief state goal states?
• Simplifying property: if a belief state (e.g., b1 = {1,2,3,4,5}) is solvable (i.e., there is a sequence of actions that coerces all of its states into goal states), then belief states that are subsets (e.g., b2 = {2,5}) are also solved by the same action sequence. This is used to prune the search tree.

Other approach:
• Incremental belief-state search: generate a solution that works for one state and check whether it also works for all other states. If it does not, modify the solution slightly. This is similar to local search.
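A hedged sketch of this belief-state formulation, written against an abstract underlying problem (the names actions, result, and is_goal are assumptions): belief states are frozensets of physical states, so any standard graph search (BFS, A*) can run over them unchanged.

from itertools import chain

def belief_actions(belief, actions):
    """Actions of a belief state: the union of the actions of all member states."""
    return set(chain.from_iterable(actions(s) for s in belief))

def belief_result(belief, action, result):
    """Transition model over belief states: b' = { Result(s, a) : s in b }."""
    return frozenset(result(s, action) for s in belief)

def belief_is_goal(belief, is_goal):
    """Goal test: every physical state in the belief state must be a goal state."""
    return all(is_goal(s) for s in belief)

# Pruning property: if b1 is solvable, every subset b2 of b1 is solved by the same
# action sequence, so the search can discard belief states that are supersets of
# belief states already known to be solvable.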
Case Study

[Figure: a rectangular room with marked dimensions (1 m, 2 m, 3 m, 8 m), the goal location marked x, and the agent's starting area.]

The agent can move up, down, right, and left. The agent has no sensors and does not know its current location.

1. Can you navigate to the goal location? How?

2. What would you need to know about the environment?

3. What type of agent can do this?
Partially Observable
Environments
Using Observations to
Learn About the State
Percepts and Observability
• Many problems cannot be solved efficiently without sensing (e.g., the 8-puzzle).
• We need to see at least one square.

Percept function: Percept(s), where s is the state.

Example: Percept(s) = Tile7

• Fully observable: Percept(s) = s
• Sensorless: Percept(s) = null
• Partially observable: Percept(s) = o, where o is called an observation and tells us something about s.

Problem: Many states (different orders of the hidden tiles) can produce the same observation!
Use Observations to Learn About the State

Agents choose an action and then receive an observation.

Idea: Observations can be used to learn about the agent's state.

Assume we have a current belief state b (i.e., the set of states we could be in).

Prediction for action: Choose an action a and compute the new belief state that results from the action:

b̂ = Predict(b, a) = ⋃ over s ∈ b of Predict(s, a)

Update with observation: You receive an observation o and keep only the states that are consistent with the new observation. The belief after observing o is:

b_o = Update(b̂, o) = {s : s ∈ b̂ and Percept(s) = o}

Both steps in one: b ← Update(Predict(b, a), o)
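A brief sketch of this predict-update cycle, assuming a possible-outcomes function results(s, a) and a percept function percept(s) are supplied; the names mirror the notation above, but the code itself is illustrative only.

def predict(belief, action, results):
    """b_hat: every state reachable from some state in b by the chosen action."""
    return {s2 for s in belief for s2 in results(s, action)}

def update(belief, observation, percept):
    """Keep only the states in b_hat that are consistent with the observation."""
    return {s for s in belief if percept(s) == observation}

def estimate(belief, action, observation, results, percept):
    """Both steps in one: b <- Update(Predict(b, a), o)."""
    return update(predict(belief, action, results), observation, percept)

With a model of the local-sensing vacuum world plugged in for results and percept, this cycle reproduces the example on the next slide: Update(Predict({1, 3}, Right), [R, Dirty]) = {2}.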


Example: Deterministic Local-Sensing Vacuum World

Predict for action a, then update with observation o.

Observation received: [R, Dirty]

b ← Update(Predict(b, a), o)

Update(Predict({1, 3}, Right), [R, Dirty]) = {2}
Solving Partially Observable Problems

Use an AND-OR tree of belief states to create a conditional plan.

[Figure: AND-OR tree starting at the initial belief state. OR nodes choose an action; the predict step computes the predicted belief state; the update step branches on the possible percepts [L,Clean], [R,Dirty], and [R,Clean] (AND nodes).]

Solution: [Suck, Right, if b = {6} then Suck else []]

b = {6} is the result of the update with o = [R, Dirty].
State Estimation and Approximate Belief States

• Agents choose an action and then receive an observation from the environment.
• The agent keeps track of its belief state using the following update:

b ← Update(Predict(b, a), o)

• This process is often called
  • monitoring,
  • filtering, or
  • state estimation.

• The agent needs to be able to update its belief state following observations in real time! For many practical applications, there is only time to compute an approximate belief state. These approximate methods are used in control theory and reinforcement learning.
Case Study: Partially Observable 8-Puzzle

1. Give a problem description for each step:
• States:
• Initial state:
• Actions:
• Transition model:
• Goal test:
• Percept function:

2. The problem can be solved using an AND-OR tree, but is there an easier solution?

a. What type of agent do we use?

b. What algorithms can be used?
Exploration
Unknown Environment and
Online Search
Online Search
• Recall offline search: Create a plan using the state space as a model before
taking any action. The plan can be a sequence of actions or a conditional plan
to account for uncertainty.

• The agent uses the transition function to predict the consequence of actions.
What if the transition function is unknown?

• Online search explores the real world one action at a time. Prediction is
replaced by “act” and update by “observe.”

Act Observe Act Observe Act …


• Useful for
  • Real-time problems: When offline computation takes too long and there is a penalty for sitting around and thinking.
  • Nondeterministic domains: Only focus on what actually happens instead of planning for everything!
  • Unknown environments: The agent has no complete model of how the environment works. It needs to explore an unknown state space and/or learn what actions do, i.e., it needs to learn the transition function f : S × A → S
Design Considerations for Online Search

• Knowledge: What does the agent already know about the outcome of actions (the transition function)? E.g.,
  • Does going north and then south lead back to the same location?
  • Where are the walls in the maze?

• Safely explorable state space/world: There are no irreversible actions (e.g., traps, cliffs). At the very least, the agent needs to be able to avoid such actions.

• Exploration order: Expanding nodes in a local order is more efficient if you must execute the actions to get observations: use depth-first search with backtracking instead of BFS or A* search.
Online Search: Model-based Agent Program for an Unknown Environment

The environment is deterministic but
• only partially observable (Percept(s) = current location; the state space may be unknown), and
• the transition model (the function result) is unknown.

Approach: The algorithm builds the map result(s, a) → s' by trying all actions, and it backtracks when all actions in a state have been explored. It learns the result function (= the transition function).

Annotations on the pseudocode (a sketch follows below):
• untried is the "frontier".
• unbacktracked stores the current path.
• Record each transition that is found.
• Keep breadcrumbs to go back.
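The slide keeps only the margin annotations of the textbook's ONLINE-DFS-AGENT pseudocode (AIMA Figure 4.21). The sketch below is a hedged Python rendering of the same idea; the bookkeeping tables (result, untried, unbacktracked) follow the annotations above, while details such as the callable-agent interface are assumptions of this sketch.

class OnlineDFSAgent:
    """Hedged sketch: explores an unknown, deterministic, safely explorable
    environment by trying untried actions and backtracking along breadcrumbs."""

    def __init__(self, actions, is_goal):
        self.actions = actions           # actions(state) -> iterable of legal actions
        self.is_goal = is_goal
        self.result = {}                 # learned transition model: (s, a) -> s'
        self.untried = {}                # state -> actions not yet tried (the "frontier")
        self.unbacktracked = {}          # state -> breadcrumb states to go back to
        self.s = None                    # previous state
        self.a = None                    # previous action

    def __call__(self, s_prime):         # percept = the current state s'
        if self.is_goal(s_prime):
            return None                  # stop: goal reached
        if s_prime not in self.untried:
            self.untried[s_prime] = list(self.actions(s_prime))
        if self.s is not None and self.result.get((self.s, self.a)) != s_prime:
            self.result[(self.s, self.a)] = s_prime                    # record transition
            self.unbacktracked.setdefault(s_prime, []).append(self.s)  # drop a breadcrumb
        if self.untried[s_prime]:
            self.a = self.untried[s_prime].pop()          # explore a new action
        elif self.unbacktracked.get(s_prime):
            back_to = self.unbacktracked[s_prime].pop()
            # Pick the (already learned) action that leads back to the breadcrumb;
            # this assumes every action can be undone (safely explorable world).
            self.a = next(a for (s, a), s2 in self.result.items()
                          if s == s_prime and s2 == back_to)
        else:
            return None                  # nothing left to try and nowhere to go back
        self.s = s_prime
        return self.a

Each call receives the current state as the percept and returns the next action to execute, or None to stop.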


Case Study: DFS with Backtracking for an Unknown Maze

• We can only see adjacent squares and don't know the location of the goal!
• We cannot plan; we must explore by walking around! The transition function is unknown.
• We only know what we have already explored.
• A simple method is to store the path (unbacktracked) for backtracking, so we can get back to untried paths (untried ≈ the frontier) when we run into a dead end (i.e., use breadcrumbs).
Important concepts that you
should be able to explain and
use now…

• Difference between solution types:
  a. a fixed action sequence,
  b. a conditional plan (also called a strategy or policy), and
  c. exploration.
• What are belief states?
• How actions can be used to coerce the world into
known states.
• How observations can be used to learn about the
state: State estimation with repeated predict and
update steps.
• The use of AND-OR trees to solve small problems.
