Unit 4 Notes FAI

The document discusses planning and planning agents. It defines planning as devising a plan of action to achieve goals by coming up with a sequence of actions. Planning agents are similar to problem solving agents but differ in their representation of goals, states, actions and how they search for solutions. Classical planning considers fully observable, deterministic environments while nonclassical planning handles partially observable or stochastic environments. Forward and backward state-space search are approaches to planning as a search problem but have limitations in handling decomposition which is important for efficiency. The STRIPS language is presented as the basic representation for classical planning problems involving states, goals and actions.


Unit 4

PLANNING AND ACTING


Planning is devising a plan of action to achieve one’s goals. Planning is the task of coming
up with a sequence of actions that will achieve a goal.
Planning vs. Problem Solving
Planning agent is very similar to problem solving agent. It constructs plans to achieve goals,
then executes them.
Planning agent is different from problem solving agent in:
 Representation of goals, states, actions
 Use of explicit, logical representations
 Way it searches for solutions
Planning systems do the following:
 divide and conquer the problem
 relax the requirement for sequential construction of solutions.
The task of coming up with a sequence of actions that will achieve a goal is called planning.
We have seen two examples of planning agents so far: the search-based problem-solving
agent and the logical planning agent.
Classical Planning:
In classical planning we consider only environments that are fully observable, deterministic, finite, static (change happens
only when the agent acts), and discrete (in time, action, objects, and effects). These are called
classical planning environments.
Nonclassical Planning:
Nonclassical planning is for partially observable or stochastic environments and involves a different
set of algorithms and agent designs.
Disadvantages of the problem-solving agent:
1. A problem-solving agent can be overwhelmed by irrelevant actions.
2. Finding a good heuristic function is difficult.
3. The problem solver can be inefficient because it cannot take advantage of problem
decomposition.
The planner can work on subgoals independently, but might need to do some additional work to
combine the resulting subplans.
The language of planning problems
the representation of planning problems-states, actions, and goals-should make it possible for
planning algorithms to take advantage of the logical structure of the problem. The key is to find a
language that is expressive enough to describe a wide variety of problems, but restrictive enough to
allow efficient algorithms to operate over it.
The basic representation language of classical planners is known as the STRIPS language.
Representation of states. Planners decompose the world into logical conditions and represent a
state as a conjunction of positive literals. The closed-world assumption is used, meaning that any
conditions that are not mentioned in a state are assumed false.
Representation of goals. A goal is a partially specified state, represented as a conjunction of
positive ground literals. For example, the state Rich ˄ Famous ˄ Miserable satisfies the goal Rich ˄
Famous.
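Under the closed-world assumption, goal satisfaction reduces to a subset test. A minimal Python sketch (the literal strings are illustrative, not a fixed API):

```python
def satisfies(state, goal):
    """A state (set of positive ground literals, closed world) satisfies a
    goal (conjunction of positive ground literals) iff every goal literal
    appears in the state; unmentioned conditions are assumed false."""
    return set(goal) <= set(state)

state = {"Rich", "Famous", "Miserable"}
print(satisfies(state, {"Rich", "Famous"}))   # True
print(satisfies(state, {"Rich", "Healthy"}))  # False: Healthy is unmentioned
```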
Representation of actions. An action is specified in terms of the preconditions that must hold
before it can be executed and the effects that ensue when it is executed. For example, an action for
flying a plane from one location to another is:
Action(Fly(p, from, to),
PRECOND: At(p, from) ˄ Plane(p) ˄ Airport(from) ˄ Airport(to)
EFFECT: ¬At(p, from) ˄ At(p, to))
This is more properly called an action schema, meaning that it represents a number of different
actions that can be derived by instantiating the variables p, from, and to with different constants. In general,
an action schema consists of three parts:
The action name and parameter list-for example, Fly(p, from, to)-serves to identify the action.
The precondition is a conjunction of function-free positive literals stating what must
be true in a state before the action can be executed. Any variables in the precondition
must also appear in the action's parameter list.
The effect is a conjunction of function-free literals describing how the state changes when the
action is executed. A positive literal P in the effect is asserted to be true in the state resulting
from the action, whereas a negative literal ¬P is asserted to be false. Variables in the effect
must also appear in the action's parameter list.
Having defined the syntax for representations of planning problems, we can now define the semantics.
The most straightforward way to do this is to describe how actions affect states.
First, we say that an action is applicable in any state that satisfies its precondition; otherwise, the
action has no effect.
For a first-order action schema, establishing applicability will involve a substitution θ for the
variables in the precondition. For example, suppose the current state is described by
At(P1, JFK) ˄ At(P2, SFO) ˄ Plane(P1) ˄ Plane(P2)
˄ Airport(JFK) ˄ Airport(SFO).
This state satisfies the precondition
At(p, from) ˄ Plane(p) ˄ Airport(from) ˄ Airport(to)
with substitution {p/P1, from/JFK, to/SFO}.
Thus, after Fly(P1, JFK, SFO), the current state becomes
At(P1, SFO) ˄ At(P2, SFO) ˄ Plane(P1) ˄ Plane(P2)
˄ Airport(JFK) ˄ Airport(SFO).
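This progression semantics can be sketched directly: an action is applicable when its precondition literals are all present, and the successor state drops the delete-list literals and adds the add-list literals. A minimal sketch, with ground literals encoded as strings (an assumed encoding, not STRIPS syntax itself):

```python
def apply_action(state, precond, add, delete):
    """Apply a ground STRIPS action by progression."""
    if not precond <= state:
        raise ValueError("action not applicable in this state")
    return (state - delete) | add

s0 = {"At(P1,JFK)", "At(P2,SFO)", "Plane(P1)", "Plane(P2)",
      "Airport(JFK)", "Airport(SFO)"}
# Ground instance Fly(P1, JFK, SFO):
s1 = apply_action(
    s0,
    precond={"At(P1,JFK)", "Plane(P1)", "Airport(JFK)", "Airport(SFO)"},
    add={"At(P1,SFO)"},
    delete={"At(P1,JFK)"},
)
print("At(P1,SFO)" in s1, "At(P1,JFK)" in s1)  # True False
```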
In recent years, it has become clear that STRIPS is insufficiently expressive for some real domains. As
a result, many language variants have been developed. Figure 11.1 briefly describes one important
one, the Action Description Language or ADL, by comparing it with the basic STRIPS language. In
ADL, the Fly action could be written more flexibly; for example, ADL allows both positive and negative literals in states.

2. PLANNING WITH STATE-SPACE SEARCH


Forward state-space search
Planning with forward state-space search is similar to problem-solving search. It is sometimes called
progression planning, because it moves in the forward direction.

We start in the problem's initial state, considering sequences of actions until we find a sequence
that reaches a goal state. The formulation of planning problems as state-space search problems is as
follows:
 The initial state of the search is the initial state from the planning problem. In general, each
state will be a set of positive ground literals; literals not appearing are false.
 The actions that are applicable to a state are all those whose preconditions are satisfied. The
successor state resulting from an action is generated by adding the positive effect literals and
deleting the negative effect literals.
 The goal test checks whether the state satisfies the goal of the planning problem.
 The step cost of each action is typically 1.
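The four points above fit a standard breadth-first search directly. A self-contained sketch over a toy two-action domain (the action tuples are an assumed encoding, not STRIPS syntax):

```python
from collections import deque

# Ground actions as (name, precond, add, delete) tuples (illustrative).
ACTIONS = [
    ("Fly(P1,JFK,SFO)", frozenset({"At(P1,JFK)"}), frozenset({"At(P1,SFO)"}),
     frozenset({"At(P1,JFK)"})),
    ("Fly(P1,SFO,JFK)", frozenset({"At(P1,SFO)"}), frozenset({"At(P1,JFK)"}),
     frozenset({"At(P1,SFO)"})),
]

def forward_search(init, goal, actions):
    """Breadth-first progression planning; each action has step cost 1."""
    frontier = deque([(frozenset(init), [])])
    seen = {frozenset(init)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                    # goal test: goal literals present
            return plan
        for name, pre, add, dele in actions:
            if pre <= state:                 # applicable: precondition holds
                succ = (state - dele) | add  # progression
                if succ not in seen:
                    seen.add(succ)
                    frontier.append((succ, plan + [name]))
    return None

print(forward_search({"At(P1,JFK)"}, {"At(P1,SFO)"}, ACTIONS))
# ['Fly(P1,JFK,SFO)']
```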
Why is forward state-space search inefficient?
 It does not address the irrelevant action problem-all applicable actions are considered from
each state.
 The approach quickly bogs down without a good heuristic.
Backward state-space search
 Backward search is similar to bidirectional search.
 Disadvantages
o Difficult to implement when the goal states are described by a set of constraints rather
than being listed explicitly.
o It is not always obvious how to generate a description of the possible predecessors of
the set of goal states.
 Advantages
o It allows us to consider only relevant actions.
o An action is relevant to a conjunctive goal if it achieves one of the conjuncts of the
goal.
 Searching backwards is sometimes called regression planning.
 The principal question is: what are the states from which applying a given action leads to the
goal?
 Computing the description of these states is called regressing the goal through the action.
 Consider the air cargo example; the goal is
At(C1,B) ˄ At(C2,B) ˄ …… ˄ At(C20,B)
o and the relevant action Unload(C1,p,B), which achieves the first conjunct.
o The action will work only if its preconditions are satisfied.
o Therefore, any predecessor state must include these preconditions: In(C1,p) ˄
At(p,B). Moreover, the subgoal At(C1,B) should not be true in the predecessor state.
o The predecessor description is
In(C1,p) ˄ At(p,B) ˄ At(C2,B) ˄ …… ˄ At(C20,B)
o In addition to insisting that actions achieve some desired literal, we must insist that
the actions not undo any desired literals.
o An action that satisfies this restriction is called consistent.
 Given definitions of relevance and consistency, we can describe the general process of
constructing predecessors for backward search.
 Given a goal description G, let A be an action that is relevant and consistent.
 The corresponding predecessor is as follows:
o Any positive effects of A that appear in G are deleted.
o Each precondition literal of A is added, unless it already appears.
 Any of the standard search algorithms can be used to carry out the search.
 Termination occurs when a predecessor description is generated that is satisfied by the initial
state of the planning problem.
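The regression step described above, with its relevance and consistency checks, can be sketched as follows (literal strings are an assumed encoding; the Unload preconditions mirror the air cargo example):

```python
def relevant(action_add, goal):
    """An action is relevant if it achieves at least one goal conjunct."""
    return bool(action_add & goal)

def consistent(action_delete, goal):
    """...and consistent if it does not undo any goal conjunct."""
    return not (action_delete & goal)

def regress(goal, precond, add, delete):
    """Predecessor description: delete the action's positive effects that
    appear in the goal, then add the action's precondition literals."""
    if not (relevant(add, goal) and consistent(delete, goal)):
        return None
    return (goal - add) | precond

goal = frozenset({"At(C1,B)", "At(C2,B)"})
# Unload(C1, p, B), which achieves the first conjunct:
pred = regress(goal,
               precond=frozenset({"In(C1,p)", "At(p,B)"}),
               add=frozenset({"At(C1,B)"}),
               delete=frozenset({"In(C1,p)"}))
print(sorted(pred))  # ['At(C2,B)', 'At(p,B)', 'In(C1,p)']
```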

 Forward and backward state-space search are particular forms of totally ordered plan search.
 They explore only strictly linear sequences of actions directly connected to the start or goal.
 This means that they cannot take advantage of problem decomposition.
 Rather than work on each subproblem separately, they must always make decisions about
how to sequence actions from all the subproblems.
 We would prefer an approach that works on several subgoals independently, solves them with
several subplans, and then combines the subplans.
 Consider the simple problem of putting on a pair of shoes. We can describe this as a formal
planning problem as follows:
Goal(RightShoeOn ˄ LeftShoeOn)
Init()
Action(RightShoe, PRECOND: RightSockOn, EFFECT: RightShoeOn)
Action(RightSock, EFFECT: RightSockOn)
Action(LeftShoe, PRECOND: LeftSockOn, EFFECT: LeftShoeOn)
Action(LeftSock, EFFECT: LeftSockOn)
 A planner should be able to come up with the two-action sequence RightSock followed by
RightShoe to achieve the first conjunct of the goal and the sequence LeftSock followed by
LeftShoe for the second conjunct.
 Then the two sequences can be combined to yield the final plan.
 In doing this, the planner will be manipulating the two subsequences independently, without
committing to whether an action in one sequence is before or after an action in the other.
 Any planning algorithm that can place two actions into a plan without specifying which
comes first is called a partial-order planner.
 Figure shows the partial-order plan that is the solution to the shoes and socks problem. Note
that the solution is represented as a graph of actions, not a sequence.

 The partial-order solution corresponds to six possible total-order plans; each of these is called
a linearization of the partial-order plan.
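The six linearizations can be checked by brute force: enumerate all total orders of the four actions and keep those consistent with the two ordering constraints. A small sketch:

```python
from itertools import permutations

actions = ["RightSock", "RightShoe", "LeftSock", "LeftShoe"]
constraints = [("RightSock", "RightShoe"), ("LeftSock", "LeftShoe")]

def respects(seq, constraints):
    """A total order is a linearization iff every constrained pair (A, B)
    has A appearing before B."""
    return all(seq.index(a) < seq.index(b) for a, b in constraints)

linearizations = [p for p in permutations(actions) if respects(p, constraints)]
print(len(linearizations))  # 6
```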
 We will define the POP algorithm for partial-order planning.
 It is traditional to write out the POP algorithm as a stand-alone program, but we will instead
formulate partial-order planning as an instance of a search problem.
 Each plan has the following four components, where the first two define the steps of the plan
and the last two serve a bookkeeping function to determine how plans can be extended:
o A set of actions that make up the steps of the plan. These are taken from the set of
actions in the planning problem. The "empty" plan contains just the Start and Finish
actions. Start has no preconditions and has as its effect all the literals in the initial
state of the planning problem. Finish has no effects and has as its preconditions the
goal literals of the planning problem.
o A set of ordering constraints. Each ordering constraint is of the form A ≺ B, which
is read as "A before B" and means that action A must be executed sometime before
action B, but not necessarily immediately before. The ordering constraints must describe
a proper partial order. Any cycle, such as A ≺ B and B ≺ A, represents a contradiction,
so an ordering constraint cannot be added to the plan if it creates a cycle.
o A set of causal links. A causal link between two actions A and B in the plan is written
as A --p--> B and is read as "A achieves p for B." For example, the causal link
RightSock --RightSockOn--> RightShoe
asserts that RightSockOn is an effect of the RightSock action and a precondition of
RightShoe. It also asserts that RightSockOn must remain true from the time of action
RightSock to the time of action RightShoe. In other words, the plan may not be extended
by adding a new action C that conflicts with the causal link.
o A set of open preconditions. A precondition is open if it is not achieved by some action
in the plan. Planners will work to reduce the set of open preconditions to the empty set,
without introducing a contradiction.
The final plan is written as
We define a consistent plan as a plan in which there are no cycles in the ordering constraints and no
conflicts with the causal links. A consistent plan with no open preconditions is a solution.
Now we are ready to formulate the search problem that POP solves. As usual, the definition includes
the initial state, actions, and goal test.
o The initial plan contains Start and Finish, the ordering constraint Start ≺ Finish, and
no causal links and has all the preconditions in Finish as open preconditions.
o The successor function arbitrarily picks one open precondition p on an action B and
generates a successor plan for every possible consistent way of choosing an action A
that achieves p. Consistency is enforced as follows:
1. The causal link A --p--> B and the ordering constraint A ≺ B are added to the plan.
Action A may be an existing action in the plan or a new one. If it is new, add it to the
plan and also add Start ≺ A and A ≺ Finish.
2. We resolve conflicts between the new causal link and all existing actions and between the
action A (if it is new) and all existing causal links. A conflict between A --p--> B and C
is resolved by making C occur at some time outside the protection interval, either by
adding B ≺ C or C ≺ A. We add successor states for either or both if they result in
consistent plans.
o The goal test checks whether a plan is a solution to the original planning problem. Because
only consistent plans are generated, the goal test just needs to check that there are no open
preconditions.
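The consistency and solution tests above can be sketched with a simple cycle check over the ordering constraints, here via Kahn's algorithm (action names taken from the sock/shoe example; the encoding is illustrative):

```python
def has_cycle(actions, orderings):
    """Detect a cycle among ordering constraints: pairs (A, B) meaning
    A must come before B."""
    indeg = {a: 0 for a in actions}
    for a, b in orderings:
        indeg[b] += 1
    ready = [a for a, d in indeg.items() if d == 0]
    removed = 0
    while ready:
        a = ready.pop()
        removed += 1
        for x, b in orderings:
            if x == a:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed != len(actions)   # leftover nodes => a cycle exists

def is_solution(actions, orderings, open_preconds):
    """A consistent plan (no ordering cycles, no causal-link conflicts
    assumed resolved) with no open preconditions is a solution."""
    return not open_preconds and not has_cycle(actions, orderings)

acts = ["Start", "RightSock", "RightShoe", "Finish"]
orders = [("Start", "RightSock"), ("RightSock", "RightShoe"),
          ("RightShoe", "Finish")]
print(is_solution(acts, orders, set()))  # True
```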

PLANNING AND ACTING IN THE REAL


WORLD
Planners that are used in the real world for tasks such as scheduling Hubble Space Telescope
observations, operating factories, and handling the logistics for military campaigns are more complex;
they extend the basics in terms both of the representation language and of the way the planner
interacts with the environment.
o For some domains we would like to talk about when actions begin and end.
o Time is of the essence in the general family of applications called job shop scheduling.
o Such tasks require completing a set of jobs, each of which consists of a sequence of actions,
where each action has a given duration and might require some resources.
o The problem is to determine a schedule that minimizes the total time required to complete all
the jobs, while respecting the resource constraints.
o An example of a job shop scheduling problem is given in Figure 12.1.
o This is a highly simplified automobile assembly problem.
o There are two jobs: assembling cars C1 and C2.
o Each job consists of three actions: adding the engine, adding the wheels, and inspecting the
results.
Figure 12.2 shows the solution that the partial-order planner POP would come up with.

o To make this a scheduling problem rather than a planning problem, we must now determine
when each action should begin and end, based on the durations of actions as well as their
ordering.
o The notation Duration(d) in the effect of an action means that the action takes d minutes to
complete.
o Given a partial ordering of actions with durations, apply the critical path method (CPM) to
determine the possible start and end times of each action.
o A path through a partial-order plan is a linearly ordered sequence of actions beginning with
Start and ending with Finish.
o The critical path is that path whose total duration is longest; the path is "critical" because it
determines the duration of the entire plan.
o To complete the whole plan in the minimal total time, the actions on the critical path must be
executed with no delay between them.
o Actions that are off the critical path have some leeway- window of time in which they can be
executed.
o The window is specified in terms of an earliest possible start time, ES, and a latest possible
start time, LS.
o The quantity LS - ES is known as the slack of an action.
o Together the ES and LS times for all the actions constitute a schedule for the problem.
o The following formulas serve as a definition for ES and LS and also as the outline of a
dynamic programming algorithm to compute them:
ES(Start) = 0
ES(B) = max over {A : A ≺ B} of [ES(A) + Duration(A)]
LS(Finish) = ES(Finish)
LS(A) = min over {B : A ≺ B} of [LS(B) - Duration(A)]
o The idea is that we start by assigning ES(Start) to be 0.


o Then as soon as we get an action B such that all the actions that come immediately before B
have ES values assigned, we set ES(B) to be the maximum of the earliest finish times of those
immediately preceding actions, where the earliest finish time of an action is defined as the
earliest start time plus the duration.
o This process repeats until every action has been assigned an ES value.
o The LS values are computed in a similar manner, working backwards from the Finish action.
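The ES/LS recurrences can be sketched as a small dynamic program (the job names and durations here are illustrative, not the exact figures from the text):

```python
def schedule(durations, orderings):
    """durations: {action: minutes}; orderings: pairs (A, B) meaning A ≺ B.
    Returns (ES, LS) dictionaries; LS - ES is the slack of each action."""
    preds = {a: [x for x, y in orderings if y == a] for a in durations}
    succs = {a: [y for x, y in orderings if x == a] for a in durations}

    es = {}
    def earliest(a):  # ES(a) = max over predecessors of their earliest finish
        if a not in es:
            es[a] = max((earliest(p) + durations[p] for p in preds[a]), default=0)
        return es[a]
    for a in durations:
        earliest(a)

    finish = max(es[a] + durations[a] for a in durations)  # plan duration
    ls = {}
    def latest(a):    # LS(a) = min over successors of LS, minus own duration
        if a not in ls:
            ls[a] = min((latest(s) for s in succs[a]), default=finish) - durations[a]
        return ls[a]
    for a in durations:
        latest(a)
    return es, ls

dur = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10}
order = {("AddEngine1", "AddWheels1"), ("AddWheels1", "Inspect1")}
es, ls = schedule(dur, order)
print(es["Inspect1"], ls["Inspect1"] - es["Inspect1"])  # 60 0
```

Every action in this chain has zero slack, so the whole chain is the critical path.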
Scheduling with resource constraints
o Real scheduling problems are complicated by the presence of constraints on resources.
o For example, adding an engine to a car requires an engine hoist.
o If there is only one hoist, then we cannot simultaneously add engine E1 to car C1 and engine
E2 to car C2.
o The engine hoist is an example of a reusable resource-a resource that is "occupied" during the
action but that becomes available again when the action is finished.
o Notice that reusable resources cannot be handled in our standard description of actions in
terms of preconditions and effects, because the amount of resource available is unchanged
after the action is completed.
o For this reason, we augment our representation to include a field of the form RESOURCE:
R(k), which means that k units of resource R are required by the action.
o The resource requirement is both a prerequisite-the action cannot be performed if the resource
is unavailable-and a temporary effect, in the sense that the availability of resource R is
reduced by k for the duration of the action.
o Figure 12.3 shows how to extend the engine assembly problem to include three resources: an
engine hoist for installing engines, a wheel station for putting on the wheels, and two
inspectors.
o Figure 12.4 shows the solution for the problem.

o The representation of resources as numerical quantities, such as Inspectors (2), rather than as
named entities, such as Inspector(I1) and Inspector(I2), is an example of a very general
technique called aggregation.
o The central idea of aggregation is to group individual objects into quantities when the objects
are all indistinguishable with respect to the purpose at hand. Resource constraints make
scheduling problems more complicated by introducing additional interactions among actions.
o Whereas unconstrained scheduling using the critical-path method is easy, finding a resource-
constrained schedule with the earliest possible completion time is NP-hard.

o One of the most pervasive ideas for dealing with complexity is hierarchical decomposition.
o The key benefit of hierarchical structure is that, at each level of the hierarchy, a computational
task, military mission, or administrative function is reduced to a small number of activities at the
next lower level, so that the computational cost of finding the correct way to arrange those
activities for the current problem is small.
o Nonhierarchical methods, on the other hand, reduce a task to a large number of individual
actions; for large-scale problems, this is completely impractical.
o The hierarchical task network (HTN) approach combines ideas from both partial-order
planning and the area known as "HTN planning."
o In HTN planning, the initial plan, which describes the problem, is viewed as a very high-level
description of what is to be done-for example, building a house.
o Plans are refined by applying action decompositions.
o Each action decomposition reduces a high-level action to a partially ordered set of lower-level
actions.
o Action decompositions, therefore, embody knowledge about how to implement actions.
o For example, building a house might be reduced to obtaining a permit, hiring a contractor, doing
the construction, and paying the contractor.
o The process continues until only primitive actions remain in the plan.
o Typically, the primitive actions will be actions that the agent can execute automatically.
o In "pure" HTN planning, plans are generated only by successive action decompositions.
o The HTN approach therefore views planning as a process of making an activity description more concrete,
rather than a process of constructing an activity description, starting from the empty activity.
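Pure HTN refinement, simplified here to sequences rather than partial-order plans, can be sketched as a recursive lookup in a plan library (the library entries are illustrative):

```python
# A plan library mapping each high-level action to a decomposition; any
# action not in the library is treated as primitive.
PLAN_LIBRARY = {
    "BuildHouse": ["GetPermit", "HireBuilder", "Construction", "PayBuilder"],
    "Construction": ["BuildFoundation", "BuildFrame", "BuildRoof"],
}

def refine(action):
    """Recursively apply action decompositions until only primitive
    actions remain."""
    if action not in PLAN_LIBRARY:        # primitive: executable directly
        return [action]
    steps = []
    for sub in PLAN_LIBRARY[action]:
        steps.extend(refine(sub))
    return steps

print(refine("BuildHouse"))
# ['GetPermit', 'HireBuilder', 'BuildFoundation', 'BuildFrame',
#  'BuildRoof', 'PayBuilder']
```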
Representing action decompositions
o General descriptions of action decomposition methods are stored in a plan library, from which
they are extracted and instantiated to fit the needs of the plan being constructed.
o Each method is an expression of the form Decompose(a, d).
o This says that an action a can be decomposed into the plan d, which is represented as a partial-
order plan.
o Building a house is a nice, concrete example, so we will use it to illustrate the concept of action
decomposition.
o Figure 12.5 depicts one possible decomposition of the BuildHouse action into four lower-level
actions.
o Figure 12.6 shows some of the action descriptions for the domain, as well as the decomposition
for BuildHouse as it would appear in the plan library.

o The Start action of the decomposition supplies all those preconditions of actions in the plan that
are not supplied by other actions.
o These are called external preconditions.
o In the example, the external preconditions of the decomposition are Land and Money. Similarly,
the external effects, which are the preconditions of Finish, are all those effects of actions in the
plan that are not negated by other actions.
o In our example, the external effects of BuildHouse are House and ¬Money.
o Some HTN planners also distinguish between primary effects, such as House, and secondary
effects, such as ¬Money.
o Only primary effects may be used to achieve goals, whereas both kinds of effects might cause
conflicts with other actions; this can greatly reduce the search space.
Modifying the planner for decompositions
o To incorporate HTN planning, POP is modified.
o This is done by modifying the successor function of POP to allow decomposition methods to be
applied to the current partial plan P.
o The new successor plans are formed by first selecting some nonprimitive action a' in P and then,
for any Decompose(a, d) method from the plan library such that a and a' unify with substitution θ,
replacing a' with d' = SUBST(θ, d).
o Figure 12.7 shows an example.
o At the top, there is a plan P for getting a house.
o The high-level action, a' = BuildHouse, is selected for decomposition.
o The decomposition d is selected from Figure 12.5, and BuildHouse is replaced by this
decomposition.
o An additional step, GetLoan, is then introduced to resolve the new open condition, Money, that is
created by the decomposition step.

To be more precise, for each possible decomposition d':


1. First, the action a' is removed from P. Then, for each step s in the decomposition d', choose
an action to fill the role of s and add it to the plan. It can be either a new instantiation of s or
an existing step s' from P that unifies with s. This is called subtask sharing.
2. The next step is to hook up the ordering constraints for a' in the original plan to the steps in
d'.
3. Finally, the causal links in which a' took part in the original plan are hooked up to the
appropriate steps of d'.
PLANNING AND ACTING IN NONDETERMINISTIC DOMAINS
o Agents have to deal with both incomplete and incorrect information.
o Incompleteness arises because the world is partially observable, nondeterministic, or both.
o The possibility of having complete or correct knowledge depends on how much indeterminacy
there is in the world.
o With bounded indeterminacy, actions can have unpredictable effects, but the possible effects
can be listed in the action description axioms.
o With unbounded indeterminacy, on the other hand, the set of possible preconditions or effects
either is unknown or is too large to be enumerated completely.
o There are four planning methods for handling indeterminacy.
o The first two are suitable for bounded indeterminacy, and the second two for unbounded
indeterminacy:
 Sensorless planning: Also called conformant planning, this method constructs standard,
sequential plans that are to be executed without perception. The sensorless planning algorithm
must ensure that the plan achieves the goal in all possible circumstances, regardless of the
true initial state and the actual action outcomes. Sensorless planning relies on coercion-the
idea that the world can be forced into a given state even when the agent has only partial
information about the current state. Coercion is not always possible, so sensorless planning is
often inapplicable.
 Conditional planning: Also known as contingency planning, this approach deals with bounded
indeterminacy by constructing a conditional plan with different branches for the different
contingencies that could arise. Just as in classical planning, the agent plans first and then
executes the plan that was produced. The agent finds out which part of the plan to execute by
including sensing actions in the plan to test for the appropriate conditions.
 Execution monitoring and replanning: In this approach, the agent can use any of the preceding
planning techniques (classical, sensorless, or conditional) to construct a plan, but it also uses
execution monitoring to judge whether the plan has a provision for the actual current situation
or needs to be revised. Replanning occurs when something goes wrong. In this way, the agent
can handle unbounded indeterminacy.
 Continuous planning: A continuous planner is designed to persist over a lifetime. It can
handle unexpected circumstances in the environment, even if these occur while the agent is in
the middle of constructing a plan. It can also handle the abandonment of goals and the
creation of additional goals by goal formulation.

 Conditional planning is a way to deal with uncertainty by checking what is actually happening in the
environment at predetermined points in the plan.
Conditional planning in fully observable environment
 Fully observable means the agent always knows the current state.
 If the environment is nondeterministic, the agent will not be able to predict the outcome of its actions.
 The conditional planning agent handles nondeterminism by building into the plan conditional steps
that will check the state of the environment to decide what to do next.
 Example: vacuum world
 Let AtL/AtR be true if the agent is in the left/right square, and let CleanL/CleanR be true
if the left/right square is clean.
 To allow for nondeterminism, actions should have disjunctive effects.
 For example, if moving Left sometimes fails, the normal action description
Action(Left, PRECOND: AtR, EFFECT: AtL)
becomes
Action(Left, PRECOND: AtR, EFFECT: AtL ˅ AtR)
 It is also useful to have conditional effects, wherein the effect of an action depends on the state in which it
is executed.
 Conditional effects appear in the EFFECT slot of an action with the syntax "when <condition>: <effect>".
 For example,
Action(Suck, EFFECT: (when AtL: CleanL) ˄ (when AtR: CleanR))
 A "game tree" for this environment is shown in Figure 12.9.


 Actions are taken by the robot in the "state" nodes of the tree, and nature decides what the
outcome will be at the "chance" nodes, shown as circles.
 A solution is a subtree that (1) has a goal node at every leaf, (2) specifies one action at each of its
"state" nodes, and (3) includes every outcome branch at each of its "chance" nodes.
 The solution is shown in bold lines in the figure; it corresponds to the plan
[Left, if AtL ˄ CleanL ˄ CleanR then [] else Suck].
 For exact solutions of games, we use the minimax algorithm. For conditional planning, there
are typically two modifications. First, MAX and MIN nodes become OR and AND nodes.
 Intuitively, the plan needs to take some action at every state it reaches, but must handle every
outcome for the action it takes.
 Second, the algorithm needs to return a conditional plan rather than just a single move.
 At an OR node, the plan is just the action selected, followed by whatever comes next.
 At an AND node, the plan is a nested series of if then-else steps specifying subplans for each
outcome.
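The OR/AND recursion can be sketched for a small nondeterministic vacuum world. The model below is an assumed "erratic Suck" variant (sucking a dirty square may also clean the other square), chosen so the plain cycle-failing recursion finds a plan:

```python
def and_or_search(state, goal_test, actions, results, path=()):
    if goal_test(state):
        return []                              # goal reached: empty subplan
    if state in path:
        return None                            # loop on this branch: fail
    for a in actions(state):
        subplans = {}
        for s2 in results(state, a):           # AND node: cover every outcome
            sub = and_or_search(s2, goal_test, actions, results,
                                path + (state,))
            if sub is None:
                break                          # some outcome unsolvable
            subplans[s2] = sub
        else:
            return [a, subplans]               # OR node: one workable action
    return None

def actions(s):
    return ["Suck", "Right", "Left"]

def results(s, a):
    loc, cl, cr = s                            # (location, CleanL, CleanR)
    if a == "Right":
        return {("R", cl, cr)}
    if a == "Left":
        return {("L", cl, cr)}
    if loc == "L":                             # erratic Suck on a dirty square
        return {("L", True, cr), ("L", True, True)} if not cl else {s}
    return {("R", cl, True), ("R", True, True)} if not cr else {s}

plan = and_or_search(("L", False, False), lambda s: s[1] and s[2],
                     actions, results)
print(plan[0])  # Suck
```

The returned plan is a nested structure: an action at each OR node, and a dictionary mapping each possible outcome state to its subplan at each AND node, matching the if-then-else reading above.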
Conditional planning in partially observable environments
 In the initial state of a partially observable planning problem, the agent knows only a certain
amount about the actual state.
 The simplest way to model this situation is to say that the initial state belongs to a state set; the
state set is a way of describing the agent's initial belief state.
 Suppose that a vacuum-world agent knows that it is in the right-hand square and that the square is
clean, but it cannot sense the presence or absence of dirt in other squares.
 Then as far as it knows it could be in one of two states: the left-hand square might be either clean
or dirty.
 This belief state is marked A in Figure 12.12.
 Let us look at how the AND-OR graph is constructed. From belief state A, we show the outcome of
moving Left.
 Because the agent can leave dirt behind, the two possible initial worlds become four possible
worlds, as shown in B and C.
 The worlds form two distinct belief states, classified by the available sensor information.
In B, the agent knows CleanL; in C it knows ⌐CleanL.
 From C, cleaning up the dirt moves the agent to B.
 From B, moving Right might or might not leave dirt behind, so there are again four possible
worlds, divided according to the agent's knowledge of CleanR (back to A) or ⌐CleanR
(belief state D).

 Conditional plans can be found, therefore, using exactly the same algorithm as in the fully
observable case, namely AND-OR-GRAPH-SEARCH.
 We still need to decide how belief states should be represented, how sensing works, and how
action descriptions should be written in this new setting.
There are three basic choices for belief states:
 1. Sets of full state descriptions. For example, the initial belief state in Figure 12.12 is
{(AtR ˄ CleanR ˄ CleanL), (AtR ˄ CleanR ˄ ⌐CleanL)}
This representation is simple to work with, but very expensive: if there are n Boolean
propositions describing the world, a belief state may contain up to 2^n full state descriptions.
Continuous Planning
 A continuous planning agent is not a "problem solver" that is given a single goal and then plans
and acts until the goal is achieved; rather, it lives through a series of ever-changing goal
formulation, planning, and acting phases.
 The agent is thought of as always being part of the way through executing a plan: the grand plan
of living its life.
 Its activities include executing some steps of the plan that are ready to be executed, refining the
plan to satisfy open preconditions or resolve conflicts, and modifying the plan in the light of
additional information obtained during execution.
 Obviously, when it first formulates a new goal, the agent will have no actions ready to execute, so
it will spend a while generating a partial plan.
 It is quite possible, however, for the agent to begin execution before the plan is complete,
especially when it has independent subgoals to achieve.
 The continuous planning agent monitors the world continuously, updating its world model from
new percepts even if its deliberations are still continuing.
 The example we will use is a problem from the blocks world domain.
The action we will need is Move(x, y), which moves block x onto block y, provided that both are
clear. Its action schema is

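The schema itself did not survive in these notes. A sketch of the standard blocks-world Move in Python, reconstructed from the description above; the extra parameter z (the block's current support) is an assumption needed to restore Clear(z):

```python
# STRIPS-style reconstruction of Move(x, y): move block x onto block y,
# where z is the block x currently sits on (an assumed extra parameter).
def move(x, y, z):
    return {"pre": {f"On({x},{z})", f"Clear({x})", f"Clear({y})"},
            "add": {f"On({x},{y})", f"Clear({z})"},
            "del": {f"On({x},{z})", f"Clear({y})"}}

def apply_action(a, state):
    """Apply a ground STRIPS action to a set of ground fluents."""
    assert a["pre"] <= state, "precondition violated"
    return (state - a["del"]) | a["add"]

s = {"On(C,A)", "Clear(C)", "Clear(D)", "On(D,G)"}
s = apply_action(move("C", "D", "A"), s)   # move C from A onto D
assert "On(C,D)" in s and "Clear(A)" in s and "Clear(D)" not in s
```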
 The agent first needs to formulate a goal for itself. We won't discuss goal formulation here, but
instead we will assume that somehow the agent was told to achieve the goal On(C, D) ∧ On(D, B).
The agent starts planning for this goal.
 Unlike all our other agents, which would shut off their percepts until the planner returns a
complete solution to this problem, the continuous planning agent builds the plan incrementally,
with each increment taking a bounded amount of time.
 After each increment, the agent returns NoOp as its action and checks its percepts again.
 Suppose the percepts don't change and the agent quickly constructs the plan shown in Figure 12.16.
 Although the preconditions of both actions are satisfied by Start, there is an ordering constraint
putting Move(D, B) before Move(C, D).
 This is needed to ensure that Clear(D) remains true until Move(D, B) is completed.
 Throughout the continuous planning process, Start is always used as the label for the current
state. The agent updates the state after each action.

The plan is now ready to be executed, but before the agent can take action, nature intervenes.
An external agent moves D onto B.
The agent perceives this, recognizes that Clear(B) and On(D, G) are no longer true in the current
state, and updates its model of the current state accordingly.
The causal links that were supplying the preconditions Clear(B) and On(D, G) for the Move(D, B)
action become invalid and must be removed from the plan.
The new plan is shown in Figure 12.17.
Now the agent can take advantage of the "helpful" interference by noticing that the causal link
that supplied On(D, B) to Finish can be replaced by a direct link from Start to Finish.
This process is called extending a causal link and is done whenever a condition can
be supplied by an earlier step instead of a later one without causing a new conflict.
Once the old causal link from Move(D, B) to Finish is removed, Move(D, B) no longer supplies any
causal links at all. It is now a redundant step.
All redundant steps, and any links supplying them, are dropped from the plan. This gives the plan in
Figure 12.18.

After Move(C, D) is executed and removed from the plan, the effects of the Start step reflect the fact
that C ended up on A instead of the intended D.
The goal precondition On(C, D) is still open.
The open condition is resolved by adding Move(C, D) back in.

Because all the goal conditions are satisfied by the Start step and there are no remaining actions, the
agent is now free to remove the goals from Finish and formulate a new goal.
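The execute-monitor-replan cycle described above can be sketched as a single decision cycle. This is a simplification of the book's continuous planning agent, not its actual algorithm: the step representation, the redundancy test, and the helper logic are all my own, and real POP-style link bookkeeping is elided.

```python
# One toy decision cycle: drop redundant steps, execute a ready step,
# otherwise return NoOp and recheck percepts. A step is a dict with an
# action name, preconditions, add effects, and delete effects.
def one_cycle(world, plan):
    # Drop redundant steps: nature has already achieved their effects.
    plan[:] = [s for s in plan if not s["add"] <= world]
    # Execute the first ready step (all preconditions hold now).
    for s in plan:
        if s["pre"] <= world:
            plan.remove(s)
            world -= s["del"]
            world |= s["add"]
            return s["action"]
    return "NoOp"  # nothing ready: keep refining, recheck percepts

move_db = {"action": "Move(D,B)", "pre": {"Clear(D)", "Clear(B)"},
           "add": {"On(D,B)"}, "del": {"Clear(B)"}}
move_cd = {"action": "Move(C,D)", "pre": {"Clear(C)", "Clear(D)"},
           "add": {"On(C,D)"}, "del": {"Clear(D)"}}
plan = [move_db, move_cd]

# Nature intervenes: an external agent has already put D onto B.
world = {"On(D,B)", "Clear(C)", "Clear(D)"}
assert one_cycle(world, plan) == "Move(C,D)"  # Move(D,B) was redundant
assert "On(C,D)" in world and plan == []
```

The run mirrors the story in the text: the external agent's "helpful" interference makes Move(D, B) redundant, so it is dropped and Move(C, D) executes immediately.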
Multiagent Planning
 There will be other agents in the environment.
 Multiagent environments can be cooperative or competitive.
 Example: team planning in doubles tennis.
 Plans can be constructed that specify actions for both players on the team.
 Efficient plan construction is useful, but does not guarantee success; the agents have to agree to
use the same plan! This requires some form of coordination, possibly achieved by
communication.
Cooperation: Joint goals and plans
 Two agents playing on a doubles tennis team have the joint goal of winning the match, which
gives rise to various subgoals.
 Multiagent notation introduces two new features. First, Agents(A, B) declares that there are two
agents, A and B, who are participating in the plan.
 Second, each action explicitly mentions the agent as a parameter, because we need to keep track
of which agent does what.
 A solution to a multiagent planning problem is a joint plan consisting of actions for each agent.
 A joint plan is a solution if the goal will be achieved when each agent performs its assigned
actions. The following plan is a solution to the tennis problem:
PLAN 1:
A : [Go(A, [Right, Baseline]), Hit(A, Ball)]
B : [NoOp(B), NoOp(B)].
If both agents have the same knowledge base, and if this is the only solution, then everything would
be fine; the agents could each determine the solution and then jointly execute it.
Unfortunately for the agents there is another plan that satisfies the goal just as well as the first:
PLAN 2:
A : [Go(A, [Left, Net]), NoOp(A)]
B : [Go(B, [Right, Baseline]), Hit(B, Ball)]
 If A chooses plan 2 and B chooses plan 1, then nobody will return the ball.
 Hence, the existence of correct joint plans does not mean that the goal will be achieved.
 The agents need a mechanism for coordination to reach the same joint plan; moreover, it must
be common knowledge.
Multibody planning
 Multibody planning is the construction of correct joint plans, deferring the coordination issue for
the time being.
 Multibody planning will be based on partial-order planning.
 We will assume full observability, to keep things simple. There is one additional issue that doesn't
arise in the single-agent case: the environment is no longer truly static, because other agents
could act while any particular agent is deliberating.
 For synchronization, we assume that each action takes the same amount of time and that actions
at each point in the joint plan are simultaneous.
 At any point in time, each agent is executing exactly one action.
 This set of concurrent actions is called a joint action. For example, a joint action in the tennis
domain with two agents A and B is (NoOp(A), Hit(B, Ball)).
 A joint plan consists of a partially ordered graph of joint actions.
 For example, Plan 2 for the tennis problem can be represented as this sequence of joint actions:
(Go(A, [Left, Net]), Go(B, [Right, Baseline]))
(NoOp(A), Hit(B, Ball))
We could do planning using the regular POP algorithm, applied to the set of all possible joint actions.
The only problem is the size of this set: with 10 actions and 5 agents we get 10^5 joint actions.
It would be tedious to specify the preconditions and effects of each action correctly, and inefficient to
do planning with such a large set.
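The size claim is easy to check: the explicit joint-action set is the Cartesian product of each agent's individual action set.

```python
from itertools import product

# With 10 individual actions and 5 agents, a joint action assigns one
# action to each agent, so the explicit set is the 5-fold Cartesian
# product of the action set: 10^5 combinations.
actions = [f"a{i}" for i in range(10)]
joint_actions = list(product(actions, repeat=5))
assert len(joint_actions) == 10**5  # 100,000 joint actions
```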
An alternative is to define joint actions implicitly, by describing how each individual action interacts
with other possible actions.
This will be simpler, because most actions are independent of most others; we need list only the few
actions that actually interact.
We can do that by augmenting the usual STRIPS or ADL action descriptions with one new feature: a
concurrent action list.
This is similar to the precondition of an action description except that rather than describing state
variables, it describes actions that must or must not be executed concurrently. For example, the Hit
action could be described as follows:

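The Hit description itself is missing above; what the text needs from it is the concurrent-action list. A minimal sketch of checking such prohibited-concurrency constraints; the representation is an assumption of mine, not the book's notation:

```python
# Each action carries a predicate describing which concurrent actions
# by other agents it prohibits (the concurrent-action list, negated form).
def hit(agent):
    return {"name": "Hit", "agent": agent,
            # prohibited: another agent hitting the ball simultaneously
            "forbids": lambda other: other["name"] == "Hit"}

def noop(agent):
    return {"name": "NoOp", "agent": agent,
            "forbids": lambda other: False}

def joint_action_ok(actions):
    """A joint action is legal if no member's prohibition matches a
    concurrent action taken by a different agent."""
    return all(not a["forbids"](b)
               for a in actions for b in actions if a is not b)

assert joint_action_ok([noop("A"), hit("B")])      # exactly one swing: fine
assert not joint_action_ok([hit("A"), hit("B")])   # both swing: prohibited
```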
Here, we have the prohibited-concurrency constraint that, during the execution of the Hit action, there
can be no other Hit action by another agent.
With this representation, it is possible to create a planner that is very close to the POP
partial-order planner. There are three differences:
1. In addition to the temporal ordering relation A ≺ B, we allow A = B and A ⪯ B, meaning
"concurrent" and "before or concurrent," respectively.
2. When a new action has required concurrent actions, we must instantiate those actions, using new or
existing actions in the plan.
3. Prohibited concurrent actions are an additional source of constraints. Each constraint must be
resolved by constraining conflicting actions to be before or after.
This representation gives us the equivalent of POP for multibody domains.
Coordination mechanisms
 The simplest method by which a group of agents can ensure agreement on a joint plan is to adopt
a convention prior to engaging in joint activity.
 A convention is any constraint on the selection of joint plans, beyond the basic constraint that the
joint plan must work if all agents adopt it.
 Some conventions, such as driving on the proper side of the road, are so widely adopted that they
are considered social laws.
 The conventions in the preceding paragraph are domain-specific and can be implemented by
constraining the action descriptions to rule out violations of the convention.
 A more general approach is to use domain-independent conventions.
 For example, if each agent runs the same multibody planning algorithm with the same inputs, it
can follow the convention of executing the first feasible joint plan found, confident that the other
agents will come to the same choice. A more robust but more expensive strategy would be to
generate all joint plans and then pick one by a rule that every agent shares, such as the first
plan in some canonical ordering.
Competition
 Not all multiagent environments involve cooperative agents. Agents with conflicting utility
functions are in competition with each other. One example of this is two-player zero-sum games,
such as chess.
 A chess-playing agent needs to consider the opponent's possible moves for several steps into the
future.
 That is, an agent in a competitive environment must (a) recognize that there are other agents, (b)
compute some of the other agent's possible plans, (c) compute how the other agent's plans interact
with its own plans, and (d) decide on the best action in view of these interactions.
 So competition, like cooperation, requires a model of the other agent's plans.
 On the other hand, there is no commitment to a joint plan in a competitive environment.
 The conditional planning algorithm constructs plans that work under worst-case assumptions
about the environment, so it can be applied in competitive situations where the agent is concerned
only with success and failure.
