Learning-Based Planning
Sergio Jiménez Celorrio
Universidad Carlos III de Madrid, Spain

Tomás de la Rosa Turbides
Universidad Carlos III de Madrid, Spain

INTRODUCTION

Automated Planning (AP) studies the generation of action sequences for problem solving. A problem in AP is defined by a state-transition function describing the dynamics of the world, the initial state of the world, and the goals to be achieved. According to this definition, AP problems would seem to be easily tackled by searching for a path in a graph, which is a well-studied problem. However, the graphs resulting from AP problems are so large that specifying them explicitly is not feasible, so different approaches have been tried to address AP problems. Since the mid-1990s, new planning algorithms have enabled the solution of practical-size AP problems. Nevertheless, domain-independent planners still fail to solve complex AP problems, as solving planning tasks is a PSPACE-complete problem (Bylander, 1994).

How do humans cope with this planning-inherent complexity? One answer is that our experience allows us to solve problems more quickly; we are endowed with learning skills that help us plan when problems are drawn from a stable population. Inspired by this idea, the field of learning-based planning studies the development of AP systems able to modify their performance according to previous experience.

Since its first days, Artificial Intelligence (AI) has been concerned with the problem of Machine Learning (ML). As early as 1959, Arthur L. Samuel developed a prominent program that learned to improve its play in the game of checkers (Samuel, 1959). It is hardly surprising that ML has often been used to make changes in systems that perform tasks associated with AI, such as perception, robot control or AP. This article analyses the diverse ways in which ML can be used to improve AP processes. First, we review the major AP concepts and summarize the main research done in learning-based planning. Second, we describe current trends in applying ML to AP. Finally, we comment on the next avenues for combining AP and ML, and conclude.

BACKGROUND

The languages for representing AP tasks are typically based on extensions of first-order logic. They encode tasks using a set of actions that represents the state-transition function of the world (the planning domain) and a set of first-order predicates that represent the initial state together with the goals of the AP task (the planning problem). In the early days of AP, STRIPS was the most popular representation language. In 1998, the Planning Domain Definition Language (PDDL) was developed for the first International Planning Competition (IPC), and since then it has become the standard language of the AP community. In PDDL (Fox & Long, 2003), an action in the planning domain is represented by (1) the action preconditions, a list of predicates indicating the facts that must be true for the action to be applicable, and (2) the action postconditions, typically separated into add and delete lists, which are lists of predicates indicating the changes in the state after the action is applied.

Before the mid-1990s, automated planners could only synthesize plans of no more than 10 actions in an acceptable amount of time. During those years, planners strongly depended on speedup techniques for solving AP problems, so the application of search control became a very popular way to accelerate planning algorithms. In the late 1990s, a significant scale-up in planning took place due to the appearance of reachability planning graphs (Blum & Furst, 1995) and the development of powerful domain-independent heuristics (Hoffmann & Nebel, 2001; Bonet & Geffner, 2001). Planners using these approaches can often synthesize 100-action plans in just seconds.
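The following is a minimal sketch, not part of the original article, of the action representation described above: a grounded STRIPS-style action with preconditions and add/delete lists, together with the applicability test and state-progression rule it induces. The Blocksworld facts and the unstack action are illustrative assumptions.

# Minimal sketch of a STRIPS-style (grounded) action representation.
# States are sets of ground facts; an action lists its preconditions
# and the facts it adds and deletes. All names here are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset   # facts that must hold for the action to be applicable
    add_list: frozenset        # facts made true by the action
    delete_list: frozenset     # facts made false by the action

def applicable(action, state):
    """An action is applicable when all its preconditions hold in the state."""
    return action.preconditions <= state

def apply(action, state):
    """Progress the state: remove the delete list, then add the add list."""
    assert applicable(action, state)
    return (state - action.delete_list) | action.add_list

# Example: a grounded Blocksworld action 'unstack(A, B)'.
unstack_a_b = Action(
    name="unstack(A, B)",
    preconditions=frozenset({"on(A, B)", "clear(A)", "arm-empty"}),
    add_list=frozenset({"holding(A)", "clear(B)"}),
    delete_list=frozenset({"on(A, B)", "clear(A)", "arm-empty"}),
)

state = frozenset({"on(A, B)", "on-table(B)", "clear(A)", "arm-empty"})
if applicable(unstack_a_b, state):
    state = apply(unstack_a_b, state)
    # state now contains holding(A) and clear(B)

Under this representation, the relaxation-based heuristics discussed in the next section can be obtained by simply ignoring delete_list when progressing states.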

At present there is no such dependence on ML for solving AP problems, but there is renewed interest in applying ML to AP, motivated by three factors. (1) IPC-2000 showed that knowledge-based planners significantly outperform domain-independent ones; ML techniques that automatically define the kind of knowledge that humans put into these planners would bring great advances to the field. (2) Domain-independent planners are still not able to cope with complex real-world problems; instead, these problems are often solved by defining ad hoc planning strategies by hand, and ML promises to be a way of defining these strategies automatically. (3) There is a need for tools that assist in the definition, validation and maintenance of planning-domain models; at the moment, these processes are still done by hand.

LEARNING-BASED PLANNING

This section describes the current ML techniques for improving the performance of planning systems. These techniques are grouped according to the target of learning: search control, domain-specific planners, or domain models.

Learning Search Control

Domain-independent planners require high search effort, so search-control knowledge is frequently used to reduce this effort. Hand-coded control knowledge has proved useful in many domains; however, it is difficult for humans to formalize, as doing so requires specific knowledge of both the planning domain and the planner structure. Since AP's early days, diverse ML techniques have been developed with the aim of automatically learning search-control knowledge. A few examples of these techniques are macro-actions (Fikes, Hart & Nilsson, 1972), control rules (Borrajo & Veloso, 1997), and case-based and analogical planning (Veloso, 1994).

At present, most state-of-the-art planners are based on heuristic search over the state space (12 of the 20 participants in IPC-2006 used this approach). These planners achieve impressive performance in many domains and problems, but their performance strongly depends on the definition of a good domain-independent heuristic function. These heuristics are computed by solving a simplified version of the planning task that ignores the delete lists of the actions; the solution to the simplified task is taken as the estimated cost of reaching the task goals. Heuristics of this kind provide good guidance across a wide range of domains. However, they have some faults: (1) in many domains these heuristic functions vastly underestimate the distance to the goal, leading to poor guidance; (2) computing the heuristic values of the search nodes is expensive; and (3) these heuristics are non-admissible, so heuristic planners do not find good solutions in terms of plan quality.

Since evaluating a search node in heuristic planning is so time-consuming, De la Rosa, García-Olaya and Borrajo (2007) proposed using Case-Based Reasoning (CBR) to reduce the number of explored nodes. Their approach stores sequences of abstracted state transitions related to each particular object in a problem instance. Given a new problem, these sequences are retrieved and re-instantiated to support a forward heuristic search, deciding the order in which nodes have their heuristic values computed.

In recent years, other approaches have been developed to minimize the negative effects of the heuristic through ML: Botea, Enzenberger, Müller and Schaeffer (2005) learned macro-actions off-line to reduce the number of evaluated nodes by decreasing the depth of the search tree; Coles and Smith (2007) learned macro-actions on-line to escape from plateaus in the search tree without any exploration; and Yoon, Fern and Givan (2006) proposed an inductive approach that corrects the domain-independent heuristic of a given domain by learning a supplement to the heuristic from observations of solved problems in that domain.

All these methods for learning search-control knowledge suffer from the utility problem: learning too much control knowledge can actually be counterproductive, because the difficulty of storing and managing the information, and of determining which pieces of it to use when solving a particular problem, can interfere with efficiency.

Learning Domain-Specific Planners

An alternative to learning search control consists of learning domain-specific planning programs. These programs receive as input a planning problem of a fixed domain and return a plan that solves the problem.
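To make the notion of a domain-specific planning program concrete, the following is a small illustrative sketch, not taken from the article, of a hand-written planner for the classic Blocksworld domain: it solves any well-formed problem of that fixed domain with the familiar (and generally non-optimal) strategy of putting every block on the table and then rebuilding the goal towers bottom-up. The problem encoding is an assumption made for illustration; learned domain-specific planners aim to synthesize programs of this kind automatically.

# Illustrative hand-written domain-specific planner for Blocksworld.
# A problem gives, for every block, what it initially rests on and what
# it should finally rest on ("table" or another block). Strategy:
# put every block on the table, then rebuild the goal towers bottom-up.
# Assumes a well-formed goal configuration.

def blocksworld_planner(initial, goal):
    plan = []
    on = dict(initial)

    def blocks_above(x):
        # Count how many blocks are (transitively) above block x initially.
        n = 0
        for b, below in on.items():
            cur = below
            while cur != "table":
                if cur == x:
                    n += 1
                    break
                cur = on[cur]
        return n

    # Phase 1: unstack everything onto the table, topmost blocks first.
    for b in sorted(on, key=blocks_above):
        if on[b] != "table":
            plan.append(("move-to-table", b))
            on[b] = "table"

    # Phase 2: build each goal tower from the bottom up.
    placed = {b for b, below in goal.items() if below == "table"}
    remaining = {b for b in goal if b not in placed}
    while remaining:
        for b in sorted(remaining):
            if goal[b] in placed:
                plan.append(("stack", b, goal[b]))
                placed.add(b)
                remaining.discard(b)
                break
    return plan

# A 3-block problem: initially C is on A, A and B are on the table;
# the goal is the tower A on B on C.
plan = blocksworld_planner(
    initial={"A": "table", "B": "table", "C": "A"},
    goal={"A": "B", "B": "C", "C": "table"},
)
# plan == [('move-to-table', 'C'), ('stack', 'B', 'C'), ('stack', 'A', 'B')]

Because the strategy is hard-wired to Blocksworld, the program needs no search at all, which is precisely what makes domain-specific planners attractive and, at the same time, non-transferable to other domains.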

The first approaches to learning domain-specific planners were based on supervised inductive learning; they used genetic programming (Spector, 1994) and decision-list learning (Khardon, 1999), but they were not able to reliably produce good results. More recently, Winner and Veloso (2003) presented a different approach based on generalizing an example plan into a domain-specific planning program and merging the resulting source code with the previously obtained ones.

Domain-specific planners can also be represented as policies, i.e., mappings from states to the preferred action to execute in each state. Relational Reinforcement Learning (RRL) (Dzeroski, Raedt & Blockeel, 1998) has aroused interest as an efficient approach for learning policies for relational domains. RRL includes a set of learning techniques for computing the optimal policy for reaching the given goals by exploring the state space through trial and error. The major benefit of these techniques is that they can be used to solve problems whether or not the action model is known. On the other hand, since RRL does not explicitly include the task goals in the policies, new policies have to be learned every time a new goal has to be achieved, even if the dynamics of the environment have not changed.

In general, domain-specific planners have to deal with the problem of generalization. These techniques build planning programs from a given set of solved problems, so they cannot theoretically guarantee solving subsequent problems.

Learning Domain Models

No matter how efficient a planner is, if it is fed a defective domain model, it will return defective plans. Designing, encoding and maintaining a domain model is very laborious. For the time being, planners are the only tool available to assist in the development of an AP domain model, but planners are not designed specifically for this purpose. Domain-model learning studies ML mechanisms to automatically acquire the planning action schemas (the action preconditions and postconditions) from observations of action executions.

Learning domain models in deterministic environments is a well-studied problem; diverse inductive learning techniques have been successfully applied to automatically define the action schemas from observations (Shen & Simon, 1989; Benson, 1997; Yang, Wu & Jiang, 2005; Shahaf & Amir, 2006). In stochastic environments, this problem becomes more complex: actions may result in innumerable different outcomes, so more elaborate approaches are required. Pasula, Zettlemoyer and Kaelbling (2004) presented the first specific algorithm to learn simple stochastic actions without conditional effects. This algorithm is based on three levels of learning: the first consists of deterministic rule-learning techniques to induce the action preconditions; the second relies on a search for the set of action outcomes that best fits the execution examples; and the third consists of estimating the probability distributions over the set of action outcomes. However, stochastic planning algorithms do not need to consider all the possible action outcomes. Jiménez and Cussens (2006) proposed learning complex action-effect models (including conditions) for only the relevant action outcomes; planners can then generate robust plans by covering only the most likely execution outcome, leaving the others to be completed when more information is available.

In deterministic environments, Shahaf and Amir (2006) introduced an algorithm that exactly learns STRIPS action schemas even if the domain is only partially observable. In stochastic environments, however, there is still no general, efficient approach to learning action models.

FUTURE TRENDS

Since the appearance of the first PDDL version at IPC-1998, the standard planning representation language has evolved to bring AP algorithms and real-world planning problems closer together. Nowadays, the PDDL 3.0 version used for IPC-2006 includes numeric state variables to support quality metrics, durative actions that allow explicit time representation, derived predicates to enrich the descriptions of the system states, and soft goals and trajectory constraints to express user preferences among the different possible plans without discarding valid ones. However, most of these new features are not handled by state-of-the-art planning algorithms: existing planners usually fail to solve problems that define quality metrics; the issue of goal and trajectory preferences has only begun to be addressed; and time and resources add so much extra complexity to the search process that real-world problems become extremely difficult to solve. New challenges for the AP community relate to developing new planning algorithms and heuristics to deal with these kinds of problems.


As it is very difficult to find an efficient general solution, ML must play an important role in addressing these new challenges, because it can be used to alleviate the complexity of the search process by exploiting regularity in the space of common problems.

Besides, state-of-the-art planning algorithms need a detailed domain description to solve the AP task efficiently, but new applications such as controlling autonomous underwater vehicles or Mars rovers imply planning in environments where the dynamics model may not be easily accessible. There is a current need for planning systems that are able to acquire information about their execution environment. Future planning systems will have to include frameworks that integrate the planning and execution processes together with domain-modelling techniques.

Traditionally, learning-based planners are evaluated only against the same planner without learning, in order to prove their performance improvement. Additionally, these systems are not exhaustively evaluated: typically the evaluation focuses on a very small number of domains, so these planners are usually quite fragile when encountering new domains. Therefore, the community needs a formal methodology to validate the performance of new learning-based planning systems, including mechanisms to compare different learning-based planners.

Although ML techniques improve planning systems, existing research cannot theoretically demonstrate that they will be useful in new benchmark domains. Moreover, for the time being, it is not possible to formally explain the underlying meaning of the learned knowledge (i.e., does the acquired knowledge subsume a task decomposition, a goal ordering, a solution path?). This reveals that future research in AP and ML will also have to focus on theoretical aspects that address these issues.

CONCLUSION

Generic domain-independent planners are still not able to address the complexity of real planning problems. Thus, most planning systems implemented in applications require additional knowledge to solve real planning tasks. However, extracting and compiling this specific knowledge by hand is complicated.

This article has described the main recent advances in developing planners successfully assisted by ML techniques. Automatically learned knowledge is useful for AP in diverse ways: it helps planners guide search processes, complete domain theories, or specify particular solutions to a particular problem. However, the learning-based planning community must focus not only on developing new learning techniques but also on defining formal mechanisms to validate their performance against other generic planners and against other learning-based planners.

REFERENCES

Benson, S. (1997). Learning Action Models for Reactive Autonomous Agents. PhD thesis, Stanford University.

Blum, A., & Furst, M. (1995). Fast planning through planning graph analysis. In C. S. Mellish (Ed.), Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), volume 2, pages 1636–1642, Montreal, Canada. Morgan Kaufmann.

Bonet, B., & Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1-2), 5–33.

Borrajo, D., & Veloso, M. (1997). Lazy incremental learning of control knowledge for efficiently obtaining quality plans. AI Review Journal, Special Issue on Lazy Learning, 11(1-5), 371–405.

Botea, A., Enzenberger, M., Müller, M., & Schaeffer, J. (2005). Macro-FF: Improving AI planning with automatically learned macro-operators. Journal of Artificial Intelligence Research (JAIR), 24, 581–621.

Bylander, T. (1994). The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1-2), 165–204.

Coles, A., & Smith, A. (2007). Marvin: A heuristic search planner with online macro-action learning. Journal of Artificial Intelligence Research, 28, 119–156.

De la Rosa, T., García Olaya, A., & Borrajo, D. (2007). Using utility cases for heuristic planning improvement. In Proceedings of the 7th International Conference on Case-Based Reasoning, Belfast, Northern Ireland. Springer-Verlag.

Dzeroski, S., Raedt, L. D., & Blockeel, H. (1998). Relational reinforcement learning. In International Workshop on Inductive Logic Programming, pages 11–22.


Fikes, R., Hart, P., & Nilsson, N. (1972). Learning and executing generalized robot plans. Artificial Intelligence, 3, 251–288.

Fox, M., & Long, D. (2003). PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research, 20, 61–124.

Hoffmann, J., & Nebel, B. (2001). The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14, 253–302.

Jiménez, S., & Cussens, J. (2006). Combining ILP and parameter estimation to plan robustly in probabilistic domains. In Conference on Inductive Logic Programming (ILP-2006), Santiago de Compostela, Spain.

Khardon, R. (1999). Learning action strategies for planning domains. Artificial Intelligence, 113, 125–148.

Pasula, H., Zettlemoyer, L., & Kaelbling, L. (2004). Learning probabilistic relational planning rules. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS-04).

Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 211–229.

Shahaf, D., & Amir, E. (2006). Learning partially observable action schemas. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06).

Shen, W., & Simon, H. A. (1989). Rule creation and rule learning through environmental exploration. In Proceedings of IJCAI-89, pages 675–680.

Spector, L. (1994). Genetic programming and AI planning systems. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, Washington, USA. AAAI Press/MIT Press.

Veloso, M. (1994). Planning and Learning by Analogical Reasoning. Springer-Verlag.

Winner, E., & Veloso, M. (2003). Distill: Towards learning domain-specific planners by example. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), Washington, DC, USA.

Yang, Q., Wu, K., & Jiang, Y. (2005). Learning action models from plan examples with incomplete knowledge. In Proceedings of the 2005 International Conference on Automated Planning and Scheduling (ICAPS 2005), Monterey, CA, USA, pages 241–250.

Yoon, S., Fern, A., & Givan, R. (2006). Learning heuristic functions from relaxed plans. In International Conference on Automated Planning and Scheduling (ICAPS-2006).

KEY TERMS

Control Rule: IF-THEN rule to guide the exploration of the planning search tree.

Derived Predicate: Predicate used to enrich the description of the states that is not affected by any of the domain actions. Instead, its truth values are derived by a set of rules of the form if formula(x) then predicate(x).

Domain-Independent Planner: Planning system that addresses problems without specific knowledge of the domain, as opposed to domain-dependent planners, which use domain-specific knowledge.

Macro-Action: Planning action resulting from combining actions that are frequently used together in a given domain. Used as control knowledge to speed up plan generation.

Online Learning: Knowledge acquisition during a problem-solving process with the aim of improving the rest of that process.

Plateau: Portion of a planning search tree where the heuristic value of the nodes is constant or does not improve.

Policy: Mapping between world states and the preferred action to execute in order to achieve a given set of goals.

Search Control Knowledge: Additional knowledge given to the planner with the aim of simplifying the search process, mainly by pruning unexplored portions of the search space or by ordering the nodes for exploration.
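As a final illustration of two of the key terms above, the following sketch, which is not part of the original article and uses invented logistics-style facts, shows a policy as a direct mapping from world states to preferred actions, and a control rule as an IF-THEN condition that prunes candidate actions during search.

# Illustrative sketch: a 'policy' maps a state directly to the preferred
# action, while a 'control rule' is an IF-THEN condition used as
# search-control knowledge to prune the actions considered at a node.
# States are sets of ground facts; actions are plain strings.

policy = {
    frozenset({"at(truck, depot)", "package-at(depot)"}): "load(package, truck)",
    frozenset({"in(package, truck)", "at(truck, depot)"}): "drive(truck, depot, customer)",
    frozenset({"in(package, truck)", "at(truck, customer)"}): "unload(package, truck)",
}

def act(state):
    """Execute the policy: return the preferred action for the state, if any."""
    return policy.get(frozenset(state))

def control_rule(state, candidate_actions):
    """IF the loaded truck is already at the customer, THEN prune 'drive' actions."""
    if "at(truck, customer)" in state and "in(package, truck)" in state:
        return [a for a in candidate_actions if not a.startswith("drive")]
    return candidate_actions

state = {"in(package, truck)", "at(truck, customer)"}
print(act(state))                                        # unload(package, truck)
print(control_rule(state, ["drive(truck, customer, depot)",
                           "unload(package, truck)"]))   # ['unload(package, truck)']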
