icaps-llm-tut-slides-posted
bit.ly/3NC6vqs
Aim of the Tutorial
• Of late, there has been a significant rise in interest in using Large
Language Models in planning tasks
• In the last ~3 years, many papers have been published
• It started with training sequence learning models (specifically transformers)
tabula rasa on plans
• ..and has currently mostly become a sort of ersatz natural science of using huge
pre-trained models to see (and exploit) what planning abilities they may already
possess
• Much of the literature is in NLP and ML conferences.
• There is bidirectional ignorance.
• The authors of the papers don’t know the ICAPS community
• and the ICAPS community is largely unaware of the work on LLMs and Planning
• This tutorial is an attempt to rectify this ignorance, and to take a critical
look at the role of LLMs in Planning
• (..and should thus be of interest to both populations..)
4
Broad Lessons of the Tutorial..
• We will talk about both how LLMs can’t do the kinds of tasks we
(ICAPS folks) consider planning
• ..and how the availability of LLMs can help increase the range of planning
tasks that can be supported (if only with partial guarantees)
5
A Summary of LLM roles in Planning we saw
• LLMs to generate plans autonomously
• (Different prompting techniques, fine-tuning techniques)
• Not that impressive if the plans require interaction resolution
• LLMs to generate plans with the help of external planners/verifiers (LLM-Modulo setting)
• As heuristic seeds to external planners
• As targets of “back prompts” by external verifiers
(This is a summary slide put here just to foreshadow..)
• LLMs to extract plan knowledge
• Domain models
• Task reduction schemas
• LLMs as translators (e.g. from English to PDDL)
• Convert plans/recipes in English to formal representations (can be used for extracting
structured knowledge)
• Serve as a glorified natural language interface for specifying goals, problems etc..
6
A few caveats about the tutorial..
• This is not a dry chronological survey with a
laundry list of papers and their blurbs
• It is an opinionated perspective on the state of
LLMs and Planning intersection
• Informed by our own work in the area
• Caution: The authors of the papers brought up
in the tutorial may well bristle at the particular
perspective/pigeonholing of their work
• The aim is not to make up your mind for you, but to equip
you with a perspective that you may find useful
when you read the literature yourself (..or do work
in the area yourself..)
7
Expectations on the Tutorial Audience
• Some understanding of planning as practiced in
ICAPS is expected..
• After all, you are at ICAPS
• Not too much background on LLMs is expected
• The tutorial provides a perspective for viewing them..
8
A perspective on LLMs
9
LLMs are N-gram models on STEROIDS
• Text is a long sequence of words (including spaces and
punctuation)
• An n-gram model of language learns to predict n-th word given
the preceding n-1 words
• Probabilistically speaking it learns Pr(Wn |W1...Wn-1)
• Unigram predicts each word independently (no preceding context)
• Bigram predicts each word given the previous word
• A 3001-gram model learns to predict the next word given the previous
3000 words
• ChatGPT is just a 3001-gram model
• The power of an n-gram model depends on
• How much text it trains on
• How big is the n (context) and
• How high-capacity is the function learning Pr(Wn |W1...Wn-1)
• ChatGPT trains on ~600 gigabytes of text on the Web
• It learns a very high capacity function that has 175 billion parameters
• Learns Pr(Wn |W1...Wn-1) for all possible nth words Wn (Vocabulary of the
language, ~50K in English)
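As a rough illustration of the count-based view (not from the tutorial), here is a minimal bigram model in Python; the toy corpus and the relative-frequency estimate are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model would train on billions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigram occurrences: how often word w follows word v.
bigram_counts = defaultdict(Counter)
for v, w in zip(corpus, corpus[1:]):
    bigram_counts[v][w] += 1

def p_next(word, prev):
    """Estimate Pr(W_n = word | W_{n-1} = prev) by relative frequency."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

def most_likely_next(prev):
    counts = bigram_counts[prev]
    return counts.most_common(1)[0][0] if counts else None

print(p_next("sat", "cat"))        # 1.0 in this toy corpus
print(most_likely_next("the"))     # most frequent continuation of "the"
```

Making n larger and replacing the count table with a very high-capacity learned function is, in this framing, what turns an n-gram model into an LLM.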
“Wild Cats are but a bunch of wannabe Sun Devils.”
• Each prefix of the sentence is a training example
  • Wild ____
  • Wild Cats ____
  • Wild Cats are ____
  • ..
  • Wild Cats are but a bunch of wannabe Sun ____
• The LLM uses its current function to guess the next word
  • Wild Geese
  • Guess: Geese   Correct: Cats
  • Error = {Cats − Geese}
  • To the LLM, all vocabulary tokens are just vectors in some high-dimensional embedding space, so the difference is well defined as the vector difference
• Propagate this error back through the function, and change the parameters so that the error is reduced
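Below is a minimal numpy sketch of the training step just described: guess the next word with a softmax over the vocabulary, compare against the correct word, and nudge the parameters to reduce the error. The tiny vocabulary, the single-word context, and the learning rate are all illustrative assumptions; real LLMs do the same thing with transformer architectures and billions of parameters.

```python
import numpy as np

vocab = ["Wild", "Cats", "Geese", "are", "Sun", "Devils"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (toy)
E = rng.normal(size=(len(vocab), d))    # word embeddings
W = rng.normal(size=(d, len(vocab)))    # output projection (the "function" being learned)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Training example: prefix "Wild" -> correct next word "Cats"
prefix, target = "Wild", "Cats"
h = E[word_to_id[prefix]]               # context representation (here: just the prefix embedding)
probs = softmax(h @ W)                  # Pr(W_n | prefix) over the whole vocabulary
print("guess:", vocab[int(probs.argmax())], "correct:", target)

# Error signal = predicted distribution minus one-hot of the correct word,
# i.e. the difference between guess and truth in the output (vector) space.
one_hot = np.zeros(len(vocab)); one_hot[word_to_id[target]] = 1.0
grad_W = np.outer(h, probs - one_hot)   # gradient of cross-entropy loss w.r.t. W

W -= 0.5 * grad_W                       # propagate the error back and adjust parameters
print("Pr(Cats | Wild) after one update:", softmax(h @ W)[word_to_id[target]])
```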
Answer: MAGIC..!
Some possible factors:
18
So, can LLMs Plan?
19
But they seem to fall flat on
the lowly 3-block stacking
problem
(that we affectionately call
the Sussman Anomaly)
Results on GPT-4
• Blocksworld (Deceptive), zero-shot: 1/600 (0.16%)
• Blocksworld (Randomized), zero-shot: 0/600 (0%)
Plan Generation Results
Plan Generation Prompt - Blocksworld

Domain:
I am playing with a set of blocks where I need to arrange the blocks into stacks. Here are the actions I can do
Pick up a block
Unstack a block from on top of another block
Put down a block
Stack a block on top of another block

Example:
[STATEMENT]
As initial conditions I have that, the red block is clear, the blue block is clear, the yellow block is clear, the hand is empty, the blue block is on top of the orange block, the red block is on the table, the orange block is on the table and the yellow block is on the table.
My goal is to have that the orange block is on top of the blue block.
My plan is as follows:
[PLAN]
unstack the blue block from on top of the orange block
put down the blue block
pick up the orange block
stack the orange block on top of the blue block
[PLAN END]

Query Instance:
[STATEMENT]
As initial conditions I have that, the red block is clear, the yellow block is clear, the hand is empty, the red block is on top of the blue block, the yellow block is on top of the orange block, the blue block is on the table and the orange block is on the table.
My goal is to have that the orange block is on top of the red block.
My plan is as follows:

(SUCCESS)
Human Baseline for Blocksworld
• Preliminary study – 50 participants
• Asked to come up with a plan for one instance (from the set of 500 instances)
• Two phases of interaction
  • Plan writing phase – participants write up plans
  • Plan translation phase – participants translate already written plans
  • First for an example, then the actual instance
• Of the 50 human planners, 39 (78%) came up with a valid plan, 35 (70%) of which came up with the optimal plan
(Charts: Plan Writing and Plan Translation outcomes – Invalid / Optimal / Sub-optimal percentages)
33
Are LLMs retrieving based on
names or are they reasoning?
What if GPT4 is basically bringing to bear its background knowledge
about blocks world instead of just depending on the domain model?
34
Mystery Blocksworld domain

Standard Blocksworld:
I am playing with a set of blocks where I need to arrange the blocks into stacks. Here are the actions I can do
Pick up a block
Unstack a block from on top of another block
Put down a block
Stack a block on top of another block

I have the following restrictions on my actions:
I can only pick up or unstack one block at a time.
I can only pick up or unstack a block if my hand is empty.
I can only pick up a block if the block is on the table and the block is clear. A block is clear if the block has no other blocks on top of it and if the block is not picked up.
I can only unstack a block from on top of another block if the block I am unstacking was really on top of the other block.
I can only unstack a block from on top of another block if the block I am unstacking is clear.
Once I pick up or unstack a block, I am holding the block.
I can only put down a block that I am holding.
I can only stack a block on top of another block if I am holding the block being stacked.
I can only stack a block on top of another block if the block onto which I am stacking the block is clear.
Once I put down or stack a block, my hand becomes empty.

Mystery (obfuscated) Blocksworld:
I am playing with a set of objects. Here are the actions I can do
Attack object
Feast object from another object
Succumb object
Overcome object from another object

I have the following restrictions on my actions:
To perform Attack action, the following facts need to be true: Province object, Planet object, Harmony
Once Attack action is performed the following facts will be true: Pain object
Once Attack action is performed the following facts will be false: Province object, Planet object, Harmony
To perform Succumb action, the following facts need to be true: Pain object
Once Succumb action is performed the following facts will be true: Province object, Planet object, Harmony
Once Succumb action is performed the following facts will be false: Pain object
To perform Overcome action, the following needs to be true: Province other object, Pain object
Once Overcome action is performed the following will be true: Harmony, Pain object, Object Craves other object
Once Overcome action is performed the following will be false: Province other object, Pain object
To perform Feast action, the following needs to be true: Object Craves other object, Province object, Harmony
Once Feast action is performed the following will be true: Pain object, Province other object
Once Feast action is performed the following will be false: Object Craves other object, Province object, Harmony
Plan Generation Results on Mystery BW
• Blocksworld (Deceptive), zero-shot: 1/600 (0.16%)
• Blocksworld (Randomized), zero-shot: 0/600 (0%)
Can Alternate Prompting
Strategies Help?
Prompting directly in PDDL
(after all, LLMs are not confined to actual natural language!)
40
Plan Generation PDDL Prompt - Blocksworld

Here is a pddl domain, an example problem and its corresponding plan. Provide the plan for the query problem. Provide only the pddl syntax for the plan.

[DOMAIN]
(define (domain blocksworld-4ops)
  (:requirements :strips)
  (:predicates (clear ?x) (ontable ?x) (handempty) (holding ?x) (on ?x ?y))

  (:action pick-up
    :parameters (?ob)
    :precondition (and (clear ?ob) (ontable ?ob) (handempty))
    :effect (and (holding ?ob) (not (clear ?ob)) (not (ontable ?ob)) (not (handempty))))

  (:action put-down
    :parameters (?ob)
    :precondition (holding ?ob)
    :effect (and (clear ?ob) (handempty) (ontable ?ob) (not (holding ?ob))))

  (:action stack
    :parameters (?ob ?underob)
    :precondition (and (clear ?underob) (holding ?ob))
    :effect (and (handempty) (clear ?ob) (on ?ob ?underob) (not (clear ?underob)) (not (holding ?ob))))

  (:action unstack
    :parameters (?ob ?underob)
    :precondition (and (on ?ob ?underob) (clear ?ob) (handempty))
    :effect (and (holding ?ob) (clear ?underob) (not (on ?ob ?underob)) (not (clear ?ob)) (not (handempty)))))

[PROBLEM]
(define (problem BW-rand-4)
  (:domain blocksworld-4ops)
  (:objects a b c d)
  (:init (handempty) (ontable a) (on b c) (ontable c) (ontable d) (clear a) (clear b) (clear d))
  (:goal (and (on c b))))

[PLAN]
(unstack b c)
(put-down b)
(pick-up c)
(stack c b)
[PLAN_END]

[QUERY PROBLEM]
(define (problem BW-rand-4)
  (:domain blocksworld-4ops)
  (:objects a b c d)
  (:init (handempty) (on a b) (ontable b) (ontable c) (on d c) (clear a) (clear d))
  (:goal (and (on c a))))

GPT-4 Response:
[PLAN]
(unstack a b)
(put-down a)
(pick-up c)
(stack c a)
[PLAN_END]

SUCCESS
Plan Generation Results – PDDL prompts
Do we have to include the domain description
in the prompting?
• Including the domain model in the prompt provides a more natural
way to ”tell” the LLM to only generate plans with actions from the
domain model.
• Prompting LLMs with just the problem
  • Issue: the actions output by the LLM may not correspond to any
    actual actions that the executor has access to
• Two ways to "solve" the problem
  • Just look for an action "close"/"similar" in name
    (either in English or in embedding space)
    • No guarantee of actual connection
  • Manipulate the next token (action) generation
    • Look for an action (name) that the executor has access
      to (basically, SAYCAN)
Learning to Correct LLM Suggested Actions
• Huang et al*
• Evaluation of LLMs as zero shot planners
done in a household setting (Virtual Home;
Open-ended tasks)
• Metrics
• Executability: Whether an action satisfies the
constraints of the environment
• Correctness: How similar the generated plan is
to a human annotation for the task
• A pre-trained BERT-style LM is used to
semantically translate free-form LLM
outputs to an admissible action
• Cosine similarity in the pre-trained embedding
space is used as the distance metric
*Huang, W., Abbeel, P., Pathak, D., & Mordatch, I. (2022, June). Language models as zero-shot planners: Extracting actionable knowledge
for embodied agents. In International Conference on Machine Learning (pp. 9118-9147). PMLR.
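A minimal sketch of this translation step, assuming a generic sentence-embedding model (the model name and the admissible-action list below are illustrative, not the paper's actual setup):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any sentence encoder works

# Admissible actions the executor actually supports (illustrative list).
admissible = ["walk to kitchen", "open fridge", "grab milk", "put milk on table"]

model = SentenceTransformer("all-MiniLM-L6-v2")         # model choice is an assumption
adm_vecs = model.encode(admissible, normalize_embeddings=True)

def closest_admissible(llm_step: str) -> str:
    """Map a free-form step generated by the LLM to the nearest admissible action
    by cosine similarity in the embedding space (Huang et al.-style translation)."""
    v = model.encode([llm_step], normalize_embeddings=True)[0]
    sims = adm_vecs @ v                                  # cosine similarity (vectors are normalized)
    return admissible[int(np.argmax(sims))]

print(closest_admissible("head over to the refrigerator and open it"))  # -> "open fridge"
```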
Modifying LLM Next Action Generator
(SayCan)
• SayCan – LLMs are used as heuristics
• Given:
  • A high-level instruction i
  • A set of low-level skills Π
  • A language description l_π for each skill π ∈ Π
• LLM – provides p(l_π | i), the probability that l_π makes
progress towards i
  • LLMs score a set of pre-determined skills
  • The scoring is done by accessing the inner log
    probabilities generated by the LLM
• Robot – provides p(c_π | s, l_π), the probability that the skill π
described by l_π can be completed in state s
Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., ... & Zeng, A. (2022). Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint
arXiv:2204.01691.
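A minimal sketch of SayCan-style skill selection: combine the LLM's usefulness score p(l_π | i) with the robot's affordance score p(c_π | s, l_π) and pick the highest-scoring skill. The probability tables below are hand-coded placeholders for what SayCan obtains from LLM log-probabilities and a learned value function.

```python
instruction = "bring me a soda from the kitchen"
skills = ["go to kitchen", "pick up soda", "go to user", "put down soda"]

def llm_usefulness(skill_desc, instruction):
    # p(l_pi | i): how useful this skill's description is for the instruction (placeholder numbers).
    return {"go to kitchen": 0.6, "pick up soda": 0.3,
            "go to user": 0.05, "put down soda": 0.05}[skill_desc]

def affordance(skill_desc, state):
    # p(c_pi | s, l_pi): probability the skill can be completed in the current state (placeholder).
    return {"go to kitchen": 0.9, "pick up soda": 0.1,   # soda not reachable yet
            "go to user": 0.9, "put down soda": 0.2}[skill_desc]

state = "robot in living room, hands empty"
# SayCan picks the skill maximizing the product of the two scores.
best = max(skills, key=lambda s: llm_usefulness(s, instruction) * affordance(s, state))
print(best)   # -> "go to kitchen"
```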
Will LLMs do better if they do "step-by-step"
plan generation?
• One of the claims in the LLM planning and reasoning papers has been that LLMs can do
better if they do "step by step" reasoning.
• Will they perhaps do better if they are "asked" to track the evolution of the plan?
  • This basically amounts to incrementally checking a proof of plan correctness
  • Let's start with a progression proof, which involves tracking the state
• The prompt contains:
• The domain information
• A meta-prompt explaining about plan correctness
• An annotated example where each action is annotated with
• The state prior to the action.
• The reason for why the action is applicable in the prior state.
• The resulting state after applying the action.
• The query instance
• LLM is then asked to return a response making the same state tracking and justification
annotations that were included in the example.
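For reference, this is the kind of progression check the prompt asks the LLM to mimic. Below is a minimal Python sketch (not the tutorial's code), with a hand-written STRIPS-style encoding of four Blocksworld actions as an illustrative assumption.

```python
# Progression check: track the current state, test each action's preconditions,
# apply add/delete effects, then test the goal in the final state.

ACTIONS = {
    "unstack": lambda x, y: (                                          # unstack x from y
        {("clear", x), ("on", x, y), ("handempty",)},                  # preconditions
        {("holding", x), ("clear", y)},                                # add effects
        {("clear", x), ("on", x, y), ("handempty",)}),                 # delete effects
    "putdown": lambda x: (
        {("holding", x)},
        {("ontable", x), ("clear", x), ("handempty",)},
        {("holding", x)}),
    "pickup": lambda x: (
        {("ontable", x), ("clear", x), ("handempty",)},
        {("holding", x)},
        {("ontable", x), ("clear", x), ("handempty",)}),
    "stack": lambda x, y: (                                            # stack x on y
        {("holding", x), ("clear", y)},
        {("on", x, y), ("clear", x), ("handempty",)},
        {("holding", x), ("clear", y)}),
}

def validate(init, goal, plan):
    state = set(init)
    for name, *args in plan:
        pre, add, delete = ACTIONS[name](*args)
        if not pre <= state:
            return False, f"{name}{tuple(args)} has unsatisfied preconditions: {pre - state}"
        state = (state - delete) | add
    return goal <= state, "goal check"

init = {("on", "b", "o"), ("ontable", "o"), ("ontable", "r"), ("ontable", "y"),
        ("clear", "b"), ("clear", "r"), ("clear", "y"), ("handempty",)}
goal = {("on", "o", "b")}
plan = [("unstack", "b", "o"), ("putdown", "b"), ("pickup", "o"), ("stack", "o", "b")]
print(validate(init, goal, plan))   # -> (True, 'goal check')
```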
State Tracking Example

Domain:
I am playing with a set of blocks where I need to arrange the blocks into stacks. Here are the actions I can do
Pick up a block
Unstack a block from on top of another block
Put down a block
Stack a block on top of another block

I have the following restrictions on my actions:
I can only pick up or unstack one block at a time.
I can only pick up or unstack a block if my hand is empty.
I can only pick up a block if the block is on the table and the block is clear. A block is clear if the block has no other blocks on top of it and if the block is not picked up.
I can only unstack a block from on top of another block if the block I am unstacking was really on top of the other block.
I can only unstack a block from on top of another block if the block I am unstacking is clear.
Once I pick up or unstack a block, I am holding the block.
I can only put down a block that I am holding.
I can only stack a block on top of another block if I am holding the block being stacked.
I can only stack a block on top of another block if the block onto which I am stacking the block is clear.
Once I put down or stack a block, my hand becomes empty.
Once you stack a block on top of a second block, the second block is no longer clear.

Meta prompt:
The plan correctness is defined in terms of states resulting from executing the actions in the plan. An action is executable in a state when all its preconditions hold in that state. The state resulting from the action execution consists of everything in the previous state with the addition and deletion of add and delete effects of the action. Plan correctness is defined as follows: if the first action in the plan is applicable in the initial state, i.e., its preconditions are all present there; and the second action is applicable in the state resulting from applying the first action to the initial state, this process continues until the state resulting from the application of the last action in the last but one state gives rise to the final state where all the goals are satisfied.

Annotated Example:
[STATEMENT]
As initial conditions I have that, the red block is clear, the blue block is clear, the yellow block is clear, the hand is empty, the blue block is on top of the orange block, the red block is on the table, the orange block is on the table and the yellow block is on the table.
My goal is to have that the orange block is on top of the blue block.
My plan is as follows:
[PLAN]
1. Current State: the red block is clear, the blue block is clear, the yellow block is clear, the hand is empty, the blue block is on top of the orange block, the red block is on the table, the orange block is on the table and the yellow block is on the table
Action: unstack the blue block from on top of the orange block
Reason: The above action is applicable in the current state because its preconditions; the blue block is clear, the hand is empty and the blue block is on top of the orange block, are satisfied in the current state.
Resulting State: the red block is clear, the orange block is clear, the yellow block is clear, the hand is currently holding blue block, the red block is on the table, the orange block is on the table and the yellow block is on the table

2. Current State: the red block is clear, the orange block is clear, the yellow block is clear, the hand is currently holding blue block, the red block is on the table, the orange block is on the table and the yellow block is on the table
Action: put down the blue block
Reason: The above action is applicable in the current state because its preconditions; the hand is currently holding blue block, are satisfied in the current state.
Resulting State: the red block is clear, the blue block is clear, the orange block is clear, the yellow block is clear, the hand is empty, the red block is on the table, the blue block is on the table, the orange block is on the table and the yellow block is on the table

3. Current State: the red block is clear, the blue block is clear, the orange block is clear, the yellow block is clear, the hand is empty, the red block is on the table, the blue block is on the table, the orange block is on the table and the yellow block is on the table
Action: pick up the orange block
Reason: The above action is applicable in the current state because its preconditions; the orange block is clear, the hand is empty and the orange block is on the table, are satisfied in the current state.
Resulting State: the red block is clear, the blue block is clear, the yellow block is clear, the hand is currently holding orange block, the red block is on the table, the blue block is on the table and the yellow block is on the table

4. Current State: the red block is clear, the blue block is clear, the yellow block is clear, the hand is currently holding orange block, the red block is on the table, the blue block is on the table and the yellow block is on the table
Action: stack the orange block on top of the blue block
Reason: The above action is applicable in the current state because its preconditions; the blue block is clear and the hand is currently holding orange block, are satisfied in the current state.
Resulting State: the red block is clear, the orange block is clear, the yellow block is clear, the hand is empty, the orange block is on top of the blue block, the red block is on the table, the blue block is on the table and the yellow block is on the table

Final State: the red block is clear, the orange block is clear, the yellow block is clear, the hand is empty, the orange block is on top of the blue block, the red block is on the table, the blue block is on the table and the yellow block is on the table
The goal conditions are satisfied in the final state. Hence, the above plan is valid.
[PLAN END]
State Tracking Example (continued)

Query Instance:
[STATEMENT]
As initial conditions I have that, the red block is clear, the yellow block is clear, the hand is empty, the red block is on top of the blue block, the yellow block is on top of the orange block, the blue block is on the table and the orange block is on the table
My goal is to have that the orange block is on top of the red block.
My plan is as follows:

GPT-4 Response:
[PLAN]
1. Current State: the red block is clear, the yellow block is clear, the hand is empty, the red block is on top of the blue block, the yellow block is on top of the orange block, the blue block is on the table, and the orange block is on the table
Action: unstack the red block from on top of the blue block
Reason: The above action is applicable in the current state because its preconditions; the red block is clear, the hand is empty, and the red block is on top of the blue block, are satisfied in the current state.
Resulting State: the blue block is clear, the yellow block is clear, the hand is currently holding red block, the yellow block is on top of the orange block, the blue block is on the table, and the orange block is on the table

2. Current State: the blue block is clear, the yellow block is clear, the hand is currently holding red block, the yellow block is on top of the orange block, the blue block is on the table, and the orange block is on the table
Action: put down the red block
Reason: The above action is applicable in the current state because the precondition; the hand is currently holding red block, is satisfied in the current state.
Resulting State: the red block is clear, the blue block is clear, the yellow block is clear, the hand is empty, the red block is on the table, the yellow block is on top of the orange block, the blue block is on the table, and the orange block is on the table

3. Current State: the red block is clear, the blue block is clear, the yellow block is clear, the hand is empty, the red block is on the table, the yellow block is on top of the orange block, the blue block is on the table, and the orange block is on the table
Action: unstack the yellow block from on top of the orange block
Reason: The above action is applicable in the current state because its preconditions; the yellow block is clear, the hand is empty, and the yellow block is on top of the orange block, are satisfied in the current state.
Resulting State: the red block is clear, the blue block is clear, the orange block is clear, the hand is currently holding yellow block, the red block is on the table, the blue block is on the table, and the orange block is on the table

4. Current State: the red block is clear, the blue block is clear, the orange block is clear, the hand is currently holding yellow block, the red block is on the table, the blue block is on the table, and the orange block is on the table
Action: put down the yellow block
Reason: The above action is applicable in the current state because the precondition; the hand is currently holding yellow block, is satisfied in the current state.
Resulting State: the red block is clear, the blue block is clear, the orange block is clear, the yellow block is clear, the hand is empty, the red block is on the table, the blue block is on the table, the orange block is on the table, and the yellow block is on the table

5. Current State: the red block is clear, the blue block is clear, the orange block is clear, the yellow block is clear, the hand is empty, the red block is on the table, the blue block is on the table, the orange block is on the table, and the yellow block is on the table
Action: pick up the orange block
Reason: The above action is applicable in the current state because its preconditions; the orange block is clear, the hand is empty, and the orange block is on the table, are satisfied in the current state.
Resulting State: the red block is clear, the blue block is clear, the hand is currently holding orange block, the yellow block is clear, the red block is on the table, the blue block is on the table, and the yellow block is on the table

6. Current State: the red block is clear, the blue block is clear, the hand is currently holding orange block, the yellow block is clear, the red block is on the table, the blue block is on the table, and the yellow block is on the table
Action: stack the orange block on top of the red block
Reason: The above action is applicable in the current state because its preconditions; the red block is clear, and the hand is currently holding orange block, are satisfied in the current state.
Resulting State: the blue block is clear, the orange block is clear, the hand is empty, the yellow block is clear, the orange block is on top of the red block, the blue block is on the table, the red block is on the table, and the yellow block is on the table

Final State: the blue block is clear, the orange block is clear, the hand is empty, the yellow block is clear, the orange block is on top of the red block, the blue block is on the table, the red block is on the table, and the yellow block is on the table
The goal conditions are satisfied in the final state. Hence, the above plan is valid.
[PLAN END]

SUCCESS
State Tracking Results
What if we finetuned LLMs with successful
plans in the domain?
Instances correct by
• What if we further finetuned the next word Finetuned GPT-3
(action) completer with a bunch of correct plans
in the domain?
• This is basically the supervised finetuning stage LLMs
currently use to make them better at specific domains 20%
(e.g. Bloomberg's FinGPT..)
• We prepared a dataset comprising the initial 80%
state, goal state, and the respective plan for
1,000 distinct Blocksworld instances.
• By using the default hyperparameters provided
by OpenAI and an 80-20 train-validation data Incorrect Correct
split, we carried out the fine-tuning process.
• Finetuned-GPT3 could only solve around 20%
(122 out of 600) of the test set.
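A sketch of how such a fine-tuning dataset might be assembled, assuming the JSONL prompt/completion format used by OpenAI's fine-tuning API at the time; the instance contents and file name below are placeholders.

```python
import json

# `instances` stands in for the 1,000 solved Blocksworld instances; its contents here are made up.
instances = [
    {
        "init": "the blue block is on top of the orange block, the red block is on the table, ...",
        "goal": "the orange block is on top of the blue block",
        "plan": "unstack the blue block from on top of the orange block\n"
                "put down the blue block\npick up the orange block\n"
                "stack the orange block on top of the blue block",
    },
    # ... 999 more instances
]

# Write one prompt/completion pair per instance in JSONL format.
with open("bw_finetune.jsonl", "w") as f:
    for inst in instances:
        record = {
            "prompt": f"[STATEMENT]\nAs initial conditions I have that, {inst['init']}\n"
                      f"My goal is to have that {inst['goal']}.\nMy plan is as follows:\n[PLAN]\n",
            "completion": inst["plan"] + "\n[PLAN END]",
        }
        f.write(json.dumps(record) + "\n")
```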
Doing even more fine tuning?
• Pallagani et al.*
  • Fine-tune Code-T5 using a dataset of 18,000 plans for each domain
  • Change the representation of the planning problem into a more compact representation for finetuning
  • Provide PDDL-style prompts to the pre-trained models and the compact representation to the fine-tuned models
(Table: Evaluation of plan generation capabilities of LLMs, both prompting the pre-trained model and the fine-tuned model)
• When does it become quixotic to give a huge number of plans instead of a small domain model?

Pallagani, V., Muppasani, B., Murugesan, K., Rossi, F., Srivastava, B., Horesh, L., & Loreggia, A. (2023). Understanding the Capabilities of Large Language Models for Automated Planning. arXiv preprint arXiv:2305.16151.
Training Sequence models (e.g.
Transformers) Tabula Rasa
(without pre-training them on language)
52
Action Vector Models can be used to Recognize Plans
• With the learnt vectors wi, we can predict the target plan (as the most consistent with the affinities). We use an EM procedure to speed up the prediction.
• M = |the target plan|
(Plots: plan recognition accuracy of DUP vs. ARMS+PRP as a function of the percentage of unobserved actions)
Nominated for Best Student Paper Award at [AAMAS16]
Training Transformers on Plan Sequences
• Uses the autoregressive GPT
transformer architecture to predict
the next token in a reward, state,
action sequence
• Target reward and initial state are
provided as first tokens during
prediction
• Work focuses mainly on action
prediction and did not find improved
performance with reward and state
prediction (although it is possible)
"Decision transformer: Reinforcement learning via sequence modeling." Advances in neural information processing
systems 34 (2021): 15084-15097. Figures are from that paper.
57
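A small sketch of the sequence construction a decision-transformer-style model is trained on: each timestep contributes a (return-to-go, state, action) triple. The toy trajectory below is an illustrative assumption; in practice states and actions are embedded tokens.

```python
# Turn a trajectory into the (return-to-go, state, action) sequence that a
# decision-transformer-style model learns to continue autoregressively.
trajectory = [  # (state, action, reward) per timestep; made-up values
    ("s0", "a0", 1.0),
    ("s1", "a1", 0.0),
    ("s2", "a2", 2.0),
]

def to_sequence(trajectory):
    rewards = [r for _, _, r in trajectory]
    seq = []
    for t, (s, a, _) in enumerate(trajectory):
        return_to_go = sum(rewards[t:])   # reward still obtainable from step t onward
        seq += [("R", return_to_go), ("s", s), ("a", a)]
    return seq

# At test time, the desired target return and the initial state are fed in first,
# and the model predicts the next action token.
print(to_sequence(trajectory))
```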
Looks Like we showed that
LLMs can’t Plan..?
On the other hand, the literature seems rife
with claims of LLM planning abilities..
Then how come LLMs are trumpeted as doing planning?
• Most cases where LLMs are claimed to generate executable plans, on closer examination, turn out to be cases where LLMs are getting by with the generate-approximate-recipes step
  • Generate approximate recipes/cases (for common sense domains)
    • e.g. wedding plans
  • Convert tasks into (approximate) task reduction schemas
  • Perhaps written out as "programs" (e.g. Code as Policies..)
    • (SHOP2 schemas were already pseudo lisp code—if only written by humans)
  • LLM-HTN and LLM-CBR differ from HTN and CBR in that they generate the task-reduction schemas or the cases on demand
• And the interaction resolution/search part is
  • either pushed under the rug
    • Consider "high level" plans like "wedding plans", for which there are enough generic recipes available in the training set, and which are described at a sufficiently high level of abstraction that the execution issues are left to the user's imagination
  • or has been palmed off to human prompters, who are required to give "hints" to the LLM to come up with plan variants that are (more) correct
    • Note that here the human is essentially playing the role of an external verifier & critic
    • In cases where the humans are end users not well versed with all details of the domain, they can be faulty verifiers
Tree of Thoughts
68
Expressivity of LLMs is different from
that of LLM Modulo Solvers (i.e.,
LLMs with external plugins)
• LLMs by themselves are broad and shallow
• Cannot give guarantees about their reasoning
• But, they can improve their reasoning etc.
capabilities by making calls to external
solvers
• That themselves are deep and narrow
• The very current “AutoGPT” excitement is
basically about this..
• (Ensuring how a shallow manager can still
provide guarantees via the narrow/expert
workers is a fertile direction of research)
69
Workflow for using LLMs as Idea Generators (for External Sound Planners)
• LLM-generated plans can be used as heuristic guidance to drive sound planners.
• LPG [1]
  • A local search planner
  • Starts with a seed plan and repairs it iteratively till it gets a correct plan.
• Use LLM-generated plans as seed plans for LPG
• LPG can ‘repair’ these plans into correct plans (faster than it would, if it started with an empty plan)
(Figure: LLM-generated plan used as a heuristic seed to a sound planner like LPG)

[1] Gerevini, Alfonso, and Ivan Serina. "LPG: A Planner Based on Local Search for Planning Graphs with Action Costs." AIPS. Vol. 2. 2002.
71
Connection to Case based Planning
• Note that there is an interesting
parallel between this and case based
planning systems—which retrieve an
old plan most relevant to the current
problem and try to modify the plan
• Modification by domain-specific rules
[e.g. CHEF]
• Modification by domain-independent
planners [e.g. PRIAR]
72
Back-prompting from External Verifiers
Mechanized Critiquing of LLM plans
• External model-based verifiers doing the critiquing
• A verifier working off of a separate (independently certified) domain model
• VAL for PDDL models
• (Variation) A simulator that can verify/critique the LLM plan
• (Remember: Simulators have to be separately engineered just like domain models)
• Critics can be sound but not complete (special purpose critics)
  • Interestingly, the original NOAH planner (1976!!) viewed planning as an iterative criticism of the candidate plan by a set of sound but incomplete critics (each of which was also expected to give resolutions for the flaws it found—and they weren't always complete in specifying the resolutions..)
  • Much of the later work on domain-independent planning basically systematized this process
82
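A minimal sketch of this back-prompting (LLM-Modulo) loop; `ask_llm` and `verify_plan` are placeholders for an LLM API call and an external verifier such as VAL (or the progression check sketched earlier).

```python
# Back-prompting loop: an external, model-based verifier critiques the LLM's
# candidate plan, and the reported flaws are fed back as a new prompt.

def llm_modulo_planning(problem_prompt, ask_llm, verify_plan, max_rounds=5):
    prompt = problem_prompt
    for _ in range(max_rounds):
        candidate = ask_llm(prompt)                  # LLM proposes a plan (text)
        ok, feedback = verify_plan(candidate)        # sound external check against the domain model
        if ok:
            return candidate                         # correct w.r.t. the verifier's model
        # Back-prompt: restate the problem plus the verifier's critique of the last attempt.
        prompt = (problem_prompt +
                  f"\nYour previous plan was invalid:\n{candidate}\n"
                  f"Feedback from the verifier: {feedback}\nPlease provide a corrected plan.")
    return None                                      # give up after max_rounds
```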
Fine Tuning the Pre-trained model to be both
a generator and verifier
• Start with GPT-2
• [Finetuned generator:] Fine tune GPT-
2 as a generator on a corpus of blocks
world plans
• [Finetuned Verifier:] Use the same
corpus to train a verifier (based off of
GPT-2)
• Do Verifier-augmented generation
• Sort of similar to the back-prompting
with VAL (except that the verifier here is
also learned from the same corpus)
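One simple way to use such a learned verifier is best-of-n filtering: sample several candidate plans from the fine-tuned generator and keep the one the verifier scores highest. The sketch below assumes placeholder `generate` and `verifier_score` callables for the two fine-tuned models; unlike VAL, a learned verifier provides no formal guarantee.

```python
def verifier_augmented_generation(problem, generate, verifier_score, n_samples=8):
    """Sample n candidate plans from the generator and return the one the
    learned verifier scores highest (no correctness guarantee)."""
    candidates = [generate(problem) for _ in range(n_samples)]
    return max(candidates, key=lambda plan: verifier_score(problem, plan))
```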
Use LLMs to get "task reduction" schemas (recipes for individual tasks)
(That the humans in the loop correct with CoT prompting)
87
What Planning is & What LLMs are good at..

Planning (as used in common parlance) involves:
• Planning knowledge
  • Actions, preconditions and effects
  • General recipes: task reduction schemata (e.g. HTN planning)
  • Old examples: case libraries
• Plan generation/verification techniques
  • Interaction analysis/resolution
  • Plan merging techniques
  • Plan modification techniques

Contrasting what AI Planning & LLMs bring to the table:
• AI Planning (aka ICAPS planning) assumes that the planning knowledge is given up front, and focuses on generation and verification techniques
  • Emphasis on guaranteeing completeness/correctness of the plans w.r.t. the model
  • By and large the common paradigm—although there have been occasional mutinies
    • Model-Lite Planning approaches
• LLMs, trained as they are on everything ever put on the web, have a kind of "approximate omniscience". This helps them spit out actions, recipes, or cases
  • But they lack the ability to stitch the recipes together in a way that actually resolves the interactions!

LLMs accept any planning problem—even if it is not expressible in the PDDL standard—and they don't give any correctness guarantees. AI Planners will give formal guarantees, but only accept problems expressible in their language.
LLMs for constructing world models
91
Step 1. PDDL Construction
(Figure: the LLM is prompted to construct PDDL action models; newly defined predicates are appended to the list of extracted predicates; a code parser extracts the constructed model from the LLM response and re-prompts the LLM for any missing predicates/functions)
Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks
Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati.
Workshop on Theory of Mind in Communicating Agents at ICML 2023 (Oral)
LLMs as Human Preference Proxies
Can LLMs capture human preferences?
105
Text to plan using GPT-3
• Workshop on KEPS (ICAPS’21)
• Workshop on Planning for Financial Services (ICAPS’21)
107
LLMs for Translating Natural
Language Goals to PDDL
• Using LLMs to translate Goal specifications from Natural Language
to PDDL. (LLM as an Interface between Humans and Planners).
• Translation subtasks:
• PDDL Domain Understanding
• Goal Inference
• PDDL Goal Specification.
• LLM does well for unambiguous fully specified goals but has mixed
performance for ambiguous partially specified goals.
• 5 skills tested for – 1) Linguistic Competence, 2) Object Association,
3) Numerical Reasoning, 4) Physical Reasoning, 5) World Knowledge
• Results show that LLM has reasonably good Linguistic
Competence, Object Association abilities, and World Knowledge.
However, it has poor Numerical & Physical Reasoning abilities.
Xie, Y., Yu, C., Zhu, T., Bai, J., Gong, Z., & Soh, H. (2023). Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 108
LLM+P: Empowering LLMs with Optimal
Planning Proficiency
• LLM translates Natural Language Problems into Problem PDDL by in-context Learning.
(Context is crucial).
• Planner: Problem PDDL + Domain PDDL --> PDDL Plan
• LLM: PDDL Plan --> Natural Language Plan.
• LLM as a planner:
• Lacks the ability to reason about preconditions.
• Performs poorly in Domains that require an understanding of complex spatial relationships.
Liu, B., Jiang, Y., Zhang, X., Liu, Q., Zhang, S., Biswas, J., & Stone, P. (2023). Llm+ p: Empowering large language models with optimal planning proficiency. arXiv preprint 109
arXiv:2304.11477.
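A minimal sketch of the LLM+P pipeline; `ask_llm` and `run_planner` are placeholders for the LLM API and a classical planner invocation, and the in-context example is assumed to be supplied by the user.

```python
def llm_plus_p(nl_problem, domain_pddl, example_nl, example_pddl, ask_llm, run_planner):
    # 1. LLM translates the natural-language problem into problem PDDL, guided by one example.
    translation_prompt = (
        "Translate the problem into PDDL.\n"
        f"Example problem: {example_nl}\nExample PDDL:\n{example_pddl}\n"
        f"Problem: {nl_problem}\nPDDL:\n")
    problem_pddl = ask_llm(translation_prompt)

    # 2. A sound (and possibly optimal) planner solves Domain PDDL + Problem PDDL.
    pddl_plan = run_planner(domain_pddl, problem_pddl)

    # 3. The LLM translates the PDDL plan back into natural language for the user.
    return ask_llm(f"Rewrite this plan in plain English:\n{pddl_plan}")
```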
AutoTAMP
• LLMs are being used as translators and
verifiers
• They translate from natural language to
Signal Temporal Logic representation.
• An STL planner is used to come up with
plans.
• Re-prompting technique is used on the LLM
translator and the verifier to improve
performance
Chen, Y., Arkin, J., Zhang, Y., Roy, N., & Fan, C. (2023). AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers. arXiv preprint arXiv:2306.06531.
Epilogue
A Summary of LLM roles in Planning we saw
• LLMs to generate plans autonomously
• (Different prompting techniques, fine-tuning techniques)
• Not that impressive if the plans require interaction resolution
• LLMs to generate plans with the help of external planners/verifiers (LLM-Modulo
setting)
• As heuristic seeds to external planners
• As targets of “back prompts” by external verifiers
• LLMs to extract plan knowledge
• Domain models
• Task reduction schemas
• LLMs as translators (e.g. from English to PDDL)
• Convert plans/recipes in English to formal representations (can be used for extracting
structured knowledge)
• Serve as a glorified natural language interface for specifying goals, problems etc..
114
What Planning is & What LLMs are good at..

Planning (as used in common parlance) involves:
• Planning knowledge
  • Actions, preconditions and effects
  • General recipes: task reduction schemata (e.g. HTN planning)
  • Old examples: case libraries
• Plan generation/verification techniques
  • Interaction analysis/resolution
  • Plan merging techniques
  • Plan modification techniques

Contrasting what AI Planning & LLMs bring to the table:
• AI Planning (aka ICAPS planning) assumes that the planning knowledge is given up front, and focuses on generation and verification techniques
  • Emphasis on guaranteeing completeness/correctness of the plans w.r.t. the model
  • By and large the common paradigm—although there have been occasional mutinies
    • Model-Lite Planning approaches
• LLMs, trained as they are on everything ever put on the web, have a kind of "approximate omniscience". This helps them spit out actions, recipes, or cases
  • But they lack the ability to stitch the recipes together in a way that actually resolves the interactions!

LLMs accept any planning problem—even if it is not expressible in the PDDL standard—and they don't give any correctness guarantees. AI Planners will give formal guarantees, but only accept problems expressible in their language.
Then how come LLMs are trumpeted as doing planning?
• Most cases where LLMs are claimed to generate executable plans, on closer examination, turn out to be cases where LLMs are getting by with the generate-approximate-recipes step
  • Generate approximate recipes/cases (for common sense domains)
    • e.g. wedding plans
  • Convert tasks into (approximate) task reduction schemas
  • Perhaps written out as "programs" (e.g. Code as Policies..)
    • (SHOP2 schemas were already pseudo lisp code—if only written by humans)
  • LLM-HTN and LLM-CBR differ from HTN and CBR in that they generate the task-reduction schemas or the cases on demand
• And the interaction resolution/search part is
  • either pushed under the rug
    • Consider "high level" plans like "wedding plans", for which there are enough generic recipes available in the training set, and which are described at a sufficiently high level of abstraction that the execution issues are left to the imagination of the user
  • or has been palmed off to human prompters, who are required to give "hints" to the LLM to come up with plan variants that are (more) correct
    • Note that here the human is essentially playing the role of an external verifier & critic
    • In cases where the humans are end users not well versed with all details of the domain, they can be faulty verifiers
Planning in the age of LLMs

For far too long, there has been a race to the bottom on the level of knowledge given to planners:
• Planning started knowledge-based
  • Remember, NOAH was an HTN planner, y'all!
  • ..and fell to the ground propositional level—because it seemed too unseemly to depend on humans for these knowledge-based models
  • And focused on doing interaction resolution from first principles
• RL was worse—propositional was too high-level a knowledge to ask from humans
  • They wanted to say they will learn it all
  • And not have humans give any knowledge about the domain. They just wanted "SIMULATORS"
  • ..and it took forever to do anything—even with simulators
  • RL is way darned too inefficient, y'all

LLMs change that—rather drastically!
• LLMs make it easy to get knowledge without making it look like we are inconveniencing any specific human
  • We are just stealing everything humans told each other—is all.
• ..as long as you relax the requirement of the knowledge actually being "correct"
  • ..then again, do you really believe that huge human-written models are correct?
• So the million dollar question is: How would you do planning if you have some doddering know-it-all ready to give you any kind of knowledge
  • "Actions and effects"
  • "Task reduction schemas"
  • "Cases"
• Time for LLM-HTN, LLM-CBR etc. paradigms
• Or even a resurrection of the model-lite planning dream..
Traditional Planning
(Figure: Underlying System Dynamics – Classical, Metric, Metric-Temporal, PO, Stochastic, Non-deterministic – on a spectrum from associative/uninterpretable to causal/interpretable. Best Student Paper Nominee [AAMAS 2016]; [AAMAS 2015]; [AIJ 2017; ICAPS 2014; IJCAI 2009, 2007])
bit.ly/3NC6vqs
119