
Under review as a conference paper at ICLR 2024

LARGE LANGUAGE MODELS CAN DESIGN GAME-THEORETIC OBJECTIVES FOR MULTI-AGENT PLANNING

Anonymous authors
Paper under double-blind review

ABSTRACT
Game theory is a powerful paradigm to describe the interplay between participants
in interactive multi-agent scenarios, and relies on the knowledge of player objec-
tives or payoff structures for game optimal decision making. However, designing
such objectives is challenging: it requires anticipating how an agent's actions, and changes
to its policy, affect the behavior of others. Indeed, aligning objective representa-
tions with a desired multi-agent behavior is achieved via tedious and impractical
heuristics or human trial-and-error. This work aims to ease this process and pro-
poses a multi-agent planning architecture that relies on a large language model
(LLM) as the game formulation designer. First, we exhibit the zero-shot profi-
ciency of the more capable LLMs (such as GPT-4) in tuning continuous objective
function parameters in accordance with a specified high-level goal for autonomous
driving examples. We then develop a planner which uses an LLM as a matrix game
designer, for scenarios with discrete and finite action spaces. Given a scene his-
tory, the actions available to each agent, and high-level objectives (expressed in
natural language), the LLM evaluates the payoffs associated with each combina-
tion of actions. From the game structure obtained, agents execute Nash optimal
actions, the scene is re-evaluated, and the process is repeated. We evaluate our ap-
proach on a heterogeneous robot planning task inspired by wildlife conservation,
as well as a household multi-humanoid transport task, and show the superiority of
our LLM-based approach over other baselines.¹

1 INTRODUCTION
The game-theoretic approach to multi-agent planning delves into the intricate dynamics that arise
when multiple autonomous agents interact within a shared environment. In this framework, each
agent is assumed to be a rational decision-maker, striving to maximize its own utility or benefit. It
provides a formalized and mathematical foundation for predicting and understanding the behavior of
multiple agents. By modeling their interactions using game theory, one can analyze the potential out-
comes, equilibria, and strategies that agents might employ. The significance of this approach is un-
derscored by its applications in diverse fields like economics (Samuelson, 2016), robotics (LaValle,
2000), transportation (Fisk, 1984), and artificial intelligence (Shoham et al., 2007).
The contemporary landscape of multi-agent systems demands innovative, adaptive, and scalable
solutions. The integration of large language models (LLMs) into game-theoretic approaches presents
a compelling answer to this request. Here’s why:
Automated Game Design and Harnessing Domain Expertise: Traditional game-theoretic re-
search often focuses on solving games with predefined structures and objectives, yet the crux of
real-world utility lies in designing these very frameworks. Current methods, reliant on iterative
human input, often require meticulous manual effort and fail to scale. LLMs can revolutionize
this, offering automation in design and ensuring that domain-specific expertise is seamlessly in-
corporated with minimal human effort, leading to outcomes that resonate more profoundly with
real-world scenarios in an efficient and scalable manner.
¹ Videos can be found at the anonymous website: https://sites.google.com/view/game-llm


[Figure 1 overview diagram: observations and optional user feedback from a multi-agent simulator are translated into a prompt; the LLM acts as Game Translator, a Game Solver computes the actions, and the loop repeats. Example domains shown: Autonomous Driving (cost function tuning), Stop-The-Poacher (heterogeneous robot planning with Responder, Drone, and Poacher), and Multi-Agent Transport (household items, humanoid agents).]

Figure 1: Game LLM: a game-theoretic objective designer. Our framework is amenable to both
assisting human tuning as well as designing payoffs for planning in dynamic scenarios.

Tackling Computational Challenges: Some games, even with known game structures, by virtue of
their complexity, resist optimal solutions through conventional methods and limited computations.
In such cases, heuristics or approximations offer a way out, often driven by the nuanced intuition that
humans bring to the table. LLMs bridge the gap by emulating human-like problem-solving strategies,
providing high-level guidelines toward solving complex and intractable games.
Addressing the Unexpected and Promoting Adaptability: No model, however sophisticated, can
operate flawlessly in every possible scenario, and unexpected situations are inevitable. LLMs can
offer dynamic solutions based on their vast knowledge and common sense reasoning capabilities,
and augment human interventions in navigating these uncharted regimes. Furthermore, just as hu-
mans evolve strategies based on experiences, LLMs can mimic this adaptability, refining strategies
and enhancing the overarching game-theoretic model on the fly through trial-and-error without ac-
tual human efforts.
Building Trust in Decision-making: Trustworthiness is a cornerstone for the widespread accep-
tance of any system. Stakeholders and users seek control, transparency, and interactability. LLMs,
adept at facilitating human-like interactions and explanations, enhance perceived trust with their
inherently human-comprehensible interfaces that allow seamless and natural interaction with users.
Consider the scenario of rangers protecting rhinos from poachers: an LLM can efficiently craft
game-theoretic objectives, drawing from domain insights such as rhino movement
patterns and past poaching incidents. It narrows the vast search space of the park using seasoned
rangers’ insights. As the ever-changing park landscape presents challenges, such as new poaching
tactics or rhino migrations, the LLM dynamically refines strategies, learning and adapting with min-
imal manual oversight. Crucially, by articulating decisions in accessible terms, LLMs build trust
with rangers, ensuring the devised strategies are both trusted and implemented.
To this end, we develop a comprehensive framework that harnesses the capabilities of LLMs to craft
game-theoretic objectives tailored for multi-agent planning. Our approach centers around General-
ized Nash Equilibrium Problems (GNEPs), a versatile game formulation that encompasses various
conventional games, including dynamic games and normal form games. Bridging the gap between
the LLM’s natural language reasoning and structured game constructs like optimization problems
or payoff matrices, we introduce a cost translation module. This module educates the LLM on in-
terpreting agent behaviors through the lens of game theory, guiding it to suggest game formulations

that align closely with specified objectives. Once the LLM proposes a game, it is then processed
by game solvers, such as nonlinear programming solvers, to determine the agents’ actions. To com-
plete the loop, we simulate the game’s outcomes, using the results as feedback signals for the LLM,
continuously refining and optimizing the proposed game structures.
We evaluate our framework on three benchmarks: (i) multi-agent motion planning for autonomous
driving, (ii) heterogeneous cooperative/competitive strategic reasoning for protecting rhinos from
poachers, and (iii) ThreeDWorld Multi-Agent Transport (TDW-MAT). In our autonomous vehicle ex-
periments with non-cooperative quadratic costs, we seamlessly convert high-level descriptions to
cost parameters for multi-agent behaviors. Our solution adeptly tunes intertwined parameters effi-
ciently based on human feedback. Furthermore, we demonstrate that our method performs superior
strategic planning compared to other baselines in heterogeneous cooperative/competitive games in
both the Stop-The-Poacher task and the TDW-MAT Challenge. To summarize, our contributions are:
(i) a comprehensive framework with LLM integration to automate game-theoretic objective design
for multi-agent planning, and (ii) extensive evaluation across diverse benchmarks demonstrating the
capability to convert high-level descriptions into cost parameters and superior strategic planning in
homogeneous/heterogeneous cooperative/competitive game scenarios.

2 RELATED WORK

Game-theoretic planning. A game-theoretic setting allows one to model both adversarial and coopera-
tive agents. In fact, scenarios such as autonomous driving, where robots need to interact with other
intelligent agents, are fundamentally game-theoretic (Fisac et al., 2018; Trautman & Krause, 2010),
(Dreves & Gerdts, 2018). Formalizing such interactions as a game, robots can weigh the impact of
their decisions on the actions of other agents (Sadigh et al., 2016), even in competitive settings such
as drone or car racing (Spica et al., 2018; Wang et al., 2021). Most game-theoretic works do not
assume availability of a communication network among the agents and model the interactions as a
game formulation. In (Peters et al., 2021; Laine et al., 2021; Le Cleac’h et al., 2021) data-driven
methods are presented to infer objective functions of other agents online in game theoretic planning
without communication. On the other hand, distributed optimization approaches assume the exis-
tence of a communication network and solve the optimization based on the received communicated
predictions (Rey et al., 2018), (Ferranti et al., 2018). They devise a communication and planning
protocol in which all the agents share their predicted planned trajectories. One drawback of the
former approach is that in an uncertain and dynamic environment, such as an autonomous driving setting,
solving a game with a fixed structure may not be robust with respect to adversarial agents’ behav-
ior. Moreover, when there are adversarial agents involved in the game, communication is usually
disabled, and the objectives of adversarial agents are usually hard to obtain. In this work, we show
how we may infer objectives for multi-agent planning using large language models, in which case
the planning is more robust and intelligent compared to existing data-driven methods.
Large Language Models for Planning. Large language models (LLMs), such as ChatGPT (Stien-
non et al., 2020) (Gao et al., 2023), GPT-4 (OpenAI, 2023) (Katz et al., 2023), are powerful models
for natural language understanding and inference. Recently, LLMs have been widely used in vari-
ous robotic tasks, such as design (Singh et al., 2023), (Stella et al., 2023), (Vemprala et al., 2023),
navigation (Shah et al., 2023a) (Shah et al., 2023b) (Dorbala et al., 2023), and planning (Chen et al.,
2023) (Huang et al., 2022). Traditional planning methods require lengthy mapping and exploration,
which is a major challenge since it may take a considerably long time to find a planning trajectory,
especially in unfamiliar environments. However, humans can quickly navigate and plan in unfamil-
iar environments, which motivates the employment of LLMs for planning (Shah et al., 2023a) due
to their human-level capability in scene understanding. Planning with LLMs in embodied environments
requires planning skills as well as reasoning about how plans interact: the solutions change over time in
response to the agent's own and other agents' decisions (Huang et al., 2022). This problem is usually
referred to as a game. However, one obvious problem in planning using LLMs is that they generally
struggle to understand the agent interactions that play a critical role in the performance one can
achieve on a specific task in a multi-agent environment. In this work, we address this challenge by
combining LLMs with game-theoretic optimization. Instead of directly planning actions, the LLM
designs objectives for a game describing the interplay between agents. As a result, we can ensure the
plans obtained do not neglect the interactive nature of the problem while harnessing the reasoning
capabilities of LLMs.


3 LLMS AS GAME OBJECTIVE DESIGNERS


The framework we propose is not restricted to specific types of game formulations. It is a
general approach that relies on the assistance of LLMs for the design of game objective parameters,
which can be cost function parameters or simply payoff matrix entries. The applications we exhibit
where such an architecture is useful include tuning game structures to obtain desired human behav-
iors in autonomous driving scenarios, as well as multi-agent planning where the LLM successively
evaluates the payoff of actions available to agents in environments where the action space depends
on the state of the system. The proposed architecture is depicted in Figure 1.
Given a multi-agent simulation environment, we observe the state of the system (or start from known
initial conditions). The translation of the state of the system and of the structure of the agents'
objectives in the scene (available actions in normal form games, parameterized functions for continuous costs)
can be done via an engineered prompt interfacing with the simulation environment, but can also
accommodate a human in the loop for additional information or expressing human preferences. The
LLM tunes objective parameters specified in the prompt, thus completing the information required
to completely specify the game setup at a current time. The game can be solved with a suitable
solver, generating the actions for each agent to execute. The simulator is then run until a new game
structure is to be designed, and the process is repeated.
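
The following sketch illustrates one way this loop can be realized. All callables (observe, translate, design_game, solve, execute, done) are placeholders for components that depend on the simulator and solver at hand; this is a minimal sketch, not a prescribed interface.

from typing import Callable, Any

def game_llm_planning_loop(
    observe: Callable[[], Any],          # simulator state readout (or known initial conditions)
    translate: Callable[[Any], str],      # game translator: state -> prompt text
    design_game: Callable[[str], Any],    # LLM call: prompt -> objective parameters
    solve: Callable[[Any, Any], Any],     # game solver: (parameters, state) -> joint action
    execute: Callable[[Any], Any],        # simulator step: joint action -> new state
    done: Callable[[Any], bool],          # termination test
) -> Any:
    """One possible realization of the Figure 1 loop; all callables are placeholders."""
    state = observe()
    while not done(state):
        prompt = translate(state)          # scene history, available actions, human preferences
        params = design_game(prompt)       # LLM tunes costs / fills in payoff entries
        action = solve(params, state)      # Nash optimal joint action from a suitable solver
        state = execute(action)            # run the simulator until a new game must be designed
    return state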

3.1 GENERALIZED NASH EQUILIBRIUM PROBLEMS

We consider Generalized Nash Equilibrium Problems (GNEPs) involving $N$ players $i \in \{1, \ldots, N\}$ over a horizon of $H$ time steps. An agent $i$'s state at time step index $t$ is denoted $x_t^i \in \mathbb{R}^{n^i}$ and its control input $u_t^i \in \mathbb{R}^{m^i}$, with $n^i$ and $m^i$ the dimensions of agent $i$'s state and control. Let $x_t = [x_t^{1,\top}, \ldots, x_t^{N,\top}]^\top \in \mathbb{R}^n$ denote the joint state and $u_t = [u_t^{1,\top}, \ldots, u_t^{N,\top}]^\top \in \mathbb{R}^m$ the joint control of all agents at time $t$, with joint dimensions $n = \sum_i n^i$ and $m = \sum_i m^i$. We define player $i$'s policy as $\pi^i = [u_1^{i,\top}, \ldots, u_{H-1}^{i,\top}]^\top \in \mathbb{R}^{\tilde{m}^i}$, where $\tilde{m}^i = m^i (H-1)$ denotes the dimension of the entire trajectory of agent $i$'s control inputs. The notation $\neg i$ indicates all agents except $i$; for instance, $\pi^{\neg i}$ represents the vector of the agents' policies except that of $i$. Also, let $X = [x_2^\top, \ldots, x_H^\top]^\top \in \mathbb{R}^{\tilde{n}}$, with $\tilde{n} = n(H-1)$, denote the trajectory of joint state variables resulting from the application of the joint control inputs to the dynamical system defined by $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ such that
$$x_{t+1} = f(x_t, u_t) \qquad (1)$$
Over the whole trajectory we can express the above kinodynamic constraints with $\tilde{n}$ equality constraints,
$$D(X, \pi^1, \ldots, \pi^N) = D(X, \pi) = 0 \in \mathbb{R}^{\tilde{n}} \qquad (2)$$
The cost function of each player $i$ can be parameterized by $\theta_i$, which determines its structure. The cost function depends on agent $i$'s policy $\pi^i$ as well as on the joint state trajectory $X$, which is common to all players, such that $\forall i \in \{1, \ldots, N\}$,
$$J^{\theta_i}(X, \pi^i) = c_H^{\theta_i}(x_H) + \sum_{t=1}^{H-1} c_t^{\theta_i}(x_t, u_t^i) \qquad (3)$$
Notice that as player $i$ minimizes $J^{\theta_i}$ with respect to $X$ and $\pi^i$, the selection of $X$ is constrained by the other players' strategies $\pi^{\neg i}$ and the dynamics of the joint system via (2). In addition, the strategy $\pi^i$ could be required to satisfy constraints that depend on the joint state trajectory $X$ as well as on the other players' strategies $\pi^{\neg i}$. This can be expressed with a set of $g$ inequality constraints,
$$C(X, \pi) \leq 0 \in \mathbb{R}^{g} \qquad (4)$$
where $C : \mathbb{R}^{\tilde{n}} \times \mathbb{R}^{\tilde{m}} \to \mathbb{R}^{g}$. The GNEP we form is the problem of minimizing (3) for all players $i \in \{1, \ldots, N\}$ subject to (2) and (4). More specifically,
$$\min_{X, \pi^i} \; J^{\theta_i}(X, \pi^i) \quad \forall i \in \{1, \ldots, N\}, \qquad \text{subject to} \quad D(X, \pi) = 0, \quad C(X, \pi) \leq 0 \qquad (5)$$


The solution to such a dynamic game is a generalized Nash equilibrium, i.e., a policy $\pi$ such that, $\forall i \in \{1, \ldots, N\}$, $\pi^i$ is a solution to (5) given the other players' policies $\pi^{\neg i}$, which themselves solve (5) for all $\neg i$. As a consequence, at a Nash equilibrium, no player can improve their outcome by unilaterally modifying their policy.
Many familiar types of games fall under special cases of the GNEP formulation. Notably, matrix
games can be seen as a special case of GNEPs, with a unit horizon, two players, and a cost structure
dependency that can be described by a payoff matrix M. We can, thus, similarly parameterize the
payoff matrix in a normal form game. In this case, asking the LLM to design the game parameters
can be seen as having it fill in the entries of the payoff matrix. Achieving both tuning of continuous
cost functions and tabular cost evaluation requires a capability to reason about actions and
reactions.
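
To make the matrix-game special case concrete, the sketch below enumerates the pure-strategy Nash equilibria of a two-player bimatrix game; the payoff values are an illustrative prisoner's-dilemma example, not values produced by the LLM.

import numpy as np

def pure_nash_equilibria(A: np.ndarray, B: np.ndarray):
    """Return all (row, col) joint actions where neither player can improve unilaterally.

    A[i, j] is the row player's payoff, B[i, j] the column player's payoff.
    """
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_best = A[i, j] >= A[:, j].max()   # row player cannot do better given column j
            col_best = B[i, j] >= B[i, :].max()   # column player cannot do better given row i
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# Illustrative prisoner's-dilemma payoffs (rows/columns: cooperate, defect).
A = np.array([[-1.0, -3.0], [0.0, -2.0]])
B = np.array([[-1.0, 0.0], [-3.0, -2.0]])
print(pure_nash_equilibria(A, B))  # [(1, 1)]: mutual defection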

3.1.1 GAME TRANSLATOR

We now describe the mechanism for obtaining LLM outputs that are amenable to conversion
into games that can be directly handled by solvers.
The human assistant tuner setup we devise relies on a two-stage prompt architecture consisting of
a motion descriptor followed by a cost function tuner. The method is an extension of the
single-agent setup proposed by Yu et al. (2023). First, the motion descriptor instructs the LLM to
interpret and translate the desired high-level user input behavior into a pre-defined template natural
language description of the agents’ behavior that is clearer for the LLM to decipher. Next, the cost
function tuner prompts the LLM to turn the motion description into tuned cost functions in code
form, which is subsequently inserted as is into the game definition code. We do not suggest any
values or orders of magnitude.
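
A rough sketch of this two-stage chain is given below; the prompt wording is illustrative, and the llm callable stands in for a chat-completion call.

from typing import Callable

def tune_costs(user_goal: str, llm: Callable[[str], str]) -> str:
    """Two-stage prompt chain: motion descriptor, then cost function tuner (wording is illustrative)."""
    # Stage 1: translate the high-level goal into a templated motion description.
    motion_description = llm(
        "Using the motion description template, describe how each vehicle should behave "
        f"to achieve the following goal: {user_goal}"
    )
    # Stage 2: turn the motion description into cost parameters in code form,
    # without suggesting any reference values or orders of magnitude to the LLM.
    cost_code = llm(
        "Translate this motion description into tuned cost function parameters, "
        f"written as code ready to insert into the game definition: {motion_description}"
    )
    return cost_code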
When using our architecture for decision making in environments with dynamically changing dis-
crete action options, prompting is designed to require the LLM to output payoffs for each combi-
nation of actions available to the agents at a specific planning step. This occurs in one prompting
step, after which we parse the output to construct a normal form game. In two player scenarios this
comes down to parsing the output into a matrix which is then solved to output the next Nash optimal
action to be taken.
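
A minimal sketch of this parsing and selection step is given below, assuming the prompt asks the LLM to emit one "action_a, action_b: score" line per pair of actions; the output format and function names are illustrative rather than our exact implementation.

import re
import numpy as np

def parse_payoff_matrix(llm_output: str, actions_a: list, actions_b: list) -> np.ndarray:
    """Fill a payoff matrix from lines such as 'explore_N, move_to_rhino: 7.5'."""
    matrix = np.full((len(actions_a), len(actions_b)), -np.inf)
    pattern = re.compile(r"^\s*(.+?)\s*,\s*(.+?)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")
    for line in llm_output.splitlines():
        match = pattern.match(line)
        if match is None:
            continue
        a, b, score = match.group(1), match.group(2), float(match.group(3))
        if a in actions_a and b in actions_b:
            matrix[actions_a.index(a), actions_b.index(b)] = score
    return matrix

def cooperative_nash_action(matrix: np.ndarray, actions_a: list, actions_b: list):
    """With a shared team payoff, the Nash optimal joint action is the argmax entry."""
    i, j = np.unravel_index(np.argmax(matrix), matrix.shape)
    return actions_a[i], actions_b[j]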

3.1.2 SIMULATOR FEEDBACK

In the tuning setup we assume the human operator provides short explanations of the behavior of
agents in the simulator to the LLM. New parameters are then designed via the pipeline to be run
again. When running the framework for decision making, we design a perception module, i.e. state
to text scripts that take information from the simulator, such as agent positions and observations,
current actions and so on, and maintain a history of the behavior, updating a prompt template. This
ensures that all information required to assess options for a next planning step is made available to
the LLM at evaluation time.
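
A simplified version of such a state-to-text module might look as follows; the field names and template wording are placeholders for the actual perception scripts.

class ScenePromptBuilder:
    """Maintains a running text history of the scene for the LLM (illustrative fields)."""

    def __init__(self):
        self.history = []

    def update(self, step: int, positions: dict, observations: dict, actions: dict):
        # Convert raw simulator readings into one natural-language line per decision step.
        line = (
            f"Step {step}: positions {positions}; "
            f"perceived {observations or 'nothing new'}; "
            f"current actions {actions}."
        )
        self.history.append(line)

    def build_prompt(self, available_actions: dict, objective: str) -> str:
        return (
            f"Objective: {objective}\n"
            "Scene history:\n" + "\n".join(self.history) + "\n"
            f"Available actions: {available_actions}\n"
            "Score every pair of actions for the team."
        )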

4 EXPERIMENTS

4.1 COST TUNING WITH LLMS FOR MULTI-AGENT MOTION PLANNING

In this section, we exploit the zero-shot reasoning capabilities of LLMs to ease the design of desired
behavior in multi-agent motion planning. The coupling of optimisation problems in game-theoretic
decision making makes it difficult to tune the cost parameters of different agents involved in the
scene to achieve a desired collective behaviour. Using LLMs as desired motion descriptors and then
cost function translators can significantly improve this tuning procedure. We use autonomous driv-
ing examples to illustrate the ease with which LLMs can help tune parameters to achieve desired
road scenarios. Details on dynamics, constraints, cost functions and solver are provided in section
A.0.1 of the Appendix.


4.1.1 NO OVERTAKE
We set two vehicles on the left side of a two lane highway. We wish to design a solution in which
overtaking is only allowed from the left. Vehicle Green (going at 0.9 m/s) is 0.5 m behind Vehicle
Red (going at 0.6 m/s). We wish for both vehicles to maintain their driving speed. Vehicle Red has
to be reluctant to switch lanes. We desire to design a scenario where vehicle Green is forced to slow
down to avoid crashing into vehicle Red. Figure 2 depicts the trajectories of the agents obtained by
solving the game designed by our Game LLM architecture.

Figure 2: No overtake: Vehicle trajectories obtained by solving the dynamic game with the LLM
generated cost parameters. Vehicle Green is in green and vehicle Red is in red. Marker size and
color increase with time.

The LLM provides a set of parameters that achieve the desired behavior from the first attempt in
zero-shot prompting fashion. Our architecture shows signs of multi-agent reasoning to tune these
parameters from scratch, as it has no access to initial proposed values or orders of magnitude for the
different parameters.

4.1.2 FORCE OVERTAKE


We set the two vehicles on a two-lane highway. They are both driving in the left lane.
Vehicle Green (going at 0.9 m/s) is 0.5 m behind Vehicle Red (going at 0.6 m/s). Both vehicles want
to maintain their driving speed. Vehicle Red is reluctant to switch lanes. We prompt the LLM to
design a scenario where vehicle Green overtakes vehicle Red via the right lane. Figure 3 depicts
the trajectories of the agents obtained by solving the succession of games designed by our Game
LLM architecture. After each execution, a human user gives a short description of the motion of the
vehicles and asks the LLM to modify the parameters to get to the desired behavior.

Figure 3: Force overtake: Vehicle trajectories obtained by solving the dynamic game with the LLM
generated cost parameters. Vehicle Green is in green and vehicle Red is in red. Marker size and
color increase with time (vehicles driving from left to right). The trajectories are the iterations from
earliest (top) to final (bottom) with user feedback to LLM as it tunes the objective parameters.

It takes our proposed architecture four attempts to get to the described scenario, which we argue is
no more than what it would take a human to design such an experiment from scratch. Furthermore,
we analyse the choice of parameters the LLM selects to tune and the value evolution with each
iteration. In fact, the LLM pinpoints only two parameters that it modifies between runs: the cost
parameters associated with deviations from the desired lane. Indeed, since the description suggests
both agents wish to remain on the left lane and that overtakes should usually only occur from the
left, the desired lane for both agents is selected to be the left. The evolution of these parameters for
both agents is depicted in the right bar plot in Figure 3.
An interesting observation is that, in addition to reducing the Green vehicle's cost associated
with staying on the left lane and increasing that of the Red vehicle, the tuner appears to proceed
in exponential steps (the bar plot is in log scale, the lane cost for Green is successively 0.2, 0.05,
0.01, 0.01 whereas the lane cost for Red is 0.9, 1.0, 2.0, 5.0). Our LLM tuner thus exhibits an
efficient tuning approach while showing correct understanding of human feedback and identifying
parameters of significance. It is also notable that the tuner is capable of tuning multiple values
simultaneously without resorting to a complete decoupling approach.
Our LLM-based architecture designed to reason about multi-agent behavior proves to be a useful
assistant for the design of objectives in game-theoretic motion planning scenarios.

4.2 STOP-THE-POACHER

Stop-the-Poacher is a task that involves heterogeneous robot agents in a cooperative/competitive


environment. The task is designed to require coordinated team planning of agents with different
attributes to contain an adversary. The simulation is set up in TDW (Gan et al., 2021a), where a
team consisting of a Responder 4x4 SUV and a Drone is tasked with protecting a Rhino from a
Poacher (humanoid).

4.2.1 TASK DESIGN


The environment consists of an open square simulation area. The Poacher is given privileged infor-
mation about the location of the Rhino and heads directly towards it. The locations of the Rhino and
Poacher are unknown to the Responder and Drone who have to first coordinate the exploration of
the area to locate other agents that enter their perception radius. The mission is considered a success
if the Responder can secure the Rhino (by reaching it before the Poacher) or if it can intercept the
Poacher before it gets to the Rhino. The Drone cannot physically intercept or secure, but can only
gather information. The Rhino and Poacher are initially spawned at random positions on the map.
The Drone and Responder both start at the center of the map in each simulation run. They both
move at the same speed and are faster than the Poacher. We assume the Responder and Drone share
information about their positions and the positions of agents they perceive.
We calibrate the map size and agent moving speeds for the success rates to be reasonable using a
baseline heuristic. We segment the map into 9 equal areas in a 3x3 square grid pattern. We reference
the different areas by their cardinal direction with respect to the central region, which we denote
capital C (for center). For example, the region to the east of C is denoted E.
Although the decision agents do not share the same physical capabilities, they share the same action
space in our setup. Both the Responder and the Drone can explore one of the 9 regions available.
Upon detection of the Poacher or the Rhino, they can also move towards either.
The LLM is tasked with evaluating the payoff value of each pair of actions at each given time, and
the pair maximising the team’s chances of success is executed. A new decision is triggered in two
scenarios: if an area has been completely searched or if a new agent is perceived.

4.2.2 PERFORMANCE METRIC


The metric by which we evaluate a simulation run is the mission success rate. Indeed, all runs that
end with an unintercepted Poacher reaching the Rhino before the Responder are mission failures. In
the opposite case, if the Rhino is secured or if the Poacher is caught we consider the mission to be a
success. Policies are evaluated over a sample of initial simulation conditions.

4.2.3 RESULTS
Four different planners are evaluated on the Stop-the-Poacher task. We first let the agents take a
uniformly randomized action from the list of available actions at the current state. We also develop
a heuristic that reasons as follows: while neither the Poacher nor the Rhino is spotted, the agents
explore opposite directions of the area (in order: S, SW, W, NW for the Responder, and N, NE, E,
SE for the Drone). When either the Rhino or the Poacher is spotted, both the Drone and Responder
begin to make their way towards it. If the second agent is also perceived, the response team makes its
way towards the closest of the two.
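
The heuristic can be summarized by the sketch below; the function signature, region indexing, and distance-based tie-breaking are our shorthand for the behavior described above.

import math

RESPONDER_SWEEP = ["S", "SW", "W", "NW"]
DRONE_SWEEP = ["N", "NE", "E", "SE"]

def heuristic_actions(search_index: int, responder_pos, rhino_pos=None, poacher_pos=None):
    """Sweep opposite regions until a sighting, then converge on the nearest target.

    Positions are (x, y) tuples; rhino_pos/poacher_pos are None until perceived.
    """
    sightings = {name: pos for name, pos in
                 (("rhino", rhino_pos), ("poacher", poacher_pos)) if pos is not None}
    if not sightings:
        region = min(search_index, 3)  # next unexplored region in the fixed sweep order
        return "explore_" + RESPONDER_SWEEP[region], "explore_" + DRONE_SWEEP[region]
    # Head towards the only sighting, or the closest of the two if both are perceived.
    target = min(sightings, key=lambda n: math.dist(responder_pos, sightings[n]))
    return "move_to_" + target, "move_to_" + target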
We also propose a Direct LLM decision planner. With knowledge of the positions of the Drone
and Responder, past actions and explored areas, and presence or not of the Rhino and Poacher in
the perception field, the LLM is asked to select a pair of admissible actions in a zero-shot manner,
without any reasoning artifacts or additional design.
We compare these baselines to our Game LLM architecture, which prompts the LLM to output a
score for each pair of actions available to the agents at a given state. The scores can then be combined
to constitute a matrix, the argmax of which is the optimal action as evaluated by the LLM in our
cooperative setting.
Both LLM architectures are instantiated with GPT-4. We access GPT-4 through the OpenAI Python
API, with temperature set to 0.0, top-p set to 1, and a maximum of 1024 tokens.
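
As a rough sketch, a single payoff-evaluation query with these settings could be issued as follows (shown with the legacy openai.ChatCompletion interface available at the time; the system and user messages are placeholders, not our exact prompts).

import openai

def query_payoffs(prompt: str) -> str:
    """One payoff-evaluation call with the decoding settings used in our experiments."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You evaluate payoffs for pairs of agent actions."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
        top_p=1,
        max_tokens=1024,
    )
    return response["choices"][0]["message"]["content"]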
The results, in terms of number of runs ending in a poach, in an interception of the Poacher, or in a
secured Rhino as well as the total success rates achieved by the different planners are presented in
Table 1.

Table 1: Results for the Stop The Poacher task. (N=64).


ALGORITHM POACHES INTERCEPTIONS SECURED SUCCESS RATE

Random action 55 3 6 14 %
Heuristic 12 33 19 81.25 %
Direct LLM 20 26 18 68.7 %
Game LLM 9 30 25 85.94 %

The heuristic planner that serves as a principled baseline achieves a success rate of just over 80%.
We present a random action planner to exhibit the amount of reasoning required to achieve the task
in a successful way. It completes the task on about 14% of runs. The Direct LLM architecture
can achieve the task just under 70% of the time. Thus, it does exhibit some reasoning capability
to decide simultaneous actions for coupled agents. However, its performance is below that of a
simple hand designed human heuristic. Our proposed architecture improves the multi-agent planning
performance of the LLM, completing the task in close to 86% of runs, a higher success rate than that of
the baseline heuristic. By design, the Game LLM formulation is better adapted to this type of
problem, as it encourages the LLM to reason about outcomes of combinations of actions instead of
solely predicting a next decision.

4.3 MULTI-AGENT TRANSPORT CHALLENGE

We test our framework on the ThreeDWorld Transport Challenge (Gan et al., 2021b), more specif-
ically on a multi-agent extension with additional objects and containers, and more realistic object
placements. The benchmark is named ThreeDWorld Multi-Agent Transport (TDW-MAT) (Zhang
et al., 2023). The simulation is also built on top of the general-purpose virtual world simulation
platform TDW (Gan et al., 2021a).

4.3.1 TASK DESIGN


The agents are tasked with transporting the largest number of target objects possible to goal po-
sitions, using containers as tools (otherwise agents can transport only two objects at a time). The
actions available to agents include exploring a room, moving to another room, manipulating objects
(pick up, drop), and using containers. The task is simplified from the setup presented in Zhang et al.
(2023). Indeed, we assume centralized information and decision making, and thus remove the need
for communication actions.
We select 6 scenes from the TDW-House dataset and run 2 samples of each of the 2 types of tasks in
each of the scenes, thus building a test set of 24 episodes. Every scene has 6 to 8 rooms, 10 objects,
and 4 containers. An episode is terminated if all the target objects have been transported to the goal
position or the maximum number of frames (3000) is reached.

4.3.2 PERFORMANCE METRICS


We first define the Transport Rate (TR) as the fraction of the target objects successfully transported
to the goal position. This will serve as our metric for the task. We can also reinterpret the metric in
terms of Efficiency Improvement (EI) with respect to a single agent attempting the task, computed as
$$\mathrm{EI} = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{TR}_{\mathrm{multi},i} - \mathrm{TR}_{\mathrm{single},i}}{\mathrm{TR}_{\mathrm{multi},i}},$$
with $\mathrm{TR}_{\mathrm{single},i}$ denoting the single agent's transport rate for episode $i$, and $\mathrm{TR}_{\mathrm{multi},i}$ denoting the two-agent transport rate for episode $i$.
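
In code, the metric amounts to the following, where the per-episode transport rate lists and the example values are purely illustrative:

def efficiency_improvement(tr_multi, tr_single):
    """EI = (1/N) * sum_i (TR_multi_i - TR_single_i) / TR_multi_i over N episodes."""
    assert len(tr_multi) == len(tr_single)
    return sum((m - s) / m for m, s in zip(tr_multi, tr_single)) / len(tr_multi)

# Illustrative values only (not results from the paper).
print(efficiency_improvement([0.7, 0.6, 0.8], [0.5, 0.5, 0.7]))  # ~0.19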

4.3.3 RESULTS
We test our Game LLM setup against a Direct LLM planner that has to pick the next pair of actions
directly from the list of available actions for each player. Designing a heuristic for this setup is far
from straightforward, so we only compare our approach to a non-game-reasoning LLM. The results
are presented in Table 2.

Table 2: Transport Rates and Efficiency Improvements of the Direct LLM method vs our Game
LLM on the TDW-MAT task.
ALGORITHM     TRANSPORT RATE   EI

Direct LLM    0.64             19%
Game LLM      0.67             22%

The Direct LLM approach achieves a Transport Rate of 0.64 and is outperformed by our Game LLM
planner (0.67). We believe the expected improvement over the direct method is partly offset
by migrating the task from its original decentralized setup, which may need additional adaptation to truly
capture the mechanisms we seek to act upon. However, this result remains in line with the remarks made
above about the more explicit reasoning about interaction that our architecture imposes on the LLM.

5 DISCUSSION
In this work, we propose a general framework to automate the design of game objectives for multi-
agent planning that capitalizes on Large Language Models’ reasoning capabilities without compro-
mising on the interactive dimension of the problem.
Our methodology shows promise as a human assistant for the design of scenarios with desired inter-
active behavior. We experiment with designing autonomous vehicle scenarios with non-cooperative
quadratic cost structures and show the ease with which high-level descriptions can be converted into
cost parameters that achieve the sought multi-agent conduct. We also notice that through simple
human exchange, our solution proves to be an efficient tuning aide, understanding brief feedback
and efficiently adjusting coupled parameters.
The proposed architecture also proves effective as a planner in dynamical multi-agent environments
where players are faced with a succession of decisions. Evaluating it in two such simulation settings,
our proposed planner surpasses both the hand designed heuristic and the direct action selection
approach on the Stop-The-Poacher task, as well as exceeding the latter approach in the TDW-MAT
Challenge.
One major limitation of our approach is scalability. Although one can instruct the LLM to reduce its
search space by only considering a subset of actions that are the most sensible for each agent before
evaluating combinations, it is clear that for large systems the number of action combinations grows
exponentially with the number of agents. However, we argue that in most robotics scenarios, it is
seldom the case that more than a handful of agents interact at once.
Extensions to such scenarios beyond two agents and competitive settings with partial observations
of the decision adversary are directions of future work.


REFERENCES
Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S Ryoo,
Austin Stone, and Daniel Kappler. Open-vocabulary queryable scene representations for real
world planning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pp.
11509–11522. IEEE, 2023.
Simon Le Cleac’h, Mac Schwager, and Zachary Manchester. ALGAMES: A fast solver for con-
strained dynamic games. CoRR, abs/1910.09713, 2019.
Vishnu Sashank Dorbala, James F Mullen Jr, and Dinesh Manocha. Can an embodied agent find
your "cat-shaped mug"? LLM-based zero-shot object navigation. arXiv preprint arXiv:2303.03480,
2023.
Axel Dreves and Matthias Gerdts. A generalized nash equilibrium approach for optimal control
problems of autonomous cars. Optimal Control Applications & Methods, 39:326–342, 2018.
L. Ferranti, R. R. Negenborn, T. Keviczky, and J. Alonso-Mora. Coordination of multiple vessels
via distributed nonlinear model predictive control. In 2018 European Control Conference (ECC),
pp. 2523–2528, 2018. doi: 10.23919/ECC.2018.8550178.
Jaime F. Fisac, Eli Bronstein, Elis Stefansson, Dorsa Sadigh, S. Shankar Sastry, and Anca D. Dragan.
Hierarchical game-theoretic planning for autonomous vehicles. 2019 International Conference on
Robotics and Automation (ICRA), 2018.
CS Fisk. Game theory and transportation systems modelling. Transportation Research Part B:
Methodological, 18(4-5):301–313, 1984.
Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Ju-
lian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim,
Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund,
David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, and
Daniel L. K. Yamins. Threedworld: A platform for interactive multi-modal physical simulation,
2021a.
Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund,
Daniel L. K. Yamins, James J. DiCarlo, Josh H. McDermott, Antonio Torralba, and Joshua B.
Tenenbaum. The threedworld transport challenge: A visually guided task-and-motion planning
benchmark for physically realistic embodied AI. CoRR, abs/2103.14025, 2021b. URL https://arxiv.org/abs/2103.14025.
Leo Gao, John Schulman, and Jacob Hilton. Scaling laws for reward model overoptimization. In
International Conference on Machine Learning, pp. 10835–10866. PMLR, 2023.
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan
Tompson, Igor Mordatch, Yevgen Chebotar, et al. Inner monologue: Embodied reasoning through
planning with language models. arXiv preprint arXiv:2207.05608, 2022.
Daniel Martin Katz, Michael James Bommarito, Shang Gao, and Pablo Arredondo. Gpt-4 passes
the bar exam. Available at SSRN 4389233, 2023.
Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu, and Claire Tomlin. Multi-hypothesis inter-
actions in game-theoretic motion planning. In 2021 IEEE International Conference on Robotics
and Automation (ICRA), pp. 8016–8023. IEEE, 2021.
Steven M LaValle. Robot motion planning: A game-theoretic foundation. Algorithmica, 26:430–
465, 2000.
Simon Le Cleac’h, Mac Schwager, and Zachary Manchester. Lucidgames: Online unscented inverse
dynamic games for adaptive trajectory prediction and planning. IEEE Robotics and Automation
Letters, 6(3):5485–5492, 2021.
OpenAI. Gpt-4 technical report, 2023.


Lasse Peters, David Fridovich-Keil, Vicenç Rubies-Royo, Claire J Tomlin, and Cyrill Stachniss.
Inferring objectives in continuous dynamic games from noise-corrupted partial state observations.
arXiv preprint arXiv:2106.03611, 2021.

Felix Rey, Zhoudan Pan, Adrian Hauswirth, and John Lygeros. Fully decentralized admm for co-
ordination and collision avoidance. In 2018 European Control Conference (ECC), pp. 825–830,
2018. doi: 10.23919/ECC.2018.8550245.

Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, and Anca D. Dragan. Planning for autonomous
cars that leverage effects on human actions. In Robotics: Science and Systems, 2016.

Larry Samuelson. Game theory in economics and beyond. Journal of Economic Perspectives, 30
(4):107–130, 2016.

Dhruv Shah, Michael Robert Equi, Błażej Osiński, Fei Xia, Sergey Levine, et al. Navigation with
large language models: Semantic guesswork as a heuristic for planning. In 7th Annual Conference
on Robot Learning, 2023a.

Dhruv Shah, Błażej Osiński, Sergey Levine, et al. Lm-nav: Robotic navigation with large pre-
trained models of language, vision, and action. In Conference on Robot Learning, pp. 492–504.
PMLR, 2023b.

Yoav Shoham, Rob Powers, and Trond Grenager. If multi-agent learning is the answer, what is the
question? Artificial intelligence, 171(7):365–377, 2007.

Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter
Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using
large language models. In 2023 IEEE International Conference on Robotics and Automation
(ICRA), pp. 11523–11530. IEEE, 2023.

Riccardo Spica, Eric Cristofalo, Zijian Wang, Eduardo Montijano, and Mac Schwager. A real-time
game theoretic planner for autonomous two-player drone racing. IEEE Transactions on Robotics,
36:1389–1403, 2018.

Francesco Stella, Cosimo Della Santina, and Josie Hughes. How can llms transform the robotic
design process? Nature Machine Intelligence, pp. 1–4, 2023.

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford,
Dario Amodei, and Paul F Christiano. Learning to summarize with human feedback. Advances
in Neural Information Processing Systems, 33:3008–3021, 2020.

Peter Trautman and Andreas Krause. Unfreezing the robot: Navigation in dense, interacting crowds.
In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 797–803,
2010. doi: 10.1109/IROS.2010.5654369.

Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor. Chatgpt for robotics: Design
principles and model abilities. Microsoft Auton. Syst. Robot. Res, 2:20, 2023.

Mingyu Wang, Zijian Wang, John Talbot, J. Christian Gerdes, and Mac Schwager. Game-theoretic
planning for self-driving cars in multivehicle competitive scenarios. IEEE Transactions on
Robotics, 37(4):1313–1325, 2021. doi: 10.1109/TRO.2020.3047521.

Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Are-
nas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted
Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa,
and Fei Xia. Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306.08647,
2023.

Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tian-
min Shu, and Chuang Gan. Building cooperative embodied agents modularly with large language
models, 2023.


A APPENDIX
A.0.1 DETAILS: AUTONOMOUS DRIVING EXAMPLE
The dynamic game solver we use is ALGAMES (Le Cleac'h et al., 2019), which handles trajectory
optimization problems with multiple actors and general nonlinear state and input constraints. The
vehicles obey nonlinear unicycle dynamics. The state of a vehicle $x_t^i$ comprises its 2D position
($p_x$ horizontal, $p_y$ vertical), its heading angle, and its scalar velocity. The control input $u_t^i$ comprises
the angular velocity and the scalar acceleration.
The dynamics constraints at time $t$ consist in following the system dynamics given by equation (1),
with $f$ being unicycle dynamics in our driving simulations. We also enforce collision-avoidance
constraints on the trajectories, by modelling the collision zones of the vehicles as circles of radius
$r$, such that, at any time step $t$, $\|p_t^i - p_t^j\|_2^2 \geq r^2$, $\forall i, j \in \{1, \ldots, N\}$. In addition, we require the
vehicles to remain on the road, by constraining the distance between each vehicle and the closest
point $q_b$ on each boundary $b$ to remain larger than the collision radius $r$: $\|p_t^i - q_b\|_2^2 \geq r^2$, $\forall b, \forall i \in
\{1, \ldots, N\}$, where $p_t^i = [p_{x,t}^i, p_{y,t}^i]$ contains the plane coordinates of agent $i$ at time $t$, extracted from
the complete state vector $x_t^i$. Thus, the autonomous driving problem is formalized via non-convex
and non-linear coupled constraints.
The cost structure considered is quadratic, penalizing the distance to the desired final state $x_f$ and
the use of controls,
$$J^{\theta_i}(X, \pi^i) = \sum_{t=1}^{H-1} \left[ \frac{1}{2} (x_t - x_f^{\theta_i})^\top Q^{\theta_i} (x_t - x_f^{\theta_i}) + \frac{1}{2} u_t^{i,\top} R^{\theta_i} u_t^i \right] + \frac{1}{2} (x_H - x_f^{\theta_i})^\top Q_f^{\theta_i} (x_H - x_f^{\theta_i}), \qquad (6)$$

where $Q^{\theta_i}$, $R^{\theta_i}$, and $Q_f^{\theta_i}$ represent the state, input, and final-state penalization weight matrices, respectively.
They can all be parameterized such that the LLM decides what values to encode in them. The
same applies to the final desired state $x_f^{\theta_i}$. This cost function depends only on the decision variables
of vehicle $i$, as players' behaviors are only coupled through collision constraints. Thus, although
knowledge of other agents' intentions does not enter the individual cost agent $i$ is optimizing,
it does determine the trajectories of others, and subsequently the collision constraints
agent $i$ has to satisfy.
The parameters we ask the LLM (GPT-4) to select are the target driving speed, the target lane to drive
in (both part of $x_f^{\theta_i}$), the cost associated with deviations from the desired speed, the cost associated with
deviations from the desired lane, and the cost associated with deviations from the desired driving
angle, which is set to be parallel to the road (part of $Q^{\theta_i}$).
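
To illustrate how these LLM-selected parameters enter the quadratic cost of equation (6), the following sketch assembles the weight matrices and target state for a single vehicle; the state ordering, helper names, and the fixed control weight are our assumptions, since the actual implementation uses the solver's own cost interface.

import numpy as np

def build_vehicle_cost(target_speed: float, target_lane_y: float,
                       w_speed: float, w_lane: float, w_heading: float,
                       w_control: float = 0.1):
    """State x = [px, py, heading, v]; returns (Q, R, x_f) for the eq. (6) quadratic cost."""
    # Target state: any px, the desired lane's y coordinate, road-parallel heading, target speed.
    x_f = np.array([0.0, target_lane_y, 0.0, target_speed])
    # Penalize lane deviation, heading deviation, and speed deviation (px is left free).
    Q = np.diag([0.0, w_lane, w_heading, w_speed])
    R = w_control * np.eye(2)  # angular velocity and acceleration inputs (weight fixed here for illustration)
    return Q, R, x_f

def stage_cost(x, u, Q, R, x_f):
    """One term of the eq. (6) sum: 0.5*(x - x_f)^T Q (x - x_f) + 0.5*u^T R u."""
    dx = x - x_f
    return 0.5 * dx @ Q @ dx + 0.5 * u @ R @ u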
