
What’s the Magic Word?

A CONTROL THEORY OF LLM PROMPTING

Aman Bhargava¹, Cameron Witkowski², Manav Shah², Matt Thomson¹*
¹California Institute of Technology, ²University of Toronto
{abhargav,mthomson}@caltech.edu
{cameron.witkowski,manav.shah}@mail.utoronto.ca

arXiv:2310.04444v3 [cs.CL] 3 Jan 2024

ABSTRACT

Prompt engineering is crucial for deploying LLMs but is poorly understood mathematically. We formalize LLM systems as a class of discrete stochastic dynamical systems to explore prompt engineering through the lens of control theory. We investigate the reachable set of output token sequences Ry (x0 ), i.e., the set of outputs y for which there exists a control input sequence u that steers the LLM to output y from initial state sequence x0 . We offer an analytic characterization of the limitations on the controllability of self-attention in terms of the reachable set, proving an upper bound on the reachable set of outputs Ry (x0 ) as a function of the singular values of its parameter matrices. We present complementary empirical analysis on the controllability of a panel of LLMs, including Falcon-7b, Llama-7b, and Falcon-40b. Our results demonstrate a lower bound on the reachable set of outputs Ry (x0 ) w.r.t. initial state sequences x0 sampled from the Wikitext dataset. We find that the correct next Wikitext token following sequence x0 is reachable over 97% of the time with prompts of k ≤ 10 tokens. We also establish that the top 75 most likely next tokens, as estimated by the LLM itself, are reachable at least 85% of the time with prompts of k ≤ 10 tokens. Intriguingly, short prompt sequences can dramatically alter the likelihood of specific outputs, even making the least likely tokens become the most likely ones. This control-centric analysis of LLMs demonstrates the significant and poorly understood role of input sequences in steering output probabilities, offering a foundational perspective for enhancing language model system capabilities.

1 INTRODUCTION
LLMs pre-trained on unsupervised next token prediction objectives exhibit unprecedented dynamic
reprogrammability achieved through “prompting”, often referred to as zero-shot learning (Brown
et al., 2020; Wei et al., 2022; Hagendorff, 2023; Noever & McKee, 2023; OpenAI, 2023; 2022).
These capabilities appear to emerge as the model’s size, training data, and training time are scaled.
The dynamic reprogrammability of LLMs is akin to the adaptable computational capacities observed
in biological systems. This feature finds applications across domains such as machine translation
(Wang et al., 2023a), code generation (Rozière et al., 2023), and chatbots (Bai et al., 2022). A
rigorous understanding of the prompt’s influence over LLM generation would be of great utility for
understanding LLMs and building more robust and capable systems leveraging LLMs.
Strategies for controlling pre-trained LLM generation today fall into three broad categories (Zhang
et al., 2022):
1. Input Optimization (Prompting): Adjusting the input tokens (e.g., rewording the
prompt) to improve subsequent text generation.
2. Model Optimization: Adjusting the weights of the network (e.g., fine-tuning, RLHF) to
improve model behavior during inference.
3. Post-processing: Adjusting or re-ranking generated text (e.g., surrogate ranking algo-
rithm).

Code will be made available at https://github.com/amanb2000/Magic_Words.

Of all these approaches, input optimization (i.e., prompting) is the least invasive and lowest-cost
method – and the least understood. Prompt optimization is also deeply connected to the zero-shot
capabilities of LLMs – the mysterious emergent capabilities of LLMs such as problem-solving,
knowledge retrieval, reasoning, and apparent general intelligence (Bubeck et al., 2023). With such
a view, we seek to characterize the controllability of LLMs via prompting.

1.1 CONTRIBUTION

We formalize LLM systems in the mathematical framework of control theory in Section 3. Our
analysis focuses on the reachable set of outputs Ry (x0 ) for an LLM system. The reachable set
is a fundamental concept in control theory that underlies notions of controllability, stability, and
observability (cf. Appendix A). The reachable output set Ry (x0 ) is the set of output sequences y
for which there exists a control input sequence u∗ that steers the LLM from initial state x0 to output
y (cf. Definitions 3, 11).
Our analytic results in Section 4 prove an upper bound on the contents of the reachable output set
for a self-attention head as a function of the singular values of its parameter matrices. Since self-
attention is the only component in a transformer block where significant information is exchanged
between token representations, this bound provides a foothold for analysis of LLM controllability
from the perspective of mechanistic interpretability (e.g., Bricken et al. (2023); Chefer et al. (2021);
Conmy et al. (2023)). Moreover, this bound represents a necessary condition for an output to be in
the reachable set.
Our empirical results apply state-of-the-art prompt optimization techniques (Section 5.1) to demon-
strate a lower bound on the contents of the reachable output set for a panel of LLMs, including
Llama-7b (Touvron et al., 2023), Falcon-7b, and Falcon-40b (Almazrouei et al., 2023). Specifically,
we sample initial states x0 from the Wikitext dataset (Merity et al., 2016) and probe the reachable
output tokens y under length-constrained control input sequences u : |u| ≤ k. The length constraint
k is highly relevant for optimal control of LLMs, as prompts with fewer tokens require fewer com-
putation and memory resources. We find that the reachable output set contains the “correct” next
Wikitext token following x0 over 97% of the time with prompts of k ≤ 10 tokens. We expand our
analysis of the contents of Ry (x0 ) by sampling target output tokens y based on the LLM's initial
estimate of output likelihood PLM (y|x0 ). We find that the top 75 most likely output tokens y are
reachable at least 85% of the time with prompts of k ≤ 10 tokens. Intriguingly, some tokens drawn
from the set of least likely outputs are controllable to the most likely output with k ≤ 4 control input
tokens. Our results suggest that prior likelihood-based metrics, such as cross-entropy loss, cannot
guarantee exclusion from the reachable set, emphasizing the gap in our current understanding of
LLM systems and control. Implications of our results and open questions in LLM control theory are
further discussed in Section 6.

2 RELATED WORK
Much of the work on prompt optimization is concerned with finding prompts that induce higher
LLM performance on “fill-in-the-blank” or “cloze” tasks (Taylor, 1953). One can frame a range
of tasks including knowledge retrieval (Petroni et al., 2019), reasoning (Weston et al., 2016), and
sentiment analysis (Wang et al., 2023b) as fill-in-the-blank tasks:

• Knowledge Retrieval: “The Titanic sank in the year [MASK].” (Answer: “1912”)
• Reasoning: “A is taller than B. B is taller than C. Is A taller than C? Answer: [MASK]”
(Answer: “Yes”)
• Sentiment Analysis: “I am sad today. The sentiment of the previous sentence was
[MASK]” (Answer: “Negative”)
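To make the cloze framing concrete, the following is a minimal sketch of scoring a candidate answer under a causal LM, assuming a HuggingFace-style API (the model choice and helper name are ours, purely illustrative):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
    model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

    def answer_logprob(prompt: str, answer: str) -> float:
        # Sum of log-probabilities the LM assigns to `answer` following `prompt`.
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        answer_ids = tok(answer, add_special_tokens=False,
                         return_tensors="pt").input_ids
        ids = torch.cat([prompt_ids, answer_ids], dim=1)
        with torch.no_grad():
            logits = model(ids).logits
        # Logits at position t predict token t+1: slice out the answer region.
        logprobs = logits.log_softmax(-1)[0, prompt_ids.shape[1] - 1 : -1]
        return logprobs.gather(1, answer_ids[0].unsqueeze(1)).sum().item()

    print(answer_logprob("The Titanic sank in the year", " 1912"))

Rewording the surrounding prompt text changes these scores, which is precisely the degree of freedom prompt optimization exploits.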

Notably, there is some freedom in the bolded “prompt text” that surrounds the question to convert
it into a “fill-in-the-blank” task. As it turns out, the prompt tokens have a large effect on LLM
performance (Brown et al., 2020; Zhang et al., 2022; Jiang et al., 2020).
Modern prompt optimization algorithms generally consist of two iterated steps: a sampling step
where new prompts are generated and a testing step where the utility of the new prompts is evaluated, and the best are selected for the next iteration. Algorithms primarily differ in the sampling procedure,
where various heuristics may be used to pick high-value swaps (Wen et al., 2023; Zhou et al., 2023;
Reynolds & McDonell, 2021). Overall, AutoPrompt and its derivative algorithms have been the most
numerically successful prompt optimization methods, with the greedy coordinate gradient (GCG)
algorithm having state-of-the-art performance (Zou et al., 2023).

The AutoPrompt Family: AutoPrompt (Shin et al., 2020) pioneered the current wave of prompt
optimization. Shin et al propose a prompt optimization technique and demonstrate its effectiveness
for engineering prompts to improve LLM performance on knowledge and sentiment analysis tasks.
At its core, the AutoPrompt algorithm leverages gradient information at the token embedding layer
to inform iterative token exchanges within the prompt. This method was extended in Zou et al.
(2023) as the greedy coordinate gradient (GCG) algorithm. Taking inspiration from adversarial ex-
amples (Goodfellow et al., 2015), Zou et al applied this AutoPrompt variant to generate “jailbreak”
prompts that cause aligned LLMs to generate objectionable content.

Other Prompt Optimization Methods: Other investigations on LLMs as prompt optimizers (Zhou et al., 2023) and further analysis of manual prompt optimization (Reynolds & McDonell,
2021) are informative but do not exceed the AutoPrompt family’s performance. Some other methods
include GBDA (Guo et al., 2021), an approach based on the Gumbel-Softmax reparametrization, the
PEZ algorithm (Wen et al., 2023), which directly optimizes embeddings via gradient information,
and FluentPrompt (Shi et al., 2022), which differs from AutoPrompt by incorporating Langevin
dynamics. Despite the variety of alternatives, GCG retains state-of-the-art performance.

Control Theory for LLMs: To our knowledge, the only other work to date on the controllabil-
ity of LLMs is Soatto et al. (2023). Soatto et al analyze the controllability of LLMs in terms of
“meaningful sentences”, defined as the sigma-algebra generated by snippets of text written on the
Internet. Their empirical analysis revolves around demonstrating that LLMs are capable of attribut-
ing meaning. The theoretical analysis of LLM controllability is limited to “meaningful sentences”,
eliminating the possibility of out-of-distribution inputs and outputs. These restrictions render their
results challenging to leverage toward a practical understanding of LLM controllability. We situate
our work as a practically oriented exploration of LLM controllability. Motivated by challenges in
developing LLM systems, we do not eliminate “meaningless sentences” from the state space or in-
put space. We aim to establish a rigorous, general framework for understanding LLM systems and
controllability that is amenable to the development of theory and practical engineering insights on
systems design.

3 CONTROL THEORY FOR LLMS


Control theory originates from the study of automatic control systems in engineering. It seeks to
understand how a “plant” system can be influenced toward a desired state using a “control signal” –
often in the presence of disturbances and uncertainty.
Control theory is central to a variety of engineering problems, from electrical engineering to au-
topilot to telecommunications to manufacturing. Surprisingly, control theory has also been highly
applicable to a diverse range of scientific disciplines. Analyzing systems through the lens of con-
trollability has proven fruitful for generating insight into biological systems such as cell signaling
pathways and neural networks (Yi et al., 2000), the economics of central banking (Aniţa et al., 2011),
and controlling the spread of infectious diseases (Roy et al., 2009). One of the central benefits of
studying systems via controllability is that a range of questions and problems naturally emerge from
the framing: when is control possible? What is the cost of control? How computationally intensive
is control? These questions are both practically useful and often lead to fundamental insights about
the nature of the system in question.
To develop a control theory of LLMs, we provide fundamental abstract definitions of systems and
control (Appendix A). We apply them to define LLM systems and outline specific canonical control
concepts and problems such as controllability and reachability that arise naturally for LLM systems.

Language Model Notation: We denote a causal language model using PLM . PLM maps from
an ordered list of tokens from a vocabulary set V (e.g., x ∈ V n ) to the probability distribution over the next token PLM (xn+1 |x) ∈ [0, 1]|V| . We use V ∗ to denote the set of all possible sequences
of any length composed of tokens from V. The addition operator indicates the concatenation of
token sequences. Bolded lowercase variables (e.g., x = [x1 , . . . , xn ]) denote token sequences while
unbolded lowercase variables refer to individual tokens (e.g., x ∈ V). The length of a token sequence
is denoted |x|.
While LLMs are at times leveraged in a manner that masks the iterative aspects of generation,
the reality is that token generation and externally imposed “control input” sequences are generated
and processed sequentially, leading to non-trivial system dynamics. Several key differences remain
between LLM-based systems and systems typically modeled through ordinary differential equations
(ODEs), which have long been a cornerstone in the study of continuous-time dynamical systems:

1. Discrete state and time: LLM systems operate on sequences of discrete tokens over a
discrete time set, in contrast to the continuous state spaces and time sets studied in classical
control theory.
2. Shift-and-Grow State Dynamics: Whereas the system state in an ODE-based system has
a fixed size over time, the system state x(t) for LLM systems grows as tokens are added to
the state sequence.
3. Mutual exclusion on control input token vs. generated token: The LLM system state
x(t) is written to one token at a time. The newest token is either drawn from the control
input u(t) or is generated by the LLM by sampling x′ ∼ PLM (x′ |x(t)). This differs from
traditional discrete stochastic systems, where the control sequence and internal dynamics
generally affect the state synchronously.

We begin by rigorously defining LLM systems with user input, drawing from the abstract mathe-
matical definition of a system (Definition 7).
Definition 1 (LLM System with Control Input). An autoregressive LLM system with control input
Σ = (V, PLM ) consists of:

• T = N : The time set is the natural numbers.


• X = V ∗ : The state space consists of all possible token sequences of any length drawn
from V. We denote the state at time t as x(t) = [x0 (t), . . . , xt (t)].
• U = V ∪ {∅} : The input takes values from the vocabulary set V or null.
• ϕ : X × U × T 2 → X : The transition map is

$$\phi(x(t), u(t), t, t+1) = \begin{cases} x(t) + u(t) & \text{if } u(t) \neq \emptyset \\ x(t) + x', \quad x' \sim P_{LM}(x' \mid x(t)) & \text{otherwise} \end{cases} \tag{1}$$
Note that the general multi-step transition map ϕ(x(t), u, t, t + N ) can be achieved by
iterating equation 1 for control sequences u defined over the interval [t, t + N ].
• h(x(t); r) = [xt−r (t), . . . , xt (t)] : The readout map returns the most recent r tokens from
state x(t).

We note that this LLM system definition is generalizable to a variety of LLM augmentation, in-
cluding chain-of-thought (Wei et al., 2023), retrieval-augmented generation (Lewis et al., 2020),
and chatbot interaction. For example, chain-of-thought is equivalent to sampling the readout map
h(x(t), r) at time T > |u| + |x0 | + r for prompt u and initial state x0 . A similar formulation may
be applied to LLM systems endowed with programmatic tools (e.g., Patil et al. (2023)).
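The following is a minimal sketch of these dynamics under greedy decoding, where logits_fn stands in for PLM (the interface and names are ours, not a specific library's):

    from typing import Callable, List, Optional

    def phi(x: List[int], u: Optional[int],
            logits_fn: Callable[[List[int]], List[float]]) -> List[int]:
        # One step of Equation 1: the control input overrides generation when
        # u is not None; otherwise the LLM appends its greedy next token.
        if u is not None:
            return x + [u]
        logits = logits_fn(x)
        x_next = max(range(len(logits)), key=lambda i: logits[i])
        return x + [x_next]

    def readout(x: List[int], r: int) -> List[int]:
        # Readout map h(x(t); r): the most recent r tokens of the state.
        return x[-r:]

    def rollout(x0: List[int], u_seq: List[Optional[int]],
                logits_fn, extra_steps: int) -> List[int]:
        # Iterate phi over a finite control sequence, then let the LLM free-run.
        x = list(x0)
        for u in u_seq + [None] * extra_steps:
            x = phi(x, u, logits_fn)
        return x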
In Definition 1, we assume that the control input gets to “decide” whether to yield token generation
to the LLM (u(t) = ∅) or override the LLM and add some token u(t) ̸= ∅ to the state x(t).
This assumption generally holds when building LLM systems, though it may not hold when using
existing systems (e.g., via non-streaming API). When discussing finite-length control inputs – e.g.,
the family of k-long input sequences u ∈ V k – the value of u(ℓ) : ℓ > k is implicitly ∅ unless
otherwise stated.
While next token generation x′ ∼ PLM (x′ |x(t)) in equation 1 is probabilistic, we may render the system deterministic by sampling with zero temperature (i.e., greedy decoding). The greedy decoding assumption provides a foothold to analyze the reachable sets and controllability of LLM systems without invoking notions of stochastic control as in Sivaramakrishnan et al. (2023); Soatto et al. (2023). Moreover, it remains connected to stochastic decoding strategies as the zero-temperature limiting case of temperature-based sampling.
We now extend Definition 10 to define output controllability for LLM systems:
Definition 2 (LLM Output Reachability). Output token sequence y ∈ V r is reachable from initial
state x0 ∈ V ∗ for LLM system Σ(V, PLM ) iff there exists some time T and input u∗ ∈ U T that
steers the LLM from initial state x0 to output y = h(x(T ), r) at time T .
We disregard the trivial solution wherein the control input u∗ (t) overrides the LLM to force the state
sequence to take on the desired output value y.
The reachable output set definition for LLM systems follows from Definition 11:
Definition 3 (LLM Reachable Output Set). The reachable output set from initial state x0 ∈ V ∗ for
LLM system Σ = (V, PLM ) is denoted Ry (x0 ) and consists of all reachable outputs y ∈ V ∗ from
initial state x0 .
Output controllability for LLMs follows from Definition 13.
Definition 4 (LLM Output Controllability). An LLM system Σ = (V, PLM ) is output controllable
iff, for every initial state x0 ∈ V ∗ , the reachable output set Ry (x0 ) = V ∗ .
The turn-based nature of writing to the LLM state sequence x(t) invites the question of whether the
prompt u should preempt the imposed state x0 or come after the state¹. We focus our efforts on
cases where u comes before imposed state sequence x0 due to its importance for developing system
prompts and controlling text completion-based generation where the desired output is x0 + y∗ for
some desired continuation y∗ of partial string x0 . Due to the costly nature of long prompts, we are
especially interested in the existence of prompts u∗ with minimal length |u∗ |.
Definitions 3 and 4 form the basis for our control theory of LLMs. While amenable to analytic
analysis as in Section 4 and Soatto et al. (2023), empirical analysis of the reachable set and control-
lability is challenging due to the intractable size of V ∗ . We propose the following statistical measure
of controllability for practically assessing the controllability of an LLM system w.r.t. a dataset D
under prompt length constraint |u| ≤ k:
Definition 5 (k − ϵ Controllability). Consider a dataset of state-output pairs D = {(xi0 , yi )}i∈[N ] .
An LLM Σ = (V, PLM ) is k − ϵ controllable w.r.t. D if, for at least a proportion of (1 − ϵ) of
(xi0 , yi ) ∈ D, the following condition holds:
yi ∈ Rky (xi0 ) (2)
where Rky (xi0 ) is the reachable set of outputs as in Definition 3 under the constraint that prompts u
must have length |u| ≤ k.
Our empirical work in Section 5.2 explores k − ϵ controllability w.r.t. initial states x0 sampled
from the Wikitext dataset. While empirical analysis of LLM controllability is challenging due to
the lack of apparent structure in LLM dynamics and the combinatorially large state space, we may
still experimentally establish the existence of optimal prompts u∗ that elicit a given output, and thus
establish a lower bound on the content of the reachable set. Meanwhile, our theoretical work in
Section 4 establishes upper bounds on the content of the reachable set for self-attention. We hope
these complementary approaches aid in unifying our understanding of LLM systems.
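As an illustration, k − ϵ controllability can be estimated with a simple counting loop, assuming a prompt-optimization oracle find_prompt(x0, y, k) that returns a steering prompt of length at most k or None (a hypothetical interface; the optimizers of Section 5.1 fill this role in our experiments). Because the oracle may fail to find prompts that exist, the estimate is an upper bound on ϵ:

    def k_epsilon(dataset, find_prompt, k: int) -> float:
        # Fraction of (x0, y) pairs for which no steering prompt of
        # length <= k was found (i.e., the empirical epsilon).
        failures = sum(1 for x0, y in dataset if find_prompt(x0, y, k) is None)
        return failures / len(dataset)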

4 MATHEMATICAL ANALYSIS ON THE CONTROLLABILITY OF SELF-ATTENTION
Self-attention is a central component in modern transformer-based language models (Brown et al.,
2020; Touvron et al., 2023; Radford et al., 2019; Min et al., 2023). Introduced in Vaswani et al. (2017), self-attention is the primary component in transformers where token representations exchange information.

¹Both situations are reasonable in developing LLM systems: u preceding x0 may arise when prompting an LLM to complete a partial string x0 . u following x0 may arise when prompting an LLM in the presence of an imposed system prompt x0 . Therefore, how an initial state x0 is interleaved with control input u is largely a design decision.
Definition 6 (Self-Attention). Self-attention Ξ = (Wq , Wk , Wv ) is a map from RN ×din →
RN ×dout where N is an arbitrary number of input token representations each of dimensionality
din , and dout is the dimensionality of the output token representations.
$$\Xi(X) = D^{-1} \exp\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V \tag{3}$$

where exp(·) denotes element-wise exponentiation of the matrix entries, Wq , Wk ∈ Rdin ×dk , Wv ∈ Rdin ×dout , Q = XWq , K = XWk , V = XWv , and D is a diagonal positive definite matrix defined as

$$D = \operatorname{diag}\!\left(\exp\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) \mathbf{1}_{N\times 1}\right) \tag{4}$$

where 1N ×1 is an N × 1 matrix of ones.


Note that the parameters and operation of Ξ are independent of the number of token representations
N . Self-attention may be applied to discrete token sequences x ∈ V ∗ provided that each token
xi ∈ x is first mapped to a representation in the input dimension with some embedding map E :
V → Rdin .
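For concreteness, Equations 3 and 4 transcribe directly into NumPy; the sketch below uses the unnormalized-exponential form, which is equivalent to row-wise softmax attention (the dimensions are illustrative):

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # Xi(X) = D^{-1} exp(Q K^T / sqrt(d_k)) V, D = diag(exp(.) 1_{Nx1}).
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        d_k = Wk.shape[1]
        A = np.exp(Q @ K.T / np.sqrt(d_k))           # element-wise exponential
        D_inv = 1.0 / A.sum(axis=1, keepdims=True)   # D^{-1} as row normalization
        return D_inv * (A @ V)

    # Example: N = 5 token representations, d_in = 8, d_k = 4, d_out = 8.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))
    Wq, Wk = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
    Wv = rng.normal(size=(8, 8))
    Y = self_attention(X, Wq, Wk, Wv)                # shape (5, 8)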
We are interested in the reachability of output token representations, where we partition the input
X ∈ R(k+M )×din into a k × din block of control input representations U and an M × din block
of imposed state representations X0 (cf. Definition 1). We also partition the output X′ = Ξ(X) ∈
R(k+M )×dout into a corresponding k × dout matrix U′ and an M × dout matrix Y. Motivated by
the architecture of transformer-based language models, we seek to characterize the reachable set of
output representations Y ∈ Rky (X0 ) under imposed input representations X0 and controllable input
representations U, where U consists of k token representations. While the reachable set is now a set
of continuous-valued output representation matrices in RM ×dout , we may readily adapt Definition 3
to define the reachable set for these conditions.
Theorem 1 (Condition for Exclusion from the Reachable Set). A desired output representation Y∗ ∈ RM ×dout must be excluded from the reachable set Rky (X0 ) if the following condition holds for any row i:

$$\langle y_*^i, \hat{y}_x^i \rangle \le 0 \quad \text{and} \quad \left\| y_*^i - \frac{\hat{D}_{xx}^i}{\hat{D}_{xx}^i + k \exp\!\left(\frac{\Omega_x \sigma_q \sigma_k \Omega_u}{\sqrt{d_k}}\right)} \, \hat{y}_x^i \right\| \le \sigma_q \Omega_u \tag{5}$$

where Ωu = maxj ∥uj ∥ for rows uj of U, Ωx = maxj ∥xj0 ∥ for rows xj0 of X0 , σq and σk are the maximum singular values of Wq , Wk respectively, D̂ixx is the ith element on the diagonal of D̂xx , which is given by

$$\hat{D}_{xx} = \operatorname{diag}\!\left(\exp\!\left(\frac{Q_x K_x^\top}{\sqrt{d_k}}\right) \mathbf{1}_{M\times 1}\right), \tag{6}$$

y∗i is the ith row of Y∗ , and ŷxi is the ith row of Ŷx , which is given by

$$\hat{Y}_x = \hat{D}_{xx}^{-1} \exp\!\left(\frac{Q_x K_x^\top}{\sqrt{d_k}}\right) V_x \tag{7}$$

where Qx = X0 Wq , Kx = X0 Wk , and Vx = X0 Wv .
The proof of Equation 5 is provided in Appendix B.

Intuitively, this condition arises when the output representation Ŷx that occurs when only the imposed state X0 is fed into the transformer is too far away from the desired Y∗ for the control tokens U to “steer” the output to Y∗ . The ability of the control input U to nullify the impact of Ŷx = Ξ(X0 ) on the output scales with the number of control input tokens k. A control input with many tokens can “dominate” the influence of X0 by re-allocating attention away from the component of the output Ŷx that arises from X0 . A notable insight from the proof is that one may decompose the output of attention into components that arise largely from different parts of the input. While there are cross terms in the attention matrix, these amount to only a positive scaling factor applied to the “independent” components like Ŷx . Thus, we have an analytic bound on the reachable output set for self-attention (see further commentary in Section 6).
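As a numeric illustration, the exclusion condition of Theorem 1 can be checked row-by-row from the weight matrices alone; the sketch below is our own transcription of Equations 5–7, not code from the paper:

    import numpy as np

    def excluded_row(y_star_i, X0, U, Wq, Wk, Wv, i):
        # True if row i of Y* satisfies both clauses of Equation 5, and is
        # therefore provably outside the reachable set R_y^k(X0).
        d_k = Wk.shape[1]
        k = U.shape[0]
        Qx, Kx, Vx = X0 @ Wq, X0 @ Wk, X0 @ Wv
        Ax = np.exp(Qx @ Kx.T / np.sqrt(d_k))
        D_xx = Ax.sum(axis=1)                     # diagonal of D_hat_xx (Eq. 6)
        Yx_hat = (Ax / D_xx[:, None]) @ Vx        # Y_hat_x (Eq. 7)
        sigma_q = np.linalg.svd(Wq, compute_uv=False)[0]
        sigma_k = np.linalg.svd(Wk, compute_uv=False)[0]
        omega_u = np.linalg.norm(U, axis=1).max()
        omega_x = np.linalg.norm(X0, axis=1).max()
        scale = D_xx[i] / (D_xx[i] + k * np.exp(
            omega_x * sigma_q * sigma_k * omega_u / np.sqrt(d_k)))
        cond1 = float(y_star_i @ Yx_hat[i]) <= 0.0
        cond2 = np.linalg.norm(y_star_i - scale * Yx_hat[i]) <= sigma_q * omega_u
        return cond1 and cond2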

5 EXPERIMENTS

To gain a practical, empirical understanding of the reachable set Rky (x0 ), we probe the existence of
optimal prompts u∗ across datasets D of initial state–desired output pairs (x0 , y ∗ ). We scope our
experiments to study immediate control (i.e., we check the LLM output after |y ∗ | tokens are gener-
ated) where the control input u is prepended to the imposed state x0 . Moreover, we focus on the
case of controlling the LLM system to produce a single output token y ∗ ∈ V under some constraint
|u| ≤ k. This “single-step” control renders the problem of gauging reachability computationally
tractable and is a fundamental step toward understanding the iterated dynamics of LLM systems in
terms of reachability and controllability. We leave the exploration of reachability and controllability
under an extended time horizon (e.g., chain-of-thought, chatbot dynamics, tool-wielding LLMs) and
under the requirement of multi-token outputs y to future work.

5.1 METHODS

We apply prompt optimization algorithms to establish the existence of optimal prompts u∗ of length
k that steer the LLM system from initial state x0 to output y for some dataset D of initial state-output
pairs. In general, prompt optimization algorithms accept a token sequence and a loss function on
said token sequence, along with a specification of which tokens are manipulable. The output of a
prompt optimizer is a manipulated token sequence (i.e., optimized prompt) designed to minimize the
loss. We apply two computational methods to generate optimal prompts: greedy back-generation
and greedy coordinate gradient (GCG, invented in Zou et al. (2023)). We found that greedy back-
generation performed best for short prompts k ≤ 3 tokens, while GCG was the best-performing
algorithm for prompts of 4 or more tokens. To our knowledge, our greedy back-generation algorithm
is novel. For brevity, we place the full description of the algorithms and our parameter values for
the two algorithms in Appendix C, as the specifics of the algorithms are not the main contribution
of this work.
We focus on understanding the content and structure of the reachable set of LLM system outputs
Rky (x0 ), particularly under a constraint on the number of input tokens k. To determine which out-
put tokens are reachable under varying input sequence lengths, we apply an incremental prompt
lengthening procedure when searching for optimal prompts on some dataset D.

Algorithm 1 Back-off Prompt


Require: State-output token sequence (x0 , y); LLM system Σ = (PLM , V).
1: for k from 1 to 3 do
2: uk = Greedy Back Generate(x0 , y; Σ)
3: return uk if it steers Σ from x0 → y.
4: end for
5: for k ∈ [4, 6, 8, 10] do
6: uk = Greedy Coordinate Gradient(x0 , y; Σ)
7: return uk if it steers Σ from x0 → y.
8: end for
9: return Failed to establish reachability.
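In code, Algorithm 1 reduces to the following sketch, where greedy_back_generate and gcg are the optimizers described in Appendix C and greedy_next is a hypothetical zero-temperature decoding call on the LLM system:

    def back_off_prompt(x0, y, greedy_next, greedy_back_generate, gcg):
        # Try cheap short prompts first, escalating to GCG for longer ones.
        def steers(u):
            # Does u + x0 greedily decode to the target token y?
            return greedy_next(u + x0) == y

        for k in (1, 2, 3):
            u = greedy_back_generate(x0, y, k)
            if steers(u):
                return u
        for k in (4, 6, 8, 10):
            u = gcg(x0, y, k)
            if steers(u):
                return u
        return None  # reachability not established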

5.2 RESULTS

Our results revolve around the reachable set Rky (x0 ) for state sequences sampled from the Wikitext
dataset. Results were computed for a panel of models, including Falcon-7b, Falcon-40b, and Llama-
7b. Falcon-7b results are showcased in this section while additional plots and results for Falcon-40b
and Llama-7b can be found in Section D. We applied the same Back-off Prompt strategy (Algo-
rithm 1) to determine k − ϵ controllability for all experiments, varying the specifics of the dataset D
for each experiment.

“Ground truth” reachability: We established the reachability of the “ground truth” next token y following state token sequence x0 in Wikitext. In our tests on a dataset of 5000 state-output sequences with states of length 8–32 tokens, we found that the true next token y is reachable over 97% of the time across all models with a prompt of length k ≤ 10 (Figure 1, top left). Plots
and supplementary figures for Falcon-40b and Llama-7b controllability w.r.t. ground truth Wikitext
outputs can be found in Section D.1.

Figure 1: Main experimental results on k − ϵ controllability of Falcon-7b (plots omitted; panels show portion incorrect ϵ vs. prompt length k for |x0 | ∈ {8, 10, 16, 22, 32}).
Top left (“k-ϵ plot for (x0 , y∗ ) from Wikitext”): k − ϵ values for Falcon-7b on ground truth target token y∗ . 97.16% of the instances were solved with a prompt of length k ≤ 10.
Top right (“k-ϵ plot for top 75 y∗ on 25 x0 from Wikitext”): k − ϵ values for reaching the top 75 most likely outputs y∗ for each x0 . The top 75 targets were reachable at least 89.39% of the time with a prompt of length k ≤ 10.
Bottom right (“base rank vs. prompt length for y∗ ∼ uniform(V) on x0 from Wikitext”): prior likelihood rank of target token y∗ versus required prompt length to elicit y∗ . Target tokens were sampled uniformly from the least to most likely token.

Top-75 reachability: To explore the reachable set Rky (x0 ) beyond the ground truth of Wikitext
outputs, we generated a synthetic dataset of outputs by sampling 25 Wikitext sequences x0 and
selecting the top 75 most likely next-tokens according to the model itself PLM (y|x0 ) as the target
tokens (Figure 1, top right). We found that the top 75 output tokens were reachable over 85% of the
time for all models with control sequence length k = 10. Supplementary figures including results
for Llama-7b and Falcon-40b on k − ϵ controllability with respect to the top 75 most likely output
tokens can be found in Section D.2.

Uniformly sampled target outputs: To maximally push the bounds of the reachable set within
our single output token scope, we created another synthetic dataset where the target output token
y ∗ was sampled uniformly from the highest likelihood next token to the lowest likelihood token.
Although the overall k − ϵ score was relatively poor (only 46.43% reachable with k = 10 for
Falcon-7b), we were intrigued by the near-uniform relationship between prior token rank (based
on PLM (y|x0 )) versus the required number of prompt tokens. Figure 1 (bottom right) plots the
relationship between prior target token rank based on P (y ∗ |x0 ) and the required prompt length
k to elicit y∗ . While over half were unreachable, the remaining reachable tokens appear
uniformly distributed in terms of required prompt length, regardless of rank. Supplementary figures
analyzing the k − ϵ controllability of Falcon-7b with respect to uniformly sampled target outputs y
can be found in Section D.3.

6 DISCUSSION

We proposed a control theoretic framework for understanding language model prompting, orienting
our investigation around the reachable set of outputs Rky (x0 ). We proved a bound on the reachable set of outputs for self-attention in terms of the singular values of its weight matrices, and we established fundamental results on the reachability of “correct” next tokens (according to Wikitext).
We expanded the scope of this investigation by probing the reachability of tokens assigned high
likelihood by the LLM itself, and tokens assigned minimal likelihood by the LLM itself.
Bounding the reachable set for self-attention is deeply related to the mechanism by which consistent
representations are formed for multi-token generation. Steering a language model to generate a de-
sired token sequence requires that the control input induce a token representation in the right-most
token such that the next token prediction logits P (y|u + x0 ) achieves a desired value. Moreover,
generated tokens are fed back into the model, and their representations must be steered as well to
control iterated generation. Self-attention is the primary mechanism by which the token represen-
tations exchange information, making the reachable set of output representations across multiple
tokens in X0 for self-attention a fundamental part of LLM control theory.
Our empirical results suggest that there is far more to the reachability of a given output than just
prior likelihood or the prior rank the LLM assigns to a given token. Although prompt optimization-
based k − ϵ controllability experiments are only able to provide a lower bound on the content of
the reachable set, the ability to frequently control even the least likely token to being the most likely
token with just a few input tokens is intriguing. We believe this result indicates the importance
of further investigating the reachability and controllability of LLMs, particularly for developing
capable and reliable LLM systems.
Our investigations provide an entry into the understanding of LLM controllability via prompts. How-
ever, a comprehensive understanding necessitates extending our exploration into diverse regimes.
Exploring the controllability with longer prompts and longer questions (base token sequences) will
be pivotal. Equally important is the study of diverse models to verify the generality of our findings.
Moreover, the direct comparison of controllability scores of different model families is challenging since each family uses a different tokenizer. The Llama family tokenizer, for instance, has a vocabulary
of 30,000 tokens whereas the Falcon family has a vocabulary of 65,536 tokens. Further work is
required to robustly compare controllability across models.
An intriguing observation from our study is the log-linear relationship between prompt length k and
controllability fraction ϵ (see Figure 2 in Appendix D). While this is compelling within our studied
domain, it raises the essential question: is this relationship robust outside our current explorative
scope? Unearthing universal scaling laws in LLM controllability would not only inform practical
control applications but also open the door for theoretical insight into the nature of LLM behavior.
The progress we have made, both in understanding the bounds on self-attention controllability and
the empirical measures of k − ϵ LLM controllability, underscores the potential of this control theo-
retic framing for studying LLMs. Below is a non-exhaustive list of open problems in LLM control,
all stemming from the framing in Appendix A:

• Control Properties of Chain-of-Thought: Chain-of-Thought is a powerful technique where LLMs are allowed to generate intermediate tokens (i.e., “thoughts”) between a ques-
tion and an answer (Wei et al., 2023). The control properties (e.g., stability, reachability) of
systems leveraging these techniques are of great interest for understanding and composing
systems of LLMs in the real world.
• Distributional Control: To what extent can we control the output distribution of a lan-
guage model PLM (y|x0 + u) to a desired distribution P ∗ (y)?
• Computational Cost of Control: What are the performance characteristics of LLM con-
trol regularized by computational cost?
• Learnability of Control: To what extent can LLMs learn to control each other? Work such
as Zhou et al. (2023) showed that LLMs are capable of human-level prompt engineering,
but it is unclear how well an LLM can learn to control another when explicitly optimized
on the objective of LLM control.
• Controllable Subspaces: In the control of linear dynamical systems, it is known that un-
controllable systems are often coordinate transformable into a representation where a subset
of the coordinates are controllable and a subset are uncontrollable (Sontag, 2013). We have shown that controllable and uncontrollable components naturally emerge for self-attention heads in Section 4 – can this be generalized to transformer blocks with nonlinearities and residual streams?
• Composable LLM Systems: One of the greatest boons of control theory is the ability to
compose control modules and subsystems into an interpretable, predictable, and effective
whole (Lian et al., 2002). The composition of LLM systems (potentially with non-LLM
control modules) is an exciting avenue for scaling super intelligent systems.

REFERENCES
Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Co-
jocaru, Merouane Debbah, Etienne Goffinet, Daniel Heslow, Julien Launay, Quentin Malartic,
Badreddine Noune, Baptiste Pannier, and Guilherme Penedo. Falcon-40B: an open large lan-
guage model with state-of-the-art performance. 2023.
Sebastian Aniţa, Viorel Arnăutu, Vincenzo Capasso, and Vincenzo Capasso. An introduction to
optimal control problems in life sciences and economics: From mathematical models to numerical
simulation with MATLAB®, volume 2. Springer, 2011.
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn
Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al. Training a helpful and harmless
assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862,
2022.
Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Con-
erly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu,
Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Zac Hatfield-Dodds, Alex
Tamkin, Karina Nguyen, Brayden McLean, Josiah E Burke, Tristan Hume, Shan Carter,
Tom Henighan, and Christopher Olah. Towards monosemanticity: Decomposing language
models with dictionary learning. Transformer Circuits Thread, 2023. https://transformer-
circuits.pub/2023/monosemantic-features/index.html.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhari-
wal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agar-
wal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh,
Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz
Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec
Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. In
H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neu-
ral Information Processing Systems, volume 33, pp. 1877–1901. Curran Associates, Inc.,
2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece
Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi,
Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments
with gpt-4, 2023.
Giuseppe C Calafiore and Laurent El Ghaoui. Optimization models. Cambridge university press,
2014.
Hila Chefer, Shir Gur, and Lior Wolf. Transformer interpretability beyond attention visualization,
2021.
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià
Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability, 2023.
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial
examples, 2015.
Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, and Douwe Kiela. Gradient-based adversarial
attacks against text transformers, 2021.

Thilo Hagendorff. Machine psychology: Investigating emergent capabilities and behavior in large
language models using psychological methods, 2023.
Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig. How can we know what language
models know?, 2020.
Rudolf Emil Kalman, Peter L Falb, and Michael A Arbib. Topics in mathematical system theory,
volume 33. McGraw-Hill New York, 1969.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal,
Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented genera-
tion for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:
9459–9474, 2020.
Feng-Li Lian, James Moyne, and Dawn Tilbury. Network design consideration for distributed con-
trol systems. IEEE transactions on control systems technology, 10(2):297–307, 2002.
Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture
models, 2016.
Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz,
Eneko Agirre, Ilana Heintz, and Dan Roth. Recent advances in natural language processing via
large pre-trained language models: A survey. ACM Computing Surveys, 56(2):1–40, 2023.
David Noever and Forrest McKee. Numeracy from literacy: Data science as an emergent skill from
large language models, 2023.
Katsuhiko Ogata. Modern control engineering fifth edition. 2010.
OpenAI, Nov 2022. URL https://openai.com/blog/chatgpt.
OpenAI. Gpt-4 technical report, 2023.
Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model
connected with massive apis, 2023.
Fabio Petroni, Tim Rocktäschel, Patrick S. H. Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H.
Miller, and Sebastian Riedel. Language models as knowledge bases? CoRR, abs/1909.01066, 2019. URL http://arxiv.org/abs/1909.01066.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language
models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
Laria Reynolds and Kyle McDonell. Prompt programming for large language models: Beyond the
few-shot paradigm, 2021.
Sandip Roy, Yan Wan, and Ali Saberi. A network control theory approach to virus spread mitigation.
In 2009 IEEE Conference on Technologies for Homeland Security, pp. 599–606, 2009. doi:
10.1109/THS.2009.5168092.
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi
Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton,
Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez,
Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, and
Gabriel Synnaeve. Code llama: Open foundation models for code, 2023.
Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, and Luke Zettlemoyer.
Toward human readable prompt tuning: Kubrick’s the shining is a good movie, and a good prompt
too?, 2022.
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. Auto-
prompt: Eliciting knowledge from language models with automatically generated prompts, 2020.
Karthik Sivaramakrishnan, Vignesh Sivaramakrishnan, and Meeko M. K. Oishi. Stochastic reacha-
bility of discrete-time stochastic systems via probability measures, 2023.

Stefano Soatto, Paulo Tabuada, Pratik Chaudhari, and Tian Yu Liu. Taming ai bots: Controllability
of neural states in large language models, 2023.
Eduardo D Sontag. Mathematical control theory: deterministic finite dimensional systems, vol-
ume 6. Springer Science & Business Media, 2013.
Wilson L. Taylor. “cloze procedure”: A new tool for measuring readability. Journalism Quarterly,
30(4):415–433, 1953. doi: 10.1177/107769905303000401. URL https://doi.org/10.1177/107769905303000401.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée
Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Ar-
mand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation
language models, 2023.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural informa-
tion processing systems, 30, 2017.
Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, and Zhaopeng
Tu. Document-level machine translation with large language models. arXiv preprint
arXiv:2304.02210, 2023a.
Zengzhi Wang, Qiming Xie, Zixiang Ding, Yi Feng, and Rui Xia. Is chatgpt a good sentiment
analyzer? a preliminary study. arXiv preprint arXiv:2304.04339, 2023b.
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yo-
gatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol
Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models,
2022.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc
Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models,
2023.
Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, and Tom Goldstein.
Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery,
2023.
Jason Weston, Antoine Bordes, Sumit Chopra, and Tomás Mikolov. Towards ai-complete question
answering: A set of prerequisite toy tasks. In Yoshua Bengio and Yann LeCun (eds.), 4th Inter-
national Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4,
2016, Conference Track Proceedings, 2016. URL http://arxiv.org/abs/1502.05698.
Tau-Mu Yi, Yun Huang, Melvin I Simon, and John Doyle. Robust perfect adaptation in bacterial
chemotaxis through integral feedback control. Proceedings of the National Academy of Sciences,
97(9):4649–4653, 2000.
Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, and Dawei Song. A survey of controllable
text generation using transformer-based pre-trained language models. CoRR, abs/2201.05337,
2022. URL https://arxiv.org/abs/2201.05337.
Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and
Jimmy Ba. Large language models are human-level prompt engineers, 2023.
Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial
attacks on aligned language models, 2023.

A ABSTRACT SYSTEMS AND CONTROL THEORY BACKGROUND
This section aims to provide an overview of fundamental control-theoretic concepts from an abstract
perspective. We primarily draw from Sontag (2013), Kalman et al. (1969), and Ogata (2010).
Diverse definitions of “system” or “machine” exist in the literature, all representing the same core
concept but varying in mathematical details. We offer the following high-level definition based on
Sontag (2013):
Definition 7 (System). A “system” or “machine” Σ = (T , X , U, ϕ) consists of:
• T : The time set along which system state evolves.
• X : The state space.
• U : The input space.
• ϕ : X × U × T 2 → X : The transition map.
A system may also be equipped with an output space and readout map (Y, h):
• Y : The output space.
• h : X × U × T → Y : The readout map.
In other words, at time t ∈ T , the system’s state takes on values x ∈ X , and the control input
takes values u ∈ U. The system evolves over time with the transition map ϕ(x, u, t, t′ ) that returns
the new state value x′ ∈ X at time t′ > t. A system can also have a readout map h(x, u, t) that
produces the output value y ∈ Y given the current time, state, and input value. An input u ∈ U
defined over interval [t, t′ ] may be said to steer the system Σ = (T , X , U, ϕ) from state x0 to state
x′ if x′ = ϕ(x0 , u, t, t′ ). A wide variety of systems are expressible within this framework. E.g., we
obtain discrete-time dynamical systems for T = Z+ . Continuous-time dynamical systems emerge
for T = R+ .
Note that we assume that the system Σ is time-invariant; its dynamics ϕ do not change as a function
of time. This assumption is widely applicable and is often made in the literature (Kalman et al.,
1969; Ogata, 2010; Sontag, 2013) to simplify definitions and discussions of systems.
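One way to render Definition 7 in code, specialized to the time-invariant discrete-time case, is a small generic container (field names are ours, not from the literature):

    from dataclasses import dataclass
    from typing import Callable, Generic, List, TypeVar

    X = TypeVar("X")   # state space
    U = TypeVar("U")   # input space
    Y = TypeVar("Y")   # output space

    @dataclass
    class System(Generic[X, U, Y]):
        phi: Callable[[X, U], X]   # one-step transition map
        h: Callable[[X], Y]        # readout map

        def steer(self, x0: X, inputs: List[U]) -> X:
            # Apply a control input sequence, returning the final state.
            x = x0
            for u in inputs:
                x = self.phi(x, u)
            return x

For T = Z+ this recovers a discrete-time dynamical system; Definition 1 instantiates it with X = V∗ and U = V ∪ {∅}.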
Reachability is a core control theory concept and central to defining controllability. At their core,
definitions of reachability revolve around the existence of control inputs u ∈ U that steer the system
from a starting state x0 ∈ X to some desired state(s). Following from Kalman et al. (1969); Sontag
(2013), we define state reachability as:
Definition 8 (State Reachability). State x ∈ X is reachable from initial state x0 ∈ X for system
Σ = (T , X , U, ϕ) iff there exists some time T and control input u∗ ∈ U such that u∗ steers the
system from state x0 to state x at time T .
We may use this definition of state reachability to define the reachable state set for some initial state
x0 ∈ X :
Definition 9 (Reachable State Set). The reachable state set from initial state x0 ∈ X for system
Σ = (T , X , U, ϕ) is denoted R(x0 ) ⊆ X and consists of all reachable states x ∈ X from initial
state x0 (cf. Definition 8).
For systems with readout maps h, notions of output reachability arise naturally. Note that state
reachability is neither necessary nor sufficient to guarantee output reachability.
Definition 10 (Output Reachability). Output y ∈ Y is reachable from initial state x0 ∈ X for
system Σ = (T , X , U, ϕ, Y, h) iff there exists some time T and control input u∗ ∈ U such that u∗
steers the system from state x0 to output y in time T .
Definition 11 (Reachable Output Set). The reachable output set from initial state x0 ∈ X for system
Σ = (T , X , U, ϕ, Y, h) is denoted Ry (x0 ) and consists of all reachable outputs y ∈ Y from initial
state x0 (cf. Definition 10).
A system is controllable when the reachable set extends to the entire state space. Practically speak-
ing, this implies that one can steer the system from any initial state to any desired state.

Definition 12 (State Controllability). System Σ = (T , X , U, ϕ) is state controllable iff, for every
initial state x0 ∈ X , the reachable set R(x0 ) = X .
Definition 13 (Output Controllability). System Σ = (T , X , U, ϕ, Y, h) is output controllable iff, for
every initial state x0 ∈ X , the reachable output set Ry (x0 ) = Y.

A range of fruitful questions stem from these definitions: if there is a cost associated with control
inputs u ∈ U (e.g., power constraints, length constraints), what is the minimum cost of control?
What is the minimum time required to get from the initial state to the desired final state or output?
If the system is not completely controllable, under what conditions is it controllable? Under which
readout maps is a system output controllable?

B PROOF OF SELF-ATTENTION CONTROLLABILITY THEOREM 1
Note: Key terms for the proof are introduced in Section 4 surrounding Theorem 1. Specifically, the
definition of self-attention mechanism Ξ, the control problem setup, and the reachable set Rky (X0 )
are required background for this proof.

Proof. For each token representation matrix Q, K, V ∈ R(k+M )×· , we denote the first k rows
corresponding to U using u as a subscript, like Qu . The remaining M rows corresponding to X0
are denoted with subscript x like Qx .
Let A be the exponentiated query-key outer product matrix with the following block structure:

$$A = \exp\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) = \exp\!\left(\frac{1}{\sqrt{d_k}}\begin{bmatrix} Q_u K_u^\top & Q_u K_x^\top \\ Q_x K_u^\top & Q_x K_x^\top \end{bmatrix}\right) = \begin{bmatrix} A_{uu} & A_{ux} \\ A_{xu} & A_{xx} \end{bmatrix} \tag{8}$$

We apply a similar quadrant decomposition to D, defined initially in Equation 4:

$$D = \operatorname{diag}\!\left(\exp\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)\mathbf{1}_{N\times 1}\right) = \begin{bmatrix} D_u & 0 \\ 0 & D_x \end{bmatrix} \tag{9}$$

where the quadrant demarcations in D follow from Equation 8.


We may now express the self-attention mechanism output representations Y as

$$Y = D_x^{-1} A_{xu} V_u + D_x^{-1} A_{xx} V_x \tag{10}$$

We begin by stating the equality between the desired output Y∗ and the true system output from
Equation 10. The final bound in Equation 5 of Theorem 1 is derived by isolating terms depending
on control input U, bounding them, and expressing that bound as a condition for achieving equality
between the desired output Y∗ and the true system output.
$$Y^* = \underbrace{D_x^{-1} A_{xu} V_u}_{\triangleq Y_u} + \underbrace{D_x^{-1} A_{xx} V_x}_{\triangleq Y_x} \tag{11}$$

$$\implies Y_u = Y^* - Y_x \tag{12}$$

We may immediately bound the magnitude of the rows of Yu as the matrix $D_x^{-1} A_{xu}$ has rows that
sum to less than 1 (it represents one quadrant of the row-wise softmaxed attention map, which has
rows that sum to 1 by construction). Therefore, each row yui of Yu lies within the convex hull
defined by the row vectors vui of Vu . Recalling Definition 6, Vu = UWv . Let Ωu = maxj ∥uj ∥
for rows uj of U, we can bound the norm of each vui in Vu with the maximum singular value
of parameter matrix Wv , denoted σq . Refer to Chapter 5 of Calafiore & El Ghaoui (2014) for an
overview of singular values. Thus we may bound each ∥vui ∥ ≤ Ωu σq . By the properties of convex
hulls, each row of Yu must inherit this upper bound on magnitude to retain feasibility.
$$\|y_u^i\| < \Omega_u \sigma_q \tag{13}$$
Refer to Chapter 8 of Calafiore & El Ghaoui (2014) for a detailed explanation of convex hulls and
their properties.
While Yx in Equation 11 may appear to depend only on the imposed X0 , the denominator term $D_x^{-1}$ contains influences from U. Let us split the denominator term Dx = D̂xx + D̂xu , where D̂xx depends solely on the imposed input X0 . D̂xx is defined in Equation 6. Let D̂xu be defined as:

$$\hat{D}_{xu} = \operatorname{diag}\!\left(\exp\!\left(\frac{Q_x K_u^\top}{\sqrt{d_k}}\right)\mathbf{1}_{k\times 1}\right) \tag{14}$$

Recall Equation 7, which defines Ŷx , the output of Ξ if only X0 is input. Let us express the condition
in Equation 11 using Ŷx to disentangle the influence of the control input:

$$Y_u = Y^* - (\hat{D}_{xu} + \hat{D}_{xx})^{-1} \hat{D}_{xx} \hat{Y}_x \tag{15}$$

Observe that the rows of Yx and Ŷx are positively scaled versions of each other because the denominator matrices are all positive and diagonal. Applying the bound in Equation 13 using row-wise notation,

$$\left\| y_*^i - \frac{\hat{D}_{xx}^i}{\hat{D}_{xx}^i + \hat{D}_{xu}^i} \, \hat{y}_x^i \right\| \le \sigma_q \Omega_u \tag{16}$$

Using the same singular value reasoning as in Equation 13, we bound the unknown denominator term D̂ixu , which is the only term still dependent on the control input U:

$$\hat{D}_{xu}^i \le k \exp\!\left(\frac{\Omega_x \sigma_q \sigma_k \Omega_u}{\sqrt{d_k}}\right) \tag{17}$$

Achieving this maximum value of D̂ixu will minimize the value of yxi by maximally scaling down ŷxi . The maximum value for yxi arises when D̂ixu is minimized (e.g., to zero), resulting in yxi = ŷxi . Therefore, the value of yxi is constrained to linear scalings between this minimum and this maximum. If every scaling violates the inequality in Equation 16, then the output is strictly unreachable.
Therefore, if ⟨y∗i , ŷxi ⟩ ≤ 0 for some row i and the following inequality is met, the output Y∗ is strictly unreachable under imposed input representations X0 and control input U:

$$\left\| y_*^i - \frac{\hat{D}_{xx}^i}{\hat{D}_{xx}^i + k \exp\!\left(\frac{\Omega_x \sigma_q \sigma_k \Omega_u}{\sqrt{d_k}}\right)} \, \hat{y}_x^i \right\| \le \sigma_q \Omega_u \tag{18}$$

C PROMPT OPTIMIZATION ALGORITHMS
Greedy Back-Generation: While testing all prompts in V k is intractable for k > 1, it takes only
|V| forward passes of the network to compute the loss on y induced by all possible single token
prompts u ∈ V. Our Greedy Back Generation algorithm leverages this fact to generate prompts u ∈
V k one token at a time, working backward sampling the ith greedy-optimal single token extension
u′ = arg maxu′ PLM (y|u′ + u + x) of the current prompt u ∈ V i−1 .

Algorithm 2 Greedy Token-Wise Prompt Generation


Require: A causal LLM PLM with vocabulary V, a set of base tokens x ∈ V n , a desired final token
y ∈ V, and a desired number of prompt tokens k.
Ensure: Magic words u∗ of length k.
1: Initialize u∗ to be empty.
2: for i from 1 to k do
3: for all u′ ∈ V do
4: compute PLM (y|u′ + u∗ + x)
5: end for
6: Select the u′ that maximizes the probability of y given u′ + u∗ + x. Prepend u′ to u∗
7: end for
8: return u∗

This method is optimal for k = 1 prompt token u∗ ∈ V and generally outperforms GCG for short
prompts of length k ≤ 3. Computing 1 additional prompt token takes roughly 1–4 minutes when using
an NVIDIA A100-80GB GPU with a 7 billion parameter model and 5-20 minutes on 2 NVIDIA
A100-80GB GPUs with a 40 billion parameter model.
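A sketch of a single back-generation step, assuming a HuggingFace-style causal LM (the batching helper and its signature are ours):

    import torch

    @torch.no_grad()
    def best_prefix_token(model, u_ids, x_ids, y_id, vocab_size, batch=1024):
        # Score every single-token extension u' of the current prompt u and
        # return the argmax of log P_LM(y | u' + u + x).
        device = next(model.parameters()).device
        suffix = torch.tensor(u_ids + x_ids, device=device)
        best_tok, best_logp = None, -float("inf")
        for start in range(0, vocab_size, batch):
            cand = torch.arange(start, min(start + batch, vocab_size),
                                device=device)
            seqs = torch.cat([cand[:, None], suffix.repeat(len(cand), 1)], dim=1)
            logits = model(seqs).logits[:, -1]       # next-token logits after x
            logp = logits.log_softmax(-1)[:, y_id]   # log P(y | u' + u + x)
            i = int(logp.argmax())
            if float(logp[i]) > best_logp:
                best_tok, best_logp = int(cand[i]), float(logp[i])
        return best_tok

    # Greedy back-generation prepends k such tokens one at a time:
    # for _ in range(k):
    #     u_ids = [best_prefix_token(model, u_ids, x_ids, y_id, V)] + u_ids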

Greedy Coordinate Gradient (GCG): The Greedy Coordinate Gradient algorithm, presented
by (Zou et al., 2023) building off the work of (Shin et al., 2020), is the state-of-the-art method for
optimizing prompts. Starting with a random prompt of length k, the algorithm generates a batch of
alternative prompts. Each member of the batch swaps a random token in the current prompt with a
promising alternate token. The value metric for a swap is given by a first-order approximation of the change in loss L = CELoss(y, PLM (y|u + x)) with respect to the embedding of each token in u.

Algorithm 3 Greedy Coordinate Gradient

Require: A causal LLM P_LM that accepts token strings from a vocabulary X, an embedding dictionary e, embeddings e*_i corresponding to each token i of u*, a set of base tokens x_{1:n}, a desired number of prompt tokens k, iterations T, k_sub, and batch size B.
Ensure: Magic words u* of length k.
1: Initialize u* to be random tokens from the vocabulary.
2: for iteration from 1 to T do
3:   for i from 1 to k do ▷ Compute the top k_sub most promising substitutions.
4:     X_i = Top-k_sub(e^T ∇_{e*_i} P_LM(x_n | u* + x_{1:n−1}))
5:   end for
6:   for b from 1 to B do
7:     ũ*_b = u* ▷ Copy the current prompt.
8:     i = randint([1, . . . , k]) ▷ Select a random position to swap.
9:     j = randint([1, . . . , k_sub]) ▷ Select a random token from the candidate set.
10:    ũ*_b[i] = X_i[j] ▷ Swap the token at position i.
11:   end for
12:   u* = ũ*_{b*}, where b* = argmax_b P_LM(x_n | ũ*_b + x_{1:n−1}) ▷ Select the replacement that maximizes the probability of the future token.
13: end for
14: return u*

This method outperforms all other methods we tested for prompts of length k > 3. We use a batch
size B = 768, sample from the top k_sub = 128 token replacements at each index, and iterate for
T = 34 iterations. For each instance, this optimization took roughly 2 minutes for the 7 billion
parameter models on a single A100-80GB GPU and 4-8 minutes for the 40 billion parameter model
on 4 A100-80GB GPUs.
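A condensed sketch of one GCG iteration is given below, again assuming a HuggingFace-style causal LM. The embedding table can be obtained via `model.get_input_embeddings().weight`; the sketch optimizes −log P_LM(y | u + x) directly, and for clarity it evaluates candidate swaps one at a time, whereas an efficient implementation (as in Zou et al., 2023) scores the whole batch in a single forward pass. Names are illustrative.

```python
import torch

def gcg_step(model, embed_matrix, u_ids, x_ids, y_id, k_sub=128, batch_size=768):
    """One Greedy Coordinate Gradient iteration (sketch)."""
    vocab_size = embed_matrix.shape[0]
    # 1. Gradient of the target loss w.r.t. a one-hot relaxation of the prompt.
    one_hot = torch.nn.functional.one_hot(u_ids, vocab_size).to(embed_matrix.dtype)
    one_hot.requires_grad_(True)
    u_embeds = one_hot @ embed_matrix                      # (k, d), differentiable
    x_embeds = embed_matrix[x_ids]
    inputs = torch.cat([u_embeds, x_embeds]).unsqueeze(0)  # (1, k + n, d)
    logits = model(inputs_embeds=inputs).logits[0, -1]
    loss = -torch.log_softmax(logits, dim=-1)[y_id]
    loss.backward()
    # 2. Top-k_sub candidates per position: tokens whose swap most decreases
    #    the loss to first order (large negative gradient component).
    candidates = (-one_hot.grad).topk(k_sub, dim=1).indices  # (k, k_sub)
    model.zero_grad()
    # 3. Evaluate batch_size random single-token swaps; keep the best.
    best_loss, best_u = float("inf"), u_ids
    for _ in range(batch_size):
        i = int(torch.randint(len(u_ids), (1,)))
        j = int(torch.randint(k_sub, (1,)))
        u_try = u_ids.clone()
        u_try[i] = candidates[i, j]
        with torch.no_grad():
            out = model(torch.cat([u_try, x_ids]).unsqueeze(0)).logits[0, -1]
            cand_loss = float(-torch.log_softmax(out, dim=-1)[y_id])
        if cand_loss < best_loss:
            best_loss, best_u = cand_loss, u_try
    return best_u, best_loss
```

In use, `u_ids` is set to the returned `best_u` and the step repeats for T iterations, mirroring the outer loop of Algorithm 3.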

D SUPPLEMENTARY FIGURES: OPTIMAL CONTROL PROMPTS
D.1 “GROUND TRUTH” CONTROLLABILITY RESULTS

This subsection includes supplementary figures on the controllability of Llama-7b, Falcon-7b, and
Falcon-40b with respect to “ground truth” target outputs from Wikitext. For each initial state sequence x0, the target
output y is the token immediately following x0 in Wikitext. We measured the k − ϵ controllability
of each of the 7 billion parameter models with a dataset of 5000 state-output pairs, while we used a
dataset of 500 state-output pairs for Falcon-40b.
Figure 2 shows each model’s log-spaced k − ϵ curves on the Wikitext dataset, revealing a log-linear
relationship between maximum prompt length k and the fraction of uncontrollable initial
state-target output pairs (x0, y). We visualize the relationship between prompt length and the prior
cross-entropy loss of each LLM on predicting the target output y given the state sequence x0 (i.e.,
− log P_LM(y|x0)) in Figure 3, where we find it difficult to predict the required prompt length from
the base loss.
Finally, Figure 4 shows a histogram of the tokens in the optimized prompts generated in the ground
truth k − ϵ controllability experiments on Wikitext.

[Figure: three panels of k−log(ϵ) curves: “[Falcon-7b] k-log(ϵ) Controllability on Wiki5k” (top left), “[Llama-7b] k-log(ϵ) Controllability on Wiki5k” (top right), and “[Falcon-40b] k-log(ϵ) Controllability on Wiki500” (bottom right); x-axis: Prompt Length [k]; y-axis: Log(Portion Incorrect) [log(epsilon)]; one curve per question length in {8, 10, 16, 22, 32}.]

Figure 2: Log-spaced main results of k − log(ϵ) controllability. Interestingly, the relationship
between k and log(ϵ) appears roughly linear for each question length in the regime studied.
Top left: k − log(ϵ) values for Falcon-7b. With k = 10 control tokens, 97.16% of the target outputs were reachable.
Top right: k − log(ϵ) values for Llama-7b. With k = 10 control tokens, 98.64% of the target outputs were reachable.
Bottom right: k − log(ϵ) values for Falcon-40b. With k = 10 control tokens, 97.00% of the target outputs were reachable.

D.2 TOP-75 WIKITEXT CONTROLLABILITY RESULTS

This subsection includes supplementary figures for the controllability of Llama-7b, Falcon-7b, and
Falcon-40b on the Wikitext dataset, where the target output token y for a given initial state token
sequence x0 is sampled uniformly from the top 75 highest-probability tokens as determined by the
language model itself, P_LM(y|x0). Specifically, the dataset D consists of 25 unique initial state
token sequences x0 sampled from Wikitext, each replicated 75 times for the top 75 most probable
subsequent tokens y ∼ P_LM(y|x0). This procedure yielded a dataset of 1875 initial state-target output
pairs (x0, y) for the 7 billion parameter models. Due to the computational requirements of the 40
billion parameter model, the number of unique initial state token sequences was decreased to 10,
resulting in a dataset of 750 initial state-target output pairs. The k − ϵ plots for each model are
shown in Figure 5. On average, across the 3 models, the top 75 outputs were reachable 86.865%
of the time with k ≤ 10 prompt tokens.
[Figure: three scatter panels: “[Falcon-7b] Base Loss vs. Prompt Length for (x_0, y*) from Wikitext” (top left), “[Llama-7b] Base Loss vs. Prompt Length for (x_0, y*) from Wikitext” (top right), and “[Falcon-40b] Base Loss vs. Prompt Length for (x_0, y*) from Wikitext” (bottom right); x-axis: Base Loss on P(y* | x_0); y-axis: Prompt Length [k]; points marked solved/unsolved.]

Figure 3: Required prompt length k versus base loss on the target output L = − log P_LM(y|x0).
While there does appear to be an “exclusion zone” in the top left-hand corner, where a high
prompt length is never associated with a base loss below a given threshold, base loss appears
to be a poor predictor of required prompt length.

Similar log-linear trends were observed in the k − ϵ plot.
Figure 6 shows the relationship between base loss and required prompt length, revealing a more
dramatic “exclusion zone” in the top left, similar to the main “ground truth” results in Figure 3. Finally,
Figure 7 plots a histogram of the 40 most common tokens observed in the optimized control input
prompts from the top-75 experiments.

D.3 UNIFORMLY SAMPLED OUTPUT TOKEN RESULTS

This section contains supplementary figures for k − ϵ controllability experiments on a synthetic
dataset D = {(x0, y)} where x0 are sampled from the Wikitext dataset and y is sampled uniformly
from the vocabulary. The uniform target output dataset D consists of 616 state-output pairs. Due
to computational constraints, k − ϵ controllability was only measured for Falcon-7b. Overall, only
46.42% of the target outputs were reachable with k = 10 prompt tokens. Figure 8 visualizes the
k − ϵ results, the relationship between base loss and prompt length, and the most frequently observed
tokens in the optimized control prompts. While the “exclusion zone” behavior (cf. Figures 3 and 6) is
observed in the base loss vs. prompt length subplot, base loss remains a poor predictor of required
prompt length. Moreover, Figure 1 reveals an even more uniform relationship between the initial
rank of the target output token and the required prompt length.

[Figure: three histograms of the 40 most frequent optimized-prompt tokens: “[Falcon-7b] Top 40 Prompt Tokens for (x_0, y*) from Wikitext” (a), “[Llama-7b] Top 40 Prompt Tokens for (x_0, y*) from Wikitext” (b), and “[Falcon-40b] Top 40 Prompt Tokens for (x_0, y*) from Wikitext” (c); x-axis: Token; y-axis: Frequency.]

Figure 4: Prompt token frequencies for Falcon-7b (top), Llama-7b (middle), and Falcon-40b (bottom) from Wikitext ground truth target token k − ϵ controllability experiments.
[Figure: three panels of k−ϵ curves: “[Falcon-7b] k-ϵ Plot for Top 75 y* on 25 x_0 from Wikitext” (top left), “[Llama-7b] k-ϵ Plot for Top 75 y* on 25 x_0 from Wikitext” (top right), and a Falcon-40b panel (bottom right); x-axis: Prompt Length [k]; y-axis: Portion Incorrect [ϵ]; one curve per |x_0| in {8, 10, 16, 22, 32}.]

Figure 5: k − ϵ controllability plots on the top 75 most likely output tokens.
Top left: k − ϵ values for Falcon-7b. With k = 10 control tokens, 89.387% of the top 75 output tokens were reachable.
Top right: k − ϵ values for Llama-7b. With k = 10 control tokens, 85.493% of the top 75 output tokens were reachable.
Bottom right: k − ϵ values for Falcon-40b. With k = 10 control tokens, 85.714% of the top 75 output tokens were reachable.

[Figure: three scatter panels: “[Falcon-7b] Base Loss vs. Prompt Length for Top 75 y*” (top left), “[Llama-7b] Base Loss vs. Prompt Length for Top 75 y*” (top right), and “[Falcon-40b] Base Loss vs. Prompt Length for Top 75 y*” (bottom right); x-axis: Base Loss on P(y* | x_0); y-axis: Prompt Length [k]; points marked solved/unsolved.]

Figure 6: Required prompt length k versus base loss on the target output L = − log P_LM(y|x0)
on the synthetic top-75 dataset. While there does appear to be an “exclusion zone” in the top left-hand
corner, where a high prompt length is never associated with a base loss below a given threshold,
base loss appears to be a poor predictor of required prompt length.
[Figure: three histograms of the 40 most frequent optimized-prompt tokens: “[Falcon-7b] Top 40 Prompt Tokens for Top 75 y*” (a), “[Llama-7b] Top 40 Prompt Tokens for Top 75 y*” (b), and “[Falcon-40b] Top 40 Prompt Tokens for Top 75 y*” (c); x-axis: Token; y-axis: Frequency.]

Figure 7: Prompt token frequencies for Falcon-7b (top), Llama-7b (middle), and Falcon-40b (bottom) from Wikitext top-75 synthetic dataset k − ϵ controllability experiments.
[Figure: three panels for Falcon-7b: “k-ϵ Plot for y* ~ uniform(V) on x_0 from Wikitext” (top left; x-axis: Prompt Length [k], y-axis: Portion Incorrect [ϵ], one curve per |x_0| in {8, 10, 16, 22, 32}), “Base Loss vs. Prompt Length for y* ~ uniform(V) and x_0 ~ Wikitext” (top right; x-axis: Base Loss on P(y* | x_0), y-axis: Prompt Length [k], points marked solved/unsolved), and “Top 40 Prompt Tokens for y* ~ uniform(V) and x_0 ~ Wikitext” (bottom; x-axis: Token, y-axis: Frequency).]

Figure 8: Supplementary figures on uniformly sampled target output controllability tests on Falcon-7b. Top Left: k − ϵ plot (46.42% controllable at k = 10). Top Right: Base loss versus required
prompt length. Bottom: Histogram of the top 40 most frequent tokens in optimized control prompts.
