
Song YF, He YQ, Zhao XF et al. A communication theory perspective on prompting engineering methods for large language models. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(4): 984−1004 July 2024. DOI: 10.1007/s11390-024-4058-8

A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models

Yuan-Feng Song (宋元峰), Yuan-Qin He (何元钦), Xue-Fang Zhao (赵雪芳), Han-Lin Gu (古瀚林)
Di Jiang (姜 迪), Hai-Jun Yang (杨海军), and Li-Xin Fan (范力欣)

AI Group, WeBank Co., Ltd, Shenzhen 518000, China

E-mail: [email protected]; [email protected]; [email protected]; [email protected]


[email protected]; [email protected]; [email protected]

Received December 21, 2023; accepted April 12, 2024.

Abstract  The springing up of large language models (LLMs) has shifted the community from single-task-orientated natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research endeavors in the area, LLM-based prompting methods have attracted much attention, partially due to the technological advantages brought by prompt engineering (PE) as well as the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model based on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (e.g., GPT-3 and GPT-4) via composing appropriate prompts, especially under few-shot or zero-shot scenarios. Facing the abundance of studies related to prompting and the ever-evolving nature of this field, this article aims to 1) illustrate a novel perspective to review existing PE methods within the well-established communication theory framework, 2) facilitate a better/deeper understanding of developing trends of existing PE methods used in three typical tasks, and 3) shed light on promising research directions for future PE methods.
Keywords prompting method, large language model, communication theory

1 Introduction

Large language models (LLMs) (e.g., GPT-3[1], GPT-4[2], LLaMa[3]) make it possible for machines to understand users' intentions accurately, thus revolutionizing the human-computer interaction (HCI) paradigm. Compared with traditional machine systems like databases and search engines, LLMs demonstrate impressive capability in understanding, generating, and processing natural language, facilitating a series of services ranging from personal assistants[4] and healthcare[5] to e-commercial tools[6] via a unified natural language interface between users and machines.

The research paradigm around LLMs has shifted from single-task-orientated natural language processing (NLP) research to a holistic end-to-end multi-task learning approach. Along this line of research endeavors, LLM-based prompting engineering (PE) methods[1, 7] have attracted much attention, partially because they are the key techniques in making full use of the superior capabilities of LLMs via constructing appropriate prompts. PE refers to the process of carefully constructing instructional prompts to steer and shape the behavior of LLMs, and it greatly helps in bridging the gap between the pre-training tasks used to construct the LLM and the downstream tasks queried by the end users. Through careful prompt designing, users can steer the LLM's output in the desired direction, shaping its style, tone, and content to align with their goals.

To this end, numerous prompt engineering (PE) methods have been explored with the notable progress of LLM advancement and technologies[7–24]. A common theme of PE development lies in continuously improving the accuracy and responsiveness of designed prompts, which often include components like Role, Context, Input, Output Format, and Examples.


Specifically, prompt template and answering engineering have evolved from solely utilizing discrete prompts to continuous prompts, and even to exploring hybrid prompts that combine continuous and discrete elements, which provides a larger optimization space to achieve better performance. With the emerging capabilities of LLMs, these models can leverage their in-context learning abilities to plan and utilize external tools, significantly enhancing their performance in specialized domains and broadening their applications across diverse fields.

Following these studies, representative PE methods can be categorized into three groups that correspond to three prompting tasks proposed to improve the qualities of LLMs' outputs, namely prompt template engineering, prompt answer engineering, and multi-prompt engineering and multi-turn prompt engineering, respectively. An example of the input and output for the above-mentioned tasks can be found in Table 1.

• First, prompt template engineering methods aim to carefully design a piece of "text" that guides the language models to produce the desired outputs. For example, in Table 1, to finish a classical sentiment detection task for an input A = "Delicious dining options close to my current location", the prompt template engineering designs a template "[A] In summary, it was a [Z] restaurant" to enforce the LLM to fill the desired comments in the blank (i.e., [Z]). Essentially, this type of template engineering method induces the LLM to focus on word embeddings that are relevant to the questions. A common designing principle of existing prompt template engineering methods is to better align information between users and LLMs. Such a trend is manifested by the evolution from using discrete prompts (i.e., a piece of human-readable text)[9, 11] to continuous ones (i.e., a continuous task-specific vector)[13, 20].

• Second, prompt answer engineering[7] refers to the process of exploring the vast answer space and a map to the desired, intended output, which enhances users' understanding of the information encapsulated within the LLM. For the same example in Table 1, the prompt answer engineering aims to find a mapping from the result "good" obtained from the LLM to the desired answer "positive". The field of prompt answer engineering is currently witnessing a notable development trend characterized by the pursuit of models that excel in decoding model information, from simple mapping to complex mapping, to enhance human comprehension.

• Third, multi-prompting methods mainly apply ensemble techniques[10] to mitigate the sensitivity of the LLM to different formulations and to obtain a more stable output. In Table 1, the multi-prompting methods combine three different templates (i.e., 1) "It was a [Z] place", 2) "A [Z] place to eat", and 3) "In general, it was [Z]") and their inference results (i.e., 1) "good", 2) "fantastic", and 3) "okay") to obtain the final desired one (i.e., "positive"). Later, as LLMs become more capable, multi-turn prompt methods attract more attention, aiming to provide more context to the LLM by leveraging information either from the LLM itself or from external tools[25, 26]. In the field of multi-prompting methods, researchers are endeavoring to develop adaptive strategies that enhance the LLM's ability in task planning and the utilization of tools.

In this article, we summarize the prompting methods from a communication theory perspective, with which the ultimate goal of PE is to reduce the information misunderstanding between the users and the LLMs. Therefore, as delineated in Section 2, the communication theory perspective provides a coherent explanation of different PE methods in terms of their objectives and underlying principles. Moreover, this novel perspective also offers insights into scenarios where existing prompting methods come short.

The remainder of the article is structured as follows. Section 2 details the overview of the prompting methods from the communication theory perspective. Sections 3, 4, and 5 review and summarize the recent progress on the three PE tasks, namely prompt template engineering, prompt answer engineering, and multi-prompt engineering and multi-turn prompt engineering, respectively. Section 6 discusses other related surveys and potential research directions. Finally, we conclude this article in Section 7 by summarizing significant findings and discussing potential research directions. We summarize the main symbols and abbreviations in Table 2 for the convenience of readers.

Table 1. Running Examples for PE Methods
Stage | Input | Output
Prompt template engineering | Delicious dining options close to my current location | Delicious dining options close to my current location. In summary, it was a [Z] restaurant
Large language model | Delicious dining options close to my current location. In summary, it was a [Z] restaurant | Delicious dining options close to my current location. In summary, it was a good restaurant
Prompt answering engineering | good | Positive
Multi-prompt | 1) It was a [Z] place; 2) A [Z] place to eat; 3) In general, it was [Z] | 1) good, 2) fantastic, 3) okay

2 Communication Theory Perspective of Prompting Methods

The study of modern communication theory, which dates back to the 1940s and the following decades, gave rise to a variety of communication models, including both linear transmission models and non-linear models such as interaction, transaction, and convergence models[27–29]. A common theme of these early studies is to analyze how individuals utilize verbal and non-verbal interactions to develop meaning in diverse circumstances. Conceptually, the communication process is often modeled as a chain of information processing steps involving encoding, transmitting, and decoding of messages between a sender and a receiver.

To give a better illustration, Fig.1(a) depicts the classical model of communication in communication theory, which includes a sender encoding a message and transmitting it to the receiver over a channel. Then, the receiver decodes the message and delivers some type of response. During the transmission process, the message may be distorted due to noise, leading to the necessity of multi-turn interaction.

The original communication theory is widely utilized to examine factors, including social[30], cultural[31], and psychological[32] ones, that influence human communication. The overall goal of communication theory is to reveal and clarify the common human experience of interacting with others through information exchange.

Among early studies of various communication models, we are particularly inspired by two influential works, namely, the Shannon-Weaver Model of Communication[33] and the Schramm Communication Model[34].

Table 2. Summary of Key Symbols and Abbreviations
Symbol | Description
PES | Prompt engineering system, a mathematical formulation for interactive user-LLM communication
$X$ | Input to the LLM, which can be text or other data
$P_T$ | Prompt template, a carefully crafted piece of text designed to guide the LLM to produce desired outputs
$P_A$ | Prompt answer, the output yielded by the LLM following the input $P_T$
$Y$ | Target output or desired result from the LLM
$g_{\omega_T}$ | Function representing the mapping from the input $X$ to $P_T$
$f_\theta$ | Function representing the mapping from the prompt $P_T$ to the answer $P_A$
$h_{\omega_A}$ | Function representing the mapping from the answer $P_A$ to $Y$
$I(X; Y)$ | Mutual information between two random variables $X$ and $Y$, used in the context of maximizing information flow in PES

[Figure not reproduced. Panel (a) shows the sender-receiver loop (encode, channel, noise, decode, feedback); panel (b) maps the user-LLM loop onto prompt template engineering ($X \rightarrow P_T$) and prompt answer engineering ($P_A \rightarrow Y$).]
Fig.1. Prompting methods from the communication theory perspective. (a) Classical interaction model of communication. (b) Different aspects of existing prompting methods.

Shannon-Weaver's pioneering work, first published in 1948, provides a strong mathematical foundation to analyze information flow between an active sender and a passive receiver. It is, however, over-simplistic in the sense that it does not take into account the complexities involved in interactive communication between active senders and receivers, who may respond by sending their own messages as a form of feedback. The interaction models of communication were first studied by Schramm and published in his 1954 book[34], which pictorially illustrates the feedback loop as depicted in Fig.1(a). Nevertheless, Schramm's model falls short of a rigorous theoretical and mathematical formulation to accommodate quantitative analysis, e.g., information gain or mutual information between senders and receivers.

Various prompting engineering methods for LLMs, in our view, can be understood from Schramm's model point of view (see Fig.1(b)). In the same vein as Shannon-Weaver's analysis, we therefore delineate a mathematical formulation of prompting engineering systems for interactive user-LLM communication as follows.

Definition 1 (PES). A prompt engineering system (PES) consists of a processing chain:

$X \xrightarrow{g_{\omega_T}} P_T \xrightarrow{f_\theta} P_A \xrightarrow{h_{\omega_A}} Y,$

where $g_{\omega_T}$ represents the mapping from the input $X$ to the prompt $P_T$, $f_\theta$ denotes the mapping from the prompt $P_T$ to the answer $P_A$, and $h_{\omega_A}$ denotes the mapping from the answer $P_A$ to the output $Y$ (see Fig.1(b) for an illustration).

Definition 2 (Goal of PES). PES aims to maximize the mutual information between the inputs $X$ and the outputs $Y$, i.e.,

$\max_{\omega_T, \omega_A} I(X, Y) = \max_{\omega_T, \omega_A} I(X, h_{\omega_A} \circ f_\theta \circ g_{\omega_T}(X)), \quad (1)$

where $f \circ g(x) = f(g(x))$.

It is worth noting that prompt engineering is consistently divided into two procedures: prompt template engineering and prompt answer engineering. Each procedure has specific goals similar to (1) that align with its intended purpose.

While the capacity in Definition 2 is well-known in information theory[35], how to reach the maximum of (1) for LLMs illustrated in Fig.1(b) remains an unexplored research direction. There exists a large variety of prompting engineering methods, which, in our view, essentially aim to reduce information misunderstanding between users and LLMs. In other words, they aim to reach the capacity of PES as defined. For instance, Sorensen et al.[36] demonstrated that selecting the prompt with the greater mutual information (MI) enhanced the model performance. This underscores the objective of PE techniques, which is to optimize the mutual information between prompts and answers (see practical examples in Appendix C of [36]). This connection between PES and the communication models has never been explicitly stated before.

Moreover, the existing work can be divided into three categories: prompt template engineering ($X \xrightarrow{g_{\omega_T}} P_T$), prompt answer engineering ($P_A \xrightarrow{h_{\omega_A}} Y$), and multi-prompt engineering and multi-turn prompt engineering, as shown in Fig.1(b). Specifically, the prompt template engineering aims to reduce the encoding error, or look for the prompt that is easily understood by the machine, while the prompt answering engineering aims to reduce the decoding error, or look for the prompt that can be easily understood by the human. The development of LLMs aims to enhance the capability of the receiver so that it could better handle users' information needs, and, most importantly, the multi-turn prompting and multi-prompt engineering aim to constantly reduce the information misunderstanding via multi-turn interactions.

• Prompt template engineering aims to optimize

$\max_{\omega_T} I(X, P_A) = \max_{\omega_T} I(X, f_\theta \circ g_{\omega_T}(X)), \quad (2)$

which looks for an additional piece of text, namely a prompt, to steer the LLMs to produce the desired outputs for downstream tasks. From the communication theory perspective, it acts as an "encoder" to bridge the gap between the users and the LLMs by encoding the messages in a way that the model can understand, and then elicit knowledge from LLMs (see details in Section 3). In the encoding process, the challenge lies in the accurate understanding of the user's intention by an LLM with limited instruction-following capability. Template engineering aims to reduce this mismatch by translating the user's request into a format that could be better understood by the LLM.

• Prompt answer engineering aims to optimize

$\max_{\omega_A} I(P_T, Y) = \max_{\omega_A} I(P_T, h_{\omega_A} \circ f_\theta(P_T)), \quad (3)$

which focuses on developing appropriate inputs for prompting methods. It has two goals: 1) to search for a prompt answer $P_A$, and 2) to look for a map to the target output $Y$ that will result in an accurate predictive model.

In the decoding process, LLM-generated output often carries redundant information in addition to the expected answer due to its unlimited output space. Answer engineering aims to confine the output space and extract the target answer. The field of prompt answer engineering is currently witnessing a notable development trend characterized by the pursuit of effective answer engineering such that ultimate outputs (i.e., $Y$) are well aligned with end users' expectations (see details in Section 4).

• To further reduce the information misunderstanding, the user could conduct multiple interactions according to (2) and (3), called multi-prompt/multi-turn PE. Multi-prompting methods aim to optimize

$\max_{\omega_{T_1}, \ldots, \omega_{T_M}} \sum_{i=1}^{M} I(X, f_\theta \circ g_{\omega_{T_i}}(X)),$

which mainly applies ensemble techniques[10] to mitigate the sensitivity of the LLM to different formulations and to obtain a more stable output. Later, as LLMs become more capable, multi-turn prompt methods focus on providing more context to the LLM by leveraging multiple communication procedures between the machine and the person[25, 26]. In the field of multi-prompting methods, researchers are endeavoring to develop adaptive strategies that enhance the LLM's ability in task planning and the utilization of tools. The adaptive and iterative nature of multi-prompting methods is well explained by the communication theory (see Section 5 for an elaborated explanation).
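To make the PES objective above concrete, the sketch below scores a small pool of candidate templates by an MI-style criterion in the spirit of Sorensen et al.[36]: for each template, the per-input answer distributions are compared against their average, and the template whose answers are most informative about the inputs (highest average KL divergence from the marginal) is kept. This is only a minimal illustration; the `answer_distribution` callable is a hypothetical stand-in for querying an LLM for a distribution over a small label set, not an interface from the cited work.

```python
import math

def mutual_information(template, inputs, answer_distribution, labels):
    """Estimate I(X; P_A) for one template as the average KL divergence
    between p(answer | input, template) and the marginal p(answer | template)."""
    per_input = [answer_distribution(template.format(x=x), labels) for x in inputs]
    marginal = [sum(d[i] for d in per_input) / len(per_input)
                for i in range(len(labels))]
    kl_terms = []
    for dist in per_input:
        kl = sum(p * math.log(p / q)
                 for p, q in zip(dist, marginal) if p > 0 and q > 0)
        kl_terms.append(kl)
    return sum(kl_terms) / len(kl_terms)

def select_template(templates, inputs, answer_distribution, labels):
    """Pick the candidate template with the highest estimated mutual information."""
    scored = [(mutual_information(t, inputs, answer_distribution, labels), t)
              for t in templates]
    return max(scored)[0:2][1]
```

In this sketch no labeled data is needed: the selection only compares how sharply each template separates the model's answers across inputs, which is what makes MI-based ranking attractive in zero-shot settings.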
3 Prompt Template Engineering

Given the information chain $X \rightarrow P_T \rightarrow P_A$, the answer $P_A$ is determined by the prompt-processed $P_T$ and the model $M$ with pre-trained weights $\theta$. Supposing that $\bar{P}_A$ is the targeted prediction, the key problem of prompt template engineering is to find a good prompt that maximizes the probability $p(\bar{P}_A \mid M, P_T, \theta)$ on diverse downstream tasks with limited data. To obtain the optimal prompt, current work[8–24] can be formulated into three categories: constructing $P_T$, ranking $P_T$, and tuning $P_T$.

3.1 Constructing the Prompt

The basic motivation of constructing $P_T$ is to transform the specific task to make it align with the pre-training objective (i.e., next-word prediction, masked LM) of the LM. Existing prompt construction methods[8–11, 15, 37–39] could be categorized into five different approaches as shown in Table 3, which are discussed in detail as follows.

3.1.1 Manually-Designed

Initially, the prompt templates are manually designed in natural language based on the user's experience, and they have been validated to be able to improve the performance of downstream tasks, especially in a zero-shot setting[1, 8]. The most frequent style is to reformulate the original task as a "fill-in-the-blank" cloze one[9, 10], and the answer is obtained by predicting the words in the given "[mask]" place. For example, as illustrated in Table 3, Petroni et al.[9] manually designed prompts to re-structure the relational knowledge, while studies like [10, 37] focused on solving text classification and language understanding tasks with several self-defined prompt patterns and proposed[10] a new training procedure named PET. Another line of work involves developing prefix prompts for generation tasks, which provide instructions and steer the LLMs to finish the sentence. For example, a summarization task can be handled by adding "TL;DR:"[8], and a translation task can be conducted with a prefix like "Translate English to Spanish:"[38]. Even though manually designed prompts show some effectiveness[39], they are also criticized for being time-consuming and unstable[15]. A subtle difference in the designed prompts may result in a substantial performance decrease. As such, how to explore the prompt space and construct prompts more thoroughly and more effectively becomes an important and challenging issue.
Table 3. Summary of Prompt Construction Methods
Method | Automated | Gradient-Free | Few-Shot | Zero-Shot | Stability | Interpretability
Manually-designed[8–10] | ✗ | ✓ | ✓ | ✓ | ✗ | ✓
Heuristic-based[11, 19, 40] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Paraphrasing-based[11, 14, 41] | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
Generation-based[17, 42] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Optimization-based[12, 22] | ✓ | ✗ | ✓ | ✗ | ✓ | ✗
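As a concrete illustration of the manually-designed style discussed above, the two template families (cloze templates and prefix prompts) can be written directly as format strings. The sketch below is only an assumed example; the exact wordings are illustrative rather than templates taken from the cited papers.

```python
# Cloze-style template: recast sentiment detection as filling a masked slot [Z].
cloze_template = "{x} In summary, it was a [Z] restaurant."

# Prefix-style prompts: instruct the model to continue the text.
summarization_prefix = "{x}\nTL;DR:"
translation_prefix = "Translate English to Spanish:\n{x}\n"

def build_prompt(template, x):
    """Instantiate a template with the user input x."""
    return template.format(x=x)

print(build_prompt(cloze_template, "Delicious dining options close to my current location."))
print(build_prompt(summarization_prefix, "Large language models enable a unified natural language interface ..."))
```

The fragility noted above becomes visible already at this level: small edits such as dropping "In summary" or changing "restaurant" to "place" can change which word the LM prefers for the [Z] slot.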

3.1.2 Heuristic-Based

The heuristic-based methods focus on finding prompts by some intuitive strategies. For example, to construct more flexible and diverse prompts for different examples (rather than fixed ones), Jiang et al.[11] proposed to use the most frequent middle words and the phrase spanning the shortest dependency path that appeared in the training data as a prompt. This method shows a large performance gain compared with the manually-designed prompts. Han et al.[19] tried to form task-specific prompts by combining simple human-picked sub-prompts according to some logic rules. Different from the above methods, Logan et al.[40] used an extremely simple uniform rule of null prompts, which only concatenates the inputs and the "[mask]" token, and it is able to gain a comparable accuracy with manually-defined prompts.

3.1.3 Paraphrasing-Based

The paraphrasing-based methods are widely used in data augmentation, aiming at generating augmented data that is semantically related to the original text, and this could be achieved in various ways using machine translation, model-based generation, and rule-based generation[43]. The paraphrasing-based methods could naturally be used to construct prompt candidates based on the original text, and we could further select the best one or integrate them to provide better performance. Representative studies include [11, 14, 41]. Specifically, Jiang et al.[11] used back-translation to enhance the lexical diversity while keeping the semantic meaning. Yuan et al.[41] manually created some seeds and found their synonyms to narrow down the search space. Haviv et al.[14] used a BERT-based model to act as a rewriter to obtain prompts that LLMs can understand better.

3.1.4 Generation-Based

The generation-based methods treat prompt searching as a generative task that can be carried out by some LMs. For example, Gao et al.[17] first leveraged the generative ability of T5[38] to fill in the placeholders as prompts, and then the prompts could be further improved by encoding domain-specific information[42].

3.1.5 Optimization-Based

To alleviate the weakness of insufficient exploration space faced by existing methods, the optimization-based methods try to generate prompts guided by some optimization signals. For example, Shin et al.[12] employed gradients as the signals, and then searched for discrete trigger words as prompts to enrich the candidate space. Deng et al.[22] generated the prompt using a reinforcement-learning approach that is directed by the reward function.

3.2 Ranking the Prompt

After obtaining multiple prompt candidates with the above-mentioned methods, the next step is to rank them to select the most effective one. Existing studies solve this problem by finding prompts that are close to the training samples to reduce the information mismatch between the pre-training and inference phases.

3.2.1 Execution Accuracy

Since the prompts are designed to accomplish specific downstream tasks, it is intuitive and straightforward to evaluate their performance by measuring the execution accuracy on those tasks[11, 17, 44].

3.2.2 Log Probability

The log probability criterion prefers the prompt that delivers the correct output with higher probability, rather than being forced to give the exact answer. For example, a prompt template that can work well for all training examples is given the maximum generated probability in [17]. Furthermore, language models can also be utilized to evaluate the quality of prompts. In [45], the prompt with the highest probability given by an LM is selected, which indicates that it is closer to the general expression that appears in the training dataset.

3.2.3 Others

Other criteria can be used to select the top one or the top-$k$ prompts. For example, Shin et al.[12] regarded the words that are estimated to have the largest performance improvement as the most crucial elements.
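The log-probability criterion above can be sketched with a small causal LM. The example below assumes a Hugging Face model (GPT-2 is used only as a placeholder): each filled-in candidate prompt is scored by the average log-likelihood the LM assigns to it, and the candidates are ranked from most to least likely. This is a minimal sketch rather than the exact procedure of [17] or [45].

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def log_probability(text):
    """Average token log-likelihood of `text` under the language model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()  # loss is the mean negative log-likelihood per token

def rank_prompts(candidates):
    """Return candidate prompts sorted from highest to lowest LM log probability."""
    return sorted(candidates, key=log_probability, reverse=True)

candidates = [
    "Delicious dining options close to my current location. In summary, it was a good restaurant.",
    "Delicious dining options close to my current location. A good place to eat.",
]
print(rank_prompts(candidates))
```

The execution-accuracy criterion would instead run each candidate on a labeled validation set and keep the one with the highest task accuracy, which is more reliable but considerably more expensive.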

3.3 Tuning the Prompt

Recent studies turn to optimizing the prompt as continuous embeddings to further improve the performance. The main idea is to learn a few continuous parameters, referred to as soft prompts, and these continuous parameters can be optionally initialized by the previously obtained discrete prompt. Li et al.[13] first introduced a continuous task-specific "prefix-tuning" for generative tasks. Studies like [20] and [15] adopted a similar strategy and proved its effectiveness in various natural language understanding tasks. Following the above-mentioned studies, many improvements have been conducted to find better prompts, such as better optimizing strategies[16], better vector initialization[21, 23], and indicative anchors[15]. Furthermore, studies like [13, 20, 46] further point out that prompt position, length, and initialization all affect the performance of continuous prompts (Table 4). In this subsection, we summarize these factors as follows.

• Different Positions. There are three different positions for an autoregressive LM that the prompt can be inserted into, that is, the prefix $[PREFIX; X_T; Y]$, the infix $[X_T; INFIX; Y]$, and the hybrid one $[PREFIX; X_T; INFIX; Y]$. There is no significant performance difference between those positions. Li et al.[13] showed that the prefix prompt slightly outperforms the infix prompt, and the hybrid one is much more flexible than the others.

• Different Lengths. There is no optimal length for all tasks, but there is always a threshold. The performance will increase before reaching the threshold, and then it will either plateau or slightly decrease.

• Different Initializations. A proper initialization is essential for the performance of the prompts, and the performance of random initialization is usually unsatisfactory. Typical methods include initialization by sampling real words[13, 20], using class labels[20], using discrete prompts[16], and using pre-trained vectors[21, 23]. Furthermore, the manually designed prompts serve as a good starting point for the following search process.

Besides the above-mentioned methods, PE methods have also been used for tuning and constructing the LLMs. Typical methods in this area include BitFit[47], Partial-k tuning, MLP-k tuning, side-tuning[48], adapter tuning[49], Ladder Side-Tuning[50], and the essential Prompt Tuning[13]. These methods aim to achieve a performance comparable with fine-tuning the whole network by only tuning some parts of the parameters of LLMs.
Table 4. Summary of Prompt Tuning Methods
Work | Position | Length | Initialization
Prefix tuning[13] | Prefix, infix | 200 (summarization), 10 (table-to-text) | Random, real words
Prompt tuning[20] | Prefix | 1, 5, 20, 100, 150 | Random, sampled vocabulary, class label
P-tuning[15] | Hybrid | 3 (prefix), 3 (infix) | LSTM-trained
DART[18] | Infix | 3 | Unused token in vocabulary
OPTIPROMPT[16] | Infix | 5, 10 | Manual prompt
Dynamic[46] | Hybrid, dynamic | Dynamic | Sampled vocabulary
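A minimal soft-prompt sketch in PyTorch, assuming a frozen Hugging Face causal LM: a small matrix of trainable prompt vectors is prepended to the input embeddings (a prefix position), and only those vectors receive gradients. The wrapper name, the prompt length, and the initialization from sampled vocabulary embeddings are illustrative choices, not a reference implementation of any cited method.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class SoftPromptWrapper(nn.Module):
    def __init__(self, model_name="gpt2", prompt_length=20):
        super().__init__()
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        for p in self.model.parameters():          # freeze the backbone LM
            p.requires_grad = False
        emb = self.model.get_input_embeddings().weight
        # Initialize the soft prompt from randomly sampled vocabulary embeddings.
        idx = torch.randint(0, emb.size(0), (prompt_length,))
        self.soft_prompt = nn.Parameter(emb[idx].clone())

    def forward(self, input_ids, labels=None):
        tok_emb = self.model.get_input_embeddings()(input_ids)
        batch = input_ids.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)   # [PREFIX; X_T]
        if labels is not None:
            ignore = torch.full((batch, prompt.size(1)), -100, dtype=labels.dtype)
            labels = torch.cat([ignore, labels], dim=1)       # no loss on prompt slots
        return self.model(inputs_embeds=inputs_embeds, labels=labels)
```

Training then optimizes only `soft_prompt` with a standard optimizer while the LLM weights stay fixed, which is what makes the approach attractive when one backbone serves many tasks.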
3.4 Trends for Prompt Template Engineering

There are two trends in prompt template engineering.
• Increased reliance on automated methods over manual design when constructing prompts, reducing the need for human involvement.
• Development of optimization-based techniques. The gradient-based searching method shows better performance than the derivative-free one in hard prompt construction, while the soft prompts appear more promising than hard prompts.

From the communication theory perspective, the development history of prompt template engineering reflects the trend of utilizing prompts with stronger expressive ability to better capture the user's intent.

4 Prompt Answering Engineering

As depicted in Fig.1(b), prompt answer engineering (PAE) aims to align LLMs' outputs with the intended purpose. The use of PAE is motivated by the need to mitigate the gap between the capabilities of pre-trained LLMs and the large variety of requirements of different downstream tasks (see more discussion in Section 2). Technology-wise, PAE involves a set of methods that control the admissible answer space and the optimization mechanisms of LLMs' output (see the overview in Table 5).

Table 5. Summary for Prompt Answer Engineering Methods
Answer Mapping Method | Answer Space Type | Work | Task Type
Optimizing the mapping | Discrete answer space | [10, 12, 17, 51] | Classification & regression
Optimizing the mapping | Continuous answer space | [52] | Classification
Broadening the output | Discrete answer space | [11] | Generation
Decomposing the output | Discrete answer space | [53] | Classification
Manually mapping | Pre-defined answer | [9, 54, 55] | Generation

4.1 Search for an Answer Space

4.1.1 Pre-Defined Answer Space

This involves a set of pre-defined answers for the question-answering task, e.g., pre-defined emotions ("happiness", "surprise", "shame", "anger", etc.) for the sentiment classification task. The model can then be trained to select the best answer from this pre-defined space. As an illustration, the answer space $P_A$ can be defined as the set of all tokens[9], fixed-length spans[56], or token sequences[8]. Furthermore, in certain tasks like text classification, question answering, or entity recognition, answers are crafted manually as word lists that pertain to relevant topics[7, 54, 55].

4.1.2 Discrete Answer Space

The discrete answer space refers to a set of specific and distinct answer options that a language model can choose from when generating a response to a given prompt. Specifically, the possible answers are limited to a fixed set of choices, such as a small number of named entities or keyphrases (e.g., the total number of planets in the solar system is eight). The model can then be trained to identify whether the correct answer is among this set of possibilities[10–12].

4.1.3 Continuous Answer Space

The continuous answer space refers to a scenario where the possible answers or responses are not restricted to a predefined set of discrete options. Instead, the answers can take on a range of continuous values, or be any text, number, or value within a broader, unbounded spectrum[52, 57]. The model can then be trained to predict a point in the continuous space that corresponds to the correct answer.

4.1.4 Hybrid Approach

This involves combining multiple methods to design the answer space, such as using a pre-defined list of entities for certain types of questions, but allowing for free-form text answers for other types of questions[58].

Remark 1. Answer shapes summarized as follows are also needed in prompt answer engineering. In practice, the choice of the answer shape depends on the desired outcome of the task.
• Tokens: individual tokens within the vocabulary of a pre-trained language model, or a subset of the vocabulary.
• Span: short sequences of multiple tokens, often comprising a phrase or segment of text.
• Sentence: a longer segment of text that can encompass one or more complete sentences.

4.2 Search for an Answer Mapping

There are several strategies to search for an answer mapping.

4.2.1 Manually Mapping

In many cases, the mapping from the potential answer space $P_A$ to the output $Y$ is obvious, such that this mapping can be done manually. For instance, the answer is the output itself for the translation task[9], such that the mapping is the identity mapping. Additionally, Yin et al.[54] designed related topics ("health", "food", "finance", "sports", etc.), situations ("shelter", "water", "medical assistance", etc.), or other possible labels. Cui et al.[55] manually proposed some entity tags, e.g., "organization", "person", and "location", for the named entity recognition problem.

4.2.2 Broadening the Answer $P_A$

Broadening $P_A$ ($P_A' = B(P_A)$) means expanding the answer space to obtain a more accurate mapping. Jiang et al.[11] proposed a method to paraphrase the answer space $P_A$ by transferring the original prompt into other similar expressions. In their approach, they employed a back-translation technique by first translating prompts into another language and then translating them back, resulting in a set of diverse paraphrased answers.
sign the answer space, such as using a pre-defined list ployed a back-translation technique by first translat-
of entities for certain types of questions, but allowing ing prompts into another language and then translat-
for free-form text answers for other types of ing them back, resulting in a set of diverse para-
questions[58]. phrased answers. The probability of the final output


The probability of the final output can be expressed as $P(Y|x) = \sum_{y \in B(P_A)} P(y|x)$, where $B(P_A)$ represents the set of possible paraphrased answers.

4.2.3 Decomposing the Output

Decomposing $Y$ ($D(Y)$) aims to expand the information of $Y$, which makes it easier to look for a mapping $g_\theta$. For example, Chen et al.[53] decomposed the labels into several words and regarded them as the answer. Concretely, they decomposed the label/output "per:city_of_death" into three separated words {person, city, death}. The probability of the final output can be written as $P(y|x) = \sum_{y \in D(Y)} P(y|x)$.

4.2.4 Optimizing the Mapping

There exist two approaches to optimizing the mapping function. The first approach is to generate a pruned space $\tilde{P}_A$ and search for a set of answers within this pruned space. Schick et al.[10, 51] introduced a technique for generating a mapping from each label to a singular token that represents its semantic meaning. This mapping, referred to as a verbalizer $v$, is designed to identify sets of answers. Their approach involves estimating a verbalizer $v$ by maximizing the likelihood w.r.t. the training data conditioned on the verbalizer $v$. Shin et al.[12] proposed an alternative approach for selecting the answer tokens. They employed logistic classifiers to identify the top-$k$ tokens that yield the highest probability score, which together form the selected answer. In addition, Gao et al.[17] constructed a pruned set $\tilde{P}_A^c$ containing the top-$k$ vocabulary words based on their conditional likelihood for each class $c$. As for the second approach, it investigates the potential of utilizing soft answer tokens that can be optimized through gradient descent. Hambardzumyan et al.[52] allocated a virtual token to represent each class label and optimized the token embedding for each class along with the prompt token embedding using gradient descent.
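The simplest discrete answer mapping can be sketched as a hand-written verbalizer: label words that the LLM may emit in the [Z] slot are mapped to task labels, and several answer words per class are merged by summing their probabilities. The label words and the `token_probabilities` dictionary below are assumptions for illustration only, not the verbalizers of the cited papers.

```python
# Hypothetical verbalizer: answer tokens produced by the LLM -> task labels.
VERBALIZER = {
    "positive": ["good", "great", "fantastic"],
    "negative": ["bad", "terrible", "awful"],
}

def map_answer(token_probabilities):
    """token_probabilities: dict mapping candidate answer words to LLM probabilities.
    Returns the class whose answer words receive the largest total probability."""
    scores = {
        label: sum(token_probabilities.get(w, 0.0) for w in words)
        for label, words in VERBALIZER.items()
    }
    return max(scores, key=scores.get)

# Example: the LLM filled the [Z] slot mostly with "good".
print(map_answer({"good": 0.62, "okay": 0.20, "bad": 0.05}))  # -> "positive"
```

The optimization-based methods above replace this hand-written table by a learned mapping, either by searching the vocabulary for the best answer tokens or by training soft answer embeddings jointly with the prompt.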
answer tokens that can be optimized through gradi- gregated to produce a more stable outcome. This type








[Figure not reproduced.]
Fig.2. Overview of multiple prompting methods. (a) Multi-prompt methods utilize several similar prompts to produce a more stable result. (b) Multi-turn prompt methods produce the final result by aggregating responses from a sequence of prompts.

5.1 Multi-Prompt Engineering Methods

Multi-prompt methods employ multiple prompts with similar patterns during the inference, aiming to enhance information preservation. This line of methods is closely associated with ensembling techniques[91–93]. Although the primary motivation is to exploit the complementary advantages of different prompts and reduce the expenses associated with PE, it can also be integrated with prompt-engineering techniques to further improve efficacy. From a communication theory perspective, multi-prompt engineering can be considered as sending multiple copies of the message to ensure the authentic delivery of data (see Fig.3(a)).

5.1.1 Expanding the Prompt

Expanding the prompt $P_T$ aims to cover a larger semantic space around the sender's true intention, and a more stable approximation of the target output, $X_A$, can be obtained by aggregating the responses.

Jiang et al.[11], Lester et al.[20], and Hambardzumyan et al.[52] proposed to combine the outputs of different prompts to get the final result for classification tasks. Qin et al.[61] incorporated multi-prompt ideas with soft prompts and optimized the weights of each prompt together with the prompt parameters. Yuan et al.[41] proposed to use the text generation probability as the score for text generation evaluation, and aggregated multiple results of different prompts as the final score.

Table 6. Summary of PE Methods Involving Multiple Prompts
Type | Method | Language Understanding | Language Generation | Reasoning
Multi-prompt | Expanding PT | [11, 20, 52, 61] | [41] | –
Multi-prompt | Diversifying PA | – | – | [62–67]
Multi-prompt | Optimizing θ | [10, 37] | [17, 68] | –
Multi-turn prompt | Decomposing PT | – | [59, 60, 69] | [70–77]
Multi-turn prompt | Refining PT | – | [78, 79] | [25, 60, 79–82]
Multi-turn prompt | Augmenting PT | – | [83, 84] | [85–87]
Multi-turn prompt | Optimizing θ | – | [59, 69] | [87–90]
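The ensemble idea behind expanding $P_T$ can be sketched as follows: the same input is wrapped in several templates, each answer is mapped to a label, and the per-template votes are aggregated. The template wordings and the `ask_llm` and `map_answer` callables are placeholders, not interfaces from the cited works.

```python
from collections import Counter

TEMPLATES = [
    "{x} In summary, it was a [Z] place.",
    "{x} All in all, a [Z] place to eat.",
    "{x} In general, it was [Z].",
]

def multi_prompt_predict(x, ask_llm, map_answer):
    """ask_llm(prompt) -> answer string; map_answer(answer) -> label.
    Aggregates the per-template labels by majority vote."""
    votes = Counter()
    for template in TEMPLATES:
        answer = ask_llm(template.format(x=x))
        votes[map_answer(answer)] += 1
    label, _ = votes.most_common(1)[0]
    return label
```

Weighted aggregation, as in the soft-prompt ensembles mentioned above, would simply replace the unit vote with a learned or probability-based weight per template.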

[Figure not reproduced.]
Fig.3. Schematic illustrations of multi-prompting methods. (a) Multi-prompt methods utilize several similar prompts to produce a more stable result. (b) Multi-turn prompt methods mainly leverage LLMs or external tools to provide clearer and more helpful context.

5.1.2 Diversifying the Answer

Different from expanding the prompt $P_T$, whose main goal is to leverage the input space around $P_T$, diversifying the answer $P_A$ aims to exploit the various "thinking paths" of the LLM through sampling its decoder. This is especially effective for handling complex tasks, such as mathematical and reasoning problems.

Wang et al.[62] proposed a self-consistency method based on the Chain-of-Thoughts (CoT), which samples multiple reasoning paths and selects the most consistent answer by majority voting or weighted averaging. Lewkowycz et al.[63] applied a similar idea to quantitative problems by combining multiple prompts and output sampling. Wang et al.[64] investigated various ensemble variants in reasoning problems and found that rationale sampling in the output space is more efficient. These methods solely use the final answer as the selection criterion and do not exploit the generated rationales from various sampling paths. To take advantage of these intermediate results, Li et al.[65] proposed to generate more diversified reasoning paths with multiple prompts and used a model-based verifier to select and rank these reasoning paths. Fu et al.[66] introduced a complexity-based metric to evaluate reasoning paths and prioritize those with higher complexity in the aggregation. Weng et al.[94] employed the LLM to self-verify various reasonings by comparing conditions predicted from the generated reasonings to the original conditions; the consistency score is then used to select the final result. Yao et al.[95] proposed the "Tree of Thoughts" to explore the intermediate steps across various reasoning paths, and used the LLM to evaluate the quality of each possible path. Besta et al.[67] further proposed the "Graph of Thoughts" to treat the various reasoning paths as graphs so that the essence of the thought networks can be extracted.
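The sampling-and-voting procedure of self-consistency[62] can be sketched as below: several reasoning paths are sampled from the same CoT prompt at a non-zero temperature, a final answer is extracted from each path, and the most frequent answer wins. The `sample_llm` and `extract_answer` helpers and the prompt wording are assumptions, not part of any published codebase.

```python
from collections import Counter

def self_consistency(question, sample_llm, extract_answer, n_samples=10):
    """sample_llm(prompt, temperature) -> one sampled completion (a reasoning path).
    extract_answer(completion) -> the final answer string parsed from that path."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        path = sample_llm(prompt, temperature=0.7)   # diverse "thinking paths"
        answers.append(extract_answer(path))
    answer, _ = Counter(answers).most_common(1)[0]   # majority vote
    return answer
```

The verifier- and complexity-based variants discussed above keep the same sampling loop but replace the plain majority vote with a learned or heuristic score over the sampled rationales.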
5.1.3 Optimizing the Model

This line of work treats multiple prompts as a label generator to address the sample deficiency problem. Schick et al.[10] first proposed pattern-exploiting training (PET), which employs a knowledge distillation strategy to aggregate results from multiple prompt-verbalizer pairs (PVPs). They first utilized PVPs to train separate models that generate pseudo-labels for unlabeled datasets. This extended dataset was then used to train the final classification model. Schick et al.[96] extended this idea to the text generation task by using the generation probability of the decoded text as the score. Gao et al.[17] used a similar method for automatic template generation. Schick et al.[37] further expanded PET with multiple verbalizers, which was achieved by introducing a sample-dependent output space.

5.2 Multi-Turn Prompt Engineering Methods

Multi-turn prompt engineering methods involve decomposing the full prompting task into several sub-tasks, each addressed by a corresponding prompt. This process typically entails a sequence of encoding and decoding operations, where subsequent prompts may depend on the decoded message from previous prompts, or each prompt is responsible for a sub-task. The outcome can be obtained either from the result of the last prompt or by aggregating the responses generated by all prompts. This strategy is designed to tackle challenging tasks, such as complex mathematical questions or reasoning tasks. It mainly involves two components: 1) decomposing the prompt $P_T$ into sub-tasks to reduce the difficulty of each sub-task; and 2) modifying the prompt $P_T$ to generate better intermediate results for later steps. These two components can help to bridge the gap between complex $X$ and $Y$ (see Fig.3(b)).

5.2.1 Decomposing the Prompt

Decomposing the prompt $P_T$ is the first step in handling complex tasks, and a proper decomposition requires a good understanding of both the target task and the user's intention. Yang et al.[97] decomposed SQL operations using fine-tuned few-shot models and untrained zero-shot models combined with predefined rules. However, rule-based decomposition heavily relies on human experience, and thus it is desirable to automate this step with LLMs.

Min et al.[59] proposed an unsupervised method that utilizes a similarity-based pseudo-decomposition set as a target to train a seq2seq model as a question generator. The decomposed simple questions are then answered by an off-the-shelf single-hop QA model. Perez et al.[69] treated the decomposition in a multi-hop reading comprehension (RC) task as a span prediction problem, which only needs a few hundred samples. For each task, various decomposition paths are generated, with each sub-question answered by a single-hop RC model. Finally, a scorer model is used to select the top-scoring answer based on the solving path. Khot et al.[60] proposed a text modular network leveraging existing models to build a next-question generator. The training samples are obtained from sub-task models conditioned on distant supervision hints.

With the emergent general ability of LLMs, instead of training a task-specific decomposition model, LLMs are used to fulfill decomposition tasks. Zhou et al.[70] proposed the least-to-most prompting method, where hard tasks are first reduced to less difficult sub-tasks by LLMs. Then answers from previous sub-problems are combined with the original task to facilitate subsequent question solving. Dua et al.[71] employed a similar idea and appended both questions and answers from the previous stage to the subsequent prompt. Creswell et al.[72] proposed a selection-inference framework, which uses an LLM to alternately execute selecting relevant information from a given context and inferring new facts based on the selected information. Arora et al.[73] proposed to format the intermediate steps as open-ended question-answering tasks using LLMs; it further generates a set of prompt chains and uses weak supervision to aggregate the results. Khot et al.[74] proposed a modular approach for task decomposition with LLMs by using specialized decomposition prompts. Drozdov et al.[98] introduced a dynamic least-to-most prompting method for semantic parsing tasks by utilizing multiple prompts to build a more flexible tree-based decomposition. Ye et al.[75] used LLMs as the decomposer for table-based reasoning tasks; LLMs are used for both sub-table extraction and question decomposition. Press et al.[99] proposed Self-Ask, which decomposes the original task by repeatedly asking the LLM whether follow-up questions are needed. Wu et al.[76] proposed to build an interactive chaining framework with several primitive operations of LLMs to provide better transparency and controllability of using LLMs. Wang et al.[77] proposed a Plan-and-Solve (PS) method that explicitly prompts the LLM to devise a plan before solving the problem, to address the missing-steps error in the reasoning.
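A minimal sketch of LLM-driven decomposition in the least-to-most style[70]: the model is first asked to break the question into simpler sub-questions, and each sub-question is then answered with the previous question-answer pairs appended to the context. The prompt wordings and the `ask_llm` callable are illustrative assumptions, not the prompts used in the cited work.

```python
def least_to_most(question, ask_llm):
    """ask_llm(prompt) -> completion string."""
    # Stage 1: ask the LLM to decompose the problem into simpler sub-questions.
    plan = ask_llm(
        "Break the following problem into a short list of simpler sub-questions, "
        f"one per line:\n{question}"
    )
    sub_questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # Stage 2: solve the sub-questions sequentially, feeding earlier answers back in.
    context = question
    answer = ""
    for sub_q in sub_questions:
        answer = ask_llm(f"{context}\nQ: {sub_q}\nA:")
        context += f"\nQ: {sub_q}\nA: {answer}"
    return answer   # the answer to the last sub-question addresses the original task
```

The granularity of the decomposition is the main design choice here: too coarse and the sub-tasks remain hard, too fine and errors accumulate over many turns.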
5.2.2 Refining the Prompt

Refining the prompt $P_T$ aims to construct a better representation of $P_T$ based on the feedback from previous prompting results. This is especially important for multi-step reasoning, where the quality of the generated intermediate reasonings has a critical impact on the final answer.

Following the success of the few-shot chain-of-thoughts (CoT) prompting method, Kojima et al.[25] proposed a zero-shot CoT method that utilizes the fixed prompt "Let's think step by step" to generate reasonings. These intermediate results are then fused with the original question to get the final answer. To select more effective exemplars, various methods were proposed. Li et al.[78] used LLMs to first generate a pseudo-QA pool, and then a clustering method combined with similarity to the question was adopted to dynamically select QA pairs from the generated QA pool as demonstration exemplars. Shum et al.[80] leveraged a high-quality exemplar pool to obtain an exemplar distribution using a variance-reduced policy gradient estimator. Ye et al.[79] employed a self-consistency method[62] to generate pseudo-labels for an unlabeled dataset. The accuracy of these silver labels serves as the selection criterion of exemplars. To further reduce the search complexity of various combinations, additional surrogate metrics were introduced to estimate the accuracy. Diao et al.[81] addressed this problem by using hard questions with human annotations as exemplars. The hardness is measured by the disagreement of results obtained by multiple samplings of the LLM. Zhang et al.[82] proposed automatic CoT methods. They introduced question clustering and demonstration sampling steps to automatically select the best demonstrations for the CoT template.
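The zero-shot CoT procedure of Kojima et al.[25] is essentially a two-stage prompt: the fixed trigger first elicits a reasoning chain, and a second prompt fuses that chain with the question to extract the final answer. Only the trigger sentence comes from the cited work; the extraction wording and the `ask_llm` callable below are assumptions.

```python
def zero_shot_cot(question, ask_llm):
    """ask_llm(prompt) -> completion string."""
    # Stage 1: elicit intermediate reasoning with the fixed trigger.
    reasoning = ask_llm(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: fuse the reasoning with the original question to extract the answer.
    answer = ask_llm(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )
    return answer.strip()
```

The exemplar-selection methods surveyed above keep this two-stage structure but prepend carefully chosen demonstrations to the first prompt instead of relying on the trigger alone.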

5.2.3 Augmenting the Prompt

Different from refining the prompt $P_T$, which mainly focuses on finding prompts that generate better intermediate results, augmenting the prompt $P_T$ leverages the exploitation of external information, knowledge, tools, etc. in the prompting. We present some examples in this field below, and for more details we refer the reader to the specific survey[100]. Yang et al.[83] proposed a recursive reprompting and revision (3R) framework for long story generation leveraging pre-defined outlines. In each step, the context of the current status and the outline of the story are provided to the prompt to ensure better content coherence. Yang et al.[84] proposed to use more detailed outlines so that the story generation LLM can focus more on linguistic aspects. Information retrieved from other sources is also often used to augment $P_T$. Yao et al.[101] gave the LLM access to information from Wikipedia. Thoppilan et al.[102] taught the LLM to use search engines for knowledge retrieval. More broadly, Paranjape et al.[26] introduced a task library to enable the LLM to use external tools. Schick et al.[85] trained the LLM to use various external tools via APIs. Shen et al.[86] utilized an LLM as a central controller to coordinate other models to solve tasks.
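A toy sketch of augmenting the prompt with an external source: retrieved snippets (here a stubbed word-overlap lookup) are inserted into the context before the question, so the LLM answers with evidence it does not have to memorize. The retrieval function and the prompt format are assumptions for illustration, not the interface of any cited system.

```python
def retrieve_snippets(query, corpus, k=3):
    """Toy retriever: rank corpus passages by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def augmented_prompt(question, corpus):
    """Build a prompt whose context is filled with retrieved evidence."""
    snippets = retrieve_snippets(question, corpus)
    context = "\n".join(f"- {s}" for s in snippets)
    return (f"Use the following retrieved information to answer the question.\n"
            f"{context}\nQuestion: {question}\nAnswer:")
```

Tool-calling systems follow the same pattern with the retriever replaced by a search engine, a calculator, or another model reached through an API.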
5.2.4 Optimizing the Model

General LMs (language models) are not optimized for producing intermediate rationales or decomposing a complex task or question. Before the era of LLMs, these tasks required specifically trained LMs. Min et al.[59, 69] trained an LM for decomposing the original task into sub-tasks. Nye et al.[88] trained the LLM to produce intermediate steps stored in a scratch pad for later usage. Zelikman et al.[89] utilized the intermediate outputs that lead to the correct answer as the target to fine-tune the LLM. Wang et al.[87] proposed an iterative prompting framework using a context-aware prompter. The prompter consists of a set of soft prompts that are prepared for the encoder and the decoder of the LLMs, respectively. Taylor et al.[90] employed step-by-step solutions of scientific papers in the training corpus, which enables the LM to output reasoning steps if required.

5.3 Trends for Multiple Prompting Methods

Ensemble-based methods are easy to implement and flexible to incorporate with various strategies, e.g., expanding the input space and aggregating the output space. However, this brings limited advantages for complex problems whose final answers are hard to obtain directly but rely heavily on the intermediate thinking steps. Therefore, multi-turn PE methods emerged. A multi-turn method essentially adjusts its input dynamically during the interaction based on knowledge and feedback from the LLM or external tools. In this way, LLMs can leverage more context and better understand the true intention of the user. Initially, specialized LLMs were trained to handle planning and solving specific subtasks, which not only introduces extra training effort but also constrains the generalization capability of the LLM. With the increasing understanding ability and larger input length of LLMs, in-context learning becomes the preferred paradigm, which utilizes the embedded knowledge and capability of LLMs to handle various tasks via prompting. This paradigm soon dominated because of its efficiency and flexibility.

There are two trends in multiple prompting engineering.
• Developing an enhanced adaptive prompting strategy for LLM-based task decomposition is imperative. The extensive range and intricacy of tasks render human-based or rule-based task decomposition infeasible. While some studies have explored the use of LLM prompting to generate intermediate questions or actions for specific tasks, a comprehensive strategy is currently lacking.
• Enabling LLMs to leverage tools without the need for fine-tuning is a crucial objective. By incorporating external tools, LLMs can address their limitations in specialized domains or capabilities. Previous studies[85] have employed fine-tuning based approaches to train LLMs in utilizing web search or other tools accessible through APIs.

From the communication theory perspective, multiple prompting methods evolved from the extension in the spatial domain (ensemble-based methods) into the temporal domain (multi-turn), to better align the user's intention and the LLM's capability by decomposing the user's request and leveraging external tools.

6 Discussion

Researchers have proposed several surveys to recapitulate the rapid advancements in the field of PE methods[7, 103–107]. To name a few, Liu et al. proposed a comprehensive survey about existing PE methods, which covers common aspects like template engineering, answering engineering, training strategies, applications, and challenges[7]. They revealed the development history of prompt learning and described a set of mathematical notations that could summarize most of the existing studies. Furthermore, they considered prompt-based learning as a new paradigm that revolves around the way we look at NLP. In another survey[103] that mainly focuses on the reasoning abilities (e.g., arithmetic, commonsense, symbolic reasoning, logical, and multi-modal) of LLMs, Qiao et al. summarized the studies that harness these reasoning abilities via advanced PE methods like chain-of-thought and generated knowledge prompts. Additionally, some focused surveys cover specific topics like parameter-efficient fine-tuning (PEFT) of LLMs using
Yuan-Feng Song et al.: Communication Theory Perspective on Prompting Engineering Methods for LLMs 997

PE methods[104]. Different from the above-mentioned termarking techniques[114], is essential to prevent theft
studies, we try to interpret existing PE methods from and ensure the rightful ownership of LLMs.
a communication theory perspective. • Interactive and Multi-Turn Prompting. Besides
Following this line of research, we also would like the automation in prompting methods, humans in the
to discuss some potential challenges and future direc- loop can bring more controllability, transparency, and
tions for PE methods, which could be divided into explainability over the process, producing more reli-
three categories including finding the optimal able results. The success of the chain-of-thoughts
prompts, privacy issue and concern, and interactive methodology[115] demonstrates the “thinking path”
and multi-turn prompting. can enhance LLMs' reasoning capability. This proper-
• Finding the Optimal Prompts. One of the weak points of discrete prompts is that it is difficult to design and choose an optimal prompt, which causes instability. Although soft prompts partly address this problem, the discrete prompt is still very important because it has good interpretability and has been proven to be able to help soft prompts search effectively. Looking through the existing methods, we can find that accuracy-based criteria are resource-consuming, while the LM-based log probability is not sufficient to evaluate the prompt. Therefore, a well-designed ranking criterion combined with a large pool of automatically generated prompts may be a good direction for the future. Furthermore, even though the prompt has been proven effective in many tasks such as classification and text generation, most of the existing work has to design a specific prompt for a given task, which makes it complex and complicated[108]. Thus, how to generate a task-agnostic prompt or transfer the prompt to other fields quickly may be a challenging problem. Discrete (meta-learning[109]) and continuous (decomposition[110]) prompts are applied to tackle this issue. However, they are not well-optimized and cannot serve unseen tasks.
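To illustrate the two families of ranking criteria discussed above, the following sketch (our own simplified illustration) scores a pool of automatically generated candidate templates either by the LM log probability of the gold answer (cheap but noisy) or by accuracy on a small labeled development set (more reliable but resource-consuming). The label_logprob and predict callables are assumed model interfaces, and each template is assumed to contain a {} placeholder for the input.

```python
from typing import Callable, List, Sequence, Tuple

def score_by_logprob(template: str,
                     dev_set: Sequence[Tuple[str, str]],
                     label_logprob: Callable[[str, str], float]) -> float:
    """Cheap criterion: average log p_LM(gold answer | filled-in prompt)."""
    return sum(label_logprob(template.format(x), y) for x, y in dev_set) / len(dev_set)

def score_by_accuracy(template: str,
                      dev_set: Sequence[Tuple[str, str]],
                      predict: Callable[[str], str]) -> float:
    """Expensive criterion: run the model and compare predictions with gold labels."""
    hits = sum(predict(template.format(x)) == y for x, y in dev_set)
    return hits / len(dev_set)

def rank_prompts(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Sort automatically generated candidate prompts by the chosen criterion."""
    return sorted(candidates, key=score, reverse=True)

# Example usage (hypothetical interfaces):
# ranked = rank_prompts(candidates,
#                       lambda t: score_by_logprob(t, dev_set, label_logprob))
```

A practical criterion would likely combine both signals, for example using the log-probability score to pre-filter a large candidate pool before running the more expensive accuracy evaluation on the survivors.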
• Privacy Issues and Concerns. There are two aspects of privacy and security issues in LLMs. First, users' data may be leaked during the training and inference of LLMs. For instance, training LLMs requires vast amounts of data, including personal information, private conversations, or copyrighted material. By providing a series of queries or prompts, an attacker might be able to extract personal details or confidential information from the model's responses. Privacy-preserving methods, including techniques such as differential privacy, homomorphic encryption, and federated learning[111–113], may preserve the privacy of the data used for training and inference. Second, LLMs may be stolen and misused by attackers. LLMs are highly valuable assets, requiring substantial time and financial investment for their training. Protecting them from unauthorized access and misuse is crucial. Developing robust security measures, such as watermarking techniques[114], is essential to prevent theft and ensure the rightful ownership of LLMs.
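As a rough illustration of the differential-privacy direction (in the spirit of DP-SGD[111]; a schematic sketch only, omitting privacy accounting and the machinery for obtaining per-example gradients), the update below clips each example's gradient and adds calibrated Gaussian noise before applying it to a parameter tensor.

```python
import torch

def dp_sgd_update(param: torch.Tensor, per_example_grads: torch.Tensor,
                  clip_norm: float = 1.0, noise_multiplier: float = 1.0,
                  lr: float = 0.01) -> None:
    """Schematic DP-SGD step for one parameter tensor.

    per_example_grads has shape (batch, *param.shape): one gradient per example.
    Each gradient is clipped to `clip_norm`, the clipped gradients are averaged,
    and Gaussian noise scaled by noise_multiplier * clip_norm / batch is added.
    """
    batch = per_example_grads.shape[0]
    flat = per_example_grads.reshape(batch, -1)
    norms = flat.norm(dim=1, keepdim=True).clamp(min=1e-12)
    scale = (clip_norm / norms).clamp(max=1.0)        # per-example clipping factor
    clipped_mean = (flat * scale).mean(dim=0)
    noise = torch.randn_like(clipped_mean) * noise_multiplier * clip_norm / batch
    param.data -= lr * (clipped_mean + noise).reshape(param.shape)
```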
• Interactive and Multi-Turn Prompting. Besides the automation in prompting methods, humans in the loop can bring more controllability, transparency, and explainability over the process, producing more reliable results. The success of the chain-of-thought methodology[115] demonstrates that the “thinking path” can enhance LLMs' reasoning capability. This property can also be exploited to generate step-by-step task-solving procedures, like scratch paper in exams, so that the final answer can be better justified. Following this idea, Wu et al.[76] built an interactive framework involving human interaction for better controllability of the process. However, frequent human intervention will diminish the efficiency gained by using LLMs. Therefore, in addition to the granularity of decomposed tasks, it is also required to determine when to involve human feedback. This could be designed manually for each task, but it would be much more efficient if LLMs could plan these stages by themselves.
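One simple way to decide when to involve human feedback, sketched below under our own assumptions (the ask_llm and ask_human interfaces and the confidence threshold are hypothetical), is to escalate a decomposed sub-task to a human reviewer only when the LLM's self-reported confidence is low.

```python
from typing import Callable, List, Tuple

def solve_with_human_in_the_loop(subtasks: List[str],
                                 ask_llm: Callable[[str], Tuple[str, float]],
                                 ask_human: Callable[[str, str], str],
                                 threshold: float = 0.7) -> List[str]:
    """Answer each sub-task with the LLM; escalate low-confidence steps to a human."""
    results: List[str] = []
    for task in subtasks:
        draft, confidence = ask_llm(task)        # (answer, confidence in [0, 1])
        if confidence < threshold:
            draft = ask_human(task, draft)       # human corrects or confirms the draft
        results.append(draft)
    return results
```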
• Bias and Fairness. LLMs often tend to internalize biases present in their training datasets, making the mitigation of such biases and the pursuit of fairness a key aspect of existing PE methods. For example, Zhao et al.[116] revealed that factors such as the structure of prompts, the demonstrations contained in the prompt, and even the order of these demonstrations can lead to diverse performance of in-context learning prompts, and they further proposed calibration to alleviate the bias. Schick et al.[117] designed biased or debiased instructions to guide the LLMs to conduct self-diagnosis and self-debiasing. Furthermore, the social biases exhibited by LLMs can potentially lead to discriminatory actions or content targeted at specific groups or demographics. Such problems are often caused by stereotypes that perpetuate harmful generalizations related to gender, race, and religion. For instance, Liu et al.[118] introduced a novel prompting approach that reveals how existing LLMs exhibit social biases during text-to-SQL prediction tasks. Despite this progress, the challenge of mitigating biases in LLMs using PE methods remains an area that requires further investigation.
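In the spirit of the calibration idea of Zhao et al.[116] (shown here in a deliberately simplified form; the content-free input and the label_probs interface are our own assumptions), one can estimate the model's prior bias toward each label with a content-free prompt and divide it out before picking a label.

```python
from typing import Callable, Dict, List

def calibrated_predict(x: str,
                       labels: List[str],
                       label_probs: Callable[[str], Dict[str, float]],
                       content_free_input: str = "N/A") -> str:
    """Contextual-calibration-style prediction (simplified sketch).

    label_probs(text) is assumed to return the model's probability of each label
    given the prompt filled with `text`. The bias measured on a content-free
    input is divided out, then the scores are renormalized.
    """
    bias = label_probs(content_free_input)
    raw = label_probs(x)
    corrected = {y: raw[y] / max(bias[y], 1e-12) for y in labels}
    norm = sum(corrected.values())
    calibrated = {y: corrected[y] / norm for y in labels}
    return max(calibrated, key=calibrated.get)
```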

7 Conclusions

This article summarizes the prompting methods from the perspective of communication theory, which
provides a coherent explanation of different prompt engineering (PE) methods in terms of their objectives and underlying principles. Theoretical analysis reveals that the ultimate goal of PE is to reduce the information misunderstanding between the users and the LLMs. This novel view facilitates a unified review of three PE methods and offers insights into scenarios where existing prompting methods fall short. We hope this survey will inspire researchers with a new understanding of the related issues in prompting methods, therefore stimulating progress in this promising area.

Conflict of Interest The authors declare that they have no conflict of interest.

References

[1] Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. In Proc. the 34th International Conference on Neural Information Processing Systems, Dec. 2020, Article No. 159.
[2] OpenAI. GPT-4 technical report. arXiv: 2303.08774, 2023. https://arxiv.org/abs/2303.08774, Jul. 2024.
[3] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A, Joulin A, Grave E, Lample G. LLaMA: Open and efficient foundation language models. arXiv: 2302.13971, 2023. https://arxiv.org/abs/2302.13971, Jul. 2024.
[4] Cheng K M, Li Z Y, Li C, Xie R J, Guo Q, He Y B, Wu H Y. The potential of GPT-4 as an AI-powered virtual assistant for surgeons specialized in joint arthroplasty. Annals of Biomedical Engineering, 2023, 51(7): 1366–1370. DOI: 10.1007/s10439-023-03207-z.
[5] Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: An analysis of multiple clinical and research scenarios. Journal of Medical Systems, 2023, 47(1): Article No. 33. DOI: 10.1007/s10916-023-01925-4.
[6] George A S, George A S H. A review of ChatGPT AI’s impact on several business sectors. Partners Universal International Innovation Journal, 2023, 1(1): 9–23. DOI: 10.5281/zenodo.7644359.
[7] Liu P F, Yuan W Z, Fu J L, Jiang Z B, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 2023, 55(9): 195. DOI: 10.1145/3560815.
[8] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI blog, 2019, 1(8): Article No. 9.
[9] Petroni F, Rocktäschel T, Riedel S, Lewis P, Bakhtin A, Wu Y X, Miller A. Language models as knowledge bases? In Proc. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov. 2019, pp.2463–2473. DOI: 10.18653/v1/D19-1250.
[10] Schick T, Schütze H. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proc. the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Apr. 2021, pp.255–269. DOI: 10.18653/v1/2021.eacl-main.20.
[11] Jiang Z B, Xu F F, Araki J, Neubig G. How can we know what language models know? Transactions of the Association for Computational Linguistics, 2020, 8: 423–438. DOI: 10.1162/tacl_a_00324.
[12] Shin T, Razeghi Y, Logan IV R L, Wallace E, Singh S. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. In Proc. the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020, pp.4222–4235. DOI: 10.18653/v1/2020.emnlp-main.346.
[13] Li X L, Liang P. Prefix-tuning: Optimizing continuous prompts for generation. In Proc. the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Aug. 2021, pp.4582–4597. DOI: 10.18653/v1/2021.acl-long.353.
[14] Haviv A, Berant J, Globerson A. BERTese: Learning to speak to BERT. In Proc. the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Apr. 2021, pp.3618–3623. DOI: 10.18653/v1/2021.eacl-main.316.
[15] Liu X, Zheng Y N, Du Z X, Ding M, Qian Y J, Yang Z L, Tang J. GPT understands, too. AI Open, 2023. DOI: 10.1016/j.aiopen.2023.08.012.
[16] Zhong Z X, Friedman D, Chen D Q. Factual probing is [MASK]: Learning vs. learning to recall. In Proc. the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2021, pp.5017–5033. DOI: 10.18653/v1/2021.naacl-main.398.
[17] Gao T Y, Fisch A, Chen D Q. Making pre-trained language models better few-shot learners. In Proc. the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Aug. 2021, pp.3816–3830. DOI: 10.18653/v1/2021.acl-long.295.
[18] Zhang N Y, Li L Q, Chen X, Deng S M, Bi Z, Tan C Q, Huang F, Chen H J. Differentiable prompt makes pre-trained language models better few-shot learners. In Proc. the 10th International Conference on Learning Representations, Apr. 2022.
[19] Han X, Zhao W L, Ding N, Liu Z Y, Sun M S. PTR: Prompt tuning with rules for text classification. AI Open, 2022, 3: 182–192. DOI: 10.1016/j.aiopen.2022.11.003.
[20] Lester B, Al-Rfou R, Constant N. The power of scale for parameter-efficient prompt tuning. In Proc. the 2021 Conference on Empirical Methods in Natural Language Processing, Nov. 2021, pp.3045–3059. DOI: 10.18653/v1/2021.emnlp-main.243.
[21] Gu Y X, Han X, Liu Z Y, Huang M L. PPT: Pre-trained prompt tuning for few-shot learning. In Proc. the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022, pp.8410–8423. DOI: 10.18653/v1/2022.acl-long.576.
[22] Deng M K, Wang J Y, Hsieh C P, Wang Y H, Guo H, Shu T M, Song M, Xing E, Hu Z T. RLPrompt: Optimizing discrete text prompts with reinforcement learning. In Proc. the 2022 Conference on Empirical Methods in Natural Language Processing, Dec. 2022, pp.3369–3391. DOI: 10.18653/v1/2022.emnlp-main.222.
[23] Hou Y T, Dong H Y, Wang X H, Li B H, Che W X. MetaPrompting: Learning to learn better prompts. In Proc. the 29th International Conference on Computational Linguistics, Oct. 2022, pp.3251–3262.
[24] Wang Z, Panda R, Karlinsky L, Feris R, Sun H, Kim Y. Multitask prompt tuning enables parameter-efficient transfer learning. In Proc. the 11th International Conference on Learning Representations, May 2023.
[25] Kojima T, Gu S S, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. In Proc. the 36th International Conference on Neural Information Processing Systems, Nov. 28-Dec. 9, 2022, Article No. 1613.
[26] Paranjape B, Lundberg S, Singh S, Hajishirzi H, Zettlemoyer L, Ribeiro M T. ART: Automatic multi-step reasoning and tool-use for large language models. arXiv: 2303.09014, 2023. https://arxiv.org/abs/2303.09014, Jul. 2024.
[27] Narula U. Handbook of Communication: Models, Perspectives, Strategies. Atlantic Publishers & Distributors (P) Ltd, 2006.
[28] Chandler D, Munday R. A Dictionary of Media and Communication. Oxford University Press, 2011.
[29] Cobley P, Schulz P J. Theories and Models of Communication. De Gruyter Mouton, 2013.
[30] Latané B. Dynamic social impact: The creation of culture by communication. Journal of Communication, 1996, 46(4): 13–25. DOI: 10.1111/j.1460-2466.1996.tb01501.x.
[31] Orbe M P. From the standpoint(s) of traditionally muted groups: Explicating a co-cultural communication theoretical model. Communication Theory, 1998, 8(1): 1–26. DOI: 10.1111/j.1468-2885.1998.tb00209.x.
[32] Segrin C, Abramson L Y. Negative reactions to depressive behaviors: A communication theories analysis. Journal of Abnormal Psychology, 1994, 103(4): 655–668. DOI: 10.1037/0021-843X.103.4.655.
[33] Shannon C E. A mathematical theory of communication. The Bell System Technical Journal, 1948, 27(3): 379–423. DOI: 10.1002/j.1538-7305.1948.tb01338.x.
[34] Schramm W. The Process and Effects of Mass Communication. University of Illinois Press, 1954.
[35] Cover T M, Thomas J A. Elements of Information Theory. John Wiley & Sons, 1991.
[36] Sorensen T, Robinson J, Rytting C, Shaw A, Rogers K, Delorey A, Khalil M, Fulda N, Wingate D. An information-theoretic approach to prompt engineering without ground truth labels. In Proc. the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022, pp.819–862. DOI: 10.18653/v1/2022.acl-long.60.
[37] Schick T, Schütze H. It’s not just size that matters: Small language models are also few-shot learners. In Proc. the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2021, pp.2339–2352. DOI: 10.18653/v1/2021.naacl-main.185.
[38] Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y Q, Li W, Liu P J. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020, 21(1): 140.
[39] Zhou Y L, Zhao Y R, Shumailov I, Mullins R, Gal Y. Revisiting automated prompting: Are we actually doing better? In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jul. 2023, pp.1822–1832. DOI: 10.18653/v1/2023.acl-short.155.
[40] Logan IV R, Balažević I, Wallace E, Petroni F, Singh S, Riedel S. Cutting down on prompts and parameters: Simple few-shot learning with language models. In Proc. the 2022 Findings of the Association for Computational Linguistics, May 2022, pp.2824–2835. DOI: 10.18653/v1/2022.findings-acl.222.
[41] Yuan W Z, Neubig G, Liu P F. BARTSCORE: Evaluating generated text as text generation. In Proc. the 35th International Conference on Neural Information Processing Systems, Dec. 2021, Article No. 2088.
[42] Ben-David E, Oved N, Reichart R. PADA: Example-based prompt learning for on-the-fly adaptation to unseen domains. Transactions of the Association for Computational Linguistics, 2022, 10: 414–433. DOI: 10.1162/tacl_a_00468.
[43] Li B H, Hou Y T, Che W X. Data augmentation approaches in natural language processing: A survey. AI Open, 2022, 3: 71–90. DOI: 10.1016/j.aiopen.2022.03.001.
[44] Zhou Y C, Muresanu A I, Han Z W, Paster K, Pitis S, Chan H, Ba J. Large language models are human-level prompt engineers. In Proc. the 11th International Conference on Learning Representations, May 2023.
[45] Davison J, Feldman J, Rush A M. Commonsense knowledge mining from pretrained models. In Proc. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov. 2019, pp.1173–1178. DOI: 10.18653/v1/D19-1109.
[46] Yang X J, Cheng W, Zhao X J, Yu W C, Petzold L, Chen H F. Dynamic prompting: A unified framework for prompt tuning. arXiv: 2303.02909, 2023. https://arxiv.org/abs/2303.02909, Jul. 2024.
[47] Zaken E B, Goldberg Y, Ravfogel S. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proc. the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), May 2022. DOI: 10.18653/v1/2022.acl-short.1.
[48] Zhang J O, Sax A, Zamir A, Guibas L, Malik J. Side-tuning: A baseline for network adaptation via additive side networks. In Proc. the 16th European Conference on Computer Vision, Aug. 2020, pp.698–714. DOI: 10.1007/978-3-030-58580-8_41.
[49] Houlsby N, Giurgiu A, Jastrzebski S, Morrone B, de Laroussilhe Q, Gesmundo A, Attariyan M, Gelly S. Parameter-efficient transfer learning for NLP. In Proc. the 36th International Conference on Machine Learning, Jun. 2019, pp.2790–2799.
[50] Sung Y L, Cho J, Bansal M. LST: Ladder side-tuning for parameter and memory efficient transfer learning. In Proc. the 36th International Conference on Neural Information Processing Systems, Nov. 28-Dec. 9, 2022, Article No. 944.
[51] Schick T, Schmid H, Schütze H. Automatically identifying words that can serve as labels for few-shot text classification. In Proc. the 28th International Conference on Computational Linguistics, Dec. 2020, pp.5569–5578. DOI: 10.18653/v1/2020.coling-main.488.
[52] Hambardzumyan K, Khachatrian H, May J. WARP: Word-level adversarial reprogramming. In Proc. the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Aug. 2021, pp.4921–4933. DOI: 10.18653/v1/2021.acl-long.381.
[53] Chen Y L, Liu Y, Dong L, Wang S H, Zhu C G, Zeng M, Zhang Y. AdaPrompt: Adaptive model training for prompt-based NLP. In Proc. the 2022 Findings of the Association for Computational Linguistics, Dec. 2022, pp.6057–6068. DOI: 10.18653/v1/2022.findings-emnlp.448.
[54] Yin W P, Hay J, Roth D. Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. In Proc. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov. 2019, pp.3914–3923. DOI: 10.18653/v1/D19-1404.
[55] Cui L Y, Wu Y, Liu J, Yang S, Zhang Y. Template-based named entity recognition using BART. In Proc. the 2021 Findings of the Association for Computational Linguistics, Aug. 2021, pp.1835–1845. DOI: 10.18653/v1/2021.findings-acl.161.
[56] Jiang Z B, Anastasopoulos A, Araki J, Ding H B, Neubig G. X-FACTR: Multilingual factual knowledge retrieval from pretrained language models. In Proc. the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020, pp.5943–5959. DOI: 10.18653/v1/2020.emnlp-main.479.
[57] Nickel M, Kiela D. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In Proc. the 35th International Conference on Machine Learning, Jul. 2018, pp.3776–3785.
[58] Hou Y T, Che W X, Lai Y K, Zhou Z H, Liu Y J, Liu H, Liu T. Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network. In Proc. the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp.1381–1393. DOI: 10.18653/v1/2020.acl-main.128.
[59] Min S, Zhong V, Zettlemoyer L, Hajishirzi H. Multi-hop reading comprehension through question decomposition and rescoring. In Proc. the 57th Annual Meeting of the Association for Computational Linguistics, Jul. 2019, pp.6097–6109. DOI: 10.18653/v1/P19-1613.
[60] Khot T, Khashabi D, Richardson K, Clark P, Sabharwal A. Text modular networks: Learning to decompose tasks in the language of existing models. In Proc. the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2021, pp.1264–1279. DOI: 10.18653/v1/2021.naacl-main.99.
[61] Qin G H, Eisner J. Learning how to ask: Querying LMs with mixtures of soft prompts. In Proc. the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2021, pp.5203–5212. DOI: 10.18653/v1/2021.naacl-main.410.
[62] Wang X Z, Wei J, Schuurmans D, Le Q V, Chi E H, Narang S, Chowdhery A, Zhou D. Self-consistency improves chain of thought reasoning in language models. In Proc. the 11th International Conference on Learning Representations, May 2023.
[63] Lewkowycz A, Andreassen A, Dohan D, Dyer E, Michalewski H, Ramasesh V, Slone A, Anil C, Schlag I, Gutman-Solo T, Wu T H, Neyshabur B, Gur-Ari G, Misra V. Solving quantitative reasoning problems with language models. In Proc. the 36th International Conference on Neural Information Processing Systems, Nov. 28-Dec. 9, 2022, Article No. 278.
[64] Wang X Z, Wei J, Schuurmans D, Le Q, Chi E, Zhou D. Rationale-augmented ensembles in language models. arXiv: 2207.00747, 2022. https://arxiv.org/abs/2207.00747, Jul. 2024.
[65] Li Y F, Lin Z Q, Zhang S Z, Fu Q, Chen B, Lou J G, Chen W Z. On the advance of making language models better reasoners. arXiv: 2206.02336, 2022. https://arxiv.org/abs/2206.02336v1, Jul. 2024.
[66] Fu Y, Peng H, Sabharwal A, Clark P, Khot T. Complexity-based prompting for multi-step reasoning. In Proc. the 11th International Conference on Learning Representations, May 2023.
[67] Besta M, Blach N, Kubicek A, Gerstenberger R, Podstawski M, Gianinazzi L, Gajda J, Lehmann T, Niewiadomski H, Nyczyk P, Hoefler T. Graph of thoughts: Solving elaborate problems with large language models. In Proc. the 38th AAAI Conference on Artificial Intelligence, Feb. 2024, pp.17682–17690. DOI: 10.1609/aaai.v38i16.29720.
[68] Schick T, Schütze H. Few-shot text generation with pattern-exploiting training. arXiv: 2012.11926, 2020. https://arxiv.org/abs/2012.11926, Jul. 2024.
[69] Perez E, Lewis P, Yih W T, Cho K, Kiela D. Unsupervised question decomposition for question answering. In Proc. the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020, pp.8864–8880. DOI: 10.18653/v1/2020.emnlp-main.713.
[70] Zhou D, Schärli N, Hou L, Wei J, Scales N, Wang X Z, Schuurmans D, Cui C, Bousquet O, Le Q V, Chi E H. Least-to-most prompting enables complex reasoning in large language models. In Proc. the 11th International Conference on Learning Representations, May 2023.
[71] Dua D, Gupta S, Singh S, Gardner M. Successive prompting for decomposing complex questions. In Proc. the 2022 Conference on Empirical Methods in Natural Language Processing, Dec. 2022, pp.1251–1265. DOI: 10.18653/v1/2022.emnlp-main.81.
[72] Creswell A, Shanahan M, Higgins I. Selection-inference: Exploiting large language models for interpretable logical reasoning. In Proc. the 11th International Conference on Learning Representations, May 2023.
[73] Arora S, Narayan A, Chen M F, Orr L J, Guha N, Bhatia K, Chami I, Ré C. Ask me anything: A simple strategy for prompting language models. In Proc. the 11th International Conference on Learning Representations, May 2023.
[74] Khot T, Trivedi H, Finlayson M, Fu Y, Richardson K, Clark P, Sabharwal A. Decomposed prompting: A modular approach for solving complex tasks. In Proc. the 11th International Conference on Learning Representations, May 2023.
[75] Ye Y H, Hui B Y, Yang M, Li B H, Huang F, Li Y B. Large language models are versatile decomposers: Decompose evidence and questions for table-based reasoning. arXiv: 2301.13808, 2023. https://arxiv.org/abs/2301.13808, Jul. 2024.
[76] Wu T S, Terry M, Cai C J. AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In Proc. the 2022 CHI Conference on Human Factors in Computing Systems, Apr. 29-May 5, 2022, Article No. 385. DOI: 10.1145/3491102.3517582.
[77] Wang L, Xu W Y, Lan Y H, Hu Z Q, Lan Y S, Lee R K W, Lim E P. Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2023, pp.2609–2634. DOI: 10.18653/v1/2023.acl-long.147.
[78] Li J L, Wang J Y, Zhang Z S, Zhao H. Self-prompting large language models for zero-shot open-domain QA. In Proc. the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun. 2024, pp.296–310. DOI: 10.18653/v1/2024.naacl-long.17.
[79] Ye X, Durrett G. Explanation selection using unlabeled data for chain-of-thought prompting. In Proc. the 2023 Conference on Empirical Methods in Natural Language Processing, Dec. 2023, pp.619–637. DOI: 10.18653/v1/2023.emnlp-main.41.
[80] Shum K, Diao S Z, Zhang T. Automatic prompt augmentation and selection with chain-of-thought from labeled data. In Proc. the 2023 Findings of the Association for Computational Linguistics, Dec. 2023, pp.12113–12139. DOI: 10.18653/v1/2023.findings-emnlp.811.
[81] Diao S Z, Wang P C, Lin Y, Pan R, Liu X, Zhang T. Active prompting with chain-of-thought for large language models. arXiv: 2302.12246, 2023. https://arxiv.org/abs/2302.12246, Jul. 2024.
[82] Zhang Z S, Zhang A, Li M, Smola A. Automatic chain of thought prompting in large language models. In Proc. the 11th International Conference on Learning Representations, May 2023.
[83] Yang K, Tian Y D, Peng N Y, Klein D. Re3: Generating longer stories with recursive reprompting and revision. In Proc. the 2022 Conference on Empirical Methods in Natural Language Processing, Dec. 2022, pp.4393–4479. DOI: 10.18653/v1/2022.emnlp-main.296.
[84] Yang K, Klein D, Peng N Y, Tian Y D. Doc: Improving long story coherence with detailed outline control. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2023, pp.3378–3465. DOI: 10.18653/v1/2023.acl-long.190.
[85] Schick T, Dwivedi-Yu J, Dessí R, Raileanu R, Lomeli M, Hambro E, Zettlemoyer L, Cancedda N, Scialom T. Toolformer: Language models can teach themselves to use tools. In Proc. the 37th International Conference on Neural Information Processing Systems, Dec. 2023, Article No. 2997.
[86] Shen Y L, Song K T, Tan X, Li D S, Lu W M, Zhuang Y T. HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face. In Proc. the 37th International Conference on Neural Information Processing Systems, Dec. 2023, Article No. 1657.
[87] Wang B S, Deng X, Sun H. Iteratively prompt pre-trained language models for chain of thought. In Proc. the 2022 Conference on Empirical Methods in Natural Language Processing, Dec. 2022, pp.2714–2730. DOI: 10.18653/v1/2022.emnlp-main.174.
[88] Nye M, Andreassen A J, Gur-Ari G, Michalewski H, Austin J, Bieber D, Dohan D, Lewkowycz A, Bosma M, Luan D, Sutton C, Odena A. Show your work: Scratchpads for intermediate computation with language models. In Proc. the 2022 Deep Learning for Code Workshop, May 2022.
[89] Zelikman E, Wu Y H, Mu J, Goodman N D. STaR: Self-taught reasoner bootstrapping reasoning with reasoning. In Proc. the 36th International Conference on Neural Information Processing Systems, Nov. 28-Dec. 9, 2022, Article No. 1126.
[90] Taylor R, Kardas M, Cucurull G, Scialom T, Hartshorn A, Saravia E, Poulton A, Kerkez V, Stojnic R. Galactica: A large language model for science. arXiv: 2211.09085, 2022. https://arxiv.org/abs/2211.09085, Jul. 2024.
[91] Ting K M, Witten I H. Stacked generalization: When does it work? In Proc. the 15th International Joint Conference on Artificial Intelligence, Aug. 1997, pp.866–871.
[92] Zhou Z H, Wu J X, Tang W. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 2002, 137(1/2): 239–263. DOI: 10.1016/S0004-3702(02)00190-X.
[93] Duh K, Sudoh K, Wu X C, Tsukada H, Nagata M. Generalized minimum Bayes risk system combination. In Proc. the 5th International Joint Conference on Natural Language Processing, Nov. 2011, pp.1356–1360.
[94] Weng Y X, Zhu M J, Xia F, Li B, He S Z, Liu S P, Sun B, Liu K, Zhao J. Large language models are better reasoners with self-verification. In Proc. the 2023 Findings of the Association for Computational Linguistics, Dec. 2023, pp.2550–2575. DOI: 10.18653/v1/2023.findings-emnlp.167.
[95] Yao S Y, Yu D, Zhao J, Shafran I, Griffiths T L, Cao Y, Narasimhan K. Tree of thoughts: Deliberate problem solving with large language models. In Proc. the 37th International Conference on Neural Information Processing Systems, Dec. 2023, Article No. 517.
[96] Schick T, Schütze H. Few-shot text generation with natural language instructions. In Proc. the 2021 Conference on Empirical Methods in Natural Language Processing, Nov. 2021, pp.390–402. DOI: 10.18653/v1/2021.emnlp-main.32.
[97] Yang J F, Jiang H M, Yin Q Y, Zhang D Q, Yin B, Yang D Y. SEQZERO: Few-shot compositional semantic parsing with sequential prompts and zero-shot models. In Proc. the 2022 Findings of the Association for Computational Linguistics, Jul. 2022, pp.49–60. DOI: 10.18653/v1/2022.findings-naacl.5.
[98] Drozdov A, Schärli N, Akyürek E, Scales N, Song X Y, Chen X Y, Bousquet O, Zhou D. Compositional semantic parsing with large language models. In Proc. the 11th International Conference on Learning Representations, May 2023.
[99] Press O, Zhang M R, Min S, Schmidt L, Smith N A, Lewis M. Measuring and narrowing the compositionality gap in language models. In Proc. the 2023 Findings of the Association for Computational Linguistics, Dec. 2023, pp.5687–5711. DOI: 10.18653/v1/2023.findings-emnlp.378.
[100] Mialon G, Dessi R, Lomeli M, Nalmpantis C, Pasunuru R, Raileanu R, Rozière B, Schick T, Dwivedi-Yu J, Celikyilmaz A, Grave E, LeCun T, Scialom T. Augmented language models: A survey. arXiv: 2302.07842, 2023. https://arxiv.org/abs/2302.07842, Jul. 2024.
[101] Yao S Y, Zhao J, Yu D, Du N, Shafran I, Narasimhan K R, Cao Y. ReAct: Synergizing reasoning and acting in language models. In Proc. the 11th International Conference on Learning Representations, May 2023.
[102] Thoppilan R, De Freitas D, Hall J, Shazeer N, Kulshreshtha A, Cheng H T, Jin A, Bos T, Baker L, Du Y, Li Y, Lee H, Zheng H S, Ghafouri A, Menegali M, Huang Y P, Krikun M, Lepikhin D, Qin J, Chen D H, Xu Y Z, Chen Z F, Roberts A, Bosma M, Zhao V, Zhou Y Q, Chang C C, Krivokon I, Rusch W, Pickett M, Srinivasan P, Man L, Meier-Hellstern K, Morris M R, Doshi T, Santos R D, Duke T, Soraker J, Zevenbergen B, Prabhakaran V, Diaz M, Hutchinson B, Olson K, Molina A, Hoffman-John E, Lee J, Aroyo L, Rajakumar R, Butryna A, Lamm M, Kuzmina V, Fenton J, Cohen A, Bernstein R, Kurzweil R, Aguera-Arcas B, Cui C, Croak M, Chi E, Le Q. LaMDA: Language models for dialog applications. arXiv: 2201.08239, 2022. https://arxiv.org/abs/2201.08239, Jul. 2024.
[103] Qiao S F, Ou Y X, Zhang N Y, Chen X, Yao Y Z, Deng S M, Tan C Q, Huang F, Chen H J. Reasoning with language model prompting: A survey. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2023, pp.5368–5393. DOI: 10.18653/v1/2023.acl-long.294.
[104] Lialin V, Deshpande V, Rumshisky A. Scaling down to scale up: A guide to parameter-efficient fine-tuning. arXiv: 2303.15647, 2023. https://arxiv.org/abs/2303.15647, Jul. 2024.
[105] Zhao W X, Zhou K, Li J Y, Tang T Y, Wang X L, Hou Y P, Min Y Q, Zhang B C, Zhang J J, Dong Z C, Du Y F, Yang C, Chen Y S, Chen Z P, Jiang J H, Ren R Y, Li Y F, Tang X Y, Liu Z K, Liu P Y, Nie J Y, Wen J R. A survey of large language models. arXiv: 2303.18223, 2023. https://arxiv.org/abs/2303.18223, Jul. 2024.
[106] Dong Q X, Li L, Dai D M, Zheng C, Wu Z Y, Chang B B, Sun X, Xu J J, Li L, Sui Z F. A survey for in-context learning. arXiv: 2301.00234, 2022. https://arxiv.org/abs/2301.00234v1, Jul. 2024.
[107] Lou R Z, Zhang K, Yin W P. Is prompt all you need? No. A comprehensive and broader view of instruction learning. arXiv: 2303.10475, 2023. https://arxiv.org/abs/2303.10475v1, Jul. 2024.
[108] Zhong R Q, Lee K, Zhang Z, Klein D. Adapting language models for zero-shot learning by meta-tuning on dataset and prompt collections. In Proc. the 2021 Findings of the Association for Computational Linguistics, Nov. 2021, pp.2856–2878. DOI: 10.18653/v1/2021.findings-emnlp.244.
[109] Reynolds L, McDonell K. Prompt programming for large language models: Beyond the few-shot paradigm. In Proc. the 2021 CHI Conference on Human Factors in Computing Systems, May 2021, Article No. 314. DOI: 10.1145/3411763.3451760.
[110] Gu Z H, Fan J, Tang N, Cao L, Jia B W, Madden S, Du X Y. Few-shot text-to-SQL translation using structure and content prompt learning. Proceedings of the ACM on Management of Data, 2023, 1(2): 147. DOI: 10.1145/3589292.
[111] Abadi M, Chu A, Goodfellow I, McMahan H B, Mironov I, Talwar K, Zhang L. Deep learning with differential privacy. In Proc. the 2016 ACM SIGSAC Conference on Computer and Communications Security, Oct. 2016, pp.308–318. DOI: 10.1145/2976749.2978318.
[112] Gentry C. A fully homomorphic encryption scheme [Ph.D. Thesis]. Stanford University, Palo Alto, 2009.
[113] Yang Q, Liu Y, Chen T J, Tong Y X. Federated machine learning: Concept and applications. ACM Trans. Intelligent Systems and Technology, 2019, 10(2): 12. DOI: 10.1145/3298981.
[114] Kirchenbauer J, Geiping J, Wen Y X, Katz J, Miers I, Goldstein T. A watermark for large language models. In Proc. the 40th International Conference on Machine Learning, Jul. 2023, pp.17061–17084.
[115] Wei J, Wang X Z, Schuurmans D, Bosma M, Ichter B, Xia F, Chi E H, Le Q V, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. In Proc. the 36th International Conference on Neural Information Processing Systems, Nov. 28-Dec. 9, 2022, Article No. 1800.
[116] Zhao Z H, Wallace E, Feng S, Klein D, Singh S. Calibrate before use: Improving few-shot performance of language models. In Proc. the 38th International Conference on Machine Learning, Jul. 2021, pp.12697–12706.
[117] Schick T, Udupa S, Schütze H. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP. Transactions of the Association for Computational Linguistics, 2021, 9: 1408–1424. DOI: 10.1162/tacl_a_00434.
[118] Liu Y, Gao Y, Su Z, Chen X K, Ash E, Lou J G. Uncovering and categorizing social biases in text-to-SQL. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2023, pp.13573–13584. DOI: 10.18653/v1/2023.acl-long.759.

Yuan-Feng Song is a researcher in WeBank AI Group, WeBank, Shenzhen. His research interests include learning to rank, data visualization, and speech-driven applications. In his career, he has published several papers in venues such as KDD, ICDM, EMNLP, MM, TIST, TKDE, and SIGMOD.

Yuan-Qin He is currently a researcher with WeBank AI Group, WeBank, Shenzhen. He received his B.S. degree in physics from Shanghai Jiao Tong University, and his Ph.D. degree in physics from the Technical University of Munich, Munich, in 2017. His research interests include machine learning and federated learning.

Xue-Fang Zhao received her Master degree in computer science from Tsinghua University, Beijing, in 2020. She is currently a research engineer at WeBank AI Group, WeBank, Shenzhen. Her research interests include natural language processing and speech recognition.

Han-Lin Gu received his B.S. degree in mathematics from University of Science and Technology of China, Hefei, in 2017. He received his Ph.D. degree in mathematics from Hong Kong University of Science and Technology, Hong Kong, in 2022. He now works as a senior researcher at WeBank AI Group, WeBank, Shenzhen. His research interests include federated learning and privacy-preserving methodology. He has published a series of papers in TPAMI, TDSC, IJCAI, PAKDD, and so on.
Di Jiang received his Ph.D. degree in computer science from the Hong Kong University of Science and Technology, Hong Kong, in 2014. He is currently the principal scientist at WeBank AI Group, WeBank, Shenzhen. His research interests include information retrieval, natural language processing, and massive data management.

Hai-Jun Yang received his B.E. degree in 2008 and his M.S. degree in 2011, both from Harbin Institute of Technology, Harbin. He is currently the Senior Manager of the AI Group at WeBank, Shenzhen, mainly responsible for promoting the integration and implementation of AI technology with WeBank in customer service, risk control, marketing, and other business scenarios.

Li-Xin Fan is the Principal Scientist of Artificial Intelligence at WeBank, Shenzhen, and the Chairman of the Federal Learning Industry Ecological Development Alliance. His research fields include machine learning and deep learning, computer vision and pattern recognition, image and video processing, 3D big data processing, data visualization and rendering, augmented and virtual reality, mobile computing and ubiquitous computing, and intelligent man-machine interface. He is the author of more than 70 international journal and conference articles. He has worked at Nokia Research Center and Xerox Research Center Europe. His research includes the well-known Bag of Keypoints image classification method. He has long participated in NIPS/NeurIPS, ICML, CVPR, ICCV, ECCV, IJCAI, and other top artificial intelligence conferences, served as an area chair of AAAI, and organized workshops in various technical fields. He is also the inventor of more than one hundred patents filed in the United States, Europe, and China, and the chairman of the IEEE P2894 Explainable Artificial Intelligence (XAI) Standard Working Group.
