Synthetic Cancer - Augmenting Worms with LLMs

Submission for the Swiss AI Safety Prize

Benjamin Zimmerman∗, Ohio State University, [email protected]
David Zollikofer∗, ETH Zürich & Innovista Management GmbH, [email protected]

Disclosure: The findings presented in this paper are intended solely for scientific and research purposes. We are fully aware that this paper presents a malware type with great potential for abuse. We are publishing this in good faith and in an effort to raise awareness. We strictly prohibit any non-scientific use of these findings.

1 Introduction

With the rise of LLMs, a completely new threat landscape has emerged, ranging from LLMs used for fraudulent purposes (Erzberger, 2023) to initial ideas of integrating LLMs in malware (Labs, 2023), and even first prototypes for LLM-based worms (Bil, 2023).

We propose a novel type of metamorphic malware that utilizes LLMs in two key areas: (I) code rewriting, and (II) targeted spreading combined with social engineering.

Our proposal is supplemented by a minimal prototype, further described in Appendix B. See https://youtu.be/RENigbqPfYI for a demonstration.

2 Mechanism of Proposed Exploit

In this section, we describe a possible mechanism of action inspired by our prototype. For the relevant numbered sections, see Figure 1.

(1) Worm Installation: The worm is sent to a target via email. The user is led to execute the worm through social engineering tactics. After initial execution, the worm can download required dependencies from the internet, which is necessary because of email attachment size limits. This stage is also where the worm can do potential damage, such as encrypting a user's files.

(2) Worm Replication: Using an LLM (in our case, GPT-4), the worm replicates itself file by file, completely rewriting its own source code and ensuring its signature is different to avoid detection. This means every infected target has a different, possibly unique variant of the worm, making signature-based detection infeasible.

(3) Worm Spreading: After replication, Outlook is scanned for past email conversations, and an LLM inspects these conversations for potential social engineering attacks, drafting a reply email where the user is encouraged to open the attachment. The email is then sent as a reply to that conversation with one of the copies of the worm, essentially closing the infectious circle.

[Figure 1: diagram showing worm copies at the numbered stages 1, 2 and 3; image not reproduced.]

3 Mitigation of Exploit

We believe that the model providers likely cannot solve this issue in its entirety. After all, how can one distinguish a legitimate code refactoring request from a malicious worm refactoring request?¹

Further, as we use an LLM to draft the email replies, we believe that further awareness training against social engineering will not solve this.

To effectively combat this threat, we propose that either contact with an LLM API must be detected or, if a model is run locally, its execution must be detected. However, this is non-trivial: what separates an LLM running for text completion (as found on the new macOS (Jac, 2023)) from a malicious one?

4 Assumptions and Limitations

Need for access to a good enough language model: The language model, which is central to the malicious actor, can either be hosted on the internet (requiring an unblocked internet connection) or downloaded locally after worm infection. This is because LLM weights are too large to be included in email attachments.

Need for an email client: We currently base our approach on the presence of an instance of Outlook with logged-in email accounts on the target system². This strategy allows us to sidestep the need for obtaining email credentials directly.

Dependence on a socially engineerable user: Our strategy operates on the assumption that we can persuade the user to run the malicious email attachment. Further insights on this assumption can be found in Appendix C.

Ability to execute code: We require that the user is able to run unsigned executables or scripts.

References

2023. A look at Apple's new Transformer-powered predictive text model. [Online; accessed 26. Oct. 2023].
2023. Alex Bilzerian on X. [Online; accessed 18. Oct. 2023].
Arthur Erzberger. 2023. WormGPT and FraudGPT – The Rise of Malicious LLMs. Trustwave Holdings, Inc.
B42 Labs. 2023. LLM meets Malware: Starting the Era of Autonomous Threat. Medium.

∗ Authors sorted alphabetically. All contributed equally.
¹ We have in fact observed GPT-4 detecting such requests. See Appendix B.2.2.
² Outlook must not be running for the attack to work.
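
Section 3 proposes detecting when a process contacts an LLM API. As a purely defensive illustration, the following minimal sketch flags outbound connections whose hostname matches a list of hosted LLM API endpoints. The host list, the function names and the `(process_name, hostname)` record format are all assumptions made for this example and are not part of the paper. As Section 3 notes, such a match cannot by itself separate legitimate from malicious use; it only surfaces candidates for review.

```python
from typing import Iterable, List, Tuple

# Illustrative, non-exhaustive host list (an assumption for this sketch);
# a real deployment would maintain and update its own list.
KNOWN_LLM_API_HOSTS = frozenset({
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
})


def is_llm_api_host(hostname: str,
                    known_hosts: Iterable[str] = KNOWN_LLM_API_HOSTS) -> bool:
    """Return True if hostname equals a known host or is a subdomain of one."""
    # Normalize case and drop a trailing dot from fully qualified names.
    name = hostname.strip().lower().rstrip(".")
    return any(name == host or name.endswith("." + host) for host in known_hosts)


def flag_llm_api_contacts(
    connections: Iterable[Tuple[str, str]],
    known_hosts: Iterable[str] = KNOWN_LLM_API_HOSTS,
) -> List[Tuple[str, str]]:
    """Keep only the (process_name, hostname) records that contact an LLM API."""
    hosts = frozenset(known_hosts)
    return [(proc, host) for proc, host in connections
            if is_llm_api_host(host, hosts)]


if __name__ == "__main__":
    # Hypothetical connection log, for demonstration only.
    log = [
        ("outlook.exe", "outlook.office365.com"),
        ("python.exe", "api.openai.com"),
    ]
    for proc, host in flag_llm_api_contacts(log):
        print(f"review: {proc} contacted {host}")
```

Matching on exact host or subdomain, rather than on substrings, avoids flagging look-alike names such as `evilapi.openai.com` or `api.openai.com.example`.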
A Disclosure Policy

We recognize the potential for misuse of the methods outlined in our work. However, given that these systems were previously suggested (Bil, 2023), we have decided to make our paper publicly accessible. We are convinced that it is of crucial importance to have an open academic dialogue about such systems to advance the state of threat identification and elimination in a rapidly changing threat landscape.

The findings presented in this paper are intended solely for scientific and research purposes. We explicitly prohibit any non-scientific use of these findings!

Furthermore, we aim to prevent the misuse with the following safeguards:

• Code not shared. The only exceptions are researchers investigating the potential effects of AI-augmented malware, after they have signed an ethics statement, or requests for reproduction.

• Certain points in the paper above have been left vague on purpose to not provide direct instructions on building AI-augmented malware.

• We will report this vulnerability to public incident databases (such as https://incidentdatabase.ai/, https://openai.com/form/model-behavior-feedback or similar) if applicable.

B Details of Prototype

For a demonstration of the prototype, see https://youtu.be/RENigbqPfYI.

Please keep in mind that the prototype serves as a proof of concept and was a weekend project. As a result, there are many areas of possible future research.

B.1 Technical Prototype Design

Section omitted in public release.

B.2 Restrictions & Areas for Future Improvement

B.2.1 Packaging

We have experimented with packaging up all dependencies in a .exe file using tools such as PyInstaller. However, Windows Defender would often detect the .exe file as it was not signed and its hash was not known to Microsoft. As a result, we ship the worm as a .bat file which sets up the Python environment and a runner.py file which contains the worm logic.

Packaging the worm is a possible area of future experimentation.

B.2.2 Cloning Survival

Malformed Code: Sometimes the LLM embeds the code into a Markdown reply or changes the semantics of the program. This used to be a continuous challenge during development but thanks to prompt engineering is now under control. However, when the worm spreads and rewrites itself many times, including rewriting the embedded prompt, this becomes a key issue.

GPT-4 Discovering Unethical Behavior: The code cloning is done using GPT-4. As the code is cloned, so are the prompts given to the language model, which are embedded in the code. We were positively surprised to learn that GPT-4 sometimes realizes what it is being used for and, instead of rewriting the code as instructed, replies as follows:

    I must clarify that the code provided appears to engage in unethical behavior – specifically, crafting emails to encourage recipients to open an attached ZIP file that is suggestive of containing malicious content (mentioned as a "worm" or ".bat file"). Furthermore, the use of AI to generate content for such purposes is also against the use policies of AI platforms, including OpenAI.

    Using AI or any software tool to aid in cyber attacks, including sending phishing emails or propagating malware, is illegal and violates computer misuse laws in many jurisdictions. I must also refrain from aiding the development, refactoring, or advice on executing such activities.

    It's crucial to understand the legal and ethical implications of writing and distributing code. Using one's skills for constructive and positive contributions to society is important, and I encourage you to use your abilities for lawful and beneficial purposes. If you have any other code that does not involve unethical practices, I'd be more than happy to help with refactoring that.

Interestingly, this happens in a two-stage process. In the first rewrite, GPT-4 rewrites the code as requested but renames some variables, e.g. to "WORM". In a second replication, GPT-4 detects this "WORM" and refuses to rewrite the code.

Circumvention of Safety Features: We have seen some second-degree replications where safety features, especially only replying to a single email chain, were changed to replying to every single email chain available in the inbox. We find this very dangerous and as a result have stopped the chain at this point.

Survival of Infectious Chain: We have constrained ourselves to (I) exclusively using OpenAI language models, and (II) only replying to a single conversation each time. In this set-up, we have only been able to demonstrate worms spreading to two hosts but not replication onto a third host. We theorize that if our safety features were not enforced, a more liberal and possibly more powerful LLM³ were used, and the number of email replies were increased (5 to 10 instead of a single one), exponentially growing infection chains would be possible⁴.

We refrain from giving concrete numbers as we do not have concrete data on this, but we estimate that the chance of a second rewrite failing (i.e. resulting in non-executable or safety-feature-violating code being attached to the email) is approximately half.

C Socially Engineering User

At the core of the spreading mechanism is the ability to socially engineer users to open links in a custom drafted reply to a message chain. Assume we are given the following message chain:

    Subject: Invitation: Paul's 38th Birthday Celebration
    From: [email protected]
    To: [email protected]
    Date: October 23, 2023

    Dear Claire,

    I hope this message finds you well. As the calendar pages flip, another year has passed and I'm about to turn 38. While it's not one of those monumental milestones, I still believe in celebrating the little moments. I'm planning a small get-together and would be honored if you could join.

    Details:
    Date: November 5th, 2023
    Time: 6:30 pm onwards
    Location: My residence, 123 Elm Street, Maplewood
    Dress code: Casual

    It'll be an evening filled with good company, food, and music from our favorite era, the 80s. A trip down memory lane, if you will.

    I'd appreciate it if you could RSVP by October 30th. Looking forward to sharing this special day with you.

    Best regards, Paul

    Subject: Re: Invitation: Paul's 38th Birthday Celebration
    From: [email protected]
    To: [email protected]
    Date: October 24, 2023

    Hello Paul,

    Thank you for the thoughtful invitation. I'm glad to hear you're marking the occasion and I'd be delighted to join you on the 5th. It's always a pleasure to reminisce about the good old times, especially with music from the 80s as a backdrop.

    Do let me know if you'd like me to bring anything along—perhaps a bottle of wine or a dessert?

    Warm regards, Claire

A generated malicious reply can look as follows (generated with our prompt to the above message chain).

    Subject: Your RSVP and a Glimpse of the Past
    From: [email protected]
    To: [email protected]
    Date: October 26, 2023

    Dear Claire,

    I'm truly delighted to hear that you'll be joining us for my 38th birthday celebration. Your company is always cherished, and I'm eagerly looking forward to our nostalgic trip down the 80s lane.

    Regarding your kind offer, a bottle of wine would be splendid. If you have any suggestions or preferences, do let me know. Your taste has always been impeccable!

    Also, while preparing for the evening, I came across a playlist that perfectly captures the essence of our favorite era. I thought it might bring back some fond memories and set the tone for the evening. I've attached it for you to preview. It's a mix of some forgotten hits and our all-time favorites. Do give it a listen when you find a moment.

³ In the future.
⁴ As, given a fixed success probability, one can boost it by increasing the number of emails to spread, every single spread being independent of the others.
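
Footnote 4 in Appendix B.2.2 argues that, for a fixed per-email success probability, sending more independent emails raises the chance that at least one succeeds. The arithmetic behind that claim is the standard formula 1 - (1 - p)^n, sketched below. The function name and the value p = 0.5 are illustrative assumptions only; the paper gives no measured per-email success rate, and its "approximately half" estimate refers to rewrite failures rather than to the whole chain.

```python
def at_least_one_success(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds.

    p is the (assumed) success probability of a single attempt.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # All n attempts fail with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative value only, not a figure reported in the paper.
    p = 0.5
    for n in (1, 5, 10):
        print(n, at_least_one_success(p, n))
```

Under this independence assumption the failure probability shrinks geometrically in n, which is why the single-reply restriction described in Appendix B.2.2 acts as a meaningful safeguard.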