AI_Safety_Challenge___Synthetic_Cancer___Omitted
Disclosure
The findings presented in this paper are intended solely for scientific and research purposes. We are fully aware that this paper presents a malware type with great potential for abuse. We are publishing this in good faith and in an effort to raise awareness. We strictly prohibit any non-scientific use of these findings.

2 Worm Replication
Using an LLM (in our case, GPT-4), the worm replicates itself file by file, completely rewriting its own source code and ensuring its signature is different to avoid detection. This means every infected target has a different, possibly unique variant of the worm, making signature-based detection infeasible.
References
2023. A look at Apple’s new Transformer-powered predictive text model. [Online; accessed 26 Oct. 2023].
2023. Alex Bilzerian on X. [Online; accessed 18 Oct. 2023].
Arthur Erzberger. 2023. WormGPT and FraudGPT – The Rise of Malicious LLMs. Trustwave Holdings, Inc.
1 We have in fact observed GPT-4 detecting such requests. See Appendix.
2 Outlook must not be running for the attack to work.
A Disclosure Policy
We recognize the potential for misuse of the methods outlined in our work. However, given that these systems were previously suggested (Bil, 2023), we have decided to make our paper publicly accessible. We are convinced that it is of crucial importance to have an open academic dialogue about such systems to advance the state of threat identification and elimination in a rapidly changing threat landscape.
The findings presented in this paper are intended solely for scientific and research purposes. We explicitly prohibit any non-scientific use of these findings!
Furthermore, we aim to prevent misuse with the following safeguards:
• Code not shared. The only exceptions are researchers investigating the potential effects of AI-augmented malware, after they have signed an ethics statement, or for reproduction.
• Certain points in the paper above have been left vague on purpose so as not to provide direct instructions on building AI-augmented malware.
• We will report this vulnerability to public incident databases (such as https://incidentdatabase.ai/, https://openai.com/form/model-behavior-feedback or similar) if applicable.

B Details of Prototype
For a demonstration of the prototype, see https://youtu.be/RENigbqPfYI.
Please keep in mind that the prototype serves as a proof of concept and was a weekend project. As a result, there are many areas of possible future research.

B.1 Technical Prototype Design
Section omitted in public release.

B.2 Restrictions & Areas for Future Improvement

B.2.1 Packaging
We have experimented with packaging up all dependencies in a .exe file using tools such as PyInstaller. However, Windows Defender would often detect the .exe file as it was not signed and its hash is not known to Microsoft. As a result, we ship the worm as a .bat file which sets up the Python environment and a runner.py file which contains the worm logic. Packaging the worm is a possible area of future experimentation.

B.2.2 Cloning Survival
Malformed Code. Sometimes the LLM embeds the code into a Markdown reply or changes the semantics of the program. This used to be a continuous challenge during development but, thanks to prompt engineering, is now under control. However, when the worm spreads and rewrites itself many times, including rewriting the embedded prompt, this becomes a key issue.

GPT-4 Discovering Unethical Behavior. The code cloning is done using GPT-4. As the code is cloned, so are the prompts given to the language model, which are embedded in the code. We were positively surprised to learn that GPT-4 sometimes realizes what it is being used for and, instead of rewriting the code as instructed, replies as follows:

    I must clarify that the code provided appears to engage in unethical behavior – specifically, crafting emails to encourage recipients to open an attached ZIP file that is suggestive of containing malicious content (mentioned as a "worm" or ".bat file"). Furthermore, the use of AI to generate content for such purposes is also against the use policies of AI platforms, including OpenAI.

    Using AI or any software tool to aid in cyber attacks, including sending phishing emails or propagating malware, is illegal and violates computer misuse laws in many jurisdictions. I must also refrain from aiding the development, refactoring, or advice on executing such activities.

    It’s crucial to understand the legal and ethical implications of writing and distributing code. Using one’s skills for constructive and positive contributions to society is important,
    and I encourage you to use your abilities for lawful and beneficial purposes. If you have any other code that does not involve unethical practices, I’d be more than happy to help with refactoring that.

Interestingly, this happens in a two-stage process. In the first rewrite, GPT-4 rewrites the code as requested but renames some variables, e.g. to "WORM". In a second replication, GPT-4 detects this "WORM" and refuses to rewrite the code.

Circumvention of Safety Features. We have seen some second-degree replications where safety features, especially only replying to a single email chain, were changed to replying to every single email chain available in the inbox. We find this very dangerous and as a result have stopped the chain at this point.

Survival of Infectious Chain. We have constrained ourselves to (I) exclusively using OpenAI language models, and (II) only replying to a single conversation each time. In this set-up, we have only been able to demonstrate worms spreading across two hosts but not replication onto a third host. We theorize that if our safety features were not enforced, a more liberal and possibly more powerful LLM3 were used, and the number of email replies were increased (5 to 10 instead of a single one), exponentially growing infection chains would be possible4.
We refrain from giving concrete numbers as we do not have concrete data on this, but we estimate that the chance of a second rewrite failing (i.e. resulting in non-executable or safety-feature-violating code being attached to the email) is approximately one half.

3 In the future.
4 Given a fixed success probability, one can boost it by increasing the number of emails to spread, every single spread being independent of the others.

C Socially Engineering User
At the core of the spreading mechanism is the ability to socially engineer users to open links in a custom drafted reply to a message chain. Assume we are given the following message chain:

Subject: Invitation: Paul’s 38th Birthday Celebration
From: [email protected]
To: [email protected]
Date: October 23, 2023

Dear Claire,
I hope this message finds you well. As the calendar pages flip, another year has passed and I’m about to turn 38. While it’s not one of those monumental milestones, I still believe in celebrating the little moments. I’m planning a small get-together and would be honored if you could join.
Details:
Date: November 5th, 2023
Time: 6:30 pm onwards
Location: My residence, 123 Elm Street, Maplewood
Dress code: Casual
It’ll be an evening filled with good company, food, and music from our favorite era, the 80s. A trip down memory lane, if you will.
I’d appreciate it if you could RSVP by October 30th. Looking forward to sharing this special day with you.
Best regards, Paul

Subject: Re: Invitation: Paul’s 38th Birthday Celebration
From: [email protected]
To: [email protected]
Date: October 24, 2023

Hello Paul,
Thank you for the thoughtful invitation. I’m glad to hear you’re marking the occasion and I’d be delighted to join you on the 5th. It’s always a pleasure to reminisce about the good old times, especially with music from the 80s as a backdrop.
Do let me know if you’d like me to bring anything along—perhaps a bottle of wine or a dessert?
Dear Claire,
[Attachment: 80s_Nostalgia_Playlist.exe]