0% found this document useful (0 votes)
7 views

GASH Review3

The document proposes a honeypot system called GASH that can dynamically adapt and engage with attackers through machine learning. It summarizes previous work, outlines the proposed methodology, and describes the architecture and logical data flow of GASH. GASH aims to intelligently interact with attackers while maintaining security through adaptive response mechanisms and behavioral analysis powered by machine learning.

Uploaded by

Apoorva VH
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

GASH Review3

The document proposes a honeypot system called GASH that can dynamically adapt and engage with attackers through machine learning. It summarizes previous work, outlines the proposed methodology, and describes the architecture and logical data flow of GASH. GASH aims to intelligently interact with attackers while maintaining security through adaptive response mechanisms and behavioral analysis powered by machine learning.

Uploaded by

Apoorva VH
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 36

UE21CS320A – Capstone Project Phase-1 Review #3

(High Level Design and Proposed Methodology)

Project Title : GASH - Generative Adaptive Self-Learning Honeypot


Project ID : 80
Project Guide : Dr. Sarasvathi V.
Project Team : Apoorva V.H - PES2UG21CS085
Nikhil Girish - PES2UG21CS334
Pragna Prasad - PES2UG21CS379
Prem Kumar S - PES2UG21CS397
Outline

▪ Abstract
▪ Summary of Literature Survey
• Suggestions from Review – 2
• Proposed Methodology / Design Approach
• Architecture
• Design Description
• Technologies Used
• Project Progress
• References
Abstract

● Traditional honeypots lack the ability to dynamically adapt


and engage with sophisticated threats like zero day attacks,
leaving network infrastructures vulnerable to evolving cyber
threats.

● There is a critical need for the development of a honeypot


system with higher interaction and the ability to adapt and
learn from the attackers’ strategies.

● Such a system would intelligently engage with malicious


actors while maintaining robust self-defense mechanisms.
Abstract

● The "Generative and Adaptive Self-Learning Honeypot" (GASH)


proposes a comprehensive solution to the challenge of
balancing interaction and security in honeypot deployment.

● Through dynamic interaction levels, adaptive response


mechanisms, and behavioral analysis powered by machine
learning, GASH intelligently adjusts its behavior to simulate
authentic systems while actively monitoring for threats and
adjusting responses accordingly.
Summary of Literature Survey in Review 2

● Adaptive honeypots like Asgard[12] demonstrate superior


effectiveness in collecting high-quality attacker data and evading
detection compared to conventional honeypots.
However, limitations such as potential observation constraints and
vulnerability to fingerprinting by attackers are noted.

● SSH and Telnet Protocol Analysis[2][6]:


The SSH and Telnet protocol analysis conducted through
honeypot-based techniques contributes to a deeper understanding
of attacker strategies and vulnerabilities in these services.
Insights gained from the analysis can inform the development of
targeted risk mitigation strategies and bolster overall cybersecurity
defenses against SSH and Telnet protocol-related threats
Summary of Literature Survey in Review 2

● Honeypot Realism Enhancement:


Existing honeypots often lack the realism needed to effectively
engage human attackers, diminishing their effectiveness.

● Adaptability with LLMs[1]:


LLMs enable on-demand generation of fake file systems and
command responses, allowing honeypots to dynamically adapt to
attacker actions.
This adaptability significantly enhances honeypot deception
capabilities, improving cybersecurity defenses.
Empirical evidence from human attacker experiments supports the
effectiveness of LLM-based honeypots.
Suggestions from Review – 2

● Focus on Implementation: Explore how various concepts and


methodologies are practically implemented in existing
research, gain insights into the challenges, best practices, and
lessons learned.

● Facilitating Project Development: Study the implementation


details of different papers to facilitate development of the
project, especially in terms of design decisions, tool selection,
and integration strategies. Understand how existing solutions
tackle practical challenges, to take an informed approach in
building a more robust and effective system.
Design Details

1. Novelty:
Unlike static approaches, GASH innovatively integrates reinforcement
learning models with the OpenAI API to dynamically enhance deception
tactics against attackers.

2. Innovativeness:
This project uniquely integrates RL-based attack identification with
advanced language models to compose fault-proof responses, effectively
deceiving attackers while maintaining operational security.

3.Interoperability:
Interoperability is ensured by adhering to the SSH protocol, facilitating
seamless communication between components. Standardized APIs are
provided for integration with external systems.
Design Details

4. Portability:
As the proposed solution, GASH, is not a physical system used as a
honeypot but instead a virtual one, it provides greater portability over
existing high-interaction honeypot systems which are typically
full-scale systems.

5. Security
As GASH is a virtual high-interaction honeypot, intended to run inside
a VM, any potential attacker would be sandboxed and unable to have
an impact on real systems/host machines. This is a step above
traditional high-interaction honeypots which are full systems, that can
prove vulnerable.
Proposed Methodology / Approach
Current System :

● The Cowrie honeypot operates by emulating SSH services to attract and


monitor potential attackers. It simulates a vulnerable SSH server,
enticing attackers to interact with the system.

● When attackers connect to the Cowrie honeypot, their activities,


commands, and interactions are logged and monitored for analysis. The
honeypot captures information such as login attempts, commands
executed, and potential exploits used by attackers. By mimicking a real
SSH server, the Cowrie honeypot aims to gather valuable insights into
attacker behavior, tactics, and techniques without exposing any real
systems to risk. This data can then be used for threat intelligence,
understanding attack patterns, and enhancing cybersecurity defenses
Proposed Methodology / Approach

Proposed System:

GASH incorporates key components and functionalities such as input


handling, attack type identification using a pre-trained RL model,
deception mechanism via the OpenAI API, comprehensive response
composition, real-time monitoring and alerting, and detailed
reporting for each detected attack. By providing dynamic, adaptive,
and proactive cybersecurity measures, GASH enables organizations to
effectively defend against evolving threats in the digital landscape.
Architecture
Proposed Methodology / Approach

Actions:
● Allow: This action allows the execution of a command inserted by an attacker.

● Block: This action blocks the execution of the command and for each command, provides
specific blocking messages. E.g: blocking a wget command can lead an attacker to use an
alternative repository and this may reveal another attacker controlled location. This is
viable as it has been proved that the lifetime of malicious code repositories is ~1h so it is
not unusual for GASH to return an error code.

● Fake output: The output of a command is faked, and for each command, provides specific
faked output. The stored fake output is a modified copy of a normal one and for different
types of commands it is listed line by line. E.g: w can have a faked output that displays
other users that do not actually exist.
Proposed Methodology / Approach

Actions:
● Insult: This action insults the attacker, and each attacker gets geo-localized via their IP
address and an insult message stored in the database in the native language will be printed on
the shell. The command will not get executed. This action mainly serves as reverse Turing
Test. The purpose of such a test is to discover whether an action is being performed by a
human being or an automated tool. E.g: An attacker has downloaded a customized tool and
wants to execute it. GASH could reply: Is this all what you want to do?

● Delay: This action intends to delay the execution of a command. Its utility resides in the fact
that attackers might consider the system being exhausted and will try to download other tools
that might have less resources usage, giving us more information about the methods and tools
that attackers use. Implementing this action would be equivalent to a sleep line code with a
specific interval in the loop which upon its expiration, the command is executed.
Proposed Methodology / Approach
Logical Dataflow:

● The attacker attempts an input and GASH passes this on to get


pre-processed and fed to the RL model

● This RL model has been pre-trained on existing datasets which have also
already been pre-processed

● Now, this RL model takes a decision (which is one of the following:


allow, block, delay, insult and fake)

● This decision is passed back down to GASH


Proposed Methodology / Approach

Logical Dataflow:

● GASH now sends this decision as well as the attacker’s input data to the
backend OpenAI API

● The API processes a carefully constructed response that ensures that


GASH is high interaction

● This response is then shown to the attacker


Proposed Methodology / Approach

Logical Dataflow:

● The same response is also simultaneously arranged into a logfile which


includes other data fields like the attacker’s input, necessary
timestamps, geolocation data, protocols used, source and destination
information etc

● This logfile is then sent to the live monitoring service (web server)
through which admins can gain insight into attacker behaviour
Proposed Methodology / Approach

Possible challenges:

- Efficient scalability: Scaling the project to incorporate larger networks and a


large number of attacks simultaneously

- Lack of computational resources: We may not have the resources to deploy


this at a higher real-world level with huge incoming and outgoing data traffic

- Training bias: GPT3.5 has only been trained with data up to 2021.The impact
of this cutoff is that any bias present in the training data will be inherited by
the model such as outdated log contents or lack of modern interactions.
Architecture

Logical User Groups:


Cybersecurity Professionals, Network Administrators, Security Analysts

Characteristics:
Expertise: Users must possess cybersecurity, network administration, or
security analysis skills to manage GASH effectively.
Technical Proficiency: Strong understanding of network protocols,
vulnerabilities, and attack techniques is required for interpreting GASH
reports and implementing countermeasures.
Analytical Skills: Proficiency in interpreting real-time monitoring data
and identifying threats is essential for proactive risk mitigation.
Architecture

Application Components:

GASH System: Core component responsible for deploying, monitoring,


and analyzing SSH interactions.
RL Model: Utilized for attack identification and decision-making.
OpenAI API : Generates deceptive responses to deceive attackers.
Network Infrastructure: Represents the environment where the GASH
system is deployed.
Architecture

Data Components:

● Attacker Interaction Data: Captures interactions initiated by


attackers.
● RL Model Output: Contains decisions and predictions made by the RL
model.
● OpenAI-Generated Prompts: Text prompts generated by the OpenAI
API for deception.
● Log Data: Records system activities, attacker behavior, and security
events.
Design Description

Master Class Diagram


Design Description

Entity-Relationship Diagram
Design Description

Use Case Diagram


Design Description

External Interfaces:
● Attacker Interface: This is a simple interface exposed to the internet.
Potential attackers will interact with GASH through this interface. They
can submit commands or attempt various attacks. This interface should be
designed to be deceptive, mimicking a real system but not revealing any
sensitive details about the actual underlying system.

● System Administration Interface: This interface would be used by system


administrators to monitor GASH's operation, view attack logs, and manage
configurations. It would likely be a web-based interface accessible only to
authorized users with proper credentials.
Design Description

External Interfaces
● OpenAI API: GASH interacts with OpenAI's API to generate responses that
are tailored to the attacker's input and the RL model's decision. This API is
external to GASH and provides functionalities for creative text generation.

● Live Monitoring Service (Web Server):GASH logs detailed information about


each interaction with an attacker.This log data is sent to a live monitoring
service, likely a web server, for further analysis and visualization.

● System administrators can access this service to gain insights into attacker
behavior and identify potential threats.
Technologies Used

Databases like SQLite or MongoDB to store captured logs from attacks on


the honeypot.

Tools and Libraries: Python 3.10 onwards with libraries like scapy for
packet manipulation, and other relevant tools for network monitoring
and analysis like Wireshark 4.0 onwards

Network Protocols: TCP/IP stack for communication over the Internet.

Socket Programming: Utilized for handling network connections.


Technologies Used

SSH Protocol: Fundamental for the operation of the SSH honeypot,


facilitating communication with potential attackers.

Reinforcement Learning (RL): Used for attack identification and


decision-making within the system, enabling adaptive responses to
evolving threats.

OpenAI API Integration: Utilized for generating deceptive responses,


enhancing the system's ability to deceive attackers.

Logging and Monitoring Tools: Employed to track system activity, analyze


logs, and detect anomalous behavior.
Capstone (Phase-I & Phase-II) Project Timeline

Provide
• The timelines for execution of the project through Gantt
chart.
• The plan in terms of efforts by individuals in the team.
• Mention the tasks involved in different stages.
Conclusion

Need for Adaptability: Traditional honeypots are insufficient in


dynamically responding to evolving threats, highlighting the necessity
for more adaptive solutions like GASH.

Dynamic Interaction: GASH introduces dynamic interaction levels and


adaptive responses, marking a significant departure from static
honeypot models.

Machine Learning Integration: The incorporation of machine learning


enables GASH to analyze attacker behavior and adjust responses
accordingly, enhancing its effectiveness in threat detection and
mitigation.
Conclusion

Comprehensive Components: GASH integrates various components,


including input handling, reinforcement learning-based attack
identification, and OpenAI deception mechanisms, to provide a
comprehensive defense strategy.

Identified Challenges: Scalability concerns, resource limitations, and


potential biases in training data are recognized challenges that require
further attention for optimizing GASH's performance in practical
deployment scenarios.
References
1. Sladić, M., Valeros, V., Catania, C., and Garcia, S., “LLM in the Shell: Generative Honeypots”
https://arxiv.org/abs/2309.00155

2. Harry Doubleday, Leandros Maglaras and Helge Janicke, “SSH Honeypot: Building, Deploying
and Analysis” International Journal of Advanced Computer Science and Applications(ijacsa),
7(5), 2016. http://dx.doi.org/10.14569/IJACSA.2016.070518

3. A. Pauna and I. Bica, "RASSH - Reinforced adaptive SSH honeypot," 2014 10th International
Conference on Communications (COMM), Bucharest, Romania, 2014, pp. 1-6, doi:
10.1109/ICComm.2014.6866707. https://ieeexplore.ieee.org/document/6866707

4. A. Pauna, A. -C. Iacob and I. Bica, "QRASSH - A Self-Adaptive SSH Honeypot Driven by
Q-Learning," 2018 International Conference on Communications (COMM), Bucharest,
Romania, 2018, pp. 441-446, doi: 10.1109/ICComm.2018.8484261.
https://ieeexplore.ieee.org/abstract/document/8484261
References

5. Á. Balogh, M. Érsok, A. Bánáti and L. Erdődi, "Concept for real time attacker profiling with
honeypots, by skill based attacker maturity model," 2024 IEEE 22nd World Symposium on Applied
Machine Intelligence and Informatics (SAMI), Stará Lesná Slovakia, 2024, pp. 000175-000180, doi:
10.1109/SAMI60510.2024.10432876 https://ieeexplore.ieee.org/document/10432876

6. J. S. Lopez–Yepez and A. Fagette, "Increasing attacker engagement on SSH honeypots using


semantic embeddings of cyber-attack patterns and deep reinforcement learning," 2022 IEEE
Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 389-395,
doi: 10.1109/SSCI51031.2022.10022206. https://ieeexplore.ieee.org/document/10022206

7. M. A. Kristyanto, H. Studiawan and B. A. Pratomo, "Evaluation of Reinforcement Learning


Algorithm on SSH Honeypot," 2022 6th International Conference on Information Technology,
Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 2022, pp.
346-350, doi: 10.1109/ICITISEE57756.2022.10057816.
https://ieeexplore.ieee.org/document/10057816
References

8. M. Boffa, G. Milan, L. Vassio, I. Drago, M. Mellia and Z. Ben Houidi, "Towards NLP- based
Processing of Honeypot Logs," 2022 IEEE European Symposium on Security and Privacy
Workshops (EuroS&PW), Genoa, Italy, 2022, pp. 314-321, doi:
10.1109/EuroSPW55150.2022.00038. https://ieeexplore.ieee.org/document/9799396

9. J. Buzzio-Garcia, "Creation of a High-Interaction Honeypot System based-on Docker containers,"


2021 Fifth World Conference on Smart Trends in Systems Security and Sustainability (WorldS4),
London, United Kingdom, 2021, pp. 146-151, doi: 10.1109/WorldS451998.2021.9514022.
https://ieeexplore.ieee.org/document/9514022

10. K. Ramakrishnan, P. Gokul and R. Nigam, "Pandora: An IOT based Intrusion Detection Honeypot
with Real-time Monitoring," 2021 International Conference on Forensics, Analytics, Big Data, Security
(FABS), Bengaluru, India, 2021, pp. 1-7, doi: 10.1109/FABS52071.2021.9702656.
https://ieeexplore.ieee.org/document/9702656
References
11. Provos, Niels & Mcnamee, Dean & Mavrommatis, Panayiotis & Wang, ke & Google, Nagendra. (2007).
The Ghost In The Browser Analysis of Web-based Malware.
https://www.researchgate.net/publication/228632321_The_Ghost_In_The_Browser_Analysis_of_Web-base
d_Malware

12. Touch, S.; Colin, J.-N. A Comparison of an Adaptive Self-Guarded Honeypot with Conventional
Honeypots. Appl. Sci. 2022, 12, 5224. https://doi.org/10.3390/app12105224 https://www.mdpi.com/1641818

13. D. Fraunholz, M. Zimmermann and H. D. Schotten, "An adaptive honeypot configuration, deployment
and maintenance strategy," 2017 19th International Conference on Advanced Communication Technology
(ICACT), PyeongChang, Korea (South), 2017, pp. 53-57, doi: 10.23919/ICACT.2017.7890056.
https://ieeexplore.ieee.org/document/7890056

14. B. Wang, Y. Dou, Y. Sang, Y. Zhang and J. Huang, "IoTCMal: Towards A Hybrid IoT Honeypot for
Capturing and Analyzing Malware," ICC 2020 - 2020 IEEE International Conference on Communications
(ICC), Dublin, Ireland, 2020, pp. 1-7, doi: 10.1109/ICC40277.2020.9149314.
https://ieeexplore.ieee.org/document/9149314
Thank
You

You might also like