Enhancing Detection of Spyware and Infostealers

Abdullah Jafar, Motaz Elzughbi, Mohammad Afonah

Abstract
This paper introduces a pragmatic and efficient approach to fortify the detection
of spyware and infostealers through the implementation of a handmade network
traffic monitoring tool. The escalating sophistication of cyber threats necessitates
innovative solutions that are resource-efficient and easily deployable, making our
tool particularly suitable for diverse network environments.
The proposed tool employs a combination of packet inspection, protocol analysis,
and anomaly detection techniques to scrutinize network traffic patterns. By
focusing on the identification of irregularities and deviations from normal
communication behavior, the tool aims to provide an effective and non-intrusive
means of detecting malicious activities associated with spyware and infostealers.
To validate the practical efficacy of the tool, extensive testing was conducted
using diverse datasets containing known spyware and infostealer samples. Results
demonstrate a significant improvement in detection accuracy compared to
traditional methods, showcasing the tool's ability to identify subtle yet indicative
patterns of malicious behavior. Moreover, the tool exhibits a lightweight
footprint, ensuring minimal impact on network resources and operational
efficiency.
The paper discusses the tool's practical implementation, highlighting its scalability
and ease of integration into existing network security architectures. Real-world
deployment scenarios and case studies illustrate the tool's effectiveness across
various network environments, emphasizing its utility as a viable alternative for
organizations seeking a non-machine learning approach to enhance their
cybersecurity posture.

In conclusion, our research underscores the effectiveness of a handmade network
traffic monitoring tool in enhancing the detection of spyware and infostealers. The
tool provides a practical, resource-efficient solution for organizations aiming to
bolster their cybersecurity defenses against evolving cyber threats.

1 Introduction
The ubiquitous and ever-evolving nature of cyber threats, exemplified by the
proliferation of spyware and infostealers, demands continuous innovation in
detection mechanisms to safeguard digital assets and sensitive information. As
organizations strive to fortify their cybersecurity posture, the need for efficient
and pragmatic solutions becomes increasingly apparent. In this context, our paper
delves into a novel approach for enhancing the detection of spyware and
infostealers through the utilization of a handmade network traffic monitoring
tool, deliberately designed without the incorporation of machine learning
algorithms.

Traditional methods of threat detection often fall short in addressing the dynamic
and sophisticated nature of contemporary cyber threats. Signature-based
detection, while effective to a certain extent, struggles to keep pace with the
rapid evolution of malicious tactics. Machine learning, while powerful, introduces
complexities related to resource utilization and deployment overhead.
Recognizing these challenges, our approach aims to provide a streamlined
alternative that prioritizes efficiency and adaptability.

The cornerstone of our proposed methodology lies in the meticulous analysis of
network traffic patterns, an invaluable source of insights into potentially malicious
activities. By crafting a bespoke network traffic monitoring tool, we endeavor to
scrutinize communication behaviors without the reliance on machine learning
algorithms. This deliberate choice seeks to address concerns related to resource
consumption, operational complexities, and the need for continuous training that
often accompanies machine learning-based solutions.

Throughout this paper, we will detail the design principles and functionalities of
our handmade tool, emphasizing its capability to identify anomalies associated
with spyware and infostealers through packet inspection, protocol analysis, and
anomaly detection techniques. Furthermore, extensive testing against diverse
datasets containing known threat samples will be presented to showcase the
tool's effectiveness in a variety of scenarios.

In a cybersecurity landscape characterized by its dynamic nature, our
non-machine-learning approach provides a practical and resource-efficient solution for
organizations aiming to bolster their defenses against the ever-evolving menace
of spyware and infostealers. This paper contributes to the ongoing discourse on
effective cybersecurity strategies by presenting a viable alternative that addresses
the pressing need for reliable threat detection without compromising operational
efficiency.

We now turn to the technical question: what methods do current antivirus
products use to detect malware?

We start with the most common and simplest method: signature-based detection.
It works by comparing a file's signature (typically a cryptographic hash or a
characteristic byte pattern) with a database of known malware signatures. If a
match is found, the file is marked as malware and the antivirus takes action,
such as putting the file in quarantine, stopping any processes the malware has
started, or removing it entirely.

Although simple, this method is very effective at detecting known malware and
exists in almost every antivirus product; it is the keystone and the original
malware-detection method of antivirus software. However, it cannot detect
unknown malware, because it relies entirely on databases of known malware.
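A minimal sketch of signature matching, assuming the simplest kind of signature, a SHA-256 hash of the whole file (real products also use byte patterns and fuzzier signatures). The `KNOWN_BAD_SHA256` database and the `scan_file` helper are hypothetical names chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical signature database of SHA-256 digests of known malware.
# The single entry is the digest of the placeholder bytes b"malicious-sample".
KNOWN_BAD_SHA256 = {hashlib.sha256(b"malicious-sample").hexdigest()}


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_file(path: Path) -> bool:
    """Return True if the file's hash matches a known-malware signature."""
    return sha256_of_file(path) in KNOWN_BAD_SHA256


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"malicious-sample")
        variant = Path(tmp) / "variant.bin"
        variant.write_bytes(b"malicious-sample!")  # one extra byte
        print(scan_file(sample), scan_file(variant))  # True False
```

The one-byte variant is missed because its hash is completely different, which is exactly the weakness against unknown or slightly modified malware described above.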

Figure 1: Signature-based detection.

Another method used by antivirus software is behavior-based detection, which
monitors the actions and activities of files on the system. It can detect malware
that tries to hide its presence or infect other files by modifying their behavior.
For example, behavioral detection can stop a file from deleting itself after
execution, or prevent a file from spreading to other computers on the network.

This method also involves comparing the activity of the suspected file against
a database of known malicious behaviors. It is more flexible than
signature-based detection, since it can detect unknown malware by monitoring
what that malware actually does.

Behavioral detection is one of the most effective ways to protect against
advanced threats such as zero-day malware.

Of course, no method is free of defects. This one can generate false positives,
which occur when a legitimate file that contains no malware is marked as
malware because it behaves in a way similar to malware.
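The behavioral approach, and how it produces false positives, can be illustrated with a toy rule engine. The event names, weights, and threshold below are hypothetical; real products observe system and API calls, file and registry activity, and network connections rather than ready-made labels.

```python
from dataclasses import dataclass

# Hypothetical behaviour rules and weights (higher = more suspicious).
SUSPICIOUS_BEHAVIOURS = {
    "reads_browser_credential_store": 40,
    "hooks_keyboard_input": 35,
    "sends_data_to_unknown_host": 30,
    "deletes_own_executable": 25,
    "adds_autorun_entry": 20,
}
ALERT_THRESHOLD = 60  # hypothetical cut-off


@dataclass
class Verdict:
    score: int
    malicious: bool
    matched: list[str]


def assess(observed_events: list[str]) -> Verdict:
    """Score one process by the distinct known-suspicious behaviours it exhibited."""
    matched = sorted({e for e in observed_events if e in SUSPICIOUS_BEHAVIOURS})
    score = sum(SUSPICIOUS_BEHAVIOURS[e] for e in matched)
    return Verdict(score, score >= ALERT_THRESHOLD, matched)


# An infostealer: reads saved browser passwords, then ships them off.
stealer = assess(["reads_browser_credential_store", "sends_data_to_unknown_host"])
# A legitimate password manager syncing to a new cloud endpoint does the same
# things, so it is flagged too: a false positive.
password_manager = assess(["reads_browser_credential_store",
                           "sends_data_to_unknown_host", "shows_main_window"])
print(stealer.malicious, password_manager.malicious)  # True True
```

Raising the threshold would spare the password manager but also let the stealer through, which is the trade-off every behavioral engine has to tune.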

Figure 2: Behavioral detection process

2 Related Works

In this section we review related research that has introduced modern
techniques for detecting malware, some of which have since been adopted by
several antivirus products.

I. Dynamic Spyware Analysis by Manuel Egele, Christopher Kruegel, Engin Kirda,
Heng Yin, and Dawn Song.

The paper describes a dynamic analysis approach to identify spyware.


The paper discusses the limitations of traditional anti-spyware tools,
which operate by checking unknown programs against signatures
associated with known spyware instances. These techniques cannot
identify unknown spyware, require frequent updates to signature
databases, and are easy to evade through code obfuscation.

The authors propose a new dynamic analysis approach that precisely tracks the
flow of sensitive information as it is processed by the web browser.

They also propose a set of analysis techniques to identify which programs
access sensitive data and to track the flow of this data to other components of
the system.

Their system is able to identify unknown spyware and provide reports on its
behavior, including the types of sensitive data that it accesses and sends, the
websites that it visits, and the other components of the system that it
interacts with.
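The principle of tracking sensitive information flow (taint tracking) can be shown in a few lines. This is only a toy, value-level illustration under our own assumptions: the system of Egele et al. tracks information flow at a far lower level than Python objects, and the `Tainted` wrapper, the `send_to_network` sink, and the scenario are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A string plus the set of sensitive sources it was derived from."""
    value: str
    sources: frozenset[str]

    def __add__(self, other: "Tainted | str") -> "Tainted":
        # Taint propagates: the result carries the sources of both operands.
        if isinstance(other, Tainted):
            return Tainted(self.value + other.value, self.sources | other.sources)
        return Tainted(self.value + other, self.sources)

    def __radd__(self, other: str) -> "Tainted":
        # Handles "plain string" + Tainted.
        return Tainted(other + self.value, self.sources)


def source(value: str, label: str) -> Tainted:
    """Mark data read from a sensitive location (e.g. the URL the user typed)."""
    return Tainted(value, frozenset({label}))


alerts: list[str] = []


def send_to_network(data: "Tainted | str", component: str) -> None:
    """Sink: record an alert when tainted data is about to leave the machine."""
    if isinstance(data, Tainted) and data.sources:
        alerts.append(f"{component} leaked {sorted(data.sources)}")


# A hypothetical browser add-on reports every visited URL to its server.
visited = source("https://bank.example/login", "visited_url")
send_to_network("GET /collect?u=" + visited, "toolbar_addon")
send_to_network("GET /update-check", "toolbar_addon")  # untainted: no alert
print(alerts)  # ["toolbar_addon leaked ['visited_url']"]
```

The report names both the leaking component and the kind of data it leaked, mirroring the kind of behavior report described above.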

II. Integrated Static and Dynamic Analysis for Malware Detection by Li et al.

This paper proposes a new spyware detection method that combines static and
dynamic analysis to improve detection accuracy.

The static analysis phase identifies suspicious code patterns in the program's
binary code, while the dynamic analysis phase monitors the program's behavior
while it is running.

The two phases are then combined to produce a final decision about
whether the program is malicious or not.

The system was tested on a dataset of 500 spyware and 500 legitimate
programs, and it achieved an accuracy of 98.7%.

This is significantly higher than the accuracy of either static or dynamic
analysis alone.
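The summary above does not say how Li et al. merge the two phases. One simple possibility, assumed here purely for illustration, is a weighted average of a static suspicion score and a dynamic one; the weights, threshold, and function name are hypothetical.

```python
def combine_scores(static_score: float, dynamic_score: float,
                   w_static: float = 0.4, w_dynamic: float = 0.6,
                   threshold: float = 0.5) -> tuple[float, bool]:
    """Fuse two per-phase suspicion scores in [0, 1] into one verdict.

    The weights and threshold are illustrative; a real system would tune
    them on labelled spyware and benign samples.
    """
    for score in (static_score, dynamic_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    combined = w_static * static_score + w_dynamic * dynamic_score
    return combined, combined >= threshold


# An obfuscated binary: little is visible statically, but its runtime
# behaviour is clearly suspicious.
score, flagged = combine_scores(static_score=0.2, dynamic_score=0.9)
print(f"{score:.2f} {flagged}")  # 0.62 True
```

Neither phase has to be conclusive on its own: a binary that hides from static inspection through obfuscation can still be caught by what it does at runtime, which is one intuition for why a combination can beat either phase alone.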

III. Detection of Spyware by Mining Executable Files by Raja Khurram Shahzad,
Syed Imran Haider, and Niklas Lavesson.

The paper proposes a new approach to spyware detection using data
mining techniques.

Data mining is a process that involves extracting patterns and knowledge from
large datasets. In the context of spyware detection, it can be used to identify
patterns in executable files that are indicative of spyware activity.

The method was tested on a dataset of 137 executable files and achieved an
accuracy of 94%, which is significantly better than the accuracy of traditional
spyware detection methods.

The method is promising for spyware detection and has the potential to
provide effective protection against new spyware threats.
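A common way to turn executables into data that mining algorithms can work with is to count byte n-grams (overlapping byte sequences). The sketch below is a generic illustration under that assumption; the exact features and classifiers used by Shahzad et al. are not described above, and the function names are our own.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter[bytes]:
    """Count every overlapping n-byte sequence in a binary."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def top_features(samples: list[bytes], n: int = 4, k: int = 5) -> list[bytes]:
    """Choose the k n-grams present in the most samples (document frequency)."""
    doc_freq: Counter[bytes] = Counter()
    for data in samples:
        doc_freq.update(set(byte_ngrams(data, n)))
    return [gram for gram, _ in doc_freq.most_common(k)]


def to_vector(data: bytes, features: list[bytes], n: int = 4) -> list[int]:
    """Encode one executable as a presence (1) / absence (0) vector over the features."""
    grams = byte_ngrams(data, n)
    return [1 if f in grams else 0 for f in features]


# Two toy "executables" sharing the MZ header bytes.
samples = [b"MZ\x90\x00\x03\x00", b"MZ\x90\x00\xff\xff"]
features = top_features(samples, n=4, k=1)
print(features, to_vector(samples[0], features))  # [b'MZ\x90\x00'] [1]
```

In a full pipeline these vectors would be passed to a classifier trained on labelled spyware and benign samples.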

IV. Spyware Detection by Extracting and Selecting Features in Executable Files
by Mohamed Adel Sheta, Mohamed Zaki, Kamel Abd El Salam El Hadad, and
H. Aboelseoud M.

It presents a new method of extracting and selecting features from
executable files, based on the uniqueness and frequency of the features
in each class type.

The features are extracted from the binary code of the executables,
without requiring any prior knowledge or analysis of the spyware
behavior.

An evaluation of the proposed method on a dataset of 1,000 executables shows
that it achieves high accuracy, with an average accuracy of 98.6%.

It also demonstrates that the proposed method can detect new and
unknown spyware, as well as new versions of existing spyware.

Figure 5: Extracting and selecting features in executable files.
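The idea of ranking features by how frequent and how class-unique they are can be sketched as follows. The scoring rule (frequency in the spyware class minus frequency in the benign class) is our own simplified assumption, not the exact criterion of Sheta et al., and the API-name features are hypothetical.

```python
from collections import Counter


def class_frequency(samples: list[set[str]]) -> Counter[str]:
    """Fraction of the samples in one class that contain each feature."""
    counts: Counter[str] = Counter()
    for features in samples:
        counts.update(features)
    total = len(samples) or 1  # avoid dividing by zero for an empty class
    return Counter({feature: count / total for feature, count in counts.items()})


def select_features(spyware: list[set[str]], benign: list[set[str]], k: int = 3) -> list[str]:
    """Rank features by (frequency in spyware) - (frequency in benign); keep the top k."""
    spy_freq, benign_freq = class_frequency(spyware), class_frequency(benign)
    scores = {feature: spy_freq[feature] - benign_freq[feature] for feature in spy_freq}
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


# Hypothetical features: Windows API names imported by each sample.
spyware = [{"GetAsyncKeyState", "InternetOpenUrlA", "CreateFileW"},
           {"GetAsyncKeyState", "CreateFileW"}]
benign = [{"CreateFileW"}, {"CreateFileW", "InternetOpenUrlA"}]
print(select_features(spyware, benign, k=1))  # ['GetAsyncKeyState']
```

`CreateFileW` appears in every sample of both classes, so it scores zero: it is frequent but not unique, and therefore carries no discriminating power.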

V. Spyware Detection and Prevention using Deep Learning, AI for user
Applications by SV Mahesh, Sumithra Devi KA.

The paper proposes a new method to detect and prevent spyware attacks on user
applications using deep learning and artificial intelligence.

Machine learning is a type of artificial intelligence that uses algorithms and
data to create models that can learn from data and perform tasks without human
interaction.

Deep learning is a type of machine learning that uses artificial neural
networks to model and solve complex problems, allowing it to uncover hidden
patterns in large datasets.

The paper introduces a new framework called SPY-DL that uses deep
learning models to classify executables as spyware or benign, based on
the features extracted from the binary code and the application
permissions.

The framework also uses artificial intelligence techniques to generate
countermeasures against spyware attacks, such as blocking network access,
deleting files, or uninstalling applications.

The proposed framework achieves 99.2% accuracy in detecting spyware.

3 Methodology and Experiments


Our study focuses on catching malicious software by closely observing how
computers communicate with each other. Instead of using complex machine
learning methods, we have built our own tool, which monitors the flow of
information across the network, somewhat like Wireshark but with our own
approach. Our main aim is to improve our ability to identify suspicious behavior.
3.1 Malware Used (Spyware):
RedLine Stealer is malware that can collect users' confidential information and
deliver other malicious programs. The stealer's availability and flexibility
cause financial loss and data leakage, targeting both enterprise and personal
devices. The healthcare and manufacturing sectors suffer the most from these
attacks. According to Proofpoint's investigation, the malware appeared in March
2020, and RedLine has only gained momentum since then. It was on the rise during
the COVID-19 pandemic and is still active. On July 1, 2021, the malware was found
on a legitimate-looking website that offered privacy tools; however, payload
analysis showed that only malware could be found there.

3.2 Python Libraries Required:


1) Scapy:
● Description: Scapy is a library for packet manipulation and network
analysis. It allows users to capture and forge network packets, making it an
essential tool for network traffic monitoring. In this methodology, Scapy is
employed to sniff packets and extract relevant information from various
network protocols.

2) Capstone:
● Description: Capstone is an open-source disassembly framework that
provides a simple interface for disassembling binary code. It supports
various architectures and is used in this methodology for disassembling
executable files. Capstone aids in analyzing the assembly instructions to
understand the logic and functionality of the code.

3) PEfile:
● Description: PEfile is a Python module designed to parse Portable
Executable (PE) files, commonly used in Windows. It provides an interface
to analyze the internal structure of executable files, extracting information
such as sections, imports, and exports. PEfile is crucial for understanding
the composition of executable binaries.

4) Cryptodome (Cryptodomex):

● Description: Cryptodome is a comprehensive Python library for
cryptographic operations. It provides a wide range of cryptographic
algorithms, including hash functions. In this methodology, Cryptodome is
utilized for hashing operations (e.g., SHA256) to generate checksums for
executable files and data chunks.

5) Binwalk:
● Description: Binwalk is a fast, easy-to-use tool for analyzing, reverse
engineering, and extracting firmware images. It can identify and extract
various file types embedded in binary data. In this methodology, Binwalk is
employed to scan executable files for hidden or embedded files, enhancing
the analysis of potential threats.
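The hashing step described for Cryptodome can be sketched as below. To keep the example dependency-free we use Python's standard `hashlib`; with pycryptodomex installed, `SHA256.new(data).hexdigest()` from `Cryptodome.Hash` yields the same SHA-256 digest. The function name, chunk size, and return layout are our own assumptions, not necessarily those of the `get_hashing` function in our script.

```python
import hashlib


def hash_executable(data: bytes, chunk_size: int = 4096) -> dict:
    """Whole-file MD5/SHA-1/SHA-256 digests plus a SHA-256 per fixed-size chunk.

    Per-chunk digests show which regions of two near-identical samples differ.
    """
    result = {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": [],
    }
    for offset in range(0, len(data), chunk_size):
        chunk_digest = hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        result["chunks"].append((offset, chunk_digest))
    return result


info = hash_executable(b"abc")
print(info["sha256"])
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

For a real sample, `data` would be the file's bytes, for example `Path("ExecutableFile.exe").read_bytes()`; the whole-file SHA-256 is also the value one would look up on VirusTotal.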

3.3 Malware Analysis Technique:

Our analysis script examines the "RedLine" malware from several angles:
cryptographic hashing, string extraction, VirusTotal scanning, hidden-file
detection, memory mapping, import-library inspection, and disassembly. The
purpose is to gain insight into the malware's structure, its behavior, and the
threats it poses. The script is organized as follows:

# Interface outline of template.py (method and function bodies omitted)

# Class definition for EXE_INFO
class EXE_INFO:
    def __init__(self, exe_path: str): ...
    def extract_memory_map(self) -> list[dict[str, str]]: ...
    def check_binary(self) -> str: ...
    def extract_imports(self) -> dict[str, str]: ...
    def extract_exports(self) -> list[str] | None: ...
    def find_hidden_files_in_exe(self) -> list[str]: ...
    def find_entry_section(self, sections, base_of_code): ...  # returns a section or None
    def disassembly(self): ...


# Constants and utility functions
def print_red(text): ...
def print_green(text): ...
def print_yellow(text): ...
def print_cyan(text): ...
def get_hashing(exe_chunks) -> dict: ...
def extract_strings_from_exe(exe, min_length=4) -> list[str]: ...
def scan_with_virus_total(check_sum256): ...
def hex_dump(chunks): ...


# Main function
def main(): ...

Running [python3 template.py ExecutableFile.exe] in a Kali Linux terminal starts
the analysis of the given executable file.
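A minimal version of the string-extraction step, in the spirit of the Unix `strings` utility: it returns runs of at least `min_length` printable ASCII characters. The regular expression and the name `extract_ascii_strings` are our own; the script's `extract_strings_from_exe` may differ, and UTF-16 strings, common in Windows binaries, are not covered here.

```python
import re


def extract_ascii_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of at least min_length printable ASCII bytes (0x20-0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


sample = b"MZ\x90\x00This program cannot be run in DOS mode.\x00\x00Stealer\x01hi"
print(extract_ascii_strings(sample))
# ['This program cannot be run in DOS mode.', 'Stealer']
```

Short runs such as "MZ" and "hi" fall below the minimum length and are dropped, which keeps the output focused on meaningful text like URLs, file paths, and error messages.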

Each part of the outline above works as follows:

● Cryptographic Hashing: computes fingerprints (checksums) of the malware file
with several hash algorithms. Like digital signatures, these help identify the
file uniquely.

● String Extraction: extracts readable text snippets from the malware file, like
searching for clues within the file's content that might reveal its purpose or
behavior.

● Hidden File Detection: uses binwalk to check whether anything is concealed
inside the malware, like searching for secret compartments within a physical
object (in our case, the executable file).

● VirusTotal Scanning: looks up the file's SHA-256 checksum on the VirusTotal
service, which aggregates the verdicts of many different antivirus engines, to
see how many of them flag it as harmful, giving us a sense of how dangerous it
might be.

● Hex Dump: lays out the content of the malware in a structured hexadecimal
format, like creating a visual map of the binary data, making it easier to spot
unusual patterns and understand what's going on.

● Memory Mapping: explores how the malware is laid out in the computer's memory.
Like a blueprint, it shows the different sections, their addresses, and their
sizes, revealing how the file is structured.

● Imports Library Inspection: checks which external "helpers" the malware brings
in, like investigating who the malware is partnering with, by analyzing the
external functions and libraries it relies on.

● Disassembly: translates the executable's machine code into human-readable
assembly instructions, like disassembling a machine to see how it works inside,
which helps in understanding the logic and potentially harmful actions embedded
in the code.
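The memory-mapping step amounts to reading the PE section table. The sketch below parses it directly with `struct`, following the PE/COFF layout (the `e_lfanew` pointer at offset 0x3C, the 4-byte "PE\0\0" signature, the 20-byte COFF file header, and 40-byte section headers); in practice, pefile exposes the same information through `pefile.PE(path).sections`. The function name and output keys are our own assumptions.

```python
import struct


def parse_section_table(data: bytes) -> list[dict[str, str]]:
    """List each PE section's name, virtual address/size and raw (on-disk) size/offset."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # 20-byte COFF header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, num_sections, _, _, _, optional_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    table_offset = pe_offset + 4 + 20 + optional_size
    sections = []
    for index in range(num_sections):
        # 40-byte section header; only the first five fields are used here
        name, virtual_size, virtual_address, raw_size, raw_offset = struct.unpack_from(
            "<8sIIIIIIHHI", data, table_offset + 40 * index)[:5]
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_address": hex(virtual_address),
            "virtual_size": hex(virtual_size),
            "raw_size": hex(raw_size),
            "raw_offset": hex(raw_offset),
        })
    return sections
```

On a real sample the call would be `parse_section_table(Path("ExecutableFile.exe").read_bytes())`. A section whose virtual size greatly exceeds its raw size, or one with an unusual name, often indicates a packed binary and is worth a closer look.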

3.4 Network Packet Sniffer Using the Scapy Library
The provided Python script uses the Scapy library to sniff and analyze network
packets. It defines a packet callback function (packet_callback) that is executed
for each intercepted packet. The script prints information about various types of
network packets, including TCP, UDP, ICMP, HTTP, and raw data. Additionally, it
displays a hex dump of each packet.

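import datetime

from scapy.all import ICMP, IP, TCP, UDP, Raw, hexdump, sniff
from scapy.layers.http import HTTPRequest


def print_colored_message(color, message, data=None):
    # ANSI colour codes: 91 = red, 92 = green, 96 = cyan
    print(f"\033[{color}m{message}\033[00m", data if data is not None else "")


def packet_callback(pkt):
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    if IP in pkt:
        ip_src, ip_dst = pkt[IP].src, pkt[IP].dst

        # HTTP travels over TCP, so classify by transport layer; only TCP/UDP carry ports
        if TCP in pkt or UDP in pkt:
            protocol = "TCP" if TCP in pkt else "UDP"
            transport = pkt[TCP] if TCP in pkt else pkt[UDP]
            print_colored_message("92", f"At {now} {protocol} pkt - Source: {ip_src}:{transport.sport} "
                                        f"--> Destination: {ip_dst}:{transport.dport}")
        elif ICMP in pkt:
            print_colored_message("92", f"At {now} ICMP pkt - Source: {ip_src} --> Destination: {ip_dst}")

        if Raw in pkt:
            print_colored_message("96", "Raw Data:", pkt[Raw].load)

        if HTTPRequest in pkt:
            req = pkt[HTTPRequest]
            # Scapy stores HTTP fields as bytes (or None when a field is absent)
            url = (req.Host or b"").decode(errors="replace") + (req.Path or b"").decode(errors="replace")
            method = (req.Method or b"").decode(errors="replace")
            print_colored_message("92", f"At {now} Source: {ip_src} Requested {url} with {method}")

    print_colored_message("91", "Hex Dump : ")
    hexdump(pkt)  # prints offset / hex / ASCII columns for the whole packet


if __name__ == "__main__":
    try:
        # store=False: do not keep captured packets in memory (requires root)
        sniff(prn=packet_callback, store=False)
    except KeyboardInterrupt:
        pass  # recent Scapy versions already catch Ctrl+C and simply return
    print_colored_message("91", "Process Has Been Terminated")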
Using Scapy, we can capture and analyze various types of packets and obtain
detailed information about each one, including timestamps, IP addresses, ports,
and raw data. The script uses color-coded messages for readability and handles
user interruption (Ctrl+C) gracefully.

Running [sudo python3 test.py] in a Kali Linux terminal starts analyzing the
packets passing through the device's network interface (TCP, UDP, ICMP, HTTP,
and HTTP requests), reporting when each packet was seen, its type, and its source
and destination addresses.
The term "Raw Data" refers to the payload or data content carried within the
packet. The code is using the Scapy library to sniff and analyze network packets,
and when a packet has a Raw layer, it means that it contains payload data.
Whether the packet is TCP, UDP, ICMP, HTTP, or an HTTP request, the script prints
its raw data whenever a Raw layer is present.
The raw data typically represents the payload of the packet, and its interpretation
depends on the protocol and the specific application generating the packet. For
example, in the case of HTTP or HTTPRequest packets, the raw data could include
the content of an HTTP request or response.
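To make the last point concrete, the sketch below interprets a raw TCP payload as an HTTP/1.x request and extracts the method, path, and Host header. It uses only the standard library and assumes the whole request head arrived in a single packet, which is not guaranteed in real traffic; the function name, the `/gate.php` path, and the documentation-range address 203.0.113.7 are hypothetical.

```python
from typing import Optional

HTTP_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "PATCH", "CONNECT", "TRACE"}


def parse_http_request(payload: bytes) -> Optional[dict[str, str]]:
    """Return method, path and host if the payload starts with an HTTP/1.x request line."""
    head, _, _ = payload.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] not in HTTP_METHODS or not parts[2].startswith("HTTP/1."):
        return None  # not (the start of) an HTTP request
    method, path, _ = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "host": headers.get("host", "")}


raw = b"POST /gate.php HTTP/1.1\r\nHost: 203.0.113.7\r\nContent-Length: 4\r\n\r\ndata"
print(parse_http_request(raw))
# {'method': 'POST', 'path': '/gate.php', 'host': '203.0.113.7'}
```

In the sniffer, the argument would be `pkt[Raw].load`, which is mainly useful for HTTP traffic that Scapy does not already dissect as `HTTPRequest`, for example on a non-standard port.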
The term "hex dump" refers to a hexadecimal representation of the binary data
within a packet or any other binary file. It is a way of displaying the contents of a
file or memory region in a format that shows the hexadecimal values of each
byte. Hex dumps are often used in networking, debugging, and reverse
engineering to analyze the structure and content of binary data.
In our code, the hexdump(pkt) function is used to display a hexadecimal dump of
the entire packet. This hex dump provides a line-by-line representation of the
binary content of the packet, where each line typically shows a specific number of
bytes in hexadecimal format.

A hex dump is like a detailed snapshot of the data inside a file or a packet.
Here is what each part means:
● Offset (0000): the position of the first byte on that line, counted in
hexadecimal from the start of the data; think of it as the line's starting point.
● Hexadecimal Values (01 00 5E 4D 4D 4D A0 36 BC D0 E8 D7 08 00 45 00):
These are the numbers in a base-16 system (0-9 and A-F) representing each
byte in the data.
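● ASCII Representation: a more human-friendly view, where printable characters
are shown as they are (letters, digits, and symbols) and non-printable ones are
replaced with dots. For the sixteen bytes above it reads ..^MMM.6......E.,
whereas readable payload text in the capture, such as “<ASUS_ARMOURY_CRATE>”,
appears verbatim.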

A hex dump presents the raw data in a structured way, which makes it especially
useful for figuring out what is inside network data, file formats, or anything
else that is not immediately readable.
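The three-column layout described above can be reproduced in a few lines. This is a simplified stand-in for Scapy's `hexdump()`, with a name and formatting of our own choosing (Scapy's exact spacing may differ); like Scapy's, it replaces non-printable bytes with dots.

```python
def format_hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex values / printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Printable ASCII (0x20-0x7E) is shown as-is, everything else as "."
        ascii_part = "".join(chr(byte) if 0x20 <= byte <= 0x7E else "." for byte in chunk)
        # Pad the hex column so the ASCII column stays aligned on a short last line
        lines.append(f"{offset:04x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


# The 16 bytes quoted above: destination and source MAC addresses,
# EtherType 08 00 (IPv4), then the first two bytes of the IPv4 header.
frame_start = bytes.fromhex("01005E4D4D4DA036BCD0E8D708004500")
print(format_hex_dump(frame_start))
# 0000  01 00 5E 4D 4D 4D A0 36 BC D0 E8 D7 08 00 45 00  ..^MMM.6......E.
```

Reading the columns together shows why both views matter: the hex column exposes header fields such as the EtherType, while the ASCII column makes embedded text stand out at a glance.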


