Behavioral Analytics for Threat Detection
Master's Thesis
Submitted to the Faculty of the
Escola Tècnica d’Enginyeria de Telecomunicació de Barcelona
Universitat Politècnica de Catalunya
by
Sonu Preetam
In partial fulfilment
of the requirements for the degree of
MASTER IN CYBERSECURITY
Abstract
With the increased sophistication of tools and techniques deployed by cybercriminals, traditional
methods such as signature-based detection are inadequate in combating these complex threats.
Hence, detection capabilities that leverage behavioural analysis need to be employed. The solution
involves piecing together the breadcrumbs of seemingly unrelated, harmful activity performed by
attackers over a certain period to determine their behavioural patterns through analysis and
correlation. This implies adopting statistics, artificial intelligence, and machine learning methods
to analyse the enormous volume of anomalies, data, and traffic, so that malicious behaviour can be
identified proactively, beyond the standard patterns pertinent to singular events. Both technology
and human intervention are crucial for linking these activities and building a picture of the
attacker's behaviour in order to detect and respond to attacks in time.
Multiple cybersecurity vendors currently provide actionable reports for dealing with threats, but
these products are subscription-based. Furthermore, the lack of academic research, vendor
collaboration, and cost-effective implementations makes it difficult to analyse emerging threats,
which leads to recurring attacks. To tackle such attacks, this work uses statistical and AI-based
techniques that digest threat feeds and generate distributive, bidirectional, and collaborative
knowledge at lower cost.
This study contributes towards generating indicators and patterns of a threat actor through deep
analysis, commercial feeds, and the standardized language STIX within the MITRE ATT&CK
framework, in order to enrich the knowledge bases of threat intelligence platforms. The solution
revolves around analysing a threat from a Cyber Threat Intelligence perspective. Ultimately,
defenders can use the indicators of behaviour and compromise developed and outlined in this
study to identify suspicious user behaviour in the network by thresholding one or more patterns
in the form of alerts.
Version Control
Submitted: Sonu Preetam
Reviewers: Nil Ortiz, Pere Barlet
CHANGELOG
DATE | VERSION | NOTES | REVIEWER COMMENTS | REMARKS
01/04/22 | 1.3 | Threat Analysis | Write the behaviours on the go. | Done
19/04/22 | 1.4 | Python Implementation | Make a dataframe | Done
05/05/22 | 1.7 | Progress Review | Focus on the objectives only | Done
20/05/22 | 1.8 | Threat Intelligence Gathering | Normalize the indicators of behaviour (choose the important ones) in STIX | Done
01/06/22 | 2 | Thesis | Figurative view of the project | Done
Acknowledgment
This research would not have been possible without the continuous support of the i2CAT Foundation.
My acknowledgement goes to my supervisor, Nil Ortiz, for the invaluable feedback and guidance
that he provided throughout this project with enthusiasm and remarkable patience; his great
attention to detail and constant urge for improvement were crucial to the advancement of my
project.
I am deeply grateful to my tutor, Prof. Pere Barlet, for his supervision, support, and encouragement
during my studies.
I especially thank my partner and my mother for always supporting me during this project.
Contents
Abstract
Version Control
Acknowledgment
Contents
List of Tables
List of Figures
1. Introduction
1.1 Context
1.2 TDA Project Overview of i2CAT Foundation
1.3 Objectives
1.4 Scope of the Research Thesis
1.5 Work Plan
1.6 Research Thesis Structure
2. State of the Art
2.1 Cyber Threat Intelligence Life Cycle
2.2 Threat Platforms and Software Applications
2.2.1 VirusTotal
2.2.2 VX-Underground
2.2.3 STIX: Structured Threat Information Expression
2.2.4 MISP: Malware Information Sharing Platform
2.2.5 BURP Suite
3. Research and Project Development
3.1 Critical Analysis of Malware Families
3.1.1 Malware Survey based on Affected OS, Recent Activity, and Severity
3.1.2 Malware Survey based on Capabilities
3.2 Analysis of Sodinokibi Ransomware
3.2.1 Threat Details
3.2.2 Execution
3.2.3 Propagation
3.2.4 Privilege Escalation
3.2.5 File Encryption
3.2.6 Extortion Alert
3.2.7 Persistence
3.2.8 Conclusion
3.3 Deep Behaviour Analysis of Sodinokibi Ransomware
3.3.1 Get Libraries
3.3.2 Create MUTEX
3.3.3 Privilege Scaling
3.3.4 Exploitation
3.3.5 Process Securing
3.3.6 JSON Configuration and TXT
3.3.7 Excluded Languages
3.3.8 List of Process to be Terminated
3.3.9 Deleting Shadow Copies
3.3.10 Emptying Folders
3.3.11 File Encryption
3.3.12 Bitmap
3.3.13 Command and Control Communication
3.4 TTPs following MITRE ATT&CK Framework
3.5 Extract Actionable Intelligence
3.6 Correlation of Sodinokibi's attributes in MISP
4. Code Repository and Structure
5. Conclusion
6. Future Work
Glossary
Bibliography
Appendices
Appendix A - Sodinokibi Threat Info. on CrowdStrike Falcon X Intelligence Platform
Appendix B - List of VirusTotal API Objects
List of Tables
Table 1: Malware Survey based on Affected OS, Recent Activity and Severity
Table 2: Malware Survey based on their Capabilities
Table 3: Malware Samples fetched from External Sources
Table 4: JSON Configuration Field Definition
Table 5: Registry values containing Sodinokibi Session Encryption Keys
Table 6: File Information output from the code
Table 7: Sodinokibi's Identified IoBs for MISP Platform
Table 8: Signatures already utilized for detecting Sodinokibi
Table 9: Other activities performed by Sodinokibi
List of Figures
Figure 1: Recent impact of Ransomware (BlackFog, 2022)
Figure 2: High level diagram of OpenUEBA
Figure 3: Gantt Chart
Figure 4: Cyber Threat Intelligence Cycle
Figure 5: VirusTotal webpage for searching Hashes (VirusTotal, 2022)
Figure 6: Vx-underground webpage containing hash samples (Vx-underground, 2022)
Figure 7: STIX Patterning Language Format (Team, 2017)
Figure 8: STIX example of Indicator Object
Figure 9: Discovering similarities between Import and Existing events (MISP, n.d.)
Figure 10: Overall Implementation and Intelligence Gathering Plan
Figure 11: Average Ransom demand post negotiation (Q4 2020) (Recovery, 2020)
Figure 12: Sodinokibi Infection Chain (Trend Micro, 2021)
Figure 13: Extortion Alert Wallpaper after files are Encrypted (Fakterman, 2019)
Figure 14: Execution Flow of Sodinokibi (McAfee Labs, 2019)
Figure 15: General Diagram of Sodinokibi (Watchguard, 2019)
Figure 16: Function on the Entry Point (Watchguard, 2019)
Figure 17: Mutex Function Format
Figure 18: Checking Architecture
Figure 19: Base64 encoded TXT file with Recovery Instructions
Figure 20: Part of the JSON file
Figure 21: Values assigned to the JSON
Figure 22: Obtaining the Exclusion list for Languages
Figure 23: Obtaining the list of Processes
Figure 24: Shadow Copies Deletion Format (BlackBerry, 2019)
Figure 25: JSON configuration containing the list of folders to empty
Figure 26: Files, extensions, and directories excluded from encryption
Figure 27: Registry Setup
Figure 28: List of domains to which Sodinokibi transmits Information
Figure 29: Pyramid of Pain (Bianco, 2014)
Figure 30: High-level representation of attack group knowledge base with STIX (Zych, 2022)
Figure 31: Sodinokibi's TTPs
Figure 32: Sodinokibi samples collected from Malpedia and vx-underground
Figure 33: Examples of attributes of publicly accessible VirusTotal objects
Figure 34: Sample from code representing dataframe of Extracted Indicators
Figure 35: Other IoCs chosen for analysis
Figure 36: Request-Response captured using Burp Suite
Figure 37: STIX bundle with some indicators and behaviours for Sodinokibi
Figure 38: MISP events accessed for Sodinokibi
Figure 39: Taxonomy used in MISP to classify Threats
Figure 40: Example of Galaxy information collected from the MISP
Figure 41: MITRE ATT&CK Behaviour for Sodinokibi in MISP clusters
Figure 42: Vulnerability filtered in MISP to display details from MISP GitHub
Figure 43: Domains extracted from MISP APIs
Figure 44: Output of an attack pattern comparison in STIX
Figure 45: Output of a sample from VirusTotal in MISP for Sodinokibi
1. Introduction
1.1 Context
Attackers primarily leverage malware for the network reconnaissance, compromise, and information
gathering phases of their malicious cyber intrusions. The sophistication of such threats increased
further in 2019, allowing adversaries to steal information and perform multistage attacks
(BlackFog, 2022). Even though signature-based technologies have advanced significantly,
identifying these attacks remained challenging because analysing each sample is time-consuming
and because new malware variants emerge continuously. In 2021, a total of 292 ransomware attacks
were reported, a year-over-year increase of 17%. Eighty percent of these attacks exfiltrated data
using PowerShell. The REvil threat group dominated the attack surface with the highest number of
victims, followed by Conti. The retail and technology sectors were impacted the most. The
percentage increase in ransomware attacks suffered by the various sectors at the start of 2022 is
illustrated in Figure 1.
There has been an ever-growing adoption of new technologies to analyse threat data. However,
the main question is whether these technologies can fulfil the future requirements for
successful threat detection. Furthermore, despite the technological advancements in data analysis,
the limited accessibility of metamorphic threat data, the lack of cost-effective methods, and the
absence of a centralized platform that could strengthen collaboration between organizations
affected by cybersecurity incidents make the detection and analysis of threats even more elusive.
One solution is to evaluate an object by its intended actions before it demonstrates its
malicious nature. Examples of such behaviour include attempting to discover a sandbox environment,
disabling security controls, registering for autostart, and installing rootkits. Behaviour-based
analysis observes and evaluates every line of code and analyses all requests to access
connections, services, files, or processes. This makes it possible to detect either an entire
malicious scheme or a minor suspicious activity, and thus to distinguish a genuine threat from a
false positive. The strongest protection is widely understood to combine signature-based and
behaviour-based analysis.
Through this study's static analysis of the threat, indicators that characterize known or unknown
malware are formulated. These indicators provide a comprehensive view of the malware, identify its
attack environment, and reveal its target person, organization, or country.
1.2 TDA Project Overview of i2CAT Foundation
This project is embedded in i2CAT's ongoing projects. The opportunity provided by i2CAT not only
made it possible to contribute to them but also helped in observing the company's dynamics,
research methodology, innovation efforts and, last but not least, team collaboration. The overall
journey imparted a holistic outlook on the real-world practice and business context of its
state-of-the-art cybersecurity projects.
This thesis is part of the openUEBA project, a proposal designed to respond to the challenge of
reducing the "Degree of exposure of users in the face of a cyber threat" posed by the Cybersecurity
Agency of Catalonia, which would further develop solutions within the framework of Smart
Catalonia's Advanced Digital Technologies (TDA) Research and Innovation Programme. The
project aims to calculate a user's exposure to a specific threat by evaluating the associated risk
and impact scores obtained via user behaviour analysis and threat profiling.
Figure 2: High level diagram of OpenUEBA
The challenge is part of the plan to update and improve the technological capabilities of the
security operations centre of the Cybersecurity Agency of Catalonia. The part covered in this thesis
is the analysis of a threat profile to list its behaviours. The solution developed as part of
this research project will contribute toward integrating disparate data to provide a holistic view of
the activity, patterns, and trends that lead to compromise among users. Currently, i2CAT is
monitoring logs from the Universitat de Lleida and generating rules for the detection of attacks.
1.3 Objectives
The main objective of this thesis is to provide a methodology for analysing and detecting cyber
threats by developing an ontology for threat behaviour indicators. The project objectives were
defined from the requirements set by i2CAT and refined under continual supervision. This thesis
focuses on research and development, going through a complete cyber threat intelligence cycle to
develop behavioural threat indicators. The critical issue was analysing the malware samples and
obtaining information about their patterns in a single threat intelligence platform. Current threat
analysis is based primarily on known malware behaviour: new files are analysed and verified
against it, an approach known as signature-based malware detection. However, new variations of
malicious code appear regularly, making it difficult and time-consuming for signature-based
technologies to recognize them. In the long term, the behavioural patterns collected as part of
this study will be used to analyse suspicious activities and to determine whether the intended
actions are anomalous or unauthorized. The specific objectives are as follows:
• Perform a survey of recent cyber threats and select a specific threat for deeper analysis.
• Conduct systematic and extensive research into the behaviours and indicators of the selected
cyber threat.
• Classify the identified patterns (tactics, techniques, and procedures) and align them with the
MITRE ATT&CK framework.
• Extract actionable intelligence from the gathered data.
• Utilize the MISP platform to determine relationships concerning the cyber threat's potential
malicious or anomalous activity.
1.4 Scope of the Research Thesis
This thesis is limited to the behaviour analysis of a specific threat, namely Sodinokibi. The
analysis is based on the organization's requirements, database, and tools, and covers only the
preliminary development of threat indicators. The implemented code can also be used in the
future to retrieve other indicators. The thesis does not cover the use of multiple sources of
malware samples or of numerous repositories alongside the threat intelligence platforms. API usage
is limited by the number of samples that can be queried per day. All resources and tools used in
this project were chosen to build a zero-cost, open-source CTI cycle.
1.5 Work Plan
To achieve the objectives, it was essential to understand the tools and software used, the threat
intelligence lifecycle, the scope, and the analysis of the threats. Once this knowledge had been
acquired, the requirements were proposed to determine the threat to be analysed and its indicators
of compromise and behaviour.
Figure 3: Gantt Chart
1.6 Research Thesis Structure
The thesis begins with initial research into different malware families in order to select a
specific malware for threat behaviour analysis. First, a survey is conducted to choose one malware
out of ten candidates according to their activity and impact. To further confirm the choice,
intensive research on the capabilities of these threats is performed. Based on these selection
criteria, the threat finally chosen for analysis is Sodinokibi, and information about its typical
behaviour and an analysis of its underlying exploit code were collected. This highlighted
important artefacts of the malware that proved beneficial during its further analysis.
To obtain the threat samples, vx-underground is chosen because its repository is constantly
updated with data aggregated from external sources. These samples are submitted to the VirusTotal
API to obtain their respective patterns. The variable threat data returned by the API is then
consolidated in a dataframe according to the current requirements. Finally, the indicators and
behaviours are compared with the data obtained from the MISP platform, and their similarity
is determined using the Levenshtein algorithm.
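The Levenshtein comparison mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names and the normalisation of the distance to a score between 0 and 1 are assumptions made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1] (assumed scoring); 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A score of this kind could, for instance, be compared against a chosen threshold to decide whether an indicator extracted from VirusTotal corresponds to one already present in MISP; the threshold itself would have to be tuned on real data.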
2. State of the Art
This State-of-the-Art chapter includes a background review on threat actors, specifically
Sodinokibi, and covers recent research on the subject matter. The study examines the state of the
art and the feasibility of detecting different families of malware by choosing a specific threat.
Then, based on the development environment requirements and data availability throughout the
project's timeline, that threat is analysed to obtain its behaviour. To achieve this, a roadmap is
drawn up to develop threat behaviour indicators within the STIX2 standard, integrate them into the
taxonomy defined by the MITRE ATT&CK framework, and validate the viability of the indicators in a
test bed environment available within specific i2CAT datasets. This chapter also presents the
crucial tools and techniques used in this thesis and describes the threat intelligence lifecycle
used to implement it. This thesis adopts new study methodologies and procedures for the analysis
of malware behaviour. The analysis will subsequently help determine suspicious user behaviour in
the future.
In closing the gaps in tackling attacks, the role of research institutions is to provide more
clarity and galvanize innovation in cyber threat intelligence. By gathering information about
active malware campaigns, this thesis finds the circumstantial clues leading to the identification
and analysis of the most prevalent and notorious malware and its capabilities. The study determines
the preliminary malware for investigation to be Sodinokibi, which defines the current scope of this
thesis. The knowledge gained through this research could ease the visualization and identification
of the technical and non-technical aspects of a cyber threat.
2.1 Cyber Threat Intelligence Life Cycle
Threat intelligence is information that allows the prevention or mitigation of the risk associated
with cyberattacks. It covers the identification, motivation, indicators of compromise,
capabilities, and other vital facts about attacks. Cybersecurity professionals use this
information to make informed, proactive decisions to protect their organizations instead of
relying on reactive efforts. The threat intelligence cycle includes the collection, processing,
analysis, and publication (i.e., dissemination) of intelligence.
• Process: Understanding the threat actor and processing the gathered data necessary for
analysis.
• Analyse: Analyse the threat data to obtain its patterns and indicators.
• Share: Utilization of the information for analysis in threat intelligence platforms.
2.2 Threat Platforms and Software Applications
2.2.1 VirusTotal
VirusTotal is a free online service that scans files and URLs for viruses, worms, and Trojans. As
shown in Figure 5, it allows the identification of viruses, worms, Trojans, and other types of
malicious content detected by anti-virus engines and website scanners. In addition, it can help
detect false positives, that is, harmless resources flagged as malicious by one or more scanners.
Figure 5: VirusTotal webpage for searching Hashes (VirusTotal, 2022)
VirusTotal's mission is to help improve the security industry and make the Internet a safer place
by developing free tools and services. Although the service is made up of engines belonging to
different organizations, VirusTotal is entirely independent of these partners: it does not promote
third-party products and simply acts as an aggregator of information. This independence protects
it from bias and allows it to offer an objective service to its users. Furthermore, website
scanning is carried out through API queries to the different companies offering the solution, and
therefore uses the most up-to-date version of their databases. In this thesis, the file hashes are
submitted to the VirusTotal API to check whether an analysis for each hash is available. If it is,
the details are downloaded in JSON format.
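The hash lookup described above can be sketched as follows. This is a minimal illustration rather than the thesis's code: it assumes version 3 of the public VirusTotal API (file report endpoint with the key in the `x-apikey` header), and the function names and the injectable `opener` parameter are hypothetical choices made here so the logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint: VirusTotal API v3 file report.
VT_FILE_ENDPOINT = "https://www.virustotal.com/api/v3/files/{file_hash}"


def build_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Prepare the GET request for a file report; the API key goes in the x-apikey header."""
    url = VT_FILE_ENDPOINT.format(file_hash=file_hash)
    return urllib.request.Request(url, headers={"x-apikey": api_key})


def fetch_report(file_hash: str, api_key: str, opener=urllib.request.urlopen):
    """Return the parsed JSON report, or None if no analysis exists for the hash."""
    try:
        with opener(build_request(file_hash, api_key)) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:  # hash unknown to VirusTotal
            return None
        raise


def detection_summary(report: dict) -> dict:
    """Extract the per-verdict engine counts from a file report."""
    return report["data"]["attributes"]["last_analysis_stats"]
```

Because the free API tier is rate limited, a real collection script would also need to pace its requests; that part is omitted here.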
2.2.2 VX-Underground
This website is the repository of the most extensive collection of code, malware samples, and
research papers on the internet. In this thesis, hashes from this website are used extensively
for collecting and processing the malware in VirusTotal. Furthermore, its samples are well
organized, easy to access, and up to date.
Figure 6: Vx-underground webpage containing hash samples (Vx-underground, 2022)
2.2.3 STIX: Structured Threat Information Expression
There is a need to balance the inward and outward focus on an adversary: knowing the
vulnerabilities alone is not sufficient. Understanding attackers' motivation, techniques, and
activity is essential to building better defence systems. As discussed in the previous sections,
signature-based models are ineffective against a new version of a threat or a previously unknown
attack. To address this, proactive detection and pre-exploit activity prevention must come
together. This requires a holistic threat intelligence view of the adversary across time frames
and attack surfaces, which in turn requires sharing knowledge about the adversary among the
interested parties (McAfee, 2017).
Nevertheless, this data can be massive and unstructured, as it is shared by multiple partners,
and thus may need immediate analysis or automation for speed. This issue is addressed by
Structured Threat Information Expression (STIX), a language for the characterization and
communication of cyber threat information. It was created as an attempt to standardize the data
model and format. Figure 7 shows the STIX language format.
Some practical examples can be seen below:
Finding a URL: [url:value MATCHES '^(?:https?:\/\/)?(?:www\.)?example\.com\/.*']
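To show where such a pattern lives, the sketch below wraps one in a minimal STIX Indicator object built as a plain dictionary. It assumes STIX 2.1 property names; the helper names, the indicator name, and the all-zero hash are placeholders introduced for illustration, not values from the thesis.

```python
import json
import uuid
from datetime import datetime, timezone


def make_indicator(name: str, pattern: str) -> dict:
    """Build a minimal STIX 2.1 Indicator object as a plain dictionary."""
    # Timestamp in UTC with millisecond precision, e.g. 2022-06-01T10:00:00.000Z
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }


def make_bundle(objects: list) -> dict:
    """Group STIX objects into a bundle for exchange."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


# Placeholder hash value, not a real Sodinokibi sample.
indicator = make_indicator(
    "Placeholder Sodinokibi sample hash",
    "[file:hashes.'SHA-256' = '" + "0" * 64 + "']",
)
print(json.dumps(make_bundle([indicator]), indent=2))
```

Building the object by hand keeps the example dependency-free; a dedicated STIX library would additionally validate the pattern syntax.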
2.2.4 MISP: Malware Information Sharing Platform
MISP is an open-source platform for sharing threat intelligence such as indicators of
compromise, financial fraud information, and linked malware. It stores technical and non-technical
information about adversaries in a structured format so that it can be reused in detection
systems and forensic tools. It automatically creates links between malware and its attributes. It
generates rules for NIDS from attributes such as IP addresses, file hashes, domain names, and
memory patterns. It enables the sharing of malware and threat attributes with other parties and
trust groups. By promoting information exchange among organizations, it improves malware detection
and reversing (e.g., by avoiding duplicate work). It stores all information from other instances
locally, which keeps queries confidential. All IOC data entered is built on an event object and
defined by its connected attributes (MISP, n.d.). For example, the attribute types and their
information can be populated as below:
• domain: Domain name used in malware.
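As an illustration of how an event object and its attributes relate, the sketch below builds a simplified event as a plain Python dictionary in the general shape of MISP's JSON format. The event description and the domain value are hypothetical, the hash is sample A analysed later in this thesis, and only a small subset of the fields a real MISP instance stores is shown.

```python
import json


def attribute(attr_type, category, value, to_ids=True, comment=""):
    """Build one attribute entry; to_ids marks it as usable for detection."""
    return {
        "type": attr_type,
        "category": category,
        "value": value,
        "to_ids": to_ids,
        "comment": comment,
    }


# Simplified event: real events carry more fields (date, distribution, etc.).
event = {
    "Event": {
        "info": "Sodinokibi sample indicators (illustrative)",
        "Attribute": [
            attribute("domain", "Network activity", "c2.example.com",
                      comment="hypothetical domain name used in malware"),
            attribute("md5", "Payload delivery",
                      "bed6fc04aeb785815744706239a1f243",
                      comment="sample A"),
        ],
    }
}

event_json = json.dumps(event, indent=2)
print(event_json)
```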
Figure 9: Discovering similarities between Import and Existing events (MISP, n.d.)
This study used the community edition of Burp Suite, a security testing tool for web applications
created by PortSwigger. Its proxy functionality is primarily used to intercept the requests and
responses between the browser and the target application.
3. Research and Project Development
This chapter explains the project methodologies used to answer the research objectives as well as
other informative details found for the behaviour analytics of the threat. Figure 10 represents the
implementation plan to achieve these objectives:
The initial objective is to search for cyber threats and select one whose behaviour will be
analysed. This section uses the planning stage of the Cyber Threat Intelligence cycle to answer the
objective. In this stage, information is gathered from different internal and external sources
regarding recent malware campaigns. The first part of this information gathering is to search for
threats according to their impact on users and organizations. The second part is to filter them
according to their capabilities. The information gathered from both steps initiates the
roadmap of this thesis. Depending on this information, the threat actor to be analysed is
determined. Next, the indicators of its behaviour and compromise are extracted to be fed to the
threat intelligence platform.
3.1.1 Malware Survey based on Affected OS, Recent Activity, and Severity
Focusing on a collection of prevalent malware, ranked by recent activity, provides a solid foothold
for choosing a malware for analysis. It also provides insight into the malware, their malicious
abilities, and their global impact. Initially, the choice of malware depended on information
gathered about the top 10 malware by recent activity and context. This information was filtered by
reviewing the recent activity. Most of these malware families were chosen specifically for their
financial impact on organizations or sectors. The malware is categorized by affected target system,
the threat groups it belongs to, the impact caused, its recent activity, and the availability of
samples for analysis. Table 1 shows the information collected as per this context.
Malware    Threat Group                     OS                     Recent Activity   Severity
Trickbot   TA505, UNC1878, WIZARD SPIDER    Windows                28/12/2021        Very High
Botenago   Mirai code                       Linux (Routers/IoTs)   27/01/2022        High
Table 1: Malware Survey based on Affected OS, Recent Activity and Severity
This analysis provided data on the malware, but because of their operational similarity, a malware
cannot be chosen on the defined parameters alone. Hence, it is necessary to introduce other
parameters for choosing a malware for analysis. This led to extensive external research on the
critical capabilities of these malware.
In this step, additional information about the malware listed in Table 1 above is gathered
according to their degree of maliciousness. The initial legwork done by the threat group reveals the
malicious activity performed on a victim after successful infection. It is done by collecting
information about the system and the internal network.
For example, while analysing different malware, the survey determined the impact of a ransomware
attack according to its severity and capabilities. In this analysis, ransomware proved to be the
most troublesome of all the malware. It is an evolving threat to the information of individuals and
businesses. One of the most common effects of ransomware is loss of availability. For example, it
encrypts the files on an infected laptop and withholds the decryption key until the payment is
made. This kind of malware is responsible for losses of millions.
The most malicious malware currently in circulation is also the one able to perform all the
activities listed in row 1 of Table 2 below. These are briefly explained below:
QakBot Yes Yes Yes Yes Yes
As per the information gathered in Table 2, Trickbot, Emotet, and Sodinokibi perform all the
malicious activities outlined. Choosing a threat among these highly impactful malware families
required studying their financial and technological impact.
Figure 11: Average Ransom demand post negotiation (Q4 2020) (Recovery, 2020)
From this study, the concept of Ransomware-as-a-Service (RaaS) came to light. In this model, the
group that owns the ransomware involves other threat groups, whose main task is to research
targeted organizations or sectors and carry out additional legwork. These partners are given access
to the RaaS platform for a specific, previously agreed attack campaign. This flexible use of the
malware by different threat groups makes the model highly efficient and damaging. Its
industrialization, in which team members are paid according to their performance, has been the
leading cause of Sodinokibi’s outbreak. Currently, only a few malware families operate as RaaS, and
Sodinokibi is one of them, which is
widely utilized for financial gain in different geographies and sectors (Recovery, 2020). Figure 11
shows the financial impact of multiple ransomware families in the last quarter of 2020; Sodinokibi
stood out among them.
The group that owns this ransomware has many aliases and is mainly known as REvil or Pinchy Spider
across threat intelligence platforms and security communities. Its organization-like operation and
the support it receives from other threat groups make it very popular. Its capabilities evolve
constantly: it uses existing malicious tools as an attack vector to spread ransomware and mining
trojans and to carry out non-targeted ransomware attacks worldwide, which demands urgent attention
to its detection and prevention. Hence, Sodinokibi will be the focus of this study for detection
and behavioural analysis. The next section of the study will focus on understanding the
implementation techniques and behaviour of this malware.
The information on Sodinokibi is extracted from various external sources and summarized here to
give a comprehensive view of the malware before conducting a thorough technical analysis. This
section provides an overall understanding of the malware and of its updated infection mechanisms,
which follow newly discovered vulnerabilities in systems. The objective is to summarise the actions
performed by Sodinokibi in a structured manner and to understand its working through static
analysis. It fulfils the objective of performing systematic research by gathering more information
about Sodinokibi. This starts the collection phase of the Cyber Threat Intelligence Cycle for threat
detection.
The steps performed by the Sodinokibi malware are Execution, Propagation, Privilege Escalation,
File Encryption, Extortion, and Persistence, which are discussed later in this section.
This analysis is obtained from threat detection and intelligence organizations to provide aggregated
analysis of the following samples of the malware:
A md5 bed6fc04aeb785815744706239a1f243
B md5 65aa793c000762174b2f86077bdafaea
C md5 2abff29b4d87f30f011874b6e98959e9
D md5 4af953b20f3a1f165e7cf31d6156c035
E md5 3cae02306a95564b1fff4ea45a7dfc00
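The sample hashes above can be expressed as STIX indicator patterns so that they can be fed to a threat intelligence platform. The following is a minimal sketch that validates each MD5 value and renders it in the STIX patterning syntax for file hashes; the helper name md5_to_stix_pattern is hypothetical and not part of any library.

```python
import re

# MD5 hashes of the analysed samples, as listed above.
SAMPLES = {
    "A": "bed6fc04aeb785815744706239a1f243",
    "B": "65aa793c000762174b2f86077bdafaea",
    "C": "2abff29b4d87f30f011874b6e98959e9",
    "D": "4af953b20f3a1f165e7cf31d6156c035",
    "E": "3cae02306a95564b1fff4ea45a7dfc00",
}

MD5_RE = re.compile(r"^[0-9a-f]{32}$")


def md5_to_stix_pattern(md5: str) -> str:
    """Render an MD5 hash as a STIX file-hash comparison pattern."""
    md5 = md5.lower()
    if not MD5_RE.match(md5):
        raise ValueError(f"not a valid MD5 hash: {md5!r}")
    return f"[file:hashes.MD5 = '{md5}']"


patterns = {label: md5_to_stix_pattern(h) for label, h in SAMPLES.items()}
for label, pattern in patterns.items():
    print(label, pattern)
```

Hash-based patterns of this kind only match the exact samples listed, so they complement rather than replace the behavioural indicators derived later in this chapter.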
Extensive information about the changes in each version can be found in (Malpedia, n.d.). Apart
from the samples listed in Table 3, in newer versions of the malware the registry key fields and
the execution of Sodinokibi have changed noticeably to avoid detection and break existing defence
rules (YARA).
3.2.2 Execution
The executable file of the analysed samples is packed together with its library names, strings, and
DLL files. This packing uses the RC4 cryptographic algorithm, with a variable-length random key for
each encrypted element. As a result, the API tables, text strings, and library references that
anti-virus software relies on heavily to detect malicious code are difficult to identify. Functions
are loaded using the hash of the name string rather than the string itself, which is decoded at
runtime. Furthermore, the data structure does not expose its key and size, making it even harder to
detect.
Once the malicious code reaches the victim system through one of its many attack vectors, the first
thing the malware does is create an identifier, or mutex, so that no more than one process can
access a critical section of memory at a time. This prevents any unwanted rerun, which in turn
makes detection harder. Then the configuration, embedded in JSON format, is deciphered. This
configuration holds the options that determine the operations to be carried out, as per the
parameters selected by the attacker.
3.2.3 Propagation
There are different methods the attacker uses to infect the victim’s system. The most widely used
is spam campaigns, in which malicious code is concealed behind advertisements sent through emails
that act as clickbait for the malware to propagate. This is also known as malicious advertising or
malvertising. The code can be executed directly on the computer or redirect to servers from which
other executables are downloaded.
Following the cybercriminals’ policy of “first breach one host and then take the entire network,”
Sodinokibi ransomware targeted small and medium-sized enterprises by infecting their systems.
Attackers first launched attacks by exploiting vulnerabilities in Confluence (CVE-2019-3396), a
use-after-free (UAF) vulnerability (CVE-2018-4878), and a deserialization vulnerability in Oracle
WebLogic (CVE-2019-2725); these later evolved into brute-force attacks on the Remote Desktop
Protocol (RDP). In addition, CVE-2018-13379 and CVE-2019-11510 allow the threat actors to drop and
execute other components such as the anti-antivirus, exfiltration tools, and Sodinokibi itself.
Initial access can also be gained through hacked managed service providers (MSPs) that use Kaseya,
ScreenConnect, Bomgar, and other remote monitoring and management products.
The malware subsequently performs privilege escalation by exploiting CVE-2018-8453, a vulnerability
in which the Win32k component fails to properly handle objects in memory. To determine whether the
system is vulnerable, it compares the creation date of system32 with that of an updated system. If
the system is patched, it tries to force execution with elevated privileges using the runas
function and to bypass the User Account Control that prevents unauthorized changes to the system.
If no privilege is obtained, the program exits. Otherwise, the program collects the information
outlined in the JSON configuration that is needed for further exploitation, i.e., session and
system language checks. The program terminates if the language matches the exclusion list in the
configuration.
If the language is not in the exclusion list, the malware proceeds to deactivate Windows Shadow
Copy and System Restore and to disable Windows Startup Repair. It invokes vssadmin to delete all
stored snapshots and backups, and then uses bcdedit to disable recovery.
The malicious code checks whether the privilege is elevated to system-level or not, and if the
privilege is set to SYSTEM, then it will look for the explorer.exe process to obtain the session token
of the user logged in and reduce the privileges by using the ImpersonateLoggedOnUser function. Thus,
the file encryption process prevents the SYSTEM user from being affected.
Once the privilege is escalated, file encryption of both the local and connected network drives is
initiated. Some files, extensions, and operating system folders are excluded for minimal system
operation as outlined in the JSON configuration.
The file encryption utilizes many cryptographic algorithms by generating asymmetric and
symmetric key pairs using the elliptic curve Diffie-Hellman protocol.
The following briefly describes the cryptographic algorithms used for encryption:
The public-private key pair that is generated is encrypted using the above keys. A detailed analysis
of the file encryption is mentioned in section 3.3.11.
The binary consists of two public keys that give the makers/handlers of Sodinokibi the power to
decrypt the files taken over by their subscribers. The document is renamed after the encryption
operation is completed.
The victim is notified about the extortion once the encryption process is completed and the
targeted files have been renamed. These targeted files have extensions that depend on the JSON
configuration.
For instance, if the assigned extension is ‘.abc123def12’ when Sodinokibi encrypts the file
‘document.docx’, the encrypted file will appear as ‘document.docx.abc123def12’. In this process, an
instruction text file with details on recovering the files is also created, named in the format
‘[extension]-how-to-decrypt.txt’ depending on the sample. The victim is then notified of the ransom
demand through a modified desktop background, as per the configuration field ‘img’. Figure 13 below
shows an example of the same:
Figure 13: Extortion Alert Wallpaper after files are Encrypted (Fakterman, 2019)
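The renaming scheme described above is simple enough to express directly, which is useful when writing file-system detection rules. The sketch below reproduces the naming of an encrypted file and of the instruction file for a given extension; the function names are hypothetical, and the note template follows the ‘[extension]-how-to-decrypt.txt’ format mentioned in the text, which varies between samples.

```python
def encrypted_name(filename: str, extension: str) -> str:
    """Name of a file after encryption: the random extension is appended."""
    return filename + extension


def ransom_note_name(extension: str,
                     template: str = "{EXT}-how-to-decrypt.txt") -> str:
    """Name of the instruction file; the template differs between samples."""
    return template.replace("{EXT}", extension.lstrip("."))


ext = ".abc123def12"  # example extension from the text
print(encrypted_name("document.docx", ext))
print(ransom_note_name(ext))
```

A defender could watch for the sudden appearance of many files sharing one unfamiliar trailing extension together with a note file whose name starts with that same string.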
These messages direct the victim to a website that provides instructions for making the payment to
retrieve the data, and a sample of three files can be decrypted as a test. No alternative method is
provided apart from those mentioned in the instructions. Finally, after the payment is made, a
download link is provided for a decryptor that is valid for that version of Sodinokibi.
3.2.7 Persistence
Sodinokibi contains code for maintaining persistence. The following refers to the configuration of
the malware, which is decrypted using RC4. The options activated in the configuration are as below:
• HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\[code]
• HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\[code]
The Windows logs contain the above configuration, where [code] corresponds to an identifier
that depends on the ransomware compilation.
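Because the persistence entries live under the well-known Run keys, registry paths collected from logs can be filtered for them. The following is a minimal sketch, assuming the log source yields full registry paths as strings; the sample paths and the value name abc123 are hypothetical.

```python
import re

# Matches a value directly under the HKLM or HKCU Run key listed above.
RUN_KEY_RE = re.compile(
    r"^(HKLM|HKCU)\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\([^\\]+)$",
    re.IGNORECASE,
)


def run_key_entries(paths):
    """Return (hive, value name) for every path that is a Run key entry."""
    entries = []
    for path in paths:
        match = RUN_KEY_RE.match(path)
        if match:
            entries.append((match.group(1).upper(), match.group(2)))
    return entries


# Hypothetical registry paths extracted from logs.
logged_paths = [
    r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\abc123",
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\abc123",
    r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\abc",
]
print(run_key_entries(logged_paths))
```

Since legitimate software also registers itself under these keys, the output is a list of candidates to review rather than a verdict.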
3.2.8 Conclusion
No alternative decryption method is currently available after the attack is successfully carried
out. Also, following the attackers’ instructions is not advisable, as there is no guarantee of
recovery or that the system will be free of infection after the payment. Furthermore, there is a
possibility
that the attacker may not be the author of the ransomware as it can be obtained from different
sources without even requiring a decryption key. Also, buying bitcoins cannot be done
immediately, so meeting the attackers’ deadlines will be challenging.
Malware analysis is about understanding the behaviour and purpose of a suspicious file or activity
which can result in detection and mitigation of threats. This investigation collected samples from
different sources and determined the behaviour and indicators of compromise. The collection phase
of the cyber threat intelligence cycle now begins, in which the potential parameters required are
gathered for validation in the analysis phase.
In this section, the systematic and extensive research to analyse Sodinokibi’s internal activities is
performed. This section also provides a direction towards getting the intrinsic indicators through
each of its infection phases. It provides reasonable goals and intelligence needs to determine
Sodinokibi’s indicators. The static analysis of Sodinokibi and information gathered in the previous
CTI stage from external sources provides adversarial motives and tactics, techniques, and
procedures to be utilized in the MITRE ATT&CK framework. Figure 14 shows the analysis of a
particular Sodinokibi sample. The execution flow visualizes each internal step carried out by
Sodinokibi; its explanation and indicators are covered in the subsequent sections.
The primary process performed by this ransomware is mentioned below (Cytomic, 2019):
• GetLibraries: This function dynamically loads libraries that will later be used.
• CreateMutex: Creates a mutex.
• CheckExp: Checks if it needs to escalate privileges. exp is the value it will check, which
will be either true or false on the JSON, depending on whether it has sufficient privileges.
• Exploit: Carries out the exploit for CVE-2018-8453.
• GetProcessRun: Obtains and launches explorer.exe.
• Prepare Cipher: Carries out all Sodinokibi’s tasks, obtains JSON, executes language lists,
lists of processes to terminate, deleting Shadow Copies, etc.
This process is illustrated in Figure 15 visualising the general structure of Sodinokibi’s internal
functions.
Figure 14: Execution Flow of Sodinokibi (McAfee Labs, 2019)
Figure 15: General Diagram of Sodinokibi (Watchguard, 2019)
In the primary function, it calls two functions. The first is a vital function with the code, and the
second carries out a dynamic call called ExitProcess.
In its first function, it dynamically imports the functions the system is going to use by using a loop.
This is achieved by changing the entry parameters to the corresponding library. It is divided into
two parts:
• Obtaining the Library: The entry parameter number is used to move through nested ifs and
provide the requested library.
• Obtaining Import Address Table (IAT): In the second part of _BuildIAT, the desired
function is obtained from the previously obtained library. The entry parameter added to the
library’s base address returns the function address. Then it returns to the _BuildIAT function
looping to get all the critical system functions, stores them, and creates an IAT table.
Mutex objects are used as a locking mechanism to serialize access to a resource on the system. Each
thread takes ownership of the mutex object before executing the code that writes to the shared
memory. This prevents multiple threads from writing to the same memory at a given time. After the
thread finishes writing to the shared memory, it releases the mutex object. Malware might also use
a mutex to avoid reinfecting the host. When the ransomware is executed, Sodinokibi creates a mutex
whose name differs from build to build.
After creating the IAT, the function checks if it can execute itself. For this, it runs a mutex function
that uses a string as an identifier. An example of the identifier is below:
Global\\3555A3D6-37B3-0919-F7BE-F3AAB5B6644A
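Since the mutex name changes with every build, matching the exact identifier above is brittle, but its format (a GUID in the Global namespace) can still be used as a weak signal. The sketch below checks observed mutex names against that format; it is a heuristic under the assumption that other builds keep the same naming format, and legitimate software also uses GUID-named mutexes, so a hit should only raise the priority of a host for further inspection.

```python
import re

# "Global\" followed by a GUID, as in the identifier shown above.
GUID_MUTEX_RE = re.compile(
    r"^Global\\[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$",
    re.IGNORECASE,
)


def looks_like_guid_mutex(name: str) -> bool:
    """Return True if the mutex name has the Global\\<GUID> format."""
    return GUID_MUTEX_RE.match(name) is not None


# The first name is the example from the text; the second is hypothetical.
observed_mutexes = [
    "Global\\3555A3D6-37B3-0919-F7BE-F3AAB5B6644A",
    "Global\\MyApplicationMutex",
]
suspicious = [m for m in observed_mutexes if looks_like_guid_mutex(m)]
print(suspicious)
```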
Once it has checked the mutex, it checks the privilege setting in the configuration. This is a JSON
file, one section of which has been extracted. The parameter that indicates whether it needs to
escalate privileges is exp. If it is false, it will not escalate privileges. It processes the JSON
data to read the value of exp, converting false and true into zero and one. If the ‘exp’ field is
‘true’, then a 32- or 64-bit shellcode is executed with the exploit for CVE-2018-8453 (which
exploits a vulnerability in
win32k) through the elevation of privilege. This vulnerability is explained in the later sections.
In this analysis, the sample does not need to escalate privileges because it has already escalated
them, so exp: false. It is common for this kind of malware to make several checks and privilege
escalations in different
CVE-2018-8453: This is a vulnerability in win32k.sys discovered by Kaspersky Lab. The exploit is
first executed through a malware installer to obtain privileges for persistence in victim systems.
This vulnerability has been heavily targeted and used in conjunction with different attacks.
Improvement/security update on the vulnerability is covered in KB4464619 (Microsoft, 2019)
with key changes listed below:
As mentioned earlier, if the system is not vulnerable, it uses the runas command to launch an
instance with administrator rights and terminates the limited-privilege instance.
3.3.4 Exploitation
This part obtains the folder that has the file needed for the exploitation of Win32k. It runs two
functions to avoid being redirected to the 64-bit folder and request the 32-bit system folder. This
process provides the address c:\\windows\\system32. The exploit is carried out by unscrambling
the file location of win32kfull.sys and win32k.sys. It checks if these files exist in the system, or else
it returns 0. It also checks whether the file is old enough to be exploited. With the
GetNativeSystemInfo function, it checks the architecture of the processor. It also checks the
amount of memory to be used by the exploit.
Figure 18: Checking Architecture
It performs a virtual allocation of memory to reserve a space and copy this exploit to the assigned
space. Once it has the exploit in memory, it will dynamically load the libraries and obtain the
addresses of the functions it will need to create its own IAT. Finally, once it has all the functions,
it will carry out the exploit.
This accesses the function to run the process. It first checks if it has the necessary data to run the
process. Otherwise, it opens a function to obtain the information and then closes the handle. On
successful retrieval of information, it returns 1. It will obtain explorer.exe, which will be used to
check the SID later. Because of this, it skips everything else and goes straight to the XOR, which
means, for now, only one explorer.exe is open, where an ID has been checked.
In this routine, an essential section of the execution is seen. It obtains the following information
for profiling the victim machine:
• Filename
• Extensions
• Username
• Computer name
• Domain
• Language (check whether language is Russian)
• Version of OS
• Drive details
• CPU Architecture
The final part of this function is placing an info.txt with instructions to recover the files, as shown
in Figure 16.
The following section of the JSON file, as in Figure 17, refers to the configuration of malware.
Depending on the information mentioned in the JSON file, the malware carries out its operations.
Figure 21: Values assigned to the JSON
In the nname field, there is {EXT}.info.txt. {EXT} will be replaced by the random string generated
during execution.
Following is the table with the definition and format of each JSON field.
Field       Description
pid         An identifier for sending data to C2 servers if the ‘net’ field is set to ‘true’
sub         The identifier for sending data to C2 servers if the ‘net’ field is set to ‘true’
dbg         Debug: true/false. A value used by the malware author. It is referred to when trying
            to determine if the victim is Russian
fast        true/false. A value that determines how files bigger than 65535 should be encrypted
wipe        true/false. A value that determines whether the ransomware should delete directories
            specified in the ‘wfld’ field
wht → fld   Folder exclusions. wht lists the values that must not be encrypted
wfld        Wipe folder. Exclusion list for files to delete if the ‘wipe’ field contains the value
            ‘true’
net         Files encryption in the network: true/false. A value that determines if the ransomware
            should send primary host and malware information to the C2 servers
nbody       Instructions for payment. Text notes are obfuscated in base64, which will be dropped in
            directories when the files are encrypted
nname       {EXT}-readme.txt. Name of the file that will contain the note defined in the field
            ‘nbody’
exp         Exploit: true/false. A value that determines if the ransomware needs to escalate
            privileges by exploiting the privilege escalation vulnerability
img         Image set as desktop wallpaper during encryption, or the text of it obfuscated in
            base64
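To show how these fields drive the behaviour described in this section, the sketch below parses a configuration in the same JSON layout and summarises the decisions that follow from it. All values in the sample configuration are hypothetical placeholders, and the function summarize_config is an illustrative helper, not code recovered from the malware.

```python
import base64
import json

# Hypothetical configuration using the field names from the table above.
RAW_CONFIG = json.dumps({
    "pid": "10",
    "sub": "7",
    "dbg": False,
    "fast": True,
    "wipe": True,
    "wht": {"fld": ["windows", "program files"]},
    "wfld": ["backup"],
    "net": True,
    "nbody": base64.b64encode(b"Your files are encrypted.").decode("ascii"),
    "nname": "{EXT}-readme.txt",
    "exp": False,
})


def summarize_config(raw: str, ext: str) -> dict:
    """Derive the main behavioural switches from a JSON configuration."""
    cfg = json.loads(raw)
    return {
        "escalate_privileges": bool(cfg.get("exp", False)),
        "contact_c2": bool(cfg.get("net", False)),
        # Folders are wiped only when the 'wipe' switch is enabled.
        "wipe_folders": cfg.get("wfld", []) if cfg.get("wipe") else [],
        "excluded_folders": cfg.get("wht", {}).get("fld", []),
        # {EXT} is replaced by the random extension generated at run time.
        "note_name": cfg.get("nname", "").replace("{EXT}", ext),
        # The note text is stored base64-encoded in 'nbody'.
        "note_text": base64.b64decode(cfg.get("nbody", "")).decode(
            "utf-8", errors="replace"),
    }


summary = summarize_config(RAW_CONFIG, "abc123def12")
print(json.dumps(summary, indent=2))
```

Decoding the configuration of a captured sample in this way gives the analyst the note name, exclusion lists, and C2 switch before any dynamic analysis is run.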
The keyboard layout check uses a list of exclusions. The malware obtains a list of identifiers for
the installed keyboard layouts using GetKeyboardLayoutList and goes through the languages to check
whether they are allowed. To do this, it carries out a switch over all the languages, which will be
used later for the text.
Figure 22: Obtaining the Exclusion list for Languages
The malware stops executing if any list item coincides with the ones illustrated in Figure 22
above. This makes victims with any of the observed keyboard layouts immune to the attack. It can be
inferred that the countries using these keyboard languages are whitelisted or exempted from these
attacks.
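The exclusion logic can be sketched as follows. On Windows, each handle returned by GetKeyboardLayoutList carries the language identifier in its low-order word, so the check reduces to masking each handle and looking the result up in the exclusion list. Only the Russian identifier (0x0419) is included here, since the text names Russian explicitly; the remaining entries would have to be taken from the list in Figure 22, and the sample handle values are hypothetical.

```python
# Language identifiers that cause the malware to exit.
# Only Russian is listed here; the full list is the one shown in Figure 22.
EXCLUDED_LANG_IDS = {0x0419: "Russian"}


def language_id(layout_handle: int) -> int:
    """The low-order word of a keyboard layout handle is the language ID."""
    return layout_handle & 0xFFFF


def is_exempt(layout_handles) -> bool:
    """Return True if any installed layout is on the exclusion list."""
    return any(language_id(h) in EXCLUDED_LANG_IDS for h in layout_handles)


# Hypothetical hosts: US English only, and US English plus Russian.
host_a = [0x04090409]
host_b = [0x04090409, 0x04190419]
print(is_exempt(host_a), is_exempt(host_b))
```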
In this case, we see that it takes a snapshot of the processes running on the system. It will go
through them and compare them with processes specified in the ‘prc’ field on the JSON. If they
coincide, they are terminated. As per Figure 23, it is only applicable for mysql.exe.
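The comparison between the process snapshot and the ‘prc’ list amounts to a case-insensitive membership test, sketched below. The list of running processes is hypothetical, and the ‘prc’ value follows the sample discussed above, in which only mysql.exe is listed.

```python
def processes_to_terminate(running, prc):
    """Return the running processes whose names appear in the 'prc' list."""
    targets = {name.lower() for name in prc}
    return [proc for proc in running if proc.lower() in targets]


# Hypothetical snapshot of running processes.
snapshot = ["explorer.exe", "MySQL.exe", "svchost.exe"]
prc = ["mysql.exe"]  # value of the 'prc' field in the analysed sample
print(processes_to_terminate(snapshot, prc))
```

Terminating database and similar processes releases their file locks, which is why the unexpected termination of such services is itself a useful behavioural indicator.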
At this point, it runs a function to unscramble interesting strings that will be executed later. The
available and most crucial string is vssadmin.exe, which deletes system backups. This ensures that
the attacker has full control, and the victim cannot return to the previous OS version. The
following command is utilized for this action (Soft, 2019):
cmd.exe /c vssadmin.exe Delete Shadows /All /Quiet & bcdedit /set {default} recoveryenabled No & bcdedit /set {default} bootstatuspolicy ignoreallfailures
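The command line above is a strong behavioural indicator in its own right, because it is independent of the file hash of a given build. The following minimal sketch flags process command lines that contain its three components; the regular expressions and behaviour labels are illustrative assumptions rather than an established rule set.

```python
import re

# One expression per behaviour in the command; matching is case-insensitive.
BEHAVIOUR_PATTERNS = {
    "shadow_copies_deleted": re.compile(
        r"vssadmin(?:\.exe)?\s+delete\s+shadows", re.IGNORECASE),
    "recovery_disabled": re.compile(
        r"bcdedit(?:\.exe)?\s+/set\s+\{default\}\s+recovery\s*enabled\s+no",
        re.IGNORECASE),
    "boot_failures_ignored": re.compile(
        r"bcdedit(?:\.exe)?\s+/set\s+\{default\}\s+bootstatuspolicy\s+"
        r"ignoreallfailures", re.IGNORECASE),
}


def flag_command_line(command_line: str):
    """Return the sorted names of the behaviours found in a command line."""
    return sorted(name for name, pattern in BEHAVIOUR_PATTERNS.items()
                  if pattern.search(command_line))


observed = ("cmd.exe /c vssadmin.exe Delete Shadows /All /Quiet & "
            "bcdedit /set {default} recoveryenabled No & "
            "bcdedit /set {default} bootstatuspolicy ignoreallfailures")
print(flag_command_line(observed))
```

A command line that triggers all three labels at once is far less likely to be legitimate administration than one that triggers a single label.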
As the explorer.exe has system-level permissions, it tries to impersonate it. To delete the shadow
copies, it gives priority to the current window.
Then the malware checks the ‘wipe’ value in the JSON configuration and if set to true, it deletes all
the files in the folders that correspond to the ‘wfld’ value of the JSON configuration.
This phase of the Sodinokibi execution flow generates and stores encryption configuration and
victim’s metadata elements. The encryption of files goes through the following mechanism:
The queue structure handles the files needed for encryption with CreateIOCompletionPort. For this,
it gets the number of strings, the key, and the handle and introduces a structure into a string. Once
added, it introduces ransom file data into the memory and encryption routine. Then it searches for
the valid file for encryption and will be utilized to leave the ransom text file in the files and
subfolders. Next, it creates the encryption extension of the files to be encrypted. Finally, it copies
the ransom note in all the correct folders and subfolders. It first checks if the text’s folders are
valid as per the extension in the JSON configuration. This starts the encryption routine.
In terms of encryption, Sodinokibi uses symmetric algorithms: Salsa20 for files and AES-256-CTR for
registry values and C2 beacons, with an asymmetric key exchange method based on the curve25519
implementation (BlackBerry, 2019). The analysis pertaining to the cryptographic algorithms is out
of scope for this study.
At first, a primary curve25519 key pair is created for the victim. Then, the Salsa20 keys for file
encryption are derived from this key pair. Next, the private key for file encryption is encrypted
with AES-256-CTR and saved to the registry. Finally, the AES key is derived from the SHA3-256 hash
of the shared secret between the private key of a secondary curve25519 key pair and the attacker’s
public key pk in the configuration.
Also, the malware has two versions of the victim’s private key; one is encrypted with the pk value
and the other with the hardcoded public key in the binary’s data. Both values are appended to the
encrypted files suggesting the hardcoded public key to be the master key of the attacker.
Then it generates a unique identifier (UID) for the host. To do so, it obtains the serial number of
the system drive and generates a CRC32 hash of the output of the CPUID assembly instruction
together with the serial number. This unique ID is then referenced in the payment URL of the ransom
note. Finally, it checks whether the encryption keys already exist. The following table shows the
registry key/value pairs generated in the HKEY_LOCAL_MACHINE (HKLM) or HKEY_CURRENT_USER (HKCU)
hives, where HKCU is used if privilege escalation fails. The ‘Software\recfg’ registry subkey is
the one most often accessed, and its presence suggests an infection. When
no session key pair is available, Sodinokibi generates one and stores it under HKLM or HKCU in the
\Software\recfg key, as per Table 5 below:
Value     Description
sk_key    Session private key encrypted with attacker’s public key in Sodinokibi config
0_key     Session private key encrypted with public key embedded in binary of Sodinokibi
stat      Info on the victim and its encryption metadata stored as victim keys encrypted with AES,
          used in README File and POST requests
Sodinokibi checks whether the recfg registry key has the rnd_ext value. This value contains the string
that is appended to encrypted files.
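The registry values described above can be turned into a host-based check. The sketch below works on a registry dump represented as a dictionary from key paths to value names; this representation and the sample dump are assumptions for illustration, since how the dump is collected depends on the tooling available on the endpoint.

```python
# Value names under Software\recfg that are described in this section.
RECFG_VALUE_NAMES = {"sk_key", "0_key", "stat", "rnd_ext"}


def recfg_indicators(registry_dump):
    """Return the recfg values found under HKLM or HKCU, keyed by path."""
    hits = {}
    for key_path, value_names in registry_dump.items():
        hive, _, subkey = key_path.partition("\\")
        if hive.upper() in {"HKLM", "HKCU"} and subkey.lower() == "software\\recfg":
            found = RECFG_VALUE_NAMES & set(value_names)
            if found:
                hits[key_path] = sorted(found)
    return hits


# Hypothetical dump: one infected hive and one unrelated key.
dump = {
    "HKLM\\Software\\recfg": ["sk_key", "0_key", "stat", "rnd_ext"],
    "HKCU\\Software\\Other": ["stat"],
}
print(recfg_indicators(dump))
```

Checking both hives matters because, as noted above, the malware falls back to HKCU when privilege escalation fails.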
3.3.12 Bitmap
The malware creates the bitmap used as the computer’s background image by choosing the pixels,
sources, etc., and adding, in a loop, the characters and sentences that make up the ransom message.
The resulting image directs the victim to read the text files dropped into the folders and
subfolders.
After changing the background, Sodinokibi malware will try to send information about the victim
to the servers mentioned in the JSON configuration. This information can be seen as a piece of
encrypted information about the victim on the servers. The ‘net’ parameter of the JSON file
enables the broadcasting of the victim’s metadata and system’s information. The domain
information is broadcasted to the entities listed under the ‘dmn’ value of the configuration. Each
of these domains contains the ‘stat’ value from the registry.
It is known that the values of basic indicators such as hashes, domains, hosts, IP addresses, and
network artefacts can change frequently. Given their volume and short lifespan, it is difficult to
analyse these indicators and alert on anomalies. However, when defenders detect attacks at the TTP
level, they do not have to rely on prior information about the attacks, such as the individual
indicators of compromise, and they directly affect the adversary’s operation. The Pyramid of Pain
illustrated in Figure 29, designed by David Bianco (Bianco, 2014), shows the effect on adversaries
when detection and response by the defenders are based on TTPs. Using this figure, any detected
threat activity can be classified into a layer of the pyramid to determine how difficult it is for
adversaries to modify that layer to avoid detection.
Figure 29: Pyramid of Pain (Bianco, 2014)
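The classification described above can be sketched as a simple lookup that ranks detections by how costly they are for the adversary to evade. The layer names and difficulty labels follow Bianco's pyramid; the short keys and the example detections, which are drawn from indicators discussed in this chapter, are illustrative choices.

```python
# Layers of the Pyramid of Pain from bottom (1) to top (6).
PYRAMID_OF_PAIN = {
    "hash": (1, "Trivial"),
    "ip": (2, "Easy"),
    "domain": (3, "Simple"),
    "artefact": (4, "Annoying"),
    "tool": (5, "Challenging"),
    "ttp": (6, "Tough"),
}


def rank_detections(detections):
    """Sort (layer, description) pairs from most to least painful to evade."""
    return sorted(detections,
                  key=lambda item: PYRAMID_OF_PAIN[item[0]][0],
                  reverse=True)


detections = [
    ("hash", "MD5 bed6fc04aeb785815744706239a1f243"),
    ("artefact", "Software\\recfg registry key"),
    ("ttp", "shadow copy deletion followed by mass file encryption"),
]
for layer, description in rank_detections(detections):
    print(PYRAMID_OF_PAIN[layer][1], "-", description)
```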
In this section, the identified patterns are classified and aligned with the MITRE ATT&CK
framework. This is the processing stage of the Cyber Threat Intelligence Cycle, in which
information is filtered and mapped into a structured format for visualization and further analysis.
In MITRE ATT&CK, the information collected in the previous section is processed according to the
methodologies a threat follows in this framework, and the details of the threat are standardized so
that they can be utilized by Threat Intelligence Platforms.
MITRE ATT&CK is an open-source framework that gathers knowledge about the discussed
adversary and its behaviours through its research and contribution from the security community
and partners. It is a globally accessible knowledge base of adversary tactics and techniques based
on real-world observations (News, 2020). ATT&CK focuses on how adversaries compromise and
operate within computer information networks. ATT&CK fundamentally is a set of taxonomies
that provides defenders with a common language to map and communicate their findings pertinent
to adversary behaviour. It can support different areas of cyberspace defence, such as adversary
emulation, behavioural analytics, cyber threat intelligence enrichment, defence gap assessment, red
teaming, and SOC maturity assessment. Furthermore, in the same project family, MITRE
maintains a knowledge base of activity groups (referred to as Groups) and their techniques. The
entire ATT&CK knowledge base can be accessed programmatically through the STIX interfaces
that MITRE provides on its GitHub.
The techniques used by Sodinokibi show that the malware touches most phases of the infection
chain. Although different threat groups can use the malware to exploit a victim system, its
underlying sequence of operations remains the same. The existing
representation approach of the threats can be enhanced with STIX to provide a structured and
usable form of information for threat intelligence platforms. The information about the MITRE
ATT&CK TTP in STIX representation focuses on the dormant semi-structured data of attack
groups, the country the attack group is from, their motivation, and the sectors or countries they
have targeted (Zych, 2022). Figure 30 below illustrates the object and relationship types of the
ATT&CK group knowledge base for attack group APT29 with STIX representation. The intrusion
set here is the attack group APT29, the attack pattern is the techniques used, and malware and
tools are the software/exploits used by the group.
Figure 30: High-level representation of attack group knowledge base with STIX (Zych, 2022)
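To make the object and relationship types concrete, the following minimal sketch walks a small
STIX-like object list and collects what a group "uses". The identifiers are shortened placeholders
rather than real ATT&CK STIX IDs, the malware and tool names are invented for illustration, and the
function name used_by is an assumption.

```python
# Minimal STIX-like objects; IDs and the malware/tool names are placeholders.
OBJECTS = [
    {"type": "intrusion-set", "id": "intrusion-set--1", "name": "APT29"},
    {"type": "attack-pattern", "id": "attack-pattern--1", "name": "Spearphishing Attachment"},
    {"type": "malware", "id": "malware--1", "name": "ExampleMalware"},
    {"type": "tool", "id": "tool--1", "name": "ExampleTool"},
    {"type": "relationship", "id": "relationship--1", "relationship_type": "uses",
     "source_ref": "intrusion-set--1", "target_ref": "attack-pattern--1"},
    {"type": "relationship", "id": "relationship--2", "relationship_type": "uses",
     "source_ref": "intrusion-set--1", "target_ref": "malware--1"},
    {"type": "relationship", "id": "relationship--3", "relationship_type": "uses",
     "source_ref": "intrusion-set--1", "target_ref": "tool--1"},
]


def used_by(group_name, objects):
    """Group the names of everything an intrusion set 'uses' by target object type."""
    by_id = {obj["id"]: obj for obj in objects}
    group_ids = {
        obj["id"] for obj in objects
        if obj["type"] == "intrusion-set" and obj["name"] == group_name
    }
    result = {}
    for rel in objects:
        if rel["type"] != "relationship" or rel.get("relationship_type") != "uses":
            continue
        if rel["source_ref"] in group_ids:
            target = by_id[rel["target_ref"]]
            result.setdefault(target["type"], []).append(target["name"])
    return result


if __name__ == "__main__":
    print(used_by("APT29", OBJECTS))
```

The same traversal applies to a full ATT&CK STIX bundle, where the object list is considerably
larger but the relationship pattern is the same.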
The comprehensive view of the Sodinokibi behaviour and indicators for the analysed samples was
determined from the threat intelligence reports and the code analysis of a different malware version
(Millington, 2020). This information is then processed in the MITRE ATT&CK framework by
understanding each attack point and navigation technique. The resulting catalogue of techniques,
shown in the image below, is the one referred to in threat intelligence reports and feeds. Structuring
a navigation layer with the techniques used by the malware requires a deep analysis of the malware,
so that any of its future versions can be detected rapidly. With MITRE ATT&CK, a navigation layer
for the adversary can be built from the available TTPs. These TTPs are derived from the
knowledge obtained from previous malware analysis and behaviours and are updated regularly
through contributions from the security community. Each TTP was selected from the information
obtained from the deep analysis of Sodinokibi. To be precise, the navigation layer in the MITRE
ATT&CK framework allows the sequential visualization of the attack phases of the Sodinokibi
ransomware. This navigation layer also includes the newer infection vectors used by the malware,
as analysed from external sources.
Figure 31 below shows the output of the navigation layer extracted to provide details of the TTPs
the threat is using.
Figure 31: Sodinokibi's TTPs
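A navigation layer of this kind is a JSON document listing technique IDs. The sketch below builds a
minimal one in Python. The selection of technique IDs is an illustrative assumption based on
behaviours mentioned in this study (spear phishing, privilege escalation, process hollowing,
encryption), not the exact layer shown in Figure 31, and a real ATT&CK Navigator import may require
additional fields such as version information.

```python
import json

# Illustrative technique selection (assumption, not the thesis layer):
# T1566.001 Spearphishing Attachment, T1068 Exploitation for Privilege Escalation,
# T1055.012 Process Hollowing, T1486 Data Encrypted for Impact, T1490 Inhibit System Recovery.
EXAMPLE_TECHNIQUES = ["T1566.001", "T1068", "T1055.012", "T1486", "T1490"]


def build_layer(name, technique_ids, domain="enterprise-attack"):
    """Return a minimal Navigator-style layer as a plain dictionary."""
    return {
        "name": name,
        "domain": domain,
        "description": "Illustrative layer; technique selection is an assumption.",
        "techniques": [
            {"techniqueID": tid, "score": 1, "enabled": True} for tid in technique_ids
        ],
    }


if __name__ == "__main__":
    layer = build_layer("Sodinokibi (example)", EXAMPLE_TECHNIQUES)
    print(json.dumps(layer, indent=2))
```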
This is the implementation section of this study, where the threat indicators of the malware
samples are analysed. The objective here is to collect data and extract intelligence from it. By
collecting the indicators, reports, and information about the malware from the different threat
intelligence platforms, community, partners, CRMs, and external references, the data in the
knowledge bases can be enriched. This is a continuous process, as the data keeps growing and
changing, so its collection needs to be automated. To implement the automation, the sources for
collecting the threat details are determined first. These details are then fed to an
intelligence-gathering platform for analysis. This process corresponds to the analysis
phase of the Cyber Threat Intelligence lifecycle.
A large number of file hashes for the Sodinokibi malware are collected from the vx-underground
website using Python and stored in a JSON file. This file serves as the database for feeding the
hashes into a threat intelligence platform. Because of the volume of data, the hash extraction from
the website is automated with Python: the data is collected from vx-underground or Malpedia and
stored in a JSON file, as shown in Figure 32 below.
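The collection step can be sketched as follows. This is a minimal example under stated assumptions:
it takes page text that has already been downloaded, assumes the samples are listed as SHA-256 hex
strings, and uses hypothetical function names and a hypothetical JSON layout rather than those of
the thesis scripts.

```python
import json
import re

# A SHA-256 digest is 64 hexadecimal characters; \b avoids matching inside longer strings.
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_sha256(text):
    """Return the unique SHA-256 strings found in text, lower-cased, in first-seen order."""
    seen = {}
    for match in SHA256_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def save_hashes(hashes, path, family="Sodinokibi"):
    """Write the hashes to a JSON file (layout is an assumption for illustration)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"family": family, "sha256": hashes}, handle, indent=2)


if __name__ == "__main__":
    page = "sample one: " + "a" * 64 + "\nsample two: " + "B" * 64
    print(extract_sha256(page))
```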
All the information about these hashes needs to be fed into a Threat Intelligence Platform.
The platforms chosen for gathering information were the CrowdStrike Falcon X Intelligence
Platform (Appendix A) and the VirusTotal APIs, a complete list of which is given in Appendix B.
For this study, only the public VirusTotal APIs are used to collect the information related to
each hash. As shown in Figure 33, the information about a hash is retrieved by passing each
collected hash in the following format:
https://www.virustotal.com/api/v3/files/{hash}
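A request against this endpoint can be sketched with the Python standard library alone. The
function names below are hypothetical and this is not the code of get_vt_collection.py. The API key
is passed in the x-apikey header, as the VirusTotal v3 API expects.

```python
import json
import urllib.request

VT_FILES_URL = "https://www.virustotal.com/api/v3/files/{hash}"


def build_vt_request(file_hash, api_key):
    """Build (but do not send) a GET request for a VirusTotal file report."""
    request = urllib.request.Request(VT_FILES_URL.format(hash=file_hash), method="GET")
    request.add_header("x-apikey", api_key)
    request.add_header("accept", "application/json")
    return request


def fetch_report(file_hash, api_key, timeout=30):
    """Send the request and return the decoded JSON report."""
    request = build_vt_request(file_hash, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values; a real key and hash are needed to call fetch_report().
    req = build_vt_request("<sha256>", "<api-key>")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the URL and header logic testable without
consuming any of the daily API quota.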
The main challenge of the implementation is automating the code to cope with the complex JSON
structure, which varies between samples. A script, get_vt_collection.py, retrieves all the hashes
from the vx-underground website link by parsing the page. These hashes are submitted to VT to
obtain the details of the samples that have already been analysed in the VirusTotal environment
and are publicly accessible through the VT APIs. The result is returned as a JSON-formatted
file. As the public VirusTotal API allows only 500 requests per day at a rate of 4 per minute,
extracting the available analyses of the hashes and their indicators is a difficult job. To work around
this restriction, Burp Suite is used as a proxy server to intercept the data retrieved from the website
and view it in JSON format. This data contains the indicators together with other details about the
malware that are not informative for the analysis, which makes it relatively large and makes the
indicator values hard to interpret. To extract the informative indicators from this file, another
script, extract_indicators.py, returns the IoCs for the given fields. For proper visualization, the
essential IoCs from the hash files are loaded into a dataframe using the pandas library or converted
to CSV for further analysis. The output scope is currently limited to ten hashes. Figure 34 shows
the data frame with the hash id, attack variant, and the registry key value.
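Independently of the Burp Suite approach used in this study, a client can also stay within the quota
by pacing its own requests. The sketch below is such an alternative, not the thesis implementation:
the class name QuotaThrottle is hypothetical, and the daily counter is deliberately simple (it does
not reset itself after 24 hours).

```python
import time


class QuotaThrottle:
    """Space out calls to respect a per-minute rate and a daily cap."""

    def __init__(self, per_minute=4, per_day=500, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute  # seconds between two calls
        self.per_day = per_day
        self.used = 0
        self._last = None
        self._clock = clock
        self._sleep = sleep

    def acquire(self):
        """Wait until the next call is allowed; return False once the daily cap is spent."""
        if self.used >= self.per_day:
            return False
        now = self._clock()
        if self._last is not None:
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last = now
        self.used += 1
        return True


if __name__ == "__main__":
    throttle = QuotaThrottle(per_minute=4, per_day=500)
    print(throttle.acquire(), throttle.min_interval)
```

A caller would invoke acquire() before each API request and stop once it returns False. The clock
and sleep functions are injectable so that the pacing can be checked without real waiting.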
Figure 34: Sample from code representing dataframe of Extracted Indicators
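The flattening of nested report data into tabular rows can be sketched without external libraries.
The function names and the dotted paths in the example are assumptions for illustration; they are
not the fields used by extract_indicators.py, and the sample report is made up rather than real
VirusTotal output.

```python
import csv
import io


def dig(record, dotted_path, default=None):
    """Follow a dotted path such as 'data.attributes.label' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def to_rows(reports, fields):
    """Flatten each report into a dict keyed by column name (fields: column -> path)."""
    return [{column: dig(report, path) for column, path in fields.items()} for report in reports]


def to_csv(rows, columns):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up report structure for illustration only.
    reports = [{"data": {"id": "<hash>", "attributes": {"label": "ransomware"}}}]
    fields = {"hash_id": "data.id", "variant": "data.attributes.label"}
    print(to_csv(to_rows(reports, fields), list(fields)))
```

Missing paths yield empty cells instead of raising, which suits reports whose structure varies from
sample to sample.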
Other attributes that are important for analysing the behaviour of the malware are shown
in Figure 35.
Information about the dropped files can be captured for all the samples using Python in the same
way as the behaviours shown in Figures 34 and 35. This is highlighted in Table 6.
Burp Suite is used to check the response when requesting cyber threat information on each hash
file. Figure 36 demonstrates the request-response cycle in VirusTotal:
Figure 36: Request-Response captured using Burp Suite
The data is then normalized using STIX to create a bundle that can be imported to the threat
sharing platform. Figure 37 below shows the output of a STIX bundle extracted for Sodinokibi
from its behaviour and other collected indicators.
Figure 37: STIX bundle with some indicators and behaviours for Sodinokibi
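A bundle of this kind can be assembled as plain JSON. The following is a minimal sketch assuming
STIX 2.1 and only the standard library; it is not the code of stix_bundle.py, the helper names are
hypothetical, and it covers only file-hash indicators linked to a malware family object.

```python
import json
import uuid
from datetime import datetime, timezone


def _now():
    """Current UTC time as a STIX timestamp with millisecond precision."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def _sdo(sdo_type, **props):
    """Create a STIX 2.1 object with the common required properties filled in."""
    stamp = _now()
    return {
        "type": sdo_type,
        "spec_version": "2.1",
        "id": f"{sdo_type}--{uuid.uuid4()}",
        "created": stamp,
        "modified": stamp,
        **props,
    }


def build_bundle(family, sha256_hashes):
    """Bundle a malware family object with one indicator and relationship per hash."""
    malware = _sdo("malware", name=family, is_family=True)
    objects = [malware]
    for file_hash in sha256_hashes:
        indicator = _sdo(
            "indicator",
            name=f"{family} sample",
            pattern=f"[file:hashes.'SHA-256' = '{file_hash}']",
            pattern_type="stix",
            valid_from=malware["created"],
        )
        relation = _sdo(
            "relationship",
            relationship_type="indicates",
            source_ref=indicator["id"],
            target_ref=malware["id"],
        )
        objects += [indicator, relation]
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    print(json.dumps(build_bundle("Sodinokibi", ["a" * 64]), indent=2))
```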
This section fulfils the objective of utilizing MISP and determining relationships pertaining to
a potential threat activity. This is the final stage of the Cyber Threat Intelligence Cycle, where data
is structured into STIX format to be utilized and shared across threat sharing platforms. MISP is
an open-source threat intelligence and malware information sharing platform that collects,
distributes, and shares cybersecurity information for malware and incident analysis. Aggregating
and processing this large volume of information allows the data to be used to build a broader
picture. Furthermore, MISP simplifies its data and relationships through
integration with other intelligence-sharing platforms for further analysis. For example, the MISP
data in i2CAT is currently fetched from the ICARO program of INCIBE, the Spanish National
Cybersecurity Institute, which provides an opportunity to analyse and share information among
public and private organizations. A few key MISP terms are listed below:
IoC: An indicator of compromise is evidence, in a network or an operating system, of an intrusion
or of a technique used by an attacker.
Event: An event groups the information of attributes and objects. It also shows the extended events
linked to the current event. Figure 38 shows the information about the events fetched for
Sodinokibi from the EU-CERTS.
Figure 38: MISP events accessed for Sodinokibi
Taxonomies: In MISP, taxonomies are classification libraries used to tag and identify events,
indicators, or threats.
In these events, the indicators can be completely different from the indicators of compromise
collected during the collection phase of the cyber threat intelligence cycle, but they can be filtered
by their behaviours as indicated in the MITRE ATT&CK framework. Therefore, the main objective
of studying the MISP user interface is to learn how threat sharing is implemented in practice in an
organization and how the relationships between different threats and indicators are made.
Once the platform and its workflow are adequately understood, it is vital to automate the retrieval
of data from the MISP platform so that it can be used with indicators discovered in the future.
Figure 40: Example of Galaxy information collected from the MISP
MISP has a robust REST API that allows the processing of threat intelligence and data to be
automated. To get the indicators and behaviours of the threats, Python scripts were developed
using PyMISP, the Python library provided by CIRCL that accesses MISP platforms through the
REST API. In this study, the events were searched by tag and by cluster using this library.
Through these events, the IoCs stored in MISP are queried.
Unfortunately, the availability of data, relationships, and proper taxonomies in the MISP platform
was limited. Figure 41 shows the behaviours of
Sodinokibi malware in its clusters linked to MITRE ATT&CK techniques.
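The thesis scripts use PyMISP for this step. As a library-free illustration of the same idea, the
sketch below builds a request against the MISP REST search endpoint with the standard library and
pulls attribute values out of a result. The base URL, the tag string, and the function names are
placeholders and assumptions, and the response layout shown (a "response" list of "Event" objects
with "Attribute" entries) should be verified against the MISP instance in use.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, tag):
    """Build (but do not send) a POST request that searches MISP events by tag."""
    body = json.dumps({"returnFormat": "json", "tags": [tag]}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/events/restSearch", data=body, method="POST"
    )
    request.add_header("Authorization", api_key)
    request.add_header("Accept", "application/json")
    request.add_header("Content-Type", "application/json")
    return request


def attribute_values(search_result, attr_type):
    """Collect the values of all attributes of one type from a search result."""
    values = []
    for entry in search_result.get("response", []):
        for attribute in entry.get("Event", {}).get("Attribute", []):
            if attribute.get("type") == attr_type:
                values.append(attribute["value"])
    return values


if __name__ == "__main__":
    # Placeholder instance URL, key, and tag.
    req = build_search_request("https://misp.example", "<api-key>", "<galaxy-tag>")
    print(req.get_method(), req.full_url)
```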
This fulfils the objective of sharing the information in the threat intelligence platform. The
indicators can be imported or extracted using the STIX2 standard supported by the platform. The
information about the threats and their attributes was collected from the MISP platform by
understanding and using the platform and its REST APIs. These APIs were accessed using the
PyMISP library in Python. This library can return a STIX package from an event in STIX, JSON,
or XML formats. This allows the collected information to be used in other platforms and
security communities in whichever feed format they prefer. Thus, the platform provides direct
import and other capabilities for standardized threat sharing.
Figure 42: Vulnerability filtered in MISP to display details from MISP GitHub
To track the similarity between the indicators, we used the Levenshtein distance algorithm, which
enables fuzzy matching of strings whose edit distance falls within a specific threshold. With this
algorithm, malicious content crafted to deceive human perception, as well as false positives, can be
identified. The domains were compared for similarity with those of other events using the PyMISP
library.
After accessing all the domains for the events tagged misp-galaxy: Sodinokibi and setting the
threshold to 6 letters, the output of the script is extracted. The script finds the most used
domains and their string structures with high accuracy. Figure 43
shows the domains extracted using the automation scripts and the Levenshtein algorithm. In total,
88 domains were extracted, of which only 40 remain as the most common ones under the set
threshold. These domains were extracted from 13 events following the same taxonomy in MISP
through the Python API. The script misp_analysis.py was developed to obtain this information and
is available, among others, in the code repository on GitHub (Preetam, 2022).
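The matching step can be illustrated with a small, dependency-free implementation of the
Levenshtein distance. This is a sketch rather than the code of misp_analysis.py: the function
close_pairs is hypothetical, and it assumes the threshold of 6 is a maximum edit distance between
two domain strings.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,        # deletion
                current[j - 1] + 1,     # insertion
                previous[j - 1] + cost  # substitution or match
            ))
        previous = current
    return previous[-1]


def close_pairs(domains, threshold=6):
    """Return (first, second, distance) for distinct pairs within the threshold."""
    pairs = []
    for index, first in enumerate(domains):
        for second in domains[index + 1:]:
            distance = levenshtein(first, second)
            if 0 < distance <= threshold:
                pairs.append((first, second, distance))
    return pairs


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(close_pairs(["example.com", "examp1e.com"], threshold=2))
```

An absolute threshold treats short and long domains alike, so a fixed value of 6 matches short
domains quite liberally. Dividing the distance by the longer string's length is one way to obtain a
length-independent similarity percentage.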
The correlation with the matching samples from VirusTotal, using regular expressions and
other known vulnerabilities, is implemented and validated (Expressions, 2022). The activities in
STIX format are matched against the threshold to provide a comparative analysis between the data
retrieved and the data present in MISP. Figure 44 shows the similarity percentage of a Spear
Phishing attack pattern retrieved for the domains of Sodinokibi malware in STIX format. The
Python script that computes this similarity is attack_pattern.py. Figure 45 shows the
output from the MISP API for a sample from VirusTotal using the script misp_analysis.py.
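A regular-expression based correlation of this kind can be sketched as follows. The patterns and
function names are illustrative assumptions and not those of the thesis scripts: the example pulls
CVE identifiers and domain names out of report text and intersects them with values already known
to MISP.

```python
import re

CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
# Simplified domain pattern: dot-separated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b", re.IGNORECASE
)


def extract_observables(text):
    """Return the CVE IDs (upper-cased) and domains (lower-cased) found in text."""
    cves = {match.upper() for match in CVE_RE.findall(text)}
    domains = {match.lower() for match in DOMAIN_RE.findall(text)}
    return cves | domains


def correlate(report_text, misp_values):
    """Return the observables in the report that are already known to MISP."""
    known = {
        value.upper() if value.upper().startswith("CVE-") else value.lower()
        for value in misp_values
    }
    return sorted(extract_observables(report_text) & known)


if __name__ == "__main__":
    text = "Sample contacts evil-domain.example and exploits CVE-2018-8453."
    print(correlate(text, ["evil-domain.example", "cve-2018-8453", "other.example"]))
```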
Figure 45: Output of a sample from VirusTotal in MISP for Sodinokibi
Through the deep analysis of the malware, its activity has been identified and needs to be integrated
with MISP. If the activity is already available in MISP, the platform automatically detects the
similarity and shows other similar events or attributes. The indicators of Sodinokibi behaviour to
be utilized in the MISP platform are listed in Table 7 below:
Indicators             Value
Privilege Elevation    True; CVE-2018-8453
Affiliated countries   ROMANIA, RUSSIA, UKRAINE, BELARUS, ESTONIA, LATVIA, LITHUANIA,
                       TAJIKISTAN, IRAN, ARMENIA, CYRILLIC, GEORGIA, KAZAKHSTAN,
                       KYRGYZSTAN, TURKMENISTAN, CYRILLIC, RUSSIA
AV Detection           False; Process hollowing
Networking             True; Found Tor onion address
C&C                    True
Mutex                  True
Table 8 contains the list of signatures already utilized by the platforms for detecting the threat.
Networking Anti-Debugging
Table 9 illustrates other activities performed by the Sodinokibi malware as listed below:
File Created
File Moved
File Written
File Read
Directory Queried
Key Created
Key-Value Created
Key-Value Queried
Thread Delayed
LPC Port Activities Escalated
4. Code Repository and Structure
GitHub Repository Name: Behavioural-Analytics (Preetam, 2022)
Objective: Enrichment of threat sharing database for providing Behaviour Analytics through
malware analysis and extraction of Indicators and Behaviour.
Implementation:
1. Hashes are extracted from Malpedia and vx-underground. This code uses the Burp
Suite proxy server to analyse the large volume of data without causing overload. Also, the
VirusTotal analysis quota is limited to 500 requests per day, so the code keeps the extraction from
exceeding this limit.
2. get_vt_collection.py
Handles the collection of information from VirusTotal APIs for the hashes fed. If the sample is
already analysed in VT then it provides the details of the sample in JSON format.
3. extract_indicators.py
Performs analysis of the complex data in the JSON file and processes the data to extract relevant
indicators.
4. stix_bundle.py
5. attack_pattern.py
6. misp_analysis.py
5. Conclusion
Through the Cyber Threat Intelligence workflow, a practical analysis of the indicators
of a persistent and well-known threat has been carried out. This study provides a structured
methodology for analysing a threat and extends the current state of the art of the services
for threat detection and analysis. It also suggests the planning that any new or existing CTI
platform must undertake to keep tabs on highly contagious threats. A proper understanding of
the different infection techniques of the samples can provide an overall behavioural analysis of the
malware. Significant knowledge-sharing gaps have been addressed by using the STIX format in
the CTI platforms. The extraction and automation of CTI and the classification of the attacks
using taxonomies go beyond regular string matching, using the Levenshtein algorithm to find the
highest correlation between strings for the i2CAT datasets in the MISP platform.
One challenge is that the whole CTI lifecycle is an iterative process that needs to be refined
over time with human effort and technology so that, irrespective of the amount of data collected
or indicated, dealing with large unstructured data won't be a problem. The means to leverage
intelligence from different sources are either limited, cost-ineffective, or lacking in collaborative
effort. The inability to classify threats using standardized tags hinders assigning risk scores and
prioritising alerts. Enforcing the sharing of threat data for CTI is challenging, as the major
platforms have subscription tiers, in-house analysis of attacks, and/or multiple standards and
formats. Thus, data transformation was one of the major tasks in this study, both to utilize the
threat data for analysis and to convert it to a standardized format. This study also addressed the
gaps in the visualization and analysis of raw complex data.
6. Future Work
Due to limited time, automating the APIs for all the indicators and behaviours of the
malware was treated as out of scope. The research and automation outlined in this thesis can serve
as a basis for gathering additional attributes in the future, in order to have a comprehensive
view of a threat actor. Also, a robust ML/AI model could provide an edge in scanning
through all the resources and automating this process to get accurate and reliable information
from the raw data. The lifecycle techniques used in the thesis can be documented and
compared against more suitable and advanced techniques available in the market. The normalized
user and system behaviour score can be integrated with all the platforms. There are many more
gaps in the cyber threat intelligence framework that need continuous attention and research in
order to have a filtered view for tackling the adversaries.
Glossary
Below is the glossary for abbreviations and technical terms used in this research document:
AI Artificial Intelligence
IP Internet Protocol
MISP Malware Information Sharing Platform
ML Machine Learning
MS Microsoft
OS Operating System
VM Virtual Machine
VT VirusTotal
Bibliography
BlackFog, 2022. 2021 Ransomware Attack Report. [Online]
Available at: https://www.blackfog.com/2021-ransomware-attack-report/
[Accessed 19 January 2022].
Fakterman, T., 2019. REvil / Sodinokibi: The Crown Prince of Ransomware. [Online]
Available at: https://www.cybereason.com/blog/research/the-sodinokibi-ransomware-attack
[Accessed 19 June 2022].
McAfee Labs, 2019. McAfee ATR Analyzes Sodinokibi aka REvil Ransomware-as-a-Service – What The
Code Tells Us. [Online]
Available at: https://www.mcafee.com/blogs/other-blogs/mcafee-labs/mcafee-atr-analyzes-sodinokibi-aka-revil-ransomware-as-a-service-what-the-code-tells-us/
[Accessed 19 June 2022].
Microsoft, 2019. Windows 10 and Windows Server 2019 update history. [Online]
Available at: https://support.microsoft.com/en-gb/topic/windows-10-and-windows-server-2019-update-history-725fc2e1-4443-6831-a5ca-51ff5cbcb059
[Accessed 19 June 2022].
Bianco, D., 2014. The pyramid of pain. [Online]
Available at: http://detect-respond.blogspot.com/2013/03/the-pyramid-of-pain.html
[Accessed 18 May 2022].
Zych, M. &. M. V., 2022. Enhancing the STIX Representation of MITRE ATT&CK for Group
Filtering and Technique Prioritization. ResearchGate, April.
Recovery, P., 2020. Sodinokibi Ransomware (Analysis and Recovery options). [Online]
Available at: https://www.provendatarecovery.com/sodinokibi-ransomware-recovery/
[Accessed 6 June 2022].
Soft, T., 2019. News - Malware & Hoax - TG Soft Cyber Security Specialist. [Online]
Available at: https://www.tgsoft.it/english/news_archivio_eng.asp?id=1004
[Accessed 5 April 2022].
MISP, n.d. MISP. [Online]
Available at: https://github.com/MISP/MISP
[Accessed 26 May 2022].
Appendices
Appendix B - List of VirusTotal API Objects