Master's Thesis
Freddie Barr-Smith
University of Oxford
This dissertation is submitted in partial fulfilment of the requirements for the degree of
Master of Science in Software and Systems Security at the University of Oxford.
The author confirms that this dissertation does not contain material previously submitted
for another degree or award, and that the work presented here is the author’s own, except
where otherwise stated.
Abstract
The primary aim of this dissertation is to identify malware behaviour and classify malware type, based on the network traffic produced when malware is executed in a virtualised environment.
This is accomplished by producing a platform with the ability to clone and deploy virtual machines, deploy and execute malware, and collect traffic from the executed malware samples in the form of network packet captures. These packet captures are then subjected to analysis to facilitate the extraction of behaviours from each network traffic capture.
Behaviours extracted from the network packet captures are then aggregated, and weighted heuristics are applied to classify malware type. Information resulting from dynamic analysis is then presented to the user of the platform, in addition to other contextual information regarding the malware.
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Review of Literature and Tooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Dynamic Malware Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Malware Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Behavioural Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Blended Threat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Malware Analysis Via Network Traces . . . . . . . . . . . . . . . . . . . . . . 12
2.5 Grouping of Malware by Network Behaviour . . . . . . . . . . . . . . . . . . 13
2.6 Anti Forensics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6.1 Anti-Disassembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.2 Anti-VM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7 Anti-Anti Forensics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1 Requirements Elicited Through Surveys . . . . . . . . . . . . . . . . . . . . . 17
3.2 Survey Ethics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Survey Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Non-Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1 Overall Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Virtual Machine Deployment and Malware Execution Mechanism . . . . . . . 22
4.2.1 Malware Execution Mechanism . . . . . . . . . . . . . . . . . . . . . 23
4.2.2 Anti-Anti-Forensics Capabilities . . . . . . . . . . . . . . . . . . . . 25
4.3 Data Collection and Exportation of Network Traffic From Malware . . . . . . 26
4.4 Web Application Interface for Malware Analysis . . . . . . . . . . . . . . . . 26
4.4.1 Malware Information . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Behavioural Heuristic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.6 Classification of Malware Type Resultant from Heuristics . . . . . . . . . . . 28
5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.1 Selection of Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.1.1 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2 Malware and Threat Intelligence Sources . . . . . . . . . . . . . . . . . . . . 30
6 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.1 Dynamic Analysis Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.1.1 Clustering by Traffic Type . . . . . . . . . . . . . . . . . . . . . . . 34
6.1.2 DNS and IP Address Analysis . . . . . . . . . . . . . . . . . . . . . 34
6.1.3 Propagation, Infection and Action . . . . . . . . . . . . . . . . . . . 35
6.1.4 HTTP Traffic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.1.5 DNS and IP Traffic Analysis . . . . . . . . . . . . . . . . . . . . . . 37
6.1.6 Temporal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.1.7 Geolocation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.1.8 Traffic Pattern Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.2 Weighted Heuristic Analysis of Characteristics and Behaviour . . . . . . . . . 40
6.2.1 Scanning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.2.2 Server vs. Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.2.3 Downloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.2.4 Tor Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.2.5 DNS Blacklists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.2.6 Crypto Mining Pools . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.2.7 Fast-Flux Domain Switching . . . . . . . . . . . . . . . . . . . . . . 46
6.2.8 Web Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.3 Behaviour and Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.3.1 Weighted Heuristics for Classifying Malware Type . . . . . . . . . . 48
6.3.2 Cryptocurrency Mining Malware . . . . . . . . . . . . . . . . . . . . 49
6.3.3 In-Browser Cryptojacker . . . . . . . . . . . . . . . . . . . . . . . . 51
6.3.4 Ransomware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3.5 Remote Access Trojan . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.6 Fake Antivirus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.7 Botnet Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.3.8 Worms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3.9 Droppers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.3.10 Exploit Kits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3.11 APT Malware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.1 Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 User Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.1 Technical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.2 Personal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Bibliography 68
Appendices 79
1 Results of Malware Samples Classification . . . . . . . . . . . . . . . . . . . . . . . . 79
1.1 Cryptocurrency Mining Malware . . . . . . . . . . . . . . . . . . . . . . . . . 79
1.2 In-Browser Cryptojacker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
1.3 Ransomware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
1.4 Remote Access Trojan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
1.5 Fake Antivirus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
1.6 Botnet Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
1.7 Worms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
1.8 Droppers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
1.9 Exploit Kits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
1.10 APT Malware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2 User Testing - Survey Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
List of Figures
6.7 Process For Extracting Classifications Via Behavioural Heuristics . . . . . . . . . . . 41
6.8 Port Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.9 Pie Chart of Network Traffic Direction of Client Malware . . . . . . . . . . . . . . . 43
6.10 File Types and Hashes Downloaded in Malware Analysis Visualisation . . . . . . . . 44
6.11 Tor Network Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.12 Cryptocurrency Mining Pool Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.13 DNS Common Resource Record - Time To Live . . . . . . . . . . . . . . . . . . . . . 47
6.14 DNS Packet Header - Total Answers Field . . . . . . . . . . . . . . . . . . . . . . . . 47
6.15 XMRig Monero Mining Malware In Operation . . . . . . . . . . . . . . . . . . . . . . 49
6.16 Malware Classified as Cryptominer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.17 In-Browser CryptoJacking Malware Executing . . . . . . . . . . . . . . . . . . . . . . 51
6.18 Malware Accurately Classified as In-Browser Cryptominer . . . . . . . . . . . . . . . 52
6.19 Ransomware Executed in Virtualised Hosts . . . . . . . . . . . . . . . . . . . . . . . 53
6.20 Ransomware Executed in Virtualised Hosts . . . . . . . . . . . . . . . . . . . . . . . 54
6.21 Malware Classified as Ransomware . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.22 Fake Antivirus Executed in Virtualised Hosts . . . . . . . . . . . . . . . . . . . . . . 56
6.23 Malware Classified as Fake Antivirus, Blended with Other Types . . . . . . . . . . . 57
6.24 Botnet Command and Control Structure . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.25 Malware Classified as Botnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.26 Worm Propagation Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.27 Worm Scanning Non-Vulnerable Host . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.28 Worm Scanning and Exploiting Vulnerable Host . . . . . . . . . . . . . . . . . . . . 60
6.29 Malware Classified as Worm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.30 Dropper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.31 Malware Classified as Dropper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.32 Malware Classified as Exploit Kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.1 Were You Able To Use This System To Clone Virtual Machines And Execute Malware? 88
2.2 Is The Mapping Of Behaviours To Classifications Accurate? . . . . . . . . . . . . . . 89
2.3 Does This Visualisation Augment Your Understanding Of The Malware? . . . . . . . 89
2.4 Is the Graphic Visualisation of Malware’s Network Traffic Easy To Understand? . . 89
2.5 Did You Find That The Statistical Analysis Provided Actionable Intelligence? . . . 90
1 Introduction
Malware is software that is designed to cause damage to the computer systems that it infects or to subvert the intended usage of those systems. The deliberate development of malware, its use of [1]“more sophisticated exploitation methods”, and its increased sophistication in obfuscation and delivery are relatively recent phenomena.
The term virus was first introduced in a 1984 paper by Cohen, where it referred to a concept with limited experimental use at the time. The [2]“first virus was conceived of as an experiment to be presented at a weekly seminar on computer security”. In the decades since that paper was published, an entire industry has been created for the protection of computer systems from malicious software. Twinned with this has been the rise of a cybercrime industry for the production of malware. This reflects an overall transformation of computer viruses from light-hearted pranks or vehicles for notoriety into tools for espionage and cybercrime.
As malware propagates through computer networks, infected computers will leave forensic evidence of the malware's activity in the form of its network traffic. Dynamic and static analysis can also be performed on the code and its interactions with the operating system and memory itself; however, this is not the methodology used within this project to identify and analyse malware, as we will be analysing the network traces produced.
There are different levels of skill employed to develop malware. More experienced developers will typically be able to create more complex and obfuscated code which only executes in certain environments and is resistant to analysis. These properties are not solely available to experienced malware developers, however: [3]“malware creation toolkits greatly lower the novice attackers barriers to enter the cyber-crime world allowing inexperienced attackers to write and customise their own malware samples and lead to a massive proliferation of new malware samples due to their wide availability”.
Malware may also try to evade detection and analysis. Methods used include detecting that it is being run in a virtualised environment and attempting to minimise its memory and network signatures. This has led to the development of polymorphic malware and packers to defeat signature-based analysis. The increased utilisation of heuristic analysis of malware has led to a corresponding rise in malware's capability to evade this heuristic analysis.
Malware may also try to disguise its network traffic to enable it to exfiltrate data using common
network protocols. This use of covertness is distinct from cryptography in that it is not trying to
make the data unreadable, but instead is trying to make the transmission and communication of
data seem indistinguishable from normal network traffic. This is defined as the [4]“camouflage of the
transmission of data” and means that malware authors attempt to make the data as inconspicuous
as possible. Due to the isolated sandbox nature of the system being developed, this camouflage
attempt is thwarted somewhat as the presence of any network traffic whatsoever is likely indicative
of malicious activity.
2 Review of Literature and Tooling
There is a wide variety of literature and research in the nascent field of malware analysis.
It is logical to build upon existing research in the field in the development of this system. A paper by
analysts at Lockheed Martin developed what has been referred to as the [5]“Intrusion Kill Chain”
which has been well regarded and is commonly seen as the attack model that intruders follow when
gaining access to a system. This malware analysis system does not detect the event of initial inbound
exploitation and intrusion to a system, but rather operates at the level of intrusion detection and
response. In the kill chain model, the parts of the attack process that are observed and analysed by
this system are the [5]“Installation, Command and Control and Actions on Objectives” portions of
the attack. However, other portions of the malware's execution may result in network activity that can be classified as belonging to other parts of the kill chain.
This separation of exploit and payload as distinct parts of the malware’s operation is more nebulous
than it may initially appear, as when malware has achieved persistence on a target it will frequently
look for other wormable targets on the local network. This is an indication of the virus's attempt
to propagate and spread to other systems, but crucially involves the deployment of an exploit
and scanning capabilities. This propagation is so common amongst computer viruses that early
definitions of malware stated that [6]“in order to determine that a given program P is a virus it
must be determined that P infects other programs”, meaning that the propagation mechanism was
seen as the defining aspect of a virus.
An example of the exploit and payload of malware being distinct and separate parts of a malware sample is the corpus of malware developed in 2017 after the leak of the EternalBlue exploit. The malware samples using this exploit as an entry vector differed greatly in their actions after the initial target exploitation.
This separation of exploit and payload mirrors the deployment of early genes, for the sake of infection
and maintenance of persistence and the deployment of late genes, for propagation and replication,
that is present in biological viruses. This model is visible diagrammatically below.
Figure 2.2: [7]Deployment Model of Early and Late Genes of Biological Virus
Two notorious malware samples utilising EternalBlue were WannaCryptor and NotPetya. Even between these two samples, both of which were ransomware using the same attack vector, there was a difference in the propagation mechanism. NotPetya took a [8]“more targeted approach to infection” that [9]“uses a modified version of the Mimikatz tool to steal the user's Windows credentials” in combination with [9]“the EternalBlue exploit tool”, leading to [10]“the fastest-propagating piece of malware we’ve ever seen”. This is in comparison to the marginally less aggressive propagation methodology used by the [11]“WannaCry ransomware attack that affected more than 200,000 computers in 150 nations”. This serves as a practical example of the conceptual separation of exploits
and payloads. There are a variety of other payloads that have utilised the EternalBlue exploit as an entry vector for more subtle compromise. As the initial FuzzBunch framework from which this tool
was leaked was a remote access tool, this subtle compromise is more congruent with its original
intended usage.
Figure 2.3: [12]Exploit Utilised as Entry Vector - EternalBlue
Figure 2.5: [12]DoublePulsar Payload Used for BackDoor Installation
2.1 Dynamic Malware Analysis
There are two distinct techniques used for malware analysis: static analysis and dynamic analysis. Static analysis observes a malware sample's properties without executing it, in order to determine the characteristics of that malware. Static analysis can include source code analysis, with the limitation that malware authors will try to obscure their source code. The eventual result of static analysis is to provide a signature database of [13]“regular expressions that specify byte or instruction sequences that are considered malicious. A program is declared malware when one of the signatures is identified in the program’s code”. Many antivirus companies use a signature database for their commercial offerings, to detect malicious software present on customers' computer systems.
[14]“Dynamic analysis refers to techniques that execute a sample and verify the actions this sample
performs in practice”. This is achieved via analysis of the behaviour of the program in memory
and the use of debuggers such as IDA Pro. Dynamic analysis of memory is typically conducted
on an isolated virtual machine to prevent the compromise or damage of the host operating system.
Dynamic analysis will typically examine API and system calls in addition to the instruction trace for
a given binary or executable, to analyse the commands it executes in the format of assembly code.
Behavioural analysis of malware based on the network traffic that it generates during its operation
can be defined as dynamic analysis of malware. This is because it relies on analysing the data
generated by the executed program to classify and identify its behaviour as malicious.
There are a number of tools currently in use that are effective at conducting dynamic analysis of malware. Most existing malware analysis tools use virtualisation software in order to provide a platform on which the malicious code can be executed and analysed. This is in combination with additional software to allow for analysis of the behaviour of the malware during its execution.
There are many different options for creating this virtualisation environment, often bespoke for a particular purpose. An example of this is that [15]“a number of SCADA testbeds have been developed by academic, government and private entities to find new vulnerabilities”. However, as shown by the fact that analysis evasion techniques are specifically designed to detect VMware hosts, the most common platforms for deploying dynamic malware analysis sandboxes are VMware's and VirtualBox's virtualisation software.
With regard to the actual analysis of the behaviour of the malware, there is a plethora of software tools that can be used. One example is the Cuckoo sandbox software and its SaaS offering Malwr.com, which allows the upload and dynamic analysis of files in a manner similar to VirusTotal. This software is designed to be used in addition to traditional techniques: [16]“before using debugging techniques and static reverse engineering it can be useful to collect some corner pieces from a sandbox report”. This project outline is similar in design to the proposed malware analysis tool, in that its principal goal is to ensure that the environments in which malware is executed are sandboxed. It also shares the aim of being ancillary to in-depth static and dynamic analysis
of malware.
Debugger and disassembly software such as GDB and IDA Pro is used to track the instruction trace resulting from malware's execution. The aim of this software is to manually view the [17]“sequence of function calls that have been made in order for execution to reach a particular location within a binary”. This is conducted through stack traces and similar methods. These debugging and disassembly technologies operate solely on the given executable in memory.
There is a large amount of network and digital forensics research on which it is possible to build.
Regarding tools specifically for forensic analysis of malware via analysing the network data that is
generated, there are already a number of available solutions. Sandnet is a software and research
project that collects and performs analysis on network traffic data from malware. This project
aims to form a [18]“comprehensive characterisation of the network behaviours of malware” and in
many ways succeeds in this aim. The main analysis conducted by this project is of the collection of
network data from a large variety of malware samples in order to determine general characteristics,
rather than to probabilistically classify the malware based upon these characteristics. It is possible
to extract and imitate the successful elements of this research in order to further the development
of the system.
Another seminal research project in the field with many similarities to this is the work on the project
ANUBIS, which is a framework and toolkit for automatic analysis of unknown binaries. [19]“Malware
clustering to find partitioning of a given set of malware samples into subsets so that subsets share
some common traits” is the specific way in which ANUBIS excels.
2.2 Malware Types
There is a possible delineation of malware into different types and sub-types. The wide variety of malware types present is a reflection of the complexity of malicious code in the cybercriminal ecosystem. Classification of malware into types may fail to address the interconnected nature of malware, in that malware may possess behavioural characteristics of another type. This may manifest itself in a number of ways; one such way is that a ransomware variant may propagate and infect target systems through the methodology of a worm. Research has [20]“shown interesting similarities between malware families”, and the intersection between these different malware types means that there can be some shared traits between them. This means that there are classifications that are somewhat fluid in definition.
This classification of malware is similar in many ways to a biological taxonomy of viruses. When
building a malware taxonomy, a logical design architecture is a [21]“taxonomy built upon activities that may be grouped to define a program’s behaviour”. This classification allows for the fact that malware may contain several different behaviours, which together define the overarching malware type that it can be classified as.
Confusion between terms, definitions and classifications has been a problem that has faced computing since its inception. Ada Lovelace, in her 1842 “Sketch of The Analytical Engine”, stated that
[22]“wherever terms have a shifting meaning, independent sets of considerations are liable to become
complicated together and reasonings and results are frequently falsified”. The same applies to malware
definitions and the amorphous definitions in this field.
2.3.1 Blended Threat
Blended threat is a recent term that refers to the use of multiple attack vectors by given malware campaigns. It also refers to the dissolution of the boundaries between different malware types.
Symantec state that [23]“today and the near future will be composed of blended threats and their
damage is still yet unseen”. A particularly strong example of this is the adoption of worm-like propagation mechanisms by ransomware.
There are also other cases in which different malware types contain characteristics or behaviours
from other malware types. Droppers and exploit kits in particular are malware types that do not
exist independently of other malware classifications, as they are typically tools for deploying other
malware classes. Droppers have been described as [24]“malware samples that download additional
components, for example as part of pay-per-install schemes), they do not provide a global view of the
malware delivery ecosystem”. The fact that they only display part of the delivery kill chain means that they must be considered as part of an aggregated malware campaign.
The Neutrino exploit kit typifies the metamorphic nature of modern malware wherein [25]“different
filtering rules are enforced and different payloads can be delivered based on the victim Geolocation,
browser and operating system. This complexity makes these threats a very interesting case study
and difficult to defend against”. This illustrates the difficulty of effectively analysing and categorising modern malware samples, in addition to the complexity of the modern malware production ecosystem and cybercriminal economy.
Figure 2.6: [30]Hackforums Post Releasing Mirai IoT Malware as Open-Source Code
2.4 Malware Analysis Via Network Traces
[31]“Network behaviour may be indicative of malware and has been used to detect malware infection”, in that most malware has network functionality in order to achieve propagation or communication with its command and control servers. In addition to using network behaviour to detect whether or not a piece of software is malicious in nature, this network behaviour can be used to identify and group the particular type of a malware sample.
2.5 Grouping of Malware by Network Behaviour
It is clear from existing research that a valid route of malware analysis is clustering malware by
its network behaviour. Network traffic is used to establish baseline network behaviours. This
behaviour can then be aggregated as part of a set and be used to classify malware that exhibits
certain behaviours as a specific type. Stringhini proposes that [32]“malicious communities tend to
exhibit a high degree of synchronisation in their events”, meaning that timing based and IP or DNS
based behaviour can be used to group malware.
2.6 Anti Forensics
Malware authors are aware that their software may be subject to forensic analysis, and thus many use obfuscation and analysis-detection techniques to make malware analysis more difficult. The behaviours by which malware seeks to obfuscate itself and operate more effectively are [33]“ensuring entry point obfuscation, resisting manual and automated analysis, obfuscating the communication of instructions and ensuring information exfiltration”.
Polymorphism and oligomorphism in viruses are a recent trend in viruses' attempts to evade detection. [34]“Evasive polymorphism” is a technique wherein malware's memory and file signature are transformed in each sample of the binary or executable, to confound the typical virus detection mechanism of signatures.
A significant advantage of the analysis of network traffic is that viruses do not typically use polymor-
phism to evade detection and analysis of their network signatures. Recent research by Martinovic
and AlAhmadi has shown that [35]“the main challenge in most behavioural-based malware analysis
approaches is malware behaviour obfuscation and manipulation, known as noise-injection attacks.
Although malware evasion by altering the binary itself might be feasible due to available obfuscation
tools, we believe that the network behaviour is more troublesome to tamper with”. Although polymorphism, obfuscation and other methods of disguise are present in various malware samples, they do not make all data collected for those samples invalid. [36]“Malware executables of the same family, collected nearby in time, reuse endpoints even if their payloads are polymorphic”. Therefore network
traffic signatures indicative of malware activity can be identified and clustered via the analysis of
metadata. This means that network traffic analysis has the advantage over other forms of analysis
of not being flummoxed by obfuscation attempts by malware authors.
2.6.1 Anti-Disassembly
One such method is the detection of code being disassembled or of the presence of a disassembler, such as IDA Pro, on the guest host, in addition to obfuscation of the source code and behaviour of the malware binary or executable. This is not a concern for our system, as we will not have a disassembler present on the system and are not conducting source code or binary analysis of the malware itself.
2.6.2 Anti-VM
An obstacle to the development of this system is malware's use of techniques to detect whether it is running in a virtual machine. [37]“To thwart attempts at analysis, the malware attempts to detect whether it
is being run inside a virtual machine”. A variety of mechanisms will be deployed by a range of the
malware samples analysed, in order to detect whether they are operating in a virtual machine. [38]“The
percentage of malware that detects VMware hovered around 18 percent” and this indicates that a
significant percentage of malware is able to alter its execution to prevent analysis from occurring.
Figure 2.7: Anti-Forensics via Anti-VM CPU Detection in Execution of Smominru malware
There has been code developed that is able to escape virtual machines and hypervisors. Although this has since been patched, it did allow attackers to [39]“reliably execute code from the guest into the host”. This allows malware authors not only to evade detection by analysts but also to compromise the machine which is conducting the analysis. This attack vector is a continual threat to people using sandboxing systems. Google's Project Zero team were able to use a variety of methodologies for escaping virtualisation software to attack the host operating system.
Company and Products MAC Unique Identifier
VMware ESX 3, Server, Workstation, Player 00-50-56, 00-0C-29, 00-05-69
Microsoft Hyper-V, Virtual Server, Virtual PC 00-03-FF
Parallels Desktop, Workstation, Server, Virtuozzo 00-1C-42
Virtual Iron 4 00-0F-4B
Red Hat Xen 00-16-3E
Oracle VM 00-16-3E
Xensource 00-16-3E
Novell Xen 00-16-3E
Sun xVM VirtualBox 08-00-27
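One rudimentary check of this kind tests whether the host's MAC address begins with one of these vendor prefixes. The sketch below shows such a check, using the prefixes from the table above; it illustrates the detection logic that the analysis platform must evade, rather than any particular malware sample's code.

```python
import re
import uuid

# OUI prefixes associated with common virtualisation products (from the table above).
VM_MAC_PREFIXES = {
    "00:50:56", "00:0c:29", "00:05:69",  # VMware
    "00:03:ff",                          # Microsoft Hyper-V / Virtual PC
    "00:1c:42",                          # Parallels
    "00:0f:4b",                          # Virtual Iron
    "00:16:3e",                          # Xen-based (Red Hat, Oracle, XenSource, Novell)
    "08:00:27",                          # VirtualBox
}

def looks_virtualised(mac=None):
    """Return True if the host's MAC address carries a known VM vendor prefix."""
    if mac is None:
        # uuid.getnode() returns the primary interface's MAC as a 48-bit integer.
        mac = ":".join(re.findall("..", f"{uuid.getnode():012x}"))
    return mac.lower()[:8] in VM_MAC_PREFIXES

print(looks_virtualised())
```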
2.7 Anti-Anti Forensics
To foil obfuscation attempts by malware authors, the virtualisation and network traffic capture elements of this software must be able to bypass these anti-forensic mechanisms in order to record reliable data. This means that the virtual machine which is deployed must suppress evidence of its virtualisation.
Detection of certain characteristics of the operating system on which the malware is deployed and
executed is one method by which anti-forensics components of malware function. This is a rudimentary analysis of the host environment itself performed by the malware. To confound this malware
obfuscation technique it is necessary to disguise the fact that the host is virtualised and is being
monitored.
Detection of network characteristics and liveness checking is a common feature of malware self-protection. A particular example of this is the WannaCry malware, which checked for the presence of a killswitch domain before execution. [40]“The domain is a kill switch in case something goes wrong”; in this instance it may also have been attempted anti-analysis, wherein the feature was developed to prevent analysis by systems that have a responding DNS server. However, typically this behaviour performs liveness detection of a given host, in that the malware will only display its actual behaviour when it is able to verify internet connectivity.
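A minimal sketch of this style of check is shown below; the domain is a placeholder rather than the real WannaCry killswitch domain, and the logic is illustrative only.

```python
import socket

# Hypothetical killswitch/liveness domain; the real WannaCry domain is not reproduced here.
KILLSWITCH_DOMAIN = "example-killswitch-domain.test"

def domain_resolves(domain):
    """True if the domain resolves, i.e. the malware believes it has connectivity."""
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False

# WannaCry-style logic: if the killswitch resolves, exit without acting; many other
# families invert this and only run once connectivity has been confirmed.
if domain_resolves(KILLSWITCH_DOMAIN):
    print("killswitch active - malware would abort")
else:
    print("no response - malware would proceed")
```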
Figure 2.9: Normal DNS Server
This proposed system will not be wholly resistant to anti-forensic methodologies, despite its attempts at anti-anti-forensics, and can be seen to have some limitations in confounding anti-forensics. This is due not only to the ingenuity of malware authors in detecting sandboxed or virtualised systems but also to the sheer variety of mechanisms that are used to detect sandboxing. [41]“Since we run malware samples automatically with no human interaction, such behaviour will not occur in our traces.” Although it is possible to create sandboxes that more closely imitate the actual wear and tear of legitimate computer use, there are diminishing returns to such an endeavour.
Comparable systems have focused more on establishing a believable honeypot. One such method
of ensuring that active attackers are fooled is establishing personas and fake filesystems imitating
an actual user. This research and experimentation on virtual victims of a remote access trojan
found that [42]“personas with more detailed file systems occupied operators longer”. The usage of
the system by a normal user should dynamically generate a variety of process activity and a number
of log events that are not easy to imitate. Thus malware has begun to detect the [43]“wear and
tear that is expected to occur as a result of normal use” and trigger execution dependent on the
presence of this type of evidence on the system. To an extent it is the virtualisation of the operating
system itself that is causing the problems related to the detection of a sandbox. In many ways the
[44]“ultimate way to thwart such detection is to analyse malware in a bare-metal environment”. For
this reason many antivirus vendors, threat intelligence companies and security consultancies use a
bare-metal analysis environment in order to effectively analyse the samples that are provided.
3 Requirements
Before designing the system it is necessary to define the functionality of the system through a set of
formally defined requirements. These requirements must be both non-functional and functional in
order to effectively assist with the design of the system in addition to being objective and quantifiable.
Systems engineering as a discipline defines requirements as the [45]“needs and objectives for the
system and how they relate to how well the system will work in its intended environment”. It is also
necessary to define the constraints imposed on the system.
The inputs to the designed system will be malware samples that will be executed. The outputs of
the given system will be network traffic that is analysed and presented to the user or analyst as actionable data and information regarding the particular malware samples. Another form of output will be the characteristics, behaviour and classification of the given malware, based on the network data
analysed within the system.
3.1 Requirements Elicited Through Surveys
Essential to gaining an effective understanding of the requirements of malware analysts is to survey them directly. This was the approach taken in order to elicit requirements for the development and design of the system. These requirements were gleaned from a survey of malware analysts on online forums and of personal acquaintances. An essential part of engineering successful
systems is [46]“gathering and understanding information about the users of the system, as well as
the technical requirements of the system”.
Eliciting professional opinions on the subject by surveying potential users of the software is important, and results in a list of user-supplied requirements and an evolutionary design methodology. The questions on the survey principally relate to the tools currently utilised by malware analysts and also to the design features the system should contain.
3.2 Survey Ethics
To be compliant with the relevant ethical and legal requirements for the collection of participant data, it is necessary to take ethical precautions. These ethical precautions are also taken for the purpose of collecting and analysing survey results during user testing at the conclusion of the project.
The acquisition of data included obtaining informed consent for the processing of the data provided. After this data was acquired, it was stored on a host in order to prepare the survey results for
use in this report. This host is protected by the use of full disk encryption to prevent so-called
evil-maid attacks wherein an [47]“attacker gains access to your shut-down computer”. The utility of
this encryption is that [48]“FDE solutions aim to provide data security, even in the event that an
encrypted device is lost or stolen. All information is encrypted/decrypted on the fly, automatically
and transparently. Without the encryption key, the data stored on the disk remains inaccessible to
any users (regular or malicious)”. In this way the security of the data is maintained. This collected survey data was then removed once it had been processed into aggregate and anonymised form.
The process of anonymisation means that when an individual or organisation [49]“converts personal
data into an anonymised form and discloses it, this will not amount to a disclosure of personal
data”. If the data is anonymised and not stored locally, then this is not technically defined as disclosure of personal data. This anonymisation is for the purpose of avoiding the [50]“possibility of re-identifying anonymised datasets”. There are a variety of techniques that can be used to achieve this. Some of these techniques are [51]“removal of identifiers, the use of pseudonyms and other technical means for breaking the link between data and identifiable individuals such as broadbanding or micro-aggregation”. Specifically, removal of identifiers and broadbanding have been used to ensure anonymisation of the data.
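As an illustration of these two techniques, the sketch below strips direct identifiers from a survey record and broadbands a numeric attribute; the field names are hypothetical rather than the actual survey schema.

```python
# Minimal sketch of the two techniques applied to a survey record:
# removal of direct identifiers, and broadbanding of a numeric attribute.
def anonymise(record):
    banded = dict(record)
    for identifier in ("name", "email", "employer"):
        banded.pop(identifier, None)                      # removal of identifiers
    years = banded.pop("years_experience", None)
    if years is not None:                                 # broadbanding into coarse ranges
        banded["experience_band"] = "0-4" if years < 5 else "5-9" if years < 10 else "10+"
    return banded

print(anonymise({"name": "A. Analyst", "email": "a@example.org",
                 "years_experience": 7, "uses_wireshark": True}))
```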
3.3 Survey Results
This survey used both qualitative and quantitative methods to establish malware analysts’ opinions
of the field of network forensic analysis applied to malware investigation. These questions and their
aggregated answers were chosen principally to guide the design of the system. [52]“This is because
the system stakeholders have the most pertinent domain knowledge. Therefore any decision taken by
these stakeholders should take into account their different needs”. As such, it is pertinent to survey
people with relevant domain knowledge, those being current malware analysts.
Survey responses were analysed in aggregate, and the data is sufficiently anonymised in its presentation that it does not compromise the confidentiality of the survey respondents. The results of the survey indicated that a large number of malware analysts use a wide variety of pre-existing toolkits to accomplish their goals of analysing malware.
As shown in the above graph, the survey participants typically wanted both a command line and
web interface format in which they could view results. This corresponded with questions regarding
potential features for the toolkit, wherein multiple survey respondents stated that they would like output in the format of pcap files, interpretable by Wireshark.
Another interesting statistic shows the host operating systems on which malware is installed for
dynamic analysis. The majority of hosts under analysis are Windows systems. This is somewhat counter-intuitive, as one might expect a roughly even split between Windows and Linux hosts in terms of systems analysed. A potential reason for this divergence is the significant prevalence of [53]“modern Windows desktop malware”, giving a larger number of samples. As Windows is also the most prevalent operating system within the consumer marketplace, it also makes financial sense for malware analysts to develop a strong skillset in analysing malware on the most common operating system.
In terms of network activity, there is a prevalence of connections to command and control infrastructure observed by malware analysts. In addition to this, there is liveness and sandbox detection performed via network communication. Dropper functionality, wherein the payload of the malware is fetched via further communication with command and control infrastructure, can also be seen as a characteristic of malware that results in network activity. Not mentioned in the survey is data exfiltration, which is another common network behaviour of malware.
It can be seen from the survey responses and comments that malware makes obfuscation attempts to restrict its detection by analysts or to avoid alerting targets that anything is amiss. According to surveyed analysts, this can take the form of encrypted traffic via SSL or the use of
intermediary proxy servers. There may also be polymorphic network activity, in that authors make
attempts to evade signature-based detection by making network activity different for each malware
sample.
Despite this obfuscation, it is still possible to group families of malware by the characteristics of the malware that are available. In a qualitative element of the survey, analysts stated that although recent
developments in the sophistication of malware are making classification more difficult, it is still
possible.
3.4 Non-Functional Requirements
In terms of the non-functional requirements of the system, there are performance constraints which
must be adhered to. The [54]“specified hardware constraints” are 16 gigabytes of RAM and 2 terabytes of storage; this means that the system must not require an undue amount of computation.
The web interface component of the system must also be usable and understandable to the extent that, during user testing, people not familiar with computer security are able to understand the analysis presented. It must also adhere to good user design and experience principles and be aesthetically
pleasing.
The system must also be secure in that the malware samples downloaded and executed must not
propagate throughout the network or onto the host machine. This is accomplished via dynamically
cloned machines for each testing instance. Virtual machines that have had malware executed upon them must not be reused and must be destroyed, both to maintain the security of the host machine and to ensure that the behaviour of one malware sample does not occur in another sandboxed testing
environment.
There is also the requirement for this system to be tested in a variety of ways. One aspect of this
testing must be unit testing, in which the functionality of individual functions and components of
the code must be tested programmatically. The system will need extensive amounts of testing and
analysis of malware samples as this is the primary function of the system.
There must also be user testing of the system once it is completely functional. This testing will ensure
that the initial requirements and the design emergent from these requirements can be evaluated as
fulfilled.
3.5 Functional Requirements
Functional requirements enumerate the functions that a system should perform. There are a few
key functions that the system is required to facilitate. These are the deployment and cloning of
virtual machines, the deployment and execution of malware on these target machines, the capture
and storage of this data and the analysis and visualisation of this collected data. Each of these
components should conform to a number of requirements in order to be classified as effective.
The virtual machine deployment and cloning portion of the system should be as fast as possible in terms of execution speed, as otherwise it can slow the execution of the remainder of the system.
It should also use anti-forensics evasion in order to ensure that reliable collection of data is possible.
It should additionally be easily accessible and intuitive for a user to deploy new VMs for malware
transfer and execution.
The malware deployment and execution component of this system must be able to operate on a wide variety of operating systems, to enable different sandboxed environments. The objective of a sandboxed environment is to provide a [55]“generic, systematic methodology that can be applied to various VMM platforms and operating systems”. This platform-agnostic deployment of malware greatly increases the number of malware samples that can be effectively deployed. This means accommodating the different execution environments present in Windows and Linux systems.
The network traffic capture and storage resultant from the execution of this malware must be
exported into pcap files which can be stored and analysed effectively.
The data analysis and display portion of this system must be understandable by a non-technical user, in addition to providing more in-depth analysis of a malware sample's network activity to a skilled analyst. The analysis component should be able to classify the malware, and determine whether a program is malicious or not, solely from the network trace provided as input.
4 Design
The design of this system must be composed of a number of distinct components of software to
function correctly as a system. Design must also be informed by the collection and analysis of requirements: [56]“specification of user and organisational requirements, production and testing of
design solutions and evaluation of designs against user requirements”. These requirements can be
adapted into design principles that are factored into the design process.
In addition to the bespoke requirements and design principles that emerge from the requirements solicitation, there should be adherence to commonly regarded good software design principles. A core principle is that [57]“software does not directly implement the functionality of a system. Instead, it implements a set of abstractions or theories that provide the desired functionality.” For the system to effectively achieve its underlying goals, it is important that these abstract principles are logical and follow best practices.
One of the core design principles adhered to is the abstraction and modularity of separate parts of the system, which, whilst functioning effectively as constituent modules and objects, should be able to work together as a cohesive whole. This is defined as the [58]“appropriate levels of abstraction and modularity that make it possible for individuals and the components they manage to work effectively”. These separate parts should function as part of the malware analysis platform; this separation and abstraction also allows for concurrency of task processing. In addition to this, regular backups are taken of the codebase and stored data in order to ensure the system does not fail or suffer data corruption.
Regarding the modules of this system, it is necessary to list them and describe how they cooperate to form a functional system. The interface with which the user interacts is the web interface and application. This web interface is modelled on that provided by OpenStack's Horizon dashboard: [59]“With this web based GUI provided by Horizon operations for instances (such as launching, termination, suspension etc.)” can be performed. It is from this interface that other subsystems are accessed in order to achieve the goals of the system. This architecture is congruent with the recommended architecture for Django systems, that being the [60]“LAPD (Linux, Apache, PostgreSQL, and Django)” stack, wordplay on the standardised web architecture of the [61]“LAMP (Linux, Apache, MySQL, PHP)” stack. PostgreSQL is the database architecture supporting the web application and its deployment and analysis modules.
Subsystems accessible from within the web application are the virtual machine deployment and
malware execution modules.
Figure 4.2: Structure of Virtual Machine and Malware Execution Mechanisms
Also within the web application are the malware analysis and visualisation elements. Malware
analysis, particularly network forensic analysis of malware, requires [62]“large amounts of data,
complex data analysis requirements, and the combination of automated data analysis with analytical
reasoning by domain experts lends itself very well to the notion of visual analytics”. This requires
the system to graphically visualise relevant elements of the data extracted from the packet captures.
4.2 Virtual Machine Deployment and Malware Execution Mechanism
The deployment of an operating system for a particular malware sample and its subsequent execution
is an important component of the system. This is because the system is dependent on the deployment
of an operating system and the execution of a binary or executable file. The deployment of these virtual machines is performed through the web interface, to facilitate the deployment and execution of malware and the retrospective [63]“observation of the execution of malware programs”.
Figure 4.3: Operating System Cloning Functionality
The virtual machines onto which malware is deployed should be destroyed after the execution and
analysis has been performed. Additionally, malware should only be deployed onto cloned operating
systems to prevent cross-contamination of malware samples from one infected host. This is to ensure
that the base image remains free of malware and thus does not cause errors in the data due to the
aforementioned cross-contamination.
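A minimal sketch of how this clone-and-destroy lifecycle could be driven is given below, assuming VirtualBox (the hypervisor chosen later in this chapter) and a hypothetical base image name.

```python
import subprocess
import uuid

BASE_IMAGE = "win7-base"  # hypothetical name of the clean base VM

def clone_sandbox():
    """Clone the clean base image into a fresh, single-use sandbox VM."""
    name = f"sandbox-{uuid.uuid4()}"
    subprocess.run(["VBoxManage", "clonevm", BASE_IMAGE,
                    "--name", name, "--register"], check=True)
    return name

def destroy_sandbox(name):
    """Power off and delete the sandbox so an infected clone is never reused."""
    subprocess.run(["VBoxManage", "controlvm", name, "poweroff"], check=False)
    subprocess.run(["VBoxManage", "unregistervm", name, "--delete"], check=True)
```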
4.2.1 Malware Execution Mechanism
The execution of malware will be system agnostic, in that when malware is entered into the system
it should be able to be deployed on a wide range of operating system versions. Malware that
executes on both Windows and Linux systems is rarer than a strain that executes on only one platform. However, these strains do exist; one example of such a piece of malware is CrossRAT, which is [64]“able to execute on a wide range of platforms”. There is a far larger corpus of malware
designed to run on Windows systems than there is for other operating systems. This is due to the
larger user base of Windows systems and therefore the wider landscape of potential hosts to infect.
[65]“64-bit Windows operating systems are gaining an increasing market share, and currently hold
a clear majority of the operating system market. Despite the high proportion of 64-bit users, 64-bit
malware still makes up less than 1 percent of the current threat landscape”. As such the malware
execution mechanism will be geared towards executing binary payloads on Windows systems, with capabilities to execute on both 64-bit and 32-bit architectures.
Figure 4.4: Malware Transfer and Execution Functionality
The web application will ideally enable the analyst to effortlessly deploy malware onto a cloned operating system and execute it on that system. The eventual aim of this execution is to [66]“capture the complete program behaviour” that results from the deployment of code on a virtualised system. These malware samples should be run via a standardised execution process. This may prevent the execution and effective capture of malware with a more advanced execution process for target systems. This is an acceptable loss, as the vast majority of malware typically does not use these more complex execution patterns.
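One way such a standardised execution process might be scripted is sketched below, using VirtualBox guest control with hypothetical guest credentials and paths; the exact guestcontrol syntax varies between VirtualBox versions.

```python
import ntpath
import subprocess

GUEST_DIR = "C:\\Users\\analyst\\"               # hypothetical guest user directory
GUEST_USER, GUEST_PASS = "analyst", "password"   # hypothetical guest credentials

def execute_sample(vm_name, sample_path):
    """Copy the sample into the guest and launch it via a fixed, standardised process."""
    guest_exe = GUEST_DIR + ntpath.basename(sample_path)
    # Transfer the sample from the host into the cloned guest.
    subprocess.run(["VBoxManage", "guestcontrol", vm_name, "copyto", sample_path,
                    "--target-directory", GUEST_DIR,
                    "--username", GUEST_USER, "--password", GUEST_PASS], check=True)
    # Start the sample without waiting for it to exit.
    subprocess.run(["VBoxManage", "guestcontrol", vm_name, "start", "--exe", guest_exe,
                    "--username", GUEST_USER, "--password", GUEST_PASS], check=True)
```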
Liveness detection checks are typically used in malware; however, regarding the execution of malware, the initial surveying of analysts indicated that in order to execute the malware and collect network traffic, it is not necessary to do anything more than start the malicious binary or executable. There are no attempts to evade packers or other conditional execution methods of obfuscation that are present in some samples. The reasoning behind this is that, unless specifically designed to only target a given organisation or hardware configuration, the malware will be designed to execute on as many potential targets as possible, surreptitiously or not.
Similarly, there is also network liveness detection; as established in the survey of analysts, this will require anti-anti-forensics in the form of ensuring that the virtual network on which the system resides resembles a real network as much as possible. Although the use of virtualised networking infrastructure was considered with regard to achieving this capability, it was deemed not
worth implementing. This is due to the NAT configuration in use by the virtual machines and the
host operating system. This NAT configuration mimics a real home or small business network, which
is the likely target of many different malware strains.
4.2.2 Anti-Anti-Forensics Capabilities
Anti-anti-forensics capabilities must be ingrained within each operating system that is cloned and deployed. This will enable malware execution to behave as expected, without killswitch detection being triggered and thereby causing the malware not to behave as it would in the wild. Additionally, anti-anti-forensics capabilities are ancillary to the underlying central functionality of the system.
To achieve this anti-anti-forensics capability, there are steps that need to be taken in the design of the system, which can be verified via an anti-forensics detector. This software aims to [67]“employ several techniques to detect sandboxes and analysis environments in the same way as malware families do”. The tool utilised to do this is called Paranoid Fish (pafish) and is run automatically on all hosts, to ensure that the host does not display common evidence of virtualisation, which can trigger malware evasion mechanisms.
The anti-anti forensics technologies are a direct response to the techniques used by malware authors
for the purpose of sandbox detection. These will require manual configuration by the operator of
the malware analysis system. One such method is altering the default MAC address utilised by
VirtualBox via [68]“Medium Access Control (MAC) address spoofing” to foil malware anti-forensics
deployment.
A common test for a virtualised system is the amount of processing power and storage that is local to the machine. Paranoid Fish detects whether there is more than 60 gigabytes of allocated space for the system; it also detects whether the target machine has fewer than two processors in use. Therefore the validity of the operating system is determined partially by the [69]“number of processors in the system”. For this reason the systems are manually configured to use 2 processors and at least 1 gigabyte of memory each.
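A sketch of how these settings might be applied to a cloned VM is given below; the MAC value, CPU count and memory size are illustrative choices rather than fixed requirements of the system.

```python
import subprocess

def harden_against_vm_detection(vm_name):
    """Apply anti-anti-forensics settings of the kind described above to a cloned VM."""
    # Replace the default VirtualBox 08:00:27 OUI with an arbitrary non-VirtualBox value.
    subprocess.run(["VBoxManage", "modifyvm", vm_name,
                    "--macaddress1", "D4BED9A1B2C3"], check=True)
    # Paranoid Fish-style checks flag hosts with fewer than 2 CPUs or very little
    # memory, so give the sandbox a plausible desktop footprint.
    subprocess.run(["VBoxManage", "modifyvm", vm_name,
                    "--cpus", "2", "--memory", "2048"], check=True)
```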
Another defensive mechanism utilised is that of not bypassing sleep commands and not having a debugger present on the virtualised host. This is an ancillary benefit of not conducting traditional static and dynamic analysis, and of not needing the tools required for this. There is also a common check for installed Perl and Python libraries on the host; due to the lack of local analysis on the hosts, the absence of these libraries is another anti-anti-forensic measure provided by default. Similarly, many anti-forensic capabilities used by malware will test for VMware-branded virtualisation; research states that [70]“Anti-VM techniques typically target VMware because of it being the most used” platform for virtualisation. The use of VirtualBox as the virtualisation platform therefore provides some measure of protection from some anti-forensic virtualisation detection mechanisms.
There are more technical anti-forensics methodologies that detect the presence of a hypervisor via a specific CPUID bit or hypervisor vendors' CPUID strings. There is also detection of system uptime
and CPU timestamp counters. These can be bypassed by modifying certain aspects of the virtual
machine base images.
Anti-anti-forensics capabilities in this system are an integral part of ensuring that the behaviour of
malware is reliable and as close to real life as possible. Despite this, there are diminishing returns for
effort invested to ensure that the sandboxing system is able to evade anti-forensics. To this end, it is
not essential for all sandbox detection mechanisms in Paranoid Fish to be defeated, as malware does not use all available anti-forensics methodologies.
4.3 Data Collection and Exportation of Network Traffic From Malware
To collect and export data from the target hosts, the virtual network interface cards and the network
activity that is displayed must be mirrored from the host and stored locally for use in the analysis
system. The aim is to record as much data as is possible with regards to network forensics. [71]“The more items that the investigator recovers the greater the chance is that they may come across a potential lead”. This again is platform independent and should be processed mainly in the form of pcap files for packet capture. [72]“Network capture files are typically named PCAP files after the library used to read them”. Pcap files are suited to this analysis as they are the standard format for packet capture and are easily imported into software such as Wireshark that can be used for network traffic analysis.
There is also the need for segmentation by file for easier analysis of these pcap files. For this reason the file name of each pcap comprises the UUID of the host, the SHA hash of the malware and the Unix timestamp of the beginning of the malware's execution. This enables the pcaps to be easily accessed and manipulated by the other aspects of the program. This correlation is further enabled by the linking of the packet capture to the malware and operating system associated with it.
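As a rough illustration of this naming convention (the separator and field order here are assumptions rather than the system's exact scheme), a filename could be constructed in Python as follows:

    import time

    def pcap_filename(host_uuid: str, sample_sha256: str) -> str:
        # Host UUID, malware SHA hash and Unix timestamp of execution start.
        return f"{host_uuid}_{sample_sha256}_{int(time.time())}.pcap"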
The web application is part of the analysis system that is designed for the purpose of allowing easier
deployment of virtual machines and malware on these virtual machines in addition to the display of
data.
An end product of the deployment and execution of the malware is the information that can be
extracted from this malware in terms of network activity. This should also display the classification
of malware that occurs via the analysis scripts to enhance the malware analysts understanding of
what is occurring.
There will also be a list of domains and IPs to which the malware is attempting to connect, to aid
the analyst in effective determination of what the malware is attempting to do. There should also
be a list of systems that the malware has been installed on and what requisite network behaviour is
shown on these systems.
The malware is identified by a unique SHA-256 hash of its file and the name of its executable or binary. SHA-256 is suitable as it is a [73]“256-bit hash meant to provide 128 bits of security against collision attacks”. This resistance to collision attacks means that there will not be two pieces of malware with the same hash but different binaries, as there potentially would be with the use of MD5 hashing. [74]“Flame malware masqueraded as a file from Windows Update by conducting a previously unknown chosen-prefix collision attack against the MD5 cryptographic hash”. Although it is improbable to expect this attack to occur against this system, it is reasonable to phase out the use of legacy hash functions when designing analysis tools. This is especially important when digital forensics tools are used to provide chain of custody within criminal investigations. The legal and evidentiary difficulty that emerges from attacks of this nature means that the evidence is no longer reliable. Technically this is because [75]“collisions in hash values show that two different files that have the same hash values compromises the integrity of files or data. In the case of digital evidence, the effect can be fatal” to the reliability of the evidence.
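For illustration, the SHA-256 digest used to identify a sample can be computed in Python as sketched below; reading the binary in chunks avoids loading large files into memory. This is a generic example rather than the system's exact code.

    import hashlib

    def sha256_of_file(path: str) -> str:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()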
There is also the option for the analyst to name a piece of malware by its colloquial name to more
easily identify the piece of software. This is useful as the differences in naming of malware types
between researchers and vendors can cause subtle problems during the process of malware analysis.
Historically, [76]“malware naming has never followed established convention. In fact, antivirus com-
panies and researchers used to name viruses based on characteristic they found interesting. However,
naming inconsistencies become a real research problem when trying to correlate or mine useful data
across different antiviruses”. As such it is useful for the analyst in question to be able to define the
colloquial name for the malware sample.
Additionally, static analysis capabilities are provided to the extent offered by the VirusTotal API. The VirusTotal API provides information relevant to a particular malware sample, dependent on the submitted MD5 or SHA hash. This hash is input into VirusTotal's systems [77]“to determine if it was previously scanned, if so, the stored report is provided to the user”. This is based on a comparison against a large number of different antivirus vendors' signature databases in order to establish whether or not a given file under analysis can be classified as malicious. This is utilised within the system and displayed to the analyst using the system, in addition to other facts such as colloquial name, MD5 hash, associated internet protocol addresses and classification of malware.
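A hedged sketch of such a lookup, assuming the public VirusTotal v2 file-report endpoint and a placeholder API key, is shown below; the response fields used (response_code, positives, total) follow the public documentation rather than the system's own wrapper.

    import requests

    API_KEY = "YOUR_VIRUSTOTAL_API_KEY"  # placeholder

    def virustotal_report(file_hash: str) -> dict:
        # Query the v2 file report endpoint for a previously scanned hash.
        params = {"apikey": API_KEY, "resource": file_hash}
        resp = requests.get("https://www.virustotal.com/vtapi/v2/file/report",
                            params=params, timeout=30)
        resp.raise_for_status()
        report = resp.json()
        if report.get("response_code") == 1:
            return {"detections": report["positives"], "engines": report["total"]}
        return {}  # hash not previously scanned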
Data Visualisation
An important part of enabling analysts to effectively process the relevant information is to visualise the collected data in a way that allows conclusions to be logically drawn and analysis to be performed. Whilst the target user demographic for this system is malware analysts, a demographic with a high degree of technical capability, it is still important to ensure that the data shown is clear, concise and relevant.
This data visualisation will display traffic information in a time series format, to show the progression of the malware's activity as it is executed. This enables analysts to plot a malware's network activity over time, which is more visual than raw pcap analysis and makes trends easier to identify. [78]“Time is one of the most important properties of network traffic. It is often used to correlate traffic loads and events of one host or higher level network entity with others”. Displaying this information visually enables more meaningful in-depth analysis.
There will also be a geographic visualisation of where malware activity is occurring. Specific malware
may be originating from a known threat group in a particular geographical location. The inclusion of
geolocation will enable analysts to pinpoint particular regions and identify threat groups. Microsoft
research states that [79]“systematic analysis and comparison of areas highly impacted by malware
against those least affected can help uncover the various technical, economic, social and political
factors that influence regional malware infection rates”. Geographical analysis of malware command
and control distribution allows a far more holistic analysis of the threat landscape. However, threat
attribution can be unreliable due to the ease of deploying proxy servers and easily available cloud
hosting.
The analysis of data is integral to the correct functioning of this system and the facilitation of
this analysis is through the deployment of scripts to categorise the malware by its network traces.
Dynamic heuristic analysis occurs when [80]“code emulation techniques are used by simulating the
processor and operating system to detect suspicious operations”. These heuristics operate by iden-
tifying common patterns exhibited by malicious executables and can be identified in aggregate in
order to classify malware.
Leveraging existing research, outlined earlier, in order to facilitate this analysis, there will be scripts
executed that are designed to identify particular characteristics in network traffic that indicate what
type of malware is present. Heuristic mechanisms of virus classification are effective in countering the variety of concealment strategies used by modern malware. Behavioural heuristics enable analysts to [81]“detect malware that keep on generating new mutants since they will always use the system resources and services in the similar manner”. Network behaviour is required in order for malware
to effectively function and as such the behaviour observed by heuristic analysis is likely to remain
relatively static across malware samples of the same type.
The tool should display the behaviours identified via heuristic analysis within the web application,
to enable further analysis and enable the analyst to draw conclusions on the possible nature of
the malware sample. In this regard the system is meant to augment the existing capabilities of
a malware analyst rather than replace them entirely. Behavioural dynamic analysis [82]“systems
might be confronted with samples that do not belong to any known families or exhibit behaviours
that are characteristic for multiple families”. As such, the suggestion of classification and exhibited
behaviours help inform the analyst.
The eventual aim of all this classification of malware is the grouping of malware behaviours into
indicators of a given malware type. This should be communicated back to the user in the form
of an attempted classification of a given malware sample. This classification of type is the core
functionality of the given system.
Classification of malware by network traffic in its most basic form is the grouping together of network traffic events in order to try to ascertain patterns of network behaviour. These patterns can be considered malware characteristics, relevant to the malware from which they were collected.
This is accomplished via grouping [83]“malware samples into groups that share common traits”,
the traits that are in common with this sample set are the behavioural heuristics that have been
identified. This is based on the assumption that in order to [84]“compare the behaviours of two
malware samples, it is necessary to show that the aggregate behaviour of a group of functions in
each sample is equivalent”. These behavioural heuristics extracted from the network traffic can then
be aggregated into classification of type. This mapping of heuristics to classification is a known
technique. The eventual goal of this process is the [85]“classification of behaviour, which enables
assigning unknown malware to known classes of behaviour”.
5 Implementation
The consideration of software options for implementing this solution led to the conclusion that it should be developed principally in Python, because Python allows web applications to also implement analysis functionality. It is also a programming language with many libraries applicable to network traffic analysis. A useful library is Scapy, which provides the capability to both dissect and send crafted packets within the Python language. Within this project the Scapy library offers the capability for the initial dissection of packets through its capabilities as a [86]“Sniffing tool to captures packets and possibly dissect them and act as a Fingerprinting Tool”.
It also enables custom statistical analysis to be performed to define heuristics of network behaviour.
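A minimal example of this kind of dissection, counting packets per destination address with Scapy, is given below; it is illustrative rather than the project's actual analysis code.

    from collections import Counter
    from scapy.all import rdpcap, IP

    def destinations(pcap_path: str) -> Counter:
        # Count packets per destination IP address in a capture file.
        counts = Counter()
        for pkt in rdpcap(pcap_path):
            if IP in pkt:
                counts[pkt[IP].dst] += 1
        return counts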
The web application uses a framework called Django, selected over other comparable frameworks for its ability to support complex web applications. This is in addition to its ability to interface with databases effectively through the model view controller design pattern. Although the web application is written in Django, there is also a command line interface element which is written in plain Python.
Within the graphing element of the system, the Google Charts API was chosen to represent the data graphically. Google Charts enables researchers and developers to
[87]“quickly create a cloud-based data visualisation”. This was the optimal choice over competing
frameworks due to its ease of use, integration with several programming languages and ability to
interface with large amounts of numerical data.
The database layer uses Structured Query Language (SQL) database software. SQL is the most widely used database query language and therefore has universal applicability; in addition, SQL is fairly simple in its syntax. The variant of SQL database being utilised to store the data is PostgreSQL. The developers of Django, in their authoritative book on the subject, state that they are [88]“quite fond of PostgreSQL” for a number of reasons, so this database software was used.
SQL databases are able to interface with Django and similar Model View Controller (MVC) frameworks using object relational modelling. [89]“Object-relational data models incorporate characteristics of both the relation and object-oriented data types”; this is particularly effective for MVC frameworks due to the object oriented nature of the software with which they are interfacing. This brings with it the convenience of allowing traditional methods of database storage to be utilised whilst accommodating architectural complexity.
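As an illustration of how this object relational mapping looks in practice, a simplified Django model (the field names here are assumptions, not the project's actual schema) might resemble the following:

    from django.db import models

    class MalwareSample(models.Model):
        sha256 = models.CharField(max_length=64, unique=True)
        colloquial_name = models.CharField(max_length=128, blank=True)
        classification = models.CharField(max_length=64, blank=True)

    class PacketCapture(models.Model):
        # Each capture is linked to the sample and host it was collected from.
        sample = models.ForeignKey(MalwareSample, on_delete=models.CASCADE)
        host_uuid = models.UUIDField()
        started_at = models.DateTimeField()
        pcap_path = models.CharField(max_length=255)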
The network classification, clustering and analysis tooling part of the project also used Python to
effectively perform statistical analysis on the collected network data. The advantage of Python for
this type of analysis is that [90]“Python is a general purpose programming language which is also
well suited to data analysis, econometrics and statistics”. The statistical and analytical capabilities
offered by Python allow the manipulation of data in order to conduct the analysis necessary for this
project.
5.1.1 Libraries
There were a large number of Python libraries utilised in order to conduct analysis on the
network packet capture in addition to the deployment of the virtualisation and data collection
infrastructure. The [91]“simplicity, flexibility and maintainability of Python code as well as the wide
range of its libraries” presents itself as a clear advantage. This also saves time with features such as
hashing malware and accessing application programming interfaces rather than replicating existing
work by coding these mechanisms for the purpose of this research.
5.2 Malware and Threat Intelligence Sources
When collecting the corpus of malware for this research project a variety of sources were used to
attain a reasonable amount of data for analysis. To gain the hashes or names for particular families
of malware, open source research produced by antivirus firms was leveraged. [92]“In some cases,
entire campaigns are uncovered on the basis of readily available malware deposited in watering hole
sites or collected in malware repositories like VirusTotal”. Whilst the aim of this research is not
to uncover novel campaigns, the benefits of combining open source threat intelligence released by
antivirus and security firms with open source malware repositories can lead to interesting results.
Threat intelligence sources such as Talos, which is [93]“Cisco’s threat intelligence organization”
helped provide timely and recent malware samples, to ensure that the malware under analysis was
not legacy and unlikely to be seen in the wild. This was particularly important for malware variants such as droppers and botnet nodes, where the existence of live command and control servers is important for accurate behavioural analysis. Despite this, some legacy malware samples were still analysed due to either
their infamy or especially notorious and distinctive network signature. One such sample is the
Stuxnet virus, a landmark [94]“precision digital weapon aimed at sabotaging Iran’s centrifuges”.
Additionally [95]“the largest outbreak in years: The Conficker aka Downadup worm” was added to
the corpus for much the same reason.
Figure 5.2: VirusTotal Match for Indicator of Compromise Hash
To download the actual malware samples it was necessary to visit sites that contained existing copies
of malware binaries in order to gain as reliable a source of data as is possible. The two primary
sources used were reverse.it and the VirusShare malware corpus. [96]“VirusShare.com was selected for its size, modernity and facilities”; VirusShare had a significant number of malware binaries, however the [97]“Payload Security's Hybrid Analysis” site was found to have a larger number of binaries for download and analysis, particularly regarding recent samples. Both sources were used extensively in order to source the malware for transfer, execution and analysis. It would have been possible to download binaries from VirusTotal, but as they charge $80,000 a year for a licence, this is beyond reasonable expenditure for a masters project. VirusShare also has the
advantage for this research, of returning the most freshly received sample for a given search term,
ensuring that timely samples are utilised.
Figure 5.4: reverse.it Interface
6 Analysis
The analysis portion of this project is concerned mainly with identifying and categorising malware's network behaviour. This analysis is performed on the network data collected by earlier portions of the
system. The network traffic signatures that have been collected help identify each individual piece
of malware. It is also possible to identify characteristics of malware from this network behaviour.
These characteristics can then be listed for the analyst to observe, or grouped together to classify
malware variants and infer other expected malware behaviours and characteristics.
Modern malware is multi-staged in its method of operation, and this means that there is not one behavioural indicator that is representative of a particular malware sample but rather a sequence and range of indicative behaviours. One such element is a dropper, and [98]“the dropper has the capacity to download the malware” in that the payload of a malware is downloaded from an intermediary server. This may mean that the payload eventually executed is secondary to the initial malware infection.
A latter stage of the malware is often to reach out to the command and control server from which
the adversary is able to remotely control the computer. [99]“Malware with command and control
capabilities will often connect back to the malware operator” during the latter stages of the operation.
This may take the form of control of the target host, in which a target will be part of a botnet, that
can perform denial of service attacks against targets of the command and control infrastructure’s
choosing.
Further to this, there may be collection of sensitive data contained on the target host; this sensitive data may take the form of banking credentials or financial data within the target system. The overarching aim of banking malware is to exfiltrate and [100]“steal users sensitive data, like
online banking credentials or username-password combinations for other high-value sites”. There
may also be personal data that can be used for the purpose of blackmail or fraudulent activity. A
behaviour not distinct from connection to command and control servers is the exfiltration of data
as part of the malware’s operation. This data exfiltration takes a variety of forms and may occur
only when the collection of banking credentials has been triggered.
There is a delineation between viruses that are similar in their characteristics and operation. This
manifests itself in the form of there being families of malware and malware sub-types. This is
more prevalent now that malware may change in their signature and patterns of operation as they
propagate. An example of this is the Olympic Destroyer malware that targeted the Pyeongchang
2018 Olympics. It was discovered during malware analysis that [101]“credentials have not been
hardcoded into the binary by the attackers themselves. The malware dynamically updates this list
after using the password stealers. A new version of the binary is generated with the newly discovered
credentials. This new binary will be used on the new infected systems via the propagation”.
Whilst identifying a malware sample’s individual characteristics may be of importance a crucial
element of malware analysis is to identify the malware type. There are a wide variety of different
malware types with requisite different behaviour. As such there may be a family of almost identical
malware samples, with slight differences, that can be for all intents and purposes considered the same
malware. In a similar manner [102]“Cyber criminals use polymorphic software that rolls up malware
into a single package that has the ability to make its signature mutate, evading typical detection”.
This can lead to further examples of malware similar in structure but different in eventual behaviour
following execution. Another example of malware being divided into families is that of exploit
kits often being used as the delivery mechanism but having different payloads following execution.
[103]“Malware variants will come by as a result of malware individualisation performed by an EK as
well as natural changes in the EK over time”; so although such variants of malware are likely to differ in signature and contain different payloads, it is accurate to define them as part of the same malware family.
6.1 Dynamic Analysis Methods
To facilitate meaningful conclusions from the analysis of malware traffic, a necessary step is to cluster
the malwares’ traffic samples by the type of traffic that is emitted. There are different stages in
malwares’ execution mechanisms and therefore it can be possible to identify different behaviours
that a malware is exhibiting.
Confusing malware behaviour with propagation mechanisms is also a common error made by malware analysts; this is because propagation methods are a type of behaviour exhibited by malware but not their operational behaviour. Kaspersky state that [104]“individual malware programs
often include several malicious functions and propagation routines and, without some additional
classification rules, this could lead to confusion”. When the malwares’ behaviour can be aggregated
and analysed it is possible to cluster different types of malware together. This is due to malware
types having behavioural characteristics representative of their type.
A very informative source of data resulting from malware network traffic capture is the analysis of traffic that uses the DNS protocol. Domain names typically require registration with a body known as a domain registrar. The act of registering a domain typically requires some effort on the part of the purchaser, and this means that domains can be a rich source of information. Traffic over other protocols may be directed at a particular domain name, which can be parsed from the logs and provide further information regarding this dataset.
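A minimal sketch of extracting the queried domain names from a capture with Scapy's DNS layers (illustrative only, with error handling omitted) could look like this:

    from scapy.all import rdpcap, DNSQR

    def queried_domains(pcap_path: str) -> set:
        # Collect every domain name queried in the capture.
        domains = set()
        for pkt in rdpcap(pcap_path):
            if pkt.haslayer(DNSQR):
                domains.add(pkt[DNSQR].qname.decode().rstrip("."))
        return domains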
A feature of modern malware is fast-flux domain switching, in which the malware rapidly cycles
through a number of domains that are used as proxies to the malware’s command and control
infrastructure. If a list of domains associated with a piece of malware is visible then patterns can
be spotted in the domain generation algorithm.
Figure 6.1: Fast-Flux and Double Fast-Flux
Another feature that is enabled by the monitoring of DNS traffic that is emitted by malware is
correlation with known malware domains. There are publicly available blacklists that display a list
of domains that are known to be malicious in their content for the purpose of allowing enterprises
to block them. IP blacklists operate in a similar way in that they are managed by authorities
and keep a list of addresses that are known to be malicious. A combination of open source threat
intelligence feeds and application programming interfaces were used in order to obtain data on known
malware domains. In the parlance of threat intelligence these datapoints are known as Indicators of
Compromise (IOC’s). [105]“Indicators of Compromise (IOC) are forensic artifacts of an intrusion
such as virus signatures or IPs/domains of botnets”. For these indicators of compromise to be
leveraged, it is necessary to compare the domains and addresses with those listed and [106]“encoded
in publicly available blacklists”. There were several data sources, from separate threat intelligence
feeds, that were aggregated to produce one central repository of threat intelligence.
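Conceptually, the cross-reference against this aggregated repository reduces to a set intersection, as in the hypothetical sketch below, where both inputs are plain sets of domain strings built elsewhere:

    def matched_iocs(observed_domains: set, blacklist: set) -> set:
        # Return observed domains that appear in the aggregated threat intelligence.
        return {d.lower() for d in observed_domains} & {d.lower() for d in blacklist}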
Malware behaviours can be divided into a number of types. These types can be defined as [107]“three categories: propagation, data exfiltration and remote control”. Propagation consists of attempts by the malware to discover other vulnerable hosts to infect and then to deliver the payload to those hosts. There also exist the infection and action stages, which result from the propagation action.
A marked characteristic of malware propagation methodologies is that the malware will initially scan the local network for computers to target. Thus activity to local addresses can be seen as suspect and potentially indicative of attempted propagation. Virus propagation need not necessarily be local, and many malicious executables will try to propagate throughout the internet. This behavioural indicator follows the basic principle that propagation of a virus is very distinct from normal network operation in the volume of hosts it attempts to access. [108]“The fundamental assumption used is that for a virus to spread effectively it needs to contact as many machines as possible, as fast as possible”. This means that there will be a distinct behavioural pattern followed
by hosts that are engaging in propagation. This is distinct from the other overarching behaviours
shown by other aspects of the malware.
Another portion of the behaviour shown by malware is the preliminary activity of the dropper.
The essential functionality of the dropper is to [98]“download additional components such as the rest
of the malicious payload”. This can even be as complex as an additional executable. A statistical
minority of hosts engage in dropper activity. Dropper activity for malware is typically very noisy as
it involves needing to download a large amount of files from a staging server.
A final stage that is not necessarily distinct from the malware’s dropper and command and control
phase is that of data exfiltration from the host. A network forensic toolkit should have the capability
to [109]“identify possible data exfiltration events”. Classification of these events should help with
clustering of malware by type and group.
HTTP is the main protocol that is used to communicate over the web in order to allow the efficient
transfer of information. This protocol is a fundamental medium of exchange for [110]“linked infor-
mation systems”. HTTP has latterly been followed by HTTPS, which encrypts traffic. HTTP still
makes up a greater percentage of worldwide internet traffic than HTTPS. Analysing the format of
HTTP requests can provide a valuable source of information relevant to analysing and classifying
the malware that is under examination.
Malware often use [111]“HTTP/HTTPS ports to communicate, often because only these ports are
open at the firewall level”. An advantage of using HTTP traffic for command and control and other
aspects of malwares’ infrastructure is that it is unlikely to gain the attention of a security operations
centre analyst. This is because it is the most commonly used network communications protocol.
Something identified within HTTP traffic is whether an HTTP request is accompanied by headers. The presence of such headers can be an indicator of malicious activity, as there is no need for them during automated HTTP traffic. A research project from ETH Zurich discovered that the typical process of malware is to [112]“perform an HTTP GET request to receive a command (polling the CC) periodically, Execute it and collect the results, Send back the response in an obvious way”. This results in the analysis of the packet captures flagging metadata attached to requests as a heuristic.
Figure 6.2: TLS Client Hello Packet Diagram
An additional way in which HTTP traffic is differentiated from other traffic types is by the encryption
of data in transit via SSL and TLS, which [113]“are cryptographic protocols designed to provide secure communication over insecure infrastructure”. Research specifically analysing the characteristics of certificates used in encrypted malware traffic stated that there is more of a tendency for malware to offer legacy ciphers in the client configuration. It was observed that [114]“100 percent of the malicious TLS sessions observed offered” these legacy ciphers. The use of Cipher-Block-Chaining ciphers is insecure as the [115]“permutativity of the block cipher could be exploited” in order to
decrypt communications. DES and 3DES are insecure and considered legacy as two decades ago the
Electronic Frontier Foundation were able to [116]“crack DES in an average of 4.5 days”. Now this
is considered trivial due to increases in computer hardware processing power.
Legacy Ciphers Accepted by Malware
0x000a (TLS_RSA_WITH_3DES_EDE_CBC_SHA)
0x0005 (TLS_RSA_WITH_RC4_128_SHA)
0x0004 (TLS_RSA_WITH_RC4_128_MD5)
The advantage of analysing DNS and IP traffic is that they are metadata required for all network
communication. Whilst it is possible for network communication to occur without a domain name assigned, the absence of a domain name can also be a signal in and of itself. Given that the U.S. makes use of metadata to conduct [117]“drone strikes and given how many of those strikes are now
dependent on metadata rather than human intelligence”, it can be extrapolated that metadata can
still be a valid source of information.
Domain names provide a wealth of evidence during the course of malware analysis. [118]“Under the
current structure of the Internet, a given top-level domain can have no more than one registry. A
registrar acts as an interface between customers and the registry, providing registration and value-
added services”. It is the very act of requiring a registrar that enables DNS to be such a rich
datasource when attempting to extract relevant data from a given malware sample. There may also
be information available on how recent domain registration was for the given domain.
If an analyst is especially fortunate, there will be data available via the WHOIS protocol which
provides [119]“white pages services and information about registered domain names”. There are
also other information sources that can be used with regards to information from the domain
name system. These data sources may be described as [120]“discrete data points that might in-
clude WHOIS record, IP address information, nameserver, hosting Information, Autonomous Sys-
tem Number (ASN) and Mail server”. It is also possible to perform DNS lookups and reverse DNS lookups, which help provide decorating details from an IP address and DNS name respectively. These can be compared against known blacklisted domains.
Understanding the timing of the traffic throughout the course of the malware’s execution is valuable.
This enables the factoring of the [121]“attributes relate to traffic statistical characteristics such as
flow duration, idle time, packets’ inter-arrival time and length” into the calculation and classification
of malware samples. Temporal analysis of malware network traffic incorporates [122]“properties of
the host-domain names that a client contacts, and on statistical patterns in the timing and data
volumes of the sequence of network flows from and to that client”. As such, visualised flow data can
provide analytical capabilities.
Figure 6.4: Line Chart of Continuous Botnet Traffic
This was visualised via the processing of the packet captures for each malware sample. From
these packet captures, statistics regarding the outgoing, incoming and total packets were extracted
programmatically on a minute-by-minute scale. These are then inserted into the database and returned
to the user in the malware analysis information portion of the site in the form of a [123]“line chart
that is rendered within the browser”. This aids the analyst in understanding and classifying the
malware sample from which the packet capture is extracted.
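A simplified sketch of this per-minute bucketing, assuming Scapy packet objects whose time attribute holds the capture timestamp in seconds and a known IP address for the infected host, is shown below:

    from collections import Counter
    from scapy.all import rdpcap, IP

    def packets_per_minute(pcap_path: str, host_ip: str):
        # Bucket outgoing and incoming packet counts by minute of capture time.
        outgoing, incoming = Counter(), Counter()
        for pkt in rdpcap(pcap_path):
            if IP not in pkt:
                continue
            minute = int(pkt.time) // 60
            if pkt[IP].src == host_ip:
                outgoing[minute] += 1
            else:
                incoming[minute] += 1
        return outgoing, incoming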
Geolocation enables the identification of malware’s command and control architecture and the study
of other factors that may correlate with the malware’s mapped network activity. In particular, there
is value to be extracted from the combination of [124]“traffic, which is key to understanding the
spread of malware and geography, which is key to investigating country-level effects”. Due to the
increase in analysis capability offered by the combination of these datasources, there has been the
combination of subdivision by malware type classification, correlated with geolocation of internet
addresses.
This geolocation is accomplished via the aggregated extraction of longitude and latitude data from internet protocol addresses. The data source used is the MaxMind GeoIP City Database, which provides geolocation information for IP addresses and is updated regu-
larly. [125]“MaxMind GeoIP City database is being used for geolocation information” by a wide
variety of services. These addresses are then mapped geographically via combining this returned
longitudinal and latitudinal data with the Google Maps API via geocoding. [126]“Geocoding is the
process of converting addresses (like ”1600 Amphitheatre Parkway, Mountain View, CA”) into geo-
graphic coordinates (like latitude 37.423021 and longitude -122.083739), which you can use to place
markers on a map, or position the map”. This aggregated and grouped geocoded data is then visible
to the user in a visual and easily understood format.
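An illustrative lookup against a MaxMind city database using the geoip2 reader is sketched below; the database filename is a placeholder, and addresses absent from the database (such as private ranges) are simply skipped.

    import geoip2.database
    from geoip2.errors import AddressNotFoundError

    def geolocate(addresses, db_path="GeoLite2-City.mmdb"):
        # Resolve each address to a (latitude, longitude) pair where possible.
        points = []
        with geoip2.database.Reader(db_path) as reader:
            for addr in addresses:
                try:
                    loc = reader.city(addr).location
                    points.append((addr, loc.latitude, loc.longitude))
                except AddressNotFoundError:
                    continue
        return points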
The timing and structure of the traffic communicated by malware is an important factor regarding
its classification. [127]“We cannot know a priori the frequency with which a zombie will contact its CC server(s)”; however, this does not mean that meaningful patterns cannot be extracted from timing analysis of communications. Timing and traffic analysis, while [128]“impressive in what it can determine,
necessarily provides lower quality information compared with cryptanalysis and recovery of message
content”. There is a wealth of information that can be determined through analysing the timing and
distribution of traffic resultant from malware execution.
In the field of intrusion detection, which is not wholly separate from malware analysis, [129]“many
intrusions can be detected through static timing analysis”. This can be implemented in a practical way
by classifying the temporal distribution of different network events as indicative of certain behaviours
occurring. [130]“It is effective to look into the arrival timing of the packets as well”. Using the arrival
timing and mapping the regularity of communication between the infected host and external domains
can be indicative of activity occurring.
This allows the subdivision of malware into types by analysing what characteristics their timing
displays. This automated analysis can assist in malware classification. The analyst using the sys-
tem may wish to conduct their own temporal analysis, and during the requirements stage of this
project, surveyed analysts expressed this desire. It is for this reason that there is a line graph of
network activity over time. [131]“To infer the types of malicious activities, temporal traffic features
are analysed” and this may include a manual analysis of malware activity over time.
To identify and suggest the malware’s potential type classification, it is necessary to group and weight
the behaviour of observed malware samples as they correspond to a given malware classification.
The behaviour of malware is based on the identification of network traffic data and the corresponding
behaviours that can be extrapolated from this identification. The establishment of these behaviours
can be grouped and weighted into classifiers for a given malware type. Once these weighted heuristics
have been established it is possible to use them to classify other malware samples by the metrics
and patterns that have been shown to be present in the set of malware.
Figure 6.7: Process For Extracting Classifications Via Behavioural Heuristics
A variety of weighted heuristics were utilised in order to differentiate the malware samples' packet captures from one another. These weighted heuristics manifested themselves as different malware behaviours which could be clustered and grouped. These behaviours are assigned [132]“different weighted values based on characteristics used by viruses”. It may be possible for a malware sample to have two or more resultant classifications, dependent on the weighting of its observed behaviours. [133]“Deriving a single accurate label from multiple noisy labels is a common
problem when working with real world data” and so this should not be viewed as a drawback, but a
method for drawing an approximate likely classification of malware type.
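The scoring idea can be sketched as follows; the behaviour names and weights shown are purely illustrative rather than the values used by the system. Each observed behaviour contributes weight towards one or more candidate types, and the highest-scoring types are suggested to the analyst.

    # Illustrative weights only; the real heuristics and values differ.
    WEIGHTS = {
        "sequential_scanning": {"worm": 3.0},
        "heavy_outbound":      {"botnet": 2.0, "data_exfiltration": 2.5},
        "fast_flux_dns":       {"botnet": 2.5},
        "mining_pool_contact": {"cryptominer": 4.0},
    }

    def suggest_types(observed_behaviours):
        scores = {}
        for behaviour in observed_behaviours:
            for mtype, weight in WEIGHTS.get(behaviour, {}).items():
                scores[mtype] = scores.get(mtype, 0.0) + weight
        # Strongest candidate classifications first.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)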
6.2.1 Scanning
A heuristic common to a number of malware classes is scanning activity. Existing research shows
that [134]“scans were mostly initiated by worms that scan specific Windows ports (e.g. 139, 445) or
ports related to backdoors (e.g., 9988)”.
One scanning variant, that of sequential scanning, is especially indicative of worm traffic in operation.
Sequential scanning is when a host displays scanning behaviour but also crucially tries to contact a
number of hosts that are adjacent to each other in the address space. [135]“Slammer and latter Witty
use naive random scanning (RS). RS chooses target IP addresses uniformly and does not take any
information on network structures into consideration. Advanced scanning methods, however, have
been developed that exploit the IP address structure. For example, Code Red II and Nimda worms
have used localised scanning”. This localised scanning is another heuristic that has been added, triggered when binaries that have displayed scanning behaviour restrict their scanning to hosts within the local IP address space. Conversely, scanning restricted to external IP addresses is another heuristic that has been added, one that may easily be confused with fast-flux botnet command and control traffic.
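A rough sketch of how such scanning indicators might be derived from the set of destination addresses is given below; the thresholds are arbitrary assumptions, not the values used in the system.

    import ipaddress

    def scanning_indicators(dst_addresses, threshold=50):
        # Many distinct destinations suggest scanning; consecutive addresses
        # suggest sequential sweeps; purely private ranges suggest local scanning.
        addrs = sorted({int(ipaddress.ip_address(a)) for a in dst_addresses})
        many_hosts = len(addrs) >= threshold
        consecutive = sum(1 for a, b in zip(addrs, addrs[1:]) if b - a == 1)
        local_only = all(ipaddress.ip_address(a).is_private for a in dst_addresses)
        return {
            "scanning": many_hosts,
            "sequential_scanning": many_hosts and consecutive >= threshold // 2,
            "localised_scanning": many_hosts and local_only,
        }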
A type of scanning that is indicative of the utilisation of the exploits leaked by the Shadow Brokers group is recon-
naissance and attempted exploitation of the Microsoft Server Message Block protocol. This protocol
is used by [136]“EternalChampion, EternalRomance, EternalSynergy, ArchiTouch and SMBTouch.
Most of these exploits target the Microsoft Server Message Block (SMB), which handles access shar-
ing between nodes on a network”. It is possible to identify malware contacting SMB services as it
shows that [137]“direct hosted NetBIOS-less SMB traffic uses port 445 (TCP and UDP) and SMB
over NetBT uses the nbsession service Port (139/TCP)”. As such, identifying this traffic is a matter
of identifying connections that make or attempt to make connections over these ports.
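For illustration, a minimal Scapy-based check for connection attempts to these SMB-related ports might look as follows:

    from scapy.all import rdpcap, TCP, UDP

    SMB_PORTS = {445, 139}  # direct-hosted SMB and SMB over NetBT

    def smb_connection_attempts(pcap_path: str) -> int:
        # Count packets addressed to the SMB/NetBIOS session ports.
        count = 0
        for pkt in rdpcap(pcap_path):
            for layer in (TCP, UDP):
                if layer in pkt and pkt[layer].dport in SMB_PORTS:
                    count += 1
        return count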
One behaviour type that was not isolated as a malware behaviour, and therefore not incorporated as a weighted heuristic, was port scanning. Port scans are a method of reconnaissance on computer networks, designed to enumerate the open ports and services on a host. The typical methodology by which a port scan is executed is via Nmap software.
This behaviour was not incorporated within the system as it was not seen during the collection of data. This is because malware does not typically port scan a large range of ports on target hosts. This is likely due to the fact that most malware samples only have a small number of exploits
contained within them for the purpose of propagation. They do not engage in an exploitation
pathway complex enough for in-depth reconnaissance to be necessary.
6.2.2 Server vs. Client
An important indicator of the nature of a malware sample’s activity is the volume of traffic that is
being transmitted and the direction in which that traffic is flowing. Regarding the [138]“direction of the flows in
the plots when necessary, we denote the flows going from the controllers to the bots as ingress flows,
and the flows of the other direction as egress flows”. To this end typical malware such as botnets or
hosts exfiltrating data will have [139]“heavy outbound vs inbound data”. As such this is specified as
a heuristic, mapped to the classification of these aforementioned malware types. Additionally there
is a heuristic triggered by the inverse of this, where the received traffic is more voluminous than sent
traffic.
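A simple sketch of this direction heuristic, comparing bytes sent by the infected host with bytes received and flagging a heavy imbalance either way, is shown below; the imbalance factor is an assumed value.

    from scapy.all import rdpcap, IP

    def direction_ratio(pcap_path: str, host_ip: str, factor: float = 3.0):
        # Sum bytes sent and received by the infected host and flag imbalance.
        sent = received = 0
        for pkt in rdpcap(pcap_path):
            if IP not in pkt:
                continue
            if pkt[IP].src == host_ip:
                sent += len(pkt)
            elif pkt[IP].dst == host_ip:
                received += len(pkt)
        return {
            "heavy_outbound": sent > factor * max(received, 1),
            "heavy_inbound": received > factor * max(sent, 1),
        }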
6.2.3 Downloading
A common behaviour is the downloading of supporting files in order to further their penetration of
a given system, or enable greater functionality for the executable. Typically, resultant from this,
[140]“malware’s malicious files are silently installed and executed on the system”. These files may
often be deleted after downloading and installation, however there will still remain network evidence
of this download that can be captured and identified as a heuristic pattern.
The methodology that was used in order to capture and record these files for the purpose of analysis
was to identify objects transmitted within the packet capture file. These objects were then isolated
and extracted, for storage within a local directory. These files were extracted with the nfex network
file extractor tool. Nfex is a library that functions as an [141]“asynchronous unix-based, command-
line driven standalone tool designed to perform real-time file carving from network streams”.
Figure 6.10: File Types and Hashes Downloaded in Malware Analysis Visualisation
The beginning bytes of each file were used in order to identify the filetype; this was recorded within the database in order to see if any commonalities emerged between malware types regarding the filetypes downloaded. The logic behind this analysis method is that [142]“the first few bytes of a binary file
are often used to indicate type. This is generally referred to as its magic number”. One unsurprising
commonality that emerged was that malicious executables would download executable and binary
files during the course of their operation.
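An illustrative magic-number check over the first bytes of a carved file is sketched below; the signatures listed are the well-known ones for these formats.

    MAGIC = {
        b"MZ": "Windows executable (PE)",
        b"\x7fELF": "ELF binary",
        b"PK\x03\x04": "ZIP archive",
        b"%PDF": "PDF document",
    }

    def identify_filetype(path: str) -> str:
        # Compare the leading bytes of the file against known magic numbers.
        with open(path, "rb") as f:
            head = f.read(8)
        for magic, name in MAGIC.items():
            if head.startswith(magic):
                return name
        return "unknown"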
A variety of malware classes make use of Tor in order to obfuscate their command and control traffic
and also in order to evade takedown requests. [143]“The main reason that motivates botmasters
to move to Tor is to find a new environment to achieve stealthiness and untraceability. The Tor
hidden services provide anonymous CC servers, which are more difficult to take down”. Although
the validity and effectiveness of this approach by malware authors is debatable, the use of Tor for
communication purposes by malware is a trend that is well established in today’s malware corpus.
As such it is possible to use a heuristic pattern of identifying traffic transmitted and received in
order to identify specific malware classes. The malware classes that are typically likely to use Tor
routing protocol for command and control are botnet nodes and remote access trojans. Identifying
communication through Tor protocol is an effective method of isolating malicious traffic from benign
traffic in detection systems. There are uses for identifying Tor traffic within malware classification.
Attackers of the protocol are motivated by the illicit nature of the traffic and the large volumes
of currency moving through this part of the internet, with organisations such as Zerodium, the zero-
day brokers, successfully offering [144]“a total of one million U.S. dollars in rewards to acquire
zero-day exploits for Tor Browser on Tails Linux and Windows”. This is in addition to the academic
research that leads to attacks such as traffic analysis [145]“that greatly degrades the anonymity
provided by Tor, by allowing adversaries to discover the path of a Tor connection and thereby reducing
the protection to the level provided by a collection of simple proxy servers”.
Although the Tor network has a wide variety of developers and organisations that attempt to fund
security research of the protocol, its browser and communications protocol contain vulnerabilities that affect its security. To identify outbound communication with the Tor network it is possible to
use a few separate techniques.
One such method is similar to the reference against known DNS and IP blacklists that is used by other
malware detection mechanisms. There is a published list of Tor exit nodes available; this enables this identification methodology to be used to detect incoming Tor connections through a network. However this list does not include entry nodes, so what entry nodes can be identified are extracted and cross-referenced against malware activity. Within the Tor network these are defined as “entry guard
nodes” wherein [145]“each Tor client selects a few relays at random to use as entry points, and uses
only those relays for her first hop”.
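A hedged sketch of cross-referencing observed addresses against the published exit-node list is given below; the URL is the Tor Project's bulk exit list at the time of writing and may change.

    import requests

    def tor_exit_matches(observed_ips: set) -> set:
        # Fetch the published exit-node list and intersect it with observed IPs.
        resp = requests.get("https://check.torproject.org/torbulkexitlist", timeout=30)
        resp.raise_for_status()
        exit_nodes = {line.strip() for line in resp.text.splitlines() if line.strip()}
        return observed_ips & exit_nodes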
Blacklists from publicly available sources are an effective classifier of malware types. Particularly with samples that are known to be malicious, such as those used within this project, this helps identify the malware sample as being particularly noisy. Additionally it can also help separate malicious traffic from benign. [146]“Frequent execution of reputation based approaches using DNS
data identifies and builds up-to-date blacklists”, many of these blacklists are publicly available and
maintained by a wide variety of sources.
A heuristic not implemented was the classification of malware subtype by blacklist type, wherein
certain domain names and domain name characteristics enable the ascertaining of subtype. Research
has shown that [147]“malicious domain names have different characteristics than benign domain
names. A sudden increase of queries and abnormal geographic query patterns are a common feature
of phishing and botnet domains and these observations coincide with findings in previous research”.
Unfortunately, classification by blacklist type falls out of scope of the core of this research.
6.2.6 Crypto Mining Pools
To identify malware that illegitimately mines cryptocurrency, a reliable indicator is the use of crypto-
mining pools. Pooled mining, used by legitimate mining organisations, is a method by which miners collaborate in order to maximise the resultant income. Legitimate [148]“mining pools
split the proceeds while operating near cheap electricity sources and cold air to cool the mining rigs”,
a similar model is followed by malware that is using victim’s processor cycles to mine cryptocurrency.
The malicious model is also seeking to ensure maximum efficiency and profitability for the operation.
As these traffic collections occur at the beginning of the malware samples' patterns of operation, it is also possible to spot a necessarily conspicuous network traffic signature. This is the Stratum join request that is necessary for miners to operate as part of the [149]“stratum overlay
protocol extended to support pooled mining”.
It is also possible to identify cryptocurrency mining nodes through IP addresses correlated with
mining activity and also to [150]“identify mining traffic via well known pool IP and port pairs”.
Identifying the ports common to cryptocurrency mining, particularly Monero mining, enables traffic
samples to be identified as exhibiting the behaviour of cryptocurrency mining.
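An illustrative search for Stratum pool-join requests in TCP payloads is sketched below; the markers checked ("mining.subscribe" and a "login" method, common to Bitcoin- and Monero-style pools respectively) are assumptions, and a port-based check could be layered on top.

    from scapy.all import rdpcap, TCP, Raw

    STRATUM_MARKERS = (b"mining.subscribe", b'"method": "login"', b'"method":"login"')

    def stratum_requests(pcap_path: str):
        # Report source/destination ports of packets carrying pool-join markers.
        hits = []
        for pkt in rdpcap(pcap_path):
            if TCP in pkt and Raw in pkt:
                payload = bytes(pkt[Raw].load)
                if any(marker in payload for marker in STRATUM_MARKERS):
                    hits.append((pkt[TCP].sport, pkt[TCP].dport))
        return hits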
Several methods are used to identify fast-flux domain switching within a packet capture. As fast-
flux is a common obfuscation methodology used by many malware classes, this is an interesting
behavioural element to identify being present in a network packet capture.
One of the most reliable indicators is to examine the [151]“short time-to-live (TTL) value, i.e., 300
seconds, indicates that the records will expire after 300 seconds”. This functions by enabling the
rapid cycling of internet protocol addresses for a given domain once the TTL expires. This enables
the extraction of DNS fast-flux indicators from passively collected packet captures without active
investigation of the domain in question. [152]“When the TTL expires, the fast flux service network
operator’s automation assures that a new set of A records for name servers replaces the existing set”.
This activity and the low TTL are essential to DNS fast-flux functionality, making them conspicuous as a
behaviour within packet captures.
Figure 6.13: DNS Common Resource Record - Time To Live
Another indicator is the number of records returned by a given domain; this is an effective indicator as it can be analysed passively and is required for the effective operation of a fast-flux botnet. In fast-flux networks, [153]“fast-flux domains often return five or more A records in a single lookup in order to have a higher guarantee that at least one of the IPs is online”. This is precisely the identification methodology that has been utilised to calculate the likelihood of a domain being a fast-flux domain. This number of A records is returned to enable malware to rapidly switch between a number of
different IPs for command and control infrastructure.
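A simplified sketch combining the two indicators discussed, low TTL values and five or more A records in a single response, is given below; the thresholds follow the text above and the Scapy answer-record traversal is illustrative rather than the system's exact implementation.

    from collections import defaultdict
    from scapy.all import rdpcap, DNS

    def fast_flux_indicators(pcap_path: str, max_ttl=300, min_a_records=5):
        low_ttl = set()
        a_counts = defaultdict(int)
        for pkt in rdpcap(pcap_path):
            if DNS not in pkt or not pkt[DNS].ancount:
                continue
            per_packet = defaultdict(int)
            for i in range(pkt[DNS].ancount):
                rr = pkt[DNS].an[i]
                if rr.type == 1:  # A record
                    name = rr.rrname.decode().rstrip(".")
                    per_packet[name] += 1
                    if rr.ttl <= max_ttl:
                        low_ttl.add(name)
            for name, count in per_packet.items():
                a_counts[name] = max(a_counts[name], count)
        # Domains exhibiting either indicator are flagged as fast-flux candidates.
        return {name for name in a_counts
                if name in low_ttl or a_counts[name] >= min_a_records}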
Domain generation algorithms are the method by which a malware author will enable fast-flux
domain switching to operate. A domain generation algorithm's modus operandi is to generate a list of
domains that are registered and cycled through in relatively quick succession.
A significant percentage of network traffic is constituted of traffic to and from web applications.
There are several traffic characteristics of web traffic that can be distilled into heuristic identifiers.
It needs to be noted that there has been widespread adoption of HTTPS, and that this adoption is increasing over time. The state of HTTPS adoption has been described thus: the [154]“full Alexa Million represents the long-but-active tail of the Web, and only 40 percent support HTTPS (10 percent by default). IPv4 hosts representing the long tail and flotsam of the Internet are even less likely to
support HTTPS (10 percent)”. This adoption is also reflected by domains utilised for command and
control by malicious software. It is possible to discern between legitimate and malicious domains by
the characteristics exhibited by certificates and encrypted communications. [155]“As of June 2015,
all of the major exploit kits served their exploits over an unencrypted protocol”.
As can be seen all these different types of network activity are recognisable as behaviours, which can
then be seen as characteristics of different malware types. This allows classification of the malware
into separate types of malware.
[156]“Heuristic as an adjective means serving to discover” and this is an effective description of the
purpose of using heuristics. They manifest themselves as a set of rules and patterns which must be
followed in order to achieve a solution to a given process. Within the context of computer science a
heuristic algorithm can be defined as an algorithm [157]“that either gives nearly the right answer or
provides a solution not for all instances of the problem”.
Weighted heuristics can be used within the context of malware analysis for the purpose of effective
classification. Heuristic analysis offers many advantages over traditional static and dynamic analysis.
Weighted heuristics allow for the consideration of different behaviours in simultaneity in order to
derive the eventual classification.
It is then possible to enable the system to automatically suggest to the analyst potential classification
of the malware. This is based on a set of malware deployments and recorded behaviours and
extrapolated mapping of behaviours to classifications. The suggested malware types are simply
these classifications and whichever of these seems like the most likely based upon the available
evidence is that which is returned to the user.
Using weighted heuristic techniques offers the advantage of being able to detect [158]“polymorphic
and metamorphic malware, which often can be only detected by algorithmic approaches like a heuristic
engine”. This is in addition to malware family subtypes which, although they may have a different static signature, will have the same network activity in terms of behaviour. [159]“Evading an antivirus scanner means evading signatures, the scanning engine, and the detection logic”. There are a number of automated tools currently in the public domain, such as Shellter, Veil and peCloak amongst others.
Although these tools are able to achieve provable evasion of known static and dynamic analysis tools
they may struggle to evade behavioural network forensic analysis.
6.3.2 Cryptocurrency Mining Malware
Malware whose purpose is to utilise the target's resources in order to illegitimately mine cryptocurrency for the profit of the criminal operators is known as crypto-mining malware. This malware has risen to prominence in the first two quarters of 2018, with global infection rates skyrocketing. [160]“since
September 2017, malicious cryptomining has been our top detection overall” according to Malware-
Bytes Labs and as such it can be regarded as a significant threat. The financial incentive behind
the rise in crypto-mining malware is obvious, with [161]“an average system would likely generate
about 0.25 dollars of Monero per day, meaning that an adversary who has enlisted 2,000 victims
(not a hard feat), could generate 500 dollars per day or 182,500 dollars per year”, as opposed to the
rather inefficient method of ransoming the data of the victims. This type of malware operates in
a similar manner to a legitimate mining operation, wherein operators [162]“combine computational
resources from multiple miners to increase the likelihood and frequency of finding a new block, and
then distributes mining rewards among participating miners based on the proportion of contributed
computational resources”. Some variants of crypto-mining malware operate within the context of victims' browsers; this is called cryptojacking and is also a very common attack vector.
Figure 6.16: Malware Classified as Cryptominer
With regards to the network behaviour that these cryptocurrency miners exhibit, they are stealthy
in their method of operation, in comparison to other malware types. This is a reflection of their authors' aims to maintain persistence on an infected system for as long a period as possible, in order to maximise the financial gains received from the infected machine. However there is one commonality to crypto-mining malware, which is the necessary connection to the mining pools in order to verify completed work. Also, the malware is typically controlled via a traditional botnet command and control architecture and often spreads like a worm, as in the case of WannaMine, which propagates via the methodology that if [163]“the ETERNALBLUE hole is already closed, WannaMine can try to spread using password cracking tools to find weak passwords on your network”. It therefore transmits itself in a comparatively noisy process, much like a worm.
The other common methodology of cryptomining malware propagation is via malvertising and drive-
by downloads. Drive-by downloads are a particularly effective method of transmitting malware,
that have somewhat fallen out of favour recently. An example of cryptocurrency mining malware
that spread using this methodology is the Ngay campaign. This malware campaign of allegedly
Vietnamese origin begins with the QuantLoader dropper [164]“downloaded and executed from RIG Exploit Kit. And, QuantLoader downloads a cryptocurrency miner”. Whilst installation vectors may
vary between individual distribution campaigns, there remains the necessity of communication with
the mining pools in order for the malware author or distributor to still retain profit. Evidence of
any communications over odd protocols or containing wallet strings may also be indicative of a
cryptocurrency mining malware in operation.
With some of the malware samples, it was necessary to bypass some of the initial stages of the mal-
ware in order to ensure the execution of the payload, in one example this was because [165]“before
dropping the driver, the dropper checks if it is executing in a virtual machine environment, under a
control of a debugger or in a sandbox. If a virtual machine environment is detected, the malicious
driver is not dropped, and the execution continues”. This is part of the aforementioned anti-forensics
capability wherein malware authors will seek to evade analysis by detecting whether they are exe-
cuting from within a virtual machine.
Cryptocurrency mining malware is another relatively recent development in the malware space and,
at the time of writing, there is consequently a lack of published research in the field. Typically, however,
the method is [166]“JavaScript code that executes client-side in her browser, mines a cryptocurrency -
typically without consent or knowledge and pays out the seigniorage to the website”. In terms of the
difference between cryptocurrency mining malware and in-browser cryptojackers, the difference primarily
manifests in the fact that in-browser cryptojacking is not typically persistent on the end user’s system
and does not execute in memory as a result of the successful execution of a malware binary. Coinhive
is the primary library used for in-browser cryptomining, both legitimate and illegitimate: [167]“Coinhive
keeps 30 percent of whatever amount of Monero cryptocurrency that is mined using its code, whether
or not a Web site has given consent to run it. The code is tied to a special cryptographic key that
identifies which user account is to receive the other 70 percent”. There have also been Coinhive clones
that implement similar functionality. Unlike other malware classes, this type of malware is used by
legitimate websites as often as it is by criminal organisations, which raises an interesting ethical
quandary. An analysis of Alexa’s top 10,000 sites [168]“found 220 sites that launch mining when a
user opens their main page, with an aggregated audience of 500 million people”.
It is possible to extract a signature that identifies a piece of malware as an in-browser cryptojacker.
This is accomplished through methods similar to those used to identify binary cryptocurrency mining
software.
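As an illustration, a signature of this kind could be as simple as searching plaintext HTTP payloads for references to known mining scripts; the indicator strings and function name below are illustrative assumptions rather than an exhaustive or definitive list.

from scapy.all import rdpcap, TCP, Raw

# Illustrative in-browser mining indicators (Coinhive and similar scripts).
CRYPTOJACK_SIGNATURES = (b"coinhive.min.js", b"CoinHive.Anonymous", b"coinhive.com")

def contains_cryptojacking_script(pcap_path):
    """Return True if any packet payload references a known mining script."""
    for pkt in rdpcap(pcap_path):
        if pkt.haslayer(TCP) and pkt.haslayer(Raw):
            if any(sig in bytes(pkt[Raw].load) for sig in CRYPTOJACK_SIGNATURES):
                return True
    return False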
The collection of the data for this malware type was more manual than for other malware types, as
the execution of the malware requires the installation of a browser plugin or a visit to a suspicious
site. The collection of these malware samples was aided by Symantec’s threat intelligence, which
provides a classification of [169]“PUA.JSCoinminer”, standing for potentially unwanted application
- JavaScript coinminer. The provision of this threat intelligence enabled the download of a number
of samples with this classification from VirusShare.
6.3.4 Ransomware
Ransomware is another malware variant that has risen to prominence recently. Due to its functionality
as an overt method of extortion by encrypting data, it is quite noisy in terms of both memory-based
and visual dynamic and static analysis. Dynamic memory-based and filesystem-based ransomware
detection mechanisms have been extraordinarily successful, with [170]“492 real-world ransomware
samples (representing 14 distinct families, the largest study of encrypting ransomware to date) and
find a 100 per cent detection rate”. Additionally there is another effective system, PayBreak, which
was [171]“evaluated on 107 ransomware samples and demonstrated that PayBreak can successfully
recover from the damage caused by twelve different ransomware families”. However, ransomware
remains one of the most prevalent malware classes.
Figure 6.20: Ransomware Executed in Virtualised Hosts
The network communication necessary for the effective provision of ransomware’s capabilities to a
target host is typified by two tasks. These tasks are the [172]“downloading of payload related files,
and/or for the communication of the encryption key”. Although there may be additional commu-
nication such as keep-alive packets transmitted, these are the main forms of behaviour involving
network activity that ransomware will engage in. This also points to another behaviour that can
serve as a heuristic for classifying malware as ransomware: far more traffic is received than is sent.
This traffic is also likely to be encrypted, and the communication is likely to be temporally concentrated
in one spike of activity coinciding with the installation and execution of the malware. The combination
of all these indicators suggests that a sample is likely to be ransomware. However, the key indicator
remains the lack of continuity of communication and the fact that the [173]“malware will perform the
secure key exchange with the command and control (C2) server, build an encryption key that will only
be used on the local system”. This encrypted key transfer after the initial contact by the host is a
behaviour from which the malware type can ultimately be inferred.
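A minimal sketch of two of these indicators, receive-heavy traffic and a single temporal spike, is given below; the virtual machine address parameter, the thresholds and the sixty-second window are assumptions made for illustration rather than the platform's tuned values.

from collections import Counter
from scapy.all import rdpcap, IP

def ransomware_indicators(pcap_path, vm_ip, ratio_threshold=3.0, spike_share=0.8):
    """Flag receive-heavy traffic and a single concentrated burst of activity."""
    sent = received = 0
    bytes_per_minute = Counter()
    for pkt in rdpcap(pcap_path):
        if not pkt.haslayer(IP):
            continue
        if pkt[IP].src == vm_ip:
            sent += len(pkt)
        elif pkt[IP].dst == vm_ip:
            received += len(pkt)
        bytes_per_minute[int(pkt.time) // 60] += len(pkt)
    receive_heavy = sent > 0 and (received / sent) >= ratio_threshold
    total = sum(bytes_per_minute.values())
    # "Spike traffic" here means most bytes fall within one sixty-second window.
    spike = total > 0 and (max(bytes_per_minute.values()) / total) >= spike_share
    return {"receive_heavy": receive_heavy, "spike_traffic": spike}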
Figure 6.21: Malware Classified as Ransomware
6.3.5 Remote Access Trojans
Remote access trojans are a malware class that masquerades as a legitimate program. An in-depth
analysis of one particular trojan variant in the wild found [174]“47 percent of sessions involved RDP
use, which reveals the presence of the operator to the victim”. This was perceived as surprising by the
researchers, who had expected the use of remote access tools to be surreptitious and stealthy in its
method of operation. There are a variety of data exfiltration and evasion methods that
can be used to obfuscate the necessary network communication that is required in order for malware
to execute effectively. As a result of this capability, research states that [175]“covert communication
channels are observed in both, cyber crime operations and targeted attacks”.
The network signature resultant from remote access trojans is distinctive. Existing research has shown
the effectiveness of feature-based detection, which can easily be incorporated into a weighted heuristic
system for malware classification: [176]“(1) the outbound flow is often larger than the inbound flow in
a monitored network, (2) the inbound/outbound packets have the same case, (3) the Main-connection
is available throughout the whole communication and its duration is much longer than the
Subconnections and (4) attackers need analysing time, which is much more than packet interval time,
to decide what kind of task to perform next”. This tested method is exactly the methodology utilised
here to classify malware as a remote access trojan.
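The following sketch shows how three of those quoted features could be evaluated over simple flow records; the Flow structure is a hypothetical stand-in for the flow summaries produced during analysis, and feature (2) is omitted here as it depends on payload inspection.

from dataclasses import dataclass
from typing import List

@dataclass
class Flow:
    outbound_bytes: int
    inbound_bytes: int
    duration: float             # seconds the connection remained open
    mean_packet_interval: float
    mean_response_delay: float  # operator "thinking time" between commands

def looks_like_rat(main: Flow, subconnections: List[Flow]) -> bool:
    outbound_heavy = main.outbound_bytes > main.inbound_bytes              # feature (1)
    long_lived = all(main.duration > s.duration for s in subconnections)   # feature (3)
    operator_delay = main.mean_response_delay > main.mean_packet_interval  # feature (4)
    return outbound_heavy and long_lived and operator_delay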
6.3.6 Fake Antivirus
Fake antivirus is a somewhat common trojan variant that operates by masquerading as a legitimate
piece of antivirus software. There are [177]“three primary infection methods used by fake AV
distributors to propagate their malware: social engineering, drive-by-download attacks, and botnets”.
As this threat has evolved it has incorporated more sophisticated delivery mechanisms such as
malvertising and delivery via social networks. As a malware class it is similar to spamming bots, in
that it was previously a very financially successful cybercriminal business model that has somewhat
waned as more profitable cybercrimes have eclipsed it.
The network signature resultant from the execution of fake antivirus is similar to that of trojans
and botnets, as it incorporates elements from both malware classes in its infection and management
methodologies. It was discovered in an analysis of fake antivirus programs that [178]“fake AV
malware possesses interesting characteristics that distinguishes it from typical web-based malware.
For example, Fake AV domains have more Landing domains funnelling user traffic than do other
Infection domains. Fake AV distributors also rely heavily on on-line advertisements and domains
with pages that contain trending keywords. We believe that Fake AV domains have also evolved
to use more agile distribution networks that continuously rotate among short-lived domains in an
attempt to avoid detection”. The main conclusion that can be drawn from this analysis is that fake
antivirus programs mirror other malware types in their use of fast-flux domains to escape takedown.
Much like botnets, there is also a sophisticated infrastructure required for the maintenance of control,
as fake antivirus is a recurring business model. Another feature borrowed from botnets is the usage
of [179]“fast and domain-flux techniques, used to improve the agility of malicious networking
infrastructures”.
This was reflected in the analysis of fake antivirus binaries that was conducted, wherein the sys-
tem’s grouping frequently suggested that malware classified as fake antivirus was also classifiable
as droppers, botnet nodes and trojans. This is because the fake antivirus malware shares a lot of
characteristics with these other classes.
Figure 6.23: Malware Classified as Fake Antivirus, Blended with Other Types
6.3.7 Botnet Nodes
Botnets have a unique signature, from which many other malware types have derived elements. The
communications infrastructure utilised to command and control botnets has evolved over time to
incorporate methods of evasion. There has been a significant amount of innovation in the methods
used by botnets, in that they have [180]“moved away from IRC and started using proprietary CC
protocols”. This innovation is a testament to how long-standing a malware class
botnets have been. Additionally as a profitable business model, they have emulated the functionality
of legitimate software-as-a-service websites. One of the leading botnet types, gh0st, is able to
[181]“obfuscate client-server communications using a proprietary network protocol and comes bundled
with intuitive graphical user interfaces that make it simple to use”. Despite this innovation in evasion
and obfuscation techniques, the methods by which botnet command and control architectures operate
and communicate remain much the same, in principle.
Figure 6.25: Malware Classified as Botnet
6.3.8 Worms
Worms are a malware variant defined by their method of propagation. The method of propagation
for computer worms is typically automatic, and it is this automatic spreading that defines a sample
as a worm rather than any other piece of malware.
As such, the principal behaviour that classifies a worm is the scanning activity that it conducts.
This scanning behaviour will frequently be accompanied by the attempted exploitation of the
vulnerability it was searching for.
Therefore, a piece of malware that is to be classified as a worm will likely exhibit a large amount of
communication, or attempted communication, with a large number of hosts. The first targets that a
worm will typically try to exploit are hosts on the local network; however, its scanning activity may
also include attempted reconnaissance against a large number of hosts across the internet address
space.
Existing research uses a similar heuristic in order to successfully classify malware as a worm. Worm
detection operates by utilising a [182]“port-scanner detection technique as a heuristic to identify
malicious traffic; we classify all flows from port-scanning sources as suspicious”. This is the logic
followed here in the detection of worm malware. Attempted port scanning of a local network can be
seen as reliable evidence of a worm’s operation in progress.
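A minimal sketch of this port-scanner heuristic is given below: any source that attempts SYN connections to a large number of distinct host and port pairs, particularly within the local subnet of the analysis network, is flagged as suspicious. The subnet address and the thresholds are illustrative assumptions.

from collections import defaultdict
from ipaddress import ip_address, ip_network
from scapy.all import rdpcap, IP, TCP

def scanning_sources(pcap_path, local_net="192.168.56.0/24", target_threshold=20):
    """Flag source addresses whose SYN traffic fans out to many distinct targets."""
    local = ip_network(local_net)
    contacted = defaultdict(set)
    for pkt in rdpcap(pcap_path):
        if pkt.haslayer(IP) and pkt.haslayer(TCP) and (int(pkt[TCP].flags) & 0x02):
            contacted[pkt[IP].src].add((pkt[IP].dst, pkt[TCP].dport))
    flagged = {}
    for src, targets in contacted.items():
        local_hosts = {dst for dst, _ in targets if ip_address(dst) in local}
        if len(targets) >= target_threshold or len(local_hosts) >= target_threshold // 2:
            flagged[src] = len(targets)
    return flagged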
6.3.9 Droppers
Droppers can in many ways be defined as a parasite or catalyst enabling other malware classes to
achieve their goal. In many cases, a [183]“dropper is a thin client module that download the malware
modules over the network”, which results in the execution of all the separate stages of the malware.
Due to their specific role within the functioning of a malware operation they have a unique network
traffic signature in terms of behaviour. This behaviour is similar to that of a botnet node in many ways.
However, a dropper’s operation is unlikely to involve as sophisticated a command and control
infrastructure as a botnet node’s. In the case of Hydraq, the dropper was [184]“responsible for the
installation of the DLL component, which contains all the features and functionalities for Hydraqs
remote attacker”. The dropper function will typically download files of a large size from a small
number of hosts, or a single host, in order to enable the full functionality of the malware.
Existing research based on a comprehensive analysis of dropper samples has shown that it is
possible to discern between benign and malicious samples by analysing their nature and content. The
methodology used was to [185]“capture the relationships between downloaders and the supplementary
executables they download, to identify large parts of the malware download activity on each host by
analysing the upstream download chain”. As such it is possible to differentiate droppers from other
malware classes by looking at the asymmetry of information received versus that which is sent.
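As a sketch of that asymmetry check, the fragment below measures how much the infected virtual machine downloads relative to what it sends, and from how many distinct hosts; the thresholds and the virtual machine address parameter are assumptions made for illustration.

from collections import Counter
from scapy.all import rdpcap, IP

def dropper_indicators(pcap_path, vm_ip, min_ratio=5.0, max_sources=3):
    """Flag large downloads drawn from a small number of remote hosts."""
    downloaded_from = Counter()
    uploaded = 0
    for pkt in rdpcap(pcap_path):
        if not pkt.haslayer(IP):
            continue
        if pkt[IP].dst == vm_ip:
            downloaded_from[pkt[IP].src] += len(pkt)
        elif pkt[IP].src == vm_ip:
            uploaded += len(pkt)
    total_downloaded = sum(downloaded_from.values())
    download_heavy = total_downloaded >= min_ratio * max(uploaded, 1)
    few_sources = 0 < len(downloaded_from) <= max_sources
    return {"download_heavy": download_heavy, "few_download_sources": few_sources}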
Figure 6.31: Malware Classified as Dropper
6.3.10 Exploit Kits
In a manner similar to droppers, exploit kits are a meta-type of malware, in that they simply act
as an enabler for more complex and damaging malware to execute. In terms of the methodology by
which this malware executes and infects a victim, [186]“The malicious web page then returns an
HTML document, containing exploits, which are usually hidden in an obfuscated JavaScript code.
If at least one exploit succeeds, then a victim gets infected. Successful exploitation means, that the
shellcode injected has finished flawlessly and hence accomplished its task - to download and execute
a malicious program”. This is what is known as a drive-by-download attack, in which a [187]“page
contains code, typically written in the JavaScript language, that exploits vulnerabilities in the”
browser. This type of attack is a profitable business model, as the resultant binary and infected
computer can then be sold to other actors within the malware ecosystem as a host for whichever
malware is to be installed. The binary under analysis will be the resultant binary that is installed.
Figure 6.32: Malware Classified as Exploit Kit
6.3.11 Advanced Persistent Threats
As exploitation and malware move into the domain of nation state capabilities, it is necessary to
include their cyber-physical incarnation, that of the Advanced Persistent Threat (APT). [188]“The
term APT is something that grew out of the US military almost a decade ago. It’s always had a
reasonably precise definition: it covers attacks carried out by nation-states”. The qualifying features
of advanced persistent threats are their advanced nature and targets of global geopolitical strategic
importance.
The sophistication and advanced nature of this malware makes it to some degree immune to effective
analysis by an automated system. It can be noted that, whilst the malware in this training set
contained behaviours that were common to other malware sets, there was not enough commonality
between the behaviours. They can be regarded as black swan events, due to their almost unique
nature. Black swan events are highly impactful events that ensure that [189]“the extreme, the
unknown, and the very improbable” are the most important events within human society. The
malware that can be classified as APT is unique enough to evade methods of generalisation via
network forensic analysis and classification.
Existing business literature recognises that [190]“black swan events that can occur suddenly, with
unexpectedly widespread ramifications” are now a reality within the cyber domain. The gain is not
primarily financial like other malware variants. The aim of this malware is typically to validate the
initial investment in cyberwarfare programs by the provision of strategic advantage. [191]“Cyber
disruptions should be extended into a larger modelling perspective that considers other business sec-
tors that depend on the initially affected company”. This quotation emphasises the economic and
strategic importance of being able to disable or control targets of advanced persistent threat cam-
paigns. The value of this malware infrastructure being present is not directly monetised. The value
therefore lies in the operational capability that is offered to the nation state actor that is funding
this. It is also worth noting that being able to obtain control of, or damage, a potentially hostile
nation’s infrastructure or sensitive information is [192]“critical in shifting the cyberwarfare strategic
balance” for the nation that has these capabilities.
A reason this may be the case is that the highly complex and targeted nature of these APT campaigns
means they are fairly unique in terms of their methods of operation, and so commonalities cannot be
derived. APT campaigns are typically defined as [193]“highly targeted attacks focus on individual
organisations in an effort to extract valuable information” and as such may be designed only to
execute on a particular architecture or operating system in a particular location. A malware campaign
that fits these criteria is the Stuxnet campaign, which took special care to execute its payload only
on the centrifuges that it targeted. This was achieved through propagation limiting, which included
[194]“rate-limiting code within Stuxnet for example, a USB infection will delete itself from the USB
key after the third infection”. In addition to this there were various other safeguards
that ensured the malware only executed on the correct [195]“Siemens S7-417 controllers with a
matching configuration. The S7-417 is a top-of-the line industrial controller for big automation
tasks. In Natanz, it is used to control the valves and pressure sensors of up to six cascades (or 984
centrifuges) that share common feed, product, and tails stations”.
Malware targeting a specific location and configuration means that virtualised infrastructure cannot
accurately capture the full behaviour and therefore the network traffic that is produced. It may
be the case that these campaigns contain such a degree of anti-forensics, evasion and obfuscation
capabilities, that they are unable to execute in their entirety on the target virtual machines.
7 Testing
To determine whether the project can be evaluated as a success, the system must be tested thoroughly
in a variety of ways. One way in which the system must be tested is through the loading and execution
of malware samples.
This testing of the system will form the bulk of the analysis portion of the dissertation, this being
important as the system has been developed as a systems engineering project and thus must be
evaluated as such. There should also be an element of unit testing to ensure that each modular part
of the project works as it is supposed to. This is perceived as part of good software engineering
practice. There must also be user testing, which must principally survey the usability of the software,
but also whether or not the designed software fulfils the criteria established earlier.
An element of ensuring that a system has been completed to a high standard is a large amount of
code coverage from unit tests. Code coverage [196]“is important for testing and validating
code during development” and should serve as verification that code written has fulfilled its expected
functionality. Unit testing is utilised for the purpose of ensuring that the system performance is
commensurate with the design plans that have been earlier described.
The overall use of unit testing to verify that the design has been adhered to is a tenet of good
software engineering. [197]“Unit testing is to test each unit (basic component) of a program to verify
that the detailed design for the unit has been correctly implemented”, this is the core functionality
intended by the use of unit testing in a piece of software. This is regarded as a necessary element
of evaluating whether a system has been effectively implemented. Unit testing is performed by
assessing, for each distinct aspect of the system, what inputs are available to it, what outputs can
occur from the processing, and whether these conform to expectations. This unit testing is performed
programmatically for all functions that are constituent parts of the system. [198]“Unit tests are
structural, or white-box based”, and this testing is performed with access to the code to verify the
correct functioning of all aspects of the system.
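A minimal example of a white-box unit test written with Python's standard unittest module is shown below; the classify_behaviours function and the behaviour names are hypothetical stand-ins for the platform's real classification routine, included only to illustrate the style of test used.

import unittest

def classify_behaviours(behaviours):
    """Hypothetical stand-in mapping observed behaviours to candidate malware types."""
    types = set()
    if {"Scanning", "Scanning SMB"} <= set(behaviours):
        types.add("Worm")
    if "Connecting to Crypto Pools" in behaviours:
        types.add("Cryptominer")
    return types

class TestClassification(unittest.TestCase):
    def test_scanning_behaviours_imply_worm(self):
        self.assertIn("Worm", classify_behaviours(["Client", "Scanning", "Scanning SMB"]))

    def test_no_behaviours_yields_no_type(self):
        self.assertEqual(classify_behaviours([]), set())

if __name__ == "__main__":
    unittest.main()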
This project has been developed for end users that are malware analysts. In this case people involved
in the cybersecurity industry and malware analysis are one of the core groups of testers. There is
significant weight attached to the opinions of the malware analysts, as they are the end users for
whom this product is designed.
Another demographic who will conduct testing on the system is users without experience in computer
software; this serves as a measure of how effective the data visualisation is. The data visualisation
should [199]“seek to simplify the representation of hypotheses, theories, or stories, which are often
not clear from raw data”. As a result of this simplification the users should be able to ascertain the
underlying type classification and nature of the data visualised, without excessive training or
familiarity with malware.
Another integral part of the user testing of this system is the extraction of quantitative numerical
results. In order to transform the user testing into quantitative results it is necessary to survey
the users of the system after the system has been implemented. This allows the analysis of user
opinions and enables the more effective evaluation of whether or not the initial requirements and
design specifications have been fulfilled. To this end, the test set of 33 users was asked to rate their
agreement with a statement on a scale from 1 to 5, with 1 being strongly disagree and 5 being strongly
agree. These user survey results were then aggregated and are visible in Appendix B.
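As an illustration of the aggregation step, the short fragment below computes the mean agreement score per statement; the response values shown are placeholders rather than the actual survey results, which are presented in Appendix B.

from statistics import mean

# Placeholder 1-5 Likert responses, keyed by survey statement.
responses = {
    "The system clones virtual machines and executes malware": [5, 4, 4, 5, 3],
    "The behaviour-to-classification mapping is accurate": [4, 4, 3, 5, 4],
}

for statement, scores in responses.items():
    print(f"{statement}: mean={mean(scores):.2f}, n={len(scores)}")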
8 Reflection
8.1 Technical
The development of this system for the purpose of malware analysis has been a complex process and
a necessary component in any text describing this process is a retrospective evaluation. There are a
variety of different lessons that can be extracted from the development of the system.
Although it has been possible to classify the malware analysis portion of this project as a success,
with regard to its usability as a software engineering project as per the user requirements there are
flaws. One critical flaw that was exposed throughout the implementation and testing process was
the inability of the platform to be effectively distributed as open source software, due to the unique
configurations required on the test virtual machines and the extensive setup process. This has been
recognised as an emergent problem within software systems for decades; a Stanford paper from the
early 1990s describes this same issue with virtual machines. This paper states that [200]“users
cannot move between computers and resume work uninterrupted. System administration is also more
difficult. Operating systems and applications are hard to maintain. Machines whose configurations
are meant to be the same drift apart as different sets of patches, updates, and installs are applied in
different orders”.
In hindsight, one potential technical solution for this would have been utilising a configuration
management tool such as Ansible, Puppet or CFEngine in order to standardise the configuration of
the analysis virtual machines. However, configuration management for the standardisation of virtual
machine environments on deployed systems may be inappropriate for this use case, as valid strategies
for [201]“configuration management are often determined by the nature of the site being managed
and its mission”. As off-the-shelf configuration management software is typically designed for
enterprise environments, it would not be fit for this purpose.
A solution to combat this lack of portability and reproducibility of the analysis platform would have
been to heed my supervisor’s guidance with regard to the planning and structure of the project.
This advice was to decide at an early stage whether the project was to be a software engineering
project following a design framework or to be a quantitative analysis research project. The approach
chosen was divergent from this and attempted to incorporate elements of software engineering and
design, whilst also enabling quantitative analysis of malware network traffic. Within future projects
care will be taken in order to ensure that the goals of the project are clear from the outset and that
there are clear and defined metrics for success.
With regards to the technical implementation, lessons can be gleaned from the implementation of
this system. The most pertinent knowledge that was extracted from this project was the familiarity
gained with the propagation mechanisms that malware utilises and other aspects of network forensics
with regards to malicious software.
Software such as Cuckoo Sandbox and Anubis successfully achieves network packet capture on
virtualised systems at a far larger scale. The validity of this research, although successful in proving
its initial hypothesis, can therefore be called into question. This ’reinventing of the wheel’, despite an
otherwise thorough dissertation, mirrors a similar mistake during my undergraduate dissertation, in
which network interception software was written instead of using an existing library. Despite the
eventual software produced achieving its aims of network intrusion detection on Linux systems,
significant amounts of additional work were created by not using libraries such as libpcap. By
leveraging an existing malware analysis and packet capture collection tool, time could have been
saved, a larger malware corpus could have been analysed, and more in-depth analysis and classification
could have been conducted. This is especially important given the doctoral cybersecurity study being
embarked upon almost immediately following the submission of this dissertation, as the requirements
of doctoral research involve innovation and novelty. These mistakes can be learnt from to improve
the quality of my future research, taking pains to build upon existing libraries and datasets and
thereby enabling the conduct of innovative research.
A takeaway from this research is that novelty is an important criterion for the execution of successful
research. Whilst not explicitly required within the context of a dissertation, successful research must
consider novelty and whether or not similar research has already been conducted. Whilst the
classification of malware into types based on weighted heuristics is novel, other research projects have
achieved greater scale in samples analysed and better replicability.
8.2 Personal
There are also personal lessons to extract from this undertaking. One is the importance of being
realistic in terms of establishing requirements and scope. In choosing research projects and their
scope it is necessary to [202]“reflect if i am choosing the right approach, refine my question and
consider if i’m being overambitious in my plans”. Problems manifested in an early version of this
project that resulted from an overambitious scope. Another issue with regard to scoping is that of
scope creep, wherein [203]“more and more functions” are added to the scope of the initial project
requirement.
This is a lesson with regards to character that should enable the establishment of realistic goals.
This is also a professional and academic lesson, as it teaches the importance of setting specific,
measurable, attainable, realistic and timebound goals concerning planning of projects. Utilising
quantifiable goals fitting the preceding criteria would enable me to [204]“lose the benefit of a more
abstract objective in order to gain quantification”, and this enables more effective research. Another
lesson learnt, related to the previous one, is the importance of planning within a software project,
in order to achieve the necessary results within the timeframe provided.
in research projects that involve software development elements due to the fact that [205]“more
software projects have gone awry for lack of calendar time than for all other causes combined”.
Overall, this project has complemented the course curriculum of the systems security programme. It
has given me valuable skills in the fields of network security, malware analysis, forensics and reverse
engineering. These skills have been developed in duality with the taught courses and my professional
development from systems administrator to penetration tester to security researcher throughout the
duration of the course. It is not possible to extricate this academic and professional development
from each other, particularly as this course has led to spending at least the next four years in
academic cybersecurity research. The programme and this project have acted as a catalyst enabling
the stoic [206]“satisfaction in the things which it enables thee to do according to right reason”, as
described in Marcus Aurelius’s Meditations, and have enabled the beginning of an exciting research career.
Bibliography
[1] G. O. G. S. K. Olivier Thonnard, Leyla Bilge and M. Lee, “Industrial Espionage and Targeted
Attacks: Understanding the Characteristics of an Escalating Threat, Research in Attacks
Intrusions and Defenses,” p. 78, 2017. Symantec.
[2] F. Cohen, “Computer Viruses: Theory and Experiments,” p. 10, 1984. Department of Com-
puter Science, Lehigh University, Bethlehem.
[3] D. A. Yanfang Ye, Tao Li and S. Iyengar, “A Survey on Malware Detection Using Data Mining
Techniques,” p. 2, 2017. West Virginia University and Florida International University and
Nanjing University.
[4] V. B. Annarita Giani and G. Cybenko, “Data Exfiltration and Covert Channels,” p. 2, 2006.
Thayer School of Engineering, Dartmouth College.
[7] R. M. Henri Gruffat and E. Manet, “Herpesvirus Late Gene Expression: A Viral-Specific
Pre-initiation Complex Is Key,” p. 3, 2016. Universitie De Lyon, France.
[8] L. Newman, “Latest Ransomware Hackers Didn’t Make Wannacry’s Mistakes,” p. 1, 2017.
Wired Magazine.
[15] M. Rice and S. Shenoi, “Critical Infrastructure Protection X,” p. 186, 2016. Symantec.
[16] J. B. Claudio Guarnieri, Mark Scholesser and A. Tanasi, “Mo Malware, Mo Problems Cuckoo
Sandbox,” p. 2, 2013. Blackhat USA 2013.
[17] C. Eagle, “The Ida Pro Book,” p. 528, 2011. No Starch Press.
[20] C. W. P. D. Konrad Rieck, Thorsten Holz and P. Laskov, “Learning and Classification of
Malware Behavior,” p. 17, 2008. Fraunhofer Institute, University of Mannheim and University
of Tubingen.
[21] V. A. D. S. F. F. P. G. André Ricardo, Abed Grégio and M. Jino, “Towards a Taxonomy
of Malware Behaviours,” p. 7, 2015. University of Campinas, Campinas, Sao Paulo, Brazil.
[22] A. Lovelace, “Sketch of The Analytical Engine,”
[23] E. Chien and P. Ször, “Blended Attacks Exploits, Vulnerabilities and Buffer-Overflow Tech-
niques in Computer Viruses,” p. 32, 2017. Symantec Security Response, Virus Bulletin.
[24] Y. H. Gianluca Stringhini, Yun Shen and X. Zhang, “Marmite: Spreading Malicious File
Reputation Through Download Graphs,” p. 2, 2017. University College London, Symantec
Research Labs and King Abdullah University of Science and Technology.
[25] L. Rocha, “Neutrino Exploit Kit Analysis and Threat Indicators,”
[26] X. Y. Ali Zand, Giovanni Vigna and C. Kruegel, “Extracting Probable Command and Control
Signatures for Detecting Botnets,” p. 1, 2014. Computer Security Lab, University of
California, Santa Barbara.
[27] B. Krebs, “KrebsOnSecurity Hit With Record DDoS,” p. KrebsOnSecurity, 2016. KrebsOn-
Security.
[30] B. Krebs, “Who is Anna-Senpai, the Mirai Worm Author?,” 2017. Krebs on Security.
[31] J. A. M. M. F. J. Michael Bailey, Jon Oberheide and J. Nazari, “Automated Classification
and Analysis of Internet Malware,” p. 17, 2007. Electrical Engineering and Computer Science
Department, University of Michigan.
[32] G. Stringhini, “Stepping Up the Cybersecurity Game: Protecting Online Services from Mali-
cious Activity,” p. 181, 2014. University of California, Santa Barbara.
[33] CERT-UK, National Cybersecurity Centre, “Code Obfuscation,” p. 7, 2014.
[34] P. F. G. G. Roberto Perdisci, Davide Ariu and W. Lee, “McPAD : A Multiple Classifier System
for Accurate Payload-based Anomaly Detection,” p. 6, 2009. Georgia Institute of Technology,
University of Cagliarai, Google.
[35] B. AlAhmadi and I. Martinovic, “MalClassifier: Malware Family Classification Using Network
Flow Sequence Behaviour,” p. 11, 2018. Department of Computer Science, University of
Oxford.
[36] M. Rafique and J. Caballero, “FIRMA: Malware Clustering and Network Signature Generation
with Mixed Network Behaviours,” p. 5, 2013. IMDEA Software Institute.
[37] M. Sikorski and A. Honig, “Practical Malware Analysis: The Hands-On Guide to Dissecting
Malicious Software,” p. 369, 2012. No Starch Press.
[38] C. Wueest, “Threats To Virtual Environments,” p. 12, 2014. Symantec.
[39] Immunity Incorporated, “Cloudburst,” p. 14, 2009. Blackhat USA.
[40] M. Hutchins, “How to Accidentally Stop a Global Cyber Attack,” p. 1, 2017. Malware Tech
Blog.
[41] C. H. C. K. Ulrich Bayer, Paolo Comparetti and E. Kirda, “Scalable, Behavior-Based Malware
Clustering,” p. 15, 2009. University of California, Santa Barbara.
[57] D. Huttenlocher and D. Spoonhowe, “Principles and Practices of Software Development,”
p. 19, 2002. Cornell and Carnegie Mellon University.
[58] R. C. Martin, “Clean Code: A Handbook of Agile Software Craftsmanship,” p. 154, 2009.
Prentice Hall.
[59] R. Kamboj and A. Arya, “Openstack: Open Source Cloud Computing IaaS Platform,” p. 1,
2014. International Journal of Advanced Research in Computer Science and Software Engi-
neering.
[60] J. M. Adrian Holovaty, “The Definitive Guide to Django: Web Development Done Right,”
p. 227, 2007.
[61] U. Ramana and T. Prabhakar, “Some Experiments with the Performance of LAMP Architec-
ture,” p. 1, 2005. IIT Kanpur.
[62] R. L. A. H. A. R. D. K. Mark Wagner, Fred Fischer and W. Aigner, “A Survey of Visualization
Systems for Malware Analysis,” p. 1, 2015. Eurographics Conference on Visualisation.
[63] C. K. E. K. X. Z. Clemens Kolbitsch, Paolo Milani Comparetti and X. Wang, “Effective and
Efficient Malware Detection at the End Host,” p. 3, 2009. University of California, Santa
Barbara and Secure Systems Lab, TU Vienna.
[64] R. Kuakini, “CrossRAT, A New Cross Platform Malware,” p. 1, 2018. University of Hawai’i -
West O’ahu - Cyber Security Coordination Center.
[68] G. Kessler, “Anti-Forensics and the Digital Investigator,” p. 3, 2007. Champlain College.
[69] K. K. Kyounho Lee, Hyunuk Hwang and B. Noh, “Robust Bootstrapping Memory Analysis
Against Anti-Forensics,” p. 4, 2016. Chonnam National University, Gwangju, South Korea.
[70] A. Pekta and T. Acarman, “A Dynamic Malware Analyzer Against Virtual Machine Aware
Malicious Software,” p. 4, 2014. Computer Engineering Department, Galatasaray University,
Turkey.
[71] B. Blunden, “Anti-Forensics: The Rootkit Connection,” p. 16, 2009. Black Hat USA 2009 and
Below Gotham Labs.
[72] M. Cohen, “PyFlag An Advanced Network Forensic Framework,” p. 2, 2008. Australian Federal
Police, Brisbane, Australia.
[73] W. E. Naef, “Descriptions of SHA-256, SHA-384, and SHA-512,” p. 3, 2011. Information
Warfare Site.
[74] B. K. Doowon Kim and T. Dumitras, “Certified Malware: Measuring Breaches of Trust in the
Windows Code-Signing PKI,” p. 1, 2017. University of Maryland.
[75] G. W. Zulfany Rasjida, Benfano Soewitob and E. Abdurachman, “A Review of Collisions in
Cryptographic Hash Function Used in Digital Forensic Tools,” p. 8, 2017. Bina Nusantara
University.
[76] G. S. Federico Maggi, Andrea Bellini and S. Zanero, “Finding Non-Trivial Malware Naming
Inconsistencies,” p. 1, 2011. Dipartimento di Elettronica e Informazione, Politecnico di Milano.
[77] J. W. A. L. L. R. Y. E. Nir Nissim, Aviad Cohen and L. Giles, “Scholarly Digital Libraries as a
Platform for Malware Distribution,” p. 10, 2017. The Malware Lab at the Cyber Security Re-
search Center (CSRC), Ben-Gurion University, Pennsylvania State University and University
of Milano.
[78] F. Mansmann, “Visual Analysis of Network Traffic – Interactive Monitoring, Detection,
and Interpretation of Security Threats,” p. 65, 2008. University of Konstanz.
[79] Microsoft Secure Blog Staff, “Understanding the Geography of Malware,” p. 2, 2016. Microsoft
Secure Blog Staff.
[80] A. Sharma and S. Sahay, “Evolution and Detection of Polymorphic and Metamorphic Mal-
wares: A Survey,” p. 3, 2014. BITS Pilani, Goa Campus, Zuaringiar.
[81] S. M. H. F. Zahra Bazrafshan, Hashem Hashemi and A. Hamzeh, “A Survey on Heuristic
Malware Detection Techniques,” p. 2, 2013. Shiraz University, Iran.
[82] E. K. Manuel Egele, Theodoor Scholte and C. Kruegel, “A Survey on Automated Dynamic
Malware Analysis Techniques and Tools,” p. 40, 2012. University of California, Santa Barbara
and Sophia Antipolis.
[83] K. G. Xin Hu, Sandeep Bhatkar and K. Shin, “MutantX-S: Scalable Malware Clustering Based
on Static Features,” p. 6, 2013. Usenix.
[84] I. G. Paul Black and R. Layton, “A Survey of Similarities in Banking Malware Behaviours,”
p. 6, 2017. Usenix.
[85] C. W. Konrad Rieck, Philipp Trinius and T. Holz, “Automatic Analysis of Malware Behavior
using Machine Learning,” p. 2, 2011. Berlin Institute of Technology, Germany.
[86] P. Biondi, “Packet Generation and Network Based Attacks With Scapy,” p. 9, 2005. Corporate
Research Center France, EADS CCR, Secdev.
[87] Y. Zhu, “Introducing Google Chart Tools and Google Maps API in Data Visualization
Courses,” p. 2, 2012. Georgia State University.
[88] A. Holovaty and J. Moss, “The Definitive Guide to Django: Web Development Done Right,”
p. 9, 2015. Massachusetts Institute of Technology.
[89] R. Tomlinson, “Thinking about GIS: Geographic Information System Planning for Managers,”
p. 97, 2007. Science.
[90] K. Sheppard, “Introduction to Python for Econometrics, Statistics and Data Analysis,” p. 31,
2014. University of Oxford.
[91] A. A. James Akeret, Loren Gamper and A. Refregier, “HOPE: A Python Just-in-time Compiler
for Astrophysical Computation,” p. 1, 2015. Institute for Astronomy, Department of Physics,
ETH Zurich.
[92] J. Guerrero-Saade, “The Ethics and Perils of APT Research: An Unexpected Transition into
Intelligence Brokerage,” p. 3, 2015. Kaspersky Labs USA.
[93] Cisco Talos, “Talos Group: Protecting Your Network,” p. 1, 2015. Cisco Talos.
[94] K. Zetter, “Countdown to Zero Day,” p. 474, 2014. Penguin Random House.
[95] M. Hypponen, “The Conficker Mystery,” p. 474, 2009. F-Secure Corporation.
[96] H. Anderson and P. Roth, “EMBER: An Open Dataset for Training Static PE Malware Ma-
chine Learning Models,” p. 1, 2018. Endgame.
[97] A. Klein and I. Kotler, “The Adventures of AV and The Leaky Sandbox,” p. 13, 2017. Defcon
25, SafeBreach.
[98] J. J. L. B. Bum Kwon, Jayanta Mondal and T. Dumitras, “The Dropper Effect: Insights into
Malware Distribution with Downloader Graph Analytics,” p. 2, 2015. University of Maryland
and Symantec Research Labs.
[99] National Institute of Standards and Technology, “Malware Risks and Mitigation Report,”
p. 20, 2011. National Institute of Standards and Technology.
[100] Kaspersky, “The Big Four Banking Trojans,” 2013. Kaspersky.
[101] M. M. Warren Mercer, Ben Baker and P. Rascagneres, “Olympic Destroyer Takes Aim at
Winter Olympics,” p. 1, 2018. Cisco Talos.
[102] A. Lanstein, “Polymorphism in Crimeware and why it isn’t Needed in Targeted Attacks,”
p. 4, 2012. FireEye, Blackhat.
[103] B. L. Ben Stock and B. Zorn, “Kizzle: A Signature Compiler for Detecting Exploit Kits,” p. 1,
2016. Saarland University and Microsoft Research.
[104] Kaspersky, “Talos Group: Protecting Your Network,” p. 1, 2018. Kaspersky Lab.
[105] X. W. Z. L. L. X. R. B. Xiaojing Liao, Kan Yuan, “Acing the IOC Game: Toward Auto-
matic Discovery and Analysis of Open-Source Cyber Threat Intelligence,” p. 1, 2016. Georgia
Institute of Technology and Indiana University Bloomington.
[106] G. W. A. D. Cynthia Wagner, Andras Iklody and S. Mokaddem, “Decaying Indicators of
Compromise,” p. 5, 2018. CIRCL- Computer Incident Response Center Luxembourg.
[107] P. M. Michalis Polychronakis and N. Provos, “Ghost Turns Zombie: Exploring the Life Cycle
of Web-Based Malware,” p. 1, 2008. Google Research.
[108] M. Williams, “Throttling Viruses: Restricting Propagation to Defeat Malicious Code,” p. 7,
2002. HP Labs Bristol.
[109] J. B. George Silowash, Todd Lewellen and D. Costa, “Detecting and Preventing Data Exfiltra-
tion Through Encrypted Web Sessions via Traffic Inspection,” p. 43, 2013. Carnegie Mellon
Institute and Department of Defense.
[110] T. Berners-Lee, “Information Management: A Proposal,” p. 1, 1989. CERN.
[111] N. Villeneuve and J. Bennett, “Detecting APT Activity with Network Traffic Analysis,” p. 13,
2012. Trend Micro Inc.
[112] P. Lamprakis, “Human or Malware? Detection of Malicious Web Requests,” p. 26, 2016.
Eidgenössische Technische Hochschule Zürich.
[113] I. Ristic, “Bulletproof SSL and TLS,” p. 36, 2014. Feisty Duck.
[114] S. P. Blake Anderson and D. McGrew, “Deciphering Malware’s use of TLS (without De-
cryption),” p. 5, 2016. Cisco.
[115] J. K. Mihir Bellare and P. Rogaway, “The Security of Cipher Block Chaining,” p. 5, 1994.
Advanced Networking Laboratory, IBM T.J. Watson Research Center.
[116] Electronic Frontier Foundation, “Cracking DES: Secrets of Encryption Research, Wiretap
Politics Chip Design - How Federal Agencies Subvert Privacy,” 1998. Electronic Frontier
Foundation.
[117] J. Scahill, “The Assassination Complex: Inside the Government’s Secret Drone Warfare Pro-
gram ,” p. 207, 2016. Simon Schuster.
[118] P. Albitz and C. Liu, “DNS and BIND,” p. 44, 2001. O’ Reilly.
[119] L. Daigle, “WHOIS Protocol Specification,” p. 1, 2004. Network Working Group, Internet
Engineering Task Force.
[120] DomainTools, “Getting Started with DomainTools for Threat Intelligence and Incident Foren-
sics,” p. 1, 2018. Anti-Phishing Working Group.
[121] L. R. Dmitri Bekerman, Bracha Shapira and A. Bar, “Unknown Malware Detection Using
Network Traffic Classification,” p. 2, 2015. Ben-Gurion University of the Negev, Israel.
[122] T. P. J. H. Paul Prasse, Lukas Machlica and T. Scheffer, “Malware Detection by Analysing
Encrypted Network Traffic with Neural Networks,” p. 2, 2017. Department of Computer
Science, University of Potsdam, Germany.
[123] Google, “Google Charts: Line Chart,” 2018. Google.
[124] S. F. B. E. Steven Hofmeyr, Tyler Moore and G. Stelle, “Modeling Internet-Scale Policies for
Cleaning up Malware,” p. 1, 2012. Harvard University and University of New Mexico.
[125] Y. Shavitt and N. Zilberman, “A Geolocation Databases Study,” p. 2, 2011. Tel-Aviv Univer-
sity, Israel.
[126] Google, “Google Maps Geocoding API,” 2018. Google.
[127] N. T. E. S. Frederic Giroire, Jaideep Chandrashekar and D. Papagiannaki, “Exploiting Tem-
poral Persistence to Detect Covert Botnet Channels,” p. 332, 2009. Intel Research.
[128] G. Danezis and R. Clayton, “Introducing Traffic Analysis,” p. 3, 2005. University of Cambridge,
Computer Laboratory.
[129] J. C. Sang Suh, Ulrik Tanik and A. Eroglu, “Applied Cyber Physical Systems,” p. 78, 2013.
Springer Science and Business Media.
[135] Z. Chen and C. Ji, “Measuring Network-Aware Worm Spreading Ability,” p. 1, 2008. Georgia
Institute of Technology.
[136] Trend Micro, “EternalRocks Emerges, Exploits Additional ShadowBroker Vulnerabilities,”
2017. Trend Micro.
[138] Z. Jin, “Visualization of Network Traffic to Detect Malicious Network Activity,” p. 79, 2008.
Norwegian University of Science and Technology Department of Telematics.
[139] B. Potter, “Malware Detection through Network Flow Analysis,” p. 40, 2013. Defcon 16 USA.
[140] F-Secure, “CosmicDuke: Cosmu With a Twist of MiniDuke,” p. 3, 2014. F-Secure.
[144] Zerodium, “Tor Browser Zero-Day Exploits Bounty (Expired),” 2017. Zerodium.
[145] T. Project, “Frequently Asked Questions: What Are Entry Guards?,” 2018. Tor Project.
[146] M. N. Issa Khalil, Bei Guan and T. Yun, “Killing Two Birds with One Stone: Malicious
Domain Detection with High Accuracy and Coverage,” p. 14, 2017. Qatar Computing Research
Institute.
[147] W. Wang and K. Shirley, “Breaking Bad: Detecting Malicious Domains Using Word Segmen-
tation,” p. 2, 2015. ATT Security Research Center.
[148] J. Constine, “Energy-Saving Bitcoin Rival Chia Raises From A16Z Plans Mini-IPO,” 2018.
TechCrunch.
[153] K. R. Thorsten Holz, Christian Gorecki and F. Freiling, “Measuring and Detecting Fast-Flux
Service Networks,” p. 5, 2008. University of Mannheim and Fraunhofer FIRST.
[154] A. K. C. P. C. B. Adrienne Porter Felt, Richard Barnes and P. Tabriz, “Measuring HTTPS
Adoption on the Web,” p. 11, 2017. Google, Cisco, Mozilla, USENIX Security Symposium.
[155] S. Deck, “Extracting Files from Network Packet Captures,” p. 5, 2015. SANS Institute.
[156] G. Polya, “How to Solve It: A New Aspect of Mathematical Method,” p. 162, 1945. Princeton
University Press.
[157] N. Kokash, “An Introduction to Heuristic Algorithms,” p. 2, 2017. Department of Informatics
and Telecommunications University of Trento, Italy.
[158] M. Schmall, “Heuristic Techniques in AV Solutions: An Overview,” p. 1, 2002. Symantec.
[159] J. Koret and E. Bachaalany, “The Antivirus Hacker’s Handbook,” p. 162, 2015. Wiley.
[160] W. M. Nick Biasini, Edmund Brumaghin and J. Reynolds, “The State of Malicious Crypto-
mining,” p. 1, 2018. Malwarebytes Labs.
[161] W. M. Nick Biasini, Edmund Brumaghin and J. Reynolds, “Ransom Where? Malicious Cryp-
tocurrency Miners Takeover, Generating Millions,” p. 1, 2018. Cisco Talos.
[162] G. Hileman and M. Rauchs, “Global Cryptocurrency Benchmarking Study,” p. 88, 2017. Uni-
versity of Cambridge, Centre for Alternative Finance.
[163] P. Ducklin, “What are WannaMine Attacks and how do I Avoid Them?,” 2018. Sophos Naked
Security.
[164] nao sec, “Survey of ”Ngay Campaign”,” 2017. nao sec.
[165] “Cryptomining Campaign Returns Coal and Not Diamond,” 2018. Cisco Talos.
[169] H. Lau, “Browser-Based Cryptocurrency Mining - Makes Unexpected Return from the Dead,”
2017. Symantec Threat Intelligence.
[170] P. T. Nolen Scaife, Henry Carter and K. Butler, “CryptoLock (and Drop It): Stopping Ran-
somware Attacks on User Data,” p. 9, 2016. University of Florida and Villanova University.
[171] G. S. Eugene Kolodenker, William Koch and M. Egele, “PayBreak: Defense Against Crypto-
graphic Ransomware,” p. 11, 2017. Boston University and University College London.
[172] D. Nieuwenhuizen, “A behavioural-based approach to ransomware detection,” p. 7, 2017.
MWR Labs.
[173] A. Kurniawan and I. Riadi, “Detection and Analysis Cerber Ransomware Based on Network
Forensics Behavior,” p. 2, 2017. International Journal of Network Security.
[174] P. P. H. D. H. Y. S. B. D. M. Brown Farinholt, Mohammad Rezaeirad and K. Levchenko,
“To Catch a Ratter: Monitoring the Behavior of Amateur DarkComet RAT Operators in the
Wild,” p. 14, 2017. University of California, San Diego, George Mason University, University
of California, Berkeley, New York University.
[175] P.-M. Bureau and C. Dietrich, “Hiding in Plain Sight Advances in Malware Covert Commu-
nication Channels,” p. 6, 2015. Crowdstrike and Dell Secureworks.
[176] Y. Z. J. X. Shicong Li, Xiaochun Yun and Y. Wang, “A General Framework of Trojan Commu-
nication Detection Based on Network Traces,” p. 5, 2012. Institute of Computing Technology,
Chinese Academy of Sciences, Beijing.
[177] R. K. C. K. D. S. Brett Stone-Gross, Ryan Abman and G. Vigna, “The Underground Economy
of Fake Antivirus Software,” 2011. University of California, Santa Barbara.
[178] P. M. N. P. Moheeb Rajab, Lucas Ballard and X. Zhao, “The Nocebo Effect on the Web: An
Analysis of Fake Anti-Virus Distribution,” p. 8, 2010. Google Inc.
[179] P. Y. Dae Kim and J. Zhang, “Detecting Fake Anti-Virus Software Distribution Webpages,”
p. 4, 2014. Department of Computer Science and Engineering, Wright State University.
[180] G. Saito and G. Stringhini, “Master of Puppets: Analyzing And Attacking A Botnet For Fun
And Profit,” p. 13, 2015. University College London.
[181] Fortinet, “Threat Landscape Report 2017 Q4,” p. 18, 2017. Fortinet.
[182] B. Karp and H.-A. Kim, “Autograph: Toward Automated, Distributed Worm Signature De-
tection,” p. 4, 2004. Carnegie Mellon University and Intel Research.
[183] M. Rice and S. Shenoi, “Critical Infrastructure Protection XI,” p. 76, 2017. Springer.
[184] Z. Ferrer and M. C. Ferrer, “In-depth Analysis of Hydraq The Face of Cyberwar Enemies
Unfolds,” p. 9, 2014. CA ISBU.
[185] J. J. L. B. Bum Kwon, Jayanta Mondal and T. Dumitras, “The Dropper Effect: Insights into
Malware Distribution with Downloader Graph Analytics,” p. 11, 2015. University of Maryland
and IBM Research.
[186] V. Kotov and F. Massacci, “Anatomy of Exploit Kits: Preliminary Analysis of Exploit Kits
as Software Artefacts,” p. 3, 2016. University of Trento, Italy.
[187] C. K. Marco Cova and G. Vigna, “Detection and Analysis of Drive-by-Download Attacks and
Malicious JavaScript Code,” p. 1, 2010. University of California, Santa Barbara.
[188] R. Genes, “Targeted Attacks versus APTs: What’s The Difference?,” 2015. TrendLabs
Security Intelligence Blog.
[189] N. Taleb, “The Black Swan: The Impact of the Highly Improbable,” p. 27, 2007. Allen House.
[190] C. Herbolzheimer, “Preparing for a Black Swan Cyberattack,” 2016. Harvard Business Review.
[191] Y. H. Joost Santos and C. Lian, “A Framework for Linking Cybersecurity Metrics to the
Modeling of Macroeconomic Interdependencies,” p. 4, 2007. National Library of Medicine,
National Institute of Health.
[192] S. Goel, “Cyberwarfare: Connecting the Dots in Cyber Intelligence,” p. 4, 2011. University of
Albany.
[193] M. B. N. V. David Sancho, Jessa dela Torre and R. McArdle, “IXESHE: An APT Campaign,”
p. 3, 2012. Trend Micro.
[194] L. M. Nicolas Falliere and E. Chien, “W32.Stuxnet Dossier,” p. 10, 2011. Symantec Security
Response.
[195] R. Langner, “To Kill a Centrifuge: A Technical Analysis of What Stuxnet’s Creators Tried to
Achieve,” p. 8, 2013. The Langner Group.
[196] M. Tikir and J. Hollingsworth, “Efficient Instrumentation for Code Coverage Testing,” p. 1,
2002. University Of Maryland, College Park.
[197] J. Zhao, “Data-Flow-Based Unit Testing of Aspect-Oriented Programs,” p. 1, 2003. Depart-
ment of Computer Science and Engineering, Fukuoka Institute of Technology.
[198] P. Runeson, “A Survey of Unit Testing Practices,” p. 22, 2005. Lund University.
[199] M. Gatto, “Making Research Useful: Current Challenges and Good Practices in Data Visual-
isation,” p. 9, 2015. University of Oxford, Reuters Institute of Journalism.
[200] B. P. J. C. M. L. Constantine Sapuntzakis, Ramesh Chandra and M. Rosenblum, “Optimizing
the Migration of Virtual Computers,” p. 1, 1993. Stanford University.
[201] M. Burgess and A. Couch, “Modeling Next Generation Configuration Management Tools,”
p. 1, 2006. Large Installation System Administration Conference.
[202] P. Boynton, “The Research Companion: A Practical Guide for Those in the Social Sciences,
Health and Development,” 2016. Routledge.
[203] B. Schneier, “Security and Function Creep,” 2010. Schneier on Security.
[204] G. Doran, “There’s a S.M.A.R.T. Way to Write Management’s Goals and Objectives,” p. 36,
1981. Management Review AMA Forum.
[205] F. Brooks, “The Mythical Man-Month,” p. 26, 1995. Addison Wesley.
1 Results of Malware Samples Classification
1.2 In-Browser Cryptojacker
1.4 Remote Access Trojan
1.5 Fake Antivirus
Malware Hash: 911191f4993c5d2a30127ab8cb7911c9a736b77d8617c926c0611a7f61b36651
Observed Behaviours: Client, Scanning, Scanning SMB, Connecting to Crypto Pools, HTTP, HTTP Headers
Heuristic Classification: Cryptominer, Worm

Malware Hash: a860a38b72f34d66afa93eb866a4c6686f3a77f150c24ed4460970ea82aadc86
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic
Heuristic Classification: Trojan, Worm, FakeAV

Malware Hash: d231bc7187252a557d74954f1d6a0f77172121d80edbb0a2ef20e9dfdbda93a1
Observed Behaviours: Server, HTTP, HTTP Headers, Spike Traffic
Heuristic Classification: None

Malware Hash: e51cc8b228917410f4fac00b4e9aa3dd4ece29e571204e815c468560e5e03f62
Observed Behaviours: Client, Scanning, Scanning SMB, HTTP, HTTPS, Spike Traffic
Heuristic Classification: Trojan, FakeAV, Worm

Malware Hash: f0b996ca7b8d1cfe16e01be0d34f4fcf853f8e9050a825daa437960b94043a40
Observed Behaviours: Client, Fast-Flux Connections, Fast-Flux Addresses, HTTP, HTTPS, Downloading, Malware Certs, Spike Traffic
Heuristic Classification: Dropper, Botnet, FakeAV, Trojan

Malware Hash: 911191f4993c5d2a30127ab8cb7911c9a736b77d8617c926c0611a7f61b36651
Observed Behaviours: HTTP, HTTP Headers, HTTPS, Malware Certs, Client
Heuristic Classification: In-Browser Crypto Mining

Malware Hash: a860a38b72f34d66afa93eb866a4c6686f3a77f150c24ed4460970ea82aadc86
Observed Behaviours: Client, Scanning, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic
Heuristic Classification: FakeAV, Dropper, Trojan

Malware Hash: d231bc7187252a557d74954f1d6a0f77172121d80edbb0a2ef20e9dfdbda93a1
Observed Behaviours: Client, Downloading, Spike Traffic, HTTP Headers, HTTP, HTTPS
Heuristic Classification: Trojan, Dropper, FakeAV

Malware Hash: e51cc8b228917410f4fac00b4e9aa3dd4ece29e571204e815c468560e5e03f62
Observed Behaviours: Client, Scanning, Scanning SMB, HTTP, HTTPS, Downloading, Malware Certificates, Spike Traffic
Heuristic Classification: FakeAV, Worm, Trojan

Malware Hash: f0b996ca7b8d1cfe16e01be0d34f4fcf853f8e9050a825daa437960b94043a40
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic
Heuristic Classification: Dropper, FakeAV, Trojan
1.6 Botnet Nodes
Malware Hash: 0347711e8d153c5927f9c89537a606269e4f56875aef2e981f73cb8b852a2ec6
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Spike Traffic
Heuristic Classification: Cryptominer

Malware Hash: 0fad7c39cfbe360eae8eed13c277200e4c1e72ec062c52e153c2fbfb73bba937
Observed Behaviours: Client, Fast-Flux TTL, Fast-Flux Addresses, HTTP, HTTPS, Downloading, Continuous Traffic, Malware Certificates
Heuristic Classification: Botnet

Malware Hash: 163d2b7c9b694fc951d7e84fcdbdbe71a7bee72cfc5646c45e60061e8eb32305
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Spike Traffic
Heuristic Classification: Cryptominer

Malware Hash: 18a2f191db62cc45601981180e6263c46657f537e0842cbc350a47efaa775178
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Spike Traffic
Heuristic Classification: Cryptominer

Malware Hash: 2f72da4ba32c11ac82f72457f34ec9dc124b7f2a0b5811b44108b1dbfc0746f8
Observed Behaviours: Client, Fast-Flux TTL, Fast-Flux Addresses, HTTP, HTTPS, Downloading, Continuous Traffic, Malware Certificates
Heuristic Classification: Botnet

Malware Hash: dbbed7659ca610e7ec3219dfb74c0235575716bfcfd176d6eb010af02fab10da
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic, Malware Certificates
Heuristic Classification: Botnet, Trojan

Malware Hash: 6f950588f24d5ad8dc6ab646e452fc7bceae827f914c9b91f658e5fd80452a2d
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic, Malware Certificates
Heuristic Classification: Botnet, Trojan

Malware Hash: 7ef44f865ec8b811d9c32ed5820cc9f45c2485481e84bfc5aa27c7d3389c9b4f
Observed Behaviours: Server, Fast-Flux Connections, Fast-Flux TTL, HTTP, Continuous Traffic
Heuristic Classification: None

Malware Hash: f1c36aebdcd92a04fd689d31944e5388e7e9b9421063ec4c98804ac7a04e6b0d
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic
Heuristic Classification: Botnet, Dropper, Trojan

Malware Hash: f33fa8171259fb5de3a72afe667eea712fb492ba294205f4a1eb9cdbc83ae2bd
Observed Behaviours: Server, Scanning, SMB Scanning, Fast-Flux TTL, HTTP, Continuous Traffic
Heuristic Classification: None
1.7 Worms
Malware Hash: 2b3bd85a584181300a62f4e1f1dd56ee872227dfdd36b06bfc1ed1761ea8276f
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTP Headers, Continuous Traffic
Heuristic Classification: Worm

Malware Hash: 2e216615cdfb3770fffdae7e36e8f3d710c5a67a232e5daaa0ebe492187449fa
Observed Behaviours: Server, HTTP, Spike Traffic, Fast-Flux TTL, Scanning, Scanning SMB
Heuristic Classification: Worm

Malware Hash: 3d84a7395b23bc363a52a2028cea6cedb8ea4011ebc63865581c35aaa0da5da8
Observed Behaviours: Server, Scanning, Scanning SMB, Fast-Flux TTL, HTTP, Continuous Traffic
Heuristic Classification: Worm

Malware Hash: 4973088c4421c804ffebaef95d519c8268f5e8e468bff24381442c077081c4f3
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading
Heuristic Classification: Dropper, FakeAV, Trojan

Malware Hash: 5165d5012a46cb72271404c69f73ea54db6fa5c6bfb9162feea00cf9c10b22e0
Observed Behaviours: Client, Fast-Flux Connections, HTTP, HTTPS, Downloading, Spike Traffic
Heuristic Classification: Botnet, FakeAV, Trojan, Dropper

Malware Hash: 66c801a1193adab4bdad5966c067aabd50b2a71ddb5acd42a2b48a4cf7675a75
Observed Behaviours: Client, Fast-Flux Connections, Fast-Flux TTL, HTTP, HTTPS, Downloading, Spike Traffic
Heuristic Classification: Botnet, FakeAV, Trojan, Dropper, Ransomware

Malware Hash: 8076c5715565cf4ab3b75ebf2966ceb86b5777a39af198c71097f8638c95b28c
Observed Behaviours: Client, HTTP, HTTPS, Downloading, Malware Certificates
Heuristic Classification: Dropper, Ransomware, Trojan

Malware Hash: c3504306f67fae5b835b3aad2e98c6497ed3e7bcddb644d08e4ecc989c302bc0
Observed Behaviours: Client, Scanning, Scanning SMB, Tor Traffic
Heuristic Classification: Worm, Dropper

Malware Hash: e049d8f69ddee0c2d360c27b98fa9e61b7202bb0d3884dd3ca63f8aa288422dc
Observed Behaviours: Client, HTTP, HTTPS, Scanning SMB, Scanning, Fast-Flux TTL, Malware Certificates, HTTP Headers
Heuristic Classification: Worm, Dropper

Malware Hash: f1e6cacf412b66ae7a9a335d0990f1267dd215887ffe99ca034a013441b2a8b1
Observed Behaviours: Client, HTTP, Spike Traffic, Fast-Flux Traffic, Scanning, Scanning SMB
Heuristic Classification: Worm
1.8 Droppers
Malware Hash: 15e3cfedba9a841df67d8194e7249afb493b0e10d6138fb8ebab2c136e543efb
Observed Behaviours: Spike Traffic, Server, HTTP, HTTPS, Downloading, HTTP Headers
Heuristic Classification: Exploit Kit, Dropper

Malware Hash: 71ea85fd9a93949b4a22ed0ac43caebf991f9c046318bf6a490fe1ecb95537fe
Observed Behaviours: Server, HTTP, Continuous Traffic, Fast-Flux TTL
Heuristic Classification: Exploit Kit

Malware Hash: 9ae356843ccbda7747e45b292fcf0c3eebbcc4a93101752a0007c9abaa79037a
Observed Behaviours: Client, Fast-Flux Connections, HTTP, HTTP Headers, HTTPS, Downloading, Malware Certificates, Spike Traffic
Heuristic Classification: Dropper, Trojan, Exploit Kit

Malware Hash: 9ccbec3dac898da303c5141b4f59224f1fd811b43e41acb96eaea86136786921
Observed Behaviours: Client, Scanning, Scanning SMB, HTTP, HTTP Headers, Downloading, Continuous Traffic, Tor Connection
Heuristic Classification: Worm, Exploit Kit

Malware Hash: f5fc9ed04e73015a1ba1f8400a908fd102eebf92eaee880d96ba59b1c4050a29
Observed Behaviours: Server, HTTP, HTTP Headers, Continuous Traffic
Heuristic Classification: Exploit Kit, Dropper

Malware Hash: 65694c70f53e419a5904befcdbcd0567485131e5c135decff74d7d861bb144c2
Observed Behaviours: Server, Scanning, Scanning SMB, Malware Certificates, Crypto Pool Connections, HTTP, HTTPS, HTTP Headers
Heuristic Classification: Exploit Kit

Malware Hash: 8b86662ab617d11079f16d95d4d584e8acb4a374b87edf341195ab9e043ed1d2
Observed Behaviours: Server, Fast-Flux TTL, HTTP, HTTPS, Continuous Traffic
Heuristic Classification: Exploit Kit

Malware Hash: 93add6b50434284c05b7a7f851fef88f532f8dc6d5b3873fe9d5c3f3d16368f9
Observed Behaviours: Client, HTTP, HTTP Headers, HTTPS, Downloading, Malware Certificates, Spike Traffic, Connecting To Crypto Pools
Heuristic Classification: Dropper, Botnet, Trojan, Fake AV

Malware Hash: a5eecdd62c3e279f426cfbee11830dc23a43c61f28adf81875b95571074229f3
Observed Behaviours: Server, Fast-Flux Connections, Fast-Flux TTL, HTTP, HTTPS, Continuous Traffic
Heuristic Classification: Exploit Kit

Malware Hash: f710f3c77276e7082d68d365413a658d80b6cac66c8b0c9a67b20426259a2035
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTP Headers, HTTPS, Downloading, Malware Certificates, Continuous Traffic
Heuristic Classification: Botnet
1.10 APT Malware
Malware Hash: 295b089792d00870db938f2107772e0b58b23e5e8c6c4465c23affe87e2e67ac
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Spike Traffic
Heuristic Classification: None

Malware Hash: 52fe506928b0262f10de31e783af8540b6a0b232b15749d647847488acd0e17a
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Spike Traffic
Heuristic Classification: Cryptominer

Malware Hash: 81cdbe905392155a1ba8b687a02e65d611b60aac938e470a76ef518e8cffd74d
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Malware Certificates, Spike Traffic
Heuristic Classification: Dropper, Botnet, Trojan, Ransomware

Malware Hash: 4f02a9fcd2deb3936ede8ff009bd08662bdb1f365c0f4a78b3757a98c2f40400
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Continuous Traffic
Heuristic Classification: None

Malware Hash: bdd0f50b9f8e3e1c0322d36a60688949aae35691a4ac2bdfafccd74c19c820ae
Observed Behaviours: Client, Scanning, Scanning SMB, Fast-Flux Connections, HTTP, HTTPS, HTTP Headers, Downloading, Spike Traffic
Heuristic Classification: Worm, Trojan, Dropper

Malware Hash: f9d94c5de86aa170384f1e2e71d95ec373536899cb7985633d3ecfdb67af0f72
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Continuous Traffic
Heuristic Classification: None

Malware Hash: edb1ff2521fb4bf748111f92786d260d40407a2e8463dcd24bb09f908ee13eb9
Observed Behaviours: Server, Continuous Traffic, HTTP, Fast-Flux TTL
Heuristic Classification: None

Malware Hash: 1b0eb1a1591140175d1ac111a98c89472b196599baf13ef67ee7f63d0052b00e
Observed Behaviours: Server, Fast-Flux TTL, HTTP, Spike Traffic
Heuristic Classification: Cryptominer

Malware Hash: 5130f600cd9a9cdc82d4bad938b20cbd2f699aadb76e7f3f1a93602330d9997d
Observed Behaviours: Client, HTTP, HTTPS, Downloading, Malware Certificates
Heuristic Classification: Dropper, Trojan, Botnet

Malware Hash: 5af3fd53aea5e008d8725c720ea0290e2e0cd485d8a953053ccf02e5e81a94a0
Observed Behaviours: Server, HTTP, Spike Traffic
Heuristic Classification: None

Malware Hash: 9d88425e266b3a74045186837fbd71de657b47d11efefcf8b3cd185a884b5306
Observed Behaviours: Client, Fast-Flux TTL, HTTP, HTTPS, Downloading, Malware Certificates, Spike Traffic
Heuristic Classification: Dropper, Trojan, Botnet

Malware Hash: 7313eaf95a8a8b4c206b9afe306e7c0675a21999921a71a5a16456894571d21d
Observed Behaviours: Client, Fast-Flux Connections, Fast-Flux TTL, HTTP, HTTPS, Downloading, Malware Certificates, Continuous Traffic
Heuristic Classification: Botnet
Figure 2.1: Were You Able To Use This System To Clone Virtual Machines And Execute Malware?
Figure 2.2: Is The Mapping Of Behaviours To Classifications Accurate?
Figure 2.3: Does This Visualisation Augment Your Understanding Of The Malware?
Figure 2.4: Is the Graphic Visualisation of Malware’s Network Traffic Easy To Understand?
Figure 2.5: Did You Find That The Statistical Analysis Provided Actionable Intelligence?