Towards a Framework to Detect Multi-Stage Advanced Persistent Threats Attacks

Parth Bhatt1, Edgar Toshiro Yano2, Dr. Per M. Gustavsson3

1,2 Dept. of Electronics and Computer Engineering, Instituto Tecnológico de Aeronáutica, São José dos Campos, SP, Brasil
3 Combitech Sweden / Swedish National Defence College / George Mason University, USA
[email protected], [email protected], [email protected]

Abstract – Detecting and defending against Multi-Stage Advanced Persistent Threat (APT) attacks is a challenge for mechanisms that are static in nature and based on blacklisting and malware signature techniques. Blacklists and malware signatures are designed to detect known attacks, whereas multi-stage attacks are dynamic, are conducted in parallel, use several attack paths, and can extend over multi-year campaigns to reach the desired effect. In this paper we present the design principles of a framework that models Multi-Stage Attacks in a way that describes both the attack methods and the anticipated effects of attacks. Behaviors are modeled by combining the Intrusion Kill-Chain attack model with defense patterns (i.e. a hypothesis-based approach built on known patterns). The framework is implemented using Apache Hadoop with a logic layer that supports the evaluation of a hypothesis.

Keywords – APT; Multi-stage Attack; Hadoop; Intrusion Kill Chain

I. INTRODUCTION

Currently, cyber systems are being attacked by complex, persistent and stealthy attacks, also referred to as APTs (Advanced Persistent Threats) [1][2]. An APT usually has multiple stages [13]. At each stage the attacker gains more privileges, information and resources to penetrate deeper into the organization. Persistency means that the attacker will persist patiently for a long time in the attempt to reach the desired goal: the attacker does not give up easily or deviate from the target, has a well-defined goal, and will persist until it is achieved. The attackers are assumed to be supported by organizations or nations with the capabilities and resources to sustain their aims.

To deal with this new scenario, the defense must adopt a proactive behavior. That is, an attack must be perceived and treated before it causes significant impact on the business. The current approach to security, with static risk models, compliance with standards and regulations, and incident handling after impact, is no longer acceptable. Security must be dynamic, with risks assessed continuously and proactively and with treatment actions performed before significant impacts are realized.

A current weakness in dealing with this new scenario is the difficulty of constructing, operating and maintaining an appropriate defense system. Even large organizations with sophisticated defenses are targets of attacks. So there is a demand for frameworks that support the implementation of effective solutions, so that even organizations with fewer resources and less knowledge can reasonably handle complex and persistent attacks. In this paper we present a research framework to handle complex attacks. The central basis of the framework consists of an Intrusion Management System and a multi-stage attack model. The multi-stage attack model is used to identify the prevention and detection controls that provide the logs used by the Intrusion Management System, and it also guides log correlation activities.

In the next section we present the characteristics of APTs and the difficulties current approaches face in treating them. In Section III we present the framework with its underlying models and architectural principles. In Section IV we present a process with correlation patterns to detect an APT. Sections V and VI describe the implementation and the experiments and results. Section VII presents related work, and finally Section VIII presents conclusions and suggestions for future work.

II. APT – ADVANCED PERSISTENT THREATS

A. APT Attack scenario

A complex attack can overwhelm the defenses of a system through a well-planned operation that exploits existing weaknesses. The attacker first identifies potential targets in the organization. Services and applications are less and less often selected as targets, as they are generally better protected and monitored. A common target is a user within the organization with closer access to the assets desired by the attacker. He or she may be the target of a focused phishing attack, may receive a gadget at a conference or exhibition, or may be convinced to bring a malicious device inside the organization. Once inside the supposedly secure network, the malware establishes a stealthy communication channel with the attacker and, by exploiting other weaknesses, the attack advances over other users and resources to achieve its final goal.
B. Explored Weaknesses  Command and control (C2) - Adversary requires a
This scenario is possible because even well designed communication channel to control its malware and
defenses have blind spots. An anti-virus is unable to detect a continue their actions. Therefore, it needs to be
malware not registered in the database of signatures, an IDS connected to a C2 server.
(Intrusion Detection System) is only effective if an attack
triggers a registered detection rule, and frequently it generates  Actions – it is the last phase of the kill chain in which
many false positives and negatives and so it is often
adversary achieves its objectives by performing
overlooked by security administrators (if the organization has
a fully qualified one). Alerts from different security sensors actions like data exfiltration. Defenders can be
are hardly correlated. Already fixed vulnerabilities remain confident that adversary achieves this phase after
present for a long time. Vulnerable applications settings are passing through previous phases.
used by many users. Even users with access to sensitive assets
do not have adequate training and awareness. These different
vulnerabilities could be discovered by an adversary through a
combination of social engineering and network reconnaissance
attacks.
III. PROPOSED FRAMEWORK
A research framework is being developed to support the
detection and analysis of multi-stage cyber-attacks. The
framework has the following main components:
 A Multi-stage Attack Model.
 A Layered Security Architecture.
 A Security Event Collection and Analysis System
A. Multi-stage Attack Model
The treatment of a cyber attack requires the use of an Figure 1: Intrusion Kill Chain (IKC)
appropriate attack model. Using an attack model it is possible
to recognize the current state of an attack and its possible To defeat more sophisticated defense systems, attackers
future states. An attack model is as a model of hypothesis may require the execution of one or more IKCs to circumvent
which will be used to infer possible actions of attackers. We different defensive controls. So, an adequate representation of
adopted the Intrusion Kill Chain (IKC) [3] model as the a complex attack is a multi-stage model, with each stage
central basis of our attack model. IKC is a model of seven represented by an IKC divided in its seven phases.
phases that an attacker inescapably follows to plan and carry
B. Layered Security Architecture
out an intrusion. The IKC phases are as follows:
 Information Gathering – Selection of targets, The detection of a complex attack in its earlier stages is
possible if we increase the difficulties for the attacker to
collecting information about the target, technologies
access the valuable assets. The attacker will need to invest
the target uses, potential vulnerabilities, etc. more resources and time to reach the targets. The likelihood,
that one or more sensors are activated and the attack is
 Weaponization – developing malicious code to detected, increases with the number of interactions of the
explore identified vulnerabilities, coupling the attacker with the targeted system. A pattern to facilitate
developed code with unsuspected deliverable detection of a complex attack is to protect assets by using a
payloads like pdfs, docs, and ppts. layered model. Most valuable assets should be in the inner
layers. The logic is to force the attacker to execute an attack
 Delivery- Transferring the weaponized payload to the with multiple stages. For each layer, at least once, the seven
target environment. phases of an IKC will need to be executed. So, there will be at
least seven opportunities for detecting an attack on a layer.
To be effective, the layered model should attend the
 Exploitation - Use of vulnerability of a target system
following requirements:
to execute a malicious code.
 The access to a layer will only be possible through
processes and applications of the immediately
 Installation - Remote Access Trojan’s (RAT) are
outermost layer. The attacker will have first to get an
generally installed which allows adversary to
access to the outermost layer.
maintain its persistence in the targeted environment.
 To circumvent the controls to get an access to a layer, Our framework using Hadoop is divided into 5 modules
the attacker will have to execute a kill chain from the namely, Logging Module, Log Management Module, Malware
outermost layer. Analysis Module, Intelligence Module and Control Module.
 The probability of finding common vulnerabilities in
controls, that are used to defend the different layers, Logging Module
must be very low. The idea is to minimize the reuse This module of consists of sensors from the security
of knowledge about vulnerabilities of a layer to architecture. It typically consists of HIDS (Host intrusion
attack another layer. The defense can hinder the detection system) and NIDS (Network intrusion detection
system), Firewall logs, Web Server logs, Mail Server logs, etc.
attack, forcing the adversaries to collect more
The rules and configuration for log generation can be set by
information and to develop new weapons to bypass the administrator using the Control Module. This Module
each different layer. executes a normalization task [6] to enable uniformity in the
analysis process.
C. A Security Event Collection and Analysis System
Log Management Module
An effective detection is possible only with appropriate
sensors that detect different facets of an attack. One possible All the logs generated in the Logging Module are moved to
approach is to provide each layer with sensors to detect this module, stored and pre-processed in the Hadoop
different phases of an IKC. The sensors are triggered by rules Distributed File System (HDFS) [7]. The logs are accessed
established in accordance with patterns of a malicious using Hive queries and for point queries on a small amount of
behavior. Each layer must have its own set of sensors logs a MySQL data base is used.
configured to detect an IKC inside that layer. Alerts and logs
Intelligence Module
collected by the sensors should be stored and correlated to
identify stages and phases of attacks in progress. Intelligence module contains the algorithms for log correlation
The process of collecting and correlation requires an and is responsible for automatic IKC search based on potential
infrastructure that can become difficult to properly operate and malicious events detected. Trigger events are the events on
maintain. A small network (about 100 hundred hosts) can which the Intelligence Module that can initiate an IKC
generate around 100 GB of daily logs and alarms [4,10]. reconstruction. Trigger events can be rule based or a system
Considering that an APT attack can last months or even years, administrator input. Generally, a trigger is a NIDS or HIDS
a large organization may require a significant investment to high risk alert. A multi-stage attack may persist for a long time
establish a system for collecting and analyzing logs. period. In order to enable this type of analysis, the intelligence
In order to attend this need, a model of collecting data module has a campaign analysis component. With the
based on Big Data technology was designed. This model was campaign analysis previous attacks data are collected and
implemented using Hadoop. Apache Hadoop [5] is an open correlated in order to identify a potential multi-stage attack.
source framework that allows distributed processing of large
The Intelligence Module activities are explained in section IV.
collection of data using cluster of computers each having local
computation and storage. Hadoop provides high availability, Malware Analysis Module
fault tolerance and faster processing speeds of large
(structured, semi-structured or un-structured) data sets even Malware analysis module consist of a malware analysis
with cheap commodity hardware. virtualized Lab Environment with detection tools. Explaining
malware analysis in detail is out of scope of this paper. The
primary approaches for malware analysis are Code Analysis
and Behavioral Analysis. There are several tools that help to
perform such analysis of executables. The malware analysis
module provides a more detailed understanding of the possible
actions and effects of a malware.
Control Module
Using the control module, the administrator governs the
framework. The administrator can set new rules for the
logging module, manage the cluster of the log management
module, or test hypothesis with the intelligence module.
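The normalization performed by the Logging Module and the trigger events of the Intelligence Module can be illustrated with a minimal Python sketch. The record fields follow the normalized attributes listed in Section IV (control, date and time, type of attack, source, destination, payload). The numeric `severity` field, the `flagged_by_admin` flag, `TRIGGER_CONTROLS` and `HIGH_RISK_THRESHOLD` are hypothetical assumptions for illustration only, since each real sensor uses its own severity scale.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class NormalizedLog:
    """Hypothetical normalized sensor event (one per alert or log line)."""
    control: str          # producing control, e.g. "HIDS", "NIDS", "Firewall", "Mail"
    timestamp: datetime   # date and time of the event
    attack_type: str      # e.g. a CAPEC mechanism such as "Spear Phishing"
    source: str
    destination: str
    payload: str
    severity: int                   # assumed 0-15 scale, not defined in the paper
    flagged_by_admin: bool = False  # trigger supplied manually by an administrator


# Assumed policy: only IDS alerts at or above this severity count as "high risk".
TRIGGER_CONTROLS = frozenset({"HIDS", "NIDS"})
HIGH_RISK_THRESHOLD = 10


def is_trigger(log: NormalizedLog) -> bool:
    """Rule-based or administrator-supplied trigger for an IKC reconstruction."""
    if log.flagged_by_admin:
        return True
    return log.control in TRIGGER_CONTROLS and log.severity >= HIGH_RISK_THRESHOLD


def select_triggers(logs: Iterable[NormalizedLog]) -> List[NormalizedLog]:
    """Return the trigger events in chronological order."""
    return sorted((log for log in logs if is_trigger(log)), key=lambda log: log.timestamp)
```

With such a record, a high-severity HIDS alert passes `is_trigger`, an ordinary mail-server log does not, and an administrator-flagged event bypasses the threshold entirely.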

Figure 2: Overview of the complete framework
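The multi-stage model of Section III.A and the layered-access requirements of Section III.B can be captured in a small data model. The sketch below works under stated assumptions: layers are numbered from 0 (outermost) inwards, each stage records the layer it starts from (`None` meaning outside the network) and the layer it targets, and a stage is accepted only if it starts from a layer the attacker already controls and targets either that same layer (for example a second IKC to escalate privileges, as in Section IV.D) or the immediately inner one. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    """The seven Intrusion Kill Chain phases, in execution order."""
    INFORMATION_GATHERING = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTIONS = 7


@dataclass
class KillChain:
    """One stage of a multi-stage attack: an IKC executed against one layer."""
    target_layer: int            # 0 = outermost layer; larger numbers lie further inside
    origin_layer: Optional[int]  # layer the IKC is launched from; None = outside


def is_valid_progression(stages: List[KillChain]) -> bool:
    """Check a sequence of IKCs against the layered-access requirement."""
    reached = -1  # deepest layer the attacker controls so far (-1 = none)
    for stage in stages:
        if stage.origin_layer is None:
            # From outside the network only the outermost layer is reachable.
            ok = stage.target_layer == 0
        else:
            ok = (stage.origin_layer <= reached and
                  stage.target_layer in (stage.origin_layer, stage.origin_layer + 1))
        if not ok:
            return False
        reached = max(reached, stage.target_layer)
    return True


def min_detection_opportunities(stages: List[KillChain]) -> int:
    """Every IKC runs all seven phases, so each stage offers at least seven chances."""
    return len(Phase) * len(stages)
```

For instance, an initial IKC from outside into layer 0, a privilege-escalation IKC inside layer 0 and one IKC each into layers 1 and 2 form a valid progression with at least 4 × 7 = 28 detection opportunities, whereas a jump from layer 0 straight to layer 2 is rejected.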


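The storage cost mentioned in Section III.C can be made concrete with a back-of-the-envelope estimate. It assumes, for illustration only, the roughly 100 GB/day of logs and alarms cited for a 100-host network [4,10], a one-year retention window (an APT can last months or years), no compression, decimal units (1 TB = 1000 GB), and the default HDFS replication factor r = 3:

```latex
\begin{aligned}
V_{\text{year}} &= 100\ \tfrac{\text{GB}}{\text{day}} \times 365\ \text{days} = 36\,500\ \text{GB} = 36.5\ \text{TB},\\
V_{\text{HDFS}} &= r\,V_{\text{year}} = 3 \times 36.5\ \text{TB} = 109.5\ \text{TB}.
\end{aligned}
```

A multi-year campaign multiplies these figures by the number of years, and a network with many more hosts scales them further, which is the investment the text refers to.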
IV. INTELLIGENCE MODULE LOG ANALYSIS PROCESS

The defense of a layer must be able to detect an IKC by identifying one of its seven phases. For each layer a defense plan is elaborated, as illustrated in Table I. A defense plan identifies, for each phase of an IKC, the attack mechanisms that can be used in that phase and the controls to prevent and detect the attacks. Each row of the table can be understood as a defense line that can prevent or detect a phase of an IKC. The attack mechanisms were extracted from the CAPEC list maintained by MITRE [8].

TABLE I. DEFENSE PLAN

Phase, Defense Line | CAPEC Attack Mechanisms | Prevention | Detection
Info. Gathering | Social Engineering | Security Awareness and Training | User monitoring
Info. Gathering | Network Reconnaissance | IPS, Firewall | NIDS, NABD
Info. Gathering | Data Leakage | IPS, Firewall, Proxy | NIDS, NABD
Info. Gathering | Fingerprinting | IPS, Firewall, Information Obfuscation | NIDS, NABD
Info. Gathering | Footprinting | IPS, Firewall | HIDS, NIDS
Weaponization | – | Patching, Auditing, Vulnerability Scanning | –
Delivery | Spear Phishing | Content Filtering, Identity Verification, Blacklisting | Source Correlation
Delivery | Action Spoofing | Content Filtering | Proxy
Delivery | Injection | IPS, Input Filtering | NIDS, HIDS
Delivery | Supply-Chain Attack | Security Life Cycle | NIDS, HIDS
Delivery | Hacking Hardware Devices | Configuration Control | HIDS
Exploit | Data Structure Attacks | Patching | HIDS
Execution | Privilege Escalation | Password Control, Firewall | HIDS
C2 | – | Firewall, Proxy, Encryption Use Control, Blacklisting | HIDS, NIDS, Content Analysis
Actions | Exploitation of Privilege/Trust | Password Control, Firewall | HIDS
Actions | Resource Manipulation | Firewall, Proxy, Encryption Use Control | HIDS, NIDS
Actions | Resource Depletion | IPS | HIDS, NIDS

IPS – Intrusion Prevention System; NIDS – Network Intrusion Detection System; NABD – Network Anomaly Behavior Detection; HIDS – Host Intrusion Detection System.

It is worth noting the absence of attack mechanisms for the Weaponization and C2 phases. In the Weaponization phase the attacker does not interact directly with the system to be attacked. In the C2 phase the attacker has already obtained sufficient privileges to establish a communication channel using authorized means, and will try to maintain a usage profile that does not attract the attention of defensive controls.

The main input to the Intelligence Module is the logs collected from the different prevention and detection controls. Each log is normalized [6] to provide attributes that identify the control, date and time, type of attack, source, destination, and payload. Each collected log is like a piece of a puzzle; the complete puzzle is an IKC of a multi-stage attack. The process of analyzing the logs is composed of the following activities:
• Identify the Defense Line.
• Identify the phase of an IKC.
• Rebuild an IKC.
• Identify a multi-stage attack.

The process starts when an alarm is triggered by one of the controls. Alarms are logs classified as critical and requiring immediate attention from security management.

A. Identify the Defense Line

The same type of control can be used in different defense lines. The most likely defense line can be identified from the rule that triggered the alarm, in correlation with other logs from the controls of that defense line. The main attribute for correlation is the timestamp: logs from different controls are checked for time proximity. The other attributes (type of attack, source, target, and payload) can be used to correlate with other IKCs already identified. Defenders have the advantage of being able to view events on different hosts, networks and controls simultaneously. For example, a log from the same kind of control in other parts of the network may indicate a coordinated attack in progress.

B. Identify the Phase of an IKC

The identification of the defense line may not be accurate, which leads to one or more possible phases. For example, a Privilege Escalation type of attack may belong to the Actions or the Installation phase of an IKC. The different alternatives must be marked for review by the IKC rebuild activity.

C. Rebuild an IKC

An identified phase leads to a search for the earlier phases of the IKC. The main element of correlation is still time. However, it must be taken into account that an APT attack can take a long time, so the interval between two phases can be long. For example, between the C2 and Actions phases an attacker may choose to keep the malware dormant until he or she feels confident to start the Actions phase, which can take months or even years. In contrast, the time between the Exploitation and Installation phases is usually short because the attacker, due to the peculiarities of the exploited vulnerability, generally has a limited window of time to finish the
Installation phase operations. The reconstruction of an IKC reduces false positives: discovering earlier phases increases the probability that an IKC is actually in progress.

D. Identify a multi-stage attack

Each IKC is a stage of a complex attack. An IKC in a layer may have started from an IKC executed in the outer layer, but it can also be initiated by an IKC in the same layer. For example, an attacker may first gain access with restricted privileges and then run a second IKC to increase those privileges. The linkage between two IKCs may be detected by correlating attributes from the earlier phases of an IKC with the Actions phase of older IKCs.

Semi-Automatic Process

The correlation process must be steered by an experienced analyst, and each activity is started in accordance with guidelines from an expert. Upon receiving an alert, the system identifies the possible defense lines involved. The expert selects the most promising IKC phase and calls for the reconstruction of the IKC.

Automatic event correlation to detect APTs is an open research issue. Some promising approaches use probabilistic techniques such as Hidden Markov Models (HMM) [9]. Our framework can be used as a research platform for the development of algorithms to detect complex multi-stage attacks.

V. IMPLEMENTATION

The framework described in the previous sections was implemented for basic testing purposes. A Hadoop cluster with 5 nodes was built from commodity hardware obtained from a previous project. Each machine was a 32-bit machine powered by an Intel® Core™2 Duo CPU E4500 @ 2.20 GHz × 2 with 2 GB of RAM and an 80 GB hard disk, forming a homogeneous cluster. One machine was set as the master node and the other four as slave nodes. The master node was configured with Apache Sqoop [14] and MySQL. A Fast Ethernet switch was used for networking between the cluster nodes.

Furthermore, a simple implementation of the Intelligence Module was realized as an intrusion analysis algorithm in a Java program that accesses data on HDFS in Hive external tables through the Hive Thrift service, Sqoop through the SqoopOptions class, and MySQL. The inputs to the algorithm are the alerts generated by the intrusion detection systems of the framework's Logging Module. Apache Flume was configured to transfer data synchronously from the different components of the Logging Module into HDFS. The Malware Analysis Module was left for future implementation.

VI. EXPERIMENTS AND RESULTS

A. Experimental Intrusion Scenario

We consider a scenario in which a university network is attacked by an APT. The network is assumed to be equipped with the complete framework described in Section III. As the architecture is layered, the attackers are able to reach only the outermost layer in their first attempt. In the outermost layer, the easiest way in is through the mailboxes of university professors. The attacker targets some of the professors and performs an initial reconnaissance of their interests. He finds a conference of potential interest and weaponizes its PDF flyer. Next, he crafts a targeted malicious email (TME) with the malicious PDF flyer as an attachment and finally sends this email to the target, completing the Delivery phase of an IKC. Upon receiving the email, the professor downloads the PDF flyer to get more details. As soon as the malicious PDF flyer is opened, the malicious code is executed, and within a fraction of a second the original PDF flyer is displayed to the professor. The whole process appears normal, but the malicious code was executed by exploiting one of the vulnerabilities of the PDF reader application, which leads to the installation of malicious code. During the installation process, the HIDS generates an alert about a modification of the Windows file "explorer.exe".

B. Experiment

Logs were simulated according to the experimental intrusion scenario. Log entries were created for components of the Logging Module, such as OSSEC [15] logs and mail logs. The alert from OSSEC syscheck reporting the modification of "explorer.exe" becomes the input for the Intelligence Module, which starts the IKC reconstruction. The first step was to associate this alert with one of the phases of the IKC; it was identified as the Installation phase. Thus, the previous phases of this IKC needed to be discovered, and the Intelligence Module performed the intrusion reconstruction. Some phases of any IKC, such as Installation and Exploitation, are generally in very close temporal proximity. Logs selected on the basis of this temporal proximity are therefore moved from Hive external tables into MySQL using Sqoop, where many point queries can be performed in real time to search for the information needed to complete the kill chain reconstruction. Phases such as Weaponization depend on the Malware Analysis Module, which has not been implemented yet.

The total number of log records fed into Hadoop HDFS was 69,969,233, and 5 Hadoop nodes were used. The processing time to obtain the information based on the IDS alert was 7 minutes and 38 seconds. Using the 5-node Hadoop cluster, we were able to process a large amount of semi-structured logs.
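The phase identification and IKC rebuild steps of Section IV, as exercised in this experiment, can be sketched in memory as follows. This is not the authors' Java/Hive/Sqoop implementation. It assumes hypothetical normalized events already labelled with their candidate IKC phases (a mapping the Table I defense plan would supply), and the per-phase windows in `MAX_GAP` are illustrative values chosen only to reflect Section IV.C (short between Exploitation and Installation, very long between C2 and Actions). Ranking the alternatives by the share of phases with evidence is likewise an assumed heuristic for presenting options to the analyst.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Dict, FrozenSet, List, Optional


class Phase(IntEnum):
    """Intrusion Kill Chain phases in execution order."""
    INFORMATION_GATHERING = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTIONS = 7


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized log, already mapped to its candidate phases."""
    timestamp: datetime
    phases: FrozenSet[Phase]
    description: str


# MAX_GAP[p] bounds the time between evidence of phase p - 1 and phase p
# (assumed, illustrative values).
MAX_GAP: Dict[Phase, timedelta] = {
    Phase.WEAPONIZATION: timedelta(days=90),
    Phase.DELIVERY: timedelta(days=90),
    Phase.EXPLOITATION: timedelta(days=30),     # a mail may be opened weeks later
    Phase.INSTALLATION: timedelta(minutes=5),   # short window after the exploit runs
    Phase.COMMAND_AND_CONTROL: timedelta(hours=1),
    Phase.ACTIONS: timedelta(days=730),         # malware may lie dormant for years
}

Chain = Dict[Phase, Optional[Event]]


def rebuild_ikc(trigger: Event, phase: Phase, events: List[Event]) -> Chain:
    """Search backwards from `phase` for evidence of each earlier phase.

    For every earlier phase, pick the most recent candidate inside the allowed
    time budget before the last confirmed event. When a phase has no evidence
    (e.g. Weaponization, which never touches the target), its gap is added to
    the budget for the next earlier phase.
    """
    chain: Chain = {phase: trigger}
    anchor, budget = trigger.timestamp, timedelta(0)
    for current in range(phase, Phase.INFORMATION_GATHERING, -1):
        previous = Phase(current - 1)
        budget += MAX_GAP[Phase(current)]
        candidates = [e for e in events
                      if e is not trigger and previous in e.phases
                      and anchor - budget <= e.timestamp <= anchor]
        best = max(candidates, key=lambda e: e.timestamp, default=None)
        chain[previous] = best
        if best is not None:
            anchor, budget = best.timestamp, timedelta(0)
    return chain


def support(chain: Chain) -> float:
    """Share of the reconstructed phases that have supporting evidence."""
    return sum(e is not None for e in chain.values()) / len(chain)


def rebuild_alternatives(trigger: Event, events: List[Event]) -> List[Chain]:
    """One reconstruction per candidate phase of the trigger, best-supported
    first, so the analyst can review the alternatives (Section IV.B)."""
    chains = [rebuild_ikc(trigger, p, events) for p in sorted(trigger.phases)]
    return sorted(chains, key=support, reverse=True)
```

Fed with a hypothetical trail shaped like this scenario (a crawl of the university website a month earlier, the targeted e-mail carrying xyz2013.pdf the day before, a PDF-reader event two minutes before the OSSEC syscheck alert on explorer.exe), `rebuild_ikc(alert, Phase.INSTALLATION, events)` links Exploitation, Delivery and Information Gathering and leaves Weaponization empty, the gap the Malware Analysis Module is meant to fill.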
The results of the IKC reconstruction process are given in Table II.

TABLE II: KILL CHAIN FOR OUTERMOST LAYER

Phase | Reconstructed evidence
Info. Gathering | Mailing list, Xyz conference website, University website
Weaponization | Malware Analysis Lab
Delivery | [email protected]; IP: 161.xyz.pq.35; Subject: Xyz conference 2013; xyz2013.pdf
Exploitation | 0-day PDF
Installation | Windows file modification detected: "explorer.exe"
C2 | Left for intrusion synthesis phase
Actions | –

VII. RELATED WORKS

Due to the increase in the number of sophisticated threats and the great increase in the volume of data traffic, the landscape of log data analysis has changed drastically, as working with log data has now entered the category of Big Data problems [10].

J. Howes, J. Solderitsch, I. Chen and J. Craighead [11] proposed an analytical security model that considers security analytics using Big Data. Their architecture is directed towards the operational concerns of security organizations that aim to use existing security tools with Big Data analytics. Since their work targets the operational side of security analytics, it does not demonstrate any methodology for the practical analysis of security threats, in contrast to our framework.

J. Therdphapiyanak and K. Piromsopa [12] used the Hadoop MapReduce model to analyze high volumes of log files from an online server and a distributed intrusion detection system, and showed that their framework performed better than a standalone intrusion detection system. They were able to extract important information from large security logs using their analysis and the scalability of Hadoop, but their work was limited to the K-means clustering algorithm from Mahout [5], used to separate clusters of deviating behavior from clusters of normal behavior. Building on the proven capabilities of Hadoop for log analysis shown in [12], our proposed framework is directed towards the practical analysis of targeted threats.

VIII. CONCLUSIONS

APT attacks are a major challenge for current cyber defenses. We presented the design of a framework for detecting APTs. The design uses well-known defense patterns to increase the difficulty of performing multi-stage attacks, thereby increasing the likelihood of detecting such attacks early. The use of the IKC attack model allows better tuning of the configuration of security controls, and it can be used as a hypothesis model to improve the correlation of logs and thereby facilitate the identification of ongoing attacks.

A prototype of the proposed framework was developed and successfully tested with simulated attack data. We are now going to test it with data from a real installation, and we are also working to improve the human interface to facilitate experimentation with different correlation algorithms.

ACKNOWLEDGMENTS

The authors would like to thank the following organizations for their support: University of Skövde (HiS), Sweden; Swedish National Defence College (SNDC), Sweden; Combitech AB, Sweden; and National Council for Scientific and Technological Development (CNPq), Brazil.

REFERENCES

[1] F. Li and A. Atlasis, "A Detailed Analysis of an Advanced Persistent Threat Malware," SANS Institute InfoSec Reading Room, 2011.
[2] A. K. Sood and R. J. Enbody, "Targeted cyber attacks: A Superset of advanced persistent threats," IEEE Security & Privacy, vol. 11, no. 1, 2013.
[3] E. M. Hutchins, M. J. Cloppert and R. M. Amin, "Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains," ICIW 2011.
[4] B. Hale, "Estimating Log Generation for Security Information Event and Log Management," http://content.solarwinds.com/creative/pdf/Whitepapers/estimating_log_generation_white_paper.pdf [accessed 1 Jan 2014].
[5] T. White, Hadoop: The Definitive Guide, 2009, ISBN 978-0-596-52197-4.
[6] C. Kruegel, F. Valeur and G. Vigna, Intrusion Detection and Correlation: Challenges and Solutions, vol. 14, Springer, 2005.
[7] HDFS Architecture Guide, http://hadoop.apache.org/docs/stable1/hdfs_design.html [accessed 1 Jan 2014].
[8] CAPEC – Common Attack Pattern Enumeration and Classification, Mechanism of Attack, http://capec.mitre.org/data/definitions/1000.html [accessed 1 Jan 2014].
[9] D. Ourston, S. Matzner, W. Stump and B. Hopkins (Applied Research Laboratories, University of Texas at Austin), "Applications of Hidden Markov Models to Detecting Multi-stage Network Attacks," Proceedings of the 36th Hawaii International Conference on System Sciences, 2003.
[10] N. MacDonald, "Information Security is Becoming a Big Data Analytic Problem," Gartner, 23 March 2012, http://www.gartner.com/id=1960615.
[11] J. Howes, J. Solderitsch, I. Chen and J. Craighead, "Enabling trustworthy spaces via orchestrated analytical security," ACM CSIIRW 2013, Article No. 13.
[12] J. Therdphapiyanak and K. Piromsopa, "Applying Hadoop for log analysis toward distributed IDS," ACM ICUIMC 2013, Article No. 3.
[13] J. D. Vries, H. Hoogstraaten, J. V. D. Berg and S. Daskapan, "Systems for Detecting Advanced Persistent Threats," CyberSecurity 2012, IEEE Computer Society, pp. 54-61.
[14] Apache Sqoop, http://sqoop.apache.org/ [accessed 1 Jan 2014].
[15] A. Hay, D. Cid and R. Bray, OSSEC Host Based Intrusion Detection Guide, Syngress Publishing, Inc., 2008.
