LaTeX PhD Thesis With The Memoir Package

Illustration of one of the best LaTeX formats for a PhD thesis

University of California

Department of Computer Science


DOTTORATO DI RICERCA IN INGEGNERIA
DELL’INFORMAZIONE

Integrated Detection of Anomalous Behavior of
Computer Infrastructures

Doctoral Dissertation of:
Federico Maggi

Advisor:
Prof. Stefano Zanero

Tutor:
Prof. Letizia Tanca

Supervisor of the Doctoral Program:
Prof. Patrizio Colaneri

December 2013

Preface
This thesis embraces all the efforts that I put in during the last three years
as a PhD student at Politecnico di Milano. I have been working under
the supervision of Prof. S. Zanero and Prof. G. Serazzi, who is also the
leader of the research group I am part of. In this time frame I had the
wonderful opportunity of being “initiated” to research, which radically
changed the way I look at things: I found my natural “thinking outside
the box” attitude — which was probably well hidden under a thick layer
of lack of opportunities — I took part in very interesting joint works —
among which the year I spent at the Computer Security Laboratory at
UC Santa Barbara ranks first — and I discovered the Zen of my life.

My research is all about computers and every other technology possibly
related to them. Clearly, the way I look at computers has changed
a bit since I was seven. Still, I can remember myself, typing on that
Commodore 64 in front of a tube TV screen, trying to get that d—n
routine written in Basic to work. I was just playing, obviously, but when
I recently found a picture of me in front of that screen...it all became
clear.

So, although my attempt at writing a program to authenticate myself
was a little naive — my knowledge being limited to a print instruction up to
that point, of course — I thought “maybe I am not in the wrong
place, and the fact that my research is still about security is a good sign”!

Many years later, this work comes to life. There is a humongous
number of people who, directly or indirectly, have contributed to my
research and, in particular, to this work. Since my first step into the lab,
I will not, ever, be thankful enough to Stefano, who, despite my skepticism,
convinced me to submit that application for the PhD program.
For trusting me since the very first moment I am thankful to Prof. G.
Serazzi as well, who has always been supportive. For hosting and supporting
my research abroad I thank Prof. G. Vigna, Prof. C. Kruegel,
and Prof. R. Kemmerer. Also, I wish to thank Prof. M. Matteucci
for the great collaboration, Prof. I. Epifani for her insightful suggestions,
and Prof. H. Bos for the detailed review and the constructive
comments.

On the colleagues’ side of these acknowledgments I put all the fellows
of Room 157, Guido, the crew of the seclab and, in particular, Wil, with
whom I shared all the pain of paper writing between Sept ’08 and Jun
’09.

On the friends’ side of this list Lorenzo and Simona go first, for
being our family.

I have tried to translate into simple words the infinite gratitude I have,
and will always have, to Valentina and my parents for being my fixed
point in my life. Obviously, I failed.

F. M.
Milano
September 2009

Abstract

This dissertation details our research on anomaly detection
techniques, which are central to several classic security-related tasks
such as network monitoring, but also have broader applications
such as program behavior characterization or malware classification.
In particular, we worked on anomaly detection from three
different perspectives, with the common goal of recognizing anomalous
activity on computer infrastructures. In fact, a computer
system has several weak spots that must be protected to prevent
attackers from taking advantage of them. We focused on protecting
the operating system, central to any computer, to prevent malicious
code from subverting its normal activity. Secondly, we concentrated
on protecting web applications, which can be considered the
modern, shared operating systems; because of their immense popularity,
they have indeed become the most targeted entry point
to violate a system. Last, we experimented with novel techniques
with the aim of identifying related events (e.g., alerts reported
by intrusion detection systems) to build new and more compact
knowledge to detect malicious activity on large-scale systems.

Our contributions regarding host-based protection systems
focus on characterizing a process’ behavior through the system
calls invoked into the kernel. In particular, we engineered and
carefully tested different versions of a multi-model detection system
using both stochastic and deterministic models to capture
the features of the system calls during normal operation of the
operating system. Besides demonstrating the effectiveness of our
approaches, we confirmed that the use of finite-state, deterministic
models allows detecting deviations from the process’ control
flow with the highest accuracy; however, our contribution combines
this effectiveness with advanced models for the system calls’
arguments, resulting in a significantly decreased number of false
alarms.

Our contributions regarding web-based protection systems
focus on advanced training procedures that enable learning systems
to perform well even in the presence of changes in the web application
source code — particularly frequent in the Web 2.0 era. We
also addressed data scarcity, a real problem when deploying
an anomaly detector to protect a new, never-used-before
application. Both these issues dramatically decrease the detection
capabilities of an intrusion detection system but can be effectively
mitigated by adopting the techniques we propose.

Last, we investigated the use of different stochastic and fuzzy
models to perform automatic alert correlation, a post-processing
step to intrusion detection. We proposed a fuzzy model
that formally defines the errors that inevitably occur if time-based
alert aggregation (i.e., two alerts are considered correlated if they
are close in time) is used. This model accounts for measurement
errors and avoids false correlations due, for instance, to delays
or incorrect parameter settings. In addition, we defined
a model that describes alert generation as a stochastic process
and experimented with non-parametric statistical tests to define
robust, zero-configuration correlation systems.

The aforementioned tools have been tested on different datasets
— thoroughly documented in this document — and led
to interesting results.
Contents

List of Figures

List of Tables

List of Acronyms

1 Introduction
1.1 Today’s Security Threats
1.1.1 The Role of Intrusion Detection
1.2 Original Contributions
1.2.1 Host-based Anomaly Detection
1.2.2 Web-based Anomaly Detection
1.2.3 Alert Correlation

2 A Chapter of Examples
2.1 A Table
2.2 Code
2.3 A Sideways Table
2.4 A Figure
2.5 Bulleted List
2.6 Numbered List
2.7 A Description
2.8 An Equation
2.9 A Theorem, Proposition & Proof
2.10 Definition
2.11 A Remark
2.12 An Example
2.13 Note


Bibliography

Index
List of Figures

1.1 Illustration taken from (Holz, 2005) and ©2005 IEEE. Authorized license limited to University of California.

2.1 telnetd: distribution of the number of other system calls between two execve system calls (i.e., distance between two consecutive execve).

List of Tables

2.1 Duality between misuse- and anomaly-based intrusion detection techniques.

2.2 Taxonomy of the selected state-of-the-art approaches for network-based anomaly detection.

List of Acronyms

DoS Denial of Service

HTTP HyperText Transfer Protocol

IDS Intrusion Detection System

ID Intrusion Detection

ISP Internet Service Provider

IP Internet Protocol

SOM Self Organizing Map

SQL Structured Query Language

TCP Transmission Control Protocol

TTL Time To Live

URL Uniform Resource Locator

Colophon

This document was typeset using the XeTeX typesetting system
created by the Non-Roman Script Initiative and the memoir
class created by Peter Wilson. The body text is set 10pt with Adobe
Caslon Pro. Other fonts include Envy Code R and Optima Regular.
Most of the drawings are typeset using the TikZ/PGF packages by
Till Tantau.
1 Introduction

Network-connected devices such as personal computers, mobile phones,
or gaming consoles nowadays enjoy immense popularity. In parallel,
the Web and the humongous amount of services it offers have
certainly become the most ubiquitous tools of all times. Facebook
counts more than 250 million active users, 65 million of whom
use it on mobile devices; not to mention that more than 1 billion
photos are uploaded to the site each month (Facebook, 2009). And this
is just one, popular website. One year ago, Google estimated the
approximate number of unique Uniform Resource Locators (URLs) at 1
trillion (Alpert and Hajaj, 2008), while YouTube had stocked more than
70 million videos as of March 2008, with 112,486,327 views just on the
most popular video as of January 2009 (Singer, 2009). And people from
all over the world inundate the Web with more than 3 million tweets per
day. Not only has the Web 2.0 become predominant; in fact, thinking
that in December 1990 the Internet was made of one site and today it
counts more than 100 million sites is just astonishing (Zakon, 2006).

The Internet and the Web are huge (Miniwatts Marketing Grp.,
2009). The relevant fact, however, is that they have both become the most
advanced workplace. Almost every industry has connected its own network
to the Internet and relies on these infrastructures for a vast majority of
transactions, most of the time monetary transactions. As an example,
every year Google loses approximately 110 million US dollars in


ignored ads because of the “I’m feeling lucky” button. The scary part is
that, during their daily work activities, people typically pay little or no
attention to the risks that derive from exchanging any kind of information
over such a complex, interconnected infrastructure. This is
demonstrated by the effectiveness of social engineering (Mitnick, 2002)
scams carried out over the Internet or the phone (Granger, 2001). Recall
that 76% of phishing is related to finance. Now, compare this landscape
to what the most famous security quote states.

“The only truly secure computer is one buried in concrete,
with the power turned off and the network cable cut.”
—Anonymous

In fact, the Internet is anything but a safe place (Ofer Shezaf and Jeremiah
Grossman and Robert Auger, 2009), with more than 1,250 known data
breaches between 2005 and 2009 (Clearinghouse, 2009) and an estimated
263,470,869 records stolen by intruders. One may wonder why
the advance of research in computer security and the increased awareness
of governments and public institutions are still not capable of avoiding
such incidents. Besides the fact that the aforementioned numbers
would be orders of magnitude higher in the absence of countermeasures,
today’s security issues are, basically, caused by the combination of two
phenomena: the high number of software vulnerabilities and the effectiveness
of today’s exploitation strategy.

software flaws — (un)surprisingly, software is affected by vulnerabilities.
Incidentally, tools that have to do with the Web, namely
browsers and 3rd-party extensions, and web applications, are the
most vulnerable ones. For instance, in 2008, Secunia reported
around 115 security vulnerabilities for Mozilla Firefox and 366 for
Internet Explorer’s ActiveX (Secunia, 2008). Office suites and e-mail
clients, which are certainly the must-have-installed tools on every
workstation, hold the second position (The SANS Institute,
2005).

massification of attacks — in parallel to the explosion of the Web 2.0,
attackers and the underground economy have quickly learned that
a sweep of exploits run against every reachable host has more
chances to find a vulnerable target and, thus, is much more profitable
compared to a single effort to break into a high-value, well-protected
machine.


These circumstances have initiated a vicious circle that provides the
attackers with a very large pool of vulnerable targets. Vulnerable client
hosts are compromised to ensure virtually unlimited bandwidth and
computational resources to attackers, while server-side applications are
violated to host malicious code used to infect client visitors. And so
forth. An old-fashioned attacker would have violated a single site using
all the resources available, stolen data, and sold it on the underground
market. Instead, a modern attacker adopts a “vampire” approach and
exploits client-side software vulnerabilities to take (remote) control of
millions of hosts. In the past the diffusion of malicious code such as viruses
was sustained by the sharing of infected, cracked software through floppy
or compact disks; nowadays, the Web offers unlimited, public storage
to attackers, who deploy their exploits on compromised websites.

Thus, not only has the type of vulnerabilities changed, putting virtually
every interconnected device at risk; the exploitation strategy has created
new types of threats that take advantage of classic malicious code
patterns but in a new, extensive, and tremendously effective way.

1.1 Today’s Security Threats


Every year, new threats are discovered and attackers take advantage of
them until effective countermeasures are found. Then, new threats are
discovered, and so forth. Symantec quantifies the number of new malicious
code threats at 1,656,227 as of 2008 (Turner et al., 2009), versus
624,267 one year earlier and only 20,547 in 2002. Thus, countermeasures
must advance at least at the same growth rate. In addition:

[...] the current threat landscape — such as the increasing
complexity and sophistication of attacks, the evolution
of attackers and attack patterns, and malicious activities
being pushed to emerging countries — show not
just the benefits of, but also the need for increased cooperation
among security companies, governments, academics,
and other organizations and individuals to combat these
changes (Turner et al., 2009).

Today’s underground economy runs a very proficient market: everyone
can buy credit card information for as low as $0.06–$30, full identities
for just $0.70–$60, or rent a scam hosting solution for $3–$40 per
week plus $2–$20 for the design (Turner et al., 2009).


Figure 1.1: Illustration taken from (Holz, 2005) and ©2005 IEEE.
Authorized license limited to University of California.

The main underlying technology actually employs a classic type of
software called bot (jargon for robot), which is not malicious per se, but
is used to remotely control a network of compromised hosts, called a botnet
(Holz, 2005). Remote commands can be of any type and typically
include launching an attack, starting a phishing or spam campaign, or
even updating to the latest version of the bot software by downloading
the binary code from a host controlled by the attackers (usually called
the bot master) (Stone-Gross et al., 2009). The exchange good has now
become the botnet infrastructure itself rather than the data that can be
stolen or the spam that can be sent; these are mere outputs of today’s
most popular service offered for rent by the underground economy.

1.1.1 The Role of Intrusion Detection


The aforementioned, dramatic big picture may lead one to think that malicious
software will eventually proliferate to every host of the Internet
and that no effective remediation exists. However, a more careful analysis


reveals that, despite the complexity of this scenario, the problems that
must be solved by a security infrastructure can be decomposed into
relatively simple tasks that, surprisingly, may already have a solution. Let
us look at an example.

Example 1.1.1 This is how a sample exploitation can be structured:

injection — a malicious request is sent to the vulnerable web application
with the goal of corrupting all the responses sent to legitimate clients
from that moment on. For instance, more than one release of the popular
WordPress blog application is vulnerable to injection attacks1
that allow an attacker to permanently include arbitrary content in
the pages. Typically, such arbitrary content is malicious code (e.g.,
JavaScript, VBScript, ActionScript, ActiveX) that, every time a legitimate
user requests the infected page, executes on the client host.

infection — assuming that the compromised site is frequently accessed —
this might be the realistic case of the WordPress-powered ZDNet news
blog2 — a significant number of clients visit it. Due to the high popularity
of vulnerable browsers and plug-ins, the client may run Internet
Explorer — the most popular — or an outdated release of Firefox
on Windows. This creates the perfect circumstances for the malicious
page to successfully execute. In the best case, it may download a virus
or a generic malware from a website under the control of the attacker,
thus infecting the machine. In the worst case, this code may also exploit
specific browser vulnerabilities and execute in privileged mode.

control & use — the malicious code just downloaded installs and hides itself
on the victim’s computer, which has just joined a botnet. As part of it,
the client host can be remotely controlled by the attackers, who can, for
instance, rent it out, or use its bandwidth and computational power along
with other computers to run a distributed Denial of Service (DoS)
attack. Also, the host can be used to automatically perform the same
attacks described above against other vulnerable web applications. And
so forth.
This simple yet quite realistic example shows the various kinds of
malicious activity that are generated during a typical drive-by exploitation.
It also shows the requirements and assumptions that must hold to
guarantee its success. More precisely, we can recognize:

1 http://secunia.com/advisories/23595
2 http://wordpress.org/showcase/zdnet/


network activity — clearly, the whole interaction relies on a network
connection over the Internet: the HyperText Transfer Protocol (HTTP)
connections used, for instance, to download the malicious code
as well as to launch the injection attack used to compromise the
web server.

host activity — similarly to every other type of attack against an application,
when the client-side code executes, the browser (or one
of its extension plug-ins) is forced to behave improperly. If the
malicious code executes till completion, the attack succeeds and
the host is infected. This happens only if the platform, operating
system, and browser all match the requirements assumed by the
exploit designer. For instance, the attack may succeed on Windows
and not on Mac OS X, although the vulnerable version of,
say, Firefox is the same on both hosts.

HTTP traffic — in order to exploit the vulnerability of the web application,
the attacking client must generate malicious HTTP
requests. For instance, in the case of a Structured Query Language
(SQL) injection — the second most common vulnerability
in a web application — instead of a regular

GET /index.php?username=myuser

the web server might be forced to process a

GET /index.php?username=' OR 'x'='x'--&content=<script src="evil.com/code.js">

that causes the index.php page to behave improperly.
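To make the anomaly-detection idea concrete, the following minimal sketch (not the system described in this thesis) learns the character distribution of the username parameter from normal requests and flags the injected value shown above. All function names and the slack threshold are illustrative assumptions.

```python
from collections import Counter

def char_freqs(value):
    """Relative character frequencies of a parameter value."""
    counts = Counter(value)
    total = len(value) or 1
    return {c: n / total for c, n in counts.items()}

def train(values):
    """Learn, per character, the min/max relative frequency seen in training."""
    profile = {}
    for v in values:
        for c, f in char_freqs(v).items():
            lo, hi = profile.get(c, (f, f))
            profile[c] = (min(lo, f), max(hi, f))
    return profile

def is_anomalous(value, profile, slack=0.1):
    """Flag values containing never-seen characters (such as quotes in an
    SQL injection) or characters with out-of-range frequencies."""
    for c, f in char_freqs(value).items():
        if c not in profile:
            return True
        lo, hi = profile[c]
        if not (lo - slack <= f <= hi + slack):
            return True
    return False
```

Trained on benign usernames such as "myuser", this toy model flags "' OR 'x'='x'--" simply because quote and equal-sign characters never occur in the training data, which is the intuition behind the character-distribution models discussed later.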

It is now clear that protection mechanisms that analyze the network
traffic, the activity of the client’s operating system, the web server’s
HTTP logs, or any combination of the three, have a chance of recognizing
that something malicious is happening in the network. For instance,
if the Internet Service Provider (ISP) network adopts Snort, a lightweight
Intrusion Detection System (IDS) that analyzes the network traffic for
known attack patterns, it could block all the packets marked as suspicious.
This would prevent, for instance, the SQL injection from reaching the web
application. A similar protection level can be achieved by using other
tools such as ModSecurity (Ristic, 2008). One problem with these
classic, widely adopted solutions arises when a zero-day attack


is used. A zero-day attack or threat exploits a vulnerability that is unknown
to the public, undisclosed to the software vendor, or for which a fix is not
available; thus, protection mechanisms that merely blacklist known malicious
activity immediately become ineffective. In a similar vein, if the
client is protected by an anti-virus, the infection phase can be blocked.
However, this countermeasure is once again successful only if the anti-virus
is capable of recognizing the malicious code, which assumes that
the code is known to be malicious.

Ideally, an effective and comprehensive countermeasure can be achieved
if all the protection tools involved (e.g., client-side, server-side, network-side)
collaborate. For instance, if a website is publicly reported
to be malicious, a client-side protection tool should block all the
content downloaded from that particular website. This is only a simple
example.
Thus, countermeasures against today’s threats already exist but are
subject to at least two drawbacks:

• they offer protection only against known threats. To be effective
we must assume that all the hostile traffic can be enumerated,
which is clearly an impossible task.

Why is “Enumerating Badness” a dumb idea? It’s
a dumb idea because sometime around 1992 the amount
of Badness in the Internet began to vastly outweigh
the amount of Goodness. For every harmless, legitimate
application, there are dozens or hundreds of
pieces of malware, worm tests, exploits, or viral code.
Examine a typical antivirus package and you’ll see it
knows about 75,000+ viruses that might infect your
machine. Compare that to the legitimate 30 or so
apps that I’ve installed on my machine, and you can
see it’s rather dumb to try to track 75,000 pieces of
Badness when even a simpleton could track 30 pieces
of Goodness (Ranum, 2005).

• they lack cooperation, which is crucial to detect global and slow
attacks.

This said, we conclude that classic approaches such as dynamic and
static code analysis and IDSs already offer good protection, but industry
and research should move toward methods that require little or no


knowledge. In this work, we indeed focus on the so-called anomaly-based
approaches, i.e., those that attempt to recognize threats by
detecting any variation from a system’s normal operation, rather than
looking for signs of known-to-be-malicious activity.

1.2 Original Contributions


Our main research area is Intrusion Detection (ID). In particular, we focus
on anomaly-based approaches to detect malicious activities. Since
today’s threats are complex, a single point of inspection is not effective;
a more comprehensive monitoring system is desirable to protect
the network, the applications running on a certain host, and the
web applications (which are particularly exposed due to the immense popularity
of the Web). Our contributions focus on the mitigation of both
host-based and web-based attacks, along with two techniques to correlate
alerts from hybrid sensors.

1.2.1 Host-based Anomaly Detection


Typical malicious processes can be detected by modeling the characteristics
(e.g., type of arguments, sequences) of the system calls executed
by the kernel, and by flagging unexpected deviations as attacks. Regarding
this type of approach, our contributions focus on hybrid models
to accurately characterize the behavior of a binary application. In particular:

• we enhanced, re-engineered, and evaluated a novel tool for modeling
the normal activity of the Linux 2.6 kernel. Compared to
other existing solutions, our system shows better detection capabilities
and good contextualization of the alerts reported.

• We engineered and evaluated an IDS to demonstrate that the
combined use of (1) deterministic models to characterize a process’
control flow and (2) stochastic models to capture normal
features of the data flow leads to better detection accuracy. Compared
to the existing deterministic and stochastic approaches taken separately,
our system shows better accuracy, with almost zero false
positives.

• We adapted our techniques to forensics investigation. By running
experiments on real-world data and attacks, we show that


our system is able to detect hidden tamper evidence even though
sophisticated anti-forensics tools (e.g., userland process execution)
have been used.
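As a toy illustration of the hybrid idea behind these contributions (deterministic models for the control flow, stochastic models for the arguments), the following sketch combines a set of observed syscall bigrams with a mean/standard-deviation model of argument lengths. It is an illustrative assumption on our part, not the actual system evaluated in this thesis.

```python
import statistics

class SyscallModel:
    """Toy hybrid model: deterministic syscall bigrams approximate the
    control flow; a per-syscall length model covers string arguments."""

    def __init__(self):
        self.bigrams = set()   # observed (syscall, next_syscall) pairs
        self.arg_lens = {}     # syscall name -> list of argument lengths

    def train(self, trace):
        """trace: list of (syscall_name, string_argument) tuples."""
        names = [name for name, _ in trace]
        self.bigrams.update(zip(names, names[1:]))
        for name, arg in trace:
            self.arg_lens.setdefault(name, []).append(len(arg))

    def check(self, trace, k=3.0):
        """Report unseen bigrams (control-flow deviations) and argument
        lengths more than k standard deviations from the training mean."""
        alerts = []
        names = [name for name, _ in trace]
        for pair in zip(names, names[1:]):
            if pair not in self.bigrams:
                alerts.append(("control-flow", pair))
        for name, arg in trace:
            lens = self.arg_lens.get(name)
            if not lens:
                alerts.append(("unknown-syscall", name))
                continue
            mean = statistics.mean(lens)
            sd = statistics.pstdev(lens) or 1.0
            if abs(len(arg) - mean) > k * sd:
                alerts.append(("argument", name))
        return alerts
```

A trace that replays the training behavior produces no alerts, while an unexpected execve after open trips both the bigram check and the unknown-syscall check, mirroring the control-flow/data-flow split described above.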

1.2.2 Web-based Anomaly Detection


Attempts to compromise a web application can be detected by modeling
the characteristics (e.g., parameter values, character distributions,
session content) of the HTTP messages exchanged between servers and
clients during normal operation. This approach can detect virtually any
attempt to tamper with HTTP messages, which is assumed to be
evidence of attack. In this research field, our contributions focus on
training data scarcity issues, along with the problems that arise when an
application changes its legitimate behavior. In particular:

• we contributed to the development of a system that learns the
legitimate behavior of a web application. Such behavior is defined by
means of features extracted from (1) HTTP requests, (2) HTTP responses,
and (3) SQL queries to the underlying database, if any. Each
feature is extracted and learned using different models, some of
which are improvements over well-known approaches and some
of which are original. The main contribution of this work is the
combination of database query models with HTTP-based models.
The resulting system has been validated through preliminary
experiments that showed very high accuracy.

• we developed a technique to automatically detect legitimate changes in
web applications, with the goal of suppressing the large amount of
false detections due to code upgrades, frequent in today’s web applications.
We ran experiments on real-world data to show that
our simple but very effective approach accurately predicts changes
in web applications and can distinguish good vs. malicious changes
(i.e., attacks).

• We designed and evaluated a machine learning technique to aggregate
IDS models with the goal of ensuring good detection
accuracy even in case of scarce training data. Our approach
relies on clustering techniques and nearest-neighbor search
to look up well-trained models used to replace under-trained ones,
which are prone to overfitting and thus false detections. Experiments
on real-world data have shown that almost every false alert


due to overfitting is avoided with as few as 32–64 training samples
per model.

Although these techniques have been developed on top of a web-based
anomaly detector, they are sufficiently generic to be easily adapted
to other systems using learning approaches.
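A minimal sketch of the third idea, replacing an under-trained model with the nearest well-trained one, might look as follows. The feature vectors, the 64-sample threshold, and the plain Euclidean nearest-neighbor search are illustrative assumptions, not the actual algorithm evaluated in this thesis.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def substitute_model(features, models, min_samples=64):
    """models: list of dicts with 'features' (a profile vector summarizing
    what the model was trained on), 'samples' (training set size), and
    'model' (the trained detector itself).

    Given the profile of an under-trained model, return the well-trained
    model whose profile is nearest, to be used as a substitute."""
    well_trained = [m for m in models if m["samples"] >= min_samples]
    if not well_trained:
        return None
    return min(well_trained, key=lambda m: euclidean(m["features"], features))
```

The design choice this illustrates is the one stated above: a model trained on too few samples is prone to overfitting, so it is safer to borrow the decision surface of a similar, well-trained model than to trust the under-trained one.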

1.2.3 Alert Correlation


IDS alerts are usually post-processed to generate compact reports and
eliminate redundant, meaningless, or false detections. In this research
field, our contributions focus on unsupervised techniques applied to aggregate
and correlate alert events, with the goal of reducing the effort of
the security officer. In particular:

• We developed and tested an approach that accounts for the common
measurement errors (e.g., delays and uncertainties) that occur
in the alert generation process. Our approach exploits fuzzy
metrics both to model errors and to construct an alert aggregation
criterion based on distance in time. This technique has been
shown to be more robust compared to classic time-distance-based
aggregation metrics.

• We designed and tested a prototype that models the alert generation
process as a stochastic process. This setting allowed us to
construct a simple, non-parametric hypothesis test that can detect
whether two alert streams are correlated or not. Besides its
simplicity, the advantage of our approach is that it does not require
any parameters.
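The flavor of such a non-parametric test can be sketched as a permutation test that checks whether two alert streams fall close in time more often than chance would predict. The window size, trial count, and significance level shown here are illustrative assumptions, not the actual procedure evaluated in this thesis.

```python
import random

def close_pairs(stream_a, stream_b, window):
    """Number of alerts in stream_a with at least one alert of stream_b
    within `window` seconds (timestamps as floats)."""
    return sum(1 for a in stream_a
               if any(abs(a - b) <= window for b in stream_b))

def correlated(stream_a, stream_b, window=1.0, trials=1000, alpha=0.05):
    """Permutation test: redraw stream_b's timestamps uniformly over the
    observation period and count how often the shuffled streams show at
    least as many close pairs as the real ones. A small empirical
    p-value suggests the streams are genuinely correlated."""
    observed = close_pairs(stream_a, stream_b, window)
    lo, hi = min(stream_a + stream_b), max(stream_a + stream_b)
    exceed = 0
    for _ in range(trials):
        fake_b = [random.uniform(lo, hi) for _ in stream_b]
        if close_pairs(stream_a, fake_b, window) >= observed:
            exceed += 1
    return exceed / trials < alpha
```

Two streams where every alert of the second trails an alert of the first by a fraction of a second come out as correlated, since random timestamp placements essentially never reproduce that many close pairs.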

The aforementioned results have been published in the proceedings
of international conferences and in international journals.

2 A Chapter of Examples

2.1 A Table

Feature              Misuse-based    Anomaly-based
Modeled activity:    Malicious       Normal
Detection method:    Matching        Deviation
Threats detected:    Known           Any
False negatives:     High            Low
False positives:     Low             High
Maintenance cost:    High            Low
Attack desc.:        Accurate        Absent
System design:       Easy            Difficult

Table 2.1: Duality between misuse- and anomaly-based intrusion detection
techniques. Note that an anomaly-based IDS can detect “Any”
threat, under the assumption that an attack always generates a deviation
in the modeled activity.

2.2 Code


/* ... */
cd['<'] = {0.10, 0.11};
cd['a'] = {0.01, 0.20};
cd['b'] = {0.13, 0.23};
/* ... */

b = decode(arg3_value);

if (!(cd['c'][0] < count('c', b) < cd['c'][1]) ||
    !(cd['<'][0] < count('<', b) < cd['<'][1]) ||
    ... || ...)
        fire_alert("Anomalous content detected!");
/* ... */

2.3 A Sideways Table

Approach | Time | Header | Payload | Stochastic | Deterministic | Clustering

(Mahoney and Chan, 2001) • •


(Kruegel et al., 2002) • • •
(Sekar et al., 2002) • • •
(Ramadas, 2003) • •
(Mahoney and Chan, 2003) • • •
(Zanero and Savaresi, 2004) • • •
(Wang and Stolfo, 2004) • •
(Zanero, 2005) • • •
(Bolzoni et al., 2006) • • •
(Wang et al., 2006) • •

Table 2.2: Taxonomy of the selected state-of-the-art approaches for network-based anomaly detection.


2.4 A Figure

[Histogram omitted: y axis “Number of occurrences” (0–700), x axis “Distance in syscalls” (25–70).]

Figure 2.1: telnetd: distribution of the number of other system calls
between two execve system calls (i.e., distance between two consecutive
execve).

2.5 Bulleted List


• O =“Intrusion”, ¬O =“Non-intrusion”;
• A =“Alert reported”, ¬A =“No alert reported”.

2.6 Numbered List


1. O =“Intrusion”, ¬O =“Non-intrusion”;
2. A =“Alert reported”, ¬A =“No alert reported”.
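With these events, Bayes' theorem gives the probability that a reported alert actually corresponds to an intrusion, P(O|A). A minimal sketch with illustrative numbers (the rates are assumptions, not measurements from this work) shows how strongly the base rate P(O) drives that probability:

```python
def posterior_intrusion(p_o, p_a_given_o, p_a_given_not_o):
    """Bayes' theorem:
    P(O|A) = P(A|O)P(O) / [P(A|O)P(O) + P(A|~O)P(~O)]."""
    p_not_o = 1.0 - p_o
    p_a = p_a_given_o * p_o + p_a_given_not_o * p_not_o
    return p_a_given_o * p_o / p_a

# illustrative: a detector with 99% detection rate and 1% false-positive
# rate yields P(O|A) ≈ 0.0098 when intrusions have base rate 1e-4
posterior = posterior_intrusion(1e-4, 0.99, 0.01)
```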

2.7 A Description
Time refers to the use of timestamp information, extracted from net-
work packets, to model normal packets. For example, normal
packets may be modeled by their minimum and maximum inter-
arrival time.


Header means that the Transmission Control Protocol (TCP) header is
decoded and its fields are modeled. For example, normal packets
may be modeled by the observed port range.

Payload refers to the use of the payload, either at Internet Protocol (IP)
or TCP layer. For example, normal packets may be modeled by
the most frequent byte in the observed payloads.

Stochastic means that stochastic techniques are exploited to create mod-


els. For example, the model of normal packets may be constructed
by estimating the sample mean and variance of certain features
(e.g., port number, content length).

Deterministic means that certain features are modeled following a de-
terministic approach. For example, normal packets may be only
those containing a specified set of values for the Time To Live
(TTL) field.

Clustering refers to the use of clustering (and subsequent classifica-
tion) techniques. For instance, payload byte vectors may be com-
pressed using a Self-Organizing Map (SOM), where classes of dif-
ferent packets stimulate neighboring nodes.
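As an illustration of the stochastic case described above, the following sketch estimates the sample mean and standard deviation of a single feature (content length) from attack-free traffic and flags values far from the mean. The feature choice, the training values, and the 3-sigma threshold are assumptions made for illustration.

```python
import statistics

def train(lengths):
    """Estimate sample mean and standard deviation of a packet
    feature (here, content length) from attack-free traffic."""
    return statistics.mean(lengths), statistics.stdev(lengths)

def deviates(length, mean, std, k=3.0):
    """Flag values more than k standard deviations from the mean."""
    return abs(length - mean) > k * std

mean, std = train([200, 220, 210, 205, 215, 198, 225])
```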

2.8 An Equation
d_a(i, j) :=
\begin{cases}
K_a + \alpha_a\,\delta_a(i, j) & \text{if the elements are different} \\
0 & \text{otherwise}
\end{cases}
\tag{2.1}
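A direct transcription of Equation (2.1), with the constant K_a, the weight α_a, and the element-specific penalty δ_a left as placeholder parameters, since their definitions are not given in this excerpt:

```python
def element_distance(i, j, K=1.0, alpha=0.5, delta=lambda i, j: 1.0):
    """d_a(i, j) of Equation (2.1): a fixed cost K plus a weighted
    penalty alpha * delta(i, j) when the two elements differ, and 0
    otherwise. K, alpha, and delta are illustrative placeholders."""
    if i == j:
        return 0.0
    return K + alpha * delta(i, j)
```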

2.9 A Theorem, Proposition & Proof


Theorem 2.9.1 a^2 + b^2 = c^2

Proposition 2.9.2 3 + 3 = 6

Proof 2.9.1 For any finite set {p1 , p2 , ..., pn } of primes, consider m =
p1 p2 ...pn + 1. If m is prime it is not in the set since m > pi for all i.
If m is not prime it has a prime divisor p. If p is one of the pi then p is a
divisor of p1 p2 ...pn and hence is a divisor of (m − p1 p2 ...pn ) = 1, which
is impossible; so p is not in the set. Hence a finite set {p1 , p2 , ..., pn } cannot
be the collection of all primes.


2.10 Definition
Definition 2.10.1 (Anomaly-based IDS) An anomaly-based IDS is a
type of IDS that generates alerts A by relying on normal activity profiles.

2.11 A Remark
Remark 1 The network stack implementation may vary from system to
system (e.g., Windows and Cisco platforms have different implementations
of TCP).

2.12 An Example
Example 2.12.1 (Misuse vs. Anomaly) A misuse-based system M and an
anomaly-based system A process the same log containing a full dump of the
system calls invoked by the kernel of an audited machine. Log entries are in
the form:

<function_name>(<arg1_value>, <arg2_value>, ...)

2.13 Note
Note 2.13.1 (Inspection layer) Although the network stack implementa-
tion may vary from system to system (e.g., Windows and Cisco platforms
have different implementations of TCP), it is important to underline that
the notions of IP, TCP, and HTTP packets are well defined in a system-
agnostic way, while the notion of operating system activity is rather vague
and by no means standardized.

Bibliography

Jesse Alpert and Nissan Hajaj. We knew the web was big...
Available online at http://googleblog.blogspot.com/2008/07/
we-knew-web-was-big.html, Jul 2008.

Damiano Bolzoni, Sandro Etalle, Pieter H. Hartel, and Emmanuele


Zambon. Poseidon: a 2-tier anomaly-based network intrusion de-
tection system. In IWIA, pages 144–156. IEEE Computer Society,
2006. ISBN 0-7695-2564-4.

Privacy Rights Clearinghouse. A chronology of data breaches. Techni-


cal report, Privacy Rights Clearinghouse, July 2009.

Facebook. Statistics. Available online at http://www.facebook.com/


press/info.php?statistics, 2009.

Sarah Granger. Social engineering fundamentals, part i: Hacker tactics.


Available online at http://www.securityfocus.com/infocus/1527, Dec
2001.

Thorsten Holz. A short visit to the bot zoo. IEEE Security & Privacy,
3(3):76–79, 2005.

Christopher Kruegel, Thomas Toth, and Engin Kirda. Service-Specific
Anomaly Detection for Network Intrusion Detection. In Proceedings
of the Symposium on Applied Computing (SAC 2002), Spain, March
2002.

Matthew V. Mahoney and Philip K. Chan. Learning rules for anomaly


detection of hostile network traffic. In Proceedings of the 3rd IEEE
International Conference on Data Mining, page 601, 2003. ISBN 0-
7695-1978-4.


M.V. Mahoney and P.K. Chan. Detecting novel attacks by identifying


anomalous network packet headers. Technical Report CS-2001-2,
Florida Institute of Technology, 2001.
Miniwatts Marketing Grp. World Internet Usage Statistics. http://
www.internetworldstats.com/stats.htm, January 2009.

Kevin Mitnick. The art of deception. Wiley, 2002.


Ofer Shezaf, Jeremiah Grossman, and Robert Auger. Web Hacking
Incidents Database. http://www.xiom.com/whid-about, January 2009.
M. Ramadas. Detecting anomalous network traffic with self-organizing
maps. In Recent Advances in Intrusion Detection 6th International Sym-
posium, RAID 2003, Pittsburgh, PA, USA, September 8-10, 2003, Pro-
ceedings, Mar 2003.
Marcus J. Ranum. The six dumbest ideas in computer security. http://
www.ranum.com/security/computer_security/editorials/dumb/, Sept.
2005.
Ivan Ristic. mod_security: Open Source Web Application Firewall.
http://www.modsecurity.org/, June 2008.
Secunia. Secunia’s 2008 annual report. Available online at http:
//secunia.com/gfx/Secunia2008Report.pdf, 2008.
R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang, and
S. Zhou. Specification-based anomaly detection: a new approach for
detecting network intrusions. In CCS ’02: Proceedings of the 9th ACM
Conference on Computer and communications security, pages 265–274,
New York, NY, USA, 2002. ACM Press. ISBN 1-58113-612-9.
Adam Singer. Social media, web 2.0 and internet stats.
Available online at http://thefuturebuzz.com/2009/01/12/
social-media-web-20-internet-numbers-stats/, Jan 2009.

Brett Stone-Gross, Marco Cova, Lorenzo Cavallaro, Bob Gilbert, Mar-
tin Szydlowski, Richard Kemmerer, Christopher Kruegel, and Gio-
vanni Vigna. Your botnet is my botnet: Analysis of a botnet takeover.
In CCS 2009, Chicago, November 2009. ACM.
The SANS Institute. The twenty most critical internet security vulner-
abilities. http://www.sans.org/top20/, Nov. 2005.


Dean Turner, Marc Fossi, Eric Johnson, Trevor Mark, Joseph Black-
bird, Stephen Entwise, Mo King Low, David McKinney, and Can-
did Wueest. Symantec Global Internet Security Threat Report –
Trends for 2008. Technical Report XIV, Symantec Corporation,
April 2009.
Ke Wang and Salvatore J. Stolfo. Anomalous payload-based network
intrusion detection. In Proceedings of the International Symposium on
Recent Advances in Intrusion Detection (RAID 2004). Springer-Verlag,
September 2004.
Ke Wang, Janak J. Parekh, and Salvatore J. Stolfo. Anagram: A con-
tent anomaly detector resistant to mimicry attack. In Proceedings of
the International Symposium on Recent Advances in Intrusion Detection
(RAID 2006), Hamburg, Germany, September 2006. Springer-Verlag.
Robert H’obbes’ Zakon. Hobbes’ internet timeline v8.2. Available on-
line at http://www.zakon.org/robert/internet/timeline/, Nov 2006.
Stefano Zanero. Analyzing tcp traffic patterns using self organiz-
ing maps. In Fabio Roli and Sergio Vitulano, editors, Proceedings
13th International Conference on Image Analysis and Processing - ICIAP
2005, volume 3617 of Lecture Notes in Computer Science, pages 83–90,
Cagliari, Italy, Sept. 2005. Springer. ISBN 3-540-28869-4.

Stefano Zanero and Sergio M. Savaresi. Unsupervised learning tech-


niques for an intrusion detection system. In Proceedings of the 2004
ACM Symposium on Applied Computing, pages 412–419. ACM Press,
2004. ISBN 1-58113-812-1.

Index

0-day, 6

HTTP, 9

IP, 15

malware, iv

TCP, 15
TTL, 15

URL, 1

