Masaryk University
Faculty of Informatics
Master's Thesis
Jan Vykopal
Brno, 2008
Declaration
I hereby declare that I am the sole author of this thesis. All sources and literature used in
this thesis are cited and listed properly.
Acknowledgement
First of all, I would like to thank my advisor, Jiří Novotný, for his guidance and helpful
advices. Many thanks go also to the other colleagues from the Institute of Computer Science.
Last but not least, I am grateful to my family and friends for their love and support. Thank
you all.
Abstract
In this thesis, methods for security analysis at the IP layer are presented and evaluated. The
evaluation is mainly focused on deployment in real high speed networks. Next, a solution
comprising selected method is proposed. The goal of this solution is to simplify work of a
network administrator and speed up the security incident response. Finally, the proposed
solution is tested in the campus network of the Masaryk University.
Keywords
security analysis, intrusion detection, signature, anomaly, MINDS, entropy, host profiling,
CAMNEP, visualization, NetFlow probe, honeypot, NetFlow collector, MyNetScope
Contents
1 Introduction
1.1 Thesis Outline
2 Methods for Security Analysis
2.1 Intrusion Detection
2.2 Intrusion Prevention
2.3 Flow-Based Traffic Acquisition
2.3.1 NetFlow and IPFIX
2.3.2 Other Flow-based Technologies
2.4 Signature-based Detection
2.5 Stateful Protocol Analysis
2.6 Anomaly-based Detection
2.6.1 Holt-Winters Method
2.6.2 Minnesota Intrusion Detection System (MINDS)
2.6.3 The Work of Xu et al.
2.6.4 Origin Destination Flow Analysis
2.6.5 Cooperative Adaptive Mechanism for Network Protection (CAMNEP)
2.7 Summary
3 Visualization
3.1 Charts
3.2 Mapping in Space
3.3 Graphs
3.4 Summary
4 Design of the IDS
4.1 Requirements on the IDS
4.1.1 Accuracy
4.1.2 Detection of Novel Threats
4.1.3 Operating in High-speed Networks
4.1.4 Early Detection
4.1.5 Long-term Data Storage
4.1.6 IPv6 Support
4.1.7 Scalability
4.1.8 Easy Maintaining
4.1.9 Transparency
4.1.10 Security Robustness
4.1.11 Anomaly Detection in Encrypted Traffic
4.1.12 User-friendly Interface and Well-arranged Visualization
4.2 Solution
4.2.1 Network Probes
4.2.2 Collectors
4.2.3 MyNetScope and Data Sources
4.3 Summary
5 Deployment of the Proposed IDS
5.1 Deployment Status
5.1.1 Network Probes
5.1.2 Collectors
5.1.3 MyNetScope and Data Sources
5.2 Use Case
5.3 Summary
6 Conclusion
Bibliography
A An example of Holt-Winters prediction
B The CD Contents
Chapter 1
Introduction
Computer networks are all around us. For example, they are essential for effective commu-
nication, sharing knowledge, research and development, modern education, entertainment
and, of course, e-commerce.
The TCP/IP protocol suite is widespread in today’s high speed computer networks.
Perhaps surprisingly, the core of this suite, the Internet Protocol, was published as Request for Comments 791 [41] as early as 1981. Compared to today, the Internet of the 1980s was a closed network, so there was little need to consider security in its design. But now we are exposed to many security threats: denial of service (DoS), scanning, password cracking, spoofing, eavesdropping, spamming, phishing, worms and others.
As a result, many companies and organizations define their network security policy. It is a set of rules that users should follow to avoid or at least mitigate security threats. Technically, the policy is often implemented by firewalls, intrusion detection and prevention systems (IDS, IPS) or a virtual private network (VPN). The firewall represents a basic level of defence: it inspects network traffic passing through it and denies or permits the passage based on a set of rules, a part of the network security policy. Intrusion detection and/or prevention should fulfil two basic requirements: to identify threats to, and protect, host computers in the administered network from security threats coming from the Internet or other networks, and vice versa. We point out that both requirements are important. The network is exposed to attacks from outside as well as from inside. In addition, the second requirement is important due to the presence of botnets that exploit "zombie computers" in our network and use them for other malicious activities. In short, IDS and IPS are the "checkpoints" that supervise firewalls or other components dedicated to the network defence.
In this thesis, we focus on the security analysis of large networks such as the campus network of Masaryk University, which has tens of thousands of users a day and many entry points. Nowadays, firewalls are part and parcel of the defence in this network. We decided to deploy an intrusion detection system to strengthen network security, although it is intentionally an open, unrestricted academic network. Our goal is to reveal the network behaviour and examine whether it complies with the defined security policy, mainly implemented by firewalls. Consequently, the security analysis should be easier and supported by IDS outputs.
1.1 Thesis Outline
This thesis is divided into six chapters. Chapter 1 is this introduction. Selected modern meth-
ods for security analysis mainly at the IP layer are described and evaluated in Chapter 2.
Network traffic visualization as an important part of the security analysis is discussed in
Chapter 3. In Chapter 4, requirements on the intrusion detection system and a design that
meets these requirements are presented. Chapter 5 summarizes our experience in system
deployment in the Masaryk University network. Chapter 6 concludes the thesis.
Chapter 2
Methods for Security Analysis
This chapter provides an introduction to intrusion detection and modern methods for network security analysis. We mainly focus on methods working at the IP layer. First of all, we explain basic terms related to intrusion detection and traffic acquisition. Then we describe and evaluate each method, especially according to the following criteria:
1. coverage,
2. effectiveness,
3. performance,
4. input data.
The first criterion is the ability to detect security threats; the coverage is complete if the method detects both known and unknown threats. The second criterion stands for detection accuracy, i.e., the rate of false positives produced by the method. The speed of processing network traffic by the method, the third criterion, is crucial for deployment in high-speed networks. The fourth criterion determines whether packet capture and/or (sampled) flow-based data are suitable as the input of the evaluated method; this last criterion is more and more important in today's networks.
A basic classification of methods is taken from [12].
2.1 Intrusion Detection
We can divide intrusion detection systems (IDS) into two basic classes according to their position in the network: host-based intrusion detection systems and network-based intrusion detection systems [1]. Note that there are other ways to classify IDSs.
Host-based Intrusion Detection This type of detection is performed on a host computer
in a computer network. Host-based intrusion detection system (HIDS) usually monitors log
files (e. g. firewall logs, web server logs and system logs) and the integrity of system files
(e. g. the kernel integrity or opened ports).
Network-based Intrusion Detection On the contrary, the network-based approach ob-
serves the whole network or its part. All inbound or outbound network traffic is inspected
for suspicious patterns. The patterns can be represented as a signature, a string of characters that describes a certain attack. A different approach is anomaly-based detection: first, a model of normal network behaviour is created; then the difference from the model is evaluated, and if it is greater than a predefined value (threshold), it may indicate an attack. Other network-based intrusion detection systems (NIDS) use stateful protocol analysis to detect suspicious, unexpected or invalid sequences of packets in terms of a specific protocol. These methods are discussed in detail in the relevant sections of this chapter. NIDS are passive systems: they are "invisible" to other hosts and mainly to attackers.
Two terms are frequently mentioned in connection with IDSs: false positive and false negative. The former denotes a false IDS alert: the system classifies benign traffic as malicious. The latter, on the contrary, refers to malicious traffic that was not recognized by the IDS. Naturally, there is a tendency to minimize the numbers of both false positives and false negatives. For example, if the IDS produces a high false positive rate, it burdens the administrator with subsequent manual analysis of these alerts. In addition, there are techniques, such as squealing, which exploit the vulnerability of IDSs to high false positive rates. [43]
2.2 Intrusion Prevention
In comparison to an IDS, an intrusion prevention system (IPS) is a reactive system in which the IDS is tightly coupled with a firewall (and should be a part of the communication link). The main task of an IPS is to mitigate (stop) the detected attack. IPSs can be divided into three classes: host-based, network-based and distributed IPS [18].
2.3 Flow-Based Traffic Acquisition
The classic approach of many IDSs or IPSs to data collection is to capture all network packets that pass through the system, most frequently in the pcap format1. In contrast, many routers and monitoring probes perform flow-based data collection, typically in the NetFlow format.
1. A binary format, native for tools built on the libpcap library, a system-independent interface for user-level packet capture. The pcap format can be read by tools such as tcpdump, Wireshark, tcpreplay and many others.
2.3.1 NetFlow and IPFIX
NetFlow A flow is commonly defined as a set of packets sharing properties such as source and destination IP addresses, application ports, input and output interfaces, etc. [34] Thus, the flow-based data collection provides an aggregated view of network traffic.
IPFIX The continuation of the IETF effort led to the unification of protocols and applications that require flow-based IP traffic measurements. RFC 3917 defines requirements for exporting traffic flow information out of routers, middleboxes (e.g. firewalls, proxies, load balancers, NATs), or traffic measurement probes for further processing by applications located on other devices [33]. Consequently, Cisco's NetFlow version 9 was chosen as the basis of the IP Flow Information Export (IPFIX) protocol. [35] There are no fixed properties (5-tuple) such as in NetFlow version 5; the user can flexibly define the properties used for flow distinction.
RFC 5101, published in January 2008, specifies the IPFIX protocol that serves for transmitting IP traffic flow information over the network [37]. Next, RFC 5102 defines an information model for the IPFIX protocol. It is used by the IPFIX protocol for encoding measured traffic information and information related to the whole process [38]. Thanks to the IPFIX flexibility, RFC 5103 introduces the term Biflow, a bidirectional flow, and describes an efficient method for exporting Biflow information using the IPFIX protocol [39]. The bidirectional view of network traffic might be useful for security analysis.
The development of IPFIX is not finished. The IPFIX working group is still working on a few Internet drafts to be published as RFCs. The most recent RFC was issued in April 2008; it provides guidelines for the implementation and use of the IPFIX protocol. [40]
Packet sampling is performed (especially by routers) to save the NetFlow exporter resources. We distinguish two basic types of sampling:
• deterministic – exactly every n-th packet is sampled,
• random – each packet is sampled with probability 1/n.
The constant n is called the sampling rate. For example, if it is set to 4 and the device receives 100 packets, 25 packets are analyzed and 75 packets are dropped from the analysis. Only common packet header fields are recorded, not the whole payload. Flow sampling is another type of aggregation.
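To make the two sampling modes concrete, the following minimal Python sketch (not part of the original thesis tooling) contrasts deterministic 1-in-n sampling with random sampling at probability 1/n; the packet list is just a stand-in for a real capture.

```python
import random

def deterministic_sample(packets, n):
    """Keep exactly every n-th packet (1-in-n deterministic sampling)."""
    return [p for i, p in enumerate(packets, start=1) if i % n == 0]

def random_sample(packets, n):
    """Keep each packet independently with probability 1/n."""
    return [p for p in packets if random.random() < 1.0 / n]

# With a sampling rate n = 4, roughly 25 of 100 packets are kept.
packets = list(range(100))
print(len(deterministic_sample(packets, 4)))  # exactly 25
print(len(random_sample(packets, 4)))         # about 25 on average
```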
Both the active and the inactive timeout values affect a flow creation. The active timeout
is applied to long-lasting flows. If the flow has been inactive for the inactive timeout or the
end of the flow is detected, flow statistics are exported from the probe to a collector. The
collector is a server dedicated to collection, long-term storage and analysis of flow statistics.
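The sketch below illustrates how the two timeouts drive flow expiration in a simple flow cache; the timeout values and the record layout are assumptions for illustration, not the configuration of any particular probe.

```python
ACTIVE_TIMEOUT = 300    # seconds; assumed value: export long-lasting flows periodically
INACTIVE_TIMEOUT = 30   # seconds; assumed value: export flows that stopped sending

flow_cache = {}  # key: flow 5-tuple, value: {"first": ts, "last": ts, "packets": n, "bytes": n}

def update_flow(key, ts, length):
    """Update (or create) the cached record for a flow on every sampled packet."""
    rec = flow_cache.setdefault(key, {"first": ts, "last": ts, "packets": 0, "bytes": 0})
    rec["last"] = ts
    rec["packets"] += 1
    rec["bytes"] += length

def expire_flows(now, export):
    """Export flows that hit the active or inactive timeout and drop them from the cache."""
    for key, rec in list(flow_cache.items()):
        if now - rec["first"] >= ACTIVE_TIMEOUT or now - rec["last"] >= INACTIVE_TIMEOUT:
            export(key, rec)          # send the flow record to the collector
            del flow_cache[key]
```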
2.3.2 Other Flow-based Technologies
sFlow is another flow-based technology; packet sampling is performed by the sFlow Agent when forwarding data to a central data collector [32]. sFlow is supported by Alcatel-Lucent, D-Link, Hewlett-Packard, Hitachi and NEC.
Other leaders in networking also develop their own proprietary flow-based solutions: Juniper Networks uses Jflow and Huawei Technologies its NetStream.
2.4 Signature-based Detection
Signature-based detection is one of the oldest methods for security analysis. We mention it here because it is widely used by many commercial and open-source IDSs.
Description A signature is a pattern that corresponds to a known threat. Signature-based
detection is the process of comparing signatures against observed events to identify possi-
ble incidents. It is the simplest detection method because it just compares the current unit
of activity, such as a packet or a log entry, to a list of signatures using string comparison
operations. [12] In short, the detection works with “local” information.
Evaluation This method is very effective at detecting known threats, but largely inef-
fective at detecting previously unknown threats, threats disguised by the use of evasion
techniques, and many variants of known threats. [12] For example, if the intruder uses the Unicode representation of the slash character (%c0%af) and the signature contains the slash, signature-based detection is not successful (false negative). [1]
Next, we describe an example of a signature. The following string is a simple rule for the open-source signature-based IDS Snort. [42]
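The rule originally shown at this point is not reproduced in this text; the following is only a representative Snort rule of the same kind (a hypothetical example with an arbitrary SID and home network), alerting on HTTP traffic to 192.168.1.0/24 whose payload contains the string /etc/passwd:

```
alert tcp any any -> 192.168.1.0/24 80 (msg:"WEB-MISC /etc/passwd access attempt"; content:"/etc/passwd"; nocase; sid:1000001; rev:1;)
```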
Snort was originally designed for small, lightly utilized networks. [42] The core of signature-based detection is generally expensive string matching: every packet and its payload is inspected for the searched signatures. Snort usually runs on COTS (commercial off-the-shelf) hardware and its performance is not satisfactory for this task in multi-gigabit networks. This gap is filled by hardware accelerators, such as Traffic Scanner, a hardware-accelerated IDS based on Field-Programmable Gate Arrays (FPGAs). The system uses an architecture based on a non-deterministic finite automaton for fast pattern matching. Using this approach, throughput of up to 3.2 Gbps is achieved for all rules from the Snort database. [46] Hardware acceleration is also interesting for commercial companies. [8]
Although signature-based detection deals mainly with packet payload, some signatures consist of properties acquired by flow-based data collection; then a limited form of detection is possible. However, if sampling is used, some packets containing signatures can be lost and the effectiveness is thus lower.
We chose Snort as an implementation of signature-based detection for evaluation. We conclude that the coverage is low, because only the known attacks specified by signatures are revealed by this method. The effectiveness varies according to the quality of the signatures; the risk of a high false positive rate is high in common networks, even without the use of IDS evasion techniques, e.g. squealing [43]. Performance is reasonable for high-speed networks only if it is supported by hardware acceleration. The use of flow-based data as an input for this method is limiting. Generally, a signature-based IDS suffers from considerable latency in the deployment of a brand-new rule (a signature) in such a system. Last, but not least, the method cannot cope with encrypted payload.
2.5 Stateful Protocol Analysis
Another approach to intrusion detection is stateful protocol analysis, which operates mainly on the higher layers of the TCP/IP network model. We mention it here for completeness and comparison.
Description Stateful protocol analysis (alternatively deep packet inspection) is the pro-
cess of comparing predetermined profiles of generally accepted definitions of benign pro-
tocol activity for each protocol state against observed events to identify deviations. Unlike
anomaly-based detection, it relies on vendor-developed universal profiles that specify how
particular protocols should and should not be used. That means that the IDS is capable of
understanding and tracking the state of network, transport, and application protocols that
have a notion of state. [12]
For example, when a user starts a File Transfer Protocol (FTP) session, the session is
initially in the unauthenticated state. Unauthenticated users should only perform a few
commands in this state, such as viewing help information or providing usernames and
passwords. An important part of understanding state is pairing requests with responses,
so when an FTP authentication attempt occurs, the IDS can determine if it was successful
by finding the status code in the corresponding response. Once the user has authenticated
successfully, the session is in the authenticated state, and users are expected to perform any
of several dozen commands. Performing most of these commands while in the unauthenti-
cated state would be considered suspicious, but in the authenticated state performing most
of them is considered benign. [12]
Evaluation Although there are some tools implementing basic stateful protocol analysis (such as the stream4 preprocessor3 in Snort), the method is not as widespread as signature-based detection. We identify the following reasons.
Firstly, it is a very resource-intensive task, particularly in high-speed networks. The complexity of the analysis grows with the number of (simultaneous) sessions4. Secondly, it relies on the "knowledge" of all analyzed protocols. Notice that there are numerous differences between implementations by various vendors and the definitions in RFCs and other standards. In addition, only the analysis of known protocols is possible. Next, attacks (e.g., denial of service attacks) that utilize well-formed packets and do not violate the normal behaviour are not detected. Finally, the method is powerless against encrypted packet payload too.
On the other hand, the method generally provides relatively high accuracy. In contrast to the signature-based method that searches for known patterns in the packet payload, this method works with sessions. The method can correlate information obtained from the whole session and thus provides a better view inside the network traffic. Stateful protocol analysis can also reveal some threats that could be missed by other methods that perform port-based traffic classification. Last, but not least, a limited subset of the analysis can process flows too.
2.6 Anomaly-based Detection
Anomaly-based detection is the process of comparing definitions of what activity is considered normal against observed events to identify significant deviations. An IDS using anomaly-based detection has profiles that represent the normal behaviour of such things as users, hosts, network connections, or applications. The profiles are developed by monitoring the characteristics of typical activity over a period of time. The major benefit of anomaly-based detection methods is that they can be very effective at detecting previously unknown threats. For example, suppose that a computer becomes infected with a new type of malware. It will probably exhibit behaviour that is significantly different from the established profiles for the computer. [12]
3. A Snort preprocessor, actually aimed at the mitigation of “squealing” [43] attacks performed by stick and
snot tools. See http://cvs.snort.org/viewcvs.cgi/snort/doc/README.stream4.
4. Actually, a session consists of a few flows in terms of NetFlow.
2.6.1 Holt-Winters Method
Description Many time series of network service variables exhibit the following regularities (characteristics) that should be accounted for by a model:
• A trend over time (i. e., a gradual increase in application daemon requests over a two
month period due to increased subscriber load).
• A seasonal trend or cycle (i. e., every day bytes per second increases in the morning
hours, peaks in the afternoon and declines late at night).
• Seasonal variability (i. e., application requests fluctuate wildly minute by minute dur-
ing the peak hours of 4–8 pm, but at 1 am application requests hardly vary at all).
• Gradual evolution of regularities (1) through (3) over time (i. e., the daily cycle grad-
ual shifts as the number of evening daylight hours increases from December to June).
[3]
Let y_1, ..., y_{t−1}, y_t, y_{t+1}, ... denote the sequence of values for the time series observed at some fixed temporal interval. Let m denote the period of the seasonal trend (i.e., the number of observations per day). Holt-Winters Forecasting [17] rests on the premise that the observed time series can be decomposed into three components: a baseline, a linear trend, and a seasonal effect. The algorithm presumes each of these components evolves over time and this is accomplished by applying exponential smoothing to incrementally update the components. The prediction is the sum of the three components:

ŷ_{t+1} = a_t + b_t + c_{t+1−m}.

The update formulae for the three components, or coefficients a, b, c, are:

a_t = α(y_t − c_{t−m}) + (1 − α)(a_{t−1} + b_{t−1}),
b_t = β(a_t − a_{t−1}) + (1 − β) b_{t−1},
c_t = γ(y_t − a_t) + (1 − γ) c_{t−m}.
The new estimate of the baseline is the observed value adjusted by the best available estimate of the seasonal coefficient (c_{t−m}). As the updated baseline needs to account for change due to the linear trend, the predicted slope is added to the baseline coefficient. The new estimate of the slope is simply the difference between the new and the old baseline (as the time interval between observations is fixed, it is not relevant). The new estimate of the seasonal component is the difference between the observed value and the corresponding baseline. α, β, γ are the adaptation parameters of the algorithm and 0 < α, β, γ < 1. Larger values mean the algorithm adapts faster and predictions reflect recent observations in the time series; smaller values mean the algorithm adapts slower, placing more weight on the past history of the time series. [3]
A simple mechanism to detect an anomaly is to check if an observed value of the time series falls outside the confidence band. A more robust mechanism is to use a moving window of a fixed number of observations: if the number of violations (observations that fall outside the confidence band) exceeds a specified threshold, an alert for aberrant behaviour is triggered. [3]
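A minimal Python sketch of this detector is given below. The baseline/trend/seasonal updates follow the formulae above; the confidence band is assumed to use a Brutlag-style exponentially smoothed deviation (that part of the original text is not reproduced here), and all parameter values are illustrative only.

```python
def holt_winters_alerts(series, m, alpha=0.5, beta=0.1, gamma=0.3,
                        delta=3.0, window=10, threshold=5):
    """Return indices where the number of confidence-band violations in the
    last `window` observations exceeds `threshold` (aberrant behaviour)."""
    a, b = series[0], 0.0
    c = [0.0] * m          # seasonal coefficients c_{t-m}
    d = [1.0] * m          # smoothed deviations (band half-width); initialisation is an assumption
    violations, alerts = [], []
    for t in range(1, len(series)):
        y = series[t]
        y_hat = a + b + c[t % m]                  # prediction: a_{t-1} + b_{t-1} + c_{t-m}
        violation = abs(y - y_hat) > delta * d[t % m]
        violations.append(violation)
        if len(violations) > window:
            violations.pop(0)
        if sum(violations) > threshold:
            alerts.append(t)
        # exponential-smoothing updates of baseline, trend, seasonal and deviation terms
        a_new = alpha * (y - c[t % m]) + (1 - alpha) * (a + b)
        b = beta * (a_new - a) + (1 - beta) * b
        c[t % m] = gamma * (y - a_new) + (1 - gamma) * c[t % m]
        d[t % m] = gamma * abs(y - y_hat) + (1 - gamma) * d[t % m]
        a = a_new
    return alerts
```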
2.6.2 Minnesota Intrusion Detection System (MINDS)
MINDS extracts two groups of derived features from the flow data: time window-based features (connections with similar characteristics in the last T seconds) and connection window-based features (the last N connections originating from, or arriving at, distinct sources or destinations). The former obviously does not capture malicious activities (such as stealthy port scans) that last longer than T seconds; hence, it is complemented by the latter.
The time window-based features are:
• count-src, number of flows from unique source IP addresses inside the network in
the last T seconds to the same destination,
• count-serv-src, number of flows from the source IP to the same destination port in
the last T seconds
The connection window-based features are:
• count-src-conn, number of flows from unique source IP addresses inside the network in the last N flows to the same destination,
• count-serv-src-conn, number of flows from the source IP to the same destination port in the last N flows.
Secondly, the data is fed into the MINDS anomaly detection module that uses an outlier
detection algorithm to assign the local outlier factor (LOF) [25], an anomaly score to each
network connection. The outlier factor of a data point is local in the sense that it measures
the degree of being an outlier with respect to its neighbourhood. For each data example,
the density of the neighbourhood is first computed. The LOF of a specific data example p
represents the average of the ratios of the density of the example p and the density of its
neighbours. [27]
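MINDS uses its own LOF implementation; purely as an illustration of the idea, the sketch below assigns LOF-based anomaly scores to connection feature vectors using scikit-learn. The input file name and the neighbourhood size are assumptions, not values from the thesis.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# One row per connection; columns are the derived features
# (count-src, count-serv-src, count-src-conn, count-serv-src-conn, ...).
X = np.loadtxt("flow_features.csv", delimiter=",")   # hypothetical input file

lof = LocalOutlierFactor(n_neighbors=20)             # neighbourhood size chosen arbitrarily
lof.fit(X)
scores = -lof.negative_outlier_factor_               # larger score = more anomalous
ranking = np.argsort(scores)[::-1]                   # connections ranked by anomaly score
print(ranking[:10])                                  # the ten most anomalous connections
```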
Finally, the MINDS association pattern analysis module summarizes network connec-
tions that are ranked highly anomalous by the anomaly detection module. This module also
uses some signature-based detection techniques. See section 3.5 in [27].
Evaluation MINDS was deployed at the University of Minnesota in August 2002. It has been successful in detecting many novel network attacks and emerging network behaviour
that could not be detected using signature based systems such as Snort. See section 3.4 in
[27].
Input to MINDS is NetFlow version 5 data, because the authors admit they currently do
not have the capacity to collect and store data in pcap (tcpdump) format.
LOF requires the neighbourhood around all data points to be constructed. This involves calculating pairwise distances between all data points, which is an O(n²) process and therefore computationally infeasible for millions of data points. The authors suggest sampling a training set from the data and comparing all data points to this small set, which reduces the complexity to O(n × m), where n is the size of the data and m is the size of the sample. [27] On the other hand, the effectiveness can decrease, namely because of the potential presence of threats in the training set.
The coverage is good: the authors claim that the LOF technique also showed great promise in detecting novel intrusions on real network data. [27]

2.6.3 The Work of Xu et al.
Xu et al. [48] characterize clusters of flows by the relative uncertainty (RU) of their free feature dimensions. Here X is a random variable, one feature dimension (srcIP, dstIP, srcPrt or dstPrt) that may take N_x discrete values, H(X) is the (empirical) entropy of X, H_max(X) is the maximum entropy of (sampled) X, and m is the sample size, the total number of flows observed during the time interval. The relative uncertainty is defined as RU(X) = H(X)/H_max(X) and lies in the [0, 1] interval. Clearly, if RU(X) = 0, then all observations of X are of the same kind, and vice versa. Consequently, the authors propose simple rules that divide each RU dimension into three categories: 0 (low), 1 (medium) and 2 (high). If we fix one dimension and leave the other three free, we get a three-dimensional vector. The labelling process classifies clusters (the vectors) into 27 (= 3³) possible behaviour classes (BCs in short).
Research on popularity, average size and volatility of BCs shows that a predominant major-
ity of clusters stay in the same BC when they re-appear. Moreover, most of the behaviour
transitions are between “neighbouring” or “akin” BCs. [48]
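A small Python sketch of the relative uncertainty computation and the three-way categorization follows; the low/medium/high cut points are placeholders, not the exact thresholds used by Xu et al.

```python
import math
from collections import Counter

def relative_uncertainty(values, n_x=None):
    """RU(X) = H(X) / Hmax(X), with Hmax(X) = log2(min(Nx, m)) for m observed flows.
    If n_x is not given, the number of distinct observed values is used as an approximation."""
    m = len(values)
    counts = Counter(values)
    h = -sum((c / m) * math.log2(c / m) for c in counts.values())
    n_x = n_x if n_x is not None else len(counts)
    h_max = math.log2(min(n_x, m))
    return h / h_max if h_max > 0 else 0.0

def ru_category(ru, low=0.2, high=0.9):   # cut points are assumptions
    return 0 if ru <= low else (2 if ru >= high else 1)

# Behaviour class of a cluster with a fixed srcIP: categories of the three free dimensions.
bc = (ru_category(relative_uncertainty(["10.0.0.2", "10.0.0.3"])),   # dstIP
      ru_category(relative_uncertainty([1025, 1026])),               # srcPrt
      ru_category(relative_uncertainty([80, 80])))                   # dstPrt
print(bc)
```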
Next, the dominant state analysis captures the most common or significant feature values and their interactions. The authors find that clusters within a behaviour class have nearly identical forms of structural models ("simpler" subsets of values or constraints which approximate the original data in their probability distribution). This model can also help an analyst because it provides interpretive value for understanding the cluster behaviour.
Finally, the authors identified three “canonical” profiles: server/service behaviour (mostly
providing well-known services), heavy-hitter host behaviour (predominantly associated with
well-known services) and scan/exploit behaviour (frequently manifested by hosts infected
with known worms). These profiles are characterized by BCs they belong to and their prop-
erties, frequency and stability of individual clusters, dominant states and additional at-
tributes such as average flow size in terms of packet and byte counts and their variabil-
ity. [48]
Evaluation Firstly, there is no freely available implementation (as opposed to Snort), hence benchmarking is problematic. However, we suppose satisfactory performance because the method was developed for saturated backbone links. In addition, it processes aggregated NetFlow data captured in 5-minute time slots10, not the payload of each packet that goes through in real time.
Another advantage is that this method promises high coverage. The behaviour profiles are built without any presumption on what is normal or anomalous. The method dynamically extracts significant flows; there are no fixed rules applied to a particular flow or packet. A flow (cluster) is marked as an exploit if it belongs to such a profile. What is more, we can observe rare and interesting relationships between clusters and particular flows, which can point out other (unknown) malicious behaviour (e.g., clusters in rare BCs, behavioural changes of clusters and unusual profiles for popular service ports).
The authors did not mention (and actually could not mention) an accuracy evaluation, because they used live network traffic without any knowledge of its structure (mainly the portion of malicious traffic). Last, but not least, we note that the method relies on port-based traffic classification.
10. The authors set the timeout value to 60 seconds and admit that it is a trade-off between effectiveness and performance.
2.6.4 Origin Destination Flow Analysis
Anukool Lakhina et al. have introduced the Origin-Destination (OD) flow as a basic unit of network traffic. It is the collection of all traffic that enters the network from a common ingress point and departs from a common egress point. [24] They believe that a thorough understanding of OD flows is essential for anomaly detection too. Lakhina et al. differ from other authors because they perform whole-network traffic analysis: modelling the traffic on all links simultaneously. An OD flow is often a high-dimensional structure (it depends on the size of the network), hence the authors utilize a technique called Principal Component Analysis (PCA) to reduce the dimensionality. They found that hundreds of OD flows can be accurately described in time using a few independent dimensions. The following description and evaluation covers two chosen methods based on OD flow analysis (not a single method as in the previous sections).
Description Volume anomaly detection is based on a separation of the space of traffic measurements into normal and anomalous subspaces by means of PCA. [23] The authors suppose that a typical backbone network is composed of nodes (also called Points of Presence, or PoPs) that are connected by links. The path followed by each OD flow is determined by the routing tables. The authors use the term volume anomaly to refer to a sudden (with respect to the timestep used) positive or negative change in an OD flow's traffic. Because such an anomaly originates outside the network, it will propagate from the origin PoP to the destination PoP. OD flow based anomalies are identified by observing link counts.
Firstly, PCA is applied to Y, the t × m measurement matrix, where t denotes the number of time intervals of interest and m the number of links in the network. Thus, the matrix represents the time series of all links. We obtain a set of m principal components, {v_i}, i = 1, ..., m. The first principal component v_1 is the vector that points in the direction of maximum variance in Y:

v_1 = arg max_{‖v‖=1} ‖Y v‖.

‖Y v‖² is proportional to the variance of the data measured along v. Proceeding iteratively, once the first k − 1 principal components have been determined, the k-th principal component corresponds to the maximum variance of the residual. The residual is the difference between the original data and the data mapped onto the first k − 1 principal axes. [23] The authors validate the thesis in [24] that their link data have low effective dimensionality: the vast majority of the variance in each of the inspected links is captured by 3 or 4 principal components.
The data is then mapped onto principal axis i and normalized to unit length, u_i = Y v_i / ‖Y v_i‖. Such vectors capture the temporal variation common to the entire ensemble of link traffic time series along principal axis i. Since the principal axes are in order of contribution to overall variance, the first vector captures the strongest temporal trend common to all link traffic, the second captures the next strongest, and so on.
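The subspace split can be illustrated in a few lines of numpy; this is a sketch under assumptions (the input file name, the choice of k = 4 axes and the simple percentile threshold are stand-ins for the Q-statistic used in the paper).

```python
import numpy as np

Y = np.load("link_counts.npy")          # hypothetical t x m matrix of per-link byte counts
Y = Y - Y.mean(axis=0)                  # centre each link time series

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = 4                                   # low effective dimensionality (3-4 principal axes)
P = Vt[:k].T                            # m x k matrix whose columns are v_1 ... v_k

residual = Y - Y @ (P @ P.T)            # projection onto the anomalous subspace
spe = np.sum(residual ** 2, axis=1)     # squared prediction error per time bin
alarms = np.where(spe > np.percentile(spe, 99.5))[0]   # assumed threshold, not the Q-statistic
```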
Feature distributions are summarized by their sample entropy. For a histogram with N distinct feature values, where value i occurs n_i times, the sample entropy is H(X) = −Σ_{i=1..N} (n_i/S) log₂(n_i/S), where S = Σ_i n_i is the total number of observations in the histogram. The value of sample entropy lies in the range [0, log₂ N]. The metric takes on the value 0 when the distribution is maximally concentrated, i.e., all observations are the same. Sample entropy takes on the value log₂ N when the distribution is maximally dispersed, i.e. n_1 = n_2 = ... = n_N. [22] This method uses the multiway subspace method that extends the subspace method described in the previous paragraphs about volume anomalies. In contrast to the subspace method, there are four measurement matrices, one for each traffic feature. An effective way of analyzing multiway data is to recast it into a simpler, single-way representation. The idea behind the multiway subspace method is to "unfold" the multiway matrix into a single, large matrix. Then the (simple) subspace method is applied. We also refer to [22] for details.
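A sketch of the sample entropy computation and of the "unfolding" step is shown below; the four per-feature entropy matrices are assumed to have been computed already, one column per OD flow or link.

```python
import numpy as np
from collections import Counter

def sample_entropy(observations):
    """H(X) = -sum_i (n_i / S) * log2(n_i / S) over the histogram of a traffic feature."""
    counts = np.array(list(Counter(observations).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def unfold(h_srcip, h_dstip, h_srcport, h_dstport):
    """Concatenate the four t x m entropy matrices into one t x 4m matrix,
    to which the ordinary subspace method is then applied."""
    return np.hstack([h_srcip, h_dstip, h_srcport, h_dstport])
```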
Evaluation First of all, we emphasize again that the described methods, and OD flow analysis in general, are intended for whole-network traffic analysis. The authors admit it is a difficult objective, amplified by the fact that modelling traffic on a single link is itself a complex task. [22] What is more, the whole-network analysis requires data from all nodes (Points of Presence) in the inspected network. This implies traffic acquisition at each such node, which can be infeasible. Next, these methods are designed for backbone networks and
their operators. It is not important which host sent an anomalous flow, but where it originated (in which autonomous system). It is a basic, aggregated point of view that is computationally efficient. Although the computational complexity is O(t × m²), we need not cope with hundreds of thousands of flows, but only with tens of OD flows. The exact flows can consequently be specified "on demand" when we find an anomaly in a particular OD flow. However, that is more expensive.
The authors evaluated the volume anomaly based method in two backbone networks (European Sprint and US Abilene). They used NetFlow data from routers with periodic sampling at a rate of 1 out of 250 packets, and Juniper's Traffic Sampling with random sampling at a rate of 1 out of 100 packets, respectively. Packets were aggregated into flows at the network prefix level in 5-minute timeslots. Next, an additional aggregation into 10-minute timeslots was performed (due to time synchronization issues). We find 10 minutes too long (particularly in high-speed networks), because it extends the response time in case of short-term anomalies.
The feature entropy detection method was also evaluated by the authors in two (different) backbone networks (US Abilene and European Géant). Sampling was used there as well: periodic, at a rate of 1 out of 100 packets and 1 out of 1 000 packets, respectively (Géant is larger than Abilene). Data from both networks were captured in 5-minute timeslots.
Due to sampling (mainly at low rates) some flows can be omitted; thus, sampling can hide some (small) anomalous flows. This idea is supported by [15], another work by Lakhina and other authors. Nevertheless, they revealed that anomalies found in unsampled traffic remain "visible" in terms of feature entropy even in sampled traffic. On the contrary, volume-based detection relies only on summary counters that SNMP can provide (e.g., as implemented in routers).
It is important to point out that the methods do not make any a priori assumption about the nature of network traffic. The classification is done during the computation, so this approach allows the detection of unknown threats.
Considering the effectiveness, the authors use a confidence limit 1 − α. The limit also directly determines the false positive rate α, so the setting of α is a key issue. Lakhina et al. performed the following evaluation. [22] They employed the methods to detect known anomalies in off-line trace files. The confidence limit was set to 0.995 and 0.999, respectively; in effect, they controlled how conservative the method is. The evaluation focuses on measuring the detection rate, in other words the true positive rate, which indicates how many of all anomalies were captured.
2.6.5 Cooperative Adaptive Mechanism for Network Protection (CAMNEP)
CAMNEP is an agent-based network IDS. It is not a single method but a whole system based on several of the methods described above. In spite of that, we mention it here because it is an interesting concept that incorporates modern detection methods and profits from their synergy.
Description The architecture consists of several layers (see Figure 2.1) with varying
requirements on on-line processing characteristics, level of reasoning and responsiveness.
While the low-level layers need to be optimized to match the high wire-speed during the
network traffic acquisition and preprocessing, the higher layers use the preprocessed data
to infer the conclusions regarding the degree of anomaly and consecutively also the mali-
ciousness of the particular flow or a group of flows. [4]
Figure 2.1: The layered CAMNEP architecture: agents in the cooperative threat detection layer exchange anomalies and conclusions, working on preprocessed data from the traffic acquisition and preprocessing layer.
The traffic acquisition and preprocessing layer acquires the data from the network using hardware-accelerated NetFlow probes and performs their preprocessing. This approach provides a real-time overview of all connections on the observed link. The preprocessing layer aggregates global and flow statistics to speed up the analysis of the data.
The cooperative threat detection layer consists of specialized, heterogeneous agents that seek to identify the anomalies in the preprocessed traffic data by means of their extended trust models. There are four agents that employ detection methods based on MINDS, the work of Xu et al. and the work of Lakhina et al. Note that the agents are not complete implementations of the methods described in the previous sections. The authors chose only those features and ideas that are computationally efficient in near real time and that can be integrated into the whole agent platform. For example, the MINDS agent performs only a simplified observation of time window-defined features and compares them with history data to determine the anomaly of each flow.
As a result, each agent determines the anomaly of each flow as a value in the [0, 1] interval, where 1 represents the maximal anomaly and 0 no anomaly. The values are shared with other agents. Each agent integrates these values into its trust model. To preserve computational feasibility, these models work with significant flow samples and their trustfulness in the identity-context space.
Trustfulness is also determined in the [0, 1] interval, where 0 corresponds to complete distrust and 1 to complete trust. Hence, low trustfulness means that the flow is considered a part of an attack.
The identity of each flow is defined by the features we can observe directly on the flow:
srcIP, dstIP, srcPrt, dstPrt, protocol, number of bytes and packets. If two flows in a data set
share the same values of these parameters, they are assumed to be identical. The context of
each flow is defined by the features that are observed on the other flows in the same data
set, such as the number of similar flows from the same srcIP, or entropy of the dstPrt of
all requests from the same host as the evaluated flow. [4] The identities are the same for all
agents, but the contexts are “agent-specific”.
The anomaly of each flow is used to update the trustfulness of flow samples in its vicinity
in the identity-context space. Each agent uses a distinct distance function, because it has a
different insight into the problem. The cross correlation function is implemented to eliminate
random anomalies.
Finally, each agent determines the trustfulness of each flow and all agents provide their trustfulness assessments to the aggregation and visualization agents; the aggregated values can then be used for traffic filtering. The authors define the common misclassification errors using the trustfulness and maliciousness of the flow: flows that are malicious and trusted are denoted as false negatives, and flows that are untrusted, but legitimate, are denoted as false positives. [4]
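Purely as an illustration of how aggregated trustfulness can be used for reporting or filtering, the toy sketch below averages per-agent trustfulness values and flags flows below a threshold; both the averaging operator and the threshold are assumptions, not CAMNEP's actual aggregation.

```python
def aggregate_trustfulness(per_agent_trust):
    """Combine per-agent trustfulness values in [0, 1]; a simple average is assumed here."""
    return sum(per_agent_trust) / len(per_agent_trust)

def untrusted_flows(flows_trust, threshold=0.3):
    """Return identifiers of flows whose aggregated trustfulness falls below the threshold."""
    return [fid for fid, agents in flows_trust.items()
            if aggregate_trustfulness(agents) < threshold]

flows_trust = {"flow-1": [0.9, 0.8, 0.95, 0.85], "flow-2": [0.1, 0.2, 0.05, 0.3]}
print(untrusted_flows(flows_trust))   # ['flow-2'] -- candidate for reporting or filtering
```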
The highest level is the operator and analyst interface layer. The main component is an intelligent visualization agent that helps the operator to analyze the output of the detection layer by putting the processed anomaly information in the context of other relevant information. When the detection layer detects suspicious behaviour on the network, it is reported to the visualization.
Evaluation First of all, note that CAMNEP as a whole stands on the incorporated detection methods. One advantage is that the architecture is modular: the agent platform can be extended with other agents, i.e. other (new) anomaly-based detection methods. The authors argue that the use of a trust model for the integration of several anomaly detection methods, together with an efficient representation of history data, shall reduce the high rate of false positives which limits the effectiveness of current intrusion detection systems. We participated in the evaluation and testing of the system. Results are also described in [4].
In a nutshell, attacks with more than several hundred flows are consistently discovered by all agents. Slower attacks, using a lower number of flows (300 and fewer), are trickier. Note that the evaluation was performed in a campus network loaded with thousands of flows per second. On the other hand, CAMNEP is not able to detect attacks consisting of a few packets, e.g. a buffer overflow attack.
2.7 Summary
In this chapter, we studied a few detection methods for the security analysis of a computer network. This is definitely not an exhaustive list of known methods, but a selection of widespread as well as interesting methods and approaches.
We started with the commonly used signature-based method. Although it operates at higher layers than those we focus on, it is useful for a comparison with other methods. Then
we briefly described and evaluated stateful protocol analysis, which extends the previous method in a particular way. Both methods inspect packets including their payload. Note that this approach can also raise legal issues.
In contrast, the anomaly-based detection methods generally process flows, namely the 5-tuple (srcIP, srcPort, dstIP, dstPort, protocol) constructed from packet headers. This is more efficient, particularly in multi-gigabit networks. On the other hand, flow acquisition is not a simple task, especially for non-dedicated devices such as routers. Due to that fact, packet sampling is used. Unfortunately, it can introduce some inaccuracy. The impact of packet sampling on anomaly detection is discussed in [15]. We think that future work could be aimed at other key features that form the flow. Thus, the 5-tuple could be changed and/or extended.
Another significant contrast between statistical methods and the others is that statistical
methods build behaviour profiles at host and service levels using traffic communication
patterns without any presumption on what is normal or anomalous. However, the “level
of presumption” differs. While Holt-Winters algorithm builds a model for normal traffic
based on parameter settings and a priori knowledge of the periodic structure in traffic, the
methods proposed by Xu et al. and Lakhina et al. do not rely on any parameter settings and
normal traffic behaviour is captured directly in the data.
Next, the statistical anomaly-based methods have to cope with three basic steps that were
outlined in [23]:
• detection,
• identification,
• quantification.
In fact, there is only one step in the case of the other methods. They simply "know" what they are looking for (e.g., in terms of a signature or protocol definition), hence the searched anomaly is identified a priori and its relevance quantified.
Finally, there are a few existing IDSs based on the mentioned methods. Snort is a leading representative of signature-based IDSs and the de facto standard for intrusion detection. It is widespread because it is open-source software. Another commonly used system is Bro [2]. Currently, we have not found any network-based toolset that implements the anomaly-based detection methods. The one exception to this conclusion is most likely CAMNEP, which validated the selected methods in an environment distinct from the authors' own.
Chapter 3
Visualization
The key problem of the analysis is to comprehend the results of the whole process. We can acquire data that (truly) picture the network traffic and process them by various methods. However, if we do not use any data-mining technique, we still have to interpret the results manually. That is quite feasible in a small network, but inconceivable in high-speed networks, because a human cannot evaluate such a large amount of information. Visualization should help us by presenting significant information in a different and more convenient view.
For example, tcpdump is one of the most used tools for network monitoring and data acquisition. It is a command-line tool that can read packets from a network interface or a data file and displays each packet on a new line of output. In contrast, the network packet analyzer Wireshark1 utilizes a graphical user interface (GUI) and, for instance, "colourizes" the packet display based on filters. Actually, the tool performs a classification and the results are presented as various colours. We can also interactively browse the captured data and view summary and detail information for each packet. We confirm that such (small) improvements ease the analysis.
However, visualization is not only about the use of colours. In this chapter, we discuss visualization as an integral part of modern security analysis. We outline some ways of visualization in current software tools and evaluate their contribution to accelerating the analysis. We mainly focus on open-source software that visualizes captured network traffic in the pcap or NetFlow format. While data in the pcap format contain packet headers and the payload, NetFlow records intentionally omit the payload.
3.1 Charts
The basic visualization instrument is a chart. There are many tools extending basic software that performs only data acquisition. These tools often plot two-dimensional charts that depict time series of monitored values or their aggregations. It is a simple and thus widespread method of visualization. Namely, NfSen [28] integrates nfdump outputs with various charts that show time series of the total number of packets, flows and traffic volume. See Figure 3.1.
The charts are also used in other tools such as FlowScan2 , Java Netflow Collect-Analyzer3 ,
1. The tool was formerly known as Ethereal (see http://www.ethereal.com/) which still exists as a separate
project.
2. http://net.doit.wisc.edu/~plonka/FlowScan/
3. http://sourceforge.net/projects/jnca/
3.2 Mapping in Space
This visualization technique draws points in a two- or quasi three-dimensional space that is displayed on a screen. It makes use of human stereoscopic vision and converts patterns in the captured data into graphic patterns in a defined space.
For instance, The Spinning Cube of Potential Doom is an animated visual display of net-
work traffic. Each axis of cube represents a different component of a TCP connection: X is
the local IP address space, Z is the global IP addresses space and Y is the port numbers used
in connections to locate services and coordinate communication (such as 22 for SSH and 80
for HTTP). TCP connections, both attempted and successful, are displayed as single points
for each connection. Successful TCP connections are shown as white dots. Incomplete TCP
connections are shown as coloured dots. Incomplete connections are attempts to communi-
cate with nonexistent systems or systems no longer listening on that particular port number.
The Cube colours incomplete connections using a rainbow colour map with colour varying
4. http://www.ntop.org/
5. http://shlang.com/nfstat/
6. http://netflow.cesnet.cz/
7. http://www.caligare.com/netflow/index.php
8. http://stager.uninett.no/
by port number; colour mapping assists viewers in locating the point in 3D space. [6] For example, a port scan in the captured data creates a line in the cube (see Figure 3.2). It is a more useful and efficient view of such an event compared to a manual examination of tcpdump or even Wireshark output.
An extension of the Cube is InetVis [16]. A similar approach is also used by Flamingo [9]. PortVis [29] and tnv [45] instead use a two-dimensional space.
Figure 3.2: Port scan in The GPL Cube of Potential Doom, a GPL network visualizer based
on the Spinning Cube of Potential Doom.
3.3 Graphs
A natural representation of network traffic is a graph where vertices correspond to hosts and (oriented) edges correspond to the communication (flows) captured between the hosts (see Figure 3.3). This structure clearly depicts who communicates with whom. For comparison, a classic output is shown in Figure 3.4.
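Building such a host communication graph from flow records is straightforward; a sketch using the networkx library is shown below (the flow tuples are assumed to be already parsed from NetFlow data).

```python
import networkx as nx

def build_traffic_graph(flows):
    """flows: iterable of (src_ip, dst_ip, bytes) tuples parsed from NetFlow records."""
    g = nx.DiGraph()
    for src, dst, nbytes in flows:
        if g.has_edge(src, dst):
            g[src][dst]["bytes"] += nbytes
            g[src][dst]["flows"] += 1
        else:
            g.add_edge(src, dst, bytes=nbytes, flows=1)
    return g

g = build_traffic_graph([("10.0.0.1", "10.0.0.2", 1200), ("10.0.0.1", "10.0.0.3", 40)])
print(g.out_degree("10.0.0.1"))   # 2 -- a very high out-degree may indicate a scanning supernode
```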
NfVis10 stands for NetFlow Visualizer and it is a proof of concept tool based on the
prefuse visualization toolkit11 . The graph-based traffic representation is enhanced with sev-
eral significant features. The user can list the flows and traffic statistics associated with each
edge/host. The traffic can be filtered and aggregated according to many relevant features.
The visual attributes of the display (such as node/edge size and colour) can also adapt
to these characteristics, making the user’s orientation easier. The information provided by
“third parties” (DNS and whois) is seamlessly integrated into the visualization. As current
network traffic is a scale-free network, it is particularly important to handle the visualization
of supernodes, i.e. the nodes with a high number of connections. These nodes are typical for
many attack scenarios, as well as for high-value targets. Visualizer therefore replaces the
one-shot connections to/from these hosts by a special representation of a “cloud” of traffic,
and only singles out the nodes that also connect to other nodes in the observed network. [5]
MyNetScope12 is a network visual analytics platform based on standard NetFlow data and heterogeneous data sources. It extends NfVis in two important ways. Firstly, it can incorporate other external data sources such as DNS resolution, whois responses, outputs of
10. This software was denoted as Visio Agent in the CAMNEP project.
11. http://prefuse.org/
12. http://www.mycroftmind.com/products:mns
various anomaly detection methods and the network topology information. Secondly, it is a scalable solution even for large networks. We participate in its development and testing, hence we can confirm these statements. The integration of external data sources is very welcome, because it is not common that a security analyst works only with primary data such as tcpdump outputs or NetFlow records. He or she generally has to gather additional information from other available sources. Otherwise, a complete inspection of the security incident is not possible.
We also mention other graph-based visualization tools. VisFlowConnect-IP visualizes
network traffic as a parallel axes graph with hosts as nodes and traffic flows as lines con-
necting these nodes. These graphs can then be animated over time to reveal trends. [47] Co-
operative Association for Internet Data Analysis (CAIDA) develops two interesting tools.
LibSea13 is both a file format and a Java library for representing large directed graphs on
disk and in memory. Scalability to graphs with as many as one million nodes has been the
primary goal. Additional goals have been expressiveness, compactness, and support for
application-specific conventions and policies. Walrus14 is a tool for interactively visualizing
large directed graphs15 in three-dimensional space. By employing a fisheye-like distortion, it
provides a display that simultaneously shows local detail and the global context. Although they are not specialized applications for network traffic visualization, it would be useful to
13. http://www.caida.org/tools/visualization/libsea/
14. http://www.caida.org/tools/visualization/walrus/
15. LibSea graph files
combine them for this purpose if there were a tool that provides output in the LibSea format.
3.4 Summary
We explained why visualization is important in the security analysis and introduced three techniques and the tools that utilize them. The commonly used charts were subsequently complemented by methods that use mapping in space and a graph representation of network traffic. We also summarized their contribution to the analysis.
Naturally, the progress of visualization tools is connected with the development of tools that acquire and/or process network data. For example, both tcpdump and Wireshark stand on libpcap, a system-independent interface for user-level packet capture. Similarly, NfSen is a graphical web-based front end for the nfdump NetFlow tools, and Walrus stands on LibSea.
We believe that a good visualization tool should display a complex picture of the network traffic, ideally with up-to-date security incidents marked. However, all available details of hosts and their communication should also be displayed on demand in well-arranged tables, charts and listings.
Chapter 4
Design of the IDS
We have described and evaluated several approaches to intrusion detection as well as visualization
techniques for network traffic. In this chapter, we take our conclusions into account
and discuss the design of an intrusion detection system for large networks. First, we identify
and justify the requirements on such an IDS, and then we design a solution that
meets these requirements.
First of all, note that we decided for an intrusion detection system. In contrast to an intrusion
prevention system (IPS), it “only” monitors the network traffic and alerts an operator in case
of a security incident. The operator then analyses the incident and, if necessary, ensures
its mitigation. If we deployed an IPS and it raised a false positive, it would immediately block a
legitimate network connection. Another reason is that an IPS must be placed in-line (as a part of
the link); when the IPS fails, the whole network may fail as well. Hence, we are conservative because
of the possibility of false positive alarms and system failures. These are the main reasons for
deploying an IDS.
4.1.1 Accuracy
There are currently many IDSs capable of detecting known threats, especially signature-based
IDSs such as Snort. Their drawback is that the rule base of such an IDS has to be maintained by
the network or security administrator. Moreover, novel threats are added to the rule base
manually, often by third-party vendors. As a result, these systems are powerless
against novel threats. Therefore, the proposed IDS should detect even novel threats by a more
effective detection mechanism.
4.1.7 Scalability
The IDS should monitor networks consisting of hundreds or even thousands of computers.
It should be scalable and should not require any additional maintenance when a new
host is connected to the network or when another host is disconnected or replaced. Again, such
additional maintenance annoys network administrators.
4.1.9 Transparency
The notion of transparency actually comprises two requirements. First, the IDS should be
“invisible” at the IP layer; that means we should not assign any IP address to the IDS (except for a
management module). This is required to avoid attacks such as (distributed) denial of
service (DoS and DDoS), where an attacker floods the network with packets destined for the
IP address of the IDS. Second, the IDS should not markedly influence the network topology or
the network traffic in any way; in particular, latency should be preserved and the IDS should not
load the network links needlessly.
4.2 Solution
We decided for a network-based IDS (NIDS). In particular, this choice meets the following requirements:
• Scalability,
• Easy Maintenance,
• Security Robustness.
In contrast to a host-based IDS (HIDS), the deployment of a new host in the network does not
demand any extra effort to monitor the network activity of that host: there is no need to
install any specialized software on it. Note that the network may also contain
specialized hosts (besides common servers and workstations) on which an HIDS cannot be
installed at all. Next, NIDSs are passive devices, “invisible” to attackers, whereas
HIDSs rely on processes running in the operating system of the host. We
also consider the deployment, testing, and possible upgrades of the IDS. Generally, it is easier to
update one component of a NIDS than many components of an HIDS installed on individual hosts.
We propose a solution that consists of several components and layers. Network
probes are the “eyes and ears” of the proposed intrusion detection system. Collectors are its
“memory”, MyNetScope with the data sources is the “brain and heart”, and the MyNetScope
analyst console acts as the “mouth” of the IDS. The “nervous system and blood circulation”
is represented by the network links that connect all parts together. Overall, the architecture
(Figure 4.1) is similar to the CAMNEP architecture depicted in Figure 2.1.
4.2.1 Probes
Probes create the bottom layer of our system. They acquire network traffic and supply collectors
with the captured data. This section discusses probe features and probe deployment in the
administered network.
Data acquisition Network probes monitor the link and export the captured data in the Net-
Flow format. We chose this format to meet the requirement of operating in multi-
gigabit networks. We rejected the use of SNMP counters and packet traces: the former provide
only coarse-grained data, and the latter are very difficult to obtain, since it is practically infeasible
to capture and store every packet at wire speed even with specialized hardware. For instance, a
fully loaded 10-gigabit link carries about 1.25 GB of data per second, i.e. more than 100 TB per day.
We emphasize that we do not rely on the NetFlow data exported by the (edge) Cisco routers that
may already be present in the network. Our own measurements reveal that Cisco routers do not
export NetFlow correctly in all circumstances; this observation is also supported by [26]. Obviously,
the main task of a router is to route network traffic; we must take into account that NetFlow export
is only an additional feature. On the other hand, the NetFlow data from routers can serve as a
supplemental data source for our system.
Next, we avoid packet sampling due to the possible distortion of the acquired data.
This decision is supported by [15].
We recommend using probes based on COTS (commercial off-the-shelf) computers because
of their low cost. There are two alternatives for the network interface cards (NICs) used in the
probes: the former utilizes a common NIC (such as Intel), while the latter relies on the COMBO
technology developed in the Liberouter project2 . Software probes that capture network
traffic via a common NIC (such as nprobe) are not sufficiently efficient. [20] Hence, we deploy
FlowMon, a hardware-accelerated passive network monitoring probe. [10] Generally, software
probes are satisfactory for small networks, while hardware-accelerated probes suit large,
multi-gigabit networks. Both types of probes meet the transparency requirement since
they are “invisible” at the IP layer: no IP address is assigned to the interface performing
the packet capture. IPv6 is supported thanks to the use of NetFlow version 9.
Location A network probe monitors the traffic passing through a certain node of the net-
work. Thus, the location of the network probe determines what is monitored. This is very
important because the proposed system is based on the data provided by the network probes. Ide-
ally, each packet that enters or leaves the administered network should pass through
the place where the probe is located. We discussed this with the network administrators of the
Masaryk University campus network and identified that, considering the network traffic from/to
the Internet, the probes should be located “in the neighbourhood” of the edge router.
Figure 4.2 shows the location of the main probe. We chose between two alternatives,
supposing that the edge router also acts as a firewall. If we placed the probe in front of
the router/firewall, we would also monitor traffic that never enters the administered
network. We therefore chose the second alternative: the main probe is located inside the
administered network, behind the router/firewall. This ensures that the probe “sees” only the
traffic that has passed through the firewall, which usually implements (a part of) the security
policy of the organization.
As discussed above, we will not insert the probe itself into the network link, but only a net-
work tap, a hardware device which provides a way to access the data flowing across a
computer network3 . Thus, we actually delegate the responsibility for continuous oper-
ation to the tap. If we use a tap that requires a power supply, we should connect it to an
uninterruptible power supply (UPS); we should also choose a tap with a dual power supply
unit in case one unit fails.
The main probe is capable of capturing only the attacks that originate from or are destined
for hosts outside the network. To cover attacks by insiders, we propose to deploy additional probes
inside our network, especially in front of or behind the firewalls that protect particular network
segments. Then we can reveal possible malicious activities of the hosts inside our network. For in-
stance, Figure 4.3 depicts the deployment of one main probe and three probes inside the administered
network. Such a setup may be required in campus or corporate networks, e.g. when one segment con-
tains more sensitive servers than the others or when the organization is large enough to warrant
monitoring its internal traffic. The details of the deployment in the Masaryk
University network are discussed in the next chapter.
addresses (or the whole subnet) for the honeypot. If it observes a connection attempt to
such an address, it logs the host that originated the connection. However, we ought to avoid
premature conclusions; consider, for example, a user who types an incorrect IP address,
a misconfigured host, and so on.
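To make this concrete, a query of the following kind could list the hosts that attempted to contact the otherwise unused address space. This is only a sketch using the nfdump toolset; the honeypot subnet, the data directory, and the date are illustrative placeholders, not the values used in our deployment.

#!/bin/sh
# List sources that contacted the (hypothetical) unused honeypot subnet
# 203.0.113.0/24 on a given day, ordered by the number of flows.
NFDIR=/data/netflow            # placeholder for the collector's data directory
DAY=2008/05/01                 # placeholder date

nfdump -R "$NFDIR/$DAY" -s srcip/flows -n 20 'dst net 203.0.113.0/24'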
Security Security robustness is very important for devices such as network probes. The
probe itself is controlled via a management interface. We use a secure channel (namely SSH),
and access is granted only from specified IP addresses. We employ an identity management
system such as RADIUS [31], which is advantageous in distributed systems because it eliminates
synchronization issues. Last but not least, we use NTP4 to synchronize the clocks of the comput-
ers over the network. Since the probes timestamp the flows using the host time, it is necessary
to keep the time precise.
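As an illustration, the management access can be restricted and the clock synchronization verified as follows. This is only a sketch: the admin subnet is a placeholder and the rules do not describe the actual probe configuration.

#!/bin/sh
# Allow SSH to the management interface only from a (hypothetical) admin
# subnet and drop all other SSH attempts.
iptables -A INPUT -p tcp --dport 22 -s 198.51.100.0/28 -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j DROP

# Show the NTP peers and the current offset of the probe's clock.
ntpq -p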
Maintenance and Management Generally, the probes are easy-to-maintain devices: once we
place them in the network and set them up, they work and fulfil their task. However, if a probe does
not send any data to the collector, we cannot determine whether the monitored link or the
probe itself has failed. Hence, we employ the NETCONF Configuration Protocol [36] over SSH to
monitor the probe status.
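The following sketch illustrates how a probe could be polled over the NETCONF SSH subsystem. The host name, the account, and the assumption that the probe exposes a NETCONF agent on the standard port 830 are ours and do not describe the actual probe firmware.

#!/bin/sh
# Query a probe over NETCONF (RFC 4741/4742): send a <hello> followed by a
# <get> RPC through the SSH "netconf" subsystem. Messages are delimited by
# the ]]>]]> sequence required by NETCONF 1.0 framing.
PROBE=probe-mgmt.example.org   # placeholder management address

ssh -s -p 830 admin@$PROBE netconf <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.0</capability>
  </capabilities>
</hello>
]]>]]>
<rpc message-id="1" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <get/>
</rpc>
]]>]]>
EOF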
4.2.2 Collectors
A NetFlow collector is responsible for the correct reception and storage of the NetFlow data
exported by the network probes. To avoid reinventing the wheel, we use existing tools and
software that are well tested and widespread. In the case of NetFlow collectors, we rely on the nf-
dump and NfSen toolset [28]. Our collectors not only receive and store NetFlow records but also per-
form preprocessing tasks, such as the periodic execution of scripts that monitor policy
violations. The collectors, like the other parts of the proposed IDS, comply with the requirements
described above.
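The actual policy checks are described in the next chapter; the following is only a hypothetical example of such a script, sketched with the nfdump toolset. The internal subnet, the mail relay address, and the data directory are placeholders.

#!/bin/sh
# Hypothetical policy check: list internal hosts that talked directly to
# external SMTP servers during a given day, bypassing the official mail
# relay (192.0.2.25 here stands in for the real relay).
NFDIR=/data/netflow            # placeholder for the collector's data directory
DAY=2008/05/01                 # placeholder date

nfdump -R "$NFDIR/$DAY" -s srcip/flows -n 20 \
       'src net 192.0.2.0/24 and dst port 25 and not dst ip 192.0.2.25'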
Security To meet the security requirements, we specify the IP addresses of the probes that are au-
thorized to send NetFlow data to a particular collector. Note that the collector itself
does not restrict the reception of NetFlow records, which can be considered a security threat
since the NetFlow records are transmitted in UDP packets that can be easily forged. If we do
not want to transmit the NetFlow records over the monitored network itself, we can connect the
collectors directly to the probes through a dedicated local network and thus considerably
strengthen the security. In addition, this relieves the loaded network links.
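One straightforward way to enforce the list of authorized probes is a host firewall on the collector. The sketch below assumes the collector receives NetFlow on UDP port 9995 (a common nfcapd default); the probe addresses are placeholders.

#!/bin/sh
# Accept NetFlow packets only from the listed (hypothetical) probe addresses
# and drop NetFlow packets from any other source.
PROBES="198.51.100.10 198.51.100.11"

for ip in $PROBES; do
    iptables -A INPUT -p udp --dport 9995 -s "$ip" -j ACCEPT
done
iptables -A INPUT -p udp --dport 9995 -j DROP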
Long-term data storage Although NetFlow records are already aggregated (in terms of
network flows), they occupy a relatively large amount of disk space. For example, the records that
cover one month of network traffic of a large campus network occupy about 240 GB of disk
space5 . If we do not deploy more probes, a single collector may suffice. Nevertheless,
long-term data storage requires enough space on the disk drives.
In addition, data retention may be required by law; in the Czech Republic, it is regulated by “Vyhláška č. 485/2005 Sb.” (Decree No. 485/2005 Coll.)6
4. NTP stands for Network Time Protocol. See http://www.ntp.org/ for details.
5. The records are stored in nfcapd format.
6. See http://www.sagit.cz/pages/sbirkatxt.asp?zdroj=sb05485&cd=76&typ=r for details (in Czech).
In this section, we describe the core of our intrusion detection system. This layer requires
data from collectors and other sources for its operation.
MyNetScope We employ the MyNetScope platform that was briefly described in Section 3.3.
It is not a standalone application; it is designed as a client/server architecture. The server
reads NetFlow records from the collectors, performs some preprocessing tasks on the flows, and
replies to the analyst's queries submitted by the client application (the analyst console). Again,
the entire communication between all parts is encrypted; we use SSH tunnels.
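For example, the console's connection to the server can be wrapped in an SSH tunnel as sketched below. The host name and the application port 7777 are illustrative; they are not the actual MyNetScope settings.

#!/bin/sh
# Forward a local port through an encrypted SSH tunnel to the (hypothetical)
# MyNetScope server; the analyst console then connects to localhost:7777.
ssh -N -L 7777:localhost:7777 analyst@mynetscope-server.example.org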
CAMNEP MyNetScope itself does not perform intrusion detection. It is a very useful vi-
sualization tool that meets the requirements in Section 4.1.12, and its power lies in the integration of
external data sources. We decided to deploy a part of the CAMNEP project (described and
evaluated in Section 2.6.5) as the “brain” of our intrusion detection system. Thus, we can
meet the following requirements:
• Accuracy,
• Early Detection,
We mainly use the CAMNEP Cooperative Threat Detection Layer, which combines modern in-
trusion detection methods. In summary, we get better accuracy than if we deployed the par-
ticular anomaly detection methods separately. The methods are able to detect novel threats
and anomalies, provided that the security incident also manifests itself as a network traffic anomaly.
For instance, worm spreading or a denial of service attack is “visible” in network flows; on
the contrary, a single packet that causes a buffer overflow on a host computer does not repre-
sent a network traffic anomaly. Next, the methods were either designed for high-speed networks
from the very beginning or modified to meet this requirement. The detection is
performed in 5-minute time windows, which is a reasonable interval given the flow aggregation
commonly used in connection with NetFlow. Finally, since the methods work purely with
packet headers, anomaly detection is possible even when the payload is encrypted.
The CAMNEP detection layer computes a trustfulness value for each network flow. This value
is then passed to MyNetScope, and the user can view the suspicious flows and query
MyNetScope for other relevant information.
Other data sources Apart from CAMNEP, we also utilize other data sources such as a DNS
server, the whois service, or specific scripts that periodically check for policy violations. Their
output is then included in MyNetScope as well. These scripts are discussed in the next chapter.
4.3 Summary
• collectors,
Chapter 5
We have already begun deploying and testing the system in the large campus network of
Masaryk University. This chapter summarizes our experience with the designed
system so far. First, we describe the system deployment status in detail, structuring the descrip-
tion according to Section 4.2. Then we outline a use case and compare a security analysis
performed with the help of the designed system to the classic approach.
1. For instance, “Směrnice rektora č. 2/2003, Užívání počítačové sítě Masarykovy univerzity” (Rector's Directive No. 2/2003, Use of the Masaryk University Computer Network), see
http://is.muni.cz/do/1499/normy/smernicerektora/Smernice_rektora_2-2003.pdf (in Czech).
2. The router copies all packets that pass through it to this port.
3. AS stands for Autonomous System.
5.1 Deployment Status
We have recently deployed the second probe. It is located in front of the two routers that
connect the Faculty of Informatics with the university backbone. There are two network taps
between the backbone routers and the routers of the Faculty, and the probe is connected to these
taps. We employ a four-port software FlowMon probe there; based on our measurements,
we decided for the non-accelerated version of the probe. Although we have not yet acquired
any data from this probe, we expect the link usage to be lower than in the case of the main
probe. Both probes export data in the NetFlow version 9 format.
A honeypot deployment is being prepared. We decided for Honeyd, a small daemon
that creates virtual hosts on a network. The hosts can be configured to run arbitrary services,
and their personalities can be adapted so that they appear to be running certain operating
systems. Honeyd enables a single host to claim multiple addresses.4 The network administrators
have already assigned an address space for the honeypots: we have 254 IPv4 addresses at our
disposal for honeypot operation. Although the address space is unused, we observe numerous
connection attempts originating outside the administered network, so we expect the
honeypot to help us with the security analysis. We also plan to assign some IPv6 addresses to
the honeypot.
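A minimal Honeyd set-up could look like the sketch below. The template, the personality string, and the address range are illustrative and do not describe the configuration we will actually deploy.

#!/bin/sh
# Write a minimal honeyd.conf: one Windows-like template bound to a single
# address of an illustrative honeypot subnet, everything else blocked.
cat > honeyd.conf <<'EOF'
create default
set default default tcp action block

create windows
set windows personality "Microsoft Windows XP Professional SP1"
set windows default tcp action reset
add windows tcp port 80 open
add windows tcp port 135 open
bind 203.0.113.10 windows
EOF

# Run Honeyd on the (illustrative) honeypot subnet.
honeyd -d -f honeyd.conf -i eth0 203.0.113.0/24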
In this phase of the deployment, we decided to assign public IP addresses to the probe
management interfaces for easier access and maintenance. We will consider the use of
private addresses with respect to the security issues in the next phase. Similarly, we have not
yet deployed uniform identity management; however, a firewall (iptables) is running
on the probes.
We plan to deploy further probes in other potentially interesting locations such as the Faculty of
Education, the Faculty of Science, the University Campus at Bohunice, the Faculty of Law,
and the student dormitories.
5.1.2 Collectors
We still use only one PC5 equipped with a 1 TB hard drive. We estimate that this is sufficient
to store the NetFlow records from the main probe for about 4 months. We will consider the use
of data thinning techniques, compression, or additional collectors dedicated to individual probes;
the results obtained from the data acquired by the second probe can answer this question.
Nevertheless, we have to cope with the trade-off between long-term data storage and
the completeness of the records.
The collector is also utilized for preprocessing. The cron daemon6 periodically
executes scripts that check for policy violations; these scripts are described in detail in the
next subsection. They usually perform tasks that load the collector, and their evaluation takes
some time (typically a few minutes). This is not surprising, because they typically process a whole
day of data (up to 17 GB). Scheduling and planning therefore become more important as the
number of scripts grows.
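A daily check can be scheduled with a crontab entry such as the one below. The script path and the 02:10 start time are illustrative; in our deployment the checks run once a day, after the previous day's data are complete.

# m   h   dom mon dow   command
10    2   *   *   *     /usr/local/bin/check-smtp-policy.sh >> /var/log/policy-checks.log 2>&1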
4. http://www.honeyd.org/
5. Intel Xeon 2 GHz CPU, 2 GB of RAM, Linux 2.6.9.
6. See http://unixhelp.ed.ac.uk/CGI/man-cgi?cron+8 for details.
Similarly to the probes, the collector is protected by a firewall and communicates via an as-
signed public IP address.
7. We discussed the time interval with the network administrators, and they found it reasonable to execute the script
once a day.
8. See http://unixhelp.ed.ac.uk/CGI/man-cgi?host for details.
5.2 Use Case
Although not all parts of the system have been deployed yet, we can use some of its
components, namely the main NetFlow probe and the NetFlow collector, for security analyses of
the Masaryk University network. We describe one use case of the system in this section.
In April 2008, Masaryk University received a warning about a phishing scam from the Security
Incident Response Team (SIRT) of Internet Identity9.
Phishing is an attempt to criminally and fraudulently acquire sensitive information, such
as usernames, passwords and credit card details, by masquerading as a trustworthy entity in
an electronic communication10. SIRT found that a computer administered by Masaryk
University acted as a web hosting server for a forged website of an American bank. The network
administrators confirmed this finding, disconnected the host from the net-
work, and informed us. We had to investigate this security incident in three ways:
1. to validate the findings of SIRT,
9. http://www.internetidentity.com
10. Cited from Wikipedia. See http://en.wikipedia.org/w/index.php?title=Phishing&oldid=211566316 for de-
tails.
many flows. So we fulfilled the third point as well and closed the investigation of the security
incident. We enclose a CD-ROM containing all data relevant to this incident (see Appendix B
for the CD contents).
After some time, the administrator provided us with a disk image of the entire drive of the
attacked host. We found entries in its log files that confirmed our findings. Of course,
we could have investigated the incident without our system by inspecting only the system log
files. However, the logs or the whole host are not always available; consider, for instance,
an advanced attacker who deletes the log files.
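A query of the following kind illustrates how the relevant flows were extracted from the collector; the host address, the time window, and the data directory shown here are placeholders, not the real incident data (which are anonymized on the enclosed CD-ROM).

#!/bin/sh
# List all flows of the attacked host within a one-day window,
# printed in the extended output format.
nfdump -R /data/netflow/2008/04/14 \
       -t 2008/04/14.00:00:00-2008/04/14.23:59:59 \
       -o extended 'host 192.0.2.80'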
We emphasize that we used only the two lower layers of the designed system: the FlowMon
probe and the NfSen collector. After the CAMNEP deployment, the system will automatically
determine a list of hosts (flows) with low trustfulness. In addition, the MyNetScope platform
visualizes the traffic as a graph, a natural picture of network traffic.
5.3 Summary
We described the status of the development of the designed system, focusing on
our own work: testing the system components and developing and integrating other data sources
into the whole system (e.g., scripts that check the organization's security policy). Although
some parts of the system are still under development, we could use it to investigate a
security incident with satisfactory results.
Chapter 6
Conclusion
The goal of this thesis was to design a system that simplifies the security analysis of large net-
works. First of all, we studied the state of the art in intrusion detection and prevention. We
focused on modern methods that operate at the IP layer, since they are efficient in high-speed
gigabit networks. In contrast, stateful protocol analysis and signature-based detection
performed at higher levels of the TCP/IP model are both resource-demanding tasks. Hence,
some statistical methods do not inspect the whole packet but only the packet headers. They
operate on NetFlow data acquired from routers (typically Cisco devices) or on packet
traces that are later “converted” into network flows. Although these methods work only
with the packet headers, they are able to detect some anomalies in the network behaviour.
Next, we identified and explained the essential requirements on the intrusion detection sys-
tem. Then we designed a distributed system that meets these requirements. The system con-
sists of several components: we combined some existing subsystems and have been
developing an integration platform. We employed hardware-accelerated NetFlow probes,
honeypots, NetFlow collectors, the MyNetScope platform, and other data sources such as DNS,
whois, and the output of scripts that (pre)process the acquired data.
We note that about fifteen people are involved in this long-term and dynamic project.
We contributed to the system development by testing the particular components and by providing
example scripts that check some of the organization's security rules. These scripts are in routine
operation, and we can easily validate adherence to the rules. We also tested a part of the
system on the investigation of a security incident that was reported by a third party. As a
result, we identified a host that had attacked a computer at Masaryk University: the
attacker changed the superuser password and ran a forged website to acquire the usernames and
passwords of a bank's clients.
Finally, we suggest that future work could be aimed at developing new detection methods
based on new directions in data acquisition. Namely, the use of the IPFIX format would make
interesting features from the packet payload accessible to the anomaly-based detection methods;
currently, we are bound by the 5-tuple of the NetFlow format. A closer integration of other
data sources, such as honeypots, would also be valuable.
Bibliography
[1] Northcutt, S. and Frederick, K. and Winters, S. and Zeltser, L. and Ritchey, R.: Inside
Network Perimeter Security: The Definitive Guide to Firewalls, VPNs, Routers, and
Intrusion Detection Systems, New Riders Publishing, 2003, 978-0735712324. 2.1, 2.4
[2] Paxson, V.: Bro: A System for Detecting Network Intruders in Real-Time, 1999,
<http://www.icir.org/vern/papers/bro-CN99.html> . 2.7
[3] Brutlag, J.: Aberrant Behavior Detection in Time Series for Network Monitoring,
2000, <http://www.usenix.org/events/lisa00/full_papers/brutlag/
brutlag_html/index.html> . 2.6.1
[4] Rehák, M. and Pěchouček, M. and Bartoš, K. and Grill, M. and Čeleda, P. and Krmíček,
V.: CAMNEP: An intrusion detection system for high-speed networks, 2008, <http:
//www.nii.ac.jp/pi/n5/5_65.pdf> . 2.6.5, 2.1, 2.6.5
[5] Rehák, M. and Pěchouček, M. and Čeleda, P. and Krmíček, V. and Novotný, J. and Mi-
nařík, P.: CAMNEP: Agent-Based Network Intrusion Detection System (Short Paper),
2008. 3.3
[6] Lau, S.: The Spinning Cube of Potential Doom, 2004. 3.2
[7] Senie, D. and Sullivan, A.: Considerations for the use of DNS Re-
verse Mapping, 2008, <http://www.ietf.org/internet-drafts/
draft-ietf-dnsop-reverse-mapping-considerations-06.txt> . 5.1.3
[8] Graham, I.: Achieving Zero-loss Multi-gigabit IDS Results from Testing Snort on
Endace Accelerated Multi-CPU Platforms, 2006, <http://www.touchbriefings.
com/pdf/2259/graham.pdf> . 2.4
[9] Oberheide, J. and Goff, M. and Karir, M.: Flamingo: Visualizing Internet Traffic, 2006.
3.2
[10] Čeleda, P. and Kováčik, M. and Koníř, T. and Krmíček, V. and Žádník, M.: CESNET
technical report number 31/2006: FlowMon Probe, 2006, <http://www.cesnet.
cz/doc/techzpravy/2006/flowmon-probe/flowmon-probe.pdf> . 4.2.1
[11] Malagon, C. and Molina, M. and Schuurman, J.: Deliverable DJ2.2.4: Findings
of the Advanced Anomaly Detection Pilot, 6. 9. 2007, <http://www.geant2.
net/upload/pdf/GN2-07-218v2-DJ2-2-4_Findings_of_the_Advanced_
Anomaly_Detetion_Pilot.pdf> . 2.6.1
[12] Scarfone, K. and Mell, P.: Guide to Intrusion Detection and Prevention Systems
(IDPS), 2007, <http://csrc.nist.gov/publications/nistpubs/800-94/
SP800-94.pdf> . 2, 2.4, 2.5, 2.6
[13] Spitzner, L.: Honeypots, 2003, <http://www.tracking-hackers.com/papers/
honeypots.html> . 4.2.1
[14] Chatfield, C. and Yar, M.: Holt-Winters Forecasting: Some Practical Issues, 1988. 2.6.1
[15] Brauckhoff, D. and Tellenbach, B. and Wagner, A. and Lakhina, A. and May,
M.: Impact of Packet Sampling on Anomaly Detection Metrics, 2006, <http:
//cs-people.bu.edu/anukool/pubs/anomalymetrics-sampling-imc06.
pdf> . 2.6.4, 2.7, 4.2.1
[16] van Riel, J. and Irwin, B.: InetVis, a visual tool for network telescope traffic anal-
ysis, 2006, <http://www.cs.ru.ac.za/research/g02v2468/publications/
vanRiel-Afrigraph2006.pdf> . 3.2
[17] Brockwell, P. and Davis, R.: Introduction to Time Series and Forecasting, Second Edi-
tion, 2002, Springer-Verlag New York, Inc., 0-387-95351-5. 2.6.1
[18] Zhang, X. and Li, C. and Zheng, W.: Intrusion Prevention System Design, 2004. 2.2
[19] Deering, S. and Hinden, R.: RFC 2460: Internet Protocol, Version 6 (IPv6) Specification,
1998, <http://www.ietf.org/rfc/rfc2460.txt> . 4.1.6
[22] Lakhina, A. and Crovella, M. and Diot, C.: Mining Anomalies Using Traf-
fic Feature Distributions, 2005, <http://cs-people.bu.edu/anukool/pubs/
sigc05-mining-anomalies.pdf> . 2.6.4, 2.6.4
[24] Lakhina, A. and Papagiannaki, K. and Crovella, M. and Diot, C. and Kolaczyk, E. and
Taft, N.: Structural Analysis of Network Traffic Flows, 2004, <http://cs-people.
bu.edu/anukool/pubs/odflows-sigm04.pdf> . 2.6.4, 2.6.4
[25] Breunig, M. and Kriegel, H. and Ng, R. and Sander, J.: LOF: Identifying Density-
Based Local Outliers, 2000, <http://www.dbs.informatik.uni-muenchen.
de/Publikationen/Papers/LOF.pdf> . 2.6.2
[26] Sommer, R. and Feldmann, A.: NetFlow: Information loss or win?, 2002. 4.2.1
[27] Ertöz, L. and Eilertson, E. and Lazarevic, A. and Tan, P. and Kumar, V. and Srivastava,
J. and Dokas, P.: The MINDS - Minnesota Intrusion Detection System, 2004, <http:
//www-users.cs.umn.edu/~kumar/papers/minds_chapter.pdf> . 2.6.2
[29] McPherson, J. and Ma, K. and Krystosk, P. and Bartoletti, T. and Christensen, M.:
PortVis: a tool for port-based detection of security events, 2004. 3.2
[30] Barr, D.: RFC 1912: Common DNS Operational and Configuration Errors, 1996,
<http://www.ietf.org/rfc/rfc1912.txt> . 5.1.3
[31] Rigney, C. and Willens, S. and Rubens, A. and Simpson, W.: RFC 2865: Remote Au-
thentication Dial In User Service (RADIUS), 2000, <http://www.ietf.org/rfc/
rfc2865.txt> . 4.2.1
[32] Phaal, P. and Panchen, S. and McKee, N.: RFC 3176: InMon Corporation’s sFlow:
A Method for Monitoring Traffic in Switched and Routed Networks, 2001, <http:
//www.ietf.org/rfc/rfc3176.txt> . 2.3.2
[33] Quittek, J. and Zseby, T. and Claise, B. and Zander, S.: RFC 3917: Requirements for IP
Flow Information Export (IPFIX), 2004, <http://www.ietf.org/rfc/rfc3917.
txt> . 2.3.1
[34] Claise, B.: RFC 3954: Cisco Systems NetFlow Services Export Version 9, 2004, <http:
//www.ietf.org/rfc/rfc3954.txt> . 2.3.1
[35] Leinen, S.: RFC 3955: Evaluation of Candidate Protocols for IP Flow Information Ex-
port (IPFIX), 2004, <http://www.ietf.org/rfc/rfc3955.txt> . 2.3.1
[36] Enns, R.: RFC 4741: NETCONF Configuration Protocol, 2006, <http://www.ietf.
org/rfc/rfc4741.txt> . 4.2.1
[37] Claise, B.: RFC 5101: Specification of the IP Flow Information Export (IPFIX) Protocol
for the Exchange of IP Traffic Flow Information, 2008, <http://www.ietf.org/
rfc/rfc5101.txt> . 2.3.1
[38] Claise, B. and Quittek, J. and Bryant, S. and Aitken, P. and Meyer, J.: RFC 5102: In-
formation Model for IP Flow Information Export, 2008, <http://www.ietf.org/
rfc/rfc5102.txt> . 2.3.1
[39] Trammell, B. and Boschi, E.: RFC 5103: Bidirectional Flow Export Using IP Flow Infor-
mation Export (IPFIX), 2008, <http://www.ietf.org/rfc/rfc5103.txt> . 2.3.1
[40] Boschi, E. and Mark, L. and Quittek, J. and Stiemerling, M. and Aitken, P.: RFC 5153: IP
Flow Information Export (IPFIX) Implementation Guidelines, 2008, <http://www.
ietf.org/rfc/rfc5153.txt> . 2.3.1
[41] Postel, J.: RFC 791: Internet Protocol, 1981, <http://www.ietf.org/rfc/
rfc791.txt> . 1
[42] Roesch, M.: Snort – Lightweight Intrusion Detection for Networks, 1999, <http:
//www.usenix.org/event/lisa99/full_papers/roesch/roesch_html/>
. 2.4
[43] Patton, S. and Yurcik, W. and Doss, D.: An Achilles’ Heel in Signature-Based IDS:
Squealing False Positives in SNORT, 2001, <http://www.raid-symposium.org/
raid2001/papers/patton_yurcik_doss_raid2001.pdf> . 2.1, 2.4, 3
[44] Chelli et al., Z.: NIST/SEMATECH e-Handbook of Statistical Methods, 2003, <http:
//www.itl.nist.gov/div898/handbook/> . 2.6.1
[45] Goodall, J. and Lutters, W. and Rheingans, P. and Komlodi, A.: Preserving the
Big Picture: Visual Network Traffic Analysis with TNV, 2005, <http://tnv.
sourceforge.net/papers/goodall-vizsec05.pdf> . 3.2
[46] Kobierský, P. and Kořenek, J. and Hank, A.: CESNET technical report 33/2006: Traffic
Scanner, 2006, <http://www.cesnet.cz/doc/techzpravy/2006/trafscan/>
. 2.4
[47] Yin, X. and Yurcik, W. and Slagell, A.: VisFlowConnect-IP: An Animated Link Analy-
sis Tool For Visualizing Netflows, 2005, <http://www.cert.org/flocon/2005/
presentations/Yin-VisFlowConnect-FloCon2005.pdf> . 3.3
[48] Xu, K. and Zhang, Z. and Bhattacharyya, S.: Profiling Internet Backbone Traffic: Be-
havior Models and Applications, 2005. 2.6.3, 2.6.3
Appendix A
Figure A.1: This figure depicts the time series of the number of TCP flows. A circle denotes a big
difference between the predicted value (the red line) and the observed value (the black line). The
plot was produced by R (http://www.r-project.org/).
Appendix B
The CD Contents
The enclosed CD-ROM contains anonymized NetFlow data in the nfcapd format and shell
scripts that display these data. The scripts show the information relevant to the security in-
cident described in Section 5.2. Two scripts mentioned in Section 4.2.3 are also enclosed.
To summarize, the CD-ROM contains the following files and directories:
• scripts – shell scripts that require nfdump and other system utilities,