MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Security Analysis of a Computer


Network

MASTER'S THESIS

Jan Vykopal

Brno, 2008
Declaration

I hereby declare that I am the sole author of this thesis. All sources and literature used in
this thesis are cited and listed properly.

Advisor: Ing. Jiří Novotný

Acknowledgement

First of all, I would like to thank my advisor, Jiří Novotný, for his guidance and helpful
advice. Many thanks also go to my other colleagues from the Institute of Computer Science.
Last but not least, I am grateful to my family and friends for their love and support. Thank
you all.

Abstract

In this thesis, methods for security analysis at the IP layer are presented and evaluated. The
evaluation focuses mainly on deployment in real high-speed networks. Next, a solution
comprising selected methods is proposed. The goal of this solution is to simplify the work of
a network administrator and speed up security incident response. Finally, the proposed
solution is tested in the campus network of Masaryk University.

Keywords

security analysis, intrusion detection, signature, anomaly, MINDS, entropy, host profiling,
CAMNEP, visualization, NetFlow probe, honeypot, NetFlow collector, MyNetScope

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Methods for Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Intrusion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Intrusion Prevention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Flow-Based Traffic Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3.1 NetFlow and IPFIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3.2 Other Flow-based Technologies . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Signature-based Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.5 Stateful Protocol Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.6 Anomaly-based Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.1 Holt-Winters Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.6.2 Minnesota Intrusion Detection System (MINDS) . . . . . . . . . . . . . 10
2.6.3 The Work of Xu et al. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6.4 Origin Destination Flow Analysis . . . . . . . . . . . . . . . . . . . . . 14
2.6.5 Cooperative Adaptive Mechanism for Network Protection (CAMNEP) 16
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Mapping in Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 Design of the IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Requirements on the IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.2 Detection of Novel Threats . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 Operating in a High-speed Networks . . . . . . . . . . . . . . . . . . . 27
4.1.4 Early Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.5 Long-term Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.6 IPv6 support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.7 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.8 Easy Maintaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.9 Transparency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.10 Security Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1.11 Anomaly Detection in Encrypted Traffic . . . . . . . . . . . . . . . . . . 28
4.1.12 User-friendly Interface and Well-arranged Visualization . . . . . . . . 28
4.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.1 Network Probes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2.2 Collectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.3 MyNetScope and Data Sources . . . . . . . . . . . . . . . . . . . . . . . 33

4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5 Deployment of the Proposed IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1 Deployment status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1.1 Network Probes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1.2 Collectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.1.3 MyNetScope and Data Sources . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 Use Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
A An example of Holt-Winters prediction . . . . . . . . . . . . . . . . . . . . . . . . . 45
B The CD Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Chapter 1

Introduction

Computer networks are all around us. For example, they are essential for effective commu-
nication, sharing knowledge, research and development, modern education, entertainment
and, of course, e-commerce.
The TCP/IP protocol suite is widespread in today's high-speed computer networks.
Perhaps surprisingly, the core of this suite, the Internet Protocol, was published as
Request for Comments 791 [41] as early as 1981. Unlike today, the Internet of the 1980s was a
closed network, so there was no need to consider security in its design. But now we are
exposed to many security threats: denial of service (DoS), scanning, password cracking,
spoofing, eavesdropping, spamming, phishing, worms and others.
As a result, many companies and organizations define a network security policy: a set of
rules that users should follow to avoid, or at least mitigate, the security threats. Technically,
the policy is often implemented by firewalls, intrusion detection and prevention systems
(IDS, IPS) or a virtual private network (VPN). A firewall represents the basic level of defence.
It inspects network traffic passing through it and denies or permits passage based on a set
of rules, a part of the network security policy. Intrusion detection and/or prevention should
fulfill two basic requirements: to identify and/or protect hosts in the administered network
from security threats coming from the Internet or other networks, and vice versa. We point
out that both requirements are important. The network is exposed to attacks from outside
as well as from inside. In addition, the second requirement matters due to the presence of
botnets that exploit "zombie computers" in our network and use them for further malicious
activities. In short, IDSs and IPSs are the "checkpoints" that supervise firewalls and other
components dedicated to network defence.
In this thesis, we focus on the security analysis of large networks such as the campus
network of Masaryk University, which has tens of thousands of users a day and many en-
try points. Nowadays, firewalls are part and parcel of the defence of this network. We
decided to deploy an intrusion detection system to strengthen network security, although it
is intentionally an open, unrestricted academic network. Our goal is to reveal the network
behaviour and examine whether it complies with the defined security policy, mainly imple-
mented by firewalls. Consequently, the security analysis should become easier, supported
by IDS outputs.


1.1 Thesis Outline

This thesis is divided into six chapters. Chapter 1 is this introduction. Selected modern
methods for security analysis, mainly at the IP layer, are described and evaluated in Chapter 2.
Network traffic visualization, an important part of security analysis, is discussed in
Chapter 3. In Chapter 4, requirements for the intrusion detection system and a design that
meets these requirements are presented. Chapter 5 summarizes our experience with system
deployment in the Masaryk University network. Chapter 6 concludes the thesis.

Chapter 2

Methods for Security Analysis

This chapter provides an introduction to intrusion detection and modern methods for
network security analysis. We focus mainly on methods working at the IP layer.
First of all, we explain basic terms related to intrusion detection and traffic acquisition.
Then we describe and evaluate each method, especially according to the following criteria:

1. coverage,

2. effectiveness,

3. performance,

4. applicability for different types of data acquisition,

5. ability of intrusion detection in encrypted traffic.

The first criterion is the ability to detect security threats. The coverage is complete if the
method detects both known and unknown threats. The second criterion stands for detection
accuracy, i. e. the rate of false positives produced by the method. The speed of processing
network traffic, the third criterion, is crucial for deployment in high-speed networks.
The fourth criterion determines whether packet capture and/or (sampled) flow-based data
are suitable as input for the evaluated method. The last criterion is increasingly important
in today's networks.
A basic classification of methods is taken from [12].

2.1 Intrusion Detection

We can divide intrusion detection systems (IDS) into two basic classes according to their
position in the network: host-based and network-based intrusion detection systems [1].
Note that IDSs can be classified from other points of view as well.
Host-based Intrusion Detection This type of detection is performed on a host computer
in the network. A host-based intrusion detection system (HIDS) usually monitors log
files (e. g. firewall logs, web server logs and system logs) and the integrity of system files
(e. g. the kernel integrity or opened ports).
Network-based Intrusion Detection In contrast, the network-based approach observes
the whole network or a part of it. All inbound and outbound network traffic is inspected
for suspicious patterns. A pattern can be represented as a signature, a string of characters
that describes a certain attack. Another approach is anomaly-based detection.
First, a model of normal network behaviour is created. Then the deviation from the model
is evaluated; if it exceeds a predefined value (threshold), it may indicate an attack.
Other network-based intrusion detection systems (NIDS) use stateful protocol analysis to de-
tect suspicious, unexpected or invalid sequences of packets in terms of a specific protocol.
These methods are discussed in detail in the relevant sections of this chapter. NIDS are passive
systems: they are "invisible" to other hosts and, importantly, to attackers.
In connection with IDSs, two terms are frequently mentioned: false positive
and false negative. The former denotes a false IDS alert: the system classifies benign traffic
as malicious. The latter, on the contrary, denotes malicious traffic that was not recog-
nized by the IDS. Naturally, there is a tendency to minimize the numbers of both false positives
and false negatives. For example, if the IDS produces a high false positive rate, it burdens the
administrator with subsequent manual analysis of the alerts. In addition, there are some
techniques, such as squealing, which exploit the vulnerability of IDSs to high false positive
rates. [43]
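The two error types can be made concrete with a small numeric sketch (the counts below are invented for illustration, not taken from the thesis):

```python
# Illustration of IDS error rates (hypothetical counts).
# false positive: benign traffic flagged as malicious
# false negative: malicious traffic not flagged

def error_rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate)."""
    fpr = fp / (fp + tn)   # share of benign events wrongly alerted on
    fnr = fn / (fn + tp)   # share of attacks the IDS missed
    return fpr, fnr

# Example: 90 true alerts, 10 false alerts, 980 benign events passed, 20 attacks missed
fpr, fnr = error_rates(tp=90, fp=10, tn=980, fn=20)
print(round(fpr, 3), round(fnr, 3))  # 0.01 0.182
```

Even a 1% false positive rate can mean hundreds of spurious alerts per day in a large network, which is why the manual-analysis burden mentioned above matters in practice.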

2.2 Intrusion Prevention

In comparison to an IDS, an intrusion prevention system (IPS) is a reactive system in which
the IDS is tightly coupled with a firewall (and should be part of the communication link). The
main task of an IPS is to mitigate (stop) the detected attack. IPSs can be divided into three
classes: host-based, network-based and distributed IPS [18].

2.3 Flow-Based Traffic Acquisition

The classic approach of many IDSs and IPSs to data collection is to capture all network packets
that pass through the system, most frequently in the pcap format1 . In contrast, many routers
and monitoring probes perform flow-based data collection, typically in the NetFlow format.

2.3.1 NetFlow and IPFIX


NetFlow was originally developed by Cisco Systems, the world leader in networking so-
lutions. Many Cisco switches and routers are capable of exporting NetFlow records. There
are two widely used versions: NetFlow version 5 and version 9. The former is Cisco's proprietary
format and the latter was standardized as an open protocol by the IETF in 2006.
A flow is defined as a unidirectional sequence of packets with some common properties
that pass through a network device. These collected flows are exported to an external device,
the NetFlow collector. Network flows are highly granular; for example, flow records include
details such as IP addresses, packet and byte counts, timestamps, Type of Service (ToS),

1. A binary format, native for tools built on the libpcap library, a system-independent interface for user-level
packet capture. The pcap format can be read by tools such as tcpdump, Wireshark, tcpreplay and many others.

application ports, input and output interfaces, etc. [34] Thus, flow-based data collection
provides an aggregated view of network traffic.
IPFIX Continued IETF effort led to the unification of protocols and applications
that require flow-based IP traffic measurement. RFC 3917 defines requirements for export-
ing traffic flow information out of routers, middleboxes (e. g. firewalls, proxies, load bal-
ancers, NATs) or traffic measurement probes for further processing by applications located
on other devices [33]. Consequently, Cisco's NetFlow version 9 was chosen as the basis of
the IP Flow Information Export (IPFIX) protocol. [35] There are no fixed properties (the 5-tuple)
as in NetFlow version 5: the user can flexibly define the properties used for flow distinction.
RFC 5101, published in January 2008, specifies the IPFIX protocol, which serves for trans-
mitting IP traffic flow information over the network [37]. Next, RFC 5102 defines an infor-
mation model for the IPFIX protocol. It is used by the IPFIX protocol for encoding measured
traffic information and information related to the whole process [38]. Thanks to IPFIX's
flexibility, RFC 5103 introduces the term Biflow, a bidirectional flow, and describes an
efficient method for exporting Biflow information using the IPFIX protocol [39]. The bidi-
rectional view of network traffic might be useful for security analysis.
The development of IPFIX is not finished. The IPFIX working group is still working on
a few Internet drafts to be published as RFCs. The most recent RFC was issued in
April 2008; it provides guidelines for the implementation and use of the IPFIX protocol. [40]
Packet sampling is performed (especially by routers) to save the NetFlow exporter's re-
sources. We distinguish two basic types of sampling:
• deterministic – exactly one of every n packets is sampled,
• random – each packet is sampled with probability 1/n.
The constant n is called the sampling rate. For example, if it is set to 4 and the device receives
100 packets, 25 packets are analyzed and 75 packets are dropped from the analysis. Only common
packet header fields are recorded, not the whole payload. Flow sampling is another type
of aggregation.
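The two sampling schemes can be sketched as follows (a simplified software illustration; real exporters typically sample in hardware):

```python
import random

def deterministic_sample(packets, n):
    """Keep exactly every n-th packet (1-in-n deterministic sampling)."""
    return [p for i, p in enumerate(packets, start=1) if i % n == 0]

def random_sample(packets, n, rng=random.Random(0)):
    """Keep each packet independently with probability 1/n."""
    return [p for p in packets if rng.random() < 1.0 / n]

packets = list(range(100))
print(len(deterministic_sample(packets, 4)))  # 25 packets kept, 75 dropped
```

Deterministic sampling yields exactly 25 of 100 packets for n = 4; random sampling yields 25 only on average, but is harder for an attacker to evade by timing packets.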
Both the active and the inactive timeout values affect flow creation. The active timeout
is applied to long-lasting flows. If a flow has been inactive for the inactive timeout, or the
end of the flow is detected, its statistics are exported from the probe to a collector. The
collector is a server dedicated to the collection, long-term storage and analysis of flow statistics.
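A minimal flow cache illustrating both timeouts might look like the following sketch (the timeout values and field names are illustrative, not a NetFlow record layout):

```python
from dataclasses import dataclass

ACTIVE_TIMEOUT = 300.0    # export long-lasting flows at least every 5 minutes
INACTIVE_TIMEOUT = 30.0   # export a flow after 30 seconds of silence

@dataclass
class Flow:
    key: tuple          # (src IP, dst IP, src port, dst port, protocol)
    first_seen: float
    last_seen: float
    packets: int = 0
    bytes: int = 0

class FlowCache:
    def __init__(self):
        self.flows = {}     # active flows, keyed by the 5-tuple
        self.exported = []  # flow records sent to the collector

    def add_packet(self, key, ts, length):
        self.expire(ts)     # expire before updating, so idle gaps are seen
        flow = self.flows.get(key)
        if flow is None:
            flow = self.flows[key] = Flow(key, first_seen=ts, last_seen=ts)
        flow.last_seen = ts
        flow.packets += 1
        flow.bytes += length

    def expire(self, now):
        for key, flow in list(self.flows.items()):
            if (now - flow.last_seen >= INACTIVE_TIMEOUT
                    or now - flow.first_seen >= ACTIVE_TIMEOUT):
                self.exported.append(self.flows.pop(key))

cache = FlowCache()
key = ("10.0.0.1", "10.0.0.2", 12345, 80, "TCP")
cache.add_packet(key, 0.0, 60)
cache.add_packet(key, 40.0, 60)  # 40 s of silence: the first flow is exported
print(len(cache.exported))       # 1
```

The active timeout bounds how stale the collector's view of a long-lived flow can become; the inactive timeout frees cache memory for finished conversations.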

2.3.2 Other Flow-based Technologies


Proprietary Cisco NetFlow and the open IETF standards are not the only flow-based solu-
tions. Another industry standard was described in RFC 3176: sFlow, a technology for
monitoring traffic in data networks containing switches and routers. In particular, it de-
fines the sampling mechanisms implemented in the sFlow Agent for monitoring traffic, the
sFlow MIB2 for controlling the sFlow Agent, and the format of sample data used by the

2. MIB stands for Management Information Base.

sFlow Agent when forwarding data to a central data collector [32]. sFlow is supported by
Alcatel-Lucent, D-Link, Hewlett-Packard, Hitachi and NEC.
Other leaders in networking also develop their own proprietary flow-based solutions:
Juniper Networks uses Jflow and Huawei Technologies NetStream.

2.4 Signature-based Detection

This is one of the oldest methods for security analysis. We mention it here because it is
widely used by many commercial and open-source IDSs.
Description A signature is a pattern that corresponds to a known threat. Signature-based
detection is the process of comparing signatures against observed events to identify possi-
ble incidents. It is the simplest detection method because it just compares the current unit
of activity, such as a packet or a log entry, to a list of signatures using string comparison
operations. [12] In short, the detection works with “local” information.
Evaluation This method is very effective at detecting known threats, but largely inef-
fective at detecting previously unknown threats, threats disguised by the use of evasion
techniques, and many variants of known threats. [12] For example, if the intruder uses the
Unicode representation of the slash character (%c0%af) and the signature contains the slash,
signature-based detection is not successful (a false negative). [1]
Next, we describe an example of a signature. The following string is a simple rule for
the open-source signature-based IDS Snort. [42]

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"WEB-ATTACKS
kill command attempt"; flow:to_server,established; content:"/bin/kill";
nocase; classtype:web-application-attack; sid:1335; rev:5;)
If Snort captures a TCP packet with a source IP address in the external network, any
source port, a destination address and destination port of an HTTP server in the admin-
istered network, and a payload containing the string "/bin/kill", it raises the alert "WEB-ATTACKS
kill command attempt" according to the rule. The rule contains the variables $EXTERNAL_NET,
$HTTP_SERVERS and $HTTP_PORTS. They must be set by the administrator; Snort (or any
signature-based IDS) does not know the particular values. Thus, correct detection depends
on an up-to-date configuration.
The rule above gives an example of a possible false positive. Suppose we provide an e-
mail service with a web interface on our web servers. If someone sends an e-mail contain-
ing the searched string "/bin/kill", Snort classifies such traffic as malicious. The particular net-
work traffic and the rules themselves influence the accuracy of detection: we may observe a very
low false positive rate on dedicated lines and vice versa. There are many ways to create
rules and signatures. We could use any destination port and/or destination host instead of
$HTTP_PORTS and/or $HTTP_SERVERS in our example. Snort and other signature-based
IDSs also allow specifying the searched content as a regular expression; this is useful, e. g.,
when signatures differ only in the protocol used (such as FTP and HTTP). "General" rules are easy
to manage, but can cause a higher false positive rate.
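The false positive and the encoding-evasion scenarios above can be reproduced with a toy content matcher (a sketch only; Snort's detection engine is far more elaborate):

```python
import re

# A toy content rule: alert if the payload contains "/bin/kill",
# case-insensitively, mimicking the "nocase" option of the rule quoted above.
SIGNATURE = re.compile(re.escape("/bin/kill"), re.IGNORECASE)

def matches(payload: str) -> bool:
    return SIGNATURE.search(payload) is not None

attack = "GET /cgi-bin/test.cgi?cmd=/bin/kill%20-9%201 HTTP/1.0"
webmail = "POST /mail/send body=please run /bin/kill 1234 on the server"
evasion = "GET /cgi-bin/test.cgi?cmd=%2fbin%2fkill HTTP/1.0"  # URL-encoded slashes

print(matches(attack))   # True  - real attack detected
print(matches(webmail))  # True  - benign e-mail text triggers a false positive
print(matches(evasion))  # False - encoding evasion causes a false negative
```

The third case illustrates why real IDSs add payload normalization (URL decoding, Unicode handling) before matching; without it, trivially encoded attacks slip through.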


Snort was originally designed for small, lightly utilized networks. [42] The core of
signature-based detection is generally expensive string matching: every packet and its pay-
load is inspected for the searched signatures. Snort usually runs on COTS (commercial off-the-
shelf) hardware, and its performance is not satisfactory for this task in multi-gigabit net-
works. This gap is filled by hardware accelerators such as Traffic Scanner, a hardware-accelerated
IDS based on Field-Programmable Gate Arrays (FPGAs). The system uses an architecture based
on a non-deterministic finite automaton for fast pattern matching. Using this approach, through-
put of up to 3.2 Gbps is achieved for all rules from the Snort database. [46] Hardware accelera-
tion is also interesting for commercial companies. [8]
Although signature-based detection deals mainly with packet payload, some signa-
tures consist of properties acquired by flow-based data collection; limited detection is then
possible. However, if sampling is used, some packets containing signatures may be
lost, so the effectiveness is lower.
We chose Snort as an implementation of signature-based detection for evaluation.
We conclude that the coverage is low, because only known attacks specified by signatures
are revealed by this method. The effectiveness varies with the quality of the signatures; the
risk of false positives is high in common networks, even without the use of IDS
evasion techniques, e. g. squealing [43]. Performance is reasonable for high-speed networks
only if supported by hardware acceleration. The use of flow-based data as input
for this method is limiting. Generally, a signature-based IDS suffers from considerable latency
in the deployment of a brand-new rule (a signature). Last but not least, the
method cannot cope with encrypted payload.

2.5 Stateful Protocol Analysis

Another approach to intrusion detection is stateful protocol analysis, which operates mainly on
the higher layers of the TCP/IP network model. We mention it here for completeness and
comparison.
Description Stateful protocol analysis (alternatively deep packet inspection) is the pro-
cess of comparing predetermined profiles of generally accepted definitions of benign pro-
tocol activity for each protocol state against observed events to identify deviations. Unlike
anomaly-based detection, it relies on vendor-developed universal profiles that specify how
particular protocols should and should not be used. That means that the IDS is capable of
understanding and tracking the state of network, transport, and application protocols that
have a notion of state. [12]
For example, when a user starts a File Transfer Protocol (FTP) session, the session is
initially in the unauthenticated state. Unauthenticated users should only perform a few
commands in this state, such as viewing help information or providing usernames and
passwords. An important part of understanding state is pairing requests with responses,
so when an FTP authentication attempt occurs, the IDS can determine if it was successful
by finding the status code in the corresponding response. Once the user has authenticated
successfully, the session is in the authenticated state, and users are expected to perform any
of several dozen commands. Performing most of these commands while in the unauthenticated
state would be considered suspicious, but in the authenticated state performing most
of them is considered benign. [12]
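The FTP example can be sketched as a small state machine (the command sets are abbreviated and hypothetical; a real analyzer tracks many more states and reply codes):

```python
# Minimal sketch of stateful protocol analysis for FTP.
# States: "unauthenticated" -> "authenticated" after a PASS answered with 230.
ALLOWED = {
    "unauthenticated": {"USER", "PASS", "HELP", "QUIT"},
    "authenticated": {"LIST", "RETR", "STOR", "CWD", "PWD", "HELP", "QUIT"},
}

def analyze(session):
    """Yield (command, verdict) for a list of (command, reply_code) pairs."""
    state = "unauthenticated"
    for command, reply in session:
        verdict = "ok" if command in ALLOWED[state] else "suspicious"
        yield command, verdict
        # Pairing requests with responses: reply 230 means login succeeded.
        if command == "PASS" and reply == 230:
            state = "authenticated"

session = [("USER", 331), ("RETR", 530), ("PASS", 230), ("RETR", 150)]
for command, verdict in analyze(session):
    print(command, verdict)
# RETR before authentication is flagged; the same command after login is benign.
```

The same command thus yields a different verdict depending on the tracked session state, which is exactly what distinguishes this method from stateless signature matching.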
Evaluation Although there are some tools implementing basic stateful protocol analysis
(such as stream43 in Snort), the method is not as widespread as signature-based detec-
tion. We identify the following reasons.
Firstly, it is a very resource-intensive task, particularly in high-speed networks: the com-
plexity of the analysis grows with the number of (simultaneous) sessions4 . Secondly, it relies
on the "knowledge" of all analyzed protocols. Notice there are numerous differences be-
tween implementations by various vendors and the definitions in RFCs and other standards. In
addition, only the analysis of known protocols is possible. Next, attacks (e. g., denial of
service attacks) that use well-formed packets and do not violate normal behaviour
are not detected. Finally, the method is also powerless against encrypted packet payload.
On the other hand, the method generally provides relatively high accuracy. In contrast to
the signature-based method, which searches for known patterns in the packet payload, this method
works with sessions. It can correlate information obtained from the whole session
and provides a better view into the network traffic. Stateful protocol analysis can
also reveal some threats that would be missed by other methods that perform port-based
traffic classification. Last but not least, a limited subset of the analysis can process flows too.

2.6 Anomaly-based Detection

It is the process of comparing definitions of what activity is considered normal against ob-
served events to identify significant deviations. An IDS using anomaly-based detection has
profiles that represent the normal behaviour of such things as users, hosts, network con-
nections, or applications. The profiles are developed by monitoring the characteristics of
typical activity over a period of time. The major benefit of anomaly-based detection meth-
ods is that they can be very effective at detecting previously unknown threats. For example,
suppose that a computer becomes infected with a new type of malware. It will probably
exhibit behaviour significantly different from the established profiles for that
computer. [12]

2.6.1 Holt-Winters Method


This method, also known as triple exponential smoothing, has proven through the years to
be very useful in many forecasting situations. It was first suggested by C. C. Holt in 1957
and was meant to be used for non-seasonal time series showing no trend. He later offered a
procedure (1958) that does handle trends. Winters (1965) generalized the method to include
seasonality, hence the name “Holt-Winters Method”. [44]

3. A Snort preprocessor, actually aimed at the mitigation of "squealing" [43] attacks performed by the stick and
snot tools. See http://cvs.snort.org/viewcvs.cgi/snort/doc/README.stream4.
4. Actually, a session consists of a few flows in terms of NetFlow.


Description Many network service variables form time series that exhibit the following
regularities (characteristics), which should be accounted for by a model:

• A trend over time (i. e., a gradual increase in application daemon requests over a two
month period due to increased subscriber load).

• A seasonal trend or cycle (i. e., every day bytes per second increases in the morning
hours, peaks in the afternoon and declines late at night).

• Seasonal variability (i. e., application requests fluctuate wildly minute by minute dur-
ing the peak hours of 4–8 pm, but at 1 am application requests hardly vary at all).

• Gradual evolution of regularities (1) through (3) over time (i. e., the daily cycle grad-
ually shifts as the number of evening daylight hours increases from December to June).
[3]

Let y_1, …, y_{t−1}, y_t, y_{t+1}, … denote the sequence of values for the time series observed at some
fixed temporal interval. Let m denote the period of the seasonal trend (i. e., the number of
observations per day). Holt-Winters forecasting [17] rests on the premise that the observed
time series can be decomposed into three components: a baseline, a linear trend, and a sea-
sonal effect. The algorithm presumes each of these components evolves over time, which is
accomplished by applying exponential smoothing to incrementally update them.
The prediction is the sum of the three components: ŷ_{t+1} = a_t + b_t + c_{t+1−m}. The update
formulae for the three components, or coefficients a, b, c, are:

• a_t = α(y_t − c_{t−m}) + (1 − α)(a_{t−1} + b_{t−1}), baseline ("intercept"),

• b_t = β(a_t − a_{t−1}) + (1 − β)b_{t−1}, linear trend ("slope"),

• c_t = γ(y_t − a_t) + (1 − γ)c_{t−m}, seasonal trend.

The new estimate of the baseline is the observed value adjusted by the best available esti-
mate of the seasonal coefficient (c_{t−m}). As the updated baseline needs to account for change
due to the linear trend, the predicted slope is added to the baseline coefficient. The new esti-
mate of the slope is simply the difference between the new and the old baseline (as the time
interval between observations is fixed, it is not relevant). The new estimate of the seasonal
component is the difference between the observed value and the corresponding baseline.
α, β, γ are the adaptation parameters of the algorithm, 0 < α, β, γ < 1. Larger values
mean the algorithm adapts faster and predictions reflect recent observations in the time se-
ries; smaller values mean the algorithm adapts slower, placing more weight on the past
history of the time series. [3]
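The three update formulae translate directly into code. The following sketch performs a single additive Holt-Winters step; initialization of the coefficients and parameter tuning are omitted:

```python
def holt_winters_step(y_t, a_prev, b_prev, c_season, alpha, beta, gamma):
    """One additive Holt-Winters update.

    y_t      -- new observation
    a_prev   -- previous baseline a_{t-1}
    b_prev   -- previous slope b_{t-1}
    c_season -- seasonal coefficient c_{t-m} from one period ago
    Returns the updated (a_t, b_t, c_t).
    """
    a_t = alpha * (y_t - c_season) + (1 - alpha) * (a_prev + b_prev)
    b_t = beta * (a_t - a_prev) + (1 - beta) * b_prev
    c_t = gamma * (y_t - a_t) + (1 - gamma) * c_season
    return a_t, b_t, c_t

def predict(a_t, b_t, c_next_season):
    """Forecast for the next step: yhat_{t+1} = a_t + b_t + c_{t+1-m}."""
    return a_t + b_t + c_next_season

# Illustrative values: observation 120 against a baseline of 100,
# slope 2 and a seasonal coefficient of 15.
a, b, c = holt_winters_step(y_t=120.0, a_prev=100.0, b_prev=2.0,
                            c_season=15.0, alpha=0.5, beta=0.1, gamma=0.3)
print(round(predict(a, b, 15.0), 2))  # 120.65
```

Note how alpha = 0.5 splits the new baseline evenly between the deseasonalized observation (120 − 15) and the previous forecast (100 + 2), matching the smoothing interpretation above.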
A simple mechanism to detect an anomaly is to check if an observed value of the time
series falls outside the confidence band. A more robust mechanism is to use a moving win-
dow of a fixed number of observations. If the number of violations (observations that fall
outside the confidence band) exceeds a specified threshold, then trigger an alert for aberrant

behaviour. Define a failure as exceeding a specified number of threshold violations within
a window of a specified number of observations (the window length). See details in [3].
The author outlines this method for network monitoring, but it can be useful for security
analysis too. A continuation of this work is NfSen-HW, an experimental version of NfSen5 by
Gabor Kiss from HUNGARNET6 . He added a form of aberrant behaviour detection based
on the built-in Holt-Winters algorithm of RRDtool. [21]
We note that R, a language and environment for statistical computing and graphics7 ,
provides a function called HoltWinters(). Appendix A gives an example of Holt-Winters
prediction.
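The moving-window failure mechanism can be sketched as follows (the window length and threshold values are illustrative):

```python
from collections import deque

def detect_failures(observed, lower, upper, window=5, threshold=3):
    """Flag time steps where at least `threshold` of the last `window`
    observations fall outside the confidence band [lower, upper]."""
    recent = deque(maxlen=window)   # 1 = violation, 0 = within the band
    alerts = []
    for t, (y, lo, hi) in enumerate(zip(observed, lower, upper)):
        recent.append(0 if lo <= y <= hi else 1)
        if sum(recent) >= threshold:
            alerts.append(t)
    return alerts

observed = [10, 11, 30, 31, 32, 10, 11]
lower = [8] * 7
upper = [14] * 7
print(detect_failures(observed, lower, upper))  # [4, 5, 6]
```

A single outlier never triggers an alert here; only a run of violations does, which is exactly the robustness over the naive single-observation check that the source describes.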
Evaluation The settings of the parameters α, β, γ, the confidence band and the threshold are not
obvious. The model parameters need to be set and tuned for the model to work well. There
is no single optimal set of values, even when restricted to data for a single variable, due
to the interplay between multiple parameters in the model. [3] The author gives some
suggestions, as do, more generally, the authors of [14]. That the fine-tuning of Holt-
Winters analysis is not a trivial task is confirmed by [11]. The settings influence the
false positive rate and, consequently, the accuracy of the method. The training period is
also crucial: we naturally get false positives if malicious activity is considered normal during training.
A one-week training period usually gives satisfactory results.
The coverage is quite good. Because some threats and attacks behave similarly, we can choose a “sensitive” variable of network traffic and then detect even previously unknown threats.
The question of performance is tightly coupled with the data acquisition. Theoretically,
both packet capture and flow-based approach are possible. The latter provides aggregated
information as input for the method. This means that the method itself does not have to process a huge amount of data in (almost) real time; that is handled by the underlying layer of flow-based probes and collectors. The method “only” computes the forecast on the basis of historical data (typically the most recent week [21]). The flow-based approach was chosen for the deployment of this method in the GEANT project [11] (NfSen-HW relies on NetFlow data).

2.6.2 Minnesota Intrusion Detection System (MINDS)


Description The core of MINDS is an anomaly detection technique that assigns a score to
each network connection that reflects how anomalous the connection is, and an association
pattern analysis based module that summarizes those network connections that are ranked
highly anomalous by the anomaly detection module. [27]
The analysis is performed on 10-minute windows of NetFlow data8.

5. NfSen is a graphical web-based front end for the nfdump tools.
6. HUNGARNET is the Hungarian National Research and Education Network.
7. http://www.r-project.org/
8. Filtering is often done to exclude trusted hosts or unusual/anomalous network behaviour that is known to be intrusion free.

Firstly, feature extraction is done. MINDS introduces two types of features derived from standard NetFlow features: time window-based features, computed over connections with similar characteristics in the last T seconds, and connection window-based features, computed over the last N connections originating from (arriving at) distinct sources (destinations). The former obviously do not capture malicious activities (such as stealthy port scans) which last more than T seconds; hence, they are complemented by the latter.
The time window-based features are:

• count-dest, the number of flows to unique destination IP addresses inside the network in the last T seconds from the same source,

• count-src, the number of flows from unique source IP addresses inside the network in the last T seconds to the same destination,

• count-serv-src, the number of flows from the source IP to the same destination port in the last T seconds,

• count-serv-dest, the number of flows to the destination IP address using the same source port in the last T seconds.

The connection window-based feature list follows:

• count-dest-conn, the number of flows to unique destination IP addresses inside the network in the last N flows from the same source,

• count-src-conn, the number of flows from unique source IP addresses inside the network in the last N flows to the same destination,

• count-serv-src-conn, the number of flows from the source IP to the same destination port in the last N flows,

• count-serv-dest-conn, the number of flows to the destination IP address using the same source port in the last N flows. [27]
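As an illustration, three of the eight counts above can be sketched over simplified flow records (a hypothetical sketch: the Flow record and its field names are ours, not taken from the MINDS implementation):

```python
# Hypothetical sketch of MINDS-style feature extraction over simplified
# flow records; only three of the eight features are shown.
from collections import namedtuple

Flow = namedtuple("Flow", "time src dst sport dport")

def count_dest(flows, flow, T):
    """count-dest: unique destination IPs contacted by flow.src in the last T s."""
    return len({f.dst for f in flows
                if f.src == flow.src and flow.time - T <= f.time <= flow.time})

def count_serv_src(flows, flow, T):
    """count-serv-src: flows from flow.src to the same dst port in the last T s."""
    return sum(1 for f in flows
               if f.src == flow.src and f.dport == flow.dport
               and flow.time - T <= f.time <= flow.time)

def count_dest_conn(flows, flow, N):
    """count-dest-conn: unique destination IPs in the last N flows from flow.src."""
    recent = [f for f in flows if f.src == flow.src and f.time <= flow.time][-N:]
    return len({f.dst for f in recent})
```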

Secondly, the data is fed into the MINDS anomaly detection module that uses an outlier
detection algorithm to assign the local outlier factor (LOF) [25], an anomaly score, to each network connection. The outlier factor of a data point is local in the sense that it measures
the degree of being an outlier with respect to its neighbourhood. For each data example,
the density of the neighbourhood is first computed. The LOF of a specific data example p
represents the average of the ratios of the density of the example p and the density of its
neighbours. [27]
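The LOF idea can be illustrated with a simplified sketch that uses plain k-nearest-neighbour densities instead of the reachability distances of the full algorithm in [25]:

```python
# Simplified LOF: local density as the inverse mean distance to the k
# nearest neighbours; LOF is the average neighbour-to-own density ratio.
import math

def _neighbours(points, i, k):
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return order[:k]

def _density(points, i, k):
    # Inverse of the mean distance to the k nearest neighbours.
    nbrs = _neighbours(points, i, k)
    mean = sum(math.dist(points[i], points[j]) for j in nbrs) / k
    return 1.0 / mean if mean > 0 else float("inf")

def lof(points, i, k=3):
    """Average ratio of neighbour density to the density of point i."""
    nbrs = _neighbours(points, i, k)
    return sum(_density(points, j, k) for j in nbrs) / (k * _density(points, i, k))
```

A point deep inside a cluster obtains a LOF close to 1, while an isolated point obtains a much larger value.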
Finally, the MINDS association pattern analysis module summarizes network connec-
tions that are ranked highly anomalous by the anomaly detection module. This module also
uses some signature-based detection techniques. See section 3.5 in [27].
Evaluation MINDS was deployed at the University of Minnesota in August 20029. It has been successful in detecting many novel network attacks and emerging network behaviour that could not be detected using signature-based systems such as Snort. See section 3.4 in [27].

9. Unfortunately, the MINDS source code is not available.
Input to MINDS is NetFlow version 5 data, because the authors admit they currently do
not have the capacity to collect and store data in pcap (tcpdump) format.
LOF requires the neighbourhood around all data points to be constructed. This involves calculating pairwise distances between all data points, an O(n²) process, which makes it computationally infeasible for millions of data points. The authors suggest sampling a small training set from the data and comparing all data points to this set, which reduces the complexity to O(n × m), where n is the size of the data and m is the size of the sample. [27] On the other hand, the effectiveness can be decreased, namely because of the potential presence of threats in the training set.
The coverage is good: the authors claim that the LOF technique also showed great promise in detecting novel intrusions on real network data. [27]

2.6.3 The Work of Xu et al.


Kuai Xu et al. developed a method that applies a combination of data mining and information-theoretic techniques to flows in order to classify them and to build structural models that characterize host and service behaviours with similar patterns. [48]
Description The authors work with a four-dimensional feature space consisting of srcIP,
dstIP, srcPrt and dstPrt. Then clusters of significance along each dimension are extracted.
Each cluster consists of flows with the same feature value in the said dimension. This leads
to four collections of interesting clusters: srcIP, dstIP, srcPrt and dstPrt clusters. The first
two represent a collection of host behaviours while the last two represent a collection of
service behaviours. Clusters with feature values that are distinct in terms of distribution are
considered significant and extracted; this process is repeated until the remaining clusters
appear indistinguishable from each other. This yields a cluster extraction algorithm that automatically adapts to the traffic mix and the feature under consideration. For example, the authors extract 117 srcIP clusters from 89 261 distinct source IP addresses in the trace file used in [48].
The second stage of the methodology is a behaviour classification scheme based on ob-
served similarities/dissimilarities in communication patterns (e.g., does a given source com-
municate with a single destination or with a multitude of destinations?). For every cluster,
an information-theoretic measure of the variability or relative uncertainty RU of each di-
mension except the (fixed) cluster key dimension is computed:
RU(X) := H(X) / Hmax(X) = H(X) / log(min{Nx, m})

Equation 2.6.1: Relative uncertainty

X is a random variable, one feature dimension (srcIP, dstIP, srcPrt or dstPrt) that may
take Nx discrete values, H(X) is the (empirical) entropy of X, Hmax(X) is the maximum entropy of (sampled) X, and m is the sample size, the total number of flows observed during the


time interval. RU(X) lies in the interval [0, 1]. Clearly, if RU(X) = 0, then all observations of X are of the same kind, and vice versa. Consequently, the authors propose simple rules that divide each RU dimension into three categories: 0 (low), 1 (medium) and 2 (high). If we fix one dimension and leave the other three free, we get a three-dimensional vector. The labelling process classifies clusters (the vectors) into 27 (= 3³) possible behaviour classes (BCs for short).
Research on popularity, average size and volatility of BCs shows that a predominant major-
ity of clusters stay in the same BC when they re-appear. Moreover, most of the behaviour
transitions are between “neighbouring” or “akin” BCs. [48]
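The RU computation and the three-way labelling described above can be sketched as follows (the label cut-offs are our illustrative assumptions; [48] derives its dividing rules differently):

```python
# Relative uncertainty of one free feature dimension of a cluster.
import math
from collections import Counter

def relative_uncertainty(values, nx):
    """RU(X) = H(X)/Hmax(X); values are the observed feature values of the
    flows in the cluster, nx is the number of values the feature may take."""
    m = len(values)                      # total flows observed
    counts = Counter(values)
    h = -sum((c / m) * math.log(c / m) for c in counts.values())
    h_max = math.log(min(nx, m))
    return h / h_max if h_max > 0 else 0.0

def ru_label(ru, low=0.2, high=0.8):
    """Map RU to 0 (low), 1 (medium) or 2 (high); cut-offs are illustrative."""
    return 0 if ru < low else (1 if ru <= high else 2)
```

For example, a cluster whose flows all target port 80 gets RU = 0 in the dstPrt dimension, while a scanner touching a fresh port in every flow gets RU close to 1.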
Next, dominant state analysis captures the most common or significant feature values and their interactions. The authors find that clusters within a behaviour class have nearly identical forms of structural models (“simpler” subsets of values or constraints which approximate the original data in their probability distribution). Such a model can also help an analyst because it provides interpretive value for understanding the cluster behaviour.
Finally, the authors identified three “canonical” profiles: server/service behaviour (mostly
providing well-known services), heavy-hitter host behaviour (predominantly associated with
well-known services) and scan/exploit behaviour (frequently manifested by hosts infected
with known worms). These profiles are characterized by BCs they belong to and their prop-
erties, frequency and stability of individual clusters, dominant states and additional at-
tributes such as average flow size in terms of packet and byte counts and their variabil-
ity. [48]
Evaluation Firstly, there is no freely available implementation (in contrast to Snort), hence benchmarking is problematic. However, we expect satisfactory performance because the method was developed for saturated backbone links. In addition, it processes aggregated NetFlow data captured in 5-minute time slots10, not the payload of each packet that goes through in real time.
Another advantage is that this method promises high coverage. The behaviour profiles are built without any presumption on what is normal or anomalous. The method dynamically extracts significant flows. There are no fixed rules applied to a particular flow or packet. A flow (cluster) is marked as an exploit if it belongs to such a profile. What is more, we can observe rare and interesting relationships between clusters and particular flows, which can point to other (unknown) malicious behaviour (e.g., clusters in rare BCs, behavioural changes of clusters and unusual profiles for popular service ports).
The authors did not mention (and actually could not mention) an accuracy evaluation, because they used live network traffic without any knowledge of its structure (mainly the portion of malicious traffic). Last but not least, we note the method relies on port-based traffic classification.

10. The authors set the timeout value to 60 seconds and admit that this is a trade-off between effectiveness and performance.


2.6.4 Origin Destination Flow Analysis

Anukool Lakhina et al. have introduced the Origin-Destination (OD) flow as a basic unit of network traffic. It is the collection of all traffic that enters the network from a common ingress point and departs from a common egress point. [24] They believe that a thorough understanding of OD flows is essential for anomaly detection too. Lakhina et al. differ from other authors in that they perform whole-network traffic analysis: modeling the traffic on all links simultaneously. An OD flow is often a high-dimensional structure (depending on the size of the network), hence the authors utilize a technique called Principal Component Analysis (PCA) to reduce its dimensionality. They found that hundreds of OD flows can be accurately described in time using a few independent dimensions. The following description and evaluation covers two chosen methods based on OD flow analysis (not a single one, as in the previous sections).
Description Volume anomaly detection is based on a separation of the space of traffic
measurements into normal and anomalous subspaces, by means of PCA. [23] The authors
suppose that a typical backbone network is composed of nodes (also called Points of Pres-
ence, or PoPs) that are connected by links. The path followed by each OD flow is determined
by the routing tables. The authors use the term volume anomaly to refer to a sudden (with
respect to timestep used) positive or negative change in an OD flow’s traffic. Because such
an anomaly originates outside the network, it will propagate from the origin PoP to the
destination PoP. OD flow based anomalies are identified by observing link counts.
Firstly, PCA is applied to Y, the t × m measurement matrix, where t denotes the number of time intervals of interest and m the number of links in the network. Thus, the matrix holds the time series of all links. This yields a set of m principal components, {v_i, i = 1, . . . , m}. The first principal component v1 is the vector that points in the direction of maximum variance in Y:
v1 = arg max_{||v|| = 1} ||Yv||

Equation 2.6.2: Computation of the first principal component

||Yv||² is proportional to the variance of the data measured along v. Proceeding iteratively, once the first k − 1 principal components have been determined, the k-th principal component corresponds to the maximum variance of the residual. The residual is the difference between the original data and the data mapped onto the first k − 1 principal axes. [23] The authors validate the thesis in [24] that their link data have low effective dimensionality: the vast majority of the variance in each inspected link is captured by 3 or 4 principal components.
The mapping of the data to principal axis i with normalization to unit length follows.
Such vectors capture the temporal variation common to the entire ensemble of link traffic
timeseries along principal axis i. Since the principal axes are in order of contribution to
overall variance, the first vector captures the strongest temporal trend common to all link
traffic, the second captures the next strongest, and so on.


Next, the vectors (components) are separated by a simple threshold-based method. As soon as a projection is found that exceeds the threshold (e.g., contains a 3σ deviation from the mean), that principal axis and all subsequent axes are assigned to the anomalous subspace. All previous principal axes are then assigned to the normal subspace. [23]
Then we can decompose a set of traffic measurements at a particular point in time into
normal and residual components. The size of the residual component is a measure of the
degree to which the particular measurement is anomalous. Statistical tests can then be for-
mulated to test for unusually large size, based on setting a desired false alarm rate. See
details in [23], section 5.1.
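The subspace separation can be sketched with NumPy as follows (a simplified sketch: principal axes via SVD of the centred link matrix Y, the 3σ test above to split the subspaces, and the residual norm as a per-interval anomaly score; the full statistical test of [23] is more involved):

```python
# PCA subspace method on a t x m link-traffic matrix Y (simplified).
import numpy as np

def subspace_residuals(Y, sigma_mult=3.0):
    Yc = Y - Y.mean(axis=0)                     # centre each link timeseries
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    # Project onto each principal axis; the first axis whose projection
    # contains a > sigma_mult * std deviation starts the anomalous subspace.
    k = Vt.shape[0]
    for i in range(Vt.shape[0]):
        proj = Yc @ Vt[i]
        if np.any(np.abs(proj - proj.mean()) > sigma_mult * proj.std()):
            k = i
            break
    P = Vt[:k].T                                # normal-subspace basis
    residual = Yc - Yc @ P @ P.T                # part not explained by normal axes
    return np.linalg.norm(residual, axis=1)     # anomaly score per time bin
```

A sudden volume change then shows up as an unusually large residual norm in the corresponding time interval.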
Another method, feature entropy detection, is presented in [22]. The method stems from observing changes in distributional aspects of packet header fields, called features. In contrast to the previous method, it can also capture some anomalies that have a minor effect on the traffic volume: worm spreading, stealthy scans or small denial-of-service attacks. Thus, traffic
feature distributions are used there instead of traffic volume. Entropy captures in a single
value the distributional changes in traffic features, and observing the time series of entropy
on multiple features exposes unusual traffic behaviour. [22]
The authors focused on four fields: source address srcIP, destination address dstIP, source
port srcPort and destination port dstPort, but the feature list can be extended. Let X = {n_i, i = 1, . . . , N} be an empirical histogram, meaning that feature value i occurs n_i times in the sample. N is the number of distinct values seen in the sampled set of packets. Then the sample entropy is defined as:
H(X) = − Σ_{i=1}^{N} (n_i/S) log2(n_i/S),   where S = Σ_{i=1}^{N} n_i

Equation 2.6.3: Sample entropy

S is the total number of observations in the histogram. The value of sample entropy lies
in the range [0, log2 N ]. The metric takes on the value 0 when the distribution is maximally
concentrated, i.e., all observations are the same. Sample entropy takes on the value log2 N
when the distribution is maximally dispersed, i.e. n1 = n2 = . . . = nN. [22] This method uses the multiway subspace method, which extends the subspace method described in the previous paragraphs on volume anomalies. In contrast to the subspace method, there are four measurement matrices, one for each traffic feature. An effective way of analyzing multiway data is to recast it into a simpler, single-way representation. The idea behind the multiway subspace method is to “unfold” the multiway matrix into a single, large matrix; then the (simple) subspace method is applied. We refer to [22] for details.
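For illustration, the sample entropy of one traffic feature follows directly from the empirical histogram (a minimal sketch; the input stands in for, e.g., the dstPort values observed in one time bin):

```python
# Sample entropy of one traffic feature, per Equation 2.6.3.
import math
from collections import Counter

def sample_entropy(feature_values):
    """H(X) = -sum_i (n_i/S) log2 (n_i/S) over the empirical histogram."""
    hist = Counter(feature_values)       # n_i for each distinct value
    s = sum(hist.values())               # S, the total number of observations
    return -sum((n / s) * math.log2(n / s) for n in hist.values())
```

Computing this per time bin and per feature yields the entropy time series whose unusual excursions the method flags.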
Evaluation First of all, we emphasize again that the described methods, and OD flow analysis in general, are intended for whole-network traffic analysis. The authors admit it is a difficult objective, amplified by the fact that modeling traffic on a single link is itself a complex task. [22] What is more, whole-network analysis requires data from all nodes (Points of Presence) in the inspected network. This implies traffic acquisition at each such node, which can be infeasible. Next, these methods are designed for backbone networks and their operators. It is not important who (which host) sent an anomalous flow, but where it originated (in which autonomous system). This is a basic, aggregated point of view that is computationally efficient. Although the computational complexity is O(t × m²), we need not cope with hundreds of thousands of flows, but only with tens of OD flows. The exact flows can subsequently be identified “on demand” once we find an anomaly in a particular OD flow; however, this is more expensive.
The authors evaluate the volume anomalies-based method in two backbone networks
(European Sprint and US Abilene). They used NetFlow from routers with periodic sampling
at a rate of 1 out of 250 packets and Juniper’s Traffic Sampling, random sampling at a rate of
1 out of 100 packets, respectively. Packets were aggregated into flows at the network prefix level in 5-minute timeslots. Next, an additional aggregation into 10-minute timeslots was performed (due to time synchronization issues). We find 10 minutes too long (particularly in high-speed networks), because it extends the response time in the case of short-term anomalies.
The feature entropy detection method was also evaluated by the authors in two (different) backbone networks (US Abilene and European Géant). Sampling was used here as well: periodic, at rates of 1 out of 100 packets and 1 out of 1 000 packets, respectively (Géant is larger than Abilene). Data from both networks were captured in 5-minute timeslots.
Due to sampling (mainly at low rates), some flows can be omitted. Thus, sampling can hide some (small) anomalous flows. This idea is supported by [15], another work by Lakhina and other authors. Furthermore, they revealed that anomalies found in unsampled traffic remain “visible” in terms of feature entropy even in sampled traffic. By contrast, volume-based detection relies only on summary counters that can be provided by SNMP11 (e.g., as implemented in routers).
It is important to point out that the methods do not make any a priori assumption about
the nature of network traffic. The classification is done during the computation, so this approach also allows for the detection of unknown threats.
Considering effectiveness, the authors use a confidence limit 1 − α. The limit also directly determines the false positive rate α, so the setting of α is a key issue. Lakhina et al. perform the following evaluation. [22] They employ the methods to detect known anomalies in off-line trace files. The confidence limit was set to 0.995 and 0.999, respectively; this effectively controls how conservative the method is. The evaluation focuses on measuring the detection rate, in other words the true positive rate, which indicates how many of all anomalies were captured.

2.6.5 Cooperative Adaptive Mechanism for Network Protection (CAMNEP)

CAMNEP is an agent-based network IDS. It is not a single method, but a whole system based on several of the methods described above. In spite of that, we mention it here because it is an interesting concept that incorporates modern detection methods and profits from their synergy.

11. SNMP stands for Simple Network Management Protocol.


Description The architecture consists of several layers (see Figure 2.1) with varying
requirements on on-line processing characteristics, level of reasoning and responsiveness.
While the low-level layers need to be optimized to match the high wire-speed during the
network traffic acquisition and preprocessing, the higher layers use the preprocessed data to infer conclusions regarding the degree of anomaly and consequently also the maliciousness of the particular flow or a group of flows. [4]

Figure 2.1: CAMNEP architecture [4] (bottom to top: FlowMon probes produce NetFlow data for the traffic acquisition and preprocessing layer; preprocessed data feed the cooperative threat detection layer, whose agents exchange anomalies and conclusions; suspicious behaviour is reported to the operator and analyst interface layer with the user interface and the Mycroft agent)

The traffic acquisition and preprocessing layer acquires the data from the network using hardware-accelerated NetFlow probes and performs their preprocessing. This approach provides a real-time overview of all connections on the observed link. The preprocessing layer aggregates global and flow statistics to speed up the analysis of the data.
The cooperative threat detection layer consists of specialized, heterogeneous agents that seek to identify anomalies in the preprocessed traffic data by means of their extended trust models. There are four agents that employ detection methods based on MINDS, the work of Xu et al. and the work of Lakhina et al. Note that the agents are not complete implementations of the methods described in the previous sections. The authors chose only those features and ideas that are computationally efficient in near real time and that can be integrated into the whole agent platform. For example, the MINDS agent performs only a simplified observation of the time window-defined features and compares them with historical data to determine the anomaly of each flow.
As a result, each agent determines the anomaly of each flow as a value in the [0, 1] interval, where 1 represents maximal anomaly and 0 no anomaly. The values are shared with the other agents. Each agent integrates these values into its trust model. To preserve computational feasibility, these models work with significant flow samples and their trustfulness in the identity-context space.
Trustfulness is also determined in the [0, 1] interval, where 0 corresponds to complete
distrust and 1 to complete trust. Hence, low trustfulness means that the flow is considered
part of an attack.


The identity of each flow is defined by the features we can observe directly on the flow:
srcIP, dstIP, srcPrt, dstPrt, protocol, number of bytes and packets. If two flows in a data set
share the same values of these parameters, they are assumed to be identical. The context of
each flow is defined by the features that are observed on the other flows in the same data
set, such as the number of similar flows from the same srcIP, or entropy of the dstPrt of
all requests from the same host as the evaluated flow. [4] The identities are the same for all
agents, but the contexts are “agent-specific”.
The anomaly of each flow is used to update the trustfulness of flow samples in its vicinity
in the identity-context space. Each agent uses a distinct distance function, because it has a
different insight into the problem. The cross correlation function is implemented to eliminate
random anomalies.
Finally, each agent determines the trustfulness of each flow, and all agents provide their trustfulness assessments to the aggregation and visualization agents; the aggregated values can then be used for traffic filtering. The authors define the common misclassification errors using the trustfulness and maliciousness of the flow: flows that are malicious yet trusted are denoted as false negatives, and flows that are untrusted but legitimate are denoted as false positives. [4]
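As a toy illustration of the flow identity and the score aggregation (the actual CAMNEP trust models are far more elaborate; the field names and the mean-based aggregation here are our assumptions, not the system's implementation):

```python
# Illustrative sketch: flow identity and per-agent anomaly aggregation.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowIdentity:
    """Flows sharing all of these observable features are assumed identical."""
    src_ip: str
    dst_ip: str
    src_prt: int
    dst_prt: int
    protocol: int
    bytes: int
    packets: int

def trustfulness(agent_anomalies):
    """Collapse per-agent anomaly scores in [0, 1] into one trustfulness
    value in [0, 1] (0 = complete distrust); here simply 1 - mean."""
    return 1.0 - sum(agent_anomalies) / len(agent_anomalies)
```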
The highest layer is the operator and analyst interface layer. Its main component is an intelligent visualization agent that helps the operator analyze the output of the detection layer by putting the processed anomaly information in the context of other relevant information. When the detection layer detects suspicious behaviour on the network, it is reported to the visualization agent.
Evaluation First of all, note that CAMNEP as a whole stands on the incorporated detection methods. One advantage is that the architecture is modular: the agent platform can be extended with other agents implementing other (new) anomaly-based detection methods. The authors argue that the use of a trust model for the integration of several anomaly detection methods, together with an efficient representation of historical data, shall reduce the high rate of false positives which limits the effectiveness of current intrusion detection systems. We participated in the evaluation and testing of the system. Results are also described in [4].
In a nutshell, attacks with more than several hundred flows are consistently discovered by all agents. Slower attacks, using a lower number of flows (300 and fewer), are trickier. Note that the evaluation was performed in a campus network loaded with thousands of flows per second. On the other hand, CAMNEP is not able to detect attacks consisting of a few packets, e.g. a buffer overflow attack.

2.7 Summary

In this chapter, we studied several detection methods for security analysis of a computer network. This is definitely not an exhaustive list of known methods, but a selection of widespread as well as interesting methods and approaches.
We started with the commonly used signature-based method. Although it operates at higher layers than those we focus on, it is useful for comparison with the other methods. Then
we briefly described and evaluated stateful protocol analysis, which extends the previous method in a particular way. Both methods inspect packets, including their payload. Note that this approach can also raise legal issues.
In contrast, the anomaly-based detection methods generally process flows, namely the 5-tuple (srcIP, srcPort, dstIP, dstPort, protocol) constructed from packet headers. This is more efficient, particularly in multi-gigabit networks. On the other hand, flow acquisition is not a simple task, especially for non-dedicated devices such as routers. Due to that fact, packet sampling is used. Unfortunately, it can introduce some inaccuracy. The impact of packet sampling on anomaly detection is discussed in [15]. We think that future work could be aimed at other key features that form the flow; thus, the 5-tuple could be changed and/or extended.
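The flow construction itself can be sketched as follows (a minimal sketch; real probes additionally expire flows using active/inactive timeouts and TCP flags):

```python
# Aggregate packet headers into flows keyed by the 5-tuple.
from collections import defaultdict

def aggregate_flows(packets):
    """Group packets by (srcIP, srcPort, dstIP, dstPort, protocol) and
    accumulate packet and byte counts per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, sport, dst, dport, proto, size in packets:
        key = (src, sport, dst, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)
```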
Another significant contrast between statistical methods and the others is that statistical
methods build behaviour profiles at host and service levels using traffic communication
patterns without any presumption on what is normal or anomalous. However, the “level
of presumption” differs. While the Holt-Winters algorithm builds a model for normal traffic
based on parameter settings and a priori knowledge of the periodic structure in traffic, the
methods proposed by Xu et al. and Lakhina et al. do not rely on any parameter settings and
normal traffic behaviour is captured directly in the data.
Next, the statistical anomaly-based methods have to cope with three basic steps that were
outlined in [23]:

• detection,

• identification,

• quantification.

In the case of the other methods, there is in fact only one step. They simply “know” what they are looking for (e.g., in terms of a signature or a protocol definition); hence, the searched anomaly is identified and its relevance quantified a priori.
Finally, there are a few existing IDSs based on the mentioned methods. Snort is a leading representative of signature-based IDSs and the de facto standard for intrusion detection. It is widespread because it is open-source software. Another commonly used system is Bro [2]. We currently did not find any network-based toolset that implements the anomaly-based detection methods; the one exception is most likely CAMNEP, which validated the selected methods in an environment distinct from the authors' own.

Chapter 3

Visualization

The key problem of the analysis is comprehending the results of the whole process. We can acquire data that (truly) picture the network traffic and process them with various methods. However, if we do not use any data-mining technique, we still have to interpret the results manually. That is quite feasible in a small network, but absolutely inconceivable in high-speed networks, because a human cannot evaluate such a large amount of information. Visualization should help us by presenting significant information in a different and more convenient view.
For example, tcpdump is the most widely used tool for network monitoring and data acquisition. It is a command-line tool that can read packets from a network interface or a data file and displays each packet on a new line of output. In contrast, the network packet analyzer Wireshark1 utilizes a graphical user interface (GUI) and, for instance, “colourizes” the packet display based on filters. In effect, the tool performs classification and presents the results as various colours. We can also interactively browse the captured data and view summary and detailed information for each packet. We confirm that such (small) improvements ease the analysis.
However, visualization is not just the use of colours. In this chapter, we discuss visualization as an integral part of modern security analysis. We outline some ways of visualization in current software tools and evaluate their contribution to accelerating the analysis. We mainly focus on open-source software that visualizes captured network traffic in pcap or NetFlow format. While data in pcap format contain packet headers and the payload, NetFlow records intentionally omit the payload.

3.1 Charts

The basic visualization instrument is a chart. There are many tools extending basic software
that perform only data acquisition. These tools often plot two-dimensional charts that depict
time series of monitored values or their aggregations. It is a simple and thus widespread
method of visualization. For example, NfSen [28] integrates nfdump outputs with various charts
that show time series of total number of packets, flows and traffic volume. See Figure 3.1.
The charts are also used in other tools such as FlowScan2 , Java Netflow Collect-Analyzer3 ,

1. The tool was formerly known as Ethereal (see http://www.ethereal.com/) which still exists as a separate
project.
2. http://net.doit.wisc.edu/~plonka/FlowScan/
3. http://sourceforge.net/projects/jnca/


ntop4 , nfstat5 , NetFlow Monitor6 , Caligare Flow Inspector7 or Stager8 .


Charts are also used in network monitoring. A network administrator can easily look at the appropriate chart and immediately decide whether a network or security anomaly has occurred. In such cases, the relevant curve typically grows or drops sharply.

Figure 3.1: A chart of network traffic volume in NfSen

3.2 Mapping in Space

This visualization technique draws points in a two- or quasi three-dimensional space that is displayed on a screen. It makes use of human stereoscopic vision and “converts” patterns in the captured data into graphic patterns in a defined space.
For instance, The Spinning Cube of Potential Doom is an animated visual display of network traffic. Each axis of the cube represents a different component of a TCP connection: X is the local IP address space, Z is the global IP address space and Y is the port number used in connections to locate services and coordinate communication (such as 22 for SSH and 80 for HTTP). TCP connections, both attempted and successful, are displayed as single points, one per connection. Successful TCP connections are shown as white dots. Incomplete TCP connections are shown as coloured dots; incomplete connections are attempts to communicate with nonexistent systems or systems no longer listening on that particular port number.
The Cube colours incomplete connections using a rainbow colour map with colour varying

4. http://www.ntop.org/
5. http://shlang.com/nfstat/
6. http://netflow.cesnet.cz/
7. http://www.caligare.com/netflow/index.php
8. http://stager.uninett.no/


by port number; the colour mapping assists viewers in locating the point in 3D space. [6] For example, a port scan in captured data creates a line in the cube (see Figure 3.29 ). This is a more useful and efficient view of such an event compared to a manual examination of a tcpdump or even Wireshark output.
An extension of the Cube is InetVis [16]. A similar approach is also used by Flamingo [9]. PortVis [29] and tnv [45] use two-dimensional space instead.

Figure 3.2: Port scan in The GPL Cube of Potential Doom, a GPL network visualizer based
on the Spinning Cube of Potential Doom.

3.3 Graphs

A natural representation of network traffic is a graph where vertices correspond to hosts and (oriented) edges correspond to the communication (flows) captured between the hosts (see Figure 3.3). This structure clearly depicts who communicates with whom. For comparison, a classic output is shown in Figure 3.4.
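As an illustration, the flow-to-graph aggregation can be sketched in a few lines of Python; the flow tuples and field layout here are illustrative assumptions, not the data model of any of the tools discussed in this chapter:

```python
from collections import defaultdict

# Illustrative flow records: (source host, destination host, bytes transferred).
flows = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.1", "10.0.0.2", 300),
    ("10.0.0.1", "10.0.0.3", 40),
    ("10.0.0.4", "10.0.0.1", 800),
]

def build_graph(flows):
    """Aggregate flows into a directed graph: (src, dst) edge -> [flow count, bytes]."""
    graph = defaultdict(lambda: [0, 0])
    for src, dst, size in flows:
        edge = graph[(src, dst)]
        edge[0] += 1      # number of flows on this oriented edge
        edge[1] += size   # total traffic volume on this edge
    return dict(graph)

graph = build_graph(flows)
print(graph[("10.0.0.1", "10.0.0.2")])  # [2, 1800]
```

The aggregated per-edge counters are exactly the statistics a visualizer can map onto edge width or colour.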

9. Cited from http://www.kismetwireless.net/doomcube/ (colours were inverted).


Figure 3.3: Network traffic as a graph

NfVis10 stands for NetFlow Visualizer; it is a proof-of-concept tool based on the prefuse visualization toolkit11 . The graph-based traffic representation is enhanced with several significant features. The user can list the flows and traffic statistics associated with each edge/host. The traffic can be filtered and aggregated according to many relevant features. The visual attributes of the display (such as node/edge size and colour) can also adapt to these characteristics, making the user’s orientation easier. The information provided by “third parties” (DNS and whois) is seamlessly integrated into the visualization. As current network traffic forms a scale-free network, it is particularly important to handle the visualization of supernodes, i.e. nodes with a high number of connections. These nodes are typical for many attack scenarios, as well as for high-value targets. The visualizer therefore replaces the one-shot connections to/from these hosts by a special representation of a “cloud” of traffic, and only singles out the nodes that also connect to other nodes in the observed network. [5]
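The supernode handling can be approximated by a simple peer-count threshold. This is only a sketch of the idea; the threshold value and data layout are assumptions for illustration, not NfVis internals:

```python
from collections import defaultdict

def find_supernodes(edges, threshold=3):
    """Return hosts whose number of distinct peers exceeds the threshold.

    Such nodes would be drawn as a traffic "cloud" instead of
    a bundle of individual one-shot connections.
    """
    peers = defaultdict(set)
    for src, dst in edges:
        peers[src].add(dst)
        peers[dst].add(src)
    return {host for host, p in peers.items() if len(p) > threshold}

# A scanner contacting many hosts becomes a supernode.
edges = [("scanner", f"10.0.0.{i}") for i in range(1, 6)]
edges.append(("10.0.0.1", "10.0.0.2"))
print(find_supernodes(edges))  # {'scanner'}
```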
MyNetScope12 is a network visual analytics platform based on standard NetFlow data and heterogeneous data sources. It extends NfVis in two important ways. Firstly, it can incorporate other external data sources such as DNS resolution, whois responses, outputs of

10. This software was denoted as Visio Agent in the CAMNEP project.
11. http://prefuse.org/
12. http://www.mycroftmind.com/products:mns


Figure 3.4: Network traffic as a listing of flows

various anomaly detection methods and network topology information. Secondly, it is a scalable solution even for large networks. We participate in its development and testing, hence we can confirm these statements. The integration of external data sources is very welcome because a security analyst rarely works only with primary data such as tcpdump outputs or NetFlow records. He or she generally has to gather additional information from other available sources; otherwise, a complete inspection of a security incident is not possible.
We also mention other graph-based visualization tools. VisFlowConnect-IP visualizes network traffic as a parallel axes graph with hosts as nodes and traffic flows as lines connecting these nodes. These graphs can then be animated over time to reveal trends. [47] The Cooperative Association for Internet Data Analysis (CAIDA) develops two interesting tools. LibSea13 is both a file format and a Java library for representing large directed graphs on disk and in memory. Scalability to graphs with as many as one million nodes has been the primary goal; additional goals have been expressiveness, compactness, and support for application-specific conventions and policies. Walrus14 is a tool for interactively visualizing large directed graphs15 in three-dimensional space. By employing a fisheye-like distortion, it provides a display that simultaneously shows local detail and the global context. Although they are not specialized applications for network traffic visualization, it would be useful to

13. http://www.caida.org/tools/visualization/libsea/
14. http://www.caida.org/tools/visualization/walrus/
15. LibSea graph files


combine them for this purpose if there were a tool that provides output in the LibSea format.

3.4 Summary

We explained why visualization is important in security analysis and introduced three techniques and the tools that utilize them. The commonly used charts were subsequently complemented by methods that use mapping in space and a graph representation of network traffic. We also summarized their contribution to the analysis.
Naturally, the progress of visualization tools is connected with the development of tools that acquire and/or process network data. E.g., both tcpdump and Wireshark stand on libpcap, a system-independent interface for user-level packet capture16 . Similarly, NfSen is a graphical web-based front end for the nfdump NetFlow tools, and Walrus stands on LibSea.
We believe that a good visualization tool should display a complex picture of the network traffic, ideally with up-to-date security incidents marked. However, all available details of hosts and their communication should also be displayed on demand in well-arranged tables, charts and listings.

16. See http://www.tcpdump.org for details.

Chapter 4

Design of the IDS

We have described and evaluated several approaches to intrusion detection as well as visualization techniques for network traffic. In this chapter, we take our conclusions into account and discuss the design of an intrusion detection system for large networks. First, we identify and justify the requirements on such an IDS, and then we design a solution that meets these requirements.
First of all, notice that we decided on an intrusion detection system. In contrast to an intrusion prevention system (IPS), it “only” monitors the network traffic and alerts an operator in case of a security incident. Consequently, he or she analyses the incident and eventually ensures its mitigation. If we deployed an IPS and it raised a false positive, it would immediately block a legitimate network connection. Another reason is that an IPS must be in-line (a part of the link); when the IPS fails, the whole network may fail as well. Hence, we are conservative because of the occurrence of false positive alarms and system failure. These are the main reasons for the IDS deployment.

4.1 Requirements on the IDS

4.1.1 Accuracy

Accuracy is a fundamental requirement on any IDS. However, it is very difficult for current systems to meet: they suffer from a high rate of false positives. In addition, there are some IDS evasion techniques such as squealing. Due to these facts, IDSs are not widely accepted and deployed by network administrators. A high false positive rate overwhelms administrators who are busy anyway. On the contrary, false negatives are undetectable in routine operation, so the IDS creates a “false sense of security”.

4.1.2 Detection of Novel Threats

There are now many IDSs capable of detecting known threats, especially signature-based IDSs such as Snort. Their drawback is that the rule base of such an IDS has to be maintained by the network or security administrator. Moreover, novel threats are included in the rule base manually, often by third-party vendors. It is therefore obvious that these systems are powerless against novel threats. The proposed IDS should detect even novel threats by some more efficient detection mechanism.


4.1.3 Operating in High-speed Networks

We request a solution that will operate in multi-gigabit networks. If the data link layer is Ethernet, the IDS should support 1 and even 10 Gigabit Ethernet at wire speed. Note that IEEE is developing 40 and 100 Gigabit Ethernet now.

4.1.4 Early Detection


The IDS should begin detection as soon as a network packet passes through its sensor. The results should be available to the security administrator in (near) real time because some security incidents last only a few minutes, or even a few seconds.

4.1.5 Long-term Data Storage


Besides early detection, the IDS should also provide records of mid-term and long-term data. This is important when a Computer Security Incident Response Team (CSIRT) outside our organization reports a security incident that originated from our network some time ago. If the IDS stores the appropriate records, the security analysis is then easier.

4.1.6 IPv6 support


Although wide IPv6 [19] deployment is not as fast as was expected1 , we require its support. Nowadays, there are many well-secured IPv4 networks whose administrators work on IPv6 deployment. However, they often “forget” about IPv6 network security. Thus, the IDS should operate on both IPv4 and IPv6.

4.1.7 Scalability
The IDS should monitor networks consisting of hundreds as well as thousands of computers. It should be scalable and should not require any additional maintenance when a new host is connected to the network or another host is disconnected or replaced. Again, additional maintenance annoys network administrators.

4.1.8 Easy Maintaining


This requirement is closely connected with scalability. Moreover, we expect that IDS maintenance will not consume too much of a system administrator's time after deployment. Technically, all hardware components should be rack-mountable into a standard 19" rack.

4.1.9 Transparency
The notion of transparency actually comprises two requirements. First, the IDS should be “invisible” at the IP layer. That means we should not assign any IP address to the IDS (except a

1. The first IPv6 specification was issued in 1995 as RFC 1883.


management module). This is required to avoid attacks such as (distributed) denial of service (DoS and DDoS) where the attacker floods the network with packets destined for the IP address of the IDS. Second, the IDS should not markedly influence network topology or network traffic in any way. Namely, latency should be preserved and the IDS should not load network links uselessly.

4.1.10 Security Robustness


It is clear that IDSs attract attackers’ attention. The IDS itself should be invulnerable and robust against security threats. We can prevent some attacks if we meet the previous requirement of transparency at the IP layer. Next, the IDS integrity should remain intact. For instance, if the IDS is composed of several components, their communication could be attacked or eavesdropped. In any case, the security administrator must receive true detection results.

4.1.11 Anomaly Detection in Encrypted Traffic


Many current IDSs fail to detect threats in encrypted network traffic because they rely on payload inspection. The proposed IDS should recognize anomalies even in encrypted traffic because more and more network services use encryption.

4.1.12 User-friendly Interface and Well-arranged Visualization


The last but not least requirement concerns the user interface. If the IDS meets all the previous requirements but the presentation of the results is not well arranged, the IDS is not usable. On the one hand, the interface should be helpful to the user and should offer all available views of the data. On the other hand, it should provide support for repetitive transactions and detailed views. The interface should be personalizable by the user.

4.2 Solution

We decided on a network-based IDS (NIDS) to meet the following requirements:

• Scalability,

• Easy Maintaining,

• Security Robustness.

In contrast to a host-based IDS (HIDS), the deployment of a new host in the network does not demand additional effort to monitor its network activity. There is no need to install any specialized software on the host. Note that the network may consist of some specialized hosts (besides common servers or workstations), on which HIDS installation is impossible. Next, NIDSs are passive devices, “invisible” to attackers. On the contrary, HIDSs rely on processes running in the operating system of the host. We


also consider the deployment, testing and possible upgrade of the IDS. Generally, it is easier to update one component of a NIDS than many components of a HIDS on hosts.
We propose a solution consisting of several components and layers. Network probes are the “eyes and ears” of the proposed intrusion detection system. Collectors are the “memory”, MyNetScope with its data sources is the “brain and heart” and the MyNetScope analyst console acts as the “mouth” of the IDS. The “nervous system and blood circulation” is represented by the network links that connect all parts together. Overall, the architecture (Figure 4.1) is similar to the CAMNEP architecture depicted in Figure 2.1.

Figure 4.1: The architecture of the proposed system

4.2.1 Network Probes

Probes create the bottom layer of our system. They acquire network traffic and serve collec-
tors with captured data. This section discusses probe features and probe deployment in the
administered network.
Data acquisition Network probes monitor the link and export captured data in the NetFlow format. We decided on this format to meet the requirement of operating in multi-gigabit networks. We rejected the use of SNMP counters and packet traces: the former gives coarse-grained data and the latter is very difficult, since it is practically infeasible to capture and store packets at wire speed even with specialized hardware.
We emphasize that we do not rely on the NetFlow data exported by the (edge) Cisco routers that may exist in the present network. Our measurements reveal that Cisco routers do not

export NetFlow correctly in all circumstances. [26] Obviously, the main task of a router is to route network traffic; we must take into account that NetFlow export is an additional feature. On the other hand, the NetFlow data from routers can be a supplemental data source for our system.
Next, we avoid packet sampling due to the possible distortion of the acquired data. Our decision is supported by [15].
We recommend using probes based on COTS (commercial off-the-shelf) computers because of their cost. There are two alternatives of network interface cards (NICs) used in the probes. The former utilizes a common NIC (such as Intel) and the latter relies on the COMBO technology developed in the Liberouter project2. Software probes that capture network traffic via a common NIC (such as nprobe) are not sufficiently efficient. [20] Hence, we deploy FlowMon, a hardware-accelerated passive network monitoring probe. [10] Generally, software probes are satisfactory for small networks, while hardware-accelerated probes suit large, multi-gigabit networks. Both types of probes meet the requirement of transparency since they are “invisible” at the IP layer: there is no IP address assigned to the interface performing packet capture. IPv6 is supported thanks to the use of NetFlow version 9.
Location A network probe monitors the traffic passing through a certain node of the network. Thus, the location of the network probe determines what is monitored. This is very important because the proposed system is based on data provided by the network probes. Ideally, each packet that ingresses or egresses the administered network should pass through the place where the probe is located. We discussed this with the network administrators of the campus network of Masaryk University and identified that the probes should be located “in the neighbourhood” of the edge router, considering the network traffic from/to the Internet.
Figure 4.2 shows the location of the main probe. We were choosing between two alternatives. We suppose that the edge router acts as a firewall too. If we placed the probe in front of the router/firewall, we would also monitor traffic that never enters the administered network. We chose the second alternative: the main probe is located in the administered network, behind the router/firewall. This ensures that the probe “sees” only the traffic that has passed through the firewall. The firewall usually implements (a part of) the security policy of the organization.
As discussed above, we will not insert the probe itself into the network link, but only a network tap, a hardware device which provides a way to access the data flowing across a computer network3. Thus, we actually delegate the responsibility for continuous operation to the tap. If we use a tap that requires a power supply, we should connect it to an uninterruptible power supply (UPS). We should also choose a tap with a dual power supply unit in case of failure.
The main probe is capable of capturing only the attacks that originate from or are destined for the outside of the network. Concerning attacks by insiders, we propose to deploy other probes

2. See http://www.liberouter.org/ for details.


3. Cited from Wikipedia. See http://en.wikipedia.org/w/index.php?title=Network_tap&oldid=197240411.


Figure 4.2: Network probe location

inside our network, especially in front of/behind the firewalls that protect particular network segments. Then we can reveal possible malicious activities of hosts in our network. For instance, Figure 4.3 depicts the deployment of one main probe and three probes inside the administered network. This can be demanded in campus or corporate networks where one segment consists of more sensitive servers than the others, or where the organization is large enough to warrant monitoring network traffic inside the organization. The details of the deployment in the Masaryk University network are discussed in the next chapter.

Figure 4.3: Probes inside the network

Honeypots Besides the NetFlow probes, we propose to deploy honeypots to complement the probes’ functionality. A honeypot is an information system resource whose value lies in unauthorized or illicit use of that resource. [13] We chose a low-interaction honeypot because we want to perform passive rather than active detection. The output of a honeypot should be a list of hosts (from outside and even inside the network) that try to communicate with imaginary hosts in the administered network. Typically, we reserve several unassigned IP


addresses (or the whole subnet) for the honeypot. If it observes a connection attempt to such an address, it logs the host that originated the connection. However, we ought to avoid premature conclusions: consider, for example, a user who types an incorrect IP address, a misconfigured host, and so on.
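The core of this darknet-style detection can be sketched as follows. The subnet and flow format are illustrative assumptions; Honeyd itself operates at the packet level and offers much richer emulation:

```python
import ipaddress

# Hypothetical unassigned subnet reserved for the honeypot.
DARK_SUBNET = ipaddress.ip_network("192.0.2.0/24")

def suspicious_sources(flows):
    """Count connection attempts per source towards unassigned (dark) addresses."""
    hits = {}
    for src, dst in flows:
        if ipaddress.ip_address(dst) in DARK_SUBNET:
            hits[src] = hits.get(src, 0) + 1
    return hits

flows = [
    ("203.0.113.5", "192.0.2.17"),    # probe into the dark subnet
    ("203.0.113.5", "192.0.2.18"),    # sequential scan continues
    ("198.51.100.9", "147.251.1.1"),  # regular traffic, ignored
]
print(suspicious_sources(flows))  # {'203.0.113.5': 2}
```

A host with a single hit may just be a typo; repeated sequential hits are the pattern worth reporting, which matches the caution about premature conclusions above.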
Security Security robustness is very important for devices such as network probes. The probe itself is controlled via a management interface. We use a secure channel (namely SSH) and grant access only from specified IP addresses. We employ an identity management system such as RADIUS [31], which is advantageous in distributed systems because it eliminates synchronization issues. Last but not least, we use NTP4 to synchronize the clocks of computers over the network. Since the probes timestamp the flows using the host time, it is necessary to keep the time precise.
Maintenance and Management Generally, the probes are easy-to-maintain devices. Once placed in the network and set up, they work and fulfill their task. However, if they do not send any data to the collector, we cannot determine whether the monitored link or the probe has failed. Hence, we employ the NETCONF Configuration Protocol [36] over SSH to monitor the probe status.

4.2.2 Collectors
A NetFlow collector is responsible for the correct reception and storage of NetFlow data exported by network probes. To avoid reinventing the wheel, we use existing tools and software that are well tested and widespread. In the case of NetFlow collectors, we rely on the nfdump and NfSen toolset [28]. Our collectors receive and store NetFlow records but also perform some preprocessing tasks such as the periodic execution of scripts that monitor policy violations. The collectors comply with the requirements described above, as do the other parts of the proposed IDS.
Security To meet the security requirements, we specify the IP addresses of probes that are authorized to send NetFlow data to the particular collector. Notice that the collector itself does not restrict the reception of NetFlow records. This can be considered a security threat since NetFlow records are transmitted in UDP packets that can be easily forged. If we do not want to transmit NetFlow records via the same network, we can connect the collectors directly to the probes through a local network and thus considerably improve security. In addition, this lightens the load on busy network links.
Long-term data storage Although NetFlow records are already aggregated (in terms of network flows), they occupy a relatively large amount of disk space. For example, the records that cover one month of network traffic of a large campus network occupy about 240 GB of disk space5. If we do not deploy more probes, we can utilize only one collector. Nevertheless, long-term data storage requires enough space on disk drives.
In addition, it may be required by law. This is regulated by “Vyhláška č. 485/2005 Sb.”6

4. NTP stands for Network Time Protocol. See http://www.ntp.org/ for details.
5. The records are stored in nfcapd format.
6. See http://www.sagit.cz/pages/sbirkatxt.asp?zdroj=sb05485&cd=76&typ=r for details (in Czech).


in the Czech Republic.
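A back-of-the-envelope sketch of the resulting retention period, using the 240 GB per month figure above (the function and its parameters are illustrative):

```python
def retention_months(disk_gb, gb_per_month_per_probe, probes=1):
    """How many whole months of NetFlow records fit on the collector disk."""
    return disk_gb // (gb_per_month_per_probe * probes)

# One collector with a 1 TB drive, one probe producing ~240 GB of records per month.
print(retention_months(1000, 240))     # -> 4 (whole months)
# Adding probes without adding disks shortens the retention accordingly.
print(retention_months(1000, 240, 3))  # -> 1
```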

4.2.3 MyNetScope and Data Sources

In this section, we describe the core of our intrusion detection system. This layer requires data from the collectors and other sources for its operation.
MyNetScope We employ the MyNetScope platform that was briefly described in Section 3.3. It is not a standalone application; it is designed as a client/server architecture. The server reads NetFlow records from the collectors, performs some preprocessing tasks on the flows and replies to the analyst’s queries submitted by the client application (the analyst console). The entire communication between all parts is encrypted; we use SSH tunnels.
CAMNEP MyNetScope itself does not perform intrusion detection. It is a very useful visualization tool that meets the requirements of Section 4.1.12, and its power lies in the integration of external data sources. We decided to deploy part of the CAMNEP project (described and evaluated in Section 2.6.5) as the “brain” of our intrusion detection system. Thus, we can meet the following requirements:

• Accuracy,

• Detection of Novel Threats,

• Operating in High-speed Networks,

• Early Detection,

• Anomaly Detection in Encrypted Traffic.

We use mainly the CAMNEP Cooperative Threat Detection Layer that combines modern intrusion detection methods. In summary, we get better accuracy than if we deployed the particular anomaly detection methods separately. The methods are able to detect novel threats and anomalies, provided that the security anomaly manifests as a network traffic anomaly too. For instance, worm spreading or a denial of service attack is “visible” in network flows. On the contrary, a single packet that causes a buffer overflow on a host computer does not represent a network traffic anomaly. Next, the methods were designed for high-speed networks from the very beginning, or they were modified to meet this requirement. The detection is performed in 5-minute time windows, a reasonable interval given the flow aggregation commonly used in connection with NetFlow. Finally, since the methods work purely with packet headers, the anomaly detection is possible even in the case of encrypted payloads.
The CAMNEP Detection Layer computes the trustfulness of each network flow. This value is then passed to MyNetScope, where the user can view the suspicious flows and query MyNetScope for other relevant information.
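Purely as an illustration of the idea of combining several detectors into one per-flow score (not CAMNEP’s actual aggregation algorithm), trustfulness might be derived like this; all names and values are hypothetical:

```python
def trustfulness(scores):
    """Average the [0, 1] trust scores reported by individual detectors.

    1.0 means fully trusted, 0.0 means highly anomalous.
    """
    return sum(scores) / len(scores)

def suspicious_flows(flow_scores, threshold=0.3):
    """Select flows whose combined trustfulness falls below the threshold."""
    return [flow for flow, scores in flow_scores.items()
            if trustfulness(scores) < threshold]

# Hypothetical scores from three detectors for two flows.
flow_scores = {
    "10.0.0.5 -> 198.51.100.1:22": [0.1, 0.2, 0.15],  # likely SSH scan
    "10.0.0.7 -> 147.251.1.1:80":  [0.9, 0.8, 0.95],  # normal web traffic
}
print(suspicious_flows(flow_scores))  # ['10.0.0.5 -> 198.51.100.1:22']
```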
Other data sources Apart from CAMNEP, we also utilize other data sources such as a DNS server, the whois service and specific scripts that periodically check for policy violations. Their output is then included in MyNetScope too. These scripts are discussed in the next chapter.


4.3 Summary

We identified and explained twelve fundamental requirements on an intrusion detection system for large networks. Then we designed a distributed system that meets these requirements. The system consists of several layers and components:

• NetFlow probes and honeypots,

• collectors,

• CAMNEP and other data sources,

• MyNetScope platform: server and client (analyst console).

Chapter 5

Deployment of the Proposed IDS

We have already begun system deployment and testing in the large campus network of Masaryk University. This chapter summarizes our present experience with the designed system. First, we describe the system deployment status in detail, structuring the description according to Section 4.2. Then we outline a use case and compare a security analysis performed with the help of the designed system with the classic approach.

5.1 Deployment status

5.1.1 Network Probes


First of all, we started with probe deployment and testing. As discussed in Section 4.2.1, we considered various probe locations; we discussed them with the network administrators and tested selected locations. Finally, we decided on the main probe located behind the edge router/firewall and other probes located in front of the firewalls that protect selected subnets (typically faculty subnets). This arises from the organizational structure of the university. The Institute of Computer Science (ICS) is responsible for the development of information and communication technologies at the university. Although the faculties and other departments are to a certain degree autonomous units, they must adhere to rules1 and cooperate with ICS. Therefore, it is useful that such an arrangement of probes can capture policy violations inside the network.
The main probe is temporarily connected to the SPAN port of the edge router (a Cisco Catalyst 7609). We chose a hardware-accelerated FlowMon probe with a 10 Gigabit Ethernet interface. Since we use the SPAN port2, we have to enable packet filtering at the probe; thus, only packets from/to Masaryk University are acquired by the FlowMon probe. We also tested the acquisition of all packets from the SPAN port, but the router serves other heavily loaded international links. This required several times more disk space on the collector. In addition, the probe cannot determine the correct AS3 because the traffic contains packets from all interfaces of the router. Now, the probe processes about 6 TB of data (with 1.2 Gbps spikes) in 200 million flows (with spikes of 4,000 flows per second) every weekday.
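For context, the average rates implied by these daily totals can be computed directly (a rough calculation using the figures above):

```python
def daily_averages(bytes_per_day, flows_per_day):
    """Average bit rate (Mbps) and flow rate (flows/s) implied by per-day totals."""
    seconds = 24 * 60 * 60
    avg_mbps = bytes_per_day * 8 / seconds / 1e6
    avg_fps = flows_per_day / seconds
    return round(avg_mbps), round(avg_fps)

# About 6 TB and 200 million flows per weekday at the main probe.
print(daily_averages(6e12, 200e6))  # (556, 2315)
```

The averages of roughly 556 Mbps and 2,315 flows per second are consistent with the observed spikes of 1.2 Gbps and 4,000 flows per second.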

1. For instance, “Směrnice rektora č. 2/2003, Užívání počítačové sítě Masarykovy univerzity”, see
http://is.muni.cz/do/1499/normy/smernicerektora/Smernice_rektora_2-2003.pdf (in Czech).
2. The router copies all packets that pass through it to this port.
3. AS stands for Autonomous System.


We have recently deployed the second probe. It is located in front of the two routers that connect the Faculty of Informatics to the university backbone. There are two network taps between the backbone routers and the routers of the Faculty, and the probe is connected to the taps. We employ a four-port software FlowMon probe there; according to our measurements, the non-accelerated version of the probe is sufficient. Although we have not yet acquired any data from this probe, we expect the link usage to be lower than in the case of the main probe. Both probes export data in the NetFlow version 9 format.
A honeypot deployment is being prepared. We decided on Honeyd, a small daemon that creates virtual hosts on a network. The hosts can be configured to run arbitrary services, and their personality can be adapted so that they appear to be running certain operating systems. Honeyd enables a single host to claim multiple addresses.4 Network administrators have already assigned the address space for honeypots: we dispose of 254 IPv4 addresses for honeypot operation. Although the address space is unused, we can observe numerous connection requests originating from outside the administered network, so we expect the honeypot to help us with security analysis. We also plan to assign some IPv6 addresses to the honeypot.
In this phase of the deployment, we decided to assign public IP addresses to the probe management interfaces for easier access and maintenance. We will consider the use of private addresses with respect to the security issues in the next phase. Similarly, we have not yet deployed the uniform identity management. However, a firewall (iptables) is running on the probes.
We plan to deploy other probes in possibly interesting locations such as the Faculty of
Education, the Faculty of Science, the University Campus at Bohunice, the Faculty of Law
and students’ hostels.

5.1.2 Collectors
We still use only one PC5 equipped with a 1 TB hard drive. We estimate that this is sufficient to store the NetFlow records from the main probe for about 4 months. We will consider the usage of some data thinning technique, compression, or dedicating a collector to each probe; the results of the data acquisition by the second probe may answer this question. Nevertheless, we have to cope with the trade-off between long-term data storage and the completeness of the records.
The collector is also utilized for preprocessing. The cron daemon6 periodically executes scripts that check for policy violations. The scripts are described in detail in the next subsection. They usually perform tasks that load the collector, and their evaluation lasts some time (typically a few minutes). This is not surprising because they typically process a whole day of data (up to 17 GB). So scheduling and planning become more important when many scripts are involved.

4. http://www.honeyd.org/
5. Intel Xeon 2 GHz CPU, 2 GB of RAM, Linux 2.6.9.
6. See http://unixhelp.ed.ac.uk/CGI/man-cgi?cron+8 for details.


Similarly to the probes, the collector is protected by a firewall and communicates via an assigned public IP address.

5.1.3 MyNetScope and Data Sources


We have designed the system to use the CAMNEP project as the main data source. The probes and the collector are prepared for CAMNEP deployment in the next phase. Now, we focus on MyNetScope: we are responsible for testing the MyNetScope analyst console and for developing the scripts that serve as additional data sources. The MyNetScope analyst console is still under development and in the alpha testing phase. We currently report bugs and suggest improvements of the system to the MyNetScope developers.
We have already deployed two scripts that check selected rules of the security policy. These scripts are in routine operation and their output helps with security analysis. The scripts are executed every night7 on the collector and provide output in two formats. First, plain text files on the web server running on the collector are useful for network administrators. Second, files with rules for the MyNetScope platform make these external data sources accessible in the MyNetScope analyst console. According to the rules, the nodes (hosts) that violate the security policy are “colourized”. In addition, we plan that the user will be able to filter the hosts that violate a particular policy.
Reverse DNS entry policy The first script checks whether all hosts (IPv4 addresses) from the Masaryk University network that communicated the previous day have a valid reverse DNS entry. Every Internet-reachable host should have a name, and many services available on the Internet will refuse to talk to hosts that are not correctly registered in the DNS. For every IP address, there should be a matching PTR record in the in-addr.arpa domain [30]. Examples of the effects of missing reverse mapping are described in [7].
The script utilizes the nfdump tool. It filters all communicating hosts from the Masaryk University network and saves the output to a temporary text file. Next, all IP addresses are passed as parameters to the DNS lookup utility host8. If the lookup fails, the relevant IP address is logged. Network administrators can then inform the administrators responsible for such hosts. Although the execution time varies, the script takes up to approximately 10 minutes on a weekday.
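The logic of this check can be sketched in portable shell. The sketch below is not the thesis's actual script: the nfdump invocation, data path, and network prefix in the comment are illustrative assumptions, and the lookup command is injected as a parameter so it can be stubbed out.

```shell
#!/bin/sh
# Report IP addresses that lack a reverse DNS (PTR) entry.
# The lookup command is passed as $1 so that `host` can be replaced by a
# stub in tests; the remaining arguments are the IP addresses to check.
report_missing_ptr() {
    lookup="$1"; shift
    for ip in "$@"; do
        # `host <ip>` exits non-zero when the reverse lookup fails
        "$lookup" "$ip" >/dev/null 2>&1 || echo "$ip"
    done
}

# Assumed invocation: extract yesterday's distinct source addresses with
# nfdump (path and prefix are illustrative only), then check them:
#   ips=$(nfdump -R /data/nfcapd -o 'fmt:%sa' 'src net 147.251.0.0/16' | sort -u)
#   report_missing_ptr host $ips
```

Injecting the lookup command also makes it easy to swap in a different resolver utility, such as dig, without touching the loop.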
SMTP traffic policy The second script checks for anomalies in the SMTP traffic on TCP port 25. It is not permitted to send e-mails to SMTP servers outside the Masaryk University network, except for several well-known servers. We also monitor which hosts in the administered network behave as SMTP servers, due to their possible participation in spam campaigns.
The script logs all hosts inside the network that communicated via TCP port 25, excluding replies to port-scanning attempts. We take into account only the flows that contain packets with the TCP flags SYN, ACK and FIN. That means we are interested in TCP connections where both the 3-way and the 4-way handshake occurred. The former is used for the connection

7. We discussed the time interval with network administrators, and they found it reasonable to execute the scripts once a day.
8. See http://unixhelp.ed.ac.uk/CGI/man-cgi?host for details.


establishment and the latter for its termination.


Again, we use nfdump with a relevant filter to obtain the interesting hosts. The script execution takes about 5 minutes; we point out that it processes a whole day of data.
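The flag screening described above can be sketched as a small shell filter. The field position and the nfdump filter syntax shown in the comment are assumptions for illustration, not the thesis's actual script.

```shell
#!/bin/sh
# Keep only flow records whose TCP flag field contains SYN, ACK and FIN,
# i.e. connections where both the handshake and the teardown were observed.
# Input: one flow per line; $1 is the (assumed) field number of the flag
# string, which nfdump renders like ".AP.SF".
completed_connections() {
    awk -v col="$1" '
        $col ~ /S/ && $col ~ /A/ && $col ~ /F/ { print }
    '
}

# Assumed nfdump invocation (filter syntax and path are illustrative):
#   nfdump -R /data/nfcapd 'net 147.251.0.0/16 and proto tcp and port 25' \
#       -o 'fmt:%ts %sa %da %flg' | completed_connections 4
```

Flows carrying only a SYN (e.g. replies to port scans) fail the ACK/FIN tests and are dropped, which matches the policy of logging only established SMTP connections.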

5.2 Use Case

Although not all parts of the system have been deployed yet, we can already use some of its components for security analyses of the Masaryk University network, namely the main NetFlow probe and the NetFlow collector. We describe a use case of the system in this section. In April 2008, the Masaryk University received a warning about a phishing scam from the Security Incident Response Team (SIRT) of Internet Identity9.
Phishing is an attempt to criminally and fraudulently acquire sensitive information, such as usernames, passwords and credit card details, by masquerading as a trustworthy entity in an electronic communication10. SIRT found that a computer administered by the Masaryk University acted as a web hosting server for a forged website of an American bank. Network administrators confirmed this finding. Consequently, they disconnected the host from the network and informed us. We had to investigate this security incident in three ways:
1. to validate the findings of SIRT,

2. to determine whether the phishing attack was successful,

3. to find out who was responsible for the attack.


Apart from the information provided by the host administrator, we inspected NetFlow records. First of all, we identified a profile of the host. We set a filter for the destination IP address of the host and filtered out all TCP flows that contained only the SYN TCP flag. We found out that the host was accessed via secure shell. The administrator confirmed that the host had been reserved for development and that the presence of a web server was very suspicious. They also reported that the attacker had changed the superuser password.
Second, we validated that the host acted as a web server: it had replied to requests on TCP port 80, which is reserved for web traffic. In addition, we could determine exactly when the server had replied for the first time. In total, we observed 54 distinct hosts (IP addresses) that communicated with the attacked host. Hence, we fulfilled the first and the second point.
Finally, we investigated the origin of the forged website. We supposed that the host had been exposed to an SSH brute-force attack. Consequently, we inspected the network traffic on TCP port 22, which is reserved for SSH, before the web server had been set up. We found an extreme growth in the number of flows in a short time, which points to an SSH brute-force attack: since each login attempt is performed from a new source port, it is considered a new flow in terms of NetFlow. We identified a host that was responsible for too

9. http://www.internetidentity.com
10. Cited from Wikipedia. See http://en.wikipedia.org/w/index.php?title=Phishing&oldid=211566316 for details.


many flows. Thus we fulfilled the third point as well and closed the investigation of the security incident. We enclose a CD-ROM containing all data relevant to this incident (see Appendix B for the CD contents).
After some time, the administrator provided us with a disk image of the entire drive of the attacked host. We found entries in the log files that confirmed our findings. Of course, we could have investigated the incident without our system, by inspecting only the system log files. However, the logs or the whole host are not always available; consider, for instance, an advanced attacker who deletes the log files.
We emphasize that we used only the two lower layers of the designed system: the FlowMon probe and the NfSen collector. After the CAMNEP deployment, the system will automatically determine a list of hosts (flows) with low trustfulness. In addition, the MyNetScope platform visualizes the traffic as a graph, a natural picture of network traffic.

5.3 Summary

We described the status of the development of the designed system, focusing on our own work: testing system components and developing and integrating additional data sources into the whole system (e.g., scripts that check the organization's security policy). Although some parts of the system are still under development, we were able to use it to investigate a security incident with satisfactory results.

Chapter 6

Conclusion

The goal of this thesis was to design a system that simplifies the security analysis of large networks. First of all, we studied the state of the art in intrusion detection and prevention. We focused on modern methods that operate at the IP layer, since they are efficient in high-speed gigabit networks. In contrast, stateful protocol analysis and signature-based detection performed at higher layers of the TCP/IP model are resource-demanding tasks. Hence, some statistical methods do not inspect whole packets but only packet headers. They operate on NetFlow data acquired from routers (typically Cisco devices) or on packet traces that are later “converted” into network flows. Although these methods work only with packet headers, they are able to detect some anomalies in network behaviour.
Next, we identified and explained essential requirements for an intrusion detection system. Then we designed a distributed system that meets these requirements. The system consists of several components: we combined some existing subsystems and have been developing an integration platform. We employed hardware-accelerated NetFlow probes, honeypots, NetFlow collectors, the MyNetScope platform, and other data sources such as DNS, whois and the output of other scripts that (pre)process the acquired data.
We note that about fifteen people are involved in this long-term and dynamic project. We contributed to the system development by testing particular components and by writing example scripts that check some of the organization's security rules. These scripts are in routine operation, and we can easily validate adherence to the rules. We also tested a part of the system on the investigation of a security incident that was reported by a third party. As a result, we identified a host that had attacked a computer at the Masaryk University. The attacker changed the superuser password and ran a forged website to acquire the usernames and passwords of a bank's clients.
Finally, we suggest that future work could be aimed at developing a new detection method based on new directions in data acquisition. Namely, the use of the IPFIX format would make interesting features in the packet payload accessible to anomaly-based detection methods; currently, we are bound by the 5-tuple of the NetFlow format. A closer integration of other data sources, such as honeypots, would also be valuable.

Bibliography

[1] Northcutt, S. and Frederick, K. and Winters, S. and Zeltser, L. and Ritchey, R.: Inside
Network Perimeter Security: The Definitive Guide to Firewalls, VPNs, Routers, and
Intrusion Detection Systems, New Rider’s Publishing, 2003, 978-0735712324. 2.1, 2.4

[2] Paxson, V.: Bro: A System for Detecting Network Intruders in Real-Time, 1999, <http://www.icir.org/vern/papers/bro-CN99.html> . 2.7

[3] Brutlag, J.: Aberrant Behaviour Detection in Time Series for Network Monitoring, 2000, <http://www.usenix.org/events/lisa00/full_papers/brutlag/brutlag_html/index.html> . 2.6.1

[4] Rehák, M. and Pěchouček, M. and Bartoš, K. and Grill, M. and Čeleda, P. and Krmíček, V.: CAMNEP: An intrusion detection system for high-speed networks, 2008, <http://www.nii.ac.jp/pi/n5/5_65.pdf> . 2.6.5, 2.1, 2.6.5

[5] Rehák, M. and Pěchouček, M. and Čeleda, P. and Krmíček, V. and Novotný, J. and Minařík, P.: CAMNEP: Agent-Based Network Intrusion Detection System (Short Paper), 2008. 3.3

[6] Lau, S.: The Spinning Cube of Potential Doom, 2004. 3.2

[7] Senie, D. and Sullivan, A.: Considerations for the use of DNS Reverse Mapping, 2008, <http://www.ietf.org/internet-drafts/draft-ietf-dnsop-reverse-mapping-considerations-06.txt> . 5.1.3

[8] Graham, I.: Achieving Zero-loss Multi-gigabit IDS Results from Testing Snort on Endace Accelerated Multi-CPU Platforms, 2006, <http://www.touchbriefings.com/pdf/2259/graham.pdf> . 2.4

[9] Oberheide, J. and Goff, M. and Karir, M.: Flamingo: Visualizing Internet Traffic, 2006. 3.2

[10] Čeleda, P. and Kováčik, M. and Koníř, T. and Krmíček, V. and Žádník, M.: CESNET technical report number 31/2006: FlowMon Probe, 2006, <http://www.cesnet.cz/doc/techzpravy/2006/flowmon-probe/flowmon-probe.pdf> . 4.2.1

[11] Malagon, C. and Molina, M. and Schuurman, J.: Deliverable DJ2.2.4: Findings of the Advanced Anomaly Detection Pilot, 6. 9. 2007, <http://www.geant2.net/upload/pdf/GN2-07-218v2-DJ2-2-4_Findings_of_the_Advanced_Anomaly_Detetion_Pilot.pdf> . 2.6.1

[12] Scarfone, K. and Mell, P.: Guide to Intrusion Detection and Prevention Systems (IDPS), 2007, <http://csrc.nist.gov/publications/nistpubs/800-94/SP800-94.pdf> . 2, 2.4, 2.5, 2.6

[13] Spitzner, L.: Honeypots, 2003, <http://www.tracking-hackers.com/papers/honeypots.html> . 4.2.1

[14] Chatfield, C. and Yar, M.: Holt-Winters Forecasting: Some Practical Issues, 1988. 2.6.1

[15] Brauckhoff, D. and Tellenbach, B. and Wagner, A. and Lakhina, A. and May, M.: Impact of Packet Sampling on Anomaly Detection Metrics, 2006, <http://cs-people.bu.edu/anukool/pubs/anomalymetrics-sampling-imc06.pdf> . 2.6.4, 2.7, 4.2.1

[16] van Riel, J. and Irwin, B.: InetVis, a visual tool for network telescope traffic analysis, 2006, <http://www.cs.ru.ac.za/research/g02v2468/publications/vanRiel-Afrigraph2006.pdf> . 3.2

[17] Brockwell, P. and Davis, R.: Introduction to Time Series and Forecasting, Second Edition, 2002, Springer-Verlag New York, Inc., 0-387-95351-5. 2.6.1

[18] Zhang, X. and Li, C. and Zheng, W.: Intrusion Prevention System Design, 2004. 2.2

[19] Deering, S. and Hinden, R.: RFC 2460: Internet Protocol, Version 6 (IPv6) Specification, 1998, <http://www.ietf.org/rfc/rfc2460.txt> . 4.1.6

[20] Ivanko, J.: One-way Throughput Test - 20070715-F-0001, 2007, <http://www.liberouter.org/flowmon/reports/report-20070715-F-0001.pdf> . 4.2.1

[21] Kiss, G.: NfSen-HW project site, 2006, <http://bakacsin.ki.iif.hu/~kissg/project/nfsen-hw/> . 2.6.1

[22] Lakhina, A. and Crovella, M. and Diot, C.: Mining Anomalies Using Traffic Feature Distributions, 2005, <http://cs-people.bu.edu/anukool/pubs/sigc05-mining-anomalies.pdf> . 2.6.4, 2.6.4

[23] Lakhina, A. and Crovella, M. and Diot, C.: Diagnosing Network-Wide Traffic Anomalies, 2004, <http://cs-people.bu.edu/anukool/pubs/subspacemethod-sigc04.pdf> . 2.6.4, 2.6.4, 2.7

[24] Lakhina, A. and Papagiannaki, K. and Crovella, M. and Diot, C. and Kolaczyk, E. and Taft, N.: Structural Analysis of Network Traffic Flows, 2004, <http://cs-people.bu.edu/anukool/pubs/odflows-sigm04.pdf> . 2.6.4, 2.6.4

[25] Breunig, M. and Kriegel, H. and Ng, R. and Sander, J.: LOF: Identifying Density-Based Local Outliers, 2000, <http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/LOF.pdf> . 2.6.2

[26] Sommer, R. and Feldmann, A.: NetFlow: Information loss or win?, 2002. 4.2.1

[27] Ertöz, L. and Eilertson, E. and Lazarevic, A. and Tan, P. and Kumar, V. and Srivastava, J. and Dokas, P.: The MINDS - Minnesota Intrusion Detection System, 2004, <http://www-users.cs.umn.edu/~kumar/papers/minds_chapter.pdf> . 2.6.2

[28] Haag, P.: NfSen, 2007, <http://nfsen.sourceforge.net/> . 3.1, 4.2.2

[29] McPherson, J. and Ma, K. and Krystosk, P. and Bartoletti, T. and Christensen, M.: PortVis: a tool for port-based detection of security events, 2004. 3.2

[30] Barr, D.: RFC 1912: Common DNS Operational and Configuration Errors, 1996, <http://www.ietf.org/rfc/rfc1912.txt> . 5.1.3

[31] Rigney, C. and Willens, S. and Rubens, A. and Simpson, W.: RFC 2865: Remote Authentication Dial In User Service (RADIUS), 2000, <http://www.ietf.org/rfc/rfc2865.txt> . 4.2.1

[32] Phaal, P. and Panchen, S. and McKee, N.: RFC 3176: InMon Corporation's sFlow: A Method for Monitoring Traffic in Switched and Routed Networks, 2001, <http://www.ietf.org/rfc/rfc3176.txt> . 2.3.2

[33] Quittek, J. and Zseby, T. and Claise, B. and Zander, S.: RFC 3917: Requirements for IP Flow Information Export (IPFIX), 2004, <http://www.ietf.org/rfc/rfc3917.txt> . 2.3.1

[34] Claise, B.: RFC 3954: Cisco Systems NetFlow Services Export Version 9, 2004, <http://www.ietf.org/rfc/rfc3954.txt> . 2.3.1

[35] Leinen, S.: RFC 3955: Evaluation of Candidate Protocols for IP Flow Information Export (IPFIX), 2004, <http://www.ietf.org/rfc/rfc3955.txt> . 2.3.1

[36] Enns, R.: RFC 4741: NETCONF Configuration Protocol, 2006, <http://www.ietf.org/rfc/rfc4741.txt> . 4.2.1

[37] Claise, B.: RFC 5101: Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information, 2008, <http://www.ietf.org/rfc/rfc5101.txt> . 2.3.1

[38] Claise, B. and Quittek, J. and Bryant, S. and Aitken, P. and Meyer, J.: RFC 5102: Information Model for IP Flow Information Export, 2008, <http://www.ietf.org/rfc/rfc5102.txt> . 2.3.1

[39] Trammell, B. and Boschi, E.: RFC 5103: Bidirectional Flow Export Using IP Flow Information Export (IPFIX), 2008, <http://www.ietf.org/rfc/rfc5103.txt> . 2.3.1

[40] Boschi, E. and Mark, L. and Quittek, J. and Stiemerling, M. and Aitken, P.: RFC 5153: IP Flow Information Export (IPFIX) Implementation Guidelines, 2008, <http://www.ietf.org/rfc/rfc5153.txt> . 2.3.1

[41] Postel, J.: RFC 791: Internet Protocol, 1981, <http://www.ietf.org/rfc/rfc791.txt> . 1

[42] Roesch, M.: Snort – Lightweight Intrusion Detection for Networks, 1999, <http://www.usenix.org/event/lisa99/full_papers/roesch/roesch_html/> . 2.4

[43] Patton, S. and Yurcik, W. and Doss, D.: An Achilles' Heel in Signature-Based IDS: Squealing False Positives in SNORT, 2001, <http://www.raid-symposium.org/raid2001/papers/patton_yurcik_doss_raid2001.pdf> . 2.1, 2.4, 3

[44] Chelli et al., Z.: NIST/SEMATECH e-Handbook of Statistical Methods, 2003, <http://www.itl.nist.gov/div898/handbook/> . 2.6.1

[45] Goodall, J. and Lutters, W. and Rheingans, P. and Komlodi, A.: Preserving the Big Picture: Visual Network Traffic Analysis with TNV, 2005, <http://tnv.sourceforge.net/papers/goodall-vizsec05.pdf> . 3.2

[46] Kobierský, P. and Kořenek, J. and Hank, A.: CESNET technical report 33/2006: Traffic Scanner, 2006, <http://www.cesnet.cz/doc/techzpravy/2006/trafscan/> . 2.4

[47] Yin, X. and Yurcik, W. and Slagell, A.: VisFlowConnect-IP: An Animated Link Analysis Tool For Visualizing Netflows, 2005, <http://www.cert.org/flocon/2005/presentations/Yin-VisFlowConnect-FloCon2005.pdf> . 3.3

[48] Xu, K. and Zhang, Z. and Bhattacharyya, S.: Profiling Internet Backbone Traffic: Behaviour Models and Applications, 2005. 2.6.3, 2.6.3

Appendix A

An example of Holt-Winters prediction

Figure A.1: Time series of the number of TCP flows. A circle denotes a big difference between the predicted value (the red line) and the observed value (the black line). The plot was produced by R (http://www.r-project.org/).

Appendix B

The CD Contents

The enclosed CD-ROM contains anonymized NetFlow data in nfcapd format and shell scripts that display these data. The scripts show information relevant to the security incident described in Section 5.2. Two scripts mentioned in Section 4.2.3 are also enclosed.
To summarize, the CD-ROM contains the following files and directories:

• data – NetFlow data in nfcapd format,

• scripts – shell scripts that require nfdump and other system utilities,

• README – the CD contents.

