
Anomaly Detection of Web-based Attacks

Christopher Kruegel Giovanni Vigna


Reliable Software Group
University of California, Santa Barbara
Santa Barbara, CA 93106

ABSTRACT

Web-based vulnerabilities represent a substantial portion of the security exposures of computer networks. In order to detect known web-based attacks, misuse detection systems are equipped with a large number of signatures. Unfortunately, it is difficult to keep up with the daily disclosure of web-related vulnerabilities, and, in addition, vulnerabilities may be introduced by installation-specific web-based applications. Therefore, misuse detection systems should be complemented with anomaly detection systems. This paper presents an intrusion detection system that uses a number of different anomaly detection techniques to detect attacks against web servers and web-based applications. The system correlates the server-side programs referenced by client queries with the parameters contained in these queries. The application-specific characteristics of the parameters allow the system to perform focused analysis and produce a reduced number of false positives. The system automatically derives the parameter profiles associated with web applications (e.g., the length and structure of parameters) from the analyzed data. Therefore, it can be deployed in very different application environments without having to perform time-consuming tuning and configuration.

Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection

General Terms
Security

Keywords
Anomaly Detection, World-Wide Web, Network Security

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CCS'03, October 27-31, 2003, Washington, DC, USA.
Copyright 2003 ACM 1-58113-738-9/03/0010 ...$5.00.

1. INTRODUCTION

Web servers and web-based applications are popular attack targets. Web servers are usually accessible through corporate firewalls, and web-based applications are often developed without following a sound security methodology. Attacks that exploit web servers or server extensions (e.g., programs invoked through the Common Gateway Interface [7] and Active Server Pages [22]) represent a substantial portion of the total number of vulnerabilities. For example, in the period between April 2001 and March 2002, web-related vulnerabilities accounted for 23% of the total number of vulnerabilities disclosed [34]. In addition, the large installation base makes both web applications and servers a privileged target for worm programs that exploit web-related vulnerabilities to spread across networks [5].

To detect web-based attacks, intrusion detection systems (IDSs) are configured with a number of signatures that support the detection of known attacks. For example, at the time of writing, Snort 2.0 [28] devotes 868 of its 1931 signatures to the detection of web-related attacks. Unfortunately, it is hard to keep intrusion detection signature sets updated with respect to the large numbers of vulnerabilities discovered daily. In addition, vulnerabilities may be introduced by custom web-based applications developed in-house. Developing ad hoc signatures to detect attacks against these applications is a time-intensive and error-prone activity that requires substantial security expertise.

To overcome these issues, misuse detection systems should be complemented with anomaly detection systems, which support the detection of new attacks. In addition, anomaly detection systems can be trained to detect attacks against custom-developed web-based applications. Unfortunately, to the best of our knowledge, there are no available anomaly detection systems tailored to detect attacks against web servers and web-based applications.

This paper presents an anomaly detection system that detects web-based attacks using a number of different techniques. The anomaly detection system takes as input web server log files that conform to the Common Log Format and produces an anomaly score for each web request. More precisely, the analysis techniques used by the tool take advantage of the particular structure of HTTP queries [11] that contain parameters. The parameters of the queries are compared with established profiles that are specific to the program or active document being referenced. This approach supports a more focused analysis with respect to generic anomaly detection techniques that do not take into account the specific program being invoked.

This paper is structured as follows. Section 2 presents related work on the detection of web-based attacks and anomaly detection in general. Section 3 describes an abstract model for the data analyzed by our intrusion detection system. Section 4 presents the anomaly detection techniques used. Section 5 contains the experimental evaluation of the system with respect to real-world data and discusses the results obtained so far and the limitations of the approach. Finally, Section 6 draws conclusions and outlines future work.
2. RELATED WORK

Anomaly detection relies on models of the intended behavior of users and applications and interprets deviations from this 'normal' behavior as evidence of malicious activity [10, 17, 13, 19]. This approach is complementary with respect to misuse detection, where a number of attack descriptions (usually in the form of signatures) are matched against the stream of audited events, looking for evidence that one of the modeled attacks is occurring [14, 25, 23].

A basic assumption underlying anomaly detection is that attack patterns differ from normal behavior. In addition, anomaly detection assumes that this 'difference' can be expressed quantitatively. Under these assumptions, many techniques have been proposed to analyze different data streams, such as data mining for network traffic [21], statistical analysis for audit records [16], and sequence analysis for operating system calls [12].

Of particular relevance to the work described here are techniques that learn the detection parameters from the analyzed data. For instance, the framework developed by Lee et al. [20] provides guidelines to extract features that are useful for building intrusion classification models. The approach uses labeled data to derive the best set of features to be used in intrusion detection.

The approach described in this paper is similar to Lee's because it relies on a set of selected features to perform both classification and link analysis on the data. On the other hand, the approach is different because it does not rely on the labeling of attacks in the training data in order to derive either the features or the threshold values used for detection. The learning process is purely based on past data, as, for example, in [18].

3. DATA MODEL

Our anomaly detection approach analyzes HTTP requests as logged by most common web servers (for example, Apache [2]). More specifically, the analysis focuses on GET requests that use parameters to pass values to server-side programs or active documents. Neither the header data of GET requests nor POST/HEAD requests are taken into account. Note, however, that it is straightforward to include the parameters of these requests. This is planned for future work.

More formally, the input to the detection process consists of an ordered set U = {u1, u2, ..., um} of URIs extracted from successful GET requests, that is, requests whose return code is greater than or equal to 200 and less than 300.

A URI ui can be expressed as the composition of the path to the desired resource (pathi), an optional path information component (pinfoi), and an optional query string (q). The query string is used to pass parameters to the referenced resource and is identified by a leading '?' character. A query string consists of an ordered list of n pairs of parameters (or attributes) with their corresponding values. That is, q = (a1, v1), (a2, v2), ..., (an, vn), where ai ∈ A, the set of all attributes, and vi is a string. The set Sq is defined as the subset {aj, ..., ak} of attributes of query q. Figure 1 shows an example of an entry from a web server log and the corresponding elements that are used in the analysis. For this example query q, Sq = {a1, a2}.

169.229.60.105 - johndoe [6/Nov/2002:23:59:59 -0800] "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122
(path = /scripts/access.pl, a1 = user, v1 = johndoe, a2 = cred, v2 = admin)

Figure 1: Sample Web Server Access Log Entry

The analysis process focuses on the association between programs, parameters, and their values. URIs that do not contain a query string are irrelevant, and, therefore, they are removed from U. In addition, the set of URIs U is partitioned into subsets Ur according to the resource path. Therefore, each referred program r is assigned a set of corresponding queries Ur. The anomaly detection algorithms are run on each set of queries Ur independently. This means that the modeling and the detection process are performed separately for each program r.

In the following text, the term 'request' refers only to requests with queries. Also, the terms 'parameter' and 'attribute' of a query are used interchangeably.
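As an illustration of this data model, the following sketch (our code, not part of the paper's system; the helper names are ours) extracts successful GET queries from Common Log Format lines like the one in Figure 1 and partitions them into the sets Ur by resource path:

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Matches the request and status fields of a Common Log Format entry.
CLF_REQUEST = re.compile(r'"GET (?P<uri>\S+)[^"]*" (?P<status>\d{3})')

def partition_queries(log_lines):
    """Group successful GET queries by resource path r, yielding the sets U_r.

    Returns a dict mapping each server-side program's path to the list of
    its queries, where a query is an ordered list of (attribute, value) pairs.
    """
    u_r = defaultdict(list)
    for line in log_lines:
        match = CLF_REQUEST.search(line)
        if not match:
            continue
        # Only successful requests (2xx return codes) are used.
        if not 200 <= int(match.group('status')) < 300:
            continue
        parts = urlsplit(match.group('uri'))
        if not parts.query:            # URIs without a query string are dropped
            continue
        u_r[parts.path].append(parse_qsl(parts.query, keep_blank_values=True))
    return u_r

entry = ('169.229.60.105 - johndoe [6/Nov/2002:23:59:59 -0800] '
         '"GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122')
print(dict(partition_queries([entry])))
# {'/scripts/access.pl': [[('user', 'johndoe'), ('cred', 'admin')]]}
```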
4. DETECTION MODELS

The anomaly detection process uses a number of different models to identify anomalous entries within a set of input requests Ur associated with a program r. A model is a set of procedures used to evaluate a certain feature of a query attribute (e.g., the string length of an attribute value) or a certain feature of the query as a whole (e.g., the presence and absence of a particular attribute). Each model is associated with an attribute (or a set of attributes) of a program by means of a profile. Consider, for example, the string length model for the username attribute of a login program. In this case, the profile for the string length model captures the 'normal' string length of the username attribute of the login program.

The task of a model is to assign a probability value to either a query or one of the query's attributes. This probability value reflects the probability of the occurrence of the given feature value with regard to an established profile. The assumption is that feature values with a sufficiently low probability (i.e., abnormal values) indicate a potential attack.

Based on the model outputs (i.e., the probability values of the query and its individual attributes), a decision is made - that is, the query is either reported as a potential attack or as normal. This decision is reached by calculating an anomaly score individually for each query attribute and for the query as a whole. When one or more anomaly scores (either for the query or for one of its attributes) exceed the detection threshold determined during the training phase (see below), the whole query is marked as anomalous. This is necessary to prevent attackers from hiding a single malicious attribute in a query with many 'normal' attributes.

The anomaly scores for a query and its attributes are derived from the probability values returned by the corresponding models that are associated with the query or one of the attributes. The anomaly score value is calculated using a weighted sum as shown in Equation 1. In this equation, wm represents the weight associated with model m, while pm is its returned probability value. The probability pm is subtracted from 1 because a value close to zero indicates an anomalous event that should yield a high anomaly score.

    Anomaly Score = Σ_{m ∈ Models} wm · (1 − pm)        (1)

A model can operate in one of two modes, training or detection. The training phase is required to determine the characteristics of normal events (that is, the profile of a feature according to a specific model) and to establish anomaly score thresholds to distinguish between regular and anomalous inputs.
This phase is divided into two steps. During the first step, the system creates profiles for each server-side program and its attributes. During the second step, suitable thresholds are established. This is done by evaluating queries and their attributes using the profiles created during the previous step. For each program and its attributes, the highest anomaly score is stored, and the threshold is then set to a value that is a certain, adjustable percentage higher than this maximum. The default setting for this percentage (also used for our experiments) is 10%. By modifying this value, the user can adjust the sensitivity of the system and trade off the number of false positives against the expected detection accuracy. The length of the training phase (i.e., the number of queries and attributes that are utilized to establish the profiles and the thresholds) is determined by an adjustable parameter.

Once the profiles have been created - that is, the models have learned the characteristics of normal events and suitable thresholds have been derived - the system switches to detection mode. In this mode, anomaly scores are calculated and anomalous queries are reported.

The following sections describe the algorithms that analyze the features that are considered relevant for detecting malicious activity. For each algorithm, an explanation of the model creation process (i.e., the learning phase) is included. In addition, the mechanism to derive a probability value p for a new input element (i.e., the detection phase) is discussed.
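To make the scoring and thresholding concrete, here is a minimal sketch (our illustration, not the paper's implementation; the paper does not specify the model weights, so equal weights are assumed):

```python
def anomaly_score(model_probs, weights):
    """Equation 1: weighted sum of (1 - p_m) over all models m.

    model_probs maps a model name to the probability p_m it returned;
    weights maps the same names to w_m (equal weights are an assumption).
    """
    return sum(weights[m] * (1.0 - p) for m, p in model_probs.items())

def detection_threshold(training_scores, margin=0.10):
    """Threshold = highest anomaly score seen in training, plus an
    adjustable percentage (default 10%, as in the paper's experiments)."""
    return max(training_scores) * (1.0 + margin)

# Example: three models with equal weights; anything scoring more than
# 10% above the training maximum is reported as anomalous.
weights = {'length': 1.0, 'char_dist': 1.0, 'structure': 1.0}
training = [anomaly_score(p, weights) for p in (
    {'length': 0.9, 'char_dist': 0.8, 'structure': 1.0},
    {'length': 0.7, 'char_dist': 0.9, 'structure': 1.0},
)]
threshold = detection_threshold(training)        # 0.4 * 1.1 = 0.44
suspect = anomaly_score({'length': 0.05, 'char_dist': 0.2, 'structure': 0.0},
                        weights)                 # 2.75
print(suspect > threshold)                       # True: reported as anomalous
```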
4.1 Attribute Length

In many cases, the length of a query attribute can be used to detect anomalous requests. Usually, parameters are either fixed-size tokens (such as session identifiers) or short strings derived from human input (such as fields in an HTML form). Therefore, the length of the parameter values does not vary much between requests associated with a certain program. The situation may look different when malicious input is passed to the program. For example, to overflow a buffer in a target application, it is necessary to ship the shell code and additional padding, depending on the length of the target buffer. As a consequence, the attribute can contain up to several hundred bytes.

The goal of this model is to approximate the actual but unknown distribution of the parameter lengths and detect instances that significantly deviate from the observed normal behavior. Clearly, we cannot expect that the probability density function of the underlying real distribution will follow a smooth curve. We also have to assume that the distribution has a large variance. Nevertheless, the model should be able to identify significant deviations.

4.1.1 Learning

We approximate the mean µ̂ and the variance σ̂² of the real attribute length distribution by calculating the sample mean µ and the sample variance σ² for the lengths l1, l2, ..., ln of the parameters processed during the learning phase (assuming that n queries with this attribute were processed).

4.1.2 Detection

Given the estimated query attribute length distribution with parameters µ and σ² as determined by the previous learning phase, it is the task of the detection phase to assess the regularity of a parameter with length l.

The probability of l can be calculated using the Chebyshev inequality shown below.

    p(|x − µ| > t) < σ² / t²        (2)

The Chebyshev inequality puts an upper bound on the probability that the difference between the value of a random variable x and µ exceeds a certain threshold t, for an arbitrary distribution with variance σ² and mean µ. This upper bound is strict and has the advantage that it does not assume a certain underlying distribution. We substitute the threshold t with the distance between the attribute length l and the mean µ of the attribute length distribution (i.e., |l − µ|). This allows us to obtain an upper bound on the probability that the length of the parameter deviates more from the mean than the current instance. The resulting probability value p(l) for an attribute with length l is calculated as shown below.

    p(|x − µ| > |l − µ|) < p(l) = σ² / (l − µ)²        (3)

This is the value returned by the model when operating in detection mode. The Chebyshev inequality is independent of the underlying distribution and its computed bound is, in general, very weak. Applied to our model, this weak bound results in a high degree of tolerance to deviations of attribute lengths given an empirical mean and variance. Although such a property is undesirable in many situations, by using this technique only obvious outliers are flagged as suspicious, leading to a reduced number of false alarms.
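A compact sketch of this model follows (our code; class and variable names are illustrative). It learns the sample mean and variance and returns the Chebyshev bound of Equation 3, capped at 1 since the bound can exceed 1 for small deviations:

```python
import statistics

class AttributeLengthModel:
    """Length model sketch: learn sample mean/variance, then bound the
    probability of a deviation with the Chebyshev inequality (Eq. 2-3)."""

    def __init__(self):
        self.lengths = []

    def train(self, value):
        self.lengths.append(len(value))

    def probability(self, value):
        mu = statistics.mean(self.lengths)
        var = statistics.pvariance(self.lengths)
        deviation = (len(value) - mu) ** 2
        if deviation == 0:
            return 1.0                       # exactly at the mean: normal
        # p(|x - mu| > |l - mu|) < sigma^2 / (l - mu)^2, capped at 1.
        return min(1.0, var / deviation)

model = AttributeLengthModel()
for v in ['johndoe', 'alice', 'bob92', 'cmiller', 'susan']:
    model.train(v)
print(model.probability('miller8'))      # ~0.67: within normal variation
print(model.probability('A' * 500))      # ~4e-6: buffer-overflow-sized outlier
```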
4.2 Attribute Character Distribution

The attribute character distribution model captures the concept of a 'normal' or 'regular' query parameter by looking at its character distribution. The approach is based on the observation that attributes have a regular structure, are mostly human-readable, and almost always contain only printable characters.

A large percentage of characters in such attributes are drawn from a small subset of the 256 possible 8-bit values (mainly from letters, numbers, and a few special characters). As in English text, the characters are not uniformly distributed, but occur with different frequencies. Obviously, it cannot be expected that the frequency distribution is identical to that of standard English text. Even the frequency of a certain character (e.g., the frequency of the letter 'e') varies considerably between different attributes. Nevertheless, there are similarities between the character frequencies of query parameters. This becomes apparent when the relative frequencies of all possible 256 characters are sorted in descending order.
The algorithm is based only on the frequency values themselves and does not rely on the distributions of particular characters. That is, it does not matter whether the character with the most occurrences is an 'a' or a '/'. In the following, the sorted, relative character frequencies of an attribute are called its character distribution.

For example, consider the parameter string 'passwd' with the corresponding ASCII values of '112 97 115 115 119 100'. The absolute frequency distribution is 2 for 115 and 1 for the four others. When these absolute counts are transformed into sorted, relative frequencies (i.e., the character distribution), the resulting values are 0.33, 0.17, 0.17, 0.17, 0.17, followed by 0 occurring 251 times.

For an attribute of a legitimate query, one can expect that the relative frequencies slowly decrease in value. In the case of malicious input, however, the frequencies can drop extremely fast (because of a peak caused by a single character with a very high frequency) or hardly at all (in the case of random values).

The character distribution of an attribute that is perfectly normal (i.e., non-anomalous) is called the attribute's idealized character distribution (ICD). The idealized character distribution is a discrete distribution ICD : D → P with D = {n ∈ N | 0 ≤ n ≤ 255}, P = {p ∈ R | 0 ≤ p ≤ 1}, and Σ_{i=0}^{255} ICD(i) = 1.0. The relative frequency of the character that occurs n-most often (0-most denoting the maximum) is given as ICD(n). When the character distribution of the sample parameter 'passwd' is interpreted as the idealized character distribution, then ICD(0) = 0.33 and ICD(1) to ICD(4) are equal to 0.17.

In contrast to signature-based approaches, this model has the advantage that it cannot be evaded by some well-known attempts to hide malicious code inside a string. In fact, signature-based systems often contain rules that raise an alarm when long sequences of 0x90 bytes (the nop operation in Intel x86-based architectures) are detected in a packet. An intruder may substitute these sequences with instructions that have a similar behavior (e.g., add rA,rA,0, which adds 0 to the value in register A and stores the result back in A). By doing this, it is possible to prevent signature-based systems from detecting the attack. Such sequences, nonetheless, cause a distortion of the attribute's character distribution, and, therefore, the character distribution analysis still yields a high anomaly score. In addition, characters in malicious input are sometimes disguised by xor'ing them with constants or shifting them by a fixed value (e.g., using the ROT-13 code). In this case, the payload only contains a small routine in clear text that has the task of decrypting and launching the primary attack code. These evasion attempts do not change the resulting character distribution, and the anomaly score of the analyzed query parameter is unaffected.

4.2.1 Learning

The idealized character distribution is determined during the training phase. For each observed query attribute, its character distribution is stored. The idealized character distribution is then approximated by calculating the average of all stored character distributions. This is done by setting ICD(n) to the mean of the nth entry of the stored character distributions, ∀n : 0 ≤ n ≤ 255. Because all individual character distributions sum up to unity, their average will do so as well, and the idealized character distribution is well-defined.

4.2.2 Detection

Given an idealized character distribution ICD, the task of the detection phase is to determine the probability that the character distribution of a query attribute is an actual sample drawn from its ICD. This probability, or more precisely, the confidence in the hypothesis that the character distribution is a sample from the idealized character distribution, is calculated by a statistical test.

This test should yield a high confidence in the correctness of the hypothesis for normal (i.e., non-anomalous) attributes while it should reject anomalous ones. The detection algorithm uses a variant of the Pearson χ²-test as a 'goodness-of-fit' test [4].

For the intended statistical calculations, it is not necessary to operate on all values of ICD directly. Instead, it is enough to consider a small number of intervals, or bins. For example, assume that the domain of ICD is divided into six segments as shown in Table 1. Although the choice of six bins is somewhat arbitrary¹, it has no significant impact on the results.

Segment     0    1      2      3       4        5
x-Values    0    1-3    4-6    7-11    12-15    16-255

Table 1: Bins for the χ²-test

The expected relative frequency of characters in a segment can be easily determined by adding the values of ICD for the corresponding x-values. Because the relative frequencies are sorted in descending order, it can be expected that the values of ICD(x) are more significant for the anomaly score when x is small. This fact is clearly reflected in the division of ICD's domain.

When a new query attribute is analyzed, the number of occurrences of each character in the string is determined. Afterward, the values are sorted in descending order and combined according to Table 1 by aggregating values that belong to the same segment. The χ²-test is then used to calculate the probability that the given sample has been drawn from the idealized character distribution. The standard test requires the following steps to be performed.

1. Calculate the observed and expected frequencies - The observed values Oi (one for each bin) are already given. The expected number of occurrences Ei are calculated by multiplying the relative frequencies of each of the six bins as determined by the ICD times the length of the attribute (i.e., the length of the string).

2. Compute the χ²-value as χ² = Σ_{i=0}^{5} (Oi − Ei)² / Ei - note that i ranges over all six bins.

3. Determine the degrees of freedom and obtain the significance - The degrees of freedom for the χ²-test are identical to the number of addends in the formula above minus one, which yields five for the six bins used. The actual probability p that the sample is derived from the idealized character distribution (that is, its significance) is read from a predefined table using the χ²-value as index.

¹ The number six seems to have a particular relevance to the field of anomaly detection [32].
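The following sketch (ours) follows the three steps above. It assumes that bins with a zero expected count may be skipped, and it reads the significance from a coarse predefined table of standard χ² critical values for five degrees of freedom, as the paper suggests:

```python
from collections import Counter

BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]   # Table 1

# Coarse chi-square table for 5 degrees of freedom: (critical value, p).
CHI2_TABLE_DF5 = [(1.145, 0.95), (1.610, 0.90), (2.675, 0.75), (4.351, 0.50),
                  (6.626, 0.25), (9.236, 0.10), (11.070, 0.05), (15.086, 0.01)]

def char_distribution(value):
    """Sorted, relative frequencies of all 256 byte values (descending)."""
    counts = Counter(value.encode('latin-1'))
    freqs = sorted((counts.get(b, 0) for b in range(256)), reverse=True)
    return [f / len(value) for f in freqs]

def learn_icd(training_values):
    """ICD(n) = mean of the n-th sorted relative frequency over training."""
    dists = [char_distribution(v) for v in training_values]
    return [sum(d[n] for d in dists) / len(dists) for n in range(256)]

def binned(dist):
    """Aggregate a 256-entry distribution into the six bins of Table 1."""
    return [sum(dist[lo:hi + 1]) for lo, hi in BINS]

def probability(value, icd):
    observed = [f * len(value) for f in binned(char_distribution(value))]
    expected = [f * len(value) for f in binned(icd)]
    # Bins with a zero expected count are skipped in this simplified sketch.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    for critical, p in CHI2_TABLE_DF5:
        if chi2 <= critical:
            return p
    return 0.0

icd = learn_icd(['passwd', 'secret', 'letmein', 'password1'])
print(probability('hunter2', icd))       # ordinary-looking string: high p
print(probability('\x90' * 40, icd))     # nop-sled-like input: p near 0
```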
The derived value p is used as the return value for this model. When the probability that the sample is drawn from the idealized character distribution increases, p increases as well.

4.3 Structural Inference

Often, the manifestation of an exploit is immediately visible in query attributes as unusually long parameters or parameters that contain repetitions of non-printable characters. Such anomalies are easily identifiable by the two mechanisms explained before.

There are situations, however, when an attacker is able to craft her attack in a manner that makes its manifestation appear more regular. For example, non-printable characters can be replaced by groups of printable characters. In such situations, we need a more detailed model of the query attribute that contains the evidence of the attack. This model can be acquired by analyzing the parameter's structure. For our purposes, the structure of a parameter is the regular grammar that describes all of its normal, legitimate values.

4.3.1 Learning

When structural inference is applied to a query attribute, the resulting grammar must be able to produce at least all training examples. Unfortunately, there is no unique grammar that can be derived from a set of input elements. When no negative examples are given (i.e., elements that should not be derivable from the grammar), it is always possible to create either a grammar that contains exactly the training data or a grammar that allows production of arbitrary strings. The first case is a form of over-simplification, as the resulting grammar is only able to derive the learned input without providing any level of abstraction. This means that no new information is deduced. The second case is a form of over-generalization because the grammar is capable of producing all possible strings, but there is no structural information left.

The basic approach used for our structural inference is to generalize the grammar as long as it seems to be 'reasonable' and stop before too much structural information is lost. The notion of 'reasonable generalization' is specified with the help of Markov models and Bayesian probability.

In a first step, we consider the set of training items (i.e., query attributes stored during the training phase) as the output of a probabilistic grammar. A probabilistic grammar is a grammar that assigns probabilities to each of its productions. This means that some words are more likely to be produced than others, which fits well with the evidence gathered from query parameters. Some values appear more often, and this is important information that should not be lost in the modeling step.

A probabilistic regular grammar can be transformed into a non-deterministic finite automaton (NFA). Each state S of the automaton has a set of nS possible output symbols o, which are emitted with a probability of pS(o). Each transition t is marked with a probability p(t) that characterizes the likelihood that the transition is taken. An automaton that has probabilities associated with its symbol emissions and its transitions can also be considered a Markov model.

The output of the Markov model consists of all paths from its start state to its terminal state. A probability value can be assigned to each output word w (that is, a sequence of output symbols o1, o2, ..., ok). This probability value (as shown in Equation 4) is calculated as the sum of the probabilities of all distinct paths through the automaton that produce w. The probability of a single path is the product of the probabilities of the emitted symbols pSi(oi) and the taken transitions p(ti). The probabilities of all possible output words w sum up to 1.

    p(w) = p(o1, o2, ..., ok) = Σ_{paths p for w} Π_{states Si ∈ p} pSi(oi) · p(ti)        (4)

[Figure 2: Markov Model Example - an NFA with probabilistic symbol emissions and transitions between a start and a terminal state]

For example, consider the NFA in Figure 2. To calculate the probability of the word 'ab', one has to sum the probabilities of the two possible paths (one that follows the left arrow and one that follows the right one). The start state emits no symbol and has a probability of 1. Following Equation 4, the result is

    p(w) = (1.0 · 0.3 · 0.5 · 0.2 · 0.5 · 0.4) + (1.0 · 0.7 · 1.0 · 1.0 · 1.0 · 1.0) = 0.706        (5)

The target of the structural inference process is to find an NFA that has the highest likelihood for the given training elements. An excellent technique to derive a Markov model from empirical data is explained in [30]. It uses Bayes' theorem to state this goal as

    p(Model|TrainingData) = p(TrainingData|Model) · p(Model) / p(TrainingData)        (6)

The probability of the training data is considered a scaling factor in Equation 6 and it is subsequently ignored. As we are interested in maximizing the a posteriori probability (i.e., the left-hand side of the equation), we have to maximize the product shown in the numerator on the right-hand side of the equation. The first term - the probability of the training data given the model - can be calculated for a certain automaton (i.e., for a certain model) by adding the probabilities calculated for each input training element as discussed above. The second term - the prior probability of the model - is not as straightforward. It has to reflect the fact that, in general, smaller models are preferred. The model probability is calculated heuristically and takes into account the total number of states N as well as the number of transitions Σ S_trans and emissions Σ S_emit at each state S.
This is justified by the fact that smaller models can be described with fewer states as well as fewer emissions and transitions. The actual value is derived as shown in Equation 7.

    p(Model) = Π_{S ∈ States} 1 / ((N + 1)^{Σ S_trans} · (N + 1)^{Σ S_emit})        (7)

The term that is maximized - the product of the probability of the training data given the model, times the prior probability of the model itself - reflects the intuitive idea that there is a conflict between simple models that tend to over-generalize and models that perfectly fit the data but are too complex. Models that are too simple have a high model probability, but the likelihood of producing the training data is extremely low. This yields a small product after both terms are multiplied. Models that are too complex have a high likelihood of producing the training data (up to 1 when the model only contains the training input without any abstractions), but the probability of the model itself is very low. By maximizing the product, the Bayesian model induction approach creates automata that generalize enough to reflect the general structure of the input without discarding too much information.

The model building process starts with an automaton that exactly reflects the input data and then gradually merges states. This state merging is continued until the a posteriori probability no longer increases. There are a number of optimizations, such as the Viterbi path approximation and the path prefix compression, that need to be applied to make that process effective. The interested reader is referred to [30] and [31] for details. Alternative applications of Markov models for intrusion detection have been presented in [3] and in [35].

4.3.2 Detection

Once the Markov model has been built, it can be used by the detection phase to evaluate query attributes by determining their probability. The probability of an attribute is calculated in a way similar to the likelihood of a training item as shown in Equation 4. The problem is that even legitimate input that has been regularly seen during the training phase may receive a very small probability value, because the probability values of all possible input words sum up to 1. Therefore, we chose to have the model return a probability value of 1 if the word is a valid output from the Markov model and a value of 0 when the word cannot be derived from the given grammar.
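The sketch below (our code) evaluates Equation 4 on a small automaton. The topology is our reading of Figure 2, and the state names are our own labels; it is chosen so that it reproduces the value 0.706 from Equation 5:

```python
# Our reading of Figure 2's NFA: emission and transition probabilities
# are those shown in the figure.
EMISSIONS = {'s1': {'a': 0.5, 'b': 0.5}, 's2': {'a': 1.0},
             's3': {'c': 1.0}, 's4': {'b': 1.0}, 'terminal': {}}
TRANSITIONS = {'start': [('s1', 0.3), ('s2', 0.7)],
               's1': [('s1', 0.2), ('s3', 0.4), ('terminal', 0.4)],
               's2': [('s4', 1.0)],
               's3': [('terminal', 1.0)],
               's4': [('terminal', 1.0)],
               'terminal': []}

def word_probability(word, state='start', prob=1.0):
    """Equation 4: sum, over all paths that emit `word`, of the product of
    the transition and emission probabilities along the path."""
    if not word:
        if state == 'terminal':
            return prob
        # The remaining path must reach the terminal state, which emits
        # no symbol, so only direct transitions to it contribute.
        return prob * sum(p for nxt, p in TRANSITIONS[state]
                          if nxt == 'terminal')
    total = 0.0
    for nxt, p_t in TRANSITIONS[state]:
        p_e = EMISSIONS[nxt].get(word[0], 0.0)
        if p_e > 0.0:
            total += word_probability(word[1:], nxt, prob * p_t * p_e)
    return total

print(word_probability('ab'))   # 0.006 + 0.7 = 0.706, matching Equation 5
```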
4.4 Token Finder

The purpose of the token finder model is to determine whether the values of a certain query attribute are drawn from a limited set of possible alternatives (i.e., they are tokens or elements of an enumeration). Web applications often require one out of a few possible values for certain query attributes, such as flags or indices. When a malicious user attempts to use these attributes to pass illegal values to the application, the attack can be detected. When no enumeration can be identified, it is assumed that the attribute values are random.

4.4.1 Learning

The classification of an argument as an enumeration or as a random value is based on the observation that the number of different occurrences of parameter values is bounded by some unknown threshold t in the case of an enumeration, while it is unrestricted in the case of random values.

When the number of different argument instances grows proportionally to the total number of argument instances, the use of random values is indicated. If such an increase cannot be observed, we assume an enumeration. More formally, to decide if argument a is an enumeration, we calculate the statistical correlation ρ between the values of the functions f and g for increasing numbers 1, ..., i of occurrences of a. The functions f and g are defined on N0 (the natural numbers including zero) as follows.

    f(x) = x        (8)

    g(0) = 0
    g(x) = g(x − 1) + 1   if the x-th value for a is new        (9)
    g(x) = g(x − 1) − 1   if the x-th value was seen before

The correlation parameter ρ is derived after the training data has been processed. It is calculated from f and g with their respective variances Var(f), Var(g) and the covariance Covar(f, g) as shown below.

    ρ = Covar(f, g) / √(Var(f) · Var(g))        (10)

If ρ is less than 0, then f and g are negatively correlated and an enumeration is assumed. This is motivated by the fact that, in this case, increasing values of f (reflecting the increasing number of analyzed parameters) correlate with decreasing values of g(x) (reflecting the fact that many argument values for a have previously occurred). In the opposite case, where ρ is greater than 0, the values of a have shown sufficient variation to support the hypothesis that they are not drawn from a small set of predefined tokens.

When an enumeration is assumed, the complete set of identifiers is stored for use in the detection phase.

4.4.2 Detection

Once it has been determined that the values of a query attribute are tokens drawn from an enumeration, any new value is expected to appear in the set of known values. When this happens, 1 is returned, 0 otherwise. If it has been determined that the parameter values are random, the model always returns 1.
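A possible implementation of the learning phase is sketched below (our code; it computes ρ from Equations 8-10 over the training sequence):

```python
def is_enumeration(values):
    """Token finder sketch (Eqs. 8-10): correlate f(x) = x with the walk
    g, which rises on new values and falls on repeats. Negative
    correlation indicates an enumeration."""
    f, g, seen = [], [], set()
    g_val = 0                       # g(0) = 0
    for x, value in enumerate(values, start=1):
        g_val += 1 if value not in seen else -1
        seen.add(value)
        f.append(x)
        g.append(g_val)
    n = len(values)
    mean_f, mean_g = sum(f) / n, sum(g) / n
    covar = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / n
    var_f = sum((a - mean_f) ** 2 for a in f) / n
    var_g = sum((b - mean_g) ** 2 for b in g) / n
    if var_g == 0:                  # degenerate single-observation case
        return False
    rho = covar / (var_f * var_g) ** 0.5
    return rho < 0

print(is_enumeration(['red', 'blue', 'red', 'blue', 'red', 'blue']))  # True
print(is_enumeration(['u1', 'u2', 'u3', 'u4', 'u5', 'u6']))           # False
```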
4.5 Attribute Presence or Absence

Most of the time, server-side programs are not directly invoked by users typing the input parameters into the URIs themselves. Instead, client-side programs, scripts, or HTML forms pre-process the data and transform it into a suitable request. This processing step usually results in a high regularity in the number, name, and order of parameters. Empirical evidence shows that hand-crafted attacks focus on exploiting a vulnerability in the code that processes a certain parameter value, and little attention is paid to the order or completeness of the parameters.

The analysis takes advantage of this fact and detects requests that deviate from the way parameters are presented by legitimate client-side scripts or programs. This type of anomaly is detected using two different algorithms. The first one, described in this section, deals with the presence and absence of attributes ai in a query q. The second one is based on the relative order of parameters and is further discussed in Section 4.6. Note that the two models differ from the previous ones because the analysis is performed on the query as a whole, and not individually on each parameter.

The algorithm discussed hereinafter assumes that the absence or abnormal presence of one or more parameters in a query might indicate malicious behavior. In particular, if an argument needed by a server-side program is missing, or if mutually exclusive arguments appear together, then the request is considered anomalous.

4.5.1 Learning

The test for the presence and absence of parameters creates a model of acceptable subsets of attributes that appear simultaneously in a query. This is done by recording each distinct subset Sq = {ai, ..., ak} of attributes that is seen during the training phase.

4.5.2 Detection

During the detection phase, the algorithm performs, for each query, a lookup of the current attribute set. When the set of parameters has been encountered during the training phase, 1 is returned, otherwise 0.
4.6 Attribute Order

As discussed in the previous section, legitimate invocations of server-side programs often contain the same parameters in the same order. Program logic is usually sequential, and, therefore, the relative order of attributes is preserved even when parameters are omitted in certain queries. This is not the case for hand-crafted requests, as the order chosen by a human can be arbitrary and has no influence on the execution of the program.

The test for parameter order in a query determines whether the given order of attributes is consistent with the model deduced during the learning phase.

4.6.1 Learning

The order constraints between all k attributes (ai, ∀i = 1...k) of a query are gathered during the training phase. An attribute as of a program precedes another attribute at when as and at appear together in the parameter list of at least one query and as comes before at in the ordered list of attributes of all queries where they appear together. This definition allows one to introduce the order constraints as a set of attribute pairs O such that

    O = {(ai, aj) : ai precedes aj and ai, aj ∈ Sqj for some j = 1...n}        (11)

The set of attribute pairs O is determined as follows. Consider a directed graph G that has a number of vertices equal to the number of distinct attributes. Each vertex vi in G is associated with the corresponding attribute ai. For every query qj, with j = 1...n, that is analyzed during the training period, the ordered list of its attributes a1, a2, ..., ai is processed. For each attribute pair (as, at) in this list, with s ≠ t and 1 ≤ s, t ≤ i, a directed edge is inserted into the graph from vs to vt.

At the end of the learning process, graph G contains all order constraints imposed by queries in the training data. The order dependencies between two attributes are represented either by a direct edge connecting their corresponding vertices, or by a path over a series of directed edges. At this point, however, the graph could potentially contain cycles as a result of precedence relationships between attributes derived from different queries. As such relationships are impossible, they have to be removed before the final order constraints can be determined. This is done with the help of Tarjan's algorithm [33], which identifies all strongly connected components (SCCs) of G. For each component, all edges connecting vertices of the same SCC are removed. The resulting graph is acyclic and can be utilized to determine the set of attribute pairs O that are in a 'precedes' relationship. This is obtained by enumerating for each vertex vi all its reachable nodes vg, ..., vh in G, and adding the pairs (ai, ag), ..., (ai, ah) to O.

4.6.2 Detection

The detection process checks whether the attributes of a query satisfy the order constraints deduced during the learning phase. Given a query with attributes a1, a2, ..., ai and the set of order constraints O, all the parameter pairs (aj, ak) with j ≠ k and 1 ≤ j, k ≤ i are analyzed to detect potential violations. A violation occurs when, for any single pair (aj, ak), the corresponding pair with swapped elements (ak, aj) is an element of O. In such a case, the algorithm returns an anomaly score of 0, otherwise it returns 1.
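The following sketch (our code) implements both phases: it builds the precedence graph, removes intra-SCC edges using Tarjan's algorithm [33], collects the reachable pairs as O, and checks queries against O:

```python
from collections import defaultdict

def order_constraints(training_queries):
    """Build the precedence digraph, drop edges inside strongly connected
    components (contradictory constraints), and return the set O of
    (a_i, a_j) pairs where a_i precedes a_j."""
    edges, nodes = defaultdict(set), set()
    for query in training_queries:
        attrs = [attr for attr, _ in query]
        nodes.update(attrs)
        for i, a_s in enumerate(attrs):
            for a_t in attrs[i + 1:]:
                edges[a_s].add(a_t)

    # Tarjan's algorithm assigns each vertex to its SCC root.
    index, low, comp, stack, on_stack = {}, {}, {}, [], set()
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = v
                if w == v:
                    break

    for v in nodes:
        if v not in index:
            strongconnect(v)

    # Remove intra-SCC edges, then collect all reachable pairs as O.
    acyclic = {v: {w for w in edges[v] if comp[w] != comp[v]} for v in nodes}
    o = set()

    def reach(v, start):
        for w in acyclic[v]:
            if (start, w) not in o:
                o.add((start, w))
                reach(w, start)

    for v in nodes:
        reach(v, v)
    return o

o = order_constraints([[('user', 'u'), ('cred', 'c'), ('id', '1')],
                       [('user', 'u'), ('id', '2')]])
# o contains ('user','cred'), ('user','id'), ('cred','id')

def order_probability(query, o):
    """Detection: 0 if any attribute pair violates a constraint in O."""
    attrs = [attr for attr, _ in query]
    for i, a_j in enumerate(attrs):
        for a_k in attrs[i + 1:]:
            if (a_k, a_j) in o:             # swapped pair appears in O
                return 0.0
    return 1.0

print(order_probability([('id', '3'), ('user', 'u')], o))   # 0.0
```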
5. EVALUATION

This section discusses our approach to validate the proposed models and to evaluate the detection effectiveness of our system. That is, we assess the capability of the models to accurately capture the properties of the analyzed attributes and their ability to reliably detect potentially malicious deviations.

The evaluation was performed using three data sets. These data sets were Apache log files from a production web server at Google, Inc. and from two Computer Science Department web servers located at the University of California, Santa Barbara (UCSB) and the Technical University, Vienna (TU Vienna).

We had full access to the log files of the two universities. However, access to the log file from Google was restricted because of privacy concerns. To obtain results for this data set, our tool was run locally at Google on our behalf and the results were mailed to us.

Table 2 provides information about important properties of the data sets. The table shows the time interval during which the data was recorded and the log file size. It also lists the total number of HTTP queries in the log file, the number of requests that invoke server-side programs (such as CGI requests), the total number of their attributes, and the number of different server-side programs.

Data Set    Time Interval   Size (MByte)   HTTP Queries   Program Requests   Attributes   Programs
Google      1 hour          236            640,506        490,704            1,611,254    206
UCSB        297 days        1,001          9,951,174      7,993              4,617        395
TU Vienna   80 days         251            2,061,396      713,500            765,399      84

Table 2: Data Set Properties

5.1 Model Validation

This section shows the validity of the claim that our proposed models are able to accurately describe properties of query attributes. For this purpose, our detection tool was run on the three data sets to determine the distribution of the probability values for the different models. The length of the training phase was set to 1,000 for this and all following experiments. This means that our system used the first thousand queries that invoked a certain server-side program to establish its profiles and to determine suitable detection thresholds.

[Figure 3: Attribute Length - relative number of attribute values (logarithmic scale) plotted against the probability values assigned by the length model, for the Google, UCSB, and TU Vienna data sets]

Figures 3 and 4 show the distribution of the probability values that have been assigned to the query attributes by the length and the character distribution models, respectively. The y-axis shows the percentage of attribute values that appeared with a specific probability. For the figures, we aggregated the probability values (which are real numbers in the interval between 0.0 and 1.0) into ten bins, each bin covering an interval of 0.1. That is, all probabilities in the interval [0.0, 0.1[ are added to the first bin, values in the interval [0.1, 0.2[ are added to the second bin, and so forth. Note that a probability of 1 indicates a completely normal event. The relative number of occurrences is shown on a logarithmic scale.

Table 3 shows the number of attributes that have been rated as normal (with a probability of 1) or as anomalous (with a probability of 0) by the structural model and the token finder model. The table also provides the number of queries that have been classified as normal or as anomalous by the presence/absence model and the attribute order model. The number of queries is less than the number of attributes, as each query can contain multiple attributes.

The distributions of the anomaly scores in Figure 3, Figure 4 and Table 3 show that all models are capable of capturing the normality of their corresponding features. The vast majority of the analyzed attributes are classified as normal (reflected by an anomaly score close to one in the figures) and only few instances deviate from the established profiles. The graphs in Figures 3 and 4 quickly drop from above 90% of 'most normal' instances in the last bin to values below 1%.

It can be seen that the data collected by the Google server shows the highest variability (especially in the case of the attribute length model). This is due to the fact that the Google search string is included in the distribution. Naturally, this string, which is provided by users via their web browsers to issue Google search requests, varies to a great extent.
Structure (Attribute) Token (Attribute) Presence (Query) Order (Query)
Data Set normal anomalous normal anomalous normal anomalous normal anomalous
Google 1,595,516 15,738 1,603,989 7,265 490,704 0 490,704 0
UCSB 7,992 1 7,974 19 4,616 1 4,617 0
TU Vienna 765,311 98 765,039 370 713,425 75 713,500 0

Table 3: Probability Values

[Figure 4: Attribute Character Distribution - relative number of attribute values (logarithmic scale) plotted against the probability values assigned by the character distribution model, for the Google, UCSB, and TU Vienna data sets]

5.2 Detection Effectiveness

This section analyzes the number of hits and false positives raised during the operation of our tool.

To assess the number of false positives that can be expected when our system is deployed, the intrusion detection system was run on our three data sets. For this experiment, we assumed that the training data contained no real attacks. Although the original log files showed a significant number of entries from Nimda or Code Red worm attacks, these queries were excluded both from the model building and the detection process. Note, however, that this is due to the fact that all three sites use the Apache HTTP server. This web server fails to locate the targeted vulnerable program and thus fails to execute it. As we only include queries that result from the invocation of existing programs in the training and detection process, these worm attacks were ignored.

The false positive rate can be easily calculated by dividing the number of reported anomalous queries by the total number of analyzed queries. It is shown for each data set in Table 4.

The relative numbers of false positives are very similar for all three sites, but the absolute numbers differ tremendously, reflecting the different web server loads. Although almost five thousand alerts per day for the Google server appears to be a very high number at first glance, one has to take into account that this is an initial result. The alerts are the raw output produced by our system after a training phase with parameters chosen for the university log files. One approach to reduce the number of false positives is to modify the training and detection thresholds to account for the higher variability in the Google traffic. Nearly half of the false positives are caused by anomalous search strings that contain instances of non-printable characters (probably requests issued by users with incompatible character sets) or extremely long strings (such as URLs directly pasted into the search field). Another approach is to perform post-processing of the output, perhaps using a signature-based intrusion detection system to discard anomalous queries with known deviations. In addition, it is not completely impossible to deal with this amount of alerts manually. One or two full-time employees could browse the list of alerts, quickly discarding obviously incorrect instances and concentrating on the few suspicious ones.

When analyzing the output for the two university log files, we encountered several anomalous queries with attributes that were not malicious, even though they could not be interpreted as correct in any way. For example, our tool reported a character string in a field used by the application to transmit an index. By discussing these queries with the administrators of the corresponding sites, it was concluded that some of the mistakes may have been introduced by users that were testing the system for purposes other than security.

After estimating the false alarm rates, the detection capabilities of our tool were analyzed. For this experiment, a number of attacks were introduced into the data set of TU Vienna. We chose this data set to insert attacks for two reasons. First, we had access to the log file and could inject queries, something that was impossible for the Google data set. Second, the vulnerable programs that were attacked had already been installed at this site and were regularly used. This allowed us to base the evaluation on real-world training data.

We used eleven real-world exploits downloaded from popular security sites [6, 27, 29] for our experiment. The set of attacks consisted of a buffer overflow against phorum [26], a PHP message board, and three directory traversal attacks against htmlscript [24]. Two XSS (cross-site scripting) exploits were launched against imp [15], a web-based email client, and two XSS exploits against csSearch [8], a search utility. Webwho [9], a web-based directory service, was compromised using three variations of input validation errors. We also wanted to assess the ability of our system to detect worms such as Nimda or Code Red. However, as mentioned above, all log files were created by Apache web servers. Apache is not vulnerable to these attacks, as both worms exploit vulnerabilities in Microsoft's Internet Information Server (IIS). We solved the problem by installing a Microsoft IIS server and, after manually creating training data for the vulnerable program, injecting the signature of a Code Red attack [5]. Then, we transformed the log file into Apache format and ran our system on it.

All eleven attacks and the Code Red worm have been reliably detected by our anomaly detection system, using the same thresholds and training data that were used to evaluate the false alarm rate for this data set. Although the attacks were known to us, all are based on existing code that was used unmodified. In addition, the malicious queries were injected into the log files for this experiment after the model algorithms were designed and the false alarm rate was assessed. No manual tuning or adjustment was necessary.
Data Set Number of Alerts Number of Queries False Positive Rate Alarms per Day
Google 206 490,704 0.000419 4,944
UCSB 3 4,617 0.000650 0.01
TU Vienna 151 713,500 0.000212 1.89

Table 4: False Positive Rates
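For reference, the rates in Table 4 follow directly from the alert counts, the query counts, and the observation periods of Table 2 (our arithmetic; the per-query rate for Google rounds to 0.000420, which the table truncates to 0.000419):

```python
# Reproducing Table 4: rate = alerts / queries, alarms per day =
# alerts / observation period (1 hour, 297 days, 80 days from Table 2).
data = {'Google':    (206, 490_704, 1 / 24),
        'UCSB':      (3,   4_617,   297),
        'TU Vienna': (151, 713_500, 80)}
for site, (alerts, queries, days) in data.items():
    print(f'{site}: rate={alerts / queries:.6f}, per_day={alerts / days:.2f}')
# Google: rate=0.000420, per_day=4944.00
# UCSB: rate=0.000650, per_day=0.01
# TU Vienna: rate=0.000212, per_day=1.89
```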

Attack Class Length Char. Distr. Structure Token Presence Order


Buffer Overflow x x x x
Directory Traversal x x
XSS (Cross-Site Scripting) x x x x
Input Validation x x
Code Red x x x

Table 5: Detection Capabilities

Table 5 shows the models that reported an anomalous query or an anomalous attribute for each class of attacks. It is evident that there is no model that raises an alert for all attacks. This underlines the importance of choosing and combining different properties of queries and attributes to cover a large number of possible attack venues.

The length model, the character distribution model, and the structural model are very effective against a broad range of attacks that inject a substantial amount of malicious payload into an attribute string. Attacks such as buffer overflow exploits (including the Code Red worm, which bases its spreading mechanism on a buffer overflow in Microsoft's IIS) and cross-site scripting attempts require a substantial number of characters, thereby increasing the attribute length noticeably. Also, a human operator can easily tell that a maliciously modified attribute does not 'look right'. This observation is reflected in its anomalous character distribution and a structure that differs from the previously established profile.

Input validation errors, including directory traversal attempts, are harder to detect. The required number of characters is smaller than the number needed for buffer overflow or XSS exploits, often in the range of the legitimate attribute. Directory traversal attempts stand out because of the unusual structure of the attribute string (repetitions of slashes and dots). Unfortunately, this is not true for input validation attacks in general. The three attacks that exploit an error in Webwho did not result in an anomalous attribute for the character distribution model or the structural model. In this particular case, however, the token finder raised an alert, because only a few different values of the involved attribute were encountered during the training phase.

The presence/absence and the parameter order model can be evaded without much effort by an adversary that has sufficient knowledge of the structure of a legitimate query. Note, however, that the available exploits used in our experiments resulted in reported anomalies from at least one of the two models in 8 out of 11 cases (one buffer overflow, four directory traversal, and three input validation attacks). We therefore decided to include these models in our IDS, especially because of the low number of false alarms they produce.

The results presented in this section show that our system is able to detect a high percentage of attacks with a very limited number of false positives (all attacks, with less than 0.2% false alarms in our experiments). Some of the attacks are also detectable by signature-based intrusion detection systems such as Snort, because they represent variations of known attacks (e.g., Code Red, buffer overflows). Other attacks use malicious manipulation of the query parameters, which signature-based systems do not notice. These attacks are correctly flagged by our anomaly detection system.

A limitation of the system is its reliance on web access logs. Attacks that compromise the security of a web server before the logging is performed may not be detected. The approach described in [1] advocates the direct instrumentation of web servers in order to perform timely detection of attacks, even before a query is processed. This approach may introduce some unwanted delay in certain cases, but if this delay is acceptable then the system described here could easily be modified to fit that model.

6. CONCLUSIONS

Web-based attacks should be addressed by tools and techniques that combine the precision of signature-based detection with the flexibility of anomaly-based intrusion detection systems.

This paper introduces a novel approach to perform anomaly detection, using as input HTTP queries containing parameters. The work presented here is novel in several ways. First of all, to the best of our knowledge, this is the first anomaly detection system specifically tailored to the detection of web-based attacks. Second, the system takes advantage of application-specific correlations between server-side programs and the parameters used in their invocation. Third, the parameter characteristics (e.g., length and structure) are learned from input data. Ideally, the system will not require any installation-specific configuration, even though the level of sensitivity to anomalous data can be configured via thresholds to suit different site policies.

The system has been tested on data gathered at Google, Inc. and at two universities in the United States and Europe. Future work will focus on further decreasing the number of false positives by refining the algorithms developed so far, and by looking at additional features. The ultimate goal is to be able to perform anomaly detection in real-time for web sites that process millions of queries per day with virtually no false alarms.
Acknowledgments

We would like to thank Urs Hoelzle from Google, Inc., who made it possible to test our system on log files from one of the world's most popular web sites.

This research was supported by the Army Research Office, under agreement DAAD19-01-1-0484. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Army Research Office or the U.S. Government.

7. REFERENCES

[1] M. Almgren and U. Lindqvist. Application-Integrated Data Collection for Security Monitoring. In Proceedings of Recent Advances in Intrusion Detection (RAID), LNCS, pages 22-36, Davis, CA, October 2001. Springer.
[2] Apache 2.0 Documentation, 2002. http://www.apache.org/.
[3] D. Barbara, R. Goel, and S. Jajodia. Mining Malicious Data Corruption with Hidden Markov Models. In 16th Annual IFIP WG 11.3 Working Conference on Data and Application Security, Cambridge, England, July 2002.
[4] P. Billingsley. Probability and Measure. Wiley-Interscience, 3rd edition, April 1995.
[5] CERT/CC. "Code Red Worm" Exploiting Buffer Overflow In IIS Indexing Service DLL. Advisory CA-2001-19, July 2001.
[6] CGI Security Homepage. http://www.cgisecurity.com/, 2002.
[7] K. Coar and D. Robinson. The WWW Common Gateway Interface, Version 1.1. Internet Draft, June 1999.
[8] csSearch. http://www.cgiscript.net/.
[9] Cyberstrider WebWho. http://www.webwho.co.uk/.
[10] D.E. Denning. An Intrusion Detection Model. IEEE Transactions on Software Engineering, 13(2):222-232, February 1987.
[11] R. Fielding et al. Hypertext Transfer Protocol - HTTP/1.1. RFC 2616, June 1999.
[12] S. Forrest. A Sense of Self for UNIX Processes. In Proceedings of the IEEE Symposium on Security and Privacy, pages 120-128, Oakland, CA, May 1996.
[13] A.K. Ghosh, J. Wanken, and F. Charron. Detecting Anomalous and Unknown Intrusions Against Programs. In Proceedings of the Annual Computer Security Applications Conference (ACSAC'98), pages 259-267, Scottsdale, AZ, December 1998.
[14] K. Ilgun, R.A. Kemmerer, and P.A. Porras. State Transition Analysis: A Rule-Based Intrusion Detection System. IEEE Transactions on Software Engineering, 21(3):181-199, March 1995.
[15] IMP Webmail Client. http://www.horde.org/imp/.
[16] H.S. Javitz and A. Valdes. The SRI IDES Statistical Anomaly Detector. In Proceedings of the IEEE Symposium on Security and Privacy, May 1991.
[17] C. Ko, M. Ruschitzka, and K. Levitt. Execution Monitoring of Security-Critical Programs in Distributed Systems: A Specification-based Approach. In Proceedings of the 1997 IEEE Symposium on Security and Privacy, pages 175-187, May 1997.
[18] C. Kruegel, T. Toth, and E. Kirda. Service Specific Anomaly Detection for Network Intrusion Detection. In Symposium on Applied Computing (SAC). ACM Scientific Press, March 2002.
[19] T. Lane and C.E. Brodley. Temporal Sequence Learning and Data Reduction for Anomaly Detection. In Proceedings of the 5th ACM Conference on Computer and Communications Security, pages 150-158. ACM Press, 1998.
[20] W. Lee and S. Stolfo. A Framework for Constructing Features and Models for Intrusion Detection Systems. ACM Transactions on Information and System Security, 3(4), November 2000.
[21] W. Lee, S. Stolfo, and K. Mok. Mining in a Data-flow Environment: Experience in Network Intrusion Detection. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '99), San Diego, CA, August 1999.
[22] J. Liberty and D. Hurwitz. Programming ASP.NET. O'Reilly, February 2002.
[23] U. Lindqvist and P.A. Porras. Detecting Computer and Network Misuse with the Production-Based Expert System Toolset (P-BEST). In IEEE Symposium on Security and Privacy, pages 146-161, Oakland, California, May 1999.
[24] Miva HtmlScript. http://www.htmlscript.com/.
[25] V. Paxson. Bro: A System for Detecting Network Intruders in Real-Time. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, January 1998.
[26] Phorum: PHP Message Board. http://www.phorum.org/.
[27] PHP Advisory Homepage. http://www.phpadvisory.com/, 2002.
[28] M. Roesch. Snort - Lightweight Intrusion Detection for Networks. In Proceedings of the USENIX LISA '99 Conference, November 1999.
[29] Security Focus Homepage. http://www.securityfocus.com/, 2002.
[30] A. Stolcke and S. Omohundro. Hidden Markov Model Induction by Bayesian Model Merging. Advances in Neural Information Processing Systems, 1993.
[31] A. Stolcke and S. Omohundro. Inducing Probabilistic Grammars by Bayesian Model Merging. In Conference on Grammatical Inference, 1994.
[32] K. Tan and R. Maxion. "Why 6?" Defining the Operational Limits of Stide, an Anomaly-Based Intrusion Detector. In Proceedings of the IEEE Symposium on Security and Privacy, pages 188-202, Oakland, CA, May 2002.
[33] R. Tarjan. Depth-First Search and Linear Graph Algorithms. SIAM Journal on Computing, 1(2):146-160, June 1972.
[34] Security Tracker. Vulnerability Statistics April 2001-March 2002. http://www.securitytracker.com/learn/statistics.html, April 2002.
[35] N. Ye, Y. Zhang, and C.M. Borror. Robustness of the Markov Chain Model for Cyber Attack Detection. IEEE Transactions on Reliability, 52(3), September 2003.