Optimizing Deep Packet Inspection For High-Speed Traffic Analysis
Niccolò Cascarano
Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy
E-mail: [email protected]
Luigi Ciminiera
Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy
E-mail: [email protected]
Fulvio Risso (corresponding author)
Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy
Phone: +39-0115647008,
Fax: +39-0115647099,
E-mail: [email protected]
1 Introduction
The usage of the Internet has changed dramatically in the last few years. The
Internet now transports traffic generated by many different users and applications,
including financial transactions, e-business, entertainment and more,
which is definitely different from the traffic we had 30 years ago, when the
network was engineered for email, telnet and FTP. Often, the only way to
keep pace with the new traffic trends is to have an effective infrastructure
for real-time measurements in the network, which allows operators to discover changes in
the traffic pattern as soon as they appear and to adapt to them quickly.
The capability to recognize which application generated the traffic is perhaps
one of the most important challenges in network measurements. Several
technologies have been proposed so far. The first was port-based traffic classification [1],
which is now considered extremely imprecise [2] due to the large
number of applications using non-standard ports in order to escape network
limitations (e.g., bandwidth shaping enforced by network providers). The next
step was Deep Packet Inspection (DPI), which also analyzes data in the
application-layer payload and is usually extremely effective. On the downside,
this technology is considered extremely demanding in terms of processing requirements
and memory occupancy. Recently, newer techniques [3–13] have
been proposed that are based on statistical methods; their main advantage is the
capability to detect encrypted and tunneled traffic as well, but their precision
is acceptable only if we target a few protocols (often less than a dozen) out
of the hundreds available on today's networks. Moreover, their processing
(and memory) requirements may be comparable to those of packet-based DPI [14].
Due to the experimental nature of the statistical techniques and their current
limitations, the most widely used technology for traffic classification is still
DPI. Interestingly, the biggest criticism of this technique is not related to its
difficulties in classifying encrypted or tunneled traffic, but to its supposed high
processing cost. In fact, DPI is extensively used in security applications such
as Intrusion Detection Systems (IDS) and firewalls, which have strict requirements
in terms of precision. In other words, a single misclassification in such
applications could allow an attacker to compromise an entire network,
and is therefore a risk that people do not want to run. In order to minimize
the risk of misclassifications, all these DPI implementations tend to privilege
accuracy, without taking the processing cost into much consideration.
However, our scenario is different because we focus on applications that use
traffic classification mainly for statistics, quality of service (QoS) and monitoring,
and that can accept some compromises in terms of accuracy. For instance,
misclassified traffic may update the byte counter of the wrong protocol, but this
does not represent a threat comparable to a misclassification in an IDS. In this
environment, the DPI implementation can be optimized, for example by getting rid
of expensive algorithms (e.g., the ones that reconstruct the entire stream at the
application level) that are a must when precision is a major concern, but that
contribute substantially to the processing costs and memory requirements.
This paper proposes (and validates) a set of optimizations that are possible in this application scenario.
2 Related works
Most papers (e.g., [3, 7, 8, 10, 12, 13]) assume that DPI is a very expensive
traffic classification technique without any further investigation. This misperception
is due to the fact that DPI is massively used in security applications
(firewalls, IDS, etc.), which are well known to have scalability problems
because of their processing requirements (mainly due to DPI).
However, traffic classification for network measurements has different characteristics
from the network security case. The most evident is the expected
degree of precision: network measurements can accept a limited amount
of uncertainty, while an IDS must intercept every single session that may represent
a security threat, and no compromises can be made on this point.
A second difference is in the number of protocols that need to be recognized
(often on the order of hundreds) compared to the number of rules in security
applications (several thousands¹), which represents one of the inputs
of the pattern matching engine. These observations lead us to conclude
that the several works focusing on new techniques for fast and scalable
regular expression matching [16–20] may not be appropriate in our case, since
we can privilege simpler regular expressions (albeit with a limited decrease in
terms of accuracy), allowing us to use faster algorithms even if they may suffer
from state explosion [16]. In other words, we do not look for new algorithms,
but we want to use the fastest ones in their best operating conditions.
In line with dismantling the myth of the excessive cost of DPI engines, [14]
recently demonstrated that the complexity of a well-known traffic classifier
based on Support Vector Machines [21] may be comparable to that of DPI.
However, that paper assumes a “lightweight” implementation of the DPI engine
that operates packet-based instead of message-based, but it does not adequately
justify this choice and it does not examine the (supposed) loss in precision
for the sake of performance. Along the same line of cost reductions, optimizations
based on reducing the amount of data analyzed by the pattern matching engine,
e.g., using only the first portion of the session data for classification,
are present in IDSs such as Bro [22] and Snort [23]. Moore
et al. [24] had a similar idea based on a cascade of algorithms, each one fed
with more data than the one available in the previous step, until a positive
match is obtained. Another technique consists of stopping the analysis of a
session if it is still unclassified after a given number of packets. However, those
techniques were never validated; the advantages in terms of processing cost
and the (supposed) loss in terms of precision are still unknown. Furthermore,
an adequate justification of the choice of a packet-based approach (excluding
some partial results in [24, 25]) was still missing.
¹ For instance, the ruleset of the November 2007 release of Snort includes 5549 rules.
This paper aims at filling this gap by properly justifying the feasibility
of the “lightweight” DPI approach, and by presenting (and evaluating) some
additional optimizations that are able to decrease the processing cost even further,
all based on the assumption that we can tolerate a minor worsening in classification
accuracy. Furthermore, we can leverage some common characteristics
of the traffic present in today's networks in order to speed up
the most frequent processing path in the DPI engine.
3 DPI Improvements
This section describes the basic operation of a DPI traffic classifier and presents
possible improvements with respect to processing speed. In particular,
it presents some architectural choices that can have a huge impact
on the performance of the DPI traffic classifier, namely the possibility to operate
per-packet instead of per-session and the algorithm used by the pattern
matching engine.
Intuitively, the main cost of a DPI classifier derives from the execution of the
pattern matching algorithm; experimental evidence can be found, e.g., in [14],
which refers to a PBFS traffic classifier. Using the same methodology presented
in [14], which splits the DPI engine into its main components, Table 6 shows that
the pattern matching can account for up to 8900 CPU ticks², while the impact
of the other two main components of a PBFS classifier (Session ID Extraction,
which extracts the source/destination IP addresses and source/destination
TCP/UDP ports from each incoming packet, and Session Lookup, which determines
if the current session has already been classified) is negligible.
² More details on the evaluation methodology will be provided in Section 4.
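To make this structure concrete, the following is a minimal sketch of a PBFS-style per-packet classifier with the three components named above. The session table, the Session dataclass and the toy match_signatures() database are our own illustrative assumptions, not code from the paper.

    import re
    from dataclasses import dataclass
    from typing import Optional

    # Toy signature database; real systems compile hundreds of these into a DFA.
    SIGNATURES = {"http": re.compile(rb"^(get|post|head) ", re.I)}

    @dataclass
    class Session:
        protocol: Optional[str] = None   # filled in once a signature matches

    sessions: dict = {}                  # session table, keyed by the 5-tuple

    def match_signatures(payload: bytes) -> Optional[str]:
        # Pattern matching: the expensive step measured in Table 6.
        for name, regex in SIGNATURES.items():
            if regex.search(payload):
                return name
        return None

    def classify_packet(src_ip, dst_ip, proto, src_port, dst_port, payload):
        # Session ID Extraction: build a direction-independent 5-tuple key.
        key = (proto,) + tuple(sorted([(src_ip, src_port), (dst_ip, dst_port)]))
        # Session Lookup: reuse the verdict if the session is already classified.
        session = sessions.setdefault(key, Session())
        if session.protocol is None and payload:
            session.protocol = match_signatures(payload)
        return session.protocol

    # Example:
    # classify_packet("10.0.0.1", "10.0.0.2", "tcp", 40000, 80,
    #                 b"GET / HTTP/1.1") returns "http"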
Although there are several possible implementations of regular expression
matching, we know from theory that the choice of the proper algorithm for
the regular expression engine can make a big difference in terms of performance.
In this respect, the best option is the Deterministic Finite Automaton
(DFA), in which the computational cost depends only on the length of the
input sequence, independently of the number of regular expressions to be
checked. Unfortunately, depending on the characteristics of the signatures,
the automaton that represents the set of regular expressions may require a
large amount of memory, which is the reason why DFAs are rarely used in regular
expression matching. Despite the common belief, this paper suggests that
DFA-based engines can be profitably used for traffic classification because,
under some assumptions (that are verified in our scenario), we do not incur the state
explosion that is the main reason for the adoption of other regex engines.
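As a toy illustration of the DFA property just mentioned, the matcher below performs exactly one table lookup per input byte, so its cost depends only on the payload length, no matter how many signatures were compiled into the transition table. The hand-built table for the anchored pattern "^GET " is our own example, not a structure from the paper.

    # Hand-built DFA for the anchored pattern "^GET ". State 4 is accepting.
    ACCEPT = 4
    TRANSITIONS = [
        {ord("G"): 1},       # state 0: expect 'G' as the first payload byte
        {ord("E"): 2},       # state 1
        {ord("T"): 3},       # state 2
        {ord(" "): ACCEPT},  # state 3
        {},                  # state 4: accepting
    ]

    def dfa_match(payload: bytes) -> bool:
        state = 0
        for byte in payload:
            state = TRANSITIONS[state].get(byte)
            if state is None:
                return False          # anchored pattern: a mismatch is final
            if state == ACCEPT:
                return True           # matched after inspecting only 4 bytes
        return False

    assert dfa_match(b"GET /index.html HTTP/1.1")
    assert not dfa_match(b"\x16\x03\x01 TLS handshake bytes")

Adding more signatures enlarges the table but leaves the per-byte work unchanged, which is exactly why memory, rather than time, is the limiting factor for DFAs.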
Another factor impacting the execution cost is the “friendliness” of the regular
expressions used for identifying protocols. A massive use of wildcards can lower
the overall performance in terms of processing cost (and impacts memory
occupancy too). When writing protocol signatures, we can obtain an additional
boost in performance if we pay attention to the form of the regular
expressions used.
In this section we focus on the PBFS flavour of DPI because, as will be
demonstrated in Section 5.1, a packet-based approach is appropriate for our
objectives. In this case we can further reduce the cost of the DPI classification
by implementing two additional techniques, both based on the idea of
reducing the amount of data processed by the pattern matching engine.
In fact, referring to the results reported in Table 6, the cost of the Session ID
Extraction and Session Lookup is fixed and present for all packets, while the cost of the
pattern matching depends on the regular expressions used and on the payload
being analyzed (hence on the traffic trace used in the evaluation).
Since the signature matching may be repeated on each new packet until
the session is classified, important parameters are the number of times the
pattern matching algorithm is invoked (e.g., a trace with many long sessions
might require a smaller number of analyses), the presence of unclassifiable
sessions such as encrypted communications (which, being encrypted, will never
match their signature, although they might trigger some misclassifications),
and the number of bytes inspected.
For these reasons, the maximum cost reported in Table 6 represents the
worst case for a PBFS classifier, while the actual cost of the pattern
matching is the average value³ derived by calculating the average processing
time over all packets submitted to the pattern matching engine for a
given trace.
³ Although in case of real-time traffic analysis we should care about the worst case in
order to be sure to sustain any incoming load, this may lead to an overprovisioned system,
since the probability that all packets fall in the worst-case scenario is very small. We
consider the average case a better representation of what we can actually expect in a real
network scenario.
From the above considerations, it is evident that two methods can be used
to further reduce the execution cost of a PBFS DPI engine: reducing
the number of payload bytes that will be examined (Section 3.3.1)
and reducing the number of packets submitted to the pattern matching
algorithm (Section 3.3.2), as sketched below.
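A sketch of how the two methods could be layered on the per-packet loop shown earlier follows. The Session fields and parameter names are our own; the concrete values (a 256-byte snapshot, 2 attempts) anticipate the ones evaluated in Section 5, and match_signatures() is the toy matcher from the earlier sketch.

    from dataclasses import dataclass
    from typing import Optional

    SNAPLEN = 256        # Section 3.3.1: bytes of payload fed to the matcher
    MAX_ATTEMPTS = 2     # Section 3.3.2: give up after this many tries

    @dataclass
    class Session:
        protocol: Optional[str] = None
        attempts: int = 0

    def classify_packet_optimized(session: Session, payload: bytes):
        if session.protocol is not None:
            return session.protocol       # already classified: nothing to do
        if session.attempts >= MAX_ATTEMPTS:
            return None                   # give up: the session stays unknown
        session.attempts += 1
        # Truncate the payload to the snapshot length before pattern matching.
        session.protocol = match_signatures(payload[:SNAPLEN])
        return session.protocol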
This technique also aims at reducing the number of bytes inspected, like the
one presented in the previous section, but from a different perspective:
many application-layer protocols repeat their protocol headers at the
beginning of each packet of the session.
4 Evaluation methodology
application were downloading and seeding some popular resources for the whole
duration of the capture. This traffic dataset is known to be very challenging for
a DPI classifier because of the massive presence of P2P traffic. Some signatures
related to WebTV protocols were in fact not available at all, while others had to
be derived by reverse engineering. Skype traffic is encrypted, and P2P
clients adopt hiding techniques to avoid classification. The UNIBS-GT
trace was collected in a research laboratory where about 20 PhD students
were asked to run the GT daemon while carrying out their normal activities. This
trace is smaller than POLITO-GT in terms of volume, but it is interesting
because it contains normal users' activity, including some P2P file sharing.
Since POLITO-GT and UNIBS-GT include traffic generated by a limited
number of hosts, due to the difficulty of deploying the GT suite on many
clients, we also added the POLITO trace to our analysis, which includes
traffic generated by about 6000 hosts during an entire working day, in order
to extend the evaluation scenario. Although the ground truth is not available
for this trace (hence the accuracy of our optimizations cannot be assessed on
it), it represents the best choice for evaluating the impact of the proposed
optimizations in terms of processing cost, because it contains a large
variety of traffic that better represents the behavior of a real network.
5 Experimental evaluation
This section evaluates the impact that different algorithms may have on the
cost of the pattern matching, and suggests some methods to improve the
“friendliness” of the regular expressions with respect to DFAs.
The algorithm used for the pattern matching has a huge impact on the performance
of a DPI classifier. Table 3 analyzes three different algorithms (in
the implementation provided by [28]), namely NFA (Non-deterministic Finite
Automaton), DFA and compressed DFA (cDFA [19]), when applied to the entire
signature database of l7-filter [26]. Results include the minimum, average
and maximum cost for analyzing a packet (as derived from the POLITO
traffic trace), the number of distinct automata required to compile the pattern
set into the corresponding memory structure, and the total memory used.
The first result is that the l7-filter database cannot be compiled into
a single DFA because of the ambiguities contained in the pattern set, which
generate a graph that grows exponentially and cannot be contained in memory.
For instance, DFA and cDFA require partitioning the signatures into four
parts, leading to four different automata that were processed sequentially. The
splitting algorithm was very simple: we created an additional automaton
as soon as the number of states exceeded 100K. Conversely, the NFA does not
have any problem of memory explosion and the entire set can be compiled
into a single automaton. As expected (results in Table 3), the NFA guarantees
a very limited memory occupancy, but at the expense of an execution cost that
is prohibitively high and may further increase when new protocol
signatures are added. In fact, its average cost is about three orders of magnitude higher
than the sequential execution of the four DFAs, and two orders of magnitude
higher than in the cDFA case.
As predicted by the theory, a DFA-based algorithm has the best processing
performance and its worst-case execution cost is independent of the type
of signature used and of the number of regular expressions, as shown in
Figure 1. However, the question is whether this approach is applicable in our
case, since DFAs are well known for state explosion. The tests reported above,
in fact, used four different DFAs in sequence in order to limit this problem,
but in principle we cannot guarantee that this approach is always feasible, i.e.,
that we never reach a point in which the number of distinct DFAs is so large
that it cancels the theoretical advantages of the algorithm. This point will be
discussed in the next section.
It is worth remembering that DFAs can have other limitations, e.g., patterns
cannot use context-sensitive regular expressions such as (a*)(b*)(\1)(\2).
In other words, a DFA implementation guarantees better performance but
limits the expressiveness of the signatures. We speculate that these regular
expressions are very rare in the real world; for instance, neither the l7-filter
nor the NetPDL database contains any of them. A quick check of this limitation
is sketched below.
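The following snippet (our own example, using Python's backtracking re module, which does support backreferences, unlike a DFA) shows what the pattern above expresses:

    import re

    # (a*)(b*)(\1)(\2) requires repeating the captured groups verbatim:
    # a DFA cannot express this, since it would need unbounded memory to
    # remember what the groups captured.
    pattern = re.compile(r"^(a*)(b*)(\1)(\2)$")

    print(bool(pattern.match("aabbaabb")))  # True:  "aa"+"bb" repeated
    print(bool(pattern.match("aabbab")))    # False: groups do not repeat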
Since memory explosion is the most important argument against DFAs, we analyzed
the memory occupancy for the signatures contained in the NetPDL
database available online [27]. The results are reported in Figure 1, which measures
the processing cost and the memory occupancy starting from a single regular
expression and then repeating the same test with N, N + 1, ... patterns, until
reaching the total number of signatures contained in our protocol database.
Fig. 1 Cost and memory occupation of the DFA implementation of the pattern matching algorithm (processing cost in CPU ticks and memory occupancy as a function of the number of regular expressions).
It is evident that the memory occupancy does not grow linearly, as it
strongly depends on the type of signature added. In this respect, we can
divide regular expressions into three classes: (i) anchored regexps (i.e., beginning with
the ‘^’ sign), which are satisfied only if the pattern
is found at the beginning of the payload; (ii) anchored regexps containing the
Kleene closure (i.e., the ‘*’ wildcard), whose matching portion can
extend to any point of the input data; and (iii) non-anchored regexps containing
the Kleene closure⁴.
In fact, Figure 1 shows that the memory occupancy increases linearly when
the input patterns contain only expressions of type (i) (first region on the
left); the slope increases when we also add expressions of type (ii) (second
region); and it tends to increase exponentially when we add the second
expression of type (iii). This is due to the possible ambiguities in the input
pattern, which force the addition of a large number of states to match all the
possible cases. It is worth noting that the number of states explodes when at
least two expressions of type (iii) are merged in the same DFA; illustrative
examples of the three classes are given below.
⁴ The additional category of non-anchored regexps not containing the Kleene closure is
omitted since it is equivalent to type (ii), where the Kleene closure is at the beginning of
the regular expression.
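Hypothetical examples of the three classes, written as Python regexes over raw payload bytes (these are illustrative and are not signatures from the NetPDL or l7-filter databases):

    import re

    anchored        = re.compile(rb"^get ")                   # type (i)
    anchored_kleene = re.compile(rb"^get [\x20-\x7e]*http/")  # type (ii)
    floating_kleene = re.compile(rb"get [\x20-\x7e]*http/")   # type (iii)
    # Merging two or more type (iii) patterns into one DFA is what triggers
    # the state explosion observed in Figure 1.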
In our experience, traffic classification applications are not so prone to
state explosion, for two reasons. First, the number of patterns in traffic classification
is definitely smaller than the number needed in security applications
(hundreds versus several thousands). Second, we can forge the signatures so as
not to trigger state explosion (Section 5.2.3), e.g., avoiding, whenever
possible, the Kleene closure and preferring anchored patterns. For instance,
the NetPDL signatures used in this paper include only two patterns of type
(iii), one for a protocol carried over TCP (http) and the other over UDP
(nt-security-log), giving us the possibility to split the patterns into two
different sets. The two resulting DFAs are reasonably small, each one containing
less than 100K states, while the total amount of memory used was about
3 MBytes, roughly split in half between TCP and UDP. The problem of writing
“better” regular expressions will be investigated in the next section, which will
demonstrate that the form of the regular expression may have an additional
impact on the overall performance, particularly in case of a DFA engine.
In case state explosion does occur (e.g., when using the plain l7-filter
signature database as in Table 3), we can partition the automaton into multiple
sub-automata executed in sequence, as sketched below; if their cardinality is small (as
we expect it to be), we can also parallelize their execution on different CPU cores,
using an approach investigated in [16, 29].
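A minimal sketch of this fallback, with a toy stand-in for the compiled DFA engine (the ToyDFA class and the prefixes are our own illustrative assumptions):

    class ToyDFA:
        """Stand-in for a compiled DFA covering a subset of the signatures."""
        def __init__(self, protocol: str, prefix: bytes):
            self.protocol, self.prefix = protocol, prefix

        def run(self, payload: bytes):
            # A real engine walks its transition table in O(len(payload)).
            return self.protocol if payload.startswith(self.prefix) else None

    # Signature set partitioned into a small, fixed number of automata.
    AUTOMATA = [ToyDFA("http", b"GET "), ToyDFA("smtp", b"EHLO ")]

    def match_any(payload: bytes):
        for dfa in AUTOMATA:          # run sequentially (or one per CPU core)
            protocol = dfa.run(payload)
            if protocol is not None:
                return protocol
        return None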
As a last resort, in case a DFA is no longer applicable, we can still use
the several solutions (e.g., [17, 18, 20]) available in the literature, with the
corresponding worsening in terms of processing cost. However, our tests suggest
that a DFA engine is usually applicable in our application scenario.
We evaluated four slightly different signatures that are used by different tools
to recognize the HTTP protocol⁵. We measured their execution cost on the
POLITO traffic trace in terms of clock ticks per packet, differentiating between
matching and non-matching cases and averaging the results over the number of
HTTP packets inspected.
⁵ The “anchored” version was derived from the one used in the tstat tool:
^((http\/(0\.9|1\.0|1\.1)\ [1-5][0-9][0-9])|(connect|post|get|head|propfind|mkcol|delete|put|copy|move|lock|unlock)\ ).
The “anchored + Kleene” version was used in an earlier version of l7-filter:
^((http\/(0\.9|1\.0|1\.1)\ [1-5][0-9][0-9])|(connect|post|get|head|propfind|mkcol|delete|put|copy|move|lock|unlock)\ [\x09-\x0d\ -~]*(\ http\/[01]\.[019])).
The “not anchored + Kleene” version was by far the most common one and was derived from
the one present in the current version of l7-filter:
(http\/(0\.9|1\.0|1\.1)\ [1-5][0-9][0-9])|(connect|post|get|head|propfind|mkcol|delete|put|copy|move|lock|unlock)\ [\x09-\x0d\ -~]*(\ http\/[01]\.[019]).
The “not anchored, no Kleene” case is omitted since it is equivalent to the second type.
All these signatures were updated in order to also take into account the new methods
defined in HTTP 1.1.
Table 4 shows that “not anchored” or “with Kleene” signatures often force
the algorithm to analyze the entire network packet before being able to conclude
whether the regular expression matches or not, while the “anchored” one usually
stops the processing after a few bytes and therefore has the lowest average
cost for both the match and no-match cases. The “anchored + Kleene” version
represents a compromise: its cost is high in case of a match, while the
no-match case is much more favorable because the algorithm usually stops
at the beginning of the payload (because of the anchor), and hence the Kleene
operator does not have to consume the whole payload. The effect can be
reproduced with the sketch below.
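The following micro-benchmark of our own reproduces the effect only qualitatively, since Python's re is a backtracking engine rather than the DFA used in the paper; the patterns and the synthetic payload are illustrative assumptions.

    import re
    import timeit

    payload = bytes(range(256)) * 6          # ~1.5 KB that matches nothing
    anchored   = re.compile(rb"^(get|post|head) ")
    unanchored = re.compile(rb"(get|post|head) ")

    # The anchored pattern fails after looking at the first byte(s); the
    # unanchored one must retry the match from every payload position.
    t_anchored   = timeit.timeit(lambda: anchored.search(payload), number=10_000)
    t_unanchored = timeit.timeit(lambda: unanchored.search(payload), number=10_000)
    print(f"anchored:   {t_anchored:.3f}s")
    print(f"unanchored: {t_unanchored:.3f}s")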
In addition to the processing cost, we also measured (Table 5) the impact
of different types of signatures on the classification precision, quantifying the
amount of traffic that is (supposedly) incorrectly classified with the simplest
signatures. We evaluated the variation in classification accuracy of
the different types of signatures against the “not anchored + Kleene” one, used as
the baseline. The first result is the amount of traffic (in bytes) that was classified as
HTTP with the reference signature and that becomes unknown traffic
with the signature under test. The second result refers to the opposite
value, i.e., the traffic that went unclassified with the baseline signature and
that became HTTP traffic with the other regular expressions.
Due to the limited amount of HTTP traffic present in the POLITO-GT
trace, we concentrated our analysis on the UNIBS-GT and POLITO traces.
Interestingly, the UNIBS-GT traffic trace is classified in exactly the same way,
independently of the regular expression used. Very limited variations can
be seen on the POLITO trace, mostly related to some (previously) unknown
traffic that becomes HTTP when using the “anchored” signature. Since we
do not have the ground truth for that trace, we randomly selected some sessions
whose classification result changed and we verified that they are
either (i) some HTTP-like protocols (apparently generated by our machines
on PlanetLab) or (ii) some non-well-formed HTTP requests, possibly generated
by other HTTP-like applications.
Summarizing, the “anchored” version of the signature is too permissive
and returns some false positives, while the “not anchored + Kleene” is roughly
The previous section suggested that a PBFS DPI traffic classifier is appropriate for
traffic classification when a limited worsening in terms of accuracy can be tolerated,
and that the DFA algorithm is feasible in our application environment.
This section presents the results that can be achieved by such a classifier,
which represents the “baseline classifier” used to evaluate the new
optimizations presented in the rest of the paper. This classifier is the same
presented in [25] and uses the NetPDL database for the protocol signatures.
The current version of the NetPDL database (as of July 2009) includes 72
application-level protocols (39 TCP, 25 UDP and 8 that operate over both
TCP and UDP), whose signatures are partially derived from the l7-filter
project. In addition to a pure PBFS classifier, we can analyze “correlated sessions”,
i.e., the ones created at run-time by some protocols (e.g., FTP, SIP)
whose network parameters are negotiated in the control connection, albeit on
a per-packet basis. These sessions often transport a large amount of data and
therefore their impact is noticeable in terms of the amount of bytes classified [24].
Table 6 reports the per-packet cost of the main blocks of a PBFS DPI classifier.
The cost of the pattern matching can range from 13 to
8900 ticks⁶, for the reasons presented in Section 3.3. The lowest cost is
obtained when a packet with a 1-byte payload is inspected, while the highest is
related to a full-size IP packet (1460 bytes of payload) filled with fake data that
does not match any pattern. The average cost refers to the POLITO trace
and is the number of ticks required to inspect a packet submitted to the
DPI engine, averaged over the total number of packets inspected.
⁶ These numbers have been derived with the DFA algorithm described in [16].
Table 7 reports the classification results (in terms of average processing
cost per packet and unknown/misclassified traffic) obtained on the three
traffic traces by the baseline DPI classifier. We can note that the average
cost per packet is very different from the one presented in Table 6, because
here we averaged over all the packets present in the trace (hence taking
into account that many packets never reach the pattern matching engine).
Analyzing these results, we can see that the traffic mix is very different across our
traces; for instance, POLITO-GT has by far the highest percentage of unknown
TCP traffic. This is expected because the signatures for the WebTV protocols,
which represent the largest part of the traffic captured, are partially unknown
and partially derived by reverse engineering (and not very precise). With
respect to the UDP portion, the accuracy is even more problematic because
of the high percentage of misclassified traffic. The UNIBS-GT trace is less critical
than POLITO-GT, since the percentage of misclassified traffic is reasonably
low; we still have a large portion of unknown traffic due to the use of P2P
file-sharing applications. For the POLITO trace we only have information on
unknown traffic, which turns out to be lower than in the other two traces for the TCP
case (most hosts on the network use only “standard” applications such as web
and email). Conversely, the unknown UDP traffic is rather high, probably due to
the presence of Skype traffic, which goes mostly undetected.
[Figure: speedup as a function of the snapshot length (bytes), for TCP and UDP traffic on the UNIBS-GT, POLITO-GT and POLITO traces.]
[Figure: misclassified and unknown TCP traffic (bytes %) as a function of the snapshot length (bytes).]
[Figure: misclassified and unknown UDP traffic (bytes %) as a function of the snapshot length (bytes).]
Even a snapshot length of 128 bytes does not significantly affect the accuracy; in
fact, the result achieved with a snapshot of 256 bytes is almost indistinguishable
from the one obtained with the full payload. Furthermore, the traffic classified
differently includes some sessions that are misclassified with the full payload
and that remain unclassified with the snapshot, which represents an improvement,
since unknown traffic is usually preferred to misclassifications.
According to these results, a hard limit of 256 bytes seems to be a good
tradeoff between the improvement in processing cost (especially for TCP
traffic, where the speedup over the baseline classifier ranges between 2.8 and 4.4) and
the impact on accuracy. For UDP, the improvement is smaller, between 1.1 and 1.8,
but there are almost no effects in terms of accuracy.
Fig. 5 Percentage of traffic correctly classified and misclassified as a function of the number of classification attempts performed (POLITO-GT and UNIBS-GT traces).
Some protocols show a low value for the mean and a high value for the standard deviation (e.g., Skype,
SSL). This suggests that their signature is reasonably effective (it is
able to identify most of the traffic within the first few packets) but
not very precise, because several sessions are classified only after inspecting a
large number of packets, and these late matches are mostly misclassifications. In all these cases,
a reasonable limit on the number of classification attempts per session should
not affect the amount of traffic correctly classified, while it should decrease
the misclassifications.
Figure 5 shows the percentage of traffic that is either correctly classified
or misclassified by the baseline classifier while examining the first N packets
of each session. It is evident that most of the traffic is correctly classified
by examining only the first packet, while the amount of traffic classified
at the Nth packet (with N ≥ 2) is definitely limited.
Fig. 6 Misclassified and unknown TCP traffic (bytes %) as a function of the limit on classification attempts (UNIBS-GT, POLITO-GT and POLITO traces).
Besides, inspecting more packets has the side effect of increasing the percentage of misclassified
traffic, because the randomness of the application data transported
occasionally produces positive matches on some “weak” signatures.
Figure 6 confirms that TCP traffic is classified almost entirely at the first
packet in both the UNIBS-GT and POLITO-GT traces (the curves of unclassified
and misclassified traffic do not change appreciably for N ≤ 50). Considering a
limit of N = 2, the correctly classified traffic is reduced by 0.95% in the UNIBS-GT
trace, but misclassifications almost disappear, whereas they account for up
to 7.73% in the absence of limits. Results on POLITO-GT are even better,
with a loss of 0.20% in terms of correct traffic and almost no misclassifications.
Unfortunately, the analysis of UDP traffic in Figure 7 is less clear-cut. For
instance, the amount of correctly classified traffic in the POLITO-GT trace is
almost independent of the limit N, because the decrease of the unknown
traffic (with higher values of N) is balanced by a corresponding increase in
misclassifications. Conversely, the UNIBS-GT trace shows an increase in
the amount of traffic correctly classified with higher values of N, since the
unclassified traffic decreases without a corresponding increase in misclassifications
(which is in some sense expected, because the misclassified traffic in
this trace is already very low).
For the POLITO trace we can see a decrease in the unknown traffic with
higher values of N, but we cannot confirm whether this traffic is correctly classified,
in both the TCP and UDP cases, because we lack ground truth information.
Our analysis suggests that limiting the number of classification attempts
of a DPI classifier is a good strategy for reducing the computational
cost introduced by unclassifiable traffic while preserving, and in some cases
improving, the classification accuracy; it is particularly effective on traces
with a high amount of encrypted traffic or P2P applications. In case of TCP
traffic, a limit of N = 2 seems to be a good tradeoff⁷, improving the classification
accuracy and yielding an improvement of the processing cost of up
to 50 times on the POLITO-GT trace (Figure 8).
⁷ This value is due to the fact that some signatures match the message coming from the
server, which typically arrives in the second packet of the session.
Fig. 7 Misclassified and unknown UDP traffic (bytes %) as a function of the limit on classification attempts (UNIBS-GT, POLITO-GT and POLITO traces).
With respect to UDP traffic, this optimization does not seem to provide
noticeable improvements in terms of processing cost, even for very
small values of N (Figure 8 shows a maximum speedup of 1.2–2.9 even with
N = 2). One reason may be the typical length of a UDP session, which is definitely
shorter than in the TCP case (e.g., several DNS queries are present in
the traces); hence a limit on the number of classification attempts
may not bring many advantages. Furthermore, the impact on the classification
accuracy is unclear: misclassifications are definitely reduced, but
the amount of correctly classified traffic may suffer. This may be due to the
poor quality of the signatures we use and to the fact that some of them require
the inspection of several consecutive packets (e.g., RTP, Skype), but this point
will surely require further investigation. We suggest a limit of N = 10, which has
almost no impact on the processing cost (e.g., a 1.03 speedup for the UNIBS-GT
trace) but contributes to keeping misclassifications at a reasonable level. The
resulting parameter choices are summarized in the sketch below.
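A configuration sketch putting together the limits suggested by this analysis; the names are our own, while the values are the ones recommended above and in the snapshot-length evaluation.

    # Recommended per-transport limits derived from the analysis above.
    # For TCP the attempt limit (N = 2) is the dominant optimization;
    # for UDP the 256-byte snapshot matters most, with a loose attempt
    # limit (N = 10) to keep misclassifications down.
    CLASSIFIER_LIMITS = {
        "tcp": {"max_attempts": 2,  "snaplen": 256},
        "udp": {"max_attempts": 10, "snaplen": 256},
    }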
Fig. 8 Speedup as a function of the limit on classification attempts, for TCP and UDP traffic on the UNIBS-GT, POLITO-GT and POLITO traces.
[Figure: percentage of traffic per number of classification attempts performed, for different snapshot lengths (no snapshot, 1024, 512, 256, 128 and 96 bytes).]
Combining the two optimizations, we measured the speedup obtained for every
pair of parameter values; bold values in the corresponding table mark the best results,
i.e., those where the worsening in terms of accuracy
remains below 1%. In particular, the last column represents the results obtained by
applying only the limit on the number of classification attempts, while the last
row reports the results obtained by applying only the snapshot-length optimization.
It is evident that even the best combination of the parameters does not
provide any noticeable advantage in terms of speedup compared to using the
most effective optimization alone. Furthermore, the same analysis repeated on
the other traces confirms not only the results in terms of speedup, but also that
the best combination of parameters is definitely hard to predict.
6 Conclusions
After comparing different pattern matching algorithms, we concluded that the DFA is feasible in our case, mainly because of the characteristics
of our protocol signatures. Furthermore, signatures can lead to very
high execution costs if they are not anchored and/or contain the Kleene
closure, and we demonstrated that simpler signatures do not necessarily cause
a sensible worsening in classification accuracy.
Focusing on a packet-based DPI engine, we exploited the typical structure
of protocol signatures and headers and we analyzed the impact of limiting
the amount of application-layer data fed to the pattern matching engine. Our
results demonstrate that the precision of our DPI classifier does not change
when reducing the payload analysis to 256 bytes, with a significant decrease
(about 4 times for TCP) in processing complexity. Furthermore, we analyzed
the distribution of the number of classification attempts needed to classify most
of the traffic, and we discovered that a PBFS DPI classifier usually requires a
few packets to correctly classify a session and that “late” classifications are
usually misclassifications. A simple limit on the number of classification
attempts for each session led to up to a 50x speedup (for TCP traffic) in terms
of processing speed compared to the baseline DPI engine, while at
the same time reducing the number of misclassifications. Finally, we combined the two
techniques together, but we found that the best parameters are extremely
dependent on the traffic trace considered; our results suggest the adoption of
the first technique for TCP traffic (with a limit of 2 packets inspected for each
session) and of the second for UDP traffic (with a limit of 256 bytes), with an
additional limit of at most 10 packets inspected, which helps to decrease the
amount of misclassified traffic.
While the well-known problems of DPI still hold (difficulties in recognizing
encrypted/tunneled traffic, extreme sensitivity to the signature dataset,
the need to manually derive the signatures for new applications, etc.), this
paper can change the way DPI is considered, at least with respect to processing
cost, paving the way for the adoption of packet-based DPI techniques
on very high-speed networks as well. In fact, our results demonstrate that the
average cost of the pattern matching module alone (not counting all the other
costs present in a real classifier) can amount to a few hundred CPU ticks per
packet, which means that a 3 GHz CPU can potentially handle the impressive
number of more than 23M packets per second, i.e., a value comparable to the
amount of data transported on a full-duplex 10 Gigabit pipe.
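As a quick sanity check of this closing figure (assuming, for illustration only, an average cost of 130 ticks per packet, consistent with the "few hundred" quoted above):

    CPU_HZ = 3e9                 # 3 GHz CPU, as in the text
    TICKS_PER_PACKET = 130       # assumed average pattern-matching cost

    pps = CPU_HZ / TICKS_PER_PACKET
    print(f"{pps / 1e6:.1f} Mpps")   # ~23.1 Mpps, in the range of 10 GbE rates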
7 Acknowledgement
References
Author Biographies