Issues in Benchmarking Intrusion Detection Systems: Marcus J. Ranum
1
IDS Benchmarking?
• How hard can it be to benchmark
intrusion detection systems?
– Very!
– There are lots of ways to get it wrong
• Accidentally
• Deliberately
– Avoiding doing it wrong does not
necessarily mean you’ve done it right
2
What’s an IDS?
• IDS = Intrusion Detection System
– Primary criterion for measurement is the IDS’ ability to detect intrusions
– Secondary criteria for measurement are other issues (scored in the sketch below):
• False positives - false alarms
• False negatives - real attacks that are missed
• Performance impact - throughput, latency, or CPU usage on the host processor
3
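A minimal sketch of how these criteria reduce to numbers, assuming a labeled ground truth of which attacks are actually in the test traffic and a record of what the IDS flagged (all names hypothetical):

    # Hypothetical scoring of one IDS run against labeled ground truth.
    attacks = {"phf-cgi", "land", "teardrop", "smurf"}          # intrusions actually present
    alerts  = {"phf-cgi", "smurf", "benign-ftp", "benign-dns"}  # events the IDS flagged

    true_positives  = attacks & alerts   # real attacks that were detected
    false_negatives = attacks - alerts   # real attacks that were missed
    false_positives = alerts - attacks   # false alarms

    print(f"detection rate:  {len(true_positives) / len(attacks):.0%}")
    print(f"false negatives: {sorted(false_negatives)}")
    print(f"false positives: {sorted(false_positives)}")

Performance impact (throughput, latency, CPU) has to be measured separately, under load.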
Types of IDS
• Primary Types:
– Network IDS (NIDS)
– Host IDS (HIDS)
• Hybrid Types:
– Per-Host Network IDS (PH-NIDS)
– Load Balanced Network IDS (LB-NIDS)
– Firewall IDS (FW-IDS)
4
Properties of: Network IDS
• Collect packets in promiscuous mode (see the capture sketch below)
• Issues:
– Packet collection rate - what is the
maximum throughput?
– Reassembly/defragmentation/reordering -
what about traffic spoofing?
– Selective analysis - is the IDS choosing to
ignore some traffic in order to optimize?
5
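A minimal sketch of the collection step, assuming Linux, root privileges, and a hypothetical interface name; a raw AF_PACKET socket with the PACKET_MR_PROMISC membership is the layer a NIDS's analysis engine sits on top of:

    import socket
    import struct

    ETH_P_ALL = 0x0003                 # capture every Ethernet protocol
    SOL_PACKET = 263                   # Linux socket-level option constants
    PACKET_ADD_MEMBERSHIP = 1
    PACKET_MR_PROMISC = 1
    IFACE = "eth0"                     # hypothetical monitoring interface

    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
    sock.bind((IFACE, 0))

    # Put the interface into promiscuous mode (struct packet_mreq).
    mreq = struct.pack("iHH8s", socket.if_nametoindex(IFACE),
                       PACKET_MR_PROMISC, 0, b"")
    sock.setsockopt(SOL_PACKET, PACKET_ADD_MEMBERSHIP, mreq)

    # A NIDS would parse and analyze each frame; counting them is the
    # crudest possible measure of packet collection rate.
    count = 0
    while count < 1000:
        frame, _ = sock.recvfrom(65535)
        count += 1
    print(f"captured {count} frames")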
Properties of: Host IDS
• Operates on host logs and processes (see the log-watching sketch below)
– Sometimes forwards audit records to a central server for analysis
• Issues:
– CPU usage on host
– What about packet-oriented attacks?
– Per-platform (individual) view of attacks -
single system is monitored per agent
6
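A minimal log-watching sketch, assuming a Unix-style auth log at a hypothetical path and two made-up signatures; a real HIDS would cover processes too and forward matching records to the central server:

    import re
    import time

    # Hypothetical signatures a HIDS might scan for in an auth log.
    SIGNATURES = [
        re.compile(r"Failed password for (?:invalid user )?\S+"),
        re.compile(r"su: .*authentication failure"),
    ]

    def follow(path):
        """Yield lines as they are appended to a log file, tail -f style."""
        with open(path) as f:
            f.seek(0, 2)               # start at the current end of file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)    # the polling loop is where host CPU goes
                    continue
                yield line

    for line in follow("/var/log/auth.log"):   # path varies by platform
        if any(sig.search(line) for sig in SIGNATURES):
            print("ALERT:", line.strip())      # a real HIDS: forward centrally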
Properties of: Per-Host Network IDS
• Network IDS “shim” layer inserted into
network stack on each host
• Issues:
– Has properties of a network IDS
– But:
• Traffic is processed per-host only
• Does not have same performance as NIDS
• “Local” only view of traffic (but no drops)
7
Properties of: Load-Balanced Network IDS
• Use a load-balancing pre-processor to “spread” load across multiple NIDS (see the flow-hashing sketch below)
• Issues:
– Can scale to “infinite” bandwidth
– Total cost of the solution is more than single-unit pricing (requires a switch + multiple NIDS)
8
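A minimal sketch of the pre-processor's spreading decision (names hypothetical): hashing a normalized 5-tuple guarantees both directions of a session land on the same sensor, since otherwise no single NIDS could reassemble the stream:

    import hashlib

    def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
        # Sort the endpoints so request and reply packets hash identically.
        a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        return f"{proto}:{a}:{b}"

    def pick_sensor(pkt_tuple, n_sensors):
        digest = hashlib.sha1(flow_key(*pkt_tuple).encode()).digest()
        return int.from_bytes(digest[:4], "big") % n_sensors

    # Both directions of one session go to the same sensor:
    print(pick_sensor(("10.0.0.1", 1025, "10.0.0.2", 80, "tcp"), 4))
    print(pick_sensor(("10.0.0.2", 80, "10.0.0.1", 1025, "tcp"), 4))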
Properties of: Firewall IDS
• Place network IDS capability in a
firewall or bridge type device
• Issues:
– No packet loss issues (retransmits take
care of packets that are lost)
– (May) slow down network throughput
9
Other Issues
• Other things affecting speed and
detection ability:
– IP fragment re-assembly
– TCP packet re-ordering
– TCP state/sequence tracking
– Analyzing only selected sessions
10
Fragment Re-assembly
• Re-assembling fragments takes significant
CPU time as well as memory to buffer
packets
– IDS can be negatively impacted by faked
fragments intended to consume extra memory
– How does the IDS handle fragmented attacks? Does it simply alert “I see fragmented traffic,” or does it de-fragment and then apply IDS logic? (See the reassembly sketch below.)
11
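A naive reassembly-buffer sketch that makes the resource exposure concrete; the memory budget and eviction policy are hypothetical, and whichever policy sits there is exactly what a flood of never-completing fragments attacks:

    import time

    MAX_BUFFERED_BYTES = 1_000_000   # hypothetical memory budget
    buffers = {}    # (src, dst, ip_id) -> {"frags": {offset: bytes}, "born": t}
    buffered = 0    # total payload bytes currently held

    def evict(key):
        global buffered
        buffered -= sum(len(p) for p in buffers[key]["frags"].values())
        del buffers[key]

    def add_fragment(src, dst, ip_id, offset, payload, more_fragments):
        """Buffer one IP fragment; return the reassembled datagram when complete."""
        global buffered
        key = (src, dst, ip_id)
        entry = buffers.setdefault(key, {"frags": {}, "born": time.time()})
        entry["frags"][offset] = payload
        buffered += len(payload)

        while buffered > MAX_BUFFERED_BYTES:
            # Memory pressure: drop the oldest partial datagram.
            evict(min(buffers, key=lambda k: buffers[k]["born"]))

        if not more_fragments and key in buffers:
            # Final fragment: join in offset order (a real implementation
            # checks for holes and overlaps), then run detection on the result.
            data = b"".join(p for _, p in sorted(entry["frags"].items()))
            evict(key)
            return data
        return None

    # An attack string split across the fragment boundary:
    add_fragment("10.0.0.1", "10.0.0.2", 7, 0, b"GET /cgi-bin/ph", True)
    print(add_fragment("10.0.0.1", "10.0.0.2", 7, 15, b"f HTTP/1.0\r\n", False))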
Packet Re-ordering
• Re-ordering packets requires significant
CPU as well as memory for packet
buffering
– IDS can be impacted by unintentional or
deliberate packet drops since it tries to buffer
out-of-sequence packets
– How does the IDS handle re-ordering? Does it just flag out-of-sequence packets, or does it re-order and then apply IDS logic? (See the buffering sketch below.)
12
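A minimal buffering sketch for one direction of one session, assuming the IDS re-orders before applying detection logic; note that one missing segment pins everything after it in memory, which is the exposure described above:

    class StreamReassembler:
        """Naive in-order reassembly for one direction of one TCP session."""

        def __init__(self, isn):
            self.next_seq = isn   # next byte offset we expect
            self.pending = {}     # seq -> payload, buffered until the hole fills
            self.stream = b""     # in-order bytes, ready for detection logic

        def segment(self, seq, payload):
            self.pending[seq] = payload
            # Drain everything that is now contiguous; a lost segment
            # leaves later ones stuck in self.pending indefinitely.
            while self.next_seq in self.pending:
                data = self.pending.pop(self.next_seq)
                self.stream += data
                self.next_seq += len(data)

    r = StreamReassembler(isn=1000)
    r.segment(1006, b"world")    # arrives early: buffered, not analyzed yet
    r.segment(1000, b"hello ")   # fills the hole: both drain in order
    print(r.stream)              # b'hello world'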
TCP State Tracking
• Tracking TCP states requires
maintaining per-session information
– IDS is impacted by number of
simultaneous streams
– IDS is impacted by randomized traffic
– A state-tracking IDS is harder to fool with faked out-of-sequence FIN packets (see the sketch below)
13
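A minimal state-table sketch of the FIN case (it ignores the detail that SYN and FIN each consume a sequence number): a FIN the endpoints would reject must not move the IDS's session state, or an attacker can blind the sensor while the real session continues:

    sessions = {}   # (src, dst, sport, dport) -> {"state": str, "next_seq": int}

    def on_packet(key, flags, seq, payload_len):
        s = sessions.setdefault(key, {"state": "ESTABLISHED", "next_seq": seq})
        if "FIN" in flags and seq != s["next_seq"]:
            # Forged, out-of-sequence FIN: the endpoints would ignore it,
            # so the IDS must too, or it stops watching a live session.
            print(f"ignoring out-of-sequence FIN (seq={seq}, expected {s['next_seq']})")
            return
        if "FIN" in flags:
            s["state"] = "CLOSING"
        s["next_seq"] = seq + payload_len

    key = ("10.0.0.1", "10.0.0.2", 1025, 80)
    on_packet(key, set(), 5000, 100)     # normal data segment
    on_packet(key, {"FIN"}, 99999, 0)    # forged FIN with a bogus sequence
    print(sessions[key]["state"])        # still ESTABLISHED

The per-session entries also show why the number of simultaneous streams matters: every tracked session costs memory.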
Analyzing Selected Sessions
• IDS can “optimize” performance by only reassembling or tracking TCP sessions associated with known signatures (see the dispatch sketch below)
– IDS might have extremely good
performance against random traffic but
poor performance against (e.g.) Web traffic
– Tradeoff is coverage versus performance;
vendors do not usually document this
14
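A minimal dispatch sketch with a hypothetical set of signature-covered ports: randomly addressed traffic mostly takes the cheap path, so a random-traffic benchmark flatters exactly the optimization that hurts on real Web-heavy networks:

    import random

    PORTS_WITH_SIGNATURES = {21, 25, 80}   # hypothetical signature coverage

    analyzed = skipped = 0

    def deep_inspect(payload):
        pass   # stand-in for reassembly + signature matching

    def dispatch(dst_port, payload):
        global analyzed, skipped
        if dst_port in PORTS_WITH_SIGNATURES:
            analyzed += 1          # expensive path: per-session state
            deep_inspect(payload)
        else:
            skipped += 1           # fast path: packet is dropped unexamined

    for _ in range(10_000):
        dispatch(random.randint(1, 65535), b"")   # random traffic
    print(f"analyzed={analyzed} skipped={skipped}")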
Naïve Simulation Network
[Diagram: an attack generator injects an attack stream onto a test network shared by the target host and the NIDS]
15
What’s Wrong?
• The Naïve test network permits traffic that is not likely to be seen in a “real world” deployment - e.g., ARP cache poisoning (you see a lot of this on DEFCON CTF networks)
• The presence of a router would
“smooth” spikes somewhat and actually
achieve higher sustained loads
16
Naïve Simulation Network #2
[Diagram: an attack generator sends an attack stream directly at the NIDS]
17
What’s Wrong?
• SmartBits-style traffic generators do not generate “real” TCP traffic
– This penalizes IDSs that actually look at streams and try to reassemble them, which is a desirable property of a good IDS (see the handshake-checking sketch below)
18
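One way to check a generator's output is to capture it and count complete three-way handshakes; a sketch using scapy, with a hypothetical capture file name:

    from scapy.all import IP, TCP, rdpcap

    packets = rdpcap("generated.pcap")   # capture of the generator's output

    syns, synacks, acks = set(), set(), set()
    for pkt in packets:
        if IP not in pkt or TCP not in pkt:
            continue
        flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        flags = str(pkt[TCP].flags)      # e.g. "S", "SA", "A", "PA"
        if flags == "S":
            syns.add(flow)
        elif "S" in flags and "A" in flags:
            # SYN+ACK travels server-to-client; normalize to the client flow.
            synacks.add((flow[2], flow[3], flow[0], flow[1]))
        elif "A" in flags:
            acks.add(flow)

    complete = syns & synacks & acks
    print(f"{len(complete)} of {len(syns)} SYN flows completed a handshake")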
Skunking a Benchmark
[Diagram: an attack generator blasts a raw packet stream at a per-host network IDS]
19
What’s Wrong?
• Raw per-packet counts from such a generator are not relevant to a per-host network IDS, which only ever sees traffic addressed to its own host
20
Skunking a Benchmark: #2
[Diagram: an attack generator sends an attack stream at a NIDS with selective detection turned on]
21
What’s Wrong?
• An IDS with selective detection can be configured to look only at traffic aimed at the local subnet (see the filtering sketch below)
– SmartBits-style generators’ random traffic largely gets seen and discarded
22
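A minimal sketch of why this skunks the result, with a hypothetical home subnet: against randomly addressed traffic the subnet check discards nearly everything before any detection logic runs, so the benchmark ends up measuring the filter rather than the detector:

    import ipaddress
    import random

    HOME_NET = ipaddress.ip_network("10.1.1.0/24")   # hypothetical local subnet

    inspected = discarded = 0
    for _ in range(100_000):
        dst = ipaddress.ip_address(random.getrandbits(32))   # random destination
        if dst in HOME_NET:
            inspected += 1    # only these reach the detection engine
        else:
            discarded += 1    # dropped by the cheap subnet filter
    print(f"inspected={inspected} discarded={discarded}")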
Effective Simulation Network
[Diagram: recorded packets are replayed and dumped back onto the test network (see the replay sketch below)]
25
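A minimal replay sketch using scapy (file and interface names hypothetical); tcpreplay does the same job at much higher rates. Replaying a capture of real traffic with attacks merged in exercises reassembly, re-ordering, and state tracking in a way a stateless generator cannot:

    from scapy.all import rdpcap, sendp

    # Dump a recorded traffic capture back onto the test network.
    packets = rdpcap("recorded_traffic.pcap")    # real traffic + merged attacks
    sendp(packets, iface="eth1", verbose=False)  # hypothetical test interface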
Summary
• It’s easy to skunk an intrusion detection
benchmark
• It’s hard to design a good intrusion
detection benchmark
• If you want to see if a given system
works, the best way to find out is to try
it on your actual network
26