
Anomaly Detection Survey

Dr. Martin “Doc” Carlisle


Motivation

• Anomalies may yield critical, actionable intelligence
– Traffic pattern may indicate a hacked computer
– Anomaly in an MRI might indicate a malignant tumor
– Anomaly in credit card data might mean theft
– Spacecraft sensor anomaly might indicate a component fault
Challenges
• Defining “normal” is hard
– Boundary between “normal” and “anomalous” often not precise
• Malicious actors adapt to look “normal”
• “Normal” behavior keeps changing
• Techniques don’t transfer easily between domains
– E.g. a small deviation in human body temperature is anomalous, while a similar relative change in a stock price is normal
• Labeled data is hard to get!
• Noise can be similar to “anomalies”
Anomaly Types (I)
• Point anomaly
– A single data instance is anomalous on its own, e.g. an unusually large amount spent, in credit card fraud detection
Anomaly Types (II)
• Contextual anomaly
– Point is anomalous only in its context, e.g. a temperature reading that would be normal in winter but is anomalous in June
Anomaly Types (III)
• Collective Anomaly
– Single points aren’t anomalous, but collectively they are
Do we have labeled data?
• Supervised
– Can train on data with labeled instances of normal vs anomaly
classes
– Not very common
• Semi-supervised
– Labeled instances for only normal data
• Unsupervised
– No labeled data
Outputs

• Scores
– E.g. a probability
• Labels
– E.g. normal or anomalous
Applications
• Intrusion Detection (host and network based)
• Fraud Detection (e.g. ID theft, credit card, mobile phone,
insurance claim, insider trading)
• Medical data (e.g. tumor detection, A-fib, etc.)
• Machine defects
• Image Processing
• Text data (e.g. terrorist threats)
• Sensor networks (e.g. gunshot detection)
Classification
• Semi-supervised
– Train a classifier with labeled normal data, then use it to identify
anomalies
Neural Network Classification
• Train a neural network on the known (normal) classes, then see how
confidently it classifies new data; inputs it cannot assign to any
class are flagged as anomalous (see the sketch below)
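
A minimal sketch of this idea, assuming scikit-learn; the synthetic data, network size, and 0.9 confidence threshold are illustrative choices, not from the slides:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Train a small network on two known "normal" classes (synthetic data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] > 0).astype(int)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

# Flag test inputs the network classifies with low confidence.
X_test = np.array([[0.8, -0.2], [0.0, 9.0]])
confidence = clf.predict_proba(X_test).max(axis=1)
is_anomaly = confidence < 0.9   # illustrative threshold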
Bayesian Networks
• Determine the probability that a particular data point is from each
class, using probability distributions obtained from training data
– Select the “most probable” class (see the sketch below)
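
A minimal sketch, using scikit-learn's naive Bayes classifier as the simplest stand-in for the Bayesian approach above; the synthetic data and labels are illustrative:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Per-class Gaussian distributions are estimated from labeled training data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),   # normal class
               rng.normal(6, 1, (20, 2))])   # anomaly class
y = np.array([0] * 200 + [1] * 20)
model = GaussianNB().fit(X, y)

# Compute P(class | x) for a test point and select the most probable class.
posterior = model.predict_proba([[5.5, 6.2]])
most_probable = model.classes_[posterior.argmax()]   # 1 (anomaly class) here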
Support Vector Machine based
• Learn hyperplanes that split the training data into regions, then
check which region a new data point falls in
– “Kernels” can map non-linearly separable data into a space where a
separating surface exists (see the sketch below)
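
A minimal sketch, assuming scikit-learn; it uses a one-class SVM (trained on normal data only) as a concrete instance of the region-based idea above, with illustrative parameter choices:

import numpy as np
from sklearn.svm import OneClassSVM

# Learn a region enclosing the normal training data, using an RBF kernel.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# Check which side of the learned boundary each test point falls on.
labels = svm.predict([[0.2, -0.1], [6.0, 6.0]])   # +1 = normal region, -1 = outside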
Rule based
• Create a set of rules (e.g. “users don’t log in more than once a
day”) that capture normal behavior; instances not covered by the
rules are flagged as anomalous (see the sketch below)
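
A minimal sketch of a rule-based check in plain Python; the log format is hypothetical:

from collections import Counter

# Hypothetical (user, date) login events.
logins = [("alice", "2024-01-05"), ("bob", "2024-01-05"),
          ("alice", "2024-01-05")]

# Rule: users don't log in more than once a day.
counts = Counter(logins)
anomalies = [event for event, n in counts.items() if n > 1]
print(anomalies)   # [('alice', '2024-01-05')]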
Nearest Neighbor Techniques
• Distance to the kth nearest neighbor is the anomaly score (see the
sketch below)
• Relative density of a data instance is the anomaly score
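
A minimal sketch of the kth-nearest-neighbor score, assuming scikit-learn; the synthetic data and k = 5 are illustrative:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(300, 2)), [[7.0, 7.0]]])   # one injected outlier

# k + 1 neighbors because each point is its own nearest neighbor.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, _ = nn.kneighbors(X)
score = dist[:, -1]          # distance to the kth true neighbor
print(score.argmax())        # 300: the injected outlier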
kth nearest neighbor
• Local density is better than global density: comparing each point’s
density to that of its neighbors handles datasets whose regions have
different densities (see the sketch below)
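
A minimal sketch, using scikit-learn's Local Outlier Factor, a standard local-density method; the synthetic two-density data is illustrative:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (100, 2)),   # dense cluster
               rng.normal(5, 2.0, (100, 2)),   # sparse cluster
               [[2.5, 2.5]]])                  # point between the clusters

# LOF compares each point's density to that of its neighbors, so the
# sparse cluster isn't wholesale flagged as anomalous.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                    # -1 = anomaly, +1 = normal
scores = -lof.negative_outlier_factor_         # higher = more anomalous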
kth nearest neighbor

• Advantages
– Unsupervised, data-driven
– Can be improved with semi-supervised to catch more
anomalies
– Straightforward to adapt to new datasets
kth nearest neighbor
• Disadvantages
– Unsupervised performs poorly if normal instances don’t have
enough neighbors, or anomalies have too many
– Semi-supervised performs poorly if normal instances don’t have
enough neighbors
– Computational complexity of testing is high
– Defining distance may be difficult (think network packets, e.g.)
Clustering
• Similar to kth nearest neighbor, but define centroids of “normal”
clusters
• Anomaly score is the distance to the nearest centroid (see the
sketch below)
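
A minimal sketch, assuming scikit-learn and k-means as the plugged-in clustering algorithm; the synthetic data is illustrative:

import numpy as np
from sklearn.cluster import KMeans

# Two "normal" modes in the training data.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (200, 2)),
                     rng.normal(10, 1, (200, 2))])
km = KMeans(n_clusters=2, n_init=10).fit(X_train)

# Anomaly score = distance to the nearest centroid.
X_test = np.array([[0.3, -0.4], [5.0, 5.0]])   # second point lies between clusters
score = np.min(km.transform(X_test), axis=1)   # the in-between point scores higher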
Clustering Pros and Cons
• Advantages
– Unsupervised
– Can be adapted to other complex data types by plugging in a clustering
algorithm
– Testing phase is fast
• Disadvantages
– Highly dependent on effectiveness of clustering algorithms in capturing
structure
– Many techniques detect anomalies as a byproduct of clustering and are
not optimized for anomaly detection
– Techniques may force anomalies to be assigned to some cluster
– Some techniques only effective when anomalies don’t cluster
– Clustering may be slow
Statistical Techniques
• Parametric
– Assume data is generated by a parameterized distribution
• Anomaly score is the inverse of the probability density function
– Gaussian Model-Based
• Assume data is generated from a Gaussian distribution
– 3-sigma rule: ~99.7% of values fall within 3 standard deviations of the mean
– Box plot rule: flag points more than 1.5 * IQR (the inter-quartile range,
i.e. the difference between the upper and lower quartiles) beyond the
quartiles; covers ~99.3% of Gaussian data
– Grubbs’ test: anomalous if the difference from the mean, divided by the
standard deviation, exceeds a critical value derived from the t-distribution
(a function of sample size and significance level); see the sketch below
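
A minimal sketch of the 3-sigma and box plot rules in NumPy; the synthetic one-dimensional data is illustrative:

import numpy as np

rng = np.random.default_rng(0)
x = np.append(rng.normal(50, 5, 500), 95.0)   # one injected outlier

# 3-sigma rule: ~99.7% of Gaussian data lies within mean +/- 3 std devs.
z = np.abs(x - x.mean()) / x.std()
three_sigma_outliers = np.where(z > 3)[0]

# Box plot rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
box_outliers = np.where((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr))[0]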
Statistical Techniques
• Parametric
– Regression Model-Based
• Use residual (part of test instance not explained by regression model)
• Non-Parametric
– Histogram (does this fall in an empty or small bin?)
– Kernel Function
• E.g. Parzen windows (use a kernel function to approximate the
density); see the sketch below
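
A minimal sketch of a Parzen-window (kernel density) score, assuming SciPy; the synthetic data is illustrative:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = np.append(rng.normal(0, 1, 500), 8.0)   # one injected outlier

# Approximate the density with a Gaussian kernel; the anomaly score is
# the inverse of the estimated density.
kde = gaussian_kde(x)
score = 1.0 / kde(x)
print(score.argmax())                       # 500: the injected outlier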
Statistical Pros and Cons
• Advantages
– If the assumptions about the underlying statistical distribution hold,
this is a statistically justifiable technique
– Score corresponds to confidence interval, which can be used for
tuning
– If distribution estimate robust to anomalies, can be unsupervised
• Disadvantages
– Many datasets don’t come from a particular distribution
– Even if they do, choosing best test statistic isn’t straightforward
– Histogram techniques don’t consider interactions between attributes
• (each attribute value may be common on its own, but the combination is rare)
Information Theoretic Techniques
• Assumption: anomalies introduce irregularities in the information
content of the dataset
– (This is an oversimplification, but consider how much easier the data
would be to compress if a particular element were removed; see the
sketch below)
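
A minimal sketch of the compression intuition, using zlib; the record contents are hypothetical:

import zlib

# Fifty identical log lines plus one high-entropy record.
records = ["GET /index.html"] * 50 + ["kz9!Qp#7vX^"]

def compressed_size(items):
    return len(zlib.compress("\n".join(items).encode()))

# Score each record by how much the compressed size drops when it is
# removed; the record that compresses worst with the rest scores highest.
total = compressed_size(records)
scores = [total - compressed_size(records[:i] + records[i + 1:])
          for i in range(len(records))]
print(scores.index(max(scores)))   # 50: the high-entropy record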
Spectral Anomaly Detection
• Assumption: the data can be embedded in a lower-dimensional subspace
where anomalies look very different from normal instances
– E.g. Principal Component Analysis (see the sketch below)
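
A minimal sketch of PCA-based spectral detection, assuming scikit-learn; the synthetic near-one-dimensional data is illustrative:

import numpy as np
from sklearn.decomposition import PCA

# Normal data lies close to a 1-D subspace (a line); one point does not.
rng = np.random.default_rng(0)
t = rng.normal(size=300)
X = np.column_stack([t, 2 * t + rng.normal(0, 0.1, 300)])
X = np.vstack([X, [[2.0, -4.0]]])

# Project onto the top principal component and use the reconstruction
# error (distance from the subspace) as the anomaly score.
pca = PCA(n_components=1).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))
score = np.linalg.norm(X - X_hat, axis=1)
print(score.argmax())   # 300: the off-subspace point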
