02 - 03 - Anomaly Detection Survey
• Scores
– E.g. a probability
• Labels
– E.g. normal or anomalous
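The two output types are related: a score can be turned into a label by thresholding. The following is a minimal sketch, assuming a higher score means "more anomalous"; the function name `scores_to_labels` and the threshold of 0.5 are illustrative choices, not from the survey.

```python
def scores_to_labels(scores, threshold):
    """Map anomaly scores to labels.

    Assumption: a higher score means the instance is more anomalous.
    """
    return ["anomalous" if s > threshold else "normal" for s in scores]


# Hypothetical scores, e.g. estimated probabilities of being anomalous.
scores = [0.02, 0.10, 0.95, 0.07]
labels = scores_to_labels(scores, threshold=0.5)
print(labels)
```

Keeping the raw scores lets an analyst rank instances and move the threshold later; labels alone discard that information.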
Applications
• Intrusion Detection (host and network based)
• Fraud Detection (e.g. ID theft, credit card, mobile phone,
insurance claim, insider trading)
• Medical data (e.g. tumor detection, A-fib, etc.)
• Machine defects
• Image Processing
• Text data (e.g. terrorist threats)
• Sensor networks (e.g. gunshot detection)
Classification
• Semi-supervised
– Train a classifier on labeled normal data only, then use it to
flag instances that do not fit as anomalies
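A minimal sketch of the semi-supervised idea: fit a model on normal data only, then flag anything that deviates from it. A one-dimensional z-score test stands in here for a real classifier; the names `fit_normal_model` and `is_anomalous`, the cutoff `max_z=3.0`, and the sample values are all assumptions made for illustration.

```python
import statistics


def fit_normal_model(normal_values):
    """Learn mean and standard deviation from normal-only training data."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(x, model, max_z=3.0):
    """Flag x if it lies more than max_z standard deviations from the mean.

    Assumes the training data is not constant (std > 0).
    """
    mean, std = model
    return abs(x - mean) / std > max_z


# Hypothetical training set containing only normal readings.
normal_train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
model = fit_normal_model(normal_train)

# One reading close to the training data, one far from it.
flags = [is_anomalous(x, model) for x in [10.05, 14.0]]
print(flags)
```

No anomalous examples are needed for training, which is the practical appeal of this setting.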
Neural Network Classification
• Use a neural network (NN) to learn the classes, then check
how new data is classified
Bayesian Networks
• Advantages
– Unsupervised, data-driven
– Can be improved with semi-supervised to catch more
anomalies
– Straightforward to adapt to new datasets
kth nearest neighbor
• Disadvantages
– Unsupervised performs poorly if normal instances don’t have
enough neighbors, or anomalies have too many
– Semi-supervised performs poorly if normal instances don’t have
enough neighbors
– Computational complexity of testing is high
– Defining a distance measure may be difficult (e.g. for network packets)
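A minimal sketch of scoring with the distance to the kth nearest neighbor, assuming Euclidean distance on 2-D points; the function name `kth_nn_distance`, the choice `k=2`, and the sample data are illustrative assumptions.

```python
import math


def kth_nn_distance(point, data, k):
    """Anomaly score: distance from point to its kth nearest neighbor.

    The point itself is skipped by identity, so it must be an element of data.
    """
    dists = sorted(math.dist(point, other) for other in data if other is not point)
    return dists[k - 1]


# Four points in a tight cluster plus one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = [kth_nn_distance(p, data, k=2) for p in data]

# The isolated point has the largest score.
most_anomalous = data[scores.index(max(scores))]
print(most_anomalous)
```

Scoring every point this way compares it against all others, which illustrates the high testing cost listed above; the call to `math.dist` is also where a domain-specific distance would have to be substituted.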
Clustering