
Anomaly Detection Survey

Dr. Martin “Doc” Carlisle


Motivation

• Anomalies may yield critical, actionable intelligence
– Traffic pattern may indicate a hacked computer
– Anomaly in an MRI might indicate a malignant tumor
– Anomaly in credit card data might mean theft
– Spacecraft sensor anomaly might indicate a component fault
Challenges
• Defining “normal” is hard
– Boundary between “normal” and “anomalous” often not precise
• Malicious actors adapt to look “normal”
• “Normal” behavior keeps changing
• Techniques don’t transfer easily between domains
– E.g. a small deviation in human body temperature is anomalous, while a similar relative change in a stock price is normal
• Labeled data is hard to get!
• Noise can be similar to “anomalies”
Anomaly Types (I)
• Point anomaly
– A single data instance is anomalous on its own, e.g. an unusually large amount spent, in credit card fraud detection
Anomaly Types (II)
• Contextual anomaly
– Point is anomalous only in its context, e.g. a temperature reading that would be normal in winter but is anomalous in June
Anomaly Types (III)
• Collective Anomaly
– Single points aren’t anomalous, but collectively they are
Do we have labeled data?
• Supervised
– Can train on data with labeled instances of normal vs anomaly
classes
– Not very common
• Semi-supervised
– Labeled instances for only normal data
• Unsupervised
– No labeled data
Outputs

• Scores
– E.g. a probability
• Labels
– E.g. normal or anomalous
Applications
• Intrusion Detection (host and network based)
• Fraud Detection (e.g. ID theft, credit card, mobile phone,
insurance claim, insider trading)
• Medical data (e.g. tumor detection, A-fib, etc.)
• Machine defects
• Image Processing
• Text data (e.g. terrorist threats)
• Sensor networks (e.g. gunshot detection)
Classification
• Semi-supervised
– Train a classifier with labeled normal data, then use it to identify
anomalies
Neural Network Classification
• Train a neural network on the known (normal) classes, then see how
confidently it classifies new data; inputs it cannot assign to any
class are flagged as anomalous (see the sketch below)
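
A minimal sketch of this idea, assuming scikit-learn; the synthetic data, network size, and 0.9 confidence threshold are illustrative choices, not from the slides:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Train a small network on two known "normal" classes (synthetic data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] > 0).astype(int)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

# Flag test inputs the network classifies with low confidence.
X_test = np.array([[0.8, -0.2], [0.0, 9.0]])
confidence = clf.predict_proba(X_test).max(axis=1)
is_anomaly = confidence < 0.9   # illustrative threshold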
Bayesian Networks
• Determine the probability that a particular data point is from each
class, using probability distributions obtained from training data
– Select the “most probable” class (see the sketch below)
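
A minimal sketch, using scikit-learn's naive Bayes classifier as the simplest stand-in for the Bayesian approach above; the synthetic data and labels are illustrative:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Per-class Gaussian distributions are estimated from labeled training data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),   # normal class
               rng.normal(6, 1, (20, 2))])   # anomaly class
y = np.array([0] * 200 + [1] * 20)
model = GaussianNB().fit(X, y)

# Compute P(class | x) for a test point and select the most probable class.
posterior = model.predict_proba([[5.5, 6.2]])
most_probable = model.classes_[posterior.argmax()]   # 1 (anomaly class) here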
Support Vector Machine based
• Learn hyperplanes that split the training data into regions, then
check which region a new data point falls in
– “Kernels” can map non-linearly separable data into a space where a
separating surface exists (see the sketch below)
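
A minimal sketch, assuming scikit-learn; it uses a one-class SVM (trained on normal data only) as a concrete instance of the region-based idea above, with illustrative parameter choices:

import numpy as np
from sklearn.svm import OneClassSVM

# Learn a region enclosing the normal training data, using an RBF kernel.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# Check which side of the learned boundary each test point falls on.
labels = svm.predict([[0.2, -0.1], [6.0, 6.0]])   # +1 = normal region, -1 = outside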
Rule based
• Create a set of rules (e.g. “users don’t log in more than once a
day”) that capture normal behavior; instances not covered by the
rules are flagged as anomalous (see the sketch below)
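
A minimal sketch of a rule-based check in plain Python; the log format is hypothetical:

from collections import Counter

# Hypothetical (user, date) login events.
logins = [("alice", "2024-01-05"), ("bob", "2024-01-05"),
          ("alice", "2024-01-05")]

# Rule: users don't log in more than once a day.
counts = Counter(logins)
anomalies = [event for event, n in counts.items() if n > 1]
print(anomalies)   # [('alice', '2024-01-05')]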
Nearest Neighbor Techniques
• Distance to the kth nearest neighbor is the anomaly score (see the
sketch below)
• Relative density of a data instance is the anomaly score
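
A minimal sketch of the kth-nearest-neighbor score, assuming scikit-learn; the synthetic data and k = 5 are illustrative:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(300, 2)), [[7.0, 7.0]]])   # one injected outlier

# k + 1 neighbors because each point is its own nearest neighbor.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, _ = nn.kneighbors(X)
score = dist[:, -1]          # distance to the kth true neighbor
print(score.argmax())        # 300: the injected outlier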
kth nearest neighbor
• Local density is better than global density: comparing each point’s
density to that of its neighbors handles datasets whose regions have
different densities (see the sketch below)
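
A minimal sketch, using scikit-learn's Local Outlier Factor, a standard local-density method; the synthetic two-density data is illustrative:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (100, 2)),   # dense cluster
               rng.normal(5, 2.0, (100, 2)),   # sparse cluster
               [[2.5, 2.5]]])                  # point between the clusters

# LOF compares each point's density to that of its neighbors, so the
# sparse cluster isn't wholesale flagged as anomalous.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                    # -1 = anomaly, +1 = normal
scores = -lof.negative_outlier_factor_         # higher = more anomalous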
kth nearest neighbor

• Advantages
– Unsupervised, data-driven
– Can be improved with semi-supervised to catch more
anomalies
– Straightforward to adapt to new datasets
kth nearest neighbor
• Disadvantages
– Unsupervised performs poorly if normal instances don’t have
enough neighbors, or anomalies have too many
– Semi-supervised performs poorly if normal instances don’t have
enough neighbors
– Computational complexity of testing is high
– Defining distance may be difficult (think network packets, e.g.)
Clustering
• Similar to kth nearest neighbor, but define centroids of “normal”
clusters
• Anomaly score is the distance to the nearest centroid (see the
sketch below)
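
A minimal sketch, assuming scikit-learn and k-means as the plugged-in clustering algorithm; the synthetic data is illustrative:

import numpy as np
from sklearn.cluster import KMeans

# Two "normal" modes in the training data.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (200, 2)),
                     rng.normal(10, 1, (200, 2))])
km = KMeans(n_clusters=2, n_init=10).fit(X_train)

# Anomaly score = distance to the nearest centroid.
X_test = np.array([[0.3, -0.4], [5.0, 5.0]])   # second point lies between clusters
score = np.min(km.transform(X_test), axis=1)   # the in-between point scores higher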
Clustering Pros and Cons
• Advantages
– Unsupervised
– Can be adapted to other complex data types by plugging in a clustering
algorithm
– Testing phase is fast
• Disadvantages
– Highly dependent on effectiveness of clustering algorithms in capturing
structure
– Many techniques detect anomalies as a byproduct of clustering and are
not optimized for anomaly detection
– Techniques may force anomalies to be assigned to some cluster
– Some techniques only effective when anomalies don’t cluster
– Clustering may be slow
Statistical Techniques
• Parametric
– Assume data is generated by a parameterized distribution
• Anomaly score is the inverse of the probability density function
– Gaussian Model-Based
• Assume data is generated from a Gaussian distribution
– 3-sigma rule: ~99.7% of values fall within 3 standard deviations of the mean
– Box plot rule: flag points more than 1.5 * IQR (the inter-quartile range,
i.e. the difference between the upper and lower quartiles) beyond the
quartiles; covers ~99.3% of Gaussian data
– Grubbs’ test: anomalous if the difference from the mean, divided by the
standard deviation, exceeds a critical value derived from the t-distribution
(a function of sample size and significance level); see the sketch below
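
A minimal sketch of the 3-sigma and box plot rules in NumPy; the synthetic one-dimensional data is illustrative:

import numpy as np

rng = np.random.default_rng(0)
x = np.append(rng.normal(50, 5, 500), 95.0)   # one injected outlier

# 3-sigma rule: ~99.7% of Gaussian data lies within mean +/- 3 std devs.
z = np.abs(x - x.mean()) / x.std()
three_sigma_outliers = np.where(z > 3)[0]

# Box plot rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
box_outliers = np.where((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr))[0]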
Statistical Techniques
• Parametric
– Regression Model-Based
• Use residual (part of test instance not explained by regression model)
• Non-Parametric
– Histogram (does this fall in an empty or small bin?)
– Kernel Function
• E.g. Parzen windows (use a kernel function to approximate the
density); see the sketch below
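
A minimal sketch of a Parzen-window (kernel density) score, assuming SciPy; the synthetic data is illustrative:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = np.append(rng.normal(0, 1, 500), 8.0)   # one injected outlier

# Approximate the density with a Gaussian kernel; the anomaly score is
# the inverse of the estimated density.
kde = gaussian_kde(x)
score = 1.0 / kde(x)
print(score.argmax())                       # 500: the injected outlier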
Statistical Pros and Cons
• Advantages
– If the assumptions about the underlying statistical distribution hold,
this is a statistically justifiable technique
– Score corresponds to confidence interval, which can be used for
tuning
– If distribution estimate robust to anomalies, can be unsupervised
• Disadvantages
– Many datasets don’t come from a particular distribution
– Even if they do, choosing best test statistic isn’t straightforward
– Histogram techniques don’t consider interactions between attributes
• (each attribute value may be common on its own, but the combination is rare)
Information Theoretic Techniques
• Assumption: anomalies introduce irregularities in the information
content of the dataset
– (This is an oversimplification, but consider how much easier the data
would be to compress if a particular element were removed; see the
sketch below)
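
A minimal sketch of the compression intuition, using zlib; the record contents are hypothetical:

import zlib

# Fifty identical log lines plus one high-entropy record.
records = ["GET /index.html"] * 50 + ["kz9!Qp#7vX^"]

def compressed_size(items):
    return len(zlib.compress("\n".join(items).encode()))

# Score each record by how much the compressed size drops when it is
# removed; the record that compresses worst with the rest scores highest.
total = compressed_size(records)
scores = [total - compressed_size(records[:i] + records[i + 1:])
          for i in range(len(records))]
print(scores.index(max(scores)))   # 50: the high-entropy record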
Spectral Anomaly Detection
• Assumption: the data can be embedded in a lower-dimensional subspace
where anomalies look very different from normal instances
– E.g. Principal Component Analysis (see the sketch below)
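
A minimal sketch of PCA-based spectral detection, assuming scikit-learn; the synthetic near-one-dimensional data is illustrative:

import numpy as np
from sklearn.decomposition import PCA

# Normal data lies close to a 1-D subspace (a line); one point does not.
rng = np.random.default_rng(0)
t = rng.normal(size=300)
X = np.column_stack([t, 2 * t + rng.normal(0, 0.1, 300)])
X = np.vstack([X, [[2.0, -4.0]]])

# Project onto the top principal component and use the reconstruction
# error (distance from the subspace) as the anomaly score.
pca = PCA(n_components=1).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))
score = np.linalg.norm(X - X_hat, axis=1)
print(score.argmax())   # 300: the off-subspace point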
