Location:
QuantUniversity Meetup
June 23rd 2016
Boston MA
Anomaly Detection
Techniques and Best Practices
2016 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Slides and Code available at:
http://www.analyticscertificate.com/Anomaly/
- Analytics Advisory services
- Custom training programs
- Architecture assessments, advice and audits
• Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior Experience at MathWorks, Citigroup and
Endeca and 25+ financial services and energy
customers (Shell, Firstfuel Software etc.)
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Launching the Analytics Certificate
Program later in Fall
(MATLAB version also available)
Events of Interest
• July
▫ 11th: QuantUniversity’s 2nd meetup
- Topic: Quantitative methods (TBD)
▫ 18th and 19th: 2-day workshop on Anomaly Detection
- Registration and pricing details at www.analyticscertificate.com/Anomaly
• August
▫ 8th: QuantUniversity meetup
▫ 14-20th: ARPM in New York www.arpm.co
- QuantUniversity presenting on Model Risk on August 14th
▫ 18-21st: Big-data Bootcamp http://globalbigdataconference.com/68/boston/big-data-bootcamp/event.html
QuantUniversity’s Summer workshop series
• July
▫ Anomaly Detection Workshop
▫ ARPM-prep seminar – Date TBD
• August
▫ Model Evaluation: Metrics, Scaling and Best Practices
• September
▫ What’s missing? Best practices in missing data analysis
What is anomaly detection?
• Anomalies or outliers are data points that appear to deviate markedly from expected outputs.
• Anomaly detection is the process of finding patterns in data that do not conform to prior expected behavior.
• Anomaly detection is increasingly employed on the big data captured by sensors (IoT), social media platforms, huge networks, etc., in areas including energy systems, medical devices, banking and network intrusion detection.
Examples
• Fraud detection
• Stock market
• E-commerce
Three methodologies for anomaly detection
1. Graphical approach
2. Statistical approach
3. Machine learning approach
- Boxplot
- Scatter plot
- Adjusted quantile plot
Anomaly Detection Methods
• Most outlier detection methods produce one of two kinds of output:
▫ Real-valued outlier scores: quantify the tendency of a data point to be an outlier by assigning it a score or probability.
▫ Binary labels: the result of applying a threshold to outlier scores to label each point as an inlier or an outlier.
Graphical approaches
• Statistical tails are most commonly used for one-dimensional distributions, although the same concept can be applied to the multidimensional case.
• It is important to understand that all extreme values are outliers, but the reverse may not be true.
• For instance, in the one-dimensional data set {1, 3, 3, 3, 50, 97, 97, 97, 100}, the observation 50 is approximately equal to the mean (451/9 ≈ 50.1) and is not considered an extreme value; but since it is the most isolated point, it should be considered an outlier.
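To make the extreme-versus-isolated distinction concrete, here is a small Python sketch (the workshop scripts are in R; Python is used here only for illustration) that compares the mean of the example data set with each point's distance to its nearest neighbor. Using nearest-neighbor distance as the isolation measure is an assumption, one simple choice among many.

```python
import statistics

# One-dimensional example data set from the slide
data = [1, 3, 3, 3, 50, 97, 97, 97, 100]


def nearest_neighbor_distance(values):
    """Distance from each value to its closest *other* value."""
    return [
        min(abs(v - w) for j, w in enumerate(values) if j != i)
        for i, v in enumerate(values)
    ]


mean = statistics.mean(data)           # 451 / 9, about 50.11: 50 is not extreme
nn = nearest_neighbor_distance(data)   # a large distance marks an isolated point
most_isolated = data[nn.index(max(nn))]
print(round(mean, 2), nn, most_isolated)  # 50.11 [2, 0, 0, 0, 47, 0, 0, 0, 3] 50
```

The value 50 has a z-score of almost zero, yet it is 47 units away from its closest neighbor, far more than any other point.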
Box plot
• A standardized way of displaying the variation of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum.
• This plot does not make any assumptions about the underlying statistical distribution.
• In the conventional (Tukey) box plot, the whiskers extend to the most extreme data points within 1.5 × IQR of the first and third quartiles; any data point beyond the whiskers is considered an outlier.
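A minimal Python sketch of the 1.5 × IQR whisker rule described above. The function names are hypothetical. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`. R's `boxplot()` uses Tukey's hinges, so results can differ slightly for some sample sizes.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" matches R's default quantile() (type 7)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_outliers(values, k=1.5):
    """Values lying beyond the whiskers."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


sample = [2, 4, 4, 4, 5, 5, 7, 9, 30]
print(tukey_fences(sample))      # (-0.5, 11.5)
print(boxplot_outliers(sample))  # [30]
```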
Boxplot
See Graphical_Approach.R
Side-by-side boxplot for each variable
Scatter plot
• Scatter plots show pairs of data values, typically to display the correlation between two numerical variables.
• An outlier is defined as a data point that doesn't seem to fit with the rest of the data points.
• In scatter plots, outliers can be shown for either the intersection or the union of the two variables' outlier sets, i.e., points that are outliers in both variables or in at least one of them.
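The intersection/union idea can be sketched by flagging each variable separately with the same 1.5 × IQR rule and then combining the index sets. This is an assumed reading of the slide. The helper names are hypothetical and are not taken from Graphical_Approach.R.

```python
import statistics


def outside_fences(values, k=1.5):
    """Indices of values beyond the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {i for i, v in enumerate(values) if v < q1 - k * iqr or v > q3 + k * iqr}


def scatter_outliers(xs, ys, k=1.5):
    """Return (intersection, union) of the per-variable outlier index sets."""
    out_x, out_y = outside_fences(xs, k), outside_fences(ys, k)
    return out_x & out_y, out_x | out_y


xs = [1, 2, 3, 4, 5, 6, 7, 8, 50, 60]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 70]
both, either = scatter_outliers(xs, ys)
print(sorted(both), sorted(either))  # [9] [8, 9]
```

Point 8 is unusual only in x, so it appears in the union but not the intersection.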
Scatterplot
See Graphical_Approach.R
Scatterplot of Sepal.Width and Sepal.Length
• In statistics, a Q–Q plot is a probability plot, which is a graphical
method for comparing two probability distributions by plotting their
quantiles against each other.
• If the two distributions being compared are similar, the points in the
Q–Q plot will approximately lie on the line y = x.
Q-Q plot
Source: Wikipedia
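The coordinates behind a normal Q–Q plot can be computed without any plotting library. This sketch standardizes the sample, so a normal sample should lie near y = x. The plotting positions (i − 0.5)/n are an assumed convention; R's `qqnorm` uses a slightly different formula for small n.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Pair theoretical N(0, 1) quantiles with the standardized, sorted sample."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    standardized = sorted((x - m) / s for x in sample)
    std_normal = NormalDist()
    # plotting positions (i - 0.5) / n, one common convention
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


pts = normal_qq_points([3.0, 1.0, 2.0])
print(pts)
```

Plotting `pts` and the line y = x gives the Q–Q plot. Points that bend away from the line at the ends suggest heavy tails or outliers.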
Adjusted quantile plot
• This plot identifies possible multivariate outliers by calculating the Mahalanobis
distance of each point from the center of the data.
• The multi-dimensional Mahalanobis distance between vectors x and y in $\mathbb{R}^n$ can be formulated as:
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{T} S^{-1} (\mathbf{x} - \mathbf{y})}$$
where x and y are random vectors of the same distribution with the covariance matrix S.
• An outlier is defined as a point with a distance larger than some pre-
determined value.
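For two variables, the distance and a cut-off can be sketched in pure Python. This sketch makes two assumptions. First, it uses the classical sample mean and covariance; mvoutlier's adjusted quantile plot relies on robust estimates instead. Second, it takes the "pre-determined value" to be a chi-square quantile with 2 degrees of freedom applied to the squared distance; for 2 df this quantile has the closed form −2 ln(1 − p).

```python
import math
import statistics


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # sample covariance matrix S = [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy * sxy
    inv_xx, inv_yy, inv_xy = syy / det, sxx / det, -sxy / det  # entries of S^-1
    return [
        (x - mx) ** 2 * inv_xx + 2 * (x - mx) * (y - my) * inv_xy + (y - my) ** 2 * inv_yy
        for x, y in points
    ]


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 df (an exponential with mean 2)."""
    return -2.0 * math.log(1.0 - p)


def mahalanobis_outliers(points, p=0.975):
    """Indices whose squared distance exceeds the chi-square(2) quantile at p."""
    cutoff = chi2_quantile_2df(p)
    return [i for i, d2 in enumerate(mahalanobis_sq_2d(points)) if d2 > cutoff]


# strongly correlated points plus one point (3, 7) off the main trend
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
          (6, 6.0), (7, 6.9), (8, 8.1), (3, 7.0)]
print(mahalanobis_outliers(points, p=0.95))
```

The off-trend point is not extreme in x or y alone. It stands out only because the covariance matrix captures the correlation. The classical estimates are themselves pulled toward the outlier ("masking"). This is one reason robust estimates are preferred in practice.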
Adjusted quantile plot
• Before applying this method, and many other parametric multivariate methods, we first need to check whether the data are multivariate normally distributed, using multivariate normality tests such as Royston's and Mardia's tests, chi-square plots, univariate plots, etc.
• In R, we use the “mvoutlier” package, which utilizes the graphical approaches discussed above.
Adjusted quantile plot
Min-Max normalization before diving into analysis
Multivariate normality test
Outlier Boolean vector identifies the
outliers
Alpha defines maximum thresholding proportion
See Graphical_Approach.R
Adjusted quantile plot
See Graphical_Approach.R
Mahalanobis distances
Covariance matrix
Adjusted quantile plot
See Graphical_Approach.R
- Hypothesis testing (Grubbs’ test)
- Scores
Grubbs’ test
• Tests for outliers in univariate data sets assumed to come from a normally distributed population.
• Grubbs' test detects one outlier at a time. This outlier is expunged from the data set and the test is iterated until no outliers are detected.
• This test is defined for the following hypotheses:
H0: There are no outliers in the data set
H1: There is exactly one outlier in the data set
• The Grubbs' test statistic is defined as:
$$G = \frac{\max_{i} |Y_i - \bar{Y}|}{s}$$
where $\bar{Y}$ and $s$ are the sample mean and standard deviation.
Grubbs’ test
See Statistical_Approach.R
The above function repeats the Grubbs’ test until it finds
all the outliers within the data.
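The iterative procedure can be sketched in Python as follows. The critical value uses the standard two-sided Grubbs formula with a Student-t quantile at α/(2n). The Python standard library has no t quantile, so this sketch computes one numerically with Simpson's rule plus bisection. That is an implementation convenience for this sketch only and does not reflect what the R script does. All function names are hypothetical.

```python
import math
import statistics


def _t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def _t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus a Simpson's-rule integral of the pdf on [0, x]."""
    h = x / steps
    total = _t_pdf(0.0, df) + _t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return 0.5 - total * h / 3


def _t_upper_quantile(p, df):
    """t such that P(T > t) = p (0 < p < 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while _t_upper_tail(hi, df) > p:
        lo, hi = hi, 2 * hi
    for _ in range(50):
        mid = (lo + hi) / 2
        if _t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value of Grubbs' G for sample size n."""
    t = _t_upper_quantile(alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_outliers(data, alpha=0.05):
    """Repeat Grubbs' test, removing one outlier per pass, until none is found."""
    values, found = list(data), []
    while len(values) > 2:
        mean, sd = statistics.mean(values), statistics.stdev(values)
        if sd == 0:
            break
        suspect = max(values, key=lambda v: abs(v - mean))
        if abs(suspect - mean) / sd <= grubbs_critical(len(values), alpha):
            break
        found.append(suspect)
        values.remove(suspect)
    return found


readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 25.0]
print(grubbs_outliers(readings))  # [25.0]
```

After 25.0 is removed, the remaining eight readings are tightly grouped. The second pass therefore stops, which is the "iterate until no outliers are detected" behavior described on the slide.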
Grubbs’ test
See Statistical_Approach.R
Histogram of normal observations vs. outliers
Scores
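• Scores quantify the tendency of a data point to be an outlier by assigning it a score or probability.
• The most commonly used scores are:
▫ Normal score: $z_i = \frac{x_i - \text{mean}}{\text{standard deviation}}$
▫ T-student score: $\frac{z_i \sqrt{n-2}}{\sqrt{n-1-z_i^2}}$, where $z_i$ is the normal score and $n$ is the sample size
▫ Chi-square score: $\left(\frac{x_i - \text{mean}}{sd}\right)^2$
▫ IQR score: for values outside $[Q_1, Q_3]$, the distance to the nearest quartile divided by $IQR = Q_3 - Q_1$
• By using the “scores” function in R (package “outliers”), p-values can be returned instead of scores.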
Scores
See Statistical_Approach.R
“type” defines the type of the score, such as
normal, t-student, etc.
“prob=1” returns the corresponding p-value.
Scores
See Statistical_Approach.R
By setting “prob” to a specific cut-off value, a logical vector is returned that flags as outliers the data points whose probabilities are greater than this cut-off.
By setting “type” to IQR, all values lower than the first quartile or greater than the third quartile are considered, and the difference between each such value and its nearest quartile, divided by the IQR, is calculated.
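A Python sketch of the four scores listed above. The formulas follow the slide. The probability flag (|z| compared against a normal cut-off) is an assumed rule for illustration and is not guaranteed to reproduce R's `scores()` output for every `prob` setting. The function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def outlier_scores(x, kind="z"):
    """Normal ("z"), Student-t ("t"), chi-square ("chisq") or IQR ("iqr") scores."""
    n = len(x)
    mean, sd = statistics.mean(x), statistics.stdev(x)  # sample sd, like R's sd()
    z = [(v - mean) / sd for v in x]
    if kind == "z":
        return z
    if kind == "t":
        return [zi * math.sqrt(n - 2) / math.sqrt(n - 1 - zi * zi) for zi in z]
    if kind == "chisq":
        return [zi * zi for zi in z]
    if kind == "iqr":
        q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
        iqr = q3 - q1
        # 0 inside [Q1, Q3]; otherwise distance to the nearest quartile / IQR
        return [(v - q1) / iqr if v < q1 else (v - q3) / iqr if v > q3 else 0.0
                for v in x]
    raise ValueError(f"unknown score type: {kind}")


def flag_by_probability(x, prob=0.95):
    """Flag points whose normal-score probability exceeds prob (assumed rule)."""
    nd = NormalDist()
    return [nd.cdf(abs(zi)) > prob for zi in outlier_scores(x, "z")]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(outlier_scores(data, "iqr"))
print(flag_by_probability(data, 0.95))
```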
Twitter packages
• Anomaly Detection
▫ Seasonal Hybrid ESD (S-H-ESD) builds upon the Generalized ESD test for detecting anomalies.
▫ It detects point-in-time anomalous data points, which can be global or local. A local anomaly is one that occurs inside a seasonal pattern; anomalies can be positive or negative.
▫ More details here: https://github.com/twitter/AnomalyDetection
• Breakout Detection
▫ In this package, a breakout is characterized by two steady states and an intermediate transition period, which can be sudden or gradual.
▫ It uses the E-Divisive with Medians algorithm, can detect one or multiple breakouts in a given time series, and employs energy statistics to detect divergence in mean. More details here: https://blog.twitter.com/2014/breakout-detection-in-the-wild
Ref: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm
Demo
• Twitter-R-Anomaly Detection tutorial.ipynb
- Linear regression
- Piecewise/segmented regression
- Clustering-based approaches
Linear regression
• Linear regression models the linear relationship between variables and predicts one variable from one or more other variables. It can be formulated as:
$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i$$
where $Y$ and $X_i$ are random variables, $\beta_i$ are the regression coefficients and $\beta_0$ is a constant (the intercept).
• In this model, the ordinary least squares (OLS) estimator is usually used; it chooses the coefficients that minimize the sum of squared differences between the observed and predicted values of the dependent variable.
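A one-predictor sketch of OLS and a residual-based outlier rule (the slide's formula allows p predictors; one is enough to show the idea). The cut-off of k = 2.5 residual standard deviations is an assumption chosen for illustration, since the slides do not prescribe one.

```python
import statistics


def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def residual_outliers(x, y, k=2.5):
    """Indices whose residual exceeds k residual standard deviations."""
    b0, b1 = ols_fit(x, y)
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * s]


x = list(range(20))
y = [2 + 3 * v for v in x]
y[10] += 15  # inject one anomalous response
print(ols_fit(x, y), residual_outliers(x, y))
```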
Piecewise/segmented regression
• A regression method in which the independent variable is partitioned into intervals so that separate linear models can be fitted to the data in different ranges.
• This model can be applied when there are ‘breakpoints’, i.e., clearly different linear relationships in the data with a sudden, sharp change in direction. Below is a simple segmented regression for data with one breakpoint (two segments):
$$Y = \begin{cases} C_0 + \varphi_1 X, & X < X_1 \\ C_1 + \varphi_2 X, & X \geq X_1 \end{cases}$$
where $Y$ is the predicted value, $X$ is the independent variable, $C_0$ and $C_1$ are constants, $\varphi_1$ and $\varphi_2$ are regression coefficients, and $X_1$ is the breakpoint.
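The breakpoint $X_1$ can be estimated by brute force. Try each observed X as the breakpoint, fit one OLS line per segment as in the formula above, and keep the split with the smallest total squared error. This grid search is a swapped-in illustration. The `segmented` package estimates the breakpoint iteratively and fits a continuous broken line, whereas this sketch allows a jump ($C_0 \neq C_1$).

```python
import statistics


def ols_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(x, y):
    """Sum of squared residuals of the OLS line through (x, y)."""
    c, phi = ols_line(x, y)
    return sum((b - c - phi * a) ** 2 for a, b in zip(x, y))


def fit_one_breakpoint(x, y, min_points=3):
    """Try each observed x as X1 and keep the split with the smallest total SSE."""
    pairs = sorted(zip(x, y))
    best_x1, best_sse = None, float("inf")
    for k in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:k], pairs[k:]
        total = sse(*zip(*left)) + sse(*zip(*right))
        if total < best_sse:
            best_x1, best_sse = right[0][0], total
    return best_x1, best_sse


xs = [i / 100 for i in range(100)]
ys = [1 + 0.5 * v if v < 0.5 else 4 * v - 0.5 for v in xs]
print(fit_one_breakpoint(xs, ys))
```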
Anomaly detection vs Supervised learning
Piecewise/segmented regression
• For this example, we use the “segmented” package in R to illustrate piecewise regression on a two-dimensional data set that has a breakpoint around z = 0.5.
See Piecewise_Regression.R
“pmax” returns the parallel (element-wise) maximum and is used to create different values for y.
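In Python, the same element-wise maximum can be written with the built-in `max` inside a comprehension. The hinge form `b0 + b1*z + b2*max(z - 0.5, 0)` and the coefficient values below are assumptions for illustration, not the values used in Piecewise_Regression.R.

```python
def hinge_response(z, b0=1.0, b1=0.5, b2=3.5, breakpoint=0.5):
    """y = b0 + b1*z + b2*max(z - breakpoint, 0): the slope changes at the breakpoint."""
    return b0 + b1 * z + b2 * max(z - breakpoint, 0.0)


zs = [i / 10 for i in range(11)]
ys = [hinge_response(z) for z in zs]  # element-wise, like pmax over a vector
print(ys)
```

Below the breakpoint the slope is `b1` (0.5 here). Above it, the slope is `b1 + b2` (4.0 here).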
Piecewise/segmented regression
• Then, we use linear regression to predict y values for each segment of z.
See Piecewise_Regression.R
Piecewise/segmented regression
• Finally, outliers can be detected within each segment by setting rules for the residuals of the model.
See Piecewise_Regression.R
Here, the rule is set for the residuals corresponding to z less than 0.5: points whose residuals cross the 0.5 cut-off are defined as outliers.
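The per-segment residual rule can be sketched as follows. The sketch assumes that the breakpoint is already known (0.5 here). It also assumes that a point is flagged when its absolute residual exceeds a fixed threshold. The exact rule in Piecewise_Regression.R may differ.

```python
import statistics


def ols_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def segment_residual_outliers(x, y, breakpoint, threshold):
    """Fit one OLS line per segment and flag points with |residual| > threshold."""
    flagged = []
    for in_segment in (lambda v: v < breakpoint, lambda v: v >= breakpoint):
        idx = [i for i, v in enumerate(x) if in_segment(v)]
        c, phi = ols_line([x[i] for i in idx], [y[i] for i in idx])
        flagged += [i for i in idx if abs(y[i] - (c + phi * x[i])) > threshold]
    return sorted(flagged)


xs = [i / 100 for i in range(100)]
ys = [1 + 0.5 * v if v < 0.5 else 4 * v - 0.5 for v in xs]
ys[20] += 2.0  # one anomalous point in the z < 0.5 segment
print(segment_residual_outliers(xs, ys, breakpoint=0.5, threshold=0.5))  # [20]
```

Fitting each segment separately matters. A single line across the breakpoint would leave large residuals on many normal points near the change in slope.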
Clustering-based approaches
• These methods are suitable for unsupervised anomaly detection.
• They aim to partition the data into meaningful groups (clusters) based on the similarities and relationships among the data points.
• Each data point is assigned a degree of membership for each of the clusters.
• Anomalies are those data points that:
▫ Do not fit into any clusters.
▫ Belong to a particular cluster but are far away from the cluster centroid.
▫ Form small or sparse clusters.
Clustering-based approaches
• These methods partition the data into K clusters by assigning each data point to its closest cluster centroid so as to minimize the within-cluster sum of squares (WSS):
$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{P} (x_{ij} - \mu_{kj})^2$$
where $S_k$ is the set of observations in the kth cluster, $P$ is the number of variables, and $\mu_{kj}$ is the mean of the jth variable for the center of the kth cluster.
• Then, they select the top n points that are the farthest away from their nearest
cluster centers as outliers.
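Here is a pure-Python sketch of this idea in the spirit of the k-means-- algorithm. In each iteration it sets aside the L points farthest from their nearest centroid, then recomputes the centroids from the remaining points. This is assumed to be close to what the “Kmod” package does, but its internals are not reproduced here. The initial centers are passed in explicitly to keep the example deterministic.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans_minus_minus(points, init_centers, n_outliers, max_iter=100):
    """K-means that sets aside the n_outliers points farthest from their nearest center."""
    centers = [tuple(c) for c in init_centers]
    outliers = set()
    for _ in range(max_iter):
        # (squared distance, point index, nearest center index) for every point
        nearest = [
            min((_dist2(p, c), i, k) for k, c in enumerate(centers))
            for i, p in enumerate(points)
        ]
        ranked = sorted(nearest, reverse=True)
        outliers = {i for _, i, _ in ranked[:n_outliers]}
        new_centers = []
        for k, c in enumerate(centers):
            members = [points[i] for _, i, kk in nearest if kk == k and i not in outliers]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # leave an empty cluster's center where it was
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sorted(outliers)


def wss(points, centers, exclude=()):
    """Within-cluster sum of squares (formula above), skipping indices in exclude."""
    return sum(
        min(_dist2(p, c) for c in centers)
        for i, p in enumerate(points)
        if i not in exclude
    )


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
       (10.0, -10.0), (-8.0, 9.0)]
centers, out = kmeans_minus_minus(pts, [(0.0, 0.0), (5.0, 5.0)], n_outliers=2)
print(centers, out, wss(pts, centers, exclude=set(out)))
```

Excluding the flagged points before recomputing the centroids keeps the two far-away points from dragging the cluster centers toward them. Computing `wss` for several K values gives the curve used in an elbow ("bend") graph.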
Anomaly Detection vs Unsupervised Learning
Clustering-based approaches
• The “Kmod” package in R is used to show the application of the K-means model.
In this example, the number of clusters is chosen from the bend (elbow) graph and then passed to the K-mod function.
See Clustering_Approach.R
Clustering-based approaches
See Clustering_Approach.R
K=4 is the number of clusters and L=10 is
the number of outliers
Clustering-based approaches
See Clustering_Approach.R
Scatter plots of normal and outlier data points
Summary
We have covered anomaly detection:
• Introduction
▫ Definition of anomaly detection and its importance in energy systems
▫ Different types of anomaly detection methods: statistical, graphical and machine learning methods
• Graphical approach
▫ Graphical methods consist of the boxplot, scatter plot, adjusted quantile plot and symbol plot, which demonstrate outliers graphically
▫ The main assumption for applying the adjusted quantile plot (a Mahalanobis-distance-based method) is multivariate normality
▫ The Mahalanobis distance is mainly used to calculate the distance of a point from the center of a multivariate distribution
• Statistical approach
▫ Statistical hypothesis testing includes the chi-square test and Grubbs' test
▫ Statistical methods may use either scores or p-values as thresholds to detect outliers
• Machine learning approach
▫ Both supervised and unsupervised learning methods can be used for outlier detection
▫ Piecewise or segmented regression can be used to identify outliers based on the residuals for each segment
▫ In the K-means clustering method, outliers are defined as points that do not belong to any cluster, are far away from their cluster centroid, or form sparse clusters
www.analyticscertificate.com
Q&A
Slides, code and details about the Anomaly detection workshop at: http://www.analyticscertificate.com/Anomaly/
Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
QuantUniversity
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
QuantUniversity
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiser
QuantUniversity
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA Dallas
QuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
QuantUniversity
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
QuantUniversity
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
QuantUniversity
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
QuantUniversity
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk Management
QuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
QuantUniversity
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
QuantUniversity
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio Allocation
QuantUniversity
 
The API Jungle
The API JungleThe API Jungle
The API Jungle
QuantUniversity
 
Explainable AI Workshop
Explainable AI WorkshopExplainable AI Workshop
Explainable AI Workshop
QuantUniversity
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset Benchmarks
QuantUniversity
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
QuantUniversity
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in Action
QuantUniversity
 
AI in Finance and Retirement Systems: Insights from the EBRI-Milken Institute...
AI in Finance and Retirement Systems: Insights from the EBRI-Milken Institute...AI in Finance and Retirement Systems: Insights from the EBRI-Milken Institute...
AI in Finance and Retirement Systems: Insights from the EBRI-Milken Institute...
QuantUniversity
 
Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitig...
Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitig...Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitig...
Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitig...
QuantUniversity
 
EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !
QuantUniversity
 
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfManaging-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
QuantUniversity
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
QuantUniversity
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiser
QuantUniversity
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA Dallas
QuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
QuantUniversity
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
QuantUniversity
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
QuantUniversity
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
QuantUniversity
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk Management
QuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
QuantUniversity
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
QuantUniversity
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio Allocation
QuantUniversity
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset Benchmarks
QuantUniversity
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
QuantUniversity
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in Action
QuantUniversity
 

Recently uploaded (20)

Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 

Anomaly detection Meetup Slides

  • 1. Anomaly Detection: Techniques and Best Practices. Location: QuantUniversity Meetup, June 23rd 2016, Boston MA. 2016 Copyright QuantUniversity LLC. Presented by: Sri Krishnamurthy, CFA, CAP, www.QuantUniversity.com, sri@quantuniversity.com
  • 2. Slides and code available at: http://www.analyticscertificate.com/Anomaly/
  • 3. - Analytics Advisory services - Custom training programs - Architecture assessments, advice and audits
  • 4. Sri Krishnamurthy, Founder and CEO • Founder of QuantUniversity LLC. and www.analyticscertificate.com • Advisory and consultancy for financial analytics • Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers (Shell, Firstfuel Software, etc.) • Regular columnist for Wilmott Magazine • Author of the forthcoming book “Financial Modeling: A case study approach”, published by Wiley • Chartered Financial Analyst and Certified Analytics Professional • Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
  • 5. Quantitative Analytics and Big Data Analytics Onboarding • Trained more than 500 students in quantitative methods, data science and big data technologies using MATLAB, Python and R • Launching the Analytics Certificate Program later this fall
  • 6. (MATLAB version also available)
  • 7. Events of Interest • July ▫ 11th: QuantUniversity’s 2nd meetup (topic: quantitative methods, TBD) ▫ 18th and 19th: 2-day workshop on Anomaly Detection; registration and pricing details at www.analyticscertificate.com/Anomaly • August ▫ 8th: QuantUniversity meetup ▫ 14th-20th: ARPM in New York, www.arpm.co; QuantUniversity presenting on Model Risk on August 14th ▫ 18th-21st: Big-data Bootcamp, http://globalbigdataconference.com/68/boston/big-data-bootcamp/event.html
  • 8. QuantUniversity’s Summer workshop series • July ▫ Anomaly Detection Workshop ▫ ARPM-prep seminar (date TBD) • August ▫ Model Evaluation: Metrics, Scaling and Best Practices • September ▫ What’s missing? Best practices in missing data analysis
  • 10. What is anomaly detection? • Anomalies, or outliers, are data points that deviate markedly from expected values. • Anomaly detection is the process of finding patterns in data that do not conform to expected behavior. • Anomaly detection is increasingly employed on the large volumes of data captured by sensors (IoT), social media platforms and large networks, in domains including energy systems, medical devices, banking and network intrusion detection.
  • 11. Examples • Fraud detection • Stock market • E-commerce
  • 12. Three methodologies for anomaly detection: 1. Graphical approach 2. Statistical approach 3. Machine learning approach
  • 13. • Boxplot • Scatter plot • Adjusted quantile plot
  • 14. Anomaly Detection Methods • Most outlier detection methods produce one of two kinds of output: ▫ Real-valued outlier scores: quantify the tendency of a data point to be an outlier by assigning it a score or probability. ▫ Binary labels: the result of applying a threshold to outlier scores to label each point as an inlier or an outlier.
  • 15. Graphical approaches • Statistical tails are most commonly used for one-dimensional distributions, although the same concept can be applied to the multidimensional case. • It is important to understand that all extreme values are outliers, but the reverse may not be true. • For instance, in the one-dimensional dataset {1, 3, 3, 3, 50, 97, 97, 97, 100}, the observation 50 is approximately equal to the mean (about 50.1) and is not an extreme value, but since it is the most isolated point, it should be considered an outlier.
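A minimal Python sketch of the example above (the slides' own code is in R): 50 sits almost exactly at the mean, yet it has the largest gap to its nearest neighbour. Using the nearest-neighbour gap as the measure of isolation is an illustrative assumption, not a method taken from the slides.

```python
import statistics

def extreme_vs_isolated(data):
    """Return (mean, most_isolated_value) for a 1-D sample.

    Isolation is measured here as the gap to the nearest other observation
    (an illustrative choice, not a formal outlier test).
    """
    mean = statistics.mean(data)

    def nearest_gap(i):
        # Distance from data[i] to its closest *other* observation.
        return min(abs(data[i] - data[j]) for j in range(len(data)) if j != i)

    most_isolated = data[max(range(len(data)), key=nearest_gap)]
    return mean, most_isolated

data = [1, 3, 3, 3, 50, 97, 97, 97, 100]
mean, isolated = extreme_vs_isolated(data)
print(round(mean, 2), isolated)  # 50.11 50
```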
  • 16. Box plot • A standardized way of displaying the variation of data based on the five-number summary: minimum, first quartile, median, third quartile and maximum. • This plot does not make any assumptions about the underlying statistical distribution. • In the common (Tukey) convention, the whiskers extend to the most extreme points within 1.5 times the interquartile range (IQR = Q3 − Q1) of the quartiles, and any data point beyond the whiskers is considered an outlier.
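The box-plot rule can be sketched in a few lines of Python, assuming Tukey's 1.5 × IQR fences. One caveat: R's boxplot() computes hinges via fivenum(), which can differ slightly from the quartile method used here. Running it on the dataset from the previous slide also shows the rule's blind spot, because the isolated point 50 lies inside the fences.

```python
import statistics

def tukey_outliers(data, k=1.5):
    """Return the values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" is the same convention as R's default quantile() (type 7).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

print(tukey_outliers([2, 3, 3, 4, 5, 5, 6, 40]))           # [40]
print(tukey_outliers([1, 3, 3, 3, 50, 97, 97, 97, 100]))   # [] (50 is not flagged)
```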
  • 18. Scatter plot • A scatter plot displays pairs of values to show the relationship (correlation) between, typically, two numerical variables. • An outlier is a data point that does not seem to fit with the rest of the data points. • A scatter plot can show points that are outliers in both variables (intersection) or in either variable (union).
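To make the intersection/union distinction concrete, the sketch below flags each variable separately and then combines the flags. Using Tukey fences as the per-variable rule is an assumption for illustration; the helper names are hypothetical.

```python
import statistics

def fence_flags(values, k=1.5):
    """Boolean flag per value: True if it lies outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]

def paired_outliers(x, y):
    """Indices flagged in both variables (intersection) and in either (union)."""
    fx, fy = fence_flags(x), fence_flags(y)
    both = [i for i, (a, b) in enumerate(zip(fx, fy)) if a and b]
    either = [i for i, (a, b) in enumerate(zip(fx, fy)) if a or b]
    return both, either

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
y = [1, 2, 3, 4, 5, 6, 7, 8, 60, 55]
print(paired_outliers(x, y))  # ([9], [8, 9])
```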
  • 20. Q-Q plot (Source: Wikipedia) • In statistics, a Q–Q plot is a probability plot: a graphical method for comparing two probability distributions by plotting their quantiles against each other. • If the two distributions being compared are similar, the points in the Q–Q plot will approximately lie on the line y = x.
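The points of a normal Q–Q plot can be computed without any plotting library, as in this sketch. It pairs standard normal quantiles with the standardized, sorted sample. The plotting positions (i − 0.5)/n are one common choice; R's qqnorm() uses ppoints(), which coincides with this only for n > 10. A point far above the line y = x, like the last one here, is a candidate outlier.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Pair theoretical N(0, 1) quantiles with standardized sorted sample values."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    ordered = sorted((x - m) / s for x in sample)
    # Plotting positions (i - 0.5) / n, i = 1..n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

for t, q in normal_qq_points([4.1, 5.0, 5.2, 4.8, 5.5, 4.9, 9.0]):
    print(f"{t:6.2f} {q:6.2f}")
```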
  • 21. Adjusted quantile plot • This plot identifies possible multivariate outliers by calculating the Mahalanobis distance of each point from the center of the data. • The Mahalanobis distance between vectors x and y in $\mathbb{R}^n$ can be formulated as: $$d(\mathbf{x},\mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^{T} S^{-1} (\mathbf{x}-\mathbf{y})}$$ where x and y are random vectors from the same distribution with covariance matrix S. • An outlier is defined as a point whose distance is larger than some predetermined value.
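A minimal NumPy sketch of the distance-based rule, under two stated simplifications. First, it uses the plain sample mean and covariance; our understanding is that mvoutlier's adjusted quantile plot uses robust (MCD-based) estimates instead. Second, the cutoff is the chi-square quantile for exactly two variables, which has the closed form −2 ln(α).

```python
import math
import numpy as np

def mahalanobis_outliers(X, alpha=0.025):
    """Flag rows of a 2-column array whose squared Mahalanobis distance from the
    sample mean exceeds the chi-square(2) quantile at 1 - alpha."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse sample covariance S^-1
    diff = X - center
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)  # (x - mu)^T S^-1 (x - mu) per row
    cutoff = -2.0 * math.log(alpha)                   # chi-square(2) quantile at 1 - alpha
    return d2, np.flatnonzero(d2 > cutoff)

# 20 points close to the line y = x, plus one point far off the line.
xs = np.arange(20, dtype=float)
ys = xs + 0.2 * (-1.0) ** xs
X = np.column_stack([np.append(xs, 5.0), np.append(ys, 15.0)])
d2, flagged = mahalanobis_outliers(X)
print(flagged)  # [20]
```

The off-line point is not extreme in either coordinate on its own. It is flagged only because it breaks the correlation structure, which is what the covariance term in the distance captures.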
  • 22. Adjusted quantile plot • Before applying this method, and many other parametric multivariate methods, we first need to check whether the data are multivariate normally distributed, using multivariate normality tests such as Royston's and Mardia's tests, chi-square plots, univariate plots, etc. • In R, we use the “mvoutlier” package, which implements the graphical approaches discussed above.
  • 23. Adjusted quantile plot (see Graphical_Approach.R). Steps annotated in the code: • Min-max normalization before diving into the analysis • Multivariate normality test • The outlier Boolean vector identifies the outliers • Alpha defines the maximum thresholding proportion
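Min-max normalization rescales each variable to [0, 1] via (x − min) / (max − min). The Python sketch below is a hypothetical stand-in for that step in Graphical_Approach.R. Mapping constant columns to 0 is our own guard against division by zero.

```python
def min_max_normalize(rows):
    """Rescale each column of a list of equal-length rows to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant column -> all zeros
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

print(min_max_normalize([[0, 10], [5, 20], [10, 30]]))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```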
  • 24. Adjusted quantile plot (see Graphical_Approach.R): Mahalanobis distances; covariance matrix
  • 25. Adjusted quantile plot (see Graphical_Approach.R)
  • 26. • Hypothesis testing (Grubbs’ test) • Scores
  • 27. Grubbs’ test • A test for outliers in univariate data sets assumed to come from a normally distributed population. • Grubbs' test detects one outlier at a time. This outlier is expunged from the dataset, and the test is iterated until no outliers are detected. • The test is defined for the following hypotheses: H0: there are no outliers in the data set; H1: there is exactly one outlier in the data set. • The Grubbs' test statistic is defined as: $$G = \frac{\max_{i=1,\ldots,N} |Y_i - \bar{Y}|}{s}$$ where $\bar{Y}$ and $s$ are the sample mean and standard deviation.
  • 28. Grubbs’ test (see Statistical_Approach.R) The function repeats Grubbs’ test until it has found all the outliers in the data.
  • 29. Grubbs’ test (see Statistical_Approach.R) Histogram of normal observations vs outliers
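The R function from Statistical_Approach.R is not reproduced in the slides. The following is a hypothetical Python equivalent of the iterate-and-remove procedure. It assumes a two-sided test and takes the critical value from the t distribution, as given in the NIST handbook: G_crit = ((N − 1)/√N) · √(t² / (N − 2 + t²)), where t is the upper α/(2N) critical value with N − 2 degrees of freedom. It requires SciPy.

```python
import statistics
from scipy import stats

def grubbs_outliers(data, alpha=0.05):
    """Repeat the two-sided Grubbs' test, removing one outlier per pass,
    until no further outlier is detected."""
    values = list(data)
    outliers = []
    while len(values) > 2:
        n = len(values)
        mean, sd = statistics.mean(values), statistics.stdev(values)
        if sd == 0:
            break
        suspect = max(values, key=lambda v: abs(v - mean))
        g = abs(suspect - mean) / sd                   # Grubbs statistic G
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)    # upper alpha/(2N) t quantile
        g_crit = (n - 1) / n ** 0.5 * (t ** 2 / (n - 2 + t ** 2)) ** 0.5
        if g <= g_crit:
            break                                      # H0 not rejected: stop
        outliers.append(suspect)
        values.remove(suspect)
    return outliers

print(grubbs_outliers([10.1, 9.8, 10.0, 10.2, 9.9, 10.05, 9.95, 10.1, 25.0]))  # [25.0]
```

Because the test assumes exactly one outlier per pass, several similar outliers can mask each other. The generalized ESD test, the basis of Twitter's S-H-ESD discussed on a later slide, is designed to address this.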
  • 30. Scores • Scores quantify the tendency of a data point to be an outlier by assigning it a score or probability. • The most commonly used scores are: ▫ Normal score: $z_i = \frac{x_i - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ the standard deviation ▫ Student's t score: $t_i = \frac{z_i\sqrt{n-2}}{\sqrt{n-1-z_i^2}}$ ▫ Chi-square score: $\left(\frac{x_i - \bar{x}}{s}\right)^2$ ▫ IQR score: based on the interquartile range $Q_3 - Q_1$ • By using the “scores” function in R, p-values can be returned instead of scores.
  • 31. Scores (see Statistical_Approach.R) “type” defines the type of score, such as normal or Student's t. “prob=1” returns the corresponding p-value.
  • 32. Scores (see Statistical_Approach.R) By setting “prob” to a specific value, a logical vector is returned that flags as outliers the data points whose probabilities are greater than this cut-off value. By setting “type” to IQR, only values below the first quartile or above the third quartile are scored: for each such value, the difference from the nearest quartile is divided by the IQR.
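The score formulas above can be sketched in Python as a rough analogue of R's scores(), not a port of it. The function name and the 1.5 cut-off in the usage line are illustrative assumptions. The last line shows the score-to-binary-label step described earlier.

```python
import statistics

def outlier_scores(x, kind="z"):
    """Per-observation scores: "z" (normal), "t", "chisq" or "iqr"."""
    n = len(x)
    m, s = statistics.mean(x), statistics.stdev(x)
    z = [(v - m) / s for v in x]
    if kind == "z":
        return z
    if kind == "t":
        return [zi * (n - 2) ** 0.5 / (n - 1 - zi ** 2) ** 0.5 for zi in z]
    if kind == "chisq":
        return [zi ** 2 for zi in z]
    if kind == "iqr":
        q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
        iqr = q3 - q1
        # Distance beyond the nearest quartile, in IQR units; 0 inside [Q1, Q3].
        return [(v - q3) / iqr if v > q3 else (v - q1) / iqr if v < q1 else 0.0
                for v in x]
    raise ValueError(f"unknown score type: {kind}")

data = [1, 2, 3, 4, 100]
print(outlier_scores(data, "iqr"))  # [-0.5, 0.0, 0.0, 0.0, 48.0]
labels = [abs(z) > 1.5 for z in outlier_scores(data, "z")]  # threshold -> binary labels
```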
  • 33. Twitter packages • Anomaly Detection ▫ Seasonal Hybrid ESD (S-H-ESD) builds upon the Generalized ESD test for detecting anomalies. ▫ It detects point-in-time anomalous data points, which can be global or local. A local anomaly is one that occurs inside a seasonal pattern. Anomalies can be positive or negative. ▫ More details here: https://github.com/twitter/AnomalyDetection • Breakout Detection ▫ A breakout is characterized in this package by two steady states and an intermediate transition period that can be sudden or gradual. ▫ It uses the E-Divisive with Medians algorithm, can detect one or multiple breakouts in a given time series, and employs energy statistics to detect divergence in mean. More details here: https://blog.twitter.com/2014/breakout-detection-in-the-wild Ref: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm
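This Python sketch covers only the generalized ESD test that S-H-ESD builds on, following the NIST handbook page cited above. The seasonal decomposition and robust (median-based) statistics that Twitter's package adds are not shown. It requires SciPy. Unlike repeated Grubbs' tests, it takes an upper bound on the number of outliers and reports the largest i for which R_i exceeds λ_i. This lets it catch outliers that would otherwise mask each other.

```python
import statistics
from scipy import stats

def generalized_esd(data, max_outliers, alpha=0.05):
    """Generalized ESD (Rosner) test; returns the values flagged as outliers."""
    values = list(data)
    n = len(values)
    if not 1 <= max_outliers <= n - 2:
        raise ValueError("max_outliers must be between 1 and n - 2")
    removed, n_outliers = [], 0
    for i in range(1, max_outliers + 1):
        m, s = statistics.mean(values), statistics.stdev(values)
        if s == 0:
            break
        candidate = max(values, key=lambda v: abs(v - m))
        r_i = abs(candidate - m) / s                      # test statistic R_i
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam = (n - i) * t / ((n - i - 1 + t ** 2) * (n - i + 1)) ** 0.5  # lambda_i
        if r_i > lam:
            n_outliers = i                                # largest i with R_i > lambda_i
        removed.append(candidate)
        values.remove(candidate)
    return removed[:n_outliers]

data = [10.1, 9.8, 10.0, 10.2, 9.9, 10.05, 9.95, 10.1, 25.0, 30.0]
print(generalized_esd(data, max_outliers=3))
```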
  • 35. • Linear regression • Piecewise/segmented regression • Clustering-based approaches
  • 36. Linear regression • Linear regression investigates the linear relationships between variables and predicts one variable from one or more other variables. It can be formulated as: $$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i$$ where Y and $X_i$ are random variables, $\beta_i$ are the regression coefficients and $\beta_0$ is a constant. • In this model, the ordinary least squares estimator is usually used. It minimizes the sum of squared differences between the observed and predicted values of the dependent variable.
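A minimal NumPy sketch of regression-based outlier detection: fit the model by ordinary least squares, then flag points with unusually large residuals. The rule "|residual| > 3 residual standard deviations" is an illustrative assumption. The slides set their own residual rules in Piecewise_Regression.R.

```python
import numpy as np

def ols_residual_outliers(x, y, k=3.0):
    """Fit Y = b0 + b1*X by OLS and flag points whose absolute residual
    exceeds k residual standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.column_stack([np.ones_like(x), x])       # design matrix [1, X]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # minimizes sum of squared residuals
    resid = y - A @ beta
    return beta, np.flatnonzero(np.abs(resid) > k * resid.std())

x = np.arange(20.0)
y = 2 * x + 1 + 0.1 * (-1.0) ** x
y[10] += 10.0  # one injected anomaly
beta, flagged = ols_residual_outliers(x, y)
print(flagged)  # [10]
```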
  • 37. Piecewise/segmented regression • A regression method in which the independent variable is partitioned into intervals so that separate linear models can be fitted to the data in different ranges. • This model can be applied when there are ‘breakpoints’, i.e. clearly different linear relationships in the data with a sudden, sharp change in direction. Below is a simple segmented regression with two segments: $$Y = \begin{cases} C_0 + \varphi_1 X, & X < X_1 \\ C_1 + \varphi_2 X, & X > X_1 \end{cases}$$ where Y is the predicted value, X is the independent variable, $C_0$ and $C_1$ are constants, $\varphi_1$ and $\varphi_2$ are regression coefficients, and $X_1$ is the breakpoint.
  • 38. Anomaly detection vs supervised learning
  • 39. Piecewise/segmented regression • For this example, we use the “segmented” package in R to first illustrate piecewise regression on a two-dimensional data set that has a breakpoint around z = 0.5 (see Piecewise_Regression.R). “pmax” (parallel maximum) is used to create different values of y.
  • 40. Piecewise/segmented regression • Then, we use linear regression to predict y values for each segment of z (see Piecewise_Regression.R).
  • 41. Piecewise/segmented regression • Finally, outliers can be detected in each segment by setting rules for the model residuals (see Piecewise_Regression.R). Here, the rule is set for the residuals of the segment with z less than 0.5: points whose residuals are below 0.5 are defined as outliers.
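The three steps (build data with a slope change, fit each segment, apply a residual rule) can be sketched in Python with NumPy. Two assumptions differ from the slides. The breakpoint is taken as known, whereas the R “segmented” package estimates it. The rule flags absolute residuals above a threshold, rather than reproducing the exact rule in Piecewise_Regression.R. np.maximum plays the role of R's pmax.

```python
import numpy as np

def segment_residual_outliers(z, y, breakpoint, threshold):
    """Fit a separate least-squares line on each side of a known breakpoint
    and flag points whose absolute residual exceeds `threshold`."""
    z, y = np.asarray(z, float), np.asarray(y, float)
    flagged = []
    for mask in (z < breakpoint, z >= breakpoint):
        slope, intercept = np.polyfit(z[mask], y[mask], 1)
        resid = y[mask] - (intercept + slope * z[mask])
        flagged.extend(np.flatnonzero(mask)[np.abs(resid) > threshold].tolist())
    return sorted(flagged)

# Synthetic data: flat for z < 0.5, slope 1.5 afterwards, small alternating noise.
z = np.arange(21) / 20.0
y = 2 + 1.5 * np.maximum(z - 0.5, 0) + 0.02 * (-1.0) ** np.arange(21)
y[5] += 1.0  # inject one anomaly in the first segment
print(segment_residual_outliers(z, y, breakpoint=0.5, threshold=0.5))  # [5]
```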
  • 42. Clustering-based approaches • These methods are suitable for unsupervised anomaly detection. • They aim to partition the data into meaningful groups (clusters) based on the similarities and relationships between data points. • Each data point is assigned a degree of membership for each of the clusters. • Anomalies are those data points that: ▫ do not fit into any cluster; ▫ belong to a particular cluster but are far away from the cluster centroid; or ▫ form small or sparse clusters.
  • 43. Clustering-based approaches • These methods partition the data into K clusters by assigning each data point to its closest cluster centroid so as to minimize the within-cluster sum of squares (WSS): $$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{P} (x_{ij} - \mu_{kj})^2$$ where $S_k$ is the set of observations in the kth cluster and $\mu_{kj}$ is the mean of the jth variable of the center of the kth cluster. • Then, they select as outliers the top n points that are farthest from their nearest cluster centers.
  • 44. Anomaly detection vs unsupervised learning
  • 45. Clustering-based approaches • The “Kmod” package in R is used to show the application of a K-means model. In this example, the number of clusters is chosen from the bend in the graph (elbow method) and passed to the Kmod function (see Clustering_Approach.R).
  • 46. Clustering-based approaches (see Clustering_Approach.R) K = 4 is the number of clusters and L = 10 is the number of outliers.
  • 47. Clustering-based approaches (see Clustering_Approach.R) Scatter plots of normal and outlier data points
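The sketch below shows clustering with simultaneous outlier removal in NumPy, in the spirit of the k-means-- algorithm (Chawla and Gionis, 2013). Whether Kmod follows exactly this procedure is an assumption. On each iteration, the L points farthest from their nearest center are set aside before the centers are recomputed, so outliers do not drag the centroids. The number of clusters K is the number of initial centers, which are passed in explicitly to keep the sketch deterministic.

```python
import numpy as np

def kmeans_minus_minus(X, init_centers, n_outliers, n_iter=100):
    """k-means-- style clustering: the n_outliers points farthest from their
    nearest center are excluded before each center update."""
    X = np.asarray(X, float)
    centers = np.array(init_centers, dtype=float)
    k = len(centers)
    for _ in range(n_iter):
        # Distance from every point to every center, then to the nearest one.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        nearest = dists.min(axis=1)
        outliers = np.argsort(nearest)[::-1][:n_outliers]   # the L farthest points
        inlier = np.ones(len(X), dtype=bool)
        inlier[outliers] = False
        new_centers = centers.copy()
        for j in range(k):
            members = X[inlier & (labels == j)]
            if len(members):                                # keep old center if empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels, np.sort(outliers)

A = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
B = [(x + 10, y) for x, y in A]
C = [(x + 20, y) for x, y in A]
X = np.array(A + B + C + [(10.5, 30.0)])
centers, labels, outliers = kmeans_minus_minus(X, init_centers=X[[0, 5, 10]], n_outliers=1)
print(outliers)  # [15]
```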
  • 48. Summary. We have covered: Anomaly detection introduction • Definition of anomaly detection and its importance in energy systems • Different types of anomaly detection methods: graphical, statistical and machine learning methods. Graphical approach • Graphical methods, consisting of the boxplot, scatter plot, adjusted quantile plot and symbol plot, demonstrate outliers graphically • The main assumption for applying the adjusted quantile plot and other parametric multivariate methods is multivariate normality • The Mahalanobis distance is mainly used to calculate the distance of a point from the center of a multivariate distribution. Statistical approach • Statistical hypothesis tests include the chi-square test and Grubbs’ test • Statistical methods may use either scores or p-values as thresholds to detect outliers. Machine learning approach • Both supervised and unsupervised learning methods can be used for outlier detection • Piecewise or segmented regression can be used to identify outliers based on the residuals of each segment • In the K-means clustering method, outliers are defined as points that do not belong to any cluster, are far away from their cluster centroids, or form sparse clusters
  • 49. (MATLAB version also available) www.analyticscertificate.com
  • 50. Q&A. Slides, code and details about the Anomaly detection workshop at: http://www.analyticscertificate.com/Anomaly/
  • 51. Thank you! Members & Sri Krishnamurthy, CFA, CAP, Founder and CEO, QuantUniversity LLC. srikrishnamurthy www.QuantUniversity.com Contact Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.