Introduction to Anomaly Detection
Linghao Chen
HOMEPAGE: https://lhchen.top
School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, P.R. China
Anomaly Detection
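What is it?
Anomaly detection identifies samples that deviate markedly from the majority of the data, e.g., points that are easy to isolate in feature space [1] or dense subgraphs that suddenly appear in a streaming graph [2].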
[1]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (ICDM). IEEE.
[2]: Eswaran, D., et al., 2018. Spotlight: Detecting anomalies in streaming graphs. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Problems
Classic Methods:
➢ kNN (k-Nearest Neighbors)
➢ LOF (Local Outlier Factor)
➢ PCA (Principal Component Analysis)
➢ HBOS (Histogram-based Outlier Score)
➢ Isolation Forest
➢ AE (Auto Encoder)
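kNN (k-Nearest Neighbors)
Distance between two samples x and y with N features (Minkowski distance of order p):
$Dis(x, y) = \left( \sum_{i=1}^{N} |x_i - y_i|^{p} \right)^{1/p}$
Anomaly score: the distance from a point to its k-th nearest neighbor; the points with the largest scores are reported as outliers [1].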
[1]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29(2), pp. 427-438.
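The distance-based score on this slide can be sketched in a few lines of NumPy. This is a minimal brute-force sketch, assuming Euclidean distance (p = 2) by default and an O(n²) pairwise distance matrix; `minkowski` and `knn_scores` are hypothetical helper names, not a library API.

```python
import numpy as np

def minkowski(x, y, p=2):
    """Dis(x, y) = (sum_i |x_i - y_i|^p)^(1/p)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

def knn_scores(X, k=5, p=2):
    """Score each row of X by its Minkowski distance to its k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])   # pairwise |x_i - y_i|, shape (n, n, N)
    D = np.sum(diff ** p, axis=-1) ** (1.0 / p)    # pairwise distances, shape (n, n)
    np.fill_diagonal(D, np.inf)                    # a point is not its own neighbor
    return np.sort(D, axis=1)[:, k - 1]            # k-th smallest distance in each row

# Usage: a 5 x 5 grid of inliers plus one far-away point at index 25.
grid = [(a, b) for a in range(5) for b in range(5)]
X = np.array(grid + [(20, 20)])
print(int(np.argmax(knn_scores(X, k=5))))  # 25
```

For large data sets, an index structure such as a k-d tree would normally replace the brute-force distance matrix.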
LOF (Local Outlier Factor)
k-distance of an object p: the distance from p to its k-th nearest neighbor
[Figure: 5-distance]
[1]: Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM SIGMOD Record, 29(2), pp. 93-104.
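LOF (Local Outlier Factor)
[Figure: 5-distance neighborhood of O, with $N_5(O) = \{P_1, P_2, P_3, P_4, P_5, P_6\}$]
$N_k(P)$ contains every point within the k-distance of P, so it can hold more than k points when distances tie.
Reachability distance: $\text{reach-dist}_k(P, O) = \max\{k\text{-distance}(O),\ d(P, O)\}$
Local reachability density: $\rho_k(P) = \left( \frac{\sum_{O \in N_k(P)} \text{reach-dist}_k(P, O)}{|N_k(P)|} \right)^{-1}$
$LOF_k(P) = \frac{\sum_{O \in N_k(P)} \frac{\rho_k(O)}{\rho_k(P)}}{|N_k(P)|}$
LOF close to 1 means P is about as dense as its neighbors; LOF much greater than 1 marks P as a local outlier.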
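The LOF definitions can be sketched directly in NumPy. Assumptions: $N_k(P)$ is taken as exactly the k nearest neighbors (ties at the k-distance are broken arbitrarily instead of all being included), distances are Euclidean, and the data has no duplicate points (duplicates can make $\rho_k$ infinite). `lof_scores` is a hypothetical name.

```python
import numpy as np

def lof_scores(X, k=5):
    """LOF_k for every row of X, using exactly k nearest neighbors per point."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rows = np.arange(n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean d(P, O)
    np.fill_diagonal(D, np.inf)                   # exclude each point from its own neighborhood
    nbrs = np.argsort(D, axis=1)[:, :k]           # indices of N_k(P), shape (n, k)
    k_dist = D[rows, nbrs[:, -1]]                 # k-distance of every point
    # reach-dist_k(P, O) = max(k-distance(O), d(P, O)) for each neighbor O of P
    reach = np.maximum(k_dist[nbrs], D[rows[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)                # local reachability density rho_k(P)
    return lrd[nbrs].mean(axis=1) / lrd           # mean of rho_k(O) / rho_k(P) over N_k(P)

grid = [(a, b) for a in range(5) for b in range(5)]
X = np.array(grid + [(20, 20)], dtype=float)
print(int(np.argmax(lof_scores(X, k=5))))  # 25
```

Grid points score close to 1, while the isolated point scores far above 1 because its neighbors are much denser than it is.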
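PCA (Principal Component Analysis)
Algorithm
Input: $X \in \mathbb{R}^{n \times m}$ with n samples (rows) and m features
Output: $Y = XW \in \mathbb{R}^{n \times m'}$, where $W \in \mathbb{R}^{m \times m'}$
Normalization: $x_i \leftarrow x_i - \frac{1}{n} \sum_{j=1}^{n} x_j$
Covariance matrix: $C = \frac{1}{n} X^{T} X$ (X centered)
Calculate the eigenvectors of C; the m' eigenvectors with the largest eigenvalues form the columns of W
Anomaly score: the distance between a sample and the subspace spanned by the selected eigenvectors; samples far from this subspace are flagged as anomalies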
[1]: Shyu, M.L., et al., 2003. A novel anomaly detection scheme based on principal component classifier. Miami Univ. Coral Gables FL, Dept. of Electrical and Computer Engineering.
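The PCA steps above can be sketched with NumPy's symmetric eigensolver. This is a minimal sketch, assuming the anomaly score is the Euclidean reconstruction error (distance to the principal subspace); Shyu et al. [1] define their own classifier statistic, which this does not reproduce. `pca_scores` is a hypothetical name.

```python
import numpy as np

def pca_scores(X, n_components=1):
    """Distance of each row of X from the subspace spanned by the top eigenvectors of C."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # normalization: subtract the mean sample
    C = Xc.T @ Xc / len(Xc)                  # covariance matrix, shape (m, m)
    _, eigvecs = np.linalg.eigh(C)           # eigh returns eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :n_components]   # top m' eigenvectors as columns, shape (m, m')
    Y = Xc @ W                               # projected data Y = XW, shape (n, m')
    residual = Xc - Y @ W.T                  # part of each sample not explained by W
    return np.linalg.norm(residual, axis=1)

# Usage: 50 points on the line y = 2x plus one point off the line at index 50.
t = np.linspace(-5, 5, 50)
X = np.vstack([np.column_stack([t, 2 * t]), [[2.0, -1.0]]])
print(int(np.argmax(pca_scores(X, n_components=1))))  # 50
```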
HBOS (Histogram-based Outlier Score)
Methods
[1]: Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. In KI-2012: Poster and Demo Track, pp. 59-63.
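HBOS (Histogram-based Outlier Score)
Assumption
The dimensions (features) of multidimensional data are independent of each other.
Algorithm
➢ Build a histogram of the data for each dimension.
➢ Divide the value range into K buckets of equal width (dynamic widths are sometimes used); the frequency of values falling into each bucket serves as an estimate of density.
Anomaly Score
$HBOS(p) = \sum_{i=0}^{d} \log\left( \frac{1}{hist_i(p)} \right)$
where $hist_i(p)$ is the height of the bucket that p falls into in dimension i, with each histogram normalized so that its tallest bucket has height 1.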
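The static (equal-width) variant of HBOS can be sketched with NumPy. Assumptions: K = `n_bins` equal-width buckets per dimension and natural logarithms; the dynamic-width variant is not shown, and `hbos_scores` is a hypothetical name. Bucket counts are taken from the same bucket indices used for scoring, so every sample lands in a non-empty bucket.

```python
import numpy as np

def hbos_scores(X, n_bins=10):
    """HBOS(p) = sum over dimensions of log(1 / hist_i(p)) with equal-width buckets."""
    X = np.asarray(X, dtype=float)
    scores = np.zeros(len(X))
    for i in range(X.shape[1]):
        col = X[:, i]
        edges = np.histogram_bin_edges(col, bins=n_bins)  # K equal-width buckets
        idx = np.digitize(col, edges[1:-1])               # bucket index 0..n_bins-1 per value
        counts = np.bincount(idx, minlength=n_bins)       # frequency of each bucket
        heights = counts / counts.max()                   # normalize: tallest bucket = 1
        scores += np.log(1.0 / heights[idx])              # rare buckets add large terms
    return scores

grid = [(a, b) for a in range(5) for b in range(5)]
X = np.array(grid + [(20, 20)], dtype=float)
print(int(np.argmax(hbos_scores(X, n_bins=10))))  # 25
```

Because each dimension is scored independently, the cost is linear in the number of samples per dimension, which is the speed advantage the independence assumption buys.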
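AE (Auto Encoder)
Latent representation: the encoder compresses each sample into a low-dimensional latent code and the decoder reconstructs the sample from it; samples with a large reconstruction error are scored as anomalies.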
[1]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29(2), pp. 427-438.
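The reconstruction-error idea can be sketched with a linear autoencoder trained by plain gradient descent in NumPy. This substitutes a single linear encoder and decoder for the deep networks usually used, so it illustrates the scoring principle rather than a full AE; at its optimum such a model spans the same subspace as PCA. Feature standardization, the learning rate, epoch count, and seed are illustrative assumptions, and both function names are hypothetical.

```python
import numpy as np

def train_linear_autoencoder(X, latent_dim=1, lr=0.01, epochs=3000, seed=0):
    """Fit encoder/decoder weights by gradient descent on mean squared reconstruction error."""
    X = np.asarray(X, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0                  # avoid dividing by zero on constant features
    Xs = (X - mu) / sigma                    # standardize so one learning rate suits most data
    n, m = Xs.shape
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(scale=0.1, size=(m, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, m))
    for _ in range(epochs):
        Z = Xs @ W_enc                       # latent representation, shape (n, latent_dim)
        G = 2.0 * (Z @ W_dec - Xs) / n       # gradient of the loss w.r.t. the reconstruction
        grad_dec = Z.T @ G
        grad_enc = Xs.T @ (G @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return mu, sigma, W_enc, W_dec

def ae_scores(X, mu, sigma, W_enc, W_dec):
    """Reconstruction error of each row in the standardized feature space."""
    Xs = (np.asarray(X, dtype=float) - mu) / sigma
    return np.linalg.norm(Xs - Xs @ W_enc @ W_dec, axis=1)

# Usage: 50 points on the line y = 2x plus one point off the line at index 50.
t = np.linspace(-5, 5, 50)
X = np.vstack([np.column_stack([t, 2 * t]), [[2.0, -1.0]]])
params = train_linear_autoencoder(X, latent_dim=1)
print(int(np.argmax(ae_scores(X, *params))))  # 50
```

Replacing the two weight matrices with nonlinear networks (and a framework's optimizer) gives the usual deep autoencoder; the scoring step stays the same.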
Isolation Forest
ICDM '08
[1]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining (ICDM), pp. 413-422. IEEE.
Isolation Forest
Anomaly Detection
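Isolation Forest [1] isolates points with random axis-parallel splits: anomalies are few and different, so they reach an external node after fewer splits. The paper scores a point by $s(x, \psi) = 2^{-E[h(x)]/c(\psi)}$, where $h(x)$ is the path length, $\psi$ the sub-sample size, and $c(n) = 2H(n-1) - 2(n-1)/n$ with $H(i) \approx \ln(i) + 0.5772156649$. Below is a minimal pure-Python sketch of that scheme, assuming data is a list of numeric tuples with at least two points; splitting only on attributes that still vary in a node is a simplification, and all names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points (normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_itree(points, depth, max_depth, rng):
    """Recursively split points on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # only attributes that still vary inside this node can be split
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}
    dim = rng.choice(dims)
    split = rng.uniform(min(p[dim] for p in points), max(p[dim] for p in points))
    return {
        "dim": dim,
        "split": split,
        "left": build_itree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_itree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }

def path_length(x, node):
    """h(x): edges traversed to an external node, plus c(size) for the points left there."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c(node["size"])

def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """s(x, psi) = 2 ** (-E[h(x)] / c(psi)); scores close to 1 indicate anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))        # sub-sample size per tree
    max_depth = math.ceil(math.log2(psi))    # tree height limit
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c(psi)))
    return scores

grid = [(a, b) for a in range(5) for b in range(5)]
scores = isolation_forest_scores(grid + [(20, 20)])
print(max(range(len(scores)), key=scores.__getitem__))  # 25
```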
REFERENCES
[1]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (ICDM), pp. 413-422. IEEE.
[2]: Eswaran, D., et al., 2018. Spotlight: Detecting anomalies in streaming graphs. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[3]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29(2), pp. 427-438.
[4]: Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM SIGMOD Record, 29(2), pp. 93-104.
[5]: Shyu, M.L., et al., 2003. A novel anomaly detection scheme based on principal component classifier. Miami Univ. Coral Gables FL, Dept. of Electrical and Computer Engineering.
[6]: Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. In KI-2012: Poster and Demo Track, pp. 59-63.
[7]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29(2), pp. 427-438.
[8]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining (ICDM), pp. 413-422. IEEE.
Q&A