Isolation Forest Based Anomaly Detection: A Systematic Literature Review

Wahid Salman Al Farizi, Indriana Hidayah, Muhammad Nur Rizal
Department of Electrical and Information Engineering
Universitas Gadjah Mada
Yogyakarta, Indonesia

2021 8th International Conference on Information Technology, Computer and Electrical Engineering (ICITACEE) | 978-1-6654-3998-5/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICITACEE53184.2021.9617498

Abstract: Anomaly detection using machine learning algorithms has been on the rise lately, especially as data volume and velocity increase. One of the most recent anomaly detection algorithms is Isolation Forest (IF). Despite its simplicity, it excels at handling high-dimensional data and at speed. However, IF is not without weaknesses, and several researchers have identified them and proposed solutions. To help readers understand the research related to IF, this paper discusses 17 studies on IF improvement through a systematic literature review that covers IF weaknesses, the types of data involved, and the causes of those weaknesses, and that dissects the solutions offered and the fields of research that use IF. The review shows that the main cause of IF's weakness is the random selection of variables in the data split, and that the solutions proposed by researchers fall into three types: pre-IF, post-IF, and method improvement. To our knowledge, there is no literature review related to IF improvement, and we expect this paper to help other researchers develop anomaly detection based on IF.

Keywords: anomaly detection, isolation forest, systematic literature review

I. INTRODUCTION

Several researchers have developed Anomaly Detection (AD) methods. Many types of AD methods exist, such as density-based, distance-based, neural network, and spectral-based methods [1]. These algorithms were initially developed for machine learning tasks such as classification or clustering and were later modified for AD. Liu et al. [2] then introduced Isolation Forest (IF), an algorithm developed specifically for AD.

Domingues et al. [3] evaluated several up-to-date anomaly detection algorithms, including Gaussian Mixture Model (GMM), Kernel Density Estimator (KDE), Mahalanobis Distance, Local Outlier Factor (LOF), One-class Support Vector Machine (OCSVM), and Isolation Forest (IF). The evaluation used 15 different datasets from public sources (UCI or OpenML) and private data. The results showed that IF outperforms the other algorithms in terms of accuracy and time while also demonstrating satisfactory performance on high-dimensional datasets. However, several recent studies have reported IF's weaknesses and developed IF-based anomaly detection algorithms to overcome them.

This paper discusses some of the latest research related to IF's application in various fields and the solutions developed to overcome IF's weaknesses. The most prominent problems are the low accuracy in detecting local outliers or conditional outliers [4], [5], time wasted due to the random selection of features for splitting [6]–[8], and reduced accuracy on high-dimensional datasets with low-dimensional anomaly data [9], [10]. In addition to the problems in IF, this paper also presents the development of IF for use in various fields such as fraud detection [4], fault detection on manufacturing machines [9], [11], [12], attacks on network systems [10], [13], [14], and remote sensing [15], [16].

This paper conducts a systematic literature review (SLR) based on Kitchenham and Charters' guidelines [17] and divides the SLR into three main phases: planning, conducting the review, and reporting. Section II introduces the concept of IF, Section III discusses review planning, Section IV describes how the review was conducted, and Section V reports the results of the review.

II. ISOLATION FOREST

IF builds on two main characteristics of anomalies: they are few, and their attributes differ from those of normal data [2]. IF splits the data into two parts as in a binary tree, and splitting continues until a tree height limit is reached or the data can no longer be split. The resulting structure is called an iTree [2]. Samples isolated early in the splitting, that is, closer to the root of the iTree, are likely to be anomalous. In contrast, normal samples are more difficult to isolate and require many splits before they are isolated [2]. As shown in Fig. 1, a node in an iTree is either an external node with no children or an internal node with two children. An isolated sample is represented by an external node, and the number of splits needed to isolate it equals the number of edges from the root node to that external node. This number of edges is referred to as the path length [18].

Fig. 1. iTree structure consisting of a root node, external nodes, internal nodes, and edges. The path length is the number of edges from the root to an external node [4], [10], [19]
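The iTree construction and path length described in Section II can be sketched in a few lines of Python. This is a minimal illustration rather than the reference implementation: the function names, the dictionary-based tree, and the default parameters are assumptions made for this example, while the normalisation term c(n) and the score 2^(-E[h(x)]/c(n)) follow the definition given by Liu et al. [2].

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points [2]."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until isolation or the height limit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}  # external node
    # Only features whose values differ inside this node can still be split.
    spans = [(q, min(p[q] for p in points), max(p[q] for p in points))
             for q in range(len(points[0]))]
    spans = [s for s in spans if s[1] < s[2]]
    if not spans:
        return {"size": len(points)}  # identical points cannot be split further
    q, lo, hi = rng.choice(spans)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[q] < split]
    right = [p for p in points if p[q] >= split]
    return {
        "feature": q,
        "split": split,
        "left": build_itree(left, depth + 1, max_depth, rng),
        "right": build_itree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Number of edges from the root to the external node that x falls into."""
    if "size" in node:
        # The subtree not built below the height limit is approximated by c(size).
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees iTrees, each on a random subsample of at most sample_size points."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1); a shorter average path length gives a higher score."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))
```

Given a tight cluster plus one distant point, the distant point would be expected to be isolated after very few splits and therefore to receive the highest score, which matches the intuition that samples close to the root are likely anomalies.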

III. REVIEW PLANNING

The planning stage consists of explaining why an SLR is needed, identifying research questions, and compiling a review protocol.

A. The reason behind SLR

IF is the first algorithm explicitly designed for anomaly detection. IF also arrived at a time of rapidly growing data volumes. However, to process various types of data, each with its own characteristics, the IF algorithm needs to be improved. This paper aims to review the strategies researchers have used to apply IF in anomaly detection and to overcome IF weaknesses in various data fields. To achieve those goals, it answers several Research Questions (RQ).

B. Research Questions

To meet these needs and control the direction of the SLR, research questions should be formulated based on the reasons behind conducting the SLR. The questions can be seen in Table I.

TABLE I. RESEARCH QUESTIONS
ID | Research Questions | Objectives
RQ 1 | What are the main weaknesses of IF? | Identify the research gaps of each study
RQ 2 | What solutions are offered to overcome each of these weaknesses? | Identify the method used with IF
RQ 3 | In what areas have researchers applied IF? | Identify the environment and data type suitable for IF

C. Review Protocols

This study needs protocols for performing the SLR to prevent bias [17]. The protocols in this study are (1) the reason behind the SLR, (2) research questions, (3) literature search strategy, (4) literature selection criteria, (5) result and data discussion, and (6) conclusion or reports. Protocols 1 and 2 have already been addressed above, and the following sections cover the other four protocols.

IV. CONDUCTING REVIEW

A. Literature Search Strategy

The difference between an SLR and traditional review methods lies in the literature search strategy [17]. An SLR structures its literature search so as to obtain credible literature sources and avoid bias [17]. In this study, we searched four digital libraries:

• Science Direct (http://www.sciencedirect.com/)
• SpringerLink (http://link.springer.com/)
• IEEE-Xplore (https://ieeexplore.ieee.org/Xplore/home.jsp)
• Sagepub (https://journals.sagepub.com/)

The keywords used are "isolation AND forest", "isolation AND based AND anomaly AND detection", and "isolation AND based AND outlier AND detection". The filter restricts results to English-language research published in computer and engineering journals from 2016 to 2020. The search returned 2520 studies. Table II displays the search results in detail.

TABLE II. PRELIMINARY SEARCH RESULT
Source | Isolation Forest | Isolation Based Anomaly Detection | Isolation Based Outlier Detection | Quantity
Sciencedirect | 584 | 675 | 519 | 1778
Springer | 260 | 221 | 113 | 594
IEEE | 35 | 23 | 7 | 65
Sagepub | 18 | 34 | 31 | 83
Total | 897 | 953 | 670 | 2520

B. Literature Selection Criteria

A manual search was carried out to select research related to IF. The selected research must propose an improvement to IF that overcomes IF weaknesses. It must also use commonly known algorithm evaluation methods such as AUROC, AUPRC, or F-measure. This manual search yielded 17 studies, as seen in Table III.

TABLE III. MANUAL LITERATURE SEARCH RESULT
Reference | Year | Methodology | Dataset
[9] | 2017 | Forward Selection Independent Variables (FSIV) and Forward Selection Minimizing the Maximum Reconstruction Error (FSMM) | Primary data in semiconductor manufacture
[4] | 2018 | Expert knowledge; binary classification | Primary financial data
[13] | 2019 | SPARK parallel computing | Public IDS dataset UNSW-NB15
[7] | 2019 | Random slope feature splitting | Public datasets on medical, weather, and satellite data
[5] | 2019 | Correlation analysis | Public dataset: aircraft engine data
[10] | 2019 | PCA-based feature selection and feature extraction | Primary dataset of a power control center
[6] | 2019 | Weighted feature-based feature selection | Primary data: Docker container performance data
[8] | 2020 | Histogram-based feature splitting | Public dataset of NASA software defects
[11] | 2020 | Local Outlier Factor | Public data on concrete (UCI ML Repository)
[20] | 2020 | Deep learning | Primary data of wind turbine performance
[19] | 2020 | Gaussian Mixture Model-based clustering | Primary data of wind turbines
[21] | 2020 | Mixture coefficients of a mixture distribution | 14 public ML datasets
[22] | 2020 | Stacked autoencoder to reduce false positives | Primary dataset: network intrusion data
[23] | 2020 | Autoencoder-based dimension reduction | Public dataset: network intrusion data NSL-KDD
[16] | 2020 | Multiple feature extraction | Public hyperspectral data
[15] | 2020 | Kernel-based | Public hyperspectral data
[12] | 2020 | Dual isolation forest | Primary data of an industrial control system network
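Two of the evaluation measures named in the selection criteria, AUROC and F-measure, can be sketched as follows. This is a minimal pure-Python illustration; the function names, the label convention (1 for anomaly, 0 for normal), and the thresholding rule are assumptions made for this example, and AUPRC is omitted for brevity.

```python
def auroc(labels, scores):
    """Probability that a random anomaly is scored above a random normal sample.

    Ties between an anomaly and a normal sample count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one anomaly and one normal sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_measure(labels, scores, threshold, beta=1.0):
    """F-measure after flagging every sample whose score reaches the threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

AUROC depends only on how the anomaly scores rank the samples, whereas the F-measure depends on the chosen threshold, which is one reason studies often report both.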

V. RESULT AND DATA DISCUSSION

After reviewing each of the studies selected above, we obtained the SLR results. The studies are grouped according to the RQ they address so that the results can be presented for each objective.

A. RQ 1: What are the main weaknesses of IF?

IF's central concept is to split data randomly until the data are isolated and then calculate the average number of splits needed to isolate each sample. The smaller this number, the more likely the sample is an anomaly [2]. Based on this concept, researchers have found several weaknesses in IF.

1) Low accuracy in the detection of conditional anomalies

IF splits data on randomly selected features, and the split points are also randomly selected. Conditional anomalies can occur in many types of datasets. Stripling et al. [4] found that IF struggles with AD on high-dimensional datasets in which only a few features affect the occurrence of anomalies. This weakness was also investigated by Wang et al. [16], who stated that in hyperspectral data, certain features generally determine the occurrence of anomalies.

The problem was also found by Zou et al. [6], who stated that features carry different weights in each dataset, so random feature selection is not effective. Likewise, Khan et al. [5] found that in datasets with correlated features, hidden anomalies can occur in highly correlated data. Datasets collected under multiple operating conditions can contain extreme feature values that hide the anomaly [19].

Conditional anomalies can also lead to false positives or false negatives, for example in imbalanced datasets, where the number of anomalies is tiny compared to the nominal data [8], [23], or in hyperspectral data, where IF is built on a random selection for each pixel in the entire scene [15].

2) Low accuracy in specific high dimensional data

IF performs well at anomaly detection on high-dimensional datasets [3], but researchers have found that IF can experience difficulties on datasets with specific properties. Puggini and McLoone [9] found that IF produced more false positives on highly correlated datasets, because slight deviations from the normal value within groups of correlated variables can be more likely to be flagged as anomalies than significant deviations within groups of isolated variables.

A high volume of high-dimensional data can also degrade accuracy. Tao et al. [13] stated that IF can hit a bottleneck on big data if it runs on a single computer. The amount of data the IF algorithm can process is limited by the memory available on the one machine where the IF-based AD software is installed.

Ahmed et al. [10] examined AD on a very high-dimensional dataset with up to 1122 dimensions that are closely related to each other. Standard IF loses accuracy because its feature splitting process does not take inter-dimensional relationships into account. In addition, processing 1122 dimensions in IF takes much time.

Aminanto et al. [22] found that IF loses accuracy by producing a high false-positive rate on imbalanced high-dimensional data. This weakness is crucial, especially when the cost of false positives is very high.

3) Bias and Performance Issues

Standard IF splits data at random but always parallel to the data dimensions. Hariri et al. [7] used heat maps to evaluate the AD method in IF and found that this splitting can cause bias in AD. This bias appears as ghost areas with high heat map scores.

Buschjäger et al. [21] evaluated IF at a fundamental level and found that anomaly scoring based on the average path length has no theoretical explanation grounded in an understanding of the data distribution. Such a theoretical explanation can be the basis for improving IF performance.

B. RQ 2: What solutions are offered to overcome these weaknesses?

Researchers have proposed several methods to overcome the weaknesses of IF. They not only use statistical methods or other algorithms to strengthen IF but also propose changes to the IF algorithm itself. The proposed methods fall into three categories: (1) pre-IF, an intervention in which the data are processed by other methods before AD with IF; (2) post-IF, in which the output of IF is used as input for other methods; and (3) method improvement, in which certain parts of the IF algorithm are modified. Fig. 2 shows the proposed methods.

Fig. 2. IF's proposed methods scheme
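The axis-parallel splitting noted under "Bias and Performance Issues" can be contrasted with a randomly oriented split of the kind used in Extended Isolation Forest [7]. The sketch below is illustrative only: the function names are invented for this example, and the branching rule (x - p) · n <= 0, with a Gaussian normal vector n and an intercept p drawn uniformly from the range of the node's data, is the rule as we understand it from [7].

```python
import random


def axis_parallel_split(points, rng):
    """Standard IF: pick one feature and a threshold, so the cut is parallel to an axis."""
    q = rng.randrange(len(points[0]))
    lo = min(p[q] for p in points)
    hi = max(p[q] for p in points)
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[q] < threshold]
    right = [p for p in points if p[q] >= threshold]
    return left, right


def random_slope_split(points, rng):
    """Randomly oriented cut: a hyperplane with a random normal vector and intercept."""
    dims = range(len(points[0]))
    # Random direction of the cut, one Gaussian component per dimension.
    normal = [rng.gauss(0.0, 1.0) for _ in dims]
    # Random point inside the bounding box of the data that the hyperplane passes through.
    intercept = [rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
                 for d in dims]
    left, right = [], []
    for p in points:
        side = sum((p[d] - intercept[d]) * normal[d] for d in dims)
        (left if side <= 0 else right).append(p)
    return left, right
```

Both functions partition the node's points into two children; only the orientation of the cut differs, which is the property the random-direction improvement relies on to avoid axis-aligned artifacts.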

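The hidden anomalies in correlated features discussed under RQ 1 can be illustrated with a toy example. The data, the suspect point, and the helper names below are invented for illustration and do not reproduce any reviewed study; the sketch only shows that a point can look ordinary on each feature separately while clearly violating the relationship between the features, which is the kind of structure a pre-IF correlation analysis aims to expose.

```python
from statistics import mean, pstdev


def z_score(value, sample):
    """Distance of value from the sample mean in population standard deviations."""
    return (value - mean(sample)) / pstdev(sample)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * pstdev(xs) * pstdev(ys))


# Hypothetical nominal data: two features that move together (y is roughly x).
xs = [i / 20 for i in range(21)]
ys = [x + 0.01 * (-1) ** i for i, x in enumerate(xs)]

# Each coordinate of the suspect lies well inside the normal range of its feature,
# but the pair breaks the y ~ x relationship.
suspect = (0.3, 0.7)

marginal_z = (z_score(suspect[0], xs), z_score(suspect[1], ys))
residuals = [y - x for x, y in zip(xs, ys)]
relation_z = z_score(suspect[1] - suspect[0], residuals)
```

In this toy setting the per-feature z-scores stay below 1 in magnitude, while the deviation from the learned relationship is tens of standard deviations, so a split on a single randomly chosen feature would have little reason to isolate the suspect early.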
1) Pre-IF

IF accuracy decreases on high-dimensional data because of the data properties mentioned in the previous discussion. Therefore, to optimize IF, data pre-processing is required before AD. Pre-processing can take the form of dimension reduction, feature selection, or clustering.

a) Dimension Reduction

Although IF is good at handling high-dimensional data, dimension reduction or feature extraction is necessary to maintain performance and accuracy on very high-dimensional data. Ahmed et al. [10] used principal component analysis (PCA) to reduce the dimensions of large power system data. They chose PCA because the dataset combines a uniform distribution with a Gaussian distribution formed by sensor or meter measurement noise, and PCA works well on data with a Gaussian distribution. In addition, PCA lets the researcher select the number of components that represent the data dynamics based on a variance criterion.

While PCA is suitable for correlated data, it becomes a problem when the anomaly lies outside the large cluster of correlated data, because PCA will miss the anomalous data. Therefore, Puggini and McLoone [9] propose PCA-based methods that find the variables least related to the variables selected by PCA, named Forward Selection Independent Variables (FSIV) and Forward Selection Minimizing the Maximum Reconstruction Error (FSMM).

Sadaf and Sultana [23] use an autoencoder (AE) to add features in the form of "attack" and "normal" labels to high-dimensional network traffic data, so that the IF algorithm can improve detection accuracy by focusing on the two features provided by the AE.

b) Feature Selection

In weakly correlated data, feature selection can be used to select the most important features [10]. Stripling et al. [4] found that in some high-dimensional data, certain features are more significant than others. On nominal data, expert knowledge can quickly identify these features. By using expert experience to select nominal features and feeding them to the IF algorithm, [4] can find previously undetected conditional anomalies.

Zou et al. [6] conducted AD research on a dataset where each feature's level of influence can vary from sample to sample. Under this condition, the accuracy of IF decreases. They introduced a weighting method and applied it to Docker container data by monitoring the most dominant resource usage for each container. Selecting features based on resource weight can improve the accuracy of IF.

Khan et al. [5] use correlation analysis as their feature selection approach. Correlation analysis measures the relationship between variables and then groups them based on the correlation coefficient. This method can perform AD more effectively because it isolates the group of variables causing the anomaly.

Ding and Xing [8] also use a feature selection approach to overcome the weaknesses of IF. They convert each feature's value distribution into a histogram and then split on a value that lies at the distribution boundary or on a value with low frequency. This method converged faster than the random split point selection of standard IF.

c) Clustering

Data obtained under different operating conditions can hide anomalies if the data are not separated by condition. Operating conditions can be behavioral attributes or contextual attributes. When dealing with such data, it is necessary to pre-process the data before performing AD with IF. Chen et al. [19] proposed separating the data by operating condition using the Gaussian Mixture Model (GMM). They chose this method because the data used in [24] are complex data with a nonparametric distribution.

2) Post-IF

Another approach to increasing the accuracy of IF in conditional outlier detection is to use the IF output as input for other algorithms. Instead of treating the IF output as the final result, Alsini et al. [11] treated it as a set of outlier candidates that are reprocessed using LOF, and they increased accuracy further with a sliding window method for selecting candidates from the IF results.

Li et al. [15] found that random selection of data points on hyperspectral data causes many false alarms. Therefore, they propose applying IF twice: first to find a global anomaly map, and then to refine the global anomaly map to find local anomalies. Elnour et al. [12] also use two IFs; in their research, however, the second IF is strengthened using PCA.

Aminanto et al. [22] investigated the high false-positive rate on imbalanced high-dimensional data. They use a stacked autoencoder (SAE) as a data processing algorithm after IF. An SAE is a set of autoencoders (AE) arranged sequentially, in which the output of one AE becomes the input of the next AE to produce a reconstruction error (RE). The AE defines an anomaly as a variable with a high RE score.

3) Method Improvement

In addition to using other algorithms to help IF overcome its weaknesses, researchers also modify the IF algorithm itself. Buschjäger et al. [21] dissect the basic theory of IF and suggest that scoring anomalies using the average path length is the same as calculating the estimated coefficients of mixture components. Therefore, to optimize IF, [21] proposes an estimate of the mixture coefficients of a mixture distribution as the anomaly scoring method.

Another improvement was made by Tao et al. [13], who exploit the independence of iTrees to modify IF and implement it on a parallel computer. iTree construction on big data requires extensive memory resources, so a bottleneck can occur when using one machine. By constructing iTrees on a parallel computer, Tao et al. [13] succeeded in overcoming the limited memory available to the IF process on one machine.

Another study, by Hariri et al. [7], looked at the nature of data splitting in IF, which is always horizontal or vertical. The study proposed an IF improvement that makes the data splitting direction random. The method succeeded in eliminating the bias and ghost areas formed by horizontal and vertical splitting.

C. RQ 3: In what areas have researchers applied IF?

The areas of greatest interest are fault detection and diagnosis [5], [6], [8], [9], [11], [20], [24]. Some researchers test the reliability of their proposed improvements on several
datasets at once [7], [21]. Research like this is grouped into "various fields", while other studies are grouped in their respective fields, as shown in Fig. 3, which describes the distribution of research areas.

Fig. 3. Distribution of IF research areas

VI. CONCLUSION

The increasing variety and amount of data and the need for fast and accurate results have made AD a rising branch of machine learning in recent years. Isolation forest was introduced in 2008 and is recognized as an AD method with good potential, but it also has several weaknesses. Researchers have studied these weaknesses and offered solutions.

This paper reviewed 17 studies that suggest solutions to IF's weaknesses. Most researchers agree that IF's weakness in conditional anomaly detection is due to the random selection of features and the random selection of split points. However, there is no consensus on the best way to deal with this, and the proposed solutions fall into three groups: pre-IF intervention, post-IF intervention, and method improvement.

Research in AD with IF can still be developed in the future. To the best of our knowledge, the challenges of applying IF to text mining and sentiment analysis have not yet been studied. We think this topic is worth exploring further given current social media trends.

REFERENCES

[1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A Survey,” ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009, doi: 10.1145/1541880.1541882.
[2] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation-Based Anomaly Detection,” ACM Trans. Knowl. Discov. Data, vol. 6, no. 1, pp. 1–39, Mar. 2012, doi: 10.1145/2133360.2133363.
[3] R. Domingues, M. Filippone, P. Michiardi, and J. Zouaoui, “A comparative evaluation of outlier detection algorithms: Experiments and analyses,” Pattern Recognit., vol. 74, pp. 406–421, Feb. 2018, doi: 10.1016/j.patcog.2017.09.037.
[4] E. Stripling, B. Baesens, B. Chizi, and S. vanden Broucke, “Isolation-based conditional anomaly detection on mixed-attribute data to uncover workers’ compensation fraud,” Decis. Support Syst., vol. 111, no. April, pp. 13–26, Jul. 2018, doi: 10.1016/j.dss.2018.04.001.
[5] S. Khan, C. F. Liew, T. Yairi, and R. McWilliam, “Unsupervised anomaly detection in unmanned aerial vehicles,” Appl. Soft Comput., vol. 83, p. 105650, Oct. 2019, doi: 10.1016/j.asoc.2019.105650.
[6] Z. Zou, Y. Xie, K. Huang, G. Xu, D. Feng, and D. Long, “A Docker Container Anomaly Monitoring System Based on Optimized Isolation Forest,” IEEE Trans. Cloud Comput., vol. 7161, no. SEPTEMBER 2018, pp. 1–1, 2019, doi: 10.1109/TCC.2019.2935724.
[7] S. Hariri, M. Carrasco Kind, and R. J. Brunner, “Extended Isolation Forest,” IEEE Trans. Knowl. Data Eng., pp. 1–1, 2019, doi: 10.1109/TKDE.2019.2947676.
[8] Z. Ding and L. Xing, “Improved software defect prediction using Pruned Histogram-based isolation forest,” Reliab. Eng. Syst. Saf., vol. 204, no. August, p. 107170, Dec. 2020, doi: 10.1016/j.ress.2020.107170.
[9] L. Puggini and S. McLoone, “An enhanced variable selection and Isolation Forest based methodology for anomaly detection with OES data,” Eng. Appl. Artif. Intell., vol. 67, no. September 2017, pp. 126–135, Jan. 2018, doi: 10.1016/j.engappai.2017.09.021.
[10] S. Ahmed, Y. Lee, S.-H. Hyun, and I. Koo, “Unsupervised Machine Learning-Based Detection of Covert Data Integrity Assault in Smart Grid Networks Utilizing Isolation Forest,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 10, pp. 2765–2777, Oct. 2019, doi: 10.1109/TIFS.2019.2902822.
[11] R. Alsini, A. Almakrab, A. Ibrahim, and X. Ma, “Improving the outlier detection method in concrete mix design by combining the isolation forest and local outlier factor,” Constr. Build. Mater., vol. 270, p. 121396, Feb. 2020, doi: 10.1016/j.conbuildmat.2020.121396.
[12] M. Elnour, N. Meskin, K. Khan, and R. Jain, “A dual-isolation-forests-based attack detection framework for industrial control systems,” IEEE Access, vol. 8, pp. 36639–36651, 2020, doi: 10.1109/ACCESS.2020.2975066.
[13] X. Tao, Y. Peng, F. Zhao, P. Zhao, and Y. Wang, “A parallel algorithm for network traffic anomaly detection based on Isolation Forest,” Int. J. Distrib. Sens. Networks, vol. 14, no. 11, p. 155014771881447, Nov. 2018, doi: 10.1177/1550147718814471.
[14] M. Kiran, C. Wang, G. Papadimitriou, A. Mandal, and E. Deelman, “Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP,” Mach. Learn., vol. 109, no. 5, pp. 1127–1143, May 2020, doi: 10.1007/s10994-020-05870-y.
[15] S. Li, K. Zhang, P. Duan, and X. Kang, “Hyperspectral Anomaly Detection With Kernel Isolation Forest,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 319–329, Jan. 2020, doi: 10.1109/TGRS.2019.2936308.
[16] R. Wang, F. Nie, Z. Wang, F. He, and X. Li, “Multiple Features and Isolation Forest-Based Fast Anomaly Detector for Hyperspectral Imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 6664–6676, Sep. 2020, doi: 10.1109/TGRS.2020.2978491.
[17] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” 2007.
[18] Y. Chen and W. Wu, “Isolation Forest as an Alternative Data-Driven Mineral Prospectivity Mapping Method with a Higher Data-Processing Efficiency,” Nat. Resour. Res., vol. 28, no. 1, pp. 31–46, Jan. 2019, doi: 10.1007/s11053-018-9375-6.
[19] H. Chen, H. Ma, X. Chu, and D. Xue, “Anomaly detection and critical attributes identification for products with multiple operating conditions based on isolation forest,” Adv. Eng. Informatics, vol. 46, no. March 2019, p. 101139, Oct. 2020, doi: 10.1016/j.aei.2020.101139.
[20] Z. Lin, X. Liu, and M. Collu, “Wind power prediction based on high-frequency SCADA data along with isolation forest and deep learning neural networks,” Int. J. Electr. Power Energy Syst., vol. 118, no. September 2019, p. 105835, Jun. 2020, doi: 10.1016/j.ijepes.2020.105835.
[21] S. Buschjäger, P.-J. Honysz, and K. Morik, “Randomized outlier detection with trees,” Int. J. Data Sci. Anal., Dec. 2020, doi: 10.1007/s41060-020-00238-w.
[22] M. E. Aminanto, T. Ban, R. Isawa, T. Takahashi, and D. Inoue, “Threat Alert Prioritization Using Isolation Forest and Stacked Auto Encoder With Day-Forward-Chaining Analysis,” IEEE Access, vol. 8, pp. 217977–217986, 2020, doi: 10.1109/ACCESS.2020.3041837.
[23] K. Sadaf and J. Sultana, “Intrusion Detection Based on Autoencoder and Isolation Forest in Fog Computing,” IEEE Access, vol. 8, pp. 167059–167068, 2020, doi: 10.1109/ACCESS.2020.3022855.
[24] H. Chen, H. Ma, X. Chu, and D. Xue, “Anomaly detection and critical attributes identification for products with multiple operating conditions based on isolation forest,” Adv. Eng. Informatics, vol. 46, no. July, p. 101139, Oct. 2020, doi: 10.1016/j.aei.2020.101139.

Banking, Finance and Insurance Domain
14 pages
Designing Machine Learning Workflows in Python Chapter4
No ratings yet
Designing Machine Learning Workflows in Python Chapter4
38 pages
MACHINELEARING UNIT 1material
100% (1)
MACHINELEARING UNIT 1material
64 pages
Unit 4 Classification
No ratings yet
Unit 4 Classification
87 pages
Machine Learning Interview Questions
No ratings yet
Machine Learning Interview Questions
276 pages
The Ultimate Data Observability Checklist Guide
No ratings yet
The Ultimate Data Observability Checklist Guide
8 pages
Intro To BI
No ratings yet
Intro To BI
28 pages
Framework Design Guidelines
No ratings yet
Framework Design Guidelines
90 pages
InterraIT QA and Testing Services v5
No ratings yet
InterraIT QA and Testing Services v5
24 pages
2.2 ML Session Bias Variance Tradeoffs
No ratings yet
2.2 ML Session Bias Variance Tradeoffs
38 pages
DS Notes
No ratings yet
DS Notes
170 pages
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
From Everand
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
Robert Johnson
No ratings yet
Pattern Recognition Letters: Julien Lesouple, Cédric Baudoin, Marc Spigai, Jean-Yves Tourneret
No ratings yet
Pattern Recognition Letters: Julien Lesouple, Cédric Baudoin, Marc Spigai, Jean-Yves Tourneret
11 pages
19EC409 - Discrete Time Signal Processing
No ratings yet
19EC409 - Discrete Time Signal Processing
4 pages
11 Parallel Resonance
No ratings yet
11 Parallel Resonance
14 pages
LA Opti Assignment 53.1 Lagrangian Duality
No ratings yet
LA Opti Assignment 53.1 Lagrangian Duality
8 pages
LA Assignment 50
No ratings yet
LA Assignment 50
4 pages
Early Sensing of Tomato Brown Rugose Fruit Virus in Tomato Plants Via Electrical Measurements
No ratings yet
Early Sensing of Tomato Brown Rugose Fruit Virus in Tomato Plants Via Electrical Measurements
4 pages
SAR New
No ratings yet
SAR New
31 pages
D 5730 - 02 - Rdu3mzatmdi - PDF
No ratings yet
D 5730 - 02 - Rdu3mzatmdi - PDF
32 pages
8D Report Training MID
No ratings yet
8D Report Training MID
23 pages
6 - Acronis Cyber Notary
No ratings yet
6 - Acronis Cyber Notary
23 pages
Somatic Embryogenesis of Tohiti Rattan (Calamus Inops Becc. Ex Heyne)
No ratings yet
Somatic Embryogenesis of Tohiti Rattan (Calamus Inops Becc. Ex Heyne)
10 pages
Asim Kumar Manna - Business Mathematics and Statistics (2018, McGraw-Hill Education)
50% (2)
Asim Kumar Manna - Business Mathematics and Statistics (2018, McGraw-Hill Education)
624 pages
IT353 Project
No ratings yet
IT353 Project
6 pages
Performance Task G12 ABM
No ratings yet
Performance Task G12 ABM
2 pages
1619012152048
No ratings yet
1619012152048
8 pages
Rubrics
No ratings yet
Rubrics
1 page
ISO IEC 17025 Accreditation Cert+Scope-GCC Accreditation
No ratings yet
ISO IEC 17025 Accreditation Cert+Scope-GCC Accreditation
23 pages
Flowchart - Programming
100% (23)
Flowchart - Programming
6 pages
Denver Developmental Screening Test Interpretation
No ratings yet
Denver Developmental Screening Test Interpretation
1 page
Power System Transients: Parameter Determination: D D R D D D D R
No ratings yet
Power System Transients: Parameter Determination: D D R D D D D R
1 page
Tabla Ford Focus
No ratings yet
Tabla Ford Focus
10 pages
Grafana
50% (2)
Grafana
57 pages
1 s2.0 S2212017312005956 Main
No ratings yet
1 s2.0 S2212017312005956 Main
8 pages
NS Grade 9 Term 3 Teacher Guide
No ratings yet
NS Grade 9 Term 3 Teacher Guide
32 pages
FINAL Assessment 7 Task Brief and Marking Key
No ratings yet
FINAL Assessment 7 Task Brief and Marking Key
3 pages
Ricoh MP 402spf
No ratings yet
Ricoh MP 402spf
60 pages
Techniques For Assessing A Project's Cost and Schedule Performance
No ratings yet
Techniques For Assessing A Project's Cost and Schedule Performance
21 pages
Stages in Construction
No ratings yet
Stages in Construction
16 pages
Nombre:: Evaluación de La Maduración Neurolimbica
No ratings yet
Nombre:: Evaluación de La Maduración Neurolimbica
16 pages
Reliance Jio Infocomm Limited Invoice: Unpaid
No ratings yet
Reliance Jio Infocomm Limited Invoice: Unpaid
1 page
C1 Oral Topics
No ratings yet
C1 Oral Topics
3 pages
Garmin Wiring Diagrams - G3X - All
No ratings yet
Garmin Wiring Diagrams - G3X - All
15 pages
DM Report Structure in Template
No ratings yet
DM Report Structure in Template
14 pages
Psyc 111 Chapter 6
No ratings yet
Psyc 111 Chapter 6
10 pages


B. Research Questions

Sciencedirect 584 675 519 1778

C. Review Protocols

897 953 670 2520

This study needs protocols for performing SLRs to
