Isolation Forest Based Anomaly Detection: A Systematic Literature Review
Abstract— Anomaly detection using machine learning algorithms has been rising lately, especially with increasing data volume and velocity. One of the most recent anomaly detection algorithms is Isolation Forest (IF). Despite its simplicity, it handles high-dimensional data well and runs fast. However, IF is not without weaknesses, and several researchers have identified these weaknesses and, at the same time, proposed solutions. Therefore, to help readers understand research related to IF, this paper discusses 17 studies on IF improvement through a systematic literature review that comprehensively covers IF's weaknesses, the types of data involved, and the causes of these weaknesses, as well as dissecting the solutions offered and the research fields that use IF. The review shows that the main cause of IF's weaknesses is the random selection of variables when splitting the data, and that the solutions proposed by researchers fall into three types: pre-IF, post-IF, and method improvement. To our knowledge, there is no literature review on IF improvement, and we expect this paper to help other researchers develop IF-based anomaly detection.

Keywords— anomaly detection, isolation forest, systematic literature review

I. INTRODUCTION
Several researchers have developed Anomaly Detection (AD) methods, which come in many types, such as density-based, distance-based, neural network, and spectral-based methods [1]. These algorithms were initially developed for machine learning tasks such as classification or clustering and were later modified for AD. This changed when Liu et al. [2] introduced Isolation Forest (IF), an algorithm developed specifically for AD.

Domingues et al. [3] evaluated several up-to-date anomaly detection algorithms, including Gaussian Mixture Model (GMM), Kernel Density Estimator (KDE), Mahalanobis Distance, Local Outlier Factor (LOF), One-class Support Vector Machine (OCSVM), and Isolation Forest (IF). The evaluation used 15 different datasets from public repositories (UCI or OpenML) and private sources. The results showed that IF outperforms the other algorithms in terms of accuracy and time while also demonstrating satisfactory performance on high-dimensional datasets. However, several recent studies have reported IF's weaknesses and developed IF-based anomaly detection algorithms to overcome them.

This paper discusses some of the latest research on IF's application in various fields and the solutions developed to overcome IF's weaknesses. The most prominent problems are low accuracy in detecting local or conditional outliers [4], [5], wasted computation due to the random selection of split features [6]–[8], and reduced accuracy on high-dimensional datasets whose anomalies appear in only a few dimensions [9], [10]. In addition to the problems in IF, this paper also presents the development of IF for use in various fields, such as fraud detection [4], fault detection on manufacturing machines [9], [11], [12], attacks on network systems [10], [13], [14], and remote sensing [15], [16].

This paper conducts a systematic literature review (SLR) based on Kitchenham and Charters' guidelines [17] and divides the SLR into three main phases: planning, conducting the review, and reporting. Section II introduces the concept of IF, Section III discusses planning, Section IV covers conducting the review, and Section V presents the reporting as the result of the review.

II. ISOLATION FOREST
The main concept of IF exploits two characteristics of anomalies: anomalies are few, and their attribute values differ from those of normal instances [2]. IF recursively splits the data into two parts, as in a binary tree; splitting continues until a height limit is reached or the data can no longer be split. The resulting structure is called an iTree [2]. Samples isolated early in the splitting, that is, closer to the root of the iTree, are likely to be anomalies. In contrast, normal samples are more difficult to isolate and require many splits before they are isolated [2]. As shown in Fig. 1, a node in an iTree is either an external node with no children or an internal node with two children. An isolated sample is represented by an external node, and the number of splits needed to isolate it equals the number of edges from the root node to that external node. This number of edges is called the path length [18].

Fig. 1. iTree structure consisting of a root node, external nodes, internal nodes, and edges. Path length is the number of edges from the root to an external node [4], [10], [19].
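The iTree construction and path-length idea described above can be sketched in plain Python. This is a minimal illustration, not the reference implementation: it assumes numeric features stored as lists of floats, the helper names (`c`, `build_itree`, `path_length`, `isolation_forest`, `anomaly_score`) are hypothetical, and the score s(x, ψ) = 2^(-E[h(x)] / c(ψ)) with c(n) = 2H(n - 1) - 2(n - 1)/n, the harmonic number approximated as ln(i) plus Euler's constant, follows Liu et al. [2]. The defaults ψ = 256 and height limit ⌈log2 ψ⌉ are taken from [2]; as a simplification, a node here also stops splitting when the randomly chosen feature is constant.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points, used to normalize h(x)."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def build_itree(data, depth, max_depth, rng):
    """Grow an iTree: random feature, random split point, until isolation or the height limit."""
    if depth >= max_depth or len(data) <= 1:
        return {"size": len(data)}  # external node
    q = rng.randrange(len(data[0]))  # randomly selected feature
    lo, hi = min(x[q] for x in data), max(x[q] for x in data)
    if lo == hi:  # simplification: stop if the chosen feature is constant in this node
        return {"size": len(data)}
    p = rng.uniform(lo, hi)  # randomly selected split point
    return {
        "q": q,
        "p": p,
        "left": build_itree([x for x in data if x[q] < p], depth + 1, max_depth, rng),
        "right": build_itree([x for x in data if x[q] >= p], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Edges from the root to the external node reached by x, plus c(size) for unsplit leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest(data, n_trees=100, psi=256, seed=0):
    """Build n_trees iTrees, each on a random subsample of psi points (needs len(data) >= 2)."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return forest, psi


def anomaly_score(x, forest, psi):
    """s(x, psi) = 2 ** (-E[h(x)] / c(psi)); scores close to 1 suggest an anomaly."""
    mean_h = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_h / c(psi))


# Usage: 500 synthetic Gaussian points plus one distant point
gen = random.Random(42)
points = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(500)]
forest, psi = isolation_forest(points + [[6.0, 6.0]])
print(anomaly_score([6.0, 6.0], forest, psi))  # distant point: short paths, higher score
print(anomaly_score([0.0, 0.0], forest, psi))  # central point: long paths, lower score
```

The distant point is separated after a few random cuts in most trees, so its average path length is short and its score is higher than that of the central point; the exact values depend on the seed.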
horized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on March 08,2022 at 08:54:52 UTC from IEEE Xplore. Restrictions app
III. REVIEW PLANNING
The planning stage consists of explaining why an SLR is needed, identifying the research questions, and compiling a review protocol.

A. The reason behind SLR
IF is the first algorithm explicitly designed for anomaly detection, and it arrived at the right time, when data flooding is occurring. However, to process various types of data, each with its own characteristics, the IF algorithm needs to be improved. This paper aims to review the strategies researchers have used to apply IF to anomaly detection and to overcome IF's weaknesses in various data fields. To achieve these goals, it answers the Research Questions (RQ) listed in Table I.

TABLE I. RESEARCH QUESTIONS
ID | Research Questions | Objectives
RQ 1 | What are the main weaknesses of IF? | Identify the research gaps of each study
RQ 2 | What solutions are offered to overcome each of these weaknesses? | Identify the methods used with IF
RQ 3 | In what areas have researchers applied IF? | Identify the environments and data types suitable for IF

TABLE II. PRELIMINARY SEARCH RESULT
Source | Isolation Forest | Isolation Based Anomaly Detection | Isolation Based Outlier Detection | Quantity
V. RESULT AND DATA DISCUSSION
After reviewing each of the selected studies, we obtained the SLR results. Each study is grouped according to the RQ it addresses, so that the results are presented per objective.

A. RQ 1: What are the main weaknesses of IF?
The central concept of IF is to split the data randomly until each sample is isolated and then calculate the average number of splits (the average path length) needed to isolate it. The shorter the average path length, the more likely the sample is an anomaly [2]. Based on this concept, researchers have found several weaknesses in IF.

1) Low accuracy in the detection of conditional anomalies
IF splits data on randomly selected features, and the split points are also selected randomly. Conditional anomalies can occur in many types of datasets. Stripling et al. [4] identified this problem in AD with IF on a high-dimensional dataset in which only a few features affect the occurrence of anomalies. Wang et al. [16] also investigated this weakness and stated that in hyperspectral data, certain features generally determine the occurrence of anomalies.

Zou et al. [6] found the same problem, stating that the features of the data carry different weights, so random feature selection is not effective. Likewise, Khan et al. [5] showed that in datasets with correlated features, anomalies can be hidden within highly correlated data. Datasets collected under multiple operating conditions can also contain extreme values in some features that hide the anomaly [19].

Conditional anomalies can also lead to false positives or false negatives, for example in imbalanced datasets, where the number of anomalies is tiny compared with the nominal data [8], [23], or in hyperspectral data, where IF is built on a random selection of pixels from the entire scene [15].

2) Low accuracy in specific high-dimensional data
IF performs well at anomaly detection on high-dimensional datasets [3], but researchers have found that IF can struggle with datasets that have specific properties. Puggini and McLoone [9] found that IF produced more false positives on highly correlated datasets, because an anomaly can appear as a slight deviation from normal values within a group of correlated variables rather than as a large deviation in isolated variables.

A high volume of high-dimensional data can also degrade IF's performance. Tao et al. [13] stated that IF can hit a bottleneck on big data when it runs on a single computer, because the amount of data the IF algorithm can process is limited by the memory available on the machine where the IF-based AD software is installed.

Ahmed et al. [10] examined AD on a very high-dimensional dataset of up to 1122 dimensions with strong relationships between dimensions. Standard IF loses accuracy on such data because its feature-splitting process does not take inter-dimensional relationships into account. In addition, processing 1122 dimensions with IF takes a long time.

Aminanto et al. [22] found that IF loses accuracy by producing a high false-positive rate on imbalanced high-dimensional data. This weakness is critical, especially when the cost of false positives is very high.

3) Bias and Performance Issues
Standard IF splits data randomly along directions parallel to the data axes. Hariri et al. [7] used heat maps to evaluate IF and found that this splitting scheme introduces bias in AD. The bias appears as ghost areas that have high heat-map scores.

Buschjäger et al. [21] evaluated IF at a fundamental level and found that scoring anomalies by the average path length has no theoretical explanation grounded in the data distribution. Such a theoretical explanation could serve as the basis for improving IF's performance.

B. RQ 2: What solutions are offered to overcome these weaknesses?
Researchers have proposed several methods to overcome the weaknesses of IF. They not only use statistical methods or other algorithms to strengthen IF but also propose changes to the IF algorithm itself. The proposed methods fall into three categories: (1) pre-IF, an intervention in which the data are processed by other methods before AD with IF; (2) post-IF, in which the output of IF is used as input to other methods; and (3) method improvement, in which certain parts of the IF algorithm are modified. Fig. 2 shows the proposed methods.

Fig. 2. IF's proposed methods scheme
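To make the first two categories concrete, the sketch below chains a pre-IF step (PCA dimension reduction) and a post-IF step (re-checking IF candidates with LOF) around scikit-learn's `IsolationForest`. It is a loose illustration inspired by the PCA-based reduction of Ahmed et al. [10] and the IF-then-LOF idea of Alsini et al. [11], not a reproduction of either method: the function name `pre_if_post_if`, the 95% variance threshold, the candidate count, and the LOF cut-off of 1.5 are hypothetical choices, and the sliding-window candidate selection of [11] is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def pre_if_post_if(X, var_kept=0.95, n_candidates=20, lof_threshold=1.5, seed=0):
    """Hypothetical pipeline: PCA (pre-IF) -> IsolationForest -> LOF re-check (post-IF)."""
    # Pre-IF: keep the fewest principal components explaining var_kept of the variance
    Z = PCA(n_components=var_kept, svd_solver="full").fit_transform(X)

    # IF on the reduced data; score_samples is higher for normal points, so negate it
    iforest = IsolationForest(n_estimators=100, random_state=seed).fit(Z)
    if_score = -iforest.score_samples(Z)

    # Post-IF: the n_candidates highest IF scores are treated only as candidates
    candidates = np.argsort(if_score)[::-1][:n_candidates]

    # Confirm a candidate only if its LOF over the reduced data exceeds the threshold
    lof = LocalOutlierFactor(n_neighbors=20).fit(Z)
    lof_factor = -lof.negative_outlier_factor_
    confirmed = [int(i) for i in candidates if lof_factor[i] > lof_threshold]
    return candidates, confirmed


# Usage on synthetic correlated data (20 features driven by 3 latent factors)
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(500, 20))
X[:5] += 8.0  # five shifted rows acting as injected anomalies
candidates, confirmed = pre_if_post_if(X)
print(sorted(confirmed))
```

Whether the injected rows survive both stages depends on the data and on the hypothetical thresholds; in particular, if the retained components do not capture the direction in which the anomalies deviate, PCA can discard the very signal IF needs, which is the limitation Puggini and McLoone [9] point out.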
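For the third category, method improvement, the split modification of Hariri et al. [7], which addresses the axis-parallel bias described under RQ 1, can be illustrated with a single node split. In the sketch below (NumPy assumed; the function name `eif_split` is hypothetical), a sample goes to the left child when (x - p)·n ≤ 0, where n is a random slope vector drawn from a standard normal distribution and p is a random intercept drawn uniformly inside the node's bounding box. Zeroing d - 1 - e coordinates of n, where e is the extension level, recovers the axis-parallel cut of standard IF at e = 0. This parameterization follows our reading of [7] and should be checked against that paper before reuse.

```python
import numpy as np


def eif_split(X, rng, extension_level=None):
    """Split one node with a random hyperplane, in the spirit of Extended IF [7]."""
    _, d = X.shape
    if extension_level is None:
        extension_level = d - 1  # fully extended: every slope component may be non-zero
    if not 0 <= extension_level <= d - 1:
        raise ValueError("extension_level must lie in [0, d - 1]")
    slope = rng.normal(size=d)  # random direction of the cut
    # Zero out d - 1 - extension_level coordinates; extension_level = 0 is an axis-parallel cut
    zeroed = rng.choice(d, size=d - 1 - extension_level, replace=False)
    slope[zeroed] = 0.0
    # Random intercept point inside the bounding box of the node's samples
    intercept = rng.uniform(X.min(axis=0), X.max(axis=0))
    goes_left = (X - intercept) @ slope <= 0
    return slope, intercept, goes_left


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
slope_axis, _, _ = eif_split(X, rng, extension_level=0)
slope_full, _, mask = eif_split(X, rng)
print(np.count_nonzero(slope_axis))  # 1: axis-parallel cut, as in standard IF
print(np.count_nonzero(slope_full))  # 2: oblique cut in two dimensions
print(mask.sum(), (~mask).sum())     # sizes of the two child nodes
```

Applying such splits recursively, with path lengths computed as in standard IF, yields an extended forest; according to [7], the oblique cuts are what remove the axis-aligned artifacts produced by horizontal and vertical splitting.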
1) Pre-IF
IF accuracy decreases on high-dimensional data because of the data properties discussed under RQ 1. Therefore, to optimize IF, the data are pre-processed before AD. Pre-processing can take the form of dimension reduction, feature selection, or clustering.

a) Dimension Reduction
Although IF handles high-dimensional data well, dimension reduction or feature extraction is necessary to maintain performance and accuracy on very high-dimensional data. Ahmed et al. [10] used principal component analysis (PCA) to reduce the dimensions of large power system data, because the dataset has a uniform distribution while the sensor or meter measurement noise follows a Gaussian distribution, and PCA works well on Gaussian-distributed data. In addition, PCA lets the researcher choose the number of components that represent the data dynamics using a variance criterion.

Although PCA is suitable for correlated data, it becomes a problem when the anomaly lies outside the large cluster of correlated data, because PCA will miss it. Therefore, Puggini and McLoone [9] proposed PCA-based methods that find the variables least related to the variables already selected by PCA, named Forward Selection Independent Variables (FSIV) and Forward Selection Minimizing the Maximum Reconstruction Error (FSMM).

Sadaf and Sultana [23] used an autoencoder (AE) to add features in the form of "attack" and "normal" labels to high-dimensional network traffic data, so that the IF algorithm can improve detection accuracy by focusing on the two features provided by the AE.

b) Feature Selection
In data with low correlation, feature selection can be used to select the most important features [10]. Stripling et al. [4] found that in some high-dimensional data, certain features are more significant than others. In nominal data, expert knowledge can quickly identify these features. By using expert knowledge to select nominal features and feeding them to the IF algorithm, [4] found previously undetected conditional anomalies.

Zou et al. [6] conducted AD research on a dataset in which each feature's influence can vary from sample to sample. Under this condition, the accuracy of IF decreases, so they introduced a weighting method. The method is applied to Docker container data by monitoring the most dominant resource usage of each container. Selecting features based on resource weight improves the accuracy of IF.

Khan et al. [5] used correlation analysis for feature selection. Correlation analysis measures the relationship between variables and then groups them based on the correlation coefficient. This approach performs AD more effectively because it isolates the group of variables causing the anomaly.

Ding and Xing [8] also used feature selection to overcome the weaknesses of IF. They convert each feature's value distribution into a histogram, and splitting is then done on values that lie at the distribution boundary or have low frequency. This method converged faster than the random split points of standard IF.

c) Clustering
Data obtained under different operating conditions can hide anomalies if the data are not separated by condition. Operating conditions can be behavioral or contextual attributes. When dealing with such data, the data must be pre-processed before AD with IF. Chen et al. [24] proposed separating the data by operating condition using a Gaussian Mixture Model (GMM). They chose this method because their data are complex, with a nonparametric distribution.

2) Post-IF
Another approach to increasing the accuracy of IF in conditional outlier detection is to use the IF output as input for other algorithms. Instead of treating the IF output as the final result, Alsini et al. [11] treated it as a set of outlier candidates that are reprocessed with LOF, and they increased accuracy by using a sliding window method to select candidates from the IF results.

Li et al. [15] found that the random selection of data points in hyperspectral data causes many false alarms. Therefore, they proposed applying IF twice: first to find a global anomaly map, and second to refine that map to find local anomalies. Elnour et al. [12] also used two IFs, but strengthened the second IF with PCA.

Aminanto et al. [22] investigated a high false-positive rate on imbalanced high-dimensional data. They used a stacked autoencoder (SAE) to process the data after IF. An SAE is a set of autoencoders (AE) arranged sequentially, in which the output of one AE becomes the input of the next, producing a reconstruction error (RE). Samples with a high RE score are flagged as anomalies.

3) Method Improvement
In addition to using other algorithms to help IF overcome its weaknesses, researchers have also modified the IF algorithm itself. Buschjäger et al. [21] dissected the basic theory of IF and suggested that scoring anomalies by average path length is equivalent to estimating the coefficients of mixture components. Therefore, to optimize IF, [21] proposed estimating the mixture coefficients of a mixture distribution as the anomaly scoring method.

Tao et al. [13] made another improvement by exploiting the independence of iTrees to modify IF and implement it on a parallel computer. iTree construction on big data requires extensive memory, so a bottleneck can occur on a single machine; by constructing iTrees on a parallel computer, Tao et al. [13] overcame the memory limitation of running IF on one machine.

Another study, by Hariri et al. [7], examined the nature of data splitting in IF, which is always horizontal or vertical (axis-parallel). This study proposed an IF improvement in which the splitting direction is chosen randomly. The method succeeded in eliminating the bias and ghost areas caused by horizontal and vertical splitting.

C. RQ 3: In what areas have researchers applied IF?
The areas of greatest interest are fault detection and diagnosis [5], [6], [8], [9], [11], [20], [24]. Some researchers test the reliability of their proposed improvements on several
datasets at once [7], [21]. Such studies are grouped into "various fields", while the other studies are grouped into their respective fields; Fig. 3 shows the resulting distribution of research areas.

Fig. 3. Distribution of IF research areas

VI. CONCLUSION
The growing variety and volume of data, together with the need for fast and accurate data processing, have made AD a rising branch of machine learning in recent years. Isolation Forest was introduced in 2008 as one of the AD methods recognized as having good potential, although it has several weaknesses. Researchers have studied these weaknesses and offered solutions.

This paper reviewed 17 studies that propose solutions to IF's weaknesses. Most researchers agree that IF's weakness in conditional anomaly detection is due to the random selection of features and of split points. However, there is no consensus on the best way to deal with this, and the proposed methods fall into three groups: pre-IF intervention, post-IF intervention, and method improvement.

Research on AD with IF can still be developed further. To the best of our knowledge, the challenges of applying IF to text mining and sentiment analysis have not yet been studied. We think this topic is worth exploring further given current social media trends.

REFERENCES
[1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009, doi: 10.1145/1541880.1541882.
[2] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation-Based Anomaly Detection,” ACM Trans. Knowl. Discov. Data, vol. 6, no. 1, pp. 1–39, Mar. 2012, doi: 10.1145/2133360.2133363.
[3] R. Domingues, M. Filippone, P. Michiardi, and J. Zouaoui, “A comparative evaluation of outlier detection algorithms: Experiments and analyses,” Pattern Recognit., vol. 74, pp. 406–421, Feb. 2018, doi: 10.1016/j.patcog.2017.09.037.
[4] E. Stripling, B. Baesens, B. Chizi, and S. vanden Broucke, “Isolation-based conditional anomaly detection on mixed-attribute data to uncover workers’ compensation fraud,” Decis. Support Syst., vol. 111, pp. 13–26, Jul. 2018, doi: 10.1016/j.dss.2018.04.001.
[5] S. Khan, C. F. Liew, T. Yairi, and R. McWilliam, “Unsupervised anomaly detection in unmanned aerial vehicles,” Appl. Soft Comput., vol. 83, p. 105650, Oct. 2019, doi: 10.1016/j.asoc.2019.105650.
[6] Z. Zou, Y. Xie, K. Huang, G. Xu, D. Feng, and D. Long, “A Docker Container Anomaly Monitoring System Based on Optimized Isolation Forest,” IEEE Trans. Cloud Comput., pp. 1–1, 2019, doi: 10.1109/TCC.2019.2935724.
[7] S. Hariri, M. Carrasco Kind, and R. J. Brunner, “Extended Isolation Forest,” IEEE Trans. Knowl. Data Eng., pp. 1–1, 2019, doi: 10.1109/TKDE.2019.2947676.
[8] Z. Ding and L. Xing, “Improved software defect prediction using Pruned Histogram-based isolation forest,” Reliab. Eng. Syst. Saf., vol. 204, p. 107170, Dec. 2020, doi: 10.1016/j.ress.2020.107170.
[9] L. Puggini and S. McLoone, “An enhanced variable selection and Isolation Forest based methodology for anomaly detection with OES data,” Eng. Appl. Artif. Intell., vol. 67, pp. 126–135, Jan. 2018, doi: 10.1016/j.engappai.2017.09.021.
[10] S. Ahmed, Y. Lee, S.-H. Hyun, and I. Koo, “Unsupervised Machine Learning-Based Detection of Covert Data Integrity Assault in Smart Grid Networks Utilizing Isolation Forest,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 10, pp. 2765–2777, Oct. 2019, doi: 10.1109/TIFS.2019.2902822.
[11] R. Alsini, A. Almakrab, A. Ibrahim, and X. Ma, “Improving the outlier detection method in concrete mix design by combining the isolation forest and local outlier factor,” Constr. Build. Mater., vol. 270, p. 121396, Feb. 2020, doi: 10.1016/j.conbuildmat.2020.121396.
[12] M. Elnour, N. Meskin, K. Khan, and R. Jain, “A dual-isolation-forests-based attack detection framework for industrial control systems,” IEEE Access, vol. 8, pp. 36639–36651, 2020, doi: 10.1109/ACCESS.2020.2975066.
[13] X. Tao, Y. Peng, F. Zhao, P. Zhao, and Y. Wang, “A parallel algorithm for network traffic anomaly detection based on Isolation Forest,” Int. J. Distrib. Sens. Networks, vol. 14, no. 11, p. 155014771881447, Nov. 2018, doi: 10.1177/1550147718814471.
[14] M. Kiran, C. Wang, G. Papadimitriou, A. Mandal, and E. Deelman, “Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP,” Mach. Learn., vol. 109, no. 5, pp. 1127–1143, May 2020, doi: 10.1007/s10994-020-05870-y.
[15] S. Li, K. Zhang, P. Duan, and X. Kang, “Hyperspectral Anomaly Detection With Kernel Isolation Forest,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 319–329, Jan. 2020, doi: 10.1109/TGRS.2019.2936308.
[16] R. Wang, F. Nie, Z. Wang, F. He, and X. Li, “Multiple Features and Isolation Forest-Based Fast Anomaly Detector for Hyperspectral Imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 6664–6676, Sep. 2020, doi: 10.1109/TGRS.2020.2978491.
[17] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” 2007.
[18] Y. Chen and W. Wu, “Isolation Forest as an Alternative Data-Driven Mineral Prospectivity Mapping Method with a Higher Data-Processing Efficiency,” Nat. Resour. Res., vol. 28, no. 1, pp. 31–46, Jan. 2019, doi: 10.1007/s11053-018-9375-6.
[19] H. Chen, H. Ma, X. Chu, and D. Xue, “Anomaly detection and critical attributes identification for products with multiple operating conditions based on isolation forest,” Adv. Eng. Informatics, vol. 46, p. 101139, Oct. 2020, doi: 10.1016/j.aei.2020.101139.
[20] Z. Lin, X. Liu, and M. Collu, “Wind power prediction based on high-frequency SCADA data along with isolation forest and deep learning neural networks,” Int. J. Electr. Power Energy Syst., vol. 118, p. 105835, Jun. 2020, doi: 10.1016/j.ijepes.2020.105835.
[21] S. Buschjäger, P.-J. Honysz, and K. Morik, “Randomized outlier detection with trees,” Int. J. Data Sci. Anal., Dec. 2020, doi: 10.1007/s41060-020-00238-w.
[22] M. E. Aminanto, T. Ban, R. Isawa, T. Takahashi, and D. Inoue, “Threat Alert Prioritization Using Isolation Forest and Stacked Auto Encoder With Day-Forward-Chaining Analysis,” IEEE Access, vol. 8, pp. 217977–217986, 2020, doi: 10.1109/ACCESS.2020.3041837.
[23] K. Sadaf and J. Sultana, “Intrusion Detection Based on Autoencoder and Isolation Forest in Fog Computing,” IEEE Access, vol. 8, pp. 167059–167068, 2020, doi: 10.1109/ACCESS.2020.3022855.
[24] H. Chen, H. Ma, X. Chu, and D. Xue, “Anomaly detection and critical attributes identification for products with multiple operating conditions based on isolation forest,” Adv. Eng. Informatics, vol. 46, p. 101139, Oct. 2020, doi: 10.1016/j.aei.2020.101139.