Isolation Forest Algorithm For Anomaly Detection
https://heartbeat.fritz.ai/isolation-forest-algorithm-for-anomaly-detection-2a4abd347a5 1/16
8/5/2021 Isolation Forest Algorithm for Anomaly Detection | by Prakash verma | Heartbeat
Introduction:
Did you ever wonder how credit card fraud is caught in real time? Do you
want to know how to catch an intruder program that is trying to access
your system? All of this is possible by applying an anomaly detection
machine learning model.
Let us try to understand this with one more example. As we know, if an egg
is floating in the water, it might be old and rotten. This indicates that the
weight of eggs varies and, on the basis of its weight, one can differentiate
between a fresh egg and a rotten egg.
Photo by science4fun
Suppose we have a list giving the weight of each egg. From these values, we
want to identify how many eggs are rotten and what percentage of the lot
they represent. We can solve this using machine learning.
Real-world datasets may be very large and contain complicated patterns,
which makes it difficult to detect anomalies just by looking at the data.
That’s why anomaly detection is an extremely important application of
machine learning.
Isolation Forest:
It is worth knowing that the most common techniques employed for
anomaly detection are based on the construction of a profile of what is
normal data. Anomalies are found as those instances of data that do not
conform to the defined normal profile.
However, the isolation forest does not follow that methodology. Instead of
profiling normal data, it identifies anomalies by isolating outliers in the
data. Isolation forest is an unsupervised machine learning algorithm.
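To make the idea of isolation concrete, here is a minimal sketch (not the scikit-learn implementation) that counts how many random splits it takes to leave one value alone in its partition. The helper names isolation_depth and mean_depth and the egg weights are made up for illustration. The expectation is that an unusual value is isolated after fewer splits than a typical one.

```python
import numpy as np

def isolation_depth(values, target, rng, max_depth=50):
    """Count random splits until `target` is alone in its partition."""
    current = np.asarray(values, dtype=float)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        low, high = current.min(), current.max()
        if low == high:
            # All remaining values are identical, so no split is possible
            break
        split = rng.uniform(low, high)
        # Keep only the side of the split that contains the target
        if target < split:
            current = current[current < split]
        else:
            current = current[current >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
# Hypothetical egg weights in grams: 50 typical eggs plus one very light egg
typical = rng.normal(loc=100.0, scale=3.0, size=50)
weights = np.append(typical, 40.0)

def mean_depth(target, trials=200):
    """Average the isolation depth over many random trials."""
    return float(np.mean([isolation_depth(weights, target, rng)
                          for _ in range(trials)]))

outlier_depth = mean_depth(40.0)
# Use an actual element near the middle of the typical eggs
typical_depth = mean_depth(float(np.sort(typical)[25]))
print("light egg:", outlier_depth, "typical egg:", typical_depth)
```

An isolation forest averages this kind of path length over many random trees and turns it into an anomaly score.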
One advantage of the isolation forest is its speed and memory use. Because
each tree is built from a small random subsample of the data, it tends to
detect anomalies faster and with less memory than many other anomaly
detection algorithms.
Implementation in Python
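Let us start by importing the required libraries, numpy and pandas, along
with the IsolationForest class from sklearn.ensemble. seaborn and
matplotlib can be imported in the same way if you want to plot the results.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest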
Our second task is to read the data file from CSV into a pandas DataFrame.
The data is a collection of egg weights in grams. It contains a few
anomalies (such as a weight that is too low), which the algorithm will
detect.
df = pd.read_csv('egg_weight.csv')
df.head(15)
Here we will define a model variable and instantiate the isolation forest
class. Note that the four main parameters that need to be passed to the
model are listed below.
max_features = 1.0
n_estimators = 50
max_samples = 'auto'
contamination = 0.2
forest_model = IsolationForest(max_features=max_features,
    n_estimators=n_estimators, max_samples=max_samples,
    contamination=contamination)
# Train the model on the egg weight column (a one-column DataFrame)
forest_model.fit(df[['Egg_weight']])
After we define the model, it needs to be trained on the dataset. For this,
we use the fit() method. We pass one argument to fit(): our data of
interest, which is the egg weight column of the dataset.
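Of the parameters above, contamination has the most visible effect: it sets the share of training rows that the fitted model labels as anomalies. The sketch below illustrates this on synthetic weights (a stand-in for the article's CSV, which is not reproduced here). The demo DataFrame, the random seed, and the two contamination values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in for the egg data (not the article's CSV)
demo = pd.DataFrame({"Egg_weight": rng.normal(100.0, 5.0, size=200)})

flagged_share = {}
for contamination in (0.05, 0.2):
    model = IsolationForest(n_estimators=50, contamination=contamination,
                            random_state=0)
    # fit_predict trains the model and returns 1 (normal) or -1 (anomaly)
    labels = model.fit_predict(demo[["Egg_weight"]])
    flagged_share[contamination] = float(np.mean(labels == -1))

# The flagged share should land close to each contamination value
print(flagged_share)
```

With contamination=0.2, roughly one row in five is flagged even if the data contains fewer true anomalies, so the value is best chosen from what you know about the data.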
Find Scores
Now let’s compute the values of the scores column and the anomaly column.
By passing the egg weight column to decision_function(), we obtain the
values of the scores column.
Similarly, we obtain the values of the anomaly column by passing the egg
weight column to the predict() function of the trained model.
df['scores']=forest_model.decision_function(df[['Egg_weight']])
df['anomaly_Value']=forest_model.predict(df[['Egg_weight']])
df.head(10)
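The two calls above are closely related. In scikit-learn, decision_function() is score_samples() shifted by the fitted offset_ attribute, and predict() returns -1 where that shifted score is negative. The following sketch checks both relations on synthetic data. The array X and the seeds are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Synthetic weights in grams, shaped as one feature column
X = rng.normal(100.0, 5.0, size=(150, 1))

model = IsolationForest(n_estimators=50, contamination=0.2,
                        random_state=0).fit(X)

scores = model.decision_function(X)
labels = model.predict(X)

# decision_function is score_samples shifted by the fitted offset_
shift_matches = np.allclose(scores, model.score_samples(X) - model.offset_)
# predict returns -1 exactly where the shifted score is negative
sign_matches = np.array_equal(labels == -1, scores < 0)
print(shift_matches, sign_matches)
```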
After adding the scores and anomaly values for every row in the dataset, we
can print the predicted anomalies.
Anomalies
To show the predicted anomalies in the egg weight column, the data needs to
be analyzed after the scores and anomaly columns have been added. Note that
for an anomalous row the anomaly column value is -1 and the corresponding
score is negative.
Using this information, one can filter the DataFrame to show the predicted
anomalies.
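A minimal sketch of that filtering step follows. The weights are hypothetical (the two low values play the role of rotten eggs), and the column names simply mirror the ones used earlier in this article.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical weights in grams; 45 and 50 stand in for rotten eggs
demo = pd.DataFrame({"Egg_weight": [101, 99, 100, 102, 98, 100, 101, 99,
                                    45, 100, 103, 97, 50, 100, 99]})

model = IsolationForest(n_estimators=50, contamination=0.2, random_state=0)
model.fit(demo[["Egg_weight"]])
demo["scores"] = model.decision_function(demo[["Egg_weight"]])
demo["anomaly_Value"] = model.predict(demo[["Egg_weight"]])

# Keep only the rows that the model labelled as anomalies
anomalies = demo[demo["anomaly_Value"] == -1]
print(anomalies)
```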
Evaluation
For model evaluation let’s set a threshold limit with egg weight <80 as an
outlier. Remember that our goal is to find out the number of outliers
present in the data as described in the above rule.
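# Count the rows that the rule (egg weight < 80) marks as outliers
outliers_counter = len(df[df['Egg_weight'] < 80])
# Ratio of predicted anomalies to rule-based outliers, as a percentage
print("Accuracy percentage:",
100*list(df['anomaly_Value']).count(-1)/(outliers_counter))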
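The ratio above compares counts only, so it can exceed 100 when the model flags more rows than the rule does. As a complementary sketch, precision and recall compare the two labelings row by row. The data is hypothetical and the threshold of 80 is the rule from this section. Using scikit-learn's precision_score and recall_score here is a choice made for this illustration, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

# Hypothetical weights in grams; 45 and 50 fall below the threshold of 80
demo = pd.DataFrame({"Egg_weight": [101, 99, 100, 102, 98, 100, 101, 99,
                                    45, 100, 103, 97, 50, 100, 99]})

model = IsolationForest(n_estimators=50, contamination=0.2, random_state=0)
demo["anomaly_Value"] = model.fit_predict(demo[["Egg_weight"]])

# 1 marks an outlier, 0 marks a normal egg, for both labelings
rule_outlier = (demo["Egg_weight"] < 80).astype(int)
predicted_outlier = (demo["anomaly_Value"] == -1).astype(int)

# Precision: share of flagged rows that the rule also marks as outliers
precision = precision_score(rule_outlier, predicted_outlier)
# Recall: share of rule-based outliers that the model flagged
recall = recall_score(rule_outlier, predicted_outlier)
print("precision:", precision, "recall:", recall)
```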
Conclusion:
In this article, we discussed one of the most powerful anomaly detection
algorithms: the isolation forest.
Isolation forest is used widely due to its faster anomaly detection and
smaller memory requirement.
Cheers!!