Anomaly Detection Using Machine Learning
Abstract
Outliers are studied in many fields of research and across various domains. In this
paper, we analyse and bring together various outlier detection techniques, with the aim of
attaining a better understanding of the different approaches to outlier detection research.
The goal of this project was to detect outliers in housing prices in Melbourne (Australia)
using statistical and machine learning prediction models. All models were trained with
unsupervised learning. The models used were Isolation Forest, Elliptic Envelope,
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Local
Outlier Factor (LOF). The results of each model were visualised on multivariate data to
highlight the detected outliers. Outlier detection was performed on both univariate and
multivariate data. A dummy data frame with 1000 random observations and 4 features
was created to apply the parametric methods to univariate and multivariate data.
1. Introduction
The past twenty years have been profoundly dedicated to intrusion detection within the
information technology world, both because intrusions violate the security policies that
protect a system's information and because of the challenge of identifying the processes
used to carry out intrusions. Research on intrusion detection has gained tremendous
attention and has produced a comprehensive body of work. However, the research
community is still confronted by severe problems: it remains difficult to reduce the
number of false alerts while also identifying unknown attack patterns, and this trade-off
has stayed unresolved for some time. Nevertheless, research has shown that a solution to
this problem lies in anomaly detection, also known as outlier detection. Outlier detection
has come to be key in intrusion detection research, because an anomaly, anything against
the norm, can indicate the presence of unintended or intended faults, induced attacks, or
any other form of intrusion. Outlier detection is mainly based on two families of machine
learning techniques: supervised outlier detection and unsupervised outlier detection. To
shed light on how outlier detection is carried out, this paper reports on the machine
learning techniques used, both supervised and unsupervised.
The rest of this paper is organised as follows. Section II discusses previous work and
Section III discusses the nature of the input data. The proposed work is described in
Section IV. The implementation results are given in Section V, followed by the
conclusion and future scope in Section VI.
2. Literature Review
Supervised machine learning and various statistical prediction models discussed in [1]
can be used to predict housing prices in Melbourne (Australia); these include three
Linear Regression based models and two decision tree based models. The data was first
cleaned and Exploratory Data Analysis (EDA) was performed to gain insight into it.
"Price" was considered the main variable/feature used by the prediction models for price
prediction. RIDGE Linear Regression (MAPE 25.5%) gave the best result among the
three Linear Regression models used. The Random Forest model (MAPE 9.5%)
performed best among all the models; its results improved up to 300 trees, after which
there was no major difference. EDA was helpful in data cleaning and in identifying the
main/target variable, i.e. "Price". RIDGE Regression and the LASSO method help to
reduce the variance and the sample error, while the decision tree model is useful for
interpretability. The only problem faced with this model was in predicting prices above a
particular threshold.
Location plays a major role in predicting house prices compared to other in-house
features. Various locations and their related data were gathered. The house data and the
prediction models were partitioned, and a Multi-Task Learning (MTL) model was used
for each partition, with each partition aligned to a task. Different MTL-based methods
were used to find the relatedness between the aligned tasks. Experimental evaluations
were performed, and the results show the superiority of MTL-based methods over other
methods. The impact of the task definitions was analysed along with MTL-based method
selection, and the prediction performance of the MTL-based methods was shown to
exceed that of the other methods.
In [3], Active Anomaly Detection is presented as a new framework for anomaly
detection whose cost is the same as that of unsupervised anomaly detection while
producing better results. It is shown that a prior should be assumed in order to obtain
guarantees on the performance of the anomaly probability distribution. A new layer can
be added to an unsupervised anomaly detection model to turn it into an active anomaly
detection method, and this can yield better results on any anomaly detection dataset.
According to [4], the degree of dispersion between an object and its neighbours is often
ignored by some local outlier detection approaches. These approaches are less efficient
because they compute the local outlier factor over the entire dataset, which contains only
a small amount of outlier data. The Local Deviation Coefficient (LDC) uses the
distribution of the object and its neighbours. For data preprocessing, non-outlier data are
removed by Rough Clustering based on Multi-Level Queries (RCMLQ), which reduces
the amount of data passed to local outlier detection. It is used alongside other existing
local outlier detection methods to improve their efficiency. LDC helps to reveal unusual
or abnormal situations in scattered datasets.
Threshold-based alarms are used for detecting anomalies on critical metrics or health-
probing requests, as discussed in [5]. Machine learning classifiers can be used to predict
the status of the system's health. A Recurrent Neural Network with Long Short-Term
Memory (LSTM) was applied to a real-world dataset and was found to be more effective
in detecting system health issues and anomalies than other ML classifiers. The area
under the precision-recall curve was 0.44, and 70% of the anomalies were automatically
detected at the default threshold. The false positive rate was 4%, even though the
precision was found to be low (31%).
According to [6], outliers describe patterns in data that do not comply with the broadly
anticipated behaviour. Real outliers are found in data because of malicious activities
such as credit card fraud, system breakdown, and cyber intrusion. At the same time, they
are vital to the relevant analyst, and the real-life relevance of outliers is a primary
motivation for outlier detection. Noise removal and noise accommodation both deal with
unwanted noise, which can be regarded as a hindrance to data analysis and is therefore
not required in the data. Noise removal is driven by the need to remove unwanted
objects before conducting data analysis. Since outlying observations are of interest, the
detection of outliers becomes a critical point, and a variety of factors determine how an
outlier detection problem is formulated:
1. Point outliers
A point outlier is an individual data instance that is anomalous with respect to the rest of
the data. It is the most common kind of outlier and can occur in any data set in which
the data instances are related.
2. Contextual outliers
Some data instances are rare with respect to one particular context while appearing
normal with respect to another setting; such data are referred to as contextual outliers,
and this category also includes time series data [8].
3. Collective outliers
If a collection of related data instances is anomalous with respect to the entire data set,
even though the individual instances may not be anomalous by themselves, the collection
is classified as a collective outlier. Collective outliers can occur in sequence data, spatial
data, and even graph data [8].
4. Erroneous outliers
When an observation is incorrectly recorded as an outlier, due to some inherent
difficulty or catastrophic failure, it is a mistake outlier, also defined as an illusive outlier
[9]; such outliers distort the interpretation of the data.
Further, in [10], anomalies are detected in live streaming data in the case of errors, for
instance unwanted alarms. False alarms are detected using effective algorithms in order
to avoid the unnecessary workload and security response that follow when an alarm
rings without any emergency situation.
4. Proposed Method
In our dataset (Melbourne housing prices), there are 13 features, and using those features
a price prediction is made to classify each observation as normal or anomalous with
various algorithms. Each algorithm uses a different technique to give the best output,
and after analysing the different results we obtain both classes of data (normal points
and outliers). The "Melbourne housing prices" data set was used for non-parametric
methods on univariate and multivariate data. It consists of 34,858 observations and 21
features. First, the missing values in each column were filled with that column's median,
and specific columns were visualised using histograms. The target variables from the
given data set were "Rooms" and "Price". The models based on DBSCAN and LOF
gave quite satisfactory results, which were visualised with the outliers duly marked in
the plots.
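The median-fill preprocessing step described above can be sketched as follows. This is an illustrative sketch, not the paper's actual code: the small DataFrame is a hypothetical stand-in for the Melbourne data, with only the "Rooms" and "Price" columns named in the text and invented values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Melbourne housing data: a few rows with
# missing values. Column names follow the text; values are invented.
df = pd.DataFrame({
    "Rooms": [2, 3, np.nan, 4, 3],
    "Price": [850000.0, np.nan, 1200000.0, 2100000.0, 950000.0],
})

# Fill the missing values in each numeric column with that column's median.
df = df.fillna(df.median(numeric_only=True))
print(df)
```

The same call generalises to all 21 columns of the real data set, since `df.median(numeric_only=True)` returns one median per numeric column.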
Figure 1. System Architecture (Melbourne dataset → data preprocessing → outlier
detection)
4.3.1. Standard Deviation: If the distribution of the data is roughly normal, then about
68% of the values lie within one standard deviation of the mean, about 95% within two
standard deviations, and about 99.7% within three standard deviations. So if a data point
lies more than three standard deviations from the mean, it is very likely to be an
anomaly or outlier. This method applies to one-dimensional data.
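The three-standard-deviation rule above can be sketched as follows, using synthetic one-dimensional data (not the paper's dataset) with a few injected extreme values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Roughly normal synthetic data plus a few injected extreme values.
data = np.concatenate([rng.normal(0.0, 1.0, 1000), [8.0, -9.0, 10.0]])

mean, std = data.mean(), data.std()
# Flag points lying more than three standard deviations from the mean.
outliers = data[np.abs(data - mean) > 3 * std]
print(outliers)
```

The injected extremes are flagged; an occasional genuine normal draw beyond three standard deviations may also appear, which is exactly the ~0.3% tail the rule describes.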
4.3.2. Box plots: A box plot is a graphical representation of the data using quartiles. It is
the simplest, yet a very effective, method. The three quartiles divide the sorted data into
four intervals. After finding the minimum and maximum whisker values, we can identify
the outliers that lie below the minimum range or above the maximum range, where the
interquartile range is
IQR=Q3-Q1
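The box-plot rule can be sketched as follows; the conventional whiskers at Q1 − 1.5·IQR and Q3 + 1.5·IQR are assumed, and the small data set is invented for illustration:

```python
import numpy as np

# Small synthetic sample with two obvious extremes (102 and 107).
data = np.array([10, 11, 12, 12, 12, 12, 13, 13, 14, 14, 15, 17, 19, 102, 107])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                  # IQR = Q3 - Q1
lower = q1 - 1.5 * iqr         # lower whisker (minimum range)
upper = q3 + 1.5 * iqr         # upper whisker (maximum range)

# Points below the lower whisker or above the upper whisker are outliers.
outliers = data[(data < lower) | (data > upper)]
print(outliers)
```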
4.4. Non-parametric (Univariate)
4.4.1. Isolation Forest: Isolation Forest is an ensemble method used to find outliers that
differ markedly from the rest of the data. Traditionally, outliers were detected by
identifying the data that lies in the outer region and deviates most from the normal
pattern; Isolation Forest takes a different approach in which there is no profiling of
normal instances and no point-based distance calculation. It builds random trees over the
given data set, which in turn provide an anomaly score indicating how isolated each
object is in the resulting structure. First, a distribution is generated from the given data
set and compared with the anomaly score to find the highlighted regions where outliers
are most likely.
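A minimal sketch of this idea, assuming scikit-learn's IsolationForest as the implementation (the paper does not name its library) and a synthetic two-dimensional stand-in for the data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in: a dense 2-D cluster plus three far-away points.
normal = rng.normal(0.0, 1.0, size=(300, 2))
anomalies = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
X = np.vstack([normal, anomalies])

# contamination is the assumed fraction of outliers; each random tree
# contributes to an anomaly score measuring how easily a point is isolated.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)        # +1 = inlier, -1 = outlier
scores = model.decision_function(X)  # lower score = more anomalous
print((labels == -1).sum())
```

Points that are isolated in few random splits receive low scores and are labelled -1, matching the "highlighted region" intuition above.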
Figure 5. After plotting the contour, the objects in the denser region inside the contour
are normal data, while those outside the contour are anomalies
4.6.1. DBSCAN Clustering: DBSCAN is a density-based algorithm that can be used to
find outliers effectively, built on a few concepts about data points.
Concepts:
1. Core point: governed by two hyperparameters: the minimum number of samples
(min_samples) that can form a cluster around a core point, and eps, the maximum
distance between two samples for them to be considered part of the same
neighbourhood.
2. Border point: a point that belongs to a cluster but lies far from its centre, within
eps of a core point.
3. Noise point: an exception that does not belong to any cluster; such points can be
analysed as potential anomalies.
In this algorithm every data point is visited individually and labelled as a core, border,
or noise point based on the following conditions.
Conditions: For a core point, a neighbourhood of radius eps is drawn around the visited
point; if at least the minimum number of points (here, 3) falls inside it, the visited point
is a core point. A border point has fewer points (here, a minimum of 2) in its
neighbourhood but is a neighbour of a core point. A noise point satisfies neither
condition and has no such neighbour.
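The core/border/noise labelling above can be sketched with scikit-learn's DBSCAN, which marks noise points with the label -1. The data and the eps/min_samples values here are illustrative, chosen to suit the synthetic clusters rather than the Melbourne data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Two tight clusters plus two isolated noise points.
cluster1 = rng.normal(0.0, 0.3, size=(50, 2))
cluster2 = rng.normal(5.0, 0.3, size=(50, 2))
noise = np.array([[2.5, 10.0], [-6.0, -6.0]])
X = np.vstack([cluster1, cluster2, noise])

# eps: neighbourhood radius; min_samples: points required for a core point.
db = DBSCAN(eps=1.0, min_samples=5).fit(X)
# Points labelled -1 joined no cluster: DBSCAN's noise points, i.e. outliers.
outlier_mask = db.labels_ == -1
print(outlier_mask.sum())
```

Core and border points share their cluster's label, while the two isolated points receive -1, which is the condition the paper uses to mark anomalies.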
4.6.2. LOF (Local Outlier Factor): LOF is a density-based method and a very effective
way to find anomalies by forming a contour. Local outliers are points whose density
differs from that of their neighbourhood in one area of the dataset; they are distinct from
global outliers, but both can be detected by considering relative density, which also
allows outliers to be detected in skewed datasets. By applying LOF, anomalies are
therefore shown in a different colour and lie outside the contour.
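A minimal sketch of LOF, assuming scikit-learn's LocalOutlierFactor as the implementation, on a synthetic stand-in for the data. The parameter values n_neighbors=50 and contamination="auto" mirror the settings reported in the experiments:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(7)
# Dense inlier cloud plus three locally isolated points.
X = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    [[4.0, 4.0], [-4.0, 4.5], [5.0, -4.0]],
])

# contamination="auto" uses scikit-learn's default score threshold.
lof = LocalOutlierFactor(n_neighbors=50, contamination="auto")
labels = lof.fit_predict(X)              # +1 = inlier, -1 = outlier
factor = -lof.negative_outlier_factor_   # larger factor = locally more outlying
print((labels == -1).sum())
```

A factor close to 1 means the point's local density matches its neighbours'; the three isolated points get much larger factors and are labelled -1.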
5. Experiment Results
After analysing all the algorithms separately, we obtained good results; outliers were
detected by the various algorithms using the different methods given below:
5.1. LOF
LOF (Local Outlier Factor) has the capability to find local outliers as well as global
outliers, whereas other algorithms can easily find global outliers but are not accurate for
local outliers. It is also a density-based algorithm, similar to DBSCAN, but it forms a
contour where DBSCAN does not. It therefore gives its results based on the
neighbouring points and whether those points lie in the contour or not. In the LOF
method, the number of neighbours to be considered (n_neighbors) was taken to be 50
and the parameter "contamination" was set to "auto", which uses the default threshold.
The visualised result of this method is given below.
Figure 6. Points outside the contour, in red, are outliers; points inside the contour, in
white, are normal points
5.2. DBSCAN
DBSCAN is a density based based method which detect the anomaly with help of
neighbouring point, if the density of point is significantly different from its neighbour
then it is said to be outlier. Moreover in DBSCAN method, the parameter „eps‟ (the
maximum distance between two samples for one to be considered as in the neighbourhood
of the other) is given the value of 3.0 and the number of samples, in a neighbourhood for
a point to be considered as core point („min_samples‟), is taken as 10. The visualised
result of this method is given below.
Figure 7. Points in red are outliers, which do not satisfy any condition; blue points are
normal points
Figure 8. Outlier regions, shown in pink, are found in the areas of low probability based
on the anomaly score
References
[1] C.C. Aggarwal, Outlier Analysis, Springer Publishing Company, Incorporated, 2nd
edition, (2016).
[2] B. Baingana and G. Giannakis, “Joint community and anomaly tracking in dynamic
networks”, (2016).
[3] G. Gao, Z. Bao, J. Cao, A. Qin, T. Sellis and Z. Wu, “Location-centered house price
prediction: A multi-task learning approach”, (2019).
[4] M. Gupta, J. Gao, C.C. Aggarwal and J. Han, “Outlier detection for temporal data: A
survey”, IEEE Transactions on Knowledge and Data Engineering, 26(9), (2014), 2250–
2267.
[5] F. Huch, M. Golagha, A. Petrovska and A. Krauss, “Machine learning-based run-time
anomaly detection in software systems: An industrial evaluation”, 2018 IEEE Workshop