
Model_Based_Anomaly_Detection_in_High_Dimensional_DATA

The paper presents a structured overview of anomaly detection techniques, emphasizing their importance across various application domains such as fraud detection and medical diagnosis. It proposes a multi-vision approach using Case-Based Reasoning (CBR) to enhance anomaly detection in high-dimensional data, addressing challenges in identifying anomalies in complex datasets. The study categorizes methods based on data types and applications, and outlines a framework for implementing these techniques effectively.

6th International Conference on Advanced Technologies for Signal and Image Processing - ATSIP'2022
May 24-27, 2022, Canada-Tunisia
DFM-57

2022 6th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) | 978-1-6654-5116-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/ATSIP55956.2022.9805987

Model Based Anomaly Detection in High Dimensional DATA

Imen GATFAOUI
National Institute of Computer Science, RIADI Laboratory, IMT Atlantique, Tunis, Tunisia
Email: [email protected]

Basel SOLAIMAN
Department of Image and Information Processing, ITI Laboratory, IMT Atlantique, Brest, France
Email: [email protected]

Imed Riadh FARAH
National Institute of Computer Science, RIADI Laboratory, Tunis, Tunisia
Email: [email protected]

Abstract— Anomaly detection is a growing research issue in several application domains. This paper provides a structured overview of anomaly detection research and presents the state of the art of anomaly detection techniques. We propose a classification of methods based on the type of dataset (Big DATA, data flow, graphs, time series, etc.), the application domain (fraud detection, intrusion detection, medical anomaly detection, etc.) and the approach considered (Deep Learning, statistical, classification, clustering based, etc.). We also propose a multi-vision approach to case base representation for anomaly detection in high-dimensional data using Case-Based Reasoning.

Keywords—Anomaly detection, CBR, high-dimensional data, Big DATA, case base, Case-Based Reasoning.

I. INTRODUCTION

Various research fields and applications have addressed the problem of anomaly detection. It consists of detecting rare events or, more generally, observations that are outliers and differ from the majority of the data. These rare events are often called anomalies; they can be of various types and are encountered in different areas. In computer networks, an abnormal traffic pattern could indicate that a hacked computer is dispatching suspicious data as part of unauthorized access to computer systems [10]. In medical diagnosis, an abnormal brain MRI could reveal critical information on tumors [11]. In bank security, an abnormal credit card transaction could point to the theft of the card or of an identity [10]. Anomaly detection improves data quality by deleting or replacing anomalous data. In other cases, anomalies reflect an event and provide useful new knowledge. Anomaly detection matters because anomalies in data indicate important, and often critical, actionable information in a wide variety of application domains. Case-Based Reasoning is the framework that we propose to use in order to tackle our main target, anomaly detection in high-dimensional data, and to position the proposed multi-vision model.

The focus of this paper is two-fold. First, we present a structured overview of anomaly detection research using Case-Based Reasoning in a smart environment. Second, a multi-vision case base model is proposed in order to overcome major challenges in a Big Data framework.

Our paper is organized as follows. The following section reviews the related research on anomaly detection methods. Section 3 presents the major CBR processes for anomaly detection. Section 4 explains the representation of the case base according to the multi-vision approach for anomaly detection in high-dimensional data. Finally, Section 5 contains the conclusion.

II. RELATED WORK


Authorized licensed use limited to: GITAM University. Downloaded on May 01,2025 at 05:44:23 UTC from IEEE Xplore. Restrictions apply.
Table 1. Comparison of 10 reviews

Reviews compared: Hodge and Austin (2004); Patcha and Park (2007); Chandola et al. (2009); Zhang (2013); Gupta et al. (2014); Souiden et al. (2016); Aggarwal (2017); Salehi and Rashidi (2018); Chalapathy and Chawla (2019); A. Blazquez et al. (2021)

Techniques
Statistical ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Clustering Based ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Nearest Neighbor Based ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Classification Based ✓ ✓ ✓ ✓
Regression ✓ ✓
Spectral ✓ ✓
Deep learning ✓ ✓ ✓

Data types
Data flow ✓ ✓ ✓ ✓ ✓
Time series ✓ ✓ ✓ ✓
Graphs ✓ ✓
Big DATA ✓ ✓ ✓

Applications
Cyber-Intrusion Detection ✓ ✓ ✓ ✓ ✓
Fraud Detection ✓ ✓ ✓
Medical Anomaly Detection ✓ ✓ ✓ ✓
Predictive Maintenance ✓ ✓ ✓ ✓
Sensor Networks ✓ ✓ ✓ ✓ ✓
Image Processing ✓ ✓ ✓

The detection of anomalies interests many researchers and has been the subject of numerous works. Several methods have been proposed, and each has its strengths and weaknesses. Patcha and Park [2] reviewed the methods used for intrusion detection. A general review of existing techniques covering several approaches is given by Aggarwal [6] and Chandola et al. [3]. Gupta et al. [5] review the state of the art according to the type of data considered: temporal data such as time series, spatio-temporal data and data flows. Salehi and Rashidi [7] have also presented methods applicable to data flows. Table 1 summarizes the major reviews in the literature. For each review, we identify the anomaly detection techniques, types of datasets, and application areas covered. The purpose of this section is to provide a complete state of the art by aggregating information on the different anomaly detection methods, datasets and application domains. A classification is proposed in order to recommend the anomaly detection methods to use according to the type of data available (Big DATA, data flow, graphs, etc.), with relevant bibliographic references (Table 1).

Anomaly detection is defined in this paper as the detection of any situation inconsistent with normal resident behavior and daily routines [12]. Several approaches have been used to allow systems to interpret and reason with previous situations. Among these approaches, data mining techniques such as neural networks, decision trees, and machine learning are widely used. Other approaches rely on probabilistic reasoning tools, such as hidden Markov models, to reason with different types of situations and to overcome uncertainty problems. However, despite their success in detecting abnormal situations, these approaches generally struggle with anomaly detection in high-dimensional data, which has been a long-standing problem [13].

The next section is devoted to the presentation of the general Case-Based Reasoning framework that we propose to use in order to tackle our main target, anomaly detection in high-dimensional data, and to position the proposed multi-vision model.

III. CASE-BASED REASONING

In this section we present the main Case-Based Reasoning processes for the detection of anomalies. A Case-Based Reasoning system is a combination of processes and knowledge containers, which preserve and exploit past experiences to solve future abnormal situations.

1. Representation and formalization

Case-Based Reasoning, considered as a reasoning approach, solves new problems by adapting previous successful solutions to similar problems. As a result, the set of cases can be enriched over time with new problems. CBR techniques are widely used in smart environments for anomaly detection [14]. To simplify the presentation, we use the CBR model presented by Afouba et al. [15]. The knowledge structures are: the indexing vocabulary, the case base, the similarity metrics, and the adaptation knowledge [16].

a) Case Definition

A case is an experience represented by knowledge. This experience is a lesson that allows the case-based reasoning system to solve problems of different kinds. The information contained in a case varies with the area of application and the objectives to be achieved. A case can be defined as the computer description of a problem-solving episode.

Defining a case (in the case database) involves three steps. The first step is "synthesis", which finds a structure that meets the specifications. The second step is "analysis", which, based on a particular structure, finds the associated behavior. The third step is "assessment", which verifies that the behavior is consistent with what is expected. We will detail the structure of a case and its indexing in the case base according to several existing viewpoints in the literature. In the proposed anomaly detection system, a case is defined as a path from a starting node to an arrival node in a node-structured system.

b) Structure of a case

A case in a case-based reasoning system is generally composed of two disjoint spaces: the space of problems and the space of solutions. The problem space contains the objectives to be achieved. The solution space groups together the description of the solution provided by the reasoning, its justification, its evaluation and the steps that led to this solution. Two types of cases can be distinguished: source and target cases.

- The source case is the one in which the "problem" and "solution" parts are filled in. It is therefore a case that will inspire the system to solve a new problem. The source case may also contain another part called "quality information", which contains information on how to use the case in the system;

- The target case is the one that bears the problem and whose solution part is not filled in.

In our system, a case is represented as a set of features which are grouped into categories of parameters, as follows:

Fig. 1. Example of Case Modeling

2. Knowledge in a Case-Based Reasoning system:

The different kinds of knowledge used by a Case-Based Reasoning system are grouped into four categories of "knowledge containers", as follows:

a) Indexing Vocabulary:

A set of attributes or features that characterize the problem description and the domain solutions. These attributes are used to build the case base and play an important role in the search for similar cases.

b) Case Base

A set of structured cases that are exploited by the phases of the Case-Based Reasoning process: search (retrieval), adaptation and maintenance.

c) Measures of similarity

Functions that assess the similarity between two or more cases. These measures are defined according to the features and are used to search the case base.

d) Adaptive Knowledge

Domain heuristics, usually in the form of rules, used to modify solutions and evaluate their applicability to new situations.
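The case structure and the knowledge containers described in this section can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `Case` class, the `duration` feature, the node names and the particular similarity function (an equal-weight mix of node overlap and feature closeness) are all assumptions chosen for illustration, and adaptation knowledge is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """Hypothetical case: a problem part (path and features) and a solution part."""
    path: tuple                      # node ids from the starting node to the arrival node
    features: dict                   # indexing vocabulary: attribute name -> numeric value
    solution: Optional[str] = None   # filled in for source cases, None for target cases

    @property
    def is_source(self) -> bool:
        return self.solution is not None


def similarity(a: Case, b: Case) -> float:
    """Assumed similarity measure in [0, 1], not taken from the paper."""
    nodes_a, nodes_b = set(a.path), set(b.path)
    union = nodes_a | nodes_b
    # Jaccard overlap of the visited nodes
    path_sim = len(nodes_a & nodes_b) / len(union) if union else 1.0
    # Closeness of the numeric features that both cases share
    shared = a.features.keys() & b.features.keys()
    if shared:
        feat_sim = sum(
            1.0 / (1.0 + abs(a.features[k] - b.features[k])) for k in shared
        ) / len(shared)
    else:
        feat_sim = 0.0
    return 0.5 * path_sim + 0.5 * feat_sim


def retrieve(case_base, target: Case):
    """Search phase: return the most similar source case and its score."""
    sources = [c for c in case_base if c.is_source]
    best = max(sources, key=lambda c: similarity(c, target))
    return best, similarity(best, target)


# Two source cases (problem and solution filled in) and one target case.
case_base = [
    Case(("A", "B", "C"), {"duration": 10.0}, solution="normal"),
    Case(("A", "D", "E", "C"), {"duration": 25.0}, solution="anomaly"),
]
target = Case(("A", "B", "C"), {"duration": 11.0})
best, score = retrieve(case_base, target)
print(best.solution, round(score, 2))
```

In this sketch the target case reuses the solution of its nearest source case unchanged; a fuller system would apply adaptation rules before accepting it.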
IV. CASE BASE MULTIPLE VISIONS

This section presents unsolved detection challenges in complex anomaly data. It also explains the case-base representation with multiple visions.

Fig. 2. Model of case base: representation of the case base and model of the infrastructure

To design an anomaly detection system, we start with environmental monitoring: the system relies on processing data from real-time monitoring of the infrastructure, the users and their behavior. This data processing could facilitate understanding and predicting mobility, detecting and predicting complex events on the infrastructure, detecting anomalies on several levels, and visually analyzing mobility data. The aim is to help users understand the space around them and to facilitate their movement, while optimizing transport systems and promoting multi-modality on the one hand, and preserving the environment through sustainable mobility on the other.

A huge amount of real-time data is generated daily and stored in case databases, which requires us to visualize and manipulate high-dimensional data in order to detect anomalies. Detecting anomalies in this type of database is therefore complicated. Furthermore, in a low-dimensional space, anomalies often exhibit evident abnormal characteristics, but in a high-dimensional space they become hidden and unnoticeable. Detecting anomalies in a reduced lower-dimensional space, spanned by a small subset of the original features or by newly constructed features, is a straightforward solution. This is why it is crucial to represent the case database according to several criteria (a time-indexed database, a user database, a user-group database and a database linked to the infrastructure generated from our surveillance model) in order to reduce the dimension of the data, as defined in Fig. 2.

This dimensional reduction allows us to focus attention on the specific anomalous situations of interest.

With this multi-vision model of the case database, which acts as a dimensionality reduction, the detection of complex anomalies (contextual and collective anomalies) becomes easier and more straightforward. Moreover, most existing techniques target point anomalies and cannot be used for contextual and collective anomalies, since these exhibit completely different behaviors from point anomalies. This is one of the main challenges of anomaly detection. Using multiple visions of the case base, we aim to facilitate the detection of diverse types of anomalies in high-dimensional data.
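As an illustration of the multi-vision idea, the following Python sketch indexes one set of cases under a user vision, a user-group vision and a time vision, so that a detector can work on a much smaller sub-base. The `CaseRecord` fields, the hourly time slots and the `build_visions` helper are hypothetical names introduced here; the infrastructure vision is omitted because it depends on the surveillance model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical case record for a road-traffic application."""
    user: str
    group: str
    hour: int      # hour of day at which the trip started (assumed time index)
    path: tuple    # node ids from the starting node to the arrival node


def build_visions(cases):
    """Index the same cases under three visions: user, user group and time."""
    visions = {
        "user": defaultdict(list),
        "group": defaultdict(list),
        "time": defaultdict(list),
    }
    for case in cases:
        visions["user"][case.user].append(case)
        visions["group"][case.group].append(case)
        visions["time"][case.hour].append(case)
    return visions


cases = [
    CaseRecord("u1", "commuters", 8, ("A", "B", "C")),
    CaseRecord("u1", "commuters", 18, ("C", "B", "A")),
    CaseRecord("u2", "commuters", 8, ("A", "D", "C")),
    CaseRecord("u3", "visitors", 8, ("A", "B", "C")),
]
visions = build_visions(cases)

# A mono-user analysis only needs the reduced case base of that user.
print(len(visions["user"]["u1"]), "cases for u1 out of", len(cases))
```

Each vision is only an index over the same cases, so adding a further criterion means adding one more dictionary rather than duplicating the data.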
Because each vision yields a reduced case base, we can act directly on the one concerned. To put this vision into practice, consider our application (traffic in a road system): if a user has always taken a path different from the optimal path, there is an anomaly; a priori it is a mono-user anomaly. Hence, depending on the type of anomaly, we focus on the relevant case base; in the example above, the user case base is the one to search for the abnormal case. The nature of the desired anomaly is an important aspect of an anomaly detection technique. Anomalies can be classified into the following categories: point, contextual and collective anomalies. It is important to properly identify their type and then choose the algorithm most suitable for their detection. The type of anomalies considered depends on the problem. It is also possible to want to detect several types of anomalies at once, which makes the problem more complex and the choice of the detection algorithm more complicated.

Fig. 3. Types of anomaly: point (user anomaly), contextual (infrastructure anomaly) and collective (user-group anomaly)

Point Anomalies: A point anomaly occurs when an individual data instance can be considered abnormal in comparison with the rest of the data. This is the most basic type of anomaly, and it is the subject of the majority of anomaly detection research (for example, a behavior anomaly in a mono-user system).

Contextual Anomalies: A data instance is a contextual anomaly if it is abnormal in a specific context (but not otherwise). The concept of context is induced by the structure of the dataset and must be specified as part of the formulation of the considered problem. Each data instance is defined using the following two sets of characteristics [6]: contextual attributes and behavioral attributes.

Collective Anomalies: A collective anomaly occurs when a group of related data instances is anomalous in comparison to the entire data set. For example, a group of users takes a path other than the optimal path. Note that a single user taking this path cannot, by itself, be considered an anomaly.

Anomaly detection concerns not only anomalies at the level of the case bases (users, user groups) but also anomalies at the model level (the infrastructure). The case base and the model are linked. The anomaly detection system is checked by the model, which is based on external knowledge. If information from the infrastructure defines an incident, for example a node that is closed because of road works, the resulting behavior cannot be interpreted as an anomaly; without this knowledge, it would obviously be considered one. Thus, the anomaly detected at the model level is observed in the case database.

V. CONCLUSION

Because anomaly detection cuts across many fields of data processing, different methods are proposed according to the constraints of each application domain and data type. In this paper, we have presented a review and proposed a general framework for applying existing methods and those adapted to each application field and main dataset type. Furthermore, we presented the various ways in which anomaly detection problems have been formulated in the literature. This work introduces a conceptual model of anomaly detection in high-dimensional data and proposes, as a solution for Big Data applications, to consider the dataset (i.e., the case base) following different visions: individual user, group of users, and time of occurrence of instances. This multi-vision approach makes it possible to tackle anomaly detection efficiently and to exploit different sources of knowledge, for instance the knowledge source summarizing the infrastructure resources of the node-based structure considered in our application.
REFERENCES

[1] Hodge, V. and J. Austin (2004). A survey of outlier detection methodologies. Artificial Intelligence Review 22(2), 85–126.
[2] Patcha, A. and J.-M. Park (2007). An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks 51(12), 3448–3470.
[3] Chandola, V., A. Banerjee, and V. Kumar (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41(3), 15.
[4] Zhang, J. (2013). Advancements of outlier detection: A survey. ICST Transactions on Scalable Information Systems 13(1), 1–26.
[5] Gupta, M., J. Gao, C. C. Aggarwal, and J. Han (2014). Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering 26(9), 2250–2267.
[6] Aggarwal, C. C. (2017). Outlier Analysis (Second Edition). Springer International Publishing AG.
[7] Salehi, M. and L. Rashidi (2018). A survey on anomaly detection in evolving data: [with application to forest fire risk prediction]. ACM SIGKDD Explorations Newsletter 20(1), 13–23.
[8] Chalapathy, R. and S. Chawla (2019). Deep learning for anomaly detection: A survey.
[9] Ane Blazquez-Garcia, Angel Conde, Usue Mori, and Jose A. Lozano. 2021. A Review on Outlier/Anomaly Detection in Time Series Data. ACM Comput. Surv. 54, 3, Article 56 (April 2021), 33 pages.
[10] Kumar, V. 2005. Parallel and distributed computing for cybersecurity. IEEE Distrib. Syst. Online 6, 10.
[11] Hawkins, D. M. (1980). Identification of outliers, Volume 11. Springer.
[12] Aran, O. et al. 2015. Anomaly Detection in Elderly Daily Behavior in Ambient Sensing Environments. In: International Workshop on Human Behavior Understanding, pp. 51–67.
[13] Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel. 2012. A survey on unsupervised outlier detection in high dimensional numerical data. Stat. Anal. Data Min. 5, 5 (2012), 363–387.
[14] Chen, L. et al. 2014. An Ontology-based Hybrid Approach to Activity Modeling for Smart Homes. IEEE Transactions on Human-Machine Systems 44(1), pp. 92–105.
[15] Afouba N., Kerbrat S. and Labarang Z., Rapport du projet d’intelligence artificielle: Le raisonnement à partir des cas: Définitions et principes de fonctionnement, November 2004.
[16] HajSaid A., Distances sémantiques pour la comparaison des connaissances objets dans le cadre du raisonnement à partir de cas, pp. 25-40, 2004.
