
An Efficient Support Vector Machine Algorithm Based Network Outlier Detection System

The document presents an anomaly-based network outlier detection system (NODS) optimized using a Support Vector Machine (SVM) algorithm to effectively classify and manage incoming network traffic. The system demonstrates high classification accuracy, low false alarms, and efficient detection times, validated through the NSL-KDD and CICIDS2017 datasets. The proposed NODS employs feature normalization and selection techniques, along with a Genetic Algorithm for parameter tuning, showcasing its effectiveness compared to existing methods.


Received 19 December 2023, accepted 3 February 2024, date of publication 9 February 2024, date of current version 20 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3364400

An Efficient Support Vector Machine Algorithm Based Network Outlier Detection System
OMAR ALGHUSHAIRY 1, RAED ALSINI 2, ZAKHRIYA ALHASSAN 1, ABDULRAHMAN A. ALSHDADI 1, AMEEN BANJAR 1, AYMAN YAFOZ 2, AND XIAOGANG MA 3
1 Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia
2 Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
3 Department of Computer Science, University of Idaho, Moscow, ID 83844, USA

Corresponding authors: Omar Alghushairy ([email protected]) and Raed Alsini ([email protected])


This work was supported by the University of Jeddah, Jeddah, Saudi Arabia, under Grant UJ-22-DR-64.

ABSTRACT With the increase of cyber-attacks and security threats in the recent decade, it is necessary
to safeguard sensitive data and provide robust protection to information systems and computer networks.
In this paper, an anomaly-based network outlier detection system (NODS) is proposed and optimized to
check and classify the incoming network traffic stream’s behaviours that affect the computer networks.
The proposed NODS has high classification efficiency. Network connection events classified as outliers
are reported to the network admin to drop and block its packets. The NSL-KDD and CICIDS2017 intrusion
datasets were employed to build the proposed system and test its detection capabilities. Sequential scenarios
were implemented to optimize the system’s effectiveness. Network features were normalized by min-max
and Z-Score approaches, while the relevant features were selected individually by the principal component
analysis (PCA) and correlated features selection (CFS) techniques. Support vector machine (SVM) and
Gaussian Naive Bayes (GNB) algorithms are used to build the detection model, while the Genetic algorithm
(GA) was employed to tune their control parameters. The obtained evaluation results proved that the proposed
SVM based NODS is characterized by low false alarms and detection time as well as high classification
accuracy. Furthermore, a comparative analysis was conducted with other existing techniques, and the results
obtained demonstrate the effectiveness of the proposed SVM-IDS.

INDEX TERMS Outlier detection, NSL-KDD, CICIDS2017, features normalization, features selection,
support vector machine, Gaussian Naive Bayes, genetic algorithm, RBF, tuning parameters.

The associate editor coordinating the review of this manuscript and approving it for publication was Yudong Zhang.
2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 12, 2024

I. INTRODUCTION
Internet technologies and communication networks are evolving daily. In parallel, cyber-attacks are advancing and novel security vulnerabilities are appearing just as quickly [1]. Attempts that breach computer networks’ availability, security, and privacy are known as network intrusions, anomalies, or outliers [2]. Outlier detection is mainly employed for recognizing anomalous activities in many fields, such as network attack detection. It is the process of identifying data points that differ from the majority of other data [3]. These abnormal data points represent unusual behaviours and are denoted as outliers. A network outlier detection system (NODS) provides the mechanism to inspect network activities in order to detect any possible intrusive actions [4]. NODS can be installed in a host such as a computer to audit its activities, including system calls and log files, for detecting intrusive events [5]. Also, NODS can be deployed in a network to monitor and analyze its traffic stream behaviours to identify anomalous network connections [5]. Furthermore, NODS can identify intrusion attempts using the signature, anomaly, or hybrid-based detection approaches [6]. The signature approach looks for the intrusion occurrence based on gathered knowledge
about previous well-known intrusion signatures; therefore, it cannot identify novel attacks [6]. The anomaly approach looks for any deviation from the regular behaviour of a system or a network; therefore, it can recognize novel attacks. The hybrid approach integrates anomaly and signature-based detection methods to deliver a robust detection capability embedded in a single approach [6]. The approaches used for detecting outliers are categorized as density-based, distance-based, and machine learning or soft-computing [7]. In this paper, SVM and GNB are implemented individually to develop the anomaly-based detection model of NODS, which is built and evaluated on the labelled network traffic stream of the benchmark NSL-KDD and CICIDS2017 datasets [8], [9]. Efficient preprocessing of the network traffic data, such as feature engineering, is crucial in mitigating model overfitting and boosting generalization. Consequently, the outlier detection model performs better and converges faster. The remainder of this paper is structured as follows: Section II reviews related work, Section III discusses the proposed NODS, and Section IV highlights the implementation and results of the experiment. Finally, Section V presents the research conclusion along with future interests.

II. RELATED WORKS
Network outliers are observations that are distinctly different from other observations, making them appear to be generated by a different process [10]. Unlike noise, network outliers carry important information, which can inform proactive network threat management. For example, an unusually large number of requests coming from one computer could be an outlier generated by a different process, which could indicate a malicious attack or some other type of unusual activity [11]. Thus, network outliers can help detect malicious behavior or provide insight into abnormal traffic patterns.
By detecting unusual activity in the network, organizations can identify malicious activities and reduce the risk of security breaches. Network anomaly detection can also be used to improve network performance by identifying and addressing network congestion, latency issues, and slow response times [11], [12]. Li et al. [13] developed an optimized resource allocation and communication technique for the fault detection system. This method is vital considering the limited edge device computation capabilities, minimal communication resources, and varying monitoring accuracies. The proposed approach maximizes the system’s processing performance, optimizes resource use, and meets all data transmission and analysis latency needs.
From an organization’s perspective, verifying the integrity of the network ensures that legitimate traffic is not blocked or rerouted to unknown sources, leading to a more secure and reliable network [14]. Pour et al. noted that by detecting anomalous activity, organizations can also ensure compliance with regulatory requirements and improve the overall security posture of their network [15]. Furthermore, network anomaly detection can be used to monitor suspicious activity and detect potential malicious actors who may be attempting to gain access to the organization’s network or data [14]. Thus, proactive network monitoring helps organizations to detect and respond to threats quickly, ensuring confidentiality, integrity, and availability of computing resources as well as preventing technical and business losses.
Lu et al. [16] address the detection of internal defects in magnetic tiles, leveraging acoustic sound to detect the defects. The non-stationary and non-Gaussian properties of acoustic sound limit the accuracy of using a single data modality for detecting internal defects. Another study presents a novel and efficacious ensemble anomaly detection approach that relies on a collaborative representation-based detector. Background data is predicted using randomly chosen focused image pixels [17]. Connected and Autonomous Vehicles (CAVs) are becoming increasingly common due to the current rate of technological development. However, these cars’ networks are highly susceptible to illegal eavesdropping. Therefore, the authors of [18] propose using Deep Reinforcement Learning (DRL) and Distributed Kalman Filtering (DKF) methods to mitigate jamming interference and increase communication robustness to eavesdropping. The overarching aim is to optimize security performance against smart jammers and eavesdroppers. Thus, they formulate a DKF algorithm that accurately tracks the attacker by sharing state estimates between nodes. Consequently, they conceptualize a design problem for managing transmission power and picking communication channels. These provisions are made while ascertaining that the authorized vehicle user’s quality needs are not compromised. A hierarchical Deep Q-Network (DQN)-based architecture is selected since the jamming and eavesdropping model is dynamic and uncertain. The DQN architecture is employed for designing channel selection policies and anti-eavesdropping power control. The optimal power control model is rapidly performed first without prior data or insights on eavesdropping behaviors. The channel selection process, which is founded on the system secrecy rate analysis, then proceeds when necessary. The authors simulate the proposed system, finding that it increases the secrecy and attainable communication rates [18].
Connected and Autonomous Vehicles (CAVs) are becoming increasingly common due to the current rate of technological development. However, these cars’ networks are highly susceptible to illegal eavesdropping. Therefore, the authors of [19] propose using Deep Reinforcement Learning (DRL) and Distributed Kalman Filtering (DKF) methods to mitigate jamming interference and increase communication robustness to eavesdropping. The overarching aim is to optimize security performance against smart jammers and eavesdroppers. Thus, they formulate a DKF algorithm that accurately tracks the attacker by sharing state estimates between nodes. Consequently, they conceptualize a design problem for managing transmission power and picking communication channels. These provisions are made while ascertaining that the authorized vehicle user’s quality needs are not compromised. A hierarchical Deep Q-Network (DQN)-based
architecture is selected since the jamming and eavesdropping model is dynamic and uncertain. The DQN architecture is employed for designing channel selection policies and anti-eavesdropping power control. The optimal power control model is rapidly performed first without prior data or insights on eavesdropping behaviors. The channel selection process, which is founded on the system secrecy rate analysis, then proceeds when necessary. The authors simulate the proposed system, finding that it increases the secrecy and attainable communication rates [19].
Several practical challenges constrain the conventional “forecast-response” paradigm. For instance, the method’s applicability is poor when different situations need dissimilar reaction processes. This deficiency originates from the paradigm’s macro-perspective description of crises that overlooks the micro-perspective evaluation of emergency response. Therefore, the study recommends employing the “scenario-response” paradigm, which leverages a microscopic approach to frame the implications of conforming measures on events. Zhengzhou, China, experienced unexpected torrential rains in 2021 that resulted in 398 fatalities and approximately 120.6 billion RMB of economic losses. Consequently, an empirical assessment of the disaster based on Bayesian networks was done to analyze the emergency response’s evolution. The scenario Bayesian network was built by amalgamating Dempster’s combination rule, scenario evolution, and knowledge meta-theory with 362 appropriate historical representative events. The network could also identify the progression of the respective emergency events and combine different experts’ analyses. An event-driven Bayesian network was also employed to evaluate the impact of individual actions on the odds of the response outcomes. The interventions’ counterfactual outcomes were also checked using causal inference to highlight the urgent and vital responses. The similarity between each source and target scenario exceeded 0.7, with the highest value at 0.78. Furthermore, the incident response’s evolutionary precision was examined by contrasting scenario parallels. Thus, the proposed approach can offer a theoretical foundation for deploying a “scenario-response” paradigm [20].
The number of multi-objective large-scale optimization problems (MOLSOPs) has increased in recent years. MOLSOPs can be addressed using cooperative coevolution and variable grouping optimization. However, few researchers have attempted to decompose MOLSOP variables. Therefore, the authors of [21] present a multi-objective graph-based differential grouping with shift (mogDG-shift) for decomposing the multiple MOLSOP variables. They begin by assessing variable attributes and then detect the variable interactions. Consequently, they categorize the variables according to their interactions and features [21].
Asif et al. [22] developed an Intrusion Detection System (IDS), where the KDD 99 intrusion dataset was used as the network traffic source. The detection system was designed to identify anomalous activities and network outliers early. The Apache Storm framework was used to handle the big data characteristics of the network stream. Assessment results confirmed the feasibility of the detection system. Besides, the system performance can be improved by solving the class imbalance problem. In [23], Han et al. developed an IDS to identify varied network attack types. Evolutionary neural networks (ENNs) were used to construct the detection model on the network traffic of the DARPA IDEVAL dataset. Evaluation results showed the system’s ability to detect network intrusions with low false alarms and a high detection rate. In [24], Wang et al. developed an IDS to complement the firewall. It can identify network attacks that the firewall cannot detect. The IDS was built based on the K-means clustering-based density and the k-NN classifier on the KDD intrusion dataset. Results proved that the system is effective in detecting varied network attacks. In [25], Sanjay et al. presented an improved mechanism for the attack detection system based on streaming data mining approaches. The NSL-KDD intrusion dataset was used to assess four classification techniques, and their evaluation results were compared. Results proved that the Naïve Bayes classifier achieved the best accuracy, and the Hoeffding tree achieved the least detection time. In [26], Zhang et al. developed an outlier detection technique for data streams. The detection model was trained and assessed on the KDD dataset. The performance evaluation proved the system’s effectiveness in detecting network outliers at a lower rate of false positives than other compared systems.
Kurniabudi et al. utilized Information Gain to rank and group features based on minimum weight values, enabling the selection of relevant and significant features [27]. Subsequently, they employed five classifier algorithms, namely Random Forest (RF), Bayes Net (BN), Random Tree (RT), Naive Bayes (NB), and J48, to conduct experiments on the CICIDS2017 dataset. The experimental results demonstrate that the number of relevant and significant features determined by Information Gain significantly impacts detection accuracy and execution time. Specifically, the Random Forest algorithm achieves the highest accuracy of 99.86% when using 22 relevant selected features, whereas the J48 classifier algorithm attains an accuracy of 99.87% with 52 relevant selected features, albeit requiring a longer execution time.
Pankaj Jairu et al. focused on building an anomaly-based IDS to detect a variety of network attacks by using many supervised learning algorithms such as Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes, Decision Tree, and Random Forest on multiple datasets, including the realistic evaluation dataset CICIDS-2017 [28]. Results demonstrated that Random Forest outperformed other supervised algorithms and achieved an impressive accuracy of 99.93% by using only 14 features selected via Pearson’s correlation coefficient method.
Shruti et al. introduced a novel intrusion detection system that employs ensemble techniques of machine learning algorithms [29]. The objective is to enhance classification accuracy and reduce false positives, utilizing
features sourced from the CICIDS-2017 dataset. The proposed work presents an intrusion detection system (IDS) implemented through machine learning algorithms, including decision trees, random forests, and SVM. Additionally, the work incorporates LIME, an explainability framework, to understand the model’s predictions. The ensemble of ML models showed an improved accuracy of 96.25 for the IDS prediction, and the LIME explanation graphs showcased the prediction performance of the decision tree, random forest, and SVM algorithms. This integration aims to enhance comprehensibility and insight into the previously opaque black-box methodology for reliable intrusion detection.
Omar et al. have implemented five distinct deep learning models for the identification and categorization of suspicious activities within network flows in an IoT environment [30]. These models are initially trained on a cloud server and subsequently deployed to a gateway node, where the pivotal network traffic classification is executed. The entire process of model training and assessment is conducted utilizing the CICIDS2017 dataset. The evaluation of the five models’ accuracy revealed that the proposed model, named EIDM, exhibited exceptional performance, surpassing the other four models with a remarkable accuracy rate of 99.48%. This superior performance was achieved while also taking into consideration the time resources expended. Furthermore, the EIDM model proved its efficacy by successfully categorizing the full spectrum of 15 traffic behaviors, which encompassed 14 diverse attack types within the CICIDS2017 dataset, achieving a commendable accuracy level of 95%.

III. NETWORK OUTLIER DETECTION SYSTEM (NODS)
There are two main categories of NODS: supervised and unsupervised. If a system utilizes both supervised and unsupervised features, it is classified as semi-supervised [31]. Supervised NODSs use labeled data to train a model that can then be used to detect outliers in new, unlabeled data sets. These systems are based on supervised learning techniques, such as decision trees, neural networks, and support vector machines (SVMs) [32]. These techniques are used to identify patterns in the data that indicate the presence of outliers. In decision tree-based NODSs, the data is split into multiple nodes based on the value of a certain feature [31], [33]. The nodes are then classified as outliers or inliers. Then, the system uses the decision tree to evaluate the data points and identify outliers.
Unsupervised NODSs use only unlabeled data to identify outliers. In this case, the dataset is first divided into two or more clusters, where each cluster represents a set of data points that share similar characteristics [32], [34]. The clusters are then evaluated to determine whether any data points are significantly different from the rest of the data. The evaluation is done using a variety of methods, such as density-based clustering, clustering based on distance, and cluster-based outlier detection algorithms. Once clusters are created, the next step is to identify anomalies in the data, which is achieved by calculating a score for each data point [22]. The score is calculated based on a variety of factors, such as distance from the cluster’s central point, variance from the cluster’s mean, and correlation with other data points in the cluster.
Smiti noted that if a data point has a significantly higher score than the rest of the data, it is considered an outlier [20]. Once outliers are identified, they can be further analyzed to determine what type of malicious activity is taking place [22]. The analysis can be done manually, or by using automated tools such as machine learning algorithms. NODS is deployed and attached to the entry point device of a computer network, as shown in Figure 1. Its goal is to capture and analyze the incoming network flow of this network.

FIGURE 1. Network outlier detection system (NODS).

NODS starts with capturing the network traffic stream data by a packet sniffer. Then the related network packets are gathered to form a number of network connections, which are written to a dataset file to be analyzed [35], [36], [37]. Each connection is described as a vector of many network features. Therefore, any network connection behaviour can be analyzed and classified as either normal or an outlier. Once NODS detects any abnormal network flow, an alarm is raised to the network admin to take suitable countermeasures regarding this outlier traffic, like dropping this anomalous traffic by blocking its IPs. However, processing these data directly takes a long time to analyze and leads to imprecise detection results. Therefore, the data should be preprocessed well with data mining techniques before being analyzed, to ease the classification process and achieve efficient classification results.

A. NETWORK TRAFFIC DATA PRE-PROCESSING
1) NETWORK FEATURES ENCODING
The network feature values are heterogeneous in type: they can be found either in nominal form, like the protocol type (e.g. TCP or UDP), or in numeric form, like a port number. Many outlier detection models cannot work with nominal data. It should be encoded into numeric form, and each connection’s class/target feature is encoded to 0 for normal and 1 for the outlier/anomalous behaviour.
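As an illustration of the encoding step, the following is a minimal sketch and not the authors' implementation. It assumes integer label encoding for nominal features (the paper does not say which encoding scheme was used), and the field names `protocol_type`, `dst_port`, and `label`, as well as the helper names, are hypothetical.

```python
def fit_nominal_encoder(rows, nominal_columns):
    """Assign an integer code to each distinct value of every nominal column."""
    mapping = {col: {} for col in nominal_columns}
    for row in rows:
        for col in nominal_columns:
            codes = mapping[col]
            if row[col] not in codes:
                codes[row[col]] = len(codes)  # first-seen order: 0, 1, 2, ...
    return mapping


def encode_row(row, mapping, label_column="label", normal_value="normal"):
    """Return (numeric feature vector, target) with target 0=normal, 1=outlier."""
    features = []
    for col, value in row.items():
        if col == label_column:
            continue
        if col in mapping:
            features.append(float(mapping[col][value]))  # nominal -> code
        else:
            features.append(float(value))                # already numeric
    target = 0 if row[label_column] == normal_value else 1
    return features, target


# Toy connection records (illustrative values only, not taken from a dataset).
connections = [
    {"protocol_type": "tcp", "dst_port": 80, "label": "normal"},
    {"protocol_type": "udp", "dst_port": 53, "label": "normal"},
    {"protocol_type": "tcp", "dst_port": 4444, "label": "neptune"},
]
mapping = fit_nominal_encoder(connections, ["protocol_type"])
encoded = [encode_row(row, mapping) for row in connections]
```

Integer codes impose an arbitrary order on the nominal values, so one-hot encoding is a common alternative. In this sketch a value that was not seen while fitting raises a `KeyError`, so the encoder would need to be fitted on all values expected at test time.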


TABLE 1. NODS implementation general algorithm.

2) NETWORK FEATURES NORMALIZATION
Naturally, the value ranges of network features vary, which biases the outlier detection model toward the large-scale features and leads it to ignore those with a smaller scale. This results in an inaccurate detection process, which could lead to the model underfitting problem. Therefore, this problem is avoided by rescaling the feature values to a uniform scale. Two normalization methods are used, the min-max and the Z-score.
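The two rescaling rules, min-max into a target range [a, b] and Z-score standardization, can be sketched per feature column as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the population standard deviation is used (the paper does not say whether the population or sample form is meant), and mapping a constant feature to `a` or to 0 is a choice made here to avoid division by zero.

```python
from statistics import mean, pstdev


def min_max_scale(values, a=0.0, b=1.0):
    """Rescale one feature column into [a, b] using its own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: assumption, map every value to a
        return [a for _ in values]
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


def z_score_scale(values):
    """Standardize one feature column to zero mean and unit deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:  # constant feature: assumption, map every value to 0
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Toy feature column (illustrative values only).
durations = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(durations)               # range [0, 1]
symmetric = min_max_scale(durations, -1.0, 1.0) # range [-1, +1]
standardized = z_score_scale(durations)
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training set only and reused for the testing set, so that no test information leaks into the model.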
a) The min-max method scales each feature’s values into a specific range [a, b], such as [0, 1] or [−1, +1], by the following formula:
$$N(x) = a + \frac{(x - \min(x))(b - a)}{\max(x) - \min(x)} \tag{1}$$
where x is the original feature value, and N(x) denotes its normalized value.
b) The Z-score method scales each feature according to its mean and standard deviation by the following formula:
$$N(x) = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)} \tag{2}$$

3) NETWORK FEATURES SELECTION
The network connection is described as a vector of network features representing the connection behaviour. The information contribution of these features concerning the connection behaviour label varies [38]. Many features hold little information about the connection behaviour and are denoted as irrelevant features, while others contain redundant information and are denoted as redundant features. Building the detection model on either irrelevant or redundant features causes the overfitting problem in addition to increasing the model complexity [39]. Discarding those features during the model building process improves the model’s classification capabilities [39]. Two feature selection techniques, PCA [40] and CFS [41], are adopted to select the dominant features from the whole network feature set, on which the detection model is built. PCA selects a subset of network features that have the higher eigenvalues. In contrast, CFS selects features with a high correlation with the class/label of the network connection behaviour and low or no correlation with each other.

B. NETWORK OUTLIER DETECTION
1) SVM MODELS FOR NODS
Support Vector Machines (SVMs) are a type of supervised learning algorithm that has been successfully applied to a variety of classification and regression problems. The SVM algorithm is based on the idea of finding a hyperplane that best separates the data points into two distinct classes. The SVM algorithm seeks to maximize the margin between the two classes, thereby obtaining a “maximum-margin hyperplane” [42]. This hyperplane is determined through a process of optimization which minimizes the overall classification error. In SVM models, support vectors form the basis of the decision boundary that separates the two classes and have the maximum influence on the position of the hyperplane [42], [43].
SVM models are applied in NODS because, in these systems, the goal is to identify “outliers”, that is, data points that are significantly different from the data points in the same class or cluster. Outliers can indicate malicious behavior, faulty or malfunctioning nodes, or other anomalies [44]. To detect these outliers, it is necessary to use an algorithm that can distinguish between normal and abnormal data points. SVM models are well-suited for this task because they are capable of finding non-linear boundaries between data points.
SVM is considered a good candidate for building the anomaly-based outlier classification model. It begins with learning the network traffic’s normal/usual/inlier behaviour obtained from the previous preprocessing stage. Afterwards, it builds a model which can recognize both normal and abnormal behaviours of unseen network traffic. Each network connection that differs from the usual behaviour/pattern is treated as an outlier connection.

2) GAUSSIAN NAIVE BAYES (GNB) MODEL FOR NODS
GNB is a popular supervised probabilistic algorithm based on Bayes’ theorem. It is commonly used for text classification and in various machine-learning tasks, including spam filtering, intrusion detection, and sentiment analysis [37].
The key assumption in GNB is that all features are conditionally independent given the class label. In other words, it assumes that the presence or absence of a particular feature does not affect the presence or absence of other features in the same class. This is a strong and often unrealistic assumption, but it allows the algorithm to be computationally efficient and work well with high-dimensional data [45].
GNB is an effective choice for identifying anomalous network activities and potential security threats. By considering the statistical distribution of features related to network traffic, such as packet sizes, response times, and connection
duration, the model can learn patterns of normal behavior. During the testing phase, it can efficiently classify incoming data as either normal or malicious based on the learned probability distributions [46].

3) TUNING SVM AND GNB CONTROL PARAMETERS BY USING GA
Radial Basis Function (RBF) SVMs are becoming increasingly popular for classification, regression, and clustering tasks such as network outlier detection. Wainer et al. noted that the RBF technique is preferred due to its capability to map non-linear data, which allows it to capture complex patterns in the data [41].
SVM uses the RBF as a kernel function during the classification process. The RBF SVM has two parameters: the penalty (c) and the kernel parameter (σ). The former controls the flexibility of the SVM’s hyperplane, while the latter controls the correlation among support vectors of the same hyperplane. These parameters have an observable impact on the SVM classification effectiveness. Thus, it is necessary to tune these parameter values properly, which is considered an optimization problem.
For the GNB, the primary parameter that can be adjusted is the smoothing parameter, which is used to prevent zero probabilities when a particular feature value is not observed in the training data for a given class. The smoothing parameter is a positive value added to all feature occurrences, which helps in handling unseen feature combinations and avoids division by zero in probability calculations [47].
Genetic Algorithms (GAs) have become an increasingly popular tool for optimizing complex systems, including NODS. GAs have been shown to outperform traditional optimization techniques in a variety of applications, from distributed systems to clustering algorithms. GAs also provide efficient and robust solutions for outlier detection, with applications in network intrusion detection, fraud detection, and traffic anomaly detection [48]. Notably, traditional methods of NODS rely on static rules and thresholds, which can be difficult to maintain and may not always be accurate.
GAs offer an alternative approach to NODS, providing a more dynamic and adaptive solution. The basic idea behind GAs is to use evolutionary algorithms to search for the best

details on SVM, GNB and GA techniques are discussed in [45], [50], [51], and [52].

IV. IMPLEMENTATION AND EXPERIMENTAL RESULTS
A. NETWORK INTRUSION DATASET
1) NSL-KDD is a benchmark labelled network traffic dataset used globally by researchers who are interested in the intrusion detection field [53]. It consists of two files, the training set with 127973 network connection instances and the testing set with 22544. Each connection is described by a vector of 42 features, as mentioned in Table 1. All feature values are numeric except for feature numbers (2, 3, 4, 42), which are nominal, as shown in Table 1. The behaviour of each connection is classified as either normal or outlier.
It has 38 varied attack types, where the training set contains 22 types, and the testing set involves the other 16 [39]. Table 3 groups these attacks into four categories as follows:
1. Probe: The intruder aims to obtain varied information concerning the victim host or network by scanning its opened and closed ports, as well as its IP ranges, to launch future attacks.
2. Denial of Service: By using zombies, intruders can flood the target system with huge numbers of network packets. As a consequence, the victim system’s resources, e.g. network bandwidth and processing power, are exhausted and become unreachable for its legitimate users.
3. User to Root: The intruder aims to acquire the root/admin privileges of the victim machine by exploring and exploiting its vulnerabilities.
4. Remote to Local: An intruder who has no account on the host aims to get unauthorized access to it.
2) CICIDS2017 is a benchmark dataset widely used in the field of intrusion detection research [54]. It was created to evaluate the performance of IDS in accurately identifying network attacks and distinguishing them from legitimate network activities. Most of the available network traffic datasets suffer from the absence of traffic diversity, volumes, anonymized packet information payload, constraints on the attacks range, and the lack of the feature set and metadata. Therefore, this dataset was created to address these concerns. It comprises various types of network traffic, including benign/normal traffic and different categories of attacks
solutions to a given problem. In the case of network outlier including Brute Force attack, Web attack, DoS, Infiltration,
detection, this means using GAs to optimize the parameters Botnet, PortScan and DDoS. It consists of 2830540 con-
and thresholds used to detect outliers [49]. GAs are able to nection instances where each is described by a vector of
search through a large and complex search space to identify 79 features as mentioned in Table 4. All network traffic flow
the best parameters for a given problem. In this research, GA classes categorization of the CICIDS2017 dataset are listed
employed to search for the best values of RBF parameters in in Table 5, where all detailed analysis of the CICIDS2017
this research, GA is employed to search for the best values dataset is existed at [55].
of SVM’s RBF and GNB’s smoothing parameters in a given
search space which consists of number of candidates each B. EXPERIMENTAL SETUP
representing possible values for these parameters. Determin- A personal laptop is used to carry the proposed research
ing the appropriate candidate will boost SVM and GNB experiments with 4 GB RAM, Intel core i7 CPU, and Window
detection performance. Further theoretical and technical 10 OS. The setup of these experiments was as follow:

VOLUME 12, 2024 24433


O. Alghushairy et al.: Efficient Support Vector Machine Algorithm Based Network Outlier Detection System

TABLE 2. NSL-KDD dataset network features list.
TABLE 3. All 38 attack types with four classes of NSL-KDD dataset.

• Min-max and Z-score scaler/normalizer techniques are implemented in Python to normalize and rescale the input feature values of network traffic data.
• The Java-based Weka platform is used to implement the feature selection process on network traffic data with two filter techniques, PCA and CFS.
• The Python-based Scikit-learn machine learning library is employed to implement and build the SVM and GNB detection models individually on the network traffic data of the NSL-KDD and CICIDS2017 datasets, and the better-performing of the two is adopted as the detection model for the proposed NODS.
• GA is implemented in Python to tune the RBF control parameters of the SVM model and the smoothing parameter of the GNB model. The model detection accuracy is used as the GA fitness function for evaluating the fitness of each candidate (individual/chromosome) during the GA generation process.
• The number of GA iterations was 100, and the size of the GA population was 300 candidates. Each GA candidate consists of either two random values for the SVM RBF [penalty parameter (c), kernel parameter (σ)] or one random value for the GNB's smoothing parameter.
• The value ranges for the SVM RBF [penalty, kernel] and the GNB smoothing parameter are [0.01:4000, 0.01:100] and [0.01:100], respectively.
• For the NODS implementation, 125973 and 22543 instances from NSL-KDD are used for the training and testing steps, while 120023 and 30006 instances are used from CICIDS2017, respectively.
• The overall performance of the SVM and GNB detection models is evaluated individually on the NSL-KDD and CICIDS2017 datasets with several evaluation metrics, as discussed in the next subsection.

C. PERFORMANCE EVALUATION METRICS
Several metrics are calculated to evaluate the capabilities of the proposed NODS. These metrics are derived from the following confusion matrix:
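In a two-class setting, a confusion matrix holds four counts: TP (outliers detected as outliers), TN (normal connections detected as normal), FP (normal connections flagged as outliers), and FN (outliers passed as normal). As a minimal sketch, assuming labels are encoded as 1 for outlier and 0 for normal (an encoding chosen here for illustration, as are the function names and the toy labels), the counts and the ratio metrics of this subsection can be computed in plain Python:

```python
def confusion_counts(y_true, y_pred, outlier=1):
    """Return (TP, TN, FP, FN), treating `outlier` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == outlier and p == outlier)
    tn = sum(1 for t, p in pairs if t != outlier and p != outlier)
    fp = sum(1 for t, p in pairs if t != outlier and p == outlier)
    fn = sum(1 for t, p in pairs if t == outlier and p != outlier)
    return tp, tn, fp, fn


def detection_metrics(tp, tn, fp, fn):
    """Ratio metrics built from the four confusion-matrix counts."""
    return {
        "DC": (tn + tp) / (fn + fp + tn + tp),  # detection accuracy
        "DR": tp / (tp + fn),                   # detection rate
        "FNR": fn / (fn + tp),                  # false negative rate
        "FPR": fp / (fp + tn),                  # false positive rate
    }


# Toy labels (hypothetical): 4 real outliers followed by 6 normal connections.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = detection_metrics(*counts)
print(counts)
print(metrics)
```

Note that DR and FNR share the same denominator (all real outliers), so they always sum to 1.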


TABLE 4. CICIDS2017 dataset network features list.

All evaluation metrics are detailed as follows [56]:

1. Detection Accuracy (DC): denotes the proportion of correctly classified network connections to all classified connections.
$$DC = \frac{TN + TP}{FN + FP + TN + TP} \tag{3}$$
2. Detection Rate (DR): denotes the ratio of network connections correctly predicted as outliers to all real outlier connections.
$$DR = \frac{TP}{TP + FN} \tag{4}$$
3. False Negative Rate (FNR): denotes the ratio of outlier network connections wrongly identified as normal to all real outlier connections.
$$FNR = \frac{FN}{FN + TP} \tag{5}$$


4. False Positive Rate (FPR): denotes the ratio of normal network connections wrongly identified as outliers to all real normal connections.
$$FPR = \frac{FP}{FP + TN} \tag{6}$$
5. Detection Time (DT): represents the time taken to classify the behaviours of all unseen network connections in the testing file of the dataset.
6. Area Under the Curve (AUC): measures the NODS performance in identifying the normal and outlier classes.

TABLE 5. Network traffic class composition of the CICIDS2017 dataset.

D. EXPERIMENT SCENARIOS AND RESULTS DISCUSSION
The research experiments are conducted by carrying out four scenarios for developing and optimizing the proposed NODS. The first scenario builds the detection system on the original network traffic data of the aforementioned datasets without performing any data preprocessing stages. The second scenario performs only one data preprocessing stage, normalizing the network traffic data by the min-max [-1:+1] and z-score scaler methods before building the detection system. The third scenario applies two data preprocessing stages before building the detection system.
After normalizing the input network traffic data by the best scaler approach determined from the previous scenario, we apply the dimensionality reduction process on the normalized data by selecting the most informative and significant feature subset from the whole feature set. Two filter feature selection techniques, PCA and CFS, are applied individually on the normalized network data before the learning process to determine which selection technique positively affects the NODS detection performance. Finally, the fourth scenario employs GA to tune the SVM's RBF control parameters [c, σ] and the smoothing parameter of the GNB while building the detection model on the network feature subset selected in the previous scenario, and analyzes their impact on the final performance of the proposed NODS. For the GA setup, we noticed that using a large number of individuals/candidates in the GA population provided better genetic variability and faster adaptation. Based on many preliminary experimental tests and trials, we set the number of individuals in the GA population to 300 and the number of generations to 100.
Concerning the first scenario, the SVM and GNB detection models built on both the NSL-KDD and CICIDS2017 datasets are entirely ineffective according to their evaluation results shown in Tables 6 and 7. Because the input network data is of low quality and not preprocessed, the detection models underfit heavily. Therefore, both detection models' accuracy and detection rates in recognizing the network traffic were very low, and they required a long time to classify the traffic behaviour. As a result, the network admin will be confused by the high false alarm rates because much intrusive network traffic is recognized as normal.

TABLE 6. The NODS performance evaluation of the first scenario on NSL-KDD dataset.
TABLE 7. The proposed NODS performance evaluation of the first scenario on CICIDS2017 dataset.

For the second scenario, the min-max [−1:1] and z-score normalization techniques are applied as a data preprocessing task to rescale and normalize the input network feature values before the detection model training process. This helps in preventing the occurrence of bias in the detection


model toward the network features with high scale values, a problem that always negatively affects the model performance. As shown in Tables 8 and 9 and Figures 2 and 3, the performance of both the SVM and GNB detection models after applying the min-max [−1:1] and z-score methods was better than the performance of the first scenario detection model. The results confirm the importance of applying the normalization task during the data preprocessing stage before the learning process starts. Regarding the impact of the two normalization approaches on the performance of the SVM and GNB detection models, z-score outperformed the min-max [-1:1] scaler method on the models built on the network traffic data of the NSL-KDD dataset, whereas the opposite held on the CICIDS2017 dataset. So, applying the normalization task helps in overcoming the model biasing and underfitting problems and therefore optimizes the NODS capabilities to be more effective and faster. In addition, the detection model misclassification rates, represented by either false negative or false positive alarms, became much lower than the first scenario results.

TABLE 8. The proposed NODS performance evaluation of the second scenario on NSL-KDD dataset.
TABLE 9. The proposed NODS performance evaluation of the second scenario on CICIDS2017 dataset.
FIGURE 2. NODS performance on the second scenario for NSL-KDD using the Min-max, and Z-score.
FIGURE 3. NODS performance on the second scenario for CICIDS2017 using the Min-max, and Z-score.

Regarding the third scenario, the dimensionality reduction task is applied on the best-normalized network traffic features of both the NSL-KDD and CICIDS2017 data resulting from the previous scenario. Two common feature selection techniques, PCA and CFS, are applied individually on the normalized data before the SVM and GNB detection models' learning process, to assess their impact on the overall detection capabilities of the used models.
The selected feature subsets from both the z-score-based NSL-KDD and the min-max [-1:1] based CICIDS2017 are tabulated with their indices in Tables 10 and 11. Both SVM and GNB detection models are built on these selected feature subsets and their performance is evaluated. The results in Tables 12 and 13 show that the PCA technique outperformed CFS in selecting the most relevant and informative features from both datasets. Consequently, it contributed significantly to decreasing the SVM and GNB detection models' learning time and complexity and to mitigating the overfitting risk. Furthermore, it accelerated the detection models' time and improved their effectiveness in analyzing the input network traffic behaviours compared with the second scenario results.
Regarding the fourth scenario, the GA is used to tune the RBF control parameters [c, σ] of the SVM and the smoothing parameter of the GNB during their learning process on


TABLE 10. Selected features subset by the CFS, and PCA techniques.
TABLE 11. Selected CICIDS2017's features subset by the CFS, and PCA techniques.
TABLE 12. The proposed NODS performance evaluation of the third scenario on NSL-KDD dataset.
TABLE 13. The proposed NODS performance evaluation of the third scenario on CICIDS2017 dataset.
TABLE 14. The proposed NODS performance evaluation of the fourth scenario on NSL-KDD dataset.
TABLE 15. The proposed NODS performance evaluation of the fourth scenario on CICIDS2017 dataset.

the previous PCA-based selected network features of the used datasets from the last scenario. The results in Tables 14 and 15 show that adjusting the hyperparameters of the two detection models boosted their generalization ability and convergence speed, which optimized the overall performance of the SVM and GNB models.
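The GA-based tuning described for the fourth scenario can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a synthetic dataset, a much smaller population and generation count than the 300 and 100 reported, truncation selection, uniform crossover, multiplicative mutation, and log-uniform sampling (all choices made here for brevity). It also searches scikit-learn's `gamma`, which parameterizes the RBF kernel as exp(−gamma·‖x − x′‖²), rather than σ directly. The helper names (`tune_svm`, `fitness`, and so on) are hypothetical.

```python
import math
import random

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

C_RANGE = (0.01, 4000.0)     # penalty range quoted in the experimental setup
GAMMA_RANGE = (0.01, 100.0)  # kernel-parameter range quoted in the setup


def clip(value, bounds):
    return min(max(value, bounds[0]), bounds[1])


def log_uniform(rng, bounds):
    # Assumption: sample on a log scale so small values are explored too.
    lo, hi = bounds
    return clip(10 ** rng.uniform(math.log10(lo), math.log10(hi)), bounds)


def fitness(candidate, X, y):
    # Fitness = mean cross-validated accuracy of an RBF SVM with (C, gamma).
    c, gamma = candidate
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=c, gamma=gamma))
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return tuple(a[i] if rng.random() < 0.5 else b[i] for i in range(2))


def mutate(candidate, rng, rate=0.3):
    # Multiplicative mutation, clipped back into the allowed ranges.
    c, gamma = candidate
    if rng.random() < rate:
        c = clip(c * rng.uniform(0.5, 2.0), C_RANGE)
    if rng.random() < rate:
        gamma = clip(gamma * rng.uniform(0.5, 2.0), GAMMA_RANGE)
    return (c, gamma)


def tune_svm(X, y, pop_size=8, generations=5, seed=0):
    rng = random.Random(seed)
    population = [(log_uniform(rng, C_RANGE), log_uniform(rng, GAMMA_RANGE))
                  for _ in range(pop_size)]
    best, best_score = None, -1.0
    for _ in range(generations):
        scored = sorted(((fitness(ind, X, y), ind) for ind in population),
                        key=lambda item: item[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        parents = [ind for _, ind in scored[: pop_size // 2]]  # keep top half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return best, best_score


# Synthetic stand-in for the preprocessed traffic features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
best, best_score = tune_svm(X, y)
print("best (C, gamma):", best, "cv accuracy:", round(float(best_score), 3))
```

The same loop would apply to the GNB by replacing the two-gene candidate with a single smoothing value; in scikit-learn's `GaussianNB` the corresponding argument is `var_smoothing`.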
Regarding the comparison between the four successive scenarios, the NODS models of the fourth scenario (PCA-GA-SVM and PCA-GA-GNB) are the best among all NODS scenarios in detecting the normal and abnormal behaviours of the network traffic connections of the used datasets.
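To make the progression between the first three scenarios concrete, the sketch below builds one pipeline per preprocessing variant and scores each on held-out data. It is an illustration only, under several assumptions: the data is synthetic, scikit-learn's `PCA` stands in for the Weka-based feature selection used in the experiments, CFS is omitted, and the number of retained components (10) is arbitrary. The resulting scores therefore say nothing about the results reported in the tables.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for network traffic features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X[:, :5] *= 1000.0  # exaggerate the scale of a few features on purpose

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)


def scenario_pipelines(make_model):
    """One pipeline per preprocessing variant; make_model builds a fresh classifier."""
    return {
        "1: raw": make_pipeline(make_model()),
        "2: min-max [-1, 1]": make_pipeline(
            MinMaxScaler(feature_range=(-1, 1)), make_model()),
        "2: z-score": make_pipeline(StandardScaler(), make_model()),
        "3: z-score + PCA": make_pipeline(
            StandardScaler(), PCA(n_components=10), make_model()),
    }


models = {"SVM": lambda: SVC(kernel="rbf"), "GNB": GaussianNB}

results = {}
for model_name, make_model in models.items():
    for scenario, pipeline in scenario_pipelines(make_model).items():
        pipeline.fit(X_train, y_train)
        results[(model_name, scenario)] = pipeline.score(X_test, y_test)

for key, accuracy in results.items():
    print(key, round(accuracy, 3))
```

Wrapping the scaler and PCA inside the pipeline keeps them fitted on the training split only, so no statistics from the test split leak into preprocessing.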
For a comparison with other related detection systems, as shown in Table 16, the evaluation results show the superiority of our proposed system, with lower false alarms and higher detection accuracy.


TABLE 16. The Proposed NODS evaluation performance comparison with other related work.

V. CONCLUSION
An outlier detection system is proposed to identify the normal and abnormal network traffic. The SVM and GNB classification algorithms are employed to classify the behaviours of incoming network connections that affect a network of computers. They are built and evaluated on the NSL-KDD and CICIDS2017 network traffic datasets. Data mining preprocessing stages for network flow data, besides tuning the SVM's RBF control parameters and GNB's smoothing parameter, were vital for improving the overall effectiveness of the proposed NODS. The performance of the proposed system is compared with other related IDSs, and the evaluation results show the superiority of the proposed SVM-NODS in detecting the different intrusions. In our future work, we will explore and implement other strategies for boosting the detection system capabilities and also investigate several trending deep learning models in building the proposed detection model.

ACKNOWLEDGMENT
The authors would like to thank the University of Jeddah for its technical support.

REFERENCES
[1] J. Jang-Jaccard and S. Nepal, “A survey of emerging threats in cybersecurity,” J. Comput. Syst. Sci., vol. 80, no. 5, pp. 973–993, Aug. 2014.
[2] P. Schaik, J. Jansen, J. Onibokun, J. Camp, and P. Kusev, “Security and privacy in online social networking: Risk perceptions and precautionary behaviour,” Comput. Hum. Behav., vol. 78, pp. 283–297, Jan. 2018.
[3] K. Singh and S. Upadhyaya, “Outlier detection: Applications and techniques,” J. Netw. Comput. Appl., vol. 9, p. 307, Jan. 2012.
[4] J. Branch, C. Giannella, B. Szymanski, E. Wolff, and H. Kargupta, “In-network outlier detection in wireless sensor networks,” Knowl. Inf. Syst., vol. 34, pp. 23–54, Jan. 2013.
[5] A. Khraisat, I. Gondal, P. Vamplew, and J. Kamruzzaman, “Survey of intrusion detection systems: Techniques, datasets and challenges,” Cybersecurity, vol. 2, no. 1, pp. 19–38, Dec. 2019.
[6] J. Zhang and M. Zulkernine, “Anomaly based network intrusion detection with unsupervised outlier detection,” in Proc. IEEE Int. Conf. Commun., Jun. 2006, pp. 2388–2393.
[7] P. Gogoi, D. K. Bhattacharyya, B. Borah, and J. K. Kalita, “A survey of outlier detection methods in network anomaly identification,” Comput. J., vol. 54, no. 4, pp. 570–588, Apr. 2011.
[8] S. Suthaharan, “Support vector machine,” in Machine Learning Models and Algorithms for Big Data Classification (Integrated Series in Information Systems), vol. 36. Berlin, Germany: Springer, 2016, pp. 207–235.
[9] L. Dhanabal and S. P. Shantharajah, “A study on NSL-KDD dataset for intrusion detection system based on classification algorithms,” Int. J. Adv. Res. Comput. Commun. Eng., vol. 4, no. 6, pp. 446–452, 2015.
[10] A. Gaddam, T. Wilkin, M. Angelova, and J. Gaddam, “Detecting sensor faults, anomalies and outliers in the Internet of Things: A survey on the challenges and solutions,” Electronics, vol. 9, no. 3, p. 511, 2020, doi: 10.3390/electronics9030511.
[11] L. Wawrowski, M. Michalak, A. Białas, R. Kurianowicz, M. Sikora, M. Uchroński, and A. Kajzer, “Detecting anomalies and attacks in network traffic monitoring with classification methods and XAI-based explainability,” Proc. Comput. Sci., vol. 192, pp. 2259–2268, Jan. 2021.
[12] X. Chen, H. Kim, J. M. Aman, W. Chang, M. Lee, and J. Rexford, “Measuring TCP round-trip time in the data plane,” in Proc. Workshop Secure Program. Netw. Infrastructure, Aug. 2020, pp. 35–41.
[13] J. Li, Y. Deng, W. Sun, W. Li, R. Li, Q. Li, and Z. Liu, “Resource orchestration of cloud-edge-based smart grid fault detection,” ACM Trans. Sensor Netw., vol. 18, no. 3, pp. 1–26, Aug. 2022, doi: 10.1145/3529509.
[14] S. S. Chakkaravarthy, D. Sangeetha, and V. Vaidehi, “A survey on malware analysis and mitigation techniques,” Comput. Sci. Rev., vol. 32, pp. 1–23, May 2019.
[15] M. Safaei Pour, C. Nader, K. Friday, and E. Bou-Harb, “A comprehensive survey of recent internet measurement techniques for cyber security,” Comput. Secur., vol. 128, May 2023, Art. no. 103123.
[16] H. Lu, Y. Zhu, M. Yin, G. Yin, and L. Xie, “Multimodal fusion convolutional neural network with cross-attention mechanism for internal defect detection of magnetic tile,” IEEE Access, vol. 10, pp. 60876–60886, 2022.
[17] S. Wang, X. Hu, J. Sun, and J. Liu, “Hyperspectral anomaly detection using ensemble and robust collaborative representation,” Inf. Sci., vol. 624, pp. 748–760, May 2023.
[18] Z. Wu, J. Cao, Y. Wang, Y. Wang, L. Zhang, and J. Wu, “HPSD: A hybrid PU-learning-based spammer detection model for product reviews,” IEEE Trans. Cybern., vol. 50, no. 4, pp. 1595–1606, Apr. 2020, doi: 10.1109/TCYB.2018.2877161.
[19] Y. Yao, J. Zhao, Z. Li, X. Cheng, and L. Wu, “Jamming and eavesdropping defense scheme based on deep reinforcement learning in autonomous vehicle networks,” IEEE Trans. Inf. Forensics Security, vol. 18, pp. 1211–1224, 2023, doi: 10.1109/TIFS.2023.3236788.
[20] X. Xie, L. Huang, S. M. Marson, and G. Wei, “Emergency response process for sudden rainstorm and flooding: Scenario deduction and Bayesian network analysis using evidence theory and knowledge meta-theory,” Natural Hazards, vol. 117, no. 3, pp. 3307–3329, Jul. 2023, doi: 10.1007/s11069-023-05988-x.
[21] B. Cao, J. Zhao, Y. Gu, Y. Ling, and X. Ma, “Applying graph-based differential grouping for multiobjective large-scale optimization,” Swarm Evol. Comput., vol. 53, Mar. 2020, Art. no. 100626, doi: 10.1016/j.swevo.2019.100626.
[22] M. A. Manzoor and Y. Morgan, “Network intrusion detection system using apache storm,” Adv. Sci., Technol. Eng. Syst. J., vol. 2, no. 3, pp. 812–818, Jun. 2017.
[23] S.-J. Han and S.-B. Cho, “Evolutionary neural networks for anomaly detection based on the behavior of a program,” IEEE Trans. Syst., Man, Cybern., B, vol. 36, no. 3, pp. 559–570, Jun. 2006.
[24] X. Wang, C. Zhang, and K. Zheng, “Intrusion detection algorithm based on density, cluster centers, and nearest neighbors,” China Commun., vol. 13, no. 7, pp. 24–31, Jul. 2016.
[25] K. S. Desale, C. N. Kumathekar, and A. P. Chavan, “Efficient intrusion detection system using stream data mining classification technique,” in Proc. Int. Conf. Comput. Commun. Control Autom., Feb. 2015, pp. 469–473.
[26] J. Zhang, H. Li, Q. Gao, H. Wang, and Y. Luo, “Detecting anomalies from big network traffic data using an adaptive detection approach,” Inf. Sci., vol. 318, pp. 91–110, Oct. 2015.
[27] D. Stiawan, M. Y. B. Idris, A. M. Bamhdi, and R. Budiarto, “CICIDS-2017 dataset feature analysis with information gain for anomaly detection,” IEEE Access, vol. 8, pp. 132911–132921, 2020.
[28] P. Jairu and A. B. Mailewa, “Network anomaly uncovering on CICIDS-2017 dataset: A supervised artificial intelligence approach,” in Proc. IEEE Int. Conf. Electro Inf. Technol. (eIT), Mankato, MN, USA, May 2022, pp. 606–615.
[29] S. Patil, V. Varadarajan, S. M. Mazhar, A. Sahibzada, N. Ahmed, O. Sinha, S. Kumar, K. Shaw, and K. Kotecha, “Explainable artificial intelligence for intrusion detection system,” Electronics, vol. 11, no. 19, p. 3079, 2022.


[30] O. Elnakib, E. Shaaban, M. Mahmoud, and K. Emara, “EIDM: Deep learning model for IoT intrusion detection systems,” J. Supercomput., vol. 79, no. 12, pp. 13241–13261, Aug. 2023.
[31] Y. Hou, S. G. Teo, Z. Chen, M. Wu, C.-K. Kwoh, and T. Truong-Huu, “Handling labeled data insufficiency: Semi-supervised learning with self-training mixup decision tree for classification of network attacking traffic,” IEEE Trans. Dependable Secure Comput., early access, Aug. 1, 2022, doi: 10.1109/TDSC.2022.3195534.
[32] A. Smiti, “A critical overview of outlier detection methods,” Comput. Sci. Rev., vol. 38, Nov. 2020, Art. no. 100306.
[33] B. Deka, “Pattern recognition and machine intelligence,” in Proc. 8th Int. Conf., Tezpur, India. Cham, Switzerland: Springer, Dec. 2019, pp. 56–64.
[34] R. Aliakbarisani, A. Ghasemi, and S. Felix Wu, “A data-driven metric learning-based scheme for unsupervised network anomaly detection,” Comput. Electr. Eng., vol. 73, pp. 71–83, Jan. 2019, doi: 10.1016/j.compeleceng.2018.11.003.
[35] W. Bul’ajoul, A. James, and M. Pannu, “Improving network intrusion detection system performance through quality of service configuration and parallel technology,” J. Comput. Syst. Sci., vol. 81, no. 6, pp. 981–999, Sep. 2015.
[36] P. V. Alvarado, “Design of a traffic generation platform for offline evaluation of NIDS,” in Proc. 8th Int. Conf. Adv. Comput. Control Netw. ACCN, Jun. 2018, pp. 11–15.
[37] I. Sumaiya Thaseen and C. Aswani Kumar, “Intrusion detection model using fusion of chi-square feature selection and multi class SVM,” J. King Saud Univ.-Comput. Inf. Sci., vol. 29, no. 4, pp. 462–472, Oct. 2017.
[38] S. S. Sathiyadhas and M. C. V. Soosai Antony, “A network intrusion detection system in cloud computing environment using dragonfly improved invasive weed optimization integrated shepard convolutional neural network,” Int. J. Adapt. Control Signal Process., vol. 36, no. 5, pp. 1060–1076, May 2022.
[39] S. Panwar and Y. Raiwani, “Data reduction techniques to analyze NSL-KDD dataset,” Int. J. Comput. Eng. Technol, vol. 5, no. 10, pp. 21–31, 2017.
[40] K. Keerthi Vasan and B. Surendiran, “Dimensionality reduction using principal component analysis for network intrusion detection,” Perspect. Sci., vol. 8, pp. 510–512, Sep. 2016.
[41] M. A. Hall and L. A. Smith, “Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper,” in Proc. 12th Int. FLAIRS Conf., 1999, pp. 235–239.
[42] M. Hosseinzadeh, A. M. Rahmani, B. Vo, M. Bidaki, M. Masdari, and M. Zangakani, “Improving security using SVM-based anomaly detection: Issues and challenges,” Soft Comput., vol. 25, no. 4, pp. 3195–3223, 3195.
[43] S. A. Ajila and A. A. Bankole, “Using machine learning algorithms for cloud client prediction models in a web VM resource provisioning environment,” Trans. Mach. Learn. Artif. Intell., vol. 4, no. 1, p. 28, Feb. 2016, doi: 10.14738/tmlai.41.1690.
[44] R. Duo, X. Nie, N. Yang, C. Yue, and Y. Wang, “Anomaly detection and attack classification for train real-time Ethernet,” IEEE Access, vol. 9, pp. 22528–22541, 2021, doi: 10.1109/access.2021.3055209.
[45] B. Zhang, Z. Liu, Y. Jia, J. Ren, and X. Zhao, “Network intrusion detection method based on PCA and Bayes algorithm,” Secur. Commun. Netw., vol. 2018, pp. 1–11, Nov. 2018.
[46] K. Bong and J. Kim, “Analysis of intrusion detection performance by smoothing factor of Gaussian NB model using modified NSL-KDD dataset,” in Proc. 13th Int. Conf. Inf. Commun. Technol. Converg. (ICTC), 2022, pp. 1471–1476.
[47] J. Wainer and P. Fonseca, “How to tune the RBF SVM hyperparameters? An empirical evaluation of 18 search algorithms,” Artif. Intell. Rev., vol. 54, no. 6, pp. 4771–4797, May 2021, doi: 10.1007/s10462-021-10011-5.
[48] A. H. Hamamoto, L. F. Carvalho, L. D. H. Sampaio, T. Abrão, and M. L. Proença, “Network anomaly detection system using genetic algorithm and fuzzy logic,” Exp. Syst. Appl., vol. 92, pp. 390–402, Feb. 2018, doi: 10.1016/j.eswa.2017.09.013.
[49] Z. Chiba, “New anomaly network intrusion detection system in cloud environment based on optimized back propagation neural network using improved genetic algorithm,” Int. J. Commun. Netw. Inf. Secur. (IJCNIS), vol. 11, no. 1, pp. 61–84, Apr. 2022, doi: 10.17762/ijcnis.v11i1.3764.
[50] J. Cervantes, F. Garcia-Lamont, L. Rodríguez-Mazahua, and A. Lopez, “A comprehensive survey on support vector machine classification: Applications, challenges and trends,” Neurocomputing, vol. 408, pp. 189–215, Sep. 2020.
[51] M. Tabassum, “A genetic algorithm analysis towards optimization solutions,” Int. J. Digit. Inf. Wireless Commun., vol. 4, no. 1, pp. 124–142, 2014.
[52] T. Alam, S. Qamar, A. Dixit, and M. Benaida, “Genetic algorithm: Reviews, implementations, and applications,” Int. J. Eng. Pedagogy (iJEP), vol. 10, no. 6, p. 57, Dec. 2020.
[53] (2009). NSL-KDD Dataset for Network-based Intrusion Detection Systems. [Online]. Available: http://nsl.cs.unb.ca/KDD/NSLKDD.html
[54] S. S. Panwar, P. S. Negi, and Y. P. Raiwani, “Implementation of machine learning algorithms on CICIDS-2017 dataset for intrusion detection using WEKA,” Int. J. Recent Technol. Eng. (IJRTE), vol. 8, no. 3, pp. 2195–2207, Sep. 2019.
[55] R. Panigrahi and S. Borah, “A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems,” Int. J. Eng. Technol., vol. 7, no. 3, pp. 479–482, 2018.
[56] M. E. Elhamahmy, H. N. Elmahdy, and I. A. Saroit, “A new approach for evaluating intrusion detection system,” Artif. Intell. Syst. Mach. Learn., vol. 2, pp. 290–298, Nov. 2010.
[57] P. Kar, S. Banerjee, K. C. Mondal, G. Mahapatra, and S. Chattopadhyay, “A hybrid intrusion detection system for hierarchical filtration of anomalies,” in Information and Communication Technology for Intelligent Systems. Berlin, Germany: Springer, 2019, pp. 417–426.
[58] Y.-F. Hsu, Z. He, Y. Tarutani, and M. Matsuoka, “Toward an online network intrusion detection system based on ensemble learning,” in Proc. IEEE 12th Int. Conf. Cloud Comput. (CLOUD), Jul. 2019, pp. 174–178.
[59] S. Sarvari, N. F. Mohd Sani, Z. Mohd Hanapi, and M. T. Abdullah, “An efficient anomaly intrusion detection method with feature selection and evolutionary neural network,” IEEE Access, vol. 8, pp. 70651–70663, 2020.
[60] A. Alsaleh and W. Binsaeedan, “The influence of salp swarm algorithm-based feature selection on network anomaly intrusion detection,” IEEE Access, vol. 9, pp. 112466–112477, 2021.
[61] L. Almuqren, M. S. Maashi, M. Alamgeer, H. Mohsen, M. A. Hamza, and A. A. Abdelmageed, “Explainable artificial intelligence enabled intrusion detection technique for secure cyber-physical systems,” Appl. Sci., vol. 13, no. 5, p. 3081, Feb. 2023.
[62] Y. N. Rao and K. S. Babu, “An imbalanced generative adversarial network-based approach for network intrusion detection in an imbalanced dataset,” Sensors, vol. 23, no. 1, p. 550, Jan. 2023.

OMAR ALGHUSHAIRY received the bachelor's degree in information systems from King Abdulaziz University, Saudi Arabia, the master's degree in computer science from the University of Bridgeport, USA, and the Ph.D. degree in computer science from the University of Idaho, USA. He is currently an Assistant Professor in data science and AI with the College of Computer Science and Engineering, University of Jeddah, Saudi Arabia. His research interests include outlier detection, data stream mining, machine learning, AI, and evolutionary computation.

RAED ALSINI received the M.Sc. degree in computer science from California State University, Fullerton, USA, in 2016, and the Ph.D. degree in computer science from the University of Idaho, USA, in 2021. He is currently an Assistant Professor with the Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University. His research interests include big data, anomaly detection, streaming data, artificial intelligence, and cybersecurity.


ZAKHRIYA ALHASSAN received the Ph.D. degree from the College of Computer Sciences, Durham University, Durham, U.K., in 2021. He is currently an Assistant Professor with the College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia. Before joining Durham, he was with Saudi Aramco, General Electric (GE) and Saudi Electricity Company (SEC), Saudi Arabia, in the area of business intelligence. His current research interests include business intelligence, medical data mining, clinical informatics, machine learning, and artificial networks.

ABDULRAHMAN A. ALSHDADI received the Ph.D. degree in cloud computing from the University of Southampton, Southampton, U.K., in February 2018. He is currently an Associate Professor in computer science with the College of Computer Science and Engineering, University of Jeddah. His research interests include industry 4.0 prestaining issues of cloud computing and fog computing security, the Internet of Things (IoT) and smart cities, intelligent systems, deep learning, data science analytics, and modelling. He has published numerous conference papers, journal articles, and one book chapter.

AMEEN BANJAR received the Ph.D. degree in distributed network functions virtualization from the University of Technology, Sydney, Australia, in November 2016. He is currently an Associate Professor in information technology telecommunication with the College of Computer Science and Engineering (CCSE), University of Jeddah. His research interests include industry 4.0, involving intelligent digital technology, machine learning and deep learning to create a more holistic and better-connected ecosystem of companies focusing on manufacturing, data science analytics, and modelling. He has published numerous conference papers, journal articles, and book chapters.

AYMAN YAFOZ received the M.Sc. degree in web technology from the University of Southampton, U.K., in 2015, and the Ph.D. degree in computer science from the University of Regina, Canada, in 2021. He is currently an Assistant Professor with the Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University. His research interests include natural language processing and data science.

XIAOGANG (MARSHALL) MA received the Ph.D. degree in earth systems science and GIScience from the University of Twente, The Netherlands, in 2011. He is currently an Associate Professor in computer science with the University of Idaho. Then, he completed his postdoctoral training in data science with the Rensselaer Polytechnic Institute. His research interests include deploying data science in the semantic web to support cross-disciplinary collaboration and scientific discovery, with broad interests in complex systems in earth and environmental sciences, data interoperability and provenance, and visualized exploratory analysis of big and small data. He was one of the four invited early-career panelists at the 2016 International Data Week. He is active in international societies of data science and geoinformatics, including ACM SIGWEB, CODATA, ESIP, RDA, GSA, AGU, and IAMG. He received the Science of Team Science (SciTS) Meritorious Contribution Award, in 2018, the IAMG A.B. Vistelius Research Award, in 2015, and the inaugural ICSU-WDS Data Stewardship Award, in 2014.
