Intrusion Detection - Full Doc Java

Uploaded by jefferjam716
Copyright © All Rights Reserved

TABLE OF CONTENTS

CHAPTER NO.  TITLE

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 FEATURE SELECTION
1.2 FEATURE ENGINEERING
1.3 CLASSIFICATION
1.4 MACHINE LEARNING
1.5 ENSEMBLE LEARNING
1.6 ANOMALY DETECTION
1.7 INTRUSION DETECTION SYSTEM
1.8 OBJECTIVES
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.1.1 DRAWBACKS
3.2 PROPOSED SYSTEM
3.2.1 ADVANTAGES
3.3 FEASIBILITY STUDY
3.3.1 TECHNICAL FEASIBILITY
3.3.2 OPERATIONAL FEASIBILITY
3.3.3 ECONOMICAL FEASIBILITY
4 SYSTEM SPECIFICATION
4.1 HARDWARE CONFIGURATION
4.2 SOFTWARE SPECIFICATION
5 SOFTWARE DESCRIPTION
5.1 FRONT END
6 PROJECT DESCRIPTION
6.1 PROBLEM DEFINITION
6.2 MODULE DESCRIPTION
6.3 SYSTEM FLOW DIAGRAM
6.4 INPUT DESIGN
6.5 OUTPUT DESIGN

7 SYSTEM TESTING AND IMPLEMENTATION
7.1 SYSTEM TESTING
7.2 SYSTEM IMPLEMENTATION
8 SYSTEM MAINTENANCE
8.1 CORRECTIVE MAINTENANCE
8.2 ADAPTIVE MAINTENANCE
8.3 PERFECTIVE MAINTENANCE
9 CONCLUSION AND FUTURE ENHANCEMENT
10 APPENDICES
10.1 SOURCE CODE
10.2 SCREEN SHOTS

11 REFERENCES

LIST OF FIGURES

FIGURE NO.  FIGURE NAME
1  FEATURE SELECTION
2  FEATURE ENGINEERING
3  SYSTEM FLOW DIAGRAM
4  COMPARISON GRAPH

LIST OF ABBREVIATIONS

ABBREVIATION  EXPANSION
NIDS  NETWORK INTRUSION DETECTION SYSTEMS
DSL  DIGITAL SUBSCRIBER LINE
TDM  TIME DIVISION MULTIPLEXING
EPON  ETHERNET PASSIVE OPTICAL NETWORK
NG-PON2  NEXT-GENERATION PASSIVE OPTICAL NETWORK STAGE 2
WDM  WAVELENGTH DIVISION MULTIPLEXING
WLAN  WIRELESS LOCAL AREA NETWORK
IOT  INTERNET OF THINGS
D-FES  DEEP-FEATURE EXTRACTION AND SELECTION
AWID  AEGEAN WI-FI INTRUSION DATASET
B2B  BUSINESS-TO-BUSINESS
GB  GRADIENT BOOSTING
RF  RANDOM FOREST
CNN  CONVOLUTIONAL NEURAL NETWORK

CHAPTER 1

1. INTRODUCTION

In the digital age, the security of computer networks and data has become paramount. With the increasing sophistication of cyber threats and the interconnectedness of our systems, the need for robust network intrusion detection systems (NIDS) has never been greater. Intrusion detection plays a pivotal role in safeguarding organizations, detecting unauthorized access, and mitigating potential threats to information systems. Traditional intrusion detection methods often struggle to adapt to the ever-evolving threat landscape. To address these challenges and improve the efficacy of intrusion detection, we propose a novel approach, "Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection." This work combines techniques from machine learning, data science, and cybersecurity: by integrating ensemble learning and automatic feature selection into a two-phased detection system, we aim to improve both the accuracy and the efficiency of network intrusion detection.

1.1 FEATURE SELECTION

In the ever-expanding digital ecosystem, the security of networks and information
systems has become a paramount concern. The proliferation of cyber threats, from
sophisticated malware to advanced persistent threats, necessitates the constant
evolution of network intrusion detection systems (NIDS) to safeguard against
unauthorized access and malicious activities. At the heart of effective NIDS lies the
selection of the most pertinent data attributes, commonly referred to as "features."
Feature selection is a critical process within the field of machine learning and data
analysis, with the primary goal of identifying and retaining the most informative
attributes while discarding irrelevant or redundant ones. In the context of network
intrusion detection, the judicious selection of features is pivotal in enhancing both
the efficiency and accuracy of the detection process. The motivation behind feature
selection in network intrusion detection is rooted in the quest for improved detection
capabilities and resource optimization. Conventional NIDS often confront
challenges posed by high-dimensional data and noisy features. These challenges can
lead to increased computational demands, suboptimal detection rates, and
heightened susceptibility to false alarms. As such, there exists a compelling need to
streamline the feature space, preserving only those attributes that significantly
contribute to the detection of intrusions, while discarding those that introduce noise
or computational overhead.

Figure 1 FEATURE SELECTION
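To make the filter idea above concrete, the following Java sketch ranks feature columns by the absolute Pearson correlation between each column and a binary label, then keeps the top k. It is a minimal sketch under stated assumptions, not this project's selection method: the class name `CorrelationFeatureSelector`, the 0/1 label encoding (1 = attack), and the toy data are hypothetical, and correlation is only one of several possible filter criteria.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Filter-style feature selection sketch (illustrative assumption, not the project's method):
 * score every feature column by its absolute Pearson correlation with a 0/1 label
 * (1 = attack, 0 = normal) and keep the k best-scoring columns.
 */
public class CorrelationFeatureSelector {

    /** Absolute Pearson correlation between column j of x and the label vector y. */
    static double absCorrelation(double[][] x, int[] y, int j) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i][j];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0;
        double varX = 0;
        double varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i][j] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // A constant column (or constant label) carries no signal; score it 0 instead of dividing by 0.
        if (varX == 0 || varY == 0) {
            return 0.0;
        }
        return Math.abs(cov / Math.sqrt(varX * varY));
    }

    /** Returns the indices of the k highest-scoring feature columns, best first. */
    static int[] selectTopK(double[][] x, int[] y, int k) {
        int d = x[0].length;
        double[] scores = new double[d];
        Integer[] order = new Integer[d];
        for (int j = 0; j < d; j++) {
            scores[j] = absCorrelation(x, y, j);
            order[j] = j;
        }
        // Sort column indices by descending score.
        Arrays.sort(order, Comparator.comparingDouble((Integer c) -> scores[c]).reversed());
        int[] selected = new int[Math.min(k, d)];
        for (int i = 0; i < selected.length; i++) {
            selected[i] = order[i];
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: column 0 tracks the label, column 1 is weakly related, column 2 is constant.
        double[][] x = {
            {0.1, 5.0, 1.0},
            {0.2, 3.0, 1.0},
            {0.9, 4.0, 1.0},
            {0.8, 6.0, 1.0}
        };
        int[] y = {0, 0, 1, 1};
        System.out.println(Arrays.toString(selectTopK(x, y, 2))); // [0, 1]
    }
}
```

A linear correlation score can miss features that matter only non-linearly or in combination with others, which is one reason wrapper and embedded selection methods are also used.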

1.2 FEATURE ENGINEERING

In the realm of machine learning, the quality of data is often paramount to the success
of predictive models and data-driven applications. While machine learning
algorithms can work wonders when presented with vast datasets, the art of "feature
engineering" has emerged as an indispensable process to transform raw data into a
more informative and efficient format. Feature engineering is a craft, akin to
sculpting a raw material into a masterpiece, where the raw material comprises the
data and the masterpiece is an accurate and powerful predictive model. The
motivation behind feature engineering lies in the inherent limitations and
idiosyncrasies of raw data. In many real-world applications, data is messy,
incomplete, and often contains extraneous information. Furthermore, not all data
attributes are equally relevant to the task at hand. Feature engineering seeks to
address these challenges by meticulously crafting new features or transforming
existing ones to better capture the underlying patterns and relationships in the data.

Figure 2 FEATURE ENGINEERING
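Two routine feature-engineering transformations are one-hot encoding of a categorical attribute and min-max scaling of a numeric one. The Java sketch below shows both. The protocol vocabulary `tcp`/`udp`/`icmp`, the byte-count column, and the class name are illustrative assumptions, not taken from this project's code.

```java
import java.util.Arrays;
import java.util.List;

/** Feature engineering sketch: one-hot encoding plus min-max scaling (illustrative only). */
public class FeatureEngineeringSketch {

    /** One-hot encodes a categorical value against a fixed vocabulary; unseen values map to all zeros. */
    static double[] oneHot(String value, List<String> vocabulary) {
        double[] encoded = new double[vocabulary.size()];
        int index = vocabulary.indexOf(value);
        if (index >= 0) {
            encoded[index] = 1.0;
        }
        return encoded;
    }

    /** Rescales a numeric column into [0, 1]; a constant column becomes all zeros. */
    static double[] minMaxScale(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        double[] scaled = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            scaled[i] = range == 0 ? 0.0 : (column[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary for a protocol-type attribute.
        List<String> protocols = List.of("tcp", "udp", "icmp");
        System.out.println(Arrays.toString(oneHot("udp", protocols))); // [0.0, 1.0, 0.0]
        // Hypothetical byte counts for three connections.
        double[] srcBytes = {0, 250, 1000};
        System.out.println(Arrays.toString(minMaxScale(srcBytes)));    // [0.0, 0.25, 1.0]
    }
}
```

In practice the vocabulary and the min/max values should be learned on the training split only and reused unchanged on test data; otherwise information from the test set leaks into the model.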

1.3 CLASSIFICATION

In the vast landscape of machine learning, the task of classification stands as one of
the most fundamental and ubiquitous endeavors. At its core, classification is about
giving an algorithm the ability to make sense of the world by assigning items
to predefined categories or classes based on their inherent characteristics. This
ability is not only pervasive but also profoundly influential, as it underpins a
multitude of real-world applications, ranging from spam email filtering to medical
diagnosis and beyond. The motivation behind classification is deeply rooted in our
innate human inclination to categorize and organize the information-rich
environment around us. In a digital context, classification serves as a potent tool for
automating decision-making processes, discerning patterns in data, and making
predictions. It is the keystone of supervised learning, where models are trained on
labeled data to replicate the human ability to classify objects or observations into
meaningful groups.
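To make supervised classification concrete, the following Java sketch trains a nearest-centroid classifier. It averages the labeled training rows of each class and assigns a new row to the class whose average is closest. This is a deliberately simple stand-in chosen for illustration, not one of the classifiers used in this project. The class name, labels, and toy data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal supervised classifier sketch: nearest centroid (one mean vector per class). */
public class NearestCentroidClassifier {
    private final Map<String, double[]> centroids = new HashMap<>();

    /** Learns one centroid per class label from labeled training rows. */
    void fit(double[][] x, String[] labels) {
        Map<String, double[]> sums = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            double[] sum = sums.computeIfAbsent(labels[i], key -> new double[x[0].length]);
            for (int j = 0; j < x[i].length; j++) {
                sum[j] += x[i][j];
            }
            counts.merge(labels[i], 1, Integer::sum);
        }
        centroids.clear();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            double[] mean = e.getValue().clone();
            int n = counts.get(e.getKey());
            for (int j = 0; j < mean.length; j++) {
                mean[j] /= n;
            }
            centroids.put(e.getKey(), mean);
        }
    }

    /** Assigns the label of the closest centroid (squared Euclidean distance). */
    String predict(double[] row) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : centroids.entrySet()) {
            double dist = 0;
            for (int j = 0; j < row.length; j++) {
                double diff = row[j] - e.getValue()[j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical labeled training data with two scaled features per record.
        double[][] x = {{0.1, 0.2}, {0.2, 0.1}, {0.9, 0.8}, {0.8, 0.9}};
        String[] y = {"normal", "normal", "attack", "attack"};
        NearestCentroidClassifier clf = new NearestCentroidClassifier();
        clf.fit(x, y);
        System.out.println(clf.predict(new double[] {0.85, 0.75})); // attack
    }
}
```

Nearest centroid is used here only because it fits in a few lines. It assumes each class forms a compact cluster, which real intrusion data does not guarantee.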

1.4 MACHINE LEARNING

In the digital age, we find ourselves surrounded by an unprecedented deluge of data. From the clicks we make on the internet to the sensors in our smartphones and the vast databases that underpin modern businesses, data has become the lifeblood of the information age. Yet, amidst this data-driven revolution, the ability to transform raw data into actionable knowledge is what sets the stage for the most remarkable technological advancements we've ever witnessed. At the heart of this transformative process stands the science of "machine learning." The motivation
behind machine learning is rooted in our desire to make sense of the vast and
complex data ecosystems that define our contemporary world. It's driven by the
recognition that traditional rule-based programming is often insufficient to address
the intricacies of modern problems. Instead, machine learning endows computers
with the capacity to learn from data, to recognize patterns, and to make decisions,
often with astonishing accuracy. Machine learning algorithms can be trained to
predict future events or outcomes, from stock prices and weather patterns to disease
diagnoses.

1.5 ENSEMBLE LEARNING

In the realm of machine learning, the quest for improved predictive accuracy and
robustness has led to the development of ingenious techniques, and at the forefront
of this innovation stands the concept of "Ensemble Learning." Much like the
collective intelligence of a diverse group of individuals can often outperform a single
expert, ensemble learning harnesses the power of multiple machine learning models
to make better predictions, decisions, and classifications. The motivation behind
ensemble learning stems from the acknowledgment that no single machine learning
model is universally optimal for all tasks and datasets. In practice, different
algorithms excel under different conditions, and they may be more adept at capturing
specific patterns or mitigating particular sources of error. Ensemble learning seeks
to capitalize on this diversity by combining the strengths of multiple models,
mitigating their individual weaknesses, and achieving superior performance as a
collective. By aggregating predictions from multiple models, ensemble learning
aims to improve overall predictive accuracy. This is particularly valuable in domains
where high accuracy is paramount, such as medical diagnosis or financial
forecasting.
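The simplest way to aggregate predictions from multiple models is hard majority voting. The Java sketch below combines any number of base models, represented here as `Function<double[], String>` instances, and returns the most frequent label. The threshold-rule base models and their feature positions are hypothetical placeholders. A real ensemble would use trained classifiers, and other combiners such as weighted voting or stacking are equally valid choices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hard (majority) voting ensemble sketch: each base model votes and the most frequent label wins. */
public class MajorityVoteEnsemble {
    private final List<Function<double[], String>> members;

    MajorityVoteEnsemble(List<Function<double[], String>> members) {
        this.members = members;
    }

    String predict(double[] row) {
        Map<String, Integer> votes = new HashMap<>();
        for (Function<double[], String> model : members) {
            votes.merge(model.apply(row), 1, Integer::sum);
        }
        String winner = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            // Ties are broken by alphabetical label order so the result does not depend on map iteration order.
            if (e.getValue() > bestCount
                    || (e.getValue() == bestCount && e.getKey().compareTo(winner) < 0)) {
                winner = e.getKey();
                bestCount = e.getValue();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Hypothetical base models: simple threshold rules on different (scaled) features.
        Function<double[], String> byBytes = r -> r[0] > 0.5 ? "attack" : "normal";
        Function<double[], String> byRate = r -> r[1] > 0.7 ? "attack" : "normal";
        Function<double[], String> byErrors = r -> r[2] > 0.3 ? "attack" : "normal";
        MajorityVoteEnsemble ensemble = new MajorityVoteEnsemble(List.of(byBytes, byRate, byErrors));
        System.out.println(ensemble.predict(new double[] {0.9, 0.2, 0.6})); // attack (2 of 3 votes)
    }
}
```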

1.6 ANOMALY DETECTION

In an interconnected world brimming with data, the ability to identify the
extraordinary within the ordinary has emerged as a critical pursuit across diverse
domains. Anomaly Detection, often referred to as outlier detection, stands as a
sentinel in this quest for insight and security. It is the art and science of
distinguishing exceptional occurrences, behaviours, or patterns that deviate
significantly from the expected norm. The ubiquity of data in contemporary society
has given rise to an unprecedented opportunity: the capacity to glean hidden
knowledge from the vast tapestry of information. Anomaly detection plays an
integral role in this endeavour by spotlighting the unusual, the unexpected, and the
potentially impactful amid the constant flow of data. Whether applied to fault
detection in industrial systems, fraud prevention in financial transactions, or
intrusion detection in cybersecurity, the fundamental goal remains the same: to
unearth anomalies that could signify opportunities, threats, or areas warranting
further investigation.
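A very simple statistical form of anomaly detection is a z-score test. Learn the mean and standard deviation of a quantity during normal operation, then flag values that lie more than a chosen number of standard deviations away. The Java sketch below makes several illustrative assumptions, none of them part of the project: a single numeric feature, a population standard deviation, a threshold of 3, and a connections-per-second baseline.

```java
/**
 * Statistical anomaly detection sketch: learn the mean and (population) standard deviation
 * of one numeric feature from a baseline of normal observations, then flag any value whose
 * z-score |value - mean| / std exceeds a threshold.
 */
public class ZScoreDetector {
    private final double mean;
    private final double std;
    private final double threshold;

    ZScoreDetector(double[] baseline, double threshold) {
        double sum = 0;
        for (double v : baseline) {
            sum += v;
        }
        double m = sum / baseline.length;
        double squares = 0;
        for (double v : baseline) {
            squares += (v - m) * (v - m);
        }
        this.mean = m;
        this.std = Math.sqrt(squares / baseline.length);
        this.threshold = threshold;
    }

    /** True when the value lies more than `threshold` standard deviations from the baseline mean. */
    boolean isAnomaly(double value) {
        if (std == 0) {
            // Degenerate baseline: anything different from the constant value is unusual.
            return value != mean;
        }
        return Math.abs(value - mean) / std > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical baseline: connections per second seen from one host during normal operation.
        double[] baseline = {10, 12, 11, 9, 10, 11, 12, 9};
        ZScoreDetector detector = new ZScoreDetector(baseline, 3.0);
        System.out.println(detector.isAnomaly(11)); // false
        System.out.println(detector.isAnomaly(60)); // true
    }
}
```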

1.7 INTRUSION DETECTION SYSTEM

In today's digitally interconnected world, the protection of sensitive data and critical
infrastructure from cyber threats is of paramount importance. As the complexity and
sophistication of malicious activities continue to evolve, traditional rule-based
Intrusion Detection Systems (IDS) have faced limitations in effectively identifying
and mitigating these threats. In response to this ever-expanding threat landscape, the
integration of machine learning techniques within IDS has emerged as a promising
approach. Machine learning, a subset of artificial intelligence, has the unique
capability to adapt and learn from data, making it well-suited for the dynamic and
evolving nature of cyber threats. By leveraging advanced algorithms and data-driven
insights, machine learning-based IDS aim to bolster cybersecurity defences by
detecting anomalous patterns and malicious behaviours in network traffic, system
logs, and other digital assets.

1.8 OBJECTIVES

1. Develop and implement an Intrusion Detection System (IDS) using the ADT-
SVM algorithm for dynamic cybersecurity threat detection.

2. Explore temporal and thermal correlations in network data to enhance the
adaptability of the IDS.

3. Evaluate IDS performance using key metrics such as Detection Rate (DR) and
False Alarm Rate (FAR) on the KDD dataset.
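Objective 3 names Detection Rate and False Alarm Rate. Under the usual definitions, with attacks as the positive class, DR = TP / (TP + FN) and FAR = FP / (FP + TN). The Java sketch below computes both from actual and predicted labels. The class and method names are hypothetical, and this is not the project's evaluation code.

```java
import java.util.Arrays;

/** DR and FAR from a binary confusion matrix, treating "attack" as the positive class. */
public class IdsMetrics {

    /** Returns {TP, FN, FP, TN}. */
    static long[] confusion(boolean[] actualAttack, boolean[] predictedAttack) {
        long tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < actualAttack.length; i++) {
            if (actualAttack[i] && predictedAttack[i]) {
                tp++; // attack correctly detected
            } else if (actualAttack[i]) {
                fn++; // attack missed
            } else if (predictedAttack[i]) {
                fp++; // normal traffic raised a false alarm
            } else {
                tn++; // normal traffic correctly passed
            }
        }
        return new long[] {tp, fn, fp, tn};
    }

    /** DR = TP / (TP + FN): fraction of actual attacks that were detected. */
    static double detectionRate(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** FAR = FP / (FP + TN): fraction of normal records wrongly flagged as attacks. */
    static double falseAlarmRate(long fp, long tn) {
        return fp + tn == 0 ? 0.0 : (double) fp / (fp + tn);
    }

    public static void main(String[] args) {
        // Hypothetical labels for eight records.
        boolean[] actual = {true, true, true, true, false, false, false, false};
        boolean[] predicted = {true, true, true, false, false, false, false, true};
        long[] c = confusion(actual, predicted);
        System.out.println(Arrays.toString(c));         // [3, 1, 1, 3]
        System.out.println(detectionRate(c[0], c[1]));  // 0.75
        System.out.println(falseAlarmRate(c[2], c[3])); // 0.25
    }
}
```

A useful IDS needs a high DR and a low FAR at the same time. On traffic dominated by normal records, even a small FAR can translate into a large absolute number of false alarms.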
CHAPTER 2

2. LITERATURE REVIEW

2.1 THE EVOLUTION OF ETHERNET PASSIVE OPTICAL NETWORK (EPON) AND FUTURE TRENDS

Felix Obite et al. note in this paper that the tremendous growth in Internet traffic has confirmed that the telecommunications backbone is moving aggressively from a time division multiplexing (TDM) orientation to a focus on Ethernet solutions. Ethernet PON, which represents the convergence of low-cost Ethernet and fiber infrastructures, has taken over a market initially dominated by Digital Subscriber Line (DSL) and cable modems. It is a new technology that is simple, inexpensive, and scalable, with the ability to deliver massive data services to end users over a single network. The paper reviews the evolution of Ethernet Passive Optical Network (EPON), focusing on the current development of future high-data-rate access networks such as Next-Generation Passive Optical Network Stage 2 (NG-PON2), Wavelength Division Multiplexing (WDM) PON, and Orthogonal Frequency Division Multiplexing (OFDM) PON. In addition, the recently concluded 100 Gb Ethernet Passive Optical Network (100G-EPON) is reviewed to highlight recent developments in the field. With this comprehensive and up-to-date review, the authors aim to help network operators and interested practitioners focus on common priorities and timelines; another goal of the study is to identify technical remedies for future investigation. Data traffic is increasing at an alarming rate: more users are coming online, and those already online spend more time there and use more bandwidth-intensive applications. Broadband services permitting high-speed Internet transmission are expected to improve economies. Hence, large bandwidth and mobility are two basic requirements for future access networks, in order to support new and real-time broadband applications. DSL and cable modems are unable to withstand such demand. They were designed on top of previous communication infrastructures that were not optimized for data traffic. In cable modem systems, just a few RF channels are dedicated to data, while most of the bandwidth is reserved for servicing legacy analog video. DSL copper systems only allow a limited data rate at the required distances due to signal attenuation and crosstalk. A new data-centric solution has therefore become necessary: a technology optimized for Internet Protocol (IP) data traffic. Emerging as the next-generation Ethernet passive optical network is 10G-EPON, whose technical specification was standardized by the IEEE 802.3av Task Force in September 2009. One of the major requirements in designing the specification was coexistence with the current 1G-EPON network on the same optical system, together with backward compatibility. The paper describes the service trends and operator requirements that define the evolution of EPON and its future trends. It shows that optical technologies are evolving continuously toward higher speeds, higher wavelength capability, and higher loss budgets. A smart allocation and coexistence strategy for new and existing users is required, with a logical combination of different types of users such as business and residential subscribers. WDM-PONs, possibly implemented with TDMA and TDM techniques, are unarguably the next stage in PON evolution. With optical amplification, they offer higher bandwidth per ONU, longer maximum reach, and higher splitting ratios compared to EPON and GPON architectures. They can support various fiber topologies and provide additional functionality such as protection. If implemented, WDM-PONs will give access to a new broadband structure and broad-scale residential applications.

2.2 REVISITING WIRELESS INTERNET CONNECTIVITY: 5G VS WI-FI 6

Edward J. Oughton et al. note in this paper that, in recent years, significant attention has been directed toward the fifth generation of wireless broadband connectivity known as ‘5G’, currently being deployed by Mobile Network Operators. Surprisingly, considerably less attention has been paid to ‘Wi-Fi 6’, the new IEEE 802.11ax standard in the family of Wireless Local Area Network technologies, with features targeting private edge networks. The paper revisits the suitability of cellular and Wi-Fi for delivering high-speed wireless Internet connectivity. Both technologies aspire to deliver significantly enhanced performance, enabling each to deliver much faster wireless broadband connectivity and to provide further support for the Internet of Things and Machine-to-Machine communications, positioning the two technologies as technical substitutes in many usage scenarios. The authors conclude that both are likely to play important roles in the future, serving simultaneously as competitors and complements. They anticipate that 5G will remain the preferred technology for wide-area coverage, while Wi-Fi 6 will remain the preferred technology for indoor use, thanks to its much lower deployment costs. However, the traditional boundaries that differentiated earlier generations of cellular and Wi-Fi are blurring. Proponents of one technology may argue that it should displace the other, requesting regulatory policies that would tilt the marketplace in their favor. The authors believe such efforts should be resisted: both technologies have important roles to play in the marketplace, based on the needs of heterogeneous use cases, and both should contribute to the goal of providing affordable, reliable, and ubiquitously available high-capacity wireless broadband connectivity. Almost in synchrony, the next generation of wireless technologies is being rolled out for both cellular and Wi-Fi connectivity. While there has been much excitement around the world regarding the fifth generation of cellular technology known as ‘5G’, there is comparable enthusiasm for the next version of the Institute of Electrical and Electronics Engineers’ (IEEE) 802.11 Wireless Local Area Network (WLAN) standard, ‘Wi-Fi 6’. Next-generation wireless connectivity technologies are needed to further enable the […] Herein, the authors revisit the debate associated with wireless Internet connectivity by providing a new evaluation of the two main technologies involved in the provision of next-generation wireless broadband: 5G and Wi-Fi 6. Their analysis highlights how the futures of 5G and Wi-Fi 6 need to be understood within the larger context of how earlier generations of cellular and Wi-Fi technologies have shaped the evolution of wireless networking, and what this may mean for the future. First, in terms of general demand-side trends, data traffic is expected to continue to grow significantly, with an increasing proportion of devices using wireless connectivity as the first connection point. The COVID-19 pandemic of 2019–2021 highlighted the importance of enhanced digital connectivity to support remote work, education, and social engagement during the global crisis. New trends may also arise from the shifting work and social patterns produced by the pandemic, and such changes could have repercussions for the spatial and temporal usage of wireless broadband connectivity and the associated economics of each technology.

2.3 INTRUSION DETECTION SYSTEMS IN THE INTERNET OF THINGS: A COMPREHENSIVE INVESTIGATION

Somayeh Hajiheidari et al. note in this paper that reduced power consumption in electrical appliances has recently enabled a new dimension of intelligent objects. Everyday physical objects have been upgraded with electronic devices connected over the Internet to create local intelligence and communicate with cyberspace. The Internet of Things (IoT) is the term used for realizing these intelligent objects. Since objects in the IoT are directly connected to the unsafe Internet, these resource-constrained devices are easily accessible to attackers, and such public access makes them vulnerable to intrusions. The authors focus on attacks that do not explicitly damage the network but instead infect internal nodes, which are then ready to carry out attacks on the network; these are called internal attacks. The significance of Intrusion Detection Systems (IDSs) in the IoT is therefore undeniable. However, despite the importance of this topic, there had been no comprehensive and systematic review discussing and analyzing its significant mechanisms. The paper therefore presents a Systematic Literature Review (SLR) of IDSs in the IoT environment. It then provides detailed categorizations of IoT IDSs using common features: by detection approach (anomaly-based, signature-based, specification-based, and hybrid), by architecture (centralized, distributed, and hybrid), by evaluation method (simulation and theoretical), and by attack type (denial of service, Sybil, replay, selective forwarding, wormhole, black hole, sinkhole, jamming, and false data attacks). The advantages and disadvantages of the selected mechanisms are then discussed, and finally open issues and directions for future research are examined. Connecting physical things to the Internet makes it possible to control and manage them remotely. These devices sense and record client activities, forecast clients' future actions, and provide them with useful services. It is anticipated that, in the next decade, the Internet will be a seamless fabric of common networks and connected objects. The term IoT was originally introduced by the MIT Auto-ID Center in 1998. It represents a vision in which objects are uniquely identified and accessible over the Internet, so that the real world becomes more accessible through personal computers and networked devices over the IoT and the Internet. The US National Intelligence Council (NIC) believes that the IoT has a potential effect on US national power and has therefore placed it on its list of six disruptive civil technologies. The study reviews numerous advanced intrusion detection approaches for the IoT, clarifying and discussing open issues through an in-depth analysis of over 40 main studies selected from an initial pool of 324 papers. Based on the available literature, the selected papers are categorized into four main categories (anomaly-based, signature-based, specification-based, and hybrid IDS) and, separately, into three categories (centralized, distributed, and hybrid).

2.4 ENSEMBLE LEARNING FOR INTRUSION DETECTION SYSTEMS: A SYSTEMATIC MAPPING STUDY AND CROSS-BENCHMARK EVALUATION

Bayu Adhi Tama et al. note in this paper that intrusion detection systems (IDSs) are intrinsically linked to a comprehensive set of cyberattack prevention instruments. To achieve a higher detection rate, researchers seek improved detection frameworks, particularly ones built from ensemble learners. Designing an ensemble typically involves two main challenges: the choice of base classifiers and the choice of combiner method. The paper gives an overview of how ensemble learners are exploited in IDSs by means of a systematic mapping study. The authors collected and analyzed 124 prominent publications from the existing literature and mapped them into several categories, such as year of publication, publication venue, dataset used, ensemble method, and IDS technique. Furthermore, the study reports and analyzes an empirical investigation of a new classifier ensemble approach, called stack of ensemble (SoE), for anomaly-based IDS. SoE is an ensemble classifier that adopts a parallel architecture to combine three individual ensemble learners (random forest, gradient boosting machine, and extreme gradient boosting machine) in a homogeneous manner. The significance of performance differences among the classification algorithms is examined statistically in terms of Matthews correlation coefficient, accuracy, false positive rate, and area under the ROC curve. The study fills a gap in the current literature by providing an up-to-date systematic mapping study together with an extensive empirical evaluation of recent advances in ensemble learning techniques applied to IDSs. The ensemble of classifiers, hereafter referred to as an ensemble learner, has drawn a lot of interest in cybersecurity research, and the intrusion detection system (IDS) domain is no exception. An IDS deals with the proactive and responsive detection of external aggressors and anomalous server operations before they cause massive destruction. Today, a variety of cyberattacks place some organizations' critical infrastructures at risk. A successful attack may lead to serious consequences such as, but not limited to, financial loss, operational termination, and disclosure of confidential information. Moreover, the larger an organization's network, the bigger the chance for attackers to exploit it, and the complexity of the network may also give rise to vulnerabilities and other specific threats. Security mitigation and protection strategies should therefore be considered mandatory. The study revealed great interest in applying the random forest classifier to IDSs, because random forest implementations are diverse and almost effortless to apply; for instance, caret, Boruta, and VSURF are examples of random forest implementations in R.
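Tama et al. report the Matthews correlation coefficient (MCC) among other metrics. Because MCC uses all four cells of the confusion matrix, it stays informative on imbalanced intrusion datasets where accuracy alone can look deceptively high. The following Java sketch implements the standard MCC formula. The class name `MatthewsCorrelation` and the convention of returning 0 when a marginal total is zero are assumptions, and this is not the authors' evaluation code.

```java
/** Matthews correlation coefficient from binary confusion-matrix counts (illustrative sketch). */
public class MatthewsCorrelation {

    /**
     * MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
     * Ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect).
     * Returns 0 when any marginal total is zero (a common convention, assumed here).
     */
    static double mcc(long tp, long tn, long fp, long fn) {
        // Work in double so the product of four large counts cannot overflow a long.
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denominator == 0 ? 0.0 : numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(mcc(3, 3, 1, 1)); // 0.5
        System.out.println(mcc(5, 5, 0, 0)); // 1.0
    }
}
```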

2.5 DEEP ABSTRACTION AND WEIGHTED FEATURE SELECTION FOR WI-FI IMPERSONATION DETECTION

Muhamad Erza Aminanto et al. note in this paper that recent advances in mobile technologies have made IoT-enabled devices more pervasive and integrated into our daily lives. The security challenges that need to be overcome stem mainly from the open nature of a wireless medium such as a Wi-Fi network. An impersonation attack is an attack in which an adversary is disguised as a legitimate party in a system or communications protocol. The connected devices are pervasive and generate high-dimensional data on a large scale, which complicates simultaneous detection. Feature learning, however, can circumvent the potential problems caused by the large volume of network data. The study therefore proposes a novel Deep-Feature Extraction and Selection (D-FES) method, which combines stacked feature extraction and weighted feature selection. Stacked autoencoding can provide more meaningful representations by reconstructing the relevant information from its raw inputs. This is then combined with a modified weighted feature selection inspired by an existing shallow-structured machine learner. The authors finally demonstrate that the condensed set of features reduces both the bias of a machine learning model and its computational complexity. Experimental results on a well-referenced Wi-Fi network benchmark dataset, the Aegean Wi-Fi Intrusion Dataset (AWID), show the usefulness of D-FES, which achieves a detection accuracy of 99.918% and a false alarm rate of 0.012%, the most accurate detection of impersonation attacks reported in the literature. The rapid growth of the Internet has led to a significant increase in wireless network traffic in recent years. According to a worldwide telecommunication consortium, the proliferation of 5G and Wi-Fi networks is expected to occur in the coming decades. By 2020, wireless network traffic was anticipated to account for two-thirds of total Internet traffic, with 66% of IP traffic expected to be generated by Wi-Fi and cellular devices only. Although wireless networks such as IEEE 802.11 have been widely deployed to provide users with mobility and flexibility in the form of high-speed local area connectivity, other issues such as privacy and security have arisen. The rapid spread of Internet of Things (IoT)-enabled devices has left wireless networks vulnerable to both passive and active attacks, the number of which has grown dramatically. Examples of these attacks are impersonation, flooding, and injection attacks. In this study, the authors present D-FES, which combines stacked feature extraction and weighted feature selection techniques in order to detect impersonation attacks in Wi-Fi networks. A stacked autoencoder (SAE) is used to achieve high-level abstraction of complex and large amounts of Wi-Fi network data. The model-free properties of the SAE and its learnability on complex, large-scale data take into account the open nature of Wi-Fi networks, where an adversary can easily inject false data or modify data forwarded in the network.

2.6 RANSOMWARE DETECTION AND MITIGATION USING SOFTWARE-DEFINED NETWORKING: THE CASE OF WANNACRY

Maxat Akbanov et al. note in this paper that modern ransomware families implement sophisticated encryption and propagation schemes, limiting the chances of recovering data almost to zero. The authors investigate the use of software-defined networking (SDN) to detect and mitigate advanced ransomware threats, and present their ransomware analysis results and the SDN-based security framework they developed. The infamous WannaCry ransomware was used as the proof of concept. Based on the obtained results, they design an SDN detection and mitigation framework and develop a solution based on OpenFlow. The solution detects suspicious activities through network traffic monitoring and blocks infected hosts by adding flow table entries to OpenFlow switches in real time. Their experiments with multiple samples of WannaCry show that the mechanism is, in all cases, able to promptly detect the infected machines and prevent WannaCry from spreading. Ransomware is currently a huge and fast-growing problem for all types of users, from small households to large corporations and government bodies. Starting from relatively simple fake antivirus applications in 2008, ransomware has evolved over time into sophisticated forms such as crypto ransomware. The apotheosis of this evolution is a new type of ransomware that combines the use of exploits with worm-like spreading mechanisms to propagate itself in both internal and external networks. Moreover, the emergence of new ransomware families such as WannaCry shows that ransomware keeps evolving and that cyber criminals are upgrading ransomware code with more sophisticated features, such as worm propagation components and public-key encryption mechanisms. From a research perspective, therefore, the design and development of new countermeasures is an important task. The authors designed and implemented a feasible approach based on software-defined networking to detect and mitigate the ransomware threat, and conducted their experiments with real samples of WannaCry ransomware. The solution involves an application built for a centralized controller that communicates with the switches using the OpenFlow protocol. In particular, it includes two plugins for blocking malicious network addresses and port numbers, based on a blacklist database and on WannaCry characteristics derived via static and dynamic analysis. To the best of the authors' knowledge, this is the first work to demonstrate that security mechanisms based on software-defined networking are capable of successfully stopping infections from ransomware with worm-spreading capabilities. Their experimental results show that the proposed mechanism is able to detect and block the traffic from an infected host, and therefore secure the remaining, untouched part of the network.

2.7 A SEMI-BOOSTED NESTED MODEL WITH SENSITIVITY-BASED WEIGHTED BINARIZATION FOR MULTI-DOMAIN NETWORK INTRUSION DETECTION

Joseph W. Mikhail et al. note in this paper that effective network intrusion
detection techniques are required to thwart evolving cybersecurity threats.
Historically, traditional enterprise networks have been researched extensively in this
regard; however, the cyber threat landscape has grown to include wireless networks.
The authors present a novel model that can be trained on completely different feature
sets and applied to two distinct intrusion detection applications: traditional
enterprise networks and 802.11 wireless networks. They describe it as the first
method to demonstrate superior performance in both of these applications. The model
is based on a one-versus-all binary framework comprising multiple nested
sub-ensembles. To provide good generalization ability, each sub-ensemble contains a
collection of sub-learners, and only a portion of the sub-learners implement
boosting. A class weight based on the sensitivity metric (true-positive rate), learned
from the training data only, is assigned to the sub-ensembles of each class. The use
of pruning to remove sub-learners that do not contribute to, or have an adverse effect
on, overall system performance is investigated as well. The results demonstrate that
the proposed system achieves exceptional performance in both traditional enterprise
intrusion detection and 802.11 wireless intrusion detection. Massive growth in the use
of computer and network services has resulted in an increase in the number of
network-based security threats. Hackers can exploit many entry points to gain
unauthorized access to networks and devices. An intrusion is an attempt to bypass
security measures and compromise confidentiality, integrity, or availability. Network
intrusion detection involves detecting network traffic and classifying it into
different attack categories. Intrusion detection system (IDS) devices are typically
placed inside computer networks, where they scan traffic for malicious events. In
recent times, security researchers have focused on building detection and
classification models that overcome the limitations of signature-based methods by
training a model to generate predictions from observations of network traffic
representative of both attacks and normal traffic. The KDD Cup 1999 dataset (KDD’99)
has served as the baseline for standard network intrusion detection research for many
years. However, wireless networks are now an additional avenue for network
exploitation: recent reports have shown that, in addition to protocol
vulnerabilities, wireless access points are often configured improperly.
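
The sensitivity-based class weighting can be illustrated with a short Java sketch. It computes each class's true-positive rate from predictions on the training data and multiplies each one-versus-all score by that weight before choosing the highest-scoring class. How the weights actually enter the nested model is an assumption here, and `SensitivityWeightedOvA` is a hypothetical name; the sub-ensembles, boosting, and pruning are not shown.

```java
/**
 * Hypothetical sketch: one-versus-all decision weighted by per-class
 * sensitivity (true-positive rate) measured on training predictions.
 * Multiplying each class score by its weight is an assumed combination rule.
 */
final class SensitivityWeightedOvA {

    /** TPR of class c = correctly predicted instances of c / all true instances of c. */
    static double[] sensitivity(int[] trueLabels, int[] predicted, int numClasses) {
        int[] truePositives = new int[numClasses];
        int[] actualCounts = new int[numClasses];
        for (int i = 0; i < trueLabels.length; i++) {
            actualCounts[trueLabels[i]]++;
            if (predicted[i] == trueLabels[i]) {
                truePositives[trueLabels[i]]++;
            }
        }
        double[] tpr = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            tpr[c] = actualCounts[c] == 0 ? 0.0 : (double) truePositives[c] / actualCounts[c];
        }
        return tpr;
    }

    /** Returns the class whose weighted one-versus-all score is highest. */
    static int predict(double[] ovaScores, double[] classWeights) {
        int best = 0;
        for (int c = 1; c < ovaScores.length; c++) {
            if (ovaScores[c] * classWeights[c] > ovaScores[best] * classWeights[best]) {
                best = c;
            }
        }
        return best;
    }
}
```

Because the weights come only from training predictions, they can be computed once after training and reused unchanged at detection time.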

2.8 BUILDING AN EFFICIENT INTRUSION DETECTION SYSTEM BASED ON
FEATURE SELECTION AND ENSEMBLE CLASSIFIER

Yuyang Zhou et al. observe in this paper that an intrusion detection system (IDS) is
one of the most extensively used techniques in a network topology to safeguard the
integrity and availability of sensitive assets in the protected systems. Although many
supervised and unsupervised learning approaches from the field of machine learning
have been used to increase the efficacy of IDSs, existing intrusion detection
algorithms still struggle to achieve good performance. First, large amounts of
redundant and irrelevant data in high-dimensional datasets interfere with the
classification process of an IDS. Second, an individual classifier may not perform
well in detecting every type of attack. Third, many models are built on stale
datasets, making them less adaptable to novel attacks. The authors therefore propose
a new intrusion detection framework based on feature selection and ensemble learning
techniques. In the first step, a heuristic algorithm called CFS-BA is used for
dimensionality reduction; it selects the optimal feature subset based on the
correlation between features. Next, an ensemble approach combines the C4.5, Random
Forest (RF), and Forest by Penalizing Attributes (Forest PA) algorithms. Finally, a
voting technique combines the probability distributions of the base learners for
attack recognition. Experimental results on the NSL-KDD, AWID, and CIC-IDS2017
datasets show that the proposed CFS-BA-Ensemble method performs better than other
related and state-of-the-art approaches under several metrics. Nowadays, Internet
applications help society in many areas, such as electronic communication, teaching,
commerce, and entertainment, and have become part of people's daily lives. However,
computer systems have become more vulnerable due to the massive expansion of computer
networks and the rapid emergence of intrusion incidents. The need to strengthen cyber
security has attracted considerable attention from industry and academia around the
world. Despite the use of different security applications, such as firewalls,
malware prevention, data encryption, and user authentication, many organizations and
enterprises fall victim to contemporary cyber-attacks. To sneak into a system,
attackers may deliberately exploit the vulnerabilities of the target system and
launch different types of attacks, which may lead to the leakage of private
information.
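
The final voting step, which combines the probability distributions of the base learners, can be sketched in Java as an average-of-probabilities (soft) vote. This is one common voting rule and is assumed rather than confirmed for CFS-BA-Ensemble; the class name and the sample distributions are hypothetical, and the base learners themselves (C4.5, RF, Forest PA) are not implemented.

```java
/** Hypothetical sketch of soft voting: average the class-probability distributions of several base learners. */
final class SoftVoting {

    /** Element-wise mean of the base learners' probability distributions. */
    static double[] average(double[][] distributions) {
        int numClasses = distributions[0].length;
        double[] avg = new double[numClasses];
        for (double[] dist : distributions) {
            for (int c = 0; c < numClasses; c++) {
                avg[c] += dist[c] / distributions.length;
            }
        }
        return avg;
    }

    /** Index of the largest value (the predicted class). */
    static int argmax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up outputs of three base learners for one record, classes {0 = normal, 1 = attack}.
        double[][] distributions = {{0.60, 0.40}, {0.30, 0.70}, {0.45, 0.55}};
        System.out.println(argmax(average(distributions))); // prints 1
    }
}
```

Here the averaged distribution is [0.45, 0.55], so the record is assigned to class 1 (attack).
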
2.9 MLESIDSS: MACHINE LEARNING-BASED ENSEMBLES
FOR INTRUSION DETECTION SYSTEMS—A REVIEW

Gulshan Kumar et al. note in this paper that network security plays an essential
role in secure communication and in avoiding financial loss and crippled services due
to network intrusions. Intruders generally exploit the flaws of popular software to
mount a variety of attacks against networked computer systems. The damage caused by
network attacks may range from minor service disruption to severe financial loss.
Recently, intrusion detection systems (IDSs) incorporating machine learning
techniques have emerged for handling unauthorized usage of and access to network
resources. Over time, a wide variety of machine learning techniques have been
designed and integrated with IDSs. Still, most IDSs have reported poor intrusion
detection results in terms of false positive rate and detection rate. To solve these
issues, researchers have focused on developing ensemble classifiers that integrate
the predictions of multiple individual classifiers. Ensemble classifiers compensate
for the weaknesses of individual classifiers and use their combined knowledge to
enhance performance. This study presents the motivation for, and a comprehensive
review of, intrusion detection systems based on machine learning ensembles, as an
extension of the authors' previous work in the field. In particular, it analyzes
different ensemble methods, taking into consideration different types of ensembles
and various approaches for integrating the predictions of individual classifiers into
an ensemble classifier. The representative studies are compared in chronological
order for a systematic and critical analysis that clarifies the current challenges and
status of research in the field. Finally, the study presents essential future research
directions for the development of effective IDSs. Network security plays a vital role
in avoiding financial loss, protecting customers from monetary damages, avoiding
disabled or crippled services, and limiting severe information loss due to network
intrusions. Attackers generally exploit the configurations and vulnerabilities of
popular software to mount attacks against networked computer systems. Existing
conventional security techniques such as firewalls serve only as the first line of
defense and can easily be bypassed by attackers.
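
Among the approaches for integrating individual predictions that the review discusses, plain majority (hard) voting is the simplest. The following Java sketch is a generic illustration rather than a method taken from the review; the class name and the rule of breaking ties toward the lowest class index are assumptions.

```java
/** Hypothetical sketch of hard majority voting over the labels predicted by individual classifiers. */
final class MajorityVote {

    /**
     * Returns the most frequently predicted class label.
     * Ties are broken toward the lowest class index (an arbitrary choice).
     */
    static int vote(int[] predictions, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : predictions) {
            counts[label]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        return best;
    }
}
```

For example, five classifiers predicting {2, 1, 2, 0, 2} yield class 2. Unlike soft voting, this rule ignores how confident each classifier is.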

2.10 AN ENHANCED ANOMALY DETECTION IN WEB TRAFFIC USING A STACK OF
CLASSIFIER ENSEMBLE

Bayu Adhi Tama et al. argue in this paper that a Web attack protection system is
essential in today’s information age. Classifier ensembles have been considered for
anomaly-based intrusion detection in Web traffic; however, they suffer from
unsatisfactory performance due to poor ensemble design. The paper proposes a stacked
ensemble for anomaly-based intrusion detection in a Web application. Unlike
conventional stacking, in which single weak learners are typically used as base
learners, the proposed stacked ensemble uses other ensemble learners as its base
learners, namely random forest, gradient boosting machine, and XGBoost. To show the
generalizability of the proposed model, two datasets specifically used for attack
detection in Web applications, CSIC-2010v2 and CICIDS-2017, are used in the
experiment. The proposed model significantly surpasses existing Web attack detection
techniques in terms of accuracy and false positive rate, and validation results on
the CICIDS-2017, NSL-KDD, and UNSW-NB15 datasets also improve on those obtained by
some recent techniques. Finally, the performance of all classification algorithms is
further discussed in terms of a two-step statistical significance test, providing a
value-added contribution to the current literature. In today’s information age, every
organization attempts to place its business on the Internet. Internet-based
applications enable companies to increase their revenue as well as to improve or even
redesign their business processes, for example through virtualization of the supply
chain or by adopting futuristic business-to-business (B2B) platforms. The Internet has
been employed over the last two decades by companies and many organizations
worldwide. It helps an organization offer a Web-based application such as
e-commerce to provide timely services or to get closer to its customers. Furthermore,
it has changed people’s lives dramatically, since users can stay online to
communicate with each other anywhere and anytime. Nowadays, high-speed Internet has
contributed significantly to the development of various types of Internet-based
computing, such as ubiquitous computing, cloud computing, and mobile cloud computing.
People no longer depend on on-the-spot computing resources to run application
services; instead, services such as storage, applications, and servers are delivered
to users’ computers or devices over the Internet. This study explored the use of a
stack architecture to combine multiple classifier ensembles, namely gradient boosting
machine (GBM), random forest (RF), and extreme gradient boosting machine (XGB), for
detecting anomalies in a Web application scenario. To demonstrate the generalizability
of the proposed model, it was tested on multiple IDS datasets: CSIC-2010v2,
CICIDS-2017, NSL-KDD, and UNSW-NB15. Unlike a conventional stacking technique, which
usually relies on a weak individual classification algorithm, the proposed model
combines strong classifier ensembles that act as base learners.
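
The stacking idea can be sketched in Java as a meta-learner trained on the outputs of the base learners. In this hypothetical example, each row holds the attack probabilities produced by three base learners (standing in for RF, GBM, and XGBoost), and a logistic-regression meta-learner is fitted by batch gradient descent. The choice of logistic regression as the meta-learner, the class name, and the training settings are assumptions, not details taken from the paper. In practice the meta-learner's inputs should be out-of-fold predictions, so that it does not learn from base-learner outputs on their own training data.

```java
/**
 * Hypothetical stacking sketch: a logistic-regression meta-learner whose
 * inputs are the attack probabilities output by several base learners.
 */
final class StackingMetaLearner {
    private final double[] weights; // one weight per base learner
    private double bias;

    StackingMetaLearner(int numBaseLearners) {
        weights = new double[numBaseLearners];
    }

    /** Sigmoid of the weighted sum of base-learner outputs. */
    private double score(double[] baseOutputs) {
        double z = bias;
        for (int j = 0; j < weights.length; j++) {
            z += weights[j] * baseOutputs[j];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Batch gradient descent on the average log loss. */
    void fit(double[][] baseOutputs, int[] labels, double learningRate, int epochs) {
        int n = labels.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] grad = new double[weights.length];
            double gradBias = 0.0;
            for (int i = 0; i < n; i++) {
                double error = score(baseOutputs[i]) - labels[i];
                for (int j = 0; j < weights.length; j++) {
                    grad[j] += error * baseOutputs[i][j];
                }
                gradBias += error;
            }
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= learningRate * grad[j] / n;
            }
            bias -= learningRate * gradBias / n;
        }
    }

    /** 1 = attack, 0 = normal. */
    int predict(double[] baseOutputs) {
        return score(baseOutputs) >= 0.5 ? 1 : 0;
    }
}
```

The meta-learner only has to learn how much to trust each base ensemble, which is why its input dimension equals the number of base learners rather than the number of traffic features.
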
2.11 A COMPREHENSIVE STUDY OF ANOMALY DETECTION
SCHEMES IN IOT NETWORKS USING MACHINE LEARNING
ALGORITHMS

Abebe Diro et al. explain in this paper that the Internet of Things (IoT) consists of a
massive number of smart devices capable of data collection, storage, processing, and
communication. The adoption of the IoT has brought about tremendous innovation
opportunities in industries, homes, the environment, and businesses. However, the
inherent vulnerabilities of the IoT have raised concerns about its wide adoption and
application. Unlike traditional information technology (I.T.) systems, the IoT
environment is challenging to secure due to the resource constraints, heterogeneity,
and distributed nature of smart devices. This makes it impossible to apply host-based
prevention mechanisms such as anti-malware and anti-virus software. These challenges
and the nature of IoT applications call for a monitoring system, such as anomaly
detection, at both the device and network levels, beyond the organizational boundary.
This suggests that an anomaly detection system is better positioned to secure IoT
devices than any other security mechanism. The paper aims to provide an in-depth
review of existing work on developing machine-learning-based anomaly detection
solutions for protecting IoT systems. The authors also indicate that blockchain-based
anomaly detection systems can collaboratively learn effective machine learning models
to detect anomalies. Beyond its innovation opportunities, the IoT has enhanced quality
of life, productivity, and profitability. However, the infrastructures, applications,
and services associated with the IoT have introduced several threats and
vulnerabilities, as emerging protocols and workflows have exponentially increased
attack surfaces. For instance, the Mirai botnet exploited IoT vulnerabilities and
crippled several websites and domain name systems. The massive number,
heterogeneity, and resource constraints of IoT devices have hindered cyber-attack
prevention and detection capabilities. These characteristics favor monitoring IoT
devices at the network level, as on-device solutions are not feasible. Anomaly
detection is therefore well positioned to protect the IoT network: it is considered an
important tool because it helps identify and raise alerts about abnormal activities in
the system. Machine learning has been applied to anomaly detection in both I.T. and
IoT systems. However, machine-learning-based anomaly detection has been more
successful in I.T. systems than in the IoT ecosystem, owing to the resource
capabilities and in-perimeter location of I.T. systems.
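
To make the idea of flagging abnormal activity at the network level concrete, the following Java sketch implements a simple statistical baseline: it learns the mean and standard deviation of a traffic metric (for example, packets per second from one device) during normal operation and flags values whose z-score exceeds a threshold. This is a generic illustration, not one of the machine learning schemes reviewed in the paper; the metric, the class name, and the threshold are assumptions.

```java
/**
 * Hypothetical baseline anomaly detector: flags values that lie more than
 * `threshold` standard deviations from the mean of normal observations.
 */
final class ZScoreDetector {
    private final double mean;
    private final double std;
    private final double threshold;

    ZScoreDetector(double[] normalSamples, double threshold) {
        double sum = 0.0;
        for (double v : normalSamples) {
            sum += v;
        }
        mean = sum / normalSamples.length;
        double squaredDeviations = 0.0;
        for (double v : normalSamples) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        std = Math.sqrt(squaredDeviations / normalSamples.length); // population standard deviation
        this.threshold = threshold;
    }

    boolean isAnomalous(double value) {
        if (std == 0.0) {
            return value != mean; // constant baseline: any change is unusual
        }
        return Math.abs(value - mean) / std > threshold;
    }
}
```

With a baseline of {100, 102, 98, 101, 99} packets per second and a threshold of 3, a reading of 100.5 is treated as normal while a burst of 500 is flagged. Such a detector needs only a few numbers per device, which suits resource-constrained IoT deployments, but it cannot capture the multi-feature patterns that the reviewed machine learning methods target.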

2.12 IMPROVED PSO_ADABOOST ENSEMBLE ALGORITHM FOR IMBALANCED DATA

Kewen Li et al. note in this paper that the Adaptive Boosting (AdaBoost) algorithm is
a widely used ensemble learning framework that achieves good classification results on
general datasets. However, it is challenging to apply AdaBoost directly to imbalanced
data, since it is designed mainly to focus on misclassified samples rather than on
samples of minority classes. To better handle imbalanced data, the paper introduces
the Area Under the Curve (AUC) indicator, which reflects the overall performance of a
model, and proposes an improved AdaBoost algorithm based on AUC (AdaBoost-A) that
improves AdaBoost's error calculation by jointly considering the misclassification
probability and the AUC. To prevent redundant or useless weak classifiers generated by
the traditional AdaBoost algorithm from consuming too many system resources, the paper
also proposes an ensemble algorithm, PSOPD-AdaBoost-A, which can re-initialize
parameters to avoid falling into a local optimum and optimizes the coefficients of the
AdaBoost weak classifiers. Experimental results show that the proposed algorithm is
effective for processing imbalanced data, especially data with a relatively high
degree of imbalance. Since imbalanced data can be found in any area, effective
classification of imbalanced data has become critical for many applications. The
results of existing classification algorithms on imbalanced data are usually
significantly affected by the majority class, resulting in low classification accuracy
for the minority class. For example, a sensor network can accurately achieve target
recognition under the assumption of a balanced data distribution. In practical
applications, however, the field environment is complex and variable, and some samples
are harder to obtain than others, which results in imbalanced data. In this case,
samples of the minority class are easily ignored, resulting in incorrect
classification. In intrusion alarm applications, misclassifying samples of the
minority class means a false alarm by the system, which can have very serious
consequences. The traditional AdaBoost algorithm focuses on misclassified samples
instead of samples of the minority class. The paper therefore proposes an improved
AdaBoost algorithm (AdaBoost-A): since the AUC effectively reflects classifier
performance, it is introduced into the error calculation, making AdaBoost focus more on
classification accuracy for the minority class. Furthermore, the AdaBoost algorithm may
generate redundant or useless weak classifiers, significantly affecting the
readability of the classifier, so the ensemble algorithm PSOPD-AdaBoost-A is proposed
to further optimize the weights of the weak classifiers.
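
The two ingredients that AdaBoost-A combines can be sketched in Java: the AUC, computed here as the fraction of (positive, negative) pairs that a classifier ranks correctly, and the standard AdaBoost weak-classifier coefficient 0.5 * ln((1 - e) / e) computed from the weighted error e. The exact formula by which AdaBoost-A folds the AUC into the error is not given in this summary, so it is deliberately not reproduced; the class and method names are hypothetical.

```java
/** Hypothetical helpers: pairwise AUC and the standard AdaBoost classifier coefficient. */
final class AucAndAlpha {

    /**
     * AUC as the fraction of (positive, negative) pairs in which the positive
     * sample receives the higher score; ties count as half. Labels are 1/0.
     */
    static double auc(double[] scores, int[] labels) {
        long pairs = 0;
        double correct = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) continue;
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) continue;
                pairs++;
                if (scores[i] > scores[j]) {
                    correct += 1.0;
                } else if (scores[i] == scores[j]) {
                    correct += 0.5;
                }
            }
        }
        return pairs == 0 ? Double.NaN : correct / pairs;
    }

    /** Standard AdaBoost weight of a weak classifier with weighted error e (0 < e < 1). */
    static double alpha(double weightedError) {
        return 0.5 * Math.log((1.0 - weightedError) / weightedError);
    }
}
```

For scores {0.9, 0.8, 0.3, 0.2} with labels {1, 0, 1, 0}, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. Unlike plain accuracy, this value does not improve when a classifier simply predicts the majority class for everything.
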

2.13 SUSTAINABLE ENSEMBLE LEARNING DRIVING INTRUSION DETECTION MODEL

Mengyao Zhu et al. note in this paper that, in machine-learning-based intrusion
detection systems, ensemble learning is a commonly adopted method for improving
detection accuracy. Unfortunately, existing works have not considered the accumulation
and reuse of historical knowledge, or the sensitivity of the detection model to
different types of attacks, which leads to low detection accuracy. To address this
issue, the paper proposes a model based on sustainable ensemble learning. In the model
training stage, the individual classifiers' probability outputs and classification
confidence are taken as training data to build multi-class regression models, so that
ensemble learning adapts to different attacks. In the updating stage, an iterative
updating method is presented in which the parameters and decision results of the
historical model are added to the training process of the new ensemble model to
realize incremental learning. Experimental results show that the proposed model
significantly outperforms existing solutions in terms of detection accuracy, false
alarm rate, stability, and robustness. With advances in network-based computing
services and applications, the Internet suffers from more and more security threats.
Intrusion detection systems (IDSs) are therefore particularly important as an
essential part of network security defense. An IDS discovers and identifies
intrusions in the system by detecting and analyzing network traffic or host
behaviors. To detect abnormal behaviors in large-scale network traffic,
machine-learning-based intrusion detection systems have attracted wide attention. Such
methods use machine learning techniques to extract features from large amounts of
data and train a classification model that classifies network traffic or host
behaviors to detect intrusions in the system. To reduce the false alarm rate and
false negative rate, prior works on machine-learning-based intrusion detection often
employ multiple machine learning models to construct the detection model, an approach
called ensemble learning, as illustrated in Fig. 1 of that paper. In ensemble
learning, the system first constructs multiple machine learning models and then
integrates all individual results via voting or weighted voting to obtain the final
decision.
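
The training data described for the regression stage, built from the individual classifiers' probability outputs and classification confidence, can be sketched in Java as one feature vector per sample. Treating confidence as a classifier's maximum class probability is an assumption, as is the class name; the multi-class regression models and the incremental updating stage are not shown.

```java
import java.util.Arrays;

/** Hypothetical construction of meta-level training features from base-classifier outputs. */
final class MetaFeatures {

    /**
     * For each base classifier, appends its class-probability vector followed by
     * its confidence, taken here (an assumption) to be its maximum probability.
     */
    static double[] build(double[][] classifierProbabilities) {
        int numClasses = classifierProbabilities[0].length;
        double[] features = new double[classifierProbabilities.length * (numClasses + 1)];
        int k = 0;
        for (double[] probs : classifierProbabilities) {
            double confidence = 0.0;
            for (double p : probs) {
                features[k++] = p;
                confidence = Math.max(confidence, p);
            }
            features[k++] = confidence;
        }
        return features;
    }

    public static void main(String[] args) {
        // Made-up outputs of two base classifiers over two classes for one sample.
        double[][] outputs = {{0.7, 0.3}, {0.4, 0.6}};
        System.out.println(Arrays.toString(build(outputs))); // prints [0.7, 0.3, 0.7, 0.4, 0.6, 0.6]
    }
}
```

Each such vector, paired with the sample's true label, becomes one training row for the downstream regression models, so the ensemble can learn which classifier to trust for which attack type.
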
2.14 AUTOMATIC FEATURE EXTRACTION AND SELECTION
FOR MACHINE LEARNING BASED INTRUSION DETECTION

Jinjie Liu et al. note in this paper that, as mobile technologies and IoT-enabled
devices have become integrated into our daily lives, significant increases in wireless
network traffic generate large volumes of high-dimensional network log data. This
creates security challenges for Wi-Fi network systems, which have to analyze such
complex big data for intrusion detection. Many Wi-Fi network systems employ machine
learning based Intrusion Detection Systems (IDSs). Such IDSs usually adopt supervised
methods that depend heavily on human experts for the feature extraction, feature
selection, and labeling of training data for classification. In this study, using the
recently collected Aegean Wi-Fi Intrusion Dataset (AWID), which contains real traces of
different network attack types, the authors propose an unsupervised approach with
automatic feature extraction and selection that replaces human intervention and manual
labeling in the analysis of large-scale, high-dimensional data. The goal is to improve
classification accuracy in detecting the three most common network attack types
(Injection, Flooding, and Impersonation attacks) in an IDS handling large-scale,
high-dimensional data. The experimental results showed the effectiveness of the
approach for feature extraction and selection, and the quality of the selected
features and the intrusion detection accuracy for the three attack types are compared
and analyzed. Wireless networks such as IEEE 802.11 have been widely deployed to
provide users with mobility and flexibility in the form of high-speed local area
connectivity. The rapid growth of Internet-of-Things (IoT) enabled devices and advances
in mobile technologies have led to a significant increase in wireless network traffic
in recent years. Cisco reported that worldwide mobile data traffic increased 13-fold
over the preceding four years, reaching 11.2 exabytes per month (134 exabytes
annually) in 2017. As IoT devices have become more pervasive and integrated into our
daily lives, wireless network systems have become more vulnerable than ever to both
passive and active attacks. The number of these attacks has grown dramatically,
raising many privacy and security challenges for wireless network systems.
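
The summary does not detail the authors' automatic feature selection procedure, so the following Java sketch shows only a generic unsupervised step that needs no labels: discarding near-constant features whose variance falls below a threshold. It is a swapped-in illustration of label-free selection, not the method of the paper; the class name and threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical label-free filter: keeps only features whose variance exceeds a threshold. */
final class VarianceFilter {

    /** Returns the column indices whose population variance is greater than minVariance. */
    static List<Integer> select(double[][] rows, double minVariance) {
        int numFeatures = rows[0].length;
        List<Integer> kept = new ArrayList<>();
        for (int f = 0; f < numFeatures; f++) {
            double mean = 0.0;
            for (double[] row : rows) {
                mean += row[f];
            }
            mean /= rows.length;
            double variance = 0.0;
            for (double[] row : rows) {
                variance += (row[f] - mean) * (row[f] - mean);
            }
            variance /= rows.length;
            if (variance > minVariance) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

On AWID-style data, many frame fields are constant or almost constant across records, so a filter like this shrinks the feature space cheaply before any heavier unsupervised extraction. Features should be scaled comparably first, since raw variance depends on units.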

2.15 A SURVEY OF CNN-BASED NETWORK INTRUSION DETECTION

Leila Mohammadpour et al. note in this paper that over the past few years, Internet
applications have become more advanced and widely used, which has increased the need
to secure Internet networks. Intrusion detection systems (IDSs) that employ artificial
intelligence (AI) methods are vital to ensuring network security. As a branch of AI,
deep learning (DL) algorithms are now effectively applied in IDSs. Among deep learning
neural networks, the convolutional neural network (CNN) is a well-known structure
designed to process complex data. The CNN overcomes typical limitations of
conventional machine learning approaches and is widely used in IDSs, where several
CNN-based approaches have been employed to handle privacy issues and security threats.
However, to the best of the authors' knowledge, there were no comprehensive surveys of
IDS schemes that use CNNs. Hence, the primary focus of this study is on CNN-based IDSs,
in order to increase understanding of the various uses of the CNN in detecting network
intrusions, anomalies, and other types of attacks. The paper organizes the studied
CNN-IDS approaches into multiple categories and describes their primary capabilities
and contributions. The main features of these approaches, such as the dataset,
architecture, input shape, evaluated metrics, performance, feature extraction, and
classifier method, are compared. Because different CNN-IDS studies use different
datasets, their experimental results are not directly comparable; the study therefore
also conducts an empirical experiment that compares different approaches on standard
datasets and presents the comparative results in detail. Worldwide economic and
business progress is directly tied to the Internet and enterprise networks.
Furthermore, cyber-attacks are becoming increasingly common, which is a significant
security concern. For this reason, technicians and network security specialists are
paying increasing attention to identifying network attacks. Governments and private
organizations require solutions that offer stable performance in protecting their
information assets from unlawful or unwanted access and in preventing and detecting
intrusions. The term “intrusion detection system” (IDS) refers to a system that
monitors and categorizes network flows to determine whether they represent typical
(normal) activity in a network or activity that could threaten the security of
information systems. IDSs play a crucial role in securing networks and computer
systems worldwide and employ several AI techniques, such as machine learning, to
enhance performance against novel cyber-attack challenges. Deep learning has a
significant benefit over traditional machine learning methods because it can
independently detect relevant features in high-dimensional data. Among deep learning
algorithms, the CNN has been widely used by researchers to improve IDS solutions
regarding privacy issues and security threats. Therefore, this study provides a
comprehensive survey of CNN-based IDS schemes. It first provides background, in which
deep learning is briefly discussed and the CNN is explained in detail.
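
The core operation of the CNN-based IDSs surveyed above is the convolution. As a minimal illustration, the following Java sketch applies one 1D convolution filter (valid padding, stride 1, implemented as cross-correlation as deep learning frameworks typically do) to a vector of flow features and then a ReLU activation. Feeding flow features as a 1D sequence is only one of the input shapes the survey compares; the kernel values and class name are hypothetical.

```java
/** Hypothetical single-filter 1D convolution layer with ReLU, as used in CNN feature extractors. */
final class Conv1D {

    /**
     * Slides the kernel over the input without padding (stride 1), adds the bias,
     * and applies ReLU. Output length = input length - kernel length + 1.
     */
    static double[] convolveRelu(double[] input, double[] kernel, double bias) {
        int outLength = input.length - kernel.length + 1;
        double[] out = new double[outLength];
        for (int i = 0; i < outLength; i++) {
            double sum = bias;
            for (int k = 0; k < kernel.length; k++) {
                sum += input[i + k] * kernel[k];
            }
            out[i] = Math.max(0.0, sum); // ReLU activation
        }
        return out;
    }
}
```

With input {1, 2, 3, 4} and kernel {-1, 1}, each output is the difference between neighbouring features, {1, 1, 1}. A trained CNN learns many such kernels, which is how it extracts features without manual engineering.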

2.16 INTRUSION DETECTION SYSTEM FOR IOT BASED ON DEEP LEARNING
AND MODIFIED REPTILE SEARCH ALGORITHM

Abdelghani Dahou et al. propose in this paper a novel framework to improve intrusion
detection system (IDS) performance based on data collected from Internet of Things
(IoT) environments. The framework relies on deep learning and metaheuristic (MH)
optimization algorithms to perform feature extraction and selection. A simple yet
effective convolutional neural network (CNN) is implemented as the core feature
extractor of the framework to learn better and more relevant representations of the
input data in a lower-dimensional space. A new feature selection mechanism is proposed
based on a recently developed MH method called the Reptile Search Algorithm (RSA),
which is inspired by the hunting behavior of crocodiles. The RSA boosts IDS
performance by selecting only the most important features (an optimal subset of
features) from those extracted by the CNN model. Several datasets, including
KDDCup-99, NSL-KDD, CICIDS2017, and Bot-IoT, were used to assess the IDS performance.
The proposed framework achieved competitive classification performance compared with
other well-known optimization methods applied to feature selection problems. The
emerging technology of the Internet of Things (IoT) has been constantly evolving and
exploited over the last couple of years, enabling communication and interaction among
many devices via a network; it is thus propelling new technologies for business
processes. Subsequently, the exponential growth of cybersecurity attacks has brought
several challenges to the fore, for example in finance, in proving credibility, in
enforcement, and in business operations. Cloud computing is normally used for IoT
data storage; it is formulated as a model that supplies various resources and services
to customers on demand. Typically, cloud computing minimizes the human intervention
between users and providers [3]. Due to its impressive features, it has received
serious attention from organizations and users. However, when transitioning from the
current platform to a cloud computing platform, several challenging issues related to
the operating mechanism and security can arise. The vulnerability of cloud computing
is related to the valuable data stored remotely on servers. This security threat
makes it a target for many cybercriminals and intruders and therefore deters many
people from moving to the cloud computing platform. There are several reasons why
recent cyber attacks are growing substantially. One of the main reasons is the
availability of easy-to-use hacking tools, which allow naive hackers to quickly attack
cloud storage without advanced skills or specific knowledge.
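
Metaheuristic feature selectors such as the RSA-based mechanism above search over binary feature masks and need a fitness function to score each candidate subset. A form that is common in the metaheuristic feature selection literature trades classification error against subset size, as in the Java sketch below. Whether this paper uses exactly this fitness, and the value of the weighting parameter alpha, are assumptions; the RSA update rules themselves are not shown, and the class name is hypothetical.

```java
/**
 * Hypothetical wrapper-style fitness for a candidate feature subset (lower is better):
 * alpha * classificationError + (1 - alpha) * (selectedFeatures / totalFeatures).
 */
final class FeatureSubsetFitness {

    static double fitness(boolean[] mask, double classificationError, double alpha) {
        int selected = 0;
        for (boolean chosen : mask) {
            if (chosen) {
                selected++;
            }
        }
        // The first term rewards accuracy; the second penalizes large subsets.
        return alpha * classificationError + (1.0 - alpha) * selected / mask.length;
    }
}
```

With alpha close to 1 (0.99 is a frequent choice), accuracy dominates, and subset size only breaks near-ties: for example, two masks with the same 10% error score 0.104 and 0.1015 when they keep two and one of four features, respectively, so the smaller subset wins.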

2.17 REPRESENTATION LEARNING-BASED NETWORK INTRUSION DETECTION SYSTEM
BY CAPTURING EXPLICIT AND IMPLICIT FEATURE INTERACTIONS

Wei Wang et al. note in this paper that a network intrusion detection system is an
important cyber defense tool for protecting a system from illegal attacks, and that
building an effective network intrusion detection system that makes good use of deep
learning methods is a challenging task. From the object perspective, different types
of malicious attacks have a highly imbalanced distribution, especially compared with
normal network behavior. From the feature perspective, the network behavior
description contains heterogeneous features, including numeric and categorical
features, with complex interactions among them. To address these two challenges, the
authors propose a novel network intrusion detection system based on representation
learning, RL-NIDS, which models network behavior by learning explicit and implicit
feature interactions in both the feature value representation and object
representation spaces. Specifically, RL-NIDS consists of two main modules: an
unsupervised Feature Value Representation Learning module (FVRL), which aims to learn
the interactions among categorical features explicitly, and a supervised Neural
Network for object Representation Learning (NNRL), which aims to learn the implicit
interactions in the representation space. Experiments show the effectiveness of
RL-NIDS, and of the object representation it learns, for multi-class classification on
two real-world datasets. RL-NIDS outperforms state-of-the-art feature-selection-based
and deep-learning-based methods in terms of overall accuracy, precision, recall, and
F1 score. Its classification accuracy on the NSL-KDD and AWID datasets is 81.38% and
95.72%, respectively, improvements of 3.9% and 0.9% over the second-best method.
Moreover, a thorough ablation study demonstrates the contributions of both FVRL and
NNRL, which complement each other in capturing feature interactions. With the
widespread use of the Internet, more and more research focuses on cyber security.
Among the many cybersecurity defense techniques, the network intrusion detection
system (NIDS) is one of the most important tools for actively protecting a system from
illegal external attacks. A traditional NIDS is based on pattern matching, which
compares network patterns against existing malicious patterns that are usually
summarized by humans. Nowadays, an increasing number of researchers apply machine
learning techniques to make intrusion detection more effective. Effective intrusion
detection is essential to cybersecurity, and making good use of deep learning
techniques to build a NIDS is not a trivial task. In this work, the authors propose an
effective NIDS, RL-NIDS, which combines explicit feature interaction learning with
implicit deep representation learning. The explicit feature interaction learning
captures network behavior through multi-grain clustering and highlights abnormal
feature values in the learned embedding in an unsupervised way. The implicit
representation learning is implemented by a neural network constrained by a
classification loss and a triplet loss. A customized triplet generation and learning
process is designed to learn a more discriminative representation and decision
boundaries that overcome the imbalanced data distribution.
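
The triplet loss that constrains the NNRL representation can be sketched in Java using the standard hinge form max(0, d(a, p) - d(a, n) + margin) with squared Euclidean distances, where a is an anchor, p a sample of the same class, and n a sample of a different class. The exact distance and margin used in RL-NIDS are not given here and are assumptions, as is the class name; the customized triplet generation is not shown.

```java
/** Hypothetical triplet-loss helper over embedding vectors. */
final class TripletLoss {

    static double squaredDistance(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Zero once the negative is farther from the anchor than the positive by at
     * least `margin`; otherwise grows with the violation.
     */
    static double loss(double[] anchor, double[] positive, double[] negative, double margin) {
        return Math.max(0.0,
                squaredDistance(anchor, positive) - squaredDistance(anchor, negative) + margin);
    }
}
```

Minimizing this loss pulls same-class embeddings together and pushes other classes at least a margin away, which helps rare attack classes form their own clusters instead of being absorbed by the dominant normal traffic.
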
2.18 CLASSIFICATION BY PAIRWISE COUPLING OF
IMPRECISE PROBABILITIES

Benjamin Quost et al. are interested in this paper in making decisions by combining
classifiers that provide uncertain outputs in the form of sets of probability
distributions. More precisely, each classifier provides lower and upper bounds on the
conditional probabilities of the associated classes. The classifiers are combined by
computing the set of unconditional probability distributions compatible with these
bounds, which involves solving linear optimization problems. When the classifier
outputs are inconsistent, a correcting step restores consistency. The experiments
show the value of this approach for solving multi-class classification problems,
particularly when information is scarce (i.e., when a limited number of classifiers is
available). In this case, modeling the lack of information associated with classifier
outputs gives good results even when the classifiers are poorly regularized or overfit
the data. Among decomposition-and-combination strategies, the use of binary
classifiers has received particular attention. In this case, each sub-problem consists
of separating two (sets of) classes from each other. For example, the one-against-all
decomposition scheme opposes each class to all the others, while the pairwise
strategy (pictured in Figure 1 of the paper for a 4-class problem) opposes each class
to each other class. Both approaches may be generalized within the theoretical
framework of error-correcting output codes. Any kind of classifier can then be used to
solve each of the binary problems (e.g., support vector machines, decision trees,
naive Bayes, logistic regression). Note, however, that binary classifier combination
is known to be beneficial with respect to direct multi-class approaches when
considering a class of simple classifiers (e.g., combining linear classifiers makes it
possible to compute a non-linear decision boundary). On the other hand, combining
sophisticated classification algorithms (such as kernel SVMs, neural networks, or deep
learning models) will not significantly increase classification accuracy compared
with direct multi-class classification. This phenomenon has been pointed out and
further studied in prior work, where the pairwise, one-against-all, and direct
multi-class schemes were shown to perform similarly when well-regularized classifiers
are combined. In this article, the authors present an approach to solve multi-class
classification problems by combining imprecise binary classifiers. Each classifier is
trained to separate two sets of classes of the original training set. It is assumed to
provide bounds on the conditional probabilities that an
instance belongs to the sets of considered classes. The classifiers are combined by
computing the set of probability distributions which are consistent with their outputs.
The bounds are first expressed as constraints on the unconditional probabilities of
the classes. Then, the maximalist rule can be used to determine the set of plausible
(nominated) classes. Alternatively, should a single decision be made, the maximum
rule returns the class with highest lower probability. Inconsistencies are resolved by
discounting the classifier outputs so as to find at least a probability distribution
which is consistent with the classifier outputs.
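The decision step described above can be sketched once per-class lower and upper probability bounds are available. This sketch assumes the bounds have already been computed (the linear optimization that produces them is omitted). It implements the rule that returns the class with the highest lower probability (usually called the maximin rule) and, as a simpler and more cautious stand-in for the maximality rule, interval dominance, which keeps every class whose upper bound is not below another class's lower bound. Interval dominance may nominate more classes than maximality would. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Decisions from imprecise class probabilities [lower[k], upper[k]] (hypothetical helper). */
final class ImpreciseDecision {
    private ImpreciseDecision() {}

    /** Maximin rule: index of the class with the highest lower probability. */
    static int maximin(double[] lower) {
        int best = 0;
        for (int k = 1; k < lower.length; k++) {
            if (lower[k] > lower[best]) {
                best = k;
            }
        }
        return best;
    }

    /**
     * Interval dominance: class k is discarded if some class has a lower bound
     * strictly greater than upper[k]. Used here as a simpler stand-in for the
     * maximality rule; it may keep more classes than maximality.
     */
    static List<Integer> nonDominated(double[] lower, double[] upper) {
        double maxLower = Double.NEGATIVE_INFINITY;
        for (double l : lower) {
            maxLower = Math.max(maxLower, l);
        }
        List<Integer> kept = new ArrayList<>();
        for (int k = 0; k < upper.length; k++) {
            if (upper[k] >= maxLower) {
                kept.add(k);
            }
        }
        return kept;
    }
}
```

With lower bounds {0.5, 0.3, 0.0} and upper bounds {0.7, 0.5, 0.2}, the maximin rule picks class 0, while interval dominance nominates classes 0 and 1, reflecting that the evidence cannot yet rule out class 1.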

2.19 ANOMALY NETWORK-BASED INTRUSION DETECTION
SYSTEM USING A RELIABLE HYBRID ARTIFICIAL BEE
COLONY AND ADABOOST ALGORITHMS

Mehrnaz Mazini et al. observe that intrusion detection systems (IDSs) have been
considered the main component of a secure network. Two problems of these security
systems are false intrusion alarms and limited detection accuracy, both caused by
the high volume of network data. The paper proposes a new, reliable hybrid method for
an anomaly network-based IDS (A-NIDS) that uses the artificial bee colony (ABC) and
AdaBoost algorithms to achieve a high detection rate (DR) with a low false positive
rate (FPR). The ABC algorithm is used for feature selection, and AdaBoost is used to
evaluate and classify the features. Simulation results on the NSL-KDD and
ISCXIDS2012 datasets confirm that this hybrid method differs significantly from other
IDSs evaluated on the same datasets, and it performed better across different
attack-based scenarios. Its accuracy and detection rate improved in comparison with
well-known methods. Due to the increase in Internet attacks, which cause considerable
damage, the security of network activities receives great attention in computer
networks. These networks use various security systems, such as IDSs, to deal with
attacks. IDSs are usually used alongside firewalls and act as a supplement to them.
This security system is used to observe and analyze incidents that seriously violate
or threaten computer security policies in computers and networks (Gautham Raman et
al., 2017). In general, the purpose of an IDS is to detect attacks and security bugs
and report them to the administrator. In the paper, a reliable approach for anomaly
network-based IDS is proposed, and simulations were performed to evaluate the
proposed model on the NSL-KDD and ISCXIDS2012 datasets. Network traffic datasets are
large and unbalanced, which affects the performance of the IDS. The imbalance causes
conventional data-mining algorithms to detect the minority class poorly: by ignoring
instances of this class, they increase overall accuracy, even though correctly
detecting minority-class instances is also important. The proposed approach therefore
uses the AdaBoost meta-algorithm, with an appropriate design, to handle the
unbalanced data; the authors support this choice with the high precision of the
approach in classifying different attack classes. ABC, in turn, is a useful
metaheuristic for optimization problems in IDSs. Because of the strong search ability
of such algorithms, it is used to select the best subset of relevant features for
classifying network connections. The method used to tune the parameters also
influences efficiency: given the large number of records and the data properties, the
number of replicates, or searches for solutions in the search space by the bees, is
of great importance.
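The AdaBoost component described above can be sketched with one-feature decision stumps as weak learners. This is a minimal binary AdaBoost under stated assumptions: labels are +1 (attack) and -1 (normal), features are numeric, and the ABC feature-selection step is omitted, so the caller would pass only the selected feature columns. The class name is hypothetical; this is not the authors' implementation.

```java
/** Minimal binary AdaBoost with decision stumps; labels must be +1 or -1 (sketch only). */
final class AdaBoostStumps {
    private final int[] feature;
    private final double[] threshold;
    private final int[] polarity;   // +1: predict +1 when x > threshold; -1: the reverse
    private final double[] alpha;   // vote weight of each stump
    private int rounds;

    AdaBoostStumps(int maxRounds) {
        feature = new int[maxRounds];
        threshold = new double[maxRounds];
        polarity = new int[maxRounds];
        alpha = new double[maxRounds];
    }

    private static int stump(double value, double thr, int pol) {
        return (value > thr ? 1 : -1) * pol;
    }

    void fit(double[][] x, int[] y) {
        int n = x.length, d = x[0].length;
        double[] w = new double[n];
        java.util.Arrays.fill(w, 1.0 / n); // start with uniform sample weights
        rounds = 0;
        for (int t = 0; t < feature.length; t++) {
            double bestErr = Double.MAX_VALUE, bestThr = 0.0;
            int bestF = 0, bestP = 1;
            // Exhaustive search: every feature, every observed value as threshold, both polarities.
            for (int f = 0; f < d; f++) {
                for (double[] row : x) {
                    for (int p = -1; p <= 1; p += 2) {
                        double err = 0.0;
                        for (int i = 0; i < n; i++) {
                            if (stump(x[i][f], row[f], p) != y[i]) err += w[i];
                        }
                        if (err < bestErr) {
                            bestErr = err; bestF = f; bestThr = row[f]; bestP = p;
                        }
                    }
                }
            }
            double eps = Math.max(bestErr, 1e-10); // avoid division by zero on a perfect stump
            double a = 0.5 * Math.log((1.0 - eps) / eps);
            feature[t] = bestF; threshold[t] = bestThr; polarity[t] = bestP; alpha[t] = a;
            rounds++;
            // Re-weight: misclassified samples gain weight, then normalize to sum 1.
            double z = 0.0;
            for (int i = 0; i < n; i++) {
                w[i] *= Math.exp(-a * y[i] * stump(x[i][bestF], bestThr, bestP));
                z += w[i];
            }
            for (int i = 0; i < n; i++) w[i] /= z;
            if (bestErr == 0.0) break; // training data already separated
        }
    }

    /** Weighted vote of all stumps; returns +1 (attack) or -1 (normal). */
    int predict(double[] sample) {
        double score = 0.0;
        for (int t = 0; t < rounds; t++) {
            score += alpha[t] * stump(sample[feature[t]], threshold[t], polarity[t]);
        }
        return score >= 0 ? 1 : -1;
    }
}
```

The re-weighting step is what makes boosting attractive for unbalanced traffic: minority-class records that earlier stumps misclassify receive larger weights, so later stumps concentrate on them.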

2.20 I-SIAMIDS: IMPROVED SIAM-IDS FOR HANDLING CLASS
IMBALANCE IN NETWORK-BASED INTRUSION DETECTION
SYSTEMS

Punam Bedi et al. note that Network-based Intrusion Detection Systems (NIDSs) identify
malicious activities by analyzing network traffic. NIDSs are trained with samples of
benign and intrusive network traffic. Training samples belong to either majority or
minority classes, depending on the number of available instances. Majority classes
consist of abundant samples of normal traffic and of recurrent intrusions, whereas
minority classes include fewer samples of unknown events or infrequent intrusions.
NIDSs trained on such imbalanced data tend to give predictions biased against
minority attack classes, causing undetected or misclassified intrusions. Past research
handled this class imbalance problem with data-level approaches that either increase
minority-class samples or decrease majority-class samples in the training set.
Although these data-level balancing approaches indirectly improve the performance of
NIDSs, they do not address the underlying issue, namely that NIDSs are unable to
identify attacks that have only limited training data. The paper proposes an
algorithm-level approach called Improved Siam-IDS (I-Siam IDS), a two-layer ensemble
for handling the class imbalance problem. I-Siam IDS identifies both majority and
minority classes at the algorithm level without any data-level balancing techniques.
The first layer of I-Siam IDS uses an ensemble of binary extreme Gradient Boosting
(b-XGBoost), a Siamese Neural Network (Siamese NN) and a Deep Neural Network (DNN) for
hierarchical filtration of input samples to identify attacks. These attacks are then
sent to the second layer of I-Siam IDS for classification into different attack
classes using a multi-class extreme Gradient Boosting classifier (m-XGBoost). Compared
with its counterparts, I-Siam IDS showed significant improvement in Accuracy, Recall,
Precision, F1-score and Area Under the Curve (AUC) on both the NSL-KDD and CIDDS-001
datasets. A computational cost analysis was also performed to study the
acceptability of the proposed I-Siam IDS. In the present digital era, computer
networks have become a crucial part of human life. They not only serve as media for
exchanging digital information but also provide several different services to their
users. This dependence of individuals and organizations on computer networks has made
networks a lucrative target for cyber-attacks. Cyber criminals try to compromise the
confidentiality, integrity and availability of online data and services through
different network intrusions. To identify such intrusions, Intrusion Detection
Systems (IDSs) came into existence. IDSs monitor and analyze online traffic to
separate normal and malicious content. When IDSs are deployed within a network to
identify network-based intrusions, they are known as Network-based Intrusion
Detection Systems (NIDSs). These systems capture online network traffic and analyze
it to detect the presence of attacks. A significant advantage of NIDSs is their
ability to single-handedly monitor the traffic traversing several devices within the
network. A major challenge in developing NIDSs is imbalanced network traffic. This
traffic consists of a large number of samples for benign and/or recurrent intrusions
and a limited number of samples for unknown events and/or infrequent intrusions. An
efficient NIDS must be able to identify all types of intrusions by handling this
class imbalance in network traffic. In the paper, the authors propose I-Siam IDS, a
two-layer ensemble that handles class imbalance with an algorithm-level approach.
I-Siam IDS uses an ensemble of binary extreme Gradient Boosting, Siamese Neural
Network and Deep Neural Network classifiers in the first layer. This layer performs
hierarchical filtration of network data into benign and malicious samples. Filtering
incoming data multiple times through different classifiers minimizes the chances of
malicious traffic going undetected by I-Siam IDS.
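The two-layer flow of I-Siam IDS can be outlined as plain control flow. In this sketch the classifiers are abstracted as Java functional interfaces, the class name and attack labels are hypothetical, and no XGBoost, Siamese NN or DNN is implemented. It assumes that a sample counts as an attack as soon as any first-layer filter flags it (one reading of "hierarchical filtration"), and that only flagged samples reach the multi-class second layer.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Outline of a two-layer detector: binary filters first, then a multi-class attack classifier. */
final class TwoLayerDetector {
    private final List<Predicate<double[]>> firstLayer;    // each returns true if it flags an attack
    private final Function<double[], String> secondLayer;  // maps an attack sample to an attack class

    TwoLayerDetector(List<Predicate<double[]>> firstLayer, Function<double[], String> secondLayer) {
        this.firstLayer = List.copyOf(firstLayer);
        this.secondLayer = secondLayer;
    }

    String classify(double[] sample) {
        // Hierarchical filtration: the sample passes through each filter in order,
        // and the first filter that flags it forwards it to the second layer.
        for (Predicate<double[]> filter : firstLayer) {
            if (filter.test(sample)) {
                return secondLayer.apply(sample);
            }
        }
        return "benign"; // no filter flagged the sample
    }
}
```

Chaining several independent filters trades a possible rise in false alarms for fewer missed attacks, which matches the paper's stated goal of minimizing undetected malicious traffic.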
CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

With the increase in the number and types of network attacks, traditional firewalls
and data encryption methods can no longer meet the needs of current network
security. As a result, intrusion detection systems have been proposed to deal with
network threats. Current mainstream intrusion detection algorithms are aided by
machine learning but suffer from low detection rates and the need for extensive
feature engineering. To address the low detection accuracy, the existing system uses
a traffic anomaly detection model named deep learning model for network intrusion
detection (DLNID), which combines an attention mechanism with a bidirectional long
short-term memory (Bi-LSTM) network. It first extracts sequence features of the
traffic data through a convolutional neural network (CNN), then reassigns the weight
of each channel through the attention mechanism, and finally uses Bi-LSTM to learn
the network of sequence features. Public intrusion detection datasets are generally
seriously imbalanced. To address this imbalance, the existing system uses adaptive
synthetic sampling (ADASYN) to expand the minority-class samples, eventually forming a
relatively balanced dataset, and uses a modified stacked autoencoder for data
dimensionality reduction with the objective of enhancing information fusion.
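ADASYN is a generic oversampling algorithm, so its core steps can be sketched independently of DLNID. The sketch follows the usual formulation under stated assumptions: Euclidean k-nearest neighbours, a caller-chosen total number G of synthetic samples (parameter `g`), and a seeded `java.util.Random` for reproducibility. Each minority sample receives a share of G proportional to the fraction of majority points among its k neighbours, and synthetic points are interpolated towards randomly chosen minority neighbours. Because each share is rounded, the total produced may differ slightly from G. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Sketch of ADASYN oversampling for a two-class numeric dataset. */
final class Adasyn {
    private Adasyn() {}

    private static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Indices of the k points in `pool` closest to `x`, skipping index `self`. */
    private static Integer[] nearest(double[] x, double[][] pool, int self, int k) {
        Integer[] idx = new Integer[pool.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist2(x, pool[i])));
        List<Integer> out = new ArrayList<>();
        for (Integer i : idx) {
            if (i != self && out.size() < k) out.add(i);
        }
        return out.toArray(new Integer[0]);
    }

    static List<double[]> oversample(double[][] minority, double[][] majority, int k, int g, long seed) {
        int m = minority.length;
        // Combined pool: minority points first, so index < m means "minority".
        double[][] all = new double[m + majority.length][];
        System.arraycopy(minority, 0, all, 0, m);
        System.arraycopy(majority, 0, all, m, majority.length);

        // r[i] = fraction of majority samples among the k nearest neighbours of minority[i].
        double[] r = new double[m];
        double sum = 0.0;
        for (int i = 0; i < m; i++) {
            int maj = 0;
            for (int j : nearest(minority[i], all, i, k)) {
                if (j >= m) maj++;
            }
            r[i] = (double) maj / k;
            sum += r[i];
        }
        List<double[]> synthetic = new ArrayList<>();
        if (sum == 0.0 || m < 2) return synthetic; // nothing hard to learn, or no neighbour to use

        Random rnd = new Random(seed);
        for (int i = 0; i < m; i++) {
            int gi = (int) Math.round(r[i] / sum * g);            // samples allotted to minority[i]
            Integer[] nbrs = nearest(minority[i], minority, i, k); // minority-only neighbours
            for (int s = 0; s < gi; s++) {
                double[] z = minority[nbrs[rnd.nextInt(nbrs.length)]];
                double lambda = rnd.nextDouble(); // position along the segment minority[i] -> z
                double[] point = new double[z.length];
                for (int d = 0; d < z.length; d++) {
                    point[d] = minority[i][d] + lambda * (z[d] - minority[i][d]);
                }
                synthetic.add(point);
            }
        }
        return synthetic;
    }
}
```

Unlike plain random oversampling, this concentrates new samples around minority records that sit among majority traffic, which is where a classifier is most likely to misclassify them.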

3.1.1 DRAWBACKS

• DLNID models are computationally expensive to train and deploy, because they
require large amounts of data and powerful hardware to train effectively.
• They require large amounts of labeled data, which can be difficult and expensive
to collect.

• DLNID models are also sensitive to the quality of the training data. If the
training data is biased or noisy, the model may not learn to detect intrusions
accurately.

• These models are black boxes, meaning that it is difficult to understand how
they make predictions. This makes it difficult to debug DLNID models and to
diagnose false positives and false negatives.

3.2 PROPOSED SYSTEM

The proposed system integrates advanced techniques for intrusion detection in the dynamic
cybersecurity landscape. Combining a Probability Model for baseline behavior analysis, a Link-
Anomaly Score computation for identifying suspicious network connections, Change Point
Analysis and Dynamic Time Warping for detecting shifts in statistical properties and temporal
patterns, and the Adaptive Decision Tree-Support Vector Machine (ADT-SVM) algorithm for
accurate classification, the system offers a comprehensive approach to identifying potential
security threats. By leveraging these modules, the proposed system aims to enhance the
adaptability and effectiveness of intrusion detection, providing a robust defense mechanism
against evolving cyber threats. The ADT-SVM algorithm, with its ability to learn and categorize
diverse data attributes, plays a central role in the proposed system, contributing to a more
resilient and responsive cybersecurity framework. The implementation also uses the KDD dataset
as a benchmark to validate the system's performance.

3.2.1 ADVANTAGES

• ADT-SVM's adaptability enhances real-time response to evolving cyber threats.
• Integration of temporal and thermal correlations improves anomaly detection
accuracy.
• Machine learning-based IDS increases efficiency by automating intrusion
detection processes.
• Utilizing the KDD dataset as a benchmark provides standardized performance
evaluation.
• Data-driven approach improves the system's ability to categorize and respond
to diverse intrusion scenarios.

3.3 FEASIBILITY STUDY

A preliminary investigation examines project feasibility, that is, the likelihood that
the system will be useful to the organization. The main objective of the feasibility
study is to test the technical, operational and economic feasibility of adding new
modules and debugging the old running system. Any system is feasible given unlimited
resources and infinite time. The feasibility study portion of the preliminary
investigation covers three aspects:

• Technical Feasibility
• Operational Feasibility
• Economic Feasibility

3.3.1 TECHNICAL FEASIBILITY


The technical issues usually raised during the feasibility stage of the
investigation include the following:

• Does the necessary technology exist to do what is suggested?
• Does the proposed equipment have the technical capacity to hold the data required
to use the new system?
• Will the proposed system provide adequate responses to inquiries, regardless of
the number or location of users?
• Can the system be upgraded once developed?
• Are there technical guarantees of accuracy, reliability, ease of access and data
security?
Earlier, no system existed to cater to the needs of the ‘Secure Infrastructure
Implementation System’. The current system developed is technically feasible. It is
a web-based user interface for audit workflow on a DB2 database, and thus provides
easy access to users. The database's purpose is to create, establish and maintain
a workflow among various entities in order to support all concerned users in their
various capacities or roles. Permissions are granted to users based on their
specified roles.

Therefore, it provides the technical guarantee of accuracy, reliability and
security. The software and hardware requirements for the development of this project
are few and are already available in-house at NIC or freely available as open
source. The work for the project is done with the current equipment and existing
software technology. Sufficient bandwidth exists to provide fast feedback to
users irrespective of the number of users of the system.

3.3.2 OPERATIONAL FEASIBILITY


Proposed projects are beneficial only if they can be turned into information
systems that meet the organization's operating requirements. Operational
feasibility aspects are an important part of project implementation. Important
issues raised when testing the operational feasibility of a project include the
following:

• Is there sufficient support for the project from management and users?
• Will the system be used and work properly once it is developed and
implemented?
• Will there be any resistance from users that would undermine the possible
application benefits?
This system is designed to address the issues mentioned above. The management
issues and user requirements were taken into consideration beforehand, so no user
resistance that could undermine the possible application benefits is expected.

The well-planned design should ensure optimal utilization of computer
resources and help improve performance.

3.3.3 ECONOMIC FEASIBILITY


A system that can be developed technically, and that will be used if installed,
must still be a good investment for the organization. In the economic feasibility
study, the development cost of creating the system is evaluated against the ultimate
benefit derived from the new system. Financial benefits must equal or exceed the costs.

The system is economically feasible. It does not require any additional hardware
or software. Since the interface for this system is developed using the existing
resources and technologies available at NIC, expenditure is nominal and economic
feasibility is assured.
CHAPTER 4

SYSTEM SPECIFICATION

4.1 HARDWARE REQUIREMENTS

CPU type : Intel core i3 processor

Clock speed : 3.0 GHz

RAM size : 8 GB

Hard disk capacity : 40 GB

Keyboard type : Internet Keyboard

CD-drive type : 52x max

4.2 SOFTWARE REQUIREMENTS

Operating System : Windows 10

Front End : JAVA


CHAPTER 5

SOFTWARE DESCRIPTION

5.1 FRONT END

JAVA

The software requirement specification is created at the end of the analysis task. The
function and performance allocated to software as part of system engineering are
developed by establishing a complete information description, a functional
representation, a representation of system behavior, an indication of performance
requirements and design constraints, and appropriate validation criteria.

FEATURES OF JAVA

The Java platform has two components:

➢ The Java Virtual Machine (Java VM)
➢ The Java Application Programming Interface (Java API)

The Java API is a large collection of ready-made software components that provide
many useful capabilities, such as graphical user interface (GUI) widgets. The Java
API is grouped into libraries (packages) of related components.

The following figure depicts a Java program, such as an application or applet, that is
running on the Java platform. As the figure shows, the Java API and Virtual Machine
insulate the Java program from hardware dependencies.
As a platform-independent environment, Java can be a bit slower than native
code. However, smart compilers, well-tuned interpreters, and just-in-time bytecode
compilers can bring Java's performance close to that of native code without
threatening portability.

SOCKET OVERVIEW:

A network socket is a lot like an electrical socket. Various plugs around the
network have a standard way of delivering their payload. Anything that understands
the standard protocol can “plug in” to the socket and communicate.

Internet Protocol (IP) is a low-level routing protocol that breaks data into small
packets and sends them to an address across a network; it does not guarantee
delivery of those packets to the destination.

Transmission Control Protocol (TCP) is a higher-level protocol that manages to
reliably transmit data. A third protocol, User Datagram Protocol (UDP), sits next to
TCP and can be used directly to support fast, connectionless, unreliable transport of
packets.
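As a small illustration of UDP's connectionless model, the sketch below sends one datagram to a socket on the loopback address and receives it, using the standard `java.net.DatagramSocket` and `DatagramPacket` classes. The receiving port is chosen by the operating system (port 0), and the class and method names are hypothetical. Because UDP gives no delivery guarantee, the receiver sets a timeout; the 2-second value is an arbitrary choice.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Sends one UDP datagram over the loopback interface and returns what was received. */
final class UdpLoopback {
    private UdpLoopback() {}

    static String roundTrip(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: OS picks a free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // UDP may drop packets, so never block forever
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            // No connection is set up: each packet carries its own destination address and port.
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[1024];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in); // blocks until a datagram arrives or the timeout expires
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
        }
    }
}
```

Over the network, as opposed to loopback, a real receiver would also need to handle `SocketTimeoutException` and possible duplicate or out-of-order datagrams.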
CLIENT/SERVER:

A server is anything that has some resource that can be shared. There
are compute servers, which provide computing power; print servers, which manage
a collection of printers; disk servers, which provide networked disk space; and web
servers, which store web pages. A client is simply any other entity that wants to gain
access to a particular server.

A server process is said to “listen” to a port until a client connects to it.
A server is allowed to accept multiple clients connected to the same port number,
although each session is unique. To manage multiple client connections, a server
process must be multithreaded or have some other means of multiplexing the
simultaneous I/O.
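One common way to multithread a server, as described above, is a thread pool that handles every accepted `Socket` independently. The sketch below is a minimal echo server under stated assumptions: it binds to the loopback address with the three-argument `ServerSocket` constructor (queue length 50, OS-chosen port), echoes one line per connection, and uses a fixed pool of 4 threads. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Minimal multithreaded echo server: each accepted connection is handled by a pooled thread. */
final class EchoServer implements AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    EchoServer() throws IOException {
        // Port 0 lets the OS pick a free port; 50 is the connection queue length.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept(); // blocks until a client connects
                pool.submit(() -> handle(client));
            } catch (IOException e) {
                return; // server socket closed: stop accepting
            }
        }
    }

    private static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line = in.readLine();
            if (line != null) {
                out.println(line); // echo back to this client only
            }
        } catch (IOException e) {
            // A broken connection ends only this client's session.
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        pool.shutdownNow();
    }
}
```

A bounded pool keeps a burst of connections from creating unbounded numbers of threads, which matters for a monitoring service that may itself be a target of flooding.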

RESERVED SOCKETS:

Once connected, a higher-level protocol ensues, which depends on which port is
being used. TCP/IP reserves the lower 1,024 ports for specific protocols. Port
number 21 is for FTP, 23 is for Telnet, 25 is for e-mail, 79 is for finger, 80 is
for HTTP, 119 is for Netnews, and the list goes on. It is up to each protocol to
determine how a client should interact with the port.

JAVA AND THE NET:

Java supports TCP/IP by extending the already established stream I/O interface.
Java supports both the TCP and UDP protocol families. TCP is used for reliable
stream-based I/O across the network. UDP supports a simpler, hence faster,
point-to-point datagram-oriented model.
INETADDRESS:

The InetAddress class is used to encapsulate both the numerical IP address and the
domain name for that address. Users interact with this class by using the name of an
IP host, which is more convenient and understandable than its IP address. The
InetAddress class hides the number inside. As of Java 2, version 1.4, InetAddress can
handle both IPv4 and IPv6 addresses.

FACTORY METHODS:

The InetAddress class has no visible constructors. To create an InetAddress object,
users call one of the available factory methods. Factory methods are merely a
convention whereby static methods in a class return an instance of that class. This is
done in lieu of overloading a constructor with various parameter lists, because unique
method names make the results much clearer.

Three commonly used InetAddress factory methods are:

1. static InetAddress getLocalHost( ) throws UnknownHostException
2. static InetAddress getByName(String hostName) throws UnknownHostException
3. static InetAddress[ ] getAllByName(String hostName) throws UnknownHostException
INSTANCE METHODS:

The InetAddress class also has several other methods, which can be used on the
objects returned by the methods just discussed. Here are some of the most commonly
used:

1. boolean equals(Object other) - Returns true if this object has the same Internet
address as other.
2. byte[ ] getAddress( ) - Returns a byte array that represents the object's Internet
address in network byte order.
3. String getHostAddress( ) - Returns a string that represents the host address
associated with the InetAddress object.
4. String getHostName( ) - Returns a string that represents the host name associated
with the InetAddress object.
5. boolean isMulticastAddress( ) - Returns true if this Internet address is a
multicast address. Otherwise, it returns false.
6. String toString( ) - Returns a string that lists the host name and the IP address
for convenience.
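The factory and instance methods listed above can be exercised without a DNS lookup by passing literal IP addresses, because `getByName` parses an address literal instead of querying a name service. This sketch uses only standard `java.net.InetAddress` methods; the class name and sample addresses are illustrative. It deliberately avoids `getHostName( )`, which may trigger a reverse DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Demonstrates InetAddress factory and instance methods on literal addresses (no DNS needed). */
final class InetAddressDemo {
    private InetAddressDemo() {}

    static String describe(String literal) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(literal); // factory method; no public constructor
        byte[] raw = addr.getAddress();                    // 4 bytes for IPv4, 16 for IPv6
        return addr.getHostAddress()
                + " bytes=" + raw.length
                + " multicast=" + addr.isMulticastAddress();
    }
}
```

For example, `describe("224.0.0.1")` reports a 4-byte multicast address, while `describe("::1")` returns a 16-byte IPv6 loopback address, showing that the same class handles both address families.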

TCP/IP CLIENT SOCKETS:

TCP/IP sockets are used to implement reliable, bidirectional, persistent,
point-to-point, stream-based connections between hosts on the Internet. A socket can
be used to connect Java's I/O system to other programs that may reside either on the
local machine or on any other machine on the Internet.
There are two kinds of TCP sockets in Java: one for servers and the other for
clients. The ServerSocket class is designed to be a “listener,” which waits for
clients to connect before doing anything. The Socket class is designed to connect to
server sockets and initiate protocol exchanges.

The creation of a Socket object implicitly establishes a connection between the
client and server. There are no methods or constructors that explicitly expose the
details of establishing that connection. Here are two constructors used to create
client sockets:

Socket(String hostName, int port) - Creates a socket connecting the local host to the
named host and port; can throw an UnknownHostException or an IOException.

Socket(InetAddress ipAddress, int port) - Creates a socket using a preexisting
InetAddress object and a port; can throw an IOException.

A socket can be examined at any time for the address and port information associated
with it, by use of the following methods:

➢ InetAddress getInetAddress( ) - Returns the InetAddress associated with the Socket
object.
➢ int getPort( ) - Returns the remote port to which this Socket object is connected.
➢ int getLocalPort( ) - Returns the local port to which this Socket object is bound.

Once the Socket object has been created, it can also be examined to gain access to the
input and output streams associated with it. Each of these methods can throw an
IOException if the socket has been invalidated by a loss of connection on the Net.
InputStream getInputStream( ) - Returns the InputStream associated with the invoking
socket.

OutputStream getOutputStream( ) - Returns the OutputStream associated with the
invoking socket.
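The client-side calls above can be tried against a throwaway `ServerSocket` on the loopback address, so no external host is needed. In this sketch (hypothetical class and method names), the client connects with the `Socket(InetAddress, int)` constructor, checks `getPort( )` and `getLocalPort( )`, and reads one line that the server writes through its accepted socket's output stream. A single thread suffices because the operating system completes the TCP handshake and queues the connection before `accept( )` is called.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Connects a client Socket to a local ServerSocket and reads one greeting line. */
final class TcpClientDemo {
    private TcpClientDemo() {}

    static String fetchGreeting() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort()); // connection is made here
             Socket accepted = server.accept()) {
            // The client's remote port is the server's listening port;
            // its own local port is an ephemeral one chosen by the OS.
            if (client.getPort() != server.getLocalPort() || client.getLocalPort() == 0) {
                throw new IllegalStateException("unexpected port information");
            }
            OutputStream out = accepted.getOutputStream();
            out.write("hello from server\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        }
    }
}
```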

TCP/IP SERVER SOCKETS:

Java has a different socket class that must be used for creating server
applications. The ServerSocket class is used to create servers that listen for either
local or remote client programs to connect to them on published ports. ServerSockets
are quite different from normal Sockets.

When the user creates a ServerSocket, it registers itself with the system as having
an interest in client connections.

➢ ServerSocket(int port) - Creates a server socket on the specified port with a
queue length of 50.
➢ ServerSocket(int port, int maxQueue) - Creates a server socket on the specified
port with a maximum queue length of maxQueue.
➢ ServerSocket(int port, int maxQueue, InetAddress localAddress) - Creates a
server socket on the specified port with a maximum queue length of
maxQueue. On a multihomed host, localAddress specifies the IP address to
which this socket binds.
➢ ServerSocket has a method called accept( ), which is a blocking call that
waits for a client to initiate communications and then returns a normal
Socket that is used for communication with the client.
URL:
The Web is a loose collection of higher-level protocols and file formats, all
unified in a web browser. One of the most important aspects of the Web is that Tim
Berners-Lee devised a saleable way to locate all of the resources of the Net. The
Uniform Resource Locator (URL) is used to name anything and everything reliably.

The URLprovides a reasonably intelligible form to uniquely identify or


address information on the Internet. URLs are ubiquitous; every browser uses them
to identify information on the Web.
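A URL's components can be inspected without any network access. The sketch below uses `java.net.URI`, which parses and validates the string; `URI.toURL( )` would convert it to a `java.net.URL` when a connection is actually needed. The class name and example URL are illustrative.

```java
import java.net.URI;

/** Splits a URL string into its main components using java.net.URI (no network access). */
final class UrlParts {
    private UrlParts() {}

    static String[] parts(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        return new String[] {
            uri.getScheme(),               // protocol, e.g. "http"
            uri.getHost(),                 // host name
            String.valueOf(uri.getPort()), // -1 when no port is given
            uri.getPath()                  // resource path on the host
        };
    }
}
```

For `http://www.example.com:8080/docs/index.html`, this yields the scheme `http`, host `www.example.com`, port `8080` and path `/docs/index.html`.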
CHAPTER 6

PROJECT DESCRIPTION

6.1 PROBLEM DEFINITION

Traditional network security measures such as firewalls and data encryption are no
longer sufficient to protect networks from the increasing number and types of cyber-
attacks. Intrusion detection systems (IDSs) have been proposed to address this
challenge, but they typically suffer from low detection rates and the need for
extensive feature engineering. Deep learning models have the potential to overcome
these challenges and provide more effective intrusion detection. Deep learning
models can learn complex patterns in network traffic data and detect new and
emerging threats without the need for extensive feature engineering. However, deep
learning models also have several drawbacks, including high computational cost,
data requirements, lack of interpretability, and vulnerability to adversarial attacks.

6.2 MODULE DESCRIPTION

6.2.1 PROBABILITY MODEL


This module involves the development and application of a probability model for
analyzing network data. The probability model assesses the likelihood of
certain events or patterns within the data, providing a foundational understanding of
the baseline behavior. By establishing a probability distribution, anomalies can be
identified as deviations from expected patterns, enabling the system to flag
potentially malicious activities.
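One simple way to realize such a baseline, assumed here rather than taken from the project code, is an independent Gaussian per numeric feature fitted on normal traffic only: a record is flagged when its log-likelihood falls below a threshold. The class name, the variance floor of 1e-6 and the threshold are hypothetical choices that would need tuning on real data.

```java
/** Baseline model: independent Gaussian per feature, fitted on normal traffic only (sketch). */
final class GaussianBaseline {
    private final double[] mean;
    private final double[] var;

    GaussianBaseline(double[][] normalRecords) {
        int d = normalRecords[0].length, n = normalRecords.length;
        mean = new double[d];
        var = new double[d];
        for (double[] r : normalRecords) {
            for (int j = 0; j < d; j++) mean[j] += r[j] / n;
        }
        for (double[] r : normalRecords) {
            for (int j = 0; j < d; j++) var[j] += (r[j] - mean[j]) * (r[j] - mean[j]) / n;
        }
        for (int j = 0; j < d; j++) var[j] = Math.max(var[j], 1e-6); // floor avoids zero variance
    }

    /** Log-likelihood of a record under the baseline; lower means less expected. */
    double logLikelihood(double[] x) {
        double ll = 0.0;
        for (int j = 0; j < x.length; j++) {
            double z = x[j] - mean[j];
            ll += -0.5 * Math.log(2 * Math.PI * var[j]) - z * z / (2 * var[j]);
        }
        return ll;
    }

    /** Flags a record whose log-likelihood is below the chosen threshold. */
    boolean isAnomalous(double[] x, double threshold) {
        return logLikelihood(x) < threshold;
    }
}
```

The independence assumption keeps the model cheap to fit, at the cost of ignoring correlations between features.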
6.2.2 COMPUTING THE LINK-ANOMALY SCORE
In this module, the system calculates link-anomaly scores to quantify the
abnormality of network links or connections. The computation involves analyzing
various attributes associated with network links, such as traffic patterns,
communication frequencies, or data transfer volumes. A higher link-anomaly score
may indicate suspicious or anomalous behavior, directing the attention of the
intrusion detection system to potential security threats within the network.
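As an illustration only, a link-anomaly score can be defined as the negative log-probability of a link under a model estimated from past traffic, so that rarely seen links score high. The sketch assumes links are identified by hypothetical `"src->dst"` strings and uses add-one (Laplace) smoothing; the project's own score may also weigh attributes such as traffic volume or communication frequency, which are not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Scores network links by how surprising they are given past link counts (sketch). */
final class LinkAnomalyScorer {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    /** Records one observed connection, e.g. "10.0.0.5->10.0.0.9" (hypothetical key format). */
    void observe(String link) {
        counts.merge(link, 1, Integer::sum);
        total++;
    }

    /**
     * Score = -log P(link) with add-one smoothing, so unseen links get a finite,
     * high score. The extra "+1" in the denominator reserves mass for unseen links.
     */
    double score(String link) {
        int c = counts.getOrDefault(link, 0);
        double p = (c + 1.0) / (total + counts.size() + 1.0);
        return -Math.log(p);
    }
}
```

After observing `a->b` nine times and `c->d` once, an unseen link such as `x->y` receives the highest score and `a->b` the lowest, matching the idea that higher scores direct attention to unusual connections.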

6.2.3 CHANGE POINT ANALYSIS AND DTO


This module focuses on change point analysis and Dynamic Time Warping (DTO)
techniques. Change point analysis aims to identify shifts or deviations in the
statistical properties of the data, signaling potential security incidents. DTO, on the
other hand, involves measuring the similarity between sequences over time, aiding
in the detection of variations in temporal patterns. Integrating these methods
enhances the system's ability to adapt to evolving cyber threats and identify
deviations from normal behavior.
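Both techniques named above have compact textbook forms. The sketch pairs a one-sided CUSUM detector for an upward shift in the mean (one common change-point method, chosen here as an assumption) with the classic dynamic-programming Dynamic Time Warping distance using absolute difference as the local cost. The reference mean, slack `k` and threshold `h` are hypothetical tuning parameters, and the class name is illustrative.

```java
/** Textbook CUSUM change-point detection and Dynamic Time Warping distance (sketch). */
final class TemporalAnalysis {
    private TemporalAnalysis() {}

    /**
     * One-sided CUSUM: accumulates deviations above (mean + k) and returns the first
     * index where the cumulative sum exceeds h, or -1 if no upward shift is detected.
     */
    static int cusumChangePoint(double[] series, double mean, double k, double h) {
        double s = 0.0;
        for (int t = 0; t < series.length; t++) {
            s = Math.max(0.0, s + series[t] - mean - k); // reset to 0 while the series looks normal
            if (s > h) return t;
        }
        return -1;
    }

    /** DTW distance between two sequences, allowing one to be locally stretched. */
    static double dtw(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: match, insertion, deletion.
                cost[i][j] = local + Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }
}
```

For example, DTW gives a distance of 0 between `{1, 2, 3}` and `{1, 2, 2, 3}`, because warping absorbs the repeated value, whereas a point-by-point comparison could not even align sequences of different lengths.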

6.2.4 ADT-SVM DETECTION METHOD


The ADT-SVM Detection Method module implements the Adaptive Decision Tree-
Support Vector Machine (ADT-SVM) algorithm for intrusion detection. This
algorithm combines the adaptability of decision trees with the classification power
of support vector machines. The ADT-SVM model is trained on labeled data,
learning to distinguish between normal and anomalous network behavior. Once
trained, it is employed to categorize incoming data attributes into predefined classes,
such as Basic, Content, Traffic, and Host, facilitating the identification of potential
security threats within the network. The module also involves fine-tuning and
optimizing the ADT-SVM parameters for the best detection performance.
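The document does not specify how the decision tree and the SVM are combined, so the following is only one plausible reading, stated as an assumption: a one-level tree (a threshold on a single routing feature) sends each record to one of two linear SVMs, each trained with the Pegasos stochastic sub-gradient method on the hinge loss. Labels are +1 (intrusion) and -1 (normal). The class name, routing feature, threshold, regularization λ (`lambda`) and epoch count are hypothetical and would need tuning.

```java
import java.util.Random;

/** One plausible ADT-SVM reading: a depth-1 tree routes records to per-branch linear SVMs (sketch). */
final class AdtSvmSketch {
    private final int routeFeature;
    private final double routeThreshold;
    private final double[][] weights = new double[2][]; // one linear SVM per branch

    AdtSvmSketch(int routeFeature, double routeThreshold) {
        this.routeFeature = routeFeature;
        this.routeThreshold = routeThreshold;
    }

    private int branch(double[] x) {
        return x[routeFeature] <= routeThreshold ? 0 : 1;
    }

    /** Appends a constant 1 so the bias is learned (and regularized) as an ordinary weight. */
    private static double[] augment(double[] x) {
        double[] a = java.util.Arrays.copyOf(x, x.length + 1);
        a[x.length] = 1.0;
        return a;
    }

    private static double dot(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    /** Trains each branch's SVM with Pegasos; y[i] must be +1 or -1. */
    void fit(double[][] x, int[] y, double lambda, int epochs, long seed) {
        Random rnd = new Random(seed);
        for (int b = 0; b < 2; b++) {
            double[] w = new double[x[0].length + 1];
            int t = 0;
            for (int e = 0; e < epochs; e++) {
                for (int s = 0; s < x.length; s++) {
                    int i = rnd.nextInt(x.length);
                    if (branch(x[i]) != b) continue; // each SVM sees only its branch
                    t++;
                    double eta = 1.0 / (lambda * t);
                    double[] xi = augment(x[i]);
                    boolean violates = y[i] * dot(w, xi) < 1.0; // inside the margin
                    for (int j = 0; j < w.length; j++) {
                        w[j] *= (1.0 - eta * lambda);             // regularization shrink
                        if (violates) w[j] += eta * y[i] * xi[j]; // hinge-loss sub-gradient step
                    }
                }
            }
            weights[b] = w;
        }
    }

    int predict(double[] x) {
        return dot(weights[branch(x)], augment(x)) >= 0 ? 1 : -1;
    }
}
```

The routing step lets each SVM specialize on a region of the feature space where a linear boundary may suffice, which is one way to read the "adaptability of decision trees combined with the classification power of SVMs".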
6.3 SYSTEM FLOW DIAGRAM

Figure 3: System flow diagram (Loading Dataset → Preprocessing → Feature Selection → Computing Anomaly Score Based On Selected Features → Detecting Threats Using ADT-SVM Method → Result)
6.4 INPUT DESIGN

In the context of the research project focused on cybersecurity and intrusion
detection using the ADT-SVM algorithm, the input design involves carefully
structuring the data fed into the system. The design encompasses the selection and
preprocessing of relevant network datasets, including the KDD dataset as a
benchmark. Emphasis is placed on capturing temporal and thermal correlations
within the data, ensuring that the input reflects the dynamic nature of cyber threats.
Data attributes are categorized into four classes: Basic, Content, Traffic, and Host,
creating a comprehensive representation of network behavior. The success of the
input design is critical for the effective functioning of the ADT-SVM algorithm,
enabling the Intrusion Detection System (IDS) to learn and adapt to diverse intrusion
scenarios while providing a foundation for robust performance evaluation using
metrics such as Detection Rate and False Alarm Rate.
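In the KDD Cup 1999 data, the four attribute groups correspond to well-known feature families: basic connection features, content features, time-based traffic features and host-based traffic features. The sketch below maps the 41 standard KDD feature names to these groups; the class and method names are hypothetical, and the project's preprocessing may name or encode the attributes differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Groups KDD Cup 1999 attribute names into the Basic, Content, Traffic and Host families. */
final class KddFeatureGroups {
    enum Group { BASIC, CONTENT, TRAFFIC, HOST }

    private static final Map<String, Group> GROUP_OF = new HashMap<>();

    private static void put(Group g, List<String> names) {
        for (String n : names) GROUP_OF.put(n, g);
    }

    static {
        put(Group.BASIC, List.of("duration", "protocol_type", "service", "flag", "src_bytes",
                "dst_bytes", "land", "wrong_fragment", "urgent"));
        put(Group.CONTENT, List.of("hot", "num_failed_logins", "logged_in", "num_compromised",
                "root_shell", "su_attempted", "num_root", "num_file_creations", "num_shells",
                "num_access_files", "num_outbound_cmds", "is_host_login", "is_guest_login"));
        // Traffic features are computed over connections in a recent time window.
        put(Group.TRAFFIC, List.of("count", "srv_count", "serror_rate", "srv_serror_rate",
                "rerror_rate", "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
                "srv_diff_host_rate"));
        // Host features are computed over recent connections to the same destination host.
        put(Group.HOST, List.of("dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
                "dst_host_diff_srv_rate", "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
                "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
                "dst_host_srv_rerror_rate"));
    }

    /** Returns the group of a feature name, or null if the name is not a KDD feature. */
    static Group groupOf(String feature) {
        return GROUP_OF.get(feature);
    }

    /** Number of mapped feature names. */
    static int size() {
        return GROUP_OF.size();
    }
}
```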

6.5 OUTPUT DESIGN

The output design in the context of the cybersecurity and intrusion detection project
utilizing the ADT-SVM algorithm involves the systematic presentation and
interpretation of results generated by the Intrusion Detection System (IDS). The
output includes categorizations of incoming data into four classes: Basic, Content,
Traffic, and Host, allowing for a granular understanding of network behavior.
Detection and classification outcomes are presented through visualizations or reports
that highlight instances of identified intrusions and false alarms.
CHAPTER 7

7. SYSTEM TESTING AND IMPLEMENTATION

7.1 SYSTEM TESTING

System testing in the cybersecurity project employing the ADT-SVM algorithm
involves a comprehensive evaluation of the entire Intrusion Detection System (IDS).
This phase verifies the functionality, performance, and reliability of the IDS by
subjecting it to various test cases and scenarios. The testing process includes
assessing how well the ADT-SVM algorithm adapts to dynamic cyber threats and
checking that it correctly handles data attributes from the four designated
categories: Basic, Content, Traffic, and Host. The KDD dataset is used to approximate
real-world conditions and to benchmark the system's performance. System testing also
evaluates key metrics such as Detection Rate (DR) and False Alarm Rate (FAR) to gauge
how accurately the IDS identifies intrusions while minimizing false positives.
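Detection Rate and False Alarm Rate follow directly from the confusion matrix: DR = TP / (TP + FN) is the fraction of actual attacks that were flagged, and FAR = FP / (FP + TN) is the fraction of normal records wrongly flagged. The sketch below computes both from binary label arrays, assuming 1 marks an attack and 0 marks normal traffic; the class name IdsMetrics is hypothetical.

```java
import java.util.Locale;

public class IdsMetrics {

    /** Returns {TP, FN, FP, TN} for binary labels (1 = attack, 0 = normal). */
    static int[] confusion(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean attack = actual[i] == 1;
            boolean flagged = predicted[i] == 1;
            if (attack && flagged) tp++;
            else if (attack) fn++;
            else if (flagged) fp++;
            else tn++;
        }
        return new int[] {tp, fn, fp, tn};
    }

    /** Detection Rate: share of actual attacks that were flagged, TP / (TP + FN). */
    static double detectionRate(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** False Alarm Rate: share of normal records that were flagged, FP / (FP + TN). */
    static double falseAlarmRate(int fp, int tn) {
        return fp + tn == 0 ? 0.0 : (double) fp / (fp + tn);
    }

    public static void main(String[] args) {
        int[] actual    = {1, 1, 1, 1, 0, 0, 0, 0, 0};
        int[] predicted = {1, 1, 1, 0, 0, 0, 0, 0, 1};
        int[] c = confusion(actual, predicted);
        System.out.println(String.format(Locale.ROOT, "DR = %.2f, FAR = %.2f",
                detectionRate(c[0], c[1]), falseAlarmRate(c[2], c[3])));
        // prints: DR = 0.75, FAR = 0.20
    }
}
```

With the toy labels above, TP = 3, FN = 1, FP = 1 and TN = 4, so one missed attack lowers DR and one false alarm raises FAR. Returning 0.0 for an empty denominator is a design choice of this sketch.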

7.2 SYSTEM IMPLEMENTATION

System implementation in the cybersecurity project using the ADT-SVM algorithm
involves deploying and running the developed Intrusion Detection System (IDS) in a
real-world or simulated environment. This phase translates the research findings and
algorithmic models into a functional system capable of actively monitoring and
analyzing network data. The ADT-SVM algorithm is integrated into the IDS
architecture to classify incoming data described by attributes from the predefined
Basic, Content, Traffic, and Host categories. The implementation process also uses
the KDD dataset as a benchmark to validate the system's performance.
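Once trained, the model has to be applied to records as they arrive. The loop below is a minimal sketch of that monitoring step, assuming each record is one line of comma-separated numeric features. The detector is passed in as a Predicate so that a real model (for example one wrapping libsvm's svm.svm_predict) could be plugged in; the threshold rule in main is only a toy stand-in, not the ADT-SVM model, and the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class IdsMonitor {

    /**
     * Reads one record per line (comma-separated numeric features), asks the
     * detector about each record, prints an alert for flagged records, and
     * returns their 1-based line numbers. Blank lines are skipped.
     */
    static List<Integer> monitor(BufferedReader in, Predicate<double[]> isAttack) {
        List<Integer> alerts = new ArrayList<>();
        try {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] parts = line.split(",");
                double[] features = new double[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    features[i] = Double.parseDouble(parts[i].trim());
                }
                if (isAttack.test(features)) {
                    alerts.add(lineNo);
                    System.out.println("ALERT: record " + lineNo + " classified as intrusion");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Toy detector, NOT the ADT-SVM model: flag records whose first feature exceeds 1000.
        Predicate<double[]> toyDetector = f -> f[0] > 1000;
        String traffic = "12,0.5\n5000,0.1\n40,0.9\n";
        List<Integer> alerts = monitor(new BufferedReader(new StringReader(traffic)), toyDetector);
        System.out.println(alerts);   // prints [2]
    }
}
```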
CHAPTER 8

SYSTEM MAINTENANCE

The objective of maintenance is to keep the system working at all times without
bugs. Provision must be made for environmental changes that may affect the computer
or software system; this is called system maintenance. Software technology changes
rapidly, so the system must be capable of adapting to these changes. In this project,
new processes can be added without affecting other parts of the system. Maintenance
therefore plays a vital role: the system can accept modifications after its
implementation and has been designed to accommodate new changes without affecting
its performance or accuracy.

Maintenance is necessary to eliminate errors in the system during its working life
and to tune the system to variations in its working environment. In practice, some
errors are always found in a system after deployment, and they must be recorded and
corrected. Maintenance also includes reviewing the system from time to time.

The review of the system is done for:

▪ Knowing the full capabilities of the system.

▪ Knowing the required changes or the additional requirements.

▪ Studying the performance.

TYPES OF MAINTENANCE:

• Corrective maintenance

• Adaptive maintenance
• Perfective maintenance

• Preventive maintenance

8.1 CORRECTIVE MAINTENANCE

Corrective maintenance consists of changes made to a system to repair flaws in its
design, coding, or implementation, which may require changes to the software's
design. It is applied to correct errors that occur during operation. For example,
if the user enters an invalid file type while submitting information in a
particular field, the system displays an error message so that the user can
correct the input.
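The invalid-file-type check described above can be as simple as the following sketch. It assumes, as the appendix code does, that datasets are supplied as .csv files; the class name and message texts are hypothetical.

```java
import java.io.File;
import java.util.Locale;
import java.util.Optional;

public class InputFileValidator {

    /**
     * Checks a dataset file supplied by the user. Returns the error message to
     * display, or an empty Optional if the file looks usable.
     */
    static Optional<String> validate(File file) {
        String name = file.getName().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".csv")) {
            return Optional.of("Invalid file type: " + file.getName() + " (expected a .csv file)");
        }
        if (!file.isFile()) {
            return Optional.of("File not found: " + file.getPath());
        }
        if (!file.canRead()) {
            return Optional.of("File cannot be read: " + file.getPath());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String path : new String[] {"traffic.txt", "missing_train.csv"}) {
            // Show each problem to the user instead of failing later inside the loader.
            validate(new File(path)).ifPresent(msg -> System.err.println("Error: " + msg));
        }
    }
}
```

Checking the type first means the user gets the most specific message, and the loader is only called with files that passed every check.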

Maintenance is a major source of income. Nevertheless, even today many
organizations assign maintenance to unsupervised beginners and less competent
programmers.

The user’s problems are often caused by the individuals who developed the product,
not by the maintainer, and the code itself may be badly written. Maintenance is
despised by many software developers, yet unless good maintenance service is
provided, the client will take future development business elsewhere. Maintenance
is the most important phase of software production, as well as the most difficult
and the most thankless.
8.2 ADAPTIVE MAINTENANCE

Adaptive maintenance covers changes made to a system so that its functionality
evolves with changing business needs or technologies. If any module is modified,
the software accommodates the modification. If the user changes the server, the
project adapts to the change, and the new server performs the same work as the
existing one.
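One common way to let the system follow a server or storage change without code edits is to read file locations from configuration instead of hard-coding names such as train1.csv, test1.csv and train1.model, as the appendix code does. The sketch below uses java.util.Properties with hypothetical keys (train.file, test.file, model.file) and falls back to the current names when a key is absent; the class name and the example path are also hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class IdsConfig {

    final String trainFile;
    final String testFile;
    final String modelFile;

    IdsConfig(Properties p) {
        // Defaults are the file names currently hard-coded in the appendix code.
        trainFile = p.getProperty("train.file", "train1.csv");
        testFile = p.getProperty("test.file", "test1.csv");
        modelFile = p.getProperty("model.file", "train1.model");
    }

    /** Loads key=value settings, for example from a properties file on the new server. */
    static IdsConfig load(Reader source) {
        Properties p = new Properties();
        try {
            p.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IdsConfig(p);
    }

    public static void main(String[] args) {
        IdsConfig cfg = load(new StringReader("train.file=/data/nsl/train.csv\n"));
        System.out.println(cfg.trainFile + " " + cfg.testFile + " " + cfg.modelFile);
        // prints: /data/nsl/train.csv test1.csv train1.model
    }
}
```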

8.3 PERFECTIVE MAINTENANCE

Perfective maintenance means changes made to a system to add new features or
improve performance. It refines the system and preserves its special features by
enhancing performance or modifying programs in response to users' needs or
changing requirements. Additional functionality can easily be added to the
proposed system; in this project, if the user wants to improve performance
further, the software can be easily upgraded.

8.4 PREVENTIVE MAINTENANCE

Preventive maintenance involves changes made to a system to reduce the chance of
future system failure. Errors that might occur are forecast and prevented with
suitable preventive measures. In this project, if the user wants to improve the
performance of any process, new features can be added to the system.
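As a preventive measure, a dataset can be checked before it is converted or used for training. The appendix code expects a CSV whose first line holds column names, whose second line holds column types ("dis" marks a discrete column), and whose remaining lines are data rows; a malformed row would otherwise surface only as an exception partway through a run. The checker below is a sketch under that assumption; the class name, the messages, and the "con" type used in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DatasetChecker {

    /**
     * Checks CSV lines laid out as: column names, column types ("dis" marks a
     * discrete column), then data rows. Returns the problems found; an empty
     * list means every row has the right width and numeric cells where needed.
     */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        if (lines.size() < 3) {
            problems.add("expected a name row, a type row and at least one data row");
            return problems;
        }
        String[] names = lines.get(0).split(",");
        String[] types = lines.get(1).split(",");
        if (types.length != names.length) {
            problems.add("type row has " + types.length + " columns, header has " + names.length);
            return problems;
        }
        for (int r = 2; r < lines.size(); r++) {
            String line = lines.get(r);
            int lineNo = r + 1;                     // 1-based, as shown in an editor
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] cells = line.split(",");
            if (cells.length != names.length) {
                problems.add("line " + lineNo + ": " + cells.length + " columns, expected " + names.length);
                continue;
            }
            for (int c = 0; c < cells.length; c++) {
                // Non-discrete columns are parsed as numbers by the conversion code.
                if (types[c].trim().equals("dis")) {
                    continue;
                }
                try {
                    Double.parseDouble(cells[c].trim());
                } catch (NumberFormatException e) {
                    problems.add("line " + lineNo + ", column " + names[c].trim() + ": not numeric");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
                "duration,protocol,class",
                "con,dis,dis",
                "x,tcp,normal",
                "5,udp");
        check(csv).forEach(System.out::println);
        // line 3, column duration: not numeric
        // line 4: 2 columns, expected 3
    }
}
```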
CHAPTER 9

9. CONCLUSION

In conclusion, the presented cybersecurity framework, incorporating modules such
as the Probability Model, Link-Anomaly Score computation, Change Point Analysis
with Dynamic Time Warping, and the Adaptive Decision Tree-Support Vector
Machine (ADT-SVM) algorithm, constitutes a comprehensive and adaptive
intrusion detection system. By addressing the dynamic challenges in the cyber threat
landscape, this system leverages probabilistic analysis, anomaly scoring, and
machine learning to effectively identify potential security threats. The integration of
advanced techniques and the utilization of the ADT-SVM algorithm contribute to
the system's ability to adapt and learn from evolving cyber threats. The proposed
framework not only offers a multi-faceted approach to intrusion detection but also
emphasizes the importance of continual adaptation in the face of emerging
cybersecurity challenges.

FUTURE WORK

Future work in this domain could focus on refining and extending the proposed
cybersecurity framework to address emerging challenges. Further exploration of
advanced machine learning models, beyond ADT-SVM, could enhance the system's
detection capabilities. Investigating the integration of threat intelligence feeds and
real-time network monitoring technologies could contribute to a more proactive
defense mechanism. Additionally, incorporating mechanisms for self-learning and
adaptation to new attack vectors would be crucial for staying ahead of evolving
threats.
CHAPTER 10

APPENDICES

10.1 SOURCE CODE

DECISION TREE.JAVA

package adt;

import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import libsvm.svm;
import libsvm.svm_model;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

/**
 * ADT-SVM pipeline: a J48 decision tree is trained on train1.csv, its
 * predictions for test1.csv are stored in newCls, and those predictions are
 * then used as the labels of the SVM test file.
 *
 * @author admin
 */
public class DecisionTree {

    Details dt = new Details();

    // Class index predicted by the J48 tree for each row of test1.csv.
    ArrayList<Integer> newCls = new ArrayList<>();

    public void construct() {
        try {
            CSVLoader csv1 = new CSVLoader();
            csv1.setSource(new File("train1.csv"));
            Instances trdata = csv1.getDataSet();
            trdata.setClassIndex(trdata.numAttributes() - 1);   // last column is the class

            J48 tree = new J48();
            tree.buildClassifier(trdata);

            CSVLoader csv2 = new CSVLoader();
            csv2.setSource(new File("test1.csv"));
            Instances tedata = csv2.getDataSet();
            tedata.setClassIndex(tedata.numAttributes() - 1);

            // Evaluate on the held-out test set rather than on the training data.
            Evaluation eval = new Evaluation(trdata);
            eval.evaluateModel(tree, tedata);

            for (int i = 0; i < tedata.numInstances(); i++) {
                newCls.add((int) tree.classifyInstance(tedata.instance(i)));
            }

            ocSVM();
            System.out.println(eval.toClassDetailsString());
            ADTocSVM();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // SVM baseline: the test file keeps the class labels stored in test2.csv.
    public void ocSVM() {
        try {
            trainSvm();
            readData("test2.csv");                      // writes test1.txt
            predictSvm("test1.txt", "Res1.txt");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // ADT-SVM: the test file is labelled with the J48 predictions in newCls.
    public void ADTocSVM() {
        try {
            trainSvm();
            convertData2("test2.csv");                  // writes newtest1.txt
            predictSvm("newtest1.txt", "Res1.txt");     // overwrites the baseline's Res1.txt
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Converts train2.csv with the project's SVMData class and runs SVMTrain,
    // which is expected to write train1.model (both classes are defined elsewhere).
    private void trainSvm() throws Exception {
        SVMData svm1 = new SVMData();
        svm1.readTrData("train2.csv");
        svm1.convertTrData("train1.txt");
        new SVMTrain().run();
    }

    // Loads train1.model and lets the project's SVMPredict class classify testFile.
    private void predictSvm(String testFile, String outFile) throws Exception {
        int predictProbability = 0;                     // probability estimates disabled
        svm_model model = svm.svm_load_model("train1.model");
        if (predictProbability == 1) {
            if (svm.svm_check_probability_model(model) == 0) {
                throw new IllegalStateException("Model does not support probability estimates");
            }
        } else if (svm.svm_check_probability_model(model) != 0) {
            System.out.println("Model supports probability estimates, but disabled in prediction.");
        }
        try (BufferedReader input = new BufferedReader(new FileReader(testFile));
             DataOutputStream output = new DataOutputStream(
                     new BufferedOutputStream(new FileOutputStream(outFile)))) {
            new SVMPredict().predict(input, output, model, predictProbability);
        }
    }

    // Writes test1.txt; each row is labelled with its own encoded class column.
    public void readData(String pp) {
        try {
            int[][] nData = encode(pp);
            int last = nData[0].length - 1;
            List<Integer> labels = new ArrayList<>();
            for (int[] row : nData) {
                labels.add(row[last]);
            }
            writeRows(nData, labels, "test1.txt");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Writes newtest1.txt; row i is labelled with the J48 prediction newCls.get(i),
    // so pp must list the same records, in the same order, as test1.csv.
    public void convertData2(String pp) {
        try {
            writeRows(encode(pp), newCls, "newtest1.txt");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Reads a CSV laid out as: column names, column types ("dis" = discrete),
    // then data rows with the class in the last column. Discrete values are
    // replaced by their index of first appearance in this file, so the codes
    // match the training file only if SVMData encodes values the same way;
    // other columns are rounded to the nearest integer.
    private int[][] encode(String path) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
        lines.removeIf(s -> s.trim().isEmpty());
        int cols = lines.get(0).split(",").length;
        String[] colType = lines.get(1).split(",");
        int rows = lines.size() - 2;
        String[][] dSet = new String[rows][cols];
        int[][] nData = new int[rows][cols];
        List<String> cls = new ArrayList<>();

        for (int i = 0; i < rows; i++) {
            String[] sg2 = lines.get(i + 2).split(",");
            for (int j = 0; j < sg2.length; j++) {
                dSet[i][j] = sg2[j].trim();
            }
            String c1 = sg2[sg2.length - 1].trim();
            if (!cls.contains(c1)) {
                cls.add(c1);
            }
        }
        System.out.println("cls " + cls);
        System.out.println("dset = " + rows + " : " + cols);

        for (int i = 0; i < cols; i++) {
            if (colType[i].trim().equals("dis")) {
                List<String> at = new ArrayList<>();
                for (int j = 0; j < rows; j++) {
                    if (!at.contains(dSet[j][i])) {
                        at.add(dSet[j][i]);
                    }
                }
                for (int j = 0; j < rows; j++) {
                    nData[j][i] = at.indexOf(dSet[j][i]);
                }
            } else {
                for (int j = 0; j < rows; j++) {
                    nData[j][i] = (int) Math.round(Double.parseDouble(dSet[j][i]));
                }
            }
        }
        return nData;
    }

    // Writes one tab-separated line per row: the label, then every column
    // except the last (the class column).
    private void writeRows(int[][] nData, List<Integer> labels, String outFile) throws IOException {
        int last = nData[0].length - 1;
        StringBuilder txt1 = new StringBuilder();
        for (int i = 0; i < nData.length; i++) {
            txt1.append(labels.get(i));
            for (int j = 0; j < last; j++) {
                txt1.append('\t').append(nData[i][j]);
            }
            txt1.append('\n');
        }
        Files.write(Paths.get(outFile), txt1.toString().getBytes(StandardCharsets.UTF_8));
    }
}

10.2 SCREEN SHOTS


CHAPTER 11

REFERENCES

[1] R. Kumar, A. Malik, and V. Ranga, “An intellectual intrusion detection system
using hybrid hunger games search and remora optimization algorithm for IoT
wireless networks,” Knowl.-Based Syst., vol. 256, Nov. 2022, Art. no. 109762.

[2] W. Wang, S. Jian, Y. Tan, Q. Wu, and C. Huang, “Representation learning-based
network intrusion detection system by capturing explicit and implicit feature
interactions,” Comput. Secur., vol. 112, Jan. 2022, Art. no. 102537.

[3] J. Oughton, W. Lehr, K. Katsaros, I. Selinis, D. Bubley, and J. Kusuma,
“Revisiting wireless internet connectivity: 5G vs Wi-Fi 6,” Telecomm. Policy, vol.
45, no. 5, Jun. 2021, Art. no. 102127.

[4] B. A. Tama and S. Lim, “Ensemble learning for intrusion detection systems: A
systematic mapping study and cross-benchmark evaluation,” Comput. Sci. Rev.,
vol. 39, Feb. 2021, Art. no. 100357.

[5] S. Lei, C. Xia, Z. Li, X. Li, and T. Wang, “HNN: A novel model to study the
intrusion detection based on multi-feature correlation and temporal-spatial analysis,”
IEEE Trans. Netw. Sci. Eng., vol. 8, no. 4, pp. 3257–3274, Oct. 2021.

[6] Y. Cheng, Y. Xu, H. Zhong, and Y. Liu, “Leveraging semisupervised
hierarchical stacking temporal convolutional network for anomaly detection in IoT
communication,” IEEE Internet Things J., vol. 8, no. 1, pp. 144–155, Jan. 2021.

[7] X. Li, M. Zhu, L. T. Yang, M. Xu, Z. Ma, C. Zhong, H. Li, and Y. Xiang,
“Sustainable ensemble learning driving intrusion detection model,” IEEE Trans.
Dependable Secure Comput., vol. 18, no. 4, pp. 1591–1604, Jul./Aug. 2021.

[8] Y. Zhou, G. Cheng, S. Jiang, and M. Dai, “Building an efficient intrusion
detection system based on feature selection and ensemble classifier,” Comput.
Netw., vol. 174, Jun. 2020, Art. no. 107247.

[9] G. Kumar, K. Thakur, and M. R. Ayyagari, “MLEsIDSs: Machine learning-
based ensembles for intrusion detection systems—A review,” J. Supercomput., vol.
76, no. 11, pp. 8938–8971, Nov. 2020.

[10] B. A. Tama, L. Nkenyereye, S. M. R. Islam, and K. Kwak, “An enhanced
anomaly detection in web traffic using a stack of classifier ensemble,” IEEE Access,
vol. 8, pp. 24120–24134, 2020.

[11] S. Hajiheidari, K. Wakil, M. Badri, and N. J. Navimipour, “Intrusion detection
systems in the Internet of Things: A comprehensive investigation,” Comput. Netw.,
vol. 160, pp. 165–191, Sep. 2019.

[12] M. Akbanov, V. G. Vassilakis, and M. D. Logothetis, “Ransomware detection
and mitigation using software-defined networking: The case of WannaCry,”
Comput. Electr. Eng., vol. 76, pp. 111–121, Jun. 2019.

[13] J. W. Mikhail, J. M. Fossaceca, and R. Iammartino, “A semi-boosted nested
model with sensitivity-based weighted binarization for multi-domain network
intrusion detection,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 3, pp. 1–27,
May 2019.

[14] K. Li, G. Zhou, J. Zhai, F. Li, and M. Shao, “Improved PSO AdaBoost
ensemble algorithm for imbalanced data,” Sensors, vol. 19, no. 6, p. 1476, Mar.
2019.
