Intrusion Detection - Full Doc Java
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 FEATURE SELECTION
1.2 FEATURE ENGINEERING
1.3 CLASSIFICATION
1.4 MACHINE LEARNING
1.5 ENSEMBLE LEARNING
1.6 ANOMALY DETECTION
1.7 INTRUSION DETECTION SYSTEM
1.8 OBJECTIVES
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.1.1 DRAWBACKS
3.2 PROPOSED SYSTEM
3.2.1 ADVANTAGES
3.3 FEASIBILITY STUDY
3.3.1 TECHNICAL FEASIBILITY
3.3.2 OPERATIONAL FEASIBILITY
3.3.3 ECONOMICAL FEASIBILITY
4 SYSTEM SPECIFICATION
4.1 HARDWARE CONFIGURATION
4.2 SOFTWARE SPECIFICATION
5 SOFTWARE DESCRIPTION
5.1 FRONT END
6 PROJECT DESCRIPTION
6.1 PROBLEM DEFINITION
6.2 MODULE DESCRIPTION
6.3 SYSTEM FLOW DIAGRAM
6.4 INPUT DESIGN
6.5 OUTPUT DESIGN
11 REFERENCES
LIST OF FIGURES
4 COMPARISON GRAPH
LIST OF ABBREVIATIONS
ABBREVIATIONS
NIDS NETWORK INTRUSION DETECTION SYSTEMS
DSL DIGITAL SUBSCRIBER LINE
TDM TIME DIVISION MULTIPLEXING
EPON ETHERNET PASSIVE OPTICAL NETWORK
NG-PON2 NEXT-GENERATION PASSIVE OPTICAL NETWORK STAGE 2
WDM WAVELENGTH DIVISION MULTIPLEXING
WLAN WIRELESS LOCAL AREA NETWORK
IOT INTERNET OF THINGS
D-FES DEEP-FEATURE EXTRACTION AND SELECTION
AWID AEGEAN WI-FI INTRUSION DATASET
B2B BUSINESS-TO-BUSINESS
GB GRADIENT BOOSTING
RF RANDOM FOREST
CNN CONVOLUTIONAL NEURAL NETWORK
CHAPTER 1
1. INTRODUCTION
In the digital age, the security of computer networks and data has become paramount.
With the increasing sophistication of cyber threats and the interconnectedness of our
systems, the need for robust network intrusion detection systems (NIDS) has never
been greater. Intrusion detection plays a pivotal role in safeguarding organizations,
detecting unauthorized access, and mitigating potential threats to information
systems. Traditional intrusion detection methods often face challenges in adapting
to the ever-evolving threat landscape. To address these challenges and enhance the
efficacy of intrusion detection, we propose a novel approach, "Network Intrusion
Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature
Selection." This research embarks on a journey to amalgamate cutting-edge
techniques from the realms of machine learning, data science, and cybersecurity. By
fusing the power of ensemble learning and automatic feature selection into a two-
phased detection system, we aim to redefine the landscape of network intrusion
detection.
1.1 FEATURE SELECTION
In the realm of machine learning, the quality of data is often paramount to the success
of predictive models and data-driven applications. While machine learning
algorithms can work wonders when presented with vast datasets, the art of "feature
engineering" has emerged as an indispensable process to transform raw data into a
more informative and efficient format. Feature engineering is a craft, akin to
sculpting a raw material into a masterpiece, where the raw material comprises the
data and the masterpiece is an accurate and powerful predictive model. The
motivation behind feature engineering lies in the inherent limitations and
idiosyncrasies of raw data. In many real-world applications, data is messy,
incomplete, and often contains extraneous information. Furthermore, not all data
attributes are equally relevant to the task at hand. Feature engineering seeks to
address these challenges by meticulously crafting new features or transforming
existing ones to better capture the underlying patterns and relationships in the data.
1.3 CLASSIFICATION
In the vast landscape of machine learning, the task of classification stands as one of
the most fundamental and ubiquitous endeavors. At its core, classification is about
imparting the ability to an algorithm to make sense of the world by assigning items
into predefined categories or classes based on their inherent characteristics. This
ability is not only pervasive but also profoundly influential, as it underpins a
multitude of real-world applications, ranging from spam email filtering to medical
diagnosis and beyond. The motivation behind classification is deeply rooted in our
innate human inclination to categorize and organize the information-rich
environment around us. In a digital context, classification serves as a potent tool for
automating decision-making processes, discerning patterns in data, and making
predictions. It is the keystone of supervised learning, where models are trained on
labeled data to replicate the human ability to classify objects or observations into
meaningful groups.
1.5 ENSEMBLE LEARNING
In the realm of machine learning, the quest for improved predictive accuracy and
robustness has led to the development of ingenious techniques, and at the forefront
of this innovation stands the concept of "Ensemble Learning." Much like the
collective intelligence of a diverse group of individuals can often outperform a single
expert, ensemble learning harnesses the power of multiple machine learning models
to make better predictions, decisions, and classifications. The motivation behind
ensemble learning stems from the acknowledgment that no single machine learning
model is universally optimal for all tasks and datasets. In practice, different
algorithms excel under different conditions, and they may be more adept at capturing
specific patterns or mitigating particular sources of error. Ensemble learning seeks
to capitalize on this diversity by combining the strengths of multiple models,
mitigating their individual weaknesses, and achieving superior performance as a
collective. By aggregating predictions from multiple models, ensemble learning
aims to improve overall predictive accuracy. This is particularly valuable in domains
where high accuracy is paramount, such as medical diagnosis or financial
forecasting.
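The aggregation idea described above can be illustrated with a minimal sketch of hard (majority) voting. The class name MajorityVote, the method vote, and the labels "attack" and "normal" are assumptions made for this illustration only; the sketch is not the two-phased hybrid ensemble proposed in this report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hard (majority) voting over the labels predicted by several models. */
final class MajorityVote {

    /**
     * Returns the label predicted by the most models. On a tie, the label
     * that reached the winning count first is returned.
     */
    static String vote(String[] predictions) {
        if (predictions.length == 0) {
            throw new IllegalArgumentException("at least one prediction is required");
        }
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : predictions) {
            // merge() increments the running count for this label and returns it.
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical outputs of three models for one network connection.
        String[] predictions = {"attack", "normal", "attack"};
        System.out.println(vote(predictions));
    }
}
```

Majority voting is only one possible combiner; averaging class probabilities or training a second-level model on the base outputs are common alternatives.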
1.6 ANOMALY DETECTION
In today's digitally interconnected world, the protection of sensitive data and critical
infrastructure from cyber threats is of paramount importance. As the complexity and
sophistication of malicious activities continue to evolve, traditional rule-based
Intrusion Detection Systems (IDS) have faced limitations in effectively identifying
and mitigating these threats. In response to this ever-expanding threat landscape, the
integration of machine learning techniques within IDS has emerged as a promising
approach. Machine learning, a subset of artificial intelligence, has the unique
capability to adapt and learn from data, making it well-suited for the dynamic and
evolving nature of cyber threats. By leveraging advanced algorithms and data-driven
insights, machine learning-based IDS aim to bolster cybersecurity defences by
detecting anomalous patterns and malicious behaviours in network traffic, system
logs, and other digital assets.
1.8 OBJECTIVES
1. Develop and implement an Intrusion Detection System (IDS) using the ADT-
SVM algorithm for dynamic cybersecurity threat detection.
3. Evaluate IDS performance using key metrics such as Detection Rate (DR) and
False Alarm Rate (FAR) on the KDD dataset.
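The two metrics named in the objectives can be computed from the counts of a binary confusion matrix. The sketch below uses the usual definitions, DR = TP / (TP + FN) and FAR = FP / (FP + TN); the class name IdsMetrics and the counts in main are hypothetical and are not results obtained on the KDD dataset.

```java
/** Detection Rate and False Alarm Rate from binary confusion-matrix counts. */
final class IdsMetrics {

    /** DR = TP / (TP + FN): fraction of attack records that were flagged. */
    static double detectionRate(long truePositives, long falseNegatives) {
        long attacks = truePositives + falseNegatives;
        return attacks == 0 ? 0.0 : (double) truePositives / attacks;
    }

    /** FAR = FP / (FP + TN): fraction of normal records wrongly flagged. */
    static double falseAlarmRate(long falsePositives, long trueNegatives) {
        long normals = falsePositives + trueNegatives;
        return normals == 0 ? 0.0 : (double) falsePositives / normals;
    }

    public static void main(String[] args) {
        // Hypothetical counts chosen for illustration only.
        System.out.printf("DR  = %.3f%n", detectionRate(90, 10));
        System.out.printf("FAR = %.3f%n", falseAlarmRate(5, 95));
    }
}
```

Returning 0.0 when a denominator is zero is a design choice of this sketch; an implementation could equally report the metric as undefined.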
CHAPTER 2
2. LITERATURE REVIEW
Felix Obite et al. state in this paper that the tremendous growth in Internet traffic
has confirmed that the telecommunications backbone is moving aggressively from
a time division multiplexing (TDM) orientation to a focus on Ethernet solutions.
Ethernet PON, which presents the convergence of low-cost Ethernet and fiber
infrastructures, has taken over the market initially dominated by Digital Subscriber
Line (DSL) and cable modems. It is a new technology that is simple, inexpensive,
and scalable, having the ability to deliver massive data services to end-users over a
single network. This paper reviewed the evolution of Ethernet Passive Optical
Network (EPON), with focus on the current development process of the future high-
data-rate access networks such as Next-Generation Passive Optical Network Stage
2 (NG-PON2), Wavelength Division Multiplexing (WDM) PON, and Orthogonal
Frequency Division Multiplexing (OFDM) PON. In addition, the recently concluded
100 Gb Ethernet Passive Optical Network (100G-EPON) is reviewed with the aim
of highlighting the recent developments in the field. With this comprehensive and
up-to-date review, we equip network operators and interested practitioners to focus
on common priorities and timelines. Another goal of this study is to identify
technical remedies for future investigation. Data traffic is increasing at an
alarming rate: more users are going online, and those who are already online
spend more time online and use more bandwidth-intensive applications. Broadband
services permitting high-speed Internet transmission are expected to improve
economies. Hence, large bandwidth and mobility are two basic requirements for
future access networks, in order to support new and real-time broadband
applications. DSL and cable modems are unable to withstand such demand. They were
designed on top of previous communication infrastructures that were not optimized
for data traffic. In cable modem systems, just a few RF channels are dedicated to
data, while most of the bandwidth is reserved for servicing legacy analog
video. DSL copper systems only allow limited data rates at required distances due to
signal attenuation and crosstalk. A new data-centric solution has become necessary,
a technology that would be optimized for IP data congestion. Emerging
as the next generation Ethernet passive optical network is the 10 G-EPON. The
technical specification was standardized by IEEE 802.3av Task Force in September
2009 (10GPON). One of the major requirements in designing the specification is to
develop a platform of co-existence with the current 1 G EPON Network on the same
optical system and backward compatibility. This paper has described the service
trends and operator requirements that define the evolution of EPON and future
trends. It has proved that optical technologies are evolving continuously in the
direction of higher speeds, higher wavelength capability, and higher loss budgets. A
smart allocation and coexistence strategy of new and existing users is required, with
a logical combination of different types of users such as business and residential
subscribers. WDM-PONs, possibly implemented with TDMA and TDM techniques, are
unarguably the next stage in PON evolution. With optical amplification, they
offer higher bandwidth per ONU, maximum reach, and splitting ratios
compared to EPON and GPON architectures. They can accommodate various fiber
topologies and provide additional functionality such as protection. WDM-PONs, if
implemented, will give access to a new broadband structure and a broad range of
residential applications.
2.2 REVISITING WIRELESS INTERNET CONNECTIVITY: 5G
VS WI-FI 6
Bayu Adhi Tama et al. state in this paper that intrusion detection systems
(IDSs) are intrinsically linked to a comprehensive solution of cyberattack
prevention instruments. To achieve a higher detection rate, the ability to design an
improved detection framework is sought after, particularly when utilizing ensemble
learners. Designing an ensemble involves two main challenges: the choice
of available base classifiers and the choice of combiner methods. This paper performs an overview
of how ensemble learners are exploited in IDSs by means of systematic mapping
study. We collected and analyzed 124 prominent publications from the existing
literature. The selected publications were then mapped into several categories such
as years of publications, publication venues, datasets used, ensemble methods, and
IDS techniques. Furthermore, this study reports and analyzes an empirical
investigation of a new classifier ensemble approach, called stack of ensemble (SoE)
for anomaly-based IDS. The SoE is an ensemble classifier that adopts parallel
architecture to combine three individual ensemble learners such as random forest,
gradient boosting machine, and extreme gradient boosting machine in a
homogeneous manner. The performance significance among classification
algorithms is statistically examined in terms of their Matthews correlation
coefficients, accuracies, false positive rates, and area under ROC curve metrics. Our
study fills the gap in current literature concerning an up-to-date systematic mapping
study, not to mention an extensive empirical evaluation of the recent advances of
ensemble learning techniques applied to IDSs. The ensemble of classifiers, which is
hereafter referred to as an ensemble learner, has drawn a lot of interest in
cybersecurity research, and the intrusion detection system (IDS) domain is no
exception. An IDS deals with the proactive and responsive detection of external
aggressors and anomalous operations of the server before they cause massive
destruction. Today, a variety of cyberattacks are being mounted,
putting some organizations’ critical infrastructures at risk. A successful
attack may lead to difficult consequences such as but not limited to financial loss,
operational termination, and confidential information disclosure. Moreover, the
larger the organization’s network, the bigger the chance for attackers to exploit. The
complexity of the network may also give rise to vulnerabilities and other specific
threats. Therefore, security mitigation and protection strategies should be considered
mandatory. This study revealed that there has been great interest in applying
the random forest classifier to IDSs. This is because implementations of random
forest are diverse and almost effortless to apply. For instance, Caret, Boruta,
and VSURF are examples of random forest implementations in R.
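Of the evaluation metrics listed in this review, the Matthews correlation coefficient (MCC) is the least self-explanatory, so a short sketch is given here. It uses the standard binary definition, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); the class name Mcc and the counts in main are hypothetical and are not taken from the reviewed paper.

```java
/** Matthews correlation coefficient for a binary confusion matrix. */
final class Mcc {

    /**
     * Returns a value in [-1, 1]: 1 for perfect prediction, 0 for chance-level
     * prediction, and -1 for total disagreement. If any marginal sum is zero
     * the coefficient is undefined and this sketch returns 0.
     */
    static double mcc(long tp, long tn, long fp, long fn) {
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator = Math.sqrt(
                (double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    public static void main(String[] args) {
        // Hypothetical counts for an attack-versus-normal classifier.
        System.out.printf("MCC = %.4f%n", mcc(90, 95, 5, 10));
    }
}
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it informative when attack and normal records are heavily imbalanced.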
2.5 DEEP ABSTRACTION AND WEIGHTED FEATURE
SELECTION FOR WI-FI IMPERSONATION DETECTION
Muhamad Erza Aminanto et al. state in this paper that the recent advances in
mobile technologies have resulted in IoT-enabled devices becoming more pervasive
and integrated into our daily lives. The security challenges that need to be overcome
mainly stem from the open nature of a wireless medium such as a Wi-Fi network.
An impersonation attack is an attack in which an adversary is disguised as a
legitimate party in a system or communications protocol. The connected devices are
pervasive, generating high-dimensional data on a large scale, which complicates
simultaneous detections. Feature learning, however, can circumvent the potential
problems that could be caused by the large-volume nature of network data. This
study thus proposes a novel Deep-Feature Extraction and Selection (D-FES), which
combines stacked feature extraction and weighted feature selection. The stacked
autoencoding is capable of providing representations that are more meaningful by
reconstructing the relevant information from its raw inputs. We then combine this
with modified weighted feature selection inspired by an existing shallow-structured
machine learner. We finally demonstrate the ability of the condensed set of features
to reduce the bias of a machine learner model as well as the computational
complexity. Our experimental results on a well-referenced Wi-Fi network
benchmark dataset, namely, the Aegean Wi-Fi Intrusion Dataset (AWID), prove the
usefulness and the utility of the proposed D-FES by achieving a detection accuracy
of 99.918% and a false alarm rate of 0.012%, which is the most accurate detection
of impersonation attacks reported in the literature. The rapid growth of the Internet
has led to a significant increase in wireless network traffic in recent years. According
to a worldwide telecommunication consortium, proliferation of 5G and Wi-Fi
networks is expected to occur in the next decades. By 2020, wireless network traffic
is anticipated to account for two thirds of total Internet traffic, with 66% of IP
traffic expected to be generated by Wi-Fi and cellular devices only. Although
wireless networks such as IEEE 802.11 have been widely deployed to provide users
with mobility and flexibility in the form of high-speed local area connectivity, other
issues such as privacy and security have been raised. The rapid spread of Internet of
Things (IoT)-enabled devices has resulted in wireless networks becoming vulnerable to both
passive and active attacks, the number of which has grown dramatically. Examples
of these attacks are impersonation, flooding, and injection attacks. In this study, we
presented a novel method, D-FES, which combines stacked feature extraction and
weighted feature selection techniques in order to detect impersonation attacks in Wi-Fi
networks. A stacked autoencoder (SAE) is implemented to achieve high-level abstraction of complex and
large amounts of Wi-Fi network data. The model-free properties in SAE and its
learnability on complex and large-scale data take into account the open nature of
Wi-Fi networks, where an adversary can easily inject false data or modify data
forwarded in the network.
Joseph W. Mikhail et al. state in this paper that effective network intrusion
detection techniques are required to thwart evolving cybersecurity threats.
Historically, traditional enterprise networks have been researched extensively in this
regard. However, the cyber threat landscape has grown to include wireless networks.
In this article, the authors present a novel model that can be trained on completely
different feature sets and applied to two distinct intrusion detection applications:
traditional enterprise networks and 802.11 wireless networks. This is the first
method that demonstrates superior performance in both aforementioned
applications. The model is based on a one-versus-all binary framework comprising
multiple nested sub-ensembles. To provide good generalization ability, each sub-
ensemble contains a collection of sub-learners, and only a portion of the sub-learners
implement boosting. A class weight based on the sensitivity metric (true-positive
rate), learned from the training data only, is assigned to the sub-ensembles of each
class. The use of pruning to remove sub-learners that do not contribute to or have an
adverse effect on overall system performance is investigated as well. The results
demonstrate that the proposed system can achieve exceptional performance in
applications to both traditional enterprise intrusion detection and 802.11 wireless
intrusion detection. Massive growth in the use of computer and network services has
resulted in an increase in the quantity of network-based security threats. Hackers can
exploit many entry points to gain unauthorized access to networks and devices. An
intrusion is considered an attempt to bypass security measures and compromise
confidentiality, integrity, or availability. Network intrusion detection involves being
able to detect and classify network traffic into different attack categories. Intrusion
detection system (IDS) devices are typically placed inside computer networks, and
they scan traffic for malicious events. In recent times, security researchers have
focused on building detection and classification models that can overcome the
limitations of signature-based methods by training a model that can generate a
prediction based on observations of network traffic representative of both attacks
and normal traffic. The KDD Cup 1999 dataset (KDD’99) has served as the baseline
for standard network intrusion detection research for many years. However, wireless
networks are now an additional avenue for network exploitation. Recent reports have
shown that in addition to protocol vulnerabilities, wireless access points are often
configured improperly.
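The one-versus-all framework with sensitivity-based class weights can be sketched roughly as follows. The review does not state how the weights and the per-class outputs are combined, so the rule used here (pick the class with the largest weight times score) is an assumption, as are the class name WeightedOneVsAll, its method names, and all numbers in main.

```java
/** One-versus-all decision with per-class weights derived from sensitivity. */
final class WeightedOneVsAll {

    /** Sensitivity (true-positive rate) of one class, measured on training data. */
    static double sensitivity(long truePositives, long falseNegatives) {
        long positives = truePositives + falseNegatives;
        return positives == 0 ? 0.0 : (double) truePositives / positives;
    }

    /**
     * Returns the index of the class whose weighted score is largest.
     * scores[c] is the output of the binary "class c versus rest" sub-ensemble.
     * The multiplicative weighting is an assumption of this sketch.
     */
    static int predict(double[] scores, double[] weights) {
        if (scores.length == 0 || scores.length != weights.length) {
            throw new IllegalArgumentException(
                    "scores and weights must be non-empty and of equal length");
        }
        int best = 0;
        for (int c = 1; c < scores.length; c++) {
            if (scores[c] * weights[c] > scores[best] * weights[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical classes: 0 = normal, 1 = flooding, 2 = injection.
        double[] weights = {
            sensitivity(95, 5), sensitivity(60, 40), sensitivity(80, 20)
        };
        double[] scores = {0.55, 0.70, 0.50};
        System.out.println(predict(scores, weights));
    }
}
```

With these hypothetical numbers, class 1 has the highest raw score, but its low training sensitivity reduces its weighted score below that of class 0, which shows how the weighting can change the decision.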
Yuyang Zhou et al. state in this paper that the intrusion detection system (IDS) is
one of the extensively used techniques in a network topology to safeguard the integrity
and availability of sensitive assets in the protected systems. Although many
supervised and unsupervised learning approaches from the field of machine learning
have been used to increase the efficacy of IDSs, it is still a problem for existing
intrusion detection algorithms to achieve good performance. First, lots of redundant
and irrelevant data in high-dimensional datasets interfere with the classification
process of an IDS. Second, an individual classifier may not perform well in the
detection of each type of attacks. Third, many models are built for stale datasets,
making them less adaptable for novel attacks. Thus, we propose a new intrusion
detection framework in this paper, and this framework is based on the feature
selection and ensemble learning techniques. In the first step, a heuristic algorithm
called CFS-BA is proposed for dimensionality reduction, which selects the optimal
subset based on the correlation between features. Then, we introduce an ensemble
approach that combines C4.5, Random Forest (RF), and Forest by Penalizing
Attributes (Forest PA) algorithms. Finally, a voting technique is used to combine the
probability distributions of the base learners for attack recognition. The experimental
results, using NSL-KDD, AWID, and CIC-IDS2017 datasets, reveal that the
proposed CFS-BA-Ensemble method is able to exhibit better performance than other
related and state-of-the-art approaches under several metrics. Nowadays,
applications of the Internet help society in many areas such as electronic
communication, teaching, commerce, and entertainment, and the Internet has become a part of
people's daily lives. However, cyber security has become vulnerable due to the
massive expansion of the computer networks and rapid emergence of the intrusion
incidents. The necessity of developing cyber security has attracted considerable
attention from industry and academia around the world. Despite the use of different
security applications, such as firewalls, malware prevention, data encryption, and
user authentication, many organizations and enterprises fall victims to contemporary
cyber-attacks. In order to sneak into the system, attackers might deliberately exploit
the vulnerabilities of the target system and launch different types of attacks, which
may lead to the leakage of private information.
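The voting step that combines the probability distributions of the base learners can be sketched as soft voting. Averaging the distributions is assumed here as the combination rule, since the review text does not name the exact rule; the class name SoftVote and the example probabilities are likewise hypothetical.

```java
/** Soft voting: average the class-probability distributions of base learners. */
final class SoftVote {

    /** Returns the element-wise mean of the given probability distributions. */
    static double[] averageProbabilities(double[][] distributions) {
        if (distributions.length == 0) {
            throw new IllegalArgumentException("at least one distribution is required");
        }
        int classes = distributions[0].length;
        double[] mean = new double[classes];
        for (double[] distribution : distributions) {
            if (distribution.length != classes) {
                throw new IllegalArgumentException("all distributions must have equal length");
            }
            for (int c = 0; c < classes; c++) {
                mean[c] += distribution[c];
            }
        }
        for (int c = 0; c < classes; c++) {
            mean[c] /= distributions.length;
        }
        return mean;
    }

    /** Index of the largest value; the first index wins on ties. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical outputs of three base learners over {normal, attack}.
        double[][] distributions = {{0.9, 0.1}, {0.4, 0.6}, {0.3, 0.7}};
        double[] mean = averageProbabilities(distributions);
        System.out.println("predicted class index: " + argMax(mean));
    }
}
```

In this example two of the three learners favour the second class, yet the averaged distribution favours the first, because soft voting takes each learner's confidence into account rather than only its top label.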
2.9 MLESIDSS: MACHINE LEARNING-BASED ENSEMBLES
FOR INTRUSION DETECTION SYSTEMS—A REVIEW
Gulshan Kumar et al. state in this paper that network security plays an essential
role in secure communication and avoids financial loss and crippled services due to
network intrusions. Intruders generally exploit the flaws of popular software to
mount a variety of attacks against network computer systems. The damage caused
by network attacks may vary from a little disruption in service to high
financial loss. Recently, intrusion detection systems (IDSs) comprising machine
learning techniques have emerged for handling unauthorized usage and access to
network resources. With the passage of time, a wide variety of machine learning
techniques have been designed and integrated with IDSs. Still, most of the IDSs
reported poor intrusion detection results in terms of false positive rate and detection rate.
To solve these issues, researchers focused on the development of ensemble
classifiers involving the integration of predictions by multiple individual classifiers.
Ensemble classifiers compensate for the weaknesses of individual
classifiers and use their combined knowledge to enhance performance. This study
presents motivation and comprehensive review of intrusion detection systems based
on ensembles in machine learning as an extension of our previous work in the field.
Particularly, different ensemble methods in the field are analyzed, taking into
consideration different types of ensembles, and various approaches for integrating
the predictions of individual classifiers for an ensemble classifier. The representative
studies are compared in chronological order for systematic and critical analysis,
understanding the current challenges and status of research in the field. Finally, the
study presents essential future research directions for the development of effective
IDSs. Network security plays a vital role in avoiding financial loss, protecting
customers from monetary damages, avoiding disabling or crippling services, and
limiting severe information loss due to network intrusions. Attackers generally
exploit the configurations and vulnerabilities of popular software to mount attacks
against network computer systems. The damage caused in these attacks may vary
from a little disruption in services to high financial losses. Existing conventional
security techniques like firewalls are only used as the first line of defense. These
techniques can be easily bypassed by the attackers.
Bayu Adhi Tama et al. state in this paper that a Web attack protection system is
essential in today’s information age. Classifier ensembles have been
considered for anomaly-based intrusion detection in Web traffic. However, they
suffer from an unsatisfactory performance due to a poor ensemble design. This paper
proposes a stacked ensemble for anomaly-based intrusion detection systems in a
Web application. Unlike conventional stacking, where single weak learners
are prevalently used, the proposed stacked ensemble is an ensemble architecture
whose base learners are themselves ensemble learners, i.e. random forest, gradient boosting
machine, and XGBoost. To prove the generalizability of the proposed model, two
datasets that are specifically used for attack detection in a Web application, i.e.
CSIC-2010v2 and CICIDS-2017 are used in the experiment. Furthermore, the
proposed model significantly surpasses existing Web attack detection techniques
concerning the accuracy and false positive rate metrics. Validation results on the
CICIDS-2017, NSL-KDD, and UNSW-NB15 datasets also improve on the ones
obtained by some recent techniques. Finally, the performance of all classification
algorithms in terms of a two-step statistical significance test is further discussed,
providing a value-added contribution to the current literature. In today’s information
age, every organization attempts to place its business on the Internet. Internet-based
applications enable companies to increase their revenue as well as to improve
or even redesign their business processes, e.g. virtualization in the supply chain or adopting
a futuristic business-to-business (B2B) platform. The Internet has been employed in
the last two decades by companies and many organizations worldwide. It helps an
organization to place a Web-based application such as e-commerce to offer timely
services or getting closer to its customers, for instance. Furthermore, it has changed
people’s life dramatically, in which the users could stay online to communicate with
each other anywhere and anytime. Nowadays, a high-speed Internet has brought a
significant contribution to the development of various types of Internet-based
computing such as ubiquitous computing, cloud computing, and mobile cloud
computing, among others. People are not dependent on on-the-spot computing resources
to run the application services; instead, various services, i.e. storage, applications, and
servers, are delivered to the user’s computers or devices over the Internet. This study
has explored the use of stack architecture to combine multiple classifier ensembles,
i.e. gradient boosting machine (GBM), random forest (RF), and extreme gradient
boosting machine (XGB) for detecting anomaly in a Web application scenario. To
prove the generalizability of our proposed model, we have tested on multiple IDS
datasets such as CSIC-2010v2, CICIDS-2017, NSL-KDD, and UNSW-NB15. Unlike
a conventional stacking technique that usually considers a weak individual
classification algorithm, our proposed model is built based on a combination of
strong classifier ensembles that work as base learners.
2.11 A COMPREHENSIVE STUDY OF ANOMALY DETECTION
SCHEMES IN IOT NETWORKS USING MACHINE LEARNING
ALGORITHMS
Abebe Diro et al. state in this paper that the Internet of Things (IoT) consists of a
massive number of smart devices capable of data collection, storage, processing, and
communication. The adoption of the IoT has brought about tremendous innovation
opportunities in industries, homes, the environment, and businesses. However, the
inherent vulnerabilities of the IoT have sparked concerns for wide adoption and
applications. Unlike traditional information technology (I.T.) systems, the IoT
environment is challenging to secure due to resource constraints, heterogeneity, and
distributed nature of the smart devices. This makes it impossible to apply host-based
prevention mechanisms such as anti-malware and anti-virus. These challenges and
the nature of IoT applications call for a monitoring system such as anomaly detection
both at device and network levels beyond the organizational boundary. This suggests
an anomaly detection system is strongly positioned to secure IoT devices better than
any other security mechanism. In this paper, we aim to provide an in-depth review
of existing works in developing anomaly detection solutions using machine learning
for protecting an IoT system. We also indicate that blockchain-based anomaly
detection systems can collaboratively learn effective machine learning models to
detect anomalies. The IoT consists of myriad smart devices capable of data
collection, storage, processing, and communication. The adoption of the IoT has
brought about tremendous innovation opportunities in industries, homes, the
environment, and businesses, and it has enhanced the quality of life, productivity,
and profitability. However, infrastructures, applications, and services associated
with the IoT introduced several threats and vulnerabilities as emerging protocols and
workflows exponentially increased attack surfaces. For instance, the outbreak of the
Mirai botnet exploited IoT vulnerabilities and crippled several websites and domain
name systems. The IoT environment’s massive number, heterogeneity, and resource
constraints have hindered cyber-attack prevention and detection capabilities. These
characteristics attract monitoring IoT devices at the network level as on-device
solutions are not feasible. To this end, anomaly detection is better positioned to
protect the IoT network. To protect the system, anomaly detection is considered to
be an important tool as it helps identify and alert abnormal activities in the system.
Machine learning has been applied for anomaly detection systems in I.T. and IoT
systems. However, anomaly detection systems using machine
learning have worked better in I.T. systems than in the IoT ecosystem, owing to the resource
capabilities and in-perimeter location of I.T. systems.
Kewen Li et al. state in this paper that the Adaptive Boosting (AdaBoost)
algorithm is a widely used ensemble learning framework, and it can get good
classification results on general datasets. However, it is challenging to apply the
AdaBoost algorithm directly to imbalanced data since it is designed mainly for
processing misclassified samples rather than samples of minority classes. To better
process imbalanced data, this paper introduces the indicator Area Under Curve
(AUC) which can reflect the comprehensive performance of the model, and proposes
an improved AdaBoost algorithm based on AUC (AdaBoost-A) which improves the
error calculation performance of the AdaBoost algorithm by comprehensively
considering the effects of misclassification probability and AUC. To prevent
the redundant or useless weak classifiers generated by the traditional AdaBoost algorithm
from consuming too many system resources, this paper proposes an ensemble
algorithm, PSOPD-AdaBoost-A, which can re-initialize parameters to avoid falling
into local optimum, and optimize the coefficients of AdaBoost weak classifiers.
Experiment results show that the proposed algorithm is effective for processing
imbalanced data, especially the data with relatively high imbalance. Since
imbalanced data can be found in any area, effective classification of imbalanced data
has become critical for many applications. The classification results of imbalanced
data generated by existing classification algorithms are usually significantly affected
by the majority class, resulting in low accuracy in classification of the minority class.
For example, a sensor network can achieve accurate target recognition under the
assumption of a balanced data distribution. However, in practical applications, the
field environment is complex and variable, and the difficulty of obtaining samples
varies, which results in imbalanced data. It is easy to ignore samples of the minority
class in this case, resulting in incorrect classification. In the intrusion alarm
application, misclassification of samples of the minority class means a false alarm of
the system, which can have very serious consequences. Traditional AdaBoost
algorithm focuses on the misclassified samples instead of the samples of minority
class. In this paper, we propose an improved AdaBoost algorithm (AdaBoost-A).
Since the AUC can effectively reflect the performance of the classifier, we introduce
the AUC into error calculation, making the AdaBoost focus more on the
classification accuracy of the minority. Furthermore, the AdaBoost algorithm may
generate redundant or useless weak classifiers, significantly affecting the readability
of the classifier. We propose an ensemble algorithm, PSOPD-AdaBoost-A, which
can further optimize the weight of the weak classifiers.
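The AUC indicator mentioned above can be illustrated with a short sketch. It computes AUC as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, counting ties as one half. This is only the standard definition of AUC; how AdaBoost-A combines it with the misclassification probability is not specified here, and the class name, method name and sample scores are illustrative assumptions.

```java
public class AucSketch {
    /**
     * AUC as the probability that a randomly chosen positive sample (label 1)
     * is scored higher than a randomly chosen negative sample (any other label).
     * Ties contribute one half.
     */
    public static double auc(double[] scores, int[] labels) {
        long positives = 0;
        long negatives = 0;
        double wins = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) {
                negatives++;
                continue;
            }
            positives++;
            // Compare this positive sample with every negative sample.
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] == 1) continue;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("need at least one sample of each class");
        }
        return wins / (positives * negatives);
    }

    public static void main(String[] args) {
        // Hypothetical classifier scores for two minority (1) and four majority (0) samples.
        double[] scores = {0.9, 0.8, 0.35, 0.6, 0.2, 0.1};
        int[] labels = {1, 0, 1, 0, 0, 0};
        System.out.println(auc(scores, labels)); // 0.75
    }
}
```

Because every positive sample is compared with every negative one, the value does not depend on how many majority samples there are, which is why AUC is a useful indicator on imbalanced data.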
Mengyao Zhu et al. write in their paper: Nowadays, in machine learning based
intrusion detection systems, ensemble learning is a commonly adopted method to
improve the detection accuracy. Unfortunately, the existing works have not
considered the accumulation and reuse of historical knowledge, as well as the
sensitivity of the detection model to different types of attacks, which leads to a low
detection accuracy. To address the issue, this paper proposes a model based on
sustainable ensemble learning. In the model training stage, by taking the individual
classifiers' probability outputs and classification confidence as the training data, we
build multi-class regression models such that ensemble learning adapts to different
attacks. Besides, in the updating stage, an iterative updating method is presented,
where the parameters and decision results of the historical model are added to the
training process of the new ensemble model to realize the incremental learning.
Experiment results show that the proposed model significantly outperforms the
existing solutions in terms of detection accuracy, false alarm, stability and
robustness. With advances in network-based computing services and applications,
the Internet suffers from more and more security threats. Therefore, intrusion
detection systems (IDS) are particularly important as an essential part of network
security defense. IDS discovers and identifies intrusions in the system by detecting
and analyzing network traffic or host behaviors. In order to detect abnormal
behaviors in large-scale network traffic, machine learning-based intrusion detection
systems have attracted a wide range of attention. Such methods adopt machine
learning techniques to extract features from a large amount of data and train a
classification model to classify network traffic or host behaviors to detect intrusions
in the system. In order to reduce the false alarm rate and false negative rate, prior
works on machine learning-based intrusion detection systems often employ
multiple machine learning models to construct the detection model, which is called the
ensemble learning method, as demonstrated in Fig. 1. In ensemble learning, the system first
constructs multiple machine learning models, and then integrates all individual
results via voting or weighted voting methods to obtain the final decision results.
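The voting step described above can be sketched as follows. Each individual model contributes its predicted class, weighted by a per-model weight, and the class with the largest total weight is the final decision. The class name, the weights and the predictions are illustrative assumptions, not values from the cited work.

```java
public class WeightedVoteSketch {
    /**
     * Combines the class predictions of several models.
     * predictions[m] is the class index chosen by model m and weights[m] its weight.
     * Returns the class index with the largest total weight (lowest index wins ties).
     */
    public static int weightedVote(int[] predictions, double[] weights, int numClasses) {
        double[] tally = new double[numClasses];
        for (int m = 0; m < predictions.length; m++) {
            tally[predictions[m]] += weights[m];
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (tally[c] > tally[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] predictions = {0, 1, 1}; // three models, two classes (0 = normal, 1 = attack)

        // Plain majority voting: every model has the same weight.
        System.out.println(weightedVote(predictions, new double[] {1, 1, 1}, 2)); // 1

        // Weighted voting: the first model is trusted more than the other two together.
        System.out.println(weightedVote(predictions, new double[] {0.6, 0.25, 0.25}, 2)); // 0
    }
}
```

With equal weights the method reduces to plain majority voting; unequal weights let a more reliable model override the majority.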
2.14 AUTOMATIC FEATURE EXTRACTION AND SELECTION
FOR MACHINE LEARNING BASED INTRUSION DETECTION
Jinjie Liu et al. write in their paper: As the advances in mobile technologies
and IoT enabled devices have been integrated into our daily lives, significant
increases in wireless network traffic generate a large scale of high dimensional
network log data. This has led to challenges in security of Wi-Fi network systems
that have to analyze such complex big data for intrusion detection. Many Wi-Fi
network systems commonly employ machine learning based Intrusion Detection
Systems (IDS). Such IDS usually adopt supervised methods that heavily depend on
observations of human experts for feature extraction, feature selection, and labeling
processes of training data for classification. In this study, using the recently collected
Aegean Wi-Fi Intrusion Dataset (AWID), which contains real traces of different
network attack types, we propose an unsupervised approach with an automatic feature
extraction and selection process that replaces human intervention and manual labelling
when analyzing large-scale, high-dimensional data. The goal is to improve the prediction
accuracy of classification in detecting the three most common network attack types
(Injection, Flooding, and Impersonation) in an IDS.
The experiment results showed the effectiveness of our approach for feature
extraction and selection. The quality of the selected features and the accuracy of
intrusion detection of the three attack types are compared and analyzed. Wireless
networks such as IEEE 802.11 have been widely deployed to provide users with
mobility and flexibility in the form of high-speed local area connectivity. The rapid
growth of Internet-of-Things (IoT) enabled devices and the advances of mobile
technologies have led to a significant increase in wireless network traffic in recent
years. Cisco reported that worldwide mobile data traffic increased 13-fold over the
previous four years, reaching 11.2 exabytes per month (134 exabytes annually) in 2017.
As IoT devices have become more pervasive and integrated
into our daily lives, wireless network systems have become more vulnerable than ever to
both passive and active attacks. The number of those attacks has
grown dramatically, which has raised many challenges for the privacy and security of
the wireless network systems.
Leila Mohammadpour et al. write in their paper: Over the past few years,
Internet applications have become more advanced and widely used. This has
increased the need for Internet networks to be secured. Intrusion detection systems
(IDSs), which employ artificial intelligence (AI) methods, are vital to ensuring
network security. As a branch of AI, deep learning (DL) algorithms are now
effectively applied in IDSs. Among deep learning neural networks, the convolutional
neural network (CNN) is a well-known structure designed to process complex data.
The CNN overcomes the typical limitations of conventional machine learning
approaches and is mainly used in IDSs. Several CNN-based approaches are
employed in IDSs to handle privacy issues and security threats. However, there are
no comprehensive surveys of IDS schemes that have utilized CNN to the best of our
knowledge. Hence, in this study, our primary focus is on CNN-based IDSs so as to
increase our understanding of various uses of the CNN in detecting network
intrusions, anomalies, and other types of attacks. This paper innovatively organizes
the studied CNN-IDS approaches into multiple categories and describes their
primary capabilities and contributions. The main features of these approaches, such
as the dataset, architecture, input shape, evaluated metrics, performance, feature
extraction, and classifier method, are compared. Because different datasets are used
in CNN-IDS research, their experimental results are not comparable. Hence, this
study also conducted an empirical experiment to compare different approaches based
on standard datasets, and the comparative results are presented in detail. Worldwide
economic and business progress is directly tied to the Internet and enterprise
networks. Furthermore, cyber-attacks are becoming increasingly common, which is
a significant security concern. For this reason, technicians and network security
specialists are paying increasing attention to identifying network attacks.
Governments and private organizations require solutions offering stable
performance in protecting the information assets they hold from any unlawful or
unwanted access and in preventing and detecting intrusions. The term “intrusion
detection system” (IDS) refers to a system that monitors and categorizes network
flows to determine whether they are the typical (normal) activity that frequently
occurs in a network or activity that could threaten the security of information
systems. IDSs play a crucial role in securing networks and computer systems worldwide
and employ several AI techniques, such as machine learning, to enhance
performance against novel cyber-attack challenges. Deep learning has a significant
benefit over other traditional machine learning methods because it can independently
detect relevant features in high dimensional data. Among deep learning algorithms,
the CNN has been widely used by researchers to improve IDS solutions regarding
privacy issues and security threats. Therefore, this study provides a comprehensive
survey of CNN-based IDS schemes. It initially provides the background, in which
deep learning is briefly discussed, and the CNN is explained in detail.
Wei Wang et al. write in their paper: A network intrusion detection system is an
important cyber defense tool to protect a system from illegal attacks. Building an
effective network intrusion detection system that makes good use of deep learning
methods is a challenging task. From the object perspective, different types of
malicious attacks have a quite imbalanced distribution, especially compared with
normal network behavior. From the feature perspective, the network behavior
description contains heterogeneous features, including numeric and categorical
features and complex interactions among these features. To address these two
challenges, we propose a novel Network Intrusion Detection System based on
Representation Learning, i.e., RL-NIDS, which models the network behavior by
learning explicit and implicit
feature interactions in both feature value representation and object representation
spaces. Specifically, the RL-NIDS consists of two main modules, i.e., unsupervised
Feature Value Representation Learning module (FVRL) which aims to learn the
feature interactions among categorical features explicitly, and supervised Neural
Network for object Representation Learning (NNRL) which aims to learn the
implicit interactions in the representation space. Experiments show the effectiveness
of RL-NIDS and of the object representation learned by RL-NIDS with multiclass
classification on two real-world datasets. The RL-NIDS outperforms the state-of-
the-art feature selection-based methods and deep learning-based methods in terms
of overall accuracy, precision, recall, and F1 score. The classification accuracy
on the NSL-KDD and AWID datasets is 81.38% and 95.72%,
respectively, an improvement of 3.9% and 0.9% over the second-best
method. Moreover, a thorough ablation study demonstrates the contributions of both
FVRL and NNRL which complement each other for capturing feature interactions.
With the widespread use of the Internet, more and more research focuses on
cybersecurity. Among the many cybersecurity defense techniques, the network
intrusion detection system (NIDS) is one of the most important tools that can
actively protect a system from illegal external attacks. A traditional NIDS is based on
pattern matching, which compares the patterns of a network against existing
malicious patterns that are summarized by humans. Nowadays, an
increasing number of researchers apply machine learning techniques to make
intrusion detection more effective. Effective detection is essential to cybersecurity, and making good
use of deep learning techniques to build an NIDS is not a trivial task. In this work, we
propose an effective NIDS, i.e., RL-NIDS, which contains explicit feature
interaction learning and implicit deep representation learning. The explicit feature
interaction learning captures the network behavior through a multi-grain clustering
and highlights the abnormal feature value in the learned embedding in an
unsupervised way. The implicit representation learning is implemented by a neural
network which is constrained by classification loss and triplet loss. We design a
customized triplet generating and learning process to learn a more discriminative
representation and decision boundaries to overcome the imbalanced data distribution
issue.
2.18 CLASSIFICATION BY PAIRWISE COUPLING OF
IMPRECISE PROBABILITIES
Benjamin Quost et al. write: In this paper, we are interested in
making decisions by combining classifiers providing uncertain outputs, in the form
of sets of probability distributions. More precisely, each classifier provides lower
and upper bounds on the conditional probabilities of the associated classes. The
classifiers are combined by computing the set of unconditional probability
distributions compatible with these bounds, by solving linear optimization problems.
When the classifier outputs are inconsistent, we propose a correcting step that
restores this consistency. The experiments show the interest of our approach for
solving multi-class classification problems, particularly when information is scarce
(i.e., a limited number of classifiers is available). In this case, modeling the lack of
information associated with classifier outputs gives good results even when they are
poorly regularized or overfit the data. Among those latter decomposition-and-
combination strategies, the use of binary classifiers has received particular attention.
In this case, each sub-problem consists in separating two (sets of) classes from each
other. For example, the one-against-all decomposition scheme consists in opposing
each class to all the others; the pairwise strategy (such as pictured in Figure 1 for a
4-class problem), in opposing each class to each other. Both approaches may be
generalized within the theoretical framework of error-correcting output codes. One
can then use any kind of classifier to solve each of the binary problems (e.g., support
vector machines, decision trees, naive Bayes, logistic regression, etc). Note,
however, that binary classifier combination is known to be beneficial with respect
to direct multiclass approaches when considering a class of simple classifiers (e.g.,
combining linear classifiers makes it possible to compute a non-linear decision
boundary). On the other hand, combining sophisticated classification algorithms
(such as, e.g., kernel SVM, neural networks, or deep learning) will not significantly
increase classification accuracy compared to direct multi-class classification. This
phenomenon has been pointed out and further studied in earlier work, where the pairwise, one-
against-all and direct multi-class schemes were shown to perform similarly when
well-regularized classifiers are combined. In this article, we presented an approach
to solve multiclass classification problems by combining imprecise binary
classifiers. Each classifier is trained to separate two sets of classes of the original
training set. It is assumed to provide bounds on the conditional probabilities that an
instance belongs to the sets of considered classes. The classifiers are combined by
computing the set of probability distributions which are consistent with their outputs.
The bounds are first expressed as constraints on the unconditional probabilities of
the classes. Then, the maximality rule can be used to determine the set of plausible
(non-dominated) classes. Alternatively, should a single decision be made, the maximin
rule returns the class with highest lower probability. Inconsistencies are resolved by
discounting the classifier outputs so as to find at least a probability distribution
which is consistent with the classifier outputs.
SYSTEM ANALYSIS
With an increase in the number and types of network attacks, traditional firewalls
and data encryption methods can no longer meet the needs of current network
security. As a result, intrusion detection systems have been proposed to deal with
network threats. The current mainstream intrusion detection algorithms are aided
by machine learning but suffer from low detection rates and the need for
extensive feature engineering. To address the issue of low detection accuracy, this
paper proposes a traffic anomaly detection model named deep learning model
for network intrusion detection (DLNID), which combines an attention mechanism
and a bidirectional long short-term memory (Bi-LSTM) network. The model first extracts
sequence features of data traffic through a convolutional neural network (CNN),
then reassigns the weights of each channel through the attention
mechanism, and finally uses Bi-LSTM to learn the sequence features.
Public intrusion detection datasets are generally seriously imbalanced. To
address data imbalance issues, this paper employs adaptive synthetic
sampling (ADASYN) to expand the minority class samples and eventually
form a relatively symmetric dataset, and uses a modified stacked autoencoder for
data dimensionality reduction with the objective of enhancing information fusion.
3.1.1 DRAWBACKS
• The DLNID models are computationally expensive to train and deploy. This
is because DLNID models require large amounts of data and powerful
hardware to train effectively.
• They require large amounts of labeled data to train effectively. This data can be
difficult and expensive to collect.
• Additionally, DLNID models are sensitive to the quality of the training data.
If the training data is biased or contains noise, the model will not be able to
learn to detect intrusions accurately.
• These models are black box models, meaning that it is difficult to understand
how they make predictions. This can make it difficult to debug DLNID
models and to identify false positives and false negatives.
The proposed system integrates advanced techniques for intrusion detection in the dynamic
cybersecurity landscape. Combining a Probability Model for baseline behavior analysis, a Link-
Anomaly Score computation for identifying suspicious network connections, Change Point
Analysis and Dynamic Time Warping for detecting shifts in statistical properties and temporal
patterns, and the Adaptive Decision Tree-Support Vector Machine (ADT-SVM) algorithm for
accurate classification, the system offers a comprehensive approach to identifying potential
security threats. By leveraging these modules, the proposed system aims to enhance the
adaptability and effectiveness of intrusion detection, providing a robust defense mechanism
against evolving cyber threats. The ADT-SVM algorithm, with its ability to learn and categorize
diverse data attributes, plays a central role in the proposed system, contributing to a more
resilient and responsive cybersecurity framework. The implementation process also uses the KDD
dataset as a benchmark to validate the system's performance.
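Of the techniques named above, Dynamic Time Warping is the easiest to show in isolation. The sketch below is the textbook dynamic-programming form of DTW for two numeric sequences, using the absolute difference as the local cost. It is not taken from the project's code; the class name and the sample sequences are assumptions made for illustration.

```java
import java.util.Arrays;

public class DtwSketch {
    /**
     * Classic DTW distance between two sequences.
     * cost[i][j] holds the cheapest alignment of the first i values of a
     * with the first j values of b.
     */
    public static double dtw(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: match, stretch a, stretch b.
                double previous = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + previous;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        // Same shape, one value repeated: warping absorbs the difference in timing.
        System.out.println(dtw(new double[] {1, 2, 3, 4}, new double[] {1, 2, 2, 3, 4})); // 0.0

        // Shifted in value rather than in time: a non-zero distance remains.
        System.out.println(dtw(new double[] {1, 2, 3}, new double[] {2, 3, 4})); // 2.0
    }
}
```

A small distance means that two traffic sequences follow the same temporal pattern even if one is stretched in time, which is the property that makes DTW suitable for comparing temporal patterns.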
3.2.1 ADVANTAGES
3.3 FEASIBILITY STUDY
A preliminary investigation examines project feasibility, that is, the likelihood that the
system will be useful to the organization. The main objective of the feasibility study is to test the
technical, operational and economical feasibility of adding new modules and
debugging the old running system. Any system is feasible given unlimited resources
and infinite time. The feasibility study portion of the preliminary
investigation covers three aspects:
• Technical Feasibility
• Operational Feasibility
• Economical Feasibility
The well-planned design would ensure the optimal utilization of the computer
resources and would help in the improvement of performance status.
The system is economically feasible. It does not require any additional hardware
or software. Since the interface for this system is developed using the existing
resources and technologies available at NIC, the expenditure is nominal and
economical feasibility is certain.
CHAPTER 4
SYSTEM SPECIFICATION
RAM size : 8 GB
SOFTWARE DESCRIPTION
JAVA
The software requirement specification is created at the end of the analysis task. The
function and performance allocated to software as part of system engineering are
developed by establishing a complete information description, a functional
representation, a representation of system behavior, an indication of performance
requirements and design constraints, and appropriate validation criteria.
FEATURES OF JAVA
The following figure depicts a Java program, such as an application or applet, that's
running on the Java platform. As the figure shows, the Java API and Virtual Machine
insulate the Java program from hardware dependencies.
As a platform-independent environment, Java can be a bit slower than native
code. However, smart compilers, well-tuned interpreters, and just-in-time byte code
compilers can bring Java's performance close to that of native code without
threatening portability.
SOCKET OVERVIEW:
A network socket is a lot like an electrical socket. Various plugs around the
network have a standard way of delivering their payload. Anything that understands
the standard protocol can “plug in” to the socket and communicate.
Internet Protocol (IP) is a low-level routing protocol that breaks data into small
packets and sends them to an address across a network; it does not guarantee
delivery of those packets to the destination.
A server is anything that has some resource that can be shared. There
are compute servers, which provide computing power; print servers, which manage
a collection of printers; disk servers, which provide networked disk space; and web
servers, which store web pages. A client is simply any other entity that wants to gain
access to a particular server.
RESERVED SOCKETS:
FACTORY METHODS:
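static InetAddress getLocalHost( ) throws UnknownHostException
static InetAddress getByName(String hostName) throws UnknownHostException
static InetAddress[ ] getAllByName(String hostName) throws UnknownHostException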
INSTANCE METHODS:
The InetAddress class also has several other methods, which can be
used on the objects returned by the methods just discussed. Here are some of the
most commonly used.
boolean equals(Object other) - Returns true if this object has the same
Internet address as other.
3. String getHostName( ) - Returns a string that represents the host name
associated with the InetAddress object.
5. String toString( ) - Returns a string that lists the host name and the IP
address for convenience.
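The three instance methods listed above can be tried out with a short sketch. To keep it independent of any name server, the addresses are built with InetAddress.getByAddress(String, byte[]), which performs no lookup; the host names, the IP address and the helper method are illustrative assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class InetAddressSketch {
    /** Builds an InetAddress from a name and four address bytes without any DNS lookup. */
    public static InetAddress of(String hostName, int a, int b, int c, int d) {
        try {
            byte[] raw = {(byte) a, (byte) b, (byte) c, (byte) d};
            return InetAddress.getByAddress(hostName, raw);
        } catch (UnknownHostException e) {
            // Only thrown for an address of illegal length, which cannot happen here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress first = of("gateway.example", 10, 0, 0, 1);
        InetAddress second = of("router.example", 10, 0, 0, 1);

        System.out.println(first.equals(second)); // true: equals compares the IP address only
        System.out.println(first.getHostName());  // gateway.example
        System.out.println(first);                // gateway.example/10.0.0.1
    }
}
```

Note that equals ignores the host name: two objects with different names but the same IP address are considered equal.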
Socket(String hostName, int port) - Creates a socket connecting the local host to the
named host and port; can throw an UnknownHostException or an IOException.
A socket can be examined at any time for the address and port
information associated with it, by use of the following methods:
Java has a different socket class that must be used for creating server
applications. The ServerSocket class is used to create servers that listen for either
local or remote client programs to connect to them on published ports. ServerSockets
are quite different from normal Sockets.
When the user creates a ServerSocket, it registers itself with the system as having
an interest in client connections.
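The difference between the two socket classes is easiest to see in a minimal round trip. In the sketch below a ServerSocket listens on a free loopback port, a client Socket connects to it, and one line of text is echoed back. The class name, the "echo: " prefix and the use of port 0 (let the system pick a free port) are choices made for this illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoSketch {
    /** Starts a one-shot echo server on the loopback interface, sends one line, returns the reply. */
    public static String roundTrip(String message) {
        // Port 0 asks the system for any free port; the ServerSocket starts listening at once.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                // accept() blocks until a client connects and returns an ordinary Socket.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            // Client side: a normal Socket connects to the server's address and port.
            try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // echo: hello
    }
}
```

The ServerSocket never carries data itself; it only hands out a new Socket for each accepted client, and all reading and writing happens on that Socket.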
PROJECT DESCRIPTION
Traditional network security measures such as firewalls and data encryption are no
longer sufficient to protect networks from the increasing number and types of cyber-
attacks. Intrusion detection systems (IDSs) have been proposed to address this
challenge, but they typically suffer from low detection rates and the need for
extensive feature engineering. Deep learning models have the potential to overcome
these challenges and provide more effective intrusion detection. Deep learning
models can learn complex patterns in network traffic data and detect new and
emerging threats without the need for extensive feature engineering. However, deep
learning models also have several drawbacks, including high computational cost,
data requirements, lack of interpretability, and vulnerability to adversarial attacks.
[System flow diagram: Loading Dataset -> Preprocessing -> Feature Selection -> Computing Anomaly Score Based On Selected Features -> Detecting Threats Using ADT-SVM Method -> Result]
6.4 INPUT DESIGN
6.5 OUTPUT DESIGN
The output design in the context of the cybersecurity and intrusion detection project
utilizing the ADT-SVM algorithm involves the systematic presentation and
interpretation of results generated by the Intrusion Detection System (IDS). The
output includes categorizations of incoming data into four classes: Basic, Content,
Traffic, and Host, allowing for a granular understanding of network behavior.
Detection and classification outcomes are presented through visualizations or reports
that highlight instances of identified intrusions and false alarms.
CHAPTER 7
System testing in the context of the cybersecurity project employing the ADT-SVM
algorithm involves a comprehensive evaluation of the entire Intrusion Detection
System (IDS). This phase verifies the functionality, performance, and reliability of
the IDS by subjecting it to various test cases and scenarios. The testing process
includes assessing the adaptability of the ADT-SVM algorithm to dynamic cyber
threats and ensuring its ability to categorize data attributes into the designated
classes: Basic, Content, Traffic, and Host. The KDD dataset is utilized to simulate
real-world conditions and benchmark the system's performance. System testing also
incorporates the evaluation of key metrics such as Detection Rate (DR) and False
Alarm Rate (FAR) to gauge the accuracy and efficiency of the IDS in identifying
intrusions while minimizing false positives.
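The two metrics named above can be computed from the four confusion-matrix counts. The sketch assumes the usual definitions, DR = TP / (TP + FN) and FAR = FP / (FP + TN), where an intrusion is the positive class; the class name and the counts in the example are hypothetical and are not results of this project.

```java
public class DetectionMetricsSketch {
    /** Detection Rate: share of actual intrusions that were flagged. */
    public static double detectionRate(int truePositives, int falseNegatives) {
        return (double) truePositives / (truePositives + falseNegatives);
    }

    /** False Alarm Rate: share of normal records that were wrongly flagged. */
    public static double falseAlarmRate(int falsePositives, int trueNegatives) {
        return (double) falsePositives / (falsePositives + trueNegatives);
    }

    public static void main(String[] args) {
        // Hypothetical test run: 100 intrusions and 100 normal records.
        int tp = 90, fn = 10, fp = 5, tn = 95;
        System.out.println("DR  = " + detectionRate(tp, fn));  // DR  = 0.9
        System.out.println("FAR = " + falseAlarmRate(fp, tn)); // FAR = 0.05
    }
}
```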
SYSTEM MAINTENANCE
The objective of this maintenance work is to make sure that the system keeps
working at all times without any bugs. Provision must be made for environmental changes that
may affect the computer or software system. This is called the maintenance of the
system. The software world now changes rapidly, and the system should be capable
of adapting to these changes. In this project, a new
process can be added without affecting other parts of the system. Maintenance plays
a vital role. The system can accept modifications after its implementation.
This system has been designed to accommodate new changes without affecting
the system’s performance or its accuracy.
Maintenance is necessary to eliminate errors in the system during its working life
and to tune the system to any variations in its working environment. It has been seen
that there are always some errors found in the system that must be noted and
corrected. It also means the review of the system from time to time.
TYPES OF MAINTENANCE:
• Corrective maintenance
• Adaptive maintenance
• Perfective maintenance
• Preventive maintenance
9. CONCLUSION
FUTURE WORK
Future work in this domain could focus on refining and extending the proposed
cybersecurity framework to address emerging challenges. Further exploration of
advanced machine learning models, beyond ADT-SVM, could enhance the system's
detection capabilities. Investigating the integration of threat intelligence feeds and
real-time network monitoring technologies could contribute to a more proactive
defense mechanism. Additionally, incorporating mechanisms for self-learning and
adaptation to new attack vectors would be crucial for staying ahead of evolving
threats.
CHAPTER 10
APPENDICES
DECISION TREE.JAVA
package adt;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Random;
import libsvm.svm;
import libsvm.svm_model;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.classifiers.trees.J48;
/**
* @author admin
*/
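// Trains a J48 decision tree on train1.csv and predicts the class of every
// instance in test1.csv. Assumption: the declarations of csv1, csv2, nb and
// eval are missing from this listing and are reconstructed from the imports;
// newCls (the list of predicted class indices) is declared elsewhere.
try
{
    CSVLoader csv1=new CSVLoader();
    csv1.setSource(new File("train1.csv"));
    Instances trdata=csv1.getDataSet();
    trdata.setClassIndex(trdata.numAttributes() - 1); // last column is the class
    J48 nb=new J48();
    nb.buildClassifier(trdata);
    CSVLoader csv2=new CSVLoader();
    csv2.setSource(new File("test1.csv"));
    Instances tedata=csv2.getDataSet();
    tedata.setClassIndex(tedata.numAttributes() - 1);
    Evaluation eval=new Evaluation(trdata);
    eval.evaluateModel(nb,tedata);
    for(int i=0;i<tedata.numInstances();i++)
    {
        int ind=(int)nb.classifyInstance(tedata.instance(i));
        newCls.add(ind);
    }
    ocSVM();
    System.out.println(eval.toClassDetailsString());
    ADTocSVM();
}
catch(Exception e)
{
    e.printStackTrace();
}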
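// Apparently the body of ocSVM(): SVM prediction on test2.csv. The declarations
// of svm1, sm, model, input and output are missing from this listing.
try
{
    svm1.readTrData("train2.csv");
    svm1.convertTrData("train1.txt");
    readData("test2.csv");
    int i, predict_probability=0;
    if(predict_probability == 1)
    {
        // probability estimates requested: stop if the model lacks them
        if(svm.svm_check_probability_model(model)==0)
        {
            System.exit(1);
        }
    }
    else
    {
        if(svm.svm_check_probability_model(model)!=0)
        {
            // model supports probability estimates, but they are not used
        }
    }
    String res=sm.predict(input,output,model,predict_probability);
    input.close();
    output.close();
}
catch(Exception e)
{
    e.printStackTrace();
}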
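// Apparently the body of ADTocSVM(): retrains the SVM and predicts test2.csv
// with the decision tree output added by convertData2. The declarations of
// svm1, sm, model, input and output are missing from this listing.
try
{
    svm1.readTrData("train2.csv");
    svm1.convertTrData("train1.txt");
    SVMTrain svmtr=new SVMTrain();
    svmtr.run();
    convertData2("test2.csv");
    int i, predict_probability=0;
    if(predict_probability == 1)
    {
        // probability estimates requested: stop if the model lacks them
        if(svm.svm_check_probability_model(model)==0)
        {
            System.exit(1);
        }
    }
    else
    {
        if(svm.svm_check_probability_model(model)!=0)
        {
            // model supports probability estimates, but they are not used
        }
    }
    String res=sm.predict(input,output,model,predict_probability);
    input.close();
    output.close();
}
catch(Exception e)
{
    e.printStackTrace();
}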
// Converts a CSV file (row 0: column names, row 1: column types, then data)
// into tab-separated integer records with the class label first.
// The declarations of fis, data, sg1, cls, clsCnt, at and fos are missing from
// this listing and are left as they appear.
try
{
    String dSet[][];
    int nData[][];
    String colName[];
    String colType[];
    fis.read(data);
    fis.close();
    String col[]=sg1[0].split(",");
    String colty[]=sg1[1].split(",");
    colName=new String[col.length];
    colType=new String[col.length];
    for(int i=0;i<col.length;i++)
    {
        colName[i]=col[i];
        colType[i]=colty[i];
    }
    dSet=new String[sg1.length-2][col.length];
    nData=new int[sg1.length-2][col.length];
    // copy the data rows and collect the distinct class labels
    for(int i=2;i<sg1.length;i++)
    {
        String sg2[]=sg1[i].split(",");
        for(int j=0;j<sg2.length;j++)
        {
            dSet[i-2][j]=sg2[j]; //org
        }
        String c1=sg2[sg2.length-1].trim();
        if(!cls.contains(c1))
        {
            cls.add(c1);
        }
    }
    System.out.println("cls "+cls);
    System.out.println("clsCnt "+clsCnt);
    for(int i=0;i<colType.length;i++)
    {
        if(colType[i].trim().equals("dis"))
        {
            // discrete column: replace each value by its index in the list at
            for(int j=0;j<dSet.length;j++)
            {
                String g1=dSet[j][i].trim();
                if(!at.contains(g1))
                {
                    at.add(g1);
                }
            }
            for(int j=0;j<dSet.length;j++)
            {
                String g1=dSet[j][i].trim();
                nData[j][i]=at.indexOf(g1);
            }
        }
        else
        {
            // continuous column: round to the nearest integer
            for(int j=0;j<dSet.length;j++)
            {
                dSet[j][i]=String.valueOf(Math.round(Double.parseDouble(dSet[j][i])));
                nData[j][i]=Integer.parseInt(dSet[j][i]);
            }
        }
    }
    String txt1="";
    for(int i=0;i<nData.length;i++)
    {
        // class label first, then the feature values
        String g1=String.valueOf(nData[i][nData[0].length-1]);
        for(int j=0;j<nData[0].length-1;j++)
        {
            g1=g1+"\t"+nData[i][j];
        }
        txt1=txt1+g1.trim()+"\n";
    }
    System.out.println(txt1);
    fos.write(txt1.getBytes());
    fos.close();
}
catch(Exception e)
{
    e.printStackTrace();
}
// Same conversion as the previous listing, but the label written first is the
// decision tree prediction stored in newCls. The declarations of fis, data,
// sg1, cls, clsCnt, at and fos are missing from this listing.
try
{
    String dSet[][];
    int nData[][];
    String colName[];
    String colType[];
    fis.read(data);
    fis.close();
    String col[]=sg1[0].split(",");
    String colty[]=sg1[1].split(",");
    colName=new String[col.length];
    colType=new String[col.length];
    for(int i=0;i<col.length;i++)
    {
        colName[i]=col[i];
        colType[i]=colty[i];
    }
    dSet=new String[sg1.length-2][col.length];
    nData=new int[sg1.length-2][col.length];
    // copy the data rows and collect the distinct class labels
    for(int i=2;i<sg1.length;i++)
    {
        String sg2[]=sg1[i].split(",");
        for(int j=0;j<sg2.length;j++)
        {
            dSet[i-2][j]=sg2[j]; //org
        }
        String c1=sg2[sg2.length-1].trim();
        if(!cls.contains(c1))
        {
            cls.add(c1);
        }
    }
    System.out.println("cls "+cls);
    System.out.println("clsCnt "+clsCnt);
    for(int i=0;i<colType.length;i++)
    {
        if(colType[i].trim().equals("dis"))
        {
            // discrete column: replace each value by its index in the list at
            for(int j=0;j<dSet.length;j++)
            {
                String g1=dSet[j][i].trim();
                if(!at.contains(g1))
                {
                    at.add(g1);
                }
            }
            for(int j=0;j<dSet.length;j++)
            {
                String g1=dSet[j][i].trim();
                nData[j][i]=at.indexOf(g1);
            }
        }
        else
        {
            // continuous column: round to the nearest integer
            for(int j=0;j<dSet.length;j++)
            {
                dSet[j][i]=String.valueOf(Math.round(Double.parseDouble(dSet[j][i])));
                nData[j][i]=Integer.parseInt(dSet[j][i]);
            }
        }
    }
    String txt1="";
    for(int i=0;i<nData.length;i++)
    {
        // predicted class of row i first, then the feature values
        String g1=newCls.get(i).toString();
        for(int j=0;j<nData[0].length-1;j++)
        {
            g1=g1+"\t"+nData[i][j];
        }
        // NOTE: kept from the original; it appends one fixed newCls entry to
        // the last field without a separator, which looks unintended.
        g1=g1+newCls.get(nData[0].length-1);
        txt1=txt1+g1.trim()+"\n";
    }
    System.out.println(txt1);
    fos.write(txt1.getBytes());
    fos.close();
}
catch(Exception e)
{
    e.printStackTrace();
}
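The encoding of "dis" (discrete) columns in the listing above can be isolated into a small, self-contained sketch. Each distinct value is replaced by the position at which it first appeared, and continuous values are rounded to the nearest integer, as in the listing. The class name, the method names and the sample column are assumptions made for illustration; the sketch uses a fresh value list per column, which the listing above may or may not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NominalEncodingSketch {
    /** Replaces every value of a discrete column by the index of its first occurrence. */
    public static int[] encodeDiscrete(String[] column) {
        List<String> seen = new ArrayList<>();
        int[] codes = new int[column.length];
        for (int j = 0; j < column.length; j++) {
            String value = column[j].trim();
            if (!seen.contains(value)) {
                seen.add(value);
            }
            codes[j] = seen.indexOf(value);
        }
        return codes;
    }

    /** Rounds a continuous value given as text to the nearest integer. */
    public static int encodeContinuous(String value) {
        return (int) Math.round(Double.parseDouble(value.trim()));
    }

    public static void main(String[] args) {
        String[] protocol = {"tcp", "udp", "tcp", "icmp"};
        System.out.println(Arrays.toString(encodeDiscrete(protocol))); // [0, 1, 0, 2]
        System.out.println(encodeContinuous("3.6"));                   // 4
    }
}
```

The integer codes carry no order information; they only make the nominal values acceptable to a numeric input format.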
REFERENCES
[1] R. Kumar, A. Malik, and V. Ranga, “An intellectual intrusion detection system
using hybrid hunger games search and remora optimization algorithm for IoT
wireless networks,” Knowl.-Based Syst., vol. 256, Nov. 2022, Art. no. 109762.
[4] B. A. Tama and S. Lim, “Ensemble learning for intrusion detection systems: A
systematic mapping study and cross-benchmark evaluation,” Comput. Sci. Rev.,
vol. 39, Feb. 2021, Art. no. 100357.
[5] S. Lei, C. Xia, Z. Li, X. Li, and T. Wang, “HNN: A novel model to study the
intrusion detection based on multi-feature correlation and temporal-spatial analysis,”
IEEE Trans. Netw. Sci. Eng., vol. 8, no. 4, pp. 3257–3274, Oct. 2021.
[7] X. Li, M. Zhu, L. T. Yang, M. Xu, Z. Ma, C. Zhong, H. Li, and Y. Xiang,
“Sustainable ensemble learning driving intrusion detection model,” IEEE Trans.
Dependable Secure Comput., vol. 18, no. 4, pp. 1591–1604, Jul./Aug. 2021.
[8] Y. Zhou, G. Cheng, S. Jiang, and M. Dai, “Building an efficient intrusion
detection system based on feature selection and ensemble classifier,” Comput.
Netw., vol. 174, Jun. 2020, Art. no. 107247.
[14] K. Li, G. Zhou, J. Zhai, F. Li, and M. Shao, “Improved PSO AdaBoost
ensemble algorithm for imbalanced data,” Sensors, vol. 19, no. 6, p. 1476, Mar.
2019.