ids final report
A PROJECT REPORT
Submitted by
BACHELOR OF ENGINEERING
IN
MAY 2024
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
of Engineering, of Engineering,
Tiruchendur-628215. Tiruchendur-628215.
First and foremost, we would like to thank God Almighty, whose abundant grace
sustained us in completing the project successfully.
We thank all our teaching and non-teaching staff members of the Computer
Science department for their passionate support, for helping us to identify our
flaws and also for the appreciation they gave us in achieving our goal. Also, we
would like to record our deepest gratitude to our parents for their constant
encouragement.
ABSTRACT
TABLE OF CONTENTS
5 SOFTWARE DESCRIPTION
5.1 FRONT END
6 PROJECT DESCRIPTION
6.1 PROBLEM DEFINITION
6.2 MODULE DESCRIPTION
6.3 SYSTEM FLOW DIAGRAM
6.4 INPUT DESIGN
6.5 OUTPUT DESIGN
7 SYSTEM TESTING AND IMPLEMENTATION
7.1 SYSTEM TESTING
7.2 SYSTEM IMPLEMENTATION
8 SYSTEM MAINTENANCE
8.1 CORRECTIVE MAINTENANCE
8.2 ADAPTIVE MAINTENANCE
8.3 PERFECTIVE MAINTENANCE
9 CONCLUSION AND FUTURE ENHANCEMENT
10 APPENDICES
10.1 SOURCE CODE
10.2 SCREEN SHOTS
11 REFERENCES
LIST OF FIGURES
1 FEATURE SELECTION
2 FEATURE ENGINEERING
LIST OF ABBREVIATIONS
NIDS  NETWORK INTRUSION DETECTION SYSTEMS
DSL  DIGITAL SUBSCRIBER LINE
TDM  TIME DIVISION MULTIPLEXING
EPON  ETHERNET PASSIVE OPTICAL NETWORK
NG-PON2  NEXT-GENERATION PASSIVE OPTICAL NETWORK STAGE 2
WDM  WAVELENGTH DIVISION MULTIPLEXING
WLAN  WIRELESS LOCAL AREA NETWORK
IOT  INTERNET OF THINGS
D-FES  DEEP-FEATURE EXTRACTION AND SELECTION
AWID  AEGEAN WI-FI INTRUSION DATASET
B2B  BUSINESS-TO-BUSINESS
GB  GRADIENT BOOSTING
RF  RANDOM FOREST
CNN  CONVOLUTIONAL NEURAL NETWORK
CHAPTER 1
1. INTRODUCTION
In the digital age, the security of computer networks and data has become
paramount. With the increasing sophistication of cyber threats and the
interconnectedness of our systems, the need for robust network intrusion
detection systems (NIDS) has never been greater. Intrusion detection plays a
pivotal role in safeguarding organizations, detecting unauthorized access, and
mitigating potential threats to information systems. Traditional intrusion
detection methods often struggle to adapt to the ever-evolving threat landscape.
To address these challenges and enhance the efficacy of intrusion detection, we
propose a novel approach, "Network Intrusion Detection with Two-Phased Hybrid
Ensemble Learning and Automatic Feature Selection." This work brings together
techniques from machine learning, data science, and cybersecurity. By combining
ensemble learning and automatic feature selection in a two-phased detection
system, we aim to improve the accuracy and robustness of network intrusion
detection.
1.1 FEATURE SELECTION
In the realm of machine learning, the quality of data is often paramount to the
success of predictive models and data-driven applications. While machine
learning algorithms can work wonders when presented with vast datasets, the art
of "feature engineering" has emerged as an indispensable process to transform
raw data into a more informative and efficient format. Feature engineering is a
craft, akin to sculpting a raw material into a masterpiece, where the raw material
comprises the data and the masterpiece is an accurate and powerful predictive
model. The motivation behind feature engineering lies in the inherent
limitations and idiosyncrasies of raw data. In many real-world applications, data
is messy, incomplete, and often contains extraneous information. Furthermore,
not all data attributes are equally relevant to the task at hand. Feature
engineering seeks to address these challenges by meticulously crafting new
features or transforming existing ones to better capture the underlying patterns
and relationships in the data.
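One common filter-style way to automate feature selection is to rank features by their information gain with respect to the class label and keep the top-ranked ones. The sketch below illustrates that idea only; it is not necessarily the selection method used in this project. It assumes features have already been discretized to integer codes (as the appendix code does for "dis" columns), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Filter-style feature selection: rank discrete features by information gain (hypothetical helper). */
final class InfoGainRanker {

    /** Shannon entropy, in bits, of a list of class labels. */
    static double entropy(int[] labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int y : labels) {
            counts.merge(y, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v). */
    static double informationGain(int[] feature, int[] labels) {
        // Group the labels by the value the feature takes on each row.
        Map<Integer, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < feature.length; i++) {
            groups.computeIfAbsent(feature[i], k -> new ArrayList<>()).add(labels[i]);
        }
        double conditional = 0.0;
        for (List<Integer> g : groups.values()) {
            int[] sub = g.stream().mapToInt(Integer::intValue).toArray();
            conditional += (double) sub.length / labels.length * entropy(sub);
        }
        return entropy(labels) - conditional;
    }

    /** Returns column indices ordered from most to least informative. */
    static Integer[] rank(int[][] rows, int[] labels) {
        int nFeatures = rows[0].length;
        double[] gain = new double[nFeatures];
        for (int f = 0; f < nFeatures; f++) {
            int[] column = new int[rows.length];
            for (int r = 0; r < rows.length; r++) {
                column[r] = rows[r][f];
            }
            gain[f] = informationGain(column, labels);
        }
        Integer[] order = new Integer[nFeatures];
        for (int f = 0; f < nFeatures; f++) {
            order[f] = f;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> gain[idx]).reversed());
        return order;
    }
}
```

For example, with rows {{0,1},{0,0},{1,1},{1,0}} and labels {0,0,1,1}, column 0 separates the classes perfectly (a gain of 1 bit) while column 1 carries no information (a gain of 0), so rank places column 0 first.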
1.3 CLASSIFICATION
supervised learning, where models are trained on labeled data to replicate the
human ability to classify objects or observations into meaningful groups.
In the realm of machine learning, the quest for improved predictive accuracy
and robustness has led to the development of ingenious techniques, and at the
forefront of this innovation stands the concept of "Ensemble Learning." Much
like the collective intelligence of a diverse group of individuals can often
outperform a single expert, ensemble learning harnesses the power of multiple
machine learning models to make better predictions, decisions, and
classifications. The motivation behind ensemble learning stems from the
acknowledgment that no single machine learning model is universally optimal
for all tasks and datasets. In practice, different algorithms excel under different
conditions, and they may be more adept at capturing specific patterns or
mitigating particular sources of error. Ensemble learning seeks to capitalize on
this diversity by combining the strengths of multiple models, mitigating their
individual weaknesses, and achieving superior performance as a collective. By
aggregating predictions from multiple models, ensemble learning aims to
improve overall predictive accuracy. This is particularly valuable in domains
where high accuracy is paramount, such as medical diagnosis or financial
forecasting.
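The simplest way to combine several models, as described above, is hard majority voting: every base model predicts a class and the most frequent prediction wins. The sketch below shows only that combiner; the report's own two-phased scheme is not reproduced here. Base models are represented as plain functions, and the class name and the threshold rules in the usage example are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hard-voting ensemble: each base model predicts a class label; the most frequent label wins. */
final class MajorityVote {
    private final List<ToIntFunction<double[]>> members;

    MajorityVote(List<ToIntFunction<double[]>> members) {
        this.members = members;
    }

    int predict(double[] x) {
        // Count one vote per base model.
        Map<Integer, Integer> votes = new HashMap<>();
        for (ToIntFunction<double[]> m : members) {
            votes.merge(m.applyAsInt(x), 1, Integer::sum);
        }
        // Pick the label with the most votes; ties go to the smaller label so the result is deterministic.
        int best = Integer.MAX_VALUE;
        int bestVotes = -1;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            int label = e.getKey();
            int count = e.getValue();
            if (count > bestVotes || (count == bestVotes && label < best)) {
                best = label;
                bestVotes = count;
            }
        }
        return best;
    }
}
```

Usage, with three toy threshold rules standing in for trained classifiers: if two of the three rules flag a record as an attack (label 1), the ensemble outputs 1 even though the third rule disagrees. An odd number of members avoids ties in the two-class case.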
1.6 ANOMALY DETECTION
complexity and sophistication of malicious activities continue to evolve,
traditional rule-based Intrusion Detection Systems (IDS) have faced limitations
in effectively identifying and mitigating these threats. In response to this ever-
expanding threat landscape, the integration of machine learning techniques
within IDS has emerged as a promising approach. Machine learning, a subset of
artificial intelligence, has the unique capability to adapt and learn from data,
making it well-suited for the dynamic and evolving nature of cyber threats. By
leveraging advanced algorithms and data-driven insights, machine learning-
based IDS aim to bolster cybersecurity defences by detecting anomalous
patterns and malicious behaviours in network traffic, system logs, and other
digital assets.
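To make "detecting anomalous patterns" concrete, the sketch below scores a single numeric traffic measurement (for example, connections per second from one host) by how many standard deviations it lies from a baseline of assumed-benign observations. This single-feature z-score is a deliberately simple stand-in, not the anomaly score used by the project, and the threshold of 3.0 in the usage note is a common rule of thumb rather than a value from this report. Names are hypothetical.

```java
/** Flags values whose z-score relative to a benign baseline exceeds a threshold. */
final class ZScoreDetector {
    private final double mean;
    private final double std;

    /** Fits the mean and population standard deviation on baseline (assumed benign) values. */
    ZScoreDetector(double[] baseline) {
        double sum = 0.0;
        for (double v : baseline) {
            sum += v;
        }
        mean = sum / baseline.length;
        double sq = 0.0;
        for (double v : baseline) {
            sq += (v - mean) * (v - mean);
        }
        std = Math.sqrt(sq / baseline.length);
    }

    /** Anomaly score: absolute distance from the baseline mean, in standard deviations. */
    double score(double value) {
        if (std == 0.0) {
            // A constant baseline: any deviation at all is maximally anomalous.
            return value == mean ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return Math.abs(value - mean) / std;
    }

    boolean isAnomalous(double value, double threshold) {
        return score(value) > threshold;
    }
}
```

With a baseline of {10, 12, 8, 10, 10}, a value of 30 lies far outside the baseline spread and is flagged at threshold 3.0, while 11 is not. Real IDS features are multivariate, so practical systems combine many such signals or use a learned model.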
1.8 OBJECTIVES
3. Evaluate IDS performance using key metrics such as Detection Rate (DR)
and False Alarm Rate (FAR) on the KDD dataset.
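Under the usual definitions, treating "attack" as the positive class, the Detection Rate is the fraction of attack records that are flagged, DR = TP / (TP + FN), and the False Alarm Rate is the fraction of normal records that are wrongly flagged, FAR = FP / (FP + TN). The sketch below computes both from labelled predictions; collapsing the multi-class KDD labels into attack versus normal is an assumption, and the class name is hypothetical.

```java
/** Detection Rate and False Alarm Rate from binary confusion-matrix counts (attack = positive). */
final class IdsMetrics {

    /** DR = TP / (TP + FN): share of real attacks that were detected. */
    static double detectionRate(long tp, long fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** FAR = FP / (FP + TN): share of normal records raised as alarms. */
    static double falseAlarmRate(long fp, long tn) {
        return (fp + tn) == 0 ? 0.0 : (double) fp / (fp + tn);
    }

    /** Returns {TP, FN, FP, TN} from parallel arrays of true and predicted attack flags. */
    static long[] confusion(boolean[] actualAttack, boolean[] predictedAttack) {
        long tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < actualAttack.length; i++) {
            if (actualAttack[i]) {
                if (predictedAttack[i]) tp++; else fn++;
            } else {
                if (predictedAttack[i]) fp++; else tn++;
            }
        }
        return new long[]{tp, fn, fp, tn};
    }
}
```

For example, if 90 of 100 attack records are detected and 5 of 1,000 normal records are flagged, DR is 0.9 and FAR is 0.005. Reporting both matters because a detector can trivially reach DR = 1 by flagging everything, at the cost of FAR = 1.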
CHAPTER 2
2. LITERATURE REVIEW
Felix Obite et al. observe that the tremendous growth in Internet traffic has driven the telecommunications backbone to move aggressively from a time division multiplexing (TDM) orientation towards Ethernet solutions. Ethernet PON, which combines low-cost Ethernet with fiber infrastructure, has taken over a market initially dominated by Digital Subscriber Line (DSL) and cable modems. It is a new technology that is simple, inexpensive, and scalable, and it can deliver massive data services to end users over a single network. The paper reviews the evolution of the Ethernet Passive Optical Network (EPON), focusing on the current development of future high-data-rate access networks such as Next-Generation Passive Optical Network Stage 2 (NG-PON2), Wavelength Division Multiplexing (WDM) PON, and Orthogonal Frequency Division Multiplexing (OFDM) PON. In addition, the recently concluded 100 Gb Ethernet Passive Optical Network (100G-EPON) is reviewed to highlight recent developments in the field. With this comprehensive and up-to-date review, the authors aim to help network operators and interested practitioners focus on common priorities and timelines; another goal of the study is to identify technical remedies for future investigation.
Data traffic is increasing at an alarming rate: more users are coming online, and those already online spend more time there and use more bandwidth-intensive applications. Broadband services that permit high-speed Internet transmission are expected to improve economies. Hence, large bandwidth and mobility are two basic requirements for future access networks in order to support new and real-time broadband applications. DSL and cable modems are unable to withstand such demand, because they were designed on top of earlier communication infrastructures that were not optimized for data traffic. In cable modem systems, only a few RF channels are dedicated to data, while most of the bandwidth is reserved for servicing legacy analog video. DSL copper systems allow only limited data rates at the required distances because of signal attenuation and crosstalk. A new data-centric solution has therefore become necessary: a technology optimized for Internet Protocol (IP) data traffic. 10G-EPON is emerging as the next-generation Ethernet passive optical network; its technical specification was standardized by the IEEE 802.3av Task Force in September 2009. One of the major requirements in designing the specification was coexistence with the current 1G-EPON network on the same optical system, together with backward compatibility.
The paper describes the service trends and operator requirements that define the evolution of EPON and its future trends. It shows that optical technologies are evolving continuously towards higher speeds, higher wavelength capability, and higher loss budgets. A smart allocation and coexistence strategy for new and existing users is required, with a logical combination of different types of users such as business and residential subscribers. WDM-PONs, possibly implemented with TDMA and TDM techniques, are unarguably the next stage in PON evolution. With optical amplification, they offer higher bandwidth per ONU, longer reach, and larger splitting ratios than EPON and GPON architectures. They can accommodate various fiber topologies and provide additional functionality such as protection. If implemented, WDM-PONs will give access to a new broadband structure and broad-scale residential applications.
2.2 REVISITING WIRELESS INTERNET CONNECTIVITY: 5G VS WI-FI 6
While there is much excitement around the world regarding the fifth generation of cellular technology, known as ‘5G’, there is comparable enthusiasm for the next version of the Institute of Electrical and Electronics Engineers’ (IEEE) 802.11 Wireless Local Area Network (WLAN) standard, ‘Wi-Fi 6’. Next-generation wireless connectivity technologies are needed. The authors revisit the debate on wireless Internet connectivity by providing a new evaluation of the two main technologies involved in providing next-generation wireless broadband: 5G and Wi-Fi 6. Their analysis highlights how the futures of 5G and Wi-Fi 6 need to be understood within the larger context of how earlier generations of cellular and Wi-Fi technologies have shaped the evolution of wireless networking, and what this may mean for the future. First, in terms of general demand-side trends, data traffic is expected to continue to grow significantly, with an increasing proportion of devices using wireless connectivity as their first connection point. The COVID-19 pandemic of 2019–2021 highlighted the importance of enhanced digital connectivity for supporting remote work, education, and social engagement during the global crisis. However, new trends may also arise from the shifting work and social patterns produced by the pandemic. Such changes could have repercussions for the spatial and temporal usage of wireless broadband connectivity and for the associated economics of each technology.
proposed a systematic review of IDSs in IoT environments. In a similar way, the
authors reviewed numerous advanced intrusion detection approaches for the IoT,
clarifying and discussing open issues through an in-depth analysis of over 40
primary studies selected from an initial set of 324 papers. Based on the
available literature, the selected papers are grouped into four main categories
(anomaly-based, signature-based, specification-based, and hybrid IDS) and also
into three categories: centralized, distributed, and hybrid.
Bayu Adhi Tama et al. note that intrusion detection systems (IDSs) are intrinsically linked to comprehensive cyberattack prevention solutions. To achieve a higher detection rate, an improved detection framework is sought after, particularly one that uses ensemble learners. Designing an ensemble typically involves two main challenges: the choice of base classifiers and the choice of combiner method. The paper gives an overview of how ensemble learners are exploited in IDSs by means of a systematic mapping study. The authors collected and analyzed 124 prominent publications from the existing literature and mapped them into several categories, such as year of publication, publication venue, datasets used, ensemble methods, and IDS techniques. Furthermore, the study reports and analyzes an empirical investigation of a new classifier ensemble approach, called stack of ensemble (SoE), for anomaly-based IDS. SoE is an ensemble classifier that adopts a parallel architecture to combine three individual ensemble learners, namely random forest, gradient boosting machine, and extreme gradient boosting machine, in a homogeneous manner. The significance of performance differences among the classification algorithms is examined statistically in terms of Matthews correlation coefficient, accuracy, false positive rate, and area under the ROC curve. The study fills a gap in the current literature by providing an up-to-date systematic mapping study as well as an extensive empirical evaluation of recent advances in ensemble learning techniques applied to IDSs.
An ensemble of classifiers, hereafter referred to as an ensemble learner, has drawn a lot of interest in cybersecurity research, and the intrusion detection system (IDS) domain is no exception. An IDS deals with the proactive and responsive detection of external aggressors and anomalous server operations before they cause massive damage. Today, a wide variety of cyberattacks have placed organizations' critical infrastructures at risk. A successful attack may lead to serious consequences, including but not limited to financial loss, operational termination, and disclosure of confidential information. Moreover, the larger an organization's network, the greater the opportunity for attackers to exploit it, and the complexity of the network may also give rise to vulnerabilities and other specific threats. Therefore, security mitigation and protection strategies should be considered mandatory. The study revealed great interest in applying the random forest classifier to IDSs, because random forest implementations are widely available and almost effortless to apply; for instance, caret, Boruta, and VSURF are examples of R packages that offer random forest functionality.
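Of the metrics used in that comparison, the Matthews correlation coefficient is the least familiar. It summarizes all four confusion-matrix cells in one value between -1 and +1: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; returning 0 when a marginal total is zero is a common convention assumed here, and the class name is hypothetical.

```java
/** Matthews correlation coefficient from binary confusion-matrix counts. */
final class Mcc {

    static double mcc(long tp, long tn, long fp, long fn) {
        // Compute the product in double to avoid long overflow on large test sets.
        double denom = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        // Convention: MCC is 0 when any row or column of the confusion matrix is empty.
        return denom == 0.0 ? 0.0 : ((double) tp * tn - (double) fp * fn) / denom;
    }
}
```

A perfect classifier scores +1, a perfectly inverted one scores -1, and one whose predictions are independent of the true labels scores 0. Unlike plain accuracy, MCC stays informative when attacks and normal traffic are heavily imbalanced.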
Muhamad Erza Aminanto et al. note that recent advances in mobile technologies have made IoT-enabled devices more pervasive and integrated into our daily lives. The security challenges that need to be overcome stem mainly from the open nature of a wireless medium such as a Wi-Fi network. An impersonation attack is an attack in which an adversary is disguised as a legitimate party in a system or communications protocol. The connected devices are pervasive and generate high-dimensional data on a large scale, which complicates simultaneous detection. Feature learning, however, can circumvent the potential problems caused by the large volume of network data. The study therefore proposes a novel Deep-Feature Extraction and Selection (D-FES) method, which combines stacked feature extraction and weighted feature selection. Stacked autoencoding can provide more meaningful representations by reconstructing the relevant information from its raw inputs. The authors combine this with a modified weighted feature selection inspired by an existing shallow-structured machine learner, and demonstrate that the condensed set of features reduces both the bias of a machine learning model and its computational complexity. Experimental results on a well-referenced Wi-Fi network benchmark dataset, the Aegean Wi-Fi Intrusion Dataset (AWID), show the usefulness of D-FES, which achieves a detection accuracy of 99.918% and a false alarm rate of 0.012%; the authors report this as the most accurate detection of impersonation attacks in the literature.
The rapid growth of the Internet has led to a significant increase in wireless network traffic in recent years. According to a worldwide telecommunication consortium, 5G and Wi-Fi networks are expected to proliferate in the coming decades. By 2020, wireless network traffic was anticipated to account for two thirds of total Internet traffic, with 66% of IP traffic expected to be generated by Wi-Fi and cellular devices alone. Although wireless networks such as IEEE 802.11 have been widely deployed to provide users with mobility and flexibility in the form of high-speed local area connectivity, other issues such as privacy and security have arisen. The rapid spread of Internet of Things (IoT)-enabled devices has left wireless networks vulnerable to both passive and active attacks, the number of which has grown dramatically. Examples of these attacks are impersonation, flooding, and injection attacks. In this study, the authors presented D-FES, which combines stacked feature extraction and weighted feature selection techniques to detect impersonation attacks in Wi-Fi networks. A stacked autoencoder (SAE) is used to obtain a high-level abstraction of complex and large amounts of Wi-Fi network data. The model-free properties of the SAE and its ability to learn from complex, large-scale data account for the open nature of Wi-Fi networks, where an adversary can easily inject false data or modify data forwarded in the network.
CHAPTER 3
SYSTEM ANALYSIS
3.1.1 DRAWBACKS
• It requires large amounts of labeled data to train effectively. This data can
be difficult and expensive to collect.
3.2.1 ADVANTAGES
• Technical Feasibility
• Operational Feasibility
• Economic Feasibility
• Does the proposed equipment have the technical capacity to hold the data
required to use the new system?
• Will the proposed system provide adequate response to inquiries, regardless
of the number or location of users?
• Can the system be upgraded if developed?
• Are there technical guarantees of accuracy, reliability, ease of access and
data security?
Earlier, no system existed to cater to the needs of the ‘Secure Infrastructure
Implementation System’. The current system is technically feasible. It is a
web-based user interface for audit workflow on a DB2 database and thus provides
easy access to users. The database's purpose is to create, establish, and
maintain a workflow among various entities in order to facilitate all concerned
users in their various capacities or roles. Permissions are granted to users
based on their specified roles.
• Will the system be used and work properly if it is being developed and
implemented?
• Will there be any resistance from the user that will undermine the possible
application benefits?
This system has been designed with the above-mentioned issues in mind.
Management issues and user requirements were taken into consideration
beforehand, so resistance from users that could undermine the possible
application benefits is not expected.
The well-planned design would ensure the optimal utilization of the computer
resources and would help in the improvement of performance status.
CHAPTER 4
SYSTEM SPECIFICATION
RAM size : 8 GB
CHAPTER 5
SOFTWARE DESCRIPTION
JAVA
The software requirement specification is created at the end of the analysis task.
The function and performance allocated to software as part of system
engineering are refined by establishing a complete information description, a
functional representation, a representation of system behavior, an indication of
performance requirements and design constraints, and appropriate validation
criteria.
FEATURES OF JAVA
As a platform-independent environment, Java can be a bit slower than
native code. However, smart compilers, well-tuned interpreters, and just-in-time
byte code compilers can bring Java's performance close to that of native code
without threatening portability.
SOCKET OVERVIEW:
Internet Protocol (IP) is a low-level routing protocol that breaks data into
small packets and sends them to an address across a network; it does not
guarantee that the packets will be delivered to the destination.
CLIENT/SERVER:
disk space; and web servers, which store web pages. A client is simply any
other entity that wants to gain access to a particular server.
RESERVED SOCKETS:
INETADDRESS:
FACTORY METHODS:
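static InetAddress getLocalHost( ) throws UnknownHostException
static InetAddress getByName(String hostName) throws UnknownHostException
static InetAddress[ ] getAllByName(String hostName) throws UnknownHostException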
INSTANCE METHODS:
The InetAddress class also has several other methods, which can be used
on the objects returned by the methods just discussed. Here are some of the
most commonly used.
1. boolean equals(Object other) - Returns true if this object has the same
Internet address as other.
2. String getHostAddress( ) - Returns a string that represents the host
address associated with the InetAddress object.
3. String getHostName( ) - Returns a string that represents the host name
associated with the InetAddress object.
4. String toString( ) - Returns a string that lists the host name and the IP
address for convenience.
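The factory and instance methods above can be exercised with a short program. This sketch uses a literal loopback address, for which getByName only checks the address format and performs no DNS lookup, so it runs without network access. The wrapper names resolve and resolveAll are hypothetical; they only convert the checked UnknownHostException into an unchecked one.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Small wrappers around InetAddress's factory methods (wrapper names are hypothetical). */
final class InetAddressDemo {

    /** Factory method getByName: for a literal IP address, only the format is checked. */
    static InetAddress resolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + host, e);
        }
    }

    /** Factory method getAllByName: returns every address known for the host. */
    static InetAddress[] resolveAll(String host) {
        try {
            return InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        InetAddress a = resolve("127.0.0.1");
        InetAddress b = resolveAll("127.0.0.1")[0];
        System.out.println(a.getHostAddress()); // the textual IP address
        System.out.println(a.equals(b));        // equals compares the Internet addresses
        System.out.println(a);                  // toString: host name (if known) and IP address
    }
}
```

getHostName( ) is deliberately not called here, because for an address created from a literal it may trigger a reverse DNS lookup.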
There are two kinds of TCP sockets in Java: one for servers and one for
clients. The ServerSocket class is designed to be a “listener,” which waits for
clients to connect before doing anything. The Socket class is designed to
connect to server sockets and initiate protocol exchanges.
Socket(InetAddress ipAddress, int port) - Creates a socket using a
preexisting InetAddress object and a port; can throw an IOException.
A socket can be examined at any time for the address and port
information associated with it, by use of the following methods:
Java has a different socket class that must be used for creating server
applications. The ServerSocket class is used to create servers that listen for
either local or remote client programs to connect to them on published ports.
ServerSockets are quite different from normal Sockets.
When the user creates a ServerSocket, it registers itself with the system
as having an interest in client connections.
➢ ServerSocket(int port) - Creates a server socket on the specified port with a
queue length of 50.
➢ ServerSocket(int port, int maxQueue) - Creates a server socket on the
specified port with a maximum queue length of maxQueue.
➢ ServerSocket(int port, int maxQueue, InetAddress localAddress) - Creates
a server socket on the specified port with a maximum queue length of
maxQueue. On a multihomed host, localAddress specifies the IP address
to which this socket binds.
➢ ServerSocket has a method called accept( ) - which is a blocking call that
will wait for a client to initiate communications, and then return with a
normal Socket that is then used for communication with the client.
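The interplay between ServerSocket and Socket described above can be seen in a one-shot echo exchange over the loopback interface. In this sketch a background thread plays the server (it blocks in accept( ), reads one line, and writes it back) while the calling thread acts as the client. Binding to port 0 so the operating system picks a free port, and the names EchoDemo and roundTrip, are choices made for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** One-shot echo over loopback: a ServerSocket listener and a Socket client (hypothetical names). */
final class EchoDemo {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true, so println() pushes each line onto the connection immediately.
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Sends one line to a freshly started echo server and returns the server's reply. */
    static String roundTrip(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS choose a free port; 50 mirrors ServerSocket(int port)'s default queue length.
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread serverThread = new Thread(() -> {
                // accept() blocks until a client connects, then returns a normal Socket.
                try (Socket conn = server.accept();
                     BufferedReader in = reader(conn);
                     PrintWriter out = writer(conn)) {
                    out.println(in.readLine()); // echo the received line back
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            // Client side: Socket(InetAddress, int) connects to the listening server.
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A real server would loop around accept( ) and hand each returned Socket to its own thread so that one slow client does not block the others.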
URL:
CHAPTER 6
PROJECT DESCRIPTION
Traditional network security measures such as firewalls and data encryption are
no longer sufficient to protect networks from the increasing number and types
of cyber-attacks. Intrusion detection systems (IDSs) have been proposed to
address this challenge, but they typically suffer from low detection rates and the
need for extensive feature engineering. Deep learning models have the potential
to overcome these challenges and provide more effective intrusion detection.
Deep learning models can learn complex patterns in network traffic data and
detect new and emerging threats without the need for extensive feature
engineering. However, deep learning models also have several drawbacks,
including high computational cost, data requirements, lack of interpretability,
and vulnerability to adversarial attacks.
6.3 SYSTEM FLOW DIAGRAM
Figure: System flow diagram. Loading Dataset → Preprocessing → Feature Selection → Computing Anomaly Score Based On Selected Features → Detecting Threats Using ADT-SVM Method → Result
6.4 INPUT DESIGN
6.5 OUTPUT DESIGN
The output design in the context of the cybersecurity and intrusion detection
project utilizing the ADT-SVM algorithm involves the systematic presentation
and interpretation of results generated by the Intrusion Detection System (IDS).
The output includes categorizations of incoming data into four classes: Basic,
Content, Traffic, and Host, allowing for a granular understanding of network
behavior. Detection and classification outcomes are presented through
visualizations or reports that highlight instances of identified intrusions and
false alarms.
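The four classes named here correspond to the usual grouping of the 41 KDD Cup 99 connection features into basic, content, time-based traffic, and host-based traffic features. The sketch below maps a feature index to its group, assuming the standard 1-based numbering (1-9 basic, 10-22 content, 23-31 traffic, 32-41 host); the project's own preprocessing code is not shown, so this mapping and the names used are assumptions for illustration.

```java
/** Maps a 1-based KDD Cup 99 feature index (1..41) to one of four feature groups (hypothetical helper). */
final class KddFeatureGroups {

    enum Group { BASIC, CONTENT, TRAFFIC, HOST }

    static Group groupOf(int featureIndex) {
        if (featureIndex < 1 || featureIndex > 41) {
            throw new IllegalArgumentException("KDD features are numbered 1..41: " + featureIndex);
        }
        if (featureIndex <= 9) {
            return Group.BASIC;    // e.g. duration, protocol_type, service, flag, src_bytes
        }
        if (featureIndex <= 22) {
            return Group.CONTENT;  // e.g. hot, num_failed_logins, logged_in, root_shell
        }
        if (featureIndex <= 31) {
            return Group.TRAFFIC;  // time-based: count, srv_count, serror_rate, ...
        }
        return Group.HOST;         // host-based: dst_host_count, dst_host_srv_count, ...
    }
}
```

Grouping detections by these categories lets a report show, for instance, whether false alarms cluster in content-based or traffic-based features.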
CHAPTER 7
SYSTEM TESTING AND IMPLEMENTATION
7.1 SYSTEM TESTING
System testing in the context of the cybersecurity project employing the ADT-
SVM algorithm involves a comprehensive evaluation of the entire Intrusion
Detection System (IDS). This phase verifies the functionality, performance, and
reliability of the IDS by subjecting it to various test cases and scenarios. The
testing process includes assessing the adaptability of the ADT-SVM algorithm
to dynamic cyber threats and ensuring its ability to categorize data attributes
into the designated classes: Basic, Content, Traffic, and Host. The KDD dataset
is utilized to simulate real-world conditions and benchmark the system's
performance. System testing also incorporates the evaluation of key metrics
such as Detection Rate (DR) and False Alarm Rate (FAR) to gauge the accuracy
and efficiency of the IDS in identifying intrusions while minimizing false
positives.
CHAPTER 8
SYSTEM MAINTENANCE
The objective of maintenance is to ensure that the system keeps working at all
times without bugs. Provision must be made for environmental changes that may
affect the computer or software system; this is called system maintenance.
Software changes rapidly today, so the system should be capable of adapting to
these changes. In this project, new processes can be added without affecting
other parts of the system. Maintenance plays a vital role: the system must be
able to accept modifications after its implementation. This system has been
designed to accommodate new changes without affecting its performance or
accuracy.
TYPES OF MAINTENANCE:
o Corrective maintenance
o Adaptive maintenance
o Perfective maintenance
o Preventive maintenance
project will adapt those changes. The modification server work as the existing is
performed.
CHAPTER 9
9. CONCLUSION
FUTURE WORK
Future work in this domain could focus on refining and extending the proposed
cybersecurity framework to address emerging challenges. Further exploration of
advanced machine learning models, beyond ADT-SVM, could enhance the
system's detection capabilities. Investigating the integration of threat
intelligence feeds and real-time network monitoring technologies could
contribute to a more proactive defense mechanism. Additionally, incorporating
mechanisms for self-learning and adaptation to new attack vectors would be
crucial for staying ahead of evolving threats.
CHAPTER 10
APPENDICES
DECISION TREE.JAVA
package adt;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Random;
import libsvm.svm;
import libsvm.svm_model;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.classifiers.trees.J48;
/**
* @author admin
*/
try
csv1.setSource(new File("train1.csv"));
Instances trdata=csv1.getDataSet();
trdata.setClassIndex(trdata.numAttributes() - 1);
nb.buildClassifier(trdata);
csv2.setSource(new File("test1.csv"));
Instances tedata=csv2.getDataSet();
tedata.setClassIndex(tedata.numAttributes() - 1);
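Evaluation eval = new Evaluation(trdata);
eval.evaluateModel(nb, tedata); // compute the test-set statistics printed later by toClassDetailsString()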
for(int i=0;i<tedata.numInstances();i++)
int ind=(int)nb.classifyInstance(tedata.instance(i));
newCls.add(ind);
// int it=(int)tedata.instance(i).classValue();
// int ind=(int)nb.classifyInstance(tedata.instance(i));
// System.out.println(it+" : "+ind);
ocSVM();
System.out.println(eval.toClassDetailsString());
ADTocSVM();
catch(Exception e)
e.printStackTrace();
public void ocSVM()
try
svm1.readTrData("train2.csv");
svm1.convertTrData("train1.txt");
svmtr.run();
readData("test2.csv");
int i, predict_probability=0;
if(predict_probability == 1)
if(svm.svm_check_probability_model(model)==0)
{
System.exit(1);
else
if(svm.svm_check_probability_model(model)!=0)
String res=sm.predict(input,output,model,predict_probability);
input.close();
output.close();
catch(Exception e)
e.printStackTrace();
try
svm1.readTrData("train2.csv");
svm1.convertTrData("train1.txt");
svmtr.run();
convertData2("test2.csv");
int i, predict_probability=0;
if(predict_probability == 1)
if(svm.svm_check_probability_model(model)==0)
System.exit(1);
}
else
if(svm.svm_check_probability_model(model)!=0)
String res=sm.predict(input,output,model,predict_probability);
input.close();
output.close();
catch(Exception e)
e.printStackTrace();
try
String dSet[][];
int nData[][];
ArrayList cls=new ArrayList();
String colName[];
String colType[];
fis.read(data);
fis.close();
String col[]=sg1[0].split(",");
String colty[]=sg1[1].split(",");
colName=new String[col.length];
colType=new String[col.length];
for(int i=0;i<col.length;i++)
colName[i]=col[i];
colType[i]=colty[i];
dSet=new String[sg1.length-2][col.length];
nData=new int[sg1.length-2][col.length];
for(int i=2;i<sg1.length;i++)
String sg2[]=sg1[i].split(",");
for(int j=0;j<sg2.length;j++)
dSet[i-2][j]=sg2[j]; //org
String c1=sg2[sg2.length-1].trim();
if(!cls.contains(c1))
cls.add(c1);
System.out.println("cls "+cls);
System.out.println("clsCnt "+clsCnt);
for(int i=0;i<colType.length;i++)
if(colType[i].trim().equals("dis"))
ArrayList at=new ArrayList();
for(int j=0;j<dSet.length;j++)
String g1=dSet[j][i].trim();
if(!at.contains(g1))
at.add(g1);
for(int j=0;j<dSet.length;j++)
String g1=dSet[j][i].trim();
nData[j][i]=at.indexOf(g1);
else
for(int j=0;j<dSet.length;j++)
dSet[j][i]=String.valueOf(Math.round(Double.parseDouble(dSet[j][i])));
nData[j][i]=Integer.parseInt(dSet[j][i]);
String txt1="";
for(int i=0;i<nData.length;i++)
// String g1="";
String g1=String.valueOf(nData[i][nData[0].length-1]);
for(int j=0;j<nData[0].length-1;j++)
g1=g1+"\t"+nData[i][j];
//g1=g1+nData[i][j]+"\t";
txt1=txt1+g1.trim()+"\n";
System.out.println(txt1);
fos.write(txt1.getBytes());
fos.close();
catch(Exception e)
e.printStackTrace();
}
try
String dSet[][];
int nData[][];
String colName[];
String colType[];
fis.read(data);
fis.close();
String col[]=sg1[0].split(",");
String colty[]=sg1[1].split(",");
colName=new String[col.length];
colType=new String[col.length];
for(int i=0;i<col.length;i++)
colName[i]=col[i];
colType[i]=colty[i];
dSet=new String[sg1.length-2][col.length];
nData=new int[sg1.length-2][col.length];
for(int i=2;i<sg1.length;i++)
String sg2[]=sg1[i].split(",");
for(int j=0;j<sg2.length;j++)
dSet[i-2][j]=sg2[j]; //org
String c1=sg2[sg2.length-1].trim();
if(!cls.contains(c1))
cls.add(c1);
}
System.out.println("cls "+cls);
System.out.println("clsCnt "+clsCnt);
for(int i=0;i<colType.length;i++)
if(colType[i].trim().equals("dis"))
for(int j=0;j<dSet.length;j++)
String g1=dSet[j][i].trim();
if(!at.contains(g1))
at.add(g1);
for(int j=0;j<dSet.length;j++)
String g1=dSet[j][i].trim();
nData[j][i]=at.indexOf(g1);
else
for(int j=0;j<dSet.length;j++)
dSet[j][i]=String.valueOf(Math.round(Double.parseDouble(dSet[j][i])));
nData[j][i]=Integer.parseInt(dSet[j][i]);
String txt1="";
for(int i=0;i<nData.length;i++)
//String g1=String.valueOf(nData[i][nData[0].length-1]);
String g1=newCls.get(i).toString();
//String g1="";
for(int j=0;j<nData[0].length-1;j++)
g1=g1+"\t"+nData[i][j];
//g1=g1+nData[i][j]+"\t";
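// The label is already the first field (newCls.get(i)); appending newCls.get(nData[0].length-1)
// here would glue an unrelated row's prediction onto the last feature value without a tab.
//g1=g1+newCls.get(nData[0].length-1);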
txt1=txt1+g1.trim()+"\n";
System.out.println(txt1);
fos.write(txt1.getBytes());
fos.close();
catch(Exception e)
e.printStackTrace();
10.2 SCREEN SHOTS
CHAPTER 11
REFERENCES
[4] B. A. Tama and S. Lim, “Ensemble learning for intrusion detection systems:
A systematic mapping study and cross-benchmark evaluation,” Comput. Sci.
Rev., vol. 39, Feb. 2021, Art. no. 100357.
[5] S. Lei, C. Xia, Z. Li, X. Li, and T. Wang, “HNN: A novel model to study
the intrusion detection based on multi-feature correlation and temporal-spatial
analysis,” IEEE Trans. Netw. Sci. Eng., vol. 8, no. 4, pp. 3257–3274, Oct. 2021.
[7] X. Li, M. Zhu, L. T. Yang, M. Xu, Z. Ma, C. Zhong, H. Li, and Y. Xiang,
“Sustainable ensemble learning driving intrusion detection model,” IEEE
Trans. Dependable Secure Comput., vol. 18, no. 4, pp. 1591–1604, Jul./Aug.
2021.
[14] K. Li, G. Zhou, J. Zhai, F. Li, and M. Shao, “Improved PSO AdaBoost
ensemble algorithm for imbalanced data,” Sensors, vol. 19, no. 6, p. 1476, Mar.
2019.