A Machine Learning Proposal
BY
SUBMITTED TO
MRS. M.M USMAN
JUNE, 2021
CHAPTER ONE: INTRODUCTION
The aim of this study is to examine a machine learning approach to information security.
Specifically, it seeks to:
1. Investigate the use of the UNSW-NB15 dataset (University of New South Wales, NB 2015) for
the protection of information systems
2. Find out how Naive Bayes is used for the protection of information systems
3. Examine the use of the C4.5 Decision Tree machine learning algorithm for the protection of
information systems
4. Ascertain how KNN (K-Nearest Neighbour) is used for the protection of information
systems
1.4 Scope and Limitation of Study
The study examines a machine learning approach to information security. Machine
learning approaches are widely used to solve various types of information security problems. The
proposed project covers a machine learning-based network intrusion detection system for the
protection of information systems, based on the UNSW-NB15 dataset and the Naive Bayes, KNN and
Decision Tree models.
However, in carrying out this research, the researcher expects to face constraints of time and
finance.
1.5 Significance of the Study
The results of this study will help cyber security experts by guiding them on how to safeguard
and secure an information system against the malicious activities of hackers and cyber
attackers. Keeping information systems secure throughout their period of use (lifetime) is the
aim of this research work.
1.6 Definition of terms
Algorithm: a process or set of rules to be followed by a computer.
Cyber attack: any attempt to expose, alter, disable, destroy, steal or gain information through
unauthorized means.
Cyber security: the practice of protecting systems, networks, and programs from digital attacks.
Machine learning: the study of computer algorithms that improve automatically through
experience and by the use of data.
Information security: sometimes shortened to infosec, is the practice of
protecting information by mitigating information risks. It is part of information risk management.
CHAPTER TWO: LITERATURE REVIEW
2.1 Machine Learning
Machine learning (ML) is the study of computer algorithms that improve automatically through
experience and by the use of data. It is seen as a part of artificial intelligence.
2.2 Machine Learning Approaches
Machine learning approaches are traditionally divided into three broad categories, depending on
the nature of the "signal" or "feedback" available to the learning system:
Supervised learning: The computer is presented with example inputs and their desired
outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to
outputs.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).
Reinforcement learning: A computer program interacts with a dynamic environment in
which it must achieve a certain goal (such as driving a vehicle or playing a game against an
opponent). As it navigates its problem space, the program is provided feedback that is
analogous to rewards, which it tries to maximize.
reduce/mitigate – implement safeguards and countermeasures to eliminate vulnerabilities or
block threats
assign/transfer – place the cost of the threat onto another entity or organization such as
purchasing insurance or outsourcing
accept – evaluate if the cost of the countermeasure outweighs the possible cost of loss due to
the threat
CHAPTER THREE: ANALYSIS AND DESIGN
3.1.0 ANALYSIS OF THE EXISTING SYSTEM
The existing system is a machine learning-based network intrusion detection system for the
protection of information systems. It refers to the systems, tools and processes that are designed
and then deployed to shield sensitive and confidential data from being compromised or tampered with.
3.1.1 STRENGTH OF THE EXISTING SYSTEM
The advantage of this system is that it safeguards and secures an information system against the
malicious activities of hackers and cyber attackers.
3.1.2 WEAKNESSES OF THE EXISTING SYSTEM
The weakness of the existing system is that InfoSec was traditionally considered an IT
problem, which could not be further from the truth. Attacks can originate from any weak link in the
company regardless of hierarchy or department, so it is imperative that the entire enterprise is
protected by seamless security programmes.
3.2 ANALYSIS OF THE PROPOSED SYSTEM
The UNSW-NB15 dataset has two attributes that can serve as the class label: label and attack_cat.
The label attribute is binary, with a value of 0 for a normal connection and a value of 1 for an
attack connection. The attack_cat attribute has 10 values: one for each of the nine attack
categories and one for the normal connection.
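As a minimal sketch of how the two class labels relate, the Python snippet below derives the
binary label from attack_cat. The helper name binary_label and the sample records are assumptions
made for illustration, and the category spellings should be checked against the actual dataset
files, where they may differ slightly.

```python
# Nine attack categories plus the normal class give the 10 values of attack_cat.
# Spellings are an assumption; verify them against the actual CSV files.
ATTACK_CATEGORIES = [
    "Fuzzers", "Analysis", "Backdoors", "DoS", "Exploits",
    "Generic", "Reconnaissance", "Shellcode", "Worms",
]
NORMAL = "Normal"
ALL_CLASSES = [NORMAL] + ATTACK_CATEGORIES


def binary_label(attack_cat):
    """Return 0 for a normal connection and 1 for any attack category."""
    if attack_cat not in ALL_CLASSES:
        raise ValueError(f"unknown attack_cat: {attack_cat!r}")
    return 0 if attack_cat == NORMAL else 1


# Hypothetical records, not taken from the dataset.
records = [{"attack_cat": "Normal"}, {"attack_cat": "DoS"}, {"attack_cat": "Worms"}]
labels = [binary_label(r["attack_cat"]) for r in records]
```

Using label gives a two-class (normal versus attack) problem, while using attack_cat gives a
ten-class problem over the same connections.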
3.3 METHODOLOGY
This section presents machine learning-based intrusion detection models for information security.
The approach comprises several processing steps: exploring the security dataset, preparing raw data,
determining feature importance and ranking, and building the resultant models.
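The first three steps can be sketched in Python as follows. This is only an illustration under
stated assumptions: the toy rows, the feature names, min-max scaling as the preparation step, and
absolute Pearson correlation as the ranking score are all choices made here, not methods
prescribed by this proposal. Model building is not shown.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy data: each row is (duration, bytes_sent); labels are 0/1.
rows = [(0.1, 200.0), (0.2, 220.0), (3.0, 210.0), (2.5, 190.0)]
labels = [0, 0, 1, 1]
feature_names = ["duration", "bytes_sent"]

# Step 1: explore the dataset (here, only the class distribution).
class_counts = Counter(labels)


# Step 2: prepare raw data by min-max scaling each feature to [0, 1].
def min_max_scale(column):
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


columns = [min_max_scale([r[i] for r in rows]) for i in range(len(feature_names))]


# Step 3: score each feature by |Pearson correlation| with the label
# (an assumed ranking criterion), then sort from most to least important.
def abs_correlation(xs, ys):
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


scores = {name: abs_correlation(col, labels) for name, col in zip(feature_names, columns)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data, duration separates the two classes far better than bytes_sent, so it is ranked
first.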
3.4 SYSTEM DESIGN
System design is a solution to a problem. It demands the translation of the requirements
uncovered in analysis into possible ways of meeting them (E.O Nwachukwu).
3.4.1 INPUT AND OUTPUT SPECIFICATION
The inputs and outputs to a machine learning task may be of different kinds. Generally, they are
in the form of numeric (both discrete and real-valued) or nominal attributes. Numeric attributes
may have continuous numeric values, whereas nominal attributes take values from a predefined set.
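To illustrate the distinction, the sketch below turns one record with a real-valued attribute, a
discrete attribute and a nominal attribute into a single numeric input vector. The attribute
names, the three-value set for proto and the use of one-hot encoding are assumptions made for
illustration only.

```python
# Predefined set of values for a nominal attribute (hypothetical subset).
PROTO_VALUES = ("tcp", "udp", "icmp")


def one_hot(value, allowed):
    """Encode a nominal value as a 0/1 vector over its predefined set."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not in the predefined set {allowed}")
    return [1 if value == v else 0 for v in allowed]


def encode_record(duration, packets, proto):
    """Build one numeric input vector from mixed attribute kinds.

    duration: real-valued numeric attribute
    packets:  discrete numeric attribute
    proto:    nominal attribute
    """
    encoded_proto = [float(bit) for bit in one_hot(proto, PROTO_VALUES)]
    return [float(duration), float(packets)] + encoded_proto


vector = encode_record(duration=0.35, packets=12, proto="udp")
```

A value outside the predefined set raises an error rather than being silently encoded, which
makes unexpected nominal values visible during data preparation.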
CHAPTER FOUR: SYSTEM IMPLEMENTATION AND TESTING
4.1 Implementation
4.1.1 Naïve Bayes (NB).
Naive Bayes algorithms are probabilistic classifiers that make the a priori assumption that the
features of the input dataset are independent of each other given the class. They are scalable and
do not require huge training datasets to produce appreciable results.
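A minimal sketch of the idea for nominal features is given below. It assumes add-one (Laplace)
smoothing, and the function names train_nb and predict_nb and the toy samples are hypothetical.
It is not the implementation used in this work.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab


def predict_nb(model, x):
    """Pick the class maximising log P(class) + sum of log P(feature | class)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(x):
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            count = value_counts[(i, c)][v]
            score += math.log((count + 1) / (n_c + len(vocab[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical (proto, service) samples with binary labels.
samples = [("tcp", "http"), ("tcp", "http"), ("udp", "dns"), ("udp", "dns")]
labels = [0, 0, 1, 1]
model = train_nb(samples, labels)
```

Because of the independence assumption, the per-feature probabilities are simply multiplied
(added in log space), so training reduces to counting.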
4.1.2 K-Nearest Neighbour (KNN).
KNN classifiers are used for classification, including multi-class problems. However, they are
computationally demanding at prediction time because, to classify each test sample, they
compare it against all the training samples.
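The comparison against every training sample can be sketched as follows. The function name
knn_predict, the Euclidean distance, the default k of 3 and the toy points are assumptions made
for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training samples."""
    # Distance to every training sample: this full scan is what makes
    # classifying each test sample costly.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(y for _, y in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples forming two well-separated groups.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
```

The same function handles multi-class problems without change, since the vote is taken over
whatever labels appear among the k neighbours.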
4.1.3 C4.5 Decision Tree
In this type of classification, the target concept is represented in the form of a tree.
The tree is built by using the principle of recursive partitioning. An attribute is selected as a
partitioning attribute (also referred to as node) based on some criteria (like information gain)
[Mit97].
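As a sketch of the selection step, the snippet below computes information gain for nominal
attributes and picks the best one as the partitioning attribute. The attribute names and toy
values are hypothetical. C4.5 itself normalises information gain into a gain ratio and also
handles numeric attributes and pruning, none of which is shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from partitioning labels by a nominal attribute."""
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(table, labels):
    """Return the attribute name with the highest information gain."""
    return max(table, key=lambda name: information_gain(table[name], labels))


# Hypothetical attribute columns and binary labels.
table = {
    "proto": ["tcp", "tcp", "udp", "udp"],
    "state": ["FIN", "INT", "FIN", "INT"],
}
labels = [0, 0, 1, 1]
root = best_attribute(table, labels)
```

Recursive partitioning would then repeat the same selection on each subset of records produced
by the chosen attribute until the subsets are pure or no attributes remain.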
4.2 Testing
The models were evaluated using the testing dataset, from the work,
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATION
5.1 Summary
5.2 Conclusion
5.3 Recommendation
5.4 Future work
REFERENCE
APPENDIX