Machine Learning for Cyber

Unit 1: Introduction

This document is licensed with a Creative Commons Attribution 4.0 International License ©2017
Learning Outcomes

Upon completion of this unit:


• Students will have a better understanding of machine learning
approaches.
• Students will have a better understanding of features.
• Students will have a better understanding of data sets.
• Students will have a better understanding of the need for machine
learning to solve cyber security problems.
• Students will have a better understanding of the difference between
deep learning and machine learning.
• Students will have a better understanding of big data and how it
relates to machine learning and cyber security.

Terms

Machine learning

Cyber security

Big data

Deep learning

Tools

• WEKA
• Python
• Numpy
• Pandas
• Sklearn
• Tensorflow
• Hadoop
• Spark
• AWS
• GPUs, CPUs, TPUs, cognitive processors

What is machine learning?

• Methods for predicting, detecting, or grouping data samples based on a model

• The model must be learned from data

• Methods can be geometrical (or not); geometrical models are based on distance metrics or linear boundaries
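
As a rough illustration of the two geometrical ideas above, the sketch below labels a new sample once by distance (nearest neighbor) and once by a linear boundary. The feature values, the labels "normal" and "attack", and the hand-picked weights are all hypothetical; in practice the model parameters would be learned from data rather than set by hand.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_label(sample, train_points, train_labels):
    """Distance-based model: copy the label of the closest training sample."""
    distances = [euclidean(sample, p) for p in train_points]
    return train_labels[distances.index(min(distances))]

def linear_boundary_label(sample, weights, bias):
    """Linear model: the sign of w.x + b decides the side of the boundary."""
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    return "attack" if score > 0 else "normal"

# Hypothetical samples with two scaled features each (assumption, not from the slides).
train_points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_labels = ["normal", "normal", "attack", "attack"]

new_sample = (0.85, 0.75)
knn_result = nearest_neighbor_label(new_sample, train_points, train_labels)

# Hand-picked weights for illustration: the boundary is the line x1 + x2 = 1.
linear_result = linear_boundary_label(new_sample, weights=(1.0, 1.0), bias=-1.0)

print(knn_result, linear_result)
```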

Why machine learning for cyber?

• Too much data

• Building models by hand is labor intensive

• Machine learning can learn models automatically from data

What is big data?

• Lots of data
• Terabytes

• So much data that a single computer with 8 GB of RAM and the latest CPU cannot do the work

• Instead, a more powerful computer is needed

• Better yet, several computers working in parallel

• Two main approaches:


• Parallel CPUs
• GPUs (1 or several also in parallel)

Machine learning Terms - 1

• Supervised
• Classifiers

[Diagram: Data → dividing → Train Data and Test Data; Train Data → Train Model → Classifier; Test Data (X_test, Y_test) → Classifier → Evaluation]
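
The supervised workflow in the diagram (divide the data, train a model, evaluate on held-out test data) can be sketched with scikit-learn as follows. The choice of the Iris data set, a decision tree classifier, and a 70/30 split are assumptions made for illustration only; any labeled data set and classifier would follow the same steps.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled data set: X holds the features, y holds the class labels.
X, y = load_iris(return_X_y=True)

# Divide the data into a training part and a held-out test part (70/30 here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Train the model on the training data only.
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)

# Evaluate: predict labels for X_test and compare them with y_test.
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the test data out of training is what makes the evaluation step meaningful: the score estimates how the classifier behaves on samples it has not seen.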

Machine learning Terms - 2

• Unsupervised
• Clustering

[Diagram: Data → Clustering → Clustered Data → Evaluation]
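
A minimal sketch of the unsupervised flow above, assuming scikit-learn's k-means as the clustering method and the silhouette score as the evaluation. The two synthetic blobs are made-up data; no labels are given to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic, unlabeled data: two well-separated groups of 2-D points (assumption).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])

# Clustering: group the samples into 2 clusters without using any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(data)

# Evaluation: the silhouette score is in [-1, 1]; higher means tighter,
# better separated clusters.
score = silhouette_score(data, cluster_ids)
print(f"Silhouette score: {score:.2f}")
```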

Machine learning Terms - 3

• Features
• Data sets
• Data pre-processing
• Performance metrics
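
To make "performance metrics" concrete, the sketch below computes accuracy, precision, recall, and F1 from true and predicted labels. The label lists are invented for illustration, and "attack" is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical ground truth and classifier output for 8 samples.
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "normal"]
y_pred = ["attack", "attack", "normal", "attack", "normal", "normal", "normal", "normal"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn, accuracy, precision, recall, f1)
```

In cyber security data sets attacks are often rare, so precision and recall usually say more than accuracy alone.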

Machine learning algorithms

• Naïve Bayes
• Decision trees
• Random forest
• KNN
• Linear regression
• Logistic regression
• Neural networks
• Support Vector Machines

What is deep learning?

• Neural networks with more layers between the input and output
layers
• Batch processing for big data
• Matrix multiplication operation takes advantage of GPUs
• Have outperformed other approaches on many tasks (for example, image and speech recognition) since around 2012
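
The points about extra layers, batches, and matrix multiplication can be seen in a small NumPy sketch of a forward pass. The layer sizes, batch size, and random (untrained) weights are assumptions for illustration; a real network would learn its weights, typically with a framework such as TensorFlow running the same matrix products on a GPU.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 4 samples with 8 features each (made-up values).
batch = rng.normal(size=(4, 8))

# Input layer (8) -> two hidden layers (16, 16) -> output layer (3 classes).
layer_sizes = [8, 16, 16, 3]
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x, weights, biases):
    """Push a whole batch through the network with one matrix product per layer."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)  # hidden layer with ReLU activation
    logits = x @ weights[-1] + biases[-1]
    # Softmax turns each row of scores into class probabilities.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

probs = forward(batch, weights, biases)
print(probs.shape)
```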

Machine learning pipeline

[Diagram: Data → Pre-processing (formatting, featured…) → Vector Space Model → Machine Learning Algorithms → Evaluation]

Dataset formats

• .csv
• 0,tcp,http,SF,162,4528,0,0,0,1, … ,normal.

• .libsvm
• [label] [index 1]:[value 1] [index 2]:[value 2] …

• .arff
• Weka's native data storage format
• @attribute duration numeric
@attribute protocol_type {tcp,udp,icmp}

@data
0,tcp,http,SF,162,4528,0,0,0,1, … ,normal

• etc …
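
As a sketch of how the .csv and .libsvm layouts above map to features and labels, the helpers below parse one line of each using only the standard library. The function names are hypothetical, and the sample rows are shortened, made-up stand-ins (the full column list is elided on the slide). The CSV helper assumes the label is the last column, as in the row shown.

```python
import csv
import io

def parse_libsvm_line(line):
    """Split '[label] [index]:[value] ...' into a label and a sparse feature dict."""
    parts = line.split()
    label = parts[0]
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features

def parse_csv_rows(text):
    """Return (features, label) pairs, assuming the label is the last column."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:
            # Some files end the label with a trailing '.', as in 'normal.'
            rows.append((row[:-1], row[-1].rstrip(".")))
    return rows

# Shortened, hypothetical examples of each format.
libsvm_label, libsvm_features = parse_libsvm_line("1 1:0.5 3:2.0")
csv_rows = parse_csv_rows("0,tcp,http,SF,162,4528,normal.\n")

print(libsvm_label, libsvm_features)
print(csv_rows)
```

The .libsvm layout is sparse (only non-zero features are listed by index), while .csv and .arff store every column for every sample.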

What is a sample?

• Defining your sample is critical

• Examples:
• A single item: text (Bag-of-Words)
Data science is popular.

• An image: fingerprint.bmp

• Elements or averages within a time window:
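
Two of the sample definitions above can be sketched in a few lines: a bag-of-words vector for the sentence on the slide, and averages over fixed time windows. The packet counts and the window size are hypothetical values chosen for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

def window_averages(values, window_size):
    """Average non-overlapping windows; a trailing partial window is dropped."""
    return [
        sum(values[i:i + window_size]) / window_size
        for i in range(0, len(values) - window_size + 1, window_size)
    ]

# One text item becomes one sample: a word-count vector.
text_sample = bag_of_words("Data science is popular.")

# Hypothetical per-second packet counts; each 3-second window becomes one sample.
packet_counts = [10, 12, 11, 50, 55, 52]
window_samples = window_averages(packet_counts, window_size=3)

print(text_sample)
print(window_samples)
```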

Data sets

• NSL-KDD network intrusion
• UNSW big data networking
• Iris
• Phishing
• Honeypot unsupervised
• Denial of service
• Malware
• Ransomware
• Biometrics

Companies using Machine learning

• Tesla
• Facebook
• Google
• Amazon (Alexa for instance)
• Apple 
• Microsoft

Companies using machine learning for Cyber

• Northrop Grumman
• BluVector
• Banks
• Etc.

Sample Code

• All the code used in this book, along with any other complementary materials, can be obtained from Prof. Calix's GitHub

Environment

• Virtual Machine
• You can use the latest version of Linux to run your code.
• Ubuntu 14.04 to 16.04 (64 bit) and Mac

• Tensorflow from the TensorFlow website

• Sklearn from the scikit-learn website

• AWS

If you want to build a physical box

• A GPU: GeForce GTX 980 or better (for example, a GTX 1070 or Titan)
• A CPU such as the AMD 8 CORE
• Power supply EVGA SuperNOVA 1200 P2 220
• Motherboard for GPU and CPU
• 32 GB of RAM (DDR3)
• SSD hard drive 1 TB
• A case
• The total cost for 1 device with just 1 CPU and 1 GPU may be
between $1,500 and $2,000.

Summary

• Intro to data science
• Intro to tools for data science
• Intro to machine learning and deep learning
• Intro to cyber security datasets for the course
• Intro to the environment for the course
