DDoS Dataset

This document summarizes code for analyzing a dataset of DDoS network traffic. It loads a CSV dataset into a Pandas dataframe, checks the data types and shape of the dataframe, and defines functions for preprocessing the data by handling missing values and reducing memory usage.
6/20/22, 12:53 PM Copy_of_DDoS

DDoS Dataset
In [1]: import matplotlib.pyplot as plt

import pandas as pd

import numpy as np

import seaborn as sns

In [2]: from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive

In [3]: from google.colab import drive

drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive

In [4]: df =pd.read_csv('/content/drive/MyDrive/DDoS/compiled.csv')

/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2882: DtypeWarning: Columns (85) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
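The DtypeWarning above names two remedies: passing `dtype` or setting `low_memory=False`. Going by the column listing in Out[9], column 85 appears to be 'SimillarHTTP'. The sketch below shows both options on a hypothetical two-column stand-in for the real file; the sample rows are invented for illustration and are not taken from compiled.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: the second column mixes numbers and text.
csv_text = "Flow ID,SimillarHTTP\nf1,0\nf2,example.com/index\nf3,0\n"

demo = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"SimillarHTTP": str},  # read the mixed column as text throughout
    low_memory=False,             # infer types from the whole file, not chunk by chunk
)
print(demo["SimillarHTTP"].tolist())
```

Either option alone should be enough to silence the warning; an explicit `dtype` has the advantage of documenting how the column is meant to be read.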

In [5]: df.shape

Out[5]: (400000, 88)

In [6]: # df.to_csv('/content/drive/MyDrive/DDoS/compiled.csv',index=False)

localhost:8888/nbconvert/html/Downloads/Copy_of_DDoS.ipynb?download=false 1/51

In [7]: np.array(df.dtypes)

Out[7]: array([dtype('int64'), dtype('O'), dtype('O'), dtype('int64'), dtype('O'),

dtype('int64'), dtype('int64'), dtype('O'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'),

dtype('int64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'),

dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'),

dtype('int64'), dtype('O')], dtype=object)
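A long dtype array like the one above can be hard to scan. As a minimal sketch, the dtypes can be tallied instead; the four-column frame here is a made-up example, not the DDoS data.

```python
import pandas as pd

# Hypothetical miniature frame with two integer, one float and one text column
demo = pd.DataFrame({
    "port": [80, 443],
    "packets": [10, 12],
    "rate": [1.5, 2.5],
    "label": ["BENIGN", "Syn"],
})

# Convert each dtype to its name, then count how often each name occurs
dtype_counts = demo.dtypes.astype(str).value_counts()
print(dtype_counts)
```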

In [8]: df.head()

Out[8]:
   Unnamed: 0  Flow ID                               Source IP   Source Port  Destination IP  Destination Port  Protocol  Timestamp
0  12368       172.16.0.5-192.168.50.1-550-1068-17   172.16.0.5  550          192.168.50.1    1068              17        2018-12-0… 11:06:24.33969…
1  24112       172.16.0.5-192.168.50.1-939-62932-17  172.16.0.5  939          192.168.50.1    62932             17        2018-12-0… 11:06:21.1350…
2  23589       172.16.0.5-192.168.50.1-564-32767-17  172.16.0.5  564          192.168.50.1    32767             17        2018-12-0… 11:06:08.77624…
3  11258       172.16.0.5-192.168.50.1-564-42118-17  172.16.0.5  564          192.168.50.1    42118             17        2018-12-0… 11:06:19.0182…
4  9526        172.16.0.5-192.168.50.1-559-10300-17  172.16.0.5  559          192.168.50.1    10300             17        2018-12-0… 11:06:11.8384…

5 rows × 88 columns


In [9]: np.array(df.columns)

Out[9]: array(['Unnamed: 0', 'Flow ID', ' Source IP', ' Source Port',

' Destination IP', ' Destination Port', ' Protocol', ' Timestamp',

' Flow Duration', ' Total Fwd Packets', ' Total Backward Packets',

'Total Length of Fwd Packets', ' Total Length of Bwd Packets',

' Fwd Packet Length Max', ' Fwd Packet Length Min',

' Fwd Packet Length Mean', ' Fwd Packet Length Std',

'Bwd Packet Length Max', ' Bwd Packet Length Min',

' Bwd Packet Length Mean', ' Bwd Packet Length Std',

'Flow Bytes/s', ' Flow Packets/s', ' Flow IAT Mean',

' Flow IAT Std', ' Flow IAT Max', ' Flow IAT Min', 'Fwd IAT Total',

' Fwd IAT Mean', ' Fwd IAT Std', ' Fwd IAT Max', ' Fwd IAT Min',

'Bwd IAT Total', ' Bwd IAT Mean', ' Bwd IAT Std', ' Bwd IAT Max',

' Bwd IAT Min', 'Fwd PSH Flags', ' Bwd PSH Flags',

' Fwd URG Flags', ' Bwd URG Flags', ' Fwd Header Length',

' Bwd Header Length', 'Fwd Packets/s', ' Bwd Packets/s',

' Min Packet Length', ' Max Packet Length', ' Packet Length Mean',

' Packet Length Std', ' Packet Length Variance', 'FIN Flag Count',

' SYN Flag Count', ' RST Flag Count', ' PSH Flag Count',

' ACK Flag Count', ' URG Flag Count', ' CWE Flag Count',

' ECE Flag Count', ' Down/Up Ratio', ' Average Packet Size',

' Avg Fwd Segment Size', ' Avg Bwd Segment Size',

' Fwd Header Length.1', 'Fwd Avg Bytes/Bulk',

' Fwd Avg Packets/Bulk', ' Fwd Avg Bulk Rate',

' Bwd Avg Bytes/Bulk', ' Bwd Avg Packets/Bulk',

'Bwd Avg Bulk Rate', 'Subflow Fwd Packets', ' Subflow Fwd Bytes',

' Subflow Bwd Packets', ' Subflow Bwd Bytes',

'Init_Win_bytes_forward', ' Init_Win_bytes_backward',

' act_data_pkt_fwd', ' min_seg_size_forward', 'Active Mean',

' Active Std', ' Active Max', ' Active Min', 'Idle Mean',

' Idle Std', ' Idle Max', ' Idle Min', 'SimillarHTTP', ' Inbound',

' Label'], dtype=object)
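Many of the column names above start with a space (' Label', ' Source IP'), which is why later cells index with the space included. A minimal sketch of building a clean-up mapping follows; `raw_columns` is a hand-copied subset of the list above and `rename_map` is a name introduced only for this illustration.

```python
# A few of the column names from Out[9], copied with their leading spaces
raw_columns = ['Unnamed: 0', 'Flow ID', ' Source IP', ' Flow Duration', ' Label']

# Map every name that changes when stripped to its stripped form
rename_map = {name: name.strip() for name in raw_columns if name != name.strip()}
print(rename_map)
```

A mapping like this could be passed to `DataFrame.rename(columns=...)`. The notebook keeps the original names, so applying it would mean updating every later cell that refers to ' Label' and similar columns.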

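In [10]: def Pre_Process_data(df, col):
             # Report the most frequent values (NaN included) in the flagged column
             print("Name of column with NaN: "+str(col))
             print(df[col].value_counts(dropna=False,normalize=True).head())
             # Replace +inf with the sentinel -1; NaN values are left untouched.
             # Assign the result back instead of calling replace(..., inplace=True) on
             # df[col], which may act on a temporary copy in recent pandas versions.
             df[col] = df[col].replace(np.inf, -1)
             return df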
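A small sketch of what the replacement step in Pre_Process_data does, on a hypothetical four-value column (the values are invented). It shows that only +inf is swapped for -1, while NaN survives.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing one +inf and one NaN
demo = pd.DataFrame({"rate": [2.0e6, np.inf, 1.0e6, np.nan]})

# Same test the notebook uses to decide whether a column needs preprocessing
needs_fix = not np.isfinite(demo["rate"]).all()

# Same replacement as in Pre_Process_data: +inf becomes -1, NaN is untouched
demo["rate"] = demo["rate"].replace(np.inf, -1)
print(needs_fix, demo["rate"].tolist())
```

If NaN values also need handling, a separate `fillna` step would be required; the notebook does not show one for the stored column.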


In [11]: def reduce_mem_usage(df):
             start_mem_usg = df.memory_usage().sum()/ 1024**2
             print("Memory usage of properties dataframe is :", start_mem_usg," MB")
             for col in df.columns:
                 if(df[col].dtype != object):  # skip string columns
                     print("*"*20)
                     print("Column: ",col)
                     print("dtype before: ",df[col].dtype)
                     IsInt = False
                     mx = df[col].max()
                     mn = df[col].min()
                     # Columns holding NaN or inf are preprocessed before the integer test
                     if not np.isfinite(df[col]).all():
                         df = Pre_Process_data(df,col)
                     # Treat the column as integer-valued if casting to int64
                     # changes (almost) nothing in total
                     asint = df[col].fillna(0).astype(np.int64)
                     result = (df[col]-asint)
                     result = result.sum()
                     if(result>-0.01 and result<0.01):
                         IsInt = True
                     if IsInt:
                         if mn>=0:
                             # Unsigned types; the strict "<" leaves the top value of each range unused
                             if mx<255:
                                 df[col] = df[col].astype(np.uint8)
                             elif mx<65535:
                                 df[col] = df[col].astype(np.uint16)
                             elif mx<4294967295:
                                 df[col] = df[col].astype(np.uint32)
                             else:
                                 df[col] = df[col].astype(np.uint64)
                         else:
                             # Signed types for columns with negative values
                             if mn > np.iinfo(np.int8).min and mx < np.iinfo(np.int8).max:
                                 df[col] = df[col].astype(np.int8)
                             elif mn > np.iinfo(np.int16).min and mx < np.iinfo(np.int16).max:
                                 df[col] = df[col].astype(np.int16)
                             elif mn > np.iinfo(np.int32).min and mx < np.iinfo(np.int32).max:
                                 df[col] = df[col].astype(np.int32)
                             elif mn > np.iinfo(np.int64).min and mx < np.iinfo(np.int64).max:
                                 df[col] = df[col].astype(np.int64)
                     else:
                         # Non-integer columns are stored as 32-bit floats
                         df[col] = df[col].astype(np.float32)
                 print("dtype after: ",df[col].dtype)
                 print("*"*20)
             print("__Memory Usage after completion:__")
             mem_usg = df.memory_usage().sum() / 1024**2
             print("Memory usage is :", mem_usg," MB")
             print("this is ",100*mem_usg/start_mem_usg,"% of the initial size")
             return df
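The dtype selection inside reduce_mem_usage boils down to finding the narrowest integer type whose range covers a column's minimum and maximum. The sketch below isolates that idea with inclusive bounds; `smallest_int_dtype` is a hypothetical helper written for this illustration, not part of the notebook. With the notebook's strict comparisons, a column whose maximum is exactly 65535 falls through to uint32, which is presumably why ' Destination Port' is reported as uint32 in the output.

```python
import numpy as np

def smallest_int_dtype(mn, mx):
    """Return the narrowest NumPy integer dtype whose range contains [mn, mx]."""
    candidates = (
        (np.uint8, np.uint16, np.uint32, np.uint64) if mn >= 0
        else (np.int8, np.int16, np.int32, np.int64)
    )
    for dtype in candidates:
        info = np.iinfo(dtype)
        if info.min <= mn and mx <= info.max:  # inclusive bounds
            return dtype
    raise ValueError("range does not fit in a 64-bit integer")

print(smallest_int_dtype(0, 17).__name__)      # uint8
print(smallest_int_dtype(0, 65535).__name__)   # uint16
print(smallest_int_dtype(-1, 65535).__name__)  # int32
```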


In [12]: df = reduce_mem_usage(df)


Memory usage of properties dataframe is : 268.5548095703125 MB

********************

Column: Unnamed: 0

dtype before: int64

dtype after: uint32

********************

dtype after: object

********************

dtype after: object

********************

********************

Column: Source Port

dtype before: int64

dtype after: uint16

********************

dtype after: object

********************

********************

Column: Destination Port

dtype before: int64

dtype after: uint32

********************

********************

Column: Protocol

dtype before: int64

dtype after: uint8

********************

dtype after: object

********************

********************

Column: Flow Duration

dtype before: int64

dtype after: uint32

********************

********************

Column: Total Fwd Packets

dtype before: int64

dtype after: uint32

********************

********************

Column: Total Backward Packets

dtype before: int64

dtype after: uint16

********************

********************

Column: Total Length of Fwd Packets

dtype before: float64


dtype after: uint32

********************

********************

Column: Total Length of Bwd Packets

dtype before: float64


dtype after: uint32

********************

********************

Column: Fwd Packet Length Max

dtype before: float64



dtype after: uint16

********************

********************

Column: Fwd Packet Length Min

dtype before: float64


dtype after: uint16

********************

********************

Column: Fwd Packet Length Mean

dtype before: float64


dtype after: float32

********************

********************

Column: Fwd Packet Length Std

dtype before: float64


dtype after: float32

********************

********************

Column: Bwd Packet Length Max

dtype before: float64


dtype after: uint16

********************

********************

Column: Bwd Packet Length Min

dtype before: float64


dtype after: uint16

********************

********************

Column: Bwd Packet Length Mean

dtype before: float64


dtype after: float32

********************

********************

Column: Bwd Packet Length Std

dtype before: float64


dtype after: float32

********************

********************

Column: Flow Bytes/s

dtype before: float64


Name of column with NaN: Flow Bytes/s

0.000000e+00 0.162598

2.944000e+09 0.112522

4.580000e+08 0.102990

1.472000e+09 0.039172

2.290000e+08 0.020842

Name: Flow Bytes/s, dtype: float64

dtype after: float32

********************

********************

Column: Flow Packets/s

dtype before: float64


Name of column with NaN: Flow Packets/s

2.000000e+06 0.487212

1.000000e+06 0.108950

inf 0.033670

4.166667e+04 0.019860


6.666667e+05 0.014640

Name: Flow Packets/s, dtype: float64

dtype after: float32

********************

********************

Column: Flow IAT Mean

dtype before: float64


dtype after: float32

********************

********************

Column: Flow IAT Std


dtype before: float64
dtype after: float32

********************

********************

Column: Flow IAT Max


dtype before: float64
dtype after: uint32

********************

********************

Column: Flow IAT Min


dtype before: float64
dtype after: uint32

********************

********************

Column: Fwd IAT Total


dtype before: float64
dtype after: uint32

********************

********************

Column: Fwd IAT Mean


dtype before: float64
dtype after: float32

********************

********************

Column: Fwd IAT Std

dtype before: float64


dtype after: float32

********************

********************

Column: Fwd IAT Max

dtype before: float64


dtype after: uint32

********************

********************

Column: Fwd IAT Min

dtype before: float64


dtype after: uint32

********************

********************

Column: Bwd IAT Total


dtype before: float64
dtype after: uint32

********************

********************

Column: Bwd IAT Mean


dtype before: float64


dtype after: float32

********************

********************

Column: Bwd IAT Std

dtype before: float64


dtype after: float32

********************

********************

Column: Bwd IAT Max

dtype before: float64


dtype after: uint32

********************

********************

Column: Bwd IAT Min

dtype before: float64


dtype after: uint8

********************

********************

Column: Fwd PSH Flags


dtype before: int64

dtype after: uint8

********************

********************

Column: Bwd PSH Flags

dtype before: int64

dtype after: uint8

********************

********************

Column: Fwd URG Flags

dtype before: int64

dtype after: uint8

********************

********************

Column: Bwd URG Flags

dtype before: int64

dtype after: uint8

********************

********************

Column: Fwd Header Length

dtype before: int64

dtype after: int64

********************

********************

Column: Bwd Header Length

dtype before: int64

dtype after: int32

********************

********************

Column: Fwd Packets/s


dtype before: float64
dtype after: float32

********************

********************

Column: Bwd Packets/s

dtype before: float64


dtype after: float32

********************


********************

Column: Min Packet Length

dtype before: float64


dtype after: uint16

********************

********************

Column: Max Packet Length

dtype before: float64


dtype after: uint16

********************

********************

Column: Packet Length Mean

dtype before: float64


dtype after: float32

********************

********************

Column: Packet Length Std

dtype before: float64


dtype after: float32

********************

********************

Column: Packet Length Variance

dtype before: float64


dtype after: float32

********************

********************

Column: FIN Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: SYN Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: RST Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: PSH Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: ACK Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: URG Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: CWE Flag Count


dtype before: int64

dtype after: uint8

********************

********************

Column: ECE Flag Count

dtype before: int64

dtype after: uint8

********************

********************

Column: Down/Up Ratio

dtype before: float64


dtype after: uint8

********************

********************

Column: Average Packet Size

dtype before: float64


dtype after: float32

********************

********************

Column: Avg Fwd Segment Size

dtype before: float64


dtype after: float32

********************

********************

Column: Avg Bwd Segment Size

dtype before: float64


dtype after: float32

********************

********************

Column: Fwd Header Length.1

dtype before: int64

dtype after: int64

********************

********************

Column: Fwd Avg Bytes/Bulk

dtype before: int64

dtype after: uint8

********************

********************

Column: Fwd Avg Packets/Bulk

dtype before: int64

dtype after: uint8

********************

********************

Column: Fwd Avg Bulk Rate

dtype before: int64

dtype after: uint8

********************

********************

Column: Bwd Avg Bytes/Bulk

dtype before: int64

dtype after: uint8

********************

********************

Column: Bwd Avg Packets/Bulk

dtype before: int64

dtype after: uint8


********************

********************

Column: Bwd Avg Bulk Rate

dtype before: int64

dtype after: uint8

********************

********************

Column: Subflow Fwd Packets

dtype before: int64

dtype after: uint32

********************

********************

Column: Subflow Fwd Bytes

dtype before: int64

dtype after: uint32

********************

********************

Column: Subflow Bwd Packets

dtype before: int64

dtype after: uint16

********************

********************

Column: Subflow Bwd Bytes

dtype before: int64

dtype after: uint32

********************

********************

Column: Init_Win_bytes_forward

dtype before: int64

dtype after: int32

********************

********************

Column: Init_Win_bytes_backward

dtype before: int64

dtype after: int32

********************

********************

Column: act_data_pkt_fwd

dtype before: int64

dtype after: uint16

********************

********************

Column: min_seg_size_forward

dtype before: int64

dtype after: int32

********************

********************

Column: Active Mean

dtype before: float64


dtype after: float32

********************

********************

Column: Active Std

dtype before: float64


dtype after: float32

********************

********************


Column: Active Max

dtype before: float64


dtype after: uint32

********************

********************

Column: Active Min

dtype before: float64


dtype after: uint32

********************

********************

Column: Idle Mean

dtype before: float64


dtype after: float32

********************

********************

Column: Idle Std

dtype before: float64


dtype after: float32

********************

********************

Column: Idle Max

dtype before: float64


dtype after: uint32

********************

********************

Column: Idle Min

dtype before: float64


dtype after: uint32

********************

dtype after: object

********************

********************

Column: Inbound

dtype before: int64

dtype after: uint8

********************

dtype after: object

********************

__Memory Usage after completion:__

Memory usage is : 113.6781005859375 MB

this is 42.329571668376516 % of the initial size

Visualisation


In [13]: data_ = df

data = df

df[' Label'].value_counts()

Out[13]: Syn              39995
         DrDoS_SNMP       39990
         DrDoS_LDAP       39985
         DrDoS_SSDP       39980
         DrDoS_NetBIOS    39900
         DrDoS_MSSQL      39854
         DrDoS_UDP        39789
         DrDoS_DNS        39637
         UDP-lag          39225
         DrDoS_NTP        37446
         BENIGN            4124
         WebDDoS             75
         Name: Label, dtype: int64
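The counts above show a strong class imbalance. As a quick worked example, the share of each label can be computed directly from the numbers in Out[13]; the dictionary below is copied by hand from that output.

```python
# Label counts copied from Out[13]
label_counts = {
    "Syn": 39995, "DrDoS_SNMP": 39990, "DrDoS_LDAP": 39985, "DrDoS_SSDP": 39980,
    "DrDoS_NetBIOS": 39900, "DrDoS_MSSQL": 39854, "DrDoS_UDP": 39789,
    "DrDoS_DNS": 39637, "UDP-lag": 39225, "DrDoS_NTP": 37446,
    "BENIGN": 4124, "WebDDoS": 75,
}

total = sum(label_counts.values())  # matches the 400000 rows reported by df.shape
shares = {label: 100 * n / total for label, n in label_counts.items()}

for label, pct in shares.items():
    print(f"{label:<14}{pct:8.3f}%")
```

BENIGN traffic makes up roughly 1% of the rows and WebDDoS about 0.02%, which is worth keeping in mind when reading the plots and any later accuracy figures.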

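In [14]: # Take labels and sizes from one value_counts() result so they stay aligned;
         # unique() lists labels in order of first appearance, not by frequency.
         counts = df[' Label'].value_counts()
         labels = counts.index
         sizes = counts.values
         sizes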

Out[14]: array([39995, 39990, 39985, 39980, 39900, 39854, 39789, 39637, 39225,

37446, 4124, 75])


In [15]: colors = ['gold','yellowgreen','lightcoral','lightskyblue','yellow','purple',
                   'grey','indigo','orange','black','violet','magenta','white']
         explode = (0.3,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.3,0.2,0.1)
         plt.rcParams.update({'font.size': 22})
         plt.figure(figsize=(10,10))
         plt.pie(sizes,explode=explode,labels=labels,colors=colors,autopct='%1.2f%%',
                 shadow=True,startangle=140)
         plt.axis('equal')
         plt.show()


In [16]: plt.figure(figsize=(40,20))
         g1 = sns.countplot(x = ' Label',hue = ' Label',data= data_)
         gt = g1.twinx()
         gt = sns.pointplot(y = ' Flow Packets/s',x = ' Label',data= data_,color = 'black',legend = True)
         gt.set_ylabel( ' Flow Packets/s',fontsize=16)

Out[16]: Text(0, 0.5, ' Flow Packets/s')

In [17]: plt.figure(figsize=(40,20))
         g1 = sns.countplot(x = ' Label',hue = ' Label',data= data_)
         gt = g1.twinx()
         gt = sns.pointplot(y = 'Flow Bytes/s',x = ' Label',data= data_,color = 'black',legend = True)
         gt.set_ylabel( 'Flow Bytes/s',fontsize=16)

Out[17]: Text(0, 0.5, 'Flow Bytes/s')


In [18]: plt.figure(figsize=(40,16))
         g1 = sns.scatterplot(y=" Total Fwd Packets",x = 'Total Length of Fwd Packets',
                              sizes = (200,400),size=' Flow Duration',data= data_)
         gt = g1.twinx()
         gt = sns.pointplot(y = 'Fwd Packets/s',x = ' Label',data= data_,color = 'black',legend = False)
         gt.set_ylabel( 'Fwd Packets/s',fontsize=16)

Out[18]: Text(0, 0.5, 'Fwd Packets/s')

In [19]: plt.figure(figsize=(40,16))
         g1 = sns.scatterplot(y=" Total Backward Packets",x = ' Total Length of Bwd Packets',
                              sizes = (200,400),size=' Flow Duration',data= data_)
         gt = g1.twinx()
         gt = sns.pointplot(y = ' Bwd Packets/s',x = ' Label',data= data_,color = 'black',legend = False)
         gt.set_ylabel( 'Bwd Packets/s',fontsize=16)

Out[19]: Text(0, 0.5, 'Bwd Packets/s')


In [20]: plt.figure(figsize=(20,16))

g1 = sns.countplot(x = ' Label',data= data_,alpha=0.5)

gt = g1.twinx()

gt = sns.countplot(x = ' Protocol',hue = ' Label',alpha=0.7,data=data_)

gt.set_ylabel( ' count',fontsize=16)

Out[20]: Text(0, 0.5, ' count')


In [21]: plt.figure(figsize=(20,16))

g1 = sns.countplot(x = ' Label',data= data_,alpha=0.5)

gt = g1.twinx()

gt = sns.countplot(x = ' Inbound',hue = ' Label',alpha=0.7,data=data_)

gt.set_ylabel( ' Inbound',fontsize=16)

Out[21]: Text(0, 0.5, ' Inbound')

In [22]: from sklearn.preprocessing import StandardScaler

from sklearn import preprocessing

In [23]: y = df[' Label']
         df = df.drop(['Flow ID',' Source IP',' Source Port', ' Destination IP',
                       ' Destination Port',' Timestamp','Fwd Packets/s','Flow Bytes/s',
                       'SimillarHTTP',' Label'],axis=1)
         X = StandardScaler().fit_transform(df)
         X_norm = preprocessing.normalize(X)

In [24]: X_norm.shape

Out[24]: (400000, 78)
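The two scaling steps in In [23] act along different axes: StandardScaler works per column and normalize per row. The sketch below reproduces both with plain NumPy on a made-up 3×2 matrix, assuming the scikit-learn defaults (population standard deviation for StandardScaler, L2 norm along rows for normalize); it is an illustration of the arithmetic, not a replacement for the library calls.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples, 2 features on very different scales
X_demo = np.array([[1.0, 200.0],
                   [2.0, 400.0],
                   [3.0, 900.0]])

# Column-wise standardisation: zero mean and unit (population) standard deviation
X_std = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# Row-wise L2 normalisation: every sample is rescaled to unit Euclidean length
X_unit = X_std / np.linalg.norm(X_std, axis=1, keepdims=True)

print(X_std.mean(axis=0), X_std.std(axis=0))
print(np.linalg.norm(X_unit, axis=1))
```

After the second step each row has length 1, so only the direction of a sample in feature space is kept and its overall magnitude is discarded.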

In [25]: df[' Inbound'].value_counts()

Out[25]: 1 395654

0 4346

Name: Inbound, dtype: int64


In [26]: f = plt.figure(figsize=(20,15))

plt.matshow(df.corr(),fignum=f.number)

plt.xticks(range(df.shape[1]),df.columns,fontsize=10,rotation=90)

plt.yticks(range(df.shape[1]),df.columns,fontsize=10)

cb = plt.colorbar()

cb.ax.tick_params(labelsize=14)
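The correlation matrix plotted above can also be scanned for strongly correlated feature pairs without looping over every ordered pair. A minimal sketch on a hypothetical three-column frame follows: it keeps only the upper triangle of `DataFrame.corr()`, so each pair is reported once rather than twice. The column names and the 0.9 threshold are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b is exactly 2 * a, c is negatively related to a
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

corr = demo.corr()  # Pearson correlation by default

# Boolean mask that is True strictly above the diagonal
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Keep the upper triangle, flatten to (row, column) pairs, filter by threshold
pairs = corr.where(upper).stack()
strong = pairs[pairs > 0.9]
print(strong)
```

Because the matrix is symmetric, counting pairs this way gives half the number that a loop over all ordered pairs with `i != j` would report.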


In [27]: from scipy import stats
         total = 0
         count = 0
         for i in df.columns:
             for j in df.columns:
                 if i != j:
                     corr, _ = stats.pearsonr(data_[i],data_[j])
                     total = total+1
                     if corr>0.9:
                         count = count+1
                         print("Person correlation between "+i+' and '+j+' :%.3f' %corr)
         print(count,total)
         print(count/total)


/usr/local/lib/python3.7/dist-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())


Person correlation between Flow Duration and Fwd IAT Total :1.000

Person correlation between Total Fwd Packets and Subflow Fwd Packets :1.000

Person correlation between Total Backward Packets and Subflow Bwd Packets :1.000

Person correlation between Total Length of Fwd Packets and Subflow Fwd Bytes :1.000

Person correlation between Total Length of Fwd Packets and act_data_pkt_fwd :0.997

Person correlation between Total Length of Bwd Packets and Subflow Bwd Bytes :1.000

Person correlation between Fwd Packet Length Max and Fwd Packet Length Min :0.993

Person correlation between Fwd Packet Length Max and Fwd Packet Length Mean :0.997

Person correlation between Fwd Packet Length Max and Min Packet Length :0.992

Person correlation between Fwd Packet Length Max and Max Packet Length :0.977

Person correlation between Fwd Packet Length Max and Packet Length Mean :0.997

Person correlation between Fwd Packet Length Max and Average Packet Size :0.993

Person correlation between Fwd Packet Length Max and Avg Fwd Segment Size :0.997

Person correlation between Fwd Packet Length Min and Fwd Packet Length Max :0.993

Person correlation between Fwd Packet Length Min and Fwd Packet Length Mean :0.997

Person correlation between Fwd Packet Length Min and Min Packet Length :1.000

Person correlation between Fwd Packet Length Min and Max Packet Length :0.965

Person correlation between Fwd Packet Length Min and Packet Length Mean :0.997

Person correlation between Fwd Packet Length Min and Average Packet Size :0.996

Person correlation between Fwd Packet Length Min and Avg Fwd Segment Size :0.997

Person correlation between Fwd Packet Length Mean and Fwd Packet Length Max :0.997

Person correlation between Fwd Packet Length Mean and Fwd Packet Length Min :0.997

Person correlation between Fwd Packet Length Mean and Min Packet Length :0.997

Person correlation between Fwd Packet Length Mean and Max Packet Length :0.970

Person correlation between Fwd Packet Length Mean and Packet Length Mean :0.999

Person correlation between Fwd Packet Length Mean and Average Packet Size :0.996

Person correlation between Fwd Packet Length Mean and Avg Fwd Segment Size :1.000

Person correlation between Bwd Packet Length Mean and Avg Bwd Segment Size :1.000

Person correlation between Flow IAT Mean and Flow IAT Std :0.984

Person correlation between Flow IAT Mean and Flow IAT Max :0.954

Person correlation between Flow IAT Mean and Fwd IAT Mean :0.991


Person correlation between Flow IAT Mean and Fwd IAT Std :0.975

Person correlation between Flow IAT Mean and Fwd IAT Max :0.954

Person correlation between Flow IAT Mean and Idle Mean :0.952

Person correlation between Flow IAT Mean and Idle Max :0.954

Person correlation between Flow IAT Std and Flow IAT Mean :0.984

Person correlation between Flow IAT Std and Flow IAT Max :0.969

Person correlation between Flow IAT Std and Fwd IAT Mean :0.986

Person correlation between Flow IAT Std and Fwd IAT Std :0.998

Person correlation between Flow IAT Std and Fwd IAT Max :0.969

Person correlation between Flow IAT Std and Idle Mean :0.978

Person correlation between Flow IAT Std and Idle Max :0.969

Person correlation between Flow IAT Std and Idle Min :0.926

Person correlation between Flow IAT Max and Flow IAT Mean :0.954

Person correlation between Flow IAT Max and Flow IAT Std :0.969

Person correlation between Flow IAT Max and Fwd IAT Mean :0.968

Person correlation between Flow IAT Max and Fwd IAT Std :0.974

Person correlation between Flow IAT Max and Fwd IAT Max :1.000

Person correlation between Flow IAT Max and Idle Mean :0.968

Person correlation between Flow IAT Max and Idle Max :0.998

Person correlation between Flow IAT Min and Fwd IAT Min :0.999

Person correlation between Fwd IAT Total and Flow Duration :1.000

Person correlation between Fwd IAT Mean and Flow IAT Mean :0.991

Person correlation between Fwd IAT Mean and Flow IAT Std :0.986

Person correlation between Fwd IAT Mean and Flow IAT Max :0.968

Person correlation between Fwd IAT Mean and Fwd IAT Std :0.985

Person correlation between Fwd IAT Mean and Fwd IAT Max :0.969

Person correlation between Fwd IAT Mean and Idle Mean :0.963

Person correlation between Fwd IAT Mean and Idle Max :0.968

Person correlation between Fwd IAT Std and Flow IAT Mean :0.975

Person correlation between Fwd IAT Std and Flow IAT Std :0.998

Person correlation between Fwd IAT Std and Flow IAT Max :0.974

Person correlation between Fwd IAT Std and Fwd IAT Mean :0.985

Person correlation between Fwd IAT Std and Fwd IAT Max :0.974

Person correlation between Fwd IAT Std and Idle Mean :0.984

Person correlation between Fwd IAT Std and Idle Max :0.973

Person correlation between Fwd IAT Std and Idle Min :0.933

Person correlation between Fwd IAT Max and Flow IAT Mean :0.954

Person correlation between Fwd IAT Max and Flow IAT Std :0.969

Person correlation between Fwd IAT Max and Flow IAT Max :1.000

Person correlation between Fwd IAT Max and Fwd IAT Mean :0.969

Person correlation between Fwd IAT Max and Fwd IAT Std :0.974

Person correlation between Fwd IAT Max and Idle Mean :0.968

Person correlation between Fwd IAT Max and Idle Max :0.998

Person correlation between Fwd IAT Min and Flow IAT Min :0.999

Person correlation between Bwd IAT Total and Bwd IAT Max :0.919

Person correlation between Bwd IAT Mean and Bwd IAT Std :0.995

Person correlation between Bwd IAT Mean and Bwd IAT Max :0.958

Person correlation between Bwd IAT Std and Bwd IAT Mean :0.995

Person correlation between Bwd IAT Std and Bwd IAT Max :0.976

Person correlation between Bwd IAT Max and Bwd IAT Total :0.919

Person correlation between Bwd IAT Max and Bwd IAT Mean :0.958

Person correlation between Bwd IAT Max and Bwd IAT Std :0.976

Person correlation between Fwd PSH Flags and RST Flag Count :1.000

Person correlation between Fwd Header Length and Fwd Header Length.1 :1.000

Person correlation between Min Packet Length and Fwd Packet Length Max :0.992

Person correlation between Min Packet Length and Fwd Packet Length Min :1.000

Person correlation between Min Packet Length and Fwd Packet Length Mean :0.997

Person correlation between Min Packet Length and Max Packet Length :0.964

Person correlation between Min Packet Length and Packet Length Mean :0.997

Person correlation between Min Packet Length and Average Packet Size :0.996

Person correlation between Min Packet Length and Avg Fwd Segment Size :0.997

Person correlation between Max Packet Length and Fwd Packet Length Max :0.977

Person correlation between Max Packet Length and Fwd Packet Length Min :0.965

Person correlation between Max Packet Length and Fwd Packet Length Mean :0.970

Person correlation between Max Packet Length and Min Packet Length :0.964

Person correlation between Max Packet Length and Packet Length Mean :0.975

Person correlation between Max Packet Length and Average Packet Size :0.969

Person correlation between Max Packet Length and Avg Fwd Segment Size :0.970

Person correlation between Packet Length Mean and Fwd Packet Length Max :0.997

Person correlation between Packet Length Mean and Fwd Packet Length Min :0.997

Person correlation between Packet Length Mean and Fwd Packet Length Mean :0.999

Person correlation between Packet Length Mean and Min Packet Length :0.997

Person correlation between Packet Length Mean and Max Packet Length :0.975

Person correlation between Packet Length Mean and Average Packet Size :0.997

Person correlation between Packet Length Mean and Avg Fwd Segment Size :0.999

Person correlation between RST Flag Count and Fwd PSH Flags :1.000

Person correlation between Average Packet Size and Fwd Packet Length Max :0.993

Person correlation between Average Packet Size and Fwd Packet Length Min :0.996

Person correlation between Average Packet Size and Fwd Packet Length Mean :0.996

Person correlation between Average Packet Size and Min Packet Length :0.996

Person correlation between Average Packet Size and Max Packet Length :0.969

Person correlation between Average Packet Size and Packet Length Mean :0.997

Person correlation between Average Packet Size and Avg Fwd Segment Size :0.996

Person correlation between Avg Fwd Segment Size and Fwd Packet Length Max :0.997

Person correlation between Avg Fwd Segment Size and Fwd Packet Length Min :0.997

Person correlation between Avg Fwd Segment Size and Fwd Packet Length Mean :1.000

Person correlation between Avg Fwd Segment Size and Min Packet Length :0.997

Person correlation between Avg Fwd Segment Size and Max Packet Length :0.970

Person correlation between Avg Fwd Segment Size and Packet Length Mean :0.999

Person correlation between Avg Fwd Segment Size and Average Packet Size :0.996

Person correlation between Avg Bwd Segment Size and Bwd Packet Length Mean :1.000

Person correlation between Fwd Header Length.1 and Fwd Header Length :1.000

Person correlation between Subflow Fwd Packets and Total Fwd Packets :1.000

Person correlation between Subflow Fwd Bytes and Total Length of Fwd Packets :1.000

Person correlation between Subflow Fwd Bytes and act_data_pkt_fwd :0.997

Person correlation between Subflow Bwd Packets and Total Backward Packets :1.000

Person correlation between Subflow Bwd Bytes and Total Length of Bwd Packets :1.000

Person correlation between act_data_pkt_fwd and Total Length of Fwd Packets :0.997

Person correlation between act_data_pkt_fwd and Subflow Fwd Bytes :0.997

Person correlation between Idle Mean and Flow IAT Mean :0.952

Person correlation between Idle Mean and Flow IAT Std :0.978

Person correlation between Idle Mean and Flow IAT Max :0.968

Person correlation between Idle Mean and Fwd IAT Mean :0.963

Person correlation between Idle Mean and Fwd IAT Std :0.984

Person correlation between Idle Mean and Fwd IAT Max :0.968

Person correlation between Idle Mean and Idle Max :0.970

Person correlation between Idle Mean and Idle Min :0.967

Person correlation between Idle Max and Flow IAT Mean :0.954

Person correlation between Idle Max and Flow IAT Std :0.969

Person correlation between Idle Max and Flow IAT Max :0.998

Person correlation between Idle Max and Fwd IAT Mean :0.968

Person correlation between Idle Max and Fwd IAT Std :0.973

Person correlation between Idle Max and Fwd IAT Max :0.998

Person correlation between Idle Max and Idle Mean :0.970

Person correlation between Idle Min and Flow IAT Std :0.926

Person correlation between Idle Min and Fwd IAT Std :0.933

Person correlation between Idle Min and Idle Mean :0.967

148 6006

0.024642024642024644
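Every pair in the listing above is printed twice, once in each order (for example Idle Max / Idle Mean and Idle Mean / Idle Max). A minimal sketch of reporting each pair once by keeping only the upper triangle of the correlation matrix is shown below. The helper name `correlated_pairs`, the 0.9 threshold, and the synthetic `demo` frame are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

def correlated_pairs(frame, threshold=0.9):
    """Return (col_a, col_b, r) once per unordered pair with |r| > threshold."""
    corr = frame.corr(method="pearson")
    # Keep the strict upper triangle; everything else becomes NaN.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (if any survive stack) fail this comparison and are dropped.
    pairs = pairs[pairs.abs() > threshold]
    return [(a, b, float(r)) for (a, b), r in pairs.items()]

# Hypothetical data: "b" is nearly collinear with "a", "c" is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "a": base,
    "b": base * 2.0 + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
found = correlated_pairs(demo)
for col_a, col_b, r in found:
    print("Pearson correlation between %s and %s :%.3f" % (col_a, col_b, r))
```

With this approach the count of reported pairs would be half of the ordered-pair count printed above.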

In [28]: X_std = StandardScaler().fit_transform(df)

# Covariance matrix of the standardized features
mean_vec = np.mean(X_std,axis=0)

cov_mat = (X_std-mean_vec).T.dot((X_std-mean_vec))/(X_std.shape[0]-1)

print('Covariance matrix \n%s' %cov_mat)

Covariance matrix

[[ 1.0000025  -0.61843831  0.36058758 ...  0.32478623  0.25143831
   0.05059552]
 [-0.61843831  1.0000025  -0.47015115 ... -0.49611473 -0.46985241
   0.08154645]
 [ 0.36058758 -0.47015115  1.0000025  ...  0.89894796  0.71873276
  -0.03082452]
 ...
 [ 0.32478623 -0.49611473  0.89894796 ...  1.0000025   0.88606985
  -0.03363851]
 [ 0.25143831 -0.46985241  0.71873276 ...  0.88606985  1.0000025
  -0.06401936]
 [ 0.05059552  0.08154645 -0.03082452 ... -0.03363851 -0.06401936
   1.0000025 ]]


Hypothesis Testing

Chi Squared Test


In [29]: from scipy.stats import chi2_contingency

# Chi-squared test of independence between SYN Flag Count and Protocol
stats,p,dof,expected = chi2_contingency(pd.crosstab(data[' SYN Flag Count'],data[' Protocol']))

alpha = 0.05
print(" P value is "+str(p))

print(" SYN Flag Count "," Protocol")

if p > alpha :
    print("Independent (fail to reject H0)")
else:
    print("Dependent (reject H0)")

P value is 4.0030002619486493e-66

SYN Flag Count Protocol

Dependent (reject H0)

In [30]: from scipy.stats import chi2_contingency

# Chi-squared test of independence between PSH Flag Count and Protocol
stats,p,dof,expected = chi2_contingency(pd.crosstab(data[' PSH Flag Count'],data[' Protocol']))

alpha = 0.05
print(" P value is "+str(p))

print(" PSH Flag Count "," Protocol")

if p > alpha :
    print("Independent (fail to reject H0)")
else:
    print("Dependent (reject H0)")

P value is 1.0

PSH Flag Count Protocol

Independent (fail to reject H0)

In [31]: from scipy.stats import chi2_contingency

# Chi-squared test of independence between RST Flag Count and Protocol
stats,p,dof,expected = chi2_contingency(pd.crosstab(data[' RST Flag Count'],data[' Protocol']))

alpha = 0.05
print(" P value is "+str(p))

print(" RST Flag Count "," Protocol")

if p > alpha :
    print("Independent (fail to reject H0)")
else:
    print("Dependent (reject H0)")

P value is 0.0

RST Flag Count Protocol

Dependent (reject H0)
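The three cells above repeat the same test with a different column each time. A minimal sketch of wrapping it in one helper follows; the function name `chi2_independence` and the synthetic `demo` frame are hypothetical. The sketch also illustrates one way a p-value of exactly 1.0 can arise: when a column takes a single value, the contingency table has one row, the test has zero degrees of freedom, and SciPy reports p = 1.0. Whether that is what happened for PSH Flag Count in this dataset is an assumption the notebook does not confirm.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_independence(frame, col_a, col_b, alpha=0.05):
    """Chi-squared test of independence between two categorical columns."""
    table = pd.crosstab(frame[col_a], frame[col_b])
    stat, p, dof, _expected = chi2_contingency(table)
    return {"pair": (col_a, col_b), "p": float(p), "dof": int(dof),
            "reject_h0": bool(p <= alpha)}

# Hypothetical data: flag_a follows proto exactly, flag_const never varies.
demo = pd.DataFrame({
    "flag_a": [0, 0, 0, 0, 1, 1, 1, 1] * 25,
    "flag_const": [0] * 200,
    "proto": [6, 6, 6, 6, 17, 17, 17, 17] * 25,
})
results = [chi2_independence(demo, col, "proto") for col in ["flag_a", "flag_const"]]
for res in results:
    print(res)
```

Checking `dof` alongside the p-value makes the degenerate case easy to spot.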


In [32]: pd.crosstab(data[' RST Flag Count'],data[' Protocol'])

Out[32]:
 Protocol          0      6      17
 RST Flag Count
0                 76  72324  327121
1                  0    479       0

T Test

In [33]: from scipy.stats import ttest_ind

# Welch's t-test: equal_var=False does not assume equal variances
score = ttest_ind(data[' Flow Duration'],data['Fwd IAT Total'],equal_var = False)

print(score)

Ttest_indResult(statistic=0.11945494497236958, pvalue=0.9049149630747436)
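The test above treats the two columns as independent samples. Both columns come from the same rows, so if each row describes one flow (an assumption), a paired test may be the more natural comparison. The sketch below uses synthetic stand-in arrays, not the notebook's data, to show how the two tests can disagree when per-row differences are small but consistent.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(0)
# Hypothetical paired measurements: the second is always slightly smaller.
flow_duration = rng.exponential(scale=1000.0, size=5000)
fwd_iat_total = flow_duration - rng.uniform(0.0, 5.0, size=5000)

welch = ttest_ind(flow_duration, fwd_iat_total, equal_var=False)  # independent samples
paired = ttest_rel(flow_duration, fwd_iat_total)                  # per-row differences

print("Welch  p-value: %.4f" % welch.pvalue)
print("Paired p-value: %.4g" % paired.pvalue)
```

The independent-samples test compares the difference in means against the large spread of each column, while the paired test compares it against the much smaller spread of the row-wise differences.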

Principal Component Analysis

In [34]: from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(df)

# Covariance matrix of the standardized features
mean_vec = np.mean(X_std,axis= 0)

cov_mat = (X_std-mean_vec).T.dot((X_std-mean_vec))/(X_std.shape[0]-1)

print("Covariance matrix \n%s"%cov_mat)

Covariance matrix

[[ 1.0000025  -0.61843831  0.36058758 ...  0.32478623  0.25143831
   0.05059552]
 [-0.61843831  1.0000025  -0.47015115 ... -0.49611473 -0.46985241
   0.08154645]
 [ 0.36058758 -0.47015115  1.0000025  ...  0.89894796  0.71873276
  -0.03082452]
 ...
 [ 0.32478623 -0.49611473  0.89894796 ...  1.0000025   0.88606985
  -0.03363851]
 [ 0.25143831 -0.46985241  0.71873276 ...  0.88606985  1.0000025
  -0.06401936]
 [ 0.05059552  0.08154645 -0.03082452 ... -0.03363851 -0.06401936
   1.0000025 ]]
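The diagonal entries of 1.0000025 follow from the two different divisors involved: standardization divides by the population standard deviation (divisor n), while the covariance above divides by n - 1, so each variance becomes n / (n - 1). With 400000 rows that is 400000 / 399999, about 1.0000025. A small sketch on synthetic data (the array `X_demo` is hypothetical) that checks the manual formula against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 4))
# Same scaling as StandardScaler: subtract the mean, divide by the population std.
X_demo_std = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

mean_vec = X_demo_std.mean(axis=0)
cov_manual = (X_demo_std - mean_vec).T.dot(X_demo_std - mean_vec) / (X_demo_std.shape[0] - 1)
cov_numpy = np.cov(X_demo_std, rowvar=False)  # columns are the variables

print(np.allclose(cov_manual, cov_numpy))
print(np.allclose(np.diag(cov_manual), 500 / 499))
```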

Eigen decomposition of the Covariance Matrix


In [35]: eig_vals, eig_vecs = np.linalg.eig(cov_mat)

print("Eigen Vector \n%s"%eig_vecs)

print("Eigen Values \n%s"%eig_vals)


Eigen Vector

[[-0.13644319+0.j -0.06226499+0.j -0.08255249+0.j ... 0. +0.j

0. +0.j 0. +0.j]

[ 0.18521986+0.j 0.02758098+0.j 0.06785011+0.j ... 0. +0.j

0. +0.j 0. +0.j]

[-0.22202055+0.j 0.00369437+0.j 0.14341953+0.j ... 0. +0.j

0. +0.j 0. +0.j]

...

[-0.23666076+0.j 0.0011592 +0.j 0.15574923+0.j ... 0. +0.j

0. +0.j 0. +0.j]

[-0.21531557+0.j 0.01457661+0.j 0.13144819+0.j ... 0. +0.j

0. +0.j 0. +0.j]

[ 0.0279205 +0.j -0.10801287+0.j 0.06832728+0.j ... 0. +0.j

0. +0.j 0. +0.j]]

Eigen Values

[ 1.42033356e+01+0.00000000e+00j 7.72649536e+00+0.00000000e+00j

6.97276476e+00+0.00000000e+00j 3.44372591e+00+0.00000000e+00j

3.36559607e+00+0.00000000e+00j 3.06290611e+00+0.00000000e+00j

2.73200811e+00+0.00000000e+00j 2.16244583e+00+0.00000000e+00j

2.04284143e+00+0.00000000e+00j 1.98999267e+00+0.00000000e+00j

1.92553259e+00+0.00000000e+00j 1.87241124e+00+0.00000000e+00j

1.57834156e+00+0.00000000e+00j 1.31971871e+00+0.00000000e+00j

1.20821059e+00+0.00000000e+00j 6.46386672e-01+0.00000000e+00j

7.93066087e-01+0.00000000e+00j 1.04245104e+00+0.00000000e+00j

1.00743115e+00+0.00000000e+00j 9.98571864e-01+0.00000000e+00j

9.40046537e-01+0.00000000e+00j 8.64781483e-01+0.00000000e+00j

8.86464711e-01+0.00000000e+00j 5.41688756e-01+0.00000000e+00j

4.35892964e-01+0.00000000e+00j 3.35441715e-01+0.00000000e+00j

2.69710392e-01+0.00000000e+00j 4.63365483e-01+0.00000000e+00j

3.84410884e-01+0.00000000e+00j 2.14326099e-01+0.00000000e+00j

2.07266827e-01+0.00000000e+00j 8.51670083e-02+0.00000000e+00j

6.95002450e-02+0.00000000e+00j 5.28876705e-02+0.00000000e+00j

3.53399844e-02+0.00000000e+00j 3.39899391e-02+0.00000000e+00j

2.15243164e-02+0.00000000e+00j 1.62337338e-02+0.00000000e+00j

1.43799765e-02+0.00000000e+00j 1.03590900e-02+0.00000000e+00j

6.89319475e-03+0.00000000e+00j 3.29125760e-03+0.00000000e+00j

2.65339198e-03+0.00000000e+00j 2.00362749e-03+0.00000000e+00j

1.30937508e-03+0.00000000e+00j 1.38314696e-03+0.00000000e+00j

1.35270413e-03+0.00000000e+00j 1.09949040e-03+0.00000000e+00j

8.90704310e-04+0.00000000e+00j 6.29806195e-04+0.00000000e+00j

5.03692830e-04+0.00000000e+00j 3.76318337e-04+0.00000000e+00j

2.14122590e-04+0.00000000e+00j 3.15230787e-04+0.00000000e+00j

1.20002518e-04+0.00000000e+00j 9.35033083e-05+0.00000000e+00j

1.66480243e-05+0.00000000e+00j 7.56703066e-06+0.00000000e+00j

1.54407558e-16+0.00000000e+00j 7.59839399e-17+6.76312141e-17j

7.59839399e-17-6.76312141e-17j 9.16402231e-17+0.00000000e+00j

-5.44726130e-17+1.81967420e-17j -5.44726130e-17-1.81967420e-17j

-8.25377906e-19+0.00000000e+00j 2.01016260e-17+0.00000000e+00j

0.00000000e+00+0.00000000e+00j 0.00000000e+00+0.00000000e+00j

0.00000000e+00+0.00000000e+00j 0.00000000e+00+0.00000000e+00j

0.00000000e+00+0.00000000e+00j 0.00000000e+00+0.00000000e+00j

0.00000000e+00+0.00000000e+00j 0.00000000e+00+0.00000000e+00j

0.00000000e+00+0.00000000e+00j 0.00000000e+00+0.00000000e+00j

0.00000000e+00+0.00000000e+00j 0.00000000e+00+0.00000000e+00j]
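The complex values above (for example 7.59839399e-17+6.76312141e-17j) are round-off from `np.linalg.eig`, which does not assume a symmetric input. Since a covariance matrix is symmetric, `np.linalg.eigh` is an alternative that returns real eigenvalues. A minimal sketch on a synthetic matrix (the names `A`, `cov_demo`, `eig_vals_h`, and `eig_vecs_h` are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
A[:, 4] = A[:, 0]  # duplicated column, so one eigenvalue is (near) zero
cov_demo = np.cov(A, rowvar=False)

# eigh is meant for symmetric matrices: real output, eigenvalues in ascending order.
eig_vals_h, eig_vecs_h = np.linalg.eigh(cov_demo)

# Reorder so the largest eigenvalue (and its eigenvector column) comes first.
order = np.argsort(eig_vals_h)[::-1]
eig_vals_h, eig_vecs_h = eig_vals_h[order], eig_vecs_h[:, order]

print(eig_vals_h.dtype)
print(eig_vals_h)
```

Real-valued results would also avoid the ComplexWarning that matplotlib raises when plotting complex numbers.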

Selecting Principal Components

In [36]: eig_pairs=[(np.abs(eig_vals[i]),eig_vecs[:,i]) for i in range(len(eig_vals))]

# Sort the (eigenvalue, eigenvector) pairs by eigenvalue, largest first
eig_pairs.sort(key=lambda x:x[0],reverse=True)

print("Eigen Values in descending order: " )

for i, j in enumerate(eig_pairs):
    print(i,j[0])

Eigen Values in descending order:

0 14.203335635459023
1 7.72649536453649
2 6.972764758518834
3 3.4437259145677888
4 3.3655960738401918
5 3.0629061126132973
6 2.732008106736874
7 2.162445831484391
8 2.0428414264157237
9 1.989992674733237
10 1.9255325868241058
11 1.872411238759887
12 1.5783415578920403
13 1.3197187145205351
14 1.2082105909293108
15 1.0424510364507815
16 1.0074311525518724
17 0.9985718638465273
18 0.9400465373681084
19 0.8864647110533768
20 0.8647814834581833
21 0.7930660872085342
22 0.6463866715041711
23 0.5416887563792396
24 0.46336548291789303
25 0.4358929638916721
26 0.38441088408307755
27 0.3354417151908467
28 0.2697103922117089
29 0.21432609904951572
30 0.20726682701012103
31 0.0851670083082053
32 0.06950024499568283
33 0.05288767048344181
34 0.035339984422644515
35 0.03398993913758611
36 0.02152431638474884
37 0.01623373382139512
38 0.014379976511664358
39 0.010359090026185399
40 0.006893194750722876
41 0.0032912575989725934
42 0.0026533919781355636
43 0.002003627487092827
44 0.00138314696307466
45 0.001352704129549723
46 0.0013093750777122082
47 0.0010994903982587496
48 0.0008907043103310611
49 0.0006298061948454105
50 0.0005036928300685212
51 0.0003763183374628182
52 0.0003152307866004254
53 0.00021412258994108418
54 0.00012000251762704192
55 9.350330830589868e-05
56 1.6648024281554622e-05
57 7.5670306649343e-06
58 1.5440755813161445e-16
59 1.0172286002046191e-16
60 1.0172286002046191e-16
61 9.164022310312424e-17
62 5.743158529421716e-17
63 5.743158529421716e-17
64 2.0101625965361303e-17
65 8.253779064973264e-19
66 0.0
67 0.0
68 0.0
69 0.0
70 0.0
71 0.0
72 0.0
73 0.0
74 0.0
75 0.0
76 0.0
77 0.0

In [37]: eig_pairs[0][1]

Out[37]: array([-0.13644319+0.j, 0.18521986+0.j, -0.22202055+0.j, -0.00095872+0.j,

-0.05317823+0.j, 0.02640326+0.j, -0.02036745+0.j, 0.15976835+0.j,

0.16079792+0.j, 0.16200916+0.j, -0.01763468+0.j, -0.03932833+0.j,

-0.00517699+0.j, -0.03510362+0.j, -0.04011123+0.j, 0.09059633+0.j,

-0.23017315+0.j, -0.23213642+0.j, -0.23736852+0.j, -0.00404002+0.j,

-0.22188271+0.j, -0.23440955+0.j, -0.23325352+0.j, -0.23716782+0.j,

-0.00398763+0.j, -0.09469499+0.j, -0.08653566+0.j, -0.08891269+0.j,

-0.09426935+0.j, -0.04060342+0.j, -0.01143907+0.j, 0. +0.j,

0. +0.j, 0. +0.j, -0.00467306+0.j, 0.0002597 +0.j,

-0.00703105+0.j, 0.16083084+0.j, 0.14914425+0.j, 0.16112801+0.j,

-0.03290546+0.j, -0.02526556+0.j, 0. +0.j, -0.00169732+0.j,

-0.01143907+0.j, 0. +0.j, -0.1843995 +0.j, -0.01435095+0.j,

-0.00928138+0.j, 0. +0.j, -0.02596931+0.j, 0.15806648+0.j,

0.16200916+0.j, -0.03510362+0.j, -0.00467306+0.j, 0. +0.j,

0. +0.j, 0. +0.j, 0. +0.j, 0. +0.j,

0. +0.j, -0.00095872+0.j, 0.02640326+0.j, -0.05317823+0.j,

-0.02036745+0.j, -0.14273218+0.j, -0.01447762+0.j, 0.01574994+0.j,

-0.02137095+0.j, -0.03407062+0.j, -0.04455467+0.j, -0.04660472+0.j,

-0.01707031+0.j, -0.23335742+0.j, -0.19564965+0.j, -0.23666076+0.j,

-0.21531557+0.j, 0.0279205 +0.j])

Explained Variance

In [38]: tot=sum(eig_vals)

var_exp=[(i/tot)*100 for i in sorted(eig_vals,reverse=True)]
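A common follow-up to the per-component percentages is the cumulative share, which shows how many components are needed to reach a target. The sketch below uses a short hypothetical eigenvalue array rather than the 78 values above, and takes the real part first so that no complex values reach the arithmetic or the plot. The 90% target is an illustrative choice.

```python
import numpy as np

# Hypothetical eigenvalues; in the notebook these would come from eig_vals.
eig_vals_demo = np.array([14.2 + 0j, 7.7 + 0j, 7.0 + 0j, 3.4 + 0j, 0.7 + 0j])

eig_real = np.sort(np.real(eig_vals_demo))[::-1]   # real part, largest first
var_exp_demo = eig_real / eig_real.sum() * 100     # percent per component
cum_var_exp = np.cumsum(var_exp_demo)              # running total in percent

# Smallest number of components whose cumulative share reaches 90%.
n_needed = int(np.searchsorted(cum_var_exp, 90.0) + 1)
print(np.round(cum_var_exp, 2), n_needed)
```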


In [39]: with plt.style.context("dark_background"):
    plt.figure(figsize=(30,30))
    plt.bar(range(78),var_exp,alpha=0.5,align="center",label="individual explained variance")
    plt.ylabel("explained variance ratio")
    plt.xlabel("Principal Component")
    plt.legend(loc="best")
    plt.tight_layout()

/usr/local/lib/python3.7/dist-packages/matplotlib/transforms.py:789: ComplexWarning: Casting complex values to real discards the imaginary part

points = np.array(args, dtype=float).reshape(2, 2)


In [40]: matrix_w=np.hstack((eig_pairs[0][1].reshape(78,1),eig_pairs[1][1].reshape(78,1)))

print("Matrix W: \n",matrix_w)


Matrix W:

[[-1.36443195e-01+0.j -6.22649911e-02+0.j]

[ 1.85219855e-01+0.j 2.75809848e-02+0.j]

[-2.22020551e-01+0.j 3.69437371e-03+0.j]

[-9.58716391e-04+0.j 2.68948940e-03+0.j]

[-5.31782329e-02+0.j 2.93186133e-01+0.j]

[ 2.64032641e-02+0.j 2.46285459e-03+0.j]

[-2.03674472e-02+0.j 2.70705165e-01+0.j]

[ 1.59768349e-01+0.j 9.77303616e-02+0.j]

[ 1.60797920e-01+0.j 8.02322353e-02+0.j]

[ 1.62009163e-01+0.j 8.43531223e-02+0.j]

[-1.76346755e-02+0.j 1.23578115e-01+0.j]

[-3.93283262e-02+0.j 3.23065708e-01+0.j]

[-5.17699479e-03+0.j 4.97207168e-02+0.j]

[-3.51036243e-02+0.j 2.90787160e-01+0.j]

[-4.01112340e-02+0.j 2.93186523e-01+0.j]

[ 9.05963286e-02+0.j -9.91649673e-03+0.j]

[-2.30173147e-01+0.j -2.48931229e-02+0.j]

[-2.32136421e-01+0.j -1.55999806e-02+0.j]

[-2.37368521e-01+0.j 6.69290113e-03+0.j]

[-4.04002103e-03+0.j 9.35728608e-04+0.j]

[-2.21882711e-01+0.j 3.16978406e-03+0.j]

[-2.34409551e-01+0.j -2.02868225e-02+0.j]

[-2.33253515e-01+0.j -1.11587362e-02+0.j]

[-2.37167823e-01+0.j 5.91666077e-03+0.j]

[-3.98763123e-03+0.j 8.68998842e-04+0.j]

[-9.46949890e-02+0.j 9.76610626e-02+0.j]

[-8.65356604e-02+0.j 3.31227630e-02+0.j]

[-8.89126885e-02+0.j 4.09135207e-02+0.j]

[-9.42693534e-02+0.j 6.42026251e-02+0.j]

[-4.06034182e-02+0.j 4.41208073e-03+0.j]

[-1.14390720e-02+0.j 1.76059773e-02+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[-4.67306418e-03+0.j 3.33247609e-04+0.j]

[ 2.59700180e-04+0.j -1.17091172e-03+0.j]

[-7.03104829e-03+0.j -3.50336382e-04+0.j]

[ 1.60830839e-01+0.j 7.96550072e-02+0.j]

[ 1.49144245e-01+0.j 1.61194665e-01+0.j]

[ 1.61128010e-01+0.j 9.31440822e-02+0.j]

[-3.29054579e-02+0.j 2.78601320e-01+0.j]

[-2.52655579e-02+0.j 3.05088181e-01+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[-1.69732333e-03+0.j 8.65537371e-04+0.j]

[-1.14390720e-02+0.j 1.76059773e-02+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[-1.84399501e-01+0.j -3.28368343e-02+0.j]

[-1.43509487e-02+0.j 2.27161780e-02+0.j]

[-9.28137823e-03+0.j 1.66573978e-02+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[-2.59693125e-02+0.j 6.79906605e-02+0.j]

[ 1.58066480e-01+0.j 8.85095299e-02+0.j]

[ 1.62009163e-01+0.j 8.43531223e-02+0.j]

[-3.51036243e-02+0.j 2.90787160e-01+0.j]

[-4.67306418e-03+0.j 3.33247609e-04+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]


[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[ 0.00000000e+00+0.j 0.00000000e+00+0.j]

[-9.58716391e-04+0.j 2.68948940e-03+0.j]

[ 2.64032641e-02+0.j 2.46285459e-03+0.j]

[-5.31782329e-02+0.j 2.93186133e-01+0.j]

[-2.03674472e-02+0.j 2.70705165e-01+0.j]

[-1.42732184e-01+0.j 7.40240838e-02+0.j]

[-1.44776217e-02+0.j 4.07840410e-02+0.j]

[ 1.57499410e-02+0.j -2.33859330e-03+0.j]

[-2.13709539e-02+0.j -3.97657590e-03+0.j]

[-3.40706178e-02+0.j 4.25100042e-02+0.j]

[-4.45546699e-02+0.j 3.16233496e-02+0.j]

[-4.66047215e-02+0.j 4.08394136e-02+0.j]

[-1.70703123e-02+0.j 3.08096393e-02+0.j]

[-2.33357423e-01+0.j 7.66424312e-03+0.j]

[-1.95649654e-01+0.j -1.47155175e-02+0.j]

[-2.36660758e-01+0.j 1.15920396e-03+0.j]

[-2.15315572e-01+0.j 1.45766138e-02+0.j]

[ 2.79204962e-02+0.j -1.08012869e-01+0.j]]

In [41]: Y=X_std.dot(matrix_w)

Y

Out[41]: array([[ 3.32755152+0.j, 1.17455803+0.j],

[ 3.31620126+0.j, 1.1693784 +0.j],

[ 3.1241063 +0.j, 1.19068835+0.j],

...,

[ -1.93651008+0.j, -1.22422617+0.j],

[ -2.03393674+0.j, -1.26868622+0.j],

[-11.84870487+0.j, -1.02759442+0.j]])
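The manual projection onto the top two eigenvectors should agree with scikit-learn's `PCA` up to the sign of each component, since an eigenvector and its negation are equally valid. A minimal sketch on synthetic data (the names `raw`, `Z`, `W`, `Y_manual`, and `Y_sk` are hypothetical) that checks this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
raw = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))  # correlated columns
Z = StandardScaler().fit_transform(raw)

# Manual route: eigenvectors of the covariance matrix, top two by eigenvalue.
vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
W = vecs[:, np.argsort(vals)[::-1][:2]]
Y_manual = Z @ W

# Library route.
Y_sk = PCA(n_components=2).fit_transform(Z)

# Align the sign of each column before comparing.
signs = np.sign((Y_manual * Y_sk).sum(axis=0))
print(np.allclose(Y_manual * signs, Y_sk, atol=1e-6))
```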


In [42]: from sklearn.decomposition import PCA

pca=PCA().fit(X_std)

plt.plot(np.cumsum(pca.explained_variance_ratio_))

plt.xlim(0,78)

plt.xlabel("Number of components")

plt.ylabel("Cumulative Explained Variance")

Out[42]: Text(0, 0.5, 'Cumulative Explained Variance')

In [43]: sklearn_pca=PCA(n_components=30)

Y_sklearn=sklearn_pca.fit_transform(X_std)
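Instead of fixing the number of components by hand, scikit-learn's `PCA` also accepts a fraction between 0 and 1 for `n_components` and then keeps as many components as are needed to explain that share of the variance. A minimal sketch on synthetic low-rank data follows; the 0.95 target and the array names are illustrative assumptions, not the notebook's choice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
# Ten observed columns driven by three latent factors plus a little noise.
X_demo = latent @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(500, 10))

# A float n_components is interpreted as a target explained-variance share.
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X_demo)
print(pca_95.n_components_, pca_95.explained_variance_ratio_.sum())
```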


In [44]: pca=PCA(n_components=2)

principalComponents=pca.fit_transform(X_norm)

plt.figure(figsize=(16,16))

# Pass x and y as keyword arguments; from seaborn 0.12 on, only `data` may be positional
g1=sns.scatterplot(x=principalComponents[:,0],y=principalComponents[:,1],s=100,hue=data_[" Label"],cmap="Spectral",alpha=0.7)

plt.title('Visualizing DDoS attack through PCA',fontsize=24)

Out[44]: Text(0.5, 1.0, 'Visualizing DDoS attack through PCA')

/usr/local/lib/python3.7/dist-packages/google/colab/_event_manager.py:28: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.

func(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/IPython/core/pylabtools.py:125: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.

fig.canvas.print_figure(bytes_io, **kw)

Supervised Learning Model


In [45]: from imblearn.over_sampling import SMOTE
from collections import Counter
from matplotlib import pyplot
from sklearn.preprocessing import LabelEncoder
from sklearn import utils
from sklearn.utils import _safe_indexing

# Encode the class labels as integers
y=LabelEncoder().fit_transform(y)

# Oversample with SMOTE (default strategy: every class except the majority)
oversample=SMOTE()
X,y=oversample.fit_resample(X,y)

# Summarize the class distribution after resampling
counter=Counter(y)
for k, v in counter.items():
    per=v/len(y)*100
    print('Class=%d, n=%d, (%.3f%%)' %(k,v,per))

pyplot.bar(counter.keys(),counter.values())
pyplot.show()

Class=1, n=39995, (8.333%)

Class=0, n=39995, (8.333%)

Class=2, n=39995, (8.333%)

Class=3, n=39995, (8.333%)

Class=4, n=39995, (8.333%)

Class=5, n=39995, (8.333%)

Class=6, n=39995, (8.333%)

Class=7, n=39995, (8.333%)

Class=8, n=39995, (8.333%)

Class=10, n=39995, (8.333%)

Class=11, n=39995, (8.333%)

Class=9, n=39995, (8.333%)

In [46]: y.shape

Out[46]: (479940,)

In [47]: X.shape

Out[47]: (479940, 78)


In [48]: from sklearn.model_selection import train_test_split

from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix,accuracy_score

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=1)
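In the cells above the data is oversampled first and split afterwards, so synthetic samples derived from a given original row can end up in the test set. A common alternative is to split first and oversample only the training portion. The sketch below illustrates that ordering on synthetic data. To stay runnable with scikit-learn alone it uses `sklearn.utils.resample` (plain random oversampling) as a stand-in for SMOTE, and all array names are hypothetical.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(1000, 4))
y_demo = np.array([0] * 900 + [1] * 100)  # imbalanced labels

# 1) Split first, keeping the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1, stratify=y_demo)

# 2) Oversample the minority class in the training part only.
#    resample() is a random oversampler standing in for SMOTE here.
minority = y_tr == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_tr[minority], y_tr[minority],
    replace=True, n_samples=n_majority, random_state=1)
X_tr_bal = np.vstack([X_tr[~minority], X_min_up])
y_tr_bal = np.concatenate([y_tr[~minority], y_min_up])

print("train:", Counter(y_tr_bal.tolist()))
print("test: ", Counter(y_te.tolist()))
```

The test set keeps the original class ratio, so the reported metrics reflect the data the model would actually see.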

In [ ]: from sklearn.datasets import make_classification

from sklearn.linear_model import LogisticRegression

from sklearn.multiclass import OneVsRestClassifier

model=LogisticRegression(max_iter=440000)

ovr=OneVsRestClassifier(model)

ovr.fit(X_train,y_train)

y_pred=ovr.predict(X_test)

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.98      0.96      0.97      9741
           1       0.50      0.40      0.44      9990
           2       0.48      0.69      0.56     10021
           3       0.44      0.59      0.50      9999
           4       0.93      0.93      0.93     10024
           5       0.33      0.31      0.32     10093
           6       0.60      0.91      0.72     10156
           7       0.25      0.06      0.10      9977
           8       0.52      0.49      0.50     10057
           9       0.78      0.82      0.80      9983
          10       0.74      0.52      0.61      9990
          11       0.95      1.00      0.97      9954

    accuracy                           0.64    119985
   macro avg       0.62      0.64      0.62    119985
weighted avg       0.62      0.64      0.62    119985
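The very large `max_iter` above suggests the solver converged slowly, which often happens when features have very different scales. A minimal sketch of scaling inside a pipeline before the logistic regression is given below. It uses synthetic data, and the names `ovr_scaled`, `X_demo`, and `y_demo` are hypothetical. Whether scaling would improve the scores on this dataset is not something the notebook shows.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem with one feature on a much larger scale.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0)
X_demo[:, 0] *= 1e6

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1)

# The scaler is fitted on the training data only, inside each one-vs-rest model.
ovr_scaled = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)))
ovr_scaled.fit(X_tr, y_tr)

demo_accuracy = ovr_scaled.score(X_te, y_te)
print("accuracy: %.3f" % demo_accuracy)
```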


In [ ]: from sklearn.tree import DecisionTreeClassifier

from matplotlib import pyplot as plt

from sklearn import tree

classifier=DecisionTreeClassifier()

classifier.fit(X_train,y_train)

y_pred=classifier.predict(X_test)

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      9741
           1       0.58      0.62      0.60      9990
           2       0.59      0.56      0.58     10021
           3       0.76      0.75      0.76      9999
           4       0.99      0.99      0.99     10024
           5       0.51      0.52      0.51     10093
           6       0.63      0.62      0.63     10156
           7       0.55      0.54      0.54      9977
           8       0.61      0.61      0.61     10057
           9       0.81      0.81      0.81      9983
          10       0.73      0.73      0.73      9990
          11       1.00      1.00      1.00      9954

    accuracy                           0.73    119985
   macro avg       0.73      0.73      0.73    119985
weighted avg       0.73      0.73      0.73    119985

In [ ]: print(accuracy_score(y_test,y_pred))

0.7284327207567612


In [ ]: from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators = 1000, criterion = 'entropy',random_state = 0)

classifier.fit(X_train,y_train)

Y_pred = classifier.predict(X_test)

print(classification_report(y_test,Y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      9741
           1       0.77      0.41      0.53      9990
           2       0.58      0.89      0.70     10021
           3       0.82      0.85      0.84      9999
           4       0.99      0.99      0.99     10024
           5       0.81      0.25      0.38     10093
           6       0.61      0.98      0.75     10156
           7       0.58      0.71      0.63      9977
           8       0.68      0.55      0.61     10057
           9       0.81      0.81      0.81      9983
          10       0.78      0.73      0.76      9990
          11       0.99      1.00      1.00      9954

    accuracy                           0.76    119985
   macro avg       0.78      0.76      0.75    119985
weighted avg       0.78      0.76      0.75    119985

In [ ]: # Note: y_pred (lowercase) still holds the decision-tree predictions from the
# earlier cell; the random-forest predictions are stored in Y_pred.
print(confusion_matrix(y_test,y_pred))

[[9727    2    1    0    4    0    0    1    0    1    5    0]
 [   8 6197 3173  328   44   28   13  155   32    1    8    3]
 [   1 3767 5607  391    0    0    0  231   18    0    6    0]
 [   1  397  376 7546    7 1164    2  365  127    0   12    2]
 [   7   45    1    3 9951    2    0    0    4    0    0   11]
 [   0   19    5 1076    1 5236 3675   42   24    0   15    0]
 [   0    9    0    2    0 3800 6341    1    1    0    2    0]
 [   1  165  262  382    1   33    3 5377 3385    1  366    1]
 [   1   40    8  130    5   19    2 3323 6122    1  399    7]
 [   0    0    0    0    0    0    0    0    1 8060 1922    0]
 [   2    5    3   16    2   22    2  349  390 1907 7292    0]
 [   0    1    0    1    3    1    0    1    2    0    0 9945]]
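A confusion matrix this size is easier to read once it is reduced to per-class recall and the largest off-diagonal entry. A minimal sketch on a small hypothetical 3-class matrix (not the matrix above):

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm = np.array([[50,  0,  0],
               [ 2, 30, 18],
               [ 1, 20, 29]])

# Recall per class: correct predictions divided by the row total.
recall = cm.diagonal() / cm.sum(axis=1)

# Largest off-diagonal cell: the most frequent single confusion.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(np.argmax(off_diag), off_diag.shape)

print("per-class recall:", np.round(recall, 2))
print("most frequent confusion: true %d predicted as %d (%d samples)"
      % (true_cls, pred_cls, off_diag[true_cls, pred_cls]))
```

Applied to a matrix like the one above, the same steps would point at the class pairs that are most often mistaken for each other.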


In [ ]: skplt.metrics.plot_confusion_matrix(y_test,y_pred,figsize=(16,16))

Out[ ]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa1b9ebb450>

Feature Scaling


In [51]: import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from sklearn import preprocessing

# Take two columns of the DataFrame as a NumPy array
x=df.iloc[:,1:3].values
print("\n Original data values:\n",x)

# Rescale each column to the range [0, 1]
min_max_scaler=preprocessing.MinMaxScaler(feature_range=(0,1))
x_after_min_max_scaler=min_max_scaler.fit_transform(x)
print("\nAfter min max scaling:\n",x_after_min_max_scaler)

# Standardize each column to zero mean and unit variance
Standardisation=preprocessing.StandardScaler()
x_after_Standardisation=Standardisation.fit_transform(x)
print("\n after standardisation :\n",x_after_Standardisation)

Original data values:


[[ 17 1]

[ 17 1]
[ 17 44]
...

[ 6 1]
[ 6 1]
[ 6 112584179]]

After min max scaling:


[[1.00000000e+00 8.33333646e-09]

[1.00000000e+00 8.33333646e-09]

[1.00000000e+00 3.66666804e-07]

...

[3.52941176e-01 8.33333646e-09]

[3.52941176e-01 8.33333646e-09]

[3.52941176e-01 9.38201843e-01]]

after standardisation :

[[ 0.47191611 -0.22334861]

[ 0.47191611 -0.22334861]

[ 0.47191611 -0.22334626]

...

[-2.11674636 -0.22334861]

[-2.11674636 -0.22334861]

[-2.11674636 5.91918411]]
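The two scalers implement simple per-column formulas: min-max scaling computes (x - min) / (max - min), and standardization computes (x - mean) / std with the population standard deviation. For a column whose minimum is 0 and maximum is 17, the value 6 maps to 6/17, about 0.3529, which matches the first column of the min-max output above. A small sketch that checks both formulas against scikit-learn on a hypothetical column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single column with the values 0, 6 and 17.
x_demo = np.array([[0.0], [6.0], [17.0], [17.0]])

# Min-max scaling by hand.
min_max_manual = (x_demo - x_demo.min(axis=0)) / (x_demo.max(axis=0) - x_demo.min(axis=0))

# Standardization by hand (np.std defaults to the population std, ddof=0).
z_manual = (x_demo - x_demo.mean(axis=0)) / x_demo.std(axis=0)

print(np.allclose(min_max_manual, MinMaxScaler().fit_transform(x_demo)))
print(np.allclose(z_manual, StandardScaler().fit_transform(x_demo)))
print(min_max_manual[1, 0])
```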

In [58]: import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.cluster import DBSCAN

from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import normalize

from sklearn.decomposition import PCA

In [60]: y.shape

Out[60]: (479940,)

In [61]: print(X.shape)

(479940, 78)


In [62]: from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test =train_test_split(X,y,test_size=0.20)

In [63]: from sklearn.preprocessing import StandardScaler

scaler =StandardScaler()

scaler.fit(X_train)

X_train=scaler.transform(X_train)

X_test=scaler.transform(X_test)

In [65]: from sklearn.ensemble import BaggingClassifier

from sklearn.tree import ExtraTreeClassifier

extra_tree =ExtraTreeClassifier(random_state=0)

cls=BaggingClassifier(extra_tree,random_state=0).fit(X_train,y_train)

cls.score(X_test,y_test)

Out[65]: 0.7575738633995917

In [66]: y_pred=cls.predict(X_test)

In [67]: from sklearn.metrics import classification_report,confusion_matrix

print(confusion_matrix(y_test,y_pred))

print(classification_report(y_test,y_pred))

[[7988    1    0    0    0    0    0    0    0    0    0    0]
 [   2 5158 2524  207   32   15    3   88   15    1    4    5]
 [   1 2857 4662  173    0    5    0  138    6    0    2    0]
 [   0  240  345 6716    2  405    0  236   51    1    5    2]
 [   5   31    1    0 7844    1    0    0    3    1    1   14]
 [   1   12    5 1058    3 4639 2363   14    7    1   16    2]
 [   0    4    0    1    1 2560 5379    0    1    0    1    0]
 [   1  124  216  257    1   16    1 4962 2226    0  136    2]
 [   1   25    9  108    8    6    0 2789 4917    1  120   10]
 [   1    0    0    0    0    0    0    0    0 6624 1465    0]
 [   0    5    4    7    8   13    0  372  266 1599 5690    0]
 [   0    0    0    0    0    0    0    0    0    0    0 8139]]

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      7989
           1       0.61      0.64      0.62      8054
           2       0.60      0.59      0.60      7844
           3       0.79      0.84      0.81      8003
           4       0.99      0.99      0.99      7901
           5       0.61      0.57      0.59      8121
           6       0.69      0.68      0.69      7947
           7       0.58      0.62      0.60      7942
           8       0.66      0.62      0.64      7994
           9       0.81      0.82      0.81      8090
          10       0.76      0.71      0.74      7964
          11       1.00      1.00      1.00      8139

    accuracy                           0.76     95988
   macro avg       0.76      0.76      0.76     95988
weighted avg       0.76      0.76      0.76     95988


In [74]: import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Build a classification task using 3 informative features
X,y=make_classification(n_samples=1000,n_features=11,n_informative=3,n_redundant=0,n_repeated=0,n_classes=2,random_state=0,shuffle=False)

# Build a forest and compute the impurity-based feature importances
forest=ExtraTreesClassifier(n_estimators=250,random_state=0)
forest.fit(X,y)
importances=forest.feature_importances_
std=np.std([tree.feature_importances_ for tree in forest.estimators_],axis=0)
indices=np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

# for f in range(X.shape[1]):
#     print("%d. %s (%f)" % (f+1,df['Label'],importances[indices[f]]))

# Plot the impurity-based feature importances of the forest
plt.figure()
plt.title("Feature importances")
plt.bar(range(X.shape[1]),importances[indices],color="r",yerr=std[indices],align="center")
plt.xticks(range(X.shape[1]),indices)
plt.xlim([-1,X.shape[1]])
plt.show()

Feature ranking:
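The ranking loop above is commented out, so only the heading is printed. A minimal sketch of printing the ranking with feature names is given below. The names `feat_0` to `feat_5` are placeholders; with a real DataFrame the column names would be used instead.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X_arr, y_arr = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0,
    random_state=0, shuffle=False)
feature_names = ["feat_%d" % i for i in range(X_arr.shape[1])]  # hypothetical names

forest_demo = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_arr, y_arr)

# Pair each importance with its name and sort from most to least important.
ranking = pd.Series(forest_demo.feature_importances_, index=feature_names)
ranking = ranking.sort_values(ascending=False)

print("Feature ranking:")
for rank, (name, score) in enumerate(ranking.items(), start=1):
    print("%d. %s (%f)" % (rank, name, score))
```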


In [75]: from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import ExtraTreesClassifier
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X,y=make_classification(n_samples=1000,n_features=20,n_informative=15,n_redundant=5,random_state=4)
    return X,y

# get a list of models to evaluate
def get_models():
    models=dict()
    # define number of trees to consider
    n_trees=[10,50,100,500,1000,5000]
    for n in n_trees:
        models[str(n)]=ExtraTreesClassifier(n_estimators=n)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model,X,y):
    # define the evaluation procedure
    cv=RepeatedStratifiedKFold(n_splits=10,n_repeats=3,random_state=1)
    # evaluate the model and collect the results
    scores=cross_val_score(model,X,y,scoring="accuracy",cv=cv,n_jobs=-1)
    return scores

X,y=get_dataset()

# get the models to evaluate
models=get_models()

# evaluate the models and store results
results,names=list(),list()
for name,model in models.items():
    # evaluate the model
    scores=evaluate_model(model,X,y)
    # store the results
    results.append(scores)
    names.append(name)
    # summarize the performance along the way
    print(">%s %.3f (%.3f)" % (name,mean(scores),std(scores)))

# plot model performance for comparison
pyplot.boxplot(results,labels=names,showmeans=True)
pyplot.show()

>10 0.853 (0.030)
>50 0.905 (0.029)
>100 0.910 (0.022)
>500 0.906 (0.028)
>1000 0.912 (0.026)

/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py:705: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.

"timeout or by a memory leak.", UserWarning

>5000 0.910 (0.025)

In [ ]:
