Classification Model Development and Evaluation Report

Anwar Khamis Hawash

1. Introduction:

• Objective: The objective of the project is to develop a classification model to classify
network traffic into different categories based on the provided dataset.
• Dataset: The dataset used for classification contains features related to network traffic,
such as flow duration, protocol type, rate, and various flags. The target variable is the
label indicating the type of network traffic.
* Getting the dataset:
!wget http://205.174.165.80/IOTDataset/CIC_IOT_Dataset2023/Dataset/CSV/CICIoT2023.zip
>>>--2024-03-20 15:23:13-- http://205.174.165.80/IOTDataset/CIC_IOT_Dataset2023/Dataset/CSV/CICIoT2023.zip
Connecting to 205.174.165.80:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2815436141 (2.6G) [application/zip]
Saving to: ‘CICIoT2023.zip’
CICIoT2023.zip 100% [===================>] 2.62G 1.13MB/s in 41m 5s
2024-03-20 16:04:18 (1.09 MB/s) - ‘CICIoT2023.zip’ saved [2815436141/2815436141]

* Unzip:
!unzip /content/drive/MyDrive/CICIoT2023.zip
>>>The zip file extracts to about 189 CSV files.
d.columns
>>>Index(['flow_duration', 'Header_Length', 'Protocol Type', 'Duration', 'Rate',
'Srate', 'Drate', 'fin_flag_number', 'syn_flag_number',
'rst_flag_number', 'psh_flag_number', 'ack_flag_number',
'ece_flag_number', 'cwr_flag_number', 'ack_count', 'syn_count',
'fin_count', 'urg_count', 'rst_count', 'HTTP', 'HTTPS', 'DNS', 'Telnet',
'SMTP', 'SSH', 'IRC', 'TCP', 'UDP', 'DHCP', 'ARP', 'ICMP', 'IPv', 'LLC',
'Tot sum', 'Min', 'Max', 'AVG', 'Std', 'Tot size', 'IAT', 'Number',
'Magnitue', 'Radius', 'Covariance', 'Variance', 'Weight', 'label'],
dtype='object')

d['label'].unique()
>>>array(['DDoS-RSTFINFlood', 'DoS-TCP_Flood', 'DDoS-ICMP_Flood',
'DoS-UDP_Flood', 'DoS-SYN_Flood', 'Mirai-greeth_flood',
'DDoS-SynonymousIP_Flood', 'Mirai-udpplain', 'DDoS-SYN_Flood',
'DDoS-PSHACK_Flood', 'DDoS-TCP_Flood', 'DDoS-UDP_Flood',
'BenignTraffic', 'MITM-ArpSpoofing', 'DDoS-ACK_Fragmentation',
'Mirai-greip_flood', 'DoS-HTTP_Flood', 'DDoS-ICMP_Fragmentation',
'Recon-PortScan', 'DNS_Spoofing', 'DDoS-UDP_Fragmentation',
'Recon-OSScan', 'XSS', 'DDoS-HTTP_Flood', 'Recon-HostDiscovery',
'CommandInjection', 'VulnerabilityScan', 'DDoS-SlowLoris',
'Backdoor_Malware', 'BrowserHijacking', 'DictionaryBruteForce',
'SqlInjection', 'Recon-PingSweep', 'Uploading_Attack'],
dtype=object)
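
With 34 labels it can help to count the rows per label before modelling, since some attack types are far rarer than others. A minimal sketch, assuming `d` is a pandas DataFrame with a `label` column as shown above; the small frame built here is made-up stand-in data, not rows from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame `d`; only the label column matters here.
d = pd.DataFrame({
    "label": ["BenignTraffic"] * 5 + ["DDoS-ICMP_Flood"] * 12 + ["SqlInjection"] * 1
})

# Rows per class, most frequent first
class_counts = d["label"].value_counts()
# Share of the dataset taken by each class
class_share = (class_counts / len(d)).round(3)

print(pd.DataFrame({"rows": class_counts, "share": class_share}))
```

Classes with very small counts are the ones most likely to need more data or some form of re-weighting.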

2. Data Preprocessing:

• Concatenating CSV files: Loading all of the CSV files at once exhausts the resources of the
Colab notebook, so I concatenated only 20 of them.
import os
import pandas as pd

directory = '/content/'
# Collect the path of every CSV file in the directory
file_paths = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        file_paths.append(os.path.join(directory, filename))

# Keep the first 20 files (os.listdir returns them in arbitrary order)
selected_file_paths = file_paths[:20]
concatenated_df = pd.concat((pd.read_csv(f) for f in selected_file_paths), axis=0)
concatenated_df.to_csv('concatenated_data.csv', index=False)
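
Because memory is the limiting factor here, one possible variation is to pick the files in a reproducible (sorted) order and downcast float columns while loading. This is a sketch under assumptions, not the code used for the report: the helper name `load_csv_subset` is hypothetical, and the demonstration runs on three tiny made-up CSV files rather than the real dataset.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csv_subset(directory, limit=20):
    """Concatenate the first `limit` CSV files (sorted by name) from `directory`."""
    paths = sorted(glob.glob(os.path.join(directory, "*.csv")))[:limit]
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Downcast float64 columns to float32 to roughly halve their memory use
        float_cols = frame.select_dtypes(include="float64").columns
        frames.append(frame.astype({col: "float32" for col in float_cols}))
    return pd.concat(frames, ignore_index=True)


# Demonstration on three tiny, made-up CSV files written to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    for i in range(3):
        pd.DataFrame({"Rate": [0.5 + i, 1.5 + i], "label": ["a", "b"]}).to_csv(
            os.path.join(tmp_dir, f"part-{i}.csv"), index=False
        )
    combined = load_csv_subset(tmp_dir, limit=2)

print(combined.shape, combined["Rate"].dtype)
```

Sorting the paths means the same 20 files are selected on every run, which `os.listdir` alone does not guarantee.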

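• Encoding Categorical Features: One-hot encode any categorical feature columns and label-encode
the target so that both are in numerical format. The label-encoding lines use y_train and y_test,
so they run after the train/test split described in the next step.
from sklearn.preprocessing import LabelEncoder

X = pd.get_dummies(X)
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)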

• Splitting the Dataset: Split the dataset into training (80%) and testing (20%) sets for model
training and evaluation.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
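
The encoding and splitting steps above can be sketched end to end in the order they have to run: build X and y, split, then fit the label encoder on the training labels. The example below is a sketch on made-up data; the `stratify=y` argument is an addition that is not in the report's code, included because it keeps rare classes represented in both splits.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up stand-in data: two numeric features and a string label
df = pd.DataFrame({
    "flow_duration": [float(i) for i in range(20)],
    "Rate": [float(i % 5) for i in range(20)],
    "label": ["BenignTraffic"] * 10 + ["DoS-TCP_Flood"] * 10,
})

X = pd.get_dummies(df.drop(columns=["label"]))  # one-hot encodes any non-numeric columns
y = df["label"]

# stratify=y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the encoder on the training labels, then reuse it for the test labels
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

print(y_test.value_counts().to_dict())
```

Without stratification, a class with only a handful of rows can end up entirely in one split, in which case `label_encoder.transform(y_test)` raises an error for labels it has not seen.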

3. Model Selection and Training:

• XGBoost Classifier: The XGBoost classifier is chosen for the classification task due to
its effectiveness in handling complex datasets and its ability to provide high accuracy.
• Training Process: Initialize the XGBoost classifier and train it on the training set.

from xgboost import XGBClassifier

model = XGBClassifier(objective='multi:softmax', num_class=len(label_encoder.classes_))
model.fit(X_train, y_train_encoded)

# Make predictions on the testing set
y_pred = model.predict(X_test)

4. Model Evaluation:

• Evaluation Results:
• Accuracy: The accuracy of the developed model is 99.34% on the testing set.
from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(y_test_encoded, y_pred)
print("Accuracy:", accuracy)
>>>Accuracy: 0.9933709709001998

• Classification Report: The classification report provides detailed metrics such as precision,
recall, and F1-score for each class, along with support values.
print("\nClassification Report:\n", classification_report(y_test_encoded, y_pred))

>>>

Class  Label                    Precision  Recall   F1-Score  Support
0      DDoS-RSTFINFlood         38.00%     6.00%    11.00%    79
1      DoS-TCP_Flood            89.00%     98.00%   93.00%    27035
2      DDoS-ICMP_Flood          71.00%     18.00%   29.00%    157
3      DoS-UDP_Flood            71.00%     18.00%   28.00%    137
4      DoS-SYN_Flood            100.00%    100.00%  100.00%   6994
5      Mirai-greeth_flood       99.00%     97.00%   98.00%    698
6      DDoS-SynonymousIP_Flood  100.00%    100.00%  100.00%   178059
7      Mirai-udpplain           100.00%    100.00%  100.00%   11285
8      DDoS-SYN_Flood           100.00%    100.00%  100.00%   101848
9      DDoS-PSHACK_Flood        100.00%    100.00%  100.00%   100201
10     DDoS-TCP_Flood           100.00%    100.00%  100.00%   100122
11     DDoS-UDP_Flood           96.00%     98.00%   97.00%    538
12     BenignTraffic            100.00%    100.00%  100.00%   89085
13     MITM-ArpSpoofing         100.00%    100.00%  100.00%   111217
14     DDoS-ACK_Fragmentation   100.00%    100.00%  100.00%   133767
15     Mirai-greip_flood        100.00%    100.00%  100.00%   7064
16     DoS-HTTP_Flood           72.00%     68.00%   70.00%    4469
17     DDoS-ICMP_Fragmentation  62.00%     27.00%   38.00%    308
18     Recon-PortScan           100.00%    100.00%  100.00%   1733
19     DNS_Spoofing             100.00%    100.00%  100.00%   50034
20     DDoS-UDP_Fragmentation   100.00%    100.00%  100.00%   65993
21     Recon-OSScan             100.00%    100.00%  100.00%   81761
22     XSS                      88.00%     81.00%   85.00%    7572
23     DDoS-HTTP_Flood          100.00%    100.00%  100.00%   24584
24     Recon-HostDiscovery      100.00%    100.00%  100.00%   18474
25     CommandInjection         100.00%    100.00%  100.00%   21908
26     VulnerabilityScan        78.00%     79.00%   79.00%    3361
27     DDoS-SlowLoris           67.00%     36.00%   47.00%    2419
28     Backdoor_Malware         60.00%     13.00%   22.00%    45
29     BrowserHijacking         68.00%     66.00%   67.00%    2043
30     DictionaryBruteForce     38.00%     12.00%   19.00%    120
31     SqlInjection             50.00%     3.00%    6.00%     34
32     Recon-PingSweep          98.00%     99.00%   99.00%    918
33     Uploading_Attack         9.00%      1.00%    2.00%     104
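
The integer class indices in the report come from the label encoder. scikit-learn's `LabelEncoder` stores its classes in sorted order in `classes_`, so index i corresponds to `label_encoder.classes_[i]`, which is generally not the order returned by `d['label'].unique()`. The sketch below shows one way to recover the names and pass them to `classification_report` through `target_names`; it uses a handful of label strings from the list above together with made-up predictions, so the printed metrics are illustrative only.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# A few label strings in the order they might appear in the file
labels_in_file_order = np.array([
    "DDoS-RSTFINFlood", "DoS-TCP_Flood", "BenignTraffic",
    "Backdoor_Malware", "DoS-TCP_Flood", "BenignTraffic",
])

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels_in_file_order)

# Index -> name mapping actually used by the encoder (sorted order)
index_to_name = dict(enumerate(encoder.classes_))
print(index_to_name)

# Hypothetical predictions: here simply a copy of the true labels
y_pred = encoded.copy()

report = classification_report(
    encoded,
    y_pred,
    labels=np.arange(len(encoder.classes_)),
    target_names=encoder.classes_,
    zero_division=0,
)
print(report)
```

With `target_names` set, the report rows carry the class names directly, so no separate index-to-label lookup is needed when reading it.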

• Accuracy: 0.99
• Macro Avg Precision: 0.84
• Macro Avg Recall: 0.74
• Macro Avg F1-Score: 0.76
• Weighted Avg Precision: 0.99
• Weighted Avg Recall: 0.99
• Weighted Avg F1-Score: 0.99
• Total Support: 1154166
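
The gap between the macro and weighted averages follows from how each is computed: the macro average gives every class the same weight, while the weighted average weights each class by its support. The sketch below recomputes both F1 averages from the rounded per-class F1-scores and supports in the report above, using plain Python; because the inputs are rounded, the results agree with the reported figures only to about two decimals.

```python
# (F1-score in percent, support) for class indices 0..33, copied from the report above
f1_and_support = [
    (11, 79), (93, 27035), (29, 157), (28, 137), (100, 6994), (98, 698),
    (100, 178059), (100, 11285), (100, 101848), (100, 100201), (100, 100122),
    (97, 538), (100, 89085), (100, 111217), (100, 133767), (100, 7064),
    (70, 4469), (38, 308), (100, 1733), (100, 50034), (100, 65993),
    (100, 81761), (85, 7572), (100, 24584), (100, 18474), (100, 21908),
    (79, 3361), (47, 2419), (22, 45), (67, 2043), (19, 120), (6, 34),
    (99, 918), (2, 104),
]

total_support = sum(support for _, support in f1_and_support)

# Macro average: every class counts equally, however few samples it has
macro_f1 = sum(f1 for f1, _ in f1_and_support) / len(f1_and_support) / 100

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(f1 * support for f1, support in f1_and_support) / total_support / 100

print(f"total support: {total_support}")
print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
```

The handful of small classes with low F1-scores pull the macro average down to about 0.76, while their small supports leave the weighted average at about 0.99.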

5. Discussion:

• Performance: The model reaches about 99% overall accuracy, and most high-support classes score
at or near 100% precision and recall. Classes with few test samples score much lower (every class
with support below 500 has an F1-score under 40%), which is why the macro-averaged recall (0.74)
is well below the weighted average (0.99).
• Observations: As mentioned before, loading all of the CSV files at once exhausts the resources
of the Colab notebook, so I concatenated only 20 CSV files.
Why 20 files rather than one?
>> When I tried to train on a single CSV file, some classes had too few samples, specifically
the classes with the lowest scores, so I concatenated 20 CSV files to give those classes more
samples.
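
Adding more files is one way to give rare classes more samples. Another common option, which the report does not evaluate, is to weight training samples inversely to their class frequency. The sketch below only computes such weights with scikit-learn's `compute_sample_weight` on made-up encoded labels; whether this improves the rare classes here is an open question.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Made-up encoded labels: class 0 is common, class 1 is rare
y_train_encoded = np.array([0] * 8 + [1] * 2)

# "balanced" weights are n_samples / (n_classes * class_count)
sample_weights = compute_sample_weight(class_weight="balanced", y=y_train_encoded)
print(sample_weights)

# The weights could then be passed to training, e.g.
# model.fit(X_train, y_train_encoded, sample_weight=sample_weights)
```

Each rare-class sample then contributes more to the training loss than a common-class sample, so the two classes carry equal total weight.
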
6. Conclusion:

1- Key Findings:

The classification task using the developed XGBoost model yielded promising results:
• Accuracy: The model achieved an accuracy of about 99%, indicating its effectiveness in
classifying network traffic.
• Precision and Recall: The classification report reveals high precision and recall scores for
several classes, demonstrating the model's ability to correctly identify various types of network
traffic.
2- Applications:
The developed XGBoost model has several potential applications in real-world scenarios,
including:
• Network Security Monitoring: The model can be deployed in network security systems to
classify incoming traffic and identify potential threats such as DDoS attacks, TCP/UDP floods,
and malicious activities.
• Anomaly Detection: By analyzing patterns in network traffic, the model can detect anomalous
behavior and alert network administrators to potential security breaches or unusual activities.
• Traffic Management: The model's ability to classify different types of network traffic can be
utilized for optimizing network performance and resource allocation, ensuring efficient data
transmission and minimizing network congestion.
