Classification Model To Classify Network Traffic


Uploaded by

أنور خميس عبدالحق

Classification Model Development and Evaluation Report

Anwar Khamis Hawash

1. Introduction:

• Objective: The objective of the project is to develop a classification model to classify network traffic into different categories based on the provided dataset.
• Dataset: The dataset used for classification contains features related to network traffic,
such as flow duration, protocol type, rate, and various flags. The target variable is the
label indicating the type of network traffic.
* Getting the dataset:
!wget http://205.174.165.80/IOTDataset/CIC_IOT_Dataset2023/Dataset/CSV/CICIoT2023.zip
>>>--2024-03-20 15:23:13--  http://205.174.165.80/IOTDataset/CIC_IOT_Dataset2023/Dataset/CSV/CICIoT2023.zip
Connecting to 205.174.165.80:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2815436141 (2.6G) [application/zip]
Saving to: ‘CICIoT2023.zip’
CICIoT2023.zip  100%[===================>]   2.62G  1.13MB/s    in 41m 5s
2024-03-20 16:04:18 (1.09 MB/s) - ‘CICIoT2023.zip’ saved [2815436141/2815436141]

* Unzip:
!unzip /content/drive/MyDrive/CICIoT2023.zip
>>>The archive extracted to about 189 CSV files.
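Outside Colab, the `!unzip` shell escape is not available. There, the extraction step and a count of the resulting CSV files can be done with the standard-library `zipfile` module. This is a minimal sketch: the function name `extract_csvs` and the paths passed to it are hypothetical. The search is recursive in case the archive stores the CSVs inside a subfolder.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path: str, out_dir: str) -> list[Path]:
    """Extract a zip archive and return the sorted paths of the CSV files it contained."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    # Search recursively: the archive may place the CSVs inside a subfolder
    return sorted(out.rglob("*.csv"))
```

For example, `len(extract_csvs("CICIoT2023.zip", "CICIoT2023"))` gives the number of extracted CSV files. Sorting the paths keeps any later "first N files" selection reproducible.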
# d: a DataFrame read from one of the extracted CSV files with pd.read_csv
d.columns
>>>Index(['flow_duration', 'Header_Length', 'Protocol Type', 'Duration', 'Rate',
'Srate', 'Drate', 'fin_flag_number', 'syn_flag_number',
'rst_flag_number', 'psh_flag_number', 'ack_flag_number',
'ece_flag_number', 'cwr_flag_number', 'ack_count', 'syn_count',
'fin_count', 'urg_count', 'rst_count', 'HTTP', 'HTTPS', 'DNS', 'Telnet',
'SMTP', 'SSH', 'IRC', 'TCP', 'UDP', 'DHCP', 'ARP', 'ICMP', 'IPv', 'LLC',
'Tot sum', 'Min', 'Max', 'AVG', 'Std', 'Tot size', 'IAT', 'Number',
'Magnitue', 'Radius', 'Covariance', 'Variance', 'Weight', 'label'],
dtype='object')
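Before choosing an encoding, it helps to check which of these columns are actually non-numeric. The CICIoT2023 features appear to be numeric already. If so, one-hot encoding with `pd.get_dummies` leaves the features unchanged, and the only categorical variable is the target `label`.

The sketch below is a minimal check, assuming `d` is the DataFrame shown above. The helper name `non_numeric_columns` is hypothetical.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame, exclude: tuple[str, ...] = ("label",)) -> list[str]:
    """Return the feature columns that are not numeric and would need encoding."""
    # Drop the target column(s) first; errors="ignore" tolerates a missing target
    candidates = df.drop(columns=list(exclude), errors="ignore")
    return list(candidates.select_dtypes(exclude="number").columns)
```

An empty result from `non_numeric_columns(d)` would mean there is nothing for one-hot encoding to do.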

d['label'].unique()
>>>array(['DDoS-RSTFINFlood', 'DoS-TCP_Flood', 'DDoS-ICMP_Flood',
'DoS-UDP_Flood', 'DoS-SYN_Flood', 'Mirai-greeth_flood',
'DDoS-SynonymousIP_Flood', 'Mirai-udpplain', 'DDoS-SYN_Flood',
'DDoS-PSHACK_Flood', 'DDoS-TCP_Flood', 'DDoS-UDP_Flood',
'BenignTraffic', 'MITM-ArpSpoofing', 'DDoS-ACK_Fragmentation',
'Mirai-greip_flood', 'DoS-HTTP_Flood', 'DDoS-ICMP_Fragmentation',
'Recon-PortScan', 'DNS_Spoofing', 'DDoS-UDP_Fragmentation',
'Recon-OSScan', 'XSS', 'DDoS-HTTP_Flood', 'Recon-HostDiscovery',
'CommandInjection', 'VulnerabilityScan', 'DDoS-SlowLoris',
'Backdoor_Malware', 'BrowserHijacking', 'DictionaryBruteForce',
'SqlInjection', 'Recon-PingSweep', 'Uploading_Attack'],
dtype=object)
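`unique()` lists the 34 classes but not how many rows each one has. Class sizes matter later, because the discussion turns on some classes being scarce. A minimal frequency check, assuming `d` is a DataFrame with the `label` column shown above (the helper name is hypothetical):

```python
import pandas as pd


def class_distribution(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Return the row count and share of each class, rarest class first."""
    counts = df[label_col].value_counts()
    return (
        pd.DataFrame({"count": counts, "share": counts / counts.sum()})
        .sort_values("count")
    )
```

The top rows of `class_distribution(d)` show which attack types have too few examples to learn from in a single file.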
2. Data Preprocessing:

• Concatenating CSV files: Loading all of the CSV files at once exhausts the resources of the Colab notebook, so only 20 CSV files were concatenated.
import os
import pandas as pd

directory = '/content/'
# Collect the paths of all extracted CSV files; sorting makes the choice of files reproducible
file_paths = []
for filename in sorted(os.listdir(directory)):
    if filename.endswith(".csv"):
        file_paths.append(os.path.join(directory, filename))
# Keep only the first 20 files to stay within Colab's memory limits
selected_file_paths = file_paths[:20]
concatenated_df = pd.concat((pd.read_csv(f) for f in selected_file_paths),
                            axis=0, ignore_index=True)
concatenated_df.to_csv('concatenated_data.csv', index=False)
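Loading is the step that hits Colab's memory limit. One way to stretch that limit is to store the numeric features as 32-bit floats instead of pandas' default 64-bit ones. This roughly halves their memory footprint, at the cost of some precision. The report did not do this; it is offered here as an assumption.

The sketch assumes every column except `label` is numeric. The function name `read_compact` is hypothetical.

```python
import pandas as pd


def read_compact(path: str, label_col: str = "label") -> pd.DataFrame:
    """Read one CSV and store every feature column (all assumed numeric) as float32."""
    df = pd.read_csv(path)
    feature_cols = df.columns.drop(label_col)
    # float64/int64 -> float32 roughly halves the memory used by the feature columns
    return df.astype({col: "float32" for col in feature_cols})
```

It drops into the concatenation step as `pd.concat((read_compact(f) for f in selected_file_paths), ignore_index=True)`.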

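• Encoding Categorical Features: Separate the features from the target column (label), then one-hot encode any categorical feature columns so that every input is numeric.
# Assumes concatenated_df is the DataFrame built in the previous step
X = concatenated_df.drop(columns=['label'])
y = concatenated_df['label']
X = pd.get_dummies(X)

• Splitting the Dataset: Split the dataset into training (80%) and testing (20%) sets for model training and evaluation. Then convert the string labels into integer class IDs, which XGBoost expects. Because the label encoder is fitted on y_train, this step has to come after the split.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, random_state=42)
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)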
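`LabelEncoder` assigns integer IDs in sorted (alphabetical) order of the class names, not in the order returned by `unique()`. The name behind each integer label should therefore be read from `label_encoder.classes_`.

The sketch below also passes `stratify=y` to `train_test_split`. This is an optional change that the report did not use. It keeps every class's proportion the same in the training and testing sets, which matters for classes with only a few dozen test rows. The helper name is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def split_and_encode(X: pd.DataFrame, y: pd.Series, seed: int = 42):
    """Stratified 80/20 split, then integer-encode the string labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    encoder = LabelEncoder()
    y_train_enc = encoder.fit_transform(y_train)
    y_test_enc = encoder.transform(y_test)
    # encoder.classes_[i] is the class name behind integer label i (sorted order)
    mapping = dict(enumerate(encoder.classes_))
    return X_train, X_test, y_train_enc, y_test_enc, mapping
```

With the 34 CICIoT2023 labels, `mapping[0]` is `Backdoor_Malware`, the alphabetically first label, rather than `DDoS-RSTFINFlood`, the first value returned by `unique()`.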

3. Model Selection and Training:

• XGBoost Classifier: The XGBoost classifier (gradient-boosted decision trees) was chosen because it handles large tabular datasets with many numeric features well and generally achieves high accuracy on such data.
• Training Process: Initialize the XGBoost classifier with a multi-class objective and train it on the training set.

from xgboost import XGBClassifier

# One output class per encoded label (34 classes)
model = XGBClassifier(objective='multi:softmax',
                      num_class=len(label_encoder.classes_))
model.fit(X_train, y_train_encoded)

# Make predictions on the testing set
y_pred = model.predict(X_test)
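The training call above weights every row equally, so the very large flood classes dominate the loss. One common option is to weight each row inversely to its class frequency and pass the weights to `fit`. The report did not do this; it is labeled here as an assumption.

This is a minimal sketch of the "balanced" weighting scheme. The helper name is hypothetical; scikit-learn's `compute_sample_weight("balanced", y)` produces the same weights.

```python
import numpy as np


def balanced_sample_weights(y_encoded) -> np.ndarray:
    """Weight each row by n_samples / (n_classes * count of its class)."""
    y_encoded = np.asarray(y_encoded)
    classes, inverse, counts = np.unique(y_encoded, return_inverse=True, return_counts=True)
    class_weights = len(y_encoded) / (len(classes) * counts)
    # Give every row the weight of its own class
    return class_weights[inverse]
```

Usage would be `model.fit(X_train, y_train_encoded, sample_weight=balanced_sample_weights(y_train_encoded))`. Rare classes such as those with under 100 test rows would then count far more per row. Whether this improves their recall without hurting the large classes would need to be measured.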

4. Model Evaluation:

• Evaluation Results:
• Accuracy: The model reaches an accuracy of 99.34% on the testing set.
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test_encoded, y_pred)
print("Accuracy:", accuracy)
>>>Accuracy: 0.9933709709001998

• Classification Report: The classification report gives the precision, recall, and F1-score of each class, along with its support (the number of test rows in that class).
from sklearn.metrics import classification_report
print("\nClassification Report:\n",
      classification_report(y_test_encoded, y_pred))

>>>
Class  Label                     Precision  Recall  F1-Score  Support
0      Backdoor_Malware          38%        6%      11%       79
1      BenignTraffic             89%        98%     93%       27035
2      BrowserHijacking          71%        18%     29%       157
3      CommandInjection          71%        18%     28%       137
4      DDoS-ACK_Fragmentation    100%       100%    100%      6994
5      DDoS-HTTP_Flood           99%        97%     98%       698
6      DDoS-ICMP_Flood           100%       100%    100%      178059
7      DDoS-ICMP_Fragmentation   100%       100%    100%      11285
8      DDoS-PSHACK_Flood         100%       100%    100%      101848
9      DDoS-RSTFINFlood          100%       100%    100%      100201
10     DDoS-SYN_Flood            100%       100%    100%      100122
11     DDoS-SlowLoris            96%        98%     97%       538
12     DDoS-SynonymousIP_Flood   100%       100%    100%      89085
13     DDoS-TCP_Flood            100%       100%    100%      111217
14     DDoS-UDP_Flood            100%       100%    100%      133767
15     DDoS-UDP_Fragmentation    100%       100%    100%      7064
16     DNS_Spoofing              72%        68%     70%       4469
17     DictionaryBruteForce      62%        27%     38%       308
18     DoS-HTTP_Flood            100%       100%    100%      1733
19     DoS-SYN_Flood             100%       100%    100%      50034
20     DoS-TCP_Flood             100%       100%    100%      65993
21     DoS-UDP_Flood             100%       100%    100%      81761
22     MITM-ArpSpoofing          88%        81%     85%       7572
23     Mirai-greeth_flood        100%       100%    100%      24584
24     Mirai-greip_flood         100%       100%    100%      18474
25     Mirai-udpplain            100%       100%    100%      21908
26     Recon-HostDiscovery       78%        79%     79%       3361
27     Recon-OSScan              67%        36%     47%       2419
28     Recon-PingSweep           60%        13%     22%       45
29     Recon-PortScan            68%        66%     67%       2043
30     SqlInjection              38%        12%     19%       120
31     Uploading_Attack          50%        3%      6%        34
32     VulnerabilityScan         98%        99%     99%       918
33     XSS                       9%         1%      2%        104

The class names come from label_encoder.classes_. LabelEncoder stores the labels in sorted (alphabetical) order, so class 0 is Backdoor_Malware rather than the first label returned by unique().

• Accuracy: 0.99
• Macro Avg Precision: 0.84
• Macro Avg Recall: 0.74
• Macro Avg F1-Score: 0.76
• Weighted Avg Precision: 0.99
• Weighted Avg Recall: 0.99
• Weighted Avg F1-Score: 0.99
• Total Support: 1154166
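The gap between the accuracy (0.99) and the macro averages (0.74 to 0.84) comes from class imbalance. Accuracy and the weighted averages count every test row, so the large classes dominate. Macro averages and balanced accuracy (the mean of per-class recall) give every class the same weight.

A toy scikit-learn example makes the difference concrete. The data are invented for illustration, and the "model" always predicts the majority class.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Toy imbalanced test set: 8 rows of class 0 and 2 rows of class 1.
# The "model" always predicts the majority class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)                               # 8 of 10 rows correct
bal = balanced_accuracy_score(y_true, y_pred)                      # mean of recalls 1.0 and 0.0
macro_rec = recall_score(y_true, y_pred, average="macro")          # same as balanced accuracy
weighted_rec = recall_score(y_true, y_pred, average="weighted")    # support-weighted recall

print(f"accuracy={acc:.2f} balanced={bal:.2f} "
      f"macro recall={macro_rec:.2f} weighted recall={weighted_rec:.2f}")
# prints: accuracy=0.80 balanced=0.50 macro recall=0.50 weighted recall=0.80
```

Weighted recall always equals accuracy, because each class's recall is multiplied by its share of the rows. That is why the report's weighted averages track the 0.99 accuracy, while the macro recall of 0.74 reflects the poorly detected rare classes.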

5. Discussion:

• Performance: The overall accuracy (0.99) and the weighted averages (0.99) are high because most test rows belong to a few large classes that the model classifies almost perfectly. The macro averages are noticeably lower (precision 0.84, recall 0.74, F1-score 0.76). Several small classes, each with a few hundred test rows or fewer, have a recall below 30%.
• Observations: As mentioned above, loading all of the CSV files at once exhausts the resources of the Colab notebook, so only 20 CSV files were concatenated.
Why 20 files rather than one?
>> When the model was trained on a single CSV file, some classes had too few samples, specifically those with the lowest scores. Concatenating 20 CSV files provided more examples of these classes.
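An alternative, assumed here and not used in the report, is to cap the number of rows kept per class as each file is read. The abundant flood classes then stop consuming most of the memory, so more files, and therefore more rare-class rows, fit in Colab. The function name and the cap value are hypothetical.

```python
import pandas as pd


def cap_per_class(df: pd.DataFrame, cap: int, label_col: str = "label",
                  seed: int = 42) -> pd.DataFrame:
    """Keep at most `cap` randomly chosen rows of each class; smaller classes are kept whole."""
    # Shuffle first so that head(cap) takes a random subset of each class
    shuffled = df.sample(frac=1.0, random_state=seed)
    return shuffled.groupby(label_col).head(cap).reset_index(drop=True)
```

Usage would look like `pd.concat((cap_per_class(pd.read_csv(f), cap=50_000) for f in file_paths), ignore_index=True)`. Capping changes the class proportions, so accuracy measured on capped data is not directly comparable with the 99.34% reported above.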
6. Conclusion:

1- Key Findings:

The classification task using the developed XGBoost model yielded promising results:
• Accuracy: The model achieved an accuracy of 99% on the testing set, indicating its effectiveness in classifying network traffic.
• Precision and Recall: The classification report shows 100% precision and recall for 17 of the 34 classes, demonstrating the model's ability to correctly identify most types of network traffic. The lowest scores occur in the classes with the fewest samples.
2- Applications:
The developed XGBoost model has several potential applications in real-world scenarios,
including:
• Network Security Monitoring: The model can be deployed in network security systems to
classify incoming traffic and identify potential threats such as DDoS attacks, TCP/UDP floods,
and malicious activities.
• Anomaly Detection: By flagging traffic that it does not classify as BenignTraffic, the model can alert network administrators to potential security breaches or unusual activities. Because it is a supervised classifier, it recognizes only the attack types present in its training data.
• Traffic Management: The model's ability to classify different types of network traffic can be
utilized for optimizing network performance and resource allocation, ensuring efficient data
transmission and minimizing network congestion.
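For the monitoring use case, the model's integer predictions must be mapped back to class names before anything can be reported. This is a minimal sketch, assuming the fitted `label_encoder` from training and treating every class other than `BenignTraffic` as a potential threat. The function name and the alert format are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder


def flag_threats(encoder: LabelEncoder, predicted_ids, benign_label: str = "BenignTraffic"):
    """Decode predicted class IDs and return (row_index, label) for every non-benign prediction."""
    labels = encoder.inverse_transform(predicted_ids)
    return [(i, str(label)) for i, label in enumerate(labels) if label != benign_label]
```

For example, `flag_threats(label_encoder, model.predict(new_flows))` would list the flows to raise alerts for, where `new_flows` is a hypothetical DataFrame with the same feature columns as `X_train`.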
