Classification Model To Classify Network Traffic
1. Introduction:
• Unzip: Extract the dataset archive in the notebook.
!unzip /content/drive/MyDrive/CICIoT2023.zip
>>>The zip file extracts to about 189 CSV files.
d.columns
>>>Index(['flow_duration', 'Header_Length', 'Protocol Type', 'Duration', 'Rate',
'Srate', 'Drate', 'fin_flag_number', 'syn_flag_number',
'rst_flag_number', 'psh_flag_number', 'ack_flag_number',
'ece_flag_number', 'cwr_flag_number', 'ack_count', 'syn_count',
'fin_count', 'urg_count', 'rst_count', 'HTTP', 'HTTPS', 'DNS', 'Telnet',
'SMTP', 'SSH', 'IRC', 'TCP', 'UDP', 'DHCP', 'ARP', 'ICMP', 'IPv', 'LLC',
'Tot sum', 'Min', 'Max', 'AVG', 'Std', 'Tot size', 'IAT', 'Number',
'Magnitue', 'Radius', 'Covariance', 'Variance', 'Weight', 'label'],
dtype='object')
d['label'].unique()
>>>array(['DDoS-RSTFINFlood', 'DoS-TCP_Flood', 'DDoS-ICMP_Flood',
'DoS-UDP_Flood', 'DoS-SYN_Flood', 'Mirai-greeth_flood',
'DDoS-SynonymousIP_Flood', 'Mirai-udpplain', 'DDoS-SYN_Flood',
'DDoS-PSHACK_Flood', 'DDoS-TCP_Flood', 'DDoS-UDP_Flood',
'BenignTraffic', 'MITM-ArpSpoofing', 'DDoS-ACK_Fragmentation',
'Mirai-greip_flood', 'DoS-HTTP_Flood', 'DDoS-ICMP_Fragmentation',
'Recon-PortScan', 'DNS_Spoofing', 'DDoS-UDP_Fragmentation',
'Recon-OSScan', 'XSS', 'DDoS-HTTP_Flood', 'Recon-HostDiscovery',
'CommandInjection', 'VulnerabilityScan', 'DDoS-SlowLoris',
'Backdoor_Malware', 'BrowserHijacking', 'DictionaryBruteForce',
'SqlInjection', 'Recon-PingSweep', 'Uploading_Attack'],
dtype=object)
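The calls above assume that d is a DataFrame holding one of the extracted CSV files; the loading step is not shown. A minimal sketch of how it might be loaded, where the helper name load_first_csv and the choice of the alphabetically first file are assumptions made for illustration:

```python
import glob
import os

import pandas as pd


def load_first_csv(directory):
    """Load the alphabetically first CSV file in `directory` as a DataFrame."""
    csv_paths = sorted(glob.glob(os.path.join(directory, "*.csv")))
    if not csv_paths:
        raise FileNotFoundError(f"no CSV files found in {directory!r}")
    return pd.read_csv(csv_paths[0])


# Hypothetical usage in the notebook:
# d = load_first_csv("/content/")
# d["label"].value_counts()   # samples per traffic class in this one file
```

Looking at the per-class counts of a single file is a quick way to see how unevenly the classes are represented before deciding how many files to combine.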
2. Data Preprocessing:
• Concatenating CSV files: Loading all of the CSV files at once exhausts the resources
of the Colab notebook, so I concatenated only 20 of the CSV files.
import os
import pandas as pd

directory = '/content/'
file_paths = []
# Collect the path of every CSV file in the directory
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        file_paths.append(os.path.join(directory, filename))
# Keep only the first 20 files to stay within the notebook's memory
selected_file_paths = file_paths[:20]
concatenated_df = pd.concat((pd.read_csv(f) for f in
                             selected_file_paths), axis=0)
concatenated_df.to_csv('concatenated_data.csv', index=False)
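Since memory is the limiting factor here, a variant of the concatenation step is sketched below. It is not the code that produced the results in this report. The helper name concat_csvs, the max_files parameter, the sorted file order, and the float32 downcast are all assumptions added for illustration:

```python
import os

import numpy as np
import pandas as pd


def concat_csvs(directory, max_files=20):
    """Concatenate up to `max_files` CSV files from `directory`."""
    # Sort the names so the same files are selected on every run
    # (os.listdir returns names in arbitrary order).
    names = sorted(n for n in os.listdir(directory) if n.endswith(".csv"))
    frames = []
    for name in names[:max_files]:
        df = pd.read_csv(os.path.join(directory, name))
        # Downcast float64 columns to float32 to roughly halve their memory use.
        float_cols = df.select_dtypes(include="float64").columns
        df = df.astype({col: np.float32 for col in float_cols})
        frames.append(df)
    # ignore_index gives the combined frame a clean 0..n-1 index.
    return pd.concat(frames, axis=0, ignore_index=True)
```

One thing to watch for: if the combined file is saved into the same directory that is being scanned, a later run would pick it up as one of the input files, so writing it elsewhere or filtering it out by name may be safer.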
• Splitting the Dataset: Split the dataset into training and testing sets for model training
and evaluation.
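from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing (X = features, y = labels)
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.2, random_state=42)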
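The split above uses X and y, and the later steps use label_encoder and encoded labels, but the code that creates them is not shown. A minimal sketch of how these pieces might fit together follows. The helper name prepare_split is hypothetical, and both encoding the labels before splitting and the stratify argument are my assumptions rather than the original procedure:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def prepare_split(df, label_col="label", test_size=0.2, seed=42):
    """Separate features from labels, encode the labels, and split."""
    X = df.drop(columns=[label_col])
    y = df[label_col]

    # XGBoost expects integer class ids 0..n_classes-1, so the string
    # labels are encoded before splitting.
    label_encoder = LabelEncoder()
    y_encoded = label_encoder.fit_transform(y)

    # stratify keeps the class proportions the same in both splits;
    # this is an addition, the original call does not use it.
    X_train, X_test, y_train_encoded, y_test_encoded = train_test_split(
        X, y_encoded, test_size=test_size, random_state=seed,
        stratify=y_encoded,
    )
    return X_train, X_test, y_train_encoded, y_test_encoded, label_encoder
```

Stratification needs at least two samples of every class, which is one more reason a single CSV file with very rare classes can be a problem.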
3. Model Training:
• XGBoost Classifier: The XGBoost classifier is chosen for this task because it handles
large, complex tabular datasets well and typically achieves high accuracy.
• Training Process: Initialize the XGBoost classifier and train it on the training set.
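from xgboost import XGBClassifier
# One output class per label; the labels must already be integer-encoded
model = XGBClassifier(objective='multi:softmax',
                      num_class=len(label_encoder.classes_))
model.fit(X_train, y_train_encoded)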
4. Model Evaluation:
• Evaluation Results:
• Accuracy: The accuracy of the developed model is 99.34% on the testing set.
accuracy = accuracy_score(y_test_encoded, y_pred)
print("Accuracy:", accuracy)
>>>Accuracy: 0.9933709709001998
>>>Classification report (summary):
• Accuracy: 0.99
• Macro Avg Precision: 0.84
• Macro Avg Recall: 0.74
• Macro Avg F1-Score: 0.76
• Weighted Avg Precision: 0.99
• Weighted Avg Recall: 0.99
• Weighted Avg F1-Score: 0.99
• Total Support: 1154166
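The gap between the macro and weighted averages above comes from how they are computed: the macro average gives every class the same weight, while the weighted average weights each class by its support. A small sketch with toy labels (not taken from the dataset) shows the effect. The helper name summarize is hypothetical:

```python
from sklearn.metrics import accuracy_score, classification_report


def summarize(y_true, y_pred):
    """Return accuracy plus macro- and weighted-average recall."""
    report = classification_report(
        y_true, y_pred, output_dict=True, zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_recall": report["macro avg"]["recall"],
        "weighted_recall": report["weighted avg"]["recall"],
    }


# Toy labels: class 0 is common, class 1 is rare.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 98 + [0, 1]   # one of the two rare samples is missed
toy = summarize(y_true, y_pred)
# In the notebook, y_pred would come from model.predict(X_test) instead.
```

In this toy case the per-class recalls are 1.0 and 0.5, so the macro recall is 0.75 while accuracy and weighted recall are both 0.99. A single poorly recognized rare class is enough to pull the macro average far below the weighted one.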
5. Discussion:
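• Performance: The model reaches about 99% overall accuracy, but the macro-averaged
recall (0.74) and F1-score (0.76) are well below the weighted averages (0.99). This
indicates that a number of low-support classes are classified less reliably than the
dominant ones.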
• Observations: As mentioned before, loading all of the CSV files at once exhausts the
resources of the Colab notebook, so I concatenated only 20 of the CSV files.
Why 20 files rather than one?
>> When I tried training on a single CSV file, some classes had too few samples,
specifically the classes with the lowest accuracy, so I concatenated 20 CSV files to
obtain more samples of those classes.
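The shortage of samples described above can be checked directly by counting the samples per class across a set of files. A minimal sketch, assuming the label column is named label as in the column listing earlier. The helper name label_counts is hypothetical:

```python
from collections import Counter

import pandas as pd


def label_counts(file_paths, label_col="label"):
    """Count the samples per class across several CSV files."""
    counts = Counter()
    for path in file_paths:
        # usecols reads only the label column, so memory use stays small
        # even when the files themselves are large.
        labels = pd.read_csv(path, usecols=[label_col])[label_col]
        counts.update(labels.value_counts().to_dict())
    return counts


# Hypothetical usage in the notebook:
# counts = label_counts(selected_file_paths)
# counts.most_common()[-5:]   # the five rarest classes
```

Because only one column is read per file, this check can be run over more files than would fit in memory as full DataFrames.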
6. Conclusion:
1- Key Findings:
The classification task using the developed XGBoost model yielded promising results:
• Accuracy: The model achieved an accuracy of about 99% on the testing set, indicating its
effectiveness in classifying network traffic.
• Precision and Recall: The classification report shows high precision and recall for the
high-support classes (weighted averages of 0.99), while the lower macro averages (precision 0.84,
recall 0.74) show that some types of network traffic are still identified less reliably.
2- Applications:
The developed XGBoost model has several potential applications in real-world scenarios,
including:
• Network Security Monitoring: The model can be deployed in network security systems to
classify incoming traffic and identify potential threats such as DDoS attacks, TCP/UDP floods,
and malicious activities.
• Anomaly Detection: By analyzing patterns in network traffic, the model can detect anomalous
behavior and alert network administrators to potential security breaches or unusual activities.
• Traffic Management: The model's ability to classify different types of network traffic can be
utilized for optimizing network performance and resource allocation, ensuring efficient data
transmission and minimizing network congestion.