{hzhan363, melike.erolkantarci}@uottawa.ca,
{akram.bin.sediq, ali.afana}@ericsson.com
Abstract—Large language models (LLMs), especially generative pre-trained transformers (GPTs), have recently demonstrated outstanding ability in information comprehension and problem-solving. This has motivated many studies in applying LLMs to wireless communication networks. In this paper, we propose a pre-trained LLM-empowered framework to perform fully automatic network intrusion detection. Three in-context learning methods are designed and compared to enhance the performance of LLMs. With experiments on a real network intrusion detection dataset, in-context learning proves to be highly beneficial in improving task performance without any further training or fine-tuning of the LLMs. We show that for GPT-4, testing accuracy and F1-Score can be improved by 90%. Moreover, pre-trained LLMs demonstrate great potential in performing wireless communication-related tasks. Specifically, the proposed framework can reach an accuracy and F1-Score of over 95% on different types of attacks with GPT-4 using only 10 in-context learning examples.

Index Terms—Large language model, GPT, Generative AI, intrusion detection, wireless networks, B5G and 6G, in-context learning.

I. INTRODUCTION

Artificial intelligence (AI) enables machines to perform automated perception and decision-making tasks that normally require human intelligence. In the past years, AI technologies have been applied to wireless communication applications to support a multitude of services. The widespread use of AI enables wireless networks to adapt to environmental changes in an automated fashion and serves as a basis for AI-native sixth-generation (6G) networks [1].

Large language models (LLMs), especially generative pre-trained transformers (GPTs), have recently received extensive attention in various areas. Given the impressive capacity for information comprehension and strategic planning demonstrated by LLMs, there is a future vision of achieving a self-evolving wireless network [2] by integrating LLMs into Radio Access Network (RAN) controllers [3]. The generative AI journey in wireless networks begins by including LLMs as a solution for wireless application design and developing LLM-driven network applications [4].

Although the potential research directions of combining LLMs with wireless communications have been explored by several existing studies, only a limited number of studies have proposed concrete realization methods or applications. The primary concern with this approach is that pre-trained LLMs may lack sufficient wireless communication-related knowledge to effectively execute complex tasks. Additionally, as models grow in scale, the process of fine-tuning LLMs becomes costly and technically challenging [5]. On the other hand, in-context learning offers a way to tackle this challenge [6]. In-context learning is a specific prompt-building method in which task-related examples are provided as part of the prompt. In this way, it can equip the LLM with domain-specific knowledge without updating the pre-trained model parameters.

There are several benefits of using pre-trained LLMs instead of traditional machine learning (ML) models. First, there is no need to train an ML model from scratch for each individual task in different scenarios. This opens the possibility of using one model for all network functions, since LLM parameters do not need to be updated when the model is applied to different tasks. In addition, traditional ML-based methods usually require large amounts of training data to perform given tasks. In contrast, with the general intelligence obtained from pre-training, LLM-based methods only need minimal example data. Using LLMs also avoids the over-fitting problems of traditional ML models. Moreover, network robustness is improved, since using pre-trained LLMs prevents attacks during the model training phase. The explainability of model decisions can also be improved, since LLMs can give semantic explanations of their decisions. Therefore, leveraging pre-trained LLMs for wireless network functions instead of traditional ML-based methods is a promising direction.

Inspired by the above-mentioned considerations, in this work we explore the potential of using pre-trained LLMs to automatically perform wireless communication-related tasks with in-context learning. Specifically, we propose a fully automated framework empowered by pre-trained LLMs and compare its performance to a traditional convolutional neural network (CNN)-based network intrusion detection model. Three well-known LLMs are involved in the experiments, namely GPT-3.5, GPT-4, and LLAMA. Our focus is on evaluating the efficacy of LLMs in wireless communication tasks and the impact of prompt design. The main contributions are summarized as follows:

(1) We design a pipeline of automated LLM-empowered network applications. This pipeline involves LLM-based input information selection, prompt building, in-context learning, and output conversion, realizing a seamless integration of LLMs within wireless communication systems.
(2) We propose a translation method between wireless network information and human-like text-based information to realize the communication between the LLM and wireless networks.

(3) We propose three distinct in-context learning methods, namely illustrative in-context learning, heuristic in-context learning, and interactive in-context learning. Each is designed to significantly improve the performance of pre-trained LLMs on the specified task.

According to the experimental results, with the benefit of in-context learning, the pre-trained LLM is capable of performing the intrusion detection task. Specifically, for GPT-4, in-context learning can improve the detection accuracy and F1-Score by 90%. With more than 10 in-context learning examples, GPT-4 can reach a detection accuracy and F1-Score higher than 95%. It achieves comparable or superior performance to traditional ML models with only a small amount of task-specific data.

Although the proposed framework demonstrates good performance, LLMs, being a very new technology, still have some potential pitfalls and risks when applied to practical scenarios. Some frequently raised concerns include adversarial prompting, hallucination, and stochastic output. These concerns can be mitigated by appropriate prompt design techniques such as output formatting and in-context learning. Still, more research on LLMs needs to be conducted in the future to make them more reliable.

The rest of the paper is organized as follows. Section II introduces related works. Section III explains the implementation of the LLM-empowered network intrusion detection framework. Section IV describes the design of in-context learning methods. Section V gives experimental results, and Section VI concludes the paper.

II. RELATED WORKS

A few studies have applied pre-trained LLMs, especially GPT models, to domain-specific applications. For instance, [7] proposed a framework using the GPT-4 model to explore the Minecraft world without human intervention. In [8], the medical domain knowledge and logical reasoning of LLMs are leveraged to enhance the output of computer-aided diagnosis networks. These studies have demonstrated the powerful capabilities of LLMs and have shown the potential for broader applications.

On the other hand, some studies have discussed how to use in-context learning to enhance the performance of LLMs. For instance, [9] discussed how to select effective in-context learning examples for LLM-based code generation. In [10], in-context learning is applied to LLMs to handle multiple task inputs and reduce costs. Different from these works, our work proposes three different ways to perform in-context learning and evaluates the effect of in-context learning on wireless communication-related tasks.

Some other works have discussed the combination of LLMs with wireless communications. In [2], two aspects are discussed: how to leverage LLMs in wireless communications and how to empower wireless communication with LLMs. [3] discussed several ways to integrate LLMs within the 6G system. These works take the perspective of an overview; our research, in contrast, can be seen as a concrete practice of the possible research directions mentioned in previous works. In addition, [11] proposed an LLM-enhanced multi-agent system that can generate communication models with customized communication knowledge. [12] proposed an LLM-based framework that can train and coordinate edge AI models to meet users' demands. Similarly, [13] proposed a split learning framework for LLM agents in 6G networks that enables collaboration between distributed devices. Our work uses LLMs in a different way. Instead of generating code with LLMs or designing generic frameworks, we directly generate wireless network-related decisions with logical reasoning. It is worth noting that the proposed LLM framework is versatile and can be adapted for various network application designs beyond network intrusion detection. For instance, it can also be applied to cell configuration, incoming traffic analytics, and dynamic resource management [3][14].

III. LLM-EMPOWERED NETWORK INTRUSION DETECTION

The system model of the proposed pre-trained LLM-empowered network intrusion detection framework is shown in Fig. 1. In this work, we consider a cloud-based wireless communication system in which malicious attackers can perform distributed denial of service (DDoS) attacks and inject malicious traffic into the networks. At the network controller side, a cloud-based pre-trained LLM is deployed for network security monitoring and intrusion detection.

As shown in Fig. 1, four main steps are designed in the framework to enable fully automatic intrusion detection for zero-touch networks. In the first step, the most relevant network features for intrusion detection are selected by the LLM. In the second step, data is collected from the networks and processed for LLM input. The third step is to build the prompt for the LLM, and the last step is to extract the desired decision from the LLM output. These steps automate the interactions between the LLM and the 5G communication system, allowing the LLM to choose the desired network information based on its knowledge base, capture the required values from the system, and feed back decisions. The detailed implementation of each step is explained below.
Fig. 1: The proposed pre-trained LLM-empowered network intrusion detection framework, in a network where a gNB serves benign users and an attacker. Four steps connect the LLM to the network: 1. Feature selection: listing and indexing all available network indicators, choosing useful indicators, and ranking the importance of the chosen indicators. 2. Data collection: deciding the most relevant features for input and translating network indicators into text-based input. 3. Prompt building: (a) instructions (task publishing and clarification, role description), (b) in-context learning with examples, (c) output format definition, and (d) input (values of the current network indicators). 4. Decision extraction: translating text-based results into network conditions.

A. Feature selection

To start LLM-based intrusion detection, feature selection is first performed to select the most important features from a large number of network features according to their relevance to the intrusion detection task. This step can be implemented with the knowledge base of pre-trained LLMs.

First, all the accessible network features are indexed and given to the LLM. Next, the LLM is instructed to select the index numbers corresponding to the ten features most relevant to the network intrusion detection task. The LLM is then requested to rank the importance of all selected features on three levels: 'very important', 'kind of important', and 'not very important'. Only 'very important' and 'kind of important' features will be kept as desired features for the following detection tasks. The goal of this step is to shorten the LLM input token length and to avoid the impact of irrelevant features on the detection results.
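To make this step concrete, the following is a minimal sketch of how the feature-selection query could be scripted against a chat-completion API. The feature subset, model choice, prompt wording, and index-parsing logic are illustrative assumptions, not the exact prompts used in our experiments.

import re
from openai import OpenAI  # assumed dependency: the official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative subset of the 84 traffic features; the real list is indexed the same way.
features = [
    "Source IP address", "Destination IP address", "Flow duration",
    "Total number of forwarding packets", "SYN flag", "ACK flag",
]
indexed = "\n".join(f"{i}. {name}" for i, name in enumerate(features))

prompt = (
    "Below is an indexed list of network traffic features:\n"
    + indexed
    + "\nSelect the ten features most relevant to network intrusion "
    "detection and rank each selected feature as 'very important', "
    "'kind of important', or 'not very important'. "
    "List one feature per line, starting with its index."
)

response = client.chat.completions.create(
    model="gpt-4", messages=[{"role": "user", "content": prompt}]
)
answer = response.choices[0].message.content

# Keep only the features the LLM did not rank as 'not very important'.
selected = []
for line in answer.splitlines():
    match = re.match(r"\s*(\d+)", line)
    if match and "not very important" not in line.lower():
        selected.append(int(match.group(1)))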
B. Data collection and processing

After feature selection, the LLM is employed to monitor the network status for suspected intrusions. First, the values corresponding to the selected features are collected and converted to a text-based format. Next, a text-based template is created to give definitions and semantic explanations of the collected values. Finally, the text-based values are concatenated with the pre-defined templates. Based on the above steps, the selected input features can be translated into comprehensible, human-like descriptions of the current network conditions.
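As a concrete illustration of this translation step, the sketch below fills a text template of the kind shown in Fig. 2 with collected feature values. The field names, sample values, and the millisecond unit are assumptions for illustration; in the framework, one such sentence is generated for every selected feature.

# Hypothetical record of collected values for three selected features.
flow = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "duration_ms": 1340}

# Pre-defined template adding semantic meaning to the raw values;
# the wording follows the example prompt shown in Fig. 2.
TEMPLATE = (
    "The source IP address of the traffic is {src_ip}, "
    "the destination IP address is {dst_ip}, "
    "and the flow duration is {duration_ms} ms."
)

def describe_network_state(values: dict) -> str:
    """Translate numeric network indicators into a human-like description."""
    return TEMPLATE.format(**values)

print(describe_network_state(flow))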
C. Prompt building

In this step, prompts are designed to provide the LLM with formalized guidance for effective intrusion detection. The prompts are composed of four parts: instructions, in-context learning examples, output formatting, and input information.

The instructions usually include the task publishing and clarification and the role description. In particular, for network intrusion detection tasks, the prompts should first define what a network intrusion is, describe the role of the LLM as a 5G network safety monitor, and clarify that the task is to determine whether the traffic is from a malicious user based on the given information. Next, examples of both benign and malicious cases are included in the prompts to provide the LLM with task-specific knowledge through in-context learning. The detailed implementation of in-context learning is given in the next section.
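The four-part prompt structure can be assembled mechanically. The sketch below is a minimal, assumed composition; the exact instruction wording, example formatting, and output-format text used in our experiments may differ.

def build_prompt(examples: list[str], network_description: str) -> str:
    """Concatenate the four prompt parts: instructions, in-context
    examples, output format, and the current input."""
    instructions = (
        "A network intrusion is traffic generated by a malicious user, "
        "for example as part of a DDoS attack. You are a 5G network "
        "safety monitor. Determine whether the traffic described below "
        "is from a malicious user."
    )
    in_context = "Here are some examples:\n" + "\n".join(examples)
    output_format = "You should answer yes/no."
    current_input = ("Now, you observed the following information. "
                     + network_description)
    return "\n\n".join([instructions, in_context, output_format, current_input])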
D. Decision extraction

Thanks to the output formatting defined in the prompt-building step, the decision extraction can be achieved with keyword searching. In detail, if "yes" or "Yes" appears in the LLM output while "no" or "No" does not appear, then the decision is that the traffic is malicious, and vice versa. If neither of the above cases is valid, output formatting will continue to be performed until the desired clear output is obtained.
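A direct reading of this rule in code, slightly hardened with word-boundary matching (an assumption beyond the plain substring search described above, to avoid matching "no" inside words like "not"):

import re

def extract_decision(llm_output: str):
    """Keyword search over the LLM output, following the yes/no rule above."""
    has_yes = re.search(r"\byes\b", llm_output, re.IGNORECASE) is not None
    has_no = re.search(r"\bno\b", llm_output, re.IGNORECASE) is not None
    if has_yes and not has_no:
        return "malicious"
    if has_no and not has_yes:
        return "benign"
    return None  # ambiguous: re-apply output formatting and query again

assert extract_decision("Yes, this traffic looks malicious.") == "malicious"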
IV. IN-CONTEXT LEARNING-ENHANCED DETECTION

Although LLMs may have some basic intelligence and knowledge about wireless networks, it is still difficult to directly perform intrusion detection tasks through pre-trained LLMs without fine-tuning. In-context learning is an effective method that can improve the accuracy of LLMs on specific tasks with only a small amount of labelled data.

In this section, we illustrate how we design in-context learning schemes with labelled examples to improve the performance of pre-trained LLMs on domain-specific tasks. In the first subsection, the general principles of in-context learning in LLMs are introduced. In the second subsection, the detailed implementation of in-context learning in our framework is given.

A. In-context learning for LLM

In-context learning is an effective technique that enables pre-trained LLMs to address specific tasks without updating the parameters of the LLM. This is achieved by integrating text-based examples into the prompts, changing the distribution of the input text to obtain the desired output. The process can be formulated as:

y_t^* = \arg\max_{y_t} P(y_t \mid (x_1, y_1), \ldots, (x_n, y_n), x_t; \Theta_{origin})   (1)

where (x_1, y_1), ..., (x_n, y_n) are the labelled in-context example pairs, x_t is the current task input, y_t is a candidate output, and \Theta_{origin} denotes the unchanged parameters of the pre-trained LLM.
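Operationally, Eq. (1) says that the labelled pairs are simply prepended to the task input and the frozen model is asked for the most likely answer. A minimal sketch, reusing the hypothetical client, build_prompt, and extract_decision helpers from the earlier sketches; the flow values are invented for illustration:

# (x_1, y_1), ..., (x_n, y_n): labelled example pairs rendered as text.
examples = [
    "The flow duration is 2 ms and the SYN flag is set. Expected answer: yes.",
    "The flow duration is 950 ms and no TCP flags are set. Expected answer: no.",
]

# x_t: the current, unlabelled network description.
x_t = "The flow duration is 3 ms and the SYN flag is set."

# Theta_origin stays frozen: only the prompt changes between tasks.
prompt = build_prompt(examples, x_t)
response = client.chat.completions.create(
    model="gpt-4", messages=[{"role": "user", "content": prompt}]
)
y_t = extract_decision(response.choices[0].message.content)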
[Figure: example prompts for the interactive in-context learning method, in which network condition descriptions are fed to the LLM ("Now, you observed the following information. The source IP address of the traffic is _____, the destination IP address is _____, and the flow duration is ______.") with queries such as "In the case of ..., do you think this traffic is malicious? You should answer yes/no.", correct answers are given ("Your answer is correct. Please classify such examples as benign/malicious in your following judgments." or "Your answer is wrong. Can you rethink why?"), and the LLM is asked to self-evaluate and provoked to think.]
Fig. 2: Three different methods to implement in-context learning with labelled examples.
The implementations of the three methods are presented in Fig. 2.

In the illustrative in-context learning method, examples with labels are converted into human- (and LLM-) interpretable descriptions of cases and expected answers. Then, they are laid out and fed directly into the LLM as part of the prompt. The prompts usually start with identifying statements like "Here are some examples", and each example ends with conclusive instructions like "You should answer yes/no."

The illustrative in-context learning method is the simplest among the three in-context learning methods. On this basis, the heuristic in-context learning method is designed by adding heuristic questions to the prompts. Based on the given network intrusion detection task scenario, some critical questions that may shape the outcome are extracted, like "What are the commonalities of all the malicious examples?" or "What is the rational range of flow duration?". These questions give the LLM insights into what should be concluded from the given examples. The LLM outputs for these questions are also included in the prompts as context information for the following intrusion detection tasks.

The last proposed in-context learning method is interactive in-context learning. In this method, in-context examples are given and analyzed through a question-and-answer format. First, the examples are fed into the LLM without the expected answers. Then the LLM is asked to provide detection results and perform self-evaluation. If the result matches the label, the LLM is encouraged to continue making judgments in this manner. Otherwise, the LLM is asked to perform self-correction: explanations of the expected results are provided, and the LLM reflects on the reasons for the incorrect answer. Similar to the heuristic in-context learning method, the self-assessments and reflections are also included in the prompts as part of the context information.
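To make the contrast concrete, the sketch below builds the example block of the prompt under each of the three methods. The wording of the heuristic questions and interactive turns is assumed for illustration, and ask_llm stands for any single chat-completion call such as the one shown in Section III-A.

def illustrative_block(examples: list[tuple[str, str]]) -> str:
    # Lay out labelled examples directly, as in the illustrative method.
    lines = [f"{desc} Expected answer: {label}." for desc, label in examples]
    return ("Here are some examples:\n" + "\n".join(lines)
            + "\nYou should answer yes/no.")

def heuristic_block(examples, questions, ask_llm) -> str:
    # Heuristic method: the LLM's answers to critical questions about the
    # examples are appended to the prompt as extra context.
    block = illustrative_block(examples)
    qa_lines = []
    for question in questions:
        answer = ask_llm(block + "\n" + question)
        qa_lines.append("Q: " + question + "\nA: " + answer)
    return block + "\n" + "\n".join(qa_lines)

def interactive_block(examples, ask_llm) -> str:
    # Interactive method: examples are analyzed turn by turn, and the
    # feedback (confirmation or correction) is kept as context.
    transcript = []
    for desc, label in examples:
        guess = ask_llm(desc + " Do you think this traffic is malicious? "
                        "You should answer yes/no.")
        if label in guess.lower():  # simple containment check for the sketch
            feedback = "Your answer is correct."
        else:
            feedback = (f"Your answer is wrong; the expected answer is "
                        f"{label}. Can you rethink why?")
        transcript.append(desc + "\nLLM: " + guess + "\nFeedback: " + feedback)
    return "\n".join(transcript)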
V. EXPERIMENTAL SETTINGS AND RESULTS

A. Experimental settings

In this work, we use a real network intrusion detection dataset proposed in [15] to generate testing examples and in-context learning examples. The dataset includes 9 types of DDoS attacks and 84 network traffic features. Three LLMs are used during the experiments: LLAMA-2-7b [16], GPT-3.5, and GPT-4 [17]. A CNN-based network intrusion detection model is also implemented as a baseline and is used for the comparison between traditional ML models and LLMs. The performance is evaluated according to the accuracy and F1-Score of network intrusion detection. All the models are implemented in Python with the support of PyTorch, the OpenAI API, and Hugging Face.

B. Experimental results

Fig. 3 shows the feature selection results of the network intrusion detection task from three LLMs: GPT-3.5, GPT-4, and LLAMA. The manually selected features for the CNN-based network intrusion detection model proposed in [15] are also added for comparison. During the experiments, each LLM is asked to choose ten features from the given 84 features, and the importance of these features is ranked according to three levels: 'very important', 'kind of important', and 'not very important'. Only 16 features are mentioned by the three LLMs, and among these 16 features, the 5 features mentioned by only one LLM are ranked as 'not very important'. This suggests that different LLMs exhibit a high degree of similarity in feature selection for the given task. It can also be observed that the selected features overlap significantly with the manually selected features. This validates the knowledge base of pre-trained LLMs in the wireless communication domain and their ability to handle related tasks. In the following experiments, features ranked as 'not very important' are removed, and only 'very important' and 'kind of important' features are kept as the input for intrusion detection.

For the in-context learning experiments, we only test with GPT-3.5 and GPT-4. This is because LLAMA2 is an open-source model with a limited token size and is more suitable for fine-tuning rather than in-context learning. Fig. 4 and Fig. 5 show the comparison of the three different in-context learning methods on a TCP ACK attack detection task. It can be observed that the testing accuracy and F1-Score grow with the number of examples. For GPT-3.5, the heuristic in-context learning method shows poor performance when there are only 2 examples. This is because GPT-3.5 may summarize incorrect conclusions from only 2 examples. As a result, it may give wrong answers to the heuristic questions and generate inaccurate detection results. When there are more in-context examples, the interactive in-context learning method and the heuristic in-context learning method show better results than the illustrative in-context learning method. Furthermore, increasing the number of in-context learning examples does not necessarily lead to an improvement in performance. This
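For reference, the two reported metrics can be computed from the extracted decisions as follows. This is a standard computation rather than code from the paper, and scikit-learn is an assumed dependency; the label vectors are invented for illustration.

from sklearn.metrics import accuracy_score, f1_score

# y_true: ground-truth labels from the dataset; y_pred: decisions extracted
# from the LLM output (1 = malicious, 0 = benign). Values are illustrative.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1-Score:", f1_score(y_true, y_pred))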
[Figure: table marking, for each of GPT-3.5, GPT-4, LLAMA-2, and the CNN-based model, the selected features (source and destination IP address, flow duration, numbers and lengths of forwarding/backward packets, protocol, flow bytes and packets per second, PSH/FIN/SYN/ACK flags, flow inter-arrival time mean, and max idle value) as 'very important', 'kind of important', or 'not very important'.]
Fig. 3: The feature selection and importance ranking results from three different LLMs, GPT-3.5, GPT-4, and LLAMA.
[Fig. 4: testing accuracy ((a)) and F1-Score of GPT-3.5 with illustrative, heuristic, and interactive ICL versus the number of in-context learning examples. Fig. 5: the corresponding accuracy and F1-Score results for GPT-4.]