Computers & Security 122 (2022) 102899

journal homepage: www.elsevier.com/locate/cose

FS-IDS: A framework for intrusion detection based on few-shot learning
Jingcheng Yang, Hongwei Li, Shuo Shao, Futai Zou, Yue Wu∗
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Article history: Received 16 February 2022; Revised 19 July 2022; Accepted 24 August 2022; Available online 28 August 2022

Keywords: Network security; Intrusion detection system; Few-shot learning; Feature fusion; CNN; Deep learning

Abstract

Because traditional intrusion detection methods depend heavily on large, fully labeled datasets, existing works can hardly be applied in real-world scenarios, especially when facing zero-day attacks. In this paper we present a novel intrusion detection framework called “FS-IDS”, which comprises a flow data encoding method, a feature fusion mechanism, and an intrusion detection architecture based on few-shot learning. We use a task generator to split the dataset into separate tasks and train the model in an episodic way, so that the model learns general knowledge rather than knowledge specific to a single class. The extraction module and the distance metric module are responsible for learning and for determining whether traffic data are benign. We conduct three sets of experiments on “FS-IDS”: a comparison study, an ablation study, and a multiclass study. The comparison study first determines that the best metric for discrimination is Euclidean distance. Based on this optimal implementation, “FS-IDS” achieves performance comparable to existing works while using far fewer malicious samples. The ablation study sets up two base models to explore how the proposed encoding method and feature fusion mechanism improve detection capacity. Both the image representation and feature fusion yield more than 2% improvement in accuracy and recall. Finally, to test whether “FS-IDS” performs well in real-world scenarios, we design network traffic containing various attacks to simulate a complex malicious network environment. Experimental results show that “FS-IDS” maintains more than 90% detection accuracy and recall under the worst circumstances, which consist of various seen or unseen attacks with only a few malicious samples available.

© 2022 Elsevier Ltd. All rights reserved.

This work was supported in part by the National Key R&D Program of China under Grant no. 2020YFB1807504 and the National Science Foundation of China Key Project under Grant no. 61831007.
∗ Corresponding author.
https://doi.org/10.1016/j.cose.2022.102899
0167-4048/© 2022 Elsevier Ltd. All rights reserved.

1. Introduction

Nowadays, cyberspace has become the “fifth frontier” after the ocean, land, air, and space (Jiangxing et al., 2018). Along with the rapid growth of internet applications and network services, the hazard caused by cyberspace intrusions that target network vulnerabilities has become much more serious, especially 0-day attacks, which exploit security weaknesses that vendors or developers are unaware of. A report by MIT Technology Review (O’Neill, 2021) stated, based on data collected from multiple sources, that at least 66 zero-days had been found in use in 2021, almost double the number of such attacks recorded in the previous year. Defense scenarios that detect known or unknown intrusion actions in order to alert managers before damage is caused have sparked growing public concern. An Intrusion Detection System (IDS), as a classic network security protection application, seems to be a rational choice under such circumstances. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions. Earlier research on IDS commonly extracted knowledge from audit data, user profiles, or network traffic and formulated rules for benign or abnormal behaviors manually (Liao et al., 2013). As artificial intelligence has become a buzzword since 2014, applying deep learning to network intrusion detection or anomaly detection has become a promising field. Owing to the capacity of deep neural networks to learn high-level latent features from big data, they have replaced manual rules as powerful data analysis and decision-making tools (Andresini et al., 2021; Kim et al., 2018; Li et al., 2017; Malaiya et al., 2018; Pektaş and Acarman, 2019; Wang et al., 2018).

However, the use of neural networks in IDS encounters many limitations and challenges. The most crucial difficulty in applying IDS is the dependency of deep learning models on a large-scale and well-labeled dataset (Ahmim et al., 2019; Andresini et al., 2021; Faker and Dogdu, 2019; Injadat et al., 2021; Manimurugan et al., 2020; Min et al., 2018; Pektaş and Acarman, 2019; Resende and Drummond, 2018; de Souza et al., 2020; Zhang et al., 2019). When the number of training samples is insufficient, the model suffers from severe overfitting and performs poorly. Yet gathering and labeling data to generate a reliable dataset always incurs a huge cost. These problems become even more severe in the intrusion detection field. Nowadays, the network data generated in one day can be measured in terabytes. Unlike image or corpus labeling in other deep learning fields, labeling network traffic always relies on expert knowledge, which makes it impractical to analyze and label such a huge amount of data manually. Moreover, security experts cannot have adequate time and resources to collect, analyze, and label 0-day attack samples for an intrusion detection model. Even after security experts have prepared a sufficient and well-labeled dataset, the attack strategies it includes may become obsolete.

Another challenge of IDS is that a neural network has strict requirements for the value and shape of its input data, while network traffic tends to be diverse and heterogeneous. An informative representation of network traffic data can be a critical factor affecting a model’s detection performance. Although there are a few works on data representation of network flows for deep learning models (Kim et al., 2018; Li et al., 2017; Wang et al., 2018), previous methods process only the content of traffic data or only statistic features. We think these methods cannot reflect the features of network behaviors comprehensively, especially with limited resources in few-shot conditions. How network traffic can be represented appropriately for a neural network remains a key issue.

To mitigate the high dependency of deep learning on a high-quality dataset, as well as to improve the capacity of IDS to detect 0-day attacks, we proposed an intrusion detection framework based on few-shot learning. In the proposed framework, a feature extraction network was designed and trained according to a specific algorithm. The training and test processes were conducted on a specific task set to obtain generalizable prior knowledge from known attacks. The discrimination principle is that unseen malicious traffic can be distinguished by comparing similarity measures with regard to the “prototype” embedding generated by the trained feature extraction network. In consideration of the difficulty of data representation, we proposed a novel data encoding method to transform network traffic into image-format data for a convolutional neural network (CNN). We also used feature fusion to combine the embedding generated by the feature extraction network with compressed features from an autoencoder to form a deep representation of a network flow. We believe that by these means we can use as much information as possible from limited resources to detect novel attacks.

For intrusion detection based on few-shot learning, we find that the work most similar to ours is FC-Net, proposed in Xu et al. (2020), which follows the same discrimination principle. However, FC-Net extracts feature embeddings only from the traffic content and ignores statistic features, which misses information useful for detection. We not only used an improved network architecture and training algorithm to obtain a model with enhanced detection capacity, but also proposed methods of data encoding and feature fusion to better represent network flows for discrimination. By empowering the model with the capacity to learn latent discriminant patterns from only a few labeled samples, we provide a solution for intrusion detection systems under circumstances of deficient data samples and emerging 0-day attacks.

The main contributions of this paper are as follows:

1. We proposed an integrated intrusion detection framework named FS-IDS based on few-shot learning. FS-IDS achieves over 97% accuracy and 99% recall on detecting novel attacks with far fewer malicious samples than previous research. The results obtained by FS-IDS are also the state-of-the-art performance in intrusion detection based on few-shot learning.
2. We proposed a novel network traffic data encoding method and a feature fusion method in order to construct an informative representation for neural networks. Feature fusion combines the original byte content with the extracted features of a given network flow, instead of using only one of them as in former research. Ablation studies demonstrated that both the data encoding and the feature fusion of multidimensional information improve detection accuracy and recall in few-shot conditions.
3. Since in the real world an IDS may face much more complex and serious situations, we extended FS-IDS from the binary classification of former research to multi-class classification. Through experiments using blended traffic that simulates real-world conditions, we showed that FS-IDS can not only classify network behaviors as benign or malicious, but also tell which kind of attack strategy the attackers used, after training on only 5 attack samples.

2. Related works

2.1. Deep learning based intrusion detection

The recent rise of interest in the field of artificial intelligence has resulted in major advancements in, among others, applications to network intrusion detection mechanisms. With the dramatic increase of computing resources available to train a neural network, deep learning has become a common choice and its usage is no longer held back. Malaiya et al. (2018) attributed the failure of conventional shallow learning to identify anomalies in network traffic datasets to the very high degree of non-linearity of network traffic data. That work designed a set of deep learning models, including a fully connected neural network, a variational autoencoder, and a sequence-to-sequence structure, and showed the feasibility of deep learning with greater detection accuracy. The experimental results also showed that the sequence-to-sequence model consistently outperforms the others. That work, however, does not design or evaluate any model based on CNN. To employ a CNN structure in intrusion detection, Li et al. (2017) designed a data encoding module to convert various feature attributes into image form. They then used a visual conversion of the NSL-KDD format to evaluate the performance of CNN in intrusion detection. The proposed method performed one-hot encoding on symbolic features. Continuous features can be transformed into symbolic features by normalization and discretization. Experiments on the two NSL-KDD test datasets showed that CNN performs better than most standard classifiers, although it does not fully improve on the state of the art. Kim et al. (2018) introduced an improved encoding technique that enhances the identification of anomalous events using a CNN structure. The improved encoding method extended the previous “gray-scale”-like encoding into an RGB-like encoding, which allocates an equal number of pixels to individual features. Experimental results demonstrated its superiority over previous research.

The aforementioned studies were conducted on the KDD’99 or NSL-KDD datasets, which record only statistic features of network traffic. The majority of publicly available datasets that are commonly used in the network security literature disclose only network attributes and do not reveal the traffic data of the network flows they recorded. For example, the KDD’99 and NSL-KDD datasets contain 41 features extracted from data captured in the DARPA’98 IDS evaluation program. The features can be classified into three groups (Tavallaee et al., 2009):
1. Basic features: this category encapsulates all the attributes that can be extracted from a TCP/IP connection, e.g., duration of the connection, network service on the destination, type of protocol.
2. Traffic features: this category includes features that are computed with respect to a window interval, e.g., number of connections, and is divided into two groups according to the relationship between the recorded connections and the current connection:
   1. “same host” features: examine only the connections in the past 2 s that have the same destination host as the current connection, and calculate statistics related to protocol behavior, service, etc.
   2. “same service” features: examine only the connections in the past 2 s that have the same service as the current connection.

Whether because of objective limitations of the datasets or researchers’ subjective neglect, we find that researchers did not take the raw network traffic content into account. In recent years, some works have used traffic data rather than statistic features. Wang et al. (2018) proposed HAST-IDS, which combines the learning of low-level spatial features with high-level temporal features from network traffic using both CNN and a long short-term memory network (LSTM). The automatically learned traffic features effectively reduce the false alarm rate without any feature engineering techniques. This IDS achieved 99.89% accuracy and 96.96% recall on ISCX2012. A novel encoding method named “Flow-Image” representation, as well as “Segmented-CNN”, was presented in Millar et al. (2019). A network traffic flow was represented as a two-dimensional array in which each row represents a new packet in the flow and each column represents a new byte in the packet. The “Segmented-CNN” architecture aims to exploit the distinct properties of the header and payload in a TCP/IP packet. Experimental results indicated that the proposed model obtains a good balance between efficiency and performance.

2.2. Few-shot learning

In recent years, due to advancements in computing resources and large-scale datasets, artificial intelligence represented by deep learning has been applied in many fields as a highly intelligent tool. Although deep learning is in a prosperous stage, it still has some intrinsic defects. One of them is its incapacity to generalize from few data to perform a task. Whereas humans can rapidly generalize what they have learned to new task scenarios, a deep learning model must learn and make inferences on the basis of large amounts of data. To meet the need to learn from limited supervised information, a new machine learning problem called few-shot learning (FSL) has emerged (Wang and Yao, 2020).

FSL aims to recognise novel categories from far fewer labeled examples than traditional deep learning. The idea is to focus on learning a transferable embedding and to pre-define a fixed metric (e.g., Euclidean; Snell et al., 2017) for classification. The model performs non-parametric “learning” at the so-called “task” level by simply comparing validation points with training points and predicting the label of the matching training points. This was first achieved by the “Siamese Network” presented in Koch et al. (2015). A Siamese Network consists of twin networks which accept distinct inputs but are joined by an energy function (e.g., a loss function) at the top. They used the verification model to evaluate new images, exactly one per novel class, in a pairwise manner against the test images. In Vinyals et al. (2016a), researchers defined a few-shot learning framework named the “episode”-based training procedure, which has been inherited by subsequent research. Furthermore, they presented “Matching Network” and trained it in the proposed episode-based manner to perform rapid learning by showing only a few samples per class. Snell et al. (2017) improved upon Matching Network by using a neural network to learn a non-linear mapping of the input into an embedding space and taking a class’s prototype to be the mean vector. Classification is then performed for an embedded test sample by simply finding the nearest class prototype. They named their revised version “Prototypical Network”. In the 5-shot scenario, it achieves 98.8% and 68.2% accuracy on the Omniglot and miniImageNet datasets, respectively. On this basis, Sung et al. (2018) presented “Relation Net” and further improved few-shot classification performance. Relation Net inherited the representation learning network from Prototypical Network. However, it used a metric learning network, rather than simply calculating Euclidean distance, to learn a non-linear classifier. By introducing a new discriminant network, Relation Net improved accuracy by 1.2% on Omniglot.

The few-shot learning methodologies most related to ours are Prototypical Network and Relation Net. According to their inherent principles and architectures, we decompose the network into a representation learning module and a metric/distance learning module, then conduct comprehensive studies of their effects on intrusion detection performance. Based on this, we present two intrusion detection frameworks and perform a comparative study to obtain the best intrusion detection model under circumstances where researchers have only a few labeled samples of novel cyber-attacks.

2.3. Intrusion detection with insufficient labeled samples

While many researchers have noticed that the practicability of IDS in real-world settings is limited by its dependency on a sufficient labeled network traffic dataset, most of them turned to unsupervised scenarios. Unsupervised intrusion detection models assume that the overwhelming majority of network traffic data are normal instances. This hypothesis may incur a high false positive rate, since a small fluctuation in normal behavior can be regarded as an anomaly. In this field, the autoencoder is the fundamental deep architecture. An autoencoder represents data within multiple hidden layers by reconstructing the input data, effectively learning an identity function (Raghavendra Chalapathy, 2019). Zavrak and İskefiyeli (2020) adopted an autoencoder and a variational autoencoder and compared them with the OCSVM algorithm. Experimental results showed that the AUC value obtained by the variational autoencoder was 0.7596, which was better than that of the autoencoder and OCSVM, but it was not easy to determine an appropriate threshold that provides high detection accuracy or a low false alarm rate. Ieracitano et al. (2020) developed an intelligent IDS based on statistical analysis and an autoencoder. They combined data analysis and statistical techniques for feature extraction, then used an autoencoder to reduce the dimensions of the original input data. The compressed feature vector was used as the input of the final softmax layer for binary classification. The effectiveness of the proposed IDS was tested using the NSL-KDD dataset. An accuracy of 84.21% was achieved, which was superior to algorithms such as LSTM and MLP. Mirsky et al. (2018) presented an unsupervised plug-and-play IDS using an ensemble of autoencoders to collectively differentiate abnormal traffic patterns from normal ones. It extracted damped incremental statistics from input traffic and integrated hierarchical sets of autoencoders to detect anomalies. Experiments indicated that it can be deployed on a lightweight network device and obtained relatively better performance than Isolation Forest and GMM.

Since the few-shot learning problem is relatively new to intrusion detection, we find only the work of Xu et al. (2020) to be similar to ours. Xu et al. (2020) presented FC-Net, which is basically the same as Relation Net except for the convolution block, to determine whether the input sample is benign or malicious. FC-Net was also
trained in an episode-based manner, and its performance evaluated on CICIDS2017 reached 94.64% in the few-shot scenario. However, FC-Net used only packet content to construct input data in the form of 3D images for CNN. We find that such a method ignores the informative statistic features extracted from network traffic, which can also play a crucial part in intrusion detection. We filled the gap by introducing an autoencoder as a feature extractor into the model architecture and proved its effectiveness. Beyond that, we built a framework for solving the few-shot learning problem in intrusion detection and explored how different network architectures contribute to detecting malicious traffic.

3. Architecture of few-shot IDS

In this section, we elaborate the architecture of our proposed intrusion detection framework based on few-shot learning. To emphasize its high performance under few-shot conditions, we named it “FS-IDS”, an abbreviation of “few-shot Intrusion Detection System”. First we provide an overview of FS-IDS. Then we elucidate the functions and implementation details of each module in FS-IDS, following the sequence of network data processing. In the last subsection we introduce the training strategy of our framework. The architecture of FS-IDS is shown in Fig. 1.

Fig. 1. Overall structure of FS-IDS.

3.1. Overview

As shown in Fig. 1, the input network traffic data is first fed into the network traffic encoder module to be transformed into encoded vectors in specific formats for subsequent processing. The encoder module can be divided into two parts: a flow traffic encoder using the proposed “Grayscale Flow” method and a feature encoder based on an autoencoder. The flow traffic encoder is responsible for encoding the raw traffic data into a fixed-size matrix, while compressed features of the statistic characteristics are generated by the autoencoder.

When the encoding and transformation of the original dataset is completed, the task generator module splits the processed dataset into different tasks according to a specific algorithm. The purpose of few-shot learning is to make the model learn a general classification capacity, rather than class-specific knowledge, by switching between different tasks. The created task sets are then fed into the feature extraction network to learn generalizable feature maps in a latent embedding space. After that, we use a concatenation layer to implement feature fusion, which concatenates the two representation vectors along the length direction to compose the final feature maps of the network flow data. By combining the raw traffic data with the extracted network characteristics, we believe we can obtain robust, comprehensive representations of network traffic data.

Finally, the corresponding feature maps of each class are compared and distinguished by the distance metric module based on its pre-defined metric. Unseen malicious traffic can be identified by the closest “prototype” embedding generated by feature fusion.

3.2. Network traffic encoder

As mentioned earlier, former researchers proposed various encoding methods to apply machine learning or deep learning to IDS. However, the majority of them used incomplete information for network traffic representation, collecting only the raw network flow or only the statistic features while neglecting the other. Therefore, we propose a novel network traffic encoding method and use it to design a data encoder for network flow representation. Specifically, the proposed method represents the raw content of a network flow in the form of an image for CNN, named “Grayscale Flow”, and encodes network flow attributes using an autoencoder, so as to include comprehensive information about the network traffic. The pipeline of the proposed method is shown in Fig. 2.

Fig. 2. Pipeline of network traffic encoding method.

3.2.1. Image representation for network flow

On the one hand, CNN has been widely applied in computer vision tasks. CNN is a robust, popular type of neural network designed to process input data stored in arrays (Aggarwal, 2018). The common form of input image for CNN is a 2D array with a uniform shape. On the other hand, a flow of network traffic is defined as the data transmitted between two given communication nodes across the network over a specific period. The hierarchical structure of a network flow is illustrated in Fig. 3, where each network flow is composed of associated sequential packets. Each packet, no matter what network protocol it follows, includes one or more headers as well as a payload of a certain number of bytes. We notice that since a network flow consists of bytes, each byte is essentially equivalent to a pixel of an 8-bit grayscale image. It is therefore natural to transform network flow data into grayscale images to satisfy the input requirement of CNN, where each byte of the network flow represents a single pixel of the corresponding image-like output.

Fig. 3. Hierarchical structure of network traffic.

Normally, a network flow is regarded as a 1D byte array composed of the corresponding packets in chronological order. Xu et al. (2020) transformed a network flow into a 3D array, i.e., the format of a set of color images. Specifically, Xu et al. (2020) unfolded each packet into a single slice of a video stream and arranged the slices in the corresponding order. By that means, Xu et al. (2020) used a 3D convolutional network not only to extract spatial features but also to capture temporal relations between packets. However, we believe this method cannot make full use of the feature extraction capacity of CNN. The convolution layer, which is the core of CNN, uses a square receptive field called a convolution kernel that slides across the whole plane to extract spatial features from each pixel together with its neighbours. The convolution operation makes the discriminant results strongly dependent on local regions of the grid-structured inputs. Simply arranging each packet as a fixed-size matrix does not seem well adapted to the mechanism by which a convolution kernel recognizes and extracts features. Moreover, using image sets together with 3D convolution leads to a huge increase in computational complexity.
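As a minimal illustration of the byte-to-pixel correspondence noted earlier in this subsection, the sketch below treats a byte string as a row of 8-bit grayscale pixel intensities. The function names `bytes_to_pixels` and `normalize_pixels`, the sample bytes, and the scaling to [0, 1] are assumptions made for illustration only and are not taken from the paper.

```python
from typing import List


def bytes_to_pixels(raw: bytes) -> List[int]:
    """Interpret each byte as one 8-bit grayscale intensity (0-255)."""
    # Iterating over a bytes object yields integers in the range 0..255,
    # which is exactly the value range of an 8-bit grayscale pixel.
    return list(raw)


def normalize_pixels(pixels: List[int]) -> List[float]:
    """Scale intensities to [0, 1] (an assumed preprocessing step)."""
    return [p / 255.0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical bytes standing in for the start of a captured packet.
    sample = bytes([0x45, 0x00, 0x00, 0x3C, 0xFF])
    pixels = bytes_to_pixels(sample)
    print(pixels)
    print(normalize_pixels(pixels))
```

Plain Python lists are used here only to keep the sketch dependency-free; an actual pipeline would more likely hold these values in an array type suited to a CNN input.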
Instead of the previous methods, we propose a novel network flow encoding method that encodes a network flow into a grayscale image-like representation, called “Grayscale Flow”. Since the encoded data resemble grayscale images, we use a 1D byte array to represent a single packet and append the 1D arrays of packets belonging to the same flow in chronological order to construct a 2D representation. In other words, we arrange the content of each packet as one row of the output matrix, as shown in Fig. 4. In Fig. 4, we compare the form of “Grayscale Flow” with classic image data in MNIST. Owing to the weight sharing mechanism and the fixed size of the receptive field in a convolution layer, bringing the header and payload of each packet closer together inherently improves the spatial dependency of the encoded data. Because packets belonging to the same flow commonly have similar header structures and all 1D array representations of packets are aligned in row order, this arrangement places the same fields of adjacent packets in the neighbourhood of the corresponding pixels. This transformation introduces inter-row continuity among the 1D feature vectors by placing similar features in adjacent rows, which can help CNN capture the spatial features of a network flow.

Fig. 4. (a) Representation of MNIST image; (b) Representation of “Grayscale Flow”; (c) Sample of MNIST image; (d) Sample of “Grayscale Flow”.

Current implementations of CNN require fixed-size input data. Therefore, the shape of the transformed images of a network flow has to be determined a priori. Since integer powers of two can improve the computational efficiency of a GPU (Xu et al., 2020), we prefer to choose powers of two for the shape of the input data. For the number of packets, we choose the first 16 packets in chronological order to compose the representation of the flow. The packets initially transmitted in a network flow include the process of connection establishment between communication nodes, which normally carries the most valuable information for discrimination. Such packets may also contain parts of the content transmitted in subsequent communications. A common approach in intrusion detection is
Fig. 5. Structure of autoencoder in FS-IDS.

to choose the first 10 to 20 packets to represent the network flow (Millar et al., 2019; Wang et al., 2018; Xu et al., 2020). For the input size of each packet, we argue that a length of 256 bytes is appropriate, since it contains not only the whole header but also part of the payload of each packet. Packets shorter than 256 bytes are padded with zeros to satisfy the input requirement.

Overall, in this way we encode network flows with a variety of protocols, sizes and lengths into uniform grayscale image-like data, which take the form of 16 × 256 matrices.

3.2.2. Encoded vector for statistic features

In order to combine the raw traffic with network characteristics in latent feature space, we use an autoencoder to produce a robust high-level representation. As mentioned earlier, the autoencoder (Hinton and Salakhutdinov, 2006) and its variants are deep learning models that have been widely adopted in unsupervised intrusion detection. In general, an autoencoder consists of two separate networks, an encoder and a decoder, comprising an input layer, a hidden layer and an output layer. The encoder can be formulated as a mapping function $f(x)$ that encodes the input $x$ to its latent feature vector $x'$. The decoder generates the reconstructed input $\hat{x} = g(x')$. When the number of hidden-layer neurons is less than the number of input-layer and output-layer neurons, the encoder compresses the information of the original input $x$ into $x'$. We regard the compressed coding vector $x'$ as the latent feature vector that reflects the essential attributes of the network flow.

The architecture of the autoencoder used in FS-IDS is shown in Fig. 5. The autoencoder comprises five fully-connected (FC) layers and one dropout layer for preventing overfitting. The numbers “80”, “30”, “10” and “Features” under each block represent the input and output dimensions of data through the corresponding fully-connected layers. It is worth noting that the training procedure of the autoencoder follows the traditional way rather than the episodic way of few-shot learning. Because the purpose of the autoencoder is unsupervised feature extraction, not data discrimination, the model should be trained using as much data as possible in order to produce robust feature vectors that express attributes comprehensively. Using a specific training strategy for the autoencoder is beneficial for generating robust and representative features of network flow data.

3.3. Task generator

Conventional deep learning uses an end-to-end training strategy to build a robust classifier, i.e., defining a target function to be optimized with a large labeled dataset as the specific task for the model to complete. The dataset is always split into a training set and a test set. In notation, the traditional method deals with a dataset $D = \{D_{train}, D_{test}\}$ to minimize a pre-defined loss function $L$. The standard training procedure feeds batches of data into the model and performs gradient descent iteratively to approach a globally optimized solution. The final step is testing the trained model on the test set, which normally consists of unseen data, to evaluate its generalization ability. Sometimes fine-tuning with a validation set is also involved before the test procedure. In the network intrusion detection field, the task is normally framed as binary classification, i.e., training a binary classifier on a dataset composed of $I$ network flows $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_I, y_I)\}$, where $x$ denotes processed network flow data and $y \in \{0, 1\}$ denotes the label of the network flow. In general, 0 denotes benign traffic and 1 denotes malicious traffic. The model is trained to construct a nonlinear function $f(x) = y$ to discriminate whether the input network flow is malicious or not.

However, in the few-shot condition, simply following standard methods tends to lead to poor performance. It is generally believed that gradient-based optimization of a high-capacity classifier requires many iterative steps over large sets of data to perform well. In FSL only a limited number of labeled samples can be used for training, which can lead to severe overfitting. So in FSL, the whole dataset $D = \{D_{meta-train}, D_{meta-test}\}$ is divided into a meta-training set and a meta-testing set, and the training procedure is composed of many episodes. If the practical application scenario for the model is to classify instances from $N$ different classes by providing the classifier with $K$ examples for each class, we call it an N-way K-shot task. If the final mission for the classifier is an N-way K-shot task, two batches of N-way K-shot data are sampled in each episode during meta-training. One of them constitutes the “support set” $D_{support}$ and the other constitutes the “query set” $D_{query}$. Then the model is trained using data from the support set and tested on data from the query set. Each episode thus includes its own training set (support set) and test set (query set) as well as a complete standard pipeline of training and testing. So each episode aims to make the model learn to solve the specific task with only $N \times K$ samples.

In FSL, $K$ is much smaller than in standard deep learning and tends to be 1, 5, 10 and so on. Through the episode-based training procedure on multiple tasks with similar data composition, we hope the model does not focus on a single classification task, but learns meta-knowledge that is independent of the specific task and related to the general capacity of discrimination, so that it can still maintain good performance even when facing unseen data. The meta-test set is used to simulate the task that the model ultimately needs to handle. Normally it is to classify a sample that was never seen before using a few accessible samples. The structure


Table 1
Data composition of CICIDS2017.

Attack                      Num of samples
DoS Hulk                    231,073
DoS GoldenEye               10,293
DoS slowloris               5796
DoS Slowhttptest            5499
Heartbleed                  11
FTP-Patator                 7938
SSH-Patator                 5897
Web Attack-Brute Force      1507
Web Attack-XSS              652
Web Attack-Sql Injection    21
Bot                         1966
DDoS                        128,027
PortScan                    158,930
Infiltration                36

Fig. 6. Structure of convolutional block in feature extraction network.

of the meta-training set is derived from the structure of the meta-test set, so the meta-test set also contains a training set and a test set of $N \times K$ samples.

In the field of intrusion detection, the situation is slightly different. Commonly used few-shot learning datasets such as Omniglot (Lake et al., 2011) and miniImageNet (Vinyals et al., 2016b) consist of a large number of classes with a small number of samples belonging to each class. For example, Omniglot consists of 1623 handwritten characters collected from 50 alphabets, where each character is associated with 20 examples, each drawn by a different human subject. In contrast, network traffic datasets usually contain far fewer attack classes with a massive number of samples per class. As shown in Table 1, the intrusion detection dataset CICIDS2017 (Sharafaldin et al., 2018) covers 14 kinds of attacks, and some kinds of attacks, such as the DoS Hulk attack, record more than 100,000 network flows. To fill the gap between network traffic data and few-shot datasets in computer vision, we design a task generator to sample network flows and create a support set and a query set so as to construct a specific task for model training in each episode. The task generating algorithm is shown in Algorithm 1.

Algorithm 1: Generating algorithm of task generator.
Input: Meta-testing task: $T_{meta-test}$; number of iterations: $J$; dataset $D = \{(x_i, y_i)\}$; $D_c$ denotes the subset of $D$ satisfying $D_c = \{(x_i, y_i) \mid y_i = c\}$, $c = 0, 1, \ldots, C$, where 0 represents benign traffic and $\{1, 2, \ldots, C\}$ represents different kinds of attacks
Output: Meta-training task set: $\{T^{j}_{meta-train}\}_{j=1}^{J}$
 1  $N$ := number of classes in $T_{meta-test}$
 2  $K$ := number of samples of each class in $T_{meta-test}$
 3  for $j = 1$ to $J$ do
 4      $D^{j}_{support}, D^{j}_{query} := \emptyset$
 5      Randomly sample $\{c_1, c_2, \ldots, c_{N-1}\} \in \{1, 2, \ldots, C\}$ as training classes
 6      for $k = 1$ to $N - 1$ do
 7          Randomly sample $\{(x_i, y_i)\}_{i=1}^{2K}$ from $D_{c_k}$
 8          Support set $D^{j}_{support} := D^{j}_{support} \cup \{(x_i, y_i)\}_{i=1}^{K}$
 9          Query set $D^{j}_{query} := D^{j}_{query} \cup \{(x_i, y_i)\}_{i=K+1}^{2K}$
10      Randomly sample $\{(x_i, y_i)\}_{i=1}^{2K}$ from $D_0$
11      Support set $D^{j}_{support} := D^{j}_{support} \cup \{(x_i, y_i)\}_{i=1}^{K}$
12      Query set $D^{j}_{query} := D^{j}_{query} \cup \{(x_i, y_i)\}_{i=K+1}^{2K}$
13      $T^{j}_{meta-train} = \{D^{j}_{support}, D^{j}_{query}\}$

In each iteration, the task generator randomly samples the specified number of classes with corresponding samples, matching the $N$ and $K$ of the final mission $T_{meta-test}$. These processes are reflected by steps 5–9. However, unlike the simple random sampling in traditional FSL, the final mission for an IDS is to distinguish malicious traffic from benign traffic. So we add benign traffic to each task to simulate the real-world scenario in step 10. By this means the task generator compels the model to learn through different tasks with similar data composition, in the hope that it performs well in the meta-test phase. In steps 11–13, the task generator assembles the support set and query set as a specific task for model training and testing in FSL.

3.4. Feature extraction network

In FS-IDS, the feature embedding of the content of a network flow is built by the feature extraction network using a few-shot learning algorithm. Through episode-based training, the feature extraction network can learn prior knowledge from known attacks. The model obtains the capacity to extract high-level feature vectors that generalize in embedding space, which can then be used for discrimination with only a few available samples.

The feature extraction network is implemented as a multi-layer CNN, which is composed of four sequential convolutional blocks. The structure of each convolutional block is shown in Fig. 6. “Conv2d” denotes a 2-dimension convolutional layer with 3 × 3 kernel size, 4 channels and 1-dimension padding. “BN2d(4)” denotes a 4-channel batch normalization layer. “MP(2)” denotes a 2-dimension max-pooling layer with 2 × 2 kernel size.

3.5. Distance metric module

The kernel module of FS-IDS consists of two modules: the feature extraction network $f_\phi$ and the distance metric module $g_\psi$. Through the feature extraction network, samples of the support set $x_i$ and samples of the query set $x_j$ are mapped to their embedding features $f_\phi(x_i)$ and $f_\phi(x_j)$ respectively. The dual feature representations from traffic data and network characteristics (produced by the autoencoder as mentioned earlier) are integrated to construct embedding features containing multi-dimensional information. The feature fusion mechanism is implemented by a concatenation layer, which concatenates the two feature vectors in depth to construct integrated embedding features.

The distance metric module receives the embedding features of each class and compares them to determine whether they derive from the same class. Firstly, the distance metric module constructs a “prototype” representation based on the processed representations belonging to the same class in the support set. The prototype representation of each class is formulated as the mean vector of the samples in the support set:

$$p_k = \frac{1}{|K_c|} \sum_{(x_i, y_i) \in D_c} f_\phi(x_i) \qquad (1)$$
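As an illustration of Eq. (1), the following minimal Python sketch computes class prototypes as per-class mean vectors of fused embeddings. It is a sketch under stated assumptions rather than the authors' implementation: embeddings are plain lists of floats instead of CNN and autoencoder outputs, and the helper names `fuse` and `class_prototypes` are hypothetical.

```python
from collections import defaultdict


def fuse(flow_embedding, stat_embedding):
    """Feature fusion by concatenation (hypothetical helper).

    flow_embedding stands in for the CNN embedding of the raw flow,
    stat_embedding for the autoencoder vector of statistic features.
    """
    return list(flow_embedding) + list(stat_embedding)


def class_prototypes(support):
    """Return {label: mean embedding} for a support set, as in Eq. (1).

    support: iterable of (embedding, label) pairs with equal-length embeddings.
    """
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    prototypes = {}
    for label, vectors in grouped.items():
        count = len(vectors)
        # Average each coordinate over all support samples of this class.
        prototypes[label] = [sum(column) / count for column in zip(*vectors)]
    return prototypes


# Toy support set: two benign samples (label 0) and one attack sample (label 1).
support = [
    (fuse([1.0, 0.0], [2.0]), 0),
    (fuse([3.0, 0.0], [4.0]), 0),
    (fuse([0.0, 2.0], [0.0]), 1),
]
prototypes = class_prototypes(support)
```

In this toy case the benign prototype is the coordinate-wise mean of its two fused vectors, and the attack prototype equals its single support sample, which is what a 1-shot class looks like.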


Fig. 7. Structure of CNN in CNN-based distance metric module.
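The episode construction of Algorithm 1 in Section 3.3 can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the names `generate_task`, `generate_tasks` and `dataset_by_class` are hypothetical, label 0 is assumed to be benign traffic, and every class is assumed to hold at least 2K samples.

```python
import random


def generate_task(dataset_by_class, n_way, k_shot, rng=random):
    """Build one episode and return (support, query) lists of (sample, label).

    dataset_by_class: dict mapping label -> list of samples; label 0 is
    benign traffic and labels >= 1 are attack classes (an assumption of
    this sketch).
    """
    attack_classes = [c for c in dataset_by_class if c != 0]
    # N - 1 attack classes plus the benign class form an N-way task.
    chosen = rng.sample(attack_classes, n_way - 1)
    support, query = [], []
    for label in chosen + [0]:
        # Draw 2K samples without replacement: K for support, K for query.
        picked = rng.sample(dataset_by_class[label], 2 * k_shot)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


def generate_tasks(dataset_by_class, n_way, k_shot, num_tasks, seed=0):
    """Generate num_tasks meta-training episodes with a seeded generator."""
    rng = random.Random(seed)
    return [generate_task(dataset_by_class, n_way, k_shot, rng)
            for _ in range(num_tasks)]


# Toy data: integers stand in for encoded flows; class 0 is benign.
toy_data = {c: list(range(c * 100, c * 100 + 100)) for c in range(4)}
tasks = generate_tasks(toy_data, n_way=2, k_shot=5, num_tasks=3)
```

Sampling 2K items per class in one draw and splitting them keeps the support and query sets disjoint, which mirrors the index ranges 1..K and K+1..2K in the algorithm.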

Then the distance metric module calculates the distance metric in embedding space to recognise network data according to its nearest neighbour, i.e., it assigns the input data to the class of the nearest prototype. For the distance metric, we propose two implementations with reference to previous research in computer vision: a Euclidean-based distance metric and a CNN-based distance metric.

3.5.1. Euclidean-based distance metric

On the basis of the prototypical network (Snell et al., 2017), in the Euclidean-based distance metric module we adopt Euclidean distance as the metric to measure the distance from each query point to the prototypes calculated from the support points. Specifically, the distance metric module calculates the Euclidean distance between the received query point and the prototypes of all classes. Given the distance metrics, the module computes a distribution over classes for the query point $x_j$ based on a softmax over distances to the prototypes in the embedding space:

$$P(y = k \mid x) = \frac{\exp(-d(f_\phi(x_j), p_k))}{\sum_{k'} \exp(-d(f_\phi(x_j), p_{k'}))} \qquad (2)$$

By the softmax function, the module outputs a probability distribution of the received query samples over the different classes. The model assigns the query sample to the class with the largest probability, which corresponds to the nearest prototype in embedding space.

3.5.2. CNN based distance metric

On the basis of the relation net (Sung et al., 2018), the CNN-based distance metric replaces the simple linear metric with a neural network to learn a deep, non-linear metric. The architecture of the neural network in the CNN-based distance metric module is shown in Fig. 7. “conv1” denotes a 2-dimension convolution layer with 3 × 3 kernel size, 2 channels and a stride of 2. “MP(2)” denotes a 2-dimension max-pooling layer with 2 × 2 kernel size. “conv2” denotes a 2-dimension convolution layer with 2 × 2 kernel size and 1 channel.

As mentioned before, the concatenation layer concatenates $p_k$ and $f_\phi(x_j)$ in depth in order to construct a high-level vector $C(p_k, f_\phi(x_j))$ containing features derived from both the support points and the query point, where $C$ denotes the concatenation operation. Then the concatenated embedding vectors are fed into the distance learning CNN $g_\psi$ to obtain a scalar $r_{k,j}$ in the range [0, 1]. $r_{k,j}$ reflects the similarity between the query sample $f_\phi(x_j)$ and the corresponding class prototype $p_k$, which can be formulated as:

$$r_{k,j} = g_\psi(C(p_k, f_\phi(x_j))) \qquad (3)$$

Since the output score $r_{k,j}$ is between 0 and 1, the similarity can represent the probability that the embedding of the query point $f_\phi(x_j)$ belongs to class $k$. The module assigns $x_j$ to the class $k$ with the largest $r_{k,j}$.

3.6. Training strategy

The general pipeline by which the model is trained in an episode follows similar principles. Once the task generator constructs a specific task, i.e., a support set and a query set, the feature extraction network extracts their embedding features. The distance metric module calculates the prototype representation of each class from the support points, and identifies the categories of the query points based on its pre-defined metric. Finally, the loss function is computed from the classification results and optimized through back-propagation. However, the training strategies differ somewhat between the implementations of the distance metric module. Next we elaborate the training strategy of each implementation.
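The Euclidean-based classification step of Eq. (2) in Section 3.5.1 can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: embeddings and prototypes are plain lists of floats, the distance $d$ is taken to be the plain (non-squared) Euclidean distance, and the function names `euclidean`, `class_probabilities` and `predict` are hypothetical.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def class_probabilities(query_embedding, prototypes):
    """Softmax over negative distances to each prototype, as in Eq. (2).

    prototypes: dict mapping class label -> prototype vector.
    """
    neg_dist = {k: -euclidean(query_embedding, p) for k, p in prototypes.items()}
    # Subtracting the maximum leaves the softmax unchanged but avoids underflow.
    shift = max(neg_dist.values())
    exps = {k: math.exp(v - shift) for k, v in neg_dist.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def predict(query_embedding, prototypes):
    """Return the label of the most probable class, i.e. the nearest prototype."""
    probs = class_probabilities(query_embedding, prototypes)
    return max(probs, key=probs.get)


# Toy prototypes: class 0 (benign) at the origin, class 1 (attack) at distance 5.
toy_prototypes = {0: [0.0, 0.0], 1: [3.0, 4.0]}
toy_probs = class_probabilities([0.0, 0.0], toy_prototypes)
toy_label = predict([0.0, 0.0], toy_prototypes)
```

Because the softmax is monotone in the negative distance, picking the most probable class is the same as picking the nearest prototype, which is the decision rule stated in the text.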


3.6.1. Euclidean based distance metric

For the Euclidean-based distance metric module, since the model output is a probability distribution $P(y = k \mid x)$, we choose the negative log-probability as the loss function $L$:

$$L = -\log P(y = k \mid x) = d(f_\phi(x_j), p_k) + \log \sum_{k'} \exp(-d(f_\phi(x_j), p_{k'})) \qquad (4)$$

The learning procedure proceeds by minimizing $L$ via stochastic gradient descent.

3.6.2. CNN based distance metric

For the CNN-based distance metric, the situation is more complex since another neural network is introduced. The mean square error is used to regress the similarity $r_{k,j}$ to the ground truth: matched pairs tend to 1 and mismatched pairs tend to 0, which can be formulated as:

$$L = (r_{k,j} - \mathbf{1}(y_j == k))^2 \qquad (5)$$

FS-IDS with the CNN-based distance metric module can be seen as learning both a deep embedding and a complex non-linear metric. $L$ is also minimized via stochastic gradient descent, and the gradient is back-propagated through the feature extraction network together with the distance metric network to optimize their weights simultaneously. So the two modules are mutually tuned in an episodic way to support each other in few-shot learning.

4. Experiments and analysis

4.1. Dataset and evaluation metrics

In the evaluation experiments, we chose CICIDS2017 (Sharafaldin et al., 2018) as the benchmark dataset, which has been widely used in the network security field. We chose this dataset for four reasons:

1) Since FS-IDS utilizes both the original bit content transmitted through the network and the extracted attributes, the dataset must provide network traffic files. This requirement filters out many commonly used datasets such as NSL-KDD (Dhanabal and Shantharajah, 2015).
2) FSL is a kind of supervised learning strategy adapted to conditions where collecting sufficient samples may be impractical. So the chosen dataset must be labeled completely and correctly.
3) As previously mentioned, dynamically developing attack scenarios require a network intrusion dataset to be up to date. Datasets such as DARPA 1998 (Lippmann et al., 2000) and KDD’99 (Tavallaee et al., 2009) are too old and the cyber-attacks they include have become obsolete, which makes such datasets no longer applicable in the current network environment.
4) A reliable network intrusion dataset should support diverse protocols and network attacks. The more attack categories the dataset covers, the more convincing the results derived from it are.

Based on the above criteria, we chose CICIDS2017 as the benchmark dataset to test the performance of FS-IDS. CICIDS2017 consists of real-world traffic data captured over a week. It covers the most up-to-date common attacks, including Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS, as shown in Table 1. CICIDS2017 also provides results of network traffic analysis with manual labels. More than 80 features are extracted from the collected traffic. We used these extracted features as network characteristics complementing the encoded flow data.

Since intrusion detection can be regarded as a classification problem, the most widely used evaluation metrics to quantify the performance of a classification model are accuracy and recall. Accuracy is generally calculated as the ratio between the number of correct predictions and the total number of predictions, which measures the capability of the model to classify correctly. Recall measures the percentage of positives that the model classifies correctly. So recall is calculated as the ratio of the number of true positives to the total number of actual positives. In multiclass settings, we extended accuracy and recall to micro-accuracy and micro-recall, which are the mean values of accuracy and recall over all classes. They are also widely used in multiclass classification experiments to evaluate model performance.

4.2. Experiment settings

To gain a comprehensive understanding of the principle of FS-IDS, we design a series of experiments based on the following questions:

A: What is the best metric for the proposed few-shot intrusion detection model? Is it the non-linear distance measurement obtained by a CNN or the linear distance in Euclidean space?
B: Compared to existing research, how well does the optimal architecture of FS-IDS perform? Does it really work in few-shot conditions?
C: We believe that by network traffic encoding and feature fusion FS-IDS can take full advantage of limited resources in few-shot conditions. Do they really improve the performance of FS-IDS? How much do they contribute to intrusion detection?
D: Real-world conditions are always more complex, in that traffic data consists of various attacks including seen and unseen ones. An advanced IDS should not only tell whether traffic is benign or malicious, but also be able to identify which kind of attack it is. Can the proposed IDS be applied in such multiclass situations?

In order to answer all these questions comprehensively, we designed a series of experiments divided into a comparison study, an ablation study and a multiclass study. In the comparison study, we chose FTP-Patator, SSH-Patator, DoS (slowloris, Slowhttptest, Hulk, GoldenEye), PortScan and DDoS LOIT as attack classes to organize the training and testing process for comparison. This choice also took account of the data imbalance problem of CICIDS2017 shown in Table 1. All the kinds of attacks we selected had relatively sufficient samples. In each experiment, we selected two kinds of malicious samples as meta-test classes, and the other three attack classes were blended with benign samples to construct meta-training tasks. So each set of experiments consisted of $C_5^3 = 10$ parallel experiments. By that means we aimed to eliminate the random effects of any specific class. For each attack class, we randomly selected 5000 samples as well as benign samples to produce the dataset in each experiment. We repeated each experiment 50 times to mitigate the effect of sample randomness.

In order to respond to question A, we implemented two architectures of IDS using the CNN-based and the linear distance based metric module and compared their performance to obtain the optimal IDS architecture. After that, we made a comparison with related works on intrusion detection based on big data or few-shot learning to prove the effectiveness of FS-IDS, as well as respond to question B. Specifically, we compared FS-IDS with related research to demonstrate that FS-IDS can achieve even higher detection performance with far fewer malicious samples. Furthermore, we compared FS-IDS with FC-Net (Xu et al., 2020), which is the only intrusion detection model applied in few-shot conditions. We not only compared the performance of FS-IDS with the published results from Xu et al. (2020), but also implemented FC-Net and tested its performance on the same settings and data to ensure the fairness of the comparison.

To answer question C, we performed an ablation study exploring how “GrayScale Flow” encoding of network traffic and feature fusion with extracted attributes can help detect malicious traffic. We


replaced our encoding method with the 3D encoding used in FC-Net (Xu et al., 2020) to observe the detection results. Moreover, we removed the feature fusion to see whether the performance degraded. Through the ablation study, we can determine how much these factors help FS-IDS exceed other methods while using far fewer available samples. The data composition and parallel experiment settings remained the same as in the comparison study.

The final question is whether FS-IDS can be used in multiclass situations that are closer to real-world conditions. In the multiclass experiments, we used the same classes as in the comparison study to compose the training data. We chose Brute Force, XSS and SQL Injection as the few-shot test classes, simulating zero-day attacks where only a few malicious samples are accessible. To simulate real-world conditions, the model was trained using a mass of data belonging to known attacks in the training set and only a few new attack samples in the test set. The model was evaluated on blended test data covering all test classes to test its detection capacity on both seen and unseen attacks in multiclass settings. We randomly selected 2600 samples for the training set and 600 samples for the test set in each experiment.

All experiments were performed on two NVIDIA GeForce GTX 1080 Ti GPUs and an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz. FS-IDS is implemented on the software platforms PyTorch 1.3.1, CUDA 10.1.105 and cuDNN 7.

4.3. Comparison study

In order to make a comprehensive comparison between the linear distance module and the neural network distance module, we implemented the Euclidean distance metric and the CNN-based metric module described in Section 3 to determine which is the best architecture for measuring the distance between embedding features. The experimental results are shown in Fig. 8 and Table 2. In Fig. 8, the horizontal and vertical axes of each heatmap represent the two meta-test classes of our experimental settings, and the value in each block is the corresponding evaluation metric on the attack class specified by its X-coordinate. Fig. 8 depicts the specific performance on each task composed of different categories of network attacks. It can be seen clearly that the IDS based on the Euclidean distance metric module outperforms the CNN distance metric module throughout the whole task distribution. Moreover, Table 2 lists the average accuracy and recall on each kind of network attack for the different metric module architectures. In the following sections, we use the results obtained by the IDS based on the Euclidean distance metric module since it represents the best performance achieved by FS-IDS.

Table 2
Average accuracy and recall of Euclidean-based distance metric module and CNN-based distance metric module of FS-IDS on various attacks.

Attack types    Accuracy              Recall
                Euclidean   CNN       Euclidean   CNN
FTP-Patator     0.951       0.9157    0.9928      0.9402
SSH-Patator     0.996       0.9795    1           0.9963
PortScan        0.9968      0.9537    0.9983      0.985
DoS             0.9865      0.9022    0.9937      0.9527
DDoS            0.945       0.9103    0.9655      0.9137

Table 3
Comparison of results, number of samples and data sources of intrusion detection methods and related research works.

Columns: Size of dataset (Big Data, few-shot), Method, Num of samples, Data sources, Backbone network, Acc, Rec. The rotated table was flattened in extraction, so the values below are listed per column in extraction order and are not aligned by row.

Method: Spatial-Temporal Deep Learning Method (2018) (Pektaş and Acarman, 2019); GA-based Adaptive Method (2018) (Resende and Drummond, 2018); Multi-Stage Optimized ML-based IDS (2021) (Injadat et al., 2021); GBT-based Big Data Method (2019) (Faker and Dogdu, 2019); DT and Rule Based IDS (2019) (Ahmim et al., 2019); DBN-based IDS (2020) (Manimurugan et al., 2020); Deep Hierarchical IDS (2019) (Zhang et al., 2019); DNN-kNN IDS (2020) (de Souza et al., 2020); FC-Net (Xu et al., 2020) (2019, reproduced); CLAIRE (2021) (Andresini et al., 2021); FC-Net (Xu et al., 2020) (2019, paper); SU-IDS (2018) (Min et al., 2018); FS-IDS (ours)
Num of samples: 1,028,007; 2,830,540; 1,000,000; 176,947; 225,745; 760,056; 553,850; 40,000; 40,000; 30,000; 5; 5; 5
Data sources: Statistic Features + Raw Traffic (1 method); Statistic Features (9 methods); Raw Traffic (3 methods)
Backbone network: REP Tree + JRip + Random Forest; Genetic Algorithm + kNN; kNN + Random Forest; Gradient Boosted Tree; Deep Belief Network; Autoencoder + CNN; CNN + Autoencoder; CNN + LSTM; CNN + LSTM; Autoencoder; DNN + kNN; CNN; CNN
Acc: 0.9887; 0.9913; 0.9991; 0.9997; 0.9666; 0.9746; 0.9963; 0.9999; 0.9801; 0.9433; 0.9157; 0.9751; N/A
Rec: 0.9883; 0.9285; 0.9965; 0.9992; 0.9447; 0.9756; 0.9969; 0.9255; 0.9917; 0.983; 0.99; 0.99; N/A

After that, we compared FS-IDS with other existing research that used the same benchmark dataset CICIDS2017. The result is shown in Table 3. Among these IDS, FC-Net (Xu et al., 2020) is the only detection method adapted to the few-shot condition. Since the data and code of FC-Net are inaccessible, we reproduced FC-Net and tested its performance on the same data to ensure fairness. The results are denoted as “FC-Net(reproduced)” in Table 3. Meanwhile, we also present experimental results from

Fig. 8. Accuracy and recall of Euclidean-based distance metric module (left) and CNN-based distance metric module (right) FS-IDS on different tasks.

Xu et al. (2020), which we believe represent the best performance of FC-Net.

The first noteworthy observation from Table 3 is that the overwhelming majority of IDS research is based on “big data”. As far as we know, FC-Net is the first and only prior IDS model achieving intrusion detection in few-shot conditions. The dependency of most IDS research on big data is reflected in the third column of Table 3, which gives the number of training samples used by each IDS. Note that the number of labeled samples used by these methods ranges from tens of thousands to millions, which requires tremendous human effort to collect and label the data manually. The success of methods using big data has demonstrated the capacity of deep learning when big data is available. However, when a huge labeled dataset is not available, there is little that traditional methods can do.

As shown in Table 3, FS-IDS outperforms GA-based Adaptive Method (Resende and Drummond, 2018), DT and Rule Based IDS (Ahmim et al., 2019) and DBN-based IDS (Manimurugan et al., 2020) in both accuracy and recall. Although FS-IDS does not exceed SU-IDS (Min et al., 2018), Deep Hierarchical IDS (Zhang et al., 2019), DNN-kNN IDS (de Souza et al., 2020) and Multi-Stage Optimized ML-based IDS (Injadat et al., 2021), all of these works rely on a large-scale dataset. The numbers of samples used by SU-IDS (Min et al., 2018), Deep Hierarchical IDS (Zhang et al., 2019), DNN-kNN IDS (de Souza et al., 2020) and Multi-Stage Optimized ML-based IDS (Injadat et al., 2021) reach 40,000, 553,850, 225,745 and 2,830,540, respectively. With a slight decline of about 2% in accuracy and 0.6% in recall, FS-IDS requires far fewer training samples than these methods. It should be noted that the result obtained by FS-IDS is based on only 5 labeled samples used in the training process. For the rest of the related works (Andresini et al., 2021; Faker and Dogdu, 2019), FS-IDS outperforms them in terms of accuracy or recall.

The last three rows are works based on few-shot learning. FC-Net (Xu et al., 2020) obtained 94.33% accuracy and 99.17% recall according to the published results. When applied to the same data as our proposed FS-IDS, the performance of FC-Net degraded to 91.57% and 98.3% respectively, whereas FS-IDS obtained 97.51% accuracy and 99% recall. To present the comparison with FC-Net comprehensively, Table 4 provides the results for all of the chosen attack types in detail. As shown in Table 4, FS-IDS outperforms FC-Net in accuracy on all 5 meta-test attack classes, while recall is comparable: FS-IDS is higher on SSH-Patator and DDoS, and FC-Net is slightly higher on FTP-Patator, PortScan and DoS.

Table 4
Average accuracy and recall of FS-IDS and FC-Net(paper).

Attack types    Accuracy            Recall
                FS-IDS    FC-Net    FS-IDS    FC-Net
FTP-Patator     0.951     0.9454    0.9928    0.9956
SSH-Patator     0.996     0.9491    1         0.9992
PortScan        0.9968    0.9495    0.9983    0.9988
DoS             0.9865    0.9505    0.9937    0.9964
DDoS            0.945     0.9165    0.9655    0.9646

Based on these results, we demonstrate the superiority of FS-IDS and conclude as follows:

A: FS-IDS obtained a comparable, or even higher, detection accuracy and recall on novel attacks than previous works while using far fewer labeled samples: only 5 malicious samples.


B: FS-IDS achieved the state-of-the-art performance among intrusion detection methods based on few-shot learning as far as we know.

Another noteworthy fact is that the majority of existing works are based on statistic features extracted from network traffic, while only a few works (Xu et al., 2020; Zhang et al., 2019) perform detection by processing raw traffic content. We believe the reason why researchers prefer network features is the convenience of acquiring and processing them. The form of extracted features is more suitable for neural networks and easier to process than raw traffic data. As FS-IDS is the only work utilizing both network features and traffic in intrusion detection, the effect of its feature fusion will be elaborated in the following sections.

4.4. Ablation study

We performed ablation experiments to investigate the effect of the traffic encoding method and the introduction of statistic features on intrusion detection. In order to ensure the fairness of the comparison, we produced two baseline models named Model I and Model II to explore the benefits of the different factors, respectively. For Model I, we eliminated the feature vector produced by the autoencoder and kept the other experimental settings unchanged. With feature fusion removed, the IDS only learned to distinguish using the original traffic passing through networks. In order to estimate how much performance gain “GrayScale Flow” can lead to, we proposed a second base model named Model II. For Model II, we could not simply remove the transformed flow from the IDS, since the remaining parts would be a separately trained autoencoder and a linear unit to calculate distance, which would leave us no trainable model. So we equipped Model II with the same encoding method as Xu et al. (2020), which encodes a sequential network flow as a set of several color images. Besides, we also removed the feature vector in Model II. Table 5 lists the components integrated by the different base models.

Table 5
Components utilized by FS-IDS, Model I and Model II.

Model       Components
FS-IDS      “GrayScale Flow” Encoding Method + Feature Fusion
Model I     “GrayScale Flow” Encoding Method
Model II    Video-like 3D Encoding

Fig. 9 shows how the performance degraded when the feature vector was removed. From Fig. 9 we can see that removing different components made the detection performance decline to different degrees. The comparison of FS-IDS with Model I illustrates how much the introduction of the autoencoder with characteristic features benefited detection capacity. It can be seen that network features brought an average improvement of 2.8% in accuracy and 2.3% in recall. The performance gap related to different attack types in recall was slightly larger than that in accuracy. Feature fu-

ing method were highly similar, we can claim that by integrating multidimensional information from limited resources, both factors contribute to the high performance of FS-IDS in the few-shot scenario.

Table 6
Accuracy and recall of FS-IDS in 1, 3, and 5 shot.

Attack types    1 shot          3 shot          5 shot
                Acc     Rec     Acc     Rec     Acc     Rec
FTP-Patator     0.86    0.87    0.93    0.91    0.95    0.99
SSH-Patator     0.90    0.88    0.97    0.98    0.99    1
PortScan        0.90    0.86    0.95    0.93    0.99    0.99
DoS             0.85    0.84    0.95    0.94    0.98    0.99
DDoS            0.78    0.83    0.84    0.92    0.94    0.96

4.5. Multiclass classification

Fig. 10 illustrates how FS-IDS performed under multiclass classification on various malicious attacks. On selected training classes such as DoS and DDoS, FS-IDS obtained similar accuracy as in the binary classification settings. The recall under blended traffic conditions showed a slight decline. It is reasonable that more kinds of attacks make it more difficult for the model to detect each kind of malicious data correctly. For the three kinds of meta-test attacks, Web Brute Force attack, Web XSS attack and botnet, detection performance showed an obvious difference from the meta-training classes. For Brute Force, XSS and botnet, the accuracy was 92.4%, 92.6% and 90.6%, respectively. The recall also degraded to 95.5%, 93.8% and 93.8%. However, even with a certain degree of degradation, FS-IDS still maintained considerable performance. Notice that this result was obtained under the worst conditions, i.e., detecting unknown attacks in complex network traffic containing more than one kind of malicious data, with only a few labeled samples available. These results demonstrate the stable and superior performance of FS-IDS under circumstances with such huge disadvantages.

5. Discussions

5.1. Why few-shot learning

Few-shot learning is naturally a sub-area of deep learning, and the rise of FSL reflects the wish of researchers to free deep learning models from reliance on large-scale labeled datasets. The main application scenarios of FSL are tasks where supervised information is hard or impossible to acquire. It is through FSL that learning suitable models for these rare cases becomes possible (Wang and Yao, 2020). We found that in intrusion detection, the process of acquiring, analyzing and labeling malicious samples from network traffic is costly and overly reliant on manual annotation. We choose few-shot learning as our solution for intrusion detection owing to the few-shot model’s capacity to generalize quickly to new samples. We find such char-
sion brought maximum 5.3% accuracy improvement in PortScan, as acteristic of FSL helpful in intrusion detection, especially detecting
well as maximum 7.0% recall improvement in DoS attack. Due to zero-day attack. By utilizing FSL, security researchers can obtain a
that we can claim that the introduction of network features not discriminant model from much fewer supervised information than
only helped IDS distinguish benign traffic from malicious traffic previous researches, which makes IDS practical in detecting novel
more accurately, but also reduced the false alarm rate of IDS. Oth- attacks in time.
erwise, the increase of detection performance shows some variance We choose the number of training samples to be 5 because we
among different attack types. For example, accuracy and recall got need to keep experimental conditions the same as Xu et al. (2020).
2.3% and 6.9% improvement in DoS attack while the degrees be- Intrinsically, the performance of few-shot learning model will im-
came 3.3% and 0.7% in FTP Patator. The comparison between Model prove as the number of training samples increases. We detailed the
I with Model II demonstrates how proposed “GrayScale Flow” en- results of FS-IDS with the numbers of training samples to be 1, 3
coding method proceeded 3D encoding. Model I is 2.8% and 2.4% and 5 in Table 6. With the scale of training set changed, another
higher than Model II for accuracy and recall in average, respec- experimental settings remain the same in Section 3. It can be seen
tively. Since the increase brought by network feature and encod- that with more samples involved in training, the better detection

12
J. Yang, H. Li, S. Shao et al. Computers & Security 122 (2022) 102899

Fig. 9. Accuracy and recall of FS-IDS, Model I and Model II on various attacks.

performance the model obtained. Using only 1 available sample, which is also called one-shot learning, the detection accuracy and recall dropped to 0.85 on average.

Zero-shot learning is also a promising field in deep learning, which aims to train a model without any available samples. It sounds fascinating for intrusion detection, since it seems to solve the problem of zero-day attack detection once and for all. However, existing zero-shot learning relies on an external descriptive set to provide the features of different classes, including unseen classes (Sung et al., 2018). The creation of such a descriptive set is also impractical in intrusion detection. Overall, we chose few-shot learning as our solution for improving the practicability and zero-day attack detection capacity of IDS.

5.2. Limitations

This work achieves network intrusion detection from limited supervised information, in order to wean IDS from dependency on large-scale labeled datasets. However, the detection capacity of FS-IDS still derives from specific malicious samples. Ideally, we hope an IDS can detect unseen attacks even if security experts know nothing about them. Besides, in terms of detection accuracy and recall compared with supervised learning on large-scale datasets, FS-IDS still has room for further progress. We think the key issue for advancement is obtaining more representative features of network flow. Both image-like representations and extracted characteristics lose some specific information of network traffic. How to design a set of comprehensive and representative features for describing network behaviors accurately remains a problem for security researchers.

6. Conclusion

In view of the dependency of current research on a huge, fully-labeled dataset, we aimed to design an intrusion detection framework that maintains detection capacity when only a few malicious samples are available. To fill this gap, we proposed FS-IDS, a novel framework for intrusion detection systems based on few-shot learning. We first proposed a novel “GrayScale Flow” encoding method to transform the original traffic content into an image-format representation. As for the feature vectors from attribute analysis, we utilized an autoencoder to learn compressed features in an unsupervised way. The task generator was designed to split the traditional learning process into a set of tasks with separate data, so that the CNN learns prior knowledge from known attacks. We applied feature fusion to generate a comprehensive, robust representation of network flow from limited resources. In the test phase, integrated representations of unseen attacks were compared by the distance metric module to determine whether samples were benign or malicious.

To evaluate the performance of FS-IDS, we performed three sets of experiments: a comparison study, an ablation study and a multiclass study. The comparison study determined that the optimal distance metric is Euclidean distance. Moreover, the comparison of FS-IDS with related works demonstrated that FS-IDS not only achieved performance comparable with other methods relying on a fully-labeled large dataset, but also achieved state-of-the-art performance among intrusion detection methods in few-shot conditions. The ablation study proved the effectiveness of the “GrayScale Flow” encoding and the feature fusion mechanism. Each provided an average improvement of 2.8% in detection accuracy and of 2.3% to 2.4% in recall. Finally, to

test the practicability of FS-IDS, we simulated real-world conditions by testing the model on network traffic including various attacks. Results showed that FS-IDS achieved over 90% accuracy and recall on both seen and unseen attacks under the worst circumstances.

Fig. 10. Accuracy and recall of FS-IDS on various attacks under multiclass conditions.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Jingcheng Yang: Conceptualization, Methodology, Software, Writing – original draft. Hongwei Li: Investigation, Writing – review & editing. Shuo Shao: Writing – review & editing. Futai Zou: Data curation. Yue Wu: Supervision, Writing – review & editing.

References

Aggarwal, C.C., 2018. Neural Networks and Deep Learning - A Textbook. Springer.
Ahmim, A., Maglaras, L., Ferrag, M.A., Derdour, M., Janicke, H., 2019. A novel hierarchical intrusion detection system based on decision tree and rules-based models. In: 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), pp. 228–233. doi:10.1109/DCOSS.2019.00059.
Andresini, G., Appice, A., Malerba, D., 2021. Nearest cluster-based intrusion detection through convolutional neural networks. Knowledge-Based Syst. 216, 106798. doi:10.1016/j.knosys.2021.106798.
Dhanabal, L., Shantharajah, S., 2015. A study on NSL-KDD dataset for intrusion detection system based on classification algorithms.
Faker, O., Dogdu, E., 2019. Intrusion Detection Using Big Data and deep Learning Techniques. In: ACM SE ’19. Association for Computing Machinery, New York, NY, USA, pp. 86–93. doi:10.1145/3299815.3314439.
Hinton, G.E., Salakhutdinov, R.R., 2006. Reducing the dimensionality of data with neural networks. Science 313 (5786), 504–507. doi:10.1126/science.1127647.
Ieracitano, C., Adeel, A., Morabito, F.C., Hussain, A., 2020. A novel statistical analysis and autoencoder driven intelligent intrusion detection approach. Neurocomputing 387, 51–62. doi:10.1016/j.neucom.2019.11.016.
Injadat, M., Moubayed, A., Nassif, A.B., Shami, A., 2021. Multi-stage optimized machine learning framework for network intrusion detection. IEEE Trans. Netw. Serv. Manag. 18 (2), 1803–1816. doi:10.1109/TNSM.2020.3014929.
Jiangxing, W., Jianhua, L., Xinsheng, J., 2018. Security for cyberspace: challenges and opportunities. Front. Inf. Technol. Electron. Eng. 19 (12), 1459–1461. doi:10.1631/FITEE.1840000.
Kim, T., Suh, S.C., Kim, H., Kim, J., Kim, J., 2018. An encoding technique for CNN-based network anomaly detection. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 2960–2965. doi:10.1109/BigData.2018.8622568.
Koch, G., Zemel, R., Salakhutdinov, R., 2015. Siamese neural networks for one-shot image recognition. ICML’15.
Lake, B., Salakhutdinov, R., Gross, J., Tenenbaum, J.B., 2011. One shot learning of simple visual concepts. In: Proceedings of the 33rd Annual Conference of the Cognitive Science Society.
Li, Z., Qin, Z., Huang, K., Yang, X., Ye, S., 2017. Intrusion detection using convolutional neural networks for representation learning. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S.M. (Eds.), Neural Information Processing. Springer International Publishing, Cham, pp. 858–866.
Liao, H.J., Richard Lin, C.H., Lin, Y.C., Tung, K.Y., 2013. Intrusion detection system: a comprehensive review. J. Netw. Comput. Appl. 36 (1), 16–24. doi:10.1016/j.jnca.2012.09.004.
Lippmann, R.P., Fried, D.J., Graf, I., Haines, J.W., Kendall, K.R., McClung, D., Weber, D., Webster, S.E., Wyschogrod, D., Cunningham, R.K., Zissman, M.A., 2000. Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation. In: Proceedings DARPA Information Survivability Conference and Exposition. DISCEX’00, vol. 2, pp. 12–26. doi:10.1109/DISCEX.2000.821506.
Malaiya, R.K., Kwon, D., Kim, J., Suh, S.C., Kim, H., Kim, I., 2018. An empirical evaluation of deep learning for network anomaly detection. In: 2018 International Conference on Computing, Networking and Communications (ICNC), pp. 893–898. doi:10.1109/ICCNC.2018.8390278.
Manimurugan, S., Al-Mutairi, S., Aborokbah, M.M., Chilamkurti, N., Ganesan, S., Patan, R., 2020. Effective attack detection in internet of medical things smart environment using a deep belief neural network. IEEE Access 8, 77396–77404. doi:10.1109/ACCESS.2020.2986013.
Millar, K., Cheng, A., Chew, H.G., Lim, C.C., 2019. Using convolutional neural networks for classifying malicious network traffic. In: Alazab, M., Tang, M. (Eds.), Deep Learning Applications for Cyber Security. Springer International Publishing, Cham, pp. 103–126.
Min, E., Long, J., Liu, Q., Cui, J., Cai, Z., Ma, J., 2018. SU-IDS: a semi-supervised and unsupervised framework for network intrusion detection. In: Sun, X., Pan, Z., Bertino, E. (Eds.), Cloud Computing and Security. Springer International Publishing, Cham, pp. 322–334.
Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A., 2018. Kitsune: an ensemble of autoencoders for online network intrusion detection. 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18–21, 2018. The Internet Society.
O’Neill, P.H., 2021. 2021 has broken the record for zero-day hacking attacks. MIT Technol. Rev. September 23, 2021. https://www.technologyreview.com/2021/09/23/1036140/2021-record-zero-day-hacks-reasons/
Pektaş, A., Acarman, T., 2019. A deep learning method to detect network intrusion through flow based features. Int. J. Netw. Manag. 29 (3). doi:10.1002/nem.2050.
Chalapathy, R., Chawla, S., 2019. Deep learning for anomaly detection: a survey.
Resende, P.A.A., Drummond, A.C., 2018. Adaptive anomaly-based intrusion detection system using genetic algorithm and profiling. Secur. Privacy 1 (4), e36. doi:10.1002/spy2.36.
Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A., 2018. Toward generating a new intrusion detection dataset and intrusion traffic characterization. pp. 108–116. doi:10.5220/0006639801080116.

Snell, J., Swersky, K., Zemel, R., 2017. Prototypical Networks for Few-Shot Learning. In: NIPS’17. Curran Associates Inc., Red Hook, NY, USA, pp. 4080–4090.
de Souza, C.A., Westphall, C.B., Machado, R.B., Sobral, J.B.M., dos Santos Vieira, G., 2020. Hybrid approach to intrusion detection in fog-based IoT environments. Comput. Netw. 180, 107417. doi:10.1016/j.comnet.2020.107417.
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H.S., Hospedales, T.M., 2018. Learning to compare: relation network for few-shot learning. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1199–1208. doi:10.1109/CVPR.2018.00131.
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A., 2009. A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6. doi:10.1109/CISDA.2009.5356528.
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D., 2016a. Matching Networks for one Shot Learning. In: NIPS’16. Curran Associates Inc., Red Hook, NY, USA, pp. 3637–3645.
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D., 2016b. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. In: NIPS’16. Curran Associates Inc., Red Hook, NY, USA, pp. 3637–3645.
Wang, W., Sheng, Y., Wang, J., Zeng, X., Ye, X., Huang, Y., Zhu, M., 2018. HAST-IDS: learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection. IEEE Access 6, 1792–1806. doi:10.1109/ACCESS.2017.2780250.
Wang, Y., Yao, Q., 2020. Few-shot learning: a survey.
Xu, C., Shen, J., Du, X., 2020. A method of few-shot network intrusion detection based on meta-learning framework. IEEE Trans. Inf. Forensics Secur. 15, 3540–3552. doi:10.1109/TIFS.2020.2991876.
Zavrak, S., İskefiyeli, M., 2020. Anomaly-based intrusion detection from network flow features using variational autoencoder. IEEE Access 8, 108346–108358. doi:10.1109/ACCESS.2020.3001350.
Zhang, Y., Chen, X., Jin, L., Wang, X., Guo, D., 2019. Network intrusion detection: based on deep hierarchical network and original flow data. IEEE Access 7, 37004–37016. doi:10.1109/ACCESS.2019.2905041.

Jingcheng Yang received the B.S. degree in information science from Southeast University, China, in 2014 and the M.S. degree in cyber science and engineering from Shanghai Jiao Tong University, China, in 2018. He is currently pursuing the Ph.D. degree in cyber science and engineering in Shanghai Jiao Tong University. His research interests include artificial intelligence, data privacy and intrusion detection.

Hongwei Li was born in 1998. He received the B.S. degree in information engineering from Shanghai Jiao Tong University, Shanghai, in 2020. He is currently pursuing the M.S. degree in information engineering in Shanghai Jiao Tong University. His research interests include artificial intelligence and vulnerability detection and exploitation.

Futai Zou is currently an Associate Professor in School of Cyber Science and Engineering, Shanghai Jiao Tong University, China. He received the Ph.D. degree in computer science from Shanghai Jiao Tong University in 2005. His current research interests mainly focus on network attack and defense technology.

Shuo Shao (Member, IEEE) received the B.S. degree in information science from Southeast University, China, in 2011, the M.A.Sc. degree in electrical and computer engineering from McMaster University, Canada, in 2013, and the Ph.D. degree from Texas A&M University, USA, in 2017. In 2017, he joined the School of Electronics, Information and Electrical Engineering, Shanghai Jiao Tong University, China. His research interests include network information theory, algebraic code, and machine learning.

Yue Wu received the B.S. degree from Dept. of Information and Electronics, Zhejiang University, Hangzhou, China in 1989, M.S. and Ph.D. degree from Dept. of Radio Engineering, Southeast University, Nanjing, China in 1998 and 2004 respectively. He is currently a Professor with School of Electronic Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai, China. His research interests include vehicular networks, wireless network security, security and trust for IoT. He is a member of IEEE and IEEE Communications and Information Security Technical Committee.

