

Meta-Path Based Attentional Graph Learning Model for Vulnerability Detection

Xin-Cheng Wen†, Cuiyun Gao∗†‡, Jiaxin Ye§, Yichen Li¶, Zhihong Tian∥, Yan Jia†, and Xuan Wang†
† Harbin Institute of Technology, Shenzhen, China
‡ Peng Cheng Laboratory, Shenzhen, China
§ Fudan University, Shanghai, China
¶ The Chinese University of Hong Kong, Hong Kong, China
∥ Guangzhou University, Guangzhou, China
∗ Corresponding author.

arXiv:2212.14274v1 [cs.SE] 29 Dec 2022

Abstract—In recent years, deep learning (DL)-based methods have been widely used in code vulnerability detection. The DL-based
methods typically extract structural information from source code, e.g., code structure graph, and adopt neural networks such as Graph
Neural Networks (GNNs) to learn the graph representations. However, these methods do not consider the heterogeneous relations in
the code structure graph, i.e., different types of nodes and edges, which may obstruct the graph representation learning. Besides,
these methods are limited in capturing long-range dependencies due to the deep levels in the code structure graph. In this paper, we
propose a Meta-path based Attentional Graph learning model for code vulNErability deTection, called MAGNET. MAGNET constructs
a multi-granularity meta-path graph for each code snippet, in which the heterogeneous relations are denoted as meta paths to
represent the structural information. A meta-path based hierarchical attentional graph neural network is also proposed to capture the
relations between distant nodes in the graph. We evaluate MAGNET on three public datasets and the results show that MAGNET
outperforms the best baseline method in terms of F1 score by 6.32%, 21.50%, and 25.40%, respectively. MAGNET also achieves the
best performance among all the baseline methods in detecting Top-25 most dangerous Common Weakness Enumerations (CWEs),
further demonstrating its effectiveness in vulnerability detection.

Index Terms—Software Vulnerability; Deep Learning; Graph Neural Network

1 INTRODUCTION

Software vulnerabilities are generally specific flaws or oversights in pieces of software that allow attackers to disrupt or damage a computer system or program [1], leading to security risks [2]–[4] such as system crashes and data leakage. The ever-growing number of software vulnerabilities poses a threat to public security. For instance, Bugcrowd [5], a crowdsourced security platform, reported a 185% increase in the number of high-risk vulnerabilities in 2021 compared to the previous year. In December 2021, only 11 days after the Apache Log4j2 remote code execution vulnerability was disclosed, attackers exploited the vulnerability to attack Belgian network systems, causing system outages [6]. Thus, software vulnerability detection is critical for improving the security of society.

To accurately detect software vulnerabilities, various vulnerability detection methods based on deep learning (DL) techniques [7]–[9], which aim at learning the patterns of vulnerable code, have been proposed in recent years. They generally process the source code as token sequences [10]–[13] or code structure graphs [14]–[16]. For example, VulDeePecker [17] represents the source code as a sequence of tokens and uses a bidirectional Long Short-Term Memory (LSTM) model for vulnerability detection. Recent studies [18]–[20] demonstrate that the structure graph plays a nonnegligible role in capturing vulnerable code patterns. Reveal [18] leverages the code property graph (CPG) [21] obtained by parsing source code and adopts a Gated Graph Neural Network (GGNN) to build a vulnerability detection model. Devign [14] proposes a joint graph which incorporates four types of edges (i.e., Abstract Syntax Tree (AST) [22], Control Flow Graph (CFG) [23], Data Flow Graph (DFG) [24] and Natural Code Sequence (NCS) [25]). To learn the structural information, existing studies adopt Graph Neural Networks (GNNs), such as the GGNN and the Graph Convolution Network (GCN), achieving state-of-the-art performance in vulnerability detection. These GNNs aggregate nodes [26] based on parent-child relations [27], [28], which is beneficial for capturing adjacent-level information [29] from the source code. Despite the promising performance of the existing GNN-based methods, they still have the following limitations:

(1) The heterogeneous relations in the code structure graph are ignored. Previous studies [14] generally focus on employing node values in the code structure graph for learning the graph representations. Recent state-of-the-art models have considered the node types [18] or edge types [30] in the code structure graph, demonstrating the importance of the structural information for vulnerability detection. However, these studies do not jointly consider different types of nodes and edges, i.e., the heterogeneous relations, which are helpful for capturing the patterns of vulnerable code. For example, as shown in Fig. 1b, nodes A and B have the same value but different node types (i.e., ExpressionStatement and AssignmentExpression, respectively). Besides, nodes in the graph are connected by different edge types (i.e., AST and CFG, respectively). The heterogeneous relations can enrich the representations of nodes, and thereby are beneficial for vulnerability detection.
(2) The long-range dependencies in the graph are still hard to capture. Most DL-based approaches, including the state-of-the-art ones [18], [19], use GNNs [26], [27] for code vulnerability detection. However, it is well known that GNNs are limited in handling relationships between distant nodes [31], [32], since GNNs mainly use neighborhood aggregation for message passing. Due to the large number of nodes and deep levels in AST-based graphs [33], the current approaches still face the challenge of learning long-range dependencies in the structure graph when directly adopting GNNs for vulnerability detection [34], [35].

To alleviate the above limitations, in this paper we present MAGNET, a Meta-path based Attentional Graph learning model for code vulNErability deTection. Specifically, MAGNET involves two main components:

(1) Multi-granularity meta-path graph construction. To exploit the heterogeneous relations in the code structure graph for vulnerability detection, we design a meta-path graph which jointly involves node types and edge types. Each meta path in the graph indicates a heterogeneous relation, denoted as a triplet, e.g., (ExpressionStatement, AST, AssignmentExpression) for the relation between nodes A and B in Fig. 1b. Considering the diversity of node types, e.g., there exist 69 node types in the code structure graph of Reveal [18], the number of heterogeneous relations tends to increase exponentially, which would result in underfitting for the GNN models [36]. To mitigate the issue, we propose to group the node types into different granularities, including 'Statement', 'Expression', and 'Symbol', thereby reducing the complexity of node types. Specifically, we construct a multi-granularity meta-path graph for facilitating vulnerability detection.

(2) Meta-path based hierarchical attentional graph neural network. To learn the representations of the meta-path graph, we propose a meta-path based hierarchical attentional graph neural network, called MHAGNN. First, a meta-path attention mechanism is proposed to learn the representation of each meta path, i.e., the local dependency, by endowing nodes and edges with different attention weights. Then, to capture the long-range dependency in the meta-path graph, we propose a multi-granularity attention mechanism, which captures the importance of heterogeneous relations at different granularities for the final graph representation.

We evaluate MAGNET on three widely studied benchmark datasets in software vulnerability detection: FFMPeg+Qemu [14], Reveal [18], and Fan et al. [37]. We compare with six state-of-the-art software vulnerability detection methods. The experimental results show that the proposed approach outperforms the state-of-the-art baselines. Specifically, MAGNET achieves 6.32%, 21.50% and 25.40% improvement over the best baseline on the F1 score metric on the three datasets, respectively. In real-world scenarios, MAGNET detects 27.78% more vulnerabilities than the best baseline method.

In summary, our major contributions in this paper are as follows:

1) We propose a novel approach MAGNET, a meta-path based attentional graph learning model for vulnerability detection. MAGNET captures heterogeneous relations in the code structure graph by constructing a multi-granularity meta-path graph.
2) We propose a meta-path based hierarchical attentional graph neural network, called MHAGNN. It can learn the representation of each meta path and capture the long-range dependency in the meta-path graph.

The rest of this paper is organized as follows. Section 2 describes the background. Section 3 details the two components of the proposed model MAGNET, including the multi-granularity meta-path graph construction and the meta-path based hierarchical attentional graph neural network. Section 4 describes the evaluation setup, including the datasets, baselines, implementation and metrics. Section 5 presents the experimental results. Section 6 discusses a case study and threats to validity. Section 7 reviews related work. Section 8 concludes the paper.

2 BACKGROUND

2.1 Code Structure Graph

Code structure graphs are widely used in code vulnerability detection. Devign [14] uses a code structure graph which shares the same node set with the AST and merges the edge sets of the AST, CFG, DFG, and natural code sequence (NCS). Fig. 1 shows a code snippet of CWE-476 [38] and the corresponding code structure graph. As shown in Fig. 1b, besides different edge types, each input node is represented with two attributes: Value (described in the first line) and Type (described in the second line).

Although the code structure graph contains rich syntactic information of the code, the existing DL-based vulnerability detection methods are limited in exploiting the information. The methods focus on combining node types [19] or edge types [14], while ignoring the heterogeneous relations, i.e., jointly considering different types of nodes and edges. In this paper, we aim at exploiting the heterogeneous relations for vulnerability detection by defining meta paths.
2.2 Graph Neural Networks

Graph Neural Networks (GNNs) have been widely used in software engineering tasks such as code classification [39] and code clone detection [40]–[42], owing to their inherent ability to capture the structural information of source code [18]. GNNs generally aggregate neighbouring nodes' information and use message passing to propagate information from the current node to other nodes, finally forming a graph representation. A minimal sketch of this aggregation scheme is shown below.
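To make this concrete, the following minimal sketch (our own illustration, not code from any of the cited systems) performs one round of mean-aggregation message passing over a toy adjacency list; practical GNN layers add trainable weights and nonlinearities on top of this pattern:

```python
# Minimal sketch (ours) of one round of GNN message passing: each node
# averages its neighbors' feature vectors and combines the result with
# its own state.
from typing import Dict, List

def message_passing_round(features: Dict[int, List[float]],
                          neighbors: Dict[int, List[int]]) -> Dict[int, List[float]]:
    updated = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Aggregate: element-wise mean over neighbor features.
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs)
                   for i in range(len(feat))]
        else:
            agg = [0.0] * len(feat)
        # Combine: here, simply average the node's own state with the message.
        updated[node] = [(s + m) / 2.0 for s, m in zip(feat, agg)]
    return updated

# Toy chain graph 0 -- 1 -- 2: after one round, node 1 mixes in 0's and 2's features.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
print(message_passing_round(feats, adj))
```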
Several GNN-based methods have been proposed for vulnerability detection. For instance, Devign [14] learns the code structure information by adopting a Gated Graph Neural Network (GGNN) to process graphs with multiple edge types. Reveal [18] aims at offering better separability between vulnerable and non-vulnerable samples by using a GGNN and a multi-layer perceptron (MLP). IVDetect [19] uses a feature-attention GCN model to learn the source code representation, achieving state-of-the-art performance. Despite the good performance, the existing methods still have difficulty learning long-range dependencies.
Fig. 1a source code snippet (CWE-476):

    void host_lookup(char *user_supplied_addr)
    {
        struct hostent *hp;
        in_addr_t *addr;
        char hostname[64];
        in_addr_t inet_addr(const char *cp);
        validate_addr_form(user_supplied_addr);
        addr = inet_addr(user_supplied_addr);
        hp = gethostbyaddr(addr, sizeof(struct in_addr), AF_INET);
        strcpy(hostname, hp->h_name);
    }

[Fig. 1b shows the partial code structure graph of the highlighted statements: node A 'addr = inet_addr( user_supplied_addr )' (ExpressionStatement), node B (AssignmentExpression), node C 'inet_addr( user_supplied_addr )' (CallExpression), node D 'addr' (Identifier), node E 'inet_addr' (Callee), node F 'inet_addr' (Identifier), node G 'user_supplied_addr' (ArgumentList), and node H 'hp = gethostbyaddr{...}' (ExpressionStatement), connected by AST, CFG, DFG and NCS edges.]

Fig. 1: (a) is a source code snippet of CWE-476. (b) is a visualisation of the code structure graph of the statements highlighted in the red box in (a). Each node is represented with two attributes: Value (described in the first line) and Type (described in the second line). The nodes with different shades indicate different node types.

As prior studies [31], [32] demonstrated, GNNs can only learn information from neighboring nodes. According to Wang et al. [29], the code structure graph has deep levels and a large number of nodes, which makes it challenging to learn long-range dependencies. In this paper, we propose a multi-granularity attention mechanism for learning the long-range dependencies.

3 PROPOSED MODEL

In this section, we introduce the overall architecture of MAGNET. As shown in Fig. 2, the architecture includes two main components: (1) multi-granularity meta-path graph construction, aiming at constructing heterogeneous relations as meta paths; and (2) the meta-path based hierarchical attentional graph neural network, aiming at learning the representations of the meta-path graph.

3.1 Multi-granularity Meta-path Graph Construction

In this section, we first illustrate how to group the node types into multiple granularities, and then describe how we construct the meta-path graph.

3.1.1 Node Type Grouping

Directly employing the node types provided by the parsing principles [21] would lead to an increasingly large number of heterogeneous relations. For example, following Reveal [18], each dataset can be parsed into 69 node types, resulting in more than 10,000¹ heterogeneous relations. Prior research [36] demonstrates that GNN models tend to underfit on complex heterogeneous relations. To mitigate the issue, we propose to group the node types into three different granularities, including "Statement", "Expression" and "Symbol".

¹ It contains 69 unique node types and 4 edge types; the number of heterogeneous relation types is calculated as 69² × 4 = 19,044.

Specifically, according to the code parsing principles [21], we group all node types into the following three categories: (1) nodes at "Statement" granularity: the node represents a whole sentence in a code snippet, e.g., nodes A and H in Fig. 1b; (2) nodes at "Expression" granularity: the node consists of two or more operators/operands [43], e.g., the brown-shaded nodes B and C in Fig. 1b; (3) nodes at "Symbol" granularity: the remaining nodes are categorized as "Symbol" nodes for simplicity, e.g., nodes D, E, F and G in Fig. 1b.

The node types at each granularity are illustrated in Table 1, from the coarse-grained Statement category to the fine-grained Symbol category. The granularity-related categories reflect the structural information of the node value and can facilitate the subsequent DL-based learning process. A small sketch of this grouping is given below.
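For illustration, the grouping can be expressed as a small helper. This is our own sketch, not the released implementation; the suffix test happens to reproduce the three categories exactly as listed in Table 1:

```python
# Sketch (ours) of the node type grouping: map each of the parser's node
# types to one of the three granularities. The suffix rule below reproduces
# the categories exactly as listed in Table 1.
def granularity(node_type: str) -> str:
    if node_type.endswith("Statement"):
        return "Statement"
    if node_type.endswith("Expression"):
        return "Expression"
    return "Symbol"  # remaining types, e.g., Identifier, Callee, ArgumentList

assert granularity("ExpressionStatement") == "Statement"    # node A in Fig. 1b
assert granularity("AssignmentExpression") == "Expression"  # node B in Fig. 1b
assert granularity("Identifier") == "Symbol"                # node D in Fig. 1b
```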
3.1.2 Meta Path Construction

The code structure graph G is a directed graph, indicated as G(V, E, A, R), where V, E, A, and R represent the node set, edge set, node type set, and edge type set, respectively. Each node v ∈ V has its associated type given by the mapping function τ(v): V → A. Each edge e ∈ E is associated with a type given by the mapping function ψ(e): E → R. An edge e = (s, t) denotes the path linked from source node s ∈ V to target node t ∈ V. To capture the structural information of heterogeneous relations between nodes at different granularities, we propose to build meta paths. Based on the t_n grouped node types (t_n = 3, i.e., Statement, Expression and Symbol) and the t_e contained edge types (t_e = 4, i.e., AST, CFG, DFG and NCS), we define a meta path as below.

Definition 1 (meta path). A meta path on the code structure graph is denoted as a triplet (τ(s), ψ(e), τ(t)), indicating a heterogeneous relation from source node s to a target node t with a connection edge e. τ(·) denotes the type category of the corresponding node, and ψ(e) means the type of the edge e.
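As an illustration of Definition 1, the following sketch (ours) derives the meta-path triplet of every typed edge from the granularity of its endpoints; parallel edges between the same pair of nodes yield multiple meta paths:

```python
# Sketch (ours) of Definition 1: derive the meta-path triplet
# (tau(s), psi(e), tau(t)) of every typed edge in the graph.
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (source id, target id, edge type: AST/CFG/DFG/NCS)

def meta_path_triplets(edges: List[Edge],
                       gran: Dict[int, str]) -> List[Tuple[str, str, str]]:
    # One triplet per edge; parallel edges between the same node pair, e.g.,
    # the DFG and CFG edges between nodes A and H in Fig. 1b, are all kept.
    return [(gran[s], etype, gran[t]) for s, t, etype in edges]

gran = {0: "Statement", 1: "Expression", 2: "Symbol", 3: "Statement"}
edges = [(0, 1, "AST"), (1, 2, "AST"), (0, 3, "CFG"), (0, 3, "DFG")]
print(meta_path_triplets(edges, gran))
# [('Statement', 'AST', 'Expression'), ('Expression', 'AST', 'Symbol'),
#  ('Statement', 'CFG', 'Statement'), ('Statement', 'DFG', 'Statement')]
```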
[Fig. 2 depicts the MAGNET pipeline: source code → code structure graph → meta-path graph construction → meta-path attention in MHAGNN (node-based and edge-based attention over the triplets (τ(s), ψ(e), τ(t))) → multi-granularity attention.]

Fig. 2: The architecture of MAGNET.

TABLE 1: Classification of node types.

Classification | Node types | Number
Statement | Statement, SwitchStatement, DoStatement, GotoStatement, WhileStatement, BreakStatement, CompoundStatement, ForStatement, ReturnStatement, IdentifierDeclStatement, TryStatement, ClassDefStatement, ContinueStatement, IfStatement, ExpressionStatement, ElseStatement, DeclStatement | 17
Expression | Expression, InclusiveOrExpression, MultiplicativeExpression, AssignmentExpression, UnaryOperationExpression, SizeofExpression, OrExpression, ShiftExpression, RelationalExpression, CallExpression, CastExpression, ConditionalExpression, OperationExpression, EqualityExpression, AdditiveExpression, PrimaryExpression, AndExpression, ExclusiveOrExpression, BitAndExpression, UnaryExpression | 20
Symbol | Symbol, File, IncDec, ForInit, SizeofOperand, PtrMemberAccess, Sizeof, IdentifierDeclType, IdentifierDecl, ClassDef, ParameterList, Callee, Condition, ArrayIndexing, ArgumentList, Parameter, Argument, ParameterType, CastTarget, Function, ReturnType, Label, FunctionDef, MemberAccess, InitializerList, CFGErrorNode, InfiniteForNode, CFGExitNode, CFGEntryNode, Identifier, Decl | 32

The maximum number of meta-path types is t_n² × t_e = 36. We then analyze the distribution of heterogeneous relations belonging to different meta-path types. Fig. 3 illustrates the results on the FFMPeg+Qemu dataset; the other datasets show a similar distribution trend. As can be seen, the last four of the total 36 types, e.g., (Ex, 2, Ex) and (Ex, 2, St), appear fewer than three times in the dataset. To facilitate the representation learning of the heterogeneous relations in the graph, we filter out these rare types [44], [45] and employ the remaining 32 types of meta paths for constructing the meta-path graph; a sketch of this filtering step follows Fig. 3 below. For two nodes that have more than one meta path, we keep all the meta paths, e.g., nodes A and H in Fig. 1b have both (St, DFG, St) and (St, CFG, St) meta paths; thus, the structural information of the code structure graph is maintained when constructing the meta-path graph.

[Fig. 3 is a bar chart over the 36 meta-path types, sorted by frequency from roughly 800,000 occurrences down to near zero.]

Fig. 3: Distribution of different types of meta paths on the FFMPeg+Qemu dataset. 'St', 'Ex' and 'Sy' denote the 'Statement', 'Expression' and 'Symbol' node types. '0', '1', '2', and '3' represent the 'AST', 'DFG', 'CFG' and 'NCS' edge types, respectively. The x-axis indicates the meta-path type, and the y-axis indicates the number of occurrences of each meta path.
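The frequency analysis behind Fig. 3 can be sketched as follows (our illustration; the threshold of three occurrences is the one stated above):

```python
# Sketch (ours) of the meta-path type filtering behind Fig. 3: count the
# occurrences of each triplet type over a dataset and keep only the types
# appearing at least three times (32 of the 36 possible types in the paper).
from collections import Counter

def frequent_meta_path_types(triplets, min_count: int = 3):
    counts = Counter(triplets)
    return {t for t, c in counts.items() if c >= min_count}

kept = frequent_meta_path_types(
    [("St", "AST", "Ex")] * 5 + [("Ex", "CFG", "Ex")] * 2)
print(kept)  # {('St', 'AST', 'Ex')} -- the rare (Ex, CFG, Ex) type is dropped
```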
3.2 Meta-path based Hierarchical Attentional Graph Neural Network

In this section, we illustrate the proposed meta-path based hierarchical attentional graph neural network, named MHAGNN. MHAGNN consists of two modules: (1) a meta-path attention mechanism for capturing the representations of heterogeneous relations; and (2) a multi-granularity attention mechanism for capturing the long-range dependency in the meta-path graph.

3.2.1 Meta-path Attention Mechanism

To better utilize the heterogeneous relations, i.e., the triplets (τ(s), ψ(e), τ(t)) from the constructed multi-granularity meta paths, we devise a meta-path attention mechanism, with the detailed architecture presented in Fig. 4. The meta-path attention consists of node-based attention and edge-based attention, which aim at learning the importance of different node types and different edge types in representing the graph structure.

[Fig. 4 depicts the meta-path attention: type-specific projections K(s), Q(t) and V(s) of the source and target nodes feed node-based and edge-based attention over each triplet (τ(s), ψ(e), τ(t)); the per-triplet scores are softmax-normalized, and the messages from neighbors are summed with the previous layer's state h^{l-1}.]

Fig. 4: The architecture of meta-path attention.

Node-based attention: The node-based attention score Att^l_node for the target node t in the l-th layer is defined as follows:

    Att^l_node = σ( W^l_{τ(t)} · [ K^l(s) || Q^l(t) ] )    (1)

    K^l(s) = Linear_{τ(s)}( h^{l-1}_s )    (2)

    Q^l(t) = Linear_{τ(t)}( h^{l-1}_t )    (3)

where W^l_{τ(t)} is a trainable weight matrix, indicating the contribution of node type τ(t) to the representation of the whole graph. The symbol || is the concatenation operation and σ is the sigmoid activation function. Q^l(t) and K^l(s) are the linear projections of nodes t and s, respectively. In Equations (2) and (3), Linear denotes a fully connected neural network layer. h^{l-1}_t and h^{l-1}_s denote the node vectors of t and s in the (l−1)-th layer, respectively, where h^0_t and h^0_s are initialized as 100-dimensional vectors by word2vec [46].
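A minimal PyTorch sketch of Eqs. (1)–(3) is given below. It is our simplified single-head illustration (the released implementation builds on DGL and is multi-head), with one K/Q projection per node granularity:

```python
# Simplified single-head PyTorch sketch (ours) of the node-based attention
# in Eqs. (1)-(3).
import torch
import torch.nn as nn

class NodeAttention(nn.Module):
    def __init__(self, dim: int, num_node_types: int = 3):
        super().__init__()
        # Linear_tau: one projection per granularity (Statement/Expression/Symbol).
        self.key = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_node_types))
        self.query = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_node_types))
        # W_tau(t): one trainable weight vector per target node type.
        self.w = nn.Parameter(torch.randn(num_node_types, 2 * dim))

    def forward(self, h_s, h_t, type_s: int, type_t: int):
        k = self.key[type_s](h_s)       # Eq. (2): K(s) = Linear_tau(s)(h_s)
        q = self.query[type_t](h_t)     # Eq. (3): Q(t) = Linear_tau(t)(h_t)
        kq = torch.cat([k, q], dim=-1)  # concatenation K(s) || Q(t)
        return torch.sigmoid((self.w[type_t] * kq).sum(-1))  # Eq. (1)

att = NodeAttention(dim=8)
score = att(torch.randn(8), torch.randn(8), type_s=0, type_t=1)
print(float(score))  # node-based attention score in (0, 1)
```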


Edge-based attention: The edge-based attention score Att^l_edge for the target node t is defined as follows:

    Att^l_edge = ( K^l(s) · W_{ψ(e)} · Q^l(t)^T ) · μ(ψ(e)) / √(d/H)    (4)

where W_{ψ(e)} and μ(ψ(e)) denote the trainable matrix and the trainable parameter for each edge type ψ(e), representing the importance of the edge type to the source node s and the target node t, respectively. d and H denote the vector dimension and the number of edge-based attention heads, respectively.

Meta-path attention: Based on the computed node-based attention and edge-based attention, we devise the meta-path attention score Att^l_{(τ(s),ψ(e),τ(t))} for the target node t to model the heterogeneity of the relationship (τ(s), ψ(e), τ(t)):

    Att^l_{(τ(s),ψ(e),τ(t))} = softmax( ||^H_{h=1} ( Att^l_edge + Att^l_node ) )    (5)

where ||^H_{h=1} represents the concatenation of the H attention heads.

Finally, we sum the attention scores of all neighbor nodes connected to node t, which is used as the meta-path attention score of node t. We treat the meta-path attention score Att^l_{(τ(s),ψ(e),τ(t))} as input and utilize message passing from the source nodes to the target nodes, which incorporates the heterogeneous relations into the l-th layer's representation. To enhance the ability of the GNN to represent different nodes, we also establish a residual connection [47] with the previous (l−1)-th layer's output, and get the updated node vector h^l_t as:

    h^l_t = σ( Σ_{s ∈ N_t} Att^l_{(τ(s),ψ(e),τ(t))} · V^l(s) + h^{l-1}_t )    (6)

    V^l(s) = Linear_{τ(s)}( h^{l-1}_s )    (7)

where N_t denotes the set of neighboring nodes of node t and h^{l-1}_s denotes the vector representation of node s in the (l−1)-th layer. V^l(s) is also a linear projection of node s.
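Putting Eqs. (5)–(7) together, the following simplified single-head sketch (ours; the paper's version is multi-head and type-specific throughout) updates one target node from precomputed per-neighbor attention scores:

```python
# Simplified single-head sketch (ours) of the layer update in Eqs. (5)-(7):
# the combined per-neighbor scores are softmax-normalized (Eq. 5), used to
# weight value projections of the neighbors (Eq. 7), and the weighted sum
# plus the residual h_t^{l-1} gives the new target state (Eq. 6).
import torch
import torch.nn as nn

def update_target(h_t_prev, neighbor_states, neighbor_scores, value_proj):
    alpha = torch.softmax(torch.stack(neighbor_scores), dim=0)          # Eq. (5)
    values = torch.stack([value_proj(h_s) for h_s in neighbor_states])  # Eq. (7)
    messages = (alpha.unsqueeze(-1) * values).sum(dim=0)
    return torch.sigmoid(messages + h_t_prev)                           # Eq. (6)

dim = 8
value_proj = nn.Linear(dim, dim)  # the paper uses one projection per source type
h_t = torch.randn(dim)
neighbors = [torch.randn(dim), torch.randn(dim)]
scores = [torch.tensor(0.3), torch.tensor(1.2)]  # node-based + edge-based scores
print(update_target(h_t, neighbors, scores, value_proj).shape)  # torch.Size([8])
```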
3.2.2 Multi-granularity Attention

To enhance the representations of nodes at different granularities, we propose to adopt both the average-pooling layer
and the max-pooling layer simultaneously. The average-pooling layer [48] is designed to capture the long-range dependency in the meta-path graph, while the max-pooling layer aims at magnifying the contribution of the key nodes, considering that only a few nodes in the graph are vulnerability-related [49]. The multi-granularity attention score M is calculated as below:

    M = σ( MLP( ω_{1,i} · AvgPool(F_i) + ω_{2,i} · MaxPool(F_i) ) ),   i ∈ {st, ex, sy}    (8)

where ω denotes a trainable weight for the average-pooled and max-pooled features at each level, and i denotes the different levels (i.e., granularities) of nodes. F_st, F_ex and F_sy are the node type representations of "Statement", "Expression" and "Symbol", respectively, which are calculated as F_i = {h_t}^{|V_i|}_{q=1}, (i = st, ex, sy), where |V_i| denotes the number of nodes of a single node type in the whole graph. The single node vector h_t is the concatenation produced by a bidirectional GRU [50], calculated as h_t = →GRU(h^l_t) || ←GRU(h^l_t). Finally, we use the typical cross-entropy loss function [51] for vulnerability prediction.
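The readout of Eq. (8) can be sketched as below. This is our own illustration: how the three pooled granularity vectors enter the MLP is not fully pinned down by Eq. (8), so the sketch concatenates them, which is one possible reading:

```python
# Sketch (ours) of the multi-granularity readout in Eq. (8).
import torch
import torch.nn as nn

class MultiGranularityReadout(nn.Module):
    def __init__(self, dim: int, num_granularities: int = 3):
        super().__init__()
        self.w1 = nn.Parameter(torch.ones(num_granularities))  # omega_1,i
        self.w2 = nn.Parameter(torch.ones(num_granularities))  # omega_2,i
        self.mlp = nn.Linear(num_granularities * dim, 1)       # stand-in MLP

    def forward(self, grouped):  # grouped[i]: (n_i, dim) node vectors F_i
        pooled = [self.w1[i] * f.mean(dim=0) + self.w2[i] * f.max(dim=0).values
                  for i, f in enumerate(grouped)]
        return torch.sigmoid(self.mlp(torch.cat(pooled, dim=-1)))  # score M

readout = MultiGranularityReadout(dim=8)
f_st, f_ex, f_sy = torch.randn(5, 8), torch.randn(7, 8), torch.randn(9, 8)
print(readout([f_st, f_ex, f_sy]))  # graph-level vulnerability score
```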
Torch [52] and Deep Graph Library (DGL) [53]. We train our
model with the NVIDIA GeForce RTX 3090 GPU, installed
4 E VALUATION with Ubuntu 20.04 and CUDA 11.4. In the embedding layer,
4.1 Research Questions the initial input dimension d is set to 100 and the hidden
state dimension is set to 64. The number of MHAGNN lay-
We evaluate the MAGNET with the state-of-the-art vul-
ers is set to 2 and the head of meta-path attention is set to h
nerability methods and aim at answering the following
= 4. We adopt Adam optimizer [54] to train our model with
questions:
a learning rate 5e−4 . The batch sizes for FFMPeg+Qemu,
RQ1: How does our method MAGNET perform in vul- Reveal, and Fan et al. datasets are set as 512, 512 and 256,
nerability detection? respectively.
RQ2: What is the impact of different modules in In addition, to ensure the fairness of the experiments,
MHAGNN on the detection performance of MAG- we use the same data splitting for all baseline approaches as
NET? MAGNET. We randomly partition the dataset into disjoint
RQ3: How effective is MAGNET for detecting Top-25 train, valid, and test sets with the ratio of 8:1:1.
Most Dangerous CWE?
RQ4: What is the influence of hyper-parameters on the
performance of MAGNET? 4.3 Evaluation Metrics
We use the following four widely-used evaluation metrics
to measure our model’s performance:
4.3 Evaluation Metrics

We use the following four widely used evaluation metrics to measure our model's performance:

Precision: Precision is the percentage of true vulnerabilities among the vulnerabilities retrieved, where TP and FP are the numbers of true positives and false positives, respectively:

    Precision = TP / (TP + FP)    (9)

Recall: Recall is the percentage of vulnerabilities that are retrieved out of all vulnerable samples, where TP and FN are the numbers of true positives and false negatives, respectively:

    Recall = TP / (TP + FN)    (10)

F1 score: The F1 score is the harmonic mean of the precision and recall metrics:

    F1 score = 2 × (Precision × Recall) / (Precision + Recall)    (11)

Accuracy: Accuracy is the percentage of correctly classified samples among all samples, where TN is the number of true negatives and TP + TN + FN + FP represents all samples:

    Accuracy = (TP + TN) / (TP + TN + FN + FP)    (12)

For reference, the four metrics are collected in the small helper below.
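```python
# The four metrics of Eqs. (9)-(12) as one small helper (our illustration).
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    precision = tp / (tp + fp) if tp + fp else 0.0            # Eq. (9)
    recall = tp / (tp + fn) if tp + fn else 0.0               # Eq. (10)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                     # Eq. (11)
    accuracy = (tp + tn) / (tp + tn + fn + fp)                # Eq. (12)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

print(metrics(tp=80, fp=20, fn=20, tn=880))
```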
5 EXPERIMENTAL RESULTS

5.1 RQ1: How does our method MAGNET perform in vulnerability detection?

To answer this research question, we first explore the performance of MAGNET and compare it with the baseline methods. Then, we visualize the features learned by MAGNET to verify the validity of the learned vulnerability patterns.

5.1.1 Effectiveness of MAGNET

Table 3 shows the overall results of all baseline models and MAGNET on the four evaluation metrics.

TABLE 3: Comparison results between MAGNET and the baselines on the three datasets. "-" means that the baseline does not apply to the dataset in this scenario. Metrics are in %; per dataset, the columns are Accuracy, Precision, Recall, F1 score. (In the original table, the best result per metric is in bold and the top-3 methods per metric are shaded.)

Baseline | FFMPeg+Qemu [14] | Reveal [18] | Fan et al. [37]
VulDeePecker | 49.61 46.05 32.55 38.14 | 76.37 21.13 13.10 16.17 | 81.19 38.44 12.75 19.15
Russell et al. | 57.60 54.76 40.72 46.71 | 68.51 16.21 52.68 24.79 | 86.85 14.86 26.97 19.17
SySeVR | 47.85 46.06 58.81 51.66 | 74.33 40.07 24.94 30.74 | 90.10 30.91 14.08 19.34
Devign | 56.89 52.50 64.67 57.95 | 87.49 31.55 36.65 33.91 | 92.78 30.61 15.96 20.98
Reveal | 61.07 55.50 70.70 62.19 | 81.77 31.55 61.14 41.62 | 87.14 17.22 34.04 22.87
IVDetect | 57.26 52.37 57.55 54.84 | - - - - | - - - -
MAGNET | 63.28 56.27 80.15 66.12 | 91.60 42.86 61.68 50.57 | 91.38 22.71 38.92 28.68

Overall, MAGNET achieves better results and outperforms all six of the token-based and graph-based approaches on the FFMPeg+Qemu, Reveal and Fan et al. datasets, in terms of F1 score by 6.32%, 21.50% and 25.40%, respectively. For the four performance metrics on the three datasets, MAGNET has the best performance in 10 out of the 12 cases. Compared with the best-performing baseline Reveal, our method obtains an average performance improvement of 6.84%, 23.04%, 9.53% and 17.74% on the four metrics, respectively.

We observe that MAGNET outperforms all the baseline methods on the three datasets in terms of the F1 score and recall metrics. Compared with the best-performing baseline method, MAGNET achieves an average absolute improvement of 6.23% with respect to the F1 score on the three datasets. MAGNET also improves the recall metric by 13.37%, 0.88% and 14.34%, respectively, over the best baseline methods. Such improvement benefits the scenario of vulnerability detection, since a higher recall indicates that a larger percentage of vulnerabilities can be detected.

The experimental results also show that the three graph-based methods Devign, Reveal and IVDetect outperform the three token-based methods. For example, the shaded cells marking the top-3 best results on the FFMPeg+Qemu dataset appear six times among the graph-based methods but only twice among the token-based methods. The reason may be that token-based methods are more advantageous in capturing the sequence information of code while ignoring the structural information. In contrast, graph-based methods can better capture code structure information, which is beneficial for vulnerability detection.

5.1.2 Result Visualization

To further analyze the effectiveness of MAGNET, we visualize the representations learned by MAGNET via the popular t-SNE technique [55]. For comparison, we also use t-SNE to visualize the existing graph-based code vulnerability detection methods. In the t-SNE space, a larger distance between the different classes (i.e., vulnerable and non-vulnerable examples) indicates a clearer and greater separability of the classes, which leads to a higher performance of vulnerability detection. In addition, to facilitate quantification, we use the centroids distance D [56] to quantitatively illustrate the separability between the classes.

The t-SNE feature visualizations for the existing graph-based models are shown in Fig. 5a to 5c. They show that the positive and negative samples of Devign are thoroughly mixed, with the centroids distance at only 0.0108. Compared with Devign, both Reveal and IVDetect obtain larger centroids distances, and the scatter appears more dispersed, but it still lacks separability visible to the naked eye. In Fig. 5d, we show the separability of our MAGNET. We can observe that the left side aggregates more vulnerable samples, while the right side has more non-vulnerable examples. Besides, MAGNET shows the largest centroids distance, 0.2901, among all the methods. The visualization further demonstrates the effectiveness of MAGNET in distinguishing vulnerable code from non-vulnerable code.

Answer to RQ1: MAGNET outperforms all baseline methods in terms of recall and F1 score. On the F1 score, MAGNET improves by 6.32%, 21.50% and 25.40% compared with the best baseline Reveal on the three datasets, respectively. Visualization further demonstrates that MAGNET distinguishes code vulnerabilities better than the baselines.
[Fig. 5 shows four t-SNE scatter plots of code representations, with vulnerable and non-vulnerable samples in different colors: (a) Devign, D = 0.0108; (b) Reveal, D = 0.0427; (c) IVDetect, D = 0.0110; (d) MAGNET, D = 0.2901.]

Fig. 5: The t-SNE [55] plots illustrate the distribution of vulnerable (pink) and non-vulnerable (dark blue) examples in the code representations of the different approaches. D indicates the centroids distance between the centers of the vulnerable and non-vulnerable examples.
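A sketch of how such a visualization and the centroids distance D can be produced is given below (our illustration with scikit-learn; the exact normalization and settings used for Fig. 5 may differ):

```python
# Sketch (ours) of the analysis behind Fig. 5: embed the learned graph
# representations in 2-D with t-SNE, rescale to [0, 1] as in the plots,
# and compute the centroids distance D between the two classes.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler

def centroid_distance(reps: np.ndarray, labels: np.ndarray) -> float:
    emb = TSNE(n_components=2, random_state=0).fit_transform(reps)
    emb = MinMaxScaler().fit_transform(emb)
    c_vul = emb[labels == 1].mean(axis=0)      # centroid of vulnerable samples
    c_non = emb[labels == 0].mean(axis=0)      # centroid of non-vulnerable samples
    return float(np.linalg.norm(c_vul - c_non))

reps = np.random.randn(200, 64).astype(np.float32)  # stand-in representations
labels = np.random.randint(0, 2, size=200)          # stand-in labels
print(centroid_distance(reps, labels))
```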

5.2 RQ2: What is the impact of different modules of MHAGNN?

To answer this research question, we explore the effect of each module in MHAGNN on the performance of MAGNET by performing an ablation study on all three datasets.

We construct the following three variants of MAGNET for comparison: (1) without edge-based attention (denoted as w/o edge-att): we remove the edge-based attention to validate its impact; (2) without node-based attention (denoted as w/o node-att): we remove the node-based attention to verify its impact; (3) without the multi-granularity attention (denoted as w/o multi-att): we obtain the graph representation by simply summing the node feature embeddings instead of using the proposed multi-granularity attention.

The results of the different variants are shown in Table 4. We find that the performance of all the variants is lower than that of MAGNET, which indicates that all the modules contribute to the overall performance of MAGNET. Specifically, without the edge-based attention, the accuracy, precision, recall and F1 score on the three datasets drop by 1.18%, 2.24%, 6.65%, and 3.57% on average, respectively. Without node-based attention, the four metrics decrease by 1.04%, 2.26%, 9.52%, and 5.02%, respectively. The node-based attention and edge-based attention contribute greatly to the model performance, since they capture different types of structural information in the meta-path graph.

Among the three parts, the multi-granularity attention layer contributes the most to the overall performance: it improves the F1 score by 11.93%, 21.18%, and 33.54% on the three datasets, respectively. The reason may be that the multi-granularity attention facilitates the learning of the global long-range dependency and can better capture the patterns of vulnerable code.

Answer to RQ2: The various components of MHAGNN effectively improve the MAGNET performance. The multi-granularity attention layer contributes the most to the overall performance.

5.3 RQ3: How effective is MAGNET for the Top-25 Most Dangerous CWEs?

Common Weakness Enumeration (CWE) [57] is a list of vulnerability weakness types, which serves as a common language for describing and identifying vulnerabilities. The Top-25 most dangerous CWEs list officially publishes the most common and impactful software vulnerabilities over the previous two calendar years, based on Common Vulnerabilities and Exposures (CVE) data [58]. In order to explore the effectiveness of MAGNET in real-world scenarios, we validate MAGNET on the Top-25 most dangerous CWEs.

Specifically, we prepare the evaluation set by extracting the samples belonging to the Top-25 list from the Fan et al. dataset. The evaluation set is named the Top-CWE dataset for brevity in this paper.
TABLE 4: Results of the ablation study. Metrics are in %; per dataset, the columns are Accuracy, Precision, Recall, F1 score.

Variant | FFMPeg+Qemu [14] | Reveal [18] | Fan et al. [37]
w/o edge-att | 61.16 55.96 69.52 62.01 | 91.02 39.60 55.14 46.09 | 90.54 19.57 36.15 25.39
w/o node-att | 60.97 55.97 62.70 59.15 | 90.89 39.62 58.88 47.37 | 91.27 19.47 30.62 23.80
w/o multi-att | 60.94 55.77 60.92 58.23 | 88.41 31.22 55.14 39.86 | 86.27 12.92 36.31 19.06
MAGNET | 63.28 56.27 80.15 66.12 | 91.60 42.86 61.68 50.57 | 91.38 22.71 38.92 28.68

TABLE 5: The accuracy of the baselines and MAGNET on the Top-25 most dangerous CWEs. Due to the small number of samples for some vulnerability types, we only show the types with more than 50 samples. Percentage indicates the proportion of the vulnerability type among the samples.

CWE Type | Percentage | Devign | Reveal | IVDetect | MAGNET
CWE-787 | 3.34% | 45.37 | 51.71 | 56.60 | 83.19
CWE-79 | 1.16% | 39.13 | 65.22 | 51.43 | 71.79
CWE-125 | 8.61% | 47.14 | 52.54 | 61.08 | 72.86
CWE-20 | 23.48% | 42.00 | 53.58 | 57.88 | 75.32
CWE-416 | 11.78% | 46.67 | 58.13 | 61.05 | 72.97
CWE-22 | 0.77% | 56.86 | 47.06 | 48.00 | 64.00
CWE-190 | 3.84% | 44.11 | 52.47 | 59.20 | 76.34
CWE-287 | 0.72% | 35.14 | 48.65 | 51.28 | 71.43
CWE-476 | 5.33% | 48.00 | 49.41 | 63.22 | 73.54
CWE-119 | 28.61% | 43.77 | 51.23 | 60.83 | 77.07
CWE-200 | 9.02% | 44.59 | 51.18 | 56.17 | 76.82
CWE-732 | 1.64% | 34.38 | 56.25 | 43.75 | 76.67
Average | | 44.38 | 52.26 | 59.24 | 75.70

[Fig. 6 shows two line plots of F1 score on the FFMPeg+Qemu dataset: (a) F1 versus the number of MHAGNN layers (1 to 5); (b) F1 versus the number of attention heads (1, 2, 4, 8, 16).]

Fig. 6: Parameter analysis of the number of MHAGNN layers and of the number of meta-path attention heads on the FFMPeg+Qemu dataset.

The Top-CWE dataset contains 8,989 code functions, and the specific type distribution is shown in Table 5. In terms of quantity, CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer) [59] and CWE-20 (Improper Input Validation) [60] have the largest proportions, with 28.61% and 23.48%, respectively. We then train on the FFMPeg+Qemu dataset and test on the Top-CWE dataset. We compare with all the graph-based baselines trained on FFMPeg+Qemu.

As shown in Table 5, our MAGNET achieves more than 70% accuracy on 11 of the 12 vulnerability types, with an average improvement of 27.78% compared to the previous best baselines. This indicates that our method is able to discover different real-world vulnerabilities more accurately. Specifically, our method obtains the highest identification accuracy, 83.19%, on CWE-787 among all the detected vulnerability types, and 71.43% on CWE-287 (Improper Authentication) [61]. On CWE-20 and CWE-119, which account for a large proportion of the samples, MAGNET has an accuracy of 75.32% and 77.07%, respectively, showing 30.13% and 26.70% improvement over previous methods.

Answer to RQ3: MAGNET achieves more than 70% accuracy on almost all of the examined real-world vulnerability types, with an improvement of 27.78% over the previous best-performing baseline.

5.4 RQ4: What is the influence of hyper-parameters on the performance of MAGNET?

In this section, we explore the impact of two key hyper-parameters on the performance of MAGNET: the number of MHAGNN layers and the number of meta-path attention heads.

5.4.1 Layer Number of MHAGNN

We explore the effect of different numbers of MHAGNN layers on the performance of MAGNET. Fig. 6a shows the F1 score of MAGNET with different numbers of MHAGNN layers. As can be seen, the F1 score of MAGNET first shows an increasing trend and then decreases as the number of layers increases on the FFMPeg+Qemu dataset, and a similar trend is observed on the other two datasets as well. MAGNET generally obtains the highest F1 score of 66.12% when the number of layers is set to 4. We suppose that MAGNET can better capture the information of the neighborhood as the layer number increases. However, as the layer number further increases, the over-smoothing issue would reduce the model performance.

5.4.2 Attention Head Numbers

We analyze the effect of different attention head numbers on the performance of MAGNET, with the results shown in Fig. 6b. As can be seen, MAGNET achieves the optimal F1 score when the number of attention heads is set to 4. Overall, more heads show a significant improvement in F1 score compared to a smaller number of heads, which indicates that more heads are beneficial for capturing the code structure information in the meta-path graph. However, the performance starts to degrade beyond 4 heads. We think that too many heads result in a smaller dimension for
each head, which leads to worse results in fitting the target data [62].

Answer to RQ4: The hyper-parameter settings can impact the performance of MAGNET. We empirically set the values of the hyper-parameters.

6 DISCUSSION

6.1 Case Study

We conduct a case study to further verify the effectiveness of MAGNET in vulnerability detection. For the analysis, we visualize the attention weight of each statement produced by MAGNET. Fig. 7 visualizes the heatmap of attention weights for a vulnerable example from CWE-190 (Integer Overflow or Wraparound). In this example, line 6 is the vulnerable statement, where the sum of three variables may exceed the maximum value of the short int primitive type, producing a potential integer overflow. MAGNET notices the vulnerability and gives this statement the highest level of attention (red-shaded). The initialization statements (lines 3-5) also present higher attention weights (orange-shaded and red-shaded). From this case, we conjecture that MAGNET is able to capture the code structural information and the patterns of vulnerabilities, which is helpful for detecting code vulnerabilities effectively.

[Fig. 7 is a heatmap of per-statement attention weights for a CWE-190 code example.]

Fig. 7: The heatmap of attention weights for a code example from CWE-190 (Integer Overflow or Wraparound). The code in red indicates a vulnerable statement. The red, orange, yellow and green-shaded statements are associated with decreasing attention weights.

6.2 Threats to Validity

Dataset Partition. None of the existing baseline methods publish their divisions of the datasets, so we cannot completely reproduce the previous results. Following the data division of previous work [14], [18], we perform a dataset division and reproduce the experimental results based on their articles as much as possible. Reveal and IVDetect use different preprocessing methods, which may lead to inconsistencies in the dataset under the same division; we relied on their source code for the preprocessing work as much as possible. Devign [14] did not publish the source code of their implementation, so we reproduce Devign based on Reveal's [18] implementation and try to be consistent with the original description.

Generalizability to Other Programming Languages. Our node classification is based on the AST node types of C/C++. Therefore, we only conduct experiments on the C/C++ datasets and do not choose other programming languages such as Java and Python. However, the main idea of MAGNET can be generalized to other programming languages, because the approach does not rely on language-specific features. We will evaluate MAGNET on more programming languages in our future work.

7 RELATED WORK

Recently, learning-based vulnerability detection has become a significant research problem in software engineering. Depending on how the source code is represented and which type of learning model is utilised, existing technologies can generally be divided into two types: token-based and graph-based methods.

The token-based methods [11], [17], [25], [63], [64] treat the code as a sequence of tokens and involve two phases: feature extraction and training. In the feature extraction phase, the token-based methods usually extract token-based features as the model input, including identifiers, keywords, separators, and operators. These features are usually specified and written by developers and can represent the line structure [65] of the code. For example, Russell et al. [25] divide each code fragment at the function level and treat it as an individual sample, generating a lexical token sequence for each function to represent the whole code sample feature set. Code2Seq [63] uses path-context features extracted for each program method and splits code tokens into subtokens. In the training phase, these methods treat source code as sequences and utilize various deep neural networks. Russell et al. use a CNN [66] for code vulnerability detection, combining the token features through convolution filters. SySeVR [11] uses a GRU [67] to capture the sequence information of the code. Code2Seq uses a BiLSTM [68] to encode the necessary information in a sequence of tokens.

In recent years, the graph-based methods [14], [18], [19], [69], [70] have achieved state-of-the-art performance on vulnerability detection. They capture more structural information in the source code than token-based methods. They generally represent source code snippets as graphs generated from static analysis. Depending on the code representation, they design various GNN models for detecting code vulnerabilities. For example, VGDETECTOR [69] uses CFGs to embed the execution order of a code sample and then uses a GCN [26] to capture neighborhood information in the graph structure. Reveal [18] first generates the CPG [21] and uses Word2Vec [46] to initialize the node vector representations of the code tokens; it then uses a GGNN [27] to detect code vulnerabilities. However, all these methods focus on learning the local features of nodes and fail to capture the heterogeneous relations and long-range dependencies among different types of nodes and edges. In this paper, we propose a meta-path based attentional graph learning model to learn the heterogeneous relations and long-range dependencies in the code structure graph.
11

tion. MAGNET consists of a multi-granularity meta-path [17] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong,
construction, which consider the heterogeneous relations “Vuldeepecker: A deep learning-based system for vulnerability
detection,” in 25th Annual Network and Distributed System Security
between the different node and edge type. We also propose Symposium, NDSS 2018, San Diego, California, USA, February 18-21,
a multi-level attentional graph neural network MHAGNN 2018. The Internet Society, 2018.
to comprehensively capture long-range dependencies and [18] S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, “Deep learning
structural information in the meta-path graph. Our ex- based vulnerability detection: Are we there yet?” CoRR, vol.
abs/2009.07235, 2020.
perimental results on three popular datasets validate the
[19] Y. Li, S. Wang, and T. N. Nguyen, “Vulnerability detection with
effectiveness of MAGNET, and the ablation studies and fine-grained interpretations,” in ESEC/FSE ’21: 29th ACM Joint
visualizations further confirm the advantages of MAGNET. European Software Engineering Conference and Symposium on the
Compared with state-of-the-art deep learning-based meth- Foundations of Software Engineering, Athens, Greece, August 23-28,
2021. ACM, 2021, pp. 292–303.
ods, MAGNET gains better performance and detects more [20] J. K. Siow, S. Liu, X. Xie, G. Meng, and Y. Liu, “Learning program
vulnerabilities in the real world. semantics with code representations: An empirical study,” CoRR,
vol. abs/2203.11790, 2022.
[21] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and
discovering vulnerabilities with code property graphs,” in 2014
R EFERENCES IEEE Symposium on Security and Privacy, SP 2014. IEEE Computer
Society, 2014, pp. 590–604.
[1] S. M. Ghaffarian and H. R. Shahriari, “Software vulnerability [22] I. Neamtiu, J. S. Foster, and M. Hicks, “Understanding source code
analysis and discovery using machine-learning and data-mining evolution using abstract syntax tree matching,” ACM SIGSOFT
techniques: A survey,” ACM Comput. Surv., vol. 50, no. 4, pp. 56:1– Softw. Eng. Notes, vol. 30, no. 4, pp. 1–5, 2005.
56:36, 2017. [23] X. Huo, M. Li, and Z. Zhou, “Control flow graph embedding based
[2] A. Johnson, K. Dempsey, R. Ross, S. Gupta, D. Bailey et al., “Guide on multi-instance decomposition for bug localization,” in The
for security-focused configuration management of information Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020,
systems,” NIST special publication, vol. 800, no. 128, pp. 16–16, 2011. The Thirty-Second Innovative Applications of Artificial Intelligence
[3] Y. Wei, X. Sun, L. Bo, S. Cao, X. Xia, and B. Li, “A comprehensive Conference, IAAI 2020, The Tenth AAAI Symposium on Educational
study on security bug characteristics,” J. Softw. Evol. Process., Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA,
vol. 33, no. 10, 2021. February 7-12, 2020. AAAI Press, 2020, pp. 4223–4230.
[4] Q. Tan, X. Wang, W. Shi, J. Tang, and Z. Tian, “An anonymity [24] C. Cummins, Z. V. Fisches, T. Ben-Nun, T. Hoefler, M. F. P. O’Boyle,
vulnerability in tor,” IEEE/ACM Transactions on Networking, vol. 30, and H. Leather, “Programl: A graph-based program representation
no. 6, pp. 2574–2587, 2022. for data flow analysis and compiler optimizations,” in Proceedings
[5] Bugcrowd. (2022) 2022 bugcrowd priority one report. of the 38th International Conference on Machine Learning, ICML
[Online]. Available: https://www.bugcrowd.com/resources/ 2021, 18-24 July 2021, Virtual Event, ser. Proceedings of Machine
reports/priority-one-report/ Learning Research, M. Meila and T. Zhang, Eds., vol. 139. PMLR,
[6] Wikipedia. (2021) Log4shell. [Online]. Available: https://en. 2021, pp. 2244–2253.
wikipedia.org/wiki/Log4Shell [25] R. L. Russell, L. Y. Kim, L. H. Hamilton, T. Lazovich, J. Harer,
[7] S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller, “Predicting O. Ozdemir, P. M. Ellingwood, and M. W. McConley, “Automated
vulnerable software components,” in Proceedings of the 2007 ACM vulnerability detection in source code using deep representation
Conference on Computer and Communications Security, CCS 2007, learning,” in 17th IEEE International Conference on Machine Learning
Alexandria, Virginia, USA, October 28-31, 2007, P. Ning, S. D. C. and Applications, ICMLA 2018, Orlando, FL, USA, December 17-20,
di Vimercati, and P. F. Syverson, Eds. ACM, 2007, pp. 529–540. 2018, M. A. Wani, M. M. Kantardzic, M. S. Mouchaweh, J. Gama,
[8] R. Scandariato, J. Walden, A. Hovsepyan, and W. Joosen, “Predicting vulnerable software components via text mining,” IEEE Trans. Software Eng., vol. 40, no. 10, pp. 993–1006, 2014.
[9] Y. Shin, A. Meneely, L. A. Williams, and J. A. Osborne, “Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities,” IEEE Trans. Software Eng., vol. 37, no. 6, pp. 772–787, 2011.
[10] Y. Wu, D. Zou, S. Dou, W. Yang, D. Xu, and H. Jin, “Vulcnn: An image-inspired scalable vulnerability detection system,” in 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 2022, pp. 2365–2376.
[11] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, and Z. Chen, “Sysevr: A framework for using deep learning to detect software vulnerabilities,” IEEE Trans. Dependable Secur. Comput., vol. 19, no. 4, pp. 2244–2258, 2022.
[12] D. Hin, A. Kan, H. Chen, and M. A. Babar, “Linevd: Statement-level vulnerability detection using graph neural networks,” CoRR, vol. abs/2203.05181, 2022.
[13] M. Fu and C. Tantithamthavorn, “Linevul: A transformer-based line-level vulnerability prediction,” 2022.
[14] Y. Zhou, S. Liu, J. K. Siow, X. Du, and Y. Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 2019, pp. 10197–10207.
[15] S. Cao, X. Sun, L. Bo, R. Wu, B. Li, and C. Tao, “MVD: memory-related vulnerability detection based on flow-sensitive graph neural networks,” CoRR, vol. abs/2203.02660, 2022.
[16] H. Wang, G. Ye, Z. Tang, S. H. Tan, S. Huang, D. Fang, Y. Feng, L. Bian, and Z. Wang, “Combining graph-based learning with automated data collection for code vulnerability detection,” IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 1943–1958, 2021.
[25] … and E. Lughofer, Eds. IEEE, 2018, pp. 757–762.
[26] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[27] Y. Li, D. Tarlow, M. Brockschmidt, and R. S. Zemel, “Gated graph sequence neural networks,” in 4th International Conference on Learning Representations, ICLR 2016, 2016.
[28] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” CoRR, vol. abs/1710.10903, 2017.
[29] X. Wang, Q. Wu, H. Zhang, C. Lyu, X. Jiang, Z. Zheng, L. Lyu, and S. Hu, “Heloc: Hierarchical contrastive learning of source code representation,” CoRR, vol. abs/2203.14285, 2022.
[30] W. Min, W. Rongcun, and J. Shujuan, “Source code vulnerability detection based on relational graph convolution network,” Journal of Computer Applications, vol. 42, no. 6, p. 1814, 2022.
[31] F. Liu, Z. Cheng, L. Zhu, Z. Gao, and L. Nie, “Interest-aware message-passing GCN for recommendation,” in WWW ’21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19-23, 2021, J. Leskovec, M. Grobelnik, M. Najork, J. Tang, and L. Zia, Eds. ACM / IW3C2, 2021, pp. 1296–1305.
[32] U. Alon and E. Yahav, “On the bottleneck of graph neural networks and its practical implications,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
[33] D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu, M. Tufano, S. K. Deng, C. B. Clement, D. Drain, N. Sundaresan, J. Yin, D. Jiang, and M. Zhou, “Graphcodebert: Pre-training code representations with data flow,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
[34] D. Zhu, X. Dai, and J. Chen, “Pre-train and learn: Preserving global information for graph neural networks,” J. Comput. Sci. Technol., vol. 36, no. 6, pp. 1420–1430, 2021.
[35] Q. Li, Z. Han, and X. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018. AAAI Press, 2018, pp. 3538–3545.
[36] Y. Xiong, Y. Zhang, X. Kong, H. Chen, and Y. Zhu, “Graphinception: Convolutional neural networks for collective classification in heterogeneous information networks,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 5, pp. 1960–1972, 2021.
[37] J. Fan, Y. Li, S. Wang, and T. N. Nguyen, “A C/C++ code vulnerability dataset with code changes and CVE summaries,” in MSR ’20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, 29-30 June, 2020. ACM, 2020, pp. 508–512.
[38] CWE-476: Null pointer dereference. [Online]. Available: https://cwe.mitre.org/data/definitions/476.html
[39] M. Allamanis, M. Brockschmidt, and M. Khademi, “Learning to represent programs with graphs,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
[40] Y. Wu, D. Zou, S. Dou, S. Yang, W. Yang, F. Cheng, H. Liang, and H. Jin, “Scdetector: Software functional clone detection based on semantic tokens analysis,” in 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020. IEEE, 2020, pp. 821–833.
[41] W. Wang, G. Li, B. Ma, X. Xia, and Z. Jin, “Detecting code clones with graph neural network and flow-augmented abstract syntax tree,” in 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020, K. Kontogiannis, F. Khomh, A. Chatzigeorgiou, M. Fokaefs, and M. Zhou, Eds. IEEE, 2020, pp. 261–271.
[42] W. Hua, Y. Sui, Y. Wan, G. Liu, and G. Xu, “FCCA: hybrid code representation for functional clone detection using attention networks,” IEEE Trans. Reliab., vol. 70, no. 1, pp. 304–318, 2021.
[43] P. J. Landin, “The mechanical evaluation of expressions,” The Computer Journal, vol. 6, no. 4, pp. 308–320, 1964.
[44] C. Wang, Y. Song, H. Li, M. Zhang, and J. Han, “Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks,” Data Min. Knowl. Discov., vol. 32, no. 6, pp. 1735–1767, 2018.
[45] W. Ning, R. Cheng, J. Shen, N. A. H. Haldar, B. Kao, X. Yan, N. Huo, W. K. Lam, T. Li, and B. Tang, “Automatic meta-path discovery for effective graph-based recommendation,” in Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, October 17-21, 2022, M. A. Hasan and L. Xiong, Eds. ACM, 2022, pp. 1563–1572.
[46] K. W. Church, “Word2vec,” Nat. Lang. Eng., vol. 23, no. 1, pp. 155–162, 2017.
[47] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, ser. Lecture Notes in Computer Science, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., vol. 9908. Springer, 2016, pp. 630–645.
[48] L. Sun, Z. Chen, Q. M. J. Wu, H. Zhao, W. He, and X. Yan, “Ampnet: Average- and max-pool networks for salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 11, pp. 4321–4333, 2021.
[49] S. Woo, J. Park, J. Lee, and I. S. Kweon, “CBAM: convolutional block attention module,” in Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VII, ser. Lecture Notes in Computer Science, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds., vol. 11211. Springer, 2018, pp. 3–19.
[50] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.
[51] P. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross-entropy method,” Ann. Oper. Res., vol. 134, no. 1, pp. 19–67, 2005.
[52] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Z. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett, Eds., 2019, pp. 8024–8035.
[53] M. Wang, L. Yu, D. Zheng, Q. Gan, Y. Gai, Z. Ye, M. Li, J. Zhou, Q. Huang, C. Ma, Z. Huang, Q. Guo, H. Zhang, H. Lin, J. Zhao, J. Li, A. J. Smola, and Z. Zhang, “Deep graph library: Towards efficient and scalable deep learning on graphs,” CoRR, vol. abs/1909.01315, 2019.
[54] Z. Zhang, “Improved adam optimizer for deep neural networks,” in 26th IEEE/ACM International Symposium on Quality of Service, IWQoS 2018, Banff, AB, Canada, June 4-6, 2018. IEEE, 2018, pp. 1–2.
[55] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[56] C. Mao, Z. Zhong, J. Yang, C. Vondrick, and B. Ray, “Metric learning for adversarial robustness,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett, Eds., 2019, pp. 478–489.
[57] CWE: Common weakness enumeration, 2021. [Online]. Available: https://cwe.mitre.org/
[58] “2021 CWE top 25 most dangerous software weaknesses,” 2021. [Online]. Available: https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html
[59] CWE-119: Improper restriction of operations within the bounds of a memory buffer. [Online]. Available: https://cwe.mitre.org/data/definitions/119.html
[60] CWE-20: Improper input validation. [Online]. Available: https://cwe.mitre.org/data/definitions/20.html
[61] CWE-287: Improper authentication. [Online]. Available: https://cwe.mitre.org/data/definitions/287.html
[62] P. Michel, O. Levy, and G. Neubig, “Are sixteen heads really better than one?” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett, Eds., 2019, pp. 14014–14024.
[63] U. Alon, S. Brody, O. Levy, and E. Yahav, “code2seq: Generating sequences from structured representations of code,” in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
[64] V. Nguyen, T. Le, O. Y. de Vel, P. Montague, J. Grundy, and D. Phung, “Information-theoretic source code vulnerability highlighting,” in International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18-22, 2021. IEEE, 2021, pp. 1–8.
[65] T. Kamiya, S. Kusumoto, and K. Inoue, “Ccfinder: A multilinguistic token-based code clone detection system for large scale source code,” IEEE Trans. Software Eng., vol. 28, no. 7, pp. 654–670, 2002.
[66] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, A. Moschitti, B. Pang, and W. Daelemans, Eds. ACL, 2014, pp. 1746–1751.
[67] D. Tang, B. Qin, and T. Liu, “Document modeling with gated recurrent neural network for sentiment classification,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, L. Màrquez, C. Callison-Burch, J. Su, D. Pighin, and Y. Marton, Eds. The Association for Computational Linguistics, 2015, pp. 1422–1432.
[68] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5-6, pp. 602–610, 2005.
[69] X. Cheng, H. Wang, J. Hua, M. Zhang, G. Xu, L. Yi, and Y. Sui, “Static detection of control-flow-related vulnerabilities using graph embedding,” in 24th International Conference on Engineering of Complex Computer Systems, ICECCS 2019, Guangzhou, China, November 10-13, 2019, J. Pang and J. Sun, Eds. IEEE, 2019, pp. 41–50.
[70] Y. Ding, S. Suneja, Y. Zheng, J. Laredo, A. Morari, G. E. Kaiser, and B. Ray, “VELVET: a novel ensemble learning approach to automatically locate vulnerable statements,” in IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022, Honolulu, HI, USA, March 15-18, 2022. IEEE, 2022, pp. 959–970.