

Journal of Computer Virology and Hacking Techniques

https://doi.org/10.1007/s11416-021-00383-1

ORIGINAL PAPER

Markhor: malware detection using fuzzy similarity of system call dependency sequences

Amir Mohammadzade Lajevardi¹ · Saeed Parsa² · Mohammad Javad Amiri³

Received: 1 November 2020 / Accepted: 4 April 2021

Abstract
Static malware detection approaches are time-consuming and cannot deal with code obfuscation techniques. Dynamic malware detection approaches address these two challenges but suffer from behavioral ambiguity, such as system call obfuscation. In this paper, we introduce Markhor, a dynamic and behavior-based malware detection approach. Markhor uses system call data dependency and system call control dependency sequences to create a weighted list of malicious patterns. The list is then used to identify malicious processes: the similarity of a file's system call sequences to a malicious pattern is computed with a fuzzy algorithm, and the nature of the file is determined. The evaluation results show the effectiveness of Markhor in terms of accuracy (0.982), precision (0.976), and F-measure (0.982).

¹ Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
² Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
³ Department of Computer and Information Science, University of Pennsylvania, Pennsylvania, USA

1 Introduction

Malware detection approaches can be categorized into static and dynamic approaches [1]. While static approaches can determine the nature of the software (i.e., malicious or benign) without running it, they are time-consuming and vulnerable to code obfuscation. Dynamic approaches, on the other hand, execute the software and determine its nature based on its requests, resulting in faster detection.

Malware detection approaches can also be categorized into signature-based and behavior-based approaches. Signature-based approaches extract a specific byte code for a malware family and use it to detect all samples of that particular family. While signature-based approaches are very fast, they suffer from several drawbacks. First, the number of signatures increases over time. Also, such techniques cannot be used to detect new malware families [2]. Moreover, signature-based approaches require a database that must be updated at short intervals [2]. Finally, malware that uses deformation or code obfuscation is not easily detected using this approach. In behavior-based approaches, on the other hand, malware is detected by analyzing its behavior [3,4]. In such approaches, malware behavioral patterns are modeled, and if a malicious behavior is recognized, the program is prevented from running.

In software security, malware behavior is detected according to its use of system resources. The behavior of malware is therefore classified into five classes [3,5,6]: file-based behaviors, process-based behaviors, windows-based behaviors, network-based behaviors, and operating system alteration-based behaviors. Figure 1 shows the number of application programming interface (API) calls for the 386 malware samples analyzed in [7]. As the figure shows, malware behaviors were mostly intended to search files in order to read or write them.

Fig. 1 API call distribution in the 386 malware samples analyzed in [7]

In this paper, we present Markhor (see footnote 1), a behavior-based and dynamic approach for detecting malicious files or processes. Markhor uses system call data dependency and system call control dependency sequences to create a weighted list of malicious patterns. In Markhor, the similarity of a file's system call sequences to a malicious pattern is computed with a fuzzy algorithm, and the nature of the file is determined at malware run time.

Footnote 1: The markhor (Capra falconeri) is a large Capra species native to Central Asia, Karakoram and the Himalayas. The name is thought to be derived from Persian, a conjunction of mar ("snake, serpent") and the suffix khor ("-eater"), interpreted to represent the animal's alleged ability to kill snakes.

The main contributions of this paper are:

1. Using semantic relations of API calls instead of structural relations to construct the real malicious API sequences,
2. Detecting fake API calls among the malicious behaviors of malware, and
3. Proposing a fuzzy-based algorithm to determine whether a program is malicious or benign.

The rest of this paper is organized as follows. Section 2 discusses related work. Markhor is introduced in Sect. 3. Section 4 evaluates the performance of Markhor, and Sect. 5 concludes the paper.

2 Related work

Behavioral feature extraction is one of the most important components of malware detection approaches. If the features are not extracted correctly, the type of software cannot be detected accurately. Behavioral feature extraction can be done statically or dynamically. Static extraction is suitable when malware behavior needs to be analyzed without running it. This is done by analyzing the program source code and the code extracted from the program. Such methods either use the program's import address table [4] or track call operations in the program code [5].

The dynamic extraction of behavioral features requires running the malware first and then tracking its behavior. Dynamic approaches are more efficient than static ones because, in static analysis, the code, the address table, and the operations might be obfuscated or encoded. As a result, it might be too difficult to detect their real values. In the dynamic method, by contrast, the malware is decoded and sends its requests to the operating system. These requests cannot be obfuscated, because if the operating system cannot recognize a request, it cannot send the desired response. Dynamic approaches include inline API hooking [6] and tracking with the operating system service descriptor table [8].

After a program's behavior has been extracted and tracked accurately in a secure environment, it is necessary to determine the nature of the program, i.e., whether it is benign or malicious. In the rest of this section, we discuss malware detection approaches that use behavioral features. Extracting the dynamic-link library dependency tree of suspicious software from the import address table (IAT) without executing the application is proposed in [9] to detect malware. The approach is able to detect fake dynamic-link library injection and uses Dependency Walker [10] to generate the behavioral tree of each file. The main drawback of this approach is its low accuracy for malware in which the IAT is destroyed. The extracted behavioral tree might also have many unimportant nodes, which increases detection time.

Modeling program behavior based on the frequency of API calls is studied in [11]. The method, however, assumes that API calls are independent, resulting in lower precision. Most recent malware detection approaches [12–17] use API call sequences to reduce false positives and detect fake API call injection. These approaches use API call sequences as malicious patterns to detect malware, where the API call sequence is extracted for each malware and benign program. Effective sequences for detecting malware are then extracted using data mining techniques. Finally, the dependency between API calls is considered based on their call sequence. The main challenge in these approaches is the


way the API call sequence is extracted, which decreases the detection rate.

3 Markhor

The proposed approach consists of four main steps. In the first step, test data including malware and benign files are collected. The second step extracts malware behavioral features by tracking malware behavior. In the third step, useful patterns are extracted from the test data using system call dependency sequences. Finally, in the fourth step, the nature of a program is determined using the extracted patterns. In this section, we present these four steps in detail.

3.1 Test data collection

In most existing approaches, the behavioral model is created using the dataset presented in [18]. This dataset includes information about malware behavior based on some predefined system calls. Since our proposed approach covers a wide range of API calls, the executable file of each malware sample is needed. Therefore, the Virus Sign malware database [19] is used to collect the malware executables. We collect 18629 malware and 15460 benign programs in total. The benign programs are mainly obtained from the Program Files and Windows directories of the Windows operating system. The malware families used to produce test data and their corresponding counts are shown in Table 1.

Table 1 Malware families used to produce test data

| Malware family | Count |
|---|---|
| Trojan | 11735 |
| Virus | 4060 |
| Worm | 1413 |
| Backdoor | 1388 |
| Rootkit | 33 |
| Sum | 18629 |

3.2 Feature extraction

To extract the behavior of malware, it needs to be run in an operating system. Running malware, however, affects the operating system; hence, most existing approaches use virtual machines and sandboxes to run malware. In Markhor, we use VirtualBox as the virtual machine, with Windows XP as the guest operating system. We also use API Monitor [20] to track the file's behavior. This software accepts a malware executable as input and records its API calls at run time. Furthermore, by running the malware, data and control dependency sequences are extracted. Note that the collected malware does not use red pills at run time. Using red pills, malware can recognize the virtual machine environment and hide its real behaviour. Each malware sample is run for 5 min. During this time, its behaviour is extracted and logged as a sequence of API calls with their arguments. A part of the intercepted API calls for the notepad.exe sample file is shown in Fig. 2.

Fig. 2 A part of intercepted API calls for the notepad.exe sample file

3.3 Pattern extraction from test data

The aim of this step is to extract malicious patterns from the dataset. These patterns are extracted based on control dependency and data dependency, which are described in the rest of this section.

3.3.1 System call dependency sequence (SCDS) extraction

In Markhor, system calls and their dependencies are modeled in a novel way. We use the sample code shown in Code 1 to explain the proposed method. As shown in Code 1, the program attempts to read or write a file using a sequence of system calls. Condition1 determines the type of operation on the file, which can be either a read or a write. Since this paper is intended for dynamic analysis of malware (run-time analysis), we assume that the condition holds, so the write operation is performed on the file. The process of system call dependency sequence (SCDS) extraction is described in the following steps.

    UNICODE_STRING uniName;
    OBJECT_ATTRIBUTES objAttr;
    // Refer to a file by its object name
    RtlInitUnicodeString(&uniName, L"\\DosDevices\\C:\\WINDOWS\\example.txt");
    InitializeObjectAttributes(&objAttr, &uniName,
                               OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE,
                               NULL, NULL);
    // Obtain a file handle
    HANDLE handle;
    NTSTATUS ntstatus;
    IO_STATUS_BLOCK ioStatusBlock;
    if (KeGetCurrentIrql() != PASSIVE_LEVEL)
        return STATUS_INVALID_DEVICE_STATE;
    ntstatus = ZwCreateFile(&handle, GENERIC_WRITE, &objAttr, &ioStatusBlock,
                            NULL, FILE_ATTRIBUTE_NORMAL, 0, FILE_OVERWRITE_IF,
                            FILE_SYNCHRONOUS_IO_NONALERT, NULL, 0);
    // Declared before the branch so that both branches can use the buffer
    #define BUFFER_SIZE 30
    CHAR buffer[BUFFER_SIZE];
    size_t cb;
    if (Condition1)
    {
        // Write to a file
        if (NT_SUCCESS(ntstatus))
        {
            ntstatus = RtlStringCbPrintf(buffer, sizeof(buffer), "This is %d test\r\n", 0x0);
            if (NT_SUCCESS(ntstatus))
            {
                ntstatus = RtlStringCbLength(buffer, sizeof(buffer), &cb);
                if (NT_SUCCESS(ntstatus))
                {
                    ntstatus = ZwWriteFile(handle, NULL, NULL, NULL, &ioStatusBlock,
                                           buffer, cb, NULL, NULL);
                }
            }
            ZwClose(handle);
        }
    }
    else // Not Condition1
    {
        // Read from a file
        LARGE_INTEGER byteOffset;
        if (NT_SUCCESS(ntstatus))
        {
            byteOffset.LowPart = byteOffset.HighPart = 0;
            ntstatus = ZwReadFile(handle, NULL, NULL, NULL, &ioStatusBlock,
                                  buffer, BUFFER_SIZE, &byteOffset, NULL);
            if (NT_SUCCESS(ntstatus))
            {
                buffer[BUFFER_SIZE - 1] = '\0';
                DbgPrint("%s\n", buffer);
            }
            ZwClose(handle);
        }
    }

Code 1 Code sample for reading and writing a file.

Step one: System call control dependency sequence (SCCDS) extraction: Since the program is analyzed dynamically (and not statically), the system call control dependency forms a sequence rather than a graph. Fig. 3 shows the system call control dependency sequence for sample Code 1. According to this figure, the system call control dependency sequence (SCCDS) for Code 1 is as follows:

sccds(code1) = {1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9 → 10 → 11}

Fig. 3 System call control dependency sequence for the code shown in Code 1

It is very important to eliminate dependencies between system calls that are not logically and semantically related. Therefore, it is necessary to identify data dependencies between system calls rather than sequential dependencies.

Step two: System call data dependency sequence (SCDDS) extraction: To find the semantic dependence between system calls, we use the data dependency among these calls. The parameters of system calls are mainly of type in or out. A data dependency between system calls occurs when the output of one system call is the input of another. The way these dependencies are extracted is described below.

– Def-use pair: In this part, def-use pairs are extracted for each system call. For each system call, definitions are its parameters of type output. Uses are its non-constant parameters of type input that were previously defined in another system call. To extract def-use pairs, the types of the parameters must be determined. Table 2 shows the types of the parameters of the system calls in sample Code 1. It is worth mentioning that, since in most cases the return value of a system call is of type Boolean or NTSTATUS, the focus of this paper is on system call parameters, not on their return values. After determining the types of the system call parameters, we can extract the def-use pairs for each system call. Table 3 shows these pairs.

Table 2 Types of system call parameters

| Method | Parameter properties |
|---|---|
| RtlInitUnicodeString | Par1 = Out, Par2 = In (optional) |
| InitializeObjectAttributes | Par1 = Out, Par2 = In, Par3 = In, Par4 = In, Par5 = In (optional) |
| KeGetCurrentIrql | – |
| ZwCreateFile | Par1 = Out, Par2 = In, Par3 = In, Par4 = Out, Par5 = In (optional), Par6 = In, Par7 = In, Par8 = In, Par9 = In, Par10 = In (optional), Par11 = In |
| NT_SUCCESS | Par1 = In |
| RtlStringCbPrintf | Par1 = Out, Par2 = In, Par3 = In |
| RtlStringCbLength | Par1 = In, Par2 = In, Par3 = Out (optional) |
| ZwWriteFile | Par1 = In, Par2 = In (optional), Par3 = In (optional), Par4 = In (optional), Par5 = Out, Par6 = In, Par7 = In, Par8 = In (optional), Par9 = In (optional) |
| ZwReadFile | Par1 = In, Par2 = In (optional), Par3 = In (optional), Par4 = In (optional), Par5 = Out, Par6 = Out, Par7 = In, Par8 = In (optional), Par9 = In (optional) |
| ZwClose | Par1 = In |

Table 3 Def-use pairs extraction for system calls for sample Code 1

| Node | Method name | Def | Use |
|---|---|---|---|
| 1 | RtlInitUnicodeString | uniName | (par3:L\\DosDevices\\C:\\WINDOWS\\example.txt) (par4:OBJ_CASE_INSENSITIVE \| OBJ_KERNEL_HANDLE) |
| 2 | InitializeObjectAttributes | objAttr | – |
| 3 | KeGetCurrentIrql | – | – |
| 4 | ZwCreateFile | handle, ioStatusBlock | objAttr (par2:GENERIC_WRITE) (par6:FILE_ATTRIBUTE_NORMAL) (par7:0) (par8:FILE_OVERWRITE_IF) (par9:FILE_SYNCHRONOUS_IO_NONALERT) (par11:0) |
| 5 | NT_SUCCESS | – | ntstatus |
| 6 | RtlStringCbPrintf | buffer | buffer (par3:"This is %d test\r\n") |
| 7 | NT_SUCCESS | – | ntstatus |
| 8 | RtlStringCbLength | cb | buffer |
| 9 | NT_SUCCESS | – | ntstatus |
| 10 | ZwWriteFile | ioStatusBlock | handle, buffer, cb |
| 11 | ZwClose | – | handle |

– Reaching definition extraction: Using the def-use pairs and the system call control dependency sequence, the data dependency among the calls can be determined. To do so, four sets are defined as follows:
  – Gen: Set of definitions (out parameters) made by a system call.
  – In: Set of definitions from the previous system calls that reach the current system call (according to the system call control dependency sequence).
  – Kill: Set of definitions from the previous system calls that reach the current system call but are killed by being redefined in the current system call.
  – Out: Set of definitions leaving the current system call towards the next system calls.

Values in these sets are shown as ordered pairs (i, j), in which i is the number of the function in the SCDS in which variable j is used, defined, entered or left. The dependency between these sets is shown in Table 4. Symbol B refers to a system call in the system call control dependency sequence. Reaching definitions are then extracted using Algorithm 1. The resulting sets are shown in Table 5.

Table 4 The dependencies between sets In, Kill, Out, and Gen

$$ In[B] = \bigcup_{p \in Predecessor(B)} Out[p] $$
$$ Kill[B] = \{ (i, j) \mid (i, j) \in In[B],\ (k, j) \in Gen[B] \} $$
$$ Out[B] = Gen[B] \cup (In[B] - Kill[B]) $$

Algorithm 1: Reaching Definitions Extraction Algorithm
Data: Gen for each node n
Result: In and Out for each node n
    initialization;
    n = SCCDS start node;
    repeat
        In[n] = Out[P];    // where P is the parent of n; In is empty for the start node
        Kill[n] = {(i, j) | (i, j) ∈ In[n], (k, j) ∈ Gen[n]};
        Out[n] = Gen[n] ∪ (In[n] − Kill[n]);
        n = child node of n;
    until n = null;

– Def-use chain extraction: To extract the def-use chain, Algorithm 2 is used to show where each definition is used. The def-use chain for sample Code 1 is extracted and shown in Table 6.

Algorithm 2: Def-use Chain Extraction Algorithm
Data: A system call flow graph for which the In sets for reaching definitions have been computed for each node n.
Result: DUChain: a set of definition-use pairs.
/* Method: Visit each node in the control flow graph. For each node, use upwards exposed uses and reaching definitions to form definition-use chains. */
    initialization;
    DUChain = ∅;
    foreach node n do
        foreach use U in n do
            foreach reaching definition D in In[n] do
                if D is a definition of v and U is a use of v then
                    DUChain = DUChain ∪ (D, U)
                end
            end
        end
    end

So the def-use chain for Code 1 based on Table 6 is as follows:

{(2 : objAttr, 4), (6 : buffer, 8), (4 : handle, 10), (6 : buffer, 10), (8 : cb, 10), (4 : handle, 11)}
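The reaching-definition step (Algorithm 1) can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the `GEN` dictionary is transcribed by hand from the Def column of Table 3, and the function name and the `(node, variable)` tuple encoding are choices made for this sketch, not part of Markhor's implementation. Because the SCCDS is a sequence, each node has exactly one predecessor and a single pass suffices.

```python
from typing import Dict, List, Set, Tuple

Definition = Tuple[int, str]  # (node that defines the variable, variable name)

# Out-parameters (Gen) of each SCCDS node of Code 1, transcribed from Table 3.
GEN: Dict[int, Set[str]] = {
    1: {"uniName"}, 2: {"objAttr"}, 3: set(),
    4: {"handle", "ioStatusBlock"}, 5: set(), 6: {"buffer"},
    7: set(), 8: {"cb"}, 9: set(), 10: {"ioStatusBlock"}, 11: set(),
}
SCCDS: List[int] = list(range(1, 12))  # 1 -> 2 -> ... -> 11


def reaching_definitions(sccds: List[int], gen: Dict[int, Set[str]]):
    """Walk the control dependency sequence once; return (In, Kill, Out) per node."""
    in_sets: Dict[int, Set[Definition]] = {}
    kill_sets: Dict[int, Set[Definition]] = {}
    out_sets: Dict[int, Set[Definition]] = {}
    previous_out: Set[Definition] = set()  # In is empty for the start node
    for n in sccds:
        in_n = set(previous_out)
        # A reaching definition is killed when node n redefines the same variable.
        kill_n = {(i, var) for (i, var) in in_n if var in gen[n]}
        out_n = {(n, var) for var in gen[n]} | (in_n - kill_n)
        in_sets[n], kill_sets[n], out_sets[n] = in_n, kill_n, out_n
        previous_out = out_n
    return in_sets, kill_sets, out_sets


IN, KILL, OUT = reaching_definitions(SCCDS, GEN)
print(sorted(KILL[10]))  # definitions killed by ZwWriteFile
print(sorted(OUT[11]))   # definitions leaving ZwClose
```

Under these inputs the only non-empty Kill set is at node 10, where ZwWriteFile redefines ioStatusBlock, which agrees with Table 5.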


Table 5 Reaching definitions extraction for sample Code 1 according to Algorithm 1

| Node | Gen | In | Kill | Out |
|---|---|---|---|---|
| 1 | (1, uniName) | Null | Null | (1, uniName) |
| 2 | (2, objAttr) | (1, uniName) | Null | (1, uniName) (2, objAttr) |
| 3 | Null | (1, uniName) (2, objAttr) | Null | (1, uniName) (2, objAttr) |
| 4 | (4, handle) (4, ioStatusBlock) | (1, uniName) (2, objAttr) | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) |
| 5 | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) |
| 6 | (6, buffer) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) |
| 7 | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) |
| 8 | (8, cb) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) (8, cb) |
| 9 | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) (8, cb) | Null | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) (8, cb) |
| 10 | (10, ioStatusBlock) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) (8, cb) | (4, ioStatusBlock) | (1, uniName) (2, objAttr) (4, handle) (6, buffer) (8, cb) (10, ioStatusBlock) |
| 11 | Null | (1, uniName) (2, objAttr) (4, handle) (6, buffer) (8, cb) (10, ioStatusBlock) | Null | (1, uniName) (2, objAttr) (4, handle) (6, buffer) (8, cb) (10, ioStatusBlock) |
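Building on the In sets of Table 5, the def-use chain step (Algorithm 2) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: `GEN` and `USE` are transcribed by hand from the Def and Use columns of Table 3 (constant arguments omitted, `Handle` normalized to `handle`), and the helper names `in_sets` and `def_use_chain` are hypothetical. Return values such as `ntstatus` appear as uses but are never defined by a parameter, so they produce no chain entry.

```python
from typing import Dict, List, Set, Tuple

# Gen (out-parameters) and Use (non-constant in-parameters) per node, from Table 3.
GEN: Dict[int, Set[str]] = {
    1: {"uniName"}, 2: {"objAttr"}, 3: set(),
    4: {"handle", "ioStatusBlock"}, 5: set(), 6: {"buffer"},
    7: set(), 8: {"cb"}, 9: set(), 10: {"ioStatusBlock"}, 11: set(),
}
USE: Dict[int, Set[str]] = {
    1: set(), 2: set(), 3: set(), 4: {"objAttr"}, 5: {"ntstatus"},
    6: {"buffer"}, 7: {"ntstatus"}, 8: {"buffer"}, 9: {"ntstatus"},
    10: {"handle", "buffer", "cb"}, 11: {"handle"},
}
SCCDS: List[int] = list(range(1, 12))


def in_sets(sccds: List[int], gen: Dict[int, Set[str]]) -> Dict[int, Set[Tuple[int, str]]]:
    """Compact reaching-definition pass: In[n] is the Out set of the previous node."""
    result: Dict[int, Set[Tuple[int, str]]] = {}
    out: Set[Tuple[int, str]] = set()
    for n in sccds:
        result[n] = set(out)
        survivors = {(i, v) for (i, v) in out if v not in gen[n]}  # In - Kill
        out = {(n, v) for v in gen[n]} | survivors
    return result


def def_use_chain(sccds, use, reaching) -> Set[Tuple[int, str, int]]:
    """Pair every use at node n with the reaching definition of the same variable."""
    chain: Set[Tuple[int, str, int]] = set()
    for n in sccds:
        for var in use[n]:
            for (def_node, def_var) in reaching[n]:
                if def_var == var:
                    chain.add((def_node, var, n))  # (defining node, variable, using node)
    return chain


DU_CHAIN = def_use_chain(SCCDS, USE, in_sets(SCCDS, GEN))
for entry in sorted(DU_CHAIN):
    print(entry)
```

With the inputs above, the six triples produced correspond to the def-use chain reported for Code 1, for example `(2, 'objAttr', 4)` for the pair written as (2 : objAttr, 4).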

Table 6 Def-use chain extraction for the code given in Code 1

| Node | Use | Def = In | DUChain |
|---|---|---|---|
| 1 | – | Null | Null |
| 2 | – | (1, uniName) | Null |
| 3 | – | (1, uniName) (2, objAttr) | Null |
| 4 | (4, objAttr) | (1, uniName) (2, objAttr) | {(2:objAttr, 4)} |
| 5 | (5, ntstatus) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) | {(2:objAttr, 4)} |
| 6 | (6, buffer) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) | {(2:objAttr, 4)} |
| 7 | (7, ntstatus) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) | {(2:objAttr, 4)} |
| 8 | (8, buffer) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) | {(2:objAttr, 4), (6:buffer, 8)} |
| 9 | (9, ntstatus) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) (8, cb) | {(2:objAttr, 4), (6:buffer, 8)} |
| 10 | (10, handle) (10, buffer) (10, cb) | (1, uniName) (2, objAttr) (4, handle) (4, ioStatusBlock) (6, buffer) (8, cb) | {(2:objAttr, 4), (6:buffer, 8), (4:handle, 10), (6:buffer, 10), (8:cb, 10)} |
| 11 | (11, handle) | (1, uniName) (2, objAttr) (4, handle) (6, buffer) (8, cb) (10, ioStatusBlock) | {(2:objAttr, 4), (6:buffer, 8), (4:handle, 10), (6:buffer, 10), (8:cb, 10), (4:handle, 11)} |

According to the def-use chain, we calculate the longest paths to extract the data dependencies among system calls, which are shown in Table 7. According to this table, the system call data dependency sequence for sample Code 1 is as follows:

scdds(code1) = {2 → 4 → 10, 2 → 4 → 11, 6 → 8 → 10, 6 → 10}

Table 7 System call data dependency sequence for sample Code 1

| Sequence | System calls |
|---|---|
| 2 → 4 → 10 | InitializeObjectAttributes → ZwCreateFile → ZwWriteFile |
| 2 → 4 → 11 | InitializeObjectAttributes → ZwCreateFile → ZwClose |
| 6 → 8 → 10 | RtlStringCbPrintf → RtlStringCbLength → ZwWriteFile |
| 6 → 10 | RtlStringCbPrintf → ZwWriteFile |

Step three: System call dependency sequence (SCDS) extraction: In this step, the system call dependency sequence set is extracted from the control and data dependency sequences of the previous steps. The system call dependency sequence set for a suspicious file f is calculated with the following equation:

$$ scds(f) = scdds(f) \cup \{ n \mid n \text{ has no data dependency} \} \qquad (1) $$

So, according to Eq. 1, the SCDS for Code 1 is as follows:

scds(code1) = {2 → 4 → 10, 2 → 4 → 11, 6 → 8 → 10, 6 → 10} ∪ {1, 3, 5, 7, 9}

3.3.2 Assigning weights to the sequences

Each system call sequence has a weight that specifies its importance in determining the nature of the software. Similar to [21], we consider two parameters called LBF and LBC. The first parameter, LBF, is the probability that the sequence is malicious, and the second one, LBC, is the probability that the sequence is benign. For a system call sequence λ, LBF and LBC are calculated from the malware and benign datasets, respectively, using the equations below.
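Going back to steps two and three of Sect. 3.3.1, the sequences of Table 7 and the set of Eq. 1 can be reproduced with a short Python sketch. It reads "longest path" as every path from a node with no incoming def-use edge to a node with no outgoing one; that reading is an assumption of this sketch, chosen because it matches Table 7 (including the shorter 6 → 10). The function names and the triple encoding of the def-use chain are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Def-use chain of Code 1 as (defining node, variable, using node).
DU_CHAIN: List[Tuple[int, str, int]] = [
    (2, "objAttr", 4), (6, "buffer", 8), (4, "handle", 10),
    (6, "buffer", 10), (8, "cb", 10), (4, "handle", 11),
]
SCCDS: List[int] = list(range(1, 12))


def scdds(du_chain: List[Tuple[int, str, int]]) -> List[Tuple[int, ...]]:
    """Enumerate all source-to-sink paths of the def-use graph."""
    successors: Dict[int, Set[int]] = {}
    has_incoming: Set[int] = set()
    for def_node, _var, use_node in du_chain:
        successors.setdefault(def_node, set()).add(use_node)
        has_incoming.add(use_node)
    sources = sorted(set(successors) - has_incoming)

    paths: List[Tuple[int, ...]] = []

    def walk(node: int, path: List[int]) -> None:
        nxt = sorted(successors.get(node, ()))
        if not nxt:  # sink reached: the path cannot be extended
            paths.append(tuple(path))
            return
        for m in nxt:  # edges always point forward in the SCCDS, so this terminates
            walk(m, path + [m])

    for s in sources:
        walk(s, [s])
    return paths


def scds(sccds: List[int], du_chain: List[Tuple[int, str, int]]):
    """Eq. 1: data dependency sequences plus the nodes that have no data dependency."""
    paths = scdds(du_chain)
    dependent = {n for p in paths for n in p}
    isolated = [n for n in sccds if n not in dependent]
    return paths, isolated


PATHS, ISOLATED = scds(SCCDS, DU_CHAIN)
print(PATHS)
print(ISOLATED)
```

For Code 1 this yields the four sequences of Table 7 and the isolated nodes 1, 3, 5, 7 and 9, i.e. the two operands of the union in Eq. 1.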

$$ LBF(\lambda) = \frac{\#\ \text{of visited sequences } \lambda \text{ in the malware dataset}}{\#\ \text{of malware samples}} $$

$$ LBC(\lambda) = \frac{\#\ \text{of visited sequences } \lambda \text{ in the benign dataset}}{\#\ \text{of benign samples}} $$

After calculating LBF and LBC, we determine the effect of each one on finding the program behavior. The sequence weight function, ω(λ), is introduced for the sequence λ as follows:

$$ \omega(\lambda) = \begin{cases} LBF_\lambda \times (\text{total number of benign samples}) & \text{if } LBC_\lambda = 0 \\ \dfrac{LBF_\lambda}{LBC_\lambda} & \text{else if } LBF_\lambda > LBC_\lambda \\ 0 & \text{else if } LBF_\lambda \le LBC_\lambda \end{cases} $$

After assigning scores to the sequences, the sequences with scores higher than zero are added to the database DB as the malicious rule set.

Table 8 Calculating the best value for Θ

| Θ | TP | FP | Accuracy | Precision | F-measure |
|---|---|---|---|---|---|
| 20 | 0.991 | 0.0335 | 0.979 | 0.967 | 0.979 |
| 25 | 0.99 | 0.033 | 0.979 | 0.968 | 0.979 |
| 30 | 0.989 | 0.031 | 0.979 | 0.970 | 0.979 |
| 36 | 0.987 | 0.028 | 0.980 | 0.972 | 0.980 |
| 37 | 0.987 | 0.024 | 0.982 | 0.976 | 0.982 |
| 38 | 0.986 | 0.0255 | 0.980 | 0.975 | 0.980 |
| 40 | 0.982 | 0.024 | 0.979 | 0.976 | 0.979 |
| 45 | 0.984 | 0.025 | 0.980 | 0.975 | 0.980 |

3.4 Malware detection

Once the malicious rules are specified, we can detect the nature of a suspicious file. For each suspicious file f, its system call dependency sequence is extracted and compared with the malicious rules available in the database DB. If they match, the sequence weight, ω, contributes to the final score. The final score is the sum of the scores of the detected malicious rules, as defined in Eq. 2 over the file's system call dependency sequences.
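The weighting of Sect. 3.3.2 and the scoring just described can be sketched together in Python. Several points are assumptions of this sketch and not statements of the paper: each sample is represented as a set of hashable sequences, "# of visited sequences λ" is counted as the number of samples that contain λ, matching is exact set membership, and the names `sequence_weights` and `is_malware` as well as the toy data and threshold are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Sequence = Hashable  # e.g. a tuple of API names describing one SCDS element


def sequence_weights(malware_samples: List[Set[Sequence]],
                     benign_samples: List[Set[Sequence]]) -> Dict[Sequence, float]:
    """Build the rule database DB: sequences with weight > 0 and their weights."""
    if not malware_samples or not benign_samples:
        raise ValueError("both datasets must be non-empty")
    n_mal, n_ben = len(malware_samples), len(benign_samples)
    db: Dict[Sequence, float] = {}
    for seq in set().union(*malware_samples):
        lbf = sum(seq in s for s in malware_samples) / n_mal
        lbc = sum(seq in s for s in benign_samples) / n_ben
        if lbc == 0:
            weight = lbf * n_ben   # never seen in benign programs
        elif lbf > lbc:
            weight = lbf / lbc     # more frequent in malware than in benign programs
        else:
            weight = 0.0           # not discriminative
        if weight > 0:
            db[seq] = weight
    return db


def is_malware(file_scds: Set[Sequence], db: Dict[Sequence, float], theta: float) -> bool:
    """Sum the weights of the matched rules and compare the score with the threshold."""
    score = sum(db[seq] for seq in file_scds if seq in db)
    return score >= theta


# Toy data: four malware and four benign samples over sequences "a" to "d".
malware = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"c"}]
benign = [{"b"}, {"c"}, {"c"}, {"d"}]
DB = sequence_weights(malware, benign)
print(DB)
print(is_malware({"a", "b", "c"}, DB, theta=5))
```

In the toy data, "a" never occurs in benign samples (weight 0.75 × 4 = 3.0), "b" is twice as frequent in malware (weight 2.0), and "c" is equally frequent in both and is therefore dropped. The threshold 5 is only for illustration; the value used for Markhor is selected experimentally.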


A suspicious file f is classified as malware if and only if

$$ \sum_{x \in DB,\ x \in scds(f)} \omega(x) \ge \Theta \qquad (2) $$

where x ∈ DB and x ∈ scds(f). The threshold Θ specifies the minimum score needed to classify a suspicious file as malware. The precise value of Θ is discussed in the Evaluation section.

4 Evaluation

The goal of this evaluation is to find the best value for Θ based on our test dataset. To calculate Θ, the test data discussed in Sect. 3.1 is divided into 10 equal portions. Each time, 9 portions are used as the learning data and one portion as the test data. Eight different values are used for Θ. The results are shown in Table 8. As the table shows, according to the detection rates, the value 37 gives the best result for Θ.

We next compare our approach with several other approaches that can be implemented and tested on our dataset. The comparison results are shown in Table 9.

Table 9 Comparison of the proposed approach and other similar approaches

| Approach | TP | FP | Accuracy | Precision | F-measure |
|---|---|---|---|---|---|
| Sami et al. [12] | 0.941 | 0.0612 | 0.940 | 0.939 | 0.940 |
| Garg and Yadav [11] | 0.833 | 0.091 | 0.871 | 0.902 | 0.866 |
| Suaboot et al. [17] | 0.887 | 0.071 | 0.908 | 0.926 | 0.906 |
| Our approach | 0.987 | 0.024 | 0.982 | 0.976 | 0.982 |

The proposed approach, as shown in Table 9, incurs a low false positive rate due to its use of system call data dependency and of numerous benign programs to extract patterns. Moreover, the proposed approach has a high false positive rate when facing behaviour obfuscation methods such as changing the order of system calls or using fake system calls.

5 Conclusion

In this paper, we present a dynamic and behavior-based approach to detect malware. First, behavioral features including the control and data dependency sequences of system calls are extracted from a database of malware and benign programs. Then, using a fuzzy approach, a value is assigned to each sequence. This value represents the effect of a sequence in recognizing the type of a program. Finally, to detect the type, the program is run, and after extracting its system call dependency sequences and matching them with the sequences available in the database, the type is recognized. Our evaluation results demonstrate 0.982 accuracy, 0.976 precision, and 0.982 F-measure. This approach can also be used to detect the semantic relation between API calls based on their arguments. In the future, we plan to use other features to detect malicious software that uses behavior obfuscation.

References

1. Damodaran, A., Troia, F.D., Visaggio, C.A., Austin, T.H., Stamp, M.: A comparison of static, dynamic, and hybrid analysis for malware detection. J. Comput. Virol. Hacking Tech. 13(1), 1–12 (2017)
2. Scott, J.: Signature Based Malware Detection is Dead, Cybersecurity Think Tank. Institute for Critical Infrastructure Technology (February). www.ICITForum.org
3. Alazab, M., Venkataraman, S., Watters, P.: Towards understanding malware behaviour by the extraction of API calls. In: Proceedings of the 2nd Cybercrime and Trustworthy Computing Workshop, pp. 52–59 (2010). 10.1109/CTC.2010.8
4. Fang, Z., Wang, J., Li, B., Wu, S., Zhou, Y., Huang, H.: Evading anti-malware engines with deep reinforcement learning. IEEE Access 7, 48867–48879 (2019)
5. Martín, A., Menéndez, H.D., Camacho, D.: Studying the influence of static API calls for hiding malware. In: Lecture Notes in Computer Science, vol. 9868, pp. 363–372. Springer (2016)
6. Lopez, J., Babun, L., Aksu, H., Uluagac, A.S.: A survey on function and system call hooking approaches. J. Hardw. Syst. Secur. 1(2), 114–136 (2017)
7. Alazab, M., Venkataraman, S., Watters, P.: Towards understanding malware behaviour by the extraction of API calls. In: Cybercrime and Trustworthy Computing Workshop, pp. 52–59 (2010)
8. Sihwail, R., Omar, K., Ariffin, K.A.: A survey on malware analysis techniques: Static, dynamic, hybrid and memory analysis. Int. J. Adv. Sci. Eng. Inf. Technol. 8(4–2), 1662–1671 (2018)
9. Narouei, M., Ahmadi, M., Giacinto, G., Takabi, H., Sami, A.: DLLMiner: structural mining for malware detection. Secur. Commun. Netw. 8(18), 3311–3322 (2015)
10. Dependency Walker, Dependency Walker (2018). http://www.dependencywalker.com/
11. Garg, V., Yadav, R.K.: Malware detection based on API calls frequency. In: International Conference on Information Systems and Computer Networks, pp. 400–404. IEEE (2019)
12. Sami, A., Yadegari, B., Rahimi, H., Peiravian, N., Hashemi, S., Hamze, A.: Malware detection based on mining API calls. In: Proceedings of the ACM Symposium on Applied Computing, pp. 1020–1025. ACM Press, New York (2010)
13. Qiao, Y., Yang, Y., He, J., Tang, C., Liu, Z.: CBM: free, automatic malware analysis framework using API call sequences. In: Advances in Intelligent Systems and Computing, vol. 214, pp. 225–236. Springer (2014)
14. Tran, T.K., Sato, H.: NLP-based approaches for malware classification from API sequences. In: Symposium on Intelligent and Evolutionary Systems, vol. 2017-Janua, pp. 101–105. Institute of Electrical and Electronics Engineers Inc. (2017)
15. Kim, H., Kim, J., Kim, Y., Kim, I., Kim, K.J., Kim, H.: Improvement of malware detection and classification using API call sequence alignment and visualization. Clust. Comput. 22(1), 921–929 (2019)
16. Fadadu, F.: Evading API call sequence based malware classifiers. In: International Conference on Information and Communications Security, pp. 18–33. Springer, Cham (2019)
17. Suaboot, J., Tari, Z., Mahmood, A., Zomaya, A.Y., Li, W.: Sub-curve HMM: a malware detection approach based on partial analysis of API call sequences. Comput. Secur. 92, 101773 (2020)
18. CWSandbox Data. http://pi1.informatik.uni-mannheim.de/malheur/
19. Virus Sign Malware Data Base. https://www.virussign.com
20. API Monitoring Tool. https://www.rohitab.com/apimonitor
21. Parsa, S., Zareie, F., Vahidi-Asl, M.: Fuzzy clustering the backward dynamic slices of programs to identify the origins of failure. In: Lecture Notes in Computer Science, vol. 6630, pp. 352–363 (2011)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
