
SOA - A Malware Detection System Using A Hybrid Approach of Multi-Heads Attention-Based Control Flow Traces and Image Visualization


Computer Science and Information Engineering

A malware detection system using a hybrid approach of multi-heads
attention-based control flow traces and image visualization
Ullah, F., Srivastava, G. & Ullah, S. A malware detection system using
a hybrid approach of multi-heads attention-based control flow traces
and image visualization. J Cloud Comp 11, 75 (2022).
https://doi.org/10.1186/s13677-022-00349-8

25/09/2024 Student Name : Victori Polly


Student ID : D1229004
Introduction
• Android's Popularity and the Threat of Malware
• Android's rapid growth has created a large community of app developers
and millions of apps for users to download.
• However, with the rise of Android
apps, there's also been a huge
increase in mobile malware (malicious
software that can harm devices).
• Recent surveys show the number of
malicious apps and attacks is growing
rapidly, posing a major threat to
Android users.
Introduction
• Methods to Detect Malware.

• Graph-based methods that use
Control Flow Graphs (CFGs).
• Behavior-based, signature-based,
image-based, and machine
learning methods.
• Machine learning is one of the
most promising methods because
it can detect new, unknown
malware (zero-day threats).
Introduction
• There are three main types of features used in malware detection:
• Static analysis: Looks at the app's code without running it (e.g., examining call graphs and APIs).
• Dynamic analysis: Observes the app's behavior while it's running.
• Hybrid analysis: Combines both static and dynamic features for a more complete view.
Introduction
• The research proposes combining call-
graph analysis (which looks at how an
app's functions interact) and image-
based features (which visualize an app's
structure) to create a hybrid approach
for detecting Android malware.
• Call-graphs are useful for finding
malicious code or URLs, but they can be
affected by techniques like code
obfuscation (hiding the real code).
• Image-based analysis can capture a
broader range of app behaviors, but it
may miss specific malicious code.
Introduction
Problem statement
• Figure 1 depicts a malicious code snippet from the dowgin adware family. The “airpush” API posts the
malicious ads using “static void a (PushAds pushAds, String str)”.
• Such an API can also attempt to obtain detailed information about a genuine app, such as “appId, apikey,
url, campId, createId”, to fully utilize its functions.
• Furthermore, the malicious URL is used as a host app URL to push the malicious ads.
• Such semantic patterns cannot be obtained solely via image visualization. Nevertheless, call-graph
evaluation may be affected by code obfuscation, insertion, reshuffling, etc. Image-based malware
categorization is widely employed because it can gather all kinds of structural data, including memory,
process, header, etc.
• Thus, visual images can be used to capture any type of dynamic or obfuscated data. However, the
conversion can change the overall hierarchy of an Android file, making it impossible to target a particular
malicious code snippet, URL, etc. Aside from that, this technique is entirely based on image characteristics.
• For instance, attackers can tamper with the malware images, affecting classification performance.
• As a result, we combined call-graph features, which detect potentially malicious scripts/URLs, with
textural image features, which detect other potentially hazardous tendencies such as memory or
resource usage. A hybrid strategy can use both effectively to classify malicious and benign files.
The major highlights of this work
are as follows:
• Custom dataset: A reverse-engineering method is used to
create a dataset with Java source code, DEX files (Android
executables), and other resource files.
• API Call Graphs (ACGs): The study extracts call graphs from
Java code and uses a method (based on BERT, a type of deep
learning model) to analyze them.
• Malware-to-image conversion: A method to convert DEX files
into images for structural analysis. Important features are
marked using techniques called FAST and BRIEF.
• Hybrid detection: By combining both call-graph and image-
based features, the approach effectively detects and classifies
malware.
Literature Review
• Android apps require users to give permission for
accessing certain features (like location), but users
often grant these permissions without
understanding the risks.
• One popular method is Control Flow Graphs
(CFG), which represents how an app executes
different parts of its code.
• Arslan et al. proposed a tool that transforms
features from Android apps into vectors to train a
Convolutional Neural Network (CNN),
achieving 96.2% detection accuracy.
• Kumar et al. used a grayscale image method for
detecting malware, achieving 98.34% accuracy.
• Other methods like the use of API call graphs and
function-call graphs have been shown to
achieve very high malware detection rates (up to
98.7%).
Proposed methodology
• Reverse-engineering tools are used to extract the code from APK files (Android app files).
• Figure 2 depicts the proposed method for classifying Android malware, which combines
ACGs with texture features.
• Reverse-engineering tools are used to extract Java source code and DEX files from Android
APKs. A graph-based method is used to extract CFG features from the Java code.
• In malware detection, CFG features are high-level features that must be traversed
each time. As a result, instead of using complete CFG features, this method focuses on
ACG features, which can reduce execution load and extract more specific features.
• Following that, ACGs and texture features are extracted from Java source code and DEX
files for effective malware detection and classification.
Proposed methodology
• Reverse-engineering tools are used to extract the code from APK files
(Android app files)

• Figure 3 depicts the reverse engineering procedure for retrieving Java codes and DEX files.
• To reverse-engineer the application, we would need its APK.
• The APK Extractor file explorer is used to open the extracted APKs folder in the Internal
Storage directory.
• The chosen APKs are copied to system storage so they can be further processed. These
APKs are then reversed to reveal the code.
• This can help us understand the structure of the code and identify the security measures
they have implemented to avoid a reverse engineering attack.
• The [app].apk file is renamed to [app].zip and then unzipped. The classes.dex file, which
contains the app code, can be found in the extracted folder.
• A Dalvik Executable, or DEX file, is an executable file that runs on the Android OS and
contains the compiled code.
• The Jadx decompiler is then used to decompile the DEX file and extract the Java code. In the
proposed work, the Java source code and DEX files are used together to extract
features [21]. The reverse engineering process is shown in Algorithm 1.
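The unzip step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's Algorithm 1: the function name `extract_dex` and the output layout are assumptions made for this example. It relies only on the fact that an APK is a ZIP archive, so no renaming to .zip is needed when reading it programmatically. Decompiling the extracted DEX with Jadx is a separate step and is not shown.

```python
import zipfile
from pathlib import Path


def extract_dex(apk_path: str, out_dir: str) -> list:
    """Copy every top-level classes*.dex entry out of an APK (a ZIP archive).

    Hypothetical helper for illustration; returns the paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            # Multidex apps may also ship classes2.dex, classes3.dex, ...
            if name.startswith("classes") and name.endswith(".dex"):
                target = out / name
                target.write_bytes(apk.read(name))
                extracted.append(target)
    return extracted
```

Calling `extract_dex("app.apk", "out")` on a real APK would leave `out/classes.dex` ready for decompilation or for the image conversion discussed later in the slides.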
Proposed methodology
• Graph-based methods focus on analyzing API Call Graphs (ACGs) instead of the whole
CFG, which makes the analysis faster and
more specific to detecting malicious behavior.
Proposed methodology

• Graph-based Feature Analysis

• The system looks at API usage patterns.
Malicious apps often use specific APIs, like
those that access personal information or
network resources.
• The system examines the sequence of API calls
to better understand how an app behaves.
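To make the idea of an API call sequence concrete, here is a small sketch that walks a call graph and keeps only the framework API calls. Everything in it is hypothetical: the graph is a plain dictionary, the `API_PREFIXES` filter is an assumed heuristic, and the method names are invented for the example. The paper's actual ACG construction may differ.

```python
from typing import Dict, List

# Assumed heuristic: names with these prefixes are treated as API calls.
API_PREFIXES = ("android.", "java.", "javax.")


def api_call_sequence(call_graph: Dict[str, List[str]], entry: str) -> List[str]:
    """Depth-first walk from `entry`; record each API call once, in visit order."""
    sequence: List[str] = []
    visited = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if node.startswith(API_PREFIXES):
            sequence.append(node)
        # Push callees in reverse so they are visited in their listed order.
        stack.extend(reversed(call_graph.get(node, [])))
    return sequence


# Hypothetical call graph: caller -> list of callees.
graph = {
    "com.example.Main.onCreate": [
        "android.telephony.TelephonyManager.getDeviceId",
        "com.example.Ads.push",
    ],
    "com.example.Ads.push": [
        "java.net.URL.openConnection",
        "android.telephony.TelephonyManager.getDeviceId",
    ],
}
calls = api_call_sequence(graph, "com.example.Main.onCreate")
```

A sequence like this could then be tokenized and fed to a sequence model, which is the role the slides assign to BERT.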
Proposed methodology
• Using Deep Learning (BERT Model)
• The method uses multi-head attention with the BERT model
(a natural language processing technique) to analyze patterns
in API calls.
• This helps to better identify malicious behavior by learning
from past examples.
Proposed methodology
• Texture Feature Analysis
• The DEX files extracted from the APKs are also converted into
grayscale images to analyze their "texture."
• Techniques like FAST and BRIEF are used to quickly extract
important visual features from these images.
Proposed
methodology
• Combining Graph & Texture
Features
• Features from both API call graphs and
textures are combined.
• A CNN (Convolutional Neural
Network) is used to extract deep
features, improving malware
classification accuracy.
Proposed methodology
Transfer learning with multi-heads attention
• Figure 5(b) depicts the iterative and simultaneous computations
performed by the attention module of the transformer.
• Each of these computations is known as an attention head. The attention module
splits the query, key, and value parameters N ways, and each split is handled by a
separate head.
• The results of all of these attention computations are then combined to produce
the final attention score.
• This is known as “multi-head attention”, and it improves the
transformer’s capacity to encode relationship dynamics and
refinement for each ACG feature.
• The procedure for extracting train textual features from ACGs is
shown in Algorithm 2.
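The N-way split can be illustrated with a short NumPy sketch of standard scaled dot-product multi-head attention. This follows the usual Transformer formulation, in which head outputs are concatenated and passed through an output projection. It is not the paper's BERT configuration: the dimensions, the random weights, and the function names are all assumptions chosen for the example.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model) with `num_heads` heads."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head): the N-way split.
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Each head computes its own attention weights independently.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)          # (num_heads, seq_len, seq_len)
    heads = weights @ v                         # (num_heads, seq_len, d_head)
    # Concatenate the heads back to (seq_len, d_model) and project.
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


# Toy input: 5 tokens (e.g. API calls) embedded in 8 dimensions, 2 heads.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
out, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=2)
```

Each row of `attn[h]` sums to 1 and shows how strongly one token attends to every other token under head `h`, which is why different heads can specialize in different relationships.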
Proposed methodology
Malware visualization and texture feature
extraction
• Figure 6 depicts images with a resolution of 256x256 extracted from the DEX files of adware families
such as dowgin, ewind, feiwo, and gooligan.
• Converting a DEX file to an image shrinks a large file down to a more manageable size.
• For instance, the DEX in the image is reduced from megabytes to kilobytes.
• Consequently, it may be feasible to decrease computational resources. By combining FAST and BRIEF,
texture features are then retrieved from DEX images [26].
• The FAST extractor performs its calculations quickly. To test whether a pixel (p) is a corner, it examines
the 16 pixels, numbered 1 to 16, on a Bresenham circle around p.
• If a contiguous run of N of these 16 pixels are all brighter than p, or all darker than p, by more than a
threshold, p is marked as a feature point.
• Because BRIEF is only a feature descriptor, features are extracted and described using the FAST corner
extractor. For ease of use, the implementation procedure is divided into three phases.

• The image is first loaded into memory.

• A copy of the image is generated that is identical in terms of scaling and rotation.

• The combination of the BRIEF descriptor and FAST extractor is used to highlight features.
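The two ideas on this slide, byte-to-pixel visualization and the FAST segment test, can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's pipeline: the mapping of one byte to one pixel, the subsampling used to fit 256x256, the threshold of 20, and the run length of 12 are all choices made for this example, and the BRIEF descriptor step is not shown.

```python
from typing import List

# Offsets (dx, dy) of the 16 pixels on a radius-3 Bresenham circle, clockwise from the top.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def bytes_to_image(data: bytes, size: int = 256) -> List[List[int]]:
    """Map raw bytes to a size x size grayscale image, one byte per pixel.

    Assumption: long inputs are subsampled evenly, short inputs are zero-padded.
    """
    n = size * size
    if len(data) > n:
        pixels = [data[i * len(data) // n] for i in range(n)]
    else:
        pixels = list(data) + [0] * (n - len(data))
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


def is_fast_corner(img: List[List[int]], x: int, y: int,
                   threshold: int = 20, n: int = 12) -> bool:
    """FAST segment test at (x, y); the caller keeps (x, y) at least 3 pixels from the border."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1 checks "brighter than p", -1 checks "darker than p"
        flags = [sign * (v - p) > threshold for v in ring]
        run = 0
        # Walk the ring twice so a run that wraps past pixel 16 is still counted.
        for f in flags + flags:
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False
```

In practice a library implementation would normally be used for FAST and BRIEF; the sketch only shows what the segment test checks at a single pixel.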
Proposed methodology
• Stacked Generalization (Ensemble Learning)
• The final detection model uses multiple classifiers (like
Naïve Bayes, Decision Trees, and Support Vector Machines)
combined with a meta-learner (Logistic Regression).
• This ensemble method improves the accuracy by learning
the best combination of predictions from different models.
Results and
Discussion
• In Fig. 7, the individual learners are the level-0 learners, and the combiner is the level-1 learner.
Specific information regarding the stacked generalization follows.

• Level-0: This is also known as the base learner. The deep features are divided into training and testing sets,
and the training set is then used to generate base learners via base learning models. We combine several
models to work as base learners, including Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM)
with Radial Basis Function (RBF), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), and
Multi-Layer Perceptron (MLP). Using out-of-sample data, a prediction is made for each base learner.

• Level-1: This is also known as the meta-learner. The outcomes of the base learners are fed to the meta-learner
as its input data, and a single meta-learner learns to make accurate malware detections from this data. We used
Logistic Regression (LR) as the meta-learner. To prevent overfitting, the meta-learner is trained on a different
dataset than the instances used to train the base learners. The testing part of the deep features is used to train
the meta-learner.

• When compared to individual models, the stacked ensemble achieves better malware detection and
classification results. It is capable of optimizing the best linear combinations of models. This enables us to
obtain the optimal blend of diversity from each model and achieve the highest level of detection accuracy.
However, the computation time for a stacked ensemble is longer than for any single model. Algorithm 4
depicts the process of detecting malware using hybrid features.
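The level-0 and level-1 arrangement can be sketched with scikit-learn's `StackingClassifier`. Two things here are assumptions rather than the paper's setup: synthetic data from `make_classification` stands in for the deep features, and `StackingClassifier` trains the meta-learner on cross-validated out-of-fold predictions of the base learners, which is a common way to keep the meta-learner's training data separate but is not the same split the slide describes. Hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the deep features (assumption: binary malware/benign labels).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Level-0: the six base learners named on the slide.
base_learners = [
    ("gnb", GaussianNB()),
    ("svm_rbf", SVC(kernel="rbf", random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# Level-1: logistic regression fitted on out-of-fold base-learner predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

The extra cost mentioned on the slide is visible here: with `cv=5`, every base learner is fitted several times before the meta-learner is trained.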
Results and Discussion
• The experimental setup and evaluation of results are carried out to verify the
effectiveness of the proposed system.
• We prepared a customized dataset from CIC-InvesAnd-Mal2019 [31] by using reverse
engineering and data mining tools. Originally, the dataset is available in the form of
APKs.
• It includes four types of malware: adware, ransomware, scareware, and SMS.
Each malware type is further subdivided into 10 to 11 families. This dataset was
compiled by installing 5,000 samples on real Android devices.
• These samples originated from 42 distinct families of 342 malicious Android apps, as
shown in Table 3.
• These APKs are thoroughly analyzed to unbox and prepare our customized dataset for
effective malware detection, as shown in Table 4.
• The Java source code and DEX files are obtained by reverse engineering the
Android APKs.
• There are approximately 3.2K ACGs collected from adware and ransomware, and 3.4K
ACGs collected from scareware and SMS.
• Similarly, the proposed method crawls the train and texture features with 8.4K for both
adware and ransomware and 8.6K for scareware and SMS. These features are combined
further to extract deep features for improved malware classification results.

Results and Discussion
• The comparison of the five malware detection performance measures
is shown in Table 5. The KNN model has the lowest performance, with
precision, recall, F1-score, MCC, and accuracy of 96%, 98%, 97%,
97.42%, and 97.12%, respectively.
• However, the proposed ensemble model performs best, with
precision, recall, F1-score, MCC, and accuracy of 99%, 99%, 99%,
99.14%, and 99.27%, respectively.
• The MLP comes in second place after the ensemble.
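For reference, the five measures can be computed from a binary confusion matrix as in the sketch below. The counts used in the example are hypothetical and are not taken from Table 5. The function assumes a non-degenerate matrix, so no denominator is zero.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1-score, MCC, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Matthews correlation coefficient: ranges from -1 to 1, uses all four cells.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"precision": precision, "recall": recall, "f1": f1,
            "mcc": mcc, "accuracy": accuracy}


# Hypothetical counts: 99 of 100 malware and 99 of 100 benign samples classified correctly.
metrics = binary_metrics(tp=99, fp=1, fn=1, tn=99)
```

With these made-up counts, precision, recall, F1-score, and accuracy all come out at 0.99 while MCC is 0.98, which shows that MCC can differ from accuracy even on a balanced set.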
Results and Discussion
• When compared to the base learners, the stacked ensemble
as meta learner performs the best.
• Table 6 shows the performance comparison for malware
detection for both the malware and benign classes. The precision,
recall, and F1-score for each class are presented using the
base learner and meta learner.
• The stacked ensemble performed the best, with (100%, 98%,
and 98%) for malware and (97%, 99%, and 99%) for benign.
Results and Discussion
• Figure 8 shows the training and testing epoch curves for malware detection using accuracy, loss,
precision, and recall.
• The colours blue, red, orange, and green represent the accuracy, loss, precision, and recall,
respectively.
• Using the training data in part a, the accuracy begins at 80% and increases to 99% by the 20th
epoch.
• The loss begins at 97% and gradually decreases with each epoch.
• The loss is approximately 5% on the 28th epoch and then becomes more or less constant.
Similarly, precision and recall begin at 70% and 50%, respectively, and gradually increase to 98%
in the 20th epoch.
• The inverse relationship between accuracy and loss indicates that the proposed model performs
well on training data.
• Part b also shows that the accuracy and loss are inversely related, indicating that the model
performs well on test data.
• In the 15th epoch, there is a drop of up to 70% in accuracy, precision, and recall, with the loss
increasing to 29%.
• Overall, the three performance measures provide 99% performance on the 23rd epoch and are
more or less constant after that. In addition, the normal behaviour of these dynamic curves
Results and Discussion

The performance comparison for malware classification is shown in Table 7. The ensemble
provides the best classification results, with precision, recall, F1-score, MCC, and accuracy of
100%, 98%, 98%, 98.52%, and 99.17%, respectively. While the SVM-rbf achieves the lowest
• Figure 9 shows the training and testing epoch curves for malware classification
using accuracy, loss, precision, and recall.
• In part a, using training data, the accuracy curve starts from 50% and gradually
increases to reach 83% on the 20th epoch.
• It then continues to rise and reaches 98% in the 40th epoch. After that, it is more or
less constant. Conversely, the loss starts from 75% and then drops gradually
to 20% in the 22nd epoch.
• It then falls to about 4% and is more or less constant after the 40th epoch. The
precision and recall behave close to accuracy, which indicates that the proposed
approach performs well on training data.
• In part b, the same performance measures are shown for testing data. The
accuracy, precision, and recall fluctuate abruptly at times but still reach the best
performance.
• There is a brief drop in these measures to 75% and an increase in loss to 32%, but after that,
they behave normally.
Results and
Discussion

• Figure 10 depicts the malware classification for each type of malware, namely adware,
ransomware, scareware, and SMS.
• The precision, recall, and F1-score are indicated by the blue, orange, and gray colours.
The recall is lowest when using base and meta learners, while the F1-score is the best.
• However, accuracy yields the best results for ransomware and scareware when using
ensemble, while it yields the worst results for adware when using LR and SVM-rbf.
• There is a drop in accuracy and F1-score of up to 84% when using SVM-rbf for adware,
indicating that this base learner provides the worst classification results. The ensemble
produces the best results overall.
Results and
Discussion

• Figure 11 depicts the confusion matrices, which can be used to investigate
classification and misclassification for malware detection.
• The confusion matrix is provided for each base learner and ensemble.
• The blue diagonal values represent classification values, while the off-diagonal
values represent misclassification. The ensemble model produces the best
results, with classification rates of 99% and 99% and misclassification rates of
1% and 1% for malware and benign, respectively.
Results and Discussion
• Figure 12 depicts the confusion matrices for
malware classification.
• It is once again demonstrated that SVM-rbf
has the lowest performance while the
ensemble has the highest.
• For instance, the SVM-rbf classification results for
adware, ransomware, scareware, and SMS
are 93%, 93%, 92%, and 97%, respectively,
whereas the ensemble has 100%, 98%, 98%,
and 100% for the same classes.
• It is shown that the proposed hybrid results
using the ensemble model outperform the
base learners for each malware variant.
• Table 8 summarizes the performance of the proposed approach for
adware families, which include dowgin, ewind, feiwo, gooligan,
kemoge, koodous, mobidash, selfmite, shuanet, and youmi.
• When compared to others, feiwo, koodous, and shuanet have
the best classification results.
• For feiwo, koodous, and shuanet, the precision, recall, and F1-score
are (99%, 100%, 100%), (100%, 100%, 100%), and (100%, 99%,
100%), respectively.
• However, kemoge and youmi produce the lowest results.
• For instance, the precision, recall, and F1-score for kemoge and
youmi are (97%, 96%, 96%), (98%, 96%, 97%), and (98%, 96%,
97%), respectively.
Conclusion
The paper introduces a new
method to detect malware by
combining two techniques:
• ACGs (API Call Graphs): These
graphs represent the behavior of
an app by tracking its API calls.
• Malware Images: The app’s code
is converted into an image, and
features are extracted from this
image.
Conclusion
• Reverse Engineering: To
analyze an app, its DEX file
(compiled code) and Java
source code are extracted.
• Creating ACGs: API calls are
collected from the app’s control
flow graphs (CFGs) to create
ACGs, which act like a digital
fingerprint of the app’s activity.
• Attention-Based Transfer
Learning: This method uses
multiple heads (like focusing on
different parts of the data) to
extract important features from
ACGs.
Conclusion
• Combining Features: The features from ACGs and
malware images are combined to improve malware
detection accuracy.
• High Accuracy: The proposed method achieves a
high classification accuracy of 99.27% using a
specific dataset. It outperforms other methods,
including one that uses BERT-base with texture
features, which has a 98.52% accuracy.
