Malware Image Classification Using ML/DL
Abstract—The abstract goes here.
Index Terms—Malware Image Classification, ML/DL Techniques, Image Classification

I. INTRODUCTION
Malware image classification is a crucial task in the field of computer security, as it involves identifying and categorizing malicious software into different classes based on behavior and functionality. This task matters because malware attacks are steadily becoming more sophisticated and more common, while traditional signature-based detection methods are becoming less effective at keeping up with the evolving threat landscape. With the advancement of deep learning techniques, it has become possible to leverage computer vision algorithms to automatically classify malware images into various categories, such as Trojans, worms, viruses, and spyware. These techniques have shown promising results in detecting malware in an efficient and scalable manner, without requiring prior knowledge of the specific signatures of each malware family.

There are several pre-trained deep learning models that can be used for malware image classification. Some of the most popular ones are:
1) ResNet
2) InceptionNet
3) VGGNet
4) DenseNet

Fig. 1. ResNet architecture

Models that have already been trained are often offered in a range of configurations, with differing numbers of layers and levels of complexity. These models may be improved by retraining the network's higher layers on a fresh dataset with a lower learning rate (fine-tuning), or they may be used as fixed feature extractors that provide image features for subsequent tasks such as object identification or image captioning.
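The two ways of reusing a pre-trained model mentioned above (fixed feature extractor versus retraining the higher layers with a lower learning rate) can be sketched independently of any framework. The following is only a minimal illustration, assuming a hypothetical network described by an ordered list of layer names; the function name `plan_transfer`, the layer names, and all learning-rate values are illustrative assumptions and are not settings taken from this work.

```python
def plan_transfer(layers, mode, n_top=2, base_lr=1e-3, fine_tune_scale=0.1):
    """Return {layer_name: learning_rate}; 0.0 means the layer is frozen.

    mode="feature_extractor": all pre-trained layers are frozen and only the
        newly added classifier head is trained.
    mode="fine_tune": the last `n_top` pre-trained layers are also retrained,
        with a lower learning rate than the new head.
    All names and values here are hypothetical and chosen for illustration.
    """
    if mode not in ("feature_extractor", "fine_tune"):
        raise ValueError(f"unknown mode: {mode!r}")

    *backbone, head = layers                  # last entry is the new head
    plan = {name: 0.0 for name in backbone}   # frozen by default

    if mode == "fine_tune":
        start = max(len(backbone) - n_top, 0)
        for name in backbone[start:]:         # only the higher layers
            plan[name] = base_lr * fine_tune_scale

    plan[head] = base_lr                      # the new head always trains
    return plan


# Hypothetical layer names for a generic pre-trained CNN.
layers = ["conv1", "block1", "block2", "block3", "block4", "classifier"]

extract_plan = plan_transfer(layers, "feature_extractor")
tune_plan = plan_transfer(layers, "fine_tune")

print(extract_plan)
print(tune_plan)
```

In a real framework the same plan would typically be expressed by disabling gradient updates for the frozen layers and giving the optimizer a separate, smaller learning rate for the retrained ones.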
[...] layers arranged in a parallel fashion.

The InceptionNet architecture is divided into several stages, each of which contains multiple inception modules. The basic building block of an inception module is the "inception block", which consists of several convolutional and pooling layers in parallel. Each parallel path in the inception block performs a different type of convolution operation (e.g. 1x1, 3x3, or 5x5), which allows the network to capture features at different scales. The outputs of all paths are then concatenated along the channel dimension to form a single output tensor.

The architecture known as VGGNet (short for Visual Geometry Group Network) was introduced for image categorization in 2014. VGGNet achieved state-of-the-art performance on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 dataset. VGGNet has a very simple architecture in which all convolutional layers use very small filters (3x3) and each group of convolutional layers is followed by a max pooling layer. The network consists of a series of convolutional and pooling layers, followed by three fully connected (FC) layers at the end. The architecture can be divided into two main parts: the feature extractor and the classifier.
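The shape bookkeeping behind the two architectures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the configuration used in this work: the branch widths of the inception block are illustrative values, the layer list is the common 16-layer VGG layout (the text does not say which VGG variant was used), and the input size of 224 x 224 and the 25 output classes are assumptions made for the example. All function names are hypothetical.

```python
def inception_output_shape(height, width, branch_channels):
    """Concatenate parallel branch outputs along the channel dimension.

    Every branch is assumed to use stride 1 with "same" padding, so height
    and width are unchanged and only the channel counts are added up.
    """
    return (sum(branch_channels.values()), height, width)


# Common 16-layer VGG layout: numbers are output channels of 3x3
# convolutions, "M" marks a 2x2 max pooling layer (an assumed variant).
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def vgg_feature_shape(height, width, layout=VGG16_LAYOUT):
    """Output shape of the feature extractor part of a VGG-style network."""
    channels = 3                              # RGB input
    for item in layout:
        if item == "M":
            height, width = height // 2, width // 2   # pooling halves H, W
        else:
            channels = item                   # 3x3 "same" conv keeps H, W
    return (channels, height, width)


def vgg_classifier_sizes(feature_shape, hidden=4096, num_classes=25):
    """Input/output widths of the three fully connected layers."""
    c, h, w = feature_shape
    return [c * h * w, hidden, hidden, num_classes]


# Illustrative branch widths for one inception block on a 28 x 28 input.
branches = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
inception_out = inception_output_shape(28, 28, branches)

features = vgg_feature_shape(224, 224)
classifier = vgg_classifier_sizes(features)

print(inception_out)   # (256, 28, 28)
print(features)        # (512, 7, 7)
print(classifier)      # [25088, 4096, 4096, 25]
```

The split between `vgg_feature_shape` and `vgg_classifier_sizes` mirrors the division into feature extractor and classifier: only the second part depends on the number of malware families.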
Dataset: Malimg
ClassName         Train   Test
Adialer.c            97     25
Agent.FYI            91     25
Allaple.A          2824    125
Allaple.L          1491    100
Alueron.gen!J       173     25
Autorun.K            81     25
C2LOP.P             121     25
C2LOP.gen!g         175     25
Dialplatform.B      152     25
Dontovo.A           137     25
Fakerean            306     75
Instantaccess       356     75
Lolyda.AA1          153     60
Lolyda.AA2          159     25
Lolyda.AA3           98     25
Lolyda.AT           134     25
Malen.gen!j         111     25
Obfuscator.AD       117     25
Rbot!gen            133     25
Skittrim.N           55     25
Swissor.gen!E       103     25
Swissor.gen!I       107     25
VB.AT               383     25
Wintrim.Bx           72     25
Yuner.A             775     25

Fig. 4. DenseNet architecture

DenseNet is composed of several blocks called "Dense Blocks", where each block consists of a series of convolutional layers, batch normalization, and activation functions. These blocks are connected to each other by transition layers, which reduce the dimensionality of the feature maps by applying a combination of pooling and convolutional operations.

The main advantage of DenseNet is that each layer within a dense block has direct access to the feature maps of all preceding layers, allowing for more effective parameter utilisation. As a result, fewer parameters are required than in many other deep neural network topologies. Moreover, DenseNet has been shown to achieve state-of-the-art performance on a number of computer vision tasks, including object identification and image categorization. DenseNet is therefore a strong and successful design for deep neural networks, notably in the domain of computer vision.
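The dense connectivity described above can be made concrete by tracking how the number of channels evolves through the network. The sketch below is a minimal illustration, assuming that every layer in a dense block adds a fixed number of new feature maps (the "growth rate") and that each transition layer halves the channel count. The function name `densenet_channels` is hypothetical, and the numbers (64 initial channels, growth rate 32, blocks of 6, 12, 24 and 16 layers) follow the common 121-layer DenseNet layout only as an example; the text does not state which variant was used.

```python
def densenet_channels(init_channels, growth_rate, block_layers,
                      compression=0.5):
    """Track channel counts through dense blocks and transition layers.

    Inside a dense block, layer j receives the concatenation of the block
    input and the outputs of all j earlier layers, and contributes
    `growth_rate` new channels. A transition layer between two blocks
    reduces the channel count by the factor `compression`.
    """
    channels = init_channels
    trace = []
    for i, n_layers in enumerate(block_layers):
        # Number of input channels seen by each layer of this block.
        layer_inputs = [channels + j * growth_rate for j in range(n_layers)]
        channels += n_layers * growth_rate
        trace.append({"block": i + 1,
                      "layer_inputs": layer_inputs,
                      "out": channels})
        if i < len(block_layers) - 1:         # no transition after last block
            channels = int(channels * compression)
    return trace


trace = densenet_channels(init_channels=64, growth_rate=32,
                          block_layers=(6, 12, 24, 16))

for block in trace:
    print(block["block"], block["layer_inputs"][0], "->", block["out"])
```

Because each layer only has to produce `growth_rate` new feature maps while reusing everything computed before it, the individual layers can stay narrow, which is where the parameter savings come from.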
II. DATASETS
Several malware image datasets are available. The datasets we used for training the models are described below.

A. MalImg Dataset
The Malimg dataset is a collection of grayscale images of malware of different types, including viruses, worms, and trojans. It contains 9339 images of size 256 x 256 pixels, and each image is associated with a malware family label. The dataset includes 25 different malware families. We used 8404 images for training and 935 images for testing. The dataset is available for download from the following link: https://www.kaggle.com/datasets/keerthicheepurupalli/malimg-dataset9010

B. Microsoft Big 2015
The Microsoft Malware Classification Challenge (BIG 2015) dataset originates from a machine learning competition organized by Microsoft in 2015, which was aimed at developing algorithms for malware classification and detection. Here it is used as a collection of color images of the malware samples. The BIG 2015 dataset has been used in several research studies, including the development of machine learning models for malware classification and detection.

The BIG 2015 image set contains 10,860 images of size 64 x 64 pixels, and each image is associated with a malware family label. The dataset includes 9 different malware families. We used 8684 images for training and 2176 images for testing.

Dataset: Microsoft Big 2015
ClassName         Train   Test
Gatak               810    203
Kelihos ver1        318     80
Kelihos ver3       2353    589
Lollipop           1982    496
Obfuscator.ACY      982    246
Ramnit             1226    307
Simda                33      9
Tracur              600    151
Vundo               380     95

C. Malevis Dataset
The "MALEVIS Malware Image Classification Dataset" is a dataset of malware images that can be used for classification tasks. It consists of 12394 malware images divided into 25 different classes. We used 8750 images for training and 3644 images for testing.

Dataset: Malevis
ClassName         Train   Test
Adposhel            350    144
Agent               350    120
Allaple             350    128
Amonetize           350    147
Androm              350    150
Autorun             350    146
Browser Fox         350    143
Dinwod              350    149
Elex                350    150
Expiro              350    151
Fasong              350    150
HackKMS             350    149
Hlux                350    150
Injector            350    145
InstallCore         350    150
MultiPlug           350    149
Neoreklami          350    150
Neshta              350    147
Regrun              350    135
Sality              350    149
Snarasite           350    150
Stantinko           350    150
VBA                 350    150
VBKrypt             350    146
Vilsel              350    146
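The split sizes quoted for the three datasets can be checked with a few lines of arithmetic. The sketch below is a minimal illustration that uses only the training and test counts and the class counts stated in this section; the dictionary layout and the helper name `summarize` are assumptions made for the example.

```python
# Training/test counts and class counts as stated for each dataset.
splits = {
    "Malimg":   {"train": 8404, "test": 935,  "classes": 25},
    "BIG 2015": {"train": 8684, "test": 2176, "classes": 9},
    "MALEVIS":  {"train": 8750, "test": 3644, "classes": 25},
}


def summarize(split):
    """Total size, test fraction and mean training images per class."""
    total = split["train"] + split["test"]
    return {
        "total": total,
        "test_fraction": round(split["test"] / total, 3),
        "mean_train_per_class": round(split["train"] / split["classes"], 1),
    }


summary = {name: summarize(split) for name, split in splits.items()}

for name, row in summary.items():
    print(f"{name}: {row['total']} images, "
          f"test fraction {row['test_fraction']}, "
          f"mean training images per class {row['mean_train_per_class']}")
```

The three datasets are split in noticeably different proportions, and only MALEVIS has the same number of training images in every class, which is worth keeping in mind when accuracies are compared across datasets.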