p317 Han
Eul Gyu Im
Hanyang University
All content following this page was uploaded by Eul Gyu Im on 12 November 2018.
Conti et al. [15] proposed an integrated visualization system that supports analysis of the byte information of malware samples through several graphical views. A “byteview visualization” maps each byte in the binary sample to a pixel, and a “byte presence visualization” shows how many bytes have appeared. Moreover, a “dot plot visualization” detects duplicated sequences of bytes within a sample. Because of the overhead of the dot plot algorithm, they implemented a simplified algorithm when applying these visualization techniques.

Anderson et al. [16] visualized the results of similarity calculations between malware samples as images called “heatmaps.” Nataraj et al. [17] scanned all the bytes of a malware sample, converted them into gray-scale images, and classified the malware using image processing. After generating the images, they applied GIST [18,19], an abstract representation technique for scene images, to compute texture features and classify the malware. Moreover, they showed that binary texture analysis using image processing can classify malware more quickly than existing malware classification methods [20]. However, because the texture analysis method has a large computational overhead, it has difficulty processing a large number of malware samples.

In this paper, we propose a novel analysis method that uses image matrices to represent malware visually, so that the features of malware can be easily detected and the similarities between different malware samples can be calculated faster than with other visualization methods.

Figure 2. Binary information extraction procedure

To extract binary information, the binary sample files are first disassembled using a disassembler such as IDA Pro [21] or OllyDbg [22]. After the assembly code is extracted, the sequence of assembly instructions is divided into blocks, with certain instructions used as delimiters, as shown in Figure 3.

The sequence of opcodes in each block is used as binary information. Only the first three characters of each opcode are used to generate the information for the block. For example, a four-character opcode such as push is reduced to the three-character string pus. These three-character strings are then concatenated, and the resulting character string represents the opcode block in the next step, which generates an image matrix.
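The block extraction step can be illustrated with a minimal Python sketch. It assumes the disassembler output has already been reduced to a list of opcode mnemonics. The delimiter set (`ret`, `jmp`, `call`), the choice to keep the delimiter inside the block it closes, and the function names `opcode_blocks` and `block_string` are all hypothetical, since the text only says that "some instructions" are used as delimiters.

```python
# Hypothetical delimiter set: the paper does not list the delimiter instructions.
BLOCK_DELIMITERS = {"ret", "jmp", "call"}


def opcode_blocks(opcodes, delimiters=BLOCK_DELIMITERS):
    """Split an opcode sequence into blocks, closing a block after each delimiter."""
    blocks, current = [], []
    for op in opcodes:
        op = op.lower()
        current.append(op)
        if op in delimiters:  # assumption: the delimiter stays in its block
            blocks.append(current)
            current = []
    if current:  # trailing opcodes with no closing delimiter
        blocks.append(current)
    return blocks


def block_string(block):
    """Keep the first three characters of each opcode and concatenate them."""
    return "".join(op[:3] for op in block)


listing = ["push", "mov", "sub", "call", "test", "jz", "xor", "ret"]
strings = [block_string(b) for b in opcode_blocks(listing)]
print(strings)  # ['pusmovsubcal', 'tesjzxorret']
```

Opcodes shorter than three characters, such as `jz`, are kept whole in this sketch, so the concatenated string is not necessarily a multiple of three characters long.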
a locality-sensitive hash function, which assumes that if input values are similar, output values will also be similar. Therefore, if the character strings of binary information are similar, their outputs will be similar and will map to nearby coordinates in an image matrix.

Second, the RGB color-defining module defines the color values of the pixels recorded on an image matrix. The djb2 hash function [24] is applied to the binary information to determine its color. RGB colors are defined by calculating 8-bit values for each of the red, green, and blue channels.

Once the coordinates and RGB colors have been defined, the RGB-colored pixels are recorded at the corresponding coordinates of the image matrix. To make visual analysis more convenient for human analysts, the pixels around each defined coordinate are recorded at the same time. As shown in Figure 5, nine pixels from (x–1,y–1) to (x+1,y+1) around the (x,y) coordinate defined by the opcode instruction sequence of a block are recorded.

3.4 Similarity Calculation between Image Matrices
We used “selective area matching” to calculate the similarities between image matrices. For selective area matching, an image matrix is divided into N areas, where N can be set to 4^x (x=1,2,3, …), such as 4, 16, and 64. Figure 7 shows image matrices divided according to different values of N. Then, n areas are randomly selected from the image matrix to be compared. For example, as shown in Figure 8, an image matrix can be divided into 16 (N=16) areas, and four (n=4) areas can be randomly selected.

Matching pixels are then identified in each selected area and used in the similarity calculation. In this paper, the vector angular-based distance measure [25], which determines similarity from the color vector of each pixel, is used to calculate the similarities between image matrices. The similarities of the n selected areas are calculated, and the overall similarity is the average of the similarities of the matching pixels in each area.

4.2 Results of Image Matrix Extraction
Figure 9 shows image matrices extracted from individual benign binary samples, and Figure 10 shows image matrices extracted from individual malware families. Since the number of opcode instruction sequences used as binary information varies, the number of pixels recorded on the image matrices differs. For the benign binaries, even if pixels are recorded at the same coordinates of different image matrices, the similarities between the image matrices are minimal because the RGB color information of the relevant pixels is different. In contrast, many similar pixels are found among the image matrices of binary files classified into the same malware family.

coordinates and a maximum of 18 cases showed the same images on image matrices of benign binaries and malware families. Our results show that the image matrices of variants in the same malware family are similar and that clear differences exist between malware binaries and benign binaries.
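The image matrix generation described in Section 3 (a coordinate from a locality-sensitive hash, a color from djb2, and recording on a 3x3 neighborhood) can be illustrated with a rough Python sketch. Several details are assumptions, not taken from the paper: the 256x256 matrix size, the 16-bit SimHash over character trigrams used as the locality-sensitive hash, the split of that hash into x and y bytes, the use of the low 24 bits of djb2 as R, G, and B, and all function names.

```python
import hashlib

SIZE = 256  # assumed image matrix side length (not stated in this section)


def djb2(text):
    """Bernstein's djb2 string hash (hash * 33 + c), truncated to 32 bits here."""
    h = 5381
    for ch in text:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def rgb_color(block_str):
    """Assumed mapping: the low 24 bits of djb2 give 8 bits each for R, G, B."""
    h = djb2(block_str)
    return ((h >> 16) & 0xFF, (h >> 8) & 0xFF, h & 0xFF)


def simhash16(block_str):
    """16-bit SimHash over character trigrams (the feature choice is an assumption)."""
    counts = [0] * 16
    grams = [block_str[i:i + 3] for i in range(max(1, len(block_str) - 2))]
    for gram in grams:
        digest = hashlib.md5(gram.encode()).digest()
        bits = int.from_bytes(digest[:2], "big")
        for i in range(16):
            counts[i] += 1 if (bits >> i) & 1 else -1
    return sum(1 << i for i in range(16) if counts[i] > 0)


def coordinate(block_str):
    """Assumed mapping: the high byte of the hash is x, the low byte is y."""
    h = simhash16(block_str)
    return (h >> 8) & 0xFF, h & 0xFF


def record_block(matrix, block_str):
    """Record the block's color on the 3x3 neighborhood around its coordinate."""
    x, y = coordinate(block_str)
    color = rgb_color(block_str)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            px, py = x + dx, y + dy
            if 0 <= px < SIZE and 0 <= py < SIZE:  # clip at the matrix border
                matrix[(px, py)] = color
    return x, y


matrix = {}  # sparse image matrix: (x, y) -> (r, g, b)
for block in ["pusmovsubcal", "tesjzxorret"]:
    record_block(matrix, block)
```

The matrix is kept as a sparse dictionary so that only recorded pixels are stored. When two blocks land on overlapping neighborhoods, the later block overwrites the earlier one in this sketch; the paper does not say how such collisions are handled.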
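Selective area matching from Section 3.4 can be sketched in Python as follows. The pixel similarity combines an angle term and a magnitude term, the form usually given for the vector angular-based distance measure [25]; the exact formula used by the authors is not stated here, so treat it as an assumption. Also assumed: a 256x256 matrix stored as a dictionary from (x, y) to (r, g, b), "matching pixels" meaning coordinates recorded in both matrices, and selected areas without any matching pixel being skipped. All function names are hypothetical.

```python
import math
import random

SIZE = 256  # assumed image matrix side length
MAX_DIFF = math.sqrt(3 * 255 ** 2)  # largest possible RGB vector difference


def pixel_similarity(c1, c2):
    """Angle-and-magnitude similarity of two RGB vectors, in [0, 1]."""
    n1 = math.sqrt(sum(v * v for v in c1))
    n2 = math.sqrt(sum(v * v for v in c2))
    if n1 == 0 or n2 == 0:  # black pixel: angle undefined, treated as 0 here
        angle = 0.0
    else:
        cos = sum(a * b for a, b in zip(c1, c2)) / (n1 * n2)
        angle = math.acos(max(-1.0, min(1.0, cos)))
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return (1 - 2 * angle / math.pi) * (1 - diff / MAX_DIFF)


def area_of(coord, n_areas):
    """Index of the grid area containing coord, for n_areas = 4**x areas."""
    side = math.isqrt(n_areas)  # areas per row, i.e. 2**x
    cell = SIZE // side
    x, y = coord
    return (y // cell) * side + (x // cell)


def selective_area_similarity(m1, m2, n_areas=16, n_selected=4, rng=random):
    """Average the per-area similarity over n_selected randomly chosen areas."""
    chosen = rng.sample(range(n_areas), n_selected)
    scores = []
    for area in chosen:
        matching = [c for c in m1 if c in m2 and area_of(c, n_areas) == area]
        if not matching:  # assumption: areas with no matching pixel are skipped
            continue
        total = sum(pixel_similarity(m1[c], m2[c]) for c in matching)
        scores.append(total / len(matching))
    return sum(scores) / len(scores) if scores else 0.0


a = {(10, 10): (200, 30, 30), (100, 200): (0, 255, 0)}
b = {(10, 10): (190, 40, 30), (100, 200): (0, 250, 5)}
print(selective_area_similarity(a, b, n_areas=4, n_selected=4))
```

Because the areas are chosen at random, repeated calls can return different values when `n_selected` is smaller than `n_areas`. Passing a seeded `random.Random` instance as `rng` makes a comparison reproducible.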
Our future studies include visualizing various other information from binary files, and extending opcode instruction sequences and algorithms for automatic malware classification.

6. ACKNOWLEDGMENTS
This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2011-0029924).

7. REFERENCES
[1] Christodorescu, M. and Jha, S., 2004. Testing malware detectors. ACM SIGSOFT Software Engineering Notes 29, 4, 34-44.
[2] Kang, B., Kim, T., Kwon, H., Choi, Y., and Im, E.G., 2012. Malware classification method via binary content comparison. In Proceedings of the 2012 ACM Research in Applied Computation Symposium, ACM, 316-321.
[3] Moser, A., Kruegel, C., and Kirda, E., 2007. Limits of static analysis for malware detection. In Proceedings of the Twenty-Third Annual IEEE Computer Security Applications Conference (ACSAC), 2007, 421-430.
[4] Cesare, S. and Xiang, Y., 2010. A fast flowgraph based classification system for packed and polymorphic malware on the endhost. In Proceedings of the 24th IEEE International Conference on IEEE Advanced Information Networking and Applications (AINA), 2010, 721-728.
[5] Kinable, J. and Kostakis, O., 2011. Malware classification based on call graph clustering. Journal in computer virology 7, 4, 233-245.
[6] Shang, S., Zheng, N., Xu, J., Xu, M., and Zhang, H., 2010. Detecting malware variants via function-call graph similarity. In Proceedings of the 5th International Conference on IEEE Malicious and Unwanted Software (MALWARE), 2010, 113-120.
[7] Tabish, S.M., Shafiq, M.Z., and Farooq, M., 2009. Malware detection using statistical analysis of byte-level file content. In Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics, ACM, 23-31.
[8] Bilar, D., 2007. Opcodes as predictor for malware. International Journal of Electronic Security and Digital Forensics 1, 2, 156-168.
[9] Han, K.S., Kim, S.-R., and Im, E.G., 2012. Instruction frequency-based malware classification method. INFORMATION - An International Interdisciplinary Journal 15, 7, 2973-2984.
[10] Santos, I., Brezo, F., Nieves, J., Penya, Y.K., Sanz, B., Laorden, C., and Bringas, P.G., 2010. Idea: Opcode-sequence-based malware detection. Engineering Secure Software and Systems, Springer, 35-43.
[11] Sung, A.H., Xu, J., Chavez, P., and Mukkamala, S., 2004. Static analyzer of vicious executables (save). In Proceedings of the 20th Annual IEEE Computer Security Applications Conference, 2004, 326-334.
[12] Walenstein, A., Venable, M., Hayes, M., Thompson, C., and Lakhotia, A., 2007. Exploiting similarity between variants to defeat malware. In Proceedings of the BlackHat DC Conference.
[13] Trinius, P., Holz, T., Gobel, J., and Freiling, F.C., 2009. Visual analysis of malware behavior using treemaps and thread graphs. In Proceedings of the 6th International Workshop on IEEE Visualization for Cyber Security (VizSec), 2009, 33-38.
[14] Saxe, J., Mentis, D., and Greamo, C., 2012. Visualization of shared system call sequence relationships in large malware corpora. In Proceedings of the Ninth International Symposium on Visualization for Cyber Security, ACM, 33-40.
[15] Conti, G., Dean, E., Sinda, M., and Sangster, B., 2008. Visual reverse engineering of binary and data files. Visualization for Computer Security, Springer, 1-17.
[16] Anderson, B., Storlie, C., and Lane, T., 2012. Improving malware classification: bridging the static/dynamic gap. In Proceedings of the 5th ACM workshop on Security and artificial intelligence, ACM, 3-14.
[17] Nataraj, L., Karthikeyan, S., Jacob, G., and Manjunath, B., 2011. Malware images: visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, ACM.
[18] Oliva, A. and Torralba, A., 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision 42, 3, 145-175.
[19] Torralba, A., Murphy, K.P., Freeman, W.T., and Rubin, M.A., 2003. Context-based vision system for place and object recognition. In Proceedings of the Ninth IEEE International Conference on Computer Vision, 273-280.
[20] Nataraj, L., Yegneswaran, V., Porras, P., and Zhang, J., 2011. A comparative assessment of malware classification using binary texture analysis and dynamic analysis. In Proceedings of the 4th ACM workshop on Security and artificial intelligence, ACM, 21-30.
[21] Eagle, C., 2008. The IDA Pro Book: The Unofficial Guide to the World's Most Popular Disassembler. No Starch Press.
[22] Yuschuk, O., 2007. Ollydbg. http://www.ollydbg.de/
[23] Charikar, M.S., 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, ACM, 380-388.
[24] D. Bernstein. Usenet posting, comp.lang.c. http://groups.google.com/group/comp.lang.c/msg/6b82e964887d73d9, Dec. 1990.
[25] Androutsos, D., Plataniotis, K., and Venetsanopoulos, A.N., 1999. A novel vector-based approach to color image retrieval using a vector angular-based distance measure. Computer Vision and Image Understanding 75, 1, 46-58.