Malware Analysis Professional: VA/RVA/Offset & PE File Format

Malware Analysis

Professional

VA/RVA/Offset & PE File Format


Section 02 | Module 04
© Caendra Inc. 2020
All Rights Reserved
Table of Contents

MODULE 04 | VA/RVA/OFFSET & PE FILE FORMAT


4.1 Introduction

4.2 VA/RVA/Offset

4.3 Overview of the Portable Executable File Format (PE)

4.4 Memory and File Alignment

4.5 Conclusion

MAPv1: Section 02, Module 04 - Caendra Inc. © 2020 | p.2




4.1 Introduction

In this module, we will discuss virtual addresses, relative
virtual addresses, and offsets, as well as some basic
information about the Portable Executable File Format,
which describes the basic structure of all Windows
executable files.



4.2 VA/RVA/Offset

Before proceeding with the technical discussion, we should
summarize these three concepts.

Understanding how they are interconnected and how to
calculate one from the other is critical. This is something
you will have to deal with very often, not just in this course,
but in every reversing challenge that you choose to take
on during your career in this field.


Applications do not directly access physical memory, only
virtual memory. In other words, the memory addresses
referenced by an application are virtual addresses (VAs).

Virtualizing access to memory provides flexibility in the way
applications use available physical memory. In fact, an
application doesn’t have to occupy a contiguous piece of
physical memory; it can be broken down into parts, without
the application even needing to know about it.

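The OS gives the application the illusion that it
occupies a contiguous memory area by translating virtual
addresses to physical memory addresses.

A relative virtual address (RVA) is the difference between
two VAs: a VA of interest and a lower base VA (for a PE file,
typically the address at which the module is loaded). The
RVA therefore describes the higher of the two addresses
relative to that base.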

EXAMPLE

VA_1 = 0x00400000
VA_2 = 0x00401000
RVA of VA_2 = VA_2 - VA_1 = 0x00001000
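
The same arithmetic can be sketched in a few lines of Python. The helper names va_to_rva and rva_to_va are hypothetical and chosen only for illustration; the values are the ones from the example above, with VA_1 treated as the base address.

```python
# Hypothetical helper names; values taken from the example above.
IMAGE_BASE = 0x00400000  # VA_1, treated here as the base address


def va_to_rva(va: int, base: int = IMAGE_BASE) -> int:
    """Return the RVA of `va` relative to `base`."""
    if va < base:
        raise ValueError("VA lies below the base address")
    return va - base


def rva_to_va(rva: int, base: int = IMAGE_BASE) -> int:
    """Return the VA that corresponds to `rva` for a module at `base`."""
    return base + rva


rva = va_to_rva(0x00401000)   # VA_2 - VA_1
print(hex(rva))
print(hex(rva_to_va(rva)))    # back to VA_2
```

Going in the other direction only requires adding the base back, which is why PE headers can store RVAs and stay valid wherever the module ends up being loaded.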


On the other hand, when we talk about offsets, we usually
refer to physical memory, to a physical file on disk, or more
generally to cases where we treat data as raw data,
without worrying about any differences between the
alignment of this data in memory and its alignment on
disk.

An offset is the distance between the locations of two
bytes, for example inside a file, and is usually measured
from the beginning of the file.

NOTE: We will demonstrate in the next chapter how to
obtain the real offset of a byte in the file itself from
its virtual address and other information retrieved from
the PE header.



4.2.1 Why Do We Need All of This Information?

The answer is straightforward. Very often during the
reversing process of an application, you will have to make
some code modifications for testing purposes or for other
reasons, such as permanently deactivating an
anti-reversing trick.


Of course, patching the code in memory at runtime is a
temporary solution that lasts only as long as the application
is running.

To make the change permanent, you have to patch the
physical file itself, and for that you must be able to
calculate the offsets of the bytes you want to modify in the
file on disk.


Even though you will not usually have to calculate by hand
the offset of a byte inside the file from its VA or RVA
(because the debugger does this automatically), it is
extremely important to keep these concepts straight in your
mind; you never know when you might need them.



4.3 Overview of the Portable Executable Format (PE) [1]

Since this training targets Windows applications, let’s
walk through the basic structure of the PE file format,
starting at the beginning of the file.

There are minor differences between 32-bit and 64-bit
applications, but for simplicity, let’s focus on 32-bit
applications.



4.3.1 MS-DOS Header

Every PE file starts with a small MS-DOS executable. This
was required in the early days of Windows, before it
became a popular OS. This small stub executable would, at
a minimum, display a message stating that Windows
is required to run the application.

The first bytes of the PE file are, indeed, the traditional
MS-DOS header, also called IMAGE_DOS_HEADER.


The two important values in this header are:

• e_magic: a WORD (16-bit) value at offset 0 which must be
0x5A4D (“MZ” in ASCII, the initials of Mark Zbikowski, one
of the original architects of MS-DOS), also called
IMAGE_DOS_SIGNATURE, and;

• e_lfanew: a 32-bit value at offset 3Ch which contains the
file offset of the start of the PE header, also called the
IMAGE_NT_HEADERS structure.
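
As a minimal sketch of how these two fields can be read, the following Python snippet checks e_magic and extracts e_lfanew from a buffer. The function name read_dos_header and the synthetic 64-byte sample (with an arbitrary e_lfanew of 0x80) are assumptions made for illustration; in practice the buffer would hold the first bytes of a real PE file.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # "MZ" read as a little-endian WORD
E_LFANEW_OFFSET = 0x3C        # position of e_lfanew in the MS-DOS header


def read_dos_header(data: bytes) -> int:
    """Validate e_magic and return e_lfanew (file offset of the PE header)."""
    if len(data) < E_LFANEW_OFFSET + 4:
        raise ValueError("buffer too small for an MS-DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew


# Synthetic header for illustration only: "MZ" plus an arbitrary e_lfanew.
sample = bytearray(64)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, E_LFANEW_OFFSET, 0x80)

print(hex(read_dos_header(bytes(sample))))
```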
4.3.2 IMAGE_NT_HEADERS Structure (PE Header) [2]

The PE header is actually formed by combining a few other
structures that we are going to discuss in detail. In other
words, it is a structure of structures, and it is defined as
follows:

Figure 4.1 IMAGE_NT_HEADERS structure

Its signature is the byte sequence 50 45 00 00 (“PE\0\0” in
ASCII), which reads as the DWORD 00004550h in little-endian
order.
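
A short Python sketch can make the "structure of structures" layout more concrete: the 4-byte signature is followed by the 20-byte IMAGE_FILE_HEADER and then by the IMAGE_OPTIONAL_HEADER. The function name locate_nt_parts and the synthetic buffer are hypothetical and serve only as an illustration.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550  # bytes 50 45 00 00, i.e. "PE\0\0"
SIZEOF_SIGNATURE = 4
SIZEOF_FILE_HEADER = 20          # IMAGE_FILE_HEADER has a fixed size


def locate_nt_parts(data: bytes, e_lfanew: int) -> dict:
    """Check the PE signature and return the file offset of each part."""
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != IMAGE_NT_SIGNATURE:
        raise ValueError("missing PE signature")
    file_header = e_lfanew + SIZEOF_SIGNATURE
    optional_header = file_header + SIZEOF_FILE_HEADER
    return {
        "signature": e_lfanew,
        "file_header": file_header,
        "optional_header": optional_header,
    }


# Synthetic buffer for illustration: padding, signature, empty file header.
sample = bytes(0x80) + b"PE\x00\x00" + bytes(SIZEOF_FILE_HEADER)
parts = locate_nt_parts(sample, 0x80)
print({name: hex(offset) for name, offset in parts.items()})
```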


4.3.2.1 IMAGE_FILE_HEADER Structure [3]

Figure 4.2 IMAGE_FILE_HEADER structure


This structure contains information about some characteristics
of the executable, such as the target CPU architecture (x86, x64)
and the number of sections (.text, .data, etc.). It also contains a
member called SizeOfOptionalHeader, which holds the size of the
IMAGE_OPTIONAL_HEADER structure (E0h for a standard 32-bit
executable). The IMAGE_OPTIONAL_HEADER structure is required
for every PE executable, even though it is called “optional”.

Finally, the member Characteristics defines some characteristics
of the executable file, such as the type of the executable module
(.exe or .dll).
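
The following Python sketch unpacks the seven members of IMAGE_FILE_HEADER from 20 raw bytes. The FileHeader tuple, the parse_file_header helper and the sample values (an x86 executable with 4 sections) are assumptions made for illustration; the constants are the documented Machine and Characteristics values.

```python
import struct
from typing import NamedTuple

FILE_HEADER_FORMAT = "<HHIIIHH"  # little-endian, 20 bytes in total
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}
IMAGE_FILE_DLL = 0x2000          # Characteristics flag: module is a DLL


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


def parse_file_header(raw: bytes) -> FileHeader:
    """Unpack an IMAGE_FILE_HEADER from the start of `raw`."""
    return FileHeader(*struct.unpack_from(FILE_HEADER_FORMAT, raw, 0))


# Illustrative values only: x86, 4 sections, optional header of E0h bytes,
# Characteristics = EXECUTABLE_IMAGE | 32BIT_MACHINE.
raw = struct.pack(FILE_HEADER_FORMAT, 0x014C, 4, 0, 0, 0, 0xE0, 0x0102)
hdr = parse_file_header(raw)
is_dll = bool(hdr.characteristics & IMAGE_FILE_DLL)

print(MACHINE_NAMES.get(hdr.machine, "other"), hdr.number_of_sections, is_dll)
```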
4.3.2.2 IMAGE_OPTIONAL_HEADER Structure [4]

As mentioned above, this structure is not optional, even
though its name suggests otherwise.

The structure is shown in Figure 4.3.

Figure 4.3 IMAGE_OPTIONAL_HEADER structure

This structure also contains some very important
information about the PE file. The Magic member indicates
whether it is a 32-bit or a 64-bit module (10Bh for PE32,
20Bh for PE32+).

The AddressOfEntryPoint member holds the RVA of the entry
point (EP) of the module, that is, the RVA of the location
inside the module where the first instruction to be executed
is located.

The BaseOfCode and BaseOfData members hold the RVAs
of the beginning of the code and data sections respectively.

The ImageBase member contains the image base of the
module. This is the preferred VA at which the PE file will
be loaded in memory. By default it is 0x00400000 for
applications and 0x10000000 for DLLs.
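
To show how these members fit together, the Python sketch below unpacks the leading fields of a 32-bit IMAGE_OPTIONAL_HEADER and combines ImageBase with AddressOfEntryPoint to obtain the VA of the entry point. The sample values are hypothetical, and the sketch assumes the PE32 layout only (BaseOfData does not exist in the 64-bit variant) and that the module is loaded at its preferred image base.

```python
import struct

# Leading members of the 32-bit optional header, in declaration order:
# Magic, MajorLinkerVersion, MinorLinkerVersion, SizeOfCode,
# SizeOfInitializedData, SizeOfUninitializedData, AddressOfEntryPoint,
# BaseOfCode, BaseOfData, ImageBase.
OPTIONAL_HEADER32_PREFIX = "<HBBIIIIIII"
PE32_MAGIC = 0x010B


def entry_point_va(raw: bytes) -> int:
    """Return ImageBase + AddressOfEntryPoint for a PE32 optional header."""
    fields = struct.unpack_from(OPTIONAL_HEADER32_PREFIX, raw, 0)
    magic, address_of_entry_point, image_base = fields[0], fields[6], fields[9]
    if magic != PE32_MAGIC:
        raise ValueError("not a PE32 (32-bit) optional header")
    return image_base + address_of_entry_point


# Hypothetical values for illustration only.
raw = struct.pack(
    OPTIONAL_HEADER32_PREFIX,
    PE32_MAGIC, 0, 0,        # Magic, linker version
    0x2200, 0, 0,            # SizeOfCode, initialized/uninitialized data
    0x1000,                  # AddressOfEntryPoint (an RVA)
    0x1000, 0x4000,          # BaseOfCode, BaseOfData (RVAs)
    0x00400000,              # ImageBase (preferred VA)
)

print(hex(entry_point_va(raw)))
```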


The SectionAlignment and FileAlignment members
indicate the alignment of the sections of the PE in memory
and in the file respectively. We discuss these later in
this chapter.

The SizeOfImage member indicates the amount of memory
occupied by the PE file at runtime. It has to be a multiple of
the SectionAlignment value.

Finally, at the end of the IMAGE_OPTIONAL_HEADER
structure we find the DataDirectory array of
IMAGE_DATA_DIRECTORY structures.

The DataDirectory member (Figure 4.3) marks the position of
the first IMAGE_DATA_DIRECTORY structure; the remaining
entries follow it directly inside the header.



4.3.3 IMAGE_DATA_DIRECTORY Structure [5]

Each of these structures (16 by default) contains the
RVA and the size of a specific piece of data inside the PE
image at runtime.

Figure 4.4 IMAGE_DATA_DIRECTORY structure


A few examples are the ExportTableAddress (table
of exported functions), the ImportTableAddress (table of
imported functions), the ResourcesTable (table of
resources, such as images embedded in the PE), and the
ImportAddressTable (IAT), which stores at runtime the
addresses of the imported functions.
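
A minimal Python sketch of walking the DataDirectory array is shown below. Each entry is a pair of DWORDs (RVA, size), and the indexes used are the documented ones for the export, import and resource tables and the IAT. The helper name parse_data_directories and the sample RVAs and sizes are hypothetical.

```python
import struct

ENTRY_FORMAT = "<II"  # VirtualAddress (an RVA) followed by Size
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)
NUM_ENTRIES = 16

# Array indexes of the directories mentioned above.
DIRECTORY_NAMES = {
    0: "Export table",
    1: "Import table",
    2: "Resource table",
    12: "Import Address Table (IAT)",
}


def parse_data_directories(raw: bytes) -> list:
    """Return a list of (rva, size) tuples, one per directory entry."""
    return [
        struct.unpack_from(ENTRY_FORMAT, raw, index * ENTRY_SIZE)
        for index in range(NUM_ENTRIES)
    ]


# Hypothetical array for illustration: only the import table and IAT are set.
raw = bytearray(NUM_ENTRIES * ENTRY_SIZE)
struct.pack_into(ENTRY_FORMAT, raw, 1 * ENTRY_SIZE, 0x3000, 0x50)
struct.pack_into(ENTRY_FORMAT, raw, 12 * ENTRY_SIZE, 0x3100, 0x40)

directories = parse_data_directories(bytes(raw))
for index, name in DIRECTORY_NAMES.items():
    rva, size = directories[index]
    print(f"{name}: RVA={rva:#x} size={size:#x}")
```

An entry whose RVA and size are both zero simply means that the corresponding data is not present in the image.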



4.3.4 The Section Table

The section table is an array of IMAGE_SECTION_HEADER
structures [6]. Each structure contains information about
its associated section: location, size, and characteristics
that describe the access permissions for that section.

For example, the characteristics state whether the section
is readable or writable in memory, and whether the code
there has execute permissions.


The following is an example of an
IMAGE_SECTION_HEADER structure for one of the
sections of a PE file (DD = DWORD, DW = WORD).

The section name can be at most 8 ASCII chars long, so
this member always occupies 8 bytes of memory.

Let’s point out the real meaning of the more important
members.

004001A8 ASCII ".text"         ; SECTION NAME
004001B0 92200000 DD 00002092h ; VirtualSize = 2092h
004001B4 00100000 DD 00001000h ; VirtualAddress = 1000h
004001B8 00220000 DD 00002200h ; SizeOfRawData = 2200h
004001BC 00040000 DD 00000400h ; PointerToRawData = 400h
004001C0 00000000 DD 00000000 ; PointerToRelocations = 0
004001C4 00000000 DD 00000000 ; PointerToLineNumbers = 0
004001C8 0000 DW 0000 ; NumberOfRelocations = 0
004001CA 0000 DW 0000 ; NumberOfLineNumbers = 0
004001CC 20000060 DD 60000020h ; Characteristics = CODE|EXECUTE|READ
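
The same 40 bytes can be unpacked programmatically. The Python sketch below rebuilds the raw bytes shown in the listing above and parses them with the struct module; the SectionHeader tuple and the parse_section_header helper are hypothetical names used only for illustration.

```python
import struct
from typing import NamedTuple

SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"  # little-endian, 40 bytes in total


class SectionHeader(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    pointer_to_relocations: int
    pointer_to_line_numbers: int
    number_of_relocations: int
    number_of_line_numbers: int
    characteristics: int


def parse_section_header(raw: bytes) -> SectionHeader:
    """Unpack one IMAGE_SECTION_HEADER from the start of `raw`."""
    fields = struct.unpack_from(SECTION_HEADER_FORMAT, raw, 0)
    name = fields[0].rstrip(b"\x00").decode("ascii")  # drop the zero padding
    return SectionHeader(name, *fields[1:])


# Raw bytes exactly as they appear in the listing above.
raw = bytes.fromhex(
    "2E74657874000000"   # ".text" padded to 8 bytes
    "92200000" "00100000" "00220000" "00040000"
    "00000000" "00000000" "0000" "0000" "20000060"
)

text = parse_section_header(raw)
print(text.name, hex(text.virtual_size), hex(text.virtual_address),
      hex(text.size_of_raw_data), hex(text.pointer_to_raw_data),
      hex(text.characteristics))
```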


VirtualSize: The size of the section in memory, without the
padding that is applied to satisfy the memory alignment
requirements described later in this chapter.


VirtualAddress: The RVA of the section in memory. Using
this and the VirtualSize value we can obtain the RVA of the
next section, assuming that the memory alignment property
is set to its default (1000h). We add 2092h (VirtualSize) +
1000h (VirtualAddress) = 3092h, which we need to pad (see
below) to a multiple of 1000h, so the RVA of the next
section is 4000h.


SizeOfRawData: The real size of this section in the file. If
this is the last section and we add this value to the
PointerToRawData value, the result will be the size of
the file itself, provided that no extra data is appended
after the last section.


PointerToRawData: The offset in the file where the raw data
of the section starts. So, by adding this to SizeOfRawData and
assuming that the file alignment property is set to its default
(200h), we can obtain the offset where the next section starts
in the file.


We add 2200h (SizeOfRawData) + 400h
(PointerToRawData) = 2600h. (We don’t need to do any
padding since 2600h is a multiple of 200h.)

Remember that when we refer to either arrays in memory or
file offsets, the first byte is at index 0. So, for example, if the
size of a file is 10 bytes, then the last byte is located at
offset 9.
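
The calculations above can be sketched in Python using the values of the example .text section. The rva_to_offset helper is not taken from the slides; it applies the conventional conversion (subtract the section's VirtualAddress, add its PointerToRawData) and is an assumption added here to show why these members matter when patching a file. The helper names and the dictionary layout are likewise hypothetical.

```python
SECTION_ALIGNMENT = 0x1000  # default alignment in memory
FILE_ALIGNMENT = 0x200      # default alignment in the file

# Members of the example .text section header.
TEXT = {
    "VirtualSize": 0x2092,
    "VirtualAddress": 0x1000,
    "SizeOfRawData": 0x2200,
    "PointerToRawData": 0x400,
}


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def rva_to_offset(rva: int, section: dict) -> int:
    """Map an RVA inside `section` to its offset in the file on disk."""
    delta = rva - section["VirtualAddress"]
    if not 0 <= delta < section["SizeOfRawData"]:
        raise ValueError("RVA is not backed by this section's raw data")
    return section["PointerToRawData"] + delta


next_rva = align_up(TEXT["VirtualAddress"] + TEXT["VirtualSize"],
                    SECTION_ALIGNMENT)
next_raw = align_up(TEXT["PointerToRawData"] + TEXT["SizeOfRawData"],
                    FILE_ALIGNMENT)
last_raw_byte = next_raw - 1  # offsets are zero-based

print(hex(next_rva), hex(next_raw), hex(last_raw_byte))
print(hex(rva_to_offset(0x1010, TEXT)))
```

With an image base of 0x00400000, the VA 0x00401010 has the RVA 0x1010, so under these assumptions the byte to patch would sit at file offset 0x410.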


Characteristics: The memory access rights for that section
in memory (R, RW, RWE, etc.).
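
As an illustration, the Characteristics value of the example section (60000020h) can be decoded with a few bit tests. The flag values below are the documented IMAGE_SCN_* constants; the access_rights helper is a hypothetical name.

```python
IMAGE_SCN_CNT_CODE = 0x00000020     # section contains executable code
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed
IMAGE_SCN_MEM_READ = 0x40000000     # section can be read
IMAGE_SCN_MEM_WRITE = 0x80000000    # section can be written


def access_rights(characteristics: int) -> str:
    """Return the access rights as a string such as "R", "RW" or "RE"."""
    rights = ""
    if characteristics & IMAGE_SCN_MEM_READ:
        rights += "R"
    if characteristics & IMAGE_SCN_MEM_WRITE:
        rights += "W"
    if characteristics & IMAGE_SCN_MEM_EXECUTE:
        rights += "E"
    return rights


text_characteristics = 0x60000020  # value from the example .text section
contains_code = bool(text_characteristics & IMAGE_SCN_CNT_CODE)

print(access_rights(text_characteristics), contains_code)
```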


Common section names of an executable are:

.text → This is normally the first section and contains the
executable code for the application. The entry point of the
application, that is, the address of the first application
instruction that will be executed, is also inside this
section. An executable can have more than one section with
executable code.


.data → This section contains the initialized data of an
application, such as strings.


.rdata or .idata → Usually these section names are used for
the sections where the import table is located. This is the
table that lists the Windows APIs used by the application
(along with the names of their associated DLLs).

Using this table, the Windows loader knows which APIs to
look up, and in which DLL, in order to retrieve their
addresses.


.rsrc → This is the common name for the resource-container
section, which contains things like images used
for the application’s UI.


Of course, the author can modify any of these names. These are
just some common names of specific sections, and nothing
guarantees that they will always be used with the same name or
for the same purpose. Their maximum length is 8 ASCII
characters, and each section has its own characteristics
(access permissions in memory).

For example, the .text section usually has read/execute access,
the .data section read/write access, the .rsrc section read-only
access, etc.



4.4 Memory and File Alignment [1]

As we mentioned earlier, an executable file typically
comprises several sections that serve different purposes.
They could contain executable code or simply data needed
by the application.

The alignment of the sections of an executable in the
physical file is usually different from the alignment of
its image in memory at runtime.

Typically, the default alignment of the sections in the file is
0x200 and the default in memory is 0x1000, which of course
doesn’t mean that the same alignment value can’t be used
in both cases if necessary.

These values indicate that the size of a section in the file or
in memory must be a multiple of the corresponding value. They
are stored inside the IMAGE_OPTIONAL_HEADER structure,
in two 32-bit members called FileAlignment and
SectionAlignment.

The reason for these default values is that they ensure
that the beginning of each section in the file
corresponds to the beginning of a disk sector, which
usually also has a size of 0x200 (512) bytes.

The linker, while creating the executable, will check the
file alignment value and, if it is 0x200, will round the size
of the .text section up to the next multiple of 0x200
(0x400 in this example) by padding it with extra bytes,
usually “zero” bytes.

The same thing happens when we launch the application.
The Windows loader will check the alignment of the
sections in memory and, if it is 0x1000, will reserve the
next multiple of 0x1000 bytes, which is 0x1000 bytes in this
case, since the size was smaller than 0x1000. That memory
area is again padded with “zero” bytes in order to reach
that size.


As another example, if the size of the code in
the .text section were 0x1050 bytes, the linker would
pad it to a multiple of 0x200. The next such value is
0x1200.

In memory, however, the section must be aligned to a
multiple of 0x1000, and the next multiple of 0x1000 above
0x1050 is 0x2000.
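
The rounding described here can be checked with a short Python sketch. The align_up and padding helpers are hypothetical names, and the bitmask form assumes the alignment is a power of two, which holds for the default values 0x200 and 0x1000.

```python
FILE_ALIGNMENT = 0x200       # default alignment in the file
SECTION_ALIGNMENT = 0x1000   # default alignment in memory


def align_up(size: int, alignment: int) -> int:
    """Round `size` up to a multiple of `alignment` (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def padding(size: int, alignment: int) -> int:
    """Number of padding bytes needed to reach the aligned size."""
    return align_up(size, alignment) - size


code_size = 0x1050  # size of the code in the .text section

print(hex(align_up(code_size, FILE_ALIGNMENT)),
      hex(padding(code_size, FILE_ALIGNMENT)))
print(hex(align_up(code_size, SECTION_ALIGNMENT)),
      hex(padding(code_size, SECTION_ALIGNMENT)))
```

The same 0x1050 bytes of code therefore need far more padding in memory than in the file, which is why a byte's RVA and its file offset generally differ.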


4.5 Conclusion

We are going to see all this information in practice during
the next chapters whenever it is necessary, so don’t worry if
you didn’t understand everything right away. It will
make more sense in the coming chapters.

Usually, we are interested only in specific information about
the executable file under analysis, and we don’t have to
care about all of the information located inside the header
of a PE executable. For this reason, we focused on those
parts that you definitely need to be aware of.


References
Here’s a list of all references linked or used in this course.

[1] An In-Depth Look into the Win32 Portable Executable File Format.
http://msdn.microsoft.com/en-us/magazine/cc301805.aspx

[2] IMAGE_NT_HEADERS structure
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680336%28v=vs.85%29.aspx

[3] IMAGE_FILE_HEADER structure
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680313%28v=vs.85%29.aspx

[4] IMAGE_OPTIONAL_HEADER structure
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680339%28v=vs.85%29.aspx

[5] IMAGE_DATA_DIRECTORY structure
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680305%28v=vs.85%29.aspx

[6] IMAGE_SECTION_HEADER structure
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680341%28v=vs.85%29.aspx
