Malware Analysis Professional

x64 Crash Course

Section 01 | Module 03
© Caendra Inc. 2020
All Rights Reserved
Table of Contents

MODULE 03 | x64 CRASH COURSE

3.1 Introduction

3.2 CPU Architecture

3.3 ASM – The Basics

MAPv1: Section 01, Module 03 - Caendra Inc. © 2020 | p.2


3.1 Introduction

Your journey into x64 Assembly…


3.1.1 Assembly Language – What is it?

Assembly language is a low-level programming language.

Low-level programming languages are close to the machine language, non-portable, and offer almost no abstraction (more details on that shortly).

[Figure: high-level languages (Fortran, C++, Pascal, C, Basic) → compiler → Assembly language → assembler → machine code]

Assembly language is an intermediate layer between machine code and (almost) all high-level languages.

High-level languages that require a virtual machine (like Java) take a different approach. The virtual machine runs the code and might convert it to Assembly and then machine code at runtime using a JIT (Just-In-Time) compiler.

In this course, we will use C++ as our example of a high-level language for the following reasons:
• The mapping from C++ to Assembly code is clearer
• Disassembly tools support it better

NOTE: The JVM (Java Virtual Machine) is written in C++

[Figure: Java code → compiler → byte code → JIT (convert to assembly, then machine code) → machine code]

Here are some additional details on the main characteristics of Assembly as a low-level language:
• Close to machine code: every machine code is represented by a single assembly instruction.

• Non-portable: because it is close to the machine code, the code can only be used on that specific type of machine. If we need it on another machine, we will need to rewrite the code!

• No abstraction: every instruction can be directly converted to a machine code depending on its operands. The language does not provide objects, classes, reusable functions, or even control statements or loops. All of that can be achieved in assembly, but with a lot of code!

After what we've explained so far, it is no surprise that a simple statement in C++ can result in multiple Assembly language instructions (and thus multiple machine codes), as seen below.

C++:
y *= x;

Assembly Language               Machine Code (hex), aka byte-code
mov eax,dword ptr [rsp+20h]     8B 44 24 20
imul eax,dword ptr [rsp+24h]    0F AF 44 24 24
mov dword ptr [rsp+20h],eax     89 44 24 20
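
To make the load, multiply, store pattern concrete, here is a minimal Python sketch that replays the three instructions. It is only a model: the dictionary standing in for the stack, the function name `run_multiply_in_place`, and the sample values are illustrative assumptions, not part of the course material.

```python
MASK32 = 0xFFFFFFFF  # EAX is 32 bits wide, so results wrap at 2**32


def run_multiply_in_place(stack, y_offset=0x20, x_offset=0x24):
    """Replay the three instructions behind `y *= x` on a dict that stands in for the stack."""
    eax = stack[y_offset]                    # mov  eax, dword ptr [rsp+20h]
    eax = (eax * stack[x_offset]) & MASK32   # imul eax, dword ptr [rsp+24h]
    stack[y_offset] = eax                    # mov  dword ptr [rsp+20h], eax
    return stack


stack = run_multiply_in_place({0x20: 6, 0x24: 7})
print(stack[0x20])  # 42
```

The value of y only changes at the third step, when EAX is written back to memory.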
3.1.2 Assembly Language – Why needed?

There are many high-level programming languages that make developers' lives easier, and for some developers, these languages make application development an art and a fun process! It is much easier to create programs in high-level languages, and it is easier to both debug and maintain them.

Q: So, why learn a low-level or old-fashioned language?

A: Because that is what you will be dealing with when doing reverse engineering.

For a long time, Assembly language was needed to create optimized code for performance reasons.

But with modern compilers shipping with intelligent optimizers, the use of low-level languages has dropped!

The purpose of learning assembly can be:
1. Writing code for specific hardware-related tasks, such as drivers for some operating systems (out of the scope of this course).

2. Reverse engineering an existing executable's code to:
• Find a vulnerability and create an exploit
• Understand the executable's functionality (for malware analysis, this is what we need)

When debugging software, we target the running process and dive into its executable code (the machine code). Reading this code as Assembly instructions is much easier than reading endless sequences of hexadecimal numbers (byte-code).

Learning Assembly language helps us understand the behavior of the program during execution at the CPU level, and with such knowledge we can detour the code during execution to take other paths that suit our analysis.
3.2 CPU Architecture


3.2.1 Overview

Because Assembly language is a human-readable form of machine code, the first step in our journey is to understand the architecture of the machine (the CPU).

From a high-level view, the CPU consists of the following components:
• Control Unit (CU): manages execution and the data flow of all other components.
• Registers: special memory stores used by various machine instructions. Almost all instructions refer to a register in one of their operands.
• Arithmetic and Logic Unit (ALU): handles arithmetic and logical (bitwise) operations on integer values.
• Floating-Point Unit (FPU): handles arithmetic operations on float values.

That was a simplified architecture. Modern CPUs contain more units and more internal memory stores (like cache memory). Some CPUs now even include an internal Graphics Processing Unit (GPU).

The extra components that are not covered in this course do not change any concept discussed here. They just give us more ways to manipulate the code, or a better understanding of CPU performance and optimization techniques.

The Control Unit (CU):
• Fetches machine codes from main memory (RAM)
• Decodes the codes into instructions
• Loads/saves data into registers
• Orders other units to perform specific actions
• Repeats all of that again and again

[Figure: the CPU, containing the CU, registers, ALU, and FPU, connected to memory]
3.2.2 Registers

This course discusses the Intel® 64 architecture (64-bit CPUs are the most widely used), which is the same architecture as AMD64. It is also referred to as x64 or x86_64. Intel® 64 CPUs have four main types of registers:

1. General Purpose Registers: 16 registers, 64 bits each, used for many purposes (the reason for calling them general):
• Operands for logical and arithmetic operations
• Operands for address calculations
• Pointers to memory locations

2. Segment Registers: 6 registers, 16 bits each. They control memory segmentation (code space, data space, etc.) during execution.

3. RFLAGS Register: a single 64-bit register that holds many bit-represented flags; each flag has a special meaning.

4. Instruction Pointer Register: a single 64-bit register. It points to the memory location that holds the next machine instruction to be executed. (Gaining control of the value saved in this register is important for exploit development.)
3.2.3 CPU Architecture – General Purpose Registers

General purpose registers are 64-bit registers. CPU instructions can access these registers completely or partially:
• Access the complete 64-bit register
• Access the lower 32-bit part of the register
• Access the lower 16-bit part of the register
• Access the lower 8-bit part of the register
• Only 4 registers have special access to the high 8-bit part of the lower 16-bit part

Example: RAX is a general-purpose register. Referring to RAX in an instruction means that we are accessing the full value held by the register, while EAX means we are accessing only the lower 32-bit part of the same register.

Bits 63-0: RAX
Bits 31-0: EAX
Bits 15-0: AX
Bits 15-8: AH
Bits 7-0: AL
Full 64-bit   Lower 32-bit   Lower 16-bit   Special access (high 8-bit)   Lower 8-bit
RAX           EAX            AX             AH                            AL
RBX           EBX            BX             BH                            BL
RCX           ECX            CX             CH                            CL
RDX           EDX            DX             DH                            DL
RDI           EDI            DI             --                            DIL
RSI           ESI            SI             --                            SIL
RBP           EBP            BP             --                            BPL
RSP           ESP            SP             --                            SPL
R8            R8D            R8W            --                            R8B
R9            R9D            R9W            --                            R9B
…R15          …R15D          …R15W          --                            …R15B

Please note that the three dots (…) in the last row mean that the same applies to registers R10, R11, R12, R13, and R14; they were omitted for brevity.
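
As a rough illustration of how the partial names are just windows onto the same 64 bits, the following Python sketch masks and shifts one value. The function name `register_views` and the sample value are assumptions made for this example only.

```python
def register_views(rax):
    """Return the value seen through each name of the RAX register."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep 64 bits
    return {
        "RAX": rax,                    # bits 63-0
        "EAX": rax & 0xFFFFFFFF,       # bits 31-0
        "AX": rax & 0xFFFF,            # bits 15-0
        "AH": (rax >> 8) & 0xFF,       # bits 15-8 (special access)
        "AL": rax & 0xFF,              # bits 7-0
    }


views = register_views(0x1122334455667788)
for name, value in views.items():
    print(f"{name:>3} = {value:#x}")
```

The same masks apply to RBX, RCX, and RDX; the remaining registers have no equivalent of AH.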

Each Assembly instruction has its own rules about which registers it can use. That is what makes them registers rather than arbitrary variables.

Each register has a common use in many instructions.
The registers' names are derived from their common usage:
• RAX: many Arithmetic instructions depend on the value of this register.
• RBX: mostly used as the Base index for pointers.
• RCX: loop operations use this register as their Counter.
• RDX: used by many Device input/output operations and by arithmetic Divide instructions.
• RDI: used by string operations as the Destination Index.
• RSI: used by string operations as the Source Index.
• RBP: mostly used as the Base Pointer for the memory location that holds variables passed to sub-routines.
• RSP: mostly used as the Stack Pointer for the program's logical stack.
• R8…R15: new registers added in 64-bit CPUs. With no special use by instructions, they can be used as helpers by any instruction that accepts them as operands.

We can also mention that:
• (L) in 8-bit partial access refers to Low
• (H) in special access refers to High
• (X) in 16-bit partial access refers to high and low together
• (E) in 32-bit partial access refers to Extended
• (D, W, B) in R8…R15 partial access refer to DWORD, WORD, and BYTE respectively

In addition to the common use of registers in Assembly instructions, operating systems have their own standard way of using them in function/API calls.

This is called the calling convention, which will be discussed in detail later in this module.


3.2.4 CPU Architecture – Segment Registers

Segment Registers:
• Before the 64-bit architecture (for example, in x86), these registers controlled the segmentation of memory in several modes. In 64-bit mode, memory is always treated as a flat address space (base zero for each segment).

• These registers are not used in this course because we are focusing on the 64-bit architecture, where the bases of the four main segment registers (CS, DS, ES, SS) are forced to zero.

Segment registers are:
• CS: Code Segment register, points to the code space in memory
• DS: Data Segment register, points to the data space in memory
• SS: Stack Segment register, points to the stack space in memory
• ES, FS, and GS: Extra data Segment registers, each pointing to an extra data space in memory

For clarity, let us assume that we are using a 16-bit CPU, which means that the general purpose registers are 16 bits long.

If we need to copy the value 12h (hexadecimal 12, which is 18 in decimal) to memory using a register as a pointer, we will notice that we can only access 64 KB of memory, because the maximum value a register can hold is 65535 (2^16 – 1).

To access more than 64 KB of memory, we can combine a segment register with a general purpose register to get an address wider than 16 bits:

MOV BYTE PTR ES:[BX], 12h

The ES:BX combination allows the instruction to refer to more memory (BX is used here because 16-bit addressing only accepts BX, BP, SI, and DI as pointers). How the two values are combined still depends on the segmentation mode.
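
As one concrete case, the sketch below assumes 8086-style real mode, where the physical address is the segment value times 16 plus the offset, truncated to 20 bits. Other segmentation modes combine the two values differently, and the function name `real_mode_address` is made up for this illustration.

```python
def real_mode_address(segment, offset):
    """Physical address in 16-bit real mode (assumed): segment * 16 + offset, wrapped to 20 bits."""
    segment &= 0xFFFF   # segment registers hold 16 bits
    offset &= 0xFFFF    # so do the general purpose registers of a 16-bit CPU
    return (segment * 16 + offset) & 0xFFFFF


# With ES = 0x1234 and an offset of 0x0010, the write lands at 0x12350,
# an address that would not fit in a single 16-bit register.
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
```

With the segment set to zero, only the first 64 KB are reachable, which is exactly the limit described above.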

It is now clear why these registers are almost unused in 64-bit programs: a 64-bit register can address up to 2^64 bytes (16 exabytes, or about 16 million terabytes).

64-bit operating systems still use the FS and GS registers, mainly to locate per-thread or per-CPU data structures (for example, 64-bit Windows reaches the Thread Environment Block through GS).


3.2.5 CPU Architecture – RFLAGS Register

RFLAGS is a 64-bit register, but the high 32-bit part is reserved for future use; in other words, it is not used yet.

A flag is one bit of the lower 32 bits of the RFLAGS register.

Many instructions affect, or are affected by, the values of the flags in RFLAGS.

It is worth mentioning that in the 32-bit (x86) architecture, this register was 32 bits wide and named EFLAGS. The x64 architecture extended the size of the register but kept its use unchanged from the older x86. Additionally, not all 32 flags are used. Some of them are reserved and must be kept intact.

This course will only discuss the use of some flags, as needed.

The following is a quick view of the lower 32-bit part of the register.

Bit     Name   Meaning
0       CF     Carry Flag
1       --     Reserved
2       PF     Parity Flag
3       --     Reserved
4       AF     Auxiliary Carry Flag
5       --     Reserved
6       ZF     Zero Flag
7       SF     Sign Flag
8       TF     Trap Flag
9       IF     Interrupt Enable Flag
10      DF     Direction Flag
11      OF     Overflow Flag
12-13   IOPL   I/O Privilege Level field
14      NT     Nested Task flag
15      --     Reserved
16      RF     Resume Flag
17      VM     Virtual-8086 Mode flag
18      AC     Alignment Check flag
19      VIF    Virtual Interrupt Flag
20      VIP    Virtual Interrupt Pending flag
21      ID     Identification Flag
22-31   --     Reserved
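
A debugger usually shows RFLAGS as one hexadecimal number. The Python sketch below shows one way to turn such a number into flag names using the bit positions from the table; the names `FLAG_BITS`, `set_flags`, and `iopl` are illustrative, and only the flags in bits 0 to 11 are decoded.

```python
# Bit positions taken from the table above (single-bit flags in bits 0-11).
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def set_flags(rflags):
    """Return the names of the flags whose bit is 1 in an RFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if (rflags >> bit) & 1]


def iopl(rflags):
    """Return the two-bit I/O Privilege Level field (bits 12-13)."""
    return (rflags >> 12) & 0b11


print(set_flags(0x246))  # ['PF', 'ZF', 'IF']
```

In the sample value 0x246, bit 1 is also set, but it is a reserved bit and therefore not reported.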
3.2.6 CPU Architecture – Instruction Pointer Register

The short name of the Instruction Pointer register in the x64 architecture is RIP.

The next instruction to be executed is always pointed to by this register. Be aware that it is the next instruction, not the current instruction.

No instruction can write to this register directly as an operand (there is no MOV into RIP), simply because changing this pointer changes the execution flow/path of the program.

When a sub-routine is called using a call instruction, the current value of the RIP register (the address of the instruction after the call) is pushed onto the program's memory stack. When the sub-routine returns using a ret instruction, that value is popped back into RIP so execution can continue from where it was interrupted by the call instruction.

Say that again, please!

We cannot manipulate this register directly because there is no instruction to do that; the architecture prevents it. But the value of this register is saved to the memory stack and loaded back from there, and the stack is a memory region that we can access and modify using many instructions!

This sounds interesting to exploit developers, and maybe to us, too.
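
The following Python sketch models only the part of call and ret described above: a saved return address on a stack that ordinary code can overwrite. The class `TinyCpu`, its method names, and all addresses are invented for this illustration; a real CPU does far more.

```python
class TinyCpu:
    """Toy model with just an instruction pointer and a stack."""

    def __init__(self, rip):
        self.rip = rip
        self.stack = []

    def call(self, target, call_length):
        # The return address is the instruction right after the call.
        self.stack.append(self.rip + call_length)
        self.rip = target

    def ret(self):
        # Whatever sits on top of the stack becomes the new RIP.
        self.rip = self.stack.pop()


cpu = TinyCpu(rip=0x1000)
cpu.call(0x2000, call_length=5)   # pushes 0x1005, jumps to 0x2000
cpu.ret()
normal_return = cpu.rip           # 0x1005

cpu.rip = 0x1000
cpu.call(0x2000, call_length=5)
cpu.stack[-1] = 0x41414141        # the saved value is overwritten in memory
cpu.ret()
hijacked_return = cpu.rip         # 0x41414141
print(hex(normal_return), hex(hijacked_return))
```

The second run never touches RIP directly, yet execution resumes at the overwritten address.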
3.2.7 CPU Architecture – Instruction Set

The Control Unit (CU) does all the processing magic by executing a series of instructions represented by machine codes (aka byte-code).

Machine codes are just binary numbers. The CU makes these codes useful by performing a very simple task for each code value. The long list of codes understood by the CPU is called the Instruction Set.

Deep knowledge of Instruction Set encoding is not our goal.

Our goal is to know how to read and write instructions in Assembly language, which is far more readable than the binary representation.


3.3 ASM – The Basics

Your journey into x64 Assembly…


3.3.1 Instructions

In general, instructions are written in the following format:

<instruction>[<white space><destination>[<,source>]]

Note: <> marks a mandatory part and [] an optional one, depending on the instruction.

Anything after a semicolon is considered a comment, except for semicolons inside double quotation marks, which are treated as normal text.

Example:
mov eax, ebx ; this is to copy EBX value into EAX

Here, everything after the semicolon is a comment.

More examples:

Instruction         Meaning
MOV RCX, 20h        Copy hex value 20 into the RCX register
;MOV RCX, 33h       Does nothing; the whole line is a comment
MOV RCX ; 20h       Syntax error, source operand not found!
INC EAX             Increase EAX value by one
INC EAX, 3h         Syntax error, INC does not take a source operand
INC EAX;whatever    Increase EAX value by one
CLTQ                Sign-extend the value in EAX to 64 bits and store it in RAX
CLTQ EBX            Syntax error, CLTQ does not take any operand
CLTQ EAX            Syntax error, CLTQ does not take any operand

Note: CLTQ is the AT&T-syntax mnemonic; in Intel syntax the same instruction is written CDQE.

There are many groups of instructions, and each group represents a feature of the CPU. Therefore, one CPU can support groups that are not included in another CPU, even if both are x64. For example:
• General Purpose Instructions: available in all x86 and x64 CPUs
• x87 FPU Instructions: for floating point, available in almost all modern x64 CPUs
• MMX Instructions: multimedia helpers, available in almost all modern x64 CPUs
• Intel® SHA Extensions: SHA algorithm calculators, available in Intel® Atom CPUs

Assembly language contains hundreds of instructions.

Each instruction has its own use, size, and speed! The size of an instruction (in bytes) varies according to the given operands. The same goes for the execution speed of the instruction.

But for modern CPUs, comparing the speed of individual instructions is no longer practical!

Q: You might be asking, why?

A: Because of many factors in modern CPU architecture. Discussing these factors is out of the scope of this course. We are concerned with the effects of instructions, not their speed.

Low-level developers tend to learn as many instructions as they can for two main reasons:

1. Optimization: For example, both ( add eax, 1 ) and ( inc eax ) increment the value of the EAX register by one. We use INC because its machine code is smaller. But if we need to watch the Carry Flag (CF), then we use ADD because it affects that flag, unlike the INC instruction, which preserves it.

2. Obfuscation: Some instructions make it harder for reverse engineers to figure out the execution path of the process. Note that obfuscation may greatly hurt optimization.
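
The Carry Flag difference between ADD and INC mentioned under point 1 can be sketched in Python as follows. This models only the result and CF for 32-bit operands; the function names `add32` and `inc32` are assumptions for the example, and the other flags are ignored.

```python
MASK32 = 0xFFFFFFFF


def add32(eax, value):
    """ADD on a 32-bit register: returns (result, CF). CF is the carry out of bit 31."""
    total = (eax & MASK32) + (value & MASK32)
    return total & MASK32, int(total > MASK32)


def inc32(eax, cf):
    """INC on a 32-bit register: returns (result, CF). CF is passed through untouched."""
    return (eax + 1) & MASK32, cf


print(add32(0xFFFFFFFF, 1))     # (0, 1): the wrap-around is visible in CF
print(inc32(0xFFFFFFFF, cf=0))  # (0, 0): same result, CF keeps its old value
```

Code that branches on CF after the increment therefore behaves differently depending on which instruction was used.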


3.3.2 Fundamental Data Types

The fundamental data types, which can also represent unsigned numeric values, are:
• Byte: one memory byte (8 bits)
• Word: two memory bytes (16 bits)
• Double Word: four memory bytes (32 bits)
• Quad Word: eight memory bytes (64 bits)
• Double Quad Word: 16 memory bytes (128 bits)


3.3.3 Signed Numeric Data

Each fundamental data type can represent a signed numeric value by using the highest bit as the sign (0 for positive, 1 for negative). Negative values are encoded using the two's complement method.

Examples:
Data Length   Binary representation                 As unsigned   As signed
Byte          0000 0010                             2             2
Byte          1000 0010                             130           -126
Word          00000000 10000010                     130           130
Word          10000000 10000010                     32898         -32638
Double Word   00000000 00000000 10000000 10000010   32898         32898
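
The signed column can be checked with a short Python sketch of the two's complement rule: if the highest bit of the chosen width is set, subtract 2 to the power of that width. The function name `as_signed` is made up for this illustration.

```python
def as_signed(value, bits):
    """Interpret the low `bits` of `value` as a two's complement signed number."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):        # highest bit set means negative
        return value - (1 << bits)
    return value


print(as_signed(0b10000010, 8))            # -126
print(as_signed(0b1000000010000010, 16))   # -32638
print(as_signed(0b1000000010000010, 32))   # 32898
```

The same bit pattern gives different signed values at different widths because the sign bit moves with the width.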
3.3.4 Floating-Point Data Type

The floating-point data type is the binary representation of floats (numbers with fractional parts).

At the binary level (the only level understood by the CPU), mathematical operations on floats are complicated compared to operations on integers.

For example, a 32-bit float value contains three fields:

Bit 31: sign bit
Bits 30-23: 8-bit exponent
Bits 22-0: 23-bit precision (fraction)

General purpose instructions cannot deal with float values directly.

A simple operation like incrementing a float value by one would require many lines of complicated code using general purpose instructions like ADD, MOV, SHL, SHR, etc.

This is where FPU instructions come in.

For example:
• FADD is used to add values to a float
• FMUL is used for multiplication

FPU instructions can process four types of floats:

Size     Sign     Exponent                 Integer bit   Precision (fraction)
16-bit   bit 15   5 bits (bits 14-10)      --            10 bits (bits 9-0)
32-bit   bit 31   8 bits (bits 30-23)      --            23 bits (bits 22-0)
64-bit   bit 63   11 bits (bits 62-52)     --            52 bits (bits 51-0)
80-bit   bit 79   15 bits (bits 78-64)     bit 63        63 bits (bits 62-0)

In the 80-bit format, the integer bit and the fraction together form a 64-bit significand.
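
To see the three fields of the 32-bit format in a real value, the Python sketch below reinterprets a float's bytes as an integer with the standard `struct` module and then slices out the fields. The function name `float32_fields` is an assumption for this example.

```python
import struct


def float32_fields(x):
    """Return (sign, exponent, fraction) of x encoded as a 32-bit float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # same 4 bytes, read as an integer
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # bits 22-0
    return sign, exponent, fraction


print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-2.5))   # (1, 128, 2097152)
```

For -2.5, the value is -1.25 × 2^1, so the stored exponent is 127 + 1 and the fraction encodes the .25.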
3.3.5 Using Registers

Copying a value from one register to another, from a memory location to a register, or from a register to a memory location is called moving.

Therefore, in Assembly language, when we move a value, we are actually copying it from source to destination. The most frequently used instruction for this purpose is MOV.

Registers are the fastest memory locations in the computer, simply because they are built inside the CPU and tightly integrated with its units.

Because accessing registers is faster than accessing memory, a typical operation moves values from memory to registers, processes them, then moves the results back to memory.

A side note while speaking about fast memory:
• There is a small memory named cache memory, which is also located inside the CPU and is very fast compared to main memory (RAM).

• Cache memory contains copies of small parts of the RAM, gathered by a complex internal CPU mechanism. When a memory location is needed, the CPU looks first in cache memory to fetch the value from there instead of going the long way to the RAM.

Ordinary instructions do not address cache memory directly. It is a purely internal part of the architecture that the CPU handles implicitly.

Therefore, we won't be focusing on cache memory in our Assembly coding or disassembling tasks.

A register can be accessed in many sizes (for example, RAX for 64-bit and EAX for 32-bit).

Moving values between registers and memory locations must be handled in identical sizes. Writing to an 8-bit or 16-bit part of a register does not affect the values already saved in the other parts of that register. Writing to the 32-bit part is the exception in x64: the upper 32 bits of the register are cleared to zero. Keep that in mind.

For example:
mov rax, 0x100000000 ; now rax value is 0x00000001 0000 0000
mov eax, 0xffff      ; now rax value is 0x00000000 0000 ffff (a 32-bit write clears the upper 32 bits)
mov ah, 0            ; now rax value is 0x00000000 0000 00ff

In this example, we use the SHL instruction, which shifts bits to the left inside the targeted part of the register. Syntax:
SHL <register>, <bits count>

mov rbx, 0xa0000000f ; new rbx value is 0x0000000a 0000 000f
shl ebx, 8           ; new rbx value is 0x00000000 0000 0f00 (a 32-bit write clears the upper 32 bits)
shl rbx, 4           ; new rbx value is 0x00000000 0000 f000
shl bx, 4            ; new rbx value is 0x00000000 0000 0000
mov bl, 0xbb         ; new rbx value is 0x00000000 0000 00bb
mov rcx, 0xcc        ; new rcx value is 0x00000000 0000 00cc
shl rcx, 32          ; new rcx value is 0x000000cc 0000 0000
mov ecx, ebx         ; new rcx value is 0x00000000 0000 00bb (upper 32 bits cleared)
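
The partial-write rules can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: only the low 8, 16, 32, or 64 bits are handled (the AH-style high byte is left out), the shift count is not masked the way real hardware masks it, and the names `write_part` and `shl_part` are invented for this example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_part(reg, value, bits):
    """Write `value` into the low `bits` of a 64-bit register value, following x64 rules."""
    mask = (1 << bits) - 1
    if bits in (32, 64):
        return value & mask                      # 32-bit writes zero the upper half; 64-bit writes replace everything
    return (reg & ~mask & MASK64) | (value & mask)   # 8- and 16-bit writes keep the other bits


def shl_part(reg, count, bits):
    """SHL on the low `bits` of the register; bits shifted out of that part are lost."""
    part = reg & ((1 << bits) - 1)
    return write_part(reg, part << count, bits)


rbx = 0xA0000000F
rbx = shl_part(rbx, 8, 32)                   # shl ebx, 8
after_shl_ebx = rbx
rbx = shl_part(rbx, 4, 64)                   # shl rbx, 4
rbx = shl_part(rbx, 4, 16)                   # shl bx, 4
rbx = write_part(rbx, 0xBB, 8)               # mov bl, 0xbb
rcx = shl_part(0xCC, 32, 64)                 # mov rcx, 0xcc / shl rcx, 32
rcx = write_part(rcx, rbx & 0xFFFFFFFF, 32)  # mov ecx, ebx
print(hex(after_shl_ebx), hex(rbx), hex(rcx))
```

Replaying the trace this way makes it easy to spot where the upper 32 bits disappear.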
3.3.6 Mathematical Operations

Registers can be used for mathematical operations through general purpose instructions.

The most used instructions are:
1. ADD
2. SUB
3. MUL
4. DIV

ADD: calculates the sum of the destination and source operands and saves the result in the destination.

SUB: subtracts the value of the source from the destination and saves the result in the destination.

Both ADD and SUB accept two operands (destination and source), where operands can be:
• Registers
• Memory
• Immediate (constant)

Not all combinations are allowed. For example:
add eax, bx              ; invalid, operand sizes do not match
add rcx, rax
add al, bl
add al, bh
add ah, r8b              ; invalid, AH cannot be combined with R8B…R15B
add al, r8b
add ax, 1
add 1, ax                ; invalid, destination cannot be an immediate
add eax, dword ptr [esi]
add dword ptr [esi], 1

Details about the validity of operand combinations for each instruction can be found in the CPU reference manuals.

Another way is to test the code with an assembler (for example, through GCC or NASM) and observe the result.

A fast way is to try the code in an online tool that uses GCC or NASM behind the scenes. For example:
https://defuse.ca/online-x86-assembler.htm and
https://rextester.com/l/nasm_online_compiler

MUL and DIV make special use of registers (as do many other instructions). Unlike ADD and SUB, these instructions accept one operand, and that operand is treated as the source.

The destination operand is implicit and must be loaded by a previous instruction.

( mul <source> ) uses RAX, EAX, AX, or AL as the implicit destination (depending on the source size).

It multiplies the value of the source by the value of the destination, then saves the low part of the result in the destination register and the high part in RDX, EDX, or DX (depending on the source size). The 8-bit form is the exception: AL times the source gives a 16-bit result in AX, so the high part lands in AH.

For example, 16-bit multiplication must use AX and DX:

; ffh x ab00h = aa5500h
Instruction       ax      bx      dx      dx:ax
mov ax, 0xff      00ffh   ?       ?       ?
mov bx, 0xab00    00ffh   ab00h   ?       ?
mul bx            5500h   ab00h   00aah   00aa5500h

Another example, 32-bit multiplication must use EAX and EDX:

; 0xff00 x 0xab0000 = 0xaa55000000
Instruction          eax         ebx         edx         edx:eax
mov eax, 0xff00      0000ff00h   ?           ?           ?
mov ebx, 0xab0000    0000ff00h   00ab0000h   ?           ?
mul ebx              55000000h   00ab0000h   000000aah   000000aa55000000h
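
Both multiplication tables can be reproduced with a small Python model of the 16-, 32-, and 64-bit forms of MUL. The function name `mul` and its return convention (high part, low part) are assumptions for this sketch; the 8-bit form, which keeps its result in AX, is not covered.

```python
def mul(accumulator, source, bits):
    """Unsigned MUL for 16/32/64-bit operands: returns (high, low) as left in rDX and rAX."""
    mask = (1 << bits) - 1
    product = (accumulator & mask) * (source & mask)
    return product >> bits, product & mask


dx, ax = mul(0xFF, 0xAB00, 16)           # mul bx with ax = 0xff
edx, eax = mul(0xFF00, 0xAB0000, 32)     # mul ebx with eax = 0xff00
print(hex(dx), hex(ax))     # 0xaa 0x5500
print(hex(edx), hex(eax))   # 0xaa 0x55000000
```

The product is always twice as wide as the operands, which is why a second register is needed for the high part.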

The DIV instruction works in a similar way. We leave it as an exercise to look up the appropriate way to use the DIV instruction.

It is also important to know that MUL deals with unsigned values, while IMUL (a different instruction) is for signed values.

DIV produces an integer quotient and a remainder. For float division, we use the FDIV instruction.
3.3.7 Logical Operations

Logical operations can be performed using the AND, OR, XOR, and NOT instructions.

Flags like the Zero Flag (ZF) are affected by the result of AND, OR, and XOR (NOT leaves the flags unchanged).

In addition, the TEST instruction performs a virtual AND operation: it sets the flags without saving the result in the destination operand.

Logical instructions are very important, as they are heavily used to control the execution path of the program.

This is done using conditional jump instructions, which are controlled by the flags.

The XOR instruction is often seen in a form that looks odd:

xor eax, eax

This performs a logical XOR of the EAX register's value with itself and saves the result in EAX. The result is always zero, so the instruction has the same effect on EAX as:

mov eax, 0

This use of XOR is an optimization.

The machine code for the MOV instruction takes more bytes than the machine code for XOR (mov eax, 0 is 5 bytes: B8 00 00 00 00; xor eax, eax is 2 bytes: 31 C0). Moreover, on older CPUs, XOR is faster than MOV.


3.3.8 Bitwise Operations

Bitwise operations are performed using many instructions, such as:
• SHL: shift bits to the left (to higher bits). Most-significant bits might be lost
• SHR: shift bits to the right (to lower bits). Least-significant bits might be lost
• ROL: rotate bits to the left. Most-significant bits are reinserted on the right
• ROR: rotate bits to the right. Least-significant bits are reinserted on the left

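Example (EAX is shown as 32 bits, most-significant byte first):

Instruction         eax (in binary)
mov eax, 0x1004     00000000 00000000 00010000 00000100
shr ax, 2           00000000 00000000 00000100 00000001
ror eax, 1          10000000 00000000 00000010 00000000
shl ax, 1           10000000 00000000 00000100 00000000
shl eax, 1          00000000 00000000 00001000 00000000

Note that shl ax, 1 changes only the lower 16 bits of EAX, while shl eax, 1 shifts the most-significant bit out of the register.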
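The five instructions of this example can be traced on a full 32-bit EAX value with a small C++ sketch. The helpers with_ax, ror32, and trace_example are hypothetical names introduced here; they only model the register arithmetic and ignore the flags.

```cpp
#include <cassert>
#include <cstdint>

// Replace the low 16 bits (AX) of a 32-bit register value (EAX).
constexpr std::uint32_t with_ax(std::uint32_t eax, std::uint16_t ax) {
    return (eax & 0xFFFF0000u) | ax;
}

// Rotate a 32-bit value right by n bits (0 < n < 32), like ROR.
constexpr std::uint32_t ror32(std::uint32_t v, unsigned n) {
    return (v >> n) | (v << (32 - n));
}

// Run the five instructions and return the final EAX (needs C++14).
constexpr std::uint32_t trace_example() {
    std::uint32_t eax = 0x1004u;                                           // mov eax, 0x1004
    eax = with_ax(eax, static_cast<std::uint16_t>((eax & 0xFFFFu) >> 2));  // shr ax, 2  -> 0x00000401
    eax = ror32(eax, 1);                                                   // ror eax, 1 -> 0x80000200
    eax = with_ax(eax, static_cast<std::uint16_t>((eax & 0xFFFFu) << 1));  // shl ax, 1  -> 0x80000400
    eax = eax << 1;                                                        // shl eax, 1 -> 0x00000800
    return eax;
}

static_assert(trace_example() == 0x00000800u, "final EAX value");
```

The 16-bit operations go through with_ax on purpose: writing AX leaves the upper half of EAX untouched, which is why the bit rotated into position 31 survives shl ax, 1 and is only lost by shl eax, 1.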


3.3.9 Control-Transfer Instructions

To change the execution path and create the control flow of the program, Control-Transfer instructions come in handy. Apart from the automatic advance to the next instruction, using them is the only way to change the value of RIP, and therefore the execution path.

Control-Transfer instructions can be split into two main categories:
• Unconditional: always change path when encountered. For example, instructions like: JMP, CALL, RET and INT
• Conditional: change path if-and-only-if a condition is met. For example, instructions like: JZ, JNZ, JC, JE, and LOOP

3.3.9 Control-Transfer Instructions

Some conditional jump instructions:


• JZ: jump if Zero Flag (ZF) is set
• JNZ: jump if Zero Flag (ZF) is clear
• JC: jump if Carry Flag (CF) is set
• JE: jump if Equal (when Zero Flag (ZF) is set). Yes, it is
the same as JZ, the same machine code as well
• JG: jump if greater (when ZF is cleared, and SF=OF)
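To make the flag conditions concrete, here is a minimal C++ sketch that computes the flags a 32-bit comparison (a subtraction whose result is discarded) would produce and then evaluates the jump conditions listed above. The Flags struct and the function names are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Flags produced by subtracting b from a without keeping the result.
struct Flags {
    bool zf;  // Zero Flag: result is zero
    bool sf;  // Sign Flag: most-significant bit of the result
    bool cf;  // Carry Flag: unsigned borrow occurred
    bool of;  // Overflow Flag: signed overflow occurred
};

// Needs C++14 because of the local variable in a constexpr function.
constexpr Flags cmp32(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a - b;
    return Flags{r == 0, (r >> 31) != 0, a < b,
                 (((a ^ b) & (a ^ r)) >> 31) != 0};
}

constexpr bool jz_taken(Flags f)  { return f.zf; }                    // also JE
constexpr bool jnz_taken(Flags f) { return !f.zf; }
constexpr bool jc_taken(Flags f)  { return f.cf; }
constexpr bool jg_taken(Flags f)  { return !f.zf && f.sf == f.of; }   // signed greater
```

For instance, jg_taken(cmp32(5, 3)) is true, while jg_taken(cmp32(0xFFFFFFFF, 1)) is false because JG treats 0xFFFFFFFF as the signed value -1.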



3.3.9 Control-Transfer Instructions

For example, the following assembly code compares the values of ebx and eax and changes the execution path if both registers contain the same value:

xor ebx, eax
jz 0x50000000 ; address of next line of code when equal


3.3.9 Control-Transfer Instructions

Consider the impact of the XOR instruction, which saves the output into the destination register. What if the original value, which was overwritten, is needed later in the program?

If we need to preserve that value, then we need to use the instruction “CMP”, which sets the flags without changing its operands.

cmp ebx, eax
jz 0x50000000 ; address of next line of code when equal


3.3.9 Control-Transfer Instructions

The LOOP instruction controls the execution path in a special way. It decrements the value of the count register by one (RCX in 64-bit code, ECX in 32-bit code), then jumps if the value is not zero. Take a look at the example below:

xor rax, rax
mov ecx, 12
_summation_loop:
    add eax, ecx
    loop _summation_loop
mov ebx, eax

At the end of this code, the value of EBX will be 78. Can you trace and verify that?
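One way to check the trace is to mirror the loop in C++. The function below is only a sketch of the example above; summation_loop is a hypothetical name, and the code assumes the initial count is at least 1, as in the example.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors: xor rax, rax / mov ecx, count / add eax, ecx / loop / mov ebx, eax
// Needs C++14 (loop inside a constexpr function).
constexpr std::uint32_t summation_loop(std::uint32_t count) {
    std::uint32_t eax = 0;      // xor rax, rax
    std::uint32_t ecx = count;  // mov ecx, count
    do {
        eax += ecx;             // add eax, ecx
    } while (--ecx != 0);       // loop: decrement the counter, jump while not zero
    return eax;                 // mov ebx, eax
}

static_assert(summation_loop(12) == 78, "12 + 11 + ... + 1");
```

A do/while loop is used because LOOP decrements and tests the counter after the loop body has already run once.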
3.3.9 Control-Transfer Instructions

The previous example used a label, “_summation_loop”, instead of an actual address.
• Labels are helpers in Assembly Language to be used instead of addresses of instructions.
• Labels can also be used with JMP and CALL instructions.


3.3.9 Control-Transfer Instructions

Do not confuse Assembly labels with function names in high-level languages. Assembly labels are very similar to the C labels that are used by legacy goto statements.

Best practices of high-level languages restrict the use of goto and labels. In Assembly, however, the program depends heavily on jumps and labels!


3.3.9 Control-Transfer Instructions

Interrupts:
• The INT instruction is a very special instruction that raises a software interrupt. It is used to call kernel functions.

• These kernel calls are named interrupts.


3.3.9 Control-Transfer Instructions

Interrupts (cont.):
• The program must fill a few registers with specific values in a specific way before calling an interrupt; each interrupt has its own manual regarding that.

• Because interrupts are kernel calls, they are dependent on the Operating System that is running the program.

3.3.9 Control-Transfer Instructions

Sub-routines:
• The CALL instruction is used similarly to JMP to change the execution path unconditionally.

• The power of the CALL instruction is that it saves the original value of the RIP Register (the address of the instruction right after the CALL) on the stack. The RET instruction is then used to pop that value back into RIP and return to the original location (the instruction exactly after CALL).

3.3.9 Control-Transfer Instructions

Sub-routines (cont.):
• A sub-routine is a convention that defines the chunk of instructions starting at the first instruction targeted by CALL and ending with RET.

• Calling a sub-routine can also represent a system function call. More on that later.


3.3.9 Control-Transfer Instructions

Sub-routines (cont.):
• Assembly Developers (and compilers for high-level languages) are responsible for keeping CALL and RET in sync. Each CALL must have a corresponding RET.

• Later in this module, we will discuss more details about:
  • CALL and RET instructions and their effect on the Stack.
  • Calling convention in 64-bit Operating Systems (like Windows).

3.3.9 Control-Transfer Instructions

    ;... some code              ; main part of the program
    call _sub1
    call _sub2
    ;... some code
    ;... end execution code

_sub1:                          ; sub-routine
    ;... do something..
    ret

_sub2:                          ; sub-routine
    ;... do something else
    ret

3.3.10 Accessing Memory

Most instructions can access memory directly. In x64 the default is the flat memory model. This means the process sees memory as one continuous address space and can address any part of it, starting from memory location 0x00.

In instructions, a memory address is enclosed in brackets to indicate a memory access. An example is shown on the next slide.

3.3.10 Accessing Memory

Let’s assume that the value 0x11223344 is already present at memory address 0x40000000. The columns show the state after each instruction:

Instruction            eax          ebx          ecx          [ebx]
mov ebx, 0x40000000    ?            0x40000000   ?            0x11223344
mov eax, ebx           0x40000000   0x40000000   ?            0x11223344
mov ecx, [ebx]         0x40000000   0x40000000   0x11223344   0x11223344
inc dword [ebx]        0x40000000   0x40000000   0x11223344   0x11223345
mov [ebx], ecx         0x40000000   0x40000000   0x11223344   0x11223344
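In C++ terms, a register that holds an address behaves like a pointer, and the brackets behave like a dereference. The sketch below replays the five instructions under that analogy; run_memory_example is a hypothetical name, and the cell parameter stands in for the dword at address 0x40000000.

```cpp
#include <cassert>
#include <cstdint>

// Replays the example; returns the memory value seen right after the INC.
std::uint32_t run_memory_example(std::uint32_t& cell) {
    std::uint32_t* ebx = &cell;   // mov ebx, 0x40000000 (ebx holds an address)
    std::uint32_t* eax = ebx;     // mov eax, ebx        (copies the address, not the value)
    std::uint32_t ecx = *ebx;     // mov ecx, [ebx]      (loads the value from memory)
    ++*ebx;                       // inc dword [ebx]     (modifies memory in place)
    const std::uint32_t after_inc = *eax;  // eax points to the same cell
    *ebx = ecx;                   // mov [ebx], ecx      (stores the old value back)
    return after_inc;
}
```

Called with a cell that holds 0x11223344, the function returns 0x11223345 and leaves the cell at 0x11223344 again, matching the last two rows of the table.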
3.3.11 The Stack

The Program Stack is a one-dimensional memory array. The memory address of the top item is saved in the RSP Register.

IMPORTANT: Pushing onto the Stack decreases the value of the RSP Register, while popping from it increases the value of the RSP Register.


3.3.11 The Stack

Pushing decreases and popping increases the value of RSP by 2 or 8, depending on the operand size (16-bit or 64-bit).

Pushing or popping 32-bit operands (Registers or memory values) is not allowed! A pushed immediate value is always extended to 64 bits.


3.3.11 The Stack

• The PUSH instruction is used to push data onto the stack.

• The POP instruction is used to pop data off the stack.

• The CALL instruction pushes the value of the RIP Register onto the stack.

• The RET instruction pops a value from the stack into the RIP Register.

3.3.11 The Stack

The PUSH and POP operands determine whether a 16-bit or 64-bit value is used.

CALL and RET always push and pop a 64-bit value from/to the RIP Register.


3.3.11 The Stack

By default, 64-bit processes should align stack items as 64-bit values, even if the required value is 32-bit, 16-bit, or 8-bit. This default is for the sake of fast access to main memory.

Each process is responsible for managing its own stack. Managing the stack by manually changing the RSP value (using SUB, ADD, or MOV) is a standard practice in 64-bit Operating Systems (more on that later).

3.3.11 The Stack

Instructions that manage the Stack depend on the RSP Register. When a Base Pointer for the Stack is needed (a pointer to the bottom item of the current Stack Frame), RBP is the standard register to use.

Having a Base Pointer is not always necessary. In fact, compilers tend not to generate code to manage a base pointer unless it is necessary.

Whether managing a Base Pointer is necessary depends on the need for a Stack Frame. If a Stack Frame is needed, then RBP should come into play (more on Stack Frames later).

3.3.11 The Stack

The following is a sample representation of a 64-bit aligned stack:

Address of the item    Item
RSP                    Top item of the stack (fifth, most recently pushed)
RSP + 8                Fourth item of the stack
RSP + 16               Third item of the stack
RSP + 24               Second item of the stack
RSP + 32               First item of the stack


3.3.11 The Stack

Can we mix 16-bit and 64-bit alignments in the same stack?

Yes, but that needs a careful implementation. We can even mix non-standard alignments like 8-bit or 24-bit by modifying the RSP register manually. None of this is recommended, though.


3.3.11 The Stack

How to tell that the stack is empty?

It is not always possible. In 32-bit processes, the EBP register is used as the Base Pointer of the stack, but the corresponding register is not always used in 64-bit processes.


3.3.11 The Stack

Example use of the stack (the initial value of RSP is an assumption for the sake of illustration):

Instruction           rax       rbx       rsp
mov rax, 0xaaaa       0xaaaa    ?         0x0222
mov rbx, 0xbbbb       0xaaaa    0xbbbb    0x0222
push rax              0xaaaa    0xbbbb    0x021a
push rbx              0xaaaa    0xbbbb    0x0212
mov rbx, [rsp + 8]    0xaaaa    0xaaaa    0x0212

Stack contents at the end:

Address of the item   Value
RSP                   0xbbbb
RSP + 8               0xaaaa
RSP + 16              ?????
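The stack example can be replayed with a toy model in C++. ToyStack, StackResult, and run_stack_example are hypothetical names introduced for this sketch; the model only supports 64-bit pushes and pops and uses a sparse map in place of real memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of a 64-bit stack: an RSP value plus sparse memory.
struct ToyStack {
    std::uint64_t rsp;
    std::map<std::uint64_t, std::uint64_t> mem;

    // push r64: RSP goes down by 8, then the value is stored at [rsp].
    void push(std::uint64_t v) { rsp -= 8; mem[rsp] = v; }
    // pop r64: the value at [rsp] is read, then RSP goes up by 8.
    // The old value stays in memory; only RSP moves.
    std::uint64_t pop() { std::uint64_t v = mem.at(rsp); rsp += 8; return v; }
    // Reads the 64-bit item at [rsp + offset].
    std::uint64_t load(std::uint64_t offset) const { return mem.at(rsp + offset); }
};

struct StackResult { std::uint64_t rbx; std::uint64_t rsp; };

StackResult run_stack_example() {
    ToyStack s{0x0222, {}};           // assumed initial RSP, as in the example
    std::uint64_t rax = 0xaaaa;       // mov rax, 0xaaaa
    std::uint64_t rbx = 0xbbbb;       // mov rbx, 0xbbbb
    s.push(rax);                      // push rax -> rsp = 0x021a
    s.push(rbx);                      // push rbx -> rsp = 0x0212
    rbx = s.load(8);                  // mov rbx, [rsp + 8]
    return StackResult{rbx, s.rsp};
}
```

Reading [rsp + 8] returns the value pushed first, because each push moved RSP 8 bytes lower.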
3.3.11 The Stack

More examples of valid use of the stack:

• push 0x12345678 ; pushing an immediate (at most 32 bits wide; it is sign-extended to 64-bit)
• push 0x1 ; pushing an immediate (the value pushed is always 64-bit)
• push ax ; pushing value of 16-bit register
• push word ptr [rax] ; pushing 16-bit memory value
• push qword ptr [rax] ; pushing 64-bit memory value
• add rsp, 0x8 ; adding 8 to RSP, similar to popping 64-bit into nowhere!
• add rsp, 0x1 ; adding 1 to RSP, like popping 8-bit into nowhere. This is totally not recommended because it will misalign the stack.

3.3.11 The Stack

Some examples of invalid use of the stack:

• push ah ; pushing value of 8-bit register is not allowed
• push eax ; pushing value of 32-bit register is not allowed
• push dword ptr [rax] ; pushing 32-bit memory value is not allowed
• push byte ptr [rax] ; pushing 8-bit memory value is not allowed


3.3.12 x64 Calling Conventions

The term Calling Convention refers to the way Registers and the Stack are managed when calling and returning from sub-routines.

There are many calling conventions. In this section, we will discuss the standard x64 calling convention used by the Windows® operating system.


3.3.12 x64 Calling Conventions

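We will use the C++ language as a high-level language for our examples.

On 64-bit Windows, C++ compilers use the x64 calling convention by default. In our examples, the function declaration is additionally preceded with:

extern "C"

This gives the function C linkage (no C++ name mangling), so its name appears unchanged as a label in the Assembly code.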


3.3.12 x64 Calling Conventions

The most crucial use of the Stack is managing calls between sub-routines. In general, to call a sub-routine we must have the ability to do four things:
1. Preserve the address of the next instruction to be executed after the sub-routine finishes.
2. Pass arguments to the sub-routine.
3. Have a small memory space for the sub-routine’s local variables.
4. Get a value returned by the sub-routine (in Assembly we call them all sub-routines, not functions, even if they return a value).

3.3.12 x64 Calling Conventions

Remember, all the discussion here is about x64 calling conventions!

The sub-routine to be called is referred to as the “Callee”.

The sub-routine initiating the call is referred to as the “Caller”.


3.3.12 x64 Calling Conventions

Any Callee can be a Caller for another sub-routine (or even for itself in the case of recursive calls), and vice versa.

The Caller and the Callee share the responsibility of managing the Stack.


3.3.12 x64 Calling Conventions

Main tasks managed during a sub-routine call:

Task                                                  Responsible   Using Stack   Using Registers
Preserve address of next instruction after the call   Caller        Yes           No
Passing arguments                                     Caller        Yes           Yes
Saving a return value                                 Callee        No            Yes
Manage memory for Callee’s local variables            Callee        Yes           Maybe
Preserve values of some Registers                     Callee        Maybe         Maybe


3.3.12 x64 Calling Conventions

Some general-purpose registers and some SSE registers are used by the Caller to pass arguments to the Callee, while others are used to return a value from the Callee.

The first four arguments are passed using Registers, while any extra argument is passed using the Stack.


3.3.12 x64 Calling Conventions

The stack is always aligned as 64-bit items, no matter what arguments are used.

Regardless of the number of arguments, the Caller reserves at least four 64-bit locations on the stack.


3.3.12 x64 Calling Conventions

Volatile Registers: Registers that are not expected to retain their values across a call. They can hold anything at the end of the Callee’s execution:
• RAX, RCX, RDX, R8..to..R11
• XMM0..to..XMM5
• The upper portions of YMM0..to..YMM15


3.3.12 x64 Calling Conventions

Non-Volatile Registers: Registers whose values must be retained by the Callee
• RBX, R12..to..R15, RDI, RSI, RSP, RBP
• XMM6..to..XMM15


3.3.12 x64 Calling Conventions

The Caller will always trust that the Callee did not change the value of any Non-Volatile Register.

However, the Callee can safely change the values of Non-Volatile Registers during its execution. To do so, the Callee must save the original values of these registers before using them, then restore the original values before returning to the Caller.

3.3.12 x64 Calling Conventions

As an example, we will start with very simple C++ code and see how it would be converted into Assembly. Remember that (extern "C") keeps the function name unmangled, so it shows up as the label _get_sum in the Assembly code; the compiler uses the x64 fastcall convention by default:

extern "C" int _get_sum(int v1, int v2) {
    return v1 + v2;
}

int main() {
    int result = _get_sum(5, 7);
}

3.3.12 x64 Calling Conventions

The assembly code will look like the following (with a general description for each line):

    ……                    ; Program initialization code
    mov ecx, 0x5          ; First argument
    mov edx, 0x7          ; Second argument
    sub rsp, 0x20         ; Allocating Stack (32 bytes = 4 x 64-bit)
    call _get_sum         ; Calling sub-routine (Callee)
    add rsp, 0x20         ; Deallocating stack
    ……                    ; Other code, plus program exit code
_get_sum:                 ; Label to indicate the beginning of the Callee
    mov eax, ecx          ; Get first argument
    add eax, edx          ; Add to it the value of second argument
    ret                   ; Return to Caller (with return value in eax)


3.3.12 x64 Calling Conventions

Before digging deeper into the previous example, notice that the arguments were passed in Registers. This is not something random. The x64 fastcall convention has its rules:

Item                      Type integer, pass using   Type float, pass using
1st argument              RCX                        XMM0
2nd argument              RDX                        XMM1
3rd argument              R8                         XMM2
4th argument              R9                         XMM3
5th ..to..nth argument    The Stack                  The Stack
Return value              RAX                        XMM0

3.3.12 x64 Calling Conventions

Type float refers to the (float) and (double) types in C++.

Type integer refers to everything else.

Non-scalar types are passed by reference.

Scalar types longer than 64-bit are passed by reference.

3.3.12 x64 Calling Conventions

Other rules to be applied are:
• If arguments are a mix of integers and floats, then the registers used to pass arguments are a mix of General Purpose and SSE.
• Even with the first four arguments passed through registers, the Caller will reserve 32 bytes on the stack. These bytes are called Shadow Space.
• An extra 8 bytes is needed on the stack for each extra argument (stack items are 64-bit); the last argument is the first one to be pushed onto the stack.
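The register assignment rules above can be summarized in a small lookup function. This is a simplified sketch: ArgKind and argument_location are hypothetical names, and the function only covers the position and type rules from the table, not by-reference passing.

```cpp
#include <cassert>
#include <string>

enum class ArgKind { Integer, Float };

// Returns where the argument at 1-based 'position' is passed.
// The register is selected by position, so a float in 2nd position uses
// XMM1 even if the 1st argument was an integer passed in RCX.
std::string argument_location(int position, ArgKind kind) {
    static const char* const kIntegerRegs[] = {"RCX", "RDX", "R8", "R9"};
    static const char* const kFloatRegs[] = {"XMM0", "XMM1", "XMM2", "XMM3"};
    if (position < 1) return "invalid";
    if (position > 4) return "stack";
    return kind == ArgKind::Integer ? kIntegerRegs[position - 1]
                                    : kFloatRegs[position - 1];
}
```

For a hypothetical function taking (int, double, int, float, int), the five arguments would land in RCX, XMM1, R8, XMM3, and on the stack.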
3.3.12 x64 Calling Conventions

Now back to our simple example. Let’s trace the execution of the code. We are moving values to 32-bit registers because (int) is a 32-bit type.

RIP →   sub rsp, 0x20
        mov ecx, 0x5
        mov edx, 0x7
        call _get_sum
        add rsp, 0x20
        ……
_get_sum:
        mov eax, ecx
        add eax, edx
        ret

Registers: EAX = ????, ECX = ????, EDX = ????
Memory (64-bit items): RSP → ????

3.3.12 x64 Calling Conventions

The Caller allocated the Shadow Space (0x20 = 32 bytes = 4 x 64-bit):

RIP → mov ecx, 0x5
Registers: EAX = ????, ECX = ????, EDX = ????
Memory (64-bit items): RSP → Shadow Space (four items, ????), followed by some old value in the stack

3.3.12 x64 Calling Conventions

The first argument value (5) is copied to ECX. Remember that RIP always points to the next instruction.

RIP → mov edx, 0x7
Registers: EAX = ????, ECX = 0x5, EDX = ????
Memory (64-bit items): RSP → Shadow Space (four items, ????)

3.3.12 x64 Calling Conventions

The second argument value (7) is copied to EDX.

RIP → call _get_sum
Registers: EAX = ????, ECX = 0x5, EDX = 0x7
Memory (64-bit items): RSP → Shadow Space (four items, ????)

3.3.12 x64 Calling Conventions

The address of the next statement is pushed onto the stack, then RIP is set to point to the first statement of the Callee.

RIP → mov eax, ecx (first statement of _get_sum)
Registers: EAX = ????, ECX = 0x5, EDX = 0x7
Memory (64-bit items): RSP → address of the next statement in the Caller, followed by the Shadow Space (four items, ????), then some old value in the stack

3.3.12 x64 Calling Conventions
Preparing the return value. The value of the first argument is copied from ECX to EAX. Again, the return value type is (int), which is 32-bit. Therefore, we use EAX.

RIP → add eax, edx
Registers: EAX = 0x5, ECX = 0x5, EDX = 0x7
Memory (64-bit items): RSP → address of the next statement in the Caller, followed by the Shadow Space (four items, ????), then some old value in the stack

3.3.12 x64 Calling Conventions

The value of the second argument is added to EAX, which now contains the return value of the Callee.

RIP → ret
Registers: EAX = 0xC, ECX = 0x5, EDX = 0x7
Memory (64-bit items): RSP → address of the next statement in the Caller, followed by the Shadow Space (four items, ????), then some old value in the stack

3.3.12 x64 Calling Conventions
Returned to the Caller by popping the address of the next statement from the stack into RIP. Notice here that popping an item does not clear its value from the stack. It only moves RSP.

RIP → add rsp, 0x20
Registers: EAX = 0xC, ECX = 0x5, EDX = 0x7
Memory (64-bit items): RSP → Shadow Space (four items, ????), followed by some old value in the stack

3.3.12 x64 Calling Conventions

Shadow Space deallocated. The Caller finished processing the call of the Callee. The Caller will use the EAX value as the value returned from the Callee.

RIP → …… (the code after add rsp, 0x20)
Registers: EAX = 0xC, ECX = 0x5, EDX = 0x7
Memory (64-bit items): RSP → the old value in the stack (the same location as before the Shadow Space was allocated)

3.3.12 x64 Calling Conventions

Shadow Space is a standard rule in x64 calls. It is used to align arguments on the Stack. This will become clearer when we discuss the 6 arguments example shortly.

Another advantage of Shadow Space is that the Callee can use it to preserve the values of the argument registers if it needs to use those registers (needs to amend their values) in the sub-routine.


3.3.12 x64 Calling Conventions

Regardless of whether the Callee uses the registers, the Shadow Space will be allocated on the Stack.

That is because the Caller sub-routine is not aware of the functionality of the called sub-routine; hence, the best way to ensure that the register values can be restored when execution returns is to always allocate that space.


3.3.12 x64 Calling Conventions

Let’s assume that the value 0xFF is stored in ECX. If the Callee needs to use ECX, then the value of ECX needs to be stored before doing so. Here is an example of how that can be achieved:

RIP →   mov dword ptr[rsp+0x8], ecx
        mov ecx, 0x1234
        mov eax, dword ptr[rsp+0x8]
        ……

Registers: EAX = ??, ECX = 0xFF

Memory (64-bit items):
RSP         Next statement in Caller
RSP+0x08    Reserved for 1st argument (Shadow Space)    ????
RSP+0x10    Reserved for 2nd argument (Shadow Space)    ????
RSP+0x18    Reserved for 3rd argument (Shadow Space)    ????
RSP+0x20    Reserved for 4th argument (Shadow Space)    ????

3.3.12 x64 Calling Conventions

After the first instruction is executed, the ECX value is saved on the stack. It is available to be used when needed.

RIP → mov ecx, 0x1234
Registers: EAX = ??, ECX = 0xFF
Memory (64-bit items): RSP+0x08 (reserved for 1st argument) = 0xFF; the other Shadow Space items are still ????

3.3.12 x64 Calling Conventions

With the second instruction executed, the ECX value changed to 0x1234. This is fine because now we have the original value on the Stack, so we can restore it later on.

RIP → mov eax, dword ptr[rsp+0x8]
Registers: EAX = ??, ECX = 0x1234
Memory (64-bit items): RSP+0x08 (reserved for 1st argument) = 0xFF; the other Shadow Space items are still ????

3.3.12 x64 Calling Conventions

With the last instruction in the example, the EAX register loaded the value of the first argument from the stack.

RIP → ……
Registers: EAX = 0xFF, ECX = 0x1234
Memory (64-bit items): RSP+0x08 (reserved for 1st argument) = 0xFF; the other Shadow Space items are still ????

3.3.12 x64 Calling Conventions

The last thing to discuss regarding Calling Conventions is stack management when we have more than four arguments.

There are two main differences:
• The number of bytes to shift the RSP Register will be equal to (8 x number of arguments).
• Before the Caller initiates the call, RSP will be pointing to the First Argument Shadow Space. After entering the Callee, RSP will be pointing to the return address (the address of the next instruction of the Caller).

3.3.12 x64 Calling Conventions

Here is an example of a sub-routine with 6 arguments. The Caller code, registers, and memory are as follows just before the call instruction is executed:

        sub rsp, 0x30
        mov dword ptr[rsp+0x28], 0x66
        mov dword ptr[rsp+0x20], 0x55
        mov r9d, 0x44
        mov r8d, 0x33
        mov edx, 0x22
        mov ecx, 0x11
RIP →   call _callee

Registers: ECX = 0x11, EDX = 0x22, R8D = 0x33, R9D = 0x44

Memory (64-bit items):
RSP         Reserved for 1st argument (Shadow Space)    ????
RSP+0x08    Reserved for 2nd argument (Shadow Space)    ????
RSP+0x10    Reserved for 3rd argument (Shadow Space)    ????
RSP+0x18    Reserved for 4th argument (Shadow Space)    ????
RSP+0x20    Reserved for 5th argument                   0x55
RSP+0x28    Reserved for 6th argument                   0x66

3.3.12 x64 Calling Conventions

Now pay attention to the Callee code. There is an 8-byte shift when accessing the 5th and 6th arguments. It is caused by the pushed (next RIP) value.

        mov eax, dword ptr[rsp+0x30]
        add eax, dword ptr[rsp+0x28]
        add eax, r9d
        add eax, r8d
        add eax, edx
        add eax, ecx
        ret

Registers: ECX = 0x11, EDX = 0x22, R8D = 0x33, R9D = 0x44

Memory (64-bit items):
RSP         <RIP> (address of the next instruction of the Caller)
RSP+0x08    Reserved for 1st argument (Shadow Space)    ????
RSP+0x10    Reserved for 2nd argument (Shadow Space)    ????
RSP+0x18    Reserved for 3rd argument (Shadow Space)    ????
RSP+0x20    Reserved for 4th argument (Shadow Space)    ????
RSP+0x28    Reserved for 5th argument                   0x55
RSP+0x30    Reserved for 6th argument                   0x66
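The offsets used by the Caller and the Callee follow a simple rule, sketched below in C++. The function names are hypothetical, argument numbers are 1-based, and the sketch ignores the 16-byte stack alignment that real compilers also maintain.

```cpp
#include <cassert>
#include <cstdint>

// Offset from RSP of the slot for argument n in the Caller, just before
// CALL executes. Slots 1 to 4 form the Shadow Space.
constexpr std::uint64_t caller_slot_offset(unsigned n) {
    return 8u * (n - 1);
}

// Offset from RSP of the same slot on entry to the Callee. CALL pushed the
// 8-byte return address, so every slot appears 8 bytes further away.
constexpr std::uint64_t callee_slot_offset(unsigned n) {
    return caller_slot_offset(n) + 8u;
}

// Bytes the Caller reserves: 8 per argument, but never fewer than the
// four Shadow Space slots.
constexpr std::uint64_t reserved_bytes(unsigned argument_count) {
    return 8u * (argument_count < 4 ? 4u : argument_count);
}

static_assert(caller_slot_offset(5) == 0x20, "5th argument in the Caller");
static_assert(callee_slot_offset(5) == 0x28, "5th argument in the Callee");
static_assert(reserved_bytes(6) == 0x30, "sub rsp, 0x30");
```

With six arguments this reproduces the sub rsp, 0x30 of the Caller and the [rsp+0x28] and [rsp+0x30] accesses of the Callee.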
3.3.12 x64 Calling Conventions

A Stack Frame is a special representation of the stack inside a sub-routine, where the base of the stack is known and pointed to by RBP.

Any other register can be used to point to the base and keep the same logic, but the standard Register used by all compilers is RBP.

Base of the stack does not mean the bottom of the program stack. It is the bottom of the Stack Frame inside the sub-routine.

3.3.12 x64 Calling Conventions

Consider the following points to see what problem Stack Frames are solving:
• In the previous example, we noticed how referring to an argument differs between the Caller and the Callee. The Callee adds 8 more bytes to RSP to access an argument.
• The 8-byte shift comes from an operation made by the CALL instruction.
• What if the Callee needs to have a few PUSH and POP instructions internally?
• What if the Callee needs to allocate more memory on the stack for local variables?

3.3.12 x64 Calling Conventions

For example, in the Callee, compare the address of the 5th argument in the following situations. Keeping track of RSP moves will be a nightmare!

Memory item   On entry to   Callee reserved a space   Callee reserved a space for a local
(64-bit)      the Callee    for a local variable      variable, then pushed another value
(pushed)      -             -                         RSP
(local)       -             RSP                       RSP+0x08
<RIP>         RSP           RSP+0x08                  RSP+0x10
1st arg.      RSP+0x08      RSP+0x10                  RSP+0x18
2nd arg.      RSP+0x10      RSP+0x18                  RSP+0x20
3rd arg.      RSP+0x18      RSP+0x20                  RSP+0x28
4th arg.      RSP+0x20      RSP+0x28                  RSP+0x30
5th arg.      RSP+0x28      RSP+0x30                  RSP+0x38
6th arg.      RSP+0x30      RSP+0x38                  RSP+0x40

3.3.12 x64 Calling Conventions
We can see the power of using a Stack Frame: RBP always points to the base of the Callee’s Stack Frame, so the offset of every item stays the same no matter how RSP moves inside the Callee’s sub-routine.

Memory item   On entry to    Callee reserved a space   Callee reserved a space for a local
(64-bit)      the Callee     for a local variable      variable, then pushed another value
(pushed)      -              -                         RBP-0x10 (= RSP)
(local)       -              RBP-0x08 (= RSP)          RBP-0x08
<RIP>         RBP (= RSP)    RBP                       RBP
1st arg.      RBP+0x08       RBP+0x08                  RBP+0x08
2nd arg.      RBP+0x10       RBP+0x10                  RBP+0x10
3rd arg.      RBP+0x18       RBP+0x18                  RBP+0x18
4th arg.      RBP+0x20       RBP+0x20                  RBP+0x20
5th arg.      RBP+0x28       RBP+0x28                  RBP+0x28
6th arg.      RBP+0x30       RBP+0x30                  RBP+0x30

3.3.12 x64 Calling Conventions

But there is a problem with this solution. The RBP Register is a Non-Volatile Register. The Callee must preserve its value.

To resolve that, the Callee must push the RBP value onto the stack before creating the Stack Frame.

Before leaving the Callee, RBP recovers its value by popping it from the stack.

3.3.12 x64 Calling Conventions

Now this is the actual representation of Stack Frames in x64. The original RBP value is pushed right after entering the Callee, then RBP is set to the same value as RSP.

Memory item   After pushing RBP and   Callee reserved a space   Callee reserved a space for a local
(64-bit)      setting RBP = RSP       for a local variable      variable, then pushed another value
(pushed)      -                       -                         RBP-0x10 (= RSP)
(local)       -                       RBP-0x08 (= RSP)          RBP-0x08
<org. RBP>    RBP (= RSP)             RBP                       RBP
<RIP>         RBP+0x08                RBP+0x08                  RBP+0x08
1st arg.      RBP+0x10                RBP+0x10                  RBP+0x10
2nd arg.      RBP+0x18                RBP+0x18                  RBP+0x18
3rd arg.      RBP+0x20                RBP+0x20                  RBP+0x20
4th arg.      RBP+0x28                RBP+0x28                  RBP+0x28
5th arg.      RBP+0x30                RBP+0x30                  RBP+0x30
6th arg.      RBP+0x38                RBP+0x38                  RBP+0x38
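The frame layout can be checked with a few lines of address arithmetic. The sketch below models only the values of RSP and RBP; Frame, prolog, argument_address, and epilog are hypothetical names, and the entry value of RSP used in the usage note is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstdint>

struct Frame {
    std::uint64_t rsp;
    std::uint64_t rbp;
};

// Creating the frame: push rbp / mov rbp, rsp / sub rsp, local_bytes.
// rsp_on_entry is RSP on entry to the Callee (it points at the return address).
constexpr Frame prolog(std::uint64_t rsp_on_entry, std::uint64_t local_bytes) {
    return Frame{rsp_on_entry - 8 - local_bytes, rsp_on_entry - 8};
}

// Address of the slot for argument n (1-based): RBP + 8 (saved RBP)
// + 8 (return address) + 8 * (n - 1).
constexpr std::uint64_t argument_address(Frame f, unsigned n) {
    return f.rbp + 8u + 8u * n;
}

// Destroying the frame: mov rsp, rbp / pop rbp. Returns RSP afterwards.
constexpr std::uint64_t epilog(Frame f) {
    return f.rbp + 8;
}
```

With an assumed entry RSP of 0x1000, the 5th argument is found at 0x1028 whether the Callee reserves 0x10 or 0x40 bytes of locals, and after the epilog RSP is back at 0x1000, ready for RET.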
3.3.12 x64 Calling Conventions

Each sub-routine is responsible for creating and destroying its own Stack Frame when needed. And remember, it is a Callee issue; the Caller has nothing to do with it.


3.3.12 x64 Calling Conventions

Prolog: the code for creating the Stack Frame. There are primarily three steps:
• Push RBP to preserve its original value
• Set RBP to the same value as RSP
• Move RSP to allocate space for local variables


3.3.12 x64 Calling Conventions

Epilog: the code for destroying the Stack Frame. There are primarily two steps:
• Move RSP to deallocate the local variables space
• Pop RBP


3.3.12 x64 Calling Conventions

The following is a sample assembly of managing Stack Frames:

push rbp                        ; prolog
mov rbp, rsp
sub rsp, 10h

mov dword ptr[rbp-8h], 123h     ; Callee body, accessing locals by (rbp -)
mov dword ptr[rbp+10h], 456h    ; and arguments by (rbp +)

mov rsp, rbp                    ; epilog
pop rbp


Hera Lab #6
Put what you’ve learned into practice
with the Writing and Debugging
Assembly x64 Code lab!

To ACCESS your lab, go to the


course in your members area and
click the labs drop-down in the
appropriate module line, then click
the manual icon.

Please note that Malware Analysis


Labs are only available in Full
or Elite Editions.

*NOTE: some courses contain several labs and manuals, please make sure to click the file icon as it may
be a zip that contains multiple lab manuals.




