Assembly #1

This document provides an introduction to x86 assembly language, focusing on the use of registers, data declaration, and memory management. It covers the roles of various registers, the structure of assembly programs, and how to manipulate data using instructions like MOV, ADD, and CMP. Additionally, it explains the organization of code sections such as .data, .bss, and .rodata, as well as the implementation of functions and loops in assembly.

Uploaded by

Braincain007
Assembly #01 - Introduction to x86 Assembly

** Note: All examples are in NASM syntax. **

Registers
General Purpose registers are small, fast storage locations inside the CPU used for:
• Arithmetic operations
• Data movement
• Loop control
• Function arguments/return values

Register Naming Hierarchy


Each register has multiple names depending on the size you’re working with:

64-bit  32-bit  16-bit  8-bit (high)  8-bit (low)  Description
RAX     EAX     AX      AH            AL           Accumulator (used for math ops)
RBX     EBX     BX      BH            BL           Base (used for address/base calc)
RCX     ECX     CX      CH            CL           Counter (used in loops)
RDX     EDX     DX      DH            DL           Data register (math + I/O)
RSI     ESI     SI      -             SIL          Source index
RDI     EDI     DI      -             DIL          Destination index
RBP     EBP     BP      -             BPL          Base pointer (stack frame)
RSP     ESP     SP      -             SPL          Stack pointer
R8      R8D     R8W     -             R8B          General-use register
R9      R9D     R9W     -             R9B          General-use register
R10     R10D    R10W    -             R10B         General-use register
R11     R11D    R11W    -             R11B         General-use register
R12     R12D    R12W    -             R12B         General-use register
R13     R13D    R13W    -             R13B         General-use register
R14     R14D    R14W    -             R14B         General-use register
R15     R15D    R15W    -             R15B         General-use register

Example of sizes:
    mov rax, 0x123456789ABCDEF0
    ; RAX = 0x123456789ABCDEF0
    ; EAX = 0x9ABCDEF0
    ;   ^^^ Writing to EAX zeroes the upper 32 bits of RAX
    ; AX  = 0xDEF0
    ; AL  = 0xF0
    ; AH  = 0xDE
    ; Use XOR reg, reg to zero a register
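
The effect of each register size on the full 64-bit register can be traced with a small program. This is only a sketch, assuming x86-64 Linux and a build along the lines of `nasm -f elf64 sizes.asm && ld sizes.o -o sizes` (the file name is made up); the values written are arbitrary.

```nasm
section .text
global _start

_start:
    mov rax, 0x123456789ABCDEF0   ; RAX = 0x123456789ABCDEF0
    mov al, 0x11                  ; only bits 0-7 change:   RAX = 0x123456789ABCDE11
    mov ah, 0x22                  ; only bits 8-15 change:  RAX = 0x123456789ABC2211
    mov ax, 0x3333                ; only bits 0-15 change:  RAX = 0x123456789ABC3333
    mov eax, 0x44444444           ; 32-bit write zeroes the upper half:
                                  ;                         RAX = 0x0000000044444444
    xor eax, eax                  ; zeroing idiom:          RAX = 0

    mov edi, eax                  ; exit status = 0
    mov eax, 60                   ; syscall: exit
    syscall
```

Stepping through this in a debugger and watching RAX after each instruction shows that only the 32-bit write clears the upper bits; the 8-bit and 16-bit writes leave the rest of the register untouched.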

Register Roles & Conventions
Register   Common use (x86-64 Linux systems)
RAX        Return value (or accumulator)
RBX        Callee-saved register
RCX        4th argument
RDX        3rd argument
RSI        2nd argument
RDI        1st argument
RSP        Stack pointer
RBP        Base pointer (used in stack frames)
R8-R9      5th & 6th argument
R10-R11    Temporary (caller-saved)
R12-R15    Callee-saved, general use

** Our coding standards for this class:

• RAX - Used for results.

• RCX/RDX/RSI/RDI/R8-R9 - Use for arguments.

• RBP/RSP - Never use/trash these unless you know what you’re doing.
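
As an illustration of these conventions, here is a minimal sketch of a function that takes its argument in EDI, returns its result in EAX, and preserves the callee-saved RBX that it borrows as scratch space. The function name `triple` and the values are hypothetical, chosen only for this example; it assumes x86-64 Linux.

```nasm
section .text
global _start

; triple(n): returns 3 * n in EAX (hypothetical example function)
triple:
    push rbx              ; RBX is callee-saved: keep the caller's value
    mov ebx, edi          ; EBX = n (1st argument arrives in EDI)
    mov eax, ebx          ; EAX = n
    add eax, ebx          ; EAX = 2n
    add eax, ebx          ; EAX = 3n
    pop rbx               ; restore the caller's RBX
    ret                   ; result is in EAX

_start:
    mov ebx, 7            ; a value the caller wants to keep across the call
    mov edi, 5            ; 1st argument
    call triple           ; EAX = 15, EBX is still 7
    add eax, ebx          ; EAX = 22

    mov edi, eax          ; exit status = 22
    mov eax, 60           ; syscall: exit
    syscall
```

Running the program and then `echo $?` in the shell should show 22, which only works out because `triple` restored RBX before returning.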

Declaring a Value in ASM


A value in x86-64 assembly can be declared in the data section using directives that
specify the size of the data.

Type    Directive  Size     Example                          Description
BYTE    db         1 byte   myByte db 0x10                   Declares a byte (8-bit value)
WORD    dw         2 bytes  myWord dw 0x1234                 Declares a word (16-bit value)
DWORD   dd         4 bytes  myDword dd 0x12345678            Declares a doubleword (32-bit value)
QWORD   dq         8 bytes  myQword dq 0x1122334455667788    Declares a quadword (64-bit value)

The syntax of a typical variable in the .data section:


section .data
    myVar dd 10    ; Declare a 32-bit integer initialized to 10
    ; myVar - variable name to be used within the program
    ; dd    - doubleword => size of the variable (4 bytes)
    ; 10    - initialized to the value of 10
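
To show how the declared size pairs with the register size used to read it, here is a short sketch that declares one variable of each size and loads each into a register of matching width. It assumes x86-64 Linux; the register choices are arbitrary.

```nasm
section .data
    myByte  db 0x10                  ; 1 byte
    myWord  dw 0x1234                ; 2 bytes
    myDword dd 0x12345678            ; 4 bytes
    myQword dq 0x1122334455667788    ; 8 bytes

section .text
global _start

_start:
    mov al,  [myByte]     ; 8-bit load into AL
    mov bx,  [myWord]     ; 16-bit load into BX
    mov ecx, [myDword]    ; 32-bit load into ECX
    mov rdx, [myQword]    ; 64-bit load into RDX

    ; With no register operand, the size must be stated explicitly
    mov byte [myByte], 0x20

    movzx edi, byte [myByte]  ; zero-extend the byte into EDI (0x20 = 32)
    mov eax, 60               ; syscall: exit, status = 32
    syscall
```

The register operand tells the assembler how many bytes to move; when neither operand is a register, a size keyword such as `byte` or `dword` fills that role.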

Dynamic Changes of a Variable’s Value
After a variable has been declared, it can be modified at runtime using instructions
like MOV, ADD, and SUB.

The standard method of modifying a variable: load it into a register, modify the
register, and then store the result back to memory.

Example:
section .data
    myVar dd 10           ; Declare a 32-bit integer initialized to 10

section .bss
    temp resd 1           ; Reserve space for a variable (32-bit)

section .text
global _start

_start:
    mov eax, [myVar]      ; Load value of myVar into register
    add eax, 5            ; Increment by 5
    mov [myVar], eax      ; Store the new value back into memory

    ; Exit program
    mov eax, 60           ; syscall: exit
    xor edi, edi          ; status: 0
    syscall
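
The same load, modify, store pattern works with SUB. The following sketch chains an ADD and a SUB and then returns the stored value as the exit status so the result can be checked from the shell; using the exit status this way is just a convenience for the example, and x86-64 Linux is assumed.

```nasm
section .data
    myVar dd 10           ; 32-bit integer, starts at 10

section .text
global _start

_start:
    mov eax, [myVar]      ; EAX = 10
    add eax, 5            ; EAX = 15
    sub eax, 3            ; EAX = 12
    mov [myVar], eax      ; myVar = 12

    mov edi, [myVar]      ; exit status = value now stored in myVar (12)
    mov eax, 60           ; syscall: exit
    syscall
```

After running it, `echo $?` should print 12.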

What do the Square Brackets [ ] mean for a variable?

• This is known as a memory access (or lookup).

    mov eax, [var]
    ; this means -> load the value stored at the memory address of var
    ; into the register EAX

    ; Some considerations:
    mov eax, var      ; Load the address of var into EAX
                      ; (use a 64-bit register such as RAX if the address may not fit in 32 bits)
    mov eax, [var]    ; Load the value at memory location var into EAX
    mov [var], eax    ; Store the value of EAX into memory at address var

    ; NASM uses Intel syntax: the destination operand comes first, then the source
    ;   mov dst, src
    ;   add dst, src
    ;   sub dst, src

What about Registers in brackets?

• This is possible. It allows you to index into arrays or access dynamically
  computed addresses.

    ; Array-like access
    mov rsi, array        ; point to start of array
    mov eax, [rsi + 4]    ; access 2nd element (assuming 4-byte ints)

    ; Computed address (aka effective address)
    ; [register]                       -> indirect access
    ; [register + offset]              -> like accessing a field
    ; [base + index * scale]           -> array indexing
    ; [base + index * scale + offset]  -> full effective address

    ; Examples:
    mov eax, [rbp - 4]        ; Load a local variable from the stack frame
    mov eax, [rsi + rdi*4]    ; Access array element at index in RDI

    ; !! If you ever need a value from a variable, use brackets !!
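
The `[base + index * scale]` form can be sketched as a small program that sums an array. The array contents and the names `array` and `count` are made up for the example, and it uses a compare-and-jump loop of the kind covered later in these notes; x86-64 Linux is assumed.

```nasm
section .data
    array dd 10, 20, 30, 40        ; four 4-byte integers
    count equ ($ - array) / 4      ; number of elements (4)

section .text
global _start

_start:
    mov rsi, array            ; RSI = base address of the array
    xor edi, edi              ; RDI = index, starts at 0
    xor eax, eax              ; EAX = running sum

.next:
    add eax, [rsi + rdi*4]    ; add element at index RDI (scale 4 = element size)
    inc rdi                   ; index++
    cmp rdi, count            ; more elements left?
    jl  .next

    mov edi, eax              ; exit status = sum (10 + 20 + 30 + 40 = 100)
    mov eax, 60               ; syscall: exit
    syscall
```

The scale factor matches the element size, so the index counts elements rather than bytes.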

.data Section (Initializing Data)


To print to the terminal, we need to define the text as a variable (and will do so
again in later programs). This is done in the .data section, which is typically used
to store predefined variables and constants that have initial values.
    ; Example:
    section .data
        msg     db "Hello, world!", 0xA    ; string with newline
        msg_len equ $ - msg                ; length, calculated by the assembler

    ; equ -> defines a constant (used here for computing the string length)
    ; $   -> the current address, so $ - msg is the number of bytes since msg
    ; 0xA -> represents a newline character (i.e., in ASCII -> \n)
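
Putting `msg` and `msg_len` to use, a minimal sketch of a complete program that prints the string with the Linux `write` system call (number 1) and then exits might look like this. It assumes x86-64 Linux.

```nasm
section .data
    msg     db "Hello, world!", 0xA    ; string with newline
    msg_len equ $ - msg                ; length in bytes

section .text
global _start

_start:
    mov eax, 1            ; syscall: write
    mov edi, 1            ; file descriptor: stdout
    mov rsi, msg          ; address of the bytes to print
    mov edx, msg_len      ; number of bytes
    syscall

    mov eax, 60           ; syscall: exit
    xor edi, edi          ; status: 0
    syscall
```

Note that `msg` is used without brackets here because `write` wants the address of the text, while `msg_len` is a plain constant rather than a memory location.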

.bss Section (Uninitialized Data)

The .bss section is used to reserve memory for variables that have no initial value
at assembly time.
    section .bss
        some_input resb 20    ; Reserve 20 bytes for input

    ; resb 20 -> reserves 20 bytes (b = bytes)
    ; The memory is reserved but no initial value is stored in the program file
    ; (on Linux, the loader zero-fills it when the program starts).
    ; Do not declare initialized values here; do it in .data or at runtime.
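
A typical use of such a buffer is reading input. The sketch below reads up to 20 bytes from standard input into `some_input` with the `read` system call (number 0) and echoes them back with `write`. It assumes x86-64 Linux; error handling is reduced to skipping the echo when nothing was read.

```nasm
section .bss
    some_input resb 20        ; 20-byte input buffer

section .text
global _start

_start:
    xor eax, eax              ; syscall: read (0)
    xor edi, edi              ; file descriptor: stdin (0)
    mov rsi, some_input       ; where to store the input
    mov edx, 20               ; read at most 20 bytes
    syscall                   ; RAX = bytes read (0 at end of input, negative on error)

    cmp rax, 0
    jle .done                 ; nothing to echo

    mov rdx, rax              ; number of bytes to write = bytes read
    mov eax, 1                ; syscall: write
    mov edi, 1                ; file descriptor: stdout
    mov rsi, some_input       ; address of the buffer
    syscall

.done:
    mov eax, 60               ; syscall: exit
    xor edi, edi              ; status: 0
    syscall
```

The byte count is copied into RDX before EAX is overwritten with the next system call number, since both values live in the same register family at that point.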

.rodata Section (Read-Only Data)
The .rodata section stores constants that should never change, such as messages or
mathematical constants.
    section .rodata
        pi dq 3.141592653589793    ; Define a double-precision constant

    ; dq -> defines a quadword (64-bit), here a floating-point number.
    ; This data cannot be modified at runtime.

Stacking
The stack is managed by RSP (the stack pointer; ESP in 32-bit code) and grows downward in memory.
    ; Stack example
    section .text
    global _start

    _start:
        push 5         ; Push integer 5 onto stack
        push 10        ; Push integer 10 onto stack
        pop rax        ; Pop value (10) into RAX
        pop rbx        ; Pop value (5) into RBX
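
To make the downward growth visible, the following sketch records RSP before two pushes and measures how far it moved. It assumes x86-64 Linux, where each `push` in 64-bit mode stores 8 bytes; the choice of registers is arbitrary.

```nasm
section .text
global _start

_start:
    mov rbx, rsp          ; remember where the stack pointer started
    push 5                ; RSP decreases by 8
    push 10               ; RSP decreases by 8 again

    mov rcx, rbx
    sub rcx, rsp          ; RCX = start - current = 16 bytes in use

    pop rax               ; RAX = 10 (last in, first out)
    pop rdx               ; RDX = 5; RSP is back at its starting value

    mov edi, ecx          ; exit status = 16
    mov eax, 60           ; syscall: exit
    syscall
```

An exit status of 16 confirms that two pushes moved RSP down by 16 bytes, and the pop order shows the last-in, first-out behavior.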

Functions
Functions are declared like labels, but they are entered with the call instruction
(which saves the return address) and left with the ret instruction.

Python example:
    def add(a, b):
        return a + b

    result = add(3, 4)

Assembly conversion:
section .text
global _start
global add_numbers        ; Declare function for external use

_start:
    mov edi, 3            ; First argument (a)
    mov esi, 4            ; Second argument (b)
    call add_numbers      ; Call function
    ; EAX now contains the result (3 + 4)

    ; Exit program
    mov eax, 60           ; syscall: exit
    xor edi, edi
    syscall

add_numbers:
    mov eax, edi          ; Load first argument
    add eax, esi          ; Add second argument
    ret                   ; Return result in EAX

Notice that this program does not print the value, even though we store it in the
return register, EAX. To actually print to the terminal, we need the write system
call: we first convert the result into ASCII and then make the system call to print
the resulting characters. There are a few ways to do this, all a little complex, but
the general instructions that satisfy most use cases are below:
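
One possible version of those steps is sketched here. It is not the class's provided routine: the helper name `print_uint` and the buffer layout are assumptions made for this example, and it handles unsigned 32-bit values only. The digits are produced by repeatedly dividing by 10 and are written into the buffer from the end backward. x86-64 Linux is assumed.

```nasm
section .bss
    buffer resb 11                ; up to 10 digits for a 32-bit value + newline

section .text
global _start

_start:
    mov edi, 3                    ; First argument (a)
    mov esi, 4                    ; Second argument (b)
    call add_numbers              ; EAX = 7
    mov edi, eax                  ; pass the result to the print helper
    call print_uint

    mov eax, 60                   ; syscall: exit
    xor edi, edi
    syscall

add_numbers:
    mov eax, edi
    add eax, esi
    ret

; print_uint (hypothetical helper): prints the unsigned value in EDI
; as decimal text followed by a newline.
print_uint:
    mov eax, edi                  ; EAX = value to convert
    mov rsi, buffer + 10          ; RSI -> last byte of the buffer
    mov byte [rsi], 0xA           ; newline goes at the very end
    mov ecx, 10                   ; divisor
.next_digit:
    xor edx, edx                  ; clear the high half of the dividend EDX:EAX
    div ecx                       ; EAX = quotient, EDX = remainder (0-9)
    add dl, '0'                   ; remainder -> ASCII digit
    dec rsi                       ; move one byte toward the front
    mov [rsi], dl                 ; store the digit
    cmp eax, 0
    jne .next_digit               ; repeat until the quotient is 0

    mov rdx, buffer + 11          ; one past the end of the buffer
    sub rdx, rsi                  ; RDX = number of bytes to print
    mov eax, 1                    ; syscall: write
    mov edi, 1                    ; file descriptor: stdout
    syscall                       ; RSI already points at the first digit
    ret
```

For the inputs 3 and 4 this prints `7` followed by a newline. Negative numbers would need an extra step (print a minus sign, then negate), which this sketch leaves out.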

This is a lot of work to do every time, so let's create a reusable function that we
can assemble and link with our programs.
** I will provide you with a version that can handle signed numbers. **

If-Clause
Let’s look at a simple Python function:
    def check_number(n):
        if n > 10:
            return 1
        else:
            return 0

When converting this to ASM there are some points to think about (i.e., what is
involved in this code):

• This is a function, meaning we need a label.

• We need to compare a value.

• The comparison is a condition, meaning we need a conditional jump.

• We then need to return the value.

Another concern is how jumping works in assembly: conditional jumps test flags,
which are usually set by the CMP and TEST instructions. Here is the typical use case
for each:

• Use CMP for comparing values: ==, <, >, loops, conditionals
    - CMP performs r10 - r12, sets the flags, and discards the result
        * Example: cmp r10, r12

• Use TEST to check for zero (val == 0) or bit flags without modifying data
    - TEST performs r14 AND r15 (bitwise AND), sets the flags, and discards the result
        * Example: test r14, r15

** To come to a common coding standard, we will prioritize CMP for this class. **
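
For comparison, here is a short sketch of the two TEST use cases mentioned above: checking a single bit (here bit 0, to tell even from odd) and checking for zero. The value 6 and the exit codes are arbitrary choices for the example; x86-64 Linux is assumed.

```nasm
section .text
global _start

_start:
    mov eax, 6

    ; Bit check: 6 AND 1 = 0, so ZF = 1 and the value is even.
    test eax, 1
    je  .even                 ; taken when the tested bit is 0
    mov edi, 1                ; odd  -> exit status 1
    jmp .exit

.even:
    ; Zero check: "test eax, eax" sets ZF the same way "cmp eax, 0" does.
    test eax, eax
    je  .exit_zero
    mov edi, 0                ; even and non-zero -> exit status 0
    jmp .exit

.exit_zero:
    mov edi, 2                ; value was zero -> exit status 2

.exit:
    mov eax, 60               ; syscall: exit
    syscall
```

With EAX = 6 the program exits with status 0. Neither TEST instruction changes EAX, which is why the second check can reuse the same value.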

Jump    Condition              Description
je      ZF = 1                 Jump if Equal
jne     ZF = 0                 Jump if Not Equal
jg      ZF = 0 and SF = OF     Jump if Greater (signed)
jge     SF = OF                Jump if Greater or Equal (signed)
jl      SF != OF               Jump if Less (signed)
jle     ZF = 1 or SF != OF     Jump if Less or Equal (signed)
*jmp    N/A                    Unconditional jump
*call   N/A                    Jump to procedure, save return address
*ret    N/A                    Return from procedure

* means the jump is unconditional (it does not test the flags set by CMP or TEST)

Example of jump cases:

    cmp eax, ebx      ; computes eax - ebx and sets the flags
    je  .equal        ; if eax == ebx -> the subtraction gives 0, so ZF = 1
    jg  .greater      ; if eax > ebx (signed)
    jl  .lower        ; if eax < ebx (signed)
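
These jump cases can be tried out in a complete program. The sketch below compares -1 with 1 and reports the outcome through the exit status (0 = equal, 1 = greater, 2 = less); the values and status codes are made up for the example, and x86-64 Linux is assumed.

```nasm
section .text
global _start

_start:
    mov eax, -1
    mov ebx, 1
    cmp eax, ebx          ; computes eax - ebx and sets the flags
    je  .equal            ; ZF = 1
    jg  .greater          ; ZF = 0 and SF = OF
    ; fall through: eax < ebx (signed)
    mov edi, 2
    jmp .exit

.equal:
    xor edi, edi          ; status 0
    jmp .exit

.greater:
    mov edi, 1            ; status 1

.exit:
    mov eax, 60           ; syscall: exit
    syscall
```

The exit status is 2 because the jumps in the table treat the operands as signed, so -1 is less than 1. Jump instructions do not change the flags, which is why several of them can follow a single CMP.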

To print the result directly from the program (a full program that converts to ASCII itself, without our print function):
section .data
    result_char db '0'          ; default character (updated at runtime)

section .text
global _start

_start:
    mov edi, 15                 ; <-- Change this to test different values
    call check_number           ; Result returned in EAX -> 0 or 1

    add eax, '0'                ; Convert to ASCII ('0' or '1')
    mov [result_char], al       ; Store in result buffer

    ; Print result using write syscall
    mov rax, 1                  ; syscall: write
    mov rdi, 1                  ; file descriptor: stdout
    mov rsi, result_char        ; address of character to print
    mov rdx, 1                  ; number of bytes
    syscall

    ; Exit and cleanup
    mov rax, 60                 ; syscall: exit
    xor rdi, rdi                ; exit code 0
    syscall

; ---- Function: check_number ----
check_number:
    mov eax, edi                ; Load function argument (n) into EAX
    cmp eax, 10                 ; Compare n with 10
    jle return_zero             ; Jump if n <= 10

    mov eax, 1                  ; If n > 10, return 1
    jmp end_func

return_zero:
    mov eax, 0                  ; If n <= 10, return 0

end_func:
    ret

Loops in Assembly
Loops in assembly are typically implemented using CMP (compare), JMP (jump),
and LOOP instructions. The most common loop types are:

• For-loop (Count-Controlled Loop)

• While (Condition-Based Loop)

A for-loop runs a fixed number of times:

    for i in range(5):
        print(i)

The assembly conversion requires a few things:

• A counter, to count the iterations

• A conditional jump to leave the loop, and a jump back to the top of the loop

• Some way to increment the counter
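
A minimal sketch of that conversion follows. It is one possible layout, not necessarily the one used in class: the buffer name `digit` is made up, the counter is kept in EBX because the `syscall` instruction overwrites RCX and R11, and the single-character conversion only works for values 0 through 9. x86-64 Linux is assumed.

```nasm
section .bss
    digit resb 2                  ; one ASCII digit + newline

section .text
global _start

_start:
    xor ebx, ebx                  ; i = 0 (the counter)

.loop:
    cmp ebx, 5                    ; i < 5 ?
    jge .done                     ; no -> leave the loop

    mov eax, ebx
    add eax, '0'                  ; convert i to its ASCII digit
    mov [digit], al
    mov byte [digit + 1], 0xA     ; newline

    mov eax, 1                    ; syscall: write
    mov edi, 1                    ; file descriptor: stdout
    mov rsi, digit                ; address of the two bytes
    mov edx, 2                    ; number of bytes
    syscall

    inc ebx                       ; i++
    jmp .loop                     ; back to the top

.done:
    mov eax, 60                   ; syscall: exit
    xor edi, edi                  ; status: 0
    syscall
```

This prints the digits 0 through 4, one per line. The three ingredients from the list map directly onto the code: EBX is the counter, `cmp`/`jge` plus `jmp` form the conditional exit and the jump back, and `inc` advances the counter.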
