Do Hoang Tu - Operating System From 0 To 1 (2022) - Removed - Removed - Removed
};
names[1]. To access an individual character within a name, we use the
column index [12], e.g. names[0][0] gives the character “J”,
names[0][1] gives the
[12] Same with column index, the right since it changes the index based on a row.
This section explores how a compiler transforms high-level code into
assembly code that the CPU can execute, and how common assembly
patterns help to create higher-level syntax. The -S option is added to
objdump to better demonstrate the connection between high-level and
low-level code.
In this section, the option --no-show-raw-insn is added to the objdump
command to omit the opcodes for clarity:
int32_t i = 0x12345678;
return 0;
}
80483f6: ret
80483f7: xchg ax,ax
80483f9: xchg ax,ax
80483fb: xchg ax,ax
80483fd: xchg ax,ax
80483ff: nop
General data movement is performed with the mov instruction. Note
that despite the instruction being called mov, it actually copies data
from a source to a destination; the source is left unchanged.
The red instruction copies data from the register esp to the register
ebp. This mov instruction moves data between registers and is assigned
the opcode 89.
The blue instructions copy data from one memory location (the i
variable) to another (the j variable). The mov instruction cannot move
data directly from memory to memory; the copy requires two mov
instructions, one for copying the data from a memory location to a
register, and one for copying the data from the register to the
destination memory location.
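The two-step copy described above can be sketched in C. This is only an illustrative model: the function name copy_through_register and the local variable reg (standing in for a register such as eax) are assumptions made for this example and do not appear in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the two mov instructions needed for `j = i`:
 *   mov eax, [i]   ; memory   -> register
 *   mov [j], eax   ; register -> memory
 * `reg` is a hypothetical stand-in for a CPU register such as eax. */
void copy_through_register(const int32_t *src, int32_t *dst)
{
    assert(src != NULL && dst != NULL);
    int32_t reg = *src; /* first mov: load from the source memory location */
    *dst = reg;         /* second mov: store to the destination location   */
}
```

Calling copy_through_register(&i, &j) leaves i untouched and gives j the same value, which matches the point that mov copies rather than moves.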
4.9.2 Expressions
Source
int expr(int i, int j)
{
    int add = i + j;
    int sub = i - j;
    int mul = i * j;
    int div = i / j;
    int mod = i % j;
    int neg = -i;
    int and = i & j;
    int or = i | j;
    int xor = i ^ j;
    int not = ~i;
    int shl = i << 8;
return 0;
}
Assembly The full assembly listing is really long. For that reason, we
examine it expression by expression.
eax is first loaded with i, then multiplied by j with the result stored
back into eax, and finally stored into the variable mul at location
[ebp-0x34].
Expression: int div = i / j;
80483ff: mov eax,DWORD PTR [ebp+0x8]
8048402: cdq
8048403: idiv DWORD PTR [ebp+0xc]
8048406: mov DWORD PTR [ebp-0x30],eax
Similar to imul, idiv performs signed division. But, unlike imul
above, idiv takes only one operand:
1. First, i is reloaded into eax.
2. Then, cdq converts the doubleword value in eax into a quadword
value stored in the pair of registers edx:eax, by copying the
sign bit (bit 31) of the value in eax into every bit position
in edx. The pair edx:eax is the dividend, which is the variable
i, and the operand to idiv is the divisor, which is the variable
j.
3. After the calculation, the result is stored in the pair of
registers edx:eax, with the quotient in eax and the remainder in
edx. The quotient is then stored in the variable div, at location
[ebp-0x30].
The same idiv instruction also serves the modulo operation, since
it also calculates a remainder; the remainder is stored in the
variable mod, at location [ebp-0x2c].
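The cdq and idiv steps above can be modelled in C as a rough sketch. The struct div_regs type and the function names cdq_model and idiv_model are hypothetical names chosen for this illustration; the model only mirrors the register contents described in the text and excludes the cases where the real instruction raises a divide error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two registers idiv writes. */
struct div_regs {
    int32_t eax; /* quotient after idiv  */
    int32_t edx; /* remainder after idiv */
};

/* cdq: copy the sign bit of eax into every bit position of edx. */
static int32_t cdq_model(int32_t eax)
{
    return (eax < 0) ? -1 : 0; /* all ones or all zeros */
}

/* Models: mov eax,i ; cdq ; idiv j */
struct div_regs idiv_model(int32_t i, int32_t j)
{
    /* The real instruction raises a divide error in these cases. */
    assert(j != 0);
    assert(!(i == INT32_MIN && j == -1));

    struct div_regs r;
    r.eax = i;
    r.edx = cdq_model(r.eax);

    /* Rebuild the 64-bit dividend edx:eax (edx * 2^32 + unsigned eax). */
    int64_t dividend = (int64_t)r.edx * 4294967296LL + (int64_t)(uint32_t)r.eax;

    r.eax = (int32_t)(dividend / j); /* quotient, truncated toward zero   */
    r.edx = (int32_t)(dividend % j); /* remainder, same sign as dividend  */
    return r;
}
```

For example, idiv_model(-7, 2) yields a quotient of -3 in eax and a remainder of -1 in edx, which is what the C expressions -7 / 2 and -7 % 2 evaluate to.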
Expression: int or = i | j;
shl (shift logical left) shifts the bits in the destination operand to
the left by the number of bits specified in the source operand. In
this case, eax stores i and shl shifts eax by 8 bits to the left. A
different name for shl is sal (shift arithmetic left); the two names
are synonymous. Finally, the result is stored in the variable shl at
[ebp-0x14].
Here is a visual demonstration of shl/sal and shr instructions:
When shifting to the left, the last bit shifted out of the left end
(the most significant bit) is copied into the Carry Flag in the
EFLAGS register.
sar is similar to shl/sal, but shifts bits to the right and extends
the sign bit. For right shifts, shr and sar are two different
instructions: shr differs from sar in that it does not extend the
sign bit. Finally, the result is stored in the variable shr at
[ebp-0x10].
[Figure: (a) SHL/SAL (Source: Figure 7-6, Volume 1); (b) SHR (Source: Figure 7-7, Volume 1). Each panel shows the operand and CF in the initial state and after a 1-bit shift.]
[Figure 4.9.2: SAR Instruction Operation (Source: Figure 7-8, Volume 1). The panel shows the operand and CF in the initial state and after a 1-bit SAR instruction.]
With sar, the sign bit (the most significant bit) is preserved. That
is, if the sign bit is 0, the new bits always get the value 0; if the sign
bit is 1, the new bits always get the value 1.
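The difference between the three shifts can be sketched with 1-bit models in C. The function names shl1, shr1 and sar1 and the cf output parameter are assumptions made for this example; unsigned arithmetic is used so that the behaviour does not depend on how a particular compiler treats right shifts of negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1-bit shl/sal: bit 31 is shifted out into CF, bit 0 becomes 0. */
uint32_t shl1(uint32_t v, int *cf)
{
    assert(cf != NULL);
    *cf = (int)(v >> 31);
    return v << 1;
}

/* 1-bit shr: bit 0 is shifted out into CF, bit 31 becomes 0. */
uint32_t shr1(uint32_t v, int *cf)
{
    assert(cf != NULL);
    *cf = (int)(v & 1u);
    return v >> 1;
}

/* 1-bit sar: bit 0 is shifted out into CF, bit 31 keeps its old value. */
uint32_t sar1(uint32_t v, int *cf)
{
    assert(cf != NULL);
    *cf = (int)(v & 1u);
    return (v >> 1) | (v & 0x80000000u);
}
```

With the input 0x88888888, shr1 produces 0x44444444 while sar1 produces 0xC4444444, because sar refills the vacated top bit with the original sign bit.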
cmp and the variants of the set instruction make up all the
logical comparisons. In this expression, cmp compares the variables i
and j; then sete stores the value 1 in the al register if the
comparison by the earlier cmp found them equal, or stores 0 otherwise.
The general name for the variants of the set instruction is SETcc. The
suffix cc denotes the condition being tested for in the EFLAGS
register. Appendix B in volume 1, “EFLAGS Condition Codes”, lists the
conditions it is possible to test for with this instruction. Finally,
the result is stored in the variable equal1 at [ebp-0x41].
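As a rough model of the cmp and sete pair, the sketch below tracks only the zero flag. The struct eflags_model type and the functions cmp_model, sete_model and setne_model are hypothetical names for this illustration; a real cmp updates several more flags that other SETcc variants test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced view of EFLAGS: only the zero flag. */
struct eflags_model {
    uint8_t zf;
};

/* cmp subtracts the operands and discards the difference, keeping the flags. */
struct eflags_model cmp_model(int32_t a, int32_t b)
{
    struct eflags_model f;
    uint32_t diff = (uint32_t)a - (uint32_t)b; /* wraps like the hardware */
    f.zf = (diff == 0) ? 1 : 0;
    return f;
}

/* sete: store 1 if ZF is set, 0 otherwise. */
uint8_t sete_model(struct eflags_model f)
{
    assert(f.zf == 0 || f.zf == 1);
    return f.zf;
}

/* setne: the opposite condition. */
uint8_t setne_model(struct eflags_model f)
{
    assert(f.zf == 0 || f.zf == 1);
    return (uint8_t)!f.zf;
}
```

Under this model, sete_model(cmp_model(i, j)) corresponds to the C expression i == j.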
The logical AND operator && is one of the syntaxes that is made entirely
in software [14] with simpler instructions. The algorithm from the as-
[14] That is, there is no equivalent assem-
First, i is copied into eax at 80484d9. Then, the value of eax + 0x1
is copied into edx as an effective address at 80484dc. The lea (load
effective address) instruction copies a memory address into a
register. According to Volume 2, the source operand is a memory
address specified with one of the processor's addressing modes. This
means the source operand must be specified by the addressing modes
defined in the 16-bit/32-bit ModR/M Byte tables, 4.5.1 and 4.5.2.
After loading the incremented value into edx, the value of i is
increased by 1 at 80484df. Finally, the previous i value is stored
in i1 at [ebp-0x8] by the instruction at 80484e2.
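The sequence just described can be mirrored step by step in C. This is a sketch under assumptions: the function name post_increment_model is invented for this example, the locals eax and edx stand in for the registers, and the instruction forms in the comments paraphrase the prose rather than quote the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models `i1 = i++;` following the four steps in the text.
 * Returns the value that ends up in i1. */
int32_t post_increment_model(int32_t *i)
{
    assert(i != NULL);
    assert(*i < INT32_MAX);  /* keep the C addition well defined      */

    int32_t eax = *i;        /* copy i into eax                       */
    int32_t edx = eax + 1;   /* lea: compute eax + 0x1 into edx       */
    *i = edx;                /* i now holds the incremented value     */
    return eax;              /* the previous value of i goes to i1    */
}
```

Starting from i = 5, the call returns 5 for i1 and leaves i at 6, which is the observable behaviour of a post-increment.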
The primary differences between this increment syntax and the pre-
vious one are:
4.9.3 Stack
A stack is a contiguous array of memory locations that holds a
collection of discrete data. When a new element is added, a stack
grows down in memory toward lesser addresses, and it shrinks up toward
greater addresses when an element is removed. x86 uses the esp register
to point to the top of the stack, at the newest element. A stack can
originate anywhere in main memory, as esp can be set to any
memory address. x86 provides two operations for manipulating
stacks:
• The push instruction and its variants add a new element on top of the stack.
• The pop instruction and its variants remove the top-most element from
the stack.
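A downward-growing stack can be sketched in C with an array and an index playing the role of esp. Everything named here (struct stack_model, STACK_WORDS, stack_init, stack_push, stack_pop) is hypothetical and exists only for this illustration; the index counts 32-bit words rather than bytes to keep the model short.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16 /* hypothetical size of the stack region */

struct stack_model {
    uint32_t mem[STACK_WORDS]; /* memory region that holds the stack          */
    uint32_t esp;              /* index of the newest element;                 */
                               /* STACK_WORDS means the stack is empty         */
};

void stack_init(struct stack_model *s)
{
    s->esp = STACK_WORDS; /* start just past the highest location */
}

/* push: move esp toward lower addresses, then store the value there. */
void stack_push(struct stack_model *s, uint32_t value)
{
    assert(s->esp > 0); /* no room left otherwise */
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* pop: read the newest element, then move esp toward higher addresses. */
uint32_t stack_pop(struct stack_model *s)
{
    assert(s->esp < STACK_WORDS); /* nothing to pop otherwise */
    uint32_t value = s->mem[s->esp];
    s->esp += 1;
    return value;
}
```

Pushing 1 and then 2 lowers esp twice; popping returns 2 first and 1 second, and esp ends where it started.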
{
int a = 1;
int b = 2;
{
return i = a + b;
}
}
}
a and b are local to the scope where they are defined and are also
visible in its inner child scope, which executes return i = a + b.
However, they do not exist at the function scope that creates i.
the stack. The local variables and arguments are automatically allocated
upon entering a function and destroyed after exiting it, which is
why they are called automatic variables.
• All local variables are allocated after the ebp pointer. Thus, to access
a local variable, a number is subtracted from ebp to reach the
location of the variable.
• The ebp pointer itself points to the saved ebp of the caller; the
return address of the caller sits right above it, at [ebp+0x4].
return i;
}
Assembly
080483db <add>:
#include <stdint.h>
int add(int a, int b) {
80483db: push ebp
• [ebp+0x8] accesses a.
• [ebp+0xc] accesses b.
For accessing arguments, the rule is that the closer an argument on the
stack is to ebp, the closer it is to the function name in the
argument list.
[Figure 4.9.6: Function arguments and local variables in memory. The row at address 0x10000 shows b, a, Return Address and Old ebp; the row at address 0xffe0 shows N and i. N = Next local variable starts here.]
From the figure, we can see that a and b are laid out in memory
with the exact order as written in C, relative to the return address.
Source
#include <stdio.h>
return a + b;
}
return 0;
}
Assembly For every function call, gcc pushes arguments on the stack in
reversed order with push instructions. That is, the arguments are pushed
on the stack in the reverse of the order in which they are written in
the high-level C code, to preserve the relative order between arguments,
as seen in the previous section on how function arguments and local
variables are laid out. Then, gcc generates a call instruction, which
implicitly pushes a return address before transferring control to
the add function:
080483f2 <main>:
int main(int argc, char *argv[])
{
80483f2: push ebp
80483f3: mov ebp,esp
add(1,2);
80483f5: push 0x2
80483f7: push 0x1
80483f9: call 80483db <add>
80483fe: add esp,0x8
return 0;
8048401: mov eax,0x0
}
8048406: leave
8048407: ret
Upon finishing the call to the add function, the stack is restored by
adding 0x8 to the stack pointer esp (which is equivalent to 2 pop
instructions). Finally, a leave instruction is executed and main
returns with a ret instruction.
A ret instruction transfers program execution back to the caller, to
the instruction right after the call instruction, here the add
instruction (add esp,0x8). ret can return to that location because the
call instruction implicitly pushed the return address, which is the
address right after the call instruction; whenever the CPU executes a
ret instruction, it retrieves the return address that sits right after
all the arguments on the stack:
080483db <add>:
#include <stdio.h>
int add(int a, int b) {
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
int local = 0x12345;
80483e1: mov DWORD PTR [ebp-0x4],0x12345
return a + b;
80483e8: mov edx,DWORD PTR [ebp+0x8]
80483eb: mov eax,DWORD PTR [ebp+0xc]
80483ee: add eax,edx
}
80483f0: leave
80483f1: ret
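The two listings can be replayed on a small simulated stack to check the offsets. The sketch below is a model under assumptions, not a description of how the CPU is built: struct machine_model, MEM_WORDS and the helper names are hypothetical, memory is an array of 32-bit words addressed by byte offset, and the return address is passed in as a plain number.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64 /* hypothetical memory size in 32-bit words */

struct machine_model {
    uint32_t mem[MEM_WORDS]; /* byte address = index * 4 */
    uint32_t esp;            /* byte address of the top of the stack */
    uint32_t ebp;            /* byte address of the current frame base */
};

static void push32(struct machine_model *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t pop32(struct machine_model *m)
{
    assert(m->esp + 4 <= MEM_WORDS * 4);
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}

static uint32_t load32(const struct machine_model *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return m->mem[addr / 4];
}

/* Follows the add listing instruction by instruction. */
static uint32_t add_callee(struct machine_model *m)
{
    push32(m, m->ebp);                       /* push ebp                  */
    m->ebp = m->esp;                         /* mov ebp,esp               */
    assert(m->esp >= 0x10);
    m->esp -= 0x10;                          /* sub esp,0x10              */
    m->mem[(m->ebp - 0x4) / 4] = 0x12345;    /* mov [ebp-0x4],0x12345     */
    uint32_t edx = load32(m, m->ebp + 0x8);  /* mov edx,[ebp+0x8]  (a)    */
    uint32_t eax = load32(m, m->ebp + 0xc);  /* mov eax,[ebp+0xc]  (b)    */
    eax += edx;                              /* add eax,edx               */
    m->esp = m->ebp;                         /* leave: mov esp,ebp        */
    m->ebp = pop32(m);                       /*        pop ebp            */
    return eax;
}

/* Follows the caller side in main: push 0x2, push 0x1, call, add esp,0x8. */
uint32_t call_add(struct machine_model *m, uint32_t a, uint32_t b,
                  uint32_t return_address)
{
    push32(m, b);                  /* last argument is pushed first       */
    push32(m, a);
    push32(m, return_address);     /* call pushes the return address      */
    uint32_t eax = add_callee(m);
    uint32_t target = pop32(m);    /* ret pops the return address         */
    assert(target == return_address);
    m->esp += 0x8;                 /* add esp,0x8 discards both arguments */
    return eax;
}
```

Running call_add with a = 1 and b = 2 returns 3 and leaves esp and ebp at their starting values, which shows that [ebp+0x8] and [ebp+0xc] do land on the two pushed arguments.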
Exercise 4.9.3. The above code that gcc generated for function calling
is actually the standard method x86 defined. Read chapter 6, “Procedure
Calls, Interrupts, and Exceptions”, Intel manual volume 1.
4.9.6 Loop
A loop simply resets the instruction pointer to an already executed
instruction and starts from there all over again. A loop is just one
application of the jmp instruction. However, because looping is a
pervasive pattern, it earned its own syntax in C.
Source
#include <stdio.h>
return 0;
}
Assembly
080483db <main>:
#include <stdio.h>
int main(int argc, char *argv[])
{
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
for (int i = 0; i < 10; i++)
{
80483e1: mov DWORD PTR [ebp-0x4],0x0
80483e8: jmp 80483ee <main+0x13>
80483ea: add DWORD PTR [ebp-0x4],0x1
80483ee: cmp DWORD PTR [ebp-0x4],0x9
80483f2: jle 80483ea <main+0xf>
}
return 0;
80483f4: b8 00 00 00 00 mov eax,0x0
}
80483f9: c9 leave
80483fa: c3 ret
80483fb: 66 90 xchg ax,ax
80483fd: 66 90 xchg ax,ax
80483ff: 90 nop
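The jump structure of the listing can be written back in C with goto, which makes the role of jmp visible. This is an illustrative sketch: the function name count_with_goto, the limit parameter and the iterations counter are additions for this example (the loop body in the listing is empty).

```c
#include <assert.h>

/* Same control flow as the listing: initialise, jump to the check,
 * and jump back to the increment while the condition holds.
 * Returns how many times the loop body position was reached. */
int count_with_goto(int limit)
{
    assert(limit >= 0);
    int iterations = 0;
    int i = 0;           /* mov DWORD PTR [ebp-0x4],0x0 */
    goto check;          /* jmp to the comparison       */
body:
    iterations++;        /* loop body would go here     */
    i += 1;              /* add DWORD PTR [ebp-0x4],0x1 */
check:
    if (i <= limit - 1)  /* cmp with 0x9 ; jle          */
        goto body;
    return iterations;
}
```

With limit set to 10 the comparison is against 9, as in the listing, and the body position is reached ten times.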
Exercise 4.9.4. Why does the increment instruction (the blue
instruction) appear before the compare instructions (the green
instructions)?
Exercise 4.9.5. What assembly code can be generated for while and
do...while?
4.9.7 Conditional
Again, a conditional in C with the if...else... construct is just another
application of the jmp instruction under the hood. It is also a
pervasive pattern that earned its own syntax in C.
Source
#include <stdio.h>
if (argc) {
i = 1;
} else {
i = 0;
}
return 0;
}
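One way to picture the jumps behind if...else is to rewrite it with goto. The sketch below assumes a common lowering with one conditional jump over the first branch and one unconditional jump over the second; it is not taken from a listing, and the function and label names are invented for this example.

```c
#include <assert.h>

/* Hand-written goto version of:
 *   if (argc) { i = 1; } else { i = 0; }
 * Returns the value assigned to i. */
int select_with_goto(int argc)
{
    int i;
    if (argc == 0)       /* conditional jump: skip the first branch  */
        goto else_branch;
    i = 1;               /* "if" branch                              */
    goto end;            /* unconditional jump over the else branch  */
else_branch:
    i = 0;               /* "else" branch                            */
end:
    assert(i == 0 || i == 1);
    return i;
}
```

Any nonzero argc takes the first branch and yields 1; an argc of 0 jumps straight to the else branch and yields 0.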