Operating System From 0 To 1
Contents
Preface
I Preliminary
1 The Documents
2.4 Abstraction
3 Computer Architecture
4.1 objdump
II Groundwork
7 Bootloader
10 Process
11 Interrupt
Index
Bibliography
Preface
Greetings!
If that is the case, this book is for you. By going through this book,
you will be able to find the missing piece that is essential and enables
you to implement your operating system, from scratch! Yes, from
scratch, without going through any existing operating system layer
to comfort yourself that you are an operating system developer. You
may ask, "Isn't it more practical to learn the internals of Linux?" In the
end, you learn Linux to improve your day job significantly and might
be able to contribute to it. That is an excellent approach. However,
if you follow that route, you still do not achieve the ultimate goal:
writing an actual operating system. Also, you gain knowledge that
ii tu, do hoang
you can never learn from using any current operating system, and this
experience can impact how you write software. These are the benefits
of writing your operating system:
Learn how a computer works at the hardware level, and how you
can write software that manages it directly.
You won't fully understand how Linux works without writing one
on your own anyway. That is, to hack Linux internals profoundly,
you will need to write at least one operating system on your own,
just as you need to start with simple programs before you can write
a large application.
Later you can proudly say that you really wrote
an operating system from scratch, not a copy/paste textbook
operating system.
There are many books on this topic and even courses out there already,
made by famous professors and experts. Who am I to write a
book on such an advanced topic? It is true that many quality books are
out there. But then: of all the operating system books, does any of
them show you how to compile C code that is independent of
an existing operating system and the C runtime library on top of it?
Most books talk about operating system designs, and books that talk
about implementations only mention the software side of an operating
system: algorithms. The hardware side, which enables an operating
system to communicate with hardware, is skipped; even if you read
such books, the best you can do is experiment with their algorithms
in an existing operating system, which is not so satisfying. Maybe
the scope of hardware is too large to include in an operating system
book, and probably the students are assumed to have obtained enough
hardware knowledge beforehand from other courses. Or, the students only
need to work within an existing assignment framework: the only thing
they need to do is fill in the empty function bodies, run them against
test cases and complete their homework. Either way, hardware details are
ignored, and it is hard for a self-learner to find relevant resources on
the internet. The aim of this book is to fill that gap:
not only will you learn how to program hardware directly, but you
will also learn how to read official documents from hardware vendors
to program it. You no longer have to guess where the online articles
get such mysterious information about a hardware device.
You can do it yourself. Lastly, I wrote this book from an autodidact's
perspective, so I did my best to make the book as self-contained
as possible, so that you spend more time learning and less time guessing
and looking for information.
operating system.
If you agree with me, then wait no longer. Start diving in. With
this book, I hope to provide enough foundational knowledge
that you can finally make sense of available OS books and code. This
book is especially beneficial to students who've just finished their first
C/C++ course. Imagine how cool it is to show your future employer
that you already wrote an OS on your own.
Prerequisites
Ohm's law
Know how to touch type. Since we are going to use Linux, touch
typing helps. I know typing speed does not relate to problem-solving,
but at least the speed should be fast enough not to
get in the way and degrade the learning experience.
Acknowledgments
Preliminary
1
The Documents
A problem domain is the part of the world where the computer is to
produce effects, together with the means available to produce them,
directly or indirectly. (Kovitz, 1999)
[Figure: problem domains, labeled Application Domain, Software Domain, and Non-software Domains]
What vs How
Sketches
...
It's better to put anything related to the problem domain in the
requirement document. A good way to test the quality of a requirement
document is to hand it to a domain expert for proofreading, to see
whether they can understand the material thoroughly. A requirement
document is also useful as a help document later, or makes writing one
much easier.
Software specification document states rules relating desired behavior
of the output devices to all possible behavior of the input devices,
as well as any rules that other parts of the problem domain must
obey. (Kovitz, 1999)
defined, with the tiniest details in it. It needs to be that way because
once hardware is physically manufactured, there's no going back, and
if defects exist, the damage to the company's finances and reputation
is devastating.
Aside from Intel's official website, the website of this book also
hosts the documents for convenience².
² Intel may change the links to the documents as they update their
website, so this book doesn't contain any link to the documents to avoid
confusion for readers.
At the core, a transistor is just a resistor whose value can vary
based on an input voltage value.
With this property, a transistor can be used as a current amplifier
(more voltage, less resistance) or to switch electrical signals off and
on (block and unblock an electron flow) based on a voltage level.
At 0 v, no current can pass through a transistor, thus it acts like a
circuit with an open switch (light bulb off) because the resistor value
is enough to block the electrical flow. Similarly, at +3.5 v, current
can flow through a transistor because the resistor value is lessened,
effectively enabling electron flow, thus it acts like a circuit with a
closed switch.
[Figure 2.1.2: Modern transistor]
Note: If you want a deeper explanation of transistors, e.g. how
electrons move, you should look at the video "How semiconductors work"
on Youtube, by Ben Eater.
A bit has two states: 0 and 1, which is the building block of all
digital systems and software. Similar to a light bulb that can be
turned on and off, bits are made out of this electrical stream from the
power source: bit 0 is represented with 0 v (no electron flow), and
bit 1 is +3.5 v to +5 v (electron flow). A transistor implements a bit
correctly, as it can regulate the electron flow based on voltage level.
The invention of the classic transistor opened a whole new world of micro
digital devices. Prior to the invention, vacuum tubes, which are just
fancier light bulbs, were used to represent 0 and 1, and required a human
to turn them on and off. MOSFET, or Metal-Oxide-Semiconductor
Field-Effect Transistor, invented in 1959 by Dawon Kahng and Martin
M. (John) Atalla at Bell Labs, is an improved version of the classic
transistor that is more suitable for digital devices, as it requires a
shorter switching time between the two states 0 and 1, is more stable,
consumes less power and is easier to produce.
There are also two types of MOSFETs analogous to two types
of transistors: n-MOSFET and p-MOSFET. n-MOSFET and p-
MOSFET are also called NMOS and PMOS transistors for short.
All digital devices are designed with logic gates. A logic gate is a
from hardware to software: layers of abstraction 13
boolean functions.
2.2.1 The theory behind logic gates
Logic gates accept only binary inputs¹ and produce binary outputs.
In other words, logic gates are functions that transform binary values.
Fortunately, a branch of math that deals exclusively with binary
values already existed, called Boolean Algebra, developed in the
19th century by George Boole. Logic gates were created with this sound
mathematical theory as a foundation. As logic gates implement Boolean
functions, a set of Boolean functions is functionally complete if all
other Boolean functions can be constructed from this set.
Later, Charles Sanders Peirce (during 1880-1881) proved that either
Boolean function of NOR or NAND alone is enough to create all other
Boolean logic functions. Thus NOR and NAND gates are functionally
complete (Peirce, 1933). Gates are simply the implementations of
Boolean logic functions, therefore a NAND or NOR gate is enough to
implement all other logic gates. The simplest gates a CMOS circuit can
implement are inverters (NOT gates), and from the inverters come
NAND gates. With NAND gates, we are confident that we can implement
everything else. This is why the inventions of transistors, and then of
CMOS circuits, revolutionized the computer industry.
¹ Input that is either a 0 or 1.
We should realize and appreciate how powerful the boolean functions
available in all programming languages are.
Note: If you want to understand why and how from a NAND gate we can
create all Boolean functions and a computer, I suggest the course
"Build a Modern Computer from First Principles: From Nand to Tetris"
available on Coursera: https://www.coursera.org/learn/build-a-computer.
To go even further, after the course, you should take the series
"Computational Structures" on Edx.
2.2.2 Logic Gate implementation: CMOS circuit
Underlying every logic gate is a circuit called CMOS, short for
Complementary MOSFET (Complementary Metal-Oxide-Semiconductor).
CMOS consists of two complementary transistors, NMOS
and PMOS. The simplest CMOS circuit is an inverter or a NOT gate:
Example 2.2.1. 74HC00 is a chip with four 2-input NAND gates. The
chip comes with 8 input pins and 4 output pins, 1 pin for connecting
to a voltage source and 1 pin for connecting to the ground. This
device is the physical implementation of NAND gates that we can
physically touch and use. But instead of just a single gate, the chip
comes with 4 gates that can be combined. Each combination enables
a different logic function, effectively creating other logic gates. This
feature is what makes the chip popular.
Each of the gates above is just a simple NAND circuit with the
electron flows, as demonstrated earlier. Yet, many of these NAND-gate
chips combined can build a simple computer. Software, at the physical
[Figure: (a) Logic diagram of 74HC00; (b) Logic diagram of one NAND gate, with inputs A and B and output Y]
How can the above gates be created with 74HC00? It is simple:
as every gate has 2 input pins and 1 output pin, we can wire the
output of 1 NAND gate to an input of another NAND gate, thus
chaining NAND gates together to produce the diagrams as above.
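The chaining described above can be sketched in C. This is only an illustrative model, not a description of the chip's internals: the function names (nand_gate, not_gate, and_gate, or_gate) are assumptions made for this sketch, and each call to nand_gate stands for one of the NAND gates on a 74HC00 with its output wired to the input of the next.

```c
#include <assert.h>

/* One 2-input NAND gate: the output is 0 only when both inputs are 1. */
int nand_gate(int a, int b)
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1)); /* binary inputs only */
    return !(a && b);
}

/* NOT: feed the same signal to both inputs of one NAND gate. */
int not_gate(int a)
{
    return nand_gate(a, a);
}

/* AND: invert the output of one NAND gate with a second NAND gate. */
int and_gate(int a, int b)
{
    return not_gate(nand_gate(a, b));
}

/* OR: invert each input first, then NAND the two results (3 gates). */
int or_gate(int a, int b)
{
    return nand_gate(not_gate(a), not_gate(b));
}
```

Calling or_gate(0, 1), for instance, passes through three NAND gates, which is within the four gates a single 74HC00 provides.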
Note that a CPU is not the only device with its own language. CPU is
just a name to indicate a hardware device that controls a computer
system. A hardware device may not be a CPU but still have its own
language. A device with its own machine language is a programmable
device, since a user can use the language to command the device to
perform different actions. For example, a printer has its set of
commands for instructing it how to print a page.
Example 2.3.1. A user can use the 74HC00 chip without knowing its
internals, knowing only the interface for using the device. First, we
need to know its layout:
2A 4 11 4Y
2B 5 10 3B
2Y 6 9 3A
GND 7 8 3Y
Table 2.3.2: Functional Description
Input      Output
nA   nB    nY
L    X     H
X    L     H
H    H     L
n is a number, either 1, 2, 3, or 4.
H = HIGH voltage level; L = LOW voltage level; X = don't care.
The functional description provides a truth table with all possible
pin inputs and outputs, which also describes the usage of all pins in
the device. A user does not need to know the implementation, only
this table, to use the device. We can say that the truth table above
is the machine language of the device. Since the device is digital, its
language is a collection of binary strings:
The device has 8 input pins, and this means it accepts binary
strings of 8 bits.
The device has 4 output pins, and this means it produces binary
strings of 4 bits from the 8-bit inputs.
The number of input strings is what the device understands, and the
number of output strings is what the device can speak. Together, they
make the language of the device. Even though this device is simple,
the language it can accept contains quite many binary strings:
2^8 + 2^4 = 256 + 16 = 272. However, such a number is a tiny fraction
of that of a complex device like a CPU, with hundreds of pins.
Pin 1A 1B 2A 2B 3A 3B 4A 4B 1Y 2Y 3Y 4Y
Value 1 1 0 0 1 1 0 0 0 1 0 1
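To make the idea of a device "language" concrete, the row above can be reproduced with a small C model. This is a sketch under assumptions: the function names hc00_gate and hc00_outputs are invented here, and the bit order simply follows the table from left to right (inputs 1A 1B 2A 2B 3A 3B 4A 4B, outputs 1Y 2Y 3Y 4Y).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Output nY of gate n (n is 1, 2, 3, or 4) for an 8-bit input string.
 * Pin 1A is the most significant bit and pin 4B the least significant.
 */
int hc00_gate(uint8_t inputs, int n)
{
    assert(n >= 1 && n <= 4);
    int shift = 8 - 2 * n;               /* bit position of nB */
    int a = (inputs >> (shift + 1)) & 1; /* nA sits one bit above nB */
    int b = (inputs >> shift) & 1;
    return !(a && b);                    /* NAND: L only when both are H */
}

/* Map the 8-bit input string to the 4-bit output string 1Y 2Y 3Y 4Y. */
uint8_t hc00_outputs(uint8_t inputs)
{
    uint8_t outputs = 0;

    for (int n = 1; n <= 4; n++)
        outputs = (uint8_t)((outputs << 1) | hc00_gate(inputs, n));
    return outputs;                      /* 1Y is bit 3, 4Y is bit 0 */
}
```

With the input string 11001100 (0xCC) from the table, hc00_outputs returns 0101 (0x5), matching the Value row.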
2A 0 1 4Y
2B 0 1 3B
2Y 1 1 3A
GND 0 3Y
Table 2.3.3: Truth table of OR logic diagram.
A B C D Y
0 0 1 1 0
0 1 1 0 1
1 0 0 1 1
1 1 0 0 1
[Figure 2.3.3: 2-bit OR gate implementation. (a) 2-bit OR gate logic diagram, built from 3 NAND gates with 4 pins just for 2 bits of input. (b) Pin layout; pin 3A and 3B take the values from 1Y and 2Y.]
To implement a 4-bit OR gate, we need a total of four of such
[Figure 2.3.4: 4-bit OR chip made from four 74HC00 devices]
or <op1>, <op2>
nand <op1>, <op2>
[Figure: pin layouts of 74HC00 devices, one labeled "4-bit NAND"]
choice to NAND gate for creating other logic gates. Similarly, if we keep
improving our hypothetical device, it eventually becomes a full-fledged
computer.
[Figure 2.3.6: Repeated assembly patterns are generalized into a new language. Patterns recurring in source1.asm, source2.asm, ..., source<n>.asm are generalized into a construct such as if (...) { ... } else { ... }.]
ware can also implement. The reverse is also true: any hardware logic
that is implemented in a circuit can be reimplemented in a program-
ming language. The simple reason is that programming languages,
or assembly languages, or machine languages, or logic gates are just
languages to express computations. It is impossible for software to
implement something hardware is incapable of because programming
language is just a simpler way to use the underlying hardware. At
the end of the day, programming languages are translated to machine
instructions that are valid to a CPU. Otherwise, the code is not runnable,
and is thus useless software. In reverse, software can do everything the
hardware (that runs the software) can, as programming languages are just
an easier way to use the hardware.
In reality, even though all languages are equivalent in power, not all
of them are capable of expressing each other's programs. Programming
languages vary between two ends of a spectrum: high level and low
level.
2.4 Abstraction
The recurring details are given a new and simpler language than
the languages of the lower layers.
What to realize is that every layer is just a more convenient
language to describe the lower layer. Only after a description is fully
created with the language of the higher layer is it then
implemented with the language of the lower layer.
The CMOS layer has a recurring pattern that makes sure logic gates
are reliably translated to CMOS circuits: a k-input gate uses k
PMOS and k NMOS transistors, a total of 2k transistors
(Wakerly, 1999). Since digital devices use CMOS exclusively, a
language arose to describe higher level ideas while hiding CMOS
circuits: Logic Gates.
There were many such ideas that can be reliably translated into
Assembly code. Thus, the ideas were extracted and built into the
high level programming languages that every programmer learns
today.
Programming Language
Assembly Language
Logic Gates
Circuit
digraph {
    a -> b;
    b -> c;
    a -> c;
    d -> c;
}
draw_line(a, b);
a -> b;
3.1.1 Server
A mobile computer is similar to a desktop computer with fewer
resources but can be carried around.
rate both the input and output systems along with the computer in a
single package.
A printed circuit board (PCB) is a physical board that contains lines and pads
to enable electron flows between electrical and electronic components.
Without a PCB, devices cannot be combined to create a larger de-
vice. As long as these devices are hidden inside a larger device and
contribute to a larger device that operates at a higher level layer for a
higher level purpose, they are embedded devices. Writing a program
for an embedded device is therefore called embedded programming.
Embedded computers are used in automatically controlled devices in-
cluding power tools, toys, implantable medical devices, office machines,
engine control systems, appliances, remote controls and other types of
embedded systems.
[Figure: Raspberry Pi Model B+ V1.2 board layout, showing the Broadcom BCM2835 CPU/GPU with 512MB SDRAM, the LAN9514 4x USB + Ethernet controller, two 2x USB 2.0 ports, Ethernet RJ45, HDMI out, Display DSI, Camera CSI, microSD slot (on bottom side), 3.5mm 4 poles jack for composite video+audio out, Micro USB power in, and the 3.3V and 1.8V regulators]
Field Programmable Gate Array (FPGA) is hardware with an array of
reconfigurable gates that makes circuit structure programmable after
it is shipped away from the factory¹. Recall that in the previous
chapter, each 74HC00 chip can be configured as a gate, and a more
sophisticated device can be built by combining multiple 74HC00
chips. In a similar manner, each FPGA device contains thousands
of chips called logic blocks. A logic block is a more complicated chip
than a 74HC00 chip and can be configured to implement a Boolean logic
function. These logic blocks can be chained together to create a
high-level hardware feature. This high-level feature is usually a dedicated
algorithm that needs high-speed processing.
¹ This is why it is called Field Programmable Gate Array. It is
changeable in the field where it is applied.
in logic gates, which the FPGA device then follows the description to
configure itself to run the algorithm. An algorithm written for a micro-
controller is in assembly instructions that a processor can understand
and act accordingly.
Computer organization is the functional view of the design of a
computer. In this view, hardware components of a computer are presented
as boxes with input and output that connects to each other and form
the design of a computer. Two computers may have the same ISA, but
different organizations. For example, both AMD and Intel processors
implement x86 ISA, but the hardware components of each processor
that make up the environments for the ISA are not the same.
Computer organizations may vary depending on a manufacturer's
design, but they all originate from the Von Neumann architecture²:
² John von Neumann was a mathematician and physicist who invented a
computer architecture.
CPU fetches instructions continuously from main memory and
executes them.
Buses are electrical wires for sending raw bits between the above
components.
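The fetch-and-execute behavior of the CPU can be illustrated with a toy machine in C. Everything in this sketch is hypothetical: the opcodes (OP_HALT, OP_LOAD, OP_ADD, OP_STORE), the two-byte instruction layout, and the single accumulator register are invented for illustration and are unrelated to x86. The one point it borrows from the text is that instructions and data live in the same main memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes of a toy machine; they are not x86 instructions. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

/*
 * Run the program stored in `memory`. Each instruction is two bytes:
 * an opcode followed by a memory address operand.
 * Returns the accumulator when the machine halts.
 */
uint8_t run(uint8_t memory[], int size)
{
    int pc = 0;       /* program counter: address of the next instruction */
    uint8_t acc = 0;  /* accumulator register */

    for (;;) {
        assert(pc + 1 < size);            /* stay inside memory */
        uint8_t opcode  = memory[pc];     /* fetch */
        uint8_t operand = memory[pc + 1];
        pc += 2;

        if (opcode == OP_HALT)            /* decode and execute */
            return acc;
        assert(operand < size);
        switch (opcode) {
        case OP_LOAD:  acc = memory[operand]; break;
        case OP_ADD:   acc = (uint8_t)(acc + memory[operand]); break;
        case OP_STORE: memory[operand] = acc; break;
        default:       return acc;        /* unknown opcode: stop */
        }
    }
}
```

For example, the memory image {OP_LOAD, 8, OP_ADD, 9, OP_STORE, 10, OP_HALT, 0, 2, 3, 0} loads the value at address 8, adds the value at address 9, and stores the sum at address 10.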
[Figure 3.2.1: Von-Neumann Architecture. CPU, Memory, and Input and Output are connected by the System bus, which consists of the Control bus, Address bus, and Data bus.]
I/O Devices are devices that give input to a computer, e.g. keyboard,
mouse, sensor..., and take the output from a computer, e.g. a monitor
takes information sent from the CPU to display it, an LED turns on/off
according to a pattern computed by the CPU...
Not all registers are used for communication with other devices. In
a CPU, most registers are used as high-speed storage for temporary
data. Other devices that a CPU can communicate with always have a set
of registers for interfacing with the CPU.
These two interfaces are extremely important, as they are the only
interfaces for controlling hardware with software. Writing device
drivers is essentially learning the functionality of each register and how
to use them properly to control the device.
[Figure: the CPU, MCH, and Memory connected by the System Bus (Control, Address, Data). (a) Old CPU (b) Modern CPU]
stores charge; the charge gradually leaks and, after a period, the grid
needs refreshing, otherwise it loses all stored values. The individual
component that stores charge is called a capacitor.
3.2.3 Hardware
generic slots for other devices, e.g. network card, sound card.
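[Figure: motherboard chipset layout. The CPU and Clock Generator connect over the Front-side bus to the Northbridge (memory controller hub), which reaches the Memory Slots over the Memory bus and the Graphics card slot over the High-speed graphics bus (AGP or PCI Express). An Internal Bus links the Northbridge to the Southbridge (I/O controller hub), which serves the PCI Bus and PCI Slots, IDE, SATA, USB, Ethernet, Audio Codec, and CMOS Memory (cables and ports leading off-board), and, over the LPC Bus, the Super I/O (Serial Port, Parallel Port, Floppy Disk, Keyboard, Mouse) and Flash ROM (BIOS).]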
etc.
For the remainder of this chapter, please carry on reading chapter
3 in Intel Manual Volume 1, "Basic Execution Environment".
4
x86 Assembly and C
Not quite. Surely, the compiler at its current state of the art is
trustworthy, and we do not need to write code in assembly, most of the
time. A compiler can generate code, but as mentioned previously, a
high-level language is a collection of patterns of a lower-level language.
It does not cover everything that a hardware platform provides. As
a consequence, not every assembly instruction can be generated by
a compiler, so we still need to write assembly code for these circum-
stances to access hardware-specific features. Since hardware-specific
features require writing assembly code, debugging requires reading it.
We might spend even more time reading than writing. When working with
low-level code that interacts directly with hardware, assembly code
is unavoidable. Also, understanding how a compiler generates assembly
code could improve a programmer's productivity. For example, if a
job or school assignment requires us to write assembly code, we can
simply write it in C, then let gcc do the hard work of writing the
assembly code for us. We merely collect the generated assembly code,
modify it as needed and are done with the assignment.
4.1 objdump
$ objdump -d hello
$ objdump -D hello
The start of the output displays the file format of the object file:
00000000004003e0 <_start>:
4003e0: 31 ed xor ebp,ebp
4003e2: 49 89 d1 mov r9,rdx
4003e5: 5e pop rsi
...more assembly code....
0000000000400410 <deregister_tm_clones>:
400410: b8 3f 10 60 00 mov eax,0x60103f
400415: 55 push rbp
400416: 48 2d 38 10 60 00 sub rax,0x601038
...more assembly code....
jmp eax
Then, we open an editor, e.g. Emacs, create a new file, write the
code and save it in a file, e.g. test.asm. Then, in the terminal, run
the command:
-f option specifies the file format, e.g. ELF, of the final output file.
But in this case, the format is bin, which means this file is just a flat
binary output without any extra information. That is, the written
assembly code is translated to machine code as is, without the
overhead of the metadata from file format like ELF. Indeed, after
compiling, we can examine the output using this command:
$ hd test
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 40 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |@.......4.....(.|
00000030 05 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 |................|
00000070 06 00 00 00 00 00 00 00 10 01 00 00 02 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 |................|
00000090 07 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000000a0 20 01 00 00 21 00 00 00 00 00 00 00 00 00 00 00 | ...!...........|
000000b0 01 00 00 00 00 00 00 00 11 00 00 00 02 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 50 01 00 00 30 00 00 00 |........P...0...|
000000d0 04 00 00 00 03 00 00 00 04 00 00 00 10 00 00 00 |................|
000000e0 19 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 80 01 00 00 0d 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000110 ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 00 2e 74 65 78 74 00 2e 73 68 73 74 72 74 61 62 |..text..shstrtab|
00000130 00 2e 73 79 6d 74 61 62 00 2e 73 74 72 74 61 62 |..symtab..strtab|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000160 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 |................|
00000180 00 64 69 73 70 38 2d 35 2e 61 73 6d 00 00 00 00 |.disp8-5.asm....|
00000190
Note: Using the bin format puts nasm by default into 16-bit mode.
To enable 32-bit code to be generated, we must add this line at the
beginning of an nasm source file:
bits 32
Instruction format: Instruction Prefixes, Opcode, ModR/M, SIB, Displacement, Immediate
ModR/M byte: bits 7-6 Mod, bits 5-3 Reg/Opcode, bits 2-0 R/M
SIB byte: bits 7-6 Scale, bits 5-3 Index, bits 2-0 Base
1. The REX prefix is optional, but if used must be immediately before the opcode; see Section
2.2.1, REX Prefixes in the manual for additional information.
2. For VEX encoding information, see Section 2.3, Intel Advanced Vector Extensions (Intel
AVX) in the manual.
3. Some rare instructions can take an 8B immediate or 8B displacement.
jmp [0x1234]
ff 26 34 12
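The very first byte, 0xff, is the opcode. This opcode byte is shared by
a group of instructions; the reg field of the ModR/M byte that follows
(/4 here) selects the jmp instruction.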
mod field, or modifier field, is combined with r/m field for a total
of 5 bits of information to encode 32 possible values: 8 registers
and 24 addressing modes.
The tables 4.5.1 and 4.5.2 list all possible 256 values of ModR/M byte
and how each value maps to an addressing mode and a register, in
16-bit and 32-bit modes.
r8(/r) AL CL DL BL AH CH DH BH
r16(/r) AX CX DX BX SP BP¹ SI DI
r32(/r) EAX ECX EDX EBX ESP EBP ESI EDI
mm(/r) MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
xmm(/r) XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7
(In decimal) /digit (Opcode) 0 1 2 3 4 5 6 7
(In binary) REG = 000 001 010 011 100 101 110 111
Effective Address Mod R/M Values of ModR/M Byte (In Hexadecimal)
[BX + SI] 00 000 00 08 10 18 20 28 30 38
[BX + DI] 001 01 09 11 19 21 29 31 39
[BP + SI] 010 02 0A 12 1A 22 2A 32 3A
[BP + DI] 011 03 0B 13 1B 23 2B 33 3B
[SI] 100 04 0C 14 1C 24 2C 34 3C
[DI] 101 05 0D 15 1D 25 2D 35 3D
disp16² 110 06 0E 16 1E 26 2E 36 3E
[BX] 111 07 0F 17 1F 27 2F 37 3F
[BX + SI] + disp8³ 01 000 40 48 50 58 60 68 70 78
[BX + DI] + disp8 001 41 49 51 59 61 69 71 79
[BP + SI] + disp8 010 42 4A 52 5A 62 6A 72 7A
[BP + DI] + disp8 011 43 4B 53 5B 63 6B 73 7B
[SI] + disp8 100 44 4C 54 5C 64 6C 74 7C
[DI] + disp8 101 45 4D 55 5D 65 6D 75 7D
[BP] + disp8 110 46 4E 56 5E 66 6E 76 7E
[BX] + disp8 111 47 4F 57 5F 67 6F 77 7F
[BX + SI] + disp16 10 000 80 88 90 98 A0 A8 B0 B8
[BX + DI] + disp16 001 81 89 91 99 A1 A9 B1 B9
[BP + SI] + disp16 010 82 8A 92 9A A2 AA B2 BA
[BP + DI] + disp16 011 83 8B 93 9B A3 AB B3 BB
[SI] + disp16 100 84 8C 94 9C A4 AC B4 BC
[DI] + disp16 101 85 8D 95 9D A5 AD B5 BD
[BP] + disp16 110 86 8E 96 9E A6 AE B6 BE
[BX] + disp16 111 87 8F 97 9F A7 AF B7 BF
EAX/AX/AL/MM0/XMM0 11 000 C0 C8 D0 D8 E0 E8 F0 F8
ECX/CX/CL/MM1/XMM1 001 C1 C9 D1 D9 E1 E9 F1 F9
EDX/DX/DL/MM2/XMM2 010 C2 CA D2 DA E2 EA F2 FA
EBX/BX/BL/MM3/XMM3 011 C3 CB D3 DB E3 EB F3 FB
ESP/SP/AH/MM4/XMM4 100 C4 CC D4 DC E4 EC F4 FC
EBP/BP/CH/MM5/XMM5 101 C5 CD D5 DD E5 ED F5 FD
ESI/SI/DH/MM6/XMM6 110 C6 CE D6 DE E6 EE F6 FE
EDI/DI/BH/MM7/XMM7 111 C7 CF D7 DF E7 EF F7 FF
1. The default segment register is SS for the effective addresses containing a BP index, DS for other effective
addresses.
2. The disp16 nomenclature denotes a 16-bit displacement that follows the ModR/M byte and that is added to the
index.
3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte and that is sign-extended
and added to the index.
r8(/r) AL CL DL BL AH CH DH BH
r16(/r) AX CX DX BX SP BP SI DI
r32(/r) EAX ECX EDX EBX ESP EBP ESI EDI
mm(/r) MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
xmm(/r) XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7
(In decimal) /digit (Opcode) 0 1 2 3 4 5 6 7
(In binary) REG = 000 001 010 011 100 101 110 111
Effective Address Mod R/M Values of ModR/M Byte (In Hexadecimal)
[EAX] 00 000 00 08 10 18 20 28 30 38
[ECX] 001 01 09 11 19 21 29 31 39
[EDX] 010 02 0A 12 1A 22 2A 32 3A
[EBX] 011 03 0B 13 1B 23 2B 33 3B
[--][--]¹ 100 04 0C 14 1C 24 2C 34 3C
disp32² 101 05 0D 15 1D 25 2D 35 3D
[ESI] 110 06 0E 16 1E 26 2E 36 3E
[EDI] 111 07 0F 17 1F 27 2F 37 3F
[EAX] + disp8³ 01 000 40 48 50 58 60 68 70 78
[ECX] + disp8 001 41 49 51 59 61 69 71 79
[EDX] + disp8 010 42 4A 52 5A 62 6A 72 7A
[EBX] + disp8 011 43 4B 53 5B 63 6B 73 7B
[--][--] + disp8 100 44 4C 54 5C 64 6C 74 7C
[EBP] + disp8 101 45 4D 55 5D 65 6D 75 7D
[ESI] + disp8 110 46 4E 56 5E 66 6E 76 7E
[EDI] + disp8 111 47 4F 57 5F 67 6F 77 7F
[EAX] + disp32 10 000 80 88 90 98 A0 A8 B0 B8
[ECX] + disp32 001 81 89 91 99 A1 A9 B1 B9
[EDX] + disp32 010 82 8A 92 9A A2 AA B2 BA
[EBX] + disp32 011 83 8B 93 9B A3 AB B3 BB
[--][--] + disp32 100 84 8C 94 9C A4 AC B4 BC
[EBP] + disp32 101 85 8D 95 9D A5 AD B5 BD
[ESI] + disp32 110 86 8E 96 9E A6 AE B6 BE
[EDI] + disp32 111 87 8F 97 9F A7 AF B7 BF
EAX/AX/AL/MM0/XMM0 11 000 C0 C8 D0 D8 E0 E8 F0 F8
ECX/CX/CL/MM1/XMM1 001 C1 C9 D1 D9 E1 E9 F1 F9
EDX/DX/DL/MM2/XMM2 010 C2 CA D2 DA E2 EA F2 FA
EBX/BX/BL/MM3/XMM3 011 C3 CB D3 DB E3 EB F3 FB
ESP/SP/AH/MM4/XMM4 100 C4 CC D4 DC E4 EC F4 FC
EBP/BP/CH/MM5/XMM5 101 C5 CD D5 DD E5 ED F5 FD
ESI/SI/DH/MM6/XMM6 110 C6 CE D6 DE E6 EE F6 FE
EDI/DI/BH/MM7/XMM7 111 C7 CF D7 DF E7 EF F7 FF
3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte (or the SIB byte if one is
present) and that is sign-extended and added to the index.
jmp [0x1234]
ff 26 34 12
0xff is the opcode. Next to it, 0x26 is the ModR/M byte. Looking it up in
the 16-bit table, the first operand is in the row, equivalent to a
disp16, which means a 16-bit offset. Since the instruction does not
have a second operand, the column does not name a register here; for
this opcode, the reg bits act as an opcode extension.
Note: Remember, using bin format generates 16-bit code by default.
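The byte sequence ff 26 34 12 can be rebuilt from its parts with a short C sketch. The helper names modrm and encode_jmp_mem16 are assumptions introduced for illustration, not part of any assembler; the sketch only covers this one form, jmp [disp16] in 16-bit mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack the three ModR/M fields into one byte: mod (2 bits), reg (3), r/m (3). */
uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/*
 * Encode `jmp [disp16]` for 16-bit mode into out[0..3].
 * Returns the number of bytes written.
 */
size_t encode_jmp_mem16(uint16_t disp16, uint8_t out[4])
{
    out[0] = 0xff;                      /* opcode byte */
    out[1] = modrm(0, 4, 6);            /* mod=00, reg=/4 (jmp), r/m=110 (disp16) */
    out[2] = (uint8_t)(disp16 & 0xff);  /* displacement, low byte first */
    out[3] = (uint8_t)(disp16 >> 8);
    return 4;
}
```

Calling encode_jmp_mem16(0x1234, buf) fills buf with ff 26 34 12, the same bytes shown above; the displacement appears as 34 12 because x86 stores multi-byte values in little-endian order.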
01 c8
Why is the first operand in the row and the second in a column?
Let's break down the ModR/M byte, with an example value c8, into
bits:
1 1 0 0 1 0 0 0
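The same breakdown can be done with shifts and masks. The following C sketch is illustrative only; the struct modrm_fields and the helpers decode_modrm and reg32_name are names assumed here. It splits the byte into the 2-bit mod, 3-bit reg, and 3-bit r/m fields, with register numbers in the order the tables list them.

```c
#include <assert.h>
#include <stdint.h>

struct modrm_fields {
    unsigned mod;  /* bits 7-6 */
    unsigned reg;  /* bits 5-3: selects the column in the tables */
    unsigned rm;   /* bits 2-0: together with mod, selects the row */
};

struct modrm_fields decode_modrm(uint8_t byte)
{
    struct modrm_fields f;

    f.mod = (byte >> 6) & 0x3;
    f.reg = (byte >> 3) & 0x7;
    f.rm  = byte & 0x7;
    return f;
}

/* Name of 32-bit register number n, in the order the tables list them. */
const char *reg32_name(unsigned n)
{
    static const char *names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };

    assert(n < 8);
    return names[n];
}
```

For c8, decode_modrm gives mod = 11b, reg = 001b and r/m = 000b: the row is the register eax (mod 11 with r/m 000) and the column is ecx (reg 001).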
Below is the table listing all 256 values of SIB byte, with the lookup
rule similar to ModR/M tables:
1. The [*] nomenclature means a disp32 with no base if the MOD is 00B. Otherwise, [*] means disp8 or disp32 +
[EBP]. This provides the following address modes:
00000000 67 ff 24 43
First of all, the first byte, 0x67, is not an opcode but a prefix. The
value is the predefined address-size override prefix. After
the prefix come the opcode 0xff and the ModR/M byte 0x24. The
value of the ModR/M byte indicates that a SIB byte follows.
The SIB byte is 0x43.
Look up in the SIB table, the row tells that eax is scaled by 2, and
the column tells that the base to be added is in ebx.
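The lookup can also be done by hand, since the SIB byte is just three packed fields. The following is a minimal sketch, assuming the usual layout of 2 bits of scale, 3 bits of index and 3 bits of base; the names sib, decode_sib and reg32_name are hypothetical helpers, and the special cases noted under the table (no index, or no base when MOD is 00B) are deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a SIB byte. */
struct sib {
    unsigned scale; /* multiplication factor: 1, 2, 4 or 8 */
    unsigned index; /* register number of the scaled index */
    unsigned base;  /* register number of the base */
};

/* Split a SIB byte into scale factor, index and base. */
struct sib decode_sib(uint8_t byte)
{
    struct sib f;
    f.scale = 1u << ((byte >> 6) & 0x3);
    f.index = (byte >> 3) & 0x7;
    f.base  = byte & 0x7;
    return f;
}

/* Map a 3-bit register number to its 32-bit register name. */
const char *reg32_name(unsigned number)
{
    static const char *const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    assert(number < 8);
    return names[number];
}
```

For the SIB byte 0x43, decode_sib gives a scale of 2, index 0 (eax) and base 3 (ebx), which agrees with the table lookup.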
jmp [0x1234]
ff 26 34 12
67 ff 24 8d 34 12 00 00
0x24 is the ModR/M byte. The value suggests that a SIB byte
follows, according to table 4.5.2.
66 b8 34 12 00 00
Exercise 4.5.1. Read section 2.1 in Volume 2 for even more details.
Each table contains the following fields, and can have one or more
rows:
Opcode  Instruction  Op/En  64/32-bit Mode  CPUID Feature flag  Description
(Note: cat /proc/cpuinfo lists the information of the available CPUs and
their features in the flags field.)
Compat/Leg Mode Many instructions do not have this field; instead, it is
replaced with Compat/Leg Mode, which stands for Compatibility or Legacy
Mode. This mode enables 64-bit variants of instructions to run normally
in 16 or 32-bit mode.

Table 4.6.1: Notations in Compat/Leg Mode
Notation  Description
Valid     Supported
I         Not supported
N.E.      The 64-bit opcode cannot be encoded as it overlaps with an
          existing 32-bit opcode.

Description briefly explains the variant of an instruction in the
current row.
Description specifies the purpose of the instructions and how an
instruction works in detail.
Operation is pseudo-code that implements an instruction. If a description
is vague, this section is the next best source to understand
an assembly instruction. The syntax is described in section 3.1.1.9
in volume 2.
Exceptions list the possible errors that can occur when an instruc-
tion cannot run correctly. This section is valuable for OS debugging.
Exceptions fall into one of the following categories:
Floating-Point Exception
For our OS, we only use Protected Mode Exceptions and Real-Address
Mode Exceptions. The details are in section 3.1.1.13 and 3.1.1.14,
volume 2.
Let's look at our good old jmp instruction. First, the opcode table:
main:
jmp main
jmp main2
jmp main
main2:
jmp 0x1234
main main2
Table 4.7.2: Memory address of
each opcode
Address 00 01 02 03 04 05 06 07 08 09
Opcode eb fe eb 02 eb fa e9 2b 12 00
program is the address 00, the end of jmp 0x1234 is the address 09
(which means 9 bytes were consumed, starting from address 0),
so the offset is calculated as 0x1234 - 0x9 = 0x122b. That solved the
mystery!
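The same arithmetic applies to every relative jump in the listing: the encoded offset is the target address minus the address of the next instruction. Here is a small sketch of that calculation; jmp_rel is a hypothetical helper name, and the instruction lengths (2 bytes for the eb form, 3 bytes for the e9 form in 16-bit code) are read off the opcode table above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset stored in a relative jmp: the distance from the end of the
 * jmp instruction (the address of the next instruction) to the target.
 */
int32_t jmp_rel(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    assert(insn_len > 0);
    int64_t next = (int64_t)insn_addr + (int64_t)insn_len;
    return (int32_t)((int64_t)target - next);
}
```

With the addresses from table 4.7.2, jmp_rel(0, 2, 0) is -2 (encoded as fe), jmp_rel(2, 2, 6) is 2, jmp_rel(4, 2, 0) is -6 (encoded as fa), and jmp_rel(6, 3, 0x1234) is 0x122b.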
ff 26 34 12
Since this is 16-bit code, we use table 4.5.1. Looking up the table,
ModR/M value 26 means disp16, which means a 16-bit offset from the
start of the current index (see the note under the table), which is the
base address stored in the DS register.
is generated into:
67 ff 28
Since 28 is the value in the 5th column of table 4.5.2 that refers
to [eax], we successfully generate an instruction for a far jump.
(Remember, the prefix 67 indicates that the instruction is used as
32-bit. The prefix is only added if the default environment is assumed
to be 16-bit when an assembler generates the code.) After the
CPU runs the instruction, the program counter eip and code segment
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
eax
0x00001000 1000 00 00 34 12 78 56
cs
0x00001234
jmp 0x1234:0x5678
is generated into:
ea 78 56 34 12
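To see why the bytes come out in that order, the sketch below assembles the direct far jump by hand. It assumes 16-bit code, where the opcode ea is followed by a 16-bit offset and then a 16-bit segment, both stored little-endian; encode_jmp_far16 is a hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "jmp segment:offset" for 16-bit code into out[0..4]:
 * opcode, offset (low byte first), segment (low byte first).
 */
void encode_jmp_far16(uint16_t segment, uint16_t offset, uint8_t out[5])
{
    assert(out != NULL);
    out[0] = 0xea;
    out[1] = (uint8_t)(offset & 0xff);
    out[2] = (uint8_t)(offset >> 8);
    out[3] = (uint8_t)(segment & 0xff);
    out[4] = (uint8_t)(segment >> 8);
}
```

Calling encode_jmp_far16(0x1234, 0x5678, buf) fills buf with ea 78 56 34 12, the same bytes as the generated instruction.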
Note: zero bytes are hidden with three dot symbols: ... To show
all the zero bytes, we add -z option.
The most basic types that x86 architecture works with are based on
sizes, each is twice as large as the previous one: 1 byte (8 bits), 2
bytes (16 bits), 4 bytes (32 bits), 8 bytes (64 bits) and 16 bytes (128
bits).
These types are the simplest: they are just chunks of memory of
different sizes that enable the CPU to access memory efficiently. From the
Source
#include <stdint.h>
Assembly
0804a018 <byte>:
804a018: 12 00 adc al,BYTE PTR [eax]
0804a01a <word>:
804a01a: 34 12 xor al,0x12
0804a01c <dword>:
804a01c: 78 56 js 804a074 <_end+0x48>
804a01e: 34 12 xor al,0x12
0804a020 <qword>:
804a020: ef out dx,eax
804a021: cd ab int 0xab
804a023: 89 67 45 mov DWORD PTR [edi+0x45],esp
804a026: 23 01 and eax,DWORD PTR [ecx]
0000000000601040 <dqword1>:
601040: ef out dx,eax
601041: cd ab int 0xab
601043: 89 67 45 mov DWORD PTR [rdi+0x45],esp
601046: 23 01 and eax,DWORD PTR [rcx]
601048: 00 00 add BYTE PTR [rax],al
60104a: 00 00 add BYTE PTR [rax],al
60104c: 00 00 add BYTE PTR [rax],al
60104e: 00 00 add BYTE PTR [rax],al
0000000000601050 <dqword2>:
601050: 00 00 add BYTE PTR [rax],al
601052: 00 00 add BYTE PTR [rax],al
601054: 00 00 add BYTE PTR [rax],al
601056: 00 00 add BYTE PTR [rax],al
601058: ef out dx,eax
601059: cd ab int 0xab
60105b: 89 67 45 mov DWORD PTR [rdi+0x45],esp
60105e: 23 01 and eax,DWORD PTR [rcx]
same color. Since this is the data section, the assembly listing carries no
meaning. When byte is declared with uint8_t, gcc guarantees that
the size of byte is always 1 byte. But an alert reader might notice the
00 value next to the 12 value in the byte variable. This is normal, as
gcc avoids memory misalignment by adding extra padding bytes. To
make it easier to see, we look at the readelf output of the .data section:
the output is (the colors mark which values belong to which variables):
In all the examples above, when the value of a variable with a smaller
size is assigned to a variable with a larger size, the value easily fits in
the larger variable. Conversely, when the value of a variable with a larger
size is assigned to a variable with a smaller size, two scenarios occur:
The value is greater than the maximum value of the variable with the
smaller layout, so it is truncated to the size of that variable,
producing an incorrect value.
The value is no greater than the maximum value of the variable with the
smaller layout, so it fits the variable.
However, since the value might be unknown until runtime, it is best not
to leave such implicit conversions to the compiler, but to have them
explicitly controlled by the programmer. Otherwise, they will cause
subtle bugs that are hard to catch, as the erroneous values might occur
too rarely to reproduce the bugs.
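One possible way to keep such conversions under the programmer's control is to check the range before narrowing. The sketch below contrasts a checked conversion with plain truncation; narrow_u32_to_u8 and truncate_u32_to_u8 are hypothetical helper names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store value into *out only if it fits in 8 bits; report whether it did. */
bool narrow_u32_to_u8(uint32_t value, uint8_t *out)
{
    assert(out != NULL);
    if (value > UINT8_MAX)
        return false; /* would be truncated: let the caller decide */
    *out = (uint8_t)value;
    return true;
}

/* What an unchecked conversion does: keep only the low 8 bits. */
uint8_t truncate_u32_to_u8(uint32_t value)
{
    return (uint8_t)(value & 0xff);
}
```

For instance, truncate_u32_to_u8(0x12345678) silently produces 0x78, while narrow_u32_to_u8(0x1234, &b) returns false and leaves b untouched.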
Pointers are variables that hold memory addresses. x86 works with 2
types of pointers:
Far pointer is also an offset like a near pointer, but with an explicit
segment selector.
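[Figure: pointer data types. A near pointer is a 32-bit offset (bits 31-0). A far pointer is a segment selector (bits 47-32) followed by a 32-bit offset (bits 31-0).]
Figure 4.8.2: Numeric Data Types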
Source
#include <stdint.h>
int8_t i = 0;
int8_t *p1 = (int8_t *) 0x1234;
int8_t *p2 = &i;
Assembly
0000000000601030 <p1>:
601030: 34 12 xor al,0x12
601032: 00 00 add BYTE PTR [rax],al
601034: 00 00 add BYTE PTR [rax],al
601036: 00 00 add BYTE PTR [rax],al
0000000000601038 <p2>:
601038: 41 10 60 00 adc BYTE PTR [r8+0x0],spl
60103c: 00 00 add BYTE PTR [rax],al
60103e: 00 00 add BYTE PTR [rax],al
Disassembly of section .bss:
0000000000601040 <__bss_start>:
601040: 00 00 add BYTE PTR [rax],al
0000000000601041 <i>:
601041: 00 00 add BYTE PTR [rax],al
601043: 00 00 add BYTE PTR [rax],al
601045: 00 00 add BYTE PTR [rax],al
601047: 00 .byte 0x0
The pointer p1 holds a direct address with the value 0x1234. The
pointer p2 holds the address of the variable i. Note that both the
pointers are 8 bytes in size (or 4-byte, if 32-bit).
A bit field is a contiguous sequence of bits. Bit fields allow data
structuring at the bit level. For example, a 32-bit value can hold multiple bit
fields that represent multiple different pieces of information, such
as bits 0-4 specifying the size of a data structure, bits 5-6 specifying
permissions and so on. Data structures at the bit level are common in
low-level programming.
Least Significant Bit
Source
struct bit_field {
int data1:8;
int data2:8;
int data3:8;
int data4:8;
};
struct bit_field2 {
int data1:8;
int data2:8;
int data3:8;
int data4:8;
char data5:4;
};
struct normal_struct {
int data1;
int data2;
int data3;
int data4;
};
struct normal_struct ns = {
.data1 = 0x12345678,
.data2 = 0x9abcdef0,
.data3 = 0x12345678,
.data4 = 0x9abcdef0,
};
int i = 0x12345678;
struct bit_field bf = {
.data1 = 0x12,
.data2 = 0x34,
.data3 = 0x56,
.data4 = 0x78
};
struct bit_field2 bf2 = {
.data1 = 0x12,
.data2 = 0x34,
.data3 = 0x56,
.data4 = 0x78,
.data5 = 0xf
};
Assembly
Each variable and its value are given a unique color in the assembly
listing below:
0804a018 <ns>:
804a018: 78 56 js 804a070 <_end+0x34>
804a01a: 34 12 xor al,0x12
804a01c: f0 de bc 9a 78 56 34 lock fidivr WORD PTR [edx+ebx*4+0x12345678]
804a023: 12
804a024: f0 de bc 9a 78 56 34 lock fidivr WORD PTR [edx+ebx*4+0x12345678]
804a02b: 12
0804a028 <i>:
804a028: 78 56 js 804a080 <_end+0x44>
804a02a: 34 12 xor al,0x12
0804a02c <bf>:
804a02c: 12 34 56 adc dh,BYTE PTR [esi+edx*2]
804a02f: 78 12 js 804a043 <_end+0x7>
0804a030 <bf2>:
804a030: 12 34 56 adc dh,BYTE PTR [esi+edx*2]
804a033: 78 0f js 804a044 <_end+0x8>
804a035: 00 00 add BYTE PTR [eax],al
804a037: 00 .byte 0x0
The sample code creates 4 variables: ns, i, bf, bf2. The definitions
of the normal_struct and bit_field structs both specify 4 integers.
bit_field specifies additional information next to each member name,
separated by a colon, e.g. .data1 : 8. This extra information is
the bit width of each bit group. It means, even though defined as an
If the new data members fit within the remaining bits after .data1,
which are 24 bits (since .data1 is declared as an int, 32 bits are still
allocated, but .data1 can only access 8 bits of information), then the
total size of the bit_field struct is still 4 bytes, or 32 bits.
If the new data members don't fit, then the remaining 24 bits (3
bytes) are still allocated. However, the new data members are
allocated brand new storage, without using the previous 24 bits.
the means of a pointer, or casting to another data type that can fully
access all 4 bytes.
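The allocation rule can be checked with sizeof. The sketch below is only an illustration under the assumption of gcc on x86, where int is 4 bytes; the exact layout of bit fields is implementation-defined, and the struct names one_field, fits_in_unit and needs_new_unit, as well as the helper storage_units, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One 8-bit field: a whole int (32 bits) is still allocated. */
struct one_field { int data1:8; };

/* The extra 24-bit field fits in the bits left over after data1. */
struct fits_in_unit { int data1:8; int data2:24; };

/* A 32-bit field does not fit in the remaining 24 bits. */
struct needs_new_unit { int data1:8; int data5:32; };

/* Number of int-sized storage units a struct of the given size occupies. */
size_t storage_units(size_t struct_size)
{
    assert(struct_size % sizeof(int) == 0);
    return struct_size / sizeof(int);
}
```

Under that assumption, storage_units reports 1 unit for one_field and fits_in_unit, and 2 units for needs_new_unit, because data5 starts in brand new storage.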
struct bit_field {
int data1:8;
};
struct bit_field bf = {
.data1 = 0x1234,
};
struct bit_field2 {
int data1:8;
int data5:32;
};
Source
#include <stdint.h>
Assembly
0804a018 <a8>:
804a018: 12 34 00 adc dh,BYTE PTR [eax+eax*1]
804a01b: 00 34 12 add BYTE PTR [edx+edx*1],dh
0804a01c <a16>:
804a01c: 34 12 xor al,0x12
804a01e: 78 56 js 804a076 <_end+0x3a>
0804a020 <a32>:
804a020: 78 56 js 804a078 <_end+0x3c>
804a022: 34 12 xor al,0x12
804a024: f0 de bc 9a f0 de bc lock fidivr WORD PTR [edx+ebx*4-0x65432110]
804a02b: 9a
0804a028 <a64>:
804a028: f0 de bc 9a 78 56 34 lock fidivr WORD PTR [edx+ebx*4+0x12345678]
804a02f: 12
804a030: f0 de bc 9a 78 56 34 lock fidivr WORD PTR [edx+ebx*4+0x12345678]
804a037: 12
Finally, there is a64, also with 2 elements, but 8 bytes each. The total
size of a64 is 16 bytes, which is naturally aligned, so no
padding bytes are added. The values of both a64[0] and a64[1] are the
same: f0 de bc 9a 78 56 34 12, which got misinterpreted as a fidivr
instruction.
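The sizes discussed here can be confirmed from C itself. The following sketch declares arrays whose initial values are inferred from the byte dump above (the original source listing is not shown, so treat them as an assumption); ARRAY_LEN and element_offset are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

uint8_t  a8[2]  = {0x12, 0x34};
uint16_t a16[2] = {0x1234, 0x5678};
uint32_t a32[2] = {0x12345678, 0x9abcdef0};
uint64_t a64[2] = {0x123456789abcdef0ULL, 0x123456789abcdef0ULL};

/* Byte offset of element index in an array of elem_size-byte elements. */
size_t element_offset(size_t index, size_t elem_size)
{
    assert(elem_size > 0);
    return index * elem_size;
}
```

Each array has 2 elements, so sizeof reports 2, 4, 8 and 16 bytes respectively, and element_offset(1, sizeof a64[0]) is 8, the distance between a64[0] and a64[1].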
Source
#include <stdint.h>
uint8_t a2[2][2] = {
{0x12, 0x34},
{0x56, 0x78}
};
uint8_t a3[2][2][2] = {
{{0x12, 0x34},
{0x56, 0x78}},
{{0x9a, 0xbc},
{0xde, 0xff}},
};
Assembly
0804a018 <a2>:
804a018: 12 34 56 adc dh,BYTE PTR [esi+edx*2]
804a01b: 78 12 js 804a02f <_end+0x7>
0804a01c <a3>:
804a01c: 12 34 56 adc dh,BYTE PTR [esi+edx*2]
804a01f: 78 9a js 8049fbb <_DYNAMIC+0xa7>
804a021: bc .byte 0xbc
804a022: de ff fdivrp st(7),st
char names[2][10] = {
"JohnDoe",
"JaneDoe"
};
This section explores how a compiler transforms high level code into
assembly code that the CPU can execute, and shows how common assembly
patterns help to create higher level syntax. The -S option is added to
objdump to better demonstrate the connection between high and low
level code.

The previous section explored how various types of data are created and
how they are laid out in memory. Once memory storage is allocated
for variables, it must be readable and writable. Data transfer
instructions move data (bytes, words, doublewords or quadwords)
between memory and registers, and between registers, effectively reading
from one storage location and writing to another.
Source
#include <stdint.h>
int32_t i = 0x12345678;
int main(int argc, char *argv[]) {
int j = i;
int k = 0xabcdef;
return 0;
}
Assembly
080483db <main>:
#include <stdint.h>
int32_t i = 0x12345678;
int main(int argc, char *argv[]) {
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
int j = i;
80483e1: mov eax,ds:0x804a018
80483e6: mov DWORD PTR [ebp-0x8],eax
int k = 0xabcdef;
80483e9: mov DWORD PTR [ebp-0x4],0xabcdef
return 0;
80483f0: mov eax,0x0
}
80483f5: leave
80483f6: ret
80483f7: xchg ax,ax
80483f9: xchg ax,ax
80483fb: xchg ax,ax
80483fd: xchg ax,ax
80483ff: nop
The red instruction copies data from the register esp to the register
ebp. This mov instruction moves data between registers and is
assigned the opcode 89.
The blue instructions copy data from one memory location (the i
variable) to another (the j variable). mov cannot move data directly
from memory to memory; the copy requires two mov instructions, one for
copying the data from a memory location to a register, and one for
copying the data from the register to the destination memory location.
4.9.2 Expressions
Source
int sub = i - j;
int mul = i * j;
int div = i / j;
int mod = i % j;
int neg = -i;
int and = i & j;
int or = i | j;
int xor = i ^ j;
int not = ~i;
int shl = i << 8;
int shr = i >> 8;
char equal1 = (i == j);
int equal2 = (i == j);
char greater = (i > j);
char less = (i < j);
char greater_equal = (i >= j);
char less_equal = (i <= j);
int logical_and = i && j;
int logical_or = i || j;
++i;
--i;
int i1 = i++;
int i2 = ++i;
int i3 = i--;
int i4 = --i;
return 0;
}
Assembly
The full assembly listing is really long. For that reason, we examine it
expression by expression.
Expression: int or = i | j;
shl (shift logical left) shifts the bits in the destination operand
to the left by the number of bits specified in the source operand.
In this case, eax stores i and shl shifts eax by 8 bits to the left.
A different name for shl is sal (shift arithmetic left). The two names
can be used interchangeably. Finally, the result is stored in the variable
shl at [ebp-0x14].
Here is a visual demonstration of shl/sal and shr instructions:
X 10001000100010001000100010001111 10001000100010001000100010001111 X
1 00010001000100010001000100011110 0 0 01000100010001000100010001000111 1
0 01000100010001000111100000000000 0 0 00000000001000100010001000100010 0
(a) SHL/SAL (Source: Figure 7-6, Volume 1) (b) SHR (Source: Figure 7-7, Volume 1)
sar is similar to shl/sal, but shifts bits to the right and extends
the sign bit. For right shifts, shr and sar are two different
instructions: shr differs from sar in that it does not extend
the sign bit. Finally, the result is stored in the variable shr at
[ebp-0x10].
In figure 4.9.1(b), notice that initially the sign bit is 1, but
after the 1-bit and 10-bit shifts, the vacated bits are filled with
zeros.
With sar, the sign bit (the most significant bit) is preserved.
That is, if the sign bit is 0, the new bits always get the value 0; if
the sign bit is 1, the new bits always get the value 1.
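The difference between the two right shifts can be modelled in portable C on unsigned 32-bit values. This is a sketch of the behaviour only, not how a compiler emits the instructions; shl32, shr32 and sar32 are hypothetical helper names, and the shift count is assumed to be below 32.

```c
#include <assert.h>
#include <stdint.h>

/* Logical/arithmetic left shift: vacated low bits become 0. */
uint32_t shl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x << n;
}

/* Logical right shift: vacated high bits become 0. */
uint32_t shr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x >> n;
}

/* Arithmetic right shift: vacated high bits copy the sign bit. */
uint32_t sar32(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) != 0 && n > 0)
        shifted |= ~(UINT32_MAX >> n); /* set the top n bits to 1 */
    return shifted;
}
```

Taking the bit pattern 10001000100010001000100010001111 (0x8888888f) from the figures, a 1-bit shr32 gives 0x44444447 with a cleared top bit, while a 1-bit sar32 gives 0xc4444447 with the sign bit preserved.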
00100010001000100010001000100011 1
11000100010001000100010001000111 X
11100010001000100010001000100011 1
4.9.3 Stack
The push instruction and its variants add a new element on top of the
stack.
The pop instruction and its variants remove the top-most element from
the stack.
void foo() {
int a;
int b;
}
int foo() {
int i;
{
int a = 1;
int b = 2;
{
return i = a + b;
}
}
}
a and b are local to the scope where they are defined and are also
visible in its inner child scope, which executes return i = a + b. However,
they do not exist at the function scope that creates i.
to access a variable:
All local variables are allocated after the ebp pointer, at lower
addresses. Thus, to access a local variable, a number is subtracted from
ebp to reach the location of the variable.
The ebp pointer itself points to the saved ebp of the caller; the return
address sits right above it, at ebp+0x4.
L = Local Variable
Source
int add(int a, int b) {
int i = a + b;
return i;
}
Assembly
080483db <add>:
#include <stdint.h>
int add(int a, int b) {
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
int i = a + b;
80483e1: mov edx,DWORD PTR [ebp+0x8]
80483e4: mov eax,DWORD PTR [ebp+0xc]
80483e7: add eax,edx
80483e9: mov DWORD PTR [ebp-0x4],eax
return i;
80483ec: mov eax,DWORD PTR [ebp-0x4]
}
80483ef: leave
80483f0: ret
[ebp+0x8] accesses a.
[ebp+0xc] accesses b.
For accessing arguments, the rule is that the closer an argument on the stack
is to ebp, the closer it is to the function name in the source code.
[Figure: stack memory at 0xffe0, marking ebp+0x4, ebp+0x8 and the local variable i]
N = Next local variable starts here
Figure 4.9.6: Function arguments and local variables in memory
From the figure, we can see that a and b are laid out in memory
with the exact order as written in C, relative to the return address.
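To make the ebp-relative addressing concrete, the sketch below simulates the stack of the add function in a small array. It is a toy model under simplifying assumptions: a 256-byte stack, 4-byte slots, and the constant 0xdeadbeef standing in for the return address. All names (stack_mem, slot, push, pop, callee_add, simulate_call_add, current_esp) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256

static uint32_t stack_mem[STACK_BYTES / 4];
static uint32_t esp = STACK_BYTES; /* the stack grows down from the top */
static uint32_t ebp = 0;

/* Return the 4-byte slot at a simulated stack address. */
static uint32_t *slot(uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &stack_mem[addr / 4];
}

static void push(uint32_t value) { assert(esp >= 4); esp -= 4; *slot(esp) = value; }
static uint32_t pop(void) { uint32_t v = *slot(esp); esp += 4; return v; }

/* Mimics add(): arguments live above ebp, the local i lives below it. */
static uint32_t callee_add(void)
{
    push(ebp);                                               /* push ebp      */
    ebp = esp;                                               /* mov ebp,esp   */
    esp -= 0x10;                                             /* sub esp,0x10  */
    *slot(ebp - 0x4) = *slot(ebp + 0x8) + *slot(ebp + 0xc);  /* i = a + b     */
    uint32_t result = *slot(ebp - 0x4);                      /* return i      */
    esp = ebp;                                               /* leave         */
    ebp = pop();
    return result;
}

/* Mimics the caller: push arguments right to left, call, then clean up. */
uint32_t simulate_call_add(uint32_t a, uint32_t b)
{
    push(b);
    push(a);
    push(0xdeadbeef);   /* stand-in for the return address pushed by call */
    uint32_t result = callee_add();
    (void)pop();        /* ret pops the return address */
    esp += 8;           /* add esp,0x8 */
    return result;
}

uint32_t current_esp(void) { return esp; }
```

Calling simulate_call_add(1, 2) returns 3, and afterwards current_esp() is back at 256, showing that the stack is fully restored once the call completes.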
Source
#include <stdio.h>
int add(int a, int b) {
int local = 0x12345;
return a + b;
}
int main(int argc, char *argv[]) {
add(1,2);
return 0;
}
Assembly
080483f2 <main>:
int main(int argc, char *argv[]) {
80483f2: push ebp
80483f3: mov ebp,esp
add(1,2);
80483f5: push 0x2
80483f7: push 0x1
80483f9: call 80483db <add>
80483fe: add esp,0x8
return 0;
8048401: mov eax,0x0
}
8048406: leave
8048407: ret
Upon finishing the call to add function, the stack is restored by adding
0x8 to stack pointer esp (which is equivalent to 2 pop instructions).
Finally, a leave instruction is executed and main returns with a ret
instruction. A ret instruction transfers the program execution back to
the caller to the instruction right after the call instruction, the add
instruction. The reason ret can return to such location is that the
080483db <add>:
#include <stdio.h>
int add(int a, int b) {
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
int local = 0x12345;
80483e1: mov DWORD PTR [ebp-0x4],0x12345
return a + b;
80483e8: mov edx,DWORD PTR [ebp+0x8]
80483eb: mov eax,DWORD PTR [ebp+0xc]
80483ee: add eax,edx
}
80483f0: leave
80483f1: ret
Exercise 4.9.3. The above code that gcc generated for function
calling is actually the standard method that x86 defines. Read chapter 6,
Procedure Calls, Interrupts, and Exceptions, Intel manual volume 1.
4.9.6 Loop
Source
#include <stdio.h>
int main(int argc, char *argv[]) {
for (int i = 0; i < 10; i++) {
}
return 0;
}
Assembly
080483db <main>:
#include <stdio.h>
int main(int argc, char *argv[]) {
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
for (int i = 0; i < 10; i++) {
80483e1: mov DWORD PTR [ebp-0x4],0x0
80483e8: jmp 80483ee <main+0x13>
80483ea: add DWORD PTR [ebp-0x4],0x1
80483ee: cmp DWORD PTR [ebp-0x4],0x9
80483f2: jle 80483ea <main+0xf>
}
return 0;
80483f4: b8 00 00 00 00 mov eax,0x0
}
80483f9: c9 leave
80483fa: c3 ret
80483fb: 66 90 xchg ax,ax
80483fd: 66 90 xchg ax,ax
80483ff: 90 nop
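The shape of the generated code (initialize, jump to the condition, loop back while it holds) can be written back in C with goto. This is only a sketch of the control flow in the listing above; the function name count_loop_iterations and the iterations counter are additions made here so that the result can be observed.

```c
#include <assert.h>

/* The for loop rewritten the way the assembly executes it. */
int count_loop_iterations(void)
{
    int iterations = 0;
    int i;

    i = 0;              /* mov DWORD PTR [ebp-0x4],0x0 */
    goto condition;     /* jmp to the comparison       */
body:
    iterations++;       /* loop body (empty in the original program) */
    i += 1;             /* add DWORD PTR [ebp-0x4],0x1 */
condition:
    if (i <= 9)         /* cmp DWORD PTR [ebp-0x4],0x9 */
        goto body;      /* jle back to the body        */

    assert(i == 10);    /* the loop exits once i reaches 10 */
    return iterations;
}
```

Note how the condition i < 10 appears in the assembly as a comparison against 9 with jle, which is equivalent for integers; count_loop_iterations() returns 10.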
4.9.7 Conditional
Source
#include <stdio.h>
if (argc) {
i = 1;
} else {
i = 0;
}
return 0;
}
Assembly
The generated assembly code follows the same order as the corre-
sponding high level syntax:
green instruction is the exit point for both if and else branch.
Every program consists of code and data, and only those two components
make up a program. However, if a program consists purely of
code and data of its own, then from the perspective of an operating system
(as well as a human), it is impossible to know which block of
binary in the program is code and which is just raw data, where in the program
to start execution, which region of memory should be protected and
which is free to modify. For that reason, each program carries extra
metadata to tell the operating system how to handle
the program.
ELF lists various sections used for code and data, and the memory
addresses of each symbol along with other information.
An ELF header: the very first section of an executable that
describes the file's organization.
A program header table: an array of fixed-size structures that
describes segments of an executable.
A section header table: an array of fixed-size structures that
describes sections of an executable.
Segments and sections are the main content of an ELF binary,
which are the code and data, divided into chunks of different
purposes.
[Figure: layout of an ELF file, with sections such as .text, .rodata and .data (source: Wikipedia)]
$ man elf
$ readelf -h hello
The output:
Magic
Output Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Byte Description
Class
Possible values:
Value Description
0 Invalid class
1 32-bit objects
2 64-bit objects
Data
Possible values:
Value Description
Version
Possible values:
Value Description
0 Invalid version
1 Current version
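The first bytes of the Magic line can be interpreted with a few lines of C. The sketch below is an illustration, assuming the standard positions of the identification bytes (class at index 4, data encoding at index 5, version at index 6); the names IDENT_CLASS, IDENT_DATA, IDENT_VERSION, sample_ident, has_elf_magic and elf_class_name are hypothetical, and a real program on Linux would more likely rely on the definitions in elf.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Positions of the fields that follow the 4 magic bytes. */
enum { IDENT_CLASS = 4, IDENT_DATA = 5, IDENT_VERSION = 6 };

/* The Magic bytes from the sample readelf output. */
const uint8_t sample_ident[16] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* True if the buffer starts with 0x7f followed by "ELF". */
bool has_elf_magic(const uint8_t *ident, size_t len)
{
    assert(ident != NULL);
    return len >= 4 && memcmp(ident, "\x7f" "ELF", 4) == 0;
}

/* Describe the Class byte using the values listed in the table. */
const char *elf_class_name(uint8_t cls)
{
    switch (cls) {
    case 1:  return "32-bit objects";
    case 2:  return "64-bit objects";
    default: return "Invalid class";
    }
}
```

For the sample bytes, has_elf_magic succeeds, the class byte 02 maps to "64-bit objects", and the version byte is 1, the current version.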
OS/ABI
Type
0 No file type
1 Relocatable file
2 Executable file
3 Shared object file
4 Core file
0xff00 Processor specific, lower bound
0xffff Processor specific, upper bound
The values from 0xff00 to 0xffff are reserved for a processor to
define additional file types meaningful to it.
Machine
Specifies the required architecture value for an ELF file e.g. x86_64,
MIPS, SPARC, etc. In the example, the machine is of x86_64
architecture.
Version
Specifies the version number of the current object file (not the
version of the ELF header, as the above Version field specified).
Specifies the memory address of the very first code to be
executed. The address of the main function is the default in a normal
application program, but it can be any function by explicitly
specifying the function name to gcc. For the operating system we are
going to write, this is the single most important field that we need
to retrieve to bootstrap our kernel, and everything else can be
ignored.
The offset of the section header table in bytes, similar to the start
of program headers. In the example, it is 6648 bytes into file.
Flags
Holds processor-specific flags associated with the file. The x86
architecture defines no such flags, so in the example the value is
0x0.
Specifies the total size of the ELF header in bytes. In the example,
it is 64 bytes, which is equal to Start of program headers.
Note that these two numbers are not necessarily equal, as the
program header table might be placed far away from the ELF header.
The only fixed component in the ELF executable binary is the ELF
header, which appears at the very beginning of the file.
Specifies the index of the header in the section header table that
points to the section that holds all null-terminated strings. In the
example, the index is 28, which means it's the entry at index 28 of the
table.
Every section in an object file has exactly one section header de-
scribing it. But, section headers may exist that do not have a
section.
An object file may have inactive space. The various headers and the
sections might not cover every byte in an object file. The contents
of the inactive data are unspecified.
To get all the headers from an executable binary e.g. hello, use the
following command:
$ readelf -S hello
Here is a sample output (do not worry if you don't understand the
output. Just skim to get your eyes familiar with it. We will dissect it
soon enough):
summarizes the total number of sections in the file and where
they start. Then comes the listing, section by section,
with the following header, which is also the format of each section output:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
Type This field (in a section header) identifies the type of each section.
Types classify sections (similar to how types in programming languages
are used by a compiler).
Address The starting virtual address of each section. Note that the
addresses are virtual only when a program runs in an OS with
support for virtual memory enabled. In our OS, since we run on
bare metal, the addresses will all be physical.
Offset The offset of each section into a file. An offset is a distance in
bytes, from the first byte of a file to the start of an object, such as a
section or a segment in the context of an ELF binary file.
Flag Descriptions
M The data in the section may be merged to eliminate duplication. Each element in the
section is compared against other elements in sections with the same name, type and flags.
Elements that would have identical values at program run-time may be merged.
S The data elements in the section consist of null-terminated character strings. The size of
each character is specified in the section header's EntSize field.
l Specific large section for x86_64 architecture. This flag is not specified in the Generic
ABI but in x86_64 ABI.
I The Info field of this section header holds an index of a section header. Otherwise, the
number is the index of something else.
L Preserve section ordering when linking. If this section is combined with other sections in
the output file, it must appear in the same relative order with respect to those sections, as
the linked-to section appears with respect to sections the linked-to section is combined
with. This applies when the Link field of this section's header references another section (the
linked-to section).
G This section is a member (perhaps the only one) of a section group.
T This section holds Thread-Local Storage, meaning that each thread has its own distinct
instance of this data. A thread is a distinct execution flow of code. A program can have
multiple threads that pack different pieces of code and execute separately, at the same
time. We will learn more about threads when writing our kernel.
E Link editor is to exclude this section from executable and shared library that it builds
when those objects are not to be further relocated.
x Unknown flag to readelf. It happens because the linking process can be done manually
with a linker like GNU ld (we will learn about it later). That is, section flags can be specified
manually, and some flags are for a customized ELF that the open-source readelf doesn't
know of.
O This section requires special OS-specific processing (beyond the standard linking rules) to
avoid incorrect behavior. A link editor encounters sections whose headers contain
OS-specific values it does not recognize by Type or Flags values defined by ELF standard,
the link editor should combine those sections.
o All bits included in this flag are reserved for operating system-specific semantics.
p All bits included in this flag are reserved for processor-specific semantics. If meanings are
specified, the processor supplement explains them.
Link and Info are numbers that reference the indexes of sections,
symbol table entries, hash table entries. Link field holds the index
Later when writing our OS, we will handcraft the kernel image
by explicitly linking the object files (produced by gcc) through
a linker script. We will specify the memory layout of sections by
specifying at what addresses they will appear in the final image.
But we will not assign any section flag and let the linker take care
of it. Nevertheless, knowing which flag does what is useful.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
Nr is 1.
EntSize is 0, which means this section does not have any fixed-size
entry.
Info and Link are 0 and 0, which means this section links to no
section or entry in any table.
Output
[14] .text PROGBITS 00000000004003e0 000003e0
0000000000000192 0000000000000000 AX 0 0 16
Nr is 14.
EntSize is 0, which means this section does not have any fixed-size
entry.
Info and Link are 0 and 0, which means this section links to no
section or entry in any table.
Align is 16, which means the starting address of the section should
be divisible by 16, or 0x10. Indeed, it is: 0x3e0/0x10 = 0x3e.
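The divisibility check behind the Align field is easy to express in code. The sketch below assumes, as is the case for the values in this output, that a non-trivial alignment is a power of two, and it treats 0 and 1 as "no constraint"; is_aligned is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if address satisfies the alignment constraint align. */
bool is_aligned(uint64_t address, uint64_t align)
{
    if (align <= 1)
        return true; /* 0 or 1: no alignment constraint */
    assert((align & (align - 1)) == 0); /* must be a power of two */
    return (address & (align - 1)) == 0;
}
```

For the .text section, is_aligned(0x4003e0, 16) is true; the .note.ABI-tag address 0x400254 is aligned to 4 but would not satisfy an alignment of 16.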
In this section, we will learn different details of section types and the
purposes of special sections e.g. .bss, .text, .data... by looking
at each section one by one. We will also examine the content of each
section as a hexdump with the commands:
$ readelf -x 25 hello
NULL marks a section header as inactive and does not have an associ-
ated section. NULL section is always the first entry of section header
table. It means, any useful section starts from 1.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
$ readelf -x 2 hello
we have:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
...
[11] .init PROGBITS 0000000000400390 00000390
000000000000001a 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000004003b0 000003b0
0000000000000020 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 00000000004003d0 000003d0
0000000000000008 0000000000000000 AX 0 0 8
[14] .text PROGBITS 00000000004003e0 000003e0
0000000000000192 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 0000000000400574 00000574
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 0000000000400580 00000580
0000000000000004 0000000000000004 AM 0 0 4
[17] .eh_frame_hdr PROGBITS 0000000000400584 00000584
000000000000003c 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 00000000004005c0 000005c0
0000000000000114 0000000000000000 A 0 0 8
...
[23] .got PROGBITS 0000000000600ff8 00000ff8
0000000000000008 0000000000000008 WA 0 0 8
[24] .got.plt PROGBITS 0000000000601000 00001000
0000000000000020 0000000000000008 WA 0 0 8
[25] .data PROGBITS 0000000000601020 00001020
0000000000000010 0000000000000000 WA 0 0 8
[27] .comment PROGBITS 0000000000000000 00001030
0000000000000034 0000000000000001 MS 0 0 1
.text
This section holds all the compiled code of a program.
.data
This section holds the initialized data of a program. Since the
data are initialized with actual values, gcc allocates the section
with actual bytes in the executable binary.
.rodata
This section holds read-only data, such as fixed-size strings in a
program, e.g. Hello World, and others.
.bss
This section, short for Block Started by Symbol, holds uninitialized
data of a program. Unlike other sections, no space is
allocated for this section in the image of the executable binary
on disk. The section is allocated only when the program is
loaded into main memory.
Other sections are mainly needed for dynamic linking, that is, code
linking at runtime for sharing between many programs. To enable
such a feature, an OS as a runtime environment must be present.
Since we run our OS on bare metal, we are effectively creating such an
environment. For simplicity, we won't add dynamic linking to our
OS.
SYMTAB and DYNSYM These sections hold symbol tables. A symbol table
is an array of entries that describe symbols in a program. A symbol
is a name assigned to an entity in a program. The types of these
entities are also the types of symbols, and these are the possible
types of an entity:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 5] .dynsym DYNSYM 00000000004002b8 000002b8
0000000000000048 0000000000000018 A 6 1 8
...
[29] .symtab SYMTAB 0000000000000000 00001068
0000000000000648 0000000000000018 30 47 8
$ readelf -s hello
LOCAL are symbols that are only visible in the object files
that defined them. In C, the static modifier marks a symbol
(e.g. a variable/function) as local to only the file that defines
it.
Example 5.4.5. If we define variables and functions with
the static modifier:
hello.c
return 0;
}
hello.c
#include <stdio.h>
$ ./hello
warning: function is not implemented.
add(1,2) is 0
math.c
Value Description
HIDDEN A symbol is hidden when the name is not visible to any other program outside of its
running program.
PROTECTED A symbol is protected when it is shared outside of its running program or shared library
and cannot be overridden. That is, there can only be one definition for this symbol
across running programs that use it. No program can define its own definition of the
same symbol.
INTERNAL Visibility is processor-specific and is defined by processor-specific ABI.
Ndx is the index of a section that the symbol is in. Aside from fixed
index numbers that represent section indexes, index has these
special values:
Value Description
main is a function.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[28] .shstrtab STRTAB 0000000000000000 000018b6
000000000000010c 0000000000000000 0 0 1
[30] .strtab STRTAB 0000000000000000 000016b0
0000000000000206 0000000000000000 0 0 1
$ readelf -p 28 hello
The output shows all the section names, with the offset of each name
into .shstrtab (also its string index) in the left column:
[ e6] .dynamic
[ ef] .got.plt
[ f8] .data
[ fe] .bss
[ 103] .comment
          00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
00000000  \0 .  s  y  m  t  a  b  \0 .  s  t  r  t  a  b
00000010  \0 .  s  h  s  t  r  t  a  b  \0 .  i  n  t  e
.... and so on ....
Figure 5.4.1: String table in memory of .shstrtab. A red number is the
starting index of a string.
Similarly, the output of .strtab:
Output
String dump of section .strtab:
[ 1] crtstuff.c
[ c] __JCR_LIST__
[ 19] deregister_tm_clones
[ 2e] __do_global_dtors_aux
[ 44] completed.7585
[ 53] __do_global_dtors_aux_fini_array_entry
[ 7a] frame_dummy
[ 86] __frame_dummy_init_array_entry
[ a5] hello.c
[ ad] __FRAME_END__
[ bb] __JCR_END__
[ c7] __init_array_end
[ d8] _DYNAMIC
[ e1] __init_array_start
[ f4] __GNU_EH_FRAME_HDR
[ 107] _GLOBAL_OFFSET_TABLE_
[ 11d] __libc_csu_fini
[ 12d] _ITM_deregisterTMCloneTable
[ 149] j
[ 14b] _edata
[ 152] __libc_start_main@@GLIBC_2.2.5
[ 171] __data_start
[ 17e] __gmon_start__
[ 18d] __dso_handle
[ 19a] _IO_stdin_used
[ 1a9] __libc_csu_init
[ 1b9] __bss_start
[ 1c5] main
[ 1ca] _Jv_RegisterClasses
[ 1de] __TMC_END__
[ 1ea] _ITM_registerTMCloneTable
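The lookup behind these dumps is small enough to sketch in C. The following is only an illustration, not how readelf is implemented: the helper name strtab_get and the array shstrtab_head are hypothetical, and the array holds only the 32 bytes drawn in Figure 5.4.1. A string index is simply a byte offset into the table, and the name runs up to the next \0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The first 32 bytes of .shstrtab as drawn in Figure 5.4.1. The fourth
 * name is cut off in the figure, so it is left incomplete here too. */
const char shstrtab_head[] = "\0.symtab\0.strtab\0.shstrtab\0.inte";

/* Return the string that starts at index `idx` of a string table of
 * `size` bytes, or NULL when the index is out of range or the string is
 * not NUL-terminated inside the table. (Hypothetical helper.) */
const char *strtab_get(const char *table, size_t size, size_t idx)
{
    assert(table != NULL);
    if (idx >= size)
        return NULL;
    if (memchr(table + idx, '\0', size - idx) == NULL)
        return NULL;
    return table + idx;
}
```

With a table size of 32, index 0x1 yields .symtab, 0x9 yields .strtab and 0x11 yields .shstrtab, matching the figure. Index 0 is the empty string, which is why a name index of 0 means "no name".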
HASH holds a symbol hash table, which supports symbol table access.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[26] .bss NOBITS 0000000000601038 00001038
0000000000000008 0000000000000000 WA 0 0 1
In the above output, the size of the section is only 8 bytes, while
the offsets of both sections are the same, which means .bss consumes
no bytes of the executable binary on disk.
REL holds relocation entries without explicit addends. This type will
be explained in detail in section 8.1.
RELA holds relocation entries with explicit addends. This type will be
explained in detail in section 8.1.
hello.c
#include <stdio.h>
return 0;
}
hello.c
#include <stdio.h>
printf("%s\n", __FUNCTION__);
}
return 0;
}
hello.c
#include <stdio.h>
void init1() {
printf("%s\n", __FUNCTION__);
}
void init2() {
printf("%s\n", __FUNCTION__);
}
return 0;
}
hello.c
#include <stdio.h>
return 0;
}
hello.c
#include <stdio.h>
void preinit1() {
printf("%s\n", __FUNCTION__);
}
void preinit2() {
printf("%s\n", __FUNCTION__);
}
void init1() {
printf("%s\n", __FUNCTION__);
}
void init2() {
printf("%s\n", __FUNCTION__);
}
__attribute__((section(".preinit_array"))) preinit preinit_arr[2] = {preinit1, preinit2};
__attribute__((section(".init_array"))) init init_arr[2] = {init1, init2};
return 0;
}
GROUP defines a section group, which is the same section that appears
in different object files but when merged into the final executable
binary file, only one copy is kept and the rest in other object files
are discarded. This section is only relevant in C++ object files, so
we will not examine it further.
Exercise 5.4.2. Verify that the value of the Info field of a SYMTAB
section is the index of the last local symbol + 1. This means that, in
the symbol table, no local symbol appears from the index listed in the
Info field onward.
Exercise 5.4.3. Verify that the value of the Link field of a REL section
is the index of the SYMTAB section.
Exercise 5.4.4. Verify that the value of the Info field of a REL section
is the index of the section where relocation is applied. For example, if
the section is .rel.text, then the relocating section should be .text.
PHDR specifies the location and size of the program header table itself,
both in the file and in the memory image of the program.
LOAD specifies a loadable segment. That is, this segment is loaded into
main memory.
A segment also has a permission, which is a combination of these 3
values:
Read (R)
Write (W)
Execute (E)
Table 5.5.1: Segment Permission
Permission Description
R Readable
W Writable
E Executable
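In the file itself, these three permissions are bits in the flags field of each program header. The sketch below decodes them into the letters that readelf prints. The bit values follow the ELF specification (execute is 1, write is 2, read is 4); the names SEG_E, SEG_W, SEG_R and the function segment_perm_string are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Permission bits of the flags field of a program header. The values are
 * those of PF_X, PF_W and PF_R in the ELF specification; the names are
 * local to this sketch. */
#define SEG_E 0x1u
#define SEG_W 0x2u
#define SEG_R 0x4u

/* Write the permission letters into out, in the column order used by
 * readelf: R, W, E, with a blank for a missing permission. */
void segment_perm_string(uint32_t flags, char out[4])
{
    assert(out != NULL);
    out[0] = (flags & SEG_R) ? 'R' : ' ';
    out[1] = (flags & SEG_W) ? 'W' : ' ';
    out[2] = (flags & SEG_E) ? 'E' : ' ';
    out[3] = '\0';
}
```

A flags value of 5 gives "R E" and a value of 6 gives "RW ", which are the two combinations that LOAD segments usually carry.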
$ readelf -l hello
Output
Elf file type is EXEC (Executable file)
Entry point 0x400430
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000070c 0x000000000000070c R E 200000
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000228 0x0000000000000230 RW 200000
DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000005e4 0x00000000004005e4 0x00000000004005e4
0x0000000000000034 0x0000000000000034 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x00000000000001f0 0x00000000000001f0 R 1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr
.gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini
.rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got
The upper LOAD has Read and Execute permission. This is a text
segment. A text segment contains read-only instructions and read-only
data.
The lower LOAD has Read and Write permission. This is a data
segment. It means that this segment can be read and written
to, but is not allowed to be used as executable code, for security
reasons.
To see the last point more clearly, consider an example of linking two
object files. Suppose we have two source files:
hello.c
#include <stdio.h>
and:
math.c
$ readelf -S math.o
$ readelf -l math.o
There are no program headers in this file.
$ readelf -l hello.o
There are no program headers in this file.
Only when object files are combined into a final executable binary,
sections are fully realized:
1st section address = starting segment address + section offset = 0x8048000 + 0x154 = 0x08048154
2nd section address = starting segment address + section offset = 0x8048000 + 0x168 = 0x08048168
Indeed, the end address of a segment is also the end address of the
final section. We can see this by listing all the segments:
$ readelf -l hello
Output
Elf file type is EXEC (Executable file)
Entry point 0x8048310
There are 9 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
INTERP 0x000154 0x08048154 0x08048154 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x005fc 0x005fc R E 0x1000
LOAD 0x000f08 0x08049f08 0x08049f08 0x00114 0x00118 RW 0x1000
DYNAMIC 0x000f14 0x08049f14 0x08049f14 0x000e8 0x000e8 RW 0x4
NOTE 0x000168 0x08048168 0x08048168 0x00044 0x00044 R 0x4
GNU_EH_FRAME 0x0004dc 0x080484dc 0x080484dc 0x00034 0x00034 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x000f08 0x08049f08 0x08049f08 0x000f8 0x000f8 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr
.gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .plt.got .text .fini
.rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got
The last section in the first LOAD segment is .eh_frame. The section
starts at 0x08048510, with the offset 0x510, and its size is 0xec.
The end address of .eh_frame should be: 0x08048000 + 0x510 + 0xec = 0x080485fc,
exactly the same as the end address of the first LOAD segment
(0x08048000 + 0x5fc = 0x080485fc).
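The address arithmetic used in this section can be written down as a few C helpers. This is a minimal sketch with hypothetical function names. It assumes the general form of the rule, in which the file offset of the segment is subtracted first; for the first LOAD segment that offset is 0, so the rule reduces to "starting segment address + section offset".

```c
#include <assert.h>
#include <stdint.h>

/* Runtime address of a section that lives inside a loadable segment:
 * segment start address plus the distance of the section from the start
 * of the segment in the file. (Hypothetical helper names.) */
uint32_t section_addr(uint32_t seg_vaddr, uint32_t seg_offset,
                      uint32_t sec_offset)
{
    assert(sec_offset >= seg_offset);
    return seg_vaddr + (sec_offset - seg_offset);
}

/* End address of a section: its start address plus its size. */
uint32_t section_end(uint32_t sec_addr, uint32_t sec_size)
{
    return sec_addr + sec_size;
}

/* End address of a segment: its start address plus its size in memory. */
uint32_t segment_end(uint32_t seg_vaddr, uint32_t seg_memsiz)
{
    return seg_vaddr + seg_memsiz;
}
```

With the numbers from the readelf output, section_addr(0x08048000, 0, 0x154) is 0x08048154 for .interp, and both section_end(section_addr(0x08048000, 0, 0x510), 0xec) and segment_end(0x08048000, 0x5fc) give 0x080485fc.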
Start your program, specifying anything that might affect its behav-
ior.
hello.c
#include <stdio.h>
$ gdb hello
Output
Symbols from "/tmp/hello".
Local exec file:
/tmp/hello, file type elf32-i386.
Entry point: 0x8048310
0x08048154 - 0x08048167 is .interp
0x08048168 - 0x08048188 is .note.ABI-tag
0x08048188 - 0x080481ac is .note.gnu.build-id
0x080481ac - 0x080481cc is .gnu.hash
0x080481cc - 0x0804821c is .dynsym
0x0804821c - 0x08048266 is .dynstr
0x08048266 - 0x08048270 is .gnu.version
0x08048270 - 0x08048290 is .gnu.version_r
0x08048290 - 0x08048298 is .rel.dyn
0x08048298 - 0x080482a8 is .rel.plt
0x080482a8 - 0x080482cb is .init
0x080482d0 - 0x08048300 is .plt
0x08048300 - 0x08048308 is .plt.got
0x08048310 - 0x080484a2 is .text
0x080484a4 - 0x080484b8 is .fini
0x080484b8 - 0x080484cd is .rodata
0x080484d0 - 0x080484fc is .eh_frame_hdr
0x080484fc - 0x080485c8 is .eh_frame
0x08049f08 - 0x08049f0c is .init_array
0x08049f0c - 0x08049f10 is .fini_array
0x08049f10 - 0x08049f14 is .jcr
0x08049f14 - 0x08049ffc is .dynamic
0x08049ffc - 0x0804a000 is .got
0x0804a000 - 0x0804a014 is .got.plt
0x0804a014 - 0x0804a01c is .data
0x0804a01c - 0x0804a020 is .bss
Path of a symbol file. A symbol file is the file that contains the
debugging information. Usually, this is the same file as the binary,
The path of the program being debugged and its file type. In the
example, it is this line:
The entry point of the program being debugged, that is, the very first
code the program runs. In the example, it is this line:
The output is similar to info target, but with more details. Next
to the section names are the section flags, which are attributes of a
section. Here, we can see that the sections with LOAD flag are from
LOAD segment. The command can be combined with the section flags
for filtered outputs:
ALLOBJ displays sections for all loaded object files, including shared
libraries. Shared libraries are only displayed when the program is
already running.
LOAD Section will be loaded from the file into the child process
memory. Set for pre-initialized code and data, clear for .bss
sections.
The output:
This command lists all function names and their loaded addresses.
The names can be filtered with a regular expression.
This command lists all global and static variable names, or filtered
with a regular expression.
5 printf("Hello World!\n");
0x0804841c <+17>: sub esp,0xc
0x0804841f <+20>: push 0x80484c0
0x08048424 <+25>: call 0x80482e0 <puts@plt>
0x08048429 <+30>: add esp,0x10
6 return 0;
0x0804842c <+33>: mov eax,0x0
7 }
0x08048431 <+38>: mov ecx,DWORD PTR [ebp-0x4]
0x08048434 <+41>: leave
0x08048435 <+42>: lea esp,[ecx-0x4]
0x08048438 <+45>: ret
End of assembler dump.
Now the high level source (in green text) is included as part of the
assembly dump. Each line is backed by the corresponding assembly
code below it.
6.2.6 Command: x
(gdb) x main
Output
0x804840b <main>: 0x8d 0x4c 0x24 0x04 0x83 0xe4 0xf0 0xff
0x8048413 <main+8>: 0x71 0xfc 0x55 0x89 0xe5 0x51 0x83 0xec
0x804841b <main+16>: 0x04 0x83 0xec 0x0c
(gdb) r
Output
Starting program: /tmp/hello
Hello World!
[Inferior 1 (process 1002) exited normally]
hello.c
1 #include <stdio.h>
2
3 int main(int argc, char *argv[])
4 {
5 printf("Hello World!\n");
6 return 0;
7 }
(gdb) b 3
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0x7fffffffdfb8) at hello.c:5
5 printf("Hello World!\n");
Example 6.3.3. A line number is not always a reliable way to specify
a breakpoint, as the source code can change. What if gdb should
always stop at the main function? In this case, a better method is to
use the function name directly:
b main
Then, regardless of how the source code changes, gdb always stops
at the main function.
Output
$3 = {int (int, char **)} 0x400526 <main>
b *0x400526
Example 6.3.5. gdb can also set a breakpoint in any source file. Suppose
that the hello program is composed of not just one file but many files,
e.g. hello1.c, hello2.c, hello3.c... In that case, simply add the
filename before either a line number or a function name:
b hello.c:3
b hello.c:main
This command executes the current line and stops at the next line.
When the current line is a function call, steps over it.
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0x7fffffffdfb8) at hello.c:5
5 printf("Hello World!\n");
(gdb) n
In the output, the first line shows the output produced after execut-
ing line 5; then, the next line shows where gdb stops currently, which
is line 6.
This command executes the current line and stops at the next line.
When the current line is a function call, it steps into the call and
stops at the first line of the called function.
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0xffffd154) at hello.c:11
11 add(1, 2);
(gdb) s
Output
add (a=1, b=2) at hello.c:6
6 return a + b;
After executing the command s, gdb stepped into the add function
where the first statement is a return.
6.3.5 Command: ni
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0xffffd154) at hello.c:5
5 printf("Hello World!\n");
(gdb) ni
Output
0x0804841f 5 printf("Hello World!\n");
(gdb) ni
Output
0x08048424 5 printf("Hello World!\n");
(gdb) ni
(gdb)
Output
6 return 0;
Upon entering ni, gdb executes the current instruction and displays the
next instruction. That's why, in the output, gdb only displays 3
addresses: 0x0804841f, 0x08048424 and 0x08048429. The instruction at
0x0804841c, which is the first instruction of the printf statement, is
not displayed because it is the instruction that gdb stopped at.
Assuming that gdb stopped at the first instruction of the printf
statement at 0x0804841c, the current instruction can be displayed using
the x command:
6.3.6 Command: si
(gdb) si
Output
0x0804841f 5 printf("Hello World!\n");
(gdb) si
Output
0x08048424 5 printf("Hello World!\n");
(gdb) si
Output
0x080482e0 in puts@plt ()
This command executes until the next line is greater than the current
line.
hello.c
#include <stdio.h>

int add1000() {
    int total = 0;
    for (int i = 0; i < 1000; ++i){
        total += i;
    }
    printf("Done adding!\n");
    return total;
}
Using the next command, we would need to enter it 1000 times to finish
the loop. Instead, a faster way is to use until:
(gdb) b add1000
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, add1000 () at hello.c:4
4 int total = 0;
(gdb) until
Output
5 for (int i = 0; i < 1000; ++i){
(gdb) until
Output
6 total += i;
(gdb) until
Output
5 for (int i = 0; i < 1000; ++i){
(gdb) until
Output
8 printf("Done adding!\n");
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, add1000 () at hello.c:4
4 int total = 0;
(gdb) until 8
Output
add1000 () at hello.c:8
8 printf("Done adding!\n");
This command executes until the end of a function and displays the
return value. finish is actually just a more convenient version of
until.
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, add1000 () at hello.c:4
4 int total = 0;
(gdb) finish
Output
Run till exit from #0 add1000 () at hello.c:4
Done adding!
0x08048466 in main (argc=1, argv=0xffffd154) at hello.c:15
15 add1000(1, 2);
Value returned is $1 = 499500
6.3.9 Command: bt
This command prints the backtrace of all stack frames. A backtrace is
a list of currently active functions:
hello.c
void d(int d) { };
void c(int c) { d(0); }
void b(int b) { c(1); }
void a(int a) { b(2); }
(gdb) b a
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, a (a=3) at hello.c:9
9 void a(int a) { b(2); }
(gdb) s
Output
b (b=2) at hello.c:7
7 void b(int b) { c(1); }
(gdb) s
Output
c (c=1) at hello.c:5
5 void c(int c) { d(0); }
(gdb) s
Output
d (d=0) at hello.c:3
3 void d(int d) { };
(gdb) bt
Output
#0 d (d=0) at hello.c:3
#1 0x080483eb in c (c=1) at hello.c:5
#2 0x080483fb in b (b=2) at hello.c:7
#3 0x0804840b in a (a=3) at hello.c:9
#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13
Most-recent calls are placed on top and least-recent calls are near
the bottom. In this case, d is the most recently active function, so it
has the index 0. Next is c, the 2nd active function, which has the
index 1, and so on with function b, function a, and finally function
main at the bottom, the least-recent function. That is how we read a
backtrace.
6.3.10 Command: up
(gdb) bt
Output
#0 d (d=0) at hello.c:3
#1 0x080483eb in c (c=1) at hello.c:5
#2 0x080483fb in b (b=2) at hello.c:7
#3 0x0804840b in a (a=3) at hello.c:9
#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13
(gdb) up
Output
#1 0x080483eb in c (c=1) at hello.c:5
5 void c(int c) { d(0); }
The output shows that the current frame has moved to c, together with
the line in c where the call to d is made, which is line 5.
Similar to up, this command moves down one frame, to the frame called
by the current frame.
(gdb) bt
Output
#0 d (d=0) at hello.c:3
#1 0x080483eb in c (c=1) at hello.c:5
#2 0x080483fb in b (b=2) at hello.c:7
#3 0x0804840b in a (a=3) at hello.c:9
#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13
(gdb) up
Output
#1 0x080483eb in c (c=1) at hello.c:5
5 void c(int c) { d(0); }
(gdb) down
Output
#0 d (d=0) at hello.c:3
3 void d(int d) { };
The above registers suffice for writing our operating system in later
part.
83 ec 0c (sub esp,0xc) becomes cc ec 0c (int 3)
Figure 6.4.1: Opcode replacement, with int 3
cc ec 0c (int 3) becomes 83 ec 0c (sub esp,0xc)
Figure 6.4.2: Restore the original opcode, after int 3 was executed
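The byte swap shown in the two figures can be sketched in C. This is only an illustration under simplifying assumptions: the code is patched in an ordinary buffer, whereas a real debugger has to write into the memory of the process being debugged, and the struct and function names are hypothetical. Only the first byte of the instruction is replaced, because int 3 is the single-byte opcode 0xcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC   /* single-byte encoding of int 3 */

/* Bookkeeping for one software breakpoint (hypothetical layout). */
struct breakpoint {
    size_t  offset;  /* position of the patched byte in the code buffer */
    uint8_t saved;   /* original first byte of the instruction */
};

/* Replace the first byte of the instruction at `offset` with int 3 and
 * remember the original byte so that it can be restored later. */
void bp_insert(uint8_t *code, size_t len, size_t offset,
               struct breakpoint *bp)
{
    assert(offset < len);
    bp->offset = offset;
    bp->saved = code[offset];
    code[offset] = INT3_OPCODE;
}

/* Put the original byte back, as in Figure 6.4.2. */
void bp_remove(uint8_t *code, size_t len, const struct breakpoint *bp)
{
    assert(bp->offset < len);
    code[bp->offset] = bp->saved;
}
```

Applied to the bytes 83 ec 0c, bp_insert produces cc ec 0c and bp_remove restores 83 ec 0c.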
Example 6.4.1. It is simple to see int 3 in action. First, we add an
int 3 instruction where we need gdb to stop:
hello.c
#include <stdio.h>
$ gdb hello
(gdb) r
Output
Starting program: /tmp/hello
Program received signal SIGTRAP, Trace/breakpoint trap.
main (argc=1, argv=0xffffd154) at hello.c:6
6 printf("Hello World\n");
hello.c                                     DIE
Line 1  #include <stdio.h>                  ....
Line 2                                      ....
Line 3  int main(int argc, char *argv[])    main in hello.c is at 0x804840b in hello
Line 5  ..........                          ....
Line 6  ..........                          ....
then when the binary actually runs, .text should really be loaded at
0x800000 for gdb to be able to correctly match running instructions
with high-level code statement. Address mismatching makes debug
information useless, as actual code at one address is displayed as code
at another address. Without this knowledge, we will not be able to
build an operating system that can be debugged with gdb.
hello.c
#include <stdio.h>
return 0;
}
With the binary ready, we can look at the line number table with
the command:
Line number is the line number in the source file of which the line
is not an empty line. In the example, line 8 is an empty line, so it
does not appear.
Starting address is the memory address where the line actually starts
in the executable binary.
With such crystal clear information, this is how gdb is able to set a
breakpoint on a line easily. For placing breakpoints on variables and
functions, it is time to look at the DIEs. To get the DIEs information
from an executable binary, run the command:
-wi option lists all the DIE entries. This is one typical DIE entry:
Blue These numbers in hex format indicate the offsets into the
.debug_info section. Each meaningful piece of information is displayed
along with its offset. When an attribute references another attribute,
the offset is used to precisely identify the referenced attribute.
Green These names with the DW_AT_ prefix are the attributes attached to
a DIE that describe an entity. Notable attributes:
DW_AT_name
DW_AT_comp_dir The filename of the compilation unit and the
directory where compilation occurred. Without the filename
and the path, gdb would not be able to display the high-level
source, despite the availability of the debug info. Debug info only
contains the mapping between source and binary, not the source
code itself.
DW_AT_low_pc
DW_AT_high_pc The start and end of the current entity, which
is the compilation unit, in the executable binary. The value in
DW_AT_low_pc is the starting address. DW_AT_high_pc is the size
of the compilation unit; adding it to DW_AT_low_pc gives the end
address of the entity. In this example, code compiled from hello.c
starts at 0x804840b and ends at 0x804840b + 0x2e = 0x8048439.
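The range arithmetic can be made concrete with a small C sketch. It assumes, as in the dump discussed here, that DW_AT_high_pc holds a size rather than an absolute address; the function names are hypothetical. The second function is the kind of test a debugger needs in order to decide whether a given program counter belongs to an entity.

```c
#include <assert.h>
#include <stdint.h>

/* End address (exclusive) of an entity whose DW_AT_high_pc is stored as
 * a size relative to DW_AT_low_pc. (Hypothetical helper names.) */
uint32_t die_end(uint32_t low_pc, uint32_t high_pc_size)
{
    return low_pc + high_pc_size;
}

/* Return 1 when pc lies inside [low_pc, low_pc + high_pc_size). */
int die_contains(uint32_t low_pc, uint32_t high_pc_size, uint32_t pc)
{
    return pc >= low_pc && pc - low_pc < high_pc_size;
}
```

For hello.c, die_end(0x804840b, 0x2e) is 0x8048439, so an address such as 0x804841c falls inside the compilation unit while 0x8048439 itself is just past its end.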
To really make sure, we verify with objdump:
Output
int main(int argc, char *argv[])
{
804840b: 8d 4c 24 04 lea ecx,[esp+0x4]
804840f: 83 e4 f0 and esp,0xfffffff0
8048412: ff 71 fc push DWORD PTR [ecx-0x4]
Then, all the DIE entries in test.c are displayed before the DIE
entries in hello.c:
<c> DW_AT_producer : (indirect string, offset: 0x0): GNU C11 5.4.0 20160609
-masm=intel -m32 -mtune=generic -march=i686 -g -fstack-protector-strong
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0x64): test.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x5f): /tmp
<19> DW_AT_low_pc : 0x804840b
<1d> DW_AT_high_pc : 0x6
<21> DW_AT_stmt_list : 0x0
<1><25>: Abbrev Number: 2 (DW_TAG_subprogram)
<26> DW_AT_external : 1
<26> DW_AT_name : bar
<2a> DW_AT_decl_file : 1
<2b> DW_AT_decl_line : 1
<2c> DW_AT_low_pc : 0x804840b
<30> DW_AT_high_pc : 0x6
<34> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<36> DW_AT_GNU_all_call_sites: 1
....after all DIEs in test.c listed....
<0><42>: Abbrev Number: 1 (DW_TAG_compile_unit)
<43> DW_AT_producer : (indirect string, offset: 0x0): GNU C11 5.4.0 20160609
-masm=intel -m32 -mtune=generic -march=i686 -g -fstack-protector-strong
<47> DW_AT_language : 12 (ANSI C99)
<48> DW_AT_name : (indirect string, offset: 0xc5): hello.c
<4c> DW_AT_comp_dir : (indirect string, offset: 0x5f): /tmp
<50> DW_AT_low_pc : 0x8048411
<54> DW_AT_high_pc : 0x2e
<58> DW_AT_stmt_list : 0x35
....then all DIEs in hello.c are listed....
Part II
Groundwork
7
Bootloader
BIOS provides many basic services for controlling the hardware at the
boot stage. A service is a group of routines that controls a particular
hardware device, or returns information of current system. Each
service is given an interrupt number. To call a BIOS routine, an
int instruction must be used with an interrupt number. Each BIOS
service defines its own numbers for its routines; to call a routine, a
specific number must be written to a register required by each service.
The list of all BIOS interrupts is available in Ralf Brown's Interrupt
List at: http://www.cs.cmu.edu/~ralf/files.html.
This is when the operating system stands on its own: it must provide
its own kernel drivers for talking to hardware.
(c) Jump to the starting code address of the kernel and execute.
bootloader.asm
;******************************************
; Bootloader.asm
; A Simple Bootloader
;******************************************
bits 16
start: jmp boot

;; constant and variable definitions
msg db "Welcome to My Operating System!", 0ah, 0dh, 0h

boot:
    cli ; no interrupts
    cld ; all that we need to init
    hlt ; halt the system

; We have to be 512 bytes. Clear the rest of the bytes with 0
times 510 - ($-$$) db 0
dw 0xAA55 ; Boot Signature
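The last two lines of the listing fix the layout of the boot sector: 510 bytes of code and padding, followed by the 2-byte signature. Because x86 stores a word in little-endian order, dw 0xAA55 ends up as the byte 0x55 at offset 510 and 0xAA at offset 511. The C sketch below checks a sector buffer for that signature; the function name is hypothetical and the buffer is assumed to hold the first sector of a disk image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Return 1 when the buffer holds a full sector that ends with the boot
 * signature, 0 otherwise. (Hypothetical helper.) */
int has_boot_signature(const uint8_t *sector, size_t len)
{
    assert(sector != NULL);
    if (len < SECTOR_SIZE)
        return 0;
    /* dw 0xAA55 is stored low byte first */
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

Such a check is a quick way to confirm that the assembled bootloader really landed in the first sector of the image.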
With option -S, QEMU waits for gdb to connect before it starts
running.
Output
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB. Attempting to continue with the default i8086 settings.
The target architecture is assumed to be i8086
Then, connect gdb to the waiting virtual machine with this com-
mand:
(gdb) b *0x7c00
Note the asterisk (*) before the memory address. Without the asterisk,
gdb treats the address as a symbol in a program rather than an address.
Then, for convenience, we use a split layout for viewing the assembly
code and registers together:
(gdb) c
7.5.1 Debugging
If, for some reason, the sample bootloader cannot get to such a screen
and gdb does not stop at 0x7c00, then the following scenarios are
likely:
$ hd disk.img | less
If the first 512 bytes are all zeroes, then it is likely that the boot-
loader is incorrectly written to another sector.
First, create a file io.asm for I/O related routines. Then, write the
following routines:
1. MovCursor
Parameters:
bh = Y coordinate
bl = X coordinate.
Return: None
2. PutChar
Parameters:
al = Character to print
bl = text color
Return: None
3. Print
Parameters:
Return: None
Now that we have a feel for how to use the BIOS services, it is time for
something more complicated. We will place our kernel from the 2nd sector
onward, and our bootloader reads 30 sectors starting from the 2nd sector.
Why 30 sectors? Our kernel will grow gradually, so we reserve 30
sectors to save ourselves from modifying the bootloader each time the
kernel size grows by another sector.
read a floppy disk. A floppy drive contains an arm with 2 heads; each
head reads one side of a floppy disk: head 0 writes the upper side
and head 1 writes the lower side.
When a floppy drive writes data to a brand new floppy disk, track 0
on the upper side is written first, by head 0. When the upper track 0
is full, the lower track 0 is used by head 1. When both the upper and
lower sides of track 0 are full, it goes back to head 0 for writing
data again, but this time on the upper side of track 1, and so on,
until no space is left on the device. The same procedure is also
applied for reading data from a floppy disk.
Figure 7.6.2: Floppy disk platter with 2 sides (head 0 and head 1).
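The fill order described above determines how a linear sector number maps to a track, a head and a sector. The C sketch below works it out under an assumed geometry of 80 tracks, 2 heads and 18 sectors per track, which gives the 2880 sectors of a 1.44 MB floppy; the geometry constants and all names are assumptions made for this illustration. Sectors within a track are numbered from 1, while tracks and heads are numbered from 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry of a 1.44 MB floppy: 80 * 2 * 18 = 2880 sectors. */
#define TRACKS            80u
#define HEADS             2u
#define SECTORS_PER_TRACK 18u

struct chs {
    uint8_t track;   /* 0-based */
    uint8_t head;    /* 0 = upper side, 1 = lower side */
    uint8_t sector;  /* 1-based */
};

/* Convert a 0-based linear sector number into track/head/sector,
 * following the order: all sectors of a track on head 0, then the same
 * track on head 1, then the next track. */
struct chs lba_to_chs(uint32_t lba)
{
    struct chs r;
    assert(lba < TRACKS * HEADS * SECTORS_PER_TRACK);
    r.track  = (uint8_t)(lba / (SECTORS_PER_TRACK * HEADS));
    r.head   = (uint8_t)((lba / SECTORS_PER_TRACK) % HEADS);
    r.sector = (uint8_t)(lba % SECTORS_PER_TRACK + 1);
    return r;
}
```

Linear sector 0 is the boot sector at track 0, head 0, sector 1, and linear sector 1, the 2nd sector of the disk, is track 0, head 0, sector 2. Linear sector 18 is the first sector on the lower side of track 0.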
First, we need a sample program to write into the 2nd sector, so
we can experiment with floppy disk reading:
sample.asm
;******************************************
; sample.asm
; A Sample Program
;******************************************
mov eax, 1
add eax, 1
Next, we need to fix the bootloader so that it reads from the floppy
disk and loads an arbitrary number of sectors. Before doing so, a basic
understanding of floppy disks is required. To read data from disk,
interrupt 13h with AH = 02 is the routine for reading sectors from disk
into memory:
AH = 02
AL = number of sectors to read (1-128 dec.)
CH = track/cylinder number (0-1023 dec., see below)
CL = sector number (1-17 dec.)
DH = head number (0-15 dec.)
DL = drive number (0=A:, 1=2nd floppy, 80h=drive 0, 81h=drive 1)
Applying the above routine, the bootloader can read the 2nd sector:
bootloader.asm
;******************************************
; Bootloader.asm
; A Simple Bootloader
;******************************************
bits 16
start: jmp boot
boot:
cli ; no interrupts
cld ; all that we need to init
Makefile
bootloader:
	nasm -f bin bootloader.asm -o bootloader.o
kernel:
	nasm -f bin sample.asm -o sample.o
$ make bootdisk
1474560 bytes (1.5 MB, 1.4 MiB) copied, 0.00482188 s, 306 MB/s
dd conv=notrunc if=bootloader.o of=disk.img bs=512 count=1 seek=0
0+1 records in
0+1 records out
10 bytes copied, 7.0316e-05 s, 142 kB/s
dd conv=notrunc if=sample.o of=disk.img bs=512 count=1 seek=1
0+1 records in
0+1 records out
10 bytes copied, 0.000208375 s, 48.0 kB/s
First, the name disk.img is all over the place. When we want to
change the disk image name, e.g. to floppy_disk.img, all the places with
the name disk.img must be changed manually. To solve this problem,
we use a variable, and every appearance of disk.img is replaced with
a reference to the variable. This way, only one place needs to change,
the variable definition, and all other places are updated automatically.
The following variables are added:
BOOTLOADER=bootloader.o
OS=sample.o
DISK_IMG=disk.img
The second problem is that the names bootloader and sample appear
as part of the filenames of the source files, e.g. bootloader.asm
and sample.asm, as well as the filenames of the binary files, e.g.
bootloader.o and sample.o. Similar to disk.img, when a name changes,
every reference to that name must also be changed manually for both
the names of the source files and the names of the binary files, e.g.
if we change bootloader.asm to loader.asm, then the object file
bootloader.o needs changing to loader.o. To solve this problem,
instead of changing filenames manually, we create a rule that
automatically generates the filenames of one extension from another. In
this case, we want any source file that ends with .asm to have its
equivalent binary file with the .o extension, e.g. bootloader.asm to
bootloader.o and sample.asm to sample.o:
%.o: %.asm
nasm -f bin $< -o $@
$< is a special variable that refers to the input of the recipe: %.asm.
When the recipe is executed, the variables are replaced with the
actual values. For example, if a transformation is bootloader.asm to
bootloader.o, then the actual command executed after replacing the
placeholders in the recipe is:
nasm -f bin bootloader.asm -o bootloader.o
Figure 7.7.1: A better project layout
Makefile
BOOTLOADER=bootloader.o
OS=sample.o
DISK_IMG=disk.img
all: bootdisk
%.o: %.asm
	nasm -f bin $< -o $@
bootdisk: $(BOOTLOADER) $(OS)
	dd if=/dev/zero of=$(DISK_IMG) bs=512 count=2880
	dd conv=notrunc if=$(BOOTLOADER) of=$(DISK_IMG) bs=512 count=1 seek=0
	dd conv=notrunc if=$(OS) of=$(DISK_IMG) bs=512 count=1 seek=1
The object files are in the same directory as the source files, making
it more difficult when working with the source tree. Ideally, object files
and source files should live in different directories. We want a better
organized directory layout like Figure 7.7.1.
bootloader/Makefile
BUILD_DIR=../build/bootloader
The entire recipe implements the transformation from <source_file.asm>
to ../build/<object_file.o>. Note that all paths must be correct.
If we try to build object files in a different directory, e.g. the
current directory, it will not work since no recipe exists to build
objects at such a path.
We also create a similar Makefile for the os/ directory:
os/Makefile
BUILD_DIR=../build/os
OS_SRCS := $(wildcard *.asm)
OS_OBJS := $(patsubst %.asm, $(BUILD_DIR)/%.o, $(OS_SRCS))
all: $(OS_OBJS)
$(BUILD_DIR)/%.o: %.asm
	nasm -f bin $< -o $@
Figure 7.7.3: Makefile in os/ (project directory tree)
For now, it looks almost identical to the Makefile for bootloader. In
the next chapter, we will update it for C code. Then, we update the
top-level Makefile:
Makefile
BUILD_DIR=build
BOOTLOADER=$(BUILD_DIR)/bootloader/bootloader.o
OS=$(BUILD_DIR)/os/sample.o
DISK_IMG=disk.img
# bootloader and os are also directory names, so mark them as phony;
# otherwise make considers them up to date and skips their recipes
.PHONY: bootloader os
all: bootdisk
bootloader:
	make -C bootloader
os:
	make -C os
bootdisk: bootloader os
	dd if=/dev/zero of=$(DISK_IMG) bs=512 count=2880
	dd conv=notrunc if=$(BOOTLOADER) of=$(DISK_IMG) bs=512 count=1 seek=0
	dd conv=notrunc if=$(OS) of=$(DISK_IMG) bs=512 count=1 seek=1
qemu:
	qemu-system-i386 -machine q35 -fda $(DISK_IMG) -gdb tcp::26000 -S
Figure 7.7.4: Top-level Makefile (project directory tree)
Bootloader Makefile:
clean:
	rm $(BUILD_DIR)/*
OS Makefile:
clean:
	rm $(BUILD_DIR)/*
Top-level Makefile:
clean:
	make -C bootloader clean
	make -C os clean
Simply invoking make clean at the project root removes all the object
files.
Syntax Description
target: prerequisites
command
For example, one target builds the project, another generates
documents such as test reports, another runs the whole test suite,
and the all target runs every main target.
.gdbinit
define hook-stop
# Translate the segment:offset into a physical address
printf "[%4x:%4x] ", $cs, $eip
x/i $cs*16+$eip
end
layout asm
layout reg
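The hook-stop above relies on the real-mode rule that a physical address is the segment multiplied by 16 plus the offset. A minimal C sketch of the same calculation follows; the function name real_mode_physical is hypothetical and only mirrors the gdb expression $cs*16+$eip.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
   This is the same arithmetic as $cs*16+$eip in the hook-stop. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For instance, both 0000:7c00 and 07c0:0000 translate to the physical address 0x7c00, which is why gdb has to be told how to combine the two registers before disassembling.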
Every time the QEMU virtual machine starts, gdb must connect
to port 26000. To avoid the trouble of manually connecting to
the virtual machine, add the command:
b *0x7c00
Relocation is the process of replacing symbol references with their
actual symbolic definitions in an object file. A symbol reference is the
memory address of a symbol.
The list of items represents the relocation table, where the memory
location for each symbol (item) is predetermined.
The new address, where all the goods are delivered, represents the
final executable binary or the final object file. Since the items on
display are not for sale, the shop owner delivers brand new goods
instead. Similarly, the object files are not merged together, but
copied all over a new file, the object/executable file.
main.c
int i;
void foo();
int main(int argc, char *argv[])
{
i = 5;
foo();
return 0;
}
void foo() {}
$ readelf -r main.o
The output:
8.1.1 Offset
An offset is the location into a section of a binary file, where the
actual memory address of a symbol definition is replaced. The section
with .rel prefix determines which section to offset into. For example,
.rel.text is the relocation table of symbols whose address needs
correcting in .text section, at a specific offset into .text section. In
the example output:
8.1.2 Info
Info specifies index of a symbol in the symbol table and the type of
relocation to perform.
The pink number is the index of symbol foo in the symbol table,
and the green number is the relocation type. The numbers are written
in hex format. In the example, 0a means 10 in decimal, and symbol
foo is indeed at index 10:
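How the Info field splits into its two parts can be sketched in C. In 32-bit ELF, the symbol table index occupies the upper 24 bits of the field and the relocation type the lowest 8 bits. The function names are illustrative, and the sample value 0x00000a02 is an assumption chosen to match the index 0a discussed above, not a value copied from the listing.

```c
#include <stdint.h>

/* Symbol table index: upper 24 bits of a 32-bit ELF r_info value. */
uint32_t rel_symbol_index(uint32_t r_info)
{
    return r_info >> 8;
}

/* Relocation type: lowest 8 bits of a 32-bit ELF r_info value. */
uint32_t rel_type(uint32_t r_info)
{
    return r_info & 0xffu;
}
```

Applied to the assumed value 0x00000a02, the index is 0x0a (10 in decimal) and the type is 2, which is the numeric value of R_386_PC32 on i386.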
8.1.3 Type
Type represents the type value in textual form. Looking at the type of
foo:
Relocated Offset = S + A - P
8.1.4 Sym.Value
This field shows the symbol value. A symbol value is a value assigned
to a symbol, whose meaning depends on the Ndx field:
$ gcc -g -m32 -masm=intel hello.o -o hello
Output
Symbol table .symtab contains 75 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048154 0 SECTION LOCAL DEFAULT 1
2: 08048168 0 SECTION LOCAL DEFAULT 2
3: 08048188 0 SECTION LOCAL DEFAULT 3
....output omitted...
64: 08048409 6 FUNC GLOBAL DEFAULT 14 foo
65: 0804a020 0 NOTYPE GLOBAL DEFAULT 26 _end
66: 080482e0 0 FUNC GLOBAL DEFAULT 14 _start
67: 08048488 4 OBJECT GLOBAL DEFAULT 16 _fp_hw
68: 0804a01c 4 OBJECT GLOBAL DEFAULT 26 i
69: 0804a018 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
70: 080483db 46 FUNC GLOBAL DEFAULT 14 main
...output omitted...
Unlike the values of the symbols foo, i and main as in the hello.o
Relocated Offset = S + A - P
where
The distance between the usage of foo in main.o and its definition,
applying the formula S + A - P, is: 2e + 0 - 1c = 12. That is, the
place where memory fixing starts is 0x12 or 18 bytes away from the
The place where memory fixing starts is after the opcode e8, with
the mock value fc ff ff ff, which is -4 in decimal. However, in the
assembly code, the value is displayed as 1c, the memory address right
after e8. The reason is that the instruction e8 starts at 1b and ends
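The formula S + A - P can be written as a small C sketch. In the ELF specification, S is the value of the symbol, A is the addend and P is the place (address) being fixed; the function name relocate_pc_relative is hypothetical.

```c
#include <stdint.h>

/* PC-relative relocation value: S + A - P.
   s: value (address) of the symbol
   a: addend, which may be negative
   p: address of the place being fixed */
int32_t relocate_pc_relative(uint32_t s, int32_t a, uint32_t p)
{
    /* Use a wider signed type so the subtraction cannot wrap. */
    return (int32_t)((int64_t)s + (int64_t)a - (int64_t)p);
}
```

With the numbers used in the text (S = 0x2e, A = 0, P = 0x1c) the function returns 0x12. The sketch only illustrates the arithmetic; it does not patch an object file.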
A linker is a program that combines separate object files into a final
binary file. When gcc is invoked, it runs ld underneath to turn object
files into the final executable file.
A linker script is a text file that instructs how a linker should
combine object files. When gcc runs, it uses its default linker script
to build the memory layout of a compiled binary file. Standardized
memory layout is called object file format e.g. ELF includes program
headers, section headers and their attributes. The default linker script
is made for running in the current operating system environment [9].
Running on bare metal, the default script cannot be used as it is not
designed for such an environment. For that reason, a programmer needs
to supply his own linker script for such environments.
[9] To view the default script, use the --verbose option:
ld --verbose
COMMAND
{
sub-command 1
sub-command 2
.... more sub-command....
}
main.lds
SECTIONS /* Command */
{
. = 0x10000; /* sub-command 1 */
.text : { *(.text) } /* sub-command 2 */
. = 0x8000000; /* sub-command 3 */
.data : { *(.data) } /* sub-command 4 */
.bss : { *(.bss) } /* sub-command 5 */
}
Code Dissection:
Code Description
SECTIONS Top-level command that declares a list of custom program
sections. ld provides a set of such commands.
. = 0x10000; Set the location counter to the address 0x10000. The location counter
specifies the base address for subsequent commands. In this
example, subsequent commands will use 0x10000 onward.
.text : { *(.text) } Since the location counter is set to 0x10000, the output .text in
the final binary file will start at the address 0x10000. This
command combines all .text sections from all object files with
the *(.text) syntax into a final .text section. The * is the
wildcard which matches any file name.
. = 0x8000000; Again, the location counter is set to 0x8000000. Subsequent
commands will use this address for working with sections.
.data : { *(.data) } All .data sections are combined into one .data section in the
final binary file.
.bss : { *(.bss) } All .bss sections are combined into one .bss section in the final
binary file.
main.c
void test() {}
int main(int argc, char *argv[])
{
return 0;
}
Then, we compile the file and explicitly invoke ld with the linker
script:
-m Specify object file format that ld produces. In the example, elf_i386 means a 32-bit ELF
is to be produced.
-o Specify the name of the final executable binary.
-T Specify the linker script to use. In the example, it is main.lds.
The remaining input is a list of object files for linking. After the
command ld is executed, the final executable binary, main, is
produced. If we try running it:
$ ./main
Segmentation fault
The reason is that when linking manually, the entry address must
be explicitly set, or else ld sets it to the start of .text section by
default. We can verify from the readelf output:
$ readelf -h main
we see that the address 0x10000 does not start at main function
when the program runs:
return 0;
10009: b8 00 00 00 00 mov eax,0x0
}
1000e: 5d pop ebp
1000f: c3 ret
ENTRY(main)
Recompile the executable binary file main. This time, the
output from readelf is different:
Version: 0x1
Entry point address: 0x10006
Start of program headers: 52 (bytes into file)
Start of section headers: 9168 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 14
Section header string table index: 11
$ gdb ./main
(gdb) b test
(gdb) b main
(gdb) r
Output
Starting program: /tmp/main
Breakpoint 2, main (argc=-11493, argv=0x0) at main.c:5
5 return 0;
$ ./main
Segmentation fault
hello.c
void test() {}
int main(int argc, char *argv[])
{
asm("mov eax, 0x1\n"
"mov ebx, 0x0\n"
"int 0x80");
}
Now that we can precisely control where the program runs initially,
it is easy to bootstrap the kernel from the bootloader. Before we move
on to the next section, note how readelf and objdump can be applied
$ readelf -e main
Segment Sections...
00 .text .eh_frame
01
First, we need to craft our own program header table by using the
following syntax:
PHDRS
{
<name> <type> [ FILEHDR ] [ PHDRS ] [ AT ( address ) ]
[ FLAGS ( flags ) ] ;
}
Example 8.2.1. With only name and type, we can create any number
of program segments. For example, we can add the NULL program
segment and remove the GNU_STACK segment:
main.lds
PHDRS
{
null PT_NULL;
code PT_LOAD;
}
SECTIONS
{
. = 0x10000;
.text : { *(.text) } :code
. = 0x8000000;
.data : { *(.data) }
.bss : { *(.bss) }
}
The content of the PHDRS command tells that the final executable
binary contains 2 program segments: NULL and LOAD. The NULL segment
is given the name null and the LOAD segment is given the name code to
signify that this LOAD segment contains program code. Then, to put a
section into a segment, we use the syntax :<phdr>, where phdr is the
Output
Elf file type is EXEC (Executable file)
Entry point 0x10000
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4
LOAD 0x001000 0x00010000 0x00010000 0x00010 0x00010 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text .eh_frame
Those 2 segments are now NULL and LOAD instead of LOAD and
GNU_STACK.
main.lds
PHDRS
{
null1 PT_NULL;
null2 PT_NULL;
code1 PT_LOAD;
code2 PT_LOAD;
}
SECTIONS
{
. = 0x10000;
.text : { *(.text) } :code1
.eh_frame : { *(.eh_frame) } :code2
. = 0x8000000;
.data : { *(.data) }
.bss : { *(.bss) }
}
After amending the earlier PHDRS content with this new segment
listing, we put .text into the code1 segment and .eh_frame into the
code2 segment. We then compile and see the new segments:
Output
Elf file type is EXEC (Executable file)
Entry point 0x10000
There are 4 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4
NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4
LOAD 0x001000 0x00010000 0x00010000 0x00010 0x00010 R E 0x1000
LOAD 0x001010 0x00010010 0x00010010 0x00058 0x00058 R 0x1000
Section to Segment mapping:
Segment Sections...
00
01
02 .text
03 .eh_frame
gram segment includes the ELF file header of the executable binary.
However, this attribute should only be added to the first program
segment, as it drastically alters the size and starting address of a
segment: the ELF header is always at the beginning of a
binary file. Recall that a segment starts at the address of its first
content, which in most cases is the first section (the exception is
this case, where it is the file header).
main.lds
PHDRS
{
null PT_NULL FILEHDR;
code PT_LOAD;
}
..... content is the same .....
Output
Elf file type is EXEC (Executable file)
Entry point 0x10000
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
NULL 0x000000 0x00000000 0x00000000 0x00034 0x00034 R 0x4
LOAD 0x001000 0x00010000 0x00010000 0x00068 0x00068 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text .eh_frame
In previous examples, the file size and memory size of the NULL
segment are always 0. Now they are both 0x34 (52) bytes, which is the
size of an ELF header.
main.lds
PHDRS
{
null PT_NULL;
code PT_LOAD FILEHDR;
}
..... content is the same .....
Output
Elf file type is EXEC (Executable file)
Entry point 0x10000
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4
LOAD 0x000000 0x0000f000 0x0000f000 0x01068 0x01068 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text .eh_frame
The size of the LOAD segment in the previous example is only 0x68,
the same as the total size of the .text and .eh_frame sections in it.
But now it is 0x01068, which is 0x1000 bytes larger. What is the reason
[Figure 8.2.1: LOAD segment on disk and in memory. (a) Without FILEHDR: in the file, the ELF header occupies 0x0 to 0x34 and .text and .eh_frame occupy 0x1000 to 0x1068; the LOAD segment is loaded at 0x10000 to 0x10068 in memory. (b) With FILEHDR: the LOAD segment spans file offsets 0x0 to 0x1068, starting with the ELF header; it is loaded at 0xf000 to 0x10068 in memory, with the ELF header at 0xf000 to 0xf034. The file ends at 0x1590 in both cases.]
main.lds
PHDRS
{
headers PT_PHDR FILEHDR PHDRS;
code PT_LOAD FILEHDR;
}
..... content is the same .....
Output
Elf file type is EXEC (Executable file)
Entry point 0x10000
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x001000 0x00010000 0x00010000 0x00068 0x00068 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text .eh_frame
As shown in the output, the first segment is of type PHDR. Its size is
0x74, which includes:
0x34 bytes for the ELF header.
0x40 bytes for the program segment header table, with 2 entries,
each 0x20 bytes (32 bytes) in length.
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
....... output omitted ......
Size of this header: 52 (bytes) --> 0x34 bytes
Size of program headers: 32 (bytes) --> 0x20 bytes each program header
Number of program headers: 2 --> 0x40 bytes in total
Size of section headers: 40 (bytes)
Number of section headers: 12
Section header string table index: 9
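The size of the PHDR segment follows directly from three fields of the ELF header shown above. A minimal C sketch of the sum, with a hypothetical function name, is:

```c
#include <stdint.h>

/* Size of a segment that covers the ELF header (FILEHDR) and the
   program header table (PHDRS):
   ehsize    : "Size of this header"
   phentsize : "Size of program headers" (one entry)
   phnum     : "Number of program headers" */
uint32_t headers_segment_size(uint32_t ehsize, uint32_t phentsize,
                              uint32_t phnum)
{
    return ehsize + phentsize * phnum;
}
```

Plugging in the values from the readelf output (52, 32 and 2) gives 116 bytes, that is 0x74, which matches the FileSiz and MemSiz of the PHDR segment.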
A load memory address is the physical memory address, where a
program is loaded but not yet running.
main.lds
PHDRS
{
headers PT_PHDR FILEHDR PHDRS AT(0x500);
code PT_LOAD;
}
..... content is the same .....
Output
Elf file type is EXEC (Executable file)
Entry point 0x4000
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000500 0x00074 0x00074 R 0x4
LOAD 0x001000 0x00004000 0x00002000 0x00068 0x00068 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text .eh_frame
Permission Value Description
R 4 Readable
W 2 Writable
E 1 Executable
main.lds
PHDRS
{
headers PT_PHDR FILEHDR PHDRS AT(0x500);
code PT_LOAD FILEHDR FLAGS(0x1 | 0x2 | 0x4);
}
..... content is the same .....
Output
Elf file type is EXEC (Executable file)
Entry point 0x0
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000500 0x00074 0x00074 R 0x4
LOAD 0x001000 0x00000000 0x00000000 0x00010 0x00010 RWE 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text .eh_frame
LOAD segment now gets all the RWE permissions, as shown above.
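The value given to FLAGS is a bit mask that ends up in the p_flags field of the program header. In the ELF specification the executable bit is 0x1, the writable bit is 0x2 and the readable bit is 0x4. The C sketch below decodes such a mask into the three-letter form that readelf prints in its Flg column; the macro and function names are illustrative.

```c
#include <stdint.h>

/* Permission bits of p_flags as defined by the ELF specification. */
#define SEG_EXEC  0x1u
#define SEG_WRITE 0x2u
#define SEG_READ  0x4u

/* Fill out[] with a readelf-style flag string such as "RWE" or "R E".
   out must have room for 4 characters including the terminator. */
void segment_flags_string(uint32_t flags, char out[4])
{
    out[0] = (flags & SEG_READ)  ? 'R' : ' ';
    out[1] = (flags & SEG_WRITE) ? 'W' : ' ';
    out[2] = (flags & SEG_EXEC)  ? 'E' : ' ';
    out[3] = '\0';
}
```

Passing 0x1 | 0x2 | 0x4 produces "RWE", the permissions of the LOAD segment in the output above, while 0x4 | 0x1 produces "R E", the default for a code segment.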
main.lds
SECTIONS
{
/* . = 0x10000; */
.text : { *(.text) } :code
. = 0x8000000;
.data : { *(.data) }
.bss : { *(.bss) }
/DISCARD/ : { *(.eh_frame) }
}
Output
Elf file type is EXEC (Executable file)
Entry point 0x0
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000500 0x00074 0x00074 R 0x4
LOAD 0x001000 0x00000000 0x00000000 0x00010 0x00010 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
When first powered up, a desktop computer loads its basic system
routines from a read-only memory stored on the motherboard.
to this line:
3. Finally, we use objcopy to extract only the flat binary content,
as in the original bootloader, by adding this line to the
$(BUILD_DIR)/%.o: %.asm rule:
$(BUILD_DIR)/%.o: %.asm
	nasm -f elf $< -F dwarf -g -o $@
	ld -m elf_i386 -T bootloader.lds $@ -o $@.elf
	objcopy -O binary $(BUILD_DIR)/bootloader.o.elf $@
$ make qemu
$ gdb build/bootloader/bootloader.o.elf
After getting into gdb, press the Enter key. If the sample
.gdbinit from section 7.7.3 is used, the output should look like:
os.c
void main() {}
BUILD_DIR=../build_os
OS=$(BUILD_DIR)/os
all: $(OS)
$(BUILD_DIR)/%.o: %.c
	gcc $(CFLAGS) -m32 -c $< -o $@
$(OS): $(OS_OBJS)
	ld -m elf_i386 -Tos.lds $(OS_OBJS) -o $@
clean:
	rm $(OS_OBJS)
Everything looks good, except for the linker script part. Why is it
needed? The linker script is required for controlling at which physical
memory address the operating system binary appears in memory,
so the bootloader can jump to the operating system code and execute it.
To meet this requirement, the default linker script used by gcc
will not work, as it assumes the compiled executable runs inside an
existing operating system, while we are writing an operating system
itself.
The next question is, what will be the content in the linker script?
To answer this question, we must understand what goals to achieve
with the linker script:
For gdb to debug correctly with the operating system source code.
If only it were that simple. The idea is correct, but not enough. The
goals imply the following constraints:
2. To debug properly with gdb, the debug info must contain correct
mappings between instruction addresses and source code.
typedef struct {
unsigned char e_ident[EI_NIDENT];
uint16_t e_type;
uint16_t e_machine;
uint32_t e_version;
ElfN_Addr e_entry;
ElfN_Off e_phoff;
ElfN_Off e_shoff;
uint32_t e_flags;
uint16_t e_ehsize;
uint16_t e_phentsize;
uint16_t e_phnum;
uint16_t e_shentsize;
uint16_t e_shnum;
uint16_t e_shstrndx;
} ElfN_Ehdr;
The offset from the start of the struct to the start of e_entry is:
16 bytes of e_ident[EI_NIDENT]:
#define EI_NIDENT 16
2 bytes of e_type
2 bytes of e_machine
4 bytes of e_version
Offset = 16 + 2 + 2 + 4 = 24 = 0x18
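The same offset can be checked by the compiler with offsetof. The sketch below assumes the 32-bit layout, where ElfN_Addr and ElfN_Off are 4 bytes wide; the struct name Elf32HeaderSketch is made up for this example and simply mirrors the fields listed above.

```c
#include <stddef.h>
#include <stdint.h>

#define EI_NIDENT 16

/* 32-bit variant of the ELF header: addresses and offsets are 4 bytes. */
typedef struct {
    unsigned char e_ident[EI_NIDENT];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;     /* ElfN_Addr */
    uint32_t e_phoff;     /* ElfN_Off  */
    uint32_t e_shoff;     /* ElfN_Off  */
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf32HeaderSketch;

/* Byte offset of e_entry from the start of the header. */
size_t entry_field_offset(void)
{
    return offsetof(Elf32HeaderSketch, e_entry);
}
```

The function returns 24 (0x18), in agreement with the manual sum, and the whole struct is 52 bytes, the ELF header size reported by readelf for 32-bit binaries.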
hello.c
#include <stdio.h>
$ hd hello | less
Output
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 30 04 40 00 00 00 00 00 |..>.....0.@.....|
.........
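Reading the entry address out of such a dump is a matter of assembling little-endian bytes starting at offset 0x18. The helper below is a sketch with a hypothetical name. Note that the dump above comes from a 64-bit ELF file, where the entry field is 8 bytes wide, so the function reads only its low 4 bytes.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from buf at the given byte offset.
   The caller must ensure that buf holds at least offset + 4 bytes. */
uint32_t read_le32(const unsigned char *buf, unsigned offset)
{
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}
```

Applied to the two dump lines at offset 0x18, the bytes 30 04 40 00 yield 0x400430. The bootloader performs the same kind of read in assembly on the 32-bit OS image.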
Now that we know the position of the entry address in the
ELF header, it is easy to modify the bootloader made in section 7.6.2
to retrieve and jump to the address:
bootloader.asm
;******************************************
; Bootloader.asm
; A Simple Bootloader
;******************************************
bits 16
start: jmp boot
boot:
cli ; no interrupts
cld ; all that we need to init
The first part is done. For the next part, we need to build an ELF
operating system image for the bootloader to load. The first step is to
create a linker script:
main.lds
ENTRY(main);
PHDRS
{
headers PT_PHDR FILEHDR PHDRS;
code PT_LOAD;
}
SECTIONS
{
.text 0x500: { *(.text) } :code
.data : { *(.data) }
.bss : { *(.bss) }
/DISCARD/ : { *(.eh_frame) }
}
After putting the script, we compile with make and it should work
smoothly:
Output
Elf file type is EXEC (Executable file)
Entry point 0x500
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x000500 0x00000500 0x00000500 0x00040 0x00040 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
All looks good, until we run it. We begin by starting the QEMU
virtual machine:
$ make qemu
Then, start gdb and load the debug info (which is also in the same
binary file) and set a breakpoint at main:
(gdb) c
Continuing.
[ 0:7c00]
Breakpoint 1, 0x00007c00 in ?? ()
(gdb) c
Continuing.
[ 0: 500]
Breakpoint 2, main () at main.c:1
Output
main.c
B+> 1 void main(){}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Output
/home/tuhdo/workspace/os/build/os/os: file format elf32-i386
Disassembly of section .text:
00000500 <main>:
500: 55 push %ebp
501: 89 e5 mov %esp,%ebp
503: 90 nop
504: 5d pop %ebp
505: c3 ret
.... remaining output omitted ....
What is the reason for the incorrect Assembly code in main displayed
by gdb? There can only be one cause: the bootloader jumped to the
wrong address. But why was the address wrong? We placed the
.text section at address 0x500, with the code of main starting at its
first byte, and instructed the bootloader to retrieve the address at
the offset 0x18, then jump to the entry address.
Here is the problem: 0x500 is the start of the ELF header. The
bootloader actually loads the 2nd sector, which stores the executable
as a whole, to 0x500. Clearly, .text section, where main resides, is
far from 0x500. Since the in-memory entry address of the executable
binary is 0x500, .text should be at 0x500 + 0x500 = 0xa00. However,
the entry address recorded in the ELF header remains 0x500 and as a
result, the bootloader jumped there instead of 0xa00. This is one of
the issues that must be fixed.
The other issue is the mapping between debug info and the memory
address. The debug info is compiled with the assumption that the
.text section starts at 0x500, but the actual loading pushes the
section another 0x500 bytes further, so it really starts at
0xa00. This memory mismatch renders the debug info useless.
[Figure 8.5.2: Wrong symbol-memory mappings in debug info. The debug info points at 0x500, where the ELF header of the loaded content resides, while the loaded .text section lies further up in memory, where the debug info is supposed to point.]
Fix the entry address to account for the extra offset when loading
into memory.
Fix the debug info to account for the extra offset when loading into
memory.
$ readelf -l build/os/os
Output
Elf file type is EXEC (Executable file)
Entry point 0x500
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x000500 0x00000500 0x00000500 0x00040 0x00040 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
Notice the Offset and the VirtAddr fields: both have the same
value. This is problematic, as the entry address and the memory
addresses in the debug info depend on the VirtAddr field, but the Offset
having the same value destroys the validity of VirtAddr [18], because it
means that the real in-memory address will always be greater than the
VirtAddr.
[18] The offset is the distance in bytes between the beginning of the file, the
address 0, and the beginning address of a segment or a section.
Output
Elf file type is EXEC (Executable file)
Entry point 0x1074
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x000074 0x00001074 0x00001074 0x00006 0x00006 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
Output
Elf file type is EXEC (Executable file)
Entry point 0x1073
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x001073 0x00001073 0x00001073 0x00006 0x00006 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
Output
Elf file type is EXEC (Executable file)
Entry point 0x0
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x001000 0x00000000 0x00000000 0x00006 0x00006 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
Output
Elf file type is EXEC (Executable file)
Entry point 0x74
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x000074 0x00000074 0x00000074 0x00006 0x00006 R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
Now we get a hint of how to control the values of Offset and VirtAddr
to produce a desired binary layout. What we need is to change the
Align field to a smaller value for finer-grained control. It
might work out with a binary layout like this:
Output
Elf file type is EXEC (Executable file)
Entry point 0x600
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x000100 0x00000600 0x00000600 0x00006 0x00006 R E 0x100
Section to Segment mapping:
Segment Sections...
00
01 .text
If we set the Offset field to 0x100 from the beginning of the file
and the VirtAddr to 0x600, then when loaded in memory, the actual
address of .text is 0x500 + 0x100 = 0x600; 0x500 is the memory
location where the bootloader loads the file into physical memory and
0x100 is the offset from the start of the ELF header to .text. The entry
address and the debug info will then take the value 0x600 from the
VirtAddr field above, which matches the actual physical layout.
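The layout rule used here can be captured in a few lines of C: since the bootloader copies the file as a whole, a section lands at the load address plus its file offset, and the layout is only usable when that sum equals VirtAddr. Both function names below are hypothetical.

```c
#include <stdint.h>

/* Address where a byte of the file ends up when the whole file is
   copied to load_base by the bootloader. */
uint32_t runtime_address(uint32_t load_base, uint32_t file_offset)
{
    return load_base + file_offset;
}

/* Nonzero when the VirtAddr recorded in the program header matches
   the address where the segment really lands in memory. */
int layout_is_consistent(uint32_t load_base, uint32_t file_offset,
                         uint32_t virt_addr)
{
    return runtime_address(load_base, file_offset) == virt_addr;
}
```

With the earlier layout (Offset 0x500, VirtAddr 0x500, loaded at 0x500) the check fails because .text really sits at 0xa00. With Offset 0x100 and VirtAddr 0x600 it passes.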
We can produce this layout by changing os.lds as follows:
main.lds
ENTRY(main);
PHDRS
{
headers PT_PHDR FILEHDR PHDRS;
code PT_LOAD;
}
SECTIONS
{
.text 0x600: ALIGN(0x100) { *(.text) } :code
.data : { *(.data) }
.bss : { *(.bss) }
/DISCARD/ : { *(.eh_frame) }
}
Output -n
--nmagic
Turn off page alignment of sections, and disable linking against shared
libraries. If the output format supports Unix style magic numbers, mark the
output as "NMAGIC"
os/Makefile
$ ls -l build/os/os
-rwxrwxr-x 1 tuhdo tuhdo 9060 Feb 13 21:37
build/os/os
os/Makefile
Output
Elf file type is EXEC (Executable file)
Entry point 0x600
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4
LOAD 0x000100 0x00000600 0x00000600 0x00006 0x00006 R E 0x100
Section to Segment mapping:
Segment Sections...
00
01 .text
$ make qemu
In another terminal, we start gdb, load the debug info and set a
breakpoint at main:
$ gdb
Then, let gdb run until it hits the main function, and change
to the split layout between source and assembly:
Output
main.c
B+> 1 void main(){}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Output
0x600 <main>: 0x55 0x89 0xe5 0x90 0x5d 0xc3 0x37 0x00
0x608: 0x00 0x00 0x04 0x00 0x00 0x00 0x00 0x00
Output
build/os/os: file format elf32-i386
Disassembly of section .text:
00000600 <main>:
void main(){}
600: 55 push ebp
601: 89 e5 mov ebp,esp
603: 90 nop
604: 5d pop ebp
605: c3 ret
Disassembly of section .debug_info:
...... output omitted ......
The raw opcodes displayed by the two programs are the same, which
proves that gdb correctly jumped to the address of main
for proper debugging. This is an extremely important milestone.
Being able to debug on bare metal will help tremendously in writing
an operating system, as a debugger allows a programmer to inspect
the internal state of a running machine at each step to verify his
code, step by step, and gradually build up a solid understanding. Some
professional programmers do not like debuggers, but that is because
they understand their domain deeply enough not to need a
debugger to verify their code. When encountering new domains, a
debugger is an indispensable learning tool because of its verifiability.
Kernel Programming
9
x86 Descriptors
reusability: that is, the same software API can be reused across
programs, thus simplifying software development process
There are so many hardware devices out there that it is best to leave
to the hardware engineers how the devices talk to an OS. To achieve this
goal, the OS only provides a set of agreed software interfaces between
itself and the device driver writers, called the Hardware Abstraction
Layer.
9.2 Drivers
10.1 Concepts
10.2 Process
10.2.1 Task
10.2.2 Process
track of where the stack and the heap allocated for Firefox are, where
Firefox's code area is and which instruction EIP is holding to execute
next... The typical process structure looks like this:
10.2.3 Scheduler
10.2.5 Priority
10.2.8 procfs
10.3 Threads
Threads are units of work inside a process that shares the execution
environment. A process creates a whole new execution environment
with code of its own:
10.6.1 Requirements
Description
Constraints
Design
Implementation plan
Address space is the set of all addressable memory locations. There are
2 types of address spaces in physical memory address:
appear next to each other in virtual memory space, they are scattered
throughout the physical memory.