
5 Xv6-Notes

Copyright © All Rights Reserved
Reading xv6 code: Notes

Abhijit A. M.
[email protected]

Credits:
xv6 book by Cox, Kaashoek, Morris
Notes by Prof. Sorav Bansal
Use cscope and ctags with VIM

Go to folder of xv6 code and run
cscope -q *.[chS]

Also run
ctags *.[chS]

Now download the file
http://cscope.sourceforge.net/cscope_maps.vim
as .cscope_maps.vim in your ~ folder
And add the line “source ~/.cscope_maps.vim” to your
~/.vimrc file

Read this tutorial
http://cscope.sourceforge.net/cscope_vim_tutorial.html
Use call graphs (using doxygen)

Doxygen – a documentation generator.

Can also be used to generate “call graphs” of
functions

Download xv6

Install doxygen on your Ubuntu machine.

cd to xv6 folder

Run “doxygen -g doxyconfig”

This creates the file “doxyconfig”
Use call graphs (using doxygen)

Create a folder “doxygen”

Open “doxyconfig” file and make these changes.
PROJECT_NAME = "XV6"
OUTPUT_DIRECTORY = ./doxygen
CREATE_SUBDIRS = YES
EXTRACT_ALL = YES
EXCLUDE = usertests.c cat.c yes.c echo.c \
          forktest.c grep.c init.c kill.c ln.c ls.c mkdir.c rm.c sh.c \
          stressfs.c wc.c zombie.c
CALL_GRAPH = YES
CALLER_GRAPH = YES

Now run “doxygen doxyconfig”

Go to “doxygen/html” and run “firefox index.html” --> see the call graphs
under Files -> (any file)
About xv6

Unix Like OS

Multi tasking, Single user

On x86 processor

Supports some system calls

Small code, about 7k to 10k lines

Meant for learning OS concepts

No demand paging, no copy-on-write fork, no
shared memory; user programs get a fixed-size stack
Xv6 follows the monolithic kernel
approach
qemu

A virtual machine manager, like Virtualbox

Qemu provides us

BIOS

Virtual CPU, RAM, Disk controller, Keyboard controller

IOAPIC, LAPIC

Qemu runs xv6 using this command
qemu -serial mon:stdio -drive file=fs.img,index=1,media=disk,format=raw -drive file=xv6.img,index=0,media=disk,format=raw -smp 2 -m 512

Invoked when you run “make qemu”
qemu

Understanding qemu command

-serial mon:stdio

the serial console of xv6 is also multiplexed (together with the qemu monitor) onto your normal terminal.

Run “make qemu”, then press “Ctrl-a” followed by “c” in the terminal and you
get the qemu monitor prompt

-drive file=fs.img,index=1,media=disk,format=raw

Specifies a hard disk whose contents are in “fs.img”, attached at the second IDE slot
(index=1; xv6.img is at index=0), as a “disk”, in “raw” format

-smp 2

Two cores in SMP mode to be simulated

-m 512

Use 512 MB ram
About files in XV6 code
cat.c echo.c forktest.c grep.c
init.c kill.c ln.c ls.c mkdir.c
rm.c sh.c stressfs.c usertests.c
wc.c yes.c zombie.c

User programs for testing xv6
Makefile

To compile the code
dot-bochsrc

For running with the emulator bochs
About files in XV6 code
bootasm.S entryother.S entry.S
initcode.S swtch.S trapasm.S
usys.S

Kernel code written in Assembly. Total 373 lines
kernel.ld

Instructions to the linker, for linking the kernel
properly
README Notes LICENSE

Misc files
Using Makefile

make qemu

Compile code and run using “qemu” emulator

make xv6.pdf

Generate a PDF of xv6 code

make mkfs

Create the mkfs program

make clean

Remove all intermediary and final build files
Files generated by Makefile
.o files

Compiled from each .c file

No need of a separate instruction in the Makefile to
create .o files

make's built-in implicit rule compiles xyz.c into xyz.o; the pattern rule
_%: %.o $(ULIB) only says that _xyz needs xyz.o, which is sufficient to trigger it
Files generated by Makefile
.asm files

Each of them is a disassembly listing (objdump -S) of the corresponding object or executable file, interleaved with its source. For example
bootblock: bootasm.S bootmain.c
	$(CC) $(CFLAGS) -fno-pic -O -nostdinc -I. -c bootmain.c
	$(CC) $(CFLAGS) -fno-pic -nostdinc -I. -c bootasm.S
	$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
	$(OBJDUMP) -S bootblock.o > bootblock.asm
	$(OBJCOPY) -S -O binary -j .text bootblock.o bootblock
	./sign.pl bootblock
Files generated by Makefile
_ln, _ls, etc

Executable user programs

The compilation process is explained a few
slides later
Files generated by Makefile
xv6.img

Disk image of xv6: boot block in sector 0, kernel from sector 1 onwards
xv6.img: bootblock kernel
	dd if=/dev/zero of=xv6.img count=10000
	dd if=bootblock of=xv6.img conv=notrunc
	dd if=kernel of=xv6.img seek=1 conv=notrunc
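To see why bootmain later reads the kernel starting at sector 1, here is a small C sketch that mimics the three dd commands on an in-memory array and then reads bytes back the way bootmain.c's readseg() does. The 64-sector image, build_image() and the array disk are hypothetical stand-ins for xv6.img and the IDE disk; the real readsect() talks to the disk controller through I/O ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTSIZE 512
#define NSECT    64   /* hypothetical small image; xv6.img has 10000 sectors */

/* Hypothetical in-memory disk standing in for xv6.img. */
static uint8_t disk[NSECT * SECTSIZE];

/* Mimic the three dd commands: zero the image, copy the boot block to
   sector 0, then copy the kernel starting at sector 1 (seek=1). */
void build_image(const uint8_t *boot, size_t bootlen,
                 const uint8_t *kern, size_t kernlen) {
  assert(bootlen <= SECTSIZE && kernlen <= (NSECT - 1) * (size_t)SECTSIZE);
  memset(disk, 0, sizeof disk);            /* dd if=/dev/zero count=... */
  memcpy(disk, boot, bootlen);             /* dd if=bootblock conv=notrunc */
  memcpy(disk + SECTSIZE, kern, kernlen);  /* dd if=kernel seek=1 conv=notrunc */
}

/* Copy one whole sector; the real readsect() programs the IDE controller. */
static void readsect(uint8_t *dst, uint32_t sect) {
  assert(sect < NSECT);
  memcpy(dst, disk + (size_t)sect * SECTSIZE, SECTSIZE);
}

/* Same steps as bootmain.c's readseg(): read 'count' bytes at byte 'offset'
   of the kernel into pa. It works in whole sectors, so it may also write
   a few bytes before pa and after pa + count. */
void readseg(uint8_t *pa, uint32_t count, uint32_t offset) {
  uint8_t *epa = pa + count;
  pa -= offset % SECTSIZE;          /* round down to a sector boundary */
  offset = offset / SECTSIZE + 1;   /* bytes -> sectors; kernel starts at sector 1 */
  for (; pa < epa; pa += SECTSIZE, offset++)
    readsect(pa, offset);
}
```

The "+ 1" in readseg() is exactly the seek=1 of the last dd command: byte 0 of the kernel file lives in sector 1.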
Files generated by Makefile
bootblock
bootblock: bootasm.S bootmain.c
	$(CC) $(CFLAGS) -fno-pic -O -nostdinc -I. -c bootmain.c
	$(CC) $(CFLAGS) -fno-pic -nostdinc -I. -c bootasm.S
	$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
	$(OBJDUMP) -S bootblock.o > bootblock.asm
	$(OBJCOPY) -S -O binary -j .text bootblock.o bootblock
	./sign.pl bootblock
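sign.pl, the last step of the rule above, turns the raw boot code into a valid boot sector. A minimal C sketch of that step, assuming the behaviour of xv6's sign.pl (reject code longer than 510 bytes, zero-pad, append 0x55 0xAA, the signature the BIOS looks for); sign_bootblock() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOTSECT 512

/* Build a 512-byte boot sector from 'n' bytes of boot code.
   Returns 0 on success, -1 if the code does not fit in 510 bytes. */
int sign_bootblock(const uint8_t *code, size_t n, uint8_t out[BOOTSECT]) {
  assert(code != NULL || n == 0);
  if (n > BOOTSECT - 2)
    return -1;                 /* "boot block too large" */
  memset(out, 0, BOOTSECT);    /* zero padding up to byte 509 */
  if (n > 0)
    memcpy(out, code, n);
  out[510] = 0x55;             /* boot signature, last two bytes */
  out[511] = 0xAA;
  return 0;
}
```

This is why the whole bootblock (bootasm.S + bootmain.c) must stay under 510 bytes of code.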
Files generated by Makefile
kernel
kernel: $(OBJS) entry.o entryother initcode kernel.ld
	$(LD) $(LDFLAGS) -T kernel.ld -o kernel entry.o $(OBJS) -b binary initcode entryother
	$(OBJDUMP) -S kernel > kernel.asm
	$(OBJDUMP) -t kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > kernel.sym
Files generated by Makefile
fs.img

A disk image containing user programs and README
fs.img: mkfs README $(UPROGS)
	./mkfs fs.img README $(UPROGS)
.sym files

Symbol tables of different programs

E.g. for file “kernel”
$(OBJDUMP) -t kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > kernel.sym
Size of xv6 C code

wc *[ch] | sort -n

10595 34249 278455 total

Out of which (the glob *[ch] also matches dot-bochsrc, whose name ends in “c”)

738 4271 33514 dot-bochsrc

wc cat.c echo.c forktest.c grep.c init.c kill.c ln.c ls.c mkdir.c rm.c sh.c stressfs.c usertests.c wc.c yes.c zombie.c

2849 6864 51993 total

So the remaining code is 10595 – 2849 – 738 = 7008 lines
List of commands to try (in the given
order)
usertests # Runs a lot of tests; takes up to 10 minutes to run
stressfs # opens, reads and writes files in parallel
ls # output: name, file type, inode number, size
cat README
ls;ls
cat README | grep BUILD
echo hi there
echo hi there | grep hi
echo "hi there
List of commands to try (in this
order)
echo README | grep Wa
echo README | grep Wa | grep ty   # does not work
cat README | grep Wa | grep bl    # works
ls > out                          # takes time!
mkdir test
cd test
ls                                # fails
ls ../                            # works from inside test
cd                                # fails
cd /                              # works
wc README
rm out
ls . test                         # listing both directories
ln cat xyz; ls
rm xyz; ls
User Libraries: Used to link user
land programs

ulib.c

strcpy, strcmp, strlen, memset, strchr, stat, atoi, memmove

stat() uses open()

usys.S -> compiles into usys.o

Assembly code file. It defines one small stub for each system call, like open()
(e.g. used in ulib.c): the stub puts the system call number in %eax and executes
the “int” instruction (int $0x40, i.e. interrupt 64).
Run the following command and look at the lines for open in the output
(0xf = 15 is the system call number of open)
objdump -d usys.o
00000048 <open>:
48: b8 0f 00 00 00 mov $0xf,%eax
4d: cd 40 int $0x40
4f: c3 ret
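On the kernel side, the number left in %eax by a stub such as open (0xf = 15) selects the handler. A simplified C model of the dispatch in xv6's syscall.c, assuming a trap frame that only holds %eax and two hypothetical handlers (fake_getpid, fake_open) in place of the real sys_* functions:

```c
#include <assert.h>
#include <stddef.h>

#define SYS_getpid 11   /* xv6 numbering: getpid is 11, open is 15 (0xf) */
#define SYS_open   15
#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the part of the trap frame we need. */
struct trapframe { int eax; };

static int fake_getpid(void) { return 42; }   /* hypothetical handlers */
static int fake_open(void)   { return 3; }

/* Table indexed by system call number; unused slots are NULL. */
static int (*syscalls[])(void) = {
  [SYS_getpid] = fake_getpid,
  [SYS_open]   = fake_open,
};

/* The user stub did "mov $num,%eax; int $0x40". The kernel reads the number
   back from the saved %eax and stores the return value in the same slot,
   so the user program sees it in %eax after iret. */
void syscall_dispatch(struct trapframe *tf) {
  assert(tf != NULL);
  int num = tf->eax;
  if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num])
    tf->eax = syscalls[num]();
  else
    tf->eax = -1;                               /* unknown system call */
}
```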
User Libraries: Used to link user
land programs

printf.c

Code for printf()!

Interesting to read this code.

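Uses a variable number of arguments. The normal
technique in C is to use the stdarg.h macros (va_start, va_arg), but here it
uses pointer arithmetic, starting from the address of the format argument.

Written using two more functions: printint() and
putc(); putc() calls write(), and printint() prints through putc()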

Where is code for write()?
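printint() converts a number to text by repeatedly taking x % base, which produces digits least-significant first, and then prints them in reverse. A sketch of the same algorithm in C; fmt_int() is a hypothetical variant that writes into a caller-supplied buffer instead of calling putc(), and its internal buffer is sized for base 2:

```c
#include <assert.h>
#include <stddef.h>

/* Write xx in the given base into out (NUL-terminated); if sgn is set,
   treat xx as signed. Returns the number of characters written. */
size_t fmt_int(char *out, int xx, int base, int sgn) {
  static const char digits[] = "0123456789ABCDEF";
  char buf[34];            /* 32 binary digits + '-' */
  int i = 0, neg = 0;
  unsigned int x;

  assert(base >= 2 && base <= 16);
  if (sgn && xx < 0) {
    neg = 1;
    x = 0u - (unsigned int)xx;   /* magnitude; safe even for INT_MIN */
  } else {
    x = (unsigned int)xx;
  }
  do {
    buf[i++] = digits[x % base]; /* least significant digit first */
  } while ((x /= base) != 0);
  if (neg)
    buf[i++] = '-';

  size_t n = 0;
  while (--i >= 0)               /* emit in reverse order */
    out[n++] = buf[i];
  out[n] = '\0';
  return n;
}
```

With sgn = 0, a negative int is printed as its unsigned bit pattern, which is how %x behaves in xv6's printf.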
User Libraries: Used to link user
land programs

umalloc.c

This is an implementation of malloc() and free()

Almost the same as the one in “The C
Programming Language” by Kernighan and
Ritchie

Uses sbrk() to get more memory from xv6 kernel
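The K&R allocator keeps a circular, address-ordered free list of blocks measured in Header units, allocates from the tail end of the first block that fits, and coalesces neighbours on free. A compact sketch of that scheme, assuming a fixed static arena (fake_sbrk, hypothetical) instead of the sbrk() system call and a 64-unit minimum request instead of xv6's 4096; like K&R, it compares pointers across blocks:

```c
#include <assert.h>
#include <stddef.h>

typedef long Align;
typedef union header {
  struct {
    union header *ptr;   /* next block on the circular free list */
    size_t size;         /* size of this block, in Header units */
  } s;
  Align x;               /* forces alignment of blocks */
} Header;

#define ARENA_UNITS 1024
static Header arena[ARENA_UNITS];  /* hypothetical stand-in for sbrk() memory */
static size_t arena_used;
static Header base;                /* zero-size block anchoring the list */
static Header *freep = NULL;       /* where the last search ended */

static Header *fake_sbrk(size_t nu) {
  if (nu > ARENA_UNITS - arena_used)
    return NULL;                   /* out of "memory" */
  Header *p = arena + arena_used;
  arena_used += nu;
  return p;
}

void kr_free(void *ap) {
  assert(ap != NULL);
  Header *bp = (Header *)ap - 1, *p;
  /* Find p such that bp lies between p and p->s.ptr (address order). */
  for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
    if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
      break;                       /* bp is at one end of the list */
  if (bp + bp->s.size == p->s.ptr) {   /* merge with upper neighbour */
    bp->s.size += p->s.ptr->s.size;
    bp->s.ptr = p->s.ptr->s.ptr;
  } else
    bp->s.ptr = p->s.ptr;
  if (p + p->s.size == bp) {           /* merge with lower neighbour */
    p->s.size += bp->s.size;
    p->s.ptr = bp->s.ptr;
  } else
    p->s.ptr = bp;
  freep = p;
}

static Header *morecore(size_t nu) {
  if (nu < 64)
    nu = 64;                       /* minimum chunk (xv6 asks for 4096 units) */
  Header *hp = fake_sbrk(nu);
  if (hp == NULL)
    return NULL;
  hp->s.size = nu;
  kr_free(hp + 1);                 /* add the new chunk to the free list */
  return freep;
}

void *kr_malloc(size_t nbytes) {
  size_t nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
  Header *p, *prevp;
  if ((prevp = freep) == NULL) {   /* first call: empty list */
    base.s.ptr = freep = prevp = &base;
    base.s.size = 0;
  }
  for (p = prevp->s.ptr;; prevp = p, p = p->s.ptr) {
    if (p->s.size >= nunits) {     /* first fit */
      if (p->s.size == nunits)
        prevp->s.ptr = p->s.ptr;   /* exact fit: unlink it */
      else {                       /* hand out the tail end */
        p->s.size -= nunits;
        p += p->s.size;
        p->s.size = nunits;
      }
      freep = prevp;
      return (void *)(p + 1);
    }
    if (p == freep)                /* wrapped around the whole list */
      if ((p = morecore(nunits)) == NULL)
        return NULL;
  }
}
```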
Understanding the build process
in more detail

Run
make qemu | tee make-output.txt

You will get all compilation commands in
make-output.txt
Compiling user land programs
Normally, when you compile a program on Linux,
you compile it for the same ‘target’ machine (= CPU + OS)
on which the compiler itself runs

To compile a user land program for xv6, we don’t have a compiler
on xv6.
So we compile the programs (using make, cc) on Linux, for xv6
Obviously they can’t link with the standard C library of Linux; they link with xv6's own user library (ULIB) instead
Compiling user land programs
ULIB = ulib.o usys.o printf.o umalloc.o
_%: %.o $(ULIB)
	$(LD) $(LDFLAGS) -N -e main -Ttext 0 -o $@ $^
	$(OBJDUMP) -S $@ > $*.asm
	$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym

$@ is the name of the file being generated

$^ is the list of all prerequisites, i.e. %.o and $(ULIB) in this case

$* is the stem matched by %, e.g. “cat” when building _cat
Compiling user land programs
gcc -fno-pic -static -fno-builtin -fno-strict-aliasing -O2 -Wall -MD -ggdb -m32 -Werror -fno-omit-frame-pointer -fno-stack-protector -fno-pie -no-pie -c -o cat.o cat.c
ld -m elf_i386 -N -e main -Ttext 0 -o _cat cat.o ulib.o usys.o printf.o umalloc.o
objdump -S _cat > cat.asm
objdump -t _cat | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$/d' > cat.sym
Compiling user land programs
mkfs is compiled like a normal Linux program, because it runs on Linux (to build fs.img)!
gcc -Werror -Wall -o mkfs mkfs.c
Important header files for user
programs
types.h
typedef unsigned int uint;
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef uint pde_t;
Important header files for user
programs
stat.h
#define T_DIR 1 // Directory
#define T_FILE 2 // File
#define T_DEV 3 // Device

struct stat {
  short type;  // Type of file
  int dev;     // File system's disk device
  uint ino;    // Inode number
  short nlink; // Number of links to file
  uint size;   // Size of file in bytes
};
Used by the fstat() system call (and by the stat() function in ulib.c)
Important header files for user
programs
fcntl.h
#define O_RDONLY 0x000
#define O_WRONLY 0x001
#define O_RDWR 0x002
#define O_CREATE 0x200
Important header files for user
programs
user.h
Prototypes of all system calls (fork,
wait, etc.)
and of the ulib.c functions (strcpy, etc.)
Some numbers and their
‘meaning’

These numbers occur very frequently in
discussion

0x80000000 = 2 GB = KERNBASE

0x100000 = 1 MB = EXTMEM

0x80100000 = 2 GB + 1 MB = KERNLINK

0xE000000 = 224 MB = PHYSTOP

0xFE000000 = 3.96 GB = 4064 MB = DEVSPACE

4096 – 4064 = 32 MB left on top (for memory-mapped devices)
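The relations in this table can be checked mechanically. A small sketch, assuming 32-bit addresses; the macro names follow xv6's memlayout.h, and v2p()/p2v() mimic V2P/P2V (subtract or add KERNBASE) with extra range assertions that xv6 does not have:

```c
#include <assert.h>
#include <stdint.h>

#define EXTMEM   0x100000u           /* 1 MB: start of extended memory      */
#define PHYSTOP  0xE000000u          /* 224 MB: top of physical memory used */
#define DEVSPACE 0xFE000000u         /* 4064 MB: memory-mapped devices      */
#define KERNBASE 0x80000000u         /* 2 GB: first kernel virtual address  */
#define KERNLINK (KERNBASE + EXTMEM) /* address where the kernel is linked  */

#define MB (1024u * 1024u)

/* Kernel virtual <-> physical in the direct-mapped region. */
static inline uint32_t v2p(uint32_t va) { assert(va >= KERNBASE); return va - KERNBASE; }
static inline uint32_t p2v(uint32_t pa) { assert(pa < KERNBASE); return pa + KERNBASE; }
```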
How to read kernel code ?

Understand the data structures

Know each global variable, typedefs, lists, arrays, etc.

Know the purpose of each of them

While reading a code path, e.g. exec()

Try to ‘locate’ the key line of code that does major work

Initially (but not forever) ignore the ‘error checking’ code

Keep summarising what you have read

Remembering is important !

To understand kernel code, you should be good with
concepts in OS , C, assembly, hardware
Bootloader
What does a bootloader do?

The bootloader itself

Is loaded by the BIOS at a fixed location in memory, and the
BIOS makes it run

Our job, as OS programmers, is to write the bootloader
code

The bootloader

Picks up the code of the OS from a ‘known’ location and loads it into
memory

Makes the OS run

Xv6 bootloader: bootasm.S bootmain.c (see Makefile)
bootloader

BIOS runs (automatically)

Loads the boot sector (the first 512 bytes of the disk) into RAM at 0x7c00

Starts executing that code

So make sure that your bootloader is linked to run at 0x7c00

Makefile has
bootblock: bootasm.S bootmain.c
	$(CC) $(CFLAGS) -fno-pic -nostdinc -I. -c bootasm.S .....
	...
	$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
Results in:
00007c00 <start>: in bootblock.asm

Effective memory translation in the beginning:
virtual (logical) address = offset = physical address
At start:
bootloader

First instruction is ‘cli’

disable interrupts

So that no interrupt will occur until your code has set up
all hardware interrupt handlers
Processor starts in real mode

Processor starts in real mode: works like a 16-bit 8088

eight 16-bit general-purpose registers,

Segment registers %cs, %ds, %es, and %ss
--> provide the additional bits necessary to generate 20-bit
memory addresses from 16-bit registers:
physical address = segment * 16 + offset
Relation between Logical and Physical
Address

Logical address --> (segmentation) --> linear address --> (paging) --> physical address

Virtual address here means the logical address

Paging is optional. If paging is off, the linear address is the
physical address
bootloader: addr translation

(Later on) xv6 configures the segmentation
hardware to translate logical to linear
addresses without change, so that they are
always equal.
-> Segmentation is practically off

Once paging is enabled, the only interesting
address mapping in the system will be linear
to physical.
Zeroing registers
start:
  cli                   # BIOS enabled interrupts; disable

  # Zero data segment registers DS, ES, and SS.
  xorw    %ax,%ax       # Set %ax to zero
  movw    %ax,%ds       # -> Data Segment
  movw    %ax,%es       # -> Extra Segment
  movw    %ax,%ss       # -> Stack Segment

cli disables interrupts; %ax, %ds, %es and %ss are zeroed because we cannot
rely on what (if anything) the BIOS put in them

Seg:off with 16 bit segments
can actually address more
A not so necessary detail than 20 bits of memory. After
Enable 21 bit address 0x100000 (=2^20), 8086
wrapped addresses to 0.
seta20.1: 
80286 introduced 21st bit of
inb $0x64,%al # Wait for not busy
address. But older software
testb $0x2,%al
required 20 bits only. BIOS
jnz seta20.1
movb $0xd1,%al # 0xd1 -> port
disabled 21st bit. Some OS
0x64 needed 21st Bit. So enable it.
outb %al,$0x64 
Write to Port 0x64 and 0x60 ->
seta20.2: keyboard controller
inb $0x64,%al # Wait for not busy

to enable 21st bit out of address
testb $0x2,%al
translation
jnz seta20.2
movb $0xdf,%al # 0xdf -> port 0x60

Why? Before the A20, i.e. 21st bit
was introduced, it belonged to
outb %al,$0x60
keyboard controller

For more details see
https://en.wikipedia.org/wiki/A20
_line

https://en.wikipedia.org/wiki/A20
_line
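The real-mode formula and the A20 wrap-around can be reproduced in a few lines of C. A sketch; realmode_addr() is a hypothetical helper, and the a20_enabled flag models whether the 21st address bit is passed through or forced to zero:

```c
#include <assert.h>
#include <stdint.h>

/* physical = segment * 16 + offset. The largest result, 0xFFFF:0xFFFF,
   is 0x10FFEF, just over 1 MB. With A20 disabled, bit 20 is forced to 0,
   which reproduces the 8086 wrap-around to low memory. */
uint32_t realmode_addr(uint16_t seg, uint16_t off, int a20_enabled) {
  uint32_t addr = ((uint32_t)seg << 4) + off;   /* seg * 16 + off */
  if (!a20_enabled)
    addr &= ~(1u << 20);                        /* drop the 21st bit */
  assert(addr <= 0x10FFEFu);
  return addr;
}
```

For example, 0xFFFF:0x0010 is 0x100000 with A20 enabled, but wraps to 0x0 with A20 disabled.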
Real mode Vs protected mode

Real mode: 16-bit registers

Protected mode

32-bit registers

can address up to 2^32 bytes of memory

Can do arithmetic in 32 bits

Segment registers hold a selector: an index into a segment
descriptor table
Segments in protected mode

Xv6 makes almost (!= zero) no use of segmentation, and relies only on paging.
More later.
  lgdt    gdtdesc
  ...
# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
  SEG_NULLASM                             # null seg
  SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg
  SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg

gdtdesc:
  .word   (gdtdesc - gdt - 1)             # sizeof(gdt) - 1
  .long   gdt

lgdt loads the processor’s GDT register with the value gdtdesc,
which points to the table gdt.

The table gdt has a null entry, one entry for executable code, and one
entry for data. All segments have a base address of zero and the maximum
possible limit.

The code segment descriptor has a flag set that indicates that the
code should run in 32-bit mode.

With this setup, when the boot loader enters protected mode,
logical addresses map one-to-one to physical addresses.
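Each SEG_ASM(type, base, lim) line expands to an 8-byte segment descriptor. A C sketch of the same bit packing, assuming xv6's asm.h encoding (4 KB granularity, 32-bit default operand size, present, DPL 0); seg_desc() is a hypothetical name. For base 0 and limit 0xffffffff it yields the familiar flat-model values 0x00CF9A000000FFFF (code) and 0x00CF92000000FFFF (data):

```c
#include <assert.h>
#include <stdint.h>

#define STA_X 0x8   /* Executable segment */
#define STA_W 0x2   /* Writeable (non-executable segments) */
#define STA_R 0x2   /* Readable (executable segments) */

uint64_t seg_desc(uint32_t type, uint32_t base, uint32_t lim) {
  assert(type <= 0xF);
  uint64_t d = 0;
  d |= (uint64_t)((lim >> 12) & 0xFFFF);              /* limit 15:0, in 4K units */
  d |= (uint64_t)(base & 0xFFFF) << 16;               /* base 15:0 */
  d |= (uint64_t)((base >> 16) & 0xFF) << 32;         /* base 23:16 */
  d |= (uint64_t)(0x90 | type) << 40;                 /* P=1, DPL=0, S=1, type */
  d |= (uint64_t)(0xC0 | ((lim >> 28) & 0xF)) << 48;  /* G=1, D=1, limit 19:16 */
  d |= (uint64_t)((base >> 24) & 0xFF) << 56;         /* base 31:24 */
  return d;
}
```

The GDT then holds three such 8-byte entries, so gdtdesc's limit, sizeof(gdt) - 1, is 23.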
bootasm.S after “lgdt gdtdesc”
till jump to “entry”

Still Logical Address = Physical address!
But now with the GDT in the picture, and in Protected Mode operation:
logical address (offset) + base of the segment selected by CS, SS, etc. = physical address

GDT
Selector  Base  Limit  Permissions
0         0     0      0 (null)
1         0     4GB    Read, Execute   <- CS
2         0     4GB    Write           <- DS, SS

During this time, the kernel is loaded from the
ELF file into physical memory: addresses in the “kernel” file
translate to the same physical addresses!
Enable protected mode
  movl    %cr0, %eax
  orl     $CR0_PE, %eax
  movl    %eax, %cr0

The boot loader enables protected mode by setting bit 0
(CR0_PE) in register %cr0
Complete transition to 32 bit
mode
  ljmp    $(SEG_KCODE<<3), $start32

Complete the transition to 32-bit protected mode by using a long jmp
to reload %cs and %eip. The segment descriptors are set up with no
translation, so that the mapping is still the identity mapping.
Jumping to “C” code
start32:
  movw    $(SEG_KDATA<<3), %ax    # Our data segment selector
  movw    %ax, %ds                # -> DS: Data Segment
  movw    %ax, %es                # -> ES: Extra Segment
  movw    %ax, %ss                # -> SS: Stack Segment
  movw    $0, %ax                 # Zero segments not ready for use
  movw    %ax, %fs                # -> FS
  movw    %ax, %gs                # -> GS

  # Set up the stack pointer and call into C.
  movl    $start, %esp
  call    bootmain

Set up the data, extra and stack segments with SEG_KDATA

Move “$start” = 0x7c00 to the stack pointer; the stack
will grow down from 0x7c00 towards 0x0000

Call bootmain(), a C function
In bootmain.c
bootmain(): already in memory, as
part of ‘bootblock’

bootmain.c expects to find a copy of the kernel executable
on the disk starting at the second sector (sector = 1).

Why? Sector 0 holds the boot block itself (see the dd commands for xv6.img)

The kernel is an ELF format binary

bootmain loads the first 4096 bytes of the ELF binary. It
places the in-memory copy at address 0x10000

void
bootmain(void)
{
  struct elfhdr *elf;
  struct proghdr *ph, *eph;
  void (*entry)(void);
  uchar* pa;

  elf = (struct elfhdr*)0x10000;  // scratch space

  // Read 1st page off disk
  readseg((uchar*)elf, 4096, 0);
bootmain()

Check if it’s really ELF or not

Next, load the kernel code from the ELF file
“kernel” into memory

  // Is this an ELF executable?
  if(elf->magic != ELF_MAGIC)
    return;  // let bootasm.S handle error
struct elfhdr {
  uint magic;       // must equal ELF_MAGIC
  uchar elf[12];
  ushort type;
  ushort machine;
  uint version;
  uint entry;
  uint phoff;       // where the program header table is
  uint shoff;
  uint flags;
  ushort ehsize;
  ushort phentsize;
  ushort phnum;     // number of program header entries
  ushort shentsize;
  ushort shnum;
  ushort shstrndx;
};
// Program header
struct proghdr {
  uint type;    // Loadable segment, dynamic linking information,
                // interpreter information, thread-local storage template, etc.
  uint off;     // Offset of the segment in the file image.
  uint vaddr;   // Virtual address of the segment in memory.
  uint paddr;   // Physical address to load this segment, if PA is relevant
  uint filesz;  // Size in bytes of the segment in the file image.
  uint memsz;   // Size in bytes of the segment in memory. May be 0.
  uint flags;
  uint align;
};
Run ‘objdump -x -a kernel | head -15’ and see this

kernel:     file format elf32-i386
kernel
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0010000c

Program Header:
    LOAD off    0x00001000 vaddr 0x80100000 paddr 0x00100000 align 2**12
         filesz 0x0000a516 memsz 0x000154a8 flags rwx
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
         filesz 0x00000000 memsz 0x00000000 flags rwx

LOAD: the code is to be loaded at virtual address KERNLINK (= KERNBASE + EXTMEM),
physical address 0x100000

The difference between memsz and filesz will be filled with zeroes in memory

STACK: everything zeroes
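The numbers in this program header explain two later facts: the size of the zero-filled part of the segment, and the value of the kernel symbol end used by kinit1(). A small C check, assuming the values printed above (they change with every build):

```c
#include <assert.h>
#include <stdint.h>

#define PH_VADDR  0x80100000u
#define PH_PADDR  0x00100000u
#define PH_FILESZ 0x0000a516u
#define PH_MEMSZ  0x000154a8u

/* Bytes not stored in the file (e.g. .bss) that bootmain zero-fills. */
uint32_t zero_fill_bytes(void) {
  assert(PH_MEMSZ >= PH_FILESZ);
  return PH_MEMSZ - PH_FILESZ;
}

/* Virtual address just past the loaded image: the symbol 'end'. */
uint32_t image_end_va(void) { return PH_VADDR + PH_MEMSZ; }
```

So about 44 KB are zero-filled, and the image ends at 0x801154a8, the value of end that appears later in the free-list slide.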
Load code from ELF to memory

  // Load each program segment (ignores ph flags).
  ph = (struct proghdr*)((uchar*)elf + elf->phoff);
  eph = ph + elf->phnum;     // Abhijit: one past the last program header
  for(; ph < eph; ph++){     // Abhijit: iterate over each program header
    pa = (uchar*)ph->paddr;  // Abhijit: the physical address to load program
    /* Abhijit: read ph->filesz bytes into 'pa',
       from ph->off in kernel/disk */
    readseg(pa, ph->filesz, ph->off);
    if(ph->memsz > ph->filesz)
      stosb(pa + ph->filesz, 0, ph->memsz - ph->filesz);  // Zero the remainder section
  }
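The loading loop can be tried outside the boot environment. A sketch, assuming a little-endian host (like x86) and using hypothetical byte arrays: file stands in for the kernel on disk and mem for physical memory, while memcpy and memset play the roles of readseg() and stosb(). load_elf() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457FU   /* "\x7FELF" read as a little-endian uint */

/* Same layouts as the elfhdr / proghdr structs above (32-bit ELF). */
struct elfhdr {
  uint32_t magic; uint8_t elf[12]; uint16_t type, machine; uint32_t version;
  uint32_t entry, phoff, shoff, flags;
  uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};
struct proghdr {
  uint32_t type, off, vaddr, paddr, filesz, memsz, flags, align;
};

/* Returns the entry point, or 0 if 'file' is not an ELF image. */
uint32_t load_elf(const uint8_t *file, uint8_t *mem, size_t memlen) {
  struct elfhdr elf;
  memcpy(&elf, file, sizeof elf);               /* "read 1st page" */
  if (elf.magic != ELF_MAGIC)
    return 0;
  for (uint16_t i = 0; i < elf.phnum; i++) {
    struct proghdr ph;
    memcpy(&ph, file + elf.phoff + i * sizeof ph, sizeof ph);
    assert(ph.memsz >= ph.filesz && (size_t)ph.paddr + ph.memsz <= memlen);
    memcpy(mem + ph.paddr, file + ph.off, ph.filesz);             /* readseg */
    memset(mem + ph.paddr + ph.filesz, 0, ph.memsz - ph.filesz);  /* stosb */
  }
  return elf.entry;
}
```

Note that, like bootmain, the loop uses paddr (not vaddr) as the destination, which is what lets the kernel be loaded before paging is on.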
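Jump to Entry
  // Call the entry point from the ELF header.
  // Does not return!
  /* Abhijit:
   * elf->entry was set by the linker: kernel.ld says ENTRY(_start), and
   * entry.S defines _start = V2P_WO(entry), the physical address of
   * entry (0x0010000c, the "start address" in the objdump output),
   * because paging is not on yet. The kernel itself is linked at
   * 0x80100000, as specified in kernel.ld.
   * See kernel.asm for kernel assembly code.
   */
  entry = (void(*)(void))(elf->entry);
  entry();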
Before code of entry(), Reminder:
with 4 MB pages (PTE_PS set in the PDE), memory translation uses only the page directory:
the top 10 bits of the linear address (PDX) select a PDE, which gives the physical
base of a 4 MB page; the low 22 bits are the offset within that page.

entrypgdir in main.c is used by entry()
#define PTE_P       0x001   // Present
#define PTE_W       0x002   // Writeable
#define PTE_U       0x004   // User
#define PTE_PS      0x080   // Page Size
#define PDXSHIFT    22      // offset of PDX in a linear address

__attribute__((__aligned__(PGSIZE)))
pde_t entrypgdir[NPDENTRIES] = {
  // Map VA's [0, 4MB) to PA's [0, 4MB)
  [0] = (0) | PTE_P | PTE_W | PTE_PS,
  // Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB). This is entry 512
  [KERNBASE>>PDXSHIFT] = (0) | PTE_P | PTE_W | PTE_PS,
};
This is the entry page directory, used during entry() at the beginning of the kernel.
Mapping 0:0x400000 (i.e. 0:4MB) to physical addresses 0:0x400000 is required as long
as entry is executing at low addresses, but will eventually be removed.
This mapping restricts the kernel instructions and data to 4 Mbytes.
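With PTE_PS set, a page directory entry maps a 4 MB page directly, so translation needs only one lookup. A C sketch of that lookup over a copy of entrypgdir, assuming 32-bit linear addresses; translate4m() is hypothetical and returns -1 where the hardware would raise a page fault:

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P      0x001
#define PTE_W      0x002
#define PTE_PS     0x080
#define PDXSHIFT   22
#define NPDENTRIES 1024
#define KERNBASE   0x80000000u

typedef uint32_t pde_t;

/* The same two entries as entrypgdir. */
static const pde_t entrypgdir[NPDENTRIES] = {
  [0] = (0) | PTE_P | PTE_W | PTE_PS,
  [KERNBASE >> PDXSHIFT] = (0) | PTE_P | PTE_W | PTE_PS,
};

int64_t translate4m(const pde_t *pgdir, uint32_t va) {
  pde_t pde = pgdir[va >> PDXSHIFT];   /* top 10 bits: directory index */
  if (!(pde & PTE_P))
    return -1;                         /* not present: page fault */
  assert(pde & PTE_PS);                /* only 4 MB pages in this sketch */
  return (int64_t)((pde & 0xFFC00000u) | (va & 0x003FFFFFu)); /* base | 22-bit offset */
}
```

Both 0x0010000c (where entry() starts running) and 0x8010000c (where main() and the rest of the kernel are linked) land on the same physical byte.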
entry() in entry.S
entry:
  # Turn on page size extension for 4Mbyte pages
  movl    %cr4, %eax
  orl     $(CR4_PSE), %eax
  movl    %eax, %cr4
  # Set page directory (4 MB pages; temporary only, more later)
  movl    $(V2P_WO(entrypgdir)), %eax
  movl    %eax, %cr3
  # Turn on paging.
  movl    %cr0, %eax
  orl     $(CR0_PG|CR0_WP), %eax
  movl    %eax, %cr0

  # Set up the stack pointer.
  movl $(stack + KSTACKSIZE), %esp

  # Jump to main(), and switch to executing at
  # high addresses. The indirect call is needed because
  # the assembler produces a PC-relative instruction
  # for a direct jump.
  mov $main, %eax
  jmp *%eax
More about entry()
  movl    $(V2P_WO(entrypgdir)), %eax
  movl    %eax, %cr3

V2P is simple: subtract 0x80000000, i.e. KERNBASE, from the address

-> Here we use the physical address (via V2P_WO) because paging is not
turned on yet
More about entry()
  movl    %cr0, %eax
  orl     $(CR0_PG|CR0_WP), %eax
  movl    %eax, %cr0

This turns on paging

But we have already set the 0th entry in the page directory
to map to physical address 0, so it still works!

After this, entry() is still running and the processor is still
executing code at lower addresses
entry()

  # Set up the stack pointer.
  # Abhijit: +KSTACKSIZE is done as stack grows downwards
  movl $(stack + KSTACKSIZE), %esp

  # Jump to main(), and switch to executing at
  # high addresses. The indirect call is needed because
  # the assembler produces a PC-relative instruction
  # for a direct jump.
  mov $main, %eax
  jmp *%eax

.comm stack, KSTACKSIZE   # Abhijit: allocate here 'stack' of size = KSTACKSIZE
From entry: till inside main(), before kvmalloc()

[Figure: address translation in this phase]
Logical address (offset) --> GDT segment selected by CS, SS, etc.:
  Selector  Base  Limit  Permissions
  0         0     0      0
  1         0     4GB    Read, Execute   <- CS
  2         0     4GB    Write           <- DS, SS
--> linear address = logical address
--> paging with CR3 = entrypgdir; a linear address is split into Dir | Offset:
  entrypgdir[0]   = 0 | P, W, PS   (VA 0..4MB             -> PA 0..4MB)
  entrypgdir[512] = 0 | P, W, PS   (VA KERNBASE..+4MB     -> PA 0..4MB)
--> physical address in RAM

Even now, every low logical address (below 4MB) = the same physical address,
but through the page directory entrypgdir; KERNBASE + x maps to physical x
The code from bootasm.S and bootmain.c is over!
The kernel is loaded.
Now the kernel is going to prepare itself
main() in main.c

Initializes the “free list” of page frames
(in 2 steps. Why?)

Sets up the page table for the kernel

Detects the configuration of all processors

Starts all processors, just like the first processor

Creates the first process!

Initializes
  LAPIC on each processor, IOAPIC
  Disables PIC
  “Console” hardware (the standard I/O)
  Serial Port
  Interrupt Descriptor Table
  Buffer Cache
  Files Table
  Hard Disk (IDE)
main() in main.c
int
main(void)
{
  kinit1(end, P2V(4*1024*1024)); // phys page allocator
  kvmalloc();      // kernel page table
  ...
}

void
kinit1(void *vstart, void *vend)
{
  initlock(&kmem.lock, "kmem");
  kmem.use_lock = 0;
  freerange(vstart, vend);
}

void
freerange(void *vstart, void *vend)
{
  char *p;
  p = (char*)PGROUNDUP((uint)vstart);
  for(; p + PGSIZE <= (char*)vend; p += PGSIZE)
    kfree(p);
}

void
kfree(char *v)
{
  struct run *r;

  if((uint)v % PGSIZE || v < end || V2P(v) >= PHYSTOP)
    panic("kfree");

  // Fill with junk to catch dangling refs.
  memset(v, 1, PGSIZE);

  if(kmem.use_lock)
    acquire(&kmem.lock);
  r = (struct run*)v;
  r->next = kmem.freelist;
  kmem.freelist = r;
  if(kmem.use_lock)
    release(&kmem.lock);
}
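Free List in XV6, obtained after main() -> kinit1()

Pages obtained between
end = 0x801154a8 (≈ 2049 MB) and P2V(4MB) = 0x80400000 (2052 MB)
Remember
Right now these addresses are translated through entrypgdir (virtual KERNBASE + x = physical x).

struct run {
  struct run *next;
};

struct {
  struct spinlock lock;
  int use_lock;
  struct run *freelist;
} kmem;

kmem.freelist --> run --> run --> run --> ...
Seen like this (a linked list), but actually each struct run lies
independently in memory, inside the free page it describes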
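The free list costs no extra memory because each free page stores the next pointer in its own first bytes. A user-space sketch of kfree()/freerange() plus the matching kalloc(), assuming no locking (as during kinit1(), where use_lock = 0) and a static page-aligned arena in place of physical memory; pages_in_range() is a hypothetical helper that counts the pages freerange() would free:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096u

struct run { struct run *next; };
static struct { struct run *freelist; } kmem;

void kfree(char *v) {
  assert((uintptr_t)v % PGSIZE == 0);  /* must be page aligned */
  memset(v, 1, PGSIZE);                /* fill with junk to catch dangling refs */
  struct run *r = (struct run *)v;     /* the page itself holds the list node */
  r->next = kmem.freelist;             /* push on the front */
  kmem.freelist = r;
}

char *kalloc(void) {
  struct run *r = kmem.freelist;       /* pop from the front */
  if (r)
    kmem.freelist = r->next;
  return (char *)r;
}

void freerange(void *vstart, void *vend) {
  /* PGROUNDUP(vstart), done on integers to avoid out-of-range pointers */
  uintptr_t p = ((uintptr_t)vstart + PGSIZE - 1) & ~(uintptr_t)(PGSIZE - 1);
  for (; p + PGSIZE <= (uintptr_t)vend; p += PGSIZE)
    kfree((char *)p);
}

uint32_t pages_in_range(uint32_t vstart, uint32_t vend) {
  uint32_t p = (vstart + PGSIZE - 1) & ~(PGSIZE - 1);
  return p >= vend ? 0 : (vend - p) / PGSIZE;
}
```

Since pages are pushed on the front, kalloc() hands out the highest freed page first. For end = 0x801154a8 and P2V(4MB) = 0x80400000, pages_in_range() gives 746 pages for kinit1().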


Back to main()
int
main(void)
{
  kinit1(end, P2V(4*1024*1024)); // phys page allocator
  kvmalloc();      // kernel page table
  ...
}

// Allocate one page table for the machine for the kernel address
// space for scheduler processes.
void
kvmalloc(void)
{
  kpgdir = setupkvm();   // global var kpgdir
  switchkvm();
}
pde_t*
setupkvm(void)
{
  pde_t *pgdir;
  struct kmap *k;

  if((pgdir = (pde_t*)kalloc()) == 0)
    return 0;
  memset(pgdir, 0, PGSIZE);
  if (P2V(PHYSTOP) > (void*)DEVSPACE)
    panic("PHYSTOP too high");
  for(k = kmap; k < &kmap[NELEM(kmap)]; k++)
    if(mappages(pgdir, k->virt, k->phys_end - k->phys_start,
                (uint)k->phys_start, k->perm) < 0) {
      freevm(pgdir);
      return 0;
    }
  return pgdir;
}
static struct kmap {
void *virt;
uint phys_start;
uint phys_end;
int perm;
} kmap[] = {
{ (void*)KERNBASE, 0, EXTMEM, PTE_W}, // I/O space
{ (void*)KERNLINK, V2P(KERNLINK), V2P(data), 0}, // kern text+rodata
{ (void*)data, V2P(data), PHYSTOP, PTE_W}, // kern data+memory
{ (void*)DEVSPACE, DEVSPACE, 0, PTE_W}, // more devices
};
The kmap[] mappings are installed by kvmalloc() (through setupkvm() and mappages()). The figure shows them region by region;
for each region, entries are made in the page directory and page tables for the corresponding VA -> PA mappings

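[Figure: kernel virtual address space set up by kvmalloc(), and the physical memory it maps]
Virtual address range                        Contents                    Physical address range
0 .. KERNBASE (2048 MB)                      Process address space       (per process)
KERNBASE (2048 MB) .. KERNLINK (2049 MB)     I/O space                   0 .. EXTMEM (1 MB)
KERNLINK (2049 MB) .. data (2049.03125 MB)   Kernel code + RO data       1 MB .. 1.03125 MB
data .. KERNBASE+PHYSTOP (2272 MB)           Kernel data + free memory   1.03125 MB .. PHYSTOP (224 MB)
KERNBASE+PHYSTOP .. DEVSPACE (3.96 GB)       Unmapped                    (224 MB .. DEVSPACE unused)
DEVSPACE (3.96 GB) .. 4 GB                   Memory-mapped devices       DEVSPACE .. 4 GB

data = 0x80108000 = 2049.03125 MB is obtained from
kernel.sym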
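The table can also be read as a lookup: a kernel virtual address falls into one kmap[] region and is translated to phys_start plus its offset within the region. A sketch of that lookup, assuming 32-bit unsigned arithmetic (which is why DEVSPACE's phys_end of 0 yields a size of 0x2000000) and the data value 0x80108000 from one kernel.sym (DATA is a hypothetical macro for it); kmap_v2p() is hypothetical, and in xv6 the translation is really done by the MMU through the page tables built by mappages():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXTMEM   0x100000u
#define PHYSTOP  0xE000000u
#define DEVSPACE 0xFE000000u
#define KERNBASE 0x80000000u
#define KERNLINK (KERNBASE + EXTMEM)
#define DATA     0x80108000u          /* assumption: 'data' from one kernel.sym */
#define V2P(a)   ((uint32_t)(a) - KERNBASE)
#define PTE_W    0x002

struct kmap { uint32_t virt, phys_start, phys_end; int perm; };

static const struct kmap kmap[] = {
  { KERNBASE, 0,             EXTMEM,    PTE_W }, /* I/O space */
  { KERNLINK, V2P(KERNLINK), V2P(DATA), 0     }, /* kern text+rodata */
  { DATA,     V2P(DATA),     PHYSTOP,   PTE_W }, /* kern data+memory */
  { DEVSPACE, DEVSPACE,      0,         PTE_W }, /* more devices */
};

/* Returns the physical address for va, or -1 if va is not mapped. */
int64_t kmap_v2p(uint32_t va) {
  for (size_t i = 0; i < sizeof kmap / sizeof kmap[0]; i++) {
    uint32_t size = kmap[i].phys_end - kmap[i].phys_start; /* wraps for DEVSPACE */
    if (va - kmap[i].virt < size)      /* unsigned: also rejects va < virt */
      return (int64_t)kmap[i].phys_start + (va - kmap[i].virt);
  }
  return -1;
}
```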
After kvmalloc() in main()

[Figure: address translation after kvmalloc()]
Logical address (offset) --> GDT (same entries: 0 null; 1 base 0, limit 4GB, Read, Execute; 2 base 0, limit 4GB, Write)
--> linear address = logical address
--> two-level paging with CR3 = kpgdir: Dir | Page | Offset --> page directory --> page table
--> physical address in RAM

Now Linear Address = Logical Address != Physical Address
main()->mpinit()

Multiprocessor and APIC
configuration

Reference: Intel Multiprocessor Specification

https://web.archive.org/web/20121002210153/http://download.intel.com/design/archives/processors/pro/docs/24201606.pdf
Intel MultiProcessor Specification (MPS)

Note: Advanced systems today use the Advanced Configuration and Power
Interface (ACPI), and not MPS

[Figure 3-1. System Memory Address Map, from the MPS document above]
mpinit()

mpsearch() gives mp (struct mp *mp, at physical address physaddr)

mp gives mpconf (struct mpconf *conf)

mpconf gives the LAPIC address (lapicaddr)

Each optional entry in mpconf gives CPU info:
  MP_PROC: CPU LAPIC ID, CPU signature
  MP_PROC: CPU LAPIC ID, CPU signature
  MP_IOAPIC: IOAPIC ID, etc.

global uint *lapic


After mpinit()

cpus[] (one entry per processor)
  0: apicid = Z
  1: apicid = X
  2: apicid = W

global int ioapicid (IOAPIC number)

global uint *lapic --> LAPIC variables: some place in memory (memory-mapped
devices)
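mpsearch() finds the MP floating pointer structure by scanning memory in 16-byte steps for the signature “_MP_” whose 16 bytes sum to 0 modulo 256. A sketch of that scan over a byte array, assuming the 16-byte size of the structure; mp_search() is hypothetical and returns an offset instead of a pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MP_STRUCT_SIZE 16   /* size of the MP floating pointer structure */

static unsigned char byte_sum(const unsigned char *p, size_t len) {
  unsigned char s = 0;
  for (size_t i = 0; i < len; i++)
    s += p[i];              /* wraps modulo 256 */
  return s;
}

/* Offset of the first valid "_MP_" structure in buf, or -1. */
long mp_search(const unsigned char *buf, size_t len) {
  assert(buf != NULL);
  for (size_t off = 0; off + MP_STRUCT_SIZE <= len; off += MP_STRUCT_SIZE)
    if (memcmp(buf + off, "_MP_", 4) == 0 &&
        byte_sum(buf + off, MP_STRUCT_SIZE) == 0)
      return (long)off;
  return -1;
}
```

The checksum rule is why a stray “_MP_” string in memory is not mistaken for the real structure.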
main() -> lapicinit()

Enable the Local APIC

Set the timer to generate a periodic interrupt
(about every 10ms according to these notes; the real period depends on the
bus frequency, since xv6 does not calibrate TICR)
  lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
  lapicw(TICR, 10000000);

Disable some un-necessary interrupts

Enable interrupts on the APIC (not on the CPU)

static void
lapicw(int index, int value)
{
  lapic[index] = value;   // Abhijit: lapic was set in mpinit()
  lapic[ID];              // wait for write to finish, by reading
}
main()->seginit()

Re-initialize GDT

Once and forever now

Just set 4 entries

All spanning 4 GB

Differing only in permissions and privilege level
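The four segments become usable through selectors: the GDT index shifted left by 3, with the requested privilege level in the low two bits. The same two bits of %cs hold the CPL. A sketch, assuming the GDT indices used by seginit() (1 and 2 for kernel code/data, 3 and 4 for user code/data) under xv6's names for them; make_selector() and cpl_of() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SEG_KCODE 1
#define SEG_KDATA 2
#define SEG_UCODE 3
#define SEG_UDATA 4
#define DPL_USER  3

/* selector = index << 3 | TI << 2 | RPL, with TI = 0 meaning "GDT". */
uint16_t make_selector(unsigned index, unsigned rpl) {
  assert(rpl <= 3);
  return (uint16_t)((index << 3) | rpl);
}

/* The current privilege level is the low two bits of %cs. */
unsigned cpl_of(uint16_t cs) { return cs & 3u; }
```

So kernel code runs with %cs = 0x08 (CPL 0) and user code with %cs = 0x1B (CPL 3).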
After seginit() in main(), on the processor where we started booting

[Figure: address translation after seginit()]
GDT
Selector  Base  Limit  Permissions     DPL
0         0     0      0               0
1         0     4GB    Read, Execute   0     (kernel code)
2         0     4GB    Write           0     (kernel data)
3         0     4GB    Read, Execute   3     (user code)
4         0     4GB    Write           3     (user data)

Logical address (offset) --> linear address --> page directory | page table (CR3 = kpgdir)
--> physical address in RAM

Now Linear Address = Logical Address != Physical Address
main()->picinit()
//Abhijit: Small code. Just disable 8259A interrupt controller
// Don't use the 8259A interrupt controllers. Xv6 assumes SMP hardware.
void
picinit(void)
{
// mask all interrupts
outb(IO_PIC1+1, 0xFF);
outb(IO_PIC2+1, 0xFF);
}
main()->ioapicinit()
void
ioapicinit(void)
{
  int i, id, maxintr;

  /* Abhijit: global variable set to IOAPIC */
  ioapic = (volatile struct ioapic*)IOAPIC;
  maxintr = (ioapicread(REG_VER) >> 16) & 0xFF;
  id = ioapicread(REG_ID) >> 24;
  if(id != ioapicid)
    cprintf("ioapicinit: id isn't equal to ioapicid; not a MP\n");

  // Mark all interrupts edge-triggered, active high, disabled,
  // and not routed to any CPUs.
  for(i = 0; i <= maxintr; i++){
    ioapicwrite(REG_TABLE+2*i, INT_DISABLED | (T_IRQ0 + i));
    ioapicwrite(REG_TABLE+2*i+1, 0);
  }
}

The location of the ioapic is fixed:
#define IOAPIC 0xFEC00000

The loop sets up all interrupts up to maxintr, but leaves them disabled and
not routed to any CPU; individual IRQs are enabled later with ioapicenable()
(e.g. by consoleinit() and uartinit())
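Each IRQ owns two 32-bit redirection-table registers: REG_TABLE+2*irq (mask bit and vector) and REG_TABLE+2*irq+1 (destination CPU in the top byte). A sketch on a plain array, assuming the constants from xv6's ioapic.c and traps.h; ioapic_regs, ioapic_init_sim() and ioapic_enable_sim() are hypothetical names mirroring the loop above and xv6's ioapicenable():

```c
#include <assert.h>
#include <stdint.h>

#define REG_TABLE    0x10        /* redirection table base */
#define INT_DISABLED 0x00010000u /* mask bit in a redirection entry */
#define T_IRQ0       32          /* IRQ 0 corresponds to int T_IRQ0 */

/* Stand-in for the registers xv6 reaches through the window at 0xFEC00000. */
static uint32_t ioapic_regs[REG_TABLE + 2 * 24];

static void ioapicwrite(int reg, uint32_t data) {
  assert(reg >= 0 && reg < (int)(sizeof ioapic_regs / sizeof ioapic_regs[0]));
  ioapic_regs[reg] = data;
}

/* Every entry disabled, vector T_IRQ0+i, routed to no CPU. */
void ioapic_init_sim(int maxintr) {
  for (int i = 0; i <= maxintr; i++) {
    ioapicwrite(REG_TABLE + 2 * i, INT_DISABLED | (T_IRQ0 + i));
    ioapicwrite(REG_TABLE + 2 * i + 1, 0);
  }
}

/* Clear the mask bit and route the IRQ to the CPU with this APIC ID. */
void ioapic_enable_sim(int irq, unsigned cpunum) {
  ioapicwrite(REG_TABLE + 2 * irq, T_IRQ0 + irq);
  ioapicwrite(REG_TABLE + 2 * irq + 1, cpunum << 24);
}
```

For instance, consoleinit()'s ioapicenable(IRQ_KBD, 0) corresponds to enabling IRQ 1 on CPU 0, delivered as interrupt 33.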
main()->consoleinit()
#define NDEV 10
#define CONSOLE 1

// table mapping major device number to
// device functions
struct devsw {
  int (*read)(struct inode*, char*, int);
  int (*write)(struct inode*, char*, int);
};
struct devsw devsw[NDEV];

void
consoleinit(void)
{
  initlock(&cons.lock, "console");

  devsw[CONSOLE].write = consolewrite;
  devsw[CONSOLE].read = consoleread;
  cons.locking = 1;

  ioapicenable(IRQ_KBD, 0);
}

devsw: an array of {read, write} function pointers, indexed by major number
Console handling in xv6
Device Files

Are files: they have an inode on disk, of type “Device” (T_DEV).

They store no data.

The inode has a “major” and a “minor” number

The major number is an index into “devsw”

The “devsw” entry gives the “read” and “write” functions for that
device

The minor number identifies a device amongst many of that
type (used in the code of devsw.read, devsw.write)

sys_read() and sys_write() redirect the request (through readi() and writei()) to
devsw.read and devsw.write
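The dispatch by major number fits in a few lines of C. A sketch, assuming a stripped-down struct inode with only type, major and minor, and a hypothetical console driver that always returns “hi”; console_init_sim() and dev_read() are hypothetical names that mirror consoleinit() and the T_DEV branch of readi():

```c
#include <assert.h>
#include <string.h>

#define NDEV    10
#define CONSOLE 1
#define T_DEV   3

/* Stripped-down inode: only what the dispatch needs. */
struct inode { short type; short major; short minor; };

struct devsw {
  int (*read)(struct inode *, char *, int);
  int (*write)(struct inode *, char *, int);
};
static struct devsw devsw[NDEV];

/* Hypothetical console driver: always "reads" the two bytes "hi". */
static int fake_consoleread(struct inode *ip, char *dst, int n) {
  (void)ip;                      /* a real driver could look at ip->minor */
  const char *msg = "hi";
  int len = (int)strlen(msg);
  if (n > len)
    n = len;
  memcpy(dst, msg, (size_t)n);
  return n;
}

/* Like consoleinit(): register the driver under its major number. */
void console_init_sim(void) { devsw[CONSOLE].read = fake_consoleread; }

/* Choose the driver by the inode's major number. */
int dev_read(struct inode *ip, char *dst, int n) {
  assert(ip->type == T_DEV);
  if (ip->major < 0 || ip->major >= NDEV || !devsw[ip->major].read)
    return -1;
  return devsw[ip->major].read(ip, dst, n);
}
```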
Console handling in xv6
//Uses the array input
#define INPUT_BUF 128
struct {
  char buf[INPUT_BUF];
  uint r;  // Read index
  uint w;  // Write index
  uint e;  // Edit index
} input;

consoleread()
  Waits for data in ‘buf’
  Data is put into ‘buf’ by ‘consoleintr’, the interrupt handler
  called when keys are pressed

consolewrite()
  Uses uartputc(), a lower level function to write to an I/O port
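The three indices never wrap; they are reduced modulo INPUT_BUF only when indexing buf. A simplified sketch, assuming no editing keys (backspace, ^U, ^D) and a non-blocking reader; key_pressed() and read_line() are hypothetical names for the core of consoleintr() and consoleread():

```c
#include <assert.h>

#define INPUT_BUF 128

/* r: next char to hand to a reader; w: end of the last completed line;
   e: end of the line still being typed. Always r <= w <= e <= r + INPUT_BUF. */
static struct {
  char buf[INPUT_BUF];
  unsigned r, w, e;
} input;

/* Core of consoleintr() for an ordinary key. */
void key_pressed(char c) {
  if (c == 0 || input.e - input.r >= INPUT_BUF)
    return;                              /* buffer full: drop the key */
  input.buf[input.e++ % INPUT_BUF] = c;
  if (c == '\n' || input.e == input.r + INPUT_BUF)
    input.w = input.e;                   /* line complete: readers may consume */
}

/* Core of consoleread(), without sleeping while r == w. */
int read_line(char *dst, int n) {
  int got = 0;
  assert(n >= 0);
  while (got < n && input.r != input.w) {
    char c = input.buf[input.r++ % INPUT_BUF];
    dst[got++] = c;
    if (c == '\n')
      break;                             /* return one line at a time */
  }
  return got;
}
```

Until a newline arrives, typed characters sit between w and e and are invisible to the reader, which is what lets the real consoleintr() implement backspace.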
main() -> uartinit()

UART: Universal Asynchronous Receiver Transmitter

Used for I/O from users

Basically writing to various output ports using the “OUT” instruction

void
uartinit(void)
{
  char *p;

  // Turn off the FIFO
  outb(COM1+2, 0);

  // 9600 baud, 8 data bits, 1 stop bit, parity off.
  outb(COM1+3, 0x80);    // Unlock divisor
  outb(COM1+0, 115200/9600);
  outb(COM1+1, 0);
  outb(COM1+3, 0x03);    // Lock divisor, 8 data bits.
  outb(COM1+4, 0);
  outb(COM1+1, 0x01);    // Enable receive interrupts.

  // If status is 0xFF, no serial port.
  if(inb(COM1+5) == 0xFF)
    return;
  uart = 1;

  // Acknowledge pre-existing interrupt conditions;
  // enable interrupts.
  inb(COM1+2);
  inb(COM1+0);
  ioapicenable(IRQ_COM1, 0);

  // Announce that we're here.
  for(p="xv6...\n"; *p; p++)
    uartputc(*p);
}
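The value 115200/9600 written to COM1+0 is a baud-rate divisor: while the divisor latch is unlocked, COM1+0 and COM1+1 take its low and high bytes. A small sketch that computes them, assuming, as the code above implies, that divisor 1 corresponds to 115200 baud; uart_divisor() is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define UART_MAX_BAUD 115200u   /* rate obtained with divisor 1 */

/* Low and high bytes of the divisor for the requested baud rate. */
void uart_divisor(uint32_t baud, uint8_t *lo, uint8_t *hi) {
  assert(baud > 0 && UART_MAX_BAUD % baud == 0);
  uint32_t div = UART_MAX_BAUD / baud;
  assert(div <= 0xFFFF);        /* must fit the 16-bit divisor latch */
  *lo = (uint8_t)(div & 0xFF);  /* written to COM1+0 */
  *hi = (uint8_t)(div >> 8);    /* written to COM1+1 */
}
```

For 9600 baud this gives 12 and 0, matching the two outb() calls in uartinit().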
main() -> pinit()

Just initialize the lock on process table
struct {
struct spinlock lock;
struct proc proc[NPROC];
} ptable;
void pinit(void) {
initlock(&ptable.lock, "ptable");
}
main() -> tvinit()
Sets all “trap” handlers in the global
struct gatedesc idt[256];

Does NOT load the IDT into the processor; it just stores the handlers
in the idt variable
That will be done later in main()->mpmain()->idtinit(), and
main()->startothers()->mpenter()->mpmain()->idtinit(),
called on each processor (idtinit() executes lidt)
Remember: each processor needs to load the IDT
Handling Traps

Transition from user mode to kernel mode

On a system call

On a hardware interrupt

User program doing illegal work (exception)

Actions needed, particularly w.r.t. hardware
interrupts

Change to kernel mode & switch to kernel stack

Kernel to work with devices, if needed

Kernel to understand interface of device
Handling traps

Actions needed on a trap

Save the processor’s registers (context) for future use

Set up the system to run kernel code (kernel context)
on kernel stack

Start kernel in appropriate place (sys call, intr handler,
etc)

Kernel to get all info related to event (which block I/O
done?, which sys call called, which process did
exception and what type, get arguments to system
call, etc)

Privilege level

The x86 has 4 protection levels, numbered 0
(most privilege) to 3 (least privilege).

In practice, most operating systems use
only 2 levels: 0 and 3, which are then called
kernel mode and user mode, respectively.

The current privilege level (CPL) with which the
x86 executes instructions is stored in the low two bits
of the %cs register (the CPL field).
Privilege level

Changes automatically on
“int” instruction
hardware interrupt
exception

Changes back on
iret

“int 10” --> invokes the handler of vector 10, just as if interrupt 10 had occurred.
So a s/w interrupt can invoke a handler meant for a hardware interrupt (if the gate's DPL allows it)

Xv6 uses “int 64” for actual system calls
Interrupt Descriptor Table (IDT)

The IDT defines the interrupt handlers

Has 256 entries

each giving the %cs and %eip to be used when handling the
corresponding interrupt.

Interrupts 0-31 are defined for software exceptions,
like divide errors or attempts to access invalid
memory addresses.

Xv6 maps the 32 hardware interrupts to the range 32-
63

and uses interrupt 64 as the system call interrupt
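The vector ranges can be summarised with a tiny classifier. This is an illustrative sketch, not kernel code. The constants come from xv6's traps.h, and vector_kind and irq_vector are hypothetical helper names.

```c
#include <assert.h>

/* Vector numbers as in xv6's traps.h. */
#define T_IRQ0    32   /* IRQ 0 is delivered as vector 32 */
#define T_SYSCALL 64
#define IRQ_TIMER  0
#define IRQ_KBD    1
#define IRQ_COM1   4
#define IRQ_IDE   14

enum vkind { V_EXCEPTION, V_HARDWARE, V_SYSCALL, V_UNUSED };

static enum vkind vector_kind(int v) {
  assert(v >= 0 && v < 256);
  if (v < T_IRQ0)      return V_EXCEPTION;  /* 0..31: defined by the CPU */
  if (v < T_IRQ0 + 32) return V_HARDWARE;   /* 32..63: IRQ 0..31 */
  if (v == T_SYSCALL)  return V_SYSCALL;    /* 64: int $64 */
  return V_UNUSED;                          /* everything else */
}

/* Hardware IRQ line n arrives as vector T_IRQ0 + n. */
static int irq_vector(int irq) { return T_IRQ0 + irq; }
```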
Interrupt Descriptor Table (IDT) entries
// Gate descriptors for interrupts and traps
struct gatedesc {
  uint off_15_0 : 16;   // low 16 bits of offset in segment
  uint cs : 16;         // code segment selector
  uint args : 5;        // # args, 0 for interrupt/trap gates
  uint rsv1 : 3;        // reserved(should be zero I guess)
  uint type : 4;        // type(STS_{IG32,TG32})
  uint s : 1;           // must be 0 (system)
  uint dpl : 2;         // descriptor(meaning new) privilege level
  uint p : 1;           // Present
  uint off_31_16 : 16;  // high bits of offset in segment
};
Setting IDT entries
void
tvinit(void)
{
  int i;

  for(i = 0; i < 256; i++)
    SETGATE(idt[i], 0, SEG_KCODE<<3, vectors[i], 0);
  SETGATE(idt[T_SYSCALL], 1, SEG_KCODE<<3, vectors[T_SYSCALL], DPL_USER);
  /* value 1 in second argument --> don't disable interrupts
   * DPL_USER means that processes can raise this interrupt. */

  initlock(&tickslock, "time");
}
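This rough model shows why tvinit() gives only the system-call gate DPL_USER. When user code executes int n, the CPU compares the CPL with the gate's DPL. If CPL > DPL, it raises a general-protection fault (vector 13) instead. The sketch uses a plain int array as a stand-in for the dpl fields of idt[]. The function names are hypothetical, and hardware interrupts (which skip this check) are not modelled.

```c
#include <assert.h>

#define DPL_USER  3
#define T_GPFLT   13   /* general protection fault, as in xv6's traps.h */
#define T_SYSCALL 64

/* Stand-in for the dpl fields tvinit() writes into idt[]:
 * every gate gets DPL 0 except the system-call gate. */
static void tvinit_dpl(int dpl[256]) {
  for (int i = 0; i < 256; i++)
    dpl[i] = 0;
  dpl[T_SYSCALL] = DPL_USER;
}

/* Which vector actually runs when code at privilege cpl executes "int n". */
static int int_instruction(int cpl, const int dpl[256], int n) {
  assert(cpl >= 0 && cpl <= 3 && n >= 0 && n < 256);
  return (cpl <= dpl[n]) ? n : T_GPFLT;
}
```

A user program can therefore make system calls with int $64. If it tries to fake a timer interrupt with int $32, it ends up in the general-protection handler instead.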
Setting IDT entries
#define SETGATE(gate, istrap, sel, off, d) \
{ \
(gate).off_15_0 = (uint)(off) & 0xffff; \
(gate).cs = (sel); \
(gate).args = 0; \
(gate).rsv1 = 0; \
(gate).type = (istrap) ? STS_TG32 : STS_IG32; \
(gate).s = 0; \
(gate).dpl = (d); \
(gate).p = 1; \
(gate).off_31_16 = (uint)(off) >> 16; \
}
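The only non-obvious part of SETGATE is that the 32-bit handler address is split across off_15_0 and off_31_16. The user-space sketch below copies the struct and macro from the notes, with STS_IG32/STS_TG32 values as in xv6's mmu.h. It adds a hypothetical gate_offset() helper that reassembles the address. Bit-field layout is implementation-defined, so this assumes GCC or Clang on x86, where the struct occupies 8 bytes. The handler address used in the test is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned int uint;

#define STS_IG32 0xE   /* 32-bit interrupt gate */
#define STS_TG32 0xF   /* 32-bit trap gate */

struct gatedesc {
  uint off_15_0 : 16;   /* low 16 bits of offset in segment */
  uint cs : 16;         /* code segment selector */
  uint args : 5;        /* # args, 0 for interrupt/trap gates */
  uint rsv1 : 3;        /* reserved (should be zero) */
  uint type : 4;        /* STS_IG32 or STS_TG32 */
  uint s : 1;           /* must be 0 (system) */
  uint dpl : 2;         /* descriptor privilege level */
  uint p : 1;           /* present */
  uint off_31_16 : 16;  /* high bits of offset in segment */
};

#define SETGATE(gate, istrap, sel, off, d)        \
  {                                               \
    (gate).off_15_0 = (uint)(off) & 0xffff;       \
    (gate).cs = (sel);                            \
    (gate).args = 0;                              \
    (gate).rsv1 = 0;                              \
    (gate).type = (istrap) ? STS_TG32 : STS_IG32; \
    (gate).s = 0;                                 \
    (gate).dpl = (d);                             \
    (gate).p = 1;                                 \
    (gate).off_31_16 = (uint)(off) >> 16;         \
  }

/* Hypothetical helper: put the handler address back together. */
static uint32_t gate_offset(const struct gatedesc *g) {
  assert(g->p == 1);   /* only meaningful for a present gate */
  return ((uint32_t)g->off_31_16 << 16) | (uint32_t)g->off_15_0;
}
```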
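Setting IDT entries
vectors.S:
# generated by vectors.pl - do not edit
# handlers
.globl alltraps
.globl vector0
vector0:
  pushl $0
  pushl $0
  jmp alltraps
.globl vector1
vector1:
  pushl $0
  pushl $1
  jmp alltraps
....
Each vectorN pushes a dummy error code 0 and then its vector number N. For the vectors where the CPU itself pushes an error code (8, 10-14 and 17), vectors.pl omits the pushl $0.

trapasm.S:
#include "mmu.h"

# vectors.S sends all traps here.
.globl alltraps
alltraps:
  # Build trap frame.
  pushl %ds
  pushl %es
  pushl %fs
  pushl %gs
  pushal
  ....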
How will interrupts be handled?
On an int instruction/interrupt, the CPU does this:
Fetch the n’th descriptor from the IDT, where n is the argument of int.
Check that CPL in %cs is <= DPL, where DPL is the privilege level in the descriptor (this check applies to the int instruction, not to hardware interrupts).
Save %esp and %ss in CPU-internal registers, but only if the target segment selector’s PL < CPL.
  (This is a switch from user mode to kernel mode, so the user code’s SS and ESP must be saved.)
Load %ss and %esp from a task segment descriptor.
  (The stack changes to the kernel stack now. The TS descriptor is in the GDT, at the index given by the TR register. See switchuvm().)
Push %ss. // optional
Push %esp. // optional: the old ss/esp saved above, pushed only when the privilege level changes
Push %eflags.
Push %cs.
Push %eip.
Clear the IF bit in %eflags, but only on an interrupt (i.e. through an interrupt gate; trap gates leave IF alone).
Set %cs and %eip to the values in the descriptor.
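After “int” has done its job
The IDT was already set up (filled by tvinit(), loaded by idtinit()).
Remember vectors.S: it gives the jump location for each interrupt.
So the CPU jumps to the 64th entry in the vector table:
vector64:
  pushl $0
  pushl $64
  jmp alltraps
So now the stack has ss, esp, eflags, cs, eip, 0 (for error code), 64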

Next, alltraps from trapasm.S runs:
alltraps:
  # Build trap frame.
  pushl %ds
  pushl %es
  pushl %fs
  pushl %gs
  pushal            # push all general-purpose registers

  # Set up data segments.
  movw $(SEG_KDATA<<3), %ax
  movw %ax, %ds
  movw %ax, %es

  # Call trap(tf), where tf=%esp
  pushl %esp        # first arg to trap()
  call trap
  addl $4, %esp
Now the stack contains ss, esp, eflags, cs, eip, 0 (for error code), 64, ds, es, fs, gs, eax, ecx, edx, ebx, oesp, ebp, esi, edi.
This is the struct trapframe!
So the kernel stack now contains the trapframe: the trapframe is a part of the kernel stack.
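The pushes happen in exactly the reverse of the field order of struct trapframe, so %esp ends up pointing at a valid trapframe. The user-space sketch below replays those pushes on a toy word array and reads the top of the stack back as a struct. It makes two assumptions. First, the segment-register fields are widened to 32-bit words. xv6 uses ushort plus padding, which has the same size. Second, the register values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field order as in xv6's x86.h; lowest address first. */
struct trapframe {
  uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax; /* pushal */
  uint32_t gs, fs, es, ds;                          /* alltraps */
  uint32_t trapno, err;                             /* vectorN */
  uint32_t eip, cs, eflags;                         /* CPU */
  uint32_t esp, ss;                                 /* CPU, only on a ring change */
};

/* A toy downward-growing stack of 32-bit words. */
struct stack { uint32_t word[64]; int sp; };

static void push(struct stack *s, uint32_t v) {
  assert(s->sp > 0);
  s->word[--s->sp] = v;
}

/* Replay "int $64" from user mode, then view the top of the stack as a trapframe. */
static struct trapframe simulate_int64(struct stack *s) {
  s->sp = 64;
  push(s, 0x23); push(s, 0x2fe0); push(s, 0x202); /* CPU: ss, esp, eflags */
  push(s, 0x1b); push(s, 0x13);                   /* CPU: cs, eip */
  push(s, 0);    push(s, 64);                     /* vector64: err, trapno */
  push(s, 0x23); push(s, 0x23); push(s, 0); push(s, 0); /* ds, es, fs, gs */
  uint32_t oesp = (uint32_t)s->sp;      /* pushal saves esp as it was before */
  push(s, 7); push(s, 1); push(s, 2); push(s, 3);       /* eax, ecx, edx, ebx */
  push(s, oesp); push(s, 5); push(s, 6); push(s, 8);    /* oesp, ebp, esi, edi */

  struct trapframe tf;
  memcpy(&tf, &s->word[s->sp], sizeof tf);  /* %esp now points at the trapframe */
  return tf;
}
```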
trap()
void
trap(struct trapframe *tf)
{
  if(tf->trapno == T_SYSCALL){
    if(myproc()->killed)
      exit();
    myproc()->tf = tf;
    syscall();
    if(myproc()->killed)
      exit();
    return;
  }

  switch(tf->trapno){
  .....
The argument is the trapframe. In alltraps, just before “call trap”, there was a “pushl %esp”, and %esp pointed at the trapframe.
Remember the calling convention: when a function is called, its arguments are on the stack, pushed in reverse order (here there is only 1 argument).
trap()
Has a switch(tf->trapno). Q: who set this trapno?
Depending on the type of trap, call the interrupt handler:
Timer: wakeup(&ticks)
IDE (disk interrupt): ideintr()
KBD: kbdintr()
COM1: uartintr()
If timer: call yield(), which calls sched().
If the process was killed (how is that done?): call exit()!
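The shape of trap()'s dispatch can be sketched as follows. Counters replace the real handlers, and struct trapstats and dispatch are hypothetical names. The vector constants are those of xv6's traps.h.

```c
#include <assert.h>

#define T_IRQ0    32
#define T_SYSCALL 64
#define IRQ_TIMER  0
#define IRQ_KBD    1
#define IRQ_COM1   4
#define IRQ_IDE   14

/* Hypothetical counters standing in for the real handlers. */
struct trapstats { unsigned ticks, syscalls, ide, kbd, uart, other; };

/* Same shape as trap(): system calls first, then a switch on the vector. */
static void dispatch(struct trapstats *st, unsigned trapno) {
  assert(st);
  if (trapno == T_SYSCALL) { st->syscalls++; return; }   /* syscall() */
  switch (trapno) {
  case T_IRQ0 + IRQ_TIMER: st->ticks++; break;  /* ticks++; wakeup(&ticks) */
  case T_IRQ0 + IRQ_IDE:   st->ide++;   break;  /* ideintr() */
  case T_IRQ0 + IRQ_KBD:   st->kbd++;   break;  /* kbdintr() */
  case T_IRQ0 + IRQ_COM1:  st->uart++;  break;  /* uartintr() */
  default:                 st->other++; break;  /* exception: kill or panic */
  }
}
```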

When trap() returns, the stack has (the trapframe)
ss, esp, eflags, cs, eip, 0 (for error code), 64, ds, es, fs, gs, eax, ecx, edx, ebx, oesp, ebp, esi, edi, and below it the esp pushed as trap()'s argument.
  # Back in alltraps
  call trap
  addl $4, %esp      # drop the esp pushed as trap()'s argument

  # Return falls through to trapret...
.globl trapret
trapret:
  popal              # edi, esi, ebp, oesp (skipped, not loaded), ebx, edx, ecx, eax
  popl %gs           # then gs, fs, es, ds
  popl %fs
  popl %es
  popl %ds
  addl $0x8, %esp    # trapno and errcode (64 and 0)
  iret               # pops eip, cs, eflags, and esp, ss

main()->binit()
Initialize the buffer cache.
Buffers are used for storing data read from disk:
mainly files’ data,
and metadata.
struct buf {
  int flags;
  uint dev;
  uint blockno;
  struct sleeplock lock;
  uint refcnt;
  struct buf *prev; // LRU cache list
  struct buf *next;
  struct buf *qnext; // disk queue
  uchar data[BSIZE];
};
struct {
  struct spinlock lock;
  struct buf buf[NBUF];

  // Linked list of all buffers, through prev/next.
  // head.next is most recently used.
  struct buf head;
} bcache;

binit() code
void
binit(void)
{
  struct buf *b;

  initlock(&bcache.lock, "bcache");

  // Create linked list of buffers
  bcache.head.prev = &bcache.head;
  bcache.head.next = &bcache.head;
  for(b = bcache.buf; b < bcache.buf+NBUF; b++){
    b->next = bcache.head.next;
    b->prev = &bcache.head;
    initsleeplock(&b->lock, "buffer");
    bcache.head.next->prev = b;
    bcache.head.next = b;
  }
}
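A stand-alone version of binit()'s list construction makes the resulting order easy to check. The locks and data fields are removed. It also includes a hypothetical move_to_front(), which performs the same unlink/relink steps that xv6's brelse() uses to mark a buffer as most recently used. NBUF is shrunk to 4 for the example.

```c
#include <assert.h>

#define NBUF 4   /* much larger in xv6; small here */

struct buf { int id; struct buf *prev, *next; };

static struct { struct buf buf[NBUF]; struct buf head; } bcache;

/* Same list-building loop as binit(), without locks. */
static void binit(void) {
  bcache.head.prev = &bcache.head;
  bcache.head.next = &bcache.head;
  for (struct buf *b = bcache.buf; b < bcache.buf + NBUF; b++) {
    b->id = (int)(b - bcache.buf);
    b->next = bcache.head.next;   /* insert right after head */
    b->prev = &bcache.head;
    bcache.head.next->prev = b;
    bcache.head.next = b;
  }
}

/* Unlink b and reinsert it after head (most recently used). */
static void move_to_front(struct buf *b) {
  b->next->prev = b->prev;
  b->prev->next = b->next;
  b->next = bcache.head.next;
  b->prev = &bcache.head;
  bcache.head.next->prev = b;
  bcache.head.next = b;
}

/* Write buffer ids from most to least recently used; return the count. */
static int mru_order(int out[NBUF]) {
  int n = 0;
  for (struct buf *b = bcache.head.next; b != &bcache.head; b = b->next) {
    assert(n < NBUF);
    out[n++] = b->id;
  }
  return n;
}
```

Because each buffer is inserted right after head, the list ends up as buf[3], buf[2], buf[1], buf[0]. head.prev (the least recently used end) is buf[0].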
After main()->binit()
[Figure: struct bcache holds lock, buf[0], buf[1], buf[2], ..., head. Conceptually, the buffers form a circular doubly linked list through head, using each buffer's next (n) and prev (p) pointers.]
main()->fileinit()
struct {
  struct spinlock lock;
  struct file file[NFILE];
} ftable;

void fileinit(void)
{
  initlock(&ftable.lock, "ftable");
}

struct file {
  enum { FD_NONE, FD_PIPE, FD_INODE } type;
  int ref; // reference count
  char readable;
  char writable;
  struct pipe *pipe;
  struct inode *ip;
  uint off;
};
The global array of “struct file” in ftable is used for allocating a “struct file” whenever an open() call is made.
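The ftable is a fixed pool, so allocating a struct file means finding a slot whose reference count is zero. This sketch follows the idea of xv6's filealloc() but leaves out ftable.lock, so it is single-threaded only. NFILE is reduced to 4 and the struct is trimmed to the fields used.

```c
#include <assert.h>
#include <stddef.h>

#define NFILE 4   /* xv6 uses 100 */

/* Trimmed struct file: only what this sketch touches. */
struct file {
  enum { FD_NONE, FD_PIPE, FD_INODE } type;
  int ref;   /* reference count; 0 means the slot is free */
};

static struct { struct file file[NFILE]; } ftable;

/* First free slot, or NULL if the table is full.
 * The real filealloc() holds ftable.lock around this loop. */
static struct file *filealloc(void) {
  for (struct file *f = ftable.file; f < ftable.file + NFILE; f++) {
    assert(f->ref >= 0);
    if (f->ref == 0) {
      f->ref = 1;
      return f;
    }
  }
  return NULL;
}
```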
main() -> ideinit()
Initialize idelock, enable the IDE interrupt, and check if there is a second disk (if so, set havedisk1 = 1).
void
ideinit(void)
{
  int i;

  initlock(&idelock, "ide");
  ioapicenable(IRQ_IDE, ncpu - 1);  // route the IDE interrupt to the last CPU
  idewait(0);

  // Check if disk 1 is present
  outb(0x1f6, 0xe0 | (1<<4));
  for(i=0; i<1000; i++){
    if(inb(0x1f7) != 0){
      havedisk1 = 1;
      break;
    }
  }

  // Switch back to disk 0.
  outb(0x1f6, 0xe0 | (0<<4));
}
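The byte written to port 0x1f6 packs several fields. The sketch below computes it with a hypothetical helper name. The bit meanings follow the classic ATA drive/head register. The same expression shape appears in xv6's idestart() when it issues a request.

```c
#include <assert.h>
#include <stdint.h>

/* Drive/head register value:
 * 0xe0 sets bits 7 and 5 (always 1 on legacy drives) and bit 6 (LBA mode);
 * bit 4 selects drive 0 or 1; bits 3..0 hold LBA bits 27..24. */
static uint8_t ide_drive_head(int drive, uint32_t sector) {
  assert(drive == 0 || drive == 1);
  return (uint8_t)(0xe0 | (drive << 4) | ((sector >> 24) & 0x0f));
}
```

So ideinit() selects disk 1 with 0xf0 and switches back to disk 0 with 0xe0.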
main()->startothers()
Used to turn on all the other processors. When the kernel booted, only one processor was turned on.
Each CPU needs:
its own kernel stack (as it will run the kernel independently)
the kernel page directory (the same kpgdir will be shared)
its GDT to be initialized: same as the first CPU
its LAPIC initialized: same as the first CPU
its IDT initialized: same as the first CPU
to start running a process: call scheduler()
Each CPU is turned ON with lapicstartap() once the above setup is done.
main()->startothers()
static void
startothers(void)
{
  extern uchar _binary_entryother_start[], _binary_entryother_size[];
  uchar *code;
  struct cpu *c;
  char *stack;

  // Write entry code to unused memory at 0x7000.
  // The linker has placed the image of entryother.S in
  // _binary_entryother_start.
  code = P2V(0x7000);
  memmove(code, _binary_entryother_start, (uint)_binary_entryother_size);

  for(c = cpus; c < cpus+ncpu; c++){
    if(c == mycpu())  // We've started already.
      continue;

    // Tell entryother.S what stack to use, where to enter, and what
    // pgdir to use. We cannot use kpgdir yet, because the AP processor
    // is running in low memory, so we use entrypgdir for the APs too.
    stack = kalloc();
    *(void**)(code-4) = stack + KSTACKSIZE;
    *(void(**)(void))(code-8) = mpenter;  // type cast to a funcptr
    *(int**)(code-12) = (void *) V2P(entrypgdir);

    lapicstartap(c->apicid, V2P(code));

    // wait for cpu to finish mpmain()
    while(c->started == 0)
      ;
  }
}
The code in entryother.S is available through linker symbols: _binary_entryother_start represents start in entryother.S.
memmove() copies the code from entryother.S to 0x7000. Question: where is this address in physical memory now?
Iterate over each CPU. For each one:
allocate a kernel stack;
set (code-4) to point to the top of that stack;
set (code-8) to point to mpenter;
set (code-12) to point to (the physical address of) entrypgdir.
All of the above are used by start in entryother.S.
lapicstartap(..., code) turns the CPU ON, so it starts executing at code, i.e. start in entryother.S.
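The handshake through the three words below code can be modelled in user space. In the sketch, a byte array stands in for low memory and CODE stands in for 0x7000. The values written are placeholders, and put32/get32/write_params are hypothetical helpers. In the real system, entryother.S reads these words with movl (start-4), %esp, then with call *(start-8), and with movl (start-12), %eax before loading %cr3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CODE 0x100              /* stands in for 0x7000 */
static uint8_t lowmem[0x200];   /* stands in for low physical memory */

static void put32(uint32_t addr, uint32_t v) {
  assert(addr + 4 <= sizeof lowmem);
  memcpy(&lowmem[addr], &v, 4);
}

static uint32_t get32(uint32_t addr) {
  uint32_t v;
  assert(addr + 4 <= sizeof lowmem);
  memcpy(&v, &lowmem[addr], 4);
  return v;
}

/* What startothers() stores for one AP, in the same three slots. */
static void write_params(uint32_t stack_top, uint32_t entry, uint32_t pgdir_pa) {
  put32(CODE - 4, stack_top);   /* entryother.S: movl (start-4), %esp */
  put32(CODE - 8, entry);       /* entryother.S: call *(start-8) */
  put32(CODE - 12, pgdir_pa);   /* entryother.S: movl (start-12), %eax, then into %cr3 */
}
```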
main()->startothers() ... inside entryother.S
start:
  ...
  # Switch to the stack allocated by startothers()
  movl (start-4), %esp
  # Call mpenter()
  call *(start-8)
All the code is almost the same as in “entry.S”, except:
The stack is obtained from start-4: we set it to a kernel stack before.
The function to jump to comes from start-8, i.e. code - 8, i.e. mpenter().
main()->startothers() -> inside entryother.S -> mpenter()
static void
mpenter(void)
{
  switchkvm();
  seginit();
  lapicinit();
  mpmain();
}
We know all these functions!
Q: why do we need entryother.S (assembly code) to turn on the other CPUs?
main()->kinit2(P2V(4*1024*1024), P2V(PHYSTOP));
kinit1() gave us only a few pages (those between the end of the kernel and 4 MB).
By now:
all CPUs are initialized,
segmentation is initialized properly,
the kernel page directory has been created for all CPUs, from pages that kinit1() made available earlier.
Now reclaim all the remaining physical memory into the free list:
from 4 + 2048 MB to 224 + 2048 MB of virtual memory (KERNBASE is 2048 MB, PHYSTOP is 224 MB),
i.e. the kernel data + memory region as per kmap[].
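The size of the region that kinit2() hands to the allocator is easy to compute. This sketch uses KERNBASE, PHYSTOP, PGSIZE and PGROUNDUP as defined in xv6's memlayout.h and mmu.h. It adds a hypothetical pages_in_range() that counts pages the same way freerange() loops over them.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000u   /* 2048 MB */
#define PHYSTOP  0x0E000000u   /* 224 MB */
#define PGSIZE   4096u
#define P2V(a)   ((uint32_t)(a) + KERNBASE)
#define PGROUNDUP(sz) (((sz) + PGSIZE - 1) & ~(PGSIZE - 1))

/* Pages that freerange(vstart, vend) would pass to kfree(): start at the
 * first page boundary, stop when a whole page no longer fits. */
static uint32_t pages_in_range(uint32_t vstart, uint32_t vend) {
  assert(vstart <= vend);
  uint32_t n = 0;
  for (uint32_t p = PGROUNDUP(vstart); p + PGSIZE <= vend; p += PGSIZE)
    n++;
  return n;
}
```

With these values, the range is 220 MB, i.e. 56320 pages of 4 KB.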
Creation of first process by kernel
Why does the first process need ‘special’ treatment?
Normally, a process is created using fork(), typically followed by a call to exec().
fork() uses the PCB of an existing process to create a new process as a clone.
The first process has nothing to copy from!
So its PCB needs to be “built” by kernel code.
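For contrast, here is the normal fork-then-wait pattern on a POSIX system. Note that this is the Linux/POSIX API, not xv6's: xv6's wait() and exit() take no arguments. To keep the example self-contained, the child simply exits with a code at the point where it would normally call exec. run_child is a hypothetical name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent sees. */
static int run_child(int code) {
  assert(code >= 0 && code <= 255);
  pid_t pid = fork();
  if (pid < 0)
    return -1;              /* fork failed */
  if (pid == 0)
    _exit(code);            /* child: exec() would normally go here */
  int status;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```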
Why does the first process need ‘special’ treatment?
XV6 approach:
Create the process as if it was created by “fork”
Ensure that the process starts in a call to “exec”
Let “exec” do the rest of the job as expected
In this case, the process (initcode) will call
exec(“/init”, argv), with argv = { “/init”, 0 }
See the code of init.c:
it opens the console device for I/O as fd 0, then dups it onto fds 1 and 2, so the same device file is used for all I/O;
it forks a process and execs “sh” in it;
it keeps waiting for zombie processes itself.
Why does the first process need ‘special’ treatment?
What needs to be done?
Build struct proc by hand.
Hand-craft the data structures (proc, kernel stack, etc.) so that when the kernel “returns” to user mode, the process starts executing the code of initcode, which then execs /init.
Imp Concepts
A process has two stacks:
user stack: used when user code is running
kernel stack: used when the kernel is running on behalf of the process
Note: there is a third stack also!
The kernel stack used by the scheduler itself: one per CPU, not a per-process stack.
Imp Concepts
struct proc {
uint sz; // Size of process memory (bytes)
pde_t* pgdir; // Page table
char *kstack; // Bottom of kernel stack for this process
enum procstate state; // Process state
int pid; // Process ID
struct proc *parent; // Parent process
struct trapframe *tf; // Trap frame for current syscall
struct context *context; // swtch() here to run process
void *chan; // If non-zero, sleeping on chan
int killed; // If non-zero, have been killed
struct file *ofile[NOFILE]; // Open files
struct inode *cwd; // Current directory
char name[16]; // Process name (debugging)
};
[Figure: layout of a process’s VA space. setupkvm() does this mapping; these mappings need to be created per process.]
[Figure: memory layout of a user process after exec(). Note the argc, argv on the stack. The stack is just one page; the size of text and data is derived from the ELF file.]
main()->userinit()
Creating the first process by hand
Code of the first process: initcode.S and init.c
init.c is compiled into the “/init” file during make!
Trick:
use initcode.S to do exec(“/init”),
and let exec() do the rest of the job.
But before you do exec(), the process must exist as if it had been fork()ed and was running.
main()->userinit()
Creating first process by hand
void
userinit(void)
{
  struct proc *p;
  extern char _binary_initcode_start[], _binary_initcode_size[];

  // Abhijit: obtain proc 'p', with stack initialized
  // and trapframe created and eip set to 'forkret'
  p = allocproc();
  // let’s see what allocproc() does
First process creation
Let’s revisit struct proc
// Per-process state
struct proc {
  uint sz;                     // Size of process memory (bytes)
  pde_t* pgdir;                // Page table
  char *kstack;                // Bottom of kernel stack for this process
  enum procstate state;        // Process state: allocated, ready to run, running, waiting for I/O, or exiting
  int pid;                     // Process ID
  struct proc *parent;         // Parent process
  struct trapframe *tf;        // Trap frame for current syscall
  struct context *context;     // swtch() here to run process. Process’s context
  void *chan;                  // If non-zero, sleeping on chan. More when we discuss sleep, wakeup
  int killed;                  // If non-zero, have been killed
  struct file *ofile[NOFILE];  // Open files, used by open(), read(),...
  struct inode *cwd;           // Current directory, changed with “chdir()”
  char name[16];               // Process name (for debugging)
};
allocproc()
static struct proc*
allocproc(void)
{
  struct proc *p;
  char *sp;

  acquire(&ptable.lock);
  for(p = ptable.proc; p < &ptable.proc[NPROC]; p++)
    if(p->state == UNUSED)
      goto found;
  release(&ptable.lock);
  return 0;

found:
  p->state = EMBRYO;
  p->pid = nextpid++;
  release(&ptable.lock);
allocproc() setting up stack
  if((p->kstack = kalloc()) == 0){
    p->state = UNUSED;
    return 0;
  }
  sp = p->kstack + KSTACKSIZE;  // Abhijit: KSTACKSIZE = PGSIZE

  // Leave room for trap frame.
  sp -= sizeof *p->tf;
  p->tf = (struct trapframe*)sp;

  // Set up new context to start executing at forkret,
  // which returns to trapret.
  sp -= 4;
  *(uint*)sp = (uint)trapret;

  sp -= sizeof *p->context;
  p->context = (struct context*)sp;
  memset(p->context, 0, sizeof *p->context);
  p->context->eip = (uint)forkret;

  return p;
}
[Figure: p's kernel stack (one page, with kstack at the bottom) after this code, from the top down: the trapframe (sizeof(trapframe), pointed to by p->tf); the address of trapret(); the context (sizeof(context), pointed to by p->context and sp), holding eip = forkret(), ebp, ebx, esi, edi.]
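The pointer arithmetic in allocproc() can be checked with offsets instead of real pages. This sketch assumes 32-bit x86 sizes: a 19-word trapframe and a 5-word struct context, as in xv6. It also assumes KSTACKSIZE = 4096. carve_kstack is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSTACKSIZE 4096

struct trapframe { uint32_t word[19]; };                /* 76 bytes */
struct context   { uint32_t edi, esi, ebx, ebp, eip; }; /* 20 bytes */

/* Offsets from p->kstack that allocproc() ends up with. */
struct layout { size_t tf, trapret_slot, context; };

static struct layout carve_kstack(void) {
  struct layout l;
  size_t sp = KSTACKSIZE;            /* sp = p->kstack + KSTACKSIZE */
  sp -= sizeof(struct trapframe);    /* leave room for trap frame */
  l.tf = sp;                         /* p->tf */
  sp -= 4;                           /* *(uint*)sp = (uint)trapret */
  l.trapret_slot = sp;
  sp -= sizeof(struct context);      /* p->context, eip = forkret */
  l.context = sp;
  assert(l.context < l.trapret_slot && l.trapret_slot < l.tf);
  return l;
}
```

The context ends exactly where the trapret slot begins. So after swtch() pops the context and returns into forkret(), forkret()'s own ret lands on trapret.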
Next in userinit()
  initproc = p;
  if((p->pgdir = setupkvm()) == 0)
    panic("userinit: out of memory?");
  inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size);
  p->sz = PGSIZE;
  memset(p->tf, 0, sizeof(*p->tf));
  p->tf->cs = (SEG_UCODE << 3) | DPL_USER;
  p->tf->ds = (SEG_UDATA << 3) | DPL_USER;
  p->tf->es = p->tf->ds;
  p->tf->ss = p->tf->ds;
  p->tf->eflags = FL_IF;
  p->tf->esp = PGSIZE;
  p->tf->eip = 0;  // beginning of initcode.S

  safestrcpy(p->name, "initcode", sizeof(p->name));
  p->cwd = namei("/");

  acquire(&ptable.lock);
  p->state = RUNNABLE;
  release(&ptable.lock);
}
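After userinit()
[Figure: state after userinit(). struct proc of “initcode”: sz = 4096, name = “initcode”, state = RUNNABLE, cwd = inode for “/”, pgdir = page table mapping the code/stack page of p and the kernel pages. p's kernel stack (kstack), from the top: the trapframe (CS = 3, i.e. SEG_UCODE; DS = ES = SS = 4, i.e. SEG_UDATA; EFLAGS = FL_IF; ESP = 4096; EIP = 0), then trapret(), then the context (eip = forkret(), ebp, ebx, esi, edi). The CPU still runs on the stack set up in entry.S, and CR3 points to kpgdir.]
Q: why is there no user stack yet?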
main()->mpmain()
static void
mpmain(void)
{
  cprintf("cpu%d: starting %d\n", cpuid(), cpuid());
  idtinit();                     // load idt register
  xchg(&(mycpu()->started), 1);  // tell startothers() we're up
  scheduler();                   // start running processes
}
idtinit(): load the IDT register (IDTR) with the address and size of the idt[] array.
Call scheduler(): one process has already been made runnable.
Let’s enter scheduler now.
Before reading scheduler(): Note
The esp is still pointing to the stack that was allocated in entry.S! (On the other CPUs, it is the stack allocated by startothers().)
This is a kernel-only stack, not the per-process kernel stack.
CR3 points to kpgdir.
Struct cpu[] has been set up already:
apicid – in mpinit()
segdesc gdt – in seginit()
started – in mpmain()
Fields in cpu[] not yet set:
context *scheduler --> set by swtch() when scheduler() first switches to a process; sched() uses it to switch back
taskstate ts --> large structure, only parts used in switchuvm()
ncli, intena --> used while locking
proc *proc --> set during scheduler()
scheduler()
void
scheduler(void)
{
  struct proc *p;
  struct cpu *c = mycpu();
  c->proc = 0;

  for(;;){
    sti();
    // Loop over process table looking for process to run.
    acquire(&ptable.lock);
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;
      // Switch to chosen process. It is the process's job
      // to release ptable.lock and then reacquire it
      // before jumping back to us.
      c->proc = p;
scheduler() called first time
[Figure: the same state as after userinit(), plus cpu *c with c->proc = p. The “initcode” process is still RUNNABLE, esp is still on the entry.S stack, and CR3 points to kpgdir.]
after switchuvm() in scheduler()
[Figure: as before, but p->state = RUNNING. (switchuvm() loads p->pgdir into CR3 and points the TSS at p's kernel stack.)]
scheduler()
    acquire(&ptable.lock);
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;

      // Switch to chosen process. It is the process's job
      // to release ptable.lock and then reacquire it
      // before jumping back to us.
      c->proc = p;
      switchuvm(p);
      p->state = RUNNING;
      swtch(&(c->scheduler), p->context);
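Without locking, switchuvm() and swtch(), the scheduler's inner loop is a round-robin scan of the process table. The sketch below is a single-threaded model. The procstate values follow xv6's proc.h, while scheduler_pass and the ran[] output array are hypothetical. Each chosen process is assumed to give the CPU back right away, as a timer-driven yield() would.

```c
#include <assert.h>

#define NPROC 8
enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };
struct proc { enum procstate state; int pid; };

/* One pass over the table: run every RUNNABLE process once, in table order.
 * Records the pids that ran in ran[] and returns how many ran. */
static int scheduler_pass(struct proc ptable[NPROC], int ran[NPROC]) {
  int n = 0;
  for (struct proc *p = ptable; p < ptable + NPROC; p++) {
    if (p->state != RUNNABLE)
      continue;
    p->state = RUNNING;   /* switchuvm(p); swtch(...) would happen here */
    ran[n++] = p->pid;
    p->state = RUNNABLE;  /* process gave up the CPU (e.g. yield()) */
  }
  assert(n <= NPROC);
  return n;
}
```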
at call to swtch()
[Figure: the scheduler's stack (the entry.S stack) now also holds, from the top: the value of p->context (address of p's saved context), the address of c->scheduler, and the return address into scheduler(). esp points at that return address. p is RUNNING, and its kernel stack is unchanged.]
during swtch()
[Figure: after movl 4(%esp), %eax and movl 8(%esp), %edx: eax = &c->scheduler, edx = p->context. Both stacks are unchanged.]
during swtch()
[Figure: after pushl %ebp, %ebx, %esi, %edi: these four registers now sit on the scheduler's stack, below the return address into scheduler(). esp points at the saved edi. eax = &c->scheduler, edx = p->context.]
during swtch()
[Figure: after movl %esp, (%eax) and movl %edx, %esp: c->scheduler now points to the saved ebp, ebx, esi, edi (and the return address above them) on the scheduler's stack, and esp points at p's context on p's kernel stack.]
during swtch()
[Figure: after popl %edi, %esi, %ebx, %ebp: the registers are loaded from p's context (all zero, from allocproc()'s memset), and esp points at the context's eip slot, which holds forkret().]
swtch
swtch:
  # Abhijit: swtch was called through a function call,
  # so the return %eip was saved on the stack already.
  movl 4(%esp), %eax   # Abhijit: eax = old
  movl 8(%esp), %edx   # Abhijit: edx = new

  # Save old callee-saved registers
  pushl %ebp
  pushl %ebx
  pushl %esi
  pushl %edi           # Abhijit: esp = esp - 16

  # Switch stacks
  movl %esp, (%eax)    # Abhijit: *old = updated old stack
  movl %edx, %esp      # Abhijit: esp = new

  # Load new callee-saved registers
  popl %edi
  popl %esi
  popl %ebx
  popl %ebp            # Abhijit: esp = new + 16, context restored
  ret                  # Abhijit: pops the return address from the new stack and jumps there
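The stack manipulation in swtch can be replayed on a toy word-addressed memory. This is a model, not kernel code. Indices into mem[] play the role of addresses, and struct regs holds the registers that swtch touches. The caller is assumed to have pushed the two arguments and a return address, as the C call swtch(&c->scheduler, p->context) does. The 0xF0 and 0xE0 values in the test are stand-ins for forkret and trapret.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t mem[128];   /* toy memory; index = "address" */
struct regs { uint32_t esp, eip, ebp, ebx, esi, edi; };

static void push(struct regs *r, uint32_t v) { assert(r->esp > 0); mem[--r->esp] = v; }
static uint32_t pop(struct regs *r) { assert(r->esp < 128); return mem[r->esp++]; }

static void swtch_model(struct regs *r) {
  uint32_t old = mem[r->esp + 1];    /* movl 4(%esp), %eax : &c->scheduler */
  uint32_t newsp = mem[r->esp + 2];  /* movl 8(%esp), %edx : p->context */
  push(r, r->ebp);                   /* save old callee-saved registers */
  push(r, r->ebx);
  push(r, r->esi);
  push(r, r->edi);
  mem[old] = r->esp;                 /* movl %esp, (%eax) */
  r->esp = newsp;                    /* movl %edx, %esp */
  r->edi = pop(r);                   /* load new callee-saved registers */
  r->esi = pop(r);
  r->ebx = pop(r);
  r->ebp = pop(r);
  r->eip = pop(r);                   /* ret */
}
```

In the test, p's context (zeroed registers, eip = 0xF0) sits just below 0xE0, as allocproc() leaves it. After the model runs, eip is the forkret stand-in and esp points at the trapret stand-in. c->scheduler holds the scheduler's saved stack pointer.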
after “ret” from swtch(), just before forkret()
[Figure: ret popped forkret() into eip, so p's context is gone from its kernel stack. esp points at the slot holding trapret(), above which is the trapframe. The scheduler's stack still holds the arguments of swtch, the return address into scheduler(), and the saved ebp, ebx, esi, edi, with c->scheduler pointing to them.]
After swtch()
Process p is now running in forkret().
c->scheduler has saved the old kernel stack pointer. On that (scheduler's) stack lie the arguments of swtch (p->context and &c->scheduler), the return address into scheduler(), and ebp, ebx, esi, edi.
Remember: {edi, esi, ebx, ebp, return address} = struct context.
So c->scheduler points to the scheduler's saved context.
CR3 is pointing to the process's pgdir.
after forkret(), just before trapret() begins
[Figure: forkret() returned by popping trapret() into eip, so esp now points at the start of the trapframe. Everything else is unchanged.]
After iret in trapret
CS, EIP and ESP (and EFLAGS, SS) are changed to the values already stored in the trapframe. This is done by iret.
Hence, after this, user code will run, on the user stack!
Hence the code of initcode will run now.
at the end of trapret()
[Figure: popal, the segment pops, addl $0x8 and iret have consumed the whole trapframe, so p's kernel stack is empty. eip now points at initcode's code (EIP = 0) in p's user page.]
initcode
start:
  pushl $argv
  pushl $init
  pushl $0          // where caller pc would be
  movl $SYS_exec, %eax
  int $T_SYSCALL

# for(;;) exit();
exit:
  movl $SYS_exit, %eax
  int $T_SYSCALL
  jmp exit

# char init[] = "/init\0";
init:
  .string "/init\0"

# char *argv[] = { init, 0 };
.p2align 2
argv:
  .long init
  .long 0

User stack just before int $T_SYSCALL (esp points at the last entry):
0x24 = addr of argv
0x1c = addr of init
0x0 (where the caller pc would be)   <- esp

Disassembly:
00000000 <start>:
   0: 68 24 00 00 00   push $0x24
   5: 68 1c 00 00 00   push $0x1c
   a: 6a 00            push $0x0
   c: b8 07 00 00 00   mov  $0x7,%eax
  11: cd 40            int  $0x40

00000013 <exit>:
  13: b8 02 00 00 00   mov  $0x2,%eax
  18: cd 40            int  $0x40
  1a: eb f7            jmp  13 <exit>

0000001c <init>:
  "/init\0"

00000024 <argv>:
  1c 00 00 00
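When initcode traps into the kernel, sys_exec() finds its arguments on the user stack. The sketch below models that lookup on a toy 4 KB user page and replays initcode's three pushes. It mirrors the bounds check in xv6's fetchint() and the esp + 4 + 4*n offset used by argint(). The page array and put32 are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t upage[4096];   /* the single page initcode runs in */

static void put32(uint32_t va, uint32_t v) {
  assert(va + 4 <= sizeof upage);
  memcpy(&upage[va], &v, 4);
}

/* Like fetchint(): read a word at user address va, if it is inside sz. */
static int fetchint(uint32_t va, int *ip) {
  if (va >= sizeof upage || va + 4 > sizeof upage)
    return -1;
  memcpy(ip, &upage[va], 4);
  return 0;
}

/* Like argint(): argument n sits above the fake return PC at esp. */
static int argint(uint32_t user_esp, int n, int *ip) {
  return fetchint(user_esp + 4 + 4 * (uint32_t)n, ip);
}
```

Argument 0 (the path) is found at esp + 4 and argument 1 (argv) at esp + 8. The 0 pushed last only fills the slot where a caller's return address would normally be.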
on sys_exec(), + alltraps()
[Figure: initcode has executed int $T_SYSCALL. Its user stack (in the code/stack page of p) holds 0x24, 0x1c and 0. p's kernel stack holds the new trapframe: ss = 4, esp, eflags, cs = 3, eip, 0, 64, ds, es, fs, gs, and the general registers. Below it are the esp pushed by alltraps (the address of this trapframe, trap()'s argument) and the return address into alltraps(). The scheduler's stack is unchanged, with c->scheduler pointing into it.]
Understanding fork() and exec()

First, revising some concepts already learnt


then code of fork(), exec()
[Figure: struct proc diagram for a “cat” process: sz, pgdir (page table covering the code & data and the stack of the process, plus the kernel pages), kstack, state = RUNNABLE, pid = 22, parent, tf, context, chan, killed = 0, ofile, cwd (inode for “/”), name = “cat”. Its kernel stack holds the trapframe (edi, esi, ebp, ebx, edx, ecx, eax, gs, fs, es = 4, ds = 4, trapno = ?, err, eip, cs = 3, EFLAGS = FL_IF, ESP = 4096, ss = 4; EIP = 0), then trapret() and the context (eip = forkret(), ebp, ebx, esi, edi).
The trapframe is in use only when you are in the kernel on a “trap” (interrupt/syscall); “tf” always points to it. The trapret/forkret entries are used during fork().
sz = ELF code memsz (includes data; check “ld -N”) + 2*4096 (for the stack).]
fork() and exec() are system calls, so every call goes through the trap path described under “Handling traps” above: int 64 → vector64 → alltraps (builds the trapframe on the process's kernel stack) → trap() → syscall(), and back through trapret and iret.

understanding fork()
What should fork do?
Create a copy of the existing process:
the child is the same as the parent, except for the pid, the parent-child relation, and the return value (pid or 0).
Please go through every member of struct proc and understand its meaning, to appreciate what fork() should do:
create a struct proc, and
duplicate: pages, page directory, sz, state, trapframe, context, ofile (and files!), cwd, name
modify: pid, parent, trapframe, state
understanding fork()
int
sys_fork(void)
{
  return fork();
}

int
fork(void)
{
  int i, pid;
  struct proc *np;
  struct proc *curproc = myproc();

  // Allocate process.
  if((np = allocproc()) == 0){
    return -1;
  }
after allocproc()
-- we studied this -- same as the creation of the first process
[Figure: np's kernel stack: the trapframe (sizeof(trapframe), np->tf) at the top, then trapret(), then the context (np->context = sp) with eip = forkret(), ebp, ebx, esi, edi.]
understanding fork()
  // Copy process state from proc.
  if((np->pgdir = copyuvm(curproc->pgdir, curproc->sz)) == 0){
    kfree(np->kstack);
    np->kstack = 0;
    np->state = UNUSED;
    return -1;
  }
  np->sz = curproc->sz;
  np->parent = curproc;
  *np->tf = *curproc->tf;
copyuvm(): copy the pages, page tables and page directory. No copy-on-write here!
If copyuvm() fails, undo (rewind) what allocproc() did.
Copy the size.
Set the parent of the child.
Copy the trapframe.
understanding fork()->copyuvm()
pde_t*
copyuvm(pde_t *pgdir, uint sz)
{
  pde_t *d;
  pte_t *pte;
  uint pa, i, flags;
  char *mem;

  if((d = setupkvm()) == 0)               // map kernel pages
    return 0;
  for(i = 0; i < sz; i += PGSIZE){        // for every page in parent's VM address space
    if((pte = walkpgdir(pgdir, (void *) i, 0)) == 0)
      panic("copyuvm: pte should exist");
    if(!(*pte & PTE_P))
      panic("copyuvm: page not present");
    pa = PTE_ADDR(*pte);
    flags = PTE_FLAGS(*pte);              // same flags as the parent's page
    if((mem = kalloc()) == 0)             // allocate a physical page for the child
      goto bad;
    memmove(mem, (char*)P2V(pa), PGSIZE); // copy data
    // map the page in child's page directory/tables (this creates the PTE)
    if(mappages(d, (void*)i, PGSIZE, V2P(mem), flags) < 0) {
      kfree(mem);
      goto bad;
    }
  }
  return d;

bad:
  freevm(d);
  return 0;
}
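The same eager copy (no copy-on-write) with a rollback path can be tried in user space. This is a sketch, not xv6 code: the address space is a hypothetical array of page pointers (toy_as), malloc stands in for kalloc(), and toy_copy_as/toy_free_as play the roles of copyuvm()/freevm().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PGSIZE   4096
#define TOY_MAXPAGES 16

/* Hypothetical toy address space: pages[i] backs virtual page i. */
struct toy_as {
  size_t npages;
  char *pages[TOY_MAXPAGES];
};

/* Like freevm(): free every page, then the table itself. */
void toy_free_as(struct toy_as *as) {
  for (size_t i = 0; i < as->npages; i++)
    free(as->pages[i]);              /* free(NULL) is a no-op */
  free(as);
}

/* Like copyuvm(): allocate a fresh page for every parent page and copy
 * its contents. On any failure, undo everything (the "bad:" path). */
struct toy_as *toy_copy_as(const struct toy_as *parent) {
  struct toy_as *d = calloc(1, sizeof *d);   /* pages[] start as NULL */
  if (d == NULL)
    return NULL;
  d->npages = parent->npages;
  for (size_t i = 0; i < parent->npages; i++) {
    char *mem = malloc(TOY_PGSIZE);          /* plays the role of kalloc() */
    if (mem == NULL)
      goto bad;
    memcpy(mem, parent->pages[i], TOY_PGSIZE);   /* copy data */
    d->pages[i] = mem;                           /* "map" it in the child */
  }
  return d;
bad:
  toy_free_as(d);
  return NULL;
}
```

Because every page is duplicated, a later write by the child does not affect the parent; copy-on-write would instead share pages read-only and copy on the first write fault.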
understanding fork()
  np->tf->eax = 0;          // set return value of child to 0: eax holds the return value, and it is in the TF

  for(i = 0; i < NOFILE; i++)
    if(curproc->ofile[i])
      np->ofile[i] = filedup(curproc->ofile[i]);   // share each open struct file (filedup() increments its ref count)
  np->cwd = idup(curproc->cwd);                    // share the current working dir inode (idup() increments its ref count)

  safestrcpy(np->name, curproc->name, sizeof(curproc->name));   // copy name

  pid = np->pid;            // child's pid: fork() returns it to the parent

  acquire(&ptable.lock);
  np->state = RUNNABLE;     // set child "RUNNABLE"
  release(&ptable.lock);

  return pid;
}
exec() - different prototype

int exec(char*, char**);

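usage: to print README and test.txt using “cat”
#include "types.h"
#include "user.h"

int main(int argc, char *argv[])
{
  char *cmd = "/cat";
  char *argstr[4] = { "/cat", "README", "test.txt", 0 };
  exec(cmd, argstr);
  // exec() returns only if it failed
  printf(2, "exec failed\n");
  exit();
}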

note: to really run this code in xv6, you need to make changes to Makefile. First,
add this program to UPROGS, then write a file test.txt using Linux, and add
‘test.txt’ to list of files in ‘mkfs’ target in Makefile
sys_exec()
int
sys_exec(void)
{
  char *path, *argv[MAXARG];
  int i;
  uint uargv, uarg;

  if(argstr(0, &path) < 0 || argint(1, (int*)&uargv) < 0){
    return -1;
  }
  memset(argv, 0, sizeof(argv));
  for(i=0;; i++){
    if(i >= NELEM(argv))
      return -1;
    if(fetchint(uargv+4*i, (int*)&uarg) < 0)
      return -1;
    if(uarg == 0){
      argv[i] = 0;
      break;
    }
    if(fetchstr(uarg, &argv[i]) < 0)
      return -1;
  }
  return exec(path, argv);
}

argstr(n, ...), argint(n, ...): fetch the n'th argument from the process stack, using p->tf->esp + offset. Again: revise calling conventions.
0th argument: name of the executable file
1st argument: address of the array of arguments, stored in uargv
sys_exec(), step by step:

The local array argv[] (allocated on the kernel stack, obviously) is set to 0 by memset()

The loop fetches every next argument from the user's array of arguments: fetchint() reads the address of argument i (at uargv+4*i), and fetchstr() sets argv[i] to point to that string

A 0 entry ends the array; then exec() is called

Beware: it is a mistake to assume that this exec() is the exec() called from user code! NO! It is the kernel's exec(char *path, char **argv)
What should exec() do?

Remember, it came from fork()

so proc & within it tf, context, kstack, pgdir-tables-pages, all
exist.

Code, stack pages exist, and mappings exist through proc-
>pgdir

Hence

read the ELF executable file (path; by convention argv[0] names the same file)

create a new page dir – create mappings for kernel and user
code+data; copy data from ELF to these pages (later discard
old pagedir)

Copy the argv onto the user stack – so that when the new
program starts it has its main(argc, argv[]) built

set values of other fields in proc to start program correctly
Figure: user stack after the call to exec() is over

Normally the data on the stack at a function call is: return address, first arg, second arg, ...
main(int argc, char *argv[])
argv is the address of an array of strings, and each string is itself an address. Hence 2 levels of indirection on the stack.
exec()
int
exec(char *path, char **argv)
{
  ...
  uint argc, sz, sp, ustack[3+MAXARG+1];   // ustack: used to build the arguments to be pushed on the user stack
  ...
  if((ip = namei(path)) == 0){             // namei: get the inode of the executable file
    end_op();
    cprintf("exec: fail\n");
    return -1;
  }
exec()
  // Check ELF header
  if(readi(ip, (char*)&elf, 0, sizeof(elf)) != sizeof(elf))   // readi: read ELF header
    goto bad;
  if(elf.magic != ELF_MAGIC)
    goto bad;

  if((pgdir = setupkvm()) == 0)   // setupkvm(): create a new page directory and map kernel pages
    goto bad;

exec()
  sz = 0;
  for(i=0, off=elf.phoff; i<elf.phnum; i++, off+=sizeof(ph)){
    if(readi(ip, (char*)&ph, off, sizeof(ph)) != sizeof(ph))   // read ELF program headers from the ELF file
      goto bad;
    if(ph.type != ELF_PROG_LOAD)
      continue;
    if(ph.memsz < ph.filesz)
      goto bad;
    if(ph.vaddr + ph.memsz < ph.vaddr)
      goto bad;
    if((sz = allocuvm(pgdir, sz, ph.vaddr + ph.memsz)) == 0)   // map the code/data into pagedir-pagetable-pages
      goto bad;
    if(ph.vaddr % PGSIZE != 0)
      goto bad;
    if(loaduvm(pgdir, (char*)ph.vaddr, ip, ph.off, ph.filesz) < 0)   // copy data from the ELF file into the pages allocated
      goto bad;
  }
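The three sanity checks on each loadable program header can be isolated into a small predicate. This is a sketch with a hypothetical toy_proghdr holding only the fields the checks use; the overflow test vaddr + memsz < vaddr works because unsigned 32-bit addition wraps around. The order of the checks does not change the result.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PGSIZE 4096u

/* Hypothetical subset of an ELF32 program header. */
struct toy_proghdr {
  uint32_t vaddr;   /* where the segment goes in the process's address space */
  uint32_t filesz;  /* bytes present in the file */
  uint32_t memsz;   /* bytes in memory (>= filesz; the rest is zero, e.g. .bss) */
};

/* Returns 1 if exec() would accept this loadable segment, 0 otherwise. */
int toy_phdr_ok(const struct toy_proghdr *ph) {
  if (ph->memsz < ph->filesz)             /* memory image smaller than the file data */
    return 0;
  if (ph->vaddr + ph->memsz < ph->vaddr)  /* wraps past 4 GiB: bogus header */
    return 0;
  if (ph->vaddr % TOY_PGSIZE != 0)        /* loaduvm() needs a page-aligned start */
    return 0;
  return 1;
}
```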
exec()
  sz = PGROUNDUP(sz);
  // allocate 2 pages on top of proc->sz: one page for the stack, one for a guard page
  if((sz = allocuvm(pgdir, sz, sz + 2*PGSIZE)) == 0)
    goto bad;
  // clear the user-accessible flag (PTE_U) on the guard page (the lower page),
  // so a user access to it causes a fault
  clearpteu(pgdir, (char*)(sz - 2*PGSIZE));
  sp = sz;
  // Push argument strings, prepare rest of stack in ustack.
  for(argc = 0; argv[argc]; argc++) {    // for each entry in argv[]
    if(argc >= MAXARG)
      goto bad;
    sp = (sp - (strlen(argv[argc]) + 1)) & ~3;
    if(copyout(pgdir, sp, argv[argc], strlen(argv[argc]) + 1) < 0)   // copy it on the user stack
      goto bad;
    ustack[3+argc] = sp;                 // remember its location on the user stack in ustack
  }
  ustack[3+argc] = 0;                    // add extra entries (to be copied to the user stack) to ustack
  ustack[0] = 0xffffffff;                // fake return PC
  ustack[1] = argc;                      // copy argc, argv pointer
  ustack[2] = sp - (argc+1)*4;           // argv pointer
  sp -= (3+argc+1) * 4;                  // take sp to the bottom
  if(copyout(pgdir, sp, ustack, (3+argc+1)*4) < 0)   // copy ustack to the user stack
    goto bad;

This is what built the user stack shown on the earlier slide.
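To check the arithmetic, here is a sketch that builds the same layout in an ordinary byte array standing in for the single user stack page. It assumes a 32-bit user address space and a stack page whose top is at the hypothetical address TOY_SZ; toy_copyout is a stand-in for copyout(), and TOY_MAXARG mirrors xv6's MAXARG of 32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PGSIZE 4096u
#define TOY_SZ     0x3000u          /* hypothetical top of the user stack (proc->sz) */
#define TOY_MAXARG 32

static unsigned char stackpage[TOY_PGSIZE];   /* backs [TOY_SZ - PGSIZE, TOY_SZ) */

/* Stand-in for copyout(): copy len bytes to user virtual address va. */
static void toy_copyout(uint32_t va, const void *src, uint32_t len) {
  memcpy(&stackpage[va - (TOY_SZ - TOY_PGSIZE)], src, len);
}

/* Read back a 32-bit word or a string at a user virtual address. */
uint32_t toy_word(uint32_t va) {
  uint32_t w;
  memcpy(&w, &stackpage[va - (TOY_SZ - TOY_PGSIZE)], 4);
  return w;
}
const char *toy_str(uint32_t va) {
  return (const char *)&stackpage[va - (TOY_SZ - TOY_PGSIZE)];
}

/* Same steps as exec(): strings first, then ustack[] below them.
 * Returns the final sp (what exec() stores in tf->esp), or 0 on error. */
uint32_t toy_build_stack(char **argv) {
  uint32_t ustack[3 + TOY_MAXARG + 1];
  uint32_t sp = TOY_SZ, argc;
  for (argc = 0; argv[argc]; argc++) {
    if (argc >= TOY_MAXARG)
      return 0;
    uint32_t len = (uint32_t)strlen(argv[argc]) + 1;
    sp = (sp - len) & ~3u;            /* keep sp 4-byte aligned */
    toy_copyout(sp, argv[argc], len);
    ustack[3 + argc] = sp;            /* user address of the string */
  }
  ustack[3 + argc] = 0;               /* argv[argc] = 0 */
  ustack[0] = 0xffffffff;             /* fake return PC */
  ustack[1] = argc;
  ustack[2] = sp - (argc + 1) * 4;    /* argv: where ustack[3] will land */
  sp -= (3 + argc + 1) * 4;
  toy_copyout(sp, ustack, (3 + argc + 1) * 4);
  return sp;
}
```

For { "/cat", "README", "test.txt", 0 } the strings land at 0x2FF8, 0x2FF0 and 0x2FE4, and the final sp is 0x2FC8: the fake return PC sits at sp, argc at sp+4, and the argv pointer at sp+8 points to sp+12, which is exactly what main(argc, argv) expects to find above its return address.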
exec()
  // Save program name for debugging.
  for(last=s=path; *s; s++)
    if(*s == '/')
      last = s+1;
  safestrcpy(curproc->name, last, sizeof(curproc->name));   // copy name of new process into proc->name

  // Commit to the user image.
  oldpgdir = curproc->pgdir;
  curproc->pgdir = pgdir;          // change to the new page directory
  curproc->sz = sz;                // change to the new size
  curproc->tf->eip = elf.entry;    // main
  curproc->tf->esp = sp;
  switchuvm(curproc);              // update TSS, change CR3 to the new page directory
  freevm(oldpgdir);                // free the old page directory
  return 0;

tf->eip will be used when we return from exec() to jump to user code: it is set to the first instruction of the code, given by elf.entry.
tf->esp sets the user stack pointer to “sp” (the bottom of the stack of arguments).
return 0 from exec()?

We know exec() does not return to its caller on success!

This was the kernel's exec() function!

It returns to sys_exec()

sys_exec() also returns. Where?

Remember we are still in kernel code, running on the kernel stack.
p->kstack has the trapframe set up

There is a context struct on the stack. Why?

sys_exec() returns (through syscall(), trap() and alltraps) to trapret, and the trap frame will be popped!

With “iret”, jump into the new program!

The new program is not the old program, which could have accessed the
return value of sys_exec()
Scheduler
Steps in scheduling

Suppose you want to switch from P1 to P2 on
a timer interrupt

P1 was doing
F() { i++; j++;}

P2 was doing
G() { x--; y++; }

P1 will experience a timer interrupt and switch to the
kernel (scheduler), and the scheduler will
schedule P2
Steps in scheduling

User process -> kernel

Switch to kernel stack

The normal sequence on any
interrupt!

Kernel stack of process to
kernel stack of scheduler

Why?

Kernel stack of scheduler to
kernel stack of new
process. Why?

Kernel stack of new process
to user stack of new
process
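scheduler()

Enable interrupts (sti()), then acquire ptable.lock, which keeps interrupts disabled while the process table is scanned

Find a RUNNABLE process. Simple round-robin!

c->proc = p

switchuvm(p): set up the TSS to use p's kernel stack, and make CR3 point to the new process's page directory

p->state = RUNNING

swtch(&(c->scheduler), p->context)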
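A sketch of just the “find a RUNNABLE process, round-robin” part, over a toy process table. The names toy_proc, toy_pick_next and the TS_ states are hypothetical. xv6's loop restarts its scan from ptable.proc[0] after each full pass; this version instead resumes just after the last process picked, which gives the same take-turns behaviour in a compact form.

```c
#include <assert.h>

enum toy_state { TS_UNUSED, TS_SLEEPING, TS_RUNNABLE, TS_RUNNING };

#define TOY_NPROC 4

struct toy_proc { int pid; enum toy_state state; };

/* Scan the table starting just after index `last`, wrapping around, and
 * return the index of the first RUNNABLE process, or -1 if there is none. */
int toy_pick_next(const struct toy_proc *ptable, int last) {
  for (int k = 1; k <= TOY_NPROC; k++) {
    int i = (last + k) % TOY_NPROC;
    if (ptable[i].state == TS_RUNNABLE)
      return i;
  }
  return -1;
}
```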
scheduler

swtch(&(c->scheduler), p->context)

Note that when scheduler() was called, P1 was running

After the call to swtch() shown above

The call does NOT return (not until the scheduler is switched back to)!

The new process P2 given by ‘p’ starts running!

Let’s review swtch() again
swtch(old, new)

The magic function in swtch.S

Saves callee-save registers of old context

Switches esp to new-context’s stack

Pop callee-save registers from new context
ret

where? In the case of the first process, it returns to forkret(), because the
stack was set up like that!

in case of other processes, return where?

Return address given on kernel stack. But what’s that?

The EIP in p->context

When was EIP set in p->context ?
scheduler()

Called from?

mpmain() - already seen

Nowhere else!

sched() is another scheduler function!

Who calls sched()?

exit() - a process exiting calls sched()

yield() - a process giving up the CPU on a timer interrupt calls yield(), which calls sched()

sleep() - a process going to wait calls sleep(), which calls sched()
sched()
void
sched(void)
{
  int intena;
  struct proc *p = myproc();            // get current process

  if(!holding(&ptable.lock))            // error checking code (ignore as of now)
    panic("sched ptable.lock");
  if(mycpu()->ncli != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched running");
  if(readeflags()&FL_IF)
    panic("sched interruptible");
  intena = mycpu()->intena;             // get interrupt-enabled status on current CPU (ignore as of now)
  swtch(&p->context, mycpu()->scheduler);   // call to swtch
/*A*/ mycpu()->intena = intena;
}

Note the arguments' order: &p->context first, mycpu()->scheduler second.
swtch() is a function call: it pushes the address of /*A*/ on the stack of the current process p, switches the stack to mycpu()->scheduler, then pops EIP from that stack and jumps there.
When was mycpu()->scheduler set? Ans: during scheduler()!
sched() and scheduler()
sched() {
  ...
  swtch(&p->context, mycpu()->scheduler);
  /* X */
}

scheduler(void) {
  ...
  swtch(&(c->scheduler), p->context);
  /* Y */
}

scheduler() saves context in c->scheduler, sched() saves
context in p->context

after swtch() call in sched(), the control jumps to Y in scheduler

Switch from process stack to scheduler’s stack

after swtch() call in scheduler(), the control jumps to X in
sched()

Switch from scheduler’s stack to new process’s stack

Set of co-operating functions
sched() and scheduler() as co-
routines

In sched()
swtch(&p->context, mycpu()->scheduler);

In scheduler()
swtch(&(c->scheduler), p->context);

These two keep switching between processes

These two functions work together to achieve
scheduling

Using explicit stack switches: each swtch() jumps into the middle of the other function

Hence they are co-routines
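The same co-routine pattern can be reproduced in user space with the <ucontext.h> functions getcontext(), makecontext() and swapcontext() (marked obsolescent in POSIX and later removed from it, but still provided by glibc on Linux). This is only an analogy: swapcontext() saves the full register set and the signal mask, while swtch() saves just the callee-saved registers. All toy_ names are hypothetical; sched_ctx plays mycpu()->scheduler and proc_ctx[] plays p->context.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t sched_ctx;          /* role of mycpu()->scheduler */
static ucontext_t proc_ctx[2];        /* role of p->context, one per process */
static char stacks[2][64 * 1024];     /* each "process" gets its own stack */
static int finished[2];
static char trace[16];                /* which process ran, in order */
static int ntrace;

/* Role of sched(): save my context and resume the scheduler loop. */
static void toy_sched(int id) {
  swapcontext(&proc_ctx[id], &sched_ctx);
}

/* A "process" that does two steps, giving up the CPU after each one. */
static void toy_proc(int id) {
  for (int i = 0; i < 2; i++) {
    trace[ntrace++] = (char)('A' + id);
    toy_sched(id);                    /* like yield() -> sched() */
  }
  finished[id] = 1;                   /* returning resumes uc_link (sched_ctx) */
}

/* Role of scheduler(): round-robin over the unfinished processes. */
void run_toy_scheduler(void) {
  for (int id = 0; id < 2; id++) {
    getcontext(&proc_ctx[id]);
    proc_ctx[id].uc_stack.ss_sp = stacks[id];
    proc_ctx[id].uc_stack.ss_size = sizeof stacks[id];
    proc_ctx[id].uc_link = &sched_ctx;
    makecontext(&proc_ctx[id], (void (*)(void))toy_proc, 1, id);
  }
  int running = 1;
  while (running) {
    running = 0;
    for (int id = 0; id < 2; id++) {
      if (finished[id])
        continue;
      running = 1;
      swapcontext(&sched_ctx, &proc_ctx[id]);   /* like swtch(&c->scheduler, p->context) */
    }
  }
}
```

After run_toy_scheduler() the trace reads "ABAB": each swapcontext() in the loop resumes a process where its last toy_sched() left off, and each toy_sched() resumes the loop just after the swapcontext() that started it, which is the X/Y ping-pong described above.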
To summarize

On a timer interrupt during P1:

trap() is called. The stack has changed from P1's user stack to P1's kernel stack

trap()->yield()

yield()->sched()

sched() -> swtch(&p->context, c->scheduler)

The stack changes to the scheduler's kernel stack

Control switches to location “Y” in scheduler()

Now the loop in scheduler():

calls switchkvm()

then continues to find the next process (P2) to run

then calls swtch(&c->scheduler, p2->context)

The stack changes to P2's kernel stack

P2 resumes at the last instruction it was in! Where was it? mycpu()->intena = intena; in sched()

Then it returns to the one who called sched(), i.e. yield(), sleep(), etc.

Finally it returns from its own “TRAP” handler and goes back to P2's user stack and user code
Locks
struct spinlock
// Mutual exclusion lock.
struct spinlock {
uint locked; // Is the lock held?

// For debugging:
char *name; // Name of lock.
struct cpu *cpu; // The cpu holding the lock.
uint pcs[10]; // The call stack (an array of program counters)
// that locked the lock.
};
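xv6's acquire() spins on an atomic exchange (the x86 xchg instruction) of locked. A user-space sketch of the same test-and-set idea with C11 <stdatomic.h>; toy_spinlock and its functions are hypothetical, and unlike xv6 it cannot disable interrupts (user space has none to disable), so it shows only the atomic part.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_spinlock {
  atomic_uint locked;   /* 0 = free, 1 = held */
};

void toy_acquire(struct toy_spinlock *lk) {
  /* Atomically write 1 and get the old value; keep trying while it was 1.
   * Acquire ordering keeps the critical section's accesses after this. */
  while (atomic_exchange_explicit(&lk->locked, 1, memory_order_acquire) != 0)
    ;
}

void toy_release(struct toy_spinlock *lk) {
  /* Release ordering publishes the critical section's writes before the
   * lock is seen as free; this plays the role of xv6's "movl $0, locked". */
  atomic_store_explicit(&lk->locked, 0, memory_order_release);
}
```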
spinlocks in xv6 code
struct {
  struct spinlock lock;
  struct buf buf[NBUF];
  struct buf head;
} bcache;

static struct spinlock idelock;

struct {
  struct spinlock lock;
  int use_lock;
  struct run *freelist;
} kmem;

struct {
  struct spinlock lock;
  struct file file[NFILE];
} ftable;

struct log {
  struct spinlock lock;
  ...
};

struct pipe {
  struct spinlock lock;
  ...
};

struct {
  struct spinlock lock;
  struct inode inode[NINODE];
} icache;

struct {
  struct spinlock lock;
  struct proc proc[NPROC];
} ptable;

struct sleeplock {
  uint locked;       // Is the lock held?
  struct spinlock lk;
  ...
};

struct spinlock tickslock;

spinlocks
void acquire(struct spinlock *lk)
{
  pushcli(); // disable interrupts to avoid deadlock.
  if(holding(lk))
    panic("acquire");
  ......
void pushcli(void)
{
  int eflags;

  eflags = readeflags();
  cli();
  if(mycpu()->ncli == 0)
    mycpu()->intena = eflags & FL_IF;
  mycpu()->ncli += 1;
}

static inline uint
readeflags(void)
{
  uint eflags;
  asm volatile("pushfl; popl %0" : "=r" (eflags));
  return eflags;
}

pushcli(): disable interrupts on that processor
One after another, many acquire() calls can be made on different spinlocks
Keep a count of them in mycpu()->ncli
spinlocks
void
release(struct spinlock *lk)
{
  ...
  asm volatile("movl $0, %0" : "+m" (lk->locked) : );
  popcli();
}

void popcli(void)
{
  if(readeflags()&FL_IF)
    panic("popcli - interruptible");
  if(--mycpu()->ncli < 0)
    panic("popcli");
  if(mycpu()->ncli == 0 && mycpu()->intena)
    sti();
}

popcli(): restore interrupts only if this popcli() call brings ncli back to 0
and interrupts were enabled before the first pushcli() was called
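The nesting rule of pushcli()/popcli() can be simulated with an ordinary variable standing for the interrupt flag (FL_IF). In this sketch toy_if, toy_cpu and the two functions are hypothetical stand-ins for eflags, mycpu(), cli() and sti(); assert() stands in for panic().

```c
#include <assert.h>

static int toy_if = 1;                              /* stands in for FL_IF in eflags */
static struct { int ncli; int intena; } toy_cpu;    /* like mycpu()->ncli, ->intena */

void toy_pushcli(void) {
  int was_enabled = toy_if;       /* readeflags() & FL_IF */
  toy_if = 0;                     /* cli() */
  if (toy_cpu.ncli == 0)
    toy_cpu.intena = was_enabled; /* remember the state before the outermost pushcli */
  toy_cpu.ncli += 1;
}

void toy_popcli(void) {
  assert(!toy_if);                /* panic("popcli - interruptible") */
  toy_cpu.ncli -= 1;
  assert(toy_cpu.ncli >= 0);      /* panic("popcli") */
  if (toy_cpu.ncli == 0 && toy_cpu.intena)
    toy_if = 1;                   /* sti() only at the outermost popcli */
}
```

Two nested acquire()/release() pairs therefore keep interrupts off until the second (outermost) release, and if interrupts were already off before the first pushcli(), they stay off.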
spinlocks

Always disable interrupts while acquiring spinlock

Suppose iderw held the idelock and then got interrupted to run
ideintr.

Ideintr would try to lock idelock, see it was held, and wait for it to be
released.

In this situation, idelock will never be released

Deadlock

General OS rule: if a spin-lock is used by an interrupt handler,
a processor must never hold that lock with interrupts enabled

Xv6 rule: when a processor enters a spin-lock critical section,
xv6 always ensures interrupts are disabled on that processor.
sleeplocks

Sleeplocks don’t spin. They move a process to a
wait-queue if the lock can’t be acquired

XV6 approach to “wait-queues”

Any memory address serves as a “wait channel”

The sleep() and wakeup() functions just use that
address as a ‘condition’

There are no per-condition process queues! Just one
global queue of processes used for scheduling, sleep,
wakeup etc. --> Linear search every time!

costly, but simple
sleep()
void
sleep(void *chan, struct spinlock *lk)
{
  struct proc *p = myproc();
  ....
  if(lk != &ptable.lock){
    acquire(&ptable.lock);
    release(lk);
  }
  p->chan = chan;
  p->state = SLEEPING;
  sched();

  // Tidy up.
  p->chan = 0;

  // Reacquire original lock.
  if(lk != &ptable.lock){
    release(&ptable.lock);
    acquire(lk);
  }
}

At the call, the caller must hold the lock (lk) on the resource on which it is going to sleep
Since sleep() is going to change p-> values and call sched(), it takes ptable.lock (if not already held) before releasing lk
p->chan = given address: remembers on which condition the process is waiting
The call to sched() blocks the process
Calls to sleep(): examples of “chan” (output from cscope)
0 console.c   consoleread   251 sleep(&input.r, &cons.lock);
2 ide.c       iderw         169 sleep(b, &idelock);
3 log.c       begin_op      131 sleep(&log, &log.lock);
6 pipe.c      piperead      111 sleep(&p->nread, &p->lock);
7 proc.c      wait          317 sleep(curproc, &ptable.lock);
8 sleeplock.c acquiresleep   28 sleep(lk, &lk->lk);
9 sysproc.c   sys_sleep      74 sleep(&ticks, &tickslock);
Wakeup()
void wakeup(void *chan)
{
  acquire(&ptable.lock);   // acquire ptable.lock since you are going to change ptable and p-> values
  wakeup1(chan);
  release(&ptable.lock);
}

static void wakeup1(void *chan)
{
  struct proc *p;

  // just a linear search in the process table for a process where p->chan is the given address
  for(p = ptable.proc; p < &ptable.proc[NPROC]; p++)
    if(p->state == SLEEPING && p->chan == chan)
      p->state = RUNNABLE;   // make it runnable
}
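A sketch of the “any address is a channel” idea: a toy process table searched linearly, as wakeup1() does. toy_proc, toy_sleep_on and toy_wakeup are hypothetical; there is no locking and no real blocking here, only the chan matching. Unlike wakeup1(), this version also returns how many processes it woke, just to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>

enum { TS_RUNNABLE, TS_SLEEPING };
#define TOY_NPROC 8

struct toy_proc { int state; void *chan; };
static struct toy_proc toy_ptable[TOY_NPROC];

/* The bookkeeping half of sleep(): remember the channel, mark SLEEPING. */
void toy_sleep_on(struct toy_proc *p, void *chan) {
  p->chan = chan;
  p->state = TS_SLEEPING;
}

/* Like wakeup1(): wake every process sleeping on exactly this address. */
int toy_wakeup(void *chan) {
  int woken = 0;
  for (struct toy_proc *p = toy_ptable; p < &toy_ptable[TOY_NPROC]; p++)
    if (p->state == TS_SLEEPING && p->chan == chan) {
      p->state = TS_RUNNABLE;
      woken++;
    }
  return woken;
}
```

Note that all processes sleeping on the same address are woken; each must re-check its condition after sleep() returns, which is why xv6 callers put sleep() inside a while loop.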
sleeplock
// Long-term locks for processes
struct sleeplock {
uint locked; // Is the lock held?
struct spinlock lk; // spinlock protecting this sleep lock

// For debugging:
char *name; // Name of lock.
int pid; // Process holding lock
};
Sleeplock acquire and release
void
acquiresleep(struct sleeplock *lk)
{
  acquire(&lk->lk);
  while (lk->locked) {
    /* Abhijit: interrupts are not disabled in sleep !*/
    sleep(lk, &lk->lk);
  }
  lk->locked = 1;
  lk->pid = myproc()->pid;
  release(&lk->lk);
}

void
releasesleep(struct sleeplock *lk)
{
  acquire(&lk->lk);
  lk->locked = 0;
  lk->pid = 0;
  wakeup(lk);
  release(&lk->lk);
}
Where are sleeplocks used?

struct buf 
Just two !

waiting for I/O on
this buffer

struct inode

waiting for I/O to this
inode
Sleeplocks issues

sleep-locks support yielding the processor during their critical
sections.

This property poses a design challenge:

if thread T1 holds lock L1 and has yielded the processor,

and thread T2 wishes to acquire L1,

we have to ensure that T1 can execute

while T2 is waiting so that T1 can release L1.

T2 can’t use the spin-lock acquire function here: it spins with interrupts
turned off, and that would prevent T1 from running.

To avoid this deadlock, the sleep-lock acquire routine (called
acquiresleep) yields the processor while waiting, and does not
disable interrupts.
Because sleep-locks leave interrupts enabled, they cannot be used in
interrupt handlers.
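Lock Ordering

Example: creating a file requires simultaneously holding a
lock on the directory, a lock on the new file’s
inode, a lock on a disk block buffer, idelock,
and ptable.lock. To avoid deadlock, these locks are always acquired in this order.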
Interesting case of holding and releasing
ptable.lock in scheduling

One process acquires, another releases!


Giving up CPU

A process that wants to give up the CPU

must acquire the process table lock ptable.lock

release any other locks it is holding

update its own state (proc->state),

and then call sched()

Yield follows this convention, as do sleep and exit

A lock held by one process P1 will be released by another
process P2 that starts running after sched()

remember P2 returns either in yield() or sleep() (or, for a new process, in forkret())

In each of these, the first thing done is releasing ptable.lock
Interesting race if ptable.lock is
not held

Suppose P1 calls yield()

Suppose yield() does not take ptable.lock

Remember yield() is for a process to give up CPU

Yield sets process state of P1 to RUNNABLE

Before yield’s sched() calls swtch()

Another processor runs scheduler() and runs P1 on
that processor

Now we have P1 running on both processors!

P1 in yield taking ptable.lock prevents this
open,read, write, close, ... FS system calls
Files, Inodes, Buffers
What we already know

File system related system calls

deal with ‘fd’ arrays (ofile in xv6). open() returns the first empty index.
open should ideally locate the inode on disk and initialize some data
structures

maintain ‘offsets’ within a ‘file’ to support sequential read/write

dup() like system calls duplicate pointers in fd-array

read/write like system calls, going through ‘ofile’ array, should locate
data of file on disk

We need functions to read/write from disk – that is disk driver

cache data of files in OS data structures for performance : buffering

Need to handle on disk data structures as well

Faster recovery (like journaling in ext3) is desired
xv6 file handling code

Is a very good example in ‘design’ of a
layered and modular architecture

Splits the entire work into different modules,
and modules into functions properly

The task of each function is neatly defined
and compartmentalized
Layers of xv6 file system code
System Calls: open, read, write, close, link, pipe, mknod, unlink, fstat, mkdir, chdir, dup
file.c: fileinit, filealloc, filedup, fileclose, filestat, fileread, filewrite
fs.c: namex, namei, nameiparent, skipelem
fs.c: dirlookup, dirlink
fs.c: iinit, ialloc, iupdate, iget, idup, ilock, iunlock, iput, iunlockput, itrunc, stati, readi, writei, bmap
Block allocation on disk: balloc, bfree
log.c: begin_op, end_op, initlog, commit
bio.c: binit, bget, bread, bwrite, brelse
ide.c: idewait, ideinit, idestart, ideintr, iderw

Normally, any upper layer can call any layer below it

Abhijit: Block allocator should be considered as another layer!

Layout of xv6 file system
Pointers shown (in the figure) are
conceptual

See the code of mkfs.c to get insight into the layout

struct superblock {
uint size; // Size of file system image (blocks)
uint nblocks; // Number of data blocks
uint ninodes; // Number of inodes.
uint nlog; // Number of log blocks
uint logstart; // Block number of first log block
uint inodestart; // Block number of first inode block
uint bmapstart; // Block number of first free map block
};
#define ROOTINO 1 // root i-number
#define BSIZE 512 // block size
Layout of xv6 file system

#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(uint))
#define MAXFILE (NDIRECT + NINDIRECT)
// On-disk inode structure
struct dinode {
short type; // File type
short major; // Major device number (T_DEV only)
short minor; // Minor device number (T_DEV only)
short nlink; // Number of links to inode in file system
uint size; // Size of file (bytes)
uint addrs[NDIRECT+1]; // Data block addresses
};

#define DIRSIZ 14

struct dirent {
ushort inum;
char name[DIRSIZ];
};
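A quick check of the numbers these definitions imply, using copies of struct dinode and struct dirent with fixed-width types (short is 16 bits and uint is 32 bits on xv6's x86). The toy_ prefixes are hypothetical; IPB (inodes per block) mirrors the macro of the same name in xv6's fs.h.

```c
#include <assert.h>
#include <stdint.h>

#define BSIZE     512
#define NDIRECT   12
#define NINDIRECT (BSIZE / sizeof(uint32_t))
#define MAXFILE   (NDIRECT + NINDIRECT)
#define DIRSIZ    14

struct toy_dinode {
  int16_t type, major, minor, nlink;
  uint32_t size;
  uint32_t addrs[NDIRECT + 1];
};

#define IPB (BSIZE / sizeof(struct toy_dinode))   /* inodes per disk block */

struct toy_dirent {
  uint16_t inum;
  char name[DIRSIZ];
};
```

So a dinode is 64 bytes and one inode block holds 8 of them; the largest file is 12 + 128 = 140 blocks (71680 bytes, i.e. 70 KiB); and a directory entry is 16 bytes, so one directory block holds 32 entries.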
File on disk
Let’s discuss lowest layer first
ide.c: idewait, ideinit, idestart,
ideintr, iderw
static struct spinlock idelock;
static struct buf *idequeue;
static int havedisk1;

ideinit

is called from main.c: main()

Initializes the IDE controller by writing to certain ports

Checks whether disk 1 is present and, if so, sets havedisk1 = 1

Initializes idelock

idewait

Busy loop waiting for the IDE controller to be ready
ide.c: idewait, ideinit, idestart,
ideintr, iderw

static void idestart(struct buf *b)

Calculates the sector number on disk using b->blockno

Issues a read/write command to the IDE controller.

(b is the first buf on idequeue)

ideintr

Takes idelock. Called on an IDE interrupt (through alltraps()->trap())

Takes the first buffer off idequeue, reads its data from the controller if it was a read, marks it B_VALID (clearing B_DIRTY), and wakes up the process sleeping on it

Calls idestart() for the next buffer in idequeue, if any. Releases idelock.

iderw(buf *b)

Appends buf b to the end of idequeue

Calls idestart() if the disk is idle (b is at the head of idequeue), then does sleep(b, &idelock) until b's I/O is done
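The “append b to the end of idequeue” step walks the singly linked list through qnext with a pointer-to-pointer, so appending to an empty queue needs no special case. A sketch with a hypothetical toy_buf; instead of calling idestart(), toy_enqueue returns whether the caller should start the disk.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf { int blockno; struct toy_buf *qnext; };
static struct toy_buf *toy_idequeue;

/* Append b at the tail. Returns 1 if b is now at the head, i.e. the disk
 * was idle and the caller should start it (iderw() calls idestart(b)). */
int toy_enqueue(struct toy_buf *b) {
  struct toy_buf **pp;
  b->qnext = NULL;
  for (pp = &toy_idequeue; *pp; pp = &(*pp)->qnext)
    ;                              /* find the NULL link at the tail */
  *pp = b;
  return toy_idequeue == b;
}
```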
Let’s see buffer cache layer
Reminder: After main()->binit()

Figure: struct bcache holds lock, buf[0], buf[1], buf[2], ... and head. Conceptually the buffers are linked like this: a circular, doubly linked list through head, using next (n) and prev (p) pointers. Buffers keep moving on the list, as LRU.
struct buf
struct buf {
int flags; // 0 or B_VALID or B_DIRTY
uint dev; // device number
uint blockno; // seq block number on device
struct sleeplock lock; // Lock to be held by process using it
uint refcnt; // Number of live accesses to the buf
struct buf *prev; // LRU cache list
struct buf *next; // LRU cache list
struct buf *qnext; // disk queue
uchar data[BSIZE]; // data 512 bytes
};
#define B_VALID 0x2 // buffer has been read from disk
#define B_DIRTY 0x4 // buffer needs to be written to disk
buffer cache:
static struct buf* bget(uint dev, uint blockno)

The bcache.head list is maintained on a Most Recently
Used (MRU) basis

head.next is the Most Recently Used (MRU) buffer

hence head.prev is the Least Recently Used (LRU)

Look for a buffer with b->dev == dev and b->blockno == blockno

Search forward from head.next for an existing buffer (MRU order); if found, increment b->refcnt

Else search backward from head.prev (LRU order) for an unused buffer (refcnt == 0 and not dirty) and recycle it for this block, with refcnt = 1 and flags = 0

panic() if no unused buffer is found

Returns the buffer locked

Does not change the list structure, just returns a buf in
use
buffer cache:
struct buf* bread(uint dev, uint blockno)
struct buf*
bread(uint dev, uint blockno)
{
  struct buf *b;

  b = bget(dev, blockno);
  if((b->flags & B_VALID) == 0) {
    iderw(b);
  }
  return b; // locked buffer
}

void
bwrite(struct buf *b)
{
  if(!holdingsleep(&b->lock))
    panic("bwrite");
  b->flags |= B_DIRTY;
  iderw(b);
}
buffer cache:
void brelse(struct buf *b)

Release the (sleep) lock on the buffer

b->refcnt--

If b->refcnt becomes 0

It means the buffer is no longer used by anyone

Move it to the front of bcache.head (head.next, the MRU position)
Overall in this diagram

Figure: head and the buffers, linked in a circle through next (n) and prev (p) pointers

Buffers keep moving to the front of the list and around

The list always contains NBUF=30 buffers
head.next is always the MRU buffer and head.prev is always the LRU
buffer
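A sketch of the list manipulation behind this diagram: a circular doubly linked list through a dummy head, built as binit() does, and the brelse() step that moves a buffer whose refcnt drops to 0 to head.next (MRU). toy_buf, toy_binit, toy_brelse and TOY_NBUF are hypothetical; there is no locking or disk I/O.

```c
#include <assert.h>

#define TOY_NBUF 3

struct toy_buf {
  int id, refcnt;
  struct toy_buf *prev, *next;
};

static struct toy_buf head, buf[TOY_NBUF];

/* Like binit(): insert every buffer right after head. */
void toy_binit(void) {
  head.prev = head.next = &head;
  for (int i = 0; i < TOY_NBUF; i++) {
    buf[i].id = i;
    buf[i].next = head.next;
    buf[i].prev = &head;
    head.next->prev = &buf[i];
    head.next = &buf[i];
  }
}

/* Like the list part of brelse(): when the last user lets go,
 * unlink b and re-insert it at head.next (most recently used). */
void toy_brelse(struct toy_buf *b) {
  b->refcnt--;
  if (b->refcnt == 0) {
    b->next->prev = b->prev;
    b->prev->next = b->next;
    b->next = head.next;
    b->prev = &head;
    head.next->prev = b;
    head.next = b;
  }
}
```

After toy_binit() the order is head, buf[2], buf[1], buf[0]; releasing buf[0] moves it to the front, so head.prev (the LRU candidate that bget() would recycle first) becomes buf[1].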
Let’s see logging layer
log in xv6

a mechanism for recovery after a crash

Concept: a system call may need multiple write operations
(e.g. the ‘open’ system call creating a file in a
directory)

if a crash happens in between, some writes succeed and some don’t

leading to inconsistencies on disk

In the log, all changes for a ‘transaction’ (an
operation) are either written completely or not at all

During recovery, completed operations can be “rerun”
and incomplete operations neglected
log in xv6

An xv6 system call does not directly write the on-disk file system
data structures.

A system call calls begin_op() at the beginning and end_op() at the end

begin_op() increments log.outstanding

end_op() decrements log.outstanding, and if it’s 0, then calls commit()

During the code of the system call, whenever a buffer is modified
(and done with)

log_write() is called

This records the block number in the log header's array of blocks (log.lh.block[]) and marks the buffer B_DIRTY, so it stays in the buffer cache

When finally commit() is called, all modified blocks are copied
to the log on disk, the log header is written, and then the blocks are copied to their home locations on disk
log
struct logheader { // ON DISK
int n; // number of entries in use in block[] below
int block[LOGSIZE]; // List of block numbers stored
};
struct log { // only in memory
struct spinlock lock;
int start; // first log block on disk (starts with logheader)
int size; // total number of log blocks (in use out of 30)
int outstanding; // how many FS sys calls are executing.
int committing; // in commit(), please wait.
int dev; // FS device
struct logheader lh; // copy of the on disk logheader
};
struct log log;
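log on disk

Figure: the disk layout is boot block | super block | log | inodes | bitmap | data ....
The log starts at log.start with the logheader block; the log blocks are log.start + 1 ... log.start + 30.
Example: a logheader with n = 2 and block[30] = {52, 68, ...} (entries 0 1 2 3 ..... 29): log block log.start + 1 holds the data of the 52nd block, and log.start + 2 holds the data of the 68th block.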
Typical use case of logging
/* In a system call code */
begin_op();          // prepare for logging. Wait if the logging system is not ready
                     // or is ‘committing’. ++outstanding
...
bp = bread(...);     // read and get access to a data block – as a buffer
bp->data[...] = ...; // modify the buffer
log_write(bp);       // note down this buffer for writing, in the log. Proxy for
                     // bwrite(). Marks B_DIRTY. Absorbs multiple writes into one.
brelse(bp);          // release the buffer
...
end_op();            // syscall done: write the log and all blocks. --outstanding.
                     // If outstanding = 0, commit().
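The “absorb multiple writes into one” behaviour of log_write() comes from looking up the block number in the in-memory log header before adding it. A sketch over a copy of struct logheader; toy_log_write is hypothetical and omits the locking, the check against log.size and outstanding, and marking the buffer B_DIRTY.

```c
#include <assert.h>

#define LOGSIZE 30

struct logheader {
  int n;
  int block[LOGSIZE];
};

/* Record that blockno must be written at commit time. A block already in
 * the log keeps its single slot, so repeated writes are absorbed.
 * Returns 0 on success, -1 if the log is full. */
int toy_log_write(struct logheader *lh, int blockno) {
  int i;
  for (i = 0; i < lh->n; i++)
    if (lh->block[i] == blockno)   /* log absorption */
      return 0;
  if (lh->n >= LOGSIZE)
    return -1;                     /* xv6 panics instead: "too big a transaction" */
  lh->block[lh->n++] = blockno;
  return 0;
}
```

Writing blocks 52, 68 and then 52 again leaves n = 2 with block[] = {52, 68, ...}, the same state as the log-on-disk example: only the latest contents of block 52 are copied to the log at commit time.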


Example of calls to logging
// filewrite() code
begin_op();
ilock(f->ip);
/* loop */ r = writei(f->ip, ...);
iunlock(f->ip);
end_op();

Each writei() in turn calls bread(), log_write() and brelse()
It also calls iupdate(ip) when the file grows, which also calls bread(), log_write() and brelse()
Multiple writes are combined between begin_op() and end_op()
Let’s see block allocation layer


allocating & deallocating blocks
on DISK

balloc()

Looks for a block whose bitmap bit is zero, indicating that it is free.

On finding one, updates the bitmap and returns the block.

balloc() calls bread()->bget() to get a bitmap block from disk in a buffer.

Races are prevented by the fact that the buffer cache only lets one process use any one bitmap block at a time.

Calls log_write(bp)

Thus writes to bitmap blocks are also logged

bfree()

Finds the right bitmap block and clears the right bit.

Also calls log_write()

(so bfree's bitmap writes
are also logged)
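The bit manipulation in balloc()/bfree() can be tried on a single in-memory bitmap block. A sketch with a hypothetical toy_bitmap of BPB = BSIZE*8 bits, using the same numbering as xv6 (bit b lives in byte b/8 with mask 1 << (b % 8)); there is no buffer cache, logging, or zeroing of the allocated block here.

```c
#include <assert.h>

#define BSIZE 512
#define BPB   (BSIZE * 8)      /* blocks described by one bitmap block */

static unsigned char toy_bitmap[BSIZE];

/* Find a 0 bit, set it, return its block number; -1 if all are in use. */
int toy_balloc(void) {
  for (int bi = 0; bi < BPB; bi++) {
    int m = 1 << (bi % 8);
    if ((toy_bitmap[bi / 8] & m) == 0) {   /* is the block free? */
      toy_bitmap[bi / 8] |= m;             /* mark it in use */
      return bi;
    }
  }
  return -1;                               /* xv6 panics: "balloc: out of blocks" */
}

/* Clear the block's bit; freeing a free block is an error. */
void toy_bfree(int b) {
  int m = 1 << (b % 8);
  assert((toy_bitmap[b / 8] & m) != 0);    /* xv6: panic("freeing free block") */
  toy_bitmap[b / 8] &= (unsigned char)~m;
}
```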
Let’s see Inode Layer
On disk & in memory inodes
struct {
  struct spinlock lock;
  struct inode inode[NINODE];
} icache;

// in-memory copy of an inode
struct inode {
  uint dev;              // Device number
  uint inum;             // Inode number
  int ref;               // Reference count
  struct sleeplock lock; // protects everything below here
  int valid;             // inode has been read from disk?

  short type;            // copy of disk inode
  short major;
  short minor;
  short nlink;
  uint size;
  uint addrs[NDIRECT+1];
};

// On-disk inode structure
struct dinode {
  short type;            // File type
  short major;           // Major device number (T_DEV only)
  short minor;           // Minor device number (T_DEV only)
  short nlink;           // Number of links to inode in file system
  uint size;             // Size of file (bytes)
  uint addrs[NDIRECT+1]; // Data block addresses
};
In memory inodes

The kernel keeps a subset of the on-disk inodes, those in use, in memory

as long as ‘ref’ is > 0

The iget and iput functions acquire and release pointers to an inode, modifying the ref count.

See the caller graph of iget(): all those who call iget()

The sleep lock in ‘inode’ protects the fields in the inode and the data blocks of the inode
iget and iupdate

iget

Searches for an inode in icache

If found, increments ref and returns a pointer to the inode

Else gets an empty inode, initializes it, sets ref=1 and returns it

No lock is held after iget()

Code must call ilock() after iget() to get the lock

During lookup (later), many processes can iget() an inode, but only one holds the lock

iupdate(inode *ip)

Reads the on-disk block of the inode

Gets the on-disk inode

Modifies it as specified in ‘ip’, i.e. modifies the disk block of the inode

log_write(disk block of inode)
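itrunc, iput

itrunc(ip)

Frees all data blocks of the inode on disk using bfree()

Sets the size to 0 and calls iupdate(ip)

Called only when ‘ref’ is about to become zero and the inode has no links

iput(ip)

If ref is 1 (the last reference) and the inode has no links (nlink == 0):
  itrunc(ip)
  type = 0
  iupdate(ip)
  ip->valid = 0 // free in memory

In all cases: ref--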
race in iput?

A concurrent thread might be waiting in ilock to use this inode

and won’t be prepared to find the inode is no longer allocated

This is not possible. Why?

There is no way for a syscall to get a ref to an inode that has no links (nlink == 0) and has ip->ref = 1

void
iput(struct inode *ip)
{
  acquiresleep(&ip->lock);
  if(ip->valid && ip->nlink == 0){
    acquire(&icache.lock);
    int r = ip->ref;
    release(&icache.lock);
    if(r == 1){
      // inode has no links and no other references: truncate and free.
      itrunc(ip);
      ...
buffer and inode cache

To read an inode, its block must be read into a buffer

So the buffer always contains a copy of the on-disk dinode

There is a duplicate copy in the in-memory inode

The inode cache is write-through: code that modifies a cached inode must immediately write it to disk with iupdate

The inode may still exist in the buffer cache
allocating inode

Loop over all disk inodes

read inode (from its block)

if it's free (note inum):
  zero the on-disk inode and set its type, which marks it allocated
  write the on-disk inode (log_write of its block)
  return iget(dev, inum)

panic if no free inodes

ilock

Code must acquire ilock before using the inode's data/fields

ilock reads the inode from disk if it is not already in memory
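A sketch of ialloc()'s scan, assuming (hypothetically) that the on-disk inodes sit in an in-memory array rather than in disk blocks read with bread(). A type of 0 marks a free inode. The sketch returns the inode number at the point where xv6 would log_write() the block and return iget(dev, inum).

```c
#include <assert.h>
#include <string.h>

#define NINODES 8     /* toy number of on-disk inodes (assumption) */
#define NDIRECT 12

struct dinode {
  short type;         /* 0 means the inode is free */
  short major, minor, nlink;
  unsigned int size;
  unsigned int addrs[NDIRECT + 1];
};

/* Toy "disk" of inodes; xv6 reads them block by block with bread(). */
static struct dinode disk_inodes[NINODES];

/* Sketch of ialloc()'s loop: returns the allocated inode number.
   Like xv6, it starts at inode number 1. */
unsigned int toy_ialloc(short type)
{
  for (unsigned int inum = 1; inum < NINODES; inum++) {
    struct dinode *dip = &disk_inodes[inum];
    if (dip->type == 0) {              /* a free inode */
      memset(dip, 0, sizeof(*dip));    /* zero it ... */
      dip->type = type;                /* ... and mark it allocated */
      return inum;
    }
  }
  assert(!"ialloc: no inodes");        /* xv6 panics here */
  return 0;
}
```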
Trouble with iput() and crashes

iput() doesn't truncate a file immediately when the link count for the file drops to zero, because some process might still hold a reference to the inode in memory: a process might still be reading and writing to the file, because it successfully opened it.

If a crash happens before the last process closes the file descriptor for the file, then the file will be marked allocated on disk but no directory entry points to it.

Unsolved problem.

How to solve it?
Get Inode data: bmap(ip, bn)

Return the disk address of the 'bn'th block of the file given by inode 'ip', allocating the block if it does not exist yet

A newly allocated block's address is stored either in the direct entries or in the block of indirect entries

Allocate the block of indirect entries too, if needed, using balloc()
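bmap()'s direct/indirect lookup can be tried on a toy disk. The disk array, NBLOCKS and toy_balloc() are hypothetical stand-ins for the buffer cache and balloc(); BSIZE = 512 and NDIRECT = 12 are xv6's values. With 4-byte block numbers an indirect block holds 128 entries, so a file has at most 12 + 128 blocks.

```c
#include <assert.h>
#include <string.h>

#define BSIZE 512
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(unsigned int))   /* entries per indirect block */
#define NBLOCKS 1024                                /* toy disk size (assumption) */

struct inode { unsigned int addrs[NDIRECT + 1]; };  /* only the block map */

/* Toy disk: each block viewed as an array of block numbers. Block 0 is never
   handed out, so address 0 means "not allocated yet", as in xv6. */
static unsigned int disk[NBLOCKS][NINDIRECT];
static unsigned int next_free = 1;

/* Hypothetical stand-in for balloc(): hand out the next block, zeroed. */
static unsigned int toy_balloc(void)
{
  assert(next_free < NBLOCKS);
  memset(disk[next_free], 0, sizeof disk[next_free]);
  return next_free++;
}

/* Sketch of bmap(): disk address of block bn of the file, allocating on demand. */
unsigned int bmap(struct inode *ip, unsigned int bn)
{
  if (bn < NDIRECT) {                       /* direct entry */
    if (ip->addrs[bn] == 0)
      ip->addrs[bn] = toy_balloc();
    return ip->addrs[bn];
  }
  bn -= NDIRECT;
  assert(bn < NINDIRECT);                   /* xv6 panics: "bmap: out of range" */

  if (ip->addrs[NDIRECT] == 0)              /* allocate the indirect block itself */
    ip->addrs[NDIRECT] = toy_balloc();
  unsigned int *a = disk[ip->addrs[NDIRECT]];
  if (a[bn] == 0)
    a[bn] = toy_balloc();                   /* xv6 also log_write()s the indirect block */
  return a[bn];
}
```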
writing/reading data at a given offset in file

readi(struct inode *ip, char *dst, uint off, uint n)
writei(struct inode *ip, char *src, uint off, uint n)

Calculate the block number in the file where 'off' belongs

Read sufficient blocks to read 'n' bytes, using bread(), brelse()

Call devsw[ip->major].read if the inode is a device inode

writei() also updates the size if required
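The offset arithmetic of readi() is the interesting part: block index off / BSIZE, position inside the block off % BSIZE, and a per-step copy size that never crosses a block boundary. The sketch below assumes a toy file whose blocks are held in memory (struct toyfile is hypothetical); in xv6 each block comes from bread(ip->dev, bmap(ip, off/BSIZE)).

```c
#include <assert.h>
#include <string.h>

#define BSIZE 512
#define NBLK 4                         /* toy file of at most 4 blocks */

/* Toy file: data blocks kept directly in memory. */
struct toyfile {
  unsigned int size;                   /* file size in bytes */
  char blocks[NBLK][BSIZE];
};

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

/* Sketch of readi()'s loop: returns bytes copied, or -1 on a bad offset. */
int toy_readi(struct toyfile *f, char *dst, unsigned int off, unsigned int n)
{
  if (off > f->size || off + n < off)  /* past EOF, or unsigned overflow */
    return -1;
  if (off + n > f->size)
    n = f->size - off;                 /* clamp at end of file */

  unsigned int tot, m;
  for (tot = 0; tot < n; tot += m, off += m, dst += m) {
    char *blk = f->blocks[off / BSIZE];         /* block holding 'off' */
    m = min_u(n - tot, BSIZE - off % BSIZE);    /* stay inside that block */
    memcpy(dst, blk + off % BSIZE, m);
  }
  return (int)n;
}
```

For example, reading 20 bytes at offset 500 copies 12 bytes from block 0 and then 8 bytes from block 1.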
Reading Directory Layer
System calls: open, read, write, close, link, pipe, mknod, unlink, fstat, mkdir, chdir, dup
file.c: fileinit, filealloc, filedup, fileclose, filestat, fileread, filewrite
fs.c: namex, namei, nameiparent, skipelem
fs.c: dirlookup, dirlink
fs.c: iinit, ialloc, iupdate, iget, idup, ilock, iunlock, iput, iunlockput, itrunc, stati, readi, writei, bmap
Block allocation on disk: balloc, bfree
log.c: begin_op, end_op, initlog, commit
bio.c: binit, bget, bread, bwrite, brelse
ide.c: idewait, ideinit, idestart, ideintr, iderw


directory entry
#define DIRSIZ 14

struct dirent {
  ushort inum;
  char name[DIRSIZ];
};
The data of a directory file is a sequence of such entries. To find a name, just get all the data blocks and search for the name.
How to get the data for a directory? We already know the answer!
struct inode*
dirlookup(struct inode *dp, char *name, uint *poff)

Given a pointer to a directory inode (dp) and the name of the file to be searched for:

return the pointer to the inode of that file (NULL if not found)

set *poff to the offset of the entry found, inside the directory's data blocks

How was 'dp' obtained? Who should be calling dirlookup? Why is poff returned?

During resolution of pathnames?

Code: call readi() to get the data of dp and search for name in it; each name comes with an inode number; iget() that inode number
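A sketch of dirlookup()'s search, assuming (unlike xv6, which reads one entry at a time with readi()) that the directory's data has already been read into an array of struct dirent. It returns the inode number where xv6 would return iget(dp->dev, inum), and it reports the byte offset of the entry through poff. toy_dirlookup() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14

struct dirent {
  unsigned short inum;       /* 0 means the slot is free */
  char name[DIRSIZ];         /* not NUL-terminated if exactly DIRSIZ chars */
};

/* Returns the inode number of 'name', or 0 if it is not in the directory. */
unsigned short toy_dirlookup(const struct dirent *ents, unsigned int nents,
                             const char *name, unsigned int *poff)
{
  for (unsigned int i = 0; i < nents; i++) {
    if (ents[i].inum == 0)
      continue;                                      /* skip free slots */
    if (strncmp(name, ents[i].name, DIRSIZ) == 0) {  /* namecmp() in xv6 */
      if (poff)
        *poff = (unsigned int)(i * sizeof(struct dirent));
      return ents[i].inum;
    }
  }
  return 0;
}
```

Returning the offset lets a caller such as unlink overwrite exactly that entry afterwards.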
int
dirlink(struct inode *dp, char *name, uint inum)

Create a new entry (name, inum) in the directory given by 'dp'

The inode number must have been obtained before calling this. How to do that?

Use dirlookup() to verify the entry does not exist!

Get an empty slot in the directory's data blocks (or append at the end)

Make the directory entry

Update the directory inode! writei()
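The dirlink() steps can be sketched on a toy directory kept in memory. struct toydir, MAXENTS and toy_dirlink() are hypothetical; in xv6 the entries live in the directory's data blocks, the slot search uses readi(), and the final write (which may grow dp->size) uses writei().

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14
#define MAXENTS 8     /* toy directory capacity (assumption) */

struct dirent {
  unsigned short inum;
  char name[DIRSIZ];
};

/* Toy directory whose data is an in-memory array instead of disk blocks. */
struct toydir {
  unsigned int nents;            /* plays the role of dp->size / sizeof(dirent) */
  struct dirent ents[MAXENTS];
};

/* Returns 0 on success, -1 if the name exists or there is no room. */
int toy_dirlink(struct toydir *dp, const char *name, unsigned short inum)
{
  unsigned int i;
  for (i = 0; i < dp->nents; i++)      /* 1. the name must not exist yet */
    if (dp->ents[i].inum != 0 && strncmp(dp->ents[i].name, name, DIRSIZ) == 0)
      return -1;

  for (i = 0; i < dp->nents; i++)      /* 2. reuse a free slot if there is one */
    if (dp->ents[i].inum == 0)
      break;
  if (i == dp->nents) {                /*    otherwise append at the end */
    if (dp->nents == MAXENTS)
      return -1;
    dp->nents++;                       /*    writei() would grow dp->size */
  }
  strncpy(dp->ents[i].name, name, DIRSIZ);   /* 3. make the entry */
  dp->ents[i].inum = inum;
  return 0;
}
```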
namex

Called by namei() or nameiparent()

Iteratively splits a path on the "/" separator and gets the inode for the last component

iget() the root inode (or use the current directory for a relative path), then

repeatedly: split off the next component on "/" and dirlookup() it in the current directory
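The splitting is done by xv6's skipelem() in fs.c. It is reproduced below as a standalone function, with one small change (const added so it can be called on string literals); namex() calls it in a loop, doing one dirlookup() per element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIRSIZ 14

/* Copies the next path element into name and returns the rest of the path
   with leading slashes removed, or NULL if there is no element left. */
const char *skipelem(const char *path, char *name)
{
  assert(path != NULL && name != NULL);
  while (*path == '/')
    path++;
  if (*path == 0)
    return NULL;
  const char *s = path;
  while (*path != '/' && *path != 0)
    path++;
  size_t len = (size_t)(path - s);
  if (len >= DIRSIZ) {
    memmove(name, s, DIRSIZ);          /* truncated; not NUL-terminated */
  } else {
    memmove(name, s, len);
    name[len] = 0;
  }
  while (*path == '/')
    path++;
  return path;
}
```

For example, skipelem("a/bb//c", name) sets name to "a" and returns "bb//c"; the next call sets name to "bb" and returns "c".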

races in namex()

Crucial: namex is called very often!

While one kernel thread is looking up a pathname, another kernel thread may be changing the directory by calling unlink

When executing dirlookup in namex, the lookup thread holds the lock on the directory, and dirlookup() returns an inode that was obtained using iget (so its raised ref count keeps it in memory).

Deadlock? When looking up ".", next points to the same inode as ip. Locking next before releasing the lock on ip would result in a deadlock.

So namex unlocks the directory before obtaining a lock on next.
File descriptor layer code
System calls: open, read, write, close, link, pipe, mknod, unlink, fstat, mkdir, chdir, dup
file.c: fileinit, filealloc, filedup, fileclose, filestat, fileread, filewrite
fs.c: namex, namei, nameiparent, skipelem
fs.c: dirlookup, dirlink
fs.c: iinit, ialloc, iupdate, iget, idup, ilock, iunlock, iput, iunlockput, itrunc, stati, readi, writei, bmap
Block allocation on disk: balloc, bfree
log.c: begin_op, end_op, initlog, commit
bio.c: binit, bget, bread, bwrite, brelse
ide.c: idewait, ideinit, idestart, ideintr, iderw


data structures related to "file" layer
struct file {
  enum { FD_NONE, FD_PIPE, FD_INODE } type;
  int ref; // reference count
  char readable;
  char writable;
  struct pipe *pipe; // used only if it works as a pipe
  struct inode *ip;
  uint off;
};
// interesting: no lock in struct file!

struct proc {
  ...
  struct file *ofile[NOFILE]; // Open files per process
  ...
};

struct {
  struct spinlock lock;
  struct file file[NFILE];
} ftable; // global table from which a 'struct file' is allocated to every process
// the lock is used to protect updates to every entry in the array

Multiple processes accessing the same file

Each will get a different 'struct file'

but share the inode!

a different offset in struct file, for each process

Also true if the same process opens the file many times

A file can be a PIPE (more later)

What about the STDIN, STDOUT, STDERR files?

Figure out!

ref

used if the file was 'duped' or the process forked. In that case the 'struct file' is shared
file layer functions

filealloc
find an empty struct file in 'ftable' and return it
set ref = 1

fileclose
--ref
if ref == 0: free the struct file, then iput() / pipeclose()
note: iput() must be called inside a transaction (begin_op/end_op)

filedup(file *)
simply ref++

filestat
simply return fields from the inode, after holding its lock
on inodes for files only
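The ref counting in filealloc()/filedup()/fileclose() can be sketched as below. This is a simplified single-threaded model (assumption): no ftable.lock, and fileclose() only marks the entry free where xv6 would call pipeclose() or iput(). NFILE = 100 matches xv6's param.h.

```c
#include <assert.h>
#include <stddef.h>

#define NFILE 100   /* ftable size, as in xv6's param.h */

enum ftype { FD_NONE, FD_PIPE, FD_INODE };

struct file {
  enum ftype type;
  int ref;          /* how many fd slots (possibly in several processes) share it */
};

static struct file ftable[NFILE];   /* xv6 wraps this with a spinlock */

/* Sketch of filealloc(): first entry with ref == 0. */
struct file *filealloc(void)
{
  for (int i = 0; i < NFILE; i++)
    if (ftable[i].ref == 0) {
      ftable[i].ref = 1;
      return &ftable[i];
    }
  return NULL;
}

/* Sketch of filedup(): one more sharer, after dup() or fork(). */
struct file *filedup(struct file *f)
{
  assert(f->ref >= 1);
  f->ref++;
  return f;
}

/* Sketch of fileclose(): only the last close releases the entry. */
void fileclose(struct file *f)
{
  assert(f->ref >= 1);
  if (--f->ref > 0)
    return;
  /* here xv6 calls pipeclose(), or iput() between begin_op() and end_op() */
  f->type = FD_NONE;
}
```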
file layer functions

fileread
call readi() or piperead()
readi() later calls the device read or the inode read (using bread())

Why does readi() call read on the device? Why doesn't fileread() itself call the device read?

filewrite
call pipewrite() or writei()
writei() is called in a loop, a few blocks at a time, each call within its own transaction so that one transaction never exceeds the size of the log
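Why writei() is called in a loop: one transaction may only write a bounded number of blocks, so filewrite() splits a large write into chunks and gives each chunk its own begin_op()/end_op(). The chunk formula below is the one in xv6's filewrite() (with MAXOPBLOCKS = 10); toy_write_chunk() and the ntransactions counter are hypothetical stand-ins.

```c
#include <assert.h>

#define BSIZE 512
#define MAXOPBLOCKS 10   /* max blocks one FS operation may write, as in xv6 */

/* Bytes per chunk: leaves room in one transaction for the inode, the
   indirect block, allocation blocks, and partial blocks at both ends. */
static const int max_chunk = ((MAXOPBLOCKS - 1 - 1 - 2) / 2) * BSIZE;

static int ntransactions;   /* counts begin_op()/end_op() pairs (toy) */

/* Hypothetical stand-in for: begin_op(); writei(...); end_op(); */
static int toy_write_chunk(int off, int n)
{
  (void)off;
  ntransactions++;
  return n;
}

/* Sketch of filewrite()'s loop for an inode-backed file. */
int toy_filewrite(int off, int n)
{
  int i = 0;
  while (i < n) {
    int n1 = n - i;
    if (n1 > max_chunk)
      n1 = max_chunk;
    int r = toy_write_chunk(off, n1);
    if (r <= 0)
      break;
    off += r;
    i += r;
  }
  return i == n ? n : -1;
}
```

With these constants a chunk is 3 blocks (1536 bytes), so a 4000-byte write takes three transactions.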
pipes
struct pipe {
  struct spinlock lock;
  char data[PIPESIZE];
  uint nread;     // number of bytes read
  uint nwrite;    // number of bytes written
  int readopen;   // read fd is still open
  int writeopen;  // write fd is still open
};

functions
pipealloc
pipeclose
piperead
pipewrite
pipes

pipealloc
allocate two struct file
allocate the pipe itself using kalloc (it's a big structure with an array)
init lock
initialize both struct file as the 2 ends (r/w)

pipewrite
wait if pipe full
write to pipe
wakeup processes waiting to read

piperead
wait if no data
read from pipe
wakeup processes waiting to write

Good producer consumer code!
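The core of the pipe is a ring buffer indexed by the running counters nread and nwrite: the pipe is empty when nread == nwrite and full when nwrite == nread + PIPESIZE. The sketch below keeps only that part (assumption: no lock, no sleep/wakeup; the copy loops stop where xv6 would sleep). toy_pipewrite() and toy_piperead() are hypothetical names.

```c
#include <assert.h>

#define PIPESIZE 512   /* as in xv6 */

/* Ring-buffer part of struct pipe; lock, readopen, writeopen omitted. */
struct pipe {
  char data[PIPESIZE];
  unsigned int nread;    /* total bytes ever read */
  unsigned int nwrite;   /* total bytes ever written */
};

/* Copy loop of pipewrite(): stops when full, where xv6 would
   wakeup(&p->nread) and sleep(&p->nwrite). */
int toy_pipewrite(struct pipe *p, const char *src, int n)
{
  int i;
  for (i = 0; i < n; i++) {
    if (p->nwrite == p->nread + PIPESIZE)   /* full */
      break;
    p->data[p->nwrite++ % PIPESIZE] = src[i];
  }
  return i;
}

/* Copy loop of piperead(): stops when empty (xv6 sleeps first if the
   pipe is empty, then wakes writers after copying). */
int toy_piperead(struct pipe *p, char *dst, int n)
{
  int i;
  for (i = 0; i < n; i++) {
    if (p->nread == p->nwrite)              /* empty */
      break;
    dst[i] = p->data[p->nread++ % PIPESIZE];
  }
  return i;
}
```

Using ever-growing counters instead of wrapped indices makes "full" and "empty" distinguishable without wasting a slot.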
Further to reading system call code now

Now we are ready to read the code of the system calls on the file system

sys_open, sys_write, sys_read, etc.

Advice: before you read the code of these, contemplate what these functions should do, using the functions we have studied so far.

Also think of the locks that need to be held.
Extra Slides
(ignore)
Possible assignments

Create a device with major number 4 and add code for handling that device

Implement lseek system call

Implement buddy or slab allocator for kernel data
structures

Implement “ps” in xv6

Print stack trace in panic()

Add “exit” to shell and make sure that the OS halts
Possible assignments

bget() panics if no buffers are available. Make it wait and wake up when a buffer is available. Ensure no deadlocks.

Inodes may be marked allocated on disk even though they are not in use anymore. Solve this problem.
Extra slides
(useless)
Drivers
Disk Driver in xv6

void
ideinit(void)
{
  int i;

  initlock(&idelock, "ide");
  ioapicenable(IRQ_IDE, ncpu - 1);
  idewait(0);

  // Check if disk 1 is present
  outb(0x1f6, 0xe0 | (1<<4));
  for(i=0; i<1000; i++){
    if(inb(0x1f7) != 0){
      havedisk1 = 1;
      break;
    }
  }

  // Switch back to disk 0.
  outb(0x1f6, 0xe0 | (0<<4));
}

ioapicenable(): enable interrupts on the IRQ_IDE line, only on the last CPU; the last CPU handles these interrupts
idewait(0): wait for the disk to be ready
The loop: check if you have another disk
forkret()

void
forkret(void)
{
  static int first = 1;
  // Still holding ptable.lock from scheduler.
  release(&ptable.lock);

  if (first) {
    // Some initialization functions must be run in the context
    // of a regular process (e.g., they call sleep), and thus cannot
    // be run from main().
    first = 0;
    iinit(ROOTDEV);
    initlog(ROOTDEV);
  }

  // Return to "caller", actually trapret (see allocproc).
}

Doesn't do much
Releases ptable.lock. Why? We will see later
Does some initialization if this process is the first one to run, i.e. "initcode"
Returns. To? trapret()
Why? In allocproc() we placed the address of trapret on the stack, just above the context whose eip is forkret
trapret

We have already seen the concept of trapret

It will just pop the entire trap frame off the stack

And return (iret)

Where?

EIP = 0 in the trapframe (set in userinit())

CS already has privilege level 3, i.e. the user code segment (from the trapframe)

pgdir points to the process's page directory

So it just jumps to start in initcode.S, at user virtual address 0
Initcode.S

start:
  pushl $argv
  pushl $init
  pushl $0   // where caller pc would be
  movl $SYS_exec, %eax
  int $T_SYSCALL

init:
  .string "/init\0"

argv:
  .long init
  .long 0

General form: exec("filename", argv) where argv = { arg1, arg2, ..., NULL }
Here: exec("/init", argv) with argv = { "/init", NULL }

Next: we go to the land of exec() and fork()
Processes
Logical layout of memory for a process

Address 0: code

Then globals

Then stack

Then heap

Each process's address space also maps the kernel's text and data, so that system calls run with these mappings

Kernel code can therefore directly access user memory
Process Table
struct {
  struct spinlock lock;
  struct proc proc[NPROC];
} ptable;

One single global array of processes
Protected by ptable.lock
Struct proc
// Per-process state
struct proc {
  uint sz;                     // Size of process memory (bytes)
  pde_t* pgdir;                // Page table
  char *kstack;                // Bottom of kernel stack for this process
  enum procstate state;        // Process state: allocated, ready to run, running, waiting for I/O, or exiting
  int pid;                     // Process ID
  struct proc *parent;         // Parent process
  struct trapframe *tf;        // Trap frame for current syscall
  struct context *context;     // swtch() here to run process; process's saved context
  void *chan;                  // If non-zero, sleeping on chan (more when we discuss sleep, wakeup)
  int killed;                  // If non-zero, have been killed
  struct file *ofile[NOFILE];  // Open files, used by open(), read(), ...
  struct inode *cwd;           // Current directory, changed with chdir()
  char name[16];               // Process name (for debugging)
};
Process's stacks

2 stacks for each process

User stack and kernel stack (p->kstack)

When running user code, the user stack is used (the kernel stack is empty)

When running kernel code (system call or interrupt), the kernel stack is used; the user stack still contains the user code's local variables and parameters, untouched
Kernel mappings in user address space
actual location of kernel

Kernel is loaded at physical address 0x100000

PA 0 to 0x100000 is BIOS and devices

Process's page table will map VA 0x80000000 to PA 0x00000 and VA 0x80100000 to PA 0x100000
Kernel mappings in user address space
actual location of kernel

Kernel is not loaded at PA 0x80000000 because some systems may not have that much memory

0x80000000 is called KERNBASE in xv6
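Because the kernel half of every page table maps VA KERNBASE + x to PA x, converting between kernel virtual and physical addresses is a single add or subtract. The sketch below does the same arithmetic as xv6's V2P()/P2V() macros in memlayout.h, written as functions on 32-bit integers so it can run outside the kernel; the range checks are an addition of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000u              /* first kernel virtual address */
#define KERNLINK (KERNBASE + 0x100000u)   /* address the kernel is linked at */

static inline uint32_t v2p(uint32_t va)
{
  assert(va >= KERNBASE);   /* sanity check added in this sketch */
  return va - KERNBASE;
}

static inline uint32_t p2v(uint32_t pa)
{
  assert(pa < KERNBASE);    /* sanity check added in this sketch */
  return pa + KERNBASE;
}
```

So the kernel, linked at VA 0x80100000, lands at PA 0x100000, matching where it is loaded.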
Memory Management
X86 page table hardware (figure)
Layout of process's VA space (figure)
Memory Layout of a user process after exec() (figure): note the argc, argv on the stack
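Memory Layout of a user process

The "guard page" is the page just below the user stack. exec() allocates it but clears the user-accessible bit (PTE_U) in its page-table entry, so user code cannot touch it. If the stack grows too far (e.g. due to many nested function calls), the access to the guard page causes a page fault and the OS detects it.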
Memory Layout of a user process

On sbrk()
The system call to grow the process's address space. Calls growproc()

growproc()
Allocates frames and adds entries in the page table at the top (above proc->sz), using allocuvm()
These entries can't go beyond KERNBASE
Calls switchuvm()

switchuvm()
Ultimately loads CR3, which flushes the TLB (the cache of page-table entries)
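Free List in XV6

kmem { lock; use_lock; struct run *freelist; }

Seen independently (logically): kmem.freelist -> run -> run -> run

Actually, in memory: each 'struct run' is stored inside the free page it describes, so the free pages themselves form the list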
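A sketch of the kalloc()/kfree() free list, assuming a toy pool of NPAGES pages obtained with aligned_alloc() in place of physical RAM (toy_kinit() is hypothetical, playing the role of xv6's freerange()). As in xv6, the list node is written into the free page itself, so the list needs no extra memory; the lock, use_lock, and the junk-fill that xv6's kfree() does are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE 4096
#define NPAGES 4                       /* toy pool of physical pages (assumption) */

struct run {
  struct run *next;                    /* stored inside the free page itself */
};

static struct {
  struct run *freelist;                /* xv6 also has lock and use_lock */
} kmem;

static char *pool;                     /* toy stand-in for free physical memory */

/* Sketch of kfree(): push the page onto the free list. */
void kfree(char *v)
{
  assert((uintptr_t)v % PGSIZE == 0);  /* xv6 panics on misaligned pages */
  struct run *r = (struct run *)v;     /* the page's first bytes become the node */
  r->next = kmem.freelist;
  kmem.freelist = r;
}

/* Sketch of kalloc(): pop a page, or NULL when none are left. */
char *kalloc(void)
{
  struct run *r = kmem.freelist;
  if (r)
    kmem.freelist = r->next;
  return (char *)r;
}

/* Hypothetical setup: put every pool page on the free list. */
void toy_kinit(void)
{
  pool = aligned_alloc(PGSIZE, NPAGES * PGSIZE);
  assert(pool != NULL);
  for (int i = 0; i < NPAGES; i++)
    kfree(pool + i * PGSIZE);
}
```

The list is LIFO: the most recently freed page is the next one handed out.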


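exec()

sys_exec() calls exec(path, argv)

exec(path, argv)
  ip = namei(path)                                            // inode of the program file
  readi(ip, (char*)&elf, 0, sizeof(elf)) != sizeof(elf)       // read the ELF header
  for(i=0, off=elf.phoff; i<elf.phnum; i++, off+=sizeof(ph)){ // for each program header
    if(readi(ip, (char*)&ph, off, sizeof(ph)) != sizeof(ph))  // read it
    if((sz = allocuvm(pgdir, sz, ph.vaddr + ph.memsz)) == 0)  // grow memory to hold the segment
    if(loaduvm(pgdir, (char*)ph.vaddr, ip, ph.off, ph.filesz) < 0)  // load the segment from the file
  }
(each failed check jumps to exec()'s error path; not shown)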
exec()

exec(path, argv)
  // Allocate two pages at the next page boundary.
  // Make the first inaccessible. Use the second as the user stack.
  sz = PGROUNDUP(sz);
  if((sz = allocuvm(pgdir, sz, sz + 2*PGSIZE)) == 0)
  // Push argument strings, prepare rest of stack in ustack.
  for(argc = 0; argv[argc]; argc++) {
    sp = (sp - (strlen(argv[argc]) + 1)) & ~3;
    if(copyout(pgdir, sp, argv[argc], strlen(argv[argc]) + 1) < 0)
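The argument-pushing step can be simulated on a toy stack. The sketch assumes a one-page byte array standing in for the user stack page at a hypothetical user address base = 0x3000, and toy_copyout() replaces copyout() (which in xv6 writes through the new page table). The layout built is the one exec() prepares: fake return PC, argc, pointer to argv[], then argv[] itself, with the strings above it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STKSIZE 4096   /* one page of toy user stack (assumption) */
#define MAXARG 32      /* as in xv6 */

/* User virtual address va lives at stack[va - base]. */
static unsigned char stack[STKSIZE];
static const uint32_t base = 0x3000;          /* hypothetical stack page VA */

static void toy_copyout(uint32_t va, const void *p, uint32_t len)
{
  assert(va >= base && va + len <= base + STKSIZE);
  memcpy(&stack[va - base], p, len);
}

/* Sketch of exec()'s argument pushing: returns the final sp. */
uint32_t push_args(const char *argv[])
{
  uint32_t sp = base + STKSIZE;       /* stack grows down from the page top */
  uint32_t ustack[3 + MAXARG + 1];
  uint32_t argc;

  for (argc = 0; argv[argc]; argc++) {
    assert(argc < MAXARG);
    uint32_t len = (uint32_t)strlen(argv[argc]) + 1;
    sp = (sp - len) & ~3u;            /* room for the string, 4-byte aligned */
    toy_copyout(sp, argv[argc], len);
    ustack[3 + argc] = sp;            /* user address of this string */
  }
  ustack[3 + argc] = 0;               /* argv[argc] = NULL */

  ustack[0] = 0xffffffff;             /* fake return PC */
  ustack[1] = argc;
  ustack[2] = sp - (argc + 1) * 4;    /* where argv[] will start */

  sp -= (3 + argc + 1) * 4;
  toy_copyout(sp, ustack, (3 + argc + 1) * 4);
  return sp;
}
```

The fake return PC makes the stack look as if main(argc, argv) had been called normally.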
Various Data Structures in XV6
elf.h

Executable and Linkable Format (ELF) is Linux's standard format for executable files

elf.h contains the definitions needed to access ELF files, to some extent

It does not have complete ELF capabilities. Not needed either

struct elfhdr
Header at the start of every ELF file

struct proghdr
Every segment of an ELF file has a program header
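The first thing exec() checks after reading struct elfhdr is its magic number. A sketch of that check, assuming only the first two fields of the header (the real struct elfhdr continues with type, machine, entry, phoff, phnum and more) and a little-endian host such as the x86 that xv6 runs on; is_elf() and struct elfhdr_prefix are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457FU   /* "\x7FELF" read as a little-endian uint, as in xv6's elf.h */

/* Just the first two fields of struct elfhdr. */
struct elfhdr_prefix {
  uint32_t magic;
  uint8_t elf[12];
};

/* Returns 1 if buf starts with the ELF magic number. */
int is_elf(const unsigned char *buf, size_t len)
{
  struct elfhdr_prefix h;
  assert(buf != NULL);
  if (len < sizeof h)
    return 0;
  memcpy(&h, buf, sizeof h);
  return h.magic == ELF_MAGIC;
}
```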
Handling ELF in XV6

When is it needed?

During exec()!

Also during the loading of the kernel itself (by the boot loader)
List of system calls in xv6

fork()                     Create a process
exit()                     Terminate the current process
wait()                     Wait for a child process to exit
kill(pid)                  Terminate process pid
getpid()                   Return the current process's pid
sleep(n)                   Sleep for n clock ticks
exec(filename, *argv)      Load a file and execute it
sbrk(n)                    Grow process's memory by n bytes
open(filename, flags)      Open a file; the flags indicate read/write
read(fd, buf, n)           Read n bytes from an open file into buf
write(fd, buf, n)          Write n bytes to an open file
close(fd)                  Release open file fd
dup(fd)                    Duplicate fd
pipe(p)                    Create a pipe and return fd's in p
chdir(dirname)             Change the current directory
mkdir(dirname)             Create a new directory
mknod(name, major, minor)  Create a device file
fstat(fd)                  Return info about an open file
link(f1, f2)               Create another name (f2) for the file f1
exec()

exec(path, argv)
  // Commit to the user image.
  oldpgdir = curproc->pgdir;
  curproc->pgdir = pgdir;
  curproc->sz = sz;
  curproc->tf->eip = elf.entry;  // main
  curproc->tf->esp = sp;
  switchuvm(curproc);
  freevm(oldpgdir);
Handling Interrupt Controllers
IO-APIC and L-APIC

APIC: Advanced Programmable Interrupt Controller

IO-APIC

Routes interrupts from the I/O subsystem (devices) to processors

L-APIC

One per processor: handles the interrupts delivered to that processor
IO-APIC
ioapic.c
#define IOAPIC 0xFEC00000 // Default physical address of IO APIC
/* roughly 3.98 GB. In high mem. Memory Mapped I/O address */

struct ioapic {
  uint reg;
  uint pad[3];
  uint data;
};
volatile struct ioapic *ioapic; // = IOAPIC above

static void
ioapicwrite(int reg, uint data)
{
  ioapic->reg = reg;
  ioapic->data = data;
}
IO-APIC
ioapic.c

void
ioapicinit(void)
{
  ...
  for(i = 0; i <= maxintr; i++){
    ioapicwrite(REG_TABLE+2*i, INT_DISABLED | (T_IRQ0 + i));
    ioapicwrite(REG_TABLE+2*i+1, 0);
  }
}
All disabled, later enabled

/* called to enable KBD, COM1, IDE interrupts */
void
ioapicenable(int irq, int cpunum)
{
  ioapicwrite(REG_TABLE+2*irq, T_IRQ0 + irq);
  ioapicwrite(REG_TABLE+2*irq+1, cpunum << 24);
}
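Each IRQ has a 64-bit redirection entry made of two 32-bit registers, REG_TABLE+2*irq (low word: vector number and mask bit) and REG_TABLE+2*irq+1 (high word: destination APIC ID in bits 24 and up). The sketch records the values ioapicinit() and ioapicenable() write instead of touching hardware; struct regwrite, init_writes() and enable_writes() are hypothetical, while REG_TABLE = 0x10, INT_DISABLED = 0x00010000 and T_IRQ0 = 32 are xv6's constants.

```c
#include <assert.h>
#include <stdint.h>

#define T_IRQ0       32          /* IRQ 0 is delivered as interrupt vector 32 */
#define REG_TABLE    0x10        /* first redirection-table register */
#define INT_DISABLED 0x00010000  /* mask bit: interrupt disabled */

/* One ioapicwrite(reg, data) call, recorded instead of performed. */
struct regwrite { uint32_t reg, data; };

/* The two writes ioapicinit() makes for entry i: masked, no destination. */
void init_writes(int i, struct regwrite out[2])
{
  assert(i >= 0);
  out[0].reg = REG_TABLE + 2 * i;
  out[0].data = INT_DISABLED | (T_IRQ0 + i);
  out[1].reg = REG_TABLE + 2 * i + 1;
  out[1].data = 0;
}

/* The two writes ioapicenable(irq, cpunum) makes: unmasked, routed to cpunum. */
void enable_writes(int irq, int cpunum, struct regwrite out[2])
{
  assert(irq >= 0 && cpunum >= 0);
  out[0].reg = REG_TABLE + 2 * irq;       /* low word: vector, mask bit clear */
  out[0].data = T_IRQ0 + irq;
  out[1].reg = REG_TABLE + 2 * irq + 1;   /* high word: destination APIC ID */
  out[1].data = (uint32_t)cpunum << 24;
}
```

For the IDE disk (IRQ_IDE is 14 in xv6) routed to CPU 1, the entry is registers 0x2c/0x2d holding vector 46 and destination 0x01000000.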
L-APIC
lapic.c
Does many complex things, for multiprocessors
Initializes the APIC to deliver the desired interrupts, including the TIMER interrupt
L-APIC
lapic.c
volatile uint *lapic; // set using mpconf->lapicaddr in mpinit(), called during main()

static void
lapicw(int index, int value)
{
  lapic[index] = value;
  lapic[ID]; // wait for write to finish, by reading
}

void
lapicinit(void)
{
  ...
  /* periodically generate timer interrupt on each processor */
  lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
  ...
}
