Unit 4
The kernel has a process table where it stores the state of the process and other information about the process. The information in the process table entry and the u-area of the process combined is the context of the process.
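As a rough sketch, the split between the two structures might look like the following C declarations; all field names here are illustrative assumptions, not actual kernel declarations.

/* Sketch of the process table entry (always resident) and the
 * u-area (needed only while the process runs); all field names
 * are illustrative assumptions. */
struct proc {
    int   p_stat;      /* current process state */
    int   p_pid;       /* process ID */
    int   p_pri;       /* scheduling priority */
    char *p_wchan;     /* sleep address, if the process is asleep */
};

struct user {
    int  u_error;      /* error code for the current system call */
    long u_arg[8];     /* arguments to the current system call */
    /* open file descriptors, current directory, kernel stack info, ... */
};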
The process enters the created state when the parent process executes the fork system call and eventually moves into a state where it is ready to run, in memory or swapped out (state 3 or 5). The scheduler will eventually pick the process, and the process enters the state kernel running, where it completes its part of the fork system call. After the completion of the system call, it may move to user running. When it makes a system call or an interrupt occurs, it again enters the state kernel running. After servicing the interrupt, the kernel may decide to schedule another process to execute, so the first process enters the state preempted. The state preempted is really the same as the state ready to run in memory, but the two are depicted separately to stress that a process executing in kernel mode can be preempted only when it is about to return to user mode. Consequently, the kernel could swap a process from the state preempted if necessary. Eventually, it will return to user running again.
When a process executes a system call, it leaves the state user running and enters the state kernel running. If, while in kernel mode, the process needs to sleep for some reason (such as waiting for I/O), it enters the state asleep in memory. When the event it slept on occurs, the interrupt handler awakens the process, and it enters the state ready to run in memory.
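The states referred to by number above follow the classic nine-state diagram; as a sketch, they could be represented with an enumeration like the following (the names are illustrative assumptions).

/* Process states from the classic nine-state diagram; the exact
 * names here are illustrative assumptions. */
enum proc_state {
    USER_RUNNING = 1,      /* executing in user mode */
    KERNEL_RUNNING,        /* executing in kernel mode */
    READY_IN_MEMORY,       /* ready to run, loaded in memory (3) */
    ASLEEP_IN_MEMORY,      /* sleeping, loaded in memory */
    READY_SWAPPED,         /* ready to run, but swapped out (5) */
    SLEEP_SWAPPED,         /* sleeping, swapped out */
    PREEMPTED,             /* about to return to user mode, but preempted */
    CREATED,               /* newly created by fork */
    ZOMBIE                 /* exited; waiting for parent's wait */
};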
Some state transitions can be controlled by the users, but not all. A user can create a process, but the user has no control over when a process transitions from asleep in memory to asleep and swapped, or from ready to run in memory to ready to run swapped, and so on. A process can make a system call to transition itself to the kernel running state, but it has no control over when it will return from kernel mode. Finally, a process can exit whenever it wants, but that is not the only reason for exit to be called.
To solve this problem, the kernel treats the addresses generated by the compiler as virtual addresses, and when the program starts executing, the memory management unit translates the virtual addresses to physical addresses. The compiler doesn't need to know which physical addresses the process will get. For example, two instances of the same program could be executing in memory using the same virtual addresses but different physical addresses. The subsystems of the kernel and the hardware that cooperate to translate virtual to physical addresses comprise the memory management subsystem.
Regions
The UNIX system divides its virtual address space into logically separated regions. A region is a contiguous area of the virtual address space and a logically distinct object which can be shared. The text, data, and stack of a process are usually separate regions. It is common to share the text region among instances of the same program.
The region table entries contain the physical locations at which the region is spread. Each process contains a private per process region table, called a pregion. A pregion entry contains a pointer to an entry in the region table and the starting virtual address of the region. Pregion entries are stored in the process table, the u-area, or a separately allocated memory space, according to the implementation. The pregion entries also contain the access permissions: read-only, read-write, or read-execute. The pregion and region structures are analogous to the file table and the in-core inode table: since pregions are specific to a process, the pregion table is private to a process, whereas the file table is global. Regions can be shared amongst processes.
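A minimal C sketch of a pregion entry as described above; the field names are assumptions for illustration.

struct region;                 /* shared region table entry, declared elsewhere */

struct pregion {
    struct region *p_reg;      /* pointer to an entry in the region table */
    char          *p_regva;    /* starting virtual address of the region */
    int            p_type;     /* text, data, or stack */
    int            p_prot;     /* read-only, read-write, or read-execute */
};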
An example of regions:

The tables that map a region's virtual pages to physical page frames are called page tables. A region table entry has pointers to its page tables. Since the logical address space of a region is contiguous, a page table is just an array of physical page numbers, indexed by the virtual page number. The page tables also contain hardware dependent information, such as permissions for pages. Modern machines have special hardware for address translation, because a software implementation of such translation would be too slow. Hence, when a process starts executing, the kernel tells the hardware where its page tables reside.
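As an illustration of this indexing scheme, the following sketch translates an address within a region using an array of physical page numbers; the page size and structure layout are assumptions, not any particular machine's format.

#define PAGE_SIZE 1024UL   /* assumed page size */

/* Illustrative page table: an array of physical page frame numbers,
 * indexed by the virtual page number within the region. */
struct page_map {
    unsigned long *frames;   /* frames[vpn] = physical page frame number */
    unsigned long  npages;   /* number of pages in the region */
};

/* Translate a virtual byte offset within the region to a physical address. */
unsigned long translate(const struct page_map *pm, unsigned long offset)
{
    unsigned long vpn = offset / PAGE_SIZE;         /* virtual page number */
    unsigned long in_page = offset % PAGE_SIZE;     /* offset within page */
    return pm->frames[vpn] * PAGE_SIZE + in_page;   /* physical address */
}

Two instances of the same program would each have their own page_map, so the same virtual offset can yield different physical addresses.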
The U Area
Example:
The first two register triples point to the kernel text and data, and the third triple refers to the u-area of the currently executing process (in this case, process D). When a context switch happens, this field changes and points to the u-area of the newly scheduled process. Entries 1 and 2 do not change, as all processes share the kernel text and data.
The user level context consists of the process text, data, user stack, and shared memory that is in the virtual address space of the process. The part that resides on swap space is also part of the user level context.
The system level context has a "static part" and a "dynamic part". A
process has one static part throughout its lifetime. But it can have a
variable number of dynamic parts.
The kernel stack contains the stack frames of the kernel functions. Even though all processes share the kernel text and data, the kernel stack needs to be different for every process, as every process might be in a different state depending on the system calls it executes. The pointer to the kernel stack is usually stored in the u-area, but it differs according to system implementations. The kernel stack is empty when the process executes in user mode.
The dynamic part of the system level context consists of a set of
layers, visualized as a last-in-first-out stack. Each system-level
context layer contains information necessary to recover the
previous layer, including register context of the previous layer.
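A sketch of one such layer; pushing and popping layers then amounts to stacking these records, and the field names are assumptions.

/* One system-level context layer; pushed on a system call, interrupt,
 * or context switch, and popped on return. Field names are assumptions. */
struct context_layer {
    long  saved_regs[16];   /* register context of the previous layer */
    char *kstack;           /* kernel stack the process uses in this layer */
};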
The following figure shows the components that form the context of a process. The right side of the figure shows the dynamic portion of the context: several stack frames, where each stack frame contains the saved register context of the previous layer and the kernel stack as the process executes in that layer. System context layer 0 is a dummy layer that represents the user-level context; growth of the stack here is in the user address space and the kernel stack is null. The process table entry contains the information to recover the current layer of the process (shown by the arrow).
The kernel handles an interrupt in the following steps:
1. It saves the current register context of the executing process and creates (pushes) a new context layer.
2. It determines the source of the interrupt and finds the interrupt vector.
3. The kernel invokes the interrupt handler. The kernel stack of the new context layer is logically distinct from the kernel stack of the previous context layer. Some implementations use the process's kernel stack to store the stack frame of an interrupt handler, while some implementations use a global interrupt stack for interrupt handlers that are guaranteed to return without a context switch.
4. The kernel returns from the interrupt handler and executes a set of hardware instructions which restore the previous context. The interrupt handler may affect the behavior of the process, as it might modify kernel data structures. But usually, the process resumes execution as if the interrupt never occurred.
/* Algorithm: inthand
 * Input: none
 * Output: none
 */
{
    save (push) current context layer;
    determine interrupt source;
    find interrupt vector;
    call interrupt handler;
    restore (pop) previous context layer;
}
/* Algorithm: syscall
 * Input: system call number
 * Output: result of system call
 */
{
    find entry in the system call table corresponding to the system call number;
    determine number of parameters to the system call;
    copy parameters from the user address space to u-area;
    save current context for abortive return; // studied later
    invoke system call code in kernel;
    if (error during execution of system call)
    {
        set register 0 in user saved register context to error number;
        turn on carry bit in PS register in user saved register context;
    }
    else
        set registers 0, 1 in user saved register context to return values from system call;
}
Consider the following code, which calls the creat function of the C library, and the assembly code generated for it by the compiler (on a Motorola 68000):
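The original listing is not reproduced here, but a minimal C fragment of the kind being described might look like this (the file name and mode are illustrative):

#include <fcntl.h>

int main(void)
{
    /* The library creat() routine traps into the kernel, which then
     * runs algorithm syscall for the creat system call. */
    int fd = creat("newfile", 0666);
    return fd < 0;
}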
Context Switch
This is the VAX code for moving one character from user to kernel
address space:
A region table entry contains the following information (sketched as a C structure after this list):
- The inode of the file from which the region was initially loaded.
- The type of the region (text, shared memory, private data, or stack).
- The size of the region.
- The location of the region in physical memory.
- The state of the region, which may be a combination of:
  - locked
  - in demand
  - being loaded into memory
  - valid, loaded into memory
- The reference count, giving the number of processes that reference the region.
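A minimal C sketch of a region table entry with these fields; the names are illustrative assumptions.

struct inode;                  /* in-core inode, declared elsewhere */

/* Sketch of a region table entry; field names are assumptions. */
struct region {
    struct inode  *r_iptr;     /* inode the region was loaded from */
    int            r_type;     /* text, shared memory, private data, stack */
    unsigned long  r_size;     /* size of the region */
    unsigned long *r_pgtbl;    /* physical location: page table pointer */
    int            r_flags;    /* locked / in demand / being loaded / valid */
    int            r_refcnt;   /* number of processes referencing region */
};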
The operations that manipulate regions are:
- lock a region
- unlock a region
- allocate a region
- attach a region to the memory space of a process
- change the size of a region
- load a region from a file into the memory space of a process
- free a region
- detach a region from the memory space of a process, and
- duplicate the contents of a region
Allocating a Region
/* Algorithm: allocreg
 * Input: inode pointer
 *        region type
 * Output: locked region
 */
{
    remove region from linked list of free regions;
    assign region type;
    assign region inode pointer;
    if (inode pointer not null)
        increment inode reference count;
    place region on linked list of active regions;
    return (locked region);
}
We check if the inode pointer is not null because there are a few
exceptions where a region is not associated with an inode.
/* Algorithm: attachreg
 * Input: pointer to (locked) region being attached
 *        process to which the region is being attached
 *        virtual address in process where region will be attached
 *        region type
 * Output: per process region table entry
 */
{
    allocate per process region table entry for process;
    initialize per process region table entry:
        set pointer to region being attached;
        set type field;
        set virtual address field;
    check legality of virtual address, region size;
    increment region reference count;
    increment process size according to attached region;
    initialize new hardware register triple for process;
    return (per process region table entry);
}
/* Algorithm: growreg
 * Input: pointer to per process region table entry
 *        change in size of region (positive or negative)
 * Output: none
 */
{
    if (region size increasing)
    {
        check legality of new region size;
        allocate auxiliary tables (page tables);
        if (not system supporting demand paging)
        {
            allocate physical memory;
            initialize auxiliary tables, as necessary;
        }
    }
    else // region size decreasing
    {
        free physical memory, as appropriate;
        free auxiliary tables, as appropriate;
    }
    do (other) initialization of auxiliary tables, as necessary;
    set size of process;
}
Loading a Region
/* Algorithm: loadreg
 * Input: pointer to per process region table entry
 *        virtual address to load region
 *        inode pointer of file for loading region
 *        byte offset in file for start of region
 *        byte count for amount of data to load
 * Output: none
 */
{
    increase region size according to eventual size of region (algorithm: growreg);
    mark region state: being loaded into memory;
    unlock region;
    set up u-area parameters for reading file:
        target virtual address where data is read to,
        start offset value for reading file,
        count of bytes to read from file;
    read file into region (internal variant of read algorithm);
    lock region;
    mark region state: completely loaded into memory;
    awaken all processes waiting for region to be loaded;
}
For example, suppose the kernel wants to load text of size 7K into a region that is attached at virtual address 0 of a process, but wants to leave a gap of 1K bytes at the beginning of the region. By this time, the kernel will have allocated a region table entry and will have attached the region at address 0 using algorithms allocreg and attachreg. Now it invokes loadreg, which invokes growreg twice -- first, to account for the 1K byte gap at the beginning of the region, and second, to allocate storage for the contents of the region -- and growreg allocates a page table for the region. The kernel then sets up fields in the u-area to read the file: it reads 7K bytes from a specified byte offset in the file (supplied as a parameter by the kernel) into virtual address 1K of the process.
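In terms of the algorithms above, the sequence for this example is roughly the following sketch (argument order follows the algorithm headers; the call forms are illustrative):

allocreg(inode pointer, text type);              /* allocate region table entry */
attachreg(region, process, address 0, text type);
loadreg(pregion entry, virtual address 1K, inode pointer, offset, 7K);
    /* inside loadreg: */
    growreg(pregion entry, 1K);                  /* account for the 1K gap */
    growreg(pregion entry, 7K);                  /* storage for region contents */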
Freeing a Region
/* Algorithm: freereg
 * Input: pointer to a (locked) region
 * Output: none
 */
{
    if (region reference count non zero)
    {
        // some process still using region
        release region lock;
        if (region has an associated inode)
            release inode lock;
        return;
    }
    if (region has associated inode)
        release inode (algorithm: iput);
    free physical memory still associated with region;
    free auxiliary tables associated with region;
    clear region fields;
    place region on region free list;
    unlock region;
}
Detaching a Region
The kernel detaches regions in the exec, exit, and shmdt system calls. It updates the pregion entry and severs the connection to physical memory by invalidating the associated memory management register triple. The address translation mechanisms thus invalidated apply specifically to the process, not to the region (as in the algorithm freereg). The kernel decrements the region reference count. If the reference count drops to 0 and there is no reason to keep the region in memory (studied later), the kernel frees the region with algorithm freereg. Otherwise, it only releases the region and inode locks.
/* Algorithm: detachreg
 * Input: pointer to per process region table entry
 * Output: none
 */
{
    get auxiliary memory management tables for process, release as appropriate;
    decrement process size;
    decrement region reference count;
    if (region reference count is 0 and region sticky bit not on)
        free region (algorithm: freereg);
    else // either reference count non-0 or region sticky bit on
    {
        free inode lock, if applicable (inode associated with region);
        free region lock;
    }
}
Duplicating a Region
In the fork system call, the kernel needs to duplicate the regions of a
process. If the region is shared, the kernel just increments the
reference count of the region. If it is not shared, the kernel has to
physically copy it, so it allocates a new region table entry, page table,
and physical memory for the region. The algorithm dupreg is given
below:
/* Algorithm: dupreg
 * Input: pointer to region table entry
 * Output: pointer to a region that looks identical to input region
 */
{
    if (region type shared)
        // caller will increment region reference count with subsequent attachreg call
        return (input region pointer);
    allocate new region (algorithm: allocreg);
    set up auxiliary memory management structures, as currently exist in input region;
    allocate physical memory for region contents;
    "copy" region contents from input region to newly allocated region;
    return (pointer to allocated region);
}
Sleep
Processes sleep inside system calls while awaiting a particular resource, or when a page fault occurs. In either case, they push a context layer and do a context switch.
Sleep Event and Addresses
Many processes can sleep on the same address: for example, several processes may be waiting for a locked buffer to become free while another process awaits I/O completion on that buffer, yet all of them sleep on the buffer's address. When either of the two events occurs, all of the processes are woken up, since they are sleeping on the same address. It would have been better if there were a one-to-one mapping between events and addresses, but in practice such clashes are rare and system performance is not affected.
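A small sketch of how sleep addresses might map to hash queues; the hash function and queue count are assumptions.

#define SQ_SIZE 64   /* number of sleep hash queues; an assumption */

/* Map a sleep address to a hash queue index. Unrelated events can
 * collide in one queue (or even on one address); wakeup simply wakes
 * every process whose saved sleep address matches. */
int sq_hash(const char *addr)
{
    return (int)(((unsigned long)addr >> 3) & (SQ_SIZE - 1));
}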
/* Algorithm: sleep
 * Input: sleep address
 *        priority
 * Output: 1 if process awakened as a result of a signal that the process catches,
 *         longjmp if the process is awakened as a result of a signal it does not catch,
 *         0 otherwise
 */
{
    raise processor execution level to block all interrupts;
    set process state to sleep;
    put process on sleep hash queue, based on sleep address;
    save sleep address in process table slot;
    set process priority level to input priority;
    if (process sleep is not interruptible)
    {
        do context switch;
        // process resumes execution here when it wakes up
        reset processor priority level to allow interrupts as when process went to sleep;
        return (0);
    }
    // here, process sleep is interruptible by signals
    if (no signal pending against process)
    {
        do context switch;
        // process resumes execution here when it wakes up
        if (no signal pending against process)
        {
            reset processor priority level to what it was when process went to sleep;
            return (0);
        }
    }
    remove process from sleep hash queue, if still there;
    reset processor priority level to what it was when process went to sleep;
    if (process sleep priority set to catch signals)
        return (1);
    do longjmp algorithm;
}
When the kernel raises the processor execution level, it stores the old level so that it can be restored when the process wakes up. The kernel saves the sleep address and sleep priority in the process table.
/* Algorithm: wakeup
 * Input: sleep address
 * Output: none
 */
{
    raise processor execution level to block all interrupts;
    find sleep hash queue for sleep address;
    for (every process asleep on sleep address)
    {
        remove process from hash queue;
        mark process as "ready to run";
        put process on scheduler list of processes ready to run;
        clear field in process table entry for sleep address;
        if (process not loaded in memory)
            wake up swapper process (0);
        else if (awakened process is more eligible to run than currently running process)
            set scheduler flag;
    }
    restore processor execution level to original level;
}
If a signal has already arrived when a process enters the sleep algorithm, the process will sleep if its sleep priority is above the threshold value; but if the priority is below the threshold, it will never sleep and instead responds to the signal as if it had arrived while the process was sleeping. Had it slept, the signal might not arrive again later, and the process might never wake up.
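As a usage sketch in the style of the algorithms above, a wait on a locked buffer might pair sleep and wakeup like this (the buffer structure, priority constant, and prototypes are assumptions):

/* Assumed prototypes matching the sleep/wakeup algorithms above. */
int  sleep(char *addr, int pri);
void wakeup(char *addr);

#define PRIBUF 20          /* assumed sleep priority for buffer waits */

struct buffer { int busy; };

void lock_buffer(struct buffer *bp)
{
    while (bp->busy)
        sleep((char *)bp, PRIBUF);   /* sleep on the buffer's address */
    bp->busy = 1;
}

void release_buffer(struct buffer *bp)
{
    bp->busy = 0;
    wakeup((char *)bp);              /* wake all sleepers on this address */
}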