Introduction To The Kernel: Architecture of The UNIX Operating System
Processes
A process is the execution of a program and consists of a pattern of bytes that the CPU
interprets as machine instructions (called text), data, and stack. Processes communicate
with other processes and with the rest of the world via system calls.
A process on a UNIX system is the entity that is created by the fork system call. Every
process except process 0 is created when another process executes the fork system call.
The process that invokes the fork system call is called the parent process and the newly
created process is called the child process. A process can have only one parent process
but it can have many child processes. The kernel identifies each process by its process
number, called the process ID (PID). Process 0 is a special process that is created "by
hand" when the system boots; after forking a child process (process 1), process 0
becomes the swapper process. Process 1, known as init, is the ancestor of every other
process.
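The parent and child relationship can be observed from user space. The following is a minimal sketch, assuming a POSIX system; the helper name fork_and_wait is hypothetical and not part of the original text. The child exits with a chosen status and the parent collects it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * fork_and_wait (hypothetical helper): forks a child that exits with
 * child_status, waits for it, and returns the exit status seen by the
 * parent, or -1 on failure.
 */
int fork_and_wait(int child_status)
{
    pid_t parent = getpid();   /* PID of the process about to call fork */
    pid_t pid = fork();

    if (pid < 0)
        return -1;             /* fork failed, no child was created */

    if (pid == 0) {
        /* Child: fork returned 0, and our parent is the caller. */
        assert(getppid() == parent);
        _exit(child_status);
    }

    /* Parent: fork returned the PID of the newly created child. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_wait(7) from a main function should return 7 in the parent. The child never returns from the helper because it calls _exit.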
An executable file consists of the following parts:
• a set of "headers" that describe the attributes of the file
• the program text
• a machine language representation of the data that has initial values when the
program starts execution, and an indication of how much space the kernel should
allocate for uninitialized data, called bss (block started by symbol).
• other sections, such as symbol table information.
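#include <stdio.h>

char buffer[2048];   /* no initial value: accounted for in bss */
int version = 1;     /* has an initial value: stored in the data section */

int main(void) {
    printf("Hello, world!\n");
    return 0;
}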
In the code given above, the initialized data is the variable version and the uninitialized
data (i.e., bss) is the array buffer.
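One observable consequence of this split: the C language guarantees that an object with static storage duration and no initializer starts out as zero, which on UNIX systems is typically achieved by allocating zero-filled memory for bss rather than storing the zeros in the executable file. A small sketch (the helper buffer_is_zeroed is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

char buffer[2048];   /* no initializer: typically placed in bss */
int version = 1;     /* has an initializer: placed in initialized data */

/* Hypothetical helper: returns 1 if every byte of buffer is zero. */
int buffer_is_zeroed(void)
{
    assert(sizeof buffer == 2048);
    for (size_t i = 0; i < sizeof buffer; i++)
        if (buffer[i] != 0)
            return 0;
    return 1;
}
```

On most UNIX systems, running the size utility on the compiled program reports the text, data, and bss sizes separately, with buffer counted under bss.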
The kernel loads an executable file into memory during an exec system call, and the
loaded process consists of at least three parts, called regions: text, data, and stack. The
text and data regions correspond to the text and data-bss sections of the executable file,
but the stack region is automatically created and its size is dynamically adjusted by the
kernel at runtime. The stack consists of logical stack frames that are pushed when calling
a function and popped when returning; a special register called the stack pointer
indicates the current stack depth. A stack frame consists of parameters to a function, its
local variables and the data necessary to recover the previous stack frame, including the
value of the program counter and stack pointer at the time of the function call.
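The loading step described above is usually paired with fork: the child calls exec, and the kernel replaces the child's text, data, and stack regions with those of the new program. The following is a sketch assuming a POSIX system; run_program is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * run_program (hypothetical helper): forks a child, has the child exec
 * the executable at path with the given argument vector, and returns
 * the child's exit status, or -1 on failure.
 */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on success execv does not return, because the old
         * text, data, and stack regions have been replaced. */
        execv(path, argv);
        _exit(127);            /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing "/bin/sh" with the arguments "sh", "-c", "exit 3" should make the helper return 3, assuming /bin/sh exists on the system.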
Because a process in the UNIX system can execute in two modes, kernel or user, it uses a
separate stack for each mode. When a process makes a system call, it executes a trap
instruction, which causes an interrupt that switches the hardware to kernel mode. The
kernel stack of a process is empty when the process executes in user mode.
User and Kernel stack
Every process has an entry in the kernel process table, and each process is allocated a u
area ("u" stands for "user") that contains private data manipulated only by the kernel.
The process table contains (or points to) a per process region table, whose entries point
to entries in a region table. A region is a contiguous area of a process's address space,
such as text, data, and stack. Region table entries describe the attributes of the region,
such as whether it contains text or data, whether it is shared or private, and where the
"data" of the region is located in memory. The extra level of indirection (from the per
process region table to the region table) allows independent processes to share regions.
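The extra level of indirection can be sketched with a few C structures. This is an illustrative model only: the structure names, field names, and the helper attach_region are assumptions, not the actual kernel declarations.

```c
#include <assert.h>
#include <stddef.h>

enum region_type { REGION_TEXT, REGION_DATA, REGION_STACK };

/* Entry in the system-wide region table (field names are assumed). */
struct region {
    enum region_type type;  /* what the region contains */
    int shared;             /* nonzero if several processes may attach */
    void *addr;             /* where the region's contents are in memory */
    int refcount;           /* number of processes currently attached */
};

#define NPREGION 3          /* assumed size: text, data, stack */

/* Per process region table entry: only a pointer to a region. */
struct pregion {
    struct region *reg;
};

/* The part of a process table entry that matters for this sketch. */
struct proc {
    struct pregion pregion[NPREGION];
};

/*
 * Attach region r to slot of process p. Returns 0 on success, or -1 if
 * r is private and already attached to some process.
 */
int attach_region(struct proc *p, int slot, struct region *r)
{
    assert(slot >= 0 && slot < NPREGION);
    if (!r->shared && r->refcount > 0)
        return -1;
    p->pregion[slot].reg = r;
    r->refcount++;
    return 0;
}
```

Two processes running the same program can attach the same shared text region, so both per process region tables point at one region table entry, while each process keeps its own private data and stack regions.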
Data structures for processes
Important fields in the process table are:
• a state field
• identifiers indicating the user who owns the process (user IDs, or UIDs)
• an event descriptor set when a process is suspended (in the sleep state)
The u area contains information that needs to be accessible only when the process is
executing. Important fields in the u area are:
• a pointer to the process table slot of the currently executing process
• parameters of the current system call, return values and error codes
• file descriptors for all open files
• internal I/O parameters
• current directory and current root
• process and file size limits
The kernel internally uses a structure variable u which points to the u area of the
currently executing process. When another process executes, the kernel rearranges its
virtual address space so that u refers to the u area of the new process.
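The relationship between the process table, the u area, and the variable u can be modelled as follows. This is a simplified sketch under stated assumptions: all field names and the helper switch_to are hypothetical, only some of the listed fields are shown, and u is modelled as an ordinary pointer, whereas a real kernel achieves the same effect by remapping its virtual address space.

```c
#include <assert.h>
#include <stddef.h>

#define NOFILE 20               /* assumed per process open file limit */

struct user;                    /* the u area, defined below */

/* Process table entry: always accessible to the kernel. */
struct proc {
    int state;                  /* state field */
    int uid;                    /* user ID of the owner */
    const void *sleep_event;    /* event descriptor while asleep */
    struct user *uarea;         /* where this process's u area lives */
};

/* u area: needed only while the process is executing. */
struct user {
    struct proc *procp;         /* process table slot of this process */
    int error;                  /* error code of the current system call */
    long retval;                /* return value of the current system call */
    int ofile[NOFILE];          /* file descriptors for open files */
    long file_size_limit;       /* file size limit */
};

/* Refers to the u area of the currently executing process. */
struct user *u;

/* Hypothetical helper: make u refer to the u area of the next process. */
void switch_to(struct proc *next)
{
    assert(next != NULL && next->uarea != NULL);
    u = next->uarea;
}
```

After switch_to(&p), kernel code in this model reaches the running process's data through u, for example u->error or u->procp->state, without knowing which process is running.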
Context of a process
The context of a process consists of the following:
• text region
• values of its global user variables and data structures
• values of machine registers
• values stored in its process table slot
• u area
• contents of user and kernel stacks
The text of the operating system and its global data structures are shared by all
processes but do not constitute part of the context of a process.
When the kernel decides that it should execute another process, it does a context switch,
so that the system executes in the context of the other process.
The kernel services interrupts in the context of the interrupted process, even though
that process may not have caused the interrupt. Interrupts are serviced in kernel mode.
Process states
1. Process is currently executing in user mode.
2. Process is currently executing in kernel mode.
3. Process is not executing, but it is ready to run as soon as the scheduler chooses it.
4. Process is sleeping.
Because a processor can execute only one process at a time, at most one process on a
uniprocessor system may be in state 1 or state 2 at any moment.
State transitions
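Processes move continuously between the states according to well-defined rules. A state
transition diagram is a directed graph whose nodes represent the states a process can
enter and whose edges represent the events that cause a process to move from one state
to another.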
Process States and Transitions
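The four states and the moves between them can be written down as a small table. The sketch below is reconstructed from the surrounding text only: the enumerator names and the function can_transition are assumptions, and the original diagram may contain edges that are not modelled here.

```c
#include <assert.h>

/* The four process states, numbered as in the list above. */
enum proc_state {
    USER_RUNNING   = 1,
    KERNEL_RUNNING = 2,
    READY_TO_RUN   = 3,
    ASLEEP         = 4
};

/* Returns 1 if this simplified model allows the move, 0 otherwise. */
int can_transition(enum proc_state from, enum proc_state to)
{
    assert(from >= USER_RUNNING && from <= ASLEEP);

    switch (from) {
    case USER_RUNNING:
        /* A system call or interrupt enters kernel mode. */
        return to == KERNEL_RUNNING;
    case KERNEL_RUNNING:
        /* Return to user mode, or go to sleep awaiting an event. */
        return to == USER_RUNNING || to == ASLEEP;
    case READY_TO_RUN:
        /* The scheduler chooses the process to run. */
        return to == KERNEL_RUNNING;
    case ASLEEP:
        /* A wakeup makes the process eligible to run again. */
        return to == READY_TO_RUN;
    }
    return 0;
}
```

In this model a sleeping process can never move directly to a running state; it must first become ready to run and then be chosen by the scheduler.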
By prohibiting arbitrary context switches and controlling the occurrence of interrupts,
the kernel protects its consistency.
The kernel allows a context switch only when a process moves from the state "kernel
running" to the state "asleep in memory". Processes running in kernel mode cannot be
preempted by other processes; therefore the kernel is sometimes said to be
non-preemptive.
Consider the following code snippet:
struct queue {
    struct queue *forp;    /* forward pointer */
    struct queue *backp;   /* backward pointer */
} *bp, *bp1;

bp1->forp = bp->forp;
bp1->backp = bp;
bp->forp = bp1;
/* consider possible context switch here */
bp1->forp->backp = bp1;
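The code above inserts a new node, bp1, into a doubly linked list after the node bp. The
kernel uses many such doubly linked lists as its data structures. If a context switch
occurs at the marked line, the list is left with inconsistent links: bp->forp already
points to bp1, but the backward pointer of the following node still points to bp. If
another process then manipulates the list, the list can become corrupted. Because the
kernel does not allow a context switch while a process executes in kernel mode, this
situation cannot arise.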
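The insertion can be tried out in user space with a complete structure definition and a consistency check. This is a sketch that assumes a circular list with a dummy head node; the helpers queue_init, insert_after, and links_consistent are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct queue {
    struct queue *forp;    /* forward pointer */
    struct queue *backp;   /* backward pointer */
};

/* An empty circular list: the head points to itself in both directions. */
void queue_init(struct queue *head)
{
    head->forp = head;
    head->backp = head;
}

/* Insert bp1 immediately after bp, using the four assignments above. */
void insert_after(struct queue *bp, struct queue *bp1)
{
    assert(bp != NULL && bp1 != NULL);
    bp1->forp = bp->forp;
    bp1->backp = bp;
    bp->forp = bp1;
    /* The list is inconsistent here: the node after bp1 still has a
     * backward pointer to bp. */
    bp1->forp->backp = bp1;
}

/* Returns 1 if every node's forward and backward links agree. */
int links_consistent(const struct queue *head)
{
    const struct queue *p = head;
    do {
        if (p->forp->backp != p || p->backp->forp != p)
            return 0;
        p = p->forp;
    } while (p != head);
    return 1;
}
```

links_consistent returns 1 after each completed insert_after call. Running the same check between the third and fourth assignments would return 0, which is the window the kernel protects by not switching context there.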
The kernel unlocks a lock and awakens all processes asleep on the lock in the following
manner:
set condition false;
wakeup (event: the condition is false);
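The unlock and wakeup steps can be simulated with a tiny process table in user space. This is an illustrative model, not kernel code: the names proc_table, sleep_on, wakeup, lock_acquire, and lock_release are assumptions, and the address of the lock is used as the event on which processes sleep.

```c
#include <assert.h>
#include <stddef.h>

enum proc_state { READY_TO_RUN, ASLEEP };

struct proc {
    enum proc_state state;
    const void *event;          /* event descriptor, set while asleep */
};

#define NPROC 4                 /* assumed table size for the sketch */
struct proc proc_table[NPROC];  /* zero-initialized: all ready to run */

struct lock { int locked; };    /* the "condition" */

/* Put process p to sleep on the given event. */
void sleep_on(struct proc *p, const void *event)
{
    assert(p != NULL);
    p->state = ASLEEP;
    p->event = event;
}

/* Awaken every process asleep on event; returns how many were awakened. */
int wakeup(const void *event)
{
    int awakened = 0;
    for (int i = 0; i < NPROC; i++) {
        if (proc_table[i].state == ASLEEP && proc_table[i].event == event) {
            proc_table[i].state = READY_TO_RUN;
            proc_table[i].event = NULL;
            awakened++;
        }
    }
    return awakened;
}

/* Returns 1 if p got the lock, 0 if p was put to sleep waiting for it. */
int lock_acquire(struct proc *p, struct lock *l)
{
    if (l->locked) {
        sleep_on(p, l);
        return 0;
    }
    l->locked = 1;
    return 1;
}

/* Unlock and awaken all sleepers; returns how many were awakened. */
int lock_release(struct lock *l)
{
    l->locked = 0;              /* set condition false */
    return wakeup(l);           /* wakeup (event: the condition is false) */
}
```

Because every sleeper is awakened, each one has to test the lock again when it runs: in this model the first awakened process to call lock_acquire gets the lock and the others go back to sleep.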
System Administration
Conceptually, there is no difference between system administrative processes and user
processes. It's just that the system administrative processes have more rights and
privileges. Internally, the kernel distinguishes a special user called the superuser. A user
may become a superuser by going through a login-password sequence or by executing
special programs.