Scheduling

The document discusses Linux scheduling. It aims for O(1) scheduling by using per-CPU run queues and priority-based scheduling. Processes are assigned both a static and dynamic priority. The dynamic priority is adjusted based on factors like how interactive or I/O-bound a process is. Processes are scheduled from priority-based run queues, with higher priority processes getting longer time quanta. Real-time processes have the highest priorities. The scheduler uses priority arrays and swapping between active and expired queues to avoid starvation and provide fairness.


Scheduling in Linux
COMS W4118, Spring 2008
Scheduling Goals
- O(1) scheduling; the 2.4 scheduler iterated through
  - the run queue on each invocation
  - the task queue at each epoch
- Scale well on multiple processors
  - per-CPU run queues
  - SMP affinity
- Interactivity boost
- Fairness
- Optimize for one or two runnable processes
Basic Philosophies
- Priority is the primary scheduling mechanism
- Priority is dynamically adjusted at run time
  - Processes denied access to the CPU get their priority increased
  - Processes that run for a long time get their priority decreased
- Try to distinguish interactive processes from non-interactive ones
  - Bonus or penalty reflecting whether a process is I/O or compute bound
- Use large quanta for important processes
- Modify quanta based on CPU use
- Quantum != clock tick
- Associate processes with CPUs
- Do everything in O(1) time
The Run Queue
- 140 separate queues, one for each priority level
  - Actually, two sets, active and expired
- Priorities 0-99 for real-time processes
- Priorities 100-139 for normal processes; value set via the nice() system call (see the example below)
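For illustration, a process can move itself within the 100-139 band with the standard nice() call. This is plain user-space code; the printed value just applies the "static priority = 120 + nice" mapping described on the Static Priority slide.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    errno = 0;
    int nice_val = nice(5);            /* lower our priority by 5 nice levels */
    if (nice_val == -1 && errno != 0) {
        perror("nice");
        return 1;
    }
    /* Static priority used by the O(1) scheduler: 120 + nice. */
    printf("nice = %d, static priority = %d\n", nice_val, 120 + nice_val);
    return 0;
}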

Runqueue for O(1) Scheduler
(Diagram: two priority arrays, active and expired, each holding one priority queue per priority level. Higher-priority queues hold more I/O-bound tasks with long quanta (800 ms); lower-priority queues hold more CPU-bound tasks with short quanta (10 ms).)
Scheduler Runqueue
- A scheduler runqueue is the set of tasks that are runnable on a particular CPU.
- An rq structure maintains linked lists of those tasks.
- The runqueues are maintained as an array, runqueues, indexed by the CPU number.
- The rq keeps a reference to its idle task
  - The idle task for a CPU is never on the scheduler runqueue for that CPU (it's always the last choice)
- Access to a runqueue is serialized by acquiring and releasing rq->lock
Basic Scheduling Algorithm
- Find the highest-priority queue with a runnable process (details and a sketch on the next slide)
- Find the first process on that queue
- Calculate its quantum size
- Let it run
- When its time is up, put it on the expired list
- Repeat
The Highest Priority Process
- There is a bitmap indicating which queues have processes that are ready to run
- Find the first bit that's set:
  - 140 queues → 5 32-bit integers
  - Only a few compares to find the first that is non-zero
  - Hardware instruction to find the first 1-bit
    - bsfl on Intel
- Time depends on the number of priority levels, not the number of processes (see the sketch below)
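A self-contained toy of that lookup (user-space C, illustrative only; all names are mine, not the kernel's). The array mirrors the 140-bit priority bitmap, and __builtin_ctzl plays the role of the bsfl bit-scan instruction.

/* Toy model of "find the highest-priority runnable level" (not kernel code). */
#include <stdio.h>

#define NUM_PRIO   140
#define WORD_BITS  (8 * (int)sizeof(unsigned long))

static unsigned long bitmap[5];          /* 140 bits: one per priority level */

static void mark_runnable(int prio)
{
    bitmap[prio / WORD_BITS] |= 1UL << (prio % WORD_BITS);
}

/* Return the lowest-numbered (highest-priority) set bit, or -1 if none.
 * Only a handful of word compares, then one bit-scan (bsfl on x86). */
static int highest_runnable_prio(void)
{
    for (int w = 0; w * WORD_BITS < NUM_PRIO; w++)
        if (bitmap[w])
            return w * WORD_BITS + __builtin_ctzl(bitmap[w]);
    return -1;
}

int main(void)
{
    mark_runnable(120);                  /* a nice-0 task                */
    mark_runnable(105);                  /* a higher-priority task       */
    printf("run priority %d first\n", highest_runnable_prio());  /* 105 */
    return 0;
}

The cost is bounded by the number of bitmap words (the number of priority levels), never by the number of runnable processes.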
Scheduling Components
- Static Priority
- Sleep Average
- Bonus
- Interactivity Status
- Dynamic Priority
Static Priority
- Each task has a static priority that is set based upon the nice value specified for the task
  - static_prio in task_struct
- The nice value ranges from -20 to +19, with a default of 0; only privileged tasks can set a negative nice value
- For normal tasks, the static priority is 120 + the nice value, giving the 100-139 range
- Each task also has a dynamic priority that is set based upon a number of factors (next slides)
Sleep Average
- Interactivity heuristic: the sleep ratio
  - Mostly sleeping: I/O bound
  - Mostly running: CPU bound
- Sleep ratio approximation: sleep_avg in the task_struct
  - Range: 0 .. MAX_SLEEP_AVG (about 1 second)
- When a process wakes up (is made runnable), recalc_task_prio adds in how many ticks it was sleeping (blocked), up to the maximum value (MAX_SLEEP_AVG)
- When a process is switched out, schedule subtracts the number of ticks that the task actually ran without blocking (see the sketch below)
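A toy model of that bookkeeping, assuming millisecond units and MAX_SLEEP_AVG = 1 second; the names are illustrative, not the kernel's recalc_task_prio/schedule code.

#include <stdio.h>

#define MAX_SLEEP_AVG 1000   /* assumed: 1 second, matching the bonus table later */

struct toy_task {
    int sleep_avg;           /* 0 .. MAX_SLEEP_AVG */
};

/* Called when the task is woken up after sleeping for `slept` ms. */
static void on_wakeup(struct toy_task *t, int slept)
{
    t->sleep_avg += slept;
    if (t->sleep_avg > MAX_SLEEP_AVG)
        t->sleep_avg = MAX_SLEEP_AVG;
}

/* Called when the task is switched out after running for `ran` ms. */
static void on_switch_out(struct toy_task *t, int ran)
{
    t->sleep_avg -= ran;
    if (t->sleep_avg < 0)
        t->sleep_avg = 0;
}

int main(void)
{
    struct toy_task t = { .sleep_avg = 0 };
    on_wakeup(&t, 300);        /* slept 300 ms -> sleep_avg = 300 */
    on_switch_out(&t, 50);     /* ran 50 ms    -> sleep_avg = 250 */
    printf("sleep_avg = %d ms\n", t.sleep_avg);
    return 0;
}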

Bonus and Dynamic Priority
/* We scale the actual sleep average
 * [0 .... MAX_SLEEP_AVG] into the
 * -5 ... 0 ... +5 bonus/penalty range.
 */
- Dynamic priority (prio in task_struct) is calculated in effective_prio from the static priority and a bonus, which in turn is derived from sleep_avg (sketched below)
- Roughly speaking, the bonus is a number in [0, 10] proportional to how much of its recent time the process was sleeping; the "+5" below recentres it, so the net effect ranges from a 5-level boost for mostly-sleeping tasks to a 5-level penalty for CPU hogs:

DP = SP − bonus + 5
DP = min(139, max(100, DP))
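Putting that together, a minimal sketch of the calculation in C — my own simplification, not the kernel's effective_prio; MAX_SLEEP_AVG = 1 second and MAX_BONUS = 10 are the assumptions used throughout these slides.

#include <stdio.h>

#define MAX_SLEEP_AVG 1000                /* ms, assumed */
#define MAX_BONUS     10

/* Illustrative recomputation of the dynamic priority. */
static int effective_prio(int static_prio, int sleep_avg_ms)
{
    /* Scale sleep_avg into a 0..10 bonus; the "+5" recentres it so the
     * net effect on the priority value is in the -5..+5 range. */
    int bonus = sleep_avg_ms * MAX_BONUS / MAX_SLEEP_AVG;
    int prio  = static_prio - bonus + MAX_BONUS / 2;

    if (prio < 100) prio = 100;           /* clamp to the normal range */
    if (prio > 139) prio = 139;
    return prio;
}

int main(void)
{
    /* A nice-0 task that has been sleeping 80% of the time: 120 - 8 + 5 = 117. */
    printf("%d\n", effective_prio(120, 800));
    return 0;
}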
Calculating Time Slices
- time_slice in the task_struct
- Quantum (in ms), where SP is the static priority (see the sketch below):
  - If SP < 120: Quantum = (140 − SP) × 20
  - If SP >= 120: Quantum = (140 − SP) × 5
- Higher-priority processes get longer quanta
  - Basic idea: important processes should run longer
  - As we will see, other mechanisms are used for quick interactive response
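A minimal sketch of that rule in C (the function name is mine, mirroring the formula above rather than copied kernel code; values are in ms and match the table on the next slide).

#include <stdio.h>

/* Quantum in ms for a given static priority (100..139), per the rule above. */
static int task_timeslice_ms(int static_prio)
{
    if (static_prio < 120)
        return (140 - static_prio) * 20;   /* "important" tasks: up to 800 ms */
    else
        return (140 - static_prio) * 5;    /* nice >= 0 tasks: down to 5 ms   */
}

int main(void)
{
    printf("nice -20 -> %d ms\n", task_timeslice_ms(100));  /* 800 */
    printf("nice   0 -> %d ms\n", task_timeslice_ms(120));  /* 100 */
    printf("nice +19 -> %d ms\n", task_timeslice_ms(139));  /*   5 */
    return 0;
}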

Typical Quanta

Priority   Static Pri   Niceness   Quantum
Highest    100          -20        800 ms
High       110          -10        600 ms
Normal     120            0        100 ms
Low        130          +10         50 ms
Lowest     139          +19          5 ms
Interactive Processes
- A process is considered interactive if
  bonus − 5 >= (static priority / 4) − 28
- Low-priority processes have a hard time becoming interactive:
  - The highest static priority (100) becomes interactive when its average sleep time is greater than 200 ms
  - A default static priority (120) process becomes interactive when its average sleep time is greater than 700 ms
  - The lowest priority (139) can never become interactive
- The higher the bonus and the more important (numerically lower) the static priority, the more likely the task is to be considered interactive (see the check below)
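The test translates directly into C; this is a sketch of the formula above, and the integer division is what keeps priority 139 from ever qualifying, even with the maximum bonus of 10.

#include <stdio.h>

/* A task is considered interactive when bonus - 5 >= static_prio/4 - 28.
 * bonus is the 0..10 value derived from sleep_avg (see earlier slides). */
static int task_interactive(int static_prio, int bonus)
{
    return bonus - 5 >= static_prio / 4 - 28;
}

int main(void)
{
    printf("prio 100, bonus  2: %d\n", task_interactive(100, 2));  /* 1: 200 ms of sleep suffices */
    printf("prio 120, bonus  7: %d\n", task_interactive(120, 7));  /* 1: needs ~700 ms of sleep   */
    printf("prio 139, bonus 10: %d\n", task_interactive(139, 10)); /* 0: can never be interactive */
    return 0;
}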

Using Quanta
- At every timer tick (in scheduler_tick), decrement the quantum of the currently running process (time_slice)
- When the count reaches zero, the process's quantum is used up; check its interactive status:
  - If non-interactive, put it aside on the expired list
  - If interactive, put it at the end of the active list
- Exceptions: don't put it back on the active list if:
  - A higher-priority process is on the expired list
  - An expired task has been waiting more than STARVATION_LIMIT
- If there's nothing else at that priority, it will run again immediately
- Of course, by running so much, its bonus will go down, and so will its priority and its interactive status (see the sketch below)
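A compile-able toy of that per-tick logic; all names are mine, and the two exceptions above are collapsed into a single expired_starving flag.

#include <stdio.h>

/* Illustrative per-tick expiry logic, not the kernel's scheduler_tick(). */
struct toy_task { const char *name; int time_slice; int interactive; };

/* Stubs standing in for the real requeue operations. */
static void requeue_active(struct toy_task *t)  { printf("%s -> active list\n",  t->name); }
static void move_to_expired(struct toy_task *t) { printf("%s -> expired list\n", t->name); }

/* expired_starving: stand-in for "a higher-priority task is already expired
 * or expired tasks have waited past STARVATION_LIMIT". */
static void tick(struct toy_task *curr, int expired_starving)
{
    if (--curr->time_slice > 0)
        return;                               /* quantum not used up yet            */
    if (curr->interactive && !expired_starving)
        requeue_active(curr);                 /* keep interactive tasks responsive  */
    else
        move_to_expired(curr);                /* wait for the array swap            */
}

int main(void)
{
    struct toy_task shell = { "shell", 1, 1 };
    struct toy_task loop  = { "loop",  1, 0 };
    tick(&shell, 0);   /* interactive, no starvation -> back on active list */
    tick(&loop,  0);   /* CPU hog -> expired list                           */
    return 0;
}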

Avoiding Starvation
- The system only runs processes from the active queues, and puts them on the expired queues when they use up their quanta
- When a priority level of the active queue is empty, the scheduler looks for the next-highest priority queue
- After running all of the active queues, the active and expired queues are swapped
- There are pointers to the current arrays; at the end of a cycle, the pointers are switched
The Priority Arrays
struct prio_array {
    unsigned int nr_active;          /* number of runnable tasks in this array */
    unsigned long bitmap[5];         /* one bit per non-empty priority queue   */
    struct list_head queue[140];     /* one list per priority level            */
};

struct rq {
    spinlock_t lock;
    unsigned long nr_running;
    struct prio_array *active, *expired;
    struct prio_array arrays[2];     /* the two arrays that get swapped        */
    struct task_struct *curr, *idle;
    /* ... other fields ... */
};
Swapping Arrays
struct prio_array *array = rq->active;

if (array->nr_active == 0) {
    rq->active = rq->expired;
    rq->expired = array;
}

Why Two Arrays?
- Why is it done this way?
  - It avoids the need for traditional aging
- Why is aging bad?
  - It's O(n) at each clock tick
The Traditional Algorithm
for (pp = proc; pp < proc + NPROC; pp++) {
    if (pp->prio != MAX)
        pp->prio++;
    if (pp->prio > curproc->prio)
        reschedule();
}
Every process is examined, quite frequently. (This code is taken almost verbatim from 6th Edition Unix, circa 1976.)
Linux is More Efficient
- Processes are touched only when they start or stop running
- That's when we recalculate priorities, bonuses, quanta, and interactive status
- There are no loops over all processes or even over all runnable processes
Real-Time Scheduling
- Linux has soft real-time scheduling
  - No hard real-time guarantees
- All real-time processes are higher priority than any conventional process
  - Processes with priorities [0, 99] are real-time
  - The real-time priority is saved in rt_priority in the task_struct
  - The scheduling priority of a real-time task is 99 − rt_priority
- A process can be converted to real-time via the sched_setscheduler system call (see the example below)
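For example, a process can switch itself to SCHED_FIFO with the standard POSIX call; this needs root (CAP_SYS_NICE), and sched_priority here is the user-visible 1-99 real-time priority.

#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param param = { .sched_priority = 50 };

    /* pid 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        perror("sched_setscheduler");   /* typically EPERM without privilege */
        return 1;
    }
    printf("now running as SCHED_FIFO, rt priority 50\n");
    return 0;
}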

Real-Time Policies
- First-in, first-out: SCHED_FIFO
  - Static priority
  - The process is only preempted for a higher-priority process
  - No time quantum; it runs until it blocks or yields voluntarily
- Round-robin: SCHED_RR
  - As above, but with a time quantum (800 ms) and round-robin among tasks at the same priority level
- Normal processes have the SCHED_OTHER scheduling policy
Multiprocessor Scheduling
- Each processor has a separate run queue
- Each processor only selects processes from its own queue to run
- Yes, it's possible for one processor to be idle while others have jobs waiting in their run queues
- Periodically, the queues are rebalanced: if one processor's run queue is too long, some processes are moved from it to another processor's queue
Locking Runqueues
- To rebalance, the kernel sometimes needs to move processes from one runqueue to another
  - This is actually done by special kernel threads
- Naturally, the runqueues must be locked before this happens
- The kernel always locks runqueues in order of increasing index
  - Why? Deadlock prevention!
Processor Affinity
- Each process has a bitmask saying which CPUs it can run on
- Normally, of course, all CPUs are listed
- Processes can change the mask
- The mask is inherited by child processes (and threads), thus tending to keep them on the same CPU
- Rebalancing does not override affinity (see the example below)
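User space manipulates that bitmask with the Linux-specific sched_setaffinity call; a minimal example pinning the calling process (and, by inheritance, its future children) to CPU 0.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                  /* allow CPU 0 only */

    /* pid 0 means "the calling process"; children inherit this mask. */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}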
Load Balancing
- To keep all CPUs busy, load balancing pulls tasks from busy runqueues to idle runqueues
- If schedule finds that a runqueue has no runnable tasks (other than the idle task), it calls load_balance
- load_balance is also called via timer
  - scheduler_tick calls rebalance_tick
  - Every tick when the system is idle
  - Every 100 ms otherwise
Load Balancing
- load_balance looks for the busiest runqueue (the one with the most runnable tasks) and takes a task that is, in order of preference:
  - inactive (likely to be cache cold)
  - high priority
- load_balance skips tasks that are (see the sketch below):
  - likely to be cache warm (they ran within the last cache_decay_ticks)
  - currently running on a CPU
  - not allowed to run on the current CPU (as indicated by the cpus_allowed bitmask in the task_struct)
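Those skip rules can be condensed into a small predicate. This is a toy model with made-up fields and an assumed CACHE_DECAY_TICKS value, not the kernel's actual migration check.

#include <stdio.h>

/* Toy "can we pull this task to CPU this_cpu?" check (illustrative only). */
struct toy_task {
    int running;                 /* currently executing on some CPU */
    int idle_ticks;              /* ticks since it last ran         */
    unsigned long cpus_allowed;  /* affinity bitmask                */
};

#define CACHE_DECAY_TICKS 8      /* assumed value for the sketch    */

static int can_migrate(const struct toy_task *t, int this_cpu)
{
    if (t->running)                                   /* skip: on a CPU now  */
        return 0;
    if (t->idle_ticks < CACHE_DECAY_TICKS)            /* skip: cache-warm    */
        return 0;
    if (!(t->cpus_allowed & (1UL << this_cpu)))       /* skip: not allowed   */
        return 0;
    return 1;
}

int main(void)
{
    struct toy_task t = { .running = 0, .idle_ticks = 20, .cpus_allowed = 0x1 };
    printf("pull to CPU 0? %d\n", can_migrate(&t, 0));   /* 1               */
    printf("pull to CPU 1? %d\n", can_migrate(&t, 1));   /* 0: affinity     */
    return 0;
}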
Optimizations
- If next is a kernel thread, borrow the MM mappings from prev
  - User-level MMs are unused
  - Kernel-level MMs are the same for all kernel threads
- If prev == next, don't context switch
Sleep Time and Bonus

Average Sleep Time (ms)   Bonus   Time Slice Granularity
0 to 100                    0       5120
100 to 200                  1       2560
200 to 300                  2       1280
300 to 400                  3        640
400 to 500                  4        320
500 to 600                  5        160
600 to 700                  6         80
700 to 800                  7         40
800 to 900                  8         20
900 to 999                  9         10
1 second or more           10         10
