Distributed Scheduling: B. Prabhakaran
Motivation: A distributed system may have a mix of heavily and lightly loaded nodes. System performance can be improved by distributing load from the heavily loaded nodes to the lightly loaded ones. The distributed scheduler is the resource management component that distributes the load of the system.
What is Load?
Load on a system/node can correspond to the queue length of tasks/processes that need to be processed. The queue length of waiting tasks is proportional to task response time, and hence a good indicator of system load. Distributing load means transferring tasks/processes among nodes. If a task transfer (from another node) takes a long time, the node may accept more tasks during the transfer time, causing the node to become highly loaded and affecting performance. Solution: artificially increment the queue length when a task is accepted for transfer from a remote node (to account for the proposed increase in load). Task transfers can fail: use timeouts.
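The bookkeeping above can be sketched as follows (a minimal illustration; the `Node` class and its method names are assumptions, not from the slides):

```python
class Node:
    """Tracks load as the task queue length, plus a reservation for
    tasks accepted for transfer but not yet arrived."""

    def __init__(self):
        self.queue = []       # tasks waiting to be processed
        self.in_transit = 0   # tasks accepted but still being transferred

    def load(self):
        # Count in-flight transfers so the node does not over-accept
        # tasks during a long transfer.
        return len(self.queue) + self.in_transit

    def accept_transfer(self):
        self.in_transit += 1  # artificially increment the queue length

    def transfer_arrived(self, task):
        self.in_transit -= 1
        self.queue.append(task)

    def transfer_timed_out(self):
        # The transfer failed (timeout expired): release the reserved slot.
        self.in_transit -= 1
```

On a timeout, the reservation is simply released, so a failed transfer does not permanently inflate the node's apparent load.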
Types of Algorithms
Static load distribution algorithms: decisions are hardcoded into the algorithm with a priori knowledge of the system. Dynamic load distribution: uses system state information such as task queue length and processor utilization. Adaptive load distribution: adapts the approach based on system state.
(e.g.,) Dynamic distribution algorithms collect load information from nodes even at very high system loads. Load information collection itself can add load on the system, as messages need to be exchanged. Adaptive distribution algorithms may stop collecting state information at high loads.
Load balancing: transfer tasks even if a node is not heavily loaded, so that queue lengths on all nodes are approximately equal. Involves a larger number of task transfers, which might degrade performance. Load sharing: transfer tasks only when the queue length exceeds a certain threshold. Fewer task transfers.
Anticipatory task transfers: transfer from overloaded nodes to ones that are likely to become idle/lightly loaded.
Preemptive task transfers: transfer of tasks that have partially executed. Expensive, as it involves collecting the task state. Task state: virtual memory image, process control block, IO buffers, file pointers, timers, ...
Non-preemptive task transfers: transfer of tasks that have not begun execution. Do not require the transfer of task state. Suitable for load sharing, not for load balancing.
Algorithm Components
Transfer policy: thresholds, perhaps in terms of number of tasks, are generally used. (Another possible threshold is processor utilization.) When the load on a node exceeds a threshold T, the node becomes a sender; when it falls below a threshold, it becomes a receiver.
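A single-threshold transfer policy can be stated in a few lines (an illustrative sketch; the function and the "neutral" label are assumptions):

```python
def role(queue_length, T):
    """Single-threshold transfer policy: classify a node by queue length."""
    if queue_length > T:
        return "sender"            # load exceeds the threshold T
    if queue_length + 1 <= T:
        return "receiver"          # can accept a task without exceeding T
    return "neutral"               # neither sender nor receiver
```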
Algorithm Components...
Selection policy: a task transfer should lead to reduced response time. Simplest approach: select newly originated tasks; transfer costs are lower as no state information needs to be transferred (non-preemptive transfers). Other factors for selection: smaller tasks have less overhead.
Algorithm Components...
Location policy: polling is generally used. A node polls/checks whether another node is suitable and willing. Polling can be done serially or in parallel (using multicast). A node can be selected for polling randomly, based on information from previous polls, or in a nearest-neighbor manner.
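Serial polling with random selection might look like this (a sketch; the function name, the `is_receiver` callback, and `poll_limit` bound are assumptions consistent with the later slides):

```python
import random

def find_receiver(nodes, is_receiver, poll_limit, rng=random):
    """Serially poll up to poll_limit randomly chosen nodes and return
    the first willing receiver, or None if none is found."""
    candidates = list(nodes)
    rng.shuffle(candidates)        # random selection of nodes to poll
    for node in candidates[:poll_limit]:
        if is_receiver(node):      # one poll message + reply per node
            return node
    return None
```

Parallel polling would instead multicast to all candidates at once and take the first positive reply.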
Algorithm Components...
Information policy: decides when, where, and what system state information is collected.
Demand-driven: collected only when a node becomes a potential sender or receiver, i.e., when it is highly or lightly loaded. Periodic: may not be adaptive; collection may be done at high loads, worsening system performance. State-change driven: collected only when the state changes by a certain degree.
Stability: the sum of the load due to tasks and the load due to distribution must be less than the system capacity; otherwise the system is termed unstable. A stable algorithm can still give worse performance; an algorithm is called effective when it improves performance. Load balancing algorithms should avoid fruitless actions, e.g., processor thrashing: a task transfer makes the receiver highly loaded, so the task gets transferred again, perhaps repeatedly.
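The stability condition reduces to a one-line check (an illustrative sketch; the names are assumptions):

```python
def is_stable(task_load, distribution_overhead, capacity):
    """Stability condition: total offered work, including the overhead
    the distribution algorithm itself adds, must stay below capacity."""
    return task_load + distribution_overhead < capacity
```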
Algorithmic perspective
Sender-initiated: distribution initiated by an overloaded node. Receiver-initiated: distribution initiated by lightly loaded nodes. Symmetric: initiated by both senders and receivers; has the advantages and disadvantages of both approaches. Adaptive: sensitive to the state of the system.
Sender-initiated
Transfer policy: a node is a sender if its queue length exceeds T, and a receiver if accepting a task will not make its queue length exceed T.
Sender-initiated
Location Policy:
Random: select any node and transfer the task to it. No need for state collection, but unnecessary task transfers (processor thrashing) may occur.
Threshold: poll nodes until a receiver is found; up to PollLimit nodes are polled. If none is a receiver, the sender processes the task locally.
Shortest: poll a set of nodes and select as the receiver the node with the shortest task queue length among those polled.
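The Shortest policy can be sketched as a reduction over the polled nodes (an illustration; the function name, `queue_len` callback, and the final threshold check are assumptions combining this slide with the transfer policy above):

```python
def shortest_policy(polled, queue_len, T):
    """'Shortest' location policy: among the polled nodes, pick the one
    with the smallest task queue, provided transferring a task to it
    would not push its queue length past the threshold T."""
    best = min(polled, key=queue_len)
    return best if queue_len(best) + 1 <= T else None
```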
Sender-initiated
Information policy: demand-driven, initiated by the sender. Stability: can become unstable at high loads.
At high loads, it may become difficult for senders to find receivers. Moreover, the number of senders increases at high system loads, thereby increasing the polling activity. This polling activity may make the system unstable at high loads.
Receiver-initiated
Transfer policy: uses thresholds. A queue length below T identifies a receiver; one above T identifies a sender. Selection policy: as before.
Location policy: polling. Randomly poll nodes until a sender is found, and transfer a task from it. If no sender is found, wait for a period (or until a task completes) and repeat.
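The receiver's polling loop (one round, without the waiting period) might look like this sketch; the function name and the `queue_len` callback are assumptions:

```python
import random

def receiver_poll(nodes, queue_len, T, poll_limit, rng=random):
    """Receiver-initiated location policy: poll up to poll_limit randomly
    chosen nodes, looking for a sender (queue length above T)."""
    candidates = rng.sample(list(nodes), min(poll_limit, len(nodes)))
    for node in candidates:
        if queue_len(node) > T:    # found a sender: transfer a task from it
            return node
    return None                    # no sender found: wait and retry later
```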
Receiver-initiated
Information policy: demand-driven, initiated by the receiver. Stability: not unstable, since it is the lightly loaded systems that initiate the algorithm.
Receiver-initiated
Drawback: CPU scheduling algorithms are mostly round-robin, so a newly arrived task at an overloaded node quickly begins execution. By the time a receiver's poll arrives, eligible tasks have already started, so most transfers are preemptive and therefore expensive.
Symmetric
Combine the sender-initiated and receiver-initiated components to get a hybrid algorithm with the advantages of both. Care must be taken, since otherwise the hybrid algorithm may inherit the disadvantages of both the sender- and receiver-initiated algorithms.
Above-average Algorithm
Transfer policy: two adaptive thresholds instead of one. If a node's estimated average load is A, a higher threshold TooHigh > A and a lower threshold TooLow < A are used. Load < TooLow -> receiver; Load > TooHigh -> sender.
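The two-threshold classification can be sketched as follows (an illustration; the function name and the fixed offsets of 1 around the average are assumptions, since the slides only require TooHigh > A > TooLow):

```python
def classify(load, avg):
    """Above-average transfer policy: two thresholds placed around the
    node's estimate A of the average load."""
    too_low, too_high = avg - 1, avg + 1   # assumed offsets around A
    if load < too_low:
        return "receiver"
    if load > too_high:
        return "sender"
    return "average"                       # acceptable range: do nothing
```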
Above-average Algorithm
Location policy:
Sender Component:
The sender sends a TooHigh msg, sets a TooHigh timeout, and listens for Accept msgs. A receiver that gets a TooHigh msg cancels its TooLow timeout, sends an Accept msg, increases its load value, and sets an AwaitingTask timeout. If the AwaitingTask timeout expires, the load value is decreased. A sender receiving an Accept msg transfers a task and cancels the timeout.
Continued
A sender that receives a TooLow msg from a receiver while waiting for an Accept sends a TooHigh msg to that receiver. A sender whose TooHigh timeout expires broadcasts a ChangeAverage msg to all nodes to increase the average load estimate at the other nodes.
Above-average Algorithm...
Receiver Component
A receiver sends a TooLow msg, sets a TooLow timeout, and starts listening for TooHigh msgs.
A receiver getting a TooHigh msg sends an Accept msg, increases its load value, and sets an AwaitingTask timeout; if the timeout expires, the load value is decreased.
A receiver whose TooLow timeout expires sends a ChangeAverage msg to decrease the average load estimate at the other nodes.
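A successful exchange between one sender and one receiver can be condensed into a sketch (timeouts and broadcasts omitted; the function name and return shape are assumptions):

```python
def negotiate(sender_load, receiver_load, avg):
    """One round of the above-average protocol: the sender announces
    TooHigh; a qualifying receiver replies Accept and raises its load
    value, and the sender then transfers one task."""
    messages = ["TooHigh"]
    if receiver_load < avg:        # node qualifies as a receiver
        messages.append("Accept")
        receiver_load += 1         # reserve a slot (AwaitingTask pending)
        sender_load -= 1           # task transferred on Accept
        messages.append("Task")
    return sender_load, receiver_load, messages
```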
Above-average Algorithm...
Selection policy: as discussed before. Information policy: demand-driven; the average load estimate is modified based on system load. Average system load is determined individually by each node, and there is a range of acceptable load before a node tries to become a sender or a receiver.
Adaptive Algorithms
Limit senders' polling actions at high load to avoid instability, by utilizing the state information collected during previous polling operations. Each node maintains three lists of nodes: a Sender list, a Receiver list, and an OK list. Initially, a node assumes that all other nodes are receivers. These lists are updated each time information about the state of a node is received.
Adaptive Algorithms
Transfer policy
A node is a receiver, a sender, or an OK node depending on whether its queue length is <= LT, >= UT, or between LT and UT, where LT and UT are two thresholds.
Location policy (sender component): poll up to PollLimit nodes in the Receiver list (head to tail). If a polled node is still a receiver, transfer a task to it. If not, move the polled node to the head of the Sender or OK list, depending on its reported status.
Adaptive Algorithms
Location policy (receiver component): poll the Sender list (head to tail), then the OK list (tail to head), then the Receiver list (tail to head). If the polled node is a sender, a task is transferred from it. If the node is not a sender, both the polled node and the receiver update each other's status. Polling stops when a sender is found or a static PollLimit is reached.
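The sender component's list maintenance can be sketched as follows (an illustration; the function name and the `status_of` callback are assumptions):

```python
def sender_poll(receiver_list, status_of, poll_limit, sender_list, ok_list):
    """Adaptive sender component: poll the Receiver list head to tail;
    misclassified nodes are moved to the head of the proper list."""
    for _ in range(min(poll_limit, len(receiver_list))):
        node = receiver_list.pop(0)
        status = status_of(node)           # one poll round-trip
        if status == "receiver":
            receiver_list.insert(0, node)  # still a receiver: keep it
            return node                    # transfer a task to this node
        elif status == "sender":
            sender_list.insert(0, node)    # reclassify as sender
        else:
            ok_list.insert(0, node)        # reclassify as OK
    return None
```

Because stale entries migrate to the correct list on every poll, later polls become increasingly accurate, which is what limits fruitless polling at high loads.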
Adaptive Algorithms
Selection policy: same as for the sender-initiated and receiver-initiated algorithms. Information policy: demand-driven. The algorithm dynamically becomes sender-initiated at low loads and receiver-initiated at high loads; hence the algorithm is stable and can use non-preemptive transfers at low loads (sender-initiated).