Unit-3 Part1
Unit – III
Dr. A. Kathirvel
Professor & Head/IT - VCEW
Unit - III
Distributed File systems – Architecture – Mechanisms –
Design Issues – Distributed Shared Memory –
Architecture – Algorithm – Protocols - Design Issues.
Distributed Scheduling – Issues – Components –
Algorithms.
DISTRIBUTED FILE SYSTEMS
A Distributed File System (DFS) is simply a classical
model of a file system (as discussed before)
distributed across multiple machines. The purpose is
to promote sharing of dispersed files.
This is an area of active research interest today.
The resources on a particular machine are local to
itself. Resources on other machines are remote.
A file system provides a service for clients. The server
interface is the normal set of file operations: create,
read, etc. on files.
Definition of a DFS
DFS: multiple users, multiple sites, and
(possibly) distributed storage of files.
Benefits
File sharing
Uniform view of system from different clients
Centralized administration
Goals of a distributed file system
Network Transparency (access transparency)
Availability
Goals
Network (Access) Transparency
Users should be able to access files over a
network as easily as if the files were stored
locally.
Users should not have to know the physical
location of a file to access it.
Transparency can be addressed through
naming and file mounting mechanisms
Components of Access Transparency
[Figure: DFS architecture showing client and server nodes, each with a local cache, server disks, and a communication network connecting them]
Distributed Shared Memory
Advantages of distributed shared memory (DSM)
Data sharing is implicit, hiding data movement (as opposed to ‘Send’/‘Receive’
in message passing model)
Passing data structures containing pointers is easier (in message passing model
data moves between different address spaces)
Moving the entire object to the user takes advantage of locality of reference
Less expensive to build than tightly coupled multiprocessor system: off-the-shelf
hardware, no expensive interface to shared physical memory
Very large total physical memory for all nodes: Large programs can run more
efficiently
No serial access to common bus for shared physical memory like in
multiprocessor systems
Programs written for shared memory multiprocessors can be run on DSM
systems with minimum changes
Algorithms for implementing DSM
Issues
How to keep track of the location of remote data
How to minimize communication overhead when accessing remote data
How to handle concurrent access to remote data at several nodes
1. The Central Server Algorithm
Central server maintains all shared data
Read request: returns data item
Write request: updates data and returns acknowledgement message
Implementation
A timeout is used to resend a request if no acknowledgement is received
Associated sequence numbers can be used to detect duplicate write requests
If an application’s request to access shared data fails repeatedly, a failure
condition is sent to the application
Issues: performance and reliability
Possible solutions
Partition shared data between several servers
Use a mapping function to distribute/locate data
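A minimal sketch of the central-server approach described above, assuming a single in-process "server" object standing in for the remote node (the class and method names are illustrative, not from any particular system). Sequence numbers let the server detect duplicate write requests, the client resends after a timeout, and a failure condition is raised after repeated attempts:

```python
import time

class CentralServer:
    """Maintains all shared data; clients direct every read/write to it."""
    def __init__(self):
        self.data = {}     # shared data items, keyed by name
        self.seen = set()  # sequence numbers of writes already applied

    def handle_read(self, key):
        return self.data.get(key)

    def handle_write(self, key, value, seq):
        # A duplicate (retransmitted) write is acknowledged but not re-applied.
        if seq not in self.seen:
            self.data[key] = value
            self.seen.add(seq)
        return "ACK"

class Client:
    def __init__(self, server, max_retries=3, timeout=0.5):
        self.server = server
        self.max_retries = max_retries
        self.timeout = timeout
        self.next_seq = 0

    def write(self, key, value):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        for _ in range(self.max_retries):
            ack = self.server.handle_write(key, value, seq)  # stands in for a network call
            if ack == "ACK":
                return True
            time.sleep(self.timeout)   # resend the request after a timeout
        raise RuntimeError("shared-data access failed repeatedly")  # failure condition

    def read(self, key):
        return self.server.handle_read(key)
```

Partitioning the shared data between several servers would only change how a client maps a key to a server, for example with a simple mapping function such as hash(key) % number_of_servers.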
Algorithms for implementing DSM (cont.)
2. The Migration Algorithm
Operation
Ship (migrate) entire data object (page, block) containing data item to requesting
location
Allow only one node to access a shared data item at a time
Advantages
Takes advantage of the locality of reference
DSM can be integrated with VM at each node
Make DSM page multiple of VM page size
A locally held shared memory can be mapped into the VM page address
space
If page not local, fault-handler migrates page and removes it from address
space at remote node
To locate a remote data object:
Use a location server
Maintain hints at each node
Broadcast query
Issues
Only one node can access a data object at a time
Thrashing can occur: to minimize it, set a minimum time a data object must reside at a node
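A sketch of the migration approach above, using a shared hint table as the mechanism for locating a remote data object (one of the three options listed); the names hint_table, access, and page_fault are illustrative:

```python
class Node:
    def __init__(self, node_id, hint_table):
        self.node_id = node_id
        self.pages = {}               # pages currently resident at this node
        self.hint_table = hint_table  # hint: page_id -> node believed to hold the page

    def access(self, page_id, nodes):
        if page_id not in self.pages:
            self.page_fault(page_id, nodes)  # fault handler migrates the page here
        return self.pages[page_id]

    def page_fault(self, page_id, nodes):
        owner = nodes[self.hint_table[page_id]]   # locate the remote data object
        page = owner.pages.pop(page_id)           # remove it from the remote address space
        self.pages[page_id] = page                # map it into the local address space
        self.hint_table[page_id] = self.node_id   # update the hint

hints = {"p0": "A"}
nodes = {"A": Node("A", hints), "B": Node("B", hints)}
nodes["A"].pages["p0"] = b"page contents"
print(nodes["B"].access("p0", nodes))   # p0 migrates from A to B
```

Because only one node holds a page at a time, two nodes repeatedly accessing the same page causes thrashing; keeping the page at a node for a minimum residence time before it may migrate again, as noted above, limits this.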
Algorithms for implementing DSM (cont.): [figure slides]
Memory coherence
DSM is based on
Replicated shared data objects
Concurrent access of data objects at many nodes
Coherent memory: when value returned by read operation is the
expected value (e.g., value of most recent write)
A mechanism that controls/synchronizes accesses is needed to
maintain memory coherence
Sequential consistency: A system is sequentially consistent if
The result of any execution of operations of all processors is the same as if
they were executed in sequential order, and
The operations of each processor appear in this sequence in the order
specified by its program
General consistency:
All copies of a memory location (replicas) eventually contain same data
when all writes issued by every processor have completed
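To make the sequential consistency definition concrete, the classic two-processor example (both shared locations start at 0) can be checked by enumerating every interleaving that respects program order. This is an illustrative, self-contained check; the operation lists simply encode P1: x=1; r1=y and P2: y=1; r2=x:

```python
from itertools import permutations

# Program order: P1 executes x=1 then r1=y; P2 executes y=1 then r2=x.
ops = [("write", "x", 1), ("read", "y", "r1"),   # P1: indices 0, 1
       ("write", "y", 1), ("read", "x", "r2")]   # P2: indices 2, 3

results = set()
for order in permutations(range(4)):
    # Keep only interleavings that respect each processor's program order.
    if order.index(0) > order.index(1) or order.index(2) > order.index(3):
        continue
    mem, regs = {"x": 0, "y": 0}, {}
    for i in order:
        kind, loc, val = ops[i]
        if kind == "write":
            mem[loc] = val
        else:
            regs[val] = mem[loc]
    results.add((regs["r1"], regs["r2"]))

print(results)   # {(0, 1), (1, 0), (1, 1)}; (0, 0) never appears
```

Under sequential consistency the outcome r1 = r2 = 0 is impossible: if r1 = 0 then the read of y preceded y = 1, so by program order x = 1 also preceded y = 1 and therefore preceded the read of x, forcing r2 = 1. A weaker memory model may allow (0, 0).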
Memory coherence (Cont.)
Processor consistency:
Operations issued by a processor are performed in the order they are issued
Operations issued by several processors may not be performed in the same
order (e.g. simultaneous reads of the same location by different processors
may yield different results)
Weak consistency:
Memory is consistent only (immediately) after a synchronization operation
A regular data access can be performed only after all previous
synchronization accesses have completed
Release consistency:
Further relaxation of weak consistency
Synchronization operations must be consistent with each other only within
a processor
Synchronization operations: Acquire (i.e. lock), Release (i.e. unlock)
Sequence: Acquire
Regular access
Release
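A sketch of the Acquire / regular access / Release sequence above, in which writes made inside the critical section are buffered and only propagated to the globally visible copy at the Release. The class and method names, and the use of a single backing dictionary as the "home" copy, are assumptions made for illustration:

```python
import threading

class ReleaseConsistentStore:
    """One node's view of shared data; updates become visible at Release."""
    def __init__(self, backing):
        self.backing = backing        # globally visible copy (e.g. at a home node)
        self.local = dict(backing)    # this node's cached view
        self.pending = {}             # writes buffered since the last Acquire
        self.lock = threading.Lock()  # the synchronization variable

    def acquire(self):                # Acquire: lock, then pull a fresh view
        self.lock.acquire()
        self.local = dict(self.backing)

    def read(self, key):              # regular access
        return self.local[key]

    def write(self, key, value):      # regular access, buffered locally
        self.local[key] = value
        self.pending[key] = value

    def release(self):                # Release: propagate buffered writes, unlock
        self.backing.update(self.pending)
        self.pending.clear()
        self.lock.release()

shared = {"counter": 0}
store = ReleaseConsistentStore(shared)
store.acquire()
store.write("counter", store.read("counter") + 1)
store.release()
print(shared["counter"])   # 1
```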
Coherence Protocols
Issues
How do we ensure that all replicas have the same information?
How do we ensure that nodes do not access stale data?
1. Write-invalidate protocol
A write to shared data invalidates all copies except one before write executes
Invalidated copies are no longer accessible
Advantage: good performance for
Many updates between reads
Per node locality of reference
Disadvantage
Invalidations sent to all nodes that have copies
Inefficient if many nodes access same object
Examples: most DSM systems: IVY, Clouds, Dash, Memnet, Mermaid, and Mirage
2. Write-update protocol
A write to shared data causes all copies to be updated (the new value is sent,
instead of an invalidation)
More difficult to implement
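A sketch contrasting the two protocols for one shared item replicated at several nodes; the Replica structure and function names are illustrative and not taken from IVY, Clouds, or the other systems named above:

```python
class Replica:
    def __init__(self, value=None, valid=False):
        self.value = value
        self.valid = valid

def write_invalidate(replicas, writer, value):
    # Invalidate every copy except the writer's before the write takes effect.
    for node, rep in replicas.items():
        if node != writer:
            rep.valid = False           # invalidated copies are no longer accessible
    replicas[writer].value = value
    replicas[writer].valid = True

def write_update(replicas, writer, value):
    # Send the new value to every copy instead of an invalidation.
    for rep in replicas.values():
        rep.value = value
        rep.valid = True

replicas = {n: Replica() for n in ("A", "B", "C")}
write_update(replicas, "A", 10)        # all three copies now hold 10 and are valid
write_invalidate(replicas, "B", 20)    # only B's copy remains valid, holding 20
```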
Design issues
Granularity: size of shared memory unit
If DSM page size is a multiple of the local virtual memory (VM)
management page size (supported by hardware), then DSM can be
integrated with VM, i.e. use the VM page handling
Advantages vs. disadvantages of using a large page size:
(+) Exploit locality of reference
(+) Less overhead in page transport
(-) More contention for page by many processes
Advantages vs. disadvantages of using a small page size
(+) Less contention
(+) Less false sharing (a page holding two items that are not shared but are
needed by two different processes)
(-) More page traffic
Examples
PLUS: page size 4 Kbytes, unit of memory access is 32-bit word
Clouds, Munin: object is unit of shared data structure
Design issues (cont.)
Page replacement
Replacement algorithm (e.g. LRU) must take into account page access
modes: shared, private, read-only, writable
Example: LRU with access modes
Private (local) pages to be replaced before shared ones
Private pages swapped to disk
Shared pages sent over network to owner
Read-only pages may be discarded (owners have a copy)
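A sketch of the LRU-with-access-modes example above: private and read-only pages are preferred as victims over shared ones, and the action taken on eviction depends on the page's mode. The priority ordering and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    mode: str          # "read-only", "private", or "shared"
    last_access: int   # logical timestamp used for LRU

# Lower number = evicted earlier.
EVICTION_PRIORITY = {"read-only": 0, "private": 1, "shared": 2}

def choose_victim(pages):
    # Prefer read-only, then private pages; break ties by least recent access.
    return min(pages, key=lambda p: (EVICTION_PRIORITY[p.mode], p.last_access))

def evict(page):
    if page.mode == "read-only":
        return "discard"                   # the owner still has a copy
    if page.mode == "private":
        return "swap to local disk"
    return "send over the network to the owner"   # shared page

pages = [Page(1, "shared", 10), Page(2, "private", 50), Page(3, "read-only", 90)]
victim = choose_victim(pages)
print(victim.page_id, evict(victim))   # 3 discard
```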
Distributed Scheduling
Good resource allocation schemes are needed to
fully utilize the computing capacity of the DS
Distributed scheduler is a resource management
component of a DOS
It focuses on judiciously and transparently
redistributing the load of the system among the
computers
Target is to maximize the overall performance of the
system
More suitable for DS based on LANs
Motivation
A locally distributed system consists of a collection of
autonomous computers connected by a local area
communication network
Users submit tasks at their host computers for processing
Load distribution is required in such an environment because of the
random arrival of tasks and their random CPU service times
There is a possibility that several computers are heavily
loaded while others are idle or lightly loaded
If the load is heavier on some systems or if some processors
execute tasks at a slower rate than others, this situation will
occur often
Distributed Systems Modeling
Consider a system of N identical and independent
servers
Identical means that all servers have the same task
arrival and service rates
Let ρ be the utilization of each server; then P = 1 − ρ is
the probability that a server is idle
If ρ = 0.6, then P = 0.4
If the systems have different loads, then load can be
transferred from highly loaded systems to lightly loaded
systems to increase overall performance
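A small calculation based on this model: if each of N servers is independently busy with probability ρ, then the probability that at least one server has work while at least one other is idle, i.e. that transferring load could help, is 1 − ρ^N − (1 − ρ)^N. The independence assumption is an illustrative simplification:

```python
def transfer_opportunity(rho, n):
    """P(at least one server busy AND at least one server idle),
    assuming n independent servers, each busy with probability rho."""
    all_busy = rho ** n
    all_idle = (1 - rho) ** n
    return 1 - all_busy - all_idle

# With rho = 0.6 (so P = 0.4 per server) and 10 servers:
print(round(transfer_opportunity(0.6, 10), 4))   # 0.9938
```

Even at moderate utilization, some server is almost always idle while another is loaded, which is what motivates load distribution.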
Issues in Load Distribution
Load
Resource queue lengths and particularly the CPU queue
length are good indicators of load
Measuring the CPU queue length is fairly simple and
carries little overhead
CPU queue length does not always tell the correct
situation as the jobs may differ in types
Another load measuring criterion is the processor
utilization
Requires a background process that monitors CPU
utilization continuously and imposes more overhead
Used in most of the load balancing algorithms
Classification of LDA
Basic function is to transfer load from heavily loaded
systems to idle or lightly loaded systems
These algorithms can be classified as :
Static
decisions are hard-wired in the algorithm using prior knowledge
of the system
Dynamic
use system state information to make load distributing decisions
Adaptive
special case of dynamic algorithms in that they adapt their
activities by dynamically changing the parameters of the algorithm
to suit the changing system state
Basic Terminologies
Load Balancing vs. Load sharing
Load sharing algorithms strive to reduce the possibility of a
state in which one node lies idle while tasks contend for
service at another, by transferring tasks to lightly loaded nodes
Load balancing algorithms try to equalize loads at all
computers
Because a load balancing algorithm transfers tasks at a
higher rate than a load sharing algorithm, the higher
overhead it incurs may outweigh this potential performance
improvement
Basic Terminologies (contd.)
Preemptive vs. Non-preemptive transfer
Preemptive task transfers involve the transfer of a task that
is partially executed
Non-preemptive task transfers involve the transfer of the
tasks that have not begun execution and hence do not
require the transfer of the task’s state
Preemptive transfer is an expensive operation as the
collection of a task’s state can be difficult
What does a task’s state consist of?
Non-preemptive task transfers are also referred to as task
placements
Components of a Load Balancing
Algorithm
Transfer Policy
determines whether a node is in a suitable state to participate in a task
transfer
requires information on the local node's state to make decisions
Selection Policy
determines which task should be transferred
Location Policy
determines to which node a task selected for transfer should be sent
requires information on the states of remote nodes to make decisions
Information policy
responsible for triggering the collection of system state information
Three types are: Demand-Driven, Periodic, State-Change-Driven
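One way to picture the four components is as a set of small functions a node consults when a new task arrives; the threshold value and the particular choices shown (newly arrived task, random location, demand-driven information) are illustrative placeholders, not the only options:

```python
import random

THRESHOLD_T = 3   # queue-length threshold (illustrative value)

def transfer_policy(queue_length):
    """Is this node in a suitable state to take part in a transfer (as a sender)?"""
    return queue_length + 1 > THRESHOLD_T

def selection_policy(new_task, queued_tasks):
    """Which task should be transferred?  Here, only the newly arrived one."""
    return new_task

def location_policy(nodes):
    """To which node should the selected task be sent?  Here, a random node."""
    return random.choice(nodes)

def information_policy():
    """When is system state information collected?  Here, on demand,
    i.e. only when a transfer decision is actually being made."""
    return "demand-driven"
```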
Stability
The two views of stability are,
The Queuing-Theoretic Perspective
A system is termed unstable if its CPU queues grow
without bound; this occurs when the long-term arrival rate of
work to the system is greater than the rate at which the
system can perform work.
The Algorithmic Perspective
If an algorithm can perform fruitless actions
indefinitely with finite probability, the algorithm is said
to be unstable.
Load Distributing Algorithms
Sender-Initiated Algorithms
Receiver-Initiated Algorithms
Symmetrically Initiated Algorithms
Adaptive Algorithms
Sender-Initiated Algorithms
Activity is initiated by an overloaded node (sender)
A task is sent to an underloaded node (receiver)
Transfer Policy
A node is identified as a sender if a new task originating at the
node makes the queue length exceed a threshold T.
Selection Policy
Only newly arrived tasks are considered for transfer
Location Policy
Random: dynamic location policy, no prior information exchange
Threshold: polling a node (selected at random) to find a receiver
Shortest: a group of nodes is polled to determine their queue
lengths; the task is sent to the node with the shortest queue
Information Policy
A demand-driven type
Stability
Location policies adopted cause system instability at high loads
[Flowchart: sender-initiated load sharing. When a task arrives and QueueLength + 1 > T, the node polls nodes from its poll-set; if the queue length at a polled node "i" is < T, the task is transferred to "i"; otherwise the task is queued locally.]
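A sketch of the sender-initiated algorithm summarized by the flowchart above: if a newly arrived task would push the local queue length past T, the node polls randomly chosen nodes and transfers the task to the first whose queue length is below T; otherwise the task is queued locally. The poll limit and the Node structure are illustrative:

```python
import random
from dataclasses import dataclass, field

T = 3            # queue-length threshold
POLL_LIMIT = 5   # maximum number of nodes probed per transfer attempt

@dataclass
class Node:
    name: str
    queue: list = field(default_factory=list)

def on_task_arrival(sender, task, other_nodes):
    """Sender-initiated load sharing with a threshold location policy."""
    if len(sender.queue) + 1 <= T:            # transfer policy: node is not a sender
        sender.queue.append(task)
        return "queued locally"
    poll_set = random.sample(other_nodes, min(POLL_LIMIT, len(other_nodes)))
    for node in poll_set:                     # poll one node at a time
        if len(node.queue) < T:               # polled node can accept the task
            node.queue.append(task)           # non-preemptive transfer of the new task
            return "transferred to " + node.name
    sender.queue.append(task)                 # no receiver found: keep the task
    return "queued locally"

nodes = [Node("A"), Node("B", ["t1", "t2", "t3", "t4"]), Node("C")]
print(on_task_arrival(nodes[1], "t5", [nodes[0], nodes[2]]))   # transferred to A or C
```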
Receiver-Initiated Algorithms
Initiated from an underloaded node (receiver) to
obtain a task from an overloaded node (sender)
Transfer Policy
Triggered when a task departs and the local queue length falls below a threshold T
Selection Policy
Same as the previous
Location Policy
A node selected at random is polled to determine whether
transferring a task from it would place its queue length below the
threshold level; if not, the polled node transfers a task.
Information Policy
A demand-driven type
Stability
Does not cause system instability at high system loads; at low
loads the polling only consumes spare CPU cycles
Most transfers are preemptive and therefore expensive
[Flowchart: receiver-initiated load sharing. When the local queue length falls below T, the node polls nodes from its poll-set; if the queue length at a polled node "i" is > T, a task is transferred from "i" to the receiver "j"; otherwise the receiver waits for a predetermined period before polling again.]
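A matching sketch of the receiver-initiated algorithm from the flowchart above: when a task departure leaves the local queue below T, the node polls other nodes and pulls a task from one whose queue length exceeds T; if no sender is found it waits for a predetermined period before polling again. The wait period and the Node structure (the same as in the sender-initiated sketch) are illustrative:

```python
import random
from dataclasses import dataclass, field

T = 3
POLL_LIMIT = 5
RETRY_PERIOD = 1.0   # seconds to wait before polling again

@dataclass
class Node:
    name: str
    queue: list = field(default_factory=list)

def on_task_departure(receiver, other_nodes):
    """Receiver-initiated load sharing."""
    if len(receiver.queue) >= T:              # transfer policy: node is not a receiver
        return "no action"
    poll_set = random.sample(other_nodes, min(POLL_LIMIT, len(other_nodes)))
    for node in poll_set:
        if len(node.queue) > T:               # polled node is a sender
            task = node.queue.pop()           # in practice such transfers are often preemptive
            receiver.queue.append(task)
            return "pulled a task from " + node.name
    return "no sender found; wait %.1fs and poll again" % RETRY_PERIOD

nodes = [Node("A", ["t1", "t2", "t3", "t4", "t5"]), Node("B")]
print(on_task_departure(nodes[1], [nodes[0]]))   # pulled a task from A
```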