Distributed Computing

Distributed computing involves multiple networked computers communicating and coordinating actions by passing messages. It allows for pooling of resources, increased reliability through replication, and easy expansion by adding nodes. Common architectures include intranets and mobile/ubiquitous computing. Distributed systems must address challenges like heterogeneity, security, scalability, failure handling, and concurrency through techniques such as protocols, middleware, published interfaces, and redundancy.


Distributed Computing Overview

Agenda

• What is distributed computing
• Why distributed computing
• Common architectures
• Best practice
What is a Distributed System?

• A distributed system is one in which components located at networked
  computers communicate and coordinate their actions only by passing
  messages.
• Three examples:
  – The Internet
  – An intranet, which is a portion of the Internet managed by an
    organization
  – Mobile and ubiquitous computing
The Internet

• The Internet is a very large distributed system.
• The implementation of the Internet and the services that it supports
  has entailed the development of practical solutions to many distributed
  system issues.
Intranets

• An intranet is a portion of the Internet that is separately
  administered and has a boundary that can be configured to enforce local
  security policies.
• The main issues arising in the design of components for use in
  intranets are file services, firewalls, and cost.
Mobile and ubiquitous computing

• The portability of devices such as laptop computers, PDAs, mobile
  phones, and even refrigerators, together with their ability to connect
  conveniently to networks in different places, makes mobile computing
  possible.
• Ubiquitous computing is the harnessing of many small, cheap
  computational devices that are present in users' physical environments,
  including the home, office and elsewhere.
Significant Consequences of DS

• Concurrency
– The capacity of the system to handle shared
resources can be increased by adding more
resources to the network.
• No global clock
– The only communication is by sending messages
through a network.
• Independent failures
– The programs may not be able to detect whether
the network has failed or has become unusually
slow.
Resource

• The term "resource" is a rather abstract one, but it best characterizes
  the range of things that can usefully be shared in a networked computer
  system.
• It extends from hardware components such as disks and printers to
  software-defined entities such as files, databases and data objects of
  all kinds.
Important Terms of the Web

• Service
  – A distinct part of a computer system that manages a collection of
    related resources and presents their functionality to users and
    applications.
  – HTTP, Telnet, POP3...
• Server
  – A running program (a process) on a networked computer that accepts
    requests from programs running on other computers to perform a
    service, and responds appropriately.
  – IIS, Apache...
• Client
  – The requesting process.
Challenges

• Heterogeneity
• Openness
• Security
• Scalability
• Failure handling
• Concurrency
• Transparency
Heterogeneity

• Networks, hardware, operating systems, programming languages, and
  developers all differ.
• We set up protocols to resolve these heterogeneities.
• Middleware: a software layer that provides a programming abstraction
  while masking the underlying heterogeneity.
Openness

• The openness of a DS is determined primarily by the degree to which new
  resource-sharing services can be added and made available for use by a
  variety of client programs.
• Open systems are characterized by the fact that their key interfaces
  are published (see the sketch below).
• Open DS are based on the provision of a uniform communication mechanism
  and published interfaces for access to shared resources.
• Open DS can be constructed from heterogeneous hardware and software.
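As a toy illustration of a published interface (the service name and
methods are hypothetical, not from the original slides), the contract can
be expressed as an abstract base class that any implementation, local or
remote, must honour:

```python
from abc import ABC, abstractmethod

# Hypothetical published interface for a file service. Any
# implementation (local, networked, replicated) must provide these
# operations, which is what lets heterogeneous clients interoperate.
class FileService(ABC):
    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents of the file at `path`."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Replace the contents of the file at `path`."""
```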
Security

• Security for information resources has three components:
  – Confidentiality: protection against disclosure to unauthorized
    individuals.
  – Integrity: protection against alteration or corruption.
  – Availability: protection against interference with the means to
    access the resources.
• Two new security challenges:
  – Denial of service (DoS) attacks.
  – Security of mobile code.
Scalability

• A system is described as scalable if it remains effective when there is
  a significant increase in the number of resources and the number of
  users.
• Challenges:
  – Controlling the cost of resources.
  – Controlling the performance loss.
  – Preventing software resources from running out.
  – Avoiding performance bottlenecks.
Scaling Techniques (1)

Figure 1.4: The difference between letting (a) a server or (b) a client
check forms as they are being filled in.
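To make the technique concrete, here is a minimal sketch (the regex and
field are illustrative) of moving a form check to the client, as in case
(b) of the figure:

```python
import re

# Hypothetical form rule; the scaling technique is *where* it runs.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_email(field: str) -> bool:
    """(b) Run at the client: bad input is rejected with zero network
    round trips, and the server only sees a completed, valid form."""
    return EMAIL_RE.match(field) is not None

# (a) Server-side checking would instead ship every field to the server
# and wait for a verdict: one round trip per check, plus extra load on
# the shared server.
print(validate_email("user@example.com"))   # True
print(validate_email("not-an-email"))       # False
```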
Scaling Techniques (2)

Figure 1.5: An example of dividing the DNS name space into zones.
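A rough sketch of the idea (the zone data is made up): once the name
space is divided into zones, resolution walks a chain of delegations
rather than consulting one giant central table.

```python
# Each zone knows only its own records plus delegations to child zones.
ZONES = {
    ".":           {"com": "zone:com"},
    "com":         {"example": "zone:example.com"},
    "example.com": {"www": "192.0.2.10"},
}

def resolve(name: str) -> str:
    """Resolve 'www.example.com' by following zone delegations."""
    labels = name.split(".")              # ['www', 'example', 'com']
    zone = ZONES["."]
    for label in reversed(labels):
        entry = zone[label]
        if entry.startswith("zone:"):
            zone = ZONES[entry[len("zone:"):]]  # delegate to child zone
        else:
            return entry                   # found the address record

print(resolve("www.example.com"))          # -> 192.0.2.10
```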
Failure handling

• When faults occur in hardware or software, programs may produce
  incorrect results or may stop before they have completed the intended
  computation.
• Techniques for dealing with failures:
  – Detecting failures
  – Masking failures
  – Tolerating failures
  – Recovering from failures
  – Redundancy
Concurrency

• Several clients may attempt to access a shared resource at the same
  time.
• Any object that represents a shared resource in a distributed system
  must ensure that it operates correctly in a concurrent environment
  (see the sketch below).
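A minimal sketch of guarding a shared resource so that concurrent
clients cannot corrupt it; a lock is the simplest device, though a real
system might use transactions or finer-grained synchronisation:

```python
import threading

class Counter:
    """A shared resource that stays consistent under concurrent access."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # serialise access to the shared state
            self._value += 1

counter = Counter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(counter._value)             # always 8000 with the lock in place
```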
INTRODUCTION

Distributed computing involves computing performed among multiple
network-connected computers.

A distributed system is a collection of individual computers.

Distributed programming is the process of writing distributed programs.
What is Distributed Computing/System?

• Distributed computing
  – A field of computer science that studies distributed systems.
  – The use of distributed systems to solve computational problems.
• Distributed system
  – There are several autonomous computational entities, each of which
    has its own local memory.
  – The entities communicate with each other by message passing.
  – Operating system concept:
    • The processors communicate with one another through various
      communication lines, such as high-speed buses or telephone lines.
    • Each processor has its own local memory.
WHY DISTRIBUTED COMPUTING?

Economics: distributed systems allow the pooling of resources.

Reliability: distributed systems allow replication of resources.

The Internet has become a universal platform for distributed computing.
Why Distributed Computing?

• The nature of the application
• Performance
  – Computing intensive
    • The task could consume a lot of computing time.
  – Data intensive
    • The task deals with a large amount or size of data, e.g. Facebook,
      the LHC (Large Hadron Collider).
• Robustness
  – No SPOF (Single Point Of Failure)
  – Other nodes can execute the same task that was running on a failed
    node.
Common properties

– Fault tolerance
  • When one or more nodes fail, the whole system can still work, apart
    from some loss of performance.
  • Need to check the status of each node.
– Each node plays a partial role
  • Each computer has only a limited, incomplete view of the system, and
    may know only one part of the input.
– Resource sharing
  • Each user can share the computing power and storage resources in the
    system with other users.
– Load sharing
  • Dispatching tasks to several nodes helps spread load across the whole
    system.
– Easy to expand
  • Adding nodes should take little time, ideally none.
Centralized vs. Distributed Computing

• Early computing was performed on a single processor. Uni-processor
  computing can be called centralized computing.

[Figure: centralized computing (a mainframe with terminals) versus
distributed computing (workstations and network hosts joined by network
links)]
Centralized vs. Distributed Computing

• A distributed system is a collection of independent computers,
  interconnected via a network, capable of collaborating on a task.
• Distributed computing is computing performed in a distributed system.
• Distributed computing has become increasingly common due to advances
  that have made both machines and networks cheaper and faster.
A typical portion of the Internet

[Figure: a typical portion of the Internet, showing intranets connected
through ISPs to backbone and satellite links; key: desktop computer,
server, network link]
Computers in a Distributed System

• Workstations: computers used by end users to perform computing.
• Server machines: computers which provide resources and services.
• Personal digital assistants: handheld computers connected to the system
  via a wireless communication link.
• ...
DISTRIBUTED COMPUTING PARADIGMS

Paradigm means a pattern, example, or model.

Paradigms for distributed applications:

Message passing: an appropriate paradigm for network services.
Client-server: provides an efficient abstraction for the delivery of
network services.
Peer-to-peer: the participating processes play equal roles, with
equivalent capabilities and responsibilities.
Message Passing

• Message passing is the paradigm of communication where messages are
  sent from a sender to one or more recipients.
• Forms of messages include (remote) method invocation, signals, and data
  packets.
• Message passing can be synchronous or asynchronous.
Synchronous Message Passing

• Synchronous message passing systems require the sender and receiver to
  wait for each other to transfer the message. That is, the sender will
  not continue until the receiver has received the message.
• Synchronous communication has two advantages. First, reasoning about
  the program is simplified, because there is a synchronisation point
  between sender and receiver on message transfer. Second, no buffering
  is required: the message can always be stored on the receiving side,
  because the sender will not continue until the receiver is ready.
Asynchronous Message Passing

• Asynchronous message passing systems deliver a message from sender to
  receiver without waiting for the receiver to be ready.
• The advantage of asynchronous communication is that the sender and
  receiver can overlap their computation, because they do not wait for
  each other.
Message Passing: Synchronous vs. Asynchronous
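A minimal in-process sketch of the contrast (threads and a queue stand in
for real network endpoints; names are illustrative): the synchronous
sender blocks until the receiver has taken the message, while the
asynchronous sender just enqueues it and moves on.

```python
import queue, threading

mailbox = queue.Queue()                 # unbounded buffer

# --- Asynchronous: sender enqueues and continues immediately ---------
def async_send(msg):
    mailbox.put(msg)                    # never blocks; receiver reads later

# --- Synchronous: sender waits until the receiver has the message ----
def sync_send(msg, received: threading.Event):
    mailbox.put(msg)
    received.wait()                     # rendezvous: block until receipt

def receiver(received: threading.Event):
    msg = mailbox.get()
    print("received:", msg)
    received.set()                      # release the waiting sender

ack = threading.Event()
threading.Thread(target=receiver, args=(ack,)).start()
sync_send("hello", ack)                 # returns only after receipt
async_send("world")                     # returns at once, read or not
```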
Peer-to-Peer Computing

• Peer-to-peer (P2P) computing or networking is a distributed application
  architecture that partitions tasks or workloads between peers.
• Peers are equally privileged, equipotent participants in the
  application. They are said to form a peer-to-peer network of nodes.
• Peers make a portion of their resources, such as processing power, disk
  storage or network bandwidth, directly available to other network
  participants, without the need for central coordination by servers or
  stable hosts.
• Peers are both suppliers and consumers of resources, in contrast to the
  traditional client–server model where only servers supply and clients
  consume.
Best Practice

• Data intensive or computing intensive?
  – Consider the size and amount of data the application consumes.
  – Computing intensive: we can move data to the nodes where we execute
    jobs.
  – Data intensive: we can partition/replicate data across different
    nodes and execute our tasks on those nodes, reducing data movement at
    execution time.
• Master nodes need to know the data location.
• No data loss when incidents happen:
  – SAN (Storage Area Network)
  – Data replication on different nodes
• Synchronization
  – When splitting a task across different nodes, how can we make sure
    the subtasks are synchronized?
Best Practice

• Robustness
  – The system stays safe when one or more nodes fail.
  – Recovery when failed nodes come back online should need little or no
    manual action.
    • Condor – restart the daemon
  – Failure detection
    • When any node fails, the master nodes can detect the situation,
      e.g. by heartbeat detection (see the sketch below).
  – Apps/users don't need to know when a partial failure happens.
    • Restart tasks on other nodes for users.
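A minimal sketch of heartbeat-based failure detection (the timeout value
and node names are illustrative): workers report in periodically, and the
master presumes a node dead once its last heartbeat is too old.

```python
import time

HEARTBEAT_TIMEOUT = 5.0            # seconds; illustrative value

last_seen = {}                     # node name -> last heartbeat time

def record_heartbeat(node: str):
    """Called whenever a heartbeat message arrives from a worker."""
    last_seen[node] = time.monotonic()

def failed_nodes():
    """Nodes whose heartbeat is older than the timeout are presumed dead."""
    now = time.monotonic()
    return [n for n, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT]

record_heartbeat("worker-1")
record_heartbeat("worker-2")
print(failed_nodes())              # [] -- both reported recently
```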
Best Practice

• Network issues
  – Bandwidth
    • Bandwidth must be considered when copying files between nodes,
      e.g. when a task must run on a node that does not yet hold its
      data.
• Scalability
  – Easy to expand
    • Hadoop – modify the configuration and start the daemon
• Optimization
  – What can we do if the performance of some nodes is not good?
    • Monitor the performance of each node, using information exchanges
      such as heartbeats or logs.
    • Resume the same task on another node.
Best Practice

• App/User
  – Shouldn't need to know how nodes communicate with each other.
  – User mobility: a user can access the system from one entry point, or
    from anywhere.
    • Grid – UI (user interface)
    • Condor – submit machine
Components of Distributed Software
Systems

• Distributed systems
• Middleware
• Distributed applications
Middleware

Figure 1-1: The middleware layer extends over multiple machines, and
offers each application the same interface.
Middleware: Goals

• Middleware handles heterogeneity
• Higher-level support
  – Makes the distributed nature of the application transparent to the
    user/programmer
    • Remote Procedure Calls (RPC) – see the sketch below
    • RPC + object orientation = CORBA
• Higher-level support BUT exposes remote objects, partial failure, etc.
  to the programmer
  – JINI, JavaSpaces
• Scalability
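Python's standard library ships a small RPC middleware, xmlrpc, which
illustrates the goal: the caller writes what looks like a local call, and
the middleware masks the sockets, marshalling, and dispatch. A minimal
sketch (host, port, and function name are arbitrary choices):

```python
# server.py -- expose a function through RPC middleware
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
server.serve_forever()
```

```python
# client.py -- the remote call looks like an ordinary method call
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # middleware hides sockets, marshalling, dispatch
```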
Communication Patterns

• Client-server
• Group-oriented/peer-to-peer
  – For applications that require reliability and scalability
Clients invoke individual servers

[Figure: two clients issue invocations to servers and receive results;
key: process, computer]
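A minimal socket sketch of this invocation/result pattern (the port
number is arbitrary): the server accepts a request and replies, while the
client sends its invocation and blocks for the result.

```python
import socket, threading, time

def echo_server(port=5000):
    """Accept one connection; echo the request back as the result."""
    with socket.socket() as srv:
        srv.bind(("localhost", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)               # the invocation
            conn.sendall(b"result: " + data)     # the result

threading.Thread(target=echo_server, daemon=True).start()
time.sleep(0.2)                  # crude wait until the server is listening

with socket.socket() as cli:
    cli.connect(("localhost", 5000))
    cli.sendall(b"do_work")      # client invokes the server
    print(cli.recv(1024))        # b'result: do_work'
```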
A service provided by multiple servers

[Figure: clients invoke a service that is implemented by several servers]
Web proxy server

[Figure: clients reach web servers through an intermediate proxy server]
A distributed application based on peer processes

[Figure: peers 1..N, each running the application and holding sharable
objects, interact directly with one another]
Distributed applications

• Applications that consist of a set of processes that are distributed
  across a network of machines and work together as an ensemble to solve
  a common problem.
• In the past, mostly "client-server":
  – Resource management centralized at the server.
• "Peer-to-peer" computing represents a movement towards more "truly"
  distributed applications.
Transparency

• Transparency is defined as the concealment from the user and the
  application programmer of the separation of components in a distributed
  system, so that the system is perceived as a whole rather than as a
  collection of independent components.
• Eight forms of transparency:
Forms of Transparency in a Distributed System

Transparency   Description
Access         Hide differences in data representation and how a resource
               is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location
               while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive
               users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk
Transparency in Distributed Systems

Access transparency: enables local and remote resources to be accessed
using identical operations.
Location transparency: enables resources to be accessed
without knowledge of their physical or network location (for
example, which building or IP address).
Concurrency transparency: enables several processes to
operate concurrently using shared resources without
interference between them.
Replication transparency: enables multiple instances of
resources to be used to increase reliability and performance
without knowledge of the replicas by users or application
programmers.
Transparency in Distributed Systems

Failure transparency: enables the concealment of faults, allowing users
and application programs to complete their tasks despite the failure of
hardware or software components.
Mobility transparency: allows the movement of resources and
clients within a system without affecting the operation of users
or programs.
Performance transparency: allows the system to be
reconfigured to improve performance as loads vary.
Scaling transparency: allows the system and applications to
expand in scale without change to the system structure or the
application algorithms.
Case study - Hadoop

• HDFS
  – NameNode:
    • manages the file system namespace and regulates access to files by
      clients.
    • determines the mapping of blocks to DataNodes.
  – DataNode:
    • manages storage attached to the node it runs on.
    • saves CRC codes.
    • sends heartbeats to the NameNode.
    • Each file is split into chunks (blocks), and each chunk is stored
      on several DataNodes.
  – Secondary NameNode:
    • responsible for merging the FsImage and EditLog.
Case study - Hadoop

• MapReduce framework
  – JobTracker
    • responsible for dispatching jobs to each TaskTracker.
    • job management, such as removing and scheduling jobs.
  – TaskTracker
    • responsible for executing jobs; usually the TaskTracker launches
      another JVM to execute the job.
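Hadoop Streaming lets mappers and reducers be plain scripts that read
stdin and write stdout, which makes the framework easy to sketch. The
classic word-count example (file names are illustrative; the reducer
relies on the framework sorting map output by key):

```python
# mapper.py -- emit (word, 1) for every word on stdin
import sys
for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- input arrives sorted by key; sum the counts per word
import sys
current, total = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")
```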
Case study - Hadoop

[Figure from Hadoop: The Definitive Guide]
Case study - Hadoop

• Data replication
  – Data are replicated to different nodes (sketched below):
    • reduces the possibility of data loss.
    • data locality: a job is sent to a node where the data already are.
• Robustness
  – If one DataNode fails, we can get the data from other nodes.
  – If one TaskTracker fails, we can start the same task on a different
    node.
  – Recovery: only the daemon needs restarting when failed nodes come
    back online.
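A toy sketch of the replica-placement idea (not HDFS's actual policy,
which also considers rack topology): each block is assigned to k distinct
nodes, so a single node failure always leaves a copy available.

```python
from itertools import cycle

def place_replicas(blocks, nodes, k=3):
    """Assign each block to k distinct nodes, round-robin for balance."""
    ring = cycle(range(len(nodes)))
    placement = {}
    for block in blocks:
        start = next(ring)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(k)]
    return placement

print(place_replicas(["blk_1", "blk_2"], ["dn1", "dn2", "dn3", "dn4"]))
# {'blk_1': ['dn1', 'dn2', 'dn3'], 'blk_2': ['dn2', 'dn3', 'dn4']}
```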
Case study - Hadoop

• Resource sharing
  – Each Hadoop user can share computing power and storage space with
    other Hadoop users.
• Synchronization
  – No synchronization is required.
• Failure detection
  – The NameNode/JobTracker knows when a DataNode/TaskTracker fails,
    based on heartbeats.