Lecture1a DistSyst

A distributed system is defined as a collection of independent computers that appears to its users as a single coherent system. Key features of distributed systems include no shared memory and message-based communication between computers running their own operating systems in a potentially heterogeneous environment. The goal of distributed systems is to present a single-system image to users while providing characteristics like expandability, continuous availability despite failures of components, and distribution transparency to hide details of the system's distribution. Distributed systems aim to enable resource sharing, availability, and scalability.

Uploaded by

Deepesh Meena

What Is A Distributed System?

“A collection of independent computers that appears to its users as a single coherent system.”
• Features:
• No shared memory – message-based communication
• Each runs its own local OS
• Heterogeneity
• Ideal: to present a single-system image:
• The distributed system “looks like” a single computer rather than a collection of separate computers.
Distributed System Characteristics

• To present a single-system image:
• Hide internal organization, communication details
• Provide uniform interface
• Easily expandable
• Adding new computers is hidden from users
• Continuous availability
• Failures in one component can be covered by other components
• Supported by middleware
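The "continuous availability" point can be illustrated with a minimal failover sketch. The names `call_with_failover`, `flaky`, and `healthy` are hypothetical, and plain local functions stand in for components that would really run on separate machines.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in turn and return the first successful reply."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as err:
            # This component failed; the next one covers for it.
            last_error = err
    raise RuntimeError("all replicas failed") from last_error


def flaky(request: str) -> str:
    # Hypothetical component that is currently down.
    raise ConnectionError("replica A is down")


def healthy(request: str) -> str:
    # Hypothetical component that is working normally.
    return f"handled {request}"


print(call_with_failover([flaky, healthy], "GET /index"))
```

The caller sees one reply and never learns that the first component failed, which is the behaviour the slide attributes to the middleware layer.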
Definition of a Distributed System

Figure 1-1. A distributed system organized as middleware. The middleware layer runs on all machines, and offers a uniform interface to the system.
Goal 1 – Resource Availability
• Support user access to remote resources (printers, data files, web pages, CPU cycles) and the fair sharing of those resources
• Economics of sharing expensive resources
• Performance enhancement – due to multiple processors; also due to ease of collaboration and information exchange (access to remote services)
• Resource sharing introduces security problems.
Goal 2 – Distribution Transparency
• Software hides some of the details of the distribution
of system resources.
• Makes the system more user friendly.
• A distributed system that appears to its users &
applications to be a single computer system is said to
be transparent.
• Users & apps should be able to access remote
resources in the same way they access local
resources.
• Transparency has several dimensions.
Types of Transparency
Transparency   Description
Access         Hide differences in data representation & resource access (enables interoperability)
Location       Hide location of resource (can use resource without knowing its location)
Migration      Hide possibility that a system may change location of resource (no effect on access)
Replication    Hide the possibility that multiple copies of the resource exist (for reliability and/or availability)
Concurrency    Hide the possibility that the resource may be shared concurrently
Failure        Hide failure and recovery of the resource. How does one differentiate between slow and failed?
Relocation     Hide that resource may be moved during use

Figure 1-2. Different forms of transparency in a distributed system (ISO, 1995)
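Location and migration transparency can be sketched with a small name service: the client refers to a resource only by a logical name, so moving the resource does not change the client's call. `NameService`, `read_resource`, and the host names are hypothetical, and in-memory dictionaries stand in for separate machines.

```python
from typing import Dict


class NameService:
    """Maps a logical resource name to the host currently holding it."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, name: str, host: str) -> None:
        self._locations[name] = host

    def lookup(self, name: str) -> str:
        return self._locations[name]


def read_resource(names: NameService, stores: Dict[str, Dict[str, str]], name: str) -> str:
    """Client-side access: the caller supplies only the logical name."""
    host = names.lookup(name)
    return stores[host][name]


# Two hypothetical hosts, each with its own local storage.
stores: Dict[str, Dict[str, str]] = {"host-a": {"report.txt": "Q3 numbers"}, "host-b": {}}
names = NameService()
names.register("report.txt", "host-a")
before = read_resource(names, stores, "report.txt")

# Migrate the resource to another host; the client call below is unchanged.
stores["host-b"]["report.txt"] = stores["host-a"].pop("report.txt")
names.register("report.txt", "host-b")
after = read_resource(names, stores, "report.txt")

print(before == after)
```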
Transparency to Handle Failures?

slide from Jeff Dean, Google
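The question of how to tell a slow component from a failed one can be made concrete with a timeout-based detector. This is a minimal sketch with hypothetical names and timestamps; it shows only that a timeout can suspect a failure, not confirm one.

```python
def suspected_failed(last_heartbeat: float, now: float, timeout: float) -> bool:
    """Return True when no heartbeat has arrived within `timeout` seconds."""
    return (now - last_heartbeat) > timeout


# Hypothetical timestamps in seconds. Both servers were last heard from at
# t = 10; one has crashed, the other is merely slow to respond.
crashed_last_seen = 10.0
slow_last_seen = 10.0
now = 13.0

# With a 2-second timeout the detector cannot tell the two cases apart.
print(suspected_failed(crashed_last_seen, now, timeout=2.0))
print(suspected_failed(slow_last_seen, now, timeout=2.0))

# A longer timeout avoids the false suspicion but delays real detection.
print(suspected_failed(slow_last_seen, now, timeout=5.0))
```

The timeout value is therefore a trade-off between reacting quickly to real failures and wrongly declaring slow components dead.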


Goal 3 - Openness
• An open distributed system “…offers services according to
standard rules that describe the syntax and semantics of
those services.” In other words, the interfaces to the system
are clearly specified and freely available.
• Compare to network protocols: published, not proprietary

• Interface Definition/Description Languages (IDL): used to describe the interfaces between software components, usually in a distributed system
• Definitions are language & machine independent
• Support communication between systems using different
OS/programming languages; e.g. a C++ program running on Windows
communicates with a Java program running on UNIX
• Communication is usually RPC-based.
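As a rough illustration of RPC over a language-neutral message format, the sketch below encodes a call as JSON text, which a program written in any language could produce or parse. The names `INTERFACE`, `server_handle`, and `client_call` are hypothetical, and a direct function call stands in for the network; a real system would generate the stubs from an IDL file.

```python
import json


def add(a: int, b: int) -> int:
    return a + b


# Server side: the published interface, i.e. which operations exist.
INTERFACE = {"add": add}


def server_handle(raw_request: str) -> str:
    """Decode a request, invoke the named operation, encode the reply."""
    request = json.loads(raw_request)
    result = INTERFACE[request["method"]](*request["params"])
    return json.dumps({"result": result})


def client_call(method: str, *params):
    """Client stub: looks like a local call, but only text crosses the boundary."""
    raw_request = json.dumps({"method": method, "params": list(params)})
    raw_reply = server_handle(raw_request)  # stands in for a network send/receive
    return json.loads(raw_reply)["result"]


print(client_call("add", 2, 3))
```

Because only the message format and the operation names are shared, the server could be reimplemented in another language without changing the client.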
Open Systems Support …
• Interoperability: the ability of two different
systems or applications to work together
• A process that needs a service should be able to
talk to any process that provides the service.
• Multiple implementations of the same service
may be provided, as long as the interface is
maintained
• Portability: an application designed to run on one
distributed system can run on another system
which implements the same interface.
• Extensibility: Easy to add new components,
features
Goal 4 - Scalability
• Dimensions that may scale:
• With respect to size
• With respect to geographical distribution
• With respect to the number of
administrative organizations spanned
• A scalable system still performs well as it
scales up along any of the three
dimensions.
A Google Datacenter
How big? Perhaps one million+ machines
but it’s not that bad...
a single task usually doesn’t use more than 20,000 machines. [2009, probably out of date]
Search for “Trump hairdo”
[Figure: a query passes through front-ends to an index that is split into chunks (i1, i2, i3, i4, ...) and stored as three replicas. Slide from Jeff Dean, Google.]
• Split into chunks: make single queries faster
• Replicate: handle load
• GFS distributed filesystem: replicated, consistent, fast
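The split-and-replicate idea can be sketched as a scatter-gather query: the front-end asks one replica of every index chunk and merges the partial answers. The data, `SHARDS`, `query_shard`, and `search` are hypothetical, and this is not a description of Google's actual implementation.

```python
import random
from typing import Dict, List, Set

# Hypothetical toy index: each chunk (shard) maps terms to document ids.
SHARDS: List[Dict[str, Set[int]]] = [
    {"trump": {1, 4}, "hairdo": {4}},
    {"trump": {7}, "hairdo": {7, 9}},
    {"hairdo": {12}},
]
REPLICAS_PER_SHARD = 3  # every replica of a shard holds identical data


def query_shard(shard_id: int, replica_id: int, terms: List[str]) -> Set[int]:
    """Return ids of documents in one shard that contain all query terms."""
    shard = SHARDS[shard_id]  # replica_id only decides which machine does the work
    postings = [shard.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search(terms: List[str]) -> List[int]:
    """Front-end: ask one replica of every shard, then merge the partial results."""
    hits: Set[int] = set()
    for shard_id in range(len(SHARDS)):
        replica_id = random.randrange(REPLICAS_PER_SHARD)  # spread the load
        hits |= query_shard(shard_id, replica_id, terms)
    return sorted(hits)


print(search(["trump", "hairdo"]))
```

Each shard scans only its own part of the index, which shortens a single query, while the choice among replicas spreads many concurrent queries over more machines.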
How do you index the web?
1. Get a copy of the web.
2. Build an index.
3. Profit.
There are over 1 trillion unique URLs
Billions of unique web pages
Hundreds of millions of websites
30?? terabytes of text
• Crawling -- download those web pages
• Indexing -- harness 10s of thousands of machines to do it
• Profiting -- we leave that to you.
• “Data-Intensive Computing”
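The "build an index" step can be sketched on a single machine as an inverted index that maps each word to the pages containing it. The pages and the `build_index` helper are hypothetical; a web-scale index would spread this work over many machines.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


# Hypothetical crawled pages (URL -> page text).
pages = {
    "http://a.example/": "distributed systems lecture",
    "http://b.example/": "distributed file systems",
}
index = build_index(pages)
print(sorted(index["distributed"]))
print(sorted(index["lecture"]))
```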
MapReduce / Hadoop
Why? Hiding details of programming 10,000 machines!
Programmer writes two simple functions:
map (data item) -> list(tmp values)
reduce ( list(tmp values)) -> list(out values)
MapReduce system balances load, handles failures, starts job, collects results, etc.
[Figure: data-flow diagram; labels include Data, Chunks, Computers, Data Transformation, Sort, Data Aggregation, Storage]
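The two programmer-supplied functions can be sketched with a word count run in a single process. The use of key/value pairs and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions for illustration; a real MapReduce or Hadoop runtime would distribute these phases over many machines and handle failures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_fn(line: str) -> List[Tuple[str, int]]:
    """map: one data item (a line of text) -> list of temporary (word, 1) pairs."""
    return [(word, 1) for word in line.lower().split()]


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """reduce: all temporary values for one key -> one output value."""
    return (word, sum(counts))


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    """Toy single-process driver standing in for the MapReduce system."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:                       # map phase
        for key, value in map_fn(line):
            grouped[key].append(value)       # group values by key
    # Sort keys, then apply the reduce function to each group.
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


counts = run_mapreduce(["the web is big", "the index is bigger"])
print(counts)
```

Only `map_fn` and `reduce_fn` are specific to word counting; the driver is the part that the system provides so that the programmer does not manage machines directly.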
All that...
• Hundreds of DNS servers
• Protocols on protocols on protocols
• Distributed network of Internet routers to get packets around the globe
• Hundreds of thousands of servers
• ... to find out what’s the deal with Trump’s hair!
