How To Scale A Distributed System
Henry Robinson
@henryr / [email protected]
What is this, and who’s it for?
§ Lessons learned from the trenches building distributed systems for 8+ years at Cloudera and in
open source communities.
§ Not:
§ A complete course in distributed systems theory (but boy do I have references for you)
§ Always specific to distributed systems
§ Complete
§ Signed off by experts
§ A panacea (sorry)
…and you are?
§ Distributed systems dilettante
§ Practices
§ Possibility
§ Papers
Today
§ Primitives - what are the concepts and nouns that it’s important to know?
§ Papers - you don’t have time to read everything? Join the club.
[spoiler: everyone argues about CAP, forever]
1. Primitives
Basic concepts
§ Processes may fail.
§ There is no particularly good way to tell that they have done so (see the sketch after this list).
§ The operational mode of the software we build has changed: availability is the sword by which web
properties live or die.
§ Adding more processing power is how we provide redundancy; i.e. we scale our systems up.
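A minimal sketch of why that first point bites: a timeout-based heartbeat detector (names here are illustrative, not from any particular system) can only ever mark a process as "suspected", because a missed heartbeat might mean a crash, a slow process, or a dropped packet.

import time

# Illustrative sketch: timeout-based heartbeat failure detection.
# It can only *suspect* failure - it cannot distinguish crashed from slow.
SUSPECT_AFTER_SECONDS = 5.0  # hypothetical threshold

class HeartbeatDetector:
    def __init__(self):
        self.last_heard = {}  # process id -> timestamp of last heartbeat

    def record_heartbeat(self, process_id):
        self.last_heard[process_id] = time.monotonic()

    def suspected(self, process_id):
        # "Suspected", not "failed": we can never be certain the process is dead.
        last = self.last_heard.get(process_id)
        return last is None or time.monotonic() - last > SUSPECT_AFTER_SECONDS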
Scalability axes
§ Cluster size
§ Number of tables
Just like security, include scalability in your thinking from day one.
Scalability behaviors are usually discontinuous - they exhibit phase changes rather than gradual
improvement. (20->50 nodes, not 20->22)
That means you can clearly identify scaling boundaries. Do this wherever possible. The rest of
your team - and the systems you interact with - will thank you for it.
It also means that, by attacking the scaling boundary, you can have a large impact - when the time is
right.
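One way to make a boundary explicit, sketched below with hypothetical names (not from any real system): record the cluster size you have actually validated, and make crossing it a visible, deliberate event rather than an accident.

import logging

TESTED_MAX_NODES = 50  # hypothetical: the boundary we have actually validated

def check_cluster_size(num_nodes):
    # Behaviour past the tested boundary may change discontinuously,
    # so surface the fact loudly instead of drifting past it silently.
    if num_nodes > TESTED_MAX_NODES:
        logging.warning("Cluster has %d nodes, beyond the tested boundary of %d",
                        num_nodes, TESTED_MAX_NODES)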
Draw your borders before you drive off a cliff
For example:
§ Queries never return incorrect results
§ New nodes eventually join the cluster
§ Data is never read remotely
§ Some data gets written to disk on INSERT
§ Corrupt data is never written to disk
§ All queries complete
All system properties can be described as a combination of safety and liveness properties (see the sketch below).
§ Why?
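A rough way to see the distinction, with hypothetical event names: a safety property ("nothing bad ever happens") can be falsified by a finite trace the moment a bad event shows up, while a liveness property ("something good eventually happens") can never be falsified by a finite trace - the good event might still be coming.

# Illustrative sketch of safety vs. liveness over a finite trace of events.
# Event names are hypothetical.

def violates_safety(trace):
    # Safety: "corrupt data is never written to disk".
    # One bad event in a finite trace is enough to falsify it.
    return any(event == "wrote_corrupt_block" for event in trace)

def liveness_satisfied_so_far(trace):
    # Liveness: "all queries complete". A finite trace can show the good event
    # happened, but its absence never proves it won't happen later.
    return "query_completed" in trace

print(violates_safety(["insert", "wrote_corrupt_block"]))    # True: property falsified
print(liveness_satisfied_so_far(["insert", "flush"]))        # False: undecided, not violated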
Example: Impala’s query liveness and safety
§ It’s obviously better to always return complete results, but failures make that extremely hard.
§ If Impala had tried to enforce strong query safety from day 1, it would never have been a success:
achieving performance goals would have been much harder.
§ Instead, make fault tolerance trivial by weakening the definition (sketched below). By definition, such a
system scales better.
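As an illustration only - this is not Impala's actual code - the sketch below shows the kind of simplification you buy by weakening "every query completes despite node failure" to "a query may fail if a node fails": the coordinator just aborts instead of recovering mid-query.

# Illustrative only; hypothetical classes, not Impala's implementation.

class NodeLost(Exception):
    pass

class Node:
    """Hypothetical stand-in for a worker process."""
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive
    def is_alive(self):
        return self.alive
    def execute(self, fragment):
        return f"{self.name} ran {fragment}"

def run_query(fragments, nodes):
    results = []
    for fragment, node in zip(fragments, nodes):
        if not node.is_alive():
            # Fail fast: abort the whole query rather than recovering the fragment.
            raise NodeLost(f"lost {node.name}; query fails, client may retry")
        results.append(node.execute(fragment))
    return results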
Think global,
act local.
Coordination costs
§ Coordination: getting different processes to agree on some shared fact.
§ Coordination is incredibly costly in distributed systems and the cost increases with the number of
participants.
§ Metadata consistent on session level (sticky to one machine) -> no coordination required
§ Some users wanted cross-session metadata consistency, i.e. if I create a table, you can instantly see it.
§ Problem: symmetry of Impala’s architecture means every Impala daemon needs to see all updates
synchronously (see the sketch after this list).
§ Simple protocols
§ Highly scalable
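A sketch of why that requirement is expensive, using hypothetical classes rather than Impala's real metadata protocol: sticky, session-local metadata involves exactly one process, while synchronous cross-session consistency makes every DDL wait on all N daemons, so the cost of each update grows with the cluster.

# Hypothetical sketch contrasting coordination costs; not Impala's protocol.

class Daemon:
    """Stand-in for a remote daemon; apply_update stands in for a synchronous RPC."""
    def __init__(self):
        self.tables = set()
    def apply_update(self, table_name):
        self.tables.add(table_name)

class SessionLocalCatalog:
    """Sticky sessions: metadata only needs to be consistent on one machine,
    so CREATE TABLE touches exactly one process - no coordination."""
    def __init__(self):
        self.tables = set()
    def create_table(self, name):
        self.tables.add(name)

class BroadcastCatalog:
    """Cross-session consistency: every daemon must see the update before the
    DDL returns, so per-update cost grows with the number of participants."""
    def __init__(self, daemons):
        self.daemons = daemons
    def create_table(self, name):
        for daemon in self.daemons:
            daemon.apply_update(name)  # one synchronous round trip per daemon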
§ Two camps:
§ “your system doesn’t beat CAP, so I don’t care”
§ “I don’t care about CAP, it’s really unlikely I’ll lose that transaction”
§ Impossibility results - and there are a lot of them - tell us about a fundamental tension. But they
are completely silent on practicalities. Just because you can’t do something doesn’t mean you
shouldn’t try.
§ The best way to think about impossibility is to recognize the safety and liveness tension that a
result represents.