How To Scale A Distributed System
Henry Robinson
@henryr / [email protected]
What is this, and who’s it for?
§ Lessons learned from the trenches building distributed systems for 8+ years at Cloudera and in
open source communities.
§ Not:
§ A complete course in distributed systems theory (but boy do I have references for you)
§ Always specific to distributed systems
§ Complete
§ Signed off by experts
§ A panacea (sorry)
…and you are?
§ Distributed systems dilettante
§ Practices
§ Possibility
§ Papers
Today
§ Primitives - what are the concepts and nouns that it’s important to know?
§ Papers - you don’t have time to read everything? Join the club.
[spoiler: everyone argues about CAP, forever]
1. Primitives
Basic concepts
§ Processes may fail.
§ There is no particularly good way to tell that they have done so (see the sketch after this list).
§ The operational mode of the software we build has changed: availability is the sword by which web
properties live or die.
§ Adding more processing power is how we provide redundancy; i.e. we scale our systems up.
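A minimal sketch of why that first point bites: a timeout-based heartbeat detector (names here are illustrative, not from any particular system) can only ever mark a process as "suspected", because a missed heartbeat might mean a crash, a slow process, or a dropped packet.

import time

# Illustrative sketch: timeout-based heartbeat failure detection.
# It can only *suspect* failure - it cannot distinguish crashed from slow.
SUSPECT_AFTER_SECONDS = 5.0  # hypothetical threshold

class HeartbeatDetector:
    def __init__(self):
        self.last_heard = {}  # process id -> timestamp of last heartbeat

    def record_heartbeat(self, process_id):
        self.last_heard[process_id] = time.monotonic()

    def suspected(self, process_id):
        # "Suspected", not "failed": we can never be certain the process is dead.
        last = self.last_heard.get(process_id)
        return last is None or time.monotonic() - last > SUSPECT_AFTER_SECONDS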
Scalability axes
§ Cluster size
§ Number of tables
Just like security, include scalability in your thinking from day one.
Scalability behaviors are usually discontinuous - they exhibit phase changes rather than gradual
improvement. (20->50 nodes, not 20->22)
That means you can clearly identify scaling boundaries. Do this wherever possible. The rest of
your team - and the systems you interact with - will thank you for it.
It also means that, by attacking the scaling boundary, you can have a large impact - when the time is
right.
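One way to make a boundary explicit, sketched below with hypothetical names (not from any real system): record the cluster size you have actually validated, and make crossing it a visible, deliberate event rather than an accident.

import logging

TESTED_MAX_NODES = 50  # hypothetical: the boundary we have actually validated

def check_cluster_size(num_nodes):
    # Behaviour past the tested boundary may change discontinuously,
    # so surface the fact loudly instead of drifting past it silently.
    if num_nodes > TESTED_MAX_NODES:
        logging.warning("Cluster has %d nodes, beyond the tested boundary of %d",
                        num_nodes, TESTED_MAX_NODES)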
Draw your borders before you drive off a cliff
For example:
§ Queries never return incorrect results
§ New nodes eventually join the cluster
§ Data is never read remotely
§ Some data gets written to disk on INSERT
§ Corrupt data is never written to disk
§ All queries complete
All system properties can be described as a combination of safety and liveness properties (see the sketch below).
§ Why?
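A rough way to see the distinction, with hypothetical event names: a safety property ("nothing bad ever happens") can be falsified by a finite trace the moment a bad event shows up, while a liveness property ("something good eventually happens") can never be falsified by a finite trace - the good event might still be coming.

# Illustrative sketch of safety vs. liveness over a finite trace of events.
# Event names are hypothetical.

def violates_safety(trace):
    # Safety: "corrupt data is never written to disk".
    # One bad event in a finite trace is enough to falsify it.
    return any(event == "wrote_corrupt_block" for event in trace)

def liveness_satisfied_so_far(trace):
    # Liveness: "all queries complete". A finite trace can show the good event
    # happened, but its absence never proves it won't happen later.
    return "query_completed" in trace

print(violates_safety(["insert", "wrote_corrupt_block"]))    # True: property falsified
print(liveness_satisfied_so_far(["insert", "flush"]))        # False: undecided, not violated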
Example: Impala’s query liveness and safety
§ It’s obviously better to always return complete results, but failures make that extremely hard.
§ If Impala had tried to enforce strong query safety from day 1, it would never have been a success:
achieving performance goals would have been much harder.
§ Instead, make fault tolerance trivial by weakening the definition (sketched below). By definition, such a
system scales better.
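As an illustration only - this is not Impala's actual code - the sketch below shows the kind of simplification you buy by weakening "every query completes despite node failure" to "a query may fail if a node fails": the coordinator just aborts instead of recovering mid-query.

# Illustrative only; hypothetical classes, not Impala's implementation.

class NodeLost(Exception):
    pass

class Node:
    """Hypothetical stand-in for a worker process."""
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive
    def is_alive(self):
        return self.alive
    def execute(self, fragment):
        return f"{self.name} ran {fragment}"

def run_query(fragments, nodes):
    results = []
    for fragment, node in zip(fragments, nodes):
        if not node.is_alive():
            # Fail fast: abort the whole query rather than recovering the fragment.
            raise NodeLost(f"lost {node.name}; query fails, client may retry")
        results.append(node.execute(fragment))
    return results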
Think global,
act local.
Coordination costs
§ Coordination: getting different processes to agree on some shared fact.
§ Coordination is incredibly costly in distributed systems and the cost increases with the number of
participants.
§ Metadata consistent on session level (sticky to one machine) -> no coordination required
§ Some users wanted cross-session metadata consistency, i.e. if I create a table, you can instantly see it.
§ Problem: symmetry of Impala’s architecture means every Impala daemon needs to see all updates
synchronously (see the sketch after this list).
§ Simple protocols
§ Highly scalable
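A sketch of why that requirement is expensive, using hypothetical classes rather than Impala's real metadata protocol: sticky, session-local metadata involves exactly one process, while synchronous cross-session consistency makes every DDL wait on all N daemons, so the cost of each update grows with the cluster.

# Hypothetical sketch contrasting coordination costs; not Impala's protocol.

class Daemon:
    """Stand-in for a remote daemon; apply_update stands in for a synchronous RPC."""
    def __init__(self):
        self.tables = set()
    def apply_update(self, table_name):
        self.tables.add(table_name)

class SessionLocalCatalog:
    """Sticky sessions: metadata only needs to be consistent on one machine,
    so CREATE TABLE touches exactly one process - no coordination."""
    def __init__(self):
        self.tables = set()
    def create_table(self, name):
        self.tables.add(name)

class BroadcastCatalog:
    """Cross-session consistency: every daemon must see the update before the
    DDL returns, so per-update cost grows with the number of participants."""
    def __init__(self, daemons):
        self.daemons = daemons
    def create_table(self, name):
        for daemon in self.daemons:
            daemon.apply_update(name)  # one synchronous round trip per daemon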
§ Two camps:
§ “your system doesn’t beat CAP, so I don’t care”
§ “I don’t care about CAP, it’s really unlikely I’ll lose that transaction”
§ Impossibility results - and there are a lot of them - tell us about a fundamental tension. But they
are completely silent on practicalities. Just because you can’t do something doesn’t mean you
shouldn’t try.
§ The best way to think about impossibility is to recognize the safety and liveness tension that a
result represents.