Distributed systems - A Primer

MD Sayem Ahmed

Who am I
● A Bangladeshi currently living in Berlin, Germany
● Occasionally blogs at www.codesod.com
● Tweets at @say3mbd
● Can also be found on LinkedIn
● Can be reached via email at sayem64@gmail.com

Today’s Agenda
● What are Distributed Systems?
● Why Distributed Systems?
● Read Replication/Single-master Replication
● CAP Theorem
● Sharding/Partitioning/Multi-master Replication

Distributed Systems – A Definition
A distributed system is a collection of independent computers that
appears to its users as a single coherent system
Source: Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen

A Typical Web Application is a Distributed System

Key Characteristics of Distributed Systems
● Concurrency – all the computers operate at the same time

● Transparency – system is perceived as a whole

● Independent failure – the computers can fail independently

● No global clock

More and more people are using my book shop! But then…
It takes a long time to load the book review pages!

Book Shop is not scalable!
Needs to handle more users!

● Powerful Application Servers?
● More Application Servers?
● Powerful Database Server?

Measure, Measure, and Measure!

Focus on
● Database
● Memory
● CPU
● Network I/O
● Disk I/O

Performance Measurement – Burning Questions
● Are my database queries slow?

● Is my application’s CPU consumption high?

● Is my application running out of memory?

● Does my application have a memory leak?

● Is garbage collection being triggered too often?

● Is there something wrong with the Network I/O?

● Is there something funny going on with the Disk Usage? Are
reads/writes to disk taking too long?

● Are the third-party APIs taking too long to respond?

● … and so on

Tools that help to measure
● New Relic / AppDynamics / DataDog
● Metrics (from Dropwizard), Grafana
● Cloud provider tools (e.g., AWS CloudWatch)
● Custom Resource Monitoring Tools
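
Before reaching for a full monitoring product, a first measurement can be as small as a timer around the suspect code path. The sketch below is an illustration only, not something taken from the talk: the names `timed`, `timings`, `slowest`, and `load_book_reviews` are hypothetical, and `time.sleep` stands in for a slow database query.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process store: label -> list of elapsed seconds.
timings = defaultdict(list)

@contextmanager
def timed(label):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)

def load_book_reviews(book_id):
    # Stand-in for a slow query; replace with the real database call.
    with timed("db.load_book_reviews"):
        time.sleep(0.01)
        return [f"review for book {book_id}"]

def slowest(n=3):
    """Return the n labels with the highest average latency."""
    averages = {k: sum(v) / len(v) for k, v in timings.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    for book_id in range(5):
        load_book_reviews(book_id)
    for label, avg in slowest():
        print(f"{label}: {avg * 1000:.1f} ms on average")
```

Wrapping database calls, third-party API calls, and disk access with separate labels is usually enough to tell which of the questions above deserves attention first.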

Some simple scaling strategies
● Try to optimize database queries (more on this later)

● Try purchasing more powerful CPUs and more memory (Vertical
Scaling) for the application servers (only works for CPU- and
memory-bound applications)

● If the database is not the bottleneck, try adding more application
server instances (Horizontal Scaling)

● Try using a CDN to serve static content

Most of the time, it is the Database

Scaling a Single Database – some simple strategies
● Reduce the number of queries
● Use indexes
● Make sure your indexes are being used by the queries in production
● Make sure you are not creating too many indexes on write-heavy
tables
● Try purchasing powerful CPUs and more memory for the database
server (Vertical Scaling)
● … and many more (indexed views, denormalization, storing
pre-computed values for fast reads, etc.)
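
One way to check whether a query actually uses an index is to ask the database for its query plan. The sketch below uses SQLite only because it ships with Python; a production database has its own `EXPLAIN` variant with different output. The `reviews` table, the `idx_reviews_book_id` index, and the `query_plan` helper are hypothetical names made up for this illustration.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the 'detail' text

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO reviews (book_id, body) VALUES (?, ?)",
    [(i % 100, "review text") for i in range(1000)],
)

sql = "SELECT body FROM reviews WHERE book_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# After adding the index the planner can search it instead.
conn.execute("CREATE INDEX idx_reviews_book_id ON reviews (book_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, so the useful signal is whether the index name shows up in the plan at all.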

Scaling Database Reads through Read Replication
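
A toy model can make the idea concrete: all writes go to a single primary, reads are spread across replicas, and replicas catch up some time later. This is a sketch under simplifying assumptions, not how any particular database implements replication. The classes `Node` and `ReplicatedStore` are hypothetical, the data lives in memory, and replication lag is simulated by an explicit `replicate()` call.

```python
class Node:
    """One database instance, reduced to an in-memory key-value store."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class ReplicatedStore:
    """A single primary takes writes; replicas serve reads and catch up later."""
    def __init__(self, replica_count=2):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.pending = []   # replication log not yet applied to the replicas
        self._next = 0      # round-robin cursor for spreading reads

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))   # shipped to replicas later

    def read(self, key):
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.data.get(key)        # may be stale

    def replicate(self):
        """Apply the pending log to every replica (the lag has elapsed)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

store = ReplicatedStore()
store.write("book:1:rating", 4.5)
stale = store.read("book:1:rating")   # replicas have not caught up yet
store.replicate()
fresh = store.read("book:1:rating")   # now the write is visible
print(stale, fresh)
```

The read issued right after the write misses the new value, and the same read succeeds once replication has run. Adding replicas adds read capacity, but every write still passes through the one primary.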

Eventual Consistency
ACID
BASE

… at the cost of Availability

There will always be trade-offs

CAP Theorem
It is impossible for a distributed data store to simultaneously
provide more than two out of the following three guarantees:
– Consistency
– Availability
– Partition Tolerance
Source: https://en.wikipedia.org/wiki/CAP_theorem

CAP Theorem – Consistency
Every read receives the most recent write or an error

CAP Theorem – Availability
Every request receives a (non-error) response – without
guarantee that it contains the most recent write

CAP Theorem – Partition Tolerance
The system continues to operate despite an arbitrary number
of messages being dropped (or delayed) by the network
between nodes
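
The three definitions can be played out with one replica that loses its connection to the primary. The sketch below is a deliberately tiny model, assuming a hypothetical `Replica` class with a `mode` flag: in "CP" mode it returns an error rather than risk a stale answer, and in "AP" mode it answers with whatever it last saw.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a stale read."""

class Replica:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.value = None
        self.partitioned = False   # True = cut off from the primary

    def apply(self, value):
        """Replication message from the primary; lost during a partition."""
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency chosen: no answer is better than a possibly old one.
            raise Unavailable("cannot confirm the most recent write")
        # Availability chosen: answer with the last value this node saw.
        return self.value

def demo(mode):
    replica = Replica(mode)
    replica.apply("v1")
    replica.partitioned = True
    replica.apply("v2")            # dropped by the network
    try:
        return replica.read()
    except Unavailable:
        return "error"

print(demo("CP"), demo("AP"))
```

During the partition the CP replica gives up availability and the AP replica gives up consistency. Neither can offer both until the network heals.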

Read Replication – Advantages
● Can easily handle a vast number of concurrent reads
● Configuring redundancy is very easy

Read Replication – Problems
● Not ACID
● Consistency or Availability – choose one
● Increased operational complexity compared to a single database
instance

Sharding / Partitioning / Multi-master Replication

Read/Write Operation
User ID            IP
1 - 100000         192.168.197.17
100001 - 200000    192.168.197.18

User ID = 50000  → 192.168.197.17
User ID = 150000 → 192.168.197.18
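
The lookup performed against the routing table above can be sketched as a binary search over the upper bound of each range. The two ranges and IP addresses come from the table; the names `SHARD_UPPER_BOUNDS`, `SHARD_HOSTS`, and `shard_for` are hypothetical, and a real deployment would load this mapping from configuration or a cluster manager instead of hard-coding it.

```python
from bisect import bisect_left

# Upper bound of each user-ID range, paired with the shard that owns it.
SHARD_UPPER_BOUNDS = [100_000, 200_000]
SHARD_HOSTS = ["192.168.197.17", "192.168.197.18"]

def shard_for(user_id):
    """Return the host that stores the given user ID."""
    if user_id < 1 or user_id > SHARD_UPPER_BOUNDS[-1]:
        raise KeyError(f"no shard configured for user ID {user_id}")
    # bisect_left finds the first range whose upper bound is >= user_id.
    return SHARD_HOSTS[bisect_left(SHARD_UPPER_BOUNDS, user_id)]

print(shard_for(50_000))
print(shard_for(150_000))
```

Every read and write for a user has to pass through a lookup like this one, so each query needs to carry the shard key (here, the user ID).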

“I would like to calculate the total revenue
earned from Harry Potter and the Deathly
Hallows over a certain period”

… a.k.a. MapReduce
● Scatter/Gather is famously known as the MapReduce paradigm
● Popularized by a famous research paper from Google
● A popular implementation is part of the Apache Hadoop project
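
The revenue question cannot be answered by a single shard, because orders for the same book are spread across all of them. A minimal scatter/gather sketch follows, assuming hypothetical in-memory order rows keyed by shard address. The rows, amounts, dates, and the function names `shard_revenue` and `total_revenue` are invented for illustration; a real system would send a query to each shard over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical order rows per shard: (book title, order date, amount).
SHARDS = {
    "192.168.197.17": [
        ("Harry Potter and the Deathly Hallows", date(2017, 7, 1), 20),
        ("Some Other Book", date(2017, 7, 2), 15),
    ],
    "192.168.197.18": [
        ("Harry Potter and the Deathly Hallows", date(2017, 7, 3), 20),
        ("Harry Potter and the Deathly Hallows", date(2018, 1, 1), 20),
    ],
}

def shard_revenue(host, title, start, end):
    """Map step: each shard sums only its own matching rows."""
    return sum(
        amount
        for book, day, amount in SHARDS[host]
        if book == title and start <= day <= end
    )

def total_revenue(title, start, end):
    """Scatter the query to every shard in parallel, then gather and reduce."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda host: shard_revenue(host, title, start, end), SHARDS)
        )
    return sum(partials)

total = total_revenue(
    "Harry Potter and the Deathly Hallows", date(2017, 1, 1), date(2017, 12, 31)
)
print(total)
```

The query touches every shard regardless of how little data each one contributes, which is why a workload dominated by such queries gains little from sharding.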

Sharding – advantages
● Can easily scale reads and writes to the Moon

Sharding – problems
● Operationally complex
– Cluster Management
– All queries need to have the Shard Key

● Sharding an RDBMS is painful
– Referential integrity cannot be guaranteed anymore

● Each table must have the Shard Key

● Not suitable if most of the queries are Scatter/Gather

Next Topics
● Distributed Hash Tables / Consistently Hashed Data Stores
● Distributed Transactions
● A very brief introduction to Microservices

Additional Resources
● Distributed Systems in One Lesson by Tim Berglund
● Distributed Systems reading list by Tim Berglund
● Building Microservices by Sam Newman
● PostgreSQL documentation on High Availability
● MongoDB Replication Manual
● High Scalability
● Enterprise Integration Patterns
