Distributed Systems – A Primer
MD Sayem Ahmed
Who am I
● A Bangladeshi currently living in Berlin, Germany
● Occasionally blogs at www.codesod.com
● Tweets at @say3mbd
● Can also be found on LinkedIn
● Can be reached via email at sayem64@gmail.com
Today’s Agenda
● What are Distributed Systems?
● Why Distributed Systems?
● Read Replication/Single-master Replication
● CAP Theorem
● Sharding/Partitioning/Multi-master Replication
Distributed Systems – A Definition
A distributed system is a collection of independent computers that
appears to its users as a single coherent system
Source: Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen
A Typical Web Application is a Distributed System
Independent Failure
Key Characteristics of Distributed Systems
● Concurrency – all the computers operate at the same time
● Transparency – system is perceived as a whole
● Independent failure – the computers can fail independently
● No global clock
Improving Availability
Back to the Book Shop
More and more people are using my book shop! But then…
It takes a long time to load the book review pages!
Book Shop is not scalable!
Needs to handle more users!
What should I scale?
● Powerful Application Servers?
● More Application Servers?
● Powerful Database Server?
Measure, Measure, and Measure!
Focus on
● Database
● Memory
● CPU
● Network I/O
● Disk I/O
Performance Measurement – Burning Questions
● Are my database queries slow?
● Is my application’s CPU consumption high?
● Is my application running out of memory?
● Does my application have a memory leak?
● Is garbage collection being triggered too often?
● Is there something wrong with network I/O?
● Is there something unusual going on with disk usage? Are reads from and
writes to disk taking too long?
● Are the third-party APIs taking too long to respond?
● … and so on
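A low-tech way to start answering the first of these questions is to time the suspect call directly before reaching for a full monitoring product. The sketch below is only an illustration: `timed`, `timings`, and `slow_query` are made-up names, and `time.sleep` stands in for a real database call.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store: label -> list of durations in seconds.
timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)

def slow_query():
    # Stand-in for a real database call (an assumption for this example).
    time.sleep(0.05)
    return ["review 1", "review 2"]

with timed("load_reviews"):
    reviews = slow_query()

for label, samples in timings.items():
    print(f"{label}: {len(samples)} call(s), slowest {max(samples) * 1000:.1f} ms")
```

In a real application the same numbers would normally be shipped to one of the monitoring tools rather than printed.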
Tools that help to measure
● New Relic / AppDynamics / DataDog
● Metrics (from Dropwizard), Grafana
● Cloud provider tools (e.g., AWS CloudWatch)
● Custom Resource Monitoring Tools
Some simple scaling strategies
● Try to optimize database queries (more on this later)
● Try purchasing more powerful CPUs and more memory (Vertical
Scaling) for the application servers (only works for CPU- and
memory-bound applications)
● If the database is not the bottleneck, try adding more application server
instances (Horizontal Scaling)
● Try using a CDN to serve static content
Most of the time, it is the Database
Scaling a Single Database – some simple strategies
● Reduce the number of queries
● Use indexes
● Make sure your indexes are being used by the queries in production
● Make sure you are not creating too many indexes on write-heavy
tables
● Try purchasing more powerful CPUs and more memory for the database
server (Vertical Scaling)
● … and many more (indexed views, denormalization, storing precomputed
values for fast reads, etc.)
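Checking that a query actually uses an index can be done by asking the database for its query plan. The sketch below uses SQLite from the Python standard library purely as a stand-in; the `reviews` table and `idx_reviews_book_id` index are hypothetical, and a production database has its own `EXPLAIN` syntax and output format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO reviews (book_id, body) VALUES (?, ?)",
    [(i % 100, f"review {i}") for i in range(1000)],
)

def plan(sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

query = "SELECT body FROM reviews WHERE book_id = ?"

before = plan(query, (42,))   # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_reviews_book_id ON reviews (book_id)")
after = plan(query, (42,))    # the plan should now mention the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan differs between SQLite versions, so the useful signal is whether the index name appears in it at all.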
Scaling Database Reads through Read Replication
Read Operation
Write Operation
Inconsistent Read
ACID
Eventual Consistency
BASE
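The inconsistent read and its eventual resolution can be reproduced with a toy model. This is a minimal sketch assuming asynchronous replication that runs as an explicit step; `Primary`, `Replica`, and `replicate` are illustrative names and do not correspond to any particular database.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # writes accepted but not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replicas):
    """Ship all pending writes to every replica, in order, then clear the log."""
    for key, value in primary.log:
        for replica in replicas:
            replica.data[key] = value
    primary.log.clear()


primary, replica = Primary(), Replica()

primary.write("rating:hp7", 4)
replicate(primary, [replica])

primary.write("rating:hp7", 5)          # accepted by the primary only
stale = replica.read("rating:hp7")      # replication has not run yet
replicate(primary, [replica])
fresh = replica.read("rating:hp7")      # replica has caught up

print(stale, fresh)
```

Between the second write and the next replication run, the replica still returns the old rating. Once replication catches up the replica converges, which is the "eventual" in eventual consistency.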
Strong Consistency
… at the cost of Availability
There will always be trade-offs
CAP Theorem
It is impossible for a distributed data store to simultaneously
provide more than two out of the following three guarantees:
– Consistency
– Availability
– Partition Tolerance
Source: https://en.wikipedia.org/wiki/CAP_theorem
CAP Theorem – Consistency
Every read receives the most recent write or an error
Source: https://en.wikipedia.org/wiki/CAP_theorem
CAP Theorem – Availability
Every request receives a (non-error) response, without a
guarantee that it contains the most recent write
Source: https://en.wikipedia.org/wiki/CAP_theorem
CAP Theorem – Partition Tolerance
The system continues to operate despite an arbitrary number
of messages being dropped (or delayed) by the network
between nodes
Source: https://en.wikipedia.org/wiki/CAP_theorem
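To make the trade-off concrete, here is a toy two-node register that can be switched between favoring consistency ("CP") and favoring availability ("AP") once the link between the nodes is cut. It is a deliberately simplified sketch: `ReplicatedRegister`, `Unavailable`, and `demo` are invented for this example, and real systems make this choice with far more nuance.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a wrong answer."""


class ReplicatedRegister:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = None
        self.replica = None
        self.partitioned = False  # True = primary and replica cannot talk

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("replica unreachable, write rejected")
        self.primary = value
        if not self.partitioned:
            self.replica = value  # synchronous copy while the link is up

    def read_from_replica(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("replica cannot confirm it has the latest write")
        return self.replica


def demo(mode):
    reg = ReplicatedRegister(mode)
    reg.write("v1")
    reg.partitioned = True  # the network between the nodes fails
    try:
        reg.write("v2")
        write_result = "accepted"
    except Unavailable:
        write_result = "rejected"
    try:
        read_result = reg.read_from_replica()
    except Unavailable:
        read_result = "error"
    return write_result, read_result


print("CP:", demo("CP"))
print("AP:", demo("AP"))
```

In CP mode the register answers with errors during the partition. In AP mode it keeps answering, but the replica serves the old value `v1` even though `v2` was accepted by the primary.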
Read Replication – Advantages
● Can handle a large volume of concurrent reads
● Redundancy is easy to configure
Read Replication – Problems
● Not ACID: a read from a replica can return stale data
● Consistency or Availability – choose one
● Increased operational complexity compared to a single database
instance
How do I scale Writes?
Sharding / Partitioning / Multi-master Replication
Read/Write Operation
User ID            IP
1 - 100000         192.168.197.17
100001 - 200000    192.168.197.18
User ID = 50000 is routed to 192.168.197.17
User ID = 150000 is routed to 192.168.197.18
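The lookup on the slide is range-based routing: find the shard whose User ID range contains the key. A minimal sketch, using the ranges and IP addresses from the slide; the names `SHARDS`, `STARTS`, and `shard_for` are made up for this example.

```python
import bisect

# (first_id, last_id, host), sorted by first_id. Values come from the slide.
SHARDS = [
    (1, 100000, "192.168.197.17"),
    (100001, 200000, "192.168.197.18"),
]
STARTS = [first for first, _, _ in SHARDS]


def shard_for(user_id):
    """Return the host of the shard whose ID range contains `user_id`."""
    # Index of the last shard whose first_id is <= user_id.
    i = bisect.bisect_right(STARTS, user_id) - 1
    if i < 0:
        raise KeyError(f"no shard for user ID {user_id}")
    first, last, host = SHARDS[i]
    if user_id > last:
        raise KeyError(f"no shard for user ID {user_id}")
    return host


print(shard_for(50000))
print(shard_for(150000))
```

Every read and write has to go through a lookup like this, which is why queries need to carry the shard key.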
“I would like to calculate the total revenue
earned from Harry Potter and the Deathly
Hallows over a certain period”
Scatter
Compute
Gather
… a.k.a. MapReduce
● Scatter/Gather is better known as the MapReduce paradigm
● Popularized by Google's 2004 research paper "MapReduce: Simplified Data
Processing on Large Clusters"
● A popular implementation is part of the Apache Hadoop project
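The three steps can be sketched for the revenue question in a few lines. This is only an illustration: the shards are in-memory lists standing in for separate database servers, the order data is invented, and `revenue_on_shard` and `total_revenue` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

TITLE = "Harry Potter and the Deathly Hallows"

# Invented sample data: each shard holds (title, amount_in_cents, day) orders.
SHARDS = {
    "shard-1": [(TITLE, 1500, 3), ("Dune", 1200, 4)],
    "shard-2": [(TITLE, 1500, 9), (TITLE, 1000, 40)],
}


def revenue_on_shard(orders, title, first_day, last_day):
    """Compute: partial revenue for one title on a single shard."""
    return sum(
        amount
        for t, amount, day in orders
        if t == title and first_day <= day <= last_day
    )


def total_revenue(title, first_day, last_day):
    with ThreadPoolExecutor() as pool:
        # Scatter: send the same question to every shard.
        futures = [
            pool.submit(revenue_on_shard, orders, title, first_day, last_day)
            for orders in SHARDS.values()
        ]
        # Gather: combine the partial results into one answer.
        return sum(f.result() for f in futures)


total = total_revenue(TITLE, 1, 30)
print(total)
```

Because the query has no shard key, every shard has to take part, so a workload made mostly of such queries gains little from sharding.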
Sharding – advantages
Can easily scale reads and writes to the Moon
Sharding – problems
● Operationally complex
– Cluster Management
– All queries need to have the Shard Key
● Sharding an RDBMS is painful
– Referential integrity across shards can no longer be enforced by the database
● Each table must have the Shard Key
● Not suitable if most of the queries are Scatter/Gather
Next Topics
● Distributed Hash Tables / Consistently Hashed Data Stores
● Distributed Transactions
● A very brief introduction to Microservices
Additional Resources
● Distributed Systems in One Lesson by Tim Berglund
● Distributed Systems reading list by Tim Berglund
● Building Microservices by Sam Newman
● PostgreSQL documentation on High Availability
● MongoDB Replication Manual
● High Scalability
● Enterprise Integration Patterns
Thank you!
Questions?
Ad

More Related Content

Similar to Distributed systems - A Primer (20)

Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
IDERA Software
 
Capacityplanning
Capacityplanning Capacityplanning
Capacityplanning
Paulo Fagundes
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
MongoDB
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
ScyllaDB
 
Magento Imagine 2015 - Aspirin For Your MySQL Headaches
Magento Imagine 2015 - Aspirin For Your MySQL HeadachesMagento Imagine 2015 - Aspirin For Your MySQL Headaches
Magento Imagine 2015 - Aspirin For Your MySQL Headaches
Nexcess.net LLC
 
071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen
Steve Feldman
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
Paris Open Source Summit
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Spark Summit
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
elliando dias
 
Taking Splunk to the Next Level – Architecture
Taking Splunk to the Next Level – ArchitectureTaking Splunk to the Next Level – Architecture
Taking Splunk to the Next Level – Architecture
Splunk
 
Taking Splunk to the Next Level - Technical
Taking Splunk to the Next Level - TechnicalTaking Splunk to the Next Level - Technical
Taking Splunk to the Next Level - Technical
Splunk
 
Microservices. Mastering Chaos
Microservices. Mastering ChaosMicroservices. Mastering Chaos
Microservices. Mastering Chaos
UP2IT
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
Marko Mitranić
 
Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...
Sergey Platonov
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Taking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - ArchitectureTaking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - Architecture
Splunk
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 Slides
Alex Cruise
 
VMworld 2013: Practical Real World Reporting with vCenter Operations
VMworld 2013: Practical Real World Reporting with vCenter OperationsVMworld 2013: Practical Real World Reporting with vCenter Operations
VMworld 2013: Practical Real World Reporting with vCenter Operations
VMworld
 
Stopping Storage Hardware Sprawl
Stopping Storage Hardware SprawlStopping Storage Hardware Sprawl
Stopping Storage Hardware Sprawl
Storage Switzerland
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
IDERA Software
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
MongoDB
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
ScyllaDB
 
Magento Imagine 2015 - Aspirin For Your MySQL Headaches
Magento Imagine 2015 - Aspirin For Your MySQL HeadachesMagento Imagine 2015 - Aspirin For Your MySQL Headaches
Magento Imagine 2015 - Aspirin For Your MySQL Headaches
Nexcess.net LLC
 
071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen
Steve Feldman
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
Paris Open Source Summit
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Spark Summit
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
elliando dias
 
Taking Splunk to the Next Level – Architecture
Taking Splunk to the Next Level – ArchitectureTaking Splunk to the Next Level – Architecture
Taking Splunk to the Next Level – Architecture
Splunk
 
Taking Splunk to the Next Level - Technical
Taking Splunk to the Next Level - TechnicalTaking Splunk to the Next Level - Technical
Taking Splunk to the Next Level - Technical
Splunk
 
Microservices. Mastering Chaos
Microservices. Mastering ChaosMicroservices. Mastering Chaos
Microservices. Mastering Chaos
UP2IT
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
Marko Mitranić
 
Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...Dori Exterman, Considerations for choosing the parallel computing strategy th...
Dori Exterman, Considerations for choosing the parallel computing strategy th...
Sergey Platonov
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Taking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - ArchitectureTaking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - Architecture
Splunk
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 Slides
Alex Cruise
 
VMworld 2013: Practical Real World Reporting with vCenter Operations
VMworld 2013: Practical Real World Reporting with vCenter OperationsVMworld 2013: Practical Real World Reporting with vCenter Operations
VMworld 2013: Practical Real World Reporting with vCenter Operations
VMworld
 
Stopping Storage Hardware Sprawl
Stopping Storage Hardware SprawlStopping Storage Hardware Sprawl
Stopping Storage Hardware Sprawl
Storage Switzerland
 

More from MD Sayem Ahmed (6)

Factory Method Pattern
Factory Method PatternFactory Method Pattern
Factory Method Pattern
MD Sayem Ahmed
 
An Introduction to Maven Part 1
An Introduction to Maven Part 1An Introduction to Maven Part 1
An Introduction to Maven Part 1
MD Sayem Ahmed
 
A brief overview of java frameworks
A brief overview of java frameworksA brief overview of java frameworks
A brief overview of java frameworks
MD Sayem Ahmed
 
Restful web services
Restful web servicesRestful web services
Restful web services
MD Sayem Ahmed
 
An introduction to javascript
An introduction to javascriptAn introduction to javascript
An introduction to javascript
MD Sayem Ahmed
 
01. design pattern
01. design pattern01. design pattern
01. design pattern
MD Sayem Ahmed
 
Factory Method Pattern
Factory Method PatternFactory Method Pattern
Factory Method Pattern
MD Sayem Ahmed
 
An Introduction to Maven Part 1
An Introduction to Maven Part 1An Introduction to Maven Part 1
An Introduction to Maven Part 1
MD Sayem Ahmed
 
A brief overview of java frameworks
A brief overview of java frameworksA brief overview of java frameworks
A brief overview of java frameworks
MD Sayem Ahmed
 
An introduction to javascript
An introduction to javascriptAn introduction to javascript
An introduction to javascript
MD Sayem Ahmed
 
Ad

Recently uploaded (20)

Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen

Distributed systems - A Primer

  • 1. Distributed Systems – A Primer MD Sayem Ahmed
  • 2. Who am I ● A Bangladeshi currently living in Berlin, Germany ● Occasionally blogs at www.codesod.com ● Tweets at @say3mbd ● Can also be found on LinkedIn ● Can be reached via email at sayem64@gmail.com
  • 3. Today’s Agenda ● What are Distributed Systems? ● Why Distributed Systems? ● Read Replication/Single-master Replication ● CAP Theorem ● Sharding/Partitioning/Multi-master Replication
  • 4. Distributed Systems – A Definition: "A distributed system is a collection of independent computers that appears to its users as a single coherent system" (Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen)
  • 5. A Typical Web Application is a Distributed System
  • 10. Key Characteristics of Distributed Systems ● Concurrency – all the computers operate at the same time ● Transparency – system is perceived as a whole ● Independent failure – the computers can fail independently ● No global clock
  • 12. Back to the Book Shop
  • 13. More and more people are using my book shop! But then…. It takes a long time to load the book review pages!
  • 14. Book Shop is not scalable! Needs to handle more users!
  • 15. What should I scale?
  • 20. Focus on ● Database ● Memory ● CPU ● Network I/O ● Disk I/O
  • 29. Performance Measurement – Burning Questions ● Are my database queries slow? ● Is my application’s CPU consumption high? ● Is my application running out of memory? ● Does my application have a memory leak? ● Is garbage collection being triggered too often? ● Is there something wrong with the Network I/O? ● Is there something funny going on with the Disk Usage? Are reading/writing to disks taking too long? ● Are the third-party APIs taking too long to respond? ● … and so on
  • 30. Tools that help to measure ● New Relic / AppDynamics / DataDog ● Metrics (from Dropwizard), Grafana ● Cloud provider tools (e.g., AWS CloudWatch) ● Custom Resource Monitoring Tools
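Before reaching for a full monitoring product, the first burning question ("Are my database queries slow?") can be roughly answered with a few lines of in-process timing. The following is a minimal sketch, not a replacement for the tools listed above; the names `timed`, `summarize`, `timings`, and `load_book_reviews` are hypothetical, and the sleep stands in for a real query.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process metric store: operation name -> durations in seconds.
timings = defaultdict(list)


@contextmanager
def timed(name):
    """Record how long the wrapped block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


def summarize(name):
    """Return (call count, average seconds, worst seconds) for one operation."""
    samples = timings[name]
    return len(samples), sum(samples) / len(samples), max(samples)


def load_book_reviews():
    """Stand-in for a slow database query (assumption: about 10 ms)."""
    time.sleep(0.01)
    return ["Great read", "Too long"]


for _ in range(3):
    with timed("load_book_reviews"):
        load_book_reviews()

count, avg, worst = summarize("load_book_reviews")
print(f"{count} calls, avg {avg * 1000:.1f} ms, worst {worst * 1000:.1f} ms")
```

In a real application the recorded durations would be shipped to one of the tools above rather than kept in a dictionary.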
  • 34. Some simple scaling strategies ● Try to optimize database queries (more on this later) ● Try purchasing more powerful CPUs and more memory (Vertical Scaling) for the application servers (only works for CPU- and memory-bound applications) ● If the database is not a bottleneck, try adding more application server instances (Horizontal Scaling) ● Try using a CDN to serve static content
  • 35. Most of the time, it is the Database
  • 41. Scaling a Single Database – some simple strategies ● Reduce the number of queries ● Use indexes ● Make sure your indexes are being used by the queries in production ● Make sure you are not creating too many indexes on write-heavy tables ● Try purchasing more powerful CPUs and more memory for the database server (Vertical Scaling) ● … and many more (indexed views, denormalization, storing pre-computed values for fast reads, etc.)
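One way to check that a query actually uses an index is to ask the database for its query plan. The sketch below uses SQLite through Python's standard `sqlite3` module purely for illustration; the `reviews` table, the `idx_reviews_book_id` index, and the `query_plan` helper are made-up names, and other databases expose the same idea through their own `EXPLAIN` variants.

```python
import sqlite3

# In-memory database with a hypothetical book review table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO reviews (book_id, body) VALUES (?, ?)",
    [(i % 100, f"review {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan for a query; the last column holds the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT body FROM reviews WHERE book_id = ?"

plan_before = query_plan(query, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_reviews_book_id ON reviews (book_id)")
plan_after = query_plan(query, (42,))   # the plan should now mention the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, so it is safer to look for the index name in the output than to match the full text.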
  • 43. Scaling Database Reads through Read Replication
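The idea behind single-master read replication can be sketched with plain dictionaries: all writes go to one primary, reads are spread across replicas, and replicas catch up asynchronously. This is a toy model under stated assumptions, not how any particular database implements replication; `ReplicatedStore` and its methods are hypothetical, and the explicit `replicate()` call stands in for the replication lag a real system would have.

```python
import itertools


class ReplicatedStore:
    """Toy single-master store: one primary for writes, N replicas for reads."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet applied to the replicas
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        """Writes always go to the primary and are queued for replication."""
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key):
        """Reads are served round-robin by the replicas, never by the primary."""
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def replicate(self):
        """Apply queued writes to every replica (simulates asynchronous catch-up)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = ReplicatedStore(replica_count=2)
store.write("book:1:rating", 4.5)
stale = store.read("book:1:rating")  # replica has not caught up yet
store.replicate()
fresh = store.read("book:1:rating")  # replica now has the write
print(stale, fresh)
```

The stale read before `replicate()` is the price of scaling reads this way: a replica can briefly lag behind the primary.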
  • 71. … at the cost of Availability
  • 72. There will always be Trade-offs
  • 76. CAP Theorem: It is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees: Consistency, Availability, Partition Tolerance (Source: https://en.wikipedia.org/wiki/CAP_theorem)
  • 77. CAP Theorem – Consistency: Every read receives the most recent write or an error (Source: https://en.wikipedia.org/wiki/CAP_theorem)
  • 78. CAP Theorem – Availability: Every request receives a (non-error) response, without guarantee that it contains the most recent write (Source: https://en.wikipedia.org/wiki/CAP_theorem)
  • 79. CAP Theorem – Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes (Source: https://en.wikipedia.org/wiki/CAP_theorem)
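The trade-off in these definitions can be illustrated with a toy replica that is cut off from the primary while a new write arrives. This is a simplified sketch, not a model of a real data store: the `Replica` class, the `Unavailable` exception, and the `"CP"`/`"AP"` mode labels are assumptions made for illustration.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale read."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" favours consistency, "AP" availability
        self.value = "v1"         # last value received from the primary
        self.partitioned = False  # True when cut off from the primary

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the most recent write: return an error.
            raise Unavailable("cannot reach primary")
        # Otherwise answer with whatever we have, possibly stale.
        return self.value


def simulate(mode):
    """Partition the replica, write 'v2' on the primary, then read the replica."""
    replica = Replica(mode)
    replica.partitioned = True
    latest_on_primary = "v2"  # this write never reaches the replica
    try:
        return replica.read(), latest_on_primary
    except Unavailable:
        return "error", latest_on_primary


print("CP:", simulate("CP"))
print("AP:", simulate("AP"))
```

During the partition the consistent replica gives up availability by returning an error, while the available replica answers with the old value and gives up consistency.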
  • 81. Read Replication - Advantages ● Can easily handle a vast number of concurrent reads ● Configuring redundancy is very easy
  • 84. Read Replication - Problems ● Not ACID ● Consistency or Availability – choose one ● Increased operational complexity compared to a single database instance
  • 85. How do I scale Writes?
  • 86. Sharding / Partitioning / Multi-master Replication
  • 87. Read/Write Operation – shard lookup table: User IDs 1 - 100000 → 192.168.197.17; User IDs 100001 - 200000 → 192.168.197.18
  • 88. Read/Write Operation – User ID = 50000 (falls in the range 1 - 100000)
  • 90. Read/Write Operation – User ID = 150000 (falls in the range 100001 - 200000)
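The lookup table on these slides amounts to range-based routing on the shard key. A minimal sketch of that lookup follows, using the ID ranges and addresses from the slides; the `shard_for` function and the `SHARDS` layout are hypothetical, and a production router would also have to handle resharding and failover.

```python
import bisect

# (inclusive upper bound of the user ID range, shard address), sorted by bound.
SHARDS = [
    (100000, "192.168.197.17"),  # user IDs 1 - 100000
    (200000, "192.168.197.18"),  # user IDs 100001 - 200000
]
UPPER_BOUNDS = [upper for upper, _ in SHARDS]


def shard_for(user_id):
    """Return the address of the shard that owns the given user ID."""
    if user_id < 1 or user_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"user ID {user_id} is outside every shard range")
    # Index of the first range whose upper bound is >= user_id.
    index = bisect.bisect_left(UPPER_BOUNDS, user_id)
    return SHARDS[index][1]


print(shard_for(50000))
print(shard_for(150000))
```

Every read or write for a user first resolves the shard this way, which is why each query needs to carry the shard key.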
  • 92. “I would like to calculate the total revenue earned from Harry Potter and the Deathly Hallows over a certain period”
  • 96. … Aka MapReduce ● Scatter/Gather is famously known as the MapReduce paradigm ● Popularized by a famous research paper from Google ● A popular implementation is part of the Apache Hadoop project
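The revenue question cannot be answered by a single shard, so the query is sent to every shard (scatter) and the partial results are combined (gather). The sketch below shows only that shape with in-memory lists standing in for shards; the order records, amounts, and function names are made up for illustration, and a real MapReduce framework adds distribution, shuffling, and fault tolerance on top.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date

TITLE = "Harry Potter and the Deathly Hallows"

# Hypothetical order data, one list per shard.
SHARD_ORDERS = [
    [
        {"title": TITLE, "day": date(2024, 1, 5), "amount": 20.0},
        {"title": "Some Other Book", "day": date(2024, 1, 6), "amount": 15.0},
        {"title": TITLE, "day": date(2024, 3, 1), "amount": 20.0},
    ],
    [
        {"title": TITLE, "day": date(2024, 1, 20), "amount": 25.0},
    ],
]


def shard_revenue(orders, title, start, end):
    """Runs on one shard: sum the matching orders it holds (the 'map' step)."""
    return sum(
        order["amount"]
        for order in orders
        if order["title"] == title and start <= order["day"] <= end
    )


def total_revenue(title, start, end):
    """Scatter the query to all shards in parallel, then gather and sum."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda orders: shard_revenue(orders, title, start, end),
            SHARD_ORDERS,
        )
        return sum(partials)  # the 'reduce' step


january_total = total_revenue(TITLE, date(2024, 1, 1), date(2024, 1, 31))
print(january_total)
```

Every shard has to do work for such a query, which is why a workload dominated by scatter/gather queries benefits little from sharding.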
  • 97. Sharding – advantages Can easily scale read/write to the Moon
  • 101. Sharding – problems ● Operationally complex – Cluster Management – All queries need to have the Shard Key ● Sharding an RDBMS is painful – Referential integrity cannot be guaranteed anymore ● Each table must have the Shard Key ● Not suitable if most of the queries are Scatter/Gather
  • 102. Next Topics ● Distributed Hash Tables / Consistently Hashed Data Stores ● Distributed Transactions ● A very brief introduction to Microservices
  • 103. Additional Resources ● Distributed Systems in One Lesson by Tim Berglund ● Distributed Systems reading list by Tim Berglund ● Building Microservices by Sam Newman ● PostgreSQL documentation on High Availability ● MongoDB Replication Manual ● High Scalability ● Enterprise Integration Patterns