Scalable Web Architectures: Common Patterns & Approaches. Cal Henderson
Hello
Scalable Web Architectures? What does scalable mean? What’s an architecture?
Scalability – myths and lies: What is scalability?
Scalability – myths and lies: What is scalability not? Raw speed / performance; HA / BCP; technology X; protocol Y
Scalability – myths and lies: So what is scalability? Traffic growth; dataset growth; maintainability
Scalability: Two kinds: vertical (get bigger) and horizontal (get more)
Big Irons: Sunfire E20k ($450,000 - $2,500,000; 36x 1.8GHz processors) vs. PowerEdge SC1425 (under $1,500; 2.8 GHz processor)
Cost vs Cost
Cost vs Cost: But sometimes vertical scaling is right. Buying a bigger box is quick (ish); redesigning software is not. Running out of MySQL performance? Spend months on data federation, or just buy a ton more RAM.
Cost vs Cost: But let’s talk horizontal, else this is going to be boring.
Architectures then? The way the bits fit together; what grows where; the trade-offs between good/fast/cheap
LAMP: We’re talking about LAMP: Linux, Apache (or lighttpd), MySQL (or Postgres), PHP (or Perl, Python, Ruby). All open source; all well supported; all used in large operations.
Simple web apps: A Web Application (or “Web Site” in Web 1.0 terminology). Diagram labels: Interwebnet, App server, Database
App servers: App servers scale in two ways: really well, or quite badly
App servers: Sessions! (State.) Local sessions == bad; when they move == quite bad; central sessions == good; no sessions at all == awesome!
Local sessions: Stored on disk (PHP sessions) or stored in memory (shared memory block). Bad! Can’t move users; can’t avoid hotspots.
Mobile local sessions: Custom built. Store last session location in a cookie; if we hit a different server, pull our session information across. If your load balancer has sticky sessions, you can still get hotspots. Depends on volume: fewer, heavier users hurt more.
Remote centralized sessions: Store in a central database, or an in-memory cache. No porting around of session data; no need for sticky sessions; no hot spots. Need to be able to scale the data store, but we’ve pushed the issue down the stack.
No sessions: Stash it all in a cookie! Sign it for safety:
    $data = $user_id . '-' . $user_name;
    $time = time();
    $sig = sha1($secret . $time . $data);
    $cookie = base64_encode("$sig-$time-$data");
Timestamp means it’s simple to expire it.
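A minimal Python sketch of the signed-cookie idea, including the verification step the slide leaves out. It swaps the slide's sha1($secret . ...) for HMAC-SHA256, and the names SECRET, make_cookie and read_cookie are made up for illustration. It assumes the user id contains no '-' character.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the client


def make_cookie(user_id, user_name, now=None):
    """Pack the user data plus a timestamp and sign the lot."""
    ts = str(int(time.time() if now is None else now))
    data = f"{user_id}-{user_name}"
    sig = hmac.new(SECRET, f"{ts}-{data}".encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{sig}-{ts}-{data}".encode()).decode()


def read_cookie(cookie, max_age=3600, now=None):
    """Return (user_id, user_name), or None if tampered with or expired."""
    try:
        raw = base64.urlsafe_b64decode(cookie.encode()).decode()
        sig, ts, user_id, user_name = raw.split("-", 3)
        age = (time.time() if now is None else now) - int(ts)
    except ValueError:  # bad base64, bad UTF-8, missing fields, bad timestamp
        return None
    msg = f"{ts}-{user_id}-{user_name}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    if age > max_age:  # the timestamp is what makes expiry simple
        return None
    return user_id, user_name
```

Any app server holding the secret can check the cookie, so no session store is needed.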
Super slim sessions: If you need more than the cookie (login status, user id, username), then pull their account row from the DB, or from the account cache. None of the drawbacks of sessions; avoids the overhead of a query per page. Great for high-volume pages which need little personalization. Turns out you can stick quite a lot in a cookie too: pack with base64 and it’s easy to delimit fields.
App servers: The Rasmus way. App server has ‘shared nothing’; responsibility pushed down the stack. Ooh, the stack.
Trifle: Sponge / Database; Jelly / Business Logic; Custard / Page Logic; Cream / Markup; Fruit / Presentation
App servers
Well, that was easy: Scaling the web app server part is easy. The rest is the trickier part: database, serving static content, storing static content.
The others: Other services scale similarly to web apps, that is, horizontally. The canonical examples: image conversion, audio transcoding, video transcoding, web crawling.
Parallelizable == easy! If we can transcode/crawl in parallel, it’s easy. But think about queuing, and asynchronous systems: the web ain’t built for slow things. But still, a simple problem.
Asynchronous systems
Asynchronous systems: Helps with peak periods
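The queuing idea can be sketched as below: the web request only enqueues a job and returns, and a pool of workers drains the queue in parallel. This is an in-process illustration using Python threads; a real deployment would presumably use a separate queue service and worker machines. The names transcode and run_workers are invented for the example.

```python
import queue
import threading


def transcode(job):
    """Stand-in for slow work such as image or video conversion."""
    return job.upper()


def run_workers(jobs, num_workers=4):
    """Process queued jobs in parallel and return {job: result}."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:           # sentinel: no more work for this worker
                pending.task_done()
                return
            out = transcode(job)
            with lock:
                results[job] = out
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:                  # the "web request" side: enqueue and move on
        pending.put(job)
    for _ in threads:
        pending.put(None)
    pending.join()                    # wait until every queued item is handled
    for t in threads:
        t.join()
    return results
```

Because jobs wait in the queue, a burst of uploads at peak time slows the backlog rather than the page response.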
The big three: Let’s talk about the big three then: databases, serving lots of static content, storing lots of static content.
Databases: Unless we’re doing a lot of file serving, the database is the toughest part to scale. If we can, best to avoid the issue altogether and just buy bigger hardware. Dual Opteron/Intel64 systems with 16GB of RAM can get you a long way.
More read power: Web apps typically have a read/write ratio of somewhere between 80/20 and 90/10. If we can scale read capacity, we can solve a lot of situations. MySQL replication!
Master-Slave Replication: The master takes reads and writes; the slaves take reads only.
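A rough sketch of how application code might split traffic under master-slave replication: writes go to the master, reads rotate across the slaves. The class name ReplicatedDB and the host strings are illustrative only, and the statement check is deliberately naive.

```python
import itertools


class ReplicatedDB:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def pick(self, sql):
        """Return the host that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in ("SELECT", "SHOW"):
            return next(self._slaves)
        return self.master  # INSERT, UPDATE, DELETE, DDL, ...
```

Replication is asynchronous, so a read that must see a write made a moment ago is usually sent to the master instead.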
Caching: Caching avoids needing to scale! Or makes it cheaper. Simple stuff: mod_perl / shared memory (dumb); MySQL query cache (dumbish).
Caching: Getting more complicated: write-through cache, write-back cache, sideline cache.
Write-through cache
Write-back cache
Sideline cache
Sideline cache: Easy to implement: just add app logic. Need to manually invalidate the cache; well designed code makes it easy. Memcached, from Danga (LiveJournal): http://www.danga.com/memcached/
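The sideline pattern in app logic might look like the sketch below. Plain dicts stand in for memcached and the database, and the class and method names are invented for the example.

```python
class SidelineCache:
    """Check the cache first, fall back to the database, invalidate on write."""

    def __init__(self, cache, db):
        self.cache = cache   # dict standing in for memcached
        self.db = db         # dict standing in for the database
        self.db_reads = 0    # counts how often we had to hit the database

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.db[user_id]
        self.cache[key] = row          # populate for the next reader
        return row

    def update_user(self, user_id, row):
        self.db[user_id] = row
        self.cache.pop(f"user:{user_id}", None)  # manual invalidation
```

Keeping every read and write for an object behind one pair of functions like this is what makes the manual invalidation manageable.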
But what about HA?
SPOF! The key to HA is avoiding SPOFs: identify, eliminate. Some stuff is hard to solve; fix it further up the tree. Dual DCs solve the router/switch SPOF.
Master-Master
Master-Master: Either hot/warm or hot/hot. Writes can go to either, but avoid collisions. No auto-inc columns for hot/hot; bad for hot/warm too. Design schema/access to avoid collisions, e.g. hashing users to servers.
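One way to read "hashing users to servers" is sketched below: each user's writes always land on the same master, so the two masters never write the same user's rows at the same time, and the other master is used only when the preferred one is down. The host names and the function name are hypothetical.

```python
import zlib

MASTERS = ["db-a", "db-b"]  # hypothetical host names for the master-master pair


def master_for(user_id, masters=MASTERS, down=()):
    """Pick the master that owns this user's writes; fail over if it is down."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    start = zlib.crc32(str(user_id).encode()) % len(masters)
    for i in range(len(masters)):
        host = masters[(start + i) % len(masters)]
        if host not in down:
            return host
    raise RuntimeError("no master available")
```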
Rings: Master-master is just a small ring with 2 members. Bigger rings are possible, but not a mesh! Each slave may only have a single master, unless you build some kind of manual replication.
Dual trees: Master-master is good for HA, but we can’t scale out the reads. We often need to combine the read scaling with HA; we can combine the two.
Data federation: At some point, you need more writes. This is tough: each cluster of servers has limited write capacity. Just add more clusters!
Data federation: Split up large tables, organized by some primary object (usually users). Put all of a user’s data on one ‘cluster’ (or shard, or cell). Have one central cluster for lookups.
Data federation: Need more capacity? Just add shards! Don’t assign to shards based on user_id! For resource leveling as time goes on, we want to be able to move objects between shards. ‘Lockable’ objects.
Data federation: Heterogeneous hardware is fine: just give a larger/smaller proportion of objects depending on hardware. Bigger/faster hardware for paying users. A common approach.
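A toy version of the central lookup, under the assumption that it is a simple user-to-shard map: placement is weighted by shard capacity rather than derived from user_id, and a user is locked while their rows are copied to another shard. All names here (ShardDirectory, copy_rows, the shard labels) are illustrative.

```python
class ShardDirectory:
    """Central lookup: user -> shard, with weighted placement and moves."""

    def __init__(self, weights):
        self.weights = dict(weights)   # shard name -> relative capacity
        self.location = {}             # user_id -> shard name
        self.locked = set()            # users currently being migrated

    def assign(self, user_id):
        """Place a new user on the shard that is emptiest for its weight."""
        counts = {s: 0 for s in self.weights}
        for shard in self.location.values():
            counts[shard] += 1
        shard = min(self.weights, key=lambda s: counts[s] / self.weights[s])
        self.location[user_id] = shard
        return shard

    def lookup(self, user_id):
        if user_id in self.locked:
            raise RuntimeError("user is locked for migration")
        return self.location[user_id]

    def move(self, user_id, new_shard, copy_rows):
        """Lock the user, copy their rows, then repoint the lookup."""
        self.locked.add(user_id)
        try:
            copy_rows(user_id, self.location[user_id], new_shard)
            self.location[user_id] = new_shard
        finally:
            self.locked.discard(user_id)
```

Because the mapping is stored rather than computed from the id, adding a shard or rebalancing never forces existing users to move.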
Downsides: Need to keep stuff in the right place; app logic gets more complicated; more clusters to manage (backups, etc.); more database connections needed per page; the dual table issue. Avoid walking the shards!
Bottom line: Data federation is how large applications are scaled.
Bottom line: It’s hard, but not impossible. Good software design makes it easier: abstraction! Master-master pairs for shards give us HA. Master-master trees work for the central cluster (many reads, few writes).
Multiple Datacenters: Having multiple datacenters is hard, not just with MySQL. Hot/warm with MySQL slaved setup, but manual. Hot/hot with master-master, but dangerous. Hot/hot with sync/async manual replication, but tough.
Serving lots of files: Serving lots of files is not too tough. Just buy lots of machines and load balance! We’re IO bound: need more spindles! But keeping many copies of data in sync is hard, and sometimes we have other per-request overhead (like auth).
Reverse proxy
Reverse proxy: Serving out of memory is fast! And our caching proxies can have disks too, fast or otherwise; more spindles is better. We stay in sync automatically. We can parallelize it: 50 cache servers give us 50 times the serving rate of the origin server, assuming the working set is small enough to fit in memory in the cache cluster.
Invalidation: Dealing with invalidation is tricky. We can prod the cache servers directly to clear stuff out, but that scales badly: we need to clear the asset from every server, which doesn’t work well for 100 caches.
Invalidation: We can change the URLs of modified resources and let the old ones drop out of the cache naturally (or prod them out, for sensitive data). Good approach! Avoids browser cache staleness. Hello Akamai (and other CDNs). Read more: http://www.thinkvitamin.com/features/webapps/serving-javascript-fast
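One common way to change the URL whenever a resource changes is to embed a short hash of its content, sketched below. The slide does not say which scheme was used, so treat the hashing choice and the function name versioned_url as assumptions.

```python
import hashlib


def versioned_url(path, content):
    """Embed a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha1(content).hexdigest()[:8]
    stem, dot, ext = path.rpartition(".")
    if not dot:                      # no extension: just append the hash
        return f"{path}.{digest}"
    return f"{stem}.{digest}.{ext}"
```

The HTML is generated with the new URL, so caches and browsers simply never ask for the old one again.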
Reverse proxy choices: L7 load balancer & Squid (http://www.squid-cache.org/); mod_proxy & mod_cache (http://www.apache.org/); Perlbal and Memcache? (http://www.danga.com/)
High overhead serving: What if you need to authenticate your asset serving? Private photos, private data, subscriber-only files. Two main approaches.
Perlbal backhanding: Perlbal can do redirection magic. Backend server sends a header to Perlbal; Perlbal goes to pick up the file from elsewhere. Transparent to the user.
Perlbal backhanding: Doesn’t keep the database around while serving; doesn’t keep the app server around while serving; user doesn’t find out how to access the asset directly.
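From the backend's side, the handoff might look like this WSGI sketch: do the auth, then answer with a reproxy header instead of the file. X-REPROXY-URL is the header name Perlbal's reproxy feature is generally documented as using, but check the Perlbal docs for the exact header and the service option that enables it. The authorize function and the storage host are made up.

```python
def authorize(user, photo_id):
    """Hypothetical permission check; a real app would ask the database."""
    return user == "owner"


def app(environ, start_response):
    """WSGI backend: authenticate, then hand the file serving to the proxy."""
    user = environ.get("REMOTE_USER", "")
    photo_id = environ.get("PATH_INFO", "/").strip("/")
    if not authorize(user, photo_id):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    # Internal storage URL (hypothetical host); the user never sees it.
    internal = f"http://storage.internal/photos/{photo_id}.jpg"
    start_response("200 OK", [("X-REPROXY-URL", internal)])
    return [b""]
```

The app server is free again as soon as it has sent the header; the proxy does the slow part of streaming the bytes.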
Permission URLs: But why bother!? If we bake the auth into the URL then it saves the auth step. We can do the auth on the web app servers when creating HTML. Just need some magic to translate to paths. We don’t want paths to be guessable.
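A sketch of one way to bake the auth into the URL: the app server appends an expiry time and an HMAC token when it writes the HTML, and the file server recomputes the token before serving. The query parameter names, SECRET, and both function names are assumptions, not something the slides specify.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical; shared by app servers and file servers


def _token(path, expires):
    msg = f"{path}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:32]


def permission_url(path, expires):
    """Called on the app server, after the auth check, when creating HTML."""
    return f"{path}?e={expires}&t={_token(path, expires)}"


def check_url(path, expires, token, now):
    """Called on the file server: no database or app server needed."""
    if now > expires:
        return False
    return hmac.compare_digest(token.encode(), _token(path, expires).encode())
```

Without the secret, a valid token for another path cannot be guessed, which is the "not guessable" property the slide asks for.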
Storing lots of files: Storing files is easy! Get a big disk. Get a bigger disk. Uh oh! Horizontal scaling is the key. Again.
Connecting to storage: NFS: stateful == sucks; hard mounts vs soft mounts. SMB / CIFS / Samba: turn off MSRPC & WINS (NetBIOS NS); stateful but degrades gracefully. HTTP: stateless == yay! Just use Apache.
Multiple volumes: Volumes are limited in total size (except under ZFS & others). Sometimes we need multiple volumes for performance reasons, when using RAID with single/dual parity. At some point, we need multiple volumes.
Multiple hosts: Further down the road, a single host will be too small. Total throughput of the machine becomes an issue; even physical space can start to matter. So we need to be able to use multiple hosts.
HA Storage: HA is important for assets too. We can back stuff up, but we want it hot redundant. RAID is good: RAID 5 is cheap, RAID 10 is fast.
HA Storage: But whole machines can fail, so we stick assets on multiple machines. In this case, we can ignore RAID; in the failure case, we serve from an alternative source. But we need to weigh up the rebuild time and effort against the risk. Store more than 2 copies?
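Storing each asset on several machines and serving from whichever copy is still up could be sketched as follows. Nested dicts stand in for the storage hosts, the placement rule is arbitrary, and the sketch assumes copies is no larger than the number of hosts.

```python
class ReplicatedStore:
    """Keep each file on several hosts and read from any live copy."""

    def __init__(self, hosts, copies=2):
        self.hosts = {name: {} for name in hosts}  # host -> {key: bytes}
        self.copies = copies
        self.down = set()                          # hosts currently failed

    def put(self, key, data):
        """Write the file to `copies` distinct hosts; return their names."""
        names = sorted(self.hosts)
        start = sum(key.encode()) % len(names)     # arbitrary but repeatable
        targets = [names[(start + i) % len(names)] for i in range(self.copies)]
        for name in targets:
            self.hosts[name][key] = data
        return targets

    def get(self, key):
        """Serve from the first live host holding the file."""
        for name, files in self.hosts.items():
            if name not in self.down and key in files:
                return files[key]
        raise KeyError(key)
```

With two copies a single host failure is survivable, but until a new copy is made the file has no redundancy, which is the rebuild-time risk the slide mentions.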
Self repairing systems: When something fails, repairing can be a pain. RAID rebuilds by itself, but machine replication doesn’t. The big appliances self heal (NetApp, StorEdge, etc.). So does MogileFS.
Real world examples: Flickr, because I know it. LiveJournal, because everyone copies it.
Flickr Architecture
LiveJournal Architecture
Buy my book!
The end!
Awesome! These slides are available online: iamcal.com/talks/
