Large-Scale Architecture
The Unreasonable
Effectiveness of Simplicity
Randy Shoup
@randyshoup
Background
eBay Architecture
• 1995 Monolithic Perl
• 1996-2002 v2
o Monolithic C++ ISAPI DLL
o 3.4M lines of code
o Compiler limits on number of methods per class
• 2002-2006 v3 Migration
o Java “mini-applications”
o Shared databases
• 2012 v4 Microservices
• 2017 v5 Microservices
Amazon Architecture
• 1995-2001 Obidos
o Monolithic Perl / Mason frontend over C backend
o ~4GB application in a 4GB address space
o Regularly breaking the GNU linker
o Restarting every 100-200 requests because of memory leaks
o Releasing once per quarter
• 2001-2005 Service Migration
o Services in C++, Java, etc.
o No shared databases
• 2006 AWS launches
No one starts with microservices
…
Past a certain scale, everyone ends
up with microservices
Large-Scale Architecture
• Simple Components
• Simple Interactions
• Simple Changes
• Putting It All Together
Simple Components
“There are two methods in
software design: One is to make
the program so simple, there are
obviously no errors. The other is
to make it so complicated, there
are no obvious errors.”
-- Tony Hoare
Modular Services
• Service boundaries match the problem
domain
• Service boundaries encapsulate
business logic and data
o All interactions through published service interface
o Interface hides internal implementation details
o No back doors
• Service boundaries encapsulate
architectural -ilities
o Fault isolation
o Performance optimization
o Security boundary
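A minimal sketch of a service boundary that hides its data behind a published interface. The `CustomerService` protocol, the in-memory implementation, and `shipping_label` are hypothetical names chosen for illustration; they are not from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: str
    address: str


class CustomerService(Protocol):
    """Published interface: the only way other services reach customer data."""

    def get_customer(self, customer_id: str) -> Customer: ...

    def change_address(self, customer_id: str, address: str) -> Customer: ...


class InMemoryCustomerService:
    """Hypothetical implementation; storage stays private to the service."""

    def __init__(self) -> None:
        self._rows: Dict[str, Customer] = {}  # internal detail, no back doors

    def get_customer(self, customer_id: str) -> Customer:
        return self._rows[customer_id]

    def change_address(self, customer_id: str, address: str) -> Customer:
        customer = Customer(customer_id, address)
        self._rows[customer_id] = customer
        return customer


def shipping_label(customers: CustomerService, customer_id: str) -> str:
    """A caller that depends only on the published interface."""
    customer = customers.get_customer(customer_id)
    return f"{customer.customer_id}: {customer.address}"
```

Because `shipping_label` sees only the interface, the storage behind `InMemoryCustomerService` could be swapped without touching callers.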
Orthogonal Domain Logic
• Stateless domain logic
o Ideally stateless pure function
o Matches domain problem as directly as possible
o Deterministic and testable in isolation
o Robust to change over time
• “Straight-line processing”
o Straightforward, synchronous, minimal branching
• Separate domain logic from I/O
o Hexagonal architecture, Ports and Adapters
o Functional core, imperative shell
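One way to picture "functional core, imperative shell" is the sketch below. The shipping rule, the `Order` type, and the `write` callback are assumptions made up for the example; only the split between a pure function and a thin I/O wrapper reflects the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    items: int


def shipping_cents(order: Order) -> int:
    """Functional core: pure, deterministic, no I/O."""
    if order.subtotal_cents >= 5000:
        return 0  # hypothetical free-shipping threshold
    return 500 + 100 * order.items


def handle_request(raw: dict, write: Callable[[dict], None]) -> None:
    """Imperative shell: parse input, call the core, perform the side effect."""
    order = Order(
        subtotal_cents=int(raw["subtotal_cents"]),
        items=int(raw["items"]),
    )
    write({"shipping_cents": shipping_cents(order)})
```

The core can be tested with plain values, while the shell is exercised by passing any callable as `write` (a list's `append` works in a test).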
Sharding
• Shards partition the service’s “data space”
o Units for distribution, replication, processing, storage
o Hidden as internal implementation detail
• Shards encapsulate architectural -ilities
o Resource isolation
o Fault isolation
o Availability
o Performance
• Shards are autoscaled
o Divide or scale out as processing or data needs increase
o E.g., DynamoDB partitions, Aurora segments, Bigtable
tablets
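A rough sketch of shards hidden behind a service, assuming simple hash-modulo routing over in-memory dictionaries. `ShardedStore` and `shard_for` are illustrative names; the systems named above use their own partitioning schemes, which this does not attempt to reproduce.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Callers use get/put; the shard layout is an internal detail."""

    def __init__(self, num_shards: int) -> None:
        self._shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def _shard(self, key: str) -> Dict[str, Any]:
        return self._shards[shard_for(key, len(self._shards))]

    def put(self, key: str, value: Any) -> None:
        self._shard(key)[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._shard(key).get(key)
```

Hash-modulo routing moves most keys when the shard count changes, so a store that splits shards as load grows would more likely use range or consistent-hash partitioning; the modulo form is kept here only for brevity.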
Service Graph
• Common services provide and
abstract widely-used capabilities
• Service topology
o Services call others, which call others, etc.
o Graph, not a strict layering
• Simplifies concerns for service team
o Concentrate on business logic
o Abstract away details of dependencies
o Focus on services you need, capabilities you
provide
Common Platform
• “Paved Road”
o Shared infrastructure
o Standard frameworks
o Developer experience
o E.g., Netflix, Google
• Separation of Concerns
o Reduce cognitive load on stream-aligned
teams
o Bound decisions through enabling constraints
Large-scale organizations often
invest more than 50% of
engineering effort in platform
capabilities
Service Ecosystem
Evolving Services
• Variation and Natural Selection
o Create / extract new services when needed to
solve a problem
o Services justify their continued existence through
usage
o Deprecate services when they are no longer used
• Services grow and evolve over time
o Factor out common libraries and services as
needed
o Teams and services split like “cellular mitosis”
“Every service at
Google is either
deprecated or not
ready yet.”
Simple Interactions
“Everything should
be made as simple
as possible, but not
simpler.”
Event-Driven
• Communicate state changes as
stream of events
o Statement that some interesting thing
occurred
o Ideally represents a semantic domain event
• Decouples domains and teams
o Abstracted through a well-defined
interface
o Asynchronous from one another
• Simplifies component
implementation
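The publish / subscribe shape described above can be sketched as follows. This is an in-process toy with hypothetical names (`Event`, `EventBus`); delivery here is a direct function call, whereas a real deployment would put a broker or log between producer and consumers to get the asynchrony.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Event:
    """A statement that something interesting happened in a domain."""
    type: str
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], None]


class EventBus:
    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver to every subscriber; return how many handlers ran."""
        handlers = self._handlers.get(event.type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

The producer knows only the event type and payload, not who consumes it, which is the decoupling the slide refers to.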
Immutable Log
• Store state as immutable log of events
o Event Sourcing
• Often matches domain
o E.g., Stitch Fix order processing / delivery state
• Log encapsulates architectural -ilities
o Durable
o Traceable and auditable
o Replayable
o Explicit and comprehensible
• Compact snapshots for efficiency
Immutable Log
• Stitch Fix order states
o Request a fix: fix_scheduled
o Assign fix to warehouse: fix_hizzy_assigned
o Assign fix to a stylist: fix_stylist_assigned
o Style the fix: fix_styled
o Pick the items for the fix: fix_picked
o Pack the items into a box: fix_packed
o Ship the fix via a carrier: fix_shipped
o Fix travels to customer: fix_delivered
o Customer decides, pays: fix_checked_out
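The order states above lend themselves to a small event-sourcing sketch: state is never updated in place, only derived by replaying an append-only log. The state names come from the slide; `FixLog`, `FixEvent`, and the `upto` parameter are assumptions for illustration and say nothing about how Stitch Fix implements this.

```python
from dataclasses import dataclass
from typing import List, Optional

FIX_STATES = (
    "fix_scheduled", "fix_hizzy_assigned", "fix_stylist_assigned",
    "fix_styled", "fix_picked", "fix_packed",
    "fix_shipped", "fix_delivered", "fix_checked_out",
)


@dataclass(frozen=True)
class FixEvent:
    fix_id: str
    state: str


class FixLog:
    """Append-only log; existing entries are never modified."""

    def __init__(self) -> None:
        self._events: List[FixEvent] = []

    def append(self, fix_id: str, state: str) -> None:
        if state not in FIX_STATES:
            raise ValueError(f"unknown state: {state}")
        self._events.append(FixEvent(fix_id, state))

    def current_state(self, fix_id: str, upto: Optional[int] = None) -> Optional[str]:
        """Replay the first `upto` events (all if None) to derive the state."""
        state = None
        for event in self._events[:upto]:
            if event.fix_id == fix_id:
                state = event.state
        return state
```

Replaying a prefix of the log (`upto`) gives the state as of an earlier point, which is what makes the log traceable and auditable; a snapshot would simply cache the result of one such replay.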
Embrace Asynchrony
• Decouples operations in time
o Decoupled availability
o Independent scalability
o Allows more complex processing, more
processing in parallel
o Safer to make independent changes
• Simplifies component
implementation
Embrace Asynchrony
• Invert from synchronous call
graph to async dataflow
o Exploit asymmetry between writes and
reads
o Can be orders of magnitude less
resource intensive
Simple Changes
“A complex system that works is
invariably found to have evolved
from a simple system that
worked.”
-- Gall’s Law
Incremental Change
• Decompose every large change into small
incremental steps
• Each step maintains backward / forward
compatibility of data and interfaces
• Multiple service versions commonly coexist
o Every change is a rolling upgrade
o Transitional states are normal, not exceptional
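A small sketch of what backward / forward compatibility can look like for a message format during a rolling upgrade. The field names (`name`, `first_name`, `last_name`) and the v1 / v2 split are invented for the example.

```python
def read_customer(message: dict) -> dict:
    """Accept both the old and the new message shape.

    Hypothetical v1 producers send "name"; hypothetical v2 producers send
    "first_name" and "last_name". Unknown fields are ignored, so older
    readers keep working when newer producers add fields.
    """
    if "first_name" in message or "last_name" in message:
        parts = (message.get("first_name"), message.get("last_name"))
        name = " ".join(part for part in parts if part)
    else:
        name = message.get("name", "")
    return {"id": message["id"], "name": name}
```

While both versions coexist, a reader like this handles either producer, so neither side has to be upgraded in lockstep.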
Continuous Testing
• Tests help us go faster
o Tests are “solid ground”
o Tests are the safety net
• Tests make better code
o Confidence to break things
o Courage to refactor mercilessly
• Tests make better systems
o Catch bugs earlier, fail faster
Developer Productivity
• 75% reading existing code
• 20% modifying existing code
• 5% writing new code
https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/
Continuous Testing
• Tests make better designs
o Modularity
o Separation of Concerns
o Encapsulation
“There’s a deep synergy between
testability and good design. All of the
pain that we feel when writing unit tests
points at underlying design problems.”
-- Michael Feathers
Test-Driven Development
• → Basically no bug tracking system (!)
o “Inbox Zero” for bugs
o Bugs are fixed as they come up
o Backlog contains features we want to build
o Backlog contains technical debt we want to repay
Canary Deployments
• Staged rollout
o Go slowly at first; go faster when you gain confidence
• Automated rollout / rollback
o Automatically monitor changes to metrics
o If metrics look good, keep going; if metrics look bad, roll back
• Make deployments routine and boring
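The staged rollout with automated rollback can be sketched as a simple loop. The stage percentages, the single error-rate metric, and the `error_rate_at` callback are all assumptions; a real pipeline would watch several metrics over a soak period at each stage.

```python
from typing import Callable, Sequence, Tuple


def canary_rollout(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> Tuple[str, int]:
    """Widen traffic stage by stage; roll back on the first bad reading.

    `stages` lists traffic percentages in increasing order.
    `error_rate_at(percent)` stands in for a metrics query at that stage.
    Returns the outcome and the percentage left on the new version.
    """
    deployed = 0
    for percent in stages:
        deployed = percent
        if error_rate_at(percent) > max_error_rate:
            return ("rolled_back", 0)
    return ("completed", deployed)
```

For example, `canary_rollout([1, 10, 50, 100], read_error_rate, 0.01)` would stop and roll back at the first stage where the hypothetical `read_error_rate` exceeds 1%.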
Feature Flags
• Configuration “flag” to enable / disable a feature
for a particular set of users
o Independently discovered at eBay, Facebook, Google, etc.
• More solid systems
o Decouple feature delivery from code delivery
o Rapid on and off
o Separate experiment and control groups
o Develop / test / verify in production
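A minimal sketch of a flag that enables a feature for a particular set of users, assuming a percentage rollout plus an explicit allowlist. The hashing scheme and function names are illustrative and do not describe any of the companies' systems.

```python
import hashlib
from typing import FrozenSet


def bucket(flag: str, user_id: str) -> int:
    """Stable bucket in [0, 100) for a (flag, user) pair."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(
    flag: str,
    user_id: str,
    rollout_percent: int,
    allowlist: FrozenSet[str] = frozenset(),
) -> bool:
    """True if the user is allowlisted or falls inside the rollout."""
    if user_id in allowlist:
        return True
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket depends only on the flag name and user id, a user stays in the same experiment or control group across requests, and setting `rollout_percent` to 0 turns the feature off without a deploy.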
Continuous Delivery
• Deploy services multiple times per day
o Robust build, test, deploy pipeline
o SLO monitoring
o Synthetic monitoring
• More solid systems
o Release smaller, simpler units of work
o Smaller changes to roll back or roll forward
o Faster to repair, easier to understand, simpler to diagnose
o Increase rate of change and reduce risk of change
Continuous Delivery
• Cross-company Velocity Initiative to improve software delivery
o Think Big, Start Small, Learn Fast
o Iteratively identify and remove bottlenecks for teams
o “What would it take to deploy your application every day?”
• Doubled engineering productivity
o 5x higher deployment frequency
o 5x shorter lead time
o 3x lower change failure rate
o 3x lower mean-time-to-restore
• Prerequisite for large-scale architecture changes
Putting It All Together
System of Record
• Single System of Record
o Every piece of data is owned by a single service
o That service is the canonical system of record for that data
• Every other copy is a read-only, non-authoritative cache
[Diagram: customer-service, styling-service, customer-search, billing-service]
Shared Data
Option 1: Synchronous Lookup
o Customer service owns customer data
o Fulfillment service calls customer service in real time
[Diagram: fulfillment-service, customer-service]
Shared Data
Option 2: Async event + local cache
o Customer service owns customer data
o Customer service sends address-updated event when customer address
changes
o Fulfillment service caches current customer address
[Diagram: fulfillment-service, customer-service]
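Option 2 can be sketched as a consumer that keeps a local, read-only copy of the address. The event shape (`customer_id`, `version`, `address`) and the version check are assumptions added so that duplicate or out-of-order events do not overwrite newer data.

```python
from typing import Dict, Optional


class AddressCache:
    """Non-authoritative copy held by a hypothetical fulfillment service."""

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}
        self._versions: Dict[str, int] = {}

    def on_address_updated(self, event: dict) -> bool:
        """Apply an address-updated event; ignore stale or repeated ones.

        Versions are assumed to start at 1 and increase per customer.
        """
        customer_id = event["customer_id"]
        if event["version"] <= self._versions.get(customer_id, 0):
            return False
        self._versions[customer_id] = event["version"]
        self._addresses[customer_id] = event["address"]
        return True

    def address_for(self, customer_id: str) -> Optional[str]:
        return self._addresses.get(customer_id)
```

The customer service remains the system of record; the cache trades a short staleness window for not needing a synchronous call on every fulfillment request.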
Joins
Option 1: Join in Client Service
o Get a single customer from customer-service
o Query matching orders for that customer from order-service
[Diagram: order-history-page reads Customers from customer-service and Orders from order-service]
Joins
Option 2: Service that “Materializes the View”
o Listen to events from customer-service, events from order-service
o Maintain denormalized join of customer data and orders together in local
storage
[Diagram: customer-order-service maintains a Customer Orders view fed by customer-service and order-service]
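A rough sketch of a service that materializes the join, assuming hypothetical event shapes with `customer_id`, `name`, and `order_id` fields. The view is kept in a dictionary here; a real service would persist it in its own local storage.

```python
from typing import Dict, Optional


class CustomerOrderView:
    """Denormalized join of customers and orders, built from events."""

    def __init__(self) -> None:
        self._view: Dict[str, dict] = {}

    def _row(self, customer_id: str) -> dict:
        # Create the row on first sight so event arrival order does not matter.
        return self._view.setdefault(customer_id, {"name": None, "orders": []})

    def on_customer_event(self, event: dict) -> None:
        self._row(event["customer_id"])["name"] = event["name"]

    def on_order_event(self, event: dict) -> None:
        self._row(event["customer_id"])["orders"].append(event["order_id"])

    def order_history(self, customer_id: str) -> Optional[dict]:
        """Serve the pre-joined result with a single local lookup."""
        return self._view.get(customer_id)
```

Reads become one local lookup instead of two remote calls, at the cost of the view lagging slightly behind the two source services.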
Netflix Viewing History
• Store and process members’ playback data
o 1M requests per second
o Used for viewing history, personalization,
recommendations, analytics, etc.
• Original synchronous architecture
o Synchronously write to persistent storage and lookup
cache
o Availability problems and data loss from backpressure at high load
• Asynchronous rearchitecture
o Write to durable queue
o Async pipeline to enrich, process, store, serve
o Materialize views to serve reads
Sharma Podila, 2021, Microservices to Async Processing Migration at Scale, QConPlus 2021.
Walmart Item Availability
• Is this item available to ship to this customer?
o Customer SLO 99.98% uptime in 300ms
• Complex logic involving many teams and
domains
o Inventory, reservations, backorders, eligibility, sales caps, etc.
• Original synchronous architecture
o Graph of 23 nested synchronous service calls in hot path
o Any component failure invalidates results
o Service SLOs 99.999% uptime with 50ms marginal latency
o Extremely expensive to build and operate
Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
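The SLO figures above can be related with a little arithmetic. The sketch assumes all 23 calls are strictly in series and fail independently, which is a simplification not stated in the talk; under that assumption the availability of the whole path is the product of the per-service availabilities.

```python
def serial_availability(per_service: float, services: int) -> float:
    """Availability of a chain where every service must succeed."""
    return per_service ** services


def required_per_service(target: float, services: int) -> float:
    """Per-service availability needed for the chain to meet `target`."""
    return target ** (1.0 / services)


# 23 services at 99.999% each, multiplied together.
composite = serial_availability(0.99999, 23)

# What each of 23 services would need in order to reach 99.98% overall.
needed = required_per_service(0.9998, 23)
```

Under these assumptions `composite` comes out at roughly 99.977%, right around the 99.98% customer target, and `needed` is a little above 99.999%. That is one way to see why each service in the synchronous graph had to carry a five-nines SLO.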
Walmart Item Availability
• Invert each service to use async events
o Event-driven “dataflow”
o Idempotent processing
o Event-sourced immutable log
o Materialized view of data from upstream dependencies
• Asynchronous rearchitecture
o 2 services in synchronous hot path
o Async service SLOs 99.9% uptime with latency in seconds
or minutes
o More resilient to delays and outages
o Orders of magnitude simpler to build and operate
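Idempotent processing, as listed above, can be sketched as a consumer that remembers which events it has already applied. The event fields (`event_id`, `sku`, `delta`) and the `InventoryProjection` name are hypothetical; the point is only that redelivering an event leaves the materialized view unchanged.

```python
from typing import Dict, Set


class InventoryProjection:
    """Materialized on-hand counts built from a stream of inventory events."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.on_hand: Dict[str, int] = {}

    def apply(self, event: dict) -> bool:
        """Apply an event once; return False if it was already processed."""
        if event["event_id"] in self._seen:
            return False
        self._seen.add(event["event_id"])
        sku = event["sku"]
        self.on_hand[sku] = self.on_hand.get(sku, 0) + event["delta"]
        return True

    def is_available(self, sku: str) -> bool:
        """Answer from local state, with no call to upstream services."""
        return self.on_hand.get(sku, 0) > 0
```

With at-least-once delivery from the event stream, this kind of deduplication is what lets the upstream services run with looser SLOs while the hot path answers from local data.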
Large-Scale Architecture
• Simple Components
• Simple Interactions
• Simple Changes
• Putting It All Together
Thank you!
@randyshoup
linkedin.com/in/randyshoup
medium.com/@randyshoup

