Building Scalable Web Architectures: Aaron Bannert
Aaron Bannert
Abundance of talent
Modular Components
Public APIs
Open Architecture
Vendor Neutral
Many options at all levels
Extendable
Price
Speed
Quality
•Routers
•Switches
•Firewalls
•Load Balancers
Software Choices
Building LAMP Software
External Caching Tier
What is this?
Squid
Apache’s mod_proxy
Flushes Connections: takes the whole response from the web tier at once, then feeds it to the client
Useful for modem users; frees up the web tier
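At its core, a caching tier such as Squid or mod_proxy keeps recently fetched responses and answers repeat requests without touching the web tier. A minimal in-process sketch, assuming a hypothetical `fetch_from_origin` callable that stands in for the web tier and a single fixed time-to-live; real proxies also honor HTTP cache headers, which this ignores.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny URL -> response cache with a fixed time-to-live (illustrative only)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin   # hypothetical stand-in for the web tier
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1            # fresh copy: the web tier is never contacted
            return entry[1]
        self.misses += 1              # missing or expired: go to the origin
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body
```

Every hit is a request the web tier never sees, which is where the "frees up web tier" benefit comes from.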
Hardware Requirements
Lots of Memory
Moderate to little CPU
Fast Network
Moderate Disk Capacity
Room for cache, logs, etc… (disks are cheap)
One slow disk is OK
Other Questions
What to cache?
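One common starting point for "what to cache?" is to let each response's own HTTP headers decide. A simplified sketch, assuming response headers arrive as a dict with lowercase names; it covers only a few `Cache-Control` directives and a subset of the default-cacheable status codes, so it is not a complete implementation of HTTP caching rules. Refusing anything with `Set-Cookie` is an assumed site policy, not a protocol requirement.

```python
from typing import Dict

# A subset of the status codes HTTP treats as cacheable by default.
DEFAULT_CACHEABLE = {200, 203, 204, 300, 301, 404, 410}


def is_cacheable(method: str, status: int, headers: Dict[str, str]) -> bool:
    """Rough check for whether a shared proxy cache may store a response.

    headers: response headers with lowercase names.
    """
    if method not in ("GET", "HEAD"):
        return False                      # only cache safe, repeatable requests
    if status not in DEFAULT_CACHEABLE:
        return False
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in headers.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"no-store", "private"}:
        return False                      # origin forbids storing it in a shared cache
    if "set-cookie" in headers:
        return False                      # assumed policy: never share personalized replies
    return True
```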
Web Serving Tier
What is this?
Apache
thttpd
Tux Web Server
IIS
Netscape
Web Serving Tier
Hardware Requirements
Lots and lots of Memory
Memory is main bottleneck in web serving
Memory determines max number of users
Fast Network
CPU depends on usage
Dynamic content needs CPU
Static file serving requires very little CPU
Cheap slow disk, enough to hold your content
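For a process-per-connection server such as Apache's prefork MPM, "memory determines max number of users" is simple arithmetic: concurrent clients are capped by how many worker processes fit in RAM. A back-of-the-envelope sketch; the per-worker size and the memory reserved for the OS and page cache are assumed inputs you would measure on your own servers.

```python
def max_workers(total_ram_mb: int, reserved_mb: int, per_worker_mb: float) -> int:
    """Workers (and thus concurrent clients) that fit in RAM without swapping."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = total_ram_mb - reserved_mb   # leave room for the OS and page cache
    return max(0, int(usable // per_worker_mb))


# Assumed numbers: 4 GB box, 512 MB reserved, 20 MB per worker.
print(max_workers(4096, 512, 20))  # prints 179
```

Going past this number makes the server swap, which is usually worse than queueing connections; Apache's `MaxClients` (renamed `MaxRequestWorkers` in 2.4) is typically sized from a calculation like this.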
Web Serving Tier: Zero-copy
Performance Hint
Dedicated static content servers
Modern web servers are very good at serving static content such as
• HTML
• CSS
• Images
• Zip/GZ/Tar files
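Zero-copy serving means the kernel moves file bytes straight from the page cache to the socket (via `sendfile(2)` on Linux and the BSDs) instead of copying them through a user-space buffer, which is a large part of why static content is so cheap to serve. A minimal sketch using Python's `socket.sendfile`, which uses `os.sendfile` where available and falls back to ordinary `send()` otherwise; `send_static_file` is a hypothetical helper name, and HTTP headers are left out.

```python
import socket


def send_static_file(conn: socket.socket, path: str) -> int:
    """Send a file's bytes over a connected, blocking socket; zero-copy when possible.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        # socket.sendfile() calls os.sendfile() on platforms that support it, so
        # the data goes from the page cache to the socket without a user-space copy.
        return conn.sendfile(f)
```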
Web Serving Tier
Performance Hint
Stateless Sessions
Each connection is a fresh start
Server remembers nothing
Benefits?
Allows Better Caching
Scales Horizontally
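One way to keep the server stateless while still knowing who the user is: put the session data in the client's cookie and sign it, so any web server holding the shared secret can verify it without a central session store. A minimal sketch using HMAC-SHA256 from Python's standard library; the token format, `SECRET_KEY`, and the helper names are assumptions for illustration. The payload is signed but not encrypted, so it must not carry secrets, and a real token would normally include an expiry.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical; shared by every web server in the pool


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def make_token(session: dict) -> str:
    """Serialize session data and append an HMAC so tampering is detectable."""
    payload = json.dumps(session, sort_keys=True).encode("utf-8")
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def read_token(token: str) -> Optional[dict]:
    """Return the session if the signature checks out, else None. Any server can do this."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:            # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(payload)
```

Because no server remembers anything, a request can go to any machine in the pool, which is what makes horizontal scaling straightforward.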
Web Serving Tier
Choices
How much dynamic content?
When to offload dynamic processing?
When to offload database operations?
When to add more web servers?
Application Server Tier
Internal Services
E.g. Search, Shopping Cart, Credit Card Processing
Application Server Tier
Caveats
Decoupling of services is GOOD
Manage Complexity using well-defined APIs
Don’t decouple for scaling, change your algorithms!
Remote Calling overhead can be expensive
Marshaling of data
Sockets, net latency, throughput constraints…
XML, SOAP, XML-RPC: yuck (don’t scale well)
Better to use Java’s RMI, good old RPC, or even CORBA
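To see why text-based RPC formats cost more to marshal, compare the bytes needed to send the same small record as an XML-RPC call versus a fixed binary layout. A rough sketch using Python's standard `xmlrpc.client.dumps` and `struct.pack`; the `"lookup"` method name and the three-integer record are made up for illustration, and a plain `struct` layout only stands in for the binary encodings that RMI, RPC, or CORBA use.

```python
import struct
import xmlrpc.client

# Hypothetical record: (user_id, item_id, quantity)
record = (123456, 789, 2)

# Text marshaling: an XML-RPC method call with three integer parameters.
xml_bytes = xmlrpc.client.dumps(record, methodname="lookup").encode("utf-8")

# Binary marshaling: three unsigned 32-bit integers in network byte order.
bin_bytes = struct.pack("!III", *record)

print(len(xml_bytes), len(bin_bytes))  # the XML payload is many times larger
```

Byte counts are only half the story; parsing XML on every call also costs CPU that a fixed binary layout avoids.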
Application Server Tier
More Caveats
Remote Calling can introduce new failure scenarios
Classic Distributed Problems
• How to detect remote failures?
How long to wait until deciding it’s failed?
• How to react to remote failures?
What do we do when all app servers have failed?
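The two questions above (how long to wait, and what to do when everything has failed) usually turn into a per-call timeout, failover across app servers, and a degraded fallback. A minimal sketch assuming each app server is exposed as a hypothetical callable that takes a timeout in seconds and raises an `OSError` (which includes `TimeoutError`) on failure; the timeout value and the fallback are policy choices, not recommendations.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(servers: Sequence[Callable[[float], T]],
                       timeout_s: float,
                       fallback: Callable[[], T]) -> T:
    """Try each app server in turn; if all of them fail, degrade gracefully."""
    for call in servers:
        try:
            # Each callable is expected to give up (raise) after timeout_s seconds;
            # that timeout is the answer to "how long until we call it failed?"
            return call(timeout_s)
        except OSError:   # network errors and timeouts are both OSError subclasses
            continue
    # Every server failed: serve something degraded instead of hanging the user.
    return fallback()
```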
Application Server Tier
Hardware Requirements
Lots and Lots and Lots of Memory
App Servers are very memory hungry
Java was hungry to begin with
Consider going to 64-bit for a larger memory space
Disk depends on application, typically minimal needed
FAST CPU required, and lots of them
(This will be an expensive machine.)
Database Tier
Available DB Products
Free/Open Source DBs
PostgreSQL
MySQL
GNU DBM
SQLite
Ingres
mSQL
Berkeley DB
Commercial
Oracle
MS SQL
IBM DB2
Sybase
Sleepycat (commercial Berkeley DB)
Database Tier
Choices
How much logic to place inside the DB?
Use Connection Pooling?
Data Partitioning?
Spreading a dataset across multiple logical database
“slices” in order to achieve better performance.
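A common way to spread a dataset across slices is to hash each record's key and use the result to pick a slice. A minimal sketch using `zlib.crc32`, chosen because, unlike Python's built-in `hash()` on strings, it gives the same answer in every process; the slice names are hypothetical. Plain modulo hashing moves most keys when the number of slices changes, which is why consistent hashing or a lookup directory is often used instead.

```python
import zlib
from typing import Sequence


def pick_slice(key: str, slices: Sequence[str]) -> str:
    """Map a record key to one database slice, deterministically across processes."""
    if not slices:
        raise ValueError("need at least one slice")
    h = zlib.crc32(key.encode("utf-8"))   # stable, unsigned 32-bit hash
    return slices[h % len(slices)]


SLICES = ["db-slice-0", "db-slice-1", "db-slice-2", "db-slice-3"]  # hypothetical hosts

# All rows keyed by one user land on the same slice, so per-user queries hit one database.
print(pick_slice("user:1001", SLICES))
```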
Database Tier
Hardware Requirements
Entirely dependent upon application.
Likely to be your most expensive machine(s).
Tons of Memory
Spindles galore
RAID is useful (in software or hardware)
Reliability usually trumps Speed
• RAID levels 0 (speed only, no redundancy), 5, 1+0, and 5+0 are useful
CPU also important
Dual power supplies
Dual Network
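The trade-off between the RAID levels above shows up in simple capacity arithmetic. A sketch assuming n identical disks of `disk_gb` each and, for RAID 5+0, an even split into the given number of RAID 5 groups; it ignores hot spares and filesystem overhead.

```python
def usable_capacity_gb(level: str, n: int, disk_gb: float, groups: int = 2) -> float:
    """Usable space for common RAID levels, assuming identical disks."""
    if level == "0":            # striping only: all space, survives no failures
        return n * disk_gb
    if level == "1+0":          # mirrored pairs, striped: half the raw space
        if n < 4 or n % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return n // 2 * disk_gb
    if level == "5":            # one disk's worth of parity
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (n - 1) * disk_gb
    if level == "5+0":          # several RAID 5 groups striped together
        if groups < 2 or n % groups or n // groups < 3:
            raise ValueError("RAID 5+0 needs >= 2 equal groups of >= 3 disks")
        return (n - groups) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


# 8 x 500 GB disks -> 0: 4000, 1+0: 2000, 5: 3500, 5+0: 3000
for level in ("0", "1+0", "5", "5+0"):
    print(level, usable_capacity_gb(level, 8, 500))
```

RAID 1+0 gives up the most space, but it tolerates one failure per mirror pair and avoids RAID 5's parity-write penalty, one reason it is popular for databases.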
Internal Cache Tier
What is this?
Object Cache
What Applications?
Memcache
Local Lookup Tables
BDB, GDBM, SQL-based
Application-local Caching (e.g. LRU tables)
Homebrew Caching (disk or memory)
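An application-local LRU table is a few lines on top of `collections.OrderedDict`. A minimal sketch that also follows the cache-aside pattern (check the cache, load on a miss, store the result) commonly used with memcache; `load_from_db` in the usage is a hypothetical loader standing in for the expensive lookup. For caching a pure function, the standard library's `functools.lru_cache` already does this.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class LRUTable(Generic[V]):
    """Bounded in-process cache that evicts the least recently used entry."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], V]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._capacity = capacity
        self._loader = loader               # hypothetical, e.g. a DB query
        self._data: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> V:
        if key in self._data:
            self._data.move_to_end(key)     # mark as most recently used
            return self._data[key]
        value = self._loader(key)           # cache-aside: load on miss, then store
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data
```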
Internal Cache Tier
Hardware Requirements
Lots of Memory
Note that 32-bit processes are typically limited to 2–3 GB of address space, no matter how much RAM is installed
Little or no disk
Moderate to low CPU
Fast Network
Misc. Services (DNS, Mail, etc…)
Important Points
Always have an offsite NS slave
Always have an onsite NS slave
Minimize network latency
Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization
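Time synchronization across the tiers is normally handled by NTP. The core of NTP's clock-offset estimate (RFC 5905) uses four timestamps from one request/response exchange; a small sketch of that arithmetic, with made-up timestamps in seconds:

```python
from typing import Tuple


def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Clock offset and round-trip delay from one NTP-style exchange.

    t1: client send time (client clock)    t2: server receive time (server clock)
    t3: server send time (server clock)    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # positive: server clock is ahead of client's
    delay = (t4 - t1) - (t3 - t2)          # network round trip, minus server processing
    return offset, delay


# Client clock 5 s behind, 0.1 s each way on the network, 0.02 s spent on the server.
print(ntp_offset_delay(100.0, 105.1, 105.12, 100.22))  # about (5.0, 0.2)
```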
Fault Notification
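Fault notification starts with noticing the fault. A minimal sketch of a TCP-level probe that a monitoring script could run against each server; the target list and the `notify` callback are hypothetical, and a real check would usually exercise the service itself (e.g. fetch a test page) rather than only open a port.

```python
import socket
from typing import Callable, Iterable, List, Tuple


def port_is_up(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:        # refused, unreachable, or timed out
        return False


def check_all(targets: Iterable[Tuple[str, int]],
              notify: Callable[[str], None]) -> List[Tuple[str, int]]:
    """Probe every target, call notify() for each one that is down, return the down list."""
    down = []
    for host, port in targets:
        if not port_is_up(host, port):
            down.append((host, port))
            notify(f"DOWN: {host}:{port}")
    return down
```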
The Glue
•Routers
•Switches
•Firewalls
•Load Balancers
Routers and Switches
Expensive
Complex
Crucial Piece of the System
Hints
Use GigE if you can
Jumbo Frames are GOOD
VLANs to manage complexity
LACP (802.3ad) for failover/redundancy
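Why jumbo frames help: each Ethernet frame carries fixed overhead, so bigger frames spend a larger share of the wire on payload and need far fewer packets (and interrupts) per gigabyte. A back-of-the-envelope sketch, assuming TCP over IPv4 with no options (40 bytes of headers) and 38 bytes of per-frame Ethernet overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap). Jumbo frames only work if every device on the path, switches included, is configured for the larger MTU.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12   # header + FCS + preamble/SFD + inter-frame gap (bytes)
TCP_IP_HEADERS = 20 + 20         # IPv4 + TCP, no options (bytes)


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time spent on TCP payload for full-size frames."""
    return (mtu - TCP_IP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_gb(mtu: int) -> int:
    """Full-size frames needed to move 1 GB (10**9 bytes) of TCP payload."""
    payload = mtu - TCP_IP_HEADERS
    return -(-10**9 // payload)   # ceiling division


# Roughly 94.9% vs 99.1% efficiency, and about 6x fewer frames with jumbo frames.
for mtu in (1500, 9000):
    print(mtu, f"{payload_efficiency(mtu):.1%}", frames_per_gb(mtu))
```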
Load Balancers
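At its simplest, a load balancer hands each new request to the next healthy backend in turn. A minimal round-robin sketch; the backend names are hypothetical, health is a flag that a separate checker would flip, and real products add weighting, least-connections scheduling, and session persistence on top.

```python
import threading
from typing import Dict, List


class RoundRobinBalancer:
    """Cycle through backends, skipping any marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in backends}
        self._next = 0
        self._lock = threading.Lock()  # requests may arrive on many threads

    def mark(self, backend: str, healthy: bool) -> None:
        with self._lock:
            self._healthy[backend] = healthy

    def pick(self) -> str:
        with self._lock:
            for _ in range(len(self._backends)):
                backend = self._backends[self._next]
                self._next = (self._next + 1) % len(self._backends)
                if self._healthy[backend]:
                    return backend
        raise RuntimeError("no healthy backends")
```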
Operating Systems
Linux
FreeBSD
NetBSD
OpenBSD
OpenSolaris
Commercial Unix
What’s Important?
Maintainability
Upgrade Path
Security Updates
Bug Fixes
Usability
Do your engineers like it?
Cost
Hardware Requirements
(you don’t need a commercial Unix anymore)
Features to look for
Multi-processor Support
64-bit Capable
Number of Spindles
More spindles can give
Higher Throughput
Higher Concurrency
• Concurrency is crucial for Databases
Reliability
• Failover drives, mirrors
Memory Technologies
ECC: Expensive
Non-ECC: Cheap, Fast
Standardize Hardware
(Open Source!)
Avoid Fads
THE END
Thank You