Scale From Zero To Millions of Users
Here, we're building a system that supports a few users & gradually scaling it to support
millions.
The user asks a DNS server for the IP of the site (ie api.mysite.com ->
15.125.23.214). Usually, DNS is provided by a third party instead of being hosted
yourself.
HTTP requests are sent directly to the server (via its IP) from your device.
The server returns HTML pages or JSON payloads, which are used for rendering.
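A minimal sketch of this flow from the client's side, using Python's standard library (the domain & IP are the illustrative values above, not real endpoints):

```python
import socket
import urllib.request

# Step 1: DNS lookup - resolve the domain to an IP address
# (api.mysite.com / 15.125.23.214 are the illustrative values from the text).
ip = socket.gethostbyname("api.mysite.com")
print(ip)

# Step 2: send an HTTP request to the server & read the response,
# which could be an HTML page or a JSON payload.
with urllib.request.urlopen("http://api.mysite.com/users/12") as resp:
    print(resp.read().decode())
```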
Traffic to web server comes from either a web application or a mobile application:
Web applications use a combo of server-side languages (ie Java, Python) to handle
business logic & storage. Client-side languages (ie HTML, JS) are used for
presentation.
Mobile apps use the HTTP protocol for communication between the app & the web
server. JSON is commonly used for formatting the transmitted data. Example payload:
{
  "id": 12,
  "firstName": "John",
  "lastName": "Smith",
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumbers": [
    "212 555-1234",
    "646 555-4567"
  ]
}
Database
As the user base grows, storing everything on a single server becomes insufficient. We can move
the database to a separate server so that it can be scaled independently from the web tier:
Which databases to use?
You can choose either a traditional relational database or a non-relational (NoSQL) one.
Relational databases represent & store data in tables & rows. You can join different tables to
represent aggregate objects. NoSQL databases are grouped into four categories - key-value
stores, graph stores, column stores & document stores. Join operations are generally not
supported.
For most use-cases, relational databases are the best option as they've been around the longest
& have worked well historically.
If a relational database is not suitable though, it might be worth exploring NoSQL databases. They might be a
better option if your application requires very low latency, your data is unstructured (non-relational), you only
need to serialize/deserialize data (eg JSON, XML, YAML), or you need to store a massive amount of data.
Vertical scaling vs horizontal scaling
Vertical scaling == scale up. Add more power (CPU, RAM, etc) to an existing server.
Horizontal scaling == scale out. Add more servers to your pool of resources.
Vertical scaling is great when traffic is low. Simplicity is its main advantage, but it has
limitations - there's a hard limit on how much power you can add to a single server, and it
offers no failover or redundancy.
Horizontal scaling is more appropriate for larger applications due to vertical scaling's
limitations. Its main disadvantage is that it's harder to get right.
In the design so far, the server going down (ie due to failure or overload) means the whole
application goes down with it. A good solution to this problem is to use a load balancer.
Load balancer
A load balancer evenly distributes incoming traffic among web servers in a load-balanced set:
Clients connect to the public IP of the load balancer. Web servers are unreachable by clients
directly. Instead, they have private IPs, which the load balancer has access to.
By adding a load balancer, we've made our web tier more available & we've also
added the possibility of failover.
How does it work?
If server 1 goes down, all traffic is routed to server 2. This prevents the website from
going offline. We'll also add a fresh new server to balance the load.
If website traffic spikes & two servers are not sufficient to handle it, the load
balancer can handle this gracefully - we simply add more servers to the pool.
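As an illustration, here's a minimal round-robin sketch of how a load balancer could spread requests across the private IPs of its web servers. The IPs & helper names are made up for the example; real load balancers also do health checks, TLS termination, etc:

```python
import itertools

# Private IPs of the web servers behind the load balancer (illustrative values).
SERVERS = ["10.0.0.1", "10.0.0.2"]

# itertools.cycle gives a simple round-robin rotation over the healthy servers.
_rotation = itertools.cycle(SERVERS)

def pick_server():
    """Return the next server in round-robin order."""
    return next(_rotation)

def mark_unhealthy(server):
    """If a server goes down, rebuild the rotation without it so traffic
    keeps flowing to the remaining servers."""
    global _rotation
    SERVERS.remove(server)
    _rotation = itertools.cycle(SERVERS)
```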
The web tier is in good shape now. But what about the data tier?
Database replication
Database replication can usually be achieved via master/slave replication (side note -
nowadays, it's usually referred to as primary/secondary replication).
A master database generally only supports writes. Slave databases store copies of the data
from the master & only support read operations. This setup works well for most applications
as there's usually a higher read to write ratio. Reads can easily be scaled by adding more
slave instances.
Advantages:
If a slave database goes offline, read operations are temporarily routed to the
master or to the other slaves.
If the master goes down, a slave instance is promoted to be the new master. A new
slave instance then replaces the old master.
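A minimal sketch of the read/write splitting this setup implies - writes go to the master, reads are spread over the slaves. The hostnames & the routing heuristic are illustrative, not a real driver API:

```python
import random

# Illustrative connection targets - in practice these would be handles
# returned by your database driver's connect() call.
MASTER = "master-db:5432"
SLAVES = ["slave-db-1:5432", "slave-db-2:5432"]

def route(query: str) -> str:
    """Send writes to the master & spread reads across the slaves."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    if is_write or not SLAVES:
        return MASTER             # writes (or no healthy slave) go to the master
    return random.choice(SLAVES)  # reads are load-balanced across the slaves
```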
Sweet, let's now improve the load/response time by adding a cache & shifting static content
to a CDN.
Cache
A cache is temporary storage for frequently accessed data or the results of expensive
computations.
In our web application, every time a web page is loaded, expensive queries are sent to the
database. We can mitigate this using a cache.
Cache tier
The cache tier is a temporary storage layer from which results are fetched much faster
than from the database. It can also be scaled independently from the database.
The example above is a read-through cache - the server first checks if the data is available in the cache. If
not, the data is fetched from the database & stored in the cache.
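A minimal sketch of that read-through logic, assuming an in-memory dict as a stand-in for a cache server (eg Memcached/Redis) & a hypothetical db.query_user call:

```python
import time

cache = {}          # in-memory stand-in for a cache server such as Memcached/Redis
CACHE_TTL = 3600    # how long a cached entry stays valid, in seconds

def get_user(user_id, db):
    """Check the cache first; on a miss, read from the database & cache the result."""
    entry = cache.get(user_id)
    if entry and time.time() - entry["cached_at"] < CACHE_TTL:
        return entry["value"]                      # cache hit

    value = db.query_user(user_id)                 # cache miss: expensive DB query (hypothetical API)
    cache[user_id] = {"value": value, "cached_at": time.time()}
    return value
```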
Content delivery network (CDN)
A CDN is a network of geographically dispersed servers used to deliver static content (eg images, CSS, JS files).
The user tries fetching an image via a URL provided by the CDN, eg
https://mysite.cloudfront.net/logo.jpg
If the image is not in the cache, the CDN requests the file from the origin - eg web
server, S3 bucket, etc.
Origin returns the image to the CDN with an optional TTL (time to live) parameter,
which controls how long that static resource is to be cached.
Subsequent users fetch the image from the CDN without any requests reaching the
origin as long as it's within the TTL.
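As a sketch of how an origin could set that TTL, here's a minimal Python stdlib origin server returning the image with a Cache-Control header. The path, port & 24-hour TTL are arbitrary example values, and logo.jpg is assumed to exist locally:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class OriginHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/logo.jpg":
            with open("logo.jpg", "rb") as f:   # assumes logo.jpg exists next to the script
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            # The CDN caches this asset for 24 hours before asking the origin again.
            self.send_header("Cache-Control", "max-age=86400")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("", 8080), OriginHandler).serve_forever()
```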
Stateless web tier
To scale the web tier horizontally, we need to move state (eg user session data) out of it.
A good practice is to store session data in persistent storage such as a relational database
or a NoSQL database.
Stateful architecture
Stateful servers remember client data across different requests. Stateless servers don't.
In the above case, users are coupled to the server which stores their session data. If they make
a request to another server, it won't have access to the user's session.
This can be solved via sticky sessions, which most load balancers support, but it adds
overhead - adding/removing servers becomes much more challenging, which limits our options in
case of server failures.
Stateless architecture
In this scenario, servers don't store any user data themselves. Instead, they store it in a shared
data store, which all servers have access to.
This way, HTTP requests from users can be served by any web server.
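A minimal sketch of what that looks like in code, using a plain dict as a stand-in for the shared session store (in practice this would be a client for the relational database or NoSQL store; the function names are made up):

```python
import json
import uuid

# Stand-in for a shared data store client (eg a NoSQL or relational store);
# the dict-style API here is hypothetical.
session_store = {}

def create_session(user_id):
    """Any web server can create a session; the state lives in the shared store."""
    session_id = str(uuid.uuid4())
    session_store[session_id] = json.dumps({"user_id": user_id})
    return session_id

def handle_request(session_id):
    """Any web server can serve the request because no state is kept locally."""
    raw = session_store.get(session_id)
    return json.loads(raw) if raw else None
```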
Updated web application architecture:
The user session data store could either be a relational database or a NoSQL data store, which
is easier to scale for this kind of data. The next step in the app's evolution is supporting
multiple data centers.
Data centers
In the above example, clients are geo-routed to the nearest data center based on the IP
address.
In the event of an outage, we route all traffic to the healthy data center:
To achieve this multi-datacenter setup, there are several issues we need to address:
traffic redirection - tooling for correctly directing traffic to the right data center.
GeoDNS can be used in this case.
data synchronization - in case of failover, users from DC1 are routed to DC2. A challenge is
whether their data is available there; a common strategy is to replicate data across data centers.
test and deployment - automated deployment & testing is crucial to keep deployments
consistent across DCs.
To further scale the system, we need to decouple different system components so they can
scale independently.
Message queues
Message queues are durable components which enable asynchronous communication.
Basic architecture: producers publish messages to the queue, while consumers subscribe to it &
process the messages asynchronously.
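A minimal in-process sketch of that producer/consumer decoupling, using Python's queue module as a stand-in for a real broker (eg RabbitMQ, Kafka); the photo-resize job is just an example payload:

```python
import queue
import threading

# In-process stand-in for a durable message queue (a real system would use a
# broker such as RabbitMQ or Kafka).
photo_queue = queue.Queue()

def producer():
    """The web server publishes a job & returns immediately - it doesn't wait."""
    photo_queue.put({"photo_id": 42, "op": "resize"})

def consumer():
    """A worker picks up jobs asynchronously & can be scaled independently."""
    while True:
        job = photo_queue.get()
        print(f"processing {job}")
        photo_queue.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
photo_queue.join()   # wait until the published job has been processed
```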
Logging, metrics, automation
Logging - error logs can be emitted to a centralized data store, which can later be read by service
operators (see the logging sketch after this list).
Metrics - collecting various types of metrics helps us collect business insight &
monitor the health of the system.
Automation - investing in continuous integration such as automated build, test,
deployment can detect various problems early and also increases developer
productivity.
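As referenced above, a small sketch of emitting structured error logs with Python's standard logging module. In production the handler would ship logs to a central store rather than a local file, so the file name & format here are illustrative:

```python
import logging

# Illustrative setup: a real deployment would attach a handler that ships logs
# to a central store instead of writing to a local file.
logging.basicConfig(
    filename="app-errors.log",
    level=logging.ERROR,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("web-tier")

try:
    1 / 0
except ZeroDivisionError:
    logger.exception("failed to handle request")  # the stack trace is included automatically
```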
Updated system design:
Database scaling
There are two approaches to database scaling - vertical and horizontal.
Vertical scaling
Also known as scaling up, it means adding more physical resources to your database nodes -
CPU, RAM, HDD, etc. In Amazon RDS, for example, you can get a database node with 24
TB of RAM.
This kind of database can handle lots of data - eg Stack Overflow in 2013 had over 10 million monthly
unique visitors with a single database node.
However, vertical scaling has some serious drawbacks:
There are hardware limits to the amount of resources you can add to a node.
You still have a single point of failure.
The overall cost is high - powerful servers are expensive.
Horizontal scaling
Instead of adding bigger servers, you can add more of them:
Sharding is a type of database horizontal scaling which separates large data sets into smaller
ones. Each shard shares the same schema, but the actual data is different.
One way to shard the database is based on some key, which is distributed evenly across all
shards using the modulo operator:
Here's how the user data looks in this example:
The sharding key (aka partition key) is the most important factor to consider when using
sharding. In particular, the key should be chosen in a way that distributes the data as evenly
as possible.
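A minimal sketch of the modulo approach above, assuming 4 shards keyed by user_id (the shard names & count are illustrative):

```python
NUM_SHARDS = 4

# Illustrative shard connection names - one database per shard, all sharing the same schema.
SHARDS = [f"user-db-shard-{i}" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> str:
    """Route a user to a shard via user_id % 4 - each shard holds a different slice of the users."""
    return SHARDS[user_id % NUM_SHARDS]

# Example: user 18 -> shard 2, user 21 -> shard 1
print(shard_for(18), shard_for(21))
```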
Sharding introduces several challenges:
Resharding data - needed if a single shard grows too big. This can happen
rather quickly if data is distributed unevenly. Consistent hashing helps to avoid
moving too much data around.
Celebrity problem (aka hotspot) - one shard could be accessed much more frequently
than others and can lead to server overload. We may have to resort to using separate
shards for certain celebrities.
Join and de-normalization - it is hard to perform join operations across shards. A
common workaround is to de-normalize your tables to avoid making joins.
Here's how our application architecture looks after introducing sharding and a NoSQL
database for some of the non-relational data:
Millions of users and beyond
Scaling a system is iterative.
What we've learned so far can get us far, but we might need to apply even more sophisticated
techniques to scale the application beyond millions of users.
The techniques we saw so far can offer a good foundation to start from.
Here's a summary of how we scaled our system to support millions of users:
Keep the web tier stateless.
Build redundancy at every tier.
Cache data as much as you can.
Support multiple data centers.
Host static assets on a CDN.
Scale your data tier via sharding.
Split tiers into individual services.
Monitor your system & use automation tools.