System_Design_Notes_1664811186
NOTES
Latency:
How long it takes for data to get from one point in a system to another point in the system.
When we talk about latency, we might be referring to several different parts of a system:
o Network Request
o Memory
o Disk: HDD or SSD
An important fact to know: different parts of a system will have different latencies.
o Reading data from Memory will be faster than reading data from Disk
o Network call latency increases as distance increases.
When designing systems, you typically want to optimize them by lowering the
overall latencies of the system.
Some systems really care about low latency:
o Video games
o Video conferencing
Some systems care less about low latency and more about other properties:
o Websites that prioritize accurate information and never being down
Throughput:
How much work a machine can perform in a given period of time.
What we are actually referring to here is how much data can be transferred from one
point of a system to another in a given amount of time.
Throughput is typically measured in bits per second: kilobits per second (Kbps),
megabits per second (Mbps), or gigabits per second (Gbps).
For example, a network link of 1 Gbps.
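As a quick worked example (the link speed and file size here are made-up numbers), note that throughput is quoted in bits while file sizes are usually in bytes:

```python
# Hypothetical numbers: how long to move a 10 GB file over a 1 Gbps link.
# Note 1 byte = 8 bits, so 1 GB (gigabyte) = 8 Gb (gigabits).
link_gbps = 1
file_gigabytes = 10

transfer_seconds = file_gigabytes * 8 / link_gbps
print(f"{transfer_seconds:.0f} seconds")  # 80 seconds
```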
Availability:
How resistant a system is to failures.
If part of it fails, does your system go down completely, or is it still operational?
Think about availability as the percentage of time, in a given period such as a month
or a year, that your service is operational enough that all of its primary functions are
satisfied.
There are varying degrees of availability that you might expect from different
systems:
o YouTube
o Airplane systems
o Cloud providers
We typically measure availability as the percentage of a system's uptime in a
given year.
If a system is up and operational for half of an entire year, then the system has 50%
availability.
In practice, you can imagine that 50% availability would be really, really bad for
most services.
Percentages can be pretty deceptive, because even an availability of 90% isn't really
great: it amounts to an outage of around 36 days out of the year.
Nines:
We often measure availability not in plain percentages but rather in what we call nines.
Nines are effectively percentages, specifically percentages made up of the
number nine.
If you have a system that has 99% availability, then, in the industry, we say that
your system has two nines of availability.
If it has 99.99%, then we say it has four nines of availability.
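These percentages translate into concrete downtime budgets; a small sketch (the per-year arithmetic is standard, not from the notes):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def downtime_days_per_year(availability_percent):
    """Days of downtime per year that a given availability percentage allows."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100) / 86400

for pct in (90.0, 99.0, 99.9, 99.99):
    print(f"{pct}% availability -> {downtime_days_per_year(pct):.2f} days down per year")
```

This confirms the earlier figure: 90% availability allows roughly 36.5 days of outage per year, while four nines allows under an hour.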
Load Balancer:
A load balancer is a server that sits in between your clients and your
servers and, as its name suggests, has the job of balancing workloads
across resources.
The people in charge of the system configure the load balancer and the
servers to know about each other.
When you add a new server, it registers itself with the load balancer; when you
remove an old server, it deregisters itself from the load balancer.
Server Selection Strategy:
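The notes don't elaborate here; one common selection strategy is round-robin, where the load balancer simply cycles through its servers in order (the server names below are hypothetical):

```python
import itertools

class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick_server(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["server-A", "server-B", "server-C"])
picks = [lb.pick_server() for _ in range(6)]
print(picks)  # ['server-A', 'server-B', 'server-C', 'server-A', 'server-B', 'server-C']
```

Other common strategies include random selection, weighted round-robin, and hashing-based selection (hashing is covered later in these notes).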
Proxies:
Forward Proxy (generally referred to as just a proxy):
A forward proxy is a server that sits in between a client (or a set of clients) and a
server (or set of servers).
More specifically, a forward proxy is a server that acts on behalf of the client
or clients.
Reverse Proxy:
Reverse proxies act on behalf of a server in an interaction between a client and
a server.
A popular example is Nginx.
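As an illustration, a minimal (hypothetical) Nginx configuration acting as a reverse proxy, forwarding incoming requests to an application server on the same machine:

```
server {
    listen 80;
    location / {
        # Clients talk to Nginx; Nginx talks to the real server on port 8080.
        proxy_pass http://127.0.0.1:8080;
    }
}
```

From the client's point of view, Nginx is the server; the actual application server behind it is never exposed directly.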
Caching:
Caching is the process of storing copies of files in a cache, or temporary
storage location, so that they can be accessed more quickly.
The data in a cache is generally stored in fast access hardware such as RAM
(Random-access memory)
A cache's primary purpose is to increase data retrieval performance by
reducing the need to access the underlying, slower storage layer.
Caching is used to improve (i.e., reduce) the latency of a system,
and you can use caching in a bunch of different places in a system.
Browser Cache ( Client Level ):
When a user visits a new website, their browser needs to download data to load
and display the content on the page.
To speed up this process the next time a user visits the site, browsers cache the
content on the page and save a copy of it.
As a result, the next time the user goes to that website, the content is already
stored on their device and the page will load faster.
Server Level:
Suppose a client interacts with the server 20 times to get a piece of data; the server
doesn't always need to go to the database to retrieve that data.
Maybe the server only needs to go to the database once, and we can have some form of
cache here at the server level (in memory).
You could also have a cache in-between two components in a system. So maybe
you could have a cache in-between a server and a database.
The first instance where caching is going to be really helpful is if you're doing a lot
of network requests and you want to avoid repeating them.
Another instance where caching is very helpful is when you're doing some
computationally expensive operation.
Suppose that at the server level you perform some very long algorithm, maybe an
algorithm with poor time complexity.
Cache that result, because you don't want to be performing that expensive operation
multiple times.
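A minimal sketch of an in-memory, server-level cache for an expensive computation, using Python's functools.lru_cache (the slow_report function is a made-up stand-in for a slow query or algorithm):

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def slow_report(user_id):
    # Stand-in for a slow database query or an expensive algorithm.
    time.sleep(0.1)
    return f"report-for-{user_id}"

slow_report(42)  # first call: actually computed
slow_report(42)  # second call: served from the in-memory cache
print(slow_report.cache_info())  # one miss (the computation), one hit (the cache)
```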
Stale Cache:
Sometimes, for certain parts of our system, or rather certain features, we might
not actually care that much about the staleness of the data in our
caches.
As an example, let's take view count on YouTube videos.
If one user sees a slightly stale version of a view count on a video, that's probably
not going to be the end of the world.
Hashing:
Hashing is an action that you can perform to transform an arbitrary piece of data into
a fixed size value, typically an integer value.
In the context of systems design interview, that arbitrary piece of data can be an IP
address, it can be a username, it can be an HTTP request, anything that can be
hashed or transformed into an integer value.
A few hashing functions and algorithms used in industry:
o MD5
o SHA-256
o Bcrypt
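A small sketch of hashing an arbitrary string to a fixed-size integer, here with SHA-256 from Python's standard library (Bcrypt has different trade-offs and is designed specifically for passwords):

```python
import hashlib

def hash_to_int(data: str) -> int:
    """Transform an arbitrary string (IP address, username, ...) into an integer."""
    return int(hashlib.sha256(data.encode("utf-8")).hexdigest(), 16)

# The same input always yields the same integer, which can then be
# mapped onto a fixed number of buckets or servers.
print(hash_to_int("192.168.0.1") % 4)
```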
Simple Hashing Example or Why Consistent Hashing?
With hashing, we can hash the requests that come in to the load balancer,
and then, based on each hash, send the request to a server according to the
positions of the servers.
So let's walk through this example and for the sake of simplicity what we're going
to do here is we're just going to hash the names of our clients.
Our goal is to get every client to have all of its requests rerouted to the same server.
We are going to hash the client's names themselves that is C1, C2, C3, C4.
Let's assume that we've been given a hashing function, and that when we
pass C1, C2, C3, and C4 through it, we get the following results: 11 for C1, 12 for
C2, 13 for C3, and 14 for C4.
Remember that a hashing function transforms your arbitrary pieces of data into
some fixed-size value, typically an integer value.
For now we will use the simplest hashing strategy in the context of
systems design interviews:
we will mod these hashes by the number of servers that we have.
11%4 = 3, 12%4 = 0, 13%4 = 1, 14%4= 2
Now we have the numbers corresponding to the four servers that
these four clients should be associated with:
C1 -> D (11 % 4 = 3), C2 -> A (12 % 4 = 0), C3 -> B (13 % 4 = 1), C4 -> C (14 % 4 = 2)
If we keep modding our hashes for our clients by four, then all of our requests are
always only going to go to servers A, B, C, and D. They're never going to go to E.
If we add a new server, we have to change some logic here. Namely, we have to
mod our hashes by the new number of servers.
11%5 = 1 , 12%5 = 2, 13%5 =3, 14%5 =4
Previous values: 11%4 = 3. 12%4 = 0. 13%4 = 1, 14%4= 2
When we mod our hashes by five instead of four, we get completely different
results for which servers our clients are going to be rerouted to.
Our very simple hashing strategy of hashing our clients' or requests' IP addresses,
and then modding the hashes by the number of servers to figure out which server to
reroute things to, just doesn't work once servers are added or removed.
All of the in-memory caches that you may have had in your system are no longer
nearly as useful, because requests now land on servers that haven't cached their data.
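The reshuffling described above can be checked directly with the hash values from the example (11 through 14):

```python
hashes = {"C1": 11, "C2": 12, "C3": 13, "C4": 14}

with_four = {c: h % 4 for c, h in hashes.items()}
with_five = {c: h % 5 for c, h in hashes.items()}

print(with_four)  # {'C1': 3, 'C2': 0, 'C3': 1, 'C4': 2}
print(with_five)  # {'C1': 1, 'C2': 2, 'C3': 3, 'C4': 4}

# Adding a single server changed the assignment of every client.
moved = [c for c in hashes if with_four[c] != with_five[c]]
print(moved)  # ['C1', 'C2', 'C3', 'C4']
```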
Consistent hashing to the rescue.
Consistent Hashing:
Consistent hashing is a distributed hashing scheme that operates independently of
the number of servers or objects.
We assign each server and object a position on an abstract circle, or hash ring.
Hash Ring:
The number of locations is no longer fixed; the ring is considered to have an
infinite number of points, and the server nodes can be placed at random locations on
this ring.
Choosing these random positions can again be done using a hash function.
The step of modding by the number of available locations is skipped, as that number is
no longer finite.
We map the hash output range onto the edge of a circle.
That means that the minimum possible hash value, zero, corresponds to an
angle of zero degrees, the maximum possible value (some big integer we'll call
INT_MAX) corresponds to 360 degrees, and all other hash values fit linearly
somewhere in between.
The way that you place these servers here on the circle is by putting them through a
hashing function.
We pass the server names into a hashing function; you get a value and, depending on
the value, you position each server on the circle.
If the hashing function used is a good one, with that uniformity property,
then the servers will be roughly evenly distributed around the circle.
The exact same thing happens with your clients: they go through a hashing
function, and then you position them on the circle.
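A toy sketch of such a ring (a 360-point ring for readability; real implementations use the full hash range and usually place several "virtual nodes" per server). The routing convention assumed here is that a client walks clockwise to the first server at or after its position:

```python
import bisect
import hashlib

RING_SIZE = 360  # toy ring; think of each point as a degree on the circle

def ring_position(name: str) -> int:
    """Hash a server or client name to an angle on the ring."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, servers):
        # Each server lands on the circle at its hashed position.
        self.ring = sorted((ring_position(s), s) for s in servers)

    def server_for(self, client: str) -> str:
        # Walk clockwise from the client's position to the next server,
        # wrapping around to the start of the ring past 360 degrees.
        positions = [pos for pos, _ in self.ring]
        idx = bisect.bisect_left(positions, ring_position(client)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["server-A", "server-B", "server-C", "server-D"])
print(ring.server_for("C1"))  # C1 always maps to the same server
```

The payoff over the mod-N scheme: adding or removing one server only moves the clients that fall in that server's arc of the circle, not every client.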
Replication:
Replication is the process of storing the same data in multiple locations to improve
data availability and accessibility, and to improve system resilience and reliability.
The idea behind replication is that you have a duplicate version of your main
database, a replica of the main database.
The main database handles all of the reads and writes coming to it, but it also
updates the replica such that the replica is effectively the same as the main
database.
The replica can take over if the main database fails.
o So when the main database goes down, the replica takes over and now
becomes the new main database
Once the original main database comes back up, it gets updated by the replica, and
then eventually they can swap roles.
o This is one possible use case.
Or maybe your main database server is getting overloaded; then you can split up
some of your traffic (typically reads) between replicas, hence increasing throughput.
In order for this to work, your replica needs to always be exactly up to date with
the main database.
Whenever someone writes to or updates the main database, that update needs to also
happen in the replica.
If the write operation fails on the replica, there's an issue, and the write operation
should not complete on the main database.
In this scenario where you want your replica to be able to take over for your main
database in the event of database failure, you never want the replica to be out of
date with the main database.
This means that your write operations are going to take a little bit longer, because
they have to be done both on the main database and on the replica.
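The rule above, that a write must succeed on both copies or on neither, can be sketched with two toy in-memory stores (the class names and failure flag are made up for illustration):

```python
class Replica:
    """Toy replica that can be taken down to simulate a failure."""

    def __init__(self):
        self.data = {}
        self.healthy = True

    def write(self, key, value):
        if not self.healthy:
            raise RuntimeError("replica is down")
        self.data[key] = value

class MainDatabase:
    """Toy main database with synchronous replication."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        # Apply to the replica first; only commit locally if that succeeds,
        # so the two copies can never diverge.
        self.replica.write(key, value)
        self.data[key] = value

replica = Replica()
db = MainDatabase(replica)
db.write("user:1", "subscribed")      # succeeds on both copies

replica.healthy = False
try:
    db.write("user:2", "subscribed")  # fails: nothing is written anywhere
except RuntimeError:
    pass
print("user:2" in db.data)  # False
```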
Benefits of data replication:
Improved reliability and availability: If one system goes down due to faulty
hardware, malware attack, or another problem, the data can be accessed from a
different server.
Improved network performance: Having the same data in multiple locations can
lower data access latency, since required data can be retrieved closer to where the
transaction is executing.
Issues:
What if our system is one that's got tons of data, where we've got over a billion
users? Do we really want to have all of that data replicated across a bunch of
different databases? Maybe not.
Keeping copies of the same data in multiple locations leads to higher storage and
processor costs.
Maintaining consistency across data copies requires new procedures and adds
traffic to the network hence increased bandwidth consumption.
Sharding:
Sharding is a database architecture where we separate a table's rows into multiple
different tables, known as partitions.
Each partition has the same schema and columns, but entirely different rows.
One part of the data would be stored in one database server, another part of the data
would be stored in another database server, and so on.
We split up the main database into a bunch of smaller databases, which are called
shards or data partitions.
Key Based Sharding:
Key-based sharding is also known as hash-based sharding.
It involves taking a value from newly written data, such as a customer's ID
number, a client application's IP address, a ZIP code, etc.,
and plugging it into a hash function to determine which shard the data should go to.
To ensure that entries are placed in the correct shards in a consistent manner,
the values are always passed through the same hash function.
The main appeal of this strategy is that it can be used to evenly distribute data so as
to prevent hotspots.
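A minimal sketch of key-based sharding, assuming the shard key is a customer ID string and a fixed shard count; a uniform hash spreads many keys roughly evenly:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def shard_for(shard_key: str) -> int:
    """Pass the shard key through a hash function to pick a shard."""
    return int(hashlib.md5(shard_key.encode()).hexdigest(), 16) % NUM_SHARDS

# The same key always lands on the same shard, and many keys
# spread roughly evenly across all shards (avoiding hotspots).
counts = Counter(shard_for(f"customer-{i}") for i in range(10_000))
print(counts)
```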
Hotspots:
The database hotspot problem arises when one shard is accessed far more than
all the other shards.
In this case, any benefits of sharding the database are cancelled out by
the slowdowns and crashes.
Drawback of Key based Sharding:
It's challenging to dynamically add or remove a database server.
Every time this happens, we need to re-shard the database, which means we need to
update the hash function and rebalance the data.
If a database server goes down, consistent hashing alone will not help; we will
probably need a replica of each shard.
Range Based Sharding:
In range-based sharding, the shard is chosen on the basis of the range of a shard
key.
Let’s say we have a recommender system that stores all the information about a
user and recommends user movies based on their age.
Range-based sharding is easy to implement, as we just need to check the range in
which our current data falls and insert/read data from the shard corresponding to
that range.
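Continuing the age-based recommender example, a sketch with made-up age boundaries:

```python
import bisect

# Hypothetical boundaries: shard 0 holds ages below 18,
# shard 1 holds ages 18-39, shard 2 holds ages 40 and up.
AGE_BOUNDARIES = [18, 40]

def shard_for_age(age: int) -> int:
    # Find which range the age falls into; that index is the shard number.
    return bisect.bisect_right(AGE_BOUNDARIES, age)

print(shard_for_age(12), shard_for_age(25), shard_for_age(70))  # 0 1 2
```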
Drawback:
The major drawback of this technique is that if our data is unevenly
distributed, again it can lead to database hotspots.
Directory-Based Sharding:
In directory-based sharding, we have a lookup table.
It stores the shard key, keeping track of which shard stores which entry.
To read or write data, we first consult the lookup table to find the
shard number for the corresponding data using the shard key, and then visit
that particular shard to perform the operation.
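A sketch of the lookup-table flow described above (the table contents and shard count are made up):

```python
# Hypothetical lookup table: shard key -> shard number.
LOOKUP_TABLE = {"user-a": 0, "user-b": 2, "user-c": 1}

SHARDS = [{}, {}, {}]  # three toy shards

def write(shard_key, value):
    # First consult the lookup table, then visit that shard.
    SHARDS[LOOKUP_TABLE[shard_key]][shard_key] = value

def read(shard_key):
    return SHARDS[LOOKUP_TABLE[shard_key]].get(shard_key)

write("user-b", "profile-data")
print(read("user-b"))  # profile-data
print(SHARDS[2])       # the entry landed on shard 2
```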
Drawback:
The main issue with directory-based sharding is that we need to consult the lookup
table before every read and write query, which can impact application performance.
Also, the lookup table is a potential single point of failure.
Geo-Based sharding:
In Geo-based sharding, the data is processed by a shard corresponding to the user
region or location.
The obvious problem we will face using this strategy is that if the majority of
users are from one pin code, city, or country, then we will have hotspots.
Leader Election:
Why Leader Election?
Imagine that you're designing a system for a product that allows users to
subscribe to the product on a recurring basis.
You can think of Netflix, or of Amazon Prime, where users can subscribe on
monthly or annual basis.
You will have a database in which you're going to store information about
user subscriptions.
You might store whether or not a user is currently subscribed to the service
that you're offering.
You might store the date at which point the user's subscription is supposed to
renew.
You might store the price that the user should be charged on a recurring
basis.
And then, you would be using a third party service that would be the service
actually taking care of charging the users, or debiting funds from their bank
accounts.
Suppose your third-party service is PayPal or Stripe.
And of course, this means that your third party service needs to somehow
communicate with your database.
Because your third party service needs to know when a user should be charged
again, how much they should be charged, etc.
You don’t want to have this third party service actually interact with your database
directly.
Your database is a pretty sensitive part of your system.
It contains important information, and you may not want some seemingly
random third-party service to connect directly to it.
So, a reasonable thing to do, would be to create a service in the middle, between
the third party service and the database.
And this service would be in charge of talking to the database, maybe on a periodic
basis.
This new service will figure out when a certain user's subscription is going to renew
and how much that user needs to be charged.
And then the new service is going to go to the third party service, to PayPal or to
Stripe, and actually tell the third party service to charge the user.
Leader Election:
Leader election, as the name suggests: if you have a group of machines or a group of
servers that are in charge of doing the same thing, then instead of having all of them
do that same thing, one machine is elected as the leader and performs the actions.
The leader is going to be the one performing the business logic, or whatever needs
to be done.
As in our previous example, you definitely don't want to make a charge request for a
given user multiple times.
So in our case, we've got five servers that were all effectively responsible for this
business logic.
But instead of having all of them do business logic, they are going to elect a leader
amongst themselves.
And the leader is going to be the only server responsible for doing the business
logic. The four other servers in the example just sit there, on standby, in
case something happens to the leader.
And if something happens to the leader, then one of the other servers is going to
become the new leader.
A new leader is going to be elected, and one of these other servers is going to
become that leader, and is going to take over.
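A deliberately simplified election among the standby servers: the alive server with the lowest id wins. (Real systems delegate this to a consensus service such as ZooKeeper or etcd rather than electing this naively; the classes below are made up for illustration.)

```python
class Server:
    def __init__(self, server_id):
        self.server_id = server_id
        self.alive = True

def elect_leader(servers):
    """Toy election: the alive server with the lowest id becomes the leader."""
    alive = [s for s in servers if s.alive]
    return min(alive, key=lambda s: s.server_id)

cluster = [Server(i) for i in range(5)]
print(elect_leader(cluster).server_id)  # 0

cluster[0].alive = False                # the leader fails...
print(elect_leader(cluster).server_id)  # ...and server 1 takes over: 1
```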