HLD_Interview
Let's assume you have many terabytes (TB) of data and you want to allow users to
access small portions of that data at random. This is similar to locating an image file
somewhere on the file server in the image application example.
This is particularly challenging because it can be very costly to load TBs of data into
memory; this directly translates to disk IO. Reading from disk is many times slower
than from memory.
Thankfully there are many options that you can employ to make this easier; four of
the more important ones are caches, proxies, indexes and load balancers.
Caches
Caches take advantage of the locality of reference principle: recently requested data
is likely to be requested again. They are used in almost every layer of computing:
hardware, operating systems, web browsers, web applications and more. A cache is
like short-term memory: it has a limited amount of space, but is typically faster than
the original data source and contains the most recently accessed items. Caches can
exist at all levels in architecture, but are often found at the level nearest to the front
end, where they are implemented to return data quickly without taxing downstream
levels.
Global Cache:
all the nodes use the same single cache space. This involves adding a
server, or file store of some sort, faster than your original store and
accessible by all the request layer nodes. Each of the request nodes
queries the cache in the same way it would a local one. This kind of
caching scheme can get a bit complicated because it is very easy to
overwhelm a single cache as the number of clients and requests increase,
but is very effective in some architectures (particularly ones with
specialized hardware that make this global cache very fast, or that have a
fixed dataset that needs to be cached).
There are two common forms of global caches depicted in the diagrams. In Figure
1.10, when a cached response is not found in the cache, the cache itself becomes
responsible for retrieving the missing piece of data from the underlying store.
The majority of applications leveraging global caches tend to use
the first type, where the cache itself manages eviction and fetching
data to prevent a flood of requests for the same data from the
clients.
However, there are some cases where the implementation in Figure 1.11
makes more sense. For example, if the cache is being used for very
large files, a low cache hit percentage would cause the cache buffer
to become overwhelmed with cache misses; in this situation it helps
to have a large percentage of the total data set (or hot data set) in
the cache.
A shared distributed cache for all services to store user session data, ensuring every service
can retrieve the user's session state when needed.
When you log into Facebook and load your news feed:
1. The server retrieves your user profile, preferences, and recent activities.
2. This data is fetched from the distributed cache (Memcached), pulling pieces from different
servers.
3. Intermediate computations (e.g., which posts to show you) are sped up by language-level
caching like APC or $GLOBALS.
The result is delivered to you almost instantly because the caching system avoids unnecessary
work and fetches data efficiently.
Proxies
Imagine many users or clients request the same data from a server around the same
time (e.g., fetching a popular image, video, or API response).
Without optimization: the proxy or server must handle each request individually. This leads to
redundant work, as the same data is fetched multiple times from the backend servers. It increases
server load, network traffic, and response time.
Common Uses of Collapsed Forwarding
Content Delivery Networks (CDNs): CDNs use collapsed forwarding to reduce backend load
when serving popular content (e.g., videos, images, or webpages).
Caching Proxies: Proxies like Varnish or Squid implement this to optimize responses for
repeated API calls or database queries.
Distributed Systems: Systems with multiple microservices can use collapsed forwarding to
reduce internal communication overhead.
There is some cost associated with this design, since each request
can have slightly higher latency, and some requests may be slightly
delayed to be grouped with similar ones. But it will improve
performance in high load situations, particularly when that same
data is requested over and over. This is similar to a cache, but
instead of storing the data/document like a cache, it is optimizing
the requests or calls for those documents and acting as a proxy for
those clients.
It is worth noting that you can use proxies and caches together, but
generally it is best to put the cache in front of the proxy. This is
because the cache serves data from memory, it is very fast, and
it doesn't mind multiple requests for the same result. But if the
cache were located on the other side of the proxy server, then there
would be additional latency with every request before the cache,
and this could hinder performance.
Payload refers to the actual piece of data that you want to retrieve or work with.
Think of it as the useful or meaningful portion of the data you're searching for, as
opposed to all the surrounding data in the larger dataset.
The payload is the small, specific piece of data that you're interested in, such as:
Without Indexing
Nginx uses round robin by default. To change it, we can simply add the
required algorithm (e.g., ip_hash) in the upstream block.
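For illustration, a minimal sketch of such an upstream block (the backend1/backend2 server names are placeholders):

upstream backend {
    ip_hash;                      # route by client IP instead of round robin
    server backend1.example.com;
    server backend2.example.com;
}
server {
    listen 80;
    location / {
        proxy_pass http://backend;  # forward requests to the upstream group
    }
}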
Scalability:
Decoupling: The session data is decoupled from the individual servers, allowing you to add or
remove servers without impacting the user's session.
Load Distribution: Since session data isn't tied to a specific server, the load balancer can
route users to any available server, distributing traffic evenly and enabling the system to
handle higher user volumes.
However, there is not necessarily more than one storage place for session data in
distributed session management. The centralized session store can consist of:
1. A single centralized database (though this might create a bottleneck or single point of failure
unless replicated).
2. A distributed database or caching system, where data is spread across multiple nodes to
ensure both scalability and fault tolerance.
To avoid this synchronous waiting and the risk of server failure affecting clients,
abstraction is needed. This means separating or decoupling the client’s request from
the actual work that needs to be done to fulfill that request. Here’s how it works:
Asynchronous Processing: Instead of making the client wait for the server to
complete the work, the server can handle the request asynchronously. This
means the server tells the client that it’s accepted the request and will process
it in the background, without the client needing to wait. The client can
continue doing other tasks, like browsing other products, while the server
works on the original request. When the work is done, the client is notified
with the results.
Queueing Work: Rather than each server immediately processing the client’s
request, a message queue can be used. When the client sends a request, it’s
placed in a queue. The work is picked up by available servers as they become
free. This avoids overloading any single server, and servers can handle
requests in a more balanced way.
Failover Systems: By abstracting the client request from the specific server
handling it, the system can ensure that if one server fails, another server can
take over the work. This ensures fault tolerance and improves reliability. The
client doesn’t have to know which server is processing the request, making the
system more resilient to server failures.
Abstraction between the client and the server’s work, such as
asynchronous processing, message queues, and failover systems,
helps to improve performance, distribute the load fairly, and ensure
reliability.
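As a rough illustration, here is a minimal sketch of the queueing idea using Python's standard library; the in-process queue stands in for a real broker such as SQS, RabbitMQ, or Kafka, and the sleep simulates slow work:

import queue
import threading
import time

work_queue = queue.Queue()

def worker():
    while True:
        request = work_queue.get()      # block until a job is available
        time.sleep(1)                   # simulate the slow work
        print(f"finished processing {request}")
        work_queue.task_done()

# a small pool of workers picks up jobs as they become free
for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

# the "client" enqueues requests and immediately moves on
for order_id in ("order-1", "order-2", "order-3"):
    work_queue.put(order_id)
    print(f"accepted {order_id}, processing in background")

work_queue.join()                       # demo only: wait for all jobs to finish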
Services:
Reads will typically be served from cache, and writes will have to
go to disk eventually. Even if everything is in memory or read from
fast disks (like SSDs), database writes will almost always be slower
than reads.
In the solution where we split out read and write services separately
it is easier to scale hardware based on actual usage (the number of
reads and writes across the whole system).
Redundancy:
If there is a core piece of functionality for an application, ensuring that multiple
copies or versions are running simultaneously can secure against the failure of a
single node.
Creating redundancy in a system can remove single points of failure and provide
a backup or spare functionality if needed in a crisis. For example, if there are two
instances of the same service running in production, and one fails or degrades,
the system can failover to the healthy copy. Failover can happen automatically or
require manual intervention.
In an active-active cluster, each node handles roughly half of the load in normal
operation, although each node must be able to handle the entire load alone. This
also means that a node failure can degrade performance if one node in the
active-active configuration consistently handles more than half of the load.
For example, in our image server application, all images would have
redundant copies on another piece of hardware somewhere (ideally
in a different geographic location in the event of a catastrophe like
an earthquake or fire in the data center), and the services to access
the images would be redundant.
In our image server example, it is possible that the single file server
used to store images could be replaced by multiple file servers,
each containing its own unique set of images. (See figure below)
Such an architecture would allow the system to fill each file server
with images, adding additional servers as the disks become full. The
design would require a naming scheme that tied an image's
filename to the server containing it. An image's name could be
formed from a consistent hashing scheme mapped across the
servers. Or alternatively, each image could be assigned an
incremental ID, so that when a client makes a request for an image,
the image retrieval service only needs to maintain the range of IDs
that are mapped to each of the servers (like an index).
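A tiny sketch of the incremental-ID alternative (the ID ranges and server names below are made up for illustration):

import bisect

# upper bound (inclusive) of the image-ID range stored on each file server
range_upper_bounds = [10_000, 20_000, 30_000]
servers = ["file-server-1", "file-server-2", "file-server-3"]

def server_for_image(image_id: int) -> str:
    # find the first range whose upper bound covers this image ID
    idx = bisect.bisect_left(range_upper_bounds, image_id)
    return servers[idx]

print(server_for_image(12_345))   # -> file-server-2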
https://www.hellointerview.com/learn/system-design/in-a-hurry/delivery
Requirements
Many of these systems have hundreds of features, but it's your job to identify and prioritize the top 3. Having a long
list of requirements will hurt you more than it will help you and many top FAANGs directly evaluate you on your ability
to focus on what matters
2) Non-functional requirements are statements about the system qualities that are important
to your users. These can be phrased as "The system should be able to..." or "The system
should be..." statements.
It's important that non-functional requirements are put in the context of the system and,
where possible, are quantified. For example, "the system should be low latency" is
obvious and not very meaningful—nearly all systems should be low latency. "The system
should have low latency search, < 500ms," is much more useful.
Here is a checklist of things(non functional requirements) to consider that might help you
identify the most important non-functional requirements for your system. You'll want to
identify the top 3-5 that are most relevant to your system.
Our suggestion is to explain to the interviewer that you would like to skip estimations upfront
and that you will do the math while designing, when and if necessary. Perform calculations only
if they will directly influence your design.
When would it be necessary? Imagine you are designing a TopK system for trending topics in FB
posts. You would want to estimate the number of topics you would expect to see, as this will
influence whether you can use a single instance of a data structure like a min-heap or if you need
to shard it across multiple instances, which will have a big impact on your design.
Core Entities: These are the core entities that your API will exchange and that your
system will persist in a Data Model.
User
Tweet
Follow
Aim to choose good names for your entities
It's incredibly common for candidates to start layering on complexity too early, resulting in them never arriving at a
complete solution. Focus on a relatively simple design that meets the core functional requirements, and then layer on
complexity to satisfy the non-functional requirements in your deep dives section. It's natural to identify areas where
you can add complexity, like caches or message queues, while in the high-level design. We encourage you to note
these areas with a simple verbal callout and written note, and then move on.
As you're drawing your design, you should be talking through your thought process with your
interviewer. Be explicit about how data flows through the system and what state (either in
databases, caches, message queues, etc.) changes with each request, starting from API requests
and ending with the response. When your request reaches your database or persistence layer, it's
a great time to start documenting the relevant columns/fields for each entity. You can do this
directly next to your database visually. This helps keep it close to the relevant components and
makes it easy to evolve as you iterate on your design.
Deep Dives
(a) ensuring it meets all of your non-functional requirements, (b) addressing edge cases, (c)
identifying and addressing issues and bottlenecks, and (d) improving the design based on probes
from your interviewer.
So for example, one of our non-functional requirements for Twitter was that our system needs to
scale to >100M DAU. We could then lead a discussion oriented around horizontal scaling, the
introduction of caches, and database sharding -- updating our design as we go. Another was that
feeds need to be fetched with low latency. In the case of Twitter, this is actually the most
interesting problem. We'd lead a discussion about fanout-on-read vs fanout-on-write and the use
of caches.
Core Concepts:
Scaling:
If you can estimate your workload and determine that you can scale vertically for the
foreseeable future, this is often a better solution than horizontal scaling. Many
systems can scale vertically to a surprising degree.
The first challenge of horizontal scaling is getting the work to the right machine. This is often done
via a load balancer. For asynchronous jobs, this is often done via a queueing system.
However, in hybrid systems (real-time + asynchronous), you might see both load
balancers and queues used together!
Work distribution needs to try to keep load on the system as even as possible. For example, if
you're using a hash map to distribute work across a set of nodes, you might find that one node is
getting a disproportionate amount of work because of the distribution of incoming requests.
Data Distribution:
We can keep data in memory on the node that's processing the request, or in a database that's
shared across all nodes. Look for ways to partition your data such that a single node can access
the data it needs without needing to talk to another node. If you do need to talk to other nodes (a
concept known as "fan-out"), keep the number small.
A common antipattern is to have requests which fan out to many different nodes and then the
results are all gathered together. This "scatter gather" pattern can be problematic because it can
lead to a lot of network traffic, is sensitive to failures in each connection, and suffers from tail
latency issues.
How to mitigate Tail Latency:
Many candidates will try to make a decision on consistency across their entire system, but many systems will actually
blend strong and weak consistency in different parts of the system. For example, on an ecommerce site the item
details might be eventually consistent (if you update the description of an item, it's okay if it takes a few minutes for
the description to update on the product page) but the inventory count needs to be strongly consistent (if you sell an
item, it's not okay for another customer to buy the same unit).
Locks happen at every scale of computer systems: there are locks in your operating system
kernel, locks in your applications, locks in the database, and even distributed locks. Locks are
important for enforcing the correctness of our system but can be disastrous for performance.
Traditional databases with ACID properties use transaction locks to keep data consistent, which is
great for ensuring that while one user is updating a record, no one else can update it, but they're
not designed for longer-term locking. This is where distributed locks come in handy.
Distributed locks are perfect for situations where you need to lock something across different
systems or processes for a reasonable period of time. They're often implemented using a
distributed key-value store like Redis or Zookeeper. The basic idea is that you can use a key-value
store to store a lock and then use the atomicity of the key-value store to ensure that only one
process can acquire the lock at a time. For example, if you have a Redis instance with a key
ticket-123 and you want to lock it, you can set the value of ticket-123 to locked.
If another process tries to set the value of ticket-123 to locked, it will fail because the
value is already set to locked. Once the first process is done with the lock, it can set the value
of ticket-123 to unlocked and another process can acquire the lock.
Another handy feature of distributed locks is that they can be set to expire after a certain amount of
time. This is great for ensuring that locks don't get stuck in a locked state if a process crashes or is
killed. For example, if you set the value of ticket-123 to locked and then the process
crashes, the lock will expire after a certain amount of time (like after 10 minutes) and another
process can acquire the lock at that point.
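A minimal sketch of this idea with the redis-py client against a single Redis node (the Redlock algorithm described next extends it across several nodes); note that this sketch releases the lock by deleting the key rather than writing "unlocked":

import redis

r = redis.Redis()

def acquire_lock(key: str, ttl_seconds: int = 600) -> bool:
    # SET key "locked" NX EX ttl: succeeds only if the key does not already exist,
    # and expires automatically so a crashed process cannot hold the lock forever
    return r.set(key, "locked", nx=True, ex=ttl_seconds) is not None

def release_lock(key: str) -> None:
    r.delete(key)

if acquire_lock("ticket-123"):
    try:
        pass  # do the work that needs exclusive access
    finally:
        release_lock("ticket-123")
else:
    print("someone else holds ticket-123")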
Redlock is a distributed locking algorithm built on top of Redis,
designed to handle distributed locks in a fault-tolerant and
consistent way. The problem Redlock aims to solve is how to
ensure that a lock is acquired across a distributed system (where
multiple instances or nodes are involved) while avoiding race
conditions and ensuring that locks are properly released.
Why Use Multiple Redis Instances?
Fault tolerance: If one or more Redis nodes fail, Redlock still works as long
as the majority of the nodes are still available. This prevents the entire lock
mechanism from failing if a single Redis server goes down.
Consistency: Using multiple nodes helps to avoid scenarios where one node's
state is inconsistent due to network partitions or failures. The majority
consensus ensures that the lock is acquired safely.
When you horizontally scale data (distribute it across multiple servers):
1. If each server keeps a copy of the data, updates must be synchronized across all
servers.
2. Example: If Server A updates "Name: John," Server B's copy must also be updated to
"Name: John," or inconsistencies occur.
Transactions and distributed locks help solve these challenges, but they come with
trade-offs like added complexity and performance overhead.
If your system design problem involves geography, there's a good chance you have the option to partition by some
sort of REGION_ID.
Replication ensures that the user's data is available in both the US-East and EU-West
regions. Here's how it works:
The user's data is periodically copied from the primary database (e.g., in "US-East") to
secondary databases in other regions (e.g., "EU-West").
This allows the system in "EU-West" to serve the user's requests locally.
Distributed Cache
What is a distributed cache and when should you use it?
As the system gets bigger, the cache size also gets bigger and a single-
node cache often falls short when scaling to handle millions of users and
massive datasets.
In such scenarios, we need to distribute the cache data across multiple
servers. This is where distributed caching comes into play.
https://blog.algomaster.io/p/distributed-caching
The two primary options are using dedicated cache servers or co-
locating the cache with application servers.
Don't forget to be explicit about what data you are storing in the cache, including the data structure you're using.
Remember, modern caches have many different data structures you can leverage; they are not just simple key-value
stores. For example, if you are storing a list of events in your cache, you might want to use a sorted set so that you
can easily retrieve the most popular events. Many candidates will just say, "I'll store the events in a cache" and leave
it at that. This is a missed opportunity and may invite follow-up questions.
The two most common in-memory caches are Redis and Memcached. Redis is a key-value store
that supports many different data structures, including strings, hashes, lists, sets, sorted sets,
bitmaps, and hyperloglogs. Memcached is a simple key-value store that supports strings and
binary objects.
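For example, a sketch of the sorted-set idea with redis-py (the key name and popularity scores are made up):

import redis

r = redis.Redis()

# score each event by its popularity (e.g., view count) in a sorted set
r.zadd("popular-events", {"event:123": 1500, "event:456": 4200, "event:789": 900})

# fetch the two most popular events in a single call
top_events = r.zrevrange("popular-events", 0, 1, withscores=True)
print(top_events)   # [(b'event:456', 4200.0), (b'event:123', 1500.0)]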
A distributed cache is a system that stores data in memory across
multiple nodes (servers) in a distributed network, making it highly
available and scalable.
Consistency: Most real-life systems don't require strong consistency everywhere. For
example, a social media feed can be eventually consistent -- if you post a tweet, it's okay if it takes
a few seconds for your followers to see it. However, a banking system needs to be strongly
consistent -- if you transfer money from one account to another, it's not okay if the money
disappears from one account and doesn't appear in the other.
Locking: Locking is the process of ensuring that only one client can access a shared resource at
a time.
Technically, ElasticSearch is not a database in the same way MySQL or MongoDB is.
However:
It can store data (in JSON format), so some people use it as a NoSQL database.
Its primary purpose is search and indexing rather than managing transactional data (e.g.,
updating user account balances like traditional databases).
So, ElasticSearch is best thought of as a search layer or search engine for your
database, rather than a replacement for the database itself.
Secondary indexes are additional data structures created on top of your primary
database to enable faster searches. ElasticSearch can act as a secondary index for
databases by indexing your data and allowing for:
1. Full-text search: Searching large volumes of text very quickly (e.g., finding articles containing
"climate change").
2. Geospatial search: Finding nearby locations (e.g., restaurants within 5 km).
3. Vector search: Finding similar items (e.g., images or documents).
You've got two different categories of protocols to handle: internal and external.
Internally, for a typical microservice application, which constitutes 90%+ of system design
problems, either HTTP(S) or gRPC will do the job. Don't make things complicated.
Externally, you'll need to consider how your clients will communicate with your system:
who initiates the communication, what are the latency considerations, and how much
data needs to be sent.
Across choices, most systems can be built with a combination of HTTP(S), SSE or long
polling, and Websockets.
Use HTTP(S) for APIs with simple request and responses. Because
each request is stateless, you can scale your API horizontally by
placing it behind a load balancer. Make sure that your services aren't
assuming dependencies on the state of the client (e.g. sessions) and
you're good to go.
When a client makes a request and waits for data, the load
balancer simply routes the request to an available server. If that
server is processing a long-polling request, it continues to hold it
until it can respond with new data.
Lastly, Server Sent Events (SSE) are a great way to send updates
from the server to the client. They're similar to long polling, but
they're more efficient for unidirectional communication from the
server to the client. SSE allows the server to push updates to the
client whenever new data is available, without the client having to
make repeated requests as in long polling. This is achieved through a
single, long-lived HTTP connection, making it more suitable for
scenarios where the server frequently updates data that needs to be
sent to the client. Unlike Websockets, SSE is designed specifically
for server-to-client communication and does not support client-to-
server messaging. This makes SSE simpler to implement and
integrate into existing HTTP infrastructure, such as load balancers
and firewalls, without the need for special handling.
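A minimal SSE endpoint sketch, assuming Flask; the payload here is just a timestamp pushed once per second over a single long-lived HTTP connection:

import time
from flask import Flask, Response

app = Flask(__name__)

@app.route("/events")
def events():
    def stream():
        while True:
            # each SSE message is "data: ...\n\n"; the connection stays open
            yield f"data: {time.time()}\n\n"
            time.sleep(1)
    return Response(stream(), mimetype="text/event-stream")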
Statefulness is a major source of complexity for systems. Where possible, relegating your state to a
message broker or a database is a great way to simplify your system. This enables your services to be
stateless and horizontally scalable while still maintaining stateful communication with your clients.
For example, if you're building a system where every user has a dedicated
service instance, you might hash the user ID and route all messages for that
user to the same partition or queue. This way, the same service will process
those messages consistently.
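A sketch of that routing rule (the partition count is arbitrary):

import hashlib

NUM_PARTITIONS = 16

def partition_for(user_id: str) -> int:
    # use a stable hash (unlike Python's built-in hash()) so every producer agrees
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

print(partition_for("user-42"))   # the same user always maps to the same partition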
Queue
The most common queueing technologies are Kafka and SQS. Kafka is a distributed streaming
platform that can be used as a queue, while SQS is a fully managed queue service provided by AWS.
Sometimes you'll be asked a question that requires either processing vast amounts of data in real-time
or supporting complex processing scenarios, such as event sourcing.
Event sourcing is a technique where changes in application state are stored as a sequence of events. These events can
be replayed to reconstruct the application's state at any point in time, making it an effective strategy for systems that
require a detailed audit trail or the ability to reverse or replay transactions.
In either case, you'll likely want to use a stream. Unlike message queues, streams can retain data for a
configurable period of time, allowing consumers to read and re-read messages from the same position
or from a specified time in the past. Streams are a good choice
The most common stream technologies are Kafka and Kinesis. Kafka can be configured to be both a
message queue and a stream, while Kinesis is a fully managed stream service provided by AWS. Both
are great choices.
Security
Authentication/Authorization
In many systems you'll expose an API to external users which needs to be locked down to only
specific users. Delegating this work to either an API Gateway or a dedicated service like Auth0 is a
great way to ensure that you're not reinventing the wheel.
often it's sufficient to say "My API Gateway will handle authentication and authorization".
The most common API gateways are AWS API Gateway, Kong, and Apigee.
An API gateway accepts API requests from a client, processes them based on
defined policies, directs them to the appropriate services, and combines the
responses for a simplified user experience. Typically, it handles a request by
invoking multiple microservices and aggregating the results. It can also
translate between protocols in legacy deployments.
https://www.nginx.com/blog/how-do-i-choose-api-gateway-vs-ingress-controller-vs-service-mesh/
Backend API Response:
The backend API processes the request and sends a response back to the client. The API
Gateway may handle additional concerns like rate limiting, logging, or security headers.
Encryption:
You'll want to cover both the data in transit (e.g. via protocol encryption) and the data at rest (e.g.
via storage encryption). HTTPS is HTTP over SSL/TLS; it encrypts data in transit and is the
standard for web traffic. If you're using gRPC, it supports SSL/TLS out of the box. For data at rest,
you'll want to use a database that supports encryption or encrypt the data yourself before storing it.
For example, if you're building a system that stores user data, you might want to encrypt that data with a key that's
unique to each user. This way, even if your database is compromised, the data is still secure.
DATA PROTECTION:
A hacker could send a lot of requests to your endpoint, trying to
access information they shouldn't be able to. This is often called
"scraping," where someone collects data by sending many requests,
sometimes without permission.
MONITORING:
DATABASES:
the most common are relational databases (e.g. Postgres) and NoSQL databases (e.g.
DynamoDB) - we recommend you pick one of these for your interview. If you are taking
predominantly product design interviews, we recommend you pick a relational database. If you are
taking predominantly infrastructure design interviews, we recommend you pick a NoSQL database.
The great thing about relational databases is (a) their support for
arbitrarily many indexes, which allows you to optimize for different
queries and (b) their support for multi-column and specialized
indexes (e.g. geospatial indexes, full-text indexes.)
Blob Storage:
Blob storage services are simple: you upload a blob of data, it is stored, and you get
back a URL that you can later use to download the blob. Oftentimes blob storage
services work in conjunction with CDNs, so you can get fast downloads from anywhere in the
world. Upload a file/blob to blob storage, which will act as your origin, and then use a CDN to cache
the file/blob in edge locations around the world.
Avoid using blob storage like S3 as your primary data store unless you have a very good reason. In a typical setup
you will have a core database like Postgres or DynamoDB that has pointers (just a url) to the blobs stored in S3. This
allows you to use the database to query and index the data, while still getting the benefits of cheap blob storage.
Full-text search is the ability to search through a large amount of text
data and find relevant results. This is different from a traditional
database query, which is usually based on exact matches or ranges.
Without a search optimized database, you would need to run a query
that looks something like this:
SELECT * FROM documents WHERE document_text LIKE
'%search_term%'
This query is slow and inefficient, and it doesn't scale well because it
requires a full table scan. Search optimized databases, on the other
hand, are specifically designed to handle full-text search. They use
techniques like indexing, tokenization, and stemming to make search
queries fast and efficient. In short, they work by building what are
called inverted indexes.
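As a toy illustration of an inverted index (tokenization only, no stemming or ranking):

from collections import defaultdict

documents = {
    1: "climate change and global warming",
    2: "stock prices change every day",
}

# map each term to the set of document ids that contain it
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for token in text.lower().split():
        inverted_index[token].add(doc_id)

# a search term now resolves to matching documents without a full table scan
print(inverted_index["change"])    # {1, 2}
print(inverted_index["climate"])   # {1}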
Elastic Search:
Sharding in ElasticSearch:
When you create an index in Elasticsearch, the data in that index is divided into
smaller, more manageable pieces called primary shards. Each primary shard is
responsible for storing and indexing a portion of your data (documents are routed to a
shard based on a hash of their routing key, which defaults to the document ID).
When you run a search, the query is fanned out to the relevant shards, and the
results from all shards are combined and returned to the user.
CDN: CDNs are often used to deliver static content like images, videos, and HTML files, but
they can also be used to deliver dynamic content like API responses.
A CDN (Content Delivery Network) does cache content itself, but only under
certain conditions. Here is how CDNs work:
CDNs like Cloudflare, Akamai, and Amazon CloudFront maintain thousands of edge
servers worldwide in regions like:
North America, Europe, Asia-Pacific, Africa, South America, and the Middle East.
They are strategically placed near major Internet Exchange Points (IXPs) to connect efficiently
to local ISPs (Internet Service Providers).
Patterns:
This pattern is common in systems that need to process a lot of data, like a social network that needs to
process a lot of images or videos. You'll use a queue to store jobs, and a pool of workers to process them.
A popular option for the queue is SQS, and for the workers, you might use a pool of EC2 instances or
Lambda functions. SQS guarantees at least once delivery of messages and the workers will respond back
to the queue with heartbeat messages to indicate that they are still processing the job. If the worker fails
to respond with a heartbeat, the job will be retried on another host.
Another option is for your queue to be a log of events coming from something like Kafka. Kafka gives you
many of the same guarantees as SQS, but since the requests are written to an append-only log, you can
replay the log to reprocess events if something goes wrong.
Kafka is a system designed to handle large streams of data in real-time. It works like a message broker,
which means it helps applications send messages to each other. But unlike some other systems (like SQS),
Kafka stores messages in a log.
An append only log is like a diary or notebook where you keep writing new entries at the end. Once
written, entries are never erased or modified.
In Kafka:
1. Producers append messages to the end of the log.
2. These messages are stored for a certain period (or until they are manually deleted), even
after being read.
3. You can "replay" the log to process past messages again if needed.
1. Replay Events: If something goes wrong while processing a message, you can go back to the
log and reprocess it. For example:
• With Kafka, you can replay the log to process payments from the point of failure without
losing data.
2. Multiple Consumers: Many different systems can read the same log and process it
independently without interfering with each other
Example to clarify:
1. Every time a user makes a purchase, an "Order Placed" event is sent to Kafka.
2. The payment system consumes the event and processes the payment.
3. Other consumers (e.g., inventory, notifications) can read the same events independently.
4. If the payment system crashes, you can replay the log to reprocess payment events.
Two stage architecture
A common problem in system design is in "scaling" an algorithm with poor performance
characteristics.
Two-stage architecture solves the problem of scaling an
algorithm with poor performance characteristics by
addressing efficiency at large scales: a cheap, fast first stage
(for example, an index or approximate filter) narrows the full data set
down to a small candidate set, and the expensive algorithm runs only
on that candidate set in the second stage.
Event-Driven Architecture
Event-Driven Architecture (EDA) is a design pattern centered around
events. This architecture is particularly useful in systems where it is
crucial to react to changes in real-time. EDA helps in building
systems that are highly responsive, scalable, and loosely coupled.
The core components of an EDA are event producers, event routers (or brokers), and event
consumers. Event producers generate a stream of events which are sent to an event router. The
router, such as Apache Kafka or AWS EventBridge, then dispatches these events to appropriate
consumers based on the event type or content. Consumers process the events and take
necessary actions, which could range from sending notifications to updating databases or
triggering other processes.
An example use of EDA could be in an e-commerce system where an event is emitted every time
a new order is placed. This event can trigger multiple downstream processes like order
processing, inventory management, and notification systems simultaneously.
One of the more important design decisions in event-driven architectures is how to handle failures. Technologies like
Kafka keep a durable log of their events with configurable retention, which allows processors to pick up where they
left off. This can be a double-edged sword! If your system can only process N messages per second, you may
quickly find yourself in a situation where it takes hours or even days to catch back up, with the service substantially
degraded the entire time. Be careful about where this is used.
Key Point
The durable log in Kafka ensures no messages are lost, which is great for reliability.
However, if your system gets overwhelmed by a backlog, this reliability can become a
double-edged sword:
Some systems need to manage long-running jobs that can take hours or days to complete. For
example, a system that needs to process a large amount of data might need to run a job that takes
a long time to complete. If the system crashes, you don't want to lose the progress of the job. You
also want to be able to scale the job across multiple machines.
A common pattern is to use a log like Kafka to store the jobs, and then have a pool of workers that
can process the jobs. The workers will periodically checkpoint their progress to the log, and if a
worker crashes, another worker can pick up the job where the last worker left off. Another option is
to use something like Uber's Cadence (more popularly Temporal).
Setups like this can be difficult to evolve with time. For example, if you want to change the format of the job, you'll
need to handle both the old and new formats for a while.
How to handle these changes?
· Kafka and tools like Temporal ensure jobs are processed reliably, even if workers crash.
· The challenge lies in evolving the system over time, such as changing the job format.
· Solutions include versioning, backward compatibility, and careful updates to handle both old and
new job formats.
The architecture typically involves dividing the geographical area into manageable regions and
indexing entities within these regions. This allows the system to quickly exclude vast areas that
don't contain relevant entities, thereby reducing the search space significantly.
Note that most systems won't require users to be querying globally. Often, when proximity is
involved, it means users are looking for entities local to them.
For example:
If the user is in NYC, the index skips all restaurants in San Francisco, significantly reducing the
search space.
While geospatial indexes are great, they're only really necessary when you need to index hundreds of thousands or
millions of items. If you need to search through a map of 1,000 items, you're better off scanning all of the items than
the overhead of a purpose-built index or service.
A Web Server is designed to serve HTTP content. An App Server can also serve HTTP
content but is not limited to just HTTP; it can also provide support for other protocols such
as RMI/RPC.
Web Server is mostly designed to serve static content, though most Web
Servers have plugins to support scripting languages like Perl, PHP, ASP, JSP
etc. through which these servers can generate dynamic HTTP content.
Most application servers have a web server as an integral part of them, which means an
App Server can do whatever a Web Server is capable of. Additionally, App Servers have
components and features to support application-level services such as connection
pooling, object pooling, transaction support, messaging services, etc.
Transactions:
As web servers are well suited for static content and app servers for dynamic content, most
production environments have the web server acting as a reverse proxy to the app server.
That means while servicing a page request, static content (such as images/static HTML)
is served by the web server that interprets the request. Using some kind of filtering technique
(mostly the extension of the requested resource), the web server identifies dynamic content
requests and transparently forwards them to the app server.
· Reverse Proxy: When you need to improve backend security, enable SSL termination, or cache
content close to clients. E.g., Nginx.
· Forward Proxy: When clients need anonymity, content filtering, or access to restricted resources.
· Load Balancer: When you need to distribute traffic for better availability and performance.
The order in which a Load Balancer, API Gateway, and Proxy are
placed in an architecture depends on the specific design and
purpose of each component.
Example Scenario
In some setups, the API Gateway itself might perform load balancing (e.g., AWS API
Gateway).
A proxy might not be required if a load balancer or API Gateway already handles all
necessary functions.
Database: With the growth of the user base, one server is not
enough, and we need multiple servers: one for web/mobile traffic,
the other for the database. Separating web/mobile traffic (web tier)
and database (data tier) servers allows them to be scaled
independently.
What is Serialization?
Serialization is the process of converting data or objects (like those used in programming)
into a format that can be stored or transmitted.
These formats are typically text-based or binary, such as JSON, XML, or YAML, which are
easy to send over a network or save in a file.
For example, consider this Python dictionary:
person = {
"name": "Alice",
"age": 30,
"city": "New York"
}
To send this over the internet or store it in a database, you serialize it into a JSON
string:
{
"name": "Alice",
"age": 30,
"city": "New York"}
This JSON string is the serialized version of the person object. It’s compact and easy
to send or save.
What is Deserialization?
Deserialization is the reverse process: converting the serialized data (e.g., JSON or XML) back
into the original object or data structure that can be used by the application.
Think of it as unpacking the data from the suitcase so you can use it.
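Continuing the example above, in Python this round trip is just json.dumps and json.loads:

import json

person = {"name": "Alice", "age": 30, "city": "New York"}

serialized = json.dumps(person)     # dict -> JSON string (serialization)
restored = json.loads(serialized)   # JSON string -> dict (deserialization)

print(serialized)         # {"name": "Alice", "age": 30, "city": "New York"}
print(restored["city"])   # New York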
If the website traffic grows rapidly, and two servers are not enough
to handle the traffic, the load balancer can handle this problem
gracefully. You only need to add more servers to the web server
pool, and the load balancer automatically starts to send requests to
them.
Cache: Every time a new web page loads, one or more database
calls are executed to fetch data. The application performance is
greatly affected by calling the database repeatedly. The cache can
mitigate this problem.
After receiving a request, a web server first checks if the cache has
the available response. If it has, it sends data back to the client. If
not, it queries the database, stores the response in cache, and
sends it back to the client. This caching strategy is called a read-
through cache.
CDNs like Cloudflare and Amazon CloudFront have servers distributed across the world.
1. Static assets (JS, CSS, images, etc.) are no longer served by web
servers. They are fetched from the CDN for better performance.
2. The database load is lightened by caching data.
State:
In the context of web applications, "state" refers to any data that is stored or tracked
during a user's interaction with the web application. This data can include things like:
User session information: This might include things like the user’s login status, preferences,
or items in their shopping cart.
Application data: This can be temporary or session-specific data related to the operations or
actions a user is performing on the site.
When you interact with a web application, the system needs to remember things about
you across different pages. For example, when you log in to an online store and add
items to your cart, the system needs to keep track of who you are and what you've
added to the cart while you navigate between pages. This is the "state" of your
session.
Data Centers:
Example setup with two data centers. In normal operation, users
are geoDNS-routed, also known as geo-routed, to the closest data
center, with a split traffic of x% in US-East and (100 – x)% in US-
West. geoDNS is a DNS service that allows domain names to be
resolved to IP addresses based on the location of a user.
In the event of any significant data center outage, we direct all
traffic to a healthy data center.
With the message queue, the producer can post a message to the
queue when the consumer is unavailable to process it. The
consumer can read messages from the queue even when the
producer is unavailable.
In this model, messages are sent from one producer to one consumer.
Horizontal scaling:
Consensus Algorithms
HotStuff
Paxos
Raft etc…
Batch Processing:
Batch Processing Frameworks and Tools: Apache Hadoop, Apache Spark, AWS Batch
Stream Processing:
Stream Processing Workflow:
Stream Processing Frameworks and Tools: Apache Kafka, Apache Flink, Amazon Kinesis
This method bridges the gap between traditional batch and stream
processing by processing small chunks of data over short intervals.
This allows for near real-time processing with the simplicity of batch
processing.
The Circuit Breaker design pattern is used to stop the request and
response process if a service is not working.
You can leverage the Circuit Breaker design pattern to avoid such
issues. The consumer will use this pattern to invoke a remote
service through a proxy, and this proxy will behave as a circuit breaker.
During this timeout period, any requests to the offline server will
fail. When that time period is up, the circuit breaker will allow a
limited number of tests to pass, and if those requests are successful,
the circuit breaker will return to normal operation. If there is a
failure, the time out period will start again.
This prevents the calling service from running out of threads while waiting on a failing dependency.
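A minimal sketch of the pattern (the failure threshold and reset timeout are arbitrary; in practice you would reach for a library such as pybreaker or resilience4j):

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None              # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # timeout elapsed: let one trial request through (half-open state)
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()   # trip the breaker
            raise
        self.failures = 0                      # success: close the circuit again
        self.opened_at = None
        return result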
Idempotency:
This scenario highlights a common problem in distributed
systems: handling repeated operations gracefully.
Unlike the traditional HTTP protocol, where the client sends a request to
the server and waits for a response, WebSockets allow both the client and
server to send messages to each other independently and continuously
after the connection is established.
The data set is partitioned among multiple nodes to horizontally scale out. The different
techniques for partitioning the cache servers are the following :
Random assignment
Single global cache
Key range partitioning
Static hash partitioning
Consistent hashing
The basic gist behind the consistent hashing algorithm is to hash both node identifiers
and data keys using the same hash function. A uniform and independent hashing
function such as message-digest 5 (MD5) is used to find the position of the nodes and
keys (data objects) on the hash ring. The output range of the hash function must be of
reasonable size to prevent collisions.
There is a chance that nodes are not uniformly distributed on the consistent hash ring.
The nodes that receive a huge amount of traffic become hotspots resulting in cascading
failure of the nodes.
The nodes are assigned to multiple positions on the hash ring by hashing the node IDs
through distinct hash functions to ensure uniform distribution of keys among the nodes.
The technique of assigning multiple positions to a node is known as a virtual node.
The virtual nodes improve the load balancing of the system and prevent hotspots. The
number of positions for a node is decided by the heterogeneity of the node. In other
words, the nodes with a higher capacity are assigned more positions on the hash ring.
More space is needed to store data about virtual nodes. This is a
tradeoff, and we can tune the number of virtual nodes to fit our
system requirements.
The data objects can be replicated on adjacent nodes to minimize the data movement
when a node crashes or when a node is added to the hash ring. In conclusion, consistent
hashing resolves the problem of dynamic load.
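A compact sketch of a consistent hash ring with virtual nodes, using MD5 as in the text (100 virtual nodes per physical node is an arbitrary choice):

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []                        # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def get_node(self, key: str) -> str:
        # walk clockwise to the first virtual node at or after the key's position
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("user:42"))   # the same key always lands on the same node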
The distributed NoSQL data stores such as Amazon DynamoDB, Apache Cassandra,
and Riak use consistent hashing to dynamically partition the data set across the set
of nodes. The data is partitioned for incremental scalability.
Consistent hashing with bounded loads: The consistent hashing with bounded
load puts an upper limit on the load received by a node on the hash ring, relative to
the average load of the whole hash ring. The distribution of requests is the same as
consistent hashing as long as the nodes are not overloaded.
When a specific data object becomes extremely popular, the node hosting the data
object receives a significant amount of traffic resulting in the degradation of the
service. If a node is overloaded, the incoming request is delegated to a fallback
node. The list of fallback nodes will be the same for the same request hash. In
simple words, the same node(s) will consistently be the “second choice” for a
popular data object. The fallback nodes resolve the popular data object caching
problem.
If a node is overloaded, the list of the fallback nodes will usually be different for
different request hashes. In other words, the requests to an overloaded node are
distributed among the available nodes instead of a single fallback node.
Cloud disaster recovery can greatly reduce the costs of RTO and RPO when it comes to
fulfilling on-premises requirements for capacity, security, network infrastructure,
bandwidth, support, and facilities. A highly managed service on Google Cloud can help
you avoid most, if not all, complicating factors and allow you to reduce many business
costs significantly.
https://www.dynatrace.com/news/blog/what-is-distributed-tracing/
https://youtu.be/XYvQHjWJJTE?si=WEk6KLr9HFmvg82D
With OpenTelemetry, developers can capture and export trace data to observability
tools for visualization, analysis, and troubleshooting. Jaeger and Zipkin integrate with
OpenTelemetry, correlate data from spans, and provide web-based visualizations.
In addition to OpenTelemetry, there are other tools and platforms available that offer
distributed tracing capabilities along with additional features. Some of these include
APM (Application Performance Monitoring) solutions like New Relic, Splunk, and Datadog.
APM tools provide end-to-end visibility into application performance and offer
features beyond distributed tracing. They collect data on errors, metrics, and logs,
providing a comprehensive view of the system's health. These tools typically offer
user-friendly interfaces and advanced analytics capabilities, making them suitable for
monitoring and optimizing complex distributed systems.
If, instead of OpenTelemetry (which is vendor neutral), we use a vendor-specific solution like
New Relic APM for data collection, we are tightly coupled to that particular vendor.
Changing vendors would require changes to instrumentation code and configuration.
By leveraging OpenTelemetry, however, you can achieve a higher level
of flexibility and portability: since OpenTelemetry provides a standard way to collect
and export telemetry data, you can switch between different vendors or tools without
needing to modify your application code extensively. You can simply configure
OpenTelemetry exporters to send the data to the new vendor's platform without rewriting
or reconfiguring your application instrumentation. This flexibility gives you the
freedom to choose the best monitoring and observability solution for your needs and
to easily adapt to changing requirements or preferences.
It is used to answer the question: is this element in the set? A Bloom filter
answers with either a firm no or a probably yes. In other words, false positives are
possible: the element is not there, but the Bloom filter says it is. False negatives,
however, are not possible: if the element is there, the Bloom filter will never say it's
not. The "probably yes" part is what makes a Bloom filter probabilistic.
As with many things in software engineering, this is a trade-off. The trade-off here is
this. In exchange for providing sometimes incorrect false positive answers, a Bloom
filter consumes a lot less memory than a data structure, like a hash table that would
provide a perfect answer all the time.
Many NoSQL databases use bloom filters to reduce the disk reads for keys that don't
exist. With an LSM tree-based database, searching for a key that doesn't exist requires
looking through many files and is very costly.
Content delivery networks like Akamai use Bloom Filter to prevent caching one-hit
wonders. These are web pages that are only requested once. According to Akamai,
75% of the pages are one-hit wonders. Using a Bloom Filter to track all the URLs
seen and only caching a page on the second request, it significantly reduces the
caching workload and increases the caching hit rate.
Web browsers like Chrome, used to use a Bloom filter to identify malicious URLs.
Any URL was first checked against a Bloom filter. It only performed a more
expensive full check of the URL if the Bloom filter returned a probably-yes answer.
This is no longer used, however, as the number of malicious URLs grows to the
millions, and a more efficient but complicated solution is needed.
Similarly, some password validators use bloom filter to prevent users from using
weak passwords. Sometimes a strong password will be a victim of a false positive.
But in this case, they could just ask the users to come up with another password.
Now let's discuss how a bloom filter works. A critical ingredient to a good bloom
filter is some good hash functions. These hash functions should be fast, and they
should produce outputs that are evenly and randomly distributed. Collisions are okay
as long as they are rare. A Bloom filter is a large set of buckets, with each bucket
containing a single bit, and they all start with a 0. Let's imagine we want to keep track
of the food I like. For this example, we'll use a Bloom filter with 10 buckets labeled
from 0 to 9. And we would use 3 hash functions. Let's start by putting ribs into the
Bloom filter.
The three hash functions return the numbers 1, 3, and 4. These will set the buckets at
those locations to 1. Now this can be done in constant time. Next, let's put potato into
the bloom filter. The hashing function returned the numbers 0, 4, and 8 this time.
let's see if the bloom filter thinks I like porkchop. In this case, the hash functions
return the buckets 0, 5, and 8. And even though buckets 0 and 8 are set to 1, bucket 5
is 0. In this case, the bloom filter can confidently say no, I don't like porkchop.
But how do we get the Bloom filter to tell us something is there when it's not? Let's
walk through an example. Let's say Lemon hashes to the bucket 1, 4, and 8. Since all
those buckets are set to 1, even though I don't like Lemon, the Bloom filter will return
Yes in this case, which is a false positive.
We can control how often we see false positives by choosing the correct size for the
Bloom filter based on the expected number of entries in it. These are trade-offs
between space used and accuracy.
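A toy version of the walkthrough above, with 10 buckets and 3 hash functions derived by salting MD5 (the actual bucket positions will differ from the numbers in the walkthrough):

import hashlib

NUM_BUCKETS = 10
buckets = [0] * NUM_BUCKETS

def _positions(item: str):
    # derive three hash functions by salting a single MD5 hash
    for salt in ("h1", "h2", "h3"):
        digest = hashlib.md5((salt + item).encode()).hexdigest()
        yield int(digest, 16) % NUM_BUCKETS

def add(item: str):
    for pos in _positions(item):
        buckets[pos] = 1

def might_contain(item: str) -> bool:
    # "probably yes" only if every bucket is set; any 0 bucket means a firm no
    return all(buckets[pos] for pos in _positions(item))

add("ribs")
add("potato")
print(might_contain("ribs"))       # True (probably yes)
print(might_contain("porkchop"))   # usually False; a True here would be a false positive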
Throughput vs Latency:
Latency determines the delay that a user experiences when they send or receive data from the network. Throughput
determines the number of users that can access the network at the same time.
How to measure
You can measure network latency by measuring ping time. This process is where you transmit a small data packet
and receive confirmation that it arrived.
Most operating systems support a ping command which does this from your device. The round-trip-time (RTT)
displays in milliseconds and gives you an idea of how long it takes for your network to transfer data.
You can measure throughput either with network testing tools or manually. If you wanted to test throughput manually,
you would send a file and divide the file size by the time it takes to arrive. However, latency and bandwidth impact
throughput. Because of this, many people use network testing tools, as the tools report throughput alongside other
One of the most important factors is the location of where data originates and its intended destination. If your servers
are in a different geographical region from your device, the data has to travel further, which increases latency. This
https://blog.algomaster.io/p/aec1cebf-6060-45a7-8e00-47364ca70761
REST VS RPC : https://blog.algomaster.io/p/106604fb-b746-41de-88fb-60e932b2ff68
REST treats server data as resources that can be created, read, updated, or deleted (CRUD
operations) using standard HTTP methods (GET, POST, PUT, DELETE).
Data and Resources: Emphasizes resources, identified by URLs, with their state
transferred over HTTP in a textual representation like JSON or XML.
RPC: Remote Procedure Calls --> It is designed to make a network call look just like
a local function call
Why RPC:
REST remains a solid choice for public APIs and web services due to its
scalability, flexibility, and widespread adoption.
Cache-aside (lazy loading):
- Key points:
- Data is loaded into the cache only when it is requested (on-demand).
- The cache is populated gradually, based on what the application needs.
- The cache doesn’t automatically know when data in the DB is updated, leading to potential stale
data.
- Best for:
- Systems where reads are infrequent but maintaining up-to-date data is crucial.
- Applications with unpredictable data access patterns.
- Drawback:
- Cache miss penalty on the first read, leading to slower initial response.
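A minimal cache-aside read path as a sketch, assuming a Redis cache via redis-py and a hypothetical db_get_user function standing in for the real database query:

```python
import json
import redis

r = redis.Redis()  # assumes a reachable Redis instance

def db_get_user(user_id):
    # Hypothetical stand-in for the actual database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                          # cache hit
        return json.loads(cached)
    user = db_get_user(user_id)                     # cache miss: go to the DB
    r.set(key, json.dumps(user), ex=ttl_seconds)    # populate on demand
    return user
```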
Cache-through (read-through):
- The cache is tightly integrated with the DB.
- All reads and writes are routed through the cache.
- If a cache miss occurs, the cache fetches data from the DB, updates itself, and serves the data.
- Key points:
- The cache acts as a proxy for the DB.
- Changes to the DB are automatically synchronized with the cache.
- Ensures that data consistency between the cache and DB is maintained.
- Best for:
- Applications requiring a consistent caching layer that is closely tied to the DB.
- Systems with frequent reads and writes where maintaining the cache manually would be complex.
- Drawback:
- Increased complexity due to tight coupling between the cache and DB.
- If the cache layer is unavailable, the system may fail to handle requests efficiently.
Refresh-ahead :
- Cache entries are refreshed predictively before they expire.
- Uses patterns or demand forecasting to update data in advance.
- Key points:
- Ensures that frequently accessed data is always fresh in the cache.
- Minimizes cache misses by updating data before it’s requested.
- Works well with time-sensitive or regularly accessed data.
- Best for:
- Systems where cache misses are expensive (e.g., large-scale distributed systems).
- Apps with predictable data access patterns (e.g., stock prices, live scores).
- Drawback:
- Can lead to wasted resources by preloading data that may not be accessed.
Write-through (Synchronous) :
- Data is written to both the cache and the database simultaneously during a write operation.
- Ensures that the cache always contains the most up-to-date data.
- Key points:
- Writes are atomic, ensuring data consistency between the cache and the database.
- Slower write performance due to simultaneous writes to both layers.
- Useful when the cache needs to store the exact same data as the database.
- Best for:
- Systems where data consistency is critical, and the cache must always reflect the latest state.
- Applications with moderate write operations.
- Drawback:
- Increased latency in write operations as both the cache and database must be updated.
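A hedged write-through sketch, where the database and the cache are both updated as part of the same write call (db_update_user is a hypothetical stand-in for the real database write):

```python
import json
import redis

r = redis.Redis()

def db_update_user(user_id, user):
    # Hypothetical stand-in for the real database write.
    pass

def update_user_write_through(user_id, user, ttl_seconds=300):
    db_update_user(user_id, user)                                # write the DB...
    r.set(f"user:{user_id}", json.dumps(user), ex=ttl_seconds)   # ...and the cache
    # The caller only sees success once both layers are updated,
    # which is why write latency is higher than a cache-only write.
```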
Write-behind (Asynchronous) :
- Data is first written to the cache.
- The cache asynchronously writes the data to the database later.
- Key points:
- Faster write performance as the database write is deferred.
- Reduces load on the database for write-heavy systems.
- Risk of data loss if the cache fails before syncing with the database.
- Best for:
- Write-heavy systems where immediate database consistency isn’t required.
- Scenarios where optimizing write latency is more important than immediate durability.
- Drawback:
- Increased complexity due to managing asynchronous writes.
- Possibility of stale or missing data if the cache fails before persisting.
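A simplified write-behind sketch, assuming the cache is updated immediately and a background worker later drains an in-memory queue to the database; a real system would use a durable queue or stream instead:

```python
import json
import queue
import threading
import redis

r = redis.Redis()
pending = queue.Queue()  # in-memory stand-in for a durable queue/stream

def db_update_user(user_id, user):
    pass  # hypothetical database write

def update_user_write_behind(user_id, user):
    r.set(f"user:{user_id}", json.dumps(user))   # fast path: cache only
    pending.put((user_id, user))                 # defer the DB write

def flush_worker():
    while True:
        user_id, user = pending.get()            # drained asynchronously
        db_update_user(user_id, user)
        pending.task_done()

threading.Thread(target=flush_worker, daemon=True).start()
# Note: if the process (or cache) dies before the queue drains,
# those writes are lost -- the durability risk described above.
```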
Write-around:
- Data is written directly to the database, bypassing the cache during write operations.
- The cache gets updated only when the data is read again.
- Key points:
- Reduces cache churn by avoiding updates for data that isn’t frequently accessed.
- The cache can become inconsistent until the next read operation triggers an update.
- Best for:
- Systems with infrequent reads where caching write operations adds unnecessary overhead.
- Use cases where stale data is acceptable in the short term.
- Drawback:
- Cache inconsistency, as it won’t always reflect the latest writes until a subsequent read occurs.
Summary
- Read Strategies: Cache-aside, Cache-through, and Refresh-ahead focus on how and when data is
loaded into the cache.
- Write Strategies: Write-through, Write-behind, and Write-around deal with how changes in data
propagate between the cache and database.
Each caching strategy comes with trade-offs. Selecting the right one depends on your system’s
requirements for read/write performance, data consistency, and resource efficiency.
HLD Questions:
URL shortener
https://www.youtube.com/watch?v=iUU4O1sWtJA&t=65s
the endpoint /urls is plural because it represents a collection of URLs being managed
by the system.
Your goal is to simply go one-by-one through the core requirements and define the APIs that are
necessary to satisfy them. Usually, these map 1:1 to the functional requirements, but there are
times when multiple endpoints are needed to satisfy an individual functional requirement.
HLD design: we go through each of the APIs and, starting with one API, draw out
the system necessary to satisfy it.
https://www.youtube.com/watch?v=JQDHz72OA3c
https://www.hellointerview.com/learn/system-design/problem-breakdowns/bitly
In a URL shortening service, it ensures that when you use a short URL, it always maps to the
correct long URL, no matter which server or data center processes the request.
Horizontally scaling our write service introduces a significant issue! For our short code
generation to remain globally unique, we need a single source of truth for the counter. This
counter needs to be accessible to all instances of the Write Service so that they can all agree
on the next value.
We could solve this by using a centralized Redis instance to store the counter. This Redis
instance can be used to store the counter and any other metadata that needs to be shared
across all instances of the Write Service.
But should we be concerned about the overhead of an additional network request for each new
write, since every new request has to go through this centralized Redis server?
The reality is, this is probably not a big deal. Network requests are fast! In practice, the overhead
of an additional network request is negligible compared to the time it takes to perform other
operations in the system. That said, we could always use a technique called "counter batching"
to reduce the number of network requests.
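Counter batching can be sketched roughly like this, assuming a shared Redis counter named url:counter and a batch size of 1,000 (both values are illustrative): each Write Service instance reserves a whole block of IDs with one INCRBY and hands them out locally.

```python
import redis

r = redis.Redis()  # the centralized counter store
BATCH_SIZE = 1000

class CounterBatch:
    """Hands out globally unique IDs, hitting Redis once per batch."""
    def __init__(self):
        self.next_id = 0
        self.ceiling = 0

    def get_id(self) -> int:
        if self.next_id >= self.ceiling:
            # One atomic round trip reserves BATCH_SIZE ids for this instance.
            self.ceiling = r.incrby("url:counter", BATCH_SIZE)
            self.next_id = self.ceiling - BATCH_SIZE
        self.next_id += 1
        return self.next_id
```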
· IP Whitelisting ensures only trusted clients can communicate with the API.
Token Bucket:
Leaky Bucket:
Con: • A burst of traffic fills up the queue with old requests, and if
they are not processed in time, recent requests will be rate limited.
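For reference, a minimal token-bucket sketch (the capacity and refill rate are example values); requests are allowed while tokens remain, which is why bursts up to the bucket capacity pass through:

```python
import time

class TokenBucket:
    def __init__(self, capacity=10, refill_rate=5.0):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False      # out of tokens: rate limit this request

bucket = TokenBucket()
results = [bucket.allow() for _ in range(12)]  # roughly the first 10 pass
```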
Ticket Master:
https://www.hellointerview.com/learn/system-design/problem-breakdowns/
ticketmaster
https://www.youtube.com/watch?v=fhdPyoO6aXI
Based on the above functional requirements we should do API design
From website
Sweet, we now have the core functionality in place to view an event! But how are users supposed
to find events in the first place? When users first open your site, they expect to be able to search
for upcoming events. This search will be parameterized based on any combination of keywords,
artists/teams, location, date, or event type.
--> The highlighted query is really slow.
The main thing we are trying to avoid is two (or more) users paying for the same ticket. That would
make for an awkward situation at the event! To handle this consistency issue, we need to select a
database that supports transactions.
While anything from MySQL to DynamoDB would be fine choices (just needs ACID properties),
we'll opt for PostgreSQL. Additionally, we need to implement proper isolation levels and either row-
level locking or Optimistic Concurrency Control (OCC) to fully prevent double bookings.
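For illustration, a hedged sketch of the row-level-locking variant in PostgreSQL using psycopg2; the tickets table, its column names, and the connection string are assumptions:

```python
import psycopg2

conn = psycopg2.connect("dbname=tickets")  # illustrative connection string

def book_ticket(ticket_id: int, user_id: int) -> bool:
    with conn:                      # commits on success, rolls back on error
        with conn.cursor() as cur:
            # SELECT ... FOR UPDATE takes a row-level lock, so two concurrent
            # transactions cannot both see the ticket as AVAILABLE.
            cur.execute(
                "SELECT status FROM tickets WHERE id = %s FOR UPDATE",
                (ticket_id,),
            )
            row = cur.fetchone()
            if row is None or row[0] != "AVAILABLE":
                return False        # someone else got there first
            cur.execute(
                "UPDATE tickets SET status = 'SOLD', user_id = %s WHERE id = %s",
                (user_id, ticket_id),
            )
            return True
```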
Stripe handles payment asynchronously. It calls out to the credit/debit card company to
determine whether the payment can happen, which happens very quickly, and it reports
the result back to our system not by responding to a single request but via a webhook:
we register a callback URL and expose an endpoint in our booking service for Stripe to
call back to, which is why we draw two arrows instead of one.
The other issue with this approach: a user clicks on a ticket and goes to a payment page
that shows, say, a 10-minute timer. What happens if the 10 minutes are exceeded, or the
user simply closes their laptop? The ticket status would stay reserved forever. Since the
seat map is built by querying the database for available tickets (which excludes reserved
ones), that seat would be reserved for that user indefinitely, which is wrong and doesn't
meet our requirement that a reservation expire after 10 minutes.
So how do we handle it ?
1) We can add an additional column to the Ticket table with the timestamp of when the
ticket was reserved. The availability query then returns tickets that are available plus
reserved tickets whose timestamp exceeds the 10-minute limit.
2) We can introduce a cron job that runs every 10 minutes or so and is responsible for
querying the database for every ticket in a reserved status and checking its reserved
timestamp; if it is more than 10 minutes old, the job sets the status back to available.
The downside is imprecision: if a ticket has been reserved for 9 minutes when the job
runs, its status isn't changed on that pass, so it is only released after roughly 19 minutes.
A sketch of option 1 follows.
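A sketch of the option-1 availability query; the table and column names (such as reserved_at) are assumptions for illustration:

```python
# Hypothetical query run by the read path: a ticket counts as available if it
# was never reserved, or its reservation is older than 10 minutes.
AVAILABLE_TICKETS_SQL = """
SELECT id
FROM tickets
WHERE event_id = %s
  AND (
        status = 'AVAILABLE'
     OR (status = 'RESERVED' AND reserved_at < NOW() - INTERVAL '10 minutes')
  );
"""
```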
Alternatively, when a user tries to reserve a ticket, we don't write to the database at all;
instead we simply put that ticket in our lock, locking it for 10 minutes by setting the
ticket ID in Redis with a TTL of 10 minutes.
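A hedged sketch of that lock, using Redis SET with NX and EX so the reservation is both exclusive and expires automatically after 10 minutes (the key names are illustrative):

```python
import redis

r = redis.Redis()
RESERVATION_TTL = 600  # 10 minutes, in seconds

def try_reserve(ticket_id: int, user_id: int) -> bool:
    # SET key value NX EX ttl: succeeds only if the key doesn't already exist,
    # and Redis deletes the key automatically when the TTL elapses.
    return bool(r.set(f"ticket:lock:{ticket_id}", user_id,
                      nx=True, ex=RESERVATION_TTL))

def is_reserved(ticket_id: int) -> bool:
    return r.exists(f"ticket:lock:{ticket_id}") == 1
```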
If, through our event CRUD service, we want to view all available tickets for an event,
we first query our database for all tickets with a status of available, and then look up
each of those ticket IDs in Redis to see if it is reserved; if it is, we remove it from the
list of available tickets before sending the list back to the client.
We use this Redis cache for locking instead of keeping the lock in memory inside the
booking service because there will be multiple instances of the booking service, and all
of them need the same consistent, singular view of a lock; hence we keep the lock in a
separate store.
If the lock goes down, then in theory for a 10-minute window users hit the database
directly, and because of its ACID properties whoever submits the purchase first wins
while everyone else gets an error. That is a bad user experience for that 10-minute
window: users can get to the booking page, type in their payment details, and then find
out that the ticket they wanted is no longer available.
However, Elasticsearch cannot be used as our primary database due to durability
limitations and its lack of support for complex transaction management.
1) Directly connect Elasticsearch to the Event CRUD service: the service writes to both
the Postgres DB and Elasticsearch. This pushes complex logic into our application code,
because you need to handle the case where the write to the DB fails (we don't want that
write to reach Elasticsearch), along with several other failure cases.
2) CDC (Change Data Capture): changes to the primary data store are put onto a stream,
and those changes are consumed to update Elasticsearch.
In this system we are not doing any ranking or personalized recommendations, so if two
users search for the same thing they get the same result. To speed up search queries we
can use caching.
If we choose AWS OpenSearch as our Elasticsearch offering, it provides caching on each
instance of the cluster. Alternatively, we can use Redis for caching, using the
(normalized) search term as the key and the search results as the value.
A CDN can also cache API responses: if search term strings are short, we see very few
cache misses; if they are long and therefore more specific, we see more cache misses.
We need to make the Seat Map real time to have good user experience for this we can
use Long Polling or Websockets or SSE.
Long polling: the client sends an HTTP request that the server keeps open, usually for
30 seconds to a minute, until it can respond, and the client repeats this in a loop. It
needs no additional infrastructure and works especially well if users are not on this
page for long.
If users sit on this page for longer, we need a more sophisticated approach. That could
mean opening a persistent connection such as WebSockets, but here we can use SSE:
WebSockets are bidirectional, while SSE is unidirectional, from server to client.
Every time there is a change to either the database or the ticket lock, we push the change
to the client, as in the sketch below.
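A minimal SSE endpoint sketch using Flask; the event source and payload are placeholders, and the key details are the text/event-stream content type and the "data: ...\n\n" framing:

```python
import json
import time
from flask import Flask, Response

app = Flask(__name__)

def seat_updates(event_id):
    # Placeholder generator: in the real system this would be fed by
    # changes to the tickets table or the Redis ticket locks.
    while True:
        update = {"event_id": event_id, "seat": "A12", "status": "reserved"}
        yield f"data: {json.dumps(update)}\n\n"
        time.sleep(1)

@app.route("/events/<int:event_id>/seat-map/stream")
def seat_map_stream(event_id):
    return Response(seat_updates(event_id), mimetype="text/event-stream")
```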
For events like a Taylor Swift concert, the user experience would be: users open the seat
map, see all the available seats, and within a couple of milliseconds everything goes
black because everything gets booked; we have something like 10 million users fighting
for 10,000 seats. To fix that, we introduce a choke point to protect our backend services
and improve the user experience: a virtual waiting queue for really popular events that
says "thanks for your interest, you are in the queue, and we'll let you know when you're
out of the queue."
This queue could be Redis, which would be a cheap, lightweight implementation. You
can use a Redis sorted set, which gives you a priority queue ordered by arrival time;
other implementations make admission random so it's a bit more fair and it's not just
the users closest to our servers who get in first. Then some event-driven logic lets users
in: we admit, say, 100 people, and once 100 seats are booked we admit the next 100 (or
whatever the batch size is), pulling those user IDs off the queue and notifying them
through SSE, as sketched below.
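A hedged sketch of the waiting queue as a Redis sorted set keyed by arrival time, with a helper that admits the next batch; the key names, batch size, and notify_via_sse are illustrative:

```python
import time
import redis

r = redis.Redis()

def join_queue(event_id: str, user_id: str):
    # Score = arrival time, so ZPOPMIN later releases users in FIFO order.
    r.zadd(f"waitroom:{event_id}", {user_id: time.time()})

def admit_next(event_id: str, batch_size: int = 100):
    # Pop the batch with the lowest scores (earliest arrivals).
    admitted = r.zpopmin(f"waitroom:{event_id}", batch_size)
    for user_id, _score in admitted:
        notify_via_sse(user_id.decode())  # hypothetical "you're in" push

def notify_via_sse(user_id: str):
    pass  # stand-in for publishing to that user's SSE connection
```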
We can also add Redis in front of the event CRUD service and cache events, venues, and
performers. We do not cache tickets because they are dynamic.
For scaling:
The API Gateway takes care of load balancing, and each service can have its own load
balancer.
Shard the database.
From Website:
We need to ensure that the ticket is locked for the user while they are checking out. We also need
to ensure that if the user abandons the checkout process, the ticket is released for other users to
purchase. Finally, we need to ensure that if the user completes the checkout process, the ticket is
marked as sold and the booking is confirmed.
--> doubt
Design a Key Value Store:
The value in a key-value pair can be a string, list, object, etc. The value is usually
treated as an opaque object in key-value stores such as Amazon Dynamo, Memcached,
and Redis.
Partition Tolerance Focus: Ensures the system doesn’t crash entirely during network issues.
Availability Focus: Ensures users receive responses even when parts of the system are down
or unreachable.
Quorum Consensus:
A coordinator acts as a proxy between the client and the nodes.
Strong consistency is usually achieved by forcing a replica not to accept new
reads/writes until every replica has agreed on the current write. This approach is
not ideal for highly available systems because it can block new operations.
Dynamo and Cassandra adopt eventual consistency, which is our recommended
consistency model for our key-value store. With concurrent writes, eventual
consistency allows inconsistent values to enter the system and forces the client
to read the values and reconcile them.
Reconciliation can happen in two ways, client-side or server-side: the system detects
and resolves the conflict between valueC and valueD, ensuring the replicas converge
to a consistent state.
Gossip Protocol:
• Each node (for example, s0) maintains a node membership list.
After failures have been detected through the gossip protocol, the
system needs to deploy certain mechanisms to ensure availability.
In the strict quorum approach, read and write operations could be
blocked as illustrated in the quorum consensus section. A
technique called “sloppy quorum” is used to improve availability.
Instead of enforcing the quorum requirement, the system chooses
the first W healthy servers for writes and first R healthy servers for
reads on the hash ring. Offline servers are ignored.
Handling permanent failures:
Hinted handoff is used to handle temporary failures. What if a replica is permanently
unavailable? To handle such a situation, we implement an anti-entropy protocol to keep
replicas in sync. Anti-entropy involves comparing each piece of data on replicas and
updating each replica to the newest version. A Merkle tree is used for inconsistency
detection and for minimizing the amount of data transferred.
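A toy sketch of how a Merkle-style comparison narrows down what needs to be synced: each replica hashes buckets of its keys, the bucket hashes are combined into a root, and only buckets whose hashes differ need to be exchanged (the bucket count and data are made up):

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_buckets(kv: dict, num_buckets: int = 4) -> list:
    """Hash each bucket of keys; a real system would bucket by key range."""
    buckets = [[] for _ in range(num_buckets)]
    for key in sorted(kv):
        buckets[hash(key) % num_buckets].append(f"{key}={kv[key]}")
    return [h("|".join(b).encode()) for b in buckets]

def merkle_root(bucket_hashes: list) -> str:
    level = bucket_hashes
    while len(level) > 1:   # combine pairwise up to a single root hash
        level = [h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "v2-stale", "k3": "v3"}

ha, hb = build_buckets(replica_a), build_buckets(replica_b)
if merkle_root(ha) != merkle_root(hb):
    # Only the differing buckets need to be synchronized.
    out_of_sync = [i for i, (x, y) in enumerate(zip(ha, hb)) if x != y]
    print("buckets to repair:", out_of_sync)
```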