Educative Top 10 System Design
Fahim ul Haq
While SDI questions change over time, some have remained popular
in interviews across various top companies.
Today, we’ll explore the top 10 most commonly asked system design
interview questions, common problems you’ll have to address in
each, and some tools to help you do that.
Required Features
Messages must be sent and received via the internet.
Service must support one-on-one and group chats.
Messages should be stored for later viewing.
Users should be able to send pictures and videos as well as text
messages.
Messages should be encrypted during transit.
Messages should be visible with minimal latency.
Common Problems
What happens if a message is sent without an internet
connection? Is it sent when the connection is restored?
How will you encrypt and decrypt the message without
increasing latency?
How do users receive notifications?
Are messages pulled from the device (the server periodically
asks devices whether they have a message waiting to send) or
pushed to the server (the device notifies the server that it has a
message to send)?
Tools to Consider
Split the database schema (https://www.educative.io/blog/what-
are-database-schemas-examples) into multiple tables: a user
table (with the user ID and contacts), a chat table (with chat IDs
and a list of participating user IDs), and a message table (with
past messages and a reference to the chat ID).
Use WebSocket for bi-directional connections between device
and server.
Use push notifications to notify members even if they’re offline.
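The table split described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names are hypothetical, the user's contacts are left out for brevity, and the "list of participating user IDs" is modeled as a separate chat_members join table rather than a list column.

```python
import sqlite3

# Hypothetical schema; all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE chats (
    chat_id INTEGER PRIMARY KEY,
    title   TEXT
);
-- One row per (chat, participant) instead of a list column.
CREATE TABLE chat_members (
    chat_id INTEGER NOT NULL REFERENCES chats(chat_id),
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (chat_id, user_id)
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    chat_id    INTEGER NOT NULL REFERENCES chats(chat_id),
    sender_id  INTEGER NOT NULL REFERENCES users(user_id),
    body       TEXT NOT NULL,
    sent_at    REAL NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def history(conn, chat_id):
    """Return (sender_id, body) pairs for a chat, oldest first."""
    rows = conn.execute(
        "SELECT sender_id, body FROM messages "
        "WHERE chat_id = ? ORDER BY sent_at, message_id",
        (chat_id,),
    )
    return rows.fetchall()
```

Because every message row carries its chat ID, one-on-one and group chats share the same lookup path for stored history.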
The app then tracks a route between the driver and user’s current
locations, then from the user’s location to the destination.
Required Features
The system must track the current location of all users and
drivers.
Users and drivers must receive updated trip information while
in transit.
Must support thousands of users at various points in the
process and scale accordingly.
Both driver and user must be constantly connected to the
server.
Common Problems
How can you keep latency low during busy periods?
How is the driver paired with the user? Iterating all drivers to
find Euclidean distance would be inefficient.
What happens if the driver or user loses connection?
How do you store all cached location data?
Tools to Consider
Use the S2Geometry library to split locations into cells. Only
calculate driver distance with drivers in the same cell as the
user.
Use distributed storage to store the locations of all users.
Location data will only be roughly 1 KB per user.
If location data halts, the system continues to use the device’s
last known location while waiting for reconnection.
Allow a buffer after prompting the closest driver to take a trip.
If they refuse, move to the next driver.
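The cell idea can be illustrated without the S2Geometry library. The sketch below swaps in a plain fixed-size latitude/longitude grid, which is not how S2 cells work but shows the same principle: only drivers in nearby cells are compared. The cell size, the DriverIndex class, and the choice to also scan the eight neighboring cells (so a user near a cell edge still sees close drivers) are all assumptions.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees (about 1 km of latitude)


def cell_of(lat, lng):
    """Map a coordinate to its grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


class DriverIndex:
    def __init__(self):
        self.cells = defaultdict(dict)  # cell -> {driver_id: (lat, lng)}
        self.where = {}                 # driver_id -> current cell

    def update(self, driver_id, lat, lng):
        """Record a driver's latest position, moving it between cells."""
        old = self.where.get(driver_id)
        if old is not None:
            self.cells[old].pop(driver_id, None)
        cell = cell_of(lat, lng)
        self.cells[cell][driver_id] = (lat, lng)
        self.where[driver_id] = cell

    def nearest(self, lat, lng, k=3):
        """Return up to k driver IDs from the user's cell and its neighbors."""
        cx, cy = cell_of(lat, lng)
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                bucket = self.cells.get((cx + dx, cy + dy), {})
                for driver_id, (dlat, dlng) in bucket.items():
                    # Straight-line distance in degrees: a rough ranking only.
                    dist = math.hypot(dlat - lat, dlng - lng)
                    candidates.append((dist, driver_id))
        return [driver_id for _, driver_id in sorted(candidates)[:k]]
```

The returned list is already ordered by distance, so the "prompt the closest driver, then move to the next" step can walk it in order.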
3. Design a URL shortening
Required Features
Returns a URL that is shorter than the original
Must store the original URL
Newly generated URL must be able to link to the stored original
Shortened URL should allow redirects
Must support custom short URLs
Must support many requests at once
Common Problems
What if two users input the same custom URL?
What if there are more users than expected?
How does the database regulate storage space?
Tools to Consider
Use hashing to link original and new URLs
Use a REST API behind a load balancer to handle high traffic and
front-end client communication
Use multithreading to handle multiple requests at once
Use NoSQL database to store original URLs (no relation
between stored URLs)
Learn how to solve this problem in our step-by-step guide,
Design TinyURL and Instagram
(https://www.educative.io/blog/system-design-tinyurl-
instagram)
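As a rough illustration of the hashing approach and the custom-URL conflict raised above, the sketch below derives a short code from a hash of the original URL and re-hashes with a salt on collision. The Shortener class, the 7-character base62 code length, and the in-memory dictionary standing in for the NoSQL store are all assumptions.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters
CODE_LEN = 7  # assumed short-code length


def base62(n, length):
    """Encode the low digits of n as a fixed-length base62 string."""
    chars = []
    for _ in range(length):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(chars)


class Shortener:
    def __init__(self):
        self.by_code = {}  # stand-in for the key-value store

    def shorten(self, url, custom=None):
        if custom is not None:
            # Reject a custom code that already points at a different URL.
            if self.by_code.get(custom, url) != url:
                raise ValueError("custom code already taken")
            self.by_code[custom] = url
            return custom
        salt = 0
        while True:
            digest = hashlib.sha256(f"{url}#{salt}".encode()).digest()
            code = base62(int.from_bytes(digest[:8], "big"), CODE_LEN)
            existing = self.by_code.setdefault(code, url)
            if existing == url:
                return code
            salt += 1  # code belongs to another URL: retry with a new salt

    def resolve(self, code):
        """Return the original URL for a code, or None if unknown."""
        return self.by_code.get(code)
```

Shortening the same URL twice returns the same code, which keeps storage from growing when popular links are submitted repeatedly.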
Required Features
Robust newsfeed and recommendation system
Users can make public posts
Other users can comment or like posts
Must comfortably accommodate many users at once
System must be highly available
Common Problems
Famous users will have millions of followers. How are they
handled compared with standard users?
How does the system weight posts by age? Old posts are less
likely to be viewed than new posts.
What’s the ratio of read-focused to write-focused nodes? Are
there likely to be more read requests (users viewing posts) or
write requests (users creating posts)?
How can you increase availability? How does the system
update? What happens if a node fails?
Tools to Consider
Use rolling updates and replica nodes to maximize availability.
Use a trained machine learning algorithm to recommend posts.
Create a database schema that stores celebrities and users
separately.
Use a social graph to further track following habits
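One simple way to weight posts by age, before any machine learning is involved, is a time-decay score. The formula below is a hypothetical example, not a method named in this article: the gravity exponent, the +2 hour offset, and the post dictionary fields are all assumptions.

```python
def score(likes, age_hours, gravity=1.5):
    """Higher for well-liked posts, lower as the post gets older."""
    return (likes + 1) / (age_hours + 2) ** gravity


def rank(posts, now_hours):
    """Sort posts (dicts with 'likes' and 'posted_at' in hours) best first."""
    return sorted(
        posts,
        key=lambda p: score(p["likes"], now_hours - p["posted_at"]),
        reverse=True,
    )
```

With these assumed parameters, a one-hour-old post with 10 likes outranks a two-day-old post with 50 likes, which matches the observation that old posts are less likely to be viewed.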
Required Features
Users must be able to create public posts and apply tags
Posts must be sortable by tag
Other users must be able to post comments in real-time.
The database must store data on each post (views, upvotes, etc.)
The newsfeed must display posts from followed tags AND posts
from other tags that the user will like.
Must support high traffic of viewers and new posts.
Common Problems
Does our product only need to work on the web?
Where are user uploaded images/links stored?
How will the system determine related tags? How many posts
from unfollowed tags are shown in the feed?
How are posts distributed across a network of servers?
Tools to Consider
Use an SQL database to map the relational data (users have
posts, posts have comments/likes, categories have related posts,
etc.)
Use multithreading and a load balancer layer to help support
higher traffic.
Use sharding to break up the system. Consider sharding by
category to store posts of the same tags in one machine.
Use Machine Learning and Natural Language Processing
(https://www.educative.io/blog/what-is-language-modeling-nlp)
to find correlations between tags.
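Sharding by category can be sketched as a stable hash from tag to shard, so that all posts with the same tag land on the same machine. The shard count, the ShardedPosts class, and the in-memory dictionaries standing in for separate machines are assumptions.

```python
import hashlib


def shard_for(tag, num_shards):
    """Map a tag to a shard index with a stable hash.

    Python's built-in hash() is salted per process, so it is not used here.
    """
    digest = hashlib.sha256(tag.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class ShardedPosts:
    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        self.shards = [{} for _ in range(num_shards)]  # one dict per "machine"

    def add(self, post_id, tag):
        shard = self.shards[shard_for(tag, self.num_shards)]
        shard.setdefault(tag.lower(), []).append(post_id)

    def by_tag(self, tag):
        """Read every post for a tag from the single shard that owns it."""
        shard = self.shards[shard_for(tag, self.num_shards)]
        return list(shard.get(tag.lower(), []))
```

A trade-off worth raising in the interview: a very popular tag makes its shard a hot spot, and changing the shard count moves most tags unless something like consistent hashing is used.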
Required Features
Users should be able to save/delete/update/share files over the
web
Old versions of documents should be saved to allow rollback
File updates should sync across multiple devices
Common Problems
Where are the files stored?
How do you handle updates? Do you re-upload the entire file
again?
Do small updates require a full file update?
How does the system handle two users updating a document at
the same time?
Tools to Consider
Use chunking to split files into multiple sections. Updates only
re-upload the section rather than the whole file.
Use cloud storage like Amazon S3
(https://www.educative.io/blog/amazon-aws-best-services) to
handle the internal database.
Make the client constantly check with the server to ensure
concurrent updates are applied.
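The chunking idea can be sketched as follows: hash each fixed-size chunk, compare against the hashes of the previous version, and re-upload only the chunks that differ. The 4-byte chunk size is purely for demonstration (real systems use far larger chunks), and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def chunk_hashes(data, size=CHUNK_SIZE):
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def changed_chunks(old, new, size=CHUNK_SIZE):
    """Indexes of chunks in `new` that are missing from or differ in `old`."""
    old_hashes = chunk_hashes(old, size)
    new_hashes = chunk_hashes(new, size)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]
```

For example, changing the middle of b"aaaabbbbcccc" and appending two bytes marks only chunks 1 and 3 for upload. Note that fixed-size chunks handle in-place edits well, but an insertion near the start shifts every later chunk.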
Required Features
Videos should be uploadable over the web
Users should receive an uninterrupted stream over the internet
Video statistics should be stored and accessible for every video.
Comments must be saved and displayed with the video to other
users
Should support high traffic of several thousand users
Common Problems
How will your service ensure smooth video streaming on
various internet qualities?
How does your service respond to a sudden drop in streaming
speed (buffering, reduced quality, etc.)?
How are the videos stored?
Tools to Consider
Use cloud technology (https://www.educative.io/blog/beginners-
guide-cloud-computation) to store and transmit video data.
Use Machine Learning to suggest new video content.
Prevent stuttering for inconsistent connections with a delay.
The user views data from a few moments ago rather than as it
comes in.
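One possible response to a sudden drop in streaming speed is to lower the video quality based on measured bandwidth and on how much video is already buffered. The sketch below is an assumption-laden illustration: the bitrate ladder, the safety factor, and the low-buffer threshold are made-up values, not figures from any real service.

```python
RENDITIONS_KBPS = [400, 1000, 2500, 5000]  # assumed quality ladder, low to high


def pick_bitrate(measured_kbps, buffer_seconds, safety=0.8, low_buffer=5.0):
    """Choose the highest rendition that fits the bandwidth budget.

    The budget is a fraction of measured bandwidth, and it is halved when the
    playback buffer is nearly empty so the player can refill it.
    """
    budget = measured_kbps * safety
    if buffer_seconds < low_buffer:
        budget *= 0.5
    fitting = [r for r in RENDITIONS_KBPS if r <= budget]
    return fitting[-1] if fitting else RENDITIONS_KBPS[0]
```

The buffer here plays the role of the delay described above: the viewer watches data from a few moments ago, which gives the player time to switch quality before playback stalls.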
8. Design an API Rate Limiter for
sites like Firebase or Github
For this question, you’ll create an API Rate Limiter that limits the
number of API calls a service can receive in a given time period to
avoid an overload.
The interviewer can ask for this at various scales, from a single
machine to an entire distributed network.
Required Features
Devices are limited to 10 requests per hour
The Limiter must notify the user if their request is blocked.
Must handle traffic suitable to its scale.
Common Problems
How does your system measure requests per hour? If a user
makes 10 requests at 1:20 then another 10 at 2:10, they’ve made
20 in the same 1-hour window despite the hour change.
How would designing for a distributed system differ from a
local system?
Tools to Consider
Use sliding time windows to avoid hourly resets.
Save a counter integer instead of the request itself to save
space.
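Both tools above can be combined in a sliding-window counter: keep only two integers per device (the current and previous window counts) and weight the previous count by how much of it still overlaps the sliding hour. This is a minimal single-machine sketch; the class name and the state layout are assumptions, and the estimate is an approximation rather than an exact request log.

```python
import math


class SlidingWindowCounter:
    def __init__(self, limit=10, window=3600.0):
        self.limit = limit
        self.window = window
        # device_id -> (window_index, current_count, previous_count)
        self.state = {}

    def allow(self, device_id, now):
        """Return True and count the request if it fits under the limit."""
        idx = math.floor(now / self.window)
        w, cur, prev = self.state.get(device_id, (idx, 0, 0))
        if idx == w + 1:
            prev, cur = cur, 0   # moved into the next window
        elif idx > w + 1:
            prev, cur = 0, 0     # idle for more than a full window
        # Fraction of the current window that has elapsed.
        elapsed = (now - idx * self.window) / self.window
        estimated = prev * (1.0 - elapsed) + cur
        if estimated + 1 > self.limit:
            self.state[device_id] = (idx, cur, prev)
            return False
        self.state[device_id] = (idx, cur + 1, prev)
        return True
```

Using the example above with times in seconds: after 10 requests at 1:20 (4800), a burst of 10 at 2:10 (7800) gets only one request through, whereas a counter that resets on the hour would allow all 10. A distributed version would need the counters in shared storage with atomic updates.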
9. Design a proximity server like
Yelp or Nearby Places/Friends
For this question, you’ll design a proximity server that stores
and reports the distance to places like restaurants. Users can search
nearby places by distance or popularity. The database must store
data for 500 million places across the globe but have low latency.
Required Features
Store up to 500 million locations.
Locations must be uniquely identified and have corresponding
data like a quality review and hours of service.
Searches must return results with minimal latency.
Users must be able to search results by distance or quality.
Common Problems
How do you store so much location data?
How do you achieve quick search results?
How does your system handle different population densities?
Rigid latitude/longitude grids will cause varied responsiveness
based on density.
How can we optimize commonly searched locations?
Tools to Consider
Use a relational database to store the list of locations and
related data.
Use caching to store data for the most popular locations.
Use sharding to split data by region.
Search locations within a dynamic grid. If there are more than
500 locations in a single cell, split the cell into 4 smaller
cells. Repeat until you only have to search fewer than 500
locations.
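The dynamic grid described above is commonly implemented as a quadtree. Here is a minimal sketch under assumptions: coordinates are plain x/y numbers, the class and method names are hypothetical, and a small minimum cell size is added so that many places at identical coordinates cannot cause endless splitting.

```python
class QuadTree:
    MIN_SIZE = 1e-9  # assumed floor on cell width to stop runaway splits

    def __init__(self, x0, y0, x1, y1, capacity=500):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []      # (x, y, place_id) while this node is a leaf
        self.children = None  # four sub-cells once the node has split

    def insert(self, x, y, place_id):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, place_id)
            return
        self.points.append((x, y, place_id))
        x0, _, x1, _ = self.bounds
        if len(self.points) > self.capacity and (x1 - x0) > self.MIN_SIZE:
            self._split()

    def _split(self):
        """Replace this leaf with four quadrants and redistribute its points."""
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [
            QuadTree(x0, y0, mx, my, self.capacity),
            QuadTree(mx, y0, x1, my, self.capacity),
            QuadTree(x0, my, mx, y1, self.capacity),
            QuadTree(mx, my, x1, y1, self.capacity),
        ]
        points, self.points = self.points, []
        for x, y, place_id in points:
            self._child_for(x, y).insert(x, y, place_id)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def leaf_for(self, x, y):
        """Return the leaf cell covering (x, y); its points are the candidates."""
        node = self
        while node.children is not None:
            node = node._child_for(x, y)
        return node
```

Dense areas end up with many small cells and sparse areas with a few large ones, which addresses the population-density problem noted above. A real search would also check neighboring leaves when the query point sits near a cell border.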
10. Design a search engine
related service like Type-Ahead
This service will partially complete search queries and display 5
suggestions to complete the query. It should adapt to highly
searched content in real-time and suggest that to other users.
Required Features
The service should match partial queries with popular queries.
Minor spelling mistakes should be corrected, e.g. “dgo” → “dog”
Should guess the 5 most likely options based on the query
Results should update as the query is being written
Common Problems
How strong do you make the spelling mistake corrections?
How do you update selections without causing latency?
How do you determine the most likely completed query? Does
it adapt to the user’s searches?
What happens if the user types very quickly? Do suggestions
only appear after they’re done?
Tools to Consider
Use a natural language processing machine learning algorithm
to anticipate the next characters.
Use Markov Chains (https://www.educative.io/blog/deep-
learning-text-generation-markov-chains) to rank the
probability of top queries.
Update ML algorithm hourly or daily rather than in real-time to
reduce burden.
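Before reaching for machine learning, the core matching behavior can be prototyped with plain frequency counts. The sketch below is a stand-in for the ML and Markov chain approaches listed above, not an implementation of them: it indexes every prefix of each recorded query and returns the 5 most frequent matches. The TypeAhead class is hypothetical, and spelling correction is not covered.

```python
from collections import Counter, defaultdict


class TypeAhead:
    def __init__(self, k=5):
        self.k = k
        self.counts = Counter()             # query -> times searched
        self.by_prefix = defaultdict(set)   # prefix -> queries starting with it

    def record(self, query):
        """Count a completed search and index all of its prefixes."""
        q = query.strip().lower()
        if not q:
            return
        self.counts[q] += 1
        for i in range(1, len(q) + 1):
            self.by_prefix[q[:i]].add(q)

    def suggest(self, prefix):
        """Return up to k queries for the prefix, most searched first."""
        candidates = self.by_prefix.get(prefix.lower(), ())
        # Ties are broken alphabetically so results are deterministic.
        ranked = sorted(candidates, key=lambda q: (-self.counts[q], q))
        return ranked[: self.k]
```

Storing every prefix trades memory for lookup speed. In line with the last tool above, the counts could be rebuilt in hourly or daily batches instead of on every search.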
Happy learning!
Fahim ul Haq