It's worth noting that there are related system design problems around designing Blob
Storage itself. This is out of scope for this problem, but you may consider doing some
research on your own to understand how Blob Storage works and how it's designed.
Non-Functional Requirements
Core Requirements
Many candidates struggle with the CAP theorem trade-off for this question. Remember, you prioritize consistency over availability only when the system would break if a read did not receive the most recent write. For example, with a stock trading app, if a user buys a share of AAPL in Germany and another user immediately tries to buy a share of AAPL in the US, you need to be sure that the first transaction has been replicated to the US before you can proceed. However, for a file storage system like Dropbox, it's okay if a user in Germany uploads a file and a user in the US can't see it for a few seconds.
The Set Up
Planning the Approach
Before you move on to designing the system, it's important to start by taking a
moment to plan your strategy. Fortunately, for these product design style
questions, the plan should be straightforward: build your design up sequentially,
going one by one through your functional requirements. This will help you stay
focused and ensure you don't get lost in the weeds as you go. Once you've satisfied
the functional requirements, you'll rely on your non-functional requirements to
guide you through the deep dives.
1. File: This is the raw data that users will be uploading, downloading, and sharing.
2. FileMetadata: This is the metadata associated with the file. It will include information like the file's name, size, mime type, and the user who uploaded it.
3. User: The user of our system.
In the actual interview, this can be as simple as a short list like this. Just make sure
you talk through the entities with your interviewer to ensure you are on the same
page.
As you move onto the design, your objective is simple: create a system that meets all
functional and non-functional requirements. To do this, I recommend you start by
satisfying the functional requirements and then layer in the non-functional requirements
afterward. This will help you stay focused and ensure you don't get lost in the weeds as
you go.
POST /files
Request:
{
File,
FileMetadata
}
Be aware that your APIs may change or evolve as you progress. In this case, our upload
and download APIs actually evolve significantly as we weigh the trade-offs of various
approaches in our high-level design (more on this later). You can proactively communicate
this to your interviewer by saying, "I am going to outline some simple APIs, but may come
back and improve them as we delve deeper into the design."
POST /files/{fileId}/share
Request:
{
User[] // The users to share the file with
}
With each of these requests, the user information will be passed in the headers (either via
session token or JWT). This is a common pattern for APIs and is a good way to ensure that
the user is authenticated and authorized to perform the action while preserving security.
You should avoid passing user information in the request body, as this can be easily
manipulated by the client.
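As a quick illustration (assuming an Express-style backend and JWT auth; the helper and names below are hypothetical, not part of the design above), the share endpoint would read the acting user from the Authorization header rather than trusting anything in the body:

```typescript
import express from "express";
import jwt from "jsonwebtoken";

const app = express();
app.use(express.json());

app.post("/files/:fileId/share", (req, res) => {
  // Identify the caller from the JWT in the Authorization header,
  // never from the request body.
  const token = (req.headers.authorization ?? "").replace("Bearer ", "");
  try {
    const { userId } = jwt.verify(token, process.env.JWT_SECRET!) as { userId: string };
    const { users } = req.body as { users: string[] }; // the users to share with
    shareFile(req.params.fileId, userId, users);       // placeholder persistence call
    res.sendStatus(200);
  } catch {
    res.sendStatus(401);
  }
});

// Placeholder: in the real design this writes rows to the shared files table.
function shareFile(fileId: string, sharedBy: string, users: string[]): void {}
```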
High-Level Design
1) Users should be able to upload a file from any
device
The main requirement for a system like Dropbox is to allow users to upload files. When it comes to storing a file, we need to consider two things: where we store the file's contents (the raw bytes) and where we store its metadata.
For the metadata, we can use a NoSQL database like DynamoDB. DynamoDB is a fully managed NoSQL database hosted by AWS. Our metadata is loosely structured, with few relations and the main query pattern being to fetch files by user. This makes DynamoDB a solid choice, but don't get too caught up in making the right choice here in your interview. The reality is that a SQL database like PostgreSQL would work just as well for this use case. Learn more about how to choose the right database (and why it may not matter) here.
Our schema will be a simple document and can start with something like this:
{
  "id": "123",
  "name": "file.txt",
  "size": 1000,
  "mimeType": "text/plain",
  "uploadedBy": "user1"
}
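For a concrete sense of the access patterns, here is a small sketch using the AWS SDK v3 document client; the table name and the GSI on uploadedBy are illustrative assumptions:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Save a FileMetadata document.
async function saveMetadata(meta: Record<string, unknown>): Promise<void> {
  await ddb.send(new PutCommand({ TableName: "files", Item: meta }));
}

// Main query pattern: all files uploaded by a given user, served by a
// GSI keyed on uploadedBy (index name is a made-up example).
async function filesForUser(userId: string) {
  const { Items } = await ddb.send(new QueryCommand({
    TableName: "files",
    IndexName: "uploadedBy-index",
    KeyConditionExpression: "uploadedBy = :u",
    ExpressionAttributeValues: { ":u": userId },
  }));
  return Items ?? [];
}
```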
As for how we store the file itself, we have a few options. Let's take a look at the
trade-offs of each.
Presigned URLs are a feature provided by cloud storage services, such as Amazon S3, that
allow temporary access to private resources. These URLs are generated with a specific
expiration time, after which they become invalid, offering a secure way to share files
without altering permissions. When a presigned URL is created, it includes authentication
information as part of the query string, enabling controlled access to otherwise private
objects.
This makes them ideal for use cases like temporary file sharing, uploading objects to a
bucket without giving users full API access, or providing limited-time access to resources.
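As a rough sketch of what minting one looks like with the AWS SDK v3 (the bucket name, key, and expiry here are illustrative assumptions, not part of the design above):

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// Mint a URL the client can PUT a file (or a chunk) to directly.
// No AWS credentials ever reach the client, and the link dies after 5 minutes.
async function presignUpload(key: string): Promise<string> {
  const command = new PutObjectCommand({ Bucket: "dropbox-clone-files", Key: key });
  return getSignedUrl(s3, command, { expiresIn: 300 });
}
```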
The main consideration here in an interview is how you can make this process fast and efficient. Let's break it down.
Uploader: This is the client that uploads the file. It could be a web browser, a mobile app, or a desktop app.
Downloader: This is the client that downloads the file. Of course, this can be the same client as the uploader, but it doesn't have to be. We separate them in our design for clarity.
LB & API Gateway: This is the load balancer and API Gateway that sits in front of our application servers. It's responsible for routing requests to the appropriate server and handling things like SSL termination, rate limiting, and request validation.
File Service: The file service is only responsible for reading from and writing to the file metadata DB as well as requesting presigned URLs from S3. It doesn't actually handle the file upload or download. It's just a middleman between the client and S3.
File Metadata DB: This is where we store metadata about the files. This includes things like the file name, size, MIME type, and the user who uploaded the file. We also store a shared files table here that maps files to users who have access to them. We use this table to enforce permissions when a user tries to download a file.
S3: This is where the files are actually stored. We upload and download files directly to and from S3 using the presigned URLs we get from the file service.
CDN: This is a content delivery network that caches files close to the user to reduce latency. We use the CDN to serve files to the downloader.
1. Progress Indicator: Users should be able to see the progress of their upload so that they know it's working and how long it will take.
2. Resumable Uploads: Users should be able to pause and resume uploads. If they lose their internet connection or close the browser, they should be able to pick up where they left off rather than re-uploading the 49GB that may have already been uploaded before the interruption.
This is, in some sense, the meat of the problem and where I usually end up spending the most time with candidates in a real interview.
Timeouts: Web servers and clients typically have timeout settings to prevent indefinite waiting for a response. A single POST request for a 50GB file could easily exceed these timeouts. In fact, this may be an appropriate time to do some quick math in the interview. If we have a 50GB file and an internet connection of 100Mbps, how long will it take to upload the file? 50GB * 8 bits/byte = 400,000 megabits, and 400,000 megabits / 100Mbps = 4,000 seconds, which works out to roughly 67 minutes, well over an hour for a single request.
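As a quick back-of-the-envelope sketch of that calculation (illustrative only):

```typescript
// 50 GB over a 100 Mbps link, ignoring protocol overhead.
const fileGigabytes = 50;
const linkMbps = 100;

const fileMegabits = fileGigabytes * 1000 * 8;   // 400,000 Mb
const seconds = fileMegabits / linkMbps;         // 4,000 s
console.log(`${(seconds / 60).toFixed(0)} min`); // ~67 minutes
```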
To address these limitations, we can use a technique called "chunking" to break the
file into smaller pieces and upload them one at a time (or in parallel, depending on
network bandwidth). Chunking needs to be done on the client so that the file can be
broken into pieces before it is sent to the server (or S3 in our case). A very common
mistake candidates make is to chunk the file on the server, which effectively defeats
the purpose since you still upload the entire file at once to get it on the server in the
first place. When we chunk, we typically break the file into 5-10 MB pieces, but this
can be adjusted based on the network conditions and the size of the file.
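A minimal sketch of that client-side chunking, assuming a browser environment (the 5 MB chunk size is just the lower end of the range mentioned above):

```typescript
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB

// Split the file into chunks on the client, before anything goes over the network.
// File.slice is cheap: it creates views into the file, not copies.
function chunkFile(file: File): Blob[] {
  const chunks: Blob[] = [];
  for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
    chunks.push(file.slice(offset, offset + CHUNK_SIZE));
  }
  return chunks;
}
```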
With chunks, it's rather straightforward for us to show a progress indicator to the
user. We can simply track the progress of each chunk and update the progress bar
as each chunk is successfully uploaded. This provides a much better user experience
than the user simply staring at a spinning wheel for an hour.
The next question is: how will we handle resumable uploads? We need to keep
track of which chunks have been uploaded and which haven't. We can do this by
saving the state of the upload in the database, specifically in our FileMetadata
table. Let's update the FileMetadata schema to include a chunks field.
{
  "id": "123",
  "name": "file.txt",
  "size": 1000,
  "mimeType": "text/plain",
  "uploadedBy": "user1",
  "status": "uploading",
  "chunks": [
    {
      "id": "chunk1",
      "status": "uploaded"
    },
    {
      "id": "chunk2",
      "status": "uploading"
    },
    {
      "id": "chunk3",
      "status": "not-uploaded"
    }
  ]
}
When the user resumes the upload, we can check the chunks field to see which
chunks have been uploaded and which haven't. We can then start uploading the
chunks that haven't been uploaded yet. This way, the user doesn't have to start the
upload from scratch if they lose their internet connection or close the browser.
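A rough sketch of that resume check on the client, assuming a hypothetical GET /files/{fileId} endpoint that returns the FileMetadata document shown above and a placeholder uploadChunk helper:

```typescript
interface ChunkMeta { id: string; status: "uploaded" | "uploading" | "not-uploaded"; }
interface FileMeta { id: string; status: string; chunks: ChunkMeta[]; }

// Fetch any existing metadata for this fileId and upload only the chunks
// that haven't been confirmed yet.
async function resumeUpload(fileId: string, localChunks: Map<string, Blob>): Promise<void> {
  const res = await fetch(`/files/${fileId}`);
  if (!res.ok) return; // no prior upload; start fresh instead
  const meta: FileMeta = await res.json();
  for (const chunk of meta.chunks) {
    if (chunk.status !== "uploaded") {
      await uploadChunk(fileId, chunk.id, localChunks.get(chunk.id)!);
    }
  }
}

// Placeholder: requests a presigned URL from our backend and PUTs the chunk bytes to S3.
async function uploadChunk(fileId: string, chunkId: string, data: Blob): Promise<void> {}
```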
But how should we ensure this chunks field is kept in sync with the actual chunks that have been uploaded? There are two approaches we can take: the client can report back to our server after each chunk upload succeeds, or S3 can notify our backend directly via event notifications. The latter is harder for a client to spoof, and it's the approach we use in the flow below.
Next, let's talk about how to uniquely identify a file and a chunk. When you try to resume an upload, the very first question that should be asked is: (1) Have I tried to upload this file before? and (2) If yes, which chunks have I already uploaded? To answer the first question, we cannot naively rely on the file name. This is because two different users (or even the same user) could upload files with the same name. Instead, we need to rely on a unique identifier that is derived from the file's content. This is called a fingerprint.
For resumable uploads, the process involves not only fingerprinting the entire file but also generating fingerprints for each individual chunk. This chunk-level fingerprinting allows the system to precisely identify which parts of the file have already been transmitted.
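In a browser, one reasonable way to compute these fingerprints is the Web Crypto API. A sketch, hashing a chunk (or the whole file) with SHA-256:

```typescript
// Hex-encoded SHA-256 fingerprint of a Blob (a chunk, or the whole file).
// For very large files you'd hash incrementally as you read, but the idea is the same.
async function fingerprint(blob: Blob): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", await blob.arrayBuffer());
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```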
Taking a step back, we can tie it all together. Here is what will happen when a user uploads a large file:
1. The client will chunk the file into 5-10MB pieces and calculate a fingerprint for each chunk. It will also calculate a fingerprint for the entire file; this becomes the fileId.
2. The client will send a GET request to fetch the FileMetadata for the file with the given fileId (fingerprint) in order to see if it already exists -- in which case, we can resume the upload.
3. If the file does not exist, the client will POST a request to /files/presigned-url to get a presigned URL for the file. The backend will save the file metadata in the FileMetadata table with a status of "uploading" and the chunks array will be a list of the chunk fingerprints with a status of "not-uploaded".
4. The client will then upload each chunk to S3 using the presigned URL. After each chunk is uploaded, S3 will send a message to our backend using S3 event notifications. Our backend will then update the chunks field in the FileMetadata table to mark the chunk as "uploaded" (a sketch of this handler appears below).
5. Once all chunks in our chunks array are marked as "uploaded", the backend will update the FileMetadata table to mark the file as "uploaded".
All throughout this process, the client is responsible for keeping track of the progress of the upload and updating the user interface accordingly so the user knows how far along they are and how much longer it will take.
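To make step 4 concrete, here is a rough sketch of the backend handler, assuming an AWS Lambda wired to S3 event notifications, object keys of the form {fileId}/{chunkFingerprint}, and the DynamoDB table from earlier. All names are illustrative, and a production version would need a conditional update to avoid racing concurrent chunk notifications:

```typescript
import type { S3Event } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Invoked by S3 after each chunk object is written.
export async function handler(event: S3Event): Promise<void> {
  for (const record of event.Records) {
    const [fileId, chunkId] = record.s3.object.key.split("/");

    const { Item } = await ddb.send(new GetCommand({ TableName: "files", Key: { id: fileId } }));
    if (!Item) continue;

    // Mark this chunk uploaded; flip the file to "uploaded" once every chunk is confirmed.
    const chunks = (Item.chunks as { id: string; status: string }[]).map((c) =>
      c.id === chunkId ? { ...c, status: "uploaded" } : c
    );
    const fileStatus = chunks.every((c) => c.status === "uploaded") ? "uploaded" : "uploading";

    await ddb.send(new UpdateCommand({
      TableName: "files",
      Key: { id: fileId },
      UpdateExpression: "SET chunks = :c, #s = :s", // "status" is a DynamoDB reserved word
      ExpressionAttributeNames: { "#s": "status" },
      ExpressionAttributeValues: { ":c": chunks, ":s": fileStatus },
    }));
  }
}
```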
The approach we just described is not novel. In fact, this is a problem that has been solved by cloud storage providers like Amazon S3. They have a feature called Multipart Upload that allows you to upload large objects in parts. This is exactly what we just described. The client breaks the file into parts and uploads each part to S3. S3 then combines the parts into a single object. They even provide a handy JavaScript SDK which will handle all of the chunking and uploading for you.
In practice, you'd rely on this API when designing a system like Dropbox. However, it's almost certainly the case that you could not get away with just saying, "I'd use the S3 multipart upload API" in an interview without being able to explain how it works under the hood.
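For reference, leaning on that SDK looks roughly like this; the Upload helper from @aws-sdk/lib-storage handles the part splitting, parallelism, and completion, and the bucket, key, and tuning values below are illustrative:

```typescript
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

declare const largeFile: Blob; // the file (or stream) being uploaded

const s3 = new S3Client({ region: "us-east-1" });

// Under the hood the SDK runs CreateMultipartUpload, parallel UploadPart calls,
// and CompleteMultipartUpload.
const upload = new Upload({
  client: s3,
  params: { Bucket: "dropbox-clone-files", Key: "fileId-123", Body: largeFile },
  partSize: 10 * 1024 * 1024, // 10 MB parts
  queueSize: 4,               // parts uploaded in parallel
});

upload.on("httpUploadProgress", (p) => {
  console.log(`uploaded ${p.loaded ?? 0} of ${p.total ?? "?"} bytes`);
});

await upload.done();
```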
Beyond what we've already discussed, we can also utilize compression to
speed up both uploads and downloads. Compression reduces the size of the file,
which means fewer bytes need to be transferred. We can compress a file on the
client before uploading it and then decompress it on the server after it's uploaded.
We can also compress the file on the server before sending it to the client and then
rely on the client to decompress it.
We'll need to be smart about when we compress though. Compression is only useful
if the speed gained from transferring fewer bytes outweighs the time it takes to
compress and decompress the file. For some file types, particularly media files like
images and videos, the compression ratio is so low that it's not worth the time it
takes to compress and decompress the file. If you take a .png off your computer
right now and compress it, you'll be lucky to have decreased the file size by more
than a few percent -- so it's not worth it. For text files, on the other hand, the
compression ratio is much higher and, depending on network conditions, it may very
well be worth it. A 5GB text file could compress down to 1GB or even less depending
on the content.
In the end, you'll want to implement logic on the client that decides whether or not
to compress the file before uploading it based on the file type, size, and network
conditions.
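Here's a minimal client-side sketch of that decision, assuming the browser's built-in CompressionStream; the mime-type allowlist and size threshold are arbitrary illustrative choices:

```typescript
// Only compress types that benefit; media formats are already compressed.
const COMPRESSIBLE_PREFIXES = ["text/", "application/json", "application/xml", "image/svg+xml"];

function shouldCompress(file: File): boolean {
  return file.size > 1024 * 1024 && COMPRESSIBLE_PREFIXES.some((t) => file.type.startsWith(t));
}

// Gzip a file in the browser before upload using the built-in CompressionStream.
async function gzip(file: File): Promise<Blob> {
  const compressed = file.stream().pipeThrough(new CompressionStream("gzip"));
  return new Response(compressed).blob();
}

declare const selectedFile: File; // the file the user picked
const payload = shouldCompress(selectedFile) ? await gzip(selectedFile) : selectedFile;
```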
There are a number of compression algorithms that you can use to compress files. The
most common are Gzip, Brotli, and Zstandard. Each of these algorithms has its own
tradeoffs in terms of compression ratio and speed. Gzip is the most widely used and is
supported by all modern web browsers. Brotli is newer and has a higher compression ratio
than Gzip, but it's not supported by all web browsers. Zstandard is the newest and has the
highest compression ratio and speed, but it's not supported by all web browsers. You'll
need to decide which algorithm to use based on your specific use case.
One important fact about compression is that you should always compress before you encrypt. Encrypted data looks essentially random, so compressing it afterward yields almost no size reduction.
This is where those signed URLs we talked about earlier come back into play. When a user requests a download link, we generate a signed URL that is only valid for a short period of time (e.g. 5 minutes). This signed URL is then sent to the user, who can use it to download the file. If an unauthorized user gets a hold of the signed URL, they won't be able to use it to download the file because it will have expired. They also work with modern CDNs like CloudFront and are a feature of S3: our backend signs the URL with a secret (or key pair) along with an expiration time, and S3 or the CDN verifies that signature and expiry before serving the file.
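The download side mirrors the upload sketch from earlier: a short-lived GET URL, minted only after the permission check against the shared files table (names below are illustrative):

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// Called only after we've confirmed (via the shared files table) that this
// user is allowed to access the file. The link expires after 5 minutes.
async function presignDownload(fileId: string): Promise<string> {
  const command = new GetObjectCommand({ Bucket: "dropbox-clone-files", Key: fileId });
  return getSignedUrl(s3, command, { expiresIn: 300 });
}
```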
Mid-level
Breadth vs. Depth: A mid-level candidate will be mostly focused on breadth (80% vs
20%). You should be able to craft a high-level design that meets the functional
requirements you've defined, but many of the components will be abstractions with
which you only have surface-level familiarity.
Probing the Basics: Your interviewer will spend some time probing the basics to
confirm that you know what each component in your system does. For example, if
you add an API Gateway, expect that they may ask you what it does and how it works
(at a high level). In short, the interviewer is not taking anything for granted with
respect to your knowledge.
Mixture of Driving and Taking the Backseat: You should drive the early stages of
the interview in particular, but the interviewer doesn’t expect that you are able to
proactively recognize problems in your design with high precision. Because of this,
it’s reasonable that they will take over and drive the later stages of the interview
while probing your design.
The Bar for Dropbox: For this question, an E4 candidate will have clearly defined the API endpoints and data model and landed on a high-level design that is functional for all of uploading, downloading, and sharing. I don't expect candidates to know about
pre-signed URLs or uploading/downloading to/from S3 directly, or to immediately
know about chunking. However, I expect that when I ask probing questions like,
"You're uploading the file twice right now, how can we avoid that?" or "How can you
show a user's progress while allowing them to resume an upload?" that they can
reason through the problem and come to a solution via some back and forth.
Senior
Depth of Expertise: As a senior candidate, expectations shift towards more in-depth knowledge: about 60% breadth and 40% depth. This means you should be able to
go into technical details in areas where you have hands-on experience. It's crucial
that you demonstrate a deep understanding of key concepts and technologies
relevant to the task at hand.
Advanced System Design: You should be familiar with advanced system design
principles. For example, knowing how to utilize blob storage for large files, or how
to implement a CDN for faster downloads. You should be able to discuss the trade-
offs involved in different design choices and justify your decisions based on your
experience.
The Bar for Dropbox: For this question, E5 candidates are expected to quickly go
through the initial high-level design so that they can spend time discussing, in detail,
how to handle uploading large files, in particular. I expect them to be more proactive
here than mid-level candidates, thinking through several options and arriving at a
reasonable solution. While not strictly required, many candidates will have
experience with file uploads and can speak directly about certain APIs (like multipart
upload) and how they work.
Staff+
Emphasis on Depth: As a staff+ candidate, the expectation is a deep dive into the nuances of system design: I'm looking for about 40% breadth and 60% depth in your understanding. This level is all about demonstrating that, while you may not
have solved this particular problem before, you have solved enough problems in the
real world to be able to confidently design a solution backed by your experience.
You should know which technologies to use, not just in theory but in practice, and
be able to draw from your past experiences to explain how they’d be applied to solve
specific problems effectively. The interviewer knows you know the small stuff (REST
API, data normalization, etc) so you can breeze through that at a high level so you
have time to get into what is interesting.
Advanced System Design and Scalability: Your approach to system design should
be advanced, focusing on scalability and reliability, especially under high load
conditions. This includes a thorough understanding of distributed systems, load
balancing, caching strategies, and other advanced concepts necessary for building
robust, scalable systems.
The Bar for Dropbox: For a staff-level candidate, expectations are high regarding
the depth and quality of solutions, especially for the complex scenarios discussed
earlier. Exceptional candidates delve deeply into each of the topics mentioned above
and may even steer the conversation in a different direction, focusing extensively on
a topic they find particularly interesting or relevant. They are also expected to
possess a solid understanding of the trade-offs between various solutions and to be
able to articulate them clearly, treating the interviewer as a peer.