How Amazon S3 Stores 350 Trillion Objects with 11 Nines of Durability
Disclaimer: The details in this post have been derived from Amazon
Engineering Blog and other sources. All credit for the technical details goes to
the Amazon engineering team. The links to the original articles are present in
the references section at the end of the post. We’ve attempted to analyze the
details and provide our input about them. If you find any inaccuracies or
omissions, please leave a comment, and we will do our best to fix them.
Amazon Simple Storage Service (S3) is a highly scalable and durable object
storage service designed for developers, businesses, and enterprises.
Amazon S3 plays a vital role in the cloud computing ecosystem thanks to its durability guarantees, near-unlimited scalability, and pay-as-you-go pricing.
In this article, we’ll take a deeper look at the architecture of Amazon S3 and
understand how it works. We will also learn about the challenges faced by the
Amazon engineering team while building S3.
Evolution of Amazon S3
Amazon Simple Storage Service (S3) was launched in March 2006 as one of
the first AWS cloud services. At the time, businesses were struggling with
traditional on-premises storage limitations, including high costs, complex
scalability issues, and unreliable backup solutions.
Over the years, S3 has evolved significantly, introducing features such as cross-region replication (CRR) and event notifications for automation.
In its early days, the S3 team underestimated the explosive demand for cloud storage. Initially projected to hold a few gigabytes per customer, the service had to scale quickly to support exabytes of data as digital transformation drove rapid adoption.
As S3 usage grew exponentially, AWS had to shift its strategy from reactive
problem-solving to proactive scaling.
Amazon S3 engineers started using threat modeling to predict potential
system failures and optimize performance before problems occurred.
The diagram below shows a high-level view of the S3 service architecture.
Here’s how the overall system works. A front-end layer of request-handling services receives and authenticates incoming API calls. For example, when a user uploads a file, these services determine the best S3 storage node to store the object while balancing load across the storage fleet.
A key component of the metadata subsystem is the Partitioning Engine, which distributes index data to optimize retrieval speed.
For example, when you request a file, S3 doesn’t scan billions of objects.
Instead, the metadata service instantly finds the correct storage location.
For example, when a user uploads a file, the storage engine decides where to store it, applies encryption, and ensures copies exist across multiple Availability Zones. Supporting components include:
Background Auditors: Scan storage nodes for corrupted files and automatically repair them.
For example, if a hard drive storing S3 objects fails, the background auditors
detect the issue and trigger an automatic rebuild of the missing data.
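The audit loop can be sketched in a few lines of Python. This is a simplified illustration, not S3’s actual implementation; the fragment store, checksum scheme, and repair trigger are hypothetical stand-ins:

```python
import hashlib

# Hypothetical fragment store: fragment id -> (current bytes, checksum recorded at write time).
fragments = {
    "frag-1": (b"hello", hashlib.sha256(b"hello").hexdigest()),
    "frag-2": (b"world", hashlib.sha256(b"wor1d").hexdigest()),  # simulated bit rot
}

def audit(store):
    """Return ids of fragments whose current checksum no longer matches the recorded one."""
    corrupted = []
    for frag_id, (data, recorded) in store.items():
        if hashlib.sha256(data).hexdigest() != recorded:
            # In S3, a mismatch like this would trigger a rebuild from redundant fragments.
            corrupted.append(frag_id)
    return corrupted

print(audit(fragments))  # ['frag-2']
```

In the real system, the repair step would reconstruct the lost fragment from redundant copies or erasure-coded parity rather than simply flagging it.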
A client (user application) can send a request to write data into Amazon S3
using different interfaces such as AWS SDKs (available in multiple
programming languages), AWS CLI, or the S3 REST API.
S3 also supports multipart upload for large files. Multipart upload is required for objects larger than 5 GB and recommended for files over about 100 MB: the client splits the data into parts (at least 5 MB each, except the last) and uploads them in parallel.
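The client-side chunking can be sketched as follows. This is an illustration of the splitting logic only; real SDKs drive it through their multipart upload APIs (e.g., create/upload-part/complete calls), and 5 MB is the minimum part size:

```python
PART_SIZE = 5 * 1024 * 1024  # 5 MB: the minimum size for every part except the last

def split_into_parts(size, part_size=PART_SIZE):
    """Return (part_number, offset, length) tuples covering an object of `size` bytes."""
    parts = []
    offset = 0
    number = 1
    while offset < size:
        length = min(part_size, size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts

# A 12 MB object becomes two 5 MB parts plus a 2 MB tail, uploadable in parallel.
print(split_into_parts(12 * 1024 * 1024))
```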
The client’s request is sent to Amazon Route 53, AWS’s DNS service. Route 53 resolves the bucket’s domain (my-bucket.s3.amazonaws.com) into an IP address, and the request is then routed to the appropriate S3 regional endpoint (or to the nearest AWS edge location when S3 Transfer Acceleration is enabled).
AWS uses multi-value DNS to spread requests across multiple front-end
servers. Also, global load balancing ensures that requests are routed to the
nearest available S3 region.
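Conceptually, a multi-value DNS answer gives the client several front-end addresses for the same name, and picking one (often at random) spreads load. The IPs below are made up for illustration:

```python
import random

# Hypothetical multi-value DNS answer for an S3 endpoint:
# several front-end server IPs are returned for the same name.
resolved_ips = ["52.0.0.10", "52.0.0.11", "52.0.0.12"]

def pick_endpoint(ips):
    """A client typically uses one of the returned answers, spreading load across front ends."""
    return random.choice(ips)

print(pick_endpoint(resolved_ips))
```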
The metadata store is responsible for tracking object names, sizes, storage
locations, and permissions. Each S3 object key is mapped to a physical
storage location. Lexicographic partitioning is used to distribute high-load
buckets across multiple indexing servers.
Indexing plays a key role in fast retrieval. Instead of scanning billions of
objects, S3 retrieves objects using index lookups. Background indexing
ensures updates are consistent across all availability zones. Sharding
distributes metadata across multiple servers to avoid bottlenecks.
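Lexicographic partitioning can be illustrated with a binary search over shard split points: each metadata shard owns a contiguous key range. The split points and shard names below are hypothetical:

```python
import bisect

# Hypothetical split points: shard 0 owns keys < "g", shard 1 owns ["g", "n"), etc.
split_points = ["g", "n", "t"]
shards = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for_key(key):
    """Route an object key to the shard owning its lexicographic range (binary search)."""
    return shards[bisect.bisect_right(split_points, key)]

print(shard_for_key("invoices/2024/file1.json"))  # 'shard-1'  ("g" <= "i" < "n")
```

If one range becomes hot, adding a split point inside it divides the load between two shards, which is the essence of the dynamic rebalancing described below.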
Once the metadata is recorded, the data must be physically stored in S3.
The object is split into multiple fragments and stored across multiple disks.
AWS uses Erasure Coding (Reed-Solomon algorithms) to improve durability.
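The idea behind erasure coding can be shown with the simplest possible case: one XOR parity fragment, which tolerates the loss of any single fragment. Real Reed-Solomon codes generalize this to survive multiple simultaneous losses; the fragments below are toy data:

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Split an object into equal-size data fragments, then compute one XOR parity fragment.
data_fragments = [b"AAAA", b"BBBB", b"CCCC"]
parity = reduce(xor_bytes, data_fragments)

# Simulate losing one data fragment: XOR-ing the parity with the survivors recovers it.
lost_index = 1
survivors = [f for i, f in enumerate(data_fragments) if i != lost_index]
recovered = reduce(xor_bytes, survivors, parity)
print(recovered)  # b'BBBB'
```

Storing fragments plus parity on independent disks is what lets S3 lose hardware without losing data.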
A successful write returns a response like:

{
  "ETag": "d41d8cd98f00b204e9800998ecf8427e",
  "VersionId": "3i4aUVmMNBzHj1aJChF7sHG.jP0tGdrd"
}

Here, ETag is the checksum of the stored object, and VersionId identifies this version of the object.
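For objects uploaded with a single PUT (and not encrypted with SSE-KMS), the ETag is simply the hex MD5 digest of the object’s bytes; multipart uploads instead get an MD5-of-part-MD5s with a "-N" suffix. The empty-object digest reproduces the ETag shown above:

```python
import hashlib

def etag_for_single_put(body: bytes) -> str:
    """ETag for a non-multipart, non-SSE-KMS upload: the hex MD5 of the object bytes."""
    return hashlib.md5(body).hexdigest()

# The well-known MD5 of the empty byte string:
print(etag_for_single_put(b""))  # d41d8cd98f00b204e9800998ecf8427e
```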
The failure responses could be 403 (Forbidden), 503 (Slow Down), or 500 (Internal Server Error). From an optimization point of view, features like S3 Select can query data within an object directly, without downloading the whole object.
Partitioning the metadata index delivers two key benefits: fast lookups and scalability.
Without partitioning, a single index server would be overloaded, slowing down
performance. For example, if S3 stored 1 billion objects in a single index,
searching for an object would be extremely slow. Partitioning spreads the
objects across thousands of smaller indexes, allowing rapid lookups.
S3 sorts and distributes object keys alphabetically across storage nodes. This
prevents overloading specific servers with frequently accessed objects. If
certain key patterns become hotspots, S3 rebalances them dynamically.
Consider the following object keys:

/2024/reports/file1.json
/2024/reports/file2.json
/2024/reports/file3.json
Since all keys start with "2024", S3 would route all requests to the same
partition, creating a performance bottleneck.
By distributing keys more evenly, S3 ensures that requests are spread across
multiple storage nodes. As a best practice, AWS recommends adding high-
cardinality prefixes (random characters) at the beginning of keys to improve
load balancing and retrieval speed.
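A hypothetical helper for adding such a prefix might look like this (the prefix length and key layout are illustrative only, not an AWS API):

```python
import secrets

def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short random hex prefix so keys spread across index partitions."""
    return f"{secrets.token_hex(prefix_len // 2)}/{key}"

print(prefixed_key("2024/reports/file1.json"))  # e.g. '7f3a/2024/reports/file1.json'
```

The randomized first characters give the keys high cardinality at the front, so lexicographic partitioning naturally spreads them across shards instead of piling them onto the "2024" range.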
Handling 100 Million+ Requests Per Second
Conclusion
Amazon S3 is a technological marvel. It can handle over 350 trillion objects
and process more than 100 million requests per second with sub-millisecond
latency.