
Deep Dive on Object Storage - Amazon S3 and Amazon Glacier

The document provides an in-depth overview of Amazon S3, detailing its capabilities, architectural patterns, best practices, and various use cases. It highlights the storage classes available, data management strategies, and tools for data transfer and analysis. Additionally, it discusses the integration of Amazon S3 with other AWS services for enhanced functionality and performance monitoring.


Deep Dive on Amazon S3

Adrian Hornsby, Technical Evangelist


@adhorn
[email protected]

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to Expect from the Session

• What you need to know about Amazon S3.
• Architectural design patterns with S3.
• Best practices & tips.
• Tools to help you.
Amazon S3 in 2006
Amazon S3 today

Amazon S3 holds trillions of objects and regularly peaks at millions of requests per second.

Trillion: 1,000,000,000,000 (one million million; 10^12; SI prefix: tera-) in American and British English; 1,000,000,000,000,000,000 (one million million million; 10^18; SI prefix: exa-) in many non-English-speaking countries.
https://en.wikipedia.org/wiki/Trillion
Netflix delivers billions of hours of content from Amazon S3.

SmugMug stores billions of photos and images on Amazon S3.

Airbnb handles over 10 PB of user images on Amazon S3.

SoundCloud currently stores 2.5 PB of data on Amazon Glacier.

Nasdaq uses Amazon S3 to support years of historical tick data down to the millisecond: "We currently log 20 terabytes of new data each day, and have around 10 petabytes of data in S3." (2014)

FINRA stores over 700 TB of data on Amazon S3 for low-cost, durable, scalable storage and uses Amazon EMR for scalable compute workloads using Hive, Presto, and Spark.

Sony moved over 1M hours of video from magnetic tape to Glacier for digital preservation.
Amazon S3 usage pattern

Collect → Store → Analyze → Visualize, with Amazon S3 as the store.
Choice of storage classes on S3

• S3 Standard (active data): big data analysis, content distribution, static website hosting.
• Standard - Infrequent Access (infrequently accessed data): backup & archive, disaster recovery, file sync & share, long-retained data.
• Amazon Glacier (archive data): long-term archives, digital preservation, magnetic tape replacement.
Disaster Recovery & Backups
Back up data to Amazon S3

(Diagram: Amazon Route 53 fronting http://example.net; the traditional on-premises server copies its data to an Amazon S3 bucket.)
Data collection into Amazon S3

AWS Direct Connect, AWS Snowball, ISV Connectors, AWS Snowball Edge, Amazon Kinesis Firehose, S3 Transfer Acceleration, AWS Storage Gateway, and AWS Snowmobile.
Fun fact

Since October 2015, AWS Snowball has moved over 5 billion objects into Amazon S3, and AWS Snowball appliances have traveled a distance equal to circling the world more than 100 times.
Exabyte-scale data transfer

Up to 100 PB per Snowmobile.

Archiving
Data access pattern

(Chart: access frequency over time, typically high at first and falling off across 0, 30, 90, and 300 days.)
S3 Analytics

• Visualize the access pattern.
• Measure the object age.
• By bucket, prefix, or tag.
• Analysis-based lifecycle policy.
Export S3 Analytics
Amazon Storage Partner Solutions

• Primary Storage: solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage.
• Backup and Recovery: solutions that leverage Amazon S3 for durable data backup.
• Archive: solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup.

aws.amazon.com/backup-recovery/partner-solutions/
Note: represents a sample of storage partners.
Automate lifecycle policies

• Transition: Amazon S3 Standard → (90 days) → Amazon S3 Standard - Infrequent Access → (1 year) → Amazon Glacier.
• Deletion: Amazon S3 Standard → (1 year) → delete.
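As a sketch, the transition and deletion policies above can be written as the JSON document that S3's lifecycle configuration API accepts. The rule IDs and the "logs/" and "tmp/" prefixes are hypothetical placeholders, not from the deck:

```python
# Lifecycle configuration mirroring the two slides above:
# rule 1 transitions Standard -> Standard-IA at 90 days, -> Glacier at 1 year;
# rule 2 deletes (expires) objects after 1 year.
# Rule IDs and prefixes ("logs/", "tmp/") are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "transition-then-archive",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        },
        {
            "ID": "expire-after-one-year",
            "Filter": {"Prefix": "tmp/"},
            "Status": "Enabled",
            "Expiration": {"Days": 365},
        },
    ]
}
```

With an SDK such as boto3 this document would be applied to a bucket via the put-bucket-lifecycle-configuration call.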
Protect your data from the "oops"

Versioning (bucket states: default, versioning-enabled, suspended)
• Protects from unintended user deletes and application failures.
• New version with every upload.
• Easy retrieval; roll back to previous versions.

MFA (multi-factor authentication) protection on delete: requires additional authentication to
• Change the versioning state of your bucket.
• Permanently delete an object version.
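A toy in-memory model (not the S3 API) makes the protection concrete: on a versioning-enabled bucket, every upload appends a new version and a DELETE only appends a delete marker, so earlier versions stay retrievable:

```python
# Toy model of a versioning-enabled bucket: nothing is ever overwritten or
# erased; a delete merely appends a marker, so old data can be recovered.
class VersionedBucket:
    def __init__(self):
        self.history = {}  # key -> list of (version_id, body or None)

    def put(self, key, body):
        versions = self.history.setdefault(key, [])
        version_id = len(versions) + 1  # toy version ids; S3 uses opaque strings
        versions.append((version_id, body))
        return version_id

    def delete(self, key):
        versions = self.history.setdefault(key, [])
        versions.append((len(versions) + 1, None))  # delete marker, not erasure

    def get(self, key, version_id=None):
        versions = self.history[key]
        if version_id is None:
            _, body = versions[-1]
            if body is None:
                raise KeyError(key + ": latest version is a delete marker")
            return body
        return dict(versions)[version_id]

bucket = VersionedBucket()
v1 = bucket.put("report.csv", "old data")
bucket.put("report.csv", "new data")
bucket.delete("report.csv")               # the "oops": only a marker is added
recovered = bucket.get("report.csv", v1)  # older versions remain retrievable
```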
Content Storage & Distribution
AWS Global Infrastructure

16 Regions, 43 Availability Zones.
Cross-region replication
(PUTs only; asynchronous replication)

"The S3 cross-region replication feature will enable FINRA to transfer large amounts of data in a far more automated, timely and cost effective manner. Making use of the new feature to help meet resiliency, compliance or DR data requirements is a no brainer."

Peter Boyle, Senior Director (24 MAR 2015)
Amazon CloudFront (CDN)

• Cache content at the edge.
• Lower load on origin.
• Dynamic and static content.
• Custom SSL certificates.
• Low TTLs.
Faster upload over long distances
S3 Transfer Acceleration

• Change your endpoint, not your code.
• No firewall changes or client software.
• Longer distance, larger files, more benefit.
• Faster or free.
• 73 global edge locations.

(Diagram: uploader → AWS edge location → S3 bucket, with optimized throughput.)

Try it at S3speedtest.com
Service traffic flow
Client to S3 bucket example

1. The client resolves b1.s3-accelerate.amazonaws.com via Amazon Route 53.
2. The client sends an HTTPS PUT/POST of "upload_files.zip" to b1.s3-accelerate.amazonaws.com at an AWS edge location.
3. The edge location forwards the HTTP/S PUT/POST to an EC2 proxy in the AWS Region.
4. The proxy writes the object to the S3 bucket.
Use case: media uploads

"We have customers uploading large files from all over the world. We've seen performance improvements in excess of 500% in some cases."

Emery Wells, Cofounder/CTO
Use case: media uploads

"We loved how easy it was to get started with S3 transfer acceleration — just a simple endpoint change in our application and done. S3 transfer acceleration reduces the average time it takes for us to ingest videos from our global user base by almost half. This gives our customers the ability to edit and share videos sooner where speed is a critical factor. All this for a fraction of the cost of the solution we evaluated before."

Brian Kaiser, CTO
Use case: media uploads

"S3 transfer acceleration is way faster than we expected. It's removed the international distance barrier when uploading video. Our customers now have more time to focus on producing great videos rather than waiting for a video to upload."

Domagoj Filipovic, CTO

Multipart uploads/downloads for large objects

Large file → multiparts/multi-threads → large object in Amazon S3.

AWS SDKs:
• Automatically switch to multipart transfers when a file is over a specific size threshold.
• Upload/download a file in parallel.
• Progress callbacks to monitor transfers.
• Retries.
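The SDK behavior above can be sketched as a pure planning function: stay with a single PUT under a size threshold, otherwise cut the object into fixed-size parts that can be transferred in parallel. The 8 MiB threshold and 5 MiB part size below are illustrative defaults, not values stated in the deck:

```python
# Sketch of the SDKs' multipart decision: single PUT for small objects,
# fixed-size parts for large ones. Values are illustrative.
MULTIPART_THRESHOLD = 8 * 1024 * 1024  # switch to multipart above 8 MiB
PART_SIZE = 5 * 1024 * 1024            # 5 MiB parts (S3's minimum, except the last)

def plan_upload(size_bytes):
    """Return the list of (offset, length) parts for an object of this size."""
    if size_bytes <= MULTIPART_THRESHOLD:
        return [(0, size_bytes)]       # one ordinary PUT, no multipart
    parts = []
    offset = 0
    while offset < size_bytes:
        length = min(PART_SIZE, size_bytes - offset)
        parts.append((offset, length))
        offset += length
    return parts

# a 12 MiB object becomes three parts: 5 MiB + 5 MiB + 2 MiB
parts = plan_upload(12 * 1024 * 1024)
```

In boto3 the equivalent knobs live on the transfer configuration object passed to the high-level upload/download helpers.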
Organize your data with object tags
Manage data based on what it is, as opposed to where it's located.

Up to 10 tags per object:
• Tag your objects with key-value pairs.
• Write policies once based on the type of data.
• Put an object with tags, or add tags to existing objects.
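A minimal sketch of "manage data based on what it is": a policy written once against a tag applies to every object carrying that tag, regardless of where the object lives in the bucket. The object keys and tag names below are hypothetical:

```python
# Objects carry key-value tags; selecting by tag ignores key layout entirely.
# Keys and tags here are made up for illustration.
objects = {
    "raw/2017/a.json": {"classification": "sensitive", "project": "alpha"},
    "raw/2017/b.json": {"classification": "public"},
    "img/logo.png":    {"project": "alpha"},
}

def select_by_tag(objects, tag_key, tag_value):
    """Return the keys of all objects tagged tag_key=tag_value."""
    return sorted(k for k, tags in objects.items() if tags.get(tag_key) == tag_value)

# a lifecycle or access policy targeting project=alpha would cover both of these:
alpha_objects = select_by_tag(objects, "project", "alpha")
```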
Higher TPS by distributing key names
If you regularly exceed 100 TPS on a bucket:
• Avoid key names that start with a date or monotonically increasing numbers.

Don’t do this…
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
Distributing key names
Add randomness to the beginning of the key name
with a hash or reversed timestamp (ssmmhhddmmyy)
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
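Both prefixing schemes above can be sketched as small key-generation helpers: a hash-derived prefix and a reversed timestamp (ssmmhhddmmyy), each of which spreads sequential uploads across the key space instead of piling them onto one prefix. The 8-character prefix length is our choice for illustration:

```python
import hashlib

# Two ways to randomize the start of a key name, per the slide above.
def hashed_key(filename):
    """Prefix the key with the first 8 hex chars of its MD5 hash (length is illustrative)."""
    prefix = hashlib.md5(filename.encode()).hexdigest()[:8]
    return prefix + "-" + filename

def reversed_timestamp_key(yy, mm, dd, hh, mi, ss, filename):
    """Prefix the key with a reversed timestamp: ssmmhhddmmyy."""
    return "%02d%02d%02d%02d%02d%02d-%s" % (ss, mi, hh, dd, mm, yy, filename)

k1 = hashed_key("2013_11_13.jpg")
# 2013-11-13 16:45:33 reversed -> "334516131113-..."
k2 = reversed_timestamp_key(13, 11, 13, 16, 45, 33, "2013_11_13.jpg")
```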
Website Hosting
App 0.1: Simple Static Website

Amazon Route 53 points http://poliko.adhorn.me at the Amazon S3 website endpoint http://poliko.adhorn.me.s3-website-eu-west-1.amazonaws.com.
App 0.2: Simple Static Website

Amazon Route 53 points http://poliko.adhorn.me at Amazon CloudFront, which serves the content from Amazon S3.
App 0.3: Separate static assets from dynamic content

Amazon Route 53 and Amazon CloudFront front the site; static assets (*.js, *.jpeg, *.mp4) are served from Amazon S3, while dynamic content comes from the application (behind an Elastic IP) and its database.
Big Data Analytics
The Dark Data Problem
Most generated data is unavailable for analysis.

(Chart: data volume by year, 1990–2020; generated data grows far faster than the data available for analysis.)

Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Amazon Athena
Amazon Athena: SQL Query on S3

• No loading of data.
• Serverless.
• Supports text, CSV, TSV, JSON, and Avro.
• Columnar formats: Apache ORC & Parquet.
• Access via the console or a JDBC driver.
• $5 per TB scanned from S3.
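The pricing above makes the value of columnar formats concrete: Athena bills only for bytes scanned, so a format that reads a tenth of the bytes costs a tenth as much. A back-of-envelope sketch, treating 1 TB as 2^40 bytes for round numbers:

```python
# Back-of-envelope Athena cost from the $5/TB-scanned price on the slide.
# 1 TB is treated as 2**40 bytes here for simplicity.
PRICE_PER_TB = 5.00

def athena_query_cost(bytes_scanned):
    """Dollar cost of a query that scans this many bytes."""
    return PRICE_PER_TB * bytes_scanned / (1024 ** 4)

# scanning 200 GiB of raw CSV:
csv_cost = athena_query_cost(200 * 1024 ** 3)
# the same data in a columnar format that reads 1/10 of the bytes:
parquet_cost = athena_query_cost(20 * 1024 ** 3)
```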
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes.

• Fast at exabyte scale.
• Elastic & highly available.
• On-demand, pay-per-query.
• High concurrency: multiple clusters access the same data.
• No ETL: query data in place using open file formats.
• Full Amazon Redshift SQL support.
S3 Inventory

Save time: daily or weekly delivery to an S3 bucket, with CSV file output.

Fields: Bucket, Key, Version Id, Is Latest, Delete Marker, Size, Last Modified, ETag, StorageClass, Multipart Uploaded, Replication Status.
Indexing S3 content using Elasticsearch

Source data lands in an S3 bucket; an ObjectCreate event invokes an AWS Lambda function, which indexes a record into Amazon Elasticsearch Service; search queries then run against the index.
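The Lambda glue in that flow can be sketched as a handler that parses the S3 ObjectCreated notification and builds the record to index. The field names in the index record are our own choice, and the actual call to Amazon Elasticsearch Service is omitted:

```python
# Sketch of the indexing Lambda: extract bucket/key/size from each S3 event
# record and shape the document to index. The real function would follow up
# with a PUT to Amazon Elasticsearch Service (omitted here).
def handler(event, context=None):
    documents = []
    for record in event["Records"]:
        s3 = record["s3"]
        documents.append({
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
            "size": s3["object"]["size"],
            "event": record["eventName"],
        })
    return documents

# a minimal event shaped like S3's ObjectCreated:Put notification
sample_event = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-source-bucket"},
           "object": {"key": "data/file.json", "size": 1024}},
}]}
docs = handler(sample_event)
```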
Cloud Native Applications
Cognito support for Identity

Cognito User Pools, with SAML identity provider federation: the user signs in with a username and password, gets AWS credentials from Amazon Cognito, and can then call DynamoDB, S3, API Gateway, and Lambda.

Leverage Amazon S3 directly from the app.

MyApp ("Save Pic"): the app authenticates through Amazon Cognito and then puts the object directly to Amazon S3 (http://poliko.adhorn.me).

Poliko ("Take Pic"): the app authenticates through Amazon Cognito; the picture is analyzed with Amazon Rekognition (detect labels, detect faces), speech is synthesized with Amazon Polly, and the site is served from Amazon S3 with "static website hosting" enabled.
Event Driven Architecture

Event driven: an event on B by A triggers C (A → B → C).

Amazon S3 with event-driven workflow: as source data lands in Amazon S3, events can go to an SQS queue, an SNS topic, or a Lambda function invoked in response to the events.
Event-driven validation layer on Amazon S3

Source data arrives in a data staging layer (/data/source-raw); an AWS Lambda input validation and conversion layer writes the results to a second staging layer (/data/source-validated).
Event-driven photo manipulation with Lambda

Users upload photos to an S3 source bucket; an AWS Lambda function triggered on PUTs (e.g., resize images) writes the results to an S3 destination bucket.
Event-driven photo analysis with Lambda & Rekognition

Users upload photos to an S3 source bucket; an AWS Lambda function triggered on PUTs sends the images to Amazon Rekognition for analysis.
Audit and Monitoring
Audit and monitor access
AWS CloudTrail data events
Use cases:
• Perform security analysis
• Meet your IT auditing and compliance needs
• Take immediate action on activity

How it works:
• Capture S3 object-level requests
• Enable at the bucket level
• Logs delivered to your S3 bucket
• $0.10 per 100,000 data events
Monitor performance and operation
Amazon CloudWatch metrics for S3

• Generate metrics for the data of your choice.
• Entire bucket, prefixes, and tags.
• Up to 1,000 groups per bucket.
• 1-minute CloudWatch metrics.
• Alert and alarm on metrics.
• $0.30 per metric per month.
Example
Summary
"All Roads Lead to Amazon S3" (a play on "all roads lead to Rome")
Thank you!!

Adrian Hornsby, Technical Evangelist


@adhorn
[email protected]

