Deep Dive on Object Storage - Amazon S3 and Amazon Glacier
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to Expect from the Session
(1,000,000,000,000; one million million; 10^12; SI prefix: tera-), used in American and British English
(1,000,000,000,000,000,000; one million million million; 10^18; SI prefix: exa-), used in many non-English-speaking countries
https://en.wikipedia.org/wiki/Trillion
Netflix delivers billions of hours of content from Amazon S3.
Amazon S3
Choice of storage classes on S3
(Diagram: on-premises infrastructure, where data on a traditional server is copied to an Amazon S3 bucket.)
Data collection into Amazon S3
• AWS Direct Connect
• AWS Snowball
• AWS Snowball Edge
• ISV connectors
Access frequency:
• Solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage
• Solutions that leverage Amazon S3 for durable data backup
• Solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup
aws.amazon.com/backup-recovery/partner-solutions/
Note: The partners shown represent only a sample of storage partners.
Automate with lifecycle policies
(Diagram: a lifecycle policy timeline starting from Amazon S3 Standard, with transitions at 90 days and 1 year, followed by a delete action.)
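The lifecycle rule pictured above can be written down as configuration. A minimal sketch, assuming boto3: the helper below builds a dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The helper name, the `logs/` prefix, and the 90/365/730-day schedule are illustrative assumptions, not values taken from the slide.

```python
def build_lifecycle_rule(prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> dict:
    """Return a lifecycle configuration with one rule (hypothetical helper).

    Objects under `prefix` move to Standard-IA, then to Glacier, and are
    finally deleted. Day counts are measured from object creation.
    """
    # Our own sanity check (not an S3 API call): each step must follow the previous one.
    if not 0 < ia_days < glacier_days < expire_days:
        raise ValueError("expected 0 < ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


config = build_lifecycle_rule("logs/", ia_days=90, glacier_days=365, expire_days=730)
# With credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
```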
Protect your data from the “oops”
Bucket states:
• Default (unversioned)
• Versioning-enabled
• Versioning-suspended
Versioning
MFA (multi-factor authentication): protection on delete
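To see why versioning protects against the “oops”, the toy model below imitates the three bucket states in memory. It is a hypothetical simulation for illustration, not how S3 is implemented: with versioning enabled, an overwrite keeps the old version and a delete only adds a delete marker. On the real service, versioning is turned on with boto3's `put_bucket_versioning(Bucket=..., VersioningConfiguration={"Status": "Enabled"})`; MFA Delete additionally sets `"MFADelete": "Enabled"` and passes the `MFA` argument (device serial number and current code). Once enabled, versioning can be suspended but not switched back off.

```python
import itertools


class ToyVersionedBucket:
    """Toy in-memory model of S3 versioning behavior; not the real service.

    state is "Unversioned" (the default), "Enabled", or "Suspended".
    """

    def __init__(self, state: str = "Unversioned"):
        self.state = state
        self.history = {}  # key -> list of (version_id, body); body None is a delete marker
        self._counter = itertools.count(1)

    def _add(self, key, body):
        # Enabled buckets mint a new version ID; otherwise the ID is "null",
        # and a new "null" version replaces the previous "null" one.
        if self.state == "Enabled":
            version_id = f"v{next(self._counter)}"
        else:
            version_id = "null"
        versions = [v for v in self.history.get(key, []) if v[0] != version_id]
        versions.append((version_id, body))
        self.history[key] = versions
        return version_id

    def put(self, key, body):
        return self._add(key, body)

    def delete(self, key):
        if self.state == "Unversioned":
            self.history.pop(key, None)  # gone for good
            return None
        return self._add(key, None)  # insert a delete marker instead

    def get(self, key, version_id=None):
        versions = self.history.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)  # missing, or latest version is a delete marker
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))
```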
16 Regions, 43 Availability Zones
Cross-region replication: asynchronous replication (PUTs only)
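A replication rule is also plain configuration. The sketch below, assuming boto3, builds a `ReplicationConfiguration` for `put_bucket_replication` in the original prefix-based rule format; the IAM role ARN and bucket names in the usage comment are hypothetical. S3 requires versioning on both the source and destination buckets before it accepts the rule.

```python
def build_replication_config(role_arn: str, dest_bucket: str, storage_class: str = "STANDARD") -> dict:
    """ReplicationConfiguration that replicates every new object to dest_bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-all",
                "Prefix": "",  # empty prefix: the whole bucket
                "Status": "Enabled",
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


# Assuming boto3 and versioning enabled on both buckets (names are hypothetical):
#   boto3.client("s3").put_bucket_replication(
#       Bucket="source-bucket",
#       ReplicationConfiguration=build_replication_config(
#           "arn:aws:iam::123456789012:role/replication-role", "backup-bucket"))
```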
“The S3 cross-region replication feature will enable FINRA to transfer
large amounts of data in a far more automated, timely and cost effective
manner. Making use of the new feature to help meet resiliency,
compliance or DR data requirements is a no brainer.”
Try it at S3speedtest.com
Service traffic flow
Client to S3 Bucket example
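1. The customer client resolves b1.s3-accelerate.amazonaws.com through Amazon Route 53.
2. The client sends “upload_files.zip” with an HTTPS PUT/POST to b1.s3-accelerate.amazonaws.com, which is served by an AWS edge location.
3. The edge location forwards the request (HTTP/S PUT/POST) to an EC2 proxy in the AWS Region.
4. The proxy writes the object to the S3 bucket.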
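The accelerated hostname in this flow follows the pattern `<bucket>.s3-accelerate.amazonaws.com`. The hypothetical helper below derives it and rejects bucket names that Transfer Acceleration cannot use (names must be DNS-compliant and contain no periods). The commented lines show one way, assuming boto3 and botocore, to enable acceleration and let the SDK use the endpoint directly.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Hostname a client would use for an accelerated upload (hypothetical helper)."""
    # Transfer Acceleration needs a DNS-compliant bucket name with no periods.
    if "." in bucket or bucket != bucket.lower():
        raise ValueError(f"{bucket!r} cannot be used with Transfer Acceleration")
    return f"{bucket}.s3-accelerate.amazonaws.com"


# Assuming boto3, the SDK can target this hostname itself once acceleration is enabled:
#   s3 = boto3.client("s3")
#   s3.put_bucket_accelerate_configuration(
#       Bucket="b1", AccelerateConfiguration={"Status": "Enabled"})
#   fast = boto3.client("s3", config=botocore.config.Config(s3={"use_accelerate_endpoint": True}))
#   fast.upload_file("upload_files.zip", "b1", "upload_files.zip")
```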
Use case: media uploads
“… before.”
- Brian Kaiser, CTO
Multipart uploads / Multithreading
AWS SDKs
• Automatically switch to multipart transfers when a file exceeds a size threshold
• Upload or download a file in parallel
• Report progress through callbacks to monitor transfers
• Retry failed requests
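The SDK behaviors listed above can be sketched in a few lines. This is a simplified illustration, not the SDK's code: `plan_parts` switches to multipart above a threshold (8 MiB, the boto3 default) and respects the S3 limits of a 5 MiB minimum part size and 10,000 parts, while `upload_parts` sends parts in parallel with retries and a progress callback. `upload_part` is a hypothetical stand-in for the S3 UploadPart call. In practice boto3 exposes these knobs through `boto3.s3.transfer.TransferConfig` (`multipart_threshold`, `multipart_chunksize`, `max_concurrency`) and the `Callback` argument of `upload_file`.

```python
import math
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum size for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(file_size, threshold=8 * MIB, part_size=8 * MIB):
    """Return (offset, length) ranges, or [] when a single PUT is enough."""
    if file_size < threshold:
        return []
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow parts if the file would otherwise need more than MAX_PARTS of them.
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    return [(off, min(part_size, file_size - off)) for off in range(0, file_size, part_size)]


def upload_parts(parts, upload_part, max_workers=4, retries=3, on_progress=None):
    """Send parts concurrently; upload_part(number, offset, length) is hypothetical."""
    attempts = max(1, retries)

    def send(numbered):
        number, (offset, length) = numbered
        for attempt in range(attempts):
            try:
                etag = upload_part(number, offset, length)
                break
            except OSError:
                if attempt == attempts - 1:
                    raise  # give up after the last attempt
        if on_progress is not None:
            on_progress(length)  # progress callback, called once per finished part
        return number, etag

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, enumerate(parts, start=1)))
```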
Organize your data with object tags
Manage data based on what it is, as opposed to where it's located
Don’t do this…
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
Distributing key names
Add randomness to the beginning of the key name with a hash or a reversed timestamp (ssmmhhddmmyy)
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
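Both naming schemes can be generated mechanically. The functions below are illustrative: the slide does not name a hash function, so SHA-256 and a four-character prefix are assumptions. This guidance reflects S3 at the time of the talk; in 2018 AWS raised per-prefix request rates and stopped recommending randomized prefixes for most workloads.

```python
import hashlib
from datetime import datetime


def reversed_timestamp_key(ts: datetime, suffix: str = ".jpg") -> str:
    """Key that starts with the fastest-changing digits (ssmmhhddmmyy)."""
    return ts.strftime("%S%M%H%d%m%y") + suffix


def hashed_key(original_key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash so keys spread across many prefixes."""
    digest = hashlib.sha256(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{original_key}"
```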
Website Hosting
App 0.1
Simple Static Website
Amazon Route 53 (http://poliko.adhorn.me) → Amazon S3 website endpoint (http://poliko.adhorn.me.s3-website-eu-west-1.amazonaws.com)
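Hosting this site needs two pieces of configuration: a website configuration on the bucket and a DNS record pointing at the website endpoint. A minimal sketch, assuming boto3: `website_config` builds the `WebsiteConfiguration` argument for `put_bucket_website` (the index and error document names are assumptions), and `website_endpoint` reproduces the dash-style endpoint shown in the example. For a Route 53 alias to work, the bucket name must match the domain name, as `poliko.adhorn.me` does here. Some newer Regions use a dot (`s3-website.<region>`) instead of a dash.

```python
def website_config(index: str = "index.html", error: str = "error.html") -> dict:
    """WebsiteConfiguration argument for boto3's put_bucket_website."""
    return {"IndexDocument": {"Suffix": index}, "ErrorDocument": {"Key": error}}


def website_endpoint(bucket: str, region: str) -> str:
    """Dash-style S3 website endpoint, matching the eu-west-1 example."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"


# Assuming boto3:
#   boto3.client("s3").put_bucket_website(
#       Bucket="poliko.adhorn.me", WebsiteConfiguration=website_config())
```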
App 0.2
Simple Static Website
Amazon Route 53 (http://poliko.adhorn.me) → Amazon CloudFront → Amazon S3
App 0.3
Separate static assets from dynamic content
User → Amazon Route 53 → Amazon CloudFront
• Static assets (*.js, *.jpeg, *.mp4) → Amazon S3
• Dynamic content → Elastic IP → Application → Database
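In CloudFront, this split is configured as cache behaviors whose path patterns pick an origin. The function below is a hypothetical model of that matching, not CloudFront code: paths matching `*.js`, `*.jpeg`, or `*.mp4` go to the S3 origin and everything else goes to the application. Patterns are compared case-sensitively here.

```python
from fnmatch import fnmatchcase

# Path patterns served from the S3 origin (from the diagram); all else goes to the app.
STATIC_PATTERNS = ("*.js", "*.jpeg", "*.mp4")


def choose_origin(path: str) -> str:
    """Return "s3" for static assets and "application" for dynamic requests."""
    for pattern in STATIC_PATTERNS:
        if fnmatchcase(path, pattern):  # "*" also matches "/" here
            return "s3"
    return "application"
```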
Big Data Analytics
The Dark Data Problem
Most generated data is unavailable for analysis
(Chart: data volume by year from 1990 to 2020, comparing generated data with data available for analysis.)
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Amazon Athena
Amazon Athena: SQL queries on S3
• No loading of data
• Serverless
• Supports text, CSV, TSV, JSON, and Avro
• Supports the columnar formats Apache ORC and Apache Parquet
• Access via the console or a JDBC driver
• $5 per TB of data scanned from S3
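Because Athena bills by bytes scanned, query cost is simple arithmetic. The sketch below applies the $5 per TB rate from the slide, assuming decimal terabytes (10^12 bytes) and ignoring any minimums or rounding the current price list may apply. It also shows why columnar formats help: a query that reads one column out of ten equally sized columns scans roughly a tenth of the data.

```python
from decimal import Decimal

PRICE_PER_TB = Decimal("5")  # $5 per TB scanned, from the slide
BYTES_PER_TB = 10**12        # assumption: decimal terabytes


def athena_query_cost(bytes_scanned: int) -> Decimal:
    """Estimated dollar cost of one query that scans `bytes_scanned` bytes."""
    return PRICE_PER_TB * bytes_scanned / BYTES_PER_TB


# A full 1 TB scan versus reading 1 of 10 equally sized columns in a columnar file:
full_scan = athena_query_cost(10**12)         # Decimal('5')
one_column = athena_query_cost(10**12 // 10)  # Decimal('0.5')
```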
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• High concurrency: multiple clusters access the same data
• No ETL: query data in place using open file formats
• Full Amazon Redshift SQL support
S3 Inventory
• Save time
• Daily or weekly delivery
• Delivery to an S3 bucket
• CSV file output
Fields: Bucket, Key, Version Id, Is Latest, Delete Marker, Size, Last Modified, ETag, StorageClass, Multipart Uploaded, Replication Status
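An inventory report is just CSV, so it can be analyzed with ordinary tools. The sketch below totals object size per storage class for the latest, non-deleted versions. It assumes the report file has already been decompressed (S3 delivers CSV inventory gzip-compressed), that the columns appear in the order listed above with no header row, and that boolean fields read `true`/`false`.

```python
import csv
import io
from collections import defaultdict

# Assumed column order: the fields listed above, in that order, with no header row.
FIELDS = [
    "Bucket", "Key", "VersionId", "IsLatest", "IsDeleteMarker", "Size",
    "LastModifiedDate", "ETag", "StorageClass", "IsMultipartUploaded",
    "ReplicationStatus",
]


def bytes_per_storage_class(inventory_csv: str) -> dict:
    """Sum the Size column per StorageClass for latest, non-deleted versions."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(inventory_csv), fieldnames=FIELDS):
        if row["IsLatest"] == "true" and row["IsDeleteMarker"] != "true":
            totals[row["StorageClass"]] += int(row["Size"] or 0)
    return dict(totals)
```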
Indexing S3 content using Elasticsearch
(Diagram: source data is written to an S3 bucket; each ObjectCreated event triggers an AWS Lambda function that writes an index record to Amazon Elasticsearch Service, which then serves search queries.)
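The Lambda step in this pipeline mostly reshapes the S3 event. The sketch below turns `ObjectCreated` notification records into documents; `index_document` is a hypothetical callable standing in for whatever Elasticsearch client call writes a document, so the handler stays testable without a cluster. Object keys arrive URL-encoded in S3 event notifications, so they are decoded first.

```python
from urllib.parse import unquote_plus


def records_to_documents(event: dict) -> list:
    """Convert S3 ObjectCreated notification records into search documents."""
    docs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore other event types
        s3 = record["s3"]
        docs.append(
            {
                "bucket": s3["bucket"]["name"],
                "key": unquote_plus(s3["object"]["key"]),  # keys arrive URL-encoded
                "size": s3["object"].get("size", 0),
                "event_time": record["eventTime"],
            }
        )
    return docs


def make_handler(index_document):
    """Build a Lambda handler around a hypothetical index_document(doc) callable."""

    def handler(event, context):
        docs = records_to_documents(event)
        for doc in docs:
            index_document(doc)
        return {"indexed": len(docs)}

    return handler
```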
Cloud Native Applications
Cognito support for identity
(Diagram: MyApp signs users in with Amazon Cognito, through Cognito User Pools (username) or a SAML identity provider, and accesses DynamoDB and S3.)
Save Pic: 2. Put object → Amazon S3
Poliko (http://poliko.adhorn.me), using Amazon Cognito
Take Pic → 2. Detect labels and 3. Detect faces (Amazon Rekognition) → 4. Synthesize speech (Amazon Polly)
Served from Amazon S3 with “Static website hosting” enabled
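A common way to let app users write to S3 directly, as Poliko's “Save Pic” step does, is an IAM policy on the Cognito identity pool's authenticated role that uses the `${cognito-identity.amazonaws.com:sub}` policy variable, so each identity can only touch its own key prefix. The sketch below builds such a policy; the bucket name and the `private/` prefix are hypothetical, and the slide does not state that Poliko uses this exact policy.

```python
def per_user_upload_policy(bucket: str, prefix: str = "private") -> dict:
    """IAM policy limiting each Cognito identity to its own key prefix (hypothetical names)."""
    # IAM replaces this variable with the caller's Cognito identity ID at request time.
    identity_var = "${cognito-identity.amazonaws.com:sub}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/{identity_var}/*"],
            }
        ],
    }
```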
Event-Driven Architecture
Event-driven
An event on B caused by A triggers C: A → B → C
Amazon S3 with event-driven workflow
(Diagram: source data is uploaded to Amazon S3; each PUT triggers an event that is sent to an SQS queue and processed by an AWS Lambda function acting as an input validation and conversion layer.)
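Routing PUT events to the SQS queue is bucket configuration. A minimal sketch, assuming boto3: the helper builds the `NotificationConfiguration` for `put_bucket_notification_configuration`; the queue ARN and suffix filter in the usage comment are hypothetical. The queue's access policy must also allow S3 to send messages to it, and the Lambda function then consumes messages from the queue.

```python
def queue_notification_config(queue_arn: str, suffix: str = "") -> dict:
    """NotificationConfiguration sending each PUT to an SQS queue, optionally filtered by key suffix."""
    rule = {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:Put"]}
    if suffix:
        rule["Filter"] = {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}}
    return {"QueueConfigurations": [rule]}


# Assuming boto3 and an existing queue whose policy lets S3 send messages:
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-bucket",
#       NotificationConfiguration=queue_notification_config(
#           "arn:aws:sqs:eu-west-1:123456789012:uploads", suffix=".jpg"))
```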
Event-driven photo analysis with Lambda and Rekognition
(Diagram: users upload photos to Amazon S3; each PUT triggers AWS Lambda, which sends the photo to Amazon Rekognition for analysis.)
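The analysis step can be a small Lambda function that calls Rekognition's `DetectLabels` on the uploaded object in place. In the sketch below, the Rekognition client is passed in so the logic can be exercised with a fake client; the `max_labels` and `min_confidence` defaults are arbitrary choices. Face detection (`detect_faces`) or speech (`synthesize_speech` on an Amazon Polly client), as in the Poliko demo, could be chained the same way.

```python
from urllib.parse import unquote_plus


def analyze_uploads(event: dict, rekognition, max_labels: int = 10, min_confidence: float = 80.0) -> dict:
    """Call Rekognition DetectLabels for each uploaded object; returns {key: [label names]}."""
    results = {}
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},  # analyze the object in place
            MaxLabels=max_labels,
            MinConfidence=min_confidence,
        )
        results[key] = [label["Name"] for label in response["Labels"]]
    return results


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    return analyze_uploads(event, boto3.client("rekognition"))
```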
Audit and Monitoring
Audit and monitor access
AWS CloudTrail data events
Use cases:
• Perform security analysis
• Meet your IT auditing and compliance needs
• Take immediate action on activity
How it works:
• Capture S3 object-level requests
• Enable at the bucket level
• Logs delivered to your S3 bucket
• $0.10 per 100,000 data events
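Data events are enabled per trail with event selectors. A minimal sketch, assuming boto3: `s3_data_event_selectors` builds the `EventSelectors` argument for CloudTrail's `put_event_selectors` (a bucket ARN ending in `/` selects every object in that bucket), and `data_event_cost` applies the $0.10 per 100,000 events rate from the slide. The trail and bucket names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_100K = Decimal("0.10")  # $0.10 per 100,000 data events, from the slide


def s3_data_event_selectors(bucket: str) -> list:
    """EventSelectors that log object-level read and write events for one bucket."""
    return [
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                # The trailing slash selects every object in the bucket.
                {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket}/"]}
            ],
        }
    ]


def data_event_cost(events: int) -> Decimal:
    """Estimated dollar cost of recording `events` data events."""
    return PRICE_PER_100K * events / 100_000


# Assuming boto3 and an existing trail named "my-trail":
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="my-trail", EventSelectors=s3_data_event_selectors("my-bucket"))
```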
Monitor performance and operation
Amazon CloudWatch metrics for S3
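S3 publishes daily storage metrics such as `BucketSizeBytes` to CloudWatch under the `AWS/S3` namespace. The sketch below builds keyword arguments for boto3's CloudWatch `get_metric_statistics` to fetch a week of daily averages for Standard storage; the bucket name and the seven-day window are assumptions.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_query(bucket: str, days: int = 7) -> dict:
    """Keyword arguments for CloudWatch get_metric_statistics: daily BucketSizeBytes."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # storage metrics are reported once per day
        "Statistics": ["Average"],
    }


# Assuming boto3:
#   boto3.client("cloudwatch").get_metric_statistics(**bucket_size_query("my-bucket"))
```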