London Hadoop User Group
Deep experience in building and operating global web-scale systems
About Amazon Web Services
How did Amazon get into cloud computing?
Utility computing
On demand | Pay as you go | Uniform | Available
Compute | Storage | Security | Scaling | Database | Networking | Monitoring | Messaging | Workflow | DNS | Load Balancing | Backup | CDN
No Up-Front Capital Expense
Pay Only for What You Use
Self-Service Infrastructure
Easily Scale Up and Down
Improve Agility & Time-to-Market
Low Cost
Deploy
Cloud computing benefits
[Figure: capacity vs. time, comparing traditional IT capacity and elastic capacity against your IT needs. Panels: On and Off, Fast Growth, Variable Peaks, Predictable Peaks. Annotations: WASTE, CUSTOMER DISSATISFACTION.]
[Figure: number of EC2 instances, 4/12/2008 to 4/20/2008. Annotations: steady state of ~40 instances; launch of Facebook modification; "Techcrunched"; EC2 scaled to a peak of 5000 instances.]
40 servers to 5000 in 3 days
Compute | Storage | Database | App Services | Deployment & Administration | Networking
AWS Global Infrastructure
Global Infrastructure
[Map legend: Region, Availability Zone]
Regions shown: US-EAST (Virginia), US-WEST (N. California), US-WEST (Oregon), GOV CLOUD, SOUTH AMERICA (Sao Paulo), EU-WEST (Ireland), ASIA PAC (Tokyo), ASIA PAC (Singapore), ASIA PAC (Sydney)
Customer Needs
•  Store Any Amount of Data
–  Without Capacity Planning
•  Perform Complex Analysis on Any Data
–  Scale on Demand
•  Store Data Securely
•  Decrease Time to Market
–  Build Environments Quickly
•  Reduce Costs
–  Reduce Capital Expenditure
•  Enable Global Reach
Ingestion | Integration
Elastic Block Store
High performance block storage device
1GB to 1TB in size
Mount as drives to instances with snapshot/cloning functionalities
Availability: 99.99%
Durability: 99.999999999%
Is a Web Store, Not a file system
No Single Points of Failure
Eventually consistent
Paradigm: Object store
Performance: Very Fast
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.095/GB/month
Typical use case: Write once, read many
Limits: 100 Buckets, Unlimited Storage, 5TB Objects
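One way to get a feel for the eleven-nines durability figure is to turn it into an expected loss rate. The sketch below assumes the figure can be read as an annual, per-object probability of survival, and the object count is purely hypothetical; neither assumption comes from the slides.

```python
from decimal import Decimal

# Durability figure quoted on the slide. Reading it as an annual
# per-object survival probability is an assumption of this sketch.
DURABILITY = Decimal("0.99999999999")


def expected_annual_loss(object_count: int, durability: Decimal = DURABILITY) -> Decimal:
    """Expected number of objects lost per year under the assumption above."""
    return Decimal(object_count) * (Decimal(1) - durability)


if __name__ == "__main__":
    stored = 10_000_000  # hypothetical number of stored objects
    loss = expected_annual_loss(stored)
    print(f"Expected objects lost per year: {loss}")
    print(f"Roughly one object every {Decimal(1) / loss:.0f} years")
```

Decimal arithmetic is used because subtracting a value this close to 1 in binary floating point leaves visible rounding noise.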
Simple Storage Service
Highly scalable object storage for the internet
1 byte to 5TB in size
99.999999999% durability
Peak Requests: 830,000+ per second
[Figure: total number of objects stored in Amazon S3, Q4 2006 to Q4 2012. Data labels: 14 billion, 40 billion, 102 billion, 262 billion, 762 billion, 1.3 trillion.]
Objects in S3
Glacier
Long term object archive
Extremely low cost per gigabyte
99.999999999% durability
Durability: 99.999999999%
Designed for Archival, Not a file system
Vaults & Archives
3-5 Hour Retrieval Time
Paradigm: Archive Store
Performance: Configurable - Low
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.011/GB/month
Typical use case: Write once, read infrequently (< 10% / Month)
Simple Storage Service: highly scalable object storage, 1 byte to 5TB in size, 99.999999999% durability
Glacier: long term object archive, extremely low cost per gigabyte, 99.999999999% durability
Storage Lifecycle Integration
Structured Data Management
Database
Relational Database Service
Managed Oracle, MySQL & SQL Server
DynamoDB
Managed NoSQL Database
Amazon Redshift
Massively Parallel Petabyte Scale Data Warehouse
RDS | DynamoDB | Redshift
Database
Relational Database Service
Database-as-a-Service
No need to install or manage database instances
Scalable and fault tolerant configurations
Integration with Data Pipeline
Database
DynamoDB
Provisioned throughput NoSQL database
Fast, predictable, configurable performance
Fully distributed, fault tolerant HA architecture
Integration with EMR & Hive
Database
Redshift
Managed Massively Parallel Petabyte Scale Data Warehouse
Streaming Backup/Restore to S3
Extensive Security
2 TB -> 1.6 PB
Unstructured Data …
Parallel ETL
Elastic MapReduce
Managed, elastic Hadoop cluster
Integrates with S3 & DynamoDB
Leverage Hive & Pig analytics scripts
Support for Spot Instances
Integrated HBase NoSQL Database
Application Services
Elastic MapReduce
•  AWS Web Console
•  Command Line
elastic-mapreduce --create --key-pair micro --region eu-west-1 \
  --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge \
  --alive --log-uri s3n://meyersi-ire/EMR/log
Launching Clusters
•  Enabling Tools
elastic-mapreduce --create --key-pair micro --region eu-west-1 \
  --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge \
  --alive \
  --pig-interactive --pig-versions latest \
  --hive-interactive --hive-versions latest \
  --hbase \
  --log-uri s3n://meyersi-ire/EMR/log
Launching Clusters
•  Hadoop Configuration Bootstrap Action
elastic-mapreduce --create \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-s,dfs.block.size=1048576" \
  --key-pair micro --region eu-west-1 --name IanMM-Test-3 \
  --instance-group core --instance-count 2 --instance-type m2.4xlarge \
  --instance-group task --instance-count 2 --instance-type m2.4xlarge \
  --alive --pig-interactive --hive-interactive \
  --log-uri s3n://meyersi-ire/EMR/log
Launching Clusters
Input data node: This could be an S3 bucket, RDS table, EMR Hive table, etc.
Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.
Output data node: This supports all the same data sources as the input data node, but they don't have to be the same type.
Amazon Data Pipeline
Input: RDS Table
  Table: User-Demographics
  SQL Precondition: "Select last_update from table" > #{YY-MM-DD}
Input: DynamoDB Table
  Table: User-Event-Data-#{year-month}
Activity: EMR Transform
  Hive Query: user-metrics.hql
  Frequency: Daily
Output: S3 file
  Path: s3://trend-data/#{year-month-day}.csv
Success Notification: metrics@example.com
Failure Notification: emr-admin@example.com
Delay Notification: emr-admin@example.com
Orchestration with Data Pipeline
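The date placeholders in the pipeline definition can be pictured as simple string substitution per scheduled run. The sketch below is only an illustration of that idea: the placeholder names are copied from the slide, while the `expand` helper, the date formats, and the run date are assumptions, and none of this is meant to reproduce the service's actual expression syntax.

```python
from datetime import date

# Placeholder names as they appear on the slide. The strftime formats
# mapped to them are assumptions made for this illustration.
PLACEHOLDERS = {
    "#{year-month-day}": "%Y-%m-%d",
    "#{year-month}": "%Y-%m",
}


def expand(template: str, run_date: date) -> str:
    """Substitute each known placeholder with the formatted run date."""
    for placeholder, fmt in PLACEHOLDERS.items():
        template = template.replace(placeholder, run_date.strftime(fmt))
    return template


if __name__ == "__main__":
    run = date(2013, 7, 9)  # hypothetical daily run date
    print(expand("s3://trend-data/#{year-month-day}.csv", run))
    print(expand("User-Event-Data-#{year-month}", run))
```

With a daily frequency, each run would write to a new dated output path while reading from the DynamoDB table for the current month.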
Analytics Pipeline
[Diagram: S3, DynamoDB, RDS, Redshift, EMR, Data Pipeline]
…collect & store
…orchestrate
…process & analyse
Benefits only possible in the Cloud
Benefit: Pay as you Go | Lower Overall Costs | Stop Guessing Capacity | Agility / Speed / Innovation | Avoid Undifferentiated Heavy Lifting | Go Global in Minutes
Cloud: ✔ | ✔ | ✔ | ✔ | ✔ | ✔
"Private Cloud" / On Premises: X | X | X | X | X | X
Agility & Global Reach at the Core of EMR
Ease of Operation
Compute Infrastructure | Hadoop Configuration | Local Disk | Operating System Config | HDFS | Networking | Hive | Pig | HBase | User Defined Software Installation
Multiple Hadoop Distributions - Open Source & MapR
Clusters Launched with 1 Command
Up in 5 Minutes
Hard Partitioned per Customer on CPU, Memory and Disk
Dynamic Cluster Resizing
In any of 8 Regions around the Globe
Lower Overall Costs

Cheaper | Spot Market Management
Lower TCO
June 2013 Study by Accenture Technology Labs
Not Sponsored or Funded by Amazon
"Accenture assessed the price-performance ratio between bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services…[and] revealed that Hadoop-as-a-Service offers better price-performance ratio…"
http://www.accenture.com/us-en/Pages/insight-hadoop-deployment-comparison.aspx
•  Spot allows customers to bid on unused EC2 capacity
•  Spot price is based on supply/demand of instance types in an Availability Zone
•  Customers are fulfilled when their bid price is higher than the Spot price
•  Instances will be interrupted when the Spot price exceeds the bid price
Spot 101 - What are Spot Instances
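The fulfilment and interruption rules above can be sketched as a small simulation over a series of Spot prices. The `simulate` function, the price series, and the bid are hypothetical. The slide does not say what happens when the Spot price equals the bid, so this sketch leaves the state unchanged in that case, which is an assumption.

```python
from typing import List


def simulate(bid_price: float, spot_prices: List[float]) -> List[bool]:
    """Return, for each observed Spot price, whether the instances are running."""
    running = False
    timeline = []
    for price in spot_prices:
        if bid_price > price:
            # Bid is higher than the Spot price: the request is fulfilled.
            running = True
        elif price > bid_price:
            # Spot price exceeds the bid: the instances are interrupted.
            running = False
        # price == bid_price is not covered by the slide; state is kept (assumption).
        timeline.append(running)
    return timeline


if __name__ == "__main__":
    prices = [0.25, 0.30, 0.45, 0.40, 0.35]  # hypothetical hourly Spot prices
    for price, state in zip(prices, simulate(0.40, prices)):
        print(f"spot={price:.2f} running={state}")
```

A job that can tolerate the interrupted periods, such as a task group added to a cluster whose core nodes are On-Demand, is the case the following slide builds on.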
elastic-mapreduce --add-instance-group TASK --instance-count 100 --bid-price .4
Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption
Scenario #1: Cost without Spot
Job Flow Duration: 14 Hours
4 instances * 14 hrs * $0.50 = $28
Scenario #2: Cost with Spot
Job Flow Duration: 7 Hours
4 instances * 7 hrs * $0.50 = $14 +
5 instances * 7 hrs * $0.25 = $8.75
Total = $22.75
Time Savings: 50%
Cost Savings: ~20%
Other EMR + Spot Use Cases
§ Run entire cluster on Spot for biggest cost savings
§ Reduce the cost of application testing
Reducing Hadoop Costs with Spot
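The two scenarios can be checked with a few lines of arithmetic. The instance counts, hourly rates, and durations below are the ones on the slide; the `cluster_cost` helper is a hypothetical name introduced for this sketch. The exact cost saving works out to 18.75%, which the slide rounds to ~20%.

```python
from typing import List, Tuple


def cluster_cost(groups: List[Tuple[int, float]], hours: float) -> float:
    """Total cost for instance groups given as (instance_count, hourly_rate)."""
    return sum(count * hours * rate for count, rate in groups)


# Scenario #1: 4 On-Demand instances for 14 hours.
without_spot = cluster_cost([(4, 0.50)], hours=14)

# Scenario #2: the same 4 On-Demand instances plus 5 Spot instances for 7 hours.
with_spot = cluster_cost([(4, 0.50), (5, 0.25)], hours=7)

time_saving = 1 - 7 / 14
cost_saving = 1 - with_spot / without_spot

if __name__ == "__main__":
    print(f"Without Spot: ${without_spot:.2f}")
    print(f"With Spot:    ${with_spot:.2f}")
    print(f"Time saving:  {time_saving:.0%}")
    print(f"Cost saving:  {cost_saving:.2%}")
```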
Stop Guessing Capacity

Dynamic Clusters
Extend on-premises environments…
with Amazon VPC…
Populate as demand dictates…
Connect over dedicated links…
And turn it off when you are done
EMR is Hadoop…

…cheaper, easier, and more agile
What’s New?
•  MapR M7 Introduction
•  Optimised for HBase Clusters
•  Failure Recovery
•  Point in Time Recovery Snapshotting
•  Low Latency Hadoop Optimisations
•  HBase Mirroring
•  NFS + HDFS
•  MapR M5 Price Drop
•  Support for Pig 0.11.1
•  RANK, CUBE & ROLLUP capability
•  Groovy UDFs
•  Support for Guava Functions
•  Performance Improvements
•  Spark/Shark Bootstrap Action
•  In Memory Hadoop
•  Spark Scripting (similar to Pig)
•  Shark Shell with Hive Interoperability
Ad

More Related Content

What's hot (10)

Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Databricks
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
Michael Stack
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
Michael Stack
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Databricks
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
Shu-Jeng Hsieh
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
Yaroslav Tkachenko
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
Michael Stack
 
BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2
Paulraj Pappaiah
 
HBaseConAsia2018 Track3-2: HBase at China Telecom
HBaseConAsia2018 Track3-2:  HBase at China TelecomHBaseConAsia2018 Track3-2:  HBase at China Telecom
HBaseConAsia2018 Track3-2: HBase at China Telecom
Michael Stack
 
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Databricks
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
Michael Stack
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
Michael Stack
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Databricks
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
Yaroslav Tkachenko
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
Michael Stack
 
BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2
Paulraj Pappaiah
 
HBaseConAsia2018 Track3-2: HBase at China Telecom
HBaseConAsia2018 Track3-2:  HBase at China TelecomHBaseConAsia2018 Track3-2:  HBase at China Telecom
HBaseConAsia2018 Track3-2: HBase at China Telecom
Michael Stack
 

Similar to Amazon Elastic Map Reduce - Ian Meyers (9)

2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
Amazon Web Services Korea
 
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
Amazon Web Services Japan
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Precisely
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
Rasmus Ekman
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
Lam Le
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
Amazon Web Services Japan
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
Amazon Web Services Korea
 
2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개
2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개
2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개
Amazon Web Services Korea
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Ian Massingham
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
Amazon Web Services Korea
 
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
Amazon Web Services Japan
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Precisely
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
Rasmus Ekman
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
Lam Le
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
Amazon Web Services Japan
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
Amazon Web Services Korea
 
2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개
2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개
2017 AWS DB Day | Amazon DynamoDB 서비스, 개요 및 신규 기능 소개
Amazon Web Services Korea
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Ian Massingham
 
Ad

More from huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
huguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
huguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
huguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
huguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
huguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
huguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
huguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
huguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
huguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
huguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
huguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
huguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
huguk
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
huguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
huguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
huguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
huguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
huguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
huguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
huguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
huguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
huguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
huguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
huguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
huguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
huguk
 
Ad

Recently uploaded (20)


Amazon Elastic Map Reduce - Ian Meyers

  • 2. About Amazon Web Services: Deep experience in building and operating global web scale systems. How did Amazon get into cloud computing?
  • 3. Utility computing: On demand; Pay as you go; Uniform; Available
  • 4. Utility computing: On demand; Pay as you go; Uniform; Available
  • 6. Utility computing: On demand; Pay as you go; Uniform; Available. Compute, Storage, Security, Scaling, Database, Networking, Monitoring, Messaging, Workflow, DNS, Load Balancing, Backup, CDN
  • 7. Cloud computing benefits: No Up-Front Capital Expense; Pay Only for What You Use; Self-Service Infrastructure; Easily Scale Up and Down; Improve Agility & Time-to-Market; Low Cost; Deploy
  • 8. [Figure: capacity over time, comparing traditional IT capacity and elastic capacity with your IT needs]
  • 9. Elastic capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 10. Elastic capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks. WASTE; CUSTOMER DISSATISFACTION
  • 11. Elastic capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 12. 40 servers to 5000 in 3 days. [Figure: number of EC2 instances, 4/12/2008 to 4/20/2008. Annotations: steady state of ~40 instances; launch of Facebook modification; "Techcrunched"; EC2 scaled to peak of 5000 instances]
  • 13. AWS Global Infrastructure: Compute; Storage; Database; App Services; Deployment & Administration; Networking
  • 14. Global Infrastructure. Regions: US-WEST (N. California); US-WEST (Oregon); US-EAST (Virginia); GOV CLOUD; SOUTH AMERICA (Sao Paulo); EU-WEST (Ireland); ASIA PAC (Tokyo); ASIA PAC (Singapore); ASIA PAC (Sydney)
  • 16. Customer Needs
      • Store Any Amount of Data
          – Without Capacity Planning
      • Perform Complex Analysis on Any Data
          – Scale on Demand
      • Store Data Securely
      • Decrease Time to Market
          – Build Environments Quickly
      • Reduce Costs
          – Reduce Capital Expenditure
      • Enable Global Reach
  • 18. Simple Storage Service: Highly scalable object storage for the internet; 1 byte to 5TB in size; 99.999999999% durability
      Availability: 99.99%
      Durability: 99.999999999%
      Is a Web Store: Not a file system; No Single Points of Failure; Eventually consistent
      Paradigm: Object store
      Performance: Very Fast
      Redundancy: Across Availability Zones
      Security: Public Key / Private Key
      Pricing: $0.095/GB/month
      Typical use case: Write once, read many
      Limits: 100 Buckets, Unlimited Storage, 5TB Objects
      Elastic Block Store: High performance block storage device; 1GB to 1TB in size; mount as drives to instances with snapshot/cloning functionalities
  • 19. Objects in S3. Peak Requests: 830,000+ per second. [Figure: total number of objects stored in Amazon S3, Q4 2006 to Q4 2012; labeled values: 14 Billion, 40 Billion, 102 Billion, 262 Billion, 762 Billion, 1.3 Trillion]
  • 20. Glacier: Long term object archive; Extremely low cost per gigabyte; 99.999999999% durability
      Durability: 99.999999999%
      Designed for Archival: Not a file system; Vaults & Archives; 3-5 Hour Retrieval Time
      Paradigm: Archive Store
      Performance: Configurable - Low
      Redundancy: Across Availability Zones
      Security: Public Key / Private Key
      Pricing: $0.011/GB/month
      Typical use case: Write once, read infrequently (< 10% / Month)
  • 21. Storage Lifecycle Integration
      Simple Storage Service: Highly scalable object storage; 1 byte to 5TB in size; 99.999999999% durability
      Glacier: Long term object archive; Extremely low cost per gigabyte; 99.999999999% durability
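The two per-gigabyte prices quoted on the storage slides ($0.095/GB/month for S3, $0.011/GB/month for Glacier) suggest why lifecycle integration matters. The following is a minimal sketch of that arithmetic only. The data volumes are hypothetical, the prices are the 2013 slide figures rather than current pricing, and request and retrieval charges are deliberately ignored.

```python
# Per-GB monthly prices as quoted on the slides (2013 figures, not current pricing).
S3_PRICE_PER_GB_MONTH = 0.095
GLACIER_PRICE_PER_GB_MONTH = 0.011


def monthly_storage_cost(hot_gb: float, archived_gb: float) -> float:
    """Monthly storage cost when hot_gb stays in S3 and archived_gb sits in Glacier.

    Simplification: only storage is counted; request and retrieval fees are ignored.
    """
    return hot_gb * S3_PRICE_PER_GB_MONTH + archived_gb * GLACIER_PRICE_PER_GB_MONTH


# Hypothetical volumes: 1000 GB total, with and without archiving 90% of it.
all_in_s3 = monthly_storage_cost(hot_gb=1000, archived_gb=0)
mostly_archived = monthly_storage_cost(hot_gb=100, archived_gb=900)

print(f"all in S3: ${all_in_s3:.2f}/month")
print(f"90% archived to Glacier: ${mostly_archived:.2f}/month")
```

Under these assumptions the archived layout costs roughly a fifth of keeping everything in S3, which fits the "write once, read infrequently" use case the Glacier slide describes.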
  • 23. Database
      Relational Database Service (RDS): Managed Oracle, MySQL & SQL Server
      DynamoDB: Managed NoSQL Database
      Amazon Redshift: Massively Parallel Petabyte Scale Data Warehouse
  • 24. Database: Relational Database Service
      Database-as-a-Service
      No need to install or manage database instances
      Scalable and fault tolerant configurations
      Integration with Data Pipeline
  • 25. Database: DynamoDB
      Provisioned throughput NoSQL database
      Fast, predictable, configurable performance
      Fully distributed, fault tolerant HA architecture
      Integration with EMR & Hive
  • 26. Database: Redshift
      Managed Massively Parallel Petabyte Scale Data Warehouse
      Streaming Backup/Restore to S3
      Extensive Security
      2 TB -> 1.6 PB
  • 27. Unstructured Data … Parallel ETL
  • 28. Application Services: Elastic MapReduce
      Managed, elastic Hadoop cluster
      Integrates with S3 & DynamoDB
      Leverage Hive & Pig analytics scripts
      Support for Spot Instances
      Integrated HBase NoSQL Database
  • 29. Launching Clusters
      • AWS Web Console
      • Command Line:
        elastic-mapreduce --create --key-pair micro --region eu-west-1 --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge --alive --log-uri s3n://meyersi-ire/EMR/log
  • 30. Launching Clusters
      • Enabling Tools:
        elastic-mapreduce --create --key-pair micro --region eu-west-1 --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge --alive --pig-interactive --pig-versions latest --hive-interactive --hive-versions latest --hbase --log-uri s3n://meyersi-ire/EMR/log
  • 31. Launching Clusters
      • Hadoop Configuration Bootstrap Action:
        elastic-mapreduce --create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-s,dfs.block.size=1048576" --key-pair micro --region eu-west-1 --name IanMM-Test-3 --instance-group core --instance-count 2 --instance-type m2.4xlarge --instance-group task --instance-count 2 --instance-type m2.4xlarge --alive --pig-interactive --hive-interactive --log-uri s3n://meyersi-ire/EMR/log
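The bootstrap action on slide 31 sets dfs.block.size to 1048576 bytes, which is 1 MiB. As a rough illustration of what that setting means, the sketch below counts how many HDFS blocks a file of a given size would occupy at that block size. The file sizes are hypothetical and the helper name is invented for this example; it is not part of any Hadoop or EMR API.

```python
BLOCK_SIZE_BYTES = 1048576  # dfs.block.size value from the slide (1 MiB)


def block_count(file_size_bytes: int, block_size_bytes: int = BLOCK_SIZE_BYTES) -> int:
    """Number of blocks needed to hold a file, using integer ceiling division."""
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    return -(-file_size_bytes // block_size_bytes)


# Hypothetical file sizes, chosen only to show the rounding behaviour.
one_gib = 1024 ** 3
print(block_count(one_gib))               # a 1 GiB file
print(block_count(BLOCK_SIZE_BYTES + 1))  # one byte over a single block
```

A block size this small produces many blocks per file, so it is best read as a demonstration of passing a configuration key through the bootstrap action rather than as a recommended value.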
  • 32. Amazon Data Pipeline
      Input Datanode: This could be an S3 bucket, RDS table, EMR Hive table, etc.
      Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.
      Output Datanode: This supports all the same data sources as the input datanode, but the input and output don't have to be the same type.
  • 33. Orchestration with Data Pipeline
      Input: DynamoDB Table; Table: User-Event-Data-#{year-month}
      Input: RDS Table; Table: User-Demographics; SQL Precondition: "Select last_update from table" > #{YY-MM-DD}
      Activity: EMR Transform; Hive Query: user-metrics.hql; Frequency: Daily
      Output: S3 file; Path: s3://trend-data/#{year-month-day}.csv
      Success Notification: [email protected]
      Failure Notification: emr-[email protected]
      Delay Notification: emr-[email protected]
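  • 34. Analytics Pipeline: …collect & store; …orchestrate; …process & analyse. Components shown: S3, RDS, Dynamo DB, Redshift, EMR, Data Pipeline
  • 35. Benefits only possible in the Cloud
      Pay as you Go: Cloud ✔; "Private Cloud" / On Premises X
      Lower Overall Costs: Cloud ✔; "Private Cloud" / On Premises X
      Stop Guessing Capacity: Cloud ✔; "Private Cloud" / On Premises X
      Agility / Speed / Innovation: Cloud ✔; "Private Cloud" / On Premises X
      Avoid Undifferentiated Heavy Lifting: Cloud ✔; "Private Cloud" / On Premises X
      Go Global in Minutes: Cloud ✔; "Private Cloud" / On Premises X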
  • 36. Agility & Global Reach at the Core of EMR
  • 37. Ease of Operation: Compute Infrastructure; Hadoop Configuration; Local Disk; Operating System Config; HDFS; Networking; Hive; Pig; HBase; User Defined Software Installation
  • 38. Ease of Operation: Compute Infrastructure; Hadoop Configuration; Local Disk; Operating System Config; HDFS; Networking; Hive; Pig; HBase; User Defined Software Installation
      Multiple Hadoop Distributions - Open Source & MapR
      Clusters Launched with 1 Command, Up in 5 Minutes
      Hard Partitioned per Customer on CPU, Memory and Disk
      Dynamic Cluster Resizing
      In any of 8 Regions around the Globe
  • 39. Lower Overall Costs Cheaper | Spot Market Management
  • 40. Lower TCO: June 2013 Study by Accenture Technology Labs. Not Sponsored or Funded by Amazon. "Accenture assessed the price-performance ratio between bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services…[and] revealed that Hadoop-as-a-Service offers better price-performance ratio…" http://www.accenture.com/us-en/Pages/insight-hadoop-deployment-comparison.aspx
  • 41. Spot 101 - What are Spot Instances
      • Spot allows customers to bid on unused EC2 capacity
      • The Spot price is based on supply and demand for each instance type in an Availability Zone
      • A customer's request is fulfilled when their bid price is higher than the Spot price
      • Instances are interrupted when the Spot price exceeds the bid price
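The two Spot rules on slide 41 can be written as a small decision function. This is a minimal sketch, not the EC2 implementation: the function name and the sample price series are hypothetical, and the case where the bid exactly equals the Spot price is not covered by the slide, so the sketch reports it as unspecified.

```python
def spot_instance_state(bid_price: float, spot_price: float) -> str:
    """Apply the two rules from the slide to a single price observation."""
    if bid_price > spot_price:
        return "running"      # bid is higher than the Spot price: request fulfilled
    if spot_price > bid_price:
        return "interrupted"  # Spot price exceeds the bid: instance interrupted
    return "unspecified"      # equal prices: the slide does not say


BID_PRICE = 0.40  # same figure as the --bid-price .4 example on the next slide

# Hypothetical hourly Spot prices for one instance type in one Availability Zone.
hypothetical_spot_prices = [0.25, 0.31, 0.45, 0.38]

states = [spot_instance_state(BID_PRICE, price) for price in hypothetical_spot_prices]
for price, state in zip(hypothetical_spot_prices, states):
    print(f"spot ${price:.2f} vs bid ${BID_PRICE:.2f}: {state}")
```

Because interruption can happen at any hour in which the Spot price rises above the bid, Spot capacity suits work that tolerates losing nodes, such as EMR task instance groups.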
  • 42. elastic-mapreduce --add-instance-group TASK --instance-count 100 --bid-price .4
  • 43. Reducing Hadoop Costs with Spot: Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption
      Scenario #1 (cost without Spot): job flow duration 14 hours; 4 instances * 14 hrs * $0.50 = $28
      Scenario #2 (cost with Spot): job flow duration 7 hours; 4 instances * 7 hrs * $0.50 = $14, plus 5 instances * 7 hrs * $0.25 = $8.75; total = $22.75
      Time Savings: 50%; Cost Savings: ~20%
      Other EMR + Spot Use Cases: run entire cluster on Spot for biggest cost savings; reduce the cost of application testing
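The arithmetic on slide 43 can be reproduced in a few lines. The instance counts, durations and hourly prices below are the ones quoted on the slide. The helper name is hypothetical, and the sketch assumes that cost is simply instances times hours times hourly price, with no EMR surcharge or partial-hour effects.

```python
ON_DEMAND_PRICE = 0.50  # $ per instance-hour, from the slide
SPOT_PRICE = 0.25       # $ per instance-hour, from the slide


def group_cost(instances: int, hours: float, price_per_hour: float) -> float:
    """Cost of one instance group running for the whole job flow."""
    return instances * hours * price_per_hour


# Scenario #1: 4 On-Demand instances for 14 hours.
cost_without_spot = group_cost(4, 14, ON_DEMAND_PRICE)

# Scenario #2: the same 4 On-Demand instances plus 5 Spot instances, for 7 hours.
cost_with_spot = group_cost(4, 7, ON_DEMAND_PRICE) + group_cost(5, 7, SPOT_PRICE)

time_saving = 1 - 7 / 14
cost_saving = 1 - cost_with_spot / cost_without_spot

print(f"without Spot: ${cost_without_spot:.2f}")
print(f"with Spot:    ${cost_with_spot:.2f}")
print(f"time saving:  {time_saving:.0%}")
print(f"cost saving:  {cost_saving:.2%}")
```

The exact cost saving is 18.75%, which the slide rounds to about 20%.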
  • 47. Populate as demand dictates…
  • 49. And turn it off when you are done
  • 50. EMR is Hadoop… …cheaper, easier, and more agile
  • 51. What's New?
      • MapR M7 Introduction
      • Optimised for HBase Clusters
      • Failure Recovery
      • Point in Time Recovery Snapshotting
      • Low Latency Hadoop Optimisations
      • HBase Mirroring
      • NFS + HDFS
      • MapR M5 Price Drop
      • Support for Pig 0.11.1
      • RANK, CUBE & ROLLUP capability
      • Groovy UDFs
      • Support for Guava Functions
      • Performance Improvements
      • Spark/Shark Bootstrap Action
      • In Memory Hadoop
      • Spark Scripting (similar to Pig)
      • Shark Shell with Hive Interoperability