Big Data on EC2: Mashing Technology in the Cloud

ShareThis


Paco Nathan, Data Insights Team




ScaleCamp 2009-06-09
Scenario…


✤   Given: >10^6 publishers, >10^9 users, >10^10 urls

✤   Early-stage start-up, < 25 people, seeking to minimize spend on Ops
    and capex for data centers

✤   Serving widgets on page views for popular online publications: ESPN,
    HuffPost, FOX, CS Monitor, CBS Marketwatch, Wired, TechCrunch, etc.

✤   Spikes in popularity of stories lead to elastic demands throughout
    the system architecture: serving API, logging, DW, BI, etc.

✤   Business needs to improve user experience by analyzing how people
    share online content


System Architecture


✤   100% infrastructure based on Amazon AWS

✤   Each component designed for cost-effective, horizontal scale-out

✤   AsterData: infrastructure based on a “hub-and-spoke” pattern of batch
    jobs and data consolidation

✤   Cascading: abstraction layer for tying together system components

✤   Batch jobs run on Amazon Elastic MapReduce and AsterData SQL/MR

✤   Vertical search based on Bixo and Katta


Cascading

✤   Syntax is for humans, APIs are for software

✤   Cascading defines apps in terms of functions applied to data flows and
    incorporates end-points, whereas raw MR is coarse and key/values are brittle

✤   Direct benefits to team’s responsibilities, process, deliverables

✤   Also consider impact on team size, composition, staffing, training

✤   Ideal for provisioning/monitoring elastic resources, such as EMR

✤   Great potential to allow for migrating apps

✤   (see also: Cascading talk @ Hadoop Summit tomorrow, 6pm)
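To make the "functions applied to data flows" idea concrete, here is a minimal Python sketch. It is not the Cascading API (Cascading is a Java library); the helper names `each`, `group_by`, and `flow`, and the "<user> <url>" log format, are assumptions made up for illustration. The point is only that the app is declared as a chain of named operations on records rather than as hand-written key/value map and reduce code.

```python
from collections import defaultdict

def each(fn):
    """Stage that applies fn to every record; fn may yield zero or more records."""
    def stage(records):
        for record in records:
            yield from fn(record)
    return stage

def group_by(key, aggregate):
    """Stage that groups records on one field and aggregates each group."""
    def stage(records):
        groups = defaultdict(list)
        for record in records:
            groups[record[key]].append(record)
        for value in sorted(groups):
            yield aggregate(value, groups[value])
    return stage

def flow(source, *stages):
    """Connect a source to a chain of stages and collect the output."""
    records = iter(source)
    for stage in stages:
        records = stage(records)
    return list(records)

def parse(record):
    # Hypothetical log format: "<user> <url>"; malformed lines are dropped.
    parts = record["line"].split()
    if len(parts) == 2:
        yield {"user": parts[0], "url": parts[1]}

def count_shares(url, rows):
    return {"url": url, "shares": len(rows)}

log = [
    {"line": "u1 http://a.example/x"},
    {"line": "u2 http://a.example/x"},
    {"line": "malformed"},
    {"line": "u1 http://b.example/y"},
]

result = flow(log, each(parse), group_by("url", count_shares))
print(result)
```

Each stage works on named fields, so adding a filter or a second aggregation means adding a stage, not reworking a key/value encoding.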

Amazon Elastic MapReduce

✤   Can scale-out wide and select different instance types at launch

✤   Reads input from S3, writes output to S3, optionally copies logs to S3

✤   Excellent command line tools make dev/test/debug cycle more efficient

✤   Risks for DIY Hadoop clusters: time+cost to acquire leases, data locality,
    etc.; EMR resolves these to make large apps more cost-effective

✤   Simplifies needs for Ops, which can be quite difficult and expensive
    to staff for Big Data apps and can pose a risk to scale-out strategy

✤   See the Cascading examples: LogAnalyzer for CloudFront and Multitool

AsterData nCluster Cloud Edition

✤   Scalable, fault-tolerant, relational database — directly in the cloud

✤   Takes only minutes to go from idea to a prototype which scales well;
    accessible by data scientists who have less coding background

✤   Use of SQL unions on map phase, plus other SQL primitives for shuffle
    and reduce phases, allows for complex MR workflows

✤   Leveraging SQL primitives during shuffle and reduce phases allows for
    automated optimizations

✤   Contributed code among developer community, implementing libraries
    for popular algorithms based on In-Database MapReduce
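The pattern of a union on the map side with grouping for shuffle and reduce can be sketched in plain SQL. The example below is an analogy only: it uses SQLite from Python, not AsterData nCluster or its SQL/MR syntax, and the table names `widget_log` and `email_log` and their rows are invented for illustration. `UNION ALL` merges two input sources, as a map phase would, and `GROUP BY` plays the role of shuffle plus reduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical input sources with the same shape.
conn.executescript("""
    CREATE TABLE widget_log (user_id TEXT, url TEXT);
    CREATE TABLE email_log  (user_id TEXT, url TEXT);
    INSERT INTO widget_log VALUES ('u1', 'http://a.example/x');
    INSERT INTO widget_log VALUES ('u2', 'http://a.example/x');
    INSERT INTO email_log  VALUES ('u1', 'http://a.example/x');
    INSERT INTO email_log  VALUES ('u3', 'http://b.example/y');
""")

rows = conn.execute("""
    SELECT url,
           COUNT(*)                AS shares,
           COUNT(DISTINCT user_id) AS sharers
    FROM (
        SELECT user_id, url FROM widget_log   -- "map" input 1
        UNION ALL
        SELECT user_id, url FROM email_log    -- "map" input 2
    )
    GROUP BY url                              -- "shuffle" + "reduce"
    ORDER BY url
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Because the grouping is declared rather than hand-coded, the database is free to choose how to partition and aggregate, which is the kind of automated optimization the slide refers to.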


ShareThis as Case Study

See also: ShareThis case study, "Cascading"
by Chris K Wensel, in…




Contacts:
http://sharethis.com
@pacoid on Twitter
http://cascading.org
http://asterdata.com/product/ncluster_cloud.php
http://aws.amazon.com/elasticmapreduce
http://github.com/emi/bixo
http://katta.sourceforge.net
http://www.hadoopbook.com




Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 

Big Data on EC2: Mashing Technology in the Cloud

  • 1. Big Data on EC2: Mashing Technology in the Cloud. ShareThis; Paco Nathan, Data Insights Team; ScaleCamp 2009-06-09
  • 2. Scenario…
      ✤ Given: >10^6 publishers, >10^9 users, >10^10 urls
      ✤ Early-stage start-up, < 25 people, seeking to minimize spend on Ops and capex for data centers
      ✤ Serving widgets on page views for popular online publications: ESPN, HuffPost, FOX, CS Monitor, CBS Marketwatch, Wired, TechCrunch, etc.
      ✤ Spikes in popularity of stories lead to elastic demands throughout the system architecture: serving API, logging, DW, BI, etc.
      ✤ Business needs to improve user experience by analyzing how people share online content
  • 3. System Architecture
      ✤ 100% infrastructure based on Amazon AWS
      ✤ Each component designed for cost-effective, horizontal scale-out
      ✤ AsterData: infrastructure based on a “hub-and-spoke” pattern of batch jobs and data consolidation
      ✤ Cascading: abstraction layer for tying together system components
      ✤ Batch jobs run on Amazon Elastic MapReduce and AsterData SQL/MR
      ✤ Vertical search based on Bixo and Katta
  • 5. Cascading
      ✤ Syntax is for humans, APIs are for software
      ✤ Cascading defines apps in terms of functions applied to data flows, incorporates end-points; whereas MR is coarse, key/values are brittle
      ✤ Direct benefits to team’s responsibilities, process, deliverables
      ✤ Also consider impact on team size, composition, staffing, training
      ✤ Ideal for provisioning/monitoring elastic resources, such as EMR
      ✤ Great potential to allow for migrating apps
      ✤ (see also: Cascading talk @ Hadoop Summit tomorrow, 6pm)
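The Cascading slide describes apps as functions applied to data flows between end-points. A minimal sketch of that idea in Python follows. It is not the Cascading API (Cascading is a Java library); the `Flow` class, its `each`/`keep`/`run` methods, and the sample share events are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict


class Flow:
    """Hypothetical stand-in for a data flow: source -> functions -> sink."""

    def __init__(self, source: Iterable[Record]):
        self._source = source
        self._steps: List[Callable[[Iterator[Record]], Iterator[Record]]] = []

    def each(self, fn: Callable[[Record], Record]) -> "Flow":
        """Queue a function that is applied to every record in the flow."""
        self._steps.append(lambda recs: (fn(r) for r in recs))
        return self

    def keep(self, predicate: Callable[[Record], bool]) -> "Flow":
        """Queue a filter that keeps only records matching the predicate."""
        self._steps.append(lambda recs: (r for r in recs if predicate(r)))
        return self

    def run(self, sink: list) -> list:
        """Pull records from the source through each step into the sink."""
        stream = iter(self._source)
        for step in self._steps:
            stream = step(stream)
        sink.extend(stream)
        return sink


# Made-up share events standing in for a source end-point.
events = [
    {"url": "http://example.com/a", "action": "share"},
    {"url": "http://example.com/b", "action": "view"},
    {"url": "http://example.com/a", "action": "share"},
]

shares = (
    Flow(events)
    .keep(lambda r: r["action"] == "share")
    .each(lambda r: {"url": r["url"], "count": 1})
    .run([])
)
```

The flow is declared first and only executed by `run`, which is the property the slide contrasts with hand-written key/value MapReduce code: the steps are named functions over records rather than positional key/value pairs.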
  • 6. Amazon Elastic MapReduce
      ✤ Can scale out wide and select different instance types at launch
      ✤ Reads input from S3, writes output to S3, optionally copies logs to S3
      ✤ Excellent command line tools make dev/test/debug cycle more efficient
      ✤ Risks for DIY Hadoop clusters: time+cost to acquire leases, data locality, etc.; EMR resolves these to make large apps more cost-effective
      ✤ Simplifies needs for Ops, which can be quite difficult and expensive to staff for Big Data apps and can pose a risk to the scale-out strategy
      ✤ See the Cascading examples: LogAnalyzer for CloudFront and Multitool
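The kind of batch job the EMR slide has in mind can be sketched as a mapper and a reducer in the style of a Hadoop streaming job, run locally on a few sample lines. This is an illustration under stated assumptions, not the ShareThis pipeline: the two-field log format (timestamp, then path) and the sample data are invented, and the in-memory sort stands in for the shuffle that the cluster would perform between reading input and writing output.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (path, 1) for every well-formed log line."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip malformed lines
            yield fields[1], 1


def reducer(pairs):
    """Sum the counts for each key; sorting here plays the role of the shuffle."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


# Hypothetical log lines: "<timestamp> <path>".
sample_log = [
    "2009-06-09T10:00:00 /story/1",
    "2009-06-09T10:00:01 /story/2",
    "2009-06-09T10:00:02 /story/1",
    "malformed",
]

counts = dict(reducer(mapper(sample_log)))
```

Keeping the mapper and reducer as plain functions over iterables makes the dev/test/debug cycle cheap: the same logic can be checked on a handful of lines before any cluster is launched.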
  • 7. AsterData nCluster Cloud Edition
      ✤ Scalable, fault-tolerant, relational database, directly in the cloud
      ✤ Takes only minutes to go from idea to a prototype which scales well; accessible by data scientists who have less coding background
      ✤ Use of SQL unions on map phase, plus other SQL primitives for shuffle and reduce phases, allows for complex MR workflows
      ✤ Leveraging SQL primitives during shuffle and reduce phases allows for automated optimizations
      ✤ Contributed code among developer community, implementing libraries for popular algorithms based on In-Database MapReduce
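The pairing of SQL primitives with MapReduce phases on the AsterData slide can be illustrated with ordinary SQL. The sketch below uses Python's built-in `sqlite3` module, not AsterData's SQL/MR syntax, and the table names and rows are hypothetical. `UNION ALL` plays the part of the map phase by merging two inputs into one stream, while `GROUP BY` with `COUNT(*)` stands in for shuffle and reduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two made-up input tables, standing in for separate sources of share events.
conn.executescript("""
    CREATE TABLE widget_shares (url TEXT, user_id INTEGER);
    CREATE TABLE toolbar_shares (url TEXT, user_id INTEGER);
    INSERT INTO widget_shares VALUES ('/a', 1), ('/b', 2), ('/a', 3);
    INSERT INTO toolbar_shares VALUES ('/a', 4), ('/c', 5);
""")

query = """
    SELECT url, COUNT(*) AS shares      -- reduce: aggregate each group
    FROM (
        SELECT url FROM widget_shares   -- map: project and merge the inputs
        UNION ALL
        SELECT url FROM toolbar_shares
    ) AS mapped
    GROUP BY url                        -- shuffle: bring equal keys together
    ORDER BY url
"""

rows = conn.execute(query).fetchall()
conn.close()
```

Because the shuffle and reduce steps are expressed declaratively, a query planner is free to reorder or optimize them, which is the point the slide makes about automated optimizations.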
  • 8. ShareThis as Case Study
      See also: ShareThis case study, "Cascading" by Chris K Wensel, in…