FARMING HADOOP IN THE CLOUD
Steve Loughran, HP Laboratories, June 2010
ABOUT ME
Researcher at HP Laboratories, Bristol, England
Datacentre-scale apps & IaaS
ASF Member: stevel@apache.org
Committer: Ant, Hadoop-common, HDFS, MapReduce
Author: Ant in Action
Somewhat obsessive about testing
CLOUD ELIMINATES
Buying hardware based on predicted load
2+ week lead time on new hardware, storage
High Availability
Homogeneity
Static machine names, addresses and capabilities
Stable machines
A fast private network
Someone in the datacentre who cares about you
APPLICATIONS MUST BE AGILE
Directory, database or CM service to configure
Applications to handle moving services
Use dynamic DNS services; don’t cache IP addresses
Don’t expect HDD content to last on a single disk
Restart VMs on any app failure
Nothing is static. Nothing lasts.
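The advice to use dynamic DNS and not cache IP addresses can be sketched as a client that looks the service name up again before every connection attempt. This is a minimal Python sketch, not something from the talk: the function names, the retry count and the delay are illustrative assumptions, and the resolver and connector are injectable so the logic can be exercised without a network.

```python
import socket
import time


def resolve(host, port):
    """Ask the resolver for the host's current addresses; nothing is cached here."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); keep (ip, port).
    return [info[4][:2] for info in infos]


def connect_with_fresh_lookup(host, port, attempts=3, delay=1.0,
                              resolver=resolve,
                              connector=socket.create_connection):
    """Connect to host:port, repeating the name lookup before every attempt.

    A restarted VM may come back under a new address, so an address that
    failed on the previous attempt is never reused without a new lookup.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            for address in resolver(host, port):
                try:
                    return connector(address)
                except OSError as err:
                    last_error = err  # this address is dead; try the next one
        except OSError as err:
            last_error = err  # the lookup itself failed; retry after a pause
        if attempt < attempts - 1:
            time.sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

A caller would pass the service's DNS name, for example `connect_with_fresh_lookup("namenode.example.org", 8020)`, where both the hostname and the port are placeholders. Long-lived processes may also need to check whether their runtime or resolver library caches lookups; that is outside this sketch.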
CLASSIC TEAM ROLES
Business Development, Architecture, Operations, Development
BEFORE
Business Development, Architecture, Operations, Development
Design, Code, Test, Staging, Live
RESPONSIBILITIES
Work
Architects design the application
Developers code and test on local machines

Editor's Notes

  • #2: 6 June 2010 HP Confidential
  • #4: What does all this mean? You don’t need to predict your customer load in advance, though you had better hope your supplier can offer a service to match. You don’t have to wait a few weeks for a hardware order to be delivered. You can’t buy HA kit (RAID, L7 routers, other nice things) to address availability; you need to design availability in. You can’t be sure your machines will stay around, and when they come back their names and IP addresses may have changed. You don’t have someone with a pager in the room who will track down network problems for you.
  • #5: We really need to rethink how to design apps in this world; the old ways don’t work. When a VM goes, so does any transient HDD. When a machine gets terminated and re-instantiated, it can have a different hostname and address. Nor can that server deal with machines moving around. Which is a pity, as the simplest way to deal with app trouble is to reset the VM: no need to worry about what its previous state was.
  • #6: Here are some of the classic roles of back-end projects. There are also graphic designers, marketing, content generation, etc., but this is the code side. Everyone’s job is hard. Biz dev: make sure the idea is good, predict demand, and get the ops team to work with Arch and Finance to get machines to meet the demand. Architecture: design something that works on the machines that ops will bring up. Developers: code and test the app, and produce something that works.
  • #7: This is how things were built, at best, if you had a static set of machines as your target. Even if you design/code/test in a cycle, going live creates problems: different systems, different networks, etc. Staging is meant to simplify this with a setup that mimics production, but it still has different users.
  • #8: This is how things are today: set up for conflict. The big one is between developers, who "ship code that is functional", and ops, who "run secure services".
  • #9: Once you stop needing a physical cluster of machines to test on, you can give every developer a virtual cluster which mimics the one in production. You can bring up a staging site on the public server farm, let third parties play with it, and switch it over when you are happy (ignoring data issues).
  • #10: Developers shouldn’t be creating the machine configurations; that’s a job for the architect and ops. Ops have to move beyond the pager going off when a machine fails to getting an overall statistical view of what works and what doesn't, and look at the total, perceived picture. No more panicking when a machine goes down, but do worry when all the machines start to fail too often. Solution: monitoring and statistics. Datamining. Hadoop. Biz dev/management need to keep an eye on costs and revenue. Costs: machines. Revenue: things like why people are switching from free to premium, and where customers are coming from. Statistics. Datamining. Hadoop.
  • #11: At this scale, datamining and statistics become an essential background activity. Test result collection and analysis. Application and VM log file capture and analysis: Chukwa. Application load analysis, feeding into VM create/destroy. User/paying customer mining: when do people pay, when do they leave? Infrastructure: how do people and their VMs behave?
  • #12: These are the places where Hadoop contains assumptions that are valid in the physical datacentre but don't work in a virtual world.
  • #15: This is for everyone to create machines. You can only create machines in roles you have the right to. This is more than a constrained image; much more of the config is locked down: VM, networking, dynamic options.
  • #16: I’ve cheated and added some Hadoop-specific logic to the web front end; you can create Hadoop workers and it knows to create the Master first, and passes the master hostname down so that the workers bond properly. This use case needs to be made generic.
  • #17: This is a fairly weak Web UI but it’s designed to feed into portals. It also happens to test easily.
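The ops advice in note #10 (don't panic about a single machine, but worry when machines start to fail too often) could be approximated by a sliding-window failure counter. The sketch below is an assumption about one way to do that, not the tooling used in the talk: the class name, the window length and the threshold are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class FailureRateMonitor:
    """Track machine failures across the whole fleet within a time window."""

    def __init__(self, window_seconds, max_failures):
        self.window_seconds = window_seconds  # how far back to look
        self.max_failures = max_failures      # failures tolerated per window
        self._failures = deque()              # timestamps of recent failures

    def record_failure(self, timestamp):
        """Record one failure; return True if the recent rate is worrying."""
        self._failures.append(timestamp)
        # Forget failures that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        return len(self._failures) > self.max_failures
```

A single failure returns `False` and needs no pager; only a cluster of failures inside the window returns `True`. At larger scale the same count would more plausibly be computed as a batch job over collected logs, which is the role the notes assign to Hadoop.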
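Note #16 describes a front end that creates the Hadoop master first and passes its hostname down so that the workers bond properly. The ordering can be sketched as below. Everything here is hypothetical: `create_vm` stands in for whatever call the infrastructure offers, and the `master.hostname` key is an illustrative name, not a real Hadoop property.

```python
def create_hadoop_cluster(worker_count, create_vm):
    """Create a master VM, then workers that are told the master's hostname.

    create_vm(role, settings) is a hypothetical callback that starts one VM
    in the given role and returns the hostname it was assigned.
    """
    # The master must exist first: its hostname is only known once it is up.
    master_hostname = create_vm("master", {})
    workers = []
    for _ in range(worker_count):
        # Each worker receives the master's hostname as part of its settings.
        settings = {"master.hostname": master_hostname}
        workers.append(create_vm("worker", settings))
    return master_hostname, workers
```

Passing the hostname at creation time, rather than baking it into the image, fits the rest of the talk: nothing about the master's name or address is assumed to be static.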