Apache Mesos Overview and
Integration:
Docker, Kubernetes, and Beyond
Alex Barreto
Disney Interactive - Data Technology
alex.barreto@disney.com
@shakamunyi
What you should take away
● Overview
● History
● Use Cases
● Integration
– Docker (dock-ah, dock-ah, dock-ah)
– Kubernetes
What is Apache Mesos?
Apache Mesos is a cluster manager that
provides efficient resource isolation and sharing
across distributed applications, or frameworks.
It can run Hadoop, MPI, Hypertable, Spark,
Elasticsearch, Storm, Aurora, Marathon ... and
other applications on a dynamically shared pool of
nodes.
Rephrased:
- Mesos is a distributed systems kernel -
● Mesos is built using the same principles as the Linux
kernel, only at a different level of abstraction. The
Mesos kernel runs on every machine and provides
applications (e.g., Hadoop, Spark, Kafka,
Elasticsearch) with APIs for resource management and
scheduling across entire datacenter and cloud
environments.
● Or you could think of it as a distributed system to build
distributed systems.
Project Highlights
● Top-level Apache project for about a year (mesos.apache.org)
● Scales to 10,000s of nodes
● Obviates the need for virtual machines in some use cases
● Isolation for CPU, RAM, I/O, FS, etc.
● Fault-tolerant leader election (HA) based on Zookeeper
● APIs in C++, Java/Scala, Python, Go, Erlang, Haskell
● Web UI for inspecting state
● Available for Linux, OpenSolaris, Mac OS X
Mesos from 50,000 feet
Who is using Mesos? - This list is old.
Google Refs
● The Datacenter as a Computer: An Introduction to the Design
of Warehouse-Scale Machines
http://research.google.com/pubs/pub35290.html
● 2011 GAFS Omega John Wilkes:
http://youtu.be/0ZFMlO98Jkc
● Omega: flexible, scalable schedulers for large compute
clusters
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
● Taming Latency Variability and Scaling Deep Learning
https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
History
Understanding of Datacenter Computing
Google has been doing data center computing for
years to address the complexities of large-scale
data workflows:
● Leveraging modern kernel isolation (cgroups)
● Containerization, not virtualization (lmctfy, Docker)
● Most (>80%) jobs are batch jobs, but the majority of
resources (55-80%) are allocated to service jobs.
● Mixed workloads, multi-tenancy
● Relatively high utilization rates
● JVM? Not so much...
● Reality: scheduling batch is simple;
– scheduling services is hard/expensive.
Refs.
● The Datacenter as a Computer: An Introduction
to the Design of Warehouse-Scale Machines
– http://research.google.com/pubs/pub35290.html
● GAFS Omega John Wilkes
– https://www.youtube.com/watch?v=0ZFMlO98Jkc
● Taming Latency Variability and Scaling Deep
Learning
– http://youtu.be/nK6daeTZGA8
Why Mesos?
● Many existing solutions had architectural
deficiencies around a constraining model.
– Everything is a “job”: good for batch.
– Everything is a “service”: good for PaaS.
– What happens when you want to write your own
distributed application? (No primitives.)
– What happens when you want to write your own
scheduler (elastic service)? You reinvent the square
wheel.
The New Reality
● New applications need to be:
– Fault Tolerant (Withstand failure)
– Scalable (Not crumble under its own weight)
– Elastic (Can grow and shrink based on demand)
– Multi-tenant (It can't have its own dedicated cluster)
● Must play nice with the other kids.
● So what does that really mean?
Distributed Applications
● “There's Just No Getting Around It: You're Building a Distributed System” Mark Cavage
– queue.acm.org/detail.cfm?id=2482856
● Key takeaways on architecture:
– Decompose the business applications into discrete services on the boundaries of fault
domains, scaling, and data workload.
– Make as many things as possible stateless
– When dealing with state, deeply understand CAP, latency, throughput, and durability
requirements.
“Without practical experience working on successful—and failed—systems, most engineers take
a "hopefully it works" approach and attempt to string together off-the-shelf software, whether
open source or commercial, and often are unsuccessful at building a resilient, performant system.
In reality, building a distributed system requires a methodical approach to requirements along the
boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for
the business application at all aspects of the application.”
Emerging
at Berkeley
BDAS Stack
Prior Practice: Dedicated Servers
• low utilization rates
• longer time to ramp up new services
DATACENTER
Prior Practice: Virtualization
DATACENTER PROVISIONED VMS
• even more machines to manage
• substantial performance decrease
due to virtualization
• VM licensing costs
Prior Practice: Static Partitioning
STATIC PARTITIONING
• even more machines to manage
• substantial performance decrease
due to virtualization
• VM licensing costs
• failures make static partitioning
more complex to manage
DATACENTER
What are the costs of Single Tenancy?
[Charts: RAILS CPU LOAD, MEMCACHED CPU LOAD, and HADOOP CPU LOAD, each plotted from 0% to 100% over time t, followed by COMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP).]
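The cost of single tenancy can be put into numbers with a small sketch. The load figures below are made up for illustration and are not read from the charts. The assumption is that a dedicated cluster has to be sized for its own peak, while a shared pool only has to cover the peak of the combined load.

```python
# Hypothetical CPU demand (in cores) per time slot; illustrative numbers only.
rails     = [10, 40, 80, 40, 10, 5]
memcached = [20, 30, 40, 30, 20, 10]
hadoop    = [80, 20, 5, 20, 60, 90]

def dedicated_capacity(*workloads):
    """Single tenancy: every cluster is sized for its own peak."""
    return sum(max(w) for w in workloads)

def shared_capacity(*workloads):
    """Shared pool: size for the peak of the combined load."""
    return max(sum(slot) for slot in zip(*workloads))

def utilization(capacity, *workloads):
    """Average fraction of the provisioned capacity that is actually used."""
    slots = len(workloads[0])
    used = sum(sum(w) for w in workloads)
    return used / (capacity * slots)

dedicated = dedicated_capacity(rails, memcached, hadoop)  # 80 + 40 + 90 = 210
shared = shared_capacity(rails, memcached, hadoop)        # busiest slot = 125

print("dedicated cores:", dedicated)
print("shared cores:   ", shared)
print("dedicated utilization: %.2f" % utilization(dedicated, rails, memcached, hadoop))
print("shared utilization:    %.2f" % utilization(shared, rails, memcached, hadoop))
```

With these invented numbers the shared pool needs 125 cores instead of 210, because the workloads peak at different times.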
MESOS
Mesos: One Large Pool of Resources
“We wanted people to be able to program
for the datacenter just like they program
for their laptop."
Ben Hindman
Mesos – architecture
[Diagram: layered view of the datacenter. Layer labels: Apps (services, batch), Frameworks, Workloads, Kernel, Cluster (distributed resources: CPU, RAM, I/O, FS, rack locality, etc.), DFS (distributed file system). Components shown: Python, JVM, C++, Chronos, Marathon, Spark, Hadoop, MPI, Storm, Kafka, JBoss, Django, Rails, Impala, Scalding, MySQL.]
Use Cases
Case Study: Twitter (bare metal / on premise)
“Mesos is the cornerstone of our elastic compute infrastructure –
it’s how we build all our new services and is critical for Twitter’s
continued success at scale. It's one of the primary keys to our
data center efficiency."
Chris Fry, SVP Engineering
blog.twitter.com/2013/mesos-graduates-from-apache-incubation
wired.com/gadgetlab/2013/11/qa-with-chris-fry/
• key services run in production: analytics, typeahead, ads
• Twitter engineers rely on Mesos to build all new services
• instead of thinking about static machines, engineers think
about resources like CPU, memory and disk
• allows services to scale and leverage a shared pool of
servers across datacenters efficiently
• reduces the time between prototyping and launching
Case Study: Airbnb (fungible cloud infrastructure)
“We think we might be pushing data science in the field of travel
more so than anyone has ever done before… a smaller number
of engineers can have higher impact through automation on
Mesos."
Mike Curtis, VP Engineering
gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data...
• improves resource management and efficiency
• helps advance engineering strategy of building small teams
that can move fast
• key to letting engineers make the most of AWS-based
infrastructure beyond just Hadoop
• allowed company to migrate off Elastic MapReduce
• enables use of Hadoop along with Chronos, Spark, Storm, etc.
Case Study: HubSpot (cluster management)
Tom Petr
youtu.be/ROn14csiikw
• 500 deployable objects; 100 deploys/day to
production; 90 engineers; 3 devops on Mesos
cluster
• “Our QA cluster is now a fixed $10K/month —
that used to fluctuate”
Dock-ah, dock-ah, dock-ah
"If a Docker application is a Lego brick, Kubernetes would be like a kit
for building the Millennium Falcon and the Mesos cluster would be like
a whole Star Wars universe made of Legos." ~ Solomon
● Mesos used to have pluggable add-ons for Docker; now Docker support is
first-class in the core.
– Lots of churn between 0.20 and 0.21, based on feedback from the field.
● It enables programmatic scheduling of containers, from batch (Chronos) to
service orchestration (Marathon and Aurora).
● A common deployment is Marathon + HAProxy + Docker, running from
O(10^1) to O(10^3), with Mesos clusters at O(10^4).
● Some users are exploring customized elastic batch.
● Custom frameworks are quite a large space; we only see the tip of the iceberg.
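As a rough sketch of what programmatic scheduling of a container can look like, the snippet below builds an app definition in the JSON shape accepted by Marathon's REST API in its Docker-era releases. Field names may differ in other Marathon versions, so treat them as an assumption. The app id, image name, and resource sizes are hypothetical. The snippet only builds and prints the payload; submitting it to a Marathon endpoint is left out.

```python
import json

def docker_app(app_id, image, instances=2, cpus=0.5, mem=256, container_port=80):
    """Build a Marathon-style app definition for a Docker container.

    The field names follow the app JSON of older Marathon releases;
    check them against the version you actually run.
    """
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks for a dynamically assigned host port.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
                ],
            },
        },
    }

# Hypothetical app id and image name, for illustration only.
app = docker_app("/web/frontend", "example/frontend:1.0")
payload = json.dumps(app, indent=2)
print(payload)
```

In a Marathon + HAProxy + Docker setup, the proxy would then be pointed at whichever host ports the instances were given.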
Kubernetes+Mesos = Awesome
● Kubernetes provides robust declarative
primitives for distributed micro-services.
● Mesos provides an imperative framework by
which application developers can define
scheduling policy in a programmatic fashion.
● Used together, they give a datacenter the
ability to do both.
– Both batch and service.
Kubernetes+Mesos cont.
● Launching pods
– Implement Kube-scheduler API
– Pick a Pod (FCFS), match it to an offer.
– Launch it!
– Kubelet as Executor+Containerizer
● Pod Labels: for Service Discovery + Load Balancing
● Running multi-node on GCE
● Replication Control
● Use resource shapes to schedule pods
● Even smarter scheduling
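The "pick a pod (FCFS), match it to an offer" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Kubernetes-Mesos scheduler: the Offer and Pod classes, their fields, and the first-fit rule are all hypothetical, and a pod that fits no offer is simply kept pending rather than blocking the queue.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Offer:          # hypothetical stand-in for a Mesos resource offer
    offer_id: str
    cpus: float
    mem: int          # MB

@dataclass
class Pod:            # hypothetical stand-in for a pending pod
    name: str
    cpus: float
    mem: int          # MB

def schedule_fcfs(pending, offers):
    """Take pods in arrival order and bind each to the first offer that fits.

    Returns (launches, still_pending). Offers are mutated to reflect
    the capacity that remains after each placement.
    """
    launches = []
    still_pending = deque()
    while pending:
        pod = pending.popleft()
        for offer in offers:
            if offer.cpus >= pod.cpus and offer.mem >= pod.mem:
                offer.cpus -= pod.cpus
                offer.mem -= pod.mem
                launches.append((pod.name, offer.offer_id))
                break
        else:
            # No offer was large enough; keep the pod for the next round.
            still_pending.append(pod)
    return launches, still_pending

offers = [Offer("o1", 2.0, 2048), Offer("o2", 4.0, 8192)]
pending = deque([
    Pod("web", 1.0, 1024),
    Pod("db", 4.0, 4096),
    Pod("cache", 1.0, 1024),
    Pod("big", 8.0, 1024),
])
launches, waiting = schedule_fcfs(pending, offers)
print(launches)                    # [('web', 'o1'), ('db', 'o2'), ('cache', 'o1')]
print([p.name for p in waiting])   # ['big']
```

Matching on "resource shapes" or smarter policies would replace the first-fit test inside the loop.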
The BEYOND Section
Coming – “Soon-ish”
● Primitives to support Stateful Services (HDFS,
Cassandra, etc.)
– Persistent disk data.
– Lazy resource reservation
– Expanded Disk Isolation
● I/O
● FS/Mount namespace
● Support multiple spindles
– Dynamic slave attributes.
“Soon-ish” - part 2
● Opportunistic Offers
– Free-for-all like Omega.
– Requires Framework API modifications
– Increased utilization at cloud-scale.
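To illustrate what "free-for-all like Omega" means, here is a small sketch of optimistic, shared-state scheduling in the style described in the Omega paper: every scheduler plans against a snapshot of the whole cluster and then tries to commit, and a commit fails if the node changed in the meantime. This is not a Mesos API; the CellState class, its methods, and the node names are hypothetical.

```python
class CellState:
    """Shared view of free CPUs per node, with a version counter per node."""

    def __init__(self, free_cpus):
        self.free = dict(free_cpus)
        self.version = {node: 0 for node in free_cpus}

    def snapshot(self):
        """Return private copies that a scheduler can plan against."""
        return dict(self.free), dict(self.version)

    def commit(self, node, cpus, seen_version):
        """Claim cpus on node; fail if the node changed since the snapshot."""
        if self.version[node] != seen_version or self.free[node] < cpus:
            return False
        self.free[node] -= cpus
        self.version[node] += 1
        return True

def plan(state, cpus):
    """Pick the first node that looks large enough in a fresh snapshot."""
    free, versions = state.snapshot()
    for node, available in free.items():
        if available >= cpus:
            return node, versions[node]
    return None

state = CellState({"n1": 4, "n2": 4})

# Two schedulers plan concurrently and both choose n1.
plan_a = plan(state, 3)
plan_b = plan(state, 3)

ok_a = state.commit(plan_a[0], 3, plan_a[1])   # first commit wins
ok_b = state.commit(plan_b[0], 3, plan_b[1])   # conflict: n1 changed

# The loser re-plans against the new state and lands on n2.
retry_b = plan(state, 3)
ok_retry = state.commit(retry_b[0], 3, retry_b[1])

print(ok_a, ok_b, retry_b[0], ok_retry)   # True False n2 True
```

The trade-off is that conflicting claims cost a retry, in exchange for every scheduler being able to see and use the whole cluster.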
Thank You!
mesos.apache.org
@shakamunyi
Brian McKeiver
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Ad

Apache Mesos Overview and Integration

  • 1. Apache Mesos Overview and Integration: Docker, Kubernetes, and Beyond Alex Barreto Disney Interactive - Data Technology alex.barreto@disney.com @shakamunyi
  • 2. What you should take away ● Overview ● History ● Use Cases ● Integration – Dokah, dokah, dokah – Kubernetes
  • 3. What is Apache Mesos? Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, Elastic Search, Storm, Aurora, Marathon ... and other applications on a dynamically shared pool of nodes.
  • 4. Rephrased: - Mesos is a distributed systems kernel - ● Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elastic Search) with APIs for resource management and scheduling across entire datacenter and cloud environments. ● Or you could think of it as a distributed system to build distributed systems.
  • 5. Project Highlights ● Top-level Apache project ~ 1 year (mesos.apache.org) ● Scales to 10,000s of nodes ● Obviates the need for virtual machines in some use cases ● Isolation for CPU, RAM, I/O, FS, etc. ● Fault-tolerant leader election (HA) based on ZooKeeper ● APIs in C++, Java/Scala, Python, Go, Erlang, Haskell. ● Web UI for inspecting state ● Available for Linux, OpenSolaris, Mac OS X
  • 7. Who is using Mesos? - This list is old.
  • 9. Google Refs ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines http://research.google.com/pubs/pub35290.html ● 2011 GAFS Omega John Wilkes: http://youtu.be/0ZFMlO98Jkc ● Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf ● Taming Latency Variability and Scaling Deep Learning https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv History
  • 10. Understanding of Datacenter Computing Google has been doing data center computing for years to address the complexities of large-scale data workflows: ● Leveraging modern kernel isolation (cgroups) ● Containerization != virtualization (lmctfy - Docker) ● Most jobs (>80%) are batch jobs, but the majority of resources (55-80%) are allocated to service jobs. ● Mixed workloads, multi-tenancy ● Relatively high utilization rates ● JVM? Not so much... ● Reality: scheduling batch is simple; scheduling services is hard/expensive.
  • 11. Refs. ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines – http://research.google.com/pubs/pub35290.html ● GAFS Omega John Wilkes – https://www.youtube.com/watch?v=0ZFMlO98Jkc ● Taming Latency Variability and Scaling Deep Learning – http://youtu.be/nK6daeTZGA8
  • 12. Why Mesos? ● Many existing solutions had architectural deficiencies around a constraining model. – Everything is a “job”, good for batch. – Everything is a “service”, good for PaaS. – What happens when you want to write your own distributed application? (no primitives) – What happens when you want to write your own scheduler (elastic service)? You reinvent the square wheel.
  • 13. The New Reality ● New applications need to be: – Fault Tolerant (Withstand failure) – Scalable (Not crumble under its own weight) – Elastic (Can grow and shrink based on demand) – Multi-tenant (It can't have its own dedicated cluster) ● Must play nice with the other kids. ● So what does that really mean?
  • 14. Distributed Applications ● “There's Just No Getting Around It: You're Building a Distributed System” Mark Cavage – queue.acm.org/detail.cfm?id=2482856 ● Key takeaways on architecture: – Decompose the business applications into discrete services on the boundaries of fault domains, scaling, and data workload. – Make as many things as possible stateless – When dealing with state, deeply understand CAP, latency, throughput, and durability requirements. “Without practical experience working on successful—and failed—systems, most engineers take a "hopefully it works" approach and attempt to string together off-the-shelf software, whether open source or commercial, and often are unsuccessful at building a resilient, performant system. In reality, building a distributed system requires a methodical approach to requirements along the boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for the business application at all aspects of the application.”
  • 15. Emerging at Berkeley
  • 17. Prior Practice: Dedicated Servers [diagram: datacenter] • low utilization rates • longer time to ramp up new services
  • 18. Prior Practice: Virtualization [diagram: datacenter with provisioned VMs] • even more machines to manage • substantial performance decrease due to virtualization • VM licensing costs
  • 19. Prior Practice: Static Partitioning [diagram: statically partitioned datacenter] • even more machines to manage • substantial performance decrease due to virtualization • VM licensing costs • failures make static partitioning more complex to manage
  • 20. What are the costs of Single Tenancy? [Charts: CPU load (0% to 100%) over time t for Rails, Memcached, and Hadoop, shown separately, then as a combined CPU load (Rails, Memcached, Hadoop).]
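The cost of single tenancy on slide 20 can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly demand figures are invented, not read off the slide's charts, and the helper names are hypothetical. It compares sizing one dedicated cluster per workload (each for its own peak) against sizing a single shared pool for the peak of the combined load.

```python
# Hypothetical CPU demand (in cores) per time step for three workloads.
# These numbers are invented for illustration; they are not from the slide.
rails = [8, 6, 4, 2, 4, 6]
memcached = [3, 3, 4, 4, 3, 3]
hadoop = [2, 4, 8, 10, 6, 2]

# Single tenancy: each workload gets its own cluster, sized for its own peak.
static_capacity = max(rails) + max(memcached) + max(hadoop)

# Shared pool: one cluster, sized for the peak of the combined load.
combined = [r + m + h for r, m, h in zip(rails, memcached, hadoop)]
shared_capacity = max(combined)


def utilization(load, capacity):
    """Average fraction of the provisioned capacity that is actually used."""
    return sum(load) / (capacity * len(load))


static_util = utilization(combined, static_capacity)
shared_util = utilization(combined, shared_capacity)

print(f"static: {static_capacity} cores, utilization {static_util:.0%}")
print(f"shared: {shared_capacity} cores, utilization {shared_util:.0%}")
```

Because the workloads peak at different times, the shared pool needs fewer cores for the same total work, which is the argument the combined chart makes.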
  • 21. Mesos: One Large Pool of Resources “We wanted people to be able to program for the datacenter just like they program for their laptop.” Ben Hindman [diagram: Mesos spanning the datacenter]
  • 22. Mesos – architecture [Diagram labels: Kernel, Cluster, Frameworks, Workloads, Apps; services, batch; Python, JVM, C++; distributed file system (DFS); distributed resources: CPU, RAM, I/O, FS, rack locality, etc.; Mesos, Chronos, Marathon, Spark, Hadoop, MPI, Storm, Kafka, MySQL, JBoss, Django, Rails, Impala, Scalding.]
  • 23. Use Cases
  • 24. Case Study: Twitter (bare metal / on premise) “Mesos is the cornerstone of our elastic compute infrastructure – it’s how we build all our new services and is critical for Twitter’s continued success at scale. It's one of the primary keys to our data center efficiency.” Chris Fry, SVP Engineering blog.twitter.com/2013/mesos-graduates-from-apache-incubation wired.com/gadgetlab/2013/11/qa-with-chris-fry/ • key services run in production: analytics, typeahead, ads • Twitter engineers rely on Mesos to build all new services • instead of thinking about static machines, engineers think about resources like CPU, memory and disk • allows services to scale and leverage a shared pool of servers across datacenters efficiently • reduces the time between prototyping and launching
  • 25. Case Study: Airbnb (fungible cloud infrastructure) “We think we might be pushing data science in the field of travel more so than anyone has ever done before… a smaller number of engineers can have higher impact through automation on Mesos.” Mike Curtis, VP Engineering gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data... • improves resource management and efficiency • helps advance engineering strategy of building small teams that can move fast • key to letting engineers make the most of AWS-based infrastructure beyond just Hadoop • allowed company to migrate off Elastic MapReduce • enables use of Hadoop along with Chronos, Spark, Storm, etc.
  • 26. Case Study: HubSpot (cluster management) Tom Petr youtu.be/ROn14csiikw • 500 deployable objects; 100 deploys/day to production; 90 engineers; 3 devops on Mesos cluster • “Our QA cluster is now a fixed $10K/month — that used to fluctuate”
  • 27. Dock-ah, dock-ah, dock-ah "If a Docker application is a Lego brick, Kubernetes would be like a kit for building the Millennium Falcon and the Mesos cluster would be like a whole Star Wars universe made of Legos." ~ Solomon ● Mesos used to have pluggable add-ons for Docker; Docker support is now first-class in the core. – Lots of churn between 0.20 and 0.21 based on feedback from the field. ● It enables programmatic scheduling of containers, from batch (Chronos) to service orchestration (Marathon and Aurora). ● A common deployment is Marathon + HAProxy + Docker running at O(10^1) to O(10^3), with Mesos clusters at O(10^4). ● Some users are exploring customized elastic batch. ● Custom frameworks are quite a large space; we only see the tip of the iceberg.
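As a rough illustration of the Marathon + Docker deployment mentioned on slide 27, the sketch below builds the kind of JSON app definition that Marathon accepted for Docker containers around that time. The app id, image name, and resource sizes are placeholders, not values from the talk, and the field layout is an assumption based on Marathon's v2 app format of that era; later Marathon releases changed the container and networking fields, so check it against the version you run.

```python
import json

# Hypothetical app definition: the id, image, and sizes are placeholders.
app = {
    "id": "/web",
    "instances": 3,   # number of copies Marathon should keep running
    "cpus": 0.5,      # CPU share requested per instance
    "mem": 256,       # memory in MB requested per instance
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "nginx",
            "network": "BRIDGE",
            # hostPort 0 asks for a dynamically assigned host port.
            "portMappings": [{"containerPort": 80, "hostPort": 0}],
        },
    },
}

payload = json.dumps(app, indent=2)
print(payload)
```

A definition like this would typically be submitted to Marathon's REST API, with a proxy such as HAProxy routing traffic to whichever host ports the instances were given.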
  • 28. Kubernetes+Mesos = Awesome ● Kubernetes provides robust declarative primitives for distributed micro-services. ● Mesos provides an imperative framework by which application developers can define scheduling policy in a programmatic fashion. ● Leveraged together, they give a datacenter the ability to do both. – Both batch and service.
  • 29. Kubernetes+Mesos cont. ● Launching pods – Implement Kube-scheduler API – Pick a Pod (FCFS), match it to an offer. – Launch it! – Kubelet as Executor+Containerizer ● Pod Labels: for Service Discovery + Load Balancing ● Running multi-node on GCE ● Replication Control ● Use resource shapes to schedule pods ● Even smarter scheduling
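The "pick a pod (FCFS), match it to an offer" step on slide 29 can be sketched as follows. This is a toy model under stated assumptions, not the Kubernetes-Mesos framework's actual code: `Pod`, `Offer`, and `schedule_fcfs` are hypothetical names, resources are reduced to CPU and memory, and a pod that fits no offer is set aside rather than blocking the pods behind it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


def schedule_fcfs(pending, offers):
    """Take pods in arrival order and place each on the first offer that fits.

    Returns (placements, unplaced): a list of (pod name, agent) pairs and a
    deque of pods that no current offer could hold.
    """
    placements = []
    unplaced = deque()
    while pending:
        pod = pending.popleft()
        for offer in offers:
            if offer.cpus >= pod.cpus and offer.mem_mb >= pod.mem_mb:
                # Consume part of the offer so later pods see what is left.
                offer.cpus -= pod.cpus
                offer.mem_mb -= pod.mem_mb
                placements.append((pod.name, offer.agent))
                break
        else:
            unplaced.append(pod)
    return placements, unplaced


pods = deque([Pod("web", 1.0, 512), Pod("db", 4.0, 4096), Pod("cache", 0.5, 256)])
offers = [Offer("agent-a", 2.0, 1024), Offer("agent-b", 1.0, 512)]
placed, waiting = schedule_fcfs(pods, offers)
print(placed, [p.name for p in waiting])
```

In the real integration the placement would be followed by launching the pod through the Kubelet acting as a Mesos executor, and unplaced pods would wait for the next round of offers.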
  • 31. Coming – “Soon-ish” ● Primitives to support Stateful Services (HDFS, Cassandra, etc.) – Persistent disk data. – Lazy resource reservation – Expanded Disk Isolation ● I/O ● FS/Mount namespace ● Support multiple spindles – Dynamic slave attributes.
  • 32. “Soon-ish” - part 2 ● Opportunistic Offers – Free-for-all like Omega. – Requires Framework API modifications – Increased utilization at cloud-scale.
  • 33. Thank You! mesos.apache.org @shakamunyi