Containers and Hadoop
Hadoop virtualization, done right!
Dinesh Subhraveti - dineshs@altiscale.com
Altiscale Inc.
“Brief History of Containers”
Timeline: 2001, 2002, 2003, 2004, 2005
❖ First implementation of containers based on syscall interposition (Columbia)
❖ First research paper on Linux containers (OSDI ’02)
❖ First container-based distributed checkpointing (HP Labs)
❖ Enterprise Linux container solution (Meiosys)
❖ IBM acquires Meiosys; focus shifted to AIX
❖ Most core kernel changes finally made it into Linux mainline
Container Renaissance
“Datacenter is the Computer”
“The new computer needs an OS!”
Computer: OS
Datacenter: Mesos, Kubernetes, YARN
Containers: Enabler of the Datacenter OS
Computer OS: processes. Datacenter OS: containers. Both are isolated abstractions.
Why not Virtual Machines?
Application/hardware misalignment
❖ Applications have round edges: the system call interface
❖ Hypervisors expose square holes: the hardware interface
❖ A container host offers a lightweight abstraction without IO overhead or startup latency
The unwelcome guest OS
Why not Virtual Machines?
Layers of intermediate software
❖ VMs: application → guest file system → guest driver → VM exit (context switch) → virtual device → image format interpreter → host → iSCSI, NFS
❖ Containers: application → host
High IO overhead due to many intermediate layers
Why not Virtual Machines?
The Unwelcome Guest OS
Slow startup time
Guest OS licensing and maintenance burden
Poor scalability
High resource consumption due to duplication
Obfuscated network / storage / compute topologies
Application semantic information is lost
Evolution of Hadoop from MapReduce to YARN
❖ Hadoop: MapReduce with its own resource management
❖ YARN: a general resource manager running MapReduce (map, reduce), Spark, HBase, ...
Isolation is an immediate challenge
Containers on YARN
Containers provide a simple and elegant solution
Container Virtualization
Containers on YARN
Node Manager spawns tasks as containers (container virtualization)
❖ Customer A: Task 1, Task 2
❖ Customer B: Task 1
❖ Customer C: Task 1
Tasks representing the same job share the same container
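The rule that tasks of the same job share one container can be modeled as a lookup keyed by job. The sketch below is a hypothetical in-memory model, not YARN's NodeManager API: `NodeManagerSketch`, `launch`, and the job IDs are illustrative names, and keying on (customer, job) is an assumption made to keep tenants visibly apart.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Tuple


@dataclass
class NodeManagerSketch:
    """Hypothetical model: each (customer, job) pair gets exactly one container."""

    _next_id: Iterator[int] = field(default_factory=lambda: count(1))
    containers: Dict[Tuple[str, str], int] = field(default_factory=dict)
    tasks: Dict[int, List[str]] = field(default_factory=dict)

    def launch(self, customer: str, job: str, task: str) -> int:
        """Place a task, reusing the job's container if one already exists."""
        key = (customer, job)
        if key not in self.containers:
            # First task of this job on the node: create its container.
            container_id = next(self._next_id)
            self.containers[key] = container_id
            self.tasks[container_id] = []
        container_id = self.containers[key]
        self.tasks[container_id].append(task)
        return container_id


nm = NodeManagerSketch()
a1 = nm.launch("Customer A", "job-1", "Task 1")
b1 = nm.launch("Customer B", "job-1", "Task 1")
a2 = nm.launch("Customer A", "job-1", "Task 2")
c1 = nm.launch("Customer C", "job-1", "Task 1")
print(nm.tasks)  # {1: ['Task 1', 'Task 2'], 2: ['Task 1'], 3: ['Task 1']}
```

Customer A's two tasks land in container 1, while Customers B and C each get their own container, matching the layout on the slide.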
Containers on YARN
Advantages
Secure multitenancy
Performance Isolation
Utilization via coscheduling IO and CPU tasks
Consistent cluster environment
Isolation of software dependencies / configuration
Reproducible way to define app environment
Rapid provisioning
Privilege Isolation through UID namespaces
❏ Recent addition to the kernel
❏ Superuser in container maps to a regular user on the host
❏ Docker support for UID virtualization
UID virtualization: container root (UID 0) maps to a regular host user (UID 100); host root (UID 0) remains separate
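The mapping in the diagram has the shape of a Linux user-namespace UID map. Each line of `/proc/<pid>/uid_map` holds three fields: the first UID inside the namespace, the first UID outside it, and the length of the range. The sketch below only translates IDs according to such a map. It does not create a namespace, the function names are illustrative, and returning `None` for a UID outside every range is a simplification chosen for the sketch.

```python
from typing import List, NamedTuple, Optional


class UidMapEntry(NamedTuple):
    inside: int   # first UID inside the container's user namespace
    outside: int  # corresponding first UID on the host
    length: int   # number of consecutive UIDs in the range


def parse_uid_map(text: str) -> List[UidMapEntry]:
    """Parse lines in the uid_map format: 'inside outside length'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, length = (int(f) for f in fields)
        entries.append(UidMapEntry(inside, outside, length))
    return entries


def to_host_uid(uid: int, entries: List[UidMapEntry]) -> Optional[int]:
    """Translate a container UID to the host UID it acts as, or None if unmapped."""
    for e in entries:
        if e.inside <= uid < e.inside + e.length:
            return e.outside + (uid - e.inside)
    return None


# The slide's example: container root (UID 0) runs as host UID 100.
mapping = parse_uid_map("0 100 1\n")
print(to_host_uid(0, mapping))  # 100
print(to_host_uid(1, mapping))  # None
```

With a one-entry map, anything the container does as root is checked by the host kernel as UID 100, an unprivileged user.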
References
❏ Blog post describing UID virtualization support in Docker: https://www.altiscale.com/making-docker-work-yarn/
❏ Apache wiki page tracking work status across Docker and YARN projects: https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers
❏ JIRA tracking Docker integration into YARN: https://issues.apache.org/jira/browse/YARN-1964
❏ Related Docker tickets: several are linked from https://github.com/dotcloud/docker/pull/4572
dineshs@altiscale.com
Questions?
Backup
Containers on Hadoop or Hadoop on Containers?
Hadoop on Separate Physical Clusters
Awesomely secure!
Everybody gets private hardware running private services
Customer 1 Customer 2 Customer 3
Hadoop on Separate Physical Clusters
Customer 1 Customer 2 Customer 3
Cannot scale the business this way!
Poor utilization
Host platform is a huge maintenance burden
❖ Customer 1 needs R
❖ Customer 2 needs Matlab
❖ Customer 3 needs ß∂ø…
Per-cluster resources:
Customer 1: utilization 6, spare 0, unused 3
Customer 2: utilization 1, spare 6, unused 2
Customer 3: utilization 4, spare 3, unused 2
Container Clusters to Decouple Host from Customer
Each customer gets a container image
❖ Encapsulates customer-specific software and configuration
❖ Host platform remains lean and simple
Poor utilization
Customer 1 Customer 2 Customer 3
Container Clusters to Drive Utilization
Each customer gets a container image
❖ Encapsulates customer-specific software and configuration
❖ Host platform remains lean and simple
Densely pack containers together
Global pool of resources: utilization 11, spare 16, unused 0
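The global-pool numbers follow from the per-customer numbers if one assumes, as an interpretation rather than something the slides state, that each dedicated cluster has 9 units and that its "unused" capacity is stranded only because no other customer can reach it. Pooling then turns every stranded unit into shared spare capacity. The sketch below just does that arithmetic; `CLUSTERS` and `pool` are illustrative names.

```python
from typing import Dict, List

# Per-customer cluster numbers from the slides (resource units).
CLUSTERS: List[Dict[str, int]] = [
    {"utilization": 6, "spare": 0, "unused": 3},  # Customer 1
    {"utilization": 1, "spare": 6, "unused": 2},  # Customer 2
    {"utilization": 4, "spare": 3, "unused": 2},  # Customer 3
]


def pool(clusters: List[Dict[str, int]]) -> Dict[str, int]:
    """Merge dedicated clusters into one shared pool.

    Assumption: 'unused' capacity is stranded only because it belongs to a
    dedicated cluster; once pooled, it becomes spare capacity anyone can use.
    """
    utilization = sum(c["utilization"] for c in clusters)
    spare = sum(c["spare"] + c["unused"] for c in clusters)
    return {"utilization": utilization, "spare": spare, "unused": 0}


print(pool(CLUSTERS))  # {'utilization': 11, 'spare': 16, 'unused': 0}
```

The total stays at 27 units (3 × 9); only the split changes, which is the utilization argument the slide makes.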
Containers with Fine-Grained Resources
Global pool of resources
❖ Container resource levels adjusted dynamically per customer
➢ As dictated by business policy
❖ Fractional resource allocation
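The slides do not say which policy sets per-customer resource levels. One common choice, swapped in here purely as an illustration, is weighted max-min fairness computed by water-filling: capacity is split in proportion to business-assigned weights, customers who need less than their share are capped at their demand, and the leftover is re-split among the rest. `allocate`, the weights, and the demands are hypothetical; `Fraction` keeps the fractional shares exact.

```python
from fractions import Fraction
from typing import Dict


def allocate(capacity: int, weights: Dict[str, int],
             demands: Dict[str, int]) -> Dict[str, Fraction]:
    """Weighted max-min fair allocation by water-filling (illustrative policy)."""
    alloc = {c: Fraction(0) for c in weights}
    active = {c for c in weights if demands[c] > 0}
    remaining = Fraction(capacity)
    while active and remaining > 0:
        total_w = sum(weights[c] for c in active)
        # Tentative share for each still-active customer this round.
        share = {c: remaining * weights[c] / total_w for c in active}
        satisfied = {c for c in active if alloc[c] + share[c] >= demands[c]}
        if not satisfied:
            # Nobody is capped: hand out the whole remainder by weight.
            for c in active:
                alloc[c] += share[c]
            remaining = Fraction(0)
            break
        # Cap satisfied customers at their demand and recycle the excess.
        for c in satisfied:
            remaining -= demands[c] - alloc[c]
            alloc[c] = Fraction(demands[c])
        active -= satisfied
    return alloc


shares = allocate(8, weights={"A": 2, "B": 1, "C": 1},
                  demands={"A": 10, "B": 1, "C": 10})
print(shares)
```

Here customer B is capped at its demand of 1 core, and the remaining 7 cores are split 2:1 between A and C, giving fractional allocations of 14/3 and 7/3 cores. Changing the weights is how a business policy would shift resource levels.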
Disaggregated Compute and Storage
Global pool of resources; nodes run DN and NM (DataNode, NodeManager)
❖ Add more storage to Customer 1's cluster from a storage-rich node
➢ While a compute-intensive job from Customer 2 uses the available compute capacity on the same node
Independently scale compute and storage
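Disaggregation works when containers on the same node stress different resources. The sketch below checks a node's capacity along two independent dimensions, CPU cores and storage. The `Resources` and `Container` types, the role labels, and all capacities are hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    cpu_cores: int
    storage_tb: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores + other.cpu_cores,
                         self.storage_tb + other.storage_tb)

    def fits_within(self, capacity: "Resources") -> bool:
        # Each dimension is checked independently of the other.
        return (self.cpu_cores <= capacity.cpu_cores
                and self.storage_tb <= capacity.storage_tb)


@dataclass(frozen=True)
class Container:
    customer: str
    role: str  # e.g. "DataNode" (storage) or "NodeManager" (compute)
    demand: Resources


def can_host(node: Resources, containers: List[Container]) -> bool:
    """True if the summed demand of all containers fits on the node."""
    total = Resources(0, 0)
    for c in containers:
        total = total + c.demand
    return total.fits_within(node)


node = Resources(cpu_cores=16, storage_tb=48)
storage_for_c1 = Container("Customer 1", "DataNode", Resources(2, 40))
compute_for_c2 = Container("Customer 2", "NodeManager", Resources(12, 2))
print(can_host(node, [storage_for_c1, compute_for_c2]))  # True
```

Together the two containers use 14 of 16 cores and 42 of 48 TB, so a storage-heavy tenant and a compute-heavy tenant share the node without either dimension overflowing.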
July 2014 HUG: Privilege Isolation in Docker Containers


Editor's Notes

  • #21, #22, #23: Loss of locality etc. doesn't make a material difference. Suboptimal scheduling. No sharing (IA use case: universities sharing data over a common HDFS).