Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Hi!
My name is Jérôme
I don't speak Russian
(unfortunately)
Lightweight Virtualization
with
Linux Containers
and
Docker
Yet another Conference – Moscow, 2013
Jérôme Petazzoni, dotCloud Inc.
Yandex (Яндекс):
thank you very much!
Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
Why Linux Containers?
What are we trying to solve?
The Matrix From Hell
Many payloads
● backend services (API)
● databases
● distributed stores
● webapps
Many payloads
● Go
● Java
● Node.js
● PHP
● Python
● Ruby
● …
Many payloads
● CherryPy
● Django
● Flask
● Plone
● ...
Many payloads
● Apache
● Gunicorn
● uWSGI
● ...
Many payloads
+ your code
Many targets
● your local development environment
● your coworkers' development environment
● your QA team's test environment
● some random demo/test server
● the staging server(s)
● the production server(s)
● bare metal
● virtual machines
● shared hosting
+ your dog's Raspberry Pi
Many targets
● BSD
● Linux
● OS X
● Windows
Not yet
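The Matrix From Hell
| | Development VM | QA Server | Single Prod Server | Onsite Cluster | Public Cloud | Contributor’s laptop | Customer Servers |
|---|---|---|---|---|---|---|---|
| Static website | ? | ? | ? | ? | ? | ? | ? |
| Web frontend | ? | ? | ? | ? | ? | ? | ? |
| Background workers | ? | ? | ? | ? | ? | ? | ? |
| User DB | ? | ? | ? | ? | ? | ? | ? |
| Analytics DB | ? | ? | ? | ? | ? | ? | ? |
| Queue | ? | ? | ? | ? | ? | ? | ? |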
Real-world analogy:
containers
Many products
● clothes
● electronics
● raw materials
● wine
● …
Many transportation methods
● ships
● trains
● trucks
● ...
Another matrix from hell
[Figure: a grid of « ? », one open question per product and transportation method]
Solution to the transport problem:
the intermodal shipping container
● 90% of all cargo now shipped in a standard container
● faster and cheaper to load and unload on ships
(by an order of magnitude)
● less theft, less damage
● freight cost used to be >25% of final goods cost, now <3%
● 5000 ships deliver 200M containers per year
Solution to the deployment problem:
the Linux container
Linux containers...
● run everywhere
– regardless of kernel version
– regardless of host distro
– (but container and host architecture must match*)
● run anything
– if it can run on the host, it can run in the container
– i.e., if it can run on a Linux kernel, it can run in a container
*Unless you emulate the CPU with QEMU and binfmt_misc
What are Linux Containers exactly?
High level approach:
it's a lightweight VM
● own process space
● own network interface
● can run stuff as root
● can have its own /sbin/init
(different from the host)
« Machine Container »
Low level approach:
it's chroot on steroids
● can also run without its own /sbin/init
● container = isolated process(es)
● share kernel with host
● no device emulation (neither HVM nor PV)
« Application Container »
Separation of concerns:
Dmitry the Developer
● inside my container:
– my code
– my libraries
– my package manager
– my app
– my data
Separation of concerns:
Oleg the Ops guy
● outside the container:
– logging
– remote access
– network configuration
– monitoring
How does it work?
Isolation with namespaces
● pid
● mnt
● net
● uts
● ipc
● user
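Each of these namespaces is visible from user space: on Linux, /proc/<pid>/ns/ holds one symbolic link per namespace type, and two processes share a namespace exactly when their links resolve to the same `type:[inode]` identifier. The Python sketch below is an illustration, not part of Docker or LXC; links that are missing or unreadable (older kernels, another user's process) come back as None.

```python
import os

# The namespace types from the slide; each one is a link under /proc/<pid>/ns/
NAMESPACE_TYPES = ("pid", "mnt", "net", "uts", "ipc", "user")


def namespace_ids(pid="self"):
    """Map each namespace type to its identifier, e.g. 'pid:[4026531836]'.

    A type whose link is missing or unreadable maps to None.
    """
    ids = {}
    for ns_type in NAMESPACE_TYPES:
        try:
            ids[ns_type] = os.readlink(f"/proc/{pid}/ns/{ns_type}")
        except OSError:
            ids[ns_type] = None
    return ids


def shared_namespaces(pid_a, pid_b):
    """Namespace types in which two processes see the same namespace."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return sorted(t for t in NAMESPACE_TYPES if a[t] is not None and a[t] == b[t])


if __name__ == "__main__":
    for ns_type, ident in namespace_ids().items():
        print(f"{ns_type:5} {ident}")
```

Run as root on the host, `shared_namespaces(host_pid, container_pid)` should list only the namespace types the container was not given its own copy of.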
pid namespace
jpetazzo@tarrasque:~$ ps aux | wc -l
212
jpetazzo@tarrasque:~$ sudo docker run -t -i ubuntu bash
root@ea319b8ac416:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18044 1956 ? S 02:54 0:00 bash
root 16 0.0 0.0 15276 1136 ? R+ 02:55 0:00 ps aux
(That's 2 processes)
mnt namespace
jpetazzo@tarrasque:~$ wc -l /proc/mounts
32 /proc/mounts
root@ea319b8ac416:/# wc -l /proc/mounts
10 /proc/mounts
net namespace
root@ea319b8ac416:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
22: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 2a:d1:4b:7e:bf:b5 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.3/24 brd 10.1.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::28d1:4bff:fe7e:bfb5/64 scope link
valid_lft forever preferred_lft forever
uts namespace
jpetazzo@tarrasque:~$ hostname
tarrasque
root@ea319b8ac416:/# hostname
ea319b8ac416
ipc namespace
jpetazzo@tarrasque:~$ ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 3178496 jpetazzo 600 393216 2 dest
0x00000000 557057 jpetazzo 777 2778672 0
0x00000000 3211266 jpetazzo 600 393216 2 dest
root@ea319b8ac416:/# ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
------ Semaphore Arrays --------
key semid owner perms nsems
------ Message Queues --------
key msqid owner perms used-bytes messages
user namespace
● no « demo » for this one... Yet!
● UID 0→1999 in container C1 is mapped to
UID 10000→11999 in host;
UID 0→1999 in container C2 is mapped to
UID 12000→13999 in host; etc.
● required lots of VFS and FS patches (esp. XFS)
● what will happen with copy-on-write?
– double translation at VFS?
– single root UID on read-only FS?
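The kernel exposes such a mapping in /proc/<pid>/uid_map, one range per line written as `inside outside length`. A minimal sketch, assuming that format, of how a UID is translated for the two containers on the slide (C1 and C2 are the slide's example ranges, not defaults of any tool):

```python
def parse_uid_map(text):
    """Parse uid_map-style lines '<inside> <outside> <length>' into tuples."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        ranges.append((inside, outside, length))
    return ranges


def to_host_uid(container_uid, ranges):
    """Translate a UID seen inside the container into the host UID.

    Returns None when no range covers it (the UID has no host counterpart).
    """
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# The two example containers from the slide: 2000 UIDs each, side by side on the host.
C1 = parse_uid_map("0 10000 2000")
C2 = parse_uid_map("0 12000 2000")

if __name__ == "__main__":
    print(to_host_uid(0, C1), to_host_uid(0, C2))  # root in each container: 10000 12000
```

Because the host ranges do not overlap, root in C1 and root in C2 are two different unprivileged users from the host's point of view.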
How does it work?
Isolation with cgroups
● memory
● cpu
● blkio
● devices
memory cgroup
● keeps track of pages used by each group:
– file (read/write/mmap from block devices; swap)
– anonymous (stack, heap, anonymous mmap)
– active (recently accessed)
– inactive (candidate for eviction)
● each page is « charged » to a group
● pages can be shared (e.g. if you use any COW FS)
● individual (per-cgroup) limits and out-of-memory killer
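The per-group counters behind this accounting can be read from the group's memory.stat file, one `key value` pair per line. The sketch below assumes that layout and a cgroup v1 directory such as /sys/fs/cgroup/memory/<group> (paths vary by distribution and cgroup version); it folds the LRU counters into the four categories listed above. cgroup v1 also reports `cache` and `rss` totals, which this sketch ignores.

```python
def parse_memory_stat(text):
    """Parse memory.stat content ('<key> <value>' per line) into {key: int}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats


def summarize(stats):
    """Group the LRU counters into the slide's four categories (bytes)."""
    def get(key):
        return stats.get(key, 0)

    return {
        "anonymous": get("active_anon") + get("inactive_anon"),
        "file": get("active_file") + get("inactive_file"),
        "active": get("active_anon") + get("active_file"),
        "inactive": get("inactive_anon") + get("inactive_file"),
    }


def read_memory_stat(cgroup_dir):
    """cgroup_dir is an assumption, e.g. /sys/fs/cgroup/memory/<group> on cgroup v1."""
    with open(f"{cgroup_dir}/memory.stat") as stat_file:
        return parse_memory_stat(stat_file.read())
```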
cpu and cpuset cgroups
● keep track of user/system CPU time
● set relative weight per group
● pin groups to specific CPU(s)
– Can be used to « reserve » CPUs for some apps
– This is also relevant for big NUMA systems
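« Relative weight » means shares, not caps: in cgroup v1 it is the `cpu.shares` value (1024 by default), and a busy group's slice of contended CPU time is its weight divided by the sum of the weights of the groups that actually want CPU. A simplified model in Python (it ignores multiple CPUs and scheduler granularity; the group names are hypothetical):

```python
from fractions import Fraction


def cpu_fractions(weights, busy=None):
    """Share of contended CPU time per group under proportional weights.

    weights: {group: weight}, e.g. cgroup v1 cpu.shares values.
    busy: groups that currently want CPU (default: all of them).
    Idle groups get 0 and do not dilute the others.
    """
    busy = set(weights) if busy is None else set(busy)
    total = sum(weights[g] for g in busy)
    return {
        g: Fraction(weights[g], total) if g in busy else Fraction(0)
        for g in weights
    }


if __name__ == "__main__":
    shares = {"web": 1024, "batch": 512}          # hypothetical groups
    print(cpu_fractions(shares))                  # web 2/3, batch 1/3
    print(cpu_fractions(shares, busy={"batch"}))  # batch alone gets everything
```

Because idle groups drop out of the denominator, a lone busy group gets the whole CPU regardless of its weight; a hard ceiling needs a different mechanism, such as pinning the group to specific CPUs with cpuset.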
blkio cgroups
● keep track of I/Os for each block device
– read vs write; sync vs async
● set relative weights
● set throttle (limits) for each block device
– read vs write; bytes/sec vs operations/sec
Note: kernels before 3.8 didn't account async I/O correctly.
3.8 is better, but use 3.10 for best results.
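On cgroup v1, each throttle is a line `<major>:<minor> <limit>` written to one of four blkio.throttle.* files, one per direction and unit. A small sketch that builds such rules; the device and limit in the usage lines are arbitrary examples, and actually writing a rule needs root and a mounted blkio hierarchy (commonly /sys/fs/cgroup/blkio/<group>/, an assumption that varies by system).

```python
import os

# cgroup v1 blkio throttle files, keyed by (direction, unit)
THROTTLE_FILES = {
    ("read", "bps"): "blkio.throttle.read_bps_device",
    ("write", "bps"): "blkio.throttle.write_bps_device",
    ("read", "iops"): "blkio.throttle.read_iops_device",
    ("write", "iops"): "blkio.throttle.write_iops_device",
}


def throttle_rule(major, minor, direction, unit, limit):
    """Return (file name, line to write) for one throttle rule.

    unit is 'bps' (bytes/sec) or 'iops' (operations/sec).
    """
    return THROTTLE_FILES[(direction, unit)], f"{major}:{minor} {limit}"


def device_numbers(device_path):
    """Major and minor numbers of a device node such as /dev/sda."""
    rdev = os.stat(device_path).st_rdev
    return os.major(rdev), os.minor(rdev)


if __name__ == "__main__":
    # 8:0 is conventionally the first SCSI/SATA disk; limit its writes to 10 MiB/s
    print(throttle_rule(8, 0, "write", "bps", 10 * 1024 * 1024))
```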
devices cgroups
● controls read/write/mknod permissions
● typically:
– allow: /dev/{tty,zero,random,null}...
– deny: everything else
– maybe: /dev/net/tun, /dev/fuse
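Whitelist entries use the form `type major:minor access`, where type is `a`, `c` or `b`, either number may be `*`, and access is a subset of `rwm`. The sketch below checks a request against such a list. It is a simplified model (it assumes one entry must cover the whole request and ignores the deny side); the device numbers are the standard ones for /dev/null, /dev/zero, /dev/random, /dev/urandom and /dev/tty.

```python
# Standard character-device numbers (major:minor) for the usual allowed devices
DEFAULT_ALLOW = [
    "c 1:3 rwm",   # /dev/null
    "c 1:5 rwm",   # /dev/zero
    "c 1:8 rwm",   # /dev/random
    "c 1:9 rwm",   # /dev/urandom
    "c 5:0 rwm",   # /dev/tty
]


def _matches(field, number):
    """A rule field matches if it is the wildcard '*' or the same number."""
    return field == "*" or int(field) == number


def is_allowed(rules, dev_type, major, minor, access):
    """True if one whitelist entry covers every requested access letter.

    rules: entries like 'c 1:3 rwm' ('a' = any type, '*' = any number).
    dev_type: 'c' or 'b'; access: any combination of 'r', 'w', 'm' (mknod).
    """
    for rule in rules:
        rule_type, numbers, rule_access = rule.split()
        rule_major, rule_minor = numbers.split(":")
        if rule_type not in ("a", dev_type):
            continue
        if (_matches(rule_major, major) and _matches(rule_minor, minor)
                and set(access) <= set(rule_access)):
            return True
    return False  # not whitelisted: denied
```

With these defaults, /dev/net/tun (c 10:200) stays denied until an entry such as `c 10:200 rwm` is added.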
If you're serious about security,
you also need…
● capabilities
– okay: cap_ipc_lock, cap_lease, cap_mknod, cap_net_admin,
cap_net_bind_service, cap_net_raw
– troublesome: cap_sys_admin (mount!)
● think twice before granting root
● grsec is nice
● seccomp (very specific use cases); seccomp-bpf
● beware of full-scale kernel exploits!
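A process's capability sets appear as hex bitmasks in /proc/<pid>/status (for example the `CapEff` line). The sketch decodes only the capabilities named on the slide, using their bit numbers from <linux/capability.h>, and flags cap_sys_admin; every other bit in the mask is ignored.

```python
# Bit numbers from <linux/capability.h> for the capabilities named on the slide
CAPABILITY_BITS = {
    "cap_net_bind_service": 10,
    "cap_net_admin": 12,
    "cap_net_raw": 13,
    "cap_ipc_lock": 14,
    "cap_sys_admin": 21,
    "cap_mknod": 27,
    "cap_lease": 28,
}
TROUBLESOME = {"cap_sys_admin"}  # allows mount() and much more


def decode_caps(hex_mask):
    """Named capabilities present in a hex mask such as the CapEff value."""
    mask = int(hex_mask, 16)
    return {name for name, bit in CAPABILITY_BITS.items() if mask & (1 << bit)}


def effective_caps(pid="self"):
    """Read and decode the CapEff line of /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return set()
```

Inside a container, a non-empty `TROUBLESOME & effective_caps()` is the warning sign the slide is pointing at.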
Efficiency
Efficiency: almost no overhead
● processes are isolated, but run straight on the host
● CPU performance
= native performance
● memory performance
= a few % shaved off for (optional) accounting
● network performance
= small overhead; can be optimized to zero overhead
What do we need on top of LXC?
Efficiency: storage-friendly
● unioning filesystems
(AUFS, overlayfs)
● snapshotting filesystems
(BTRFS, ZFS)
● copy-on-write
(thin snapshots with LVM or device-mapper)
This is now being integrated with low-level LXC tools as well!
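All three approaches share one idea: many containers reuse the same read-only data and each keeps only its own changes. The toy Python model below shows the unioning variant (lookups go top-down through the layers, writes land in a private top layer, deletions leave a « whiteout » marker). It sketches the concept only, not how AUFS or overlayfs are implemented, and the layer contents are made up.

```python
WHITEOUT = object()  # marker left in the top layer when a lower file is deleted


class UnionFS:
    """Toy union mount: read-only lower layers under one writable upper layer.

    Each layer is a dict {path: content}; lookups go top-down, first hit wins.
    """

    def __init__(self, *lower_layers):
        self.lower = list(lower_layers)  # index 0 is the topmost read-only layer
        self.upper = {}                  # private, writable layer of this container

    def read(self, path):
        for layer in [self.upper, *self.lower]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    break                # deleted in an upper layer
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Lower layers are never modified; the new version shadows them.
        self.upper[path] = content

    def append(self, path, extra):
        # Copy-up: modifying a lower file first copies the whole file upward.
        self.upper[path] = self.read(path) + extra

    def delete(self, path):
        self.read(path)                  # raises FileNotFoundError if absent
        self.upper[path] = WHITEOUT
```

Two containers built on the same layers cost one empty dict each, which is the toy version of « provisioning takes a few kilobytes ».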
Efficiency: storage-friendly
● provisioning now takes a few milliseconds
● … and a few kilobytes
● creating a new base image (from a running container) takes a
few seconds (or even less)
Docker
Why Docker?
What can Docker do?
● Open Source engine to commoditize LXC
● using copy-on-write for quick provisioning
● allowing you to create and share images
● standard format for containers
(stack of layers; 1 layer = tarball+metadata)
● standard, reproducible way to easily build trusted images
(Dockerfile)
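To make « 1 layer = tarball + metadata » concrete, here is a sketch that packs a set of files into an uncompressed tarball and describes it with a content hash and a parent pointer, so that layers chain into a stack. The metadata field names are illustrative and do not reproduce Docker's actual image format.

```python
import hashlib
import io
import tarfile


def make_layer(files, parent_id=None):
    """Pack {path: bytes} into a tarball and return (tar_bytes, metadata).

    The metadata fields are illustrative, not Docker's on-disk format.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(files):
            data = files[path]
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            info.mtime = 0  # fixed timestamp keeps the digest reproducible
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = buf.getvalue()
    metadata = {
        "id": hashlib.sha256(tar_bytes).hexdigest(),  # content-derived layer id
        "parent": parent_id,                          # None for a base layer
        "size": len(tar_bytes),
    }
    return tar_bytes, metadata


def layer_paths(tar_bytes):
    """List the paths stored in a layer tarball."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        return tar.getnames()
```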
Docker: authoring images
● you can author « images »
– either with « run+commit » cycles, taking snapshots
– or with a Dockerfile (=source code for a container)
– both ways, it's ridiculously easy
● you can run them
– anywhere
– multiple times
Dockerfile example
FROM ubuntu
RUN apt-get -y update
RUN apt-get install -y g++
RUN apt-get install -y erlang-dev erlang-manpages erlang-base-hipe ...
RUN apt-get install -y libmozjs185-dev libicu-dev libtool ...
RUN apt-get install -y make wget
# -O- makes wget write the archive to stdout so tar can read it from the pipe
RUN wget -O- http://.../apache-couchdb-1.3.1.tar.gz | tar -C /tmp -zxf-
RUN cd /tmp/apache-couchdb-* && ./configure && make install
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0\n" > \
    /usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]
Yes, but...
● « I don't need Docker;
I can do all that stuff with LXC tools, rsync, some scripts! »
● correct on all counts;
but it's also true for apt, dpkg, rpm, yum, etc.
● the whole point is to commoditize,
i.e. make it ridiculously easy to use
Containers before Docker
Containers after Docker
What this really means…
● instead of writing « very small shell scripts » to
manage containers, write them to do the rest:
– continuous deployment/integration/testing
– orchestration
● = use Docker as a building block
● reuse other people's images (yay ecosystem!)
Docker: sharing images
● you can push/pull images to/from a registry
(public or private)
● you can search images through a public index
● dotCloud maintains a collection of base images
(Ubuntu, Fedora...)
● satisfaction guaranteed or your money back
Docker: not sharing images
● private registry
– for proprietary code
– or security credentials
– or fast local access
● the private registry is available
as an image on the public registry
(yes, that makes sense)
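Where an image is pushed to or pulled from is encoded in its name: if the first component looks like a host (it contains a dot or a port, or is `localhost`), it names that registry; otherwise the image lives on the default public registry. A sketch of that convention in Python; the registry host names in the tests are hypothetical, and digest references are not handled.

```python
def split_image_name(name):
    """Split 'registry/repository:tag' into (registry, repository, tag).

    registry is None for the default public registry; tag defaults to 'latest'.
    """
    registry = None
    first, sep, rest = name.partition("/")
    # The first component is a registry only if it looks like a host name.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, name = first, rest
    repository, tag = name, "latest"
    colon = name.rfind(":")
    if colon > name.rfind("/"):  # a ':' after the last '/' introduces the tag
        repository, tag = name[:colon], name[colon + 1:]
    return registry, repository, tag
```

This is why the private-registry workflow starts by re-tagging an image with the registry host (and port) in front of its name.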
Typical workflow
● code in local environment
(« dockerized » or not)
● each push to the git repo triggers a hook
● the hook tells a build server to clone the code and run « docker build »
(using the Dockerfile)
● the containers are tested (nosetests, Jenkins...),
and if the tests pass, pushed to the registry
● production servers pull the containers and run them
● for network services, load balancers are updated
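The hook in this workflow mostly strings existing commands together. A dry-run sketch that lists them for one commit; the repository URL, image name and test command are hypothetical, and the final load-balancer update is left out because it depends entirely on the setup.

```python
import subprocess


def ci_plan(repo_url, commit, image, test_cmd, workdir="build"):
    """Commands a hypothetical post-push hook would run for one commit."""
    tagged = f"{image}:{commit[:12]}"          # tag images with the commit id
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", commit],
        ["docker", "build", "-t", tagged, workdir],  # uses the Dockerfile in workdir
        ["docker", "run", tagged, *test_cmd],        # e.g. ["nosetests"]
        ["docker", "push", tagged],                  # only reached if tests pass
    ]


def run_plan(plan):
    """Run the steps in order; check=True stops at the first failing step,
    so an image whose tests fail is never pushed."""
    for argv in plan:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for step in ci_plan("https://git.example.com/app.git", "0123456789abcdef",
                        "registry.example.com/app", ["nosetests"]):
        print(" ".join(step))
```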
Hybrid clouds
● Docker is part of OpenStack « Havana »,
as a Nova driver + Glance translator
● typical workflow:
– code on local environment
– push container to Glance-backed registry
– run and manage containers using OpenStack APIs
What's Docker exactly?
● rewrite of dotCloud's internal container engine
– original version: Python, tied to dotCloud's internal stuff
– released version: Go, legacy-free
● the Docker daemon runs in the background
– manages containers, images, and builds
– HTTP API (over UNIX or TCP socket)
– embedded CLI talking to the API
● Open Source (GitHub public repository + issue tracking)
● user and dev mailing lists
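Because the API is plain HTTP, any client that can speak HTTP over a UNIX socket can drive the daemon. A sketch using only Python's standard library; it assumes the daemon listens on the default /var/run/docker.sock and that the caller is allowed to open it, and uses `/version` and `/containers/json` as example endpoints.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a UNIX socket instead of a TCP port."""

    def __init__(self, socket_path, timeout=10):
        # The host name is only used for the Host: header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def api_get(path, socket_path="/var/run/docker.sock"):
    """GET an API path such as '/version' or '/containers/json' and decode the JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"GET {path} returned HTTP {response.status}: {body[:200]!r}")
        return json.loads(body)
    finally:
        conn.close()
```

On a host running the daemon, `api_get("/containers/json")` returns the same list of running containers that the embedded CLI prints; pointing the connection at a TCP address instead is only a matter of using the stock `HTTPConnection`.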
Docker: the community
● Docker: >170 contributors
● latest milestone (0.6): 40 contributors
● GitHub repository: >680 forks
Where is it going?
Docker roadmap
● Today: Docker 0.6
– LXC
– AUFS
● Tomorrow: Docker 0.7
– LXC
– device-mapper thin snapshots (target: RHEL)
● The day after: Docker 1.0
– LXC, libvirt, qemu, KVM, OpenVZ, chroot…
– multiple storage back-ends
– plugins
Docker: the ecosystem
● Cocaine (PaaS; has Docker plugin)
● CoreOS (full distro based on Docker)
● Deis (PaaS; available)
● Dokku (mini-Heroku in 100 lines of bash)
● Flynn (PaaS; in development)
● Maestro (orchestration from a simple YAML file)
● OpenStack integration (in Havana, Nova has a Docker driver)
● Shipper (fabric-like orchestration)
And many more
Cocaine integration
● what's Cocaine?
– Open Source PaaS from Yandex
– modular: can switch logging, storage, etc. without changing apps
– infrastructure abstraction layer + service discovery
– monitoring: metrics collection; load balancing
● why Docker?
– Cocaine initially used cgroups
– wanted to add LXC for better isolation and resource control
– heard about Docker at the right time
– uses custom distributed storage instead of the Docker registry
device-mapper thin snapshots
(aka « thinp »)
● start with a 10 GB empty ext4 filesystem
– snapshot: that's the root of everything
● base image:
– clone the original snapshot
– untar image on the clone
– re-snapshot; that's your image
● create container from image:
– clone the image snapshot
– run; repeat cycle as many times as needed
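The point of this scheme is that a snapshot shares every data block with its origin, and a write allocates only the block it touches. The toy model below (block maps as Python dicts, not the device-mapper metadata format) follows the steps above: empty volume, base image, container, one write.

```python
import itertools


class ThinPool:
    """Toy thin pool: a volume is a map {logical block: physical block id}."""

    def __init__(self, block_size=64 * 1024):
        self.block_size = block_size
        self.data = {}                 # physical block id -> bytes
        self._ids = itertools.count()

    def new_volume(self):
        return {}                      # nothing allocated yet

    def snapshot(self, volume):
        return dict(volume)            # copies the block map only, never data

    def write_block(self, volume, index, payload):
        # Copy-on-write at block granularity: allocate one fresh physical
        # block; other volumes keep pointing at the old one.
        block_id = next(self._ids)
        self.data[block_id] = payload
        volume[index] = block_id

    def read_block(self, volume, index):
        block_id = volume.get(index)
        if block_id is None:
            return b"\0" * self.block_size  # unallocated blocks read as zeros
        return self.data[block_id]
```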
AUFS vs THINP
AUFS
● easy to see changes
● small change =
copy whole file
● ~42 layers
● patched kernel
(Debian, Ubuntu OK)
● efficient caching
● no quotas
THINP
● must diff manually
● small change =
copy 1 block (100k-1M)
● unlimited layers
● stock kernel (>3.2)
(RHEL 2.6.32 OK)
● duplicated pages
● FS size acts as quota
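The « small change » rows can be put in numbers. A sketch of the bytes duplicated when a few bytes of an existing file are modified: AUFS copies the whole file up to the writable layer on first write, while thinp copies only the blocks that the write touches. The 1 MiB block size in the tests is one point in the 100k-1M range quoted above.

```python
def copy_up_bytes(file_size, offset, length, block_size=None):
    """Bytes duplicated when `length` bytes at `offset` of an existing file change.

    block_size=None models AUFS (whole-file copy-up); an integer models thinp
    (only the touched blocks are copied).
    """
    if length <= 0:
        return 0
    if block_size is None:
        return file_size
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return (last - first + 1) * block_size
```

For a 1 GiB database file, a 10-byte update costs 1 GiB of copying under AUFS but 1 MiB (or 2 MiB if it straddles a block boundary) under thinp.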
Misconceptions about THINP
● « performance degradation »
no; that was with « old » LVM snapshots
● « can't handle 1000s of volumes »
that's LVM; Docker uses devmapper directly
● « if snapshot volume is out of space,
it breaks and you lose everything »
that's « old » LVM snapshots; thinp halts I/O
● « it still uses disk space after 'rm -rf' »
no, thanks to 'discard passdown'
Other features in 0.7
● links
– linked containers can discover each other
– environment variable injection
– allows exposing remote services through containers
(implements the ambassador pattern)
– side-effect: container naming
● host integration
– we ♥ systemd
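With links, a container learns where its peer is through environment variables. The sketch below assumes the naming convention of Docker's link variables (`<ALIAS>_PORT_<port>_<PROTO>=<proto>://<addr>:<port>`); treat the exact set of variables injected by 0.7 as an assumption, and the addresses in the tests as made up. An application that only reads these variables does not care whether they point at the real service or at an ambassador container.

```python
import re

# e.g. DB_PORT_5432_TCP=tcp://172.17.0.5:5432
_LINK_VAR = re.compile(r"^(?P<alias>[A-Z0-9_]+?)_PORT_(?P<port>\d+)_(?P<proto>TCP|UDP)$")
_LINK_URL = re.compile(r"^(?P<proto>tcp|udp)://(?P<addr>[^:]+):(?P<port>\d+)$")


def discover_links(environ):
    """Return {alias: [(proto, addr, port), ...]} from link-style variables."""
    links = {}
    for key, value in environ.items():
        var, url = _LINK_VAR.match(key), _LINK_URL.match(value)
        if var and url:
            entry = (url.group("proto"), url.group("addr"), int(url.group("port")))
            links.setdefault(var.group("alias").lower(), []).append(entry)
    return links
```

Inside a linked container, `discover_links(os.environ)` would give the application its service endpoints without any hard-coded addresses.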
0.8 and beyond
● beam
– introspection API
– based on Redis protocol
(i.e. all Redis clients work)
– works well for synchronous req/rep and streams
– reimplementation of Redis core in Go
– think of it as « live environment variables »,
that you can watch/subscribe to
● and much more
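« Based on the Redis protocol » means requests travel as RESP arrays of bulk strings, which is why existing Redis clients can talk to it. The sketch shows only that wire encoding; beam's own commands are not described here, so the commands in the tests are plain Redis examples.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]              # array header: number of elements
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))  # length-prefixed bulk string
    return b"".join(out)
```

Length prefixes instead of delimiters make the format binary-safe, which also suits streams of arbitrary payloads.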
Thank you! Questions?
http://docker.io/
https://github.com/dotcloud/docker
@docker
@jpetazzo

More Related Content

PDF
Docker Introduction
PPTX
K8s from Zero to ~Hero~ Seasoned Beginner
PDF
Canary deployment with Traefik and K3S
PPTX
Rancher master class globalized edge workloads with k3s
PDF
Clean Infrastructure as Code
PPTX
Infrastrucutre as Code
PPTX
A basic overview of Containers
PDF
Declarative Import with Magento 2 Import Framework (M2IF)
Docker Introduction
K8s from Zero to ~Hero~ Seasoned Beginner
Canary deployment with Traefik and K3S
Rancher master class globalized edge workloads with k3s
Clean Infrastructure as Code
Infrastrucutre as Code
A basic overview of Containers
Declarative Import with Magento 2 Import Framework (M2IF)

What's hot (20)

PDF
Efficient DevOps Tooling with Java and GraalVM
PDF
CQRS - Eine Einführung - NOUG 2011
PDF
Go for Operations
PPTX
betterCode Workshop: Effizientes DevOps-Tooling mit Go
PDF
Continuous (Non)-Functional Testing of Microservices on k8s
PDF
TDC2018FLN | Trilha Containers - Kubernetes para usuarios Docker.
PDF
TDC2018FLN | Trilha Containers - Redes em containers
PDF
Serverless architectures with Fn Project
PDF
You Want to Kubernetes? You MUST Know Containers!
PDF
Ich brauche einen Abstraktions-Layer für meine Cloud
PDF
4K–Kubernetes with Knative, Kafka and Kamel
PDF
Improving security with Istio | DevNation Tech Talk
PDF
Continuous (Non-)Functional Testing of Microservices on K8s
PDF
Kubernetes laravel and kubernetes
PDF
Tales of Training: Scaling CodeLabs with Swarm Mode and Docker-Compose
PDF
Drone Continuous Integration
PDF
All Things Open 2017: How to Treat a Network as a Container
PDF
GDGSCL - Docker a jeho provoz v Heroku a AWS
PDF
Docker Platform Internals: Taking runtimes and image creation to the next lev...
PPTX
Ci with jenkins docker and mssql belgium
Efficient DevOps Tooling with Java and GraalVM
CQRS - Eine Einführung - NOUG 2011
Go for Operations
betterCode Workshop: Effizientes DevOps-Tooling mit Go
Continuous (Non)-Functional Testing of Microservices on k8s
TDC2018FLN | Trilha Containers - Kubernetes para usuarios Docker.
TDC2018FLN | Trilha Containers - Redes em containers
Serverless architectures with Fn Project
You Want to Kubernetes? You MUST Know Containers!
Ich brauche einen Abstraktions-Layer für meine Cloud
4K–Kubernetes with Knative, Kafka and Kamel
Improving security with Istio | DevNation Tech Talk
Continuous (Non-)Functional Testing of Microservices on K8s
Kubernetes laravel and kubernetes
Tales of Training: Scaling CodeLabs with Swarm Mode and Docker-Compose
Drone Continuous Integration
All Things Open 2017: How to Treat a Network as a Container
GDGSCL - Docker a jeho provoz v Heroku a AWS
Docker Platform Internals: Taking runtimes and image creation to the next lev...
Ci with jenkins docker and mssql belgium
Ad

Viewers also liked (20)

PDF
Docker Plugin for Heat II
PDF
Mobycraft - Docker in 8-bit by Aditya Gupta
PDF
Mobycraft:Docker in 8-bit (Meetup at Docker HQ 4/7)
PDF
Autoscaling Docker Containers by Konstantinos Faliagkas, Docker Birthday #3 A...
PPTX
20 mins to Faking the DevOps Unicorn by Matt williams, Datadog
PPTX
OpenStack Boston
PPTX
DockerCon 14 Keynote Day 2
PDF
Contribute and Collaborate 101
PPTX
Dockerfile Basics Workshop #1
PPTX
DockerCon SF 2015: Orchestration for Devs (machine + compose)
PPTX
Docker at RelateIQ
PPT
Developer Week
PPTX
DockerCon14 eBay
PDF
DockerCon SF 2015: Beyond CI to Production Scale PaaS with Docker
PPTX
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
PDF
DockerCon SF 2015: From Months to Minutes
PDF
How to Use Your Own Private Registry
PPTX
Immutable Infrastructure with Docker and EC2
PPTX
DockerCon 16 - Moby's Cool Hack Session
PPTX
DockerCon SF 2015: Networking Breakout
Docker Plugin for Heat II
Mobycraft - Docker in 8-bit by Aditya Gupta
Mobycraft:Docker in 8-bit (Meetup at Docker HQ 4/7)
Autoscaling Docker Containers by Konstantinos Faliagkas, Docker Birthday #3 A...
20 mins to Faking the DevOps Unicorn by Matt williams, Datadog
OpenStack Boston
DockerCon 14 Keynote Day 2
Contribute and Collaborate 101
Dockerfile Basics Workshop #1
DockerCon SF 2015: Orchestration for Devs (machine + compose)
Docker at RelateIQ
Developer Week
DockerCon14 eBay
DockerCon SF 2015: Beyond CI to Production Scale PaaS with Docker
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
DockerCon SF 2015: From Months to Minutes
How to Use Your Own Private Registry
Immutable Infrastructure with Docker and EC2
DockerCon 16 - Moby's Cool Hack Session
DockerCon SF 2015: Networking Breakout
Ad

Similar to Lightweight Virtualization with Linux Containers and Docker I YaC 2013 (20)

PDF
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
PDF
Introduction to Docker and Containers
PDF
Let's Containerize New York with Docker!
PDF
Docker and-containers-for-development-and-deployment-scale12x
PDF
Introduction to Docker at SF Peninsula Software Development Meetup @Guidewire
PDF
Docker Introduction + what is new in 0.9
PDF
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
PDF
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
PDF
A Gentle Introduction to Docker and Containers
PDF
Introduction to Docker (as presented at December 2013 Global Hackathon)
PDF
Docker 0.11 at MaxCDN meetup in Los Angeles
PDF
LXC, Docker, and the future of software delivery | LinuxCon 2013
PDF
LXC Docker and the Future of Software Delivery
PDF
Workshop : 45 minutes pour comprendre Docker avec Jérôme Petazzoni
PDF
Introduction to Docker, December 2014 "Tour de France" Edition
PDF
Containers > VMs
PDF
Docker and Containers for Development and Deployment — SCALE12X
PDF
Introduction to Docker (and a bit more) at LSPE meetup Sunnyvale
PDF
[DockerCon 2020] Hardening Docker daemon with Rootless Mode
PDF
A Gentle Introduction To Docker And All Things Containers
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
Introduction to Docker and Containers
Let's Containerize New York with Docker!
Docker and-containers-for-development-and-deployment-scale12x
Introduction to Docker at SF Peninsula Software Development Meetup @Guidewire
Docker Introduction + what is new in 0.9
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
A Gentle Introduction to Docker and Containers
Introduction to Docker (as presented at December 2013 Global Hackathon)
Docker 0.11 at MaxCDN meetup in Los Angeles
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC Docker and the Future of Software Delivery
Workshop : 45 minutes pour comprendre Docker avec Jérôme Petazzoni
Introduction to Docker, December 2014 "Tour de France" Edition
Containers > VMs
Docker and Containers for Development and Deployment — SCALE12X
Introduction to Docker (and a bit more) at LSPE meetup Sunnyvale
[DockerCon 2020] Hardening Docker daemon with Rootless Mode
A Gentle Introduction To Docker And All Things Containers

More from Docker, Inc. (20)

PDF
Containerize Your Game Server for the Best Multiplayer Experience
PDF
How to Improve Your Image Builds Using Advance Docker Build
PDF
Build & Deploy Multi-Container Applications to AWS
PDF
Securing Your Containerized Applications with NGINX
PDF
How To Build and Run Node Apps with Docker and Compose
PDF
Hands-on Helm
PDF
Distributed Deep Learning with Docker at Salesforce
PDF
The First 10M Pulls: Building The Official Curl Image for Docker Hub
PDF
Monitoring in a Microservices World
PDF
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
PDF
Predicting Space Weather with Docker
PDF
Become a Docker Power User With Microsoft Visual Studio Code
PDF
How to Use Mirroring and Caching to Optimize your Container Registry
PDF
Monolithic to Microservices + Docker = SDLC on Steroids!
PDF
Kubernetes at Datadog Scale
PDF
Labels, Labels, Labels
PDF
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
PDF
Build & Deploy Multi-Container Applications to AWS
PDF
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
PDF
Developing with Docker for the Arm Architecture
Containerize Your Game Server for the Best Multiplayer Experience
How to Improve Your Image Builds Using Advance Docker Build
Build & Deploy Multi-Container Applications to AWS
Securing Your Containerized Applications with NGINX
How To Build and Run Node Apps with Docker and Compose
Hands-on Helm
Distributed Deep Learning with Docker at Salesforce
The First 10M Pulls: Building The Official Curl Image for Docker Hub
Monitoring in a Microservices World
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
Predicting Space Weather with Docker
Become a Docker Power User With Microsoft Visual Studio Code
How to Use Mirroring and Caching to Optimize your Container Registry
Monolithic to Microservices + Docker = SDLC on Steroids!
Kubernetes at Datadog Scale
Labels, Labels, Labels
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Build & Deploy Multi-Container Applications to AWS
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
Developing with Docker for the Arm Architecture

Lightweight Virtualization with Linux Containers and Docker I YaC 2013

  • 2. привет! меня зовут Джером Я не говорю по России (к сожалению)
  • 3. Lightweight Virtualization with Linux Containers and Docker Yet another Conference – Moscow, 2013 Jérôme Petazzoni, dotCloud Inc.
  • 5. Outline ● Why Linux Containers? ● What are Linux Containers exactly? ● What do we need on top of LXC? ● Why Docker? ● What is Docker exactly? ● Where is it going?
  • 6. Outline ● Why Linux Containers? ● What are Linux Containers exactly? ● What do we need on top of LXC? ● Why Docker? ● What is Docker exactly? ● Where is it going?
  • 7. Why Linux Containers? What are we trying to solve?
  • 10. Many payloads ● backend services (API) ● databases ● distributed stores ● webapps
  • 11. Many payloads ● Go ● Java ● Node.js ● PHP ● Python ● Ruby ● …
  • 12. Many payloads ● CherryPy ● Django ● Flask ● Plone ● ...
  • 13. Many payloads ● Apache ● Gunicorn ● uWSGI ● ...
  • 15. Many targets ● your local development environment ● your coworkers' developement environment ● your Q&A team's test environment ● some random demo/test server ● the staging server(s) ● the production server(s) ● bare metal ● virtual machines ● shared hosting + your dog's Raspberry Pi
  • 16. Many targets ● BSD ● Linux ● OS X ● Windows
  • 17. Many targets ● BSD ● Linux ● OS X ● Windows Not yet
  • 18. The Matrix From Hell Static website ? ? ? ? ? ? ? Web frontend ? ? ? ? ? ? ? background workers ? ? ? ? ? ? ? User DB ? ? ? ? ? ? ? Analytics DB ? ? ? ? ? ? ? Queue ? ? ? ? ? ? ? Development VM QA Server Single Prod Server Onsite Cluster Public Cloud Contributor’s laptop Customer Servers
  • 20. Many products ● clothes ● electronics ● raw materials ● wine ● …
  • 21. Many transportation methods ● ships ● trains ● trucks ● ...
  • 22. Another matrix from hell ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
  • 23. Solution to the transport problem: the intermodal shipping container
  • 24. Solution to the transport problem: the intermodal shipping container ● 90% of all cargo now shipped in a standard container ● faster and cheaper to load and unload on ships (by an order of magnitude) ● less theft, less damage ● freight cost used to be >25% of final goods cost, now <3% ● 5000 ships deliver 200M containers per year
  • 25. Solution to the deployment problem: the Linux container
  • 26. Linux containers... ● run everywhere – regardless of kernel version – regardless of host distro – (but container and host architecture must match*) ● run anything – if it can run on the host, it can run in the container – i.e., if it can run on a Linux kernel, it can run *Unless you emulate CPU with qemu and binfmt
  • 27. Outline ● Why Linux Containers? ● What are Linux Containers exactly? ● What do we need on top of LXC? ● Why Docker? ● What is Docker exactly? ● Where is it going?
  • 28. What are Linux Containers exactly?
  • 29. High level approach: it's a lightweight VM ● own process space ● own network interface ● can run stuff as root ● can have its own /sbin/init (different from the host) « Machine Container »
  • 30. Low level approach: it's chroot on steroids ● can also not have its own /sbin/init ● container = isolated process(es) ● share kernel with host ● no device emulation (neither HVM nor PV) « Application Container »
  • 31. Separation of concerns: Dmitry the Developer ● inside my container: – my code – my libraries – my package manager – my app – my data
  • 32. Separation of concerns: Oleg the Ops guy ● outside the container: – logging – remote access – network configuration – monitoring
  • 33. How does it work? Isolation with namespaces ● pid ● mnt ● net ● uts ● ipc ● user
  • 34. pid namespace jpetazzo@tarrasque:~$ ps aux | wc -l 212 jpetazzo@tarrasque:~$ sudo docker run -t -i ubuntu bash root@ea319b8ac416:/# ps aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.0 0.0 18044 1956 ? S 02:54 0:00 bash root 16 0.0 0.0 15276 1136 ? R+ 02:55 0:00 ps aux (That's 2 processes)
  • 35. mnt namespace jpetazzo@tarrasque:~$ wc -l /proc/mounts 32 /proc/mounts root@ea319b8ac416:/# wc -l /proc/mounts 10 /proc/mounts
  • 36. net namespace root@ea319b8ac416:/# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 22: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 2a:d1:4b:7e:bf:b5 brd ff:ff:ff:ff:ff:ff inet 10.1.1.3/24 brd 10.1.1.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::28d1:4bff:fe7e:bfb5/64 scope link valid_lft forever preferred_lft forever
  • 38. ipc namespace jpetazzo@tarrasque:~$ ipcs ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00000000 3178496 jpetazzo 600 393216 2 dest 0x00000000 557057 jpetazzo 777 2778672 0 0x00000000 3211266 jpetazzo 600 393216 2 dest root@ea319b8ac416:/# ipcs ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status ------ Semaphore Arrays -------- key semid owner perms nsems ------ Message Queues -------- key msqid owner perms used-bytes messages
  • 39. user namespace ● no « demo » for this one... Yet! ● UID 0→1999 in container C1 is mapped to UID 10000→11999 in host; UID 0→1999 in container C2 is mapped to UID 12000→13999 in host; etc. ● required lots of VFS and FS patches (esp. XFS) ● what will happen with copy-on-write? – double translation at VFS? – single root UID on read-only FS?
  • 40. How does it work? Isolation with cgroups ● memory ● cpu ● blkio ● devices
  • 41. memory cgroup ● keeps track pages used by each group: – file (read/write/mmap from block devices; swap) – anonymous (stack, heap, anonymous mmap) – active (recently accessed) – inactive (candidate for eviction) ● each page is « charged » to a group ● pages can be shared (e.g. if you use any COW FS) ● Individual (per-cgroup) limits and out-of-memory killer
  • 42. cpu and cpuset cgroups ● keep track of user/system CPU time ● set relative weight per group ● pin groups to specific CPU(s) – Can be used to « reserve » CPUs for some apps – This is also relevant for big NUMA systems
  • 43. blkio cgroups ● keep track IOs for each block device – read vs write; sync vs async ● set relative weights ● set throttle (limits) for each block device – read vs write; bytes/sec vs operations/sec Note: earlier versions (pre-3.8) didn't account async correctly. 3.8 is better, but use 3.10 for best results.
  • 44. devices cgroups ● controls read/write/mknod permissions ● typically: – allow: /dev/{tty,zero,random,null}... – deny: everything else – maybe: /dev/net/tun, /dev/fuse
  • 45. If you're serious about security, you also need… ● capabilities – okay: cap_ipc_lock, cap_lease, cap_mknod, cap_net_admin, cap_net_bind_service, cap_net_raw – troublesome: cap_sys_admin (mount!) ● think twice before granting root ● grsec is nice ● seccomp (very specific use cases); seccomp-bpf ● beware of full-scale kernel exploits!
  • 47. Efficiency: almost no overhead ● processes are isolated, but run straight on the host ● CPU performance = native performance ● memory performance = a few % shaved off for (optional) accounting ● network performance = small overhead; can be optimized to zero overhead
  • 48. Outline ● Why Linux Containers? ● What are Linux Containers exactly? ● What do we need on top of LXC? ● Why Docker? ● What is Docker exactly? ● Where is it going?
  • 49. Efficiency: storage-friendly ● unioning filesystems (AUFS, overlayfs) ● snapshotting filesystems (BTRFS, ZFS) ● copy-on-write (thin snapshots with LVM or device-mapper) This is now being integrated with low-level LXC tools as well!
  • 50. Efficiency: storage-friendly ● provisioning now takes a few milliseconds ● … and a few kilobytes ● creating a new base image (from a running container) takes a few seconds (or even less)
  • 52. Outline ● Why Linux Containers? ● What are Linux Containers exactly? ● What do we need on top of LXC? ● Why Docker? ● What is Docker exactly? ● Where is it going?
  • 53. What can Docker do? ● Open Source engine to commoditize LXC ● using copy-on-write for quick provisioning ● allowing users to create and share images ● standard format for containers (stack of layers; 1 layer = tarball + metadata) ● standard, reproducible way to easily build trusted images (Dockerfile)
  • 54. Docker: authoring images ● you can author « images » – either with « run+commit » cycles, taking snapshots – or with a Dockerfile (=source code for a container) – either way, it's ridiculously easy ● you can run them – anywhere – multiple times
  • 55. Dockerfile example
    FROM ubuntu
    RUN apt-get -y update
    RUN apt-get install -y g++
    RUN apt-get install -y erlang-dev erlang-manpages erlang-base-hipe ...
    RUN apt-get install -y libmozjs185-dev libicu-dev libtool ...
    RUN apt-get install -y make wget
    # -O- makes wget write the tarball to stdout, so tar can read it from the pipe
    RUN wget -O- http://.../apache-couchdb-1.3.1.tar.gz | tar -C /tmp -zxf-
    RUN cd /tmp/apache-couchdb-* && ./configure && make install
    # printf expands \n, producing a three-line ini file
    RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
    EXPOSE 8101
    CMD ["/usr/local/bin/couchdb"]
  • 56. Yes, but... ● « I don't need Docker; I can do all that stuff with LXC tools, rsync, some scripts! » ● correct on all counts; but that's also true for apt, dpkg, rpm, yum, etc. ● the whole point is to commoditize, i.e. make it ridiculously easy to use
  • 59. What this really means… ● instead of writing « very small shell scripts » to manage containers, write them to do the rest: – continuous deployment/integration/testing – orchestration ● = use Docker as a building block ● re-use other people's images (yay ecosystem!)
  • 60. Docker: sharing images ● you can push/pull images to/from a registry (public or private) ● you can search images through a public index ● dotCloud maintains a collection of base images (Ubuntu, Fedora...) ● satisfaction guaranteed or your money back
  • 61. Docker: not sharing images ● private registry – for proprietary code – or security credentials – or fast local access ● the private registry is available as an image on the public registry (yes, that makes sense)
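Whether a push or pull goes to the public registry or a private one is decided by the image name itself. The sketch below follows Docker's naming convention: if the first path component contains a `.` or `:` or is `localhost`, it names a registry host; otherwise the image lives on the public registry. A missing tag means `latest`. The function name and the `None` return value for "public registry" are choices made for this sketch.

```python
def parse_image_ref(ref: str):
    """Split an image name into (registry or None, repository, tag)."""
    registry = None
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, ref = first, rest  # e.g. "registry.example.com:5000"
    repo, tag = ref, "latest"
    # A ':' after the last '/' separates the tag (a ':' before it is a port).
    if ":" in ref.rsplit("/", 1)[-1]:
        repo, tag = ref.rsplit(":", 1)
    return registry, repo, tag
```

So `ubuntu` resolves to the public registry with tag `latest`, while `localhost:5000/app` targets a private registry running locally.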
  • 62. Typical workflow ● code in local environment (« dockerized » or not) ● each push to the git repo triggers a hook ● the hook tells a build server to clone the code and run « docker build » (using the Dockerfile) ● the containers are tested (nosetests, Jenkins...), and if the tests pass, pushed to the registry ● production servers pull the containers and run them ● for network services, load balancers are updated
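The build-server half of this workflow comes down to a handful of commands run in order, stopping at the first failure so an image that fails its tests is never pushed. The sketch below assumes a hypothetical repository URL, registry host, and test command, and tags each image with a prefix of the commit hash (also an assumption). It uses only standard `git` and `docker` subcommands.

```python
import subprocess


def pipeline_commands(repo_url, registry, name, commit, test_cmd, workdir="build"):
    """Return the argv lists a git hook could hand to a build server."""
    image = f"{registry}/{name}:{commit[:12]}"  # tag = short commit hash (assumption)
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", commit],
        ["docker", "build", "-t", image, workdir],   # uses the repo's Dockerfile
        ["docker", "run", "--rm", image] + test_cmd,  # e.g. ["nosetests"]
        ["docker", "push", image],                   # only reached if tests pass
    ]


def run_pipeline(commands):
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Production servers would then `docker pull` the same tag and run it; updating load balancers is left out of the sketch.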
  • 63. Hybrid clouds ● Docker is part of OpenStack « Havana », as a Nova driver + Glance translator ● typical workflow: – code on local environment – push container to Glance-backed registry – run and manage containers using OpenStack APIs
  • 65. What's Docker exactly? ● rewrite of dotCloud internal container engine – original version: Python, tied to dotCloud's internal stuff – released version: Go, legacy-free ● the Docker daemon runs in the background – manages containers, images, and builds – HTTP API (over UNIX or TCP socket) – embedded CLI talking to the API ● Open Source (GitHub public repository + issue tracking) ● user and dev mailing lists
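Because the daemon speaks plain HTTP over a UNIX socket, any language can drive it without the CLI. The sketch below sends a `GET /containers/json` (the "list containers" endpoint) over the socket. It assumes the daemon listens on `/var/run/docker.sock`, the usual default. The request uses HTTP/1.0 so the daemon closes the connection after replying and the body arrives unchunked. The `Host` value is arbitrary.

```python
import json
import socket


def build_request(path: str) -> bytes:
    # HTTP/1.0: the server closes the connection when the response is done.
    return f"GET {path} HTTP/1.0\r\nHost: docker\r\n\r\n".encode("ascii")


def parse_response(raw: bytes):
    """Return (status code, decoded JSON body or None)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    status = int(status_line.split()[1])
    return status, (json.loads(body) if body else None)


def docker_get(path: str, sock_path: str = "/var/run/docker.sock"):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(build_request(path))
        chunks = []
        while chunk := s.recv(4096):  # read until the daemon closes
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

On a host running Docker, `docker_get("/containers/json")` returns the status code and a list of running containers, which is the same information the embedded CLI gets for `docker ps`.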
  • 66. Docker: the community ● Docker: >170 contributors ● latest milestone (0.6): 40 contributors ● GitHub repository: >680 forks
  • 68. Docker roadmap ● Today: Docker 0.6 – LXC – AUFS ● Tomorrow: Docker 0.7 – LXC – device-mapper thin snapshots (target: RHEL) ● The day after: Docker 1.0 – LXC, libvirt, qemu, KVM, OpenVZ, chroot… – multiple storage back-ends – plugins
  • 69. Docker: the ecosystem ● Cocaine (PaaS; has Docker plugin) ● CoreOS (full distro based on Docker) ● Deis (PaaS; available) ● Dokku (mini-Heroku in 100 lines of bash) ● Flynn (PaaS; in development) ● Maestro (orchestration from a simple YAML file) ● OpenStack integration (in Havana, Nova has a Docker driver) ● Shipper (fabric-like orchestration) And many more
  • 70. Cocaine integration ● what's Cocaine? – Open Source PaaS from Yandex – modular: can switch logging, storage, etc. without changing apps – infrastructure abstraction layer + service discovery – monitoring: metrics collection; load balancing ● why Docker? – Cocaine initially used cgroups – wanted to add LXC for better isolation and resource control – heard about Docker at the right time – uses custom distributed storage instead of Docker registry
  • 71. device-mapper thin snapshots (aka « thinp ») ● start with a 10 GB empty ext4 filesystem – snapshot: that's the root of everything ● base image: – clone the original snapshot – untar image on the clone – re-snapshot; that's your image ● create container from image: – clone the image snapshot – run; repeat cycle as many times as needed
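The snapshot cycle above works because a thin snapshot copies only the block map, not the data. The toy model below is not device-mapper's interface or data structure (the real pool uses shared B-trees and reference counts, and the class is hypothetical). It follows the slide's steps: empty base, clone, "untar", re-snapshot, clone per container, and shows that a write allocates one new block visible only to the volume that wrote it.

```python
class ThinPool:
    """Toy thin-provisioning pool: volumes are maps from logical to physical blocks."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks = {}   # physical block id -> data
        self.next_id = 0
        self.volumes = {}  # volume name -> {logical index: physical id}

    def create(self, name):
        self.volumes[name] = {}  # empty volume: no blocks allocated yet

    def snapshot(self, src, dst):
        self.volumes[dst] = dict(self.volumes[src])  # copies the map, not the data

    def write_block(self, vol, index, data: bytes):
        if len(data) != self.block_size:
            raise ValueError("writes are whole blocks in this model")
        pid, self.next_id = self.next_id, self.next_id + 1
        self.blocks[pid] = data
        self.volumes[vol][index] = pid  # only this volume points at the new block

    def read_block(self, vol, index) -> bytes:
        pid = self.volumes[vol].get(index)
        return self.blocks[pid] if pid is not None else bytes(self.block_size)
```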
  • 72. AUFS vs THINP
    AUFS | THINP
    easy to see changes | must diff manually
    small change = copy whole file | small change = copy 1 block (100 KB to 1 MB)
    ~42 layers | unlimited layers
    patched kernel (Debian, Ubuntu OK) | stock kernel (>3.2) (RHEL 2.6.32 OK)
    efficient caching | duplicated pages
    no quotas | FS size acts as quota
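The "small change" row is easy to quantify. On the first write to a file from a lower layer, AUFS copies up the whole file, while a thin snapshot copies only the blocks the write touches. The sketch below computes both costs. The 128 KiB block size in the example is an assumption within the slide's 100 KB to 1 MB range.

```python
def copy_cost(file_size: int, offset: int, length: int, block_size: int):
    """Bytes copied by the first write of `length` bytes at `offset`: (AUFS, thinp)."""
    if length <= 0:
        return 0, 0  # no write, nothing copied
    aufs = file_size  # copy-up duplicates the entire file
    first = offset // block_size
    last = (offset + length - 1) // block_size
    thinp = (last - first + 1) * block_size  # only the touched blocks
    return aufs, thinp


KiB, GiB = 1024, 1024 ** 3
print(copy_cost(1 * GiB, 5000, 100, 128 * KiB))
```

Changing 100 bytes in a 1 GiB database file costs a full gigabyte with AUFS, but a single 128 KiB block with thinp. A write that straddles a block boundary costs two blocks.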
  • 73. Misconceptions about THINP ● « performance degradation » no; that was with « old » LVM snapshots ● « can't handle 1000s of volumes » that's LVM; Docker uses devmapper directly ● « if the snapshot volume runs out of space, it breaks and you lose everything » that's « old » LVM snapshots; thinp halts I/O ● « it still uses disk space after 'rm -rf' » no, thanks to 'discard passdown'
  • 74. Other features in 0.7 ● links – linked containers can discover each other – environment variable injection – allows exposing remote services through containers (implements the ambassador pattern) – side-effect: container naming ● host integration – we ♥ systemd
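Discovery through environment variable injection means the consumer never hard-codes an address. It reads one that the engine put in its environment. The sketch below generates and consumes variables named after the pattern Docker links came to use (`<ALIAS>_PORT_<port>_<PROTO>_ADDR`, `..._PORT`, plus `<ALIAS>_PORT` for the lowest port). Treat the exact naming scheme as an assumption, and the two helpers as hypothetical.

```python
def link_env(alias: str, ip: str, ports):
    """Variables a linked container might receive for a target at `ip`.

    ports: iterable of (port, proto) pairs, e.g. [(5432, "tcp")].
    """
    prefix = alias.upper().replace("-", "_")
    env = {}
    for i, (port, proto) in enumerate(sorted(ports)):
        key = f"{prefix}_PORT_{port}_{proto.upper()}"
        url = f"{proto}://{ip}:{port}"
        env[key] = url
        env[key + "_ADDR"] = ip
        env[key + "_PORT"] = str(port)
        env[key + "_PROTO"] = proto
        if i == 0:
            env[f"{prefix}_PORT"] = url  # lowest port doubles as the default
    return env


def discover(env, alias: str, port: int, proto: str = "tcp"):
    """Consumer side: find (address, port) of a linked service."""
    key = f"{alias.upper().replace('-', '_')}_PORT_{port}_{proto.upper()}"
    return env[key + "_ADDR"], int(env[key + "_PORT"])
```

With the ambassador pattern, `ip` would belong to a local ambassador container that forwards to the remote service, so the consumer's code stays the same whether the database is local or not.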
  • 75. 0.8 and beyond ● beam – introspection API – based on the Redis protocol (i.e. all Redis clients work) – works well for synchronous req/rep and streams – reimplementation of Redis core in Go – think of it as « live environment variables » that you can watch/subscribe to ● and much more
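"All Redis clients work" because the wire format is the simple Redis serialization protocol (RESP). Commands go out as arrays of length-prefixed bulk strings, and replies come back as one of five typed values. The sketch below encodes and decodes RESP. The example commands (`SET`, `SUBSCRIBE`) are ordinary Redis ones chosen for illustration; beam's actual command vocabulary is not covered here.

```python
def encode_command(*args) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for a in args:
        b = a if isinstance(a, bytes) else str(a).encode()
        out.append(f"${len(b)}\r\n".encode() + b + b"\r\n")
    return b"".join(out)


def decode(buf: bytes, pos: int = 0):
    """Decode one RESP value starting at `pos`; return (value, next_pos)."""
    kind = buf[pos:pos + 1]
    end = buf.index(b"\r\n", pos)
    line = buf[pos + 1:end]
    pos = end + 2
    if kind == b"+":                      # simple string
        return line.decode(), pos
    if kind == b"-":                      # error reply
        raise RuntimeError(line.decode())
    if kind == b":":                      # integer
        return int(line), pos
    if kind == b"$":                      # bulk string ($-1 = null)
        n = int(line)
        if n == -1:
            return None, pos
        return buf[pos:pos + n], pos + n + 2
    if kind == b"*":                      # array of nested values
        n = int(line)
        if n == -1:
            return None, pos
        items = []
        for _ in range(n):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown RESP type {kind!r}")
```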