Startup Containers in Lightning Speed with Lazy Image Distribution
Kohei Tokunaga, NTT Corporation
Summary
• Pulling an image is one of the most time-consuming steps in the container lifecycle
• Stargz Snapshotter, a non-core subproject of containerd, addresses this by lazily pulling images, leveraging the stargz image format from Google
  • Further runtime optimization is provided by an extended version of stargz (eStargz)
• Other OCI-alternative image distribution strategies also exist in the container ecosystem
[Figure: container startup time in seconds, split into pull, create, and run, for python:3.7 (print “hello”) with legacy, stargz, and estargz images. Host: EC2 Oregon (m5.2xlarge, Ubuntu 20.04). Registry: Docker Hub (docker.io). Commit b53e8fe. Detailed information is given in the later slides.]
Pull is time-consuming
“Pulling packages accounts for 76% of container start time, but only 6.4% of that data is read” [Harter et al. 2016]
Workarounds are known but not enough
• Caching images: cold start is still slow
• Minimizing image size: not all images are minimizable (language runtimes, frameworks, etc.)
[Figure: an image is pulled from the registry to the node and then run as a container.]
[Harter et al. 2016] Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. "Slacker: Fast Distribution with Lazy Docker Containers". 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016, Santa Clara, CA, USA
OCI/Docker Specs for image distribution
A container is a set of layers
Distribution Spec
• Defines the HTTP API of the registry
• A layer can be fetched as a “blob” named with a content-addressable digest
• Optional support for HTTP Range Requests
Image Spec
• Defines layers and metadata (image manifest, etc.)
• A layer is defined as a tar archive (+compression)
• The rootfs can be composed by merging layers
[Figure: the registry serves the manifest and the layer blobs (sha256:deadbeaf…, sha256:1a3b5c…, sha256:ffe63c…, sha256:6ccde1…) via GET /v2/<image-name>/blobs/; the layers of an image are extracted and merged into the rootfs.]
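As a rough illustration of the Distribution Spec points above, the following Python sketch builds a blob URL, prepares a ranged GET, and checks a blob against its digest. The registry host, image name, and helper names are hypothetical. The request is only constructed, not sent, and a real registry may also require authentication.

```python
import hashlib
import urllib.request


def blob_url(registry: str, image_name: str, digest: str) -> str:
    """Build a blob URL of the form /v2/<image-name>/blobs/<digest>."""
    return f"https://{registry}/v2/{image_name}/blobs/{digest}"


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Prepare (but do not send) a GET for bytes start..end, inclusive."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Range", f"bytes={start}-{end}")
    return request


def digest_of(data: bytes) -> str:
    """Content-addressable name of a blob: the SHA-256 of its bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_blob(data: bytes, expected_digest: str) -> bool:
    """A fetched blob is valid only if its bytes hash to the digest it was requested by."""
    return digest_of(data) == expected_digest


# Hypothetical registry and image name, for illustration only.
layer = b"example layer bytes"
url = blob_url("registry.example.com", "library/python", digest_of(layer))
request = range_request(url, 0, 1023)
```

Because the blob name is derived from its content, a client can verify a complete blob without trusting the transport. A partial range cannot be verified this way on its own.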
Problems on the OCI/Docker Specs
A container is a set of tarball layers (layer = tarball (+compression))
A container can’t be started until all layers are locally available, even if most of the contents won’t be used on container startup
• The entire blob must be scanned even to extract a single file entry
  • If the blob is gzip-compressed, it is no longer seekable
• No parallel extraction
  • The blob must be scanned sequentially from the top
[Figure: layer blobs (sha256:deadbeaf…, sha256:1a3b5c…, sha256:ffe63c…, sha256:6ccde1…) fetched via GET /v2/<image-name>/blobs/, each a tarball holding entries such as bin/bash, bin/ls, etc/passwd, etc/group, usr/bin/apt.]
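The sequential-scan problem can be reproduced with Python's standard tarfile module. This is a minimal sketch with made-up file contents: it builds a small gzip-compressed tarball in memory, then reads it as a non-seekable stream and counts how many entries have to be passed before the wanted one is reached.

```python
import io
import tarfile


def build_layer(files: dict[str, bytes]) -> bytes:
    """Pack the given files into an in-memory tar.gz, in insertion order."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def entries_scanned_until(layer: bytes, target: str) -> int:
    """Stream the layer from the top and count entries read up to the target."""
    # Mode "r|gz" treats the input as a forward-only stream, as a gzip blob is.
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r|gz") as tar:
        for count, member in enumerate(tar, start=1):
            if member.name == target:
                return count
    raise FileNotFoundError(target)


layer = build_layer({
    "bin/bash": b"bash binary",
    "bin/ls": b"ls binary",
    "etc/passwd": b"root:x:0:0",
    "usr/bin/apt": b"apt binary",
})
scanned = entries_scanned_until(layer, "usr/bin/apt")
```

Reaching the last entry means decompressing and walking past every entry before it, which is why a single file cannot be fetched cheaply from a plain tar.gz layer.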
Lazy pull with containerd Stargz Snapshotter
• Non-core subproject of containerd
• Works as a plugin of containerd
• Standard-compliant lazy pull leveraging the stargz image format from Google
Stargz Snapshotter doesn’t download the entire image on a pull operation; it fetches the necessary chunks of content on demand
[Figure: Stargz Snapshotter, shown alongside kubelet, etc. and OCI runtimes, lazily pulls stargz images from a container registry.]
https://github.com/containerd/stargz-snapshotter
Standard-compliant lazy pull
• Leverages the OCI/Docker compatibility of stargz:
  • stargz images can be lazily pulled from standard registries
  • they can also be run by legacy runtimes (but not lazily pulled)
• Mounts rootfs snapshots as FUSE and downloads accessed file contents on demand
[Figure: on a node, Stargz Snapshotter (implemented as a remote snapshotter plugin) mounts the container rootfs as FUSE and pulls file contents on demand from standard registries (e.g. Docker Hub); stargz images are still pullable and runnable by legacy runtimes.]
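The on-demand behaviour described on this slide can be sketched as a read path that fetches only the chunks a read touches and caches them. This is not the snapshotter's actual code: the class name, the chunk size, and the fetch callback (which stands in for an HTTP Range Request with an inclusive byte range) are all assumptions made for illustration, and no FUSE mount is involved.

```python
from typing import Callable, Dict


class LazyFile:
    """Serves reads from a chunk cache, fetching missing chunks on demand."""

    def __init__(self, size: int, fetch: Callable[[int, int], bytes],
                 chunk_size: int = 4096) -> None:
        self.size = size
        self.fetch = fetch              # fetch(start, end_inclusive) -> bytes
        self.chunk_size = chunk_size
        self.cache: Dict[int, bytes] = {}
        self.fetch_count = 0            # number of remote fetches issued

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)
        if offset < 0 or offset >= end:
            return b""
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        data = bytearray()
        for index in range(first, last + 1):
            if index not in self.cache:
                start = index * self.chunk_size
                stop = min(start + self.chunk_size, self.size) - 1
                self.cache[index] = self.fetch(start, stop)
                self.fetch_count += 1
            data += self.cache[index]
        # Trim the chunk-aligned buffer down to the requested byte window.
        skip = offset - first * self.chunk_size
        return bytes(data[skip:skip + (end - offset)])


# A local bytes object stands in for the remote blob.
remote = b"0123456789abcdef"
lazy = LazyFile(len(remote), lambda start, stop: remote[start:stop + 1], chunk_size=4)
first_read = lazy.read(5, 3)
```

Only the chunks covering the requested bytes are fetched, and a later read that overlaps a cached chunk does not trigger another fetch for it.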
Stargz archive format
• Proposed by the Google CRFS project: https://github.com/google/crfs
• Stands for “seekable tar.gz”: it is seekable but still a valid tar.gz, so it is usable as a valid OCI/Docker image layer
• Entries can be extracted separately
  • They can be fetched separately from registries using HTTP Range Requests
[Figure: a tar.gz layer (bin/bash, bin/ls, etc/passwd, etc/group, usr/bin/apt) is non-seekable and must be scanned in full even to get a single entry. A stargz layer holds the same entries with a gzip member per regular file, plus TOCEntries (index and file metadata), so it is seekable and can be extracted per file with HTTP Range Requests.]
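The “gzip member per regular file” idea can be tried out with Python's gzip module. The sketch below is a deliberate simplification: it compresses raw file payloads rather than tar entries, and it keeps the table of contents in a Python dict instead of inside the archive, so it does not reproduce the real stargz layout defined by the CRFS project. The function names are hypothetical.

```python
import gzip
from typing import Dict, Tuple


def build_seekable(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate one gzip member per file and record (offset, length) for each."""
    blob = b""
    toc: Dict[str, Tuple[int, int]] = {}
    for name, data in files.items():
        member = gzip.compress(data)
        toc[name] = (len(blob), len(member))
        blob += member
    return blob, toc


def read_entry(blob: bytes, toc: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Decompress a single file using only its own byte range of the blob."""
    offset, length = toc[name]
    return gzip.decompress(blob[offset:offset + length])


blob, toc = build_seekable({
    "bin/bash": b"bash binary",
    "bin/ls": b"ls binary",
    "etc/passwd": b"root:x:0:0",
})
one_file = read_entry(blob, toc, "bin/ls")
# The concatenation is still a valid gzip stream that decompresses as a whole.
whole = gzip.decompress(blob)
```

The slice `blob[offset:offset + length]` is the part that a remote client would request with an HTTP Range Request, while a legacy reader can still decompress the blob from start to end.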
eStargz archive for prefetch
• Network-related overheads can’t be ignored for on-demand fetching with stargz
• eStargz enables prefetching of files that are likely to be accessed during runtime (= prioritized files)
• The filesystem prefetches and pre-caches these files with a single HTTP Range Request on mount
[Figure: a stargz layer (bin/bash, bin/ls, usr/bin/apt, entrypoint.sh, TOCEntries) is sorted into an eStargz layer. The prioritized files that are likely accessed during runtime (bin/ls, usr/bin/apt, entrypoint.sh) come first and are prefetched by a single HTTP Range Request, followed by the landmark file, then the files that are fetched on demand but aggressively downloaded in the background (bin/bash), then TOCEntries.]
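The sorting step and the single prefetch range can be sketched as follows. This is only a model of the idea on the slide, not the eStargz format itself: the sizes stand for each file's byte length inside the layer, the landmark is modelled as a zero-length marker with a hypothetical name, and the TOC is left out.

```python
from typing import Dict, List, Optional, Tuple

LANDMARK = "LANDMARK"  # hypothetical marker name, for illustration only


def prioritized_layout(sizes: Dict[str, int],
                       prioritized: List[str]) -> Tuple[Dict[str, int], int]:
    """Place prioritized files first, then the landmark, then everything else."""
    first = [name for name in dict.fromkeys(prioritized) if name in sizes]
    chosen = set(first)
    rest = [name for name in sizes if name not in chosen]

    offsets: Dict[str, int] = {}
    position = 0
    for name in first:
        offsets[name] = position
        position += sizes[name]
    landmark_offset = position          # everything before this is prefetched
    offsets[LANDMARK] = landmark_offset
    for name in rest:
        offsets[name] = position
        position += sizes[name]
    return offsets, landmark_offset


def prefetch_range_header(landmark_offset: int) -> Optional[str]:
    """One Range header covering all prioritized files, or None if there are none."""
    if landmark_offset <= 0:
        return None
    return f"bytes=0-{landmark_offset - 1}"


sizes = {"bin/bash": 100, "bin/ls": 20, "usr/bin/apt": 50, "entrypoint.sh": 5}
offsets, landmark_offset = prioritized_layout(
    sizes, ["bin/ls", "usr/bin/apt", "entrypoint.sh"])
header = prefetch_range_header(landmark_offset)
```

Because the prioritized files are contiguous at the front of the layer, one request is enough to warm the cache for all of them, and the landmark tells the reader where that region ends.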
Workload-based runtime optimization with eStargz
• Leveraging eStargz, a CLI converter command provides workload-based optimization
• Generally, containers are built with a purpose
  • Workloads are defined in the Dockerfile, etc. (entrypoint, user, envvar, etc.) and stored in the image
• The CLI converter runs the provided image in a sandbox and profiles all file accesses
  • It regards the accessed files as likely to be accessed during runtime too (= prioritized files in eStargz)
  • Stargz Snapshotter prefetches and pre-caches these files when it mounts the eStargz image
[Figure: the original image contains workload information in its metadata, specified by the Dockerfile, etc. (entrypoint, user, envvar); custom workloads can be specified through CLI options. The converter profiles the file accesses of the process in a sandbox and produces an eStargz image optimized for the workload.]
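The step from a profile to a prioritized file list might look like the sketch below. The slides do not say how the converter records accesses, so the access log here is simply assumed to be a sequence of paths in the order they were opened, and the function name is hypothetical.

```python
from typing import Iterable, List, Set


def prioritized_from_accesses(access_log: Iterable[str],
                              image_files: Set[str]) -> List[str]:
    """Keep files of the image in first-access order, dropping repeats."""
    seen: Set[str] = set()
    ordered: List[str] = []
    for path in access_log:
        # Ignore accesses to paths that are not part of the image layers.
        if path in image_files and path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered


image_files = {"bin/bash", "bin/ls", "usr/bin/apt", "entrypoint.sh"}
access_log = ["entrypoint.sh", "bin/ls", "proc/self/status", "bin/ls", "usr/bin/apt"]
prioritized = prioritized_from_accesses(access_log, image_files)
```

Files that the workload never touched during profiling, such as bin/bash in this example, stay out of the prioritized list and are left to on-demand fetching.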
Benchmarking results
• Measures the container startup time, which includes:
  • Pulling an image from Docker Hub
  • For language containers, running a “print hello world” program in the container
  • For server containers, waiting for readiness (until an “up and running” message is printed)
  → This method is based on HelloBench [Harter et al. 2016]
• Takes the 95th percentile of 100 operations
• Host: EC2 Oregon (m5.2xlarge, Ubuntu 20.04)
• Registry: Docker Hub (docker.io)
• Target commit: b53e8fe8d37751753bc623b037729b6a6d9c1122
[Harter et al. 2016] Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. "Slacker: Fast Distribution with Lazy Docker Containers". 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016, Santa Clara, CA, USA
Credit to Akihiro Suda (NTT) for discussion and experiment environment
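For readers who want to reproduce the summary statistic, here is one way to take the 95th percentile of a set of startup times. The slides do not state which percentile definition was used, so the nearest-rank method below is an assumption, and the sample values are synthetic placeholders rather than measured results.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], percent: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least percent% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(percent * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Synthetic placeholder timings (seconds) for 100 operations, not benchmark data.
timings = [float(i) for i in range(1, 101)]
p95 = percentile_nearest_rank(timings, 95)
```

With 100 operations, this definition picks the 95th smallest measurement, so the five slowest runs do not affect the reported value.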
Time taken for container startup
[Figure: startup time in seconds (0 to 25), split into pull, create, and run, for python:3.7 (print “hello”) with legacy, stargz, and estargz images. Chart annotation: “Waits for prefetch completion”.]
Time taken for container startup
[Figure: startup time in seconds (0 to 30), split into pull, create, and run, for gcc:9.2.0 (compiles and runs printf(“hello”);) with legacy, stargz, and estargz images.]
Time taken for container startup
[Figure: startup time in seconds (0 to 25), split into pull, create, and run, for glassfish:4.1-jdk8 (runs until “Running GlassFish” is printed) with legacy, stargz, and estargz images.]
Expected use-cases
Speeding up base image distribution on image build
• Especially for temporary base images of “dev” stages in multi-stage builds
  • These won’t be included in the resulting image
  • https://github.com/moby/buildkit/pull/1402
Speeding up dev pipelines (or build/test environments)
• The initial motivation in the Go community for inventing stargz was to speed up builder image distribution in their build system
  • https://github.com/golang/go/issues/30829
Sharing large scientific software stacks (e.g. ML frameworks)
• For example, ML frameworks tend to be large (> 1GB)
Improving cold start performance (e.g. serverless)
• But this needs more investigation
  • https://github.com/knative/serving/issues/5913
Stargz Snapshotter is still at an early stage
→ The filesystem needs more performance improvements
→ Lazy pull performance seems to be affected by network conditions (e.g. CDN), etc.
→ Take care with fault tolerance until the layer contents are fully cached
→ …
Feedback and comments are always welcome!
Other OCI-alternative lazy image distribution
Slacker: https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter
• Uses NFS infrastructure for the distribution of rootfs snapshots of containers
• Registries are used for sharing snapshot IDs among hosts
CernVM-FS: https://cvmfs.readthedocs.io/en/stable/
• FUSE filesystem by CERN for sharing High Energy Physics (HEP) software on worldwide infrastructure
• A software stack can be mounted and lazily downloaded from a CernVM-FS “repository” via HTTP
• Remote Snapshotter implementation for containerd
  • https://github.com/cvmfs/containerd-remote-snapshotter
• Ongoing discussion towards integration with Podman
  • https://github.com/containers/storage/issues/383
Other OCI-alternative lazy image distribution
Filegrain: https://github.com/akihirosuda/filegrain
• Proposed by Akihiro Suda (NTT)
• OCI-compliant image format, but uses continuity manifests as layers
• An image can be mounted and files are pulled lazily
• Each file is treated as a content-addressable blob => de-duplication at file granularity
Ongoing discussion towards “OCIv2”: https://hackmd.io/@cyphar/ociv2-brainstorm
• Proposed by Aleksa Sarai (SUSE)
• Brainstorming is in progress (2020/07)
• Lazy fetch support and a mountable filesystem are also in scope
crfs-plugin for fuse-overlayfs: https://github.com/giuseppe/crfs-plugin
• Proposed by Giuseppe Scrivano (Red Hat)
• Plugin of fuse-overlayfs for mounting stargz layers
Recap
• Pulling an image is one of the most time-consuming steps in the container lifecycle
• Stargz Snapshotter, a non-core subproject of containerd, addresses this by lazily pulling images, leveraging the stargz image format from Google
  • It is standard-compliant, so images can be pushed to and lazily pulled from standard registries
  • Workload-based runtime optimization is also provided with eStargz
• Other OCI-alternative image distribution strategies also exist in the container ecosystem
Feedback and suggestions are always welcome!
https://github.com/containerd/stargz-snapshotter