Introduction to Docker
advantages
INTRODUCTION TO DOCKER
Tim Sangster
Software Engineer @ DataCamp
Prerequisites
We will use:
ls, cd, and mkdir to navigate and manage the file system.
Containers
A portable computing environment
Making it less abstract
Containers run identically every time
Containers run identically everywhere
Isolation
Containers provide security
Containers are lightweight
Compared to running an application outside of a container, containers offer:
Security
Portability
Reproducibility
Lightweight
Containers and data science
Automatically reproducible
Let's practice!
The Docker Engine
Docker ecosystem
Docker Engine
1 https://docs.docker.com/engine/
The Docker daemon
1 https://docs.docker.com/engine/ 2 https://docs.docker.com/get-started/overview/#docker-architecture
Images and Containers
Containers are processes
Containers are isolated processes
Let's practice!
Containers vs. Virtual Machines
Containers and Virtual Machines
Resource Virtualization
Containers vs Virtual Machines
Security of Virtualization
Containers are lightweight
Advantages of containers
Because of their smaller size, containers are faster to:
Start
Stop
Distribute
Change or update
Advantages of Virtual Machines
Let's practice!
Running Docker containers
Prerequisite
Command Usage
nano <file-name> Opens <file-name> in the nano text editor
touch <file-name> Creates an empty file with the specified name
echo "<text>" Prints <text> to the console
<command> >> <file> Appends the output of <command> to the end of <file>
<command> -y Automatically answers yes to all prompts from <command>
The Docker CLI
The Docker command line interface (CLI) sends instructions to the Docker daemon.
Docker container output
docker run <image-name>
Choosing Docker container output
docker run <image-name>
repl@host:/#
An interactive Docker container
Adding -it to docker run will give us an interactive shell in the started container.
repl@container:/# exit
exit
repl@host:/#
Running a container detached
Adding -d to docker run will run the container in the background, giving us back control of the shell.
Listing and stopping running containers
docker ps
repl@host:/# docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED              STATUS              PORTS      NAMES
4957362b5fb7   postgres   "docker-entrypoint.s…"   About a minute ago   Up About a minute   5432/tcp   awesome_curie
Summary of new commands
Usage Command
Start a container docker run <image-name>
Start an interactive container docker run -it <image-name>
Start a detached container docker run -d <image-name>
List running containers docker ps
Stop a container docker stop <container-id>
Let's practice!
Working with Docker containers
Listing containers
repl@host:/# docker ps
CONTAINER ID IMAGE .. CREATED STATUS ... NAMES
3b87ec116cb6 postgres 2 seconds ago Up 1 second ... adoring_germain
8a7830bbc787 postgres 3 seconds ago Up 2 seconds ... exciting_heisenberg
fefdf1687b39 postgres 3 seconds ago Up 2 seconds ... vigilant_swanson
b70d549d4611 postgres 4 seconds ago Up 3 seconds ... nostalgic_matsumoto
a66c71c54b92 postgres 4 seconds ago Up 4 seconds ... lucid_matsumoto
8d4f412adc3f postgres 6 seconds ago Up 5 seconds ... fervent_ramanujan
fd0b3b2a843e postgres 7 seconds ago Up 6 seconds ... cool_dijkstra
0d1951db81c4 postgres 8 seconds ago Up 7 seconds ... happy_sammet
...
Named containers
docker run --name <container-name> <image-name>
Filtering running containers
docker ps -f "name=<container-name>"
Container logs
docker logs <container-id>
2022-10-24 12:10:40.318 UTC [1] LOG: database system is ready to accept connect..
Live logs
docker logs -f <container-id>
2022-10-24 12:10:40.309 UTC [1] LOG: starting PostgreSQL 14.5 (Debian 14.5-1.pg..
2022-10-24 12:10:40.309 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port ..
2022-10-24 12:10:40.309 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-10-24 12:10:40.311 UTC [1] LOG: listening on Unix socket "/var/run/postgre..
2022-10-24 12:10:40.315 UTC [62] LOG: database system was shut down at 2022-10-..
2022-10-24 12:10:40.318 UTC [1] LOG: database system is ready to accept connect..
Cleaning up
docker container rm <container-id>
Summary of new commands
Usage Command
Start container with a name docker run --name <container-name> <image-name>
Filter running containers by name docker ps -f "name=<container-name>"
See existing logs for container docker logs <container-id>
See live logs for container docker logs -f <container-id>
Exit live log view of container CTRL+C
Remove stopped container docker container rm <container-id>
Let's practice!
Managing local Docker images
Pulling an image
docker pull <image-name>
Image versions
Listing images
docker images
Removing images
docker image rm <image-name>
Cleaning up containers
docker container prune
Cleaning up images
docker image prune -a
Dangling images
docker images
Summary of new commands
Usage Command
Pull an image docker pull <image-name>
Pull a specific version of an image docker pull <image-name>:<image-version>
List all local images docker images
Remove an image docker image rm <image-name>
Remove all stopped containers docker container prune
Remove all unused images docker image prune -a
Let's practice!
Distributing Docker Images
Private Docker registries
Unlike Docker official images, images in private registries come with no quality guarantee
dockerhub.myprivateregistry.com/classify_spam
Using tag: v1
latest: Pulling from dockerhub.myprivateregistry.com
ed02c6ade914: Pull complete
Digest: sha256:b6b83d3c331794420340093eb706b6f152d9c1fa51b262d9bf34594887c2c7ac
Status: Downloaded newer image for dockerhub.myprivateregistry.com/classify_spam:v1
dockerhub.myprivateregistry.com/classify_spam:v1
Pushing to a registry
docker image push <image-name>
Pushing to a specific registry --> the name of the image needs to start with the registry URL
Authenticating against a registry
Docker official images --> No authentication needed
Docker images as files
Sending a Docker image to one or a few people? Send it as a file!
Save an image
Load an image
Summary of new commands
Usage Command
Pull image from private registry docker pull <private-registry-url>/<image-name>
Name an image docker tag <old-name> <new-name>
Push an image docker image push <image-name>
Log in to a private registry docker login <private-registry-url>
Save image to file docker save -o <file-name> <image-name>
Load image from file docker load -i <file-name>
Let's practice!
Creating your own Docker images
Creating images with Dockerfiles
Starting a Dockerfile
A Dockerfile always starts from another image, specified using the FROM instruction.
FROM postgres
FROM ubuntu
FROM hello-world
FROM my-custom-data-pipeline
FROM postgres:15.0
FROM ubuntu:22.04
FROM hello-world:latest
FROM my-custom-data-pipeline:v1
Building a Dockerfile
Building a Dockerfile creates an image.
Naming our image
In practice we almost always give our images a name using the -t flag:
...
=> => writing image sha256:a67f41b1d127160a7647b6709b3789b1e954710d96df39ccaa21..
=> => naming to docker.io/library/first_image
Customizing images
RUN <valid-shell-command>
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
...
After this operation, 22.8 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Building a non-trivial Dockerfile
When building an image, Docker actually runs the commands after RUN.
When Docker runs RUN apt-get update, it takes as long as when we run it ourselves!
Summary
Usage Dockerfile Instruction
Start a Dockerfile from an image FROM <image-name>
Add a shell command to image RUN <valid-shell-command>
Make sure no user input is needed for the shell-command. RUN apt-get install -y python3
Let's practice!
Managing files in your image
COPYing files into an image
The COPY instruction copies files from our local machine into the image we're building:
If the destination path does not have a filename, the original filename is used:
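Here is a minimal sketch of both forms. It assumes a hypothetical build context that contains a pipeline_v3/ folder with pipeline.py and requirements.txt. COPY resolves source paths relative to the build context (the folder passed to docker build), not to the root of the host file system, so the sketch uses relative source paths:

```dockerfile
FROM ubuntu:22.04

# Destination includes a filename: the file is stored as /app/pipeline.py
COPY pipeline_v3/pipeline.py /app/pipeline.py

# Destination is a directory (trailing slash, no filename):
# the original filename is kept, giving /app/requirements.txt
COPY pipeline_v3/requirements.txt /app/
```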
COPYing folders
Not specifying a filename in the src-path copies all the contents of the folder.
/projects/
pipeline_v3/
pipeline.py
requirements.txt
tests/
test_pipeline.py
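Using the folder layout above, here is a sketch of copying the whole folder. It assumes the image is built from /projects/, so pipeline_v3/ is part of the build context:

```dockerfile
FROM ubuntu:22.04

# Copies the *contents* of pipeline_v3/ (not the folder itself) into /app/:
#   /app/pipeline.py
#   /app/requirements.txt
#   /app/tests/test_pipeline.py
COPY pipeline_v3/ /app/
```

To keep the folder name itself in the image, include it in the destination, for example COPY pipeline_v3/ /app/pipeline_v3/.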
Copy files from a parent directory
/init.py
/projects/
Dockerfile
pipeline_v3/
pipeline.py
Downloading files
Instead of copying files from a local directory, files are often downloaded in the image build:
Download a file
RUN rm <copy_directory>/<filename>.zip
Downloading files efficiently
Each instruction that downloads files adds to the total size of the image, even if the files are later deleted.
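Here is a sketch of the download, unzip, and clean-up steps combined in a single RUN instruction. The URL (https://example.com/data.zip) and the paths are hypothetical placeholders. The base ubuntu image does not include curl and unzip, so the sketch installs them first:

```dockerfile
FROM ubuntu:22.04

# Tools for the download step; -y avoids the interactive [Y/n] prompt
RUN apt-get update && apt-get install -y ca-certificates curl unzip

# Download, unzip, and delete the archive in ONE instruction, so the .zip
# file is never stored in any layer of the image.
# The URL and paths are placeholders for illustration only.
RUN curl -fsSL -o /tmp/data.zip https://example.com/data.zip \
    && unzip /tmp/data.zip -d /data \
    && rm /tmp/data.zip
```

Splitting these three commands over three RUN instructions would still build, but the layer created by the download step would keep the archive in the image.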
Summary
Usage Dockerfile Instruction
Copy files from host to the image COPY <src-path-on-host> <dest-path-on-image>
Copy a folder from host to the image COPY <src-folder> <dest-folder>
We can't copy from the parent directory of where we build a Dockerfile COPY ../<file-in-parent-directory> /
Keep images small by downloading, unzipping, and cleaning up in a single RUN instruction:
Let's practice!
Choosing a start command for your Docker image
What is a start command?
The hello-world image prints text and then stops.
What is a start command?
An image with Python could start Python on startup.
....
>>> exit()
repl@host:/#
Running a shell command at startup
CMD <shell-command>
Typical usage
Starting an application to run a workflow or that accepts outside connections.
CMD postgres
CMD start.sh
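Here is a sketch of an image whose start command opens Python. It assumes ubuntu:22.04 as the base image:

```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3

# Shell form: when a container starts, Docker runs this via /bin/sh -c.
# Only the last CMD in a Dockerfile takes effect.
CMD python3
```

The Python prompt needs an interactive terminal, so a container from this image would be started with docker run -it <image-name>. The container stops as soon as the python3 process exits.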
When will it stop?
Overriding the default start command
Starting an image
Summary
Usage Dockerfile Instruction
Add a shell command run when a container is started from the image. CMD <shell-command>
Let's practice!
Introduction to Docker layers and caching
Docker build
Downloading and unzipping a file using the Docker instructions.
/example_folder.zip
/example_folder/
example_file1
example_file2
Docker instructions are linked to file system changes
Each instruction in the Dockerfile is linked to the changes it made in the image file system.
FROM docker.io/library/ubuntu
=> Gives us a file system to start from with all files needed to run Ubuntu
Docker layers
Docker layer: All changes caused by a single Dockerfile instruction.
Docker image: All layers created during a build
--> Docker image: All changes to the file system by all Dockerfile instructions.
Docker caching
Consecutive builds are much faster because Docker re-uses layers that haven't changed.
Re-running a build:
Understanding Docker caching
Knowing when layers are cached helps us understand why images sometimes don't change after a rebuild.
Docker uses cached layers when the instructions are identical to those in previous builds.
Understanding Docker caching
Helps us write Dockerfiles that build faster because not all layers need to be rebuilt.
In the following Dockerfile, every instruction after FROM needs to be rebuilt if the pipeline.py file is changed:
FROM ubuntu
COPY /app/pipeline.py /app/pipeline.py
RUN apt-get update
RUN apt-get install -y python3
Understanding Docker caching
Helps us write Dockerfiles that build faster because not all layers need to be rebuilt.
In the following Dockerfile, only the COPY instruction will need to be re-run.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
COPY /app/pipeline.py /app/pipeline.py
Let's practice!
Changing users and working directory
Dockerfile instruction interaction
FROM, RUN, and COPY interact through the file system.
WORKDIR - Changing the working directory
Starting all paths at the root of the file system:
WORKDIR /home/my_user_with_a_long_name/work/projects/
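Here is a sketch of the difference, where test.txt is a hypothetical file in the build context:

```dockerfile
FROM ubuntu:22.04

# Without WORKDIR, every destination is spelled out from the root
COPY test.txt /home/my_user_with_a_long_name/work/projects/

# WORKDIR sets (and creates, if needed) the working directory for the
# instructions that follow; relative paths now resolve against it
WORKDIR /home/my_user_with_a_long_name/work/projects/
COPY test.txt .
```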
RUN in the current working directory
Instead of using the full path for every command:
RUN /home/repl/projects/pipeline/init.sh
RUN /home/repl/projects/pipeline/start.sh
Set the working directory with WORKDIR and use relative paths:
WORKDIR /home/repl/projects/pipeline/
RUN ./init.sh
RUN ./start.sh
Changing the startup behavior with WORKDIR
Instead of using the full path:
CMD /home/repl/projects/pipeline/start.sh
Set the working directory with WORKDIR and use a relative path:
WORKDIR /home/repl/projects/pipeline/
CMD ./start.sh
Linux permissions
Permissions are assigned to users.
Root is a special user with all permissions.
Best practice
Use root to create new users with permissions for specific tasks.
Changing the user in an image
Best practice: Don't run everything as root
Ubuntu -> root by default
Changing the user in a container
Dockerfile setting the user to repl:
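Here is a sketch of such a Dockerfile. It assumes an Ubuntu base image in which no repl user exists yet. Instructions that need root, such as installing packages or creating the user, come before USER:

```dockerfile
FROM ubuntu:22.04

# Still running as root (the Ubuntu default): install software
# and create a regular user named repl with a home directory
RUN apt-get update && apt-get install -y python3
RUN useradd -m repl

# Every following RUN, CMD, and ENTRYPOINT instruction runs as repl,
# and so does a container started from this image
USER repl
CMD python3
```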
Summary
Usage Dockerfile Instruction
Change the current working directory WORKDIR <path>
Change the current user USER <user-name>
Time for practice!
Variables in Dockerfiles
Variables with the ARG instruction
Create variables in a Dockerfile
ARG <var_name>=<var_value>
Use a variable with $<var_name>, for example: $path
Use-cases for the ARG instruction
Setting the Python version
FROM ubuntu
ARG python_version=3.9.7-1+bionic1
RUN apt-get install -y python3=$python_version
RUN apt-get install -y python3-dev=$python_version
Configuring a folder
FROM ubuntu
ARG project_folder=/projects/pipeline_v3
COPY /local/project/files $project_folder
COPY /local/project/test_files $project_folder/tests
Setting ARG variables at build time
FROM ubuntu
ARG project_folder=/projects/pipeline_v3
COPY /local/project/files $project_folder
COPY /local/project/test_files $project_folder/tests
Variables with ENV
Create variables in a Dockerfile
ENV <var_name>=<var_value>
Use a variable with $<var_name>, for example: $DB_USER
Use-cases for the ENV instruction
Setting a directory to be used at runtime
ENV DATA_DIR=/usr/local/var/postgres
1 https://hub.docker.com/_/postgres
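Here is a sketch showing both places where ENV variables are available. The variable DB_USER and its value pipeline_user are hypothetical:

```dockerfile
FROM ubuntu:22.04

ENV DATA_DIR=/usr/local/var/postgres
# Hypothetical value for illustration
ENV DB_USER=pipeline_user

# Available to later instructions during the build...
RUN mkdir -p $DATA_DIR

# ...and at runtime: the shell expands the variables when the container starts
CMD echo "Connecting as $DB_USER, storing data in $DATA_DIR"
```

ARG values, by contrast, are not set in a container started from the image.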
Secrets in variables are not secure
docker history <image-name>
ARG DB_PASSWORD=example_password
Summary
Usage Dockerfile Instruction
Create a variable accessible only during the build ARG <name>=<value>
Create a variable accessible during the build and at runtime ENV <name>=<value>
Let's practice!
Creating Secure Docker Images
Inherent Security
Making secure images
Images from a trusted source
Creating secure images -> Start with an image from a trusted source
Docker Hub filters:
Keep software up-to-date
Keep images minimal
Adding unnecessary packages reduces security. Installing only essential packages improves security.
Ubuntu with: Ubuntu with:
Python2.7 Python3.11
Java default-jre
Java openjdk-11
Java openjdk-8
Airflow
Our pipeline application
Don't run applications as root
Allowing root access to an image defeats the purpose of keeping the image up-to-date and minimal.
Let's practice!
Wrap-up
Chapter 1: The theoretical foundation
Chapter 2: The Docker CLI
Usage Command
Start a container docker run (--name <container-name>) (-it) (-d) <image-name>
List running containers docker ps (-f "name=<container-name>")
Stop a container docker stop <container-id>
See (live) logs for container docker logs (-f) <container-id>
Remove stopped container docker container rm <container-id>
Pull a specific version of an image docker pull <image-name>:<image-version>
List all local images docker images
Remove an image docker image rm <image-name>
Chapter 3: Dockerfiles
FROM ubuntu
RUN apt-get update && apt-get install -y python3
COPY /projects/pipeline /app/
CMD /app/init.py
Chapter 4: Security and Customization
Usage Dockerfile Instruction
Change the current working directory WORKDIR <path>
Change the current user USER <user-name>
Create a variable accessible only during the build ARG <name>=<value>
Create a variable accessible during the build and at runtime ENV <name>=<value>
Chapter 4: Security and Customization
Isolation provided by containers gives security but is not perfect.
Use the "Trusted Content" images from the official Docker Hub registry
Only install the software you need for the current use case.
What more is there to learn?
Dockerfile instructions:
ENTRYPOINT
HEALTHCHECK
EXPOSE
Multi-stage builds:
FROM ubuntu AS stage1
RUN generate_data.py
...
FROM postgres AS stage2
...
COPY --from=stage1 /tmp /data
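Here is a small, buildable sketch of a multi-stage build. The data-generation step is a hypothetical stand-in for generate_data.py:

```dockerfile
# Stage 1: a temporary environment used only to produce files
FROM ubuntu:22.04 AS stage1
# Hypothetical stand-in for a real data-generation script
RUN mkdir -p /tmp/output && echo "id,value" > /tmp/output/data.csv

# Stage 2: the final image; only what is copied from stage1 ends up in it
FROM postgres:15.0 AS stage2
COPY --from=stage1 /tmp/output /data
```

Everything else from stage1, including its Ubuntu layers, is left out of the final image.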
Thank you!