vol. 4
NETWORKING,
SECURITY
& STORAGE
WITH
DOCKER
& CONTAINERS
EDITED & CURATED BY ALEX WILLIAMS
The New Stack:
The Docker and Container Ecosystem Ebook Series
Contributors:
Judy Williams, Copy Editor
Networking
Security
Storage
Disclosures
Security has long been cited as a barrier to entry for containers. This ebook explains how containers can facilitate a more secure environment, and it addresses storage strategies with the intent to show some of the patterns that have worked for others implementing container storage.
Networking, security and storage are all topics with broad and deep
subject matter. Each of these topics deserves a full book of its own, but
setting the stage in this initial ebook on these topics is an important
exercise. The container ecosystem is becoming as relevant for operations
teams as it is for developers who are packaging their apps in new ways.
This combined interest has created a renaissance for technologists, who
have become the central players in the emergence of new strategic
thinking about how developers consume infrastructure.
There are more ways than one to skin a cat, and while we try to educate
on the problems, strategies and products, much of this will be quickly
outgrown. In two years’ time, many of the approaches to networking,
security and storage that we discuss in the ebook will not be as relevant.
But the concepts behind these topics will remain part of the conversation.
Containers will still need to communicate with each other securely, and their data will still need to outlive any single container.
Thanks,
Benjamin Ball
Technical Editor and Producer
The New Stack
Jason McGee
The discussion extends into the plugin ecosystem for Docker, and how plugins broaden the networking choices available to container deployments.
Phil Estes
This article is split into two primary areas of focus: the types of container networking and the emerging specifications for configuring them. Networking starts with connectivity. Part one starts with the various ways
in which container-to-container and container-to-host connectivity is
provided.
• Container-mapped
• None
• Bridge
• Host
• Overlay
• Underlay
• Direct routing
• Fan networking
• Point-to-point
For the second half of this article, we look at two container networking specifications: the container network model (CNM) and the container network interface (CNI). The types of networking covered vary greatly.
Some types are container engine-agnostic, and others are locked into a specific engine or vendor.
Container-Mapped Networking
In this mode of networking, one container reuses (maps to) the
networking namespace of another container. This mode of networking
may only be invoked when running a docker container like this:
--net=container:some_container_name_or_id.
This flag tells Docker to place the new container's processes
inside of the network stack that has already been created inside of
another container. While sharing the same IP and MAC address and port numbers as the first container, processes
on the two containers will be able to connect to each other over the
loopback interface.
The defining characteristics of this mode, such as sharing the same IP address, are inherent to the notion of containers
running in the same pod, which is the behavior of rkt containers.
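As a concrete illustration, here is a minimal sketch of container-mapped networking using the Docker CLI; the image and container names are only placeholders.

# Start a container that owns the network namespace.
docker run -d --name web nginx

# Start a second container mapped into web's network stack.
# Both containers share the same IP, MAC and port space, and can
# reach each other over the loopback interface.
docker run --rm --net=container:web alpine wget -qO- http://127.0.0.1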
None
None is straightforward in that the container receives a network stack, but
lacks an external network interface. It does, however, receive a loopback
interface. Both the rkt and docker container projects provide similar
behavior when none or null networking is used. This mode of container
networking has a number of uses including testing containers, staging a
container for a later network connection, and being assigned to
containers with no need for external communication.
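A quick sketch of the none network type; the container receives only a loopback interface until a network is attached some other way.

# No external interface is created; only loopback is present.
docker run --rm --net=none alpine ip addr show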
Bridge
A Linux bridge provides a host-internal network in which containers on the
same host may communicate, but the IP addresses assigned to each
container are not accessible from outside the host. Bridge networking
leverages iptables for NAT and port-mapping, which provide single-
host networking. Bridge networking is the default Docker network type
(i.e., docker0), where one end of a virtual network interface pair is
connected between the bridge and the container.
iptables rules with NAT are used to map between each container's private address
and the host's public interface.
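A short sketch of default bridge networking with a port mapping; the host port 8080 is an arbitrary choice.

# The default bridge network (docker0) is used when no --net flag is given.
# -p publishes container port 80 on host port 8080 via iptables NAT rules.
docker run -d --name web -p 8080:80 nginx

# Inspect the bridge network and the container's private IP address.
docker network inspect bridge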
Host
In this approach, a newly created container shares its network namespace
with the host. The container has access to all of the host's network interfaces, but unless
it runs in privileged mode, it may not reconfigure the host's
network stack.
Host networking is the default type used within Mesos. In other words, if
the framework does not specify a network type, a new network
namespace will not be associated with the container, but with the host
network. Sometimes referred to as native networking, host networking is
conceptually simple, making it easier to understand, troubleshoot and
use.
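A minimal example of host networking; note the absence of port mapping, since the container binds directly to the host's interfaces.

# The container shares the host's network namespace; nginx binds
# directly to port 80 on the host, so no -p mapping is needed.
docker run -d --name web --net=host nginx

# The container sees the host's interfaces.
docker exec web cat /proc/net/dev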
Overlay
Overlays use networking tunnels to deliver communication across hosts.
This allows containers to behave as if they are on the same machine by
tunneling network subnets from one host to the next; in essence,
spanning one network across multiple hosts. Many tunneling technologies
exist, such as virtual extensible local area network (VXLAN).
VXLAN has been the tunneling technology of choice for Docker libnetwork,
whose multi-host networking entered as a native capability in the 1.9
release. With the introduction of this capability, overlay networking became something Docker Engine could provide out of the box rather than only through third-party tooling.
For those needing support for other tunneling technologies, Flannel may
be the way to go. It supports udp, vxlan, host-gw, aws-vpc or gce.
Each of the cloud provider tunnel types creates routes in the provider’s
routing tables, just for your account or virtual private cloud (VPC). The
support for public clouds is particularly key for overlay drivers given that,
among other scenarios, overlays best address hybrid cloud use cases and provide
scaling and redundancy without having to open public ports.
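A sketch of creating an overlay network with Docker's built-in VXLAN driver; this assumes the hosts are already joined in a swarm, and the network and service names are illustrative.

# Create a multi-host overlay network backed by VXLAN tunnels.
docker network create -d overlay my-overlay

# Attach a service to it; its tasks can reach each other across hosts.
docker service create --name web --network my-overlay nginx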
Underlays
Underlay network drivers expose host interfaces (i.e., the physical network
interface at eth0) directly to containers or VMs running on the host. Two
such underlay drivers are media access control virtual local area network
(MACvlan) and internet protocol vlan (IPvlan). The operation of and the
behavior of MACvlan and IPvlan drivers are very familiar to network
engineers. Both network drivers are conceptually simpler than bridge
networking and remove the need for port mapping. Moreover, IPvlan has an L3 mode that resonates well with many network engineers.
MACvlan
MACvlan allows creation of multiple virtual network interfaces behind the
host’s single physical interface. Each virtual interface has unique MAC and
IP addresses assigned, with a restriction: the IP addresses need to be in
the same broadcast domain as the physical interface. While many
network engineers may be more familiar with the term subinterface (not
to be confused with a secondary interface), the parlance used to describe
MACvlan virtual interfaces is typically upper or lower interface.
MACvlan uses a unique MAC address per container, and this may cause
issues with network switches that have security policies in place to prevent
MAC spoofing by allowing only one MAC address per physical switch
interface.
MACvlan networking completely isolates the host from the containers it runs: the host
cannot reach the containers, and the containers cannot reach the host. This
is useful for service providers or multi-tenant scenarios, and has more
isolation than the bridge model.
Underlay drivers and protocols were designed with on-premises use cases in mind. Your public
cloud mileage will vary, as most providers do not support promiscuous mode on
their VM interfaces.
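A sketch of a MACvlan network; the subnet, gateway and parent interface are placeholders that must match your physical network.

# Each container on this network gets its own MAC address and an IP
# from the physical subnet, reachable like any other host on the LAN.
docker network create -d macvlan \
    --subnet=192.168.1.0/24 \
    --gateway=192.168.1.1 \
    -o parent=eth0 macnet

docker run -d --name web --net=macnet nginx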
IPvlan
IPvlan is similar to MACvlan in that it creates new virtual network
interfaces, but every virtual interface shares the same MAC address as the physical interface. This behavior is useful where switch port security or MAC address limits make a unique MAC per container impractical.
Configuration created by hand with ip link is
ephemeral, so most operators use network startup scripts to persist it. As container engines expose networking through their APIs, this workflow
stands to improve. For example, when new VLANs are created on a top-of-rack switch, these VLANs may be pushed into Linux hosts via the exposed
container engine API.
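A comparable sketch for IPvlan in L3 mode; as above, the addressing and parent interface are placeholders, and note that the ipvlan driver was experimental in early Docker releases.

# All containers share the parent interface's MAC address; traffic is
# routed (L3) rather than switched, so no broadcast domain is shared.
docker network create -d ipvlan \
    --subnet=10.10.10.0/24 \
    -o parent=eth0 \
    -o ipvlan_mode=l3 ipnet

docker run -d --name web --net=ipnet nginx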
Direct Routing
For the same reasons that IPvlan L3 mode resonates with network
engineers, they may choose to push past L2 challenges and focus on
addressing network complexity in Layer 3 instead. This approach
benefits from routing protocols and operational practices that teams already understand.
Project Calico is one such project; it uses BGP to distribute routes for the containers' IP addresses to the hosts that need to reach them.
Fan Networking
Fan networking is a way of gaining access to many more IP addresses,
expanding from one assigned IP address to 250 IP addresses. This is a
performant way of getting more IPs without the need for overlay
networks. This style of networking is particularly useful when running
containers in a public cloud, where a single IP address is assigned to a
host and spinning up additional networks is prohibitive, or running
another load-balancer instance is costly.
Point-to-Point
Point-to-point is perhaps the simplest type of networking, and the default
networking used by CoreOS rkt. Using NAT, or IP Masquerade (IPMASQ), by
default, it creates a virtual ethernet pair, placing one on the host and the
other in the container pod. Point-to-point networking leverages iptables
for internal communication between other containers in the pod over the
loopback interface.
Capabilities
Outside of pure connectivity, support for other networking capabilities
and network services needs to be considered. Many modes of container
networking either leverage NAT and port-forwarding or intentionally avoid
their use. IP address management (IPAM), multicast, broadcast, IPv6 support and load
balancing are all additional capabilities to consider. The scarcity of IP addresses
in top public clouds reinforces the need for other networking types, such
as overlays and fan networking.
There are two proposed specifications for configuring network interfaces for
Linux containers: the container network model (CNM) and the container
network interface (CNI). As stated above, networking is complex and there
are many ways to deliver functionality. Arguments can be made as to
which one is easier to adopt than the next, or which one is less tethered to
its benefactor's technology.
FIG 1: The container network model (CNM): the Docker runtime connects containers to one or more networks (Network 1, Network 2).
FIG 2: Native and remote (plugin) network drivers in libnetwork.

Network drivers come in two flavors: native drivers and remote drivers (plugins). The native drivers are none, bridge, overlay and MACvlan; remote drivers come from third-party plugins.

• Network: a group of endpoints that are able to communicate with each other.

Labels can be passed with --label between the runtime, libnetwork and drivers. Labels are powerful in that the runtime may use them to inform driver behavior.
Multiple plugins may be run at one time, with a container joining networks driven by different plugins.
CNI Flow
Mesos is the latest project to add CNI support, and there is a Cloud
Foundry implementation in progress. The current state of Mesos
networking uses host networking, wherein the container shares the same
IP address as the host. Mesos is looking to provide each container with its
own network namespace, and consequently, its own IP address. The
project is moving to an IP-per-container model and, in doing so, seeks to
democratize networking such that operators have freedom to choose the
style of networking that best suits their purpose.
Currently, CNI primitives handle concerns with IPAM, L2 and L3, and
expect the container runtime to handle port-mapping (L4). From a Mesos
perspective, this minimalist approach comes with a couple of caveats.
Both models provide separate extension points, also known as plugin interfaces.
CNM does not provide network drivers access to the container's network namespace.
CNI supports integration with third-party IPAM and can be used with any
container runtime; CNM is designed to support the Docker runtime engine
only. With CNI's simpler approach, it's been argued that it is
comparatively easier to create a CNI plugin than a CNM plugin.
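For contrast, here is a minimal sketch of what a CNI configuration might look like, written out from a shell; the values follow the CNI bridge plugin's documented format but the name, subnet and paths are illustrative.

# CNI plugins are configured with small JSON files; the runtime invokes
# the named plugin binary with ADD/DEL operations for each container.
mkdir -p /etc/cni/net.d
cat > /etc/cni/net.d/10-mynet.conf <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
EOF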
Kuryr
Kuryr, a project providing container networking, currently works as a
remote driver for libnetwork to provide networking for Docker using
Neutron as a backend network engine. Support for CNM has been
delivered and the roadmap for this project includes support for CNI.
Magnum
Magnum, a project providing Containers as a Service (CaaS) and
leveraging Heat to instantiate clusters running other container
orchestration engines, currently uses non-Neutron networking options for
containers.
Summary
We discussed a number of considerations for choosing which type of container networking to use, along with the capabilities and emerging specifications that shape that choice.
Ken Owens
The critical architectural distinction for the Docker scheme concerns just
what part is being extended. In the Docker architecture, the daemon of
Docker Engine runs on the host server where the applications are run, and networking had to be decoupled from that monolithic
system: "We had to externalize it, and plugins are the way we're doing that."
Then CoreOS produced its Container Network Interface (CNI). It's more
rudimentary than CNM, in that it only has two commands: add a
container to a network and remove it. Kubernetes chose to integrate
with CNI. As a result, Flannel and Weave Net have been implemented as
Kubernetes plugins using CNI.
One conclusion is that which model you choose will depend, perhaps entirely, on
how much integration you require between containers and pre-existing
workloads.
“If your job previously ran on a VM, and your VM had an IP and could talk
to the other VMs in your project,” explained ClusterHQ Senior Vice
President of Engineering and Operations Sandeepan Banerjee, “you are
“If that is not a world that you are coming from,” he continued, “and you
want to embrace the Docker framework as something you see as
then the Docker proposal is powerful, with merits, with probably a lot
more tunability overall.”
Mesos extends its networking by way of CNI. At the time of this writing, Mesosphere
had published a document stating its intent to implement CNI support in its platform. One assumption in the industry has been that plugins written for
Docker will become the universal plugins. "And I think what you're seeing in
the industry already today is, that's not the case."
Overlay systems encapsulate container traffic and route it to the proper destination IPs. This accomplishes the cross-cloud scope that operators are looking for.
Hindman told us he sees value in how Flannel, Weave, and other network
overlay systems solve the problem of container networking, at a much
higher level than piecing together a VXLAN. The fact that such an
alternative would emerge, he said, “is just capturing the fact that we, as an
industry, are sort of going through and experimenting with a couple of
different approaches. We will settle on a handful of things, and overlays are still going to be there. But
there are going to be some other ways in which people link together" containers and services.

Many of the organizations evaluating these technologies don't quite understand the concepts behind the processes they are
asked to invest both their faith and their capital expense in. Some might
think this is a watering down of the topic. In truth, integration is an
elevation of the basic idea to a shared plane of discussion. Everyone
understands the basic need to make old systems coexist, interface and
communicate with new ones. So even though the methodologies may
seem convoluted or impractical in a few years, the inspiration behind
working toward a laudable goal will have made it all worth pursuing.
John Morello
The first step is to investigate the security gains that can be achieved through the
typical container deployment workflow. Developers will push to the continuous integration (CI) system, which will build and
test the images. The image will then be pushed to the registry. It is now
ready for deployment to production, which will typically involve an
orchestration system such as Docker’s built-in orchestration, Kubernetes,
Mesos, etc. Some organizations may instead push to a staging
environment before production.
FIG 1: A typical container deployment workflow: the developer commits code, the CI/CD system builds and tests images, signed images are sent to the registry, and the production environment pulls the latest stable signed images, with feedback loops back to the developer and triggers for updates.
Image Provenance
The gold standard for image provenance is Docker Content Trust. With
Docker Content Trust enabled, a digital signature is added to images
before they are pushed to the registry. When the image is pulled, Docker
Content Trust will verify the signature, thereby ensuring the image comes
from the correct organization and the contents of the image exactly
match the image that was pushed. This ensures attackers did not tamper
with the image, either in transit or when it was stored at the registry.
A related approach is to pull images by digest, a cryptographic hash that exactly
matches the image. Unlike tags, a digest will always point to exactly the
same image; any update to the image will result in the generation of a new
digest. The problem with using digests is organizations need to set up a
proprietary system for automatically extracting and distributing them.
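A short sketch of both approaches; the digest value below is a placeholder, not a real hash.

# Enable Docker Content Trust for this shell; pushes are signed and
# pulls are verified against the publisher's keys.
export DOCKER_CONTENT_TRUST=1
docker pull alpine:3.4

# Alternatively, pin the exact image contents by pulling by digest.
# Replace the placeholder with the digest reported by your registry.
DIGEST=sha256:0000000000000000000000000000000000000000000000000000000000000000
docker pull "alpine@${DIGEST}"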
Security Scanning
Security scanning of container images is now offered by several companies. The basic idea is simple: take a Docker image and
cross-reference its contents against a database of known vulnerabilities.
Some scanners, including Clair, will just interrogate the package manager
for the software it installed, while others look deeper, beyond package
boundaries.
Most images have some vulnerabilities, so insisting on a completely clean scan isn't a realistic option. For example,
at the time of writing, a popular official base image reported a possible
vulnerability due to the version of Perl used (most other images have
similar findings). A better goal is to minimize the amount of software, and hence the attack surface, inside the image. One way
to do this is to use a very small base image, such as Alpine, which comes
in at only 5MB. Another, somewhat extreme, possibility is to build a
statically linked binary and copy it on top of the empty "scratch" image.
That way there are no OS level vulnerabilities at all. The major
disadvantage of this approach is that building and debugging become more complicated.
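A sketch of the "scratch" approach using a statically linked binary; the Go toolchain and the program name are assumptions made for illustration.

# Build a statically linked binary on the host (Go makes this easy),
# then copy it into an otherwise empty image.
CGO_ENABLED=0 go build -o myapp .

cat > Dockerfile <<'EOF'
FROM scratch
COPY myapp /myapp
ENTRYPOINT ["/myapp"]
EOF

docker build -t myapp:scratch .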
Auditing
Auditing directly follows security scanning and image provenance. At any
point in time, we would like to be able to see which images are running in
production and which version of the code they are running. In particular, it
is important to identify containers running out-of-date, potentially
vulnerable images.
Note that it isn’t enough to scan images before they are deployed. As new
vulnerabilities are reported, images with a previous clean bill of health will
With reference to containers, this means that each container should run
A large and easy win for security is to run containers with read-only
when the container starts should not run as root. If it does, any attacker
who compromises the process will have root-level privileges inside the
container. Much worse, as users are not namespaced by default, should
the attacker manage to break out of the container and onto the host, they
might be able to get full root-level privileges on the host.
Dockerfiles should create a non-privileged user and switch to it before executing the main process. Since Docker 1.10,
there has been optional support for enabling user namespacing, which
automatically maps the user in a container to a high-numbered user on
the host. This works, but currently has several drawbacks, including complications with file ownership on shared volumes.
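A sketch of both techniques; the user, group and image names are arbitrary.

# In the Dockerfile: create an unprivileged user and switch to it
# before the main process starts.
cat > Dockerfile <<'EOF'
FROM alpine:3.4
RUN addgroup -S app && adduser -S -G app app
USER app
CMD ["sleep", "3600"]
EOF

# On the daemon side, user namespace remapping (Docker 1.10+) maps
# root inside containers to a high-numbered user on the host.
dockerd --userns-remap=default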
To limit which kernel calls are allowed, Docker now has seccomp support for
specifying exactly which calls can be used, and ships with a default
seccomp policy that has already been shown to be effective at mitigating
problems in the Linux kernel. The main problem with both approaches is
the effort required to build and maintain appropriate profiles for each application. Providers of runtime threat
detection and response include Aqua Security, Joyent Triton SmartOS and
Twistlock. As more mission-critical applications move to containers,
automating runtime threat detection and response will be increasingly
important to container security. The ability to correlate information,
analyze indicators of compromise, and manage forensics and response
actions, in an automated fashion, will be the only way to scale up runtime
security for a containerized world.
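Pulling several of these runtime restrictions together, a hardened docker run might look like the following sketch; the profile path and image name are placeholders.

# Read-only root filesystem, all capabilities dropped, a custom seccomp
# profile, and a non-root user: each option narrows the attack surface.
docker run -d \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt seccomp=/path/to/profile.json \
    --user 1000 \
    myapp:latest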
Access Controls
The Linux kernel has support for security modules that can apply policies
prior to the execution of kernel calls. The two most common security
modules are AppArmor and SELinux, both of which implement what is
known as mandatory access control (MAC). MAC will check that a user or
process has the rights to perform various actions, such as reading and writing files.
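As a brief sketch, such a profile can be applied per container at run time; the profile name here is a placeholder for one you have already loaded on the host.

# Run a container under a specific AppArmor profile (the profile must
# already be loaded on the host, for example with apparmor_parser).
docker run -d --security-opt apparmor=my-app-profile myapp:latest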
Further to the topic of access control, it's important to note that anyone
with access to the Docker daemon effectively has root access to the host, and that images may contain setuid (privilege-raising
upon execution) binaries that can be copied back to the host. In most
situations, this is just something to be aware of, but some organizations will want to restrict it.
Conclusion
It is essential for organizations to consider security when implementing a
container-based workflow.
Containers and the golden image approach enable new ways of working
and tooling, especially around image scanning and auditing. Organizations gain much better visibility into what is running in
production and can much more easily and quickly react to vulnerabilities.
Updated base images can be tested, integrated and deployed in minutes.
Image signing validates the authenticity of containers and ensures that
attackers have not tampered with their contents.
Bryan Cantrill
One of the goals for Docker was to avoid the pitfall illustrated by "VM
sprawl." The only way Docker could avoid the trap of fragmented images
was to keep images layered and reusable, and to create isolation between the host system and containers. This isolation
was core to the security of containerized applications. To meet these
requirements, Docker adopted a layered, copy-on-write architecture for the
images and containers.
Docker containers are well suited to stateless applications that can rapidly scale. Since they are self-contained, Docker containers can be scheduled on any available host.
When a Docker image is pulled from the registry, the engine downloads all
the dependent layers to the host. When a container is launched from a
downloaded image comprised of many layers, Docker uses the copy-on-write
mechanism: the image layers remain read-only, and anything the container writes goes into a thin,
topmost working layer. All other processes using the original image's
layers will continue to access the read-only, original version of the layer.
This technique optimizes both image disk space usage and the
performance of container start times.
When the container is deleted, all of the data written to the container is deleted along with it.
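A quick way to see the copy-on-write layer in action; the container name is arbitrary.

# Start a container and write a file into its top, writable layer.
docker run -d --name cow-demo alpine sh -c 'echo hello > /demo.txt && sleep 3600'

# docker diff lists only what changed relative to the read-only image layers.
docker diff cow-demo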
Host-Based Persistence
Host-based persistence is one of the early implementations of data
durability in containers, which has matured to support multiple use cases.
In this architecture, containers depend on the underlying host for persistent storage.
Data volumes are directly accessible from the Docker host. This means
you can read and write to them with normal Linux tools. In most cases
you should not do this, as it can cause data corruption if your containers
and applications are unaware of your direct access.
FIG 1: Docker-managed volumes: the host stores volume content in randomly named directories under /var/lib/docker/volumes, which are mounted into one or more containers.
FIG 2: Explicit shared storage (bind mounts): a host directory such as /opt/web is mounted into multiple containers (Container 1, Container 2) at a path like /web.
The data will persist even after removing containers or stopping Docker Engine. Since this shared mount point is fully outside the
control of Docker Engine's storage backend, it is not part of the layered, union filesystem.

This technique is the most popular one used by DevOps teams. Referred
to as data volumes, it comes with a few properties worth noting:

• Changes applied to a data volume will not be included when the image
gets updated.

The downside of host-based persistence is that it makes containers nonportable. The data residing on the host will not move along
with the container, which creates a tight bind between the host and
container.
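A quick sketch of both forms of host-based persistence; the paths, names and password are illustrative.

# A Docker-managed data volume, stored under /var/lib/docker/volumes.
docker volume create --name dbdata
docker run -d --name db -e MYSQL_ROOT_PASSWORD=example -v dbdata:/var/lib/mysql mysql

# An explicit bind mount of a host directory into the container.
docker run -d --name web -v /opt/web:/usr/share/nginx/html:ro nginx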
FIG 3: A distributed filesystem mounted at the same path (e.g., /opt/web) on multiple hosts, giving containers on each host a shared view of the data. Source: Janakiram MSV.
A distributed filesystem can be combined with the explicit storage technique. Since the mount point is
available on all nodes, it can be leveraged to create a shared mount point
among containers.

• Creating volumes: volumes can be created independently of
containers, which results in the Docker Engine creating a designated location for the volume data.

• Deleting volumes: volumes are not deleted when
removing associated containers; they will need to be manually deleted
by the operations team.

• In multi-host environments, the same mount point needs to be available on every host for containers to reach their data consistently.
Volume Plugins
Although host-based persistence is a valuable addition to Docker for many scenarios, it ties the data to a single host. Volume plugins go further, connecting Docker volumes to external storage backends.

FIG 4: Volume plugins connect Docker Engine to multiple storage backends (Storage Backend 1, Storage Backend 2, Storage Backend 3 and so on).
Most volume plugins ship with their own command line tooling for managing the lifecycle of the volume. The operations supported by these tools go beyond the standard
tasks that the Docker CLI can perform.
The volume plugin clients enable tasks such as creating, mounting, unmounting and removing volumes as part of the
lifecycle management.

FIG 5: Volume lifecycle operations exposed through a volume plugin.
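A sketch of using a volume plugin through the standard Docker CLI; the driver name examplefs is a placeholder for whichever plugin is actually installed.

# Create a volume backed by an external storage system via its plugin.
docker volume create -d examplefs --name shared-data

# Any container on any host with the plugin installed can mount it.
docker run -d -v shared-data:/data alpine sleep 3600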
Such a volume can be used with any container within the cluster. The plugin manages Docker containers
and data volumes together, enabling the volumes to follow the containers as they move between hosts.
There are several categories of storage providers. While there are a few dozen entities delivering storage
solutions for the container ecosystem, we will explore some of the
prominent players from each category.
The rise of containers in the enterprise has led to the creation of a new class
of storage optimized for containerized workloads. Existing storage
technologies, such as network-attached storage (NAS) and storage area
network (SAN), were not designed to run containerized applications.
Container-focused storage platforms can, for example, keep frequently accessed data
on faster SSDs while moving the archival data to magnetic disks. This
delivers the right level of performance for demanding workloads, such as online
transaction processing, while making efficient use of the underlying infrastructure.
Public cloud providers offer object storage services, such as Amazon S3, IBM Bluemix Object Storage
and Joyent Manta, as well as block storage devices, such as Elastic Block
Store (EBS) volumes or Google Compute Engine (GCE) persistent disks. To enable
easy integration with the infrastructure, these cloud providers are
investing in storage drivers and plugins that bring persistence to
containers. DevOps teams can host image registries in the public cloud
backed by object storage. Block storage devices deliver performance and
durability to workloads.
Docker announced native support for AWS and Azure. This will accelerate
the development and optimization of storage drivers for object and block
storage.
Summary
Storage is one of the key building blocks of a viable enterprise container
infrastructure. Though Docker made it easy to add persistence based on
data volumes, the ecosystem is taking it to the next level. Volume plugins
are a major step towards integrating containers with some of the latest
innovations in the storage industry. Providers are making it possible to tap
into the power of enterprise storage platforms.
Public cloud providers with robust storage infrastructure are getting ready
for container workloads moving to the cloud. Containers as a Service will also drive the demand for native drivers
for public cloud storage.
Hari Krishnan and Harmeet Sahni
Imagine you have already run containers at full scale on your hardware. For running a multitier application, you spent
time on using a service discovery mechanism for your application
containers. You have a logging mechanism that pulls the information out
of each container and ships it to a server to be indexed. Using a
monitoring tool that is well suited for this era of disposable machines,
you see an aggregate of your monitoring data, giving you a view of
the data grouped around container roles. Everything falls nicely into place.
You’re ready to take this to the next level by connecting your pipeline to
production. The production environment is where the containers will see
the most entropy. Rolling containers into production requires that you
spend your time building a canary release system to implement a rolling
upgrade process. Every change travels neatly from the development
environment to your production environment, shutting down one
container at a time and replacing them with a brand new version of your
code. This is what usually comes to mind when we talk about adopting
containers at a high level.
However, to the true practitioner, this is the tip of the iceberg. Doing
everything mentioned earlier still does not guarantee a perfect
environment for your containers. There's still potential to have your plans upended by the details of networking, storage and security in production.
Container Networking
Containers do not live in isolation; they need to connect with other
services. Containers need to be discoverable and available for connection.
If you have just started using containers in production, there are some key
questions that need answering to help stabilize your approach:
• Which networking option fits your deployment scenario? Should you use a bridge, overlay, underlay or another
networking approach?
Applications inside containers should not be hard-wired to a
default port. For example, when running a Tomcat container for a Java
application, the server and Apache JServ Protocol (AJP) port numbers
could be supplied at runtime using operating system (OS) environment
variables.
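A sketch of supplying ports at run time through environment variables; this assumes an image (here called my-tomcat-app) whose startup scripts read HTTP_PORT and AJP_PORT and template them into server.xml.

# Ports are injected at run time rather than baked into the image.
docker run -d \
    -e HTTP_PORT=8081 \
    -e AJP_PORT=8010 \
    -p 8081:8081 \
    my-tomcat-app:latest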
This keeps containers from binding to one consistent port on the host, and will allow multiple containers of the same kind to run side by side.
The host network also prevents the constant change of iptables rules, which is
common with the bridge network; you would not want that churn
in a production environment where iptables could also be used for other purposes, such as firewall rules.

The bridged network is commonly used in development
and testing environments to allow multiple concurrent containers of the
same kind to run on a set of shared hosts. Port mappings are the way to
allow the bridged containers to be accessed by end users.
The pod model groups containers into networks that share the same IP address. This is
useful for grouping application services in containers that usually work
together.
Isolating Networks
Container Storage
Containers have become an essential component of the continuous
delivery (CD) story for most organizations. Moreover, containers are the
best packaging means for microservices. Running containers in production also means deciding how their data will be stored.

Lightweight kernels and base images have emerged over the last few years, purpose-built
for the demands of containerized applications. This rise is due to the fact that a
lightweight container image contributes to a faster deployment, which
then leads to rapid feedback loops for developers. Teams running containers in production should be asking:
• How do you select the right persistent storage backend for containers?
OverlayFS is simpler and has shown better performance
compared to AUFS; however, it is important to have it tested for a
certain period of time before moving to production, primarily due to its relative newness. Docker supports OverlayFS as a storage driver, and
the container runtime rkt also uses it on newer Linux kernels.
If you choose to use Device Mapper, and are running multiple containers
on a host, it is preferred to use real block devices for data and metadata.
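A sketch of checking and selecting the storage driver; the thin pool device path is a placeholder for your own LVM setup, not a recommendation for every environment.

# See which storage driver the daemon is currently using.
docker info --format '{{.Driver}}'

# Example daemon.json for devicemapper backed by a real thin pool.
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": ["dm.thinpooldev=/dev/mapper/docker-thinpool"]
}
EOF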
Files created in a shared volume are owned by the user inside the container. This could be painful to manage if the shared
volume is also used by other containers, or by processes on the host, running as different users.

A common practice is to combine related build commands into one line, making them available as one layer when building container
images.
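A sketch of the single-layer pattern; the packages are arbitrary examples.

cat > Dockerfile <<'EOF'
FROM alpine:3.4
# Chain related commands into one RUN instruction so they produce a
# single layer, and clean up in that same layer.
RUN apk add --no-cache curl ca-certificates
EOF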
When using lightweight base images like Alpine for Docker, you need to
make sure your build steps comply with the apk package manager, which may have fewer packages available, or different package names, than you are used to.
Most host housekeeping mechanisms need to watch out for old container
images that have been around for a while and do not need to be on the
host. If your continuous delivery pipeline has been active, putting out tens of image versions, stale images will accumulate quickly and should be cleaned up regularly.
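A sketch of routine housekeeping; docker image prune and docker system df require Docker 1.13 or later, and on older engines an equivalent can be scripted with docker rmi.

# Show how much space images, containers and volumes consume.
docker system df

# Remove dangling images left behind by repeated builds.
docker image prune -f

# Optionally remove any image not referenced by a container.
docker image prune -a -f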
Container Security
Security concerns also impact the way container storage and networks
are implemented. Vulnerability scans and signed container images are
becoming well-known practices amongst developers. Container
marketplaces, like Docker's Hub and Store, have already implemented these practices for the images they distribute.
Here are some key questions that need to be answered when considering
the security of container deployments in production environments:
• How do you verify and scan container images before
deploying them in production?
deployment. Usually, the base image is the one that remains less prone to
changes, while the application changes are constantly baked in through
continuous integration. When security changes are proposed in the form
of patches to the operating environment, the change is passed on to the
base image.
The base image is rebuilt with the planned change, and is then used as a
new base image to rebuild the application containers. It is important to
have a consistent image across all environments. A change in the base image therefore ripples through every application image built on top of it.
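A minimal sketch of that rebuild step; the tag scheme is only an example.

# Rebuild the application image against the freshly patched base image;
# --pull forces Docker to fetch the newest version of the base image.
docker build --pull -t myapp:$(date +%Y%m%d) .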
When verifying signed images, the runtime can obtain the keys either from the local store or a remote metadata store.
CoreOS has also integrated SVirt with the rkt runtime; it is available by
default whenever a new container is run. This is especially critical if the
developer uses an untrusted container image from the public web without verifying where it came from or what it contains.
Summary
Nathan McCauley
NETWORKING, SECURITY
& STORAGE DIRECTORY
Avi Vantage Platform (Avi Networks) Overlays and Virtual Networking Tools
A software-based application delivery solution that integrates with container-based environments to provide load balancing and application services.
Aviatrix (Aviatrix)
Open Source Cisco Application Centric Infrastructure (Cisco) Overlays and Virtual Networking Tools
A policy-driven networking architecture that extends to container-based workloads.
Diamanti (Diamanti)
A purpose-built container infrastructure that addresses the challenges of deploying containers to production.
Open Source Kuryr (OpenStack Foundation) Overlays and Virtual Networking Tools
A remote driver for Docker libnetwork that provides container networking using OpenStack Neutron as the backend.
Nuage Networks VSP (Virtualized Services Platform) (Nokia) Overlays and Virtual Networking Tools
A software-defined networking overlay for containers, virtual machines and bare metal workloads.
PLUMgrid Open Networking Suite (PLUMgrid) Overlays and Virtual Networking Tools
Open Source Project Calico (Tigera) Overlays and Virtual Networking Tools
BanyanOps (BanyanOps)
Bluemix (IBM)
A container vulnerability analysis service providing static analysis of vulnerabilities in appc and Docker containers.
Conjur (Conjur)
Provides on-premises container management and deployment services to enterprises with a production-ready platform supported by Docker and its partner ecosystem.
Docker Content Trust is a feature that makes it possible to verify the publisher of Docker images.
ASP is a distributed software platform designed to continuously protect communications within and between applications.
The OpenSCAP Base is both a library and a command line tool which can be used to parse and evaluate each component of the SCAP standard.
Polyverse (Polyverse)
The private registry supports group access policies to manage access to repositories.
Quay.io (CoreOS)
Scans container images against the latest vulnerability data, providing insight into the security posture before instantiating a live container.
Acropolis (Nutanix)
Bluemix (IBM)
Diamanti (Diamanti)
A purpose-built container infrastructure that addresses the challenges of deploying containers to production.
Manta (Joyent)
Manta is an object storage service from Joyent.
Quobyte (Quobyte)
StorageOS (StorageOS)
Provides enterprise storage array functionality delivered via software on a pay-as-you-go basis.