
A MongoDB White Paper

Enabling Microservices
Containers & Orchestration Explained
June 2017
Table of Contents

Introduction
What are Containers?
Containers Compared to Virtual Machines (VMs)
How Containers Benefit Your Business
Docker – the Most Popular Container Technology
Orchestration
Docker Machine
Docker Swarm
Docker Compose
Kubernetes
Mesos
Choosing an Orchestration Framework
Security Considerations
Considerations for MongoDB
Implementing a MongoDB Replica Set using Docker & Kubernetes
Multiple Availability Zone MongoDB Replica Set
Kubernetes Enhancements for Stateful Services – Stateful Sets
MongoDB and Containers in the Real World
We Can Help
Resources
Introduction

Want to try out MongoDB on your laptop? Execute a single command and you have a lightweight, self-contained sandbox; another command removes all traces when you're done.

Need an identical copy of your application stack in multiple environments? Build your own container image and let your development, test, operations, and support teams launch an identical clone of your environment.

Containers are revolutionizing the entire software lifecycle: from the earliest technical experiments and proofs of concept through development, test, deployment, and support.

Orchestration tools manage how multiple containers are created, upgraded, and made highly available. Orchestration also controls how containers are connected to build sophisticated applications from multiple, microservice containers.

The rich functionality, simple tools, and powerful APIs make container and orchestration functionality a favorite for DevOps teams who integrate them into Continuous Integration (CI) and Continuous Delivery (CD) workflows.

This white paper introduces the concepts behind containers and orchestration, then explains the available technologies and how to use them with MongoDB.

What are Containers?

To illustrate the concepts associated with software containers, it is helpful to consider a similar example from the physical world – shipping containers.

Shipping containers are efficiently moved using different modes of transport – perhaps initially being carried by a truck to a port, then neatly stacked alongside thousands of other shipping containers on a huge container ship that carries them to the other side of the world. At no point in the journey do the contents of that container need to be repacked or modified in any way.

Shipping containers are ubiquitous, standardized, and available anywhere in the world, and they're extremely simple to use – just open them up, load in your cargo, and lock the doors shut.
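The single-command sandbox mentioned in the introduction can be sketched as follows. This is an illustrative CLI session, not taken from the paper itself: it assumes Docker is installed, and the container name is arbitrary.

```shell
# Start a throwaway MongoDB instance in the background
docker run --name mongo-sandbox -d mongo

# ...experiment with the database...

# Remove the sandbox container when done
docker stop mongo-sandbox
docker rm mongo-sandbox
```

The -d flag runs the container detached in the background; stop and rm together remove the container and its state.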
The contents of each container are kept isolated from those of the others; the container full of Mentos can safely sit next to the container full of soda without any risk of a reaction. Once a spot on the container ship has been booked, you can be confident that there's room for all of your packed cargo for the whole trip – there's no way for a neighboring container to steal more than its share of space.

Software containers fulfill a similar role for your application. Packing the container involves defining what needs to be there for your application to work – operating system, libraries, configuration files, application binaries, and other parts of your technology stack. Once the container has been defined, that image is used to create containers that run in any environment, from the developer's laptop to your test/QA rig, to the production data center, on-premises or in the cloud, without any changes. This consistency can be very useful: for example, a support engineer can spin up a container to replicate an issue and be confident that it exactly matches what's running in the field.

Containers are very efficient and many of them can run on the same machine, allowing full use of all available resources. Linux containers and cgroups are used to make sure that there's no cross-contamination between containers: data files, libraries, ports, namespaces, and memory contents are all kept isolated. They also enforce upper boundaries on how much system resource (memory, storage, CPU, network bandwidth, and disk I/O) a container can consume so that a critical application isn't squeezed out by noisy neighbors.

Metaphors tend to fall apart at some point, and that's true with this one as well. There are exceptions, but shipping containers typically don't interact with each other – each has its job to fulfill (keep its contents together and safe during shipping) and it doesn't need help from any of its peers to achieve that. In contrast, it can be very powerful to have software containers interact with each other through well-defined interfaces – e.g., one container provides a database service that an application running in another container can access through an agreed port. The modular container model is a great way to implement microservice architectures.

Containers Compared to Virtual Machines (VMs)

There are a number of similarities between virtual machines (VMs) and containers – in particular, they both allow you to create an image and spin up one or more instances, then safely work in isolation within each one. Containers, however, have a number of advantages which make them better suited to building and deploying applications.

Each instance of a VM must contain an entire operating system, all required libraries, and of course the actual application binaries. All of that software consumes several gigabytes of storage and memory. In contrast, each container holds its application and any dependencies, but the same Linux kernel and libraries can be shared between multiple containers running on the host. The fact that each container imposes minimal overhead on storage, RAM, and CPU means that many can run on the same host, and each takes just a couple of seconds to launch.

Running many containers allows each one to focus on a specific task; multiple containers then work in concert to implement sophisticated applications. In such microservice architectures, each container can use different versions of programming languages and libraries that can be upgraded independently.

Due to the isolation of capabilities within containers, the effort and risk associated with updating any given container is far lower than with a more monolithic architecture. This lends itself to Continuous Delivery – an approach that involves fast software development iterations and frequent, safe updates to the deployed application.

The tools and APIs provided with container technologies such as Docker are very powerful and more developer-focused than those available with VMs. These APIs allow the management of containers to be integrated into automated systems – such as Chef and Puppet – used by DevOps teams to cover the entire software development lifecycle. This has led to wide-scale adoption by DevOps-oriented groups.

Virtual machines still have an essential role to play, as you'll very often be running your containers within VMs –
including when using the cloud services provided by Amazon, Google, or Microsoft.

How Containers Benefit Your Business

DevOps & Continuous Delivery. When the application consists of multiple containers with clear interfaces between them, it is a simple and low-risk matter to update a container, assess the impact, and then either revert to the old version or roll the update out across similar containers. By having multiple containers provide the same capability, upgrading each container can be done without negatively affecting service.

Replicating Environments. When using containers, it's a trivial matter to instantiate identical copies of your full application stack and configuration. These can then be used by new hires, partners, support teams, and others to safely experiment in isolation.

Accurate Testing. You can have confidence that your QA environment exactly matches what will be deployed – down to the exact version of every library.

Scalability. By architecting an application to be built from multiple container instances, adding more containers scales out capacity and throughput. Similarly, containers can be removed when demand falls. Using orchestration frameworks – such as Kubernetes and Apache Mesos – further simplifies elastic scaling.

Isolation. Every container running on the same host is independent and isolated from the others as well as from the host itself. The same equipment can simultaneously host development, support, test, and production versions of your application – even running different versions of tools, languages, databases, and libraries without any risk that one environment will impact another.

Performance. Unlike VMs (whether used directly or through Vagrant), containers are lightweight and have minimal impact on performance.

High Availability. By running with multiple containers, redundancy can be built into the application. If one container fails, then the surviving peers – which are providing the same capability – continue to provide service. With the addition of some automation (see the orchestration section of this paper), failed containers can be automatically recreated (rescheduled) either on the same or a different host, restoring full capacity and redundancy.

Docker – the Most Popular Container Technology

The simplicity of Docker and its rich ecosystem make it extremely powerful and easy to use.

Specific Docker containers are created from images which have been designed to provide a particular capability – whether that be, for example, just a base operating system, a web server, or a database. Docker images are constructed from layered filesystems so they can share common files, reducing disk usage and speeding up image download. Docker Hub provides thousands of images that can be extended or used as is, to quickly create a container that's running the software you want to use – for example, all it takes to get MongoDB up and running is the command docker run --name my-mongodb -d mongo, which will download the image (if it's not already on the machine) and use it to start the container. Proprietary images can be made available within the enterprise using a local, private registry rather than Docker Hub.

Docker containers are based on open standards, allowing containers to run on all major Linux distributions. They support bare metal, VMs, and cloud infrastructure from vendors such as Amazon, Google, and Microsoft. Integration with cloud services – e.g., with the Google Container Engine (GCE) – means that running your software in a scalable, highly available configuration is just a few clicks away.

Docker provides strong isolation where each container has its own root filesystem, processes, memory, network ports, namespace, and devices. But to be of use, containers need to be able to communicate with the outside world as well as other containers. To this end, Docker containers can be configured to expose ports as well as map volumes to directories on the host. Alternatively, Docker containers can
be linked so that they communicate without opening up these resources to other systems.

Orchestration

Clearly, the process of deploying multiple containers to implement an application can be optimized through automation. This becomes more and more valuable as the number of containers and hosts grows. This type of automation is referred to as orchestration. Orchestration can include a number of features, including:

• Provisioning hosts
• Instantiating a set of containers
• Rescheduling failed containers
• Linking containers together through agreed interfaces
• Exposing services to machines outside of the cluster
• Scaling out or down the cluster by adding or removing containers

A common term used in orchestration is scheduling – this refers to the orchestration framework deciding on which host a container should run and then starting the container there. Rescheduling refers to restarting a container, either on the same host or elsewhere – e.g., when the container's existing host restarts.

There are many orchestration tools available for Docker; some of the most common are described here.

Docker Machine

Docker Machine provisions hosts and installs Docker Engine (the lightweight runtime and tooling used to run Docker containers) software on them.

Docker Machine provides commands for starting, stopping, and restarting hosts in addition to upgrading the Docker client and daemon.

Docker Machine is particularly convenient when using a Windows or OS X machine as it can automatically create a Linux VM running on VirtualBox in which the Docker process runs. It is also able to create hosts in cloud environments, including AWS, Azure, and Digital Ocean.

Docker Swarm

Docker Swarm produces a single, virtual Docker host by clustering multiple Docker hosts together. It presents the same Docker API, allowing it to integrate with any tool that works with a single Docker host.

A common practice is for Docker Swarm to employ Docker Machine to create the hosts making up the swarm – especially early on in the development process.

Docker Swarm can grow with your needs as it allows for pluggable scheduler backends. You can start off with the default scheduler but swap in Mesos (see below) for large, production deployments.

Docker Compose

Docker Compose takes a file defining a multi-container application (including dependencies) and deploys the described application by creating the required containers. It is mostly aimed at development, testing, and staging environments.

Benefits of using Docker Compose include:

• A single host can run multiple, isolated environments
• Data is preserved when containers are shut down and restarted
• It determines which containers for a project are already running, and which need to be started
• Compose files can be reused and extended between projects

Kubernetes

Kubernetes was created by Google and is one of the most feature-rich and widely used orchestration frameworks; its key features include:

• Automated deployment and replication of containers
• Online scale-in or scale-out of container clusters
• Load balancing over groups of containers
• Rolling upgrades of application containers
• Resilience, with automated rescheduling of failed containers
• Controlled exposure of network ports to systems outside of the cluster

Kubernetes is designed to work in multiple environments, including bare metal, on-premises VMs, and public clouds. Google Container Engine provides a tightly integrated platform which includes hosting of the Kubernetes and Docker software, as well as provisioning the host VMs and orchestrating the containers.

The key components making up Kubernetes are:

• A Cluster is a collection of one or more bare-metal servers or virtual machines (referred to as nodes) providing the resources used by Kubernetes to run one or more applications.

• Pods are groups of containers and volumes co-located on the same host. Containers in the same pod share the same network namespace and IP address (unique only within the cluster) and can communicate with each other using localhost. Pods are considered to be ephemeral, rather than durable, entities and are the basic scheduling unit. The structure, contents, and interfaces for a pod are defined using either a JSON or YAML configuration file.

• Volumes map ephemeral directories within a container to persistent storage which survives container restarts and rescheduling. Volumes also allow data to be shared amongst containers within a pod.

• Services act as basic load balancers and ambassadors for other containers, exposing them to the outside world – e.g., a service can provide a static, external IP address and port which it maps to another (internal to the cluster) port on multiple containers.

• Labels are tags assigned to entities such as containers that allow them to be managed or referenced as a group. One resource can reference one or more other resources by including their label(s) in its selector – e.g., containers in a cluster might have labels for environment, role, and location; a service could be set up to act as an interface to a subset of those containers by setting its selector to environment=production, role=web-server, location=new-york.

• A Replication Controller handles the scheduling of pods across the cluster. When configuring a Replication Controller, you specify the required number of pods, and of which type, which should exist at any point in time. If a pod fails then the Replication Controller creates a replacement; if a request is made to increase the size of the cluster then the Replication Controller starts the additional pods.

Mesos

Apache Mesos is designed to scale to tens of thousands of physical machines. Mesos is in production with a number of large enterprises such as Twitter, Airbnb, and Apple. An application running on top of Mesos is made up of one or more containers and is referred to as a framework. Mesos offers resources to each framework, and each framework must then decide which to accept. Mesos is less feature-rich than Kubernetes and may involve extra integration work – defining services or batch jobs for Mesos is programmatic while it is declarative for Kubernetes.

There is currently a project to run Kubernetes as a Mesos framework. Mesos provides the fine-grained resource allocation of Kubernetes pods across the nodes in a cluster. Kubernetes adds the higher-level functions such as load balancing, high availability through failover (rescheduling), and elastic scaling.

Mesos is particularly suited to environments where the application needs to be co-located with other services such as Hadoop, Kafka, and Spark. Mesos is also the foundation for a number of distributed systems such as:

• Apache Aurora – a highly scalable service scheduler for long-running services and cron jobs; it's used by Twitter. Aurora extends Mesos by adding rolling updates, service registration, and resource quotas.

• Chronos – a fault-tolerant service scheduler, to be used as a replacement for cron, to orchestrate scheduled jobs within Mesos.

• Marathon – a simple-to-use service scheduler; it builds upon Mesos and Chronos by ensuring that two Chronos instances are running.
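To give a flavor of how work is described to one of these schedulers, a minimal Marathon application definition is sketched below. This is an illustration only – the application id, resource figures, and choice of the mongo image are assumptions, not taken from this paper:

```json
{
  "id": "/sandbox/mongo-single",
  "instances": 1,
  "cpus": 1.0,
  "mem": 2048,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mongo"
    }
  }
}
```

Submitted to Marathon's REST API, a definition like this asks the scheduler to keep one instance of the container running, restarting it elsewhere in the Mesos cluster if its host fails.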
Choosing an Orchestration Framework

Each orchestration platform has advantages relative to the others and so users should evaluate which are best suited to their needs. Aspects to consider include:

• Does your enterprise have an existing DevOps framework that the orchestration must fit within, and what APIs does it require?

• How many hosts will be used? Mesos is proven to work over thousands of physical machines.

• Will the containers be run on bare metal, private VMs, or in the cloud? Kubernetes is widely used in cloud deployments.

• Are there requirements for automated high availability? A Kubernetes Replication Controller will automatically reschedule failed pods/containers; Mesos considers this the role of an application's framework code.

• Is grouping and load balancing required for services? Kubernetes provides this but Mesos considers it a responsibility of the application's framework code.

• What skills do you have within your organization? Mesos typically requires custom coding to allow your application to run as a framework; Kubernetes is more declarative.

• Setting up the infrastructure to run containers is simple but the same is not true for some of the orchestration frameworks – including Kubernetes and Mesos. Consider using hosted services such as Google Container Engine for Kubernetes, particularly for proofs of concept.

Security Considerations

While many of the concerns when using containers are common to bare metal deployments, containers provide an opportunity to improve levels of security if used properly. Because containers are so lightweight and easy to use, it's easy to deploy them for very specific purposes, and the container technology helps ensure that only the minimum required capabilities are exposed.

Within a container, the ability for malicious or buggy software to cause harm can be reduced by using resource isolation and rationing.

It's important to ensure that the container images are regularly scanned for vulnerabilities, and that the images are digitally signed. There are now many projects that provide scripts and scanning tools that can check if images and packages are up to date and free of security defects. Note that updating the images has no impact on existing containers; fortunately, Kubernetes and Aurora have the ability to perform rolling updates of containers.

Considerations for MongoDB

Running MongoDB with containers and orchestration introduces some additional considerations:

• MongoDB database nodes are stateful. In the event that a container fails, and is rescheduled, it's undesirable for the data to be lost (it could be recovered from other nodes in the replica set, but that takes time). To solve this, features such as the Volume abstraction in Kubernetes can be used to map what would otherwise be an ephemeral MongoDB data directory in the container to a persistent location where the data survives container failure and rescheduling.

• MongoDB database nodes within a replica set must communicate with each other – including after rescheduling. All of the nodes within a replica set must know the addresses of all of their peers, but when a container is rescheduled, it is likely to be restarted with a different IP address. For example, all containers within a Kubernetes pod share a single IP address, which changes when the pod is rescheduled. With Kubernetes, this can be handled by associating a Kubernetes Service with each MongoDB node, which uses the Kubernetes DNS service to provide a hostname for the service that remains constant through rescheduling.

• Once each of the individual MongoDB nodes is running (each within its own container), the replica set must be initialized and each node added. This is likely to require some additional logic beyond that offered by off-the-shelf orchestration tools. Specifically, one MongoDB
node within the intended replica set must be used to execute the rs.initiate and rs.add commands.

• When using WiredTiger (the default storage engine from MongoDB 3.2), the WiredTiger cache size defaults to 60% of the host's RAM minus 1 GB. If using cgroups to constrain the amount of RAM used by a container, WiredTiger ignores that constraint when calculating its cache size. Override that behaviour by setting wiredTigerCacheSizeGB.

• If the orchestration framework provides automated rescheduling of containers (as Kubernetes does, for instance) then this can increase MongoDB's resiliency as a failed replica set member can be automatically recreated, restoring full redundancy levels without human intervention.

• It should be noted that while the orchestration framework might monitor the state of the containers, it is unlikely to monitor the applications running within the containers, or back up their data. That means it's important to use a strong monitoring and backup solution such as MongoDB Cloud Manager, included with MongoDB Enterprise Advanced and MongoDB Professional. Consider creating your own image that contains both your preferred version of MongoDB and the MongoDB Automation Agent.

Implementing a MongoDB Replica Set using Docker and Kubernetes

As described in the previous section, distributed databases such as MongoDB require a little extra attention when being deployed with orchestration frameworks such as Kubernetes. This section goes to the next level of detail, showing how this can actually be implemented.

For simplicity, the Google Container Engine (GCE) cloud environment is used but most of the information holds true if you deploy your own Kubernetes infrastructure.

Kubernetes 1.3 introduced Minikube to simplify the creation of a minimal, single-node Kubernetes cluster on your laptop to get developers up and running quickly – this could be a good option for initial experiments.

This section starts by creating the entire MongoDB replica set on a single GCE cluster, meaning that all members of the replica set will be within a single availability zone. This clearly doesn't provide geographic redundancy. In reality, little has to be changed to run across multiple clusters and those steps are described in the "Multiple Availability Zone MongoDB Replica Set" section.

Each member of the replica set will be run as its own pod with a service exposing an external IP address and port. This 'fixed' IP address is important as both external applications and the other replica set members can rely on it remaining constant in the event that a pod is rescheduled.

Figure 1 illustrates one of these pods and the associated Replication Controller and service:

• Starting at the core there is a single container named mongo-node1. mongo-node1 includes an image called mongo, which is a publicly available MongoDB container image hosted on Docker Hub. The container exposes port 27017 within the cluster.

• The Kubernetes volumes feature is used to map the /data/db directory within the container to the persistent storage element named mongo-persistent-storage1, which in turn is mapped to a disk named mongodb-disk1 created in the Google Cloud. This is where MongoDB would store its data so that it is persisted over container rescheduling.

• The container is held within a pod which has the labels to name the pod mongo-node and provide an (arbitrary) instance name of rod.

• A Replication Controller named mongo-rc1 is configured to ensure that a single instance of the mongo-node1 pod is always running.

• The LoadBalancer service named mongo-svc-a exposes an IP address to the outside world together with the port of 27017, which is mapped to the same port number in the container. The service identifies the correct pod using a selector that matches the pod's labels. That external IP address and port will be used by both an application and for communication between the replica set members. There are also local IP addresses for each container, but those change when containers
Figure 1: Pod for a Single Replica Set member

are moved or restarted, and so aren't of use for the replica set.

The configuration in Figure 1 can be described in a Kubernetes YAML file (this will later be extended to include the other two members of the replica set):

apiVersion: v1
kind: Service
metadata:
  name: mongo-svc-a
  labels:
    name: mongo-svc-a
spec:
  type: LoadBalancer
  ports:
    - port: 27017
      targetPort: 27017
      protocol: TCP
      name: mongo-svc-a
  selector:
    name: mongo-node1
    instance: rod
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-rc1
  labels:
    name: mongo-rc
spec:
  replicas: 1
  selector:
    name: mongo-node1
  template:
    metadata:
      labels:
        name: mongo-node1
        instance: rod
    spec:
      containers:
        - name: mongo-node
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage1
              mountPath: /data/db
      volumes:
        - name: mongo-persistent-storage1
          gcePersistentDisk:
            pdName: mongodb-disk1
            fsType: ext4

Figure 2 shows the configuration for a second member of the replica set. 90% of the configuration is the same, with just these changes:
Figure 2: Pod for the second Replica Set member

• The disk and volume names must be unique, and so mongodb-disk2 and mongo-persistent-storage2 are used.

• The pod is assigned a label of instance: jane and name: mongo-node2 so that the new service can distinguish it (using a selector) from the rod pod used in Figure 1.

• The Replication Controller is named mongo-rc2.

• The Service is named mongo-svc-b and gets a unique, external IP address (in this instance, Kubernetes has assigned 104.1.4.5).

The configuration of the third replica set member follows the same pattern.

The final change required to the Kubernetes YAML file is to override the default command that will be run when each container is started (the mongo image executes the mongod binary unless instructed otherwise). This is achieved by adding a command attribute of mongod --replSet my_replica_set to the container definitions.

This is the final Kubernetes YAML file for the full MongoDB replica set:

apiVersion: v1
kind: Service
metadata:
  name: mongo-svc-a
  labels:
    name: mongo-svc-a
spec:
  type: LoadBalancer
  ports:
    - port: 27017
      targetPort: 27017
      protocol: TCP
      name: mongo-svc-a
  selector:
    name: mongo-node1
    instance: rod
---
apiVersion: v1
kind: Service
metadata:
  name: mongo-svc-b
  labels:
    name: mongo-svc-b
spec:
  type: LoadBalancer
  ports:
    - port: 27017
      targetPort: 27017
      protocol: TCP
      name: mongo-svc-b
  selector:
    name: mongo-node2
    instance: jane
---
apiVersion: v1
kind: Service
metadata:
  name: mongo-svc-c
  labels:
    name: mongo-svc-c
spec:
  type: LoadBalancer
  ports:
    - port: 27017
      targetPort: 27017
      protocol: TCP
      name: mongo-svc-c
  selector:
    name: mongo-node3
    instance: freddy
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-rc1
  labels:
    name: mongo-rc
spec:
  replicas: 1
  selector:
    name: mongo-node1
  template:
    metadata:
      labels:
        name: mongo-node1
        instance: rod
    spec:
      containers:
        - name: mongo-node1
          image: mongo
          command:
            - mongod
            - "--replSet"
            - my_replica_set
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage1
              mountPath: /data/db
      volumes:
        - name: mongo-persistent-storage1
          gcePersistentDisk:
            pdName: mongodb-disk1
            fsType: ext4
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-rc2
  labels:
    name: mongo-rc
spec:
  replicas: 1
  selector:
    name: mongo-node2
  template:
    metadata:
      labels:
        name: mongo-node2
        instance: jane
    spec:
      containers:
        - name: mongo-node2
          image: mongo
          command:
            - mongod
            - "--replSet"
            - my_replica_set
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage2
              mountPath: /data/db
      volumes:
        - name: mongo-persistent-storage2
          gcePersistentDisk:
            pdName: mongodb-disk2
            fsType: ext4
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-rc3
  labels:
    name: mongo-rc
spec:
  replicas: 1
  selector:
    name: mongo-node3
  template:
    metadata:
      labels:
        name: mongo-node3
        instance: freddy
    spec:
      containers:
        - name: mongo-node3
          image: mongo
          command:
            - mongod
            - "--replSet"
            - my_replica_set
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage3
              mountPath: /data/db
      volumes:
        - name: mongo-persistent-storage3
          gcePersistentDisk:
            pdName: mongodb-disk3
            fsType: ext4

Figure 3 shows the full target configuration.
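Because each replica set member is exposed through its own LoadBalancer service, an application reaches the replica set through a connection string that lists all of the external service addresses together with the replica set name. A sketch of that string, using the example addresses quoted in this section:

```
mongodb://104.155.100.188:27017,104.155.91.88:27017,104.155.93.219:27017/?replicaSet=my_replica_set
```

The replicaSet option matches the --replSet value passed to mongod, letting the driver discover the current primary and route writes accordingly.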
Figure 3: Complete Replica Set Deployment

In order to actually create the replica set, the first step is to create a GCE cluster of 3 machines and 3 persistent disks using the gcloud command:

gcloud container clusters create cluster1
gcloud compute disks create mongodb-disk1
gcloud compute disks create mongodb-disk2
gcloud compute disks create mongodb-disk3

It is then simply a matter of using the kubectl command to process the Kubernetes YAML file:

kubectl create -f mongo-rs.yaml

At this point, each of the mongod processes is running in its own pod/container, but there is a final step to form them into an active replica set. Before continuing, it is necessary to identify the external IP addresses of the three new services using the command kubectl get svc.

The mongo client is then used to access any one of the services to initiate the replica set and add the remaining members:

mongo --host 104.155.100.188

rs.initiate()
conf=rs.conf()
conf.members[0].host="104.155.100.188:27017"
rs.reconfig(conf)
rs.add("104.155.91.88")
rs.add("104.155.93.219")

An application may now connect to the replica set as normal by including all three external IP addresses in its connect string; the MongoDB connector logic will ensure that writes and queries are directed to the correct pod.

All resources other than the disks can be removed by executing kubectl delete -f mongo-rs.yaml.

Note that even if running the configuration shown in Figure 3 on a Kubernetes cluster of three or more nodes, Kubernetes may (and often will) schedule two or more MongoDB replica set members on the same host. This is because Kubernetes views the three pods as belonging to three independent services.

To increase redundancy (within the zone), an additional headless service can be created. The new service provides no capabilities to the outside world (and will not even have an IP address), but it serves to inform Kubernetes that the three MongoDB pods form a service, and so Kubernetes will attempt to schedule them on different nodes.

Figure 4 illustrates the addition of the extra service (mongo-svc-null), which associates itself with the three MongoDB pods using a selector of mongo-rs-name: rainbow, which matches a new label added to each of the pod definitions for the MongoDB replica set members.

Figure 4: Headless service to avoid co-locating MongoDB replica set members

Multiple Availability Zone MongoDB Replica Set

There is risk associated with the replica set created above in that everything is running in the same GCE cluster, and hence in the same availability zone. If there were a major incident that took the availability zone offline, then the MongoDB replica set would be unavailable. If geographic redundancy is required, then the three pods should be run in three different availability zones or regions.

Surprisingly little needs to change in order to create a similar replica set that is split between three zones – which requires three clusters. Each cluster requires its own Kubernetes YAML file that defines just the pod, Replication Controller, and service for one member of the replica set. It is then a simple matter to create a cluster, persistent storage, and a MongoDB node for each zone, e.g.:

gcloud container clusters create "europe-1" \
  --zone "europe-west1-c" --num-nodes 1
gcloud compute disks create --size=200GB \
  --zone="europe-west1-c" mongodb-disk-europe
kubectl create -f mongo-europe.yaml

Figure 5: Replica set running over multiple availability zones
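The per-zone YAML files (such as the mongo-europe.yaml referenced above) are not reproduced in this paper. As a sketch, each one would contain a single Service/Replication Controller pair following the same pattern as the combined file shown earlier; the names mongo-svc-europe, mongo-rc-europe, and mongo-node-europe below are illustrative assumptions, while mongodb-disk-europe is the disk created by the gcloud command above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo-svc-europe
  labels:
    name: mongo-svc-europe
spec:
  type: LoadBalancer      # external IP so replica set members in other zones can reach this node
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
  selector:
    name: mongo-node-europe
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-rc-europe
  labels:
    name: mongo-rc
spec:
  replicas: 1
  selector:
    name: mongo-node-europe
  template:
    metadata:
      labels:
        name: mongo-node-europe
    spec:
      containers:
      - name: mongo-node-europe
        image: mongo
        command:
        - mongod
        - "--replSet"
        - my_replica_set
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-persistent-storage-europe
          mountPath: /data/db
      volumes:
      - name: mongo-persistent-storage-europe
        gcePersistentDisk:
          pdName: mongodb-disk-europe   # disk created by the gcloud command above
          fsType: ext4
```

Repeating this pattern (with per-zone names and disks) in each of the three clusters, and then initiating the replica set through the services' external IP addresses, yields the deployment of Figure 5.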
Kubernetes Enhancements for Stateful Services – StatefulSets

Running stateful services inside containers has historically been discouraged, but it is something that organizations have found increasingly attractive and necessary. As a result, the Kubernetes development community have been working to make it simpler to do.

Kubernetes 1.3 (July 2016) introduced an alpha version of PetSets, which were subsequently replaced by StatefulSets – beta in Kubernetes 1.5 (December 2016) and 1.6 (March 2017). StatefulSets group together a set of stateful pods.

The StatefulSet functionality provides stricter guarantees than those offered by a traditional Replication Controller – all targeted towards managing distributed, stateful applications. The most applicable guarantees for MongoDB are:

• Each pod has a unique, predictable, addressable, and stable (survives rescheduling) hostname. These stable hostnames make it possible to create a MongoDB replica set in Kubernetes without the need to use external IP addresses within the replica set.

• Stable, persistent storage volume mappings

• Sequential, predictable start up and shutdown ordering of pods

As these capabilities become generally available, this paper will be updated to take advantage of the features appropriate for MongoDB deployments. In the meantime, Deploying a MongoDB Replica Set as a GKE Kubernetes StatefulSet provides a worked example.

MongoDB and Containers in the Real World

fuboTV provide a soccer streaming service in North America and they run their full stack (including MongoDB) on Docker and Kubernetes; find out the benefits they see from this and how it's achieved in this case study.

Square Enix is one of the world's leading providers of gaming experiences, publishing such iconic titles as Tomb Raider and Final Fantasy. They have produced an internal multi-tenant database-as-a-service using MongoDB and Docker – find out more in this case study.

Comparethemarket.com is one of the UK's leading providers of price comparison services and uses MongoDB as the operational database behind its large microservice environment. Service uptime is critical, and MongoDB's distributed design is key to ensuring that SLAs are always met. Comparethemarket.com's deployment consists of microservices deployed in AWS. Each microservice, or logical grouping of related microservices, is provisioned with its own MongoDB replica set running in Docker containers, and deployed across multiple AWS Availability Zones to provide resiliency and high availability. MongoDB Ops Manager is used to provide the operational automation that is essential to launch new features quickly: deploying replica sets, providing continuous backups, and performing zero downtime upgrades.

We Can Help

We are the MongoDB experts. Over 3,000 organizations rely on our commercial products, including startups and more than half of the Fortune 100. We offer software and services to make your life easier:

MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It's a finely-tuned package of advanced software, support, certifications, and other services designed for the way you do business.

MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

MongoDB Stitch is a backend as a service (BaaS), giving developers full access to MongoDB, declarative read/write controls, and integration with their choice of services.

MongoDB Cloud Manager is a cloud-based tool that helps you manage MongoDB on your own infrastructure. With
automated provisioning, fine-grained monitoring, and
continuous backups, you get a full management suite that
reduces operational overhead, while maintaining full control
over your databases.

MongoDB Professional helps you manage your deployment and keep it running smoothly. It includes support from MongoDB engineers, as well as access to MongoDB Cloud Manager.

Development Support helps you get up and running quickly. It gives you a complete package of software and services for the early stages of your project.

MongoDB Consulting packages get you to production faster, help you tune performance in production, help you scale, and free you up to focus on your next release.

MongoDB Training helps you become a MongoDB expert, from design to operating mission-critical systems at scale. Whether you're a developer, DBA, or architect, we can make you better at MongoDB.

Resources

For more information, please visit mongodb.com or contact us at [email protected].

Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB (mongodb.com/cloud)
MongoDB Stitch backend as a service (mongodb.com/cloud/stitch)

New York • Palo Alto • Washington, D.C. • London • Dublin • Barcelona • Sydney • Tel Aviv
US 866-237-8815 • INTL +1-650-440-4474 • [email protected]
© 2017 MongoDB, Inc. All rights reserved.
