Cassandra on Docker

Dockerizing Cassandra on Modern Linux

Myself & Instaclustr
• Adam Zegelin — Founding Software Engineer & Co-founder of Instaclustr 
adam@instaclustr.com · @zegelin
• Managed DataStax Enterprise and Apache Cassandra in the ☁  
(AWS, Azure, SoftLayer)
• Self-service dashboard — create, manage & monitor clusters
• 24/7/365 support, on-call engineers, uptime guarantee
• Focus on developing your awesome apps — we handle the Cassandra
• Grew from a need for Cassandra in a project
© 2015. All Rights Reserved.

Nodes — Software Stack
• CoreOS — lightweight OS
• Docker — containerisation of everything
• systemd — service management
• journald — logging
• D-Bus — controlling systemd from Java from inside containers

Initial Implementation
• Amazon Web Services only
• Custom Ubuntu AMI (Amazon Machine Image)
• Based on stock Ubuntu AMI
• 2 AMIs (PV/HVM) × 9 regions = 18 images per version! 
(became unmaintainable very quickly)
• Custom cloud-init scripts — RAID disks, fetch config, etc.
• Cassandra installed with apt-get install cassandra / dse

Initial Implementation — AWS
• We selected instance-storage-backed AWS instances
• Instance storage is fast (SSDs) and low latency (local disk) but is volatile
— terminate the instance and all your data is gone
• The alternative, EBS (Elastic Block Store), is essentially a SAN: slower,
higher latency, and originally shared the instance’s network bandwidth
• The newer c4.x and m4.x instances are “EBS optimised” and don’t share these limitations
• Only way to change AMI is to start a new machine
• Not possible to use immutable images while keeping persistent data on ephemeral instance storage
• Only feasible solution for updates is apt-get install

CoreOS
• One of the first “Docker Operating Systems”
• Available on every provider we support — AWS, Azure, SoftLayer
• CoreOS has pre-built images
• Small and minimalist — not much userland (not even man!)
• Other useful software — etcd, fleet, etc. 
(we currently don’t use them — but maybe in the future)
• In use by some big players (Rackspace, PlayStation, Instaclustr 😀)
• Recent funding from Google Ventures

Docker
• Container runtime + standardised image distribution & hosting + ecosystem
• Private image hosting options available, such as quay.io
• Immutable images — Yay! 🎉
• Images running in dev, test and production environments are identical
• Software installs, upgrades and uninstalls are clean
• Components are isolated — potentially conflicting components (different library
versions, JVM versions, etc.) can co-exist
• Even different userland layouts (Ubuntu, Debian, CentOS, etc.)

• We containerise everything — C*, internal services, node
management and monitoring apps
• Single, well-understood image build and deploy process:
docker build & docker push
• Executed via Makefiles — one Make target per image — make push-all builds
and pushes everything
• Helps that all our internal apps are Java-based too

CoreOS + Docker
• Docker gives us immutable images for our components without
instance replacement
• CoreOS handles the rest (OS-level) via in-place updates
• Docker is provider agnostic
• CoreOS runs on all major cloud providers and bare-metal
• The result ☞ Instaclustr-managed C* can run anywhere

systemd
• CoreOS uses systemd for service management
• systemd supports inter-service dependencies
• e.g. cassandra-backups.service “wants” cassandra.service
• i.e., starting cassandra-backups also starts cassandra; running backups only while cassandra runs needs BindsTo= (plus After= for ordering)
• systemd can automatically restart services
• Instaclustr services are fail-fast
• Cassandra, not so much: in some cases it hangs instead of exiting (watchdog?)

systemd cont’d
• Manages units of different types — service, timer, target, etc.
• service units manage processes
• timers start services on a schedule (à la cron)
• targets are for grouping/sync points
• cassandra.target “wants” cassandra.service, monitoring.service, datastax-agent.service, backups.timer, etc.
• All units can define dependencies and conflicts
• Dependencies of different “strengths” — Wants vs. Requires
• In both directions — Requires and RequiredBy
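As a rough illustration of how these dependencies fan out, the sketch below models units as a graph and computes everything that gets started when one unit is started. It is a toy model, not systemd's job engine: it treats Wants= and Requires= identically (both pull a unit in on start) and ignores the failure and stop propagation that distinguishes them. The class name UnitGraph is hypothetical; the unit names are taken from the slides.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class UnitGraph {
    // unit -> units it pulls in when started (Wants= and Requires= both do this)
    private final Map<String, Set<String>> pulls = new HashMap<>();

    void wants(String unit, String... deps) {
        pulls.computeIfAbsent(unit, k -> new LinkedHashSet<>()).addAll(Arrays.asList(deps));
    }

    // WantedBy= (in an [Install] section) is the same edge declared from the other end
    void wantedBy(String unit, String target) {
        wants(target, unit);
    }

    /** Every unit started, directly or transitively, when root is started. */
    Set<String> startSet(String root) {
        Set<String> started = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String unit = queue.poll();
            if (started.add(unit)) { // skip already-visited units, which also handles cycles
                queue.addAll(pulls.getOrDefault(unit, Set.of()));
            }
        }
        return started;
    }

    public static void main(String[] args) {
        UnitGraph g = new UnitGraph();
        g.wants("cassandra.target", "cassandra.service", "monitoring.service",
                "datastax-agent.service", "backups.timer");
        g.wants("cassandra-backups.service", "cassandra.service");
        // [cassandra.target, cassandra.service, monitoring.service, datastax-agent.service, backups.timer]
        System.out.println(g.startSet("cassandra.target"));
        // [cassandra-backups.service, cassandra.service]
        System.out.println(g.startSet("cassandra-backups.service"));
    }
}
```

Note the asymmetry: starting cassandra.service on its own pulls in nothing, because the edges point from the dependent unit to its dependency.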

Basic Integration
• Cassandra runs as PID 1 in the container
• 1 primary process per container model
• Runs in foreground mode (-f)
• Responds to SIGTERM via docker stop, systemctl stop, etc.
• Cassandra data and configuration are persistent on host
• Survives container restart
• Cassandra data and configuration directories mounted from host 
docker run -v /var/lib/instaclustr/etc/cassandra:/etc/cassandra …
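Running as PID 1 has a catch. In a PID namespace the kernel does not apply default signal actions to PID 1, so a PID-1 process only reacts to SIGTERM if it installs a handler. The JVM does install one, and it turns SIGTERM into an orderly shutdown that runs registered shutdown hooks. Below is a minimal sketch of a foreground Java service relying on that. The class ForegroundService and its "flush" step are hypothetical placeholders, not Cassandra code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class ForegroundService {
    private final CountDownLatch stopped = new CountDownLatch(1);
    private final AtomicBoolean flushed = new AtomicBoolean(false);

    /** Work to do on SIGTERM: flush state, then release the waiting main thread. */
    Runnable shutdownTask() {
        return () -> {
            flushed.set(true);   // hypothetical stand-in for flushing/draining data
            stopped.countDown();
        };
    }

    boolean isFlushed() { return flushed.get(); }

    boolean isStopped() { return stopped.getCount() == 0; }

    public static void main(String[] args) throws InterruptedException {
        ForegroundService service = new ForegroundService();
        // The JVM runs shutdown hooks when it receives SIGTERM (also SIGINT and SIGHUP)
        Runtime.getRuntime().addShutdownHook(new Thread(service.shutdownTask(), "shutdown"));
        System.out.println("running in the foreground; stop with docker stop or systemctl stop");
        service.stopped.await(); // stay in the foreground until shutdown begins
    }
}
```

Keeping the process in the foreground matters because systemd, via docker run, tracks the service as active only while that process lives.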

Basic Integration cont’d
• Docker containers managed via systemd
• cassandra.service execs docker run cassandra …
• systemctl [start|stop|restart|status|…] cassandra
• Cassandra logging configured to write only to stdout
• systemd logging best practice
• Cassandra ⇢ Docker ⇢ systemd ⇢ journald
• journalctl -u cassandra
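One refinement of stdout logging is worth noting. journald's stdout stream interprets a leading "<N>" (a syslog priority, as defined in sd-daemon.h) as the priority of that line, so commands like journalctl -u cassandra -p warning can filter. The sketch shows the convention with a java.util.logging formatter. Cassandra itself logs through logback, so this is only an illustration. It also assumes the prefix survives the docker run proxy unchanged. JournaldFormatter is a hypothetical name.

```java
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.StreamHandler;

class JournaldFormatter extends Formatter {
    /** Maps JUL levels onto syslog priorities (3 = err, 4 = warning, 6 = info, 7 = debug). */
    static int priority(Level level) {
        int v = level.intValue();
        if (v >= Level.SEVERE.intValue()) return 3;
        if (v >= Level.WARNING.intValue()) return 4;
        if (v >= Level.INFO.intValue()) return 6;
        return 7;
    }

    @Override
    public String format(LogRecord record) {
        // One line per record; continuation lines of a multi-line message carry no prefix
        return "<" + priority(record.getLevel()) + ">" + formatMessage(record) + "\n";
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setUseParentHandlers(false); // skip the default stderr ConsoleHandler
        Handler stdout = new StreamHandler(System.out, new JournaldFormatter()) {
            @Override
            public synchronized void publish(LogRecord record) {
                super.publish(record);
                flush(); // StreamHandler buffers; flush so each line reaches journald promptly
            }
        };
        log.addHandler(stdout);
        log.warning("example warning"); // written as "<4>example warning"
        log.info("example info");       // written as "<6>example info"
    }
}
```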

Basic Integration — Issues
• systemd starts dependent units when state is active
• process running = service active — unless configured otherwise
• ∴ dependent units start immediately
• process can hang but service stays active

Cassandra Startup
• JVM starts quickly
• JMX (nodetool) connectivity is available early
• Objects are exposed (registered as MBeans) as soon as they are constructed
• CQL/Thrift available late
• Can be toggled via cassandra.yaml or JMX/nodetool
• When is Cassandra “running”?
• When does cassandra.service transition from activating to active?
• When do dependent services start?

D-Bus
• RPC between processes
• Notifications
• Socket-based (typically UNIX sockets, but can be TCP)
• Accessible inside a container — mount the socket 
docker run -v /run/dbus:/run/dbus -v /run/systemd:/run/systemd …
• Multiple language bindings, including Java
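Once /run/dbus is mounted, a client inside the container still has to find the socket. Per the D-Bus specification, clients use DBUS_SYSTEM_BUS_ADDRESS if it is set, and otherwise the well-known address unix:path=/var/run/dbus/system_bus_socket. The sketch below resolves that address and checks whether the socket is visible. It assumes /var/run is a symlink to /run inside the image, as on most current distributions. It handles only the simple unix:path= form, with no escaped values or other transports. SystemBus is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

class SystemBus {
    static final String DEFAULT_ADDRESS = "unix:path=/var/run/dbus/system_bus_socket";

    static String address(Map<String, String> env) {
        return env.getOrDefault("DBUS_SYSTEM_BUS_ADDRESS", DEFAULT_ADDRESS);
    }

    /** Socket path of the first unix:path= entry; addresses are ';'-separated. */
    static Optional<Path> unixSocketPath(String address) {
        for (String entry : address.split(";")) {
            if (!entry.startsWith("unix:")) continue;
            for (String kv : entry.substring("unix:".length()).split(",")) {
                if (kv.startsWith("path=")) {
                    return Optional.of(Path.of(kv.substring("path=".length())));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String addr = address(System.getenv());
        boolean present = unixSocketPath(addr).map(Files::exists).orElse(false);
        System.out.println(addr + " -> " + (present ? "socket present" : "socket missing (is /run/dbus mounted?)"));
    }
}
```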

D-Bus cont’d
• systemd is controllable via D-Bus
• Control host systemd inside a Docker container
• No need to fork/exec to run systemctl and co. 
(in fact, systemctl is a wrapper around D-Bus calls)

D-Bus cont’d
Java bindings — dbus-java
systemctl restart cassandra
≝
systemdManager.RestartUnit("cassandra.service", "replace");
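The equivalence generalises. Each systemctl verb corresponds to a method on systemd's org.freedesktop.systemd1.Manager D-Bus interface: StartUnit, StopUnit and RestartUnit each take a unit name and a job mode, and "replace" is systemctl's default mode. The sketch below shows that mapping against a hypothetical local SystemdManager interface. Its methods are named after the D-Bus members, which is why they break Java naming conventions. A real dbus-java proxy would return the job's object path rather than a String. The recording fake stands in for the bus so the mapping can be exercised without systemd.

```java
import java.util.ArrayList;
import java.util.List;

class SystemctlViaDbus {
    /** Hypothetical mirror of org.freedesktop.systemd1.Manager methods. */
    interface SystemdManager {
        String StartUnit(String name, String mode);
        String StopUnit(String name, String mode);
        String RestartUnit(String name, String mode);
    }

    static final String DEFAULT_MODE = "replace";

    /** Simplified: systemctl appends ".service" when no unit suffix is given. */
    static String unitName(String name) {
        return name.contains(".") ? name : name + ".service";
    }

    static String run(SystemdManager m, String verb, String name) {
        String unit = unitName(name);
        switch (verb) {
            case "start":   return m.StartUnit(unit, DEFAULT_MODE);
            case "stop":    return m.StopUnit(unit, DEFAULT_MODE);
            case "restart": return m.RestartUnit(unit, DEFAULT_MODE);
            default: throw new IllegalArgumentException("unsupported verb: " + verb);
        }
    }

    /** Test double that records calls instead of talking to the bus. */
    static class RecordingManager implements SystemdManager {
        final List<String> calls = new ArrayList<>();

        private String record(String method, String name, String mode) {
            calls.add(method + "(" + name + ", " + mode + ")");
            return "/org/freedesktop/systemd1/job/" + calls.size(); // fake job path
        }

        public String StartUnit(String n, String m)   { return record("StartUnit", n, m); }
        public String StopUnit(String n, String m)    { return record("StopUnit", n, m); }
        public String RestartUnit(String n, String m) { return record("RestartUnit", n, m); }
    }

    public static void main(String[] args) {
        RecordingManager m = new RecordingManager();
        run(m, "restart", "cassandra");
        System.out.println(m.calls); // [RestartUnit(cassandra.service, replace)]
    }
}
```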

Enhanced Integration
• Service status = “active” — process running, or something more?
• Cassandra java process running vs. C* accepting CQL connections
• CQL clients are dependent services, but shouldn’t start until CQL is available
• Clients could fail-fast on no connectivity
• Will be automatically restarted
• Service will oscillate between active and failed — hard to detect
actual failures
• systemd will eventually timeout or give up — configurable
• JVM startup can be expensive — CPU usage spikes
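"Give up" here refers to systemd's start rate limiting: a unit may be started at most StartLimitBurst= times within StartLimitIntervalSec= (StartLimitInterval= on older releases), with a usual default of 5 starts per 10 s. Restarts triggered by Restart= count too. The toy model below shows why a fail-fast client restarting every 500 ms trips the limit long before a slow Cassandra startup finishes. It approximates systemd's windowed counter and is not its actual code. StartLimit is a hypothetical name.

```java
class StartLimit {
    private final int burst;
    private final long intervalMillis;
    private long windowStart = -1;
    private int starts;

    StartLimit(int burst, long intervalMillis) {
        this.burst = burst;
        this.intervalMillis = intervalMillis;
    }

    /** True if a start at nowMillis is allowed under the burst/interval limit. */
    boolean tryStart(long nowMillis) {
        if (windowStart < 0 || nowMillis - windowStart >= intervalMillis) {
            windowStart = nowMillis; // open a fresh window
            starts = 0;
        }
        if (starts < burst) {
            starts++;
            return true;
        }
        return false; // systemd would leave the unit failed ("start request repeated too quickly")
    }

    public static void main(String[] args) {
        // A fail-fast CQL client that exits every 500 ms while CQL is not yet available
        StartLimit limit = new StartLimit(5, 10_000);
        for (long t = 0; t <= 3_000; t += 500) {
            System.out.println("t=" + t + "ms start allowed: " + limit.tryStart(t));
        }
    }
}
```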

Enhanced Integration cont’d
• systemd targets for CQL & Thrift — cassandra-cql.target
• Life-cycle tracks internal C* service
• i.e., Starts when CQL is available — not immediate
• nodetool disablebinary implies systemctl stop cassandra-cql.target
• Services that require CQL connectivity use 
WantedBy=cassandra-cql.target
• Starting cassandra-cql.target starts these services too
• WantedBy= is the inverse of Wants=

Enhanced Integration cont’d
• Java Agent side-loaded into Cassandra JVM
• Hooks into CQL/Thrift service life-cycle
• Implemented using runtime byte-code modification
• Controls systemd via D-Bus to start/stop associated
target units
• But Cassandra is open-source — why not modify‽
• Agents work with DSE & Apache Cassandra
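Stripped of byte-code details, the agent's job is bookkeeping. When a client-facing server comes up, start its target; when it goes down, stop the target; ignore duplicate events. The sketch below captures only that logic. UnitControl is a hypothetical stand-in for the D-Bus calls to systemd. cassandra-cql.target comes from the slides, while cassandra-thrift.target is an assumed name for the Thrift counterpart.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class LifecycleBridge {
    enum Transport { CQL, THRIFT }

    /** Hypothetical stand-in for starting/stopping units over D-Bus. */
    interface UnitControl {
        void start(String unit);
        void stop(String unit);
    }

    private final UnitControl units;
    private final Set<Transport> running = EnumSet.noneOf(Transport.class);

    LifecycleBridge(UnitControl units) { this.units = units; }

    static String targetFor(Transport t) {
        return t == Transport.CQL ? "cassandra-cql.target" : "cassandra-thrift.target";
    }

    synchronized void serverStarted(Transport t) {
        if (running.add(t)) units.start(targetFor(t)); // act only on a real state change
    }

    synchronized void serverStopped(Transport t) {
        if (running.remove(t)) units.stop(targetFor(t));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        LifecycleBridge bridge = new LifecycleBridge(new UnitControl() {
            public void start(String unit) { log.add("start " + unit); }
            public void stop(String unit)  { log.add("stop " + unit); }
        });
        bridge.serverStarted(Transport.CQL); // CQL comes up late in startup
        bridge.serverStarted(Transport.CQL); // duplicate event, ignored
        bridge.serverStopped(Transport.CQL); // e.g. nodetool disablebinary
        System.out.println(log); // [start cassandra-cql.target, stop cassandra-cql.target]
    }
}
```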

Java Agent
• Java Agents (java.lang.instrument)
• java -javaagent:instaclustr-agent.jar …
• premain(…) method called at JVM startup
• can hook into JVM class-loading, transform byte-code, etc.
• Javassist, ASM — byte-code modification libraries
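Before any byte-code modification, the skeleton of an agent is small. A class exposes premain(String, Instrumentation), the agent JAR's manifest names it in the Premain-Class attribute, and java -javaagent:… loads it before the application's main. The sketch below registers a transformer that only observes the class the slides hook, and leaves every class unchanged by returning null. TracingAgent is a hypothetical name. In a real agent JAR the class would normally be public and live in its own file.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

class TracingAgent {
    /** Internal (slash-separated) name of the class to watch. */
    static final String TARGET = "org/apache/cassandra/transport/Server";

    static final class Tracer implements ClassFileTransformer {
        final AtomicInteger seen = new AtomicInteger(); // classes may load on several threads

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (TARGET.equals(className)) {
                seen.incrementAndGet();
                System.err.println("agent: loading " + className.replace('/', '.'));
            }
            return null; // null means "leave these bytes unchanged"
        }
    }

    // Entry point named by the Premain-Class manifest attribute; runs before main()
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Tracer());
    }
}
```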

Hooks
public interface Server {
    public void start();

    public void stop();
    ⋮
}

// in CassandraDaemon:
// Thrift
thriftServer = new ThriftServer(rpcAddr, rpcPort, listenBacklog);
⋮
thriftServer.start();
⋮
thriftServer.stop();

// CQL
nativeServer = new org.apache.cassandra.transport.Server(nativeAddr, nativePort);
⋮
nativeServer.start();
⋮
nativeServer.stop();

Hooks
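package com.instaclustr;

import java.io.ByteArrayInputStream;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import org.apache.cassandra.service.CassandraDaemon;

public class Agent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // anonymous class, not a lambda: since Java 9 both transform(...) methods are
        // default methods, so ClassFileTransformer is no longer a functional interface
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain, byte[] classfileBuffer) {
                // className uses the JVM's internal, slash-separated form
                if (!"org/apache/cassandra/transport/Server".equals(className))
                    return null; // null = leave the class untouched

                final ClassPool pool = ClassPool.getDefault();
                try {
                    // parse the bytes actually being loaded instead of re-reading them from the class path
                    final CtClass ctClass = pool.makeClass(new ByteArrayInputStream(classfileBuffer));
                    // patch start() and stop() methods of the Server class ($0 is this)
                    {
                        final CtMethod method = ctClass.getDeclaredMethod("start");
                        method.insertAfter("com.instaclustr.Agent.serverStarted($0);");
                    }
                    {
                        final CtMethod method = ctClass.getDeclaredMethod("stop");
                        method.insertAfter("com.instaclustr.Agent.serverStopped($0);");
                    }

                    byte[] byteCode = ctClass.toBytecode();
                    ctClass.detach(); // remove from the pool; it is not needed again

                    return byteCode; // return the modified byte-code

                } catch (final Exception e) {…}

                return null;
            }
        });
    }

    // called when Server started — call systemd via dbus-java to start cassandra-cql.target
    public static void serverStarted(final CassandraDaemon.Server server) {…}
    // called when Server stopped — call systemd via dbus-java to stop cassandra-cql.target
    public static void serverStopped(final CassandraDaemon.Server server) {…}
}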

Docker Limitations and Sore Spots
• docker run is just a TTY proxy — actual container process is under
the docker dæmon process/cgroup
• systemd requires startup & watchdog notifications to originate
from started process, child, or process in same cgroup
• docker crash = all containers go bye-bye
• docker … everything — inc. image downloads & builds — runs as
root in the dæmon!
• even though processes inside containers are run un-elevated

Future
• The development version of systemd can now launch Docker containers natively via
machinectl
• Tighter integration with systemd
• Process hierarchy is correct — right cgroup and parents
• Java Agent can notify systemd for startup, status &
watchdog — via JNA + libsystemd
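The protocol such an agent would speak is simple text. systemd passes NOTIFY_SOCKET to the service and, when WatchdogSec= is set, WATCHDOG_USEC. The service sends newline-separated KEY=VALUE assignments such as READY=1, STATUS=… and WATCHDOG=1. sd_watchdog_enabled(3) recommends pinging at roughly half the watchdog interval, and if WATCHDOG_PID is set, only that PID should ping. The sketch covers only message construction and timing. Actually sending the datagram (for example via JNA and libsystemd's sd_notify) is not shown. NotifyPlan is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;

class NotifyPlan {
    /** Joins KEY=VALUE assignments into one sd_notify-style message. */
    static String message(Map<String, String> assignments) {
        StringJoiner joiner = new StringJoiner("\n");
        assignments.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    /** Notifications are only possible when systemd provided a socket. */
    static boolean canNotify(Map<String, String> env) {
        return env.containsKey("NOTIFY_SOCKET");
    }

    /** Suggested ping period: half of WATCHDOG_USEC, converted to milliseconds. */
    static Optional<Long> watchdogPingMillis(Map<String, String> env) {
        String usec = env.get("WATCHDOG_USEC");
        if (usec == null) return Optional.empty();
        long micros = Long.parseLong(usec.trim());
        if (micros <= 0) return Optional.empty();
        return Optional.of(micros / 2 / 1000);
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        Map<String, String> ready = new LinkedHashMap<>();
        ready.put("READY", "1");
        ready.put("STATUS", "CQL available");
        System.out.println("can notify: " + canNotify(env));
        System.out.println("message: " + message(ready).replace("\n", "\\n"));
        System.out.println("watchdog ping: "
                + watchdogPingMillis(env).map(ms -> "every " + ms + " ms").orElse("disabled"));
    }
}
```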
