Building Apache Cassandra
clusters for massive scale
Covering theory and operational aspects of bringing up
Apache Cassandra clusters - this presentation can be used
as a field reference.
Alex Thompson, Solution Architect APAC - DataStax Australia Pty Ltd
Operationalise the rollout of
nodes
2
Build a best practice reproducible machine
image using automation:
Use one of the core tested Linux distros and versions: RHEL, CentOS or Ubuntu Server.
Select a cloud server or on-premise hardware that at least meets minimum specifications for Apache Cassandra, refer
to this guide for details: Planning Apache Cassandra Hardware
For production, load testing and production-like workloads, do NOT use a SAN, NAS, Ceph or any other type of shared
storage; DO use directly attached SSDs.
More RAM is better and more CPU is better, but don’t get stuck in the RDBMS trap of vertical scaling: Apache Cassandra
works best with a larger number of medium-spec’d nodes rather than a small number of very large nodes - think horizontal
scaling, not vertical scaling.
3
Build a best practice reproducible machine
image using automation:
Use an automation tool like Ansible, Salt, Chef or Puppet to:
1. Apply Apache Cassandra OS specific settings for Linux
2. Install Java JDK 1.8.latest
3. Install but do not start Apache Cassandra via yum or apt (a tarball is also available)
4. Copy over this node’s cassandra.yaml and cassandra-env.sh
5. Lock down all ports except the required Apache Cassandra ports in iptables. You can see a list of the ports and
their usage here: Securing Firewall. As a simple list, you need access on 22 (SSH), 7000, 7001 (SSL), 9042
(CQL), 9160 (Thrift - optional) and 7199 (JMX - optional)
Refer to the presentation by Jon from Macquarie Bank on the use of Ansible and lessons learned for an in-depth
discussion on automation - November 2016 meetup.
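As a rough sketch only (not a complete playbook), the steps above might look like this in Ansible. Everything here is an assumption to illustrate the shape of the automation: a yum-based distro, the Apache Cassandra yum repo already configured, per-node config files kept under files/<hostname>/, and the RHEL/CentOS config path /etc/cassandra/conf.

- hosts: cassandra_nodes
  become: true
  tasks:
    - name: Apply a Cassandra-recommended OS setting (one example of several)
      sysctl:
        name: vm.swappiness
        value: "1"
        state: present
    - name: Install the Java 8 JDK
      yum:
        name: java-1.8.0-openjdk
        state: present
    - name: Install Apache Cassandra but do not start it
      yum:
        name: cassandra
        state: present
    - name: Copy this node's cassandra.yaml and cassandra-env.sh
      copy:
        src: "files/{{ inventory_hostname }}/{{ item }}"
        dest: "/etc/cassandra/conf/{{ item }}"
      with_items:
        - cassandra.yaml
        - cassandra-env.sh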
4
Minimum node specific cassandra.yaml
fields for automation deployment scripts:
cluster_name: All nodes participating in a cluster must have the identical cluster name.
hints_directory: Where to store hints for other nodes that are down; small disk space requirement.
authenticator: Used to identify users; the default is wide open. Lock this down, in combination with transport layer security and on-disk encryption, if internet exposed.
authorizer: Used to limit access / provide permissions; the default is wide open. Lock this down, in combination with transport layer security and on-disk encryption, if internet exposed.
data_file_directories: Where you will store data for this node; this will be the largest consumer of disk space. You should put your commitlog_directory and data_file_directories on different drives for performance.
commitlog_directory: You should put your commitlog_directory and data_file_directories on different drives for performance.
saved_caches_directory: Where to store your “fast start-up” cache; small disk space requirement.
5
Minimum node specific cassandra.yaml
fields for automation deployment scripts:
seeds: When bootstrapping a new node into a cluster, the bootstrapping node will refer to a seed node to learn the topology of the cluster; with this information it can take ownership of token ranges and begin data transfer.
listen_address: The IP address of the node, for a single-homed 1x NIC node.
rpc_address: The IP address of the node, for a single-homed 1x NIC node.
endpoint_snitch: GossipingPropertyFileSnitch
1. The parameter list above is for a basic C* cluster, leaving many unlisted parameters at their default settings. The
default settings are very sane for most use cases but can be fine-tuned to maximize performance and hardware
utilisation; only tweak the unlisted parameters when you know what you are doing.
2. The parameters listed above are in top down order as at 13/2/2017 for the github.com master Apache Cassandra
repository here: cassandra.yaml
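As a minimal sketch, the node-specific part of cassandra.yaml for the second node in the example later in this deck might look like the following. The IP addresses match that example; the cluster name is made up and the directory paths are simply the packaged-install defaults, so adjust both for your environment (ideally with the commitlog on a separate drive):

cluster_name: 'MyCluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.3.62"
listen_address: 10.10.3.63
rpc_address: 10.10.3.63
endpoint_snitch: GossipingPropertyFileSnitch
hints_directory: /var/lib/cassandra/hints
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
# Lock these down on anything internet exposed (the defaults allow all):
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer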
6
Minimum node specific cassandra-env.sh
fields for automation deployment scripts:
If cassandra-env.sh is left in its default form it will allocate ¼ of the node's RAM to Apache Cassandra; this can be
problematic on very small spec’d nodes, as C* really needs a minimum 4GB HEAP allocation to function, even in development.
As a general rule, if HEAP <= 16GB use ParNew/CMS GC; otherwise (HEAP > 16GB) use G1 GC.
You set the HEAP by uncommenting the following in the cassandra-env.sh:
#MAX_HEAP_SIZE="4G"
#HEAP_NEWSIZE="800M"
G1 requires that only MAX_HEAP_SIZE be set.
In production the HEAP settings for G1 GC are usually 16, 24 or 32GB.
ParNew/CMS requires that both are set; as a guide, HEAP_NEWSIZE should be 20-25% of MAX_HEAP_SIZE.
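For example, with the figures above (illustrative values, not recommendations):

# ParNew/CMS: set both, with HEAP_NEWSIZE at 20-25% of MAX_HEAP_SIZE
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"

# G1: set only MAX_HEAP_SIZE and leave HEAP_NEWSIZE commented out
# MAX_HEAP_SIZE="24G"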
7
Summary so far
We now have a node that:
1. Is on the correct hardware
2. Has the correct OS with basic tuning in place
3. Has the correct Java JDK version
4. Has Apache Cassandra installed via yum or apt
5. Has customised cassandra.yaml and cassandra-env.sh files
6. Has been secured at the iptables level
7. Can now be started and bootstrapped against a seed in the cluster
8
Construction of the cluster
9
Bringing up the first node...
This is a new cluster, so when bringing up the first node there is in effect nothing to bootstrap against; Cassandra
understands this and initialises the node without going through the bootstrapping phase.
>service cassandra start
Check /var/log/cassandra/system.log for startup process and monitor for any warnings or exceptions.
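One way to watch that log as the node starts (the log path is the packaged-install default):
>tail -f /var/log/cassandra/system.log | grep -E "WARN|ERROR"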
You most likely want to bring up multiple nodes at once in the new cluster. For the sake of this presentation I am
looking at one node at a time so that I can break down the bootstrapping phases; to skip that and bring multiple nodes up at
once, follow the documentation here:
Initializing a multiple node cluster (single datacenter)
10
Load some data
Load some data into the first node.
Here I am going to use the
cassandra-stress tool to load 100GB of
sample data.
Cassandra-stress can be used for
loading sample data and/or stress
testing a Cassandra cluster with read /
write workloads.
You can read more about
cassandra-stress here.
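As a hedged illustration, a write load of this kind could be generated with something like the following; the operation count, thread count and node address are placeholders for your own values:
>cassandra-stress write n=100000000 -rate threads=100 -node 10.10.3.62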
[Diagram: node 1 - tokens 0-9, 100GB of data on disk]
11
Bootstrapping the second node...
Put the ip-address of the first node in the seed list of this node’s cassandra.yaml
>service cassandra start
Check /var/log/cassandra/system.log for bootstrapping progress.
12
Bootstrapping the second node...
Run the following on the first node and you will see your new node in UJ state - Up Joining:
>nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.10.3.62 100 GB 256 ? c934ced4-b1c9-4f0f-b278-83282cd7107f RAC2
UJ 10.10.3.63 3 MB 256 ? 1a3df7fa-a1e7-464a-9495-c6a52d61eafa RAC3
13
Bootstrapping...what happened?
So what is happening in this bootstrapping phase?
In the Up Joining (UJ) state the node is not actively participating in any queries, either read or write, for both internode and
client-to-node traffic.
1. A calculation is done for this node’s share of the token space; in this case it takes half of the token space, as it is
one of only two nodes in the ring, and in taking half the token space it takes responsibility for half the data in
the ring.
2. The node begins streaming in the data from the first node for its tokens.
3. The node completes streaming its data from the first node; this can take time for hundreds of GBs of data.
4. The node changes state to UN (Up Normal)
5. The node can now be discovered by drivers and their application servers and can now start responding to read /
write requests.
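You can watch the streaming progress of a joining node from either side with a standard nodetool command:
>nodetool netstats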
14
Data streaming
during bootstrap
Be aware of the costs of bootstrapping on small clusters: the data streaming phase can consume considerable resources and take increasing amounts of time for very large amounts of data.
[Diagram: node 1 - tokens 0-4, 100GB of data on disk; node 2 - tokens 5-9, data on disk growing]
15
Second node
added
Notice that the second node now owns
half of the tokens in the ring.
Notice that the data on node 1 is
100GB on disk and the data on the
new node 2 is only 50GB on disk.
[Diagram: node 1 - tokens 0-4, 100GB of data on disk; node 2 - tokens 5-9, 50GB of data on disk]
16
Bootstrapping data...WTF?
In bootstrapping the new node, we know it took half the data off the first node, but the amount of disk space used on the
first node didn’t change; it didn’t go down. WTF is going on here? Something is broken!
Rule: Bootstrapping a new node into a cluster does NOT clean up after itself and delete the orphaned data on the
original nodes!
Don’t get me wrong, the data on the first node is not hurting anything; it’s not used anymore, it just sits there using up
precious space. Let's get rid of it by running the following command on the first node:
>nodetool cleanup
Note that in a Vnode cluster (most likely what you will be using) you have to run nodetool cleanup on all nodes in the
DC except of course the node you just added.
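As a sketch, on a bigger cluster you could drive that from a jump host over SSH, one node at a time to limit the extra load (the host list here is purely illustrative):
>for host in 10.10.3.62 10.10.3.64 10.10.3.65; do ssh $host "nodetool cleanup"; done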
17
After cleanup
After [nodetool cleanup] has run data
is once again evenly distributed over
nodes.
[Diagram: node 1 - tokens 0-4, 50GB of data on disk; node 2 - tokens 5-9, 50GB of data on disk]
18
Powerful
implications
We just doubled the raw compute
capacity of our database tier in the
following ways:
1. Doubled IO throughput
2. Doubled the amount of RAM
3. Doubled the amount of disk
4. Doubled the number of CPUs
[Diagram: node 1 - tokens 0-4, 50GB of data on disk; node 2 - tokens 5-9, 50GB of data on disk]
19
Powerful
implications
The effect at the application tier is
arguably more profound: we have
doubled the workload capacity of the
underlying database tier to handle
increases in application tier traffic. So
as our workload increases at the
application tier, we simply add nodes at
the Cassandra cluster level to soak up
the workload increase.
*The tps figures in this series are not real; your tps limits will depend on your hardware, data model, replication_factor and how you read / write data. Use cassandra-stress to emulate your real-world traffic patterns and record performance behaviour.
[Diagram: one node handling 1000 tps; application server max 5000 tps]
20
Powerful
implications
The effect at the application tier is
arguably more profound: we have
doubled the workload capacity of the
underlying database tier to handle
increases in application tier traffic. So
as our workload increases at the
application tier, we simply add nodes at
the Cassandra cluster level to soak up
the workload increase.
[Diagram: two nodes handling 1000 tps each; application server max 5000 tps]
21
Practical
considerations
There is not much use having a two-node cluster; you really want a minimum of 3 nodes and a replication_factor of 3, and then scale out your cluster from there.
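As a minimal sketch, a keyspace definition for that 3-node starting point might look like the following (the keyspace and DC names are illustrative; the same NetworkTopologyStrategy form is used in the data center examples later in this deck):
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'}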
[Diagram: 3-node cluster (nodes 1, 2 and 3)]
22
Practical
considerations
Here we have stayed with a single
application server, which is not a really
good idea from a redundancy
perspective, but there is another
problem.
The tps capacity of the database tier
has scaled past the tps capacity of the
application tier, leaving the database
tier under-utilized.
[Diagram: 9-node cluster capable of 9000 tps; application server max 5000 tps]
23
Practical
considerations
Time to start scaling out the
application tier to fully utilize the
capacity of the database tier.
[Diagram: 9-node cluster capable of 9000 tps; application tier max 10000 tps]
24
Triggers for adding more nodes and
capacity planning
Too much data per node: You want to aim for 500GB-1TB of data per node; the more data per node, the longer repairs, bootstrapping and compactions take.
Insufficient free space on drives: For SizeTieredCompactionStrategy (the default) you need 50% of the disk free at all times in the worst case.
Poor IO performance: If you have done everything right in regards to the amount of data per node, have directly attached SSDs, and have tuned both your hardware and Cassandra to maximize IO performance, and you still have poor IO performance, then you need to scale out of the problem.
Bottlenecked CPUs: Same as above; if you have done everything right and tuned both your hardware and Cassandra to maximize CPU performance, and you still have poor CPU performance, then you need to scale out of the problem.
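Two quick checks for the first two triggers above: the Load column of nodetool status shows data per node, and df shows remaining disk headroom for compaction (the data path shown is the packaged-install default; adjust to your own data_file_directories):
>nodetool status
>df -h /var/lib/cassandra/data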
25
Triggers for adding more nodes and
capacity planning
Poor JVM GC behaviour: This can be tricky to troubleshoot; more than likely it’s just a scale-out fix, as you are overloading the nodes with read / write traffic, but there are cases where a poor access pattern or problematic use case can be the cause of GC churning.
Adding additional keyspaces and application workloads to the cluster: Workloads are cumulative in resource demand.
Increases in application tier traffic: The relationship with Cassandra is linear; if you double the amount of requests against your application tier, you will need to double the number of nodes in your cluster to maintain the same performance. It’s simple maths.
26
Summary so far
Now we have a basic cluster of 9 nodes that we can continue to scale out.
What we do not have is any form of redundancy:
1. What if a shared switch goes down?
2. What if a common rack chassis power supply goes down?
3. What if we lose the network to this physical data center?
Cassandra has probably the best answer to this of any DB solution available: the logical data center.
27
Redundancy, replication and
workload isolation via logical
Cassandra data centers
28
Data centers
Cassandra data centers (DCs) are a
logical not physical concept.
A Cassandra cluster is made up of
data centers and each data center
holds a complete token range.
You write your data to one data center
and it is replicated to another
data center; that other data center
could be in the same rack or across
the world.
A cluster can have many data centers
but practical limits do apply.
[Diagram: one cluster containing two data centers, DC1 and DC2, each a ring of 9 nodes]
29
Data centers
Data centers are a versatile concept
and can be used for many differing
purposes; here are some examples:
1. Simple redundancy
2. Active failover from app tier
3. Geo edge serving
4. Workload isolation
As mentioned before, each DC holds a
complete token range for the
keyspaces that are replicated to it; you
decide which keyspaces are
replicated.
[Diagram: one cluster containing data centers DC1 and DC2, each a ring of 9 nodes]
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}
30
Simple redundancy
This multi-DC cluster is a simple
redundancy setup: if we lose us-east-1
due to an outage we can access
us-west-1 for the data, for business
continuity.
[Diagram: one cluster containing data centers us-east-1 (the read/write DC) and us-west-1, each a ring of 9 nodes]
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3', 'us-west-1': '3'}
31
Active failover
This multi-DC cluster is an active
failover setup: if we lose us-east-1 due
to an outage we can fail over the
application servers to us-west-1. This
can be configured at the Cassandra
driver level*, in custom code, at the
network layer or at the DNS level.
* See the April 2016 Sydney Cassandra Users
Meetup talk that covers most aspects of driver
configuration and strategies.
[Diagram: one cluster containing data centers us-east-1 and us-west-1, each a ring of 9 nodes; the read/write DC actively fails over to the us-west-1 DC]
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3', 'us-west-1': '3'}
32
Geo edge serving
All DCs are close to their own
in-country app servers.
Writes can be handled in any number
of ways; reads are always from the
closest DC.
Any write to any DC replicates to the
other 3 geographic locations.
[Diagram: one cluster containing data centers US-DC, EU-DC, ME-DC and AP-DC, each a ring of 9 nodes]
CREATE KEYSPACE myKeyspace
WITH replication =
{'class': 'NetworkTopologyStrategy', 'US-DC': '3', 'EU-DC': '3', 'ME-DC': '3', 'AP-DC': '3'}
33
Workload isolation
34
Workload isolation
Apart from simple redundancy this is the most
important use of logical data centers in
Cassandra.
Different workloads are pointed to different
data centers, allowing us to isolate, say, a spiky
web workload from an analytic Spark
workload; we can then independently scale
each DC to its own workload, making the most
efficient use of resources.
In this example we replicate cass-DC tables to
spark-DC, perform analytics on them and write
to recommendation tables in the spark-DC
which replicate back to the cass-DC.
[Diagram: one cluster containing cass-DC (serving the app server) and spark-DC (serving Spark), each a ring of 9 nodes]
CREATE KEYSPACE web_tables
WITH replication = {'class': 'NetworkTopologyStrategy', 'cass-DC': '3', 'spark-DC': '2'}
CREATE KEYSPACE recommendation_tables
WITH replication = {'class': 'NetworkTopologyStrategy', 'spark-DC': '2', 'cass-DC': '3'}
35
C* Learning resources
The DataStax documentation has more extensive descriptions of all the concepts listed here; please
refer to it if you need more in-depth knowledge, and don’t forget academy.datastax.com for full
courses and a multitude of Apache Cassandra learning resources.
36
Thanks!
Contact us:
DataStax Australia
alex.thompson@datastax.com
www.datastax.com
37