OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard

Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
Instaclustr—Technology Evangelist
©Instaclustr Pty Limited, 2022
DeveloperWeek 10 May 2022

Who am I?
• Previously
• R&D in distributed systems and performance engineering.
• Last 5 years
• Technology Evangelist for Instaclustr (soon NetApp)
• 100+ Blogs, demo applications, talks
• Open Source technologies including
• Apache Cassandra, Spark, Kafka, Zookeeper
• and Redis, OpenSearch, PostgreSQL, Kubernetes, Prometheus, OpenTracing, etc.

Cloud Platform for Big Data
Open Source Technologies
Latest addition is
Workflows with Uber’s
Instaclustr Managed Platform

This talk focuses on Cassandra and Kafka

Scaling is Easy! Cassandra and Kafka
Homogeneous distributed clusters → horizontally scalable
www.cassandra.apache.org/_/cassandra-basics.html

But actually lots of moving parts
(source: http://trumpetb.net/loco/rodsf.html)

Complications – DCs, Racks, Nodes, Partitions, Replication Factor, Time (for auto-scaling)
Rows have a partition key and are stored in different partitions

Example 1 – Cassandra Auto-Scaling

Two Ways of Resizing Clusters
1 - Horizontal Scaling
• Add nodes, no interruption
• But scale up only (not down)
• Takes time, puts extra load on cluster as data streams to extra nodes
2 - Vertical Scaling
• Replace nodes with bigger (or smaller) node types (more/less cores)
• Scale up and down
• Takes time, temporary reduction in capacity
• Choice of how many nodes are replaced concurrently – by “node” (1 node at a time), by “rack” (all nodes in a rack), or in between

Cluster resizing time – by node vs. by rack – by rack is faster but …?
Cluster = 6 nodes, 3 racks, 2 nodes per rack
By node (concurrency 1)
By rack (concurrency 2)

Resizing by node – capacity is reduced by 1/6 of total nodes during each resize operation (simplified model)

Resizing by rack – capacity is reduced by 2/6 of total nodes during each resize operation

Comparison – resizing by rack is faster but takes a bigger capacity hit during the resize
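
The simplified model above can be sketched in a few lines. This is a minimal illustration, not Instaclustr's implementation: it assumes equal-sized nodes and that every node being replaced contributes no capacity while its operation runs. The function name `resize_plan` is hypothetical.

```python
from fractions import Fraction
from math import ceil


def resize_plan(total_nodes, concurrency):
    """Return (resize operations needed, fraction of capacity left during each).

    Assumes equal-sized nodes and that nodes being replaced serve no load.
    """
    if not 1 <= concurrency <= total_nodes:
        raise ValueError("concurrency must be between 1 and total_nodes")
    operations = ceil(total_nodes / concurrency)
    remaining = Fraction(total_nodes - concurrency, total_nodes)
    return operations, remaining


# The 6-node, 3-rack cluster from the slides (2 nodes per rack).
for label, concurrency in (("by node", 1), ("by rack", 2)):
    operations, remaining = resize_plan(6, concurrency)
    print(f"{label}: {operations} operations, {float(remaining):.0%} capacity during resize")
```

For the example cluster this gives six operations at 5/6 of capacity by node, and three operations at 4/6 of capacity by rack.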

Observations
• If the capacity available during a resize is exceeded, latencies will increase
• Made worse by Cassandra load balancing, which assumes equal-sized nodes
• By node, more nodes in the cluster reduces the impact of reduced cluster capacity during resizing (some clusters have 100s of nodes) – but the resize will take longer
• Many of our clusters have <= 6 nodes

Auto-scaling model - increasing load → linear regression over 1 hour extrapolated to future
We predict the cluster will reach 100% capacity around the 280 minute mark (220 minutes in the future)
[Chart series: Measured, Extrapolated]
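
A minimal sketch of the extrapolation step, using ordinary least squares in plain Python. The load samples are synthetic, chosen only so that the result matches the numbers on the slide (100% at the 280 minute mark, 220 minutes after the last sample); the function names are hypothetical and this is not the actual POC code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def minute_reaching(threshold, slope, intercept):
    """Minute at which the fitted line reaches threshold, or None if load is not rising."""
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Synthetic samples (assumption): one hour of load, as % of cluster capacity.
minutes = [0, 10, 20, 30, 40, 50, 60]
load_pct = [30.0, 32.5, 35.0, 37.5, 40.0, 42.5, 45.0]

slope, intercept = fit_line(minutes, load_pct)
full_at = minute_reaching(100.0, slope, intercept)
print(f"predicted 100% capacity at minute {full_at:.0f}, "
      f"{full_at - minutes[-1]:.0f} minutes in the future")
```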

Resize by Rack vs. Node - initiated in time to prevent overloading during resize operation
Resize by rack must be initiated sooner than resize by node, even though it is faster to resize, because it leaves less capacity during the resize (67% vs. 83% of initial capacity)
[Chart panels: By Rack, By Node]
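
One way to see why the faster option has to start sooner is the rough calculation below. It is a deliberately pessimistic sketch, not the model used in the talk: it assumes load grows linearly, that capacity stays at the reduced level for the whole resize, and that each operation takes a fixed, made-up 20 minutes. The load line (30% at minute 0, rising 0.25% per minute) is likewise an assumed example.

```python
def latest_safe_start(slope, intercept, nodes, concurrency, minutes_per_operation):
    """Latest minute a resize can start and still finish before load exceeds
    the reduced capacity (pessimistic: capacity stays reduced throughout)."""
    reduced_capacity_pct = 100.0 * (nodes - concurrency) / nodes
    overload_at = (reduced_capacity_pct - intercept) / slope  # requires slope > 0
    operations = -(-nodes // concurrency)  # ceiling division
    return overload_at - operations * minutes_per_operation


# Assumed load line and operation time (not from the talk).
SLOPE, INTERCEPT, MINUTES_PER_OPERATION = 0.25, 30.0, 20

by_node = latest_safe_start(SLOPE, INTERCEPT, 6, 1, MINUTES_PER_OPERATION)
by_rack = latest_safe_start(SLOPE, INTERCEPT, 6, 2, MINUTES_PER_OPERATION)
print(f"by node: start by minute {by_node:.0f}")
print(f"by rack: start by minute {by_rack:.0f}")
```

With these assumed numbers the by-rack resize takes half as long (60 vs. 120 minutes) but still has to start earlier, because load reaches 67% of capacity well before it reaches 83%.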

Auto-scaling POC – worked!
Monitoring API → Linear Regression + Rules → Provisioning API
Rules generalized to allow for
• scaling up and down
• resizing by any number of nodes concurrently, up to rack size

Example 2 – Anomaly Detection
JoAnn Morgan Apollo 11 Mission Control

Multiple technologies: Kafka, Cassandra, Kubernetes

Massively Scalable Anomaly Detection – Tuning knobs (orange h/w, yellow s/w)
Scaling is (too) Easy!
Initially just increased h/w resources

But scalability not great
[Chart: Total Cores vs. Billions of checks/day (pre-tuning); x-axis: Total Cores, 0 to 700; y-axis: Billions of checks/day, 0 to 8]

Tuning required! Scalability Post-tuning
[Chart: Total Cores vs. Billions of checks/day, pre-tuning and post-tuning; x-axis: Total Cores, 0 to 700; y-axis: Billions of checks/day, 0 to 20]

Tuning – Optimize s/w resources (red arrows)
1. Minimize Kafka Consumers (thread pool 1)
2. Minimize Cassandra Connections
3. Maximize Cassandra client concurrency (thread pool 2)
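
The two-thread-pool idea can be sketched as follows. This is only an illustration of the shape of the design, using Python's standard `concurrent.futures`: the batches stand in for Kafka records and `check_event` stands in for the Cassandra reads and the anomaly check. The pool sizes and all names are hypothetical, not the values used in the demo application.

```python
from concurrent.futures import ThreadPoolExecutor

CONSUMER_THREADS = 2    # thread pool 1: kept small (few Kafka consumers)
DETECTOR_THREADS = 16   # thread pool 2: kept large (Cassandra client concurrency)


def check_event(event):
    """Stand-in for the slow database read plus the anomaly check."""
    return event % 7 == 0


def run_pipeline(batches):
    """Each consumer thread only hands events to the larger detector pool."""
    with ThreadPoolExecutor(max_workers=DETECTOR_THREADS) as detectors:

        def consume(batch):
            futures = [detectors.submit(check_event, event) for event in batch]
            return [future.result() for future in futures]

        with ThreadPoolExecutor(max_workers=CONSUMER_THREADS) as consumers:
            return list(consumers.map(consume, batches))


results = run_pipeline([[0, 1, 7], [14, 3]])
print(results)
```

Keeping the consumer side thin means the slow work does not force more Kafka consumers, and therefore more partitions, to be added just to keep up.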

Kafka topic partitions enable consumer concurrency
• partitions >= consumers
• Consumers share work within groups
[Diagram: a Producer writes to Topic “Parties” (Partition 1, Partition 2, up to Partition n), which is read by a Consumer Group of three Consumers]
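
Why partitions >= consumers matters: within a consumer group each partition is read by only one consumer, so any consumers beyond the partition count sit idle. The sketch below uses a simple round-robin assignment to show this; it is an illustration only, not Kafka's actual partition assignor, and the function name is hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin each partition to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


group = ["consumer-1", "consumer-2", "consumer-3"]

# Fewer partitions than consumers: one consumer gets nothing to do.
print(assign_partitions(2, group))

# At least as many partitions as consumers: every consumer has work.
print(assign_partitions(6, group))
```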

High consumer/partition fan out
Can be caused by:
1 Design – many topics and/or many consumers
2 Slow consumers → need more consumers to increase throughput

Kafka write architecture – partition replication

Benchmarking revealed that partitions and replication factor are the culprit
[Chart: Kafka Partitions vs. Throughput; cluster: 3 nodes x 4 cores = 12 cores total; x-axis: Partitions, 1 to 10000 (log scale); y-axis: TPS, 0 to 900000; series: Replication Factor 3, Replication Factor 1]
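
A back-of-the-envelope way to see how partitions and replication factor multiply the work each broker does is to count partition replicas per broker. The sketch assumes replicas are spread evenly across brokers; it says nothing about the measured throughput in the chart, and the function name is hypothetical.

```python
def replicas_per_broker(partitions, replication_factor, brokers):
    """Average number of partition replicas each broker hosts,
    assuming replicas are spread evenly across the cluster."""
    if replication_factor > brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return partitions * replication_factor / brokers


# The benchmark cluster had 3 nodes; compare the two replication factors.
for partitions in (10, 100, 1000, 10000):
    rf3 = replicas_per_broker(partitions, 3, 3)
    rf1 = replicas_per_broker(partitions, 1, 3)
    print(f"{partitions:>5} partitions: RF=3 -> {rf3:.0f} per broker, RF=1 -> {rf1:.0f} per broker")
```

On a 3-node cluster with replication factor 3, every broker hosts a replica of every partition, three times as many as with replication factor 1.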

Implications?
• Bigger Cluster (more nodes, bigger nodes)
• Design to minimize topics and consumers
• Optimize consumers for minimum time
• Always benchmark with many partitions
• Blame Apache ZooKeeper?
• Responsible for Kafka control
• From version 3.0 it’s being replaced by the native KRaft protocol
• Not yet production ready
• May enable more partitions (but may not impact throughput)

Scaling is Mostly Easy!
• Using Scalable Open Source Big Data Technologies
• Hosted by suitable Cloud providers
• With suitable monitoring, understanding of autoscaling and how different software “knobs” interact, and by scaling incrementally

www.instaclustr.com
info@instaclustr.com
@instaclustr
THANK YOU!
For further information, see the blogs at www.instaclustr.com/paul-brebner/
