Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Flexible transactional scale for the connected world.
Challenges to Scaling MySQL:
Pros & Cons of Sharding
Dave A. Anselmi @AnselmiDave
Director of Product Management
Questions for Today
o Why do DBAs think sharding is the only solution?
o What are the long-term costs of sharding?
o What is a better alternative to sharding MySQL?
o How real is it? Is it too good to be true?
PROPRIETARY & CONFIDENTIAL 2
HITTING CAPACITY
The Typical Path to Sharding…
PROPRIETARY & CONFIDENTIAL 4
[Timeline diagram: as scale grows over time, the app gets too slow, users are lost, and you hit the wall. First on a LAMP stack (AWS, Azure, RAX, GCE, or a private cloud), prompting a migration to a bigger MySQL machine; then again, prompting read slaves and then sharding: more hardware and DBAs, and refactoring of the hardwired app. Result: more expensive, higher risk, lost revenue.]
• Ongoing Refactoring
• Ongoing Hardware
• Ongoing Data balancing
• Ongoing Shard Maintenance
→ Increasing Expenses
→ Increasing Maintenance
→ Increasing Risk
PROPRIETARY & CONFIDENTIAL 5
The Typical Path to Sharding… (repeat)
[Timeline diagram repeats: each time you hit the wall, you migrate, add read slaves, shard, and refactor again. Repeat: more expensive, higher risk, lost revenue.]
This is tradition
o Lots of people do it, including Facebook
o You can hang onto MySQL
o You do get write scale… with caveats
PROPRIETARY & CONFIDENTIAL 6
‘INEVITABILITY’ of SHARDING...
Other Scaling Solutions Can’t Scale-out Writes

Scale-Up: keep increasing the size of the (single) database server
• Pros: simple, no application changes needed; the best solution for capacity, if it can handle your workload
• Cons: expensive (at some point you’re paying 5x for 2x the performance); capacity limit (most clouds provide at most 64 vCPUs for a single server)

Read Slaves: add ‘Slave’ read-server(s) to the ‘Master’ database server
• Pros: simple to implement, with lots of automation available; read/write fan-out can be done at the proxy level
• Cons: best for read-heavy workloads, since it only adds read performance; data consistency issues can occur; critical apps still need to read from the master

Master-Master: add additional ‘Master’(s) to the ‘Master’ database server
• Pros: adds read + write scaling without needing to shard; depending on workload, scaling can approach linear
• Cons: adds write scaling at the cost of read slaves, adding even more latency; application changes are required to ensure data consistency / conflict resolution
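The "read/write fan-out at the proxy level" point above can be sketched as follows. This is an illustrative toy, not any particular proxy's API; `route`, `MASTER`, and `SLAVES` are invented names.

```python
# Toy read/write splitter of the kind a SQL-aware proxy performs.
# Connection strings are stand-ins; all names are illustrative.

MASTER = "master:3306"
SLAVES = ["slave1:3306", "slave2:3306"]
_rr = 0  # round-robin cursor over the read slaves

def route(sql: str) -> str:
    global _rr
    if sql.lstrip().upper().startswith("SELECT"):
        _rr = (_rr + 1) % len(SLAVES)    # reads fan out across slaves
        return SLAVES[_rr]
    return MASTER                        # all writes go to the master

assert route("SELECT * FROM users") in SLAVES
assert route("UPDATE users SET online=1") == MASTER
```

Note the caveat from the table still applies: a slave may lag behind the master, so reads that must see the latest write still have to go to the master.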
Sharding’s Two Options

Vertical Sharding: separating tables across separate database servers
• Pros: adds both write and read scaling, depending on well-chosen table distribution; much less difficult than ‘regular’ sharding, with much of the gains
• Cons: loses transactionality, referential integrity, and ACID across shards; consistent backups across all the shards are very hard to manage; data management (skew/hotness) is significant ongoing maintenance

Horizontal Sharding: partitioning tables across separate database servers
• Pros: adds both read and write scaling, depending on well-chosen keys and low skew; the most common way to scale out both reads and writes
• Cons: loses transactionality, referential integrity, and ACID across shards; consistent backups across all the shards are very hard to manage; data management (skew/hotness) is significant ongoing maintenance
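Horizontal sharding's "well-chosen keys" requirement can be made concrete with a minimal routing sketch. All names (`shard_for`, `NUM_SHARDS`) are illustrative, not from any real sharding framework:

```python
# Hypothetical sketch: routing rows to shards by hashing a shard key.
import hashlib

NUM_SHARDS = 4  # one logical shard per MySQL server

def shard_for(user_id: int) -> int:
    # Hash the key so sequential IDs spread evenly across shards
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every query must now be routed by the application:
assert shard_for(8797) in range(NUM_SHARDS)
# A JOIN between a users row on one shard and an orders row on
# another can no longer be expressed as a single SQL statement.
```

Changing `NUM_SHARDS` later changes almost every key's mapping, which is why re-sharding forces large data moves.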
CHALLENGES of SHARDING...
PROPRIETARY & CONFIDENTIAL 11
What are the Challenges of Sharding?
cf. “SQL Databases Don’t Scale”, 2006
o Significant Application Changes Required
o ACID Transactions across all nodes
o Cross-node Referential Integrity
o Data Skew across the shard array
o Sharding Key management
o Rolling Reboots
o Complicated Infra
o Full redundancy required for HA
o Consistent Backups across the shard array
Sharding’s Challenges: Application Changes
o Significant Application Changes required
– Do you need cross-node transactions?
– Do you need consistent data?
o ACID Transactions across all nodes
– Sharded MySQL provides no cross-shard ACID transactions
– Avoiding cross-node transactions requires an array of cross-node replication, introducing latency and significant infra complexity
o Cross-node referential integrity
– Parent/child relationships between tables (including self-references) that cross nodes aren’t automatically maintained
– Cascade UPDATEs and DELETEs aren’t automatically maintained
PROPRIETARY & CONFIDENTIAL 12
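The cross-node ACID point above can be sketched with a toy two-shard transfer. The `FakeShard` objects stand in for independent MySQL servers (all names invented); a crash between the two independent commits leaves the data inconsistent:

```python
# Hypothetical sketch of why cross-shard writes lose atomicity.

class FakeShard:
    """Stands in for one MySQL server; its commits are independent."""
    def __init__(self, balances):
        self.balances = dict(balances)

shard_a = FakeShard({"alice": 100})
shard_b = FakeShard({"bob": 0})

def transfer(amount, fail_between=False):
    shard_a.balances["alice"] -= amount   # COMMIT on shard A
    if fail_between:
        raise RuntimeError("crash between the two commits")
    shard_b.balances["bob"] += amount     # COMMIT on shard B

try:
    transfer(10, fail_between=True)
except RuntimeError:
    pass

# Money has left shard A but never arrived on shard B:
assert shard_a.balances["alice"] == 90
assert shard_b.balances["bob"] == 0
```

Avoiding this requires either never crossing shards in one transaction, or building two-phase-commit machinery in the application.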
Sharding’s Challenges: Data Maintenance
o Data Skew across the shard array
– Each shard grows independently
– Splitting (or combining) shards requires topology and/or PK changes, updates to memcache, etc.
o Sharding Key management
– Careful creation of the key is critical; updates cause both app
changes and significant data re-distribution
– Shard-key:Shard lookups require fast KVS, often memcache
o Rolling Reboots
– Often needed to reset inter-shard communication and/or
consistency
PROPRIETARY & CONFIDENTIAL 13
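The shard-key:shard lookup mentioned above is typically a fast key-value store such as memcache; a plain dict is enough to illustrate the mapping the DBA must maintain (the `assign` helper and its round-robin placement are invented for this sketch):

```python
# Minimal stand-in for the shard-key -> shard lookup table.
# In production this mapping usually lives in a fast KVS (e.g. memcache).

shard_map = {}          # shard_key -> shard id
next_shard = 0

def assign(shard_key, num_shards=4):
    global next_shard
    if shard_key not in shard_map:
        shard_map[shard_key] = next_shard % num_shards  # toy placement policy
        next_shard += 1
    return shard_map[shard_key]

# Lookups must be stable, or queries hit the wrong server:
assert assign("user:8797") == assign("user:8797")
# Re-sharding means rewriting this map AND physically moving the
# rows it points at, which is the "significant data re-distribution".
```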
Sharding’s Challenges: Infra Maintenance
o Complicated Infra
– Typically each shard has 2 to 3 nodes
– Shard mapping (often memcache)
– Replication between nodes for consistency
o Full redundancy required for HA
– Each shard typically has at least 1 full copy (cf Pinterest:
Master/Master, DR, and Backup node)
o Consistent Backups across the shard array
– At best you can recover to a point-in-time on each shard, and then
use replication to roll-forward to match the other shards
– Very involved and time-consuming
PROPRIETARY & CONFIDENTIAL 14
Review: Questions for Today
o Why do DBAs think sharding is the only end-game?
– Sharding is the only well-known way to scale-out Writes
o What are the long-term costs of sharding?
– Significant app changes are required, or business flexibility suffers
– Very complicated infrastructure is needed
– Lots of redundant hardware and DBA + DevOps OPEX
o What is a better alternative to sharding MySQL?
o How real is it? Is it too good to be true?
PROPRIETARY & CONFIDENTIAL 15
o Shared-nothing architecture → scales linearly
o Clustered → built-in redundancy
o Elastic → scale out or scale in
o Cloud-native → works on any public or private cloud
The new RDBMS requirements
16
Load Balancer
Scaling-Out Writes + Reads
without Sharding
1. Single Logical Relational Database
2. Automatic Data Distribution
3. Automatic Query Fan-Out
4. Automatic Data Rebalancing
PROPRIETARY & CONFIDENTIAL 17
Introducing ClustrixDB
What does ClustrixDB do?
PROPRIETARY & CONFIDENTIAL 19
Scale Beyond MySQL without Replication or Sharding
✓ More Transactions
✓ More Connections
✓ More Reads
✓ More Writes
Applications
ClustrixDB
ClustrixDB: Write + Read Linear Scale-Out,
Fault-tolerant, MySQL-Compatible
PROPRIETARY & CONFIDENTIAL 20
ClustrixDB
ACID Compliant
Transactions & Joins
Optimized for OLTP
Built-In Fault Tolerance
Flex-Up and Flex-Down
Minimal DB Admin
Also runs GREAT in
the Data Center
Built to run
GREAT
in any Cloud
PROPRIETARY & CONFIDENTIAL 21
ClustrixDB Overview
o Fully consistent and ACID-compliant
database
– Supports transactions
– Supports joins
– Optimized for OLTP
– Also supports reporting SQL
o All nodes are equal (no “special” node)
o All servers are read/write
o All servers accept client connections
o Tables and indexes distributed across all
nodes
– Fully automatic distribution, rebalancing, and
re-protection
Fully Distributed & Consistent Cluster
[Diagram: SQL-based applications (custom: PHP, Java, Ruby, etc.; packaged: Magento, etc.) connect at high concurrency through a HW or SW load balancer to ClustrixDB on commodity/cloud servers, joined over a private network.]
PROPRIETARY & CONFIDENTIAL 22
ClustrixDB Design: Shared Nothing
Architecture
o Query compiler
– Distribute compiled partial query fragments to
the node containing the ranking replica
o Data map
– All nodes know where all replicas are, and
current ranking replicas
o Database engine
– All nodes can perform all database operations
(no leader, aggregator, leaf, data-only, etc.
nodes)
o Data: table slices
– All table slices (default: replicas = 2) auto-redistributed by the rebalancer
Each Node Contains
[Diagram: every ClustrixDB node contains the same components: a query compiler, a data map, a database engine, and data (table slices).]
Scaling-Out Writes + Reads without Sharding
1. Single Logical Relational Database
1. ACID transactions across all nodes in cluster
2. Ad-hoc queries and on-line schema changes
2. Automatic Data Distribution
1. As tables and indexes are created
2. 64-bit consistent hashing: slice location is predictable
3. Automatic Query Fan-Out
1. Compiled query fragments routed to node with data
2. Queries processed in parallel across all nodes in cluster
4. Automatic Data Rebalancing
1. Fixes hotspots and data imbalances
2. Reprotects to ensure HA/Fault Tolerance
PROPRIETARY & CONFIDENTIAL 23
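The "slice location is predictable" point can be illustrated with a toy model. This is an assumption-laden sketch, not ClustrixDB's actual algorithm: it shows only that when placement is a pure function of the key, every node computes the same answer and can route fragments without a central lookup. (Real consistent hashing additionally minimizes data movement when nodes change.)

```python
# Illustrative model of deterministic slice location via hashing.
# NODES, SLICES_PER_TABLE, and the functions are invented names.
import hashlib

NODES = ["node1", "node2", "node3"]
SLICES_PER_TABLE = 6

def hash64(key: str) -> int:
    # Reduce a strong hash to a 64-bit value
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % (1 << 64)

def slice_for(pk: str) -> int:
    return hash64(pk) % SLICES_PER_TABLE

def node_for(table: str, pk: str) -> str:
    # Every node can compute this, so any node can route the fragment
    return NODES[slice_for(table + ":" + pk) % len(NODES)]

# Deterministic: all nodes agree where row 8797 of `users` lives.
assert node_for("users", "8797") == node_for("users", "8797")
```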
PROPRIETARY & CONFIDENTIAL 24
ClustrixDB
Automatic Data Distribution
o Tables auto-split into slices
o Every slice has a replica on another node
– Slices are auto distributed, auto protected
No Manual Intervention Required
[Diagram: database tables of billions of rows are auto-split into slices S1–S5; each slice and its replica are placed on different nodes.]
PROPRIETARY & CONFIDENTIAL 25
Adding + Removing Nodes: Scaling Out + In
o Simple Flex Up (or Flex Down)
– A single, minimal interruption of service
o All servers handle writes and reads
– Workload is spread across more servers
after Flex Up
o Data is automatically rebalanced across
the cluster
– Tables are online for reads and writes
[Diagram: when a node is added, slices S1–S5 and their replicas are redistributed across all ClustrixDB nodes.]
PROPRIETARY & CONFIDENTIAL 26
Automatic Query Fan-Out
o Load balancer spreads DB connections
across all nodes
o A session is established on any node
o The session controls query execution
– Parse SQL
– Generate the execution plan
– Compile into binary fragments
– Lookup record(s) location
– Send fragments
– Complete the transaction
[Diagram: a SQL-based application sends “UPDATE users SET online = 1 WHERE id = 8797;” through the HW or SW load balancer; the session node parses, plans, and compiles the query, then sends the compiled fragment to the node holding the row (ID: 8797 | … | ONLINE: 0).]
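The fan-out steps listed above (parse, plan, compile, locate, send) can be sketched in miniature. All names here are illustrative stand-ins, not ClustrixDB APIs:

```python
# Hedged sketch of session-driven query fan-out.

def node_owning(row_id: int, num_nodes: int = 3) -> int:
    # Stand-in for the data-map lookup of the ranking replica
    return row_id % num_nodes

def execute(session_node: int, sql: str, row_id: int) -> dict:
    plan = f"plan({sql})"                # steps 1-3: parse, plan, compile
    target = node_owning(row_id)         # step 4: locate the row's node
    # step 5: the fragment runs on the owning node, not the session node
    return {"compiled_on": session_node, "executed_on": target, "plan": plan}

r = execute(session_node=0,
            sql="UPDATE users SET online = 1 WHERE id = 8797",
            row_id=8797)
assert r["executed_on"] == node_owning(8797)
```

The key property is that the session node and the executing node need not be the same machine, which is what lets writes scale with node count.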
PROPRIETARY & CONFIDENTIAL 27
Automatic Query Fan-Out (continued)
o The fragment executes on the node owning the row, the update is applied (ID: 8797 | … | ONLINE: 1), and acknowledgments flow back through the session node to the application
PROPRIETARY & CONFIDENTIAL 28
Automatic Data Rebalancing
o The Rebalancer automatically:
– Initial Data
• Distributes the data into even slices across nodes
– Data Growth
• Splits large slices into smaller slices
– Flex Up/Flex Down
• Moves slices to leverage new nodes and/or evacuate nodes
– Failed Nodes
• Re-protects slices to ensure proper replicas exist
– Skewed Data
• Re-distributes the data to even out across nodes
– Hot Slice Balancing
• Finds hot slices and balances them across nodes
Q: How do you ensure data stays well distributed in a clustered environment?
A: Let the ClustrixDB Rebalancer handle it automatically!
…while the DB stays open for business.
Patents 8,543,538; 8,554,726; 9,348,883
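Two of the rebalancer actions named above (splitting large slices, re-protecting after imbalance) can be modeled as a toy. The threshold and function names are invented; this only shows the shape of the decision, not ClustrixDB's implementation:

```python
# Toy model of one rebalancer action: splitting oversized slices.

MAX_SLICE_ROWS = 4  # invented threshold for this sketch

def split_large(slices):
    """One rebalancer pass: any slice over the limit is cut in half."""
    out = []
    for s in slices:
        if len(s) > MAX_SLICE_ROWS:      # "Data Growth" case
            mid = len(s) // 2
            out += [s[:mid], s[mid:]]
        else:
            out.append(s)
    return out

slices = [list(range(10)), [1, 2]]
assert [len(s) for s in split_large(slices)] == [5, 5, 2]
```

A real rebalancer runs such passes continuously, online, which is what "the DB stays open for business" refers to.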
QUESTIONS?
THANK YOU!
Supplementary Slides
PROPRIETARY & CONFIDENTIAL 32
Online Schema Change
o Allows reads & writes during
ALTER TABLE operations
– Add or remove column
– Rename column
– Change data type
– CREATE INDEX
o Process:
– Queue created to track changes
– Copy of table created
– Queues replayed until synchronized
– Atomic flip of table between transactions
[Diagram: during “ALTER TABLE mytable ADD (foo int);”, reads and writes continue against MYTABLE while changes accumulate in a queue; __building_MYTABLE is populated from the copy plus the replayed queue, then an atomic flip swaps the two tables.]
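The four-step process above (queue, copy, replay, flip) can be shown as a toy in-memory version. A real implementation works inside the storage engine; this sketch only demonstrates the ordering, and every name in it is invented:

```python
# Toy queue-and-copy online schema change: ALTER TABLE ... ADD (foo int)

old_table = [{"id": 1}, {"id": 2}]
change_queue = []                 # step 1: writes are tracked here

def write(row):                   # application writes keep flowing
    old_table.append(row)
    change_queue.append(row)

# step 2: copy the table with the new column added
new_table = [dict(r, foo=None) for r in old_table]

write({"id": 3})                  # a write arrives mid-copy

while change_queue:               # step 3: replay the queue until synced
    new_table.append(dict(change_queue.pop(0), foo=None))

table = new_table                 # step 4: atomic flip between transactions
assert len(table) == 3 and all("foo" in r for r in table)
```

No write was lost and no read was blocked, which is the point of the atomic flip.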
PROPRIETARY & CONFIDENTIAL 33
Online Schema Change (continued)
o Once the queue is replayed, the atomic flip swaps __building_MYTABLE in for MYTABLE between transactions; reads and writes continue throughout the ALTER TABLE
PERFORMANCE BENCHMARKS
SYSBENCH & YCSB
PROPRIETARY & CONFIDENTIAL 35
Sysbench OLTP 90:10 Mix
o 90% Reads + 10% Writes
– Very typical workload mix
o 1 TPS = 10 SQL statements
– 9 SELECTs + 1 UPDATE
– a.k.a. 10 operations/sec
o Shows scaling TPS by
adding servers:
– Oak4 = 4 (8-core) servers
– Oak16 = 16 (8-core) servers
– Oak28 = 28 (8-core) servers
To scale TPS while maintaining performance (i.e. 20 ms response), just add servers to ClustrixDB
800,000 SQL/sec
@ 20 ms
PROPRIETARY & CONFIDENTIAL 36
ClustrixDB vs. Aurora vs. MySQL RDS
o 90% Reads + 10% Writes
– Very typical workload mix
o 1 TPS = 10 SQL statements
– 9 SELECTs + 1 UPDATE
– a.k.a. 10 operations/sec
o ClustrixDB shows scaling TPS
by adding servers:
– Aws4 = 4 (8-core) servers
– Aws16 = 16 (8-core) servers
– Aws20 = 20 (8-core) servers
ClustrixDB scales TPS far past Aurora’s
largest instance (db.r3.8xlarge)
MySQL
(largest)
Aurora
(largest)
ClustrixDB
PROPRIETARY & CONFIDENTIAL 37
Sysbench 100% Reads:
2.2 Million QPS @ 3ms latency (64 servers)
o Achieved 2.2 Million
SQL/sec
– With <3 ms query response
– Using 64 ClustrixDB servers
• 8-core AWS instances
Demonstrates linear scalability
to 64 servers
2,200,000 SQL/sec
@ 3 ms
PROPRIETARY & CONFIDENTIAL 38
Yahoo! Cloud Service Benchmark (YCSB)
95:05 Mix
o 95% Reads + 5% Writes
– 1 op/sec = 1 SQL statement
o Designed to compare NoSQL
engines
– ClustrixDB runs equally fast
o Shows scaling TPS by
adding servers:
– 4 nodes = 4 (8-core) servers
– 16 nodes = 16 (8-core) servers
– 24 nodes = 24 (8-core) servers
To scale TPS while maintaining performance (i.e. 3 ms response), just add servers to ClustrixDB
> 875,000
SQL/sec
@ 3 ms

More Related Content

What's hot (20)

PDF
MariaDB on Docker
MariaDB plc
 
PPT
Scaling Up and Out your Virtualized SQL Servers
heraflux
 
PPTX
Running MariaDB in multiple data centers
MariaDB plc
 
PPTX
Right-Sizing your SQL Server Virtual Machine
heraflux
 
PPTX
Azure Databases with IaaS
Kellyn Pot'Vin-Gorman
 
PPTX
NoSQL A brief look at Apache Cassandra Distributed Database
Joe Alex
 
PDF
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
confluent
 
PDF
Webinar slides: Managing MySQL Replication for High Availability
Severalnines
 
PDF
Become a MySQL DBA: performing live database upgrades - webinar slides
Severalnines
 
PDF
Apache Cassandra Certification
Vskills
 
PDF
Client Drivers and Cassandra, the Right Way
DataStax Academy
 
PDF
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Dave Anselmi
 
PDF
Using all of the high availability options in MariaDB
MariaDB plc
 
PPTX
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
Esther Kundin
 
PPTX
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
MariaDB plc
 
PPTX
Kafka at Scale: Multi-Tier Architectures
Todd Palino
 
PPTX
MariaDB High Availability
MariaDB plc
 
PDF
Cassandra at eBay - Cassandra Summit 2013
Jay Patel
 
PDF
Best Practice for Achieving High Availability in MariaDB
MariaDB plc
 
PDF
Dev Ops without the Ops
Konstantin Gredeskoul
 
MariaDB on Docker
MariaDB plc
 
Scaling Up and Out your Virtualized SQL Servers
heraflux
 
Running MariaDB in multiple data centers
MariaDB plc
 
Right-Sizing your SQL Server Virtual Machine
heraflux
 
Azure Databases with IaaS
Kellyn Pot'Vin-Gorman
 
NoSQL A brief look at Apache Cassandra Distributed Database
Joe Alex
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
confluent
 
Webinar slides: Managing MySQL Replication for High Availability
Severalnines
 
Become a MySQL DBA: performing live database upgrades - webinar slides
Severalnines
 
Apache Cassandra Certification
Vskills
 
Client Drivers and Cassandra, the Right Way
DataStax Academy
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Dave Anselmi
 
Using all of the high availability options in MariaDB
MariaDB plc
 
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
Esther Kundin
 
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
MariaDB plc
 
Kafka at Scale: Multi-Tier Architectures
Todd Palino
 
MariaDB High Availability
MariaDB plc
 
Cassandra at eBay - Cassandra Summit 2013
Jay Patel
 
Best Practice for Achieving High Availability in MariaDB
MariaDB plc
 
Dev Ops without the Ops
Konstantin Gredeskoul
 

Similar to Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL? (20)

PPTX
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Clustrix
 
PPTX
Beyond Aurora. Scale-out SQL databases for AWS
Clustrix
 
PDF
Introduction to ClustrixDB
I Goo Lee
 
PPTX
Scaling SQL and NoSQL Databases in the Cloud
RightScale
 
PDF
Scaling Your Database In The Cloud
Cory Isaacson
 
PDF
Benchmark: Beyond Aurora. Scale-out SQL databases for AWS.
Clustrix
 
PDF
20141206 4 q14_dataconference_i_am_your_db
hyeongchae lee
 
PPTX
NOSQL DATABASES UNIT-3 FOR ENGINEERING STUDENTS
Abcd463572
 
PPTX
Distribution Models.pptxgdfgdfgdfgfdgdfg
zmulani8
 
PDF
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
ryanthiessen
 
PPTX
Database highload solutions
larest
 
PPTX
CodeFutures - Scaling Your Database in the Cloud
RightScale
 
PDF
What every developer should know about database scalability, PyCon 2010
jbellis
 
PPTX
Database highload solutions
larest
 
PDF
MongoDB Sharding
uzzal basak
 
PPT
Scaling Your Web Application
Ketan Deshmukh
 
PDF
Database Cloud Services Office Hours : Oracle sharding hyperscale globally d...
Tammy Bednar
 
PPT
Everything You Need to Know About Sharding
MongoDB
 
PDF
Scalability Considerations
Navid Malek
 
PDF
How We Scaled Freshdesk To Take 65M Requests/week
Freshdesk Inc.
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Clustrix
 
Beyond Aurora. Scale-out SQL databases for AWS
Clustrix
 
Introduction to ClustrixDB
I Goo Lee
 
Scaling SQL and NoSQL Databases in the Cloud
RightScale
 
Scaling Your Database In The Cloud
Cory Isaacson
 
Benchmark: Beyond Aurora. Scale-out SQL databases for AWS.
Clustrix
 
20141206 4 q14_dataconference_i_am_your_db
hyeongchae lee
 
NOSQL DATABASES UNIT-3 FOR ENGINEERING STUDENTS
Abcd463572
 
Distribution Models.pptxgdfgdfgdfgfdgdfg
zmulani8
 
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
ryanthiessen
 
Database highload solutions
larest
 
CodeFutures - Scaling Your Database in the Cloud
RightScale
 
What every developer should know about database scalability, PyCon 2010
jbellis
 
Database highload solutions
larest
 
MongoDB Sharding
uzzal basak
 
Scaling Your Web Application
Ketan Deshmukh
 
Database Cloud Services Office Hours : Oracle sharding hyperscale globally d...
Tammy Bednar
 
Everything You Need to Know About Sharding
MongoDB
 
Scalability Considerations
Navid Malek
 
How We Scaled Freshdesk To Take 65M Requests/week
Freshdesk Inc.
 
Ad

More from Clustrix (10)

PDF
Achieve new levels of performance for Magento e-commerce sites.
Clustrix
 
PDF
ClustrixDB 7.5 Announcement
Clustrix
 
PPTX
Moving an E-commerce Site to AWS. A Case Study
Clustrix
 
PPTX
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Clustrix
 
PPTX
Scaling Techniques to Increase Magento Capacity
Clustrix
 
PPTX
Supersizing Magento
Clustrix
 
PDF
Why Traditional Databases Fail so Miserably to Scale with E-Commerce Site Growth
Clustrix
 
PDF
E-Commerce Success is a Balancing Act. Ensure Success with ClustrixDB.
Clustrix
 
PPTX
Clustrix Database Overview
Clustrix
 
PPTX
Clustrix Database Percona Ruby on Rails benchmark
Clustrix
 
Achieve new levels of performance for Magento e-commerce sites.
Clustrix
 
ClustrixDB 7.5 Announcement
Clustrix
 
Moving an E-commerce Site to AWS. A Case Study
Clustrix
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Clustrix
 
Scaling Techniques to Increase Magento Capacity
Clustrix
 
Supersizing Magento
Clustrix
 
Why Traditional Databases Fail so Miserably to Scale with E-Commerce Site Growth
Clustrix
 
E-Commerce Success is a Balancing Act. Ensure Success with ClustrixDB.
Clustrix
 
Clustrix Database Overview
Clustrix
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix
 
Ad

Recently uploaded (20)

PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PPTX
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
PDF
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
PDF
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
DOCX
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
PDF
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
Edge AI and Vision Alliance
 
PPTX
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
PDF
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
PDF
New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
PPTX
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
PDF
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
PPTX
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
Edge AI and Vision Alliance
 
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 

Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?

  • 1. Flexible transactional scale for the connected world. Challenges to Scaling MySQL: Pros & Cons of Sharding Dave A. Anselmi @AnselmiDave Director of Product Management
  • 2. Questions for Today o Why do DBAs think sharding is the only solution? o What are the long-term costs of sharding? o What is a better alternative to sharding MySQL? o How real is it? Is it too good to be true? PROPRIETARY & CONFIDENTIAL 2
  • 4. The Typical Path to Sharding… PROPRIETARY & CONFIDENTIAL 4 SCALE T I M E App Too Slow Lost Users Hit the wall App Too Slow Lost Users Hit the wall (again) LAMP stack AWS, Azure, RAX, GCE, etc Private cloud Migrate to bigger MySQL machine • Read slaves, then Sharding, etc: • Add more hardware & DBAs • Refactor Code /Hardwired App  More Expensive  Higher Risk  Lost Revenue
  • 5. • Ongoing Refactoring • Ongoing Hardware • Ongoing Data balancing • Ongoing Shard Maintenance  Increasing Expenses  Increasing Maintenance  Increasing Risk PROPRIETARY & CONFIDENTIAL 5 SCALE T I M E App Too Slow Lost Users Hit the wall App Too Slow Lost Users Hit the wall (again) LAMP stack AWS, Azure, RAX, GCE, etc Private cloud Migrate to bigger MySQL machine Repeat  The Typical Path to Sharding… • Read slaves, then Sharding, etc: • Add more hardware & DBAs • Refactor Code /Hardwired App  More Expensive  Higher Risk  Lost Revenue
  • 6. This is tradition o Lots of people do it, including Facebook o You can hang onto MySQL o You do get write scale… with caveats PROPRIETARY & CONFIDENTIAL 6
  • 8. Other Scaling Solutions Can’t Scale-out Writes Approach How Pro’s Con’s Scale-Up Keep increasing the size of the (single) database server • Simple, no application changes needed • Best solution for capacity, if it can handle your workload • Expensive. At some point, you’re paying 5x for 2x the performance • Capacity Limit. Most clouds provide up to 64 ‘vcpu’s at most for a single server Read Slaves Add a ‘Slave’ read- server(s) to ‘Master’ database server • Simple to implement, lots of automation available • Read/write fan-out can be done at the proxy level • Best for read-heavy workloads- only adds Read performance • Data consistency issues can occur • Critical apps still need to read from master Master-Master Add additional ‘Master’(s) to ‘Master’ database server • Adds read + write scaling without needing to shard • Depending on workload, scaling can approach linear • Adds write scaling at the cost of read-slaves, adding even more latency • Application changes required to ensure data consistency / conflict resolution
  • 9. Sharding’s Two Options Approach How Pro’s Con’s Vertical Sharding Separating tables across separate database servers • Adds both write and read scaling, depending on well-chosen table distribution • Much less difficult than ‘regular’ sharding, with much of the gains • Loses transactionality, referential integrity and ACID across shards • Consistent backups across all the shards are very hard to manage • Data management (skew/hotness) is ongoing significant maintenance Horizontal Sharding Partitioning tables across separate database servers • Adds both read and write scaling, depending on well-chosen keys and low skew • Most common way to scale-out both reads and writes • Loses transactionality, referential integrity and ACID across shards • Consistent backups across all the shards are very hard to manage • Data management (skew/hotness) is ongoing significant maintenance
  • 11. PROPRIETARY & CONFIDENTIAL 11 What are the Challenges of Sharding? c.f. “SQL Databases Don’t Scale”, 2006 Data Skew across the shard array ACID Transactions across all nodes Significant Application Changes Required Consistent Backups across the shard array Complicated Infra Full redundancy required for HA Rolling Reboots Sharding Key management Cross-node Referential Integrity
  • 12. Sharding’s Challenges: Application Changes o Significant Application Changes required – Do you need cross-node transactions? – Do you need consistent data? o ACID Transactions across all nodes – No RDBMS ensures cross-node ACID transactions – Avoiding cross-node transactions requires array of cross-node replication, introducing latency and significant infra complexity o Cross-node referential integrity – Parent/child relationships between tables (& self-referentiality) which cross nodes aren’t automatically maintained – Cascade UPDATEs and DELETEs aren’t automatically maintained PROPRIETARY & CONFIDENTIAL 12
  • 13. Sharding’s Challenges: Data Maintenance o Data Skew across the shard array – Each shard grows independently – As shards split (or are combined) requires topology and/or PK changes, updates to memcache, etc o Sharding Key management – Careful creation of the key is critical; updates cause both app changes and significant data re-distribution – Shard-key:Shard lookups require fast KVS, often memcache o Rolling Reboots – Often needed to reset inter-shard communication and/or consistency PROPRIETARY & CONFIDENTIAL 13
  • 14. Sharding’s Challenges: Infra Maintenance o Complicated Infra – Typically each shard has 2 to 3 nodes – Shard mapping (often memcache) – Replication between nodes for consistency o Full redundancy required for HA – Each shard typically has at least 1 full copy (cf Pinterest: Master/Master, DR, and Backup node) o Consistent Backups across the shard array – At best you can recover to a point-in-time on each shard, and then use replication to roll-forward to match the other shards – Very involved and time-consuming PROPRIETARY & CONFIDENTIAL 14
  • 15. Review: Questions for Today o Why do DBAs think sharding is the only end-game? – Sharding is the only well-known way to scale-out Writes o What are the long-term costs of sharding? – Significant app changes required, else Business flexibility suffers – Very complicated infrastructure is needed – Lots of redundant hardware and DBA + DevOps OPEX o What is a better alternative to sharding MySQL? o How real is it? Is it too good to be true? PROPRIETARY & CONFIDENTIAL 15
  • 16. o Shared-nothing architecture  scales linearly o Clustered  Built-in redundancy o Elastic  Scale out or scale in o Cloud-native  Works on any public or private cloud The new RDBMS requirements 16 Load Balancer
  • 17. Scaling-Out Writes + Reads without Sharding 1. Single Logical Relational Database 2. Automatic Data Distribution 3. Automatic Query Fan-Out 4. Automatic Data Rebalancing PROPRIETARY & CONFIDENTIAL 17
  • 19. What does ClustrixDB do? PROPRIETARY & CONFIDENTIAL 19 Scale Beyond MySQL without Replication or Sharding  More Transactions  More Connections  More Reads  More Writes Applications ClustrixDB
  • 20. ClustrixDB: Write + Read Linear Scale-Out, Fault-tolerant, MySQL-Compatible PROPRIETARY & CONFIDENTIAL 20 ClustrixDB ACID Compliant Transactions & Joins Optimized for OLTP Built-In Fault Tolerance Flex-Up and Flex-Down Minimal DB Admin Also runs GREAT in the Data Center Built to run GREAT in any Cloud
  • 21. PROPRIETARY & CONFIDENTIAL 21 ClustrixDB Overview o Fully consistent and ACID-compliant database – Supports transactions – Supports joins – Optimized for OLTP – Also supports reporting SQL o All nodes are equal (no “special” node) o All servers are read/write o All servers accept client connections o Tables and indexes distributed across all nodes – Fully automatic distribution, rebalancing, and re-protection Fully Distributed & Consistent Cluster PrivateNetwork ClustrixDB on commodity/cloud servers HW or SW Load Balancer High Concurrency Custom: PHP, Java, Ruby, etc Packaged: Magento, etc SQL-Based Applications
  • 22. PROPRIETARY & CONFIDENTIAL 22 ClustrixDB Design: Shared Nothing Architecture o Query compiler – Distribute compiled partial query fragments to the node containing the ranking replica o Data map – All nodes know where all replicas are, and current ranking replicas o Database engine – All nodes can perform all database operations (no leader, aggregator, leaf, data-only, etc. nodes) o Data: table slices – All table slices (default: replicas = 2) auto- redistributed by the rebalancer Each Node Contains ClustrixDB Compiler Map Engine Data Map Data Map Data Compiler Compiler Engine Engine Map
  • 23. Scaling-Out Writes + Reads without Sharding 1. Single Logical Relational Database 1. ACID transactions across all nodes in cluster 2. Ad-hoc queries and on-line schema changes 2. Automatic Data Distribution 1. As tables and indexes are created 2. 64bit consistent hashing: slice location is predictable 3. Automatic Query Fan-Out 1. Compiled query fragments routed to node with data 2. Queries processed in parallel across all nodes in cluster 4. Automatic Data Rebalancing 1. Fixes hotspots and data imbalances 2. Reprotects to ensure HA/Fault Tolerance PROPRIETARY & CONFIDENTIAL 23
  • 24. ClustrixDB Automatic Data Distribution o Tables are auto-split into slices o Every slice has a replica on another node – Slices are auto-distributed and auto-protected o No manual intervention required [Diagram: database tables of billions of rows split into slices S1–S5, each slice with a replica on a different node]
  • 25. Adding + Removing Nodes: Scaling Out + In o Easy, simple Flex-Up (or Flex-Down) – A single, minimal interruption of service o All servers handle writes and reads – Workload spreads across more servers after Flex-Up o Data is automatically rebalanced across the cluster – Tables stay online for reads and writes [Diagram: slices S1–S5 and their replicas redistributed across the ClustrixDB cluster as a node is added]
  • 26. Automatic Query Fan-Out o Load balancer spreads DB connections across all nodes o A session is established on any node o The session controls query execution: – Parse SQL – Generate the execution plan – Compile into binary fragments – Look up record locations – Send fragments – Complete the transaction [Diagram: the session parses, plans, and compiles UPDATE users SET online = 1 WHERE id = 8797; the fragment is routed to the node holding row ID 8797 (ONLINE: 0)]
  • 27. Automatic Query Fan-Out (continued) [Diagram: the fragment executes on the owning node (ONLINE: 1); acks flow back to the session, which completes the transaction]
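The session lifecycle on the two slides above can be sketched as follows. This is a hypothetical illustration of the parse/plan/compile/route/ack flow; the function names and the modulo-based slice lookup are assumptions, not the actual ClustrixDB protocol:

```python
data_map = {0: "node0", 1: "node1", 2: "node2", 3: "node3"}  # slice -> ranking replica


def send(node: str, fragment: str) -> str:
    """Stand-in for the network round trip to the owning node."""
    return f"ack from {node}"


def execute(sql: str, key: int) -> str:
    plan = f"plan({sql})"                    # parse SQL, generate execution plan
    fragment = f"compiled({plan})"           # compile into a binary fragment
    target = data_map[key % len(data_map)]   # look up the record's location
    return send(target, fragment)            # send fragment; ack completes the txn


result = execute("UPDATE users SET online = 1 WHERE id = 8797", key=8797)
```

The point of the design: because the fragment moves to the data (rather than the data to a coordinator), adding nodes adds both storage and execution capacity.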
  • 28. Automatic Data Rebalancing o The Rebalancer automatically handles: – Initial data: distributes the data into even slices across nodes – Data growth: splits large slices into smaller slices – Flex-Up/Flex-Down: moves slices to leverage new nodes and/or evacuate departing nodes – Failed nodes: re-protects slices to ensure the proper replicas exist – Skewed data: re-distributes the data to even it out across nodes – Hot slices: finds hot slices and balances them across nodes o Q: How do you ensure data stays well distributed in a clustered environment? A: Let the ClustrixDB Rebalancer handle it automatically, while the DB stays open for business (Patents 8,543,538; 8,554,726; 9,348,883)
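Two of the rebalancer's jobs above, split-on-growth and re-protect-on-failure, can be sketched in a few lines. All thresholds, names, and data structures here are assumptions for illustration, not ClustrixDB internals:

```python
MAX_SLICE_ROWS = 1000   # assumed split threshold
REPLICAS = 2            # default protection factor from the slides
nodes = ["node0", "node1", "node2"]

slices = {  # slice -> (row_count, nodes holding a replica)
    "t1_s1": (1500, ["node0", "node1"]),  # too big: should be split
    "t1_s2": (400, ["node2"]),            # lost a replica: should be re-protected
}


def rebalance(slices):
    out = {}
    for name, (rows, holders) in slices.items():
        if rows > MAX_SLICE_ROWS:          # data growth: split large slices
            out[name + "a"] = (rows // 2, list(holders))
            out[name + "b"] = (rows - rows // 2, list(holders))
        elif len(holders) < REPLICAS:      # failed node: add a replica elsewhere
            spare = next(n for n in nodes if n not in holders)
            out[name] = (rows, holders + [spare])
        else:
            out[name] = (rows, holders)
    return out


result = rebalance(slices)
```

A real rebalancer runs these decisions continuously and moves data online; the sketch only shows the bookkeeping that each pass performs.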
  • 32. Online Schema Change o Allows reads & writes during ALTER TABLE operations – Add or remove a column – Rename a column – Change a data type – CREATE INDEX o Process: – A queue is created to track changes – A copy of the table is created – The queue is replayed until synchronized – Atomic flip of the tables between transactions [Diagram: ALTER TABLE mytable ADD (foo int); writes to MYTABLE are queued and replayed into __building_MYTABLE ahead of the atomic flip]
  • 33. Online Schema Change (continued) [Diagram: after the atomic flip, __building_MYTABLE becomes MYTABLE; reads & writes continue throughout]
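The four-step process on these slides (queue, copy, replay, flip) can be sketched with plain data structures. This is an illustrative model under assumed names, not the actual ClustrixDB implementation:

```python
tables = {"mytable": [{"id": 1}]}   # live table, old schema
queue = []                          # change queue tracking writes during the copy


def write(row):
    """Writes keep landing on the live table during the ALTER; each one
    is also recorded in the change queue."""
    tables["mytable"].append(row)
    queue.append(row)


# 1-2. Create a copy with the new schema (ADD foo int, default NULL)
building = [dict(r, foo=None) for r in tables["mytable"]]

write({"id": 2})                    # a write arrives mid-copy

# 3. Replay queued changes until the copy is synchronized
while queue:
    building.append(dict(queue.pop(0), foo=None))

# 4. Atomic flip of the table between transactions
tables["mytable"] = building
```

The flip is just a metadata swap, which is why readers and writers see either the old table or the new one, never a half-migrated state.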
  • 35. Sysbench OLTP 90:10 Mix o 90% reads + 10% writes – A very typical workload mix o 1 TPS = 10 SQL statements – 9 SELECT + 1 UPDATE – i.e., 10 operations/sec o Shows TPS scaling by adding servers: – Oak4 = 4 (8-core) servers – Oak16 = 16 (8-core) servers – Oak28 = 28 (8-core) servers o To scale TPS while maintaining performance (i.e., 20 ms response), just add servers to ClustrixDB [Chart: 800,000 SQL/sec @ 20 ms]
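The arithmetic behind the headline number, using only figures from the slide (the per-server share assumes the 800,000 SQL/sec result is the 28-server Oak28 configuration):

```python
# 1 TPS = 10 SQL statements (9 SELECT + 1 UPDATE)
sql_per_txn = 9 + 1
sql_per_sec = 800_000

tps = sql_per_sec // sql_per_txn       # headline throughput in transactions/sec
per_server = sql_per_sec / 28          # rough per-server share on Oak28
```

So 800,000 SQL/sec at a 90:10 mix corresponds to 80,000 TPS, or roughly 28,500 SQL/sec per 8-core server.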
  • 36. ClustrixDB vs. Aurora vs. MySQL RDS o 90% reads + 10% writes – A very typical workload mix o 1 TPS = 10 SQL statements – 9 SELECT + 1 UPDATE – i.e., 10 operations/sec o ClustrixDB shows TPS scaling by adding servers: – Aws4 = 4 (8-core) servers – Aws16 = 16 (8-core) servers – Aws20 = 20 (8-core) servers o ClustrixDB scales TPS far past Aurora's largest instance (db.r3.8xlarge) [Chart: MySQL (largest), Aurora (largest), ClustrixDB]
  • 37. Sysbench 100% Reads: 2.2 Million QPS @ 3 ms Latency (64 Servers) o Achieved 2.2 million SQL/sec – With <3 ms query response – Using 64 ClustrixDB servers (8-core AWS instances) o Demonstrates linear scalability to 64 servers [Chart: 2,200,000 SQL/sec @ 3 ms]
  • 38. Yahoo! Cloud Serving Benchmark (YCSB) 95:05 Mix o 95% reads + 5% writes – 1 ops/sec = 1 SQL statement o Designed to compare NoSQL engines – ClustrixDB runs the same workload equally fast o Shows TPS scaling by adding servers: – 4 nodes = 4 (8-core) servers – 16 nodes = 16 (8-core) servers – 24 nodes = 24 (8-core) servers o To scale TPS while maintaining performance (i.e., 3 ms response), just add servers to ClustrixDB [Chart: >875,000 SQL/sec @ 3 ms]