MyRocks in MariaDB
Sergei Petrunia
sergey@mariadb.com
- What is MyRocks
- MyRocks’ way into MariaDB
- Using MyRocks
3
What is MyRocks
● InnoDB’s limitations
● LSM Trees
● RocksDB + MyRocks
4
Data vs Disk

Put some data into the database

How much is written to disk?
INSERT INTO tbl1 VALUES ('foo')

amplification = (size of data on disk) / (size of data)
5
Amplification

Write amplification

Space amplification

Read amplification
amplification = (size of data on disk) / (size of data)
6
Amplification in InnoDB
● B*-tree
● Read amplification
– Assume random data lookup
– Locate the page, read it
– Root page is cached
● ~1 disk read for the leaf page
– Read amplification is ok
7
Write amplification in InnoDB
● Locate the page to update
● Read; modify; write
– Write more if we caused a page split
● One page write per update!
– Worst case: page_size / sizeof(int) = 16K / 8 = 2048x write amplification
● Write amplification is an issue.
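The arithmetic on this slide can be checked with a short sketch. It assumes the simplest possible model: one whole page is written for each small update, and page splits, the redo log, and the doublewrite buffer are ignored. The function name is made up for illustration.

```python
def write_amplification(page_size_bytes: int, changed_bytes: int) -> float:
    """Bytes written to disk per byte of logically changed data.

    Simplified model (an assumption): exactly one full page is written
    for every update, and nothing else is written.
    """
    return page_size_bytes / changed_bytes


# Updating one 8-byte value in a 16 KiB page, as on the slide.
print(write_amplification(16 * 1024, 8))  # 2048.0
```

Real workloads usually land well below this worst case, because several updates to the same page can be combined into one page write.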
8
Space amplification in InnoDB
● Page fill factor <100%
– to allow for updates
● Compression is done per-page
– Compressing bigger portions would be better
– Page alignment
● Compressed to 3K? Still occupies a 4K page.
● => Space amplification is an issue
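The page-alignment point can be illustrated with a small calculation. This is a sketch under the assumption that a compressed page always occupies a whole number of fixed-size blocks; the function names and the 4 KiB block size are illustrative only.

```python
import math


def space_on_disk(compressed_bytes: int, block_bytes: int) -> int:
    """Round the compressed size up to a whole number of blocks."""
    return math.ceil(compressed_bytes / block_bytes) * block_bytes


def alignment_overhead(compressed_bytes: int, block_bytes: int) -> float:
    """Ratio of space actually occupied to space the compressed data needs."""
    return space_on_disk(compressed_bytes, block_bytes) / compressed_bytes


# A page compressed to 3 KiB still occupies a full 4 KiB block.
print(space_on_disk(3 * 1024, 4 * 1024))  # 4096
print(round(alignment_overhead(3 * 1024, 4 * 1024), 2))  # 1.33
```

In this model about a quarter of each block is wasted, on top of the space lost to the page fill factor.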
9
InnoDB amplification summary
● Read amplification is ok
● Write and space amplification are issues
– Saturated IO
– SSDs wear out faster
– Need more space on SSD
● => Low storage efficiency
10
What is MyRocks
● InnoDB’s limitations
● LSM Trees
● RocksDB + MyRocks
11
Log-structured merge tree
● Writes go to
– MemTable
– Log (for durability)
● When the MemTable is full, it is flushed to disk
● SST = Sorted String Table
– Is written sequentially
– Allows for lookups
[Diagram: a write goes to the MemTable and the Log; the MemTable is flushed to an SST]
12
Log-structured merge tree
● Writing produces more SSTs
● SSTs are immutable
● SSTs may hold multiple versions of the same data
[Diagram: writes go to the MemTable and the Log; successive MemTable flushes produce SST, SST, ...]
13
Reads in LSM tree
● Need to merge the data on read
– Read amplification suffers
● Should not have too many SSTs
[Diagram: a read consults the MemTable and every SST]
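The write and read paths described on the last three slides can be sketched as a toy in-memory structure. This is not RocksDB's implementation: the class name, the tiny MemTable limit, and the list standing in for the log are all assumptions made for illustration.

```python
import bisect


class ToyLSM:
    """Toy LSM tree: a MemTable, a log, and a list of immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.log = []   # stand-in for the durability log
        self.ssts = []  # immutable sorted runs, oldest first

    def put(self, key, value):
        self.log.append((key, value))  # log first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the MemTable out as one new sorted, immutable SST."""
        if self.memtable:
            self.ssts.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.log.clear()  # flushed data no longer needs the log

    def get(self, key):
        """Return (value, number of SSTs checked), newest data first."""
        if key in self.memtable:
            return self.memtable[key], 0
        checked = 0
        for sst in reversed(self.ssts):
            checked += 1
            keys = [k for k, _ in sst]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return sst[i][1], checked
        return None, checked


db = ToyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    db.put(key, value)

print(db.get("a"))  # (3, 1): newest version, found in the newest SST
print(db.get("b"))  # (2, 2): had to check two SSTs
print(db.get("d"))  # (5, 0): still in the MemTable
```

The second number returned by `get` is a rough stand-in for read amplification: the more SSTs accumulate, the more of them a lookup may have to check.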
14
Compaction
● Merges multiple SSTs into one
● Removes old data versions
● Reduces the number of files
● Write amplification++
[Diagram: several SSTs are merged into one SST]
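A minimal sketch of what compaction does, assuming each SST is a sorted list of (key, value) pairs and that the input list is ordered oldest first. Deletion markers and leveled or tiered strategies are left out, and the function name is illustrative.

```python
def compact(ssts):
    """Merge several SSTs into one sorted run, keeping the newest value per key.

    `ssts` is ordered oldest first, so later entries overwrite earlier ones.
    """
    merged = {}
    for sst in ssts:
        for key, value in sst:
            merged[key] = value
    return sorted(merged.items())


old_sst = [("a", 1), ("b", 2)]
new_sst = [("a", 3), ("c", 4)]
print(compact([old_sst, new_sst]))  # [('a', 3), ('b', 2), ('c', 4)]
```

Four entries are read and three are written back, and ("b", 2) is rewritten without having changed. That rewriting of unchanged data is the extra write amplification the slide refers to.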
15
LSM Tree summary
● LSM architecture
– Data is stored in the log, then SST files
– Writes to SST files are sequential
– Have to look in several SST files to read the data
● Efficiency
– Write amplification is reduced
– Space amplification is reduced
– Read amplification increases
16
What is MyRocks
● InnoDB’s limitations
● LSM Trees
● RocksDB + MyRocks
17
RocksDB
● “An embeddable key-value store for fast storage environments”
● LSM architecture
● Developed at Facebook (2012)
● Used at Facebook and many other companies
● It’s a key-value store, not a database
– C++ Library (local only)
– NoSQL-like API (Get, Put, Delete, Scan)
– No datatypes, tables, secondary indexes, ...
18
MyRocks
● A MySQL storage engine
● Uses RocksDB for storage
● Implements a MySQL storage engine on top
– Secondary indexes
– Data types
– SQL transactions
– …
● Developed* and used by Facebook
– * with some MariaDB involvement
[Diagram: MySQL with two storage engines, InnoDB and MyRocks; MyRocks sits on top of RocksDB]
19
Size amplification benchmark
● Benchmark data and chart from Facebook
● Linkbench run
● 24 hours
20
Write amplification benchmark
● Benchmark data and chart from Facebook
● Linkbench
21
QPS
● Benchmark data and chart from Facebook
● Linkbench
22
QPS on in-memory workload
● Benchmark data and chart from Facebook
● Sysbench read-write, in-memory
● MyRocks doesn’t always beat InnoDB
23
QPS
● Benchmark data and chart from Facebook
● Linkbench
24
Another write amplification test
● InnoDB vs. MyRocks serving 2x data
[Chart: InnoDB vs. 2x RocksDB, scale 0 to 12; bars broken down into Flash GC, Binlog / Relay log, and Storage engine]
25
A real efficiency test
“For a certain workload we could reduce the number of MySQL servers by 50%”
– Yoshinori Matsunobu
- What is MyRocks
- MyRocks’ way into MariaDB
- Using MyRocks
27
The source of MyRocks
● MyRocks is a part of Facebook’s MySQL tree
– github.com/facebook/mysql-5.6
● No binaries or packages
● No releases
● Extra features
● “In-house” experience
28
MyRocks in MariaDB
● MyRocks is a part of MariaDB 10.2 and MariaDB 10.3
– MyRocks itself is the same across 10.2 and 10.3
– New related feature in 10.3: “Per-engine gtid_slave_pos”
● MyRocks is a loadable plugin with its own Maturity level.
● Releases
– April, 2017: MyRocks is Alpha (in MariaDB 10.2.5 RC)
– January, 2018: MyRocks is Beta
– February, 2018: MyRocks is RC
● Release pending
- What is MyRocks
- MyRocks’ way into MariaDB
- Using MyRocks
30
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
31
MyRocks packages and binaries
Packages are available for modern distributions
● RPM-based (RedHat, CentOS)
● Deb-based (Debian, Ubuntu)
● Binary tarball
– mariadb-10.2.12-linux-systemd-x86_64.tar.gz
● Windows
● ...
32
Installing MyRocks
● Install the plugin
– RPM: yum install MariaDB-rocksdb-engine
– DEB: apt install mariadb-plugin-rocksdb
– Bintar/Windows: MariaDB> install soname 'ha_rocksdb';
● Restart the server
● Check
MariaDB> SHOW PLUGINS;
● Use
MariaDB> create table t1( … ) engine=rocksdb;
33
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
34
Data loading – good news
● It’s a write-optimized storage engine
● The same data in MyRocks takes less space on disk than in InnoDB
– 1.5x, 3x, or more
● Can load the data faster than InnoDB
● But…
35
Data loading – limitations
● Limitation: Transaction must fit in memory
mysql> ALTER TABLE big_table ENGINE=RocksDB;
ERROR 2013 (HY000): Lost connection to MySQL server during query
● The server uses a lot of memory and is then killed by the OOM killer.
● Need to use special settings for loading data
– https://mariadb.com/kb/en/library/loading-data-into-myrocks/
– https://github.com/facebook/mysql-5.6/wiki/data-loading
36
Sorted bulk loading
MariaDB> set rocksdb_bulk_load=1;
... # something that inserts the data in PK order
MariaDB> set rocksdb_bulk_load=0;
Bypasses
● Transactional locking
● Seeing the data you are inserting
● Compactions
● ...
37
Unsorted bulk loading
MariaDB> set rocksdb_bulk_load_allow_unsorted=1;
MariaDB> set rocksdb_bulk_load=1;
... # something that inserts data in any order
MariaDB> set rocksdb_bulk_load=0;
● Same as previous but doesn’t require PK order
38
Other speedups for loading
● rocksdb_commit_in_the_middle
– Auto commit every rocksdb_bulk_load_size rows
● @@unique_checks
– Setting to FALSE will bypass unique checks
39
Creating indexes
● Index creation doesn’t need any special settings
mysql> ALTER TABLE myrocks_table ADD INDEX(...);
● Compression is enabled by default
– Lots of settings to tune it further
40
max_row_locks as a safety setting
● Avoid run-away memory usage and OOM killer:
MariaDB> set rocksdb_max_row_locks=10000;
MariaDB> alter table t1 engine=rocksdb;
ERROR 4067 (HY000): Status error 10 received from RocksDB:
Operation aborted: Failed to acquire lock due to
max_num_locks limit
● This setting is useful after migration, too.
41
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
42
Essential MyRocks tuning
● rocksdb_flush_log_at_trx_commit=1
– Same as innodb_flush_log_at_trx_commit=1
– Together with sync_binlog=1, makes the master crash-safe
● rocksdb_block_cache_size=...
– Similar to innodb_buffer_pool_size
– 500 MB by default (may use up to 1.5x that amount?)
– Set it higher if you have more RAM
● Safety: rocksdb_max_row_locks=...
43
Indexes
● Primary Key is the clustered index (like in InnoDB)
● Secondary indexes include PK columns (like in InnoDB)
● Non-unique secondary indexes are cheap
– Index maintenance is read-free
● Unique secondary indexes are more expensive
– Maintenance is not read-free
– Reads can suffer from read amplification
– Can use @@unique_checks=false, at your own risk
44
Charsets and collations
● Index entries are compared with memcmp(), “mem-comparable”
● “Reversible collations”: binary, latin1_bin, utf8_bin
– Can convert values to/from their mem-comparable form
● “Restorable collations”
– 1-byte characters, one weight per character
– e.g. latin1_general_ci, latin1_swedish_ci, ...
– Index stores mem-comparable form + restore data
– Example: ‘A’ is stored as ‘a’ + restore data 1; ‘a’ is stored as ‘a’ + restore data 0
● Other collations (e.g. utf8_general_ci)
– Index-only scans are not supported
– Table row stores the original value
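The idea of a restorable collation can be sketched as follows. This is an illustration, not MyRocks' actual on-disk format: it assumes ASCII-only input and a case-insensitive collation whose weight for a letter is its lowercase form, and the function names are made up.

```python
def encode_restorable(value: str):
    """Split a value into a mem-comparable key and per-character restore data.

    Assumption: ASCII input; the mem-comparable form is the lowercase string,
    and the restore data holds 1 for each character that was uppercase.
    """
    key = value.lower().encode("ascii")
    restore = bytes(1 if ch.isupper() else 0 for ch in value)
    return key, restore


def decode_restorable(key: bytes, restore: bytes) -> str:
    """Rebuild the original value from the key and the restore data."""
    chars = key.decode("ascii")
    return "".join(c.upper() if flag else c for c, flag in zip(chars, restore))


print(encode_restorable("A"))  # (b'a', b'\x01')
print(encode_restorable("a"))  # (b'a', b'\x00')
print(decode_restorable(*encode_restorable("MyRocks")))  # MyRocks
```

Both ‘A’ and ‘a’ produce the same key bytes, so a plain byte comparison treats them as equal, while the restore data still lets an index-only scan return the original value.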
45
Charsets and collations (2)
● Using indexes on “Other” (collation) produces a warning
MariaDB> create table t3 (...
a varchar(100) collate utf8_general_ci, key(a)) engine=rocksdb;
Query OK, 0 rows affected, 1 warning (0.20 sec)
MariaDB> show warnings\G
*************************** 1. row ***************************
Level: Warning
Code: 1815
Message: Internal error: Indexed column test.t3.a uses a collation that
does not allow index-only access in secondary key and has reduced disk
space efficiency in primary key.
46
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
47
MyRocks only supports RBR
● MyRocks’ highest isolation level is “snapshot isolation”, that is, REPEATABLE-READ
● Statement-Based Replication
– Slave runs the statements sequentially (~ serializable)
– InnoDB uses “Gap Locks” to provide isolation req’d by SBR
– MyRocks doesn’t support Gap Locks
● Row-Based Replication must be used
– binlog_format=row
48
Parallel replication
● Facebook/MySQL-5.6 is based on MySQL 5.6
– Parallel replication for data in different databases
● MariaDB has more advanced parallel slave
(controlled by @@slave_parallel_mode)
– Conservative (group-commit based)
– Optimistic (apply transactions in parallel; on conflict, roll back the transaction with the greater seq_no)
– Aggressive
49
Parallel replication (2)
● Conservative parallel replication works with MyRocks
– Different execution paths for log_slave_updates=ON|OFF
– ON might be faster
● Optimistic and aggressive do not provide speedups over conservative.
50
mysql.gtid_slave_pos
● mysql.gtid_slave_pos stores the slave’s position
● Use a transactional storage engine for it
– Database contents will match the slave position after crash recovery
– Makes the slave crash-safe.
● mysql.gtid_slave_pos uses a different engine?
– Cross-engine transaction (slow).
51
Per-engine mysql.gtid_slave_pos
● The idea:
– Have mysql.gtid_slave_pos_${engine} for each storage engine
– The slave position is the highest position across all tables.
– A transaction affecting only MyRocks will only update mysql.gtid_slave_pos_rocksdb
● Configuration:
--gtid-pos-auto-engines=engine1,engine2,...
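The rule "the slave position is the highest position across all tables" can be sketched as below. This is a simplified model, not the server's logic: each row is reduced to a (domain_id, server_id, seq_no) triple and positions are compared by seq_no within a replication domain. The real tables carry more columns than this sketch models, and the function name is hypothetical.

```python
def slave_position(per_engine_tables):
    """Combine per-engine position tables into one GTID position string.

    `per_engine_tables` maps a table name to a list of
    (domain_id, server_id, seq_no) rows. For each domain, the row with
    the highest seq_no across all tables wins (a simplifying assumption).
    """
    best = {}
    for rows in per_engine_tables.values():
        for domain_id, server_id, seq_no in rows:
            if domain_id not in best or seq_no > best[domain_id][2]:
                best[domain_id] = (domain_id, server_id, seq_no)
    return ",".join(f"{d}-{s}-{n}" for d, s, n in sorted(best.values()))


tables = {
    "gtid_slave_pos_innodb": [(0, 1, 100)],
    "gtid_slave_pos_rocksdb": [(0, 1, 105)],
}
print(slave_position(tables))  # 0-1-105
```

Here the last transaction touched only MyRocks tables, so only the RocksDB-backed table advanced, and it determines the overall position.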
52
Per-engine mysql.gtid_slave_pos
● Available in MariaDB 10.3
● Thanks for the patch to
– Kristian Nielsen (implementation)
– Booking.com (request and funding)
● Don’t let your slave be 2-3x slower:
– 10.3: my.cnf: gtid-pos-auto-engines=INNODB,ROCKSDB
– 10.2: MariaDB> ALTER TABLE mysql.gtid_slave_pos ENGINE=RocksDB;
53
Replication takeaways
● Must use binlog_format=row
● Parallel replication works (slave_parallel_mode=conservative)
● Set your mysql.gtid_slave_pos correctly
– 10.2: preferred engine
– 10.3: per-engine
54
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
55
myrocks_hotbackup tool
● Takes a hot (non-blocking) backup of RocksDB data
● Making a backup
mkdir /path/to/checkpoint-dir
myrocks_hotbackup --user=... --port=... \
  --checkpoint_dir=/path/to/checkpoint-dir | ssh "tar -xi ..."
● Starting from backup
myrocks_hotbackup --move_back \
  --datadir=/path/to/restore-dir \
  --rocksdb_datadir=/path/to/restore-dir/#rocksdb \
  --rocksdb_waldir=/path/to/restore-dir/#rocksdb \
  --backup_dir=/where/tarball/was/unpacked
56
myrocks_hotbackup limitations
● Targets MyRocks-only installs
– Copies MyRocks data and supplementary files
– Does *not* work for mixed MyRocks+InnoDB setups
● Assumes no DDL is running concurrently with the backup
– May get a wrong frm file
● Not very user-friendly
57
Further backup considerations
● mariabackup does not currently work with MyRocks
● myrocks_hotbackup is similar to (actually simpler than) xtrabackup
● Will look at making mariabackup work with MyRocks
58
Using MyRocks
● Installation ✔
● Migration ✔
● Tuning ✔
● Replication ✔
● Backup ✔
59
Documentation
● https://mariadb.com/kb/en/library/myrocks/
● https://github.com/facebook/mysql-5.6/wiki
● https://mariadb.com/kb/en/library/differences-between-myrocks-variants/
60
Conclusions
● MyRocks is a write-optimized storage engine based on LSM-trees
– Developed and used by Facebook
● Available in MariaDB 10.2 and MariaDB 10.3
– Currently RC, Stable very soon
● It has some limitations and requires some tuning
– Easy to tackle if you were here for this talk :-)
Thanks!
Q&A

M|18 How to use MyRocks with MariaDB Server
