How we switched to columnar at SpendHQ

Allen Herrera

https://www.spendhq.com/

Drivers For Change?
• Massive growth in the last couple of years
• Legacy application architecture not built to scale
• Need to improve query performance
• Need to modernize

Why Leave our old database?
• Old DB
  ˗ Modernization
    » Based off MySQL 5.1.X
  ˗ Performance
    » Slow
    » Single Threaded
    » Couldn't Scale Vertically Anymore
    » Not Clusterable
• What were we looking for
  ˗ Ease of transition
  ˗ Scalability
  ˗ Lower cost if possible
  ˗ Community Support

The Journey
Timeline: Dec '17, Mar, Aug, Nov, Dec
Phases: Analyze, Prepare, Refine
Steps:
1. Identify Options
2. Quantify Targets
3. Overcome Challenges
4. Set up cluster
5. Professional Services
6. Define Migration Process
7. Automate Cluster Creation
8. Fail Deploying
9. Refactor ETLs
10. Actually Deploy

Identify Options

Identifying Alternative Databases
Consultant identified 7 open source database technologies

Database Name          Released  Notes
Calpont InfiniDB       2010      C/C++, MySQL front end
ClickHouse             2014      C/C++
CrateDB                2013      Java based
Greenplum Database     2005      Postgres based
MariaDB ColumnStore    2016      MySQL / InfiniDB branch
MapD Technologies      2016      C/C++
MonetDB                2004      C

Chose MariaDB Columnstore for its syntax similarity to our prior DB

Why MariaDB Columnstore!
• ANSI SQL
• Open Source
• Enterprise Support
• Professional Services
• Scalable
• Performant

Quantify Targets

Quantify Targets
• Goals
  ˗ 71% reduction in query time by switching databases
  ˗ 95% reduction in query time if we de-normalize our schemas
[Chart: Query Performance Chart. Y-axis: seconds; x-axis: query (1 to 21). Series: InfoBright Joins, MCS Joins, MCS Flat queries.]
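The reduction targets above are percentages of the original query time. As a minimal sketch of that arithmetic, the helper below computes the reduction between a before and an after timing. The function name and the sample timings are hypothetical and chosen only to land on the 71% and 95% targets; they are not measurements from the chart.

```python
def percent_reduction(old_seconds, new_seconds):
    """Return how much faster the new timing is, as a percentage of the old one."""
    if old_seconds <= 0:
        raise ValueError("old_seconds must be positive")
    return 100.0 * (old_seconds - new_seconds) / old_seconds


# Hypothetical timings, picked only to illustrate the two targets.
joins_target = percent_reduction(10.0, 2.9)   # roughly 71
flat_target = percent_reduction(20.0, 1.0)    # 95.0
```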

Overcome Challenges

Setting up our first Columnstore DB
Really easy!
https://github.com/toddstoffel/columnstore_easy_setup
Lots of my.cnf optimizations out of the box; the few we had to adjust included:
» interactive_timeout
» wait_timeout
» max_length_for_sort_data
» innodb_buffer_pool_size

Connecting the first Columnstore database

1st Challenge
Overcoming legacy framework limitations
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [min_date] => 2015-10-01
                )
            [Company] => Array
                (
                    [lft] => 731
                )
        )
)
Array
(
    [0] => Array
        (
            [$vtable_723] => Array
                (
                    [max_date] => 2013-05-01
                    [lft] => 29
                )
        )
)
Root Cause:
CakePHP ORM use of mysqli_fetch_field_direct()
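The two array shapes above can be reproduced with a simplified model of the grouping step. The slides do not show the ORM's code, so this is only a sketch under one assumption: each column is nested under the table name that the field metadata reports (the `table` property returned by mysqli_fetch_field_direct()), with columns that report no table placed under key 0. The function name and sample rows are hypothetical.

```python
def group_row_by_table(values, field_tables, field_names):
    """Nest one result row under the table name reported for each field.

    Assumption: fields that report no table (for example aggregates)
    are grouped under the integer key 0.
    """
    row = {}
    for value, table, name in zip(values, field_tables, field_names):
        key = table if table else 0
        row.setdefault(key, {})[name] = value
    return row


# Metadata reports the real table for lft and no table for the aggregate.
per_table_shape = group_row_by_table(
    ["2015-10-01", 731], ["", "Company"], ["min_date", "lft"]
)

# Metadata reports one internal virtual table name for every column.
vtable_shape = group_row_by_table(
    ["2013-05-01", 29], ["$vtable_723", "$vtable_723"], ["max_date", "lft"]
)
```

Application code that reads `row["Company"]["lft"]` works with the first shape and fails with the second, which is the kind of breakage the root cause describes.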

2nd Challenge
Overcoming legacy code
Bad SQL:
SELECT uuid, `vendor_name`, SUM(amount) FROM table GROUP BY `vendor_name`;
Error:
Internal error: IDB-2021: 'table.uuid' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
Proper SQL:
SELECT MIN(uuid), `vendor_name`, SUM(amount) FROM table GROUP BY `vendor_name`;
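As a rough illustration of the rewritten query, the sketch below runs it against an in-memory SQLite database. SQLite is only a stand-in because it ships with Python; it is permissive about bare columns, so it does not reproduce the IDB-2021 error. The table name `spend` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (uuid TEXT, vendor_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?, ?)",
    [("b-2", "Acme", 10.0), ("a-1", "Acme", 5.0), ("c-3", "Globex", 7.5)],
)

# Every selected column is either aggregated or listed in GROUP BY,
# so strict engines accept it and the uuid returned is deterministic.
rows = conn.execute(
    "SELECT MIN(uuid), vendor_name, SUM(amount) "
    "FROM spend GROUP BY vendor_name ORDER BY vendor_name"
).fetchall()
```

Wrapping the column in MIN() picks one well-defined uuid per vendor instead of leaving the choice to the engine.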

3rd Challenge
Overcoming case sensitive group bys
id  name
1   allen
2   Allen
SELECT COUNT(id), `name` FROM test_table GROUP BY `name`;
Results
MariaDB -
Old DB -
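The transcript does not preserve which result each engine returned, so the sketch below only shows the two possible outcomes for the test table above. It uses an in-memory SQLite database as a stand-in (its default text comparison is case sensitive). Grouping on LOWER(name) is one portable way to force case-insensitive groups; it is an assumption, not necessarily the fix used in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)", [(1, "allen"), (2, "Allen")]
)

# Case-sensitive grouping: 'allen' and 'Allen' form two separate groups.
case_sensitive = conn.execute(
    "SELECT COUNT(id), name FROM test_table GROUP BY name ORDER BY name"
).fetchall()

# Normalizing the key makes both rows fall into one group.
case_insensitive = conn.execute(
    "SELECT COUNT(id), LOWER(name) FROM test_table GROUP BY LOWER(name)"
).fetchall()
```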

Professional Services

Reviewing progress with professional services
Analyzing performance
1. Hard drives
• Fio testing - https://github.com/axboe/fio.git
˗ /usr/local/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
˗ We measured mixed IOPS of ~2,000
˗ After switching to SSDs: ~13,000
2. Query Configuration
• Adjusted InnoDB buffer pool size
• Adjusted Columnstore.xml
˗ PmMaxMemorySmallSide – memory size for small-side table joins
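For comparing runs like the before and after SSD numbers, a small helper can pull a single figure out of fio's JSON report. This is a sketch under two assumptions: fio was run with `--output-format=json` (not part of the command on the slide), and "mixed IOPS" means read IOPS plus write IOPS. The sample report is made up to mirror the ~2,000 figure and is not real output.

```python
import json


def mixed_iops(fio_json_text):
    """Sum read and write IOPS across all jobs in a fio JSON report."""
    report = json.loads(fio_json_text)
    return sum(
        job["read"]["iops"] + job["write"]["iops"] for job in report["jobs"]
    )


# Illustrative report only; a real fio report has many more fields.
sample_report = json.dumps(
    {"jobs": [{"jobname": "test", "read": {"iops": 1500.0}, "write": {"iops": 500.0}}]}
)
total = mixed_iops(sample_report)
```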

Reviewing progress with professional services
Analyzing performance
» Queries
» Page loads
• Confirmed that improved query performance translated to improved uncached page load times in our app

Automate Cluster Creation

Automating Cluster Creation
Based off of: https://github.com/toddstoffel/columnstore_easy_setup

Define Migration Process

Defining our data transfer process
181 million records from InnoDB to Columnstore
64 minutes - insert into {columnstore} select * from {innodb}
46 minutes - load from outfile
26 minutes - cpimport
For InnoDB – 5 hours vs 15 hours by splitting the large CSV
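Splitting a large CSV before loading can be sketched as follows. The talk does not say how the files were split, so the function name, chunk naming, and chunk size here are hypothetical. The sketch assumes the export has no header row, as is typical for a file written by SELECT ... INTO OUTFILE.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk):
    """Write `source` out as chunk files of at most `rows_per_chunk` rows each.

    Returns the list of chunk paths in order. No header handling is done.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    writer = None
    with open(source, newline="") as src:
        for index, row in enumerate(csv.reader(src)):
            if index % rows_per_chunk == 0:
                # Start a new chunk file every rows_per_chunk rows.
                if handle:
                    handle.close()
                path = out_dir / f"chunk_{len(paths):05d}.csv"
                handle = open(path, "w", newline="")
                writer = csv.writer(handle)
                paths.append(path)
            writer.writerow(row)
    if handle:
        handle.close()
    return paths
```

Each chunk can then be loaded on its own, which keeps individual load transactions smaller and lets several loads run side by side.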

Fail Deploying

First deployment fail
Problems
1. Storage drives – 16 TB wasn't enough!
2. iSCSI volumes in fstab – no no
Solution
1. Attach more storage – doubled to 32 TB
2. Utilize /etc/rc.local to connect to the iSCSI target and remount automatically

Refactor ETLs

Refactoring data processes for Columnstore
Write operations were not plug and play
[Chart values: 40%, 44%, 1040%, 100+%, 1200%, 100%]

7x - ETL – uses a new multi-process architecture to take advantage of InnoDB row-level locking
Client shard rebuilds - export to CSV and import from outfile
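One way a multi-process ETL can benefit from row-level locking is to give each worker a disjoint range of primary keys, so workers never wait on each other's row locks. The sketch below shows only that partitioning idea. The talk does not describe the actual architecture, so every name here is hypothetical, and the worker is a placeholder that does no database work.

```python
from concurrent.futures import ProcessPoolExecutor


def partition_ranges(min_id, max_id, workers):
    """Split the inclusive range [min_id, max_id] into disjoint sub-ranges."""
    total = max_id - min_id + 1
    if total <= 0 or workers <= 0:
        return []
    size, extra = divmod(total, workers)
    ranges = []
    start = min_id
    for i in range(workers):
        length = size + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def process_range(id_range):
    """Placeholder worker: a real one would open its own connection and
    modify only rows whose id falls inside id_range."""
    low, high = id_range
    return high - low + 1  # stand-in for "rows processed"


def run_etl(min_id, max_id, workers=4):
    """Process each sub-range in its own process and return total rows handled."""
    ranges = partition_ranges(min_id, max_id, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_range, ranges))
```

On platforms that spawn rather than fork worker processes, `run_etl` should be called from under an `if __name__ == "__main__":` guard.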

Where we ended up

Actually Deploy

Releasing!
Problems
Storage Networking on our UM
• latency
• bandwidth
• write speeds
Solution
Multipath
yum install device-mapper-multipath

What Next!

Where we are going next
• Refactor legacy critical performance areas as needed
• Building a new version of our app
  ˗ Addressing data schema
    » not to use as many joins
    » separate application data (transactional/state based) from client data (columnar)
• Testing GPU databases
  ˗ Brytlyt
  ˗ OmniSci

Biggest wins / Biggest losses
• Read Time: ~78%
• Write Time: ~10%
• ETL: 25x
• Storage: 10 times more
• Modify Application: Time Consuming
• Concurrency: About Same

Questions?
@allenherrera
aherrera@spendhq.com
