How we switched to columnar
w/ SpendHQ
Allen Herrera
https://www.spendhq.com/
Drivers For Change?
• Massive growth in the last couple of years
• Legacy application architecture not built to scale
• Need to improve query performance
• Need to modernize
Why Leave our old database?
• Old DB
  • Modernization
    • Based off MySQL 5.1.X
  • Performance
    • Slow
    • Single threaded
    • Couldn't scale vertically anymore
    • Not clusterable
• What were we looking for
  • Ease of transition
  • Scalability
  • Lower cost if possible
  • Community support
The Journey
Phases: Prepare, Refine, Analyze
Timeline: Dec '17, Mar, Aug, Nov, Dec
• Identify Options
• Quantify Targets
• Overcome Challenges
• Set up cluster
• Professional Services
• Define Migration Process
• Automate Cluster Creation
• Fail Deploying
• Refactor ETLs
• Actually Deploy
Identify Options
Identifying Alternative Databases
Consultant identified 7 open source database technologies
Database Name | Released | Notes
Calpont InfiniDB | 2010 | C/C++, MySQL front end
ClickHouse | 2014 | C/C++
CrateDB | 2013 | Java based
Greenplum Database | 2005 | Postgres based
MariaDB ColumnStore | 2016 | MySQL / InfiniDB branch
MapD Technologies | 2016 | C/C++
MonetDB | 2004 | C
Why MariaDB Columnstore!
Chose MariaDB Columnstore - syntax similarity to our prior DB
• ANSI SQL
• Open Source
• Enterprise Support
• Professional Services
• Scalable
• Performant
Quantify Targets
• Goals
• 71% reduction by switching databases
• 95% reduction if we de-normalize our schemas
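To make the targets concrete, here is a minimal sketch of how a percentage reduction is computed from a baseline and a new timing. It assumes the percentages refer to query time, which the slide does not state outright. The timings are made-up values chosen only to reproduce the slide's figures, not SpendHQ measurements, and `percent_reduction` is a hypothetical helper.

```python
def percent_reduction(baseline_seconds: float, new_seconds: float) -> float:
    """Return how much smaller new_seconds is than baseline_seconds, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - new_seconds) / baseline_seconds * 100.0


# Hypothetical timings, chosen only to reproduce the slide's percentages.
baseline = 20.0       # seconds on the old database (assumed)
after_switch = 5.8    # seconds after switching engines (assumed)
after_flatten = 1.0   # seconds with a de-normalized schema (assumed)

switch_pct = round(percent_reduction(baseline, after_switch))
flatten_pct = round(percent_reduction(baseline, after_flatten))
print(switch_pct, flatten_pct)
```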
Query Performance Chart (seconds per query, queries 1 to 21)
Series: InfoBright joins, MCS joins, MCS flat queries
Overcome Challenges
Setting up our first Columnstore DB
Really easy!
https://github.com/toddstoffel/columnstore_easy_setup
Lots of my.cnf optimizations out of the box; we had to adjust very few, including:
» interactive_timeout
» wait_timeout
» max_length_for_sort_data
» innodb_buffer_pool_size
Connecting the first Columnstore database
1st Challenge
Overcoming legacy framework limitations
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [min_date] => 2015-10-01
                )
            [Company] => Array
                (
                    [lft] => 731
                )
        )
)
Array
(
    [0] => Array
        (
            [$vtable_723] => Array
                (
                    [max_date] => 2013-05-01
                    [lft] => 29
                )
        )
)
Root Cause:
CakePHP ORM use of mysqli_fetch_field_direct()
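One plausible reading of the two dumps is that the ORM files each column under the table name reported in the field metadata, so a different table name changes the shape of the result. The sketch below illustrates that idea in Python. `nest_row` is a hypothetical helper, not CakePHP code, and the rule that columns without a table name go under key 0 is an assumption inferred from the first dump. The values are copied from the dumps, which come from two different queries.

```python
from typing import Any


def nest_row(fields: list[tuple[str, str]], values: list[Any]) -> dict:
    """Group one result row by the table name reported for each column.

    fields holds (table, column) pairs as a driver's field metadata might
    report them; a column with an empty table name is filed under key 0.
    """
    row: dict = {}
    for (table, column), value in zip(fields, values):
        key = table if table else 0
        row.setdefault(key, {})[column] = value
    return row


# Metadata carrying the real table name the application expects.
expected = nest_row([("", "min_date"), ("Company", "lft")], ["2015-10-01", 731])

# Metadata attributing every column to an internal virtual table.
actual = nest_row(
    [("$vtable_723", "max_date"), ("$vtable_723", "lft")],
    ["2013-05-01", 29],
)

print(expected)
print(actual)
```

Application code that reads the `Company` key finds nothing in the second shape, which is the kind of breakage the slide describes.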
2nd Challenge
Overcoming legacy code
Bad SQL:
SELECT uuid, `vendor_name`, SUM(amount) FROM `table` GROUP BY `vendor_name`;
Error:
Internal error: IDB-2021: 'table.uuid' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
Proper SQL:
SELECT MIN(uuid), `vendor_name`, SUM(amount) FROM `table` GROUP BY `vendor_name`;
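The rewritten query can be tried out with a small sketch that uses Python's built-in sqlite3 module as a stand-in engine. SQLite tolerates the non-aggregated column, so it does not reproduce the IDB-2021 error. The sketch only shows that the corrected form returns one well-defined row per vendor. The table name `spend` and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (uuid TEXT, vendor_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?, ?)",
    [("b2", "Acme", 10.0), ("a1", "Acme", 5.0), ("c3", "Globex", 7.5)],
)

# Every selected column is either aggregated or listed in GROUP BY,
# so the result does not depend on which row the engine happens to pick.
rows = conn.execute(
    "SELECT MIN(uuid), vendor_name, SUM(amount) "
    "FROM spend GROUP BY vendor_name ORDER BY vendor_name"
).fetchall()
conn.close()
print(rows)
```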
3rd Challenge
Overcoming case-sensitive GROUP BYs
id | name
1 | allen
2 | Allen
SELECT COUNT(id), `name` FROM test_table GROUP BY `name`;
Results
MariaDB -
Old DB -
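The two possible outcomes of that query can be reproduced with a small sketch that uses Python's built-in sqlite3 module as a stand-in. It does not claim which engine in the talk produced which result. It only shows how case-sensitive and case-insensitive grouping differ on the two rows above, plus one possible way to make the grouping explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)", [(1, "allen"), (2, "Allen")]
)

# Case-sensitive grouping (SQLite's default BINARY collation): two groups.
sensitive = conn.execute(
    "SELECT COUNT(id), name FROM test_table GROUP BY name ORDER BY name"
).fetchall()

# Case-insensitive grouping: both spellings collapse into one group.
insensitive = conn.execute(
    "SELECT COUNT(id) FROM test_table GROUP BY name COLLATE NOCASE"
).fetchall()

# Normalizing the case explicitly gives the same answer on either kind of engine.
normalized = conn.execute(
    "SELECT COUNT(id), LOWER(name) FROM test_table GROUP BY LOWER(name)"
).fetchall()
conn.close()

print(sensitive, insensitive, normalized)
```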
Professional Services
Reviewing progress with professional services
Analyzing performance
1. Hard drives
• Fio testing - https://github.com/axboe/fio.git
- /usr/local/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
- We noticed mixed IOPS of ~2,000
- After switching to SSDs: ~13,000
2. Query Configuration
• Adjusted InnoDB buffer size
• Adjusted Columnstore.xml
• PmMaxMemorySmallSide – small side table joins memory size
Reviewing progress with professional services
Analyzing performance
» Queries
» Page loads
• Confirmed improved query performance translated to improved uncached page load times in our app
Automate Cluster Creation
Automating Cluster Creation
Based off of: https://github.com/toddstoffel/columnstore_easy_setup
Define Migration Process
Defining our data transfer process
181 million records from InnoDB to Columnstore
64 minutes - insert into {columnstore} select * from {innodb}
46 minutes - load from outfile
26 minutes - cpimport
For InnoDB – 5 hours vs 15 hours - split large csv
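The slide mentions splitting a large CSV but not how it was done. The following is a minimal sketch of one way to do it, assuming a headerless export file. `split_csv`, the chunk naming scheme, and the sample data are all hypothetical. Each resulting chunk could then be loaded on its own by whichever import method is in use.

```python
import csv
import tempfile
from pathlib import Path


def split_csv(source: Path, out_dir: Path, rows_per_chunk: int) -> list[Path]:
    """Split source into numbered chunk files of at most rows_per_chunk rows."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    writer = None
    handle = None
    with source.open(newline="") as src:
        for index, row in enumerate(csv.reader(src)):
            # Start a new chunk file every rows_per_chunk rows.
            if index % rows_per_chunk == 0:
                if handle is not None:
                    handle.close()
                path = out_dir / f"{source.stem}_{len(chunks):04d}.csv"
                handle = path.open("w", newline="")
                writer = csv.writer(handle)
                chunks.append(path)
            writer.writerow(row)
    if handle is not None:
        handle.close()
    return chunks


# Tiny demonstration with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "export.csv"
    src.write_text("1,a\n2,b\n3,c\n4,d\n5,e\n")
    parts = split_csv(src, Path(tmp) / "chunks", rows_per_chunk=2)
    chunk_rows = [p.read_text().splitlines() for p in parts]
print(chunk_rows)
```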
Fail Deploying
First deployment fail
Problems
1. Storage drives – 16 TB wasn't enough!
2. iSCSI volumes in fstab – no no
Solution
1. Attach more storage – doubled to 32 TB
2. Utilize /etc/rc.local to connect to the iSCSI target and remount automatically
Refactor ETLs
Refactoring data processes for Columnstore
Write operations were not plug and play
Chart values (labels not recovered): 40%, 44%, 1040%, 100+%, 1200%, 100%
Refactoring data processes for Columnstore
7x - ETL – utilize new multi-process architecture to take advantage of InnoDB row-level locking
Client Shard Rebuilds - export to csv and import from outfile
Refactoring data processes for Columnstore
Where we ended up
Actually Deploy
Releasing!
Problems
Storage networking on our UM
• latency
• bandwidth
• write speeds
Solution
Multipath
yum install device-mapper-multipath
What Next!
Where we are going next
• Refactor legacy critical performance areas as needed
• Building a new version of our app
• Addressing data schema
  • not to use as many joins
  • separate
    • application data (transactional/state based)
    • client data (columnar)
• Testing GPU databases
  • Brytlyt
  • OmniSci
Biggest Wins / Biggest Losses
Read Time: ~78%
Write Time: ~10%
Storage: 10 times more
Modify Application: time consuming
ETL: 25x
Concurrency: about the same
Questions?
@allenherrera
aherrera@spendhq.com
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 

How we switched to columnar at SpendHQ

  • 1. How we switched to columnar w/ SpendHQ Allen Herrera
  • 3. Drivers For Change?
      • Massive growth in the last couple of years
      • Legacy application architecture not built to scale
      • Need to improve query performance
      • Need to modernize
  • 4. Why leave our old database?
      • Old DB
          • Modernization: based off MySQL 5.1.X
          • Performance: slow, single threaded, couldn't scale vertically anymore, not clusterable
      • What were we looking for
          • Ease of transition
          • Scalability
          • Lower cost if possible
          • Community support
  • 5. The Journey (timeline ticks: Dec '17, Mar, Aug, Nov, Dec; phases: Analyze, Prepare, Refine)
      • Identify Options
      • Quantify Targets
      • Overcome Challenges
      • Set up cluster
      • Professional Services
      • Define Migration Process
      • Automate Cluster Creation
      • Fail Deploying
      • Refactor ETLs
      • Actually Deploy
  • 6. Timeline: Identify Options
  • 7. Identifying Alternative Databases: consultant identified 7 open source database technologies
      • Calpont InfiniDB (released 2010): C/C++, MySQL front end
      • ClickHouse (released 2014): C/C++
      • CrateDB (released 2013): Java based
      • Greenplum Database (released 2005): Postgres based
      • MariaDB ColumnStore (released 2016): MySQL / InfiniDB branch
      • MapD Technologies (released 2016): C/C++
      • MonetDB (released 2004): C
      • Chose MariaDB Columnstore - syntax similarity to our prior DB
  • 8. Why MariaDB Columnstore!
      • ANSI SQL
      • Open Source
      • Enterprise Support
      • Professional Services
      • Scalable
      • Performant
  • 9. Timeline: Quantify Targets
  • 10. Quantify Targets
      • Goals
          • 71% reduction by switching databases
          • 95% reduction if we de-normalize our schemas
      • [Chart: Query Performance Chart. x-axis: Query (1 to 21); y-axis: Seconds; series: InfoBright Joins, MCS Joins, MCS Flat queries]
  • 11. Timeline: Overcome Challenges
  • 12. Setting up our first Columnstore DB
      • Really easy!
      • https://github.com/toddstoffel/columnstore_easy_setup
      • Lots of my.cnf optimizations out of the box; very few we had to adjust, including:
          » interactive_timeout
          » wait_timeout
          » max_length_for_sort_data
          » innodb_buffer_pool_size
  • 13. Connecting the first Columnstore database
  • 14. 1st Challenge: overcoming legacy framework limitations
      • Result array: Array ( [0] => Array ( [0] => Array ( [min_date] => 2015-10-01 ) [Company] => Array ( [lft] => 731 ) ) )
      • Result array with a vtable key: Array ( [0] => Array ( [$vtable_723] => Array ( [max_date] => 2013-05-01 [lft] => 29 ) ) )
      • Root cause: CakePHP ORM use of mysqli_fetch_field_direct()
  • 15. 2nd Challenge: overcoming legacy code
      • Bad SQL: SELECT uuid, `vendor_name`, SUM(amount) FROM table GROUP BY `vendor_name`;
      • Error: Internal error: IDB-2021: 'table.uuid' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
      • Proper SQL: SELECT MIN(uuid), `vendor_name`, SUM(amount) FROM table GROUP BY `vendor_name`;
  • 16. 3rd Challenge: overcoming case sensitive group bys
      • test_table rows: (id 1, name allen), (id 2, name Allen)
      • SELECT COUNT(id), `name` FROM test_table GROUP BY `name`;
      • Results: MariaDB vs. old DB
  • 17. Timeline: Professional Services
  • 18. Reviewing progress with professional services: analyzing performance
      1. Hard drives
          • Fio testing - https://github.com/axboe/fio.git
              ˗ /usr/local/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
              ˗ We noticed mixed IOPS of ~2,000
              ˗ After switching to SSDs: ~13,000
      2. Query configuration
          • Adjusted InnoDB buffer size
          • Adjusted columnstore.xml
              • PmMaxMemorySmallSide: memory size for the small side of table joins
  • 19. Reviewing progress with professional services: analyzing performance
      » Queries
      » Page loads
          • Confirmed improved query performance translated to improved uncached page load times in our app
  • 20. Timeline: Automate Cluster Creation
  • 21. Automating Cluster Creation
      • Based off of: https://github.com/toddstoffel/columnstore_easy_setup
  • 22. Timeline: Define Migration Process
  • 23. Defining our data transfer process: 181 million records from InnoDB to Columnstore
      • 64 minutes - insert into {columnstore} select * from {innodb}
      • 46 minutes - load from outfile
      • 26 minutes - cpimport
      • For InnoDB: 5 hours vs 15 hours - split large csv
  • 24. Timeline: Fail Deploying
  • 25. First deployment: fail
      • Problems
          1. Storage drives: 16 TB wasn’t enough!
          2. iSCSI volumes in fstab: no no
      • Solution
          1. Attach more storage: doubled to 32 TB
          2. Utilize /etc/rc.local to connect to the iSCSI target and remount automatically
  • 26. Timeline: Refactor ETLs
  • 27. Refactoring data processes for Columnstore
      • Write operations were not plug and play
      • [Chart values: 40%, 44%, 1040%, 100+%, 1200%, 100%]
  • 28. Refactoring data processes for Columnstore
      • ETL: 7x, utilizing a new multi-process architecture to take advantage of InnoDB row-level locking
      • Client shard rebuilds: export to CSV and import from outfile
  • 29. Refactoring data processes for Columnstore: where we ended up
  • 30. Timeline: Actually Deploy
  • 31. Releasing!
      • Problems: storage networking on our UM (latency, bandwidth, write speeds)
      • Solution: multipath (yum install device-mapper-multipath)
  • 32. What Next!
  • 33. Where we are going next
      • Refactor legacy critical performance areas as needed
      • Building a new version of our app
          • Addressing data schema not to use as many joins
          • Separate application data (transactional/state based) and client data (columnar)
      • Testing GPU databases
          • Brytlyt
          • Omnisci
  • 34. Biggest wins and biggest losses
      • Biggest wins: read time ~78%; write time ~10%; ETL 25x
      • Biggest losses: storage 10 times more; modifying the application was time consuming
      • Concurrency: about the same
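
The second and third challenges (non-aggregated columns in a GROUP BY query, and case handling in GROUP BY) can be illustrated with a small sketch. It uses Python's built-in in-memory SQLite purely as a stand-in, not ColumnStore or the old database: SQLite would not raise IDB-2021 for the "bad" query, and its default text comparison is case sensitive. The table name `spend` and its rows are hypothetical. The point is only to show the two rewrites that make a query's intent explicit, so the result does not depend on how lenient a given engine is.

```python
import sqlite3

# In-memory SQLite as a stand-in engine (assumption: NOT ColumnStore).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (uuid TEXT, vendor_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?, ?)",
    [("a1", "allen", 10.0), ("b2", "Allen", 5.0), ("c3", "allen", 2.5)],
)

# Challenge 2: every selected column is either grouped or aggregated.
# MIN(uuid) picks one deterministic uuid per group instead of an arbitrary one.
strict_sql = """
    SELECT MIN(uuid), vendor_name, SUM(amount)
    FROM spend
    GROUP BY vendor_name
    ORDER BY vendor_name
"""
case_sensitive = conn.execute(strict_sql).fetchall()

# Challenge 3: fold case explicitly so 'allen' and 'Allen' land in one group
# regardless of the engine's default collation.
folded_sql = """
    SELECT MIN(uuid), LOWER(vendor_name) AS vendor_key, SUM(amount)
    FROM spend
    GROUP BY LOWER(vendor_name)
"""
case_insensitive = conn.execute(folded_sql).fetchall()

print(case_sensitive)    # two groups: 'Allen' and 'allen'
print(case_insensitive)  # one group: 'allen'
```

Whether to fold case or keep the groups distinct is an application decision; the sketch only shows that stating it in the SQL removes the dependency on engine defaults.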

Editor's Notes

  • #2: Thanks for coming! Hi, I’m Allen Herrera. I’m an engineer with SpendHQ, most recently in charge of the migration from our prior database to MariaDB Columnstore in the latter half of 2018. I’m excited and nervous to be here speaking and sharing the results of our journey, as this is a first for me. I’ll start with a business-level summary and justification, and then jump into a timelined story of our process, challenges, and results of switching to MariaDB Columnstore.
  • #3: So let’s start with some high-level background information about SpendHQ! We are in the business of cleaning up client data and helping clients identify savings opportunities from their data. We do this through our sister consulting company ISG and our data analytics / visualization web application. It all starts with the client sending us raw data in any format they have. This includes Excel files, CSV files, and more. We then take this data and consolidate and normalize it into a single schema. Part of this process includes normalizing vendor/company names. Next, the data is categorized against a custom taxonomy defined by the client. We then have internal experts review the results with the client in case further data processing is necessary. All this results in clean data being uploaded into our production web application, where our clients browse their data and drive conversations around potential savings opportunities. This final part, step 7, is where we’ve migrated from our old columnar database, Infobright, to MariaDB Columnstore.
  • #4: So why change? At SpendHQ, we’ve been going through some massive growth that’s exposed scalability flaws in our legacy architecture. One of those was our database. Over the last two-plus years we’ve more than doubled in size as a company, but the data we get is 10 to 20 times greater than before. Naturally, as data grew, we realized we needed to address query performance and modernization.
  • #5: So why specifically was our old database flawed? Simple: our prior DB was old. It was based off MySQL 5.1 (similar to InfiniDB, actually, which Columnstore was created out of, but our old database stopped giving updates). This older version translates to slower performance compared to modern databases. Infobright’s columnar DB was single threaded, which didn’t help performance. We couldn’t cluster it. We didn’t have access to InnoDB tables for transactions, so we were left with MyISAM. Furthermore, we couldn’t scale vertically anymore to marginally improve performance, as we had in years past. All that said, when defining what to move to, these were our top priorities: ease of transition, scalability, lower cost, and support.
  • #6: With that said, I’ve set the stage to take us back to December 2017, when we began considering other databases. When going through this, we didn’t plan on three sections, but looking back this is how I see it. We had three phases: analyze, prepare, and refine.
  • #7: Step one of analyzing was to identify options.
  • #8: To do this, we engaged Pythian on a consulting engagement to identify and recommend a database that fit our needs. Taking into consideration our wants from a couple of slides ago (ease of transition, scalability, cost, and support) and our business model, they identified 7 column-oriented databases and recommended one: MariaDB Columnstore. By the way, thank you John Shults for your work on this.
  • #9: MariaDB Columnstore met all our must-haves. It’s ANSI SQL, apart from some special Columnstore commands and intricacies. It’s open source, which helps keep costs down, and community support is big. There’s enterprise support for those who want it, which we at SpendHQ definitely take advantage of. There are professional services to ramp up team education and to be a partner in any project. Plus, MariaDB Columnstore is scalable and performant.
  • #10: Next was to quantify what we aimed to accomplish by switching to MariaDB, so we could sell the business folks on this being a good decision and so that we could measure our criteria for success.
  • #11: Our data team of Robert Little and Dan Mackey identified roughly 25 problematic queries that we wanted to see improved. In their final report, Pythian estimated we could achieve a 71% reduction in query time by simply switching databases, without significantly refactoring the queries or the schema. Management was blown away. A 71% reduction in reads. That means 19 second queries in only 3 and a half. Furthermore, if we were to refactor the schema to de-normalize the tables, we could achieve 95%. (Blue is our old DB, red is MariaDB Columnstore, green is de-normalized tables in Columnstore.) So these became our goals.
  • #12: It took us until March/April to begin actual work preparing for the migration. We hired professional services to come out and set up our first Columnstore instance so we could connect to it and do some minor performance tuning of the database.
  • #13: Our consultant from MariaDB was Todd Stoffel. Thank you Todd, a great and knowledgeable guy. He has a git repo that we used to easily install Columnstore using Ansible. It auto-tunes the configs to the hardware better than the Columnstore defaults, so it was a great place to start. However, the challenges we faced were NOT on the part of MariaDB, but rather our legacy app.
  • #14: Fail, fail, fail when making queries to the database. Let me quickly summarize the 3 challenges we had to overcome to keep moving forward.
  • #15: Our framework’s ORM picked up the use of vtables on Columnstore and modified the data objects it returned to include them. The root cause had to do with a specific PHP function returning the vtable value within our framework’s ORM, which we had to wrap with custom logic tying values to the tables they were queried from.
  • #16: The next challenge was improper SQL. Somehow our prior database let queries like the one above execute.
  • #17: The 3rd challenge was minor, but revolved around case-sensitive group bys relative to our old database.
  • #18: Once we got past our application-level challenges, we were ready to move from a standalone Columnstore instance to a cluster and actually benchmark performance. We brought Todd back in a 2nd time to look at the cluster we had set up on our own and help drive optimizations.
  • #19: The first thing we did was look at our hard drives with FIO testing. We identified that our storage solution was HDD and had low IOPS. Thus Todd recommended faster hard drives. After switching to SSDs, we noticed better concurrency performance from the cluster as well as better individual performance. Next we looked at the configuration files. The two key changes that yielded the best results were increasing both our InnoDB buffer size in my.cnf and PmMaxMemorySmallSide in columnstore.xml.
  • #20: After other minor adjustments, we went back to the original 25 or so queries and re-benchmarked them all. Results were great!
  • #21: Next we moved on to making cluster creation faster with some automation.
  • #22: [Start playing video, then speak] Todd built the original version, which we then modified for subsequent deployments. Here is a small video of a 1 UM, 3 PM setup.
  • #23: Next we worked on the process to move the actual data from our old database to MariaDB Columnstore. At a high level, what we chose to do was essentially export all our data as CSVs onto a hard drive, move it to MariaDB, and import it with cpimport.
  • #24: To optimize this, we adjusted key SQL variables and split CSVs for tables that were InnoDB. This took our migration time down from 35 hours to 8 hours, as InnoDB tables were the slow ones to insert. [Talk about slide numbers]
  • #25: With the migration process in place, we were ready to test deploying to production.
  • #26: And it failed. The reason had to do with our hard drives. We didn’t have enough storage, which was shocking to us. The other was a silly mistake of having iSCSI volumes in the fstab without having authenticated with the target first. Simple solutions: more storage, and adding some logic to rc.local.
  • #27: Furthermore, once we felt ready to deploy again, we noticed performance issues in writes.
  • #28: We then benchmarked 6 core data-changing processes where write performance wasn’t good. So I set out to refactor some of the critical write performance areas. This included the ETL.
  • #29: We utilized a new multi-process architecture to take advantage of InnoDB row-level locking. This resulted in our ETL being 7 times faster on the same hardware as Infobright, and also opened the door to more vertical scaling, resulting in 25 times faster ETL uploads.
  • #30: With that and 2 more refactors, we made a huge leap in performance.
  • #31: Now we could deploy.
  • #32: The main issue we had when releasing was concurrency and our hard drives not working fast enough. When we asked our storage provider, they recommended we use multipath to open additional sessions from the server to the storage, opening up additional bandwidth.
  • #33: So what do we do next?
  • #34: Next up is to de-normalize our schema, but instead of trying to refactor our existing app, we’ll be starting a new app. We’ll also want to pilot a GPU database to see the results of the Brytlyt partnership with MariaDB. Overall we are happy with MariaDB Columnstore. Performance is great; the only issues we’ve come across with it are really just our own.
  • #35: So to summarize: where we stand is great compared to where we were. Faster writes, significantly faster reads, significantly faster ETL. Storage needs took a hit, and our application needed quite a bit of work given its age. Concurrency isn’t any better for us at the moment, but that can be solved if we utilize MaxScale with additional UMs. Before concluding, I want to give a special shout-out for all the support we’ve received from MariaDB, specifically Todd Stoffel, Geoff Montee, David Hill, and more.
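
The "split large csv" step from the migration process can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used in the migration: the function name `split_csv`, the chunk file naming, and the chunk size are all hypothetical. It assumes the export has no header row. Each resulting chunk could then be loaded on its own, for example with LOAD DATA INFILE for InnoDB tables or with cpimport for Columnstore tables.

```python
import csv
import os
import tempfile
from typing import List


def split_csv(src_path: str, out_dir: str, rows_per_chunk: int) -> List[str]:
    """Split src_path into numbered chunk files of at most rows_per_chunk rows."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths: List[str] = []
    out_file = None
    writer = None
    try:
        with open(src_path, newline="") as src:
            for i, row in enumerate(csv.reader(src)):
                if i % rows_per_chunk == 0:
                    # Start a new chunk file every rows_per_chunk rows.
                    if out_file is not None:
                        out_file.close()
                    name = f"chunk_{len(chunk_paths):05d}.csv"
                    path = os.path.join(out_dir, name)
                    out_file = open(path, "w", newline="")
                    writer = csv.writer(out_file)
                    chunk_paths.append(path)
                writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return chunk_paths


def count_rows(path: str) -> int:
    """Count CSV records in a file (used here only to check the demo)."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


# Demo on a tiny hypothetical export: 10 rows split into chunks of 4.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "export.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows([[n, f"vendor_{n}", n * 1.5] for n in range(10)])
    chunks = split_csv(src, os.path.join(tmp, "chunks"), rows_per_chunk=4)
    chunk_sizes = [count_rows(p) for p in chunks]

print(chunk_sizes)
```

Streaming row by row keeps memory use flat regardless of the export size, which matters for tables in the range of hundreds of millions of records.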