Scaling & High Availability MySQL
learnings from the past decade+
Colin Charles, Chief Evangelist, Percona Inc.

colin.charles@percona.com / byte@bytebot.net

http://www.bytebot.net/blog/ | @bytebot on Twitter

OSDC.de, Berlin, Germany

12 June 2018
whoami
• Chief Evangelist, Percona Inc

• Focusing on the MySQL ecosystem (MySQL, Percona Server, MariaDB
Server), as well as the MongoDB ecosystem (Percona Server for
MongoDB) + 100% open source tools from Percona like Percona
Monitoring & Management, Percona XtraBackup, Percona Toolkit, etc.

• Founding team of MariaDB Server (2009-2016), previously at Monty
Program Ab, merged with SkySQL Ab, now MariaDB Corporation

• Formerly MySQL AB (exit: Sun Microsystems)

• Past lives include The Fedora Project (FESCO), OpenOffice.org

• MySQL Community Contributor of the Year Award winner 2014
License
• Creative Commons BY-NC-SA 4.0

• https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode

OSDC 2018 | Scaling & High Availability MySQL learnings from the past decade+ by Colin Charles
Uptime
Percentage target | Max downtime per year
90% | 36.5 days
99% | 3.65 days
99.5% | 1.83 days
99.9% | 8.76 hours
99.99% | 52.56 minutes
99.999% | 5.26 minutes
99.9999% | 31.5 seconds
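The downtime figures follow directly from the availability target. A minimal sketch of the arithmetic, assuming a 365-day year (leap years ignored); the function name max_downtime_seconds is made up for this illustration:

```python
# Convert an availability target into a yearly downtime budget.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds; leap years ignored


def max_downtime_seconds(availability_percent: float) -> float:
    """Return the maximum downtime per year, in seconds, for a target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (90, 99, 99.5, 99.9, 99.99, 99.999, 99.9999):
        seconds = max_downtime_seconds(target)
        print(f"{target}% -> {seconds / 3600:.2f} hours ({seconds:.1f} seconds)")
```

Each extra "nine" divides the budget by ten, which is why the jump from 99.9% to 99.999% moves the discussion from hours to minutes.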
Estimates of levels of availability
Method | Level of availability
Simple replication | 98-99.9%
Master-Master/MMM | 99%
SAN | 99.5-99.9%
DRBD, MHA | 99.9%
NDBCluster, Galera Cluster, InnoDB Cluster | 99.999%
HA is Redundancy
• RAID: disk crashes? Another works

• Clustering: server crashes? Another works

• Power: fuse blows? Redundant power supplies

• Network: Switch/NIC crashes? 2nd network route

• Geographical: Datacenter offline/destroyed? Move computation to
another DC
Durability
• Data stored on disks

• Is it really written to the disk?

▪ being durable means calling fsync() on each commit

• Is it written in a transactional way to guarantee atomicity, crash
safety, integrity?
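What "calling fsync() on each commit" means can be shown outside the database. This is a sketch only, using Python's os module on an ordinary file; durable_append is a hypothetical helper, not anything InnoDB exposes. In MySQL itself, the usual durable settings are innodb_flush_log_at_trx_commit=1 and sync_binlog=1.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append a record and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit in the OS page cache
        # and be lost on a power failure.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The fsync() call is what makes each "commit" expensive, and it is the cost that relaxed durability settings trade away.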
High Availability for databases
• HA is harder for databases

• Hardware resources and data need to be redundant

• Remember, this isn’t just data: it is constantly changing data

• HA means the operation can continue uninterrupted, not by
restoring a new/backup server

• uninterrupted: measured as a percentage of uptime
Redundancy through client-side XA transactions
• Client writes to 2 independent but identical databases

• HA-JDBC (http://ha-jdbc.github.io/)

• No replication anywhere
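A rough sketch of the client-side idea, assuming two in-memory SQLite databases as stand-ins for the two independent servers. This is not HA-JDBC and not real XA: dual_write is a hypothetical helper with no prepare phase, so a crash between the two commits would leave the copies diverged. That is the gap XA's two-phase commit exists to close.

```python
import sqlite3


def dual_write(primary, secondary, sql, params=()):
    """Apply one statement to both databases, or to neither on error."""
    try:
        primary.execute(sql, params)
        secondary.execute(sql, params)
    except sqlite3.Error:
        # Either side failed: undo whatever was applied, then re-raise.
        primary.rollback()
        secondary.rollback()
        raise
    # Simplification: two independent commits instead of prepare + commit.
    primary.commit()
    secondary.commit()
```

Usage: open two connections, create the same schema on both, and send every write through dual_write; reads can then go to either copy.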
InnoDB “recovery” time
• innodb_log_file_size
• larger = longer recovery times

• In 2007, Wikipedia reported 40 minutes with 256MB logs;
sometimes it takes 5-10 minutes, sometimes hours

• Percona Server 5.5 (XtraDB) - innodb_recovery_stats
• But today, there has been a paradigm change…
Redundancy through shared storage
• Requires specialist hardware, like a SAN

• Complex to operate

• One set of data is your single point of failure

• Cold standby

• failover 1-30 minutes

• this isn’t scale-out

• Active/Active solutions: Oracle RAC
Redundancy through disk replication
• DRBD

• Linux administration vs. DBA skills

• Synchronous

• Second set of data inaccessible for use

• Passive server acting as hot standby

• Failover: 1-30 minutes

• Performance hit compared to single node performance, with higher
average latencies
Redundancy through MySQL replication
• MySQL replication

• Tungsten Replicator

• Galera Cluster

• MySQL InnoDB Cluster

• MySQL Cluster (NDBCLUSTER)

• Storage requirements are multiplied

• Huge potential for scaling out
MySQL Replication
• Statement based generally

• Row based became available in 5.1, and the default in 5.7

• mixed-mode: logs as STATEMENT, switching to ROW when

▪ the statement calls the UUID function, a UDF, the CURRENT_USER/USER
function or the LOAD_FILE function

▪ 2 or more AUTO_INCREMENT columns are updated with the same statement

▪ a server variable is used in the statement

▪ the storage engine doesn’t allow statement based replication, like
NDBCLUSTER

• mixed-mode is the default in MariaDB Server 10.2
MySQL Replication II
• Asynchronous by default

• Semi-synchronous plugin in 5.5+

• However the holy grail of fully synchronous replication is not part of
standard MySQL replication (yet?)

• MariaDB Galera Cluster is built into MariaDB Server 10.1

• MySQL InnoDB Cluster is available in MySQL 8.0, combining the
likes of group replication (the Galera Cluster equivalent), InnoDB,
and mysqlsh for maintaining it; it also uses MySQL Router for load
balancing purposes
Semi-synchronous replication
• a semi-sync capable slave acknowledges a transaction event only after
it has been written to the relay log & flushed to disk

• timeout occurs? The master reverts to async replication and resumes
semi-sync when the slaves catch up

• at scale, Facebook runs semi-sync:
http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html
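The timeout fallback can be pictured as a small state machine. The class below is a toy model, not MySQL's implementation; SemiSyncMaster and its methods are invented for this sketch. The 10-second default mirrors the default of rpl_semi_sync_master_timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemiSyncMaster:
    """Toy model of master-side semi-sync behaviour."""
    timeout_ms: int = 10_000
    semi_sync_active: bool = True

    def commit(self, ack_latency_ms: Optional[int]) -> str:
        """Report how a commit was acknowledged to the client.

        ack_latency_ms is how long the fastest slave took to acknowledge,
        or None if no slave answered at all.
        """
        if not self.semi_sync_active:
            return "async"
        if ack_latency_ms is None or ack_latency_ms > self.timeout_ms:
            # No acknowledgement in time: stop waiting for slaves.
            self.semi_sync_active = False
            return "async (timed out)"
        return "semi-sync"

    def replica_caught_up(self) -> None:
        """A slave has caught up again, so semi-sync resumes."""
        self.semi_sync_active = True
```

Note what the model makes visible: after a timeout, commits are acknowledged without any slave having them, so failover during that window can still lose data.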
MySQL Replication in 5.6
• Global Transaction ID (GTID)

• Server UUID

• Ignore (master) server IDs (filtering)

• Per-schema multi-threaded slave

• Group commit in the binary log

• Binary log (binlog) checksums

• Crash safe binlog and relay logs

• Time delayed replication

• Parallel replication (per database)
Replication: START TRANSACTION WITH CONSISTENT SNAPSHOT
• Works with the binlog, possible to obtain the binlog position corresponding to
a transactional snapshot of the database without blocking any other queries. 

• by-product of group commit in the binlog to view commit ordering

• Used by the command mysqldump --single-transaction --master-data
to do a fully non-blocking backup

• Works consistently between transactions involving more than one storage
engine

• https://kb.askmonty.org/en/enhancements-for-start-transaction-with-consistent/

• Percona Server extended it (a snapshot can be shared by session ID) and also introduced backup locks
Multi-source replication
• Multi-source replication - (real-time) analytics, shard provisioning,
backups, etc.

• @@default_master_connection contains current connection name
(used if connection name is not given)

• All master/slave commands take a connection name now (like CHANGE
MASTER 'connection_name', SHOW SLAVE 'connection_name'
STATUS, etc.)
Global Transaction ID (GTID)
• Supports multi-source replication

• GTID can be enabled or disabled independently and online for masters or
slaves

• Slaves using GTID do not have to have binary logging enabled.

• (MariaDB) Supports multiple replication domains (independent binlog streams)

• Queries in different domains can be run in parallel on the slave.
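A MariaDB GTID is written as domain_id-server_id-sequence (for example 0-1-100), and a replication position lists one GTID per domain. A small parsing sketch; the type and function names are made up for illustration, and MySQL's own uuid:transaction_id format is not handled:

```python
from typing import Dict, NamedTuple


class MariaDBGtid(NamedTuple):
    domain_id: int
    server_id: int
    sequence: int


def parse_gtid(text: str) -> MariaDBGtid:
    """Parse a single 'domain-server-sequence' GTID."""
    parts = text.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MariaDB GTID: {text!r}")
    return MariaDBGtid(*(int(p) for p in parts))


def parse_gtid_position(text: str) -> Dict[int, MariaDBGtid]:
    """Map each domain_id to its GTID in a comma-separated position."""
    position = {}
    for item in text.split(","):
        if item.strip():
            gtid = parse_gtid(item)
            position[gtid.domain_id] = gtid
    return position
```

Keeping one entry per domain is what lets a slave treat each domain as an independent binlog stream.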
Why is MariaDB GTID different compared to MySQL 5.6?
• MySQL 5.6 GTID does not support multi-source replication (only 5.7
supports this)

• Supports --log-slave-updates=0 for efficiency (like 5.7)

• Enabled by default

• Turn it on without having to restart the topology (just like 5.7)
Parallel replication
• Multi-source replication from different masters executed in parallel 

• Queries from different domains are executed in parallel 

• Queries that are run in parallel on the master are run in parallel on the
slave (based on group commit).

• Transactions modifying the same table can be updated in parallel on
the slave! 

• Supports both statement based and row based replication.
All in… sometimes it can get out of sync
• Changed information on slave directly

• Statement based replication

• non-deterministic SQL (UPDATE/DELETE with LIMIT and without ORDER BY)

• triggers & stored procedures

• Master in MyISAM, slave in InnoDB (deadlocks)

• --replicate-ignore-db with fully qualified queries

• Binlog corruption on master

• PURGE BINARY LOGS issued and not enough files to update slave

• read_buffer_size larger than max_allowed_packet

• Bugs?
Replication Monitoring
• Percona Toolkit is important

• pt-slave-find: find slave information from master

• pt-table-checksum: online replication consistency check

• executes checksum queries on master

• pt-table-sync: synchronise table data efficiently

• changes data, so backups important
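The idea behind a chunked consistency check can be sketched in a few lines. This is not how pt-table-checksum is implemented (it runs its checksum queries on the master and lets them replicate); it only illustrates comparing per-chunk checksums so that a difference is narrowed to a small range of rows. All names here are hypothetical.

```python
import zlib


def chunk_checksums(rows, chunk_size=1000):
    """Return one CRC32 per chunk of rows; rows must be in key order."""
    checksums = []
    crc = 0
    count = 0
    for row in rows:
        crc = zlib.crc32(repr(row).encode("utf-8"), crc)
        count += 1
        if count == chunk_size:
            checksums.append(crc)
            crc = 0
            count = 0
    if count:
        checksums.append(crc)
    return checksums


def differing_chunks(master_rows, replica_rows, chunk_size=1000):
    """Return the indexes of chunks whose checksums do not match."""
    a = chunk_checksums(master_rows, chunk_size)
    b = chunk_checksums(replica_rows, chunk_size)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]
```

Only the chunks reported here need a row-by-row repair, which is the part that changes data and the reason backups matter.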
Replication Monitoring with PMM
• http://pmmdemo.percona.com/
mysqlbinlog versions
• ERROR: Error in Log_event::read_log_event(): 'Found invalid event in
binary log', data_len: 56, event_type: 30

• 5.6 ships with a “streaming binlog backup server” - v.3.4; MariaDB
10 doesn’t - v.3.3 (fixed in 10.2 - MDEV-8713)

• GTID variances!

• Beware mysql-client from your Linux distribution
Slave prefetching
• Replication Booster

• https://github.com/yoshinorim/replication-booster-for-mysql

• Prefetch MySQL relay logs to make the SQL thread faster

• Tungsten has slave prefetch

• Percona Server till 5.6 + MariaDB till 10.1 have InnoDB fake changes
Changing paradigm: What replaces slave prefetching?
• In Percona Server 5.7, slave prefetching has been replaced by doing
intra-schema parallel replication

• Feature removed from XtraDB

• MariaDB Server 10.2 also has this feature removed, as they switched
to InnoDB!
Galera Cluster
• Inside MySQL, a replication plugin (wsrep)

• Replaces MySQL replication (but can work alongside it too)

• True multi-master, active-active solution

• Virtually Synchronous

• WAN performance: 100-300ms/commit, works in parallel

• No slave lag or integrity issues

• Automatic node provisioning
Percona XtraDB Cluster 5.7
• Engineering within Percona

• Load balancing with ProxySQL (bundled)

• PMM integration

• Benefits of all the MySQL 5.7 feature-set

• ProxySQL admin tool

• Safety features enabled (e.g. no accidentally using MyISAM, etc.)
Group replication
• Fully synchronous replication (update everywhere), self-healing, with
elasticity, redundancy

• Single primary mode supported

• MySQL InnoDB Cluster - a combination of group replication, Router, to
make magic!

• Recent blogs:

• https://www.percona.com/blog/2017/02/24/battle-for-synchronous-replication-in-mysql-galera-vs-group-replication/

• https://www.percona.com/blog/2017/02/15/group-replication-shipped-early/
Summary of Replication Performance
• SAN has "some" latency overhead compared to local disk. Can be
great for throughput.

• DRBD has a performance penalty

• Replication, when implemented correctly, has no performance penalty

• But MySQL replication with disk bound data set has single-threaded
issues!

• Semi-sync is poorer on WAN compared to async

• Galera & InnoDB Cluster provide read/write scale-out, thus more
performance
Handling failure
• How do we find out about failure?

• Polling, monitoring, alerts...

• Error returned to and handled in client side

• What should we do about it?

• Direct requests to the spare nodes (or DCs)

• How to protect data integrity?

• Master-slave is unidirectional: Must ensure there is only one master at all times.

• DRBD and SAN have cold-standby: Must mount disks and start mysqld.

• In all cases must ensure that 2 disconnected replicas cannot both commit
independently. (split brain)
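The usual guard against split brain is a majority rule: a partition may accept writes only if it can see more than half of the cluster. A minimal sketch; has_quorum is a hypothetical helper, and real clusters may also use weighted votes or arbitrator nodes, which are not modelled here.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this partition holds a strict majority of the cluster.

    reachable_nodes counts the local node plus every node it can reach.
    """
    if cluster_size <= 0 or not 0 <= reachable_nodes <= cluster_size:
        raise ValueError("invalid node counts")
    return reachable_nodes * 2 > cluster_size
```

With two nodes, neither side of a network split has a majority, which is why clusters of this kind are normally deployed with an odd number of nodes (three or more).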
Frameworks to handle failure
• MySQL-MMM

• Severalnines ClusterControl

• Orchestrator

• MySQL MHA

• Percona Replication Manager

• Tungsten Replicator

• 5.6: mysqlfailover, mysqlrpladmin

• (MariaDB) Replication Manager
Orchestrator
• Reads replication topologies, keeps state, continuous polling

• Modify your topology — move slaves around

• Nice GUI, JSON API, CLI
MySQL MHA
• Like MMM, specialized solution for MySQL replication

• Developed by Yoshinori Matsunobu at DeNA

• Automated and manual failover options

• Topology: 1 master, many slaves

• Choose new master by comparing slave binlog positions

• Can be used in conjunction with other solutions

• http://code.google.com/p/mysql-master-ha/
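Choosing the new master by comparing binlog positions boils down to finding the slave that has read furthest into the old master's binary log. A sketch of that comparison only, not MHA's code: the ReplicaStatus type is invented here, with fields named after the Master_Log_File and Read_Master_Log_Pos columns of SHOW SLAVE STATUS.

```python
from typing import NamedTuple, Sequence, Tuple


class ReplicaStatus(NamedTuple):
    host: str
    master_log_file: str       # e.g. "mysql-bin.000042"
    read_master_log_pos: int   # byte offset within that file


def binlog_sort_key(status: ReplicaStatus) -> Tuple[int, int]:
    """Order by binlog file number first, then by position in the file."""
    file_number = int(status.master_log_file.rsplit(".", 1)[1])
    return (file_number, status.read_master_log_pos)


def most_advanced(replicas: Sequence[ReplicaStatus]) -> ReplicaStatus:
    """Return the slave that has received the most from the old master."""
    if not replicas:
        raise ValueError("no replicas to choose from")
    return max(replicas, key=binlog_sort_key)
```

The file number must be compared numerically before the position; comparing positions alone would pick the wrong slave whenever the candidates are on different binlog files.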
Pacemaker
• Heartbeat, Corosync, Pacemaker

• Resource Agents, Percona-PRM

• Percona Replication Manager - cluster, geographical disaster
recovery options

• Pacemaker agent specialised on MySQL replication

• https://github.com/percona/percona-pacemaker-agents/

• Pacemaker Resource Agents 3.9.3+ include Percona Replication
Manager (PRM)
Load Balancers for multi-master clusters
• Synchronous multi-master clusters like Galera require load balancers

• HAProxy

• Galera Load Balancer (GLB)

• MaxScale

• ProxySQL
MySQL Router
• Routing between applications and any backend MySQL servers

• Failover

• Load Balancing

• Pluggable architecture (connection routing)
MariaDB MaxScale
• “Pluggable router” that offers connection & statement based load
balancing

• Possibilities are endless - use it for logging, writing to other
databases (besides MySQL), preventing SQL injections via regex
filtering, routing via hints, query rewriting, running a binlog relay, etc.
ProxySQL
• High Performance MySQL proxy with a GPL license

• Performance is a priority - the numbers prove it

• Can query rewrite

• Sharding by host/schema or both, with rule engine + modification to
SQL + application logic
JDBC/PHP drivers
• JDBC - multi-host failover feature (just specify master/slave hosts in
the properties)

• true for MariaDB Java Connector too

• PHP handles this too - mysqlnd_ms

• Can handle read-write splitting, round robin or random host
selection, and more
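Read-write splitting with round-robin host selection can be sketched independently of any driver. The class below is hypothetical and deliberately naive: it looks only at the first keyword, so cases such as SELECT ... FOR UPDATE or reads inside a transaction, which belong on the master, are not handled.

```python
import itertools


class ReadWriteSplitter:
    """Send reads to slaves in round-robin order, everything else to the master."""

    READ_KEYWORDS = ("SELECT", "SHOW")

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        """Return the host that should execute this statement."""
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word in self.READ_KEYWORDS and self._replicas is not None:
            return next(self._replicas)
        return self.master
```

Usage: ReadWriteSplitter("master", ["slave1", "slave2"]).route("SELECT 1") returns "slave1", the next read goes to "slave2", and any write goes to "master". Because replication is asynchronous by default, a read routed to a slave may not yet see a write that was just committed.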
Clustering: solution or part of problem?
• "Causes of Downtime in Production MySQL Servers" whitepaper,
Baron Schwartz, VividCortex

• Human error

• SAN

• Clustering framework + SAN = more problems

• Galera/group replication is replication based, has no false positives
as there’s no “failover” moment, you don’t need a clustering
framework (JDBC or PHP can load balance), and is relatively elegant
overall
Replication type
• Competence choices

• Replication: MySQL DBA manages

• DRBD: Linux admin manages

• SAN: requires domain controller

• Operations

• DRBD (disk level) = cold standby =
longer failover

• Replication = hot standby =
shorter failover

• GTID helps tremendously

• Performance

• SAN has higher latency than local
disk

• DRBD has higher latency than
local disk

• Replication has little overhead

• Redundancy

• Shared disk = SPoF

• Shared nothing = redundant
SBR vs RBR? Async vs sync?
• row based: deterministic

• statement based: dangerous

• GTID: easier setup & failover of complex topologies

• async: data loss in failover

• sync: best

• multi-threaded slaves: scalability (hello 5.6+, Tungsten)
What about the cloud?
• Usually scalability & high availability is more or less “built-in”

• e.g. RDS has multi-AZ (synchronous data replication), but doesn’t
give you a read replica; Cloud SQL uses semi-sync replication

• Watch out for the SLAs (and automatic upgrades)

• Monitoring via PERFORMANCE_SCHEMA

• “Bad” nodes do exist; do not assume node provisioning is quick
Conclusion
• MySQL replication is amazing if you know it (and monitor it) well
enough

• Large sites run just fine with semi-sync + tooling for automated
failover

• Galera Cluster/MySQL InnoDB Cluster is great for virtually fully
synchronous replication

• Don’t forget the need for a load balancer: ProxySQL is nifty

• When thinking scaling, think scale out (it is more efficient, and fits
modern cloud mantras too!)
At Percona, we care about your High Availability
• Percona XtraDB Cluster 5.7 with support for ProxySQL and Percona
Monitoring & Management (PMM)

• Percona Monitoring & Management (PMM) with Orchestrator

• Percona Toolkit

• Percona Server for MySQL 5.7

• Percona XtraBackup
Thank you!
Colin Charles
colin.charles@percona.com / byte@bytebot.net
http://bytebot.net/blog | @bytebot on twitter
slides: slideshare.net/bytebot

More Related Content

What's hot (20)

PPTX
Container Orchestration
dfilppi
 
PDF
Architecture Overview: Kubernetes with Red Hat Enterprise Linux 7.1
Etsuji Nakai
 
PDF
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
NETWAYS
 
PDF
OSDC 2018 | Spicing up VMWare with Ansible and InSpec by Martin Schurz and S...
NETWAYS
 
PDF
Building stateful applications on Kubernetes with Rook
Roberto Hashioka
 
PPTX
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
Opcito Technologies
 
PDF
How to Prepare for CKA Exam
Alfie Chen
 
PDF
Fabric8 - Being devOps doesn't suck anymore
Henryk Konsek
 
PDF
Scale Kubernetes to support 50000 services
LinuxCon ContainerCon CloudOpen China
 
PPTX
Kubernetes Introduction & Whats new in Kubernetes 1.6
Opcito Technologies
 
PDF
Linuxcon secureefficientcontainerimagemanagementharbor
LinuxCon ContainerCon CloudOpen China
 
PDF
Best practices in Deploying SUSE CaaS Platform v3
Juan Herrera Utande
 
PPTX
Open shift enterprise 3.1 paas on kubernetes
Samuel Terburg
 
PPTX
Service Discovery In Kubernetes
Knoldus Inc.
 
PDF
HPC in a Box - Docker Workshop at ISC 2015
inside-BigData.com
 
PDF
Docker for HPC in a Nutshell
inside-BigData.com
 
PDF
Releasing a Distribution in the Age of DevOps.
LinuxCon ContainerCon CloudOpen China
 
PDF
AWS Lambda and serverless Java | DevNation Live
Red Hat Developers
 
PDF
SUSE CaaSP: deploy OpenFaaS and Ethereum Blockchain on Kubernetes
Juan Herrera Utande
 
Container Orchestration
dfilppi
 
Architecture Overview: Kubernetes with Red Hat Enterprise Linux 7.1
Etsuji Nakai
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
NETWAYS
 
OSDC 2018 | Spicing up VMWare with Ansible and InSpec by Martin Schurz and S...
NETWAYS
 
Building stateful applications on Kubernetes with Rook
Roberto Hashioka
 
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
Opcito Technologies
 
How to Prepare for CKA Exam
Alfie Chen
 
Fabric8 - Being devOps doesn't suck anymore
Henryk Konsek
 
Scale Kubernetes to support 50000 services
LinuxCon ContainerCon CloudOpen China
 
Kubernetes Introduction & Whats new in Kubernetes 1.6
Opcito Technologies
 
Linuxcon secureefficientcontainerimagemanagementharbor
LinuxCon ContainerCon CloudOpen China
 
Best practices in Deploying SUSE CaaS Platform v3
Juan Herrera Utande
 
Open shift enterprise 3.1 paas on kubernetes
Samuel Terburg
 
Service Discovery In Kubernetes
Knoldus Inc.
 
HPC in a Box - Docker Workshop at ISC 2015
inside-BigData.com
 
Docker for HPC in a Nutshell
inside-BigData.com
 
Releasing a Distribution in the Age of DevOps.
LinuxCon ContainerCon CloudOpen China
 
AWS Lambda and serverless Java | DevNation Live
Red Hat Developers
 
SUSE CaaSP: deploy OpenFaaS and Ethereum Blockchain on Kubernetes
Juan Herrera Utande
 

Similar to OSDC 2018 | Scaling & High Availability MySQL learnings from the past decade+ by Colin Charles (20)

PDF
Best practices for MySQL High Availability Tutorial
Colin Charles
 
PDF
Best practices for MySQL/MariaDB Server/Percona Server High Availability
Colin Charles
 
PDF
Best practices for MySQL High Availability
Colin Charles
 
PDF
The Full MySQL and MariaDB Parallel Replication Tutorial
Jean-François Gagné
 
PPTX
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
Ontico
 
PDF
[db tech showcase Tokyo 2014] B15: Scalability with MariaDB and MaxScale by ...
Insight Technology, Inc.
 
PDF
The MySQL High Availability Landscape and where Galera Cluster fits in
Sakari Keskitalo
 
PDF
OSDC 2017 | Lessons from database failures by Colin Charles
NETWAYS
 
PDF
MySQL Parallel Replication: inventory, use-case and limitations
Jean-François Gagné
 
PDF
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
Ivan Zoratti
 
PDF
MySQL Parallel Replication: inventory, use-cases and limitations
Jean-François Gagné
 
PDF
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
Jean-François Gagné
 
PDF
The MySQL Server Ecosystem in 2016
Colin Charles
 
PDF
MySQL Parallel Replication: inventory, use-case and limitations
Jean-François Gagné
 
PDF
Lessons from database failures
Colin Charles
 
PDF
Lessons from database failures
Colin Charles
 
PPTX
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
Codership Oy - Creators of Galera Cluster
 
PDF
MySQL features missing in MariaDB Server
Colin Charles
 
PDF
MySQL highav Availability
Baruch Osoveskiy
 
PDF
MySQL Ecosystem in 2018
Laurynas Biveinis
 
Best practices for MySQL High Availability Tutorial
Colin Charles
 
Best practices for MySQL/MariaDB Server/Percona Server High Availability
Colin Charles
 
Best practices for MySQL High Availability
Colin Charles
 
The Full MySQL and MariaDB Parallel Replication Tutorial
Jean-François Gagné
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
Ontico
 
[db tech showcase Tokyo 2014] B15: Scalability with MariaDB and MaxScale by ...
Insight Technology, Inc.
 
The MySQL High Availability Landscape and where Galera Cluster fits in
Sakari Keskitalo
 
OSDC 2017 | Lessons from database failures by Colin Charles
NETWAYS
 
MySQL Parallel Replication: inventory, use-case and limitations
Jean-François Gagné
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
Ivan Zoratti
 
MySQL Parallel Replication: inventory, use-cases and limitations
Jean-François Gagné
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
Jean-François Gagné
 
The MySQL Server Ecosystem in 2016
Colin Charles
 
MySQL Parallel Replication: inventory, use-case and limitations
Jean-François Gagné
 
Lessons from database failures
Colin Charles
 
Lessons from database failures
Colin Charles
 
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
Codership Oy - Creators of Galera Cluster
 
MySQL features missing in MariaDB Server
Colin Charles
 
MySQL highav Availability
Baruch Osoveskiy
 
MySQL Ecosystem in 2018
Laurynas Biveinis
 
Ad

Recently uploaded (20)

PPTX
EO4EU Ocean Monitoring: Maritime Weather Routing Optimsation Use Case
EO4EU
 
PPTX
CV-Project_2024 version 01222222222.pptx
MohammadSiddiqui70
 
PDF
Writing Maintainable Playwright Tests with Ease
Shubham Joshi
 
PPTX
ManageIQ - Sprint 264 Review - Slide Deck
ManageIQ
 
PDF
From Chaos to Clarity: Mastering Analytics Governance in the Modern Enterprise
Wiiisdom
 
PDF
Continouous failure - Why do we make our lives hard?
Papp Krisztián
 
PDF
>Nitro Pro Crack 14.36.1.0 + Keygen Free Download [Latest]
utfefguu
 
PDF
What Is an Internal Quality Audit and Why It Matters for Your QMS
BizPortals365
 
PPTX
IObit Driver Booster Pro 12.4-12.5 license keys 2025-2026
chaudhryakashoo065
 
PDF
The Rise of Sustainable Mobile App Solutions by New York Development Firms
ostechnologies16
 
PDF
Code Once; Run Everywhere - A Beginner’s Journey with React Native
Hasitha Walpola
 
PDF
Laboratory Workflows Digitalized and live in 90 days with Scifeon´s SAPPA P...
info969686
 
PPTX
CONCEPT OF PROGRAMMING in language .pptx
tamim41
 
PPTX
Perfecting XM Cloud for Multisite Setup.pptx
Ahmed Okour
 
PDF
AWS Consulting Services: Empowering Digital Transformation with Nlineaxis
Nlineaxis IT Solutions Pvt Ltd
 
PPTX
Iobit Driver Booster Pro 12 Crack Free Download
chaudhryakashoo065
 
PDF
Cloud computing Lec 02 - virtualization.pdf
asokawennawatte
 
PDF
LPS25 - Operationalizing MLOps in GEP - Terradue.pdf
terradue
 
PPTX
IDM Crack with Internet Download Manager 6.42 [Latest 2025]
HyperPc soft
 
PPTX
IObit Driver Booster Pro Crack Download Latest Version
chaudhryakashoo065
 
EO4EU Ocean Monitoring: Maritime Weather Routing Optimsation Use Case
EO4EU
 
CV-Project_2024 version 01222222222.pptx
MohammadSiddiqui70
 
Writing Maintainable Playwright Tests with Ease
Shubham Joshi
 
ManageIQ - Sprint 264 Review - Slide Deck
ManageIQ
 
From Chaos to Clarity: Mastering Analytics Governance in the Modern Enterprise
Wiiisdom
 
Continouous failure - Why do we make our lives hard?
Papp Krisztián
 
>Nitro Pro Crack 14.36.1.0 + Keygen Free Download [Latest]
utfefguu
 
What Is an Internal Quality Audit and Why It Matters for Your QMS
BizPortals365
 
IObit Driver Booster Pro 12.4-12.5 license keys 2025-2026
chaudhryakashoo065
 
The Rise of Sustainable Mobile App Solutions by New York Development Firms
ostechnologies16
 
Code Once; Run Everywhere - A Beginner’s Journey with React Native
Hasitha Walpola
 
Laboratory Workflows Digitalized and live in 90 days with Scifeon´s SAPPA P...
info969686
 
CONCEPT OF PROGRAMMING in language .pptx
tamim41
 
Perfecting XM Cloud for Multisite Setup.pptx
Ahmed Okour
 
AWS Consulting Services: Empowering Digital Transformation with Nlineaxis
Nlineaxis IT Solutions Pvt Ltd
 
Iobit Driver Booster Pro 12 Crack Free Download
chaudhryakashoo065
 
Cloud computing Lec 02 - virtualization.pdf
asokawennawatte
 
LPS25 - Operationalizing MLOps in GEP - Terradue.pdf
terradue
 
IDM Crack with Internet Download Manager 6.42 [Latest 2025]
HyperPc soft
 
IObit Driver Booster Pro Crack Download Latest Version
chaudhryakashoo065
 
Ad

OSDC 2018 | Scaling & High Availability MySQL learnings from the past decade+ by Colin Charles

  • 1. Scaling & High Availability MySQL learnings from the past decade+ Colin Charles, Chief Evangelist, Percona Inc. [email protected] / [email protected] http://www.bytebot.net/blog/ | @bytebot on Twitter OSDC.de, Berlin, Germany 12 June 2018
  • 2. whoami • Chief Evangelist, Percona Inc • Focusing on the MySQL ecosystem (MySQL, Percona Server, MariaDB Server), as well as the MongoDB ecosystem (Percona Server for MongoDB) + 100% open source tools from Percona like Percona Monitoring & Management, Percona xtrabackup, Percona Toolkit, etc. • Founding team of MariaDB Server (2009-2016), previously at Monty Program Ab, merged with SkySQL Ab, now MariaDB Corporation • Formerly MySQL AB (exit: Sun Microsystems) • Past lives include The Fedora Project (FESCO), OpenOffice.org • MySQL Community Contributor of the Year Award winner 2014
  • 3. License • Creative Commons BY-NC-SA 4.0 • https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 

  • 7. Uptime Percentile target Max downtime per year 90% 36 days 99% 3.65 days 99.5% 1.83 days 99.9% 8.76 hours 99.99% 52.56 minutes 99.999% 5.25 minutes 99.9999% 31.5 seconds
  • 8. Estimates of levels of availability Method Level of Availability Simple replication 98-99.9% Master-Master/MMM 99% SAN 99.5-99.9% DRBD, MHA 99.9% NDBCluster, Galera Cluster, InnoDB Cluster 99.999%
  • 9. HA is Redundancy • RAID: disk crashes? Another works • Clustering: server crashes? Another works • Power: fuse blows? Redundant power supplies • Network: Switch/NIC crashes? 2nd network route • Geographical: Datacenter offline/destroyed? Computation to another DC
  • 10. Durability • Data stored on disks • Is it really written to the disk? ▪ being durable means calling fsync() on each commit • Is it written in a transactional way to guarantee atomicity, crash safety, integrity?
  • 11. High Availability for databases • HA is harder for databases • Hardware resources and data need to be redundant • Remember, this isn’t just data - constantly changing data • HA means the operation can continue uninterrupted, not by restoring a new/backup server • uninterrupted: measured in percentiles
  • 12. Redundancy through client-side XA transactions • Client writes to 2 independent but identical databases • HA-JDBC (http://ha-jdbc.github.io/) • No replication anywhere
  • 13. InnoDB “recovery” time • innodb_log_file_size • larger = longer recovery times • In 2007, Wikipedia reported 40 minutes with 256MB logs; sometimes it takes 5-10 minutes, sometimes hours • Percona Server 5.5 (XtraDB) - innodb_recovery_stats • But today, there has been paradigm change…
  • 14. Redundancy through shared storage • Requires specialist hardware, like a SAN • Complex to operate • One set of data is your single point of failure • Cold standby • failover 1-30 minutes • this isn’t scale-out • Active/Active solutions: Oracle RAC
  • 15. Redundancy through disk replication • DRBD • Linux administration vs. DBA skills • Synchronous • Second set of data inaccessible for use • Passive server acting as hot standby • Failover: 1-30 minutes • Performance hit compared to single node performance, with higher average latencies
  • 16. Redundancy through MySQL replication • MySQL replication • Tungsten Replicator • Galera Cluster • MySQL InnoDB Cluster • MySQL Cluster (NDBCLUSTER) • Storage requirements are multiplied • Huge potential for scaling out
  • 17. MySQL Replication • Statement based generally • Row based became available in 5.1, and the default in 5.7 • mixed-mode, resulting in STATEMENT except if calling ▪ UUID function, UDF, CURRENT_USER/USER function, LOAD_FILE function ▪ 2 or more AUTO_INCREMENT columns updated with same statement ▪ server variable used in statement ▪ storage engine doesn’t allow statement based replication, like NDBCLUSTER ▪ default in MariaDB Server 10.2
  • 18. MySQL Replication II • Asynchronous by default • Semi-synchronous plugin in 5.5+ • However the holy grail of fully synchronous replication is not part of standard MySQL replication (yet?) • MariaDB Galera Cluster is built-in to MariaDB Server 10.1 • MySQL InnoDB Cluster is available in MySQL 8.0, combining the likes of group replication (the Galera Cluster equivalent), InnoDB, and mysqlsh for maintaining it; it also uses MySQL Router for load balancing purposes
  • 19. Semi-synchronous replication • semi-sync capable slave acknowledges transaction event only after written to relay log & flushed to disk • timeout occurs? master reverts to async replication; resumes when slaves catch up • at scale, Facebook runs semi-sync: http:// yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous- replication-at-facebook.html
  • 20. MySQL Replication in 5.6 • Global Transaction ID (GTID) • Server UUID • Ignore (master) server IDs (filtering) • Per-schema multi-threaded slave • Group commit in the binary log • Binary log (binlog) checksums • Crash safe binlog and relay logs • Time delayed replication • Parallel replication (per database)
  • 21. Replication: START TRANSACTION WITH CONSISTENT SNAPSHOT • Works with the binlog, possible to obtain the binlog position corresponding to a transactional snapshot of the database without blocking any other queries. • by-product of group commit in the binlog to view commit ordering • Used by the command mysqldump--single-transaction --master- data to do a fully non-blocking backup • Works consistently between transactions involving more than one storage engine • https://kb.askmonty.org/en/enhancements-for-start-transaction-with- consistent/ • Percona Server improved it, by session ID, and also introducing backup locks
  • 22. Multi-source replication • Multi-source replication - (real-time) analytics, shard provisioning, backups, etc. • @@default_master_connection contains current connection name (used if connection name is not given) • All master/slave commands take a connection name now (like CHANGE MASTER “connection_name”, SHOW SLAVE “connection_name” STATUS, etc.)
  • 23. Global Transaction ID (GTID) • Supports multi-source replication • GTID can be enabled or disabled independently and online for masters or slaves • Slaves using GTID do not have to have binary logging enabled. • (MariaDB) Supports multiple replication domains (independent binlog streams) • Queries in different domains can be run in parallel on the slave.
  • 24. Why MariaDB GTID is different compared to 5.6? • MySQL 5.6 GTID does not support multi-source replication (only 5.7 supports this) • Supports —log-slave-updates=0 for efficiency (like 5.7) • Enabled by default • Turn it on without having to restart the topology (just like 5.7)
  • 25. Parallel replication • Multi-source replication from different masters executed in parallel • Queries from different domains are executed in parallel • Queries that are run in parallel on the master are run in parallel on the slave (based on group commit). • Transactions modifying the same table can be updated in parallel on the slave! • Supports both statement based and row based replication.
  • 26. All in… sometimes it can get out of sync • Changed information on slave directly • Statement based replication • non-deterministic SQL (UPDATE/DELETE with LIMIT and without ORDER BY) • triggers & stored procedures • Master in MyISAM, slave in InnoDB (deadlocks) • --replicate-ignore-db with fully qualified queries • Binlog corruption on master • PURGE BINARY LOGS issued and not enough files to update slave • read_buffer_size larger than max_allowed_packet • Bugs?
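Why a DELETE with LIMIT but no ORDER BY is dangerous under statement based replication can be shown with a toy simulation. This is not how the server works internally; it only assumes that, without ORDER BY, each server removes the first matching rows in whatever order it happens to store them. All names here are hypothetical.

```python
def delete_with_limit(rows, predicate, limit):
    """Mimic DELETE ... WHERE ... LIMIT n without ORDER BY:
    remove the first `limit` matching rows in storage order."""
    removed = 0
    kept = []
    for row in rows:
        if removed < limit and predicate(row):
            removed += 1
        else:
            kept.append(row)
    return kept


master_rows = [1, 2, 3, 4]    # row ids in the master's storage order
replica_rows = [4, 3, 2, 1]   # same rows, different storage order on the slave

# The same statement is replayed on both servers.
master_after = delete_with_limit(master_rows, lambda row_id: True, 2)
replica_after = delete_with_limit(replica_rows, lambda row_id: True, 2)

print(sorted(master_after), sorted(replica_after))
print("diverged:", sorted(master_after) != sorted(replica_after))
```

Adding an ORDER BY on a unique key, or using row based replication, removes the dependence on storage order.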
  • 27. Replication Monitoring • Percona Toolkit is important • pt-slave-find: find slave information from master • pt-table-checksum: online replication consistency check • executes checksum queries on master • pt-table-sync: synchronise table data efficiently • changes data, so backups important
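The idea behind a consistency check can be sketched as chunked checksums compared between master and slave. This is a simplified illustration, not pt-table-checksum's implementation (the real tool runs checksum queries on the master, as the slide says); the function names, the CRC32 choice and the sample rows are assumptions.

```python
import zlib
from itertools import zip_longest


def chunk_checksums(rows, chunk_size):
    """Return one CRC32 per chunk of rows, with rows ordered by primary key."""
    ordered = sorted(rows)
    checksums = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        payload = "\n".join(repr(row) for row in chunk).encode("utf-8")
        checksums.append(zlib.crc32(payload))
    return checksums


def differing_chunks(master_rows, replica_rows, chunk_size=2):
    """Return the indexes of chunks whose checksums do not match."""
    pairs = zip_longest(
        chunk_checksums(master_rows, chunk_size),
        chunk_checksums(replica_rows, chunk_size),
    )
    return [index for index, (left, right) in enumerate(pairs) if left != right]


master_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
replica_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "X")]  # row 4 drifted
print(differing_chunks(master_rows, replica_rows))
```

Only the chunks that differ need to be inspected or resynchronised, which is what keeps this kind of check cheap enough to run online.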
  • 28. Replication Monitoring with PMM • http://pmmdemo.percona.com/
  • 29. mysqlbinlog versions • ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 56, event_type: 30 • 5.6 ships with a “streaming binlog backup server” - v.3.4; MariaDB 10 doesn’t - v.3.3 (fixed in 10.2 - MDEV-8713) • GTID variances! • Beware mysql-client from your Linux distribution
  • 30. Slave prefetching • Replication Booster • https://github.com/yoshinorim/replication-booster-for-mysql • Prefetch MySQL relay logs to make the SQL thread faster • Tungsten has slave prefetch • Percona Server till 5.6 + MariaDB till 10.1 have InnoDB fake changes
  • 31. Changing paradigm: What replaces slave prefetching? • In Percona Server 5.7, slave prefetching has been replaced by doing intra-schema parallel replication • Feature removed from XtraDB • MariaDB Server 10.2 also has this feature removed, as they switched to InnoDB!
  • 32. Galera Cluster • Inside MySQL, a replication plugin (wsrep) • Replaces MySQL replication (but can work alongside it too) • True multi-master, active-active solution • Virtually Synchronous • WAN performance: 100-300ms/commit, works in parallel • No slave lag or integrity issues • Automatic node provisioning
  • 34. Percona XtraDB Cluster 5.7 • Engineering within Percona • Load balancing with ProxySQL (bundled) • PMM integration • Benefits of all the MySQL 5.7 feature-set • ProxySQL admin tool • Safety features enabled (e.g. no accidentally using MyISAM, etc.)
  • 35. Group replication • Fully synchronous replication (update everywhere), self-healing, with elasticity, redundancy • Single primary mode supported • MySQL InnoDB Cluster - a combination of group replication, Router, to make magic! • Recent blogs: • https://www.percona.com/blog/2017/02/24/battle-for-synchronous-replication-in-mysql-galera-vs-group-replication/ • https://www.percona.com/blog/2017/02/15/group-replication-shipped-early/
  • 36. Summary of Replication Performance • SAN has "some" latency overhead compared to local disk. Can be great for throughput. • DRBD has a performance penalty • Replication, when implemented correctly, has no performance penalty • But MySQL replication with disk bound data set has single-threaded issues! • Semi-sync is poorer on WAN compared to async • Galera & InnoDB Cluster provide read/write scale-out, thus more performance
  • 37. Handling failure • How do we find out about failure? • Polling, monitoring, alerts... • Error returned to and handled in client side • What should we do about it? • Direct requests to the spare nodes (or DCs) • How to protect data integrity? • Master-slave is unidirectional: Must ensure there is only one master at all times. • DRBD and SAN have cold-standby: Must mount disks and start mysqld. • In all cases must ensure that 2 disconnected replicas cannot both commit independently. (split brain)
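One common way to guarantee that two disconnected partitions cannot both commit is a majority quorum rule. The slide does not prescribe a mechanism, so the sketch below is an assumption used for illustration, and `has_quorum` is a hypothetical helper.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True when this partition sees a strict majority of the cluster,
    counting the local node itself."""
    return reachable_nodes > cluster_size // 2


# A 3-node cluster splits 2 / 1: only the larger side may keep committing.
print(has_quorum(2, 3), has_quorum(1, 3))

# A 4-node cluster splits 2 / 2: neither side has a majority, so both stop.
print(has_quorum(2, 4), has_quorum(2, 4))
```

Because two disjoint partitions can never both hold a strict majority, at most one side accepts writes. An even split leaves nobody with quorum, which is a reason to prefer an odd number of nodes.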
  • 38. Frameworks to handle failure • MySQL-MMM • Severalnines ClusterControl • Orchestrator • MySQL MHA • Percona Replication Manager • Tungsten Replicator • 5.6: mysqlfailover, mysqlrpladmin • (MariaDB) Replication Manager
  • 39. Orchestrator • Reads replication topologies, keeps state, continuous polling • Modify your topology — move slaves around • Nice GUI, JSON API, CLI
  • 40. MySQL MHA • Like MMM, specialized solution for MySQL replication • Developed by Yoshinori Matsunobu at DeNA • Automated and manual failover options • Topology: 1 master, many slaves • Choose new master by comparing slave binlog positions • Can be used in conjunction with other solutions • http://code.google.com/p/mysql-master-ha/
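Choosing a new master by comparing slave binlog positions can be sketched as picking the slave with the highest master coordinates. This is an illustration of the comparison only, not MHA's code: the function name, the dictionary layout and the sample coordinates are hypothetical, and it assumes binlog file names end in a numeric sequence such as `mysql-bin.000012`.

```python
def most_advanced_replica(replicas):
    """Pick the slave that has progressed furthest through the master's binlog.

    `replicas` maps a slave name to (master_log_file, master_log_position).
    """
    def coordinates(item):
        _name, (log_file, position) = item
        # Compare the numeric file suffix first, then the offset inside the file.
        sequence = int(log_file.rsplit(".", 1)[1])
        return (sequence, position)

    return max(replicas.items(), key=coordinates)[0]


replicas = {
    "replica1": ("mysql-bin.000012", 4500),
    "replica2": ("mysql-bin.000013", 120),
    "replica3": ("mysql-bin.000012", 9000),
}
print(most_advanced_replica(replicas))
```

The file sequence has to be compared before the offset: `replica2` is ahead even though its position number is the smallest, because it is already reading a newer binlog file.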
  • 41. Pacemaker • Heartbeat, Corosync, Pacemaker • Resource Agents, Percona-PRM • Percona Replication Manager - cluster, geographical disaster recovery options • Pacemaker agent specialised on MySQL replication • https://github.com/percona/percona-pacemaker-agents/ • Pacemaker Resource Agents 3.9.3+ include Percona Replication Manager (PRM)
  • 42. Load Balancers for multi-master clusters • Synchronous multi-master clusters like Galera require load balancers • HAProxy • Galera Load Balancer (GLB) • MaxScale • ProxySQL
  • 43. MySQL Router • Routing between applications and any backend MySQL servers • Failover • Load Balancing • Pluggable architecture (connection routing)
  • 44. MariaDB MaxScale • “Pluggable router” that offers connection & statement based load balancing • Possibilities are endless - use it for logging, writing to other databases (besides MySQL), preventing SQL injections via regex filtering, route via hints, query rewriting, have a binlog relay, etc.
  • 45. ProxySQL • High Performance MySQL proxy with a GPL license • Performance is a priority - the numbers prove it • Can query rewrite • Sharding by host/schema or both, with rule engine + modification to SQL + application logic
  • 46. JDBC/PHP drivers • JDBC - multi-host failover feature (just specify master/slave hosts in the properties) • true for MariaDB Java Connector too • PHP handles this too - mysqlnd_ms • Can handle read-write splitting, round robin or random host selection, and more
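Read-write splitting with round robin host selection, as mentioned above, can be sketched in a few lines. This is a toy model rather than the behaviour of any particular driver: the class name, the host names and the "starts with SELECT or SHOW" rule are assumptions, and real drivers also have to consider transactions and statements such as SELECT ... FOR UPDATE.

```python
import itertools


class ReadWriteSplitter:
    """Send reads to slaves in round robin order and everything else to the master."""

    READ_KEYWORDS = ("SELECT", "SHOW")

    def __init__(self, master, replicas):
        self.master = master
        self._replica_cycle = itertools.cycle(replicas)

    def host_for(self, sql):
        words = sql.split()
        if words and words[0].upper() in self.READ_KEYWORDS:
            return next(self._replica_cycle)
        return self.master


# Hypothetical host names.
splitter = ReadWriteSplitter("master.db", ["replica1.db", "replica2.db"])
for query in ("SELECT 1", "UPDATE t SET a = 1", "select 2", "SELECT 3"):
    print(query, "->", splitter.host_for(query))
```

Random selection would simply replace the cycle with `random.choice` over the slave list.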
  • 47. Clustering: solution or part of problem? • "Causes of Downtime in Production MySQL Servers" whitepaper, Baron Schwartz VividCortex • Human error • SAN • Clustering framework + SAN = more problems • Galera/group replication is replication based, has no false positives as there’s no “failover” moment, you don’t need a clustering framework (JDBC or PHP can load balance), and is relatively elegant overall
  • 48. Replication type • Competence choices • Replication: MySQL DBA manages • DRBD: Linux admin manages • SAN: requires domain controller • Operations • DRBD (disk level) = cold standby = longer failover • Replication = hot standby = shorter failover • GTID helps tremendously • Performance • SAN has higher latency than local disk • DRBD has higher latency than local disk • Replication has little overhead • Redundancy • Shared disk = SPoF • Shared nothing = redundant
  • 49. SBR vs RBR? Async vs sync? • row based: deterministic • statement based: dangerous • GTID: easier setup & failover of complex topologies • async: data loss in failover • sync: best • multi-threaded slaves: scalability (hello 5.6+, Tungsten)
  • 50. What about the cloud? • Usually scalability & high availability is more or less “built-in” • e.g. RDS has multi-AZ (synchronous data replication), but doesn’t give you a read replica; Cloud SQL uses semi-sync replication • Watch out for the SLAs (and automatic upgrades) • Monitoring via PERFORMANCE_SCHEMA • “Bad” nodes do exist; do not assume node provisioning is quick
  • 51. Conclusion • MySQL replication is amazing if you know it (and monitor it) well enough • Large sites run just fine with semi-sync + tooling for automated failover • Galera Cluster/MySQL InnoDB Cluster is great for virtually fully synchronous replication • Don’t forget the need for a load balancer: ProxySQL is nifty • When thinking scaling, think scale out (it is more efficient, and fits modern cloud mantras too!)
  • 52. At Percona, we care about your High Availability • Percona XtraDB Cluster 5.7 with support for ProxySQL and Percona Monitoring & Management (PMM) • Percona Monitoring & Management (PMM) with Orchestrator • Percona Toolkit • Percona Server for MySQL 5.7 • Percona XtraBackup
  • 53. Thank you! Colin Charles colin.charles@percona.com / byte@bytebot.net http://bytebot.net/blog | @bytebot on twitter slides: slideshare.net/bytebot