Diagnosing Failures in Mysql Replication: Devananda Van Der Veen Percona Live 2012

This document summarizes diagnosing failures in MySQL replication. It begins with introducing common replication problems and reviewing how MySQL replication works. It then discusses two main times when replication can fail: when the MySQL daemon is okay or when it was interrupted. Finally, it covers some specific common situations like documented limitations, writes on the slave, and file corruption. The goal is to help understand why and where failures happen to determine the best solution.

Diagnosing Failures in MySQL Replication

Devananda van der Veen
Percona Live 2012

Introduction
About Me
  Building replication clusters since 2004
  DBA at Hydra Media (2005 - 2009)
  Consultant at Percona (2009 - 2012)
  Now hacking on OpenStack @ HP

Table of Contents

Introduce the Problems
Replication Architecture Review
Some ways it can break
How to diagnose the problems
Knowing when to skip, when/how to fix, and when to just rebuild
Questions (if we have time)

Replication is Ubiquitous
Who doesn't use MySQL Replication for...
  Load distribution
  Redundancy (local or geographic)
  Master-Master fail-over
  Taking backups from slave
  Archiving & Data Warehousing
  Many other things too

but not perfect ...

Google "mysql replication failure": 130,000 -> 650,000 results

and has some very old bugs.

MySQL lists 51 -> 46 Active bugs in Replication as of April 4, 2012
  26945  Explicit TEMP TABLE causes replication failure if mysqld restarted
  48062  Date arithmetic in Stored Procedure breaks replication
  43457  replicate-ignore-db behaves differently with different binlog formats
  58637  INSERT..ON DUP KEY UPDATE unsafe if more than one unique key
  62557  SHOW SLAVE STATUS gives wrong output with master-master and using SET uservars

http://bit.ly/MysqlRepBugs

Knowledge is Power
Here are some specific problems that I'll discuss (time permitting)
  Duplicate key errors
  Temp table does not exist after slave restart
  Binary and relay log corruption
  Slave reads from wrong position after crash
  And more...

But first...
Small review of how MySQL replication works
  Master & Slave threads
  Binary & Relay log files
  Status files (or status tables in 5.6)

For more, see Lars Thalmann's presentations
  bit.ly/LarsRepTalk
  bit.ly/LarsRepTalk1

Let's Review
Master Host
  Writes to:      master-bin.xxx    (binary file)
  Keeps list at:  master-bin.index  (plain-text file)

Slave Host (composed of two threads: IO + SQL)
  IO_Thread
    Status file:  master.info
    Reads from:   master-bin.xxx
    Writes to:    relay-bin.xxx
    Local list:   relay-bin.index
  SQL_Thread
    Status file:  relay-log.info
    Reads from:   relay-bin.xxx

bit.ly/5-5-slave-log-status

The Index Files

Master binary log index file
  # cat master-bin.index
  ./master-bin.000001
  ./master-bin.000002

Slave relay log index file
  # cat relay-bin.index
  ./relay-bin.000001
  ./relay-bin.000002

You sometimes need to edit these files manually (but only in dire situations)

The Info Files

Slave:$ cat master.info
  15                  # of lines in file
  master-bin.000001   Master_Log_File       \ current position of
  1818                Read_Master_Log_Pos   / io_thd on master
  127.0.0.1           Master_Host
  repl_user           Master_User
  repl_pass           Master_Password       (plain-text!!)
  19501               Master_Port
  60                  Connect_Retry
  ...                 ...
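
The layout above (a line count followed by one value per line) is easy to read programmatically. The following is a minimal Python sketch, assuming the field order is exactly the one shown on this slide; the function name `parse_master_info` and the sample text are illustrative, and the lines after Connect_Retry are deliberately ignored.

```python
# Field names in the order shown on the slide. The real file has further
# lines after Connect_Retry; this sketch ignores them.
MASTER_INFO_FIELDS = [
    "Master_Log_File",
    "Read_Master_Log_Pos",
    "Master_Host",
    "Master_User",
    "Master_Password",
    "Master_Port",
    "Connect_Retry",
]

NUMERIC_FIELDS = ("Read_Master_Log_Pos", "Master_Port", "Connect_Retry")


def parse_master_info(text):
    """Map the leading lines of a master.info file to named fields."""
    lines = text.splitlines()
    info = {"declared_line_count": int(lines[0])}
    values = lines[1:1 + len(MASTER_INFO_FIELDS)]
    info.update(zip(MASTER_INFO_FIELDS, values))
    for key in NUMERIC_FIELDS:
        info[key] = int(info[key])
    return info


# Sample content copied from the slide (hypothetical sandbox credentials).
SAMPLE = "\n".join([
    "15", "master-bin.000001", "1818", "127.0.0.1",
    "repl_user", "repl_pass", "19501", "60",
])

info = parse_master_info(SAMPLE)
print(info["Master_Log_File"], info["Read_Master_Log_Pos"])
```

Because the password is stored in plain text, a script like this should be treated with the same care as the file itself.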

The Info Files part 2

Slave:$ cat relay-log.info
  local read position of sql_thd:
    ./relay-bin.000002   Relay_Log_File
    1392                 Relay_Log_Pos
  remote read position of sql_thd:
    master-bin.000001    Relay_Master_Log_File
    1818                 Exec_Master_Log_Pos

These positions correspond to the end_log_pos of the last statement executed by sql_thread

mysql> SHOW SLAVE STATUS\G

Slave_IO_State: Waiting for master
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_Log_File: master-bin.000001         \ io_thread remote pos
Read_Master_Log_Pos: 1818                  /
Relay_Log_File: relay-bin.000001           \ sql_thread local pos
Relay_Log_Pos: 251                         /
Relay_Master_Log_File: master-bin.000001   \ sql_thread remote pos
Exec_Master_Log_Pos: 1818                  /

Fields re-ordered for display
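
To make the three positions concrete, here is a small Python sketch that parses the text of SHOW SLAVE STATUS and checks whether the SQL thread has executed everything the IO thread has fetched. The helper names are illustrative, and the sample is the output shown on this slide; how the text is obtained from the server is left out.

```python
def parse_slave_status(text):
    """Turn 'Key: value' lines of SHOW SLAVE STATUS output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip separator rows and blank lines
            status[key.strip()] = value.strip()
    return status


def sql_thread_caught_up(status):
    """True when the sql_thread remote position equals the io_thread remote position."""
    same_file = status["Master_Log_File"] == status["Relay_Master_Log_File"]
    same_pos = int(status["Read_Master_Log_Pos"]) == int(status["Exec_Master_Log_Pos"])
    return same_file and same_pos


SAMPLE = """
Slave_IO_State: Waiting for master
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_Log_File: master-bin.000001
Read_Master_Log_Pos: 1818
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 251
Relay_Master_Log_File: master-bin.000001
Exec_Master_Log_Pos: 1818
"""

status = parse_slave_status(SAMPLE)
print(sql_thread_caught_up(status))
```

Note that this only compares the slave against what it has already downloaded; it says nothing about events the master has written but the IO thread has not yet read.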

Running Example
~/sandboxes$ make_repl_sandbox --circular 2 5.1.55
Percona-Server-5.1.55-rel12.6-200-Linux-x86_64
Node1 = Active master | Node2 = Passive Master

node1 > create table foo (
          id int not null auto_increment primary key,
          v varchar(255) not null default '',
          unique (v)
        );
Query OK, 0 rows affected (0.01 sec)

node1 > insert into foo (v) values ('a'), ('b');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysqlbinlog on master-bin.xxxxxx
# at 1610
#110318 14:20:12 server id 101  end_log_pos 1683  Query  thread_id=8  exec_time=0  error_code=0
SET TIMESTAMP=1300483212/*!*/;
BEGIN/*!*/;
# at 1683
#110318 14:20:12 server id 101  end_log_pos 1791  Query  thread_id=8  exec_time=0  error_code=0
SET TIMESTAMP=1300483212/*!*/;
INSERT INTO `foo` VALUES (1,'a'),(2,'b')/*!*/;
# at 1791
#110318 14:20:12 server id 101  end_log_pos 1818  Xid = 32
COMMIT/*!*/;

mysqlbinlog on relay-bin.xxxxxx
# at 1184
#110318 14:20:12 server id 101  end_log_pos 1683  Query  thread_id=8  exec_time=0  error_code=0
SET TIMESTAMP=1300483212/*!*/;
BEGIN/*!*/;
# at 1257
#110318 14:20:12 server id 101  end_log_pos 1791  Query  thread_id=8  exec_time=0  error_code=0
SET TIMESTAMP=1300483212/*!*/;
INSERT INTO `foo` VALUES (1,'a'),(2,'b')/*!*/;
# at 1365
#110318 14:20:12 server id 101  end_log_pos 1818  Xid = 32
COMMIT/*!*/;

end_log_pos is copied from master!

mysqlbinlog on slave-bin.xxxxxx
# at 1610
#110318 14:20:12 server id 101  end_log_pos 1674  Query  thread_id=8  exec_time=15  error_code=0
SET TIMESTAMP=1300483212/*!*/;
BEGIN/*!*/;
# at 1674
#110318 14:20:12 server id 101  end_log_pos 1782  Query  thread_id=8  exec_time=15  error_code=0
SET TIMESTAMP=1300483212/*!*/;
INSERT INTO `foo` VALUES (1,'a'),(2,'b')/*!*/;
# at 1782
#110318 14:20:12 server id 101  end_log_pos 1809  Xid = 35
COMMIT/*!*/;

Here, exec_time means slave lag!
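
The difference between the three dumps can be checked mechanically. The Python sketch below pulls the "# at" offset and the end_log_pos of each event out of mysqlbinlog text output, then tests whether each end_log_pos matches the next event's offset. That holds for the master's binary log but not for the relay log, where end_log_pos is the master's coordinate. The function names are illustrative and the sample dumps are trimmed copies of the slides above.

```python
import re

AT_RE = re.compile(r"^# at (\d+)$")
END_RE = re.compile(r"end_log_pos (\d+)")


def parse_events(dump):
    """Return (start_offset, end_log_pos) pairs from mysqlbinlog text output."""
    events, start = [], None
    for line in dump.splitlines():
        line = line.strip()
        at = AT_RE.match(line)
        if at:
            start = int(at.group(1))
            continue
        end = END_RE.search(line)
        if end and start is not None:
            events.append((start, int(end.group(1))))
            start = None
    return events


def offsets_are_local(events):
    """True if every end_log_pos equals the start offset of the following event."""
    return all(end == nxt for (_, end), (nxt, _) in zip(events, events[1:]))


MASTER_DUMP = """
# at 1610
#110318 14:20:12 server id 101  end_log_pos 1683  Query  thread_id=8
# at 1683
#110318 14:20:12 server id 101  end_log_pos 1791  Query  thread_id=8
# at 1791
#110318 14:20:12 server id 101  end_log_pos 1818  Xid = 32
"""

RELAY_DUMP = """
# at 1184
#110318 14:20:12 server id 101  end_log_pos 1683  Query  thread_id=8
# at 1257
#110318 14:20:12 server id 101  end_log_pos 1791  Query  thread_id=8
# at 1365
#110318 14:20:12 server id 101  end_log_pos 1818  Xid = 32
"""

master_events = parse_events(MASTER_DUMP)
relay_events = parse_events(RELAY_DUMP)
print(offsets_are_local(master_events), offsets_are_local(relay_events))
```

This is why a relay log offset and an Exec_Master_Log_Pos can never be used interchangeably when repositioning a slave.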

End of the Review

Any Questions? (find me after the talk)

Finally, the meat of the talk!

Step One
When replication stops...
Why did it stop?
  This usually tells you how to fix it
  Generally does not take much time
Where did it break?
  Determine extent of corruption / data loss
  Were statements skipped or run twice?
What is the business impact?
  Don't over-analyze. Estimate is OK at this step.

Step Two
Now that you know why it stopped...
Pick your restore method
  Do you prioritize site availability or data integrity?
Validate consistency
  If possible, before allowing traffic
  If necessary, use tools to resync the slave
Have a plan B
  Sometimes you just need to rebuild

Understand What Failed

Where and why the failure happens matters a lot!
  Understand the basics
  Know your architecture
How you solve it is
  Learned from experience
  Based on common principles
  Similar even for different organizations

Two Times when Replication Fails

mysqld is OK
  Writes on the slave
  File corruption
  Documented limitation
  Un/known bugs?

mysqld was interrupted
  Hardware failure
  mysqld crashed
  Kernel panic / OOM killer
  kill -9 `pidof mysqld`
  Host or SAN restarted and rolled back status file

Two Times when Replication Fails

mysqld is OK
  Dedicated Monitoring!!
  Seconds_Behind_Master lies.
  Watch IO and SQL thread state and position.
  Use a heartbeat table.

mysqld was interrupted
  skip-slave-start
  innodb_overwrite_relay_log_info (only in Percona-Server)
  5.6-labs features
  bit.ly/crash-safe-replication
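
The heartbeat idea can be sketched as follows in Python. The assumptions are loud ones: something on the master (for example a tool such as pt-heartbeat) updates a timestamp row at a regular interval, that row replicates to the slave, and both hosts have synchronized clocks. The function names and the 30-second threshold are hypothetical.

```python
from datetime import datetime, timedelta


def heartbeat_lag(last_heartbeat, now):
    """Age of the newest heartbeat row that has replicated to the slave."""
    return max(now - last_heartbeat, timedelta(0))


def replication_healthy(status, lag, max_lag=timedelta(seconds=30)):
    """Both threads must be running AND the heartbeat must be recent."""
    threads_ok = (status.get("Slave_IO_Running") == "Yes"
                  and status.get("Slave_SQL_Running") == "Yes")
    return threads_ok and lag <= max_lag


# Hypothetical values: thread state from SHOW SLAVE STATUS, timestamp read
# from the slave's copy of the heartbeat table.
status = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes"}
last_heartbeat = datetime(2012, 4, 4, 12, 0, 0)
now = datetime(2012, 4, 4, 12, 1, 30)

lag = heartbeat_lag(last_heartbeat, now)
print(lag.total_seconds(), replication_healthy(status, lag))
```

Unlike Seconds_Behind_Master, this measure keeps growing when the IO thread is silently stalled, because no new heartbeat rows arrive.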

Common Situations
Documented limitations & bugs
Writes on the slave
File Corruption
Hardware Failures

Documented Limitations

Just a feature by another name ;)

Documented Limitations

Stmt-Based Replication (SBR) has many limitations
  Documentation: http://bit.ly/5-5-rep-limits
  33 subsections
  Still the default format, even in 5.5

Row-Based Replication (RBR) avoids most limitations
  But not all: http://bit.ly/5-5-rbr-limits
  All tables require a PRIMARY KEY
  Can be more difficult to do online alter table than SBR

Documented Limitations

Get to know the limitations & open bugs
  Some are documented as bug reports
Avoid them like the plague
Watch the patch notes like a hawk (even if you don't upgrade)
Change your application as necessary

Some limitations in SBR:

Statement may not be safe to log in statement format
  Non-Deterministic functions or statements
    UUID, NOW, CURRENT_USER, FOUND_ROWS
    UPDATE|DELETE with LIMIT but no ORDER BY
  TEMPORARY and MEMORY tables
  SET TX_ISOLATION = READ_COMMITTED
  Routines / Triggers with logical expressions
  Dynamic SQL
  And lots more ...
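
A couple of the patterns above can be spotted with a crude text check before a statement ever reaches the master. The Python sketch below is only a heuristic under stated assumptions: it covers a subset of the list (UUID, FOUND_ROWS, CURRENT_USER, and UPDATE/DELETE with LIMIT but no ORDER BY), it does not parse SQL, and it will misfire on keywords that appear inside string literals. The function name `sbr_warnings` is illustrative.

```python
import re

UNSAFE_FUNCTIONS = re.compile(
    r"\b(UUID\s*\(|FOUND_ROWS\s*\(|CURRENT_USER\b)", re.IGNORECASE)
WRITE_RE = re.compile(r"^\s*(UPDATE|DELETE)\b", re.IGNORECASE)
LIMIT_RE = re.compile(r"\bLIMIT\b", re.IGNORECASE)
ORDER_BY_RE = re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)


def sbr_warnings(statement):
    """Return a list of reasons the statement may be unsafe under SBR."""
    warnings = []
    if UNSAFE_FUNCTIONS.search(statement):
        warnings.append("non-deterministic function")
    if (WRITE_RE.match(statement)
            and LIMIT_RE.search(statement)
            and not ORDER_BY_RE.search(statement)):
        warnings.append("UPDATE/DELETE with LIMIT but no ORDER BY")
    return warnings


for sql in (
    "DELETE FROM foo LIMIT 10",
    "DELETE FROM foo ORDER BY id LIMIT 10",
    "INSERT INTO foo (v) VALUES (UUID())",
):
    print(sql, "->", sbr_warnings(sql))
```

A check like this is no substitute for reading the documented limitations; it only catches the most obvious cases in application code or a query log.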

Some limitations in RBR, too ...

Can't read SQL in binary log
Different DDL on master / slave breaks RBR (*)
Slave calls fsync() for every row, not every transaction
  (this is really a performance issue, not a limitation)
Mixing InnoDB and MyISAM in one transaction ...
  It's just not a good idea!
  Order of binlog changed multiple times in 5.1

Writes in the Wrong Place

or Why ring replication is A Bad Idea.

Write on Slave (1)

(when a slave is just a slave)

node2 > insert into foo (v) values ('bad insert');
node2 > select * from foo;
| 1 | a          |
| 2 | b          |
| 3 | bad insert |

node1 > insert into foo (v) values ('c');

node2 > show slave status\G
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '3' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into foo (v) values ('c')'

Write on Slave (1)

(when a slave is just a slave)

Sample binlog from slave (node2)

#110328 13:55:52 server id 101  end_log_pos 640  Query  thread_id=7  exec_time=0  error_code=0
INSERT INTO `foo` VALUES (1,'a'),(2,'b')

#110328 13:56:34 server id 102  end_log_pos 877  Query  thread_id=7  exec_time=0  error_code=0
SET TIMESTAMP=1301345794/*!*/;
insert into foo (v) values ('bad insert')

Write on Slave (1)

(when a slave is just a slave)

DELETE record from slave; optionally, insert to master
  may preserve data integrity
  insert to master allows you to keep the record
  more time-consuming than skipping
  foreign keys and triggers make this very complicated!

Write on Slave (1)

(when a slave is just a slave)

SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
  easiest method
  data inconsistency may propagate to other tables, e.g. if you use INSERT..SELECT or triggers
  must sync later (e.g. with pt-table-sync)

Write on Slave (1)

(when a slave is just a slave)

auto_increment_increment & _offset
  would have prevented failure in this example
  safe to do even if you never write to slave
  master = even, slave = odd makes it trivial to identify bad writes - `id` will be odd
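
To see why interleaved auto-increment values avoid the collision, the sketch below models the rule MySQL documents for these settings: generated values take the form offset + N * increment, and the next value is the smallest such number above the current maximum. This is a Python simulation, not server code; `next_auto_inc` is an illustrative name, and the settings (increment 2 on both hosts, offset 2 on the master, offset 1 on the slave) are the even/odd split suggested on this slide.

```python
def next_auto_inc(current_max, increment, offset):
    """Smallest value of the form offset + N*increment greater than current_max."""
    if current_max < offset:
        return offset
    steps = (current_max - offset) // increment + 1
    return offset + steps * increment


# Both hosts start with rows id=1 and id=2, as in the running example.
current_max = 2

# Accidental write on the slave (auto_increment_offset = 1 -> odd ids).
slave_id = next_auto_inc(current_max, increment=2, offset=1)

# Legitimate write on the master (auto_increment_offset = 2 -> even ids).
master_id = next_auto_inc(current_max, increment=2, offset=2)

print("slave 'bad insert' id:", slave_id)
print("master 'c' id:", master_id)
```

With these settings the two writes land on different primary keys, so the master's insert replicates without a duplicate key error and the stray row stands out by its odd `id`. The tables still differ, so the bad write has to be cleaned up afterwards.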

Write on Slave (2)

(when a slave is also a master)

node1 > insert into foo (v) values ('c');
node1 > SHOW SLAVE STATUS\G
Error 'Duplicate entry '3' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into foo (v) values ('bad insert')'

node2 > insert into foo (v) values ('bad insert');
node2 > SHOW SLAVE STATUS\G
Error 'Duplicate entry '3' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into foo (v) values ('c')'

Note: I used SLAVE STOP; on both hosts to simulate simultaneous writes

Write on Slave (2)

(when a slave is also a master)

Fixing master-master is same principle ...
  But you might accidentally corrupt your primary!
  So be extra careful...

For example:
  DELETE record from secondary & insert to primary
  Use SET SESSION SQL_LOG_BIN = 0 for DELETE
  But allow INSERT to replicate normally

Write on Slave (2)

(when a slave is also a master)

SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1
  On both masters
  Then sync with pt-table-sync

auto_increment_increment & _offset
  A must have if you use auto_inc + master-master replication + fail-over

Common mistake:
There is a real open source project that

  "[W]ill not only check on the health of your MySQL Replication setup, but it can also automatically heal and restart the MySQL Replication. Self-Healing MySQL Replication is not a new idea."
  bit.ly/replication-suicide

but it's a really bad idea!!

File corruption

It's not always the file's fault!

File corruption
Slave may stop if any of these files become corrupt
  Binary log | Relay log | Table data | Index data

Examples of SHOW SLAVE STATUS
  Last_IO_Errno: 1236
  Last_IO_Error: Got fatal error 1236 from master when reading data from binary log

  Last_SQL_Errno: 1594
  Last_SQL_Error: Relay log read failure: Could not parse relay log event entry.

If (master's) binary log is corrupt:

Verify it with mysqlbinlog [--verbose --base64]
Find extent of corruption
CHANGE MASTER TO Master_Log_Pos = <good_pos>
Analyze bad section of log. Identify affected tables.
Use pt-table-sync when replication catches up.
Resolve underlying (hardware) cause of the corruption!

If (slave's) relay-log is corrupt:

Verify it with mysqlbinlog <relay-log-file>
Check master binlog for corruption. If found, see previous slide.
Re-fetch the corrupted relay log:
  CHANGE MASTER TO Master_Log_Pos = <Exec_master_log_pos>;
Very rare (in my experience)
Often due to network congestion
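
The re-fetch step depends on picking the right pair of coordinates: the SQL thread's remote position (Relay_Master_Log_File and Exec_Master_Log_Pos), not the IO thread's. The Python sketch below builds the statements from a parsed SHOW SLAVE STATUS dict; the function name is illustrative, the sample values come from the review slides, and the surrounding STOP SLAVE / START SLAVE are an assumption about how the change would be applied. Issuing CHANGE MASTER TO with a new log file and position makes the slave discard its existing relay logs and download the events again.

```python
def refetch_statements(status):
    """Statements that restart fetching from the last executed master position."""
    log_file = status["Relay_Master_Log_File"]   # sql_thread remote file
    log_pos = int(status["Exec_Master_Log_Pos"])  # sql_thread remote position
    change = ("CHANGE MASTER TO MASTER_LOG_FILE = '%s', MASTER_LOG_POS = %d;"
              % (log_file, log_pos))
    return ["STOP SLAVE;", change, "START SLAVE;"]


# Sample values from the review slides; the IO thread is assumed to be ahead.
status = {
    "Master_Log_File": "master-bin.000002",
    "Read_Master_Log_Pos": "4096",
    "Relay_Master_Log_File": "master-bin.000001",
    "Exec_Master_Log_Pos": "1818",
}

statements = refetch_statements(status)
for statement in statements:
    print(statement)
```

Using Master_Log_File and Read_Master_Log_Pos here instead would skip every event that was fetched but never executed.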

If a table is corrupt...

Stop replication and stop all traffic to that server
Mount the file system read-only
Check mysql error log - it may contain more info
Even after fixing the corruption, run pt-table-sync
Sometimes, only option is restore from a backup

Hardware Failures

A SAN is just a bigger Single Point of Failure. ~unknown

HW Failure #1
Unplanned master restart
  all slaves stop with error: 'Client requested master to start replication from impossible position'
Process:
  Compare each SLAVE STATUS to master binlog
  Realize that Exec_Master_Log_Pos different on each slave, and greater than master's log file size!
  Panic for a minute...

HW Failure #1
Possible solutions:
  Promote slave that read the most
  Isolate extra events from slave binlog, replay on master
  Force slaves to resume from next binlog, then pt-table-sync to remove extra events
Prevent with:
  RAID BBU + HDD cache disabled
  innodb_flush_log_at_trx_commit = 1
  sync_binlog = 1
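
The comparison step in this scenario can be sketched in Python. The assumptions: the master's binlog file sizes and each slave's (Relay_Master_Log_File, Exec_Master_Log_Pos) pair have already been collected by hand, all binlog names share one base name with a zero-padded suffix (so they sort as strings), and the host names and numbers below are made up for illustration.

```python
def impossible_slaves(master_log_sizes, slave_positions):
    """Slaves whose executed position lies beyond the end of the master's binlog."""
    bad = {}
    for name, (log_file, pos) in slave_positions.items():
        size = master_log_sizes.get(log_file)
        if size is None or pos > size:
            bad[name] = (log_file, pos)
    return bad


def most_advanced(slave_positions):
    """Slave with the highest (file, position) pair: the promotion candidate."""
    return max(slave_positions, key=lambda name: slave_positions[name])


# Hypothetical data: after the crash the master's last binlog is shorter
# than what two of the slaves had already received.
master_log_sizes = {"master-bin.000007": 4096}
slave_positions = {
    "slave1": ("master-bin.000007", 5120),
    "slave2": ("master-bin.000007", 6200),
    "slave3": ("master-bin.000007", 4000),
}

print(impossible_slaves(master_log_sizes, slave_positions))
print(most_advanced(slave_positions))
```

A slave flagged here holds events the master lost on restart, which is exactly why the slave that read the most is the first candidate for promotion.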

HW Failure #2
Replaced a failed disk in a replica. After restart, replication fails with duplicate key error.
Process:
  Check error log... there's no slave stop coordinates!
  Realize file system in read-only mode before shut down
  Was master.info rolled back by file system journal?

HW Failure #2
Possible solutions:
  Guess where slave stopped, then CHANGE MASTER
    But what if data files also rolled back?
  SQL_SLAVE_SKIP_COUNTER + prayers + pt-table-sync
  Rebuild the slave
Prevent with:
  Percona Server + innodb_overwrite_relay_log_info

War Stories

Slave generates different auto_inc values for INSERT inside an ON INSERT trigger. This goes unnoticed for months, then starts causing problems.

During MMM migration, some uncommitted transactions left behind duplicate key errors on both nodes.

Major hardware failure in a geo-distributed three-node replication ring. Each host is written to by local processes. Some tables only written by single host; some shared tables. How do you determine which of remaining two masters is good?

devananda.vdv@gmail
