Synchronous Log Shipping Replication
PGCon 2008
Agenda
[Figure: active server and standby server, each with its WAL. Steps: 1. Commit, 2. Flush WAL to disk, 3. (Return).]
[Figure: log shipping with archive_command. WAL segments are sent after commits return; when the active server crashes, the standby redoes the WAL segments and fails over.]
[Figure: failover time. 20 sec to umount and remount shared disks; 60 ~ 180 sec (*) to recover from the last checkpoint. Label: Log-shipping system.]
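[Figure: architecture. Each node runs PostgreSQL with its DB, WAL buffers and WAL; WALSender on the active node sends WAL to WALReceiver on the standby node. Labels: Internet networks, Communicator.]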
In order to manage these resources, Heartbeat runs on both nodes.
[Figure: the architecture above, with the postgres and startup processes added.]
[Figure: commit sequence. Commit, flush WAL, request, read, send / recv, (return), (return).]
An update command triggers XLogInsert() and inserts WAL into walbuffers.
The commit command triggers XLogWrite() and flushes WAL to disk.
We changed XLogWrite() to request WALSender to transfer WAL.
WALSender reads WAL from walbuffers and transfers it.
After the transfer finishes, the commit command returns.
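The commit path described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the actual patch: the class names `ActiveServer` and `Standby`, the use of a list index as a stand-in for the LSN, and the in-process queue that replaces the network are all hypothetical.

```python
import queue
import threading


class Standby:
    """Hypothetical stand-in for the standby node: keeps what it receives."""

    def __init__(self):
        self.received = []

    def recv(self, records):
        self.received.extend(records)


class ActiveServer:
    """Toy model of the active node (names and structure are assumptions)."""

    def __init__(self, standby):
        self.wal_buffers = []          # WAL records; list index stands in for the LSN
        self.flushed_upto = 0          # records below this index count as flushed
        self.requests = queue.Queue()  # requests from xlog_write() to the sender
        self.standby = standby
        threading.Thread(target=self._wal_sender, daemon=True).start()

    def xlog_insert(self, record):
        # An update inserts a WAL record into the WAL buffers.
        self.wal_buffers.append(record)

    def xlog_write(self):
        # Flush locally, then request the sender to transfer the same range
        # and wait until the transfer has finished.
        start, end = self.flushed_upto, len(self.wal_buffers)
        self.flushed_upto = end
        done = threading.Event()
        self.requests.put((start, end, done))
        done.wait()

    def commit(self):
        # Commit returns only after the WAL has reached the standby.
        self.xlog_write()
        return "COMMIT"

    def _wal_sender(self):
        # The sender reads the requested range from the buffers and ships it.
        while True:
            start, end, done = self.requests.get()
            self.standby.recv(self.wal_buffers[start:end])
            done.set()
```

In this sketch, `commit()` blocks on `done.wait()`, which mirrors the point of the slides: the commit command does not return before the transfer finishes.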
[Figure: standby side. Steps: inform, read, replay.]
WALReceiver receives WAL from WALSender and flushes it to disk.
WALReceiver informs the startup process of the latest LSN.
The read step of the startup process is the part we changed.
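The standby-side steps (receive and flush, inform, read, replay) can be sketched as follows. This is a simplified model under assumptions: `StandbyNode` and its attribute names are hypothetical, the LSN is modeled as a record count, and "replay" just appends the record to a list.

```python
class StandbyNode:
    """Toy model of the standby node (all names are assumptions)."""

    def __init__(self):
        self.wal_on_disk = []   # received WAL records
        self.latest_lsn = 0     # latest LSN the startup process was informed of
        self.replayed_lsn = 0   # how far the startup process has replayed
        self.db = []            # stand-in for the database state

    def wal_receiver(self, records):
        # Receive WAL and "flush" it, then inform startup of the latest LSN.
        self.wal_on_disk.extend(records)
        self.latest_lsn = len(self.wal_on_disk)

    def startup_replay(self):
        # Read and replay record by record, but never past the informed LSN.
        replayed = 0
        while self.replayed_lsn < self.latest_lsn:
            record = self.wal_on_disk[self.replayed_lsn]  # read
            self.db.append(record)                        # replay
            self.replayed_lsn += 1
            replayed += 1
        return replayed
```

The design point illustrated here is that the startup process only reads up to the LSN it has been told about, so it never replays WAL that has not been flushed on the standby.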
[Figure: WAL blocks in segment1 and segment2, shown for our replicator and for warm-standby. Legend: WAL which can be replayed now; WAL needed to be replayed at failover.]
Therefore, we implemented replay by each WAL record.

                                     Our replicator     Warm-Standby
Replay by                            each WAL record    each WAL segment
Needed to be replayed at failover    a few records      the latest one segment
Delay in read-only queries           shorter            longer
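The difference in the comparison above can be illustrated with a little arithmetic. The sketch below is an idealized model, not a measurement: the function names and the `records_per_segment` parameter are hypothetical, and per-record replay is assumed to keep up completely, whereas the slides say a few records may still remain.

```python
def replayable_now(records_received, records_per_segment, per_record):
    """Number of received WAL records that can be replayed before failover."""
    if per_record:
        # Replay by each WAL record: every received record is replayable.
        return records_received
    # Replay by each WAL segment: only completed segments are replayable.
    return (records_received // records_per_segment) * records_per_segment


def pending_at_failover(records_received, records_per_segment, per_record):
    """Number of records that still have to be replayed at failover."""
    return records_received - replayable_now(
        records_received, records_per_segment, per_record
    )
```

For example, with 250 records received and a hypothetical 100 records per segment, segment-wise replay leaves the 50 records of the unfinished segment for failover time, while record-wise replay leaves none in this idealized model.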
[Figure: Heartbeat invokes the RA (start, stop, promote, demote, monitor), which controls the resources: PostgreSQL (WALSender) and WALReceiver.]
[Figure: failover. Steps: detect, request, act. ReadRecord() is changed.]
At failover, Heartbeat requests the startup process to finish WAL replay.
We changed ReadRecord() to deal with this request.
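One way to picture a read loop that honors such a finish request is sketched below. This is not the actual change to ReadRecord(); the class `StartupProcess`, the `request_finish()` method and the use of a condition variable are assumptions made for illustration.

```python
import threading


class StartupProcess:
    """Toy model of a replay loop that can be asked to finish (assumed names)."""

    def __init__(self):
        self.wal = []
        self.next_index = 0
        self.finish_requested = False
        self.cond = threading.Condition()

    def append_wal(self, record):
        # Called when new WAL has arrived on the standby.
        with self.cond:
            self.wal.append(record)
            self.cond.notify_all()

    def request_finish(self):
        # Called at failover: replay what is left, then stop waiting.
        with self.cond:
            self.finish_requested = True
            self.cond.notify_all()

    def read_record(self):
        # Return the next record; wait for more WAL unless finish was requested.
        with self.cond:
            while self.next_index >= len(self.wal):
                if self.finish_requested:
                    return None
                self.cond.wait()
            record = self.wal[self.next_index]
            self.next_index += 1
            return record

    def replay_until_finished(self):
        replayed = []
        while True:
            record = self.read_record()
            if record is None:
                return replayed
            replayed.append(record)
```

Without the finish flag, `read_record()` would wait forever for WAL from a failed active server; with it, the loop drains the remaining records and returns so the standby can be promoted.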
Detect
Act ReadRecord()
Request
Changed
• How to detect
– Timeout notification is needed to detect
– Keepalive, but it occasionally doesn't work on Linux (Linux bug!?)
– Original timeout
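The two detection mechanisms in the list above, TCP keepalive and an application-level timeout, can be enabled on a socket as sketched below. This only shows the general technique with Python's standard `socket` module; the helper name `make_monitored_socket` is hypothetical, and `settimeout()` is used here as a stand-in for the "original timeout", whose actual implementation is not described in the slides.

```python
import socket


def make_monitored_socket(timeout_seconds):
    """Create a TCP socket with keepalive and a blocking-call timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to probe an idle connection for a dead peer.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Independent of keepalive: blocking send/recv calls raise
    # socket.timeout after this many seconds without progress.
    sock.settimeout(timeout_seconds)
    return sock
```

Using both is a defensive choice: if keepalive probes do not fire as expected, the application-level timeout still bounds how long a send or receive can block.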
[Figure: on the active server, postgres commits and requests WALSender to send WAL. The standby is down, so WALSender waits and the commit is blocked.]
• How to detect
– Timeout notification is needed to detect
– Keepalive
• Our setKeepAlive patch was accepted in JDBC 8.4dev
– Socket timeout
– Query timeout
We want to implement these timeouts!!
[Figure: a client connected to the active server, which goes down.]
[Figure: STONITH. Turn off!! Labels: active, standby.]
• the other terminal is for operation
– Client
– Node0
– Node1
3. online backup
2 servers    2 * 50%    1 * 100%
3 servers    3 * 66%    2 * 100%
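One reading of these figures, offered here as an assumption, is that each of n servers may be loaded only up to (n - 1) / n so that the remaining n - 1 servers can carry the whole load at 100% after one failure. The function name below is hypothetical.

```python
def max_load_per_server(n_servers):
    """Per-server load fraction that n - 1 survivors can absorb at 100%."""
    if n_servers < 2:
        raise ValueError("at least two servers are needed to survive a failure")
    return (n_servers - 1) / n_servers
```

Under this reading, two servers give 0.5 each (2 * 50% equals 1 * 100%) and three servers give about 0.66 each (3 * 66% is roughly 2 * 100%).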
Contact
[email protected]
[email protected]