Module 13 - Synchronous Replication of Volumes
• Identify and explain the differences between Asynchronous and Synchronous Replication
• Define the EqualLogic requirements and implementation of Synchronous Replication (SyncRep)
• Explain the difference between switching over and failing over a SyncRep volume
• Identify and describe the differences between implementing SyncRep with single-member pools and multi-member pools
PS Storage Replication Methods
• PS Series firmware provides the following methods for automatically replicating block volumes to
provide protection against accidental data loss:
• Traditional replication (referred to as Replication or Auto-Replication) is a point-in-time process that is conducted
between two groups, often in geographically diverse locations. Replication provides protection against a regional disaster
such as an earthquake or hurricane.
• Synchronous replication (also known as SyncRep) is a real-time process that simultaneously writes volume data across
two different pools within the same PS Series group. This method is useful for maintaining two copies of a volume’s data
in the same data center, or dispersed to two different facilities on the same campus or in the same metropolitan area.
• Note: You cannot enable synchronous replication on a volume for which traditional replication is
configured, and you cannot enable traditional replication on a volume for which synchronous replication
is configured.
• SyncRep is a mechanism that allows a volume to be synchronously replicated within a Group
• Distance limited
• Requires a Group with at least two pools (minimum of two arrays)
• Each pool contains a copy of the volume
• Every write to the volume is acknowledged by both pools before an acknowledgement is sent back to the initiator host (see the sketch after this list)
• SyncRep provides near-zero recovery time objectives (RTOs) and real-time, crash-consistent recovery point objectives (RPOs)
• Highly desirable for critical applications that require volumes with high availability
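The write path can be summarized with a minimal Python sketch (conceptual only; the pool names and write functions are illustrative stand-ins, not PS Series firmware interfaces): the host's write is dispatched to both pools, and the acknowledgement to the initiator is withheld until both copies have committed it.

```python
# Conceptual sketch only: the SyncRep write path, in which the initiator's
# write is acknowledged only after BOTH pools have committed it.
# Pool names and the write function are hypothetical, not firmware APIs.
from concurrent.futures import ThreadPoolExecutor


def write_to_pool(pool_name: str, block: bytes) -> bool:
    """Stand-in for committing a block to one pool's copy of the volume."""
    print(f"committed {len(block)} bytes to {pool_name}")
    return True


def syncrep_write(block: bytes) -> bool:
    """Return an ack to the host only when both pools have acknowledged."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [
            executor.submit(write_to_pool, "SyncActive", block),
            executor.submit(write_to_pool, "SyncAlternate", block),
        ]
        acks = [f.result() for f in futures]   # wait for both pools
    return all(acks)                           # ack the initiator only if both succeeded


if __name__ == "__main__":
    print("host ack:", syncrep_write(b"\x00" * 4096))
```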
Synchronous Replication (SyncRep)
• Pool Roles
• SyncActive Pool
• The SyncActive Pool is the pool where the active image resides.
• SyncAlternate Pool
• This pool is where the alternate image of the volume resides
• A pool can contain both Active and Alternate volume images; the SyncActive/SyncAlternate designation is relative to each individual volume's state
• Volume States
• In sync
• The volume images from each pool are identical
• Paused
• Only the SyncActive Pool will acknowledge writes. If any writes come in while paused, the changes to the volume are tracked until SyncRep is resumed
• Out-of-sync
• The SyncAlternate volume image does not contain the same data as the SyncActive image
• Can occur if replication is paused, if the pool (or members in the pool) becomes unavailable, if the pool runs out of free space, or if the volume's snapshot reserve is full (see the sketch below)
• Snapshots supported
• Snapshots only occur on SyncActive Pool
• Snapshots are not mirrored to SyncAlternate Pool
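The volume states above can be summarized in a small Python sketch (conceptual only; the status fields are illustrative and not the Group Manager data model):

```python
# Conceptual sketch only: mapping pool conditions to the volume states above.
# The status fields are illustrative, not the actual Group Manager data model.
from dataclasses import dataclass


@dataclass
class SyncRepStatus:
    paused: bool
    images_identical: bool  # becomes False if writes arrive while paused, a pool
                            # is unavailable/out of space, or snapshot reserve fills


def volume_state(status: SyncRepStatus) -> str:
    if status.paused and status.images_identical:
        return "paused"        # SyncActive acks writes; changes tracked until resume
    if status.images_identical:
        return "in sync"       # the images in both pools are identical
    return "out of sync"       # SyncAlternate no longer matches SyncActive


print(volume_state(SyncRepStatus(paused=False, images_identical=True)))   # in sync
print(volume_state(SyncRepStatus(paused=True,  images_identical=True)))   # paused
print(volume_state(SyncRepStatus(paused=False, images_identical=False)))  # out of sync
```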
• Normal operation
• With every SCSI WRITE request, the target sends back an acknowledgement
• SyncRep Configured Volume
• Appears as a single volume in Group Manager and to the host
• SyncRep manages the Active and Alternate page tables / volumes
• Both volumes have the same IQN
• A SyncRep-enabled volume will only send back an acknowledgement once the copies in both pools have acknowledged the write
• Either volume can be made the Active volume at any time while the volume is In Sync
How SyncRep Works – Paused
• Switchover
• Only available when volume images are In Sync
• Typically used for planned maintenance
• Can be performed while paused, as long as the images are still In Sync
• Failover
• Replaces Switchover as an option when the volume images are Out of Sync
• This is the choice when the volume is out of sync and a fault has already occurred
• Used when accepting some data loss is preferable to the data remaining unavailable
• Both Switchover and Failover force the initiator to reconnect to the volume through an iSCSI Async Logout, allowing the same IP and IQN string to be used to connect to the volume in the alternate pool (see the sketch below)
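As a rough illustration of the choice above, here is a short Python sketch (conceptual only; the function names are illustrative, not Group Manager operations): Switchover is offered while the images are In Sync, Failover replaces it when they are Out of Sync, and either path ends with an async logout so initiators reconnect to the same IQN.

```python
# Conceptual sketch only: which operation is offered for a SyncRep volume,
# per the rules above. Function names are illustrative, not product APIs.

def available_operation(images_in_sync: bool) -> str:
    """Switchover while In Sync; Failover replaces it when Out of Sync."""
    return "switchover" if images_in_sync else "failover"


def change_active_pool(images_in_sync: bool, accept_data_loss: bool = False) -> str:
    operation = available_operation(images_in_sync)
    if operation == "failover" and not accept_data_loss:
        raise RuntimeError(
            "Out of sync: unreplicated writes on the SyncActive image would be lost"
        )
    # Both paths end with an iSCSI async logout, so initiators reconnect to the
    # same IQN, now served from the other pool.
    return f"{operation} complete; initiators redirected via async logout"


print(change_active_pool(images_in_sync=True))
print(change_active_pool(images_in_sync=False, accept_data_loss=True))
```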
• Switching to SyncAlternate
• Only available when the volume images are In Sync
• Changes the roles of the pools
• Switching to SyncAlternate will cause iSCSI initiators to be redirected to the new SyncActive volume
• The switch should complete in a second (or less) on a low-latency network
• A system snapshot of the Active image is automatically taken when the SyncRep volumes go Out of Sync
• When a write comes into the active image, SyncRep becomes Out of Sync, and Switchover to SyncAlternate is no longer allowed
• Failover to SyncAlternate appears as an option when the relationship becomes Out of Sync
• You should not attempt to fail over to an Out of Sync, offline volume unless data loss is acceptable
• Creating a snapshot of a volume will create a snapshot in the SyncActive pool only
• As with snapshots of volumes for which synchronous replication is not enabled, you can access the data in the snapshot by setting the snapshot online or cloning it, regardless of which pool it is in.
• To restore a volume from a snapshot, the snapshot must be in the SyncActive pool
• Switchover / Failover moves where a schedule creates snapshots to the new SyncActive pool, but existing snapshots remain in the pool they were created in
• Temporary snapshots are created in both pools to ensure changes to the volume are tracked if the volume is out of sync
• Snapshots can exist and be accessible in either pool, but are only created in the pool that is the SyncActive pool at the time of creation
• Restoring a snapshot that resides in the SyncAlternate pool will prompt you to Switch to SyncAlternate before the restore occurs (see the sketch below)
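The snapshot placement and restore rules above can be expressed as two simple checks (Python, conceptual only; names are illustrative, not product APIs):

```python
# Conceptual sketch only: snapshot placement and restore rules for a SyncRep
# volume, as described above. Names are illustrative, not product APIs.

def snapshot_pool(current_syncactive_pool: str) -> str:
    """New snapshots are always created in whichever pool is SyncActive now."""
    return current_syncactive_pool


def can_restore_from(snapshot_home_pool: str, current_syncactive_pool: str) -> bool:
    """Restore requires the snapshot to live in the SyncActive pool; otherwise
    Group Manager prompts for a switch to SyncAlternate first."""
    return snapshot_home_pool == current_syncactive_pool


print(snapshot_pool("PoolA"))                                      # snapshot lands in PoolA
print(can_restore_from("PoolB", current_syncactive_pool="PoolA"))  # False -> switch pools first
```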
Troubleshooting SyncRep
• Failover situations
• If the SyncActive volume becomes unavailable while the volume is in sync, you can safely fail over to the SyncAlternate volume. Although host access to the volume is disrupted during the switch, no initiator changes are required
• If the SyncActive volume becomes unavailable while the volume is out of sync, any changes written to the SyncActive volume that have not yet been replicated to the SyncAlternate pool will be lost and cannot be recovered. Failovers should only be performed under extraordinary circumstances
• Changes cannot be tracked in a snapshot if there is not enough snapshot reserve space
• SyncRep must be disabled before a volume can be saved in the volume recovery bin
• Utilize the same switch technologies for the SyncActive and SyncAlternate pools (Force10 is preferred)
• When using multiple switches, ensure there is only a single hop between them
• Maintain a short distance between pools to keep latency at an acceptable level
• Latency is determined by two factors: the distance between the pools, and the interaction between TCP reliability and congestion-control mechanisms
• To minimize the impact on application performance, it is recommended to keep the source and destination PS pools within 1 km of each other (see the back-of-the-envelope calculation below)
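A back-of-the-envelope calculation (assuming a signal speed of roughly 200,000 km/s in fiber, about two-thirds the speed of light in vacuum) shows why the 1 km guideline keeps raw propagation delay negligible; at longer distances the interaction with TCP retransmission and congestion control dominates:

```python
# Back-of-the-envelope only: round-trip propagation delay over fiber for a
# SyncRep link, assuming ~200,000 km/s signal speed (about 2/3 of c).

FIBER_KM_PER_SECOND = 200_000


def round_trip_microseconds(distance_km: float) -> float:
    """Round-trip propagation delay in microseconds for a given distance."""
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1_000_000


for km in (0.1, 1, 10, 100):
    print(f"{km:>6} km -> {round_trip_microseconds(km):8.1f} us round trip")
# At 1 km the wire adds only ~10 us per round trip; beyond that, TCP loss
# recovery and congestion control become the dominant latency contributors.
```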
• Describe, compare and/or demonstrate the differences between running diagnostics from within the Group
Manager GUI and the CLI
• Identify the 15 different diagnostics sections that are collected when Diagnostics are run on an array
• Describe what is meant by ‘abbreviated’ diagnostics and when they are useful
• Identify the different diagnostic parameters and switches that you may need to employ when running diagnostics
• Identify the different methods to extract the diagnostic output from a member or group of members
Collecting PS Series Array Diagnostic Data
• SAN Assist is the preferred method for collecting Group/Array diagnostics for customers
• Running Dell EqualLogic Array Diagnostics
• GUI
• Easiest method for the customer to execute
• Can run across multiple members at the same time
• Cannot pass parameters to the ‘under-the-hood’ diagnostic script
• CLI
• Must connect directly to the member (not the Group IP)
• Must run manually on every member
• Can run any subset of the diagnostics collectors (sometimes required by customers)
• Many parameters available to CLI, including abbreviated output
• When diagnostics are run, multiple encrypted files (seg_#.dgo) are generated and stored on the array (in the grpadmin FTP directory)
• Full Diagnostics collects 15 groups (or sections) of information:
• Login directly to the member as grpadmin (or any Group Administrator account)
• Determine if any parameters need to be passed to script
• Connect to the management port if enabled; otherwise use any eth port, or the active CM's serial port
• ALWAYS run diagnostics on the Group Lead in addition to problem/suspect arrays
• If a small group (e.g. 4 arrays), run diags on all members in the group
• Execute the script with any necessary parameters (see the SSH sketch below)
• Enter diag -h for command syntax & help
• <CTRL>+C can break out of the script
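A sketch of driving this step remotely (Python with the third-party paramiko library): the member IP, credentials, and the assumption that the member CLI accepts SSH exec requests are all hypothetical here, and some firmware versions may require an interactive shell instead.

```python
# Sketch only, assuming the member's CLI accepts SSH exec requests on its
# management interface; the IP and credentials below are hypothetical.
import paramiko

MEMBER_IP = "192.0.2.10"   # connect to the member directly, not the Group IP

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(MEMBER_IP, username="grpadmin", password="changeme")

# "diag -h" prints command syntax and help, as noted above; a real collection
# run would invoke "diag" with whatever parameters support requested.
stdin, stdout, stderr = client.exec_command("diag -h")
print(stdout.read().decode())

client.close()
```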
• Requirements
• A host that can communicate with the array over the management network
• If a management port is not configured, connect over the iSCSI SAN (mixed traffic)
• FTP Client
• FileZilla, WinSCP (FTP protocol); a scripted alternative is sketched below
• Command prompt
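For the FTP step, a scripted alternative to FileZilla/WinSCP using Python's built-in ftplib (the member IP and credentials are assumptions; the seg_#.dgo file names follow the pattern mentioned earlier):

```python
# Sketch only: retrieving the generated seg_#.dgo diagnostic files from a
# member with Python's built-in FTP client. IP and credentials are assumptions.
from ftplib import FTP

MEMBER_IP = "192.0.2.10"   # hypothetical member management IP

with FTP(MEMBER_IP) as ftp:
    ftp.login(user="grpadmin", passwd="changeme")
    for name in ftp.nlst():                     # list files in the FTP directory
        if name.startswith("seg_") and name.endswith(".dgo"):
            with open(name, "wb") as local_file:
                ftp.retrbinary(f"RETR {name}", local_file.write)
            print(f"downloaded {name}")
```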