DR SRM Slides 2019 03 19

The document discusses building disaster recovery (DR) solutions using VMware Site Recovery Manager (SRM). It provides an overview of SRM, including its key functions and features. It also discusses common DR challenges, the objectives of a DR solution, and the components of an SRM solution, including storage-based replication, vSphere Replication, and recovery plan orchestration. The document uses examples to illustrate how to design an SRM solution that meets requirements for DR testing, non-disruptive testing, and recovery time/point objectives.


Building DR Solutions with

VMware Site Recovery Manager

March 2019
John A. Davis
Virtualization Architect, @johnnyadavis, vLoreBlog.com
Problems Addressed
Let’s focus on these issues today

Many organizations have components of a Disaster Recovery (DR) solution in place but do not necessarily
have confidence that they can successfully execute a failover in the event of an actual disaster.

• No DR plans or inadequate solution.


• DR testing is too painful
• DR Run books involve manual processes
• RPO and RTO are not met

Let’s look at building DR solutions based on VMware Site Recovery Manager

2
Overview
What are we covering today?

Agenda

• The need for DR and common DR challenges
• Solution overview
• Example Design:
‣ key requirements
‣ high level
‣ low level design
• Lessons Learned

Key Take-aways

• Tips on designing a solid DR solution based on Site Recovery Manager (SRM)
• Understanding of the solution components, including SRM, storage based replication and vSphere Replication
• Ideas for leveraging NSX to enable application functionality testing without disrupting production

3
Disaster Recovery
What is it? Why do we need it?

• Key part of business continuity


• Recovery from failure of
‣ Full data center
‣ Significant portion of a data center
‣ Key distributed application
‣ Access to a data center
• Root causes:
‣ natural disasters
‣ power / network outage
‣ cyber attacks / ransomware
‣ human error

National Archives and Records Administration: 93% of companies suffering significant data loss perish within 5 years

4
Disaster Recovery
What are the key challenges?

‣ Complex, sensitive applications


‣ RPO, RTO
‣ Production ready recovery site
‣ Disaster mitigation, DR testing, failback
‣ Expensive:
• Bandwidth between data centers
• Network and hardware infrastructure for a passive site
• Replication technologies
• Labor for DR planning and testing

5
DR Solution Objectives
What are the shortcomings of your current solution?

It is Inadequate

• SLAs (RPO and RTO) are not met
• Limited DR testing
• Recovery data center
‣ Not production ready
‣ Lacks backup, monitoring, management, etc.
‣ Susceptible to same disaster
• Not reliable
• Too expensive
• Does not cover some of my main risks

It Lacks

• Disaster mitigation
• Failback
• Non-disruptive, full application DR testing
• Auditing, reporting
• Proactive monitoring, alerting

6
VMware Site Recovery Manager (SRM)
Solution Overview

7
SRM Solution Overview
Why SRM?

Functions

• Planned migration
• Re-protect
• Test recovery
• Disaster recovery
• Failback (re-protect + planned migration)

Features and Benefits

• Application-agnostic
• Recovery plan orchestration
• Frequent, non-disruptive testing
• Centralized management
• Planned migration enables disaster avoidance
• Flexibility for data replication

8
SRM Use Cases
DR is just one use case, here are some others

Use Cases

• DR protection
• DR testing
• Disaster avoidance
• Failback
• Data center migrations
• Upgrade and Patch testing

More Detail

• SRM Data Sheet: https://bit.ly/2x8L1KE
• SRM 8.1 Technical Overview: https://bit.ly/2O8l7Op

9
What’s New in SRM 8.1?
https://blogs.vmware.com/virtualblocks/2018/04/17/srm-vr-81-whats-new/

• HTML 5 interface (Clarity UI)


• The VR workflow now allows you to add the VM to an existing or new (or no) recovery plan
• SRM 8.1 and VR 8.1 are decoupled from specific vCenter Server versions (compatible with 6.0 U3, 6.5, 6.5 U1, 6.7, etc.)
• SRM / VR 8.1 can be paired with SRM / VR 8.0
• Config maximums:
‣ 500 protection groups
‣ 5,000 VMs (500 VMs per protection group)
‣ 250 recovery plans (10 concurrently running recovery plans)
‣ 2,000 VMs per plan
‣ 2,000 VMs protected with VR
• Compatible with FT protected VMs (array based replication only, the SRM recovered VM is not FT protected)
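
The configuration maximums above can be turned into a quick sanity check on a planned design. The following is a minimal sketch only: the limits are copied from this slide, while the dictionary keys and the `check_design` helper are hypothetical names invented for illustration and are not part of any SRM API.

```python
# Limits copied from the "What's New in SRM 8.1?" slide.
# Key names and the helper below are hypothetical, for planning only.
SRM_81_MAXIMUMS = {
    "protection_groups": 500,
    "protected_vms": 5000,
    "vms_per_protection_group": 500,
    "recovery_plans": 250,
    "concurrent_recovery_plans": 10,
    "vms_per_recovery_plan": 2000,
    "vr_protected_vms": 2000,
}


def check_design(design):
    """Return a list of limit violations for a planned design (empty if none)."""
    violations = []
    for key, limit in SRM_81_MAXIMUMS.items():
        planned = design.get(key, 0)  # values not supplied are treated as 0
        if planned > limit:
            violations.append(f"{key}: planned {planned} exceeds maximum {limit}")
    return violations


# Example: a design that protects too many VMs with vSphere Replication.
print(check_design({"protection_groups": 40, "vr_protected_vms": 2500}))
```

Always confirm the limits against the current VMware documentation for the SRM version in use, since they change between releases.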

10
Terminology
Here is our vocabulary lesson for the day

• Recovery time objective (RTO): Targeted amount of time within which a business process should be restored after a
disaster or disruption in order to avoid unacceptable consequences associated with a break in business continuity.
• Recovery point objective (RPO): Maximum age of files recovered from backup storage for normal operations to
resume if a system goes offline as a result of a hardware, program, or communications failure.
• Consistency group: One or more LUNs or volumes that are replicated at the same time. When recovering items in a
consistency group, all items are restored to the same point in time.
• Datastore group: One or more datastores that are treated as a unit in Site Recovery Manager. A common example is a
consistency group in an array replication solution.
• Protected site: Site that contains protected virtual machines.
• Recovery site: Site where protected virtual machines are recovered in the event of a failover.

NOTE: It is possible for the same site to serve as a protected site and recovery site when replication is occurring in both
directions and Site Recovery Manager is protecting virtual machines at both sites.
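
To make the RPO and RTO definitions concrete, here is a small sketch that measures what a recovery actually achieved and compares it with the objectives. The function names and the timestamps are invented for illustration; in practice these times would come from replication status and from the recovery plan history.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_at, failure_at):
    """Data loss window: time between the last replicated point and the failure."""
    return failure_at - last_replicated_at


def achieved_rto(failure_at, restored_at):
    """Downtime: time between the failure and the service being restored."""
    return restored_at - failure_at


def meets_sla(last_replicated_at, failure_at, restored_at, rpo, rto):
    """True only when both the RPO and the RTO objective were met."""
    return (achieved_rpo(last_replicated_at, failure_at) <= rpo
            and achieved_rto(failure_at, restored_at) <= rto)


# Invented example timestamps.
failure = datetime(2019, 3, 19, 10, 0)
last_copy = datetime(2019, 3, 19, 9, 50)
restored = datetime(2019, 3, 19, 13, 30)

print(achieved_rpo(last_copy, failure))  # 0:10:00
print(achieved_rto(failure, restored))   # 3:30:00
print(meets_sla(last_copy, failure, restored,
                rpo=timedelta(minutes=15), rto=timedelta(hours=4)))  # True
```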

11
SRM Solution Components
Management, data movers, and orchestration

12
vSphere Replication vs Storage Replication
https://blogs.vmware.com/vsphere/2015/04/srm-abrvsvr.html
| Feature | Array-Based Replication | vSphere Replication |
|---|---|---|
| Minimum RPO | 0 mins (vendor dependent) | 15 mins (5 mins with vSAN) |
| Maximum Protected VMs | 5,000 VMs | 2,000 VMs |
| Vendor / Array / Storage types | FC, iSCSI or NFS | Supports any storage covered by the vSphere HCL |
| Cost / License | Replication and snapshot licensing is required | Included in vSphere Essentials Plus 5.1 and higher |
| Application consistency | Depends on vendor, may require guest based agents | Supports VSS & Linux file system application consistency |
| Powered off VMs, Templates, Linked clones, ISOs | Able to replicate | Can only replicate powered on VMs |
| RDM support | Physical and Virtual mode RDMs can be replicated | Only Virtual mode RDMs can be replicated |
| Multiple Points in Time (MPIT) | MPIT is supported by some storage vendors | Supports up to 24 recovery points |
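
One way to read the comparison is as a set of rules that push a VM toward array-based replication. The sketch below encodes three of the rows (minimum RPO, RDM support, powered off VMs) as a simple decision helper. The `VmRequirements` class and both functions are hypothetical names for illustration, and real designs weigh more factors such as cost and vendor support.

```python
from dataclasses import dataclass


@dataclass
class VmRequirements:
    """Hypothetical per-VM requirements gathered during design."""
    rpo_minutes: int
    physical_mode_rdm: bool = False
    replicate_while_powered_off: bool = False
    on_vsan: bool = False


def reasons_for_array_based(req):
    """List the rows of the comparison that rule out vSphere Replication."""
    reasons = []
    vr_minimum_rpo = 5 if req.on_vsan else 15  # minutes, from the Minimum RPO row
    if req.rpo_minutes < vr_minimum_rpo:
        reasons.append(
            f"RPO of {req.rpo_minutes} min is below the vSphere Replication "
            f"minimum of {vr_minimum_rpo} min"
        )
    if req.physical_mode_rdm:
        reasons.append("physical mode RDMs are not replicated by vSphere Replication")
    if req.replicate_while_powered_off:
        reasons.append("vSphere Replication only replicates powered on VMs")
    return reasons


def choose_replication(req):
    """Default to vSphere Replication unless a rule requires the array."""
    if reasons_for_array_based(req):
        return "array-based replication"
    return "vSphere Replication"


print(choose_replication(VmRequirements(rpo_minutes=240)))
print(choose_replication(VmRequirements(rpo_minutes=60, physical_mode_rdm=True)))
```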

13
SRM / Storage Compatibility
http://www.vmware.com/resources/compatibility/search.php?deviceCategory=sra

14
SRM with Storage-based Replication
SRM integrates with vendor specific SRA to manage replication

15
SRM with vSphere Replication
Software based virtual disk replication that integrates easily with SRM

16
vSphere Replication Data Flow
Hypervisor based replication

17
Network and Inventory Mapping
Map source networks, compute resources, VM folders between sites

18
Recovery Plan Orchestration
Predefine your recovery plans in SRM

19
SRM Licensing
Work with your VMware license provider to understand your unique options

• Licensed per VM in packs of 25 VMs.


‣ SRM Standard – up to 75 VMs per site (3 packs).
‣ SRM Enterprise – unlimited number of VMs (unlimited number of packs)

• SRM Enterprise exclusive features:


‣ VMware NSX integration
‣ Orchestrated cross-vCenter vMotion
‣ Stretched storage support
‣ Storage policy-based management

NOTE: some SRM bundling options may exist that allow per processor instead of per VM
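
The per-VM licensing arithmetic above is easy to sketch. The pack size, the Standard limit, and the Enterprise feature list come from this slide; the function names are hypothetical, and the result is only a starting point for a conversation with your VMware license provider, since bundles and per processor options are not modeled.

```python
import math

PACK_SIZE = 25                  # VMs per license pack, from the slide
STANDARD_MAX_VMS_PER_SITE = 75  # SRM Standard limit (3 packs), from the slide


def packs_needed(protected_vms):
    """Number of 25-VM packs required to cover the protected VMs."""
    return math.ceil(protected_vms / PACK_SIZE)


def suggested_edition(protected_vms_per_site, needs_enterprise_feature=False):
    """Pick Enterprise when the VM count or a required feature demands it.

    Enterprise-only features on the slide: NSX integration, orchestrated
    cross-vCenter vMotion, stretched storage, storage policy-based management.
    """
    if needs_enterprise_feature or protected_vms_per_site > STANDARD_MAX_VMS_PER_SITE:
        return "Enterprise"
    return "Standard"


print(packs_needed(60), suggested_edition(60))    # 3 Standard
print(packs_needed(110), suggested_edition(110))  # 5 Enterprise
```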

20
Multi vCenter Server Deployment
Multiple vCenter Server instances per site

21
Example: Key Requirements
DR Test Success Criteria

How do we verify that the DR Solution works well?


• VMs start successfully
• VMs have network connectivity
• Application functionality test

Disruptive vs Non-disruptive Testing


• Non-disruptive testing plus application functionality = complex DR Test Network
• For disruptive testing, will data changes be persisted or discarded?
• For non-disruptive tests, ensure replication still occurs and DR is still available.

Example: Requirements included Test Plan with application specific steps and expected results.
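
The three success criteria can be recorded per VM and rolled up into a single pass or fail result for the DR test. This is a minimal sketch of such a roll-up: the `VmTestResult` record and the helper functions are hypothetical, and the values would be filled in from the test plan's application specific steps.

```python
from dataclasses import dataclass


@dataclass
class VmTestResult:
    """Hypothetical record of one VM's outcome in a DR test."""
    name: str
    started: bool
    network_ok: bool
    app_test_ok: bool


def failed_criteria(results):
    """Map each failing VM to the success criteria it missed."""
    failures = {}
    for r in results:
        checks = (
            ("VM start", r.started),
            ("network connectivity", r.network_ok),
            ("application functionality", r.app_test_ok),
        )
        missed = [label for label, ok in checks if not ok]
        if missed:
            failures[r.name] = missed
    return failures


def dr_test_passed(results):
    """The test passes only if at least one VM was tested and none failed."""
    return bool(results) and not failed_criteria(results)


results = [
    VmTestResult("app01", True, True, True),
    VmTestResult("db01", True, False, False),
]
print(failed_criteria(results))
print(dr_test_passed(results))
```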

22
Example: High Level Design
Mapping your Unique Requirements to potential solution components
| Requirement | Solution Component |
|---|---|
| Ease of Management | Standard Replication: vSphere Replication |
| SLA Tiers: RPO < 15 minutes, RPO = 4 hours, RPO = 24 hours | Storage based replication, vSphere Replication RPO setting |
| Application Consistency | vSphere Replication VSS Quiescing Support, Storage based consistency groups |
| RDMs in Physical Compatibility Mode | Storage based replication |
| Recover from Virus / Hack Disaster | Multiple Point in Time Recovery |
| DR test plans with application functionality | NSX based networks, virtual desktops, required services (AD, DNS) |
| Proactive alerting based on RPO | vSphere Replication RPO violated alarms |
| Backup and recovery of the DR solution | Backup Exec – daily full and differential backups |

23
Example: High Level Design
High-level design: SRM with vSphere Replication, NFS, and block storage

24
Example: Application / VM Details
VM worksheet identifying application, priority, target IP, dependencies, etc.
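
A worksheet with priority and dependency columns can be checked for a workable startup order before it is entered into a recovery plan. The sketch below is an illustration under stated assumptions: the worksheet layout is invented, lower priority numbers start first, and dependencies are only honored between VMs in the same priority group. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def startup_order(worksheet):
    """Order VMs by priority, then by dependencies within each priority group.

    worksheet maps VM name -> {"priority": int, "depends_on": [VM names]}.
    Raises graphlib.CycleError if the dependencies in a group form a loop.
    """
    order = []
    priorities = sorted({row["priority"] for row in worksheet.values()})
    for priority in priorities:
        group = {name for name, row in worksheet.items()
                 if row["priority"] == priority}
        # Each VM maps to the VMs that must start before it (same group only).
        graph = {
            name: [dep for dep in worksheet[name].get("depends_on", [])
                   if dep in group]
            for name in sorted(group)
        }
        order.extend(TopologicalSorter(graph).static_order())
    return order


# Hypothetical worksheet rows.
worksheet = {
    "web01": {"priority": 2, "depends_on": ["app01"]},
    "app01": {"priority": 2, "depends_on": []},
    "dc01": {"priority": 1, "depends_on": []},
}
print(startup_order(worksheet))
```

Running the check on the worksheet first surfaces circular dependencies early, before they show up as a stalled recovery plan test.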

25
Example: Recovery Site Logical Design
Provide network infrastructure and services for non-disruptive DR testing

26
Example: Monitoring / Alerting
We configured email notifications on these specific vCenter Server alarms

27
Example: Multi-site Deployment
• Shared Recovery or Protected Site
• Site A to B to C

28
Lessons Learned
A few lessons I learned the hard way

• Follow the storage vendor documentation.


• Storage based replication requires
‣ VMs to be carefully grouped into LUNs / Consistency Groups
‣ All grouped VMs must be recovered and tested together
‣ Adding a VM to a consistency group may require SRM work
• Clearly identify the success criteria for DR testing
• Identify multi-site recovery scenarios and requirements
• Always run recovery plans in test mode prior to running in planned migration or actual recovery mode
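
The rule that all VMs in a consistency group must be recovered and tested together can be checked against a planning worksheet. The following is a minimal sketch with invented group and VM names: it flags any consistency group that a recovery plan covers only partially. It works on plain Python data and does not query SRM or the storage array.

```python
def partially_covered_groups(consistency_groups, plan_vms):
    """Return {group: [missing VMs]} for groups only partly in the plan.

    consistency_groups maps a group name to the VMs stored on its LUNs.
    plan_vms is the set of VMs the recovery plan is expected to recover.
    Groups that are fully included or fully excluded are not reported.
    """
    problems = {}
    for group, members in consistency_groups.items():
        members = set(members)
        included = members & set(plan_vms)
        if included and included != members:
            problems[group] = sorted(members - included)
    return problems


# Hypothetical groups and plan contents.
groups = {
    "cg-finance": ["fin-db", "fin-app"],
    "cg-web": ["web01", "web02"],
}
print(partially_covered_groups(groups, {"fin-db", "web01", "web02"}))
# {'cg-finance': ['fin-app']}
```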

29
Call to Action
Lots of ways to get started

• Learn more: HOL-1905-01-SDC: https://labs.hol.vmware.com
• Review Product Details: https://www.vmware.com/products/site-recovery-manager.html
• Proof of Concept Testing: https://storagehub.vmware.com/t/site-recovery-manager-3/srm-evaluation-guide/
• VMware Professional Services: https://www.vmware.com/professional-services.html
• VMware Education: SRM Fundamental Course:
https://mylearn.vmware.com/descriptions/EDU_DATASHEET_SRMICM_V6_1.pdf
• Reach out to me: @johnnyadavis

30
