Disaster Recovery Course
Backup/Restore, Pilot Light, Warm Standby, Multi-region active/active
Common Events
• Easy to recover
• Availability
Scenario 1
• ATM usable only 8 out of 10 times
• Availability = 8/10 or 80%
Scenario 2
• No one is using the machine
• Availability = 0%
Scenario
• ATM broke down twice in the past 100 hours
• Average down time: 5 hours
• Availability = 90/100 or 90%
Request-based or time-based
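The two ways of measuring availability in the scenarios above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and returning 0.0 when there are no requests simply follows Scenario 2 on the slide (strictly, the ratio is undefined in that case).

```python
def request_based_availability(successful_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded."""
    if total_requests == 0:
        # Strictly, the ratio is undefined with no requests. Scenario 2 on the
        # slide reports it as 0%, so that convention is followed here.
        return 0.0
    return successful_requests / total_requests


def time_based_availability(total_hours: float, failures: int,
                            avg_downtime_hours: float) -> float:
    """Fraction of the observation window during which the system was up."""
    downtime = failures * avg_downtime_hours
    return (total_hours - downtime) / total_hours


print(request_based_availability(8, 10))    # Scenario 1: 0.8
print(request_based_availability(0, 0))     # Scenario 2: 0.0
print(time_based_availability(100, 2, 5))   # ATM down twice, 5 hours each: 0.9
```

The time-based measure still gives a meaningful number when no one is using the machine, which is why the choice between the two matters.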
Automatic recovery
Zero downtime
Copyright © ChandraMohan Lingam. All Rights Reserved.
S3 – Server and Storage Redundancy
S3
• SNS
• SQS
• DynamoDB
• And more
[Figure: Primary server in AZ 1, Standby server in AZ 2]
Multiple Web Servers
AZ 1 AZ 2 AZ 3
Auto Scaling
Primary Standby
Natural disasters
Earthquakes, floods, hurricanes, snowstorms
Technical failures
Power failures, Network Outage
Human actions
Misconfiguration, unauthorized access or modification
Reference:
https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/what-is-a-disaster.html
Availability Zones (AZ 1, AZ 2, AZ 3)
• Redundant power, networking, connectivity
• Located within 60 miles (100 km) of each other
• Interconnected via redundant, ultra-low-latency network
[Figure: Primary, Standby, and Backup within Region A]
https://www.datacenterdynamics.com/en/news/aws-us-east-1-region-suffers-errors-and-outages-impacting-its-status-page/
Disaster Events
https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/introduction.html
[Figure: Traffic, Primary Site, DR Site]
Backup
• Enable Continuous Backup (Point In Time Recovery)
• Periodic Full Backup of Your Data (Snapshot)
• Maintain Copy in a Second Region
[Figure: Application Data in Region A is backed up in Region A, with a cross-region backup copy in Region B]
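To see why periodic snapshots and continuous backup lead to different amounts of data loss, the restore point can be worked out directly. The sketch below is illustrative only: the function names, the snapshot schedule, and the 5-minute `pitr_lag` are assumptions, not values from the course.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def latest_snapshot_before(snapshots: List[datetime],
                           disaster: datetime) -> Optional[datetime]:
    """Return the most recent snapshot taken at or before the disaster."""
    usable = [s for s in snapshots if s <= disaster]
    return max(usable) if usable else None


def data_loss_window(snapshots: List[datetime], disaster: datetime,
                     pitr_enabled: bool = False,
                     pitr_lag: timedelta = timedelta(minutes=5)) -> Optional[timedelta]:
    """Time span of changes lost when restoring after the disaster."""
    if pitr_enabled:
        # Assumption: continuous backup can restore to within pitr_lag
        # of the disaster time.
        return pitr_lag
    snapshot = latest_snapshot_before(snapshots, disaster)
    return None if snapshot is None else disaster - snapshot


# Hypothetical schedule: a full snapshot every 6 hours.
snapshots = [datetime(2024, 1, 1, h) for h in (0, 6, 12)]
disaster = datetime(2024, 1, 1, 14, 30)

print(data_loss_window(snapshots, disaster))                     # 2:30:00
print(data_loss_window(snapshots, disaster, pitr_enabled=True))  # 0:05:00
```

With snapshots alone, everything written after the last snapshot is lost; with point-in-time recovery the loss shrinks to the assumed lag.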
Backup and Restore
After disaster
• Restore data
• Deploy servers and other resources
Low-cost
RTO/RPO in hours
Backup and Point-in-time Recovery
EFS backup, and many more
[Figure: Region A and Region B each have Web Resources, App Resources, and Application Data]
• Web and app resources in Region B are pre-configured and turned OFF
• Application Data is continuously replicated from Region A to Region B
• Resources to support continuous replication are always ON
After disaster
• Quickly start your web and app servers and scale them to handle traffic
More expensive
Continuous Replication (RDS)
[Figure: Primary in Region A, continuous replication to a Read Replica in Region B]
• RDS read replica
• Aurora Global Database
• Both services maintain one or more read replicas in another region
• After a disaster event, promote one of the read replicas to be the new primary so that it accepts read-write traffic
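For a plain RDS read replica, the promotion step can be scripted. The sketch below assumes a boto3 RDS client, whose `promote_read_replica` call takes a `DBInstanceIdentifier`; the replica name and the stub client are hypothetical and exist only so the example runs without AWS credentials. Aurora Global Database uses a different failover procedure that is not shown here.

```python
def promote_replica(rds_client, replica_id: str) -> str:
    """Promote a read replica to a standalone, read-write DB instance.

    rds_client is expected to behave like boto3.client("rds") created
    in the DR region.
    """
    response = rds_client.promote_read_replica(DBInstanceIdentifier=replica_id)
    return response["DBInstance"]["DBInstanceStatus"]


class StubRdsClient:
    """Hypothetical stand-in for a boto3 RDS client, for a local dry run."""

    def __init__(self):
        self.promoted = []

    def promote_read_replica(self, DBInstanceIdentifier):
        self.promoted.append(DBInstanceIdentifier)
        # The status value here is an assumed example of a response.
        return {"DBInstance": {"DBInstanceIdentifier": DBInstanceIdentifier,
                               "DBInstanceStatus": "modifying"}}


stub = StubRdsClient()
print(promote_replica(stub, "demo-replica"))  # "demo-replica" is a made-up name
```

After promotion, the application's database endpoint must also be pointed at the new primary; that step depends on how the application is configured.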
DynamoDB Global Table
• Automatic replication across specified regions
• All copies are read-write
• Changes are automatically propagated to other regions
Image:
https://aws.amazon.com/dynamodb/global-tables/
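The behavior described above (every copy accepts writes, and changes reach the other regions) can be pictured with a toy in-memory model. This is not the DynamoDB API: the class names are made up, propagation is shown as immediate although the real service replicates asynchronously, and concurrent writes are resolved with a last-writer-wins rule based on a caller-supplied timestamp.

```python
class ReplicaTable:
    """One regional copy of the table (toy model)."""

    def __init__(self, region):
        self.region = region
        self.items = {}  # key -> (timestamp, value)

    def apply(self, key, value, timestamp):
        current = self.items.get(key)
        # Last writer wins: keep the write with the newest timestamp.
        if current is None or timestamp >= current[0]:
            self.items[key] = (timestamp, value)


class GlobalTable:
    """Set of read-write replicas that propagate changes to each other."""

    def __init__(self, regions):
        self.replicas = {r: ReplicaTable(r) for r in regions}

    def write(self, region, key, value, timestamp):
        # Accept the write locally, then propagate it to every other replica.
        self.replicas[region].apply(key, value, timestamp)
        for name, replica in self.replicas.items():
            if name != region:
                replica.apply(key, value, timestamp)

    def read(self, region, key):
        entry = self.replicas[region].items.get(key)
        return entry[1] if entry else None


table = GlobalTable(["us-west-2", "us-east-1"])
table.write("us-west-2", "order-1", "created", timestamp=1)
table.write("us-east-1", "order-1", "shipped", timestamp=2)
print(table.read("us-west-2", "order-1"))  # shipped
```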
[Figure: continuous replication from Source Bucket in Region A to Destination Bucket in Region B]
[Figure: block-level continuous replication of servers (on-premises, EC2) into AWS, Region A to Region B]
• Route 53
• Global Accelerator
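Both services can steer users away from an unhealthy site. The decision they make can be sketched as a simple health-based choice between two endpoints. The sketch below is a conceptual model only, not the Route 53 or Global Accelerator API; the endpoint names and the choice of which region is primary are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    region: str
    healthy: bool


def route(primary: Endpoint, secondary: Endpoint) -> Endpoint:
    """Send traffic to the primary while it is healthy, else fail over."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    raise RuntimeError("no healthy endpoint available")


# Hypothetical endpoints; which region acts as primary is an assumption.
primary = Endpoint("primary-site", "us-west-2", healthy=True)
dr_site = Endpoint("dr-site", "us-east-1", healthy=True)

print(route(primary, dr_site).region)  # us-west-2
primary.healthy = False                # simulate a regional outage
print(route(primary, dr_site).region)  # us-east-1
```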
After disaster
• Quickly start your web and app servers and
scale them to handle traffic
[Figure: Region A and Region B each have Web Resources, App Resources, and Application Data]
• Web and app resources in Region B are fully functional but scaled down
• Application Data is continuously replicated from Region A to Region B
• Resources to support continuous replication are always ON
RTO/RPO in minutes
After disaster
• Scale your web and app servers to handle traffic (Auto scaling)
[Figure: Web Resources, App Resources, and Application Data in both Region A and Region B, with continuous replication of Application Data between the regions]
Multi-region active/active
After disaster
• Zero downtime
• Traffic automatically routed to other regions
Data loss near zero
Most expensive
Cloud DR
ELB Route 53
demolearn.com
Web Server
DynamoDB Table
Oregon (us-west-2), N. Virginia (us-east-1)
DR Options
1. Backup and Restore
2. Pilot Light
3. Warm Standby
4. Multi-Site Active-Active
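The four options are listed from least to most expensive, and each step up shortens recovery time. One way to picture the trade-off is to pick the cheapest option that still meets a recovery-time target. In the sketch below, the minute values are illustrative assumptions chosen only to reflect the ordering in this course (hours for Backup and Restore, minutes for Warm Standby, zero downtime for active-active); they are not guarantees for any real workload.

```python
# (strategy name, assumed typical RTO in minutes), ordered cheapest first.
# The numbers are illustrative assumptions, not measured values.
STRATEGIES = [
    ("Backup and Restore", 240),
    ("Pilot Light", 30),
    ("Warm Standby", 5),
    ("Multi-Site Active-Active", 0),
]


def cheapest_strategy(target_rto_minutes: float) -> str:
    """Return the least expensive option whose assumed RTO meets the target."""
    for name, rto_minutes in STRATEGIES:
        if rto_minutes <= target_rto_minutes:
            return name
    raise ValueError("target RTO must be zero or greater")


for target in (480, 60, 10, 0):
    print(target, "->", cheapest_strategy(target))
```

A real decision would also weigh the RPO target and the running cost of each option.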
7X AWS Certified
Chandra Lingam
100K+ Students