Rai 11 Itsec Design Implementing Maintaining Recovery

The document discusses various aspects of designing backup and recovery processes including: 1) Defining the data to backup, backup frequency, methods, and recovery order. 2) Factors that influence backup techniques and timing such as data categories, disaster scenarios, and data transport options. 3) Types of recovery sites including cold, warm, and hot sites that vary in recovery time from days to seconds based on level of setup and resources preconfigured at the site.

Designing, Implementing and Maintaining the Recovery Solution
Lecture 12
Backup and Recovery Processes

Determine processes for backing up and recovering the critical data

Define the required recovery configuration.


Data Backup and Recovery
Processes

• The physical resources required at the alternate site and the
interconnection between the sites are influenced by:
◦ the methods used to back up the data
◦ the way it is transported and stored off-site (e.g. cloud)
◦ the techniques for recovering the data
Data Backup and Recovery
Processes
DP: Data processing

• DP resources: e.g. data, hardware, vendor software, building facilities
• Data
• the most important DP resource
• Why?
• the most difficult to manage during
recovery
• Why?
Data Backup and Recovery
Strategy

• Must include:
– what data will be backed up
– how often the backup is done
– how the backup is done
– how the data will be recovered
– in what order backup and recovery are performed
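
The elements of the strategy listed above can be captured in a small record. The following is a minimal sketch, not a prescribed format: the class name BackupPolicy, its field names and the sample values are all hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPolicy:
    """One entry in a backup and recovery strategy (illustrative only)."""
    data_set: str            # what data will be backed up
    interval_hours: int      # how often the backup is done
    method: str              # how the backup is done
    recovery_procedure: str  # how the data will be recovered
    order: int               # position in the backup/recovery sequence


def recovery_sequence(policies):
    """Return the data set names sorted by their place in the sequence."""
    return [p.data_set for p in sorted(policies, key=lambda p: p.order)]


# Hypothetical sample values, chosen only to exercise the sketch.
policies = [
    BackupPolicy("application databases", 24, "full",
                 "restore from off-site copy", 2),
    BackupPolicy("operating system images", 168, "full",
                 "reinstall from image", 1),
]
print(recovery_sequence(policies))
```

Keeping the order as an explicit field makes the recovery sequence something that can be reviewed and tested rather than left implicit.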
Backup and Recovery Techniques
and Timing Factors

• Factors that influence the decisions on backup and recovery
techniques and timing:
I) data categories
II) disaster recovery scenario
III) interrelationships within data
IV) data backup options
V) data transport and secure storage
VI) readiness of the alternate site
Data Categories

[Figure: data categories ranked by quantity of data and volatility of data, from high to low. APPLICATIONS: database, batch data, application program. INFRASTRUCTURE: security system, database system, catalog. SYSTEM PLATFORM: operating system.]
Issues on Disaster Recovery Scenario

• Five issues must be considered in a disaster recovery scenario:
– data safety and usability
– orphan data (data for which the owner cannot be identified or is not
available)
– lost data
– catch-up data (data recovered immediately after the disaster happens)
– data recovery
Disaster Recovery Scenario
[Figure: disaster recovery timeline. Markers: last safe backup, failure, decision to move, recovery site operational, service resume. Activities: orphan data, diagnosis, site preparation, data recovery, re-entry of orphan data. States: normal operation, outage, recovered operation.]
E.g.: Orphan Folder/File/Data

Additional reading on an example of file recovery methods:
http://www.r-tt.com/Articles/File_Recovery_Basics/

If the file system on the disk is severely damaged, this recovery method
cannot recreate the entire folder structure. The recovered files will then
appear in "orphaned" folders.
Managing Recovery at Alternate
Site
• Factors that will influence the decision of how to
manage and operate the alternate site:
I. options for operating an alternate site
II. considerations when assessing options
III. automated operations
IV. operations culture at remote site
V. additional considerations
Decision Criteria

• Decision criteria when designing a disaster recovery solution:
I. cost of the disaster recovery solution
II. disaster coverage (residual risk)
III. speed of recovery
IV. completeness of recovery (data currency)
Decision Criteria Interrelationship

[Figure: the four decision criteria shown as interrelated: disaster coverage (residual risk), cost, recovery speed, completeness of recovery.]
Product Selection

• Objectives:
– Required to implement the solution in the disaster recovery plan
– Helps to estimate the cost of disaster recovery
– Intentionally excluded from the design so that the initial design
can be determined first
Recovery Site Selection

• Approaches used to choose the type of alternate site:
◦ Owned by the organization
◦ Third-party recovery service
◦ Mutual agreement with a disaster recovery partner
• Approach depends on:
◦ How many factors are involved
◦ Cost
Recovery Site Type

• One of the biggest issues facing disaster recovery planners is the
selection of an appropriate DR site type.
• DR site type:
– Cold sites
– Warm sites
– Hot sites
Cold Sites
• Bare-bones approach to disaster recovery
• These facilities have the basic infrastructure needed to run
a data center, such as heating, ventilation and air
conditioning (HVAC), power and network connectivity, but
not much else.
• Cold sites are designed to provide coverage for long-term
outages of the primary site, such as those caused by a
building fire, hurricane or other major disaster that renders the
primary site completely inoperable
• Recovery time for cold sites is measured in days or weeks
rather than in hours.
• An organization can create a cold site in a facility it already owns
and uses for another noncritical purpose
Warm Sites
• A warm site is created when the long activation time required to stand up a
cold site presents an unacceptable risk
• Depending on the nature of the warm site, administrators may also choose
to have the hardware loaded with the operating systems and/or applications
required to resume operations
• The time required to activate a warm site depends on many of the decisions
made when configuring the warm site:
– Is the organization’s data stored on a storage system that can be directly
accessed by servers at the site, or does it need to be restored from tape?
– Are operating systems already loaded on the hardware at the site?
– Are applications installed on those systems as well?
• In some cases an organization can activate the warm site in a matter of hours;
in other cases it may take several days to get the site up and running
Hot Sites
• Hot sites provide the ultimate disaster recovery experience,
with instantaneous or near real-time recovery of operations
when the primary site fails
• Hot sites build upon the warm site concept by taking it to
the next level: ensuring that systems at the site are
preloaded with operating systems, applications and the
data necessary to resume operations
• The significant investment of time and money required to
stand up a hot site provides the organization with the ability
to resume operations in minutes or seconds after a disaster
disrupts operations at the primary site.
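
The three site types differ mainly in how quickly they can be activated, so a first screening step is to compare each type's activation time with the downtime the business can tolerate. The sketch below is illustrative only: the function name and the hour figures are assumptions picked to mirror the rough ranges above (hot: minutes or seconds, warm: hours to several days, cold: days or weeks), not values from the lecture.

```python
def candidate_site_types(max_downtime_hours):
    """Return the site types whose assumed activation time fits the target.

    The activation figures are placeholders; a real plan would use the
    measured activation time of its own sites.
    """
    # (site type, assumed fastest typical activation time in hours)
    assumed_activation_hours = [
        ("hot", 0.0),    # near real-time recovery
        ("warm", 4.0),   # "a matter of hours" (assumed figure)
        ("cold", 48.0),  # "days or weeks" (assumed figure)
    ]
    return [name for name, hours in assumed_activation_hours
            if hours <= max_downtime_hours]


# Example: a service that can tolerate at most eight hours of downtime.
print(candidate_site_types(8))
```

Screening by recovery time only narrows the choice; cost and disaster coverage still have to be weighed among the remaining candidates.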
Quality of Product

• Indicators that may give a higher degree of confidence:
– Reputation of the supplier
– Level of support offered for the
product
– Availability of source code
– Proven versus new
– Simple versus complex
Quality of Product – Cont.

• Indicators that may give a higher degree of confidence:
– Available versus unannounced
– Standard versus custom-built
– Conventional versus innovative
– Hardware versus software versus
hardware & software
– Operating system code versus
application code
Backup/Recovery Solution

• Determine the cost of the recovery solution
– Should include all the components of the solution
• Compare the estimated cost with the cost and risk of a disaster
– The cost of the solution must be in proportion to the cost and
risk of a disaster.
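
One common way to make "in proportion" concrete is to compare the yearly cost of the solution with the expected yearly loss from a disaster (the cost of the disaster multiplied by its yearly probability). This rule is an assumption brought in for illustration, not a method stated in the lecture, and the function names and figures below are hypothetical.

```python
def annual_expected_loss(outage_cost, yearly_probability):
    """Expected yearly loss: cost of the disaster times its yearly probability."""
    return outage_cost * yearly_probability


def is_proportionate(solution_cost_per_year, outage_cost, yearly_probability):
    """True if the solution costs no more per year than the expected loss."""
    return solution_cost_per_year <= annual_expected_loss(
        outage_cost, yearly_probability)


# Hypothetical figures: a disaster costing 2,000,000 with a 5% yearly chance.
print(is_proportionate(50_000, 2_000_000, 0.05))
print(is_proportionate(150_000, 2_000_000, 0.05))
```

A check like this is only a starting point, since it ignores losses that are hard to price, such as reputation or regulatory exposure.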
Components of Solution

• Hardware
• Software
• Network
• Alternate site
• Effort to implement
• Effort to maintain
Interrelationship Between Cost and
Recovery Time in DR Strategy

[Figure: backup strategy cost (vertical axis) plotted against time to recover (horizontal axis).]
Interrelationship Between Cost of DR
Strategy and Cost of Outage

[Figure: cost (vertical axis) against time (horizontal axis). Labels: cost of solution vs. time of recovery, cost of outage over time, cost/time window.]
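
One way to read the relationship between the cost of the solution and the cost of the outage is to add the two together for each candidate solution and look for the lowest total. The sketch below assumes that outage cost grows linearly with recovery time, which is a simplification; the option names and all figures are hypothetical.

```python
def total_cost(solution_cost, recovery_hours, outage_cost_per_hour):
    """Cost of the solution plus the outage cost accrued while recovering."""
    return solution_cost + recovery_hours * outage_cost_per_hour


def cheapest_option(options, outage_cost_per_hour):
    """Return the name of the option with the lowest total cost.

    Each option is a tuple: (name, solution_cost, recovery_hours).
    """
    best = min(options,
               key=lambda o: total_cost(o[1], o[2], outage_cost_per_hour))
    return best[0]


# Hypothetical options: (name, solution cost, hours needed to recover).
options = [
    ("hot", 500_000, 0.1),
    ("warm", 150_000, 12),
    ("cold", 40_000, 120),
]
for rate in (100, 10_000, 100_000):
    print(rate, cheapest_option(options, rate))
```

As the hourly cost of an outage rises, faster and more expensive solutions become the cheaper choice overall, which is the trade-off the cost/time window describes.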
Implementation of the Backup
and Recovery Solution
Recovery Solution Areas

• Setting up the alternate recovery facility
• Developing and implementing the technical procedures
• Developing the recovery plan
Alternate Recovery Facility
• Define the detailed specifications for the recovery site
and the facilities needed
• Considerations to take into account:
– Length or duration of contract versus cost
– Distance from primary site
– Ease of upgrading when required
– Size and technology used at the site
– Support services
– Network capability
Disaster Recovery Plan Procedures

• Create new procedures and amend existing procedures

• Objective: to ensure the critical processes can be recovered and run
at the recovery site
Disaster Recovery Plan Procedures

• Types of procedure:
– Data backup procedures
– Off-site storage procedures
– Data recovery procedures
– Change management procedures
– Application design rules
– Human resources procedures
Developing the Recovery Plan

• What should be included when developing the recovery plan?
– Plan contents
– Disaster recovery teams and key roles
– Confidentiality and access to the plan
– Plan ownership and maintenance
Plan Contents

• Recognize the disaster


• Invoke the DRP
• Information policy
• Shift schedule
• Return to primary site
Disaster Recovery Teams and Key
Roles
• Project Leader
• Disaster Recovery Coordinator (DRC)
• Disaster Recovery steering committee
• Alternate Site Manager
• Auditors
• Management Team
• Administrative Team
• Etc.
Documentation

• Needs to be reviewed, enhanced and recognized when developing and
documenting a disaster recovery plan

• Types of documentation:
– New
– Existing
Documentation

• Categories of documentation:
• Official manuals and publication about
hardware, software, products, concepts
and solutions
• Standards and procedures for daily
operations
• Specific disaster recovery
documentation
Existing Documents

• Areas that may be changed:
• Systems and network management
• Configuration management, resource
and capacity planning
• Change management
• Problem management
Existing Documents – Cont.

• Areas that may be changed:
• Operation processes and procedures
• Data backup concept and procedures
• Application programming standards
• Security
New Documents
• Disaster Recovery Strategy
– Contain information about residual risks, the current
status of strategy implementation and any open
problems that might have arisen during the last
recovery test

• Service Level Agreement (SLA)
– A part of a service contract where a service is formally defined.
– In practice, the term SLA is sometimes used to refer to the
contracted delivery time (of the service or performance)
Example of SLA interaction
New Documents

• Recovery Test
– Should contain:
• Information about test frequency
• Checklists for test preparation, for the
review of the results and implementation
improvement
• Forms to record test results
• Schedule plans and test processes
• Summaries of the test results
Up to Date Backup and
Recovery Solution
Up to Date Backup and Recovery
Solution Elements

• Three distinct elements should be incorporated in these procedures:
– Maintenance
– Auditing
– Testing
Maintenance of the Solution

• Purpose:
– to provide a mechanism for
updating the recovery solution when
changes are made to the
environment
Maintenance of the Solution –
Cont.
• Changes that affect the recovery solution:
– New applications development
– Current hardware configuration changes
– Network changes
– Organizational changes
– System changes
– Alternate site changes
Plan Auditing

• Purpose: to determine whether the DRP document has been updated and
changed.

• Should be audited on a semi-annual or annual basis.
Plan Auditing – Cont.

• Considerations to take into account:
– Documents should not all be taken for
audit at the same time
– All audited documents should be
returned to the holder within 24 hours
– Documents should be audited against
the DRC’s copy
– etc
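
Auditing a document against the DRC's copy amounts to checking that each holder's copy is identical to the reference. One possible way to do that for electronic copies is to compare cryptographic digests. The sketch below uses SHA-256 from Python's standard hashlib module; the function names, holder names and document contents are hypothetical, and the lecture does not prescribe this technique.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def out_of_date_copies(drc_copy: bytes, holder_copies: dict) -> list:
    """Return the holders whose copy differs from the DRC's reference copy."""
    reference = digest(drc_copy)
    return sorted(name for name, content in holder_copies.items()
                  if digest(content) != reference)


# Hypothetical holders and contents, used only to exercise the sketch.
holders = {
    "alternate site manager": b"DRP revision 2",
    "auditor": b"DRP revision 1",
}
print(out_of_date_copies(b"DRP revision 2", holders))
```

A digest comparison only says whether a copy differs from the reference, not what changed, so a mismatch still needs a manual review.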
Plan Testing
• Purpose: to tell if the maintenance
procedures are working
• Benefits that can be derived from the testing procedure are:
– Knowledge that the recovery plan
works
– Discovery of problems, mistakes and
errors which can be resolved
Plan Testing – Cont.
• Benefits that can be derived from the testing procedure are:
– Training of employees in executing
tests and managing disaster recovery
situations
– Making the recovery plan a “living”
document
– Raising awareness in all parts of DP
organizations regarding the necessity of
DRP
Plan Testing – Cont.

• Types of testing:
– Active
– Passive

• Steps:
– Problem escalation
– Disaster declaration procedures
– Assembling the teams
Plan Testing – Cont.

• What should be included in the testing plan?
– Frequency of tests
– Test levels
– Activities before test
– Acceptance criteria
– Execute the test
– After the test
Circumstances May Restrict the
Ability to Test the DR Concept

• Reliability of the network


• Disk space
• Interdependencies between
applications
• Hardware configuration at the
alternate site
• Involvement with the running system during testing
Conclusion
• One of the most critical elements of any disaster recovery
plan is the creation and maintenance of system backups.

• Backups should include not only the organization’s critical data but
its critical software as well.

• To implement a DRP, a company typically uses outside services such as
a shared-site arrangement or third-party vendor to replicate critical
data processing services.
Conclusion
• Several alternatives are hot-site, cold-site and warm-site
arrangements.
• Which disaster recovery site strategy is right for you?
– Be sure to factor in business objectives and continuity needs
before making an investment

• The plan must be thoroughly tested using techniques such as
simulation or parallel testing.
