Principles of Information Security 7E - Module 5


MODULE 5

Incident Response and Contingency Planning

Upon completion of this material, you should be able to:
1 Discuss the need for contingency planning
2 Describe the major components of incident response, disaster recovery, and business continuity
3 Identify the processes used in digital forensics investigations
4 Define the components of crisis management
5 Discuss how the organization would prepare and execute a test of contingency plans

A little fire is quickly trodden out; which, being suffered, rivers cannot quench.
—William Shakespeare, King Henry VI, Part III, Act IV, Scene 8

Opening Scenario
Charlie Moody flipped up his jacket collar to cover his ears. The spray blowing over him from the fire hoses was icing the cars
along the street where he stood watching his office building burn. The warehouse and shipping dock were not gone but were
severely damaged by smoke and water. He tried to hide his dismay by turning to speak to Fred Chin, standing beside him
overlooking the smoking remains.
“Look at the bright side,” said Charlie. “At least we can get the new servers that we’ve been putting off.”
Fred shook his head. “Charlie, you must be dreaming. We don’t have enough insurance for a full replacement of everything we’ve lost.”
Charlie was stunned. The offices were gone; all the computer systems, servers, and desktops were melted slag. He would
have to try to rebuild without the resources he needed. At least he had good backups, or so he hoped. He thought hard, trying
to remember the last time the off-site backups had been tested.
He wondered where all the network design diagrams were. He knew he could call his Internet provider to order new connections
as soon as Fred found some new office space. But where was all the vendor contact information? The only copy had
been on the computer in his office, which wasn’t there anymore. This was not going to be fun. He would have to call his boss,
Gladys Williams, the chief information officer (CIO), at home just to get the contact information for the rest of the executive team.
Charlie heard a buzzing noise to his left. He turned to see the flashing numbers of his alarm clock. Relief flooded him as
he realized it was just a nightmare; Sequential Label and Supply (SLS) had not burned down. He turned on the light and started
making notes to review with his staff as soon as he got into the office. Charlie would make some changes to the company
contingency plans today.

Introduction to Incident Response and Contingency Planning
You were introduced to planning in Module 3, when you learned about planning for the organization in general and
for the information security (InfoSec) program in particular. This module focuses on another type of planning—plans
that are made for unexpected adverse events—when the use of technology is disrupted and business operations can
come to a standstill. Because technology drives business, planning for an unexpected adverse event usually involves
managers from general business management as well as the information technology (IT) and InfoSec communities
of interest. They collectively analyze and assess the entire technological infrastructure of the organization using the
mission statement and current organizational objectives to drive their planning activities. But, for a plan to gain the
support of all members of the organization, it must be sanctioned and actively supported by the general business
community of interest. It must also be carefully monitored and coordinated with the InfoSec community of interest to
ensure that information is protected during and after an adverse event, such as an incident or disaster. Information
that affects the plan must be made securely available to the organization when it may not be operating under normal
circumstances or in its normal locations.
The need to have a plan in place that systematically addresses how to identify, contain, and resolve any possible
unexpected adverse event was identified in the earliest days of IT. Professional practice in the area of contingency
planning continues to evolve, as reflected in Special Publication (SP) 800-34, Rev. 1, “Contingency Planning Guide for
Federal Information Systems,” published by the National Institute of Standards and Technology (NIST). NIST is a
non-regulatory federal agency within the U.S. Department of Commerce that serves to enhance innovation and
competitiveness in the United States by acting as a clearinghouse for standards related to technology.1 The Applied Cybersecurity
Division of NIST facilitates sharing of information about practices that can be used to secure information systems.
NIST advises the following:

Because information system resources are essential to an organization’s success, it is critical that
identified services provided by these systems are able to operate effectively without excessive
interruption. Contingency planning supports this requirement by establishing thorough plans,
procedures, and technical measures that can enable a system to be recovered as quickly and effectively
as possible following a service disruption.2

Some organizations—particularly federal agencies for national security reasons—are charged by law, policy, or
other mandate to have such plans and procedures in place at all times.
Organizations of every size and purpose should also prepare for the unexpected. In general, an organization’s
ability to weather losses caused by an adverse event depends on proper planning and execution of the plan. Without
a workable plan, an adverse event can cause severe damage to an organization’s information resources and assets
from which it may never recover. The Hartford insurance company estimates that, on average, more than 40 percent
of businesses that don’t have a disaster plan go out of business after a major loss like a fire, a break-in, or a storm.3
The development of a plan for handling unexpected events should be a high priority for all managers. The plan
should account for the possibility that key members of the organization will not be available to assist in the recovery
process. In fact, many organizations expect that some key members of the team may not be present when an unexpected
event occurs. To keep the consequences of adverse events less catastrophic, many firms limit the number of
executives or other key personnel who take the same flight or attend special events. The concept of a designated survivor
has become more common in government and corporate organizations—a certain number of specifically skilled
personnel are kept away from group activities in case of unexpected adverse events.
There is a growing emphasis on the need for comprehensive and robust planning for adverse circumstances.
In the past, organizations tended to focus on defensive preparations, using comprehensive threat assessments
combined with defense in depth to harden systems and networks against all possible risks. More organizations
now understand that preparations against the threat of attack remain an urgent and important activity, but that
defenses will fail as attackers acquire new capabilities and systems reveal latent flaws. When—not if—defenses
are compromised, prudent security managers have prepared the organization in order to minimize losses and
reduce the time and effort needed to recover. Sound risk management practices dictate that organizations must
be ready for anything.

Fundamentals Of Contingency Planning


The overall process of preparing for unexpected adverse events is called contingency planning (CP). During CP, the
IT and InfoSec communities of interest position their respective organizational units to prepare for, detect, react to,
and recover from events that threaten the security of information resources and assets, including human, information,
and capital assets. The main goal of CP is to restore normal modes of operation with minimal cost and disruption to
normal business activities after an adverse event—in other words, to make sure things get back to the way they were
within a reasonable period of time. Ideally, CP should ensure the continuous availability of information systems to the
organization even in the face of the unexpected.
CP consists of four major components:

• Business impact analysis (BIA)
• Incident response plan (IR plan)
• Disaster recovery plan (DR plan)
• Business continuity plan (BC plan)

The BIA is a preparatory activity common to both CP and risk management, which was covered in Module 4. It
helps the organization determine which business functions and information systems are the most critical to the success
of the organization. The IR plan focuses on the immediate response to an incident. Any unexpected adverse event is
treated as an incident unless and until a response team deems it to be a disaster. Then the DR plan, which focuses on
restoring operations at the primary site, is invoked. If operations at the primary site cannot be quickly restored—for
example, when the damage is major or will affect the organization’s functioning over the long term—the BC plan is executed
concurrently with the DR plan, enabling the business to continue at an alternate site until the organization is able to
resume operations at its primary site or select a new primary location.
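The escalation among the IR, DR, and BC plans described above can be sketched as a small decision function. This is only an illustrative sketch: the `EventAssessment` class, its field names, and the `plans_to_invoke` function are hypothetical and do not come from NIST or from this module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EventAssessment:
    """Hypothetical summary of a response team's view of an adverse event."""
    declared_disaster: bool                # has the response team deemed it a disaster?
    primary_site_quickly_restorable: bool  # can operations resume at the primary site soon?


def plans_to_invoke(assessment: EventAssessment) -> list[str]:
    """Return the contingency plans in effect for the assessed event."""
    plans = ["IR plan"]  # every adverse event is first treated as an incident
    if assessment.declared_disaster:
        plans.append("DR plan")  # focuses on restoring operations at the primary site
        if not assessment.primary_site_quickly_restorable:
            plans.append("BC plan")  # runs concurrently with DR, at an alternate site
    return plans
```

For example, an event that is never declared a disaster yields only the IR plan, while a disaster with major damage to the primary site yields all three plans.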
Depending on the organization’s size and business philosophy, IT and InfoSec managers can either create and
develop these four CP components as one unified plan or create the four separately in conjunction with a set of interlocking
procedures that enable continuity. Typically, larger, more complex organizations create and develop the CP
components separately, as the functions of each component differ in scope, applicability, and design. Smaller organizations
tend to adopt a one-plan method, consisting of a straightforward set of recovery strategies.
adverse event
An event with negative consequences that could threaten the organization’s information assets or operations; also
referred to as an incident candidate.

contingency planning (CP)
The actions taken by senior management to specify the organization’s efforts and actions if an adverse event becomes
an incident or disaster; CP typically includes incident response, disaster recovery, and business continuity efforts,
as well as preparatory business impact analysis.

contingency planning management team (CPMT)
The group of senior managers and project members organized to conduct and lead all CP efforts.

Ideally, the chief information officer (CIO), systems administrators, the chief information security officer (CISO),
and key IT and business managers should be actively involved during the creation and development of all CP components,
as well as during the distribution of responsibilities among the three communities of interest. The elements required
to begin the CP process are a planning methodology; a policy environment to enable the planning process; an
understanding of the causes and effects of core precursor activities, known as the BIA; and access to financial and
other resources, as articulated and outlined by the planning budget. Each of these is explained in the sections that
follow. Once formed, the contingency planning management team (CPMT) begins developing a CP document, for which NIST
recommends using the following steps:

1. Develop the CP policy statement. A formal policy provides the authority and guidance necessary to develop an
   effective contingency plan.
2. Conduct the BIA. The BIA helps identify and prioritize information systems and components critical to supporting
   the organization’s mission/business processes. A template for developing the BIA is provided to assist the user.
3. Identify preventive controls. Measures taken to reduce the effects of system disruptions can increase system
   availability and reduce contingency life cycle costs.
4. Create contingency strategies. Thorough recovery strategies ensure that the system may be recovered quickly and
   effectively following a disruption.
5. Develop a contingency plan. The contingency plan should contain detailed guidance and procedures
for restoring damaged organizational facilities unique to each business unit’s impact level and recovery
requirements.
6. Ensure plan testing, training, and exercises. Testing validates recovery capabilities, whereas training
prepares recovery personnel for plan activation and exercising the plan identifies planning gaps; when
combined, the activities improve plan effectiveness and overall organization preparedness.
7. Ensure plan maintenance. The plan should be a living document that is updated regularly to remain
current with system enhancements and organizational changes.4
Even though NIST methodologies are used extensively in this module, NIST treats incident response separately from
contingency planning; the latter is focused on disaster recovery and business continuity. This module integrates the
approach to contingency planning from NIST SP 800-34, Rev. 1, with the guide to incident handling from NIST SP 800-61,
Rev. 2. It also incorporates material from the newly released NIST SP 800-184, “Guide for Cybersecurity Event Recovery.”
Effective CP begins with effective policy. Before the CPMT can fully develop the planning document, the team must
receive guidance from executive management, as described earlier, through formal CP policy. This policy defines the
scope of the CP operations and establishes managerial intent in regard to timetables for response to incidents, recovery
from disasters, and reestablishment of operations for continuity. It also stipulates responsibility for the development
and operations of the CPMT in general and may provide specifics on the constituencies of all CP-related teams. It is
recommended that the CP policy contain, at a minimum, the following sections:

• An introductory statement of philosophical perspective by senior management as to the importance of CP to
the strategic, long-term operations of the organization
• A statement of the scope and purpose of the CP operations, stipulating the requirement to cover all critical
business functions and activities
• A call for periodic (e.g., yearly) risk assessment and BIA by the CPMT, to include identification and prioritization
of critical business functions (while the need for such studies is well understood by the CPMT, the formal
inclusion in policy reinforces that need to the rest of the organization)
• A description of the major components of the CP to be designed by the CPMT, as described earlier
• A call for, and guidance in, the selection of recovery options and continuity strategies
• A requirement to test the various plans on a regular basis (e.g., annually, semiannually, or more often as
needed)
• Identification of key regulations and standards that impact CP and a brief overview of their relevance
• Identification of key individuals responsible for CP operations, such as establishment of the chief operations
officer (COO) as CPMT lead, the CISO as IR team lead, the manager of business operations as DR team lead, the
manager of information systems and services as BC team lead, and legal counsel as crisis management team lead
• An appeal to the individual members of the organization, asking for their support and reinforcing their importance
as part of the overall CP process
• Additional administrative information, including the original date of the document, revision dates, and a
schedule for periodic review and maintenance

A number of individuals and teams are involved in CP operations:


The CPMT collects information about the organization and the threats it faces, conducts the BIA, and then coordinates
the development of contingency plans for incident response, disaster recovery, and business continuity. The
CPMT often consists of a coordinating executive, representatives from major business units, and the managers responsible
for each of the other three teams. It should include the following personnel:
• Champion—As with any strategic function, the CP project must have a high-level manager to support, promote,
and endorse the findings of the project. This champion could be the COO or (ideally) the CEO/president.
• Project manager—A champion provides the strategic vision and the linkage to the power structure of the
organization but does not manage the project. A project manager—possibly a mid-level operations manager
or even the CISO—leads the project, putting in place a sound project planning process, guiding the development
of a complete and useful project, and prudently managing resources.
• Team members—The team members should be the managers or their representatives from the various
communities of interest: business, IT, and InfoSec. Business managers supply details of their activities and
insight into functions that are critical to running the business. IT managers supply information about the
at-risk systems used in the development of the BIA and the IR, DR, and BC plans. InfoSec managers oversee
the security planning and provide information on threats, vulnerabilities, attacks, and recovery requirements.
A representative from the legal affairs or corporate counsel’s office helps keep all planning steps
within legal and contractual boundaries. A member of the corporate communications department makes
sure the crisis management and communications plan elements are consistent with the needs of that group.
Supplemental team members also include representatives of supplemental planning teams: the incident
response planning team (IRPT), disaster recovery planning team (DRPT), and business continuity planning
team (BCPT). For organizations that decide to separate crisis management from disaster recovery,
there may also be representatives from the crisis management planning team (CMPT).

As indicated earlier, in larger organizations these teams are distinct entities, with non-overlapping memberships,
although the latter three teams have representatives on the CPMT. In smaller organizations, the four teams may
include overlapping groups of people, although this is discouraged because the three planning teams (IR, DR, and
BC) will most likely include members of their respective response teams—the individuals who will actually respond
to an incident or disaster. The planning teams and response teams are distinctly separate groups, but representatives
of the response team will most likely be included on the planning team for continuity purposes and to facilitate plan
development and the communication of planning activities to the response units. If the same individuals are on the DR
and BC teams, for example, they may find themselves with different responsibilities in different locations at the same time.
It is virtually impossible to establish operations at the alternate site if team members are busy managing the recovery
at the primary site, some distance away. Thus, if the organization has sufficient personnel, it is advisable to staff the
two groups with separate members.
incident response planning team (IRPT)
The team responsible for designing and managing the IR plan by specifying the organization’s preparation, reaction,
and recovery from incidents.

disaster recovery planning team (DRPT)
The team responsible for designing and managing the DR plan by specifying the organization’s preparation, response,
and recovery from disasters, including reestablishment of business operations at the primary site after the disaster.

business continuity planning team (BCPT)
The team responsible for designing and managing the BC plan of relocating the organization and establishing primary
operations at an alternate site until the disaster recovery planning team can recover the primary site or establish a
new location.

crisis management planning team (CMPT)
The individuals from various functional areas of the organization assigned to develop and implement the CM plan.

As illustrated in the opening scenario of this module, many organizations’ contingency plans are woefully inadequate.
CP often fails to receive the high priority necessary for the efficient and timely recovery of business operations
during and after an unexpected event. The fact that many organizations do not place an adequate premium on CP does not
mean that it is unimportant, however. Here is how NIST’s Computer Security Resource Center (CSRC) describes the need
for this type of planning:

These procedures (contingency plans, business interruption plans, and continuity of operations plans) should be
coordinated with the backup, contingency, and recovery plans of any general support systems, including
networks used by the application. The contingency plans should ensure that interfacing systems are identified
and contingency/disaster planning coordinated.5

As you learn more about CP, you may notice that it shares certain characteristics with risk management and the SDLC
methodology. Many IT and InfoSec managers are already familiar with these processes and thus can readily adapt their
existing knowledge to the CP process.

Components of Contingency Planning

As noted earlier, CP includes four major components: the BIA and the IR, DR, and BC policies and plans. Whether an
organization adopts the one-plan method or the multiple-plan method with interlocking procedures, each of these CP
components must be addressed and developed in their entirety. The following sections describe each component in
detail, including when and how each should be used. They also explain how to determine which plan is best suited for
the identification, containment, and resolution of any given unexpected event. Figure 5-1 depicts the major project
modules performed during CP efforts. Figure 5-2 shows the overall stages of the CP process, which are derived from the
NIST IR and CP methodologies presented earlier.
[Figure 5-1 is a hierarchy diagram with the following labeled boxes: Contingency Planning; Business Impact Analysis;
Incident Response Planning; Disaster Recovery Planning; Business Continuity Planning; Business Resumption Planning;
Crisis Management Planning.]

Figure 5-1 Contingency planning hierarchies

[Figure 5-2 is a cycle diagram with the following labeled boxes: Form the CP team. Develop the CP policy statement.
Conduct the business impact analysis (BIA): determine mission/business processes & recovery criticality; identify
resource requirements; identify recovery priorities for system resources. Form subordinate planning teams (IR/DR/BC).
Develop subordinate planning policies (IR/DR/BC). Integrate the business impact analysis (BIA). Identify preventive
controls. Organize response teams (IR/DR/BC). Create response strategies (IR/DR/BC). Develop subordinate plans
(IR/DR/BC). Ensure plan testing, training, and exercises. Ensure plan maintenance. The diagram also carries the labels
"Review/revise as needed" and "Continuous improvement."]

Figure 5-2 Contingency planning life cycle

Business Impact Analysis


The business impact analysis (BIA) is the first major component of the CP process. A crucial foundation for the initial
planning stages, it serves as an investigation and assessment of the impact that various adverse events can have on
the organization.
One of the fundamental differences between a BIA and the risk management processes discussed in Module 4 is that risk
management focuses on identifying threats, vulnerabilities, and attacks to determine which controls can protect
information. The BIA assumes that these controls have been bypassed, have failed, or have otherwise proved ineffective,
that the attack succeeded, and that the adversity that was being defended against has been successful. By assuming the
worst has happened, and then assessing how that adversity will impact the organization, insight is gained for how the
organization must respond to the adverse event, minimize the damage, recover from the effects, and return to normal
operations.

business impact analysis (BIA)
An investigation and assessment of adverse events that can affect the organization, conducted as a preliminary phase
of the contingency planning process; it includes a determination of how critical a system or set of information is to
the organization’s core processes and its recovery priorities.

The BIA begins with the prioritized list of threats and vulnerabilities identified in the risk management process
discussed in Module 4, and then the list is enhanced by adding the information needed to respond to the adversity.
Obviously, the organization’s security team does everything in its power to stop attacks, but as you have seen, some
attacks, such as natural disasters, deviations from service providers, acts of human failure or error, and deliberate
acts of sabotage and vandalism, may be unstoppable.
When undertaking the BIA, the organization should consider the following:

1. Scope—Carefully consider which parts of the organization to include in the BIA; determine which business
units to cover, which systems to include, and the nature of the risk being evaluated.
2. Plan—The needed data will likely be voluminous and complex, so work from a careful plan to ensure that
the proper data is collected to enable a comprehensive analysis. Getting the correct information to address
the needs of decision makers is important.
3. Balance—Weigh the information available; some information may be objective in nature, while other
information may only be available as subjective or anecdotal references. Facts should be weighted
properly against opinions; however, sometimes the knowledge and experience of key personnel can be
invaluable.
4. Objective—Identify in advance what the key decision makers require for making choices. Structure the BIA
to bring them the information they need and to facilitate consideration of those choices.
5. Follow-up—Communicate periodically to ensure that process owners and decision makers will support the
process and end result of the BIA.6

According to NIST’s SP 800-34, Rev. 1, the CPMT conducts the BIA in three stages described in the sections that
follow:7

1. Determine mission/business processes and recovery criticality.
2. Identify resource requirements.
3. Identify recovery priorities for system resources.

Determine Mission/Business Processes and Recovery Criticality


The first major BIA task is the analysis and prioritization of business processes within the organization, based on their
relationship to the organization’s mission. Each business department, unit, or division must be independently evaluated
to determine how important its functions are to the organization as a whole. For example, recovery operations
would probably focus on the IT department and network operation before turning to the personnel department’s
hiring activities. Likewise, recovering a manufacturing company’s assembly line is more urgent than recovering its
maintenance tracking system. This is not to say that personnel functions and assembly line maintenance are not
important to the business, but unless the organization’s main revenue-producing operations can be restored quickly,
other functions are irrelevant.
Note that throughout this section, the term mission/business process is used, as some agencies that adopt this
methodology are not businesses and thus do not have business processes per se. Do not let the term confuse you.
Whenever you see the term, it’s essentially describing a business process. NIST prefers mission/business process,
although business process is just as accurate.
It is important to collect critical information about each business unit before beginning the process of prioritizing
the business units. The key thing to remember is to avoid “turf wars” and instead focus on the selection of business
functions that must be sustained to continue business operations. While one manager or executive might feel that
his or her function is the most critical to the organization, that function might prove to be less critical in the event of
a major incident or disaster. It is the role of senior management to arbitrate these inevitable conflicts about priority;
after all, senior management has the perspective to make these types of trade-off decisions.
A weighted table analysis (WTA), or weighted factor analysis, as shown in Table 5-1, can be useful in resolving the
issue of what business function is the most critical. The CPMT can use this tool by first identifying the characteristics
of each business function that matter most to the organization—in other words, the criteria. The team should then
allocate relative weights to each of these criteria. Each of the criteria is assessed on its influence toward overall
importance in the decision-making process. Once the characteristics to be used as criteria have been identified and
weighted (usually as columns in the WTA worksheet), the various business functions are listed (usually as rows on the
same worksheet). Each business function is assessed a score for each of the criteria. Next, the weights can be
multiplied against the scores in each of the criteria, and then the rows are summed to obtain the overall scored value
of the function to the organization. The higher the value computed for a given business function, the more important
that function is to the organization.

business process
A task performed by an organization or one of its units in support of the organization’s overall mission and operations.

Table 5-1 Example of Weighted Table Analysis of Business Processes

Column key: Revenue = Impact on Revenue; Profitability = Impact on Profitability; Delivery = Impact on Product/Service
Delivery; Market Share = Impact on Market Share; Reputation = Impact on Reputation; Importance (0–5; Not Important to
Critically Important).

#  Business Process               Revenue  Profitability  Delivery  Market Share  Reputation  TOTAL  Importance
   Criterion Weight               0.25     0.3            0.15      0.2           0.1
1  Customer sales                 5        5              5         5             4           4.9    Critically Important
2  Production                     5        5              5         3             3           4.4    Critically Important
3  Information security services  3        3              3         3             5           3.2    Very Important
4  IT services                    4        3              4         2             2           3.1    Very Important
5  Customer service               2        3              2         1             4           2.3    Important
6  Research & development         1        1              2         3             3           1.75   Somewhat Important
7  Employee support services      1        1              2         1             2           1.25   Somewhat Important
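The weighted table analysis can be reproduced in a few lines of code. The sketch below uses the weights and scores from Table 5-1; the dictionary keys and the function names `weighted_total` and `rank_processes` are illustrative choices, not part of any standard tool.

```python
# Criterion weights from Table 5-1 (they sum to 1.0).
WEIGHTS = {
    "revenue": 0.25,
    "profitability": 0.30,
    "delivery": 0.15,
    "market_share": 0.20,
    "reputation": 0.10,
}

CRITERIA = list(WEIGHTS)

# Scores (0-5) for each business process, in the same column order as Table 5-1.
RAW_SCORES = {
    "Customer sales": [5, 5, 5, 5, 4],
    "Production": [5, 5, 5, 3, 3],
    "Information security services": [3, 3, 3, 3, 5],
    "IT services": [4, 3, 4, 2, 2],
    "Customer service": [2, 3, 2, 1, 4],
    "Research & development": [1, 1, 2, 3, 3],
    "Employee support services": [1, 1, 2, 1, 2],
}

SCORES = {name: dict(zip(CRITERIA, row)) for name, row in RAW_SCORES.items()}


def weighted_total(scores, weights=WEIGHTS):
    """Multiply each criterion score by its weight and sum across the row."""
    return round(sum(weights[c] * scores[c] for c in weights), 2)


def rank_processes(all_scores, weights=WEIGHTS):
    """Return (process, total) pairs, most important first."""
    totals = {name: weighted_total(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for name, total in rank_processes(SCORES):
    print(f"{total:>5}  {name}")
```

Rounding to two decimal places only hides floating-point noise; the ranking itself depends solely on the weights and scores the CPMT agrees on.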
A BIA questionnaire is an instrument used to collect relevant business impact information for the required analysis.
It is useful as a tool for identifying and collecting information about business functions for the analysis just described.
It can also be used to allow functional managers to directly enter information about the business processes within
their area of control, the impacts of these processes on the business, and dependencies that exist for the functions
from specific resources and outside service providers.
NIST Business Process and Recovery Criticality
NIST’s SP 800-34, Rev. 1, recommends that organizations use simple qualitative categories like “low impact,”
“moderate impact,” or “high impact” for the security objectives of confidentiality, integrity, and availability
(NIST’s Risk Management Framework Step 1). Note that large quantities of information are assembled, and a data
collection process is essential if all meaningful and useful information collected in the BIA process is to be made
available for use in overall CP development.
When organizations consider recovery criticality, key recovery measures are usually described in terms of how much of
the asset they must recover and what time frame it must be recovered within. The following terms are most frequently
used to describe these values:

• Recovery time objective (RTO)
• Recovery point objective (RPO)
• Maximum tolerable downtime (MTD)
• Work recovery time (WRT)

recovery time objective (RTO)
The maximum amount of time that a system resource can remain unavailable before there is an unacceptable impact on
other system resources, supported business processes, and the maximum tolerable downtime.

recovery point objective (RPO)
The point in time before a disruption or system outage to which business process data can be recovered after an outage,
given the most recent backup copy of the data.

The difference between RTO and RPO is illustrated in Figure 5-3. WRT typically involves the addition of nontechnical tasks required for the organization to make the information asset usable again for its intended business function. The WRT can be added to the RTO to determine the realistic amount of elapsed time required before a business function is back in useful service, as illustrated in Figure 5-4.
NIST goes on to say that failing to determine MTD “could leave contingency planners with imprecise direction on (1) selection of an appropriate recovery method and (2) the depth of detail that will be required when developing recovery procedures, including their scope and content.”8 Determining the RTO for the information system resource, NIST adds, “is important for selecting appropriate technologies that are best suited for meeting the MTD.”9 As for reducing RTO, that requires mechanisms to shorten the start-up time or provisions to make data available online at a failover site. Unlike RTO, NIST adds, “RPO is not considered as part of MTD. Rather, it is a factor of how much data loss the mission/business process can tolerate during the recovery process.”10 Reducing RPO requires mechanisms to increase the synchronicity of data replication between production systems and the backup implementations for those systems.

maximum tolerable downtime (MTD)
The total amount of time the system owner or authorizing official is willing to accept for a business process outage or disruption. The MTD includes all impact considerations.

work recovery time (WRT)
The amount of effort (expressed as elapsed time) needed to make business functions work again after the technology element is recovered. This recovery time is identified by the RTO.

Figure 5-3 RTO vs. RPO
[Timeline: last backup or point where data is in usable and recoverable state; incident/disaster strikes; systems & data recovered. Recovery point: how far back? (how much lost data?). Recovery time: how long to recover? (how soon for restoration & recovery?).]
Source: http://networksandservers.blogspot.com/2011/02/high-availability-terminology-ii.html.

Figure 5-4 RTO, RPO, MTD, and WRT
[Timeline: last backup; incident/disaster strikes (data loss/systems down); physical/systems recovery (systems recovered); data recovery (data recovered); testing & validation; recovery complete/resume operations. Phases: normal operations, recovery operations, normal operations. Intervals marked: RPO, RTO, WRT, and MTD.]
Source: http://networksandservers.blogspot.com/2011/02/high-availability-terminology-ii.html.



Because of the critical need to recover business functionality, the total time needed to place the business
function back in service must be shorter than the MTD. Planners should determine the optimal point to
recover the information system in order to meet BIA-mandated recovery needs while balancing the cost of
system inoperability against the cost of the resources required for restoring systems. This must be done in
the context of the BIA-identified critical business processes and can be shown with a simple chart, such as the
one in Figure 5-5.
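The timing constraint above (RTO plus WRT must come in under the MTD) can be checked mechanically. The following is a minimal sketch; the function names and the hour values in the usage example are hypothetical and are not taken from NIST or from this module.

```python
from datetime import timedelta


def total_recovery_time(rto: timedelta, wrt: timedelta) -> timedelta:
    """Elapsed time before the business function is usable again (RTO + WRT)."""
    return rto + wrt


def fits_within_mtd(rto: timedelta, wrt: timedelta, mtd: timedelta) -> bool:
    """True when RTO + WRT is shorter than the maximum tolerable downtime."""
    return total_recovery_time(rto, wrt) < mtd


if __name__ == "__main__":
    # Hypothetical values for a single business function.
    rto = timedelta(hours=4)   # technology element recovered
    wrt = timedelta(hours=2)   # testing, validation, nontechnical tasks
    mtd = timedelta(hours=8)   # owner's tolerance for the outage
    print(fits_within_mtd(rto, wrt, mtd))
```

RPO is deliberately left out of the check because, as NIST notes, it is not considered part of the MTD.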
The longer an interruption to system availability remains, the more impact and cost it will have for the organization
and its operations. When plans require a short RTO, the solutions that will be required are usually more expensive to
design and use. For example, if a system must be recovered immediately, it will have an RTO of 0.
These types of solutions will require fully redundant alternative processing sites and will therefore have much
higher costs. On the other hand, a longer RTO would allow a less expensive recovery system. Plotting the cost balance
points will show an optimal point between disruption and recovery costs. The intersecting point, labeled the cost
balance point in Figure 5-5, will be different for every organization and system, based on the financial constraints and
operating requirements.11
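The search for a cost balance point can be illustrated with a small sketch. Both cost curves below are invented for illustration (a linear disruption cost and a recovery cost that falls as the allowed recovery time grows); a real analysis would use figures from the organization's own BIA.

```python
def cost_of_disruption(hours: float) -> float:
    """HYPOTHETICAL curve: business impact grows by $2,000 per hour of outage."""
    return 2_000.0 * hours


def cost_to_recover(hours: float) -> float:
    """HYPOTHETICAL curve: a shorter recovery time needs a costlier solution
    (for example, a system mirror rather than tape backup)."""
    return 84_000.0 / (hours + 1.0)


def cost_balance_point(candidate_hours):
    """Return the candidate recovery time where the two curves are closest."""
    return min(
        candidate_hours,
        key=lambda h: abs(cost_of_disruption(h) - cost_to_recover(h)),
    )


if __name__ == "__main__":
    best = cost_balance_point(range(0, 49))
    print(best, cost_of_disruption(best), cost_to_recover(best))
```

With these assumed curves the two costs meet at a six-hour recovery time; different curves would move the balance point, which is why it differs for every organization and system.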

Figure 5-5 Cost balancing
[Plot of cost against length of disruption time. Labels: cost of disruption (business impact); cost to recover (system mirror); cost to recover (tape backup); cost balance point.]

Information Asset Prioritization
As the CPMT conducts the BIA, it will be assessing priorities and relative values
for mission/business processes. To do so, it needs to understand the information assets used by those processes. In
essence, the organization has determined which processes are most critical to its long-term viability, and now it must
determine which information assets are most critical to each process.
Note that the presence of high-value information assets may influence the valuation of a particular business process.
In any event, once the business processes have been prioritized, the organization should identify, classify, and
prioritize the information assets both across the organization and within each business process, placing classification
labels on each collection or repository of information in order to better understand its value and to prioritize its protection.
Normally, this task would be performed as part of the risk assessment function within the risk management
process. If the organization has not performed this task, the BIA process is the appropriate time to do so. Again, the
WTA can be a useful tool to determine the information asset priorities.

Identify Recovery Resource Requirements


Once the organization has created a prioritized list of its mission/business processes, it needs to determine what
resources would be required to recover those processes and the assets associated with them. Some processes are
resource-intensive—like IT functions. Supporting customer data, production data, and other organizational information
requires extensive quantities of information processing, storage, and transmission (through networking). Other business
production processes require complex or expensive components to operate. For each process and information
asset identified in the previous BIA stage, the organization should identify and describe the relevant resources needed
to provide or support that process. A simplified method for organizing this information is to put it into a resource/
component table, like the example shown in Table 5-2. Note in the table how one business process will typically have
multiple components, each of which must be enumerated separately.

Table 5-2 Example Resource/Component Table

Mission/Business Process | Required Resource Components | Additional Resource Details | Description and Estimated Costs
Provide customer support (help desk) | Trouble ticket and resolution application | Application server built from Linux OS, Apache server, and SQL database | Each help-desk technician requires access to the organization’s trouble ticket and resolution software application, hosted on a dedicated server. See current cost recovery statement for valuation.
Provide customer support (help desk) | Help-desk network segment | 25 Cat5e network drops, gigabit network hub | The help-desk applications are networked and require a network segment to access. See current cost recovery statement for valuation.
Provide customer support (help desk) | Help-desk access terminals | 1 laptop/PC per technician, with Web-browsing software | The help-desk applications require a Web interface on a laptop/PC to access. See current cost recovery statement for valuation.
Provide customer billing | Customized accounts receivable application | Application server with Linux OS, Apache server, and SQL database | Accounts Receivable requires access to its customized AR software and customer database to process customer billing. See current cost recovery statement for valuation.
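A resource/component table like Table 5-2 can also be kept as structured data so that every component of a process is enumerated separately and can be listed on demand. The sketch below uses the table's rows; the class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceComponent:
    """One row of a resource/component table (names are illustrative)."""
    process: str
    component: str
    details: str


TABLE = [
    ResourceComponent(
        "Provide customer support (help desk)",
        "Trouble ticket and resolution application",
        "Application server built from Linux OS, Apache server, and SQL database",
    ),
    ResourceComponent(
        "Provide customer support (help desk)",
        "Help-desk network segment",
        "25 Cat5e network drops, gigabit network hub",
    ),
    ResourceComponent(
        "Provide customer support (help desk)",
        "Help-desk access terminals",
        "1 laptop/PC per technician, with Web-browsing software",
    ),
    ResourceComponent(
        "Provide customer billing",
        "Customized accounts receivable application",
        "Application server with Linux OS, Apache server, and SQL database",
    ),
]


def components_by_process(rows):
    """Group component names under the business process they support."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.process].append(row.component)
    return dict(grouped)


if __name__ == "__main__":
    for process, components in components_by_process(TABLE).items():
        print(process, "->", components)
```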

Identify System Resource Recovery Priorities


The last stage of the BIA is prioritizing the resources associated with the mission/business processes, which provides
a better understanding of what must be recovered first, even within the most critical processes. With the information
from previous steps in hand, the organization can create additional weighted tables of the resources needed to support
the individual processes. By assigning values to each resource, the organization will have a custom-designed “to-do”
list available once the recovery phase commences. Whether it is an IR- or DR-focused recovery or the implementation
of critical processes in an alternate site during business continuity, these lists will prove invaluable to those who are
tasked to establish (or reestablish) critical processes quickly.
In addition to the weighted tables described earlier, a simple valuation and classification scale, such as Primary/Secondary/Tertiary
or Critical/Very Important/Important/Routine, can be used to provide a quicker method of valuating the supporting
resources. What is most important is not to get so bogged down in the process that you lose sight of the objective
(the old “can’t see the forest for the trees” problem). Teams that spend too much time developing and completing weighted
tables may find a simple classification scheme more suited to their task. However, in a complex process with many resources,
a more sophisticated valuation method like the weighted tables may be more appropriate. One of the jobs of the CPMT while
preparing to conduct the BIA is to determine what method to use for valuating processes and their supporting resources.
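A simple classification scale lends itself to a sorted recovery "to-do" list. The sketch below uses the Critical/Very Important/Important/Routine scale named above; the ratings assigned to the example resources are hypothetical, as are the names `Priority` and `recovery_order`.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Simple classification scale; a lower value is recovered first."""
    CRITICAL = 1
    VERY_IMPORTANT = 2
    IMPORTANT = 3
    ROUTINE = 4


def recovery_order(resources):
    """Sort (name, Priority) pairs so the most critical resources come first.

    sorted() is stable, so resources with equal priority keep their input order.
    """
    return [name for name, _ in sorted(resources, key=lambda item: item[1])]


if __name__ == "__main__":
    # HYPOTHETICAL ratings for the help-desk resources in Table 5-2.
    rated = [
        ("Help-desk access terminals", Priority.IMPORTANT),
        ("Trouble ticket and resolution application", Priority.CRITICAL),
        ("Help-desk network segment", Priority.VERY_IMPORTANT),
    ]
    for position, name in enumerate(recovery_order(rated), start=1):
        print(position, name)
```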

Contingency Planning Policies


Prior to the development of each of the types of CP documents outlined in this module, the CP team should work to
develop the policy environment that will enable the BIA process and should provide specific policy guidance toward
authorizing the creation of each of the planning components (IR, DR, and BC). These policies provide guidance on the
structure of the subordinate teams and the philosophy of the organization, and they assist in the structuring of the plan.
Each of the CP documents will include a policy similar in structure to all other policies used by the organization.
Just as the enterprise InfoSec policy defines the InfoSec roles and responsibilities for the entire enterprise, each of the
CP documents is based on a specific policy that defines the related roles and responsibilities for that element of the
overall CP environment within the organization.

Incident Response
Most organizations have experience detecting, reacting to, and recovering from cyberattacks, employee errors, service
outages, and small-scale natural disasters. While they may not have formally labeled such efforts, these organizations
are performing incident response (IR). IR must be carefully planned and coordinated because organizations heavily
depend on the quick and efficient containment and resolution of incidents.
Incident response planning (IRP), therefore, is the preparation for such an effort and is performed by the IRP team
(IRPT). Note that the term incident response could be used either to describe the entire set of activities or a specific
phase in the overall reaction. However, in an effort to minimize confusion, this text will use the term IR to describe
the overall process, and reaction rather than response to describe the organization’s performance after it detects an
incident.
In business, unexpected events happen. When those events represent the potential for loss, they are referred to as adverse events or incident candidates. When an adverse event begins to manifest as a real threat to information, it becomes an incident. The incident response plan (IR plan) is usually activated when the organization detects an incident that affects it, regardless of how minor the effect is.

incident response (IR)
An organization’s set of planning and preparation efforts for detecting, reacting to, and recovering from an incident.

Getting Started

As mentioned previously, an early task for the CPMT is to form the IRPT, which will begin work by developing policy to define the team’s operations, articulate the organization’s response to various types of incidents, and advise users how to contribute to the organization’s effective response rather than contributing to the problem at hand. The IRPT then forms the computer security incident response team (CSIRT). Some key members of the IRPT may be part of the CSIRT. You will learn more about the CSIRT’s roles and composition later in this section. Figure 5-6 illustrates the NIST incident response life cycle.

Figure 5-6 NIST incident response life cycle
[Phases: preparation; detection & analysis; containment, eradication & recovery; post-incident activity.]
Source: NIST SP 800-61, Rev. 2, “The Computer Security Incident Handling Guide.”

incident response planning (IRP)
The actions taken by senior management to develop and implement the IR policy, plan, and computer security incident response team.

incident candidate
See adverse event.

incident
An adverse event that could result in a loss of information assets but does not threaten the viability of the entire organization.

incident response plan (IR plan)
The documented product of incident response planning; a plan that shows the organization’s intended efforts in the event of an incident.

computer security incident response team (CSIRT)
An IR team composed of technical IT, managerial IT, and InfoSec professionals who are prepared to detect, react to, and recover from an incident; may include members of the IRPT.

As part of an increased focus on cybersecurity infrastructure protection, NIST has developed a Framework for Improving Critical Infrastructure Cybersecurity, also referred to as the NIST Cybersecurity Framework (CSF). The CSF includes, and is designed to be complementary to, the existing IR methodologies and SPs. In fact, the documents described in this module are the foundation of the new CSF. Figure 5-7 shows the phases in the CSF, including those of event recovery, which is the subject of NIST SP 800-184, “Guide for Cybersecurity Event Recovery” (2016).

Figure 5-7 NIST Cybersecurity Framework
[Stage labels: Identify, Protect, Detect, Respond, Recover. Timeline labels: identify; protect; detect cyber event; respond to cyber event; remediate root cause; tactical recovery phase; strategic recovery phase. Bracket label: Guide for cybersecurity event recovery.]

It is not difficult to map the phases shown in Figure 5-6 to those of Figure 5-7. Within the CSF, the five stages shown in Figure 5-7 include the following:
• Identify—Relates to risk management and governance
• Protect—Relates to implementation of effective security controls (policy, education, training and awareness,
and technology)
• Detect—Relates to the identification of adverse events
• Respond—Relates to reacting to an incident
• Recover—Relates to putting things “as they were before” the incident12

The Detect, Respond, and Recover stages directly relate to NIST’s IR strategy, as described in detail in SP 800-61,
Rev. 2.

For more information on the NIST Cybersecurity Framework, download the Framework for Improving Critical Infrastructure Cybersecurity from www.nist.gov/sites/default/files/documents/cyberframework/cybersecurity-framework-021214.pdf.

Incident Response Policy


An important early step for the CSIRT is to develop an IR policy. NIST’s SP 800-61, Rev. 2, “The Computer Security
Incident Handling Guide,” identifies the following key components of a typical IR policy:

• Statement of management commitment
• Purpose and objectives of the policy
• Scope of the policy (to whom and what it applies and under what circumstances)
• Definition of InfoSec incidents and related terms
• Organizational structure and definition of roles, responsibilities, and levels of authority; should include the authority of the incident response team to confiscate or disconnect equipment and to monitor suspicious activity, the requirements for reporting certain types of incidents, the requirements and guidelines for external communications and information sharing (e.g., what can be shared with whom, when, and over what channels), and the handoff and escalation points in the incident management process
• Prioritization or severity ratings of incidents
• Performance measures
• Reporting and contact forms13

IR policy, like all policies, must gain the full support of top management and be clearly understood by all affected parties. It is especially important to gain the support of communities of interest that will be required to alter business practices or make changes to their IT infrastructures. For example, if the CSIRT determines that the only way to stop a massive denial-of-service attack is to sever the organization’s connection to the Internet, it should have the approved permission stored in an appropriate and secure location before authorizing such action. This ensures that the CSIRT is performing authorized actions and protects both the CSIRT members and the organization from misunderstanding and potential liability.

IR policy
The policy document that guides the development and implementation of IR plans and the formulation and performance of IR teams.

Incident Response Planning


If the IR plan is not adequate to deal with the situation, it would be necessary to initiate the DR plan and the BC plan,
both of which are discussed later in this module. When one of the threats that were discussed in Modules 1 and 2 is
made manifest in an actual adverse event, it is classified as an InfoSec incident, but only if it has all of the following
characteristics:
• It is directed against information assets.
• It has a realistic chance of success.
• It threatens the confidentiality, integrity, or availability of information resources and assets.
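The three-part test above amounts to a logical AND: an adverse event counts as an InfoSec incident only if every characteristic holds. A minimal sketch, with hypothetical class and field names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdverseEvent:
    """Answers to the three questions asked of an adverse event."""
    directed_at_information_assets: bool
    realistic_chance_of_success: bool
    threatens_cia: bool  # confidentiality, integrity, or availability


def is_infosec_incident(event: AdverseEvent) -> bool:
    """An event is an InfoSec incident only when all three characteristics hold."""
    return all(
        (
            event.directed_at_information_assets,
            event.realistic_chance_of_success,
            event.threatens_cia,
        )
    )


if __name__ == "__main__":
    # Example: an attack aimed at information assets that has no realistic
    # chance of success does not meet the definition.
    blocked_attempt = AdverseEvent(True, False, True)
    print(is_infosec_incident(blocked_attempt))
```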

The prevention of threats and attacks has been intentionally omitted from this discussion because guarding
against such possibilities is primarily the responsibility of the InfoSec department, which works with the rest of the
organization to implement sound policy, effective risk controls, and ongoing training and awareness programs. It is
important to understand that IR is a reactive measure, not a preventive one, although most IR plans include preventative
recommendations.
The responsibility for creating an organization’s IR plan usually falls to the CIO, the CISO, or an IT manager with
security responsibilities. With the aid of other managers and systems administrators on the CP team, the CISO should
select members from each community of interest to form an independent IR team, which executes the IR plan. The
roles and responsibilities of IR team members should be clearly documented and communicated throughout the
organization. The IR plan also includes an alert roster, which lists certain critical individuals and organizations to be
contacted during the course of an incident.
Using the multistep CP process discussed in the previous section as a model, the CP team can create the IR plan.
According to NIST SP 800-61, Rev. 2, the IR plan should include the following elements:
• Mission
• Strategies and goals
• Senior management approval
• Organizational approach to incident response
• How the incident response team will communicate with the rest of the organization and with other
organizations
• Metrics for measuring incident response capability and its effectiveness
• Roadmap for maturing incident response capability
• How the program fits into the overall organization14
During this planning process, the IR procedures take shape. For every incident scenario, the CP team creates three
sets of incident handling procedures:
1. During the incident—The planners develop and document the procedures that must be performed during the
incident. These procedures are grouped and assigned to individuals. Systems administrators’ tasks differ from
managerial tasks, so members of the planning committee must draft a set of function-specific procedures.
2. After the incident—Once the procedures for handling an incident are drafted, the planners develop and
document the procedures that must be performed immediately after the incident has ceased. Again,
separate functional areas may develop different procedures.
3. Before the incident—The planners draft a third set of procedures: those tasks that must be performed to prepare for the incident, including actions that could mitigate any damage from the incident. These procedures include details of the data backup schedules, disaster recovery preparation, training schedules, testing plans, copies of service agreements, and BC plans, if any. At this level, the BC plan could consist just of additional material about a service bureau that stores data off-site via electronic vaulting, with an agreement to provide office space and lease equipment as needed.

IR procedures
Detailed, step-by-step methods of preparing, detecting, reacting to, and recovering from an incident.

Planning for an incident and the responses to it requires a detailed understanding of the information systems and the
threats they face. The BIA provides the data used to develop the IR plan. The IRPT seeks to develop a series of predefined
responses that will guide the CSIRT and InfoSec staff through the IR process. Predefining incident responses enables the
organization to react to a detected incident quickly and effectively, without confusion or wasted time and effort.
The execution of the IR plan typically falls to the CSIRT. As noted previously, the CSIRT is a separate group from the IRPT,
although some overlap may occur; the CSIRT is composed of technical and managerial IT and InfoSec professionals who are
prepared to diagnose and respond to an incident. In some organizations, the CSIRT may simply be a loose or informal association
of IT and InfoSec staffers who would be called if an attack were detected on the organization’s information assets. In other,
more formal implementations, the CSIRT is a set of policies, procedures, technologies, people, and data put in place to prevent,
detect, react to, and recover from an incident that could potentially damage the organization’s information. At some level, all
members of an organization are members of the CSIRT, because every action they take can cause or avert an incident.
The CSIRT should be available for contact by anyone who discovers or suspects that an incident involving the
organization has occurred. One or more team members, depending on the magnitude of the incident and availability
of personnel, then handle the incident. The incident handlers analyze the incident data, determine the impact of the
incident, and act appropriately to limit the damage to the organization and restore normal services. Although the
CSIRT may have only a few members, the team’s success depends on the participation and cooperation of individuals
throughout the organization.
The CSIRT consists of professionals who can handle the systems and functional areas affected by an incident. For
example, imagine a firefighting team responding to an emergency call. Rather than responding to the fire as individuals,
every member of the team has a specific role to perform, so that the team acts as a unified body that assesses the
situation, determines the appropriate response, and coordinates the response. Similarly, each member of the IR team
must know his or her specific role, work in concert with other team members, and execute the objectives of the IR plan.
Incident response actions can be organized into three basic phases:

• Detection—Recognition that an incident is under way
• Reaction—Responding to the incident in a predetermined fashion to contain and mitigate its potential damage (the new NIST CSF refers to this stage as “Respond” in its Detect, Respond, Recover approach)
• Recovery—Returning all systems and data to their state before the incident

Table 5-3 shows the incident handling checklist from NIST SP 800-61, Rev. 2.

Data Protection in Preparation for Incidents


An organization has several options for protecting its information and getting operations up and running quickly after
an incident:

• Traditional data backups—The organization can use a combination of on-site and off-site tape-drive, hard-drive, and cloud backup methods, in a variety of rotation schemes; because the backup point is sometime in the past, recent data is potentially lost. Most common data backup schemes involve a redundant array of independent disks (RAID) or disk-to-disk-to-cloud methods.
• Electronic vaulting—The organization can employ bulk batch transfer of data to an off-site facility, usually via leased lines or secure Internet connections. The receiving server archives the data as it is received. Some DR companies specialize in electronic vaulting services.
• Remote journaling—The organization can transfer live transactions to an off-site facility. Remote journaling differs from electronic vaulting in two ways: (1) Only transactions are transferred, not archived data, and (2) the transfer takes place online and much closer to real time. While electronic vaulting is akin to a traditional backup, with a dump of data to the off-site storage, remote journaling involves online activities on a systems level, much like server fault tolerance, where data is written to two locations simultaneously.
• Database shadowing—The organization can store duplicate online transaction data, along with duplicate databases, at the remote site on a redundant server; database shadowing combines electronic vaulting with remote journaling by writing multiple copies of the database simultaneously to two separate locations.

electronic vaulting
A backup strategy that transfers data in bulk batches to an off-site facility.

remote journaling
A backup strategy that transfers only transaction data in near real time to an off-site facility.

database shadowing
A backup strategy that transfers duplicate online transaction data and duplicate databases to a remote site on a redundant server, combining electronic vaulting with remote journaling by writing multiple copies of the database simultaneously to two locations.

Table 5-3 Incident Handling Checklist from NIST SP 800-61, Rev. 2

Action | Completed
Detection and Analysis
1. Determine whether an incident has occurred |
1.1 Analyze the precursors and indicators |
1.2 Look for correlating information |
1.3 Perform research (e.g., search engines, knowledge base) |
1.4 As soon as the handler believes an incident has occurred, begin documenting the investigation and gathering evidence |
2. Prioritize handling the incident based on the relevant factors (functional impact, information impact, recoverability effort, etc.) |
3. Report the incident to the appropriate internal personnel and external organizations |
Containment, Eradication, and Recovery
4. Acquire, preserve, secure, and document evidence |
5. Contain the incident |
6. Eradicate the incident |
6.1 Identify and mitigate all vulnerabilities that were exploited |
6.2 Remove malware, inappropriate materials, and other components |
6.3 If more affected hosts are discovered (e.g., new malware infections), repeat the Detection and Analysis steps (1.1, 1.2) to identify all other affected hosts, then contain (5) and eradicate (6) the incident for them |
7. Recover from the incident |
7.1 Return affected systems to an operationally ready state |
7.2 Confirm that the affected systems are functioning normally |
7.3 If necessary, implement additional monitoring to look for future related activity |
Post-Incident Activity
8. Create a follow-up report |
9. Hold a lessons learned meeting (mandatory for major incidents, optional otherwise). While not explicitly noted in the NIST document, most organizations will document the findings from this activity and use it to update relevant plans, policies, and procedures. |
Source: NIST SP 800-61, Rev. 2.
Industry recommendations for data backups include the “3-2-1 backup rule,” which encourages maintaining three copies of important data (the original and two backup copies) on at least two different media (like local hard drives and cloud backup), with at least one copy stored off-site. Other recommendations include daily backups that are stored on-site and a weekly backup stored off-site.

3-2-1 backup rule
A backup strategy that recommends the creation of at least three copies of critical data (the original and two copies) on at least two different media, with at least one copy stored off-site.
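The 3-2-1 backup rule is easy to express as a check over an inventory of data copies. This is a minimal sketch; the `DataCopy` type, its fields, and the media labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of the data: the original or a backup (illustrative fields)."""
    medium: str     # e.g., "local hard drive", "cloud", "tape"
    off_site: bool


def satisfies_3_2_1(copies) -> bool:
    """Check the rule: >= 3 copies, >= 2 different media, >= 1 copy off-site.

    `copies` must include the original as well as every backup.
    """
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.off_site for c in copies)
    )


if __name__ == "__main__":
    inventory = [
        DataCopy("local hard drive", off_site=False),  # the original
        DataCopy("local hard drive", off_site=False),  # on-site backup
        DataCopy("cloud", off_site=True),              # off-site backup
    ]
    print(satisfies_3_2_1(inventory))
```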

Detecting Incidents

The challenge for every IR team is determining whether an event is the product of routine systems use or an actual incident. Incident classification involves reviewing each adverse event that has the potential to escalate into an incident and determining whether it constitutes an actual incident and thus should trigger the IR plan. Classifying an incident is the responsibility of the CSIRT, unless the organization has deployed a security operations center (SOC) with individuals trained to perform this task prior to notifying the CSIRT and activating the IR plan. Initial reports from end users, intrusion detection systems, host- and network-based virus detection software, and systems administrators are all ways to detect, track, and classify adverse events. Careful training in the reporting of an adverse event allows end users, help-desk staff, and all security personnel to relay vital information to the IR team. But, no matter how well trained the team is, event data that flows in an endless stream from hundreds or thousands of network devices and system components requires automated tools for collection and screening. Later modules describe processes for event log data collection, analysis, and event detection using intrusion detection and prevention systems as well as security information and event management systems. For now, let’s say that once an actual incident is properly identified and classified, members of the IR team can effectively execute the corresponding procedures from the IR plan. This is the primary purpose of the first phase of IR: incident detection.

incident classification
The process of examining an adverse event or incident candidate and determining whether it constitutes an actual incident.

incident detection
The identification and classification of an adverse event as an incident, accompanied by the notification of the CSIRT and the activation of the IR reaction phase.

Several occurrences could signal an incident. Unfortunately, these same events can result from an overloaded
network, computer, or server, and some are similar to the normal operation of these information assets. Other incidents
mimic the actions of a misbehaving computing system, software package, or other less serious threat. To help
make incident detection more reliable, renowned security consultant Donald Pipkin has identified three categories of
incident indicators: possible, probable, and definite.15

Possible Indicators
The following types of incident candidates are considered possible indicators of actual incidents:

• Presence of unfamiliar files—Users might discover unfamiliar files in their home directories or on their office
computers. Administrators might also find unexplained files that do not seem to be in a logical location or are
not owned by an authorized user.
• Presence or execution of unknown programs or processes—Users or administrators might detect unfamiliar pro-
grams running, or processes executing, on office machines or network servers. Users should become familiar
with accessing running programs and processes (usually through the Windows Task Manager shown in Figure
5-8) so they can detect rogue instances.
• Unusual consumption of computing resources—An example would be a sudden spike or fall in consumption of
memory or hard disk space. Many computer operating systems, including Windows, Linux, and UNIX variants,
allow users and administrators to monitor CPU and memory consumption. The Windows Task Manager has a
Performance tab that provides this information, also shown in Figure 5-8. Most computers also have the ability
to monitor hard drive space. In addition, servers maintain logs of file creation and storage.
• Unusual system crashes—Computer systems can crash. Older operating systems running newer programs are
notorious for locking up or spontaneously rebooting whenever the operating system is unable to execute a
requested process or service. You are probably familiar with system error messages such as “Unrecoverable
Application Error,” “General Protection Fault,” and the infamous Windows “Blue Screen of Death.” However,
if a computer system seems to be crashing, hanging, rebooting, or freezing more frequently than usual, the
cause could be an incident candidate.

Probable Indicators
The following types of incident candidates are considered probable indicators of actual incidents:
• Activities at unexpected times—If traffic levels on the organization’s network exceed the measured baseline
values, an incident candidate is probably present. If this activity surge occurs outside normal business hours,
the probability becomes much higher. Similarly, if systems are accessing drives and otherwise indicating high
activity when employees aren’t using them, an incident may also be occurring.

Source: Microsoft.
Figure 5-8 Windows Task Manager showing processes (left) and services (right)

• Presence of new accounts—Periodic review of user accounts can reveal accounts that the administrator does
not remember creating or that are not logged in the administrator’s journal. Even one unlogged new account
is an incident candidate. An unlogged new account with root or other special privileges has an even higher
probability of being an actual incident.
• Reported attacks—If users of the system report a suspected attack, there is a high probability that an incident
has occurred, whether it was an attack or not. The technical sophistication of the person making the report
should be considered. If systems administrators are reporting attacks, odds are that additional attacks are
occurring throughout the organization.
• Notification from an IDPS—If the organization has installed and correctly configured a host- or network-based
intrusion detection and prevention system (IDPS), then a notification from the IDPS indicates that an incident
might be in progress. However, IDPSs are difficult to configure perfectly, and even when they are, they tend to
issue false positives or false alarms. The administrator must then determine whether the notification is real
or the result of a routine operation by a user or other administrator.
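
The first probable indicator, activity at unexpected times, lends itself to a simple automated check. The following is a minimal sketch under stated assumptions: the business hours, the 25 percent tolerance, and the function names are hypothetical values chosen for illustration, not figures from this module. An organization would substitute its own measured baseline and working schedule.

```python
from datetime import datetime, time

# Hypothetical settings; a real organization would derive these from its own measurements.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)
TOLERANCE = 1.25  # flag traffic more than 25% above the measured baseline


def in_business_hours(moment):
    """True on Monday through Friday between BUSINESS_START and BUSINESS_END."""
    return moment.weekday() < 5 and BUSINESS_START <= moment.time() < BUSINESS_END


def assess_traffic(observed, baseline, moment):
    """Label a traffic measurement relative to its measured baseline."""
    if observed <= baseline * TOLERANCE:
        return "within baseline"
    if in_business_hours(moment):
        return "probable indicator"
    # A surge outside business hours makes an incident more likely.
    return "probable indicator, outside business hours"
```

A result other than "within baseline" is still only an incident candidate; an analyst must confirm it before the IR plan is activated.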

Definite Indicators
The following five types of incident candidates are definite indicators of an actual incident. That is, they clearly signal
that an incident is in progress or has occurred. In these cases, the IR plan must be activated immediately, and appro-
priate measures must be taken by the CSIRT.

• Use of dormant accounts—Many network servers maintain default accounts, and there are often accounts
from former employees, employees on a leave of absence or sabbatical without remote access privileges,
or dummy accounts set up to support system testing. If any of these accounts activate and begin accessing
system resources, querying servers, or engaging in other activities, an incident is certain to have occurred.
• Changes to logs—Smart systems administrators back up system logs as well as system data. As part of a routine
incident scan, systems administrators can compare these logs to the online versions to determine whether
they have been modified. If they have, and the systems administrator cannot determine explicitly that an
authorized individual modified them, an incident has occurred.
• Presence of hacker tools—Network administrators sometimes use system vulnerability and network evaluation
tools to scan internal computers and networks to determine what a hacker can see. These tools are also used
to support research into attack profiles. All too often, however, they are used by individuals with local network
access to hack into systems or just “look around.” To combat this problem, many organizations explicitly prohibit
the use of these tools without permission from the CISO, making any unauthorized installation a policy violation.
Most organizations that engage in penetration testing require that all tools in this category be confined to specific
systems and that they not be used on the general network unless active penetration testing is under way. Finding
hacker tools, or even legal security tools, in places they should not be is an indicator that an incident has occurred.
• Notifications by partner or peer—If a business partner or another integrated organization reports an attack
from your computing systems, then an incident has occurred. It’s quite common for an attacker to use a third
party’s conscripted systems to attack another system rather than attacking directly.
• Notification by hacker—Some hackers enjoy taunting their victims. If an organization’s Web pages are defaced,
it is an incident. If an organization receives an extortion request for money in exchange for its stolen data, an
incident is in progress. Note that even if an actual attack has not occurred—for example, the hacker is just
making an empty threat—the reputational risk is real and should be treated as such.
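
The three indicator categories can be expressed as a small triage routine. The following is a minimal sketch, not Pipkin’s own method or any product’s API: the short indicator names, the `Category` enumeration, and both functions are hypothetical labels chosen for illustration, and the mapping simply restates the lists in this section.

```python
from enum import IntEnum


class Category(IntEnum):
    """Indicator categories, ordered by how strongly they signal an incident."""
    POSSIBLE = 1
    PROBABLE = 2
    DEFINITE = 3


# Hypothetical short names for the indicators listed in this section.
INDICATOR_CATEGORY = {
    "unfamiliar_files": Category.POSSIBLE,
    "unknown_processes": Category.POSSIBLE,
    "unusual_resource_consumption": Category.POSSIBLE,
    "unusual_system_crashes": Category.POSSIBLE,
    "activity_at_unexpected_times": Category.PROBABLE,
    "new_accounts": Category.PROBABLE,
    "reported_attack": Category.PROBABLE,
    "idps_notification": Category.PROBABLE,
    "dormant_account_use": Category.DEFINITE,
    "log_changes": Category.DEFINITE,
    "hacker_tools_present": Category.DEFINITE,
    "partner_notification": Category.DEFINITE,
    "hacker_notification": Category.DEFINITE,
}


def triage(observed):
    """Return the strongest category among the observed indicators, or None."""
    categories = [INDICATOR_CATEGORY[name] for name in observed
                  if name in INDICATOR_CATEGORY]
    return max(categories) if categories else None


def must_activate_ir_plan(observed):
    """Definite indicators call for immediate activation of the IR plan."""
    return triage(observed) == Category.DEFINITE
```

For example, `triage(["unfamiliar_files", "new_accounts"])` returns `Category.PROBABLE`, because the strongest indicator observed determines the category.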

Potential Incident Results


The situations described in the following list may simply be caused by the abnormal performance of a misbehaving IT
system. However, because accidental and intentional incidents can lead to the following results, organizations should
err on the side of caution and treat every adverse event as if it could evolve into an actual incident:

• Loss of availability—Information or information systems become unavailable.


• Loss of integrity—Users report corrupt data files, garbage where data should be, or data that just looks wrong.
• Loss of confidentiality—There is a notification of a sensitive information leak, or information that was thought
to be protected has been disclosed.
• Violation of policy—There is a violation of organizational policies addressing information or InfoSec.
• Violation of law or regulation—The law has been broken and the organization’s information assets are involved.

Reacting to Incidents
Once an actual incident has been confirmed and properly classified, the IR plan moves from the detection phase
to the reaction phase. NIST SP 800-61, Rev. 2, combines the reaction and recovery phases into its “Containment,
Eradication, and Recovery” phase, but the phases are treated separately as “Respond” and “Recover” under the
new CSF.16
The steps in IR are designed to stop the incident, mitigate its effects, and provide information for recovery from
the incident. In the Reaction or Response phase, several action steps taken by the CSIRT and others must occur quickly
and may take place concurrently. An effective IR plan prioritizes and documents these steps to allow for efficient refer-
ence during an incident. These steps include notification of key personnel, documentation of the incident, determining
containment options, and escalation of the incident if needed.

Notification of Key Personnel


As soon as the CSIRT determines that an incident is in progress, the right people must be notified in the right order.
Most “reaction” organizations, such as firefighters or the military, use an alert roster for just such a situation. Organiza-
tions can adopt this approach to ensure that appropriate personnel are notified in the event of an incident or disaster.

alert roster
A document that contains contact information for personnel to be notified in the event of an incident
or disaster.

There are two ways to activate an alert roster: sequentially and hierarchically. A sequential roster requires that
a designated contact person initiate contact with each and every person on the roster using the identified method. A
hierarchical roster requires that the first person initiate contact with a specific number of designated people on the
roster, who in turn contact other designated people, and so on. Each approach has advantages and disadvantages.
The hierarchical system is quicker because more people are making contacts at the same time, but the message
can become distorted as it is passed from person to person. A hierarchical system
can also suffer from a break in the chain if people can’t reach all of the employees
they’re supposed to contact. In that situation, everyone “downstream” may not
be notified. The sequential system is more accurate, but slower because a single
contact person must contact each recipient and deliver the message. Fortunately,
many automated systems are available to facilitate either approach.
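
The trade-off between the two roster types can be illustrated with a short simulation. This is a sketch under simplifying assumptions, not a model of any real notification product: every call takes one time slot, the first roster entry is the person who starts the process, the hierarchical call tree is stored in breadth-first order, and the function names are hypothetical.

```python
from collections import deque


def sequential_notify(roster, unreachable=frozenset()):
    """The first person on the roster calls everyone else, one call per time slot.

    Returns (people notified, time slots used).
    """
    if not roster:
        return [], 0
    notified = [roster[0]] + [p for p in roster[1:] if p not in unreachable]
    return notified, len(roster) - 1


def hierarchical_notify(roster, fan_out, unreachable=frozenset()):
    """Each notified person calls up to `fan_out` designated people.

    The person at index i calls the people at indices
    i * fan_out + 1 through i * fan_out + fan_out, one call per time slot.
    Returns (people notified, time slot in which the last person was reached).
    """
    if not roster:
        return [], 0
    notified_at = {roster[0]: 0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        caller_time = notified_at[roster[i]]
        first = i * fan_out + 1
        last = min(first + fan_out, len(roster))
        for slot, j in enumerate(range(first, last), start=1):
            if roster[j] in unreachable:
                continue  # break in the chain: this person's contacts are never called
            notified_at[roster[j]] = caller_time + slot
            queue.append(j)
    return list(notified_at), max(notified_at.values())
```

With a seven-person roster and a fan-out of two, the hierarchical roster finishes in four time slots while the sequential roster needs six. If the second person cannot be reached, however, the hierarchical roster never reaches that person’s two contacts, whereas the sequential roster still reaches everyone else.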

For more information on selecting an automated notification system, read the article by Steven Ross on
TechTarget’s page at https://searchdisasterrecovery.techtarget.com/feature/Selecting-an-automated-notification-system-for-data-center-disasters.

alert message
A description of the incident or disaster that usually contains just enough information so that each
person knows what portion of the IR or DR plan to implement without slowing down the notification
process.

The alert roster is used to deliver the alert message, which tells each team
member his or her expected task and situation. It provides just enough information
so that each responder, CSIRT or otherwise, knows what portion of the IR plan to
implement without impeding the notification process. It is important to recognize
that not everyone is on the alert roster, only individuals who must respond to an
actual incident. As with any part of the IR plan, the alert roster must be regularly
maintained, tested, and rehearsed if it is to remain effective.
During this phase, other key personnel not on the alert roster, such as general
management, must be notified of the incident as well. This notification should occur
only after the incident has been confirmed but before media or other external sources learn of it. Among those likely to
be included in the notification process are members of the legal, communications, and human resources departments.
In addition, some incidents are disclosed to the employees in general as a lesson in security, and some are not, as a
measure of security. Furthermore, other organizations may need to be notified if it is determined that the incident is
not confined to internal information resources or is part of a larger-scale assault. Distributed denial-of-service attacks
are an example of this type of general assault against the cyber infrastructure. In general, the IR planners should
determine in advance whom to notify and when, and should offer guidance about additional notification steps to
take as needed.

Documenting an Incident
As soon as an incident has been confirmed and the notification process is under way, the team should begin to docu-
ment it. The documentation should record the who, what, when, where, why, and how of each action taken while the
incident is occurring. This documentation serves as a case study after the fact to determine whether the right actions
were taken and whether they were effective. It can also help demonstrate that the organization did everything possible
to prevent the spread of the incident.
Legally, the standards of due care may offer some protection to the organization if an incident adversely affects
individuals inside and outside the organization, or if it affects other organizations that use the target organization’s
systems. Incident documentation can also be used as a simulation in future training sessions with the IR plan.
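
One way to capture the who, what, when, where, why, and how of each action is a structured, time-stamped record. The following is a minimal sketch; the class names, field names, and JSON layout are assumptions made for illustration and do not come from any standard or from this module.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Current time in UTC as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ActionRecord:
    """One action taken during the incident; frozen so entries cannot be edited later."""
    who: str
    what: str
    where: str
    why: str
    how: str
    when: str = field(default_factory=_utc_now)


class IncidentLog:
    """Append-only list of actions for a single incident."""

    def __init__(self, incident_id):
        self.incident_id = incident_id
        self._records = []

    def record(self, **details):
        """Create a time-stamped entry and append it to the log."""
        entry = ActionRecord(**details)
        self._records.append(entry)
        return entry

    def to_json(self):
        """Serialize the log for later review or training use."""
        return json.dumps(
            {"incident_id": self.incident_id,
             "actions": [asdict(r) for r in self._records]},
            indent=2,
        )
```

Recording times in UTC avoids ambiguity when responders work in different time zones, and the serialized log can later be reused as a training case.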

Incident Containment Strategies


One of the most critical components of IR is stopping the incident and containing its scope or impact. Incident
containment strategies vary depending on the incident and on the amount of damage caused. Before an incident
can be stopped or contained, however, the affected areas must be identified. Now is not the time to conduct a
detailed analysis of the affected areas; that task is typically performed after the fact, in the forensics process.
Instead, simple identification of what information and systems are involved determines the containment actions
to be taken. Incident containment strategies focus on two tasks: stopping the incident and recovering control of
the affected systems.
The CSIRT can stop the incident and attempt to recover control by means of several strategies. If the incident
originates outside the organization, the simplest and most straightforward approach is to disconnect the affected com-
munication circuits. Of course, if the organization’s lifeblood runs through that circuit, this step may be too drastic; if
the incident does not threaten critical functional areas, it may be more feasible to monitor the incident and contain it
another way. One approach used by some organizations is to apply filtering rules dynamically to limit certain types of
network access. For example, if a threat agent is attacking a network by exploiting a vulnerability in the Simple Network
Management Protocol (SNMP), then applying a blocking filter on the ports commonly used by SNMP (UDP 161 and 162)
can stop the attack without compromising other services on the network. Depending on the nature of the attack and
the organization’s technical capabilities, using ad hoc controls can sometimes buy valuable time to devise a more
permanent control strategy. Typical containment strategies include the following:
• Disabling compromised user accounts
• Reconfiguring a firewall to block the problem traffic
• Temporarily disabling compromised processes or services
• Taking down the conduit application or server—for example, the e-mail server
• Disconnecting affected networks or network segments
• Stopping (powering down) all computers and network devices

Obviously, the final strategy is used only when all system control has been lost and the only hope is to preserve
the data stored on the computers so that operations can resume normally once the incident is resolved. The CSIRT,
following the procedures outlined in the IR plan, determines the length of the interruption.
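
The dynamic filtering approach described earlier in this section can be sketched as a temporary block rule layered on an otherwise permissive filter. This is an illustration only, not the syntax of any real firewall: the `Packet` and `BlockRule` types and the `allowed` function are hypothetical. The sketch relies on the fact that SNMP conventionally uses UDP port 161 for queries and UDP port 162 for traps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str   # "udp" or "tcp"
    dst_port: int


@dataclass(frozen=True)
class BlockRule:
    protocol: str
    dst_ports: frozenset

    def matches(self, packet):
        """True when the packet uses this rule's protocol and one of its ports."""
        return packet.protocol == self.protocol and packet.dst_port in self.dst_ports


# Temporary containment rule for an attack that exploits SNMP.
SNMP_BLOCK = BlockRule(protocol="udp", dst_ports=frozenset({161, 162}))


def allowed(packet, block_rules):
    """Default-allow filter: drop only traffic that matches a temporary block rule."""
    return not any(rule.matches(packet) for rule in block_rules)
```

Because the rule matches only UDP traffic to those two ports, other services, such as Web traffic on TCP port 443, continue to pass while the team devises a more permanent control.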
Consider what would happen during an incident if key personnel were on sick leave, on vacation, or otherwise not
at work. Think of how many people in your class or office are not there on a regular basis. Many businesses require
travel, with employees going off-site to meetings, seminars, or training, and to fulfill other diverse requirements. In
addition, “life happens”—employees are sometimes absent due to illness, injury, routine medical activities, and other
unexpected events. In considering these possibilities, the importance of preparedness becomes clear. Everyone should
know how to react to an incident, not just the CISO and security administrators.

Incident Escalation
An incident may increase in scope or severity to the point that the IR plan cannot adequately handle it. An important
part of knowing how to handle an incident is knowing at what point to escalate it to a disaster, or to transfer the incident
to an outside authority such as law enforcement or some other public response unit. During the BIA, each organiza-
tion will have to determine the point at which an incident is deemed a disaster. These criteria must be included in the
IR plan. The organization must also document when to involve outside responders, as discussed in other sections.
Escalation is one of those things that, once done, cannot be undone, so it is important to know when and where it
should be used.

Recovering from Incidents


Once the incident has been contained and system control has been regained, incident recovery can begin. As in the
incident reaction phase, the first task is to notify the appropriate personnel. Almost simultaneously, the CSIRT
must assess the full extent of the damage to determine what must be done to restore the systems. Everyone involved
should begin recovery operations based on the appropriate incident recovery section of the IR plan. NIST SP 800-184,
“Guide for Cybersecurity Event Recovery,” contains a detailed methodology for recovering from security incidents.
The CSIRT uses a process called incident damage assessment to determine the impact of a breach
of confidentiality, integrity, or availability on information and information assets. The assessment should begin
immediately, but it can take days or weeks, depending on the extent of the damage. The damage can range from minor,
such as when a curious hacker snoops around, to a more severe case in which hundreds of computer systems are infected by malware.
System logs, intrusion detection logs, configuration logs, and other documents, as well as the documentation from
the incident response, provide information on the type, scope, and extent of damage. Using this information, the CSIRT
assesses the current state of the data and systems and compares it to a known state. Individuals who document the
damage from actual incidents must be trained to collect and preserve evidence in case the incident is part of a crime
or results in a civil action.
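
Comparing the current state of a system to a known state is often done with cryptographic hashes. The following is a minimal sketch of that idea, assuming a baseline snapshot of file digests was recorded before the incident; the function names are hypothetical, and a real assessment would also cover configurations, accounts, and database contents. For evidentiary purposes, a comparison like this should be run against a forensic copy rather than the original system.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file under `root` (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def compare_to_baseline(baseline, current):
    """Report files added, removed, or modified relative to the known state."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(name for name in set(baseline) & set(current)
                           if baseline[name] != current[name]),
    }
```

The resulting report gives the CSIRT a starting list of files to examine; it does not by itself explain who changed them or why.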
Once the extent of the damage has been determined, the recovery process begins. According to noted security
consultant and author Donald Pipkin, this process involves the following steps:17

• Identify the vulnerabilities that allowed the incident to occur and spread. Resolve them.
• Address the safeguards that failed to stop or limit the incident or were missing from the system in the first
place. Install, replace, or upgrade them.
• Evaluate monitoring capabilities (if present). Improve detection and reporting methods or install new
monitoring capabilities.
• Restore the data from backups, as needed. The IR team must understand the backup strategy used by the
organization, restore the data contained in backups, and then use the appropriate recovery processes, from
incremental backups or database journals, to recreate any data that was created or modified since the last backup.
• Restore the services and processes in use. Compromised services and processes must be examined, cleaned,
and then restored. If services or processes were interrupted while regaining control of the systems, they need
to be brought back online.
• Continuously monitor the system. If an incident happened once, it could easily happen again. Hackers fre-
quently boast of their exploits in chat rooms and dare their peers to match their efforts. If word gets out, oth-
ers may be tempted to try the same or different attacks on your systems. It is therefore important to maintain
vigilance during the entire IR process.
• Restore the confidence of the organization’s communities of interest. The CSIRT, following a recommendation
from management, may want to issue a short memorandum outlining the incident and assuring everyone that
it was handled and the damage was controlled. If the incident was minor, say so. If the incident was major or
severely damaged systems or data, reassure users that they can expect operations to return to normal as soon
as possible. The objective of this communication is to prevent panic or confusion from causing additional
disruption to the operations of the organization.
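
The backup restoration step in the list above depends on applying backups in the correct order. The following is a minimal sketch, assuming a strategy of periodic full backups followed by incremental backups; the `Backup` type and `restore_plan` function are hypothetical, and the choice of restore point (for example, a time before the compromise) is left to the IR team.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_plan(backups, restore_point):
    """Return the backups to apply, in order, to rebuild data as of `restore_point`."""
    usable = sorted((b for b in backups if b.taken_at <= restore_point),
                    key=lambda b: b.taken_at)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists at or before the restore point")
    base = fulls[-1]  # the most recent full backup is the starting point
    incrementals = [b for b in usable
                    if b.kind == "incremental" and b.taken_at > base.taken_at]
    return [base] + incrementals
```

Data created or modified after the last usable backup is not covered by this plan and must be recreated by other means, such as database journals.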

According to NIST SP 800-184, every organization should have a recovery plan (as a subset of the IR plan) to guide spe-
cific efforts after the incident has been contained. The following is the summary of recommendations from that document:

• Understand how to be prepared for resilience at all times, planning how to operate in a diminished capacity
or restore services over time based on their relative priorities.
• Identify and document the key personnel who will be responsible for defining recovery criteria and associated
plans, and ensure these personnel understand their roles and responsibilities.
• Create and maintain a list of people, process, and technology assets that enable the organization to achieve
its mission (including external resources), along with all dependencies among these assets. Document and
maintain categorizations for these assets based on their relative importance and interdependencies to enable
prioritization of recovery efforts.
• Develop comprehensive plan(s) for recovery that support the prioritizations and recovery objectives, and use
the plans as the basis of developing recovery processes and procedures that ensure timely restoration of systems
and other assets affected by future cyber events. The plan(s) should ensure that underlying assumptions
(e.g., availability of core services) will not undermine recovery, and that processes and procedures address
both technical and non-technical activity affecting people, processes, and technologies.
• Develop, implement, and practice the defined recovery processes, based upon the organization’s recovery
requirements, to ensure timely recovery team coordination and restoration of capabilities or services affected
by cyber events.
• Formally define and document the conditions under which the recovery plan is to be invoked, who has the
authority to invoke the plan, and how recovery personnel will be notified of the need for recovery activities
to be performed.
• Define key milestones for meeting intermediate recovery goals and terminating active recovery efforts.
• Adjust incident detection and response policies, processes, and procedures to ensure that recovery does not
hinder effective response (e.g., by alerting an adversary or by erroneously destroying forensic evidence).
• Develop a comprehensive recovery communications plan, and fully integrate communications considerations
into recovery policies, plans, processes, and procedures.
• Clearly define recovery communication goals, objectives, and scope, including information sharing rules and
methods. Based upon this communications plan, consider sharing actionable information about cyber threats
with relevant organizations, such as those described in NIST SP 800-150.18

Before returning to its routine duties, the CSIRT should conduct an after-action review (AAR). The AAR is an
opportunity for everyone who was involved in an incident or disaster to sit down and discuss what happened. In
an AAR, a designated person acts as a moderator and allows everyone to share what happened from his or her own
perspective, while ensuring there is no blame or finger-pointing. All team members review their actions during the
incident and identify areas where the IR plan worked, did not work, or could be improved. Once completed, the AAR
is written up and shared.
All key players review their notes and the AAR and verify that the IR documentation is accurate and precise. The
AAR allows the team to update the plan and brings the reaction team’s actions to a close. The AAR can serve as a
training case for future staff.

after-action review (AAR)
A detailed examination and discussion of the events that occurred during an incident or disaster, from
first detection to final recovery.

According to McAfee, there are 10 common mistakes that an organization’s CSIRTs make in IR:

1. Failure to appoint a clear chain of command with a specified individual in charge
2. Failure to establish a central operations center
3. Failure to “know their enemy,” as described in Modules 2 and 4
4. Failure to develop a comprehensive IR plan with containment strategies
5. Failure to record IR activities at all phases, especially help-desk tickets to detect incidents
6. Failure to document the events as they occur in a timeline
7. Failure to distinguish incident containment from incident remediation (as part of reaction)
8. Failure to secure and monitor networks and network devices
9. Failure to establish and manage system and network logging
10. Failure to establish and support effective antivirus and antimalware solutions19

NIST SP 800-61, Rev. 2, makes the following recommendations for handling incidents:
• Acquire tools and resources that may be of value during incident handling—The team will be more efficient at
handling incidents if various tools and resources are already available to them. Examples include contact lists,
encryption software, network diagrams, backup devices, digital forensic software, and port lists.
• Prevent incidents from occurring by ensuring that networks, systems, and applications are sufficiently secure—
Preventing incidents is beneficial to the organization and reduces the workload of the incident response team.
Performing periodic risk assessments and reducing the identified risks to an acceptable level are effective
in reducing the number of incidents. Awareness of security policies and procedures by users, IT staff, and
management is also very important.
• Identify precursors and indicators through alerts generated by several types of security software—Intrusion detec-
tion and prevention systems, antivirus software, and file integrity checking software are valuable for detect-
ing signs of incidents. Each type of software may detect incidents that the other types cannot, so the use of
several types of computer security software is highly recommended. Third-party monitoring services can
also be helpful.
• Establish mechanisms for outside parties to report incidents—Outside parties may want to report incidents
to the organization—for example, they may believe that one of the organization’s users is attacking them.
Organizations should publish a phone number and e-mail address that outside parties can use to report such
incidents.
• Require a baseline level of logging and auditing on all systems and a higher baseline level on all critical systems—
Logs from operating systems, services, and applications frequently provide value during incident analysis,
particularly if auditing was enabled. The logs can provide information such as which accounts were accessed
and what actions were performed.
• Profile networks and systems—Profiling measures the characteristics of expected activity levels so that changes
in patterns can be more easily identified. If the profiling process is automated, deviations from expected activ-
ity levels can be detected and reported to administrators quickly, leading to faster detection of incidents and
operational issues.
• Understand the normal behaviors of networks, systems, and applications—Team members who understand nor-
mal behavior should be able to recognize abnormal behavior more easily. This knowledge can best be gained
by reviewing log entries and security alerts; the handlers should become familiar with typical data and can
investigate unusual entries to gain more knowledge.
• Create a log retention policy—Information about an incident may be recorded in several places. Creating and
implementing a log retention policy that specifies how long log data should be maintained may be extremely
helpful in analysis because older log entries may show reconnaissance activity or previous instances of similar
attacks.
• Perform event correlation—Evidence of an incident may be captured in several logs. Correlating events among
multiple sources can be invaluable in collecting all the available information for an incident and validating
whether the incident occurred.
• Keep all host clocks synchronized—If the devices that report events have inconsistent clock settings, event
correlation will be more complicated. Clock discrepancies may also cause problems from an evidentiary
standpoint.
• Maintain and use a knowledge base of information—Handlers need to reference information quickly during
incident analysis; a centralized knowledge base provides a consistent, maintainable source of information.
The knowledge base should include general information such as data on precursors and indicators of previ-
ous incidents.
• Start recording all information as soon as the team suspects that an incident has occurred—Every step taken, from
the time the incident was detected to its final resolution, should be documented and time-stamped. Informa-
tion of this nature can serve as evidence in a court of law if legal prosecution is pursued. Recording the steps
performed can also lead to a more efficient, more systematic, and less error-prone handling of the problem.
• Safeguard incident data—This data often contains sensitive information about vulnerabilities, security
breaches, and users who may have performed inappropriate actions. The team should ensure that access to
incident data is properly restricted, both logically and physically.
• Prioritize handling of incidents based on relevant factors—Because of resource limitations, incidents should
not be handled on a first-come, first-served basis. Instead, organizations should establish written guidelines
that outline how quickly the team must respond to the incident and what actions should be performed, based
on relevant factors such as the functional and information impact of the incident and the likely recoverability
from the incident. This saves time for the incident handlers and provides a justification to management and
system owners for their actions. Organizations should also establish an escalation process for instances when
the team does not respond to an incident within the designated time.
• Include provisions for incident reporting in the organization’s incident response policy—Organizations should
specify which incidents must be reported, when they must be reported, and to whom. The parties most
commonly notified are the CIO, the head of information security, the local information security officer, other
incident response teams within the organization, and system owners.
• Establish strategies and procedures for containing incidents—It is important to contain incidents quickly and
effectively to limit their business impact. Organizations should define acceptable risks in containing incidents
and develop strategies and procedures accordingly. Containment strategies should vary based on the type
of incident.
• Follow established procedures for evidence gathering and handling—The team should clearly document how
all evidence has been preserved. Evidence should be accounted for at all times. The team should meet with
legal staff and law enforcement agencies to discuss evidence handling and then develop procedures based
on those discussions.
• Capture volatile data from systems as evidence—This data includes lists of network connections, processes, login
sessions, open files, network interface configurations, and the contents of memory. Running carefully chosen
commands from trusted media can collect the necessary information without damaging the system’s evidence.
• Obtain system snapshots through full forensic disk images, not file system backups—Disk images should be
made to sanitized write-protectable or write-once media. This process is superior to a file system backup for
investigatory and evidentiary purposes. Imaging is also valuable in that it is much safer to analyze an image
than it is to perform analysis on the original system because the analysis may inadvertently alter the original.
• Hold lessons-learned meetings after major incidents—Lessons-learned meetings are extremely helpful in improv-
ing security measures and the incident handling process itself.20
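The recommendation above to capture volatile data by running carefully chosen commands can be sketched as follows. This is only an illustration under stated assumptions: the function name `capture_volatile`, the record fields, and the example tool paths in the comments are hypothetical, and a real collection would run vetted binaries from trusted, write-protected media rather than from the suspect system.

```python
import hashlib
import subprocess
import sys
from datetime import datetime, timezone


def capture_volatile(commands):
    """Run each command once; keep its output, a UTC timestamp, and a SHA-256 digest.

    `commands` maps a label to an argument list (hypothetical structure).
    """
    records = []
    for label, argv in commands.items():
        started = datetime.now(timezone.utc).isoformat()
        # check=False: a failing tool is recorded, not allowed to abort the collection.
        result = subprocess.run(argv, capture_output=True, check=False)
        records.append({
            "label": label,
            "argv": list(argv),
            "started_utc": started,
            "exit_code": result.returncode,
            "stdout": result.stdout,
            "sha256": hashlib.sha256(result.stdout).hexdigest(),
        })
    return records


# Assumed example of a real command set (paths and flags vary by platform):
#   {"connections": ["/mnt/trusted/netstat", "-an"],
#    "processes":   ["/mnt/trusted/ps", "aux"]}

if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere Python does.
    for rec in capture_volatile({"demo": [sys.executable, "-c", "print('hello')"]}):
        print(rec["label"], rec["started_utc"], rec["sha256"])
```

Recording a digest of each output at collection time gives the team something to compare against later if the integrity of the captured data is questioned.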

Note that some of these recommendations were covered earlier in this section. CSIRT members should be very
familiar with these tools and techniques prior to an incident. Trying to use unfamiliar procedures in the middle of an
incident could prove very costly to the organization and cause more harm than good.

For more information on incident handling, read the Incident Handlers Handbook by Patrick Kral, which is available
from the SANS reading room at www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901.
You can search for other incident handling papers at www.sans.org/reading-room/whitepapers/incident/.

Organizational Philosophy on Incident and Disaster Handling


Eventually, the organization will encounter incidents and disasters that stem from an intentional attack on its infor-
mation assets by an individual or group, as opposed to an incident from an unintentional source, such as a service
outage, employee mistake, or natural disaster. At that point, the organization must choose one of two philosophies
that will affect its approach to IR and DR as well as subsequent involvement of digital forensics and law enforcement:

• Protect and forget—This approach, also known as “patch and proceed,” focuses on the defense of data and the
systems that house, use, and transmit it. An investigation that takes this approach focuses on the detection
and analysis of events to determine how they happened and to prevent reoccurrence. Once the current event is
over, the questions of who caused it and why are almost immaterial.
• Apprehend and prosecute—This approach, also known as “pursue and punish,” focuses on the identification and
apprehension of responsible individuals, with additional attention paid to the collection and preservation of
potential evidentiary material that might support administrative or criminal prosecution. This approach
requires much more attention to detail to prevent contamination of evidence that might hinder prosecution.

protect and forget
The organizational CP philosophy that focuses on the defense of information assets and preventing reoccurrence
rather than the attacker’s identification and prosecution; also known as “patch and proceed.”

apprehend and prosecute
The organizational CP philosophy that focuses on an attacker’s identification and prosecution, the defense of
information assets, and preventing reoccurrence; also known as “pursue and punish.”

An organization might find it impossible to retain enough data to successfully handle even administrative
penalties, but it should certainly adopt the apprehend-and-prosecute approach if it wants to pursue formal
punishment, especially if the employee is likely to challenge that punishment. The use of digital forensics to
aid in IR and DR when dealing with intentional attacks will be discussed later in this module, along with
information for when or if to involve law enforcement agencies.
What is shocking is how few organizations notify individuals that their personal data has been breached. Should such a
failure ever be exposed to the public, those organizations could find themselves confronted with criminal charges or
corporate negligence suits. Laws like the Sarbanes–Oxley Act of 2002 specifically implement personal ethical liability
requirements for organizational management. Failure to report loss of personal data can run directly afoul of these laws.

Viewpoint on the Causes of Incidents and Disasters


By Karen Scarfone, Principal Consultant, Scarfone Cybersecurity
The term incident has somewhat different meanings in the contexts of incident response and disaster recovery. People in
the incident response community generally think of an incident as being caused by a malicious attack and a disaster as being
caused by natural causes (fire, floods, earthquakes, etc.). Meanwhile, people in the disaster recovery community tend to use
the term incident in a cause-free manner, with the cause of the incident or disaster generally being irrelevant and the differ-
ence between the two being based solely on the scope of the event’s impact. An incident is a milder event, and a disaster is
a more serious event.
The result is that people who are deeply embedded in the incident response community often think of incident response
as being largely unrelated to disaster recovery, because they think of a disaster as being caused by a natural event, not an
attack. Incident responders also often think of operational problems, such as major service failures, as being neither incidents
nor disasters. Meanwhile, people who are deeply embedded in the disaster recovery community see incident response and
disaster recovery as being much more similar and covering a much more comprehensive range of problems.
So where does the truth lie? Well, it depends on the organization. Some organizations take a more integrated approach
to business continuity and have their incident response, disaster recovery, and other business continuity components closely
integrated with one another so that they work together fairly seamlessly. Other organizations treat these business continuity
components as more discrete elements and focus on making each element strong rather than establishing strong commonali-
ties and linkages among the components. There are pluses and minuses to each of these approaches.
Personally, I find that the most important thing is to avoid turf wars between the business continuity component teams.
There is nothing more frustrating than delaying the response to an incident or disaster because people disagree on its cause.
The security folks say it is an operational problem, the operational folks say it is a disaster, and the disaster folks say it is a
security incident. So, like a hot potato, the event gets passed from team to team while people argue about its cause. In reality,
for some problems the cause is not immediately apparent.
What is important to any organization is that each adverse event, regardless of the cause, be assessed and prioritized as
quickly as possible. That means teams need to be willing to step up and address adverse events, regardless of whether the
event is clearly their responsibility. The impact of the incident is largely unrelated to the cause. If later information shows that
a particular cause better fits a different team, the handling of the event can be transferred to the other team. Teams should
be prepared to transfer events to other teams and to receive transferred events from other teams at any time.
Responding as quickly as possible to incidents has become even more important with the increasing integration between
the cyber world and the physical world. Operational technology (OT), cyber-physical systems (CPS), and the Internet of Things
(IoT) are all driving this integration. Now an attacker can exploit cyber vulnerabilities to cause physical impacts, including over-
riding a building’s card readers and other physical security systems to gain unauthorized access and feeding crafted malicious
data into a factory’s power system in order to start a fire or cause an explosion. Delaying the response to an incident may put
human lives at unnecessary risk and ultimately lead to deaths that should have been prevented.

Digital Forensics
Whether due to a character flaw, a need for vengeance, a profit motive, or simple curiosity, an employee or outsider
may attack a physical asset or information asset. When the asset is the responsibility of the CISO, he or she is expected
to understand how policies and laws require the matter to be managed and protected. To protect the organization and
possibly assist law enforcement in an investigation, the CISO must determine what happened and how an incident
occurred. This process is called digital forensics.
Digital forensics is based on the field of traditional forensics. Made popular by scientific detective shows that focus
on crime scene investigations, forensics involves the use of science to investigate events. Not all events involve crimes;
some involve natural events, accidents, or system malfunctions. Forensics allows investigators to determine what
happened by examining the results of an event. It also allows them to determine how the event happened by examining
activities, individual actions, physical evidence, and testimony related to the event. However, forensics might not
figure out the “why” of the event; that’s the focus of psychological, sociological, and criminal justice studies.
Here, the focus is on the application of forensics techniques in the digital arena.
Digital forensics involves the preservation, identification, extraction, documentation, and interpretation of digital
media, including computer media, for evidentiary and root cause analysis. Like traditional forensics, it follows
clear, well-defined methodologies, but it still tends to be as much an art as a science. In other words, the natural
curiosity and personal skill of the investigator play a key role in discovering potential evidentiary material (EM).
An item does not become evidence until it is formally admitted by a judge or other ruling official.
Digital forensics investigators use a variety of tools to support their work, as you will learn later in this module.
However, the tools and methods used by attackers can be equally sophisticated. Digital forensics can be used for two
key purposes:

• To investigate allegations of digital malfeasance. Such an investigation requires digital forensics to gather,
analyze, and report the findings. This is the primary mission of law enforcement in investigating crimes that
involve computer technologies or online information.
• To perform root cause analysis. If an incident occurs and the organization suspects an attack was successful,
digital forensics can be used to examine the path and methodology for gaining unauthorized access, and to
determine how pervasive and successful the attack was. This type of analysis is used primarily by incident
response teams to examine their equipment after an incident.

Some investigations are undertaken by an organization’s own personnel, while others require the immediate
involvement of law enforcement. In general, whenever investigators discover evidence of a crime, they should
immediately notify management and recommend contacting law enforcement. Failure to do so could result in
unfavorable action against the investigator or organization.

digital forensics
Investigations that involve the preservation, identification, extraction, documentation, and interpretation of
computer media for evidentiary and root cause analysis, following clear, well-defined methodologies.

forensics
The coherent application of methodical investigatory techniques to present evidence of crimes in a court or
similar setting.

evidentiary material (EM)
Any information that could potentially support an organization’s legal or policy-based case against a suspect;
also known as items of potential evidentiary value.

digital malfeasance
A crime involving digital media, computer technology, or related components.

root cause analysis
The determination of the source or origin of an event, problem, or issue like an incident.
For more information on digital forensics, visit the American Society of Digital Forensics and eDiscovery at
www.asdfed.com.

The Digital Forensics Team


Most organizations cannot sustain a permanent digital forensics team; such expertise is so rarely called upon that it
may be better to collect the data and then outsource the analysis component to a regional expert. The organization can
then maintain an arm’s-length distance from the case and have additional expertise to call upon if the process ends in
court. Even so, the information security group should contain members who are trained to understand and manage the
forensics process. If the group receives a report of suspected misuse, either internally or externally, a group member
must be familiar with digital forensics procedures to avoid contaminating potential EM.
This expertise can be obtained by sending staff members to a regional or national information security conference
with a digital forensics track or to dedicated digital forensics training. The organization should use caution in selecting
training for the team or a specialist, as many forensics training programs begin with the analysis process and promote
a specific tool rather than teaching management of the process.

Affidavits and Search Warrants


Most investigations begin with an allegation or an indication of an incident. Whether via the help desk, the organiza-
tion’s sexual harassment reporting channels, or a direct report, someone alleges that a worker is performing actions
explicitly prohibited by the organization or that make another worker uncomfortable in the workplace. Alternatively, a
security analyst in the InfoSec department notes unusual system or network behavior, as described earlier in this module.
The organization’s forensics team or other authorized entity must then obtain permission to examine digital media
for potential EM. In law enforcement, the investigating agent would create an affidavit requesting permission to search
for and confiscate related EM. The affidavit summarizes the facts of the case, items relevant to the investigation, and
the location of the event. When an approving authority signs the affidavit or creates a synopsis form based on the
document, it becomes a search warrant. In corporate environments, the names of these documents may change, and
in many cases written authorization may not be needed, but the process should be the same: formal permission is
obtained before an investigation begins.

Digital Forensics Methodology


In digital forensics, all investigations follow the same basic methodology once permission for search and seizure has
been obtained:

1. Identify relevant EM.
2. Acquire (seize) the evidence without alteration or damage.
3. Take steps to ensure that the evidence is verifiably authentic at every step and is unchanged from the time
it was seized.
4. Analyze the data without risking modification or unauthorized access.
5. Report the findings to the proper authority.

This process is illustrated in Figure 5-9.

[Flowchart: Security incident triggers incident response process → Policy violation or crime detected → Prepare
affidavit seeking authorization to investigate → Investigation authorized? (No: Archive; Yes: Collect evidence) →
Analyze evidence → Produce report and submit for disposition → Archive. Figure note: "Either internal or external
to the organization."]

Figure 5-9 The digital forensics process

To support the selection and implementation of a methodology for forensics, the organization may want to seek
legal advice or consult with local or state law enforcement. Other references that should become part of the
organization’s library include the following:

• Electronic Crime Scene Investigation: A Guide for First Responders, 2nd Edition, April 2008
(https://www.ncjrs.gov/pdffiles1/nij/219941.pdf)
• Searching and Seizing Computers and Obtaining Electronic Evidence in Criminal Investigations
(www.justice.gov/criminal/cybercrime/docs/ssmanual2009.pdf)
• Scientific Working Group on Digital Evidence: Published Guidelines and Best Practices
(https://www.swgde.org/documents/published)
• First Responders Guide to Computer Forensics
(https://resources.sei.cmu.edu/asset_files/Handbook/2005_002_001_14429.pdf)
• First Responders Guide to Computer Forensics: Advanced Topics
(http://resources.sei.cmu.edu/asset_files/handbook/2005_002_001_14432.pdf)

affidavit
Sworn testimony that certain facts are in the possession of an investigating officer and that they warrant the
examination of specific items located at a specific place; the affidavit specifies the facts, the items, and the place.

search warrant
Permission to search for evidentiary material at a specified location or to seize items to return to an
investigator’s lab for examination.
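The flow in Figure 5-9 can be read as a small state machine in which evidence collection is reachable only after authorization. The sketch below is one possible way to model that ordering; the names `Stage`, `ALLOWED`, and `Investigation` are hypothetical and are not part of any forensics tool.

```python
from enum import Enum, auto


class Stage(Enum):
    DETECTED = auto()    # policy violation or crime detected
    AFFIDAVIT = auto()   # request for authorization prepared
    COLLECTING = auto()  # evidence collection, only after authorization
    ANALYZING = auto()
    REPORTED = auto()    # report produced and submitted for disposition
    ARCHIVED = auto()


# Transitions taken from the figure; the "No" branch of the authorization
# decision goes straight to the archive.
ALLOWED = {
    Stage.DETECTED: {Stage.AFFIDAVIT},
    Stage.AFFIDAVIT: {Stage.COLLECTING, Stage.ARCHIVED},
    Stage.COLLECTING: {Stage.ANALYZING},
    Stage.ANALYZING: {Stage.REPORTED},
    Stage.REPORTED: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


class Investigation:
    def __init__(self):
        self.stage = Stage.DETECTED
        self.history = [Stage.DETECTED]

    def advance(self, next_stage):
        """Move to `next_stage`, refusing any step the figure does not allow."""
        if next_stage not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.name} -> {next_stage.name} is not permitted")
        self.stage = next_stage
        self.history.append(next_stage)
```

With this model, calling `advance(Stage.COLLECTING)` on a new `Investigation` raises an error, which mirrors the requirement that formal permission comes before any search.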

Identifying Relevant Items


The affidavit or warrant that authorizes a search must identify what items of evidence can be seized and where they
are located. Only EM that fits the description on the authorization can be seized. These seizures often occur under
stressful circumstances and strict time constraints, so thorough item descriptions help the process function smoothly
and ensure that critical evidence is not overlooked. Thorough descriptions also ensure that items are not wrongly
included as EM, which could jeopardize the investigation.
Because users have access to many online server locations via free e-mail archives, FTP servers, and video
archives, and could have terabytes of information stored in off-site locations across the Web or on their local systems,
investigators must have an idea of what to look for or they may never find it.

Acquiring the Evidence


The principal responsibility of the response team is to acquire the information without altering it. Computers and
users modify data constantly. Every time someone opens, modifies, or saves a file, or even opens a directory index to
view the available files, the state of the system is changed. Normal system file changes may be difficult to explain to
a layperson—for example, a jury member with little or no technical knowledge. A normal system consequence of the
search for EM could be portrayed by a defense attorney as harmful to the EM’s authenticity or integrity, which could
lead a jury to believe the EM was planted or is otherwise unreliable.

Online Versus Offline Data Acquisition There are generally two methods of acquiring evidence from a system. The
first is the offline model, in which the investigator removes the power source and then uses a utility or special device
to make a bit-stream, sector-by-sector copy of the hard drives on the system. By copying the drives at the sector level,
you can ensure that any hidden or erased files are also captured. The copied drive then becomes the image that can
be used for analysis, and the original drive is stored for safekeeping as true EM or possibly returned to service. For
the purposes of this discussion, the term copy refers to a drive duplication technique, whereas an image is the file that
contains all the information from the source drive.
This approach requires the use of sound processes and techniques or read-only hardware known as write-blockers
to prevent the accidental overwriting of data on the source drive. The use of these tools also allows investigators
to assert that the EM was not modified during acquisition. In another offline approach, the investigator can reboot
the system with an alternate operating system or a specialty boot disk like Helix or Knoppix. Still another approach
involves specialty hardware that connects directly to a powered-down hard drive and provides direct power and data
connections to copy data to an internal drive.
In online or live data acquisition, investigators use network-based tools to acquire a protected copy of the informa-
tion. The only real difference between the two methods is that in live acquisition the source system cannot be taken
offline, so the tools must be sophisticated enough to avoid altering the system during data acquisition. Furthermore,
live data acquisition techniques may acquire data that is in motion and in an inconsistent state, with some
transactions only partially recorded. Table 5-4 lists common methods of acquiring data.
The creation of a copy or image can take a substantial amount of time. Users who have made USB copies of their
data know how much time it takes to back up several gigabytes of data. When dealing with networked server drives,
the data acquisition phase can take many hours to complete, which is one reason investigators prefer to seize drives
and take them back to the lab to be imaged or copied.
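The bit-stream, sector-by-sector copy described for the offline model can be illustrated with a short sketch. It is a simplification under stated assumptions: `bitstream_copy` and `SECTOR_SIZE` are hypothetical names, the 512-byte sector size is assumed, and real acquisitions rely on a write-blocker and dedicated imaging tools rather than a script.

```python
import io

SECTOR_SIZE = 512  # assumed sector size; many modern drives use 4096


def bitstream_copy(source, image, sector_size=SECTOR_SIZE):
    """Copy every sector from `source` to `image`, in order; return the sector count.

    Both arguments are binary file-like objects. `source` is only ever read,
    and nothing is skipped, so unallocated or "erased" regions are copied too.
    A short final read is written as-is and counted as one sector.
    """
    sectors = 0
    while True:
        block = source.read(sector_size)
        if not block:
            break
        image.write(block)
        sectors += 1
    return sectors


if __name__ == "__main__":
    # In-memory stand-in for a drive: one data sector, one empty sector,
    # and one sector holding the remnant of a deleted file.
    drive = io.BytesIO(b"A" * 512 + b"\x00" * 512 + b"deleted-file-remnant".ljust(512, b"\x00"))
    image = io.BytesIO()
    print(bitstream_copy(drive, image), "sectors copied")
```

Because the loop works below the file system, it does not care which sectors belong to live files; that is why this style of copy captures hidden or erased material that a file-level backup would miss.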

Other Potential EM Not all EM is on a suspect’s computer hard drive. A technically savvy attacker is more likely to
store incriminating evidence on other digital media, such as smartphones, removable drives, CDs, DVDs, flash drives,
memory chips or sticks, or other computers accessed across the organization’s networks or via the Internet. EM located
outside the organization is particularly problematic because the organization cannot legally search systems it doesn’t
own. However, the simple act of viewing EM on a system leaves clues about the location of the source material, and a
skilled investigator can at least provide some assistance to law enforcement when conducting a preliminary investiga-
tion. Log files are another source of information about the access and location of EM, as well as what happened and when.

Table 5-4 Summary of Methods Employed to Acquire Forensic Data

Method: Use a dedicated forensic workstation to examine a write-protected hard drive or image of the suspect
hard drive.
Advantages: No concern about the validity of software or hardware on the suspect host. Produces evidence most
easily defended in court.
Disadvantages: Inconvenient, time-consuming. May result in loss of volatile information.

Method: Boot the system using a verified, write-protected CD or other media with kernel and tools.
Advantages: Convenient, quick. Evidence is defensible if suspect drives are mounted as read-only.
Disadvantages: Assumes that hardware has not been compromised because it is much less likely than compromised
software. May result in loss of volatile information.

Method: Build a new system that contains an image of the suspect system and examine it.
Advantages: Completely replicates operating environment of suspect computer without running the risk of
changing its information.
Disadvantages: Requires availability of hardware that is identical to that on the suspect computer. May result
in loss of volatile information.

Method: Examine the system using external media with verified software.
Advantages: Convenient, quick. Allows examination of volatile information.
Disadvantages: If a kernel is compromised, results may be misleading. External media may not contain every
necessary utility.

Method: Verify the software on the suspect system, and then use the verified local software to conduct the
examination.
Advantages: Requires minimal preparation. Allows examination of volatile information. Can be performed remotely.
Disadvantages: Lack of write protection for suspect drives makes evidence difficult to defend in court. Finding
sources for hash values and verifying the local software requires at least several hours, unless Tripwire was
used ahead of time.

Method: Examine the suspect system using the software on it, without verifying the software.
Advantages: Requires least amount of preparation. Allows examination of volatile information. Can be performed
remotely.
Disadvantages: Least reliable method. This is exactly what cyberattackers are hoping you will do. Often a
complete waste of time.

Some evidence isn’t electronic or digital. Many suspects have been further incriminated when passwords to
their digital media were discovered in the margins of user manuals, in calendars and day planners, and even on notes
attached to their systems.
EM Handling Once the evidence is acquired, both the copy or image and the original drive should be handled properly
to avoid legal challenges based on authenticity and preservation of integrity. If the organization or law enforcement
cannot demonstrate that no one had access to the evidence, they cannot provide strong assurances that it has not
been altered. Such access can be physical, or it can be logical if the device is connected to a network. Once the evidence is in the
possession of investigators, they must track its movement, storage, and access until the resolution of the event or case.
This is typically accomplished through chain of evidence (also known as chain of custody) procedures. The evidence
is then tracked wherever it is located. When the evidence changes hands or is stored, the documentation is updated.
Not all evidence-handling requirements are met through the chain of custody process. Digital media must be stored
in a specially designed environment that can be secured to prevent unauthorized access. For example, individual
items might need to be stored in containers or bags that protect them from electrostatic discharge or magnetic fields.
Additional details are provided in the nearby feature on search-and-seizure procedures.
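A chain of custody record is essentially an append-only log that is updated each time the evidence changes hands or is stored. The sketch below shows one way such a log might be structured. The class `CustodyLog`, its field names, and the idea of linking each entry to the previous one by hash are assumptions made for illustration; they are not procedures described in this module.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CustodyLog:
    item_id: str
    entries: list = field(default_factory=list)

    def record(self, action, custodian, location, when=None):
        """Append one custody event, linked to the previous entry by its hash."""
        prev = self.entries[-1]["entry_hash"] if self.entries else ""
        entry = {
            "item_id": self.item_id,
            "action": action,
            "custodian": custodian,
            "location": location,
            "when_utc": (when or datetime.now(timezone.utc)).isoformat(),
            "prev_hash": prev,
        }
        entry["entry_hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _digest(entry):
        # Hash every field except the hash itself, in a stable key order.
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def is_intact(self):
        """Return True if no entry has been edited, removed, or reordered."""
        prev = ""
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["entry_hash"] != self._digest(entry):
                return False
            prev = entry["entry_hash"]
        return True
```

For example, a log for a hypothetical item `DRIVE-001` would gain one entry when the drive is seized and another when it is placed in an evidence locker; editing either entry afterward makes `is_intact()` return `False`.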

Authenticating the Recovered Evidence The copy or image is typically transferred to the laboratory for the next
stage of authentication. Using cryptographic hash tools, the team must be able to demonstrate that any analyzed
copy or image is a true and accurate replica of the source EM. As you will learn in Module 10, the hash tool takes a
variable-length file and creates a single numerical value, usually represented in hexadecimal notation, that functions
like a digital fingerprint. By hashing the source file and the copy, the investigator can assert that the copy is a true
and accurate duplicate of the source.
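The comparison described above can be sketched in a few lines. The text does not name a hash algorithm, so SHA-256 is an assumption here, and the function names `sha256_of` and `is_true_copy` are hypothetical.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a binary stream as a hexadecimal string.

    Reading in chunks keeps memory use flat even for very large images.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_true_copy(source, copy):
    """True only if both streams produce the same digest."""
    return sha256_of(source) == sha256_of(copy)


if __name__ == "__main__":
    original = b"evidence" * 1000
    altered = original[:-1] + b"X"  # a single changed byte
    print(is_true_copy(io.BytesIO(original), io.BytesIO(original)))
    print(is_true_copy(io.BytesIO(original), io.BytesIO(altered)))
```

In practice the streams would be the source drive, opened read-only behind a write-blocker, and the image file; the digest of the source is typically recorded at acquisition time so that later copies can be checked against it.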
Analyzing the Data The most complex part of an investigation is analyzing the copy or image for potential EM. While
the process can be performed manually using simple utilities, three industry-leading applications dominate the
market for digital forensics:

• Guidance Software’s EnCase (www.guidancesoftware.com)
• AccessData Forensics Tool Kit (FTK, at www.accessdata.com)
• OSForensics (www.osforensics.com)

Open-source alternatives to these rather expensive tools include Autopsy and The Sleuth Kit, which are available
from www.sleuthkit.org. Autopsy, shown in Figure 5-10, is a stand-alone GUI interface for The Sleuth Kit, which
natively uses a command-line interface. Each tool is designed to support an investigation and assist in the
management of the entire case.

chain of evidence
The detailed documentation of the collection, storage, transfer, and ownership of evidentiary material from the
crime scene through its presentation in court and its eventual disposition.

chain of custody
See chain of evidence.
Source: sleuthkit.org.

Figure 5-10 Autopsy software


General Procedures for Evidence Search and Seizure


At the crime scene, a fully qualified and authorized forensics team should be supervised as it completes the following tasks:
1. Secure the crime scene by clearing all unauthorized personnel, delimit the scene with tape or other markers,
and post a guard or other person at the entrance.
2. Log in to the crime scene by signing the entry/exit log.
3. Photograph the scene beginning at the doorway and covering the entire room in 360 degrees. Include specific photos
of potential evidentiary material.
4. Sketch the layout of the room, including furniture and equipment.
5. Following proper procedure, begin searching for physical, documentary evidence to support your case, including
papers, media such as CDs or flash memory devices, or other artifacts. Identify the location of each piece of evidence
with a marker or other designator, and cross-reference it on the sketch. Photograph the item in situ to establish its
location and state.
6. For each computer, first check for the presence of a screensaver by moving the mouse. Do not click the mouse or use
the keyboard. If the screen is active, photograph the screen. Turn off the power on permitted systems. Document
each computer by taking a photograph and providing a detailed written description of the manufacturer, model num-
ber, serial number, and other details. Using sound processes, remove each disk drive and image it using the appropri-
ate process and equipment. Document each source drive by photographing it and providing a detailed description of
the manufacturer, serial number, and other details. Package and secure the image.
7. For each object found, complete the necessary evidence or chain of custody labels.
8. Log out of the crime scene by signing the entry/exit log.
9. Transfer all evidence to the lab for investigation or to a suitable evidence locker for storage. Store and transport all
evidence, documentation, and photographic materials in a locked field evidence locker.
Analyze the image:

1. Build the case file by entering background information, including the investigator, suspect, date, time, and sys-
tem analyzed.
2. Load the image file into the case file. Typical image files have .img, .e01, or .001 extensions.
3. Index the image. Note that some systems use a database of known files to filter out files that are applications, system
files, or utilities. The use of this filter improves the quality and effectiveness of the indexing process.
4. Identify, export, and bookmark related text files by searching the index.
5. Identify, export, and bookmark related graphics by reviewing the images folder. If the suspect is accused of viewing
child pornography, do not directly view the images. Some things you can’t “unsee.” Use the database of known images
to compare hash values and tag them as suspect.
6. Identify, export, and bookmark other evidence files.
7. Integrate all exported and bookmarked material into the case report.
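Steps 3 and 5 of the analysis list both rely on comparing file hashes against a reference database: known applications and system files are filtered out, and known illicit images are tagged without being viewed. A minimal sketch of that triage follows. The function name `triage`, the use of SHA-256, and the in-memory sets standing in for the reference databases are all assumptions for illustration.

```python
import hashlib


def triage(files, known_good, known_bad):
    """Sort files into ignore / flag / review using only their hashes.

    `files` maps a file name to its bytes. `known_good` and `known_bad` are
    sets of SHA-256 hex digests that stand in for reference databases.
    """
    result = {"ignore": [], "flag": [], "review": []}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in known_bad:
            result["flag"].append(name)    # tagged as suspect without opening it
        elif digest in known_good:
            result["ignore"].append(name)  # known application or system file
        else:
            result["review"].append(name)  # unknown: left for the investigator
    return result


if __name__ == "__main__":
    good = {hashlib.sha256(b"system dll").hexdigest()}
    bad = {hashlib.sha256(b"contraband").hexdigest()}
    sample = {"a.dll": b"system dll", "b.jpg": b"contraband", "c.txt": b"notes"}
    print(triage(sample, good, bad))
```

Checking the known-bad set first is a deliberate choice in this sketch: a file that somehow appears in both sets is flagged rather than silently ignored.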

The first component of the analysis phase is indexing. During indexing, many investigatory tools create an index
of all text found on the drive, including data found in deleted files and in file slack space. This indexing is similar to
that performed by Google Desktop or Windows Desktop Search tools. The index can then be used by the investigator
to locate specific documents or document fragments. While indexing, the tools typically organize files into categories,
such as documents, images, and executables. Unfortunately, like imaging, indexing is a time- and processor-consuming
operation, and it could take days on images that are larger than 20 gigabytes.
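To make the indexing idea concrete, the sketch below builds a word index directly over the raw bytes of an image. It is a toy illustration, not how any particular commercial tool works: the name `build_index`, the four-character minimum for a text run, and the ASCII-only matching are all assumptions.

```python
import re
from collections import defaultdict

# Runs of 4 or more printable ASCII bytes are treated as text (assumed threshold).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
WORD = re.compile(rb"[A-Za-z0-9]+")


def build_index(image_bytes):
    """Map each lowercased word to the byte offsets where it occurs in the image."""
    index = defaultdict(list)
    for run in PRINTABLE_RUN.finditer(image_bytes):
        for word in WORD.finditer(run.group()):
            key = word.group().lower().decode("ascii")
            # Offsets are absolute: start of the run plus position inside it.
            index[key].append(run.start() + word.start())
    return index


if __name__ == "__main__":
    image = b"\x00\x01Invoice 2041 overdue\x00\xff\xfeslack: secret invoice\x00"
    index = build_index(image)
    print(index["invoice"])
```

Because the scan ignores file system structure, text left behind in deleted files or slack space is indexed the same way as text in live files, which is the property the paragraph above describes.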
In some cases, the investigator may find password-protected files that the suspect used to protect the data. Several
commercial password-cracking tools can assist the investigator. Some are sold in conjunction with forensics tools, like
the AccessData Password Recovery Tool Kit.

Reporting the Findings As investigators examine the analyzed copies or images and identify potential EM, they can
tag it and add it to their case files. Once they have found a suitable amount of information, they can summarize their
findings with a synopsis of their investigatory procedures in a report and submit it to the appropriate authority. This
authority could be law enforcement or management. The suitable amount of EM is a flexible determination made by the
investigator. In certain cases, like child pornography, one file is sufficient to warrant turning over the entire investigation
to law enforcement. On the other hand, dismissing an employee for the unauthorized sale of intellectual property may
require a substantial amount of information to support the organization’s assertion. Reporting methods and formats
vary among organizations and should be specified in the digital forensics policy. A general guideline is that the report
should be sufficiently detailed to allow a similarly trained person to repeat the analysis and achieve similar results.

Evidentiary Procedures
In information security, most operations focus on policies—documents that provide managerial guidance for ongoing imple-
mentation and operations. In digital forensics, however, the focus is on procedures. When investigating digital malfeasance
or performing root cause analysis, keep in mind that the results and methods of the investigation may end up in criminal
or civil court. For example, during a routine systems update, suppose that a technician finds objectionable material on an
employee’s computer. The employee is fired and promptly sues the organization for wrongful termination, so the investiga-
tion of the objectionable material comes under scrutiny by the plaintiff’s attorney, who will attempt to cast doubt on the
ability of the investigator. While technically not illegal, the presence of the material may have been a clear violation of policy,
prompting the dismissal of the employee. However, if an attorney can convince a jury or judge that someone else could have
placed the material on the plaintiff’s system, the employee could win the case and potentially a large financial settlement.
When the scenario involves criminal issues in which an employee discovers evidence of a crime, the situation
changes somewhat. The investigation, analysis, and report are typically performed by law enforcement personnel.
However, if the defense attorney can cast reasonable doubt on whether the organization’s information security profes-
sionals compromised the digital evidentiary material, the employee might win the case. How do you avoid these legal
pitfalls? Strong procedures for handling potential evidentiary material can minimize the probability that an organiza-
tion will lose a legal challenge.
Organizations should develop specific procedures, along with guidance for their effective use. The policy docu-
ment should specify the following:
• Who may conduct an investigation
• Who may authorize an investigation
• What affidavits and related documents are required
• What search warrants and related documents are required
• What digital media may be seized or taken offline
• What methodology should be followed
• What methods are required for chain of custody or chain of evidence
• What format the final report should take and to whom it should be given
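One way to make the chain-of-custody requirement in the list above concrete is a tamper-evident log in which each entry names the handler and carries a hash of the previous entry. The following is a minimal sketch rather than a prescribed forensic procedure; the CustodyLog class, its field names, and the sample handlers and timestamps are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class CustodyLog:
    """Append-only custody record for one item of evidentiary material (hypothetical design)."""
    item_id: str
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Hash a canonical JSON form so the same entry always yields the same digest.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, handler: str, action: str, timestamp: str) -> dict:
        # Each entry stores the digest of the entry before it, forming a chain.
        prev = self.entries[-1]["digest"] if self.entries else ""
        body = {"item_id": self.item_id, "handler": handler, "action": action,
                "timestamp": timestamp, "prev": prev}
        entry = dict(body, digest=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every digest and check every link; any edit breaks the chain.
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "digest"}
            if entry["prev"] != prev or entry["digest"] != self._digest(body):
                return False
            prev = entry["digest"]
        return True


# Made-up sample entries for illustration only.
log = CustodyLog("EM-001")
log.record("technician", "seized laptop", "2024-01-05T09:00:00Z")
log.record("investigator", "created forensic image", "2024-01-05T11:30:00Z")
print(log.verify())
```

Because each digest covers the previous one, changing any earlier entry causes verify() to fail, which is the property a chain-of-evidence record is meant to provide.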
The policy document should be supported by a procedures manual and developed based on the documents discussed
earlier, along with guidance from law enforcement or consultants. By creating and using these policies and procedures,
an organization can best protect itself from challenges by employees who have been subject to unfavorable action from
an investigation.

Disaster Recovery

disaster recovery (DR)
An organization’s set of planning and preparation efforts for detecting, reacting to, and recovering from a disaster.

disaster recovery planning (DRP)
The actions taken by senior management to develop and implement the DR policy, plan, and recovery teams.

disaster recovery plan (DR plan)
The documented product of disaster recovery planning; a plan that shows the organization’s intended efforts in the
event of a disaster.

The next vital part of CP focuses on disaster recovery (DR). Disaster recovery planning (DRP) entails the preparation
for and recovery from a disaster, whether natural or human-made. In some cases, incidents detected by the IR team may
escalate to the level of disaster, and the IR plan may no longer be able to handle the effective and efficient recovery
from the loss. For example, if a malicious program evades containment actions and infects and disables many or most of
an organization’s systems and their ability to function, the disaster recovery plan (DR plan) is activated. Sometimes,
events are by their nature immediately classified as disasters, such as an extensive fire, flood, damaging storm, or
earthquake.
As you learned earlier in this module, the CP team creates the DR planning team (DRPT). The DRPT in turn orga-
nizes and prepares the DR response teams (DRRTs) to implement the DR plan in the event of a disaster. In reality,
there may be many different DRRTs, each tasked with a different aspect of recovery. InfoSec staff most likely will not
lead these teams but will support their efforts, ensuring that no new vulnerabilities arise during the recovery process.
The various DRRTs will have multiple responsibilities in the recovery of the primary site and the reestablishment of
operations:

• Recover information assets that are salvageable from the primary facility after the disaster.
• Purchase or otherwise acquire replacement information assets from appropriate sources.
• Reestablish functional information assets at the primary site if possible or at a new primary site, if
necessary.

Some common DRRTs include the following:

• DR management team—Coordinates the on-site efforts of all other DRRTs.


• Communications team—With representatives from the public relations and legal departments, provides feed-
back to anyone who wants additional information about the organization’s efforts in recovering from the
disaster.
• Computer recovery (hardware) team—Works to recover any physical computing assets that might be usable
after the disaster and acquire replacement assets for resumption of operations.
• Systems recovery (OS) team—Works to recover operating systems and may contain one or more specialists
on each operating system that the organization employs; may be combined with the applications recovery
team as a “software recovery team” or with the hardware team as a “systems recovery team” or “computer
recovery team.”
• Network recovery team—Works to determine the extent of damage to the network wiring and hardware (hubs,
switches, and routers) as well as to Internet and intranet connectivity.
• Storage recovery team—Works with the other teams to recover storage-related information assets; may be
subsumed into other hardware and software teams.
• Applications recovery team—Works to recover critical applications.
• Data management team—Works on data restoration and recovery, whether from on-site, off-site, or online
transactional data.
• Vendor contact team—Works with suppliers and vendors to replace damaged or destroyed materials, equip-
ment, or services, as determined by the other teams.
• Damage assessment and salvage team—Specialized individuals who provide initial assessments of the extent
of damage to materials, inventory, equipment, and systems on-site.
• Business interface team—Works with the remainder of the organization to assist in the recovery of nontechnol-
ogy functions.
• Logistics team—Responsible for providing any needed supplies, space, materials, food, services, or facilities
at the primary site; may be combined with the vendor contact team.
• Other teams as needed.

The Disaster Recovery Process


In general, a disaster has occurred when either of two criteria is met: (1) The organization is unable to contain or
control the impact of an incident, or (2) the level of damage or destruction from an incident is so severe that the orga-
nization cannot quickly recover from it. The distinction between an incident and a disaster may be subtle. The DRPT
must document in the DR plan whether a particular event is classified as an incident or a disaster. This determination
is critical because it determines which plan is activated. The key role of the DR plan is to prepare to reestablish opera-
tions at the organization’s primary location after a disaster or to establish operations at a new location if the primary
site is no longer viable.
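The two criteria above amount to a simple decision rule for choosing which plan to activate. The sketch below illustrates that rule under stated assumptions; the EventAssessment type and both function names are hypothetical, and a real DR plan would document the determination in far more detail.

```python
from dataclasses import dataclass


@dataclass
class EventAssessment:
    """Hypothetical summary of an adverse event, filled in by the responders."""
    impact_contained: bool         # can the organization contain or control the impact?
    quick_recovery_possible: bool  # can it recover quickly from the damage?


def is_disaster(event: EventAssessment) -> bool:
    # Meeting either criterion alone is enough to classify the event as a disaster.
    return (not event.impact_contained) or (not event.quick_recovery_possible)


def plan_to_activate(event: EventAssessment) -> str:
    # The classification determines which plan is activated.
    return "DR plan" if is_disaster(event) else "IR plan"


print(plan_to_activate(EventAssessment(impact_contained=True, quick_recovery_possible=True)))
print(plan_to_activate(EventAssessment(impact_contained=False, quick_recovery_possible=True)))
```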
You learned earlier in this module about the CP process recommended by NIST, which uses seven steps. In the
broader context of organizational CP, these steps form the overall CP process. These steps are adapted and applied
here within the narrower context of the DRP process, resulting in an eight-step DR process.

1. Organize the DR team—The initial assignments to the DR team, including the team lead, will most likely
be performed by the CPMT; however, additional personnel may need to be assigned to the team as the
specifics of the DR policy and plan are developed, and as individual roles and responsibilities are defined
and assigned.
2. Develop the DR planning policy statement—A formal department or agency policy provides the authority
and guidance necessary to develop an effective contingency plan.
3. Review the BIA—The BIA was prepared to help identify and prioritize critical information and its host
systems. A review of what was discovered is an important step in the process.
4. Identify preventive controls—Measures taken to reduce the effects of business and system disruptions can
increase information availability and reduce contingency life cycle costs.
5. Create DR strategies—Thorough recovery strategies ensure that the system can be recovered quickly and
effectively following a disruption.
6. Develop the DR plan document—The plan should contain detailed guidance and procedures for restoring a
damaged system.
7. Ensure DR plan testing, training, and exercises—Testing the plan identifies planning gaps, whereas training
prepares recovery personnel for plan activation; both activities improve plan effectiveness and overall
agency preparedness.
8. Ensure DR plan maintenance—The plan should be a living document that is updated regularly to remain
current with system enhancements.

Disaster Recovery Policy


As noted in step 2 of the preceding list, the DR team, led by the manager designated as the DR team leader, begins
with the development of the DR policy soon after the team is formed. The policy presents an overview of the orga-
nization’s philosophy on the conduct of DR operations and serves as the guide for the development of the DR plan.
The DR policy itself may have been created by the organization’s CP team and handed down to the DR team leader.
Alternatively, the DR team may be assigned the role of developing the DR policy. In either case, the DR policy contains
the following key elements:
• Purpose—The purpose of the DR program is to provide direction and guidance for all DR operations. In addi-
tion, the program provides for the development and support of the DR plan. In everyday practice, those
responsible for the program must also work to emphasize the importance of creating and maintaining effective
DR functions. As with any major enterprise-wide policy effort, it is important for the DR program to begin with
a clear statement of executive vision.
• Scope—This section of the policy identifies the organizational units and groups of employees to which the
policy applies. This clarification is important if the organization is geographically dispersed or is creating dif-
ferent policies for different organizational units.
• Roles and responsibilities—This section of the policy identifies the roles and responsibilities of the key players in
the DR operation. It can include a delineation of the responsibilities of executive management down to individual
employees. Some sections of the DR policy may be duplicated from the organization’s overall CP policy. In smaller
organizations, this redundancy can be eliminated, as many of the functions are performed by the same group.
• Resource requirements—An organization can allocate specific resources to the development of DR plans here. While
this may include directives for individuals, it can be separated from the previous section for emphasis and clarity.
• Training requirements—This section defines and highlights training requirements for units within the organiza-
tion and the various categories of employees.
• Exercise and testing schedules—This section stipulates the testing intervals of the DR plan as well as the type
of testing and the individuals involved.
• Plan maintenance schedule—This section states the required review and update intervals of the plan and identifies
who is involved in the review. It is not necessary for the entire DR team to be involved, but the review can be
combined with a periodic test of the DR plan as long as the resulting discussion includes areas for improving the plan.
• Special considerations—This section includes such items as information storage and maintenance.

DR policy
The policy document that guides the development and implementation of DR plans and the formulation and performance of
DR teams.
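The testing and maintenance intervals that a DR policy stipulates can be tracked with simple date arithmetic. The following sketch assumes hypothetical intervals of 180 days between plan tests and 365 days between plan reviews; the dictionary, the function names, and the sample dates are illustrative only, and an actual policy would state its own values.

```python
from datetime import date, timedelta

# Hypothetical intervals; a real DR policy would state its own values.
POLICY_INTERVALS_DAYS = {
    "plan_test": 180,
    "plan_review": 365,
}


def next_due(last_done: date, activity: str) -> date:
    """Return the date by which the activity must be performed again."""
    return last_done + timedelta(days=POLICY_INTERVALS_DAYS[activity])


def overdue(last_done: date, activity: str, today: date) -> bool:
    """Return True when the policy interval has already elapsed."""
    return today > next_due(last_done, activity)


last_test = date(2024, 1, 1)  # made-up date of the most recent DR plan test
print(next_due(last_test, "plan_test"))
print(overdue(last_test, "plan_test", today=date(2024, 7, 1)))
```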

Disaster Classification

disaster classification
The process of examining an adverse event or incident and determining whether it constitutes an actual disaster.

slow-onset disasters
Disasters that occur over time and gradually degrade the capacity of an organization to withstand their effects.

rapid-onset disasters
Disasters that occur suddenly, with little warning, taking people’s lives and destroying the means of production.

A DR plan can classify disasters in a number of ways. The most common method of disaster classification is to evaluate
the amount of damage that could potentially be caused by the disaster, usually on a scale of Moderate, Severe, or
Critical, for example. Disasters could also be classified by their origin, such as natural or human-made. Most incidents
fall into the human-made category (like hacker intrusions or malware), but some could be tied to natural origins, such
as fires or floods. Many disasters begin as incidents, and only when they reach a specified threshold are they escalated
from incident to disaster. A denial-of-service attack that affects a single system for a short time may be an incident,
but when it escalates to affect an entire organization for a much longer period of time, it may be reclassified as a
disaster. Who makes this classification? It is most commonly done by a senior IT or InfoSec manager working closely
with the CSIRT and DR team leads. When the CSIRT reports that an incident or collection of incidents has begun to exceed
their capability to respond, they may request that the incident(s) be reclassified as a disaster in order for the
organization to better handle the expected damage or loss.
Disasters may also be classified by their rate of occurrence. Slow-onset disasters build up gradually over time
and steadily degrade the organization’s capacity to withstand their effects. Hazards that cause these disaster
conditions typically include natural causes such as droughts, famines, environmental degradation, desertification,
deforestation, and pest infestation, as well as human-made causes such as malware, hackers, disgruntled employees,
and service provider issues. The series of U.S. hurricanes during the fall of 2017 was an example of slow-onset
disasters: effective weather predictions enabled much of the southeast United States to prepare for the hurricanes’
potential impacts days before the storms made landfall. Similarly, the COVID-19 pandemic of 2020 was an example of
a slow-onset disaster, as its progression was tracked by global media from the start.
Usually, disasters that strike quickly are instantly classified as disasters. These disasters are commonly
referred to as rapid-onset disasters, as they occur suddenly with little warning, taking people’s lives and destroy-
ing the means of production. Rapid-onset disasters may be caused by natural effects like earthquakes, floods, storm
winds, tornadoes, and mud flows, or by human-made effects like massively distributed denial-of-service attacks;
acts of terrorism, including cyberterrorism or hacktivism; and acts of war. Interestingly, fire is an example of an
incident that can either escalate to disaster or begin as one (in the event of an explosion, for example). Fire can
be categorized as a natural disaster when caused by a lightning strike or as human-made when it is the result of
arson or an accident.
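The three classification dimensions discussed in this section (amount of damage, origin, and rate of occurrence) can be captured in a small record. The sketch below is illustrative only; the type names, the classify_fire helper, and the severity values in the examples are assumptions rather than part of any standard.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MODERATE = 1
    SEVERE = 2
    CRITICAL = 3


class Origin(Enum):
    NATURAL = "natural"
    HUMAN_MADE = "human-made"


class Onset(Enum):
    SLOW = "slow-onset"
    RAPID = "rapid-onset"


@dataclass(frozen=True)
class DisasterClassification:
    name: str
    severity: Severity
    origin: Origin
    onset: Onset

    def summary(self) -> str:
        return f"{self.name}: {self.severity.name.title()}, {self.origin.value}, {self.onset.value}"


# Fire is natural when caused by lightning and human-made when caused by arson or an accident.
FIRE_ORIGINS = {
    "lightning": Origin.NATURAL,
    "arson": Origin.HUMAN_MADE,
    "accident": Origin.HUMAN_MADE,
}


def classify_fire(cause: str, severity: Severity, onset: Onset) -> DisasterClassification:
    if cause not in FIRE_ORIGINS:
        raise ValueError(f"unrecognized fire cause: {cause}")
    return DisasterClassification(f"fire ({cause})", severity, FIRE_ORIGINS[cause], onset)


# Severity values here are invented for the example.
earthquake = DisasterClassification("earthquake", Severity.CRITICAL, Origin.NATURAL, Onset.RAPID)
print(earthquake.summary())
print(classify_fire("arson", Severity.SEVERE, Onset.RAPID).summary())
```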
Table 5-5 presents a list of natural disasters, their effects, and recommendations for mitigation.

Planning to Recover
To plan for disasters, the CPMT engages in scenario development and impact analysis, along the way categoriz-
ing the level of threat that each potential disaster poses. When generating a DR scenario, start with the most
important asset: people. Do you have the human resources with the appropriate organizational knowledge to
restore business operations? Organizations must cross-train their employees to ensure that operations and a
sense of normalcy can be restored. In addition, the DR plan must be tested regularly so that the DR team can
lead the recovery effort quickly and efficiently. Key elements that the CPMT must build into the DR plan include
the following:

1. Clear delegation of roles and responsibilities—Everyone assigned to the DR team should be aware of his or
her duties during a disaster. Some team members may be responsible for coordinating with local services,
such as fire, police, and medical personnel. Some may be responsible for the evacuation of company
personnel, if required. Others may be assigned to simply pack up and leave.
2. Execution of the alert roster and notification of key personnel—These notifications may extend outside
the organization to include the fire, police, or medical services mentioned earlier, as well as insurance
agencies, disaster teams such as those of the Red Cross, and management teams.

Table 5-5 Natural Disasters and Their Effects on Information Systems

Natural Disaster
  Effects and Mitigation

Fire
  Damages the building housing the computing equipment that constitutes all or part of the information system. Also
  encompasses smoke damage from the fire and water damage from sprinkler systems or firefighters. Can usually be
  mitigated with fire casualty insurance or business interruption insurance.
Flood
  Can cause direct damage to all or part of the information system or to the building that houses all or part of the
  information system. May also disrupt operations by interrupting access to the buildings that house all or part of
  the information system. Can sometimes be mitigated with flood insurance or business interruption insurance.
Earthquake
  Can cause direct damage to all or part of the information system or, more often, to the building that houses it.
  May also disrupt operations by interrupting access to the buildings that house all or part of the information
  system. Can sometimes be mitigated with specific casualty insurance or business interruption insurance, but is
  usually a specific and separate policy.
Lightning
  Can directly damage all or part of the information system or its power distribution components. Can also cause
  fires or other damage to the building that houses all or part of the information system. May also disrupt
  operations by interrupting access to the buildings that house all or part of the information system as well as the
  routine delivery of electrical power. Can usually be mitigated with multipurpose casualty insurance or business
  interruption insurance.
Landslide or mudslide
  Can damage all or part of the information system or, more likely, the building that houses it. May also disrupt
  operations by interrupting access to the buildings that house all or part of the information system as well as the
  routine delivery of electrical power. Can sometimes be mitigated with casualty insurance or business interruption
  insurance.
Tornado or severe windstorm
  Can directly damage all or part of the information system or, more likely, the building that houses it. May also
  disrupt operations by interrupting access to the buildings that house all or part of the information system as well
  as the routine delivery of electrical power. Can sometimes be mitigated with casualty insurance or business
  interruption insurance.
Hurricane or typhoon
  Can directly damage all or part of the information system or, more likely, the building that houses it.
  Organizations located in coastal or low-lying areas may experience flooding. May also disrupt operations by
  interrupting access to the buildings that house all or part of the information system as well as the routine
  delivery of electrical power. Can sometimes be mitigated with casualty insurance or business interruption insurance.
Tsunami
  Can directly damage all or part of the information system or, more likely, the building that houses it.
  Organizations located in coastal areas may experience tsunamis. May also cause disruption to operations by
  interrupting access or electrical power to the buildings that house all or part of the information system. Can
  sometimes be mitigated with casualty insurance or business interruption insurance.
Electrostatic discharge (ESD)
  Can be costly or dangerous when it ignites flammable mixtures and damages costly electronic components. Static
  electricity can draw dust into clean-room environments or cause products to stick together. The cost of servicing
  ESD-damaged electronic devices and interruptions can range from a few cents to millions of dollars for critical
  systems. Loss of production time in information processing due to the effects of ESD is significant. While not
  usually viewed as a threat, ESD can disrupt information systems and is not usually an insurable loss unless covered
  by business interruption insurance. ESD can be mitigated with special static discharge equipment and by managing
  HVAC temperature and humidity levels.
Dust contamination
  Can shorten the life of information systems or cause unplanned downtime. Can usually be mitigated with an effective
  HVAC filtration system and simple procedures, such as efficient housekeeping, placing tacky floor mats at entrances,
  and prohibiting the use of paper and cardboard in the data center.

3. Clear establishment of priorities—During a disaster response, the first priority is always the preservation
of human life. Data and systems protection is subordinate when the disaster threatens the lives, health, or
welfare of the employees or members of the community. Only after all employees and neighbors have been
safeguarded can the DR team attend to protecting other organizational assets.
4. Procedures for documentation of the disaster—Just as in an incident response, the disaster must be carefully
recorded from the onset. This documentation is used later to determine how and why the disaster
occurred.
5. Action steps to mitigate the impact of the disaster on the operations of the organization—The DR plan should
specify the responsibilities of each DR team member, such as the evacuation of physical assets or making
sure that all systems are securely shut down to prevent further loss of data.
6. Alternative implementations for the various system components, should primary versions be unavailable—
These components include standby equipment that is either purchased, leased, or under contract with a
DR service agency. Developing systems with excess capacity, fault tolerance, autorecovery, and fail-safe
features facilitates a quick recovery. Something as simple as using Dynamic Host Configuration Protocol (DHCP)
to assign network addresses instead of using static addresses can allow systems to regain connectivity
quickly and easily without technical support. Networks should support dynamic reconfiguration;
restoration of network connectivity should be planned. Data recovery requires effective backup
strategies as well as flexible hardware configurations. System management should be a top priority. All
solutions should be tightly integrated and developed in a strategic plan to provide continuity. Piecemeal
construction can result in a disaster after the disaster, as incompatible systems are unexpectedly thrust
together.
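Element 2 in the list above, execution of the alert roster, can be pictured as walking an ordered contact list and recording who could not be reached so that the DR team can follow up. The sketch below is one hypothetical way to express that; the Contact type, the notify callback, and all names and phone numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Contact:
    name: str
    role: str
    phone: str
    external: bool = False  # e.g., fire, police, medical, or insurance contacts


def execute_alert_roster(roster: List[Contact], notify: Callable[[Contact], bool]) -> List[Contact]:
    """Notify every contact in roster order and return those who could not be reached."""
    unreached = []
    for contact in roster:
        if not notify(contact):
            unreached.append(contact)
    return unreached


# Invented roster entries for illustration.
roster = [
    Contact("A. Lee", "DR team lead", "555-0100"),
    Contact("City fire department", "fire service", "555-0101", external=True),
    Contact("B. Cruz", "facilities manager", "555-0102"),
]


def fake_notify(contact: Contact) -> bool:
    # Stands in for a phone call or paging system; pretends one person does not answer.
    return contact.name != "B. Cruz"


unreached = execute_alert_roster(roster, fake_notify)
print([c.name for c in unreached])
```

Passing the notification mechanism in as a function keeps the roster logic independent of whether contacts are reached by phone, text message, or another channel.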
As part of DR plan readiness, each employee should have two sets of emergency information in his or her possession
at all times. The first is personal emergency information—the person to notify in case of an emergency (next of kin), medi-
cal conditions, and a form of identification. The second is a set of instructions on what to do in the event of an emergency.
This snapshot of the DR plan should contain a contact number or hotline for calling the organization during an emergency,
emergency services numbers (fire, police, medical), evacuation and assembly locations (e.g., storm shelters), the name
and number of the DR coordinator, and any other needed information. An example of an emergency ID card is shown in
Figure 5-11.

Responding to the Disaster


When a disaster strikes, actual events can at times overwhelm even the best of DR plans. To be prepared, the CPMT
should incorporate a degree of flexibility into the plan. If the physical facilities are intact, the DR team should begin
the restoration of systems and data to work toward full operational capability. If the organization’s facilities are
destroyed, alternative actions must be taken until new facilities can be acquired. When a disaster threatens the
viability of an organization at the primary site, the DR process becomes a business continuity process, which is
described next.

Front
ABC Company Emergency ID Card
Name:___________________________ DOB:_____________
Address:__________________________________________
City:_________________ St:_________ Zip:_____________
Blood Type:__________
Allergies:__________________________________________
Organ Donor?:____________________________________
Emergency Contacts:_______________________________
>
>
Call 800-555-1212 for updates and to report status

Back
ABC Company DR Plan Codes
CODE ACTION
1a Shelter in Place – do not report to work
1b Shelter in Place – DR team to work
2a Evacuate immediately – do not report to work
2b Evacuate immediately – DR team to work
3 Lockdown – Secure all doors/windows – do not report to work if off-site
Call 800-555-1212 for updates and to report status

Figure 5-11 A sample emergency information card
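The DR plan codes on the back of the sample card work as a lookup table: the organization broadcasts a short code, and each employee translates it into an action. A minimal sketch of that lookup follows; the codes and hotline number are transcribed from Figure 5-11, while the decode function and its fallback message are hypothetical.

```python
# Codes and actions transcribed from the sample card in Figure 5-11.
DR_PLAN_CODES = {
    "1a": "Shelter in place; do not report to work",
    "1b": "Shelter in place; DR team to work",
    "2a": "Evacuate immediately; do not report to work",
    "2b": "Evacuate immediately; DR team to work",
    "3": "Lockdown; secure all doors/windows; do not report to work if off-site",
}
HOTLINE = "800-555-1212"  # number printed on the sample card


def decode(code: str) -> str:
    """Return the card's instruction for a broadcast code (helper name is hypothetical)."""
    action = DR_PLAN_CODES.get(code.strip().lower())
    if action is None:
        # Assumed fallback: direct the employee to the hotline printed on the card.
        return f"Unknown code; call {HOTLINE} for updates"
    return action


print(decode("2B"))
```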


Business Continuity

business continuity (BC)
An organization’s set of efforts to ensure its long-term viability when a disaster precludes normal operations at the
primary site; typically includes temporarily establishing critical operations at an alternate site until operations can
be resumed at the primary site or a new permanent site.

business continuity planning (BCP)
The actions taken by senior management to develop and implement the BC policy, plan, and continuity teams.

BC plan
The documented product of business continuity planning; a plan that shows the organization’s intended efforts to
continue critical functions when operations at the primary site are not feasible.

Sometimes, disasters have such a profound effect on the organization that it cannot continue operations at its primary
site until it fully completes all DR efforts. To deal with such events, the organization implements its business
continuity (BC) strategies.
Business continuity planning (BCP) ensures that critical business functions can continue if a disaster occurs. Like
the DR plan, the BC plan involves teams from across the organization, including IT and business operations, and is
supported by InfoSec. The BC plan is usually managed by the CEO or COO of the organization, and is activated and
executed concurrently with the DR plan when the disaster is major or long-term and requires fuller and more complex
restoration of information and IT resources. If a disaster renders the current business location unusable, there must
be a plan to allow the business to continue to function. While the BC plan reestablishes critical business functions
at an alternate site, the DR plan focuses on reestablishment of the technical infrastructure and business operations at
the primary site. Not every business needs a BC plan or BC facilities. Some small companies or fiscally sound
organizations may be able simply to cease operations until the primary facilities are restored. Manufacturing and
retail organizations, however, depend on continued operations for revenue. Thus, these entities must have a BC plan in
place if they need to relocate operations quickly with minimal loss of revenue.
BC is an element of CP, and it is best accomplished using a repeatable process or methodology. NIST’s SP 800-34,
Rev. 1, “Contingency Planning Guide for Federal Information Systems,”21 includes guidance for planning for incidents,
disasters, and situations that call for BC. The approach used in that document has been adapted for BC use here.
The first step in all contingency efforts is the development of policy; the next step is planning. In some organiza-
tions, these steps are considered concurrent operations in which development of policy is a function of planning; in
other organizations, policy comes before planning and is a separate process. In this text, the BC policy is developed
prior to the BC plan, and both are developed as part of BC planning. The same seven-step approach that NIST recom-
mends for CP can be adapted to an eight-step model that can be used to develop and maintain a viable BC program.
Those steps are as follows:
1. Form the BC team—As was done with the DR planning process, the initial assignments to the BC team,
including the team lead, will most likely be performed by the CPMT; however, additional personnel
may need to be assigned to the team as the specifics of the BC policy and plan are developed, and their
individual roles and responsibilities will have to be defined and assigned.
2. Develop the BC planning policy statement—A formal organizational policy provides the authority and
guidance necessary to develop an effective continuity plan. As with any enterprise-wide policy process, it
is important to begin with the executive vision.
3. Review the BIA—Information contained within the BIA can help identify and prioritize critical
organizational functions and systems for the purposes of business continuity, making it easier to
understand what functions and systems will need to be reestablished elsewhere in the event of a disaster.
4. Identify preventive controls—Little is done here exclusively for BC. Most of the steps taken in the CP and
DRP processes will provide the necessary foundation for BCP.
5. Create relocation strategies—Thorough relocation strategies ensure that critical business functions will be
reestablished quickly and effectively at an alternate location following a disruption.
6. Develop the BC plan—The BC plan should contain detailed guidance and procedures for implementing BC
strategies at predetermined locations in accordance with management’s guidance.
7. Ensure BC plan testing, training, and exercises—Testing the plan identifies planning gaps, whereas training
prepares recovery personnel for plan activation; both activities improve plan effectiveness and overall
agency preparedness.
8. Ensure BC plan maintenance—The plan should be a living document that is updated regularly to remain
current with system enhancements.

Business Continuity Policy

BC policy
The policy document that guides the development and implementation of BC plans and the formulation and performance of
BC teams.

business resumption planning (BRP)
The actions taken by senior management to develop and implement a combined DR and BC policy, plan, and set of recovery
teams.

BCP begins with the development of the BC policy, which reflects the organization’s philosophy on the conduct of BC
operations and serves as the guiding document for the development of BCP. The BC team leader might receive the BC
policy from the CP team or might guide the BC team in developing one. The BC policy contains the following key
sections:
• Purpose—The purpose of the BC program is to provide the necessary planning and coordination to help relocate
critical business functions should a disaster prohibit continued operations at the primary site.
• Scope—This section identifies the organizational units and groups of employees to which the policy applies. This is
especially useful in organizations that are geographically dispersed or that are creating different policies for
different organizational units.
• Roles and responsibilities—This section identifies the roles and responsibilities of key players in the BC opera-
tion, from executive management down to individual employees. In some cases, sections may be duplicated
from the organization’s overall CP policy. In smaller organizations, this redundancy can be eliminated because
many of the functions are performed by the same group of individuals.
• Resource requirements—Organizations can allocate specific resources to the development of BC plans.
Although this section may include directives for individual team members, it can be separated from the roles
and responsibilities section for emphasis and clarity.
• Training requirements—This section specifies the training requirements for the various employee groups.
• Exercise and testing schedules—This section stipulates the frequency of BC plan testing and can include both
the type of exercise or testing and the individuals involved.
• Plan maintenance schedule—This section specifies the procedures and frequency of BC plan reviews and identi-
fies the personnel who will be involved in the review. It is not necessary for the entire BC team to be involved;
the review can be combined with a periodic test of the BC plan (as in a talk-through) as long as the resulting
discussion includes areas for plan improvement.
• Special considerations—In extreme situations, the DR and BC plans overlap, as described earlier. Thus, this
section provides an overview of the organization’s information storage and retrieval plans. While the specif-
ics do not have to be elaborated on in this document, the plan should at least identify where more detailed
documentation is kept, which individuals are responsible, and any other information needed to implement
the strategy.
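
The key sections listed above lend themselves to a simple completeness check on a draft policy. The following is a minimal sketch, assuming a hypothetical draft stored as a Python dictionary that maps section titles to their text; the section names come from the list above, while the function name and the dictionary format are illustrative assumptions rather than part of any standard.

```python
# Key sections named in the text. The dict-based draft format is an assumption.
REQUIRED_SECTIONS = (
    "Purpose",
    "Scope",
    "Roles and responsibilities",
    "Resource requirements",
    "Training requirements",
    "Exercise and testing schedules",
    "Plan maintenance schedule",
    "Special considerations",
)


def missing_sections(draft: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or empty in a draft policy."""
    # Compare titles case-insensitively and ignore sections with no body text.
    present = {
        title.strip().lower()
        for title, body in draft.items()
        if body.strip()
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in present]


if __name__ == "__main__":
    # Hypothetical draft used only for illustration.
    draft_policy = {
        "Purpose": "Relocate critical business functions if the primary site is unusable.",
        "Scope": "All business units at headquarters and regional offices.",
        "Roles and Responsibilities": "Executive sponsor, BC team leader, employees.",
        "Training requirements": "",  # heading exists but nothing has been written yet
    }
    for section in missing_sections(draft_policy):
        print("Missing or empty:", section)
```

A check like this only confirms that each section exists; whether the content is adequate still requires review by the BC team.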
You may have noticed that this structure is virtually identical to that of the disaster recovery policy and plans.
The processes are generally the same, with minor differences in focus and implementation.
The identification of critical business functions and the resources to support them is the cornerstone of the
BC plan. When a disaster strikes, these functions are the first to be reestablished at the alternate site. The CP
team needs to appoint a group of individuals to evaluate and compare the various alternatives and to recommend
which strategy should be selected and implemented. The strategy selected usually involves an off-site facility,
which should be inspected, configured, secured, and tested on a periodic basis. The selection should be reviewed
periodically to determine whether a better alternative has emerged or whether the organization needs a different
solution.
Many organizations with operations in New York City had their BC efforts (or lack thereof) tested critically on
September 11, 2001. Similarly, organizations on the U.S. Gulf Coast had their BC plan effectiveness tested during the
aftermath of Hurricane Katrina in 2005 and by the series of hurricanes that affected Texas and Florida in 2017.

Business Resumption
Because the DR and BC plans are closely related, most organizations merge the two functions into a single function
called business resumption planning (BRP). Such a comprehensive plan must be able to support the reestablishment
of operations at two different locations—one immediately at an alternate site and one eventually back at the primary
site. Therefore, although a single planning team can develop the BR plan, execution of the plan requires separate
execution teams.

The planning process for the BR plan should be tied to, but distinct from, the IR plan. As noted earlier in the mod-
ule, an incident may escalate into a disaster when it grows dramatically in scope and intensity. It is important that the
three planning development processes be so tightly integrated that the reaction teams can easily make the transition
from incident response to disaster recovery and BCP.

Continuity Strategies
The CPMT can choose from several strategies in its BC planning. The determining factor is usually cost. Note that these
strategies are chosen from a spectrum of options rather than from the absolute specifications that follow. Also, many
organizations now use cloud-based production systems that would supplement, if not preclude, the following approaches.
In general, two categories of strategies are used in BC: exclusive use and shared use. Exclusive-use facilities
are reserved for the sole use of the leasing organization, and shared-use facilities represent contractual agreements
between parties to share or support each other during a BC event. Three general exclusive-use strategies are available:

• Hot site—A hot site is a fully configured computing facility that includes all services, communications links,
and physical plant operations. It duplicates computing resources, peripherals, phone systems, applications,
and workstations. Essentially, this duplicate facility needs only the latest data backups and the personnel to
function. If the organization uses an adequate data service, a hot site can be fully functional within minutes. Not
surprisingly, a hot site is the most expensive alternative. Disadvantages include the need to provide mainte-
nance for all the systems and equipment at the hot site, as well as physical and information security. However,
if the organization requires a 24/7 capability for near real-time recovery, the hot site is the optimal strategy.
• Warm site—A warm site provides many of the same services and options as the hot site, but typically software
applications are not included or are not installed and configured. A warm site frequently includes computing
equipment and peripherals with servers but not client workstations. Overall, it offers many of the advan-
tages of a hot site at a lower cost. The disadvantage is that several hours of preparation—perhaps days—are
required to make a warm site fully functional.
• Cold site—A cold site provides only rudimentary services and facilities. No computer hardware or peripherals
are provided. All communications services must be installed after the site is occupied. A cold site is an empty
room with standard heating, air conditioning, and electrical service. Everything else is an added-cost option.
Despite these disadvantages, a cold site may be better than nothing. Its primary advantage is its low cost. The
most useful feature of this approach is that it ensures an organization has floor space if a widespread disaster
strikes, but some organizations are prepared to struggle to lease new space rather than pay maintenance fees
on a cold site.

hot site
A fully configured BC facility that includes all computing services, communications links, and physical plant
operations.

warm site
A BC facility that provides many of the same services and options as a hot site, but typically without installed
and configured software applications.

Likewise, there are three strategies in which an organization can gain shared use of a facility when needed for
contingency options:

• Timeshare—A timeshare operates like one of the three sites described previously but is leased in conjunction
with a business partner or sister organization. It allows the organization to provide a DR/BC option while
reducing its overall costs. The primary disadvantage is the possibility that more than one timeshare participant
will need the facility simultaneously. Other disadvantages include the need to stock the facility with equipment
and data from all organizations involved, the complexity of negotiating the timeshare with sharing organizations,
and the possibility that one or more parties might exit the agreement or sublease their options. Operating under
a timeshare is much like agreeing to co-lease an apartment with a group of friends. One can only hope that the
organizations remain on amicable terms, as they all could potentially gain physical access to each other’s data.
• Service bureau—A service bureau is an agency that provides a service for a fee. In the case of DR/BC planning,
this service is the provision of physical facilities in the event of a disaster. Such agencies also frequently provide
off-site data storage for a fee. Contracts with service bureaus can specify exactly what the organization needs
under what circumstances. A service agreement usually guarantees space when needed; the service bureau must
acquire additional space in the event of a widespread disaster. In this sense, it resembles the rental-car provision
in a car insurance policy. The disadvantage is that service contracts must be renegotiated periodically and rates
can change. The contracts can also be quite expensive.
• Mutual agreement—A mutual agreement is a contract between two organizations in which each party agrees to
assist the other in the event of a disaster. It stipulates that an organization is obligated to provide necessary
facilities, resources, and services until the receiving organization is able to recover from the disaster. This
arrangement can be a lot like moving in with relatives or friends—it does not take long for an organization to
wear out its welcome. Many organizations balk at the idea of having to fund duplicate services and resources,
even in the short term. Still, mutual agreements between divisions of the same parent company, between
subordinate and senior organizations, or between business partners may be a cost-effective solution when both
parties to the agreement have a mutual interest in the other’s continued operations and both have similar
capabilities and capacities.

cold site
A BC facility that provides only rudimentary services, with no computer hardware or peripherals.

timeshare
A continuity strategy in which an organization co-leases facilities with a business partner or sister organization,
which allows the organization to have a BC option while reducing its overall costs.

service bureau
A BC strategy in which an organization contracts with a service agency to provide a facility for a fee.

mutual agreement
A BC strategy in which two organizations sign a contract to assist the other in a disaster by providing BC
facilities, resources, and services until the organization in need can recover from the disaster.

rolling mobile site
A BC strategy that involves contracting with an organization to provide specialized facilities configured in the
payload area of a tractor-trailer.

In addition to the preceding basic strategies, there are specialized alternatives, such as the following:

• A rolling mobile site is configured in the payload area of a tractor-trailer.


• Externally stored resources, such as a rental storage area that contains duplicate or older equipment, can be
positioned to provide backup systems. These alternatives are similar to the Prepositioning of Material Con-
figured to Unit Sets (POMCUS) sites of the Cold War era, in which caches of materials to be used in the event
of an emergency or war were stored outside normal operations areas.
• An organization might arrange with a prefabricated building contractor to provide immediate, temporary facilities
(mobile offices) on-site in the event of a disaster.
• In recent years, the option to use cloud-based provisioning has emerged. These types of services can be both
a potential continuity option for production systems and a mechanism to manage recovery from disrupted
operations.
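
The trade-off among the exclusive-use strategies (a hot site is fastest and most expensive, a cold site is slowest and cheapest) can be sketched as a small selection routine. This is a minimal sketch under stated assumptions: the hour and cost figures are hypothetical values chosen only to preserve the ordering described above, and the names SiteOption and cheapest_adequate_site are illustrative, not drawn from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteOption:
    name: str
    hours_to_operational: float  # illustrative estimate, not a figure from the text
    annual_cost: int             # illustrative figure in arbitrary currency units


# Hypothetical numbers that only preserve the ordering the text describes:
# hot is fastest and most expensive, cold is slowest and least expensive.
OPTIONS = (
    SiteOption("hot site", hours_to_operational=0.5, annual_cost=500_000),
    SiteOption("warm site", hours_to_operational=24.0, annual_cost=150_000),
    SiteOption("cold site", hours_to_operational=240.0, annual_cost=30_000),
)


def cheapest_adequate_site(max_downtime_hours: float,
                           annual_budget: int) -> Optional[SiteOption]:
    """Pick the lowest-cost option that meets both the downtime limit and the budget."""
    candidates = [
        option for option in OPTIONS
        if option.hours_to_operational <= max_downtime_hours
        and option.annual_cost <= annual_budget
    ]
    # Returns None when no option satisfies both constraints.
    return min(candidates, key=lambda option: option.annual_cost, default=None)


if __name__ == "__main__":
    choice = cheapest_adequate_site(max_downtime_hours=48, annual_budget=200_000)
    if choice is None:
        print("No exclusive-use option fits; revisit the budget or consider shared use.")
    else:
        print("Selected:", choice.name)
```

In practice the CPMT would replace these placeholder numbers with quotes and recovery estimates from its own BIA, and a result of None is a signal to consider the shared-use or specialized alternatives.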

Timing and Sequence of CP Elements


As indicated earlier, the IR plan focuses on immediate response, but if the incident escalates into a disaster, the IR
plan may give way to the DR plan and BC plan, as illustrated in Figure 5-12. The DR plan typically focuses on restoring
systems after disasters occur and is therefore closely associated with the BC plan. The BC plan occurs concurrently
with the DR plan when the damage is major or long-term, and when the plan requires more than simple restoration of
information and information resources, as illustrated in Figure 5-13.
[Figure 5-12 panels. Attack occurs: Depending on scope, may be classified as an incident or a disaster.
Incident: Ransomware attack on a single system/user. Disaster: Ransomware attack on all organizational
systems/users.]
Source: This figure has multiple sources. Top left: PR Image Factory/Shutterstock.com. Top right:
Zephyr_p/Shutterstock.com. Bottom center: poylock19/Shutterstock.com.

Figure 5-12 Incident response and disaster recovery



[Figure 5-13 panels. Organizational disaster occurs. Staff implements DR/BC plans. BC plan relocates
organization to… (panel label: Alternate site). DR plan works to reestablish operations at (panel label:
Primary site, or new permanent site).]
Source: This figure has multiple sources. Top left: sandyman/Shutterstock.com. Bottom left: Konstantin
L/Shutterstock.com. Top right: Monkey Business Images/Shutterstock.com. Bottom right: Sylvie
Bouchard/Shutterstock.com.

Figure 5-13 Disaster recovery and business continuity planning

Some experts argue that the three planning components (IR, DR, and BC) of CP are so closely linked that they are
indistinguishable. Actually, each has a distinct place, role, and planning requirement. Furthermore, each component
comes into play at a specific time in the life of an incident. Figure 5-14 illustrates this sequence and shows the overlap
that may occur.

[Figure 5-14 content, read along a time axis of Attack occurs, Post-attack (hours), Post-attack (days):
Incident response: Adverse event starts as incident. Incident detection, IR plan activated. Incident reaction.
Incident recovery. Incident recovered, operations restored, end IR. IR can’t contain: escalates to disaster.
Disaster recovery: Adverse event starts as disaster. Disaster reaction, DR plan activated. DR salvage/recovery
operations (operations restored at primary site). Disaster recovered, operations restored, end DR. DR can’t
restore ops quickly triggers BC. DR complete triggers end of BC.
Business continuity: Continuity response, BC plan activated. BC operations (operations established at
alternate site).
Crisis management: Threat of injury or loss of life to personnel. Crisis management response, CM plan
activated. CM operations (emergency services notified and coordinated). All personnel safe and/or accounted
for triggers end of CM.]

Figure 5-14 Contingency planning implementation timeline
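
The triggers shown in the timeline can be restated as a few conditional rules. The following is a minimal sketch, assuming four hypothetical yes/no inputs that a CP team would judge during an event; the function name and parameters are illustrative, and treating crisis management as independent of the other plans is a simplifying assumption.

```python
def plans_activated(starts_as_disaster: bool,
                    ir_can_contain: bool,
                    dr_can_restore_quickly: bool,
                    personnel_at_risk: bool) -> list[str]:
    """Return the CP component plans that come into play, in activation order."""
    plans: list[str] = []

    if starts_as_disaster:
        disaster = True              # the event is classified as a disaster from the start
    else:
        plans.append("IR")           # the event starts as an incident
        disaster = not ir_can_contain  # an uncontained incident escalates to a disaster

    if disaster:
        plans.append("DR")
        if not dr_can_restore_quickly:
            plans.append("BC")       # operations are established at an alternate site

    if personnel_at_risk:
        plans.append("CM")           # threat of injury or loss of life to personnel

    return plans


if __name__ == "__main__":
    # Hypothetical scenario: an incident that IR cannot contain and DR cannot resolve quickly.
    print(plans_activated(starts_as_disaster=False,
                          ir_can_contain=False,
                          dr_can_restore_quickly=False,
                          personnel_at_risk=False))
```

Real events rarely reduce to four flags, but writing the rules down this way makes the hand-off points between teams explicit.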



Crisis Management

Another process that many organizations plan for separately is crisis management (CM), which focuses more on
the effects that a disaster has on people than its effects on other assets. While some organizations include crisis
management as a subset of the DR plan, the protection of human life and the organization’s image are such high
priorities that crisis management may deserve its own committee, policy, and plan. Thus, the organization should
form a crisis management planning team (CMPT), which then organizes a crisis management response team
(CMRT). The appropriate DRRT works closely with the CMRT to ensure complete and timely communication
during a disaster. According to Gartner Research, the crisis management team is responsible for managing the
event from an enterprise perspective and performs the following roles:

• Supporting personnel and their loved ones during the crisis
• Keeping the public informed about the event and the actions being taken to ensure the recovery of personnel
and the enterprise
• Communicating with major customers, suppliers, partners, regulatory agencies, industry organizations, the
media, and other interested parties²²

The CMPT should establish a base of operations or command center near the site of the disaster as soon as
possible. The CMPT should include individuals from all functional areas of the organization in order to facilitate
communications and cooperation. The CMPT is charged with three primary responsibilities:

1. Verifying personnel status—Everyone must be accounted for, including individuals who are on vacations,
leaves of absence, and business trips.
2. Activating the alert roster—Alert rosters and general personnel phone lists are used to notify individuals
whose assistance may be needed or simply to tell employees not to report to work until the disaster is over.
3. Coordinating with emergency services—If someone is injured or killed during a disaster, the CM response
team will work closely with fire officials, police, medical response units, and the Red Cross to provide
appropriate services to all affected parties as quickly as possible.

The CMPT should plan an approach for releasing information in the event of a disaster and should perhaps even
have boilerplate scripts prepared for press releases. Advice from Lanny Davis, former counselor to President Bill
Clinton, is relevant here. When beset by damaging events, heed the subtitle to Davis’s memoir: Truth to Tell: Tell
It Early, Tell It All, Tell It Yourself.²³

crisis management (CM)
An organization’s set of planning and preparation efforts for dealing with potential human injury, emotional
trauma, or loss of life as a result of a disaster.

crisis management policy (CM policy)
The policy document that guides the development and implementation of CM plans and the formulation and
performance of CM teams.

crisis management plan (CM plan)
The documented product of crisis management planning; a plan that shows the organization’s intended efforts
to protect its personnel and respond to safety threats.

crisis management planning (CMP)
The actions taken by senior management to develop and implement the CM policy, plan, and response teams.

desk check
The CP testing strategy in which copies of the appropriate plans are distributed to all individuals who will be
assigned roles during an actual incident or disaster; each individual reviews the plan and validates its components.

As with IR, DR, and BC, if CM is organized and conducted as a separate entity, it should have a CM policy and a
CM plan. The methodologies for CM policies and CM planning (CMP) can follow the same basic models as DR policies
and plans, but they should include additional content focused on personnel safety (such as shelter areas), evacuation
plans, contact information for emergency services, and the like.
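
The first two CMPT responsibilities described in this section, verifying personnel status and activating the alert roster, can be illustrated with a small data structure. This is a minimal sketch, assuming a hypothetical roster held in memory; the names, phone numbers, and status values are invented for illustration, and actually delivering the messages is out of scope.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNKNOWN = "unknown"
    SAFE = "safe"
    INJURED = "injured"


@dataclass
class Person:
    name: str
    phone: str
    location: str = "office"        # for example "office", "vacation", "business trip"
    status: Status = Status.UNKNOWN


def unaccounted_for(roster: list[Person]) -> list[Person]:
    """Everyone whose status is still unknown, regardless of where they are."""
    return [person for person in roster if person.status is Status.UNKNOWN]


def alert_messages(roster: list[Person], text: str) -> list[tuple[str, str]]:
    """Pair each phone number with the alert text; sending is left to another system."""
    return [(person.phone, text) for person in roster]


if __name__ == "__main__":
    # Hypothetical roster with fictitious phone numbers.
    roster = [
        Person("A. Rivera", "555-0100", status=Status.SAFE),
        Person("B. Chen", "555-0101", location="vacation"),
        Person("C. Okafor", "555-0102", location="business trip"),
    ]
    for person in unaccounted_for(roster):
        print("Not yet accounted for:", person.name, "-", person.location)
    for phone, text in alert_messages(roster, "Do not report to work until notified."):
        print("Queue alert to", phone, ":", text)
```

Including people who are away from the site in the same roster reflects the requirement that everyone be accounted for, not only those who were in the building.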

Testing Contingency Plans


Very few plans are executable as initially written; instead, they must be tested to identify vulnerabilities, faults, and
inefficient processes. Once problems are identified during the testing process, improvements can be made, and the
resulting plan can be relied on in times of need. The following strategies can be used to test contingency plans:

• Desk check—The simplest kind of validation involves distributing copies of the appropriate plans to all
individuals who will be assigned roles during an actual incident or disaster. Each of these individuals performs
a desk check by reviewing the plan and creating a list of correct and incorrect components. While not a true
structured test, this strategy is a good way to review the perceived feasibility and effectiveness of the plan and
ensure at least a nominal update of the policies and plans.
• Structured walk-through—In a structured walk-through, all involved individuals walk through the steps they
would take during an actual incident or disaster. This exercise can consist of an on-site walk-through, in which
everyone discusses his or her actions at each particular location and juncture, or it may be more of a
talk-through, in which all involved individuals sit around a conference table and discuss their responsibilities as
the incident unfolds.
• Simulation—In a simulation, the organization creates a role-playing exercise in which the CP team is presented
with a scenario of an actual incident or disaster and expected to react as if it had occurred. The simulation
usually involves performing the communications that should occur and specifying the required physical tasks,
but it stops short of performing the actual tasks required, such as installing the backup data or disconnecting a
communications circuit. The major difference between a walk-through and a simulation is that in simulations,
the discussion is driven by a scenario, whereas walk-throughs focus on simply discussing the plan in the absence
of any particular incident or disaster. Simulations tend to be much more structured, with time limits, planned
AARs, and moderators to manage the scenarios.
• Full-interruption testing—In full-interruption testing, individuals follow each and every IR/DR/BC procedure,
including the interruption of service, restoration of data from backups, and notification of appropriate
individuals. This exercise is often performed after normal business hours in organizations that cannot afford to
disrupt or simulate the disruption of business functions. Although full-interruption testing is the most rigorous
testing strategy, it is unfortunately too risky for most businesses.

walk-through
The CP testing strategy in which all involved individuals walk through a site and discuss the steps they would
take during an actual CP event; can also be conducted as a conference room talk-through.

talk-through
A form of structured walk-through in which individuals meet in a conference room and discuss a CP plan rather
than walking around the organization.

simulation
The CP testing strategy in which the organization conducts a role-playing exercise as if an actual incident or
disaster had occurred. The CP team is presented with a scenario in which all members must specify how they
would react and communicate their efforts.

full-interruption testing
The CP testing strategy in which all team members follow each IR/DR/BC procedure, including those for
interruption of service, restoration of data from backups, and notification of appropriate individuals.

At a minimum, organizations should conduct periodic walk-throughs (or talk-throughs) of each of the CP
component plans. Failure to update these plans as the business and its information resources change can erode
the team’s ability to respond to an incident, or possibly cause greater damage than the incident itself. If this
sounds like a major training effort, note what the author Richard Marcinko, a former Navy SEAL, has to say about
motivating a team:²⁴
• The more you sweat to train, the less you bleed in combat.
• Training and preparation can hurt.
• Lead from the front, not the rear.
• You don’t have to like it; you just have to do it.
• Keep it simple.
• Never assume.
• You are paid for results, not methods.

One often-neglected aspect of training is cross-training. In a real incident or disaster, the people assigned to par-
ticular roles are often not available. In some cases, alternate people must perform the duties of personnel who have
been incapacitated by the disastrous event that triggered the activation of the plan. The testing process should train
people to take over in the event that a team leader or integral member of the execution team is unavailable.
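
Cross-training gaps of the kind described above can be found before a test by checking which roles would be left uncovered if particular people were unavailable. This is a minimal sketch, assuming a hypothetical mapping from each plan role to the people trained to perform it; the role names, personnel names, and function name are illustrative only.

```python
def roles_without_coverage(assignments: dict[str, list[str]],
                           unavailable: set[str]) -> list[str]:
    """Return the roles for which no trained person is currently available."""
    return [
        role
        for role, trained_people in assignments.items()
        if not any(person not in unavailable for person in trained_people)
    ]


if __name__ == "__main__":
    # Hypothetical assignments: the first name in each list is the primary.
    assignments = {
        "team leader": ["Ana", "Ben"],
        "backup restoration": ["Ben"],
        "communications": ["Chao", "Ana"],
    }
    # Suppose one person is incapacitated by the event that triggered the plan.
    for role in roles_without_coverage(assignments, unavailable={"Ben"}):
        print("No trained person available for:", role)
```

Running the check once for each individual in turn shows which single absences would leave a role unfilled, which is one way to prioritize cross-training.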

Final Thoughts on CP
As in all organizational efforts, iteration results in improvement. A critical component of the NIST-based methodologies
presented in this module is continuous process improvement (CPI). Each time the organization rehearses its plans, it
should learn from the process, improve the plans, and then rehearse again. Each time an incident or disaster occurs,
the organization should review what went right and what went wrong. The actual results should be so thoroughly
analyzed that any changes to the plans that could have improved the outcome will be implemented into a revised set
of plans. Through ongoing evaluation and improvement, the organization keeps refining the process so that it can
strive for an even better outcome.

Closing Scenario
Charlie sat at his desk the morning after his nightmare. He had answered the most pressing e-mails in his inbox and had a
piping hot cup of coffee at his elbow. He looked down at a blank legal pad, ready to make notes about what to do in case
his nightmare became reality.

Discussion Questions
1. What would be the first note you wrote down if you were Charlie?
2. What else should be on Charlie’s list?
3. Suppose Charlie encountered resistance to his plans to improve contingency planning. What appeals could he
use to sway opinions toward improved business contingency planning?

Ethical Decision Making


Suppose Charlie’s manager, Gladys, tells him that everything is just fine the way it is. Charlie is firmly convinced that the
company is not prepared for any significant adverse events that may occur. Should Charlie’s professional responsibilities
include escalating this matter to higher levels of the organization?

Selected Readings
• A complete treatment of the contingency planning process is presented in Principles of Incident Response and Disaster
Recovery, 3rd Edition, by Michael Whitman and Herbert Mattord, published by Cengage Learning.
• A book that focuses on the incident response elements of contingency planning is Intelligence-Driven Incident Response:
Outwitting the Adversary by Scott J. Roberts and Rebekah Brown, published by O’Reilly.

Module Summary
• Planning for unexpected events is usually the responsibility of general business managers and the information
technology and information security communities of interest.
• For a plan to be seen as valid by all members of the organization, it must be sanctioned and actively supported
by the general business community of interest.
• Some organizations are required by law or other mandate to have contingency planning procedures in place
at all times, but all business organizations should prepare for the unexpected.
• Contingency planning (CP) is the process by which the information technology and information security com-
munities of interest position their organizations to prepare for, detect, react to, and recover from events that
threaten the security of information resources and assets.
• CP is made up of four major components: the data collection and documentation process known as the busi-
ness impact analysis (BIA), the incident response (IR) plan, the disaster recovery (DR) plan, and the business
continuity (BC) plan.
• Organizations can either create and develop the four planning elements of the CP process as one unified plan,
or they can create these elements separately in conjunction with a set of interlocking procedures that enable
continuity.

• To ensure continuity during the creation of the CP components, a seven-step CP process is used:
1. Develop the contingency planning policy statement.
2. Conduct the BIA.
3. Identify preventive controls.
4. Create contingency strategies.
5. Develop a contingency plan.
6. Ensure plan testing, training, and exercises.
7. Ensure plan maintenance.
• Four teams are involved in contingency planning and contingency operations: the CP team, the IR team, the
DR team, and the BC team. The IR team ensures that the CSIRT is formed.
• The IR plan is a detailed set of processes and procedures that plan for, detect, and resolve the effects of an
unexpected event on information resources and assets.
• For every scenario identified, the CP team creates three sets of procedures—for before, during, and after the
incident—to detect, contain, and resolve the incident.
• Incident classification is the process by which the IR team examines an incident candidate and determines
whether it constitutes an actual incident.
• Three categories of incident indicators are used: possible, probable, and definite.
• When any one of the following happens, an actual incident is in progress: loss of availability of information,
loss of integrity of information, loss of confidentiality of information, violation of policy, or violation of law.
• Digital forensics is the investigation of wrongdoing in the arena of information security. Digital forensics
requires the preservation, identification, extraction, documentation, and interpretation of computer media
for evidentiary and root cause analysis.
• DR planning encompasses preparation for handling and recovering from a disaster, whether natural or
human-made.
• BC planning ensures that critical business functions continue if a catastrophic incident or disaster occurs.
BC plans can include provisions for hot sites, warm sites, cold sites, timeshares, service bureaus, and mutual
agreements.
• Because the DR and BC plans are closely related, most organizations prepare the two at the same time and
may combine them into a single planning document called the business resumption (BR) plan.
• The DR plan should include crisis management, the action steps taken during and after a disaster. In some
cases, the protection of human life and the organization’s image are such high priorities that crisis manage-
ment may deserve its own policy and plan.
• All plans must be tested to identify vulnerabilities, faults, and inefficient processes. Several strategies can
be used to test contingency plans: desk checks, structured walk-throughs, simulations, and full interruption.

Review Questions
1. What is the name for the broad process of planning for the unexpected? What are its primary components?
2. Which two communities of interest are usually associated with contingency planning? Which community must
give authority to ensure broad support for the plans?
3. According to some reports, what percentage of businesses that do not have a disaster plan go out of business
after a major loss?
4. List the seven-step CP process recommended by NIST.
5. List and describe the teams that perform the planning and execution of the CP plans and processes. What is
the primary role of each?
6. Define the term incident as used in the context of IRP. How is it related to the concept of incident response?
7. List and describe the criteria used to determine whether an actual incident is occurring.
8. List and describe the sets of procedures used to detect, contain, and resolve an incident.
9. What is incident classification?
10. List and describe the actions that should be taken during the reaction to an incident.
11. What is an alert roster? What is an alert message? Describe the two ways they can be used.
12. List and describe several containment strategies given in the text. On which tasks do they focus?
13. What is a disaster recovery plan, and why is it important to the organization?
14. What is a business continuity plan, and why is it important?
15. What is a business impact analysis, and what is it used for?
16. Why should contingency plans be tested and rehearsed?
17. Which types of organizations might use a unified continuity plan? Which types of organizations might use the
various contingency planning components as separate plans? Why?
18. What strategies can be used to test contingency plans?
19. List and describe two specialized alternatives not often used as a continuity strategy.
20. What is digital forensics, and when is it used in a business setting?

Exercises
1. Using a Web search engine, search for the terms disaster recovery and business continuity. How many responses
do you get for each term? Note the names of some of the companies in the response. Now perform the search
again, adding the name of your metropolitan area or community.
2. Go to http://csrc.nist.gov. Under “Publications,” select Special Publications, and then locate SP 800-34, Rev. 1,
“Contingency Planning Guide for Federal Information Systems.” Download and review this document. Outline
and summarize the key points for an in-class discussion.
3. Use your library or the Web to find a reported natural disaster that occurred at least six months ago. From the
news accounts, determine whether local or national officials had prepared disaster plans and if the plans were
used. See if you can determine how the plans helped officials improve disaster response. How did the plans help
the recovery?
4. Using the format provided in the text, design an incident response plan for your home computer. Include actions
to be taken if each of the following events occurs:
a. Virus attack
b. Power failure
c. Fire
d. Burst water pipe
e. ISP failure
What other scenarios do you think are important to plan for?
5. Classify each of the following occurrences as an incident or disaster. If an occurrence is a disaster, determine
whether business continuity plans would be called into play.
a. A hacker breaks into the company network and deletes files from a server.
b. A fire breaks out in the storeroom and sets off sprinklers on that floor. Some computers are damaged, but the fire is
contained.
c. A tornado hits a local power station, and the company will be without power for three to five days.
d. Employees go on strike, and the company could be without critical workers for weeks.
e. A disgruntled employee takes a critical server home, sneaking it out after hours.
For each of the scenarios (a–e), describe the steps necessary to restore operations. Indicate whether law
enforcement would be involved.

References
1. “NIST General Information.” National Institute of Standards and Technology. Accessed September 1, 2020,
from www.nist.gov/director/pao/nist-general-information.
2. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.

3. “Disaster Recovery Guide.” The Hartford. Accessed September 1, 2020, from
www.thehartford.com/higrd16/claims/business-disaster-recovery-guide.
4. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
5. Swanson, M., Hash, J., and Bowen, P. Special Publication 800-18, Rev 1: “Guide for Developing Security
Plans for Information Systems.” National Institute of Standards and Technology. February 2006. Page 31.
Accessed December 6, 2017, from csrc.nist.gov/publications/nistpubs/800-18-Rev1/sp800-18-Rev1-final.pdf.
6. Zawada, B., and Evans, L. “Creating a More Rigorous BIA.” CPM Group. November/December 2002.
Accessed May 12, 2005, from www.contingencyplanning.com/archives/2002/novdec/4.aspx.
7. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
8. Ibid.
9. Ibid.
10. Ibid.
11. Ibid.
12. Bartock, M., Cichonski, J., Souppaya, M., Smith, M., Witte, G., and Scarfone, K. Special Publication
800-184, “Guide for Cybersecurity Event Recovery.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-184/final.
13. Cichonski, P., Millar, T., Grance, T., and Scarfone, K. Special Publication 800-61, Rev. 2: “Computer Security
Incident Handling Guide.” National Institute of Standards and Technology. Accessed September 1, 2020,
from https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final.
14. Ibid.
15. Pipkin, D. Information Security: Protecting the Global Enterprise. Upper Saddle River, NJ: Prentice Hall PTR,
2000:285.
16. Cichonski, P., Millar, T., Grance, T., and Scarfone, K. Special Publication 800-61, Rev. 2: “Computer Security
Incident Handling Guide.” National Institute of Standards and Technology. Accessed September 1, 2020,
from https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final.
17. Pipkin, D. Information Security: Protecting the Global Enterprise. Upper Saddle River, NJ: Prentice Hall PTR,
2000:285.
18. Bartock, M., Cichonski, J., Souppaya, M., Smith, M., Witte, G., and Scarfone, K. Special Publication 800-184,
“Guide for Cybersecurity Event Recovery.” Pages 13–14. National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-184/final.
19. McAfee. “Emergency Incident Response: 10 Common Mistakes of Incident Responders.” Accessed
September 1, 2020, from www.techwire.net/uploads/2012/09/wp-10-common-mistakes-incident-responders.pdf.
20. Cichonski, P., Millar, T., Grance, T., and Scarfone, K. Special Publication 800-61, Rev. 2: “Computer Security
Incident Handling Guide.” National Institute of Standards and Technology. Accessed September 1, 2020,
from https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final.
21. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
22. Witty, R. “What is Crisis Management?” Gartner Online. September 19, 2001. Accessed December 6, 2017,
from www.gartner.com/doc/340971.
23. Davis, L. Truth to Tell: Tell It Early, Tell It All, Tell It Yourself: Notes from My White House Education.
New York: Free Press, May 1999.
24. Marcinko, R., and Weisman, J. Designation Gold. New York: Pocket Books, 1998.
