Principles of Information Security 7E - Module 5
Upon completion of this material, you should be able to:
1 Discuss the need for contingency planning
2 Describe the major components of incident response, disaster recovery, and business continuity
3 Identify the processes used in digital forensics investigations
4 Define the components of crisis management
5 Discuss how the organization would prepare and execute a test of contingency plans

A little fire is quickly trodden out; which, being suffered, rivers cannot quench.
William Shakespeare, King Henry VI, Part III, Act IV, Scene 8
Opening Scenario
Charlie Moody flipped up his jacket collar to cover his ears. The spray blowing over him from the fire hoses was icing the cars
along the street where he stood watching his office building burn. The warehouse and shipping dock were not gone but were
severely damaged by smoke and water. He tried to hide his dismay by turning to speak to Fred Chin, standing beside him
overlooking the smoking remains.
“Look at the bright side,” said Charlie. “At least we can get the new servers that we’ve been putting off.”
Fred shook his head. “Charlie, you must be dreaming. We don’t have enough insurance for a full replacement of everything we’ve lost.”
Charlie was stunned. The offices were gone; all the computer systems, servers, and desktops were melted slag. He would
have to try to rebuild without the resources he needed. At least he had good backups, or so he hoped. He thought hard, trying
to remember the last time the off-site backups had been tested.
He wondered where all the network design diagrams were. He knew he could call his Internet provider to order new connections
as soon as Fred found some new office space. But where was all the vendor contact information? The only copy had
been on the computer in his office, which wasn’t there anymore. This was not going to be fun. He would have to call his boss,
Gladys Williams, the chief information officer (CIO), at home just to get the contact information for the rest of the executive team.
Charlie heard a buzzing noise to his left. He turned to see the flashing numbers of his alarm clock. Relief flooded him as
he realized it was just a nightmare; Sequential Label and Supply (SLS) had not burned down. He turned on the light and started
making notes to review with his staff as soon as he got into the office. Charlie would make some changes to the company
contingency plans today.
Because information system resources are essential to an organization’s success, it is critical that
identified services provided by these systems are able to operate effectively without excessive
interruption. Contingency planning supports this requirement by establishing thorough plans,
procedures, and technical measures that can enable a system to be recovered as quickly and effectively
as possible following a service disruption.2
Some organizations—particularly federal agencies for national security reasons—are charged by law, policy, or
other mandate to have such plans and procedures in place at all times.
Organizations of every size and purpose should also prepare for the unexpected. In general, an organization’s
ability to weather losses caused by an adverse event depends on proper planning and execution of the plan. Without
a workable plan, an adverse event can cause severe damage to an organization’s information resources and assets
from which it may never recover. The Hartford insurance company estimates that, on average, more than 40 percent
of businesses that don’t have a disaster plan go out of business after a major loss like a fire, a break-in, or a storm. 3
The development of a plan for handling unexpected events should be a high priority for all managers. The plan
should account for the possibility that key members of the organization will not be available to assist in the recovery
process. In fact, many organizations expect that some key members of the team may not be present when an unexpected
event occurs. To keep the consequences of adverse events less catastrophic, many firms limit the number of
executives or other key personnel who take the same flight or attend special events. The concept of a designated survivor
has become more common in government and corporate organizations: a certain number of specifically skilled
personnel are kept away from group activities in case of unexpected adverse events.
There is a growing emphasis on the need for comprehensive and robust planning for adverse circumstances.
In the past, organizations tended to focus on defensive preparations, using comprehensive threat assessments
combined with defense in depth to harden systems and networks against all possible risks. More organizations
now understand that preparations against the threat of attack remain an urgent and important activity, but that
defenses will fail as attackers acquire new capabilities and systems reveal latent flaws. When—not if—defenses
are compromised, prudent security managers have prepared the organization in order to minimize losses and
reduce the time and effort needed to recover. Sound risk management practices dictate that organizations must
be ready for anything.
Module 5 Incident Response and Contingency Planning 177
The business impact analysis (BIA) is a preparatory activity common to both contingency planning (CP) and risk
management, which was covered in Module 4. It helps the organization determine which business functions and
information systems are the most critical to the success of the organization. The incident response (IR) plan focuses
on the immediate response to an incident. Any unexpected adverse event is treated as an incident unless and until a
response team deems it to be a disaster. Then the disaster recovery (DR) plan, which focuses on restoring operations
at the primary site, is invoked. If operations at the primary site cannot be quickly restored (for example, when the
damage is major or will affect the organization’s functioning over the long term), the business continuity (BC) plan is
executed concurrently with the DR plan, enabling the business to continue at an alternate site until the organization
is able to resume operations at its primary site or select a new primary location.
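The escalation from IR to DR to BC described above can be sketched as a small decision function. This is only an illustration: the function name `plans_to_invoke` and its two Boolean inputs are hypothetical stand-ins for judgments that a response team makes, not part of any NIST publication.

```python
def plans_to_invoke(deemed_disaster: bool, primary_site_quickly_restorable: bool) -> list[str]:
    """Return the contingency plans invoked so far for an adverse event.

    Hypothetical helper: both inputs represent decisions made by people,
    not values a system could compute on its own.
    """
    # Every unexpected adverse event is first treated as an incident.
    plans = ["IR"]
    if deemed_disaster:
        # The response team has declared a disaster, so the DR plan
        # (restoring operations at the primary site) is invoked.
        plans.append("DR")
        if not primary_site_quickly_restorable:
            # The BC plan runs concurrently with DR at an alternate site.
            plans.append("BC")
    return plans


print(plans_to_invoke(deemed_disaster=True, primary_site_quickly_restorable=False))
```

Note that the BC plan never appears without the DR plan in this sketch, which mirrors the text: BC supplements DR when the primary site cannot be quickly restored.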
Depending on the organization’s size and business philosophy, IT and InfoSec managers can either create and
develop these four CP components as one unified plan or create the four separately in conjunction with a set of
interlocking procedures that enable continuity. Typically, larger, more complex organizations create and develop the CP
components separately, as the functions of each component differ in scope, applicability, and design. Smaller
organizations tend to adopt a one-plan method, consisting of a straightforward set of recovery strategies.
Ideally, the chief information officer (CIO), systems administrators, the chief information security officer (CISO),
and key IT and business managers should be actively involved during the creation and development of all CP
components, as well as during the distribution of responsibilities among the three communities of interest. The
elements required to begin the CP process are a planning methodology; a policy environment to enable the planning
process; an understanding of the causes and effects of core precursor activities, known as the BIA; and access to
financial and other resources, as articulated and outlined by the planning budget. Each of these is explained in the
sections that follow. Once formed, the contingency planning management team (CPMT) begins developing a CP
document, for which NIST recommends using the following steps:
1. Develop the CP policy statement. A formal policy provides the authority and guidance necessary to develop an
effective contingency plan.
2. Conduct the BIA. The BIA helps identify and prioritize information systems and components critical to
supporting the organization’s mission/business processes. A template for developing the BIA is provided to
assist the user.
3. Identify preventive controls. Measures taken to reduce the effects of system disruptions can increase system
availability and reduce contingency life cycle costs.
4. Create contingency strategies. Thorough recovery strategies ensure that the system may be recovered quickly
and effectively following a disruption.
5. Develop a contingency plan. The contingency plan should contain detailed guidance and procedures
for restoring damaged organizational facilities unique to each business unit’s impact level and recovery
requirements.
6. Ensure plan testing, training, and exercises. Testing validates recovery capabilities, whereas training
prepares recovery personnel for plan activation and exercising the plan identifies planning gaps; when
combined, the activities improve plan effectiveness and overall organization preparedness.
7. Ensure plan maintenance. The plan should be a living document that is updated regularly to remain
current with system enhancements and organizational changes.4

adverse event
An event with negative consequences that could threaten the organization’s information assets or operations; also
referred to as an incident candidate.

contingency planning (CP)
The actions taken by senior management to specify the organization’s efforts and actions if an adverse event becomes
an incident or disaster; CP typically includes incident response, disaster recovery, and business continuity efforts,
as well as preparatory business impact analysis.

contingency planning management team (CPMT)
The group of senior managers and project members organized to conduct and lead all CP efforts.

Even though NIST methodologies are used extensively in this module, NIST treats incident response separately from
contingency planning; the latter is focused on disaster recovery and business continuity. This module integrates the
approach to contingency planning from NIST SP 800-34, Rev. 1, with the guide to incident handling from NIST SP 800-61,
Rev. 2. It also incorporates material from the newly released NIST SP 800-184, “Guide for Cybersecurity Event Recovery.”
Effective CP begins with effective policy. Before the CPMT can fully develop the planning document, the team must
receive guidance from executive management, as described earlier, through formal CP policy. This policy defines the
scope of the CP operations and establishes managerial intent in regard to timetables for response to incidents, recovery
from disasters, and reestablishment of operations for continuity. It also stipulates responsibility for the development
and operations of the CPMT in general and may provide specifics on the constituencies of all CP-related teams. It is
recommended that the CP policy contain, at a minimum, the following sections:
insight into functions that are critical to running the business. IT managers supply information about the
at-risk systems used in the development of the BIA and the IR, DR, and BC plans. InfoSec managers oversee
the security planning and provide information on threats, vulnerabilities, attacks, and recovery requirements.
A representative from the legal affairs or corporate counsel’s office helps keep all planning steps
within legal and contractual boundaries. A member of the corporate communications department makes
sure the crisis management and communications plan elements are consistent with the needs of that group.
Supplemental team members also include representatives of supplemental planning teams: the incident
response planning team (IRPT), disaster recovery planning team (DRPT), and business continuity planning
team (BCPT). For organizations that decide to separate crisis management from disaster recovery,
there may also be representatives from the crisis management planning team (CMPT).
As indicated earlier, in larger organizations these teams are distinct entities, with non-overlapping memberships,
although the latter three teams have representatives on the CPMT. In smaller organizations, the four teams may
include overlapping groups of people, although this is discouraged because the three planning teams (IR, DR, and
BC) will most likely include members of their respective response teams—the individuals who will actually respond
to an incident or disaster. The planning teams and response teams are distinctly separate groups, but representatives
of the response team will most likely be included on the planning team for continuity purposes and to facilitate plan
development and the communication of planning activities to the response units. If the same individuals are on the DR
and BC teams, for example, they may find themselves with different responsibilities in different locations at the same time.
It is virtually impossible to establish operations at the alternate site if team members are busy managing the recovery
at the primary site, some distance away. Thus, if the organization has sufficient personnel, it is advisable to staff the
two groups with separate members.
As illustrated in the opening scenario of this module, many organizations’ contingency plans are woefully inadequate.
CP often fails to receive the high priority necessary for the efficient and timely recovery of business operations
during and after an unexpected event. The fact that many organizations do not place an adequate premium on CP does
not mean that it is unimportant, however. Here is how NIST’s Computer Security Resource Center (CSRC) describes the
need for this type of planning:
These procedures (contingency plans, business interruption plans, and continuity of operations plans) should be
coordinated with the backup, contingency, and recovery plans of any general support systems, including
networks used by the application. The contingency plans should ensure that interfacing systems are identified
and contingency/disaster planning coordinated.5
As you learn more about CP, you may notice that it shares certain characteristics with risk management and the SDLC
methodology. Many IT and InfoSec managers are already familiar with these processes and thus can readily adapt their
existing knowledge to the CP process.

incident response planning team (IRPT)
The team responsible for designing and managing the IR plan by specifying the organization’s preparation, reaction,
and recovery from incidents.

disaster recovery planning team (DRPT)
The team responsible for designing and managing the DR plan by specifying the organization’s preparation, response,
and recovery from disasters, including reestablishment of business operations at the primary site after the disaster.

business continuity planning team (BCPT)
The team responsible for design-
[Figure: Contingency Planning; Business Impact Analysis; continuous improvement]
The BIA begins with the prioritized list of threats and vulnerabilities identified in the risk management process
discussed in Module 4, and then the list is enhanced by adding the information needed to respond to the adversity.
Obviously, the organization’s security team does everything in its power to stop attacks, but as you have seen, some
attacks, such as natural disasters, deviations from service providers, acts of human failure or error, and deliberate
acts of sabotage and vandalism, may be unstoppable.
When undertaking the BIA, the organization should consider the following:
1. Scope—Carefully consider which parts of the organization to include in the BIA; determine which business
units to cover, which systems to include, and the nature of the risk being evaluated.
2. Plan—The needed data will likely be voluminous and complex, so work from a careful plan to ensure that
the proper data is collected to enable a comprehensive analysis. Getting the correct information to address
the needs of decision makers is important.
3. Balance—Weigh the information available; some information may be objective in nature, while other
information may only be available as subjective or anecdotal references. Facts should be weighted
properly against opinions; however, sometimes the knowledge and experience of key personnel can be
invaluable.
4. Objective—Identify in advance what the key decision makers require for making choices. Structure the BIA
to bring them the information they need and to facilitate consideration of those choices.
5. Follow-up—Communicate periodically to ensure that process owners and decision makers will support the
process and end result of the BIA.6
According to NIST’s SP 800-34, Rev. 1, the CPMT conducts the BIA in three stages described in the sections that
follow:7
business functions are listed (usually as rows on the same worksheet). Each business function is assessed a score for
each of the criteria. Next, the weights can be multiplied against the scores in each of the criteria, and then the rows
are summed to obtain the overall scored value of the function to the organization. The higher the value computed for
a given business function, the more important that function is to the organization.
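The weighted scoring described above amounts to a sum of products per row. The following is a minimal sketch of that arithmetic; the criteria, weights, business functions, and scores are all hypothetical values chosen for illustration, and integer weights (summing to 100) are used only to keep the totals exact.

```python
def weighted_scores(weights, scores):
    """Multiply each criterion score by its weight and sum across the row.

    weights: {criterion: weight}
    scores:  {business_function: {criterion: score}}
    """
    return {
        function: sum(weights[c] * row[c] for c in weights)
        for function, row in scores.items()
    }


# Hypothetical criteria and weights.
weights = {"revenue": 50, "customer_impact": 30, "regulatory": 20}

# Hypothetical business functions, each scored 1 (low) to 5 (high) per criterion.
scores = {
    "order processing": {"revenue": 5, "customer_impact": 4, "regulatory": 2},
    "payroll": {"revenue": 1, "customer_impact": 1, "regulatory": 4},
}

totals = weighted_scores(weights, scores)
# Higher totals indicate functions that are more important to the organization.
ranked = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranked)
```

With these assumed inputs, order processing totals 410 and payroll totals 160, so order processing ranks first.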
A BIA questionnaire is an instrument used to collect relevant business impact information for the required analysis.
It is useful as a tool for identifying and collecting information about business functions for the analysis just described.
It can also be used to allow functional managers to directly enter information about the business processes within
their area of control, the impacts of these processes on the business, and dependencies that exist for the functions
from specific resources and outside service providers.
NIST Business Process and Recovery Criticality NIST’s SP 800-34, Rev. 1, recommends that organizations use simple
qualitative categories like “low impact,” “moderate impact,” or “high impact” for the security objectives of
confidentiality, integrity, and availability (NIST’s Risk Management Framework Step 1). Note that large quantities of
information are assembled, and a data collection process is essential if all meaningful and useful information
collected in the BIA process is to be made available for use in overall CP development.
When organizations consider recovery criticality, key recovery measures are usually described in terms of how much of
the asset they must recover and what time frame it must be recovered within. The following terms are most frequently
used to describe these values:
• Recovery time objective (RTO)
• Recovery point objective (RPO)
• Maximum tolerable downtime (MTD)
• Work recovery time (WRT)
The difference between RTO and RPO is illustrated in Figure 5-3. WRT typically involves the addition of nontechnical
tasks required for the organization to make the information asset usable again for its intended business function. The
WRT can be added to the RTO to determine the realistic amount of elapsed time required before a business function is
back in useful service, as illustrated in Figure 5-4.
NIST goes on to say that failing to determine MTD “could leave contingency planners with imprecise direction on
(1) selection of an appropriate recovery method and (2) the depth of detail that will be required when developing
recovery procedures, including their scope and content.”8 Determining the RTO for the information system resource,
NIST adds, “is important for selecting appropriate technologies that are best suited for meeting the MTD.”9 As for
reducing RTO, that requires mechanisms to shorten the start-up time or provisions to make data available online at a
failover site. Unlike RTO, NIST adds, “RPO is not considered as part of MTD. Rather, it is a factor of how much data
loss the mission/business process can tolerate during the recovery process.”10 Reducing RPO requires mechanisms to
increase the synchronicity of data replication between production systems and the backup implementations for those
systems.

recovery time objective (RTO)
The maximum amount of time that a system resource can remain unavailable before there is an unacceptable impact on
other system resources, supported business processes, and the maximum tolerable downtime.

recovery point objective (RPO)
The point in time before a disruption or system outage to which business process data can be recovered after an
outage, given the most recent backup copy of the data.

maximum tolerable downtime (MTD)
The total amount of time the system owner or authorizing official is willing to accept for a business process outage
or disruption. The MTD includes all impact considerations.

work recovery time (WRT)
The amount of effort (expressed as elapsed time) needed to make business functions work again after the technology
element is recovered. This recovery time is identified by the RTO.

[Figure 5-3: RTO versus RPO on a timeline. Markers: last backup or point where data is in usable and recoverable
state; incident/disaster strikes; systems and data recovered. Recovery point: how far back (how much lost data)?
Recovery time: how long to recover (how soon for restoration and recovery)?
Source: http://networksandservers.blogspot.com/2011/02/high-availability-terminology-ii.html.]
[Figure 5-4: Recovery timeline. Incident/disaster strikes; physical/systems recovery; data recovery; testing and
validation; recovery complete/resume operations.]
Because of the critical need to recover business functionality, the total time needed to place the busi-
ness function back in service must be shorter than the MTD. Planners should determine the optimal point to
recover the information system in order to meet BIA-mandated recovery needs while balancing the cost of
system inoperability against the cost of the resources required for restoring systems. This must be done in
the context of the BIA-identified critical business processes and can be shown with a simple chart, such as the
one in Figure 5-5.
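The timing constraint stated above (RTO plus WRT must come in under the MTD) can be checked mechanically once the three values are known. The sketch below uses hypothetical function names and hypothetical durations; it is not a NIST-defined procedure.

```python
from datetime import timedelta


def time_to_useful_service(rto: timedelta, wrt: timedelta) -> timedelta:
    """Elapsed time before the business function is back in useful service."""
    return rto + wrt


def fits_within_mtd(rto: timedelta, wrt: timedelta, mtd: timedelta) -> bool:
    """True only when RTO + WRT is strictly shorter than the MTD."""
    return time_to_useful_service(rto, wrt) < mtd


# Hypothetical values for a single business function.
rto = timedelta(hours=4)   # time to recover the system resource
wrt = timedelta(hours=2)   # nontechnical work to make it usable again

print(fits_within_mtd(rto, wrt, mtd=timedelta(hours=8)))  # True
print(fits_within_mtd(rto, wrt, mtd=timedelta(hours=6)))  # False: 6 hours is not shorter than 6 hours
```

RPO is deliberately left out of the check because, as NIST notes, it is not considered part of the MTD.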
The longer an interruption to system availability remains, the more impact and cost it will have for the organization
and its operations. When plans require a short RTO, the solutions that will be required are usually more expensive to
design and use. For example, if a system must be recovered immediately, it will have an RTO of 0.
These types of solutions will require fully redundant alternative processing sites and will therefore have much
higher costs. On the other hand, a longer RTO would allow a less expensive recovery system. Plotting the cost balance
points will show an optimal point between disruption and recovery costs. The intersecting point, labeled the cost
balance point in Figure 5-5, will be different for every organization and system, based on the financial constraints and
operating requirements.11
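One rough way to explore this trade-off numerically is to compare a handful of candidate recovery solutions. The sketch below does not plot intersecting curves; it substitutes a simpler stand-in, choosing the option with the lowest combined cost of recovery plus disruption. The linear disruption cost, the "warm standby" option, and every dollar figure are assumptions made for illustration only.

```python
def cheapest_option(options, disruption_cost_per_hour):
    """Pick the recovery option with the lowest total cost.

    options: list of (name, rto_hours, recovery_cost) tuples.
    Assumes disruption cost grows linearly with the RTO, which is a
    simplification; real business impact is rarely linear.
    """
    def total_cost(option):
        _, rto_hours, recovery_cost = option
        return recovery_cost + disruption_cost_per_hour * rto_hours

    return min(options, key=total_cost)


# Hypothetical options: shorter RTOs cost more to design and use.
options = [
    ("system mirror", 0, 500_000),
    ("warm standby", 8, 150_000),
    ("tape backup", 72, 20_000),
]

best = cheapest_option(options, disruption_cost_per_hour=10_000)
print(best)
```

Under these assumed figures the totals are 500,000 for the mirror, 230,000 for the warm standby, and 740,000 for tape, so the middle option wins. Changing the hourly disruption cost shifts the result, which is the point the text makes about the balance differing for every organization and system.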
[Figure 5-5: Cost balancing. Curves: cost of disruption (business impact); cost to recover (system mirror); cost to
recover (tape backup). The intersection is labeled the cost balance point.]
Information Asset Prioritization As the CPMT conducts the BIA, it will be assessing priorities and relative values
for mission/business processes. To do so, it needs to understand the information assets used by those processes. In
essence, the organization has determined which processes are most critical to its long-term viability, and now it must
determine which information assets are most critical to each process.
Note that the presence of high-value information assets may influence the valuation of a particular business
process. In any event, once the business processes have been prioritized, the organization should identify, classify, and
prioritize the information assets both across the organization and within each business process, placing classification
labels on each collection or repository of information in order to better understand its value and to prioritize its
protection. Normally, this task would be performed as part of the risk assessment function within the risk management
process. If the organization has not performed this task, the BIA process is the appropriate time to do so. Again, the
weighted table analysis (WTA) can be a useful tool to determine the information asset priorities.
Incident Response
Most organizations have experience detecting, reacting to, and recovering from cyberattacks, employee errors, service
outages, and small-scale natural disasters. While they may not have formally labeled such efforts, these organizations
are performing incident response (IR). IR must be carefully planned and coordinated because organizations heavily
depend on the quick and efficient containment and resolution of incidents.
Incident response planning (IRP), therefore, is the preparation for such an effort and is performed by the IRP team
(IRPT). Note that the term incident response could be used either to describe the entire set of activities or a specific
phase in the overall reaction. However, in an effort to minimize confusion, this text will use the term IR to describe
the overall process, and reaction rather than response to describe the organization’s performance after it detects an
incident.
In business, unexpected events happen. When those events represent the potential for loss, they are referred to as
adverse events or incident candidates. When an adverse event begins to manifest as a real threat to information, it
becomes an incident. The incident response plan (IR plan) is usually activated when the organization detects an
incident that affects it, regardless of how minor the effect is.

incident response (IR)
An organization’s set of planning and preparation efforts for detecting, reacting to, and recovering from an incident.

computer security incident response team (CSIRT)
An IR team composed of techni-

[Figure 5-6: Incident response life cycle. Phases: Preparation; Detection & analysis; Containment, eradication &
recovery; Post-incident activity. Source line truncated: [...] Incident Handling Guide.”]
[Figure 5-7: Cybersecurity Framework functions (Identify, Protect, Detect, Respond, Recover) aligned with recovery
activity: detect cyber event; respond to cyber event; remediate root cause; tactical recovery phase; strategic recovery
phase. Source: Guide for cybersecurity event recovery.]
not difficult to map the phases shown in Figure 5-6 to those of Figure 5-7. Within the CSF, the five stages shown in
Figure 5-7 include the following:
• Identify—Relates to risk management and governance
• Protect—Relates to implementation of effective security controls (policy, education, training and awareness,
and technology)
• Detect—Relates to the identification of adverse events
• Respond—Relates to reacting to an incident
• Recover—Relates to putting things “as they were before” the incident12
The Detect, Respond, and Recover stages directly relate to NIST’s IR strategy, as described in detail in SP 800-61,
Rev. 2.
For more information on the NIST Cybersecurity Framework, download the Framework for Improving Critical
Infrastructure Cybersecurity from
www.nist.gov/sites/default/files/documents/cyberframework/cybersecurity-framework-021214.pdf.
changes to their IT infrastructures. For example, if the CSIRT determines that the only way to stop a massive denial-of-
service attack is to sever the organization’s connection to the Internet, it should have the approved permission stored
in an appropriate and secure location before authorizing such action. This ensures that the CSIRT is performing autho-
rized actions and protects both the CSIRT members and the organization from misunderstanding and potential liability.
The prevention of threats and attacks has been intentionally omitted from this discussion because guarding
against such possibilities is primarily the responsibility of the InfoSec department, which works with the rest of the
organization to implement sound policy, effective risk controls, and ongoing training and awareness programs. It is
important to understand that IR is a reactive measure, not a preventive one, although most IR plans include preventative
recommendations.
The responsibility for creating an organization’s IR plan usually falls to the CIO, the CISO, or an IT manager with
security responsibilities. With the aid of other managers and systems administrators on the CP team, the CISO should
select members from each community of interest to form an independent IR team, which executes the IR plan. The
roles and responsibilities of IR team members should be clearly documented and communicated throughout the
organization. The IR plan also includes an alert roster, which lists certain critical individuals and organizations to be
contacted during the course of an incident.
Using the multistep CP process discussed in the previous section as a model, the CP team can create the IR plan.
According to NIST SP 800-61, Rev. 2, the IR plan should include the following elements:
• Mission
• Strategies and goals
• Senior management approval
• Organizational approach to incident response
• How the incident response team will communicate with the rest of the organization and with other
organizations
• Metrics for measuring incident response capability and its effectiveness
• Roadmap for maturing incident response capability
• How the program fits into the overall organization14
During this planning process, the IR procedures take shape. For every incident scenario, the CP team creates three
sets of incident handling procedures:
1. During the incident: The planners develop and document the procedures that must be performed during the
incident. These procedures are grouped and assigned to individuals. Systems administrators’ tasks differ from
managerial tasks, so members of the planning committee must draft a set of function-specific procedures.
2. After the incident: Once the procedures for handling an incident are drafted, the planners develop and
document the procedures that must be performed immediately after the incident has ceased. Again,
separate functional areas may develop different procedures.
3. Before the incident: The planners draft a third set of procedures: those tasks that must be performed
to prepare for the incident, including actions that could mitigate any damage from the incident. These
procedures include details of the data backup schedules, disaster recovery preparation, training schedules,
testing plans, copies of service agreements, and BC plans, if any. At this level, the BC plan could consist just
of additional material about a service bureau that stores data off-site via electronic vaulting, with an
agreement to provide office space and lease equipment as needed.

IR procedures
Detailed, step-by-step methods of preparing, detecting, reacting to, and recovering from an incident.

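One way to keep track of the three procedure sets for each incident scenario is a simple record that flags which sets are still undrafted. This is a sketch only; the class name, field names, scenario, and procedure text are hypothetical and are not taken from NIST SP 800-61.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentProcedures:
    """The three sets of handling procedures for one incident scenario."""
    scenario: str
    before: list[str] = field(default_factory=list)  # preparation and mitigation tasks
    during: list[str] = field(default_factory=list)  # function-specific tasks while the incident is active
    after: list[str] = field(default_factory=list)   # tasks performed once the incident has ceased

    def missing_sets(self) -> list[str]:
        """Names of the procedure sets that have not been drafted yet."""
        return [name for name in ("before", "during", "after") if not getattr(self, name)]


# Hypothetical scenario with only two of the three sets drafted.
plan = IncidentProcedures(
    scenario="malware outbreak on file server",
    before=["verify data backup schedule", "run staff training"],
    during=["isolate affected hosts", "notify contacts on the alert roster"],
)
print(plan.missing_sets())
```

Because the "after" set is empty here, the check reports it as missing, which makes gaps in a scenario's coverage easy to spot during planning.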
Planning for an incident and the responses to it requires a detailed understanding of the information systems and the
threats they face. The BIA provides the data used to develop the IR plan. The IRPT seeks to develop a series of predefined
responses that will guide the CSIRT and InfoSec staff through the IR process. Predefining incident responses enables the
organization to react to a detected incident quickly and effectively, without confusion or wasted time and effort.
The execution of the IR plan typically falls to the CSIRT. As noted previously, the CSIRT is a separate group from the IRPT,
although some overlap may occur; the CSIRT is composed of technical and managerial IT and InfoSec professionals who are
prepared to diagnose and respond to an incident. In some organizations, the CSIRT may simply be a loose or informal association
of IT and InfoSec staffers who would be called if an attack were detected on the organization’s information assets. In other,
more formal implementations, the CSIRT is a set of policies, procedures, technologies, people, and data put in place to prevent,
detect, react to, and recover from an incident that could potentially damage the organization’s information. At some level, all
members of an organization are members of the CSIRT, because every action they take can cause or avert an incident.
The CSIRT should be available for contact by anyone who discovers or suspects that an incident involving the
organization has occurred. One or more team members, depending on the magnitude of the incident and availability
of personnel, then handle the incident. The incident handlers analyze the incident data, determine the impact of the
incident, and act appropriately to limit the damage to the organization and restore normal services. Although the
CSIRT may have only a few members, the team’s success depends on the participation and cooperation of individuals
throughout the organization.
The CSIRT consists of professionals who can handle the systems and functional areas affected by an incident. For
example, imagine a firefighting team responding to an emergency call. Rather than responding to the fire as
individuals, every member of the team has a specific role to perform, so that the team acts as a unified body that assesses the
situation, determines the appropriate response, and coordinates the response. Similarly, each member of the IR team
must know his or her specific role, work in concert with other team members, and execute the objectives of the IR plan.
Incident response actions can be organized into three basic phases:
Action    Completed
Detection and Analysis
1. Determine whether an incident has occurred
   1.1 Analyze the precursors and indicators
   1.2 Look for correlating information
   1.3 Perform research (e.g., search engines, knowledge base)
   1.4 As soon as the handler believes an incident has occurred, begin documenting the investigation and gathering evidence
2. Prioritize handling the incident based on the relevant factors (functional impact, information impact, recoverability effort, etc.)
3. Report the incident to the appropriate internal personnel and external organizations
Containment, Eradication, and Recovery
4. Acquire, preserve, secure, and document evidence
5. Contain the incident
6. Eradicate the incident
   6.1 Identify and mitigate all vulnerabilities that were exploited
   6.2 Remove malware, inappropriate materials, and other components
   6.3 If more affected hosts are discovered (e.g., new malware infections), repeat the Detection and Analysis steps (1.1, 1.2) to identify all other affected hosts, then contain (5) and eradicate (6) the incident for them
7. Recover from the incident
   7.1 Return affected systems to an operationally ready state
   7.2 Confirm that the affected systems are functioning normally
   7.3 If necessary, implement additional monitoring to look for future related activity
Post-Incident Activity
8. Create a follow-up report
9. Hold a lessons learned meeting (mandatory for major incidents, optional otherwise). While not explicitly noted in the NIST document, most organizations will document the findings from this activity and use it to update relevant plans, policies, and procedures.
Source: NIST SP 800-61, Rev. 2.
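Step 2 of the checklist, prioritizing by functional impact, information impact, and recoverability effort, can be illustrated with a small scoring sketch. The three factors come from the checklist; the category labels, the numeric weights, and the names Incident and triage are assumptions made for illustration, and an organization would substitute the values documented in its own IR plan.

```python
from dataclasses import dataclass

# Hypothetical labels and weights for the three factors named in the checklist.
# These values are illustrative assumptions, not figures from the text.
FUNCTIONAL_IMPACT = {"none": 0, "low": 1, "medium": 2, "high": 3}
INFORMATION_IMPACT = {"none": 0, "privacy_breach": 2, "proprietary_breach": 2, "integrity_loss": 3}
RECOVERABILITY = {"regular": 0, "supplemented": 1, "extended": 2, "not_recoverable": 3}


@dataclass
class Incident:
    name: str
    functional_impact: str
    information_impact: str
    recoverability: str

    def priority_score(self):
        """Sum the three factor weights; a higher score means handle sooner."""
        return (
            FUNCTIONAL_IMPACT[self.functional_impact]
            + INFORMATION_IMPACT[self.information_impact]
            + RECOVERABILITY[self.recoverability]
        )


def triage(incidents):
    """Return the incidents ordered from highest to lowest priority score."""
    return sorted(incidents, key=lambda incident: incident.priority_score(), reverse=True)
```

A simple sum treats the three factors as equally important; a plan that values one factor more heavily could weight it accordingly.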
Possible Indicators
The following types of incident candidates are considered possible indicators of actual incidents:
• Presence of unfamiliar files—Users might discover unfamiliar files in their home directories or on their office
computers. Administrators might also find unexplained files that do not seem to be in a logical location or are
not owned by an authorized user.
• Presence or execution of unknown programs or processes—Users or administrators might detect unfamiliar pro-
grams running, or processes executing, on office machines or network servers. Users should become familiar
with accessing running programs and processes (usually through the Windows Task Manager shown in Figure
5-8) so they can detect rogue instances.
• Unusual consumption of computing resources—An example would be a sudden spike or fall in consumption of
memory or hard disk space. Many computer operating systems, including Windows, Linux, and UNIX variants,
allow users and administrators to monitor CPU and memory consumption. The Windows Task Manager has a
Performance tab that provides this information, also shown in Figure 5-8. Most computers also have the ability
to monitor hard drive space. In addition, servers maintain logs of file creation and storage.
• Unusual system crashes—Computer systems can crash. Older operating systems running newer programs are
notorious for locking up or spontaneously rebooting whenever the operating system is unable to execute a
requested process or service. You are probably familiar with system error messages such as “Unrecoverable
Application Error,” “General Protection Fault,” and the infamous Windows “Blue Screen of Death.” However,
if a computer system seems to be crashing, hanging, rebooting, or freezing more frequently than usual, the
cause could be an incident candidate.
Probable Indicators
The following types of incident candidates are considered probable indicators of actual incidents:
• Activities at unexpected times—If traffic levels on the organization’s network exceed the measured baseline
values, an incident candidate is probably present. If this activity surge occurs outside normal business hours,
the probability becomes much higher. Similarly, if systems are accessing drives and otherwise indicating high
activity when employees aren’t using them, an incident may also be occurring.
Source: Microsoft.
Figure 5-8 Windows Task Manager showing processes (left) and services (right)
• Presence of new accounts—Periodic review of user accounts can reveal accounts that the administrator does
not remember creating or that are not logged in the administrator’s journal. Even one unlogged new account
is an incident candidate. An unlogged new account with root or other special privileges has an even higher
probability of being an actual incident.
• Reported attacks—If users of the system report a suspected attack, there is a high probability that an incident
has occurred, whether it was an attack or not. The technical sophistication of the person making the report
should be considered. If systems administrators are reporting attacks, odds are that additional attacks are
occurring throughout the organization.
• Notification from an IDPS—If the organization has installed and correctly configured a host- or network-based
intrusion detection and prevention system (IDPS), then a notification from the IDPS indicates that an incident
might be in progress. However, IDPSs are difficult to configure perfectly, and even when they are, they tend to
issue false positives or false alarms. The administrator must then determine whether the notification is real
or the result of a routine operation by a user or other administrator.
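The "activities at unexpected times" indicator above depends on comparing observed traffic with a measured baseline. A minimal sketch of that comparison follows, assuming the baseline is a list of historical traffic samples and that "exceeds the baseline" means more than k standard deviations above the mean. The threshold k, the business-hours range, and the function names are illustrative assumptions.

```python
from statistics import mean, stdev

# Assumed business hours (08:00 to 17:59, Monday to Friday); adjust to the organization.
BUSINESS_HOURS = range(8, 18)


def exceeds_baseline(baseline, observed, k=3.0):
    """True when observed traffic is more than k standard deviations above the baseline mean."""
    return observed > mean(baseline) + k * stdev(baseline)


def classify_traffic(baseline, observed, when):
    """Label one traffic sample as 'normal', 'probable', or 'probable-off-hours'.

    `when` is a datetime for the sample; a surge outside business hours is
    labeled separately because the text treats it as a stronger indicator.
    """
    if not exceeds_baseline(baseline, observed):
        return "normal"
    in_hours = when.weekday() < 5 and when.hour in BUSINESS_HOURS
    return "probable" if in_hours else "probable-off-hours"
```

A fixed multiple of the standard deviation is only one way to define a surge; networks with strong daily cycles usually need a separate baseline per time of day.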
Definite Indicators
The following five types of incident candidates are definite indicators of an actual incident. That is, they clearly signal
that an incident is in progress or has occurred. In these cases, the IR plan must be activated immediately, and
appropriate measures must be taken by the CSIRT.
• Use of dormant accounts—Many network servers maintain default accounts, and there are often accounts
from former employees, employees on a leave of absence or sabbatical without remote access privileges,
or dummy accounts set up to support system testing. If any of these accounts activate and begin accessing
system resources, querying servers, or engaging in other activities, an incident is certain to have occurred.
• Changes to logs—Smart systems administrators back up system logs as well as system data. As part of a routine
incident scan, systems administrators can compare these logs to the online versions to determine whether
they have been modified. If they have, and the systems administrator cannot determine explicitly that an
authorized individual modified them, an incident has occurred.
• Presence of hacker tools—Network administrators sometimes use system vulnerability and network evaluation
tools to scan internal computers and networks to determine what a hacker can see. These tools are also used
to support research into attack profiles. All too often, however, they are used by individuals with local network
access to hack into systems or just “look around.” To combat this problem, many organizations explicitly prohibit
the use of these tools without permission from the CISO, making any unauthorized installation a policy violation.
Most organizations that engage in penetration testing require that all tools in this category be confined to specific
systems and that they not be used on the general network unless active penetration testing is under way. Finding
hacker tools, or even legal security tools, in places they should not be is an indicator that an incident has occurred.
• Notifications by partner or peer—If a business partner or another integrated organization reports an attack
from your computing systems, then an incident has occurred. It’s quite common for an attacker to use a third
party’s conscripted systems to attack another system rather than attacking directly.
• Notification by hacker—Some hackers enjoy taunting their victims. If an organization’s Web pages are defaced,
it is an incident. If an organization receives an extortion request for money in exchange for its stolen data, an
incident is in progress. Note that even if an actual attack has not occurred—for example, the hacker is just
making an empty threat—the reputational risk is real and should be treated as such.
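The "changes to logs" indicator above relies on comparing backed-up logs with their online versions. The sketch below shows one way to make that comparison, assuming logs are append-only, so every backed-up entry should still appear, unchanged and in order, at the start of the online log. The function names are hypothetical, and log rotation between the backup and the check would need separate handling.

```python
def first_divergence(backup_lines, online_lines):
    """Return the index of the first backed-up entry that is missing or altered online.

    Returns None when the online log still begins with every backed-up entry.
    """
    for index, entry in enumerate(backup_lines):
        if index >= len(online_lines) or online_lines[index] != entry:
            return index
    return None


def log_was_modified(backup_lines, online_lines):
    """True when the online log no longer matches the backup it should extend."""
    return first_divergence(backup_lines, online_lines) is not None
```

A positive result is only a candidate: as the text notes, the administrator still has to rule out an authorized modification.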
Reacting to Incidents
Once an actual incident has been confirmed and properly classified, the IR plan moves from the detection phase
to the reaction phase. NIST SP 800-61, Rev. 2, combines the reaction and recovery phases into their “Containment,
Eradication, and Recovery” phase, but the phases are treated separately as “Respond” and “Recover” under the
new CSF.16
The steps in IR are designed to stop the incident, mitigate its effects, and provide information for recovery from
the incident. In the Reaction or Response phase, several action steps taken by the CSIRT and others must occur quickly
and may take place concurrently. An effective IR plan prioritizes and documents these steps to allow for efficient
reference during an incident. These steps include notification of key personnel, documentation of the incident,
determination of containment options, and escalation of the incident if needed.
For more information on selecting an automated notification system, read the article by Steven Ross on
TechTarget’s page at https://searchdisasterrecovery.techtarget.com/feature/Selecting-an-automated-notification-system-for-data-center-disasters.
The alert roster is used to deliver the alert message, which tells each team member his or her expected task and
situation. It provides just enough information so that each responder, CSIRT or otherwise, knows what portion of the
IR plan to implement without impeding the notification process. It is important to recognize that not everyone is on
the alert roster, only individuals who must respond to an actual incident. As with any part of the IR plan, the alert
roster must be regularly maintained, tested, and rehearsed if it is to remain effective.
alert message
A description of the incident or disaster that usually contains just enough information so that each person knows
what portion of the IR or DR plan to implement without slowing down the notification process.
During this phase, other key personnel not on the alert roster, such as general
management, must be notified of the incident as well. This notification should occur
only after the incident has been confirmed but before media or other external sources learn of it. Among those likely to
be included in the notification process are members of the legal, communications, and human resources departments.
In addition, some incidents are disclosed to the employees in general as a lesson in security, and some are not, as a
measure of security. Furthermore, other organizations may need to be notified if it is determined that the incident is
not confined to internal information resources or is part of a larger-scale assault. Distributed denial-of-service attacks
are an example of this type of general assault against the cyber infrastructure. In general, the IR planners should
determine in advance whom to notify and when, and should offer guidance about additional notification steps to
take as needed.
Documenting an Incident
As soon as an incident has been confirmed and the notification process is under way, the team should begin to docu-
ment it. The documentation should record the who, what, when, where, why, and how of each action taken while the
incident is occurring. This documentation serves as a case study after the fact to determine whether the right actions
were taken and whether they were effective. It can also serve as evidence that the organization did everything possible
to prevent the spread of the incident.
Legally, the standards of due care may offer some protection to the organization if an incident adversely affects
individuals inside and outside the organization, or if it affects other organizations that use the target organization’s
systems. Incident documentation can also be used as a simulation in future training sessions with the IR plan.
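The who, what, when, where, why, and how of each action can be captured in a simple structured record. The sketch below is one possible shape for such a log, with an automatic UTC timestamp for the "when" field. The class names, field names, and JSON export are assumptions for illustration, not a format prescribed by the text.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Current time in UTC as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ActionRecord:
    who: str
    what: str
    where: str
    why: str
    how: str
    when: str = field(default_factory=_utc_now)  # filled in automatically


class IncidentLog:
    """Append-only list of actions taken during one incident."""

    def __init__(self, incident_id):
        self.incident_id = incident_id
        self._records = []

    def record(self, **details):
        """Add one action; details must supply who, what, where, why, and how."""
        entry = ActionRecord(**details)
        self._records.append(entry)
        return entry

    def to_json(self):
        """Serialize the log for the after-the-fact case study or for training."""
        return json.dumps(
            {"incident_id": self.incident_id, "actions": [asdict(r) for r in self._records]},
            indent=2,
        )
```

Recording timestamps in UTC avoids ambiguity when responders or systems sit in different time zones.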
Obviously, the final strategy is used only when all system control has been lost and the only hope is to preserve
the data stored on the computers so that operations can resume normally once the incident is resolved. The CSIRT,
following the procedures outlined in the IR plan, determines the length of the interruption.
Consider what would happen during an incident if key personnel were on sick leave, on vacation, or otherwise not
at work. Think of how many people in your class or office are not there on a regular basis. Many businesses require
travel, with employees going off-site to meetings, seminars, or training, and to fulfill other diverse requirements. In
addition, “life happens”—employees are sometimes absent due to illness, injury, routine medical activities, and other
unexpected events. In considering these possibilities, the importance of preparedness becomes clear. Everyone should
know how to react to an incident, not just the CISO and security administrators.
Incident Escalation
An incident may increase in scope or severity to the point that the IR plan cannot adequately handle it. An important
part of knowing how to handle an incident is knowing at what point to escalate it to a disaster, or to transfer the incident
to an outside authority such as law enforcement or some other public response unit. During the BIA, each organiza-
tion will have to determine the point at which an incident is deemed a disaster. These criteria must be included in the
IR plan. The organization must also document when to involve outside responders, as discussed in other sections.
Escalation is one of those things that, once done, cannot be undone, so it is important to know when and where it
should be used.
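Because escalation criteria must be written into the IR plan in advance, they can be expressed as explicit thresholds that a handler checks during an incident. The sketch below assumes two hypothetical criteria, outage duration and number of affected sites, plus a flag for suspected criminal activity. Real criteria come from the organization's BIA, and every name and value here is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationCriteria:
    max_outage_hours: float  # longest outage the IR plan is expected to handle
    max_sites_affected: int  # more sites than this is treated as a disaster


def escalation_decision(criteria, outage_hours, sites_affected, crime_suspected):
    """Return the escalation actions the documented criteria call for."""
    actions = []
    if outage_hours > criteria.max_outage_hours or sites_affected > criteria.max_sites_affected:
        actions.append("escalate to disaster")
    if crime_suspected:
        actions.append("involve outside authority")
    return actions or ["continue under IR plan"]
```

The function only recommends; because escalation cannot be undone, the final call would normally stay with the person the plan designates.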
• Identify the vulnerabilities that allowed the incident to occur and spread. Resolve them.
• Address the safeguards that failed to stop or limit the incident or were missing from the system in the first
place. Install, replace, or upgrade them.
• Evaluate monitoring capabilities (if present). Improve detection and reporting methods or install new
monitoring capabilities.
• Restore the data from backups, as needed. The IR team must understand the backup strategy used by the
organization, restore the data contained in backups, and then use the appropriate recovery processes, from
incremental backups or database journals, to recreate any data that was created or modified since the last backup.
• Restore the services and processes in use. Compromised services and processes must be examined, cleaned,
and then restored. If services or processes were interrupted while regaining control of the systems, they need
to be brought back online.
• Continuously monitor the system. If an incident happened once, it could easily happen again. Hackers
frequently boast of their exploits in chat rooms and dare their peers to match their efforts. If word gets out,
others may be tempted to try the same or different attacks on your systems. It is therefore important to maintain
vigilance during the entire IR process.
• Restore the confidence of the organization’s communities of interest. The CSIRT, following a recommendation
from management, may want to issue a short memorandum outlining the incident and assuring everyone that
it was handled and the damage was controlled. If the incident was minor, say so. If the incident was major or
severely damaged systems or data, reassure users that they can expect operations to return to normal as soon
as possible. The objective of this communication is to prevent panic or confusion from causing additional
disruption to the operations of the organization.
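The backup step in the list above (restore from the last backup, then replay incremental backups) can be sketched as selecting a restore chain. The example assumes each backup is tagged as "full" or "incremental" and that each incremental captures changes since the previous backup. The Backup type and restore_chain function are hypothetical, and replaying database journals is out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups, restore_point):
    """Return the backups to apply, in order, to rebuild data as of restore_point.

    The chain is the most recent full backup taken at or before restore_point,
    followed by every later incremental taken at or before restore_point.
    """
    eligible = sorted(
        (b for b in backups if b.taken_at <= restore_point),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in eligible if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the restore point")
    base = fulls[-1]
    incrementals = [b for b in eligible if b.kind == "incremental" and b.taken_at > base.taken_at]
    return [base] + incrementals
```

Choosing a restore point before the compromise began matters as much as choosing the right chain; otherwise the restore can reintroduce what the team just eradicated.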
According to NIST SP 800-184, every organization should have a recovery plan (as a subset of the IR plan) to guide
specific efforts after the incident has been contained. The following is the summary of recommendations from that document:
• Understand how to be prepared for resilience at all times, planning how to operate in a diminished capacity
or restore services over time based on their relative priorities.
• Identify and document the key personnel who will be responsible for defining recovery criteria and associated
plans, and ensure these personnel understand their roles and responsibilities.
• Create and maintain a list of people, process, and technology assets that enable the organization to achieve
its mission (including external resources), along with all dependencies among these assets. Document and
maintain categorizations for these assets based on their relative importance and interdependencies to enable
prioritization of recovery efforts.
• Develop comprehensive plan(s) for recovery that support the prioritizations and recovery objectives, and use
the plans as the basis of developing recovery processes and procedures that ensure timely restoration of
systems and other assets affected by future cyber events. The plan(s) should ensure that underlying assumptions
(e.g., availability of core services) will not undermine recovery, and that processes and procedures address
both technical and non-technical activity affecting people, processes, and technologies.
• Develop, implement, and practice the defined recovery processes, based upon the organization’s recovery
requirements, to ensure timely recovery team coordination and restoration of capabilities or services affected
by cyber events.
• Formally define and document the conditions under which the recovery plan is to be invoked, who has the
authority to invoke the plan, and how recovery personnel will be notified of the need for recovery activities
to be performed.
• Define key milestones for meeting intermediate recovery goals and terminating active recovery efforts.
• Adjust incident detection and response policies, processes, and procedures to ensure that recovery does not
hinder effective response (e.g., by alerting an adversary or by erroneously destroying forensic evidence).
• Develop a comprehensive recovery communications plan, and fully integrate communications considerations
into recovery policies, plans, processes, and procedures.
• Clearly define recovery communication goals, objectives, and scope, including information sharing rules and
methods. Based upon this communications plan, consider sharing actionable information about cyber threats
with relevant organizations, such as those described in NIST SP 800-150.18
Before returning to its routine duties, the CSIRT should conduct an after-action review (AAR). The AAR is an
opportunity for everyone who was involved in an incident or disaster to sit down and discuss what happened. In
an AAR, a designated person acts as a moderator and allows everyone to share what happened from his or her own
perspective, while ensuring there is no blame or finger-pointing. All team members review their actions during the
incident and identify areas where the IR plan worked, did not work, or could be improved. Once completed, the AAR
is written up and shared.
All key players review their notes and the AAR and verify that the IR documentation is accurate and precise. The
AAR allows the team to update the plan and brings the reaction team’s actions to a close. The AAR can serve as a
training case for future staff.
after-action review (AAR)
A detailed examination and discussion of the events that occurred during an incident or disaster, from first
detection to final recovery.
According to McAfee, there are 10 common mistakes that an organization’s CSIRTs make in IR:
1. Failure to appoint a clear chain of command with a specified individual in charge
2. Failure to establish a central operations center
3. Failure to “know their enemy,” as described in Modules 2 and 4
NIST SP 800-61, Rev. 2, makes the following recommendations for handling incidents:
• Acquire tools and resources that may be of value during incident handling—The team will be more efficient at
handling incidents if various tools and resources are already available to them. Examples include contact lists,
encryption software, network diagrams, backup devices, digital forensic software, and port lists.
• Prevent incidents from occurring by ensuring that networks, systems, and applications are sufficiently secure—
Preventing incidents is beneficial to the organization and reduces the workload of the incident response team.
Performing periodic risk assessments and reducing the identified risks to an acceptable level are effective
in reducing the number of incidents. Awareness of security policies and procedures by users, IT staff, and
management is also very important.
• Identify precursors and indicators through alerts generated by several types of security software—Intrusion detec-
tion and prevention systems, antivirus software, and file integrity checking software are valuable for detect-
ing signs of incidents. Each type of software may detect incidents that the other types cannot, so the use of
several types of computer security software is highly recommended. Third-party monitoring services can
also be helpful.
• Establish mechanisms for outside parties to report incidents—Outside parties may want to report incidents
to the organization—for example, they may believe that one of the organization’s users is attacking them.
Organizations should publish a phone number and e-mail address that outside parties can use to report such
incidents.
• Require a baseline level of logging and auditing on all systems and a higher baseline level on all critical systems—
Logs from operating systems, services, and applications frequently provide value during incident analysis,
particularly if auditing was enabled. The logs can provide information such as which accounts were accessed
and what actions were performed.
• Profile networks and systems—Profiling measures the characteristics of expected activity levels so that changes
in patterns can be more easily identified. If the profiling process is automated, deviations from expected
activity levels can be detected and reported to administrators quickly, leading to faster detection of incidents and
operational issues.
• Understand the normal behaviors of networks, systems, and applications—Team members who understand nor-
mal behavior should be able to recognize abnormal behavior more easily. This knowledge can best be gained
by reviewing log entries and security alerts; the handlers should become familiar with typical data and can
investigate unusual entries to gain more knowledge.
• Create a log retention policy—Information about an incident may be recorded in several places. Creating and
implementing a log retention policy that specifies how long log data should be maintained may be extremely
helpful in analysis because older log entries may show reconnaissance activity or previous instances of similar
attacks.
• Perform event correlation—Evidence of an incident may be captured in several logs. Correlating events among
multiple sources can be invaluable in collecting all the available information for an incident and validating
whether the incident occurred.
• Keep all host clocks synchronized—If the devices that report events have inconsistent clock settings, event
correlation will be more complicated. Clock discrepancies may also cause problems from an evidentiary
standpoint.
• Maintain and use a knowledge base of information—Handlers need to reference information quickly during
incident analysis; a centralized knowledge base provides a consistent, maintainable source of information.
The knowledge base should include general information such as data on precursors and indicators of
previous incidents.
• Start recording all information as soon as the team suspects that an incident has occurred—Every step taken, from
the time the incident was detected to its final resolution, should be documented and time-stamped.
Information of this nature can serve as evidence in a court of law if legal prosecution is pursued. Recording the steps
performed can also lead to a more efficient, more systematic, and less error-prone handling of the problem.
• Safeguard incident data—This data often contains sensitive information about vulnerabilities, security
breaches, and users who may have performed inappropriate actions. The team should ensure that access to
incident data is properly restricted, both logically and physically.
• Prioritize handling of incidents based on relevant factors—Because of resource limitations, incidents should
not be handled on a first-come, first-served basis. Instead, organizations should establish written guidelines
that outline how quickly the team must respond to the incident and what actions should be performed, based
on relevant factors such as the functional and information impact of the incident and the likely recoverability
from the incident. This saves time for the incident handlers and provides a justification to management and
system owners for their actions. Organizations should also establish an escalation process for instances when
the team does not respond to an incident within the designated time.
• Include provisions for incident reporting in the organization’s incident response policy—Organizations should
specify which incidents must be reported, when they must be reported, and to whom. The parties most
commonly notified are the CIO, the head of information security, the local information security officer, other
incident response teams within the organization, and system owners.
• Establish strategies and procedures for containing incidents—It is important to contain incidents quickly and
effectively limit their business impact. Organizations should define acceptable risks in containing incidents
and develop strategies and procedures accordingly. Containment strategies should vary based on the type
of incident.
• Follow established procedures for evidence gathering and handling—The team should clearly document how
all evidence has been preserved. Evidence should be accounted for at all times. The team should meet with
legal staff and law enforcement agencies to discuss evidence handling and then develop procedures based
on those discussions.
• Capture volatile data from systems as evidence—This data includes lists of network connections, processes, login
sessions, open files, network interface configurations, and the contents of memory. Running carefully chosen
commands from trusted media can collect the necessary information without damaging the system’s evidence.
• Obtain system snapshots through full forensic disk images, not file system backups—Disk images should be
made to sanitized write-protectable or write-once media. This process is superior to a file system backup for
investigatory and evidentiary purposes. Imaging is also valuable in that it is much safer to analyze an image
than it is to perform analysis on the original system because the analysis may inadvertently alter the original.
• Hold lessons-learned meetings after major incidents—Lessons-learned meetings are extremely helpful in improv-
ing security measures and the incident handling process itself.20
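The recommendations above on event correlation and synchronized clocks can be illustrated together. The sketch below groups events from several logs whose timestamps fall close together and then keeps the groups seen by more than one source. It assumes every event is a dictionary with a "time" (datetime) and a "source" (log name) and that all sources already report synchronized timestamps. The five-minute window and the function names are illustrative assumptions.

```python
from datetime import timedelta


def correlate(events, window=timedelta(minutes=5)):
    """Group events so that each event is within `window` of the previous one in its group."""
    groups, current = [], []
    for event in sorted(events, key=lambda e: e["time"]):
        if current and event["time"] - current[-1]["time"] > window:
            groups.append(current)
            current = []
        current.append(event)
    if current:
        groups.append(current)
    return groups


def multi_source(groups):
    """Keep only the groups that contain events from more than one log source."""
    return [g for g in groups if len({e["source"] for e in g}) > 1]
```

If host clocks drift, related events can land in different groups, which is one practical reason the list recommends keeping clocks synchronized.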
Note that some of these recommendations were covered earlier in this section. CSIRT members should be very
familiar with these tools and techniques prior to an incident. Trying to use unfamiliar procedures in the middle of an
incident could prove very costly to the organization and cause more harm than good.
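One technique worth practicing ahead of time relates to the forensic disk image recommendation in the list above. A common practice, not described in the text itself, is to record a cryptographic hash of an image when it is acquired and to recompute it before analysis to confirm that the image has not changed. The sketch below uses SHA-256 from Python's standard library; the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_unchanged(path, recorded_digest):
    """True when the image still matches the digest recorded at acquisition time."""
    return sha256_of_file(path) == recorded_digest
```

The recorded digest belongs in the incident documentation alongside who acquired the image and when.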
For more information on incident handling, read the Incident Handlers Handbook by Patrick Kral, which is available
from the SANS reading room at www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901.
You can search for other incident handling papers at www.sans.org/reading-room/whitepapers/incident/.
• Protect and forget: This approach, also known as “patch and proceed,” focuses on the defense of data and the
systems that house, use, and transmit it. An investigation that takes this approach focuses on the detection
and analysis of events to determine how they happened and to prevent reoccurrence. Once the current event is
over, the questions of who caused it and why are almost immaterial.
• Apprehend and prosecute: This approach, also known as “pursue and punish,” focuses on the identification and
apprehension of responsible individuals, with additional attention paid to the collection and preservation of
potential evidentiary material that might support administrative or criminal prosecution.
protect and forget
The organizational CP philosophy that focuses on the defense of information assets and preventing reoccurrence
rather than the attacker’s identification and prosecution; also known as “patch and proceed.”
Responding as quickly as possible to incidents has become even more important with the increasing integration between
the cyber world and the physical world. Operational technology (OT), cyber-physical systems (CPS), and the Internet of Things
(IoT) are all driving this integration. Now an attacker can exploit cyber vulnerabilities to cause physical impacts, including
overriding a building’s card readers and other physical security systems to gain unauthorized access and feeding crafted malicious
data into a factory’s power system in order to start a fire or cause an explosion. Delaying the response to an incident may put
human lives at unnecessary risk and ultimately lead to deaths that should have been prevented.
Digital Forensics
Whether due to a character flaw, a need for vengeance, a profit motive, or simple curiosity, an employee or outsider
may attack a physical asset or information asset. When the asset is the responsibility of the CISO, he or she is expected
to understand how policies and laws require the matter to be managed and protected. To protect the organization and
possibly assist law enforcement in an investigation, the CISO must determine what happened and how an incident
occurred. This process is called digital forensics.
Digital forensics is based on the field of traditional forensics. Made popular by scientific detective shows that focus
on crime scene investigations, forensics involves the use of science to investigate events. Not all events involve crimes;
some involve natural events, accidents, or system malfunctions. Forensics allows investigators to determine what
happened by examining the results of an event. It also allows them to determine how the event happened by examining
activities, individual actions, physical evidence, and testimony related to the event. However, forensics might not
figure out the “why” of the event; that’s the focus of psychological, sociological, and criminal justice studies.
Here, the focus is on the application of forensics techniques in the digital arena.
Digital forensics involves the preservation, identification, extraction, documentation, and interpretation of digital
media, including computer media, for evidentiary and root cause analysis. Like traditional forensics, it follows clear,
well-defined methodologies, but it still tends to be as much an art as a science. In other words, the natural curiosity
and personal skill of the investigator play a key role in discovering potential evidentiary material (EM). An item does
not become evidence until it is formally admitted by a judge or other ruling official.
Digital forensics investigators use a variety of tools to support their work, as you will learn later in this module.
However, the tools and methods used by attackers can be equally sophisticated. Digital forensics can be used for two
key purposes:
• To investigate allegations of digital malfeasance. Such an investigation requires digital forensics to gather,
analyze, and report the findings. This is the primary mission of law enforcement in investigating crimes that
involve computer technologies or online information.
• To perform root cause analysis. If an incident occurs and the organization suspects an attack was successful,
digital forensics can be used to examine the path and methodology for gaining unauthorized access, and to
determine how pervasive and successful the attack was. This type of analysis is used primarily by incident
response teams to examine their equipment after an incident.
Some investigations are undertaken by an organization’s own personnel, while others require the immediate involvement
of law enforcement. In general, whenever investigators discover evidence of a crime, they should immediately notify
management and recommend contacting law enforcement. Failure to do so could result in unfavorable action against the
investigator or organization.
digital forensics
Investigations that involve the preservation, identification, extraction, documentation, and interpretation of
computer media for evidentiary and root cause analysis, following clear, well-defined methodologies.
forensics
The coherent application of methodical investigatory techniques to present evidence of crimes in a court or similar
setting.
evidentiary material (EM)
Any information that could potentially support an organization’s legal or policy-based case against a suspect; also
known as items of potential evidentiary value.
digital malfeasance
A crime involving digital media, computer technology, or related components.
root cause analysis
The determination of the source or origin of an event, problem, or issue like an incident.
For more information on digital forensics, visit the American Society of Digital Forensics and eDiscovery at
www.asdfed.com.
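[Figure: flowchart of the digital forensics process. A security incident triggers the incident response process. When a
policy violation or crime is detected, investigators prepare an affidavit seeking authorization to investigate. If the
investigation is not authorized, the matter is archived. If it is authorized, investigators collect evidence, analyze
evidence, produce a report and submit it for disposition, and then archive.]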
• Scientific Working Group on Digital Evidence: Published Guidelines and Best Practices
(https://www.swgde.org/documents/published)
• First Responders Guide to Computer Forensics
(https://resources.sei.cmu.edu/asset_files/Handbook/2005_002_001_14429.pdf)
• First Responders Guide to Computer Forensics: Advanced Topics
(http://resources.sei.cmu.edu/asset_files/handbook/2005_002_001_14432.pdf)
Online Versus Offline Data Acquisition There are generally two methods of acquiring evidence from a system. The
first is the offline model, in which the investigator removes the power source and then uses a utility or special device
to make a bit-stream, sector-by-sector copy of the hard drives on the system. By copying the drives at the sector level,
you can ensure that any hidden or erased files are also captured. The copied drive then becomes the image that can
be used for analysis, and the original drive is stored for safekeeping as true EM or possibly returned to service. For
the purposes of this discussion, the term copy refers to a drive duplication technique, whereas an image is the file that
contains all the information from the source drive.
This approach requires the use of sound processes and techniques or read-only hardware known as write-blockers
to prevent the accidental overwriting of data on the source drive. The use of these tools also allows investigators
to assert that the EM was not modified during acquisition. In another offline approach, the investigator can reboot
the system with an alternate operating system or a specialty boot disk like Helix or Knoppix. Still another approach
involves specialty hardware that connects directly to a powered-down hard drive and provides direct power and data
connections to copy data to an internal drive.
In online or live data acquisition, investigators use network-based tools to acquire a protected copy of the informa-
tion. The only real difference between the two methods is that the source system cannot be taken offline, and the tools
must be sophisticated enough to avoid altering the system during data acquisition. Furthermore, live data acquisition
techniques may acquire data that is in motion and in an inconsistent state, with some transactions that are only
partially recorded. Table 5-4 lists common methods of acquiring data.
The creation of a copy or image can take a substantial amount of time. Users who have made USB copies of their
data know how much time it takes to back up several gigabytes of data. When dealing with networked server drives,
the data acquisition phase can take many hours to complete, which is one reason investigators prefer to seize drives
and take them back to the lab to be imaged or copied.
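To see why acquisition can run for hours, a rough estimate divides the drive size by the sustained read rate. The sketch below is only illustrative: the function name, the 4 TB drive size, and the 100 MB/s throughput are assumptions chosen for the arithmetic, not figures from this module, and real rates vary with hardware, interface, and network conditions.

```python
def estimated_imaging_hours(drive_bytes, throughput_bytes_per_sec):
    """Return the hours needed to read a drive end to end at a sustained rate."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return drive_bytes / throughput_bytes_per_sec / 3600


TB = 10**12  # decimal terabyte, as drive vendors count it
MB = 10**6   # decimal megabyte

# Hypothetical case: a 4 TB server volume read at a sustained 100 MB/s.
hours = estimated_imaging_hours(4 * TB, 100 * MB)
print(f"{hours:.1f} hours")  # 11.1 hours
```

Because a sector-level copy reads every sector whether or not it holds live data, the estimate depends on the size of the drive, not on how full it is.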
Other Potential EM Not all EM is on a suspect’s computer hard drive. A technically savvy attacker is more likely to
store incriminating evidence on other digital media, such as smartphones, removable drives, CDs, DVDs, flash drives,
memory chips or sticks, or other computers accessed across the organization’s networks or via the Internet. EM located
outside the organization is particularly problematic because the organization cannot legally search systems it doesn’t
own. However, the simple act of viewing EM on a system leaves clues about the location of the source material, and a
skilled investigator can at least provide some assistance to law enforcement when conducting a preliminary investiga-
tion. Log files are another source of information about the access and location of EM, as well as what happened and when.
Some evidence isn’t electronic or digital. Many suspects have been further incriminated when passwords to
their digital media were discovered in the margins of user manuals, in calendars and day planners, and even on notes
attached to their systems.
EM Handling Once the evidence is acquired, both the copy image and the original drive should be handled properly
to avoid legal challenges based on authenticity and preservation of integrity. If the organization or law enforcement
cannot demonstrate that no one had access to the evidence, they cannot provide strong assurances that it has not
been altered. Such access can be physical or logical if the device is connected to a network. Once the evidence is in the
possession of investigators, they must track its movement, storage, and access until the resolution of the event or case.
This is typically accomplished through chain of evidence (also known as chain of custody) procedures. The evidence
is then tracked wherever it is located. When the evidence changes hands or is stored, the documentation is updated.
Not all evidence-handling requirements are met through the chain of custody process. Digital media must be stored
in a specially designed environment that can be secured to prevent unauthorized access. For example, individual
items might need to be stored in containers or bags that protect them from electrostatic discharge or magnetic fields.
Additional details are provided in the nearby feature on search-and-seizure procedures.
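The chain of custody documentation described above can be pictured as an append-only log attached to each item of EM. The following is a minimal sketch under stated assumptions: the class names, field names, and sample values are hypothetical, and an actual organization would follow the forms and fields required by its own digital forensics policy and legal counsel.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEvent:
    """One log entry: who handled the item, what they did, where, and when."""
    timestamp: datetime
    action: str      # e.g. "collected", "transferred", "stored"
    custodian: str   # person responsible for the item after this event
    location: str
    notes: str = ""


@dataclass
class EvidenceItem:
    """An item of potential EM with its custody log (hypothetical structure)."""
    item_id: str
    description: str
    log: list = field(default_factory=list)

    def record(self, action, custodian, location, notes=""):
        """Append a new event; existing entries are never edited or removed."""
        event = CustodyEvent(datetime.now(timezone.utc), action, custodian, location, notes)
        self.log.append(event)
        return event

    def current_custodian(self):
        """Return whoever holds the item now, or None if nothing is logged."""
        return self.log[-1].custodian if self.log else None


# Hypothetical usage: the documentation is updated each time the item changes hands.
drive = EvidenceItem("EM-001", "Hard drive removed from a workstation")
drive.record("collected", "J. Smith", "Office 214")
drive.record("transferred", "A. Jones", "Forensics lab")
print(drive.current_custodian())  # A. Jones
```

Making each event immutable and only ever appending to the log mirrors the goal of the paper process: any gap or alteration in the record should be detectable.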
Authenticating the Recovered Evidence The copy or image is typically transferred to the laboratory for the next
stage of authentication. Using cryptographic hash tools, the team must be able to demonstrate that any analyzed
copy or image is a true and accurate replica of the source EM. As you will learn in Module 10, the hash tool takes a
variable-length file and creates a single numerical value, usually represented in hexadecimal notation, that functions
like a digital fingerprint. By hashing the source file and the copy, the investigator can assert that the copy is a true
and accurate duplicate of the source.
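The comparison of hash values can be sketched with Python's standard hashlib module. The text does not name a specific algorithm, so the use of SHA-256 here is an assumption, as are the function names and the small temporary files that stand in for a source drive image and its copy.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_true_copy(source_path, copy_path):
    """Return True when both files produce the same SHA-256 digest."""
    return sha256_of_file(source_path) == sha256_of_file(copy_path)


# Demonstration with stand-in files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "source.img")
    copy_path = os.path.join(workdir, "copy.img")
    payload = b"sector data " * 1000
    for path in (source_path, copy_path):
        with open(path, "wb") as f:
            f.write(payload)
    matches = is_true_copy(source_path, copy_path)

    # Overwrite a single byte in the copy; the digests should no longer agree.
    with open(copy_path, "r+b") as f:
        f.write(b"X")
    matches_after_change = is_true_copy(source_path, copy_path)

print(matches, matches_after_change)  # True False
```

Because even a one-byte change produces a different digest, recording the hash at acquisition time gives the investigator a value to compare against later, whenever the integrity of the copy is questioned.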
Analyzing the Data The most complex part of an investigation is analyzing the copy or image for potential EM. While
the process can be performed manually using simple utilities, three industry-leading applications dominate the market
for digital forensics:
• Guidance Software’s EnCase (www.guidancesoftware.com)
• AccessData Forensics Tool Kit (FTK, at www.accessdata.com)
• OSForensics (www.osforensics.com)
Open-source alternatives to these rather expensive tools include Autopsy and The Sleuth Kit, which are available from
www.sleuthkit.org. Autopsy, shown in Figure 5-10, is a stand-alone GUI interface for The Sleuth Kit, which natively
uses a command-line interface. Each tool is designed to support an investigation and assist in the management of the
entire case.
chain of evidence
The detailed documentation of the collection, storage, transfer, and ownership of evidentiary material from the crime
scene through its presentation in court and its eventual disposition.
chain of custody
See chain of evidence.
[Figure 5-10: Autopsy. Source: sleuthkit.org.]
1. Build the case file by entering background information, including the investigator, suspect, date, time, and sys-
tem analyzed.
2. Load the image file into the case file. Typical image files have .img, .e01, or .001 extensions.
3. Index the image. Note that some systems use a database of known files to filter out files that are applications, system
files, or utilities. The use of this filter improves the quality and effectiveness of the indexing process.
4. Identify, export, and bookmark related text files by searching the index.
5. Identify, export, and bookmark related graphics by reviewing the images folder. If the suspect is accused of viewing
child pornography, do not directly view the images. Some things you can’t “unsee.” Use the database of known images
to compare hash values and tag them as suspect.
6. Identify, export, and bookmark other evidence files.
7. Integrate all exported and bookmarked material into the case report.
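Steps 3 and 5 above both rely on comparing file hashes against databases of known files. A minimal sketch of that triage follows; the function names, the use of SHA-256, and the in-memory sample data are assumptions for illustration, whereas a real tool would load its hash sets from a maintained reference database.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def triage(files, known_benign, known_suspect):
    """Sort files into ignore / suspect / review lists by hash alone.

    files maps a file name to its contents; the two sets hold hex digests.
    No file is opened for viewing; only its digest is compared.
    """
    result = {"ignore": [], "suspect": [], "review": []}
    for name, data in files.items():
        digest = sha256_hex(data)
        if digest in known_suspect:
            result["suspect"].append(name)
        elif digest in known_benign:
            result["ignore"].append(name)
        else:
            result["review"].append(name)
    return result


# Hypothetical hash sets and files standing in for a loaded image.
known_benign = {sha256_hex(b"stock system utility")}
known_suspect = {sha256_hex(b"previously catalogued image")}
files = {
    "calc.exe": b"stock system utility",
    "IMG_0042.jpg": b"previously catalogued image",
    "notes.txt": b"meeting at noon",
}
report = triage(files, known_benign, known_suspect)
print(report)
```

Matching on hashes serves both purposes described in the steps: known application and system files drop out of the index, and known images can be tagged as suspect without anyone viewing them.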
The first component of the analysis phase is indexing. During indexing, many investigatory tools create an index
of all text found on the drive, including data found in deleted files and in file slack space. This indexing is similar to
that performed by Google Desktop or Windows Desktop Search tools. The index can then be used by the investigator
to locate specific documents or document fragments. While indexing, the tools typically organize files into categories,
such as documents, images, and executables. Unfortunately, like imaging, indexing is a time- and processor-consuming
operation, and it could take days on images that are larger than 20 gigabytes.
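The idea behind indexing can be illustrated with a small inverted index that maps each word to the items in which it appears. This is a simplified sketch, not how any particular forensics product works; the function names and sample documents are hypothetical, and commercial tools also handle binary formats, slack space, and much larger data volumes.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical text recovered from an image, including a deleted fragment.
documents = {
    "memo.txt": "Wire the payment to the offshore account",
    "deleted_fragment_17": "account number and payment schedule",
    "readme.txt": "Installation instructions",
}
index = build_index(documents)
print(sorted(search(index, "payment account")))  # ['deleted_fragment_17', 'memo.txt']
```

Building the index is the expensive step because every byte of text must be read once; afterward, each search is a quick set lookup, which is why investigators accept the up-front cost.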
In some cases, the investigator may find password-protected files that the suspect used to protect the data. Several
commercial password-cracking tools can assist the investigator. Some are sold in conjunction with forensics tools, like
the AccessData Password Recovery Tool Kit.
Reporting the Findings As investigators examine the analyzed copies or images and identify potential EM, they can
tag it and add it to their case files. Once they have found a suitable amount of information, they can summarize their
findings with a synopsis of their investigatory procedures in a report and submit it to the appropriate authority. This
authority could be law enforcement or management. The suitable amount of EM is a flexible determination made by the
investigator. In certain cases, like child pornography, one file is sufficient to warrant turning over the entire investigation
to law enforcement. On the other hand, dismissing an employee for the unauthorized sale of intellectual property may
require a substantial amount of information to support the organization’s assertion. Reporting methods and formats
vary among organizations and should be specified in the digital forensics policy. A general guideline is that the report
should be sufficiently detailed to allow a similarly trained person to repeat the analysis and achieve similar results.
Evidentiary Procedures
In information security, most operations focus on policies—documents that provide managerial guidance for ongoing imple-
mentation and operations. In digital forensics, however, the focus is on procedures. When investigating digital malfeasance
or performing root cause analysis, keep in mind that the results and methods of the investigation may end up in criminal
or civil court. For example, during a routine systems update, suppose that a technician finds objectionable material on an
employee’s computer. The employee is fired and promptly sues the organization for wrongful termination, so the investiga-
tion of the objectionable material comes under scrutiny by the plaintiff’s attorney, who will attempt to cast doubt on the
ability of the investigator. While technically not illegal, the presence of the material may have been a clear violation of policy,
prompting the dismissal of the employee. However, if an attorney can convince a jury or judge that someone else could have
placed the material on the plaintiff’s system, the employee could win the case and potentially a large financial settlement.
When the scenario involves criminal issues in which an employee discovers evidence of a crime, the situation
changes somewhat. The investigation, analysis, and report are typically performed by law enforcement personnel.
However, if the defense attorney can cast reasonable doubt on whether the organization’s information security profes-
sionals compromised the digital evidentiary material, the employee might win the case. How do you avoid these legal
pitfalls? Strong procedures for handling potential evidentiary material can minimize the probability that an organiza-
tion will lose a legal challenge.
Organizations should develop specific procedures, along with guidance for their effective use. The policy docu-
ment should specify the following:
• Who may conduct an investigation
• Who may authorize an investigation
• What affidavits and related documents are required
• What search warrants and related documents are required
• What digital media may be seized or taken offline
• What methodology should be followed
• What methods are required for chain of custody or chain of evidence
• What format the final report should take and to whom it should be given
The policy document should be supported by a procedures manual and developed based on the documents discussed earlier,
along with guidance from law enforcement or consultants. By creating and using these policies and procedures, an
organization can best protect itself from challenges by employees who have been subject to unfavorable action from an
investigation.
Disaster Recovery
The next vital part of CP focuses on disaster recovery (DR). Disaster recovery planning (DRP) entails the preparation
for and recovery from a disaster, whether natural or human-made. In some cases, incidents detected by the IR team may
escalate to the level of disaster, and the IR plan may no longer be able to handle the effective and efficient recovery
from the loss. For example, if a malicious program evades containment actions and infects and disables many or most of
an organization’s systems and their ability to function, the disaster recovery plan (DR plan) is activated. Sometimes,
events are by their nature immediately classified as disasters, such as an extensive fire, flood, damaging storm, or
earthquake.
disaster recovery (DR)
An organization’s set of planning and preparation efforts for detecting, reacting to, and recovering from a disaster.
disaster recovery planning (DRP)
The actions taken by senior management to develop and implement the DR policy, plan, and recovery teams.
disaster recovery plan (DR plan)
The documented product of disaster recovery planning; a plan that shows the organization’s intended efforts in the
event of a disaster.
As you learned earlier in this module, the CP team creates the DR planning team (DRPT). The DRPT in turn orga-
nizes and prepares the DR response teams (DRRTs) to implement the DR plan in the event of a disaster. In reality,
there may be many different DRRTs, each tasked with a different aspect of recovery. InfoSec staff most likely will not
lead these teams but will support their efforts, ensuring that no new vulnerabilities arise during the recovery process.
The various DRRTs will have multiple responsibilities in the recovery of the primary site and the reestablishment of
operations:
• Recover information assets that are salvageable from the primary facility after the disaster.
• Purchase or otherwise acquire replacement information assets from appropriate sources.
• Reestablish functional information assets at the primary site if possible or at a new primary site, if
necessary.
1. Organize the DR team—The initial assignments to the DR team, including the team lead, will most likely
be performed by the CPMT; however, additional personnel may need to be assigned to the team as the
specifics of the DR policy and plan are developed, and as individual roles and responsibilities are defined
and assigned.
2. Develop the DR planning policy statement—A formal department or agency policy provides the authority
and guidance necessary to develop an effective contingency plan.
3. Review the BIA—The BIA was prepared to help identify and prioritize critical information and its host
systems. A review of what was discovered is an important step in the process.
4. Identify preventive controls—Measures taken to reduce the effects of business and system disruptions can
increase information availability and reduce contingency life cycle costs.
5. Create DR strategies—Thorough recovery strategies ensure that the system can be recovered quickly and
effectively following a disruption.
6. Develop the DR plan document—The plan should contain detailed guidance and procedures for restoring a
damaged system.
7. Ensure DR plan testing, training, and exercises—Testing the plan identifies planning gaps, whereas training
prepares recovery personnel for plan activation; both activities improve plan effectiveness and overall
agency preparedness.
8. Ensure DR plan maintenance—The plan should be a living document that is updated regularly to remain
current with system enhancements.
example. Disasters could also be classified by their origin, such as natural or human-made. Most incidents fall into
the human-made category (like hacker intrusions or malware), but some could be tied to natural origins, such as fires
or floods. Many disasters begin as incidents, and only when they reach a specified threshold are they escalated from
incident to disaster. A denial-of-service attack that affects a single system for a short time may be an incident, but
when it escalates to affect an entire organization for a much longer period of time, it may be reclassified as a
disaster. Who makes this classification? It is most commonly done by a senior IT or InfoSec manager working closely
with the CSIRT and DR team leads. When the CSIRT reports that an incident or collection of incidents has begun to
exceed their capability to respond, they may request that the incident(s) be reclassified as a disaster in order for
the organization to better handle the expected damage or loss.
slow-onset disasters
Disasters that occur over time and gradually degrade the capacity of an organization to withstand their effects.
rapid-onset disasters
Disasters that occur suddenly, with little warning, taking people’s lives and destroying the means of production.
Disasters may also be classified by their rate of occurrence. Slow-onset disasters build up gradually over time,
degrading the capacity of the organization to withstand their effects. Hazards that cause these disaster
conditions typically include natural causes such as droughts, famines, environmental degradation, desertification,
deforestation, and pest infestation, as well as human-made causes such as malware, hackers, disgruntled employ-
ees, and service provider issues. The series of U.S. hurricanes during the fall of 2017 were an example of slow-onset
disasters—effective weather predictions enabled much of the southeast United States to prepare for the hurricanes’
potential impacts days before the storms made landfall. Similarly, the COVID-19 pandemic of 2020 was an example of
a slow-onset disaster, as its progression was tracked by global media from the start.
Usually, disasters that strike quickly are instantly classified as disasters. These disasters are commonly
referred to as rapid-onset disasters, as they occur suddenly with little warning, taking people’s lives and destroy-
ing the means of production. Rapid-onset disasters may be caused by natural effects like earthquakes, floods, storm
winds, tornadoes, and mud flows, or by human-made effects like massively distributed denial-of-service attacks;
acts of terrorism, including cyberterrorism or hacktivism; and acts of war. Interestingly, fire is an example of an
incident that can either escalate to disaster or begin as one (in the event of an explosion, for example). Fire can
be categorized as a natural disaster when caused by a lightning strike or as human-made when it is the result of
arson or an accident.
Table 5-5 presents a list of natural disasters, their effects, and recommendations for mitigation.
Planning to Recover
To plan for disasters, the CPMT engages in scenario development and impact analysis, along the way categoriz-
ing the level of threat that each potential disaster poses. When generating a DR scenario, start with the most
important asset: people. Do you have the human resources with the appropriate organizational knowledge to
restore business operations? Organizations must cross-train their employees to ensure that operations and a
sense of normalcy can be restored. In addition, the DR plan must be tested regularly so that the DR team can
lead the recovery effort quickly and efficiently. Key elements that the CPMT must build into the DR plan include
the following:
1. Clear delegation of roles and responsibilities—Everyone assigned to the DR team should be aware of his or
her duties during a disaster. Some team members may be responsible for coordinating with local services,
such as fire, police, and medical personnel. Some may be responsible for the evacuation of company
personnel, if required. Others may be assigned to simply pack up and leave.
2. Execution of the alert roster and notification of key personnel—These notifications may extend outside
the organization to include the fire, police, or medical services mentioned earlier, as well as insurance
agencies, disaster teams such as those of the Red Cross, and management teams.
3. Clear establishment of priorities—During a disaster response, the first priority is always the preservation
of human life. Data and systems protection is subordinate when the disaster threatens the lives, health, or
welfare of the employees or members of the community. Only after all employees and neighbors have been
safeguarded can the DR team attend to protecting other organizational assets.
4. Procedures for documentation of the disaster—Just as in an incident response, the disaster must be carefully
recorded from the onset. This documentation is used later to determine how and why the disaster
occurred.
5. Action steps to mitigate the impact of the disaster on the operations of the organization—The DR plan should
specify the responsibilities of each DR team member, such as the evacuation of physical assets or making
sure that all systems are securely shut down to prevent further loss of data.
6. Alternative implementations for the various system components, should primary versions be unavailable—
These components include standby equipment that is either purchased, leased, or under contract with a
DR service agency. Developing systems with excess capacity, fault tolerance, autorecovery, and fail-safe
features facilitates a quick recovery. Something as simple as using Dynamic Host Configuration Protocol (DHCP)
to assign network addresses instead of using static addresses can allow systems to regain connectivity
quickly and easily without technical support. Networks should support dynamic reconfiguration;
restoration of network connectivity should be planned. Data recovery requires effective backup
strategies as well as flexible hardware configurations. System management should be a top priority. All
solutions should be tightly integrated and developed in a strategic plan to provide continuity. Piecemeal
construction can result in a disaster after the disaster, as incompatible systems are unexpectedly thrust
together.
As part of DR plan readiness, each employee should have two sets of emergency information in his or her possession
at all times. The first is personal emergency information—the person to notify in case of an emergency (next of kin), medi-
cal conditions, and a form of identification. The second is a set of instructions on what to do in the event of an emergency.
This snapshot of the DR plan should contain a contact number or hotline for calling the organization during an emergency,
emergency services numbers (fire, police, medical), evacuation and assembly locations (e.g., storm shelters), the name
and number of the DR coordinator, and any other needed information. An example of an emergency ID card is shown in
Figure 5-11.
Front:
ABC Company Emergency ID Card
Name:___________________________ DOB:_____________
Address:__________________________________________
City:_________________ St:_________ Zip:_____________
Blood Type:__________
Allergies:__________________________________________
Organ Donor?:____________________________________
Emergency Contacts:_______________________________
Call 800-555-1212 for updates and to report status
Back:
ABC Company DR Plan Codes
CODE  ACTION
1a    Shelter in Place – do not report to work
1b    Shelter in Place – DR team to work
2a    Evacuate immediately – do not report to work
2b    Evacuate immediately – DR team to work
3     Lockdown – Secure all doors/windows – do not report to work if off-site
Call 800-555-1212 for updates and to report status
Business Continuity
Sometimes, disasters have such a profound effect on the organization that it cannot continue operations at its primary
site until it fully completes all DR efforts. To deal with such events, the organization implements its business
continuity (BC) strategies.
Business continuity planning (BCP) ensures that critical business functions can continue if a disaster occurs. Like
the DR plan, the BC plan involves teams from across the organization, including IT and business operations, and is
supported by InfoSec. The BC plan is usually managed by the CEO or COO of the organization, and is activated and
executed concurrently with the DR plan when the disaster is major or long-term and requires fuller and more complex
restoration of information and IT resources. If a disaster renders the current business location unusable, there must
be a plan to allow the business to continue to function. While the BC plan reestablishes critical business functions
at an alternate site, the DR plan focuses on reestablishment of the technical infrastructure and business operations
at the primary site.
Not every business needs a BC plan or BC facilities. Some small companies or fiscally sound organizations may be able
simply to cease operations until the primary facilities are restored. Manufacturing and retail organizations, however,
depend on continued operations for revenue. Thus, these entities must have a BC plan in place if they need to relocate
operations quickly with minimal loss of revenue.
BC is an element of CP, and it is best accomplished using a repeatable process or methodology. NIST’s SP 800-34,
Rev. 1, “Contingency Planning Guide for Federal Information Systems,”21 includes guidance for planning for incidents,
disasters, and situations that call for BC. The approach used in that document has been adapted for BC use here.
business continuity (BC)
An organization’s set of efforts to ensure its long-term viability when a disaster precludes normal operations at the
primary site; typically includes temporarily establishing critical operations at an alternate site until operations
can be resumed at the primary site or a new permanent site.
business continuity planning (BCP)
The actions taken by senior management to develop and implement the BC policy, plan, and continuity teams.
BC plan
The documented product of business continuity planning; a plan that shows the organization’s intended efforts to
continue critical functions when operations at the primary site are not feasible.
The first step in all contingency efforts is the development of policy; the next step is planning. In some organiza-
tions, these steps are considered concurrent operations in which development of policy is a function of planning; in
other organizations, policy comes before planning and is a separate process. In this text, the BC policy is developed
prior to the BC plan, and both are developed as part of BC planning. The same seven-step approach that NIST recom-
mends for CP can be adapted to an eight-step model that can be used to develop and maintain a viable BC program.
Those steps are as follows:
1. Form the BC team—As was done with the DR planning process, the initial assignments to the BC team,
including the team lead, will most likely be performed by the CPMT; however, additional personnel
may need to be assigned to the team as the specifics of the BC policy and plan are developed, and their
individual roles and responsibilities will have to be defined and assigned.
2. Develop the BC planning policy statement—A formal organizational policy provides the authority and
guidance necessary to develop an effective continuity plan. As with any enterprise-wide policy process, it
is important to begin with the executive vision.
3. Review the BIA—Information contained within the BIA can help identify and prioritize critical
organizational functions and systems for the purposes of business continuity, making it easier to
understand what functions and systems will need to be reestablished elsewhere in the event of a disaster.
4. Identify preventive controls—Little is done here exclusively for BC. Most of the steps taken in the CP and
DRP processes will provide the necessary foundation for BCP.
5. Create relocation strategies—Thorough relocation strategies ensure that critical business functions will be
reestablished quickly and effectively at an alternate location following a disruption.
6. Develop the BC plan—The BC plan should contain detailed guidance and procedures for implementing BC
strategies at predetermined locations in accordance with management’s guidance.
7. Ensure BC plan testing, training, and exercises—Testing the plan identifies planning gaps, whereas training
prepares recovery personnel for plan activation; both activities improve plan effectiveness and overall
agency preparedness.
8. Ensure BC plan maintenance—The plan should be a living document that is updated regularly to remain
current with system enhancements.
Business Resumption
Because the DR and BC plans are closely related, most organizations merge the two functions into a single function
called business resumption planning (BRP). Such a comprehensive plan must be able to support the reestablishment
of operations at two different locations—one immediately at an alternate site and one eventually back at the primary
site. Therefore, although a single planning team can develop the BR plan, execution of the plan requires separate
execution teams.
The planning process for the BR plan should be tied to, but distinct from, the IR plan. As noted earlier in the module,
an incident may escalate into a disaster when it grows dramatically in scope and intensity. It is important that the
three planning development processes be so tightly integrated that the reaction teams can easily make the transition
from incident response to disaster recovery and BCP.
Continuity Strategies
The CPMT can choose from several strategies in its BC planning. The determining factor is usually cost. Note that these
strategies are chosen from a spectrum of options rather than from the absolute specifications that follow. Also, many
organizations now use cloud-based production systems that would supplement, if not preclude, the following approaches.
In general, two categories of strategies are used in BC: exclusive use and shared use. Exclusive-use facilities
are reserved for the sole use of the leasing organization, and shared-use facilities represent contractual agreements
between parties to share or support each other during a BC event. Three general exclusive-use strategies are available:
• Hot site—A hot site is a fully configured computing facility that includes all services, communications links,
and physical plant operations. It duplicates computing resources, peripherals, phone systems, applications,
and workstations. Essentially, this duplicate facility needs only the latest data backups and the personnel to
function. If the organization uses an adequate data service, a hot site can be fully functional within minutes. Not
surprisingly, a hot site is the most expensive alternative. Disadvantages include the need to provide maintenance
for all the systems and equipment at the hot site, as well as physical and information security. However,
if the organization requires a 24/7 capability for near real-time recovery, the hot site is the optimal strategy.
• Warm site—A warm site provides many of the same services and options as the hot site, but typically software
applications are not included or are not installed and configured. A warm site frequently includes computing
equipment and peripherals with servers but not client workstations. Overall, it offers many of the advantages
of a hot site at a lower cost. The disadvantage is that several hours of preparation—perhaps days—are
required to make a warm site fully functional.
• Cold site—A cold site provides only rudimentary services and facilities. No computer hardware or peripherals
are provided. All communications services must be installed after the site is occupied. A cold site is an empty
room with standard heating, air conditioning, and electrical service. Everything else is an added-cost option.
Despite these disadvantages, a cold site may be better than nothing. Its primary advantage is its low cost. The
most useful feature of this approach is that it ensures an organization has floor space if a widespread disaster
strikes, but some organizations are prepared to struggle to lease new space rather than pay maintenance fees
on a cold site.
hot site
A fully configured BC facility that includes all computing services, communications links, and physical plant
operations.
warm site
A BC facility that provides many of the same services and options as a hot site, but typically without installed
and configured software applications.
cold site
A BC facility that provides only rudimentary services, with no computer hardware or peripherals.
Likewise, there are three strategies in which an organization can gain shared use of a facility when needed for
contingency options:
• Timeshare—A timeshare operates like one of the three sites described previously but is leased in conjunction
with a business partner or sister organization. It allows the organization to provide a DR/BC option while
reducing its overall costs. The primary disadvantage is the possibility that more than one timeshare participant
will need the facility simultaneously. Other disadvantages include the need to stock the facility with equipment
and data from all organizations involved, the complexity of negotiating the timeshare with sharing organizations,
and the possibility that one or more parties might exit the agreement or sublease their options. Operating under
a timeshare is much like agreeing to co-lease an apartment with a group of friends. One can only hope that the
organizations remain on amicable terms, as they all could potentially gain physical access to each other’s data.
• Service bureau—A service bureau is an agency that provides a service for a fee. In the case of DR/BC planning,
this service is the provision of physical facilities in the event of a disaster. Such agencies also frequently
provide off-site data storage for a fee. Contracts with service bureaus can specify exactly what the organization
needs under what circumstances. A service agreement usually guarantees space when needed; the service bureau
must acquire additional space in the event of a widespread disaster. In this sense, it resembles the rental-car
provision in a car insurance policy. The disadvantage is that service contracts must be renegotiated periodically
and rates can change. The contracts can also be quite expensive.
• Mutual agreement—A mutual agreement is a contract between two organizations in which each party agrees to
assist the other in the event of a disaster. It stipulates that an organization is obligated to provide necessary
facilities, resources, and services until the receiving organization is able to recover from the disaster. This
arrangement can be a lot like moving in with relatives or friends—it does not take long for an organization to
wear out its welcome. Many organizations balk at the idea of having to fund duplicate services and resources,
even in the short term. Still, mutual agreements between divisions of the same parent company, between
subordinate and senior organizations, or between business partners may be a cost-effective solution when both
parties to the agreement have a mutual interest in the other’s continued operations and both have similar
capabilities and capacities.
timeshare
A continuity strategy in which an organization co-leases facilities with a business partner or sister
organization, which allows the organization to have a BC option while reducing its overall costs.
service bureau
A BC strategy in which an organization contracts with a service agency to provide a facility for a fee.
mutual agreement
A BC strategy in which two organizations sign a contract to assist the other in a disaster by providing BC
facilities, resources, and services until the organization in need can recover from the disaster.
In addition to the preceding basic strategies, there are specialized alternatives, such as the following:
rolling mobile site
A BC strategy that involves contracting with an organization to provide specialized facilities configured in
the payload area of a tractor-trailer.
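The trade-off among the exclusive-use options can be illustrated with a short sketch. The Python below is not from the text: the `SiteOption` record, the cost ranking, and the readiness estimates are all hypothetical (the text says only that a hot site can be functional within minutes and that a warm site needs several hours, perhaps days). It simply picks the least expensive option that could be ready within a required time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteOption:
    name: str
    relative_cost: int     # hypothetical ranking: 1 = cheapest, 3 = most expensive
    hours_to_ready: float  # hypothetical planning estimate, not a figure from the text


# Illustrative values only; an organization would substitute its own estimates.
OPTIONS = [
    SiteOption("hot site", relative_cost=3, hours_to_ready=0.25),
    SiteOption("warm site", relative_cost=2, hours_to_ready=24.0),
    SiteOption("cold site", relative_cost=1, hours_to_ready=168.0),
]


def cheapest_site_within(max_hours, options=OPTIONS):
    """Return the lowest-cost option that can be ready within max_hours, or None."""
    candidates = [o for o in options if o.hours_to_ready <= max_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o.relative_cost)
```

Under these assumed figures, a one-hour requirement leaves only the hot site, while a 48-hour requirement selects the warm site. Real decisions would also weigh shared-use and cloud-based options, which this sketch leaves out.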
Some experts argue that the three planning components (IR, DR, and BC) of CP are so closely linked that they are
indistinguishable. Actually, each has a distinct place, role, and planning requirement. Furthermore, each component
comes into play at a specific time in the life of an incident. Figure 5-14 illustrates this sequence and shows the overlap
that may occur.
[Figure 5-14: timeline of four parallel tracks.
Incident response: an adverse event starts as an incident; incident detection activates the IR plan; incident
reaction and incident recovery operations follow until the incident is recovered, operations are restored, and IR ends.
Disaster recovery: the event starts as a disaster, or IR cannot contain it and it escalates to a disaster; the DR
plan is activated; disaster reaction and DR salvage/recovery operations restore operations at the primary site,
ending DR.
Business continuity: when DR cannot restore operations quickly, the BC plan is activated and the continuity response
establishes operations at an alternate site; completion of DR triggers the end of BC.
Crisis management: a threat of injury or loss of life to personnel activates the CM plan; emergency services are
notified and coordinated; CM ends when all personnel are safe and/or accounted for.]
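The activation sequence that Figure 5-14 describes can be sketched as a small decision function. This is a simplified illustration, not part of the text: the `EventStatus` fields and the `active_plans` function are hypothetical names, and the sketch assumes every adverse event starts as an incident, whereas the figure also allows an event to start as a disaster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventStatus:
    contained_by_ir: bool         # could incident response contain the event?
    restored_quickly_by_dr: bool  # can DR restore primary-site operations quickly?
    threat_to_personnel: bool     # is there a threat of injury or loss of life?


def active_plans(status):
    """Return the plans in effect, in the order they would be activated."""
    plans = ["IR"]  # assumption: the event is first handled as an incident
    if not status.contained_by_ir:
        plans.append("DR")  # IR cannot contain it, so it escalates to a disaster
        if not status.restored_quickly_by_dr:
            plans.append("BC")  # operations move to an alternate site
    if status.threat_to_personnel:
        plans.append("CM")  # crisis management runs alongside the other plans
    return plans
```

For example, an event that IR cannot contain, that DR cannot resolve quickly, and that endangers personnel yields all four plans, while a contained incident with no threat to personnel yields only IR.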
• Desk check—The simplest kind of validation involves distributing copies of the appropriate plans to all
individuals who will be assigned roles during an actual incident or disaster. Each of these individuals performs
a desk check by reviewing the plan and creating a list of correct and incorrect components. While not a true
test, this strategy is a good way to review the perceived feasibility and effectiveness of the plan and ensure at
least a nominal update of the policies and plans.
• Structured walk-through—In a structured walk-through, all involved individuals walk through the steps they
would take during an actual incident or disaster. This exercise can consist of an on-site walk-through, in which
everyone discusses his or her actions at each particular location and juncture, or it may be more of a
talk-through, in which all involved individuals sit around a conference table and discuss their responsibilities
as the incident unfolds.
• Simulation—In a simulation, the organization creates a role-playing exercise in which the CP team is presented
with a scenario of an actual incident or disaster and expected to react as if it had occurred. The simulation
usually involves performing the communications that should occur and specifying the required physical tasks, but
it stops short of performing the actual tasks required, such as installing the backup data or disconnecting a
communications circuit. The major difference between a walk-through and a simulation is that in simulations, the
discussion is driven by a scenario, whereas walk-throughs focus on simply discussing the plan in the absence of
any particular incident or disaster. Simulations tend to be much more structured, with time limits, planned AARs,
and moderators to manage the scenarios.
• Full-interruption testing—In full-interruption testing, individuals follow each and every IR/DR/BC procedure,
including the interruption of service, restoration of data from backups, and notification of appropriate
individuals. This exercise is often performed after normal business hours in organizations that cannot afford to
disrupt or simulate the disruption of business functions. Although full-interruption testing is the most rigorous
testing strategy, it is unfortunately too risky for most businesses.
structured walk-through
The CP testing strategy in which all involved individuals walk through a site and discuss the steps they would
take during an actual CP event; can also be conducted as a conference room talk-through.
talk-through
A form of structured walk-through in which individuals meet in a conference room and discuss a CP plan rather
than walking around the organization.
simulation
The CP testing strategy in which the organization conducts a role-playing exercise as if an actual incident or
disaster had occurred. The CP team is presented with a scenario in which all members must specify how they would
react and communicate their efforts.
full-interruption testing
The CP testing strategy in which all team members follow each IR/DR/BC procedure, including those for
interruption of service, restoration of data from backups, and notification of appropriate individuals.
At a minimum, organizations should conduct periodic walk-throughs (or talk-throughs) of each of the CP component
plans. Failure to update these plans as the business and its information resources change can erode the team’s
ability to respond to an incident, or possibly cause greater damage than the incident itself. If this sounds like
a major training effort, note what the author Richard Marcinko, a former Navy SEAL, has to say about
motivating a team:24
• The more you sweat to train, the less you bleed in combat.
• Training and preparation can hurt.
• Lead from the front, not the rear.
• You don’t have to like it; you just have to do it.
• Keep it simple.
• Never assume.
• You are paid for results, not methods.
One often-neglected aspect of training is cross-training. In a real incident or disaster, the people assigned to particular
roles are often not available. In some cases, alternate people must perform the duties of personnel who have
been incapacitated by the disastrous event that triggered the activation of the plan. The testing process should train
people to take over in the event that a team leader or integral member of the execution team is unavailable.
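One way to check cross-training coverage is to list which plan roles depend on a single person and which would go unfilled if particular people were unavailable. The sketch below is illustrative only: the function names, the roster, and the names in it are hypothetical and do not come from the text.

```python
def single_person_roles(assignments):
    """Return roles for which fewer than two people have been trained."""
    return sorted(
        role for role, people in assignments.items() if len(set(people)) < 2
    )


def unfilled_roles(assignments, unavailable):
    """Return roles left with no trained person if `unavailable` people are out."""
    missing = set(unavailable)
    return sorted(
        role for role, people in assignments.items() if not set(people) - missing
    )


# Hypothetical roster mapping each role to the people trained to perform it.
roster = {
    "DR team lead": ["Avery", "Blake"],
    "backup restoration": ["Casey"],
    "communications": ["Blake", "Devon", "Emery"],
}
```

Here `single_person_roles(roster)` flags the backup restoration role as a cross-training gap, and `unfilled_roles(roster, {"Avery", "Blake"})` shows that losing both trained leads would leave the DR team without a leader.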
Final Thoughts on CP
As in all organizational efforts, iteration results in improvement. A critical component of the NIST-based methodologies
presented in this module is continuous process improvement (CPI). Each time the organization rehearses its plans, it
should learn from the process, improve the plans, and then rehearse again. Each time an incident or disaster occurs,
the organization should review what went right and what went wrong. The actual results should be so thoroughly
analyzed that any changes to the plans that could have improved the outcome will be implemented into a revised set
of plans. Through ongoing evaluation and improvement, the organization continues to move forward and continually
improves upon the process so that it can strive for an even better outcome.
Closing Scenario
Charlie sat at his desk the morning after his nightmare. He had answered the most pressing e-mails in his inbox and had a
piping hot cup of coffee at his elbow. He looked down at a blank legal pad, ready to make notes about what to do in case
his nightmare became reality.
Discussion Questions
1. What would be the first note you wrote down if you were Charlie?
2. What else should be on Charlie’s list?
3. Suppose Charlie encountered resistance to his plans to improve contingency planning. What appeals could he
use to sway opinions toward improved business contingency planning?
Selected Readings
• A complete treatment of the contingency planning process is presented in Principles of Incident Response and Disaster
Recovery, 3rd Edition, by Michael Whitman and Herbert Mattord, published by Cengage Learning.
• A book that focuses on the incident response elements of contingency planning is Intelligence-Driven Incident Response:
Outwitting the Adversary by Scott J. Roberts and Rebekah Brown, published by O’Reilly.
Module Summary
• Planning for unexpected events is usually the responsibility of general business managers and the information
technology and information security communities of interest.
• For a plan to be seen as valid by all members of the organization, it must be sanctioned and actively supported
by the general business community of interest.
• Some organizations are required by law or other mandate to have contingency planning procedures in place
at all times, but all business organizations should prepare for the unexpected.
• Contingency planning (CP) is the process by which the information technology and information security communities
of interest position their organizations to prepare for, detect, react to, and recover from events that
threaten the security of information resources and assets.
• CP is made up of four major components: the data collection and documentation process known as the business
impact analysis (BIA), the incident response (IR) plan, the disaster recovery (DR) plan, and the business
continuity (BC) plan.
• Organizations can either create and develop the four planning elements of the CP process as one unified plan,
or they can create these elements separately in conjunction with a set of interlocking procedures that enable
continuity.
• To ensure continuity during the creation of the CP components, a seven-step CP process is used:
1. Develop the contingency planning policy statement.
2. Conduct the BIA.
3. Identify preventive controls.
4. Create contingency strategies.
5. Develop a contingency plan.
6. Ensure plan testing, training, and exercises.
7. Ensure plan maintenance.
• Four teams are involved in contingency planning and contingency operations: the CP team, the IR team, the
DR team, and the BC team. The IR team ensures that the CSIRT is formed.
• The IR plan is a detailed set of processes and procedures that plan for, detect, and resolve the effects of an
unexpected event on information resources and assets.
• For every scenario identified, the CP team creates three sets of procedures—for before, during, and after the
incident—to detect, contain, and resolve the incident.
• Incident classification is the process by which the IR team examines an incident candidate and determines
whether it constitutes an actual incident.
• Three categories of incident indicators are used: possible, probable, and definite.
• When any one of the following happens, an actual incident is in progress: loss of availability of information,
loss of integrity of information, loss of confidentiality of information, violation of policy, or violation of law.
• Digital forensics is the investigation of wrongdoing in the arena of information security. Digital forensics
requires the preservation, identification, extraction, documentation, and interpretation of computer media
for evidentiary and root cause analysis.
• DR planning encompasses preparation for handling and recovering from a disaster, whether natural or
human-made.
• BC planning ensures that critical business functions continue if a catastrophic incident or disaster occurs.
BC plans can include provisions for hot sites, warm sites, cold sites, timeshares, service bureaus, and mutual
agreements.
• Because the DR and BC plans are closely related, most organizations prepare the two at the same time and
may combine them into a single planning document called the business resumption (BR) plan.
• The DR plan should include crisis management, the action steps taken during and after a disaster. In some
cases, the protection of human life and the organization’s image are such high priorities that crisis management
may deserve its own policy and plan.
• All plans must be tested to identify vulnerabilities, faults, and inefficient processes. Several strategies can
be used to test contingency plans: desk checks, structured walk-throughs, simulations, and full-interruption testing.
Review Questions
1. What is the name for the broad process of planning for the unexpected? What are its primary components?
2. Which two communities of interest are usually associated with contingency planning? Which community must
give authority to ensure broad support for the plans?
3. According to some reports, what percentage of businesses that do not have a disaster plan go out of business
after a major loss?
4. List the seven-step CP process recommended by NIST.
5. List and describe the teams that perform the planning and execution of the CP plans and processes. What is
the primary role of each?
6. Define the term incident as used in the context of IRP. How is it related to the concept of incident response?
7. List and describe the criteria used to determine whether an actual incident is occurring.
8. List and describe the sets of procedures used to detect, contain, and resolve an incident.
9. What is incident classification?
10. List and describe the actions that should be taken during the reaction to an incident.
11. What is an alert roster? What is an alert message? Describe the two ways they can be used.
12. List and describe several containment strategies given in the text. On which tasks do they focus?
13. What is a disaster recovery plan, and why is it important to the organization?
14. What is a business continuity plan, and why is it important?
15. What is a business impact analysis, and what is it used for?
16. Why should contingency plans be tested and rehearsed?
17. Which types of organizations might use a unified continuity plan? Which types of organizations might use the
various contingency planning components as separate plans? Why?
18. What strategies can be used to test contingency plans?
19. List and describe two specialized alternatives not often used as a continuity strategy.
20. What is digital forensics, and when is it used in a business setting?
Exercises
1. Using a Web search engine, search for the terms disaster recovery and business continuity. How many responses
do you get for each term? Note the names of some of the companies in the response. Now perform the search
again, adding the name of your metropolitan area or community.
2. Go to http://csrc.nist.gov. Under “Publications,” select Special Publications, and then locate SP 800-34, Rev. 1,
“Contingency Planning Guide for Federal Information Systems.” Download and review this document. Outline
and summarize the key points for an in-class discussion.
3. Use your library or the Web to find a reported natural disaster that occurred at least six months ago. From the
news accounts, determine whether local or national officials had prepared disaster plans and if the plans were
used. See if you can determine how the plans helped officials improve disaster response. How do the plans help
the recovery?
4. Using the format provided in the text, design an incident response plan for your home computer. Include actions
to be taken if each of the following events occurs:
a. Virus attack
b. Power failure
c. Fire
d. Burst water pipe
e. ISP failure
What other scenarios do you think are important to plan for?
5. Classify each of the following occurrences as an incident or disaster. If an occurrence is a disaster, determine
whether business continuity plans would be called into play.
a. A hacker breaks into the company network and deletes files from a server.
b. A fire breaks out in the storeroom and sets off sprinklers on that floor. Some computers are damaged, but the fire is
contained.
c. A tornado hits a local power station, and the company will be without power for three to five days.
d. Employees go on strike, and the company could be without critical workers for weeks.
e. A disgruntled employee takes a critical server home, sneaking it out after hours.
For each of the scenarios (a–e), describe the steps necessary to restore operations. Indicate whether law
enforcement would be involved.
References
1. “NIST General Information.” National Institute of Standards and Technology. Accessed September 1, 2020,
from www.nist.gov/director/pao/nist-general-information.
2. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
3. “Disaster Recovery Guide.” The Hartford. Accessed September 1, 2020, from
www.thehartford.com/higrd16/claims/business-disaster-recovery-guide.
4. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
5. Swanson, M., Hash, J., and Bowen, P. Special Publication 800-18, Rev. 1: “Guide for Developing Security
Plans for Information Systems.” National Institute of Standards and Technology. February 2006. Page 31.
Accessed December 6, 2017, from csrc.nist.gov/publications/nistpubs/800-18-Rev1/sp800-18-Rev1-final.pdf.
6. Zawada, B., and Evans, L. “Creating a More Rigorous BIA.” CPM Group. November/December 2002.
Accessed May 12, 2005, from www.contingencyplanning.com/archives/2002/novdec/4.aspx.
7. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
8. Ibid.
9. Ibid.
10. Ibid.
11. Ibid.
12. Bartock, M., Cichonski, J., Souppaya, M., Smith, M., Witte, G., and Scarfone, K. Special Publication
800-184, “Guide for Cybersecurity Event Recovery.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-184/final.
13. Cichonski, P., Millar, T., Grance, T., and Scarfone, K. Special Publication 800-61, Rev. 2: “Computer Security
Incident Handling Guide.” National Institute of Standards and Technology. Accessed September 1, 2020,
from https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final.
14. Ibid.
15. Pipkin, D. Information Security: Protecting the Global Enterprise. Upper Saddle River, NJ: Prentice Hall PTR,
2000:285.
16. Cichonski, P., Millar, T., Grance, T., and Scarfone, K. Special Publication 800-61, Rev. 2: “Computer Security
Incident Handling Guide.” National Institute of Standards and Technology. Accessed September 1, 2020,
from https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final.
17. Pipkin, D. Information Security: Protecting the Global Enterprise. Upper Saddle River, NJ: Prentice Hall PTR,
2000:285.
18. Bartock, M., Cichonski, J., Souppaya, M., Smith, M., Witte, G., and Scarfone, K. Special Publication 800-184,
“Guide for Cybersecurity Event Recovery.” Pages 13–14. National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-184/final.
19. McAfee. “Emergency Incident Response: 10 Common Mistakes of Incident Responders.” Accessed
September 1, 2020, from www.techwire.net/uploads/2012/09/wp-10-common-mistakes-incident-responders.pdf.
20. Cichonski, P., Millar, T., Grance, T., and Scarfone, K. Special Publication 800-61, Rev. 2: “Computer Security
Incident Handling Guide.” National Institute of Standards and Technology. Accessed September 1, 2020,
from https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final.
21. Swanson, M., Bowen, P., Phillips, A., Gallup, D., and Lynes, D. Special Publication 800-34, Rev. 1:
“Contingency Planning Guide for Federal Information Systems.” National Institute of Standards and Technology.
Accessed September 1, 2020, from https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final.
22. Witty, R. “What is Crisis Management?” Gartner Online. September 19, 2001. Accessed December 6, 2017,
from www.gartner.com/doc/340971.
23. Davis, L. Truth to Tell: Tell It Early, Tell It All, Tell It Yourself: Notes from My White House Education.
New York: Free Press, May 1999.
24. Marcinko, R., and Weisman, J. Designation Gold. New York: Pocket Books, 1998.