Efficient PHA of Non-Continuous Operating Modes
Keywords: Human Factors, Hazard Evaluation, Process Hazard Analysis, Human Error, PHA,
Layer of Protection Analysis, LOPA, Process Safety, PSM, Risk-Based Process Safety, RBPS
Abstract
Process safety is about controlling the risk of failures and errors, and controlling
risk is primarily about reducing human error. Risk-Based Process Safety (RBPS)
and alternative standards for process safety (such as US OSHA’s standard for
Process Safety Management [PSM] or ACC’s Process Safety Code™ [PSC]) have
many elements, and each of these in turn helps to reduce the chance of human
error or else limit its impact. One core element is
the process hazard analysis (PHA), also called a hazard identification and risk
assessment (HIRA). PHAs have been performed formally in gradually
improving fashion for more than five decades. Methods such as HAZOP and
What-If Analysis have been developed and honed during this time. But one
weakness identified twenty years ago still exists in the majority of PHAs
performed around the world: most PHAs do not thoroughly analyze the errors
that can occur during startup, shutdown, and other non-routine
(non-normal) modes of operation. This is despite the fact that most major
accidents (about 70%) occur during non-routine operations, even though the
process/plant may be in such a mode for only 5% or less of the year.
Instead of focusing on the most hazardous modes of operation, most PHAs
focus on normal operations. In a majority of both older operations and new
plants/projects, the non-routine modes of operations are not analyzed at all.
This means that perhaps 60 to 80% of the accident scenarios during non-
routine operations are being missed by the PHAs. If the hazard evaluation
does not find the scenarios that can likely occur during these non-routine
operations, the organization will not know what safeguards are needed
against these scenarios.
This paper shows practical ways to efficiently and thoroughly analyze the step-
by-step procedures that are used to control non-routine operating modes, as
well as those for batch and between batch operations. This paper builds upon
the methods and rules provided in papers beginning in 1993 and brings them
up-to-date. Experienced PHA leaders should be able to use the rules and
approaches provided in this paper to improve their PHAs. And others will be
able to use the results of this paper to estimate the number of accident
scenarios they may be missing and to estimate the time it would take to
complete an efficient and thorough PHA of the non-routine modes of
operation.
1. Introduction
During the period 1970 to 1989, 60% to 75% of major accidents in continuous processes
occurred during non-routine modes of operation (principally startup and online maintenance
modes).1 This trend has continued unabated for most of the process industry to the present day. A
compilation of 47 major process safety accidents in the past 23 years is given in Appendix A; of
these, 66% occurred during non-routine operations. In addition, a poll of over 50 clients
indicates that 70% of their moderate and major accidents occurred during non-routine modes of
operation. This data is particularly disturbing when factoring in the time at risk, since most
continuous processes are typically shut down 5% or less per year. Therefore, for many
continuous processes the workers and other stakeholders are 30 to 50 times more likely to have a
major accident during the time frame of startup, shutdown, or on-line maintenance modes of
operation. One reason processes are at higher risk during these operating modes is that many
of the safeguards (independent protection layers) are bypassed or may not be fully capable. A
hazard evaluation is necessary to help a company identify the layers of protection necessary to
lower the risk to acceptable levels. To fulfill this need, a company operating a continuous
process should fully evaluate the hazards during all modes of operation. Unfortunately, in the
first four decades of hazard evaluation use (beginning after the Flixborough disaster in the UK in
1974 – another accident that occurred during startup in a temporary configuration), many
companies have done a poor job of identifying and evaluating accident scenarios during startup,
shutdown, and online maintenance modes of operation, while usually doing a good job of
evaluating hazards of normal modes (continuous or normal batch modes) of operation.
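The "30 to 50 times" figure above follows from simple arithmetic on the accident share and the time at risk. The following is a minimal sketch of that calculation; the function name `relative_accident_rate` is hypothetical, and the sketch assumes that accidents are spread evenly within each mode of operation.

```python
def relative_accident_rate(accident_fraction: float, time_fraction: float) -> float:
    """Ratio of the accident rate per unit time in non-routine modes to the
    accident rate per unit time in routine modes.

    accident_fraction: share of accidents that occur in non-routine modes
    time_fraction: share of the year spent in non-routine modes
    """
    if not (0.0 < accident_fraction < 1.0 and 0.0 < time_fraction < 1.0):
        raise ValueError("fractions must be strictly between 0 and 1")
    non_routine_rate = accident_fraction / time_fraction
    routine_rate = (1.0 - accident_fraction) / (1.0 - time_fraction)
    return non_routine_rate / routine_rate


if __name__ == "__main__":
    # Accident shares quoted in the text, with 5% of the year in non-routine modes.
    for share in (0.60, 0.66, 0.70):
        ratio = relative_accident_rate(share, 0.05)
        print(f"{share:.0%} of accidents in 5% of the time: {ratio:.1f} times the routine rate")
```

With 5% of the year spent in non-routine modes, accident shares of 60% and 70% give ratios of roughly 28 and 44, which is consistent with the range quoted above. A smaller time fraction would push the ratio higher.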
Recent US OSHA PSM National Emphasis Programs underscore the need for companies to
identify potential accident scenarios during non-routine modes, and to reduce the frequency and
consequences of such errors as part of an overall process safety management (PSM) program.
Paragraph (e) of the US OSHA regulation on PSM, 29 CFR 1910.119,2 and similar requirements
in US EPA's rule for risk management programs (RMP), 40 CFR 68.24,3 specifically require that
PHAs consider human factors and address all hazards (stated as “during all modes of operation,
both routine and non-routine,” in Appendix C to the OSHA PSM regulation).
Human factor deficiencies can make operations during non-routine modes extremely hazardous,
since operators have less operating experience with non-routine modes, and these periods of
operation rely heavily on operator decision-making and tasks. In addition, there are fewer layers
of protection in effect during non-routine operations. Analyzing procedure steps can identify
steps where the operator is most likely to make mistakes and suggest ways, ranging from adding
hardware to improving management systems, to reduce risk of an accident scenario.
The approach outlined in this work applies equally to any hazard evaluation where the steps for a
non-routine mode of operation are well defined (i.e., written), including PHAs of existing units,
hazard evaluations during preliminary and detailed design phases of projects (for new/revised
processes), and large or small management-of-change hazard reviews.
Human errors are sometimes mistakenly called procedural errors. This is no more true than
saying all equipment errors are due to design errors. People make mistakes for many reasons,
but experts estimate that only about 10% of accidents due to human errors in the workplace
occur because of personal influences, such as emotional state, health, or carelessness. Over the
past five decades of research and observation in the workplace on human error, we have come to
know that human error probability depends on many factors. These factors (described in more
detail in Human Factors Missing from PSM 5), include:
Procedure accuracy and clarity (the number one, most cited root cause of accidents):
o A procedure typically needs to be 95% or more accurate to help reduce human
error; humans tend to compensate for the remaining 5% of errors in a written
procedure.
o A procedure must clearly convey the information (there are about 25 rules for
structuring procedures to accomplish this) and the procedure must be convenient
to use.
o Checklist features – These should be used and enforced either in the procedure or
in a supplemental document.
Fitness for Duty – Includes control of many sub-factors such as fatigue (a factor in a great
many accidents), stress, illness and medications, and substance abuse.
Workload management – Too little workload and the mind becomes bored and looks for
distraction; too many tasks per hour can increase human error as well.
In addition to the human factors listed, other considerations for use of a human as an IPL include
(1) time available to perform the action and (2) physical capability to perform the action. Table
1 lists the main human factors as well as the multiplying effect of deficiencies in control of these
human factors.
a Based in part on The SPAR-H Human Reliability Analysis Method, NUREG/CR-6883.6 Another US NRC initiative,
NUREG/CR-6903, Human Event Repository and Analysis (HERA), 2007,7 builds on the human factors categories
described in SPAR-H in order to develop a taxonomy for collection of human error data from events at nuclear
power plants. PII has modified the list to account for general industry data and terminology and to incorporate PII
internal data.
These human-error causes (human factors), which in turn result from other human errors, are all
directly within management's control. When using human error data for controlling initiating
events (IEs) and independent protection layers (IPLs), the site must ensure that the factors above
are consistently controlled over the long-term and that they are controlled to the same degree
during the mode of operation that the PHA, HAZOP, What-if, FMEA, or LOPA covers. For
instance, if the workers are fatigued following many extra hours of work in the two-week period
leading up to restart of a process, then the human error rates can increase by a factor of 10
or 20 during startup.
Checklists of human factors issues (see an earlier paper4 and also the Guidelines for Hazard
Evaluation Procedures8) can be very useful after the detailed hazard evaluation of deviations of
steps. Such analysis can indicate where generic weaknesses exist that can make errors during
any mode of operation more likely, or that can make errors during maintenance more likely.
Although such human factors checklists are normally used at the end of the analysis, they can also be
completed piecemeal during an analysis (on breaks from the meetings) by individuals on the team, and then
the results of each individual review can be discussed as a team at the end.
As with scenarios uncovered during continuous modes of operation, the company may need to
perform analysis (including quantitative analysis such as LOPA or HRA) to more fully address
any unresolved or complex issues raised in the hazard evaluation of non-routine modes of
operation.
Case studies illustrate the analysis approach and the usefulness of this strategy.
From an informal survey of more than 100 companies, most do not currently perform process
hazard evaluations of procedures, although many do perform some type of job safety analysis
(JSA). The JSA is an excellent starting point for an evaluation of procedures because a JSA
identifies the tasks that workers must perform and the equipment required to protect workers
from typical industrial hazards (slips, falls, cuts, burns, fumes, etc.). Unfortunately, a typical
JSA will not usually identify process safety issues or related human factors concerns. For
example, from a JSA perspective, it may be perfectly safe for an operator to open a steam valve
before opening a feed valve; however, from a process safety perspective, the operator may need
to open the feed valve before the steam valve to avoid the potential for overheating the reactor
and initiating an exothermic decomposition. The primary purpose of a JSA and other traditional
methods for reviewing procedures has been to ensure that the procedures are accurate and
complete (which is required of employers in 29 CFR 1910.119(f)(3)2).
By contrast, the purpose of a hazard evaluation is not to ensure the procedures are accurate and
acceptable, but instead, to evaluate the accident scenarios if the procedures are not followed.
Even the best procedure may not be followed for any number of reasons, and these failures to
follow the prescribed instructions can and do result in accidents. In fact, in the chemical industry
and most other process industries the chance of an operator or other worker making a mistake in
following a procedure is greater than 1/100, and in some cases much greater. When taking into
account common human factor deficiencies that accompany non-routine operations, such as
fatigue, lack of practice, the rush to restart and return to full production, etc., the probability of
error can climb to 1/10 per task (a task being about 1 to 10 detailed steps). Table 1
presented earlier lists the factors that can increase or decrease human error rates.
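The effect of these multipliers can be illustrated with a short calculation. This is only a sketch under stated assumptions: the function names are hypothetical, the baseline of 1/100 per task and the factor of 10 come from the text, and treating tasks as independent of one another is a simplifying assumption that this paper does not make.

```python
from typing import Sequence


def adjusted_error_probability(base_probability: float, multipliers: Sequence[float]) -> float:
    """Scale a baseline per-task error probability by human factor
    multipliers, capping the result at 1.0."""
    probability = base_probability
    for factor in multipliers:
        probability *= factor
    return min(probability, 1.0)


def chance_of_at_least_one_error(per_task_probability: float, task_count: int) -> float:
    """Probability of one or more errors across task_count tasks.

    Assumes the tasks are independent (a simplification for illustration).
    """
    return 1.0 - (1.0 - per_task_probability) ** task_count


if __name__ == "__main__":
    baseline = 0.01  # about 1/100 per task, as quoted in the text
    degraded = adjusted_error_probability(baseline, [10])  # e.g., a fatigued crew
    for label, p in (("baseline", baseline), ("degraded", degraded)):
        overall = chance_of_at_least_one_error(p, 10)
        print(f"{label}: {p:.2f} per task, {overall:.2f} chance of an error in 10 tasks")
```

Under these assumptions, a ten-task startup goes from roughly a 1-in-10 chance of at least one error to roughly a 2-in-3 chance when the per-task probability rises from 1/100 to 1/10.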
Regulators have repeatedly recognized the need for full evaluation of hazards during all modes
of operations. US OSHA obviously recognized the importance of this category of hazard
evaluation when emphasizing in CPL 2-2.45 (Systems Safety Evaluation of Operations with
Catastrophic Potential)9 that a human error analysis should address:
In another citation11 OSHA alleged a serious violation because the company did not address all
of the hazards of a process. In particular, the company was cited for not evaluating the hazards
(during the unit-wide PHA) associated with non-routine procedures such as "startup, shutdown,
emergency shutdown, and emergency operations." There were several other violations assessed
in this citation because these non-routine procedures did not (allegedly) address the
consequences associated with operators failing to follow the prescribed procedures. The OSHA
inspector was convinced that a hazard evaluation of the non-routine operating procedures should
have been part of the PHA scope.
In an article by Woodcock (an OSHA staff member) entitled, "Program Quality Verification of
Process Hazard Analysis"12 (for use in OSHA's training program for PSM inspectors), the author
states that a PHA should include analysis of the "procedures for the operation and support
functions" and goes on to define a "procedure analysis" as evaluating the risk of “skipping steps
and performing steps wrong.”
In the Risk Management Program rule (40 CFR 68.24)3 EPA also recognizes the importance of
procedural analysis, by defining the purpose of a PHA to be to "examine, in a systematic, step-
by-step way, the equipment, systems, and procedures (emphasis added) for handling regulated
substances."
Industry has found that a HAZOP or what-if analysis, structured to address procedures, can be
used effectively for finding the great majority of accident scenarios that can occur during non-
routine modes of operation.4, 8, 13, 14 Experience shows that reviews of non-routine procedures
have revealed many more hazards than merely trying to address these modes of operation during
the P&ID driven hazard evaluations.
Example: For the BP Texas City, Texas refinery, pre-2005, if the isomerization unit had a
hazard analysis of the startup mode (using What-If and 2-Guide Word analysis [explained
later in this work]), the team would have likely identified that the high-high level switch in
the column was a critical safety device. They also may have recommended moving the
switch to a location higher in the column and then interlocking the high-high level switch to
shutdown feed to the column. However, the site performed a parametric deviation HAZOP
of the equipment nodes, which focused on the continuous mode of operation, and so the team
decided that the high-high level switch was not critical since devices in upstream and
downstream process units (during continuous operation) would indicate possible level
problems in the column – and besides, the operator would certainly notice the high-level
condition on the sight glass during the rounds twice per shift. Unfortunately, these are not
necessarily safeguards during startup of the column (1) since the routine practice was to
overfill the bottoms (raise the level above the upper tap of the level controller's transmitter
and above the nozzle for the high-high level switch) and (2) since swings in upstream and
downstream units are expected (and so likely such swings would not have led to
intervention by the operators of the other units).
To reinforce the need for and to explain the method for analysis of deviations of steps in a
procedure, Section 9.1 was included in the 3rd Edition (2008) of Guidelines for Hazard Evaluation
Procedures8; this was one of the major changes to the hazard evaluation procedures.
In turn, the “pre-HAZOP” method for brainstorming accident scenarios from not following
procedures (including because the procedure is itself wrong) is based on the understanding
that human errors occur by someone not doing a step (errors of omission) or by doing a step
incorrectly (errors of commission). So, simply asking what would happen if the operator
omitted a step or performed a step wrong is one way to structure a hazard evaluation of a step-
by-step procedure. (We will discuss the usefulness of this simple approach to hazard
evaluation of steps later.)
In an effort to be more thorough, the inventors of HAZOP (at ICI) broke these two types of
errors into subparts and agreed on using the following 7 Guide Words:
Omission: Skip, Part Of
Commission: More, Less, Out of Sequence, As Well As, Other Than, Reverse
In the early 1990s, the guide word Skip was augmented by adding the option of discussing
“are there any steps missing from the procedure.”4
To apply HAZOP to procedural steps for startup, shutdown, online maintenance, and other
modes of operation, the facilitator (or team) must first divide the procedure into individual
actions. This is already done if there is only one action per step. Then, the set of guide words
or questions is systematically applied to each action of the procedure resulting in procedural
deviations or what-if questions. The guide words (or procedural deviation phrases) shown in
Table 2 above were derived from HAZOP guide words commonly used for analysis of batch
processes. The definition of each guide word is carefully chosen to allow universal and
thorough application to both routine batch and non-routine continuous and batch procedures.
The actual review team structure and meeting progression are nearly identical to that of a
process equipment HAZOP or what-if analysis, except that active participation of one or more
operators is even more important and usually requires two operators for a thorough review:
a senior operator and a junior operator. For each deviation from the intention of the process
step (denoted by these guide words applied to the process step or action), the team must dig
beyond the obvious cause, "operator error," to identify root causes associated with human
error such as "inadequate emphasis on this step during training," “responsible for performing
two tasks simultaneously,” "inadequate labeling of valves," or "instrument display confusing
or not readable." The guide word "missing" elicits causes such as "no written procedural step or
formal training to obtain a hot work permit before this step," or "no written procedural step or
formal training to open the discharge valve before starting the pump."
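The systematic pairing of guide words with actions described above can be sketched in a few lines. This is an illustration only: the function name `procedural_deviations` is hypothetical, the guide word tuple uses words named in this paper but the exact set is an assumption, and the example actions are invented for demonstration.

```python
from typing import Iterable, List, Tuple

# Guide words named in this paper's discussion. The exact set and the
# definition behind each word are assumptions here; substitute your own table.
GUIDE_WORDS = (
    "Skip", "Part Of", "More", "Less",
    "Out of Sequence", "As Well As", "Other Than",
)


def procedural_deviations(
    actions: Iterable[str], guide_words: Iterable[str] = GUIDE_WORDS
) -> List[Tuple[int, str, str]]:
    """Pair every action with every guide word, in procedure order.

    Returns (action_number, guide_word, action) tuples that a team could
    use as prompts when brainstorming causes, consequences, and safeguards.
    """
    words = tuple(guide_words)
    return [
        (number, word, action)
        for number, action in enumerate(actions, start=1)
        for word in words
    ]


if __name__ == "__main__":
    # Illustrative actions only; one action per entry, as the method requires.
    example_actions = [
        "Block in olefin feed at the flow control valve for reactor #1",
        "Block in olefin feed at the flow control valve for reactor #2",
    ]
    for number, word, action in procedural_deviations(example_actions):
        print(f"Action {number} | {word}: {action}")
```

The list only supplies the prompts. Judging which deviations are credible, and digging past "operator error" to root causes, remains the work of the review team.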
5.2 Two (2) Guide Word Method for Analyzing Deviations of Procedural Steps
A more streamlined guide word approach has also proven very useful for (1) procedures
related to less hazardous operations and tasks and/or (2) when the leader has extensive
experience in the use of the guide words mentioned previously and can therefore compensate
for the weaknesses of a more streamlined approach. The two guide words for this approach
(as defined in Table 3 below) encompass the basic human error categories: errors of omission
and commission. These guide words are used in an identical way to the guide words
introduced earlier. Essentially "omit" includes the errors of omission related to the guide
words "skip," "part of," and "missing" mentioned earlier. The guide word "incorrect"
incorporates the errors of commission related to the guide words "more," "less," "out of
sequence," "as well as," and "other than" mentioned earlier. Note that these two guide words
(Table 3) fill the basic requirements for a human error analysis as outlined in OSHA's CPL 2-
2.45.9
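The relationship between the two guide words and the more detailed guide words can be written as a simple lookup. The grouping below is taken from the paragraph above; the names `TWO_GUIDE_WORD_MAP` and `collapse_guide_word` are hypothetical.

```python
# Grouping as stated in the text: "omit" covers the errors of omission,
# "incorrect" covers the errors of commission.
TWO_GUIDE_WORD_MAP = {
    "skip": "omit",
    "part of": "omit",
    "missing": "omit",
    "more": "incorrect",
    "less": "incorrect",
    "out of sequence": "incorrect",
    "as well as": "incorrect",
    "other than": "incorrect",
}


def collapse_guide_word(detailed_guide_word: str) -> str:
    """Return the two-guide-word category for a detailed guide word.

    Raises KeyError for a guide word that the text does not assign to a category.
    """
    return TWO_GUIDE_WORD_MAP[detailed_guide_word.strip().lower()]


if __name__ == "__main__":
    for word in ("Skip", "Out of Sequence", "Other Than"):
        print(f"{word} -> {collapse_guide_word(word)}")
```

A lookup like this can help a leader check that a streamlined review still prompted for every category of error that the longer guide word list would have raised.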
Drawing or Procedure: SOP-03-002; Cooling Water Failure
Unit: HF Alkylation
Method: 2 Guide Word Analysis
Documentation Type: Cause-by-Cause
Node: 23
Description: STEP 2: Block in olefin feed to each of the 2 reactors by blocking in feed at flow control valves

Item 23.1
Deviation: Step not performed

Cause: Operator failing to block in one of the reactors, such as due to miscommunication between control room operator and field operator; or control valve sticking open or leaking through
Consequences:
o High pressure due to possible runaway reaction (because cooling is already lost), because of continued feeding of olefin (link to 11.7 - High Rxn Rate; HF Alky Reactor #1/#2)
o High pressure due to high level in the reactor, because of continued feeding of olefin (link to 11.1 - High Level; HF Alky Reactor #1/#2)
Safeguards:
o High temperature alarm on reactor
o High pressure alarm on reactor
o Field operator may notice sound of fluid flow across valve
o Flow indication (in olefin charge line to reactor that is inadvertently NOT shut down)
o Level indicator, high level alarm, and independent high-high level switch/alarm

Cause: Operator failing to make sure bypass valve is also closed, since this precaution is not listed in the written procedure; or the bypass valve leaks through
Consequences:
o High pressure due to possible runaway reaction (because cooling is already lost), because of continued feeding of olefin (link to 11.7 - High Rxn Rate; HF Alky Reactor #1/#2)
o High pressure due to high level in the reactor, because of continued feeding of olefin (link to 11.1 - High Level; HF Alky Reactor #1/#2)
Safeguards:
o High temperature alarm on reactor
o High pressure alarm on reactor
o Field operator skill training requires always checking that bypasses are closed when blocking in control valves
o Field operator may notice sound of fluid flow across valve
o Flow indication in olefin charge line (but likely not sensitive enough for small flows)
o Level indicator, high level alarm, and independent high-high level switch/alarm

Cause: Operator failing to close flow control valve manually from the DCS because the phrase “block in” is used instead of the word “close”
Consequences:
o Valve possibly opens full at restart, allowing too much flow to reactor at restart, resulting in poor quality at startup and/or possibly resulting in runaway reaction and high pressure
Safeguards:
o Control room skill training requires always manually commanding automatic valves closed before telling field operator to block in control valve
Recommendation: 37. Implement best-practice rules for procedure writing, which includes using common terminology.

Item 23.2
Deviation: Step performed wrong

Cause: Operator closing the olefin charge flow control valves before shutting down the charge pump, primarily because the steps are written out of the proper sequence
Consequences:
o Deadheading of charge pump, leading to possible pump seal damage/failure and/or other leak, resulting in a fire hazard affecting a small area (link to 5.12 - Loss of Containment; Olefin Charge Line/Pump)
Safeguards:
o Step 3 of procedure that says to shut down charge pump
o The step to shut down the charge pump (Step 3) is typically accomplished before Step 2 (in practice)
Recommendation: 41. Move Step 3 ahead of Step 2.

Cause: Field operator closing both upstream and downstream block valves
Consequences:
o Possible trapping of liquid between block valve and control valve, leading to possible valve damage (due to thermal expansion)
Safeguards:
o Field operator skill training stresses that only one block valve should be closed
6. What-if Method for Analyzing Deviations of Procedural Steps
The What-if method for analyzing procedure-based modes of operations is free brainstorming
without the aid (or constraints) of guide words. This method is described in detail in the
Guidelines for Hazard Evaluation Procedures (CCPS)8. The hazard evaluation team using
this method would read the procedure and then answer the question: “What mistakes will
lead to our consequences of interest?” The team would list these mistakes and then
brainstorm the full consequences, causes, and existing safeguards – the same analysis
approach described for the guide word approaches mentioned earlier in this section. What-if
brainstorming is not applied to each step of the procedure, but rather covers the entire task
(procedure) at one time.
Experience of the leader or the team plays a major part in selecting the procedures to be
analyzed, and then in deciding when to use each guide word set.
Figure 1 shows the typical usage of the three methods described above for a typical set of
operations procedures within a complex chemical plant or refinery or other process/
operation. Most of the procedures are simple enough, or have low enough hazards to
warrant using the What-if method. Currently, the 7-8 Guide Word approach is used
infrequently, since most tasks do not require that level of scrutiny to find the accident
scenarios during non-routine modes of operations.
Figure 1. Relative Usage of Techniques for Analysis of Procedure-Based Modes of
Operation
The experience of the leader or the team plays a major part in selecting the method to use for
each task/procedure to be analyzed. However, the first decision will always be “Are these
procedures ready to be risk reviewed?” If the procedures are up-to-date, complete, clear,
and used by operators, then the best approach for performing a complete hazard evaluation
of all modes of operation, including routine modes of operation, is shown in Table 5 below:
Equipment-based analysis (continuous mode):
HAZOP of Parameters of Continuous Mode
FMEA of Continuous Mode
What-if of Simple Sub-systems
Procedure-based analysis (steps and tasks):
HAZOP of Steps
o 7 Guide Word
o 2-3 Guide Word
What-if of Simple Tasks
If you do not have accurate procedures then the best approach is to develop accurate and up-
to-date procedures as quickly as possible and in the meantime follow the approach shown in
Table 6 below:
Table 6. Example Choice of Methods for Hazard Evaluation of Partially All Modes of
Operations if the Operating Procedures are Less Than 80% Accurate15
Any procedure (even a computer program) can be analyzed using these techniques.
Reviews of routine procedures are important, but reviews of non-routine procedures are
even more important. As mentioned earlier, the nature of non-routine procedures means that
operators have much less experience performing them, and many organizations do not
regularly update these procedures [though this should change as companies comply with 29
CFR 1910.119(f)]2. Also, during non-routine operations, many of the standard equipment
safeguards or interlocks are off or bypassed.
Using the approaches above, a company doing a complete hazard evaluation of an existing
unit will invest about 65% of their time to evaluate normal (e.g., continuous mode)
operation and 35% of their time for evaluating the risks of non-routine modes of operation.
Many companies do not perform a thorough analysis of the risk for startup, shutdown,
and on-line maintenance modes of operation; the reason normally given is that the
analysis of these modes of operation takes “too long.” Yet, actually, the hazard
evaluation of the normal mode is taking too long and so the organization feels it has no
time left for the analysis of procedures for startup and shutdown modes of operation. But,
if these hazard evaluations for the normal mode of operation are optimized (such as using
rules presented elsewhere16), the organization will have time for thoroughly analyzing the
non-routine modes (typically discontinuous modes) of operation and the organization will
still have a net savings overall! This point is critical since 60-75% of catastrophic
accidents occur during non-routine modes of operation. Figure 2 illustrates (for a
continuous process unit) the typical split of meeting time for analysis of routine mode of
operation versus non-routine modes of operation.
Figure 2. Relative Amount of Meeting Time Spent for Analysis of Routine and Non-
routine Modes of Operation for a Continuous Process
Define the assumptions about the system's initial status. “What is assumed to be the
starting conditions when the user of the procedure begins with Step 1?”
Define the complete design intention for each step. “Is the step actually 3 or 5 actions
instead of one action? If so, what are the individual actions to accomplish this task?”
Don’t analyze safeguard steps that start with ensure, check, verify, inspect, etc., or where
the consequence of skip is “loss of one level of safeguard/protection against ...” There
is no reason to analyze these steps since they will show up as safeguards of deviations of
other steps. This approach is similar to not analyzing a PSV during a HAZOP of
continuous mode (parametric deviation analysis); instead the PSV is shown as a
safeguard against loss of containment.
Together with an operator before the meeting, identify the sections of the procedures that
warrant use of:
o 7-8 Guide Words (extremely large consequences can happen if deviations occur)
o 2 Guide Words (the system is complex, mistakes are costly, or several
consequences could occur)
o On others, use What-If (no guide words or guide phrases; for use on simpler or
lower hazard systems)
Decompose each written step into a sequence of actions (verbs)
Apply guide words directly to the intentions of each action
Walk through procedure in the plant with one or more operators to see the work situation
and verify the accuracy of the written procedure. This is optional and should also have
been performed as part of validation of the procedure after it was originally drafted.
Determine if the procedure follows the best practices for “presentation” of the content;
the best practices will limit the probability of human error.
Discuss generic issues related to operating procedures, such as:
o staffing (normal and temporary)
o human-machine interface
o worker training, certification, etc.
o management of change
o policy enforcement
Review other related procedures such as lock out/tag out and hot work.
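One of the preparation rules above, setting aside safeguard steps before the meeting, lends itself to a simple pre-screen. The sketch below is illustrative: the function name `screen_steps` is hypothetical, the verb list covers only the four verbs named in the text (which ends its list with "etc."), and the second criterion (steps whose only consequence when skipped is loss of a safeguard) still requires the judgment of the leader and an operator.

```python
from typing import Iterable, List, Tuple

# Leading verbs that mark a step as a safeguard rather than an action that
# changes the state of the system. Only the verbs named in the text are listed.
SAFEGUARD_VERBS = ("ensure", "check", "verify", "inspect")


def screen_steps(steps: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split procedure steps into (steps_to_analyze, safeguard_steps)
    based on the first word of each step."""
    to_analyze: List[str] = []
    safeguards: List[str] = []
    for step in steps:
        words = step.strip().split(maxsplit=1)
        first_word = words[0].lower().rstrip(",.:;") if words else ""
        if first_word in SAFEGUARD_VERBS:
            safeguards.append(step)
        else:
            to_analyze.append(step)
    return to_analyze, safeguards


if __name__ == "__main__":
    # Illustrative steps only.
    example = [
        "Close the drain line valve",
        "Verify the drain valve is closed",
        "Open the feed valve",
    ]
    to_analyze, safeguards = screen_steps(example)
    print("Analyze:", to_analyze)
    print("List as safeguards:", safeguards)
```

The steps that are set aside are not discarded; they should reappear in the safeguards column for deviations of the steps that are analyzed.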
9. Case Studies
The following case studies illustrate the usefulness of the process outlined in this paper.
In 1991-1992, a PHA was performed for the first of the rebuilt polyethylene plants at the
Pasadena, TX, Phillips 66 plant. The accident there two years prior claimed 24 lives, injured
hundreds of others, destroyed all three polyethylene plants, and cost Phillips an estimated
$1.4 billion. Following the investigation of the accident, one of the requirements of the
settlement agreement between Phillips and the US government was to ensure the PHA of the
rebuilt units addressed hazards during all modes of operation.
The PHA team varied in size, but always included at least two operators. The team leader
was a process engineer with 15 years of experience, who was also trained in human factors.
The PHA first covered the continuous mode of operation for the approximately 200 nodes of
equipment (from feed stock through pellet handling) using the “parametric deviation” form
of HAZOP (and some What-If). Then, to complete the analysis of all modes of operation,
the PHA team performed a step-by-step analysis of all steps of all startup and shutdown and
online maintenance procedures (about 700 steps changed the state of the system and each of
these steps was analyzed) using the 7 Guide Word HAZOP method (2 Guide Word analysis
was not known to the team at this time). For deviations such as “operator skips a step,” the
causes identified by the team included "the operator doing this step miscommunicates with
the operator who performed steps earlier in the day and went to the wrong reset panel/switch
in the field". In this example, an "other-than" error led to the "skip" error; so two errors
occurred at once: the wrong switch was flipped and the correct switch was not flipped.
Other causes included: “label not distinct enough” or “thinking/believing the previous
operator completed this step.” The additional safeguards suggested by the PHA team
sometimes lowered the likelihood of the error by addressing a human factors weakness. But in
many cases, the solution was a change to the hardware or instrumentation, including adding
new interlocks (these would be called Safety Instrumented Functions today) and adding
mechanical interlocks and installing larger relief valves. In a couple of cases, isolated
sections of the process were redesigned to lower the inherent risk, such as by adding error-
proofing (poka-yoke) features.
The 7 Guide Word HAZOP of non-routine modes of operation took 2.5 weeks of meetings,
40 hours a week. This was in addition to the 3.5 weeks of meetings to complete the
parametric deviation analysis HAZOP of the continuous (normal) mode of operation (as
mentioned before, 200 nodes of equipment). Note that if the team had known of and been
trained in 2 Guide Word HAZOP for procedure steps, they likely would have chosen it for
many of the tasks, and it is estimated that the meeting time for analysis of non-routine
procedures would have been reduced to less than 2 weeks, with little or no loss of
thoroughness. The completed PHA report was submitted to US OSHA for review and was
approved almost immediately; OSHA particularly reviewed the analysis of all modes of
operation and coverage of human factors.
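The meeting effort quoted above can be turned into rough per-item averages. The short calculation below uses only the figures in the text (2.5 and 3.5 weeks at 40 hours per week, about 700 steps, about 200 nodes); the variable names are illustrative, and the results are approximate because the step and node counts are themselves approximate.

```python
HOURS_PER_WEEK = 40

# Figures quoted in the text (counts are approximate).
non_routine_hours = 2.5 * HOURS_PER_WEEK   # 100 hours
continuous_hours = 3.5 * HOURS_PER_WEEK    # 140 hours
steps = 700
nodes = 200

# Average meeting time per analyzed item.
minutes_per_step = non_routine_hours * 60 / steps
minutes_per_node = continuous_hours * 60 / nodes

# Fraction of total meeting time spent on non-routine modes.
share_non_routine = non_routine_hours / (non_routine_hours + continuous_hours)

print(f"{minutes_per_step:.1f} min per procedure step")   # 8.6
print(f"{minutes_per_node:.0f} min per node")             # 42
print(f"{share_non_routine:.0%} of meeting time on non-routine modes")  # 42%
```

On these figures, the non-routine analysis took a little over 40% of the total meeting time, at under 9 minutes per step on average.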
After several near misses and accidents, a company completely upgraded its PSM program.
Detailed operating and maintenance procedures were written for all non-routine tasks.
Training experts were called in to ensure that workers understood these new procedures.
During process design, particular emphasis was placed on reducing human error. The
company also decided to perform PHAs (using the HAZOP analysis technique) to uncover
potential accident scenarios in its process design and procedures, with the PHA teams
instructed to focus on human error sources. Detailed human reliability analysis was
reserved for critical, complex tasks identified by the PHA teams as having potentially severe
consequences.
The following results were taken from the analysis of a continuous catalyst addition system
for a reactor, shown in Figure 4. Though the normal mode of operation was continuous, the
addition system was frequently isolated, depressurized, refilled, and then restarted while a
standby addition system maintained feed flow to the reactor. It was interesting to note how
the team's perception of "likelihood" for a given human error scenario changed after the
procedures were reviewed.
The HAZOP review of the procedures for switching between feeders indicated that the
steps telling the operator to close the drain line valve, and later to verify the drain valve
was closed, were both "missing." Also, the operators recalled that formal (classroom)
training did not mention the operation of this manual valve and field training did not
always cover operation of this valve. The senior operators on the review team began to
realize that an inexperienced operator might not understand that the sound made by the
rushing backflow of fluid is abnormal and might react too slowly to prevent over-
pressurization. Therefore, the team concluded that if hands-on training failed to correct
the procedure deficiencies, backflow from the reactor to the KO drum was a likely
accident scenario, especially with new operators. The team recommended the
procedures be modified to (1) reflect the proper sequence of steps and (2) emphasize
the consequences of leaving the drain valve open and then later opening the feed line to
the reactor. A checklist of the proper sequence of steps was recommended for this
procedure. In addition, a recommendation from the continuous mode HAZOP
involving the relief valve on the KO drum was modified to ensure the relief valve was
also capable of handling reverse flow from the reactor (the relief size was ultimately
increased).
Note that about 3% of the relief valves across the entire chemical complex (which
contained more than 3000 relief valves) were resized to account for accident scenarios
that were only found during analysis of non-routine modes of operation.
9.3 Case Study 3: Urea and Ammonia Plants in the Arabian Gulf Region
Several ammonia and urea plants had hazard evaluations performed using HAZOP of nodes
(parametric deviations for continuous mode of operation) and then non-routine modes of
operation were analyzed using What-If and 2 Guide Word analysis for selected, critical
procedures. The HAZOP of nodes (which focuses necessarily on the continuous mode of
operation) missed many accident scenarios that were later found during review of non-
routine modes of operation. The analysis of critical procedures resulted in a number of new
recommendations to reduce the risk of accident scenarios (these accident scenarios were not
identified during the HAZOP of the nodes a few days before). One example was the
recommendation to install a second, independent level transmitter on the urea reactor to
prevent overflow to the high-pressure scrubber, which would lead to overpressure of the entire
synthesis loop and poor reaction kinetics during startup and normal operation.
Another example relates to the startup of the high-pressure ammonia pumps in the synthesis
section of the ammonia plant. A recommendation was made during the analysis to
include more detail on the required rate of temperature increase, together with notes
explaining the potential consequences of heating too quickly. In this case, a high heating
rate could damage the 316L stainless steel liner in vessels and cause corrosion,
deformation, and ultimately vessel failure.
It was also identified during the analysis that the development of trouble-shooting guides
should be considered for responding to severe system upsets, including control valve failure,
sticking of hand valves, and contamination issues. These can be developed with
the help of the PHA tables, which contain 40% of the scenarios and information related to
trouble-shooting guides.
10. Conclusion
Qualitative analysis of non-routine operating procedures is an extremely powerful tool
for uncovering deficiencies that can lead to human errors and for uncovering accident
scenarios during all modes of operation. This approach of step-by-step HAZOP and/or
What-If analysis is not new to industry, and regulators have required similar approaches for
decades. Yet regulators continue to note a lack of analysis of the risk of non-routine
operations and a lack of risk review of changes to procedures.
From CSB Report on August 2008 Bayer CropScience Explosion: 17 “The accident
occurred during the startup of the methomyl unit, following a lengthy period of
maintenance … CSB investigators also found the company failed to perform a
thorough Process Hazard Analysis, or PHA, as required by regulation … In particular,
for operational tasks that depend heavily on task performance and operator decisions,
the team should analyze the procedures step-by-step to identify potential incident
scenarios and their consequences, and to determine if the protections in place are
sufficient.”
Further, such analyses make it easier to provide a thorough consideration of human factors.
Regardless of what hazard evaluation technique is employed, it is imperative for PHA teams
to ask, "Why would someone make this mistake?" whenever a human error is identified as a
cause of a potential accident. "To err is human" may be a true statement, but the frequency
and consequences of such errors can be effectively reduced with a well-designed strategy for
analyzing risk of non-routine operating modes.
11. References
1. Rasmussen, B. "Chemical Process Hazard Identification," Reliability Engineering and
System Safety, Vol. 24, Elsevier Science Publishers Ltd., Great Britain, 1989.
4. Bridges, W.G., et al., “Addressing Human Error During Process Hazard Analyses,”
Chemical Engineering Progress, May 1994.
5. Tew, R. and Bridges, W., “Human Factors Missing from PSM,” Loss Prevention
Symposium (part of the Global Congress on Process Safety [GCPS]), AIChE, March 2010.
8. “Guidelines for Hazard Evaluation Procedures, 3rd Edition, with Worked Examples,”
Center for Chemical Process Safety, AIChE, New York, 2008.
10. OSHA Inspection Number 103490306, page 7 of 77, Issued November 2, 1992.
11. OSHA Inspection Number 123807828; pages 2, 3, and 7 through 22, of 25; Issued
November 18, 1993.
12. Woodcock, Henry C., “Program Quality Verification of Process Hazard Analyses (for
instructional purposes only),” US OSHA, 1993.
13. Hammer, W., “Occupational Safety Management and Engineering, 3rd Ed.,” Prentice
Hall, 1985.
15. “Hazard Evaluation (PHA) Leadership Course,” Process Improvement Institute, Inc.,
2003.
16. Tew, R., et al., “Optimizing Qualitative Hazard Evaluations (or How to Complete A
Qualitative Hazard Evaluation Meeting in One-Third the Time Currently Required),” 5th
Global Congress on Process Safety, AIChE, April 2009.
a. Data in the table excludes storage and terminals and also excludes exploration (drilling, etc.) and transportation.
b. Forty-seven major accidents are listed above, of which thirty-one are during non-routine operation.