Uptime® Elements™ Passport
Reliability Engineering for Maintenance (REM)
Part of the Certified Reliability Leader Body of Knowledge
Publisher: Reliabilityweb.com
Designer: Jocelyn Brown
Contents

criticality analysis (Ca)
Introduction
Key Terms and Definitions
Criticality Analysis Development
Analysis Process Methodology
Benefits of Criticality Analysis
What Every Reliability Leader Should Know
Summary

Summary
References

reliability engineering (Re)
Introduction
Key Terms and Definitions
Purpose of Reliability Engineering
Role of Reliability
Measuring Reliability
Software Reliability
Benefits of Reliability
What Every Reliability Leader Should Know
Summary
References

What Every Reliability Leader Should Know
Summary
References

What Every Reliability Leader Should Know
Summary
References

Acknowledgment
Uptime Elements is a holistic, system-based approach to reliability
that includes Technical Elements, Cultural Elements and Leadership Elements.
[Figure: Uptime Elements chart, with column headings Technical Activities, Leadership and Business Processes over the Asset Lifecycle. The Reliability Engineering for Maintenance (REM) domain comprises six elements: Ca (criticality analysis), Rsd (reliability strategy development), Re (reliability engineering), Rca (root cause analysis), Cp (capital project management) and Rcd (reliability centered design). Other elements legible in the chart: Ut (ultrasound testing), Ir (infrared thermal imaging), Mt (motor testing), Odr (operator driven reliability), Mro (MRO-spares management), Hcm (human capital management), Cbl (competency based learning), Ri (risk management), Ak (asset knowledge) and Alm (asset lifecycle management).]
Reprinted with permission from NetexpressUSA Inc. d/b/a Reliabilityweb.com. Copyright © 2016-2017. All rights reserved. No part of this graphic may be reproduced or transmitted in any form or by any means without the prior express written consent of NetexpressUSA Inc. Uptime®,
Reliability®, Certified Reliability Leader™, Reliabilityweb.com® , A Reliability Framework and Asset Management System™ and Uptime® Elements™ are trademarks and registered trademarks of NetexpressUSA Inc. in the U.S. and several other countries.
Introduction
Criticality analysis (CA) is a key element in the Reliability
Engineering for Maintenance (REM) domain of
Uptime Elements and is fundamental to asset manage-
ment. CA is used to evaluate how asset failures impact
organizational performance and to systematically rank
plant assets for the purpose of workflow prioritization,
preventive maintenance and condition monitoring
development, maintenance reliability initiatives, etc.
It provides the basis for determining the value and
impact a specific asset has on the production/operations
process, as well as the level of attention the asset requires
with regard to reliability strategy development (RSD) or
strategies and plans (SP) for asset management.
A failure mode and effects analysis (FMEA) is used
to determine different failure modes and their effects on
the asset, while a criticality analysis classifies and prior-
itizes the level of importance of a failure on operations.
This ranking is based on several factors, such as the pro-
jected failure rate of the asset, the severity of the effect
(i.e., consequences) of the failure and the likelihood of
the failure being detected before it occurs.
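One common way to combine these three factors in FMEA-style worksheets is a risk priority number (RPN), the product of occurrence, severity and detection scores. The sketch below is illustrative only: the 1-10 scales, the multiplicative form and the sample failure modes are assumptions, not something this text prescribes.

```python
from typing import List, NamedTuple


class FailureMode(NamedTuple):
    name: str        # hypothetical failure mode label
    occurrence: int  # 1 (rare) to 10 (frequent): projected failure rate
    severity: int    # 1 (negligible) to 10 (catastrophic): consequence
    detection: int   # 1 (almost always detected early) to 10 (undetectable)


def risk_priority_number(fm: FailureMode) -> int:
    """Return occurrence x severity x detection (assumed RPN form)."""
    for score in (fm.occurrence, fm.severity, fm.detection):
        if not 1 <= score <= 10:
            raise ValueError("scores must be on the assumed 1-10 scale")
    return fm.occurrence * fm.severity * fm.detection


def rank_by_risk(modes: List[FailureMode]) -> List[FailureMode]:
    """Sort failure modes from highest to lowest RPN."""
    return sorted(modes, key=risk_priority_number, reverse=True)


# Hypothetical sample data for illustration only.
modes = [
    FailureMode("bearing seizure", occurrence=3, severity=8, detection=5),
    FailureMode("belt slip", occurrence=6, severity=3, detection=2),
    FailureMode("seal leak", occurrence=4, severity=5, detection=7),
]
ranked = rank_by_risk(modes)
```

A hard-to-detect failure mode can outrank a more severe one in this scheme, which is why each organization should agree on its own scales before comparing scores.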
Asset criticality is sometimes called asset risk profile.
It uses a risk formula to determine the financial impact
Reliability Engineering for Maintenance
orders are completed.
Criticality analysis is an important tool that provides
valuable information for decisions about work priority,
developing reliability strategies, justifying resources to
conduct root cause analysis (RCA), FMEA, etc. CA
helps ensure that resources are being spent on the right
assets to get more value.
including design, development, build, operate, maintain
and disposal.
Maintenance program – A comprehensive set of main-
tenance activities, their intervals and required activities,
along with accurate documentation of these activities.
Maintenance strategies – A long-term plan covering
all aspects of maintenance management that sets the
direction on how assets will be maintained and contains
action plans for achieving the desired future state.
Mean time between failures (MTBF) – A basic measure
of asset reliability calculated by dividing total operating
time of the asset by the number of failures over a period.
It is the inverse of the failure rate (λ) and is generally used for
repairable systems.
Mean time to repair (MTTR) – A basic measure of
maintainability, it represents the average time needed
to restore an asset to its full operational condition after
a failure; calculated by dividing total repair time of the
asset by the number of failures over a period of time.
Predictive maintenance (PdM) – An advanced main-
tenance technique focused on using technology, such
facility assets, it would consume almost all of its highly
skilled and specialized engineering team resources and
take an extensive period that would defer the benefits
the company might achieve while conducting it.
As a rule, the largest percentage of an organization’s
total risk for its equipment and plant assets is concen-
trated on a small proportion of these items. These are the
equipment and plant asset items that should be involved
in the FMEA process. Therefore, the emphasis should be
on those items that are critical for sustaining continuous
operation of the equipment and plant assets. This must
be the focus of the criticality analysis.
So, where is the starting point for a criticality analysis?
To understand a standardized approach of consequences
and severity of a failure, it’s best to review the follow-
ing chart (Figure 1) from the ISO14224 standard. The
categories on the chart define the type of failure based
on whether it’s catastrophic, severe, moderate, or minor.
Criteria for these categories must be determined by
each organization. For example, a failure that results in
death would be catastrophic; similarly, complete system
failure or production shutdown also could be viewed as
However, the classification system damage in the range
of $1 million may vary from company to company. So,
the dollar threshold for the severe, moderate and minor
categories becomes organization dependent.
The operational consequences, which include
expenses, also introduce a subjective factor. For exam-
ple, what may be a very high maintenance cost to one
organization might not be as dramatic to another. As
such, setting the dollar amount in each column cate-
gory becomes dependent on the company. However, the
ISO14224 block diagram is an excellent starting point
for developing criticality analysis criteria. As a company
fills out this chart, it is able to determine what is
unacceptable and must be prevented at any cost, when a
corrective measure should be considered at a reasonable
cost, and what is an acceptable risk that suits a run-to-failure
strategy.
The ISO14224 failure consequence diagram is also a
logical starting point for the severity of the failure. An
alternative approach utilized by some organizations is to
use a quantitative number that can be determined by
criteria, such as hours of downtime, cost of repair, asset
The team, based on their collective knowledge, can
choose the most appropriate factors.
Analysis Process Methodology
The suggested steps to conduct a criticality analysis are:
1. Select team members from cross-functional areas to
perform the analysis;
2. Get the list of assets from the CMMS based on an
established hierarchy scheme:
a. Use ISO14224 as a guideline, if needed, to
improve hierarchy and taxonomy;
3. Establish appropriate criteria and weighting factors
for criticality analysis;
4. Apply criteria and develop a criticality ranking number
for each asset, or assign Low (L), Medium (M), or
High (H) criticality based on collective team knowl-
edge and data available:
a. Numerical results can be scaled and grouped,
making it possible to classify asset groups by their
functional importance to the business;
b. Functional grouping can be classified into three
types of assets:
Table 1

Criticality criteria:
(1) Mission - Operations Impact
(2) Customer Impact
(3) Safety - HSE Impact
(4) Regulatory Impact
(5) Single Point Failure
(6) Asset Replacement Cost
(7) Maintenance Cost
(8) Spare Lead Time

No.  Asset (ID / type / description)        (1) (2) (3) (4) (5) (6) (7) (8)  Raw Score  Weighted Rating (100)  L-M-H
1    A-001 Assembly machine                  4   4   2   1   3   3   3   3      23           57.5             M
2    Conveyor system                         2   1   2   1   2   1   1   1      11           27.5             L
3    Hydraulic Power unit                    2   1   3   3   2   2   2   2      17           42.5             M
4    Crane - OH 10 Ton                       3   1   2   2   1   2   1   2      14           35               L
5    Transformer Area Transformer unit -PT1  5   3   2   1   4   3   2   3      23           57.5             M

Numerical criteria rating scale = 1-5 (5 being high impact). Criticality rating: Low (L) = 0-40; Medium (M) = 41-70; High (H) = 71-100.
ii essential to operations, can be classified as (M)
assets;
iii critical to operations, can be classified as (H)
assets.
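The scoring in Table 1 can be sketched in a few lines. This is a minimal illustration, assuming equal weights for the eight criteria and a weighted rating equal to the raw score divided by the maximum possible score, scaled to 100; that assumption reproduces the table's figures (for example, 23 out of 40 gives 57.5), but the function names and the handling of ratings that fall between bands are hypothetical.

```python
from typing import Optional, Sequence

MAX_SCORE = 5  # criteria are rated on a 1-5 scale (5 = high impact)


def weighted_rating(scores: Sequence[int],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Scale weighted criterion scores to a 0-100 criticality rating."""
    if weights is None:
        weights = [1.0] * len(scores)  # assumption: all criteria weigh the same
    if len(weights) != len(scores):
        raise ValueError("one weight is needed per criterion")
    if any(not 1 <= s <= MAX_SCORE for s in scores):
        raise ValueError("scores must be on the 1-5 scale")
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return 100.0 * weighted_sum / (MAX_SCORE * sum(weights))


def criticality_class(rating: float) -> str:
    """Map a 0-100 rating to L, M or H using the Table 1 bands.

    Ratings between bands (e.g., 40.5) are assigned to the higher band,
    which is an assumption; the table only lists whole-number limits.
    """
    if rating <= 40:
        return "L"
    if rating <= 70:
        return "M"
    return "H"


# Row 1 of Table 1: assembly machine.
assembly = [4, 4, 2, 1, 3, 3, 3, 3]
rating = weighted_rating(assembly)   # 57.5
band = criticality_class(rating)     # "M"
```

Passing unequal weights lets a team emphasize, say, safety over spare lead time without changing the 0-100 scale or the L-M-H bands.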
Summary
Asset criticality is fundamental to asset management.
Organizations must define which of their assets are critical
and focus their maintenance reliability efforts on
those assets first. Criticality prioritizes which assets are
important to monitor, maintain and improve. Therefore,
performing a criticality analysis, identifying critical
assets and building a reliability, maintenance, or asset
management plan is a good strategy.
The ranking process requires the selection of team
members from cross-functional areas, such as produc-
tion/operations, engineering, maintenance, quality,
health, safety and environment, etc., to perform the anal-
ysis. The ranking process defines the relative importance
of asset failure consequences to the overall business. This
is accomplished by evaluating asset failure consequences
and the probability of failure against weighted criteria
within several business impact factors. Typically, the
business impact factors of mission/customer, safety,
quality, regulatory, throughput and cost impact are used
for an evaluation.
The next step is to establish appropriate criteria and
weighting factors for criticality. Knowledge of ISO14224
could be very helpful with this task. Then, apply the criteria
reliability strategy development (Rsd)
Introduction
Reliability strategy development (RSD) is based on
three main techniques:
• Reliability-centered maintenance (RCM);
• Preventive maintenance optimization (PMO);
• Failure mode and effects analysis (FMEA).
RCM
RCM is generally used to achieve improvements in all
aspects of asset management, such as the establishment
of a safe, minimum, or optimized level of maintenance,
changes in operating procedures and establishment of
an effective maintenance plan for the most critical systems.
Successful implementation of RCM promotes cost-
effectiveness, asset uptime and a better understanding of
the level of risk the organization is currently managing.
It has been demonstrated that the best benefit for
applying RCM is realized during the design and devel-
opment phases of the asset lifecycle by eliminating or
the 1970s, during which RCM has become a mature
process. However, industry has yet to fully embrace the
RCM methodology in spite of its proven track record.
PMO
A preventive/planned maintenance optimization pro-
cess focuses on evaluating each PM task and eliminating
unnecessary tasks or wasteful activities, thus improving
the plant’s overall performance. This allows refocusing
resource-constrained maintenance toward effective
failure prevention activities.
FMEA
Failure mode and effects analysis (FMEA), also some-
times called failure mode, effects and criticality analysis
(FMECA), is a step-by-step approach for identifying
all possible failures in design and operations (e.g., the
manufacturing process of a product or service).
Developed in the 1940s by the U.S. military, the
FMEA process was further developed and enhanced
determine potential ways it can fail and its potential
effects on required functions, and to identify appropriate
mitigation tasks for highest priority risks.
Hidden Failure – A failure mode that is not evident to
a person or operating crew under normal circumstances.
Operating Context – The environment in which an
asset is expected to be used.
Preventive Maintenance Optimization (PMO) – A
methodology focusing on improving maintenance
effectiveness and efficiency by reviewing an existing
maintenance program and, in most cases, adding main-
tenance tasks to account for failure modes not addressed
by the existing program.
Reliability-Centered Maintenance (RCM) – A system-
atic, disciplined process for establishing the appropriate
maintenance plan for an asset/system to minimize the
probability of failures. The process ensures safety, system
function and mission compliance.
Reliability-Centered Maintenance
Principles and Standards
There are four principles that define and characterize
RCM and set it apart from any other preventive main-
tenance planning process.
Principle 1: The primary objective of RCM is to preserve
system function.
Principle 2: Identify failure modes that can defeat the
functions.
Principle 3: Prioritize function needs (i.e., failure modes).
Principle 4: Select applicable and effective tasks.
RCM Standards
The SAE JA1011 standard describes the minimum cri-
teria to which a process must comply to be called RCM.
A highly simplified RCM decision framework is
shown in Figure 2 to the right.
[Figure 2: Simplified RCM decision framework (flowchart). Legible decision boxes: "Will the failure result in other economic loss (high cost damage to machines or systems)?"; "Is there an effective CM technology or approach?"; "Is there an effective Interval-Based task?"; "Candidate For Run-to-Fail?". Legible action boxes: "Develop & schedule CM task to monitor condition"; "Perform Condition-Based task"; "Develop & schedule Interval-Based task"; "Redesign system, accept the failure risk, or install redundancy".]
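The portion of the Figure 2 flowchart that survives here can be read as a short chain of yes/no questions. The sketch below is one possible reading, not the figure itself: the order of the branches is an assumption, any questions that precede the economic-loss question in the original figure are omitted, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureModeAnswers:
    """Yes/no answers to the legible Figure 2 questions (hypothetical names)."""
    causes_economic_loss: bool         # high cost damage to machines or systems?
    has_effective_cm: bool             # effective CM technology or approach?
    has_effective_interval_task: bool  # effective interval-based task?


def select_strategy(answers: FailureModeAnswers) -> str:
    """Walk the assumed branch order and return the resulting action."""
    if not answers.causes_economic_loss:
        return "candidate for run-to-fail"
    if answers.has_effective_cm:
        return "develop and schedule CM task; perform condition-based task"
    if answers.has_effective_interval_task:
        return "develop and schedule interval-based task"
    return "redesign system, accept the failure risk, or install redundancy"


# Example: costly failure, no workable condition monitoring, but a
# time-based task exists.
choice = select_strategy(FailureModeAnswers(True, False, True))
```

Condition monitoring is checked before interval-based tasks in this reading, so a time-based task is chosen only when no effective monitoring approach exists.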
[Figure 3: Failure patterns (Source: Reliabilityweb.com). Panels plot probability of failure against time for age-related and random failure patterns. Random failures account for 77-92% of total failures, and age-related failure characteristics account for the remaining 8-23%.]
infant mortality, after which their failure probability
increases gradually or remains constant, and a marked
wear out age is not common. In many cases, scheduled
overhaul increases the overall failure rate by intro-
ducing a high infant mortality rate into an otherwise
stable system.
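The three regimes described here (infant mortality, constant failure probability and wear-out) are often illustrated with the Weibull hazard function, h(t) = (β/η)(t/η)^(β-1). The sketch below uses that standard formula as an illustration; choosing Weibull as the model, and the shape and scale values shown, are assumptions rather than anything stated in this text.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate at age t for a Weibull-distributed life.

    beta is the shape parameter and eta the scale (characteristic life),
    both in the same time units as t.
    """
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


ETA = 1000.0  # hypothetical characteristic life, in hours

# beta < 1: hazard falls with age (infant mortality)
early = [weibull_hazard(t, 0.5, ETA) for t in (100.0, 500.0)]
# beta = 1: hazard is constant (random failures)
flat = [weibull_hazard(t, 1.0, ETA) for t in (100.0, 500.0)]
# beta > 1: hazard rises with age (wear-out)
late = [weibull_hazard(t, 3.0, ETA) for t in (100.0, 500.0)]
```

Under this model, an overhaul that resets an item with a falling or flat hazard back to age zero cannot lower its failure rate, which is consistent with the caution about scheduled overhauls above.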
VALUE:
When all parties involved in plant success (includes risks
to avoid) agree on asset function, they share an under-
standing of what is important and why it adds value.
RISK OF NOT APPLYING THIS STEP:
A lack of understanding or agreement regarding asset
functions causes a lack of clarity regarding the right
thing to do. This leads to:
• Differing priorities;
• Inability to measure performance;
• Excess costs (i.e., not enough of the right thing or too
much of the wrong thing).
2. In what ways can the asset fail to fulfill its functions
(i.e., functional failures)?
PURPOSE:
This question focuses decisions on relevant functional
problems and the degree to which these problems can
manifest themselves a little or a lot.
VALUE:
Provides a logical connection between equipment failure
and the consequence of that failure to the component,
the system and the plant.
PURPOSE:
This question identifies how component failure impacts
other components, systems, the plant, surroundings, or
the ability to detect failures.
VALUE:
Detailed knowledge about adverse impacts, if any,
improves the quality of decisions made to manage them.
RISK OF NOT APPLYING THIS STEP:
Not understanding the effects of failure guarantees that
the consequences of failure are also unknown.
5. In what ways does each failure matter (i.e., failure
consequences)?
PURPOSE:
This question identifies how important the failure is to
control, prevent, or mitigate in terms of safety, opera-
tions, the environment and economics.
VALUE:
With infinite resources, you would address every poten-
tial problem equally. This question helps you identify
where you must actively manage failure and the extent
to which you must do so over other priorities.
VALUE:
Helps an organization eliminate risk, rather than
live with it. Documentation from all seven questions
will ensure the risk is given the appropriate level of
consideration.
RISK OF NOT APPLYING THIS STEP:
The failure and its consequences are not under the con-
trol of the organization.
failure modes, or automating the process using software
to reduce the time taken to complete the analysis. In
addition, software programs are available to help reduce
the time to perform analyses.
It is important for users of these tools and tech-
niques to understand the limitations imposed by these
shortcuts. This enables users to apply RSD with confi-
dence by knowing the right tool is selected at the right
time and driven by the criticality of the equipment/
systems.
important concern is cost-effectiveness, which takes
into consideration the priority or mission critical-
ity and then matches a level of cost appropriate to
that priority. The flexibility of the RSD approach to
maintenance ensures the proper type of maintenance
is performed when it is needed. Maintenance that is
not cost effective is identified and not performed.
Benefits of PMO
If one were to conduct a survey among maintenance pro-
fessionals to ascertain how their PMs came about or the
basis of their program, the responses would probably fail
to provide definitive and meaningful information. Most
existing PM programs cannot be traced to their origins.
For those that can, most are unlikely to make sense.
The following reasons are usually the ones given for
a PM program:
• Experience-based;
• Failure prevention;
• Brute force;
• Regulations.
Figure 4: Evaluation of failures (Source: Nexus Global)
Reliability-Centered Maintenance
PM Optimization
PM optimization is a best practice that is achieved by:
• Removing or enhancing all maintenance tasks that are
vague, don’t add any value, or are not cost-effective;
assets;
• Assigning tasks appropriately between maintenance
and operations;
• Making PMO a living program, updating as needed.
• Minimization of late changes and associated costs;
• Improved asset (i.e., product), process reliability and
quality;
• Reduction of lifecycle costs;
• Catalyst for teamwork among design, operations and
maintenance.
Summary
Reliability-centered maintenance (RCM) is a process
to ensure assets continue to do what their users require
in their present operating context. The RCM process is
defined by the technical standard SAE JA1011, which sets
the minimum criteria that any process should meet before
it can be called RCM.
RCM is generally used to achieve improvements in
asset/plant operations, such as the establishment of safe
minimum levels of maintenance, including changes to
operating procedures. Successful implementation of
RCM leads to increased cost-effectiveness, asset uptime
RCM must be considered throughout the lifecycle
of an asset if it is to achieve maximum effectiveness.
According to many studies, about 80 percent or more
of an asset’s lifecycle cost is fixed during the planning,
design and build phases. The subsequent phases set the
remaining 20 percent or so of the lifecycle cost. Thus,
the decision to institute RCM for an asset, including
condition monitoring, will have a major impact on the
lifecycle cost of the asset. This decision is best made
during the planning and design phase.
FMEA helps designers and engineers improve the
reliability of assets and systems to produce quality prod-
ucts. Although the purpose, terminology and other
details can vary according to the FMEA type, the basic
methodology is similar for all types.
PMO can address most existing PM programs that
cannot be traced to their origins. For those that can,
most are unlikely to make sense. The following reasons
are usually the ones given for a PM program:
• OEM recommendations;
• Experience-based;
• Failure prevention;
• Brute force;
• Regulations.
References
Society of Automotive Engineers. SAE JA1011, Evaluation
Criteria for Reliability-Centered Maintenance (RCM)
Processes, 1998.
http://standards.sae.org/ja1011_200908/
Society of Automotive Engineers. SAE JA1012, A Guide to
the Reliability-Centered Maintenance (RCM) Standard, 2002.
http://standards.sae.org/ja1012_200201/
Smith, Anthony M. and Hinchcliffe, Glenn R. RCM –
Gateway to World Class Maintenance. Waltham: Elsevier, 2004.
New York: Industrial Press, 2009/2012.
Nowlan, Stanley F. and Heap, Howard F. Reliability-
Centered Maintenance. U.S. Department of Defense:
Report Number AD-A066579 (pdf), 1978.
NASA. Reliability-Centered Maintenance Guide for Facilities
and Collateral Equipment (pdf). NASA: February 2000.
Paske, Sam. Developer of The 7 questions of RCM, 2013
RCM Project Managers' Guide, www.reliabilityweb.com
reliability engineering (Re)
Introduction
Reliability engineering (RE) is a field that deals with the
study, evaluation and lifecycle management of reliability
for an asset or product. Reliability engineering is consid-
ered a sub-discipline of systems engineering.
Reliability engineering plays a significant role in
cost-effective operations and maintenance of an asset,
machine, or system by ensuring it consistently performs
its intended or required function or mission on demand
and without degradation or failure.
Many times, the terms reliability, availability and
maintainability (RAM) or reliability, availability, main-
tainability and safety or sustainability (RAMS) are used
in reliability engineering analysis.
Operating Context – The environment in which an
asset is expected to be used.
Reliability – The probability that an asset, item, or
system will perform its required functions satisfactorily
under specific conditions within a certain time period.
Reliability Centered Maintenance (RCM) – A system-
atic, disciplined process for establishing the appropriate
maintenance plan (requirements) for an asset/system to
minimize the probability of failures. The process ensures
safety, system function and mission compliance under
present operating context.
Run to Failure (RTF) – A maintenance strategy or
policy for assets where the cost and impact of failure is
Role of Reliability
The primary role of a reliability professional/engineer
(RP/E) is to identify and manage the reliability risks
of an asset that could adversely affect plant or business
operations. This broad primary role can be divided into
three key areas:
RCM-based preventive maintenance tasks and effec-
tive utilization of predictive and other non-destructive
testing methodologies to identify and isolate inherent
reliability problems.
• Providing input to a risk management plan that will
anticipate reliability-related and non-reliability-related
risks that could adversely impact plant operations.
• Providing support in finding engineering solutions to
repetitive failures and all other problems that adversely
affect plant operations, such as capacity, quality, cost,
or regulatory compliance issues, by applying data
analysis techniques that can include statistical process
control; reliability modeling and prediction; fault tree
analysis; Weibull analysis; Six Sigma methodology;
and root cause failure analysis.
Measuring Reliability
Reliability, maintainability and availability are three key
terms in reliability engineering. Although we say asset
reliability improvement, many times what we really mean
is availability. Availability (A) is a function or product of
reliability and maintainability of the asset. It is measured
by the degree to which an item or asset is in an operable
and committed state at the start of the mission when the
mission is called at an unspecified (random) time.
In simple terms, the availability may be stated as the
probability that an asset will be in operating condition
when needed. Mathematically, the availability is defined:
\[
\text{Availability } (A) = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}
\]
MTBF = 2000 hours ÷ 10 failures = 200 hours per failure
or
MTBF = 12 months ÷ 10 failures = 1.2 months per failure
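The MTBF arithmetic above, together with the MTTR and availability definitions given earlier, can be sketched as follows. The 2000 hours and 10 failures come from the example; the 50 hours of total repair time is a hypothetical figure added only so that availability can be computed.

```python
def mtbf(total_operating_time: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return total_operating_time / failures


def mttr(total_repair_time: float, failures: int) -> float:
    """Mean time to repair: total repair time divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTTR")
    return total_repair_time / failures


def availability(mtbf_value: float, mttr_value: float) -> float:
    """Availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_value / (mtbf_value + mttr_value)


mtbf_hours = mtbf(2000, 10)                 # 200.0 hours per failure
mttr_hours = mttr(50, 10)                   # hypothetical: 5.0 hours per repair
a = availability(mtbf_hours, mttr_hours)    # 200 / 205, about 0.976
```

Because MTTR sits in the denominator, shortening repairs raises availability even when the failure count is unchanged.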
Software Reliability
Software reliability is a special aspect of reliability engi-
neering. Asset/system reliability, by definition, includes
all parts of the system, including hardware, software,
supporting infrastructure (including critical external
interfaces), operators and procedures. Traditionally, reli-
ability engineering focuses on critical hardware parts of
the system. Since the widespread use of digital integrated
circuit technology, software has become an increasingly
critical part of nearly all present day assets/systems.
As with hardware, software reliability depends on
good requirements, design and implementation. Soft-
ware reliability engineering relies heavily on a disciplined
software engineering process to anticipate and design
Benefits of Reliability
Asset reliability is an important attribute for several rea-
sons, including:
• Improves Customer Satisfaction. Reliable assets will
perform to meet customer needs on time, every time.
• Increases Repeat Business. Customer satisfaction
will bring repeat business and have a positive impact
on future business.
• Enhances Reputation. The more reliable plant assets
are, the more likely the organization will have a favor-
able reputation.
• Reduces Operations and Maintenance Costs. Poor
asset performance costs more to operate and maintain.
• Improves Competitive Advantage. With greater
emphasis on a plant reliability improvement program,
companies gain an advantage over their competition.
Summary
Reliability engineering is a relatively new discipline. Its
growth and importance have been the result of several
factors, including the increased complexity and sophis-
tication of assets/systems, regulatory and community
requirements to meet reliability, maintainability, safety
and sustainability performance specifications, and an
organization’s profit concerns resulting from the high
cost of failures and their repairs.
Reliability, along with availability, maintainability,
safety and sustainability, is not only an important part
of the engineering design process, but also a necessary
function of asset lifecycle management. Reliability engineering
provides support in reducing the total cost of
asset ownership by providing cost benefit analysis, oper-
ational capabilities loss-risks studies/analysis, repair and
facility resourcing optimization, replacement decisions,
spare parts and inventory optimization, establishment of
an optimum maintenance or PM program, etc.
References
Gulati, Ramesh. Maintenance and Reliability Best Practices.
New York: Industrial Press, 2009/2012.
Ebeling, Charles E. An Introduction to Reliability and
Maintainability Engineering. Long Grove: Waveland Press,
2005.
Ray, Donald. What’s the Role of the Reliability Engineer?
Reliable Plant: http://www.reliableplant.com/Read/23083/role-reliability-engineer-operations.
Smith, Anthony M. and Hinchcliffe, Glenn R. RCM –
Gateway to World Class Maintenance. Waltham: Elsevier,
2004.
Reliability Engineer and Maintenance Engineer
Job Descriptions. www.reliabilityweb.com/articles/re-vs-me
root cause analysis (Rca)
Introduction
Root cause analysis (RCA) is a method of problem
solving that tries to identify the root causes of faults or
problems that cause failure events.
RCA can help transform a reactive culture into a
forward-looking culture that solves problems before
they occur or escalate. More importantly, it reduces the
frequency of problems occurring over time within the
environment. Having unreliable asset performance can
be a threat in many cultures and environments. Old
measures that pit production against maintenance may
have to be removed from the system. Empowering defect
elimination and cross-training teams may be required to
overcome the resistance from cultures.
Root cause analysis, or root cause failure analysis
(RCFA) as it is sometimes called, is a step-by-step
methodology that leads to the discovery of the prime
cause (or the root cause) of the failure. If the root cause
of a failure is not addressed in a timely fashion, the fail-
ure will repeat itself, usually causing unnecessary loss
of production and increasing the cost of maintenance.
RCA is a structured way to arrive at the root cause,
thus facilitating elimination of the cause and not just
the symptoms associated with it.
purpose of performing a RCA is to analyze problems or
events to identify what happened, how it happened and
why it happened so actions for preventing reoccurrence
can be developed.
To be effective, RCA must be performed systemati-
cally, usually as part of an investigation, with conclusions
and root causes that are identified and backed up by
documented evidence. Usually, a team effort is required.
There may be more than one root cause for an event or
problem; the difficult part is demonstrating persistence
and sustaining the effort required to determine them.
The purpose of identifying all solutions to a problem is
1. Safety-based RCA is performed to find causes of
accidents related to occupational safety, health and
environment.
2. Product or production-based RCA is performed to
identify causes of poor quality, production and other
problems in manufacturing related to the product.
3. Process-based RCA is performed to identify causes
of problems related to processes, including business
systems.
4. Asset failure-based RCA is performed for failure
analysis of assets or systems in engineering and the
maintenance area.
analyzing, documenting, communicating and solving a
problem to show how individual cause and effect rela-
tionships are interconnected.
• Cause and Effects Analysis – Also called Ishikawa or
fishbone diagram, it identifies many possible causes for
an effect or problem and then sorts ideas into useful
categories to help in developing appropriate corrective
actions. The design of the diagram looks like the skeleton
of a fish, hence the designation “fishbone” diagram.
• Change Analysis – Looks systematically for possible
risk impacts and appropriate risk management strategies
in situations where change is occurring. This includes
Benefits
RCA solves problems at their root, rather than just fixing
the obvious. It is often equated to a Kaizen improvement
process, and rightly so, as it often digs into possible orga-
nizational change, rather than localized optimizations.
The benefits of RCA include uncovering relationships
between causes and symptoms of problems, working to
solve issues at the root itself and providing tangible evi-
dence of cause and effect and solutions. RCA can:
• Identify true root causes.
• Eliminate repeated failures.
• Identify major, long-term opportunities for
improvement.
• Reduce costs and increase revenue.
• Enable organizations to expand findings to multiple
sites.
Summary
When do most organizations conduct a RCA? Typically
when someone is injured, when there is catastrophic
damage, when there has been an “incident,” or when
there has been an environmental release, violation, etc.
For most of these high-visibility occurrences, a federal
or state regulatory agency requires us to perform an
analysis. Therefore, we conduct RCAs only in an effort
to comply with regulatory requirements. We don’t need
to perform RCA for compliance only; its real
• Various systems-based processes, including change
management and risk management.
Organizations must continually improve processes,
reduce costs and cut waste to remain competitive.
Improving any process, or resolving any failure or
problem (including potential failures), requires
analyzing it with tools and techniques for developing
and implementing corrective actions. A variety of
methods, techniques and tools are available, ranging
from a simple checklist to sophisticated modeling
software. They can be used effectively to lead us to
appropriate corrective actions.
Applying continuous improvement tools can optimize
References
Gulati, Ramesh. Maintenance and Reliability Best Practices.
New York: Industrial Press, 2012.
Latino, Robert J.; Latino, Kenneth C.; Latino, Mark A. Root
Cause Analysis: Improving Performance for Bottom-Line Results.
Boca Raton: CRC Press, 2002.
Tague, Nancy R. The Quality Toolbox. Milwaukee: ASQ
Quality Press, 2005.
Andersen, Bjorn and Fagerhaug, Tom. Root Cause Analysis.
Milwaukee: ASQ Quality Press, 2006.
Cause Mapping, www.thinkreliability.com
Cp
capital project management
Introduction
Capital project management (CP) is the management of
all capital asset purchases, from the investment
requirements definition to commissioning. Capital project
management focuses on managing the capital expenditure
for an asset from the time business’ needs determine
the design of the asset to the approximate capital
expenditure required. CP also determines the scope of the
project (required capacity, size of asset, financial
justification, etc.), the supplier evaluation and selection,
and the execution of the project, which is typically the
[Figure: Uptime® Elements chart and Asset Lifecycle (Business Needs Analysis, Design, Create/Acquire). Reprinted with permission from NetexpressUSA Inc. d/b/a Reliabilityweb.com. Copyright © 2016. All rights reserved.]
Key Terms and Definitions
Acceptance Criteria – Requirements a project or system
must meet before a customer can accept delivery.
system will perform its required functions satisfactorily
under specific conditions within a certain time period.
System Design – The translation of customer require-
ments into a comprehensive, detailed, functional
performance or design specification that is then used to
construct a specific asset.
Systems Engineering – A discipline applying technical
and administrative direction and surveillance to identify
ever the reason, the strategic plan must be linked to
investment planning for the return on investment for
the new assets to be properly managed. Capital project
management is a business requirement.
Once the strategic plan identifies the need for addi-
tional assets, a study should be done as part of the
investment planning process to examine the utilization
of existing assets. It is quite common in many companies
the asset is in its operational and maintenance phases
of the lifecycle. Historically, the majority of companies
overlook this fact and fail to achieve the profitability
required by the projections in the strategic plan.
Additional considerations at this lifecycle phase
would be operability and maintainability. The design
engineer must solicit input from the operations person-
nel as to how the new equipment should operate. Will
Commissioning New Assets
In this phase of the asset’s lifecycle, the asset, whether
it is built, purchased, or retrofitted, is installed in the
plant or built. This is the construction or installation
phase of the project. There is some divergence based
on the philosophical leaning of the engineers, but the
project phase involves the installation of the equipment.
a. Data,
b. Resources,
c. Quality.
it is necessary to understand that all documentation,
from the financial justification to purchasing the asset
and right through to the commissioning phase, must be
collected and collated. All this data must be capable of
being referenced once the asset begins performing to be
certain design capacities are achieved, thus ensuring the
asset achieves its return on investment projected by the
strategic plan. Most of this data should be collected and
stored in the organization’s CMMS or EAM system.
Summary
Capital project management is extremely important
to a company being able to achieve design return on
investment. Most companies will never achieve true
value realization from their assets. ISO 55000 defines an
asset as something that delivers value. The value the asset
realizes is in the operational and maintenance phase of
its life. If capital project management is not properly
utilized, the asset realizes a reduced value through its
lifecycle. Capital project management can be a com-
petitive weapon for companies that properly utilize it.
Rcd
reliability centered design
Introduction
Many industry experts report that the majority of failures
(i.e., defects) during an asset’s operational phase are the
result of poor or inadequate design. Many times, design
omissions are caused by insufficient funds or budget
constraints imposed due to a lack of understanding of
the consequences on the lifecycle costs of the asset. The
capital project manager’s and designer’s performance is
judged on how they met budget and schedule targets, not
on long-term asset performance, including lifecycle costs.
A well designed, built and installed asset should have
fewer failures and a much lower total cost of ownership
during the entire life of the asset.
Leading and highly reliable organizations integrate
reliability-centered design (RCD) principles into all
aspects of their capital projects process, including asset
concept, design/development, build and the install phase.
Summary
Things, products, or assets fail in service. Everyone has
witnessed various product failures in their daily
life. To be reliable, assets must be robust and adequately
designed to avoid failure modes, even in the presence
of a broad range of conditions, including harsh envi-
ronments, changing operational demands and internal
deterioration due to wear and fatigue.
Designers and engineers should use a combination of
practices and tools to eliminate or minimize failures to
enhance design, which will result in reduced TCO. Some
examples of these practices and tools are:
• Voice of customers: stakeholders, specifically operators,
maintainers, etc., to understand the requirements
and issues;
• DFMEA/FMEA types of tools to identify failure
modes and mitigate their consequences;
• Design based on RAMS2 principles:
• Use of reliable components based on reliability
analysis, etc.;
References
Gulati, Ramesh. 10 Rights of Asset Management.
Reliabilityweb.com Solutions 2.0 Virtual Conference,
Session 10. www.reliabilityweb.com/videos/article/
solutions-2.0-virtual-conference-session-10
Gulati, Ramesh. Maintenance & Reliability Best Practices,
Second Edition. South Norwalk: Industrial Press, 2012.
Acknowledgment
The Uptime® Elements™ were originally created by Terrence
O’Hanlon, CEO and Publisher of Uptime® magazine and
Reliabilityweb.com®, in consultation and close cooperation
with Reliabilityweb.com co-founder Kelly Rigg O’Hanlon.
Early versions were reviewed by Erin Corin O’Hanlon and
Ian Jaymes O’Hanlon. The initial idea was inspired during a
parent-teacher meeting with science teacher Mark Summit
at Canterbury School in Fort Myers, Florida.
Development of this concept could not have happened
without the mentoring by true masters in the reliability
and asset management communities, including Terry
Wireman; Paul Barringer; Dr. Robert Abernathy; Jack Nicholas
Jr.; Anthony “Mac” Smith; Ron Moore; Bob DiStefano;
Steve Turner; Joel Levitt; Ramesh Gulati; Winston Ledet;
June Ledet; Michelle Ledet Henley; Heinz Bloch; Christer
Idhammar; Ralph Buscarello; Edmea Adell; Celso De
Azevedo; John Woodhouse; the entire AEDC/Jacobs/ATA team
led by Bart Jones; and many more people who have been kind
and generous in sharing their expertise.
Early stage evolution definition and development by
Steve Thomas, Ramesh Gulati, Jeff Smith, Grahame Fogel,
John Schultz and the Allied Reliability Group team, and PJ
Vlok proved invaluable to its current state. Early presentation
of these elements resulted in valuable feedback from mem-
CRL Body of Knowledge
The Association of Asset Management Professionals (AMP)
has developed an exam and certification based on the
Uptime Elements and its Reliability Leadership system. It
is designed to create leaders who focus on delivering value to
the triple bottom line of:
• Economic prosperity,
• Environmental sustainability,
• Social responsibility.
The body of knowledge that creates the foundation for the
exam and certification includes:
1. The Uptime® Elements™ Passport series
2. The Journey by Stephen Thomas
3. Don’t Just Fix it, Improve It! by Winston P. Ledet,
Winston J. Ledet and Sherri M. Abshire
4. Uptime® Elements™ Dictionary for the Reliability Leader
and Asset Manager by Ramesh Gulati