RCM Fundamentals - Meridium


Reliability-Centered Maintenance

© Copyright Meridium, Inc. 2008. All rights reserved.


Document: RCM Fundamentals Training.doc
Version: RCM Fundamentals Training.doc
Copyright © Meridium, Inc. 2008
All Rights Reserved
This training material is provided under a license
agreement containing restrictions on use and disclosure.
All rights, including reproduction by photographic or
electronic process and translation into other languages,
are reserved by Meridium.
Meridium is a registered trademark of Meridium, Inc.


Table of Contents

Foreword
Reliability-centered Maintenance
RCM-DO-01 Fundamentals of Managing Maintenance
The Expectations of Maintenance
Understanding Failure
The Objective of Maintenance
What is RCM?
The RCM Structure
Functions
The FMEA
Consequences
Failure Management Strategies
Default Actions
RCM-DO-02 Preparing for Analysis
RCM-DO-03 Functions and Functional Failures
Operating Context
Writing Functions
Performance Standards
Exercises
Secondary Functions
RCM-DO-03b Air Conditioner
Functional Failures
Failed States
Exercise
Exercises
RCM-DO-04 Failure Modes and Effects
Reasonably Likely
Causality
Writing a Failure Mode
Types of Failures
The Problem with Data
Effects
RCM-DO-05 Consequences and Effectiveness
Hidden or Evident?
Safety
Environmental
Operational
Repair Only
RCM-HO-05a Assigning Consequences
Applicable and Effective
Tolerable levels of Risk
Hidden Failures
The Famous Pump Example
Exercise 1
Exercise 2
Exercise 3
Case Study - BP refinery Incident
Managing Safety and Environmental Consequences
Economic Consequences
RCM-DO-06 Applicability and Task Selection
Types of Maintenance
Preventive Maintenance (PM’s)
Predictive Maintenance
Detective Maintenance
Exercise 1 – Task Categories
Exercise 2 – Which type of maintenance?
The Basis of Task Preference
RCM-DO-06c Uses of MTBF
What MTBF can tell us?
At what level can we apply MTBF?
How can MTBF add value to Reliability Initiatives?
Summary
RCM-DO-06d Advanced Detective Maintenance Techniques
Exercise 1 – Steam Turbine
Exercise 2 – Steel Plant
Common Cause Failure Modes
Exercise 4 - Hoist
Options for redesign
Multiple Redundant Devices
Exercise 5 – Pumps and PSV’s
Managing Risk in Hidden Failures
Voting Systems
Economic Consequences
Exercise 6 – Economic Hidden Failures
RCM-DO-07 The Value of RCM
The Cashable Results of RCM
The Non-cashable Results of RCM
The Principal Barrier to Value Realization
The Role of the RCM Facilitator/Analyst


Foreword
The Reliability-Centered Maintenance (RCM) approach was first documented in
the detailed book on the subject by F. Stanley Nowlan, Director, Maintenance
Analysis, and Howard F. Heap, Manager, Maintenance Program Planning, both
of United Airlines.¹ The book was sponsored by the Office of the Assistant
Secretary of Defense (Manpower, Reserve Affairs and Logistics) and was
published in 1978. From that book:
For years maintenance was a craft learned through experience and rarely examined
analytically. As new performance requirements led to increasingly complex equipment, however,
maintenance costs grew accordingly. By the late 1950's the volume of these costs in the airline
industry had reached a level that warranted a new look at the entire concept of preventive
maintenance. By that time studies of actual operating data had also begun to contradict certain
basic assumptions of traditional maintenance practice.
One of the underlying assumptions of maintenance theory has always been that there is a
fundamental cause-and-effect relationship between scheduled maintenance and operating
reliability. This assumption was based on the intuitive belief that because mechanical parts wear
out, the reliability of any equipment is directly related to operating age. It therefore followed that
the more frequently equipment was overhauled, the better protected it was against the likelihood of
failure. The only problem was in determining what age limit was necessary to assure reliable
operation.
In the case of aircraft it was also commonly assumed that all reliability problems were
directly related to operating safety. Over the years, however, it was found that many types of
failures could not be prevented no matter how intensive the maintenance activities. Moreover, in a
field subject to rapidly expanding technology it was becoming increasingly difficult to eliminate
uncertainty. Equipment designers were able to cope with this problem, not by preventing failures,
but by preventing such failures from affecting safety. In most aircraft essential functions are
protected by redundancy features which ensure that, in the event of a failure, the necessary
function will still be available from some other source. Although fail-safe and "failure-tolerant"
design practices have not entirely eliminated the relationship between safety and reliability, they
have dissociated the two issues sufficiently that their implications for maintenance have become
quite different.
A major question still remained, however, concerning the relationship between scheduled
maintenance and reliability. Despite the time-honored belief that reliability was directly related to
the intervals between scheduled overhauls, searching studies based on actuarial analysis of failure
data suggested that the traditional hard-time policies were, apart from their expense, ineffective in
controlling failure rates. This was not because the intervals were not short enough, and surely not
because the teardown inspections were not sufficiently thorough. Rather, it was because, contrary
to expectations, for many items the likelihood of failure did not in fact increase with increasing
operating age. Consequently a maintenance policy based exclusively on some maximum operating
age would, no matter what the age limit, have little or no effect on the failure rate.

¹ F. Stanley Nowlan and Howard F. Heap, Reliability Centered Maintenance, United Airlines and Dolby Press,
sponsored and published by the Office of Assistant Secretary of Defense, 1978

In 1960 a task force of FAA and airline personnel was formed to investigate
scheduled maintenance; its work resulted in an FAA/Industry Reliability Program in
1961. Building upon this work, in 1965 United Airlines developed a
rudimentary decision-diagram technique. This technique was refined and
embodied in the 747 Maintenance Steering Group (MSG) Handbook:
Maintenance Evaluation and Program Development (MSG-1) from the Air
Transport Association in 1968. MSG-1 was used to develop the maintenance
program for the Boeing 747, the first maintenance program to apply RCM
concepts. Subsequent improvements led to MSG-2, which was used to develop
the maintenance programs for the Lockheed L-1011 and the Douglas DC-10. A
similar document, European Maintenance System Guide, served as the basis for
development of the initial programs for the Concorde and the Airbus A300.
The objective of the approach outlined in MSG-1 and MSG-2 was to develop a
scheduled maintenance program that assured the maximum safety and
reliability of equipment at the lowest cost. An example of the success of this
approach can be seen by comparing the Douglas DC-8, which had a scheduled
overhaul of 339 items in a traditional maintenance program, with the DC-10, based
upon MSG-2, which had only seven items to be overhauled. The latest
commercial aircraft maintenance guidance is based upon MSG-3 (Rev 2) for the
Boeing 757 and 767 aircraft.
In the early 1970's this work attracted the attention of the Office of the Secretary
of Defense. The Navy was the first military organization to apply RCM to both
new design and in-service aircraft. Also in the early 1970's, the Navy embarked
on a major program to change the way nuclear submarines were maintained.
Over the next 20 years the Navy would virtually eliminate scheduled overhaul on
the nuclear submarine based upon an aggressive Condition Monitoring Program
and other technical advances to the ship systems. RCM is currently being used
on all new ship designs.
The RCM methodology has subsequently been applied in a wide variety of
commercial and military applications. The Electric Power Research Institute
(EPRI) has tested the methodology at several nuclear power utility sites of Florida
Power & Light, Duke Power, and Southern California Edison. Puget Sound
Power and Light Co. has been using RCM since 1991 in both substations and
line maintenance. NASA has long used RCM in analyzing Space Shuttle and
Shuttle Support Systems. In the early 1990's, NASA embarked on a process of
basing the approach to facilities maintenance on RCM. And in 1995, Boeing
Commercial Airplane Group embraced RCM as one of the tools in implementing
a more robust and standardized facilities maintenance program.² This was
significant in that one of the key groups in promoting RCM in complex systems
(Boeing Aircraft) was now applying the approach to common industrial facilities
equipment.
More recently, issues surrounding RCM seem more focused on applying the technique and less on
proving its value. Must a group perform a classical/rigorous analysis, or is a more streamlined
approach acceptable? An excellent article regarding the variations in the methodology was presented
at the 2003 International Maintenance Conference.³ Regardless of the approach selected, the outcome
of RCM analysis is focused on selecting the most effective maintenance strategy and, when
maintenance cannot deliver the needed reliability, identifying redesign requirements.

² Westbrook, Dennis, Boeing Commercial Airplane Group, and William H. Closser, C&A Consulting, “Transition of
an Organization to a Reliability Based Culture”, Proceedings of 14th Annual International Maintenance Conference,
August 3-7, 1997, Atlanta, GA
³ Nicholas, Jack R. “The Controversy about Reliability Centered Maintenance Methodology, Its Variants and
Derivatives”, Proceedings of the 18th International Maintenance Conference, Dec. 7-10, 2003, Clearwater, FL.


Reliability-centered Maintenance


RCM-DO-01 Fundamentals of Managing Maintenance


The Expectations of Maintenance

• Productivity
– How much are we producing?
• Cost-Effectiveness
– What is it costing us to do so?
• Safety & Environment
– Are we hurting anybody or damaging the environment in
the process?
• Quality
– Are we producing at a consistent high level of quality?
• Corporate Learning
– How can I make sure that I will be able to sustain/improve
this into the future?


Understanding Failure
The “Wear-out” Curve
The belief that all assets have a “life”. That is, a period of few random failures followed by a
wear-out zone.

The “Bathtub” Curve
Eventually people started to believe that many assets actually suffered early life failures.
The “bath-tub” curve makes up the basis of many engineers' beliefs in asset performance.


The 6 Failure Patterns

A. 4%
B. 2%
C. 5%
D. 7%
E. 14%
F. 68%

• Only 11% of failures were related to age… 89% had no direct correlation with the age of the
assets at all!
• And only 6% had a wear out curve
• So what?
– If our maintenance schedule has been developed based on principles of “life” then we are
achieving…?
• Or worse…
– 68% of failures were infant mortality failures.
• 14% of all failures were seen as random, therefore we are often doing absolutely nothing to
manage these!
• Do different assets fail differently?
– Complex assets…
– Simple assets have dominant failure modes (wear, erosion, corrosion, evaporation etc.)
• Regardless of the status in your industry, it will increase, as automation, mechanization and
asset complexity increases.
# Reliability-Centered Maintenance, (Nowlan and Heap) Exhibit 2:13 Age related Patterns


The 6 Failure Patterns


Pattern   UAL (1968)   Broberg (1973)   MSP (1982)
A         4%           3%               3%
B         2%           1%               17%
C         5%           4%               3%
D         7%           11%              6%
E         14%          15%              42%
F         68%          66%              29%

# U.S. Navy Analysis of Submarine Maintenance Data and the Development
of Age and Reliability Profiles - Timothy M. Allen, Department of the Navy
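
The arithmetic behind the "11% age related, 89% not" reading of the UAL figures can be repeated for the other two studies. The sketch below is illustrative only: the percentages are transcribed from the table, and the grouping of patterns A, B and C as "age related" is an assumption inferred from the 11% figure (4 + 2 + 5) rather than something the table states.

```python
# Percentages transcribed from the table of the six failure patterns.
STUDIES = {
    "UAL 1968": {"A": 4, "B": 2, "C": 5, "D": 7, "E": 14, "F": 68},
    "Broberg 1973": {"A": 3, "B": 1, "C": 4, "D": 11, "E": 15, "F": 66},
    "MSP 1982": {"A": 3, "B": 17, "C": 3, "D": 6, "E": 42, "F": 29},
}

# Assumption: patterns A, B and C are the ones counted as age related,
# because 4 + 2 + 5 reproduces the 11% quoted for the UAL data.
AGE_RELATED = ("A", "B", "C")


def age_related_split(shares):
    """Return (age related %, not age related %) for one study."""
    related = sum(shares[pattern] for pattern in AGE_RELATED)
    return related, sum(shares.values()) - related


def summarize(studies):
    """Map each study name to its (age related, not age related) split."""
    return {name: age_related_split(shares) for name, shares in studies.items()}


if __name__ == "__main__":
    for name, (related, unrelated) in summarize(STUDIES).items():
        print(f"{name}: {related}% age related, {unrelated}% not age related")
```

Under that grouping, each of the three studies shows that most failures did not follow an age-related pattern, although the split differs noticeably between them.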


The Objective of Maintenance


[Figure: performance scale showing Initial Capability (what it can do), Desired Performance
(what its users want it to do), and the Margin for Deterioration between them]

• So, if the objective of maintenance is to keep the asset running between what it “can” do
and what the users “want” it to do, then we must:
– First, define what the users want the asset to do in its present operating context
– Second, determine if the asset is able to meet these requirements
– Third, determine the maintenance interventions required
# SAE JA1012 Figure 2
# SAE JA1012 Section 6.2 Performance Standards
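
The relationship between initial capability, desired performance and the margin for deterioration can be written down as a small check. This is a minimal sketch, not part of the course material: the class name, the method names and the pump figures are hypothetical, and it assumes a performance measure where a higher value is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceEnvelope:
    """Hypothetical helper for one performance measure (higher is better)."""

    initial_capability: float   # what the asset can do
    desired_performance: float  # what its users want it to do

    def margin_for_deterioration(self) -> float:
        # Room the asset has to deteriorate before users notice a failure.
        return self.initial_capability - self.desired_performance

    def is_maintainable(self) -> bool:
        # Second step: the asset must be able to meet the requirement at all.
        # If users want more than the asset can do, maintenance alone cannot
        # deliver it.
        return self.desired_performance <= self.initial_capability

    def has_failed(self, current_performance: float) -> bool:
        # The asset is in a failed state once it drops below what users want.
        return current_performance < self.desired_performance


if __name__ == "__main__":
    # Illustrative numbers only: a pump able to deliver 1000 litres per
    # minute where the users need at least 800.
    pump = PerformanceEnvelope(initial_capability=1000.0, desired_performance=800.0)
    print(pump.margin_for_deterioration())  # margin available
    print(pump.is_maintainable())
    print(pump.has_failed(750.0))
```

When `is_maintainable()` is false, no maintenance intervention can close the gap, which is the situation where redesign has to be considered.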


What is RCM?

RCM is a process to ensure that assets continue to meet their user requirements in their
present operating context.
~ John Moubray

RCM applies to any equipment where there is a need to realise maximum operating reliability
at the lowest cost
~ Stan Nowlan and Howard Heap


The RCM Structure


1. What are the functions and associated desired standards of performance of the asset in its
present operating context? (Functions)

2. In what ways can it fail to fulfil its functions? (Functional Failures)

3. What causes each functional failure? (Failure Modes)

4. What happens when each failure occurs? (Failure Effects)

5. In what way does each failure matter? (Failure Consequences)

6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)

7. What should be done if a suitable proactive task cannot be found? (Default Actions)
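
One way to picture the seven questions is as the fields of a single analysis record, one record per failure mode. The sketch below is an illustration under stated assumptions: the class `RcmRecord`, its field names and the completeness rule are hypothetical and are not taken from SAE JA1011 or from any RCM software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RcmRecord:
    """Hypothetical record holding the answers to the seven questions."""

    function: str                # question 1
    functional_failure: str      # question 2
    failure_mode: str            # question 3
    failure_effect: str          # question 4
    failure_consequence: str     # question 5
    proactive_task: Optional[str] = None   # question 6
    task_interval: Optional[str] = None    # question 6
    default_action: Optional[str] = None   # question 7

    def unanswered(self) -> List[str]:
        """List the fields that still need an answer."""
        required = ("function", "functional_failure", "failure_mode",
                    "failure_effect", "failure_consequence")
        missing = [name for name in required if not getattr(self, name).strip()]
        if self.proactive_task:
            # A proactive task is only usable together with its interval.
            if not self.task_interval:
                missing.append("task_interval")
        elif not self.default_action:
            # Question 7 applies when no suitable proactive task was found.
            missing.append("default_action")
        return missing


if __name__ == "__main__":
    # Illustrative content only.
    record = RcmRecord(
        function="Pump water at not less than 800 litres per minute",
        functional_failure="Pumps less than 800 litres per minute",
        failure_mode="Impeller worn",
        failure_effect="Flow drops and the low-flow alarm sounds",
        failure_consequence="Operational",
    )
    print(record.unanswered())
```

The record is treated as complete only when questions 1 to 5 are answered and either a proactive task with an interval or a default action is recorded.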


Functions

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its
present operating context? (Functions)
   Callout: All Functions

2. In what ways can it fail to fulfil its functions? (Functional Failures)

3. What causes each functional failure? (Failure Modes)

4. What happens when each failure occurs? (Failure Effects)

5. In what way does each failure matter? (Failure Consequences)

6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)

7. What should be done if a suitable proactive task cannot be found? (Default Actions)

© Copyright Meridium, Inc. 2007


The FMEA

The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its
present operating context? (Functions)

2. In what ways can it fail to fulfil its functions? (Functional Failures)

3. What causes each functional failure? (Failure Modes)

4. What happens when each failure occurs? (Failure Effects)
   Callout: All failed states, causes of failure, and the effects of each failure

5. In what way does each failure matter? (Failure Consequences)

6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)

7. What should be done if a suitable proactive task cannot be found? (Default Actions)
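
The FMEA content named in the callout (failed states, causes of failure, and the effects of each failure) nests naturally under each function. The following is a rough sketch of that nesting, flattened into worksheet rows; the class names, the `fmea_rows` helper and the pump entries are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class FailureMode:
    cause: str   # what causes the functional failure
    effect: str  # what happens when it occurs


@dataclass
class FunctionalFailure:
    failed_state: str
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    statement: str
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def fmea_rows(functions: List[Function]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield one (function, failed state, cause, effect) row per failure mode."""
    for function in functions:
        for failure in function.functional_failures:
            for mode in failure.failure_modes:
                yield (function.statement, failure.failed_state,
                       mode.cause, mode.effect)


# Illustrative data only.
EXAMPLE = [
    Function(
        "Pump water at not less than 800 litres per minute",
        [
            FunctionalFailure("Pumps no water at all", [
                FailureMode("Motor burnt out", "Pump stops; tank level falls"),
                FailureMode("Coupling sheared", "Motor runs but no flow"),
            ]),
            FunctionalFailure("Pumps less than 800 litres per minute", [
                FailureMode("Impeller worn", "Flow drops gradually"),
            ]),
        ],
    )
]

if __name__ == "__main__":
    for row in fmea_rows(EXAMPLE):
        print(" | ".join(row))
```

Keeping the hierarchy explicit makes it easy to see when a failed state has no failure modes listed against it yet.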


Consequences
The Seven Questions of RCM (SAE JA1011 5a.-5g. 2002)

1. What are the functions and associated desired standards of performance of the asset in its
present operating context? (Functions)

2. In what ways can it fail to fulfil its functions? (Functional Failures)

3. What causes each functional failure? (Failure Modes)

4. What happens when each failure occurs? (Failure Effects)

5. In what way does each failure matter? (Failure Consequences)
   Callout: How it matters

6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)

7. What should be done if a suitable proactive task cannot be found? (Default Actions)


RCM Decision Algorithm
(Based on Example 2, SAE JA1012)

[Figure: decision diagram. The content recoverable from the diagram is listed below.]

Opening question: Will the loss of function caused by this failure mode on its own become evident to the operating crew under normal circumstances?
  No: hidden failure (columns HS, HE, HO, HN)
  Yes: evident failure (columns ES, EE, EO, EN)

Consequence questions, asked in order within each branch:
  HS / ES: Is there an intolerable risk that the failure could kill or injure someone?
  HE / EE: Is there an intolerable risk that the failure could breach a known environmental standard or regulation?
  HO / EO: Does the failure have a direct adverse effect on operational capability?
  HN / EN: reached when the answer to each question above is No.

Task selection questions, asked in order within each column until one is answered Yes:
  1. Is a Predictive (On-Condition) task technically feasible and effective? Yes: Predictive Task (all columns, box 1)
  2. Is a Preventive Restoration task technically feasible and effective? Yes: Preventive Restoration Task (all columns, box 2)
  3. Is a Preventive Replacement task technically feasible and effective? Yes: Preventive Replacement Task (all columns, box 3)
  4. Hidden failures only: Is a Detective task to detect the failure technically feasible and effective? Yes: Detective Task (HS4, HE4, HO4, HN4)
  5. Evident safety and environmental failures only: Is a combination of tasks technically feasible and effective? Yes: Combination of tasks (ES4, EE4)
  6. Operational and non-operational failures only: Run-to-Fail? (HO5, HN5, EO4, EN4)

Default outcomes when every answer is No:
  Redesign is compulsory (HS5, HE5, ES5, EE5)
  Redesign may be desirable (HO6, HN6, EO5, EN5)
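The flow of the decision diagram can be sketched in a few lines of Python. This is only an illustration of the order in which the questions are asked, not part of SAE JA1012 or of any Meridium product. The class name, field names and function names are all hypothetical, and each yes/no answer is assumed to have been agreed by the analysis group beforehand.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Yes/no answers for one failure mode (all field names are hypothetical)."""
    evident: bool              # evident to the operating crew under normal circumstances?
    safety_risk: bool          # intolerable risk of killing or injuring someone?
    environmental_risk: bool   # intolerable risk of breaching an environmental standard?
    operational_effect: bool   # direct adverse effect on operational capability?
    predictive_ok: bool = False
    restoration_ok: bool = False
    replacement_ok: bool = False
    detective_ok: bool = False
    combination_ok: bool = False
    run_to_fail_ok: bool = False


def consequence_column(fm: FailureMode) -> str:
    """Return the diagram column: H or E, followed by S, E, O or N."""
    prefix = "E" if fm.evident else "H"
    if fm.safety_risk:
        return prefix + "S"
    if fm.environmental_risk:
        return prefix + "E"
    if fm.operational_effect:
        return prefix + "O"
    return prefix + "N"


def select_strategy(fm: FailureMode) -> tuple[str, str]:
    """Walk down one column of the diagram and return (column, outcome)."""
    column = consequence_column(fm)
    hidden = column[0] == "H"
    safety_or_environment = column[1] in ("S", "E")

    if fm.predictive_ok:
        return column, "Predictive Task"
    if fm.restoration_ok:
        return column, "Preventive Restoration Task"
    if fm.replacement_ok:
        return column, "Preventive Replacement Task"
    if hidden and fm.detective_ok:
        return column, "Detective Task"
    if safety_or_environment:
        if not hidden and fm.combination_ok:
            return column, "Combination of tasks"
        return column, "Redesign is compulsory"
    if fm.run_to_fail_ok:
        return column, "Run-to-Fail"
    return column, "Redesign may be desirable"


hidden_safety = FailureMode(evident=False, safety_risk=True,
                            environmental_risk=False, operational_effect=False,
                            detective_ok=True)
print(select_strategy(hidden_safety))
```

The sketch returns the column code together with the outcome so that the result can be traced back to a box in the diagram.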


Failure Management Strategies


The Seven Questions of RCM
(SAE JA1011 5a.-5g., 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
   Each task must be applicable and effective.
7. What should be done if a suitable proactive task cannot be found? (Default Actions)

What types of maintenance are there?

RCM Term: Predictive Maintenance (abbreviation: PTIVE)
  Alternative terms: On-Condition Maintenance; Condition Based Maintenance (CBM); Condition Monitoring (CM); Inspections
  What it is: Check an item for signs of potential failure and leave it in place on the condition that it will make it to its next inspection interval.

RCM Term: Preventive Restoration (abbreviation: PRES)
  Alternative terms: Overhaul; Scheduled Restoration; Restorative tasks; Rework
  What it is: A task to restore an asset's original resistance to failure prior to its failure. This is a preventive task.

RCM Term: Preventive Replacement (abbreviation: PREP)
  Alternative terms: Replacement; Overhauls (also)
  What it is: A task to replace an asset prior to its failure. This is a preventive task.

RCM Term: Detective Maintenance (abbreviation: DTIVE)
  Alternative terms: Failure finding; Function testing
  What it is: A task to detect whether an item has failed or not.


Default Actions
The Seven Questions of RCM
(SAE JA1011 5a.-5g., 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)
   Determine the actions to be taken if routine maintenance cannot be performed.

RCM-DO-02 Preparing for Analysis


What asset or system…?
• Before we know what a system or sub-system can do, we need to know exactly what the system contains.

[Figure: asset hierarchy.
  Level 1: Plant
  Level 2: Process 1, Process 2, Process 3, Process 4
  Level 3: Electrical System, Mechanical Assets, Instrumentation, Fixed Equipment
  Level 4: Centrifugal Pump, AC 3 phase motor, Hydraulic Motor, Chain Conveyor, Rotary Valves]

If we go too high, we risk de-motivating the team and creating superfluous analyses.
RCM is best performed at a system level. However, it can be performed at an equipment level in special circumstances.
If we go too low, we risk paralysis by analysis.


RCM-DO-03 Functions and Functional Failures


The Seven Questions of RCM
(SAE JA1011 5a.-5g., 2002)
1. What are the functions and associated desired standards of performance of the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)

We will cover…
• Operating context
• Types/Categories of Functions
• How to write a function statement

SAE JA1011 5.1.2: “All Functions of the asset/system shall be identified (all primary and secondary functions including the functions of all protective devices)”


Operating Context
1. Duty cycles…
2. Weather and the immediate environment…
3. Applicable regulations and laws…
4. Asset configuration…
5. Remoteness…
6. How it is managed…
7. Public perceptions…
8. Budget restraints…
9. Skills available…
10. Any other factor that determines how we use the asset(s) or system

Our car is a Ford Focus. Great car…we maintain it to the manufacturer's specifications…
…but they don’t! Why?

The Operating Context of any asset tells you how that asset is operated. This will influence how we maintain it.
It doesn’t tell you what the asset can do, or what we want it to do.


Writing Functions

SAE JA1011, 5.1.3 - All functions shall contain a verb, an object and a
performance standard (quantified in every case where this can be done)

[Figure: a pump that can deliver up to 1000 l/minute feeds a tank with an off take of 800 l/minute.]

We accept that “time’s arrow” means that assets will deteriorate.

Performance standards tell us the minimum level of performance acceptable to the users or owners of the asset.
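The verb, object and performance standard structure, and the gap between what the pump can do and what its users want, can be written out as a small Python sketch. The class and function names are hypothetical, and the wording "at not less than" is an assumption chosen to reflect a minimum acceptable level; the 1000 and 800 l/minute figures come from the pump and tank example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStatement:
    """A function statement: verb + object + quantified performance standard."""
    verb: str
    obj: str
    standard: float   # minimum level of performance acceptable to the users
    unit: str

    def text(self) -> str:
        # Assumed phrasing; the source only requires a verb, an object and a standard.
        return f"To {self.verb} {self.obj} at not less than {self.standard:g} {self.unit}"


def margin_for_deterioration(initial_capability: float,
                             statement: FunctionStatement) -> float:
    """What the asset can do minus what its users want it to do."""
    return initial_capability - statement.standard


# Pump can deliver up to 1000 l/minute; the off take from the tank is 800 l/minute.
offtake = FunctionStatement("pump", "water from the tank", 800, "l/minute")
margin = margin_for_deterioration(1000, offtake)

print(offtake.text())
print(f"Margin for deterioration: {margin:g} {offtake.unit}")
```

A negative margin would mean the asset cannot meet the desired performance even when new, which no amount of maintenance can correct.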

Performance Standards
[Figure (# SAE JA1012 Figure 2): a performance axis showing “What it can do” above “What its users want it to do”, with the gap between them labelled “Margin for Deterioration”.]

Types of performance standard:
1. Between Limits
2. Specific
3. Varying – Up To
4. Total
5. Multiple
6. Open

One or more criteria for performance, e.g. up to 800 l/minute at 100 bar.

# SAE JA1012 Section 6.2 Performance Standards


These are NOT Functions! (Why?)

• To be safe…
• To be reliable…
• To comply with environmental standards…
• To comply with IE2314356XXX (etc)…

Performance standards need to be quantified where possible to avoid ambiguity.

E.g. What is reliable, and who says so?


Exercises

Primary Function Statements


The reason why an asset is purchased in the first place.
SAE JA1011, 5.1.3 - All functions shall contain a verb, an object
and a performance standard (quantified in every case where this can be
done)

• A light fitting in an office…

• An office chair…

• A projector used in presentations…

• A pushbike for you to ride to work on…


Secondary Functions
(SAE JA1012 6.2.2)
Secondary functions are all the other requirements we have of
the asset(s) that are not covered by the primary function.

Environmental Integrity
Safety / Structural Integrity
Control / Containment / Comfort
Appearance
Protective Devices and Systems
Economy and Efficiency
Superfluous

The primary function of an office chair was given as “To support a person weighing up to 150 kilograms in a seated position”.
What are the secondary functions?

RCM-DO-03b Air Conditioner

An office is located in an extremely hot environment where the average annual temperature ranges
between 30°C (86°F) and 41°C (105°F).
They have installed an air-conditioning system that will, at maximum output, maintain a temperature
differential of 20°C (+/-0.56°C) between the outside ambient air and the inside office air. It will
also dehumidify the air to a level of 45% (+/-4%).
The office is approximately 914m² (3000ft²); the air conditioning unit will provide six BTU (British
Thermal Units) of cooling.

Operational Description
The system is very simple and consists of a reciprocating piston compressor, a condenser, a thermal
expansion valve, and an evaporator. A three-phase electric squirrel cage motor drives the compressor
via four parallel v-belts. A guard is in place to stop people touching the belts while they are in use.
Setting air conditioning temperatures can be very individual and is almost never without complaints.
Over the years the company has determined that a temperature between 19°C (~66°F) and 23°C
(~73°F) is the most comfortable to work at, and causes the fewest arguments. The thermostat
is set to 21°C (~70°F), and they would like it not to exceed 23°C or go below 19°C.
The compressor is oil lubricated, and compresses a standard refrigerant gas, which is a known
greenhouse gas. Any release of the refrigerant breaches a number of environmental regulations. The
compressor takes low-pressure superheated gas from the evaporator, compresses it to high-pressure
superheated gas, and pushes it through the condenser.
A draft over the condenser coils comes from a three-phase electric fan, which removes the heat and
changes the high-pressure vapor to a high-pressure liquid. When the condenser is working well there is
a temperature differential of 3.1°C (10°F) across the condenser.
De-superheated high-pressure liquid leaves the condenser in the liquid line to the thermal expansion
valve (TX valve). The TX valve regulates the flow of high-pressure liquid refrigerant into the
evaporator coil. It is designed to open just enough to let refrigerant flow while maintaining a high
pressure differential from its inlet to its outlet. The pressure at the exit of the expansion valve is low
enough that it initiates a phase change in the liquid refrigerant to a vapor.


A three-phase motor forces draft air over the evaporator coils and superheats the vapor. This creates
the cooling effect. Both the evaporator fan and the condenser fan have lightweight steel cowls to stop
foreign objects from damaging the fan blades.
The refrigerant then leaves the evaporator as a superheated gas and restarts the process at
the compressor. Any failure of the evaporator means that there is a possibility of liquid entering the
compressor, destroying the internal components. When the evaporator is working well there is a
temperature differential of 3.1°C (10°F) across its coils.
The electric motor drives of the compressor and the evaporator have thermal overloads that will trip
the circuit if the current reaches 125% of full load current (FLA); the condenser fan has protection
at 115% of FLA.
The company has local research reports that show that bacteria, viruses and fungi tend to thrive in
that part of the world when the humidity is greater than 47%. Similar “wellness” reports have shown
that workers in an office environment are most comfortable between 30% and 44%. If the humidity is too
low, workers often suffer from dry eyes and increased static, and it feels colder than it is. Too high
and workers feel very uncomfortable and feel hotter than it is.
The air conditioner typically needs to run for 8-10 minutes before the dehumidification process can
commence. At its present design capacity, it will run for 100% of the time in summer, and 40-50% of
the time during other seasons in this climate. However, if the thermostat fails, and stops the
compressor at temperatures above its set point, then this will cause short run times, and will not
allow the unit to dehumidify the air in the office space.
The company using this unit has other similar systems installed
in other offices and finds them to be reliable and economical to install and to run. However,
discussions with the manufacturer and a study of the history of similar systems have produced the
following list of common failures.
a) Condenser fins flattened, preventing forced airflow over the condenser coils. (Installation
errors)
b) Evaporator fins flattened, preventing forced airflow over the evaporator coils. (Installation
errors)
c) Clogging of the TX valve, causing a total failure of the system (Normally occurs every 2 years)
d) Wear out of the valves within the compressor. (Normally once every 5 years)
e) Failure of the thermostat, meaning it will not trip at all (once every 4 years), or it will trip at
temperatures greater than the set point. (once every 6 years)


While these are common failure modes, they do not include all of the likely failure modes. For
example, the drive motors for the compressor, the condenser, and the evaporator are all standard three-
phase squirrel cage electric motors and suffer from the failure modes that generally occur in these
types of motors.
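The frequencies quoted in the list of common failures (items c to e) can be converted into approximate annual rates, which gives a feel for how often the maintenance team would be called out for these modes alone. The sketch below assumes each "once every N years" figure is a mean interval and that the failure modes are independent; neither assumption is stated in the case study, and the dictionary keys are shortened labels rather than wording from the source.

```python
# Mean interval between occurrences, in years, taken from items c) to e)
# of the list of common failures. Treating them as mean intervals is an assumption.
mean_interval_years = {
    "TX valve clogged": 2,
    "Compressor valves worn out": 5,
    "Thermostat does not trip at all": 4,
    "Thermostat trips above the set point": 6,
}


def annual_rate(interval_years: float) -> float:
    """Expected number of occurrences per year for a given mean interval."""
    return 1.0 / interval_years


rates = {mode: annual_rate(years) for mode, years in mean_interval_years.items()}
total_rate = sum(rates.values())

for mode, rate in rates.items():
    print(f"{mode}: {rate:.2f} per year")
print(f"All listed modes together: {total_rate:.2f} per year")
```

Under these assumptions the listed modes together amount to a little more than one failure per year, before the motor failure modes mentioned above are counted.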


Functional Failures

Failed States

Functional Failures indicate failed states: “how” the asset is unable to do
what we want it to.
• We need to define all of the Failed States for every function.
  – Failed states are derived directly from the function statements and
    their performance standards
  – Generally cover too much, too little (partial) and not at all (total)
• To pump water from tank A to tank B at up to 800 l/minute
  (Varying)
  – Unable to pump at all
  – Pumps at more than 800 l/minute (?)
• To pump water from tank A to tank B at between 800 l/minute
  and 1000 l/minute (Between Limits)
  – Unable to pump at all
  – Pumps at less than 800 l/minute
  – Pumps at more than 1000 l/minute
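Because failed states follow directly from the performance standard, they can be generated mechanically once the limits are known. The Python sketch below is one way to illustrate the "not at all, too little, too much" pattern; the function name and parameters are hypothetical, and whether exceeding an "up to" limit really counts as a failure is still a question for the analysis group.

```python
from typing import List, Optional


def failed_states(verb: str, unit: str,
                  lower: Optional[float] = None,
                  upper: Optional[float] = None) -> List[str]:
    """Derive failed states from the limits of a performance standard.

    lower: minimum acceptable performance, if the standard has one.
    upper: maximum acceptable performance, if the standard has one.
    """
    states = [f"Unable to {verb} at all"]                                      # total failure
    if lower is not None:
        states.append(f"{verb.capitalize()}s at less than {lower:g} {unit}")   # too little
    if upper is not None:
        states.append(f"{verb.capitalize()}s at more than {upper:g} {unit}")   # too much
    return states


# "Up to 800 l/minute": only an upper limit is stated.
varying = failed_states("pump", "l/minute", upper=800)

# "Between 800 l/minute and 1000 l/minute": both limits are stated.
between_limits = failed_states("pump", "l/minute", lower=800, upper=1000)

for state in between_limits:
    print(state)
```

Each limit in the function statement contributes at most one partial failed state, which is why an unquantified standard such as "to be reliable" yields nothing that can be analysed.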


Exercise

The primary function of a grinding machine may be listed as: “To grind bearing journals in a
cycle time of 3.00 minutes ± 3 seconds, to a diameter of 75 mm ± 0.1 mm, with a surface
finish of no greater than Ra 0.2.”

[Figure: tolerance bands around the 75 mm diameter, marked in steps of 0.05 mm, and around the 3 minute cycle time, marked at 2.54, 2.57, 3.03 and 3.06.]

# SAE JA1012 Section 7.2


Exercises

To start pumping water from tank A to tank B at a volume of 800 l/minute, at a pressure of 100 bar,
when the water level is at the low level switch and to stop when it reaches the high level switch.

[Figure: tank with High Level and Low Level switches; pump delivering 800 l/minute at 100 bar.]


RCM-DO-04 Failure Modes and Effects


The Seven Questions of RCM
(SAE JA1011 5a.-5g. 2002)
1. What are the functions and associated desired standards of performance of
   the asset in its present operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure?
   (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found?
   (Default Actions)


Reasonably Likely

Reasonably Likely
Failure mode                            Location                      Likelihood
Pump struck by lightning                North West Australia          Reasonably likely
                                        The Atacama Desert in Chile   Highly unlikely
Pump stolen                             Mexico                        Reasonably likely
                                        The USA                       Unlikely
Supply cable insulation deteriorated    Saudi Arabia                  Reasonably likely
due to sun exposure                     The UK                        Not likely

Levels of reasonableness are determined by the analysis group.

If no agreement is possible, then the organization that owns the assets must
make a decision.

Causality
Level 1?
Unable to pump water at all
1. Motor fails
2. Pump fails
3. Pipes fail
4. Inlet to tank B blocked
5. Outlet from tank A blocked

… or Level 2?
Unable to pump water at all
1. Motor fails due to stator earth fault
2. Motor fails due to short between the coils
3. Motor fails due to fan end bearing failure
4. Motor fails due to drive end bearing failure
5. Motor fails due to overheating
6. Motor fails due to loose connections

… or Level 3?
Unable to pump water at all
1. Drive end bearing fails due to ingress of water
2. Drive end bearing fails due to lack of adequate grease
3. Drive end bearing fails due to misalignment


How far is far enough?


Level 1: Motor stops
  Level 2: Due to failed drive end bearing
    Level 3: Due to lack of grease
      Level 4: Due to inadequate training of the lubrication technician
    Level 3: Due to the wrong grease
      Level 4: Due to improper purchasing controls
        Level 5: Due to lack of communication between maintenance and purchasing
          Level 6: Due to former differences between department managers
      Level 4: Due to inadequate training of the lubrication technician
    Level 3: Due to misalignment during installation
      Level 4: Due to poor installation procedures
        Level 5: Due to incorrect procedure writing procedures
      Level 4: Due to inadequate tools
        Level 5: Due to poor purchasing controls
          Level 6: Due to lack of communications between maintenance and purchasing
            Level 7: Due to former differences between department managers


Writing a Failure Mode

Writing a failure mode


• Failure modes are the reasons why something is in a failed state.
• When defining failure modes, we first need to understand how it has failed
  (the functional failure); then we determine why it has failed.
• Avoid verbs like breaks, fails, malfunctions.
• Use the “due to” convention and at least a noun and a verb
  (not a rule, a guide).
• Only one cause per failure mode.

Normally written something like this…

Functions:            To pump water from tank A to tank B at 800 l/minute
Functional Failures:  Unable to pump water from tank A to tank B
Failure Modes:
  – Drive end motor bearing failed due to lack of grease
  – Short in motor windings due to insulation degrading over time
  – Drive end motor bearing seized due to misalignment on installation
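
The writing guidelines above can be sketched as a small checker. This is only an illustrative sketch, not part of the course material or of SAE JA1011: the function name `check_failure_mode`, the word list and the exact form of each rule are assumptions made here, and the guidelines themselves are described as a guide rather than a rule.

```python
import re

# Hypothetical list of vague verbs, taken from the guideline above.
VAGUE_VERBS = ("breaks", "fails", "malfunctions")


def check_failure_mode(text: str) -> list[str]:
    """Return a list of guideline warnings for one failure mode statement."""
    warnings = []
    lowered = text.lower()

    # Guideline: use the "due to" convention.
    parts = lowered.split(" due to ")
    if len(parts) < 2:
        warnings.append('missing "due to"')
    elif len(parts) > 2:
        # Several "due to" phrases usually chain more than one causality level.
        warnings.append('more than one "due to"')

    # Guideline: avoid vague verbs such as breaks, fails, malfunctions.
    for verb in VAGUE_VERBS:
        if re.search(rf"\b{verb}\b", lowered):
            warnings.append(f'vague verb "{verb}"')

    # Guideline: only one cause per failure mode (crude check for and/or).
    if len(parts) == 2 and re.search(r"\b(and|or)\b", parts[1]):
        warnings.append("cause may combine more than one cause")

    return warnings


print(check_failure_mode(
    "Drive end motor bearing seized due to misalignment on installation."))
print(check_failure_mode("Motor fails"))
```

A statement that passes returns an empty list. The check is deliberately crude, so a human reviewer still decides whether a flagged statement is acceptable.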


Types of Failures
[Figure: three panels, each comparing "What it can do" with "What its users want it to do"]

Type of failure                                 Responsible
Wear and tear, degradation of the asset         Maintenance
Incorrect use, often deliberate, overloading    Operations
Not fit for purpose                             Engineering / Purchasing

Who’s responsible for reliability?

Reliability is a process… not a department!
# SAE JA1012 Figure 2


The Problem with Data

Where does the data come from?

“One of the most important contributions of the Reliability-Centered
Maintenance Program is its explicit recognition that certain types of
information … are, in principle, as well as in practice, unobtainable.”
Mathematical Aspects of Reliability Centered Maintenance, H.L. Resnikoff


Where does the data come from?

“This means … in practice and in principle, the policy must be designed
without using experiential data which will arise from the failures the
policy is meant to avoid.”
Mathematical Aspects of Reliability Centered Maintenance, H.L. Resnikoff

[Chart: Data 30%, Knowledge 70%]


Effects

Effects and Consequences


• Effects are the direct outcome of a failure mode. (What happens)

• The primary role of the effects statement is to inform us of the
  consequences. (Why it matters)
  – When do we know about it; what evidence is there that it has failed?
  – Safety implications
  – Implications for environmental standards and regulations
  – Operational implications
    • Cost of repair
    • What is required to restore the function?
    • Time to repair (TTR)
  – Any other implications such as reputation, news headlines, etcetera.

• SAE JA1011, 5.4.1: “Failure effects shall describe what would happen if
  no specific task is done to anticipate, prevent, or detect the failure.”

• They are the typical worst case scenario… not the extreme worst case
  scenario.

RCM-DO-05 Consequences and Effectiveness



A Hierarchy of Consequences

Evident or Hidden?

Hidden, Operational?
  1. On-Condition Task?
  2. Preventive Restoration or Preventive Replacement?
  3. Failure Finding Task?
  4. No scheduled maintenance; redesign may be desirable

Hidden, Environment? Safety?
  1. On-Condition Task?
  2. Preventive Restoration or Preventive Replacement?
  3. Failure Finding Task?
  4. Redesign is compulsory

Evident, Safety? Environment?
  1. On-Condition Task?
  2. Preventive Restoration or Preventive Replacement?
  3. Combination of Tasks?
  4. Redesign is compulsory

Evident, Operational?
  1. On-Condition Task?
  2. Preventive Restoration or Preventive Replacement?
  3. No scheduled maintenance; redesign may be desirable
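
One way to read the hierarchy is as an ordered list of options to try for each branch. The sketch below is an illustration only. The function name `task_sequence` and its two flags are hypothetical, and the assignment of the bottom rows of the diagram to each branch (failure finding for hidden failures, combination of tasks for evident safety or environmental consequences, compulsory redesign wherever safety or the environment is at stake) is an assumption about how the flattened diagram reads.

```python
# Task types tried first in every branch of the diagram.
PROACTIVE_TASKS = [
    "on-condition task",
    "preventive restoration or preventive replacement",
]


def task_sequence(hidden: bool, safety_or_environment: bool) -> list[str]:
    """Return the options to consider, in order, for one branch.

    Assumed reading of the diagram, not a definitive decision algorithm.
    """
    steps = list(PROACTIVE_TASKS)
    if hidden:
        # Hidden branches add a failure-finding task.
        steps.append("failure-finding task")
    elif safety_or_environment:
        # Evident safety/environment branch adds a combination of tasks.
        steps.append("combination of tasks")

    if safety_or_environment:
        steps.append("redesign is compulsory")
    else:
        steps.append("no scheduled maintenance (redesign may be desirable)")
    return steps


for step in task_sequence(hidden=True, safety_or_environment=True):
    print(step)
```

Each step is only reached if no suitable task was found at the previous one.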


Hidden or Evident?

Hidden failure: Failure of pressure release valve on high pressure vessel in
a gas plant.

Multiple failure event: Dangerous build-up of gas pressure within the
pressure vessel.

Consequence: Explosion of the pressure vessel when under high pressure
conditions.

[Diagram labels: To Process]
# The Maintenance Scorecard, Daryl Mather, Industrial Press, ISBN 0831131810


Hidden Failures

HN HO HE HS ES EE EO EN

Hidden or Evident?
  Will the loss of function caused by this failure mode on its own become
  evident to the operating crew under normal circumstances?

Hidden:
  HS  Is there an intolerable risk that the multiple failure could kill or
      injure someone?
  HE  Is there an intolerable risk that the multiple failure could breach a
      known environmental standard or regulation?
  HO  Does the failure have a direct adverse effect on operational capability?

Evident:
  ES  Is there an intolerable risk that the failure could kill or injure
      someone?
  EE  Is there an intolerable risk that the failure could breach a known
      environmental standard or regulation?
  EO  Does the failure have a direct adverse effect on operational capability?

• RCM begins by separating hidden and evident consequences.

• By themselves, hidden failures have no consequences; an additional failure
  is required before they have any tangible impact.

• The ultimate consequences of failure are often severe.

• Can be separated into Safety, Environmental and Operational consequences.

• Generally devices that provide protection for safety, the environment or
  operations, such as high-high level switches, over-speed switches, standby
  equipment, etc.


Safety
Safety Consequences


• Once the failure has been categorized as Hidden or Evident, the first
  consideration in evaluating any failure possibility is safety to life and
  limb.

• Asks the team to determine whether there is an intolerable risk of death
  or injury.

• Will not default to run-to-failure under any circumstances; some action
  must always be taken.


Environmental

Environmental Consequences


• Gained prominence through the 1980s with the onset of global warming and
  increased environmental awareness.

• Deals with an intolerable risk of breaking environmental standards,
  regulations or laws (internal or external).

• Will not default to run-to-failure under any circumstances; some action
  must always be taken.


Operational
Operational Consequences


• Any failure consequence that has a direct, or secondary, negative effect
  on operations.

• Task selection is, in part, determined by cost-effectiveness trade-off
  calculations as opposed to levels of tolerable risk.

• Includes “other” cost implications such as reputation, adverse newspaper
  coverage and other PR-related issues.
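
The cost-effectiveness trade-off can be illustrated with simple arithmetic: compare the yearly cost of doing a task (plus any failures that still occur) with the yearly cost of running to failure. The sketch below is a simplification under stated assumptions, not a formula from the course material. The function names, the constant failure rates and all figures are hypothetical.

```python
def annual_failure_cost(failures_per_year: float,
                        repair_cost: float,
                        downtime_cost: float) -> float:
    """Expected yearly cost of failures: rate x (repair + lost production)."""
    return failures_per_year * (repair_cost + downtime_cost)


def task_is_cost_effective(task_cost: float,
                           tasks_per_year: float,
                           failures_per_year_without_task: float,
                           failures_per_year_with_task: float,
                           repair_cost: float,
                           downtime_cost: float) -> bool:
    """True if doing the task costs less per year than running to failure."""
    cost_without = annual_failure_cost(
        failures_per_year_without_task, repair_cost, downtime_cost)
    cost_with = (task_cost * tasks_per_year
                 + annual_failure_cost(
                     failures_per_year_with_task, repair_cost, downtime_cost))
    return cost_with < cost_without


# Hypothetical figures, for illustration only.
print(task_is_cost_effective(
    task_cost=150.0, tasks_per_year=12,
    failures_per_year_without_task=0.5,
    failures_per_year_with_task=0.125,
    repair_cost=4000.0, downtime_cost=20000.0))  # prints True
```

With these figures, running to failure costs 12,000 per year against 4,800 per year with the task, so the task would be judged worth doing. This comparison applies only where the consequences are economic. Safety and environmental consequences are judged against tolerable risk instead.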


Repair Only

Non-operational consequences


• Economic consequences only


• Costs of repair and secondary damages
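
The eight category codes (HS, HE, HO, HN, ES, EE, EO, EN) can be derived from the answers to the decision questions. The following is a minimal sketch, assuming the questions are asked in the order safety, environment, operations, with non-operational as the remainder. The function name `consequence_category` and its parameters are hypothetical. For a hidden failure, the safety and environmental answers refer to the multiple failure, as in the questions on the slides.

```python
def consequence_category(evident: bool,
                         safety: bool,
                         environmental: bool,
                         operational: bool) -> str:
    """Map decision-question answers to a two-letter consequence code.

    First letter: E (evident) or H (hidden).
    Second letter: S (safety), E (environmental), O (operational),
    N (non-operational). The question order is an assumption.
    """
    prefix = "E" if evident else "H"
    if safety:
        return prefix + "S"
    if environmental:
        return prefix + "E"
    if operational:
        return prefix + "O"
    # No safety, environmental or operational impact: repair cost only.
    return prefix + "N"


print(consequence_category(evident=False, safety=True,
                           environmental=False, operational=True))  # prints HS
print(consequence_category(evident=True, safety=False,
                           environmental=False, operational=False))  # prints EN
```

Because safety is checked first, a failure with both safety and operational impact is classed as a safety consequence.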


RCM-HO-05a Assigning Consequences


HN HO HE HS | Hidden or Evident? | ES EE EO EN

a) A household circuit breaker continuously trips when there is no fault present.


b) A hydraulic positioning unit moves a train into position under the feed hopper of a large ammonia
plant. Since its installation some ten years ago, the high pressures and extreme heat of the working
environment have caused numerous leaks, each resulting in some downtime. (In addition, all are
potential fire risks.)
Underneath the positioning unit is a concrete bund, there to stop any hydraulic oil from seeping into
the ground below, which would breach a number of environmental regulations. However, due to errors
in the pouring of the concrete, it allows small quantities of hydraulic oil to pass through every
time there is a leak. What is the consequence of the failure of the concrete?
c) Vibration sensors protect a forced draft fan. Their role is to protect the fan from high secondary
damages stemming from unplanned bearing failure. Due to the critical nature of this asset, the
company keeps a spare fan assembly. In case of any failure of the fan, the quickest way to restore
the function is to replace the entire assembly. This particular fan does not have any safety
consequences associated with bearing failure.
The sensors are set at 7 mm/second and provide a warning light for operators so they can shut the fan
down immediately. Due to a failure of the indicating bulb at the control panel, the alarm goes
unnoticed when vibration reaches the alarm level.
d) A wastewater plant has turbidity meters to measure the relative clarity of the effluent leaving the
plant into the local river system. High percentages of microscopic particles will cause the effluent
to be excessively “cloudy”; the turbidity meter then adjusts the dosing earlier in the process to
reduce the impact on the environment.
Over time, the calibration of this meter has drifted, so much so that the effluent leaving the plant
contains a high percentage of microscopic solids, breaching a number of environmental laws and
regulations, as well as adversely affecting the wildlife in the area.
e) Over time, the brake pads in a car wear down, meaning that the car will not stop when required.
The result was an accident when attempting to stop at a red light.


f) A speed sensor protects a turbine from over-speed, preventing it from speeding up to destruction,
sending debris in every direction. The sensor has failed in such a way that it will not trip the
turbine on over-speed.
g) A standby motor to drive a pumping system has developed false brinelling (flat spots) of the
bearings. This means that when it is called on to run it will run for a short while before
tripping the motor on overload. If it runs continually in this fashion, it could also cause secondary
damage to the motor shaft.
h) When the level in a tank reaches the low level, the low-level switch starts a pump. Because of
vibration in the surrounding area, one of the terminals comes loose and the switch will not work
when it is required to.
i) A pumping system has a duty and a standby pump. The standby pump takes over the function if
ever the duty pump should fail. Over time, the resistance of the insulation within the duty motor
breaks down, and it suffers an earth fault.
j) Due to a pinhole leak, the air pressure has gone out of the spare tire in your car.
k) Each aircraft is equipped with life preserver jackets for passenger use in case of a water landing.
One of these has developed a failure, preventing it from inflating when required.
l) An electrically driven “pony” pump primes a lubrication system on start up. At a specified pressure
the main pump takes over to run the system at operating pressure. This is an effort to minimize the
energy usage of the plant, and the main pump could easily start up under full load with no
consequence aside from increased energy usage. The pony pump has a failure of the mechanical
seal and will be unserviceable for a time.
m) An air-conditioning system has had the condenser fins flattened out by vandalism. As a result,
the airflow through the condenser is not sufficient to reduce the temperature prior to the refrigerant
gas travelling to the evaporator, and the system will not reduce room temperature
below the 35°C ambient temperature. This affects the health of the people working in the room and
results in two people suffering from heatstroke.
n) The high-high level switch on a tank trips the pump when there is a high-high level. The pump then
needs a manual reset. At present, this switch has spurious trips that cause the pump to stop when
there is no high level.
o) A large-scale screening facility gets its supply from a conveyor running the length of the building
some four stories above the ground. Along the side of the conveyor are walkways with handrails.
One of the handrails has a crack in it that is not visible to the naked eye. However, if somebody
were to use it, it would give way, leaving the person to fall four stories to their death.
p) An IT data center houses all of the servers containing the corporate IT information. The cooling
system of a data center requires the rooms to be continuously at a temperature of between 20°C
and 25°C, and a humidity range of between 40% and 60%.
A failure of the power supply could lead to outright server failure, or at the very least increase
failure rates of electronic components. This would have a catastrophic effect on business
continuity. For these reasons, a diesel generator set is on permanent standby protecting the power
supply to the coolers and humidifiers; an uninterruptible power supply or UPS further protects this.
The diesel generator set has developed a failure in the starter circuit due to corroded battery
terminals, meaning it will not be able to start when required.
q) An operating company used a tank farm to store flammable liquid raw material. A pressure safety
valve (PSV) set at the tank maximum allowable working pressure (MAWP) of 100 psig protected
one of the tanks containing a highly reactive material.
The previous PHA identified the plugging of the PSV inlet as a potential concern. The PSV’s
annual inspection reports verified plugging, substantiating this concern. The PHA team
recommended the installation of a rupture disc upstream of the PSV.
A month later, an overpressure event (triggered by contamination) caused the tank pressure to
reach 180 psig before the rupture disc blew and vented the tank contents. The ensuing Incident
Investigation revealed that the rupture disc had developed a pinhole leak and the space between the
rupture disc and PSV had pressurized to the normal tank pressure of 80 psig.


Applicable and Effective
(Based on diagram 17 of the SAE JA1012)

Applicable
Before selecting any failure management policy, analysts first need to determine whether or not the
task is actually possible!

Effective
Then they need to determine whether the task will be worthwhile in terms of either cost or risk.
(Based on the consequences)

Within RCM, NO task can be applied to any failure mode without first establishing that it is actually
possible to do the task, and secondly without ensuring that it will adequately manage the
consequences.

Tolerable levels of Risk

What is risk, and how tolerable should it be…?

Risk is the likelihood of an unwanted event

[Figure labels: Ideal, Reality]

• People often forget to fear those things that rarely happen… particularly in the face of
productivity challenges, market share opportunities and competitive necessities.
(Human Error, James Reason)


Example Tolerable Risk Levels

Government Set Tolerable Risk Criteria   UK          Hong Kong   The Netherlands   Australia
Individual risk minimum (Worker)         1 x 10^-5   Not Used    Not Used          Not Used
Individual risk minimum (Public)         1 x 10^-6   Not Used    1 x 10^-6         Not Used
Individual risk maximum (Worker)         1 x 10^-3   Not Used    Not Used          Not Used
Individual risk maximum (Public)         1 x 10^-4   1 x 10^-5   1 x 10^-6         1 x 10^-6

Survey of U.S. Corporate Tolerable Risk Criteria   High Range   Low Range
Minimum individual risk (Worker)                   10^-5        10^-9
Maximum individual risk                            10^-3        10^-6
Individual SIF individual risk target              10^-3        10^-6

Source: E.M. Marszal, Survey of process plant risk tolerance criteria and third party liability settlements, exida.com, Philadelphia, 2000.


Hidden Failures

A hidden functional failure, on its own, will not become evident to the operators
under normal operating circumstances


The five main categories for hidden failures…

• The majority of hidden failures occur on protective devices. These are devices that:
  • Warn of abnormal conditions
  • Shut down equipment in case of a failure
  • Eliminate or alleviate abnormal conditions caused by failure
  • Take over from a function that has failed
  • Prevent dangerous situations from arising


Most protective devices can fail in two ways…

• By acting when they are not needed…


• By ceasing to provide protection….


The Famous Pump Example

If the ultimate high level switch fails closed, it is evident…
If the ultimate high level switch fails open, then nobody knows it has failed…

[Diagram: pump and tank with four level switches; flow rates shown are 1000 l/m and 800 l/m]
• Ultimate high level switch (normally open): shuts the pump off until manually reset.
• High level switch: shuts off the pump until low level turns it back on again.
• Low level switch: turns on the pump until the level reaches the high level switch.
• Low low level switch: turns off the pump until manually reset.


Item  Function                   Item  Function Failure             Item  Failure Modes and Effects
1     To pump between tank A     A     Unable to pump between       1     Pump blocked
      and tank B at up to              tank A and tank B at up      2     Pipes blocked
      800 l/m                          to 800 l/m                   3     Ultimate high level fails closed
(Evident)

Item  Function                   Item  Function Failure             Item  Failure Modes and Effects
2     To stop the pump on        A     Unable to stop the pump      1     Ultimate high level fails open
      ultimate high level              on ultimate high level
(Hidden)


Slide 18

The probability that the protected function will fail in any one cycle is given by its failure rate

[Timeline diagram, one year: Protected Function B fails; Protective Device C fails]

If the failure rate is once in four years, then the probability that it
will fail in one year is 1 in 4.

(This corresponds to a mean time between failures of four years)


Slide 19

The probability that the protective device will be in a failed state at any point in time is given
by its downtime (if it conforms to a random failure pattern)

[Timeline diagram, one year: Protected Function B fails; Protective Device C fails]

If the downtime is 33% then the probability that it will be in a failed state at any point in time
is 1 in 3.
(This corresponds to an availability of 67%)

Slide 20

The Probability of a Multiple Failure

[Timeline diagram, one year]
Protected Function B: Mean Time Between Failures = 4 years
Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state:
1 in 4 x 1 in 3 = 1 in 12
(In other words there is a one in twelve chance that the multiple
failure will occur in any one year)


When developing a failure management policy for a hidden function, the first stage is to decide
what probability we are prepared to tolerate for a multiple failure…

[Timeline diagram, one year]
Protected Function B: Mean Time Between Failures = 4 years
Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state:
1 in 4 x 1 in 3 = 1 in 12
Prepared to accept 1 in 1000


Reduce the probability of failure of the protected function
(by applying a suitable failure management policy)

[Timeline diagram, one year]
Protected Function B: Mean Time Between Failures = 10 years
Protective Device C: Availability = 99%, Unavailability = 1%

And/or by increasing the availability of the protective device:
- by preventing the failure of the protective device, or
- by periodically checking whether the protective device is still working and
  repairing it if it has failed, or
- by modifying the system in some way

The probability that B will fail while C is in a failed state is now
1 in 10 x 1 in 100 = 1 in 1000
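
The multiple-failure arithmetic on these slides can be checked with a short script. This is a minimal sketch and not part of the original course material: the function name `multiple_failure_probability` is hypothetical, and it follows the slides' simplification that the yearly chance of the protected function failing is 1/MTBF and is independent of the state of the protective device. The 33% downtime is taken as "1 in 3", as on the slide.

```python
from fractions import Fraction


def multiple_failure_probability(mtbf_function_years, device_unavailability):
    """Yearly probability that the protected function fails while the
    protective device is already in a failed state (slide approximation)."""
    # Chance that the protected function fails in any one year: 1 / MTBF.
    demand_probability = Fraction(1) / Fraction(mtbf_function_years)
    # Multiply by the chance that the protective device is failed at that moment.
    return demand_probability * Fraction(device_unavailability)


# Before: MTBF of B = 4 years, device C unavailable 1 time in 3.
before = multiple_failure_probability(4, Fraction(1, 3))
# After: MTBF of B = 10 years, device C unavailable 1% of the time.
after = multiple_failure_probability(10, Fraction(1, 100))

print(f"before: 1 in {1 / before}")
print(f"after:  1 in {1 / after}")
```

Exact fractions are used here only so that the "1 in N" figures come out without rounding noise; ordinary floats would serve equally well for real estimates.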


• 6 identical PSVs have each been checked once a year for 5 years (FFI = 1 year)
• So the devices have been in service a total of 30 years
• In that time 3 were found to be in a failed state
• So the MTBF of the devices (MTBFdevice) is 30 years / 3 failures = 10 years
• We know that the failed devices failed some time during the year before the checks, but not when…
• It seems reasonable to assume that each failed device was down for an average of 6 months

[Diagram: PSV between "From Process" and "To Process"; chart of devices 1 to 6 across Year 1 to Year 5]

1. So the total downtime (DTdevice) was 1.5 years out of 30 or 5%
2. So on the basis of these figures it appears that: FFI = 2 x DTdevice x MTBFdevice
3. This is generally true if DTdevice < 5% and the failures of the device are random
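
The PSV figures above can be reproduced step by step. The sketch below is illustrative only: the function and variable names are hypothetical, and the half-interval downtime per failed device is the same assumption the slide makes.

```python
def failure_finding_stats(n_devices, years_observed, n_failed, check_interval_years):
    """Derive MTBF and unavailability of a protective device from check records."""
    # Total service time accumulated by the whole population.
    service_years = n_devices * years_observed
    # Mean time between failures of the device.
    mtbf_device = service_years / n_failed
    # Assumption from the slide: each failed device was down, on average,
    # for half of the checking interval before it was found.
    downtime_years = n_failed * check_interval_years / 2
    unavailability = downtime_years / service_years
    return service_years, mtbf_device, downtime_years, unavailability


service_years, mtbf_device, downtime_years, dt_device = failure_finding_stats(
    n_devices=6, years_observed=5, n_failed=3, check_interval_years=1.0
)

# Check the relationship FFI = 2 x DTdevice x MTBFdevice against the known interval.
ffi_years = 2 * dt_device * mtbf_device

print(f"service time:   {service_years} years")
print(f"MTBFdevice:     {mtbf_device} years")
print(f"downtime:       {downtime_years} years ({dt_device:.0%})")
print(f"implied FFI:    {ffi_years:.2f} years")
```

The implied interval comes back as the one-year checking interval the data was collected under, which is the consistency the slide is pointing at.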


Slide 24

Detective Task Frequency and Availability

[Timeline, Years 1 to 4, with First Inspection and Second Inspection marked]
Maximum potential unavailability time = 2 years

[Timeline, Years 1 to 4, with First, Second and Third Inspection marked]
Maximum potential unavailability time = 1 year

Risk management of hidden failures involves the management of
unavailability to within levels accepted by the company…

[Timeline diagram, one year]
Protected Function B: Mean Time Between Failures = 10 years (B fails)
Protective Device C: failed
1 in 10 x 1 in 100 = 1 in 1000

Step One: Decide what probability we are prepared to tolerate for the multiple failure.
Step Two: Determine / estimate how often the protected function is likely to need the protective device.
Step Three: Calculate what unavailability of the protective device enables us to achieve 1 given 2.

Where:
DTdevice = Unavailability of the protective device
MTBFfunction = Mean time between failures of the protected function
MTBFmultiple = Mean time between occurrences of the multiple failure

If
(1/MTBFfunction) x DTdevice = 1/MTBFmultiple
then
DTdevice = MTBFfunction / MTBFmultiple
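
Step Three can be sketched as a one-line calculation. The helper name `required_unavailability` is hypothetical; the input values are the ones shown on this slide (a protected function with an MTBF of 10 years and a tolerated multiple failure of 1 in 1000 years).

```python
def required_unavailability(mtbf_function_years, mtbf_multiple_years):
    """Unavailability of the protective device that meets the tolerated
    multiple-failure rate: DTdevice = MTBFfunction / MTBFmultiple."""
    if mtbf_function_years <= 0 or mtbf_multiple_years <= 0:
        raise ValueError("MTBF values must be positive")
    return mtbf_function_years / mtbf_multiple_years


dt_device = required_unavailability(mtbf_function_years=10, mtbf_multiple_years=1000)

print(f"required unavailability: {dt_device:.1%}")
print(f"required availability:   {1 - dt_device:.1%}")
```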


We have seen that…
FFI = 2 x DTdevice x MTBFdevice …1

…and that…
DTdevice = MTBFfunction / MTBFmultiple …2

Therefore, substituting 2 into 1 gives…
FFI = (2 x MTBFfunction x MTBFdevice) / MTBFmultiple

Where:
• FFI = failure finding task interval
• DTdevice = Unavailability of the protective device
• MTBFdevice = MTBF of the protective device
• MTBFfunction = MTBF of the protected function
• MTBFmultiple = MTBF of the multiple failure
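
The substitution can also be written as a small function. This is a sketch under stated assumptions rather than a definitive tool: the names `failure_finding_interval` and `linear_formula_applies` are hypothetical, the 5% limit is the rule of thumb quoted earlier in this material, and the example inputs simply combine values that appear on earlier slides for illustration.

```python
def failure_finding_interval(mtbf_device, mtbf_function, mtbf_multiple):
    """Return (FFI, DTdevice), with all MTBF values in the same time unit."""
    # Equation 2: unavailability needed to meet the tolerated multiple failure.
    dt_device = mtbf_function / mtbf_multiple
    # Equation 1: interval that delivers that unavailability.
    ffi = 2 * dt_device * mtbf_device
    return ffi, dt_device


def linear_formula_applies(dt_device, limit=0.05):
    """Rule of thumb: the simple formula is only trusted for DTdevice below 5%."""
    return dt_device < limit


# Illustrative inputs only: device MTBF 10 years, protected function MTBF
# 10 years, tolerated multiple failure once in 1000 years.
ffi, dt_device = failure_finding_interval(
    mtbf_device=10, mtbf_function=10, mtbf_multiple=1000
)

print(f"DTdevice = {dt_device:.2%}, formula applies: {linear_formula_applies(dt_device)}")
print(f"FFI = {ffi:.2f} years")
```

Returning the unavailability alongside the interval keeps the validity check visible, so an interval is not used when the required unavailability falls outside the range where the simple formula is expected to hold.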


Exercise 1
A small chemical plant has an eye bath to enable people to wash their eyes if dangerous chemicals
contaminate them. When asked what checks have been done on the eye bath in the past, the
maintenance department said “that’s the production department’s job”. However, production thought
the safety officer was doing it, who in turn thought it was “looked after by the preventive maintenance
system”. As a result, it appears that the eye bath has never been checked, at least on a routine basis.
The eye bath has been in place for eight years. A quick check now reveals that the eye bath is actually
in working order, so the only data we have about the reliability of this bath is that it has not failed in
eight years. Further investigation reveals that someone needed to use it in an emergency on two
occasions since it was installed.
The plant manager has asked you to set up a checking routine for this eye bath as a matter of urgency.
How often should the check be done?
The safety committee decided that they do not want the eye bath to be inoperable when it is needed
more than once in 1,000,000 years. A series of phone calls to other companies reveals 60 eye baths that
have been installed for a total of 720 years between them. Two of these have been found to be in a
failed state in that period.

FFI = (2 x MTBFdevice x MTBFfunction) / MTBFmultiple
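
One possible way to work the numbers for Exercise 1 is sketched below. It is not the official answer: it assumes that the two emergency uses in eight years give the demand rate on the eye bath, and that the pooled data from the other companies gives the device MTBF. Variable names are illustrative.

```python
# Demand on the eye bath: needed twice in the eight years since installation.
mtbf_function = 8 / 2            # years between demands
# Pooled data: 2 failed states found in 720 eye-bath years.
mtbf_device = 720 / 2            # years between device failures
# Safety committee target: inoperable when needed no more than once in 1,000,000 years.
mtbf_multiple = 1_000_000

# Required unavailability, then the failure-finding interval.
dt_device = mtbf_function / mtbf_multiple
ffi_years = 2 * mtbf_device * mtbf_function / mtbf_multiple
ffi_days = ffi_years * 365

print(f"DTdevice = {dt_device:.6%}")
print(f"FFI = {ffi_years:.5f} years, or about {ffi_days:.1f} days")
```

Under these assumptions the interval comes out at roughly one day, which is the kind of result that usually prompts a discussion about whether a check that frequent is practical or whether the system should be modified instead.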


Exercise 2
A tank is used to store diesel and is
enclosed in a concrete bund. This
is intended to prevent anything
which might escape from the tank
seeping into the ground and
breaching a variety of
environmental regulations. The
review group decides that they
would not like this to happen more
than once every 10,000 years.
A review of a number of similar
systems and discussions with users
suggest that a significant quantity
of liquid is likely to escape into the
bund no more than once every 150
years on average, usually due to leaks in pipeline flanges or seals. The integrity of the bund itself has
never been checked until now, but it can be done in a number of ways.
One is to fill the bund with water to a depth of (say) 100 millimeters, and check whether the water
level drops by more than the rate of evaporation over a period of (say) two days. Such a check is
carried out on the bund, and reveals that it is still intact.
So in the absence of any hard data at all, and after considerable discussion, the group decides that in
any one year the chance of the bund springing an invisible leak (due to subsidence, latent construction
defects or whatever) is “1 in 100”.

$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$
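As a rough check on how the formula is applied, the figures quoted in this exercise can be substituted directly. This is a sketch under two assumptions that the exercise does not state outright: the “1 in 100” yearly chance of a bund leak is read as a device MTBF of 100 years, and the once-in-150-years escape of liquid is read as the MTBF of the protected function.

```latex
\begin{align*}
\mathrm{MTBF}_{\mathrm{device}}   &= \frac{1}{1/100\ \text{per year}} = 100\ \text{years} \\
\mathrm{MTBF}_{\mathrm{function}} &= 150\ \text{years} \\
\mathrm{MTBF}_{\mathrm{multiple}} &= 10{,}000\ \text{years} \\
\mathrm{FFI} &= \frac{2 \times 100 \times 150}{10{,}000} = 3\ \text{years}
\end{align*}
```

Under this reading the bund would need to be checked about once every three years.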


Exercise 3
A steel-producing plant needs many product-handling assets to move raw iron ore around prior to
processing.
As part of this asset base, they have 10 large conveyors. Each of these has 4 e-stops, one on either
side of the head end and one on either side of the tail end of the conveyor.
Management has tasked the maintenance team with determining a frequency for testing the function
of each of these e-stops, to make sure that they will work when they are needed.
After some discussion, the team consulted relevant specifications and determined that they wanted
these e-stops to meet their SIL-2 classification. For this company that means a likelihood of
1:100,000 (1 in 10^5) that any one would have a failure in any one year.
They found that on their own plant they had never experienced a failure of one of the emergency stops.
However, on consulting a commercial data store they found the following information:
• A population was tested over a time period of 10^6 hours
• During this time the item was found to have failed 8 times in an undetected and unsafe manner
• And 60 times in a detected and safe manner
The e-stops were installed on all of the conveyors at roughly the same time, 20 years ago. After
conversations with a few of the longer-serving people, they were able to ascertain that an e-stop had
been needed, either to protect people or to protect life, approximately 15 times.
How frequently will they need to perform the detective task to maintain the level of risk that the
company has deemed tolerable?


[Diagram: a protected Function and its protective Device]

Device: Failure of the device has no consequences by itself…

Function: It needs a failure of the function before the consequences of a hidden failure are realised!

This is the multiple failure: the result of a protected function failing while the protective device
is in a failed state…

Therefore… for a detective maintenance task to be technically feasible we need to:
1. Ensure that the task will not increase the probability of a multiple failure
2. Determine whether it is practical to do the task at the desired intervals


Case Study - BP refinery Incident

• Indicator not designed to measure above 10 feet (Not fit for purpose)
• High/High level alarm did not work (Hidden Failure)
• Indicator not accurate, filled to 13 feet when indicating 10
• Valve for liquid flow left closed (Human Error)
• Reached 138 feet, indicator told operators 10 feet and falling (Hidden Failure)
• Pressure controlling valve didn’t work (Hidden Failure)
• High level alarm on the blow down drum did not work, the last line of defense… (Hidden Failure)


Managing Safety and Environmental Consequences

Effectiveness for Safety and Environmental Consequences

Hidden Safety or Environmental Consequences
  PTIVE / PRES / PREP: The failure management policy must reduce the risk of the multiple failure to a
  tolerable level.
  DTIVE: If a suitable proactive task cannot be found then the first option is to seek a failure-finding
  task that reduces the probability of multiple failure to a tolerable level.
  Redesign: Redesign is compulsory.

Evident Safety or Environmental Consequences
  PTIVE / PRES / PREP: The failure management policy must reduce the risk of failure to a tolerable level.
  Combination: Is a combination of tasks technically feasible and worth doing?
  Redesign: Redesign is compulsory.


Economic Consequences

The Economic Consequences of Failure

• A failure has economic consequences if it has a direct adverse effect on operational capability
– Reduced output
– Product quality considerations
– Poor customer service
– Increased operating costs
– Costs in terms of legal or regulatory charges
– Reputation costs


The Economic Consequences of Failure

• Two issues need to be considered when assessing the consequences of failure
– How much the failure costs each time it occurs
– How often it would occur if no attempt was made to prevent it
• As a result, the consequences should always be evaluated over a reasonable period of time


Effectiveness of Economic Consequences


Hidden Economic Consequences
  PTIVE / PRES / PREP: Over a period of time, the failure management policy must reduce the probability
  of a multiple failure (and associated total costs) to an acceptable minimum.

Evident Economic Consequences
  PTIVE / PRES / PREP: Over a period of time, the failure management policy must cost less than the cost
  of the operational consequences (if any) plus the total cost of repair.

Both rows
  Run to Failure: If there is no effective routine task then the initial default is Run-to-Failure.
  Redesign: If Run-to-Failure is not an option due to frequency of failure or other implications, then
  redesign may be desirable.
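The evident economic criterion is a cost comparison over time: what the failure costs each time it occurs, multiplied by how often it would occur, set against the cost of the policy. The Python sketch below illustrates that comparison on a per-year basis. The function names and every figure in it are hypothetical and chosen only for illustration.

```python
def annual_failure_cost(cost_per_failure, failures_per_year):
    """Expected yearly cost of a failure mode: cost each time x how often."""
    return cost_per_failure * failures_per_year


def policy_is_worth_doing(annual_policy_cost, cost_per_failure,
                          failures_per_year_unmanaged,
                          failures_per_year_managed=0.0):
    """True when the policy plus any remaining failures costs less per year
    than letting the failure occur with no attempt to prevent it."""
    unmanaged = annual_failure_cost(cost_per_failure, failures_per_year_unmanaged)
    managed = annual_policy_cost + annual_failure_cost(
        cost_per_failure, failures_per_year_managed)
    return managed < unmanaged


# Hypothetical figures, for illustration only.
cost_per_failure = 50_000    # operational consequences plus repair, per event
unmanaged_rate = 1 / 4       # one failure every 4 years with no prevention
managed_rate = 1 / 20        # one failure every 20 years with the task in place
annual_policy_cost = 3_000   # yearly cost of doing the task

print(annual_failure_cost(cost_per_failure, unmanaged_rate))
print(policy_is_worth_doing(annual_policy_cost, cost_per_failure,
                            unmanaged_rate, managed_rate))
```

If the policy fails this comparison, the default suggested by the table is Run-to-Failure, with redesign considered where that is not acceptable.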


RCM-DO-06 Applicability and Task Selection


The Seven Questions of RCM
(SAE JA1011, 5a.-5g., 2002)
1. What are the functions and associated desired standards of performance of the asset in its present
operating context? (Functions)
2. In what ways can it fail to fulfil its functions? (Functional Failures)
3. What causes each functional failure? (Failure Modes)
4. What happens when each failure occurs? (Failure Effects)
5. In what way does each failure matter? (Failure Consequences)
6. What should be done to predict or prevent each failure? (Proactive Tasks and Task Intervals)
7. What should be done if a suitable proactive task cannot be found? (Default Actions)

RCM Decision Algorithm
(Based on Example 2, SAE JA1012)

Applicable: Before selecting any failure management policy, analysts first need to determine whether
or not the task is actually possible!
Effective: Then they need to determine whether the task will be worthwhile in terms of either cost or
risk. (Based on the consequences)

Starting question
Will the loss of function caused by this failure mode on its own become evident to the operating crew
under normal circumstances?
• No: Hidden (columns HS, HE, HO, HN)
• Yes: Evident (columns ES, EE, EO, EN)

Consequence questions (asked in this order on both the Hidden and the Evident side)
1. Is there an intolerable risk that the failure could kill or injure someone?
   Yes: Safety (HS / ES). No: go to question 2.
2. Is there an intolerable risk that the failure could breach a known environmental standard or
   regulation?
   Yes: Environmental (HE / EE). No: go to question 3.
3. Does the failure have a direct adverse effect on operational capability?
   Yes: Operational (HO / EO). No: Non-Operational (HN / EN).

Task selection (work down the list and stop at the first Yes)
1. Is a Predictive (On-Condition) task technically feasible and effective?
   Yes: Predictive Task (HS1, HE1, HO1, HN1, ES1, EE1, EO1, EN1)
2. Is a Preventive Restoration task technically feasible and effective?
   Yes: Preventive Restoration Task (HS2, HE2, HO2, HN2, ES2, EE2, EO2, EN2)
3. Is a Preventive Replacement task technically feasible and effective?
   Yes: Preventive Replacement Task (HS3, HE3, HO3, HN3, ES3, EE3, EO3, EN3)
4. Hidden failures only: Is a Detective task to detect the failure technically feasible and effective?
   Yes: Detective Task (HS4, HE4, HO4, HN4)
   No, Hidden Safety or Environmental: Redesign is compulsory (HS5, HE5)
   No, Hidden Operational or Non-Operational: Run-to-Fail? Yes: Run-to-Fail (HO5, HN5).
   No: Redesign may be desirable (HO6, HN6)
5. Evident Safety or Environmental: Is a combination of tasks technically feasible and effective?
   Yes: Combination of Tasks (ES4, EE4). No: Redesign is compulsory (ES5, EE5)
6. Evident Operational or Non-Operational: Run-to-Fail?
   Yes: Run-to-Fail (EO4, EN4). No: Redesign may be desirable (EO5, EN5)

Technical Feasibility Criteria (Applicability Criteria)

Predictive Tasks
Is there a clear potential failure condition? What is it? What is the P-F interval? Is the interval long
enough for action to be taken to avoid or minimise the consequences of failure? Is the P-F interval
reasonably consistent? Is it practical to do the task at intervals less than the P-F interval?

Preventive Restoration Tasks
Is there an age at which there is an increase in the conditional probability of failure? (Life?) What is
this age? Do enough items survive to this age to satisfy the effectiveness criteria? Will the task
restore the original resistance to failure? When there are safety or environmental consequences, all
items need to survive to this age.

Preventive Replacement Tasks
Is there an age at which there is an increase in the conditional probability of failure? (Life?) What is
this age? Do enough items survive to this age to satisfy the effectiveness criteria? When there are
safety or environmental consequences, all items need to survive to this age.

Detective Tasks
Is it possible to check whether the item has failed without significantly increasing the risk of a
multiple failure? Is it practical to do the task at the required interval?

Run-to-Fail or a Combination of Tasks
For Hidden Safety & Environmental consequences, if no Failure Finding Task is feasible then re-design
is compulsory.
For Evident Safety & Environmental consequences, if no combination of tasks is feasible then re-design
is compulsory.
For Operational & Non-Operational consequences, re-design may be desirable rather than Run-to-Fail if
the economic consequences justify this.

Effectiveness Criteria

Hidden Economic Consequences
To be effective: Over a period of time, the failure management policy must reduce the risk of a multiple
failure (and associated total costs) to an acceptable minimum.

Hidden Safety and Environmental Consequences
To be effective: The failure management policy must reduce the risk of the failure to a tolerable level.

Evident Safety and Environmental Consequences
To be effective: The failure management policy must reduce the risk of the failure to a tolerable level.

Evident Economic Consequences
To be effective: Over a period of time, the failure management policy must cost less than the cost of the
operational consequences (if any) plus the total cost of repair.
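The task-selection part of the decision algorithm above is a fixed cascade, so it can be sketched in a few lines of Python. This is an illustrative sketch only, not part of SAE JA1012 or of any Meridium product. The function name, the `Consequence` enum and the task-name strings are assumptions. The caller is assumed to have already judged which tasks are both technically feasible and effective, and that judgement is the hard part that the code does not attempt.

```python
from enum import Enum


class Consequence(Enum):
    SAFETY = "S"
    ENVIRONMENTAL = "E"
    OPERATIONAL = "O"
    NON_OPERATIONAL = "N"


PROACTIVE_TASKS = ["predictive", "preventive restoration", "preventive replacement"]


def select_policy(evident, consequence, feasible):
    """Return (diagram node, policy) for one failure mode.

    evident     -- True if the failure is evident to the operating crew
    consequence -- a Consequence member
    feasible    -- set of task names already judged technically feasible
                   and effective (hypothetical labels, see PROACTIVE_TASKS)
    """
    prefix = ("E" if evident else "H") + consequence.value
    safety_env = consequence in (Consequence.SAFETY, Consequence.ENVIRONMENTAL)

    # Steps 1-3: proactive tasks, tried in the order shown on the diagram.
    for step, task in enumerate(PROACTIVE_TASKS, start=1):
        if task in feasible:
            return f"{prefix}{step}", f"{task} task"

    if not evident:
        # Step 4, hidden failures only: detective (failure-finding) task.
        if "detective" in feasible:
            return f"{prefix}4", "detective task"
        if safety_env:
            return f"{prefix}5", "redesign is compulsory"
        if "run-to-fail" in feasible:
            return f"{prefix}5", "run-to-fail"
        return f"{prefix}6", "redesign may be desirable"

    if safety_env:
        if "combination" in feasible:
            return f"{prefix}4", "combination of tasks"
        return f"{prefix}5", "redesign is compulsory"

    if "run-to-fail" in feasible:
        return f"{prefix}4", "run-to-fail"
    return f"{prefix}5", "redesign may be desirable"


# Example: a hidden safety failure where only a failure-finding task is feasible.
print(select_policy(False, Consequence.SAFETY, {"detective"}))
```

The returned node label uses the same codes as the diagram (for example HS4 or EO5), which makes it easy to trace a result back to the chart.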


Technically Feasible

A routine task is applicable if it is physically possible for the task to reduce the consequences of
the failure mode to an acceptable level.


Types of Maintenance
Each entry gives the RCM term, its abbreviation, the colloquial terms and what it is.

Predictive Maintenance (PTIVE)
  Colloquial terms: On-Condition Maintenance; Condition Based Maintenance (CBM); Condition
  Monitoring (CM); Inspections
  What it is: Check an item for signs of potential failures and leave it in place on the condition
  that it will make it to its next inspection interval. (Planned)

Preventive Restoration (PRES)
  Colloquial terms: Overhaul; Scheduled Restoration; Rework
  What it is: A task to restore an asset’s original resistance to failure prior to its failure. This
  is a preventive task. (Planned)

Preventive Replacement (PREP)
  Colloquial terms: Replacement; Overhauls (also)
  What it is: A task to replace an asset prior to its failure. This is a preventive task. (Planned)

Detective Maintenance (DTIVE)
  Colloquial terms: Failure finding; Function testing
  What it is: A task to detect whether an item has failed or not. (Planned)

Corrective Maintenance (CTIVE)
  Colloquial terms: Corrective; Run to failure (RTF)
  What it is: A task to correct failing or failed assets. (Planned)

Reactive Maintenance (Reactive)
  Colloquial terms: Breakdown; Shutdown
  What it is: A task to restore the function of an asset that is failing or has failed. (Unplanned)

RCM will always direct maintainers to choose a maintenance or operational activity over a redesign as it is
almost always the most cost effective means of managing failure.


Preventive Maintenance (PM’s)


Preventive maintenance tasks (PM’s) are routine actions that are taken to prevent failures.

Within reliability-centred maintenance there are two types of preventive tasks.
• Preventive Restoration – tasks to restore an item’s original resistance to failure.
• Preventive Replacement – tasks to replace an asset.

Characteristics of Preventive tasks…
• There must be an age where the conditional probability of failure will increase dramatically (a Life)
• The majority of items must survive until this point (only a few “random” failures)
• If the task is a restoration task then it needs to restore the item’s original resistance to failure…


Characteristics of Preventive tasks where the failure mode has safety or environmental consequences…
• There must be an age where the conditional probability of failure will increase dramatically (a Life)
• If the failure mode has safety or environmental consequences then all items must survive to this age!
• If the task is a restoration task then it needs to restore the item’s original resistance to failure…


How important is this…?

The 6 Failure Patterns
• Only 11% of failures in the original Nowlan and Heap report were related to age!
• 89% of failures were not related to age!
• In later published studies this number has ranged from 8 to 23% of all failures!

Yet despite knowing these facts many people are reluctant to let go of time based maintenance
(such as many scheduled shutdowns). Why?

Predictive Maintenance

• Nearly all failures give a warning that they are about to occur or are in the process of occurring.
• These warnings are known as potential failures
• Operators see potential failures (warning signs) all the time… but they are not sure what they are
warning them of…


Predictive Maintenance tasks

Items are checked for potential failures, and they are left in service on the condition that they
continue to meet satisfactory performance standards


Predictive Maintenance tasks

[Diagram: P-F curve with three marked points]
• The point where failure starts to occur
• The point where we can detect it (Potential Failure)
• The point where it no longer does what we want it to do (Functional Failure)


Predictive Maintenance tasks

[Diagram: P-F curve with detection points P1, P2 and P3. P-F Interval: 1 month. Marked interval: 2 weeks.
Source: Captured by Data, 2003, Daryl Mather]

Inspection Interval = Less than the P-F Interval
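The rule that the inspection interval must be less than the P-F interval can be turned into a quick check of how much warning an inspection regime leaves in the worst case. The Python sketch below assumes that the 2 weeks marked on the diagram is the inspection interval and that a month is 30 days. The function names and the `response_days` parameter are hypothetical. The worst case occurs when the potential failure becomes detectable just after an inspection, so it is only found at the next one.

```python
def worst_case_warning(pf_interval_days, inspection_interval_days):
    """Shortest time left between detecting a potential failure and the
    functional failure: P-F interval minus inspection interval."""
    if pf_interval_days <= 0 or inspection_interval_days <= 0:
        raise ValueError("intervals must be positive")
    if inspection_interval_days >= pf_interval_days:
        raise ValueError("inspection interval must be less than the P-F interval")
    return pf_interval_days - inspection_interval_days


def interval_is_practical(pf_interval_days, inspection_interval_days, response_days):
    """True if the worst-case warning still leaves enough time to act.

    response_days is the time needed to plan and carry out the corrective
    action (a hypothetical input supplied by the analyst).
    """
    warning = worst_case_warning(pf_interval_days, inspection_interval_days)
    return warning >= response_days


pf_days = 30          # P-F interval of about 1 month
inspection_days = 14  # inspection every 2 weeks

print(worst_case_warning(pf_days, inspection_days))
print(interval_is_practical(pf_days, inspection_days, response_days=10))
```

A shorter inspection interval increases the worst-case warning time, at the price of more inspections.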

Predictive Maintenance
• The P-F interval is long enough for action to be taken to avoid, eliminate or minimise the consequences of failure
• There is a clear potential failure condition (in other words there is a clear warning of the onset of failure)
• The P-F interval is reasonably consistent
• A task can be done at intervals less than the P-F interval

[Figure: The P-F Interval. Resistance to failure plotted against time or task intervals, marking the point where the potential failure is identified and the point where the functional failure occurs]


Predictive Maintenance tasks

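[Figure: two P-F curves, each with potential failure points P1, P2 and P3. One has a P-F interval from P1 of 1 month and an inspection interval of 2 weeks. The other has a P-F interval from P1 of 10 months and an inspection interval of 3 months. Source: Captured by Data, 2003, Daryl Mather]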
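The rule that the inspection interval must be shorter than the P-F interval can be checked mechanically. The sketch below is illustrative only: the class name `OnConditionTask`, its fields, and the conversion of 1 month to roughly 4 weeks are assumptions made for the example, not part of the training material.

```python
from dataclasses import dataclass


@dataclass
class OnConditionTask:
    """A predictive (on-condition) task; all durations are in weeks."""
    name: str
    pf_interval_weeks: float
    inspection_interval_weeks: float

    def is_interval_valid(self) -> bool:
        # Rule from the slide: inspect at intervals less than the P-F interval.
        return self.inspection_interval_weeks < self.pf_interval_weeks

    def inspections_per_pf_interval(self) -> float:
        # How many inspections fit inside one P-F interval.
        return self.pf_interval_weeks / self.inspection_interval_weeks


# Assumption for illustration: 1 month is treated as about 4 weeks.
short_pf = OnConditionTask("P-F interval of 1 month", 4, 2)
long_pf = OnConditionTask("P-F interval of 10 months", 40, 12)

for task in (short_pf, long_pf):
    print(task.name, task.is_interval_valid(),
          round(task.inspections_per_pf_interval(), 2))
```

In both slide examples more than one inspection falls inside the P-F interval, so a potential failure should be seen at least once before the functional failure occurs.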


Condition Monitoring

Product Quality Monitoring

The Human Senses


Detective Maintenance

Detective Maintenance Tasks

Failure of the device has no consequences by itself…

It needs a failure of the function before the consequences of a hidden failure are realised!

This is the multiple failure. The result of a protected function failing while the protective device is in a failed state…

[Figure: protected Function and protective Device]

Therefore… for a detective maintenance task to be technically feasible we need to:
1. Ensure that the task will not increase the probability of a multiple failure
2. Determine whether it is practical to do the task in the desired intervals

Exercise 1 – Task Categories


1. Once every three years the flow rate of a wastewater pumping station is checked to see if it has
deteriorated at all. If it has, the team plans to replace the impeller within two weeks.

2. The drive end bearing of a motor is greased every 3 weeks.

3. A hydraulic oil system provides pressure to drive the hydraulic motor that powers an apron
feeder. Every so often there is a differential pressure alarm, which signals when the filter is no
longer able to filter to the correct level and rate. When this occurs, the maintenance team
cleans the filter.

4. A weight meter in a product handling plant is routinely calibrated to ensure that the production
(profitability) of the plant is accurately measured.

5. A large DC motor regularly requires the commutator to be skimmed to prevent flashovers
between the brush holders via the commutator.

6. A tank contains corrosive acid, which is prohibited by law from seeping into the ground.
A task has been scheduled to perform a seepage test on this tank every 4 years.


Exercise 2 – Which type of maintenance?

The bearing in the housing of a generator has failed once in the past, causing an outage of two days to
production at a cost of several million dollars.
In order to avoid this occurring again, the maintenance department is tasked with finding a task that will
predict or prevent this occurring in the future.
They decide to strip down the generator once every two years and to perform a dye-penetrant check on the
bearing to search for cracks or fissures, primarily on the races.

We know that once cracks are able to be detected via the dye-penetrant test, the
bearing usually has around 3 months left prior to total failure.

Q1. What type of task are they suggesting; and


Q2. Will it solve their problem?


The Basis of Task Preference

• On-condition Tasks – Identify failure at the potential failure stage.
Reduce the likelihood of safety, environmental and operational consequences.
Also reduce operational costs by allowing equipment to realise most of its useful life.

• Preventive Restoration – When directed at specific components and parts it will lead to a reduction
in the overall failure rate of items that have a dominant failure mode.

• Preventive Replacement – Least favoured of the three.
Can reduce safety related consequences in some failure modes.
However, it has a higher cost of execution (reduced cost effectiveness).

• Detective Tasks – If the other three are not able to be applied, then this is the best option for hidden
failures.
If the frequency is practical, it can be done safely and the task itself does not substantially increase
the risk of failure, then this is the selected option.
Unlike the other three tasks this option will leave the function in a state of unavailability for a period.


RCM-DO-06c Uses of MTBF


Mean Time Between Failures, or MTBF [4], is one of the most widely used metrics in physical asset
management. Generally, companies use it as a guide to the performance of their physical assets,
helping them to identify assets or processes that are causing lost revenue or cost related issues.
However, although widely applied, MTBF is still the subject of some confusion. Moreover, MTBF is
useful for a range of different purposes, giving organizations greater ability to increase the net present
value of their physical asset base.
When companies first look at implementing MTBF, they tend to ask three fundamental questions:
1) what can MTBF tell us about our assets,
2) what levels can it be applied at, and
3) how can MTBF be used to add value to our reliability initiatives?
What can MTBF tell us?
The standard use for MTBF in industry is to tell us the performance of the primary function of an asset
or system.
Figure 1 - Example System

[Figure: duty and standby pumps feeding a tank fitted with High-High, High, Low and Low-Low level switches; off-take of 800 l/m]
For example, a pumping system consists of a duty/standby pump arrangement, a pressure relief valve, piping, and the tank
and associated level switches. The primary function for this system is to pump water to tank B at a rate of between 900
l/minute and 1000 l/minute. In this case, a failure occurs when the pump system is unable to pump water at the required
rate for whatever reason.

[4] This module deals with MTBF in isolation and does not discuss other metrics such as MTTF (Mean Time To
Failure) or MTTR (Mean Time To Repair).


Here, we can calculate the MTBF as follows:

MTBF = Total Time Required / Number of Failures

So if the total time that we required the pump system to deliver this function was (say) 5 years, and we had 4 failures in
that time, the average time between failures would be 5/4 = 1.25 years.
If this were the mean time between failures, then the failure rate for one year would be 1/MTBF or 1/1.25, which is 0.8, or
80% likelihood of experiencing a failure of the primary function in one year.
If we then wanted to convert this into months we would first convert the MTBF figure to months, 1.25 years = 15 months,
then again determine the likelihood of this occurring in one month: 1/15, or 0.067.
This means there is roughly a 6.7% likelihood of experiencing a failure resulting in the loss of the primary function in any
given month. We could do the same for a week, a day or any other given period.
The above example shows us that initial uses of MTBF can provide us with the average time between
failures [5] for a given time period, and that this can then be manipulated to give us a failure rate [6] for any
specified period of time.
Thus, for one measurement of MTBF we are able to calculate the following information:
• MTBF of the Primary Function = 1.25 years
o Likelihood of a failure in one year = 1/1.25 years (80% or 8 x 10^-1)
o Likelihood of a failure in one month = 1/15 months (6.7% or 6.7 x 10^-2)
o Likelihood of a failure in one day = 1/456.25 days (0.22% or 2.2 x 10^-3)
o Likelihood of a failure in one hour = 1/10950 hours (0.009% or 9 x 10^-5)
At all times the formula takes into account the total time of the function, not of the asset itself. This
means that regardless of the number or type of assets in the system, the calculation always uses the
total time required of the function, or 5 years in this example.
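The arithmetic above can be reproduced in a few lines. This is a minimal sketch: the function names are illustrative, and the unit conversions assume 12 months and 365 days per year, which matches the figures quoted. The last function is an addition that is not in this module: it assumes a constant failure rate (an exponential model), under which the chance of at least one failure in a period is 1 - exp(-period / MTBF). For short periods this is close to 1/MTBF, but for one year at an MTBF of 1.25 years it gives about 55% rather than 80%, so the simple ratio is best read as an expected number of failures per period.

```python
import math

MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24


def mtbf(total_time_required: float, number_of_failures: int) -> float:
    """MTBF = total time required of the function / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("MTBF is undefined when no failures have been recorded")
    return total_time_required / number_of_failures


def failure_rate(mtbf_value: float) -> float:
    """Failures per unit of time, in the same unit as the MTBF (1 / MTBF)."""
    return 1.0 / mtbf_value


def probability_of_failure(mtbf_value: float, period: float = 1.0) -> float:
    """Assumption (not from the module): constant failure rate, exponential model."""
    return 1.0 - math.exp(-period / mtbf_value)


mtbf_years = mtbf(5, 4)  # 5 years required, 4 failures
mtbf_by_unit = {
    "year": mtbf_years,
    "month": mtbf_years * MONTHS_PER_YEAR,
    "day": mtbf_years * DAYS_PER_YEAR,
    "hour": mtbf_years * DAYS_PER_YEAR * HOURS_PER_DAY,
}

for unit, value in mtbf_by_unit.items():
    print(f"per {unit}: MTBF = {value:g}, "
          f"rate = {failure_rate(value):.2e}, "
          f"P(at least one failure) = {probability_of_failure(value):.2e}")
```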
At what level can we apply MTBF?
Like many other metrics in physical asset management, MTBF is applicable at any level throughout the
asset base.
However, for performance measurement there are two rules for its application:
1. it is always used to measure the function of the asset where it is being applied, and
2. it always uses the total time required of the function of the level where it is applied.
For instance, in the example given above we determined that the MTBF for the pumping system was
1.25 years, and we were then able to derive failure rates for various other periods.
In addition, we can also apply this to the assets in the system, as demonstrated in Table 1.

[5] MTBF = Total Time Required / Number of Failures
[6] Failure rate = 1 / MTBF


Table 1 - Component MTBF


Asset | Function | Total Time Required | Number of Failures | MTBF | Annual Failure Rate
Duty Pump | To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | 5 years | 7 | 0.714 years | 140%
Stand By Pump | To maintain 800 l/minute to 1000 l/minute if the duty pump fails | 5 years | 2 | 2.5 years | 40%
Piping | To provide clear access for 800 l/minute to 1000 l/minute from the pump sets to the tank | 5 years | 1 | 5 years | 20%
High-High Level Switch | To trip the pumping system when water reaches the high-high level | 5 years | 1 | 5 years | 20%
High Level Switch | To shut off the pump when the tank level reaches the high level | 5 years | 1 | 5 years | 20%
Low Level Switch | To turn on the pump when the tank has been drained to the low level | 5 years | 1 | 5 years | 20%
Low-Low Level Switch | To alarm when the tank level has been drained to the low-low level | 5 years | 0 | 5 years | 20%
Tank | To contain up to 250,000 liters of water | 5 years | 0 | 5 years | 20%
Pumping System | To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | 5 years | 4 | 1.25 years | 80%
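The component figures in Table 1 follow from the same two formulas, MTBF = total time required / number of failures and annual failure rate = 1 / MTBF. The sketch below recomputes them from the failure counts. The dictionary and function names are illustrative. One point is an assumption of this sketch rather than a statement from the module: where no failures were recorded, MTBF cannot be calculated, so the 5-year observation period is treated as a lower bound on MTBF (and 20% as an upper bound on the annual rate).

```python
TOTAL_TIME_YEARS = 5  # total time required of each function, from Table 1

# Failures recorded per asset over the period, copied from Table 1.
failures_by_asset = {
    "Duty Pump": 7,
    "Stand By Pump": 2,
    "Piping": 1,
    "High-High Level Switch": 1,
    "High Level Switch": 1,
    "Low Level Switch": 1,
    "Low-Low Level Switch": 0,
    "Tank": 0,
}


def mtbf_and_rate(total_time: float, failures: int) -> tuple[float, float, bool]:
    """Return (MTBF in years, annual failure rate, is_bound).

    With no recorded failures the values are only bounds (assumption of
    this sketch): MTBF is at least the observation period.
    """
    if failures == 0:
        return total_time, 1.0 / total_time, True
    value = total_time / failures
    return value, 1.0 / value, False


for asset, failures in failures_by_asset.items():
    value, rate, is_bound = mtbf_and_rate(TOTAL_TIME_YEARS, failures)
    note = " (bound only, no failures recorded)" if is_bound else ""
    print(f"{asset}: MTBF {value:.3f} years, annual rate {rate:.0%}{note}")

# Component failures only; the Pumping System row is a system-level count.
component_failures = sum(failures_by_asset.values())
print("Component failures in total:", component_failures)
```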

Table 1 contains some information that should immediately provoke some questions. For example, we
have counted four failures in our system level MTBF, yet the table contains 13 component failures
(not counting the system failures).
To understand this we need to review the functions for each of the components mentioned.
For example, the function of the High-High Level Switch is to trip the pumping system when water
reaches the high-high level. If there is a failure preventing this asset from performing its function, it
will not prevent the system from pumping water. We have had one failure on the switch that we know
about in this period.


Another obvious issue is the fact that we have had seven failures of the Duty Pump. However, during
this time we know of only two failures on the Standby pump, which performs a dormant function.
As this system has redundancy built into it, we can only experience a loss of the primary
function if we have a failure of the Duty pump and the Standby pump at the same time.
The four failures causing the loss of function at the system level were:
• One multiple failure of the duty and standby pump
• One failure of the High Level switch, meaning the level reached the High-High level once during
the 5-year period.
• One failure of the Low Level switch, resulting in the Low-Low level tripping the downstream
process
• One failure of the piping causing downtime
Figure 2 - MTBF at Different Levels

[Figure: failure tree showing MTBF at each level]
System: MTBF 1.25 years; in any year 8 x 10^-1
  Multiple Pump Failure: 1:4.17 in any year, or 2.4 x 10^-1
    Duty Pump: MTBF 1.67 years; in any year 6 x 10^-1
    Standby Pump: MTBF 2.50 years; in any year 4 x 10^-1
  High Level Switch: MTBF 5 years; in any year 2 x 10^-1
  Low Level Switch: MTBF 5 years; in any year 2 x 10^-1
  Piping: MTBF 5 years; in any year 2 x 10^-1

All the other failures mentioned were either hidden from the operations team until revealed by inspection,
or their function was protected by other assets (in the case of the failures on the Duty Pump).
As shown in Figure 2, MTBF is useful at any level throughout an asset base. However, its application
must be on the functions of the assets, and the total time required of each function, at each level of
performance measurement.
How can MTBF add value to Reliability Initiatives?
In the hands of a skilled RCM facilitator the measurement and manipulation of MTBF can be used to
set the performance expectations of the physical asset base, as well as providing a base for evaluation
of strategies, and to indicate the overall performance of assets; not just the performance of their
functions.


This helps organizations in the change process because they begin to think about what the assets do,
rather than what they are. That is, an appreciation of functional performance as opposed to asset
performance.
For example, in the system described in Figure 1 we can break the system down into its functions, and begin to assign
performance expectations to each of these. [7]
Function 1 - To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute
Functional Failure 1.A – Does not pump water at all
The water pump in this example provides, say, the cooling water for a petrochemical plant. If the system is unable to pump
water, there will be a loss of production. The tank contains enough water to keep the plant running for a minimum period
of 2 hours, and a maximum period of 6 hours.
A multiple failure of both pumps would nominally result in a loss of production equal to, say, USD $2,000,000. In this case
the asset owner would like to keep the likelihood of this occurring to a reasonably low level and after some discussion he
decides on a level of 1:10,000 years, or an annual rate of 10^-4.
This means managing all failure modes that cause this consequence (an adverse impact on operational capability) to the
same level of likelihood.
Function 2 – To trip the pumping system when water reaches the high-high level
Functional Failure 2.A – Does not trip when the water reaches the high-high level.
In the case of the water system, an overflow of the tank would result in water in the surrounding area. While this is a slip
hazard for employees sent to correct the issue, the asset owner does not regard it as a serious hazard, nor will it result in
any damage to additional equipment.
The failure mode is dormant, meaning it will only have consequences when there is a failure of both the high-level switch
and the high-high level switch. In this particular case, the asset owner is at ease accepting a higher level of risk of
occurrence, say, one in every 100 years, or a likelihood of 10^-2 in any one year.
Function 3 – To alarm when the tank level is at the low-low level
Functional Failure 3.A – Does not alarm when the tank is at the low-low level.
As with the High-High protection, this alarm is only required once there has already been a failure of some sort, in this
case, notably a failure of the Low-Level Switch.
If this were to occur, and the tank consequently ran dry, the results would be catastrophic in financial terms. The
downstream equipment would run dry, and the plant would be without cooling water, forcing a loss of production estimated
at around 3 days or USD $6,000,000 in this case. There would also be damages conservatively estimated at
USD $1,500,000 for producing assets.
The asset owner sees this as the worst possible outcome of a failure of this system. As a result, he would like to keep the
likelihood of failure at 1:100,000 years, or 10^-5 per year. The resulting performance expectations of failure modes are in
Table 2 below.
We can see that the failure rates of the failure modes contributing to the loss of a function must sum to the desired
failure rate, or risk, at the above level (assuming these are all the relevant failure modes).

[7] Full details about how to construct a risk profile based on performance expectations are contained in module
RCM-DO-05a Tolerable Levels of Risk (A Study of Industry).


Table 2 - Functional MTBF


Function | Failure | Desired Failure Rate | Existing Annual Failure Rate
To pump water to tank B at a rate of between 900 l/minute and 1000 l/minute | (all failure modes below) | Desired failure rate is 10^-4, therefore every failure mode underneath must be managed to at least 2.5 x 10^-5 to ensure this level is reached. |
 | Multiple Pump Failure | 2.5 x 10^-5 | 1:1.67 x 1:2.5 = 1:4.17, or 2.4 x 10^-1
 | Piping | 2.5 x 10^-5 | 1:5
 | High Level Switch | 2.5 x 10^-5 | 1:5
 | Low Level Switch | 2.5 x 10^-5 | 1:5
To trip the pumping system when water reaches the high-high level | High-High Level Switch | 10^-2 | 1:5 x 1:5 = 1:25
To alarm when the tank level has been drained to the low-low level | Low-Low Level Switch | 10^-5 | 1:5 x 1:5 = 1:25

Here we can see the desired failure rates set out in Table 2 for each function, and translated into a
performance requirement for each failure mode.
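The translation from a function-level target to a failure-mode requirement can be sketched as follows. It applies the rule stated earlier, that the failure-mode rates must sum to the desired rate for the function, and assumes the target is split equally across the failure modes. The equal split and the helper names are assumptions of this sketch. The second helper multiplies "1 in N years" figures together, which is how the existing rates of combined failures are derived in Table 2.

```python
def allocate_equally(target_rate: float, failure_modes: list[str]) -> dict[str, float]:
    """Split a function-level annual target equally across its failure modes."""
    share = target_rate / len(failure_modes)
    return {mode: share for mode in failure_modes}


def combined_annual_rate(*one_in_n_years: float) -> float:
    """Multiply '1 in N years' likelihoods, e.g. (5, 5) means 1:5 x 1:5."""
    rate = 1.0
    for n in one_in_n_years:
        rate *= 1.0 / n
    return rate


pump_function_modes = [
    "Multiple Pump Failure",
    "Piping",
    "High Level Switch",
    "Low Level Switch",
]
allocation = allocate_equally(1e-4, pump_function_modes)
for mode, rate in allocation.items():
    print(f"{mode}: desired annual rate {rate:.1e}")

# Existing likelihoods, using the MTBF figures for the pumps and switches.
multiple_pump = combined_annual_rate(1.67, 2.5)
switch_pair = combined_annual_rate(5, 5)
print(f"Multiple pump failure: 1 in {1 / multiple_pump:.2f} years")
print(f"Two level switches failing together: 1 in {1 / switch_pair:.0f} years")
```

Comparing the allocated targets with the existing rates shows the size of the gap that the failure management policies need to close.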
We can also record actual MTBF measures against this to see how effective we have been in managing
the failures of this asset to the desired levels of performance. However, this would only be a guide,
because the measured MTBF covers only the period since measurement began. The best use of this
approach is to provide valuable input for RCM analysts, as well as for other applications within the
reliability field. It would also give asset owners a pre-determined risk envelope that they require their
assets to work within, increasing their control over asset performance, and hence over corporate
profitability.
Summary
MTBF is an exceptionally useful metric in the field of physical asset management and it is possible to
apply it at any level throughout the physical asset base.
The principal benefit of wide-ranging use of MTBF is that it begins the process of focusing a company
on how the assets work to fulfill a function, rather than what those assets actually are. This is one of
the fundamental concepts of Reliability-centered Maintenance.


As such, at whatever level it is applied, MTBF measures the function performed by that asset, asset
system, or entire process. It is also useful for proactively establishing the performance expectations of
the asset base, particularly in the areas of the Efficiency function.


RCM-DO-06d Advanced Detective Maintenance Techniques

Hidden Failure: Failure of pressure release valve on high pressure vessel in a gas plant.
Multiple Failure Event: Dangerous build-up of gas pressure within the pressure vessel.
Consequence: Explosion of the pressure vessel when under high pressure conditions.

[Figure: pressure vessels with lines to process. Source: The Maintenance Scorecard, Daryl Mather, Industrial Press, ISBN 0831131810]

If the ultimate high level switch fails closed, it is evident…
If the ultimate high level switch fails open, then nobody knows it has failed…

[Figure: tank with an inflow of 1000 l/m and an off-take of 800 l/m, fitted with four level switches]
• Ultimate high level switch (normally open): shuts the pump off until manually reset.
• High level switch: shuts off the pump until the low level switch turns it back on again.
• Low level switch: turns on the pump until the level reaches the high level switch.
• Low low level switch: turns off the pump until manually reset.


The probability that the protected function will fail in any one cycle is given by its failure rate

[Figure: timeline over one year showing when Protected Function B fails and when Protective Device C fails]

If the failure rate is once in four years, then the probability that it
will fail in one year is 1 in 4.

(This corresponds to a mean time between failures of four years)


The probability that the protective device will be in a failed state at any point in time is given by its downtime
(if it conforms to a random failure pattern)

[Figure: timeline over one year showing when Protected Function B fails and when Protective Device C fails]

If the downtime is 33% then the probability that it will be in a failed state at any point in time is 1 in 3.
(This corresponds to an availability of 67%)

The Probability of a Multiple Failure

[Figure: timeline over one year]
Protected Function B: Mean Time Between Failures = 4 years
Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state:

1 in 4 x 1 in 3 = 1 in 12
(In other words there is a one in twelve chance that the multiple
failure will occur in any one year)
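The multiplication on this slide can be written out directly. This is a minimal sketch: the function name is illustrative, the 33% downtime is treated as exactly one third as on the slide, and exact fractions are used so the result reads as "1 in N".

```python
from fractions import Fraction


def multiple_failure_probability(mtbf_function_years: int,
                                 device_unavailability: Fraction) -> Fraction:
    """Annual probability that the protected function fails while the
    protective device is in a failed state."""
    annual_failure_probability = Fraction(1, mtbf_function_years)
    return annual_failure_probability * device_unavailability


# Protected function B: MTBF of 4 years. Protective device C: down 1/3 of the time.
p = multiple_failure_probability(4, Fraction(1, 3))
print(f"1 in {p.denominator}")  # prints "1 in 12"
```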


When developing a failure management policy for a hidden function, the first stage is to decide what probability
we are prepared to tolerate for a multiple failure…

[Figure: timeline over one year]
Protected Function B: Mean Time Between Failures = 4 years
Protective Device C: Availability = 67%, Downtime = 33%

The probability that B will fail while C is in a failed state:

1 in 4 x 1 in 3 = 1 in 12
Prepared to accept 1 in 1000


Reducing the Probability of a Multiple Failure

Reduce the probability of failure of the protected function
(by applying a suitable failure management policy)

[Figure: timeline over one year]
Protected Function B: Mean Time Between Failures = 10 years
Protective Device C: Availability = 99%, Unavailability = 1%

And/or by increasing the availability of the protective device:
- by preventing the failure of the protective device, or
- by periodically checking whether the protective device is still working and
repairing it if it has failed
- by modifying the system in some way

The probability that B will fail while C is in a failed state is now
1 in 10 x 1 in 100 = 1 in 1000

• 6 identical PSVs have each been checked once a year for 5 years (FFI = 1 year)

• So the devices have been in service a total of 30 years

• In that time 3 were found to be in a failed state

• So the MTBF of the devices (MTBFdevice) is 30 years / 3 failures = 10 years

• We know that the failed devices failed some time during the year before the checks, but not when…

• It seems reasonable to assume that each failed device was down for an average of 6 months

[Figure: PSVs installed between "From Process" and "To Process", with a timeline for devices 1 to 6 across Year 1 to Year 5]

1. So the total downtime (DTdevice) was 1.5 years out of 30, or 5%

2. So on the basis of these figures it appears that: FFI = 2 x DTdevice x MTBFdevice

3. This is generally true if DTdevice < 5% and the failures of the device are random
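The PSV example can be reproduced step by step to confirm that the formula returns the 1-year interval that was actually used. The function and variable names below are illustrative. The sketch relies on the same assumption as the slide: a device found failed at a check is taken to have been down for half of the failure-finding interval on average.

```python
def estimate_from_inspections(devices: int, years_checked: float,
                              failures_found: int, ffi_years: float):
    """Return (MTBF of the device, unavailability) from inspection history."""
    service_years = devices * years_checked            # 6 x 5 = 30 years
    mtbf_device = service_years / failures_found       # 30 / 3 = 10 years
    # Assumption from the slide: each failed device was down for half
    # of the failure-finding interval on average.
    downtime_years = failures_found * ffi_years / 2    # 3 x 0.5 = 1.5 years
    unavailability = downtime_years / service_years    # 1.5 / 30 = 5%
    return mtbf_device, unavailability


def failure_finding_interval(unavailability: float, mtbf_device: float) -> float:
    """FFI = 2 x DTdevice x MTBFdevice."""
    return 2 * unavailability * mtbf_device


mtbf_device, dt_device = estimate_from_inspections(
    devices=6, years_checked=5, failures_found=3, ffi_years=1)
ffi = failure_finding_interval(dt_device, mtbf_device)
print(f"MTBFdevice = {mtbf_device:g} years, DTdevice = {dt_device:.0%}, "
      f"FFI = {ffi:g} years")
```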


[Figure: a one-year timeline. Protected function B fails (mean time between failures = 10 years) while protective device C is in a failed state: 1 in 10 x 1 in 100 = 1 in 1000]

Step One: Decide what probability we tolerate for the multiple failure
Step Two: Determine / estimate how often the protected function is likely to need the protective device
Step Three: Calculate what unavailability of the protective device enables us to achieve 1 given 2

If
DTdevice = Unavailability of the protective device
MTBFfunction = MTBF of the protected function
MTBFmultiple = MTBF of the multiple failure
then
(1/MTBFfunction) x DTdevice = 1/MTBFmultiple
or
DTdevice = MTBFfunction / MTBFmultiple
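
Step Three can be written as a one-line calculation. The sketch below is a minimal illustration; the function name is an assumption, and the inputs are the figures shown on the slide (a protected function that fails once in 10 years and a tolerated multiple failure of once in 1000 years).

```python
def required_unavailability(mtbf_function, mtbf_multiple):
    """Return DTdevice = MTBFfunction / MTBFmultiple.

    Both inputs must use the same time unit (years here).
    """
    if mtbf_function <= 0 or mtbf_multiple <= 0:
        raise ValueError("MTBF values must be positive")
    return mtbf_function / mtbf_multiple


# Figures from the slide: function fails 1 in 10 years, multiple failure 1 in 1000 years.
dt_device = required_unavailability(mtbf_function=10, mtbf_multiple=1000)
print(f"Allowed unavailability of the protective device: {dt_device:.1%}")
```

The result is the "1 in 100" unavailability shown in the figure.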


We have seen that…

FFI = 2 x DTdevice x MTBFdevice …1

…and that…

DTdevice = MTBFfunction / MTBFmultiple …2

Therefore, substituting 2 into 1 gives…

$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{function}} \times \mathrm{MTBF}_{\mathrm{device}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$

Where:
• FFI = failure finding task interval
• DTdevice = Unavailability of the protective device
• MTBFdevice = MTBF of the protective device
• MTBFfunction = MTBF of the protected function
• MTBFmultiple = MTBF of the multiple failure
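
The combined formula is easy to put into a small helper. The sketch below is an illustration only: the function names and the input figures are hypothetical, and the 5% check simply mirrors the validity condition stated earlier for FFI = 2 x DTdevice x MTBFdevice.

```python
def failure_finding_interval(mtbf_device, mtbf_function, mtbf_multiple):
    """Return FFI = 2 x MTBFdevice x MTBFfunction / MTBFmultiple.

    All three inputs must be in the same time unit (years here);
    the result is in that unit too.
    """
    return 2 * mtbf_device * mtbf_function / mtbf_multiple


def implied_unavailability(ffi, mtbf_device):
    """Return DTdevice = FFI / (2 x MTBFdevice), rearranged from equation 1."""
    return ffi / (2 * mtbf_device)


# Hypothetical inputs, all in years (not taken from the course material).
ffi = failure_finding_interval(mtbf_device=200, mtbf_function=50, mtbf_multiple=20_000)
dt_device = implied_unavailability(ffi, mtbf_device=200)

print(f"FFI = {ffi:.2f} years")
print(f"Implied unavailability = {dt_device:.2%}")
if dt_device >= 0.05:
    print("Warning: unavailability is 5% or more, so the linear formula may not hold")
```

Keeping the unavailability check next to the interval calculation is a design choice: it flags cases where the approximation behind the formula is being stretched.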


Exercise 1 – Steam Turbine


The function of a speed sensor on a large steam turbine (680MW) is to measure the rotational speed of
the turbine and to shut off the steam supply if the speed exceeds a specified limit.
The multiple failure which could occur if this mechanism does not work when required is that the
turbine could speed up to the point where centrifugal forces cause it to disintegrate.
The electric utility which operates the turbine decides that they will accept a probability of the
multiple failure of once in 100,000 years for any one turbine.
The utility has twenty similar turbines in operation for an average of ten years each, giving a total of
200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out
due to over-speeding during this period. This corresponds to an MTBF of the protected function of 100
years for any one turbine.
The utility has never found one of the over speed mechanisms to be in a failed state when they have
carried out failure finding checks on their own machines, but data from a commercial data bank
indicate an MTBF of 500 years.
How often should the utility perform a failure finding task on the over speed mechanism in order to
reduce the probability of the multiple failure to the desired level?

$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$


Exercise 2 – Steel Plant


A steel producing plant has a need for many
product-handling assets to move around the raw
iron ore prior to processing.
As part of this asset base, they have 15 large
conveyors. Each of these has 4 e-stops, one on
either side of the head end, and one on either
side of the tail end of the conveyor.
The management has tasked the maintenance team with determining a frequency for testing the function of each of these e-stops, to make sure that when they are needed they will work.
After some discussion, they consulted relevant specifications and determined that they wanted these e-stops to meet their SIL-2 classification. For this company that means a likelihood of 1:1,000,000 (1 in 10^6) that any one would have a failure in any one year.
They found that on their own plant they had never experienced a failure of one of the emergency stops. However, on consulting a commercial data store they found the following information:
• A population was tested over a time period of 10^8 hours
• During this time the item was found to have failed 14 times in an undetected and unsafe manner,
• And 60 times in a detected safe fashion
All of the conveyors were installed at roughly the same time, 18 years ago. After conversations with a few of the longer serving people they were able to ascertain that an e-stop had been needed, either to protect people or to protect life, approximately 12 times.
How often will they need to perform the detective task to maintain the level of risk that the company has deemed tolerable?


Common Cause Failure Modes

Calculation 2

[Figure: protective device U1 with several failure modes (Failure Mode 1 … n)]

• Managing more than one hidden failure…
– Any failure mode could take out the protective function
– Any failure modes that can be managed via predictive or preventive routines should be managed that way
– All failure modes can be managed via one detective maintenance task
– The detective task does not increase the likelihood of a multiple failure
– It is practical to do the task at the required interval


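Calculation 2

[Figure: protected function B fails; protective device C fails through failure mode U1, U2 or U3]

We saw previously that…

$$\frac{1}{\mathrm{MTBF}_{\mathrm{multiple}}} = \frac{1}{\mathrm{MTBF}_{\mathrm{function}}} \times \mathrm{DT}_{\mathrm{total}}, \qquad \mathrm{DT}_{\mathrm{total}} = \mathrm{DT}_1 + \mathrm{DT}_2 + \mathrm{DT}_3$$

Therefore…

$$\frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}} = \mathrm{DT}_1 + \mathrm{DT}_2 + \mathrm{DT}_3$$

We can deduce from what we also saw previously…

$$\mathrm{DT}_{\mathrm{device}} = \frac{\mathrm{FFI}}{2 \times \mathrm{MTBF}_{\mathrm{device}}}$$

If we call the MTBF of each of the three failure modes MD1, MD2 and MD3 respectively then…

$$\frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}} = \frac{\mathrm{FFI}}{2 \times \mathrm{MD}_1} + \frac{\mathrm{FFI}}{2 \times \mathrm{MD}_2} + \frac{\mathrm{FFI}}{2 \times \mathrm{MD}_3}$$

Therefore…

$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}} \times \left(\dfrac{1}{\mathrm{MD}_1} + \dfrac{1}{\mathrm{MD}_2} + \dfrac{1}{\mathrm{MD}_3}\right)}$$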
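
The formula above extends naturally to any number of failure modes by summing 1/MD over all of them. The sketch below assumes that generalisation; the function name and the input figures are hypothetical and chosen only to show the mechanics.

```python
def ffi_multiple_failure_modes(mtbf_function, mtbf_multiple, failure_mode_mtbfs):
    """Return FFI = 2 x MTBFfunction / (MTBFmultiple x sum(1 / MDi)).

    failure_mode_mtbfs holds the MTBF of each hidden failure mode (MD1, MD2, ...)
    covered by the single failure finding task. All values share one time unit.
    """
    combined_failure_rate = sum(1.0 / md for md in failure_mode_mtbfs)
    return 2 * mtbf_function / (mtbf_multiple * combined_failure_rate)


# Hypothetical inputs in years: three failure modes with MTBFs of 150, 300 and 600.
ffi_years = ffi_multiple_failure_modes(
    mtbf_function=40, mtbf_multiple=5_000, failure_mode_mtbfs=[150, 300, 600]
)
print(f"FFI = {ffi_years:.2f} years (about {ffi_years * 365:.0f} days)")
```

With a single failure mode the function collapses to the earlier single-device formula, which is a quick way to sanity-check an implementation.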


Exercise 4 - Hoist
A speed sensor on the hoist drum of a crane used in a machine shop is designed to activate the
emergency brake on the main hoist if the drum starts turning too fast. If any aspect of the emergency
braking system does not work when required and the hoist drum runs away, industry
statistics suggest that there is a 5% chance that someone could get badly hurt or killed as a result.
The group performing the review decides that they would like to reduce the probability of this
happening to once in 200,000 years.
If there is only a 1 in 20 chance (5%) that the multiple failure of the over speeding drum and failed
emergency brakes will hurt or kill someone, an overall probability of 1 death or injury in 200,000
years for this reason can be achieved if the probability of the multiple failure itself is reduced to 1 in
10,000 years.
This is a new system, so the users of the crane have no historical data about its performance. However,
the suppliers of the speed sensor advise that it has an MTBF in this context of 300 years, and the
emergency brake an MTBF in this context of 100 years. No information is available about the
reliability of the electrical circuit between the two, but the behavior of similar circuits on similar
cranes suggests an MTBF of 200 years.
The circumstances under which the drum over speeds and needs the emergency brake occur on
average once every 50 years. You are asked to determine how often the emergency braking system
should be tested to reduce the multiple failure probability to the required level.

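$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}} \times \left(\dfrac{1}{\mathrm{MD}_1} + \dfrac{1}{\mathrm{MD}_2} + \dfrac{1}{\mathrm{MD}_3}\right)}$$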


Options for redesign

What if we re-did the speed sensor example, but with different figures? (A lower level of tolerable
risk and a higher device failure rate?)
…The electric utility which operates the turbine decides that they will accept a probability of the
multiple failure of once in (say) 1,000,000 years for any one turbine.
The utility has twenty similar turbines in operation for an average of ten years each, giving a total of
200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out
due to over-speeding during this period. This corresponds to an MTBF of the protected function of 100
years for any one turbine.
The utility has never found one of the over speed mechanisms to be in a failed state when they have
carried out failure finding checks on their own machines, but data from a commercial data bank
indicate an MTBF of 100 years.
How often should the utility perform a failure finding task on the over speed mechanism in order to
reduce the probability of the multiple failure to the desired level?

$$\mathrm{FFI} = \frac{2 \times 100 \times 100}{1{,}000{,}000} = 0.02\ \text{years} \approx 7.3\ \text{days}$$
We can…
Make the function evident somehow
…or…
Provide additional layers of protection
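
The 7.3 day figure can be checked with a short script. The sketch below uses the figures from the example above; the variable names, the 365-day year and the "what device MTBF would a one-year interval need" calculation are illustrative assumptions added here, not part of the original example.

```python
DAYS_PER_YEAR = 365  # assumption used to convert years to days

mtbf_device = 100          # years, from the commercial data bank
mtbf_function = 100        # years, from the utility's operating experience
mtbf_multiple = 1_000_000  # years, tolerable multiple failure

# FFI = 2 x MTBFdevice x MTBFfunction / MTBFmultiple
ffi_years = 2 * mtbf_device * mtbf_function / mtbf_multiple
ffi_days = ffi_years * DAYS_PER_YEAR
print(f"FFI = {ffi_years:.2f} years = {ffi_days:.1f} days")

# Rearranging the same formula: device MTBF needed for a one-year test interval.
target_ffi_years = 1.0
required_mtbf_device = target_ffi_years * mtbf_multiple / (2 * mtbf_function)
print(f"Device MTBF needed for a yearly test: {required_mtbf_device:.0f} years")
```

The second calculation shows why testing alone does not solve the problem here: a single device would need an MTBF of several thousand years to allow a yearly test, which is why the slide turns to redesign.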


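[Figure: a one-year timeline showing protected function B (mean time between failures = 5 years) and protective device C (availability = 75%, downtime = 25%). With one protective device: 10^-2 x 10^-2 = 1 in 10^4. With a function protected by two devices (Device 1 and Device 2): 10^-2 x 10^-2 x 10^-2 = 1 in 10^6.]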


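Formula 3

[Figure: protected function B fails; protective devices U1 and U2 (C) both fail]

In this case the unavailability of these devices should be squared in the failure finding formula…

$$\frac{1}{\mathrm{MTBF}_{\mathrm{multiple}}} = \frac{1}{\mathrm{MTBF}_{\mathrm{function}}} \times \left(\mathrm{DT}_{\mathrm{device}}\right)^{2}$$

Therefore, if MTBFfunction and MTBFmultiple are given then…

$$\mathrm{DT}_{\mathrm{device}} = \left(\frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/2}$$

Therefore…

$$\mathrm{FFI} = 2 \times \mathrm{MTBF}_{\mathrm{device}} \times \left(\frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/2}$$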


What if we re-did the speed sensor example, but with different figures? (Lower level of tolerable risk,
and higher device failure rate?) (Now two sensors)
…The electric utility which operates the turbine decides that they will accept a probability of the
multiple failure of once in (say) 1,000,000 years for any one turbine.
The utility has twenty similar turbines in operation for an average of ten years each, giving a total of
200 years of operating experience. As far as anyone knows, only two of these turbines have tripped out
due to over-speeding during this period. This corresponds to an MTBF of the protected function of
100 years for any one turbine.
The utility has never found one of the over speed mechanisms to be in a failed state when they have
carried out failure finding checks on their own machines, but data from a commercial data bank
indicate an MTBF of 100 years.
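How often should the utility perform a failure finding task on the over speed mechanism in order to
reduce the probability of the multiple failure to the desired level?

$$\mathrm{FFI} = 2 \times \mathrm{MTBF}_{\mathrm{device}} \times \left(\frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/2}$$

$$\mathrm{FFI} = 2 \times 100 \times \left(\frac{100}{1{,}000{,}000}\right)^{1/2} = 2\ \text{years!}$$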
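
The effect of the second sensor can be seen by computing both intervals side by side. This is a minimal sketch using the figures from the example; the variable names and the 365-day year are assumptions made for illustration.

```python
import math

mtbf_device = 100          # years, each over speed sensor
mtbf_function = 100        # years, protected function
mtbf_multiple = 1_000_000  # years, tolerable multiple failure

ratio = mtbf_function / mtbf_multiple

# One sensor: FFI = 2 x MTBFdevice x (MTBFfunction / MTBFmultiple)
ffi_one_sensor = 2 * mtbf_device * ratio

# Two redundant sensors: FFI = 2 x MTBFdevice x (MTBFfunction / MTBFmultiple)^(1/2)
ffi_two_sensors = 2 * mtbf_device * math.sqrt(ratio)

print(f"One sensor:  FFI = {ffi_one_sensor * 365:.1f} days")
print(f"Two sensors: FFI = {ffi_two_sensors:.1f} years")
```

Because the unavailability is squared, the required unavailability per device relaxes from 0.01% to 1%, which is what moves the interval from days to years.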


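Multiple Redundant Devices

• We can do this for any number of devices tested randomly…

$$\mathrm{FFI} = 2 \times \mathrm{MTBF}_{\mathrm{device}} \times \left(\frac{\mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/n}$$

• If all are tested together then the formula becomes…

$$\mathrm{FFI} = \mathrm{MTBF}_{\mathrm{device}} \times \left(\frac{(n+1) \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/n}$$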
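
Both versions of the formula can be wrapped in one helper. The sketch below is illustrative only: the function name and the `tested_together` flag are assumptions, and it reads the "tested together" formula with the (n+1) factor inside the n-th root, which is the reading under which both versions agree for a single device (n = 1).

```python
def ffi_redundant_devices(mtbf_device, mtbf_function, mtbf_multiple, n,
                          tested_together=False):
    """Failure finding interval for n identical redundant protective devices.

    tested_together=False: FFI = 2 x MTBFdevice x (MTBFfunction / MTBFmultiple)^(1/n)
    tested_together=True:  FFI = MTBFdevice x ((n+1) x MTBFfunction / MTBFmultiple)^(1/n)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ratio = mtbf_function / mtbf_multiple
    if tested_together:
        return mtbf_device * ((n + 1) * ratio) ** (1.0 / n)
    return 2 * mtbf_device * ratio ** (1.0 / n)


# Figures from the speed sensor example (years), for one to three devices.
for n in (1, 2, 3):
    separate = ffi_redundant_devices(100, 100, 1_000_000, n)
    together = ffi_redundant_devices(100, 100, 1_000_000, n, tested_together=True)
    print(f"n={n}: tested randomly {separate:.2f} y, tested together {together:.2f} y")
```

Testing all devices together gives a shorter interval than testing them at unrelated times, because the devices are then all at their "oldest" at the same moment.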


Exercise 5 – Pumps and PSV’s


A hydraulic system is protected from overpressure by three Pressure Relief Valves (PRVs). One is placed in the line directly from the duty and standby pumping arrangements, and there is one PRV in each of the supply lines to the accumulators. If the pressure exceeds the safe working pressure then the PRVs will relieve the pressure in the lines back to the hydraulic oil tank. All PRVs are set to the same pressure level.
The unit operates under extremely high pressures and if the safe working pressure is exceeded there is a chance of a pipe rupturing, exposing people in the surrounding areas to pressures likely to cause serious injuries. Risk ranking structures set up by the corporate safety department have deemed this asset a high criticality asset. This means that it will need to be managed to a tolerable probability of failure of 1:1,000,000.

[Figure: hydraulic circuit showing the pumps, the accumulators and the relief valves, labelled PSV 1, PSV 2 and PSV 3]

In the 12 years that the hydraulic system has been installed it has never once required any PRV to relieve the pressure within the hydraulic circuit to the accumulators. For this system they were unable to find failure rate information in commercial databanks.
However, a quick call to the 5 other plants in their company showed them that there were 4 such systems in the company, with a combined operating life of 80 years. Incident records show that the PRVs have been used to relieve the pumps 10 times. Evidence from the manufacturer suggests that the PRVs have a failure rate of 1:100.
Given that all three will be tested at the same time, what is the failure finding frequency required to achieve the tolerable probability of a multiple failure?

$$\mathrm{FFI} = \mathrm{MTBF}_{\mathrm{device}} \times \left(\frac{(n+1) \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/n}$$


Managing Risk in Hidden Failures

• Environment / Safety: First establish whether there is an intolerable risk or not.
• Predictive: Second determine if a Predictive task is applicable and effective
• Preventive Restoration / Preventive Replacement: Third determine if a Preventive task is applicable and effective
• Failure Finding: Fourth determine if a Detective task is applicable and effective
• Redesign: Fifth, the protection is inadequate


Voting Systems

k out of n systems

Formula 6

[Figure: protected function B fails; protective devices U1, U2 and U3 (C) fail]

If “r” = number of units that need to be in a failed state before the entire system would fail then…

r = n – k + 1

Therefore, if FFI is a very small fraction of MTBFdevice it can be shown that:

$$\mathrm{FFI} = \mathrm{MTBF}_{\mathrm{device}} \times \left(\frac{(n-r)! \times r! \times (r+1) \times \mathrm{MTBF}_{\mathrm{function}}}{n! \times \mathrm{MTBF}_{\mathrm{multiple}}}\right)^{1/r}$$

! = Factorial (used a lot in combinatorics and other probability and statistical formulae)
5! = 1 x 2 x 3 x 4 x 5
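
A voting-system interval can be sketched with the standard library factorial. The code below is an illustration under stated assumptions: the function name is hypothetical, and the numerator uses (n − r)!, the form under which a 1-out-of-n system (r = n) reduces to the "all tested together" formula for redundant devices. Treat that choice as an assumption about the intended formula rather than a confirmed transcription.

```python
from math import factorial


def ffi_voting_system(mtbf_device, mtbf_function, mtbf_multiple, n, k):
    """Failure finding interval for a k-out-of-n protective system.

    r = n - k + 1 is the number of units that must be failed for the
    system to fail. Assumes all units are tested together and that FFI
    is a very small fraction of MTBFdevice.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    r = n - k + 1
    term = (factorial(n - r) * factorial(r) * (r + 1) * mtbf_function) / (
        factorial(n) * mtbf_multiple
    )
    return mtbf_device * term ** (1.0 / r)


# Hypothetical 2-out-of-3 system using the speed sensor figures (years).
ffi_years = ffi_voting_system(
    mtbf_device=100, mtbf_function=100, mtbf_multiple=1_000_000, n=3, k=2
)
print(f"2-out-of-3 voting system: FFI = {ffi_years:.2f} years")
```

For k = 1 the function gives the same answer as the redundant-device formula with all devices tested together, which is a useful cross-check.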


Economic Consequences

But what about economic consequences…?

• Operational and economic-only consequences are purely economic
– In other words, the only consequence of a multiple failure that does not affect safety or the environment is that it costs money.

• But doing a failure finding task also costs money
– So in this case, we need to determine the failure finding task interval that reduces total costs to a minimum, and then ask whether the minimum total cost is acceptable



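We saw previously that…

$$\mathrm{FFI} = \frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{multiple}}}$$

Therefore the probability of failure in any one year is

$$\frac{1}{\mathrm{MTBF}_{\mathrm{multiple}}} = \frac{\mathrm{FFI}}{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}$$

The annualized cost of failure will be

$$\frac{C_{\mathrm{multiple}} \times \mathrm{FFI}}{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}}$$

The annualized cost of doing a failure finding task is

$$\frac{C_{\mathrm{FF}}}{\mathrm{FFI}}$$

If FFI is a fairly small fraction of MTBFfunction, the annualized cost of repairing the failed protective device will be approximately:

$$\frac{C_{\mathrm{device}}}{\mathrm{MTBF}_{\mathrm{device}}}$$

Likewise for the function…

$$\frac{C_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{function}}}$$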


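$$C_{\mathrm{total}} = \frac{C_{\mathrm{multiple}} \times \mathrm{FFI}}{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}} + \frac{C_{\mathrm{FF}}}{\mathrm{FFI}} + \frac{C_{\mathrm{device}}}{\mathrm{MTBF}_{\mathrm{device}}} + \frac{C_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{function}}}$$

[Figure: cost plotted against the interval between failure finding tasks, showing the annualised cost of a multiple failure and the annualised cost of failure finding (CFF / FFI)]

The total cost is at a minimum when

$$\frac{dC_{\mathrm{total}}}{d\,\mathrm{FFI}} = 0$$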


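$$C_{\mathrm{total}} = \frac{C_{\mathrm{multiple}} \times \mathrm{FFI}}{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}} + \frac{C_{\mathrm{FF}}}{\mathrm{FFI}} + \frac{C_{\mathrm{device}}}{\mathrm{MTBF}_{\mathrm{device}}} + \frac{C_{\mathrm{function}}}{\mathrm{MTBF}_{\mathrm{function}}}$$

The first term is the annualised cost of a multiple failure. Differentiating with respect to FFI:

$$\frac{dC_{\mathrm{total}}}{d\,\mathrm{FFI}} = \frac{C_{\mathrm{multiple}}}{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}}} - \frac{C_{\mathrm{FF}}}{\mathrm{FFI}^{2}}$$

Setting the derivative to zero gives

$$\mathrm{FFI}^{2} = \frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}} \times C_{\mathrm{FF}}}{C_{\mathrm{multiple}}}$$

$$\mathrm{FFI} = \left(\frac{2 \times \mathrm{MTBF}_{\mathrm{device}} \times \mathrm{MTBF}_{\mathrm{function}} \times C_{\mathrm{FF}}}{C_{\mathrm{multiple}}}\right)^{1/2}$$

Where:
• Cmultiple = Cost of one multiple failure
• CFF = Cost of one failure finding task
• MTBFdevice = MTBF of the protective device
• MTBFfunction = MTBF of the protected function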
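
The cost trade-off above can be explored numerically. The sketch below is a minimal illustration: the function names and all input figures are hypothetical, and the repair cost terms default to zero because they do not depend on FFI and so do not move the optimum.

```python
import math


def total_annual_cost(ffi, c_multiple, c_ff, mtbf_device, mtbf_function,
                      c_device=0.0, c_function=0.0):
    """Annualised total cost for a given failure finding interval (FFI)."""
    multiple_failure_cost = c_multiple * ffi / (2 * mtbf_device * mtbf_function)
    failure_finding_cost = c_ff / ffi
    repair_costs = c_device / mtbf_device + c_function / mtbf_function
    return multiple_failure_cost + failure_finding_cost + repair_costs


def optimum_ffi(c_multiple, c_ff, mtbf_device, mtbf_function):
    """FFI that minimises total cost: sqrt(2 x MTBFdevice x MTBFfunction x CFF / Cmultiple)."""
    return math.sqrt(2 * mtbf_device * mtbf_function * c_ff / c_multiple)


# Hypothetical inputs: costs in dollars, MTBFs in years.
inputs = dict(c_multiple=50_000, c_ff=100, mtbf_device=40, mtbf_function=20)

best_ffi = optimum_ffi(**inputs)
print(f"Optimum FFI = {best_ffi:.2f} years")
for ffi in (0.5 * best_ffi, best_ffi, 2 * best_ffi):
    print(f"  FFI {ffi:5.2f} y -> total cost {total_annual_cost(ffi, **inputs):7.2f} per year")
```

Printing the cost either side of the optimum shows the shape of the curve: testing too often wastes money on checks, and testing too rarely raises the expected cost of a multiple failure.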


Exercise 6 – Economic Hidden Failures


A hydraulic motor is used to drive an agitator on a reactor vessel in a chemical plant. The oil tank on
the hydraulic system contains a low level alarm which is used to remind the operators when to fill the
tank with oil. An ultimate low level alarm is designed to shut down the hydraulic system if the
system runs low on oil and the upper switch fails to warn the operators. If both switches fail and the oil
runs out, the motor could be severely damaged and the reactor down for up to 5 hours. This would cost
the company $1,500 in lost production and $525 to repair the motor – a total cost of $2,025.
The company has three such reactor vessels each driven by its own hydraulic system, and the operators
can only recall two occasions in which an ultimate low level switch has needed to stop a motor over a
period of twelve years. This means that the mean time between failures of the protected function is
3 x 12 / 2 = 18 years. (MTBFfunction)
Until now the low level switches have never been checked, nor have they been in a failed state when
called upon to work. In the absence of any other information and after careful study of the
configuration of the switches, the RCM Facilitator decides that the MTBF of the ultimate low level
switch is likely to be about twice that of the low level alarm, or 36 years. (MTBFdevice)
It is difficult to reach the switches, so a full functional check of the ultimate switches requires
lowering the level of the tanks under controlled conditions and checking whether the motors cut-out.
This task takes about an hour per tank at a cost of $25 per task (CFF). In the light of this information,
you are asked to determine the optimum failure finding interval for these switches.

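Optimum failure finding interval (FFI):

$$\mathrm{FFI} = \left( \frac{2 \times MTBF_{\text{device}} \times MTBF_{\text{function}} \times C_{FF}}{C_{\text{Multiple}}} \right)^{1/2}$$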
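
The failure finding interval formula above can be evaluated directly from the figures given in the exercise. The following is a minimal sketch only; the function and variable names are illustrative and are not part of the course material.

```python
import math

def failure_finding_interval(mtbf_device, mtbf_function, cost_task, cost_multiple):
    """Optimum failure finding interval (years) for an economic hidden failure."""
    return math.sqrt(2 * mtbf_device * mtbf_function * cost_task / cost_multiple)

# Figures taken from Exercise 6.
systems = 3
years_observed = 12
demands = 2                                           # times the ultimate switch had to act
mtbf_function = systems * years_observed / demands    # 36 system-years / 2 demands
mtbf_device = 36                                      # years, the facilitator's estimate
cost_task = 25                                        # $ per failure finding task (CFF)
cost_multiple = 1500 + 525                            # $ lost production plus repair (CMultiple)

ffi_years = failure_finding_interval(mtbf_device, mtbf_function, cost_task, cost_multiple)
print(f"MTBF of protected function: {mtbf_function:.0f} years")
print(f"Optimum failure finding interval: {ffi_years:.1f} years")
```

With these inputs the term under the square root is 32,400 / 2,025 = 16, so the sketch reports an interval of 4 years.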


RCM-DO-07 The Value of RCM


As a cornerstone of the maintenance discipline, RCM can achieve benefits in a vast number of areas
depending on where and how it is applied.
When properly implemented, Reliability Centered Maintenance provides companies with a tool for
achieving the lowest asset Net Present Cost (NPC) for a given level of performance and risk.
This implies a cashable impact across a multitude of economic activities, covering both OPEX [8] and
CAPEX [9].
However, RCM will also provide companies with a range of non-cashable advantages that will have a
positive impact throughout the enterprise.
This document contains only a brief list of potential areas of benefit, not the entire range of
potential uses of RCM. Beyond these areas, the author has previously used RCM:
• to support capital submissions in regulated industries,
• to reduce the risk of legal ramifications in the management of environmental integrity,
• to establish a tool for contract negotiations related to outsourced maintenance,
• to reduce a company's carbon footprint, and
• as a means of developing troubleshooting guides.
The information in this module is intended to alleviate some of the anxiety about benefits that often
surfaces in the early implementation stages of large-scale RCM projects, and to provide guidelines for
trainee RCM Analysts.
The Cashable Results of RCM
Direct cashable benefits from implementing RCM can emerge in every area where maintenance and
operations have an impact.
This can include such disparate areas as increased uptime, decreasing energy usage, reductions in
chemical utilization, or reductions in inventory holdings and routine maintenance spending.
Instead of trying to cover all the potential areas where the method can deliver financial impacts, this
section will focus more on how RCM influences the profit and loss of an enterprise.
This is evident in two principal areas:
• an increase in potential revenue, and
• direct cost reductions.

[8] OPEX – Operational Expenditure
[9] CAPEX – Capital Expenditure


Direct Cost Reductions


The main noticeable result of Reliability Centered Maintenance is a dramatic change to the
maintenance regimes that are in place.
John Moubray, a pioneer in this field until his recent passing, regularly stated that RCM would achieve
“a reduction of between 20% and 70% in routine maintenance where there is an existing scheduled
maintenance program.”
Based on the experience of the author, this leads primarily to an increased level of cost-effectiveness
of maintenance, particularly in industries that are very asset intensive. [10]
The team is able to claim benefits in these areas where there is a calculable reduction in the cost of
labor, materials or consumables to perform maintenance [11] over a reasonable amount of time (usually
a year).
Logically, these are only potential benefits at the completion of the analysis, as it will take until the
first omitted routine, or the first breakdown requiring reduced resources, before savings begin to
accrue.
However, once implemented they can easily be counted through direct calculation. For this to be
accurate there is a need to quantify both the routine maintenance costs as well as the corrective
maintenance costs.
There are some real world limitations on attempting to forecast cost reductions purely through
accumulated data.
The first issue the team can face is that current maintenance regimes often do not exist in the
company’s ERP or CMMS program, or are grouped there at a high level.
Data losses, poor ERP management, and distrust of technology mean that experienced technicians
often keep the knowledge of existing maintenance outside of corporate systems.
Further compounding the issue is the disparate way that maintenance routines are stored. At times
they are held at an asset level, at other times at a maintainable item level, and at still other times at
higher system or unit levels.
A second limitation is that on the occasions when RCM proposes a more rigorous policy, there is a
tendency to overlook the change in reactive and corrective maintenance. [12]
Still, some direct cost reduction cases are obvious and do not require a detailed activity analysis.
Every task in an RCM analysis must be both applicable, meaning it is physically possible to do the
task, and effective, meaning it is worthwhile doing in terms of cost and/or risk, before it can be
selected as an adequate failure management strategy.

[10] Asset-Intensive – Industries where asset maintenance and asset replacement form major parts of OPEX and
CAPEX
[11] Maintenance refers to both routine and corrective or reactive activities.
[12] The issues surrounding RCM and WoL asset management are covered in more detail in “RCM-DO-10 RCM and
Whole-of-Life (WoL) Asset Management”


When maintenance is developed using an unstructured method there are common errors that can occur.

Ineffective Maintenance
One of the great misleading statistics in asset maintenance today is the calculation of average life for
bearings. The effect of this is to support the outdated and almost mystical belief in a link between
age and failure.
Based on this way of thinking, it is still common to find maintenance departments carrying out
hard-time bearing replacement programs as a means of managing risk.
However, it has been the experience of the author that hard-time bearing replacement policies can
increase, rather than decrease, the likelihood of failure while at the same time increasing the direct
maintenance costs.
This flies in the face of popular beliefs and is an example of how RCM thinking can drive reductions
in routine maintenance levels.
The original Nowlan and Heap report [13] specifically spoke about bearings when addressing failure in
complex assets.
A complex item, as opposed to a simple item, is one that is subject to many failure modes. As a result,
the failure processes may involve a dozen different stress and resistance considerations.
Even with complex items, failures related to age will concentrate about an average age for that mode.
However, bearings have many failure modes.
Where there is no dominant failure mode [14], as is the case in complex items such as most bearings, the
average lives of the individual failure modes are widely dispersed along the entire exposure
axis. [15] Therefore, failure will be unrelated to operating age. This is a unique feature of complex items.
When deciding maintenance policy for bearings, this issue is further exacerbated by the provision of
the L10 life by manufacturers. This number represents the point at which 10% of the items may have
failed, meaning that 90% will have survived.
Lieblein and Zelen, in their seminal work on the subject of bearing life [16], found that the characteristic
life, the point where statistically 63.2% of the items will have failed, was roughly 5 times the L10 life.
They also found that the “life” forecasts had a median Weibull Beta value of 1.4, indicating a near
constant probability of failure. This means that the likelihood of failure at any point in the life of the
bearings in their study increased only marginally as the asset aged.
Other published analyses have quoted a beta of “1.3” for ball and roller bearings, and a beta of “1”
for sleeve bearings. [17]
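
The relationship between the L10 life and the characteristic life quoted above can be checked against a two-parameter Weibull distribution. The sketch below assumes that distribution applies; the function names are illustrative only and do not come from the cited studies.

```python
import math

def characteristic_to_lp_ratio(beta, fraction_failed=0.10):
    """Ratio of characteristic life (63.2% failed) to the life at which
    `fraction_failed` of the items have failed (L10 when 0.10)."""
    # Weibull: F(t) = 1 - exp(-(t / eta) ** beta)
    # so t_p = eta * (-ln(1 - p)) ** (1 / beta) and eta / t_p follows directly.
    return (-math.log(1.0 - fraction_failed)) ** (-1.0 / beta)

def failure_rate_growth(beta, age_ratio):
    """Factor by which the Weibull failure rate grows when age is multiplied
    by `age_ratio` (the failure rate is proportional to t ** (beta - 1))."""
    return age_ratio ** (beta - 1.0)

for beta in (1.0, 1.3, 1.4):
    ratio = characteristic_to_lp_ratio(beta)
    growth = failure_rate_growth(beta, 2.0)
    print(f"beta={beta}: characteristic life / L10 = {ratio:.2f}, "
          f"failure rate growth when age doubles = {growth:.2f}")
```

For a beta of 1.4 the ratio works out at a little under 5, which is consistent with the "roughly 5 times the L10 life" figure quoted from Lieblein and Zelen.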

[13] Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
[14] Dominant failure mode – the most common cause of failure
[15] Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
[16] Statistical Investigation of the Fatigue Life of Deep-Groove Ball Bearings, J. Lieblein and M. Zelen, Journal of
Research of the National Bureau of Standards, Vol 57, No 5, November 1956.


In process manufacturing industries, we find contaminated oil to be one of the frequent reasons for early
life failures. However, this is only one of the multitude of stresses that bearings face as complex assets.
Others can include poor storage leading to false brinelling and early corrosion, excessive heat and
pressure, overloading, exposure to vibration, abrasions and cracks. All of these could contribute to
either early life failures or premature wear out.
[Figure: Conditional probability of failure for a complex asset, with the L10 life, L50 life, average life
and characteristic life (63.2%) marked. Annotations: complex assets, such as bearings, do not have a
dominant failure mode; instead they face many different stresses leading to failure. These failures are
distributed along the stress axis, making failure unrelated to age and the likelihood of failure at every
point constant (random). This is unique to complex assets.]
Often, the L10 life is mistaken for an end-of-life point for bearings, and is thus used as a reference
interval for replacement tasks. However, as can be seen from the information above, it is not the end of
life but rather a minimum guaranteed life for 90% of bearings under specific load conditions.
This is in line with Nowlan and Heap's findings and shows that in many cases we are at best wasting a
large portion of the bearing's useful life, making this an ineffective use of maintenance resources. [18]
Increased bearing life and decreased labor costs are not the only potential savings. By frequently
replacing bearings on, say, motor shafts we introduce the likelihood of a range of additional failure modes.
For example, installation and frequent change-out failures include:
• Wear of the motor shaft, decreasing the adequacy of the interference fit and leading to bearings
spinning on the shaft (a failure of the motor, not of the bearing)
• Overheating of the bearing, leading to early life failures and distortion of the inner race
• Excessive force (e.g. hammers instead of bearing pullers), damaging the races of the bearings and
leading to early life failures

[17] Bloch, Heinz P. and Fred K. Geitner, 1994, Practical Machinery Management for Process Plants, Volume 2:
Machinery Failure Analysis and Troubleshooting, 2nd Edition, Gulf Publishing Company, Houston, TX
[18] Over one machine, this appears to be a very small maintenance cost item. However, when applied throughout a
plant, or on the so-called “critical” assets, it amounts to a significant maintenance cost.


• Bearing misalignment
• Wrong bearing selection
• Pre-failed bearings due to poor storage techniques
While we can manage some of these, others are a direct result of frequent bearing changes.
Therefore, if we use hard-time bearing replacement as a maintenance policy then we are:
a) reducing the maximum used life of the bearing, and
b) increasing the likelihood of failure through the introduction of several additional failure modes.
In the Meridium RCM decision algorithm [19], a management policy for an Evident Operational and
Non-Operational failure mode must comply with the following:
“Over a period of time, the failure management policy must cost less than the cost of the operational
consequences (if any) plus the total cost of repair.”
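
The economic test quoted above can be expressed as a simple comparison over a chosen period. This is a simplified sketch: the function name, the period and all cost figures are hypothetical, and it ignores any failures that still occur with the policy in place.

```python
def policy_is_effective(policy_cost, operational_cost_per_failure,
                        repair_cost_per_failure, expected_failures):
    """Return True if the policy costs less over the period than the failures
    it is meant to manage (operational consequences plus repair)."""
    cost_of_failures = expected_failures * (operational_cost_per_failure
                                            + repair_cost_per_failure)
    return policy_cost < cost_of_failures

# Hypothetical five-year comparison (figures are not from the course material).
hard_time_replacement = policy_is_effective(
    policy_cost=5 * 4_000,               # one bearing change-out per year
    operational_cost_per_failure=6_000,  # lost production per failure
    repair_cost_per_failure=1_500,
    expected_failures=2,                 # failures expected with no policy in place
)
print("Hard-time replacement is cost-effective:", hard_time_replacement)
```

In this illustration the policy costs $20,000 against $15,000 of failure costs, so it would not pass the test.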
Ineffective maintenance is more common than most professionals think. It can also include areas such
as maintenance out of context, where maintenance regimes are not aligned with how the asset is used,
or practices that decrease an asset's operating efficiency.
Using the decision algorithm in RCM, the first option available to the team is Predictive Maintenance.
Where this is both applicable and effective it will increase the effectiveness of maintenance in a range
of areas:
• Predictive Maintenance detects the signs of the onset of failure. As such, it provides the capability to
manage all failures, including random failures.
• It can be done in situ and often without interfering with the normal operation of the process.
• It will ensure that the asset utilizes all of its economically useful life (as opposed to hard-time
replacements).

Inapplicable Maintenance
The mistaken belief that there is always a relationship between age and failure leads maintenance
departments to all sorts of policies that, in practice, achieve nothing.
Often these occur during maintenance turnarounds. The opportunity to access items that are normally
in a running state drives people to inspect items just in case a life-related failure mode has developed.
In particular, this again is a common activity in relation to bearing management.
For example, say a turbine turnaround occurs once every 3 years for other failure management
reasons.

[19] The Meridium RCM Decision Algorithm is based on Figure 17 – A Second Decision Diagram Example, page 49,
SAE JA1012, 2002-01


The maintenance department has taken this opportunity to perform a dye penetrant check on the
bearing to see if any cracks are starting to form, requiring them to take action.
On the face of it, this appears to be a perfectly valid, even wise, use of the opportunity. However, on
applying the RCM logic a little closer this perception changes dramatically.
For the sake of this example, we will say that the P-F interval is about 3 months, meaning that once
cracks become detectable in this particular bearing, we have around three months before functional failure.
If we test the bearing on a hard-time basis of every three years, and the P-F interval is three months,
then the following logic applies.
a) The dye penetrant test is only useful if the bearing failure is developing at the time of inspection.
b) This means it had to start developing less than 3 months prior to opening.
As we shut down every 36 months, the likelihood of this occurring (given the randomness of bearing
failure) is around 1 in 12 (3 months out of 36).
Moreover, the likelihood of it not occurring is around 11 in 12. This task does not satisfy the RCM
applicability criteria and is a waste of resources.
[Figure: Timeline of a 3-year turnaround interval against a 3-month P-F interval. Likelihood of detection
1 in 12; likelihood of non-detection 11 in 12.]
In addition, by opening the bearing housing and interfering with a bearing that is presumably
operating fine, we again introduce the possibility of human error. [20]
It is difficult to categorize this maintenance practice directly, but the closest match in RCM is Predictive
Maintenance (PTIVE).
In the Meridium RCM decision algorithm, this means the team needs to answer all of the following
questions before this task is applicable:
• Is there a clear potential failure condition?
• What is it?
• What is the P-F interval?
• Is the interval long enough to take action to avoid or minimise the consequences of failure?
• Is the P-F interval reasonably consistent?

[20] Human error is discussed in detail in module RCM-DO-06a Introducing Human Error.


• Is it practical to do the task at intervals less than the P-F interval?

The team would be able to answer all of the above questions positively except for the last one. For the
dye penetrant test, it is not practical to do the task at intervals less than the P-F interval;
therefore the task is not applicable.
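
The detection likelihood in the turnaround example can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming (as the example does) that the onset of failure is equally likely at any point in the turnaround cycle; the function name is illustrative.

```python
from fractions import Fraction

def detection_likelihood(pf_interval_months, inspection_interval_months):
    """Chance that a randomly timed failure is in its detectable (P-F) window
    when a fixed-interval inspection takes place."""
    ratio = Fraction(pf_interval_months, inspection_interval_months)
    return min(ratio, Fraction(1))  # inspecting more often than P-F always catches it

pf_interval = 3            # months, from the example
turnaround_interval = 36   # months, from the example

p_detect = detection_likelihood(pf_interval, turnaround_interval)
p_miss = 1 - p_detect
interval_is_practical = turnaround_interval < pf_interval

print(f"Likelihood of detection: {p_detect}")
print(f"Likelihood of non-detection: {p_miss}")
print(f"Task interval shorter than P-F interval: {interval_is_practical}")
```

The same function shows that the detection likelihood only reaches certainty once the inspection interval is no longer than the P-F interval, which is the point of the last applicability question.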
Inapplicable maintenance practices are widespread and, in the experience of the author, often reflect
the underlying belief of a consistent relationship between age and failure.

Increases in Revenue
There are two specific areas where an RCM team can claim savings:
a) Where an asset, or system, has a history of failures leading to lost production opportunities.
Principally this refers to unplanned shutdowns, overrun turnarounds, and start-up issues of an
asset or system.
b) Where an asset, or system, has a history of failures leading to reduced production output. This
includes areas such as utilization, quality, and reduced availability. For example:
a. Reduced turnaround times
b. Increased yield (quality)
c. Increased availability for full production rates

The RCM team can claim these savings only where they can prove they have isolated the cause of the
lost, or reduced, production and have recommended a strategy that will mitigate it or prevent it in the
future.
[Figure: Planned capacity divided into downtime (unplanned shutdowns, shutdown overruns, startup
failures) and uptime, with off-spec production, production slowdown and under-performance shown
against the best achievable rate.]
These are potential savings because it will take a reasonable amount of time, nominally one year, before
effective measurement can prove reduced production losses.
However, it is often the case that there are noticeable increases in available uptime after implementing
RCM maintenance policies.
Calculating benefits in this case requires estimating the value of additional uptime, throughput or
yield, as well as the reduced costs of labor and materials.


As these are historic failures, details such as the quantity of lost production, direct maintenance costs,
and the frequency of failure are relatively easy to find out.
An alternative is to use forecasting techniques such as Crow-AMSAA. This is time proven as an accurate
method for forecasting failure rates, enabling the team to calculate savings from the changes to asset
maintenance. It is also a valid method for forecasting savings in direct costs.
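
As an illustration of the Crow-AMSAA approach mentioned above, the model treats cumulative failures as N(t) = lambda × t^beta, which plots as a straight line on log-log axes. The sketch below fits that line by simple least squares, which is one common way of estimating the parameters and is not necessarily the method used in Meridium tools. The failure times are hypothetical.

```python
import math

def fit_crow_amsaa(failure_times):
    """Fit log N(t) = log(lam) + beta * log(t) by least squares, where N(t)
    is the cumulative number of failures at cumulative time t."""
    xs = [math.log(t) for t in failure_times]
    ys = [math.log(n) for n in range(1, len(failure_times) + 1)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    lam = math.exp(y_mean - beta * x_mean)
    return lam, beta

def expected_failures(lam, beta, t_start, t_end):
    """Failures the fitted model expects between two cumulative times."""
    return lam * (t_end ** beta - t_start ** beta)

# Hypothetical cumulative operating months at which each failure occurred.
failure_times = [4, 9, 13, 20, 26, 35, 47, 60]
lam, beta = fit_crow_amsaa(failure_times)
forecast = expected_failures(lam, beta, 60, 72)
print(f"beta = {beta:.2f}; failures expected in the next 12 months = {forecast:.1f}")
```

In this model a beta below 1 indicates that failures are arriving more slowly over time, and a beta above 1 that they are arriving faster. Multiplying the forecast number of failures by the cost per failure gives one estimate of future losses against which savings can be compared.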

Other Cashable Benefits


It is the experience of the author that CAPEX, as opposed to OPEX, benefits often represent the
largest cashable advantages of implementing RCM. These include:
• A delayed use of capital, compared to the pre-RCM scenario, allowing deployment elsewhere in the
enterprise. This occurs through life extension and through higher-confidence decision making.
• A reduction in operating losses, over the life of the asset base, attributable to correct timing of
capital refurbishment and replacement tasks.
• A potential reduction in the cost of capital and the cost of insuring assets, due to the increased
confidence in decision making.
• The incorporation of risk into the budgeting process. The benefits of this are incalculable, as they
depend on how the organization uses this information in the marketplace.
• A calculable reduction in inventory holdings based on the RCM approach.
While there are other cashable benefits, the items listed above represent the most common and the
least debated among the reliability community.
The Non-cashable Results of RCM
RCM will increase the team's awareness of the limitations and operational requirements of the
physical assets they study, often substantially. This results in the following intangible benefits:
• A reduction in the risk of failure modes affecting safety and environmental integrity
• Increased knowledge of the assets, their functions and their failures
• Increased ability to troubleshoot failed assets
• Changes to P&IDs specifically, and at times to other process drawings
• Changes to operating procedures, training, purchasing, work practices and other related areas
• A tangible increase in the quality and integrity of asset data because of the focus of RCM
However, it is often difficult, if not impossible, to measure the extent of these impacts or to link them to
changes in the profitability of the enterprise. At times, the effort to do this can actually distort or
obscure the achievement itself. (Attempting to equate a reduction in the risk of loss of life to a
monetary value is an example of this.)


However, it is possible to represent some non-cashable benefits in monetary terms. The most common
of these is cost avoidance.

Risk Mitigation
When the mitigated risk is economic, it is often termed cost avoidance.
Where the team has implemented a policy for a reasonably likely [21] failure mode where there was an
inadequate existing strategy in place, the team is justified in claiming this as a potential benefit of
RCM, even though the failure has not occurred previously.
These benefits count as non-cashable for a number of reasons:
1. They will never appear as part of the profit and loss of any enterprise, nor will they cause a change
to maintenance budgets or revenues.
2. The team requires estimates to calculate the cost avoidance benefit. Some failure modes may have
similar consequences, affect similar assets, and have overlapping impacts on production.
For example, RCM teams can find themselves presenting benefits of several times the value of the
entire installation. If not explained correctly this is a false representation, which can erode the
credibility of RCM, and of the team attempting to implement it.
They are nevertheless valid and important benefits for the RCM team to claim.
Note the emphasis on “an inadequate existing strategy”. RCM did not invent maintenance, and often
there are adequate existing failure management policies in place.
As an output, the team will find that some maintenance regimes will disappear, some will remain, and
they will add some new, more sophisticated, regimes.
This occurs because some of the maintenance policies in place are redundant, some are either
inapplicable or ineffective, yet others are adequate means of managing failure.
[Figure: Pre-RCM versus post-RCM maintenance tasks. Existing pre-RCM routines are reduced by redundancy
to the remaining pre-RCM routines; these plus new tasks make up the net maintenance tasks post-RCM.]
Thus, there is no justification for claiming benefits where there is an adequate existing strategy to
manage the failure mode.
Nor is there any justification for claiming benefits where failure modes are not reasonably likely.
Other areas of risk mitigation are failure modes that would affect either safety or environmental integrity.
In many cases, these will have direct economic consequences through regulatory penalties, or through
secondary economic damages caused by the failure. Where this is the case, the team can calculate the
value of the cost avoided in a similar way to consequences that are purely economic. [22]
Where the failure mode will not have significant economic consequences, the delta between the
discovered risk and the managed risk can represent the benefit of risk mitigation.

[21] What constitutes reasonably likely is specific to each company, and often to each RCM analysis. Methods for
determining reasonableness are not included in this module.
The Principal Barrier to Value Realization
The benefits of RCM are obvious to anybody who has studied it or to any maintenance practitioner
who can relate to the concepts espoused in the method.

All levels within the corporation generally see different advantages to RCM and there is rarely a lack
of motivation for improvement.
Implementation problems commence due to fundamental misunderstandings about maintenance and
the functions of physical asset management [23]. This leads maintenance departments to see increased
risk where it does not exist.
[Figure: Benefits of RCM. Cashable: increased revenue, reduced costs. Non-cashable: risk mitigation,
knowledge increases.]
For example, a maintenance manager could face any of the following recommendations (among others):
• Elimination of hard-time replacement policies where applicable and effective,
• Elimination of invasive inspections carried out “while we have the opportunity” on planned turnarounds.
Reluctance to change comes from the perception that these recommendations are risky, and instead of
implementing the policy changes, things stay as they are.
The result is more of the same.
• Risk of unplanned failure stays provably higher, and
• the effectiveness of maintenance stays provably lower.
Moreover, resources remain tight performing maintenance that is not required, or repairing problems
caused by the activities that are supposed to prevent them.
It is clear that before we can successfully implement the strategy outcomes of RCM, we first need to
make sure that there is a deep understanding within the company of modern reliability principles.

[22] Cost avoidance calculation methods are available in Handout RCM-DO-07a Calculating Costs Avoided, inspired
by the work of Steve Soos on this subject.
[23] The Role of the Maintenance Manager, Daryl Mather, 2008:
• Design effective maintenance policy
• Execute it as efficiently as possible
• Collect relevant data for higher confidence decisions in the future.


The Role of the RCM Facilitator/Analyst


In a time of continual change, the ability to implement is one of the most prized and sought-after skill
sets.
In module RCM-DO-08 Implementation and Execution, we highlight the importance of momentum
and the vital role of benefit awareness in creating momentum.
RCM often requires the cooperation of a range of departments; including purchasing/stores, human
resources/training, operations, maintenance and the engineering department.
In the experience of the author, initiatives are not successful over the medium to long term when
companies try to order change. If you want to fundamentally change the way an organization works,
then people have to want to change.
For this to happen they need to understand the logic behind RCM, and they must understand what the
benefits are to them in their present role. One of the useful tools for engaging people is a solid,
fact-based benefits case for every analysis that is completed.
If it is to be effective, work on this case should commence during the analysis period itself, and it
should be presented before implementation.
