A-Systems-Approach-to-Failure-Modes-v1 Paper Good For Functions and Failure Mechanism
January 2018
Dr Stuart Burge
Abstract
Failure Mode and Effect Analysis (FMEA) is ubiquitous throughout industry and commerce,
yet its deployment is often fraught with difficulty and its application ephemeral. It is argued in
this paper that the root cause of these difficulties is an incomplete and inconsistent understanding
of failure.
This paper uses concepts and principles from Systems Thinking to provide a clear,
repeatable and reproducible approach to failure and its associated aspects that greatly
facilitates the deployment and application of FMEA in all its forms.
It also argues for the need to adopt an even earlier form of FMEA as a precursor to a Design
FMEA and Process FMEA. The intent of what is called Functional FMEA is to identify issues
before design commences such that they can be designed out ab initio.
Copyright and IPR Notice: Copyright and IPR exists and is held by BHW
and the Systems Engineering Company. This work must not be copied,
distributed or otherwise used without the express permission of BHW.
The use of Failure Mode and Effect Analysis (FMEA), in all its incarnations, is
ubiquitous throughout industry and commerce, yet its deployment is often
fraught with difficulty and its application ephemeral. What appears, prima
facie, to be a simple tool turns out to be used inconsistently with a
corresponding high degree of frustration and lack of confidence. Part of the
problem lies with the published works on this fundamentally simple tool.
Indeed, even the recognized writers in the field demonstrate a distinct degree
of inconsistency.
The paper is split into 4 sections. The first provides an overview of the generic
“text book” FMEA process. The second describes some of the common
issues that are encountered with its application, while the third presents the
systems approach that focuses on the need for a clear understanding of
function. It is argued that a clear understanding of the function(s) of a design
or process provides a simple and elegant method of deriving the failure
modes and hence the causal chain that defines the failure mechanism.
The last section introduces the Functional FMEA as a tool that can be applied
at the requirements stage, before design has begun, in order to identify issues
early and design them out.
2.1 History
The history of FMEA is certainly not clear. What is certain is that the major
impetus for its use originates with the US Military. Indeed, the origins of FMEA
can be traced back to the US military standard MIL-P-1629 (1949) which
describes the process for conducting a “Failure Modes, Effects and Criticality
Analysis” (FMECA). It is indeed a misconception that FMEA and FMECA are
different. Originally, they were; FMEA did what its name suggests: it identified
and documented the failure modes and effects. The inclusion of numerical
assessments of severity of effect, probability of occurrence and ability to
Step  Activity
1     Identify Item and Determine Boundary
2     Determine function
3     Determine for each function / component / process step:
        • potential failure modes
        • effects of potential failure modes
        • causes of potential failure modes
        • current controls
4     Tabulate and assign ratings to:
        • seriousness of occurrence
        • probability of occurrence
        • detectability of occurrence
5     Determine Criticality Index or Risk Priority Number
6     Determine corrective actions where appropriate
7     Assess Corrective Action
The following provides a summary of the 7 steps that constitute the “textbook”
Failure Mode and Effects Analysis.
The second step is for each item to determine its function or functions. These
should be written as statements as to what the item does.
Scale   1    2    3    4    5    6    7    8    9    10
S       Not severe                          Extremely severe
O       Very unlikely to occur              Very likely to occur
D       Certain to detect                   Cannot detect
The probability of detection scale appears to be the wrong way around
because it is concerned with the ability to detect the failure before
the user does.
Even with the above scale it can be difficult to assign ratings. For example,
what is the difference between a 3 and a 4? In most cases such differences are
not important provided the team or individual is consistent when performing an
FMEA. The important point to note is that FMEA is a relative and subjective
analysis. Indeed, if two teams perform an FMEA on the same item, the ratings
given are likely to be different. However, the ranking of the failure modes by
the criticality indices is likely to be the same. That is, both teams will identify
the same critical failure modes and causes. Care must therefore be exercised
when comparing two FMEAs. Some companies have attempted to overcome
the subjectivity of FMEA by defining the criteria for each rating.
RPN = S × O × D
Once all the criticality indices have been calculated, a summary of the most
critical is extracted in order to highlight those areas where priority action must
be directed. It is also recognised practice to highlight any failures that contain
a 10 for Severity, Occurrence or Detection, irrespective of the other ratings.
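The calculation and the two highlighting rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the `Entry` class, the `summarise` function, its `top_n` parameter and the example ratings are all hypothetical and are chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of an FMEA worksheet (hypothetical structure)."""
    failure_mode: str
    severity: int    # S: 1 (not severe) to 10 (extremely severe)
    occurrence: int  # O: 1 (very unlikely) to 10 (very likely)
    detection: int   # D: 1 (certain to detect) to 10 (cannot detect)

    def __post_init__(self):
        # Reject ratings that fall outside the 1 to 10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: RPN = S x O x D."""
        return self.severity * self.occurrence * self.detection

    @property
    def has_ten(self) -> bool:
        """True if any single rating is a 10."""
        return 10 in (self.severity, self.occurrence, self.detection)


def summarise(entries, top_n=3):
    """Return (the top_n entries by RPN, all entries containing a 10)."""
    ranked = sorted(entries, key=lambda e: e.rpn, reverse=True)
    return ranked[:top_n], [e for e in ranked if e.has_ten]


# Hypothetical ratings, for illustration only.
entries = [
    Entry("Failure mode A", 8, 3, 4),   # RPN = 96
    Entry("Failure mode B", 5, 6, 5),   # RPN = 150
    Entry("Failure mode C", 10, 1, 2),  # RPN = 20, but Severity is 10
    Entry("Failure mode D", 4, 2, 3),   # RPN = 24
]
most_critical, flagged = summarise(entries, top_n=2)
for e in most_critical:
    print("critical:", e.failure_mode, e.rpn)
for e in flagged:
    print("contains a 10:", e.failure_mode, e.rpn)
```

With these ratings, failure mode C has the lowest RPN yet is still highlighted because of its Severity of 10, which is the situation the second rule is meant to catch.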
The last step is common sense. The purpose of FMEA is not only to identify
potential problems but also to provide corrective action. Step 7 is therefore a
repeat of the FMEA to ensure that the corrective actions do actually reduce
the criticality index.
The other major contributing factor to the failings of FMEA is failure itself.
Nobody plans to fail, but failures do occur. In fact, they occur on a daily basis
with a consequential impact on cost, time and also reputation. It therefore
Even the less astute of organizations will soon realise that “fixing failures” is
wasteful and recognise the need to attempt to prevent failure and begin a
quest for the “holy grail”. That search is often short and sweet because FMEA
appears through the mist as a knight in shining armour. It is appealing on
many levels. It is a simple tool and most people, with their inherent
understanding of failure and of cause and effect, feel they can quickly grasp the
process and intent. It has numbers and therefore has a “scientific” quantitative
feel but is not mathematical. Other potentially useful tools such as Reliability
Block Diagrams, Design of Experiments, Sensitivity Analysis, Monte Carlo
Analysis, Fault Tree Analysis etc. are overlooked because the maths involved
appears too hard! There is also plenty of literature on FMEA. Many people
have trodden this path, leading to over-documentation and an
“over-availability” of literature on FMEA, particularly on the Internet!
This glut of FMEA literature should be useful. It is not. Everybody has their
take on FMEA that stems from their understanding of failure. In consequence
FMEA is reported in an inconsistent and incomplete fashion. To all intents and
purposes, FMEA is a simple tool and therefore most people feel they are able
to understand its application. Moreover, they can complete the FMEA form
and it “looks good” – it “looks right”. Unfortunately, the fact you can put
something in a column of an FMEA form does not mean it is correct. To
illustrate this, below is an example obtained from the Internet on 7 January
2014. It was obtained using a Safari browser and used the search word
“Failure Mode”. One of the items in the top 10 was entitled “The difference
between root cause and failure mode” – potentially an interesting online
debate – but it included the following example cited as helping in the debate:
At first glance this looks like a reasonable example – but it is wrong! Some aspects
are correct. The author got the “Equipment” aspect right, but the rest is not
I wish to argue that it is the lack of a clear, logical, consistent and complete
definition of failure and its associated aspects, combined with our individual
belief that we “understand” failure that has led to a number of common issues
when applying FMEA in practice. These common issues include:
It is clear from table 2 that even the trusted sources of reference provide
incomplete and inconsistent understanding of failure. Any FMEA novice using
these reference sources in order to gain understanding will be left with an
incomplete picture. Therefore, FMEA becomes open to interpretation and the
inclusion of an individual’s personal view of a failure. It is analogous to
attempting to run a court of law with incomplete and inconsistent written laws!
FAILURE MODE: the manner in which the item fails to meet its intended
function. [SAE J1739 JAN2009]
Failure modes are used in preference to failures, since any given failure may have
several failure modes. Failure modes are written as “anti-functions” and, perhaps
more importantly, there are only 5 basic types:
• No Function
• Over Function
• Under Function
• Intermittent Function
• Unintended Function
In terms of determining the failure modes, what is critical from the function given
above is the verb “to convert”. This verb, together with the 5 usual suspects
given above, is used to determine the failure modes as:
Table 3: Failure Modes Table for the function to convert radio frequency signals into
light and sound
To construct the failure mode, the verb of the function has been taken to the
back and the 5 usual suspects of No, Over, Under, Intermittent and
Unintended used to start the definition. In practice it may be necessary to
interpret the resultant phrase taking into account the context. For example, it
may be necessary to discuss and agree what is meant by “under convert” of the
RF signal. However, the beauty of the systems view of failure is the ability to
apply it consistently through the use of function.
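The mechanical nature of this construction can be illustrated with a short Python sketch. It is an assumption-laden illustration only: the function name `failure_modes` and the phrase template “<suspect> <verb> of <object>” are hypothetical, the template being modelled on the phrase “under convert of the RF signal” used above, so the exact wording in Table 3 may differ.

```python
# The five "usual suspects" named in the text.
USUAL_SUSPECTS = ("No", "Over", "Under", "Intermittent", "Unintended")


def failure_modes(verb, obj):
    """Build candidate failure-mode phrases for a function 'to <verb> <obj>'.

    The phrase template is an assumption made for this sketch; the resulting
    phrases still need to be interpreted in context by the team.
    """
    return [f"{suspect} {verb} of {obj}" for suspect in USUAL_SUSPECTS]


modes = failure_modes("convert", "radio frequency signals into light and sound")
for mode in modes:
    print(mode)
```

Because the list of suspects is fixed, every function statement yields the same five candidate failure modes to consider, which is the source of the consistency claimed for the approach.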
In the above table the second column contains the end effect:
Table 3 also shows what is called the “End” effect – what the user of the
system will actually experience. Dependent upon the “item” under
investigation there are other possible effects. In some systems it may be
necessary to think of the effect scenario through the levels of a system. This
leads to a number of additional effects at different levels:
There are, however, other aspects to failure that are important. Firstly, there is
always something that causes the failure mode to occur.
However, there is one final aspect of failure that is often overlooked and
ignored. This is the failure mechanism that is the “road between the cause and
effect”:
The “Failure Mechanism” is interesting because most FMEA forms do not have
a column for it and in consequence people often confuse failure mode¹ and
failure mechanism. It is not uncommon to find “fatigue” or “corrosion” given as a
Failure Mode. It is also possible to find these words appearing in the causes
column! Both are incorrect: “fatigue” and “corrosion” are Failure Mechanisms –
they are the chemical or physical processes that result in failure. Something will
cause the item to fatigue or corrode – it is this that should be recorded in the
Cause column. The Failure Mode will be how an item fails to meet its intended
function as a consequence of fatigue or corrosion. It is perhaps easy now to see
why it was possible to declare the example given in box 1 as incorrect.
Typically, Failure Mechanisms comprise physical degradation of the item and its
components due to local operational conditions in combination with aspects
such as design features, materials and surface treatments.
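The column confusion described above lends itself to a simple automated check. The following Python sketch is illustrative only: the dictionary keys, the function name `misplaced_mechanisms` and the example rows are hypothetical, and the list of mechanism terms contains only the two named in the text. Plain substring matching is a deliberately crude assumption and would need refinement in practice.

```python
# Only the two failure mechanisms named in the text; a real list would be longer.
KNOWN_MECHANISMS = ("fatigue", "corrosion")


def misplaced_mechanisms(row):
    """Return (column, term) pairs where a failure-mechanism term has been
    entered in the Failure Mode or Cause column of a worksheet row."""
    problems = []
    for column in ("failure_mode", "cause"):
        text = row.get(column, "").lower()
        for term in KNOWN_MECHANISMS:
            if term in text:
                problems.append((column, term))
    return problems


# Hypothetical rows, for illustration only.
bad_row = {"failure_mode": "Corrosion", "cause": "Fatigue"}
good_row = {
    "failure_mode": "No function",
    "cause": "Hypothetical cause that makes the item corrode",
    "failure_mechanism": "Corrosion",
}

print(misplaced_mechanisms(bad_row))
print(misplaced_mechanisms(good_row))
```

The second row passes because the mechanism sits in its own column, the Cause column records what makes the item corrode, and the Failure Mode is written against the function.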
¹ In the case of “failure mode”, BS 4778 actually doesn’t help! BS 4778 defines a failure mode as: “The effect by which a
failure is observed”. The lack of reference to the “function” presents an inconsistency within the BSI approach.
Starting with the “upper-loop” of figure 1, the model states that an item has a
function² and it is that function that defines the failure mode (through the usual
suspects). Moreover, the item will fail to deliver its function via the failure mode
and will result in an effect. The “lower-loop” of figure 1 states that a failure mode
will have a cause that initiates a failure mechanism that generates the failure
mode. To illustrate this, consider the simple situation of a pair of scissors.
² Because an item ranges from a system down to an individual component, it is possible that an individual item could
have many functions.
It is often argued that it “doesn’t matter if things are in the wrong column – it's
the process that is more important”. It is easy to have some sympathy with
this view; the fact a group of engineers are talking about potential failures is a
good thing. The fact they do not understand is equally disappointing and
frankly unprofessional.
Notice, in figure 3, that the only change is that “ITEM” has been
replaced by “SYSTEM, SUB-SYSTEM, PART”. The remainder of the model is
unchanged.
Figure 4: The Systems Model of Failure Modified for Process-Based Systems (Process
FMEA)
Again, note that the only change is that “ITEM” has been
replaced by “PROCESS STEP”. The remainder of the model is unchanged.
This is a consequence of the universal nature of the system definition of
failure that will allow for a simple, correct and consistent approach to
identifying and documenting failure attributes. This consistent and complete
view of failure also leads to development of a universal approach to Failure
Mode and Effects Analysis and a precursor to DFMEA and PFMEA, the
Functional Failure Mode and Effects Analysis.
The differences between the activities in table 1 (textbook FMEA) and table 4
(Universal FMEA) are small. This is both convenient and deliberate. The
differences are concerned with recognising that products and processes can
be treated as systems. The parts, assemblies etc. of a product are equivalent
to the process steps in a process. Most important, however, is the recognition
that these parts, assemblies, process steps etc. all perform a function. It is the
identification of the function that provides the segue into the failure analysis
since from figure 1 all the other attributes necessary to completely define
failure follow. It also leads to a modification of the generic FMEA form to
include Failure Mechanism. This is shown in Figure 5. It is still perfectly sound
practice to use the generic FMEA form with the system view of failure, but
there is nowhere to capture the failure mechanism and therefore that
information, unless recorded elsewhere, is lost.
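The loss of information described above can be made concrete with a small Python sketch. It is a simplified illustration, not a reproduction of the form in Figure 5: the column sets, the `FailureRecord` class, the `to_form_row` function and the example values are all hypothetical, and the ratings and controls columns are omitted for brevity.

```python
from dataclasses import dataclass, asdict

# Simplified column sets (assumed for this sketch, not taken from Figure 5).
GENERIC_COLUMNS = ("item", "function", "failure_mode", "effect", "cause")
MODIFIED_COLUMNS = GENERIC_COLUMNS + ("failure_mechanism",)


@dataclass
class FailureRecord:
    """The attributes of failure in the systems view."""
    item: str
    function: str
    failure_mode: str
    effect: str
    cause: str
    failure_mechanism: str


def to_form_row(record, columns):
    """Keep only the attributes for which the chosen form has a column."""
    data = asdict(record)
    return {column: data[column] for column in columns}


# Hypothetical record, for illustration only.
record = FailureRecord(
    item="Item X (hypothetical)",
    function="To convert radio frequency signals into light and sound",
    failure_mode="No convert of radio frequency signals into light and sound",
    effect="Hypothetical end effect",
    cause="Hypothetical cause that makes the item corrode",
    failure_mechanism="Corrosion",
)

generic_row = to_form_row(record, GENERIC_COLUMNS)
modified_row = to_form_row(record, MODIFIED_COLUMNS)
print("failure_mechanism" in generic_row)   # the generic form drops it
print(modified_row["failure_mechanism"])    # the modified form keeps it
```

The same record fits both forms; the only difference is whether the failure mechanism survives the act of filling in the form.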
It is typically reported [for example INCOSE] that the cost to fix increases by a
factor of 10 with each development phase. What is therefore clear from figure
6 is that performing FFMEAs provides the opportunity to address potential
failure modes at minimum cost, resulting in more robust, failure-free products
and services.
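The factor-of-10 rule quoted above compounds quickly, as the following Python sketch shows. It is a rough illustration only: the function name `relative_cost_to_fix` and its parameters are hypothetical, and phases are counted by index because the number and names of the development phases are not stated here.

```python
def relative_cost_to_fix(phases_later, factor=10):
    """Relative cost of fixing an issue found `phases_later` development
    phases after the earliest point at which it could have been addressed,
    assuming the cost multiplies by `factor` at each phase."""
    if phases_later < 0:
        raise ValueError("phases_later must be zero or positive")
    return factor ** phases_later


for phases_later in range(4):
    print(phases_later, relative_cost_to_fix(phases_later))
```

Under this assumption, an issue left for three phases costs 1000 times as much to fix as one addressed at the outset, which is the argument for performing the analysis as early as possible.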
5.0 Conclusions
This paper has argued that Failure Mode and Effect Analysis (FMEA), while
ubiquitous throughout industry and commerce, struggles to be effectively
deployed and that its application is ephemeral. It is further argued that the main
This paper has used concepts from Systems Thinking to provide a clear and
consistent understanding of failure that in turn permits a repeatable and
reproducible approach to FMEA whether it is Design or Process.
This system approach to FMEA also points the way to the need to adopt an
even earlier form of FMEA as a precursor to a Design FMEA and Process
FMEA. The intent of what is called Functional FMEA is to identify issues
before design commences such that they can be designed out ab initio.
6.0 References
[1] BS 4778-3.1:1991 Quality vocabulary. Availability, reliability and maintainability terms.
Guide to concepts and related definitions.
[3] IEC 60812, 2006 Analysis techniques for system reliability – Procedure for failure
mode and effects analysis (FMEA).
[4] Stamatis D.H. Failure Mode and Effect Analysis, ASQ Quality Press 2003. ISBN 0-87389-598-3.
[6] AIAG (2008). Potential Failure Mode and Effect Analysis (FMEA), 4th Edition.
Automotive Industry Action Group. ISBN 9781605341361.
[7] SAE (2008). Potential Failure Mode and Effects Analysis in Design (Design FMEA)
and Potential Failure Mode and Effects Analysis in Manufacturing and Assembly
Processes (Process FMEA) and Effects Analysis for Machinery (Machinery FMEA).
SAE International.