0% found this document useful (0 votes)
55 views

07 A

A Fault Tree represents the causes of a specified system failure mode in terms of the failure modes of the system components. The analysis of the Fault Tree can produce two types of result: qualitative and quantitative.

Uploaded by

Jin-Long Lin
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
55 views

07 A

A Fault Tree represents the causes of a specified system failure mode in terms of the failure modes of the system components. The analysis of the Fault Tree can produce two types of result: qualitative and quantitative.

Uploaded by

Jin-Long Lin
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

2012 Annual RELIABILITY and MAINTAINABILITY Symposium

Introduction to Fault Tree Analysis

John Andrews, Ph.D.

John Andrews, Ph.D., FIMechE, CEng, MIMA, CMath, MSaRS, FSS Network Rail and Royal Academy of Engineering Professor of Infrastructure Asset Management Director of The Lloyds Register Educational Trust centre for Risk and Reliability Engineering Nottingham Transportation Engineering Centre Faculty of Engineering, University Park Nottingham University Nottingham, NG7 2RD, England email: [email protected]

Tutorial Notes 2012 AR&MS

SUMMARY & PURPOSE


A fault tree represents the causes of a specified system failure mode in terms of the failure modes of the system components. The analysis of the fault tree can produce two types of result: qualitative and quantitative. Qualitative results specify the minimal contributions of component failures which result in system failure. Quantification provides the probability or frequency of the system failure modes. The tutorial will explain the mathematics used to perform a fault tree analysis. A considerable focus of the tutorial will also be on the development of the fault tree model from the engineering system.

John Andrews, Ph.D., FIMechE, CEng, MIMA, CMath, MSaRS, FSS


John Andrews is the Royal Academy of Engineering Professor of Infrastructure Asset Management in the Faculty of Engineering at Nottingham University, UK. He recently transferred to Nottingham University from Loughborough University where he had worked since 1989. The prime focus of his research has been on methods for predicting system reliability in terms of the component failure probabilities and a representation of the system structure. Much of this work has concentrated on the Fault Tree technique and the use of the Binary Decision Diagrams (BDDs) as an efficient and accurate solution method. He is the author of around 200 research papers on this topic and is joint author, with Bob Moss, of a text book, Reliability and Risk Assessment, now in its second edition, published by ASME. John is a member of the Safety and Reliability Group of the Institution of Mechanical Engineers (IMechE). He is also Editor of the Journal of Risk and Reliability the new part O of the IMechE Proceedings and a member of the Editorial Boards for Reliability Engineering and System Safety and Quality and Reliability Engineering International.

Table of Contents
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. Introduction ..........................................................................................................................................................................1 Fault Tree Symbols and Construction ..................................................................................................................................1 Minimal Cut Sets ..................................................................................................................................................................3 Component Failure Probability ............................................................................................................................................4 Minimal Cut Set Failure Probability ....................................................................................................................................5 System Failure Probability ...................................................................................................................................................5 Importance Measures ...........................................................................................................................................................6 System Failure Intensity .......................................................................................................................................................7 Case Study ............................................................................................................................................................................7 Summary ..............................................................................................................................................................................9 References ..........................................................................................................................................................................10 Tutorial Visuals.. .................................11

ii Andrews

2012 AR&MS Tutorial Notes

1. INTRODUCTION Fault tree analysis is now a commonly applied method to predict the failure probability or failure frequency of engineering systems in terms of the failure and repair parameters of the system components. The concept of expressing the system failure causes in a logic diagram, which became known as a fault tree, was established in the early 1960s by Watson working at Bell Telephone Labs on the launch control system of the Minuteman intercontinental ballistic missile. The time-dependent methodology to quantify the system failure likelihood or frequency, known as kinetic tree theory was developed almost 10 years later by Vesely [ref 1] working at the Idaho Nuclear Corporation. Enhancements to the technique including the development of importance measures [refs 2,3] and initiator and enabler theory [ref 4] added to the capability. In recent years an alternative to kinetic tree theory for efficient and accurate fault tree quantification has been developed known as the Binary Decision Diagram method [refs 5-11]. Once constructed and appropriate data supplied for the basic events the analysis of the fault tree can be undertaken. Analysis produces two types of result: qualitative and quantitative. Qualitative analysis produces the minimal combinations of basic (component failure) events which result in the system failure mode (top event). These are known as minimal cut sets. Quantitative results include the top event unavailability, unreliability or failure rate. The top event parameters are defines as follows: unavailability: Qsys(t), the probability that the system failure mode exists at time t unreliability: Fsys(t) the probability that the system failure mode occurs at least once from time 0 to time t. failure rate: the rate at which the system failure mode occurs All of these quantities can be used to judge the acceptability of the system performance. If required the quantification can be extended to produce importance measures which identify the contribution each basic event makes to the top event. 2. FAULT TREE SYMBOLS AND CONSTRUCTION The features of a typical fault tree are shown in the figure 1. The tree structure starts at the high level system failure mode and progresses in branches spreading downward developing its causality in terms of lower resolution events until component failure modes, basic events, appear. When the lowest resolution events, component failures, are encountered then this defines the limit of the analysis and the development of the failure logic is terminated. The system failure mode of concern is known for obvious reasons as the top event. Typical examples of this type of event are: 1. total loss of production 2. safety system fails to respond 3. standby system fails to start 4. explosion 5. loss of space mission

6.

release of radioactive material


Fire Protection System Fails to Respond to a Fire

TOP Event

OR Gate
Fire Detection System Fails to Detect a Fire Water Deluge System Fails to Activate

Intermediate Event AND Gate

Failure to Detect Smoke

Failure to Detect Heat

Pump Fails to Start


PUMP

Nozzles Blocked

SD

HD

NOZ

Basic Event

Figure 1. Typical Fault Tree Features Note that for the first three of these events the system failure can be tolerated and the repair of the causes of the failure will produce the non-occurrence of the top event. For the later three when the top event has occurred then repairing the component failures which have contributed to its occurrence will not remove the top event. Typical events which terminate the logic development are: 1. pump fails to start 2. valve fails closed 3. flow sensor fails to indicate high flow 4. operator fails to respond The first three of these events are hardware failures which specify the piece of equipment which has failed and also the mode in which it fails. Events which do not specify the failure mode at either system or basic event level are unhelpful and should be avoided. There are two types of symbols which appear in the fault tree structure: gates and events. The events start at a high, system, level at the top of the diagram and progress, through intermediate events, to finer resolution events as you move down the diagram through sub-system and section level down to component level. Typical examples of event symbols used in the fault tree structure are illustrated in figure 2.
Symbol Name Intermediate Meaning System or component event description.

Basic

Basic event for which failure and repair data is available. Usually represents a component failure. Represents definitely occurring or definitely not occurring events.

House

Figure 2. Event Symbols The events in the fault tree are linked using gate symbols. Common gates are shown in figure 3. The three

2012 Annual RELIABILITY and MAINTAINABILITY Symposium

Andrews 1

fundamental logic gates are OR, AND and NOT. The output (higher level event of an OR gate will result from the occurrence of at least one of the input (lower level) events. For an AND gate the output event occurrence requires the simultaneous existence of all of the input events. The output to a NOT gate happens as long as the input event does not.
Symbol Name OR Causal Relation Output event occurs if at least one of the input events occur. Output event occurs if all input events occur. Output event occurs if at least m of the input events occur. Output event occurs if all input events occur in sequential order from left to right.

the house event is known to be either true or false. Setting such events to true or false on the fault tree has the effect of turning on or off branches in the fault tree. House events can be used when fault trees are developed for systems which have several operating modes, sections taken out for maintenance, or to represent different design options.
System Fails to Trip When Hazard Occurs 2

AND

VOTE

Sensor 1 Fails

Sensor 2 Fails

Sensor 3 Fails

S1

S2

S3

PRIORITY AND

Figure 4. 2-out-of-3:F Vote Gate The process to construct a fault tree for a system can be time-consuming and the engineer must have a very thorough understanding of the system before it can take place. Each fault tree explores the causes of one particular system failure mode and therefore it may be necessary to draw more than one fault tree for any system. Unfortunately there are not a set of rules which can be stated and guarantee the fault tree constructed will have the correct system failure logic. Guidelines [refs 12-14] which help develop a structured and systematic way of generating the fault trees can be given which will provide a process which is less prone to error. These guidelines are: 1. Assume no miracles If the normal functioning of a component propagates a fault sequence then it is assumed that the component functions normally. If a component failure fortuitously prevents a fault sequence then this is a miracle and should not be included in the system failure logic development. 2. Complete-the-gate: Define all inputs to a gate before the further development of any one is undertaken. 3. No gate-to-gate: Gate inputs should be properly defined and gates should not be directly connected to other gates. As an example of applying these guidelines to construct a fault tree consider the system, shown in figure 5, designed to react to an undesired gas presence. In the event of a gas leak the system is required to perform two functions. It isolates the sections so that the size of the leak is limited to the inventory contained between the two isolation valves, and de-pressurises the section by flaring the gas. Isolation is achieved by closing two normally open isolation valves. Flaring the gas is achieved by opening the normally closed blowdown valve. For

NOT

Output event occurs if the input event does not occur.

Figure 3. Gate Symbols Other gates included in figure 3 are the VOTE gate where at least m of the inputs has to occur to produce the output event, and the PRIORITY AND gate where, like the AND gate, all inputs have to occur but they have to occur in the sequence specified by the list of input events going from left to right. Sensors used to detect undesirable conditions are frequently arranged in a voting configuration to give a high chance of successfully identifying the condition, but a low chance of a spurious identification when the event does not exist. For example, if it takes 2-out-of-3 sensors to work to correctly detect a hazardous condition for which a system trip will occur (2-out-of-3:W) then one sensor failing to detect the event presence can be tolerated but a second means that there is only one working sensor and the trip condition cannot be satisfied. This system failure is also a 2-out-of-3 voting configuration but this time 2-out-of-3:F. This is represented in the fault tree with a VOTE gate shown in figure 4. If there are 4 sensors and 2 are required to recognise a condition to trip the system then up to 2 sensor failures can be tolerated. The occurrence of a third sensor failure in this voting configuration leave the system unable to satisfy the 2out-of-4:W condition and hence will fail. Therefore a 2-outof-4:W system is a 3-out-of-4:F. The house event, shown in figure 2, is an event which terminates a branch of the fault tree but unlike the basic event,

2 Andrews

2012 AR&MS Tutorial Notes

the system illustrated the gas leak is detected by two sensors each of a different type. One (SD1), is a sonic detector, the other (CD1) triggers on gas concentration. The controlling computer will issue a system trip as soon as either of the detectors indicate a gas presence. The computer will automatically drop out a relay which removes power to each of the 3 valves. As a secondary means of achieving the same objective an alarm is sounded which informs the operator of the leak. The operator then activates the push button to deenergise the valves.
SD1 Sonic detector Computer
V1

Leak detection system fails

V1 fails to close

V2 fails to close

V3 fails to open

power to valves

V2

power to valves

V3

power to valves

CD1 Concentration detector operator Alarm Push Button (BP) Relay

power to the valves

push button contacts closed


1

relay contacts closed

V3

Relay Contacts

push button contacts closed

relay contacts closed

PB

push button not operated

CONT

relay remains energised computer fails to provide trip signal

V1

P/C
Blowdown Valve LEAK

V2

OP

no alarm given

P/O
Isolation Valve

P/O
Isolation Valve
AL computer fails to provide trip signal

Figure 5. Gas Leak Detection System The failure modes to consider for each of the components in the system are given in Table 1.
Component failure mode Isolation valve 1 fails to close Isolation valve 2 fails to close Blowdown valve 3 fails to open Operator unavailable Computer fails to process trip condition Alarm fails to sound Relay contacts stuck closed Concentration detector fails to register leak Sonic detector fails to register leak Push Button contacts stuck closed code V1 V2 V3 OP COMP AL CONT CD1 SD1 PB
computer fails to issue trip signal

computer failure

gas leak not detected

COMP

CD1

SD1

Table 1. Component Failure Modes Given a gas leak the system should perform three tasks: close isolation valve V1 close isolation valve V2 open blowdown valve V3 A fault tree has been drawn for the Top Event leak detection system fails. This is shown in figure 6.

Figure 6. Gas Detection System Fault Tree 3. MINIMAL CUT SETS A system failure analysis using a fault tree can establish the component conditions that will yield a system failed state. A list of component failed states which cause the system failure mode is known as a cut set. This information is however not that useful as there can be component failures included in the list which are not needed to cause the system failure since other component failures will have already guaranteed that the system will fail. By removing these redundant component failure events from the list gives

2012 Annual RELIABILITY and MAINTAINABILITY Symposium

Andrews 3

minimal cut sets. Minimal cut sets are a list of minimal (necessary and sufficient) component failed states which cause the system failure mode.
SYSTEM FAILS

TOP=B.GATE1.GATE2 =B.(A+GATE3).(D+GATE4) =B.(A.D+A.GATE4+ GATE3.D+GATE3.GATE4) =B.[A.D+A.A.C+(B+C).D+(B+C).A.C] =B.[A.D+A.A.C+B.D+C.D+B.A.C+C.A.C] (1) Where . represents AND and + represents OR in the equations. These equations are then simplified using the laws of Boolean Algebra: Idempotent A.A=A (1) removes repeated events within each cut set A+A=A (2) removes repeated cut sets from the expression Absorption A+A.B =A (3) removes non-minimal failure combinations Applying idempotent rule (1) to equation 1 gives: TOP=B.[A.D+A.C+B.D+C.D+B.A.C+C.A] (2) Applying rule (2) gives: TOP=B.[A.D+A.C+B.D+C.D+B.A.C] Applying rule (3) gives: TOP=B.[A.D+A.C+B.D+C.D] Expanding out and applying these rules further gives: TOP=B.D+A.B.C (3) (4) (5)

GATE 1

GATE 2

Figure 7. Example System Fault Tree Structure. By inspection the minimal cut sets of the fault tree shown in figure 7 are: {A,B,C} and {B,D}. The way that the fault tree represents the system failure logic is not unique and different engineers will probably draw a different tree structure for the same system failure mode. Whilst the actual diagram structures may be different, if they represent the same logic function, they will produce the same minimal cut sets. To produce the minimal cut sets of a fault tree a Boolean equation is established for the Top Event which is then manipulated into its minimal sum-of-products form (disjunctive normal form) to enable the minimal cut sets to be identified. A Boolean variable is defined for each basic event which is TRUE if he basic event occurs and FALSE if it does not. As an example consider the fault tree in figure 8.

This form of the equations in its simplest sum-of-products form and cannot be reduced any further. The products of this expression are the minimal cut sets. Therefore the fault tree shown in figure 8 has minimal cut sets: B.D and A.B.C (showing the fault tree to be equivalent to that shown in figure 7). 4. COMPONENT FAILURE PROBABILITY To quantify the fault tree the component mode failure probabilities must be predicted. The models used to make this prediction depend on how the component is maintained and three situations are considered here: no repair, repair when the failure occurs (revealed failure), and repair when the failure is discovered (unrevealed failures). No Repair When a component cannot be repaired then its chance of failure will continue to increase over time to its limiting value of 1 as shown in Figure 9. In such circumstance if the component is functioning at a time t then it must have worked continuously to that time and so its reliability and availability are the same. Therefore the unreliability, F(t), and unavailability, Q(t), are the same and if the component has a constant failure rate, , these are given by: Q (t ) = F (t ) = 1 e t t (6)

Figure 8. Example Fault Tree Using a top-down approach we get the following Boolean expression for the top event in terms of the component failure conditions:

4 Andrews

2012 AR&MS Tutorial Notes

Q 1

Q(t)
1 e t

0
Figure 9. Non-repairable Component Revealed Failures

Time

Figure 11. Unrevealed Component Failure

It is known when a revealed component failure occurs and the repair can be started immediately. This is unscheduled maintenance which takes place in response to the component failure occurrence. For components with constant failure rate, , and constant repair rate, , the unavailability at time t (illustrated in figure 10) is given by: (7) Q (t ) = (1 e ( + )t ) +

(1 e ) (8) 1 e t Q AV = (1 e ) dt = t + = 1 0 0 Where is the interval between inspections. Alternatively this can be approximated by: (9) Q AV = + 2
1

t

The average unavailability is given by:

5. MINIMAL CUT SET FAILURE PROBABILITY Assuming the components fail independently of each other the calculation of the minimal cut set, Ci, probabilities is trivial and given by:
P(Ci ) = P( X j )
j =1 n

Q(t) 1 +

(10)

where the events in the minimal cut set, Ci, are X1, X2, Xn. 6. SYSTEM FAILURE PROBABILITY Using fault tree analysis predictions for the failure probability or the failure frequency of the system (top event) can be made. In this section we will concentrate on the top event probability. Having obtained the minimal cut sets we can express the top event logic equation as the disjunction (OR) of the NC minimal cut sets, Ci. The system failure probability, Qsys, is then the probability of this disjunction: T = C1 + C2 + L + C NC (11) QSYS = P (T ) = P (C1 + C2 + L + C N C ) Then top event probability is then evaluated using the inclusion-exclusion expansion:

0
Figure 10. Revealed Component Failure

Time

Note that when the times to an event are given by the exponential distribution and occur with a constant rate then the mean time to the event is 1/rate so: Mean time to failure, = 1/ and Mean time to repair, =1/ Unrevealed failure When components are part of standby or safety systems which only operate under certain conditions then when failures occur they will not be noticed. For this type of system they must be tested to reveal the failure and so the repair takes place when scheduled tests are carried out. This results in the failure probability distribution shown in figure 11.

QSYS = P(Ci )

NC

NC

L + ( 1)
i =3 j =2

i =1 NC i 1

j 1

i =2

P (C C ) +
j =1 i j

i 1

k =1 N C +1

P (Ci C j Ck ) L P(C1 C2 L CNC )

(12)

Consider the example fault tree in figure 7 which has minimal cut sets {B,D}, {A,B,C}. Applying equation 12 gives: QSYS = qa qb qc + qb qd + qa qb qc qd (13) where qA, qB, qC, qD are the failure probabilities of components A, B, C and D respectively. In this particular example it is a simple calculation. However, consider a moderate to large sized fault tree which

2012 Annual RELIABILITY and MAINTAINABILITY Symposium

Andrews 5

delivered 100,000 minimal cut sets. The number of elements in first term of equation 12 would be105, in the second term 5 x 109 and in the third term 1.7 x 1014 and so on for the105 terms in the equation. Even with modern fast digital computers this is an enormous number of calculations and would take a considerable time to complete. In practice acceptably accurate upper bound approximations are used such as the Rare Event approximation (equation 14) or the Minimal Cut Set Upper Bound (equation 15).
QEXACT P(Ci ) r
i =1 Nc

TOP

GATE 1

(14) (15)

GATE 2

QEXACT 1 (1 P (Ci ) )
i =1

Nc

7. IMPORTANCE MEASURES Should a system not perform to the reliability or availability target required then modifications to the design or operation have to be made to address the weaknesses. An output from a fault tree analysis which can help to identify the weaknesses is importance measures. Importance measures provide an indication, in some sense, of the contribution that each basic event or minimal cut set makes to the system failure mode. There are many different types of importance measure and each calculates a different means of ranking the contribution to the top event. More details can be found in references 12 and 14. Considering the basic event importance measures. The vulnerability of the system to the occurrence of each component failure event is indicated by a numerical value. The higher the importance value the greater the contribution of that basic event to the system failure. Depending on nature of the importance measure they can take into account such things as the structure of the system (levels of redundancy etc), the failure rate of the component, and the time taken to repair the component. To improve the system performance the basic events which have the highest importance measure can be addressed. Importance measures can be deterministic which consider only the system structure or probabilistic and account for the likelihood of component failures. A concept which is fundamental in developing component importance measures is that of a critical system state. A Critical System State for a component i is a state for the remaining n-1 components such that failure of component i causes the system to go from a working to a failed state. STRUCTURAL MEASURE OF IMPORTANCE Having defined the critical system states the structural measure of importance, Ii, can be defined: number of critical states for component i . Ii = (16) total number of states for the (n - 1) remaining components Consider a simple system of 4 components whose failure causes are represented by the fault tree in figure 12. Where the failure of the components are given by: qA = qC = 0.1, and qB = qD = 0.2.

Figure 12. Simple Four Component System Fault Tree Taking each component in turn the critical system states can be identified by constructing a table which considers the states of all the other components in the system. Some of these states may already satisfy the conditions which mean the system is failed. Others will mean that the system still functions. From these states, those which will fail the system when the component being considered fails are critical and identified. These tables are illustrated for components A, B and C in tables 2, 3 and 4 respectively. Due to the symmetry of the system component D will have the same number of critical states as component C.
B 1 2 3 4 5 6 7 8 W W W W F F F F States C W W F F W W F F D W F W F W F W F Critical for A? Y Y Y Y Y N N N

Table 2. Criticality of Component A


A 1 2 3 4 5 6 7 8 W W W W F F F F States C W W F F W W F F D W F W F W F W F Critical for B? N Y Y Y N N N N

Table 3. Criticality of component B

6 Andrews

2012 AR&MS Tutorial Notes

A 1 2 3 4 5 6 7 8 W W W W F F F F

States B W W F F W W F F

D W F W F W F W F

Critical for C/D? N N Y N N N N N

Table 4. Criticality of Component C This gives structural importance measures for the components of: IA = 5/8 IB = 3/8 (17) IC = ID = 1/8 BIRNBAUM MEASURE OF IMPORTANCE The Criticality Function, Gi(q), is the probability that the system is in a critical state for component i . This is also known as Birnbaums measure of importance. From table 2 Birbaums measure of importance for component A is given by summing the probability of being in a critical state. This is: GA = (1 qB)(1 qC)(1 qD) + (1 qB)(1 qC) qD + (1 qB)( qC)(1 qD) + (1 qB) qC qD + qB (1 qC)(1 qD) = (1 qB)+ qB (1 qC)(1 qD) (18) GA = 0.944 Similarly from tables 3 and 4 we get: GB = (1 qA)(1 qC) qD + (1 qA) qC (1 qD) + (1 qA) qC qD (19) GB = 0.252 and GC = (1 qA) qB (1 qD) (20) GC = 0.144 (21) GD = (1 qA) qB (1 qC) GD = 0.162 Whilst the structural and Birnbaum measures can be produced using the tabular approach this soon becomes impractical for real systems due to the size of the tables. An alternative means of calculating Birnbaums measure is to use: Qsys (22) Gi ( q ) = = Qsys (1i , q ) Qsys (0 i , q) qi where Qsys(1i,q) is the system failure probability with qi=1 and Qsys(0i,q) is the system failure probability with qi=0. CRITICALITY MEASURE OF IMPORTANCE The criticality measure of importance for component i is the contribution to the system failure probability due to the system being in a critical state for component i and i failing.

Gi (q(t ))qi (t ) (23) QSYS (t ) The failure probability of the simple system shown in figure 12, with minimal cut sets {A}, {B,C} and {B,D}, is given by: QSYS = q A + qB qC + qB qD q A qB qC q A qB qD qB qC qD + q A qB qC qD (24) = 0.1 + 0.02 + 0.04 0.002 0.004 0.004 + 0.0004 = 0.1504 The criticality importance measures for the components are then: (0.944)(0.1) ICMA = = 0.6277 0.1504 (0.252)(0.2) ICMB = = 0.3351 (25) 0.1504 (0.144)(0.1) ICMC = = 0.0957 0.1504 (0.162)(0.2) ICMD = = 0.2154 0.1504 I CM i =

FUSSELL-VESELY MEASURE OF IMPORTANCE The Fussell-Vesely measure of component importance for component i is defined as: the ratio of the probability of the union of all minimal cut sets containing i and the system failure probability. P U Cj iC r (26) I FVi = j QSYS For the simple system shown in figure 12 this measure gives: q 0.1 = 0.6649 I FVA = A = QSYS 0.1504 q ( q + qD qC qD ) 0.2(0.1 + 0.2 0.02) = = 0.3723 I FVB = B C QSYS 0.1504 q q 0.02 = 0.1330 I FVC = C B = QSYS 0.1504 q q 0.04 = 0.2660 I FVD = D B = QSYS 0.1504 (27)
8. SYSTEM FAILURE INTENSITY Let wSYS(t) is the system failure intensity at time t. Having calculated Birnbaums measure of importance for each of the n components means that the system failure intensity can be determined from:
wSYS (t ) = Gi ( q ) wi (t )
i =1 n

(28)

where wi is the component failure intensity and Gi(q) is the Criticality Function. 9 SYSTEM CASE STUDY As an example of applying a fault tree analysis to a system consider the simple tank level control system shown in figure 13. Initially the system has the push button contacts open and

2012 Annual RELIABILITY and MAINTAINABILITY Symposium

Andrews 7

switches 1 and 2 (SW1, SW2) contacts closed. To start the system the push button is pressed and held. This energises relay R1 which closes its contacts and maintains the circuit when the push button is released. Relay R2 is also energised and its contacts close, starting the pump in the second circuit. The pump transfers fluid to the tank. The level of the tank fluid is monitored by two level sensors L1 and L2. When the tank fluid reaches the required level switch SW1 opens and de-energises relay R2 turning off the pump. When the fluid in the tank is used and the level drops SW1 will close and pump fluid to replace that used. The normal operation of the system is the switch SW1 opening and closing which turns off and on the pump. As a safety feature, the second level sensor, L2, is connected to switch SW2. When the fluid level is unacceptably high SW2 opens which de-energises relay R1. R1 contacts then open to break the control circuit. This results in R2 deenergising, its contacts open and remove power from the pump. This will require a manual start-up of the circuit. Figure 13. Simple Tank Level Control System

Tank Overfills

Pump Motor energised too long Relay R2 contacts closed too long

Relay R2 contacts fail closed

Relay R2 remains energised

R2

Figure 14. Fault Tree for Top Event Tank Overfills For the system failure mode Tank overfills, the relevant component failure modes along with the failure rate and repair time data are shown in table 5. Some of the failure modes will be revealed such as relay R2 contacts stuck closed. This component condition will mean that the pump keeps running and the problem is revealed by the tank overfilling. Others such as relay R1 contacts fail closed will be unrevealed as this is the normal operating state for that component. All of the component failure modes associated with the safety systems (L2, SW2, R1 and PB) will be unrevealed as for this class of events the failure will only be revealed when the component is tested /inspected or when a demand for the component to work

8 Andrews

2012 AR&MS Tutorial Notes

occurs. For these component failure events an inspection interval is also specified which enables the probability of the event to be calculated. For this example an inspection interval of 4380 hours is assumed. Component Failure Mode Stuck closed Stuck closed Stuck closed Fail to indicate high level Code Failure Rate (per hour) 5. x 10-5 6. x 10-5 Mean Time to Repair (hours) 2. 10. 10. 5.

The final fault tree structure showing how the basic events combine to cause the system level failure event is illustrated in figure 15 For the tank level control system fault tree the complete list of minimal cut sets are given in table 6. As can be seen there are 9 failure combinations in total. One is first order (a single event causes system failure) and eight are of order two. 1 2 3 4 5 6 7 8 9 R2 SW1 SW1 SW1 SW1 L1 L1 L1 L1

Push Button Relay Contacts Switch Level Sensors

PB R1/R2

PB
R1 SW2 L2

SW1/SW2 5. x 10-5 L1/L2 2. x 10-6

PB R1 SW2 R1

Table 5. Component Failure Modes and Data

Table 6. Minimal Cut Sets Using the component failure data in table 1, the system failure parameters can be calculated: Top Event Probability = 1.39 x 10-3 Top Event Frequency = 1.919 x 10-4 per hour If the system failure predictions indicate an unacceptable performance the weaknesses can be identified using component importance measures. The Fussell-Vesely measure is indicated in table 7. This shows that component L1 provides the biggest contribution to system failure. Rank 1 2 3 4 5 6 7 Component L1 R2 L2 SW1 R1 SW2 PB Fussell Vesely 0.4148 0.3777 0.3155 0.2075 0.1139 0.0966 0.0963

Table 7 Importance Measures The system assessment results presented have been obtained using a commercial software package. Figure 15. Tank Overfill Fault Tree Structure The fault tree for the undesired top event tank overfills is developed in figure 14. The text boxes specify exactly what each gate output event in the fault tree represents. Each branch is developed downward using AND and OR gates until basic events (component failure events) are encountered and the failure causality development is terminated. 10. SUMMARY A summary of the features of fault tree analysis is: Provides a well structured development of the system failure logic. Forms a documented record of analysis which can be used to communicate fault development with regulators etc. Directly developed from the engineering system structure. Easily interpreted from the engineering viewpoint. Analysis gives all minimal cut sets.

2012 Annual RELIABILITY and MAINTAINABILITY Symposium

Andrews 9

Quantification gives the top system failure mode probability or frequency. Vulnerability to system failure can be identified using importance measures. 11. REFERENCES

1. W.E. Vesely, A Time Dependent Methodology for Fault Tree Evaluation, Nuclear Design and Engineering, no. 13 (1970): 337-360. 2. Z.W.Birnbaum, On the importance of different components in a multi-component system, Multivariate Analysis 11, P.R.Krishnaiah, ed.,Academic Press, 1969 3. Fussell, J. B., How to Hand-Calculate System Reliability Characteristics, IEEE Transactions on Reliability, R-24, (3), 1975 4. Lambert H.E and Dunglinson C., Interval Reliability for Initiating and Enabling events, , IEEE Transactions on Reliability, Vol 32, June 1983, pp 150-163. 5. Akers B, Binary Decision Diagrams, IEEE Trans on Computers, 27(6), 509-516, 1978. 6. Bryant R, Graph Based Algorithms for Boolean Function Manipulation, IEEE Trans on Computers, 35(8), 677-691, 1986.

7. Schneeweiss W., Fault Tree Analysis Using Binary Decision Diagrams, IEEE Trans on Reliability, 34(5), 453-457, 1985. 8. Rauzy A, New Approaches for Fault Tree Analysis, Reliability Engineering and System Safety, 05(59), 203211, 1993. 9. Sinnamon R.M. and Andrews J.D., Quantitative Fault Tree Analysis Using Binary Decision Diagrams, European Journal of Automation, 30 (8), 1996, 1051-1071. 10. Sinnamon R.M and Andrews J.D., Improved Efficiency in Qualitative Fault Tree Analysis, Quality and Reliability Engineering International, Vol 13, 1997, pp293-298. 11. Sinnamon R.M and Andrews J.D., Improved Accuracy in Quantitative Fault Tree Analysis, Quality and Reliability Engineering International, Vol 13, 1997, pp285-292 12. Andrews J.D. and Moss T.R., Reliability and Risk Assessment, Professional Engineering Publications Ltd, 2002. 13. Haasl D.F., Roberts N.H., Vesely, W.E. and Goldberg F.F., Fault Tree Handbook, US Nuclear Regulatory Commission NUREG-0492, 1981 14. Henley E.J. and Kumamato H., Reliability Engineering and Risk Assessment, Prentice-Hall, 1981

10 Andrews

2012 AR&MS Tutorial Notes

You might also like