Estimating Failure Function For Sure
Content
1 FAILURE FUNCTION DISTRIBUTIONS
1.1 INTRODUCTION
1.2 NEGATIVE EXPONENTIAL DISTRIBUTION
1.3 NORMAL OR GAUSSIAN DISTRIBUTION
1.4 LOG-NORMAL DISTRIBUTION
1.5 WEIBULL DISTRIBUTION
2 ESTIMATION OF DISTRIBUTION
2.1 INTRODUCTION
2.2 SAMPLES OF FAILURE DATA
2.3 INCOMPLETE OR CENSORED SAMPLES
2.4 SAMPLE WITH ONLY A FEW COMPONENTS
2.5 ESTIMATING A WEIBULL DISTRIBUTION FROM A SAMPLE
2.6 EXAMPLE OF WEIBULL DISTRIBUTION ESTIMATION
2.7 PROBLEMS WITH FIELD DATA COLLECTION
2.8 BEST PRACTICE AND RECOMMENDATION FOR FIELD DATA COLLECTION
3 REFERENCES
1.1 Introduction
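$$R(t) = e^{-\lambda t} \quad \left(\text{for } t \ll \tfrac{1}{\lambda}: \; R(t) \approx 1 - \lambda t\right)$$
$$F(t) = 1 - e^{-\lambda t} \quad \left(\text{for } t \ll \tfrac{1}{\lambda}: \; F(t) \approx \lambda t\right)$$
$$f(t) = \lambda \, e^{-\lambda t}$$
$$\lambda(t) = \lambda$$
[1]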
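The relations in [1] can be checked numerically. The following is a minimal sketch; the function names and the values of the failure rate and time are illustrative assumptions, not taken from this document.

```python
import math

def exp_reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

def exp_failure_probability(t, lam):
    """F(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)

def exp_failure_density(t, lam):
    """f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)

# Assumed, illustrative values: t is much smaller than 1/lam = 10 000 hours.
lam = 1.0e-4   # failures per hour
t = 100.0      # hours

exact = exp_failure_probability(t, lam)
approx = lam * t   # linear approximation F(t) ~ lam * t
print(f"exact F(t) = {exact:.6f}, approximation = {approx:.6f}")

# The failure rate f(t) / R(t) is constant and equal to lam.
rate = exp_failure_density(t, lam) / exp_reliability(t, lam)
print(f"f(t)/R(t) = {rate:.6e}")
```

The linear approximation is only adequate while t stays well below 1/λ; for larger t the exact expression should be used.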
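$$F(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_0^t e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} \, dx$$
$$f(t) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{t - \mu}{\sigma} \right)^2}$$
[2]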
With: -∞ < t < ∞
σ > 0: standard deviation
μ > 0: the average life expectancy
For -∞ < t < 0 the distribution has no meaning for reliability or availability (a negative
life expectancy is of course impossible). For μ > 3σ this negative part is negligible.
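The size of the negative part can be quantified with the standard normal distribution function. The sketch below uses only the Python standard library; the function names and the numerical values of μ and σ are assumptions chosen for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, expressed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def negative_mass(mu, sigma):
    """Probability mass that the normal density places on t < 0."""
    return std_normal_cdf(-mu / sigma)

def failure_probability(t, mu, sigma):
    """F(t) as in equation [2]: the density integrated from 0 to t."""
    return std_normal_cdf((t - mu) / sigma) - negative_mass(mu, sigma)

# Assumed, illustrative values at the boundary case mu = 3 * sigma.
sigma = 1000.0   # hours
mu = 3000.0      # hours

lost = negative_mass(mu, sigma)
print(f"mass below t = 0: {lost:.5f}")
print(f"F(mu) = {failure_probability(mu, mu, sigma):.5f}")
```

At μ = 3σ the mass below zero is about 0.1 %, and it shrinks quickly as μ/σ grows.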
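$$F(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_0^t \frac{1}{x} \, e^{-\frac{1}{2} \left( \frac{\ln x - \mu}{\sigma} \right)^2} \, dx$$
$$f(t) = \frac{1}{\sigma \, t \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{\ln t - \mu}{\sigma} \right)^2}$$
[3]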
With t, μ and σ > 0.
After the negative exponential distribution, the Weibull distribution is probably the most widely
used distribution in reliability and availability research. The distribution is versatile and can be
used to describe various failure functions.
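$$F(t) = 1 - e^{-\left( \frac{t - \gamma}{\eta} \right)^{\beta}}$$
$$f(t) = \beta \, \frac{(t - \gamma)^{\beta - 1}}{\eta^{\beta}} \, e^{-\left( \frac{t - \gamma}{\eta} \right)^{\beta}}$$
$$\lambda(t) = \beta \, \frac{(t - \gamma)^{\beta - 1}}{\eta^{\beta}}$$
[4]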
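The three Weibull functions of [4] translate directly into code. This is a minimal sketch: the function names and parameter values are illustrative assumptions, and times up to and including the minimal life γ are simply treated as failure-free.

```python
import math

def weibull_failure_probability(t, beta, eta, gamma=0.0):
    """F(t) = 1 - exp(-((t - gamma) / eta) ** beta)."""
    if t <= gamma:
        return 0.0   # simplification: no failures up to the minimal life
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))

def weibull_failure_rate(t, beta, eta, gamma=0.0):
    """lambda(t) = beta * (t - gamma) ** (beta - 1) / eta ** beta."""
    if t <= gamma:
        return 0.0   # simplification, see above
    return beta * (t - gamma) ** (beta - 1.0) / eta ** beta

def weibull_failure_density(t, beta, eta, gamma=0.0):
    """f(t) = lambda(t) * (1 - F(t))."""
    survival = 1.0 - weibull_failure_probability(t, beta, eta, gamma)
    return weibull_failure_rate(t, beta, eta, gamma) * survival

# Assumed, illustrative parameters.
eta = 1000.0   # characteristic life in hours
for beta in (0.5, 1.0, 3.0):
    rates = [weibull_failure_rate(t, beta, eta) for t in (100.0, 500.0, 2000.0)]
    print(f"beta = {beta}: failure rate at 100, 500, 2000 h = {rates}")
```

With β < 1 the failure rate decreases over time, with β = 1 it is constant and equal to 1/η (the negative exponential distribution), and with β > 1 it increases, which is what makes the distribution so versatile.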
2.1 Introduction
The failure function of a component depends on the load and on the capability of the component
to cope with that load. The load can be mechanical, thermal, chemical and/or corrosive. The
component's capability to cope with the loads depends on the design, the production quality
and the maintenance.
Most knowledge of failure functions is the result of field data collection. There are, however,
many problems when collecting failure data. A very strict definition of what constitutes a
failure is needed.
There are many sources of failure function data: literature, commercial databases and field
data, either self-obtained or from a manufacturer. Data in literature are often not very well
specified (conditions of use, exact failure mode) and often provide only a failure rate.
Commercial databases are much more detailed, but a translation often still has to be made
between the conditions and failure modes described for the data found and the actual
conditions and failure modes. In general, the most reliable data are those acquired either
through own experience or directly from the manufacturer.
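$$\hat{F}(t) = \frac{N_F(t)}{N_0}$$
$$\hat{R}(t) = \frac{N_{NF}(t)}{N_0} = 1 - \frac{N_F(t)}{N_0}$$
$$\hat{f}(t) = \frac{N_F(t + \Delta t) - N_F(t)}{N_0 \, \Delta t}$$
$$\hat{\lambda}(t) = \frac{N_F(t + \Delta t) - N_F(t)}{N_0(t) \, \Delta t}$$
[5]
(^ = estimate)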
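The estimates in [5] can be computed directly from a list of observed times to failure. The sketch below makes one assumption that is not stated in the text: the denominator N0(t) of the failure rate estimate is read as the number of components still functioning at time t. The sample values and the function name are hypothetical.

```python
def empirical_estimates(failure_times, t, dt):
    """Return estimates (F, R, f, lambda) at time t using a window of width dt."""
    n0 = len(failure_times)                               # sample size N0
    failed_t = sum(1 for tf in failure_times if tf <= t)  # N_F(t)
    failed_t_dt = sum(1 for tf in failure_times if tf <= t + dt)  # N_F(t + dt)
    surviving = n0 - failed_t   # assumed meaning of N0(t): still working at t

    f_prob = failed_t / n0
    reliability = surviving / n0
    density = (failed_t_dt - failed_t) / (n0 * dt)
    if surviving > 0:
        rate = (failed_t_dt - failed_t) / (surviving * dt)
    else:
        rate = float("nan")     # undefined once every component has failed
    return f_prob, reliability, density, rate

# Hypothetical sample of ten times to failure, in hours.
sample = [120, 340, 560, 610, 800, 950, 1100, 1300, 1700, 2400]
F_hat, R_hat, f_hat, lam_hat = empirical_estimates(sample, t=500.0, dt=500.0)
print(F_hat, R_hat, f_hat, lam_hat)
```

The density is normalised by the original sample size, while the failure rate is normalised by the components still at risk, so the rate estimate is never smaller than the density estimate.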
If a large sample is available, it is practical to group the data into classes. For a first
estimate about 10 classes often suffice; Table I gives an example. Here N0 = 100 and 11
classes have been defined. With this approach the failure probability is slightly
underestimated for the lowest class (where no failure has yet occurred in the sample)
and, at the same time, overestimated for the highest class (where no samples have
been found).
Table I: Example of failure sample data, grey columns are actual acquired data.
Often the available data are truncated, or censored. Four types of censoring are
commonly encountered:
Type I: The time the components are monitored is limited, not all components will
have failed at the end of the monitored period.
Type II: The number of components that are allowed to fail is limited, also to limit
the duration of the monitoring period.
Type III: Combination of I and II: either after a certain time or after a certain number
of components has failed (whichever comes first) monitoring is stopped.
Type IV: The start-up moment of the components is not the same, but a random
variable.
There are various ways to compensate for censored data sets. See for instance Section
9.3 in [Høland, 1994].
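To make the first two censoring types concrete, the sketch below shows what remains of a complete sample after Type I and Type II censoring. The function names and the sample values are hypothetical; the sketch only illustrates which observations are lost and does not implement any compensation method.

```python
def censor_type_1(times_to_failure, monitoring_time):
    """Type I: monitoring stops at a fixed time; later failures are not observed."""
    observed = sorted(t for t in times_to_failure if t <= monitoring_time)
    survivors = len(times_to_failure) - len(observed)
    return observed, survivors

def censor_type_2(times_to_failure, max_failures):
    """Type II: monitoring stops as soon as max_failures components have failed."""
    ordered = sorted(times_to_failure)
    observed = ordered[:max_failures]
    survivors = len(ordered) - len(observed)
    return observed, survivors

# Hypothetical complete sample of times to failure, in hours.
complete = [36, 64, 124, 210, 380]

type_1 = censor_type_1(complete, monitoring_time=150)
type_2 = censor_type_2(complete, max_failures=2)
print("Type I :", type_1)
print("Type II:", type_2)
```

In both cases the survivors are known to have lasted at least until monitoring stopped, and that partial information is what the compensation methods make use of.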
A special approach is also needed for small samples. Consider a sample of three times to
failure:
# tf
1 36
2 64
3 124
Using the method described in section 2.2 this gives:
# tf F̂(t)
1 36 0.33
2 64 0.67
3 124 1
This implies that, for all components, failure becomes a certainty after 124 running hours!
This is not very plausible, so let us assume four classes:
I: 0 - 36
II : 36 - 64
III : 64 - 124
IV: ≥124
If we now assume that a new failure has an equal chance of occurring in any of these
classes, the following failure function is found:
# tf F̂(t)
1 36 0.25
2 64 0.50
3 124 0.75
Generalized, this estimate results in:
$$\hat{F}(t_i) = \frac{i}{N_0 + 1}$$
[6]
An assumption underlying this approximation is that the distribution function is symmetrical.
In practice, however, a skewed distribution is often found (e.g. the log-normal distribution).
A commonly used approximation for that case, instead of equation [6], is:
$$\hat{F}(t_i) = \frac{i - 0.3}{N_0 + 0.4}$$
[7]
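Both approximations are easy to compare on the three-component sample used above. In this minimal sketch the function names are illustrative; the formulas are those of equations [6] and [7].

```python
def estimate_eq6(i, n0):
    """Equation [6]: F(t_i) = i / (N0 + 1)."""
    return i / (n0 + 1)

def estimate_eq7(i, n0):
    """Equation [7]: F(t_i) = (i - 0.3) / (N0 + 0.4)."""
    return (i - 0.3) / (n0 + 0.4)

times_to_failure = [36, 64, 124]   # the small sample from the text
n0 = len(times_to_failure)

rows = []
for i, tf in enumerate(sorted(times_to_failure), start=1):
    rows.append((i, tf, estimate_eq6(i, n0), estimate_eq7(i, n0)))

for i, tf, f6, f7 in rows:
    print(f"{i}  {tf:>4}  [6]: {f6:.3f}  [7]: {f7:.3f}")
```

Equation [6] reproduces the values 0.25, 0.50 and 0.75 found above, while equation [7] gives roughly 0.21, 0.50 and 0.79, so the two differ mainly at the extremes of the sample.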
Being able to calculate the failure probability and other quantities at any given time is
one of the main advantages of estimating a distribution function rather than using
tabulated sample results.
With $y = \ln \ln \frac{1}{1 - F(t)}$, $x = \ln t$ and $c = \beta \ln \eta$, this gives:
$$y = \beta \, x - c$$
[10]
This is the equation of a straight line! The axes of the Weibull paper are scaled to
represent this (Figure 5).
The sample data are now plotted with the time to failure on the horizontal axis and the
approximated failure function on the vertical axis. A proper fit is found if it is possible
to draw a straight line through the plotted points.
The characteristic life η is found at F̂(t) = 0.632:
$$F(\eta) = 1 - e^{-\left( \frac{\eta}{\eta} \right)^{\beta}} = 1 - e^{-1} = 0.632$$
The shape parameter β is the slope of the fitted line. For convenience, a graphic
aid is included on the Weibull graph: in the upper left corner a scale for β is given. By
moving the fitted line in parallel until it crosses the open dot near F̂(t) = 0.632, β is
found on the scale on top.
If the result is not a straight line, it is possible that the minimal life γ > 0. By
subtracting the minimal life from the times to failure, a straight line is again obtained.
The appropriate minimal life can only be determined iteratively.
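The graphical procedure can also be imitated numerically. The sketch below replaces drawing a line by eye on Weibull paper with an ordinary least-squares fit of y = βx - c; this substitution, the function name and the choice of sample are assumptions made for illustration. A trial value for the minimal life γ can be passed in and varied by hand, mirroring the iterative search described above.

```python
import math

def fit_weibull(times, failure_probabilities, gamma=0.0):
    """Least-squares fit of y = beta * x - c; returns (beta, eta)."""
    # Transform to the Weibull-paper coordinates x = ln(t - gamma), y = ln ln(1/(1 - F)).
    xs = [math.log(t - gamma) for t in times]
    ys = [math.log(math.log(1.0 / (1.0 - f))) for f in failure_probabilities]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx                 # slope of the fitted line
    c = beta * mean_x - mean_y       # from y = beta * x - c
    eta = math.exp(c / beta)         # because c = beta * ln(eta)
    return beta, eta

# Small sample from the text, with failure probabilities from equation [7].
times = [36, 64, 124]
probabilities = [(i - 0.3) / (len(times) + 0.4) for i in range(1, len(times) + 1)]

beta_fit, eta_fit = fit_weibull(times, probabilities)
print(f"beta = {beta_fit:.2f}, eta = {eta_fit:.1f} hours")
```

With only three points the fitted parameters are very uncertain; the value of the sketch is in showing how the slope and intercept of the line map back to β and η.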
hours. This result is also shown in Figure 5. The resulting points are very well
approximated by a straight line.
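$$f(t) = \frac{2.2 \, (t - 2000)^{1.2}}{3200^{2.2}} \, e^{-\left( \frac{t - 2000}{3200} \right)^{2.2}}$$
$$\lambda(t) = \frac{2.2 \, (t - 2000)^{1.2}}{3200^{2.2}}$$
[11]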
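As a quick plausibility check of [11], the functions can be evaluated at one convenient point. The worked example below assumes the parameters β = 2.2, η = 3200 hours and γ = 2000 hours as read from [11], and picks t = γ + η = 5200 hours because the exponent then equals exactly 1; F(t) follows from the general Weibull form in [4].

```latex
\begin{align*}
F(5200) &= 1 - e^{-\left( \frac{5200 - 2000}{3200} \right)^{2.2}} = 1 - e^{-1} \approx 0.632 \\
\lambda(5200) &= \frac{2.2 \cdot 3200^{1.2}}{3200^{2.2}} = \frac{2.2}{3200} \approx 6.9 \times 10^{-4} \text{ per hour} \\
f(5200) &= \lambda(5200) \, e^{-1} \approx 2.5 \times 10^{-4} \text{ per hour}
\end{align*}
```

The value F = 0.632 at t = γ + η agrees with the definition of the characteristic life.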
Class nr.   tf [hour]   F̂(t)   λ̂(t)   F̂(t)   λ̂(t)
1           <2000       0      0      0      0
Inventories: Whilst failure reports identify the numbers and types of failure they rarely
provide a source of information as to the total numbers of the item in question and
their installation dates and running times.
Motivation: If the field service engineer can see no purpose in recording information it
is likely that items will be either omitted or incorrectly recorded. The purpose of fault
reporting and the ways in which it can be used to simplify the task need to be
explained. If the engineer is frustrated by unrealistic time standards, poor working
conditions and inadequate instructions, then the failure report is the first task which
will be skimped or omitted. A regular circulation of field data summaries to the field
engineer is the best (possibly the only) way of encouraging feedback. It will help him
to see the overall field picture and advice on diagnosing the more awkward faults will
be appreciated.
Verification: Once the failure report has left the person who completes it the possibility
of subsequent checking is remote. If repair times or diagnoses are suspect then it is
likely that they will go undetected or be unverified. Where failure data are obtained
from customers' staff, the possibility of challenging information becomes even more
remote.
Cost: Failure reporting is costly in terms of both the time to complete failure-report
forms and the hours of interpretation of the information. For this reason, both supplier
and customer are often reluctant to agree to a comprehensive reporting system. If the
information is correctly interpreted and design or manufacturing action taken to
remove failure sources, then the cost of the activity is likely to be offset by the savings
and the idea must be ’sold’ on this basis.
Recording non-failures: The situation arises where a failure is recorded although none
exists. This can occur in two ways. First, there is the habit of locating faults by
replacing suspect but not necessarily failed components. When the fault disappears the
first (wrongly removed) component is not replaced and is hence recorded as a failure.
Failure rate data are therefore artificially inflated and spares depleted. Second, there is
the interpretation of secondary failures as primary failures. A failed component may
cause stress conditions upon another which may, as a result, fail. Diagnosis may reveal
both failures but not always which one occurred first. Again, failure rates become
wrongly inflated. More complex maintenance instructions and the use of higher-grade
personnel will help reduce these problems at a cost.
Times to failure: These are necessary in order to establish wear-out. See next section.
1 This section is based on Section 13.2 of “Reliability, Maintainability and Risk” by D.J. Smith,
Butterworth-Heinemann, ISBN 0-7506-5168-7, 2001.
The following list summarizes the best practice together with recommended
enhancements for both manual and computer based field failure recording. Recorded
field information is frequently inadequate and it is necessary to emphasize that failure
data must contain sufficient information to enable precise failures to be identified and
failure distributions to be identified. They must, therefore, include:
Adequate information about the symptoms and causes of failure. This is important
because predictions are only meaningful when a system level failure is precisely
defined. Thus component failures which contribute to a defined system failure can only
be identified if the failure modes are accurately recorded. There needs to be a
distinction between failures (which cause loss of system function) and defects (which
may only cause degradation of function).
Intervals between common cause failures. Because common cause failures do not
necessarily occur at precisely the same instant it is desirable to be able to identify the
time elapsed between them.
The effect that a ’component part’ level failure has on failure at the system level. This
will vary according to the type of system, the level of redundancy (which may
postpone system level failure) etc.
Costs of failure such as the penalty cost of system outage (e.g. loss of production) and
the cost of corrective repair effort and associated spares and other maintenance costs.
2 This section is based on Section 13.5 of “Reliability, Maintainability and Risk” by D.J. Smith,
Butterworth-Heinemann, ISBN 0-7506-5168-7, 2001.
Effective data screening to identify and correct errors and to ensure consistency. There
is a cost issue here in that effective data screening requires significant man-hours to
study the field failure returns. In the author’s experience an average of as much as
one hour per field return can be needed to enquire into the nature of a given failure
and to discuss and establish the underlying cause. Both codification and narrative are
helpful to the analyst and, whilst each has its own merits, a combination is required in
practice. Modern computerized maintenance management systems offer possibilities
for classification and codification of failure modes and causes. However, this relies on
motivated and trained field technicians to input accurate and complete data. The
option to add narrative should always be available.
Adequate information about the environment (e.g. weather in the case of unprotected
equipment) and operating conditions (e.g. unusual production throughput loadings).
3 REFERENCES
[Høland, 1994]
A. Høland and M. Rausand: “System Reliability Theory, Models and Statistical
Methods”, ISBN 0-471-59397-4, John Wiley & Sons Inc., 1994.
[Smith, 2001]
D.J. Smith: “Reliability, Maintainability and Risk”, ISBN 0-7506-5168-7, Butterworth-
Heinemann, 2001.