KK Aggarwal
VOLUME 3
Editor
A.Z. Keller, Department of Industrial Technology and Management,
University of Bradford, U.K.
Aims and Scope. Fundamental questions which are being asked these days of all
products, processes and services with ever increasing frequency are:
How safe?
How reliable?
How good is the quality?
In practice none of the three topics can be considered in isolation as they often
interact in subtle and complex fashions. The major objective of the series is to
cover the spectrum of disciplines required to deal with safety, reliability and
quality. The texts will be of a level generally suitable for final-year, M.Sc. and Ph.D.
students, researchers in the above fields, practitioners, engineers, consultants and
others concerned with safety, reliability and quality.
In addition to fundamental texts, authoritative 'state of the art' texts on topics of
current interest will be specifically commissioned for inclusion in the series.
Special emphasis will be placed in all texts on readability, clarity,
relevance and applicability.
The titles published in this series are listed at the end of this volume.
Reliability Engineering
by
K. K. AGGARWAL
Centre for Excellence in Reliability Engineering,
Regional Engineering College,
Kurukshetra, India
ISBN 978-94-010-4852-1
Preface ix
1.1 Introduction 1
1.2 Need for Reliability Engineering 2
1.3 Definition 4
1.4 Causes of Failures 7
1.5 Catastrophic Failures and Degradation Failures 9
1.6 Characteristic Types of Failures 11
1.7 Useful Life of Components 13
1.8 The Exponential Case of Chance Failures 15
1.9 Reliability Measures 19
1.10 Failure Data Analysis 25
3.1 Introduction 59
3.2 Reliability Block Diagrams 60
3.3 Series Systems 62
3.4 Parallel Systems 67
3.5 Series Parallel Systems 70
3.6 K-out-of-M Systems 73
3.7 Open and Short Circuit Failures 75
3.8 Standby Systems 81
4.1 Introduction 87
4.2 Path Determination 89
4.3 Boolean Algebra Methods 91
4.4 A Particular Method 93
4.5 Cut Set Approach 96
4.6 Delta-Star Method 97
4.7 Logical Signal Relations Method 100
4.8 Bayes' Theorem Method 103
PROBLEMS 335
REFERENCES 367
In its widest sense, the word Reliability carries a very important
meaning: Re-Liability, which simply means that it places a liability, not once but
again and again, on designers, manufacturers, inspectors, vendors,
users and all others involved with a system in any way, to
make it reliable. Much attention is being paid, more than ever before, to
the quality and reliability of engineering systems.
Much of the subject matter for the text has been taken from the lecture
notes of the courses which the author co-ordinated for the benefit of
practising engineers. Some of the contributors to these lecture notes deserve
my special acknowledgment. These are: Professor Krishna Gopal,
Dr.V.K.Sharma, Ms.Shashwati and Ms.Namrata of Regional Engineering
College, Kurukshetra; Professor N.Viswanadham, and Professor V.V.S.Sarma
of Indian Institute of Science, Bangalore; Shri A.K.Sinha and Shri P.K.Rao of
Centre for Reliability Engineering, Madras; Shri Siharan De and Shri
Chandragupta from Indian Space Research Organization. In addition to these
lecture notes, I have drawn very heavily from several books and papers
already published in the field of reliability engineering. It is my pleasure to
specially mention my obligation to Balagurusamy, Dhillon, Bazovsky, Ireson,
The author has tried his level best to make the text complete and free of
mistakes. Nonetheless, as a student of reliability engineering he does
realize that failures can only be minimized and their effects mitigated but
these can not be completely eliminated. I thank all those who helped me
directly and indirectly to reduce the failures and own full responsibility for
all those which still remain. I shall be grateful if any such shortcomings
or mistakes are brought to my notice.
K K AGGARWAL
1
RELIABILITY FUNDAMENTALS
1.1 INTRODUCTION
In the earlier times, the problems connected with the development and
operation of the systems were serious but the consequences of failures were
not as dramatic or as catastrophic. From the beginning of the industrial age
reliability problems had to be considered rather seriously. At first, reliability
was confined to mechanical equipment. However, with the advent of
electrification considerable effort went into making the supply of electric
power reliable. With the use of aircraft came the reliability problems
connected with airborne equipment, which were more difficult to solve than
reliability problems of stationary or land-transportation equipment. Reliability
entered a new era with the onset of the electronic age, the age of jet aircraft
flying at sonic and supersonic speeds and the age of missiles and space
vehicles. In the early days, reliability problems had been approached
intuitively, and by redesigning equipment after failures occurred. These
approaches suddenly became impractical for the new types of
airborne and electronic equipment. The intuitive approach and the redesign
approach had to make way for an entirely new approach to reliability -
statistically defined, calculated and designed.
The overall scientific discipline that deals with general methods and
procedures during the planning, design, acceptance, transportation and
testing of manufactured products to ensure their maximum effectiveness
during use and provides general methods for estimating reliability of
complex systems from component reliabilities has received the name
Reliability Engineering. Designing equipment with specified reliability figures,
demonstration of reliability values, issues of maintenance, inspection, repair
and replacement and the notion of maintainability as a design parameter
come under the purview of Reliability Engineering. It is thus obvious that the
reliability theory needed for achieving the above mentioned tasks is a
precise mathematical theory based on probability and mathematical
statistics. Also there exist conflicting requirements of cost, performance,
safety and reliability, needing system-theoretic techniques of optimization and
simulation. The complexity of modern systems, however, demands computer
aided approaches to reliability assessment.
During the World War II reliability was considered to be one of the pressing
needs in order to study the behaviour of various systems used by the
military. Several studies carried out during this period revealed startling
results.
(a) A study uncovered the fact that for every vacuum tube in use,
there was one in spare and seven tubes in transit for which
orders had already been placed.
(c) An army study revealed that between two-thirds and three-
fourths of equipment was out of commission or under repair.
(d) An air force study conducted over a five year period disclosed
that repair and maintenance costs were about 10 times the
original cost.
(g) Twenty-four maintenance man-hours per flight hour were
required in Navy aircraft in 1949. It was estimated that this
rose to 80 in 1965, primarily because of an increase in
electronic equipment complexity from 120 parts in 1949 to
8,900 in 1960 to an estimated 33,000 in 1965.
(h) A study revealed that a pre-World War II civil aeroplane had
about $4,000 worth of electronic control, navigation and
communication apparatus. The post-war commercial DC-6
required in excess of $50,000 worth of electronic apparatus,
while a contemporary jet bomber has over $1,000,000 worth of
electronic gear, a twentyfold increase over the DC-6 and over 200
times that of pre-World War II aeroplanes.
The size of the system, the intricacy of the specified functions, the length of
the useful interval of the life variable, and the degree of hostility of the
system's environment all influence the reliability.
It will be clear that the tendency towards larger systems, i.e. systems
with larger numbers of components, would decrease the reliability if the
development of more reliable system components and structures does not
keep in step. There are many such systems with a large quantitative
complexity, such as energy distribution networks, telecommunication
systems, digital computer networks, and space probes.
Further, the correct functioning of a system over a longer interval of the life
variable is increasingly important as we become dependent on such
systems (energy generation systems, pacemakers and the like). These so
called critical systems require a high reliability, often over long periods
(e.g. 25 years for telecommunication systems). A source of concern in
pacemakers, for instance, is the energy source, since circuit failures in
pacemakers occur with a probability of less than 140 × 10⁻⁹ per hour.
Besides this, our technical systems are more and more put to use in
hostile environments; they have to be suitable for a wider variety of
environments. Just think of applications in the process industry (heat,
humidity, chemical substances), mobile applications in aircraft, ships, and
vehicles (mechanical vibrations, shocks, badly defined power supply
voltages, high electromagnetic interference level).
Also the socio-ethical aspects of products with a reliability that is too low
cannot be underestimated. These low-reliability disposable products lead to
a waste of labour, energy, and raw materials that are becoming more and
more scarce.
1.3 DEFINITION
The definition of reliability most often met with in the literature is: the
probability that an item will perform its intended function adequately for a
specified period of time under stated operating and environmental conditions.
This definition brings out four significant elements:
1. Probability
2. Adequate performance
3. Time
4. Operating and environmental conditions.
The true reliability is never exactly known, but numerical estimates quite
close to this value can be obtained by the use of statistical methods and
probability calculations. How close the statistically estimated reliability
comes to the true reliability depends on the amount of testing, the
completeness of field service reporting all successes and failures, and other
essential data. For the statistical evaluation of an equipment, the equipment
has to be operated and its performance observed for a specified time
under actual operating conditions in the field or under well-simulated
conditions in a laboratory. Criteria of what is considered adequate
performance have to be exactly spelled out for each case, in advance.
Only in some simple cases, where devices of the go/no-go type are
involved, is the distinction between adequate performance and failure a
simple matter. For instance, a switch either works or does
not work - it is good or bad. But there are many more cases where such a
clear-cut decision cannot be made so easily, and a number of performance
parameters and their limits must first be specified.
2. System Complexity
3. Poor Maintenance
5. Human Reliability
Fig. 1.1 Three examples of monotonic drift, two of which give rise to failures.
First, there are the failures which occur early in the life of a component.
They are called early failures.
Secondly, there are failures which are caused by wearout of parts. These
occur in an equipment only if it is not properly maintained, or not maintained
at all. Wearout failures are due primarily to deterioration of the design strength
of the device as a consequence of operation and exposure to environmental
fluctuations. Deterioration results from a number of familiar chemical and
physical phenomena:
* Corrosion or oxidation
* Insulation breakdown or leakage
* Ionic migration of metals in vacuum or on surfaces
* Frictional wear or fatigue
* Shrinkage and cracking in plastics
Third, there are so-called chance failures which neither good debugging
techniques nor the best maintenance practices can eliminate. These failures
If we plot the curve of the failure rate against the lifetime T of a very large
sample of a homogeneous component population, the resulting failure rate
graph is shown in Fig 1.3. At the time T = 0 we place in operation a very
large number of new components of one kind. This population will initially
exhibit a high failure rate if it contains some proportion of substandard,
weak specimens. As these weak components fail one by one, the failure
rate decreases comparatively rapidly during the so-called burn-in or
debugging period, and stabilizes to an approximately constant value at
the time Tb when the weak components have died out. The component
population after having been burned in or debugged, reaches its lowest
failure rate level which is approximately constant. This period of life is
called the useful life period and it is in this period that the exponential
law is a good
approximation. When the components reach the life Tw, wearout begins to
make itself noticeable. From this time on, the failure rate increases rather
rapidly. If up to the time Tw only a small percentage of the component
population has failed, then of the many components which survived up to the
time Tw, about one-half will fail in the time period from Tw to M. The time M is
the mean wearout life of the population. We call it simply mean life,
distinguished from the mean time between failures, m = 1/λ, in the useful
life period.
Fig. 1.3 Component failure rate over the operating life T (age): early failures up to Tb,
chance failures at the approximately constant rate λ = 1/m during the useful life period
from Tb to Tw, and wearout failures beyond Tw, with mean wearout life M.
If the chance failure rate is very small in the useful life period, the
mean time between failures can reach hundreds of thousands or even
millions of hours. Naturally, if a component is known to have a mean
time between failures of say 100,000 hours (or a failure rate of
0.00001) that certainly does not mean that it can be used in operation
for 100,000 hours.
The mean time between failures tells us how reliable the component is in
its useful life period, and such information is of utmost importance. A
component with a mean time between failures of 100,000 hours will have a
reliability of 0.9999 or 99.99 percent for any 10-hour operating period.
Further if we operate 100,000 components of this quality for 1 hour, we
would expect only one to fail. Equally, would we expect only one failure if
we operate 10,000 components under the same conditions for 10 hours, or
1000 components for 100 hours, or 100 components for 1000 hours.
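The arithmetic in this paragraph can be checked directly. A minimal sketch in Python, with the failure rate and population sizes taken from the text (`expected_failures` is a helper name introduced here for illustration):

```python
import math

def expected_failures(n_units, failure_rate, hours):
    """Expected number of failures among n_units, each operated for `hours`
    with the constant failure rate `failure_rate` (per hour)."""
    prob_fail = 1.0 - math.exp(-failure_rate * hours)
    return n_units * prob_fail

# failure rate 0.00001/hr, i.e. a mean time between failures of 100,000 hr
rate = 0.00001
for n, t in [(100_000, 1), (10_000, 10), (1_000, 100), (100, 1_000)]:
    print(f"{n} units for {t} hr -> about {expected_failures(n, rate, t):.2f} failures")
```

In each case the expected number of failures is close to one, as the text states; for small λt the approximation N·λ·t is already accurate.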
In the simplest case, when a device is subject only to failures which occur
at random intervals, and the expected number of failures is the same
for equally long operating periods, its reliability is mathematically defined by
the well-known exponential formula

R = exp(- λ t)    (1.1)

In this formula λ is a constant called the failure rate, and t is the operating
time. The failure rate must be expressed in the same time units as the time t,
usually in hours; however, it may be better to use cycles or miles in some
cases. The reliability R is then the probability that the device, which has a
constant failure rate λ, will not fail in the given operating time t.
This reliability formula is correct for all properly debugged devices which are
not subject to early failures, and which have not yet suffered any degree
of wearout damage or performance degradation because of their age.
The probability that the device will not fail in its entire useful life period
of 1000 hours is then R = exp(- 1000 λ).
We often use the reciprocal value of the failure rate, which is called the
mean time between failures, m. The mean time between failures,
abbreviated MTBF can be measured directly in hours. By definition, in the
exponential case, the mean time between failures, or MTBF is
m = 1/λ    (1.2)
When plotting this function, with Reliability values on the ordinate and the
corresponding time values on the abscissa, we obtain a curve which is
often referred to as the survival characteristic and is shown in Fig 1.4.
There are a few points on this curve which are easy to remember and which
help greatly in rough predicting work. For an operating time t = m, the
device has a probability of only 36.8 percent (or approximately 37 percent)
to survive. For t = m/10, the curve shows a reliability of R = 0.9 and for t
= m/100, the reliability is R = 0.99; for t = m/1000, it is 0.999.
Fig. 1.4 The survival characteristic: (a) reliability R(t) = exp(-t/m) plotted from 0 to 3m;
(b) detail for short operating times from m/100 to m/10, where R falls from 1.00 towards 0.95.
Example 1.1

A device has a constant failure rate of 0.0001 per hour. What is its mean
time between failures, and what is its reliability for an operating period of
100 hours?

Solution

λ = 0.0001/hr

Therefore, m = 1/λ = 10,000 hr

For t = 100 hours,

R = exp(-t/m) = exp(-0.01) = 0.99
***
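The figures in Example 1.1 and the rule-of-thumb points on the survival characteristic can be checked with a few lines of Python; `reliability` is a helper name introduced here for illustration:

```python
import math

def reliability(t, mtbf):
    """R(t) = exp(-t/m) for a constant failure rate 1/m."""
    return math.exp(-t / mtbf)

m = 10_000  # MTBF from Example 1.1 (failure rate 0.0001/hr)
print(round(reliability(100, m), 2))   # the 100-hour mission of Example 1.1

# rule-of-thumb points on the survival characteristic
for frac in (1, 1/10, 1/100, 1/1000):
    print(f"t = m*{frac}: R = {reliability(frac * m, m):.4f}")
```

The loop reproduces the values quoted in the text: roughly 0.368 at t = m, 0.9 at t = m/10, 0.99 at t = m/100 and 0.999 at t = m/1000.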
1.9 RELIABILITY MEASURES
If a fixed number N0 of components is tested, there will be, after a time
t, Ns(t) components which survive the test and Nf(t) components which
fail. Therefore, N0 = Ns(t) + Nf(t) is constant throughout the test. The
reliability, expressed as a fraction by the probability definition at any time t
during the test, is:

R(t) = Ns(t)/N0

In the same way, we can also define the probability of failure Q(t) (called
unreliability) as

Q(t) = Nf(t)/N0 = 1 - R(t)

Rearranging, Nf(t) = N0[1 - R(t)], and differentiating with respect to time,

dNf(t)/dt = - N0 dR(t)/dt    (1.9)

In a given interval of time, dNf(t)
components will fail out of these Ns(t) components. When we now divide
both sides of the equation (1.9) by Ns(t), we obtain the rate of failure or the
instantaneous probability of failure per one component, which we call the
failure rate:

λ(t) = (1/Ns(t)) dNf(t)/dt = - (1/R(t)) dR(t)/dt
which is the most general expression for the failure rate, because it
applies to exponential as well as non-exponential distributions. In the
general case, λ is a function of the operating time t, for both R and dR/dt
are functions of t. Only in one case will the equation yield a constant, and
that is when failures occur exponentially at random intervals in time.

By rearrangement of the above equation,

λ(t) dt = - dR(t)/R(t)

and integration gives the general formula for reliability,

R(t) = exp[- ∫₀ᵗ λ(t) dt]

When we specify that the failure rate is constant in the above equation, the
exponent becomes

- ∫₀ᵗ λ(t) dt = - λ t

and the known reliability formula for constant failure rate results,

R(t) = exp(- λ t)
The failure density function is the negative derivative of the reliability,

f(t) = - dR(t)/dt

It may be observed that the total area under this curve equals unity, because

∫₀^∞ f(t) dt = - ∫₀^∞ dR(t) = R(0) - R(∞) = 1

Combining f(t) = - dR(t)/dt with λ(t) = - (1/R(t)) dR(t)/dt gives

λ(t) = f(t)/R(t)

which means the failure rate at any time t equals the f(t) value divided
by the reliability, both taken at the time t. This equation again applies to all
possible distributions and reliabilities, whether or not they are exponential.
In the special case when λ is constant, the distribution is

f(t) = λ exp(- λ t)

By integration, we obtain

Q(t) = ∫₀ᵗ f(t) dt    (1.18)

R(t) = 1 - ∫₀ᵗ f(t) dt    (1.19)

but because the area under the density curve is always unity, we can write

R(t) = ∫ₜ^∞ f(t) dt
This is shown in Fig1 .6, the graph of the density function for the
exponential case.
Fig. 1.6 The exponential density function.
The important point we have made here is that the failure rate is always
equal to the ratio of density to reliability. In the exponential case this ratio is
constant. However, in the case of non- exponential distributions, the ratio
changes with time and, therefore, the failure rate is then a function of time.
We have thus specified relationships between four important reliability
measures:

R(t) = exp[- ∫₀ᵗ λ(t) dt] = ∫ₜ^∞ f(t) dt

Q(t) = 1 - R(t) = 1 - exp[- ∫₀ᵗ λ(t) dt] = ∫₀ᵗ f(t) dt

f(t) = - dR(t)/dt = dQ(t)/dt

λ(t) = - (1/R(t)) dR(t)/dt = [1/(1 - Q(t))] dQ(t)/dt = f(t)/[∫ₜ^∞ f(t) dt]
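The identity λ(t) = f(t)/R(t) can be verified numerically for the exponential case; a small sketch with an illustrative constant failure rate:

```python
import math

lam = 0.001  # illustrative constant failure rate (per hour)

def R(t): return math.exp(-lam * t)        # reliability
def Q(t): return 1.0 - R(t)                # unreliability
def f(t): return lam * math.exp(-lam * t)  # failure density

# the ratio density/reliability recovers the failure rate, and in the
# exponential case it is the same constant at every operating time
for t in (0.0, 100.0, 1000.0):
    assert abs(f(t) / R(t) - lam) < 1e-12
print("f(t)/R(t) equals lambda at all checked times")
```

For a non-exponential distribution the same ratio would vary with t, which is exactly the point made in the text.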
If the test is run until all N0 components fail, and the times to failure are
recorded, the mean time between failures is the average of the recorded
times:

m = (1/N0) ∫₀^N0 t dNf

For a large population this becomes

m = (1/N0) ∫₀^∞ t N0 f(t) dt = ∫₀^∞ t f(t) dt    (1.23)

As f(t) = - dR/dt,

m = ∫₀¹ t dR    (1.24)

From the reliability curve, Fig 1.7, this can be easily interpreted as

m = ∫₀^∞ R(t) dt    (1.25)

Hence, MTBF can always be expressed as the total area under the
reliability curve.
Fig. 1.7 Area under the reliability curve.
For the exponential case,

λ(t) = λ

f(t) = λ exp(- λ t)

and

m = ∫₀^∞ exp(- λ t) dt = 1/λ    (1.26)
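Equation (1.25), MTBF as the area under the reliability curve, can be checked by numerical integration for the exponential case; the rate, step size and horizon below are illustrative:

```python
import math

lam = 0.002          # illustrative constant failure rate
mtbf = 1.0 / lam     # 500 hours

# trapezoidal integration of R(t) = exp(-lam t) over a long horizon
dt, horizon = 1.0, 10_000.0
n = int(horizon / dt)
area = sum(dt * (math.exp(-lam * i * dt) + math.exp(-lam * (i + 1) * dt)) / 2
           for i in range(n))
print(round(area, 2), mtbf)
```

The numerically integrated area agrees with 1/λ to well within the discretization error, confirming that the area under the reliability curve equals the MTBF.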
Example 1.2

A device has a linearly increasing failure rate λ(t) = kt. Determine its
reliability function.

Solution

In this case,

λ(t) = kt

Hence, R(t) = exp[- ∫₀ᵗ kt dt] = exp(- kt²/2)
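The result of Example 1.2 can be cross-checked by recovering the hazard rate z(t) = -R'(t)/R(t) numerically from R(t); the constant k below is illustrative:

```python
import math

k = 0.0002  # illustrative wearout constant

def R(t):
    """Reliability for the linearly increasing failure rate z(t) = k t."""
    return math.exp(-k * t * t / 2.0)

def hazard(t, h=1e-4):
    # z(t) = -R'(t)/R(t), with R'(t) estimated by a central difference
    dR = (R(t + h) - R(t - h)) / (2.0 * h)
    return -dR / R(t)

for t in (10.0, 100.0, 500.0):
    assert abs(hazard(t) - k * t) < 1e-6
print("recovered hazard rate matches k*t")
```

The hazard recovered from R(t) = exp(-kt²/2) is indeed kt, so the integration in the solution is consistent.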
***
1.10FAILURE DATA ANALYSIS
The pattern of failures can be obtained from life test results, i.e. by testing a
fairly large number of models until failure occurs and observing failure-
rate characteristics as a function of time. The first step, therefore, is to link
reliability with experimental or field-failure data. Suppose we make
observations on the system at times t1, t2, ... etc. Then we can define the
failure density over an interval as the number of failures in that interval
divided by the original population size N0 and by the length of the interval,
and the failure rate over the interval as the number of failures divided by
the number of survivors at the beginning of the interval and by the length
of the interval.
Example 1.3

The failure data for ten electronic components are given in Table 1.3.
Compute and plot the failure density, failure rate, reliability and unreliability
functions.

Table 1.3: Failure Data

Failure No:               1    2    3    4    5    6    7     8     9    10
Operating time (hours):   8   20   34   46   63   86  111   141   186   266
The computation of failure density and failure rate is shown in Table 1.4.
Similarly, the computation of the reliability and unreliability functions is shown
in Table 1.5. These results are also shown in Fig 1.8. As shown, we can
compute R(t) for this example using the formula R(t) = Ns(ti)/N0 at each
value of ti and connecting these points by a set of straight lines. In data
analysis one usually finds it convenient to work with the λ(t) curve and deduce
the reliability and density functions theoretically. For example, in this
illustration, we can see that the hazard rate can be modeled as a constant.
***
Table 1.4: Computation of failure density and failure rate

Time interval (hours)   Failure density           Failure rate
0-8                     1/(10 x 8) = 0.0125       1/(10 x 8) = 0.0125
8-20                    1/(10 x 12) = 0.0084      1/(9 x 12) = 0.0093
20-34                   1/(10 x 14) = 0.0072      1/(8 x 14) = 0.0089
34-46                   1/(10 x 12) = 0.0084      1/(7 x 12) = 0.0119
46-63                   1/(10 x 17) = 0.0059      1/(6 x 17) = 0.0098
63-86                   1/(10 x 23) = 0.0044      1/(5 x 23) = 0.0087
86-111                  1/(10 x 25) = 0.0040      1/(4 x 25) = 0.0100
111-141                 1/(10 x 30) = 0.0033      1/(3 x 30) = 0.0111
141-186                 1/(10 x 45) = 0.0022      1/(2 x 45) = 0.0111
186-266                 1/(10 x 80) = 0.0013      1/(1 x 80) = 0.0125
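Table 1.4 can be reproduced directly from the failure times of Table 1.3; a short sketch:

```python
# Failure times (hours) of the ten components from Table 1.3
times = [8, 20, 34, 46, 63, 86, 111, 141, 186, 266]
N0 = len(times)

rows = []
prev = 0
for i, t in enumerate(times):
    width = t - prev               # class-interval length
    survivors = N0 - i             # components alive at the interval start
    density = 1 / (N0 * width)     # failure density: failures per unit time per N0
    rate = 1 / (survivors * width) # failure rate: failures per unit time per survivor
    rows.append((prev, t, density, rate))
    prev = t

for a, b, d, z in rows:
    print(f"{a:3d}-{b:3d}  density={d:.4f}  rate={z:.4f}")
```

The printed values agree with Table 1.4 to the four rounded decimals shown there.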
Fig. 1.8 (a) Failure density, (b) failure rate, (c) reliability R(t) and (d) unreliability Q(t)
as functions of time for Example 1.3.
That means that 1/Ns(t) and dNf(t)/dt must either decrease at the same rate
or must be held constant through the entire test. A simple way to measure a
constant failure rate is to keep the number of components in the test
constant by immediately replacing the failed components with good ones.
The number of live components Ns(t) is then equal to N0 throughout the
test. Therefore, 1/Ns(t) = 1/N0 is constant, and dNf(t)/dt in this test must
also be constant if the failure rate is to be constant. But dNf(t)/dt will be
constant only if the total number of failed components Nf(t), counted from
the beginning of the test, increases linearly with time. If Nf components have
failed in time t at a constant rate, the number of components failing per unit
time becomes Nf/t, and in this test we can substitute Nf/t for dNf(t)/dt and
1/N0 for 1/Ns(t). Therefore,

λ = (1/Ns(t))(dNf(t)/dt) = (1/N0)(Nf/t)    (1.29)
Thus, we need to count only the number of failures Nf and the straight hours
of operation t. The constant failure rate is then the number of failures
divided by the product of test time t and the number of components in test,
which is kept continuously at N0. This product N0·t is the number of unit-
hours accumulated during the test. Of course, this procedure for determining
the failure rate can be applied only if λ is constant.

If only one equipment (N0 = 1) is tested but is repairable, so that the test can
continue after each failure, the failure rate becomes λ = Nf/t, where the
unit-hours t amount to the straight test time.
Example 1.4:

Consider another example wherein the time scale is now divided into equally
spaced intervals called class intervals. The data are tabulated in Table 1.6
in class intervals of 1000 hours. Compute the failure density and failure
rate functions.
Table 1.6: Data for Example 1.4

Time interval (hours)   Failures in the interval
1001 - 2000             24
2001 - 3000             29
4001 - 5000             17
5001 - 6000             13
Solution:
It can be seen that the failure rate in this case can be approximated by a
linearly increasing time function.
Example 1.5 :
A sample of 100 electric bulbs was put on test for 1500 hrs. During this
period 20 bulbs failed at 840, 861, 901, 939, 993, 1060, 1100, 1137,
1184, 1200, 1225, 1251, 1270, 1296, 1314, 1348, 1362, 1389, 1421,
and 1473 hours. Assuming constant failure rate, determine the value of
failure rate.
Solution:

In this case,

Nf = 20

The accumulated unit-hours of operation are

T = 840 + 861 + 901 + 939 + 993 + 1060 + 1100 + 1137 + 1184 + 1200 +
1225 + 1251 + 1270 + 1296 + 1314 + 1348 + 1362 + 1389 + 1421 +
1473 + 80(1500) = 143,564 hrs

Therefore, λ = Nf/T = 20/143,564 = 0.000139 per hour.
***
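The unit-hour bookkeeping of Example 1.5 is easy to mechanize; a short sketch:

```python
# failure times (hours) of the 20 bulbs that failed during the 1500-hr test
failure_times = [840, 861, 901, 939, 993, 1060, 1100, 1137, 1184, 1200,
                 1225, 1251, 1270, 1296, 1314, 1348, 1362, 1389, 1421, 1473]

survivors = 100 - len(failure_times)            # 80 bulbs still working at 1500 hr
unit_hours = sum(failure_times) + survivors * 1500
lam = len(failure_times) / unit_hours           # constant-failure-rate estimate
print(unit_hours, f"{lam:.6f}")
```

This reproduces the 143,564 accumulated unit-hours of the solution and the resulting failure rate of about 0.000139 per hour.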
2
RELIABILITY MATHEMATICS
When considering sets and operations on sets, Venn diagrams can be used
to represent sets diagrammatically. Fig 2.1(a) shows a Venn diagram for A ∩
B and Fig 2.1(b) shows a Venn diagram for A ∪ B. Fig 2.1(c) shows a Venn
diagram with three sets A, B and C.
Fig. 2.1 Venn diagrams: (a) A ∩ B; (b) A ∪ B; (c) three sets A, B and C.
Example 2.1

A group of 10 men and 8 women are administered a test for high blood
pressure. Among the men, 4 are found to have high blood pressure, whereas
3 of the women have high blood pressure. Use a Venn diagram to illustrate
this idea.
Solution
The Venn diagram is shown in Fig 2.2. The circle labeled H represents the 7
people having high blood pressure, and the circle labeled W represents the
8 women. The numbers placed in the various regions indicate how many
people there are in the category corresponding to the region. For example,
there are 4 people who have high blood pressure and are not women.
Similarly there are 5 women who do not have high blood pressure.
Fig. 2.2 Venn diagram for Example 2.1: 4 people with high blood pressure who are not
women, 3 women with high blood pressure, and 5 women without high blood pressure.
***
2.2 PROBABILITY THEORY
There is a natural relation between probability theory and set theory based
on the concept of a random experiment for which it is impossible to state a
particular outcome, but we can define the set of all possible outcomes. The
sample space of an experiment, denoted by S, is the set of all possible
outcomes of the experiment. An event is any collection of outcomes of the
experiment or subset of the sample space S. An event is said to be simple if
it consists of exactly one outcome, and compound if it consists of more
than one outcome.
For k mutually exclusive events A1, A2, ..., Ak,

Pr(A1 ∪ A2 ∪ ... ∪ Ak) = Σᵢ Pr(Ai)
We can also use the concept of relative frequency to develop the function Pr(·).
If we repeat an experiment n times and event A occurs nA times, 0 ≤ nA ≤ n,
then the value of the relative frequency fA = nA/n approaches Pr(A) as n
increases to infinity.
4. Pr(A1 ∪ A2 ∪ ... ∪ An) = Σᵢ Pr(Ai) - Σᵢ<ⱼ Pr(Ai ∩ Aj)
   + Σᵢ<ⱼ<ₖ Pr(Ai ∩ Aj ∩ Ak) - ... + (-1)ⁿ⁺¹ Pr(A1 ∩ A2 ∩ ... ∩ An)    (2.1)
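Equation (2.1) can be verified by direct enumeration on a small sample space with equally likely outcomes; the three events below are illustrative:

```python
from itertools import combinations

# three illustrative events as subsets of a 12-outcome sample space
S = set(range(12))
A = [set(range(0, 6)), set(range(4, 9)), {1, 5, 8, 10}]

def pr(event):
    # equally likely outcomes: probability = favourable / total
    return len(event) / len(S)

union = pr(A[0] | A[1] | A[2])

# inclusion-exclusion, eq. (2.1), written out for n = 3
incl_excl = (sum(pr(a) for a in A)
             - sum(pr(a & b) for a, b in combinations(A, 2))
             + pr(A[0] & A[1] & A[2]))
assert abs(union - incl_excl) < 1e-12
print("inclusion-exclusion verified for three events")
```

Both sides evaluate to the same probability of the union, as the formula requires.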
2.22 Conditional Probability
Pr(Ai/B) = Pr(Ai ∩ B) / Pr(B) ;  i = 1, 2, ..., n    (2.8)
Example 2.2
Think of the relays as being drawn one at a time. Let A be the event that the
first is good, and B the event that the second is good. Then the probability
that both are good is
Pr (A n B) = Pr (A) Pr (B/A)
= (8/10) x (7/9) = 28/45
The reason that Pr(B/A) = 7/9 is that knowing that the first one is good
means that there are now 7 good ones left among the 9 possible ones
that might be chosen second.
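Example 2.2 can be reproduced exactly with rational arithmetic:

```python
from fractions import Fraction

good, total = 8, 10
p_first = Fraction(good, total)           # Pr(A): first relay good
p_second = Fraction(good - 1, total - 1)  # Pr(B/A): second good, given first good
p_both = p_first * p_second               # Pr(A and B)
print(p_both)
```

The product is 28/45, matching the text; using `Fraction` keeps the result exact rather than a rounded decimal.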
***
Example 2.3

Solution

Let B denote the event that the randomly selected device is good, and let
A1 and A2 be the events that it comes from machines 1 and 2 respectively.
Then using (2.7),

Pr(B) = Pr(B ∩ A1) + Pr(B ∩ A2) = 0.92
***
Example 2.4
Three boxes contain two coins each. Box 1 contains two gold coins; box 2,
one gold and one silver coin; and box 3, two silver coins. A box is selected
at random, and then a coin is selected at random from the box. The coin
turns out to be gold. What is the probability that the other coin in the box is
gold?
Solution

Pr(box 1/gold) = Pr(box 1) Pr(gold/box 1) / Pr(gold) = (1/3)(1) / (1/2) = 2/3
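The answer to Example 2.4 can be confirmed by enumerating the six equally likely (box, coin) draws:

```python
from fractions import Fraction

# each tuple is (box, drawn coin, the other coin in that box);
# every box and every coin within a box is equally likely to be drawn
draws = [(1, "G", "G"), (1, "G", "G"),
         (2, "G", "S"), (2, "S", "G"),
         (3, "S", "S"), (3, "S", "S")]

gold_draws = [d for d in draws if d[1] == "G"]           # condition on drawing gold
other_gold = [d for d in gold_draws if d[2] == "G"]      # other coin also gold
p = Fraction(len(other_gold), len(gold_draws))
print(p)
```

Three of the six draws yield a gold coin, and in two of those the companion coin is also gold, giving 2/3, in agreement with the Bayes-theorem calculation above.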
***
If X is a random variable, then for any real number x, the probability that
X will assume a value less than or equal to x is called the probability
distribution function of the random variable X and is denoted by F(x), i.e.
Fig. 2.3 Probability density function f(x) and distribution function F(x) for a discrete
random variable.
where the summation is extended over all indices for which xi ≤ x. It is clear
that F(x) is the distribution function of the random variable X. Since the
distribution function is a cumulative probability, it is often called the
cumulative-distribution function. The distribution function and probability
density function for a discrete random variable are shown in Fig 2.3.
Example 2.5
Suppose that 100 people have been checked by a dentist, and the
breakdown of the number of cavities found is as follows:
No. of cavities 0 1 2 3 4 5 6 7
No. of people with 40 25 15 12 4 2 0 2
this many cavities
Solution
The values of the probabilities are easily read from the given data:

Pr(X = 0) = 0.40, Pr(X = 1) = 0.25, and so on.
Fig. 2.4 Distribution function for Example 2.5.
***
2.41 Binomial Distribution
number of times the given event occurs in a set of trials. Such problems
can be solved by using the so-called binomial distribution provided they
satisfy the following assumptions:
1. There are only two possible outcomes, success or failure, for each
trial.
2. The probability of success is constant from trial to trial.
3. There are m trials, where m is a constant.
4. The m trials are independent, i.e. they do not influence each other.
Example 2.6
An aircraft uses three active and identical engines in parallel. All engines fail
independently. At least one engine must function normally for the aircraft to
fly successfully. The probability of success of an engine is 0.8. Calculate the
probability of the aircraft crashing. Assume that one engine can only be in
two states, i.e., operating normally or failed.
Solution

The aircraft crashes only if all three engines fail. The probability of failure of
one engine is 1 - 0.8 = 0.2. Since the engines fail independently,

Pr(crash) = (0.2)^3 = 0.008
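Example 2.6 reduces to a single binomial term; a sketch using the binomial formula (the helper `binom_pmf` is introduced here for illustration):

```python
from math import comb

def binom_pmf(k, m, p):
    """Probability of exactly k successes in m independent trials,
    each with success probability p."""
    return comb(m, k) * p**k * (1 - p)**(m - k)

# Example 2.6: crash <=> all 3 engines fail; treat "success" as an engine
# failing, with probability 1 - 0.8 = 0.2
p_crash = binom_pmf(3, 3, 0.2)
print(round(p_crash, 3))
```

With k = m = 3 the binomial coefficient is 1 and the formula collapses to (0.2)³ = 0.008.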
Example 2.7
Solution

In this case,

m = 10, k = 2, p = 0.30, q = 0.70
***
2.42 Poisson Distribution
f(x) = (λt)^x exp(-λt) / x! ,  x = 0, 1, ..., n    (2.15)

f(x) = exp(-µ) µ^x / x! ,  x = 0, 1, 2, ...    (2.16)
It can be seen that it is a limiting form of the binomial distribution for large n
and small p, where np = µ is the expected number of occurrences.
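The Poisson mass function, and its role as a limit of the binomial, can be checked numerically; the values of n and p below are illustrative:

```python
import math

def poisson_pmf(x, mu):
    """Poisson mass function, eq. (2.16)."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def binom_pmf(k, n, p):
    """Binomial mass function."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Example 2.9: mu = 5, exactly 7 arrivals
print(f"Pr(X = 7) = {poisson_pmf(7, 5):.4f}")

# large n, small p: the binomial approaches the Poisson with mu = n*p
n, p = 1000, 0.005
for k in range(4):
    assert abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) < 1e-3
```

With n = 1000 and p = 0.005 the two mass functions already agree to three decimal places, illustrating the limiting relationship stated in the text.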
Example 2.9
Suppose the number of cars entering a certain parking lot during a 30-
second time period is known to be a random variable having a Poisson mass
function with parameter µ= 5. What is the probability that during a given
30 seconds period exactly 7 cars will enter the lot.
Solution

Pr(X = 7) = exp(-5) 5^7 / 7! = 0.104
It may be observed that this answer is quite close to the one obtained
in Example 2.9, where the Poisson distribution was assumed instead. This can
be considered a numerical confirmation of the fact that when n is large
and p is small, the binomial distribution with parameters n and p is
approximately equal to the Poisson distribution with parameter µ = n·p. That
is why the Poisson distribution is called a bridge between discrete distributions
and continuous distributions.
F(x) = ∫₋∞ˣ f(y) dy    (2.18)
If the function F(x) is continuous, then its derivative is the density function,
It may be noted that this density function has the following properties:
Pr(a < X ≤ b) = F(b) - F(a) = ∫ₐᵇ f(x) dx    (2.20)

This means that the probability of the event a < X ≤ b equals the area
under the curve of the density function f(x) between x = a and x = b.
3. ∫₋∞^∞ f(x) dx = 1    (2.21)
Example 2.10

Solution

F(t) = ∫₋∞ᵗ f(t) dt = ∫₀ᵗ 2t dt = t²

A plot of f(t) and F(t) for the example is shown in Fig 2.5.
***
Example 2.11
Suppose f(t) = c(4-t2) for -2 < t <2, with f(t) = 0 otherwise. Determine
the value that c must have in order for f to be a density function.
Solution

The total area under any density curve must be one. Hence,

∫₋₂² f(t) dt = 1

or, ∫₋₂² c(4 - t²) dt = 1

or, c [4t - t³/3] evaluated from -2 to 2 = 1

or, c (32/3) = 1

or, c = 3/32.
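The normalization in Example 2.11 can be confirmed by numerical (midpoint-rule) integration:

```python
# check that c = 3/32 normalizes f(t) = c(4 - t^2) on (-2, 2)
c = 3 / 32
n, a, b = 100_000, -2.0, 2.0
dt = (b - a) / n

# midpoint rule: evaluate the density at the centre of each sub-interval
area = sum(c * (4 - (a + (i + 0.5) * dt) ** 2) * dt for i in range(n))
print(round(area, 6))
```

The computed area is 1 to well beyond six decimals, confirming the value of c.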
***
2.51 Uniform Distribution

A continuous random variable X is said to have a uniform distribution over
the interval (a, b) if its density function is

f(x) = c ,  a < x ≤ b
     = 0 ,  otherwise

Since

∫ₐᵇ f(x) dx = ∫ₐᵇ c dx = 1

we must have c = 1/(b - a). The corresponding distribution function is

F(x) = 0 ,  x ≤ a
     = (x - a)/(b - a) ,  a < x ≤ b
     = 1 ,  x > b    (2.23)

Fig. 2.6 (a) Density function and (b) distribution function of the uniform distribution.
A continuous random variable having the range O<x <oo is said to have
an exponential distribution (Fig 2.7) if it has the probability-density
function of the form
f(x) = λ exp(-λx),  0 ≤ x < ∞        (2.24)
∫_{-∞}^{∞} f(x) dx = 1        (2.27)
Fig. 2.7 F(x) and f(x) of an exponential distribution.
and therefore

∫_{0}^{∞} a x exp(-bx²/2) dx = a/b = 1        (2.28)

so that a = b. Thus, the Rayleigh density becomes:

f(x) = b x exp(-bx²/2),  0 ≤ x < ∞
Fig. 2.8 The Weibull density function.
where a and b are positive constants known as the scale and shape parameters respectively.

It is evident that the exponential and Rayleigh distributions are special cases of the two-parameter Weibull distribution for b = 0 and b = 1 respectively. The Weibull distribution is useful whenever failure is caused by the stress exceeding the strength at the weakest point of the item and is widely applicable to mechanical components.
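The special cases can be checked numerically. The sketch below assumes the hazard-rate parameterization z(t) = K t^b, for which R(t) = exp[-K t^(b+1)/(b+1)]; the values of K and t are illustrative:

```python
# With hazard z(t) = K * t**b, the reliability is
# R(t) = exp(-K * t**(b+1) / (b+1)): b = 0 reproduces the exponential law
# and b = 1 the Rayleigh law.
import math

def weibull_R(t, K, b):
    """Reliability for hazard z(t) = K * t**b."""
    return math.exp(-K * t ** (b + 1) / (b + 1))

K, t = 0.002, 100.0
exp_case = weibull_R(t, K, 0)            # b = 0
ray_case = weibull_R(t, K, 1)            # b = 1
print(abs(exp_case - math.exp(-K * t)) < 1e-12)          # → True
print(abs(ray_case - math.exp(-K * t * t / 2)) < 1e-12)  # → True
```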
The normal density function is

f(x) = [1/(σ √(2π))] exp[-(x - µ)²/(2σ²)],   -∞ < x < ∞

The constants µ and σ > 0 are arbitrary and represent the mean and standard deviation of the random variable. This function and the corresponding distribution function are shown in Fig 2.9. This is the most important probability distribution for use in statistics. It also has applications in reliability engineering, for example in the failure of ball bearings.
The values assumed by the random variable X(t) are called states, and the set
of all possible values forms the state space of the process. The state space is
generally denoted by I.
Reliability Mathematics 51
Fig. 2.9 f(x) and F(x) of the normal distribution.
[P0(t + Δt) - P0(t)] / Δt = -z(t) P0(t)        (2.38)
dP0(t)/dt = -z(t) P0(t)        (2.39)

dP1(t)/dt = z(t) P0(t)        (2.40)
P0(t) = exp[-∫_{0}^{t} z(τ) dτ]        (2.41)

and

P1(t) = 1 - exp[-∫_{0}^{t} z(τ) dτ]        (2.42)
(2.43)
The role played by the initial conditions is clearly evident. If there is a fifty-fifty chance that the system is good at t = 0, then P0(0) = 1/2, and

P0(t) = (1/2) exp[-∫_{0}^{t} z(τ) dτ]        (2.44)
Fig. 2.10 Markov graph for a single nonrepairable element.
Fig. 2.11 Markov graph for two distinct nonrepairable elements.

dP0(t)/dt = -[z01(t) + z02(t)] P0(t)        (2.45)
dP1(t)/dt = z01(t) P0(t) - z13(t) P1(t)     (2.46)
dP2(t)/dt = z02(t) P0(t) - z23(t) P2(t)     (2.47)
dP3(t)/dt = z13(t) P1(t) + z23(t) P2(t)     (2.48)
The initial conditions associated with this set of equations are P0(0), P1(0),
P2(0), and P3(0). These equations, of course could have been written by
inspection using the algorithm previously stated.
It is difficult to solve these equations for a general hazard function z(t), but if the hazards are specified, the solution is quite simple. If all the hazards are constant, z01(t) = λ1, z02(t) = λ2, z13(t) = λ3, and z23(t) = λ4.
The solutions are

P0(t) = exp[-(λ1 + λ2)t]        (2.49a)
P1(t) = [λ1/(λ1 + λ2 - λ3)] {exp(-λ3 t) - exp[-(λ1 + λ2)t]}        (2.49b)
P2(t) = [λ2/(λ1 + λ2 - λ4)] {exp(-λ4 t) - exp[-(λ1 + λ2)t]}        (2.49c)
P3(t) = 1 - P0(t) - P1(t) - P2(t)        (2.49d)
where
(2.50)
Note that we have not as yet had to say anything about the configuration
of the system, but only have had to specify the number of elements and the
transition probabilities. Thus, when we solve for P0, P1, P2, we have
essentially solved for all possible two element system configurations.
Thus, our two-element model has four states, and a four-element model 16 states. This means that an n-component system may require a solution of as many as 2^n first-order differential equations. In many cases we are interested in fewer states. Suppose we want to know only how many failed items are present in each state and not which items have failed. This would mean a model with n + 1 states rather than 2^n, which represents a tremendous
saving. To illustrate how such simplifications affect the Markov graph we
consider a collapsed flow graph shown in Fig 2.12 for the two element
system. Collapsing the flow graph is equivalent to the restriction P' 1(t) = P1(t)
+ P2(t). Note that this can collapse the flow graph only if z13 = z23;
however, z01 and z02 need not be equal.
Markov graphs for a system with repair are shown in Fig 2.13(a,b). The
graph in Fig 2.13(a) is a general model, and that of Fig 2.13(b) is a
collapsed model.
The system equations can be written for Fig 2.13(a) by inspection using the
algorithm previously discussed.
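The constant-hazard case lends itself to a quick numerical sketch. The fragment below (the λ values are illustrative, not from the text) integrates the four state equations for two distinct nonrepairable elements by Euler's method and checks that the state probabilities always sum to one and that P0(t) tracks exp[-(λ1 + λ2)t]:

```python
import math

l1, l2, l3, l4 = 0.002, 0.003, 0.004, 0.001   # z01, z02, z13, z23 (illustrative)
P = [1.0, 0.0, 0.0, 0.0]                      # start in state s0 (both elements good)
dt, T = 0.01, 100.0
for _ in range(int(T / dt)):
    d0 = -(l1 + l2) * P[0]
    d1 = l1 * P[0] - l3 * P[1]
    d2 = l2 * P[0] - l4 * P[2]
    d3 = l3 * P[1] + l4 * P[2]
    P = [P[0] + d0 * dt, P[1] + d1 * dt, P[2] + d2 * dt, P[3] + d3 * dt]

print(abs(sum(P) - 1.0) < 1e-9)                      # → True
print(abs(P[0] - math.exp(-(l1 + l2) * T)) < 1e-4)   # → True
```

Because every transition out of a state appears with the opposite sign in the state it enters, the probability total is conserved at each step.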
Fig. 2.12 Collapsed Markov graph (states s0', s1', s2': no failure, one failure, two failures).
(2.51a)
(2.51b)
(2.51c)
(2.51d)
(2.52a)
(2.52b)
(2.52c)
The probabilities in the general and the collapsed model are related by
P0'(t) = P0(t)        (2.53a)
P1'(t) = P1(t) + P2(t)        (2.53b)
P2'(t) = P3(t)        (2.53c)
3.1 INTRODUCTION
Reliability Analysis of Series Parallel Systems 61
Once we have the right figures for the reliabilities of the components in a system, or good estimates of these figures, we can perform very exact calculations of system reliability even when the system is the most complex combination of components conceivable. The exactness of our results does not hinge on the probability calculations, because these are perfectly accurate; rather, it hinges on the exactness of the reliability data of the components. In system reliability calculations for series-parallel systems we need use only the basic rules of the probability calculus.
4. The state of each element and of the entire network is either good
(operating) or bad (failed).
Two blocks in a block diagram are shown in series if the failure of either of them results in system failure. In a series block diagram of many blocks, such as Fig 3.1, all the blocks must operate successfully for system success. Similarly, two blocks are shown in parallel in the block diagram if the success of either of them results in system success. In a parallel block diagram of many blocks, such as Fig 3.2, successful operation of any one or more blocks ensures system success. A block diagram in which both the above connections are used is termed a Series-Parallel Block Diagram.
Fig. 3.1 A Series Block Diagram (x1, x2, ..., xn between In and Out)

Fig. 3.2 A Parallel Block Diagram        Fig. 3.3 A k-out-of-m Block Diagram (at least k needed)
to pass the required current. Such a block diagram cannot be recognised without a description inscribed on it, as in Fig 3.3. Series and parallel reliability block diagrams can be described as special cases of this type with k equal to m and unity respectively.
62 Reliability Engineering
Many complex systems are series systems as per reliability logic. The block
diagram of a series system was shown in Fig 3.1. If Ei and Ei' denote the
events of satisfactory and unsatisfactory operation of the component i, the
event representing system success is the logical intersection of E1, E2,...,En.
Reliability of the system is the probability of success of this event and is
given by
R(t) = Π_{i=1}^{n} Pi(t)        (3.4)

and

R(t) = exp[-t Σ_{i=1}^{n} λi]        (3.5)

Therefore, the reliability law for the whole system is still exponential. Also, for series systems with constant failure rate components, the system failure rate is the sum of the failure rates of the individual components, i.e.,

λs = Σ_{i=1}^{n} λi        (3.6)
Example 3.1
Silicon transistors:     λt = 0.000008/hr
Silicon diodes:          λd = 0.000002/hr
Composition resistors:   λr = 0.000001/hr
Ceramic capacitors:      λc = 0.000004/hr
Solution
This sum is the expected hourly failure rate λs of the whole circuit. The estimated reliability of the circuit is then

R(t) = exp(-0.0001 t)
This does not mean that the circuit could be expected to operate without
failure for 10,000 hours. We know from the exponential function that its
chance to survive for 10,000 hours is only about
37%. ***
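The component quantities for Example 3.1 are not legible in this extract, so the counts below are hypothetical, chosen only so that the total matches the stated λs = 0.0001/hr; the sketch then reproduces the roughly 37% survival figure for 10,000 hours:

```python
import math

# (failure rate per hour, hypothetical quantity)
parts = {
    "silicon transistors":   (0.000008, 5),
    "silicon diodes":        (0.000002, 10),
    "composition resistors": (0.000001, 20),
    "ceramic capacitors":    (0.000004, 5),
}
lam_s = sum(rate * qty for rate, qty in parts.values())
print(round(lam_s, 6))        # → 0.0001
R = math.exp(-lam_s * 10000)  # survival probability over 10,000 hours
print(round(R, 2))            # → 0.37
```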
Thus, when designing the circuits and their packaging, the circuit designer should always keep two things in mind:

1. Do not overstress the components, but operate them well below their rated values, including temperature.

2. Provide good packaging against shock and vibration, but remember that in tightly packaged equipment without adequate heat sinks, extremely high operating temperatures may develop which can kill all reliability efforts.
It may be observed that the time t used above is the system operating time. Only when a component operates continuously in the system will the component's operating time be equal to the system's operating time. In general, when a component operates on the average for ti hours in t system operating hours, it assumes in the system's time scale a failure rate of
(3.8)
(3.9)
(3.10)
But if this component also has a time dependent failure rate of λ' while energized, and a failure rate of λ'' when de-energized (with the system still operating), the component assumes in the system time scale a failure rate of

(3.11)
Example 3.2
An electric bulb has a failure rate of 0.0002/hr when glowing and of 0.00002/hr when not glowing. At the instant of switching ON, the failure rate is estimated to be 0.0005/switching. What is the average failure rate of the bulb if, on the average, it is switched 6 times every day and remains ON for a total of 8 hrs in the day?
Solution
Here,
t = 24 hrs
ti = 8 hrs
λ' = 0.0002/hr
λ'' = 0.00002/hr
λc = 0.0005/switching
c = 6
***
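Equation (3.11) itself is not legible in this extract; assuming the usual duty-cycle weighting λ = (ti/t)λ' + [(t - ti)/t]λ'' + cλc/t, the data of Example 3.2 can be evaluated as follows:

```python
# Average failure rate of the bulb in the system time scale, assuming the
# duty-cycle weighted combination of energized, de-energized and switching
# hazards (the formula is an assumption, since (3.11) is not shown here).
t, ti = 24.0, 8.0            # system hours per day, energized hours per day
lam_on, lam_off = 0.0002, 0.00002
lam_sw, c = 0.0005, 6        # per-switching hazard and switchings per day

lam_avg = (ti / t) * lam_on + ((t - ti) / t) * lam_off + c * lam_sw / t
print(round(lam_avg, 6))     # → 0.000205
```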
In case the components in a series system are identical and independent, each with reliability p or unreliability q,

R = p^n = (1 - q)^n        (3.12)
Example 3.3
Solution
R ≈ 1 - nq
or, 0.99 = 1 - 10q
or, q = 0.001
Hence, p = 0.999

Checking with the exact relation,
R = p^10
or, p^10 = 0.99
or, p = (0.99)^0.1 = 0.99899
We can thus see that the difference between the exact and the approximate calculation is negligible, and hence the approximate relation is frequently used in practical design. In simple words, the system unreliability is approximately the component unreliability multiplied by the number of components in the system.
***
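The comparison in Example 3.3 is easy to reproduce:

```python
# Exact versus approximate series reliability for 10 identical components.
q, n = 0.001, 10
exact = (1 - q) ** n     # R = p^n with p = 1 - q
approx = 1 - n * q       # R ≈ 1 - nq for small q
print(round(exact, 6))   # → 0.990045
print(round(approx, 6))  # → 0.99
```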
3.4 PARALLEL SYSTEMS
If Pr(Ei') = qi and Pr(Ei) = pi, the time dependent reliability function is

R(t) = 1 - Π_{i=1}^{m} qi(t)        (3.17)

= 1 - Π_{i=1}^{m} [1 - pi(t)]        (3.18)

In case of identical components,

R = 1 - [1 - p(t)]^m        (3.19)

Q = [q(t)]^m        (3.20)
When two components with failure rates λ1 and λ2 operate in parallel, the reliability Rp of this parallel system is given by

Rp(t) = exp(-λ1 t) + exp(-λ2 t) - exp[-(λ1 + λ2)t]        (3.25)

mp = ∫_{0}^{∞} Rp dt = 1/λ1 + 1/λ2 - 1/(λ1 + λ2)        (3.26)

When the failure rates of two parallel components are equal, so that λ1 = λ2 = λ, the unreliability of this parallel combination of two identical components is

Qp = Q1 Q2 = Q² = [1 - exp(-λt)]²

The reliability is

Rp = 1 - [1 - exp(-λt)]² = 2 exp(-λt) - exp(-2λt)        (3.31)
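Equation (3.26) can be confirmed by numerical integration of Rp(t); the failure rates below are illustrative:

```python
import math

l1, l2 = 0.001, 0.002
mttf_formula = 1 / l1 + 1 / l2 - 1 / (l1 + l2)   # equation (3.26)

def Rp(t):
    """Two-unit parallel reliability with constant rates l1 and l2."""
    return math.exp(-l1 * t) + math.exp(-l2 * t) - math.exp(-(l1 + l2) * t)

# Midpoint-rule integration of Rp over a long horizon
dt, T = 1.0, 20000.0
mttf_numeric = sum(Rp((i + 0.5) * dt) for i in range(int(T / dt))) * dt
print(round(mttf_formula, 1))                      # → 1166.7
print(abs(mttf_numeric - mttf_formula) < 0.01)     # → True
```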
Example 3.4
Solution
Rp = 1 - Π_{i=1}^{m} (1 - Pi)
Example 3.5
(Figure for Example 3.5: a series-parallel network with A = 0.98, B = C = 0.92, D = E = 0.98.)
Solution
= (0.98)(0.9936) = 0.9737
Example 3.6
Three generators, one with a capacity of 100 kW and the other two with a capacity of 50 kW each, are connected in parallel. Draw the reliability logic diagram if the required load is:

(i) 100 kW (ii) 150 kW
Solution
The reliability logic diagram for case (i) is drawn as shown in Fig 3.5(a), because in this case either the one 100 kW or the two 50 kW generators must function. Similarly, the logic diagram for case (ii) is drawn as shown in Fig 3.5(b), as in this case the 100 kW generator must function and, of the remaining two, any one is to function.
Fig. 3.5 Reliability logic diagrams for (a) a 100 kW load and (b) a 150 kW load.
The pertinent question here is: at what level should the components be duplicated, i.e., at the component level, the subsystem level or the system level? We will explain this with the help of an example. Consider the two configurations given in Fig 3.6.
Fig 3.6: Redundancy at Component Level
Let the reliability of each component be r. The reliability of the system (Rs)
in the case of configuration 3.6(a) can be expressed as
Rs = 1 - (1 - r^n)² = r^n (2 - r^n)
The reliability of the system (Rs') in the case of configuration 3.6(b) is
expressed as
Rs' = r^n (2 - r)^n

Rs'/Rs = (2 - r)^n / (2 - r^n)

It can be shown that the ratio Rs' : Rs is greater than unity for r < 1.
Hence, the configuration 3.6(b) would always provide higher reliability.
Thus, as a generalisation, it can be said that components duplicated at the component level give higher system reliability than components duplicated at the subsystem level (here each set is considered as a subsystem). In general, it should be borne in mind that redundancy should be provided at the component level unless there are some overriding reasons or constraints from the design point of view.
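With illustrative values r = 0.9 and n = 3 (not from the text), the advantage of component-level duplication can be checked directly:

```python
# Redundancy at system level (duplicate the whole n-component string)
# versus component level (duplicate each component individually).
r, n = 0.9, 3
Rs_system = 1 - (1 - r ** n) ** 2     # configuration 3.6(a)
Rs_comp = (1 - (1 - r) ** 2) ** n     # configuration 3.6(b)
print(round(Rs_system, 6))   # → 0.926559
print(round(Rs_comp, 6))     # → 0.970299
print(Rs_comp > Rs_system)   # → True
```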
R = Σ_{i=k}^{m} mCi p^i (1 - p)^(m-i)        (3.34)
If the components are not identical but have different reliabilities, the
calculations become more complicated.
(R1 + Q1)(R2 + Q2)(R3 + Q3) = R1R2R3 + (R1R2Q3 + R1R3Q2 + R2R3Q1) + (R1Q2Q3 + R2Q1Q3 + R3Q1Q2) + Q1Q2Q3 = 1

To obtain the system reliability for a 1-out-of-3 system, we discard the last term only, i.e., Q1Q2Q3; for a 2-out-of-3 system, the last four terms are to be discarded.
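For identical units the expansion must agree with the binomial form (3.34); a quick check for a 2-out-of-3 system with an illustrative p = 0.9:

```python
from math import comb

p = 0.9
q = 1 - p
# Binomial form, equation (3.34), with k = 2 and m = 3
R_binomial = sum(comb(3, i) * p ** i * q ** (3 - i) for i in range(2, 4))
# Expansion form: keep R1R2R3 plus the three two-survivor terms
R_expansion = p ** 3 + 3 * p ** 2 * q
print(round(R_binomial, 4))                   # → 0.972
print(abs(R_binomial - R_expansion) < 1e-12)  # → True
```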
Example 3.7
Solution
It may be noted that there are several elements which can fail open or short. Familiar examples are diodes and electrolytic capacitors in electronic circuits. Several other elements having two modes of failure can be treated similarly. For example, a valve that fails to open when required, or fails to close when needed, has two modes of failure. The analysis given below is applicable to such situations also.
Qo' + Qs' = 1.0
Qo' Q = Qo
Qs' Q = Qs

For two elements A and B in the active-parallel redundant configuration, the unit will fail if

1. Either A or B shorts, or
2. Both A and B open.
(3.40)
For any range of qo and qs, the optimum number of parallel elements is one if qs > qo. For most practical values of qo and qs, the optimum number turns out to be two. In general, for a given qs and qo, the reliability as a function of m has the form shown in Fig.3.7.
Fig. 3.7: Reliability versus number of elements
dQm/dm = 0        (3.41)

or,

m = ln[ln(qo)/ln(1 - qs)] / ln[(1 - qs)/qo]        (3.42)
1.0
Oplimwn nwnbcr=3
q·' qo/qs
Oplimwn nwnbcr=4
Oplimwn nwnbcr=S
t_:
Oplimwn nwnber >=6
.001
1 qo .OS
Parallel Unit
The result given above indicates that if qs > qo, the optimum number of parallel paths is one. However, the addition of an element in series will result in an increase in reliability if qs is much greater than qo.
= 1 - [1 - Pa(0)][1 - Pb(0)]
= 1 - (1 - Qoa)(1 - Qob)
If all elements are identical, the reliability of the n-element series unit is
Using the same approach as that for the parallel configuration case, it
is easily shown that the optimum number of series elements for a given Q0
and Qs is
The estimated failure probability for an element that can short or open is 0.15. The ratio of short to open failure probabilities is known to be 0.25. What is the optimum number of parallel elements to use?
Solution
Here,
qo + qs = 0.15 and qs/qo = 0.25

so that qo = 0.12 and qs = 0.03. Hence,

mopt = ln[ln(0.12)/ln(1 - 0.03)] / ln[(1 - 0.03)/0.12] = 2.03 ≈ 2
It may be pertinent to point out here that if the numerical value of the
optimum number does not come out to be close to an integer, we should
determine the reliability by considering integers on both sides of the real
value and then choose the optimum one.
***
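Using the data of the example, equation (3.42) can be evaluated directly. The check below also assumes the parallel-unit reliability R(m) = (1 - qs)^m - qo^m, the form from which (3.42) follows, to confirm that the neighbouring integer m = 2 is indeed the best choice:

```python
import math

q_total, ratio = 0.15, 0.25      # qo + qs and qs/qo from the example
qo = q_total / (1 + ratio)       # 0.12
qs = q_total - qo                # 0.03
m_opt = math.log(math.log(qo) / math.log(1 - qs)) / math.log((1 - qs) / qo)
print(round(m_opt, 2))           # → 2.03

# Assumed parallel-unit reliability from which (3.42) follows:
R = lambda m: (1 - qs) ** m - qo ** m
print(max((1, 2, 3), key=R))     # → 2
```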
3.7.1 Fail-Safe and Fail-to-Danger
(3.46)
(3.47)
Using the approximation ps << 1, we see that the fail-safe probability grows linearly with the number of units in parallel,

Qs ≈ N ps        (3.48)
= Σ_{j=m-k+1}^{m} mCj (pd)^j (1 - pd)^(m-j)        (3.49)
(3.50)
(3.52)
From Eqs.(3.50) and (3.52) the trade-off between fail-to-danger and spurious
operation is seen. The fail-safe unreliability is decreased by increasing k
and the fail-to-danger unreliability is decreased by increasing m-k.
We have,

exp(-λt)[1 + λt + (λt)²/2! + (λt)³/3! + ...] = 1

In this expression the term exp(-λt)·1 represents the probability that no failure will occur, the term exp(-λt)·(λt) represents the probability that exactly one failure will occur, exp(-λt)·(λt)²/2! represents the probability that exactly two failures will occur, etc. Therefore, the probability that two or one or no failure will occur, i.e. the probability that not more than two failures will occur, equals:

exp(-λt) + exp(-λt)·λt + exp(-λt)·(λt)²/2!
For a stand-by system of three units which have the same failure rate and
where one unit is operating and other two are standing by to take over the
operation in succession, we have
ms = (n + 1)/λ        (3.58)
It is the exception rather than the rule that the failure rates of the stand-by
units are equal to those of the operating unit. For instance, a hydraulic
actuator will be backed up by an electrical actuator, and there may be even
a third stand-by unit, pneumatic or mechanical. In such cases, the
failure rates of the stand-by units will not be equal and the formulae
which we derived above will no longer apply.
1. A succeeds up to time t, or
2. A fails at time t1 < t and B operates from t1 to t.

The first term of this equation represents the probability that element A will succeed until time t. The second term, excluding the outside integral, is the density function for A failing exactly at t1 and B succeeding for the remaining (t - t1) hours. Since t1 can range from 0 to t, t1 is integrated over that range.
For the exponential case, where the element failure rates are λa and λb,

R(t) = exp(-λa t) + [λa/(λa - λb)] [exp(-λb t) - exp(-λa t)]        (3.61)

and

MTTF = 1/λa + 1/λb        (3.62)
It can be shown that it does not matter whether the more reliable element
is used as the primary or the stand-by element.
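For the exponential case (perfect sensing and switching assumed, with illustrative rates λa and λb), the MTTF of 1/λa + 1/λb and the indifference to which element is primary can both be checked numerically:

```python
import math

la, lb = 0.001, 0.004    # primary and standby failure rates (illustrative)

def R_standby(t):
    """Two-unit standby reliability: primary rate la, standby rate lb."""
    return math.exp(-la * t) + la / (la - lb) * (math.exp(-lb * t) - math.exp(-la * t))

def R_swapped(t):
    """Same system with the roles of the two elements interchanged."""
    return math.exp(-lb * t) + lb / (lb - la) * (math.exp(-la * t) - math.exp(-lb * t))

dt, T = 1.0, 20000.0
mttf_numeric = sum(R_standby((i + 0.5) * dt) for i in range(int(T / dt))) * dt
mttf_formula = 1 / la + 1 / lb
print(round(mttf_formula, 1))                         # → 1250.0
print(abs(mttf_numeric - mttf_formula) < 0.01)        # → True
print(abs(R_standby(500) - R_swapped(500)) < 1e-12)   # → True
```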
Example 3.9
Solution
The reader may observe the appreciable decrease in the values of reliability and MTBF caused by the imperfect nature of the sensing and switching-over device.
***
3.8.1 Types of Standby Redundancy
1. Cold Standby
In this case, the value of the standby component changes progressively. For
example, components having rubber parts deteriorate over time and
ultimately affect the reliability of standby component.
3. Hot Standby
The standby component in this case, fails without being operated because
of a limited shelf life. For example, batteries will fail even in standby due
to some chemical reactions.
4. Sliding Standby
Fig 3.9: Sliding Standby
It may be noted that sliding standby components may have more than one
component in standby depending upon the reliability requirement.
In this case, an Automatic Fault Locator (AFL) is provided with the main
system which accomplishes the function of locating the faulty component,
disconnecting it and connecting the standby component. AFL's are
generally provided in automatic and highly complex systems. The sliding
standby redundancy having AFL is shown in Fig 3.10.
Fig 3.10: Sliding Standby with Automatic Fault Locator
4.1 INTRODUCTION
consists of a check valve and a shut-off valve in series. Any branch of the
two pairs is capable of supplying sufficient gas to the cabin. There are
three alternative paths between the oxygen tank and the pair of valves.
Oxygen can be transmitted to the cabin through either of the two regulators
and the pair of valves connected to the regulator. It can also be
transmitted to the cabin through a selector valve and either of the two pairs
of valves.
The most common problem which arises in the analysis of such a network
is to compute in an efficient and systematic manner the source to
terminal reliability between a given pair of nodes, namely, the probability
that there exists at least one path between these two nodes. Although
not necessary, it is generally convenient to simplify the diagram by
removing purely series, purely parallel, self-loops and dead-end
connections before applying any of these general algorithms.
The methods in the second group do not require a prior knowledge of all
paths of the network. These methods are also important as the computer
time needed to determine all minimal paths is sometimes comparable to
the time required for making the terms of the success function disjoint.
Three such methods, viz. the Delta-Star Method, the Logical Signal Relations Method and the Bayes' Theorem Method, are also discussed.
An example has been solved by all the methods discussed below. This allows the reader to easily compare the algorithms and also serves as a check on the correctness of the calculations by all methods.
For the bridge network of Fig.4.3, the connection matrix [C] is written as:

        | 0   0   A   C |
[C] =   | 0   0   0   0 |
        | 0   B   0   E |
        | 0   D   E   0 |

(rows and columns correspond to nodes n1 = In, n2 = Out, n3 and n4)
The method requires removal of the last row and last column after modifying the remaining entries of [C] as:

Cij' = Cij + Cin Cnj

where the nth row (column) is the last row (column) in the matrix. This operation will lead to all required paths from i to j through n. Thus, a reduced connection matrix of size (n-1) is built. The above steps are successively repeated till a matrix of size 2 is obtained. Element C12 of this matrix corresponds to all the paths. Removing nodes 4 and 3 respectively from the connection matrix,
         | 0   CD(4)       A + CE(4) |
C(4) =   | 0   0           0         |
         | 0   B + ED(4)   0         |

         | 0   CD(4) + AB(3) + CEB(4,3) + AED(4,3) |
C(4,3) = | 0   0                                   |
Hence, the minimal paths are: CD, AB, CEB and AED. The numbers in parentheses denote the nodes which have been traversed and are recorded to avoid going over those nodes again. The algorithm is attractive as it does not require matrix multiplications and the size of the matrix reduces in every step.
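The node-removal algorithm above can be mechanised by holding each matrix entry as a set of branch strings; node 1 is the source and node 2 the sink, as in the text. A sketch:

```python
def remove_last_node(C):
    """Remove the last node: C'ij = Cij + Cin*Cnj, dropping repeated branches."""
    n = len(C)
    new = [[set(C[i][j]) for j in range(n - 1)] for i in range(n - 1)]
    for i in range(n - 1):
        for j in range(n - 1):
            for a in C[i][n - 1]:
                for b in C[n - 1][j]:
                    term = a + b
                    if len(set(term)) == len(term):  # no branch traversed twice
                        new[i][j].add(term)
    return new

# Connection matrix of the bridge network (rows/cols: nodes 1, 2, 3, 4)
C = [[set(),  set(),  {"A"},  {"C"}],
     [set(),  set(),  set(),  set()],
     [set(),  {"B"},  set(),  {"E"}],
     [set(),  {"D"},  {"E"},  set()]]

C = remove_last_node(C)   # remove node 4
C = remove_last_node(C)   # remove node 3
paths = C[0][1]
print(sorted(paths))      # → ['AB', 'AED', 'CD', 'CEB']
```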
As already stated, we first derive the s-o-p expression for the system success function as a pure Boolean algebraic statement. If it is to be interpreted as a probability expression, certain modifications may be necessary. The modifications are necessary because the following relation for expressing the probability of the union of n events is true only if the events are mutually exclusive:

Pr(E1 ∪ E2 ∪ ... ∪ En) = Pr(E1) + Pr(E2) + ... + Pr(En)

Z = AB + ACD        (4.3)
This equation could have been obtained directly from the original Boolean expression by converting it into its canonical form as:

Z = AB + AB'CD        (4.7)
The key problem of all Boolean algebra methods is thus to rewrite the Boolean statement of the system success/failure function in a form (as concise as possible) such that all terms are mutually disjoint. It may be observed that two conjunctive terms T1 and T2 represent disjoint groupings if there exists at least one literal in T1 such that the same literal occurs in its complemented form in T2.
(4.9)
(4.10)
Ultimately, we shall find all subsets of P2 which are disjoint with P1. The union of all these subsets is P2,dis. Similarly we find Pj,dis for all j such that Pj,dis ∩ Pi = 0 for all i < j. This step is fastest if we first expand Pj about a branch which has occurred in the Pi's most often. Then

S = ∪_{i=1}^{m} Pi,dis        (4.11)
4. Let j = j + 1
5. If j < m; go to step 4.
Example 4.1
The above steps of the algorithm are illustrated with the help of the non-
series-parallel reliability logic diagram in Fig.4.3.
The sets associated with the paths of the above network, properly arranged,
are:
E1 = [1 1 0 0 0]
E2 = [0 0 1 1 0]
E3 = [1 0 0 1 1]
E4 = [0 1 1 0 1]

T1 = [1 1 0 0 0]
T2 = [1 1 1 1 0]
T3 = [2 1 1 2 1]
T4 = [2 2 2 2 2]

P1,dis = P1 = AB

Considering E2 and T2, K1 = A, K2 = B

E2(A) = [1 0 1 1 0]          CONTINUE
E2(A') = [-1 0 1 1 0]        RETAIN
E2(A)(B) = [1 1 1 1 0]       DROP
E2(A)(B') = [1 -1 1 1 0]     RETAIN
Therefore,
Hence,
(4. 17)
***
4.5 CUT SET APPROACH
The method for finding the unreliability expression using this approach is just the dual of the method for finding the reliability expression using a knowledge of paths. The basic philosophy remaining the same, all the reported methods for reliability analysis using paths can easily be transformed for the dual analysis. The method described in section 4.4 is applied in the following example using the cut set approach.
Example 4.2
Derive the reliability expression for the graph shown in fig.4.3 using
cutset approach.
Solution:
It can be seen easily that s-t cutsets are AC, BD, ADE and BCE.
We now proceed to first make the second term disjoint with respect to the
first as follows:
Now AB'D' is disjoint with respect to the first term, but not with A'B'D'. Hence, expanding A'B'D' further, we have:

Now A'B'CD' is disjoint with respect to the first two terms, and A'B'C'D' can be dropped because it is completely contained in the first term. Therefore,

S' = A'C' ∪ AB'D' ∪ A'B'CD' ∪ A'D'E' ∪ B'C'E'

Proceeding similarly to make the third and fourth terms also disjoint, we finally have the following expression for S', in which all terms are mutually disjoint.

As all the terms are mutually disjoint, probability calculations are relatively straightforward and we have the following expression for Q, i.e. Pr{S'}:
(4.20)
(4.21)
***
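The consistency of the minimal paths (AB, CD, CEB, AED) with the s-t cut sets (AC, BD, ADE, BCE) can be verified by brute force over all 2^5 component states; the common component reliability p = 0.9 used for the enumeration is illustrative:

```python
from itertools import product

paths = [set("AB"), set("CD"), set("AED"), set("CEB")]
cuts = [set("AC"), set("BD"), set("ADE"), set("BCE")]

p = 0.9
R = 0.0
for state in product([0, 1], repeat=5):
    up = {b for b, s in zip("ABCDE", state) if s}
    works = any(pa <= up for pa in paths)            # some path fully up
    fails_by_cut = any(c.isdisjoint(up) for c in cuts)  # some cut set fully down
    # the system fails exactly when some minimal cut set has all elements failed
    assert works != fails_by_cut
    if works:
        R += p ** len(up) * (1 - p) ** (5 - len(up))

print(round(R, 5))   # → 0.97848
```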
4.6 DELTA-STAR METHOD
the remaining two (one) sets have flows coming out of the corresponding
nodes.
(a) Between node 1 and nodes 2 and 3 (Fig.4.6a) when all three sets are
present.
For example, three components of a system with reliabilities R13, R12, R32 connected to form the delta configuration shown in Figs.4.5 & 4.6 can be transformed into a star equivalent with reliabilities R10, R20, R30.
(4.22a)
(4.22b)
(4.22c)
Solving the above equations for R10, R20, R30 results in:
(4.23a)
(4.23b)
(4.23c)
Where,
(4.24a)
(4.24b)
(4.24c)
Example 4.3
Solving for R10, R20 and R30 from the above equations, we have
(4.27)
* * *
4.7 LOGICAL SIGNAL RELATIONS METHOD
A pair of nodes ni and nj are fused if the two nodes are replaced by a single new node such that all branches that were incident on either ni or nj or on both are now incident on the new node. We denote the fusion of ni and nj as ninj. More than two nodes are fused by taking them two at a time until all are fused.
The logical signal relations for some common sub-networks are given in
Fig.4.8. Each relation is expressed so that its terms are always mutually
disjoint. Sub-networks at serial number 4 and 5 refer to 2 and 3 branches,
respectively, incident on a node. This concept can easily be extended for b
branches incident on a node by observing the recursive nature of relations.
1. (a) Write the logical signal relation for the sink node.
(b) Successively proceed towards the source node using the required
relations. Repeat until the source node is reached.
Substitute S(n1) = 1, where n1 is the source node.
2. In the expression thus obtained for the logical signal at the output node,
replace the logical variables by the corresponding probability variables to
obtain the reliability expression.
Example 4.4
(4.28)
1. S(nj) = xi S(ni)

2. S(nk) = xj S(nj) = xi xj S(ni)

(The remaining relations of Fig 4.8 cover parallel branches and two or three branches incident on a node.)
S(n4) = BD'(A ∪ A'CE) S(n1) ∪ B'D(C ∪ AC'E) S(n1) ∪ BD(A ∪ A'C) S(n1)        (4.29)
Substituting S(n1) = 1
Therefore,
Example 4.5
Rs1 = [1 - P(A')P(C')][1 - P(B')P(D')]        (4.32)
    = (1 - Qa Qc)(1 - Qb Qd)
(b) When E is bad
(a) E good: A and C in parallel, in series with B and D in parallel.
(b) E bad: A in series with B, in parallel with C in series with D.
Derive an expression for the s-t reliability of the network shown in fig.4.10.
Solution:
(4.36)
where Rs1 and Rs2 are given in equations (4.32) and (4.33) respectively and
Pe is given as:
Pe = Py + Pz - Py Pz        (4.37)
(4.38)
***
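The decomposition about element E can be cross-checked against brute-force enumeration. The component reliabilities below are illustrative; Rs1 and Rs2 follow the forms of equations (4.32) and (4.33) described above:

```python
from itertools import product

# Conditioning the bridge network on the bridging element E:
# R = Pe * Rs1 + (1 - Pe) * Rs2
pa, pb, pc, pd, pe = 0.9, 0.8, 0.85, 0.95, 0.7
qa, qb, qc, qd = 1 - pa, 1 - pb, 1 - pc, 1 - pd

Rs1 = (1 - qa * qc) * (1 - qb * qd)        # E good: (A||C) in series with (B||D)
Rs2 = 1 - (1 - pa * pb) * (1 - pc * pd)    # E bad: (A-B) in parallel with (C-D)
R = pe * Rs1 + (1 - pe) * Rs2

# Brute-force check over all 2^5 component states
paths = [set("AB"), set("CD"), set("AED"), set("CEB")]
rel = {"A": pa, "B": pb, "C": pc, "D": pd, "E": pe}
R_enum = 0.0
for state in product([0, 1], repeat=5):
    up = {b for b, s in zip("ABCDE", state) if s}
    if any(p_ <= up for p_ in paths):
        prob = 1.0
        for b in "ABCDE":
            prob *= rel[b] if b in up else 1 - rel[b]
        R_enum += prob

print(abs(R - R_enum) < 1e-12)   # → True
```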
5.1 INTRODUCTION
5.2 PURPOSE
Fig. 5.1: The reliability prediction process. Design requirements, mission profile and interface requirements, together with failure rate data from sources such as MIL-HDBK-217, GIDEP, RADC-NPRD and others, feed the reliability prediction; its outputs support consultation, recommendations and their resolution, drawings/specs, and the selected parts list.
5.3 CLASSIFICATION
Preliminary design prediction is intended for use in the early detailed design
phase. During this phase design configuration data are documented by
engineering sketches and preliminary drawings. The level of detailed
information available may be restricted to part listings. Stress analysis data
are not generally available. Parts Count Method is one such preliminary
design prediction method.
The failure rates should be corrected for applied and induced stress levels
with duty cycles determined by Mission Analysis.
TABLE 5.1
Ground, Mobile (GM): Conditions more severe than those for GF, mostly for vibration and shock. Cooling air supply may also be more limited.

Missile, Captive Carry (Mc): Same as AuT, AuF or AuH depending on the applicable aircraft platform.
5.641 Procedure
The item failure rate can be determined directly by the summation of part
failure rates if all elements of the item reliability model are in series or can
be assumed in series for purposes of an approximation. In the event the
item reliability model consists of non-series elements (e.g. redundancies,
alternate modes of operation), item reliability can be determined by
summing part failure rates for the individual elements and calculating an
equivalent series failure rate for the non-series elements of the model.
The general expression for item failure rate with this method is:
i=n (5.1)
A.item = l: Ni A.ai
Ilai i=1
for a given item
environment. Where
Quality factors are to be applied to each part type where quality level data exist or can be reasonably assumed. Multi-quality levels and data exist for parts such as microelectronics, discrete semiconductors, and established reliability (ER) resistors and capacitors. For other parts, such as non-electronics, πQ = 1 provided that parts are procured in accordance with applicable parts specifications.
Table 5.2 shows typical parts count method prediction of a transmitter unit.
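A parts-count style calculation in the spirit of equation (5.1) can be sketched as below; the part types, quantities, generic failure rates and quality factors are illustrative placeholders, not values from MIL-HDBK-217:

```python
# Parts count: item failure rate as the quality-weighted sum of generic
# part failure rates (all figures below are hypothetical).
parts = [
    # (name, quantity Ni, generic rate lg per 10^6 h, quality factor pQ)
    ("microcircuit",  4, 0.060, 1.0),
    ("transistor",    7, 0.050, 1.0),
    ("resistor",     10, 0.002, 1.0),
    ("capacitor",     2, 0.010, 1.0),
]
lam_item = sum(n * lg * pq for _, n, lg, pq in parts)   # failures per 10^6 h
mtbf_hours = 1e6 / lam_item
print(round(lam_item, 3))    # → 0.63
print(round(mtbf_hours))     # → 1587302
```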
Major parts that are used in electronic equipment which have an influence
on the reliability of the system and their behaviour is dependent on the
stresses are:
* Microelectronics
* Discrete Semiconductors
* Electron Tubes
* Lasers
* Resistors
* Capacitors
* Inductive Components
* Rotary Components
* Relays
* Switches
* Connectors
* Wires & Printed Wiring boards
* Connections
* Miscellaneous
TABLE 5.2
Part type                        Qty   Failure rate       Total failure rate
                                       (per 10^6 hrs)     (per 10^6 hrs)
Resistors (variable),
  non wire wound                  6       0.900               5.400
Capacitors (fixed), ceramic      40       0.054               2.160
(remaining rows are illegible in the source)
I. Part Quality
II. The Use Environment
III. The Thermal Aspect

The quality factor of the part has a direct effect on the part failure rate and appears in the part models as πQ.
The applicable model parameters based on the stress and other related
factors are obtained from the relevant tables and substituted in the
corresponding expressions. The failure rate for each part is obtained and
considering all parts as a series system (because the absence of any part will not make the circuit functional), the total failure rate (or MTBF) is obtained as a summation, taking into account the interconnections and printed wiring board configuration.

(Fig. 5.2: the circuit analysed, rated 10 A, 40-50 V.)
The different types of components used in the circuit are
Transistor:
λp = λb (πE πA πQ πR πS πC) failures/10^6 hrs        (5.2)

Variable Potentiometer:
λp = λb (πTAPS πQ πR πV πC πE) failures/10^6 hrs        (5.5)
The total failure rate for the circuit using Parts Stress Analysis works out to 0.606 failures per 10^6 hrs, whereas by the Parts Count Method it is calculated as 1.45 failures per 10^6 hrs. From this it can be observed that in this case there is more than a two-fold improvement in the failure rate (or MTBF) figure. However, even for such a simple circuit as the one given in Fig.5.2, the manual work associated with circuit analysis and calculation of failure rates by referring to the appropriate MIL-HDBK-217 tables with the applicable π factors requires about one full man-day, as compared to less than an hour for calculations by the Parts Count Method. This is the price to be paid for the Parts Stress Method, which is more refined and leads to better and more accurate prediction.

Table 5.3 Details of Circuit Parts with Actual and Rated Stresses

S.No  Code  Type          Applied Stress  Max. Ratings
1     Q1    2N1479          1.00 W         5.00 W
2     Q2    2N3055         10.00 W       117.00 W
3     Q3    2N3055         66.00 W       117.00 W
4     Q4    2N3053          0.50 W         5.00 W
5     Q5    2N3055         66.00 W       117.00 W
6     Q6    2N3053          0.50 W         5.00 W
7     Q7    2N3055         66.00 W       117.00 W
8     R1    1.2K            0.39 W         1.00 W
9     R2    0.1K            1.16 W         2.50 W
10    R3    2.0K            0.16 W         0.25 W
11    R4    0.1K            1.16 W         2.50 W
12    R5    570             0.50 W         1.00 W
13    R6    0.1K            1.16 W         2.50 W
14    R7    270             0.06 W         0.25 W
15    R8    1K              0.10 W         0.25 W
16    R9    1K Pot          0.10 W         0.25 W
17    R10   1K              0.10 W         0.25 W
18    C1    1 µF           18.00 V        50.00 V
19    C2    100 µF         30.00 V        63.00 V
20    CR    BZV 58 C12      0.10 W         0.40 W
TABLE 5.4
Fa1'Iure Rate Calculat1'on b1v Parts Stress A na1I vs1.s
Part Failure rate No. of similar Total failure
Ref. A.n X 106 Parts rate A.n x 106
Q3, Q5, Q7    0.04200     3    0.126
Q2            0.00430     1    0.0043
Q1            0.05600     1    0.056
Q4, Q6        0.00315     2    0.0063
C2            0.01100     1    0.011
R2, R4, R6    0.03230     3    0.097
R8, R10       0.01150     2    0.023
R1, R5        0.00280     2    0.0056
R3, R7        0.00840     2    0.0168
R9            0.03600     1    0.036
PWB           0.000576    1    0.000576
Connections   0.00561    40    0.2244

Total : 0.606
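The roll-up in Table 5.4 is simply a sum of (per-unit failure rate x quantity) over all part groups, with the system MTBF following as the reciprocal. A minimal sketch under the printed figures (the per-connection rate is taken as 0.2244/40 ≈ 0.00561 so that the products match the printed totals):

```python
# Failure-rate roll-up for Table 5.4 (rates in failures per 10^6 hours).
# Each entry: (part group, per-unit failure rate, number of similar parts).
parts = [
    ("Q3, Q5, Q7", 0.04200, 3),
    ("Q2",         0.00430, 1),
    ("Q1",         0.05600, 1),
    ("Q4, Q6",     0.00315, 2),
    ("C2",         0.01100, 1),
    ("R2, R4, R6", 0.03230, 3),
    ("R8, R10",    0.01150, 2),
    ("R1, R5",     0.00280, 2),
    ("R3, R7",     0.00840, 2),
    ("R9",         0.03600, 1),
    ("PWB",        0.000576, 1),
    ("Connections", 0.00561, 40),
]

total = sum(rate * qty for _, rate, qty in parts)   # x 10^-6 per hour
mtbf_hours = 1e6 / total                            # series-system MTBF
```

Running this reproduces the Parts Stress total of about 0.606 x 10^-6 per hour, confirming the more-than-two-fold difference against the Parts Count figure of 1.45 x 10^-6.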
Recently there has been a significant change in the approach to reliability prediction methods. A microcomputer revolution has taken place, and personal computer systems such as the PC, PC/XT and PC/AT have flooded the market. Manual calculation and data generation have become time consuming, and present-day computers with multi-tasking, multi-user features, interactive facilities and powerful software packages have helped to unburden the design and reliability engineer. Most of the software packages have been developed on microcomputer systems having 256 KB memory, two floppy drives, a 10 MB hard disk, a colour monitor and a printer, with the cost of the software being nominal. The use of the computer as a tool for all these tasks, and the availability of many sources of software on 5.25" / 3.50" floppies, assure portability and easy access. The language mostly used is dBase III/IV.
• Predictor
• 217 Predict
• HARP (Parts Count using standard failure rate lists other than MIL-HDBK-217)
• RELCALC 217
• IRAS
(a) Piece parts making up the system and their breakdown into modules
(b) Part dependent parameters for each piece part
(c) Failure rate models and the failure rate configurations covered by them for each piece part
(d) Part application dependent parameters for each part
(e) Contingency parameters (treatment of default values, trade-off analysis, redundancy)
(f) Forms of prediction results
(g) Structuring of ORACLE outputs to meet the data item description
6.1 INTRODUCTION
(6.1)
where,
(6.2)
programs initiated after unfortunate field experiences.
(Flowchart: reliability allocation procedure with yes/no decision points, ending in release to production.)
Let there be N subsystems in the system whose reliability goal is R*. Out of these N subsystems, let there be m (≤ N) subsystems whose estimated or predicted reliabilities are known and for which reliability improvements are considered feasible. Let n (= N - m) be the remaining subsystems whose estimated or predicted reliabilities are not known, and to which we have to allocate reliabilities considering parameters such as cost, complexity, state of the art, etc. These n units are beyond the purview of this section, and the problem of reliability allocation for this group is discussed in the next section.
For the purpose of this section, therefore, the statement of the problem is:
Let
λs* : system failure rate requirement
λj : predicted failure rate for the jth subsystem
λj* : allocated failure rate for the jth subsystem

(i) If λs* is the system failure rate requirement, allocated unit failure rates λj* must be chosen so that

 N
 Σ λj* ≤ λs*     (6.5)
j=1

The weight of the jth subsystem is taken proportional to its predicted failure rate,

          N
wj = λj / Σ λi     (6.7)
         i=1

and the allocated failure rate is then

λj* = wj λs*     (6.8)
Example 6.1
A system has four serial units with predicted failure rates of 0.002, 0.003,
0.004 and 0.007/hr. If system failure rate is desired to be 0.010, allocate
failure rates to four units.
Solution

The sum of the predicted failure rates is 0.002 + 0.003 + 0.004 + 0.007 = 0.016/hr.

Therefore, the unit weights are
w1 = 0.002/0.016 = 0.1250, w2 = 0.1875, w3 = 0.2500, w4 = 0.4375

Hence, the allocated failure rates are
λ1* = 0.1250 x 0.010 = 0.00125/hr, λ2* = 0.001875/hr, λ3* = 0.0025/hr, λ4* = 0.004375/hr
Example 6.2
Solution
Unit weights have already been computed in Example 6.1. Hence, allocated reliabilities are directly computed as:
R1* = (R*)^w1 = (0.90)^0.1250 = 0.987
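The allocation of Examples 6.1 and 6.2 can be sketched as follows; `arinc_allocate` is an illustrative helper name, not from the text, implementing the proportional apportionment of Eqs. (6.7) and (6.8), followed by the reliability apportionment Rj* = (R*)^wj used in Example 6.2:

```python
def arinc_allocate(lams, lam_goal):
    """Apportion the system failure-rate goal over series units in
    proportion to their predicted failure rates (Eqs. 6.7 and 6.8)."""
    total = sum(lams)
    weights = [lam / total for lam in lams]
    return weights, [w * lam_goal for w in weights]

# Example 6.1: four serial units, system failure rate goal 0.010/hr
weights, alloc = arinc_allocate([0.002, 0.003, 0.004, 0.007], 0.010)

# Example 6.2: apportioning a system reliability goal R* = 0.90
R_goal = 0.90
R_alloc = [R_goal ** w for w in weights]   # R1* = (0.90)**0.1250 = 0.987
```

Note that the allocated failure rates sum exactly to the goal, and the allocated reliabilities multiply back to R*, which is the defining property of both apportionments.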
R0 = (R* / Π(j=1 to m) Rj)^(1/n)     (6.9)

where Rm+1* = Rm+2* = ... = RN* = R0 are the reliabilities allocated to the n subsystems whose predicted reliabilities are not known.
Example 6.3
Solution
R0 = (0.65/0.92)^(1/2) = 0.841
 n
 Π Rj* ≥ R*     (6.13)
j=1

If λj* is the allocated failure rate for the jth subsystem and λ* is the required failure rate for the system, the above equation is equivalent to

 n
 Σ λj* = λ*     (6.14)
j=1

and the allocation is made as

λj* = wj λ*     (6.15)

where wj is the weightage factor for the jth subsystem. These weightage factors obviously have to be defined in such a manner that

Σ wj = 1     (6.16)
To make sure that the above equation is satisfied, we define wj in terms of proportionality factors Zj's as

          n
wj = Zj / Σ Zi     (6.17)
         i=1

(6.18)
1. Complexity
2. Cost
3. State of Art
If a component has been available for a long time and has experienced an extensive development program, including failure analysis and corrective action on deficiencies, it may be quite difficult to improve its reliability further, even if that reliability is considerably lower than desired. Other components which initially have high reliabilities may be improved further relatively economically.
4. Redundancy Introduction
(6.22)
The value of factor Fj is taken as 2/3 if jth subsystem can have units
connected in parallel and is taken as 1 otherwise.
5. Maintenance
(6.23)
6. Time of Operation
If T is the mission time and also the operating time of all subsystems, time
of operation need not be considered in reliability allocation. However, for a
sophisticated mission, it is probable that some subsystems are required to
operate for periods less than the mission time.
(6.24)
where dj is the duty ratio for the jth subsystem, i.e., the fraction of the mission time for which the jth subsystem operates. So,

dj = tj/T     (6.25)
The proportionality factor is therefore

      Kj Cj Fj Mj
Zj = -------------     (6.26)
        Sj dj

The proportionality sign has been replaced by equality without any loss of generality, as any constant will cancel out during computation of the weight factors.
Example 6.4
1. Subsystems 7 and 8 operate for 75% and 50% of the mission time
respectively. All other subsystems operate for complete mission time.
2. Redundancy can be used at subsystems 6 and 10 only.
3. Maintenance is not possible for any of the subsystems.
4. The values of the complexity factor, cost factor and state-of-art factor for these subsystems are:

j    Kj   Cj   Sj
6    6    2    1.0
7    5    3    4.0
8    3    2    3.0
9    7    4    5.0
10   2    6    2.0
Solution

R0 = (0.928/0.98)^(1/2) = 0.973
Hence, R3* = R4* = 0.973 and R5* = 0.980 (unchanged)
      Kj Cj Fj Mj
Zj = -------------
        Sj dj

Mj = 1 for all j
F6 = F10 = 2/3;  F7 = F8 = F9 = 1
d7 = 0.75, d8 = 0.50, d6 = d9 = d10 = 1

Using the above and the table of data given,

Z6 = 8, Z7 = 5, Z8 = 4, Z9 = 5.6, Z10 = 4

wj = Zj / Σ Zj

w6 = 0.3007, w7 = 0.1880, w8 = 0.1504, w9 = 0.2105, and w10 = 0.1504
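The weight computation of Example 6.4 follows Eq. (6.26) mechanically. A minimal sketch under the stated data, with the factors named as in the text:

```python
# Per subsystem j: (Kj complexity, Cj cost, Sj state-of-art,
#                   Fj redundancy factor, dj duty ratio)
data = {
    6:  (6, 2, 1.0, 2/3, 1.00),
    7:  (5, 3, 4.0, 1.0, 0.75),
    8:  (3, 2, 3.0, 1.0, 0.50),
    9:  (7, 4, 5.0, 1.0, 1.00),
    10: (2, 6, 2.0, 2/3, 1.00),
}
M = 1.0  # maintenance is not possible for any subsystem

# Eq. (6.26): Zj = Kj*Cj*Fj*Mj / (Sj*dj), then normalize to weights
Z = {j: K * C * F * M / (S * d) for j, (K, C, S, F, d) in data.items()}
w = {j: z / sum(Z.values()) for j, z in Z.items()}
```

This reproduces Z6 = 8, Z7 = 5, Z8 = 4, Z9 = 5.6, Z10 = 4 and the weights quoted above.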
***
6.4 CRITICALITY
(6.27)

Rb = 1 - Xb + Xb Pb     (6.28)

Obviously, Rb = Pb for Xb = 1
and Rb = 1 for Xb = 0
Fig. 6.2: Equivalent component reliability R(b) vs. criticality X(b).
Rj = 1+ Xj[Rj* - 1]
or,
This approach thus makes the reliability allocation for partially critical
components also a relatively simple exercise.
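The criticality adjustment can be sketched as a pair of helpers; the function names are illustrative, and `required_reliability` inverts the relation Rj = 1 + Xj(Rj* - 1), reproducing the "do not bother" region of Fig. 6.4:

```python
def equivalent_reliability(p, x):
    """Eq. (6.28): a component of reliability p that the system needs
    only with criticality x behaves like one of reliability 1 - x + x*p."""
    return 1 - x + x * p

def required_reliability(r_target, x):
    """Actual reliability a component must have so that its equivalent
    reliability meets r_target at criticality x.  For x <= 1 - r_target
    any reliability suffices: the 'do not bother' region of Fig. 6.4."""
    if x <= 1 - r_target:
        return 0.0
    return 1 - (1 - r_target) / x
```

For instance, to meet an equivalent reliability of 0.9 a fully critical component must itself be 0.9 reliable, while at criticality 0.5 an actual reliability of 0.8 is enough.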
Rj + Xj* - 1 = 0  or  Xj* = 1 - Rj     (6.31)
Fig. 6.4: Applicable range for actual reliability allocation. For criticality X(j) above the line R(j) + X(j) = 1, reliability is allocated; below it, one need not bother.
7
REDUNDANCY TECHNIQUES FOR
RELIABILITY OPTIMIZATION
7.1 INTRODUCTION
Of these, the last method is most effective and most commonly used. The
other methods are generally limited by the level of improvement which
can be achieved. For example, it is well known that system reliability
can be improved by using superior components, i.e., highly reliable
components with low failure rates. But it is not always possible to
produce such highly reliable components with reasonable effort and/or
cost. We describe commonly used Redundancy Techniques in this
chapter.
Table 7.1
Hamming Code for BCD

No.  p1  p2  d3  p3  d2  d1  d0
0    0   0   0   0   0   0   0
1    1   1   0   1   0   0   1
2    0   1   0   1   0   1   0
3    1   0   0   0   0   1   1
4    1   0   0   1   1   0   0
5    0   1   0   0   1   0   1
6    1   1   0   0   1   1   0
7    0   0   0   1   1   1   1
8    1   1   1   0   0   0   0
9    0   0   1   1   0   0   1
Table 7.1 shows the Hamming code corresponding to the BCD code. Each parity bit, when combined with its selected data bits, produces even parity. Parity check bit p1 is associated with data bits d3, d2, d0 and gives C3; p2 with d3, d1, d0 and gives C2; and p3 with d2, d1, d0 and gives C1. Error detection and location are performed by checking the code words at the receiving end to form the word C1C2C3.
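The parity assignments above can be checked mechanically. A minimal sketch, using the bit order p1 p2 d3 p3 d2 d1 d0 of Table 7.1:

```python
def hamming_bcd(n):
    """Encode a BCD digit (0-9) as the 7-bit word p1 p2 d3 p3 d2 d1 d0,
    with even parity: p1 over d3,d2,d0; p2 over d3,d1,d0; p3 over d2,d1,d0."""
    d3, d2, d1, d0 = (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1
    return [d3 ^ d2 ^ d0, d3 ^ d1 ^ d0, d3, d2 ^ d1 ^ d0, d2, d1, d0]

def syndrome(word):
    """Re-form the parity checks at the receiving end; an all-zero
    result means no single-bit error was detected."""
    p1, p2, d3, p3, d2, d1, d0 = word
    return (p1 ^ d3 ^ d2 ^ d0, p2 ^ d3 ^ d1 ^ d0, p3 ^ d2 ^ d1 ^ d0)
```

Encoding the digits 0-9 with `hamming_bcd` regenerates Table 7.1 row by row, and flipping any single bit of a code word produces a nonzero syndrome.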
1. Dynamic programming
2. The discrete maximum principle
3. The sequential unconstrained minimization technique (SUMT)
4. Method of Lagrange multipliers and the Kuhn-Tucker conditions
5. Geometric programming
6. Integer programming
7. Heuristic approaches
The above techniques can be classified as exact and approximate methods. Exact methods give the optimum solution but require a large amount of computer time and memory. Approximate methods are faster but may not result in the optimum solution.
     n
R =  Π [1 - (1 - Pi)^Xi]     (7.1)
    i=1
7.5.1 Method I
Maximize

     n
R =  Π [1 - (1 - Pi)^Xi]     (7.2)
    i=1

subject to

 n
 Σ Cij(Xi) ≤ Kj ;  j = 1, 2, ..., m     (7.3)
i=1
Example 7.1
P1 = 0.60, C11 = 2
P2 = 0.65, C21 = 1, with the single cost constraint K1 = 5
Solution

(2 + 1) X2 ≤ 5  or  X2 = 1
2 X1 ≤ 5 - 1(1) = 4  or  X1 = 2

X = [2 1]
R = 0.546
***
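The reliability quoted in these examples is just Eq. (7.2) evaluated at the chosen allocation. A minimal sketch (the stage reliabilities for Example 7.2 are read back from the unreliability columns of Table 7.3):

```python
def system_reliability(p, x):
    """Eq. (7.2): series system of stages, stage i carrying x[i]
    parallel units each of reliability p[i]."""
    r = 1.0
    for pi, xi in zip(p, x):
        r *= 1 - (1 - pi) ** xi
    return r

R1 = system_reliability([0.60, 0.65], [2, 1])                    # Example 7.1
R2 = system_reliability([0.80, 0.70, 0.75, 0.85], [5, 6, 5, 4])  # Example 7.2
```

R1 comes out as (1 - 0.16)(1 - 0.35) = 0.546 and R2 as 0.99747, matching the results quoted in the text.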
Example 7.2

Consider a four-stage system for optimum redundancy allocation with two linear constraints. The data are:

n = 4, K1 ≤ 56, K2 ≤ 120
P = (0.80, 0.70, 0.75, 0.85)
Ci1 = (1.2, 2.3, 3.4, 4.5), Ci2 = (5, 4, 8, 7)
Solution

Considering stage 4,
(1.2 + 2.3 + 3.4 + 4.5) X4 ≤ 56  or  X4 ≤ 4.9
and (5 + 4 + 8 + 7) X4 ≤ 120  or  X4 ≤ 5
or X4 = 4

Eliminating stage 4 and then considering stage 1,
(1.2 + 2.3 + 3.4) X1 ≤ 56 - 4(4.5) = 38
(5 + 4 + 8) X1 ≤ 120 - 4(7) = 92
or X1 = 5

Similarly, for stage 3,
(2.3 + 3.4) X3 ≤ 38 - 5(1.2) = 32
(4 + 8) X3 ≤ 92 - 5(5) = 67
or X3 = 5

Finally, for stage 2,
2.3 X2 ≤ 32 - 5(3.4) = 15
4 X2 ≤ 67 - 5(8) = 27
or X2 = 6

Therefore, the optimum solution is
X = [5 6 5 4]
R = 0.99747
***
7.5.2 Method II
In this method the system unreliability

         n
Q = 1 -  Π [1 - (1 - Pi)^Xi]     (7.4)
        i=1

which can be approximated as

     n
Q ≈  Σ (1 - Pi)^Xi     (7.5)
    i=1

is minimized subject to

 n
 Σ Cij(Xi) ≤ Kj ;  j = 1, 2, ..., m     (7.6)
i=1
The sequential steps involved in solving the problem by this method are
as follows:
Table 7.2
(Solution of Example 7.3)

         Unreliability
X1  X2   I        II       Cost
1   1    0.40*    0.35     3
2   1    0.16     0.35*    5

X = [2 1]
***
Example 7.4 (Data same as in Example 7.2)
Table 7.3
(Solution of Example 7.4)
            Stage Unreliability
X1 X2 X3 X4   I        II       III      IV       K1     K2
1  1  1  1    0.2000   0.3000*  0.2500   0.1500   11.4   24
1  2  1  1    0.2000   0.0900   0.2500*  0.1500   13.7   28
1  2  2  1    0.2000*  0.0900   0.0625   0.1500   17.1   36
2  2  2  1    0.0400   0.0900   0.0625   0.1500*  18.3   41
2  2  2  2    0.0400   0.0900*  0.0625   0.0225   22.8   48
2  3  2  2    0.0400   0.0270   0.0625*  0.0225   25.1   52
2  3  3  2    0.0400*  0.0270   0.0156   0.0225   28.5   60
3  3  3  2    0.0080   0.0270*  0.0156   0.0225   29.7   65
3  4  3  2    0.0080   0.0081   0.0156   0.0225*  32.0   69
3  4  3  3    0.0080   0.0081   0.0156*  0.0034   36.5   76
3  4  4  3    0.0080   0.0081*  0.0039   0.0034   39.9   84
3  5  4  3    0.0080*  0.0024   0.0039   0.0034   42.2   88
4  5  4  3    0.0016   0.0024   0.0039*  0.0034   43.4   93
4  5  5  3    0.0016   0.0024   0.0010   0.0034*  46.8   101
4  5  5  4    0.0016   0.0024*  0.0010   0.0005   51.3   108
4  6  5  4    0.0016*  0.0007   0.0010   0.0005   53.6   112
5  6  5  4    0.0003   0.0007   0.0010   0.0005   54.8   117
(No addition now possible without violating the constraints)
X = [5 6 5 4]
***
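The stepwise procedure of Table 7.3 can be sketched as a short greedy loop; `method2` is an illustrative name, and the data are those of Example 7.2 (stage unreliabilities and linear unit costs under two constraints):

```python
def method2(q, c, K):
    """Method II sketch: start with one unit per stage and repeatedly
    add a unit at the stage with the largest current unreliability
    q[i]**x[i]; stop when that addition would violate any constraint.
    c[i][j] is the cost of one unit at stage i under constraint j."""
    x = [1] * len(q)
    while True:
        i = max(range(len(q)), key=lambda k: q[k] ** x[k])
        trial = x[:]
        trial[i] += 1
        used = [sum(c[k][j] * trial[k] for k in range(len(q)))
                for j in range(len(K))]
        if any(u > b for u, b in zip(used, K)):
            return x
        x = trial

# Data of Example 7.4 (same as Example 7.2)
q = [0.20, 0.30, 0.25, 0.15]
c = [[1.2, 5], [2.3, 4], [3.4, 8], [4.5, 7]]
x_opt = method2(q, c, [56, 120])
```

Tracing the loop reproduces Table 7.3 row by row and terminates at the allocation X = [5 6 5 4].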
7.5.3 Method III

The preceding method takes no account of the cost of an addition: two stages may have similar reliability but differ in cost (or any other constraint). In any complex practical system there will invariably be components with almost the same reliability but widely differing costs, because of the different nature of the components.
The stage selection factor is defined as

              Pi Qi^Xi
Fi(Xi) = ------------------     (7.7)
            m
            Π ΔCij(Xi)
           j=1

where ΔCij(Xi) is the increase in the jth constraint on adding one component at stage i. After a component is added,

                  Pi Qi^(Xi+1)
Fi(Xi + 1) = ----------------------     (7.8)
                m
                Π ΔCij(Xi + 1)
               j=1
3. Mark (*) the stage having the highest value of the stage selection factor Fi(Xi). A redundant component is proposed to be added at that stage.
4. Check the constraints:
(a) If the solution is still within the permissible region, add the redundant component, modify the value of Xi and hence Fi(Xi), and go back to step 3.
(b) If any constraint would be violated, no further addition is possible and the current allocation is the solution.
Table 7.4
(Solution of Example 7.5)

X1  X2   F1(Xi)   F2(Xi)   ΣXiCi1   Fi(Xi + 1)

X = [1 3]
R = 0.599
***
Table 7.5
(Solution of Example 7.6)

X1 X2 X3 X4   F1(Xi)  F2(Xi)  F3(Xi)  F4(Xi)  ΣXiCi1  ΣXiCi2  Fi(Xi + 1)
1  1  1  1    2.667*  2.283   0.689   0.404   11.4    24      0.533
2  1  1  1    0.533   2.283*  0.689   0.404   12.6    29      0.685
2  2  1  1    0.533   0.685   0.689*  0.404   14.9    33      0.172
2  2  2  1    0.533   0.685*  0.172   0.404   18.3    41      0.205
2  3  2  1    0.533*  0.205   0.172   0.404   20.6    45      0.107
3  3  2  1    0.107   0.205   0.172   0.404*  21.8    50      0.061
3  3  2  2    0.107   0.205*  0.172   0.061   26.3    57      0.062
3  4  2  2    0.107   0.062   0.172*  0.061   28.6    61      0.043
3  4  3  2    0.107*  0.062   0.043   0.061   32.0    69      0.021
4  4  3  2    0.021   0.062*  0.043   0.061   33.2    72      0.018
4  5  3  2    0.021   0.018   0.043   0.061*  35.5    78      0.009
4  5  3  3    0.021   0.018   0.043*  0.009   40.0    85      0.011
4  5  4  3    0.021*  0.018   0.011   0.009   43.4    93      0.004
5  5  4  3    0.004   0.018*  0.011   0.009   44.6    98      0.005
5  6  4  3    0.004   0.005   0.011*  0.009   46.9    102     0.003
5  6  5  3    0.004   0.005   0.003   0.009*  50.3    110     0.001
5  6  5  4    0.004   0.005   0.003   0.001   54.8    117
(No addition now possible without violating the constraints)
X = [5 6 5 4]
***
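Method III differs from Method II only in the quantity it ranks stages by: the selection factor of Eq. (7.7) rather than raw unreliability. A sketch under the same Example 7.2 data (with linear costs the cost increments ΔCij are simply the per-unit costs):

```python
def method3(p, c, K):
    """Method III sketch: repeatedly add a unit at the stage with the
    largest selection factor F_i = p_i * q_i**x_i / prod_j dC_ij
    (Eq. 7.7); stop when the proposed addition violates a constraint."""
    n, m = len(p), len(K)
    x = [1] * n

    def F(i):
        denom = 1.0
        for j in range(m):
            denom *= c[i][j]          # linear costs: dC_ij = c[i][j]
        return p[i] * (1 - p[i]) ** x[i] / denom

    while True:
        i = max(range(n), key=F)
        trial = x[:]
        trial[i] += 1
        if any(sum(c[k][j] * trial[k] for k in range(n)) > K[j]
               for j in range(m)):
            return x
        x = trial

x_opt = method3([0.80, 0.70, 0.75, 0.85],
                [[1.2, 5], [2.3, 4], [3.4, 8], [4.5, 7]], [56, 120])
```

The sequence of additions matches the starred entries of Table 7.5 (the printed F values are the same quantities up to a constant scale factor, which does not affect the ranking), ending again at X = [5 6 5 4].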
8
MAINTAINABILITY AND AVAILABILITY
8.1 INTRODUCTION
From time to time, statistics are generated which emphasize the costliness
of maintenance actions. While estimates of actual costs vary, they
invariably reflect the immensity of maintenance expenditures. According to
one source, approximately 800,000 military and civilian technicians in U.S.A.
are directly concerned with maintenance. Another source states that for a
sample of four equipments in each of three classes - radar, communication,
and navigation the yearly support cost is 0.6, 12 and 6 times, respectively,
the cost of the original equipment. Such figures clearly indicate the need
for continually improved maintenance techniques.
cost. Performance capability includes the capacity to meet specified
requirements such as range, power output, sensitivity and the like.
Dependability is a measure of the degree of consistency of performance and
is essentially the same as operational availability. Availability is, in turn, a
function of reliability and maintainability. System cost must include the total
amount for development, production and service-life support of the
equipment.
Maintainability, then, is only one part - although a very important part - of the measurement of overall system worth. The US Department of Defence definition of maintainability is quoted as follows:

The search for a single definition that encompasses all the attributes of maintainability in a quantitatively measurable term is, for the present, unrewarding. It is first necessary to identify and measure the most relevant factors that make up this end measurement. It is likely that no single final measurement will adequately serve all purposes.
In line with this reasoning, several possible indices were suggested which
may be useful in the quantitative description of maintenance activity.
Among these are:
It is probable that any or all of the indices above may be needed in one
situation or another, plus, perhaps, other special indices.
Mt = f(X1, ..., Xn)

where Mt = maintenance time and X1, ..., Xn are values which quantitatively express the n governing factors described above.
(Fig. 8.1: Classification of maintenance into planned and unplanned. Planned (preventive) maintenance comprises look/feel inspection, running maintenance and shut-down maintenance; unplanned (corrective) maintenance is emergency maintenance.)
A closer study of Fig 8.2 leads to many interesting results. In the first case,
as the degree of maintenance increases, the cost of emergency maintenance
decreases (shown by a thick line) while the cost for planned maintenance
increases with an increase in the degree of maintenance. The total
maintenance cost is shown as a dark thick line. By inspection, it is obvious that there is a point where the total maintenance cost is minimum; that is, a degree of maintenance for which the maintenance is most economical. The cost
figures indicated below the figure show the percentage of cost in three
cases. First, before planned maintenance, the major cost involved is of
emergency maintenance. In the case of economic maintenance, the
interesting point to note is that there is a saving of at least 20 % of the total
cost. When the degree of maintenance increases greatly, it becomes
uneconomical and the major share is taken by planned maintenance. From
this analysis, we may infer that too much maintenance can be as costly as
too little maintenance.
Fig. 8.2: Cost of maintenance versus degree of maintenance. Before planned maintenance: 85% emergency, 5% planned, 10% sundry. Economic maintenance: 10% emergency, 60% planned, 10% sundry, with a 20% reduction in total cost. Excessive maintenance: 5% emergency, 85% planned, 10% sundry.
Active repair time is the number of down-time hours during which one or more technicians actually work on a system to restore it to operable condition. Logistic time is the number of down-time hours consumed in awaiting parts or units needed to effect a repair. Administrative time is that portion of down time not covered by active repair time or logistic time. Based on a 24-hr day, it includes overnight time, weekends, and normal administrative delays.
Active repair time is usually indicative of the complexity of the system, the
nature of its design and installation, the adequacy of test facilities, and the
skill of maintenance personnel.
We list below the factors which can be provided in the design of a system
to achieve optimum maintainability.
The percentage of time the equipment is under operation is called the steady state availability. It characterizes the mean behaviour of the equipment. The availability function A(t) is defined as the probability that the equipment is operating at time t. Although this definition appears to be very similar to the
reliability function R(t), the two have different meanings. While reliability
places emphasis on failure-free operation up to time t, availability
is concerned with the status of the equipment at time t. The
availability function does not say anything about the number of failures
that occur during time t. This means that two equipments A and B can
have different number of failures in a given time interval and can still
have the same availability. For example, in a period of 100 hr, an
equipment of 0.8 availability might have two failures, each causing 10 hr
down-time, or three failures, one causing 10 hr down time and the other two
5 hr each.
A = f(R, M)     (8.2)

where A = system availability, R = system reliability and M = system maintainability.

Equation (8.2) can be viewed as an input-output relation, where R and M are the inputs and A is the output. Fig. 8.3 shows the availability response surface with R and M as inputs.
It may also be seen from Fig. 8.3 that along a contour, successive incremental increases in reliability (maintainability) require smaller and smaller amounts of maintainability (reliability). This is referred to as competitive substitution or trade-off.
Repair can improve the system reliability if the system has redundancy. This is possible because if one equipment fails, the other can continue to operate and the system can thus survive. Meanwhile, the failed equipment can be repaired, and if it can be brought back into operation before the other fails, the system will continue to operate. Thus, the system can be kept alive continuously if the repair time of the equipment is less than the time between failures.
1. Preparation
2. Malfunction verification
3. Fault location
4. Part procurement
5. Repair
6. Final test
The time required to perform each of these tasks varies from zero to several
hours, depending on numerous conditions associated with particular
maintenance events. Weather, for example, causes great variations in the
time required for preparation. Other variables include the skill level of
maintenance technicians, their familiarity with the system under repair, and
even the manner in which symptoms are reported to them. This variability in
preparation time would limit the accuracy of any maintenance-time
predictions based on maintenance-category time distributions.
and therefore,

M(t) = Pr(T ≤ t) = ∫(0 to t) µ exp(-µt) dt = 1 - exp(-µt)

Fig. 8.4: The maintainability function M(t); at t = 1/µ it reaches the value 1 - 1/e.
The mean time to repair is

MTTR = ∫(0 to ∞) t g(t) dt = ∫(0 to ∞) µ t exp(-µt) dt = 1/µ     (8.7)
The availability function can be computed using the familiar Markov model. It
is assumed that the failure and repair rates are constant. The Markov graph
for the availability of single component with repair is shown in Fig.8.5. The
repair starts as soon as the component fails.
Fig. 8.5: Markov graph for the availability of a single component with repair (State 0: component up; State 1: component failed).
State 0 denotes that no failure has occurred and state 1 denotes that one failure has occurred (i.e. the component is down). If the component has not failed at time t, then the probability that it will fail in the time interval (t, t + Δt) is equal to λΔt. On the other hand, if the component is in state 1 (failed state), then the probability that the component will enter state 0 is equal to µΔt.

From the Markov graph, it can be seen that the probability that the component will be in state 0 at time t + Δt is

P0(t + Δt) = P0(t)(1 - λΔt) + P1(t)µΔt     (8.8)

Similarly, the probability that the component will be in state 1 at time t + Δt is

P1(t + Δt) = P1(t)(1 - µΔt) + P0(t)λΔt     (8.9)
Rearranging and taking the limit Δt → 0,

dP0(t)/dt = -λP0(t) + µP1(t)     (8.10a)

dP1(t)/dt = λP0(t) - µP1(t)     (8.10b)
At time t = 0, P0(0) = 1 and P1(0) = 0. Solving,

P0(t) = µ/(λ + µ) + [λ/(λ + µ)] exp[-(λ + µ)t]     (8.11a)

P1(t) = λ/(λ + µ) - [λ/(λ + µ)] exp[-(λ + µ)t]     (8.11b)

A(t) = P0(t) = µ/(λ + µ) + [λ/(λ + µ)] exp[-(λ + µ)t]     (8.12)
The availability function is plotted in Fig. 8.6(a).

Fig. 8.6: (a) Availability of the unit; (b) average history of the output of the unit; (c) two-state transition diagram.
A = (1/λ) / (1/λ + 1/µ)     (8.14)

Here, 1/λ is the mean time between failures (MTBF). It may be noted that this has been defined as the mean time to failure (MTTF) in the case of non-repairable components. 1/µ is the mean repair time or mean time to repair (MTTR). Fig. 8.6(b) characterizes the expected or mean behaviour of the component. U represents the mean up-time (MTBF) and D represents the mean down-time (MTTR). Tc is known as the cycle time. Here,

U = 1/λ
D = 1/µ
The steady-state availability is a number greater than zero and less than one. It is equal to zero when no repair is performed (µ = 0) and equal to one when the equipment does not fail (λ = 0). Normally, 1/µ is much smaller than 1/λ, and therefore the availability can be approximated as

A ≈ 1 - λ/µ
The number of failures per unit time is called the frequency of failures.
This is given by
The availability, transition rates (λ and µ) and mean cycle time can be related as follows:

f = A λ = A′ µ     (8.20)

where A′ = 1 - A is the unavailability.
Example 8.1
Solution
Availability = 500/(500 + 55) = 500/555 = 0.90
The automobile would be available 90% of the time.
***
Example 8.2
Solution
R(t) = exp(-A.t)
Therefore,
µ/(µ + λ) = 0.98
or, µ = 0.98µ + 0.98 x 1.12 x 10^-4
or, µ = 5.49 x 10^-3/hr.
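The point availability of Eq. (8.12) and the steady-state calculations of Examples 8.1 and 8.2 can be sketched together; `availability` is an illustrative helper name:

```python
import math

def availability(lam, mu, t=None):
    """Point availability A(t) from Eq. (8.12); with t=None, the
    steady-state value mu/(lam + mu)."""
    ss = mu / (lam + mu)
    if t is None:
        return ss
    return ss + (lam / (lam + mu)) * math.exp(-(lam + mu) * t)

# Example 8.1: mean up-time 500 hr, mean down-time 55 hr
A_car = availability(1 / 500, 1 / 55)      # 500/555 = 0.90

# Example 8.2: lam = 1.12e-4/hr, required A = 0.98 => required repair rate,
# solving mu/(mu + lam) = 0.98 for mu
lam = 1.12e-4
mu = 0.98 * lam / (1 - 0.98)               # 5.49e-3/hr
```

A(0) is 1 (the unit starts up), and A(t) decays toward the steady-state value as t grows, consistent with Fig. 8.6(a).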
The following set of differential equations can be obtained from the state
probability equations,
Where,
The mean time to first system failure (MTFF) is another system parameter useful for the analysis of system effectiveness when repairs are performed. This parameter is often referred to as the mean time between failures (MTBF), as the system states alternate continuously between good and bad due to repair.
MTFF = ∫(0 to ∞) R(t) dt

     = ∫(0 to ∞) [s1 exp(s2 t) - s2 exp(s1 t)] / (s1 - s2) dt     (8.24)

Since
s1 + s2 = -(λ0 + λ1 + µ1)
s1 s2 = λ0 λ1

MTFF = (λ0 + λ1 + µ1) / (λ0 λ1)     (8.25)
µ1 = µ, µ2 = 2µ

Therefore,

A = µ/(nλ + µ)     (8.31)
Example 8.3
Solution
λ1 = 9 x 10^-4/hr
µ1 = 1/50 = 0.02/hr
λ2 = 15 x 10^-4/hr
Hence, the system availability for two transmitters in parallel is given by:
A = 1 - (1 - A1)(1 - A2)
  = 1 - (1 - 0.9569)(1 - 0.9800)
  = 1 - 0.0431 x 0.0200 = 0.9991
***
The ideal procedure would be to replace a unit just prior to failure, and
thus realize the maximum of trouble - free life. The relationship used here
gives the average hourly cost in terms of two costs, K1 and K2 and the
failure probability distribution of the particular item.
where K1 is the cost of an in-service failure and K2 is the cost of a scheduled replacement.

Fig. 8.9: Average hourly cost of scheduled replacement. The family of curves is plotted for values of K = K1/K2 from 1 to 10 against hours of operation.
In the figure, a model for an aircraft engine was considered and the family of curves is plotted for various ratios of K1 to K2, denoted as K. When K = 1 there is no advantage in scheduled replacement, and the equipment should be allowed to run to failure. When K > 1, there is an advantage in scheduled replacement. If, for example, the cost of in-service failure were 10 times the cost of a scheduled replacement, then the K = 10 curve shows that replacement should be scheduled at approximately 80 hr, as the cost would be least at this point.
Example 8.4
Solution
(i) MTBF
(ii) Reliability
Expect one system failure for every 18.33 missions, i.e. 1000/18.33 = 54.56 system failures per 1000 missions, or R = 0.94544. This is an average. When all three units are good, R = 0.999138; when two are good, R = 0.991; and when only one is good, R = 0.90484.
Example 8.5
Consider a system consisting of 10 tubes. The failure rate of each tube is λ = 0.01/hr. How many spares are necessary to ensure, with 99.73% confidence, that there will be no stock-out during a mission time of 1000 hr?
Solution
     n
P =  Σ [exp(-λT)(λT)^i] / i!     (8.34)
    i=0
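The spares question is answered by finding the smallest n for which the Poisson cumulative of Eq. (8.34) reaches the required confidence, with the expected number of failures taken as (number of tubes) x λ x T. A sketch; `spares_needed` is an illustrative name:

```python
import math

def spares_needed(lam, n_units, t, confidence):
    """Smallest spare count s with sum_{i=0..s} exp(-m) m**i / i!
    >= confidence, where m = n_units * lam * t is the expected
    number of failures in mission time t (Eq. 8.34)."""
    m = n_units * lam * t
    term = math.exp(-m)        # i = 0 term
    cum, s = term, 0
    while cum < confidence:
        s += 1
        term *= m / s          # Poisson recurrence avoids big factorials
        cum += term
    return s

s = spares_needed(0.01, 10, 1000, 0.9973)   # Example 8.5
```

With 100 failures expected on average over the mission, the required spare count comes out somewhat above 100, reflecting the 99.73% confidence margin.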
9

RELIABILITY TESTING

9.1 INTRODUCTION
The overall test program for a product can be considered to be the most
important single phase of a well-planned and executed reliability program,
requiring the largest expenditure of reliability/ quality funds and manpower.
It provides the vital inputs on which the designer bases his design and
subsequent redesign or design refinement. It is the source of almost all
meaningful data from the inception of the project throughout the entire life
of the hardware, the springboard for corrective action on design, process,
and use, and the only sound basis on which logistics planning can proceed
to ensure that the necessary parts and maintenance capability are available
to support the equipment in actual use. It provides project management with
the most vital information on the technical progress and problems of the
project.
Although the details differ with the product under consideration, reliability
testing at any point in the life cycle is often severely limited by both
money and time. Unless the subject of the test is a very inexpensive mass-
produced component, it is costly to devote enough units to testing to
make the sample size as large as one would like, particularly when the test
is likely to cause wear and even destruction of the test units. The time
over which the test units must be operated in order to obtain sufficient
failure data also may be severely restricted by the date at which the
design must be frozen, the manufacture commenced, or the product
delivered. Finally, there is a premium attached to having reliability
information early in the life cycle when there are few test prototypes
available. The later design, manufacture, or operating modifications are
made, the more expensive they are likely to be.
Simply speaking, a destructive test is one that will leave the tested
hardware unfit for further use, whereas a non-destructive test is one that
will not. In most cases, as with tests of explosives, this simple definition
will suffice. However, in some rather rare instances the hardware may still
be usable for limited purposes, as with a complete design or production
qualification test which leaves the hardware unfit for delivery to a customer
but perfectly good for testing to failure to determine failure modes. Hence it
is important that the possible or potential further use be examined early in
deciding on the exact elements of any test program so that a trade-off
can be made whenever it is economically feasible.
Ambient tests are usually used for production testing, largely because of
their simplicity and economy. (They may run one tenth to one hundredth
the cost of an environmental test.) To be useful in high-reliability production
projects, it is essential that they be developed in the R&D phase, in
conjunction with environmental tests, to determine their validity for
separating out material which will not function in the actual environments
that will be encountered by the hardware after delivery.
a) Size of Parts
b) Nature of the Parts
c) Frequency of Testing
d) Complexity of Instrumentation
e) Complexity of the Test
f) Accessibility of Natural Environments
g) Relative Costs
h) Relative Time
4. Levels of Tests
5. Tests by Purpose
When one suggests that a test program is needed, the first question is generally "What kind of test?", meaning a test for what purpose. It is natural
to think of testing in terms of the intended purpose for which it is being run,
since this is the usual departure point for all of the planning, funding,
assignment of responsibility, and use of the resulting data. In a
comprehensive test program associated with a high reliability project, it is
convenient to consider the many purposes for which tests are conducted in
groups, named as evaluation; simulated use; quality; reliability; consumer
research, and investigations.
Although all testing contributes data for reliability calculations and hence
could be considered in a larger sense to be reliability testing, there are
specific tests which are performed for no other purpose than to gather these
data. These are the tests referred to in this section, and for purposes of this
discussion they have been grouped into peripheral testing, life testing,
accelerated life testing, service-life evaluation testing, and surveillance
testing. The data from reliability testing are used to determine mean time or
cycles to and between failure, to calculate or verify attained reliability, to
establish storage and operating life limits on critically age-sensitive parts
(and from both of these come the depth requirements for spare parts), and
to determine modes of failure. Reliability tests are performed at all stages of
the project and on all levels of assembly. They are performed both in
ambient and environmental conditions, and they include both destructive
and nondestructive tests, inspections, and examinations. They may also
include some actual-use tests, although they are usually confined to the
laboratory to ensure control of input conditions.
1. Peripheral Testing
2. Life Testing
Reliability prediction and reliability assessment are vitally concerned with the
determination of the mean time (or cycles) to and between failures, since
this number is basic in reliability calculations. The number can be computed
directly from the data gathered from the life test program, where tests are
performed not only on samples of completed assemblies but on spares and
piece parts as well. The tests are generally performed in the laboratory on
test equipment which, for economy of testing cost, is designed to operate
continuously or cycle the hardware automatically. The operation is
interrupted at regular intervals, and functional tests or nondestructive
inspections are made to find out whether there has been any degradation of
the operability of the part with time or cycles of operation. Generally, the
most severe expected service environments are chosen and a number of
samples are utilized in a statistical design of experiments which permit the
interpretation of results.
Life testing is slow and expensive and may take six months to a year to
complete. In some situations, where real time is the same as operating
time, the test program may take years; typical of these are tests of paint,
where the actual service conditions are exposure to outdoor weather, or of
submarine cable and equipment, where the actual service condition is
exposure to ocean depths. In these situations it is essential that the life
testing program be instituted on the earliest production prototypes, so that
field failures of service equipment delivered at a later time can be predicted
prior to occurrence or that corrective action on the design or production
process can be instituted before production actually begins.
3. Accelerated Life Testing

Life tests are ordinarily too drawn out to provide such gross information quickly enough to permit design corrections to be made expeditiously. In these projects an accelerated life-test program is generally instituted. We shall discuss accelerated life testing in detail in a subsequent section of this chapter.
SLE testing is generally accelerated life testing, since the object of the
testing is to provide management with immediate answers on the expected
life remaining in the field population. The samples selected should be the
oldest or those with the most use in order that the worst material condition
can be detected. Functional hardware should be tested at ambient conditions
both before and after being exposed to the accelerated-aging environment or
cycling, and the results of these ambient tests should be compared with
each other as well as with the original factory test data taken at the time
the parts were delivered.
5. Surveillance Testing
The last test program in the reliability test group is surveillance testing.
These tests, which are performed on samples drawn at regular intervals
from the actual field service stocks, consist of ambient tests and
examinations performed on the samples at progressive levels of
disassembly. The object of the testing is to discover evidence of failure or
incipient failures in the hardware, including not only shifts in values of
components in functional hardware but chemical deterioration of materials,
fatigue cracks, corrosion, whiskers, hardening of rings and seals, and any
other unanticipated modes of failure.
The two characteristics differentiating surveillance testing from other kinds
of reliability testing are the limitation of testing to ambient examinations and
the complete disassembly of the specimens.
m = (1/n) Σ_{i=1}^{n} ti    (9.1)
Even if we had several years time so that we could compute the mean for all
components, the question of how many of them had failed because of
chance and how many had failed because of wearout would arise. We
can safely assume that the majority would fail because of wearout.
The optimum estimate for the mean time between failures is given by:

m = (1/r) [ Σ_{i=1}^{r} ti + (n - r) tr ]    (9.2)
The choice of the sample size, i.e., of the number of components which we
should submit to a test, depends on the available test time tr and on the
precision of or confidence in the test result which we wish to achieve.
When the available test time for a nonreplacement test is t hours and the
expected failure rate of the specimens is λ, and m has to be measured with a
precision corresponding to r chance failures, the number of specimens n to
be submitted to the test is

n = r / (1 - e^(-λt))
Since the time tr of the test duration is known and r chance failures have
been counted during the test, the estimate m is obtained as
where,

X = (1/k) Σ_{i=1}^{k} ti    (9.10)

Y = Σ_{i=1}^{k} ln(ti)    (9.11)
where ti is the ith time to failure and k is the total number of failures in
the sample.
Example 9.1
A sample of 20 failure times (in days) of an air traffic control system is given
in Table 9.1. Determine with the aid of Bartlett's test whether the data are
representative of an exponential distribution.
TABLE 9.1
Failure Times (in days)

7     35    85     142
8     46    86     186
20    45    111    185
19    63    112    266
34    64    141    267
Solution
X = (1/20)(7 + 8 + 20 + 19 + 34 + 35 + 46 + 45 + 63 + 64 + 85 +
86 + 111 + 112 + 141 + 142 + 186 + 185 + 266 + 267)
= 96.10
With the aid of the above results from Equation (9.9) we get
From Table 9.2 for a two-tailed test with 90 percent confidence level, the
corresponding values are:

χ²(ε/2, k-1) = χ²(0.05, 19) = 30.14
χ²(1-ε/2, k-1) = χ²(0.95, 19) = 10.12
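Equation (9.9), the test statistic itself, is not reproduced in this excerpt. A commonly used form of Bartlett's statistic for testing exponentiality is B = 2k[ln X - Y/k] / [1 + (k+1)/(6k)], which is approximately chi-square with k - 1 degrees of freedom when the data are exponential. A minimal sketch of the example's calculation, under that assumed form:

```python
import math

# Failure times (days) from Table 9.1
times = [7, 8, 20, 19, 34, 35, 46, 45, 63, 64,
         85, 86, 111, 112, 141, 142, 186, 185, 266, 267]

k = len(times)                          # number of failures
X = sum(times) / k                      # mean time to failure, Eq. (9.10)
Y = sum(math.log(t) for t in times)     # sum of log failure times, Eq. (9.11)

# Bartlett's statistic (assumed standard form); approximately chi-square
# with k - 1 degrees of freedom if the data are exponential.
B = 2 * k * (math.log(X) - Y / k) / (1 + (k + 1) / (6 * k))

# Two-tailed test at 90% confidence: chi-square bounds from Table 9.2, 19 d.f.
lower, upper = 10.12, 30.14
print(f"X = {X:.2f}, B = {B:.2f}, exponential fit accepted: {lower < B < upper}")
```

Since B falls between the two tabulated bounds, the hypothesis of an exponential distribution is not rejected, which is the example's conclusion.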
TABLE 9.2
Chi-Square Distribution

Degrees of                Probability
Freedom       0.975     0.950     0.05      0.025
1 0.001 0.004 3.840 5.020
2 0.050 0.100 5.990 7.380
3 0.220 0.350 7.820 9.350
4 0.480 0.710 9.490 11.14
5 0.830 1.150 11.07 12.83
6 1.240 1.640 12.59 14.45
7 1.690 2.170 14.07 16.01
8 2.180 2.730 15.51 17.54
9 2.700 3.330 16.92 19.02
10 3.250 3.940 18.31 20.48
11 3.820 4.580 19.68 21.92
12 4.400 5.230 21.03 23.34
13 5.010 5.890 22.36 24.74
14 5.630 6.570 23.69 26.12
15 6.260 7.260 25.00 27.49
16 6.910 7.960 26.30 28.85
17 7.560 8.670 27.59 30.19
18 8.230 9.390 28.87 31.53
19 8.910 10.12 30.14 32.85
20 9.590 10.85 31.41 34.17
24 12.40 13.85 36.42 39.36
***
9.4 PARAMETRIC METHODS
In parametric methods we attempt to determine whether the failure data fit a
particular distribution, such as the exponential, normal, or Weibull. If this
can be accomplished, a great deal more can often be determined about the
nature of the failure mechanisms, and the resulting model can be used more
readily in the analytical techniques.
Often the exponential distribution or constant failure rate model is the first to
be used when we attempt to parameterize data. In addition to being the only
distribution for which only one parameter must be estimated, it provides a
reasonable starting point for considering other two- or three-parameter
distributions. As will be seen, the distribution of the data may indicate
whether the failure rate is increasing or decreasing, and this in turn
may provide insight into whether another distribution should be considered.
ln R = -λt    (9.12)

or, ln(1/R) = λt    (9.13)
where N is the number of test units. It will be noted that λt = 1 when
1 - Q = e^(-1), or Q = 0.632. Thus the value of 1/λ is equal to the time at
which Q = 0.632. The data through which the straight line is drawn in
Fig. 9.2 come from the following example.

[Fig. 9.2: Q(t) plotted against time (×10²) on exponential probability
paper; dashed curves show the patterns for increasing and decreasing λ.]
Example 9.2
The following are the failure times from eight control circuits in hours: 80,
134, 148, 186, 238, 450, 581, and 890. Estimate the failure rate by making
a plot on exponential distribution probability paper.
Solution
The calculations are carried out in Table 9.3. From Fig. 9.2 we see that
Q = 0.632 when t = 400 hr. Therefore we estimate λ = 0.0025/hr.
TABLE 9.3
Exponential Calculations

i    ti     i/(N+1)        i    ti     i/(N+1)
1    80     0.111          5    238    0.555
2    134    0.222          6    450    0.666
3    148    0.333          7    581    0.777
4    186    0.444          8    890    0.888
***
The following is an important feature of plotting failure times on logarithmic
paper. If the failure rate is not constant, the curvature of the data may
indicate whether the failure rate is increasing or decreasing. The dotted
lines on Fig.9.2 indicate the general pattern that the data would follow were
the failure rate increasing (concave upward) or decreasing (concave
downward) with time.
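The plotting calculation of Example 9.2 can be sketched numerically. Here the straight line is fitted by least squares through the origin rather than by eye, so the estimate differs slightly from the graphical one:

```python
import math

# Failure times (hr) from Example 9.2
times = [80, 134, 148, 186, 238, 450, 581, 890]
N = len(times)

# Plotting positions Q_i = i/(N + 1), as in Table 9.3
Q = [i / (N + 1) for i in range(1, N + 1)]

# On exponential probability paper, ln(1/(1 - Q)) = lambda * t is a straight
# line through the origin; fit lambda by least squares (the book fits by eye).
y = [math.log(1 / (1 - q)) for q in Q]
lam = sum(t * v for t, v in zip(times, y)) / sum(t * t for t in times)

print(f"estimated lambda = {lam:.4f}/hr")   # close to the book's 0.0025/hr
```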
The two Weibull parameters are then estimated directly from the straight
line. The slope m is obtained by drawing a right triangle with a horizontal
side of length one; the length of the vertical side is then the slope. The
value of θ is estimated by noting that the ordinate vanishes when Q =
0.632, yielding t = θ.
The Bayesian formula stems from the fact that the intersection of two
probabilities can be written in terms of two different conditional probabilities;
Suppose that X1, X2, ....., Xn are the only possible values that X may take
on. Since X can have only one value, the events Xi are mutually exclusive,
and therefore,
Σ_{i=1}^{n} Pr{Xi} = 1    (9.23)

Also, the Bayes equation may be written in the form of Total Probability as
Example 9.3
Subsequently, a 6-month test is run, and the prototype for the new
computer does not fail. In the light of these test results, (a) how should
the experts' opinions be weighed, and (b) how should the estimated MTTF
be upgraded?
Solution
Let Pr{X1} = Pr{X2} = 0.5 be the prior probabilities that the MTTF
estimates of experts 1 and 2 are correct. If the experts' opinions are
correct, the probability of 6-month operation without failure is
Thus, the revised probabilities that each of the experts are correct are:
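The update described can be sketched as follows. The experts' actual MTTF estimates are not shown in this excerpt, so the values m1 = 1 year and m2 = 2 years below are purely illustrative assumptions:

```python
import math

# Hypothetical expert MTTF estimates (years); the actual values of
# Example 9.3 are assumed here for illustration only.
m = [1.0, 2.0]
prior = [0.5, 0.5]           # Pr{X1} = Pr{X2} = 0.5

t = 0.5                      # 6-month test survived without failure
# Likelihood of surviving the test if expert i is correct (exponential model)
like = [math.exp(-t / mi) for mi in m]

# Bayes rule: revised probability that each expert is correct
total = sum(p * L for p, L in zip(prior, like))
post = [p * L / total for p, L in zip(prior, like)]

# Upgraded MTTF estimate as the posterior-weighted mean
mttf = sum(p * mi for p, mi in zip(post, m))
print(post, mttf)
```

Surviving the test shifts weight toward the more optimistic expert, so the upgraded MTTF lies between the two estimates but above their prior average.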
The estimates of the mean time between failures m, or any other statistical
parameter, are so-called point estimates of the true unknown parameter.
How reliable are such estimates and what confidence can we have in them?
We know that statistical estimates are more likely to be close to the true
value as the sample size increases. Thus, there is a close correlation
between the accuracy of an estimate and the size of the sample from which
it was obtained. Only an infinitely large sample size could give us a 100 per
cent confidence or certainty that a measured statistical parameter coincides
with the true value. In this context, confidence is a mathematical probability
relating the mutual positions of the true value of a parameter and its
estimate.
But suppose you want to make sure that the particular train is typical of
those which arrive normally within the average confidence interval. You
could check at the information window or with the stationmaster sometime
before train time to see if this particular train is running on time at earlier
stops. Twenty per cent of the trains normally arrive at times outside the
80 per cent confidence interval because of events which make them
nontypical. This is the equivalent engineering action of evaluating a test
result in terms of ancillary factors to determine mitigating circumstances or
system interaction factors.
Suppose also that you are out of town on business and cannot get to the
railroad station until a specific time. In that case you might want to know
the confidence that the train will arrive some time after you do, so that you
will be on hand to greet your guest. If you arrive an hour or more ahead of
the normal train time, your confidence will be almost 100 per cent that the
train will arrive later than you do. However, as the two times of arrival
approach coincidence, the confidence in your arriving first will approach 50
per cent. Under these conditions the variability in the train arrival is a major
factor. This example illustrates a statistical approach described as a one
sided confidence determination or interval.
Both one sided and two sided confidence intervals are illustrated in the
Fig.9.3 and Fig.9.4 respectively.
[Fig. 9.3: One-sided confidence interval.]
Usually sampled data are used when estimating the mean life of a product.
If one draws two separate samples from a population for the purpose of
estimating the mean life, it will be quite unlikely that both samples will yield
the same mean life results. Therefore, the confidence limits on mean life are
computed to take into consideration the sampling fluctuations. In this
section the confidence limit formulations for the following two types of
test procedures are presented.
[Fig. 9.4: Two-sided confidence interval, showing the percent of population
and percent of events within the interval.]
In this situation, the items are tested until a preassigned number of failures
occurs. The formulas for one-sided (lower limit) and two-sided (upper and
lower limits) confidence limits, respectively, in this case are as follows:

[2t/χ²(ε, 2k), ∞)

and

[2t/χ²(ε/2, 2k), 2t/χ²(1-ε/2, 2k)]

where k is the total number of failures and ε is the probability that the
interval will not contain the true value of mean life [thus ε = 1 - (confidence
level)].
t = Σ_{j=1}^{k} yj + (x - k)y    (9.28)

where x is the total number of items placed on test at time zero; y is the time
at the conclusion of the life test; and yj is the time of failure j.
Example 9.4
Solution
t = (25)(150) = 3,750 hr

[2(3750)/χ²(0.025, 24), ∞) = [7500/39.36, ∞) = (190.55, ∞)

The minimum value of mean life is 190.55 hr for the 97.5 percent
confidence level.
***
9.512 Test Procedure II
and

[2t/χ²(ε/2, 2k+2), 2t/χ²(1-ε/2, 2k)]

Solution

[2(2549)/χ²(0.025, 2(6)+2), ∞) = [5098/26.12, ∞) = (195.18, ∞)
Thus the minimum value of mean life is 195.18 hr for the 97.5 percent
confidence level.
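The lower-limit computation can be sketched directly, reading the chi-square percentile from Table 9.2 rather than computing it:

```python
# One-sided lower confidence limit on mean life for a time-terminated test:
# lower = 2t / chi-square(eps/2, 2k + 2), as in the calculation above.
t = 2549               # total accumulated test time (hr)
k = 6                  # number of failures
chi2_0025_14 = 26.12   # chi-square(0.025, 2k + 2 = 14) from Table 9.2

lower = 2 * t / chi2_0025_14
print(f"minimum mean life = {lower:.2f} hr at 97.5% confidence")
```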
***
9.6 ACCELERATED TESTING
If we have enough test data, the conventional testing methods will allow us
to fit our choice of a life distribution model and estimate the unknown
parameters. However, with today's highly reliable components, we are often
unable to obtain a reasonable amount of test data when stresses
approximate normal use conditions. Instead, we force components to fail by
testing at much higher than the intended application conditions. By this way,
we get failure data that can be fitted to life distribution models, with
relatively small test sample sizes and practical test times.
The price we have to pay for overcoming the dilemma of not being able to
estimate failure rates by testing directly at use conditions (with realistic
sample sizes and test times) is the need for additional modeling. How can
we go from the failure rate at high stress to what a future user of the
product is likely to experience at much lower stresses?
The models used to bridge the stress gap are known as acceleration models.
This section develops the general theory of these models and looks in detail
at some well known forms of acceleration models, such as the Arrhenius and
the Eyring models.
When we find a range of stress values over which this assumption (that a
change of stress multiplies the time scale by a constant acceleration factor)
holds, we say we have true acceleration.
If we use subscripts to denote stress levels, with U being a typical use set of
stresses and S (or S1, S2, ...) for higher laboratory stresses, then the key
equations in Table 9.4 hold no matter what the underlying life distribution
happens to be.
TABLE 9.4
General Linear Acceleration Relationships

1. Time to fail:          tU = AF × tS
2. Failure probability:   FU(t) = FS(t/AF)
3. Density function:      fU(t) = (1/AF) fS(t/AF)
4. Failure rate:          hU(t) = (1/AF) hS(t/AF)
Table 9.4 gives the mathematical rules for relating CDFs and failure rates
from one stress to another. These rules are completely general, and depend
only on the assumption of true acceleration and linear acceleration factors.
In the next section, we will see what happens when we apply these rules
to exponential distribution as an example.
Example 9.6
Solution
The MTTF is the reciprocal of the failure rate and varies directly with the
acceleration factor. Therefore the MTTF at 25°C is 4500 × 35 = 157,500 hr.
The use failure rate is 1/157,500 = 0.635%/K hours. The cumulative percent
of failures at 40,000 hr is given by 1 - e^(-0.00635 × 40) = 22.4%.
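A sketch of the solution's arithmetic, taking the acceleration factor of 35 and the stress-condition MTTF of 4500 hr used above:

```python
import math

# Values used in the solution of Example 9.6
mttf_stress = 4500        # MTTF measured at laboratory stress (hr)
AF = 35                   # acceleration factor between stress and use

# Linear acceleration scales the exponential MTTF directly (Table 9.4)
mttf_use = AF * mttf_stress
lam_use = 1 / mttf_use                 # use-condition failure rate (/hr)

# Cumulative fraction failed by 40,000 hr at use conditions
F = 1 - math.exp(-lam_use * 40000)
print(mttf_use, F)
```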
***
9.62 Acceleration Models
There are many models in the literature that have been used successfully
to model acceleration for various components and failure mechanisms.
These models are generally written in a deterministic form that says that
time to fail is an exact function of the operating stresses and several
material and process dependent constants.
Since all times to failure are random events that cannot be predicted exactly
in advance, and we have seen that acceleration is equivalent to multiplying
a distribution scale parameter, we will interpret an acceleration model as an
equation that calculates a distribution scale parameter, or percentile, as a
function of the operating stress. In the discussion below we use a typical
percentile T50, as is the convention for these models.
When only thermal stresses are significant, an empirical model, known as the
Arrhenius model, has been used with great success. This model takes the
form

T50 = A e^(ΔH/kT)

where A is a scaling constant, ΔH is the activation energy of the failure
mechanism, k is Boltzmann's constant, and T is the absolute temperature.
Note that we can write the Arrhenius model in terms of T50, or the 1/λ
parameter (when working with an exponential), or any other percentile of
the life distribution we desire. The value of the constant A will change, but
this will have no effect on acceleration factors.
AF = [A e^(ΔH/kT1)] / [A e^(ΔH/kT2)]    (9.32)

from which

AF = e^[(ΔH/k)(1/T1 - 1/T2)]    (9.33)

ΔH = k ln(AF) / (1/T1 - 1/T2)    (9.34)
This last equation shows us how to estimate ΔH from two cells of
experimental test data consisting of times to failure of units tested at
temperature T1 and times to failure of units tested at temperature T2. All we
have to do is estimate a percentile, such as T50, in each cell, then take the
ratio of the corresponding times and use the preceding equation to estimate
ΔH. This procedure is valid for any life distribution.
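The two-cell estimate can be sketched as follows; the cell temperatures and T50 estimates below are illustrative assumptions, and Boltzmann's constant is taken in eV/K:

```python
import math

K_BOLTZ = 8.617e-5          # Boltzmann's constant (eV/K)

# Hypothetical two-cell test data: estimated T50 in each temperature cell
t50_1, T1 = 2000.0, 358.0   # e.g. an 85 C cell
t50_2, T2 = 200.0, 423.0    # e.g. a 150 C cell

# Acceleration factor between the cells is the ratio of percentiles
AF = t50_1 / t50_2

# Delta-H from the two cells, per the preceding equation
dH = K_BOLTZ * math.log(AF) / (1 / T1 - 1 / T2)
print(f"estimated activation energy = {dH:.3f} eV")
```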
The Arrhenius model is an empirical equation that justifies its use by the
fact that it works in many cases. It lacks, however, a theoretical derivation
and the ability to model acceleration when stresses other than temperature
are involved.
The Eyring model equation, written for temperature and a second stress S1,
takes the form

T50 = A T^α e^(ΔH/kT) e^[(B + C/T) S1]
The first exponential is the temperature term, while the second exponential
contains the general form for adding any other type of stress. In other
words, if a second nonthermal stress was needed in the model, a third
exponential multiplier exactly the same as the second, except for replacing
B and C by additional constants D and E, would be added to the equation.
The resulting Eyring model for temperature and two other stresses would
then be

T50 = A T^α e^(ΔH/kT) e^[(B + C/T) S1] e^[(D + E/T) S2]
It is interesting to look at how the first term, which models the effect
of temperature, compares to the Arrhenius model. Except for the T^α factor,
this term is the same as the Arrhenius. If α is close to zero, or the range
over which the model is applied is small, the term T^α has little impact and
can be absorbed into the A constant without changing the practical value
of the expression. Consequently, the Arrhenius model is successful
because it is a useful simplification of the theoretically derived Eyring
model.
9.623 Other Acceleration Models
There are many other models, most of which are simplified forms of the
Eyring, which have been successful. A model known as the power rule
model has been used for paper impregnated capacitors. It has only voltage
dependency, and takes the form A V^(-B) for the mean time to fail (or the T50
parameter).
Another way to model voltage is to have a term such as A e^(-BV). This kind
of term is easy to work with after taking logarithms.
Humidity plays a key role for many failure mechanisms, such as those
related to corrosion or ionic metal migration. The most successful models
including humidity have terms such as A(RH)^(-B) or A e^(-B(RH)), where RH is
relative humidity.
Use of magnified load does reduce testing time and possibly the number of
items required for test. A major problem is that of correlation. For
example, if we wish to know the performance of an engine in normal use
of 5000 h, we can get much the same performance in 2830 h at full
throttle, or in 100 h at 23 percent overload. This correlation is possible,
since much information exists. In many situations, however, establishing
such correlation is difficult, since we must first know what normal means
and then we must have enough overload data to correlate with normal.
[Figure: MTTF versus stress level, showing results from accelerated tests.]
Accelerated testing is useful, but it must be carried out with great care to
ensure that results are not erroneous. We must know for sure that the
phenomena for which the acceleration factor has been calculated are the
failure mechanisms. Experience gained with similar products and a
careful comparison of the failure mechanisms occurring in accelerated and
real time tests will help determine whether we are testing the correct
phenomena.
One common type of accelerated test stresses the test sample to the
maximum ratings for the part. Acceleration factors are then applied to
achieve a probable failure rate which would have been applicable at
considerably derated conditions. For example, paper capacitors commonly
exhibit a fifth-power acceleration factor with voltage. Most other parts
exhibit close to a third power acceleration factor. A standard third power is
frequently used for acceptance tests. For example, suppose a test is
performed to demonstrate a failure rate of 1.0%/ 1,000 hours while operated
at full rated voltage. This could be interpreted as the equivalent of 0.008%/
1,000 hours at 20 per cent of the full voltage rating. This is calculated as
follows:
                          failure rate at full rating
Derated failure rate, d = ---------------------------------
                          (rated voltage/derated voltage)^3

      1.0%/K hours
d = ---------------- = 1.0/5^3 = 0.008%/K hours
     (VR/0.2VR)^3
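The derating arithmetic above can be sketched as:

```python
# Third-power voltage acceleration: failure rate scales as (V1/V2)^3.
rate_full = 1.0            # demonstrated failure rate at rated voltage (%/1000 hr)
derating = 0.2             # operating at 20% of the full voltage rating
power = 3                  # the standard third-power law used above

rate_derated = rate_full / (1 / derating) ** power
print(f"{rate_derated:.3f} %/1000 hr")   # 1.0 / 5^3 = 0.008
```

Substituting the fifth-power factor observed for paper capacitors in place of `power = 3` gives a correspondingly smaller derated rate.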
[Figure: Derating chart of temperature (°F) versus voltage.]
The figure of merit usually used for measuring equipment reliability is mean
time between failures (MTBF). Reliability acceptance testing for equipment
generally consists of operational tests performed under simulated end-use
conditions with acceptable MTBF and confidence specified.
The test measures the most likely value of MTBF, and the amount of
statistical data obtained during the test must be evaluated to determine the
confidence which can be placed on the measurement. When this has been
done, the following statements can be made: The best estimate of the MTBF
is B hours; but, based on the amount of data, we can be 90 percent sure,
for example, that it is not more than an upper limit of A hours and 90
percent sure that it is not less than a lower limit of C hours. This defines
an 80 percent double-sided confidence that the true value lies between the
values of A and C.
Usually, for acceptance testing, the single sided description stating the
cumulative probability that a measured MTBF is greater than a certain
specified minimum value has the greatest usefulness. Accordingly, it is most
frequently desired to plan equipment acceptance tests to demonstrate with a
known confidence that the MTBF is greater than a certain specified figure.
Sequential testing differs from other test procedures in that the length of test
is not established before the test begins but depends upon what happens
during the test. The test sample is tested while subjected to a prescribed
environment and duty cycle until the preassigned limitations on the risks of
making wrong decisions based on the cumulative test evidence have been
satisfied. The ratio of quantity of failures to the length of test at any test
interval is interpreted according to a sequential analysis test plan.
Conspicuously good items are accepted quickly; conspicuously bad items
are rejected quickly; and items of intermediate quality require more
extensive testing.
The chief advantage of the sequential test is that it requires less testing on the average than other testing procedures when the
preassigned limitations on the risks of making both kinds of wrong decisions
are the same for both tests. The chief disadvantage is that the test time
required to reach a decision cannot be determined prior to testing.
[Figure: Sequential test plan; number of failures versus S, cumulative
number of successes, with accept line and continue-testing region.]
[Figure: Test cycle of equipment under trial: diagnosis and restoration,
analysis, modification and classification.]
These types of trials and tests can be used to obtain initial reliability
information, but the conditions of the tests need to be carefully studied.
The tests themselves may not be under the same environmental conditions,
and it is often not easy to set up the true conditions correctly. In the case of
life testing, as already illustrated in the previous section, this may represent
accelerated testing, particularly where the equipment is of very high
reliability and catastrophic failure information is required. Various techniques
of analysis exist for estimating the reliability characteristic of interest, such
as failure rate, and some of these techniques have already been illustrated.
Typically two periods of testing time may be selected, one at the start of
the test and the other at the termination of the test, selecting periods with
approximately equal numbers of failures.
[Figure: Cumulative failure rate versus total time, plotted on log-log scales
from 100 to 10,000,000 hours.]
m_k = (total time)/(total number of product failures) = t/k    (9.41)

λ = k/(total time)    (9.43)
Example 9.7
Solution
Substituting the given data into Equation (9.41) yields the estimated value

m5 = 300/5 = 60 hr
10.1 IMPORTANCE
At the same time, both the development and operational cost of software
It has been indicated that three of the most important software product
characteristics are quality, cost and schedule. Note that these are primarily
user-oriented rather than developer-oriented attributes. Quantitative
measures exist for the latter two characteristics, but the quantification of
quality has been more difficult. It is most important, however, because the
absence of a concrete measure for software quality generally means that
quality will suffer when it competes for attention against cost and schedule.
In fact, this absence may be the principal reason for the well known
existence of quality problems in many software products.
This does not mean that some attention to faults is without value. But the
attention should be focused on faults as predictors of reliability and on the
nature of faults. A better understanding of faults and the causative human
error processes should lead to strategies to avoid, detect and remove, or
compensate for them.
The field of hardware reliability has been established for some time.
Hence, one might ask how software reliability relates to it. In reality,
the division between hardware and software reliability is somewhat
artificial. Both may be defined in the same way. Therefore, one may
combine hardware and software component reliabilities to get system
reliability. Both depend on the environment. The source of failures in
software is design faults, while the principal source in hardware has
generally been physical deterioration. However, the concepts and
theories developed for software reliability could really be applied to any
design activity, including hardware design. Once a software (design)
defect is properly fixed, it is in general fixed for all time. Failure usually
occurs only when a program (design) is exposed to an environment that
it was not developed or tested for. Although manufacturing can affect the
quality of physical components, the replication process for software
(design) is trivial and can be performed to very high standards of quality.
Since introduction and removal of design faults occurs during
software development, software reliability may be expected to vary
during this period.
The design reliability concept has not been applied to hardware to that
extent. The probability of failure due to wear and other physical causes has
usually been much greater than that due to an unrecognized design
problem. It was possible to keep hardware design failures low because
hardware was generally less complex logically than software. Hardware
design failures had to be kept low because retrofitting of manufactured
items in the field was very expensive. Awareness of the work that is going
on in software reliability, plus a growing realization of the importance of
design faults, may
now be having an effect on hardware reliability too. This growing
awareness is strengthened by the parallels that people are starting to draw
between software engineering and chip design.
A fault is the defect in the program that, when executed under particular
conditions, causes a failure. There can be different sets of conditions that
cause failures, or the conditions can be repeated. Hence a fault can be the
source of more than one failure. A fault is a property of the program rather
than a property of its execution or behavior. It is what we are really
referring to in general when we use the term bug. A fault is created when a
programmer makes an error. It's very important to make the failure-fault
distinction!
1. time of failure,
2. time interval between failures,
3. cumulative failures experienced up to a given time,
4. failures experienced in a time interval.

These quantities are illustrated in Tables 10.1 and 10.2.
TABLE 10.1
Time-based failure specification

Failure    Failure time    Failure interval
number     (sec)           (sec)
1          8               8
2          18              10
3          25              7
4          36              11
5          45              9
6          57              12
7          71              14
8          86              15
9          104             18
10         124             20
11         143             19
12         169             26
13         197             28
14         222             25
15         250             28
Note that all the foregoing four quantities are random variables. By random,
we mean that the values of the variables are not known with certainty.
There are many possible values, each associated with a probability of
occurrence. For example, we don't really know when the next failure will
occur. If we did, we would try to prevent or avoid it. We only know a set
of possible times of failure.
TABLE 10.2
Failure-based failure specification

Time (sec)    Cumulative    Failures in
              failures      interval
30            3             3
60            6             3
90            8             2
120           9             1
150           11            2
180           12            1
210           13            1
240           14            1
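Table 10.2 can be derived mechanically from the failure times of Table 10.1; a sketch:

```python
# Failure times (sec) from Table 10.1
times = [8, 18, 25, 36, 45, 57, 71, 86, 104, 124, 143, 169, 197, 222, 250]

# Cumulative failures at the end of each 30-second interval,
# reproducing the second column of Table 10.2
cumulative = []
for end in range(30, 241, 30):
    cumulative.append(sum(1 for t in times if t <= end))

# Failures in each interval are the successive differences
per_interval = [b - a for a, b in zip([0] + cumulative, cumulative)]
print(cumulative)     # [3, 6, 8, 9, 11, 12, 13, 14]
print(per_interval)   # [3, 3, 2, 1, 2, 1, 1, 1]
```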
There are at least two principal reasons for this randomness. First, the
commission of errors by programmers, and hence the introduction of faults,
is a very complex, unpredictable process. Hence the locations of faults
within the program are unknown. Second, the conditions of execution of a
program are generally unpredictable. For example, with a telephone
switching system, how do you know what type of call will be made next? In
addition, the relationship between program function requested and code path
executed, although theoretically determinable, may not be so in practice
because it is so complex. Since failures are dependent on the presence of a
fault in the code and its execution in the context of certain machine states, a
third complicating element is introduced that argues for the randomness of
the failure process.
We will look at the time variation from two different viewpoints, the mean
value function and the failure intensity function. The mean value function
represents the average cumulative failures associated with each time point.
The failure intensity function is the rate of change of the mean value function
or the number of failures per unit time. For example, you might say 0.01
failure/hr or 1 failure/100 hr. Strictly speaking, the failure intensity is the
derivative of the mean value function with respect to time, and is an
instantaneous value.
A random process whose probability distribution varies with time is called
nonhomogeneous. Most failure processes during test fit this situation.
Fig. 10.1 illustrates the mean value and the related failure intensity
functions at times tA and tB. Note that the mean failures experienced
increase from 3.04 to 7.77 between these two points, while the failure
intensity decreases.
TABLE 10.3
Probability distribution at times tA and tB

Value of random variable       Probability
(failures in time period)      Elapsed time tA = 1 hr    Elapsed time tB = 5 hr
0 0.10 0.01
1 0.18 0.02
2 0.22 0.03
3 0.16 0.04
4 0.11 0.05
5 0.08 0.07
6 0.05 0.09
7 0.04 0.12
8 0.03 0.16
9 0.02 0.13
10 0.01 0.10
11 0 0.07
12 0 0.05
13 0 0.03
14 0 0.02
15 0 0.01
Mean failures 3.04 7.77
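The mean failures in the last row of Table 10.3 follow directly from the two distributions; a sketch:

```python
# Probability distributions of cumulative failures from Table 10.3,
# indexed by the value of the random variable (0 through 15 failures)
p_tA = [0.10, 0.18, 0.22, 0.16, 0.11, 0.08, 0.05, 0.04,
        0.03, 0.02, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00]
p_tB = [0.01, 0.02, 0.03, 0.04, 0.05, 0.07, 0.09, 0.12,
        0.16, 0.13, 0.10, 0.07, 0.05, 0.03, 0.02, 0.01]

def mean_failures(p):
    """Expected value of the random variable: sum of n * Pr{n failures}."""
    return sum(n * pn for n, pn in enumerate(p))

print(mean_failures(p_tA), mean_failures(p_tB))   # 3.04 and 7.77
```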
The number of faults in the software is the difference between the number
introduced and the number removed.
[Fig. 10.1: Mean value and failure intensity (failures/hr) functions versus
time (hr), evaluated at times tA and tB.]
Fault removal obviously can't occur unless you have some means of
detecting the fault in the first place. Thus fault removal resulting from
execution depends on the occurrence of the associated failure. Occurrence
depends both on the length of time for which the software has been
executing and on the execution environment or operational profile. When
different functions are executed, different faults are encountered and the
failures that are exhibited tend to be different; thus the environmental
influence. We can often find faults without execution. They may be found
through inspection, compiler diagnostics, design or code reviews, or code
reading.
10.31 Environment
During test, the term test case is sometimes used instead of run type.
The run types required of the program by the environment can be viewed
as being selected randomly. Thus, we define the operational profile as the
set of run types that the program can execute, along with the probabilities
with which they will occur. In Fig. 10.2 we show two of many possible input
states, A and B, with their probabilities of occurrence. The part of the
operational profile for just these two states is shown in Fig. 10.3. In
reality, the number of possible input states is generally quite large. A
realistic operational profile is illustrated in Fig. 10.4. Note that the input
states have been located on the horizontal axis in order of the probabilities
of their occurrence. This can be done without loss of generality. They
have been placed close together so that the operational profile appears as
a continuous curve.
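Random selection of run types according to an operational profile can be sketched as follows. The profile below is an illustrative assumption; only pA = 0.12 for input state A comes from the text:

```python
import random

# Hypothetical operational profile: run types and their probabilities of
# occurrence (input state A has pA = 0.12, as in Fig. 10.2)
profile = {"A": 0.12, "B": 0.15, "C": 0.45, "D": 0.28}
assert abs(sum(profile.values()) - 1.0) < 1e-9

# Select the next 1000 runs randomly according to the profile
rng = random.Random(1)
runs = rng.choices(list(profile), weights=list(profile.values()), k=1000)

# Empirical frequencies approach the profile probabilities
freq_A = runs.count("A") / len(runs)
print(freq_A)
```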
[Fig. 10.2: Input states A (pA = 0.12) and B, with their probabilities of
occurrence.]

[Fig. 10.3: Portion of operational profile: probability of occurrence versus
input state, for states A and B.]
The definition that we will present here for software reliability is one that
is widely accepted throughout the field. It is the probability of failure-free
operation of a computer program for a specified time in a specified
environment. For example, a time-sharing system may have a reliability of
0.95 for 10 hr when employed by the average user. This system, when
executed for 10 hr, would operate without failure for 95 of these periods
out of 100. As a result of the general way in which we defined failure, note
that the concept of software reliability incorporates the notion of
performance being satisfactory. For example, excessive response time at a
given load level may be considered unsatisfactory, so that a routine must be
recoded in more efficient form.
Fig.10.4 Operational profile (probability of occurrence versus input state)
Pressures have been increasing for achieving a more finely tuned balance
among product and process characteristics, including reliability. Trade-offs
among product components with respect to reliability are also becoming
increasingly important. Thus an important use of software reliability
measurement is in system engineering. However, there are at least four
other ways in which software reliability measures can be of great value to
the software engineer, manager, or user.
[Figure: reliability and failure intensity versus time (hr)]
Third, one can use a software reliability measure to monitor the operational
performance of software and to control new features added and design
changes made to the software. The reliability of software usually decreases
as a result of such changes. A reliability objective can be used to determine
when, and perhaps how large, a change will be allowed. The objective would
be based on user and other requirements. For example, a freeze on all
changes not related to debugging can be imposed when the failure intensity
rises above the performance objective.
To model software reliability one must first consider the principal factors
that affect it: fault introduction, fault removal, and the environment. Fault
introduction depends primarily on the characteristics of the developed code
(code created or modified for the application) and on development process
characteristics, which include the software engineering technologies and
tools used and the level of experience of personnel. Note that code can be
developed to add features or to remove faults. Fault removal depends upon
time, the operational profile, and the quality of repair activity. The
environment directly depends on the operational profile. Since some of the
foregoing factors are probabilistic in nature and operate over time, software
reliability models are generally formulated in terms of random processes. The
models are distinguished from each other in general terms by the nature of
the variation of the random process with time.
There are at least two general ways of viewing predictive validity. These
are based on the two equivalent approaches to characterizing the failure
random process, namely, the number of failures approach and the failure
time approach.
The number of failures approach may yield a method that is more practical
to use than the failure time approach. In the former approach, we describe
the failure random process by {M(t), t ≥ 0}, representing failures experienced
by time t. Such a counting process is characterized by specifying the
distribution of M(t), including the mean value function µ(t).
230 Reliability Engineering
Assume that we have observed q failures by the end of test time tq. We use
the failure data up to time te (≤ tq) to estimate the parameters of µ(t).
Substituting the estimates of the parameters in the mean value function
yields the estimate of the number of failures by the time tq. The estimate is
compared with the actually observed number q. This procedure is repeated
for various values of te.
We can visually check the predictive validity by plotting the relative error
against the normalized test time. The error will approach 0 as te approaches
tq. If the points are positive (negative), the model tends to overestimate
(underestimate). Numbers closer to 0 imply more accurate prediction and
hence a better model.
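The relative-error check just described is easy to script. A minimal sketch, with illustrative numbers of our own (the text prescribes the procedure, not these values):

```python
# Sketch of the predictive-validity check described above. We assume the
# model was refit on data truncated at successive times t_e and produced
# predictions of the failures expected by t_q; the numbers are illustrative.

def relative_error(predicted, observed):
    # Positive => the model overestimates; negative => it underestimates.
    return (predicted - observed) / observed

q = 136                                            # failures observed by t_q
predictions = [98.0, 120.5, 131.0, 134.8, 136.0]   # refits at growing t_e
errors = [relative_error(p, q) for p in predictions]
# The relative error approaches 0 as t_e approaches t_q.
print(errors)
```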
10.512 Capability
10.514 Applicability
There are at least four special situations that are encountered commonly in
practice. A model should either be capable of dealing with them directly or
should be compatible with procedures that can deal with them. These are:
1. program evolution,
2. classification of severity of failures into different categories,
3. ability to handle incomplete failure data or data with measurement
uncertainties (although not without loss of predictive validity),
4. operation of the same program on computers of different performance.
10.515 Simplicity
The execution time component for both models assumes that failures
occur as a random process, to be specific, a nonhomogeneous Poisson
process. Poisson simply refers to the probability distribution of the value of
the process at each point in time. The term nonhomogeneous indicates that
the characteristics of the probability distributions that make up the random
process vary with time. This is exhibited in a variation of failure intensity
with time. You would expect this, since faults are both being introduced
and removed as time passes.
The two models have failure intensity functions that differ as functions of
execution time. However, the difference between them is best described in
terms of slope or decrement per failure experienced (Fig.10.6). The
decrement in the failure intensity function remains constant for the basic
execution time model whether it is the first failure that is being fixed or
the last. By contrast, for the logarithmic Poisson execution time model, the
decrement per failure becomes smaller with failures experienced. In fact,
it decreases exponentially. The first failure initiates a repair process that
yields a substantial decrement in failure intensity, while later failures result
in much smaller decrements.
The failure intensity for the basic model as a function of failures experienced
is

λ(µ) = λ0 [1 - µ/ν0]

The quantity λ0 is the initial failure intensity at the start of execution. Note
that µ is the average or expected number of failures experienced at a given
point in time. The quantity ν0 is the total number of failures that
would occur in infinite time.
Example 10.1
Assume that a program will experience 100 failures in infinite time. It has
now experienced 50. The initial failure intensity was 10 failures/CPU hr.
Determine the value of the current failure intensity.
Solution
λ = λ0(1 - µ/ν0) = 10(1 - 50/100) = 5 failures/CPU hr.
***
Example 10.2
Assume that the initial failure intensity is again 10 failures/CPU hr. The
failure intensity decay parameter is 0.02/failure. We assume that 50 failures
have been experienced. The current failure intensity is to be determined.
Solution
λ = λ0 exp(-θµ) = 10 exp[-(0.02)(50)] = 10 e^-1 = 3.68 failures/CPU hr.
***
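The two failure-intensity relations as functions of failures experienced can be checked numerically. A minimal sketch (function and parameter names are ours; lambda0 is the initial failure intensity λ0, nu0 the total failures ν0, theta the decay parameter θ):

```python
import math

# Failure intensity as a function of mean failures experienced (mu), for the
# two execution-time models discussed above.

def basic_intensity(mu, lambda0, nu0):
    # Basic model: constant decrement in intensity per failure experienced.
    return lambda0 * (1.0 - mu / nu0)

def log_poisson_intensity(mu, lambda0, theta):
    # Logarithmic Poisson model: exponentially decreasing decrement.
    return lambda0 * math.exp(-theta * mu)

# Example 10.1: lambda0 = 10, nu0 = 100, mu = 50.
print(basic_intensity(50, 10, 100))                   # 5.0 failures/CPU hr
# Example 10.2: lambda0 = 10, theta = 0.02, mu = 50.
print(round(log_poisson_intensity(50, 10, 0.02), 2))  # 3.68 failures/CPU hr
```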
Example 10.3
Solution
Example 10.4
Solution
[Figure: mean failures experienced µ versus execution time τ; the basic model approaches the total failures ν0 asymptotically]
Let execution time be denoted by τ. We can then write, for the basic model,

µ(τ) = ν0 [1 - exp(-λ0 τ/ν0)]
Example 10.5
Solution
µ = 100[1 - exp(-10)]
  = 100(1 - 0.0000454) = 100 failures (almost).
***
For the logarithmic Poisson model, we have the corresponding relation
for the number of failures as given by:

µ(τ) = (1/θ) ln(λ0 θ τ + 1)
Example 10.6
Use the same parameters as Example 10.2. Let's find the number of
failures experienced for the logarithmic Poisson model at 10 and 100 CPU
hr of execution.
Solution
At 10 CPU hr: µ = (1/0.02) ln[10(0.02)(10) + 1] = 50 ln 3 = 55 failures.
At 100 CPU hr: µ = 50 ln[10(0.02)(100) + 1] = 50 ln 21 = 152 failures.
***
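Both mean value functions can be evaluated directly; a sketch reproducing the numbers of Examples 10.5 and 10.6 (function and parameter names are ours):

```python
import math

# Mean failures experienced by execution time tau, for both models.

def basic_mu(tau, lambda0, nu0):
    # Basic model: approaches nu0 (total failures) as tau grows.
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))

def log_poisson_mu(tau, lambda0, theta):
    # Logarithmic Poisson model: grows without bound, but logarithmically.
    return (1.0 / theta) * math.log(lambda0 * theta * tau + 1.0)

print(round(basic_mu(100, 10, 100)))        # 100 (Example 10.5)
print(round(log_poisson_mu(10, 10, 0.02)))  # 55  (Example 10.6)
print(round(log_poisson_mu(100, 10, 0.02))) # 152 (Example 10.6)
```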
[Figure: failure intensity λ versus execution time τ for the basic and logarithmic Poisson models]
Example 10.7
Calculate the failure intensities at 10 and 100 CPU hr, using the parameters
of Example 10.1.
Solution
For the basic model, λ(τ) = λ0 exp(-λ0 τ/ν0). At 10 CPU hr:
λ(τ) = 10 exp[-(10/100)(10)] = 10 e^-1 = 3.68 failures/CPU hr.
At 100 CPU hr:
λ(τ) = 10 exp[-(10/100)(100)] = 10 e^-10 = 0.000454 failure/CPU hr.
***
Example 10.8
Solution
For the logarithmic Poisson model, λ(τ) = λ0/(λ0 θ τ + 1). At 10 CPU hr:
λ(τ) = 10/[10(0.02)(10) + 1]
     = 3.33 failures/CPU hr.
This is slightly lower than the corresponding failure intensity for the basic
model. At 100 CPU hr we have:
λ(τ) = 10/[10(0.02)(100) + 1]
     = 0.476 failure/CPU hr.
The failure intensity at the higher execution time is larger for the logarithmic
Poisson model.
***
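The failure intensity expressions as functions of execution time, used in Examples 10.7 and 10.8, can be sketched as follows (names are ours):

```python
import math

# Failure intensity as a function of execution time tau.

def basic_intensity_tau(tau, lambda0, nu0):
    # Basic model: exponential decay in execution time.
    return lambda0 * math.exp(-lambda0 * tau / nu0)

def log_poisson_intensity_tau(tau, lambda0, theta):
    # Logarithmic Poisson model: inverse-linear decay in execution time.
    return lambda0 / (lambda0 * theta * tau + 1.0)

print(round(basic_intensity_tau(10, 10, 100), 2))          # 3.68
print(round(basic_intensity_tau(100, 10, 100), 6))         # 0.000454
print(round(log_poisson_intensity_tau(10, 10, 0.02), 2))   # 3.33
print(round(log_poisson_intensity_tau(100, 10, 0.02), 3))  # 0.476
```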
10.61 Derived Quantities
Assume that you have chosen a failure intensity objective for the software
product being developed. Suppose some portion of the failures are being
removed through correction of their associated faults. Then one can use the
objective and the present value of failure intensity to determine the additional
expected number of failures that must be experienced to reach that
objective. The process is illustrated graphically in Fig.10.9. Equations
describing the relationship in closed form may be derived for both models so
that manual calculations can be performed. They are

Δµ = (ν0/λ0)(λp - λf)     (10.9)

for the basic model and

Δµ = (1/θ) ln(λp/λf)     (10.10)

for the logarithmic Poisson model. The quantity Δµ is the expected number
of failures to reach the failure intensity objective, λp is the present failure
intensity, and λf is the failure intensity objective.
[Fig.10.9 Additional expected failures to reach the failure intensity objective: failure intensity falls from the initial value λ0 through the present value to the objective]
Example 10.9
For the basic model, we determine the expected number of failures that will
be experienced between a present failure intensity of 3.68 failures/CPU hr
and an objective of 0.000454 failure/CPU hr. We will use the same
parameter values as in Example 10.1.
Solution
Δµ = (ν0/λ0)(λp - λf)
   = (100/10)(3.68 - 0.000454)
   ≈ 10(3.68) = 37 failures
***
Example 10.10
We will find, for the logarithmic Poisson model, the expected number of
failures experienced between a present failure intensity of 3.33 failures/CPU
hr and an objective of 0.476 failure/CPU hr. The parameter values will be the
same as in Example 10.2.
Solution
Δµ = (1/θ) ln(λp/λf)
   = (1/0.02) ln(3.33/0.476)
   = 50 ln 6.996 = 97 failures.
***
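Equations (10.9) and (10.10) can be evaluated directly; a sketch reproducing Examples 10.9 and 10.10 (names are ours; lambda_p is the present failure intensity, lambda_f the objective):

```python
import math

# Additional expected failures to reach a failure intensity objective.

def delta_mu_basic(lambda_p, lambda_f, lambda0, nu0):
    # Equation (10.9), basic model.
    return (nu0 / lambda0) * (lambda_p - lambda_f)

def delta_mu_log_poisson(lambda_p, lambda_f, theta):
    # Equation (10.10), logarithmic Poisson model.
    return (1.0 / theta) * math.log(lambda_p / lambda_f)

print(round(delta_mu_basic(3.68, 0.000454, 10, 100)))  # 37 (Example 10.9)
print(round(delta_mu_log_poisson(3.33, 0.476, 0.02)))  # 97 (Example 10.10)
```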
Similarly, you can determine the additional execution time Δτ required to
reach the failure intensity objective for either model. This is

Δτ = (ν0/λ0) ln(λp/λf)

for the basic model and

Δτ = (1/θ)(1/λf - 1/λp)

for the logarithmic Poisson model.
Fig.10.10 Additional execution time to failure intensity objective
Example 10.11
For the basic model, with the same parameter values used in Example 10.1
we will determine the execution time between a present failure intensity of
3.68 failures/CPU hr and an objective of 0.000454 failure/CPU hr.
Solution
Δτ = (ν0/λ0) ln(λp/λf) = (100/10) ln(3.68/0.000454)
   = 10 ln 8106 = 90 CPU hr.
***
Example 10.12
For the logarithmic Poisson model, with the same parameter values used in
Example 10.2, we will find the execution time between a present failure
intensity of 3.33 failures/CPU hr and an objective of 0.476 failure/CPU hr.
Solution
Δτ = (1/θ)(1/λf - 1/λp) = (1/0.02)(1/0.476 - 1/3.33)
   = 90 CPU hr.
***
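The additional-execution-time relations can be sketched likewise, reproducing Examples 10.11 and 10.12 (names are ours):

```python
import math

# Additional execution time to reach a failure intensity objective.

def delta_tau_basic(lambda_p, lambda_f, lambda0, nu0):
    # Basic model.
    return (nu0 / lambda0) * math.log(lambda_p / lambda_f)

def delta_tau_log_poisson(lambda_p, lambda_f, theta):
    # Logarithmic Poisson model.
    return (1.0 / theta) * (1.0 / lambda_f - 1.0 / lambda_p)

print(round(delta_tau_basic(3.68, 0.000454, 10, 100)))  # 90 CPU hr (Ex. 10.11)
print(round(delta_tau_log_poisson(3.33, 0.476, 0.02)))  # 90 CPU hr (Ex. 10.12)
```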
The foregoing quantities are of interest in themselves. The additional
expected number of failures required to reach the failure intensity objective
gives some idea of the failure correction workload. The additional execution
time indicates the remaining amount of test required. However, even more
importantly, they are both used in making estimates of the additional
calendar time required to reach the failure intensity objective.
The calendar time component relates execution time and calendar time by
determining the calendar time to execution time ratio at any given point in
time. The ratio is based on the constraints that are involved in applying
resources to a project. To obtain calendar time, one integrates this ratio with
respect to execution time. The calendar time component is of greatest
significance during phases where the software is being tested and repaired.
During this period one can predict the dates at which various failure intensity
objectives will be met. The calendar time component exists during periods in
which repair is not occurring and failure intensity is constant. However, it
reduces in that case to a constant ratio between calendar time and
execution time.
Table 10.4 will help in visualizing these different aspects of the resources
and the parameters that result.
TABLE 10.4
Calendar time component resources and parameters

                                  Usage per    Usage per    Quantity
Resource                          CPU hr       failure      available
Failure identification
personnel (I)                     θI           µI           PI
Failure correction
personnel (F)                     0            µF           PF
Computer time (C)                 θC           µC           PC
Note that θr is the resource usage per CPU hr. It is nonzero for
failure identification personnel (θI) and computer time (θC). The quantity
µr is the resource usage per failure. Be careful not to confuse it with
mean failures experienced µ. It was deliberately chosen to be similar
to suggest the connection between resource usage and failures
experienced. It is nonzero for failure identification personnel (µI), failure
correction personnel (µF), and computer time (µC).
Example 10.13
Suppose the test team runs test cases for 8 CPU hr and identifies 20
failures. The effort required per hr of execution time is 6 person hr. Each
failure requires 2 hr on the average to verify and determine its nature.
Calculate the total failure identification effort required.
Solution
χI = θI τ + µI m = 6(8) + 2(20) = 88 person hr.
***
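Example 10.13 is an instance of the linear resource-usage relation referenced below as Equation (10.13); a minimal sketch, assuming the usual form chi = theta*tau + mu*m (names are ours):

```python
# Resource usage: chi = theta * tau + mu * m, where theta is usage per CPU hr,
# tau execution time, mu usage per failure, and m the number of failures.

def resource_usage(theta, tau, mu, failures):
    return theta * tau + mu * failures

# Example 10.13: 6 person hr per CPU hr over 8 CPU hr, 2 hr per failure
# over 20 failures.
print(resource_usage(6, 8, 2, 20))   # 88 person hr
```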
Computer time required per unit execution time will normally be greater than
1. In addition to the execution time for the program under test, additional
time will be required for the execution of such support programs as test
drivers, recording routines, and data reduction packages.
Consider the change in resource usage per unit of execution time. It can
be obtained by differentiating Equation (10.13) with respect to execution
time. We obtain

dχr/dτ = θr + µr λ     (10.14)
Since the failure intensity decreases with testing, the effort used per hour
of execution time tends to decrease with testing. It approaches the
execution time coefficient of resource usage asymptotically as execution
time increases.
The form of the instantaneous calendar time to execution time ratio for any
given limiting resource and either model is shown in Fig.10.11. It is readily
obtained from Equations (10.14) and (10.15) as

dt/dτ = (θr + µr λ)/(Pr ρr)     (10.16)

where Pr is the quantity of the resource available and ρr its utilization.
The shape of this curve will parallel that of the failure intensity. The curve
approaches an asymptote of θr/(Pr ρr). Note that the asymptote is 0 for the
failure correction personnel resource. At any given time, the maximum of the
ratios for the three limiting resources actually determines the rate at which
calendar time is expended; this is illustrated in Fig.10.12. The maximum
is plotted as a solid curve. When the curve for a resource is not
maximum (not limiting), it is plotted thin. Note the transition points FI and
IC. Here, the calendar time to execution time ratios of two resources are
equal and the limiting resource changes. The point FC is a potential but not
a true transition point. Neither resource F nor resource C is limiting near this
point.
[Fig.10.12 Calendar time to execution time ratio versus execution time τ for the three limiting resources]
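The limiting-resource selection described above can be sketched as follows. The parameter values are illustrative assumptions of ours, not the book's:

```python
# Instantaneous calendar time per unit execution time for each resource,
# in the spirit of Equation (10.16): (theta + mu * lam) / (P * rho).
# theta: usage per CPU hr, mu: usage per failure, P: quantity available,
# rho: utilization. All numbers below are illustrative assumptions.

def ratio(theta, mu, P, rho, lam):
    return (theta + mu * lam) / (P * rho)

resources = {
    'identification': dict(theta=6.0, mu=2.0, P=4, rho=1.0),
    'correction':     dict(theta=0.0, mu=6.0, P=4, rho=0.5),  # theta = 0
    'computer':       dict(theta=1.5, mu=1.0, P=1, rho=0.8),
}

lam = 3.0  # current failure intensity (failures/CPU hr), illustrative
ratios = {name: ratio(lam=lam, **r) for name, r in resources.items()}
limiting = max(ratios, key=ratios.get)  # resource that paces calendar time
print(ratios, limiting)
```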
The calendar time component allows you to estimate the calendar time in
days required to meet the failure intensity objective. The value of this
interval is particularly useful to software managers and engineers. One
may determine it from the additional execution time and additional
number of failures needed to meet the objective, found from the
execution time component. One may also determine the date on
which the failure intensity objective will be achieved. This is a simple
variant of the first quantity that takes account of things like weekends and
holidays. However, it is a useful quantity because it speaks in terms
managers and engineers understand.
11
RELIABILITY ANALYSIS OF SPECIAL SYSTEMS
11.11 Reliability Analysis
From the definition of a spanning tree, any Ti will link all n nodes of G with
(n-1) branches and hence represents the minimum interconnections
required for providing communication between all computer centres,
which are represented by nodes. Thus, the problem of studying the
network reliability between any of the centres in the CCN is a problem of
enumerating the spanning trees:

C = C1 x C2 x ... x Cn-1 = Π(i=1 to n-1) Ci     (11.1)
Example 11.1
Enumerate the spanning trees for the bridge network shown in Fig.11.1.

[Fig.11.1 A bridge network]

Solution
Using (11.1),
***
In step #2, a Boolean algebraic expression has a one-to-one correspondence
with the probability expression if the Boolean terms are modified until they
represent a disjoint grouping. We present below an algorithm for finding the
probability expression, and hence the network reliability of the CCN, starting
from a set of Ti's.

F0 = T0
Fi = (T0 U T1 U ... U Ti-1) | each literal of Ti -> 1,  for 1 ≤ i ≤ (N-1)

S(disjoint) = T0 U  U(i=1 to N-1) Ti ξ(Fi)     (11.4)

Since all terms in (11.4) are mutually exclusive, the network reliability
expression Rs is obtained from (11.4) by changing Xi to pi and X'i to qi,
viz.,

Rs = S(disjoint) | Xi (X'i) -> pi (qi)     (11.5)
Example 11.2
Derive the network reliability expression for a simple bridge network as given
in figure 11.1.
Solution
The Fi's and ξ(Fi)'s for i = 1, ..., 7 are obtained as shown in Table 11.1.
TABLE 11.1
Fi and ξ(Fi) for i = 1, ..., 7
For the CCN having equal probabilities of survival p for each communication
link, (11.6) simplifies to
***
In deriving (11.6) we have assumed perfect nodes. As computer outages
account for as much as 90% of failures in most CCNs, we have to consider
the reliability of nodes as less than 1 in such situations. In such a case,
(11.6) is to be multiplied by a factor (pn1 pn2 pn3 pn4), where pni
represents the reliability of node ni.
The structure of a system may not remain constant throughout its mission;
it may vary with time due to reconfiguration of the system or changes in the
requirements placed on the system. Such systems are called phased
mission systems. These systems perform several different tasks during
their operational life.
The components can, but need not, be repairable, with specified repair
times. Often a system undergoing a phased mission will contain both
repairable and non-repairable components. In a mission such as that of an
intercontinental ballistic missile, all of the components are non-repairable.
During a manned space flight, however, an astronaut might be able to
replace or at least repair a malfunctioning item.
(2) Basic Event Transformation: In the configuration for phase j, basic event
Ck is replaced by a series logic in which the basic events Ck1, ...,
Ckj perform s-independently with probability of failure frtc(k,j).
(4) Minimal cut-sets are obtained for this new logic model.
The method is illustrated with the help of an example. Let us consider the
block diagram for a simple three-phased mission as shown in Fig.11.2.

[Fig.11.2 Block diagram of a simple three-phased mission]

Cut sets for this example system are given as
Phase 1   BCD
Phase 2   A, BC, BD, CD
Phase 3   A, BCD
The solution is obtained in the following steps:
Phase 1
Phase 2   BC, BD, CD
Phase 3   A, BCD
(2) Basic Event Transformation: By applying this step, the block diagram
shown in Fig.11.3 is obtained.
(4) The above minimal cut sets are used to obtain total system unreliability.
Example 11.3
d(1) 40 hours
d(2) 60 hours
d(3) 100 hours
Phase 1   BCD
Phase 2   A, BC, BD, CD
Phase 3   A, BCD

frtc(i,j):
              Phase 1   Phase 2   Phase 3
Component 1    .001      .001      .003
Component 2    .001      .005      .002
Component 3    .002      .010      .010
Component 4    .010      .030      .020
Solution
Phase 1 0 0 0 0
Phase 2 0 1 1 0, 0 1 0 1 and 0 0 1 1
Phase 3 1 0 0 0 and 0 1 1 1
A1 A2 A3  B1 B2 B3  C1 C2 C3  D1 D2 D3
Phase 1
Phase 2 0 0 0 1 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 1 0 0 1 0
Phase 3 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 1 0 0 1 0 0
0 0 0 0 1 0 0 1 0 0 1 0
0 0 0 0 1 1 0 1 1 0 1 1
1 0 0 0 0 0 0 0 0 0 0 0
-1 1 0 0 0 0 0 0 0 0 0 0
-1 -1 1 0 0 0 0 0 0 0 0 0
-1 -1 -1 0 1 0 0 0 0 0 1 0
-1 -1 -1 0 -1 0 1 0 0 1 0 0
-1 -1 -1 0 1 0 1 0 0 1 -1 0
-1 -1 -1 0 -1 0 -1 1 0 0 1 0
-1 -1 -1 0 -1 0 1 1 0 -1 1 0
-1 -1 -1 1 -1 0 1 -1 0 -1 0 0
-1 -1 -1 1 -1 0 1 1 0 -1 -1 0
-1 -1 -1 1 1 0 1 0 0 -1 -1 0
-1 -1 -1 0 1 0 -1 1 0 0 -1 0
-1 -1 -1 -1 1 0 1 1 0 -1 -1 0
-1 -1 -1 1 -1 0 -1 -1 0 1 0 0
-1 -1 -1 1 -1 0 -1 1 0 1 -1 0
-1 -1 -1 1 1 0 -1 -1 0 1 -1 0
-1 -1 -1 -1 -1 1 -1 -1 1 0 0 1
-1 -1 -1 1 -1 1 -1 -1 1 -1 0 1
-1 -1 -1 -1 -1 1 -1 1 1 0 -1 1
-1 -1 -1 1 -1 1 -1 1 1 -1 -1 1
-1 -1 -1 -1 -1 1 1 -1 1 -1 0 1
-1 -1 -1 -1 -1 1 1 1 1 -1 -1 1
-1 -1 -1 -1 1 1 -1 -1 1 0 -1 1
-1 -1 -1 1 1 1 -1 -1 1 -1 -1 1
-1 -1 -1 -1 1 1 1 -1 1 -1 -1 1
1. e^-(.001)(40)  = .96    .04
2. e^-(.001)(60)  = .94    .06
3. e^-(.003)(100) = .74    .26
4. e^-(.001)(40)  = .96    .04
5. e^-(.005)(60)  = .74    .26
6. e^-(.002)(100) = .81    .19
7. e^-(.002)(40)  = .92    .08
8. e^-(.01)(60)   = .54    .46
9. e^-(.01)(100)  = .36    .64
-1 -1 -1 0 -1 0 1 0 0 1 0 0
Unreliability = p1 p2 p3 p5 q7 q10
= (e^-frtc(1,1)d(1)) (e^-frtc(1,2)d(2)) (e^-frtc(1,3)d(3)) (e^-frtc(2,2)d(2)) (1 - e^-frtc(3,1)d(1)) (1 - e^-frtc(4,1)d(1))
= [e^-(.001)(40)] [e^-(.001)(60)] [e^-(.003)(100)] [e^-(.005)(60)] [1 - e^-(.002)(40)] [1 - e^-(.01)(40)]
= (.96)(.94)(.74)(.74)(.077)(.33) = 0.013
= q1 + p1 q2 + p1 p2 q3 + p1 p2 p3 q5 q11 + p1 p2 p3 p5 q7 q10 + ... + p1 p2 p3 p4 q5 p6 q7 p8 q9 p10 p11 q12
= .04 + .0576 + .235 + .144 + .013 + ... + 9.9 x 10^-5
= .72
***
11.3 COMMON CAUSE FAILURES
A common cause failure can have more complex direct consequences than
the simple failure of a number of components. In particular, the failure of
one component might protect another from the common event's effects.
Thus, Common Cause Analysis cannot proceed in a general manner by
substituting specific component failures for the common-cause event.
11.31 Reliability Analysis
The method below is very general and is applicable for calculating the
reliability of a system composed of non-identical components and depicted
by a non-series-parallel reliability block diagram in the presence of common
cause failures. However, the calculation for the reliability of a system
with identical components in the presence of common-cause failures is
discussed first.
(2) Find the probability that a specified group of m components out of the n
components system are all good.
(3) Construct an expression for reliability using results from above two steps
and the reliability expression of the system under s-independent
assumption.
(2) 2-component processes that include the specified component. There are
a total of nC2 i.i.d. Z2 failure processes, but only n-1C1 of these
processes include the specified component.
(3) In general, there are nCr i.i.d. Zr failure processes with parameter λr
governing the simultaneous failure of r components. Out of these nCr
failure processes, n-1Cr-1 include the specified component.
= Π(k=n-m+1 to n) Pk^(1)(t)     (11.11)
Example 11.4
Solution
***
Example 11.5
For the system given in the Fig.11.4 below, calculate the system reliability.
Solution
Rnc(t) = [1 - (1 - P(t))^2]^2
       = 4 P^2(t) - 4 P^3(t) + P^4(t)
Example 11.6
Solution
Rnc(t) = P^3(t)
Now P(t) = exp{-(λ1 + 2λ2 + λ3)t}
P(10) = 0.955997
Hence,
Rnc(10) = 0.87372, or, Qnc = 0.12628
Now,
Rcc(t) = P3^(1)(t)
       = P1^(1)(t) P2^(1)(t) P3^(1)(t)
       = exp{-(3λ1 + 3λ2 + λ3)t}
Thus,
Rcc(10) = 0.90937, or, Qcc = 0.09063
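The arithmetic of Example 11.6 can be reproduced as below. The individual rates are assumptions of ours, chosen to match the quoted numbers, since the example's parameter list is not shown in this excerpt:

```python
import math

# Assumed rates (illustrative): l1 = individual failure rate, l2 = rate of
# each 2-component common-cause process, l3 = rate of the 3-component
# common-cause process.
l1, l2, l3, t = 0.002, 0.001, 0.0005, 10.0

# Each component is exposed to its own failures, two pair processes,
# and the triple process:
P = math.exp(-(l1 + 2 * l2 + l3) * t)          # single-component reliability
R_nc = P ** 3                                   # treating components as independent
R_cc = math.exp(-(3 * l1 + 3 * l2 + l3) * t)    # joint survival, common cause counted once

print(round(P, 6), round(R_nc, 5), round(R_cc, 5))
```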
The previous method is now extended for calculating the source-terminal
reliability of a non-series-parallel network subjected to common-cause
failures. Each failure process is represented by failure combinations and
an associated failure rate.
Algorithm
(1) If there are any parallel branches in the network, combine them into
one; i.e., every set of parallel branches is to be replaced by a single
branch.
(2) Write the matrix graph for the network. If b is the number of edges
in the network, then the matrix graph is a b x 4 matrix. There is a
one-to-one correspondence between each edge and each row. The first
column gives the branch number, the second column gives the starting
node of the branch, the third column gives the converging node of the
branch and the fourth column gives the direction code.
Example 11.7
For the network shown in Fig.11.5, calculate s-t reliability at time 10, 20,
..., 100 hours. The source node number is given as 1 and the sink node
number is given as 4. Components can fail individually as well as under
common cause. Components fail individually with failure rates .001, .002,
.003, .004, and .005/hour respectively. Three common-cause events can occur:
1 | 0  1  3  0 |
2 | 1  0  5  2 |
3 | 3  5  0  4 |
4 | 0  2  4  0 |

node x node
      1  2  3  4
1  |  0  2  3  0 |
2  |  1  0  3  4 |
3  |  1  2  0  4 |
4  |  0  2  3  0 |
which shows that node number 1 is directly connected to node
number 2 & 3, Node number 2 is directly connected to node
number 1,3 & 4 etc.
(iii) Obtain minimal paths from the above matrix. The process consists of
two steps: (a) and (b). In step (a) minimal paths in node form are
obtained and in (b) minimal paths in edge form are obtained.
(a) Start path tracing from node number 1, i.e., the source node. Node
number 1 is directly connected to node number 2. Go to the
row corresponding to node number 2, which is directly connected to
1, 3 and 4. As node number 1 has already been taken in the path
tracing, we take the path from node 2 to 3. Now go to the row
corresponding to node number 3, which is directly connected to 1, 2
and 4. Node numbers 1 and 2 have already been taken, so the path
from node 3 to 4 is chosen. As the sink node number is reached, stop
the process. So the first minimal path obtained is 1234.
1 2 3 4
1 2 4 0
1 3 2 4
1 3 4 0
Step(4) Expand the terms which have complemented variables. For each
complemented variable in a term, two terms in uncomplemented
variables are obtained, e.g.,
t1 = 12
t2 = 34
t3 = -134
t4 = 134
t5 = -1234
t6 = 235
t7 = -2345
t8 = -1235
t9 = 12345
t10 = 145
t11 = -1345
t12 = -1245
t13 = 12345
Step (5)
Similarly, failure rates of all terms are calculated. At any time, say 10 hours,
the reliability for term t1 = exp[-(6.1 x 10^-3)(10)] = 0.9408232. Reliability
of all other terms can be calculated in a similar manner.
Step (6)
R(10) = 0.94 + 0.91 - 0.89 + 0.89 - 0.88 + 0.88 - 0.84 - 0.87 + 0.83
+ 0.80 - 0.85 - 0.86 + 0.83 = 0.97
***
11.4 RELIABILITY AND CAPACITY INTEGRATION
These two performance measures are thus used independently, while neither
is a true measure of the performance of the telecommunication network.

T = S U F     (11.12)

S: subset corresponding to those system states where at least one
path is available from s to t.

wi = Ci / Cmax     (11.15)

Then the weighted reliability measure, viz., performance index, is defined as:

PI = Σ(Si ∈ S) wi Psi     (11.16)
Example 11.8
A network with 5 branches is given in Fig.11.6, where the capacity of each
link is also shown. Compute the performance index.
Solution
The 16 success states are listed in Table 11.2 considering path
availability only. The capacity of the subnetwork for each success state is
also given in the table; Cmax = 7. The performance index PI is now
determined as
TABLE 11.2
System success states

Element states     Capacity   Probability
A B C D E          (Ci)       term (Psi)
0 0 1 1 0          4          qA qB pC pD qE
0 0 1 1 1          4          qA qB pC pD pE
0 1 1 0 1          3          qA pB pC qD pE
0 1 1 1 0          4          qA pB pC pD qE
0 1 1 1 1          4          qA pB pC pD pE
1 1 0 0 0          3          pA pB qC qD qE
1 1 0 0 1          3          pA pB qC qD pE
1 1 0 1 1          7          pA pB qC pD pE
1 1 0 1 0          3          pA pB qC pD qE
1 1 1 1 0          7          pA pB pC pD qE
1 1 1 1 1          7          pA pB pC pD pE
1 1 1 0 1          3          pA pB pC qD pE
1 1 1 0 0          3          pA pB pC qD qE
1 0 1 1 1          4          pA qB pC pD pE
1 0 1 1 0          4          pA qB pC pD qE
1 0 0 1 1          4          pA qB qC pD pE

PI = (2p^4 q + p^5) + (4/7)(p^2 q^3 + 4p^3 q^2 + 2p^4 q)
   + (3/7)(4p^3 q^2 + p^2 q^3 + p^4 q)     (11.18)
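Expression (11.18) can be evaluated for any equal link reliability p; a sketch (the three groups carry weights 1, 4/7 and 3/7 for the capacity-7, capacity-4 and capacity-3 states):

```python
# Performance index of Equation (11.18) for equal link reliability p.

def performance_index(p):
    q = 1.0 - p
    cap7 = 2 * p**4 * q + p**5                     # weight 1 (full capacity)
    cap4 = p**2 * q**3 + 4 * p**3 * q**2 + 2 * p**4 * q   # weight 4/7
    cap3 = 4 * p**3 * q**2 + p**2 * q**3 + p**4 * q       # weight 3/7
    return cap7 + (4.0 / 7.0) * cap4 + (3.0 / 7.0) * cap3

# With perfect links the only reachable state has full capacity, so PI = 1.
print(performance_index(1.0))
print(performance_index(0.9))
```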
12.1 INTRODUCTION
Reliability costs can be divided into five categories as shown in Fig.12.1.
Economics of Reliability Engineering 273
Classification I
This classification includes all those costs associated with internal failures;
in other words, the costs associated with materials, components,
products and other items which do not satisfy quality requirements.
Furthermore, these are costs which occur before the delivery of the
product to the buyer. These costs are associated with things such as the
following:
1. Scrap
2. Failure analysis studies
3. Testing
4. In-house components and materials failures
5. Corrective measures
Classification II
1. Evaluating suppliers
2. Calibrating and certifying inspection and test devices and
instruments.
3. Receiving inspection
4. Reviewing designs
5. Training personnel
6. Collecting quality-related data
7. Coordinating plans and programs
8. Implementing and maintaining sampling plans
9. Preparing reliability demonstration plans
Classification III
Classification IV
Classification V
This category includes costs associated with detection and appraisal. The
principal components of such costs are as follows:
1. Cost of testing
2. Cost of inspection (i.e.,in-process, source, receiving, shipping
and so on)
3. Cost of auditing
[Fig.12.2 Cost curves of a product: total cost, failure cost, manufacturing cost, and operating cost versus reliability]
The subsequent sections describe some reliability cost models which show
how the equipment life-cost is affected by reliability achievement, utility,
depreciation and availability.
The reliability and cost relationship for any equipment can be described
mathematically by suitably choosing a cost-reliability relationship function.
A suitable cost-reliability function C(r1,r2) must satisfy the following
properties:
1. Misra et al Function:
3. Aggarwal et al Function:
C(r) = k [tan(π r/2)]^h(r)     (12.6)

where k is a constant and h(r) is given by:     (12.7)

h(r) = 1 + r^a;  0 ≤ a ≤ 1, or
h(r) = m;  1 ≤ m ≤ 2

and r is equipment reliability.
= 0;  r1 ≥ r2     (12.10)

where a and b are constants; and r1 and r2 are reliability values of the
equipment.
[Fig.12.3 Product reliability and cost: C = a e^(2.5b) at r = 0.6 and C = a e^(6.67b) at r = 0.85]
Let us assume that the cost of equipment is known at some reliability, say
r0. Therefore,

a = C1 e^(-b/(1-r0))     (12.13)

Thus, if the equipment cost is known at some value of reliability and the
manufacturer intends to improve the reliability of the equipment, the
corresponding cost to be incurred can be evaluated by obtaining the
constants a and b with the help of the above equations and then by using
these values in the equation:

Cs = Cm = a e^(b/(1-r)) [ln(1-R)]/[ln(1-r)]     (12.16)

dCs/dr = 0     (12.19)
Example 12.1
Solution
***
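The cost minimization implied by Equation (12.19) can be sketched numerically. The constants a and b and the system reliability goal R below are illustrative assumptions; the chapter derives a and b from known cost-reliability pairs:

```python
import math

# Minimize the system cost of Equation (12.16) over component reliability r.
# n = ln(1-R)/ln(1-r) units are needed, each costing a * e^(b/(1-r)).

def system_cost(r, a=1.0, b=0.05, R=0.99):
    return a * math.exp(b / (1 - r)) * math.log(1 - R) / math.log(1 - r)

# Coarse grid search in place of solving dCs/dr = 0 analytically.
rs = [i / 1000 for i in range(100, 990)]
r_opt = min(rs, key=system_cost)
print(r_opt)
```

For these assumed constants the optimum sits well inside (0, 1): pushing r toward 1 makes each unit exponentially expensive, while a low r forces many redundant units.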
12.5 RELIABILITY UTILITY COST MODELS
Suppose a customer invests money in a product. The costs and benefits
accruing from the investment will continue for a number of years. Similar
products may have different costs and returns depending upon the
manufacturer. A cost utility analysis is required for making comparisons of
product values. The customer's investment includes the following categories
of costs:
When the product is put to use, the customer has to spend money every
year on items (ii), (iii) & (iv). If the product is used for, say, n years, then the
present value of the money that the user has to spend for all the years can
be calculated as follows:
V1 = Ci + Σ(j=1 to n) (Coj + Ctj + Cmj) [1/(1+i)]^j     (12.21)

where i is the annual interest rate (expressed as a fraction) and Coj, Ctj and
Cmj are the respective costs incurred in the jth year and assumed to be paid
at the end of that year.
If, at the end of the nth year, the scrap value of the product is Vs, then
the present value of the n-year-old product is

Cp = V1 - V2
   = Ci + Σ(j=1 to n) [1/(1+i)]^j (Coj + Ctj + Cmj) - Vs [1/(1+i)]^n
   = Ci + Σ(j=1 to n) [1/(1+i)]^j Cyj - Vs [1/(1+i)]^n     (12.23)

where Cyj is the yearly cost. The product having the lowest Cp should be the
choice of the customer. However, while making decisions he has to keep in
mind other factors such as availability of spares, possible increase in costs in
future, etc.
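Equation (12.23) is straightforward to compute; a sketch using the Product A figures of Table 12.1 (the interest rate i is not fixed in this excerpt, so it is left as a parameter):

```python
# Present value of owning a product for n years, per Equation (12.23):
# initial cost Ci, yearly costs Cyj discounted at rate i, minus the
# discounted scrap value Vs.

def present_cost(Ci, yearly_costs, Vs, i):
    n = len(yearly_costs)
    pv = Ci + sum(Cy / (1 + i) ** j for j, Cy in enumerate(yearly_costs, 1))
    return pv - Vs / (1 + i) ** n

# Product A of Table 12.1; at i = 0 this reduces to Ci + sum(Cy) - Vs.
print(present_cost(20000, [1000, 1600, 2200], 15000, 0.0))   # 9800.0
```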
Example 12.2
TABLE 12.1
Cost         Ci       Cy1    Cy2    Cy3    Vs
Product A    20,000   1000   1600   2200   15,000
Product B    15,000   1500   1800   2000   10,000
Solution
It is clear from the above calculations that in spite of a higher initial cost,
product A is more economical. This is due to low failure and maintenance
costs as a result of its higher reliability.
***
12.51 Depreciation-Cost Models
(12.24)
(12.25)
(12.27)
Therefore,
d = 1 - [Vs/Ci]^(1/n)     (12.28)
Example 12.3
Solution
[Fig.12.4 Depreciation models: resale value versus years of service]
TABLE 12.2
Year (j)   Initial Cost   Depreciation   Cost at the end of the year
1          1000           129            871
2          871            112.4          758.6
3          758.6          97.9           660.7
4          660.7          85.2           575.5
5          575.5          74.2           501.3
***
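Equation (12.28) and the year-by-year book values of Table 12.2 can be reproduced as below, taking Ci = 1000 and the table's final value 501.3 as the salvage value (an assumption on our part):

```python
# Declining-balance depreciation: the rate d of Equation (12.28) takes the
# initial cost Ci down to salvage value Vs in n years.

def depreciation_rate(Ci, Vs, n):
    return 1.0 - (Vs / Ci) ** (1.0 / n)

def book_values(Ci, d, n):
    rows, value = [], Ci
    for year in range(1, n + 1):
        dep = d * value          # depreciation is a fixed fraction of value
        value -= dep
        rows.append((year, round(dep, 1), round(value, 1)))
    return rows

d = depreciation_rate(1000, 501.3, 5)
print(round(d, 3))              # 0.129, as in Table 12.2
print(book_values(1000, d, 5))
```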
12.6 AVAILABILITY COST MODELS FOR PARALLEL SYSTEMS
Then the total system cost due to operation, maintenance and failures per
unit time will be
Cs = ----------------------- (12.30)
U + D
(12.31)
Cs = C1m As + C2Bs
= C1m + (C2-C1m)Bm (12.33)
It is clear that as m increases the first term increases and the second term
decreases and therefore there exists a value of m for which Cs is minimum.
This can be found by solving the equation
dCs/dm = 0                                     (12.34)
Example 12.4
Solution
On solving this equation, we find the value of m lies between 2 and 3. Now
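A minimal numerical sketch of minimizing Eq. (12.33) by direct search is given below; the cost rates C1, C2 and the unit unavailability B are hypothetical illustrative values, since the example's actual data are not reproduced here.

```python
# Minimize Cs(m) = C1*m + (C2 - C1*m) * B**m (Eq. 12.33) over the number of
# parallel units m. C1, C2 and B are hypothetical illustrative values:
# C1 = cost rate per unit, C2 = downtime cost rate, B = unit unavailability.

def system_cost(m, C1, C2, B):
    return C1 * m + (C2 - C1 * m) * B ** m

C1, C2, B = 10.0, 1000.0, 0.1            # assumed values
costs = {m: system_cost(m, C1, C2, B) for m in range(1, 7)}
m_opt = min(costs, key=costs.get)
print(m_opt, round(costs[m_opt], 2))     # optimum number of parallel units
```

Because m must be an integer, one evaluates Cs at the integers bracketing the continuous solution of Eq. (12.34) and picks the cheaper one.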
1. There will always be a configuration which has the lowest cost
amongst all possible configurations.
2. The same reliability level may be achieved at different costs.
3. A system may have different reliability levels for the same cost in two
or more configurations.
4. The reliability level can be higher for a combination of components
which results in lower system cost, i.e. system reliability need not be a
monotonically increasing function of cost.
5. There will also exist a configuration having the highest reliability
level amongst all the possible component groups.
TABLE 12.3
(each entry: option, reliability, cost)

Component 1        Component 2        Component 3
A1  0.90   10      B1  0.80    5      C1  0.95   40
A2  0.95   30      B2  0.90   20      C2  0.98  100
A3  0.98  100      B3  0.95   50
For instance, suppose an engineer has to design a system which has three
components connected in series. The number of options with their cost and
reliability corresponding to each component are given in Table 12.3.
TABLE 12.4
(values recomputed from Table 12.3; * marks the six optimum configurations)

Config   R      C      Config   R      C      Config   R      C
A1B1C1   0.684   55*   A1B2C1   0.770   70*   A1B3C1   0.812  100
A2B1C1   0.722   75    A2B2C1   0.812   90*   A2B3C1   0.857  120*
A3B1C1   0.745  145    A3B2C1   0.838  160    A3B3C1   0.884  190
A1B1C2   0.706  115    A1B2C2   0.794  130    A1B3C2   0.838  160
A2B1C2   0.745  135    A2B2C2   0.838  150    A2B3C2   0.884  180*
A3B1C2   0.768  205    A3B2C2   0.864  220    A3B3C2   0.912  250*
The component groups categorized by the various degrees of reliability
yield 18 combinations shown in Table 12.4.
The six expected desirable configurations can now be analysed from Table
12.4. These configurations shown in this table are also exhibited
graphically in Fig.12.5.
Now the problem arises of how to generate only these six optimum
configurations mathematically, so that the system designer may get the
maximum benefit from his resources without wasting much time and without
the fear of choosing a configuration which has less reliability than is
possible for the given cost.
The situation may also arise in which the minimum reliability requirement
and the maximum cost permitted are predecided. In such a case one has to
consider only those optimum configurations which satisfy both of the
required conditions.
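The screening described above can be sketched as a brute-force enumeration over the options of Table 12.3, keeping only the non-dominated (optimum) configurations, i.e. those for which no other configuration is both cheaper and at least as reliable:

```python
# Enumerate all series-system configurations from Table 12.3 and keep only
# the optimum (non-dominated) ones: no other configuration is both cheaper
# and at least as reliable.
from itertools import product

options = {
    "A": [("A1", 0.90, 10), ("A2", 0.95, 30), ("A3", 0.98, 100)],
    "B": [("B1", 0.80, 5),  ("B2", 0.90, 20), ("B3", 0.95, 50)],
    "C": [("C1", 0.95, 40), ("C2", 0.98, 100)],
}

configs = []
for combo in product(*options.values()):
    name = "".join(o[0] for o in combo)
    r = 1.0
    for o in combo:
        r *= o[1]                       # series reliability: product of parts
    c = sum(o[2] for o in combo)        # series cost: sum of parts
    configs.append((name, round(r, 3), c))

def dominated(x, others):
    return any(
        o != x and o[2] <= x[2] and o[1] >= x[1] and (o[2] < x[2] or o[1] > x[1])
        for o in others
    )

optimum = sorted((c for c in configs if not dominated(c, configs)),
                 key=lambda x: x[2])
print(len(configs), len(optimum))   # 18 combinations, 6 optimum
for cfg in optimum:
    print(cfg)
```

This brute-force pass recovers the six starred configurations of Table 12.4; the algorithm of Section 12.7 aims to generate only these six without enumerating all 18.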
[Fig. 12.5 System reliability plotted against system cost for the configurations of Table 12.4.]
The total number of possible configurations is

Π_{j=1}^{N} Mj

while the number of optimum configurations cannot exceed

[Σ_{j=1}^{N} (Mj - 1)] + 1

where Mj is the number of options available for the jth component. For
Table 12.3, M = (3, 3, 2), giving 18 configurations and at most 6 optimum
ones.
12.7.1 Algorithm
1. Set i = 1, I = 1 (initialize).
2. Calculate

   R_I = Π_{j=1}^{N} R_{i,j}                    (12.35)

   C_I = Σ_{j=1}^{N} C_{i,j}                    (12.36)

   where I corresponds to the number of times step 1 is performed.
3. Calculate A_I = [f(R_{i,j}, C_{i,j})], j = 1, 2, ..., N      (12.37)

   (12.38)

   and

   (12.39)

5. When I = [Σ_{j=1}^{N} (Mj - 1)] + 1, stop.
(a) Items are completely effective until they fail, after which they are
completely ineffective.
(b) Queuing problems (arising because several items fail
simultaneously) are ignored, since it is assumed that the maintenance/repair
crew size is unlimited or sufficient to carry out maintenance/repairs.
(c) Failed items are replaced with identical items, that is, the replacement
item has the same lifetime distribution as that of the failed item.
(d) The replacement time is negligible.
If the equipment is used for T years, then the total running cost
incurred will be
K(T) = ∫_0^T r(t) dt                           (12.41)
Thus,
Total cost incurred on the equipment in T years
= Capital cost + Total running cost in T years - Scrap value
= C + K (T) - S (12.42)
A(T) = [C + K(T) - S] / T                      (12.43)

Thus, the machine should be replaced at the value of T for which the
average annual cost A(T) = [C - S + K(T)] / T is minimum.
Example 12.5
The cost of a machine is $15000 and its scrap value is $1000. The
maintenance costs of the machine (as found from the records) are as
follows:
Solution
K(T) = Σ_{t=1}^{T} r(t)
TABLE 12.5
Calculations for A(T)
Year T    r(t)    K(T)    C-S+K(T)    A(T)
1 200 200 14200 14200
2 300 500 14500 7250
3 500 1000 15000 5000
4 650 1650 15650 3912
5 800 2450 16450 3290
6 1000 3450 17450 2908
7 1600 5050 19050 2721
8 2100 7150 21150 2643
9 2700 9850 23850 2650
From Table 12.5, it may be seen that A(T) is minimum in the eighth year.
Thus, the machine should be replaced at the end of the eighth year;
otherwise the average annual cost will again increase.
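The computation behind Table 12.5 can be sketched directly from Eq. (12.43):

```python
# Average annual cost A(T) = (C - S + K(T)) / T (Eq. 12.43) for the machine
# of Example 12.5; the machine should be replaced when A(T) is minimum.

C, S = 15000, 1000
r = [200, 300, 500, 650, 800, 1000, 1600, 2100, 2700]  # running cost, years 1..9

K = 0
best = None
for T, rt in enumerate(r, start=1):
    K += rt                        # K(T): cumulative running cost to year T
    A = (C - S + K) / T
    if best is None or A < best[1]:
        best = (T, A)
print(best)   # minimum average annual cost occurs in year 8
```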
***
Example 12.6
A lorry fleet owner finds from his past records the cost per year of running a
lorry and its resale value, as given in Table 12.6. The purchase price of the
lorry is $25,000. At what stage should the lorry be replaced?
Solution
From Table 12.6, it may be noted that the scrap value is a decreasing
function of time. We now wish to minimise A(T). The analysis of the
problem is given in Table 12.7.
TABLE 12.6
Cost Data for Running a Lorry
(purchase price of the lorry: $25,000)

Year of      Resale price at     Annual running cost r(t)
operation    end of year         (operating + maintenance)
1            15000                6300
2            13500                7000
3            12000                7700
4             9000                9500
5             8000               11500
6             7500               13000
7             7000               14300
TABLE 12.7
Analysis of Example 12.6

Year of     Resale price at     Investment     Annual operating   Cumulative of   Total annual cost   Average
operation   end of year, S(t)   cost C-S(t)    cost r(t)          r(t), K(t)      C-S(t)+K(t)         cost A(t)
1           15000               10000           6300               6300           16300               16300
2           13500               11500           7000              13300           24800               12400
3           12000               13000           7700              21000           34000               11333
4            9000               16000           9500              30500           46500               11625
5            8000               17000          11500              42000           59000               11800
6            7500               17500          13000              55000           72500               12083
7            7000               18000          14300              69300           87300               12471
Table 12.7 indicates that the value of A(T) is minimum in the third
year. Hence, the lorry should be replaced every three years, which
results in the lowest average annual cost of $11,333.
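The same replacement analysis, now with a time-varying resale value S(t), reproduces Table 12.7:

```python
# Replacement analysis with a time-varying resale value S(t) (Example 12.6):
# A(t) = (C - S(t) + K(t)) / t, where K(t) accumulates the annual running cost.

C = 25000
S = [15000, 13500, 12000, 9000, 8000, 7500, 7000]   # resale value, years 1..7
r = [6300, 7000, 7700, 9500, 11500, 13000, 14300]   # running cost, years 1..7

K, rows = 0, []
for t in range(1, 8):
    K += r[t - 1]                                   # cumulative running cost
    rows.append((t, round((C - S[t - 1] + K) / t)))
t_opt = min(rows, key=lambda x: x[1])
print(rows)
print(t_opt)   # replace the lorry every 3 years
```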
* * *
13
RELIABILITY MANAGEMENT
13.1 INTRODUCTION
[Figure: flow of reliability information between service, external sources, and the organization.]
decrease in wastage of money, material, and manpower. As organizations
grow more and more complex, communication and coordination between
various activities become less and less effective. The cost of
ineffective communication can be dangerously expensive in terms of
both time and money. Moreover, reliability achievement needs, in
addition to proper coordination of information, a specialized knowledge of
each and all of the interrelated components in a system. This places a
great emphasis on the creation of an independent group which could not
only coordinate between different departments but also carry out all
reliability activities of the organization.
The managing of reliability and quality control areas under the impact of
today's organized world competition is a highly complex and challenging
task. Management's reliability and quality control ingenuity in
surmounting the technological developments required for plant equipment,
process controls, and manufactured hardware requires a close working
relationship between all producer- and user-organization elements concerned.
The techniques and applications of reliability and quality control are rapidly
advancing and changing on an international basis. Industry views the use
of higher performance and reliability standards as scientific management
tools for securing a major advantage over the competition. The application of
these modern sciences to military equipment, space systems, and
commercial products offers both challenge and opportunity to those
responsible for organization effectiveness. The use of intensified reliability
and quality programs as a means of improving product designs, proving
hardware capability, and reducing costs offers far-reaching opportunity for
innovations in organization and methods.
1. Clearly understandable,
2. Unambiguous, and
3. Realistic in terms of resources available.
6. Maintenance policy:
Management must provide the controls needed to assure that all quality
attributes affecting reliability, maintainability, safety, and cost comply with
commitments and satisfy the customer's requirements. Tersely stated,
management must have well-planned policies, effective program planning,
timely scheduling, and technical training. Management must clearly state and
support its objectives and policies for accomplishing the product quality and
reliability and assign responsibility for accomplishment to appropriate
functions throughout the organization.
[Organization chart: President or Plant General Manager, with Legal, Procurement, Research and Engineering, Quality Assurance Director, and Product Support; under quality assurance, a Quality Control Manager and a Reliability Control Manager, each with Project Administrator(s); supporting groups include the Quality Control Operations Group, Metrology, the Analysis Group, and Inspection.]
Management must recognize and choose the type of persons that are needed
to fill the key positions in the reliability and quality control organization.
Management must know that these selected people will be able to work
closely with and motivate others to accomplish their respective tasks.
Top management philosophy establishes the element for employee
motivation throughout the enterprise.
The cost control function within the reliability and quality control
organization is most frequently located within the quality control
Administrative Group, the Quality Control Systems Group, or the Quality
Control Engineering Group. Regardless of which group is given the
responsibility, the director of reliability and quality control and his
department managers must maintain very close and continuing
communications with the responsible individuals. Timely analysis of trends
and decisions and guidance should be provided frequently.
The reliability and quality control management team has value to the total
organization that is related directly to its favourable impact on product
quality.
The abrupt deemphasis of cost plus fixed fee military contracting has
focused attention upon the incentive contract as a means for assuring
effective management interest in achieving product reliability and
maintenance commitments. With this medium, a specified scale of incentive
and sometimes penalty is applied as a factor in the total contract price.
Penalty scales are usually applied at lower rates than incentive scales
and may be omitted in competitive fixed price contracts.
Every product merits an analysis of the total tasks to be performed with the
allowed costs. The estimation of costs for every function must be quite
close to the final actual costs of the specific function if effective results are
to be achieved. It is apparent that the general readjustment (usually
arbitrary cuts) of budgetary estimates by top management will be in those
areas where the departmental estimates and accounting reports of past
performance on similar programs are in obvious disagreement.
Cost estimation of the equipment and facilities required for standards and
calibration, process control, inspection and test is another essential task
for reliability and quality control engineers. Applicable staff and line
personnel should be given the opportunity to take part in the planning of all
equipment and facilities expansion, retirement, or replacement.
To control cost in the quality and reliability programs, careful long range
planning must be exercised by management. This planning must be
accomplished by those to whom top management has delegated the
responsibility and who will be held accountable for the implementation of the
plans. The controlling of these long range plans at the time of
implementation is one of the basic principles of cost control.
objectives of the consumer and the company. At the top management level,
the matrix technique is useful in determining the organisation structure
based upon the responsibilities delegated to each department and as a
basis for penetrating new market areas. In all cases, the effectiveness
of the management process is directly related to profitability through
consumer assurance that product performance and quality are maximized
within the negotiated cost structure.
1. Customer Requirement
2. Special Requirements
3. Schedule
6. Manpower Availability
The program requirements for specialized manpower are such that this
factor is considered. This objective is not heavily weighted since it is
related to attainment of other objectives.
The management function then utilizes this tool for planning and action in
the performance of its activities. The organization matrix provides
management with a mechanism for expeditious and efficient
departmental control commensurate with the company's products and
philosophies.
The nature of the reliability and quality control activity imposes an added
burden upon the planning which must precede the provision of facilities
and equipment. The managers of plant engineering and facilities functions
are under constant pressure to hold down the costs of space, equipment,
and material, as well as the cost of personnel. In the natural optimism and
self-confidence in the organization and its product, quality and reliability
methods and equipment requirements are sometimes taken for granted.
It is desirable that the provisions for reliability and quality control facilities
and equipment be made in close cooperation with the company's
engineering design group; if feasible, the planning should be made during
the concept and preliminary design phase of the product, and certainly in
conjunction with plans for new plant locations or structural additions to the
existing plant. It is important that any particular requirements for test
equipment be given to management so that they can be provided in the
planning layout of new facilities.
The critical demands of advance planning for reliability and quality control
equipment appear in the funding and scheduling of the production master
plan. Equipment that requires long-lead procurement must be included
within the master schedule to minimize the term of loan capital provided
for this purpose. Similarly, the funding requirements for facilities must be
evaluated, for these will include such considerations as inspection area
lighting, temperature, humidity, air conditioning, clean room, air control and
flow distribution, special disposal and sanitation installations, personnel
safety provisions, and mobile access into all such areas.
In some organizations the reliability and quality control groups have been
given the responsibility for test equipment design. This requires that very
capable engineers be made responsible for this effort. When adequately
staffed, certain advantages may accrue through this organization policy.
These advantages include improved timeliness and effectiveness of test
equipment, greater emphasis on automation, improved supplier coordination,
improved integration of all test functions, and optimum emphasis on
nondestructive inspection and test methods.
(a) A brief and factual account of the development and objectives of the
reliability programme,
(b) explicit definition of terms that are of interest to the study and
that are used throughout the specification,
(c) data requirements, such as item of data, criteria, unit of
measurements, etc.,
(d) a complete and detailed technical inventory of the product to
be evaluated, and
(e) materials and facilities needed for the evaluation.
Two methods are usually employed in collecting the required data, depending
upon the relative importance of accuracy vs. cost.
The second method is to employ technical personnel who have the assigned
responsibility for carrying out the measurement programme. This method
has numerous advantages. A few important ones are enumerated below:
2. A high interest in the study can be maintained at the source of the data.
3. As a result of (1) and (2), the evaluation personnel can make the
necessary decisions to keep the study on the right course.
The use of samples in the measurement of reliability requires that the final
result be presented as an estimated value with the confidence limits to
indicate the probable range within which the population mean will fall. The
larger the size of the sample, the narrower will be the confidence interval.
1. Were the data taken from the development tests, field tests, component
tests, system tests?
2. What were the environmental conditions?
3. Were the data homogeneous and representative?
4. How large was the sample size?
5. What assumptions were made concerning the shape of the failure
distribution?
[Figure: data bank organization, with data collection and data identification feeding the data bank, serving external sources and external requests.]
The following areas often generate information vital to reliability control and
should be periodically monitored to establish that no new data sources are
overlooked.
Look for major subcontracts involving test requirements and individual tests
subcontracted directly at project engineering request.
A large company has much valuable data generated from one-time-only
sources; libraries can serve as checkpoints which often turn up these
occasional inputs.
3. Contracts Department
1. Failure Reports
Control on failure reports will vary with the volume of reports to be handled.
A small quantity can reasonably be tabulated, and the trends analyzed and
studied, by using manual methods and by working from the original
narrative descriptions. As the quantity of reports grows, the necessity
of conventional coding and restricted English terms increases if the
information is to be handled on a mass basis. A computer search is
possible only when each field or box (by which a search might be made)
is restricted to a stipulated selection of terms or figures on the original
report. The trends thus revealed naturally require subsequent engineering
interpretation of significance.
2. Test Reports
13.8 TRAINING
The plan of action by management for the advance planning of the goals
rests on the company's resources such as facilities, tools,
raw materials, personnel, productive capacity, sales outlets, etc. Because
business is subject to change, it is rather difficult to predetermine definite
training courses during the early product planning stage. But when a product
becomes firm business and specifications are known, training plans must
be activated on a time phased basis.
One of the duties of the quality assurance engineer should be to ensure that
supervisory personnel become aware of the training needs of their workers
and to make certain that means are devised and used to determine exactly
what, when, and how training is to be implemented and made effective
(Fig.13.5).
[Fig. 13.5 Training flow: problem input → 1. determine training needs → 2. classify levels of essentiality → 3. take training action.]
4. Assure that all reliability and quality control personnel are capable of
performing their tasks effectively and efficiently.
Informal training (on the job) occurs throughout industry when any member
of management gives instructions to his subordinates. Skill in such
communication is important in achieving desired actions. Motivation for
quality and reliability is a daily task and the result of organized
effort. It requires the measurement of progress and frequent
feedback to employees on the quality of the job they are doing. Control
charts provide a scoreboard of personnel performance. This feedback of
information, when coupled with plans for corrective-action patterns, will
promote desired motivation.
Formal training occurs when skills, experience, ideas, and information are
organized into a classroom curriculum to achieve desired levels of skills and
understanding. The objectives in training programs must be stated, and they
must be realistic. The applicable subject matter must be organized and
accurate, and methods must be suited to subject matter. Instructors must
be qualified and experienced, and proper evaluation and feedback for
curriculum improvement must be provided. Schedules must be realistic and
planned to have personnel trained as the task is implemented.
The following factors can be used to evaluate training for both mental and
physical skills:
Reliability Applications 317
[Fig. 14.1 Instrument landing system: (a) plan view, showing the runway and the localizer transmitter; (b) elevation view, showing the glidepath equipment and the path of the airplane.]
The runway localizer provides the lateral or azimuth guidance that enables
the airplane to approach the runway from the proper direction. Signals
carrying azimuth guidance information are produced by a VHF Localizer
equipment. The glidepath equipment provides an equisignal path type of
guidance in the vertical plane analogous to the guidance in azimuth provided
by the equisignal path of the localizer.
where R = reliability of the system, λ = failure rate of the system,
t = time, and m = MTBF of the system.
14.13 Localizer
[Fig. 14.2 Localizer radiation patterns: the 90 Hz and 150 Hz lobes form the equisignal course.]
A Cat II system has two channels, each consisting of a Main Transmitter
Unit, Motor Drive Unit and Mechanical Modulator, in addition to a Coaxial
Distribution Unit, Aerial Distribution Unit, Localizer Aerial Arrays, Monitor
Aerials and Associated Equipment, Control Unit (local) and Control Unit
(remote).
14.14 Glidepath
In order to ensure that there will be only one equisignal glidepath, the
lower antenna is so excited that its lobe maximum is larger than the
maximum of the upper antenna and is so placed that its pattern has a
maximum that is at a relatively large angle above the horizon as shown in
Fig.14.3. Different side band frequencies are radiated from these antennas
in the same manner as indicated for localizer in Fig.14.2. The proper
glidepath is in the range of 2 degree to 5 degree. Since the glidepath
equipment must be placed at the side of the runway so that it will not
present a hazard, the antenna patterns in the horizontal plane must be
carefully controlled so that the glidepath will have the correct slope along
the azimuth course defined by the localizer. The Category II glidepath
equipment configuration is identical to that of the localizer equipment.
[Fig. 14.3 Glidepath antenna patterns: lower antenna pattern and upper antenna pattern.]
The functional performance of the localizer equipment of the ILS has been
described. Based on this functional performance we obtain the Reliability
Logic Diagram (RLD) for Cat II system which has been shown as RLD -1 in
RLD-1
  1.1  Main Transmitter Unit         * 1.7  Remote Control Unit
* 1.2  R.F. Distribution Unit          1.8  Local Control Unit
  1.3  Motor Drive Unit              * 1.9  Aerial Arrays
+ 1.4  Mechanical Modulator Unit       1.10 Aerial Distribution Unit
+ 1.5  Monitor Unit                  + 1.11 Monitor Aerials and Associated Equipment
  1.6  Coaxial Distribution Unit

RLD-1.4
  1.4.1 90 Hz Modulator Unit
  1.4.2 150 Hz Modulator Unit
  1.4.3 Motor Speed Alarm Unit

RLD-1.5
+ 1.5.1 Position Monitor
+ 1.5.2 Width Monitor
+ 1.5.3 Clearance Monitor
+ 1.5.4 Alarm Unit

RLD-1.5.1
  1.5.1.1 RF and AGC Amplifier
  1.5.1.2 Audio Amplifier
  1.5.1.3 90 Hz and 150 Hz Filter
  1.5.1.4 Balanced D.C. Amplifier
  1.5.1.5 Position Attenuator

RLD-1.5.4
  1.5.4.1 Interconnection Board
  1.5.4.2 Stabilized Bias Supply Unit
  1.5.4.3 Alarm Board

RLD-1.11
  1.11.1 RF Attenuator
  1.11.2 Monitor Line R.F. Amplifier
Fig.14.4. Some of the blocks (*) namely 1.2, 1.7 and 1.9 do not contribute
to the failure of the equipment and are therefore not analyzed further.
Some other blocks, namely 1.1, 1.3, 1.6, 1.8 and 1.10, are simple and their
failure rates can be directly estimated by finding the failure rates of the
constituent components. Blocks such as 1.4, 1.5 and 1.11 require further
decomposition into separate sub-blocks and are indicated by (+). The
blocks have been numbered in such a way that it is clear which block
each sub-block belongs to. The following points have been kept in view
while analyzing Fig. 14.4.
(ii) The components of the coaxial distribution unit have not been
included in the analysis (based on experience) except for four
switchover relays.
(iii) The remote control unit has only some switches and all other
functions are confined to the local control unit only. Therefore, the
remote control unit is not considered in the reliability analysis.
(iv) In the local control unit, identity tone detectors have not been
considered in the reliability analysis as their failure does not result
in the failure of the equipment.
(v) The failure rates of Aperture Monitor Combining unit in the Aerial
Distribution Equipment and of the monitor dipoles in Monitor Aerials
and Associated equipments have been taken as equal to zero.
(vi) In the Cat II system, the standby channel B comes into operation when
the main channel A fails. In practice channel A is operated for some
time, then channel B is operated for some time, then channel A again,
and so on. Therefore, the effective failure rate of each channel is
taken as half of the failure rate calculated on the assumption of
continuous operation.
The failure rate calculations for the localizer are shown in the respective
tables. The failure rates given in these tables are values per million hours
and therefore are to be multiplied by 10^-6. They have been taken from
MIL-HDBK-217. The following notes will be helpful in understanding these
tables.
(i) Reference Note No. has been included in the tables for each entry.
Its significance is:
(a) Note No. 1 indicates that the value has been estimated using the
Handbook.
(b) Note No. 2 indicates that the value has been estimated by
referring to Part Stress Method in the Handbook.
(c) Note No. 3 indicates that the failure rate for this item has been
calculated in another table. The numbers of the tables and the
numbers in the Reliability Logic Diagrams are self explanatory.
(ii) Ground fixed environment (GF) has been assumed for failure rate
calculations.
(iii) Resistors are of the carbon composition type, assumed to be
classified according to a style with 2 letters. For resistors and
capacitors, commercial (non-MIL) quality has been assumed and the
value of the quality factor πQ is taken as 3.
(iv) Diodes have been divided into two categories: general purpose
(silicon) and Zener & avalanche. Both these and transistors are
assumed to be of the non-MIL hermetic type with πQ = 5.
(v) Connections of PCBs with coaxial cable are taken to fall in the
category of coaxial connectors. Control panel connections with coaxial
cable fall in the category of circular, rack & panel connectors. Wiring
board connectors fall in the category of printed wiring board
connectors. Sockets, plugs, etc. are considered similar to coaxial
connectors for failure rate estimation. Transformers are categorized
into two types, audio transformers and RF transformers. They are
assumed to be of non-MIL type with πQ = 3. Inductors are also
assumed to have πQ = 3.
(vii) Quartz crystal, fuses, lamps (neon and incandescent) are assumed
to be of MIL-C-3098 specification and meters are assumed to be of
MIL-M-10304 specification.
(viii) Warning devices, batteries and all the elements of Aerial Distribution
unit except the resistors and capacitors are assumed to have zero
failure rate.
Table 14.1 summarizes the failure rates of all the constituent units of
localizer. These failure rates have been obtained as shown in the
subsequent tables. The actual values for all components could not be
reproduced for obvious reasons.
TABLE 14.1
Failure Rates for Units of Localizer

Sr.No   Name of the Component                       Failure Rate
1.      Main Transmitter Unit                       f1
2.      R.F. Distribution Unit                      f2
3.      Motor Drive Unit                            f3
4.      Mechanical Modulator Unit                   f4
5.      Monitor Unit                                f5
6.      Coaxial Distribution Unit                   f6
7.      Remote Control Unit                         f7
8.      Local Control Unit                          f8
9.      Aerial Arrays                               f9
10.     Aerial Distribution Equipment               f10
11.     Monitor Aerials and Associated Equipment    f11
The block diagram is shown in Fig. 14.4. Let R1 be the reliability for both
the channels, each comprising blocks 1.1 to 1.4. Let R2 be the
reliability of the parallel combination of blocks 1.5. Let R3 be the reliability
of blocks 1.6 to 1.11 in series.
The total failure rate for blocks 1.6 to 1.11 in series is given by:

λ3 = f6 + f7 + f8 + f9 + f10 + f11

Therefore,
TABLE 1.4
Mechanical Modulator Unit

Sr.No   Name of the Component     Ref.Note No   Qty.   Generic failure rate   πQ   Failure rate
1.      90 Hz Modulator Unit      3             1      18.249                 -    18.2490
2.      150 Hz Modulator Unit     3             1      18.249                 -    18.2490
3.      Motor Speed Alarm Unit    3             1      11.760                 -    11.7600
                                                       Total                       48.2580

TABLE 1.4.1
90 Hz Modulator Unit

Sr.No   Name of the Component            Ref.Note No   Qty.   Generic failure rate   πQ   Failure rate
1.      Fixed paper capacitor            1             7      0.0260                 3    0.5460
2.      Variable air trimmer capacitor   1             3      1.9000                 3    17.1000
3.      R.F. transformer                 1             1      0.1500                 3    0.4500
4.      Socket                           1             3      0.0170                 3    0.1530
                                                              Total                       18.2490
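The parts-count calculation of Table 1.4.1 can be sketched as a simple sum of quantity × generic rate × quality factor:

```python
# Parts-count failure rate for the 90 Hz Modulator Unit (Table 1.4.1):
# unit failure rate = sum over part types of qty * generic rate * quality
# factor pi_Q, in failures per 10**6 hours.

parts = [
    # (name, qty, generic failure rate, pi_Q)
    ("fixed paper capacitor",           7, 0.0260, 3),
    ("variable air trimmer capacitor",  3, 1.9000, 3),
    ("R.F. transformer",                1, 0.1500, 3),
    ("socket",                          3, 0.0170, 3),
]

unit_rate = sum(qty * rate * piq for _, qty, rate, piq in parts)
print(round(unit_rate, 4))   # 18.249 failures per 10**6 hours
```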
14.16 Glidepath reliability calculations
The functional performance of the glidepath equipment of the ILS has already
been described. Based on the functional performance we obtain the
Reliability Logic Diagram for the Cat II system which is shown as RLD-2 in
the Fig. 14.5.
The blocks in this diagram are numbered 2.1 to 2.11. Some of the blocks
(*), namely 2.2, 2.7, 2.9 and 2.10, do not contribute to the failure of the
equipment and are therefore not analyzed further. Some other blocks, namely
2.1, 2.3, 2.6, 2.8 and 2.11, are simple and their failure rates can be directly
estimated by using the failure rates of the constituent components. Blocks
such as 2.4 and 2.5 are decomposed into various sub-blocks, indicated
by (+).
The failure rate evaluation of the glidepath equipment has been carried out
assuming the points indicated in the case of the localizer, except for the
following:
(i) The number of switch over relays in the coaxial distribution unit
is now 3 instead of 4.
(ii) In the local control unit, identity tone detectors are not used in
this case.
(iv) All the associated units except the RF amplifier in the monitor
aerials and associated equipments have zero failure rate.
Table 14.2 summarizes the failure rates of all the constituent units of the
glidepath equipment. These failure rates have been obtained as shown in
the subsequent tables.
The block diagram is shown in Fig. 14.5. Let R1 be the reliability for both
the channels, each comprising blocks 2.1 to 2.4. Let R2 be the reliability
of the parallel combination of blocks 2.5. Let R3 be the reliability of blocks
2.6 to 2.11 in series. Then the glidepath reliability RG is given by
RLD-2
  2.1  Main Transmitter Unit         * 2.7  Remote Control Unit
* 2.2  R.F. Distribution Unit          2.8  Local Control Unit
  2.3  Motor Drive Unit              * 2.9  Aerial Arrays
+ 2.4  Mechanical Modulator Unit     * 2.10 Aerial Distribution Unit
+ 2.5  Monitor Unit                    2.11 Monitor Aerials and Associated Equipment
  2.6  Coaxial Distribution Unit

RLD-2.4
  2.4.1 90 Hz Modulator Unit
  2.4.2 150 Hz Modulator Unit
  2.4.3 Motor Speed Alarm Unit

RLD-2.5
+ 2.5.1 Position Monitor
+ 2.5.2 Width Monitor
+ 2.5.3 Clearance Monitor
+ 2.5.4 Alarm Unit

RLD-2.5.1
  2.5.1.1 RF and AGC Amplifier
  2.5.1.2 Audio Amplifier
  2.5.1.3 90 Hz and 150 Hz Filter
  2.5.1.4 Balanced D.C. Amplifier
  2.5.1.5 Position Attenuator

RLD-2.5.4
  2.5.4.1 Interconnection Board
  2.5.4.2 Stabilized Bias Supply Unit
  2.5.4.3 Alarm Board
The total failure rate for blocks 2.6 to 2.11 in series is given by:

Therefore,

mG = ∫_0^∞ RG dt
It may be observed that the localizer as well as the glidepath makes use
of active parallel as well as standby redundancy in some subsystems.
Therefore, the failure rate is a function of time, and talking about a
consolidated failure rate for these units is meaningless. We have therefore
evaluated the reliability expressions and used them to evaluate the MTBF for
these units.
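Evaluating the MTBF as the integral of the reliability expression can be sketched numerically. The two-unit standby form R(t) = (1 + λt)e^(-λt), whose exact MTBF is 2/λ, and the failure rate used below are illustrative assumptions, not values from the text:

```python
# MTBF of a redundant unit as the integral of its reliability, m = ∫R(t)dt.
# Illustrated with a two-unit standby system, R(t) = (1 + L*t)*exp(-L*t),
# whose exact MTBF is 2/L. The failure rate L is a hypothetical value.
import math

L = 0.001  # assumed failure rate per hour

def R(t):
    return (1 + L * t) * math.exp(-L * t)

# Simple trapezoidal integration out to a time where R(t) is negligible.
dt, t_max = 1.0, 40000.0
steps = int(t_max / dt)
mtbf = dt * (0.5 * R(0) + sum(R(k * dt) for k in range(1, steps)) + 0.5 * R(t_max))
print(round(mtbf), round(2 / L))   # numerical vs exact MTBF
```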
[Fig. 14.6 Bank data network: a front end processor and system controller serving stations 1 and 2.]
To answer the first question, compare the present failure intensity with
the project's failure intensity objective. The question regarding when the
software will be ready for release can be answered by observing the
completion date line in Fig.14.7. We can determine whether we should
regress to a previous version by tracking present failure intensity for each
version. If the new version is not meeting the failure intensity objective
and the old one is, and the difference between the failure intensities is
substantial, it will probably be worth the effort to regress.
The model can help the manager, through simulation, reach trade-off
decisions among schedules, costs, resources, and reliability and can assist
in determining resource allocations. One chooses several values of each
parameter that is to be varied, applies the model to compute the effects,
examines the results, and iterates this procedure as required.
The results of the studies are presented here to show their usefulness. It
is assumed that the increases of actions 3 and 4 are made by reallocating
experienced people from other parts of the project, so that negligible
training time is involved.
Fig.14.8 Effect of failure intensity objective on predicted completion date for bank project.
[Figure: predicted completion date versus resource levels (testers: present, doubled, tripled).]
The calendar time failure intensities for the front end processor and system
controller software will be 0.0038 failure/hr and 0.002 failure/hr,
respectively. The 24-hr reliabilities can be calculated, using a standard
formula for relating failure intensity and reliability, as 0.913 and 0.953. The
overall 24-hr period reliability as seen from station 1 is calculated to be
0.857 and that from station 2 turns out to be 0.853. If the bank considers
this unacceptable, improvements should be made first in the front end
processor software and then in the system controller software.
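The "standard formula" referred to here is the constant-failure-intensity relation R(t) = exp(-λt). A quick check of the quoted figures:

```python
import math

def reliability(failure_intensity, hours):
    # R(t) = exp(-lambda * t) for a constant failure intensity lambda
    return math.exp(-failure_intensity * hours)

r_fep = reliability(0.0038, 24)  # front end processor, 24-hr period
r_sc = reliability(0.002, 24)    # system controller, 24-hr period
# r_fep ≈ 0.913 and r_sc ≈ 0.953, matching the values quoted above
```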
PROBLEMS
2. An engine shaft has a failure rate of 0.5×10⁻⁷/hr. The shield used
with the shaft has a failure rate of 2.5×10⁻⁷/hr. If a given company
has 5000 engines with these shafts and shields and each engine
operates for 350 days of useful life, estimate the number of shafts
and shields that must be replaced annually.
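For small λt the expected number of annual replacements is approximately N·λ·t. A sketch of that estimate, assuming 24-hr/day operation on the 350 operating days:

```python
n_engines = 5000
hours_per_year = 350 * 24  # 350 operating days, assumed 24 hr/day

# Expected annual replacements ≈ N * lambda * t (valid for small lambda*t)
shafts = n_engines * 0.5e-7 * hours_per_year
shields = n_engines * 2.5e-7 * hours_per_year
# shafts ≈ 2.1, shields ≈ 10.5 per year
```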
What is the probability that the item will still be functioning without
failure at t = 300 days, given that the unit functioned without failure
at t = 100 days ?
336 Reliability Engineering
(a) What is the probability that the device will fail during the
second year of operation?
(b) If upon failure the device is immediately replaced, what is
the probability that there will be more than one failure in 3
years of operation?
9. The failure rate for a certain type of component is λ(t) = λ0t where
λ0 > 0 is constant. Find its reliability, mortality and MTBF.
f(t) = 32/(t + 4)³, t > 0,
where t is in years.
15. For the reliability analysis, 300 diodes were placed on a life test.
After 1500 hr, 16 diodes had failed and the test was stopped. The times
at which failures occurred are: 115, 120, 205, 370, 459, 607, 714,
840, 990, 1160, 1188, 1300, 1380, 1414, 1449 and 1497 hrs.
Determine the failure rate of the diodes.
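One common estimator for such time-censored (Type I) data divides the number of failures by the accumulated unit-hours; this is a sketch of that approach, not necessarily the method the book intends here:

```python
failure_times = [115, 120, 205, 370, 459, 607, 714, 840,
                 990, 1160, 1188, 1300, 1380, 1414, 1449, 1497]
n_units, t_end = 300, 1500

# Unit-hours: each failed diode runs until its failure time,
# each surviving diode runs the full 1500 hr.
unit_hours = sum(failure_times) + (n_units - len(failure_times)) * t_end
lam_hat = len(failure_times) / unit_hours  # estimated failures per hour
```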
(a) Estimate the MTBF if failed resistors are replaced when found.
(b) Estimate the MTBF if no replacements are made.
17. Twenty small generators were put under test for a period of 1500
hours. One generator failed at 400 hours and was replaced by a new
one. A second failed at 500 hours and was also replaced. A third and
fourth failed at 550 and 600 hours, respectively, and were removed
from testing, but were not replaced. A fifth malfunctioned at 700
hours, was immediately repaired, and was put back into test. A sixth
malfunctioned at 800 hours but was kept in test. Later analysis
showed this failure was due to governor malfunction. Estimate the
failure rate of the generators.
18. Ten units are placed on life test, and the failure times are 9, 19, 27,
35, 40, 46, 50, 55, 56, 60 hr. Plot f(t), λ(t), Q(t) and R(t).
TABLE
Time interval (hours)      Number of failures during the interval
1000 < T ≤ 1020            25
1020 < T ≤ 1040            40
1040 < T ≤ 1060            60
1080 < T ≤ 1100             5
where t is in years.
22. A device is put into service on a Monday and operates seven days
each week. Each day there is a 10% chance that the device will
break down. (This includes the first day of operation). The
maintenance crew is not available on weekends, and so the manager
hopes that the first breakdown does not occur on a weekend. What
is the probability that the first breakdown will occur on a weekend?
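With a day-by-day breakdown probability p = 0.1, the first breakdown falls on day k with probability 0.9^(k-1)·0.1, and since operation starts on a Monday the weekend days are days 6, 7, 13, 14, … of operation. A sketch of the geometric sum:

```python
p = 0.1
# Device starts on Monday (day 1); Saturdays are days with k % 7 == 6,
# Sundays are days with k % 7 == 0.
prob_weekend = sum(0.9 ** (k - 1) * p
                   for k in range(1, 2000)
                   if k % 7 in (6, 0))
# prob_weekend ≈ 0.215
```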
23. A man and his wife appear for an interview for two posts. The
probability of husband's selection is 1/7 and that of the wife's
selection is 1/5. What is the probability that only one of them will
be selected ?
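A sketch of the "exactly one selected" computation for problem 23, using independence of the two selections:

```python
p_husband, p_wife = 1 / 7, 1 / 5

# Exactly one selected: husband and not wife, or wife and not husband
p_only_one = p_husband * (1 - p_wife) + (1 - p_husband) * p_wife
# = 4/35 + 6/35 = 2/7
```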
if B then s1 else s2
28. A cinema house gets electric power from a generator run by diesel
engine. On any day, the probability that the generator is down (event
A) is 0.025 and the probability that the diesel engine is down (event
8) is 0.04. What is the probability that the cinema house will have
power on any given day? Assume that occurrence of event A and
event B are independent of each other.
29. A has one share in a lottery in which there is one prize and
two blanks ; B has three shares in a lottery in which there are
three prizes and 6 blanks; compare the probability of A's success
to that of B's success.
30. Four persons are chosen at random from a group containing 3 men,
2 women and 4 children. Calculate the chances that exactly two of
them will be children.
(a) No failure
(b) One failure
(c) Two failures
(d) Two failures or less
(e) More than two failures.
37. Four identical electronic units are connected in parallel. Each has a
reliability of 0.9. Estimate the probability of 0, 1, 2, 3, and 4 of
these units remaining operative.
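Problem 37 is a direct binomial computation; a sketch:

```python
from math import comb

def binom_pmf(n, k, p):
    # Probability that exactly k of n independent units are operative
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

probs = [binom_pmf(4, k, 0.9) for k in range(5)]
# probs[4] = 0.9**4 = 0.6561 (all four operative); the five values sum to 1
```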
(a) no break-down
(b) 1 break-down
(c) 2 break-downs
(d) 10 break-downs
(e) Less than 3 break-downs
(f) Three or more break-downs.
39. An illuminated mimic diagram in a plant control room has 150
nominally identical bulbs which are required to be permanently
illuminated. If the probability of any one bulb being out at any one
time is 0.01, what is the probability of
40. Verify that the function defined by f(t) = 0.1 exp(-0.25t) + 0.06 exp(-0.1t)
for all t > 0, with f(t) = 0 for t ≤ 0, is a density function
and
find the expected value of a random variable having such a density
function.
44. A room is lit by five nominally identical lamps. All lamps are
switched on together and left switched on. It is known that the
times to lamp failures after they have been switched on is
rectangularly distributed between a lower limit of 8000 hr and an
upper limit of 12,000 hr. What is the mean time to the room being
in darkness? How would this mean time be affected if the number
of lamps was increased to a total of 15?
not greater than 7.0 mm. This clearance is provided for cooling
purposes. The radius of the clock is a random variable following a
normal probability law with a mean of 20.0 cm and a coefficient of
variation of 1%. The manufacturing process adopted to produce the
housing results in making the inner radius of the box also a random
variable following a normal probability law with a mean of 20.2 cm
and a coefficient of variation of 2%. Evaluate the probability that the
specified clearance will be met for a clock and its housing.
Draw the shape of this p.d.f. and calculate the reliability of the
control system if the requirement for the power output at a particular
time is (a) that it should be between 45 W and 57 W, (b) that it
should be between 43 W and 57 W and (c) that it should be less
than 55 W.
48. A given component has an MTBF of 10⁶ hr. What is the reliability for
an operating period of 10 hr for 5 such components in series ?
50. A manufacturer of 16K byte memory boards finds that the reliability
of the manufactured boards is 0.98. Assume that the defects are
independent.
52. A certain component has a failure rate of 4×10⁻⁸/hr in the on-state
and a failure rate of 4×10⁻⁹/hr in the off-state. On average, over the
life of this component, it is only 25% of the time in the on-state.
What is the effective failure rate of this component?
56. A solid fuel booster engine has been test fired 2760 times. On 414
occasions the engine failed to ignite. If a projectile is fitted with
three identical and independent booster engines of this type, what
is the chance on launching of the projectile that,
57. The reliability function for a relay is R(t) = exp(-λK) where K is
the number of cycles and λ = 10⁻⁴/cycle. A logic circuit uses 10
relays. The specific logic circuit used is unknown. What range should
K have for the system reliability to be 0.95 ?
Which of the alternatives would you select ? Why ? Assume that the
redundant units are statistically independent.
59. Two circuit breakers of the same design each have a failure-to-
open on-demand probability of 0.02. The breakers are placed in
series so that both must fail to open in order for the circuit breaker
system to fail.
[Figure: Main Memory and Disk Drives.]
64. A PC/XT has the following units with their respective failure rates
in (%/1000 hrs.) as indicated:
i                    1.0
ii   Co-processor    2.0
iii  Key Board       0.8
iv   VDU             2.5
v    Hard Disc       3.0
vi   Floppy Drive 1  1.5
vii  Floppy Drive 2  1.5
viii Printer         3.5
(a) Determine the reliability of each unit for 2,000 hrs. of operation.
(b) Determine the reliability of the system and MTBF if only one
floppy drive is sufficient.
65. The circuit in the following picture shows a battery, a light, and two
switches for redundancy. The two switches are operated by different
people, and for each person there is a probability of 0.9 that the
person will remember to turn on the switch. The battery and the light
have reliability 0.99. Assuming that the battery, the light, and the
two people all function independently, what is the probability that
the light will actually turn on?
Battery
Light
Switch 1
Switch 2
66. A computer system has three units as shown in Fig. Their
reliabilities are as follows:
(a)
(b)
69. Four elements of a system each have a constant probability of 0.1 of
being in the failed state at any time. What is the system
probability of being in the failed state if the elements are so
connected that system successes is achieved when :
73. If the level of stress changes during a mission, then the failure rate
also changes. At take off, for example, an aircraft engine has to
generate a greater torque to get the higher engine thrust required.
At cruising altitude and speed, torque requirements are reduced.
Assume the stress profile of an aircraft flight is as shown:
(a) Find an expression for the reliability of a single engine for one flight.
(b) Assume a four engine aircraft. If all four engines are required for
takeoff and climb, but only two out of four are required for
completing the flight, determine the entire system reliability for
one flight.
[Figure: stress profile of an aircraft flight versus time.]
74. A pipeline carrying fluid has two valves as shown below. Draw the
reliability logic diagram if
[Figure: flow through Valve A and Valve B in the pipeline.]
(a) both of them are normally closed and expected to open when
required to permit flow, and
(b) both of them are normally open and expected to close to block
the flow.
1. System reliability
2. Open mode failure probability = 0.2
3. Short mode failure probability = 0.3
79. A small nuclear research reactor has three absorber rods which are
suspended above the reactor and are designed to drop into the
reactor core and shut the reactor down in the event of any untoward
incident. The three rods are designated A, B and C and it has
been found that the probability of each of these rods failing to
drop on demand is Pa = 0.005, Pb = 0.01 and Pc =0.001. If it is
known that any two or more of three rods entering the reactor
core will
352 Reliability Engineering
safely shut the reactor down, what is the probability of failing to shut
the reactor down when called upon to do so?
80. A system has MTBF of 200 hrs. Calculate the 100 hr. reliability of a
system in which one such unit is operative and two identical units
are standing by.
82. The failure rate of a device is constant, equal to 0.06×10⁻³ per hr.
How many standby devices are required to achieve a reliability of
more than 0.985 for an operating period of 10,000 hrs? What is the
MTTF of the resulting system ?
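With perfect sensing and switch-over, an (n+1)-unit standby group survives as long as a Poisson failure process has produced at most n failures, so R(t) = exp(-λt) Σ (λt)^k/k!. A sketch that searches for the smallest adequate number of standby units for problem 82:

```python
import math

def standby_reliability(lam, t, n_standby):
    # Poisson survival: at most n_standby failures in (0, t]
    lt = lam * t
    return math.exp(-lt) * sum(lt ** k / math.factorial(k)
                               for k in range(n_standby + 1))

lam, t = 0.06e-3, 10_000  # failure rate per hr, mission time in hr
n = 0
while standby_reliability(lam, t, n) <= 0.985:
    n += 1
# n = 3 standby units suffice; MTTF of the (n+1)-unit group is (n+1)/lam
mttf = (n + 1) / lam
```

The same cumulative-Poisson expression, with n = 2 and λ = 1/200 per hr, applies to problem 80.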
[Figure: reliability diagram with units T1, T2, T3 and T4.]
95. Three units of a system have predicted reliabilities 0.75, 0.85 and
0.95 respectively. It is desired to allocate the reliabilities such
that the system reliability is not less than 0.75. Obtain the
necessary solution by
96. Devise Hamming code consisting of data digits and check digits to
encode the 5-bit data word 10101. Show how one error can be
detected and corrected.
Decode the message assuming that at most a single error can occur
in a word.
The total system cost should not exceed 200 Dollars and total
system weight should not exceed 60 Kg.
106. A large office block has a fire detection and alarm system which is
subject to a mean constant failure rate of two failures per year
(1 year = 8760 hr) and each failure that occurs takes, on average, 4
hr to detect and repair. The system is also subject to a quarterly
routine inspection and test on which occasions it is out of action for
a fixed time of 1 hr. If the expected probability of fire occurrence
in the building over a period of time is 0.073, what is the
probability of an undetected fire by the alarm system over the same
period of time?
108. You are given a system with n components. The MTBF for each
component is 100 hrs. and the MTTR is 5 hrs. Each component
has its own repair facility. Find the limiting availability of the
system when:
111. A system consists of two units in active redundancy. The units have
a constant failure rate λ of 10⁻³ per hour and fail stochastically
independently.
1. System availability,
2. Frequency of system failure,
3. Mean down-time, and
4. Mean up-time.
113. In testing certain systems whose operating time up to failure was
normally distributed, we obtain ten realisations of the operating time
up to failure (in hours): 115, 75, 80, 150, 75, 100, 120, 95, 70,
100. Find the confidence bounds for the mean of the operating time
up to failure with a level of confidence of 95%.
114. Twenty identical items were tested for 200 hr. Nine of the total
items failed during the test period. Their failure times are specified
in table below. The failed items were never replaced. Determine
whether the failure data represent the exponential distribution.
Failure number 1 2 3 4 5 6 7 8 9
33.7, 36.9, 46.8, 56.6, 62.1, 63.6, 78.4, 79.0, 101.5, 110.2
Q = Q0 exp(-EA/kT)
What is the MTTF of this converter at 25°C?
119. The same data have been fitted with both the basic and logarithmic
Poisson models. The parameters obtained are:
Processor 1
Processor 2
122. The Soft Landing software service company has won a service
contract to provide recovery service for a patient control and billing
system. The service is provided to doctors in a region who own
personal computers. It has a failure intensity of 1 failure /100 cpu hr.
The average system runs 10 cpu hr /week and there are 600 doctors
to be serviced. The average service person can make 2 service calls
daily, including travel and allowing for enough spare personnel to
prevent excessive service backlogs from building up.
Resource usage                   Per failure     Per CPU hr
Failure identification effort    3 person hr     2 person hr
Failure correction effort        6 person hr     0
Computer time                    1.5 CPU hr      1 CPU hr
3. 0.9802
5. 47 days
7. 0.905
(c) m = t0/3
15. 0.000682/ hr
23. 2/7
29. 7 : 16
31. 0.999
35. 0.6976
41. 53/729
43. 0.75
45. 0.216
55. 0.3024
61. 0.10765
65. 0.9703
71. 0.885
0.832.
85. 0.9949
87. R = Pab + Qab Pac Pbc + Qab Pad Pbd Qac + Qab Pad Pbd Pac Qbc
+ Pad Pcd Pbc Qab Qac Qbd + Pac Pcd Pbd Qab Qad Qbc
89. 0.988
91. 0.94208
97. 9996
99. 3,2,2,3, 1
107. 0.896
119. 60 failures and 4.16 CPU hr, 64 failures and 3.2 CPU hr;
114 failures and 18 CPU hr, 156 failures and 39.2 CPU hr
123. (a) XI = 778 per hr, XF = 552 per hr, XC = 389 CPU hr
(b) No, somewhat less
125. System "C" is optimal.
REFERENCES
BOOKS
8. Breipohl A.M., Probabilistic Systems Analysis, John Wiley & Sons, Inc.,
NewYork, 1970.
10. Colombo A.G. and Keller A.Z., Reliability Modelling and Applications,
D.Reidel Publishing Co., Holland, 1987.
11. Deo N., Graph Theory with Applications to Engineering and Computer
Science, Prentice -Hall Inc., Englewood Cliffs, New Jersey, 1974.
13. Dhillon B.S. and Singh C., Engineering Reliability: New Techniques and
Applications, Wiley-Interscience, John Wiley & Sons, Inc., New York,
1981.
14. Dummer G.W.A. and Griffin N., Electronic Equipment Reliability, John
Wiley & Sons, Inc., New York, 1960.
18. Green A.E., Safety Systems Reliability, John Wiley & Sons Ltd., New
York, 1983.
19. Ireson W.G., Reliability Handbook, McGraw-Hill, Inc., New York, 1966.
20. Ireson W.G. and Coombs C.F., Jr. (Editors), Handbook of Reliability
Engineering and Management, McGraw-Hill Book Co., Inc., New York,
1988.
21. Klaassen K.B. and Jack C.L.van Peppen, System Reliability, Chapman
and Hall, Inc., New York, 1989.
22. Lloyd D.K. and Lipow M., Reliability: Management, Methods and
Mathematics, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1962.
24. Mann N.R., Schafer R.E. and Singpurwalla N.D., Methods for Statistical
Analysis of Reliability and Life Data, John Wiley & Sons, Inc., New York,
1974.
27. Myers G.J., Software Reliability: Principles and Practices, John Wiley &
Sons, Inc., New York, 1976.
28. Page L.B., Probability for Engineering, Computer Science Press, New
York, 1989.
36. Sinha S.K., Reliability and Life testing, Wiley Eastern Limited, New
Delhi, 1986.
38. Tillman F.A., Hwang C.L. and Kuo W., Optimization of Systems
Reliability, Marcel Dekker, Inc., New York, 1980.
39. Trivedi K.S., Probability and Statistics with Reliability, Queuing and
Computer Science Application, Prentice-Hall, Inc., Englewood Cliffs,
New Jersey, 1982.
40. Von Alven W.H. (Editor), Reliability Engineering, Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey, 1964.
RESEARCH PUBLICATIONS
7. Aggarwal K.K., Gupta J.S. and Misra K.B., A New Heuristic Criterion
for Solving a Redundancy Optimization Problem, IEEE Trans.
Reliability, Vol. R-24, pp 86-87, April 1975.
8. Aggarwal K.K., Misra K.B. and Gupta J.S., A Simple Method for
Reliability Evaluation of a Communication System, IEEE Trans.
Communication, Vol. Com-23, pp 563-565, May 1975.
9. Aggarwal K.K., Misra K.B. and Gupta J.S., A Fast Algorithm for
Reliability Evaluation, IEEE Trans. Reliability, Vol. R-24, pp 83-85, April
1975.
10. Aggarwal K.K., Misra K.B. and Gupta J.S., Reliability Evaluation: A
Comparative Study of Different Techniques, Microelectronics and
Reliability, Vol.14, pp 49-56, 1975.
11. Aggarwal K.K. and Gupta J.S., On Minimizing the Cost of Reliable
Systems, IEEE Trans. Reliability, Vol.24, pp 205-208, 1975.
13. Aggarwal K.K. and Rai S., Symbolic Reliability Evaluation Using
Logical Signal Relations, IEEE Trans. Reliability, Vol. R-27, pp 202-205,
August 1978.
14. Aggarwal K.K., Chopra Y.C. and Bajwa J.S., Modification of Cut Sets
for Reliability Evaluation of Communication Systems, Microelectronics
and Reliability , Vol.22, pp 337-340, 1982.
15. Aggarwal K.K., Chopra Y.C. and Bajwa J.S., Topological layout of Links
for Optimizing the s-t Reliability in a Computer Communication
Network, Microelectronics and Reliability, Vol.22, pp 341-345, 1982.
16. Aggarwal K.K., Chopra Y.C. and Bajwa J.S., Capacity Consideration in
Reliability Analysis of Communication Systems, IEEE Trans.
Reliability, Vol.31, pp 171-181, 1982.
17. Aggarwal K.K., Chopra Y.C. and Bajwa J.S., Reliability Evaluation by
Network Decomposition, IEEE Trans. Reliability, Vol.31, pp355- 358,
1982.
19. Anderson R.T., Reliability Design Handbook, IIT Research Institute,
April 1979.
21. Balagurusamy E. and Misra K.B., Failure Rate Derating Chart for Parallel
Redundant Units with Dependent Failures, IEEE Trans. Reliability,
Vol.25, pp 122, June 1976.
23. Banerjee S.K. and Rajamani K., Closed form Solutions for Delta-Star and
Star-Delta Conversions for Reliability Networks, IEEE Trans. Reliability,
Vol.R-25, pp 115-118, June 1976.
24. Bennets A.G., On the Analysis of Fault Trees, IEEE Trans. Reliability,
Vol.R-24, pp 175-185, August 1975.
29. Deo N. and Medidi M., Parallel Algorithms for Terminal- Pair Reliability,
IEEE Trans. Reliability, Vol.41, pp 201-209, June 1992.
30. Downs T. and Garrone P., Some New Models of Software Testing with
Performance Comparisons, IEEE Trans. Reliability, Vol.40, pp 322- 328,
August 1991.
32. Dugan J.B. and Trivedi K.S., Coverage Modeling for Dependability
Analysis of Fault Tolerant Systems, IEEE Trans. Computers, Vol.38, pp
775-787, June 1989.
34. Evans M.G.K., Parry G.W. and Wreathall J., On the Treatment of
Common -Cause Failures in the System Analysis, Reliability
Engineering, Vol.39, pp 107-115, 1984.
36. Fratta L. and Montanari U.G.,A Boolean Algebra Method for Computing
the Terminal Reliability of a Communication Network, IEEE Trans. Circuit
Theory, Vol.CT-20, pp 203-211, May 1973.
38. Gopal K., Aggarwal K.K. and Gupta J.S., Reliability Evaluation in
Complex Systems with many Failure Modes, International Journal of
Systems Science, Vol.7, pp 1387-1392, 1976.
39. Gopal K., Aggarwal K.K. and Gupta J.S., A New Method for Reliability
Optimization, Microelectronics and Reliability, Vol.17, pp 419- 422,
1978.
40. Gopal K., Aggarwal K.K. and Gupta J.S., A New Method for Solving
Reliability Optimization Problems, IEEE Trans. Reliability, Vol.29, pp 36-
37, 1980.
41. Gopal K., Aggarwal K.K. and Gupta J.S., On Optimal Redundancy
Allocation, IEEE Trans. Reliability, Vol.27, pp 325-328, 1978.
42. Gopal K., Aggarwal K.K. and Gupta J.S., Reliability Analysis of
Multistate Device Networks, IEEE Trans. Reliability, Vol. R-27, pp 233-
235, August 1978.
43. Gopal K., Aggarwal K.K. and Gupta J.S., A New Approach to Reliability
Optimization in GMR Systems, Microelectronics and Reliability,
Vol.18, pp 419-422, 1978.
44. Gopal K., Aggarwal K.K. and Gupta J.S., An Event Expansion Algorithm
for Reliability Evaluation in Complex Systems, International Journal of
Systems Science, Vol.10, pp 363-371, 1979.
45. Gopal K., Reliability Analysis of Complex Networks and Systems, Ph.D
Thesis, Kurukshetra University, Kurukshetra, India, 1978.
48. Hansler E., McAuliffe G.K. and Wilkov R.S., Exact Calculation of
Computer Network Reliability, Networks, Vol. 4, pp 95-112, 1974.
51. Hurley R.B., Probability Maps, IEEE Trans. Reliability, Vol.R-12, pp 39-
44, September 1963.
52. Jasman G.B. and Kai O.S., A New Technique in Minimal Path and
Cutset Evaluation, IEEE Trans. Reliability, Vol.34, pp 136-143, 1985.
53. Jensen P.A. and Bellmore M., An Algorithm to Determine the Reliability
of Complex Systems, IEEE Trans. Reliability, Vol.R-18, pp 169-174,
November 1969.
56. Lin P.M., Leon B.J. and Huang T.C., A New Algorithm for Symbolic
System Reliability Analysis, IEEE Trans. Reliability, Vol. R-25, pp 2-15,
April 1976.
57. Locks M.O. and Biegel J.E., Relationship Between Minimal Path-Sets and
Cut-Sets, IEEE Trans. Reliability, Vol.R-27, pp 106-107, June 1978.
58. Locks M.O., Inverting and Minimizing Path-Sets and Cut-Sets, IEEE
Trans. Reliability, Vol R-27, pp 106, June 1978.
64. Misra K.B. and Sharma U., An Efficient Algorithm to Solve Integer
Programming Problems Arising in System- Reliability Design, IEEE
Trans. Reliability, Vol.40, pp 81-91, April 1991.
66. Nakagawa Y., Nakashima K. and Hattori Y., Optimal Reliability Allocation
by Branch-and-Bound Technique, IEEE Trans. Reliability, Vol.R-27, pp 31-38, April 1978.
69. Page L.B. and Perry J.E., A Model for System Reliability with Common
Cause Failures, IEEE Trans. Reliability, Vol.R-38, pp 406- 410, October
1989.
71. Pedar A. and Sarma V.V.S., Phased- Mission Analysis for Evaluating the
Effectiveness of Aerospace Computing Systems, IEEE Trans. Reliability,
Vol.30, December 1981.
72. Pedar A., Reliability Modelling and Architectural Optimization of
Aerospace Computing Systems, Ph.D.Thesis, Indian Institute of
Science, Bangalore, India, 1981.
74. Renu Bala and Aggarwal K.K., A Simple Method for Optimal
Redundancy Allocation for Complex Networks, Microelectronics and
Reliability, Vol.27, pp 835-837, 1987.
75. Rushdi A.M., Symbolic Reliability Analysis with the Aid of Variable
Entered Karnaugh Maps, IEEE Trans. Reliability, Vol.R-32, pp 134-139,
June 1983.
84. Soi I.M.N. and Aggarwal K.K., Reliability Indices for Topological
Design of Reliable CCNs, IEEE Trans. Reliability, Vol.30, pp 438-443,
1981.
87. Suresh Rai and Arun Kumar, Recursive Technique for Computing System
Reliability, IEEE Trans. Reliability, Vol.R-36, pp 38-44, April 1987.
88. Suresh Rai and Aggarwal K.K., An Efficient Method For Reliability
Evaluation of a General Network, IEEE Trans. Reliability, Vol.R-27,
pp 206-211, August 1978.
89. Tillman F.A., Hwang C.L., Fan L.T. and Lai K.C., Optimal Reliability
of Complex System, IEEE Trans. Reliability, Vol.R-19, pp 95-100,
August 1970.
90. Tillman F.A., Hwang C.L. and Kuo W., Optimization Techniques for
System Reliability with Redundancy- A Review, IEEE Trans.
Reliability, Vol.R-26, pp 148-155, August 1977.
92. Vinod Kumar and Aggarwal K.K., Determination of Path Identifiers for
Reliability Analysis of a Broadcasting Network using Petrinets,
International Journal of Systems Science, Vol.19, pp 2643-2653, 1988.
-non series-parallel 62
A -parallel 61
-series 61
A Particular Method for Reliability -series parallel 61
Analysis 93 Boolean algebra method 91
Acceleration Burn-in 12,13
-factor for exponential distribution 202
-models 203 c
Acceptable risk of error 200
Actions-timely management 298 Calendar time component 241
Active element group method 109 Causative factor 257
Active element groups 109,130 Capacity analysis 268
Active repair time 158 Cartesian product- Normalized 247
Additional execution time 240 Catastrophic failures 9
Adequate Performance 5 Causes of failures 7
Allocation factors for Reliability CC methodology 260
Apportionment 129 Chance failures 12
Apportionment for new units 123,128 Characteristic types of failures 11
ARPA Computer Network 88 Common cause failures 256
Arrhenius model 204 Communication & co-ordination 8
Availability 153,154,165 Comparison of software reliability models
-function 163 229
-man power 303 Competitive substitutions 160
-operational 154 Complement of a set 30
-steady state 159,165 Complexity factor 129,130
Average failure rate 65 Component reliability measure 185
Average hourly cost 172 Computation of failure rate 26
Computer communication networks 88,246
B Conditional probability 34
Conditional probability chain 51
Bank data network system 334 Confidence
Banking system 329 -estimation 197
Basic allocation method 125
Subject Index
-monotonic 10
-analysis 156 -non-monotonic 10
-effective consideration 299 Duane plots 213
-effective choice of subsystems 285
-of reliability design 275 Dynamic Programming 287
-prevention 273
-present 281 E
-penalty 284
-timely planning 299 Early failures 11
Cost model Economical Quality Control and Sampling
-availability 284 Plans 304
-depreciation 282 Economics of reliability engineering 272
-reliability achievement 276 Effective evaluation 315
-reliability utility 280 Effective training 311
Criticality 135,136 Effort function 127
Cutset approach 96 Environment-hostile 4,223
Cycle time 166 Environmental
-Data 111
D -symbol identification 111
-symbol description 111
Dead end connection 88 Environmental test laboratory 310
Debugging 12,13 Error correction 141
Decision theory 302 Error detection & location 142
Decrement per failure experienced 234 Evaluation of training 314
Degradation failures 5,9 Event
Delta star method 97 -compound 35
Dependability 154 -independent 34
Derating 140 -simple 33
Design reliability 218 Exclusive operator 249
Detailed design prediction 109 Execution
Developed code 222 -environment 222
Developer oriented approach 217,218 -time component 232
Devices 6 Expected
Discrete Random Variable 36 -number of failures 235
Distribution function -number of additional failures 236
-binomial 39 Exploding technology 107
-continuous 44 Exponential
-chi square 197 -law 13
-discrete 36,37 -law verification 187
-exponential 47, 190 Eyring Model 205
-gamma 49,50
-normal 50,193 F
-poisson 41
-prior 194 Fail
-posterior 194 -safe 79
-rayleigh 24,47 -to danger 79
-uniform 46 Failure 6,217
-weibull 49,192 -catastrophic 9
Distribution percentile 202 -chance 12
Down time 158 -complete 9
Drift
-data analysis 25
-degradation 9
H
-density function 21
Hamming code 141
-drift 10
Heuristic methods for Reliability Allocation
-frequency 159
144
-frequency curve 21
High pressure oxygen system 87
-gradual 9
-identification Human reliability 8
-intensity 225,234
-intensity function 221 I
-marginal 9
-open 75 Incentive contracts 300
-partial 9 Independent events 62
-rate 6,20,112 Informal training 314
-reports 310 Information sources for failure rate data
-sensing and switch over devices 81 109 In-house test ll0
-short circuit 75 input
-sudden 9 -state 224
Failure rate -space 224
-average 65 -variable 224
-derated 207 Inspection non-destructive 305
Failure intensity decay parameter 233 Instantaneous probability of failures 20
Failure intensity as function of time 236 Instructions retry step 142
Failure mode and effects analysis (FMEA) Instrument landing system 316
6 Intended life 12
Failure mode effects and criticality analysis Intersection of two sets 30
(FMECA) 108 Intermittent failures 142
Fault Internal data sources 309
-removal 223 Isoavailability curve 168
-tolerance 142 Item failure rate 114
-tree analysis 6
Feasibility prediction 109 J
Field data performance 121
Field performance feedback 110 Job knowledge quotient 312
Field service engineering 310
Format reliability specification 296 K
Formulating design decisions 108
Funnel arrangement 181 K-out-of-m System 61,73
Karnaugh map 91
G Keystone Element 103
Terminal reliability 90
Test
u
-acceptance 209 Unavailability 166
-integrating 180 Understandability of documentation 218
-level of 180 Uni-phase system 251
-non-replacement 186,199 Union of sets 30
-purpose of 182 Unreliability 19
-reliability 182 Use environment 117
-replacement 199 Use of reliability studies 226
-report 311 Use of samples 307
-step stress 208 Useful life 13
-two tailed 188 User
Test equipment design 305 -friendly 217
Testing -oriented 217
-accelerated 201,206 -oriented view 218
-accelerated life 183
-actual conditions 180 v
-ambient 180
-destructive 179 Validity check 142
-environmental 180 Variable
-equipment acceptance 209 -input 224
-life 183 -output 224
-non-destructive 179 Venn diagram 31
-of reliability growth 211 Vertex cutset 247
-peripheral 182
-production 180
-service life evaluation 184
w
-simulated conditions 180
Wear out failures 12
-surveillance 184
Weightage factor 129
Thermal aspect 117
Weighted
Thermal design 109
-reliability index 268
Three state Markov model 168
-reliability measure 269
TOPICS IN SAFETY, RELIABILITY AND QUALITY