Reliability Engineering: A Perspective: January 2008

This document discusses reliability engineering from the perspective of Krishna B. Misra. It begins by defining reliability according to IEC standards as the probability of adequate performance over a specified duration under given operating conditions. It emphasizes the importance of quantifying and engineering reliability into products. The document then discusses how environmental factors like temperature, vibration, radiation can affect reliability and gives examples of how combined environmental stresses can have greater impact than individual factors. It stresses that reliability is dependent on operating environment and conditions.


Reliability Engineering: A Perspective

Chapter · January 2008


DOI: 10.1007/978-1-84800-131-2_19


1 author:

Krishna B. Misra
RAMS Consultants

All content following this page was uploaded by Krishna B. Misra on 07 May 2016.



19

Reliability Engineering: A Perspective

Krishna B. Misra,
RAMS Consultants, India

Abstract: The most important attribute of performance is reliability, and it is defined as the probability of
failure-free operation over a specified duration of time under a given set of conditions, which depend upon
the location and the kind of application the item is put to. In designing for reliability, the objective must be
to maximize inherent and operational reliability so that the occurrence of failures is considerably reduced.
A component or system reliability goal can be achieved through design and testing. Failures, and
consequently accidents, can be avoided if their causes can be traced and taken care of during the design.

19.1 Introduction

The most important characteristic of product performance is reliability, and there are several reasons why reliability is more important than before. Among them are:
• Products are becoming more and more complex and unless reliability is improved, the performance may become inadequate.
• There exists a tough competition between the industrialized nations and only the products with high reliability will eventually survive.
• There is a compulsion of minimizing waste as the world population grows and the resources keep on shrinking.
• Product longevity implies less world pollution.

19.1.1 Definition

According to the IEC definition, reliability is the capability of a product (or a system or a service) to perform its expected job under the specified conditions of use over an intended period of time. Obviously, this definition raises questions, like:
• Can this capability be quantified?
• Can this quantification be used with all kinds of products and systems without resorting to any redefinition or modification on a case to case basis?
• Can this capability be used to compare the performance of different engineering designs and technologies for a product or a system and assess their suitability?

As an answer to all the above questions, we can say that quantifying the capability through the probability of satisfactory performance provides us a very general approach to evaluate relatively the performance of products and systems. The next important question that then arises is: can this capability be engineered into products or systems? The answer is again in the affirmative and in fact our main concern in reliability engineering is to
engineer reliability into products, equipment and systems.

Implicit in the IEC definition of reliability are three components, viz., probability of adequate operation, over the specified time and under the specified conditions of use. These need explanation.

Adequate Operation: This signifies that a failure of a product (or a system) does not necessarily constitute complete cessation of its operation; it only means that the product or the unit is not functioning within the intended bounds of performance or is not operating satisfactorily.

Mission Time: The specified time in the definition can also be called the mission time. Without specifying the mission time, a figure of reliability has no meaning, since reliability is dependent on mission time and decreases as the length of the mission time increases. The duration of the mission time depends on the design objectives of a system, and it is this length of time over which the system should not have any failures at all so that the desired mission of the system can be carried out successfully without being impaired or aborted. The choice of mission time exclusively depends upon the objective of the system.

Conditions of use: These basically refer to the environmental conditions under which the product or the system has to operate. In fact the performance of a product can be adversely affected by these environmental conditions. Thus reliability is always dependent on specified environmental conditions. If there is a departure from the specified conditions, the intended level of performance of the product or system cannot be ensured and there is a risk that the product might even transit to a degraded performance level or even to a failure.

Further, the conditions of use render reliability engineering a challenging task, which can aptly be handled through indigenous know-how, experience and experimentation, since the conditions of environment may vary considerably from place to place depending upon the geographical location.

A checklist of environmental factors that a reliability engineer may face while designing a system is given in Table 19.1. There are two types of factors, viz., natural and induced.

Table 19.1: Environmental Factors that Influence Reliability

Natural                        Induced
Electromagnetic Radiation      Acceleration
Electrostatic Discharge        Chemicals
Frost                          Corona
Fungus                         Electromagnetic Radiation
Gravity                        Electrostatic Discharge
Humidity                       Explosion
Lightning                      Moisture
Pollution, Air                 Nuclear Radiation
Pressure: High, Low, Vacuum    Shock, Thermal
Radiation, Cosmic              Space Debris
Rain                           Temperature: High, Low
Salt Spray                     Turbulence
Sand and Dust                  Vapour Trails
Snow                           Vibration: Mechanical
Temperature                    Vibration: Acoustic
Wind

Each environmental factor requires determination of its impact on the operational and reliability characteristics of materials and parts comprising the system. This includes the operational and maintenance environment as well as pre-operational environments, when stresses imposed on parts during manufacturing assembly, inspection, testing, shipping and installation may have significant impact on equipment reliability.

Very often more than one environmental factor may be acting on parts or equipment. These combined or concurrent environments may be more detrimental to reliability than the effects of these single environments separately. For example, equipment may be exposed to a combination such as temperature, humidity, altitude, shock and vibration, while it is being transported. Moreover, superposition of the effects of individual environmental factors cannot predict the resulting influence a combination of environmental factors will have on the reliability or performance of the equipment. In fact the severity of the combination may be much more than the individual effects added together. For example, the percentage of failures caused by temperature may be (40% of all
failures), vibration (24%), humidity (19%), sand and dust (6%) and salinity (4%), but humidity combined with temperature can cause 75% of all failures. Humidity with a salt air environment (as may be common in coastal regions) can be a major cause of degradation of equipment performance since they promote corrosion effects in metallic components besides leading to the formation of surface films on non-metallic parts. Moisture absorption by insulating material can also increase the conductivity and dissipation factor of these materials. As pointed out earlier, the effects cannot be linearly extrapolated if two environments act simultaneously upon a product or a system.

Sometimes, sudden changes of temperature may also induce large internal mechanical stresses in structural elements, particularly when dissimilar materials are involved. Effects of thermal shock-induced stresses include cracking of seams, delamination, loss of hermeticity, leakage of fill gases, and separation of encapsulating materials from components and enclosure. Natural frequencies of items comprising equipment must also be considered in the design phase since a resonant condition may greatly amplify subsystem deflection and may increase stresses beyond the safe limits. A vibration environment can create relative motion between members. When combined with other environmental stresses, this motion can produce fretting corrosion.

Therefore, in order to ensure reliability, environmental testing should form an integral part of performance demonstration. In fact, experimentation on the basis of the actual environments envisaged to act upon a product or a system must necessarily be carried out before finalizing a product design. Environmental stress testing is commonly employed during the product design and development stage. For products that are in use over a long period of time, an accelerated life test is usually done in order to shorten the time-to-failure by elevating the severity of loading beyond what the product is supposed to experience under normal conditions, so as to obtain meaningful results on reliability characteristics. For instance, electronic components may be tested at elevated temperatures in order to hasten the incidence of failures. Likewise, steel pipes used in nuclear power plants may be exposed to neutron irradiation that can increase the brittleness of steel, causing brittle failure. However, overstressing is done only to the extent that it does not change the product failure mode and mechanisms.

19.1.2 Some Hard Facts about Reliability

However there are certain notions that need to be clarified to a beginner in this area.

19.1.2.1 Death is a certain event

Birth and death are certain events of life. This applies to all man-made objects also. It is a universal fact that anything that is born has got to die some day. Certainly in reliability, we do not or cannot eliminate natural death. A natural death is not considered as an undesired event in reliability but cessation of activity or function during the mission time is certainly an undesired event and is what we call a failure in reliability terminology. Our prime concern, therefore, in reliability engineering is to prevent a failure of the item during its mission time and not outside it.

19.1.2.2 Failures are inevitable

Since, as mentioned earlier, it is not possible to eliminate failures completely, in reliability engineering we aim to prevent them as far as possible and to contain their influence on the functioning of the system, if they do occur.

19.1.2.3 Hundred percent reliability is impossible

A hundred percent reliability is possible only when a product or unit never ever fails. Whatever be the mission time, reliability can never be 100% (except theoretically, if the mission time is zero, i.e., we do not use it at all). It can approach unity asymptotically and can be close to unity, i.e., 0.999…9, but can never be unity. As long as there is a single chance of failure (or death, which is a certainty), reliability can never be unity.

19.1.2.4 Instant of failure is unpredictable

The instant of failure can't be predicted. When a failure would occur, nobody can ever predict.
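The notions that reliability stays below unity for any non-zero mission time, and that individual failure instants are random, can be sketched numerically. The snippet below is an illustration only: it assumes a constant hazard rate (an exponential time-to-failure model, which the text has not introduced at this point), and the rate value and function names are hypothetical. Under that assumption reliability equals one only at zero mission time, while simulated failure instants scatter widely even though their distribution is fully specified.

```python
import math
import random


def reliability(mission_time_h, hazard_rate_per_h):
    """R(t) = exp(-lambda * t), valid only under the ASSUMED constant hazard rate."""
    if mission_time_h < 0 or hazard_rate_per_h <= 0:
        raise ValueError("mission time must be >= 0 and hazard rate > 0")
    return math.exp(-hazard_rate_per_h * mission_time_h)


def sample_failure_instants(n_units, hazard_rate_per_h, seed=0):
    """Draw random failure instants (hours) for n_units identical items."""
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    return [rng.expovariate(hazard_rate_per_h) for _ in range(n_units)]


if __name__ == "__main__":
    lam = 1e-4  # hypothetical hazard rate, failures per hour
    for t in (0.0, 1.0, 100.0, 1000.0):
        # Reliability is 1 only at t = 0 and decreases as mission time grows.
        print(f"mission time {t:7.1f} h -> R = {reliability(t, lam):.6f}")
    # Individual failure instants differ from unit to unit.
    print(sample_failure_instants(5, lam))
```

Only the probability of surviving a given mission time is computed here; the sampled instants show why no single failure time can be forecast.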
No mathematical equation or theory exists that can predict the instant of failure. The occurrence is just a chance or a random event. Therefore, it should be clear in our minds that the theory of reliability can't predict the exact instant when a product or a system is going to fail. One should not hurriedly infer from this that reliability theory is imprecise; it only supports the statement that failures can occur at any time and are random in nature.

19.1.2.5 Uncertainty of results

Statistical and probabilistic methods are used for analyzing failure data and quantifying reliability during the prediction, measurement and testing phases. However, on account of the high level of uncertainty involved at various levels in the process, these can hardly be applied with the kind of precision and credibility that engineers are accustomed to when dealing with most other engineering problems. But the mathematical and statistical methods in reliability analysis in fact allow us to have an idea of the uncertainty present.

19.1.3 Strategy in Reliability Engineering

The prioritized objectives of reliability engineering are:
• To apply engineering knowledge to understand and anticipate the possible causes of product or system failures and to take adequate measures to prevent them from occurring.
• To identify and check the failure mechanisms, which eventually lead to failures.
• To explore the ways of reducing the likelihood or frequency of failures that occur despite the efforts to prevent them.
• To apply methods for estimating the reliability of new designs, and for analyzing reliability data with a view to improve future designs.

Basically, reliability engineering is first and foremost the application of good engineering, in the widest sense, during design, development, manufacture and use.

19.1.4 Failure-related Terminology

There are several terms in vogue in reliability engineering which should be defined for avoiding confusion and for clarity of understanding.

Defect: A defect is the departure of a characteristic of an item from requirement.

Fault: A fault is the state of an item characterized by its inability to perform a required function.

Error: An error is a discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition.

Human Error: It is a human action that produces an undesired result or consequence.

Bug: A bug is a software defect.

Failure: A failure is the cessation of the ability of an item to perform a required function. However, a failure, in reliability engineering, has a wider connotation, which not only includes death but also the inability of an item to perform adequately over the mission time. Thus the definition of failure has been related to the level or quality of the desired performance. For example, if an electric power company guarantees that the voltage within the premises of a consumer shall be within ± 5% of the rated supply voltage, its failure to keep the supply voltage within these limits shall be construed as a failure of power supply. A failure here implies not only that no electric power is available but also an inadequate level of supply voltage. This broad definition of failure permits us to quantify the performance of a product vis-à-vis the expected level of performance.

Failure Mechanism: A failure mechanism is a physical, chemical, thermal or metallurgical process which causes component characteristics to change beyond tolerance, which eventually leads to a certain mode of component failure. Typical failure mechanisms can be electromigration, thermal instability, electrostatic field, corrosion, fatigue etc.

Failure Mode: A failure mode is the visible or observable effect by which a component failure is identified. A component can fail in more than one failure mode.
Examples of failure modes are short circuit, open circuit or degraded performance in electronic devices such as capacitors, diodes etc., or valves stuck open or stuck closed in mechanical devices. Several failure mechanisms may exist simultaneously at a point of time, eventually leading to a failure of a component in a particular mode. Two or more of the same failure mechanisms can lead to different failure modes. All failure modes are known as primary failures.

Dependent Failures: In case the failure of a component changes the operating stress on other components of a system, dependency exists and this increases the failure probability of other components.

Common Mode of Failure: It is a single failure mode that results in simultaneous failure of several components.

19.1.4.1 Types of Electrical Failures

Electronic or electrical components may fail in any of the following modes [54]:
• Short circuit: diodes, transistors and capacitors may often fail in this mode.
• Open circuit: resistors, crystals etc. may often fail in this mode.
• Degraded performance: SCRs and aluminium capacitors may fail in this mode.
• Functional failure: coils and relays may often fail in this mode.

In fact the relative frequency of occurrence of these failure modes can be different for different component types, and these failure modes and their relative frequencies should be considered for a particular type of application of a component during the design of a system using these components. This makes reliability design a challenging task.

19.1.4.2 Types of Mechanical Failures

Depending upon the type of stress, the usual modes of failure in mechanical components [75,88] can be enumerated as follows:

Material failures: This type of failure mainly occurs due to poor quality checks, fatigue cracks, weld defects, and small cracks and flaws. These defects actually reduce the allowable material strength and consequently lead to premature failure at the flawed spots.

Metallurgical failure: This failure is the result of extreme oxidation or operation in a corrosive environment. Environmental factors such as heat, nuclear radiation, erosion, and corrosive media accelerate the occurrence of metallurgical failures. These are also referred to as material failures.

Stress concentration failure: This type of failure occurs when uneven stress "flows" through a mechanical member. Usually, the concentration of stress occurs at sudden transitions from thick to thin gauges, at sudden variations in loading along a structure, at right-angle joints or at various attachment conditions.

Compressive failure: This failure occurs under compressive loads. The compressive failure often results in permanent deformation, cracking or rupturing.

Bearing failure: This type of failure is similar in nature to the compressive failure and it usually occurs due to a round cylindrical surface bearing on either a flat or a concave surface, like roller bearings in a race. The repeated loading may cause fatigue failures.

Tensile-yield-strength failure: This failure occurs under pure tensile stress, especially when the applied stress exceeds the yield strength of the member. Usually, it results in permanent deformation in the structure or a permanent set and is rarely catastrophic. The value of safety factors for this kind of failure could be as low as 1 (i.e., design to yield) or as high as 4 or 5 for pressure vessels and bridges.

Ultimate tensile-strength failure: This leads to a complete failure of the structure at a cross-section and occurs only when the applied stress is greater than the ultimate tensile strength. The values of safety factors for this kind of failure could be as low as unity (i.e., design to fail) or as high as 5 or 6 for pressure vessels and bridges.

Bending failure: This failure occurs when one outer surface is in tension and the other surface is under compression and may therefore be referred to as a combined failure. The tensile rupture of the outer material is representative of the bending failure.
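The tensile entries above tie each failure to the applied stress exceeding a strength value and quote typical safety factors. As a minimal sketch, assuming the safety factor is taken as the ratio of strength to applied stress (a common convention that the text does not spell out), the check could look as follows. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def safety_factor(strength, applied_stress):
    """Ratio of material strength to applied stress (both in the same units)."""
    if strength <= 0 or applied_stress <= 0:
        raise ValueError("strength and applied stress must be positive")
    return strength / applied_stress


def tensile_state(applied_stress, yield_strength, ultimate_strength):
    """Classify a member under pure tension using the two criteria in the text."""
    if applied_stress > ultimate_strength:
        return "ultimate tensile-strength failure"
    if applied_stress > yield_strength:
        return "tensile-yield-strength failure"
    return "no tensile failure"


if __name__ == "__main__":
    # Illustrative values in MPa; not taken from the chapter.
    yield_mpa, ultimate_mpa, applied_mpa = 250.0, 400.0, 50.0
    print(safety_factor(yield_mpa, applied_mpa))     # factor against yielding
    print(safety_factor(ultimate_mpa, applied_mpa))  # factor against rupture
    print(tensile_state(300.0, yield_mpa, ultimate_mpa))
```

A factor of 1 against yield corresponds to the "design to yield" case mentioned above; larger factors leave a margin between the applied stress and the strength.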
Fatigue failures: Repeated cycling of the load causes metal fatigue. It is a progressive localized damage due to fluctuating stresses and strains on the material and occurs mainly due to repeated loading or unloading (or partial unloading) of an item. The process consists of three stages, viz., crack initiation, progressive growth of the crack and finally sudden fracture of the remaining cross section.

Instability failure: This type of failure occurs in structural members such as columns and beams, particularly the ones manufactured using thin material, where the loading is usually in compression. However, instability failure may also result because of torsion or by combined loading, i.e., including bending and compression. Usually this type of failure is crippling or leads to a total failure of the structure. Experience indicates that a safety factor value of 2 is usually sufficient but a factor of less than 1.5 should not be used.

Shear loading failure: Ultimate and yield failures take place when the shear stress becomes greater than the strength of the material under high torsion or shear loads. Usually, these failures occur along a 45° axis in relation to the principal axis.

Creep failures: Usually, long-term loads make elastic materials stretch even when they are stressed less than the normal yield strength of the material. The material stretches (creeps) if the load is maintained continuously, which ultimately results in a rupture. Working at high temperatures accelerates creep.

Corrosion failures: Corrosion is a chemically induced damage to a material that results in deterioration of the material and its properties. This may result in failure of the component. Several factors should be considered during a failure analysis to determine the effect corrosion played in a failure, for example, the type of corrosion, the corrosion rate, the extent of the corrosion and the interaction between corrosion and other failure mechanisms.

Multiple Failure Modes: In fact a single mechanical member can fail in various different modes of failure. For example, the gear is one of the important and common components in mechanical equipment, and experience shows that amongst the various causes of gear failures, breakage (61.2%), surface fatigue (20.3%), wear (13.2%) and plastic flow (5.3%) are quite common.

19.1.5 Genesis of Failures

There can be several reasons why a product fails. Knowing the potential causes of failures is essential to prevent them. It is rarely possible to anticipate all causes of failures; therefore it is necessary to concentrate on the uncertainties involved in the reliability engineering effort during design, development, manufacture and use. It should address all the anticipated and possibly unanticipated causes of failure, to ensure that their occurrence is prevented or minimized. Broadly speaking, failures can occur if:
• The design is inherently weak. The list of possible reasons is endless, and every design problem presents the potential for errors, omissions and commissions. The more complex the design is, the greater is this potential!
• The stress applied exceeds the strength. Overstress failures can occur in spite of designers having provided some margin of safety. Electronic component specifications do prescribe the maximum rated conditions of application, and circuit designers ensure that these rated values are not exceeded. In other words, they derate the component by ensuring that the stresses in the worst conditions of use remain below the rated stress values. In the same way mechanical designers know the properties of the materials being used (e.g., ultimate tensile strength) and they ensure that there is an adequate margin of safety between the strength of the component and the maximum applied stress. However, it is not possible to provide protection against every possible stress application, as failures in most cases can also be caused by variation in stress and strength values, since there always exists some uncertainty about them. In fact this allows us to define reliability in terms of the stress and strength distributions, considering that they are random variables. To illustrate how the probability of failure can be computed from the knowledge of
distributions of stress and strength, let us assume that the stress applied to a member is normally distributed with a density function $f_s(s)$ with mean and variance given by $\mu_s$ and $\sigma_s^2$, respectively, and for the sake of simplicity, let us further assume that the strength of this member is determined from some non-destructive test and is determined to be $S_1$, a fixed value. Obviously, the member can fail only if the stress is greater than the given strength, i.e., $S_1$. This would mean that the probability of failure, $Q$, is given by

$$Q \equiv P(s > S_1) = \int_{S_1}^{\infty} f_s(s)\,ds \tag{19.1}$$

or the reliability of this member or unit would be:

$$R \equiv 1 - Q = 1 - \int_{S_1}^{\infty} f_s(s)\,ds = \int_{-\infty}^{S_1} f_s(s)\,ds = P(s \le S_1)$$

The definition (19.1) provides the unreliability of a component when the component has a known value of strength $S_1$ and is invariant. However when the variability of both stress and strength is considered, the probability of failure of the component is dependent on the area of overlap between the stress and strength distributions. The larger the area of overlap, the higher is the probability of failure of the member. Now let us assume both stress and strength are variable and $s$ and $S$ represent the stress and strength random variables, respectively. Also the location parameters for their distributions are $s_1$ and $S_1$ for stress and strength, respectively. Therefore, the probability density function for the stress would be given by:

$$f_s(s) = \frac{\alpha_s}{\beta_s}\left(\frac{s - s_1}{\beta_s}\right)^{\alpha_s - 1} \exp\left[-\left(\frac{s - s_1}{\beta_s}\right)^{\alpha_s}\right]$$

where $s_1$ is the location parameter and $\alpha_s$, $\beta_s$ are the shape and scale parameters for the stress distribution.

Similarly, the probability density function for strength can be:

$$f_S(S) = \frac{\alpha_S}{\beta_S}\left(\frac{S - S_1}{\beta_S}\right)^{\alpha_S - 1} \exp\left[-\left(\frac{S - S_1}{\beta_S}\right)^{\alpha_S}\right], \quad S_1 \le S < \infty$$

where $S_1$ is the location parameter and $\alpha_S$, $\beta_S$ are the shape and scale parameters for the strength distribution. Therefore, the probability of failure $Q$ would be given by:

$$Q = P\{S < s\} = \int_{-\infty}^{\infty} \left[1 - F_s(S)\right] f_S(S)\,dS \tag{19.2}$$

$$= \int_{S_1}^{\infty} \exp\left[-\left(\frac{S - s_1}{\beta_s}\right)^{\alpha_s}\right] \frac{\alpha_S}{\beta_S}\left(\frac{S - S_1}{\beta_S}\right)^{\alpha_S - 1} \exp\left[-\left(\frac{S - S_1}{\beta_S}\right)^{\alpha_S}\right] dS$$

If we substitute

$$\delta = \left(\frac{S - S_1}{\beta_S}\right)^{\alpha_S}, \quad \text{then} \quad d\delta = \frac{\alpha_S}{\beta_S}\left(\frac{S - S_1}{\beta_S}\right)^{\alpha_S - 1} dS$$

and $S = \delta^{1/\alpha_S}\beta_S + S_1$, we obtain $R = 1 - Q$, where

$$Q = P\{S < s\} = \int_{0}^{\infty} e^{-\delta} \exp\left\{-\left[\frac{\beta_S}{\beta_s}\,\delta^{1/\alpha_S} + \frac{S_1 - s_1}{\beta_s}\right]^{\alpha_s}\right\} d\delta$$

The integral in the above expression can be computed through numerical integration for various combinations of the parameters of stress and strength as suggested in [103]. Also, it is easy to realize that one can compute $Q$ for various combinations of distributions for stress and strength (other than Gaussian) such as exponential, Weibull, lognormal or extreme value distributions following the same procedure. This approach, which may provide closed form expressions in some cases whereas in other cases numerical integration may have to be carried out, is generally used for computing the probability of failure of mechanical and structural members. For a fuller discussion of the various cases, a reader is referred to chapter 4 of [103]. Some other approaches such as [132] have also been proposed.
• The actual strength of any population of components varies and there are bound to be some that are relatively strong, others that are relatively weak, however a majority of them will have average strength. The applied loads can also vary. Here again the failure will not
occur so long as the applied load does not exceed the strength or, in other words, as long as a high value of safety factor is used. However, if there is an overlap between the distributions of load and strength, and a load value in the high tail of the load distribution is applied to an item in the weak tail of the strength distribution so that there is overlap or interference between the distributions, then the failure will definitely occur.
• Failures can also be caused by wear out due to any mechanism or process that causes an item that is sufficiently strong at the start of its life to become weaker with the passage of time. Examples of such processes are: material fatigue, wear between relatively moving surfaces that are in contact, corrosion, insulation deterioration, and the wear out mechanisms of light bulbs and fluorescent tubes. Initially the strength is adequate to withstand the applied loads, but as time passes weakening occurs and the strength decreases. In every case the average value falls and the spread of the strength distribution widens. This is also the reason why it is difficult to provide an accurate prediction of the lives of such products.
• Failures can also be caused by time-dependent mechanisms. Examples of such mechanisms are creep caused by simultaneous high temperature and tensile stress, as in turbine discs and fine solder joints, battery run-down and progressive drift of component parameter values. Again a reader can refer to [103] for the time-dependent cases.
• Other causes of failure can be sneaks. A sneak is a condition in which the system does not work properly even though every part does. Sneaks also occur in software designs.
• There are many other potential causes of failures: they can occur on account of errors, incorrect specifications in designs or software coding, faulty assembly or tests, inadequate or improper maintenance, or misuse.

Therefore, all failures have some cause or the other and by proper anticipation, analysis and application, an engineer can reduce the chances of their occurrence. It is also true that all failures in principle can be prevented by identifying, studying and analyzing them. The O-ring seals on the Space Shuttle booster rockets were not classed as failures, until the ill-fated launch of Challenger. It is therefore necessary to know more about the reasons for failures and to anticipate them. A reliability engineer must be a hardcore pessimist when it comes to anticipating what can go wrong in a system and must take all precautionary or remedial measures against them, but having done that, he should hope for the best and be an optimist.

19.1.6 Classification of Failures

Failures can be classified very broadly based on the causes of failures over the life span of a product. Figure 19.1 shows a typical characteristic of failures over all the three regions of the life span. If the failures occur during the early life period, or what is known as the infancy period, they are known as early-life failures or quality failures. Any substandard item would surely fail during its early life. The responsibility, therefore, rightly rests with the manufacturer, as these failures can be attributed to his failure to use the right type of raw material or to control processes and to maintain high quality. As the weak or substandard components fail during early life, the hazard rate of the product decreases rapidly. Early-life failures can be eliminated through debugging processes, which consist of operating the product for quite some time under simulated conditions of actual use. This would cause a vast majority of the substandard products to fail in the early life, and the product hazard rate reaches its lowest point at the end of this period (0-t_E) as shown in Figure 19.1. Beyond the point t_E, the useful life period of the product begins.

The failures that occur during the useful life period are designated as catastrophic or chance failures and are unpredictable due to their nature itself. However, we can always quantify their likelihood statistically. Their rate of occurrence is generally constant during the entire life of a product. It is actually during this useful period of life that a product is utilized to its maximum. The catastrophic failures can't be eliminated either through good debugging techniques or using the
best maintenance practices, as they are caused by sudden stress accumulation beyond the designed strength of the product and can be minimized only by reliability improvement programmes. Usually, the hazard rate in the majority of cases can be approximated as constant, and the mean time to failure can be computed as its reciprocal. An application of a sudden stress in excess of design strength, or maintenance-induced failures, may also display a constant hazard rate pattern of failures.

Fig. 19.1: A Typical Bathtub Curve

If a product has worked for a sufficiently long period (say 0-t_w, in Figure 19.1), wear out sets in due to aging and the old age effect shows up, causing the hazard rate to increase. Failures during this last phase of life are designated as wear out failures. As time passes thereafter, death becomes more and more probable. In products involving mechanical parts, where moving parts are present, wear out sets in as soon as they are put to use. In fact, the age at which wear out sets in depends on the product and the environment under which it is functioning. Wear out failures can be minimized if the product is replaced at an appropriate time. In Figure 19.1, the mean wear out life, or the mean life of the product, is shown as (0-M). This, however, should not be confused with the mean time to failure, which is the reciprocal of the hazard rate during the useful life period if the failure distribution is exponential. Also, M has to be greater than t_w; the position of M depends on the failure distribution during the wear out period.

19.2 Problems of Concern in Reliability Engineering

In specific terms, the problems one is concerned about in reliability engineering are:
• How can one extend the useful life of a product?
• How can we minimize the chance failures of a product?
• How can one reduce the initial hazard rate and avoid product failures during the initial period?
The first two problems relate to the design and depend on material selection and the choice of derating and safety factors, with emphasis on prevention, reduction, or complete elimination of chance failures, which considerably improves the reliability in actual use. Here, one is also concerned about the overhauling or preventive replacement of the product during its design life. The third problem concerns the initial or infancy period, which may range from a few minutes to several hundred hours in certain cases. To eliminate the possibility of failures immediately after delivery, manufacturers generally conduct test-runs or debugging tests before delivery to ensure that quality failures are eliminated and the customer is assured of better product reliability backed by a product warranty.
Since failures can occur any time during the lifetime of a product, reliability [103] can as well be called a birth-to-death concern. Starting with preproduction activities like procurement of raw material and parts and the preparation of a conceptual design and a detailed engineering design, the product passes through production, test and quality control, shipment, and warehouse storage, and finally goes through the use and maintenance phase before being discarded at the end of its life.
Quality and maintainability are the activities aimed at improving the reliability of the equipment during the manufacture phase and the operation and maintenance phase of product life. Truly, a reliability management program necessarily concerns itself with improving the reliability of a product, right from its conception to disposal.
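As a small numerical illustration of the constant-hazard approximation described for the useful life period in Section 19.1.6, the sketch below assumes an exponential failure distribution, so that R(t) = exp(-λt) and the mean time to failure is 1/λ. The hazard rate value and the function names are hypothetical and chosen only for illustration.

```python
import math


def mttf_from_hazard(hazard_rate: float) -> float:
    """Mean time to failure for a constant hazard rate (exponential model)."""
    if hazard_rate <= 0:
        raise ValueError("hazard rate must be positive")
    return 1.0 / hazard_rate


def reliability(t: float, hazard_rate: float) -> float:
    """Probability of surviving to time t under a constant hazard rate."""
    return math.exp(-hazard_rate * t)


# Hypothetical value: 2 failures per 10,000 operating hours.
lam = 2e-4
mttf = mttf_from_hazard(lam)        # 1 / lam, i.e. 5000 hours
r_at_mttf = reliability(mttf, lam)  # exp(-1), roughly 0.37
```

Under this model only about 37 per cent of units survive up to the MTTF, which is one reason the MTTF of the useful life period should not be read as the mean wear out life M.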
One of the measures of reliability of a product over its entire life period is designated as operational reliability (R_o), which can be expressed as the product of the inherent reliability (R_i) and the use reliability (R_u), i.e., R_o = R_i · R_u. Inherent reliability is what is built into a product during the design and manufacture phase. We cannot expect better reliability of a product than what we build into it during its design and manufacture phase. Within the resources available, a reliable design should use all possible means, such as structural redundancy, derating, safety, environmental factors etc., for improving the product reliability. Having prepared the best possible engineering design, we can still lose on inherent reliability if we ignore the necessity of stringent quality control during manufacture. Quality in its simplest interpretation is reliability during the manufacturing phase and ensures that only proper materials, processes and quality control techniques have been used.
Further, the conditions of use, maintenance and environment often affect the use reliability. In fact, maintainability can be considered as reliability during the operation and maintenance phase. Unless proper packaging, instructions for storage, shipment and use, operating manuals, maintenance procedures, spare parts, training of maintenance personnel and conditions of environment are prescribed, it would be meaningless to expect a high value of use reliability.

19.2.1 Reliability is built during Design Phase

It is estimated [144] that 80 per cent of poor quality is caused by design. Over 90 per cent of field failures are the result of poor design and most product recalls have their origin in design. Most law suits are filed on account of improper design and 70-75 per cent of product costs are a function of design. Therefore, if any phase in the entire life-cycle of a product has maximum impact on reliability, it is the design phase.
Once a product has been conceived and functionally designed, a designer begins with product design from the performance consideration, which is not a straightforward process. It is an art based on the application of science. Analytical methods help only to select an alternative design or technology; thereafter the process is iterative and repetitive till the specified performance goals are achieved. Design requires ingenuity and creativeness and a designer ought to know what has already been tried before and did not work. Generally, a new prototype is not built because the designer knows that it will work better that way, but because he has no reasons why it will not work until it has been tried out.
In order to design reliability into a product, the reasons for product failures must be clearly understood. Generally, a product fails prematurely because of inadequate design features, manufacturing and parts defects, abnormal stresses induced during packaging or distribution, human error and maintenance error, or external conditions (environmental or operating) that may exceed the designed values. Usually at the design stage, only rough estimates of component reliabilities are available. The accuracy of these estimates depends upon how much data or information was available to the analyst and on his ability to use this information or data to improve upon the product. Consideration must also be given to the severity of the environment under which the system is to function. As the design progresses, more details usually become available and at that stage it is possible to refine both the apportioned goals and the reliability estimates. In the end, the final estimates would be based on the analysis of a detailed design, which incorporates failure rates of all components and parts and the results of all developmental, qualification and acceptance tests. The failure rates used in the estimates should be the measured failure rates of the components used in the system.
Inadequate attention to the design effort could result in faulty hardware, and retrofits rarely compensate for the original faulty design and may even be quite costly. They may create additional problems if the designer is not thorough with the art of handling them. Therefore, there is no substitute for a good design, and it is one of the major responsibilities of the top management to provide a highly competent design team to bring out a reliable product or system.
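The relation R_o = R_i · R_u introduced earlier in Section 19.2 gives a simple numerical reason why the design phase dominates: since the use reliability cannot exceed one, the operational reliability can never exceed the inherent reliability fixed during design and manufacture. The following is only a minimal sketch; the function name and the sample values are hypothetical.

```python
def operational_reliability(inherent: float, use: float) -> float:
    """R_o = R_i * R_u, with both factors expressed as probabilities."""
    for name, value in (("inherent", inherent), ("use", use)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} reliability must lie in [0, 1]")
    return inherent * use


# Hypothetical values: a well-designed product used under poor conditions.
r_i, r_u = 0.98, 0.90
r_o = operational_reliability(r_i, r_u)  # roughly 0.88

# Even perfect use conditions only recover the inherent reliability.
best_case = operational_reliability(r_i, 1.0)
```

In this sketch, improving storage, operation and maintenance can raise R_o at most up to R_i, whereas a design improvement raises the ceiling itself.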
In fact, reliability should be designed and built into products and the system at the earliest possible stages of product/system development. It is the most economical approach to minimize the life cycle costs of the product or system. One can achieve a better reliability for the product or system at much lower costs than otherwise, as the majority of life cycle costs are locked in phases other than design and development, which one pays later on during the product life for poor reliability effort at its design stage. For example, a typical percentage of costs in the various life cycle phases can be as given in Table 19.2.

Table 19.2: Life Cycle Costs

Life Cycle Phases        Percentage Costs
Concept/Feasibility      3
Design/Development       12
Manufacture              35
Operation/Use            50

Therefore, a sincere effort at the design stage offers the following advantages in that it:
• Ensures that the product meets the performance goals.
• Identifies potential failure mechanisms during product design.
• Provides an estimation of the product warranty costs and allows life cycle and warranty cost analysis.
• Optimizes benefits accruing from design alternatives.
• Finds the best reliability allocation to meet system reliability goals.
• Permits prediction of product reliability prior to finalizing changes.
For a reliability engineer, reliability during the product design and manufacturing phases should be of primary concern. Several options, such as parts and material selection, derating, stress-strength analysis, use of technology, simplification, and redundancy, are available to him to accomplish this. Very often, combinations of these options may be exercised in the actual design process as trade-offs are made between performance and costs. Each of these options shall be discussed in the following sections.
Reducing complexity: The number of parts in a system is a measure of system complexity. For better performance the complexity should be reduced as far as is possible. Simpler designs are invariably more reliable. Very often the parts are serially related with respect to the system reliability, so each part is required to have a very high reliability. Any alternative design that can help reduce the number of parts would definitely lead to a significant improvement in reliability. Thus the consideration of simplicity should be exercised throughout all phases of the design process. The necessity of all parts should be questioned and design simplifications should be employed wherever possible. A function that can be achieved by assembling a few parts with proven reliability would be considered more reliable than a design that uses a host of components to achieve more automated operation. Designing parts to serve more than one function or use may also reduce part counts.
In addition to parts minimization, a designer must explore the possibility of minimizing the part variation. Use of common parts and components can better control material quality and manufacturing tolerances. Also, as component parameters are known to drift over time, the designer must ensure that different tolerances do not combine in a way that is likely to degrade system functionality.
Derating: Derating consists of stressing a component significantly below its rated or designed value of stress and provides a very effective method of improving the reliability of a product. This is a commonly used practice in electronic equipment and in fact is synonymous with the concept of safety factor in mechanical or structural designs. Voltage and temperature are common derating stresses for electrical components. However, derating may be applied in the case of current or power as well. One can have derating of temperature, and effort should be made to keep the operating temperature as low as possible. This can be accomplished by using product heat transfer models such as reducing power, providing heat sinks and thermal conduction and radiators,
providing packaging with optimum airflow, and specifying the operating environment. Stress derating can also be resorted to in the case of mechanical systems.
Redundancy: If the system reliability goal is higher than the system reliability achieved by functional design of the system, it is necessary to improve system reliability through system structural redundancy. Also, when it is impossible to achieve the desired component reliability through inherent component design, redundancy may be the only alternative. Redundancy may also help achieve increased product reliability against external environmental stresses. Design trade-off analysis is generally carried out to get an optimal configuration due to the increased costs of additional components, the size or weight added to the system, and possibly the increase in repair and preventive maintenance necessary to maintain multiple components due to redundancy rather than just one. Redundant system configurations include both active and standby units, or there may be the more general k-out-of-m or partial, fractional or even voting redundancy. Redundancy also provides a possibility of repairing the failed units while the other redundant unit or units are operating.
Safety margins and system compatibility: A designer should provide the highest operating margin possible for product design. He can conduct preliminary product tests at stresses beyond the margin extremes experienced during operation. Test time constraints do require accelerated tests, but one should not overlook the fact that highly accelerated tests may sometimes present failure modes that may not be present during normal operation of a product under the specified conditions and thus may not reflect the product’s true reliability. It should also not be overlooked that the product user may use the product in an unspecified manner and usually will not inform the designer or producer how it was being used when a failure occurred, causing unnecessary improvement expense.
Vendor and Component selection: Vendor selection is critical for building reliability into a product, and therefore appropriate importance should be given to the purchase of reliable components. Purchasing components from a vendor with a proven reliability history goes a long way to enhance the equipment reliability. The electrical and electronic components should have been environmentally stress screened to achieve a higher (and specified) reliability. High reliability (Hi-rel) components usually have a higher price, but they help significantly reduce the overall failure rate of the product. Sometimes, a designer may have to choose between selecting standard parts and manufacturing specialized parts having perhaps greater tolerances and reliability.
In the design phase, one has complete freedom to choose the suppliers according to the options available for the choice of components based on technology or type. One should intelligently trade off between cost and reliability by buying reliable components that will ease the task of designing equipment with high reliability. Generally the trade-off is struck on the basis of cost, but ease of repair, parts availability, energy requirements, weight, and size may also be taken into consideration. Databases can be very helpful in selecting parts with a better reliability value among the competing parts.
Communication with the user: It is helpful and advantageous to be in communication with the customer/consumer for improving product reliability. This is also required to secure a negotiated value for product reliability instead of producing a costly design of perfect reliability, since the customer/consumer may be satisfied with a lower value of product reliability at a much lower cost rather than a product of high reliability at a very high cost.
Use of Alternate Technology: Sometimes, alternative technologies are available and the design engineer must explore the possibility of using an alternate technology which may provide the advantage of better reliability, as he may have considerable flexibility in meeting the design goals. Typical examples include electromechanical devices versus solid state devices and digital display versus analog display etc.
Better design tools and strategy: In the design of electrical and electronic equipment one can make use of computer-based circuit layout that will help minimize electronic noise and maximize electronic voltage margins in the physical design of the circuit. At the design level, one has the choice of using
materials that tolerate higher levels of temperature, humidity, and chemical corrosion, which would definitely provide an improved design. These materials, which can be considered a mechanical factor, are implicit in the PCB and component structures. For example, one can use a tantalum capacitor in a design rather than an electrolytic capacitor to decrease the failure rate and increase the operating life.
Similarly, in the design of mechanical equipment, Computer Aided Design (CAD) and Computer Aided Manufacturing (CAM) are used extensively these days, which facilitates producing innovative and creative systems. Tolerances can be made very tight using CNC machines, thereby helping to improve the quality of the product, and many more advanced manufacturing machines and tools are available to produce high precision products.

19.2.2 Failure Data

In reliability, we try to become wiser from past mistakes and the whole effort is to avoid such failures for which the causes have become known. Therefore, failure information or data is a must for any reliability improvement programme. The success of the reliability effort depends on the availability of good failure data that is complete and accurate. This would enable a reliability specialist to estimate and predict reliability accurately, take corrective measures, improve designs, plan the production process, operate properly, or even plan maintenance strategies well in advance. Therefore, failure data collection and storage is central to the success of any reliability management programme.
Scarcity and inaccuracy of reliability data has been one of the basic difficulties in the implementation of reliability improvement programmes. Although a number of reliability data banks have been established the world over, the quality of reliability data cannot be called satisfactory enough to support models more sophisticated than those employed today. Three types of data are especially important for evaluating product reliability. These are field (operational) failure data, service life data with or without failure, and the data from engineering tests. Field failures constitute a meaningful data source as they represent experience from the real world. However, the exact operational and environmental conditions before and at the time of failure may not be fully and exactly known. Nevertheless, it is helpful to have whatever data is reasonably possible to obtain for each operational failure.
Data on service life is necessary in assessing the time characteristic of reliability. It would be helpful to know how many units are in service, for what period of time and under what conditions of use, and how many units have failed and at what time. Certainly, answers to these questions would help obtain considerable insight into product performance. It would be further useful if the above information is supplemented by the results from engineering tests (accelerated life tests), since many of the parameters and test conditions can be controlled, or at least are known. As a result, this data is normally considered to be dependable for analysis purposes.
The problem of acquiring data is not an easy one. Although sufficient failure data has been generated and is available for electronic or electrical components and devices, not much published information is available on the failure of mechanical components, and in fact this lack of information, or of extrapolation of the available failure information to actual conditions of use, constitutes today the greatest deficiency in the field of reliability.
The fact that failure rate information is not published does not necessarily mean that it does not exist. The reality is that those who possess this information are often reluctant to publish or share it because it was not obtained under controlled conditions of experimentation and can easily be questioned. In cases where the data exists, the owners would not like to share it with others as they have spent quite a fortune in creating or developing it.
Although the situation is highly satisfactory in the electronic industry, where a great deal of failure information is available today, the published material does not make the reliability estimates of electronic systems any more accurate. This is primarily on account of the fact that each test
laboratory obtains a different rate for the same component depending upon the test environment, test equipment and the procedure. Therefore, it is very common that different part manufacturers publish different failure rates for the same parts. Moreover, these published failure rates are generally for catastrophic failures and do not include information on out-of-tolerance failures. Also, the published failure rates do not include the effect of degradation in reliability during manufacturing, storage, or transshipment.
Once the failure data is available and the environmental conditions are known, the data must be used to fit an appropriate failure distribution and the parameters of the distribution ascertained to represent the model [108,109,122,123,124,133]. The situation for an increasing hazard rate is not that simple, since there can be more than one candidate for fitting the distribution. There are techniques to do that and excellent texts, such as [128,136,146,147], are available that can be referred to.
There are situations when we design unique equipment or an assembly, or redesign a product for a new environment, for which no failure rates based on prior experience are available. In all such cases, it is often possible to establish a range for the unknown failure rate by comparison with existing similar systems. The analysis can then be carried out for the limiting (critical) failure rates in the range. Often such an analysis can be useful in establishing a design configuration. Depending upon the importance of the new component, such estimates may have to be substantiated by laboratory tests. Also, some knowledge of the correlation of laboratory failure rates with the actual service experience should be available.

19.3 Reliability Prediction Methodology

Reliability prediction is fundamental to system design and involves a quantitative assessment of system reliability prior to system development. In fact, reliability prediction [76] provides baseline values for reliability growth and demonstration testing, maintainability, supportability and logistics costs. Thus, prediction has several objectives such as:
• Feasibility evaluation
• Comparing competing designs
• Identification of potential reliability problems, and
• To provide input to other reliability and maintainability tasks.
Feasibility evaluation involves evaluating the compatibility of the proposed design concept with the design requirements. Early in the system design, a feasibility evaluation would typically involve the parts count type of prediction (MIL-HDBK-217 F, Appendix A) to determine the compatibility of the design concept with the required reliability. Feasibility evaluation may also include a detailed parts stress type analysis (MIL-HDBK-217 F, Sections 5-23) for components. Feasibility evaluation is usually critical for totally new system designs, where no similar experience exists as with systems with known reliability characteristics.
The activity of comparing competing designs is similar to the feasibility evaluation but for the fact that it provides one output, the predicted reliability, to be used in making broader system level design trade-off decisions involving factors such as cost, weight, power, performance etc. A parts stress type prediction is typically used to provide a quantitative assessment of the relative cost-benefit of system level trade-off considerations. Reliability predictions also provide a systematic means of checking all components and assemblies for potential reliability problems. They also offer a means of evaluating the reliability improvement in potential problem areas by focusing attention on low quality, over-stressed or misapplied parts. It should be emphasized that the prediction itself does not improve system reliability; it only provides a means for identifying potential problems that, if resolved, will lead to improved system reliability. Therefore, predictions provide an excellent ground for reviewing and evaluating the progress of system design ahead of testing.
Reliability predictions also provide key inputs to other R/M tasks such as maintainability analysis, testability evaluation and failure modes and effects analysis (FMEA). Because prediction identifies weak reliability spots, it provides key inputs to
weigh the benefits of adding test points, making locations more readily accessible for maintenance, or introducing redundancy to minimize the influence of a particularly critical failure mode.
Reliability prediction begins at the lowest level in the system hierarchy, i.e., at the component level, and proceeds through the intermediate level, which is the subsystem level, until system reliability is obtained. In general there is a hierarchy of reliability prediction techniques available, depending on the depth of knowledge of the design and the availability of collected reliability data on the equipment and any useful information on it.
The general procedure of prediction can be outlined as follows:
• Define the system and its operating conditions.
• Define system performance criteria (in terms of either success or failure).
• Develop a system model using Reliability Block Diagrams (RBD) or Fault Tree methodology.
• Compile a parts list for the system.
• Assign failure rates and modify generic failure rates using an appropriate procedure.
• Select a prediction procedure and perform the Parts Count or Parts Stress method for the system.
• Combine part failure rates.
• Estimate system reliability.
The main effort in the reliability prediction sequence is concentrated in carrying out the following tasks:
• A list of components is prepared for the product. The component reliability data is usually obtained from the applicable sources or relevant Data Banks.
• Reconciliation is often needed between the differing values of failure data available from various sources. A tentative and eventually an accepted value for each component is determined.
• Various operating stresses are usually projected for the preliminary design and the expected operating conditions are projected into the prediction of the failure rate of each component. Stress factors are generated as appropriate multiplying factors for the part failure rates for a specific application. One can improve the reliability of a system by carefully anticipating the intended environment of use. In fact, the system being designed may have to function in the presence of harsh environments like extreme temperatures, humidity, salt spray, moisture, dust, sand, gravel, vibration and shock, EMI etc.
• The product reliability can then be computed to provide a product level estimation of the reliability. This procedure, when applied repeatedly to new evolving products:
  • Becomes an accurate indicator of unreliable components in the new products,
  • Becomes an accurate indicator of total system reliability determined by catastrophic failure of components,
  • Becomes a product reliability document, eventually.
In fact, a component-level synthesis of product reliability is recommended for all serious product designs. The process becomes more accurate with repeated use and evolves into the reliability design tool for the system.

19.3.1 Standards for Reliability Prediction

Depending upon the nature and type of components, whether electrical, electronic or mechanical, several documents are available for the reliability prediction procedure. An engineer can select the model that is best suited to the part types and requirements of the component or product whose reliability is being predicted. It is important that predictions are made as precise as possible, since if the reliability predictions are too low then one may arrive at an over-design and the design can be costly. On the other hand, if the predicted values are too optimistic and high, the design could lead to catastrophic consequences. The standards for reliability prediction use a series of models for various categories of electronic, electrical and electro-mechanical components to predict failure rates that are affected by environmental conditions, quality levels, stress conditions and various other factors. Some of these
procedures can be found in the following documents:

MIL-HDBK-217: This standard has been the mainstay of reliability predictions for about 40 years. The most recent revision of MIL-HDBK-217 is Revision F Notice 2, which was released in February 1995 and has not been updated since then. This is the most widely used and accepted document for the prediction of electronic components and equipment reliability. Reliability prediction using this procedure in electronic design often results in a reduction in design lag time and savings in lifetime costs. MIL-HDBK-217 can be used for both commercial and military grade components. The handbook includes a series of empirical failure rate models developed using historical piece part failure data for a wide range of components. They predict reliability in terms of failures per million operating hours and assume an exponential distribution, which is justified in the case of electronic and electrical components. MIL-HDBK-217 allows one to perform a parts count analysis and a parts stress analysis. One can use parts count for quick estimates and early design analyses, or use the part stress method, which covers 14 separate operational environments (such as ground fixed, airborne inhabited etc.), takes actual temperature and stress information into account, and therefore offers a relatively accurate estimate of failure rate. Typical factors used in determining a part's failure rate include a temperature factor (π_T), power factor (π_P), power stress factor (π_S), quality factor (π_Q) and environmental factor (π_E), in addition to the base failure rate (λ_b).

To improve upon some of the handicaps of this standard, one can extend the parts count and parts stress methods by incorporating the mathematical models from the Telcordia procedure, which can also be used for the prediction of reliability of electronic components and equipment and allows the consideration of burn-in as well as laboratory and field data.

There are other standards that are useful in the reliability prediction process. They are MIL-STD-1629A and BS 5760 Part 5 for carrying out Failure Mode, Effects and Criticality Analysis (FMECA). The FMECA module provides interactive graphical aid in constructing block diagrams indicating logical connection between the system and its sub-assemblies and components on the other side. This graphical representation can be extended to relate failure modes at various system levels. For maintained systems, standards such as MIL-STD-472 can be helpful in estimating Mean Time to Repair (MTTR) for sub-systems and components.

Telcordia SR-332: The Telcordia method was originally developed by AT&T Bell Labs and modified by Bell Communications Research (Bellcore) to improve upon the representation of the mathematical equations of MIL-HDBK-217 in accordance with what their equipment was experiencing in the field. The main concepts in MIL-HDBK-217 and Telcordia are very similar, but Telcordia has the added ability to take into account burn-in, field, and laboratory testing, which makes it quite popular with commercial establishments. The current version of Telcordia is Issue 1, which was released in May 2001 and follows Bellcore Issue 6 in order of release. The Telcordia method also assumes an exponential failure distribution and calculates reliability in terms of failures per billion part operating hours, or FITs.

Telcordia also has the ability to perform a parts count or part stress analysis and also provides models for predicting the failure rates of units and devices during the first year of operation. In fact, the failure rate during this period is expressed as a multiplying factor (First Year Multiplier or FYM) operating on the predicted steady-state failure rate. The Prediction Module automatically calculates the First Year Multiplier based on specified system, unit and device burn-in times and temperatures.

The Telcordia Standard allows reliability predictions to be performed using three methods: the Method I parts count approach applies when there is no field failure data available, Method II provides a modification to Method I to include laboratory test data, and the Method III variation includes field failure tracking. Method I includes a First Year Multiplier to account for infant mortality. Method II includes a Bayes weighting procedure that covers three approaches depending on the level of previous burn-in the part or unit has undergone. Method III also includes a Bayes weighting procedure but is based on three different cases
depending on how similar the equipment is to the one from which the data was collected. For the most widely used Method I, where the burn-in varies, the steady-state failure rate depends on the basic part steady-state failure rate and the quality, electrical stress and temperature factors as follows:

$$\lambda_{SS_i} = \lambda_{G_i}\,\pi_{Q_i}\,\pi_{S_i}\,\pi_{T_i} \qquad (19.3)$$

Telcordia offers ten different Calculation Methods. Each of these methods is designed to take into account different information, which relates to stress data, burn-in data, field data, or laboratory test data.

Comparison between MIL-HDBK-217 and Telcordia: Since MIL-HDBK-217 was the original standard for reliability prediction analyses, it is known and accepted worldwide, whereas Telcordia is primarily accepted in the United States. Although its popularity is gradually growing internationally, it has not been completely accepted by the international community as yet. Moreover, when AT&T Bell Labs developed the Bellcore (now Telcordia) method, they concentrated primarily on commercial equipment. Telcordia was specifically designed to focus on telecommunications, whereas MIL-HDBK-217 was much more broad based in scope, had no specific focus, and is useful for both military and commercial equipment.

Although the bases of the Telcordia and MIL-HDBK-217 calculations are very similar, it is often observed that the calculations in Telcordia are more optimistic than calculations in MIL-HDBK-217. Moreover, Telcordia calculations require fewer parameters for components. This however does not mean that Telcordia failure rates shall always be better. It simply indicates that, depending on the component types, the difference in failure rates can be predicted.

As stated earlier, Telcordia offers the additional capability of considering burn-in data, laboratory test data, and field data. This feature is extremely helpful in calculating failure rates that are based on historical data, rather than simply stress data. In addition, burn-in data is used to quantify the first year multiplier, which is indicative of infant mortality.

One of the differences is that MIL-HDBK-217 calculates failure rates in failures per million hours, whereas the Telcordia model calculates failure rates in failures in time (or FITs), which is expressed as failures per billion hours. MIL-HDBK-217 provides models for printed circuit boards, lasers, SAWS, magnetic bubble memories, and tubes, whereas Telcordia does not support these parts. Telcordia provides models for gyroscopes, batteries, heaters, coolers, and computer systems. However, these part types are not supported by MIL-HDBK-217. Nevertheless, one can always use MIL-HDBK-217 for the majority of the parts in the analysis and use Telcordia for those part types that are not supported by MIL-HDBK-217 (or vice versa).

Since Telcordia was initially designed for use in the telecommunications industry, the operating environments that Telcordia supports are very limited. For example, Telcordia initially supported only three different variations of Ground-based environments. But Telcordia is fast evolving and in the most recent issue of Telcordia, additional operating environments of Airborne, Commercial and Space, and Commercial have been made available. MIL-HDBK-217, on the other hand, has always offered a number of different operating environments. Currently, MIL-HDBK-217 supports a variety of ground, sea, air, and space environments.

In MIL-HDBK-217, the quality levels that are used differ from one part type to another and they are derived from specific data that is component dependent. Therefore, the quality levels for resistors are different from the quality levels for semiconductors, and the quality levels for semiconductors are different from the quality levels for integrated circuits. But the assignment of quality levels in Telcordia is very simple and it currently supports four standard quality levels. These quality levels are identical for all component types, and are simply based on the origin and screening of components.

PRISM: This procedure has been developed by the Reliability Analysis Center (RAC), and was released in 2000. It provides the ability to update predictions based on test data and addresses factors such as development process robustness. PRISM
interfaces directly with RAC's automated databases and provides methodology to assess the quality of the system development process. It has been incorporated into the Relex Reliability Prediction module for carrying out system reliability analysis and MTBF prediction. It has the ability to model the effects of thermal cycling and dormancy. It allows one to select parts from both the Electronic Parts Reliability Data (EPRD) and Non-electronic Parts Reliability Data (NPRD-95) documents published by RAC and permits the use of predecessor data and process grading factors in the reliability analyses.

The Bayesian approach has facilitated the use of test and field data at the assembly level to enhance predicted component failure rates with real-life experiences. PRISM includes means to deal with software reliability but is limited by the fact that it does not yet include models for all commonly used devices. The system reliability model presented by PRISM is:

$$\lambda_S = \lambda_{IA}\,(\pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} + \pi_E \pi_G + \pi_S \pi_G + \pi_I \pi_E + \pi_N + \pi_W \pi_E) + \lambda_{SW} \qquad (19.4)$$

where λ_IA is the initial assessment failure rate (based on RACRates component failure rate models) for the system and the other factors involved address parts processes (π_P), infant mortality (π_IM), environment (π_E), design processes (π_D), reliability growth (π_G), manufacturing processes (π_M), item management processes (π_S), induced processes (π_I), no-defect processes (π_N), and wear-out processes (π_W). Also λ_SW is the software failure rate. Quantitative values for the individual factors are determined through an interactive process intended to benchmark the extent that measures known to enhance reliability are used in design, manufacturing and management processes.

NPRD-95: This document provides failure rates for a wide variety of items, which include mechanical and electromechanical parts and assemblies. This document provides detailed failure rate data on over 25,000 parts for numerous part categories grouped by environment and quality level. Because the data does not include time-to-failure, the document is forced to report average failure rates to account for both defects and wear out. Cumulatively, the database represents approximately 2.5 trillion part-hours and 387,000 failures accumulated from the early 1970's through 1994. The environments addressed include the same ones covered by MIL-HDBK-217; however, data is often very limited for some environments and specific part types.

NSWC-98/LE1: This document is also known as the Handbook of Reliability Prediction Procedures for Mechanical Equipment (NSWC-98/LE1) and was primarily developed by the United States Navy. This document uses a series of modules for several categories of mechanical components to predict the failure rates that are affected by stresses, flow rates, temperature, and several other factors. In fact it provides models for various types of mechanical devices including springs, bearings, seals and gaskets, electric motors including motor windings, brushes, armature shaft etc. It also deals with brakes and clutches, compressors, threaded fasteners, mechanical couplings and slider-crank mechanisms. This is a relatively new standard and also contains reliability models for solenoids, gears and splines, valves, actuators, pumps, filters etc.

CNET 93: It is a document developed by France Telecom and provides reliability models for a wide range of components. CNET 93 is a comprehensive model similar to MIL-HDBK-217, which provides a detailed stress analysis.

RDF 2000: It is a newer version of the CNET UTEC80810 procedure developed by UTE and has not received much attention in the US but has the potential of eventually becoming an international standard. It uses cycling profiles and their applicable phases to provide a completely different basis for failure rate calculations. The models take into account power on/off cycling as well as temperature cycling and are very complex, with predictions for integrated circuits requiring information on ambient and printed circuit ambient temperatures, type of technology, number of transistors, year of manufacturing, junction temperature, working time ratio, storage time ratio, thermal expansion, number of thermal cycles, thermal amplitude of variation, application of the
device, as well as transistor technology related and package related base failure rates.

HRD5: It is a reliability prediction procedure developed by British Telecommunications plc. and provides models for a wide range of components. In general, HRD5 is similar to CNET 93, but provides simpler models and requires fewer data parameters for reliability analysis.

Document 299B: This document is based on the Chinese standard GJB/z 299B and its English translation has been done by Beijing Yuntong Forever Sci.-Tech. Co. Ltd. The document 299B is very similar to the MIL-HDBK-217 reliability procedure and permits one to take temperature and stress information into consideration.

Physics-of-Failure Procedure: This is a family of approaches that differ significantly from the other empirical methodologies and attempt to identify the weakest link of a design to ensure that the equipment reliability exceeds the design value. The methodology ignores the problem of defects escaping the manufacturing process and assumes that the product reliability is constrained by the predicted life of the weakest link. The models address problems such as microcircuit die attach fatigue, bond wire flexure fatigue and die fatigue cracking. The models require detailed device geometry and material properties. It is however used primarily at the sub-device level during the early design stage in electronic system reliability predictions.

The IEEE Gold Book: viz., IEEE STD 493-1997 is available for the design of reliable Industrial and Commercial Power Systems and provides data on equipment used in industrial and commercial power distribution systems, besides dealing with reliability analysis, probability methods, power system reliability evaluation, economic dispatch and cost of power outage data. This document was updated in 1997, although the most recent data is from 1989.

19.3.2 Prediction Procedures

The task of predicting reliability of a product is straightforward if reliability data of the product is available either from the field or from testing laboratories or from the data banks. But usually, if a product that is being designed is new, there may not be enough information available from any of the above-mentioned sources, and the reliability engineer then has to use some alternative procedure. There are a variety of reliability prediction procedures that can be used in such a situation.

Similar Product Technique: This approach is used to estimate the reliability of a new product based upon the known reliability of an existing product with similar attributes. These attributes can be the type of technology used, digital circuitry, the complexity of the product and the operating environmental conditions and quality of the product.

Similar Complexity Technique: In this approach, the reliability of the product under design is estimated by comparing its relative complexity with a known product of similar complexity.

Prediction by Function Technique: This approach uses the correlations between function and reliability in order to predict the reliability of a new design.

However, there are two basic empirical techniques that are often found useful in the task of product reliability estimation, viz.,
• Parts count prediction method.
• Parts stress prediction method.
Failure data for both these methods are available in the latest release of MIL-HDBK-217F if the product belongs to the electrical or electronic category. These methods are also used with modification in the case of mechanical equipment, particularly under certain conditions of use.

19.3.2.1 Parts Count Method:

This method is used during preliminary design and development and consists of preparing a list of each generic part type, such as capacitors, resistors, and so on in an electronic circuit, and their numbers used in the design of an electrical or electronic equipment. It assumes that these components are reasonably fixed and overall design complexity is not expected to change substantially during later stages of design,
development and prediction process. The parts count method is generally used in the early design phase as sufficient information is not available then, and a full stress method cannot be used. MIL-HDBK-217 provides a set of default tables, which provide a generic failure rate (λ_g) for each component type based upon the intended operational environment. This component generic failure rate is modified by a quality factor (π_Q), which represents the quality of the components in question, i.e., whether the components are manufactured and tested to a full Military Standard or to a lesser commercial standard. In addition to this, a learning factor (π_L) is also used for micro-electronic circuits and represents the number of years that the component has been in production. Thus, using the component generic failure rate for a given environment and modifying it for quality and, in the case of microelectronics, the learning factor, the component's final failure rate is established. The summation of the individual components' failure rates will yield the overall failure rate for the circuit. With this and the summation of other Circuit Card Assemblies, the failure rate of a line replaceable unit can be established.

Finally, following this process an overall failure rate for the system can be established. This approach assumes constant failure rate of each component, assembly, equipment and the system. The system/sub-system failure rate can be obtained by summation of part failure rates assuming that all components are in series. In case the model consists of non-series components or redundant units, the item reliability can still be determined either by considering only series elements of the model as an approximation or by summing part failure rates for the individual elements and calculating an equivalent series failure rate for the non-series elements of the model.

19.3.2.2 Parts Stress Method:

It is an accurate method of system reliability prediction other than measuring reliability under actual or simulated conditions. This method is usually used when the design is almost completed and the detailed parts list and parts stresses are known. The approach requires a detailed knowledge of all the stresses, such as temperature, humidity, vibration etc., to which each part shall be subjected under actual conditions of use, and of their effect on the part's failure rate. The part stress analysis also assumes that all times to failure of parts are exponentially distributed. The data necessary for the parts stress analysis include: specific part types (including device complexity of microelectronics), quantity of parts, part quality levels, environment of use, and part operating stresses.

To enable a full stress analysis to be conducted there must be sufficient details on the system's design available to the reliability engineer. In short, the design of the system must be such that the electronic and electrical design of the hardware is known down to the component level. There will be detailed parts lists and circuit schematics, as these are required to take into consideration the electrical and thermal stresses that may be experienced by various components. With the detailed knowledge of the electrical and electronic design and the specific data pertaining to each component type used within the design, the stress analysis can be done. Component information, or data sheets, is usually available from the manufacturer and these days this information can also be obtained over the Web. In the case of a parts stress analysis, the mathematical models are available in MIL-HDBK-217 for each component type, i.e., microelectronics, resistors, capacitors and electro/mechanical devices.

The most general approach used by MIL-HDBK-217 is that each generic component type is assigned a base failure rate (λ_b), which is modified by influence factors. These include π_E, referred to as the environmental factor, which is based upon the intended operating environment of the equipment or system, and π_Q, referred to as the quality factor, which is a general look-up factor that represents the quality of the component in question. There are other factors as well that are used for multiplying the generic failure rate. These factors should all be known or can be determined based upon the state-of-hardware for which the part stress method is being applied, and assumptions regarding the various factors associated with the failure rates can be justified. Some of the factors will result in a linear
degradation of the base failure rate while others may result in an exponential degradation, in particular, the factors associated with temperature. There are many environmental factors for which the generic failure rate can be modified; however some of the environment factors used in MIL-HDBK-217 are: Ground Benign (GB), Naval Shelter (NS), Airborne, Uninhabited Cargo (AUC), and Space Flight (SF), the details of which can be found in the MIL Standard.

Therefore, the part failure models may vary with different part types. But their general form is:

$$\lambda_i = \lambda_B\,\pi_E\,\pi_A\,\pi_Q\,\pi_N \qquad (19.5)$$

where,
λ_B = Base part failure rate; the value is obtained from the part stress data for each generic part type. The data is generally presented in the form of failure rate against normalized stress and temperature factors. The value of λ_B is usually determined by the stress level (current, voltage etc.) at the expected operating temperature.
π_E = Environmental factor; this factor accounts for the influence of environment other than temperature and is related to other operating conditions, such as vibration, humidity etc.
π_Q = Quality factor; this factor accounts for the degree of manufacturing control with which the part was fabricated and tested before shipment to the user.
π_N = Additional adjustment factor; this is a factor which can account for cyclic effects, construction class and other factors that modify data.

Modification for Mechanical Components: Unfortunately, mechanical parts typically do not follow the exponential failure distribution or constant failure rate. Instead they exhibit wear out characteristics, or have an increasing failure rate with time. The procedure developed by RAC makes it possible to still use the standards for the prediction of reliability for mechanical components. While the actual time to failure distribution may be Weibull or lognormal, it may appear to be exponentially distributed over a long period of time. However, this is so only when the components are replaced upon failure. This condition is usually true for the vast majority of mechanical components [75].

19.3.3 Reliability Prediction for Mechanical and Structural Members

Mechanical components do wear out due to friction, overload, plastic deformation, fatigue, changes in composition due to excessive heat, corrosion, abuse etc. The traditional method of guarding against stress related failures has been to specify a safety factor greater than one, where the safety factor is defined as SF = S/s, SF being the nominal safety factor, S the nominal strength or allowable stress and s the maximum principal stress. This definition assumed that the values of stress and strength are known exactly and the difference between them provided the safety margin for overloading and reduction of strength over time due to corrosion, cracking etc. If a high value of safety factor is chosen, the design tends to become heavy, bulky and costly. Thus a designer needed to make a trade-off and use his experience and judgement in arriving at an adequate value of safety factor.

There is always uncertainty present in the measurements of strength as well as in stress calculations. Therefore, stress and strength must be treated as random variables and, following the principles of probabilistic design (as against the deterministic approach using safety factors) as introduced in equations (19.1) and (19.2), one can predict the reliability of a mechanical or structural member.

Probabilistic mechanical or structural component reliability [11,74,89,141,152] has been gaining importance of late. One reason is that today we have high performance materials due to advances in material technology, but these materials often possess detrimental side effects; the other reason is that the need for higher performance is constantly pushing operating stresses to higher levels. The probabilistic design model consists of four major activities, viz., design process, material production, manufacturing and operations. The output from the design process is the expected operating stress distribution resulting from load spectra. The remaining three activities provide the material
strength distributions, determined through Monte Carlo simulation of random variables representing random variation of incoming material strength, manufacturing defects and operational factors. The probability of failure can be computed from (19.2) as the probability of stress exceeding the strength. The integral can be computed numerically using the trapezoidal rule or more refined methods such as Simpson's rule or methods based on polynomials, such as the Laguerre-Gauss or Gauss-Hermite quadrature formulae.

Analytical methods such as FORM and SORM are also available to calculate the probability of failure. There are first order second moment (FOSM) and advanced first order second moment (AFOSM) methods. The method derives its name from the fact that it is based on the first order Taylor series approximation of the performance function linearized at the mean value of the random variable. In FOSM, the information on the distribution of random variables is ignored, whereas in AFOSM, the information on the distribution of random variables is taken into account. FOSM is also known as the mean value first order second moment method (MVFOSM). There are other methods such as the Hasofer-Lind (H-L) method [9], which basically is an AFOSM approach when the random variables are normally distributed. H-L methods are available for linear and nonlinear performance equations or limit state equations. One can determine the reliability index, which is basically the distance between the origin and the design point on the performance equation, in both cases.

Second Order Reliability Methods (SORM) take into account the curvature of the limit state equation by considering second order derivative terms in the Taylor series expansion, which are neglected in the FORM approach. Thus SORM is supposed to offer a more accurate approach. The SORM approach was first suggested by Fiessler et al [25] using quadratic approximation. A simple closed form solution for the probability computation using second order approximation was given by Breitung [50] using the theory of asymptotic approximations.

The reliability computation when limit states are implicit can be done using the analytical methods described earlier, but using a simple simulation technique it is possible to calculate the probability of failure for explicit or implicit limit state functions. In fact, to evaluate the accuracy of these sophisticated techniques or to verify a new technique, simulation is routinely used to independently evaluate the probability of failure.

A good description of all these approaches can be found in references [55,58] and also in chapter 63 included in this handbook.

19.4 System Reliability Evaluation

Since system reliability design, as we all know, is basically an iterative process and system reliability computation is repeatedly required at each cycle of iteration of the design procedure, reliability evaluation is a must for any system reliability design. This necessitates development of efficient and fast procedures for system reliability evaluation in order to economize on time required for system design. In fact a considerable amount of research work has been done to develop fast methods of system reliability computation.

System reliability computation basically involves a three-step procedure. In the first step, one develops a logical model for the system; in the second step, the system logical relationship is transformed into an algebraic system reliability expression in terms of component reliabilities; and in the third step, one substitutes the component reliabilities into the algebraic expression available for system reliability. Sometimes, computerized algorithms combine steps 2 and 3 and produce the system reliability value without transforming the logical relationship into an algebraic reliability expression. Reference [6] does exactly that for a series-parallel configuration and is the fastest method of computing system reliability. For non-identical parallel components, an algorithm [16] computes reliability very fast. If analytical procedures are not used, Monte Carlo methods [103,150,155] can be used to estimate system reliability.
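As an illustration of the three-step procedure of Section 19.4 and of the Monte Carlo alternative, the following minimal Python sketch evaluates a small series-parallel system both ways. The system layout (component A in series with the parallel pair B and C), the component reliabilities and all function names are assumptions made for this example; it is not the algorithm of reference [6] or [16]. Components are assumed to be statistically independent and two-state.

```python
import random


def series(*reliabilities):
    """Independent components in series: the block works only if all work."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(*reliabilities):
    """Independent components in parallel: the block fails only if all fail."""
    unreliability = 1.0
    for r in reliabilities:
        unreliability *= (1.0 - r)
    return 1.0 - unreliability


def system_reliability(r_a, r_b, r_c):
    # Steps 1 and 2: logical model "A AND (B OR C)" written as an
    # algebraic expression; step 3 is the substitution of numbers.
    return series(r_a, parallel(r_b, r_c))


def monte_carlo_reliability(r_a, r_b, r_c, trials=200_000, seed=1):
    """Estimate the same quantity by sampling component states."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    for _ in range(trials):
        a_ok = rng.random() < r_a
        b_ok = rng.random() < r_b
        c_ok = rng.random() < r_c
        if a_ok and (b_ok or c_ok):
            successes += 1
    return successes / trials


# Assumed component reliabilities, chosen only for illustration.
analytic = system_reliability(0.9, 0.8, 0.8)
estimate = monte_carlo_reliability(0.9, 0.8, 0.8)
print(f"analytic    = {analytic:.4f}")
print(f"monte carlo = {estimate:.4f}")
```

For these assumed values the algebraic expression gives 0.9 × (1 − 0.2 × 0.2) = 0.864, and the simulated estimate should agree with it to within sampling error, which shrinks roughly as the inverse square root of the number of trials.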
19.4.1 Reliability Modelling

Reliability modelling is the first important step in the system reliability evaluation. In order to model a system, it is always possible to decompose it into its constituent parts. For a complex system, the number of such parts may be quite large and a multi-level approach is always helpful to achieve this decomposition. A component by definition is the system constituent at the lowest level. Therefore, it may be necessary to model this multi-component system at various levels. The system can be maintained (where repairs of failed units are possible) or may just be a non-maintained system. Modelling for these two types is done differently.

In reliability modelling, we try to achieve a relationship between individual component or subsystem failures and the system failures for the known system objective. One can model system failure in two ways, viz., using either the black-box approach or the white-box approach. In the black-box approach the state of the system is described either in terms of two states (working/failed) or more than two states (multi-state model) without linking them to the components of the system. In the white-box approach the state of the system is specified in terms of the states of the various components. In the white-box approach we have the forward or bottom-up approach, in which one starts with the failure events at the part level and then proceeds to the system level to assess the consequences of such failures. Failure mode and effects analysis (FMEA) belongs to this type of modelling. In the other approach, which is known as the backward (top-down) approach, one starts at the system level and proceeds downwards to the component level to link the system performance to the failures at part level. Fault tree analysis belongs to this category.

Linking of system performance to the performance of its constituents can be done qualitatively and quantitatively. In the first case one builds up the logical relationship, whereas in the latter, one relates a measure of system performance (which may be system reliability) to the performance (reliabilities) of components. The logical relationship can be expressed graphically either by what is known as a Reliability Logic Diagram (RLD) in the success frame of reference or in the failure frame of reference through a Fault tree diagram, where the failure of the system is related to the failures of system constituents at various levels. Both the approaches have their own advantages in system analyses.

Fault tree analysis is a well-documented methodology and is being used extensively for assessing the safety and reliability of high risk systems such as those in the nuclear, chemical and aerospace industries. One can find a detailed discussion of this technique in chapter 38 of this handbook and in chapter 8 of [103]. There are other graphical relationships possible by which the same task can be achieved, viz., event trees, binary decision diagrams (BDD) [chapter 25 of this handbook], causal trees or digraphs [chapter 8 of 103] etc. Petri nets [22,38,39,66,117] and even neural nets [99] have been found useful in system reliability assessment or fault diagnosis programs. This handbook provides discussion of some of these topics.

The underlying assumption in reliability modelling is that each component or system can have only two states, which means that it is either good (in the working state) or bad (not working). Thus the model is a two-state model.

In fact three state models [103] have been widely discussed in the past and optimal design of these systems has also been discussed widely. Three state systems are those systems where two modes of failure, such as short circuit and open circuit failures in electrical passive devices, and a valve getting stuck open or stuck closed in mechanical systems, are considered in computing system reliability.

Recently, considerable attention is being accorded to the analysis of multi-state systems [139,142], where the usual assumption of binary states for components and system is shed to include multi-state components, which may be useful in considering degraded states in addition to working and failed states. Analysis and design of such systems using conventional methods becomes quite cumbersome, particularly when repairs are being considered. Reliability modelling of nano-scale devices [154] is also finding importance with nanotechnology getting wider acceptability. Therefore, new approaches such as becomes
naturally important and this handbook includes chapters 28, 29, 30 and 58 on the recent research in this direction.
Alternatively, a component can be modelled as a two-terminal passive device which allows a signal to pass through it from its input terminal to its output terminal if it is good and blocks it if it is bad. In other words, a component can be considered as a directed branch with its reliability as its gain. This modelling criterion helped to model a system as a graph-like structure known as a probabilistic graph and in fact facilitated the use of graph theory for system reliability evaluation. The first two papers using this approach were [5,6], published in 1970, and were trend setters for the use of graph theory in system reliability evaluation. The reference [5] was subsequently helpful in developing the topological method [23] of system reliability evaluation.
Application of graph theory [5] also led to the definition of path set and cut set. A path set is a set of those components whose success leads to system success, and this definition works well with the block diagram approach of system representation. Likewise, a cut set is a set of those components in a system whose failure yields system failure, and this definition works well with the fault tree approach. One also defines minimal path sets and minimal cut sets as sets which have the minimum number of components of the system. There can be very many path sets and cut sets in a system. These definitions and developments led to several competing system reliability evaluation techniques [103, 149].
It is needless to emphasize here that this modelling consideration is found particularly useful in performance evaluation of communication systems [63,72,151,153] or transportation systems or water supply systems with or without capacity of links specified [33,41,103]. One can define the system success, and thereby its reliability, if the communication system is able to communicate between the two specified terminals of the system. Therefore, it is aptly known as two-terminal reliability (or terminal-pair reliability) of the communication system. This leads us to define k-terminal [47,56,72] or n-terminal (or all-terminal) or global reliability of these systems, if it is possible to communicate between specified k terminals or all the n terminals of the system so that all terminals of the system are able to communicate with each other or remain connected with each other. Imperfect nodes were also considered by several papers including [27].

19.4.2 Structures of Modelling

Based on the modelling procedure, several system structures are obtained, such as series, parallel, series-parallel, parallel-series, standby, non-series-parallel, k-out-of-n. Again, the k-out-of-n model, which has been widely researched [18,19,28,32,45,52,62,83,84,85,92,104,105,115,116,129,130,134,145], has a family of its own, such as the consecutive-k-out-of-n model and many other models such as the circular consecutive-k-out-of-n model, circular m-consecutive-k-out-of-n model, circular r-within-consecutive-k-out-of-n model based on related definitions and possibilities. A good discussion of these can be found in [103]. Among all the series-parallel (or reducible networks or not non-series parallel) structures, the most widely used and general model is k-out-of-n:G, which consists of n components in parallel out of which at least k components should be good. Likewise one can define k-out-of-n:B or k-out-of-n:F, where at least k units must fail for the system to fail. In fact other models like series, parallel etc. can be derived as special cases of this k-out-of-n model. Therefore, it was considered important to include a discussion of this structure and to provide an indication of what the new trends in research in this area are. The load-sharing model (see chapter 20 of this handbook) is another realistic model of a parallel or k-out-of-n system, where the failure of any unit increases the hazard rate of the remaining units in parallel. There are models [17,103] which help compute system reliability with the assumption of dependency of failures to approach realistic situations where the analyst cannot ignore the dependency of failures.
Non-series-parallel system or non-reducible system is a more general category of models but is difficult to handle. However, methods are available to compute reliability of such systems.
Markov chains provide a modelling procedure for the availability or reliability modelling of maintained systems under various assumptions of
practical importance. Markov models that are discrete in states but continuous in time have been found very useful in analysing systems with repairs under varying repair strategies and support facility considerations. However, the main drawback with these models is the size of the Markov model, which grows very fast with the number of components in the system and therefore restricts its applicability to large systems. A good discussion of Markov modelling of maintained systems can be found in chapter 7 of [103], where approaches such as the state space, network and conditional probability approaches for maintained systems have been provided. A three-state Markov model is presented in [12, 103].
Several other reliability models have been developed based on conditions of use and the physical design constraints, but the basic consideration in developing a model is to provide means to compute its performance as realistically as possible under the assumptions made.

19.4.3 Obtaining Reliability Expression

System reliability evaluation methods can be classified in two broad categories, viz., those which employ path sets and cut sets for deriving the system reliability expression, and the other category, which uses recursive methods [7,8,59], network reduction [6,96,113], decomposition [21,82,96,112] or transformation techniques [29,34,36]. A good discussion of these techniques is available in [103]. A comparative assessment of earlier methods and error computation was provided in [13,14].
Early research in the evaluation methods was also oriented towards determining path sets and cut sets of a system economically, as the number of path sets and cut sets can be very large even for a system of moderate size. For example, a 33-element system could give as many as 1681 path sets, and unionizing them to obtain the algebraic system reliability expression can be time consuming and sometimes unwieldy from all accounts. In general, if there are m path sets of a system, the union of these path sets normally should yield an expression containing 2^m - 1 terms, of which many of course will combine, but one can roughly figure out the number of resulting terms by calculating 2^m - 1, which for m = 100 provides approximately a value of 1.27 x 10^30. One can imagine what it will be for m = 1681. One should not forget that it was a system with 33 elements only that yielded the number of path sets as 1681. What will be the number of path sets if the system is large, say has 1000 elements?
To overcome this problem, initial research was focused on obtaining directly the non-cancelling terms of the system reliability expression in uncomplemented terms. This led to the development of topological methods [5,20,23] of generating directly the terms of the reliability expression. But it can be easily visualized that the number of non-cancelling terms would not be small, and the situation was still far from satisfactory. For example, for a 7-element system [5] with 7 path sets, 29 non-cancelling terms were obtained, whereas 2^7 - 1 is 127.
This difficulty or handicap led to further research in the development of reduction and transformation techniques of obtaining the system reliability expression. For series-parallel systems decomposition is straightforward and one could use a computerized system decomposition technique such as suggested in [6], which incidentally computes the system reliability value without explicitly obtaining an expression for system reliability and is the fastest method of computing system reliability even today. Also, originally it was suggested for redundant systems with active redundancy but can consider any type of redundancy (viz., standby, active, partial, or voting). But for non-series-parallel systems or non-decomposing systems (which can’t be decomposed into series or parallel structures), Bayes theorem or the factoring theorem [5,6,65] has been effectively used to obtain the system reliability expression by factoring out one element at a time from the system, which generates two sets of systems (one with the element being factored out open circuited and the other with this element being shorted). The resulting configurations can be further used for decomposition by repeatedly applying the factoring theorem till one gets a series-parallel configuration. Simultaneous factoring of several elements is also possible [6]. Each time one factors an element, two configurations are
generated. But here again the computation time increases exponentially with the number of elements being factored out. Work on directed networks [49,71,94] has received considerable attention since the factoring theorem cannot be applied with directed components as with undirected elements.
Another direction of research focussed on transformation techniques [29, 34, 36], which are basically approximate methods but have been improved upon to provide reasonably accurate results from an engineering point of view.
Yet another direction, which is receiving a considerable amount of attention and in which a significant amount of work is being done, is the use of parallel processing algorithms [114] to reduce computation time and to develop the capability of handling very large systems.
In order to further improve the reliability evaluation procedure and to obtain a compact form of the system reliability expression involving complemented (unreliability) as well as uncomplemented variables (reliabilities), there were suggestions to generate disjointed path sets so that the reliability expression can be obtained just by summing up the probabilities of the disjointed path sets. This approach yielded far fewer terms in the system reliability expression than the expression that could be obtained using methods available at that time in terms of uncomplemented variables (representing component reliabilities). In fact the system reliability expression in terms of component reliabilities was often very lengthy even for a moderate size system and would require very many multiplications, which affected the accuracy of the system reliability value.
The first attempt to achieve a minimized system reliability expression using the concept of producing disjointed path sets from simple path sets in a general computerized algorithmic procedure was made in 1975 in [15], which is known as the AMG-algorithm. This was basically a single variable inversion technique. Later on, several improvements to the procedure were suggested, starting with Abraham's algorithm [26], and a large number of papers started appearing in the literature [31,40,51,53,60,61,67,69,70,81,90,102], each claiming to provide an ever-decreasing number of terms in a system reliability expression. This concept was further extended to include multi-variable inversion [95,111,137] and a highly compact form of system reliability expression. These procedures offered the advantage of a compact expression which not only saves computation time but can provide a more accurate numerical value of system reliability, as it involves fewer multiplications and handles fewer terms.

19.4.4 Special Gadgets and Expert Systems

Another important development in the reliability evaluation techniques has been in the area of development of special electronic gadgets or dedicated desk-top systems for determining the path sets or cut sets of a system. If we represent a system by a network of two-terminal components and pass a signal (generated by a signal generating device) through the system, then whenever the signal is able to pass through the system (where each element is in a particular on or off state) it simulates the system up-state, and this is analogous to creating a path set if we are able to record the state of each component. The digital part of this gadget would take care of the algebraic computation of reliability. This gadget is also useful in demonstrating to reliability beginners the idea of path sets and cut sets. The basic idea of the gadget (called a “reliability analyzer”) was introduced in [24,121] and was further developed in [35,37,43] using a microprocessor platform. Later on this idea was commercially exploited by Laviron [42] to develop a desk-top aid in the form of ESCAF. Of course these developments came about at a time (around the early 1970s) when the capabilities of digital computers had not expanded to the limits they now have. Today, we have very fast computers to do any job, and that too in the real-time domain.
Another related development was in the direction of developing expert systems [57,73,77,78,100,120]. In fact there are expert systems available to carry out Failure Mode and Effects Analysis (FMEA). Several expert systems have been developed for specific reliability programmes
in organizations and companies. For example, REMM [140] has been developed to help examine design decisions in the production of a new product in relation to dependability requirements imposed on the product and for environmental concerns for product operation, besides the expert’s opinion and views on various aspects. Another expert system, ARDA [110], helps analyse reliability data. Yet another expert system, RAMES [101], has been developed to assist a weapon system program in weapon system RAM performance analysis and enhancement and many other considerations.

19.5 Alternative Approaches

The main drawback of the probabilistic approach to performance evaluation of an engineering system is the uncertainty associated with the results, which puts the entire process to question. Specifying uncertainty by specifying mean and variance or confidence levels is not found to be the best way of estimating the quality of results. Although some intuitive methods of accounting, combination and propagation of uncertainty exist, these have not been found satisfactory by engineers. While Bayesian models have been extensively used, primarily as a numerical framework for representation and inference with uncertainty, this again masks the problem of uncertainty when priors are selected, and ignorance remains hidden in the priors. All these shortcomings of the probabilistic approach led researchers to look for alternative approaches to system performance assessment.
The conventional probabilistic approach treats failure events as random occurrences, and low probability of an event does not necessarily mean low possibility or that it will rarely occur. In fact the worst accidents in human history had very low probability evaluated for their occurrences. Zadeh [3] put forward fuzzy set theory, which appears to resolve these inconsistencies by offering a possibilistic approach to performance evaluation based on the premise that small probability does not always mean low possibility, whereas a low possibility necessarily should imply low probability. In fact there have been a large number of papers published in this area, which holds promise of providing an alternative approach to the probabilistic approach. Fuzzy set theory also helps to resolve the gap between the perceived risk (subjective) and the statistical risk (objective), and in the opinion of the editor of this handbook this is a very appropriate area of application of fuzzy set theory. In fact, fuzzy set theory has the capability of providing models close to human thinking or the brain, which has the asset of inferring and decision making in a fuzzy environment. There have been several papers [30, 46, 68, 79, 80, 91, 97, 98, 107, 118, 125] on system analysis and design based on fuzzy set theory, and it appears logical that this area will be explored extensively in the near future.
Another approach that is being actively pursued is the Dempster-Shafer theory [93,119], which may offer an interesting alternative approach to system performance assessment. This handbook therefore includes an expository chapter with a long list of references to familiarize a reader with these aspects of performance assessment.

19.6 Reliability Design Procedure

The usual design procedure a system designer follows is:
• Defines the system in terms of the operational requirements,
• Develops an index of system effectiveness and arranges the system into several non-interacting subsystems,
• Applies mathematical techniques to evaluate alternate system configurations in terms of reliability (or any other system effectiveness index) and constraints on resources,
• Specifies a system configuration, maintenance policy and inter-relationship with other factors,
• Allocates failure and repair rates to each individual component so as to meet the specified system reliability goals.
In general, reliability is considered to be a good criterion for system effectiveness. However, depending upon the mission, some other criteria may be selected for designs. Some of the other parameters that may be of interest are: availability or maintainability, probability of mission success,
mean time to failure, duration of single down time, or operational readiness, etc.
Any one of the above (or more) may form the criterion of optimal system design and therefore be traded off with some of the constrained resources such as cost, weight, power consumption, etc. For a non-maintained system, an index of reliability is generally sufficient as the measure of system effectiveness. But for maintained systems, it is not enough to base system design on a reliability criterion. For almost all the process systems or energy systems, the basic necessity is to know the percentage of time the systems are available. This establishes a criterion for the design based on the system availability; thus one may be interested in maximizing the system availability. Alternatively, one may rather be interested in comparing the design alternatives based on the duration of single down time or frequency of failures. Whatever may be the criterion of assessment of various alternatives for design, one should be able to build a mathematical model for the design problem that will fit into the present-day techniques of solution.
The above guidelines are by no means the ultimate rule; the actual design procedure would depend entirely upon the system under study. However, it is an established fact that, in order to arrive at an optimal design, the allocation process, either of reliability or redundancy, forms an integral part of the whole design procedure.
We have already seen in an earlier section that the use of redundancy or of high-reliability components is one of the few means of achieving a desired level of system reliability; therefore, the design includes an allocation of either redundancy or reliability or both in a system. For a maintained system design, usually maximization of availability is desired rather than system reliability. However, if the maximization of system reliability is the objective and the state of the art permits it, failure and repair rates are allocated to each component of the system in order to maximize its system reliability subject to some specified techno-economic constraints on the system design. Also, as we can’t stock any number of spares because resources are always limited, it becomes imperative to seek an optimal allocation for spare parts while maximizing system availability or reliability subject to constraints on cost, etc., or else one can minimize the cost of spares subject to a specified level of availability.
The subject of optimal system design has been discussed in two chapters, viz., chapter 32 and chapter 33, where various formulations of the system design problem and their solution techniques have been discussed in depth.

19.7 Reliability Testing

One of the important features of reliability engineering today, as distinct from the past when we could have talked qualitatively only, is that we are in a position to say and demonstrate how reliable a product is. This has been made possible by reliability testing and demonstration. Reliability testing is also a major source of reliability data. Testing is also required for design review, failure mode and effect analysis (FMEA), trade-off studies etc.
Component testing is an important activity in any reliability improvement program. Here tests are generally carried out to determine design margins and failure modes. Testing is also a must on all unproven components. Although component-level testing can provide basic design data and helps weed out weak designs, it is not adequate in comparison to product or system testing. System testing comprises testing the complete unit or product, which is tested as far as possible under actual stress and environmental conditions. For thorough effectiveness, system testing is done on an iterative basis, i.e., a test, fail, fix, retest cycle.
Reliability demonstration is done based on life tests. As such these tests are time-consuming and expensive. It is therefore imperative that we plan tests in such a way as to be able to obtain maximum information from these tests with the fewest items put to test. It is also important that the items to be tested and the performance of the tests be closely controlled in order to assure validity of tests. Very often products may be put to accelerated life tests and therefore may be tested at higher temperature, pressure or other stresses to reduce the test time. This may not be possible for many types of products, particularly when the
failure mode is likely to change with an increase in the stress level.
Usually the following tests are in vogue:

Development Tests: These are initiated at the time of first prototype assembly and constitute
• Performance tests, which are conducted at normal environment.
• Environment tests, which are done at the specified conditions of environment.
• Endurance tests, which provide data that determine the degree of degradation that results from operating the item over an extended period of time.

Qualification Tests: These tests are primarily designed to subject a product to various environmental conditions and operating stress levels which it is expected to encounter in use. The details of the tests are actually worked out on a mutual basis between producer and consumer and are witnessed by the agents of the consumer. Generally the decisions are of failure/success or go/no-go type, with no gradation between the two. For these tests the sample sizes are generally small.

Reliability Demonstration Tests: These tests are performed to demonstrate the product’s ability to perform its functions repeatedly and therefore the criterion is based on MTBF or MTTF. The purpose of these tests is to know how long a device will continue to function under the specific environment or loading conditions. For this purpose, the maximum number of samples within the economic limits is subjected to these tests.

Accelerated Life Test: Traditional “Life Data Analysis” involves analyzing times-to-failure data (of a product, system or component) obtained under “normal” operating conditions in order to quantify the life characteristics of the product, system or component. In many situations, and for many reasons, such life data (or times-to-failure data) is very difficult, if not impossible, to obtain. The reasons for this difficulty can include the long life times of today’s products, the small time period between design and release, and the challenge of testing products that are used continuously under normal conditions. Given this difficulty, and the need to observe failures of products to better understand their failure modes and their life characteristics, reliability practitioners have attempted to devise methods to force these products to fail more quickly than they would under normal use conditions. In other words, they have attempted to accelerate their failures.
Accelerated tests are necessary for obtaining life data quickly. Life testing of products under higher stress levels, without introducing additional failure modes, can provide significant savings of both time and money. Correct analysis of data gathered via such accelerated life testing can yield parameters and other information for the product’s life under use stress conditions.

Types of Accelerated Tests:
Different types of tests that have been called accelerated tests provide different information about the product and its failure mechanisms. Generally, accelerated tests can be divided into three types:

Qualitative Tests: In general, qualitative tests are not designed to yield life data that can be used in subsequent analysis or for “Accelerated Life Test Analysis.” In general, qualitative tests do not quantify the life (or reliability) characteristics of the product under normal use conditions. They are designed to reveal probable failure modes. However, if not designed properly, they may cause the product to fail due to modes that would not be encountered in real life. Qualitative tests have been referred to by many names including Elephant Tests, Torture Tests, HALT (Highly Accelerated Life Testing) and Shake & Bake Tests.

Elephant Tests: These tests are known by several other names such as design margin tests, design qualification tests, torture tests, shake and bake tests or killer tests. Generally the sample is small or just one, and the specimen is subjected to a single extreme value of stress or a thermal cycling or to a number of stresses, simultaneously or sequentially. If a product passes this test, the designer feels happy over the design, but if the product fails this test, the designer has to redesign the product or appropriate measures are taken to improve manufacturing. It is not necessary that an elephant test may produce a
failure that the product may encounter in actual use. Therefore it may be necessary to devise different elephant tests, such as a high voltage test to reveal electrical failure modes and vibration tests to reveal mechanical failure modes.

Environmental Stress Screening (ESS): ESS is a process involving accelerated testing of products under an environment such as random vibration and thermal cycling and shock and bake. The goal of ESS is twofold. The first is as an elephant test during development; its purpose is to expose design and manufacturing problems. The other is accelerated burn-in during manufacturing to improve reliability. MIL-HDBK-344 for electronic equipment and MIL-STD-883C for microelectronics are used.

Burn-In: Burn-in consists of running items under design or accelerated conditions. Burn-in can be regarded as a special case of ESS. Burn-in is a test performed for the purpose of screening or eliminating freak or marginal devices with inherent defects or defects due to manufacturing aberrations before customers receive them. If items fail early, they have manufacturing defects. Burn-in is generally used for electronic components and equipment. Static and dynamic burn-in are used on the devices depending upon their complexity and their failure mechanisms.
ESS and burn-in are performed on the entire population and do not involve sampling. ESS includes burn-in as one of its purposes. Although ESS has evolved from burn-in, it is a far more advanced process. ESS is an accelerated process of stressing a product in continuous cycles between pre-determined environmental extremes, mainly comprising temperature cycling and random vibrations. Burn-in therefore is a special case of ESS where the temperature change rate for thermal cycling is zero and vibration is sinusoidal if used. The classical book of Kececioglu and Sun [127] can be referred to by an interested reader on the subject of burn-in.

Quantitative Accelerated Life Tests: Quantitative Accelerated Life Testing, unlike the qualitative testing methods, consists of quantitative tests designed to quantify the life characteristics of the product, component or system under normal use conditions, and thereby provide information on the probability of failure of the product under use conditions, mean life under use conditions, and projected returns and warranty costs. It can also be used to assist in the performance of risk assessments, design comparisons, etc.
Accelerated Life Testing can take the form of “Usage Rate Acceleration” or “Overstress Acceleration.” For all life tests, time-to-failure information for the product is always required.

Usage Rate Acceleration: For products that do not operate continuously under normal conditions, if the test units are operated continuously, failures are encountered earlier than if the units were tested at normal usage. For example, if we assume an average washer use of 6 hours a week, one could conceivably reduce the testing time 28-fold (168 hours in a week divided by 6) by testing these washers continuously. Data obtained through usage acceleration can be analyzed with the same methods used to analyze regular times-to-failure data.

Overstress Acceleration: For products with very high or continuous usage, the accelerated life-testing practitioner must stimulate the product to fail in a life test. This is accomplished by applying stress levels that exceed the levels that a product will encounter under normal use. The times-to-failure data obtained under these conditions are then used to extrapolate to use conditions. Accelerated life tests can be performed at high or low temperature, humidity, voltage, pressure, vibration etc., and/or combinations of stresses to accelerate or stimulate the failure mechanisms. Accelerated life test (ALT) stresses and stress levels should be chosen so that they accelerate the failure modes under consideration but do not introduce failure modes that would never occur under use conditions. Normally, these stress levels should fall outside the product specification limits but inside the design limits. Usually it is assumed that only intrinsic failures due to wear-out exist, and this assumption may not be quite valid in certain cases, particularly in the presence of randomly occurring defects due to the manufacturing process. In those cases one may use the methodology described in [135].
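
The arithmetic behind the usage rate acceleration example (a washer used 6 hours a week and then tested continuously) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the 52-week year and the 10-year field life used below are assumptions made for the example and are not taken from the handbook.

```python
HOURS_PER_WEEK = 7 * 24  # continuous operation accumulates 168 hours per week


def usage_rate_acceleration(use_hours_per_week: float) -> float:
    """Ratio of operating hours accumulated per week under continuous
    testing to those accumulated per week under normal usage."""
    if not 0 < use_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("weekly use must lie in (0, 168] hours")
    return HOURS_PER_WEEK / use_hours_per_week


def continuous_test_weeks(field_years: float, use_hours_per_week: float) -> float:
    """Calendar weeks of continuous testing that accumulate the same
    operating hours as `field_years` of normal use (52-week year assumed)."""
    return field_years * 52 / usage_rate_acceleration(use_hours_per_week)


if __name__ == "__main__":
    factor = usage_rate_acceleration(6)       # washer used 6 hours a week
    weeks = continuous_test_weeks(10, 6)      # hypothetical 10-year field life
    print(f"acceleration factor: {factor:.0f}-fold")
    print(f"continuous test time for 10 field years: {weeks:.1f} weeks")
```

The factor depends only on the duty cycle, so, as the text notes, the failure times obtained this way can be analyzed with ordinary life data methods once they are expressed in operating hours.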
HALT and HASS: HALT and HASS are not meant to simulate the field environment; their sole purpose is to expose weak links in the design and processes using only a small sample size and in a very short time. The stresses are stepped up to well beyond the expected environment in actual use until the “fundamental limit of technology” is reached. Each weak link discovered provides an opportunity to improve the product design or the processes, which may lead to improved reliability of the product, reduced costs and reduced design time. It is basically ruggedization of the design, as a robust product exhibits higher reliability than a non-robust product.

HASS is a hundred per cent screening of production items using stresses that are higher than those met in normal use. In HASS, accelerated stresses are applied to production in order to shorten the time to failure of a defective unit, and thereby to shorten the corrective action time and reduce the number of units built with the same flaw. HASS is generally not possible without a comprehensive HALT. Without HALT, fundamental design limits will restrict the acceptable stress levels in production screens.

The originator of HALT and HASS is Gregg K. Hobbs [131], and a chapter by him on HALT and HASS is included in this Handbook.

Multiple Environment Overstress Tests (MEOST): MEOST [144] is claimed to provide powerful tools that can predict and correct potential field failures at the design stage of the product, and to reduce design test cost, both in labour and in the number of units required to demonstrate reliability, besides reducing design cycle times. In MEOST the objective is not to pass a product but to fail it, which highlights the weaknesses of the design. A single stress or environment is not sufficient to generate failures, and more than one stress applied sequentially is not enough to ferret out the interaction effects. Therefore, several stresses or environments are combined, simulating the field conditions as closely as possible, to create synergy of interaction effects. The combined overstresses go beyond the design stress level to a maximum possible overstress limit (MPOSL), and the rate of overstress is accelerated to produce failures in the shortest possible time. It is claimed that the Apollo Lunar Module was the first to be subjected to the MEOST methodology and that, since 1960, MEOST has been successfully applied to aircraft engines, helicopters, automobiles, aerospace equipment, etc.

19.8 Reliability Growth

The objective of reliability growth is to improve reliability over time during product development. This takes place primarily during the design and manufacturing phases of the product and is achieved by following a test-fix-test-fix cycle. Reliability tests are conducted on prototypes to check whether reliability goals have been met. If a goal is not met, failure analysis is conducted to identify the dominant failure modes, which are then eliminated or have their effects lessened in order to improve reliability. This cycle is repeated, and the failure data generated from the tests are plotted in the form of a growth curve called the reliability growth curve. As mentioned, this growth curve is obtained through continuous test, evaluation and redesign activity. The earliest reliability growth model was proposed by Duane [2], who observed that a plot of the logarithm of the cumulative number of failures per unit test time versus the logarithm of test time during growth testing is approximately linear. There are other growth models, such as Crow-AMSAA [10]. Crow observed that the Duane model could be stochastically represented as a Weibull process, and this stochastic extension became known as the Crow-AMSAA model, since it was first developed at the US Army Materiel Systems Analysis Activity. This model is used with systems whose usage is measured on a continuous scale. There are other models as well, such as the Lloyd and Lipow model [1], the Gompertz model [4], the Crow extended model [143] and the logistic model [87]. A good coverage of various other models is provided in [44, 106, 126].

References

[1] Lloyd DK, Lipow M. Reliability growth models. In Reliability Management, Methods and Mathematics. Prentice-Hall Inc. 1962; 330-338.

[2] Duane JT. Learning Curve Approach to Reliability Monitoring. IEEE Trans. on Aerospace 1964; 2(2): 563-566.
[3] Zadeh LA. Fuzzy sets. Information and Control 1965; 8: 338-353.
[4] Virene EP. Reliability growth and the upper reliability limit, How to use available data to calculate. Proc. Annual Symp. on Rel. (IEEE cat. no. 68 C33-R) Jan. 1968: 265-270.
[5] Misra KB, Rao TSM. Reliability analysis of redundant networks using flow graphs. IEEE Trans. on Rel. 1970; Feb., R-19(1): 19-24.
[6] Misra KB. An algorithm for reliability evaluation of redundant networks. IEEE Trans. on Rel. 1970; Nov., R-19(4): 146-151.
[7] Hansler E. A fast recursive algorithm to calculate the reliability of a communication network. IEEE Trans. on Com. 1972; June, Com-20(3): 637-640.
[8] Kershenbaum A, Van Slyke RM. Recursive analysis of network reliability. Networks 1973; 3: 81-94.
[9] Hasofer AM, Lind NC. Exact and invariant second moment code format. J. of the Engg. Mechanics Div. ASCE 1974; 100: 111-121.
[10] Crow LH. Reliability analysis for complex, repairable systems. In Reliability and Biometry. Proschan F, Serfling RJ (Eds.) SIAM, 1974; 379-410.
[11] Rao SS. A probabilistic approach to the design of gear trains. Int. J. of Machine Tool Design and Research 1974; 14: 267-278.
[12] Proctor CL, Singh B. A three-state system Markov model. Microelectronics and Rel. 1975; 14(5/6): 463-464.
[13] Aggarwal KK, Gupta JS, Misra KB. Reliability evaluation: A comparative study of different methods. Microelectronics and Rel. 1975; 14(1): 49-56.
[14] Aggarwal KK, Gupta JS, Misra KB. Computational time and absolute error comparison for reliability expression derived by various methods. Microelectronics and Rel. 1975; 14: 465-467.
[15] Aggarwal KK, Misra KB, Gupta JS. A fast algorithm for reliability evaluation. IEEE Trans. on Rel. 1975; April, R-24(1): 83-85.
[16] Balagurusamy E, Misra KB. Reliability and mean life of a parallel system with non-identical units. IEEE Trans. on Rel. 1975; R-24(5): 340-341.
[17] Balagurusamy E, Misra KB. Failure rate derating chart for parallel redundant units with dependent failures. IEEE Trans. on Rel. 1976; R-25(2): 122.
[18] Misra KB, Balagurusamy E. Reliability analysis of k-out-of-n: G system with dependent failures. Int. J. System Science 1976; Nov., 7(11): 1209-1215.
[19] Balagurusamy E, Misra KB. Availability and failure frequency of repairable m-order systems. Int. J. System Science 1976; Nov., 7(11): 1209-1215.
[20] Lin PM, Leon BJ, Huang TC. A new algorithm for symbolic system reliability analysis. IEEE Trans. on Rel. 1976; R-25(1): 2-15.
[21] DeMercado J, Spyratos N, Bowen BA. A method for calculation of network reliability. IEEE Trans. on Rel. 1977; R-25(2): 71-76.
[22] Sifakis J. Use of timed Petri nets for performance evaluation. 3rd Int. Symp. Beilner and Gelenbe, Editors. Measuring, Modeling and Evaluating Computer System 1977; 75-95.
[23] Satyanarayana A, Prabhakar A. New topological formula and rapid algorithm for reliability analysis of complex networks. IEEE Trans. on Rel. 1978; June, R-27(2): 82-100.
[24] Misra KB, Raja AK. A laboratory model for system reliability analyzer. Microelectronics and Rel. 1979; 19(3): 259-264.
[25] Fiessler B, Neumann H-J, Rackwitz R. Quadratic limit states in structural reliability. J. of the Engg. Mechanics Div. ASCE 1979; 105: 661-676.
[26] Abraham JA. An improved algorithm for network reliability. IEEE Trans. on Rel. 1979; April, R-28(1): 58-62.
[27] Gadani JP, Misra KB. Reliability evaluation of a system with imperfect nodes and links using network approach. Systems Science 1979; 5(3): 265-274.
[28] Kontoleon JM. Reliability determination of a r-successive-out-of-n:F system. IEEE Trans. on Rel. 1980; Dec., R-29(5): 327.
[29] Gadani JP, Misra KB. Network reliability evaluation of three-state devices using transformation technique. Microelectronics and Rel. 1981; 21(2): 231-234.
[30] Misra KB, Sharma A. Performance index to quantify reliability using fuzzy subset theory. Microelectronics and Rel. 1981; 21(4): 543-549.
[31] Locks MO. Recursive disjoint products: A review of three algorithms. IEEE Trans. on Rel. 1982; R-31(1): 33-35.
[32] Bollinger RC, Salvia AA. Consecutive-k-out-of-n: F networks. IEEE Trans. on Rel. 1982; April, R-31(1): 53-56.
[33] Misra KB, Prasad P. Comment on reliability evaluation of a flow network. IEEE Trans. on Rel. 1982; June, R-31(2): 174-176.
[34] Gadani JP, Misra KB. A network reduction and transformation algorithm for the assessment of system effectiveness indices. IEEE Trans. on Rel. 1981; April, R-30(1): 48-57.
[35] Bansal VK, Misra KB. Hardware approach for generating spanning trees in reliability studies. Microelectronics and Rel. 1981; 21(2): 243-253.
[36] Gadani JP, Misra KB. Quadrilateral-star transformation: an aid for reliability evaluation of large complex systems. IEEE Trans. on Rel. 1982; April, R-31(1): 49-59.
[37] Bansal VK, Misra KB, Jain MP. Minimal path sets and minimal cut sets using a search technique. Microelectronics and Rel. 1982; 22(6): 1067-1075.
[38] Hura GS. Petri nets as a modeling tool. Microelectronics and Rel. 1982; 22(3): 433-439.
[39] Molloy MK. Performance analysis using stochastic Petri nets. IEEE Trans. on Rel. 1982; Sep., R-31(9): 913-917.
[40] Bennetts RG. Analysis of reliability block diagrams by Boolean techniques. IEEE Trans. on Rel. 1982; June, R-31(2): 159-166.
[41] Aggarwal KK, Chopra YC, Bajwa JS. Capacity consideration in reliability analysis of communication systems. IEEE Trans. on Rel. 1982; June, R-31(2): 177-181.
[42] Laviron A, Carnino A, Manaranche. ESCAF - A new and cheap system for complex reliability analysis and computation. IEEE Trans. on Rel. 1982; R-31(4): 339-349.
[43] Bansal VK, Misra KB, Jain MP. Improved implementation of a search technique to find spanning trees. Microelectronics and Rel. 1983; 23(1): 141-147.
[44] Crow LH. Methods for reliability growth assessment during development. In Electronic Systems Effectiveness and Life Cycle Testing. Skwirzynski JK (Ed.) Springer-Verlag, 1983.
[45] Kenyon RL, Newell RJ. Steady-state availability of k-out-of-n:G system with single repair. IEEE Trans. on Rel. 1983; June, R-32(2): 188-190.
[46] Tanaka H, Fan LT, Lai FS, Taguchi K. Fault tree analysis by fuzzy probability. IEEE Trans. on Rel. 1983; Dec., R-32(5): 453-457.
[47] Satyanarayana A, Wood RK. A linear-time algorithm for computing k terminal reliability in series-parallel networks. SIAM Journal of Computing 1983; 14: 818-832.
[48] Agrawal A, Barlow RE. A survey of network reliability and domination theory. Operations Research 1984; May-June, 32: 478-492.
[49] Agrawal A, Satyanarayana A. An O(|E|) time algorithm for computing the reliability of a class of directed networks. Operations Research 1984; May-June, 32: 493-517.
[50] Breitung K. Asymptotic approximations for multinormal integral. J. of the Engg. Mechanics Div. ASCE 1984; 110: 357-366.
[51] Schneeweiss WG. Disjoint Boolean products via Shannon’s expansion. IEEE Trans. on Rel. 1984; Oct., R-33(4): 329-332.
[52] Fu JC. Reliability of a large consecutive-k-out-of-n: F system. IEEE Trans. on Rel. 1985; June, R-34(2): 127-130.
[53] Locks MO. Recent developments in computing of system reliability. IEEE Trans. on Rel. 1985; Dec., R-34(5): 425-436.
[54] O’Connor Patrick DT. Practical reliability engineering. John Wiley & Sons, Chichester, U.K., 1985.
[55] Madsen HO, Krenk S, Lind NC. Methods of structural safety. Prentice-Hall Inc. Englewood Cliffs, New Jersey, 1986.
[56] Wood RK. Factoring algorithms for computing k-terminal network reliability. IEEE Trans. on Rel. 1986; Aug., R-35(3): 269-278.
[57] Andrew PK. Improvement of operator reliability using expert systems. Int. J. of Rel. Engg. 1986; 14(4): 309-319.
[58] Melchers RE. Structural reliability analysis and prediction. Ellis Horwood, Chichester, U.K., 1987.
[59] Rai S, Kumar A. Recursive technique for computing system reliability. IEEE Trans. on Rel. 1987; April, R-36(1): 38-44.
[60] Beichelt F, Spross L. An improved Abraham method for generating disjoint sums. IEEE Trans. on Rel. 1987; April, R-36(1): 70-74.
[61] Locks MO. A minimizing algorithm for sum of disjoint products. IEEE Trans. on Rel. 1987; Oct., R-36(4): 445-453.
[62] Rushdi AM. Efficient computation of k-to-l-out-of-n system reliability. Int. J. of Rel. Engg. 1987; 17: 157-163.
[63] Rushdi AM. Performance indexes of a telecommunication network. IEEE Trans. on Rel. 1988; April, 37(1): 57-64.
[64] Yoo YB, Deo N. A comparison of algorithms for terminal-pair reliability. IEEE Trans. on Rel. 1988; June, 37(2): 210-215.
[65] Page LB, Perry JE. A practical implementation of the factoring theorem for network reliability. IEEE Trans. on Rel. 1988; Aug., R-37(3): 259-267.
[66] Hura GS, Etessami FS. The use of Petri nets to analyze coherent fault trees. IEEE Trans. on Rel. 1988; Dec., R-37(5): 469-474.

[67] Ball MO, Provan JS. Disjoint products and efficient computation of reliability. Operations Research 1988; Oct., 36: 703-715.
[68] Misra KB, Weber GG. A new method for fuzzy fault tree analysis. Microelectronics and Rel. 1989; 29(2): 195-216.
[69] Heidtmann KD. Smaller sums of disjoint products by subproduct inversion. IEEE Trans. on Rel. 1989; Aug., R-38(3): 305-311.
[70] Beichelt F, Spross L. Comments on: An improved Abraham method for generating disjoint sums. IEEE Trans. on Rel. 1989; Oct., R-38(4): 422-424.
[71] Page LB, Perry JE. Reliability of directed networks using the factoring theorem. IEEE Trans. on Rel. 1989; Dec., R-38(5): 556-562.
[72] Mandaltsis D, Kontoleon J. Enumeration of k-trees and their applications to reliability evaluation of communication networks. Microelectronics and Rel. 1989; 29(5): 733-735.
[73] Moureau R. FURAX: Expert system for automatic generation of reliability models for electrical or fluid networks. Proc. 7th International Conf. on Rel. and Maint., Brest, France 1990.
[74] Dumai A, Winkler A. Reliability prediction model for gyroscopes. Proc. Annual Rel. and Maint. Symp., Los Angeles, California, USA 1990; 5-9.
[75] Vannoy EH. Improving MIL-HDBK-217 type models for predicting mechanical reliability. Proc. Annual Rel. and Maint. Symp., Los Angeles, California, USA 1990; 341-345.
[76] Bowles JB, Klein LA. Comparison of commercial reliability prediction programs. Proc. Annual Rel. and Maint. Symp., Los Angeles, California, USA 1990; 450-455.
[77] Elliott MS. Knowledge-based systems for reliability analysis. Proc. Annual Rel. and Maint. Symp., Los Angeles, California, USA 1990; 481-489.
[78] Lehtela M. Computer-aided failure mode and effect analysis of electronic circuits. Microelectronics and Rel. 1990; 30(4): 761-773.
[79] Misra KB, Weber GG. Use of fuzzy set theory for level-1 studies in probabilistic risk assessment. Fuzzy Sets and Systems 1990; 37: 139-160.
[80] Onisawa T. An application of fuzzy concepts to modeling of reliability analysis. Fuzzy Sets and Systems 1990; 37: 389-393.
[81] Wilson JM. An improved minimizing algorithm for sum of disjoint products. IEEE Trans. on Rel. 1990; April, R-39(1): 42-45.
[82] Helman P, Rosenthal A. A decomposition scheme for the analysis of fault trees and other combinatorial circuits. IEEE Trans. on Rel. 1990; April, R-39(1): 76-86.
[83] Kuo W, Zhang W, Zuo M. A consecutive k-out-of-n: G system: The mirror image of a consecutive-k-out-of-n: F system. IEEE Trans. on Rel. 1990; June, R-39(2): 244-253.
[84] Rushdi AM. Some open questions on: Strict consecutive-k-out-of-n: F systems. IEEE Trans. on Rel. 1990; June, R-39(2): 380-381.
[85] Papastavridis S. m-consecutive-k-out-of-n:F systems. IEEE Trans. on Rel. 1990; Aug., R-39(3): 386-388.
[86] Politof T, Satyanarayana A. A linear time algorithm to compute the reliability of planar cube-free networks. IEEE Trans. on Rel. 1990; Dec., R-39(5): 557-563.
[87] Kececioglu DB. Reliability growth. In Reliability Engineering Handbook Ed. 4, Vol 2. Prentice-Hall, Englewood Cliffs 1991; 415-418.
[88] Clark WB. Analysis of reliability data for mechanical systems. Proc. Annual Rel. and Maint. Symp., Orlando, Florida, USA 1991; 438-441.
[89] Thien-My D, Lin Z, Massoud M. Mechanical strength reliability evaluation using an iterative approach. Proc. Annual Rel. and Maint. Symp., Orlando, Florida, USA 1991; 446-450.
[90] Noh S, Rai S. Experiment results on preprocessing of paths/cuts terms in sum of disjoint product techniques. Proceedings of the Infocom 1991; 533-542.
[91] Kenaranuie R. Event-tree analysis by fuzzy probability. IEEE Trans. on Rel. 1991; April, R-40(1): 120-124.
[92] Rushdi AM. Comments on: An efficient non-recursive algorithm for computing the reliability of k-out-of-n system. IEEE Trans. on Rel. 1991; April, R-40(1): 60-61.
[93] Inagaki T. Interdependence between safety-control policy and multiple sensor schemes via Dempster-Shafer theory. IEEE Trans. on Rel. 1991; June, R-40(2): 182-188.
[94] Theologu OR, Carlier JG. Factoring and reductions for networks with imperfect vertices. IEEE Trans. on Rel. 1991; June, R-40(2): 210-217.
[95] Veeraraghavan M, Trivedi KS. An improved algorithm for the symbolic reliability analysis of networks. IEEE Trans. on Rel. 1991; Aug., R-40(3): 347-360.
[96] Shooman AM, Kershenbaum A. Exact graph-reduction algorithms for network reliability analysis. Technical report, IBM TJ Watson Research Center, Hawthorne, New York, 1991.
[97] Onisawa T. Fuzzy reliability assessment considering the influence of many factors on reliability. IEEE Trans. on Rel. 1991; Dec., R-40(5): 563-571.
[98] Guth MAS. A probabilistic foundation for vagueness and imprecision in fault tree analysis. IEEE Trans. on Rel. 1991; Dec., R-40(5): 563-571.
[99] Karunanithi N, Whitney D, Malaiya YK. Using neural networks in reliability prediction. IEEE Software July/Aug. 1992; 9(4): 53-59.
[100] Zaitri CK, Keller AZ, Fleming PV. A smart FMEA (Failure Modes and Effects Analysis) package. Proc. Annual Rel. and Maint. Symp., Las Vegas, USA 1992; 414-421.
[101] Hansen WA, Edson BN, Larter PC. Reliability, availability and maintainability expert system (RAMES). Proc. Annual Rel. and Maint. Symp., Las Vegas, USA 1992; 478-482.
[102] Locks MO, Wilson JM. Note on disjoint product algorithms. IEEE Trans. on Rel. 1992; March, R-41(1): 81-84.
[103] Misra KB. Reliability analysis and prediction: A methodology oriented treatment. Elsevier Science Publishers BV, Amsterdam, 1992.
[104] Iyer S. Distribution of lifetime of consecutive k-within m-out-of-n: F systems. IEEE Trans. on Rel. 1992; Sept., R-41(3): 448-450.
[105] Boehme TK, Kossow A, Preuss W. A generalization of consecutive k-out-of-n: F systems. IEEE Trans. on Rel. 1992; Sept., R-41(3): 448-450.
[106] Xie M, Zhao M. On some reliability growth models with simple graphical interpretations. Microelectronics and Rel. 1993; 33(2): 149-167.
[107] Soman KP, Misra KB. Fuzzy fault tree analysis using resolution identity. Int. J. of Fuzzy Sets and Mathematics 1993; 1: 193-212.
[108] Soman KP, Misra KB. A least square estimation of three parameters of a Weibull distribution. Microelectronics and Rel. 1992; 32(3): 303-305.
[109] Soman KP, Misra KB. Moments of order statistics using the orthogonal inverse expansion method and its application in reliability. Microelectronics and Rel. 1992; 32(4): 469-473.
[110] Ansell J, Al-Doori M. ARDA: Expert system for reliability data analysis. Proc. Int. Conf. on APL 1993; 1-5.
[111] Veeraraghavan Malathi, Trivedi Kishor S. Multi-variable inversion techniques. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 39-73.
[112] Beichelt Frank. Decomposition and reduction techniques. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 75-114.
[113] Shooman Andrew M. Probabilistic graph-reduction techniques. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 117-163.
[114] Deo Narsingh, Medidi Muralidhar. Parallel algorithms and implementations. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 165-182.
[115] Rushdi Ali M. Reliability of k-out-of-n systems. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 185-221.
[116] Papastavridis SG, Koutras MV. Consecutive-k-out-of-n systems. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 228-242.
[117] Hura GS. Use of Petri nets for system reliability evaluation. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 339-364.
[118] Onisawa Takehisa, Misra KB. Use of fuzzy set theory (Part-II: Applications). In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 551-583.
[119] Inagaki T. Dempster-Shafer theory and its applications. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 587-623.
[120] Russomanno DJ, Bonnell RD, Bowles JB. Expert systems for reliability evaluation. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 625-651.
[121] Misra KB. Reliability analyzer. In Misra KB (Ed.) New Trends in System Reliability Evaluation. Elsevier, 1993; 653-700.
[122] Soman KP, Misra KB. On Bayesian estimation of system reliability. Microelectronics and Rel. 1993; 33(10): 1455-1493.
[123] Soman KP, Misra KB. Bayesian sequential estimation of two parameters of a Weibull distribution. Microelectronics and Rel. 1994; 34(3): 509-519.
[124] Soman KP, Misra KB. A simple procedure of computing variance sensitivity coefficients of top events in a fault tree. Microelectronics and Rel. 1994; 34(5): 929-934.
[125] Soman KP, Misra KB. Estimation of parameters of failure distributions with fuzzy data. Int. J. Systems Science 1995; 26(3): 659-670.
[126] Reliability growth-statistical test and estimation methods. IEC-1164 International Electrotechnical Commission, 1995.

[127] Kececioglu Dimitri, Sun F-B. Burn-in testing: Its quantification and optimization. Prentice Hall PTR, New Jersey, 1997.
[128] Meeker WQ, Escobar LA. Statistical methods for reliability data. John Wiley & Sons, New York, 1998.
[129] Zuo Ming J, Lin Daming, Wu Yanhong. Reliability evaluation of combined k-out-of-n:F, consecutive-k-out-of-n:F, and linear connected-(r, s)-out-of-(m, n):F system structures. IEEE Trans. on Rel. 2000; March, 49(1): 99-105.
[130] Zuo MJ, Lin Daming, Wu Y. Reliability evaluation of combined k-out-of-n:F, consecutive-k-out-of-n:F and linear connected-(r, s)-out-of-(m, n):F system structures. IEEE Trans. on Rel. 2000; March, 49(1): 99-104.
[131] Hobbs Gregg K. Accelerated Reliability Engineering: HALT and HASS. John Wiley & Sons, Chichester, U.K. 2000.
[132] Roy Dilip, Dasgupta Tanmoy. A discretizing approach for evaluating reliability of complex systems under stress-strength model. IEEE Trans. on Rel. 2001; June, 50(2): 145-150.
[133] Jones Jeff, Hayes Joe. Estimation of system reliability using a “non-constant failure rate” model. IEEE Trans. on Rel. 2001; Sept., 50(3): 286-288.
[134] Boushaba M, Ghoraf N. A 3-Dimensional consecutive-k-out-of-n:F models. Int. J. Rel. Qua. & Safety 2002; 9(2): 111-125.
[135] Kim CM, Bai DS. Analysis of accelerated life test data under two failure modes. Int. J. Rel. Qua. & Safety 2002; 9(2): 111-125.
[136] Kececioglu Dimitri. Reliability & Life Testing Handbook, Vol. I and II. DEStech Publications, Lancaster, Pa., U.S.A. 2002.
[137] Chaturvedi SK, Misra KB. An efficient multi-variable inversion algorithm for reliability evaluation of complex systems using path sets. Int. J. of Rel., Qua. and Safety Engg 2002; 9(3): 237-259.
[138] Chaturvedi SK, Misra KB. A hybrid method to evaluate reliability of complex system. Int. J. Qua. Rel. Management 2002; 19(8/9): 1098-1112.
[139] Levitin G. Reliability evaluation for acyclic transmission networks of multi-state elements with delays. IEEE Transactions on Rel. 2003; June, 52(2): 231-237.
[140] Jones JA, Marshall JA, Aulak G, Newman B. Development of an expert system for reliability task planning as part of REMM methodology. Proc. Annual Rel. and Maint. Symp. 2003; 423-428.
[141] Lee Jungwon Huh, Haldar SY, Yosu A. Reliability evaluation using finite element method. Fourth International Symposium on Uncertainty Modeling and Analysis ISUMA 2003: 28-33.
[142] Levitin Gregory. Reliability of multi-state systems with two failure-modes. IEEE Trans. on Rel. 2003; Sept., 52(3): 340-348.
[143] Crow LH. An extended reliability growth model for managing and assessing corrective actions. IEEE Proc. of Annual Rel. and Maint. Symp. 2004; 73-80.
[144] Bhote KR, Bhote AK. World class reliability: Using multiple environment overstress tests to make it happen. American Management Association, New York, 2004.
[145] Lin Min-Sheng. An O(k² log(n)) algorithm for computing the reliability of consecutive-k-out-of-n:F systems. IEEE Trans. on Rel. 2004; 53(1): 3-6.
[146] Nelson Wayne B. Applied Life Data Analysis. Wiley-Interscience, New Jersey, U.S.A. 2004.
[147] Nelson Wayne B. Accelerated testing: Statistical models, test plans, and data analysis. Wiley-Interscience, New Jersey, U.S.A. 2004.
[148] Xinjian Xiang, Dingfei-Ge. Research on reliability evaluation of electronic products under environmental conditions. Fifth World Congress on Intelligent Control and Automation 2004; June 15-19, 4: 3146-3149.
[149] Hongbin Li, Qing Zhao. A cut/tie set method for reliability evaluation of control systems. Proc. of the 2005 American Control Conference; 8-10 June, 2: 1048-1053.
[150] Wang H, Pham H. Reliability and optimal maintenance. Springer, London, U.K., 2006.
[151] Lin Yi-Kuei. System reliability of a limited-flow network in multicommodity case. IEEE Trans. on Rel. 2007; March, 56(1): 17-25.
[152] Turkkan Noyan, Pham-Gia Thu. System stress-strength reliability: The multivariate case. IEEE Trans. on Rel. 2007; March, 56(1): 115-124.
[153] Lin Yi-Kuei. Reliability evaluation for an information network with node failure under cost constraint. IEEE Trans. on Systems, Man and Cybernetics, Part A 2007; 37(2): 180-188.
[154] Bae Suk Joo, Kang Chang Wook, Choi Jung Sang. Quality and reliability evaluation for nano-scaled devices. IEEE International Conference on Management of Innovation and Technology 2006; June, 2: 798-801.
[155] Crespo Adolfo, Iung Marquez Benoît. A structured approach for the assessment of system availability and reliability using Monte Carlo simulation. J. of Qual. in Maintenance Engg. 2007; 13(2): 125-136.
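
Note on the Duane plot of Section 19.8: the log-log linearity attributed to Duane [2] can be sketched as a straight-line fit. The code below is a minimal illustration, assuming the common parameterization N(t)/t = K t^(-alpha), where N(t) is the cumulative number of failures at cumulative test time t. The symbols K and alpha, the function names and the failure times in the demo are assumptions introduced here, not taken from the chapter.

```python
import math


def fit_duane(failure_times):
    """Least-squares fit of log(N(t)/t) = log(K) - alpha * log(t).

    failure_times: cumulative test times of successive failures, ascending.
    Returns (alpha, K), the growth slope and scale of the fitted line.
    """
    xs = [math.log(t) for t in failure_times]
    # The i-th failure (1-based) gives a cumulative failure rate of i / t_i.
    ys = [math.log((i + 1) / t) for i, t in enumerate(failure_times)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


def cumulative_mtbf(t, alpha, k):
    """Cumulative MTBF implied by the fitted line: t / N(t) = t**alpha / K."""
    return t ** alpha / k


if __name__ == "__main__":
    # Hypothetical cumulative failure times (hours) from a growth test.
    times = [35.0, 110.0, 260.0, 480.0, 790.0, 1200.0, 1750.0]
    alpha, k = fit_duane(times)
    print("growth slope alpha:", alpha)
    print("cumulative MTBF at 2000 h:", cumulative_mtbf(2000.0, alpha, k))
```

A positive fitted alpha indicates that the cumulative failure rate is falling as testing proceeds, which is the growth the test-fix-test-fix cycle aims for. The Crow-AMSAA model [10] treats the same relationship stochastically and is usually fitted by maximum likelihood rather than by this graphical least-squares approach.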
