Modeling Failure-Data by Mixture of 2 Weibull Distributions: A Graphical Approach
R. Jiang
The University of Queensland, Brisbane
D.N.P. Murthy
The University of Queensland, Brisbane

Key Words - Failure data, Modeling, Mixed-Weibull distributions, Parameter estimation

Summary & Conclusions - The paper presents a graphical approach to decide on the appropriateness of a mixture of 2 Weibull distributions to model a given failure data set. It involves plotting the data on Weibull paper. The plots are completely characterized and the parameters estimated for various cases. Two examples involving real data illustrate the usefulness of the approach. The results are compared with those of Jiang & Kececioglu (1992) and the errors therein are pointed out.

1. INTRODUCTION

Graphical approaches have been used extensively in determining whether a particular distribution is suited to modeling a given set of failure data. Plotting paper for several distributions is available to carry this out [2]. The procedure is fairly straightforward. One typically plots the failure data on the plotting paper for the distribution selected. If the data fall roughly along a straight line, then the distribution selected is not judged inappropriate for modeling the data. In this case, one can estimate the distribution parameters from the plot.

Acronyms

J&K    Jiang & Kececioglu [1]
WPP    Weibull plotting paper.

Of particular interest is WPP as the 2-parameter Weibull distribution has been used to model a wide variety of failure distributions [3]. If the plotted data are scattered along a straight line, then the 2 parameters can be obtained from the slope & intercept. Often when the failure data are plotted on WPP, the points are scattered along a smooth curve rather than a straight line. Then the data cannot be modeled by a 2-parameter Weibull distribution. A common next step is to decide whether the data can be modeled by a 3-parameter Weibull distribution or a combination of 2 Weibull distributions. Two formulations that have received some attention are:
  - the mixture model [1, and its references],
  - the competing risk model [4].

This paper characterizes the plots of a mixture model for 2 Weibull distributions on WPP as a first step towards deciding on their suitability for modeling a given failure data set. There are 2 cases: the shape parameters of the 2 sub-populations are,
  - different, or
  - the same.
For both cases, parameter estimation depends on the relative ratios of the scale parameters for the 2 sub-populations. These issues are discussed in detail, and the approach is illustrated through two examples involving real data.

J&K characterize the WPP plots where the shape parameters for the 2 sub-populations are different. Their results contain several errors and hence differ from those presented here. We critically evaluate their paper and discuss the reasons why some of their results are incorrect.

Section 2 discusses the mixed Weibull distribution and its plotting on WPP. Sections 3 & 4 deal with WPP plots and parameter estimation where the shape parameters for the 2 sub-populations are different. Sections 5 & 6 give similar results where the 2 shape parameters are the same. Section 7 discusses the appropriateness of a mixture of 2 Weibull distributions to model a given failure data set. Section 8 presents two examples involving real data to illustrate the approach. Section 9 evaluates critically the results of J&K and lists the errors in their paper.

Notation

$f(t)$, $R(t)$        [pdf, Sf] for the mixed Weibull distribution
$i$                   index for sub-population; $i = 1,2$ unless otherwise specified
$f_i(t)$, $R_i(t)$    [pdf, Sf] of sub-population $i$
$\eta_i$, $\beta_i$   [scale, shape] parameter of $R_i(t)$, all are positive
$p$, $q$              mixing weight for sub-population [1, 2]; $p \in (0,1)$, $p + q = 1$
$\beta$               $\beta_1 = \beta_2$ (the special case)
$k$                   $(\eta_2/\eta_1)^{\beta}$  (41)
$x$                   $\ln(t)$
$y(t)$                $\ln(-\ln(R(t)))$
$y(x)$                $\ln(-\ln(R(e^x)))$
$\mathcal{C}$         curve: $y(x)$ vs $x$
$\mathcal{C}_j$       approximations to $\mathcal{C}$, $j = 1,2,3$
$L_i$                 plot for sub-population $i$ on WPP
$I$                   intersection of $L_1$ & $L_2$
$L_a$                 asymptote to $\mathcal{C}$ as $x \to -\infty$
$L_t$                 tangent to $\mathcal{C}$ at $x = x_I$
$A, B, C$             inflection points of $y(x)$ for $\beta_1 \neq \beta_2$
$T$                   inflection point of $y(x)$ for $\beta_1 = \beta_2 = \beta$
$x_Q$, $y_Q$          coordinates of point $Q$ in the x-y plane, $Q \in \{A, B, C, I, T\}$
$y'$, $y''$           [first, second] derivative of $y(x)$.

Other, standard notation is given in "Information for Readers & Authors" at the rear of each issue.
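The WPP transformation in the notation list can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard 2-parameter Weibull survivor function $R_i(t) = \exp(-(t/\eta_i)^{\beta_i})$, and the function names and parameter values are hypothetical.

```python
import math

def weibull_sf(t, eta, beta):
    """Survivor function R_i(t) of a 2-parameter Weibull (assumed form)."""
    return math.exp(-((t / eta) ** beta))

def mixture_sf(t, p, eta1, beta1, eta2, beta2):
    """R(t) = p*R_1(t) + q*R_2(t), with q = 1 - p."""
    return p * weibull_sf(t, eta1, beta1) + (1.0 - p) * weibull_sf(t, eta2, beta2)

def wpp_point(t, p, eta1, beta1, eta2, beta2):
    """Return (x, y) = (ln t, ln(-ln R(t))), the coordinates used on WPP."""
    r = mixture_sf(t, p, eta1, beta1, eta2, beta2)
    return math.log(t), math.log(-math.log(r))

if __name__ == "__main__":
    # Illustrative parameter values only.
    for t in (0.1, 0.5, 1.0, 2.0):
        x, y = wpp_point(t, p=0.4, eta1=1.0, beta1=1.25, eta2=0.2, beta2=2.75)
        print(f"t={t:4.1f}  x={x:+.4f}  y={y:+.4f}")
```

With $p = 1$ the mixture reduces to a single Weibull distribution, and the points fall on the straight line $y = \beta_1 (x - \ln(\eta_1))$, which is a convenient sanity check.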
L, is:
y = y ( t ) = ln(-ln(R(t))). (3)
curve C.
lim (s2(x)) = 0.
e is the straight line: x-
$\lim_{x \to \pm\infty} y'(x) = \beta_1$.
When $\beta_2 > \beta_1$, the intersection, $I$, of $L_1$ & $L_2$ is important.
(7)

An intuitive explanation for this asymptotic result is: $x \to -\infty$ corresponds to $t \to 0$, and $x \to +\infty$ corresponds to $t \to \infty$; since $\beta_2 > \beta_1$, the tail for $f_1(t)$ dominates the tail for $f_2(t)$ as $t \to 0$ and as $t \to \infty$. This implies that for very small and very large $t$, $f_1(t)$ dominates $f_2(t)$. As a result, for the asymptotic case, the mixture is effectively a single Weibull distribution and hence $\mathcal{C}$ is a straight line with slope $\beta_1$.
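The asymptotic slope can be checked numerically. The sketch below is illustrative only: it assumes Weibull survivor functions of the form $\exp(-(t/\eta_i)^{\beta_i})$, evaluates $\ln R$ in log space to avoid underflow at large $x$, and estimates $y'(x)$ by a central difference. The parameter values and function names are hypothetical.

```python
import math

def log_mixture_sf(x, p, eta1, beta1, eta2, beta2):
    """ln R(e^x), computed with a log-sum-exp so large x does not underflow."""
    a = math.log(p) - (math.exp(x) / eta1) ** beta1
    b = math.log(1.0 - p) - (math.exp(x) / eta2) ** beta2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def y(x, *params):
    """WPP ordinate y(x) = ln(-ln R(e^x))."""
    return math.log(-log_mixture_sf(x, *params))

def slope(x, *params, h=1e-4):
    """Central-difference estimate of y'(x)."""
    return (y(x + h, *params) - y(x - h, *params)) / (2.0 * h)

# Illustrative values with beta2 > beta1: (p, eta1, beta1, eta2, beta2).
PARAMS = (0.4, 1.0, 1.25, 2.0, 2.75)

if __name__ == "__main__":
    for x in (-8.0, 0.0, 8.0):
        print(f"x={x:+.1f}  y'={slope(x, *PARAMS):.5f}")
```

For these values the estimated slope is close to $\beta_1 = 1.25$ at both ends of the $x$ range, in line with the argument above.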
The appendix also shows that:
$ 1 . 2 ~ e x p ( ~ l . x ) / v P- exp(P2.x)/vP Using this in (8) yields the asymptotes to e as the two straight
q(x) ln(Ri(eX))+l,for i=1,2, and ‘null’ lines:
a P.01 + q.02.
$y = \beta_1 (x - \ln(\eta_1)) + \ln(p)$, as $x \to -\infty$. (17)

This section characterizes $\mathcal{C}$. Use (1) in (4) and simplify:

$\eta_1/\eta_2 = 1$:
$\eta_1/\eta_2 > 1$:
$R_1 \equiv R_1(e^x)$.
(17) as $x \to -\infty$:
$R(\eta_1) = p \exp(-1)$, $R(\eta_2) = p + q \exp(-1)$. (32)

The two special points on $\mathcal{C}$ are:

$x_1 = \ln(\eta_1)$, $y_1 = \ln(-\ln(p \exp(-1)))$; (33)

(1) can be approximated as:

$R(t) = p R_1(t)$; thus, (35)

$\mathcal{C}_2$: $y(x) = \ln(-\ln(p R_1(e^x)))$, (36)

$y' = \beta_1 / [1 + (\ln(p)/\ln(R_1))]$, (37)

$R_1 \equiv R_1(e^x)$.

The asymptotic slopes for $\mathcal{C}_2$ are:

$\lim_{x \to -\infty} y'(x) = 0$, $\lim_{x \to +\infty} y'(x) = \beta_1$. (38)

The asymptotes are straight lines: the horizontal line (39) as $x \to -\infty$, and (5) as $x \to +\infty$.

This implies that $\mathcal{C}_2$ is a close approximation to $\mathcal{C}$ for large positive values of $x$ (large $t$) and that $\mathcal{C}_2$ approaches a horizontal line for large negative values of $x$ (small $t$). Figure 6 shows $\mathcal{C}_2$ for a set of parameter values and confirms this.

At $C$, the slope of $y'$ is zero. The intersection of the horizontal line (39) with $\mathcal{C}$ defines a point which can be viewed as an approximation to $C$. Plots for a range of parameter values indicate that this approximation is reasonable. This yields,

If the data set is very large, then the WPP plot has sufficient points on either side of $I$ so that one can fit the asymptotes for $x \to \pm\infty$. However, when the mixture is well separated and $\eta_1 \ll \eta_2$ [$\eta_1 \gg \eta_2$] then most of the data are to the left [right] of $I$, and as a result, one can fit only the asymptote for $x \to -\infty$ [$+\infty$]. When $\eta_1 \approx \eta_2$, the data are scattered about equally on either side of $I$ so that one can fit both asymptotes. This implies that the estimation procedure for the well-separated case must be different from that for $\eta_1 \approx \eta_2$.

Once the data are plotted on the WPP [2], one can infer whether,
  - $\eta_1 \approx \eta_2$,
  - $\eta_1 \gg \eta_2$,
  - $\eta_1 \ll \eta_2$,
based on the scatter of the data relative to $I$.

The points are scattered on either side of the intersection point $I$.

Steps
1. Fit a smooth curve to approximate $\mathcal{C}$.
2. Draw $L_1$ & $L_a$.
3. Obtain $\beta_1$, $\eta_1$ from the slope and intercept of $L_1$.
4. Obtain $p$ from the vertical separation between $L_1$ & $L_a$.
5. Obtain $I$ from the intersection of $\mathcal{C}$ with $L_1$.
6. Fit $L_t$. From its slope, compute $\beta_2$ using (11).
7. Draw $L_2$ (slope $\beta_2$, passing through $I$) and obtain $\eta_2$ from the y-intercept of $L_2$.
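The approximating curve in (36) and its slope in (37) can be cross-checked numerically. The following sketch is an illustration under stated assumptions: it takes $\ln R_1(e^x) = -(e^x/\eta_1)^{\beta_1}$, compares the closed-form slope with a finite-difference slope, and evaluates the two limits in (38). The parameter values are hypothetical.

```python
import math

def c2_y(x, p, eta1, beta1):
    """Curve (36): y = ln(-ln(p * R_1(e^x))), using ln R_1 = -(e^x/eta1)**beta1."""
    return math.log(-math.log(p) + (math.exp(x) / eta1) ** beta1)

def c2_slope(x, p, eta1, beta1):
    """Slope (37): beta1 / [1 + ln(p)/ln(R_1)]."""
    ln_r1 = -((math.exp(x) / eta1) ** beta1)
    return beta1 / (1.0 + math.log(p) / ln_r1)

if __name__ == "__main__":
    p, eta1, beta1 = 0.4, 1.0, 1.25   # illustrative values
    for x in (-20.0, 0.3, 10.0):
        print(f"x={x:+6.1f}  y={c2_y(x, p, eta1, beta1):+.5f}  "
              f"y'={c2_slope(x, p, eta1, beta1):.6f}")
```

The slope tends to 0 for large negative $x$, where the curve flattens towards $y = \ln(-\ln(p))$, and to $\beta_1$ for large positive $x$.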
4.2 $\eta_1 \ll \eta_2$

Steps
1. Fit a smooth curve to approximate $\mathcal{C}$.
2. Locate $A$ and use (30) to obtain $q$ & $p$.
3. Obtain $\eta_1$ & $\eta_2$ from (23) & (24).
4. Draw the tangent at the left end of $\mathcal{C}$ to approximate $L_a$. Obtain $\beta_1$ from the slope of $L_a$.
5. Draw $L_1$, parallel to $L_a$ and separated by $\ln(1/p)$. The intersection of this with the fitted $\mathcal{C}$ gives $I$, and thus $x_I$.
6. Compute $\beta_2$ from (7).

[$p = 0.4$, $\eta_1 = 1$, $\beta_1 = 1.25$; $\eta_2 = 0.2$, $\beta_2 = 2.75$]
Figure 6. Shape of $\mathcal{C}$ & $\mathcal{C}_2$, and Inflection-Point $C$
482 IEEE TRANSACTIONS ON RELIABILITY, VOL. 44, NO. 3. 1995 SEPTEMBER
4.3 $\eta_1 \gg \eta_2$

Most of the data points are to the right of $I$. Hence, the estimation must be based on the properties of $\mathcal{C}$ to the right of $I$.

Steps
1. Fit a smooth curve to approximate $\mathcal{C}$.
2. Locate $C$. Use (40) to obtain $p$ & $q$.
3. Obtain $\eta_1$ & $\eta_2$ from (33) & (34).
4. Draw the tangent at the right end of $\mathcal{C}$ to approximate $L_1$, ensuring that it passes through $(\ln(\eta_1), 0)$. Obtain $\beta_1$ from the slope of $L_1$.
5. The intersection of $L_1$ with the fitted $\mathcal{C}$ gives $I$, and thus $x_I$.
6. Compute $\beta_2$ from (7).
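The well-separated approximations in (32), on which the special points of (33) rest, can be compared with the exact mixture values. The sketch below is illustrative: it assumes Weibull survivor functions of the form $\exp(-(t/\eta_i)^{\beta_i})$ and uses the parameter values that appear to belong to Figure 6 ($p = 0.4$, $\eta_1 = 1$, $\beta_1 = 1.25$, $\eta_2 = 0.2$, $\beta_2 = 2.75$); that reading of the garbled caption is an assumption.

```python
import math

def mixture_sf(t, p, eta1, beta1, eta2, beta2):
    """Exact R(t) = p*R_1(t) + q*R_2(t) for the two-Weibull mixture."""
    r1 = math.exp(-((t / eta1) ** beta1))
    r2 = math.exp(-((t / eta2) ** beta2))
    return p * r1 + (1.0 - p) * r2

# Assumed reading of the Figure 6 parameters (eta1 >> eta2).
p, eta1, beta1, eta2, beta2 = 0.4, 1.0, 1.25, 0.2, 2.75
q = 1.0 - p

exact_at_eta1 = mixture_sf(eta1, p, eta1, beta1, eta2, beta2)
exact_at_eta2 = mixture_sf(eta2, p, eta1, beta1, eta2, beta2)
approx_at_eta1 = p * math.exp(-1.0)       # first relation in (32)
approx_at_eta2 = p + q * math.exp(-1.0)   # second relation in (32)

if __name__ == "__main__":
    print(f"R(eta1): exact={exact_at_eta1:.6f}  approx={approx_at_eta1:.6f}")
    print(f"R(eta2): exact={exact_at_eta2:.6f}  approx={approx_at_eta2:.6f}")
```

For this scale ratio of 5 the first relation is essentially exact, while the second is off by about 0.05 because $R_1(\eta_2)$ is not yet close to 1. The approximation improves as $\eta_1/\eta_2$ grows.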
Use (2) & (3) in (43):

$y(x) = \beta x + \ln\left( \frac{1}{\eta_2^{\beta}} \left[ 1 + \frac{\ln(p R_2^{k-1} + q)}{\ln(R_2)} \right] \right)$ (44)

$y'(x) = p \beta s_1(x) + q \beta s_2(x) = \beta (p k (R_2(e^x))^{k-1} + q) s_2(x)$. (45)

$\lim_{x \to +\infty} (y(x) - \beta x) = -\beta \ln(\eta_2)$. (50)

$y(x) = \beta (x - \ln(\eta_2))$ as $x \to +\infty$. (52)

Eq (51) corresponds to $L_a$, and (52) corresponds to $L_2$. Figure 8 shows the asymptotes for a set of parameter values, and $L_1$ given by (5) with $\beta_1 = \beta$. The vertical separation between $L_1$ & $L_2$ is $\ln(k)$, and that between $L_a$ & $L_2$ is $\ln(p k + q)$; they increase with $k$.

[$p = 0.4$, $\eta_1 = 1.5$, $\eta_2 = 15$, $\beta = 2.75$]
Figure 8. Shape of $\mathcal{C}$
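The two vertical separations quoted for the equal-shape case can be verified numerically. The sketch below is an illustration, not the authors' derivation: it uses $R_1 = R_2^k$ with $k = (\eta_2/\eta_1)^{\beta}$, so that $-\ln R = u_2 - \ln(p e^{-(k-1)u_2} + q)$ with $u_2 = (e^x/\eta_2)^{\beta}$, and evaluates $y(x) - \beta(x - \ln(\eta_2))$ at both ends. The parameter values are the ones that appear to belong to Figure 8, which is an assumption.

```python
import math

def gap_to_l2(x, p, eta1, eta2, beta):
    """y(x) - beta*(x - ln eta2) for the mixture with beta1 = beta2 = beta."""
    k = (eta2 / eta1) ** beta
    u2 = (math.exp(x) / eta2) ** beta                      # -ln R_2(e^x)
    # -ln R, written with expm1/log1p so it stays accurate when u2 is tiny.
    neg_ln_r = u2 - math.log1p(p * math.expm1(-(k - 1.0) * u2))
    return math.log(neg_ln_r / u2)

# Assumed reading of the Figure 8 parameters.
p, eta1, eta2, beta = 0.4, 1.5, 15.0, 2.75
q = 1.0 - p
k = (eta2 / eta1) ** beta

if __name__ == "__main__":
    print("left-end gap  :", gap_to_l2(-8.0, p, eta1, eta2, beta))
    print("ln(p*k + q)   :", math.log(p * k + q))
    print("right-end gap :", gap_to_l2(8.0, p, eta1, eta2, beta))
    print("ln(k)         :", math.log(k))
```

The left-end gap matches $\ln(p k + q)$, the right-end gap tends to 0 as in (50), and $\ln(k) = \beta \ln(\eta_2/\eta_1)$ is the separation between the two sub-population lines.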
JIANG/MURTHY: MODELING FAILURE-DATA BY A MIXTURE OF TWO WEIBULL DISTRIBUTIONS 483
It is not possible to obtain the inflection points analytically. Figure 7 shows that there is only one inflection point. Plots for a range of parameter values indicate the same. This inflection-point is $T$ with coordinates $(x_T, y_T)$.

5.1 Well-Separated Mixture

Following J&K, the mixture is well separated when $k \gg 1$. For the two points of special interest on $\mathcal{C}$ (corresponding [...]

$R(\eta_2) = p \exp(-k) + q \exp(-1)$. (53b)

From (23),

4. Obtain $(p k + q)$ from the vertical separation between $L_2$ & $L_a$; this separation = $\ln(p k + q)$.
5. Draw a vertical line at $x = \ln(\eta_2)$; its intersection with $\mathcal{C}$ gives $y_2$. This yields $(p \exp(-k) + q \exp(-1))$ from (55).
6. From the results of steps 4 & 5, obtain $p$ & $k$. From these, $\eta_1$ is obtained using (41).
accept a mixture of Weibull distributions as an appropriate model when the data came from a different model.

This leads to the problem of incorrect selection of the model; the probability of this occurring is high when the data set is small. In such cases, other statistical approaches also encounter the same problem. This paper does not address this difficult problem. Rather, it allows the data to decide whether it is to be modeled by a mixture of 2 Weibull distributions based on a visual inspection of the plots on WPP.

This approach is consistent with data-dependent modeling, where one takes a pluralistic view in the sense that a given data set can be modeled adequately by more than one model formulation. Jiang & Murthy [5] discuss this issue in the context of comparing model formulations each involving 2 Weibull distributions.

The graphical [...] is a [...] point in [...]. The plot yields some insight for model selection and provides initial estimates. The graphical method does not provide any s-confidence limits for the estimates. More sophisticated statistical methods need to be used to [...] the model and to obtain more accurate parameter-estimates and s-confidence limits.

We are studying extensive simulations to evaluate the efficacy of the graphical approach for various sizes of data sets.

8. EXAMPLES

Notation

^ implies an estimate.

8.1 Example 1

The data are actual failures of the throttle for 25 pre-[...] from [6: exercise 10.19]. The data comprise both failure and right-censored observations. Here $t$ is the distance (km) before failure (or censoring) as opposed to time. Table 1 displays the [...]

TABLE 1
Data for Example 1

  i     t_i   Status#   Order no.    F(t_i)     R(t_i)    x = ln(t)      y
 10     944      0        7.2558    0.138012   0.861988     6.8401    -1.90708
 11     959      0        8.3227    0.159181   0.840819     6.8554    -1.75228
 12    1071      1          *          *          *         6.9763       *
 13    1318      1          *          *          *         7.1839       *
 14    1377      0        9.4458    0.181464   0.818536     7.2277    -1.60825
 15    1472      1          *          *          *         7.2944       *
 16    1534      0       10.6001    0.204367   0.795633     7.3356    -1.47571
 17    1579      1          *          *          *         7.3645       *
 18    1610      1          *          *          *         7.3840       *
 19    1729      1          *          *          *         7.4553       *
 20    1792      1          *          *          *         7.4911       *
 21    1847      1          *          *          *         7.5213       *
 22    2400      0       11.9468    0.231087   0.768913     7.7832    -1.33645
 23    2550      1          *          *          *         7.8438       *
 24    2568      1          *          *          *         7.8509       *
 25    2639      0       13.3932    0.259786   0.740214     7.8782    -1.20126
 26    2944      0       14.8396    0.288484   0.711516     7.9875    -1.07776
 27    2981      0       16.2860    0.317183   0.682817     8.0000    -0.96357
 28    3392      0       17.7324    0.345881   0.654119     8.1292    -0.85692
 29    3392      0       19.1788    0.374579   0.625421     8.1292    -0.75645
 30    3791      1          *          *          *         8.2404       *
 31    3904      0       20.6941    0.404646   0.595355     8.2698    -0.65663
 32    4443      1          *          *          *         8.3991       *
 33    4829      0       22.2891    0.436292   0.563708     8.4824    -0.55649
 34    5328      0       23.8841    0.467938   0.532062     8.5807    -0.46046
 35    5562      0       25.4791    0.499585   0.500415     8.6237    -0.36771
 36    5900      1          *          *          *         8.6827       *
 37    6122      0       27.1805    0.533343   0.466657     8.7196    -0.27160
 38    6226      1          *          *          *         8.7365       *
 39    6331      0       29.0128    0.569698   0.430302     8.7532    -0.17047
 40    6531      0       30.8451    0.606054   0.393946     8.7843    -0.07092
 41    6711      1          *          *          *         8.8115       *
 42    6835      1          *          *          *         8.8298       *
 43    6947      1          *          *          *         8.8461       *
 44    7878      1          *          *          *         8.9718       *
 45    7884      1          *          *          *         8.9726       *
 46   10263      1          *          *          *         9.2363       *
 47   11019      0       34.8761    0.686034   0.313966     9.3074     0.14710
 48   12986      0       38.9071    0.766014   0.233986     9.4716     0.37328
[...]  [...]     1          *          *          *         9.4806       *

#0: Failure; 1: Censored
*Implies not applicable
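The surviving text does not say how the plotting positions in Table 1 were computed. The surviving rows appear consistent with mean order numbers for multiply censored data (Johnson's method) combined with Benard's median-rank approximation $F = (\text{order} - 0.3)/(n + 0.4)$ with $n = 50$; for example, $(7.2558 - 0.3)/50.4 = 0.138012$. The sketch below implements that assumed procedure. The function names are hypothetical, and this is not presented as the authors' method.

```python
import math

def benard_median_rank(order, n):
    """Benard's approximation to the median rank (assumed plotting position)."""
    return (order - 0.3) / (n + 0.4)

def wpp_points_censored(times, censored):
    """WPP points for right-censored data.

    times: observations sorted in ascending order.
    censored: parallel list, True where the observation is right-censored.
    Returns a list of (t, order, F, x, y) for the failures only.
    """
    n = len(times)
    order = 0.0
    points = []
    for idx, (t, is_censored) in enumerate(zip(times, censored)):
        if is_censored:
            continue
        remaining = n - idx                        # items at or after this one
        order += (n + 1 - order) / (1 + remaining)  # mean order number update
        f = benard_median_rank(order, n)
        points.append((t, order, f, math.log(t), math.log(-math.log(1.0 - f))))
    return points

if __name__ == "__main__":
    # Tiny made-up data set: four items, the second one censored.
    for row in wpp_points_censored([10.0, 20.0, 30.0, 40.0],
                                   [False, True, False, False]):
        print(row)
```

Applied to a complete (uncensored) sample, the update reduces to the ordinary ranks 1, 2, ..., n.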
$\hat{p} = 0.88$, ($\hat{q} = 0.12$);
Ref [8] proposes a model with $\beta_1 \neq \beta_2$; its estimates are:
$\hat{\beta}_1 = 1.3$, $\hat{\eta}_1 = 9050$ (km);
$p = 0.5$;
TABLE 2
Data for Example 2 [8: p 193]

Thymic lymphoma (22 data points)
159 189 191 198 200 207 220 235 245 250
256 261 265 266 280 343 356 383 403 414
428 432

Reticulum cell sarcoma (38 data points)
317 318 399 495 525 536 549 552 554 557
558 571 586 594 596 605 612 621 628 631
636 643 647 648 649 661 663 666 670 695
697 700 705 712 713 738 748 753

Use the method of section 6.2:

$\hat{p} = 0.3$, $\hat{\beta} = 7.6$;

J&K studied the plots on WPP when item failure is modeled by a mixture of 2 Weibull distributions and $\beta_1 \neq \beta_2$. They use:
  - an analytic approach to characterize the asymptotic behavior of the plots,
  - a computational approach to obtain the shapes for a range of parameter values.
Based on this, they propose a classification scheme for the shapes of the plots. They comment on the estimation of the parameters of the model.

This section critically examines the J&K results and shows that:
  - some of their analytic results are incorrect,
  - their inferences based on computational studies are not correct,
  - some of their statements regarding parameter estimation are not valid.
Four main points are treated in sections 9.1 - 9.4.

9.1 Point 1

J&K deal only with $\beta_1 \neq \beta_2$. This paper deals with both cases, and as indicated herein, the two cases need to be treated separately.

9.2 Point 2

The main analytic results of J&K (proved in their appendix) are:

2A. $y'(x) \to \beta_1$ as $x \to -\infty$,
2B. $y'(x) \to \beta_2$ as $x \to +\infty$.

They prove 2A & 2B using a limiting argument which requires $\eta_1 \to 0$ and $\eta_2 \to \infty$; this derivation is not valid and can lead to wrong results. The results of our section 3 show that 2A is true and that 2B is false.

9.3 Point 3

For $p = 0.3$, J&K compute $\mathcal{C}$ for a range of parameter values. Based on their extensive computation, they classify 6 shapes of $\mathcal{C}$ (labeled A - F) and show them in [1: figures 1a - 1f]. Four comments on their plots are given in sections 9.3.1 - 9.3.4.

[...] analytic result which requires that $\mathcal{C}$ must have slope $\beta_2$ as $x \to \infty$.

9.3.4 Comment 3.4

Our sections 3 & 5 establish that there are only two distinct shapes for $\mathcal{C}$ over the interval $-\infty < x < +\infty$; the shape of $\mathcal{C}$ depends on whether $\beta_1 = \beta_2$ or not. J&K claim 6 different shapes; we comment on their plots [1: figures 1a - 1f] and show why this is not correct.

[1: figures 1a & 1b]. These are for $\eta_1 > \eta_2$. [1: figure 1a] shows the plot for values of $x$ to the right of $I$; if they had plotted the curve for values of $x$ to the left of $I$, then it would have been similar to [1: figure 1b]. The shape of both these plots is similar to that in our figure 4, in the sense that they are close to $L_2$ near $I$ and approach $L_1$ as $x \to \infty$. The interval of $x$ over which the curve flattens out (and is close to the horizontal line $y = \ln(-\ln(p))$, shown more clearly in our figure 6) depends on $\eta_1/\eta_2$. The larger the ratio, the greater is the flatness as shown in [1: figure 1a] relative to [1: figure 1b].

[1: figure 1c]. This is similar to our figure 3 as $\eta_1 = \eta_2$.

[1: figures 1d - 1f]. These show the plots for values of $x$ to the left of $I$ and correspond to $\eta_1 < \eta_2$. Had J&K plotted the curve for values of $x$ to the right of $I$, then they would have obtained shapes similar to that in our figure 2. The only difference in [1: figures 1d - 1f] is the interval over which the plots are relatively flat and close to the line $y = \ln(-\ln(q))$. The degree of flatness increases as $\eta_1/\eta_2$ decreases as shown in [1: figures 1d - 1f].

Discussion. As argued in the 3 previous paragraphs, the shapes of the [1: plots] are similar for all combinations of $\beta_1/\beta_2$ ($\neq 1$) and $\eta_1/\eta_2$. The only thing that is different is the flatness of the plot near the inflection point. The inflection point is to the right of $I$ when $\eta_1/\eta_2 > 1$ and to the left when $\eta_1/\eta_2 < 1$. The degree of flatness around the inflection point depends on $\eta_1/\eta_2$. In light of this, the J&K classification scheme has no basis.

[...] differently.

We derive some of the results in the text. For notational convenience, we suppress the arguments and write $R_i(e^x)$ as simply $R_i$, etc.

A.1 Preliminary Results

We state these without proof as they are easily obtained using the L'Hospital rule. Interested readers can obtain a copy of the proofs by writing to the second author.
Lemma 4. $\lim_{x \to +\infty} ([\ln(R_1) + 1]/[\ln(R) + 1]) = 1$.

$\lim_{x \to +\infty} (\ln(R)/\ln(R_1)) = 1$.

Lemma 6. $\lim_{x \to -\infty} (\ln(R_1)/\ln(p R_1 + q)) = 1/p$.

Lemma 7. $\lim_{x \to -\infty} (R_2/R) = 1$; $\lim_{x \to +\infty} (R_2/R) = 1/q$.

$\lim_{x \to +\infty} (\ln(R)/\ln(R_2)) = 1$.

Lemma 9. $\lim_{x \to -\infty} (\ln(p R_2^{k-1} + q)/\ln(R_2)) = p (k-1)$;

($s_{1,2}(x)$ is defined in section 3)

[...] $= p \beta_1 \cdot 1 \cdot 1/p = \beta_1$. Q.E.D.

$\lim_{x \to \pm\infty} (s_1) =$ [...]

Use lemma 1 in (27):

[...] $= p \beta_1 \cdot 0 \cdot 1/[q \ln(q)] = 0$. Q.E.D.

Since,

Similarly,
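The limits stated in Lemmas 6, 7 and 9 can be checked numerically. The sketch below is an illustration only: it assumes $\ln R_i(e^x) = -(e^x/\eta_i)^{\beta_i}$, takes Lemmas 7 and 9 in the equal-shape setting with $R_1 = R_2^k$ and $k > 1$ (an assumption about their context), and uses `expm1`/`log1p` so the ratios remain accurate where both numerator and denominator tend to 0. Function names and parameter values are hypothetical.

```python
import math

def lemma6_ratio(x, p, eta1, beta1):
    """ln(R_1) / ln(p*R_1 + q); Lemma 6 gives the limit 1/p as x -> -inf."""
    ln_r1 = -((math.exp(x) / eta1) ** beta1)
    return ln_r1 / math.log1p(p * math.expm1(ln_r1))

def lemma7_ratio(x, p, k, eta2, beta):
    """R_2 / R = 1 / (p*R_2**(k-1) + q), using R_1 = R_2**k (equal shapes)."""
    u2 = (math.exp(x) / eta2) ** beta
    return 1.0 / (p * math.exp(-(k - 1.0) * u2) + (1.0 - p))

def lemma9_ratio(x, p, k, eta2, beta):
    """ln(p*R_2**(k-1) + q) / ln(R_2); Lemma 9 gives p*(k-1) as x -> -inf."""
    u2 = (math.exp(x) / eta2) ** beta
    return math.log1p(p * math.expm1(-(k - 1.0) * u2)) / (-u2)

if __name__ == "__main__":
    p, k, eta2, beta = 0.4, 4.0, 2.0, 2.0   # illustrative values, k > 1
    print("Lemma 6:", lemma6_ratio(-10.0, p, 1.0, 1.25), "vs", 1.0 / p)
    print("Lemma 7:", lemma7_ratio(-10.0, p, k, eta2, beta),
          lemma7_ratio(5.0, p, k, eta2, beta), "vs 1 and", 1.0 / (1.0 - p))
    print("Lemma 9:", lemma9_ratio(-10.0, p, k, eta2, beta), "vs", p * (k - 1.0))
```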
$\lim_{x \to -\infty} (s_1) = \lim_{x \to -\infty}$ [...]

[...] 3, 1995, pp 187-198.
[6] A.D.S. Carter, Mechanical Reliability (2nd ed), 1986, pp 303-308, 455-460; John Wiley & Sons.
[7] K.C. Kapur, L.R. Lamberson, Reliability in Engineering Design, 1977, pp 314-317; John Wiley & Sons.
[8] R.C. Elandt-Johnson, N.L. Johnson, Survival Model and Data Analysis, 1980, pp 192-196; John Wiley & Sons.
AUTHORS