
Journal of Statistical Planning and Inference 136 (2006) 1687–1700
www.elsevier.com/locate/jspi

Competing risks for repairable systems: A data study
Helge Langseth∗ , Bo Henry Lindqvist
Department of Mathematical Sciences, Norwegian University of Science and Technology, Norway
Received 13 February 2004; accepted 14 October 2004
Available online 15 August 2005

Abstract
We analyze a dataset from the Offshore Reliability Data (OREDA) Database, looking for a model that can be used to unveil aspects of the quality of the maintenance performed. To do so, we must investigate the mathematical modeling of maintenance and repair of components that can fail due to a variety of failure mechanisms.
© 2005 Elsevier B.V. All rights reserved.

MSC: 62N05; 62F10; 62F40

Keywords: Competing risks; Repairable systems; Data analysis

1. Introduction

In this paper we employ a model for components which fail due to one of a series of "competing" failure mechanisms, each acting independently upon the system. The components under consideration are repaired upon failure, but are also preventively maintained. The preventive maintenance (PM) is performed due to casual observation of an evolving failure. The maintenance need not be perfect; we use a version of an imperfect repair model to allow a flexible yet simple maintenance model. Our motivation for analyzing this dataset is to estimate quantities that describe the "goodness" of the maintenance crew: their ability to prevent failures by performing thorough maintenance at the correct time. Our

∗ Corresponding author. Present address: SINTEF Technology and Society, Department of Safety and Reliability,
N-7465 Trondheim, Norway.
E-mail addresses: [email protected] (H. Langseth), [email protected] (B.H. Lindqvist).

0378-3758/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2004.10.032

main focus in this paper is to analyze a dataset from the OREDA database (OREDA, 2002), which gives the intermediate failure times, the "winning" failure mechanism associated with each failure (i.e., the failure mechanism leading to the failure), and the maintenance activity.

2. Dataset and model implications

The dataset we want to analyze is presented in Table 1. This dataset, which is from the
OREDA database (OREDA, 2002), describes a single compressor system on an offshore
installation. A compressor system can be broken down into several subsystems (compres-
sor unit, lubrication system, shaft seal system, etc.); this particular dataset gives an account
of the compressor unit. The compressor unit is again divided into several maintainable
items. A maintainable item is defined in OREDA as “an item that contribute a part, or
an assembly of parts, that is normally the lowest indenture level during maintenance”.
The maintainable items making up the compressor unit are, among others, valves, internal piping and radial bearings. From analyzing the database we can identify which maintainable items lead to a particular failure of the compressor unit, but we are not always able to tell which maintainable items are affected by the corresponding repair. We will therefore follow the OREDA Handbook and make our analysis at the subunit level, neglecting the (partial) information regarding maintainable items. The
dataset gives the time of each event, the failure mechanism leading to it, and the failure’s
severity.
The failures can roughly be seen as the result of two different failure mechanisms, coded
as either “1” or “2” in the FM column of Table 1. The last digit of the FM code is used
to give further details; 1.0 is a general description of the FM (default value, e.g., “General
mechanical failure”) whereas codes 1.1–1.4 are specializations of this failure mechanism.
The same coding is employed for failure mechanism “2”. In our analysis we will focus on
failure mechanism “2”, where we group together failures from failure mechanisms coded
with values from 2.0 to 2.7. Failures due to other failure mechanisms are treated as external
events (i.e., random censoring).
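In code, this grouping amounts to a simple predicate on the FM codes; a minimal sketch (assuming the codes are stored as strings, as in Table 1):

```python
# Sketch: select the events belonging to failure mechanism "2" (codes 2.0-2.7);
# mechanism "1" and missing codes ("*") are treated as random censoring.
def is_mechanism_2(fm_code: str) -> bool:
    return fm_code.startswith("2.")

print([is_mechanism_2(code) for code in ["2.0", "2.6", "1.3", "*"]])   # [True, True, False, False]
```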
Formally, we consider a mechanical component which is set into operation at time t = 0.
We assume that the component is (as good as) new at that time. At some random times
T1 , T2 , . . . the component fails. After failure the component is immediately repaired and
put back into service. The data can now be given as an ordered sequence of points

\[
(Y_i, J_i); \quad i = 1, 2, \ldots, n, \tag{1}
\]

where each point represents an event. Here Y_i is the inter-event time, i.e., the time since the previous event (the time since start of service if i = 1), and

\[
J_i =
\begin{cases}
0 & \text{if critical failure,}\\
1 & \text{if degraded failure.}
\end{cases} \tag{2}
\]
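For concreteness, the (Y_i, J_i) sequence can be obtained from event records of the kind shown in Table 1 by differencing the event times and mapping the severity codes. The sketch below (Python, first rows of Table 1 only) is merely meant to make the data format explicit; it ignores missing severities and the handling of external events.

```python
# Sketch: build the (Y_i, J_i) sequence of Eqs. (1)-(2) from records of the form
# (time, FM code, severity), where severity "C" = critical and "D"/"I" are both
# treated as "degraded" (cf. Section 2).
events = [(220, "1.0", "I"), (233, "1.0", "I"), (234, "1.4", "I"),
          (240, "2.6", "D"), (265, "1.0", "I")]             # first rows of Table 1

def to_y_j(events):
    """Return the list of (y_i, j_i): inter-event times and failure-type indicators."""
    pairs, previous_time = [], 0                             # component new at t = 0
    for time, _fm, severity in events:
        y = time - previous_time                             # inter-event time Y_i
        j = 0 if severity == "C" else 1                      # 0 = critical, 1 = degraded
        pairs.append((y, j))
        previous_time = time
    return pairs

print(to_y_j(events))    # [(220, 1), (13, 1), (1, 1), (6, 1), (25, 1)]
```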
Table 1
This dataset describes the service time of a single component over a period of 1659 time units taken from the OREDA data (OREDA, 2002). (For confidentiality reasons
the (non-standard) time unit is not described in further detail.)

Time FM Severity Time FM Severity Time FM Severity Time FM Severity

220 1.0 I 429 2.0 C 651 1.0 D 1109 2.0 D


233 1.0 I 460 * * 657 1.0 D 1117 2.2 I
234 1.4 I 470 2.0 C 660 2.0 D 1197 2.0 C
240 2.6 D 474 1.0 C 666 1.0 I 1258 1.1 C
265 1.0 I 475 2.0 I 668 1.0 I 1269 2.0 C
270 1.3 I 476 2.5 I 680 2.0 D 1297 2.0 D
273 1.0 I 508 2.2 D 681 1.0 D 1309 1.0 D
279 * C 522 2.0 I 684 2.0 D 1322 2.0 C
285 1.0 I 523 2.0 I 691 1.0 I 1346 2.0 D
287 * C 535 2.0 D 693 1.0 D 1349 2.0 D
294 2.3 D 542 1.0 D 705 1.0 C 1359 2.7 D
295 2.0 I 570 2.0 I 717 1.0 C 1363 1.0 C
300 1.0 D 580 1.2 C 834 2.0 C 1448 2.0 C
325 2.0 C 604 2.0 D 837 1.2 C 1476 2.0 D
328 1.0 C 612 2.0 C 841 1.0 C 1481 2.0 D
333 2.0 C 613 2.0 C 843 1.0 C 1557 1.3 C
365 1.0 I 614 2.0 C 845 1.0 C 1606 1.3 I
368 2.3 D 615 2.0 D 875 1.0 C 1610 2.0 D
369 2.4 I 634 1.0 I 972 1.0 C 1642 * D
381 1.0 I 636 * C 1037 1.0 C 1659 2.7 D
417 2.0 I 637 1.0 D 1084 1.0 I
418 2.1 D 638 2.0 I 1091 1.0 D

The failures are caused by different failure mechanisms, coded in the FM column. The failure mechanisms can roughly be grouped into two groups: One containing
codes 1.0–1.4; the other containing codes 2.0–2.7. Each group of failure mechanisms is coded “hierarchically”: For instance 1.0 is a general description of the failure
mechanisms in the first group whereas codes 1.1–1.4 are specializations of this failure mechanism. “*” denotes missing information. The same coding is employed for
the other group. Severity describes the criticality of the failure (“C” denotes critical failures, “D” are degraded failures, “I” are incipient failures, and we again use “*” to
represent missing data).

[Figure 1: performance plotted against time t, degrading from "good as new" through an incipient/degraded region to an unacceptable level, at which point a critical failure occurs.]
Fig. 1. Component with degrading performance.

We have to find a model for data of type (1). The basic ingredient in such a model is the hazard rate λ(t) at time t, for a component which is new at time t = 0. We assume that λ(t) is a continuous and integrable function on [0, ∞). In practice it will be important to estimate λ(·) since this information may, e.g., be used to plan future maintenance strategies.
The first thing we observe from our dataset is that when a failure is detected, the maintenance crew will repair the failure, and put the component back in operation. Thus, the dataset must be considered using theory for repairable systems (see, e.g., Ascher and Feingold, 1984). The most frequently used models for repairable systems assume either perfect repair (renewal process models) or minimal repair (nonhomogeneous Poisson-process models).
In our case, this picture is complicated by the fact that the analysis is performed at the
subunit level. Some of the maintainable items building up the compressor unit will be re-
paired thoroughly after an event, whereas others may be left as they are or given only minor
adjustments. Hence, neither perfect nor minimal repair models may be appropriate, and we
shall here adopt the imperfect repair model presented by Brown and Proschan (1983). This will introduce one parameter, p, which is the probability of perfect repair for a preventive maintenance. This quantity is of interest since it can be used as an indication of the quality of maintenance. The parameter may in practice be compared between plants and companies, and thereby unveil maintenance improvement potential.
The events can be categorized as either (i) critical failures, (ii) degraded failures, (iii) incipient failures, or (iv) external events (component taken out of service or some other random censoring). This information can be read off the Severity column in Table 1. In the OREDA database a critical failure is an event that causes immediate and complete loss of a system's capability of providing its output. A degraded failure is defined as a failure
that prevents the system from providing its output within specifications, and which may
develop into a critical failure in time. An incipient failure is defined as a failure that does
not immediately cause loss of the system's capability of providing its output, but that could develop into a critical or degraded failure in the near future if not attended to. We will not
distinguish between incipient and degraded failures in our analysis, but term both severities
“Degraded”.
The development of a failure can be seen as depicted in Fig. 1. We assume that the
component is continuously deteriorating when used, so that the performance gradually
degrades until it falls outside a preset acceptable margin. As soon as the performance is
unacceptable, the component experiences a critical failure. Before the component fails, it
may exhibit inferior performance, which will be found as a degraded failure in our dataset.

This is a “signal” to the maintenance crew that a critical failure is approaching, and that
the inferior component may be repaired. In our dataset these “signals” are detected through
continuous condition monitoring, observed production interference, or functional testing.
When the maintenance crew intervenes and repairs a component before it fails critically,
the repair action is called (an unscheduled) preventive maintenance. The maintenance crew
will typically strive to avoid critical failures, because they are considered more costly than the degraded ones.
Our model must take into account the relation between preventive maintenance and
critical failures. As it is assumed that the component gives some kind of “signal”, which will
alert the maintenance crew, it is not reasonable to model the (potential) times for preventive
maintenance and critical failures as stochastically independent. We shall therefore adopt
the random signs censoring of Cooke (1996). This will eventually introduce a single new
parameter q, with interpretation as the probability that a critical failure is avoided by a
preceding unscheduled preventive maintenance.
The dataset we analyze in this paper stems from a single component, but the analysis
can easily be extended to a more general situation by assuming that all components fail
independently of each other.

3. Basic ingredients of the model

In this section we describe and discuss the two main building blocks of the model we
will use. In Section 3.1 we consider the concept of imperfect repair, as defined by Brown
and Proschan (1983). Then in Section 3.2 we introduce our basic model for the relation
between preventive and corrective maintenance.

3.1. Imperfect repair

The repairable systems model we propose to use for this dataset is motivated by the
imperfect repair model of Brown and Proschan (1983), which we term the BP model in the
following. We therefore start with a review of this model. As usual, we define Ti to be the
consecutive event times, and we use Yi for the inter-event times (time between events). For
simplicity of notation we assume that the component is observed from time t = 0, and with
the definition T0 = 0, we have Yi = Ti − Ti−1 for i = 1, 2, . . . , n where n is the number of
events (see Fig. 2). We use λ(t) for the hazard rate for a component of "age" t, and

\[
\omega(t \mid \mathcal{F}_{t-}) = \lim_{\Delta t \downarrow 0} \frac{P(\text{Event in } [t, t+\Delta t) \mid \mathcal{F}_{t-})}{\Delta t}
\]

for the conditional intensity given F_{t−}, the history of the counting process up to time t, see Andersen et al. (1992). Furthermore, N(t) is the number of events in (0, t] and N(t−) is the number of events in (0, t). We will assume that failures are repaired immediately, and we disregard the time to repair.
This notation enables us to restate some of the most standard repair models: perfect repair is modeled by ω(t | F_{t−}) = λ(t − T_{N(t−)}), where t − T_{N(t−)} is the time since the last event, i.e., age is measured by the inter-event times (Y in Fig. 2); minimal repair is given by ω(t | F_{t−}) = λ(t), that is, the age is equal to the calendar time (T in Fig. 2); imperfect repair can be modeled by ω(t | F_{t−}) = λ(Ξ_{N(t−)} + t − T_{N(t−)}), where Ξ_{N(t−)} ∈ [0, T_{N(t−)}] is the effective age immediately after the last repair.

[Figure 2: a time axis with events at T_1, T_2, T_3, the inter-event times Y_1, Y_2, Y_3, and the effective ages Ξ_2, Ξ_3.]
Fig. 2. We have three time dimensions to measure the age of a component: age w.r.t. calendar time (T), age w.r.t. inter-event times (Y), and effective age (Ξ). The value of Ξ_i, i > 1, depends upon both inter-event times as well as maintenance history. This is indicated by a dotted line for the Ξ_i's.

In the BP model, Ξ_{N(t−)} is defined indirectly by letting a failed component be given perfect repair with probability p; with probability 1 − p it is treated with minimal repair. To determine Ξ_i (i > 1) we therefore need both T (or Y) and the maintenance history.
It is easy to see that for i ≥ 1 we have under the BP model that Ξ_i = 0 with probability p and Ξ_i = Ξ_{i−1} + Y_i with probability 1 − p; Ξ_0 = 0 by definition. We can also express Ξ_i by using the inter-event times:

\[
\Xi_i =
\begin{cases}
0 & \text{with probability } p,\\
Y_i & \text{with probability } p(1-p),\\
Y_{i-1} + Y_i & \text{with probability } p(1-p)^2,\\
\quad\vdots & \\
\sum_{j=2}^{i} Y_j & \text{with probability } p(1-p)^{i-1},\\[2pt]
\sum_{j=1}^{i} Y_j & \text{with probability } (1-p)^{i}.
\end{cases} \tag{3}
\]

For simplicity of notation we follow Kijima (1989) and introduce the random variable D_i, used to denote the outcome of the repair after the i'th event; D_i = 0 if it was a perfect repair and D_i = 1 if it was minimal. The BP model with parameter p corresponds to assuming that all D_i are iid and independent of Y_1, Y_2, … with P(D_i = 0) = p, P(D_i = 1) = 1 − p, i = 1, …, n; we will assume that D_i may be unobserved. It follows that

\[
\Xi_i = \sum_{j=1}^{i} \Bigl( \prod_{k=j}^{i} D_k \Bigr) Y_j. \tag{4}
\]

Let h = {D_1, D_2, …, D_n} be a repair history; that is, a set consisting of the realizations of all maintenance actions, telling whether each was perfect or minimal. Let H be the set of possible repair histories. Note that |H| = 2^{n−1}. (Strictly speaking, there are 2^n possible repair histories, but as the last maintenance action has no effect on the observed inter-event times we have no interest in the repair at the last event, and it suffices to consider 2^{n−1} histories.)
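As an illustration of the BP mechanism, the following sketch simulates inter-event times and effective ages according to Eqs. (3)–(4). The Weibull hazard is only a placeholder (the data analysis in Section 4 uses a log-normal model), and the repair indicators D_i are of course fully observed in a simulation.

```python
# Sketch: simulate the Brown-Proschan imperfect repair model of Section 3.1.
# Placeholder Weibull hazard, Lambda(t) = (t/alpha)^beta.
import numpy as np

rng = np.random.default_rng(seed=1)
alpha, beta, p = 100.0, 1.5, 0.74            # Weibull scale/shape and BP parameter p

def cum_hazard(t):
    return (t / alpha) ** beta               # Lambda(t)

def inv_cum_hazard(u):
    return alpha * u ** (1.0 / beta)         # Lambda^{-1}(u)

def simulate_bp(n_events):
    """Simulate inter-event times Y_i and effective ages Xi_i (after each repair)."""
    xi, ys, xis = 0.0, [], []
    for _ in range(n_events):
        e = rng.exponential()                              # unit exponential increment
        y = inv_cum_hazard(cum_hazard(xi) + e) - xi        # next inter-event time, given age xi
        d = int(rng.random() < 1.0 - p)                    # D_i: 1 = minimal, 0 = perfect repair
        xi = (xi + y) * d                                  # Eq. (4): age resets to 0 on perfect repair
        ys.append(y)
        xis.append(xi)
    return np.array(ys), np.array(xis)

ys, xis = simulate_bp(10)
print(np.round(ys, 1), np.round(xis, 1))
```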

Notice that we can express the conditional intensity, given the history, as ω(t | F_{t−}) = λ(Ξ_{N(t−)} + t − T_{N(t−)}). Consequently, to make the BP model operational, we only need the parameter p and the density f(t) = λ(t) exp(−Λ(t)); Λ(t) is the accumulated hazard, Λ(t) = ∫_0^t λ(u) du. Having defined those, the distribution of the inter-event time Y_i is to be calculated as

\[
f_{Y_i}(y_i \mid y_1, \ldots, y_{i-1}) = \sum_{j=1}^{i} f\Bigl(\sum_{k=j}^{i} y_k\Bigr) \cdot P\Bigl(\prod_{k=j}^{i-1} D_k - \prod_{k=j-1}^{i-1} D_k = 1\Bigr), \tag{5}
\]

where D_0 = 0 (the component is assumed to be as good as new at t = 0) and we define the empty product ∏_{k=i}^{i−1} D_k = 1. It is enlightening to recognize Eq. (5) as a mixture distribution, where the mixture weights, P(∏_{k=j}^{i−1} D_k − ∏_{k=j−1}^{i−1} D_k = 1), can be interpreted as the probability that the j'th repair was the last perfect repair before time T_i.
One of our goals will be to estimate the BP-parameter p. It is a quantity of some interest,
because it can be used as an indication of the quality of the performed maintenance. The
parameter can be compared between plants and companies, and thereby unveil maintenance
improvement potential (because it measures maintenance quality independent of the failure
processes).

3.2. Modeling preventive maintenance versus critical failures

Recall from Section 2 that PM interventions are reported as degraded failures. Degraded
failures censor critical failures, and the two types of failure may be highly correlated. We
model the interaction between PM and critical failures as a competing risks problem. The
“true” underlying competing risks model is not identifiable from a competing risks dataset,
in particular, any dataset of this type can be explained by a model of independent risks (Tsiatis, 1975). Bunea and Bedford (2002) investigate the effect of this model uncertainty on
maintenance optimization, and conclude that the effect of making wrong model assumptions
can be substantial. Input from domain experts and careful analysis of the information actually
available in the dataset is therefore of major importance when deciding how to model the
relationship between preventive maintenance and critical failures. In our case, the degraded
failures are defined as a step towards critical failures (consider again Fig. 1). Maintenance
personnel who find a component in a degraded state will (typically) repair it to avoid
a critical failure, because critical failures usually lead to higher costs than repairing the
degraded failure. Hence, it is reasonable to assume the competing risks to be positively
correlated.
A number of possible ways to model interaction between degraded and critical failures
are discussed by Cooke (1996). We adopt one of these, called random signs censoring. In the
notation introduced in Section 2 we consider here the case when we observe pairs (Yi , Ji )
where the Yi are inter-event times, whereas the Ji are indicators of failure type (critical or
degraded). For a typical pair (Y, J ) we let Y be the minimum of the potential critical failure
time X and the potential degraded failure time Z, while J = I (Z < X) is the indicator of the
event {Z < X} (assuming that P (Z = X) = 0 and that there are no external events). Thus

we have a competing risks problem. However, while X and Z would traditionally be treated
as independent, random signs censoring makes them dependent in a special way.
The basic assumption of random signs censoring is that the event of successful preventive
maintenance, {Z < X}, is stochastically independent of the potential critical failure time X.
In other words, the conditional probability q(x) = P (Z < X | X = x) does not depend on
the value of x.
Random signs censoring as introduced by Cooke (1996) does not fully specify the model, as the conditional density function f(z | X = x, Z < X) can be chosen arbitrarily. Lindqvist et al. (2005) develop a framework called the repair alert (RA) model. It is a special case of random signs censoring, where the existence of a continuous function G(t) such that

\[
P(Z \le z \mid X = x, Z < X) = \frac{G(z)}{G(x)}, \qquad 0 \le z \le x \tag{6}
\]

is assumed. Lindqvist et al. (2005) show that for any pair of sub-survival functions compat-
ible with random signs censoring, there exists exactly one repair alert model (under some
regularity condition).
The repair alert model introduces the new parameters G(·) and q. Langseth and Lindqvist
(2003) propose a special parametrization of the RA model called the intensity proportional
repair alert (IPRA) model. This model is defined as follows:

• Let X have hazard rate function (·) and cumulative hazard (·).
• {Z < X} and X are stochastically independent.
• Conditionally, given Z < X and X=x, the distribution of the intervention time Z satisfies

(z)
P (Z z | X = x, Z < X) = , 0 z x. (7)
(x)

Thus, the IPRA model is a repair alert model where G(t) = Λ(t). The IPRA model therefore assumes the conditional density of Z to be proportional to the intensity of the underlying failure process. This seems like a coarse but somewhat reasonable description of the behavior of a competent maintenance crew. The assumptions above determine the distribution of Y = min(X, Z) as

\[
f_Y(y) = (1 - q)\,\lambda(y)\,\exp(-\Lambda(y)) + q\,\lambda(y)\,\mathrm{Ie}(\Lambda(y)), \tag{8}
\]

where Ie(t) = ∫_t^∞ exp(−u)/u du is known as the exponential integral (see, e.g., Abramowitz and Stegun, 1965).
Using the IPRA model, we define the distribution of Y, the time to the next event for a component as good as new, from the hazard rate of the failure process, λ(t), and the parameter q. In particular, this defines ω(t), the intensity of events. Finally, we can use this to calculate the conditional intensity given the history, which is given as ω(t | F_{t−}) = ω(Ξ_{N(t−)} + t − T_{N(t−)}) under the BP model. Langseth and Lindqvist (2003) prove identifiability of all parameters in this combined model.
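A sketch of how data from this model can be simulated (useful, e.g., for the parametric bootstrap in Section 4) follows directly from the three bullet points above: draw X from the underlying failure distribution, let the PM "win" with probability q, and in that case draw Z through Λ(Z) = U · Λ(X) with U uniform on (0, 1), which reproduces Eq. (7). The exponential failure law below is only a placeholder, and perfect repair (a renewal process) is assumed for simplicity.

```python
# Sketch: simulate (Y, J) pairs from the IPRA model of Section 3.2 under perfect
# repair; exponential placeholder for the failure law (the paper fits a log-normal).
import numpy as np

rng = np.random.default_rng(seed=2)
q, rate = 0.65, 1.0 / 100.0                   # P(Z < X) and placeholder hazard rate

def cum_hazard(t):
    return rate * t                           # Lambda(t) for the exponential placeholder

def inv_cum_hazard(u):
    return u / rate                           # Lambda^{-1}(u)

def simulate_ipra(n):
    """Return arrays y, j: inter-event times and indicators J = I(Z < X)."""
    y, j = np.empty(n), np.empty(n, dtype=int)
    for i in range(n):
        x = inv_cum_hazard(rng.exponential())                            # potential critical failure time X
        if rng.random() < q:                                             # random signs: PM "wins" w.p. q
            y[i], j[i] = inv_cum_hazard(rng.random() * cum_hazard(x)), 1 # Eq. (7): Lambda(Z) = U * Lambda(X)
        else:
            y[i], j[i] = x, 0                                            # critical failure observed
    return y, j

y, j = simulate_ipra(1000)
print(j.mean())                               # close to q = 0.65
```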

4. The data analysis

In this section we develop a formal test of the applicability of the combined model for the dataset in Table 1. We focus our attention on failure mechanism 2; other events are treated as random censorings (see, e.g., Crowder (2001, Chapter 2) for how to handle censoring under competing risks). When we come to the point where a parametric model is required, we will assume that the underlying distribution of failures (the distribution of X, when using the notation of Section 3.2) follows the log-normal distribution.
One of the fundamental statistics when working with competing risks models is the probability of repair beyond t, P(Z < X | Z > t, X > t), denoted Φ(t) by Cooke (1996). One reason for this quantity to be of such importance is that many classes of models give a unique footprint in terms of the set of possible functions Φ(t) they can give rise to. For example, the repair alert models are characterized as the set of models for which Φ(t) < Φ(0) for all t > 0 (recall that Φ(0) = q). Similarly, simple algebraic manipulations show that the footprint of the IPRA model can be expressed as

\[
\frac{\Phi(t)}{q} = \frac{\Lambda(t)\,\mathrm{Ie}(\Lambda(t)) - \exp(-\Lambda(t))}{q\,\Lambda(t)\,\mathrm{Ie}(\Lambda(t)) - \exp(-\Lambda(t))}. \tag{9}
\]
To test whether the IPRA model fits our dataset well, we would like to estimate Φ(·) from data, and compare the estimate with the functional relationship prescribed by Eq. (9). In Section 4.1 we outline the relevant machinery for the simple case where all repair actions are perfect. The general case is considered in Section 4.2.

4.1. Testing the applicability of IPRA under perfect repair

To use the IPRA model under perfect repair we make two assumptions:

(i) The repair alert model must be applicable; Φ(t) < Φ(0) for all t > 0.
(ii) Among the RA models, the IPRA model is to be chosen; G(t) = Λ(t).

The modeling assumptions could be verified, e.g., in the spirit of Dewan et al. (2003), who use U-statistics to test various hypotheses regarding Φ(t). However, we prefer a test that can be extended to the general framework of imperfect repair, and it is not obvious how the test statistics proposed by Dewan et al. (2003) behave when integrated into the repairable systems framework. Therefore, we will not proceed along these lines here, but rather propose to use the parametric bootstrap (see, e.g., Efron and Tibshirani, 1993) to test the hypothesis

H0: IPRA is appropriate vs.
H1: IPRA is not appropriate.
We want to use the footprint in Eq. (9) as our starting point for this test. By substituting u = Λ(t), we express the IPRA footprint as

\[
v(u; q) = \frac{u\,\mathrm{Ie}(u) - \exp(-u)}{q\,u\,\mathrm{Ie}(u) - \exp(-u)}. \tag{10}
\]
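Since Ie(t) = ∫_t^∞ exp(−u)/u du is the exponential integral E₁(t), available as scipy.special.exp1, the theoretical footprint in Eq. (10) can be evaluated directly; the sketch below is a plain transcription of the formula.

```python
# Sketch: the theoretical IPRA footprint v(u; q) of Eq. (10).
import numpy as np
from scipy.special import exp1                 # Ie(t) = E_1(t)

def footprint(u, q):
    """v(u; q) = Phi(t)/q evaluated at u = Lambda(t) > 0."""
    return (u * exp1(u) - np.exp(-u)) / (q * u * exp1(u) - np.exp(-u))

u = np.linspace(0.01, 2.0, 50)
print(footprint(u, q=0.65)[:3])                # starts close to 1 and decreases in u
```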

To find the observed counterpart to Eq. (10) we must estimate Φ(·) and Λ(·). Fortunately, both functions have simple non-parametric estimators under perfect repair: Let {#z > t} denote the number of observed events where {Z < X, Z > t} and similarly let {#x > t} represent the number of observed events where {X < Z, X > t}. Then, Φ̂(t) = {#z > t}/({#z > t} + {#x > t}) and Λ̂(t) = −log({#x > t}/{#x > 0}) are consistent estimators of Φ(t) and Λ(t), respectively.
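A direct transcription of these estimators from observed pairs (y_i, j_i) under perfect repair might look as follows (j = 1 marking degraded/PM events and j = 0 critical failures); the small data vectors are invented purely for illustration.

```python
# Sketch: non-parametric estimators of Phi(t) and Lambda(t) under perfect repair.
import numpy as np

def phi_hat(t, y, j):
    """Estimate Phi(t) by counting degraded and critical events beyond t."""
    n_z = np.sum((j == 1) & (y > t))           # events with Z < X and Z > t
    n_x = np.sum((j == 0) & (y > t))           # events with X < Z and X > t
    return n_z / (n_z + n_x)

def cum_hazard_hat(t, y, j):
    """Estimate Lambda(t) = -log P(X > t); uses {Z < X} independent of X (random signs)."""
    n_x_total = np.sum(j == 0)
    n_x_beyond = np.sum((j == 0) & (y > t))
    return -np.log(n_x_beyond / n_x_total)

y = np.array([12.0, 30.0, 7.0, 55.0, 21.0, 40.0])     # invented inter-event times
j = np.array([1, 0, 1, 0, 1, 0])
print(phi_hat(10.0, y, j), cum_hazard_hat(10.0, y, j))
```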
Let v̂(u; q̂) be the estimated value of v(u; q̂). To formally test the model, we propose to define a test based on the area between the theoretical footprint and the non-parametric estimates. A test statistic defined by Γ̃ = ∫_{u=0}^{∞} |v(u; q̂) − v̂(u; q̂)| du has the interpretation that a high value of Γ̃ suggests that H0 should be rejected. Unfortunately, though, this test has the undesired property that all values of u are considered equally important. As both {#x > t} and {#z > t}, the two statistics combined to generate Φ̂(t), are decreasing in t (and therefore in u), the sampling fluctuations will dominate for "large" values of u, and it seems unnatural to let these "large" values be as important as the "small" ones. Based on the fact that the (theoretical) portion of failures happening after time t is given as exp(−Λ(t)), we propose to define the test statistic as

\[
\Gamma = \int_{u=0}^{\infty} \bigl| v(u; \hat q) - \hat v(u; \hat q) \bigr| \exp(-u)\, \mathrm{d}u. \tag{11}
\]

The sampling distribution of Γ under H0 is not known, but can be approximated using parametric bootstrap.
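The bootstrap can be sketched as follows, reusing the illustrative helpers simulate_ipra, phi_hat, cum_hazard_hat and footprint from the earlier sketches (and thus working under the simplifying assumption of perfect repair): compute Γ on the data, simulate datasets of the same size from the model fitted under H0, recompute Γ for each, and report the fraction of bootstrap values exceeding the observed one. Discretizing the integral in Eq. (11) over the observed event times is only one of several possible choices.

```python
# Sketch: parametric bootstrap for the test statistic Gamma of Eq. (11),
# reusing simulate_ipra, phi_hat, cum_hazard_hat and footprint defined above.
import numpy as np

def gamma_statistic(y, j, q_hat):
    """Crude numerical version of Eq. (11) on the grid of observed event times."""
    t_max = np.max(y[j == 0])                           # largest observed critical failure time
    t_grid = np.sort(y[y < t_max])
    u = np.array([cum_hazard_hat(t, y, j) for t in t_grid])
    v_obs = np.array([phi_hat(t, y, j) for t in t_grid]) / q_hat
    v_theo = footprint(np.maximum(u, 1e-9), q_hat)
    integrand = np.abs(v_theo - v_obs) * np.exp(-u)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))   # trapezoidal rule

def bootstrap_p_value(y, j, n_boot=1000):
    """Approximate P(Gamma >= observed value | H0); edge cases are ignored."""
    q_hat = np.mean(j)                                  # ML estimate of q under random signs
    gamma_obs = gamma_statistic(y, j, q_hat)
    exceed = 0
    for _ in range(n_boot):
        yb, jb = simulate_ipra(len(y))                  # simulate from the model fitted under H0
        exceed += gamma_statistic(yb, jb, np.mean(jb)) >= gamma_obs
    return gamma_obs, exceed / n_boot
```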

4.2. Generalizing the test to imperfect repair

In this subsection we want to consider a test for the hypothesis

H0: The data stems from an IPRA + BP model vs.
H1: The data does not stem from an IPRA + BP model,

and we would again like to base the test on the IPRA footprint in Eq. (10). To generalize the test outlined above to the imperfect repair framework, we must be able to estimate Φ(·) and Λ(·) also when the data stems from a repairable system. This is unfortunately not straightforward, and to simplify the following exposition, we note that to perform the test, we only need machinery to calculate the value of the test statistic under H0. It is therefore not necessary to generate non-parametric estimators for Φ(·) and Λ(·); we only require estimators that are applicable under the combined IPRA and BP models.
Under perfect repair, we use the relationship

\[
\Phi(t) = \frac{P(Z < X, Z > t)}{P(Z < X, Z > t) + P(Z > X, X > t)} \tag{12}
\]

to generate the estimator Φ̂(t) = {#z > t}/({#z > t} + {#x > t}). When we turn to imperfect repair, things are not that simple: we do not know the effective age of the component as it fails, and are therefore not able to calculate {#x > t} and {#z > t} directly. Note, however, that we are able to calculate these numbers when we condition on a particular repair history,

and the correct way of estimating Φ(t) under the BP model would be to sum over all possible repair histories

\[
\hat\Phi(t; p) = \sum_{h \in \mathcal{H}} P(h; p) \cdot \frac{\{\#z > t \mid h\}}{\{\#z > t \mid h\} + \{\#x > t \mid h\}},
\]

where Φ(t) now must be calculated as a function of p and therefore is denoted Φ(t; p). A complicating factor is that p is unknown, and must be estimated from the dataset. In the following presentation we let the maximum likelihood estimator p̂ play the role of this parameter, and therefore calculate Φ̂(t; p̂). Finally, we also need to calculate P(h; p̂). Note that although this is not explicit in the notation, P(h; p̂) should be calculated conditionally on the observations (y_1, j_1), (y_2, j_2), …, (y_n, j_n). Now, we can in principle use Eq. (5) to calculate P(h; p̂) and thereby Φ̂(t; p̂) under H0. However, the calculation of Φ̂(t; p̂) comes at a computational cost of order O(2^n), where n is the number of observations. In our case, this corresponds to summing approximately 10^25 terms, a computationally prohibitive task. We therefore propose an approximation to Φ̂(t; p̂).
The approximation tries to estimate each term in Eq. (12) simultaneously, and thereby gain calculation efficiency. For example, we can use

\[
\frac{1}{n} \sum_{k : j_k = 1} P(y_k + \Xi_{k-1} > t \mid y_1, \ldots, y_n, j_1, \ldots, j_n)
\]

to estimate P(Z < X, Z > t; p̂). Note that the stochastic element in the event {y_k + Ξ_{k−1} > t | y, j} is related to the outcomes of the repair events only. We can calculate the probability of this event if we have access to probabilities of the form p_{k,l} = P(D_{k−1} = D_{k−2} = · · · = D_{k−l} = 1 | y, j). These probabilities can (under H0) be calculated using Eq. (5).

We end up with the following estimator:¹

\[
\hat\Phi^{S}(t; \hat p) = \frac{\sum_{k : j_k = 1} P(y_k + \Xi_{k-1} > t \mid y, j)}{\sum_{k} P(y_k + \Xi_{k-1} > t \mid y, j)}. \tag{13}
\]
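Assuming the conditional probabilities p_{k,l} = P(D_{k−1} = … = D_{k−l} = 1 | y, j) have already been computed (the paper obtains them via Eq. (5)), the probabilities P(y_k + Ξ_{k−1} > t | y, j) entering Eq. (13) follow by summing over the possible effective ages; the sketch below uses invented values of p_{k,l} purely for illustration.

```python
# Sketch: evaluate the approximation (13), given conditional probabilities
# p_all[k][l] = P(D_{k-1} = ... = D_{k-l} = 1 | y, j), with p_all[k][0] = 1.
import numpy as np

def prob_age_exceeds(t, k, y, p_kl):
    """P(y_k + Xi_{k-1} > t | y, j) for event k (1-based)."""
    total = 0.0
    for l in range(k):                         # l = number of minimal repairs just before event k
        # Xi_{k-1} = y_{k-l} + ... + y_{k-1} (empty sum for l = 0), with probability
        # p_{k,l} - p_{k,l+1}; for l = k-1 the subtrahend is 0 since D_0 = 0.
        weight = p_kl[l] - (p_kl[l + 1] if l + 1 < k else 0.0)
        age_at_event = y[k - 1] + np.sum(y[k - 1 - l : k - 1])
        total += weight * (age_at_event > t)
    return total

def phi_hat_S(t, y, j, p_all):
    """The estimator of Eq. (13)."""
    num = sum(prob_age_exceeds(t, k, y, p_all[k]) for k in range(1, len(y) + 1) if j[k - 1] == 1)
    den = sum(prob_age_exceeds(t, k, y, p_all[k]) for k in range(1, len(y) + 1))
    return num / den

y = np.array([3.0, 5.0, 2.0])
j = np.array([1, 0, 1])
p_all = {1: [1.0], 2: [1.0, 0.3], 3: [1.0, 0.4, 0.1]}   # invented values, for illustration only
print(phi_hat_S(4.0, y, j, p_all))                       # 0.4 / 1.4 ~ 0.286
```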

Now, we have an estimate of the footprint of the combined model under H0 as given by Eq. (10). Fig. 3 shows v(u; q̂) together with v̂(u; q̂). The calculation of v̂(u; q̂) was based on the maximum likelihood estimators of q and λ(t). We get a reasonable fit for small values of u, whereas the fit is poorer when u increases. However, as mentioned above, the number of data points in this part of the model is rather small and the poor fit might therefore be expected. To test if this lack of fit is due to sampling fluctuations or not, we proceed by defining the test statistic as in Eq. (11), and approximate its sampling distribution under H0 through parametric bootstrap. The approximated density function, called f̂^(B)(Γ) in Fig. 4, can be used to make formal inference regarding the model. The dataset gives Γ = 0.217, and from the bootstrap distribution we find that P^(B)(Γ > 0.217 | H0) = 0.25. Hence, we accept the hypothesis that IPRA combined with BP defines an appropriate class of models for this dataset (at the 25% level).

¹ It is a consequence of Slutsky's theorem that this is a consistent estimator for Φ(t; p̂) under H0.

[Figure 3: the theoretical and observed footprints v(u; q̂) plotted against u ∈ [0, 2].]
Fig. 3. The plot of the observed footprint (using p̂ = 0.74) together with the theoretical one. We observe reasonable fit for small values of u, say u ≲ 0.6, but poorer fit for larger values of u.

[Figure 4: the bootstrap density estimate f̂^(B)(Γ) plotted over Γ ∈ [0, 0.5].]
Fig. 4. f̂^(B)(Γ) is the bootstrap approximation of the distribution of Γ. Γ is the area between the footprint v(u; q̂) and the observed values, weighted by exp(−u).

4.3. Do we need a repairable systems model?

Up to this point we have fitted an IPRA model for the time between events, and added a BP model on top of that to model the repair of the system. The test in the last section accepted the hypothesis that this combined model generated the dataset in Table 1. The maximum likelihood parametrization of the model was to use p̂ = 0.74 in the BP model, combined with the IPRA model where q̂ = 0.65, and the underlying failure process was assumed to follow the log-normal distribution; we used the maximum likelihood estimates for the parameters: μ̂ = 3.7 and σ̂² = 1.8².
To test whether the repairable system part of the model is required, we use the likelihood
ratio test. The full model as outlined in Section 3 (p̂ = 0.74) obtains a log likelihood

of −212.1. The IPRA as a renewal process model (all repairs are perfect; p = 1) gives
log likelihood of −214.4, which is significantly poorer than the full model at level 0.03.
IPRA as a nonhomogeneous Poisson-process model (all repairs are minimal; p = 0) gives
log likelihood of −258.9, which is significantly poorer than the full model at all reasonable
levels. We therefore conclude that the repairable systems model is indeed required (at level
0.03); this is in accordance with expert knowledge, which also points towards perfect and
minimal repair models being unacceptable.
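The likelihood ratio comparisons can be reproduced from the reported log-likelihood values; the sketch below uses the usual χ² approximation with one degree of freedom for the deviance (which matches the reported level 0.03 for the renewal-process comparison), although it should be kept in mind that p = 1 and p = 0 are boundary values of the parameter space.

```python
# Sketch: likelihood ratio comparisons of Section 4.3 from the reported log-likelihoods.
from scipy.stats import chi2

ll_full = -212.1        # BP + IPRA, p-hat = 0.74
ll_renewal = -214.4     # all repairs perfect (p = 1)
ll_nhpp = -258.9        # all repairs minimal (p = 0)

for label, ll in [("renewal (p = 1)", ll_renewal), ("NHPP (p = 0)", ll_nhpp)]:
    deviance = 2.0 * (ll_full - ll)
    print(label, round(deviance, 1), round(chi2.sf(deviance, df=1), 3))
# renewal (p = 1) 4.6 0.032
# NHPP (p = 0) 93.6 0.0
```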

5. Conclusions

In this paper, we have analyzed a dataset from the OREDA database (OREDA, 2002)
describing a compressor system. We fitted a competing risks model and coupled it with the imperfect repair model of Brown and Proschan (1983). Inference was done to verify that this model was capable of describing
the dataset well.
We stress that due to the non-identifiability of competing risks problems we cannot really infer that IPRA combined with the BP model is the correct model for the present dataset. We can only conclude that the data does not reject this combined model, and that if we were to choose a building block among the RA models, then IPRA can be a reasonable alternative (this is also strengthened by the fact that IPRA is inexpensive w.r.t. the number of required parameters). To advocate this model we would need to discuss the assumptions of RA with domain experts.
Leaving non-identifiability issues aside, the main goal of our analysis has been to find
parameters that describe the quality of the performed maintenance. We have fitted maximum
likelihood parameters, and found that the maintenance crew’s “thoroughness” is quantified
by the parameter p (p̂ = 0.74) and their “eagerness” by q (q̂ = 0.65). It is our belief
that these numbers, together with information about the annual maintenance expenditure
and the number of critical failures that occur per year, characterize the efficiency of the
maintenance at a level that can be compared between different maintenance crews and even
between different installations.

Acknowledgements

We thank the OREDA project, represented by the Steering Committee Chairman Runar Østebø and project manager Terje Dammen, for making the data in Table 1 available to us.
We thank the participants at the Workshop on Analysis of Competing Risks—Statistical and
Probabilistic Approach (Delft, Holland, June 2003) for interesting discussions.

References

Abramowitz, M., Stegun, I.A., 1965. Handbook of Mathematical Functions. Dover Publications, New York.
Andersen, P., Borgan, Ø., Gill, R., Keiding, N., 1992. Statistical Models Based on Counting Processes. Springer, New York.
Ascher, H., Feingold, H., 1984. Repairable Systems Reliability—Modeling, Inference, Misconceptions and Their
Causes. Marcel Dekker, Inc., New York.

Brown, M., Proschan, F., 1983. Imperfect repair. J. Appl. Probab. 20, 851–859.
Bunea, C., Bedford, T., 2002. The effect of model uncertainty on maintenance optimization. IEEE Trans. Reliability
51 (4), 486–493.
Cooke, R.M., 1996. The design of reliability data bases, Part I and Part II. Reliability Eng. Syst. Safety 52, 137–
146 and 209–223.
Crowder, M.J., 2001. Classical Competing Risks. Chapman & Hall, Boca Raton.
Dewan, I., Deshpande, J.V., Kulathinal, S.B., 2003. On testing dependence between time to failure and cause of
failure via conditional probabilities. Scand. J. Statist. 31, 79–91.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Kijima, M., 1989. Some results for repairable systems with general repair. J. Appl. Probab. 26, 89–102.
Langseth, H., Lindqvist, B.H., 2003. A maintenance model for components exposed to several failure modes and
imperfect repair. In: Doksum, K., Lindqvist, B.H. (Eds.), Mathematical and Statistical Methods in Reliability.
Quality, Reliability and Engineering Statistics. World Scientific, Singapore, Chapter 27, pp. 415–430.
Lindqvist, B.H., Støve, B., Langseth, H., 2005. Modeling of dependence between critical failure and preventive maintenance: the repair alert model. J. Statist. Planning and Inference, Special Issue on Competing Risks.
OREDA, 2002. Offshore Reliability Data Handbook, 4th Edition. Distributed by Det Norske Veritas, P.O. Box 300, N-1322 Høvik, Norway. See also <http://www.oreda.com>.
Tsiatis, A.A., 1975. A nonidentifiability aspect of the problem of competing risks. Proc. Nat. Acad. Sci. USA 72,
20–22.
