Computing Limiting Average Availability of a Repairable System through Discretization
Abstract
Formulas for the limiting average availability of a repairable system exist only for some special cases: (1) either the lifetime or the repair time is exponential; or (2) there is one spare unit and one repair facility. We consider a more general setting involving several spare units and several repair facilities, and we allow arbitrary lifetime and repair time distributions. Under periodic monitoring, which essentially discretizes the time variable, we compute the limiting average availability. The discretization approach closely approximates the existing results in the special cases, and it shows that the limiting average availability increases as we include an additional spare unit or an additional repair facility.
Keywords: Periodic monitoring, Perfect repair, Semi-Markov process,
Transition probability, Sojourn time
2010 MSC: 90B25, 62N05, 60J27
∗ Corresponding author
Email addresses: [email protected] (Debolina Chatterjee), [email protected] (Jyotirmoy Sarkar)
Preprint submitted to Reliability Engineering & System Safety, June 27, 2019

This is the author's manuscript of the article published in final edited form as: Chatterjee, D., & Sarkar, J. (2020). Computing limiting average availability of a repairable system through discretization. Reliability Engineering & System Safety, 193, 106616. https://doi.org/10.1016/j.ress.2019.106616

1. Introduction
Reliability engineers have long been interested in techniques to improve the functionality, quality and effectiveness of operational systems. Consequently, the availability of a maintained system (that is, the probability that the system is fully functional) is a key quantity of interest. Many heavy industries, such as power plants, metal casting, chemical production and space administration, rely on expensive machinery for production and maintenance. Failure of this machinery is detrimental to the industry, creating both economic and logistic challenges. Therefore, the system should be actively maintained by setting up one or more repair facilities and by keeping one or more back-up spare units to serve as replacements whenever a damaged or failed unit is sent for repair. Fire detection systems and safety valves, in particular, use this kind of spare/repair management. The plan may sound straightforward, but there are many logistical issues to address. For instance, the system has to be monitored continuously to detect a failure and switch operation to a spare unit immediately. Also, one must determine the optimum number of repair facilities to establish and the optimum number of spare units to keep on hand, so that the overall system availability is not compromised while the cost stays under control.
We recall a well-studied model of a repairable system and some known results under that model. However, several restrictive assumptions in this otherwise attractive model severely limit its applicability. Here, we remove these restrictive assumptions by devising a discretization approach, which reduces the burden of monitoring the system continuously, reproduces the results in the known special cases, and extends to the most general setting.
the system. Oftentimes, under continuous lifetime and repair time distributions and continuous monitoring, the limiting availability exists; it then equals the limiting average availability (the limiting proportion of time the system is up) and is given by

$$A_{av} = \frac{MSUT}{MSUT + MSDT} \qquad (1.1)$$

where MSUT denotes the mean system up time and MSDT denotes the mean system down time.
In the very special case of exponential lifetime and exponential repair time distributions with means µ and ν, respectively, [1] (page 206) provided the limiting average availability for the case of one repair facility (r = 1) and either no spare unit or one spare unit (s = 0 or s = 1). More specifically,

$$A_{av}(r=1, s=0) = \frac{\mu}{\mu+\nu} = \frac{1/\nu}{1/\nu + 1/\mu} \qquad (1.2)$$

since, in this case, MSUT in eq. (1.1) equals the mean time to failure and MSDT equals the mean time to repair; and

$$A_{av}(r=1, s=1) = \frac{\mu(\mu+\nu)}{\mu^2+\mu\nu+\nu^2} = \frac{1/\nu}{1/\nu + 1/\mu - 1/(\mu+\nu)} \qquad (1.3)$$
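A quick numerical check of eqs. (1.2) and (1.3) can be sketched in Python. The function names below are illustrative, not taken from any cited source, and µ = 1, ν = 1.77 are the mean lifetime and mean repair time used in the numerical examples later in the paper.

```python
def aav_r1_s0(mu: float, nu: float) -> float:
    """Eq. (1.2): one repair facility, no spare; mu, nu are mean life and repair times."""
    return mu / (mu + nu)


def aav_r1_s1(mu: float, nu: float) -> float:
    """Eq. (1.3), first form: one repair facility, one spare unit."""
    return mu * (mu + nu) / (mu**2 + mu * nu + nu**2)


def aav_r1_s1_reciprocal(mu: float, nu: float) -> float:
    """Eq. (1.3), second form, written with the reciprocals of the means."""
    return (1 / nu) / (1 / nu + 1 / mu - 1 / (mu + nu))


if __name__ == "__main__":
    mu, nu = 1.0, 1.77
    print(f"A_av(r=1, s=0) = {aav_r1_s0(mu, nu):.5f}")              # 0.36101
    print(f"A_av(r=1, s=1) = {aav_r1_s1(mu, nu):.5f}")              # 0.46926
    print(f"reciprocal form = {aav_r1_s1_reciprocal(mu, nu):.5f}")  # 0.46926
```

Both forms of eq. (1.3) agree up to rounding error, and the two values reappear in Section 2 as 1/2.77 = .361 and .46926.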
1.2. Availability in some other models
Allowing arbitrary distributions for the lifetime X and the repair time Y, [13] (page 283) derived the limiting average availability of a one-unit system supported by one repair facility and one spare unit as

$$A_{av}(r=1, s=1) = \frac{E[X]}{E[\max\{X, Y\}]} \qquad (1.4)$$

Indeed, when eq. (1.4) is specialized to exponential lifetime and repair time distributions with means µ and ν, one has E[max{X, Y}] = µ + ν − µν/(µ + ν), and one recovers eq. (1.3).
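Eq. (1.4) needs only E[X] and E[max{X, Y}], and for independent X and Y the CDF of max{X, Y} is F(t)G(t). The Python sketch below assumes a plain midpoint rule for E[T] = ∫ P{T > t} dt over a finite range; the helper names, the upper limit 40 and the step 0.001 are illustrative choices. The Weibull parameters are read as (shape, scale), which gives means of about 1 and 1.77 for the pairs (3, 1.12) and (2, 2) listed in Table 2.

```python
import math
from typing import Callable

Cdf = Callable[[float], float]


def weibull_cdf(shape: float, scale: float) -> Cdf:
    return lambda t: 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def exponential_cdf(mean: float) -> Cdf:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def mean_from_cdf(cdf: Cdf, upper: float = 40.0, h: float = 1e-3) -> float:
    """E[T] = integral of P{T > t} over (0, upper], by the midpoint rule."""
    steps = int(round(upper / h))
    return h * sum(1.0 - cdf((i + 0.5) * h) for i in range(steps))


def aav_eq_1_4(F: Cdf, G: Cdf) -> float:
    """Eq. (1.4): E[X] / E[max{X, Y}] for independent X ~ F and Y ~ G."""
    mean_max = mean_from_cdf(lambda t: F(t) * G(t))  # CDF of max{X, Y} is F(t) G(t)
    return mean_from_cdf(F) / mean_max


if __name__ == "__main__":
    print(aav_eq_1_4(exponential_cdf(1.0), exponential_cdf(1.77)))  # about 0.46926
    print(aav_eq_1_4(weibull_cdf(3, 1.12), weibull_cdf(2, 2)))
```

For exponential inputs the result should match eq. (1.3); for the Weibull pair it gives an analytic benchmark for the (r = 1, s = 1) test case of Section 2.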
In [9], for a maintained system under continuous monitoring and a perfect repair policy, the instantaneous availability is determined using the Fourier transform technique. There, the repair time is restricted to be exponential, while the lifetime is allowed to be either gamma or exponential. Further, using the same technique but incorporating several imperfect repairs before a replacement or a perfect repair, [2] obtains the availability for exponential lifetime and repair time distributions (with possibly different parameters).
Assuming periodic inspection, [11] determines the system availability when repair is perfect, the lifetime is either gamma or exponential, and the repair time is constant. The work is extended in [3] by allowing an imperfect repair policy and a random (specifically, exponential) repair time. Further, [12] considers a periodically inspected system supported by a spare unit and maintained with perfect repair or upgrade; both the instantaneous availability and the limiting average availability are determined for arbitrary lifetime, degenerate upgrade time and exponential repair time. The paper [4] adds to the results of [11] by assuming that the periodic inspections take place at fixed time points after repair or replacement in case of failure.
Allowing arbitrary continuous lifetimes, but restricting to exponential repair times only, [10] derived the limiting average availability of a one-unit system under continuous monitoring when there are s ≥ 1 spare units and r ≥ 1 repair facilities, by studying the embedded Markov chain (tracked at selected observation times), which is said to be in State i (i = 0, 1, ..., s, s + 1) if there are i failed units undergoing or awaiting repair at that observation time.
Apart from a one-unit system, availability has also been studied for a k-out-of-N system. For example, the authors of [14] study the interactions among several control variables, such as preventive maintenance policy, spare part inventories and repair capacity, as they affect the system availability. They present an exact as well as an approximate method to develop a trade-off among these control variables. The same authors also advocate in [15] a block replacement policy in which all failed and degraded components are repaired by a single repair shop while spare units take over the operation. They provide two approximate methods to analyze the relation between system availability and the control variables. In both papers they assume that the component lifetimes and repair times are exponentially distributed.
For a k-out-of-N: G system, [17] and [18] allow the repair time to have a general distribution, but assume the lifetime to be exponential. The former paper considered one repairman with a single vacation, while the latter considered replaceable repair equipment that may fail during the repair period and is then replaced by a new one. Both papers used the supplementary variable technique and the Laplace transform to calculate the availability. The supplementary variable technique is also implemented in [16], which derives state equations by defining the system state space and the sojourn time in each state to calculate the availability of the system.
1.3. Overcoming the challenge
Let us highlight a serious drawback of the models mentioned above to set the stage for our current research. Although it is not realistic, researchers often assume an exponential lifetime or repair time distribution to simplify mathematical derivations. They exploit the memoryless property of the exponential distribution to ensure that the successive differences between lifetimes or repair times are independent exponential variables (with different rates), and thereby they obtain closed-form expressions for the limiting average availability.
Can we make the model more realistic by allowing arbitrary lifetime and arbitrary repair time distributions for any number of spare units and repair facilities? The challenge of obtaining the limiting average availability under this general setting is expressed in [10] as follows:
Some recent papers allow arbitrary lifetime and repair time distributions. In [5], the authors studied single-component repairable systems supporting different levels of workload. They provide a numerical algorithm to evaluate the probability that the system will perform a specified amount of work within a specified mission time, and the associated conditional expected cost. The paper [6] models the dynamic performance of multi-state series-parallel systems with repairable elements that can function at different load levels, and employs a universal generating function technique to assess system performance; there the instantaneous availability is evaluated at different load levels. Further, in [7], the authors proposed a discrete-state continuous-time stochastic process to evaluate the instantaneous availability of a common bus performance sharing (CBPS) system. The technique involves integration with respect to the joint distribution of ⟨T_j, X_j⟩, where T_j denotes the detection time of the failure of the j-th component and X_j denotes the operation time.
The current paper responds to the challenge posed in [10] by adopting a discretization approach: we inspect the system only at discrete time points, and we intervene only when an inspection reveals that a unit has failed or that the failed system is ready for revival because at least one repair has been completed. In particular, if a repair has been completed but the operating unit has not failed, we do not intervene at all. Thus, this approach essentially discretizes the time variable. Moreover, it relaxes the burden of monitoring the system continuously to monitoring it periodically (at inspection times only); hence, it is logistically preferable.
In Section 2, we revisit the case (r = 1, s = 1): we model the stochastic process through discretization as a semi-Markov process, derive the limiting average availability, and exhibit its closeness to the analytic result (eq. (1.4)) of [13]. In Section 3, we extend the discretization method to the case (r = 2, s = 1); that is, we permit a second repair facility. Finally, Section 4 concludes the paper with a summary.
(1) Lifetimes of the units are independent and identically distributed (IID) continuous random variables with arbitrary cumulative distribution function (CDF) F on a positive support.
(2) Repair times are IID continuous random variables with arbitrary CDF G on a positive support.
(4) Repair is perfect; that is, a repaired unit is as good as new.
(5) The system is under periodic monitoring; that is, it is inspected at regular intervals.
(6) Interventions are made only at observation epochs when an operating unit is found to have failed or when the down system is ready for revival because at least one failed unit has been repaired.
(7) Whenever a unit is found at inspection to have failed, it is sent to the repair facility. Repair commences instantaneously if the facility is free; otherwise, the failed unit waits for repair until the facility is free.
(8) A unit is put into operation immediately when a failed unit is sent to repair (at an inspection epoch) and a spare unit is available (as a result of an already completed repair), or when the failed system is ready for revival at an inspection epoch because one of the failed units has been repaired.
Figure 1: The state transition diagram for the (r = 1, s = 1) case. A rectangle denotes an
up state, and an oval denotes a down state. The status of each unit is denoted as follows:
P for operation; S for stand-by; R for repair (with subscript indicating for how many
inspection periods the repair has been going on); and W for waiting for repair.
We label the states of the system to indicate the number of failed units:
(2) State 2 means there are two failed units. Additionally, we must use a second index to indicate how long the repair on the first failed unit has been going on when the system enters State 2, because that determines how long the system will stay in State 2. This second index splits State 2 into sub-states: we say the system is in State (2, k), for k = 1, 2, ..., N − 1, if repair on the first failed unit has been going on for a duration k∆ when the other unit is detected to have failed. The duration is a multiple of ∆ because we monitor the system only at epochs that are multiples of ∆ from the start (or from system revival).
Note that by the time the system is detected to have failed, repair on the first failed unit has been going on for a positive duration. Hence, there is no State (2, 0). Also, repair is surely completed within duration N∆. Hence, there is no State (2, N).
Let F and G denote the CDFs of the discretized lifetime X and the discretized repair time Y, respectively. Let p and q denote the corresponding probability mass functions (PMFs), calculated by taking successive differences p_k = F(k∆) − F((k − 1)∆) and q_k = G(k∆) − G((k − 1)∆), respectively, for k = 1, 2, ..., N. Let R denote the CDF of max{X, Y}, calculated (by the independence of X and Y) as the product R(k∆) = F(k∆)G(k∆), and let r denote the corresponding PMF of max{X, Y}, obtained by successive differences r_k = R(k∆) − R((k − 1)∆) for k = 1, 2, ..., N.
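The discretization just described amounts to evaluating the continuous CDFs on the grid k∆ and differencing. A minimal Python sketch, assuming the CDFs are available as callables (the Weibull (shape, scale) inputs and the name `discretize` are illustrative):

```python
import math
from typing import Callable, List, Tuple


def weibull_cdf(shape: float, scale: float) -> Callable[[float], float]:
    return lambda t: 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def discretize(cdf: Callable[[float], float], delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF values C[k] = cdf(k*delta), k = 0..n, and PMF m[k] = C[k] - C[k-1] (m[0] = 0)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    m = [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]
    return C, m


delta, n = 0.1, 120
F, p = discretize(weibull_cdf(3, 1.12), delta, n)  # lifetime X
G, q = discretize(weibull_cdf(2, 2), delta, n)     # repair time Y
R = [F[k] * G[k] for k in range(n + 1)]            # CDF of max{X, Y}, by independence
r = [0.0] + [R[k] - R[k - 1] for k in range(1, n + 1)]

print(f"sum p = {sum(p):.6f}, sum q = {sum(q):.6f}, sum r = {sum(r):.6f}")
```

Each PMF sums to the corresponding CDF value at N∆, which is why N must be chosen so that the mass left beyond N∆ is negligible.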
• At time t = 0, the system is in State 0, where one unit begins to operate and the other (spare) unit is on cold standby. The system goes from State 0 to State 1 when the operating unit is detected to have failed; repair starts on it and the spare unit is put into operation instantaneously. Hence,

$$P_{0\to 1} = 1 \qquad (2.1)$$

The system never returns to State 0.
• From State 1, after an intervention, the system can go to one of two places:
(i) If repair on the failed unit is completed before the operating unit is detected to have failed, then we do not record this transition at all. Instead, we wait until the operating unit is detected to have failed at epoch k∆. Then we interchange the roles of the two units and say that the system has re-entered State 1. This happens with probability

$$P_{1\to 1} = \sum_{k=1}^{N} p_k\, G(k\Delta) \qquad (2.2)$$
(ii) On the other hand, if the operating unit is detected to have failed at epoch k∆ before the repair on the previously failed unit is completed, then the system goes to State (2, k) with probability

$$P_{1\to(2,k)} = p_k\,[1 - G(k\Delta)] \qquad (2.3)$$

In this case, the freshly failed unit waits, and repair on it commences only after the repair on the previously failed unit is found to be completed at an inspection epoch. While the system is in State 2 (that is, in any of the States (2, k)), no unit is operating, and the system is down.
• From State (2, k), the system surely goes to State 1 when the ongoing repair on the first failed unit is found to be completed at an inspection time and the repair on the second failed unit begins. This happens with probability

$$P_{(2,k)\to 1} = 1 \qquad (2.4)$$
In the proposed discretization approach, we split the time range into N intervals (N is determined momentarily), each of length ∆, and we observe the system at epochs k∆ for k = 1, 2, ..., N. For all practical purposes, we assume that repair is completed only at epochs k∆, since those are the observation epochs (and possible installation epochs).
We choose N large enough so that the probability that the larger of the lifetime and the repair time (hence, either the lifetime or the repair time) exceeds N∆ is very small (preferably under .001, say); that is, 1 − R(N∆) ≤ .001.
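The choice of N can be automated by scanning N = 1, 2, ... until 1 − R(N∆) falls to the tolerance. In this sketch the helper `smallest_n` and the Weibull inputs are illustrative assumptions; any larger N (the numerical examples use N = 120 with ∆ = 0.1) only shrinks the truncation error further.

```python
import math
from typing import Callable

Cdf = Callable[[float], float]


def weibull_cdf(shape: float, scale: float) -> Cdf:
    return lambda t: 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def smallest_n(F: Cdf, G: Cdf, delta: float, tol: float = 1e-3, n_max: int = 1_000_000) -> int:
    """Smallest N with 1 - R(N*delta) <= tol, where R(t) = F(t) G(t) is the CDF of max{X, Y}."""
    for n in range(1, n_max + 1):
        if 1.0 - F(n * delta) * G(n * delta) <= tol:
            return n
    raise ValueError("tolerance not reached; increase n_max or delta")


F, G = weibull_cdf(3, 1.12), weibull_cdf(2, 2)
N = smallest_n(F, G, delta=0.1)
print(N, 1.0 - F(N * 0.1) * G(N * 0.1))
```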
solving the following state equations:

$$\pi_j = \sum_{i\in S} \pi_i P_{ij} \ \text{ for all } j \in S, \qquad \sum_{j\in S} \pi_j = 1 \qquad (2.5)$$

where P_ij denotes the transition probability from State i ∈ S to State j ∈ S, and the transition probability matrix P, which is of dimension (N + 1) × (N + 1), is as follows:

$$P = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & * & * & \cdots & * \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 1 & 0 & \cdots & 0 \end{pmatrix}$$

In the P-matrix above, the rows and columns are labeled by the corresponding states, in the order 0, 1, (2, 1), ..., (2, N − 1). Note that although the transition matrix P is (N + 1) × (N + 1), it has non-zero entries (denoted by ∗ where they are not 1) only in the second row, corresponding to transitions out of State 1, and in the second column, corresponding to transitions into State 1. Therefore, it is straightforward to solve eq. (2.5).
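Because only row 2 and column 2 of P carry probability, eq. (2.5) has a closed-form solution: State 0 is transient (π0 = 0), π(2,k) = π1 P_{1→(2,k)}, and normalization fixes π1. The sketch below builds the sparse rows of P and checks πP = π numerically. It assumes exponential inputs for brevity, uses illustrative helper names, and lumps the tiny probability beyond N∆ into the last period so that every row sums exactly to one (an assumption in keeping with the statement that repair is surely completed within N∆).

```python
import math
from typing import Callable, Dict, List, Tuple, Union

State = Union[int, Tuple[int, int]]  # 0, 1, or (2, k)


def exponential_cdf(mean: float) -> Callable[[float], float]:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def grid(cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def transition_rows(p: List[float], G: List[float], n: int) -> Dict[State, Dict[State, float]]:
    """Sparse rows of P for (r = 1, s = 1)."""
    P: Dict[State, Dict[State, float]] = {0: {1: 1.0}}    # eq. (2.1)
    P[1] = {1: sum(p[k] * G[k] for k in range(1, n + 1))}  # eq. (2.2)
    for k in range(1, n):
        P[1][(2, k)] = p[k] * (1.0 - G[k])  # failure detected at k*delta, repair unfinished
        P[(2, k)] = {1: 1.0}                # eq. (2.4)
    return P


def stationary(P: Dict[State, Dict[State, float]], n: int) -> Dict[State, float]:
    """Closed-form solution of eq. (2.5): pi_0 = 0 and pi_(2,k) = pi_1 * P[1][(2, k)]."""
    pi1 = 1.0 / (1.0 + sum(P[1][(2, k)] for k in range(1, n)))
    pi: Dict[State, float] = {0: 0.0, 1: pi1}
    for k in range(1, n):
        pi[(2, k)] = pi1 * P[1][(2, k)]
    return pi


delta, n = 0.1, 200
_, p = grid(exponential_cdf(1.0), delta, n)   # lifetime, mean 1
G, _ = grid(exponential_cdf(1.77), delta, n)  # repair time, mean 1.77
P = transition_rows(p, G, n)
pi = stationary(P, n)

# Verify pi P = pi using the sparse rows.
piP: Dict[State, float] = {s: 0.0 for s in pi}
for s, row in P.items():
    for t, prob in row.items():
        piP[t] += pi[s] * prob
print(max(abs(piP[s] - pi[s]) for s in pi), sum(pi.values()))
```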
Second, we find the expected sojourn time in each state; that is, the expected time the system stays in that state before it moves to a new state. If a unit is found to have failed at inspection time k∆, it must have failed during the interval ((k − 1)∆, k∆]. For simplicity, we assume that it failed at the midpoint of the interval; that is, it was operating for the initial ∆/2 period of the interval and was in a failed (but undetected) state during the last ∆/2 period. Although this is a rather crude assumption, it serves our purpose as far as the computation of the limiting average availability is concerned.
The expected sojourn times µ0 and µ1 in State 0 and State 1, respectively, both equal

$$E(X) - \Delta/2 = \sum_{k=1}^{N} p_k\, k\Delta - \Delta/2,$$

since we do not record a repair until after the operating unit fails. We subtract ∆/2 from the expected discretized lifetime to account for the fact that the system is actually down during the last ∆/2 duration within each of State 0 and State 1.
The expected sojourn time µ(2,k) in any State (2, k) (a down state) is the expected additional repair time, given that the previously failed unit has been undergoing repair for time k∆. For k = 1, 2, ..., N − 1, we have

$$\mu_{(2,k)} = E[Y - k\Delta \mid Y > k\Delta] = \sum_{j=1}^{N-k} \frac{q_{j+k}\, j\Delta}{1 - G(k\Delta)} \qquad (2.6)$$

There is no need to make a further adjustment of ∆/2 in eq. (2.6), as the system is down the whole time while in State (2, k).
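The sojourn times above translate directly into code. In the sketch below (illustrative names; tail mass lumped into the last period as an assumption), exponential inputs make a convenient check: the discretized residual repair time is then geometric, so µ(2,k) should stay close to ∆/(1 − e^{−∆/1.77}) for every k not near N, while µ(2,N−1) equals ∆ because the only remaining possibility is completion at the next inspection.

```python
import math
from typing import Callable, Dict, List, Tuple


def exponential_cdf(mean: float) -> Callable[[float], float]:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def grid(cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def sojourn_times(p: List[float], q: List[float], G: List[float],
                  delta: float, n: int) -> Tuple[float, Dict[int, float]]:
    """Return E(X) - delta/2 (States 0 and 1) and eq. (2.6) for each State (2, k)."""
    mu_up = sum(k * delta * p[k] for k in range(1, n + 1)) - delta / 2
    mu_down = {
        k: sum(q[j + k] * j * delta for j in range(1, n - k + 1)) / (1.0 - G[k])
        for k in range(1, n)
    }
    return mu_up, mu_down


delta, n = 0.1, 200
_, p = grid(exponential_cdf(1.0), delta, n)
G, q = grid(exponential_cdf(1.77), delta, n)
mu_up, mu_down = sojourn_times(p, q, G, delta, n)
print(round(mu_up, 4), round(mu_down[1], 4), round(mu_down[n - 1], 4))
```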
Next, using the Corollary to Proposition 4.8.1 of [8], the limiting probability that the stochastic process will be found in State j (where j runs over all N States 1, (2, 1), (2, 2), ..., (2, N − 1)) is independent of the initial state and is given by

$$\theta_j = \frac{\pi_j \mu_j}{\sum_{i=1}^{N} \pi_i \mu_i} \qquad (2.7)$$

The denominator Σ_{i=1}^{N} π_i µ_i in (2.7) is called the expected cycle time; it is the long-run mean time between successive transitions of the embedded Markov chain. Having calculated all θj's, we define θ2 = θ(2,1) + ··· + θ(2,N−1) = 1 − θ1, since State 2 is the aggregate of States (2, 1), (2, 2), ..., (2, N − 1).
Since the system is up in States 0 and 1 and down in State 2, and since it never returns to State 0 (so θ0 = 0), the limiting average availability of the system is given by

$$A_{av} = 1 - \theta_2 = \theta_1 \qquad (2.8)$$
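Putting eqs. (2.1) to (2.8) together, A_av = θ1 can be computed end to end. This Python sketch solves eq. (2.5) in closed form by exploiting the sparsity of P, lumps the tail mass beyond N∆ into the last period (an assumption), and uses illustrative helper names; it compares the exponential case with eq. (1.3).

```python
import math
from typing import Callable, List, Tuple

Cdf = Callable[[float], float]


def exponential_cdf(mean: float) -> Cdf:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def weibull_cdf(shape: float, scale: float) -> Cdf:
    return lambda t: 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def grid(cdf: Cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def availability_r1s1(life: Cdf, repair: Cdf, delta: float, n: int) -> float:
    """Limiting average availability A_av = theta_1 (eq. (2.8)) for r = 1, s = 1."""
    _, p = grid(life, delta, n)
    G, q = grid(repair, delta, n)
    # Embedded chain: pi_0 = 0 and pi_(2,k) = pi_1 * p_k [1 - G(k delta)].
    to_down = [p[k] * (1.0 - G[k]) for k in range(n)]  # index 0 is unused (p[0] = 0)
    pi1 = 1.0 / (1.0 + sum(to_down[1:]))
    # Sojourn times: E(X) - delta/2 in State 1, eq. (2.6) in State (2, k).
    mu1 = sum(k * delta * p[k] for k in range(1, n + 1)) - delta / 2
    down_weight = 0.0
    for k in range(1, n):
        if to_down[k] > 0.0:  # skip unreachable states (also avoids dividing by zero)
            mu_2k = sum(q[j + k] * j * delta for j in range(1, n - k + 1)) / (1.0 - G[k])
            down_weight += pi1 * to_down[k] * mu_2k
    up_weight = pi1 * mu1
    return up_weight / (up_weight + down_weight)  # theta_1 from eq. (2.7)


if __name__ == "__main__":
    exact = 1.0 * 2.77 / (1.0 + 1.77 + 1.77**2)  # eq. (1.3) with means 1 and 1.77
    approx = availability_r1s1(exponential_cdf(1.0), exponential_cdf(1.77), 0.1, 200)
    print(f"discretized {approx:.5f} vs analytic {exact:.5f}")
    weib = availability_r1s1(weibull_cdf(3, 1.12), weibull_cdf(2, 2), 0.1, 120)
    print(f"Weibull/Weibull: {weib:.5f}")
```

Since π1 cancels in θ1, the result equals (E(X) − ∆/2) divided by itself plus Σ_k p_k[1 − G(k∆)]µ(2,k), and that sum is the discretized expected excess of the repair time over the lifetime. This is a discrete analogue of E[X]/E[max{X, Y}], which is why the two approaches agree so closely.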
Under the discretization approach, since 1 − F(12)G(12) < .001, we decompose the time range (0, 12] into N = 120 intervals of length ∆ = 0.1 each. We construct the CDFs of the discretized lifetime and repair time, F and G, from the above-mentioned Weibull distributions evaluated at k∆ for k = 1, 2, ..., 120. We construct the PMFs p, q, r as defined above by successive differences. Using equations (2.1) to (2.4), we construct the transition probability matrix P, which in this case is of dimension 121 × 121. Recall from above that P has non-zero entries only in row 2 and column 2. Below we partially display the second row, rounding each entry to 3 decimal places; all other entries of the second column are 1.
closely approximates the analytic result previously derived by [13]. (2) For the case (r = 1, s = 1), the limiting average availability is .53341, while for the case (r = 1, s = 0), using eq. (1.1) with MSUT = 1 and MSDT = 1.77, the limiting average availability is only 1/2.77 = .361. Thus, introducing a spare unit yields a significant increase (47.76%) in Aav.
For (r = 1, s = 1), having established the test case of Weibull lifetime and Weibull repair time, we carry out a more comprehensive study of various combinations of lifetime and repair time distributions, always ensuring a mean lifetime of 1 and a mean repair time of 1.77. We report in Table 1 the limiting average availability using both the analytic formula and the discretization approach. We extend the time range to (0, 20] so that 1 − F(20)G(20) < 0.001, but we keep ∆ = 0.1, implying that N = 200 and there are 201 states.
Table 1: Availability under different life- and repair time distributions for the (r = 1, s = 1)
case. The top entry of each cell is the availability computed through discretization and
the bottom entry using eq. (1.4).
Highlighted in the table is the special case when both the lifetime and the repair time distributions are exponential. The analytic result for this case is already given in [1] (page 206), [13] (page 283) and [10] (Corollary 2.2). Here we demonstrate that the result of the discretization approach (.46971) closely approximates the analytic result (.46926). The slight discrepancy is due to crudely subtracting ∆/2 from the expected sojourn times of the system up states, State 0 and State 1.
Initially, the system is in State 0, where one unit begins to operate and the other unit is on cold standby. We write the state space of the system in two different notations, using one or two indices depending on the level of detail required for the analysis:

$$S = \{0;\ 1;\ 2;\ 1^{+}\} = \{0;\ (1,0);\ (2,1), \ldots, (2,N-1);\ (1,1), \ldots, (1,N-1)\}$$
where the first index i denotes how many units have been detected to have failed and are under repair, and the second index j tells us how long the repair on the first failed unit has been going on when the repair on the second failed unit just starts.

Figure 2: The state transition diagram for the (r = 2, s = 1) case. The notation is the same as in Figure 1.
Let us explain the state space notation through several examples:
• State 1 = (1, 0) means that one unit has been detected to have failed; it has just been placed on repair, so that its repair duration so far is 0; and the other unit has just been placed into operation.
• Note that there is no State (2, 0) because, by the time the failure of the second unit is detected, the repair on the first failed unit has already started and has been going on for a positive multiple of ∆. Also, there is no State (2, N) because if repair has been going on for duration N∆, it must have been completed. Likewise, there is no State (1, N).
• State (2, 5) (provided, of course, N > 5) means that the system has just entered State 2 (that is, both units are known to have failed); repair on the first failed unit has been going on for 5 inspection periods (duration 5∆); and repair on the second failed unit has just started.
• State (1, 7) (provided, of course, N > 7) means that repair on the only failed unit has been going on for duration 7∆ when the other unit is just put into operation (hence, there is only one failed unit).
Recall that we only record those inspection epochs at which a failure is detected or a down system is ready for revival because at least one unit has been repaired. In particular, we do not record epochs at which a repair is completed but the other unit is still operating.
Next, let us write down the recorded transitions between states and the associated transition probabilities. Recall that we monitor the system only at epochs ∆, 2∆, 3∆, .... As in the case (r = 1, s = 1), X is the discretized lifetime with CDF F and PMF p, and Y is the discretized repair time with CDF G and PMF q. Also, we choose N such that the larger of the lifetime and the repair time exceeds N∆ with probability at most .001.
• From State 0, the system surely goes to State 1 = (1, 0) after a random lifetime having PMF p. Therefore,

$$P_{0\to(1,0)} = 1 \qquad (3.1)$$
• From State 1 = (1, 0), if the operating unit is still functioning at epoch k∆, we do nothing. But if the operating unit is found to have failed at epoch k∆, then it must have failed in the interval ((k − 1)∆, k∆], which happens with probability p_k = F(k∆) − F((k − 1)∆). There are two distinct cases to consider:
(i) Repair is already completed by epoch k∆ (that is, repair is finished sometime during (0, k∆]), which happens with probability P{Y ≤ k∆} = G(k∆). In this case, we interchange the roles of the two units: the repaired unit takes over the operation and the failed unit is put on repair. Hence, the system re-enters State 1 = (1, 0), and

$$P_{(1,0)\to(1,0)} = \sum_{k=1}^{N} p_k\, G(k\Delta) \qquad (3.2)$$

(ii) Repair is not completed by epoch k∆, which happens with probability P{Y > k∆} = 1 − G(k∆). In this case, the system goes down, since both units have failed and there is no other spare unit to take over operation. More specifically, the system enters State (2, k). Hence,

$$P_{(1,0)\to(2,k)} = p_k\,[1 - G(k\Delta)] \qquad (3.3)$$
• When the system enters State (2, k), we continue to observe the system at regular intervals of ∆, labeling those epochs as (k + 1)∆, (k + 2)∆, .... Two distinct cases are possible:
(i) Both failed units are repaired during the same time interval, say ((k + j − 1)∆, (k + j)∆], where j = 1, 2, ..., N − k. To find the probability of this case, we add over all j the product of two independent probabilities. Given that the repair of the first failed unit was not completed by time k∆, the conditional probability that it is completed during ((k + j − 1)∆, (k + j)∆] is q_{k+j}/[1 − G(k∆)]. The probability that the second failed unit, on which repair started at epoch k∆, is repaired during the same time interval as the first failed unit is q_j. Finally, note that in this case one of the repaired units (it does not matter which one, since the two units are identical) is put into operation and the other becomes a standby spare; that is, the system enters State 0. Therefore,

$$P_{(2,k)\to 0} = \sum_{j=1}^{N-k} \frac{q_{k+j}}{1 - G(k\Delta)}\, q_j \qquad (3.4)$$
(ii) One of the repairs is completed, but not the other. In this case, the repaired unit is put into operation, and the repair on the other unit, which has been going on for time l∆, continues, so that the system enters State (1, l). The meaning of l is explained below in two sub-cases, depending on which repair is completed first: the repair on the first failed unit or the repair on the second failed unit.
– (a) Suppose that the first failed unit, on which the repair has been going on for time k∆, is repaired earlier, during the interval ((k + l − 1)∆, (k + l)∆]. The conditional probability of this event is q_{k+l}/[1 − G(k∆)]. The probability that the second failed unit, on which repair started freshly at epoch k∆, will not be repaired within the additional duration l∆ is P{Y > l∆} = 1 − G(l∆).
– (b) Suppose that the second failed unit, on which repair started at epoch k∆, is repaired earlier, during the interval ((l − 1)∆, l∆], which has probability q_{l−k}. Then the conditional probability that the first failed unit will not be repaired by epoch l∆, given that its repair was not completed by epoch k∆, is [1 − G(l∆)]/[1 − G(k∆)].
Combining the two sub-cases (a) and (b), we have

$$P_{(2,k)\to(1,l)} = \frac{1 - G(l\Delta)}{1 - G(k\Delta)}\,\bigl[q_{k+l} + q_{l-k}\bigr] \qquad (3.5)$$

where we interpret q_t = 0 unless 1 ≤ t ≤ N.
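Eqs. (3.4) and (3.5) can be checked mechanically: the outgoing probabilities of each State (2, k) must sum to one, since the three outcomes (both repairs finish in the same interval, the first finishes first, or the second finishes first) are exhaustive. In this Python sketch the helper names are illustrative and the tail mass beyond N∆ is lumped into the last period (an assumption). For exponential repair, P_{(2,k)→0} should be close to b/(2 − b) with b = 1 − e^{−∆/1.77}, the probability that two independent geometric repair counts coincide.

```python
import math
from typing import Callable, Dict, List, Tuple, Union

State = Union[int, Tuple[int, int]]


def exponential_cdf(mean: float) -> Callable[[float], float]:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def grid(cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def row_from_2k(k: int, q: List[float], G: List[float], n: int) -> Dict[State, float]:
    """Transition probabilities out of State (2, k) for r = 2, s = 1."""
    def qt(t: int) -> float:  # q_t = 0 unless 1 <= t <= N
        return q[t] if 1 <= t <= n else 0.0

    surv_k = 1.0 - G[k]
    # Eq. (3.4): both repairs end in the same inspection interval -> State 0.
    row: Dict[State, float] = {0: sum(qt(k + j) * qt(j) for j in range(1, n - k + 1)) / surv_k}
    # Eq. (3.5): exactly one repair ends; the other has been running for l periods.
    for l in range(1, n):
        prob = (1.0 - G[l]) / surv_k * (qt(k + l) + qt(l - k))
        if prob > 0.0:
            row[(1, l)] = prob
    return row


delta, n = 0.1, 120
G, q = grid(exponential_cdf(1.77), delta, n)
for k in (1, 10, 60, n - 1):
    row = row_from_2k(k, q, G, n)
    print(k, round(row[0], 5), round(sum(row.values()), 12))
```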
• From State (1, l), the system can go in one of two directions:
(i) If the repair is completed before the operating unit fails, we do not record that transition; instead, we wait until the operating unit fails, say during the interval ((j − 1)∆, j∆] (for j = 1, 2, ..., N), which has probability p_j, and the system goes to State (1, 0). The conditional probability that the repair is completed within this additional time j∆, given that it was not completed by time l∆, is [G((l + j)∆) − G(l∆)]/[1 − G(l∆)]. Hence,

$$P_{(1,l)\to(1,0)} = \sum_{j=1}^{N} p_j\, \frac{G((l+j)\Delta) - G(l\Delta)}{1 - G(l\Delta)} \qquad (3.6)$$
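Eq. (3.6) covers the unrecorded-repair case out of State (1, l). The companion case, in which the operating unit fails before the ongoing repair is finished, is not reproduced in this text, so the sketch below assumes the natural form p_j[1 − G((l + j)∆)]/[1 − G(l∆)], leading to State (2, l + j). Under that assumption, and with the tail mass lumped into the last period, each row sums to one, and l = 0 reproduces eq. (3.2) for State (1, 0). Helper names are illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple, Union

State = Union[int, Tuple[int, int]]


def exponential_cdf(mean: float) -> Callable[[float], float]:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def grid(cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def row_from_1l(l: int, p: List[float], G: List[float], n: int) -> Dict[State, float]:
    """Transitions out of State (1, l) for r = 2, s = 1 (l = 0 gives State (1, 0))."""
    def Gx(t: int) -> float:  # G(t) = 1 beyond N
        return G[t] if t <= n else 1.0

    surv_l = 1.0 - G[l]
    # Eq. (3.6): the repair finishes unrecorded, then the operating unit fails -> (1, 0).
    row: Dict[State, float] = {
        (1, 0): sum(p[j] * (Gx(l + j) - G[l]) for j in range(1, n + 1)) / surv_l
    }
    # Assumed companion case: the operating unit fails at relative epoch j while the
    # ongoing repair (now l + j periods old) is unfinished -> (2, l + j).
    for j in range(1, n - l):  # keeps l + j <= N - 1; beyond that G = 1 after lumping
        prob = p[j] * (1.0 - G[l + j]) / surv_l
        if prob > 0.0:
            row[(2, l + j)] = prob
    return row


delta, n = 0.1, 120
_, p = grid(exponential_cdf(1.0), delta, n)
G, _ = grid(exponential_cdf(1.77), delta, n)
for l in (0, 1, 30, n - 1):
    print(l, round(sum(row_from_1l(l, p, G, n).values()), 12))
```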
The row and column labels in the above matrix represent the corresponding states. As in the case (r = 1, s = 1), the continuous-time stochastic process, after discretization, is a semi-Markov process. Hence, the analysis follows along similar lines.
First, we find the stationary probabilities {π_j, j ∈ S} of the discrete-time Markov chain by solving state equations that are similar in structure to eq. (2.5), but involve many more states.
Second, we find the expected sojourn time in each state. In fact, the expected sojourn times µ0, µ(1,0) and µ(1,l) in States 0, (1, 0) and (1, l), for 1 ≤ l ≤ N − 1, all equal E(X) − ∆/2 = Σ_{k=1}^{N} p_k k∆ − ∆/2. (The subtraction of ∆/2 accounts for the system being down during the last ∆/2 duration within each of the States 0, (1, 0) and (1, l).) The expected sojourn time µ(2,k) in State (2, k) (a down state) is the expected value of the minimum of the two repair times Y_0 and Y_k, counted in inspection periods, having CDFs G(j) and [G(k + j) − G(k)]/[1 − G(k)], respectively, for 0 ≤ j ≤ N (with G(t) = 1 for t > N). Using Problem 1.1 of [8], this expectation can be found as the sum of the survival function evaluated at non-negative integers. That is, for k = 1, 2, ..., N − 1, we have (in units of ∆)

$$\mu_{(2,k)} = E[\min\{Y_0, Y_k\}] = \sum_{j=0}^{N} P\{Y_0 > j,\ Y_k > j\} = \sum_{j=0}^{N-k} \frac{[1 - G(j\Delta)]\,[1 - G((k+j)\Delta)]}{1 - G(k\Delta)} \qquad (3.8)$$

Here, there is no need to make an additional adjustment of ∆/2, as the system is down throughout the time it is in State (2, k).
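Eq. (3.8) relies on the identity E[Z] = Σ_{j≥0} P{Z > j} for a positive integer-valued Z, applied to Z = min{Y0, Yk} with independent components. The Python sketch below evaluates it both through that identity and directly from the joint PMF, in units of ∆. It assumes the Weibull repair time (shape 2, scale 2) of the numerical example, lumps the tail mass into the last period, and uses illustrative names; with ∆ = 0.1 and N = 120, µ(2,1) and µ(2,2) should come out close to the 12.549 and 12.093 reported in Section 3.2.

```python
import math
from typing import Callable, List, Tuple


def weibull_cdf(shape: float, scale: float) -> Callable[[float], float]:
    return lambda t: 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def grid(cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def mu_down_survival(G: List[float], k: int, n: int) -> float:
    """Eq. (3.8): E[min{Y0, Yk}] in periods, as a sum of joint survival probabilities."""
    return sum((1.0 - G[j]) * (1.0 - G[k + j]) for j in range(n - k + 1)) / (1.0 - G[k])


def mu_down_direct(q: List[float], G: List[float], k: int, n: int) -> float:
    """The same expectation, summed over the joint PMF of the two repair counts."""
    total = 0.0
    for a in range(1, n - k + 1):  # residual periods of the first repair, Yk
        pa = q[k + a] / (1.0 - G[k])
        for b in range(1, n + 1):  # periods of the freshly started repair, Y0
            total += min(a, b) * pa * q[b]
    return total


delta, n = 0.1, 120
G, q = grid(weibull_cdf(2, 2), delta, n)  # repair time: Weibull, shape 2, scale 2
for k in (1, 2, 3):
    print(k, round(mu_down_survival(G, k, n), 3), round(mu_down_direct(q, G, k, n), 3))
```

Multiplying these values by ∆ converts them to time units, which is what a computation mixing them with E(X) − ∆/2 requires.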
Next, using the Corollary to Proposition 4.8.1 of [8], the limiting probability that the stochastic process will be found in State j is independent of the initial state and is given by expressions of the form of eq. (2.7), but with many more states. Let us define State 1+ as the aggregate of States (1, 1), (1, 2), ..., (1, N − 1) and State 2 as the aggregate of States (2, 1), (2, 2), ..., (2, N − 1). Having calculated all θj's, we define θ2 = θ(2,1) + ··· + θ(2,N−1). Since the system is up in States 0, 1 and 1+ and down in State 2, with all states recurrent, the limiting average availability of the system is given by

$$A_{av} = 1 - \theta_2 \qquad (3.9)$$
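Finally, the whole (r = 2, s = 1) computation can be assembled. This is a sketch under explicit assumptions, not the authors' code: the transition from State (1, l) to State (2, l + j) takes the form p_j[1 − G((l + j)∆)]/[1 − G(l∆)] (the corresponding equation is not reproduced in this text); the tail mass beyond N∆ is lumped into the last period; states whose conditioning probability underflows below a small threshold are treated as unreachable; and the stationary equations are solved by plain Gaussian elimination, a generic choice rather than a method the paper specifies. Sojourn times are kept in time units, so eq. (3.8) is multiplied by ∆.

```python
import math
from typing import Callable, Dict, List, Tuple, Union

Cdf = Callable[[float], float]
State = Union[int, Tuple[int, int]]  # 0, (1, l) or (2, k)
SURV_EPS = 1e-12  # assumption: conditioning events rarer than this mark unreachable states


def exponential_cdf(mean: float) -> Cdf:
    return lambda t: 1.0 - math.exp(-t / mean) if t > 0 else 0.0


def grid(cdf: Cdf, delta: float, n: int) -> Tuple[List[float], List[float]]:
    """Grid CDF and PMF; the tail beyond n*delta is lumped into period n (assumption)."""
    C = [cdf(k * delta) for k in range(n + 1)]
    C[n] = 1.0
    return C, [0.0] + [C[k] - C[k - 1] for k in range(1, n + 1)]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (adequate for a few hundred states)."""
    m = len(b)
    M = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, m):
            f = M[i][c] / M[c][c]
            if f != 0.0:
                for j in range(c, m + 1):
                    M[i][j] -= f * M[c][j]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (M[i][m] - sum(M[i][j] * x[j] for j in range(i + 1, m))) / M[i][i]
    return x


def availability_r2s1(life: Cdf, repair: Cdf, delta: float, n: int) -> float:
    _, p = grid(life, delta, n)
    G, q = grid(repair, delta, n)

    def qt(t: int) -> float:  # q_t = 0 unless 1 <= t <= N
        return q[t] if 1 <= t <= n else 0.0

    def Gx(t: int) -> float:  # G(t) = 1 beyond N
        return G[t] if t <= n else 1.0

    up = sum(k * delta * p[k] for k in range(1, n + 1)) - delta / 2  # E(X) - delta/2
    states: List[State] = [0] + [(1, l) for l in range(n)] + [(2, k) for k in range(1, n)]
    P: Dict[State, Dict[State, float]] = {0: {(1, 0): 1.0}}  # eq. (3.1)
    mu: Dict[State, float] = {0: up}
    for l in range(n):  # States (1, l); l = 0 is State (1, 0)
        surv = 1.0 - G[l]
        mu[(1, l)] = up
        if surv < SURV_EPS:
            P[(1, l)] = {(1, 0): 1.0}
            continue
        row: Dict[State, float] = {
            (1, 0): sum(p[j] * (Gx(l + j) - G[l]) for j in range(1, n + 1)) / surv
        }
        for j in range(1, n - l):  # assumed move to (2, l + j); see the lead-in
            row[(2, l + j)] = p[j] * (1.0 - G[l + j]) / surv
        P[(1, l)] = row
    for k in range(1, n):  # States (2, k): eqs. (3.4), (3.5) and (3.8) times delta
        surv = 1.0 - G[k]
        if surv < SURV_EPS:
            P[(2, k)], mu[(2, k)] = {(1, 0): 1.0}, 0.0
            continue
        row = {0: sum(qt(k + j) * qt(j) for j in range(1, n - k + 1)) / surv}
        for l in range(1, n):
            row[(1, l)] = (1.0 - G[l]) / surv * (qt(k + l) + qt(l - k))
        P[(2, k)] = row
        mu[(2, k)] = delta * sum((1.0 - G[j]) * (1.0 - G[k + j]) for j in range(n - k + 1)) / surv
    idx = {s: i for i, s in enumerate(states)}
    m = len(states)
    A = [[0.0] * m for _ in range(m)]  # A[t][s] = P[s][t] - 1{s = t}: one balance equation per row
    for s, out in P.items():
        for t, prob in out.items():
            A[idx[t]][idx[s]] += prob
    for i in range(m):
        A[i][i] -= 1.0
    A[-1] = [1.0] * m  # replace one redundant balance equation by sum(pi) = 1
    pi = solve(A, [0.0] * (m - 1) + [1.0])
    weights = {s: pi[idx[s]] * mu[s] for s in states}  # proportional to theta_s, as in eq. (2.7)
    theta2 = sum(w for s, w in weights.items() if s != 0 and s[0] == 2) / sum(weights.values())
    return 1.0 - theta2  # eq. (3.9)


if __name__ == "__main__":
    print(availability_r2s1(exponential_cdf(1.0), exponential_cdf(1.77), delta=0.2, n=70))
```

On the coarse grid ∆ = 0.2 the exponential case should land close to, and slightly below, the analytic 0.63871 quoted in Section 4, much as the paper's .63676 does for ∆ = 0.1. The paper's own grid (∆ = 0.1, N = 120, 240 states) also works but takes noticeably longer with this pure-Python solver.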
3.2. Computations and comparison
We compute the limiting average availability for various lifetime and repair time distributions, always choosing mean lifetime 1 and mean repair time 1.77. We truncate all distributions to the support [0, 12], which we partition into 120 equal sub-intervals; that is, we choose ∆ = 0.1. Consequently, there are 240 states in the state space S (State 0, State (1, 0), and 119 states each of the forms (2, k) and (1, l)).
The transition probability matrix P is 240 × 240; its entries, computed using equations (3.1) to (3.7) and rounded to 4 decimal places, are partially displayed:
The stationary probabilities, obtained from eq. (2.5), are π0 = .010, π(1,0) = .265, and
{π(2,1), π(2,2), π(2,3), . . . , π(2,N−2), π(2,N−1)} = {.0002, .0013, .0035, . . . , 0, 0}.
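A dependency-free Python sketch for obtaining stationary probabilities follows, assuming eq. (2.5) is the usual balance system π = πP with Σj πj = 1 for an irreducible chain. One balance equation is replaced by the normalization and the system is solved by Gaussian elimination; `stationary_distribution` is a hypothetical helper, and the matrices below are toy examples, not the 240 × 240 matrix of the paper.

```python
def stationary_distribution(P):
    """Solve pi = pi P, sum(pi) = 1, for an irreducible stochastic matrix P (list of rows)."""
    n = len(P)
    # Row i encodes sum_j pi_j P[j][i] - pi_i = 0, i.e. (P^T - I) pi = 0.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # Replace the last (redundant) balance equation with the normalization.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi


print(stationary_distribution([[0.0, 1.0], [0.5, 0.5]]))  # approximately [1/3, 2/3]
```

Partial pivoting keeps the elimination stable; for a 240-state chain a library solver would be faster, but the logic is the same.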
The expected sojourn times in State 0, State (1, 0) and States (1, l), for l = 1, 2, . . . , N − 1, are all equal to 10.0016. Using eq. (3.8), we obtain
{µ(2,1), µ(2,2), µ(2,3), . . . , µ(2,N−2), µ(2,N−1)} = {12.549, 12.093, 11.665, . . . , 1.399, 1}.
Next, using eq. (2.7), we find that the limiting probabilities that the stochastic process is in State 0, State (1, 0), State 1+ and State 2 are θ0 = .0106, θ(1,0) = .2794, θ1+ = .3764 and θ2 = .4666, respectively. The expected cycle time is 9.493. Finally, eq. (3.9) gives a limiting average availability of .66650.
Furthermore, Table 2 displays the limiting average availability of the (r = 2, s = 1) system for the same life- and repair time distributions as in the (r = 1, s = 1) case, together with the percentage improvement over the (r = 1, s = 1) case.
Table 2: Limiting average availability of the (r = 2, s = 1) system; the percentage improvement over the (r = 1, s = 1) system is given in parentheses.

Life-time \ Repair-time    Exponential (1/1.77)    Gamma (2, 0.855)    Weibull (2, 2)
Weibull (3, 1.12)          .65807 (33.37)          .66392 (27.54)      .66650 (24.94)
Gamma (2, 0.5)             .64764 (34.44)          .65057 (29.05)      .65171 (26.51)
Inverse-Gauss (1, 1)       .63903 (35.33)          .63992 (30.44)      .63943 (28.23)
Exponential (1)            .63676 (35.56)          .63718 (30.61)      .63693 (28.21)
Lognormal (-0.5, 1)        .63024 (36.23)          .63009 (31.64)      .62537 (29.88)
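The parenthesized entries of Table 2 are relative gains. A minimal sketch of that arithmetic follows, with hypothetical helper names; `implied_baseline` backs out the (r = 1, s = 1) availability implied by a table entry, which is accurate only up to the rounding of the printed values.

```python
def percent_improvement(a_new, a_old):
    """Relative gain, in percent, of availability a_new over a_old."""
    return 100.0 * (a_new - a_old) / a_old


def implied_baseline(a_new, pct):
    """Availability a_old implied by a_new and a reported percentage improvement."""
    return a_new / (1.0 + pct / 100.0)


# Exponential life and exponential repair entry of Table 2: .63676 (35.56).
base = implied_baseline(0.63676, 35.56)
print(round(base, 4), round(percent_improvement(0.63676, base), 2))
```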
4. Conclusion
Recall from Section 2 that our discretization approach closely approximates the analytic result for the (r = 1, s = 1) case. Also, from Section 3 we note that for the (r = 2, s = 1) case under exponential life and exponential repair times, the analytic result of [10] yields a limiting average availability of 0.63871, while our discretization approach using eq. (2.8) gives .63676, a relative difference of about 0.3%. Hence, we claim that the discretization approach works reasonably well, and that it can be used to compute the limiting average availability for any life- and repair time distributions. We also find that adding a spare unit, from (r = 1, s = 0) to (r = 1, s = 1), or adding a repair facility, from (r = 1, s = 1) to (r = 2, s = 1), significantly increases the limiting average availability of the system; in Table 2 the latter gain ranges from about 25% to 36%.
Naturally, we anticipate a further increase in limiting average availability when the number of spare units and/or the number of repair facilities is increased. Of course, each additional spare unit or repair facility increases the number of states and therefore the computational burden. Nonetheless, the discretization approach continues to yield the limiting average availability under arbitrary continuous life- and repair time distributions, and it applies to other systems as well. For example, in future work we plan to extend the discretization method to study a k-out-of-N: G system.
Thus, our main contribution in this paper is a simple computational technique, based on discretization, that accommodates arbitrary life- and repair time distributions as well as additional repair facilities and/or spare units.
5. Acknowledgement
We sincerely thank three reviewers for their valuable suggestions and advice. Their recommendations not only improved the readability and relevance of our work but also helped us gather resources for future research.
References
[1] Barlow, R.E., Proschan, F., 1975. Statistical theory of reliability and life testing: probability models. Technical Report. Florida State Univ Tallahassee.
[2] Biswas, A., Sarkar, J., 2000. Availability of a system maintained through several imperfect repairs before a replacement or a perfect repair. Statistics & Probability Letters 50, 105–114. URL: https://doi.org/10.1016/S0167-7152(00)00087-0.
[3] Biswas, A., Sarkar, J., Sarkar, S., 2003. Availability of a periodically inspected system, maintained under an imperfect-repair policy. IEEE Transactions on Reliability 52, 311–318. URL: https://ieeexplore.ieee.org/document/1248648.
[4] Cui, L., Xie, M., 2005. Availability of a periodically inspected system with random repair or replacement times. Journal of Statistical Planning and Inference 131, 89–100. URL: https://doi.org/10.1016/j.jspi.2003.12.008.
[5] Levitin, G., Xing, L., Dai, Y., 2015. Optimal loading of system with random repair time. European Journal of Operational Research 247, 137–143. URL: https://doi.org/10.1016/j.ejor.2015.05.033.
[6] Levitin, G., Xing, L., Dai, Y., 2017. Optimal loading of series parallel systems with arbitrary element time-to-failure and time-to-repair distributions. Reliability Engineering & System Safety 164, 34–44. URL: https://doi.org/10.1016/j.ress.2017.02.008.
[7] Levitin, G., Xing, L., Huang, H.Z., 2019. Dynamic availability and performance deficiency of common bus systems with imperfectly repairable components. Reliability Engineering & System Safety 189, 58–66. URL: https://doi.org/10.1016/j.ress.2019.04.007.
[8] Ross, S.M., 1996. Stochastic Processes, 2nd ed. Wiley, New York.
[9] Sarkar, J., Chaudhuri, G., 1999. Availability of a system with gamma life and exponential repair time under a perfect repair policy. Statistics & Probability Letters 43, 189–196. URL: https://doi.org/10.1016/S0167-7152(98)00259-4.
[10] Sarkar, J., Li, F., 2006. Limiting average availability of a system supported by several spares and several repair facilities. Statistics & Probability Letters 76, 1965–1974. URL: https://doi.org/10.1016/j.spl.2006.04.046.
[11] Sarkar, J., Sarkar, S., 2000. Availability of a periodically inspected system under perfect repair. Journal of Statistical Planning and Inference 91, 77–90. URL: https://doi.org/10.1016/S0378-3758(00)00128-2.
[12] Sarkar, J., Sarkar, S., 2001. Availability of a periodically inspected system supported by a spare unit, under perfect repair or perfect upgrade. Statistics & Probability Letters 53, 207–217. URL: https://doi.org/10.1016/S0167-7152(01)00087-6.
[14] de Smidt-Destombes, K.S., van der Heijden, M.C., van Harten, A., 2004. On the availability of a k-out-of-n system given limited spares and repair capacity under a condition based maintenance strategy. Reliability Engineering & System Safety 83, 287–300. URL: https://doi.org/10.1016/j.ress.2003.10.004.
[15] de Smidt-Destombes, K.S., van der Heijden, M.C., van Harten, A., 2007. Availability of k-out-of-n systems under block replacement sharing limited spares and repair capacity. International Journal of Production Economics 107, 404–421. URL: https://doi.org/10.1016/j.ijpe.2006.08.013.
[16] Wang, N., Li, M., Xiao, B., Ma, L., 2018. Availability analysis of a general time distribution system with the consideration of maintenance and spares. Reliability Engineering & System Safety. URL: https://doi.org/10.1016/j.ress.2018.06.025.
[17] Wu, W., Tang, Y., Yu, M., Jiang, Y., 2014. Reliability analysis of a k-out-of-n: G repairable system with single vacation. Applied Mathematical Modelling 38, 6075–6097. URL: https://doi.org/10.1016/j.apm.2014.05.020.
[18] Wu, W., Tang, Y., Yu, M., Jiang, Y., Liu, H., 2018. Reliability analysis of a k-out-of-n: G system with general repair times and replaceable repair equipment. Quality Technology & Quantitative Management 15, 274–300. URL: https://doi.org/10.1080/16843703.2016.1226712.