8 Francois
Stéphane Dubois
Régie Autonome des Transports Parisiens (RATP)
Dept. Des Equipements et des Systèmes du Transport, 40bis, rue Salengro
F-94724, Fontenay-sous-Bois Cedex, France.
Abstract
1. Introduction
This paper introduces an extension of the commonly used VirMaLab formalism. The
new application addresses broken-rail prevention on Paris metro lines in the
context of line automation. The final goal of the project is to evaluate and
compare various diagnostic, maintenance and operating scenarios in terms of
availability, broken-rail frequency, etc. Because of peak-hour constraints, the
operator (RATP) needs to estimate, hour by hour, its ability to detect a broken rail.
However, for several reasons (computation time, parameter accuracy, learning data,
etc.), modeling the whole rail degradation process with a one-hour step is not feasible.
Proceedings of the 38th ESReDA Seminar, Pecs, Hungary, May 4-5, 2010
Our goal is to model the influence of maintenance on the reliability and operating
performance of the Paris metro network of the RATP, the major transit operator
responsible for public transportation in Paris and its surroundings. The
application takes place in the context of the renewal of the Paris metro command
system, for which decision makers will need to update existing maintenance
policies. Our constraint is to build a system with a granularity of one hour, since we
need to evaluate the number of lost operating loops and since the number of trains in
operation varies with the time of day. To address this problem, we have built a
multi-model containing four models, each with its own time step.
The paper is organized as follows. The next section introduces the methodology of our
approach. The third section deals with the technical developments of the
formalisms of Dynamic Bayesian Networks and of Graphical Duration Models. The
fourth part details our model. We then conclude with results and future work.
2. Methodology
As we need a model with a one-hour step, and as inference and simulation in
probabilistic graphical models are computationally expensive, we have chosen to build
a multi-model consisting of four different models, each concentrating on a different
task and using its own time step.
These four models are motivated by the fact that, before the rail breaks, a defect
evolves through normalized sizes. The first normalized defect is named X1 and
represents a small defect inside the structure of the rail. The second normalized
defect is named X2 and affects maintenance planning; this kind of defect is often
observed. The last normalized defect is the crack, named BR (broken rail). The
remaining possible state is the normal state, named OK.
Advanced Maintenance Modelling
Given this classification, the first dynamical model evaluates the transition of
the rail state from OK to X1 and, given the slow evolution of the rail, its time
step is the month. The second dynamical model represents the evolution of the rail
from state X1 to state X2 and is also based on a monthly step. The third
dynamical model simulates the degradation of the rail from state X2 to the crack
state BR; its step is the week.
These first three models emphasize the role of the preventive maintenance strategies.
The last model (from state BR to state OK) emphasizes the role of corrective
maintenance and evaluates, hour by hour, the response of the system to the crack
until the rail is replaced by a new component.
In simulation, each model runs until a transition to a new state occurs, which
activates the next corresponding model. The originality of this work is the
combined use of several dynamic Bayesian networks together with the Graphical
Duration Model recently introduced in the literature by [3].
For exact inference on this multi-model, each model concentrates on a different
stage of the evolution of the rail. We therefore consider that performing separate
inference on each model, even though the models are linked, gives an estimate of the
component lifetime: the estimated time spent in state OK can be added to the times
spent in states X1, X2 and BR. This approximation is acceptable because only rails
in states X2 and BR are renewed. We then need to estimate the percentage of rails
that reach state BR, compared with those in state X2 that have been corrected by
preventive maintenance, in order to weight the above sum.
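As a hedged illustration of this approximation, the sketch below adds per-state mean sojourn times and weights the BR term by the fraction of X2 rails that reach BR instead of being renewed preventively. The function name, the argument names, the numeric values and the exact placement of the weight are assumptions made for illustration; they are not taken from the study.

```python
def expected_lifetime(mean_ok, mean_x1, mean_x2, mean_br, p_reach_br):
    """Approximate mean rail lifetime (all durations in the same unit).

    p_reach_br is the assumed fraction of rails in X2 that reach BR;
    the others are renewed by preventive maintenance while still in X2.
    """
    if not 0.0 <= p_reach_br <= 1.0:
        raise ValueError("p_reach_br must be a probability")
    # Assumption: every rail goes through OK, X1 and X2,
    # and only a fraction p_reach_br spends time in BR.
    return mean_ok + mean_x1 + mean_x2 + p_reach_br * mean_br


# Hypothetical figures, in hours (not RATP data).
lifetime_hours = expected_lifetime(80000.0, 20000.0, 4000.0, 6.0, 0.25)
```

Each mean sojourn time would come from a separate inference on the corresponding sub-model, converted to a common unit before summation.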
3. Technical developments
3.1 Dynamical Bayesian Networks
A Bayesian network is defined by:
a directed acyclic graph (DAG) G = <N, U>, where N represents the set of nodes (one
node for each variable) and U the set of edges;
parameters θ = [[θijk]], 1 ≤ i ≤ n, 1 ≤ j ≤ qi, 1 ≤ k ≤ ri, the set of conditional
probability tables of each node Xi given the state of its parents Pa(Xi) (with ri and
qi the respective cardinalities of Xi and Pa(Xi)).
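The indexing of θijk can be made concrete with a minimal sketch: for each node Xi, the table has one row per parent configuration j and one column per state k. The two-node network below (a hypothetical "Context" node influencing a hypothetical "Rail" node) and all of its probabilities are illustrative assumptions.

```python
# theta[node][j][k] = P(X_i = k | Pa(X_i) in configuration j)
theta = {
    "Context": [[0.7, 0.3]],   # no parent: q_i = 1, r_i = 2
    "Rail": [[0.9, 0.1],       # parent configuration j = 0
             [0.6, 0.4]],      # parent configuration j = 1
}
parents = {"Context": [], "Rail": ["Context"]}


def check_cpts(tables, tol=1e-9):
    """Each row of a CPT must be a probability distribution over the r_i states."""
    return all(abs(sum(row) - 1.0) <= tol
               for table in tables.values() for row in table)


def joint(context, rail):
    """Chain rule of the network: P(Context, Rail) = P(Context) P(Rail | Context)."""
    return theta["Context"][0][context] * theta["Rail"][context][rail]
```

Here joint(1, 1) multiplies the prior 0.3 by the conditional 0.4, which is how the graph structure factorizes the joint distribution.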
In reliability analysis, one is often interested in modeling how a system changes
from an up state to a down state over time. This has most often been done with a
model based on the DBN formalism [2].
The major drawback of this approach comes from the constraint on the state sojourn
times, which are necessarily exponentially distributed (geometrically distributed in
discrete time). If the considered system follows (or is very close to) an
exponentially distributed degradation process, this approach is perfectly suitable.
On the other hand, if the sojourn times are far from an exponential distribution, a
Markovian model is unable to take this fact into account and the model of the
degradation process is biased. In a reliability analysis, such inaccurate estimation
can have strong consequences, notably if one wants to optimize the parameters of
maintenance policies based on reliability. This constraint can be lifted by using
semi-Markov models, which allow any kind of sojourn time distribution. One solution
is introduced below.
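The constraint can be checked on a small example. In a discrete-time Markov chain where a state is kept with probability p at each step, the sojourn time D follows the geometric law P(D = d) = p^(d-1) (1 - p), the discrete counterpart of the exponential law. The sketch below, with a hypothetical p = 0.9, shows that this law is always decreasing in d.

```python
def markov_sojourn_pmf(p_stay, max_d):
    """P(D = d) for d = 1..max_d when the state is kept with probability p_stay."""
    return [(p_stay ** (d - 1)) * (1.0 - p_stay) for d in range(1, max_d + 1)]


pmf = markov_sojourn_pmf(p_stay=0.9, max_d=5)
# Successive probabilities differ by the constant factor p_stay, so the most
# likely sojourn is always a single step: a wear-out behaviour, with a peak
# at a later duration, cannot be encoded.
ratios = [pmf[d + 1] / pmf[d] for d in range(len(pmf) - 1)]
mean_sojourn = 1.0 / (1.0 - 0.9)  # mean of the geometric law, in time steps
```

Only the mean can be tuned through p; the shape of the distribution is fixed, which is the bias discussed above.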
The Graphical Duration Model (GDM) is a specific DBN based on semi-Markov models.
The main idea is to introduce a remaining-time variable into the graph, which makes
it possible to model multi-state systems featuring complex sojourn times. Figure 1
shows a GDM in its DBN form.
The solid lines define the basic structure; dashed lines indicate optional items and
red bold edges characterize dependencies between time slices. The model handles two
kinds of variable:
(Xt), 1 ≤ t ≤ T, represents the system state over a sequence of length T;
(XDt), 1 ≤ t ≤ T, represents the remaining time before a change of system state
(remaining sojourn time).
The notation $A \perp\!\!\!\perp B$ means that variables A and B are statistically
independent. The GDM structure leads to
$$(X_{t-1}, X^D_{t-1}) \perp\!\!\!\perp (X_{t+1}, X^D_{t+1}) \mid (X_t, X^D_t).$$
So the process (Xt, XDt) generated by a GDM is Markovian, although (Xt) alone is not.
The GDM generalizes recent studies on discrete semi-Markov processes [1].
From a practical point of view, this approach allows arbitrary state sojourn time
distributions to be specified, in contrast with a classical Markovian framework in
which all durations have to be exponentially distributed.
This modeling is therefore particularly useful when the aim is to capture the
behavior of a given system subject to a particular context and a complex
degradation distribution. More details on GDMs (quantitative description, optional
context description) can be found in [3, 4].
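The mechanism can be sketched as follows, in a simplified form without context variables: while the remaining time is greater than one, the state is kept and the counter is decremented; when it reaches one, a new state is drawn and a new duration is drawn from the sojourn distribution of that state. The two states, the distributions and the function names below are hypothetical and only illustrate the principle described in [3, 4].

```python
import random

# Hypothetical sojourn-time distributions: state -> {duration: probability}
SOJOURN = {"OK": {3: 0.2, 4: 0.5, 5: 0.3}, "X1": {1: 0.5, 2: 0.5}}
# Hypothetical transition table, used when the remaining time runs out
NEXT_STATE = {"OK": {"X1": 1.0}, "X1": {"OK": 1.0}}


def draw(dist, rng):
    """Sample a key of dist according to its probabilities."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def simulate_gdm(n_steps, rng, start="OK"):
    """Return the list of (X_t, XD_t) pairs for t = 1..n_steps."""
    state, remaining = start, draw(SOJOURN[start], rng)
    path = [(state, remaining)]
    for _ in range(n_steps - 1):
        if remaining > 1:
            remaining -= 1  # same state, the countdown continues
        else:
            state = draw(NEXT_STATE[state], rng)
            remaining = draw(SOJOURN[state], rng)
        path.append((state, remaining))
    return path


path = simulate_gdm(50, random.Random(0))
```

The pair (state, remaining) is sufficient to generate the next pair, which is the Markov property of (Xt, XDt) stated above, while the sojourn distributions in SOJOURN can take any shape.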
The design of the model was guided by the fact that a crack evolves in the rail
structure through normalized defect sizes. The first state is OK, when the rail is
sound. The first level of defect, named X1, represents a crack less than
5 millimeters long. The second level of defect, named X2, represents a crack more
than 5 millimeters long. The actual propagation speed is not precisely known, as it
depends on too many parameters, but it is empirically known that the larger the
crack, the faster it grows. Discriminating defects by size therefore allows them to
be modeled more accurately. The last state, named BR (also denoted S), represents
a defect that requires immediate replacement of the rail.
This structure, in which cracks evolve at different speeds and are associated with
different maintenance policies, has led us to a multi-model. The first model
represents the (slow) evolution from the normal state (OK) to the first level of
perceptible defect X1. This model has a monthly step. The second model represents
the degradation from the low-level defect X1 to the high-level defect X2. This second
model also has a monthly step, as the evolution from X1 to X2 is known to take up to
several years. For instance, safety policies state that, after 3 years classified as
X1, a defect is automatically reclassified as X2. The third model represents the
degradation of the rail from state X2 (crack longer than 5 mm) to state BR
(unsafe crack) and uses a weekly step. These first three models concentrate on the
evaluation of preventive maintenance, as the rail is still considered safe.
The last model concentrates on the efficiency of corrective maintenance. It deals
with an unsafe rail (BR) and evaluates, hour by hour, whether the crack is detected
by the different detection agents. Once the crack is detected, the rail is replaced,
the cost is evaluated, and the process returns to the first model with a new rail (OK).
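The chaining of the four sub-models, each with its own time step, can be sketched as a simple simulation loop in which durations are converted to hours. The mean durations, the geometric sampling (used here only for brevity, where the actual models rely on GDMs) and the omission of maintenance actions are assumptions of this sketch; only the time steps and the 3-year reclassification rule come from the description above.

```python
import random

HOURS_PER_STEP = {"month": 730, "week": 168, "hour": 1}  # month = 8760 h / 12

# (state, time step of the sub-model, next state)
PHASES = [("OK", "month", "X1"), ("X1", "month", "X2"),
          ("X2", "week", "BR"), ("BR", "hour", "OK")]
MEAN_STEPS = {"OK": 120, "X1": 24, "X2": 30, "BR": 4}  # hypothetical values
X1_MAX_MONTHS = 36  # after 3 years in X1, a defect is reclassified as X2


def steps_in_state(state, rng):
    """Draw a sojourn, in sub-model steps (geometric law, for brevity only)."""
    steps = 1
    while rng.random() > 1.0 / MEAN_STEPS[state]:
        steps += 1
    return min(steps, X1_MAX_MONTHS) if state == "X1" else steps


def simulate_rail_cycle(rng):
    """Run the four sub-models in sequence; return [(state, hours spent)]."""
    history = []
    for state, step, _next_state in PHASES:
        n = steps_in_state(state, rng)
        history.append((state, n * HOURS_PER_STEP[step]))
    return history


cycle = simulate_rail_cycle(random.Random(1))
total_hours = sum(hours for _state, hours in cycle)
```

Only the last phase needs an hourly resolution, so the number of simulated steps stays small even though the final result is expressed in hours.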
The first model, shown in figure 3, deals with the rail's preventive maintenance
strategy. As a VirMaLab model, it consists of two blocks.
The first one describes the degradation process of the rail, using the GDM formalism
(introduced in section 3). The rail degradation can be influenced by several
contextual variables such as the rolling stock (which changes from one line to
another), the curve radius (and whether the inner or outer rail is considered) and
the steel's stiffness.
The second block of this model describes the diagnosis devices and the maintenance
strategy. Three devices perform periodic auscultations of the rails: the ultrasonic
vehicle (USV), walking survey teams (WT) and drivers (Drv, whose presence depends
on the state of the traffic, with peak hours, night operating stops, etc.). The
modelling of the fourth device, the track circuit (TC), is a little more complex.
Various track circuit technologies make up the signalling network, with different
failure rates, different sizes, etc. Moreover, the analysis of RATP databases shows
that, during warm seasons, the thermal expansion of the rail maintains the electrical
contact across many cracks. All these variables therefore have to be taken into
account in the final model.
All four diagnosis devices supply an estimate of the current state of the rail
(integrating their own good detection and false alarm rates) that influences the
maintenance decision. When a maintenance action is performed, it is assumed that the
system returns to the OK state in a single iteration.
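To illustrate how good detection and false alarm rates combine, the sketch below computes the probability that at least one active device raises an alarm during one time step. The rates are hypothetical, and the devices are assumed to respond independently given the true state of the rail, which is a simplification of the Bayesian network of figure 3.

```python
# Hypothetical per-step rates: device -> (good detection rate, false alarm rate)
DEVICES = {"USV": (0.95, 0.01), "WT": (0.60, 0.02),
           "Drv": (0.30, 0.00), "TC": (0.80, 0.05)}


def p_any_alarm(active, defect_present):
    """P(at least one active device raises an alarm in the current step).

    Assumes the devices respond independently given the true rail state.
    """
    p_none = 1.0
    for name in active:
        detection, false_alarm = DEVICES[name]
        p_none *= 1.0 - (detection if defect_present else false_alarm)
    return 1.0 - p_none


# Broken rail seen only by drivers and track circuits: 1 - 0.7 * 0.2
p_detect = p_any_alarm(["Drv", "TC"], defect_present=True)
# Sound rail with all four devices active: probability of a false alarm
p_false = p_any_alarm(["USV", "WT", "Drv", "TC"], defect_present=False)
```

The list of active devices would change from one time step to the next, since the USV and WT follow their auscultation periods and drivers are only present while trains are running.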
Figure 3. Structure of the VirMaLab model for the first three slices of the StatAvaries multi-nets.
Four models inspired by the one represented in figure 3 are linked together in a
dynamical process. The resulting model, represented in figure 4, is a dynamic
Bayesian multi-network that makes it possible to simulate the lifetime of the
observed rail together with the maintenance policies applied to the rail network.
To make this multi-nets model easier to use for both maintenance operators and
managers of the line automation project, a user-friendly interface was developed.
It allows the following parameters to be set:
- the considered line (among the 11 iron-contact RATP metro lines);
- the rail context: the whole line or only the rails in curves (possibly only the
upper rail);
- the critical curve radius, which determines the set of curves on which a crack
could have critical consequences for passenger safety;
- the rail quality: for various reasons, an operator can decide to change the steel
stiffness, in which case the rail degradation process must be adapted;
- the rolling stock specifications: running period, mean speed, length and axle
load. These parameters influence the rail degradation speed and are also necessary
to evaluate some final indicators;
- the diagnosis parameters, such as good detection and false alarm rates, USV and WT
auscultation periods, and parameters of the TC technologies encountered on the
considered metro line;
- the traffic periods: the user can define the night and running periods (usually, a
metro line operates 20 hours a day) and, within the operating period, 6 different
time windows with their own train periods. Thus, the real traffic conditions of each
line (but also hypothetical parameters that might be evaluated) can be modeled.
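The traffic periods lend themselves to a small configuration sketch. Below, a hypothetical line operates 20 hours a day, split into six windows, each with its own train period (interpreted here as the time between two trains, in minutes). The class name, the field names and all values are assumptions and do not reflect the StatAvaries interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficWindow:
    start_hour: int      # inclusive, 0-23
    end_hour: int        # exclusive; may exceed 24 to wrap past midnight
    headway_min: float   # time between two trains, in minutes


# Hypothetical line: service from 05:00 to 01:00 (20 h), six windows.
WINDOWS = [TrafficWindow(5, 7, 6.0), TrafficWindow(7, 10, 2.0),
           TrafficWindow(10, 16, 4.0), TrafficWindow(16, 20, 2.0),
           TrafficWindow(20, 23, 5.0), TrafficWindow(23, 25, 8.0)]


def trains_per_hour(hour):
    """Number of trains passing during the given clock hour (0 at night)."""
    for w in WINDOWS:
        if any((h % 24) == hour for h in range(w.start_hour, w.end_hour)):
            return 60.0 / w.headway_min
    return 0.0


daily_profile = [trains_per_hour(h) for h in range(24)]
operating_hours = sum(1 for n in daily_profile if n > 0)
```

An hourly profile of this kind is what a model with a one-hour step needs in order to count the trains affected while a broken rail remains undetected.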
When all parameters are defined, the inference can begin. Due to the modelling
complexity, the computation of an experiment can be quite long (around 2 hours).
However, the StatAvaries tool provides exact values of the expected results, since
inference in the model is based on an exact inference algorithm [7]. As an
illustration, the next section introduces results obtained in one of the scenarios
investigated for RATP. The aim of this paper is not to list all results obtained
during the study but to introduce the VirMaLab multi-nets extension, illustrated by
one experimental example. For more information on the results obtained, readers can
contact the authors.
In this study, one of the considered scenarios deals with the influence of the USV
auscultation period on maintenance actions and on the availability of the network.
Figure 5 presents some results of this experiment, obtained for line 7. For
industrial reasons, the exact values of the indicators have been withheld.
Nevertheless, the interest of this figure lies in the dynamics of the numbers of
defects.
For this experiment, the ultrasonic auscultation period was changed (the value
currently in use is To), with three options considered: 2To, To/2 and To/6.
We can note that, as expected, the more frequently ultrasonic equipment inspects the
infrastructure, the more preventive actions are planned. Early defects are therefore
more easily diagnosed and then corrected before they turn into the critical state of
broken rail.
Moreover, the gain in terms of broken rails is especially significant for the first
reduction of the period (To/2) and seems to diminish beyond it.
We can note that decreasing the USV auscultation period makes preventive maintenance
more reactive. This intuitive result can be quantified using our techniques. More X2
defects are detected sooner, which means that fewer X2 defects evolve to BR. The
improvement of preventive maintenance results in a decrease of the detection time,
which can be quantified and used as an input of an optimisation problem to set the
appropriate period, given the risk level that is acceptable and the workforce that
can be deployed.
References
[1] Barbu, V., Boussemart, M., Limnios, N., 2004. Discrete time semi-Markov
processes for reliability and survival analysis. Communications in Statistics - Theory
and Methods 33 (11), 2833-2868.
[2] Bouillaut, L., Francois, O., Leray, P., Aknin, P., Dubois, S., 2008. Temporal
Bayesian network modeling maintenance strategies: prevention of broken rails. In:
Proceedings of the World Congress on Rail Research WCRR'08.
[3] Donat, R., Bouillaut, L., Aknin, P., Leray, P., 2008. A dynamic graphical model
to represent complex survival distributions. Advances in Mathematical Modeling for
Reliability, ISBN: 978-1-58603-865-6, 17-24.
[4] Donat, R., Leray, P., Bouillaut, L., Aknin, P., 2009. Representation of discrete
duration models by means of a specific dynamic Bayesian network. Neurocomputing,
ISSN: 0925-2312, 1-12.
[5] Jensen, F., 1996. An Introduction to Bayesian Networks. Taylor and Francis,
London, United Kingdom.
[6] Kim, J., Pearl, J., 1987. CONVINCE: a conversational inference consolidation
engine. IEEE Trans. on Systems, Man and Cybernetics 17, 120-132.
[7] Lauritzen, S., Spiegelhalter, D., 1988. Local computations with probabilities on
graphical structures and their application to expert systems. Journal of the Royal
Statistical Society B 50 (2), 157-224.
[8] Murphy, K., 2002. Dynamic Bayesian networks: representation, inference and
learning. Ph.D. thesis, University of California, Berkeley.
[9] Naïm, P., Wuillemin, P.-H., Leray, P., Pourret, O., Becker, A., 2007. Réseaux
bayésiens. Eyrolles, ISBN: 2-212-11972-0, 3e édition.
[10] Pearl, J., 1998. Graphical models for probabilistic and causal reasoning. In:
Gabbay, D. M., Smets, P. (Eds.), Handbook of Defeasible Reasoning and Uncertainty
Management Systems, Volume 1: Quantified Representation of Uncertainty and
Imprecision. Kluwer Academic Publishers, Dordrecht, pp. 367-389.