Modeling and Analysis of Redundancy Management in Distributed Object-Oriented Systems by Using UML Statecharts

Gábor HUSZERL, István MAJZIK


Budapest University of Technology and Economics
Department of Measurement and Information Systems
Pázmány Péter sétány 1/d., H-1521 Budapest, Hungary
E-mail: [huszerl|majzik]@mit.bme.hu

Abstract

The paper presents techniques that enable the modeling and analysis of redundancy schemes in distributed object-oriented systems. The replication manager, as the core part of the redundancy scheme, is modeled by using UML statecharts. The flexibility of statechart-based modeling, which includes event processing and state hierarchy, enables easy and efficient modeling of replication strategies as well as repair and recovery policies. The statechart is transformed to a Petri-net based dependability model, which also incorporates the models of the replicated objects. By analyzing the Petri-net model, the designer can obtain reliability and availability measures that can be used in the early phases of the design to compare alternatives and find dependability bottlenecks. Our approach is illustrated by an example.

1 Introduction

The increasing need for distributed technologies has led to several distributed object-oriented (OO) middleware systems. These systems target the problems of transparent invocation of objects as well as services that are necessary in a distributed environment (ordering of messages, multicast communication etc.). The need for highly available applications has also led to fault tolerant extensions like replication management, checkpointing and recovery.

This trend can be illustrated by the example of the most important open object-oriented middleware, the Common Object Request Broker Architecture (CORBA). CORBA was specified by the Object Management Group (OMG). It provides an object-oriented infrastructure that allows objects to communicate, regardless of the specific platforms and languages used to implement them. CORBA defines the basic mechanisms for remote object invocation through the Object Request Broker (ORB), as well as a set of services for object management, e.g. Transaction Service or Trader Service. Originally, neither the ORB nor the existing services provided means for building reliable and highly available applications. Different approaches were elaborated to incorporate the necessary extensions into the CORBA framework; among others, Orbix+Isis [8], Electra [10] and Eternal [11] can be mentioned. Recognizing the need for applications that provide high availability, OMG started a standardization process to define fault tolerance in CORBA. The Fault Tolerant CORBA (FT-CORBA) [13] was proposed by leading information technology companies and reached the level of a Final Adopted Specification in 2000.

Fault tolerant distributed OO systems typically share similar concepts. Server-like objects are replicated on different nodes of the distributed system, thus forming a replica group. Depending on the redundancy scheme (e.g. active or passive redundancy), clients can transparently invoke one or more objects of the replica group. The fault model assumes object crash failures. It is the responsibility of a replication manager to keep the necessary number of object replicas, i.e. to recover the replica objects or to create a new replica in place of a crashed one. Accordingly, the behavior of the replication manager has a crucial impact on the availability of the replica group.

The designer of the system needs tool support to construct optimal fault tolerance schemes and to parameterize these schemes in terms of deployment, number of replicas, fault monitoring interval, repair policy etc. Dependability modeling has proved to be a useful technique in the early design phases, when alternative architectural solutions must be compared and dependability bottlenecks identified. Stochastic dependability models using Markov chains or Petri nets can provide numerical reliability and availability figures that can be used to analyze the sensitivity of the system-level measures to component parameter values.

Dependability modeling is usually a manual task requiring expertise and some experience. However, if a detailed model of the system is available, automatic dependability modeling is a promising approach. Nowadays, as the Unified Modeling Language (UML) has become the de facto standard modeling language of object-oriented systems, the system model is usually available in UML. Accordingly, UML-based automatic model transformation can provide the stochastic dependability model required for the analysis [3, 4].

The paper presents a technique that enables the modeling and analysis of redundancy schemes in distributed object-oriented systems. The replication manager, as the core part of the redundancy scheme, is modeled by using UML statecharts. The flexibility of statechart-based modeling, which includes event processing and state hierarchy, supports easy and efficient modeling of replication strategies as well as repair and recovery policies. The statechart of the replication manager, as well as the simplified behavior of the replica objects, is transformed to a stochastic Petri net. By analyzing the Petri-net model, the designer can obtain reliability and availability measures that can be used to compare alternatives and analyze sensitivity to parameters. Our analysis is based on an (early) architectural view of the system. The behavioral description of the replication manager is used only to derive the replication management and repair strategy, thereby determining the structure of the dependability model.

In our previous work, methods for transforming full UML statecharts to Petri nets were proposed [7]. Now we apply this method to the analysis of replication management, present the necessary UML extensions and show the usefulness of the approach. We investigate the modeling of client fail-over, repair and recovery policies and fault management. To do this, we adopt the system structure proposed by the FT-CORBA specification.

The paper is structured as follows. In Section 2 we discuss the typical replication strategies in OO systems. Section 3 describes the modeling approach. The model transformation is outlined in Section 4. Section 5 presents an illustrative example, and the paper closes with a short conclusion.

2 Replication in distributed OO systems

The architecture presented in the FT-CORBA standard clearly separates the typical tasks in a redundancy structure and assigns these tasks to individual objects. In this way the responsibilities and the interfaces of the objects can be clearly defined. In our architectural model, however, we do not restrict the interfaces, mechanisms and other implementation details that are specified by the standard.

In this framework fault tolerance is provided by entity redundancy, i.e. by the replication of objects, fault detection and error recovery. Client objects can invoke the methods of the replicated server objects, thus avoiding the single points of failure normally caused by single server objects. The client objects should not be aware of the fact that the server objects are replicated (replication transparency) and should not be aware of faults in the server replicas or of recovery from faults (failure transparency). Redundant objects belong to object groups, and several object groups can be managed together in a fault tolerance domain. In each domain, the creation and maintenance of the object replicas is provided by the replication manager (RM) (Figure 1). We do not separate application-controlled and infrastructure-controlled schemes and assume that the replication manager is solely responsible for these tasks. The replica objects are continuously monitored by local fault detectors that are deployed on each host. If a replica object fails (crashes), the local fault detector reports the error to the fault notifier. The fault notifier filters and analyzes the incoming error reports and sends a notification to the replication manager. The local fault detectors are monitored by a global fault detector that detects when a local fault detector is not available (e.g. in the case of a host failure). When the replication manager receives a notification about the crash of an object replica, it can either initiate the recovery of that replica (by invoking a specific method), or remove the crashed object from the object group and create a new replica (by invoking a factory object that is deployed on each host).

The replication style can be one of the standard ones like active, warm or cold passive, stateless etc., but an application-specific style can also be programmed. The (transparent) connection between the client and the object group is the responsibility of a gateway (GW). A direct connection is handled as a degenerate case of the gateway. The client fail-over strategy determines the behavior of the client when it does not receive the requested service. Usually, a retry mechanism is implemented. In our case we delegate this mechanism to the gateway, and we say that the system fails when the number of server replicas required by the gateway is not available.

In order to increase reliability, the global components of the infrastructure, i.e. the replication manager, the global fault detector, the fault notifier and the gateway, can be replicated as well. We assume active replication (an object fails if there are no available replicas).

The fault model is object crash. This means that in the case of an error the object will not provide any response (service) to the clients and will not return to normal operation until an explicit recovery. In this paper we do not model commission faults.
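The fault-handling flow above can be illustrated with a small Python sketch of a replication manager that reacts to crash notifications coming from the fault notifier. It is a hypothetical illustration, not the FT-CORBA interface: the class `ReplicationManager`, its methods and the `try_recover` flag are names introduced here, messages to factories and replicas are only recorded rather than sent, and the choice between in-place recovery and remove-and-create is left to the caller because the text allows either.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationManager:
    """Hypothetical RM sketch that keeps a replica group at a target size.

    Outgoing messages are recorded as (message, target) tuples instead of
    real remote invocations, so the policy can be inspected and tested.
    """
    target_size: int
    hosts: list[str]
    group: dict[str, str] = field(default_factory=dict)  # replica id -> host
    actions: list[tuple[str, str]] = field(default_factory=list)
    created: int = 0  # counter used to name new replicas

    def _create_on_free_host(self) -> bool:
        # Ask the factory of a host that does not run a replica yet.
        used = set(self.group.values())
        for host in self.hosts:
            if host not in used:
                self.created += 1
                self.group[f"r{self.created}"] = host
                self.actions.append(("create", host))
                return True
        return False

    def _refill(self) -> None:
        # Create replicas until the target size is reached or hosts run out.
        while len(self.group) < self.target_size:
            if not self._create_on_free_host():
                break

    def start(self) -> None:
        self._refill()

    def on_object_crash(self, replica: str, try_recover: bool = True) -> None:
        if try_recover:
            # Option 1: recover the replica in place (a specific method call).
            self.actions.append(("recover", replica))
            return
        # Option 2: remove it from the group and create a new replica.
        host = self.group.pop(replica)
        self.actions.append(("remove", host))
        self._refill()

    def on_host_crash(self, host: str) -> None:
        # Every replica on the crashed host leaves the group, and the host
        # is no longer used for new replicas.
        for replica in [r for r, h in self.group.items() if h == host]:
            del self.group[replica]
            self.actions.append(("remove", host))
        self.hosts = [h for h in self.hosts if h != host]
        self._refill()
```

For instance, starting with `target_size=2` and hosts `h1`, `h2`, `h3` places replicas on `h1` and `h2`; a subsequent `on_host_crash("h1")` removes the replica on `h1` and asks the factory on `h3` for a new one.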
Figure 1. The architecture of the FT-CORBA redundancy structure (diagram: replication manager, fault notifier and global fault detector; hosts 1 to 3 with ORB, factory, local fault detector, logging and recovery mechanisms, server replicas 1 and 2 and a client object; interactions labeled "notification", "fault reports", "is alive" and "create object")
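The system-failure condition that Section 2 delegates to the gateway, together with the active-replication assumption for infrastructure objects, can be stated as two predicates. This is a sketch under assumptions: both function names are hypothetical, and reading "the number of server replicas required by the gateway" as a k-out-of-n threshold `required` is an interpretation made here for illustration.

```python
def gateway_service_up(replicas_available: list[bool], required: int) -> bool:
    """Assumed k-out-of-n condition: the gateway can serve the client
    while at least `required` server replicas are available."""
    return sum(replicas_available) >= required


def actively_replicated_up(replicas_available: list[bool]) -> bool:
    """Active replication of an infrastructure object (RM, fault notifier,
    global fault detector, gateway): it fails only if no replica is available."""
    return any(replicas_available)


if __name__ == "__main__":
    print(gateway_service_up([True, False], required=1))  # True
    print(gateway_service_up([True, False], required=2))  # False
    print(actively_replicated_up([False, False]))         # False
```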

3 Modeling of redundancy structures

We are interested in the dependability model available in the early phases of the system design, when decisions can be made about the architecture of the system and the applied redundancy scheme (including replication, recovery and repair strategy). Accordingly, we focus on the architecture of the system and abstract from implementation details of replica consistency, checkpointing and recovery. The main source of information is the object model of the system (available as UML class, object and deployment diagrams) and the dynamic model of the core parts of the system (available as UML statechart diagrams).

The availability of the object group (presented in the previous section) is determined by several factors and parameters. In the following, we outline how they can be represented in (and then extracted from) the UML model.

Reliability parameters of the replica objects, the hosts and the local fault detectors: The dynamic behavior of these objects is not analyzed. The designer assigns the failure rate and the average time required for recovery to these objects as UML tagged values.

Reliability parameters of the global components of the infrastructure: They are modeled in the same way as the replica objects. The roles of the objects are identified by UML stereotypes. Active replication of these objects is taken into account by assigning to them a fault tree consisting of an AND gate (i.e. an infrastructure object fails if all of its replicas fail). Note that this replication style can be refined.

Replication style: It is the responsibility of the gateway, which processes the request of the client and the response(s) of the server object(s). The statechart of the gateway determines the condition of system failure. Usually, this function can be represented by a fault tree.

Repair and recovery strategy: It is the responsibility of the RM, which processes events from the fault notifier and from the factory, and sends messages to the factory and to the replica objects. The processing of the failure/repair events, which is defined by the statechart diagram of the RM, determines the repair and recovery strategy.

Accordingly, the most sophisticated model in the redundancy structure is that of the RM. The full modeling power of statecharts is required to describe the conditions and sequences of events/actions that determine when a failed object is recovered, how many replicas are maintained, under what condition an object is removed from the replica group, etc. Note that the statechart model of the RM describes only dependability-related behavior; no application-specific details are included. Thus, there is no need to filter out irrelevant states or transitions.
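As a rough illustration of how the tagged values above could be turned into numbers, the sketch below computes the steady-state availability of one component from its failure rate and mean recovery time, assuming a two-state (up/down) model with exponential failure and recovery times, and then applies the AND-gate rule to an actively replicated infrastructure object under the additional assumption that replicas fail and recover independently. The function names are hypothetical, and the example values (failure rate 1/1000, recovery time 10) are simply the object parameters used later in Section 5.2; the paper itself obtains its measures from the generated SRN, not from closed formulas like these.

```python
def single_availability(failure_rate: float, mean_recovery_time: float) -> float:
    """Steady-state availability of a two-state (up/down) component with
    exponential failure and recovery times: MTTF / (MTTF + MTTR)."""
    mttf = 1.0 / failure_rate
    return mttf / (mttf + mean_recovery_time)


def and_gate_availability(replica_availabilities: list[float]) -> float:
    """AND-gate fault tree: the replicated object is down only if every
    replica is down. Replicas are assumed to fail independently."""
    all_down = 1.0
    for a in replica_availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


if __name__ == "__main__":
    a = single_availability(failure_rate=1 / 1000, mean_recovery_time=10)
    print(a)                              # about 0.990099 (= 1000 / 1010)
    print(and_gate_availability([a, a]))  # about 0.999902
```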
Event           From (sender)    To (receiver)    Function
create          RM               factory          Create an object
initialized     factory          RM               Object is created and initialized
remove          RM               factory          Remove an object
removed         factory          RM               Object was removed
object fault    fault notifier   RM               Failure of an object
host fault      fault notifier   RM               Failure of a host
recover         RM               replica          Initiate recovery
recovered       fault notifier   RM               Object is recovered
unrecoverable   fault notifier   RM               Object cannot be recovered

Table 1. Events in the redundancy structure
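Table 1 can also be read as an interface description of the RM. The sketch below transcribes it into plain Python data (the `Event` enum, the `ROUTES` mapping and the two helper functions are names introduced here for illustration) and derives which events the RM receives and which it sends, i.e. which events its dependability sub-model consumes and produces.

```python
from enum import Enum


class Event(Enum):
    CREATE = "create"
    INITIALIZED = "initialized"
    REMOVE = "remove"
    REMOVED = "removed"
    OBJECT_FAULT = "object fault"
    HOST_FAULT = "host fault"
    RECOVER = "recover"
    RECOVERED = "recovered"
    UNRECOVERABLE = "unrecoverable"


# (sender, receiver) for each event, transcribed from Table 1.
ROUTES: dict[Event, tuple[str, str]] = {
    Event.CREATE: ("RM", "factory"),
    Event.INITIALIZED: ("factory", "RM"),
    Event.REMOVE: ("RM", "factory"),
    Event.REMOVED: ("factory", "RM"),
    Event.OBJECT_FAULT: ("fault notifier", "RM"),
    Event.HOST_FAULT: ("fault notifier", "RM"),
    Event.RECOVER: ("RM", "replica"),
    Event.RECOVERED: ("fault notifier", "RM"),
    Event.UNRECOVERABLE: ("fault notifier", "RM"),
}


def received_by(obj: str) -> set[Event]:
    """Events whose receiver is `obj`."""
    return {e for e, (_, receiver) in ROUTES.items() if receiver == obj}


def sent_by(obj: str) -> set[Event]:
    """Events whose sender is `obj`."""
    return {e for e, (sender, _) in ROUTES.items() if sender == obj}


if __name__ == "__main__":
    print(sorted(e.value for e in sent_by("RM")))  # ['create', 'recover', 'remove']
    print(len(received_by("RM")))                  # 6
```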

Without restricting the interfaces and the implementation of the system, we assume that the events presented in Table 1 are processed by the RM.

4 Dependability analysis by model transformation

4.1 Dependability sub-models

Our dependability model consists of several sub-models as follows.

• The core part of the dependability model, which represents the replica management, is generated by an automatic model transformation from the statechart of the RM to a stochastic Petri-net model.
• The replica objects and the global components of the infrastructure are represented by simplified models that are stored in a library of pre-defined sub-models. Note that the simplified models can be replaced by detailed ones when the dependability-related behavior of these components is fully described by statecharts (e.g. in the case of the fault notifier). In this case the automatic model transformation can be used (as above).
• The connection among these sub-models, as defined in the UML object diagram, is provided by the event processing mechanism, i.e. the event queues and dispatchers belonging to the objects.

In the following, we outline the model transformation necessary to construct the core part of the dependability model.

4.2 From UML statecharts to stochastic Petri nets

Our analysis of the redundancy management is based on a transformation from UML statechart models to Petri nets with timing and stochastic extensions. Petri nets (PN) are a widely accepted formalism for modeling and analysis of distributed systems. For performance and dependability evaluation, extensions of PNs like Generalized Stochastic Petri Nets [1] and Stochastic Reward Nets [12] offer not only a precise mathematical background but also sophisticated analysis tools. Our choice was the class of Stochastic Reward Nets (SRN). SRNs generalize classical PNs by rewards (various measures) and by assigning guards and firing-time distributions to transitions. Three SRN tools, SPNP [6], PANDA [2] and DEEM [5], were used in our analysis environment. Dependability measures can be specified by reward functions. In certain cases (e.g. with exponentially distributed transition firing times) an analytic solution is possible; otherwise simulation has to be performed. If a steady state exists, steady-state measures can be computed; otherwise transient analysis can be executed. The analysis of the probability of states identified as representing erroneous behavior leads to reliability (if no repair is modeled) and availability characteristics (if repair is modeled).

Correct SRN representation of the statechart with event processing and state hierarchy needs a thorough analysis of the semantics of both models. Our transformation was defined in a modular way, by introducing a set of SRN patterns. These patterns are assigned to specific constructs (like the event dispatcher) or concepts (like state hierarchy and synchronization) of the UML statechart formalism; in this way they help in decomposing the problem and also in proving the correctness of the proposed solutions (according to the informal requirements of the UML semantics as defined in the standard [14]). The source models of the transformation described in this paper are restricted to UML statecharts without history states. Actions are restricted to the generation of new events, and events do not have parameters.

According to the semantics of UML statecharts (presented informally in [14] and formalized in [9]), several specific concepts have to be taken into account. We discuss them in the following.

4.2.1 Event queue and dispatcher

The events arriving from the environment or from the state machine specified by the statechart itself are collected in the queue and dispatched by the dispatcher one at a time. Event queues provide the interfaces among state machines belonging to different objects. Since UML precisely defines neither the policy of the dispatcher nor the number and distribution of event queues, we have defined patterns for several policies and leave it to the designer to specify the details in the UML model (by using constraints). If the events are selected non-deterministically, the queue can be implemented with SRNs quite easily. However, FIFO (First In, First Out) is a costly policy in terms of the size and state space of the SRN. Figure 2 presents a FIFO queue for two events, "up" and "down".

Figure 2. SRN pattern of a FIFO queue of two events (diagram elements labeled up0, down0, discard_up, discard_down, queue_0 to queue_2, up_queue_0 to up_queue_2, down_queue_0 to down_queue_2, READY, up1 and down1, with guards such as [queue_2], [!queue_2], [!queue_1] and [!queue_0])

4.2.2 Hierarchy of states and transitions

One important feature of statecharts is the hierarchical structure of states. States can contain sub-states or concurrent sub-machines. Transitions of a statechart may have their source and target states at different levels of the state hierarchy. Due to concurrency, multiple transitions (triggered by the same event) may be enabled at the same time. Enabled transitions which have common state(s) to exit are in conflict. Some conflicts can be resolved by the priority relation: if a transition has a source state that is a substate of the source state of another transition (with which it is in conflict), then it has higher priority.

From the point of view of priority, enabled transitions can be represented in the form of a tree according to their source states in the state hierarchy. Transitions on different branches of this tree can fire independently, while the conflicts of transitions on the same path from the root to a leaf are resolved by the priority relation. Conflicts among transitions emanating from the same state or from different active states, over which the priority relation is not defined, are resolved non-deterministically.

Accordingly, the SRN representing the maximal selection of UML transitions triggered by the same event is a tree of interconnected sub-SRNs (each of them representing a single UML transition) with an additional control structure. This control structure consists of two chains of places. A token runs on one of the chains when the event is "not yet consumed" by the transitions on the given arc of the tree, and a token runs on the other chain when the event is "already consumed". A joining node of the tree merges the chains of the subtrees. All of the UML transitions in the subtree have higher priority than any transition along the common path of the tree (on the root side of the joining node); therefore "event is unconsumed" applies to this common path if and only if the event was not consumed by any of the transitions of the subtree. "Event is consumed" applies to the common path when some of the transitions of the subtree have already fired (they carried over the tokens on the "consumed" chain) and the other transitions could not fire (they passed on the tokens along the chain). This construction ensures that once the token representing the event reaches the root of the tree, no more sub-SRNs corresponding to transitions of the statechart will fire, and the execution step is almost finished.

4.2.3 Semantics of timed transitions

The relationship of timing and guard evaluation is not specified in standard UML. In our approach, time delay is associated with UML transitions, assuming that this delay is the result of program code execution or communication delay. Accordingly, the guard expressions have to be evaluated before the firing of the (timed) transitions. Taking into account the needs of different applications, we implemented three possible semantics for timed and guarded UML transitions: (i) the selection of the transitions is irrespective of timing, (ii) the guard has to be true during the delay, otherwise the transition is deselected, and (iii) the "fastest" enabled transition wins.

The corresponding SRN patterns represent the timed transitions of the statechart. The types and parameters of the timed SRN transitions correspond to those of the transformed statechart transitions. The timing policy (resampling, race with age/enabling memory) is defined by the designer and must be supported by the SRN tool used for the analysis.

4.2.4 Step semantics

The transitions of the UML statechart fire in steps, i.e. a stable state configuration is reached only when the maximal set of enabled transitions has fired. In contrast, an SRN reaches a stable state after each firing. The UML semantics requires the evaluation of the guards of the transitions at the beginning of a step, before the firing of any transition. Thus the guards refer to the consistent state configuration before the actual step. In SRNs, the guard of a transition is evaluated just before the given transition fires; the evaluation is not scheduled to the beginning of a "step" and the "results" are not stored. Accordingly, the last stable state configuration of the state machine must be recorded to ensure the correct evaluation of guards. To do that, the places representing the states of the statechart are duplicated. For each state, one place contains a token if and only if the state was active just before the actual step, and another place contains a token if and only if the state will be active after the actual step. In this way the guards and the transitions changing the state refer to different places. This concept necessitates a synchronization of the duplicated places at the end of each step.

4.2.5 Composition of subnets

The SRN corresponding to a given UML statechart is composed of the subnets introduced in the previous subsections. The subnets are connected with each other using interface places.

The initial state of the SRN is defined as follows. If the event queue contains events in the initial state, these events are represented by the initial marking of the appropriate places of the event queue subnet. The initial state configuration is mapped to the SRN by inserting tokens into the corresponding place-pairs.

5 An example

Our illustrative example is a model of the architecture of a distributed object-oriented system. The application (e.g. an e-commerce application or a hospital patient monitoring system) cannot tolerate long unavailability of the service provided by the system. To achieve this goal, active object-level replication is used. In the following presentation we focus on the model of the RM. The model is completed by simplified models for the infrastructure objects and the server replicas.

5.1 Model of the replication management

In our model the RM creates 2 replicas when the server object group is constructed. In case of a host failure the replica deployed on that host is removed from the group and a new one is created on a different host. We do not focus on the repair of hosts; thus we assume here that the number of hosts is not limited. In case of an object failure the RM initiates the recovery of the replica. If the recovery is not successful, the replica is removed. When the failure of a just-recovered replica is reported, it is removed without trying to recover it again. The replica is also removed when there are no other replicas working. The services of the replicated objects are available as long as there is at least one working replica.

Figure 3. Statechart model of the redundancy manager (states CREATING, WORKING with sub-states OK and SUSP., RECOVERING and REMOVING; transitions labeled -/create, initialized/-, T/-, host-fault/remove, obj-fault with branches [B.WORKING]/recover and [!B.WORKING]/remove, obj-fault/remove, recovered/-, unrecoverable/remove and removed/create)

The statechart model of the RM consists of 2 concurrent, identical sub-machines (A and B) supervising the 2 replicas. Figure 3 depicts the statechart of one sub-machine (A). (We simplified this figure by not depicting a "time-out, retry on a different host" mechanism.)

When starting, the RM sends the event create to the factory of the chosen host. If it receives a message about the successful construction (initialized), it considers the given replica working. In state Working two events trigger transitions. An event host-fault moves the component to the state Removing, sending an event remove to the factory of the given host. An object-fault moves the component either to the same state (Removing), sending remove, or to the state Recovering, sending recover to the object replica. The choice depends on the state of the other replica (B). In the state Recovering the local fault notifier of the host can report the successful recovery of the object, in which case the component moves back to a sub-state of its state Working.
The component leaves this sub-state when a timer has ex- accepting events create and remove and sending events
pired (the timer is a separate object). If another object-fault initialized and removed.
occurs before this time, the event remove is sent to the fac-  The failure of a host is effectual only if an object
tory of the replica. The local fault notifier of the host may
replica is deployed on it.
report the object being unrecoverable by sending an event
unrecoverable to the RM. In this case remove is sent to the  The fault notifier collects information about the state
factory of the replica. In the state Removing the RM waits of the object replica and the host with some delay, and
for an event removed from the factory, and after receiving forwards these events to the RM. Its SRN model was
it, it begins with an event create again. constructed from the UML statechart model by using
The RM is considered to have a FIFO event queue of the transformation.
length 6, capable of accepting 12 different events.
5.2 Measurement results
Factory Host Object

CREATE REMOVE RECOVER


Size of the model. The SRN model of the system con-
H
sists of 109 places and 147 transitions. The state space of
D F
the underlying Markov-chain is 7,386 tangible states (i.e.
[L]
failure_rate
states in which the system spends non-zero time), and there
failure_rate

recovery
are 24,406 transitions among them.
[H&L]

S [E]
Transient analysis. The analysis answers the question
L O R what is the probability of having at least one (or two) work-
ing object replicas. In the early phase of the design usu-
C Fault notifier ally timed SRN transitions with exponential distribution are
used in the model and the designer estimates the parameters
[S] [O] [F&H]
of the distributions. This assumption enables an analytical
[S]
solution of the model. Here we assumed the following pa-
K E
rameters:
[R || F]

Modeled occurrence Average time units


Host failure 10,000
initialized removed host−fault unrecoverable recovered object−fault
Object failure 1,000
Recovery 10
Figure 4. Petri net models of the objects Local fault detection 10
Global fault detection 100
To analyze the model we transformed it to stochastic re-
Step of the RM 1
ward nets (SRN) by using the patterns introduced above.
The other infrastructure objects and the replicas were represented by simplified SRNs assigned to them from a predefined library on the basis of their role in the object model. Some of them are presented in Figure 4. The three ellipses at the top depict the places where the SRN of the RM can put tokens representing the corresponding events. At the bottom of the figure, the six ellipses depict the input places belonging to the event queue of the RM. Guards in square brackets refer to the marking of places of concurrent objects.

The model of an object replica is on the right of the top row in the figure. Failure of the replica is represented by a timed transition whose parameter is the failure rate estimated by the designer. After the object has failed, it can start a recovery phase when the RM sends it a recover event. After a successful recovery, it can resume work.

The factory constructs and destructs the object replica

Figure 5 presents the probability of having one or two working replicas over time. In the long run (i.e., in steady state), the probability of having at least one working replica is 99.92%, the probability of a selected replica working is 97.09%, and the probability of both replicas working is 94.26%.

Comparison of RM strategies. A central question of the early design phase is the comparison of different architectural solutions. The designer can shorten the design cycle by comparing the solutions and elaborating only the best-fitting one. In our example, one design parameter is the number of object replicas required to achieve the required availability. Another interesting parameter is the length of the interval during which a recovered object is considered suspicious. It is also questionable whether the policy of considering a recovered object suspicious (and handling a subsequent failure in this interval in a different way) is meaningful at all.

The comparison of systems with different numbers of object replicas is quite easy: the required modification of the dependability model is straightforward.
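As a rough cross-check of these steady-state figures, and of how replica counts could be compared, the following Python sketch assumes that replicas fail and recover independently with the same availability a. This independence is an assumption of the sketch only; in the SRN model the replicas interact through the RM and the shared infrastructure. Under it, all n replicas work with probability a^n and at least one works with probability 1 − (1 − a)^n. The function names and the 99.99% target are hypothetical.

```python
from typing import Optional


def prob_all_working(a: float, n: int) -> float:
    """P(all n replicas work), assuming independent replicas with availability a."""
    return a ** n


def prob_at_least_one(a: float, n: int) -> float:
    """P(at least one of n replicas works), under the same independence assumption."""
    return 1.0 - (1.0 - a) ** n


def replicas_needed(a: float, target: float, max_n: int = 10) -> Optional[int]:
    """Smallest n whose 'at least one working' probability reaches target, else None."""
    for n in range(1, max_n + 1):
        if prob_at_least_one(a, n) >= target:
            return n
    return None


if __name__ == "__main__":
    a = 0.9709  # steady-state probability of a selected replica working (Figure 5)
    for n in (1, 2, 3):
        print(f"n={n}: at least one {prob_at_least_one(a, n):.4%}, "
              f"all {prob_all_working(a, n):.4%}")
    # Hypothetical availability target, chosen only for illustration.
    print("replicas needed for 99.99%:", replicas_needed(a, 0.9999))
```

With a = 0.9709 and n = 2, the two formulas give about 94.26% and 99.92%, in line with the steady-state values reported for Figure 5; under the same assumption a third replica would be needed for the hypothetical 99.99% target. Once RM behavior or correlated host failures matter, the full SRN analysis is still required.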

Figure 5. Probability of having working replicas (y-axis: probability [%], 94 to 100; x-axis: time [time unit], 0 to 1000; curves: at least one replica working, one selected replica working, both replicas working)

However, the analysis of the length of the interval during which a recovered object is considered suspicious is trickier. Assuming an exponential distribution for all timing activities, the analysis shows that the availability of the system is not sensitive to the parameter of this time delay (as neither the length of the interval nor the number of failures in this interval is bounded). Naturally, this result points out a limitation of the exponential timing assumption, not the inappropriateness of the fault handling policy. The analysis can be performed correctly by using SRN models with deterministic timed transitions.
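To make the unbounded interval concrete, the following Python sketch compares an exponentially distributed suspicion window with a deterministic window of the same mean length T. It assumes that a subsequent object failure arrives at rate λ = 1/1000 (the object-failure mean from the parameter table) and uses T = 100 time units as a purely hypothetical window length. It only contrasts the two timing assumptions; it is not a reproduction of the paper's SRN analysis.

```python
import math


def tail_prob(kind: str, mean: float, x: float) -> float:
    """P(window length > x) for a suspicion window with the given mean length."""
    if kind == "exponential":
        return math.exp(-x / mean)
    if kind == "deterministic":
        return 1.0 if x < mean else 0.0
    raise ValueError(f"unknown window kind: {kind}")


def prob_failure_in_window(kind: str, mean: float, failure_rate: float) -> float:
    """P(a Poisson failure with the given rate arrives before the window closes)."""
    if kind == "exponential":
        # Race of two exponentials: failure (rate failure_rate) vs window end (rate 1/mean).
        return failure_rate / (failure_rate + 1.0 / mean)
    if kind == "deterministic":
        # No failure during a fixed interval of length mean has probability exp(-rate*mean).
        return 1.0 - math.exp(-failure_rate * mean)
    raise ValueError(f"unknown window kind: {kind}")


if __name__ == "__main__":
    T = 100.0         # hypothetical mean suspicion window [time units]
    lam = 1 / 1000    # object failure rate from the parameter table
    for kind in ("exponential", "deterministic"):
        print(f"{kind:13s} P(len > 3T) = {tail_prob(kind, T, 3 * T):.4f}  "
              f"P(failure in window) = {prob_failure_in_window(kind, T, lam):.4f}")
```

With these values the exponential window lasts longer than 3T with probability e^−3 ≈ 0.05, whereas the deterministic window never does; the probability that a further failure falls inside the window is about 0.091 versus 0.095.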
6 Conclusion

We showed in this paper that complex, application-dependent replication strategies of distributed object-oriented systems can be analyzed automatically. The analysis can be performed in an early design phase, once the structure of the system and the behavior of the replication manager are defined. On the one hand, the hosts, infrastructure objects and server object replicas can be represented by simplified dependability sub-models (their detailed behavior need not be specified). On the other hand, the designer can use the full power of UML statecharts to describe the core part of the redundancy management, i.e., the behavior of the RM. The statechart of the RM is transformed to an SRN dependability model, which is completed by the other sub-models and analyzed by off-the-shelf tools. The optimal replication management can be selected by modeling alternative behaviors of the RM, executing the automatic model transformation, and performing the subsequent dependability analysis.
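The last step of that pipeline, the numerical solution performed by the off-the-shelf tools, amounts to solving the continuous-time Markov chain (CTMC) underlying the SRN, which the exponential timing assumption makes possible. The Python sketch below solves πQ = 0 with Σπ = 1 by Gaussian elimination for a hypothetical three-state replica chain (working, failed but not yet detected, recovering) whose rates are the reciprocals of the object-failure, local-fault-detection and recovery means from the parameter table. It omits host failures and the RM, so it is not expected to reproduce the values in Figure 5.

```python
from typing import List


def steady_state(Q: List[List[float]]) -> List[float]:
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Transpose so each row j is a balance equation: sum_i pi_i * Q[i][j] = 0.
    A = [[Q[i][j] for i in range(n)] for j in range(n)]
    b = [0.0] * n
    # The balance equations are linearly dependent: replace one by normalization.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi


if __name__ == "__main__":
    # Hypothetical chain: 0 = working, 1 = failed (not yet detected), 2 = recovering.
    fail, detect, recover = 1 / 1000, 1 / 10, 1 / 10
    Q = [
        [-fail, fail, 0.0],
        [0.0, -detect, detect],
        [recover, 0.0, -recover],
    ]
    pi = steady_state(Q)
    print(f"P(working) = {pi[0]:.4f}")
```

For this cyclic chain the result equals 1000/1020 ≈ 0.9804, the fraction of the mean cycle time (1000 + 10 + 10 time units) spent working, which is a convenient sanity check for the solver.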
