
Group Communication as an Infrastructure for Distributed System Management

Y. Amir
Department of Computer Science
The Johns Hopkins University
Baltimore, MD 21218
and the NASA Center of Excellence in Space Data and Information Sciences
[email protected]

D. Breitgand, G. V. Chockler, D. Dolev
Institute of Computer Science
The Hebrew University of Jerusalem
Jerusalem 91904, Israel
{davb, grishac, [email protected]}

Abstract

In the past, system management tools for computer systems were oriented towards managing a single computer with, possibly, many users. When the networked system concept became widespread, centralized solutions such as the Network Information Service (NIS, formerly called Yellow Pages, but later renamed to avoid confusion with other trademarks) were developed to help the system manager control a network of workstations. Today, when many sites contain hundreds of workstations, these solutions may no longer be adequate.

This paper proposes the usage of techniques, developed for group communication and database replication, for distributed cluster management. We show how group communication can be exploited to provide three classes of frequently needed operations: simultaneous execution of the same operation in a group of workstations; software installation in multiple workstations; and consistent network table management (improving the consistency of NIS).

This work was supported by the United States - Israel Binational Science Foundation, grant number 92-00189.

1 Introduction

The rapid growth of distributed environments is motivated by a number of advantages provided by a distributed architecture over a centralized one. Among them are a better cost-performance ratio, potential for higher availability, provision for geographical spread of organizations, and, from the user point of view, more autonomy for users.

Unfortunately, distributed environments are harder to manage. Often they require management data to be scattered and duplicated in several sites. As system size grows, controlling the management data and keeping it consistent becomes a complex and tedious task.

This paper shows how group communication mechanisms can help in building efficient solutions for distributed system management. Our proposed architecture exploits a group communication service to minimize communication costs and to help preserve complete and consistent operation, despite potential network partitions and site crashes. Although we focus on the Unix environment, a de-facto standard for distributed environments, the same mechanisms are applicable to other settings as well.

We address problems in three areas of distributed system management:

- Table Management: Examples of table management include the management of user accounts, maintenance of a unified file system and various network tables. We will show how new group communication and replication techniques can render table management services efficient, symmetric, consistent and highly available, while preserving the existing interface to these services.

- Software Installation and Version Control: Presently, software installation is mostly done manually by the system manager on a per-machine basis. We will show how group communication techniques can exploit the multicast and broadcast capabilities of local area networks to speed up the installation process and to minimize latency and network load during installation.

- Simultaneous Execution: It is sometimes necessary to invoke the same management command on several machines. For example, if an electricity shutdown is expected, it might be advisable to shut down the whole system, so it is beneficial to have a method to invoke the shutdown command from one control station. We will show how such a centralized management tool can be easily constructed within our architecture.

The rest of this paper is organized as follows: the next section briefly describes existing work in the area. Section 3 describes Transis, our group communication toolkit. Section 4 presents the proposed architecture for distributed system management. Sections 5-7 focus on simultaneous execution, software installation, and table management respectively, and Section 8 offers concluding remarks.

2 Related Work

In this section we briefly survey some existing solutions for distributed system management. In particular, we consider the Network Information Service (NIS) as a configuration management framework and Distributed SMIT as a tool for concurrent execution of system administration tasks in a heterogeneous environment. In addition, the Tivoli Management Environment (TME) is discussed as an example of a leading integrated solution for distributed system management.

Network Information Service

The Network Information Service (NIS) [10] is supplied as a part of the operating system by all major UNIX vendors. In NIS, a collection of network tables (maps) constituting a configuration database can optionally be replicated among a group of servers. Updates to the configuration database are always made at a distinguished server, termed the master, and later may be propagated to the other servers, called slaves (if such exist). In this architecture, the master centralizes the configuration management, and the slaves provide higher availability and better performance. While this architecture has proved to be successful, current implementations of NIS lack built-in facilities for guaranteeing consistency of replicas in the presence of network partitions and server crashes.

In particular, propagation of updates is not done automatically on a natural, per-update basis, but is left to the system administrator to be performed periodically. This may lead to undesirable temporary inconsistencies even when the system is stable. After each update, the corresponding table is completely rebuilt, and the whole table is sent over the network. Since TCP/IP is used, the number of slaves which can be reasonably employed is limited. Slaves are not allowed to propagate updates to other slaves. Thus, the system cannot reach a consistent state if the network partitions and the master is not present in a partition. If the master crashes, NIS cannot continue to operate without a complete reconfiguration.

In Section 7 we show how the NIS implementation can be substantially improved while preserving all of its appealing features.

Distributed SMIT

Distributed SMIT [5] (DSMIT) presents an integrated tool for heterogeneous system management. DSMIT consists of clients which emit management commands to servers in a unified platform-independent syntactic form. The servers translate the commands into a platform-specific form and perform them in parallel. DSMIT utilizes a reliability layer built on top of UDP (TCP could not be used as a transport layer because it limits the number of simultaneously open connections and, hence, limits system scalability). This layer makes extensive use of end-to-end acknowledgments in order to cope with omission faults not handled by UDP. To monitor the status of each participating server, DSMIT clients retain the last acknowledged transmission. In contrast, our solution monitors the status of the whole group of management servers. The need to care about each particular target arises only upon a membership change, reported by the group communication layer.

The primary reason for DSMIT not to use group communication toolkits (such as Transis) was that efficient group communication solutions were restricted to LANs at the time of DSMIT's development. Recent work ([1, 3, 8]) demonstrates that the group communication paradigm can be effectively extended to WAN environments. As group communication technology over WANs matures, the reliability layer implemented in DSMIT will become less effective.

Sections 5 and 6 show how a group communication transport layer can be utilized for building effective and reliable solutions to the problems DSMIT was designed to tackle.

Tivoli Management Environment

The Tivoli Management Environment [12] (TME) is probably the most comprehensive integrated solution for distributed system management existing today. We focus on two components of TME: Tivoli/Admin, which deals with system configuration management, and Tivoli/Courier, which addresses software distribution.

Both Tivoli/Admin and Tivoli/Courier use a communication toolkit named MDist (multiplexed distribution) [11]. MDist is designed to distribute a large amount of data to a predefined set of target machines using point-to-point communication. These targets can be either end-receivers or repeaters, which can in turn become distributors. All participants are organized into a tree which is constructed in order to speed up data distribution and to improve scalability.

Tivoli/Admin allows replication of certain configuration data in order to increase availability and performance. Consistency among the different copies is maintained using the two-phase commit protocol [6]. Two-phase commit performs end-to-end acknowledgment between all of the replicas for each update. Therefore, it is resource consuming and achieves limited performance, which deteriorates linearly as the number of replicas increases.

While the concepts developed by Tivoli certainly constitute an integrated and comprehensive distributed system management solution, its infrastructure can be improved. Thanks to the open design of TME, our solutions can be integrated into it and coexist with other approaches.

3 The Transis System

Transis [3] is a group communication sub-system developed at the Hebrew University of Jerusalem. Transis supports the process group paradigm in which
[Figure: throughput plot measured on 16 Pentiums; x-axis: Message Size, 100-1400 bytes; y-axes: Messages/Second, 0-2500, and Utilization, 0-0.8.]

Figure 1: Throughput as a function of message size.

a process can join groups and multicast messages to groups. Using Transis, messages can be addressed to the entire process group by specifying the group name (a character string selected by the user). The group membership changes when a new process joins or leaves the group, a processor containing processes belonging to the group crashes, or a network partition or re-merge occurs. Processes belonging to the group receive a configuration change notification when such an event occurs.

Transis incorporates sophisticated algorithms for membership and reliable ordered delivery of messages that tolerate message omission, processor crashes and recoveries, and network partitions and remerges. High performance is achieved by utilizing non-reliable broadcast or multicast where possible (such as on local area networks). Transis performance can be seen in Figure 1.

The Transis application programming interface (API) provides a connection-oriented service that, in principle, extends a point-to-point service such as TCP/IP to a reliable multicast service. The API contains entries that allow a process to connect to Transis, to join and leave process groups, to multicast messages to process groups, to receive messages and to disconnect.

Transis is implemented as a daemon. The Transis daemon handles the physical multicast communication. It keeps track of the processes residing in its computer which participate in group communication, and also keeps track of the computer's membership (i.e. connectivity). The benefits of this structure are significant. The main advantages in our context are:

- Message ordering and reliability are maintained at the level of the daemons and not on a per-group basis. Therefore, the number of groups in the system has almost no influence on system performance.

- Flow control is maintained at the level of the daemons rather than at the level of the individual process group. This leads to better overall performance.

- Implementing open group semantics is easy (i.e. processes that are not members of a group can multicast to that group).

A process is linked with a library that connects it to the Transis daemon. When the process connects to Transis, an inter-process communication handle (similar to a socket handle) is created. A process can maintain multiple connections to Transis. A process may voluntarily join specific process groups on a specific connection. A message which is received can be a regular message sent by a process, or a membership notification created by Transis regarding a membership change of one of the groups to which this process belongs. Transis service semantics are described in [2, 9].

Transis has been operational for almost three years now. It is used by students of the Distributed Systems course at The Hebrew University and by the members of The High Availability Lab. Several projects were implemented on top of Transis, among them a highly available mail system, two types of replication servers and several graphical demonstration programs.

Ongoing work on the Transis project focuses, among other things, on security and authentication of users, which is important for useful distributed system management tools.

4 The Architecture

The architecture is composed of two layers as depicted in Figure 2. The bottom layer is Transis, our group communication toolkit, which provides reliable multicast and membership services. The top layer is composed of a management server and a monitor. Although we use Transis as our group communication layer, other existing toolkits such as Totem [4], Horus [13] or Newtop [7] could have been used.

The management server provides two classes of services: long-term services and short-term ones. Long-term services provide consistent semantics across partitions and over time. They are used for replication of network tables (maps) such as the password database, which are maintained on secondary storage. These services implement an efficient replica control protocol that applies changes on a per-update basis.

Short-term services are reliable as long as the network is not partitioned and the management server
[Figure: two-level architecture diagram. External Applications sit above the Management Servers (each providing Long Term and Short Term services, with a Representation Converter and the Management Server API) and a Monitor; all components communicate through the Transis API over Transis.]

Figure 2: Two-level architecture.

does not crash. In case of a network partition or a server crash, the monitor and the management servers receive notification from Transis. The application may be informed and may take whatever steps necessary. Short-term services include simultaneous task execution and software installation.

The monitor provides a user interface to the services of the management server. The monitor is a process which may run on any of the nodes that run Transis. Several monitors may run simultaneously in the network.

The management server runs as a daemon on each of the participating nodes. It is an event-driven program. Events can be generated by the monitor, another server or Transis.

Each server maintains a vector of contexts, with one entry for each monitor it is currently interacting with. Each entry contains (among other things) the current working directory of the server as set by the corresponding monitor.

The long-term services are a non-intervening extension of the current standard Unix NIS. Since the hosts' NIS map repositories retain their original format, applications (e.g. gethostbyname) that use RPC to retrieve information from them are not changed. The service quality is improved because the replication scheme implemented by the management server guarantees consistency and is much more efficient compared to the ad-hoc solution provided by NIS.

The management server API contains the following entries:

- Status. Return the status of the server and its host machine.

- Chdir. Change the server's working directory which corresponds to the requesting monitor.

- Simex. Execute a command simultaneously (more or less) on a number of specified hosts. The command is executed by each of the relevant servers relative to the working directory that corresponds to the initiating monitor.

- Siminst. Install a software package on a number of specified hosts. The installation is performed relative to the working directory that corresponds to the initiating monitor.

- Update-map. Update a map while preserving consistency between replicas.

- Query-map. Retrieve information from a map.

- Exit. Terminate the management server process.

In practice, sites may be heterogeneous both in terms of software (e.g. operating system) and hardware. We make use of a generic platform-independent representation for management commands and for the reports of their execution. This representation is the only format used for communication between the protocol entities. The Representation Converter (see Figure 2) is responsible for converting this generic representation into a platform-specific form. This architecture enables the support of new platforms with relative ease.

A prototype of the presented architecture was implemented on top of Transis and was tested in a cluster of Unix workstations. The code, developed in the C programming language, spans approximately 6500 lines. The table management protocol (the more sophisticated part) constitutes about half of the code.

5 Simultaneous Execution

The system manager may frequently need to invoke an identical management command on several machines. Potentially, the machines may be of different types. The activation of a particular daemon or script on several machines, or the shutdown operation
(a) The Monitor:

    Initially:
        connect to Transis;
        join private group;
        join group Cluster;
    while (true) {
        m = receive();
        switch (m.type)
            case command from a user:
                NR = M;
                multicast(command, Cluster);
                while (NR != {})
                    m = receive();
                    switch (m.type)
                        case view change message:
                            NR = NR \ (M \ m.M);
                            M = m.M;
                        case result of execution from server:
                            NR = NR \ server;
                return the result;
            case view change message:
                M = m.M;
    }

    command can be one of the following: Chdir, Status, Simex or Exit.

(b) The Management Server:

    Initially:
        connect to Transis;
        join private group;
        join group Cluster;
    while (true) {
        m = receive();
        switch (m.type)
            case Chdir(dir) from the monitor M:
                contexts[M].working_dir = dir;
                send ACK to M's private group;
            case Status from the monitor M:
                get status of my machine;
                convert status to a system-independent form;
                send status to M's private group;
            case Simex(command) from the monitor M:
                convert command to a system-specific form;
                chdir(contexts[M].working_dir);
                result = execute(command);
                convert result to a system-independent form;
                send result to M's private group;
            case Exit:
                terminate my process;
    }

Figure 3: The Simultaneous Execution Protocol.

of several machines are good examples. Another example is the simultaneous monitoring of CPU load, memory usage and other relevant system parameters on all or part of the machines in a cluster.

Figure 3(a) and Figure 3(b) present the pseudo-code of the relevant parts of the monitor and the management server respectively. The monitor maintains two sets: M and NR. M is the most recent membership of the group Cluster as reported by Transis. NR is the set of the currently connected management servers which have not yet reported the outcome of a command execution to the monitor.

It is easy to see how other tasks are integrated with the simultaneous execution task to form our proposed architecture.

6 Software Installation

Software installation and version upgrade constitute one of the most time-consuming system management tasks. In large heterogeneous sites which comprise tens or even hundreds of machines, there are often subgroups of computers with identical (or similar) architecture running copies of the same application software and operating system. Presently, it is common practice to perform installation or upgrade by repeating the same procedure at all locations in the subgroup separately. Installation or upgrade procedures include the transfer of the packages, the execution of installation utilities and the update of relevant configuration files. Traditionally, all the above-mentioned operations are performed using the TCP/IP protocol. This approach is wasteful in terms of both bandwidth and time, due to the point-to-point nature of TCP/IP. In addition, repeating the same procedure many times is prone to human errors resulting in inconsistent installations.

In contrast, we use Transis to disseminate the relevant files to the members of the subgroup efficiently. We use the technique presented in Section 5 to execute the installation commands simultaneously at all the involved locations. Each command is submitted only once, reducing the possibility of human errors. Using the process group paradigm, the system administrator can dynamically organize hosts with the same installation requirements into a single multicast group.

Our installation protocol proceeds in the following steps. First, the monitor multicasts a message advertising the installation of a package P, the set Rp of its installation requirements (e.g. disk space, available memory, operating system version, etc.), the installation multicast group Gp and the target list Tp. Upon reception of this message, a management server joins Gp if the system which it controls conforms to Rp and belongs to Tp. When all the management servers from Tp have either joined Gp or reported that they will not participate in the installation, the monitor begins multicasting the files comprising the package P to the group Gp. Finally, the status of the installation at every management server is reported to the monitor. The Transis membership service helps detect hosts which may not have completed the installation due to a network partition or host crash.

The same protocol may later be repeated for a more restricted multicast group G' ⊆ Gp. The monitor questions the members of G' about the missing files prior to the redistribution, and only the needed files are multicast to G' in order to save bandwidth and time.
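To make the readiness condition of the installation handshake concrete, here is a minimal C sketch (our own illustration, not the paper's code). It assumes hosts are numbered and encoded as bits in a 64-bit mask; the function and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: hosts are bits in a 64-bit mask.
 * 'targets' encodes the target list Tp, 'joined' the servers that have
 * joined Gp, 'declined' those that reported they will not participate. */

/* The monitor may start multicasting the package files once every
 * target host has either joined Gp or declined. */
int ready_to_install(uint64_t targets, uint64_t joined, uint64_t declined)
{
    return ((joined | declined) & targets) == targets;
}

/* After the transfer, hosts suspected of an incomplete installation are
 * the targets that joined Gp but are missing from the membership 'memb'
 * reported by the group communication layer (partition or crash). */
uint64_t suspected_hosts(uint64_t targets, uint64_t joined, uint64_t memb)
{
    return targets & joined & ~memb;
}
```

For instance, with targets {0, 1, 2}, if hosts 0 and 1 joined and host 2 declined, the transfer may start; if host 1 later disappears from the membership, it becomes a candidate for the restricted redistribution group G'.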
up to n. Initially, all V ec's entries are 0. Vec is
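Both the simultaneous-execution protocol of Section 5 and the installation handshake above track outstanding responders the same way. As a small illustrative sketch (ours, not the paper's code), the set updates of Figure 3(a) can be written over bitmask-encoded memberships:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: M (current membership of group Cluster) and NR
 * (servers that have not yet reported) as 64-bit membership masks. */

/* On a view change reporting new membership new_m: servers that left
 * (in M but not in new_m) will never reply, so drop them from NR.
 * This is NR = NR \ (M \ new_M) from Figure 3(a). */
uint64_t on_view_change(uint64_t nr, uint64_t m, uint64_t new_m)
{
    return nr & ~(m & ~new_m);
}

/* On a result from server s (a single bit), remove it from NR. */
uint64_t on_result(uint64_t nr, uint64_t s)
{
    return nr & ~s;
}

/* The command is complete once NR is empty. */
int command_done(uint64_t nr)
{
    return nr == 0;
}
```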
7 Table Management

This section presents the protocol for efficient and consistent management of the replicated network tables, each of which represents a service. Servers which share replicas of the same table form the same service group (SG). A service group consists of an administratively assigned primary server and a number of secondary ones. For the sake of simplicity we will consider a single SG in the following discussion.

The primary server enforces a single total order on all the update messages inside the SG. This is achieved by forwarding each new request for an update from a client to the SG's primary. The primary creates an update message from the request, assigns it a unique sequence number, and multicasts this update message to the SG. Each secondary server applies the update messages to the SG's table in the order consistent with the primary's one. This guarantees that all the servers in the same network component remain in a consistent state. If the network partitions, at most one component (the one that includes the primary) can perform new updates. Therefore, conflicting updates are never possible.

When a membership change (network partition or merge, or server crash) is reported by the group communication layer, the connected servers exchange information and converge to the most updated consistent state known to any of them. Note that this happens even if the primary is not a member of the current membership. The information exchange is done in two stages. In the first stage, the servers exchange state messages containing a vector representing their knowledge about the last update known to each server. In the second stage, the most updated server multicasts the updates that are missed by any member of the currently connected group. (If the primary server is present in a component, it will be the one performing the retransmission; otherwise, one of the most updated secondary servers is deterministically chosen.)

Each server logs all the update messages from the primary on non-volatile storage. This log is used for restoring a server's state when the server recovers from a crash. A server discards an update from the log when it learns that all the other servers have applied this update to their table (and hence, no server will need to recover that update in the future).

Data Structures

Each management server S in SG maintains the following data structures:

- my_id: a unique identifier of S.

- p_id: the identifier of SG's primary server.

- MQ: a list of the updates received by S. MQ is retained on non-volatile storage.

- Vec: a vector of sequence numbers containing one entry for each of the SG's members. If Vec[i] = n then S knows that server i has all the updates up to n. Initially, all of Vec's entries are 0. Vec is retained on non-volatile storage.

- SGT: the Transis group name of SG.

- Memb: the current membership of SGT as reported by Transis. This is a structure which contains a unique identifier of the membership (memb_id) and a set of currently connected servers (set).

- ARU ("all-received-up-to"): a sequence number such that S knows that all the updates with sequence numbers no greater than ARU were received and applied to the table by all the members of SGT. Note that ARU = min over 1 <= i <= |Vec| of Vec[i].

- min_sn, max_sn: the minimal and maximal sequence numbers of update messages that need to be retransmitted upon a membership change.

- Memb_counter: a variable that counts the State messages during the information exchange upon a membership change.

Message Types

- Req: a new request to perform an update to the table. This request is sent by a client to one of the servers. The update operation is stored in the action field of this message.

- Upd: an update message multicast by SG's primary to SGT. This message carries a unique sequence number in the sn field in addition to the fields of a Req message.

- M: a membership change notification delivered by Transis. This message contains the same two fields as the Memb structure.

- State: a state message which carries the Vec and the identifier of the sender. This message is stamped with the membership identifier of the membership it was sent in.

- StateP: similar to the State message; used for garbage collection when the membership contains all the members of SG.

- Qry: a query message from a client.

In addition, a type field is included with each message.

The Pseudo-Code

The following subsections present the pseudo-code of the table management protocol.
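To make the roles of Vec, ARU, min_sn and max_sn concrete, here is a small illustrative C sketch (ours, not the paper's implementation):

```c
#include <assert.h>

#define N_SERVERS 3

/* Illustrative sketch of the data structures above: vec[i] is the highest
 * update sequence number known to be held by server i. */

/* ARU ("all-received-up-to") is the minimum Vec entry: every update with
 * sn <= ARU has been applied by all members and may be garbage-collected
 * from the log MQ. */
int compute_aru(const int vec[N_SERVERS])
{
    int aru = vec[0];
    for (int i = 1; i < N_SERVERS; i++)
        if (vec[i] < aru)
            aru = vec[i];
    return aru;
}

/* After the State exchange, the most updated server retransmits exactly
 * the updates whose sequence numbers lie in (min_sn, max_sn]. */
int needs_retransmission(int sn, int min_sn, int max_sn)
{
    return sn > min_sn && sn <= max_sn;
}
```

With Vec = {7, 5, 9}, for example, ARU is 5: updates up to 5 may be removed from MQ, while updates 6 through 9 form the window that the most updated server multicasts after a membership change.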
Request from a client

The server which receives an update request from a client forwards it to the primary server. The primary server creates an update message from this request, applies it to the SG's table and multicasts it to the group. Procedure handle-request details these steps.

    handle-request(m)
    {
        if (my_id == p_id) then
            Vec[my_id] = Vec[my_id] + 1;
            m.sn = Vec[my_id];
            m.type = Upd;
            append m to MQ;
            sync MQ and Vec to disk;
            apply m.action to SG's table;
            multicast(m, SGT);
        else if (p_id in Memb) then
            send(m, p_id);
    }

Update from a server

A secondary server which receives an update message in the correct order applies the update to the table and changes its data structures accordingly. Procedure handle-update details these steps.

    handle-update(m)
    {
        if (my_id != p_id and m.sn == Vec[my_id] + 1) then
            Vec[my_id] = m.sn;
            append m to MQ;
            sync MQ and Vec to disk;
            apply m.action to SG's table;
        else
            discard m;
    }

Membership change notification from Transis

Upon a membership change, the connected servers exchange information in order to converge to the most updated state. Procedure handle-membership prepares the data structures for this recovery process and multicasts a State message. Note that the State message contains Vec, representing the local knowledge regarding other servers' states.

    handle-membership(m)
    {
        Memb.set = m.set;
        Memb.memb_id = m.memb_id;
        min_sn = max_sn = Vec[my_id];
        Memb_counter = |Memb|;
        create a State message m';
        multicast(m', SGT);
    }

State message from a server

When a valid State message is received, the server updates its knowledge regarding other servers' knowledge. After all the State messages have been received, the needed update messages are retransmitted by the most updated server. If the primary server is a member of the current membership, it is selected as the most updated server; otherwise the most updated secondary server with the smallest identifier is selected using the procedure most-updated-server. Procedure handle-state details these steps.

    handle-state(m)
    {
        if (m.memb_id != Memb.memb_id) then
            return;
        Vec = max(Vec, m.Vec);
        if (m.Vec[m.sender] < min_sn) then
            min_sn = m.Vec[m.sender];
        if (m.Vec[m.sender] > max_sn) then
            max_sn = m.Vec[m.sender];
        Memb_counter = Memb_counter - 1;
        if (Memb_counter == 0) then
            if (most-updated-server()) then
                for each m' in MQ s.t. m'.sn > min_sn do
                    multicast(m', SGT);
    }

The most-updated-server procedure presented below returns true if the invoking server is the most updated server with the minimal identifier, and false otherwise.

    boolean most-updated-server()
    {
        for each i in Memb.set s.t. i < my_id do
            if (Vec[i] == max_sn) then
                return false;
        if (Vec[my_id] == max_sn) then
            return true;
        return false;
    }

Garbage collection

In order to discard updates which are no longer needed, procedure collect-garbage is called upon the reception of either a State message or a StateP message. The StateP message is sent periodically if the membership contains all the members of the SG. The reason for having the StateP message is to avoid maintaining large amounts of updates that are no longer needed because each member of the SG has already applied them.

    collect-garbage(m)
    {
        Vec = max(Vec, m.Vec);
        new_ARU = min over 1 <= i <= |Vec| of Vec[i];
        if (new_ARU > ARU) then
            for each m' in MQ s.t. m'.sn <= new_ARU do
                remove m' from MQ;
            ARU = new_ARU;
            sync MQ and Vec to disk;
    }

Events handling

The following is the main loop of the table management part of the management server.

    Initially:
        connect to Transis;
        join group SGT;
        initialize all the Vec entries to 0;
        bring in MQ and Vec (if present) from disk;
        ARU = min over 1 <= i <= |Vec| of Vec[i];
    while (true) {
        m = receive();
        switch (m.type)
            case Req:
                handle-request(m);
            case Upd:
                handle-update(m);
            case Qry:
                retrieve an answer from the local table;
                send the answer to the client;
            case M:
                handle-membership(m);
            case State:
                handle-state(m);
                collect-garbage(m);
            case StateP:
                collect-garbage(m);
    }

8 Conclusion

We have presented an architecture that utilizes group communication to provide efficient and reliable distributed system management. The common management tasks of simultaneous execution, software installation and table management were addressed. The resulting services are convenient to use, consistent in the presence of failures, and complementary to the existing standard mechanisms.

References

[1] Y. Amir. The Spread toolkit, 1995. Private communication.

[2] Y. Amir. Replication Using Group Communication Over a Partitioned Network. PhD thesis, Institute of Computer Science, The Hebrew University of Jerusalem, Israel, 1995.

[3] Y. Amir, D. Dolev, S. Kramer, and D. Malki. Transis: A communication sub-system for high availability. In Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing, pages 76-84, July 1992. The full version of this paper is available as TR CS91-13, Dept. of Comp. Sci., The Hebrew University of Jerusalem.

[4] Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, and P. Ciarfella. The Totem single-ring ordering and membership protocol. ACM Transactions on Computer Systems, 13(4), November 1995.

[5] N. Amit, D. Ginat, S. Kipnis, and J. Mihaeli. Distributed SMIT: System management tool for large Unix environments. Research report, IBM Israel Science and Technology, 1995. In preparation.

[6] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems, chapter 7. Addison-Wesley, 1987.

[7] P. Ezhilchelvan, R. Macedo, and S. Shrivastava. Newtop: A fault-tolerant group communication protocol. In Proceedings of the 15th International Conference on Distributed Computing Systems, May 1995.

[8] N. Huleihel. Efficient ordering of messages in wide area networks. Master's thesis, Institute of Computer Science, The Hebrew University of Jerusalem, Israel, 1996.

[9] L. E. Moser, Y. Amir, P. Melliar-Smith, and D. A. Agarwal. Extended virtual synchrony. In Proceedings of the 14th International Conference on Distributed Computing Systems, pages 56-65, June 1994.

[10] H. Stern. Managing NFS and NIS, chapters 2-4. O'Reilly & Associates, first edition, June 1991.

[11] Tivoli Systems Inc. Multiplexed Distribution (MDist), November 1994. Available via anonymous ftp from ftp.tivoli.com /pub/info.

[12] Tivoli Systems Inc. TME 2.0: Technology Concepts and Facilities, 1994. Technology white paper discussing Tivoli 2.0 components and capabilities. Available via anonymous ftp from ftp.tivoli.com /pub/info.

[13] R. van Renesse, K. P. Birman, R. Friedman, M. Hayden, and D. Karr. A framework for protocol composition in Horus. In Proceedings of the ACM Symposium on Principles of Distributed Computing, August 1995.
