Process Migration

DEJAN S. MILOJIČIĆ
HP Labs
FRED DOUGLIS
AT&T Labs–Research
YVES PAINDAVEINE
TOG Research Institute
RICHARD WHEELER
EMC
AND
SONGNIAN ZHOU
University of Toronto and Platform Computing
Process migration is the act of transferring a process between two machines. It enables
dynamic load distribution, fault resilience, eased system administration, and data
access locality. Despite these goals and ongoing research efforts, migration has not
achieved widespread use. With the increasing deployment of distributed systems in
general, and distributed operating systems in particular, process migration is again
receiving more attention in both research and product development. As
high-performance facilities shift from supercomputers to networks of workstations, and
with the ever-increasing role of the World Wide Web, we expect migration to play a
more important role and eventually to be widely adopted.
This survey reviews the field of process migration by summarizing the key concepts
and giving an overview of the most important implementations. Design and
implementation issues of process migration are analyzed in general, and then revisited
for each of the case studies described: MOSIX, Sprite, Mach, and Load Sharing Facility.
The benefits and drawbacks of process migration depend on the details of
implementation and, therefore, this paper focuses on practical matters. This survey will
help in understanding the potential of process migration and why it has not caught on.
of the potentials migration can offer to different applications (see Section 2.3 on goals, Section 4 on approaches, and Section 2.4 on applications).

With the increasing deployment of distributed systems in general, and distributed operating systems in particular, the interest in process migration is again on the rise both in research and in product development. As high-performance facilities shift from supercomputers to Networks of Workstations (NOW) [Anderson et al., 1995] and large-scale distributed systems, we expect migration to play a more important role and eventually gain wider acceptance.

Operating systems developers in industry have considered supporting process migration, for example Solaris MC [Khalidi et al., 1996], but thus far the availability of process migration in commercial systems is non-existent, as we describe below. Checkpoint-restart systems are becoming increasingly deployed for long-running jobs. Finally, techniques originally developed for process migration have been employed in developing mobile agents on the World Wide Web. Recent interpreted programming languages, such as Java [Gosling et al., 1996], Telescript [White, 1996] and Tcl/Tk [Ousterhout, 1994], provide additional support for agent mobility.

There exist a few books that discuss process migration [Goscinski, 1991; Barak et al., 1993; Singhal and Shivaratri, 1994; Milojičić et al., 1999]; a number of surveys [Smith, 1988; Eskicioglu, 1990; Nuttal, 1994], though none as detailed as this survey; and Ph.D. theses that deal directly with migration [Theimer et al., 1985; Zayas, 1987a; Lu, 1988; Douglis, 1990; Philippe, 1993; Milojičić, 1993c; Zhu, 1992; Roush, 1995], or that are related to migration [Dannenberg, 1982; Nichols, 1990; Tracey, 1991; Chapin, 1993; Knabe, 1995; Jacqmot, 1996].

This survey reviews the field of process migration by summarizing the key concepts and describing the most important implementations. Design and implementation issues of process migration are analyzed in general and then revisited for each of the case studies described: MOSIX, Sprite, Mach, and LSF. The benefits and drawbacks of process migration depend on the details of implementation and therefore this paper focuses on practical matters. In this paper we address mainly process migration mechanisms. Process migration policies, such as load information management and distributed scheduling, are mentioned to the extent that they affect the systems being discussed. More detailed descriptions of policies have been reported elsewhere (e.g., Chapin's survey [1996]).

This survey will help in understanding the potential of process migration. It attempts to demonstrate how and why migration may be widely deployed. We assume that the reader has a general knowledge of operating systems.

Organization of the Paper

The paper is organized as follows. Section 2 provides background on process migration. Section 3 describes process migration by surveying its main characteristics: complexity, performance, transparency, fault resilience, scalability and heterogeneity. Section 4 classifies various implementations of process migration mechanisms and then describes a couple of representatives for each class. Section 5 describes four case studies of process migration in more detail. In Section 6 we compare the process migration implementations presented earlier. In Section 7 we discuss why we believe that process migration has not caught on so far. In the last section we summarize the paper and describe opportunities for further research.

2. BACKGROUND

This section gives some background on process migration by providing an overview of process migration terminology, target architectures, goals, application taxonomy, migration algorithms, system requirements, load information management, distributed scheduling, and alternatives to migration.
deals with the transfer of authority, for instance access to a shared file system, but in a limited way: authority is under the control of a single administrative domain. Finally, mobile agents transfer code, data, and especially authority to act on the owner's behalf on a wide scale, such as within the entire Internet.

2.2. Target Architectures

Process migration research started with the appearance of distributed processing among multiple processors. Process migration introduces opportunities for sharing processing power and other resources, such as memory and communication channels. It is addressed in early multiprocessor systems [Stone, 1978; Bokhari, 1979]. Current multiprocessor systems, especially symmetric multiprocessors, are scheduled using traditional scheduling methods. They are not used as an environment for process migration research.

Process migration in NUMA (Non-Uniform Memory Access) multiprocessor architectures is still an active area of research [Gait, 1990; Squillante and Nelson, 1991; Vaswani and Zahorjan, 1991; Nelson and Squillante, 1995]. The NUMA architectures have a different access time to the memory of the local processor, compared to the memory of a remote processor, or to a global memory. The access time to the memory of a remote processor can be variable, depending on the type of interconnect and the distance to the remote processor. Migration in NUMA architectures is heavily dependent on the memory footprint that processes have, both in memory and in caches. Recent research on virtual machines on scalable shared memory multiprocessors [Bugnion et al., 1997] represents another potential for migration. Migration of whole virtual machines between processors of a multiprocessor abstracts away most of the complexities of operating systems, reducing the migrateable state only to memory and to state contained in a virtual monitor [Teodosiu, 2000]. Therefore, migration is easier to implement if there is a notion of a virtual machine.

Massively Parallel Processors (MPP) are another type of architecture used for migration research [Tritscher and Bemmerl, 1992; Zajcew et al., 1993]. MPP machines have a large number of processors that are usually shared between multiple users by providing each of them with a subset, or partition, of the processors. After a user relinquishes a partition, it can be reused by another user. MPP computers are typically of a NORMA (NO Remote Memory Access) type, i.e., there is no remote memory access. In that respect they are similar to network clusters, except they have a much faster interconnect. Migration represents a convenient tool to achieve repartitioning. Since MPP machines have a large number of processors, the probability of failure is also larger. Migrating a running process from a partially failed node, for example after a bank of memory unrelated to the process fails, allows the process to continue running safely. MPP machines also use migration for load distribution, such as the psched daemon on Cray T3E, or Loadleveler on IBM SP2 machines.

Since its inception, a Local Area Network (LAN) of computers has been the most frequently used architecture for process migration. The bulk of the systems described in this paper, including all of the case studies, are implemented on LANs. Systems such as NOW [Anderson et al., 1995] or Solaris [Khalidi et al., 1996] have recently investigated process migration using clusters of workstations on LANs. It was observed that at any point in time many autonomous workstations on a LAN are unused, offering potential for other users based on process migration [Mutka and Livny, 1987]. There is, however, a sociological aspect to the autonomous workstation model. Users are not willing to share their computers with others if this means affecting their own performance [Douglis and Ousterhout, 1991]. The priority of the incoming processes (processing, VM, IPC priorities) may be reduced in order to allow for minimal impact on the workstation's owner [Douglis and Ousterhout, 1991; Krueger and Chawla, 1991].
invocation and interprocess communication, while Migratory PVM (MPVM) [Casas et al., 1995] extends PVM to allow instances of a parallel application to migrate among nodes. Some other applications are inherently parallelizable, such as the make tool [Baalbergen, 1988]. For example, Sprite provides a migration-aware parallel make utility that distributes a compilation across several nodes [Douglis and Ousterhout, 1991]. Certain processor-bound applications, such as scientific computations, can be parallelized and executed on multiple nodes. An example includes work by Skordos [1995], where an acoustic application is parallelized and executed on a cluster of workstations. Applications that perform I/O and other nonidempotent operations are better suited to a system-wide remote execution facility that provides location transparency and, if possible, preemptive migration.

Long-running applications, which can run for days or even weeks, can suffer various interruptions, for example partial node failures or administrative shutdowns. Process migration can relocate these applications transparently to prevent interruption. Examples of such systems include work by Freedman [1991] and MPVM [Casas et al., 1995]. Migration can also be supported at the application level [Zhou et al., 1994] by providing a checkpoint/restart mechanism which the application can invoke periodically or upon notification of an impending interruption.

Generic multiuser workloads, for example the random job mix that an undergraduate computer laboratory produces, can benefit greatly from process migration. As users come and go, the load on individual nodes varies widely. Dynamic process migration [Barak and Wheeler, 1989; Douglis and Ousterhout, 1991] can automatically spread processes across all nodes, including those applications that are not enhanced to exploit the migration mechanism.

An individual generic application, which is preemptable, can be used with various goals in mind (see Section 2.3). Such an application can either migrate itself, or it can be migrated by another authority. This type of application is most common in various systems described in Section 4 and in the case studies described in Section 5. Note that it is difficult to select such applications without detailed knowledge of past behavior, since many applications are short-lived and do not execute long enough to justify the overhead of migration (see Section 2.7).

Migration-aware applications are applications that have been coded to explicitly take advantage of process migration. Dynamic process migration can automatically redistribute these related processes if the load becomes uneven on different nodes, e.g. if processes are dynamically created, or there are many more processes than nodes. Work by Skordos [1995], Freedman [1991] and Cardelli [1995] represents this class of application. They are described in more detail in Section 4.6.

Network applications are the most recent example of the potential use of migration: for instance, mobile agents and mobile objects (see Sections 4.7 and 4.8). These applications are designed with mobility in mind. Although this mobility differs significantly from the kinds of "process migration" considered elsewhere in this paper, it uses some of the same techniques: location policies, checkpointing, transparency, and locating and communicating with a mobile entity.

2.5. Migration Algorithm

Although there are many different migration implementations and designs, most of them can be summarized in the following steps (see also Figure 3):

1. A migration request is issued to a remote node. After negotiation, migration has been accepted.
2. A process is detached from its source node by suspending its execution, declaring it to be in a migrating state, and temporarily redirecting communication as described in the following step.
Fig. 3. Migration Algorithm. Many details have been simplified, such as user v. kernel migration, when the process is actually suspended, when the state is transferred, how messages are transferred, etc. These details vary subject to the particular implementation.
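To make the algorithm concrete, the sketch below strings the numbered steps of Section 2.5 into a single control path. It is purely illustrative: every type and function is a hypothetical placeholder (steps 3-5, which are not reproduced in this excerpt, are stubbed in from the references to them in steps 6-8 and in Section 2.6), and real implementations differ in ordering, in what is transferred eagerly versus lazily, and in how communication is redirected.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int id; bool suspended; } process_t;   /* placeholder */
typedef struct { int id; } node_t;                       /* placeholder */
typedef struct { int bytes; } state_t;                   /* placeholder */

/* Trivial stand-ins for the mechanisms discussed in the text. */
static bool negotiate_migration(node_t *dst, process_t *p) { (void)dst; (void)p; return true; }
static void suspend_and_mark_migrating(process_t *p)       { p->suspended = true; }
static void redirect_communication_temporarily(process_t *p) { (void)p; }
static state_t *export_state(process_t *p)                 { (void)p; return calloc(1, sizeof(state_t)); }
static process_t *create_destination_instance(node_t *dst) { (void)dst; return calloc(1, sizeof(process_t)); }
static void import_state(process_t *inst, state_t *s)      { (void)inst; (void)s; }
static void install_forwarding_reference(process_t *p, node_t *dst) { (void)p; (void)dst; }
static void resume_instance(process_t *inst)               { inst->suspended = false; }
static void delete_source_instance(process_t *p)           { (void)p; }

static int migrate(process_t *p, node_t *dst)
{
    if (!negotiate_migration(dst, p))                    /* step 1: request and negotiation      */
        return -1;
    suspend_and_mark_migrating(p);                       /* step 2: detach from the source node  */
    redirect_communication_temporarily(p);               /* step 3: temporary redirection        */
    state_t *s = export_state(p);                        /* step 4: extract the exportable state */
    process_t *inst = create_destination_instance(dst);  /* step 5: destination instance         */
    import_state(inst, s);                               /* step 6: transfer and import state    */
    install_forwarding_reference(p, dst);                /* step 7: forwarding references        */
    resume_instance(inst);                               /* step 8: resume on the destination    */
    delete_source_instance(p);                           /* source copy may now be deleted       */
    free(s);
    free(inst);
    return 0;
}

int main(void)
{
    process_t p = { 42, false };
    node_t dst = { 7 };
    return migrate(&p, &dst);
}
```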
amount of state has been transferred from the source process instance. After that, the destination instance will be promoted into a regular process.
6. State is transferred and imported into a new instance on the remote node. Not all of the state needs to be transferred; some of the state could be lazily brought over after migration is completed (see lazy evaluation in Section 3.2).
7. Some means of forwarding references to the migrated process must be maintained. This is required in order to communicate with the process or to control it. It can be achieved by registering the current location at the home node (e.g. in Sprite), by searching for the migrated process (e.g. in the V Kernel, at the communication protocol level), or by forwarding messages across all visited nodes (e.g. in Charlotte). This step also enables migrated communication channels at the destination and it ends step 3 as communication is permanently redirected.
8. The new instance is resumed when sufficient state has been transferred and imported. With this step, process migration completes. Once all of the state has been transferred from the original instance, it may be deleted on the source node.

2.6. System Requirements for Migration

To support migration effectively, a system should provide the following types of functionality:

• Exporting/importing the process state. The system must provide some type of export/import interfaces that allow the process migration mechanism to extract a process's state from the source node and import this state on the destination node. These interfaces may be provided by the underlying operating system, the programming language, or other elements of the programming environment that the process has access to. State includes processor registers, process address space and communication state, such as open message channels in the case of message-based systems, or open files and signal masks in the case of UNIX-like systems.
• Naming/accessing the process and its resources. After migration, the migrated process should be accessible by the same name and mechanisms as if migration never occurred. The same applies to the process's resources, such as threads, communication channels, files and devices. During migration, access to a process and/or some of its resources can be temporarily suspended. Varying degrees of transparency are achieved in naming and accessing resources during and after migration (see Section 3.3).
• Cleaning up the process's non-migratable state. Frequently, the migrated process has associated system state that is not migratable (examples include a local process identifier, pid, and the local time; a local pid is relevant only to the local OS, and every host may have a slightly different value for the local time, something that may or may not matter to a migrating process). Migration must wait until the process finishes or aborts any pending system operation. If the operation can be arbitrarily long, it is typically aborted and restarted on the destination node. For example, migration can wait for the completion of local file operations or local device requests that are guaranteed to return in a limited time frame. Waiting for a message or accessing a remote device are examples of operations that need to be aborted and restarted on the remote node. Processes that cannot have their non-migratable state cleaned cannot be considered for migration.

2.7. Load Information Management

The local processes and the resources of local and remote nodes have to be characterized, in order to select a process for migration and a destination node, as well as to justify migration. This task is commonly known as load information management. Load information is collected and passed to a distributed scheduling policy
their age. He also finds that most UNIX processes are short-lived; more than 78% of the observed processes have a lifetime shorter than 1s and 97% shorter than 4s.

Harchol-Balter and Downey explore the correlation between process lifetime and acceptable migration costs [Harchol-Balter and Downey, 1997]. They derive a more accurate form of the process lifetime distribution that allows them to predict the lifetime correlated to the process age and to derive a cost criterion for migration. Svensson filters out short-running processes by relying on statistics [Svensson, 1990], whereas Wang et al. deploy AI theory for the same purpose [Wang et al., 1993].

2.8. Distributed Scheduling

This section addresses distributed scheduling closely related to process migration mechanisms. General surveys are presented elsewhere [Wang and Morris, 1985; Casavant and Kuhl, 1988a; Hac, 1989b; Goscinski, 1991; Chapin, 1996].

Distributed scheduling uses the information provided by the load information management module to make migration decisions, as described in Figure 4. The main goal is to determine when to migrate which process where. The activation policy provides the answer to the question when to migrate. Scheduling is activated periodically or it is event-driven. After activation, the load is inspected, and if it is above/below a threshold, actions are undertaken according to the selected strategy. The selection policy answers the question which process to migrate. The processes are inspected and some of them are selected for migration according to the specified criteria. Where to migrate depends on the location policy algorithm, which chooses a remote node based on the available information.

There are a few well-known classes of distributed scheduling policies:

• A sender-initiated policy is activated on the node that is overloaded and that wishes to off-load to other nodes. A sender-initiated policy is preferable for low and medium loaded systems, which have a few overloaded nodes. This strategy is convenient for remote invocation strategies [Eager et al., 1986a; Krueger and Livny, 1987b; Agrawal and Ezzat, 1987].
• A receiver-initiated policy is activated on underloaded nodes willing to accept the load from overloaded ones. A receiver-initiated policy is preferable for high load systems, with many overloaded nodes and few underloaded ones. Process migration is particularly well-suited for this strategy, since only with migration can one initiate process transfer at an arbitrary point in time [Bryant and Finkel, 1981; Eager et al., 1986a; Krueger and Livny, 1988].
• A symmetric policy is the combination of the previous two policies, in an attempt to take advantage of the good characteristics of both of them. It is suitable for a broader range of conditions than either receiver-initiated or sender-initiated strategies alone [Krueger and Livny, 1987b; Shivaratri et al., 1992].
• A random policy chooses the destination node randomly from all nodes in a distributed system. This simple strategy can result in a significant performance improvement [Alon et al., 1987; Eager et al., 1986b; Kunz, 1991].

The following are some of the issues in distributed scheduling related to the process migration mechanism:

• Adaptability is concerned with the scheduling impact on system behavior [Stankovic, 1984]. Based on the current host and network load, the relative importance of load parameters may change. The policy should adapt to these changes. Process migration is inherently adaptable because it allows processes to run prior to dispatching them to other nodes, giving them a chance to adapt. Migration can happen at any time (thereby adapting to sudden load changes), whereas initial placement happens only prior to starting a process. Examples of adaptive load distribution include work by Agrawal and Ezzat [1987], Krueger
and Livny [1988], Concepcion and Eleazar [1988], Efe and Groselj [1989], Venkatesh and Dattatreya [1990], Shivaratri and Krueger [1990], and Mehra and Wah [1992].
• Stability is defined as the ability to detect when the effects of further actions (e.g. load scheduling or paging) will not improve the system state as defined by a user's objective [Casavant and Kuhl, 1988b]. Due to the distributed state, some instability is inevitable, since it is impossible to transfer state changes across the system instantly. However, high levels of instability should be avoided. In some cases it is advisable not to perform any action, e.g. under extremely high loads it is better to abandon load distribution entirely. Process migration can negatively affect stability if processes are migrated back and forth among the nodes, similar to the thrashing introduced by paging [Denning, 1980]. To prevent such behavior a limit on the number of migrations can be imposed. Bryant and Finkel demonstrate how process migration can improve stability [Bryant and Finkel, 1981].
• Approximate and heuristic scheduling is necessary since optimal solutions are hard to achieve. Suboptimal solutions are reached either by approximating the search space with its subset or by using heuristics. Some of the examples of approximate and heuristic scheduling include work by Efe [1982], Leland and Ott [1986], Lo [1988], Casavant and Kuhl [1988a], and Xu and Hwang [1990]. Deploying process migration introduces more determinism and requires fewer heuristics than alternative load distribution mechanisms. Even when incorrect migration decisions are made, they can be alleviated by subsequent migrations, which is not the case with initial process placement, where processes have to execute on the same node until the end of their lifetime.
• Hierarchical scheduling integrates distributed and centralized scheduling. It supports distributed scheduling within a group of nodes and centralized scheduling among the groups. This area has attracted much research [Bowen et al., 1988; Bonomi and Kumar, 1988; Feitelson and Rudolph, 1990; Gupta and Gopinath, 1990; Gopinath and Gupta, 1991; Chapin, 1995]. A process migration mechanism is a good fit for hierarchical scheduling since processes are typically migrated within a LAN or other smaller domain. Only in the case of large load discrepancies are processes migrated between domains, i.e. between peers at higher levels of the hierarchy.

The most important question that distributed scheduling studies address related to process migration is whether migration pays off. The answer depends heavily on the assumptions made. For example, Eager et al. compare the receiver- and sender-initiated policies [Eager et al., 1986a], and show that the sender-initiated policies outperform the receiver-initiated policies for light and moderate system loads. The receiver-initiated policy is better for higher loads, assuming that transfer costs are the same. They argue that the transfer costs for the receiver policy, which requires some kind of migration, are much higher than the costs for mechanisms for the sender-initiated strategies, where initial placement suffices. They finally conclude that under no condition could migration provide significantly better performance than initial placement [Eager et al., 1988].

Krueger and Livny investigate the relationship between load balancing and load sharing [Krueger and Livny, 1988]. They argue that load balancing and load sharing represent various points in a continuum defined by a set of goals and load conditions [Krueger and Livny, 1987]. They claim that the work of Eager et al. [1988] is only valid for a part of the continuum, but it cannot be adopted generally. Based on better job distributions than those used by Eager et al., their simulation results show that migration can improve performance.

Harchol-Balter and Downey present the most recent results on the benefits of using process migration [Harchol-Balter and
Downey, 1997]. They use the measured distribution of process lifetimes for a variety of workloads in an academic environment. The crucial point of their work is understanding the correct lifetime distribution, which they find to be Pareto (heavy-tailed). Based on the trace-driven simulation, they demonstrate a 35-50% improvement in the mean delay when using process migration instead of remote execution (preemptive v. non-preemptive scheduling) even when the costs of migration are high.

Their work differs from Eager et al. [1988] in system model and workload description. Eager et al. model server farms, where the benefits of remote execution are overestimated: there are no associated costs and no affinity toward a particular node. Harchol-Balter and Downey model a network of workstations where remote execution entails costs, and there exists an affinity toward some of the nodes in a distributed system. The workload that Eager et al. use contains few jobs with non-zero lifetimes, resulting in a system with little imbalance and little need for process migration.

2.9. Alternatives to Process Migration

Given the relative complexity of implementation, and the expense incurred when process migration is invoked, researchers often choose to implement alternative mechanisms [Shivaratri et al., 1992; Kremien and Kramer, 1992].

Remote execution is the most frequently used alternative to process migration. Remote execution can be as simple as the invocation of some code on a remote node, or it can involve transferring the code to the remote node and inheriting some of the process environment, such as variables and opened files. Remote execution is usually faster than migration because it does not incur the cost of transferring a potentially large process state (such as the address space, which is created anew in the case of remote execution). For small address spaces, the costs for remote execution and migration can be similar. Remote execution is used in many systems such as COCANET [Rowe and Birman, 1982], Nest [Agrawal and Ezzat, 1987], Sprite [Ousterhout et al., 1988], Plan 9 [Pike et al., 1990], Amoeba [Mullender et al., 1990], Drums [Bond, 1993], Utopia [Zhou et al., 1994], and Hive [Chapin et al., 1995].

Remote execution has disadvantages as well. It allows creation of the remote instance only at the time of process creation, as opposed to process migration which allows moving the process at an arbitrary time. Allowing a process to run on the source node for some period of time is advantageous in some respects. This way, short-lived processes that are not worth migrating are naturally filtered out. Also, the longer a process runs, the more information about its behavior is available, such as whether and with whom it communicates. Based on this additional information, scheduling policies can make more appropriate decisions.

Cloning processes is useful in cases where the child process inherits state from the parent process. Cloning is typically achieved using a remote fork mechanism. A remote fork, followed by the termination of the parent, resembles process migration. The complexity of cloning processes is similar to migration, because the same amount of the process state is inherited (e.g. open files and address space). In the case of migration, the parent is terminated. In the case of cloning, both parent and child may continue to access the same state, introducing distributed shared state, which is typically complex and costly to maintain. Many systems use remote forking [Goldberg and Jefferson, 1987; Smith and Ioannidis, 1989; Zajcew et al., 1993].

Programming language support for mobility enables a wide variety of options, since such systems have almost complete control over the runtime implementation of an application. Such systems can enable self-checkpointing (and hence migratable) applications. They are suitable for entire processes, but also for objects as small as a few bytes, such as in Emerald [Jul et al., 1988; Jul, 1989] or Ellie [Andersen, 1992]. Finer granularity incurs lower transfer costs. The complexity
a process was created [Douglis and Ousterhout, 1991]. An example of a home dependency is redirecting system calls to the home node: for example, local host-dependent calls, calls related to the file system (in the absence of a distributed file system), or operations on local devices. A home dependency can simplify migration, because it is easier to redirect requests to the home node than to support services on all nodes. However, it also adversely affects reliability, because a migrated foreign process will always depend on its home node. The notion of the home dependency is further elaborated upon below in Section 5.1 (MOSIX) and Section 5.2 (Sprite).

Redirecting communication through the previously established links represents another kind of residual dependency. In general, dependencies left at multiple nodes should be avoided, since they require complex support, and degrade performance and fault resilience. Therefore, some form of periodic or lazy removal of residual dependencies is desirable. For example, the system could flush remaining pages to the backing store, or update residual information on migrated communication channels.

3.3. Transparency

Transparency requires that neither the migrated task nor other tasks in the system can notice migration, with the possible exception of performance effects. Communication with a migrated process could be delayed during migration, but no message can be lost. After migration, the process should continue to communicate through previously opened I/O channels, for example printing to the same console or reading from the same files.

Transparency is supported in a variety of ways, depending on the underlying operating system. Sprite and NOW MOSIX maintain a notion of a home machine that executes all host-specific code [Douglis and Ousterhout, 1991; Barak et al., 1995]. Charlotte supports IPC through links, which provide for remapping after migration [Finkel et al., 1989].

Transparency also assumes that the migrated instance can execute all system calls as if it were not migrated. Some user-space migrations do not allow system calls that generate internode signals or file access [Mandelberg and Sunderam, 1988; Freedman, 1991].

Single System Image (SSI) represents a complete form of transparency. It provides a unique view of a system composed of a number of nodes as if there were just one node. A process can be started and communicated with without knowing where it is physically executing. Resources can be transparently accessed from any node in the system as if they were attached to the local node. The underlying system typically decides where to instantiate new processes or where to allocate and access resources.

SSI can be applied at different levels of the system. At the user-level, SSI consists of providing transparent access to objects and resources that comprise a particular programming environment. Examples include Amber [Chase et al., 1989] and Emerald [Jul, 1989]. At the traditional operating system level, SSI typically consists of a distributed file system and distributed process management, such as in MOSIX [Barak and Litman, 1985], Sprite [Ousterhout et al., 1988] and OSF/1 AD TNC [Zajcew et al., 1993]. At the microkernel level, SSI is comprised of mechanisms, such as distributed IPC, distributed memory management, and remote tasking. A near-SSI is implemented for Mach [Black et al., 1992] based on these transparent mechanisms, but the policies are supported at the OSF/1 AD server running on top of it. At the microkernel level the programmer needs to specify where to create remote tasks.

SSI supports transparent access to a process, as well as to its resources, which simplifies migration. On the other hand, the migration mechanism exercises functionality provided at the SSI level, posing a more stressful workload than normally experienced in systems without migration [Milojičić et al., 1993a]. Therefore, although a migration implementation on top of SSI may seem less complex, this
complexity is pushed down into the SSI implementation.

Some location dependencies on another host may be inevitable, such as accessing local devices or accessing kernel-dependent state that is managed by the other host. It is not possible transparently to support such dependencies on the newly visited nodes, other than by forwarding the calls back to the home node, as was done in Sprite [Douglis and Ousterhout, 1991].

3.4. Fault Resilience

Fault resilience is frequently mentioned as a benefit of process migration. However, this claim has never been substantiated with a practical implementation, although some projects have specifically addressed fault resilience [Chou and Abraham, 1983; Lu et al., 1987]. So far the major contribution of process migration for fault resilience is through combination with checkpointing, such as in Condor [Litzkow and Solomon, 1992], LSF [Zhou et al., 1994] and in work by Skordos [1995]. Migration was also suggested as a means of fault containment [Chapin et al., 1995].

Failures play an important role in the implementation of process migration. They can happen on a source or target machine or on the communication medium. Various migration schemes are more or less sensitive to each type of failure. Residual dependencies have a particularly negative impact on fault resilience. Using them is a trade-off between efficiency and reliability.

Fault resilience can be improved in several ways. The impact of failures during migration can be reduced by maintaining process state on both the source and destination sites until the destination site instance is successfully promoted to a regular process and the source node is informed about this. A source node failure can be overcome by completely detaching the instance from the source node once it is migrated, though this prevents lazy evaluation techniques from being employed. One way to remove communication residual dependencies is to deploy locating techniques, such as multicasting (as used in the V kernel [Theimer et al., 1985]), reliance on the home node (as used in Sprite [Douglis and Ousterhout, 1991], and MOSIX [Barak and Litman, 1985]), or on a forwarding name server (as used in most distributed name services, such as DCE, as well as in mobile agents, such as MOA [Milojičić et al., 1999]). This way dependencies are singled out on dedicated nodes, as opposed to being scattered throughout all the nodes visited, as is the case for Charlotte [Artsy et al., 1987]. Shapiro et al. [1992] propose so-called SSP Chains for periodically collapsing forwarding pointers (and thereby reducing residual dependencies) in the case of garbage collection.

3.5. Scalability

The scalability of a process migration mechanism is related to the scalability of its underlying environment. It can be measured with respect to the number of nodes in the system, to the number of migrations a process can perform during its lifetime, and to the type and complexity of the processes, such as the number of open channels or files, and memory size or fragmentation.

The number of nodes in the system affects the organization and management of structures that maintain residual process state and the naming of migrated processes. If these structures are not part of the existing operating system, then they need to be added.

Depending on the migration algorithm and the techniques employed, some systems are not scalable in the number of migrations a process may perform. As we shall see in the case study on Mach (see Section 5.3), sometimes process state can grow with the number of migrations. This is acceptable for a small number of migrations, but in other cases the additional state can dominate migration costs and render the migration mechanism useless.

Migration algorithms should avoid linear dependencies on the amount of state to be transferred. For example, the eager data transfer strategy has costs proportional to the address space size, incurring
significant costs for large address spaces. The costs for a lazily copied process are independent of the address space size, but they can depend on the granularity and type of the address space. For example, the transfer of a large sparse address space can have costs proportional to the number of contiguous address space regions, because each such region has metadata associated with it that must be transferred at migration time. This overhead can be exacerbated if the metadata for each region is transferred as a separate operation, as was done in the initial implementation of Mach task migration [Milojičić et al., 1993b].

Communication channels can also affect scalability. Forwarding communication to a migrated process is acceptable after a small number of sequential migrations, but after a large number of migrations the forwarding costs can be significant. In that case, some other technique, such as updating communication links, must be employed.

3.6. Heterogeneity

Heterogeneity has not been addressed in most early migration implementations. Instead, homogeneity is considered as a requirement; migration is allowed only among the nodes with a compatible architecture and processor instruction set. This was not a significant limitation at the time since most of the work was conducted on clusters of homogeneous machines.

Some earlier work indicated the need as well as possible solutions for solving the heterogeneity problem, but no mature implementations resulted [Maguire and Smith, 1988; Dubach, 1989; Shub, 1990; Theimer and Hayes, 1991].

The deployment of world-wide computing has increased the interest in heterogeneous migration. In order to achieve heterogeneity, process state needs to be saved in a machine-independent representation. This permits the process to resume on nodes with different architectures. An application is usually compiled in advance on each architecture, instrumenting the code to know what procedures and variables exist at any time, and identifying points at which the application can be safely preempted and checkpointed. The checkpointing program sets a breakpoint at each preemption point and examines the state of the process when a breakpoint is encountered. Smith and Hutchinson note that not all programs can be safely checkpointed in this fashion, largely depending on what features of the language are used [Smith and Hutchinson, 1998]. Emerald [Steensgaard and Jul, 1995] is another example of a heterogeneous system.

In the most recent systems, heterogeneity is provided at the language level, as by using intermediate byte code representation in Java [Gosling et al., 1996], or by relying on scripting languages such as Telescript [White, 1996] or Tcl/Tk [Ousterhout, 1994].

3.7. Summary

This subsection evaluates the trade-offs between various characteristics of process migration, and who should be concerned with it.

Complexity is much more of a concern to the implementors of a process migration facility than to its users. Complexity depends on the level where migration is implemented. Kernel-level implementations require significantly more complexity than user-level implementations. Users of process migration are impacted only in the case of user-level implementations where certain modifications of the application code are required or where migration is not fully transparent.

Long-running applications are not as concerned with performance as are those applications whose lifetimes are comparable to their migration time. Short-running applications are generally not good candidates for migration. Migration-time performance can be traded off against execution-time (by leaving residual dependencies, or by lazily resolving communication channels). Residual dependencies are of concern for long-running applications and for network applications. Applications with real-time requirements generally are not suitable candidates for residual dependency because of the unpredictable costs
machine are called segments. If a segment fails, other segments cooperatively reinstantiate it by locating a free machine, rebooting it from the network, and migrating the failed worm segment to it. A worm can move from one machine to another, occupying needed resources, and replicating itself. As opposed to other migration systems, a worm is aware of the underlying network topology. Communication among worm segments is maintained through multicasting.

The original Butler system supports remote execution and process migration [Dannenberg, 1982]. Migration occurs when the guest process needs to be "deported" from the remote node, e.g. when it exceeds the resources it negotiated before arrival. In such a case, the complete state of the guest process is packaged and transferred to a new node. The state consists of the address space, registers, as well as the state contained in the servers collocated at the same node. Migration does not break the communication paths because the underlying operating system (Accent [Rashid and Robertson, 1981]) allows for port migration. The Butler design also deals with the issues of protection, security, and autonomy [Dannenberg and Hibbard, 1985]. In particular, the system protects the client program, the Butler daemons on the source and destination nodes, the visiting process, and the remote node. In its later incarnation, Butler supports only remote invocation [Nichols, 1987].

DEMOS/MP [Miller et al., 1987] is a successor of the earlier version of the DEMOS operating system [Baskett et al., 1977]. Process migration is fully transparent: a process can be migrated during execution without limitations on resource access. The implementation of migration has been simplified and its impact on other services limited by the message-passing, location-independent communication, and by the fact that the kernel can participate in the communication in the same manner as any process [Powell and Miller, 1983]. Most of the support for process migration already existed in the DEMOS kernel. Extending it with migration required mechanisms for forwarding messages and updating links. The transferred state includes program code and data (most of the state), swappable and non-swappable state, and messages in the incoming queue of the process.

4.2. Transparent Migration in UNIX-like Systems

UNIX-like systems have proven to be relatively hard to extend for transparent migration and have required significant modifications and extensions to the underlying kernel (see Subsections 4.3 and 4.4 for comparisons with other types of OSes). There are two approaches to addressing distribution and migration for these systems. One is to provide for distribution at the lower levels of a system, as in MOSIX or Sprite, and the other is by providing distribution at a higher level, as in Locus and its derivatives. In this section, we shall describe process migration for Locus, MOSIX and Sprite. All of these systems also happened to be RPC-based, as opposed to the message-passing systems described in Section 4.3.

Locus is a UNIX-compatible operating system that provides transparent access to remote resources, and enhanced reliability and availability [Popek et al., 1981; Popek and Walker, 1985]. It supports process migration [Walker et al., 1983] and initial placement [Butterfield and Popek, 1984]. Locus is one of the rare systems that achieved product stage. It has been ported to the AIX operating system on the IBM 370 and PS/2 computers under the name of the Transparent Computing Facility (TCF) [Walker and Mathews, 1989]. Locus migration has a high level of functionality and transparency. However, this required significant kernel modifications. Locus has subsequently been ported to the OSF/1 AD operating system, under the name of TNC [Zajcew et al., 1993]. OSF/1 AD is a distributed operating system running on top of the Mach microkernel on Intel x86 and Paragon architectures (see Section 5.3). TNC is only partially concerned with task migration issues of the underlying Mach microkernel, because in
the OSF/1 AD environment the Mach interface is not exposed to the user, and therefore the atomicity of process migration is not affected. Locus was also used as a testbed for a distributed shared memory implementation, Mirage [Fleisch and Popek, 1989]. Distributed shared memory was not combined with process migration as was done in the case of Mach (see Section 5.3).

The MOSIX distributed operating system is an ongoing project that began in 1981. It supports process migration on top of a single system image base [Barak and Litman, 1985] and in a Network of Workstations environment [Barak et al., 1995]. The process migration mechanism is used to support dynamic load balancing. MOSIX employs a probabilistic algorithm in its load information management that allows it to transmit partial load information between pairs of nodes [Barak and Shiloh, 1985; Barak and Wheeler, 1989]. A case study of the MOSIX system is presented in Section 5.1.

The Sprite network operating system [Ousterhout et al., 1988] was developed from 1984-1994. Its process migration facility [Douglis and Ousterhout, 1991] was transparent both to users and to applications, by making processes appear to execute on one host throughout their execution. Processes could access remote resources, including files, devices, and network connections, from different locations over time. When a user returned to a workstation onto which processes had been off-loaded, the processes were immediately migrated back to their home machines and could execute there, migrate elsewhere, or suspend execution. A case study of the Sprite system is presented in Section 5.2.

4.3. OS with Message-Passing Interface

Process migration for message-passing operating systems seems easier to design and implement. Message passing is convenient for interposing, forwarding and encapsulating state. For example, a new receiver may be interposed between the existing receiver and the sender, without the knowledge of the latter, and messages sent to a migrated process can be forwarded after its migration to a new destination. However, much of the simplicity that seems to be inherent for message-passing systems is hidden inside the complex message-passing mechanisms [Douglis and Ousterhout, 1991].

In this section we describe Charlotte, Accent and the V kernel. The V kernel can be classified both as a microkernel and as a message-passing kernel; we chose to present it in the message-passing section.

Charlotte is a message-passing operating system designed for the Crystal multicomputer composed of 20 VAX-11/750 computers [Artsy and Finkel, 1989]. The Charlotte migration mechanism extensively relies on the underlying operating system and its communication mechanisms which were modified in order to support transparent network communication [Artsy et al., 1987]. Its process migration is well insulated from other system modules. Migration is designed to be fault resilient: processes leave no residual dependency on the source machine. The act of migration is committed in the final phase of the state transfer; it is possible to undo the migration before committing it.

Accent is a distributed operating system developed at CMU [Rashid and Robertson, 1981; Rashid, 1986]. Its process migration scheme was the first one to use the "Copy-On-Reference" (COR) technique to lazily copy pages [Zayas, 1987a]. Instead of eagerly copying pages, virtual segments are created on the destination node. When a page fault occurs, the virtual segment provides a link to the page on the source node. The duration of the initial address space transfer is independent of the address space size, but rather depends on the number of contiguous memory regions. The subsequent costs for lazily copied pages are proportional to the number of pages referenced. The basic assumption is that the program would not access all of its address space, thereby saving the cost of a useless transfer. Besides failure vulnerability, the drawback of lazy evaluation is the increased complexity of in-kernel memory management [Zayas, 1987b].
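The copy-on-reference idea can be illustrated with a small, self-contained sketch; this is not Accent's implementation, and all names here are made up. The point is only that the destination copies a page the first time it is touched, so the transfer cost grows with the number of referenced pages rather than with the size of the address space.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES    16

typedef struct {
    const char *source;            /* pages still living on the source node       */
    char       *local;             /* destination copy, filled in lazily          */
    bool        resident[NPAGES];  /* which pages have been pulled over so far    */
    int         faults;            /* number of simulated copy-on-reference faults */
} cor_segment;

/* Simulated page fault: pull one page from the source on first reference. */
static void cor_fault(cor_segment *seg, size_t page)
{
    memcpy(seg->local + page * PAGE_SIZE,
           seg->source + page * PAGE_SIZE, PAGE_SIZE);
    seg->resident[page] = true;
    seg->faults++;
}

static char cor_read(cor_segment *seg, size_t offset)
{
    size_t page = offset / PAGE_SIZE;
    if (!seg->resident[page])
        cor_fault(seg, page);
    return seg->local[offset];
}

int main(void)
{
    static char source_space[NPAGES * PAGE_SIZE] = "source node data";
    cor_segment seg = { .source = source_space,
                        .local  = calloc(NPAGES, PAGE_SIZE),
                        .faults = 0 };

    /* The migrated "process" touches only two pages, so only two are copied. */
    (void)cor_read(&seg, 100);
    (void)cor_read(&seg, 5 * PAGE_SIZE + 7);
    printf("pages copied on reference: %d of %d\n", seg.faults, NPAGES);
    free(seg.local);
    return 0;
}
```

A kernel implementation would instead wire this logic into the page-fault path of the virtual segments described above.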
et al., 1993]. Since Amoeba does not support virtual memory, the memory transfer for process migration is achieved by physical copying [Zhu et al., 1995].

Birlix supports adaptable object migration [Lux, 1995]. It is possible to specify a migration policy on a per-object basis. A meta-object encapsulates data for the migration mechanism and information collection. An example of the use of an adaptable migration mechanism is to extend migration for improved reliability or performance [Lux et al., 1993].

Mach [Accetta et al., 1986] was used as a base for supporting task migration [Milojičić et al., 1993b], developed at the University of Kaiserslautern. The goals were to demonstrate that microkernels are a suitable substrate for migration mechanisms and for load distribution in general. The task migration implementation significantly benefited from the near SSI provided by Mach, in particular from distributed IPC and distributed memory management. Process migration was built for the OSF/1 AD 1 server using Mach task migration [Paindaveine and Milojičić, 1996]. Task and process migration on top of Mach are discussed in more detail in Section 5.3.

4.5. User-space Migrations

While it is relatively straightforward to provide process migration for distributed operating systems, such as the V kernel, Accent, or Sprite, it is much harder to support transparent process migration on industry standard operating systems, which are typically non-distributed. Most workstations in the 1980s and 1990s run proprietary versions of UNIX, which makes them a more challenging base for process migration than distributed operating systems. Source code is not widely available for a proprietary OS; therefore, the only way to achieve a viable and widespread migration is to implement it in user space.

User-space migration is targeted to long-running processes that do not pose significant OS requirements, do not need transparency, and use only a limited set of system calls. The migration time is typically a function of the address space size, since the eager (all) data transfer scheme is deployed. This subsection presents a few such implementations: Condor, the work by Alonso and Kyrimis, the work by Mandelberg and Sunderam, the work by Petri and Langendoerfer, MPVM, and LSF.

Condor is a software package that supports user-space checkpointing and process migration in locally distributed systems [Litzkow, 1987; Litzkow et al., 1988; Litzkow and Solomon, 1992]. Its checkpointing support is particularly useful for long-running computations, but is too expensive for short processes. Migration involves generating a core file for a process, combining this file with the executable and then sending this on to the target machine. System calls are redirected to a "shadow" process on the source machine. This requires a special version of the C library to be linked with the migrated programs.

Condor does not support processes that use signals, memory mapped files, timers, shared libraries, or IPC. The scheduler activation period is 10 minutes, which demonstrates the "heaviness" of migration. Nevertheless, Condor is often used for long-running computations. It has been ported to a variety of operating systems. Condor was a starting point for a few industry products, such as LSF from Platform Computing [Zhou et al., 1994] and Loadleveler from IBM.

Alonso and Kyrimis perform minor modifications to the UNIX kernel in order to support process migration in user space [Alonso and Kyrimis, 1988]. A new signal for dumping process state and a new system call for restarting a process are introduced. This implementation is limited to processes that do not communicate and are not location- or process-dependent. The work by Alonso and Kyrimis was done in parallel with the early Condor system.

Mandelberg and Sunderam present a process migration scheme for UNIX that does not support tasks that perform I/O on non-NFS files, spawn subprocesses, or utilize pipes and sockets [Mandelberg and Sunderam, 1988]. A new terminal
interface supports detaching a process from its terminal and monitors requests for I/O on the process migration port.

Migratory Parallel Virtual Machine (MPVM) extends the PVM system [Beguelin et al., 1993] to support process migration among homogeneous machines [Casas et al., 1995]. Its primary goals are transparency, compatibility with PVM, and portability. It is implemented entirely as a user-level mechanism. It supports communication among migrating processes by limiting TCP communication to other MPVM processes.

Load Sharing Facility (LSF) supports migration indirectly through process checkpointing and restart [Platform Computing, 1996]. LSF can work with checkpointing at three possible levels: kernel, user, and application. The technique used for user-level checkpointing is based on the Condor approach [Litzkow and Solomon, 1992], but no core file is required, thereby improving performance, and signals can be used across checkpoints, thereby improving transparency. LSF is described in more detail in Section 5.4.

4.6. Application-specific Migration

Migration can also be implemented as a part of an application. Such an approach deliberately sacrifices transparency and reusability. A migrating process is typically limited in functionality and migration has to be adjusted for each new application. Nevertheless, the implementation can be significantly simplified and optimized for one particular application. In this subsection we describe work by Freedman, Skordos, and Bharat and Cardelli.

Freedman reports a process migration scheme involving cooperation between the migrated process and the migration module [Freedman, 1991]. The author observes that long-running computations typically use operating system services in the beginning and ending phases of execution, while most of their time is spent in number-crunching. Therefore, little attention is paid to supporting files, sockets, and devices, since it is not expected that they will be used in the predominant phase of execution. This ad hoc process migration considers only memory contents.

Skordos integrates migration with parallel simulation of subsonic fluid dynamics on a cluster of workstations [Skordos, 1995]. Skordos statically allocates problem sizes and uses migration when a workstation becomes overloaded. Upon migration, the process is restarted after synchronization with processes participating in the application on other nodes. At the same time, it is possible to conduct multiple migrations. On a cluster of 20 HP-Apollo workstations connected by 10 Mbps Ethernet, Skordos notices approximately one migration every 45 minutes. Each migration lasts 30 seconds on average. Despite the high costs, its relative impact is very low. Migrations happen infrequently, and do not last long relative to the overall execution time.

Bharat and Cardelli describe Migratory Applications, an environment for migrating applications along with the user interface and the application context, thereby retaining the same "look and feel" across different platforms [Bharat and Cardelli, 1995]. This type of migration is particularly suitable for mobile applications, where a user may be travelling from one environment to another. Migratory applications are closely related to the underlying programming language Oblique [Cardelli, 1995].

4.7. Mobile Objects

In this paper we are primarily concerned with process and task migration. Object migration and mobile agents are two other forms of migration that we mention briefly in this and the following subsection. Although used in different settings, these forms of migration serve a similar purpose and solve some of the same problems as process migration does. In this subsection, we give an overview of object migration for Emerald, SOS and COOL.

Emerald is a programming language and environment for the support of distributed systems [Black et al., 1987]. It supports mobile objects as small as a
local load vector information. In addition, when a load information message is received, the receiving node acknowledges receipt of the message by returning its own load information back to the sending node.
During each iteration of the algorithm, the local load vector is updated by incorporating newly received information and by aging or replacing older load information. To discourage migration between nodes with small load variations, each node adjusts its exported local load information by a stability factor. For migration to take place, the difference in load values between two nodes must exceed this stability value.
The load balancing algorithm decides to migrate processes when it finds another node with a significantly reduced load. It selects a local process that has accumulated a certain minimum amount of runtime, giving preference to processes which have a history of forking off new subprocesses or have a history of communication with the selected node. This prevents short-lived processes from migrating.
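The decision just described can be summarized in a few lines. The sketch below is only a schematic rendering of the rule, with illustrative field names and thresholds; it is not MOSIX code.

    /* Schematic MOSIX-style migration decision: migrate only when the
     * load difference exceeds the stability value, and prefer processes
     * with enough accumulated runtime and a history of forking or of
     * communicating with the candidate target node. */
    #include <stdbool.h>

    struct candidate {
        double runtime;          /* accumulated CPU time                */
        int    forked_children;  /* history of forking subprocesses     */
        int    msgs_to_target;   /* history of communication w/ target  */
    };

    #define MIN_RUNTIME 1.0      /* illustrative minimum residency      */

    static bool worth_migrating(double local_load, double remote_load,
                                double stability, const struct candidate *p)
    {
        if (local_load - remote_load <= stability)   /* small variation */
            return false;
        if (p->runtime < MIN_RUNTIME)                /* short-lived     */
            return false;
        return true;
    }

    /* Among eligible processes, the one with the strongest fork or
     * communication history would be preferred. */
    static int preference(const struct candidate *p)
    {
        return p->forked_children + p->msgs_to_target;
    }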
Implementation and Performance. Porting the original version of MOSIX to a new operating system base required substantial modifications to the OS kernel in order to layer the code base into the three MOSIX components (linker, lower and upper kernels). Few changes took place at the low-level operating system code [Barak and Wheeler, 1989].
In order to reduce the invasiveness of the porting effort, a completely redesigned version of NOW MOSIX was developed for the BSDI version of UNIX [Barak et al., 1995]. The NOW MOSIX provides process migration and load balancing without a single system image. As in Sprite, system calls that are location sensitive are forwarded to the home node of a migrated process as required (cf. Section 5.2).
The performance of a migrated process in MOSIX depends on the nature of the process. One measurement of the effect that migration has on a process is the slower performance of remote system calls. Using the frequencies of system calls measured by Douglis and Ousterhout [1987], system calls were 2.8 times slower when executed on a remote 33MHz MOSIX node [Barak et al., 1989]. Table 1 shows the measured performance and slowdown of several commonly used system calls. Many system calls, for example getpid(), are always performed on the process's current node and have no remote performance degradation.

Table 1. MOSIX System Call Performance

    System Call      Local    Remote    Slowdown
    read (1K)         0.34      1.36       4.00
    write (1K)        0.68      1.65       2.43
    open/close        2.06      4.31       2.09
    fork (256 KB)     7.8      21.60       2.77
    exec (256 KB)    25.30     51.50       2.04

The performance of the MOSIX migration algorithm depends directly on the performance of the linker's data transfer mechanism on a given network and the size of the dirty address space and user area of the migrating process. The measured performance of the VME-based MOSIX migration, from one node of the cluster to the bus master, was 1.2 MB/second. The maximum data transfer speed of the system's VME bus was 3 MB/second.
Some applications benefit significantly from executing in parallel on multiple nodes. In order to allow such applications to run on a system without negatively impacting everyone else, one needs process migration in order to be able to rebalance loads when necessary. Arguably the most important performance measurement is the measurement of an actual user-level application. Specific applications, for example an implementation of a graph coloring algorithm, show a near-linear speedup with increasing number of nodes [Barak et al., 1993]. Of course, this speedup does not apply to other types of applications (non-CPU-bound, such as network or I/O bound jobs); these applications may experience different speedups. No attempt has been conducted to measure an average speedup for such types of applications.
Lessons Learned. The MOSIX system demonstrated that dynamic load balancing implemented via dynamic process
access to files or processes from different locations over time.
As in MOSIX, processes that share memory could not be migrated. Also, processes that map hardware devices directly into memory, such as the X server, could not migrate.
Scalability. Sprite was designed for a cluster of workstations on a local area network and did not particularly address the issue of scalability. As a result, neither did the migration system. The centralized load information management system, discussed next, could potentially be a bottleneck, although a variant based on the MOSIX probabilistic load dissemination algorithm was also implemented. In practice, the shared file servers proved to be the bottleneck for file-intensive operations such as kernel compilations with as few as 4-5 hosts, while CPU-intensive simulations scaled linearly with over ten hosts [Douglis, 1990].
Load Information Management. A separate, user-level process (migd) was responsible for maintaining the state of each host and allocating idle hosts to applications. This daemon would be restarted on a new host if it, or its host, should crash. It allocated idle hosts to requesting processes, up to one foreign "job" per available processor. (A "job" consisted of a foreign process and its descendants.) It supported a notion of fairness, in that one application could use all idle hosts of the same architecture but would have some of them reclaimed if another application requested hosts as well. Reclaiming due to fairness would look to the application just like reclaiming due to a workstation's local user returning: the foreign processes would be migrated home and either run locally, migrated elsewhere, or suspended, depending on their controlling task's behavior and host availability.
Migration was typically performed by pmake, a parallel make program like many others that eventually became commonplace (e.g., [Baalbergen, 1988]). Pmake would use remote invocation and then remigrate processes if migd notified it that any of its children were evicted. It would suspend any process that could not be remigrated.
Implementation and Performance. Sprite ran on Sun (Sun 2, Sun 3, Sun 4, SPARCstation 1, SPARCstation 2) and Digital (DECstation 3100 and 5100) workstations. The entire kernel consisted of approximately 200,000 lines of heavily commented code, of which approximately 10,000 dealt with migration.
The performance of migration in Sprite can be measured in three respects. All measurements in this subsection were taken on SPARCstation 1 workstations on a 10-Mbps Ethernet, as reported in [Douglis and Ousterhout, 1991].
1. The time to migrate a process was a function of the overhead of host selection (36ms to select a single host, amortized over multiple selections when migration is performed in parallel); the state for each open file (9.4ms/file); dirty file and VM blocks that must be flushed (480-660 Kbytes/second depending on whether they are flushed in parallel); process state such as exec arguments and environment variables during remote invocation (also 480 Kbytes/second); and a basic overhead of process creation and message traffic (76ms for the null process).
2. A process that had migrated away from its home machine incurred run-time overhead from forwarding location-dependent system calls. Applications of the sort that were typically migrated in Sprite, such as parallel compilation and LaTeX text processing, incurred only 1-3% degradation from running remotely, while other applications that invoked a higher fraction of location-dependent operations (such as accessing the TCP daemon on the home machine, or forking children repeatedly) incurred substantial overhead.
3. Since the purpose of migration in Sprite was to enable parallel use of many workstations, application speedup is an important metric. Speedup is affected by a number of factors, including the degree of parallelism, the load on central resources such as the
with minimal changes to the microkernel. This was possible by relying on Mach OS mechanisms, such as (distributed) memory management and (distributed) IPC. The second goal was to demonstrate that it is possible to perform load distribution at the microkernel level, based on the three distinct parameters that characterize microkernels: processing, VM and IPC.
Design. The design of task migration is affected by the underlying Mach microkernel. Mach supported various powerful OS mechanisms for purposes other than task and process migration. Examples include Distributed Memory Management (DMM) and Distributed IPC (DIPC). DIPC and DMM simplified the design and implementation of task migration. DIPC takes care of forwarding messages to migrated processes, and DMM supports remote paging and distributed shared memory.
The underlying complexity of message redirection and distributed memory management is heavily exercised by task migration, exposing problems otherwise not encountered. This is in accordance with earlier observations about message-passing [Douglis and Ousterhout, 1991].
In order to improve robustness and performance of DIPC, it was subsequently redesigned and reimplemented [Milojičić et al., 1997]. Migration experiments have not been performed with the improved DIPC. However, extensive experiments have been conducted with Concurrent Remote Task Creation (CRTC), an in-kernel service for concurrent creation of remote tasks in a hierarchical fashion [Milojičić et al., 1997]. The CRTC experiments are similar to task migration, because a remote fork of a task address space is performed.
DMM enables cross-node transparency at the Mach VM interface in support of a distributed file system, distributed processes, and distributed shared memory [Black et al., 1998]. The DMM support resulted in simplified design and implementation of the functionality built on top of it, such as SSI UNIX and remote tasking, and it avoided pager modifications by interposing between the VM system and the pager. However, the DMM became too complex, and had performance and scalability problems. The particular design mistakes include the interactions between DSM support and virtual copies in a distributed system; transparent extension of the Mach copy-on-write VM optimization to distributed systems; and limitations imposed by Mach's external memory management while transparently extending it to distributed systems. (Copy-on-write is an optimization introduced to avoid copying pages until it is absolutely needed, and otherwise sharing the same copy. It has also been used in Chorus [Rozier, 1992] and Sprite [Nelson and Ousterhout, 1988].)
DMM had too many goals to be successful; it failed on many general principles, such as "do one thing, but do it right," and "optimize the common case" [Lampson, 1983]. Some of the experiments with task migration reflect these problems. Variations of forking an address space and migrating a task suffered significantly in performance. While some of these cases could be improved by optimizing the algorithm (as was done in the case of CRTC [Milojičić et al., 1997]), it would only add to an already complex and fragile XMM design and implementation. Some of the DMM features are not useful for task migration, even though they were motivated by task migration support. Examples include DSM and distributed copy-on-write optimizations. DSM is introduced in order to support the transparent remote forking of address spaces (as a consequence of remote fork or migration) that locally share memory. Distributed copy-on-write is motivated by transparently forking address spaces that are already created as a consequence of local copy-on-write, as well as in order to support caching in the distributed case.
Even though the DIPC and DMM interfaces support an implementation of user-level task migration, there are two exceptions. Most of the task state is accessible from user space except for the capabilities that represent tasks and threads and capabilities for internal memory state. Two new interfaces are provided for exporting
the aforementioned capabilities into user space.
A goal of one of the user-space migration servers is to demonstrate different data transfer strategies. An external memory manager was used for implementation of this task migration server. The following strategies were implemented: eager copy, flushing, copy-on-reference, precopy and read-ahead [Milojičić et al., 1993b]. For most of the experiments, a simplified migration server was used that relied on the default in-kernel data transfer strategy, copy-on-reference.
The task migration algorithm steps are (a schematic code rendering follows the list):
1. Suspend the task and abort the threads in order to clean the kernel state.(1)
2. Interpose task/thread kernel ports on the source node.
3. Transfer the address space, capabilities, threads and the other task/thread state.
4. Interpose back task/thread kernel ports on the destination node.
5. Resume the task on the destination node.
(1) Aborting is necessary for threads that can wait in the kernel arbitrarily long, such as in the case of waiting for a message to arrive. The wait operation is restartable on the destination node.
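Expressed as control flow, the five steps could be driven by a user-level migration server roughly as sketched below. The types and helper functions are hypothetical placeholders for the Mach task/thread and interposition interfaces described in the text; error handling and the choice of data transfer strategy are omitted.

    /* Hypothetical outline of a migration server performing the five
     * steps above; not actual Mach or OSF code. */
    typedef struct task *task_t;
    typedef struct node *node_t;

    extern void suspend_and_abort_threads(task_t t);               /* step 1 */
    extern void interpose_kernel_ports(task_t t, node_t n);        /* step 2 */
    extern void transfer_task_state(task_t t, node_t s, node_t d); /* step 3 */
    extern void reinterpose_kernel_ports(task_t t, node_t n);      /* step 4 */
    extern int  resume_task(task_t t, node_t n);                   /* step 5 */

    int migrate_task(task_t task, node_t src, node_t dst)
    {
        suspend_and_abort_threads(task);       /* clean kernel state          */
        interpose_kernel_ports(task, src);     /* queue incoming messages     */
        transfer_task_state(task, src, dst);   /* address space, capabilities,
                                                  threads, other state        */
        reinterpose_kernel_ports(task, dst);   /* interpose back on target    */
        return resume_task(task, dst);         /* resume on destination node  */
    }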
Process state is divided into several categories: the Mach task state; the UNIX process local state; and the process-relationship state. The local process state corresponds to the typical UNIX proc and user structures. Open file descriptors, although part of the UNIX process state, are migrated as part of the Mach task state.
Fault Resilience of Mach task migration was limited by the default transfer strategy, but even more by the DIPC and DMM modules. Both modules heavily employ the lazy evaluation principle, leaving residual dependencies throughout the nodes of a distributed system. For example, in the case of DIPC, proxies of the receive capabilities remain on the source node after the receive capability is migrated to a remote node. In the case of DMM, the established paging paths remain bound to the source node even after eager copying of pages is performed to the destination node.
Transparency was achieved by delaying access or providing concurrent access to a migrating task and its state during migration. The other tasks in the system can access the migrating task either by sending messages to the task kernel port or by accessing its memory. Sending messages is delayed by interposing the task kernel port with an interpose port. The messages sent to the interpose port are queued on the source node and then restarted on the destination node. The messages sent to other task ports are transferred as a part of migration of the receive capabilities for these ports. Access to the task address space is supported by DMM even during migration. Locally shared memory between two tasks becomes distributed shared memory after migration of either task.
In OSF/1 AD, a virtual process (Vprocs) framework supports transparent operations on the processes independently of the actual process's location [Zajcew et al., 1993]. By analogy, vprocs are to processes what vnodes are to files, both providing location and heterogeneity transparency at the system call interface. Distributed process management and the single system image of Mach and OSF/1 AD eased the process migration implementation.
A single system image is preserved by retaining the process identifier and by providing transparent access to all UNIX resources. There are no forwarding stub processes or chains. No restrictions are imposed on the processes considered for migration: for example, using pipes or signals does not prevent a process from being migrated.
Scalability. The largest system that Mach task migration ran on at the University of Kaiserslautern consisted of five nodes. However, it would have been possible to scale it closer towards the limits of the scalability of the underlying Mach microkernel, which is up to a couple of thousand nodes on the Intel Paragon supercomputer.
    Type         Application    User/Total Time    IPC (msg/s)    VM ((pagein + pageout)/s)
    Processing   Dhrystone           1.00               3.49          0.35 + 0
    IPC          find                0.03             512.3           2.75 + 0
    VM           WPI Jigsaw          0.09               2.46          28.5 + 38.2
costs of migration are due to task migration. Process migration has very little overhead in addition to task migration. Performance measurements were conducted on a testbed consisting of three Intel 33MHz 80486 PCs with 8MB RAM. The NORMA14 Mach and UNIX server UX28 were used. Performance is independent of the address space size (see Figure 8), and is a linear function of the number of capabilities. It was significantly improved in subsequent work [Milojičić et al., 1997].
Lessons Learned
- Relying on DIPC and DMM is crucial for the easy design and implementation of transparent task migration, but these modules also entail most of the complexity and they limit performance and fault resilience.
- Task migration is sufficient for microkernel applications. In contrast, as mentioned above, UNIX applications would forward most system calls back to the source node, resulting in an order-of-magnitude performance degradation. Migrating the full UNIX process state would presumably have alleviated this overhead, similar to the evolution in Sprite toward distinguishing between location-dependent and location-independent calls [Douglis, 1989].
- Applications on microkernels can be profiled as a function of processing, IPC and VM, and this information can be used for improved load distribution; improvement ranges from 20-55% for collaborative types of applications (see the sketch below).
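As a rough illustration of such profiling, the measurements in the table above can be reduced to a simple classification. The thresholds below are arbitrary and serve only to show the idea; they are not taken from the original experiments.

    /* Toy classifier over the three microkernel parameters discussed in
     * the text (processing, IPC, VM).  Thresholds are illustrative only. */
    enum app_class { PROCESSING_BOUND, IPC_BOUND, VM_BOUND };

    struct profile {
        double user_total_ratio;  /* user time / total time             */
        double msgs_per_sec;      /* IPC rate                           */
        double pages_per_sec;     /* page-ins plus page-outs per second */
    };

    static enum app_class classify(const struct profile *p)
    {
        if (p->msgs_per_sec > 100.0)     /* e.g. find: 512.3 msg/s        */
            return IPC_BOUND;
        if (p->pages_per_sec > 10.0)     /* e.g. WPI Jigsaw: 66.7 pages/s */
            return VM_BOUND;
        return PROCESSING_BOUND;         /* e.g. Dhrystone                */
    }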
5.4. LSF

LSF (Load Sharing Facility) is a load sharing and batch scheduling product from Platform Computing Corporation [Platform Computing, 1996]. LSF is based on the Utopia system developed at the University of Toronto [Zhou et al., 1994], which is in turn based on the earlier Ph.D. thesis work of Zhou at UC Berkeley [Zhou, 1987; Zhou and Ferrari, 1988].
LSF provides some distributed operating system facilities, such as distributed process scheduling and transparent remote execution, on top of various
to temporary load spikes and to control migration frequency, LSF allows users to specify a time period for which a process is suspended on its execution node. Only if the local load conditions remain unfavorable after this period would the suspended process be migrated to another node.
The target node is selected based on the dynamic load conditions and the resource requirements of the process. Recognizing that different processes may require different types of resources, LSF collects a variety of load information for each node, such as average CPU run queue length, available memory and swap space, disk paging and I/O rate, and the duration of the idle period with no keyboard and mouse activities. Correspondingly, a process may be associated with resource requirement expressions such as

    select[sparc && swap >= 120 && mem >= 64] order[cpu : mem]

which indicates that the selected node should have a resource called "sparc," and should have at least 120 MB of swap space and 64 MB of main memory. Among the eligible nodes, the one with the fastest, lightly loaded CPU, as well as large memory space, should be selected. A heuristic sorting algorithm is employed by LSF to consider all the (potentially conflicting) resource preferences and select a suitable host. Clearly, good host allocation can only be achieved if the load condition of all nodes is known to the scheduler.
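A simplified rendering of how such an expression could be evaluated against per-host load information is sketched below. It hard-codes the example requirement rather than parsing the expression language, and the data structures are illustrative, not LSF's internals.

    /* Illustrative evaluation of the example requirement
     *   select[sparc && swap >= 120 && mem >= 64] order[cpu : mem]
     * against per-host load information; not LSF code. */
    #include <stddef.h>
    #include <stdbool.h>

    struct host_load {
        bool   sparc;        /* host provides the "sparc" resource */
        double swap_mb;      /* available swap space (MB)          */
        double mem_mb;       /* available memory (MB)              */
        double run_queue;    /* average CPU run queue length       */
    };

    static bool eligible(const struct host_load *h)
    {
        return h->sparc && h->swap_mb >= 120.0 && h->mem_mb >= 64.0;
    }

    /* Pick the eligible host with the lightest CPU load, breaking ties
     * in favor of larger free memory (the "order[cpu : mem]" part). */
    static int select_host(const struct host_load *hosts, size_t n)
    {
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (!eligible(&hosts[i]))
                continue;
            if (best < 0 ||
                hosts[i].run_queue <  hosts[best].run_queue ||
                (hosts[i].run_queue == hosts[best].run_queue &&
                 hosts[i].mem_mb     > hosts[best].mem_mb))
                best = (int)i;
        }
        return best;   /* index of chosen host, or -1 if none qualifies */
    }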
The resource requirements of a process may be specified by the user when submitting the process to LSF, or may be configured in a system process file along with the process name. This process file is automatically consulted by LSF to determine the resource requirement of each type of process. This process file also stores information on the eligibility of each type of process for remote execution and migration. If the name of a process is not found in this file, either it is excluded from migration consideration, or only nodes of the same type as the local node would be considered.
Process Migration vs. Initial Placement. Although LSF makes use of process migration to balance the load, it is used more as an exception rather than the rule, for three reasons. First, transparent user-level checkpointing and migration are usable by only those processes linked with the checkpoint library, unless the OS kernel can be modified; in either case, their applicability is limited. Secondly, intelligent initial process placement has been found to be effective in balancing the load in many cases, reducing the need for migration [Eager et al., 1988]. Finally, and perhaps most importantly, the same load balancing effect can often be achieved by process placement with much less overhead. The remote process execution mechanism in LSF maintains the connection between the application and the Remote Execution Server on the execution node and caches the application's execution context for the duration of the application execution, so that repeated remote process executions would incur low overhead (0.1 seconds as measured by Zhou et al. on a network of UNIX workstations [1994]).
In contrast, it is not desirable to maintain per-application connections in a kernel implementation of process migration, in order to keep the kernel simple; thus every process migration to a remote node is "cold". Per-application connections and cached application state are rather "heavyweight" for kernel-level migration mechanisms, and the kernel-level systems surveyed in this paper treat each migration separately (though the underlying communication systems, such as kernel-to-kernel RPC, may cache connection state). The benefits of optimizing remote execution are evident by comparing LSF to an earlier system such as Sprite. In the case of Sprite, the overhead of exec-time migration was measured to be approximately 330ms on Sparcstation 1 workstations over the course of one month [Douglis and Ousterhout, 1991]. Even taking differences in processor speed into account, as well as underlying overheads such as file system cache flushing, LSF shows a marked improvement in remote invocation performance.
Transp. migration in UNIX-like OS | Locus, MOSIX, Sprite | major changes to the underlying env. (supporting SSI) | OS | high | fair | full (OS depend.)
Microkernels | Amoeba, Arcade, BirliX, Chorus, Mach, RHODOS | no UNIX semantics, complex OS support | OS | low (DMM and DIPC) | good | full (OS depend.)
User space | Condor, Alonso & Kyrimis, Mandelberg, LSF | less transparency | application (relinked) | low (forwarding system calls) | very good | limited (appl. dep.)
Application | Freedman, Skordos, Bharat & Cardelli | min. transparency, more appl. knowledge | application (recompiled) | lowest (appl. migration awareness) | very good | minimal
Mobile objects | Emerald, SOS, COOL | object oriented | programming environment | moderate (communication) | good | full
Mobile agents | Agent-TCL, Aglets, TACOMA, Telescript | heterogeneity | programming environment | lowest (security & safety) | good | fair
we note that the performance of user- and application-level migrations typically falls in the range of seconds, even minutes, when migrating processes with large address spaces. The kernel-supported migrations, especially the newer implementations, fall in the range of tens of milliseconds. The most optimized kernel-implemented migration (Choices) has initial costs of only 14ms [Roush and Campbell, 1996], and it remains better even if some rough normalization is accounted for (see [Roush, 1995]).
As mentioned earlier, the dominant performance element is the cost to transfer the address space. Kernel-level optimizations can cut down this cost, whereas user-level implementations do not have access to the relevant data structures and cannot apply these optimizations.
Recently, trends are emerging that allow users more access to kernel data, mechanisms, and policies [Bomberger et al., 1992]. For example, microkernels export most of the kernel state needed for user-level implementations of migration [Milojičić, 1993c]. Extensible kernels provide even more support in this direction [Bershad et al., 1995; Engler et al., 1995]. These trends decrease the relevance of user versus kernel implementations.
Transparency describes the extent to which a migrated process can continue execution after migration as if migration had not happened. It also determines whether a migrated process is allowed to invoke all system functions. Many user- or application-level implementations do not allow a process to invoke all system calls. Migration that is implemented inside the kernel typically supports full functionality. In general, the higher the level of the implementation, the less transparency is provided. User-space implementations are aware of migration and they can invoke migration only at predefined places in the code. Kernel-supported implementations typically have higher levels of transparency. Single system image supports transparent migration at any point of application code; migration can transparently be initiated either by the migrating process or by another process. Most mobile agent implementations do not allow transparent migration invocation by other applications; only the migrating agent can initiate it. Even though less transparent, this approach simplifies implementation.
More specifics on transparency in the case studies are presented in Table 4. Migration for each case study is categorized by whether it transparently supports open files, forking children, communication channels, and shared memory. If migration requires changes to the kernel or relinking the application, that is also listed.
Support for shared memory of migrated tasks in Mach is unique. In practice, it was problematic due to a number of design and implementation issues [Black et al. 1998]. Other systems that supported both shared memory and migration either chose not to provide transparent access to shared memory after migration (e.g. Locus [Walker and Mathews, 1989; Fleisch and Popek, 1989]), or disallowed migration of processes using shared memory (e.g., Sprite [Ousterhout et al., 1988]).
Kernel-level migration typically supports all features transparently, whereas user-level migrations may limit access to NFS files and may not support communication channels or interprocess communication. In addition, a user-level migration typically requires relinking applications with special libraries. Migration done as
part of an application requires additional recompilation.
In Table 5, we compare different data transfer strategies with respect to freeze time, freeze costs, residual time and costs, residual dependencies, and initial migration time (the time from the migration request until the process is started on the remote node).
We can see that different strategies have different goals and introduce different costs. At one end of the spectrum, systems that implement an eager (all) strategy in user space eliminate residual dependencies and residual costs, but suffer from high freeze time and freeze costs. Modifying the operating system allows an eager (dirty) strategy to reduce the amount of the address space that needs to be copied to the subset of its dirty pages. This increases residual costs and dependencies while reducing freeze time and costs.
Using a precopy strategy further improves freeze time, but has higher freeze costs than other strategies. Applications with real-time requirements can benefit from this. However, it has very high migration time because it may require additional copying of already transferred pages.
Copy-on-reference requires the most kernel changes in order to provide sophisticated virtual mappings between nodes. It also has more residual dependencies than other strategies, but it has the lowest freeze time and costs, and migration time is low, because processes can promptly start on the remote node.
Finally, the flushing strategy also requires some amount of change to the kernel, and has somewhat higher freeze time than copy-on-reference, but improves residual time and costs by leaving residual dependencies only on a server, but not on the source node. Process migration in the Choices system, not listed in the table, represents a highly optimized version of the eager (dirty) strategy.
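The copy-on-reference behavior can be pictured as a page-fault handler that pulls pages from the source node only when they are first touched. The sketch below is schematic, uses hypothetical helpers, and ignores the external-pager architecture a real implementation needs.

    /* Schematic copy-on-reference transfer: after migration only the
     * mappings move; page contents are fetched from the source node on
     * first access, which keeps freeze time low but leaves residual
     * dependencies on the source.  fetch_remote_page() and map_page()
     * are hypothetical helpers, and the address space is a toy one of
     * NPAGES pages. */
    #include <stdbool.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096
    #define NPAGES    1024

    struct remote_space {
        int   source_node;        /* residual dependency on the source */
        bool  present[NPAGES];    /* which pages are already local     */
        char *pages[NPAGES];      /* local copies, filled lazily       */
    };

    extern char *fetch_remote_page(int node, size_t page_no);
    extern void  map_page(size_t offset, char *page);

    void on_page_fault(struct remote_space *as, size_t fault_offset)
    {
        size_t pno = fault_offset / PAGE_SIZE;
        if (pno >= NPAGES)
            return;                                   /* out of toy range */
        if (!as->present[pno]) {                      /* first touch      */
            as->pages[pno] = fetch_remote_page(as->source_node, pno);
            as->present[pno] = true;
        }
        map_page(fault_offset, as->pages[pno]);
    }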
The data transfer strategy dominates process migration characteristics such as performance, complexity, and fault resilience. The costs, implementation details and residual dependencies of other process elements (e.g. communication channels and naming) are also important but have less impact on process migration. In the Mach case study, we saw that most strategies can be implemented in user space. However, this requires a pager-like architecture that increases the complexity of OS and migration design and implementation.
Table 6 summarizes load information database characteristics. Database type indicates whether the information is maintained as a distributed or a centralized database. Centralized databases have shown surprising scalability for some systems, in particular LSF. Nevertheless, achieving the highest level of scalability requires distributed information management.
Maximum nodes deployed is defined as the number of nodes that were actually used. It is hard to make predictions about the scalability of migration and load information management.
Migration/Charact. | Per-process parameters | System parameters (also disseminated) | Retained information | Negotiation parameters | Collection (periodic (freq.) or event driven (event)) | Dissemination (periodic (freq.) or event driven (event))
MOSIX | age, I/O patterns, file access | average ready queue | partial (random subset) | migrating process may be refused | periodic | periodic (1-60s), worm-like
Sprite | none | time since last local user input, ready queue length | all info retained | migration version | periodic (5s) | periodic (1 min) and upon a state change
Mach | age, remote IPC, and remote paging | average ready queue, remote IPC, remote paging | all info retained | destination load, free paging space | periodic (1s) | periodic (1s)
LSF | none | arbitrary, configurable | all info retained | system parameters of all nodes | periodic | periodic
Examples include crossing a load threshold on a single node or on demand after an application-specific request, or only for specific events like process eviction. Sprite process migration can be initiated as a part of the pmake program or a migratory shell, or as a consequence of the eviction of a remotely executed process.
Some of the systems use a priori knowledge, typically in the form of specifying which processes are not allowed to migrate. These are, for example, well-known system processes, such as in the case of MOSIX, Sprite and Mach, or commands in the case of LSF. The learning from the past column indicates how some systems adapt to changing loads. Examples include aging load vectors and process residency in MOSIX, and limiting consecutive migrations in Mach. Stability is achieved by requiring a minimum residency for migrated processes after a migration (such as in MOSIX), by introducing a high threshold per node (such as in Mach and LSF), or by favoring long-idle machines (such as in Sprite). It can also be achieved by manipulating load information, as was investigated in MOSIX. For example, dissemination policies can be changed, information can be weighed subject to current load, and processes can be refused.
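Manipulating load information in this way amounts to a small amount of bookkeeping per received message. The following sketch shows aging and merging of a fixed-size load vector in the MOSIX style; the constants and the random-subset policy shown here are illustrative rather than taken from MOSIX.

    /* Illustrative aging and merging of a small load vector: on every
     * exchange, old entries are decayed and a random subset of the
     * incoming entries replaces local ones.  Constants are arbitrary. */
    #include <stdlib.h>

    #define VEC_LEN   8
    #define AGE_DECAY 0.9        /* weight given to stale information */

    struct load_entry { int node; double load; };

    void merge_load_vector(struct load_entry local[VEC_LEN],
                           const struct load_entry incoming[VEC_LEN])
    {
        for (int i = 0; i < VEC_LEN; i++)
            local[i].load *= AGE_DECAY;          /* age older information   */

        for (int i = 0; i < VEC_LEN / 2; i++) {  /* replace a random subset */
            int dst = rand() % VEC_LEN;
            int src = rand() % VEC_LEN;
            local[dst] = incoming[src];
        }
    }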
7. WHY PROCESS MIGRATION HAS NOT CAUGHT ON

In this section, we attempt to identify the barriers that have prevented a wider adoption of process migration and to explain how it may be possible to overcome them. We start with an analysis of each case study; we identify misconceptions; we identify those barriers that we consider the true impediments to the adoption of migration; and we conclude by outlining the likelihood of overcoming these barriers.

7.1. Case Analysis

MOSIX. The MOSIX distributed operating system is an exception to most other systems supporting transparent process migration in that it is still in general use. Several things worked against the wider adoption of the MOSIX system: the implementation was done on a commercial operating system, which prevented wide-spread distribution of the code, and one commercial backer of MOSIX withdrew from the operating system business.
The current outlook is much brighter. The latest versions of MOSIX support process migration on BSDI's version of UNIX and on Linux. The Linux port eliminates the legal barriers that prevented the distribution of early versions of the system.
Sprite. Sprite as a whole did not achieve a long-lasting success, so its process migration facility suffered with it. Sprite's failure to expand significantly beyond U.C. Berkeley was due to a conscious decision among its designers not to invest the enormous effort that would have been required to support a large external community. Instead, individual ideas from Sprite, particularly in the areas of file systems and virtual memory, have found their way into commercial systems over time. The failure of Sprite's process migration facility to similarly influence the commercial marketplace has come as a surprise. Ten years ago we would have predicted that process migration in UNIX would be commonplace today, despite the difficulties in supporting it. Instead, user-level load distribution is commonplace, but it is commonly limited to applications that can run on different hosts without ill effects, and relies either on explicit checkpointing or the ability to run to completion.
Mach and OSF/1. Compared to other systems, Mach has gone the furthest in technology transfer. Digital UNIX has been directly derived from OSF/1, NT internals resemble the Mach design, and a lot of research was impacted by Mach. However, almost no distributed support was transferred elsewhere. The distributed memory management and distributed IPC were extremely complex, requiring significant effort to develop and to maintain. The redesign of its distributed IPC was accomplished within the OSF RI [Milojičić et al., 1997], but distributed memory management has never been redesigned and was instead abandoned
[Black et al. 1998]. Consequently, task and process migration have never been transferred elsewhere except to universities and labs.
LSF. Platform Computing has not aggressively addressed process migration because the broad market is still not ready, partially due to an immature distributed system structure, and partially due to a lack of cooperation from OS and application vendors. But most importantly, there was no significant customer demand.
Since a vast majority of users run Unix and Windows NT, for which dynamic process migration is not supported by the OS kernel, Platform Computing has been using user-level job checkpointing and migration as an indirect way to achieve process migration for the users of LSF. A checkpoint library based on that of Condor is provided that can be linked with Unix application programs to enable transparent process migration. This has been integrated into a number of important commercial applications. For example, a leading circuit simulation tool from Cadence, called Verilog, can be checkpointed on one workstation and resumed on another.
It is often advantageous to have checkpointing built into the applications and have LSF manage the migration process. The checkpoint file is usually smaller compared to user-level checkpointing, because only certain data structures need to be saved, rather than all dirty memory pages. With more widespread use of workstations and servers on the network, Platform Computing is experiencing a rapidly increasing demand for process migration.

7.2. Misconceptions

Frequently, process migration has been dismissed as an academic exercise with little chance for wide deployment [Eager et al., 1988; Kremien and Kramer, 1992; Shivaratri et al., 1992]. Many rationales have been presented for this position, such as:
- significant complexity,
- unacceptable costs,
- the lack of support for transparency, and
- the lack of support for heterogeneity.
Some implementations, even successful ones, indeed have reinforced such beliefs. Despite the absence of widespread deployment, work on process migration has persisted. In fact, recently we have seen more and more attempts to provide migration and other forms of mobility [Steensgaard and Jul, 1995; Roush and Campbell, 1996; Smith and Hutchinson, 1998]. Checkpoint/restart systems are being deployed for the support of long-running processes [Platform Computing, 1996]. Finally, mobile agents are being investigated on the Web.
If we analyze implementations, we see that technical solutions exist for each of these problems (complexity, cost, non-transparency and homogeneity). Migration has been supported with various degrees of complexity: as part of kernel mechanisms; as user-level mechanisms; and even as a part of an application (see Sections 4.2-4.6). The time needed to migrate has been reduced from the range of seconds or minutes [Mandelberg and Sunderam, 1988; Litzkow and Solomon, 1992] to as low as 14ms [Roush and Campbell, 1996]. Various techniques have been introduced to optimize state transfer [Theimer et al., 1985; Zayas, 1987a; Roush and Campbell, 1996] (see Section 3.2). Transparency has been achieved to different degrees, from limited to complete (see Section 3.3). Finally, recent work demonstrates improvements in supporting heterogeneous systems, as done in Emerald [Steensgaard and Jul, 1995], Tui [Smith and Hutchinson, 1998] and Legion [Grimshaw and Wulf, 1996] (see Section 3.6).

7.3. True Barriers to Migration Adoption

We believe that the true impediments to deploying migration include the following:
- A lack of applications. Scientific applications and academic loads (e.g. pmake and simulations) represent a small percentage of today's applications. The largest percentage of applications
was realized that there is a lot of remote communication) can improve performance significantly. Secondly, with the larger scale of systems, the failures are more frequent, thereby increasing the relevance of being able to continue program execution at another node. For long-running or critical applications (those that should not stop executing), migration becomes a more attractive solution. Finally, the increasing popularity of hardware mobile gadgets will require mobility support in software. Examples include migrating applications from a desktop, to a laptop, and eventually to a gadget (e.g. future versions of cellular phones or palmtops).
Sociology. There are a few factors related to sociology. The meaning and relevance of someone's own workstation is blurring. There are so many computers in use today that the issue of computing cycles becomes less relevant. Many computers are simply servers that do not belong to any single user, and at the same time the processing power is becoming increasingly cheap. A second aspect is that as the world becomes more and more connected, the idea of someone else's code arriving on one's workstation is not unfamiliar anymore. Many security issues remain, but they are being actively addressed by the mobile code and agents community.
In summary, we do not believe that there is a need for any revolutionary development in process migration to make it widely used. We believe that it is a matter of time, technology development, and the changing needs of users that will trigger a wider use of process migration.

8. SUMMARY AND FURTHER RESEARCH

In this paper we have surveyed process migration mechanisms. We have classified and presented an overview of a number of systems, and then discussed four case studies in more detail. Based on this material, we have summarized various migration characteristics. Throughout the text we tried to assess some misconceptions about process migration, as well as to discover the true reasons for the lack of its wide acceptance.
We believe there is a future for process migration. Different streams of development may well lead to a wider deployment of process migration. Below we include some possible paths.
One path is in the direction of LSF, a user-level facility that provides much of the functionality of full-fledged process migration systems, but with fewer headaches and complications. The checkpoint/restart model of process migration has already been relatively widely deployed. Packages such as Condor, LSF and Loadleveler are used for scientific and batch applications in production environments. Those environments have high demands on their computer resources and can take advantage of load sharing in a simple manner.
A second path concerns clusters of workstations. Recent advances in high-speed networking (e.g. ATM [Partridge, 1994] and Myrinet [Boden et al., 1995]) have reduced the cost of migrating processes, allowing even costly migration implementations to be deployed.
A third path, one closer to the consumers of the vast majority of today's computers (Windows systems on Intel-based platforms), would put process migration right in the home or office. Sun recently announced their Jini architecture for home electronics [Sun Microsystems, 1998] and other similar systems are sure to follow. One can imagine a process starting on a personal computer, and migrating its flow of control into another device in the same domain. Such activity would be similar to the migratory agents approach currently being developed for the Web [Rothermel and Hohl, 1998].
Still another possible argument for process migration, or another Worm-like facility for using vast processing capability across a wide range of machines, would be any sort of national or international computational effort. Several years ago, Quisquater and Desmedt [1991] suggested that the Chinese government could solve complex problems (such as factoring large numbers) by permitting people to use the processing power in their television sets, and offering a prize for a correct answer
BARAK, A. AND SHILOH, A. 1985. A Distributed Load- Memory Management (XMM): Lessons Learned.
Balancing Policy for a Multicomputer. Software- Software-Practice and Experience 28, 9, 1011–
Practice and Experience 15, 9, 901–913. 1031.
BARAK, A. AND WHEELER, R. 1989. MOSIX: An BODEN, N., COHEN, D., FELDERMAN, R. E., KULAWIK, A.
Inte-grated Multiprocessor UNIX. Proceedings E., SEITZ, C. L., SEIZOVIC, J. N., AND SU, W.-K. 1995.
of the Winter 1989 USENIX Conference, 101– Myrinet: A Gigabit-per-Second Local Area Net-
112. work. IEEE Micro 15, 1, 29–38.
BARAK, A., SHILOH, A., AND WHEELER, R. 1989. BOKHARI, S. H. 1979. Dual Processor Scheduling
Flood Prevention in the MOSIX Load-Balancing with Dynamic Reassignment. IEEE Transac-
Scheme. IEEE Technical Committee on Operat- tions on Software Engineering, SE-5, 4, 326–334.
ing Systems Newsletter 3, 1, 24–27. BOMBERGER, A. C., FRANTZ, W. S., HARDY, A. C.,
BARAK, A., GUDAY, S., AND WHEELER, R. G. 1993. The HARDY, N., LANDAU, C. R., AND SHAPIRO, J. S.
MOSIX Distributed Operating System. Springer 1992. The Key-KOS (R) Nanokernel Architec-
Verlag. ture. USENIX Workshop on Micro-Kernels and
BARAK, A., LADEN, O., AND BRAVERMAN, A. 1995. The Other Kernel Architectures, 95–112.
NOW MOSIX and its Preemptive Process Mi- BOND, A. M. 1993. Adaptive Task Allocation in
gration Scheme. Bulletin of the IEEE Technical a Distributed Workstation Environment. Ph.D.
Committee on Operating Systems and Applica- Thesis, Victoria University at Wellington.
tion Environments 7, 2, 5–11. BONOMI, F. AND KUMAR, A. 1988. Adaptive Optimal
BARBOU DES PLACES, F. B., STEPHEN, N., AND REYNOLDS, Load Balancing in a Heterogeneous Multiserver
F. D. 1996. Linux on the OSF Mach3 Micro- System with a Central Job Scheduler. Proceed-
kernel. Proceedings of the First Conference on ings of the 8th International Conference on Dis-
Freely Redistributable Software, 33–46. tributed Computing Systems, 500–508.
BARRERA, J. 1991. A Fast Mach Network IPC Imple- BORGHOFF U. M. 1991. Catalogue of Distributed
mentation. Proceedings of the Second USENIX File/Operating Systems. Springer Verlag.
Mach Symposium, 1–12. BOWEN, N. S., NIKOLAOU, C. N., AND GHAFOOR, A.
BASKETT, F., HOWARD, J., AND MONTAGUE, T. 1977. Task 1988. Hierarchical Workload Allocation for
Communication in DEMOS. Proceedings of the Distributed Systems. Proceedings of the 1988 In-
6th Symposium on OS Principles, 23–31. ternational Conference on Parallel Processing,
BAUMANN, J., HOHL, F., ROTHERMEL, K., AND STRAB ER, II:102–109.
M. 1998. Mole—Concepts of a Mobile Agent BROOKS, C., MAZER, M. S., MEEKS, S., AND MILLER,
System. World Wide Web 1, 3, 123–137. J. 1995. Application-Specific Proxy Servers as
BEGUELIN, A., DONGARRA, J., GEIST, A., MANCHEK, R., HTTP Stream Transducers. Proceedings of the
OTTO, S., AND WALPOLE, J. 1993. PVM: Experi- Fourth International World Wide Web Confer-
ences, Current Status and Future Directions. ence, 539–548.
Proceedings of Supercomputing 1993, 765–766. BRYANT, B. 1995. Design of AD 2, a Distributed
BERNSTEIN, P. A. 1996. Middleware: A Model for Dis- UNIX Operating System. OSF Research Insti-
tributed System Services. Communications of tute.
the ACM 39, 2, 86–98. BRYANT, R. M. AND FINKEL, R. A. 1981. A Stable Dis-
BERSHAD, B., SAVAGE, S., PARDYAK, P., SIRER, E. G., tributed Scheduling Algorithm. Proceedings of
FIUCZINSKI, M., BECKER, D., CHAMBERS, C., AND the 2nd International Conference on Distributed
EGGERS, S. 1995. Extensibility, Safety and Per- Computing Systems, 314–323.
formance in the SPIN Operating System. Pro- BUGNION, E., DEVINE, S., GOVIL, K., AND ROSENBLUM, M.
ceedings of the 15th Symposium on Operating 1997. Disco: running commodity operating sys-
Systems Principles, 267–284. tems on scalable multiprocessors. ACM Transac-
BHARAT, K. A. AND CARDELLI, L. 1995. Migratory Ap- tions on Computer Systems 15, 4, 412–447.
plications. Proceedings of the Eight Annual ACM BUTTERFIELD, D. A. AND POPEK, G. J. 1984. Network
Symposium on User Interface Software Technol- Tasking in the Locus Distributed UNIX System.
ogy. Proceedings of the Summer USENIX Conference,
BLACK, A., HUTCHINSON, N., JUL, E., LEVY, H., 62–71.
AND CARTER, L. 1987. Distributed and Abstract CABRERA, L. 1986. The Influence of Workload on
Types in Emerald. IEEE Transactions on Soft- Load Balancing Strategies. Proceedings of the
ware Engineering, SE-13, 1, 65–76. Winter USENIX Conference, 446–458.
BLACK, D., GOLUB, D., JULIN, D., RASHID, R., DRAVES, CARDELLI, L. 1995. A Language with Distributed
R., DEAN, R., FORIN, A., BARRERA, J., TOKUDA, H., Scope. Proceedings of the 22nd Annual ACM
MALAN, G., AND BOHMAN, D. 1992. Microkernel Symposium on the Principles of Programming
Operating System Architecture and Mach. Pro- Languages, 286–297.
ceedings of the USENIX Workshop on Micro- CASAS, J., CLARK, D. L., CONURU, R., OTTO, S. W., PROUTY,
Kernels and Other Kernel Architectures, 11–30. R. M., AND WALPOLE, J. 1995. MPVM: A Migra-
BLACK, D., MILOJIČIĆ, D., DEAN, R., DOMINIJANNI, tion Transparent Version of PVM. Computing
M., LANGERMAN, A., SEARS, S. 1998. Extended Systems 8, 2, 171–216.
Systems, Performance Evaluation Review 16, 1, of Distributed Systems. Proceedings of the 8th
63–72. International Conference on Parallel Processing,
EFE, K. 1982. Heuristic Models of Task Assignment 728–734.
Scheduling in Distributed Systems. IEEE Com- GOLUB, D., DEAN, R., FORIN, A., AND RASHID, R. 1990.
puter, 15, 6, 50–56. UNIX as an Application Program. Proceedings
EFE, K. AND GROSELJ, B. 1989. Minimizing Control of the Summer USENIX Conference, 87–95.
Overheads in Adaptive Load Sharing. Proceed- GOPINATH, P. AND GUPTA, R. 1991. A Hybrid Ap-
ings of the 9th International Conference on Dis- proach to Load Balancing in Distributed Sys-
tributed Computing Systems, 307–315. tems. Proceedings of the USENIX Symposium
ENGLER, D. R., KAASHOEK, M. F., AND O’TOOLE, J. J. on Experiences with Distributed and Multipro-
1995. Exokernel: An Operating System Archi- cessor Systems, 133–148.
tecture for Application-Level Resource Manage- GOSCINSKI, A. 1991. Distributed Operating Systems:
ment. Proceedings of the 15th Symposium on Op- The Logical Design. Addison Wesley.
erating Systems Principles, 267–284. GOSLING, J., JOY, B., AND STEELE, G. 1996. The Java
ESKICIOGLU, M. R. 1990. Design Issues of Process Language Specification. Addison Wesley.
Migration Facilities in Distributed Systems. GRAY, R. 1995. Agent Tcl: A flexible and secure
IEEE Technical Committee on Operating Sys- mobileagent system. Ph.D. thesis, Technical Re-
tems Newsletter 4, 2, 3–13. port TR98-327, Department of Computer Sci-
EZZAT, A., BERGERON, D., AND POKOSKI, J. 1986. Task ence, Dartmouth College, June 1997.
Allocation Heuristics for Distributed Comput- GRIMSHAW, A. AND WULF, W. 1997. The Legion Vision
ing Systems. Proceedings of the 6th International of a Worldwide Virtual Computer. Communica-
Conference on Distributed Computing Systems. tions of the ACM 40, 1, 39–45.
FARMER, W. M., GUTTMAN, J. D., AND SWARUP, V. 1996. GUPTA, R. AND GOPINATH, P. 1990. A Hierarchical Ap-
Security for Mobile Agents: Issues and Require- proach to Load Balancing in Distributed Sys-
ments. Proceedings of the National Information tems. Proceedings of the Fifth Distributed Mem-
Systems Security Conference, 591–597. ory Computing Conference, II, 1000–1005.
FEITELSON, D. G. AND RUDOLPH, L. 1990. Mapping HAC, A. 1989a. A Distributed Algorithm for Perfor-
and Scheduling in a Shared Parallel Environ- mance Improvement Through File Replication,
ment Using Distributed Hierarchical Control. File Migration, and Process Migration. IEEE
Proceedings of the 1990 International Conference Transactions on Software Engineering 15, 11,
on Parallel Processing, I: 1–8. 1459–1470.
FERRARI, D. AND ZHOU., S. 1986. A Load Index for Dy- HAC, A. 1989b. Load Balancing in Distributed Sys-
namic Load Balancing. Proceedings of the 1986 tems: A Summary. Performance Evaluation Re-
Fall Joint Computer Conference, 684–690. view, 16, 17–25.
FINKEL, R., SCOTT, M., ARTSY, Y., AND CHANG, H. 1989. HAERTIG, H., KOWALSKI, O. C., AND KUEHNHAUSER, W. E.
Experience with Charlotte: Simplicity and Func- 1993. The BirliX Security Architecture.
tion in a Distributed Operating system. IEEE HAGMANN, R. 1986. Process Server: Sharing Pro-
Transactions on Software Engineering, SE-15, 6, cessing Power in a Workstation Environment.
676–685. Proceedings of the 6th International Conference
FLEISCH, B. D. AND POPEK, G. J. 1989. Mirage: A Co- on Distributed Computing Systems, 260–267.
herent Distributed Shared Memory Design. Pro- HAMILTON, G. AND KOUGIOURIS, P. 1993. The Spring
ceedings of the 12th ACM Symposium on Oper- Nucleus: A Microkernel for Objects. Proceedings
ating System Principles, 211–223. of the 1993 Summer USENIX Conference, 147–
FREEDMAN, D. 1991. Experience Building a Process 160.
Migration Subsystem for UNIX. Proceedings of HAN, Y. AND FINKEL, R. 1988. An Optimal Scheme for
the Winter USENIX Conference, 349–355. Disseminating Information. Proceedings of the
GAIT, J. 1990. Scheduling and Process Migration in 1988 International Conference on Parallel Pro-
Partitioned Multiprocessors. Journal of Parallel cessing, II, 198–203.
and Distributed Computing 8, 3, 274–279. HARCHOL-BALTER, M. AND DOWNEY, A. 1997. Exploit-
GAO, C., LIU, J. W. S., AND RAILEY, M. 1984. Load Bal- ing Process Lifetime Distributions for Dynamic
ancing Algorithms in Homogeneous Distributed Load Balancing. ACM Transactions on Com-
Systems. Proceedings of the 1984 International puter Systems 15, 3, 253–285. Previously ap-
Conference on Parallel Processing, 302–306. peared in the Proceedings of ACM Sigmetrics
GERRITY, G. W., GOSCINSKI, A., INDULSKA, J., TOOMEY, W., 1996 Conference on Measurement and Modeling
AND ZHU, W. 1991. Can We Study Design Issues of Computer Systems, 13–24, May 1996.
of Distributed Operating Systems in a Gener- HILDEBRAND, D. 1992. An Architectural Overview of
alized Way? Proceedings of the Second USENIX QNX. Proceedings of the USENIX Workshop on
Symposium on Experiences with Distributed and Micro-Kernels and Other Kernel Architectures,
Multiprocessor Systems, 301–320. 113–126.
GOLDBERG, A. AND JEFFERSON, D. 1987. Transparent HOFMANN, M.O., MCGOVERN, A., AND WHITEBREAD, K.
Process Cloning: A Tool for Load Management 1998. Mobile Agents on the Digital Battlefield.
HOFMANN, M. O., MCGOVERN, A., AND WHITEBREAD, K. 1998. Mobile Agents on the Digital Battlefield. Proceedings of the Autonomous Agents ’98, 219–225.
HOHL, F. 1998. A Model of Attacks of Malicious Hosts Against Mobile Agents. Proceedings of the 4th Workshop on Mobile Objects Systems, INRIA Technical Report, 105–120.
HWANG, K., CROFT, W., WAH, B., BRIGGS, F., SIMONS, W., AND COATES, C. 1982. A UNIX-Based Local Computer Network with Load Balancing. IEEE Computer 15, 55–66.
JACQMOT, C. 1996. Load Management in Distributed Computing Systems: Towards Adaptive Strategies. Technical Report, Ph.D. Thesis, Département d’Ingénierie Informatique, Université catholique de Louvain.
JOHANSEN, D., VAN RENESSE, R., AND SCHNEIDER, F. 1995. Operating System Support for Mobile Agents. Proceedings of the 5th Workshop on Hot Topics in Operating Systems, 42–45.
JUL, E., LEVY, H., HUTCHINSON, N., AND BLACK, A. 1988. Fine-Grained Mobility in the Emerald System. ACM Transactions on Computer Systems 6, 1, 109–133.
JUL, E. 1988. Object Mobility in a Distributed Object-Oriented System. Technical Report 88-12-06, Ph.D. Thesis, Department of Computer Science, University of Washington. Also Technical Report no. 98/1, University of Copenhagen, DIKU.
JUL, E. 1989. Migration of Light-weight Processes in Emerald. IEEE Technical Committee on Operating Systems Newsletter 3, 1, 20–23.
KAASHOEK, M. F., VAN RENESSE, R., VAN STAVEREN, H., AND TANENBAUM, A. S. 1993. FLIP: An Internetwork Protocol for Supporting Distributed Systems. ACM Transactions on Computer Systems 11, 1.
KEMPER, A. AND KOSSMANN, D. 1995. Adaptable Pointer Swizzling Strategies in Object Bases: Design, Realization, and Quantitative Analysis. VLDB Journal 4, 3, 519–566.
KHALIDI, Y. A., BERNABEU, J. M., MATENA, V., SHIRIFF, K., AND THADANI, M. 1996. Solaris MC: A Multi-Computer OS. Proceedings of the USENIX 1996 Annual Technical Conference, 191–204.
KLEINROCK, L. 1976. Queueing Systems vol. 2: Computer Applications. Wiley, New York.
KNABE, F. C. 1995. Language Support for Mobile Agents. Technical Report CMU-CS-95-223, Ph.D. Thesis, School of Computer Science, Carnegie Mellon University. Also Technical Report ECRC-95-36, European Computer Industry Research Centre.
KOTZ, D., GRAY, R., NOG, S., RUS, D., CHAWLA, S., AND CYBENKO, G. 1997. Agent Tcl: Targeting the Needs of Mobile Computers. IEEE Internet Computing 1, 4, 58–67.
KREMIEN, O. AND KRAMER, J. 1992. Methodical Analysis of Adaptive Load Sharing Algorithms. IEEE Transactions on Parallel and Distributed Systems 3, 6, 747–760.
KRUEGER, P. AND LIVNY, M. 1987. The Diverse Objectives of Distributed Scheduling Policies. Proceedings of the 7th International Conference on Distributed Computing Systems, 242–249.
KRUEGER, P. AND LIVNY, M. 1988. A Comparison of Preemptive and Non-Preemptive Load Balancing. Proceedings of the 8th International Conference on Distributed Computing Systems, 123–130.
KRUEGER, P. AND CHAWLA, R. 1991. The Stealth Distributed Scheduler. Proceedings of the 11th International Conference on Distributed Computing Systems, 336–343.
KUNZ, T. 1991. The Influence of Different Workload Descriptions on a Heuristic Load Balancing Scheme. IEEE Transactions on Software Engineering 17, 7, 725–730.
LAMPSON, B. 1983. Hints for Computer System Design. Proceedings of the Ninth Symposium on Operating System Principles, 33–48.
LANGE, D. AND OSHIMA, M. 1998. Programming Mobile Agents in Java with the Java Aglet API. Addison Wesley Longman.
LAZOWSKA, E. D., LEVY, H. M., ALMES, G. T., FISHER, M. J., FOWLER, R. J., AND VESTAL, S. C. 1981. The Architecture of the Eden System. Proceedings of the 8th ACM Symposium on Operating Systems Principles, 148–159.
LEA, R., JACQUEMOT, C., AND PILLVESSE, E. 1993. COOL: System Support for Distributed Programming. Communications of the ACM 36, 9, 37–47.
LELAND, W. AND OTT, T. 1986. Load Balancing Heuristics and Process Behavior. Proceedings of the SIGMETRICS Conference, 54–69.
LIEDTKE, J. 1993. Improving IPC by Kernel Design. Proceedings of the Fourteenth Symposium on Operating Systems Principles, 175–188.
LITZKOW, M. 1987. Remote UNIX: Turning Idle Workstations into Cycle Servers. Proceedings of the Summer USENIX Conference, 381–384.
LITZKOW, M., LIVNY, M., AND MUTKA, M. 1988. Condor: A Hunter of Idle Workstations. Proceedings of the 8th International Conference on Distributed Computing Systems, 104–111.
LITZKOW, M. AND SOLOMON, M. 1992. Supporting Checkpointing and Process Migration outside the UNIX Kernel. Proceedings of the USENIX Winter Conference, 283–290.
LIVNY, M. AND MELMAN, M. 1982. Load Balancing in Homogeneous Broadcast Distributed Systems. Proceedings of the ACM Computer Network Performance Symposium, 47–55.
LO, V. 1984. Heuristic Algorithms for Task Assignments in Distributed Systems. Proceedings of the 4th International Conference on Distributed Computing Systems, 30–39.
LO, V. 1989. Process Migration for Communication Performance. IEEE Technical Committee on Operating Systems Newsletter 3, 1, 28–30.
LO, V. 1988. Algorithms for Task Assignment and Contraction in Distributed Computing Systems. Proceedings of the 1988 International Conference on Parallel Processing, II, 239–244.
LOUBOUTIN, S. 1991. An Implementation of a Process Migration Mechanism using Minix. Proceedings of the 1991 European Autumn Conference, Budapest, Hungary, 213–224.
LU, C., CHEN, A., AND LIU, J. 1987. Protocols for Reliable Process Migration. INFOCOM 1987, The 6th Annual Joint Conference of IEEE Computer and Communication Societies.
LU, C. 1988. Process Migration in Distributed Systems. Ph.D. Thesis, Technical Report, University of Illinois at Urbana-Champaign.
LUX, W., HAERTIG, H., AND KUEHNHAUSER, W. E. 1993. Migrating Multi-Threaded, Shared Objects. Proceedings of the 26th Hawaii International Conference on Systems Sciences, II, 642–649.
LUX, W. 1995. Adaptable Object Migration: Concept and Implementation. Operating Systems Review 29, 2, 54–69.
MA, P. AND LEE, E. 1982. A Task Allocation Model for Distributed Computing Systems. IEEE Transactions on Computers C-31, 1, 41–47.
MAGUIRE, G. AND SMITH, J. 1988. Process Migration: Effects on Scientific Computation. ACM SIGPLAN Notices 23, 2, 102–106.
MALAN, G., RASHID, R., GOLUB, D., AND BARON, R. 1991. DOS as a Mach 3.0 Application. Proceedings of the Second USENIX Mach Symposium, 27–40.
MANDELBERG, K. AND SUNDERAM, V. 1988. Process Migration in UNIX Networks. Proceedings of the USENIX Winter Conference, 357–363.
MEHRA, P. AND WAH, B. W. 1992. Physical Level Synthetic Workload Generation for Load-Balancing Experiments. Proceedings of the First Symposium on High Performance Distributed Computing, 208–217.
MILLER, B. AND PRESOTTO, D. 1981. XOS: an Operating System for the XTREE Architecture. Operating Systems Review 15, 2, 21–32.
MILLER, B., PRESOTTO, D., AND POWELL, M. 1987. DEMOS/MP: The Development of a Distributed Operating System. Software-Practice and Experience 17, 4, 277–290.
MILOJIČIĆ, D. S., BREUGST, B., BUSSE, I., CAMPBELL, J., COVACI, S., FRIEDMAN, B., KOSAKA, K., LANGE, D., ONO, K., OSHIMA, M., THAM, C., VIRDHAGRISWARAN, S., AND WHITE, J. 1998b. MASIF, The OMG Mobile Agent System Interoperability Facility. Proceedings of the Second International Workshop on Mobile Agents, 50–67. Also appeared in Springer Journal on Personal Technologies, 2, 117–129, 1998.
MILOJIČIĆ, D. S., CHAUHAN, D., AND LAFORGE, W. 1998a. Mobile Objects and Agents (MOA), Design, Implementation and Lessons Learned. Proceedings of the 4th USENIX Conference on Object-Oriented Technologies (COOTS), 179–194. Also appeared in IEE Proceedings - Distributed Systems Engineering, 5, 1–14, 1998.
MILOJIČIĆ, D., DOUGLIS, F., AND WHEELER, R. 1999. Mobility: Processes, Computers, and Agents. Addison-Wesley Longman and ACM Press.
MILOJIČIĆ, D., GIESE, P., AND ZINT, W. 1993a. Experiences with Load Distribution on Top of the Mach Microkernel. Proceedings of the USENIX Symposium on Experiences with Distributed and Multiprocessor Systems.
MILOJIČIĆ, D., ZINT, W., DANGEL, A., AND GIESE, P. 1993b. Task Migration on Top of the Mach Microkernel. Proceedings of the Third USENIX Mach Symposium, 273–290.
MILOJIČIĆ, D. 1993c. Load Distribution, Implementation for the Mach Microkernel. Ph.D. Thesis, Technical Report, University of Kaiserslautern. Also Vieweg, Wiesbaden, 1994.
MILOJIČIĆ, D., LANGERMAN, A., BLACK, D., SEARS, S., DOMINIJANNI, M., AND DEAN, D. 1997. Concurrency, a Case Study in Remote Tasking and Distributed IPC. IEEE Concurrency 5, 2, 39–49.
MIRCHANDANEY, R., TOWSLEY, D., AND STANKOVIC, J. 1989. Analysis of the Effects of Delays on Load Sharing. IEEE Transactions on Computers 38, 11, 1513–1525.
MIRCHANDANEY, R., TOWSLEY, D., AND STANKOVIC, J. 1990. Adaptive Load Sharing in Heterogeneous Distributed Systems. Journal of Parallel and Distributed Computing, 331–346.
MULLENDER, S. J., VAN ROSSUM, G., TANENBAUM, A. S., VAN RENESSE, R., AND VAN STAVEREN, H. 1990. Amoeba: A Distributed Operating System for the 1990s. IEEE Computer 23, 5, 44–53.
MUTKA, M. AND LIVNY, M. 1987. Scheduling Remote Processing Capacity in a Workstation Processor Bank Computing System. Proceedings of the 7th International Conference on Distributed Computing Systems, 2–7.
NELSON, M. N. AND OUSTERHOUT, J. K. 1988. Copy-on-Write for Sprite. Proceedings of the Summer 1988 USENIX Conference, 187–201.
NELSON, M. N., WELCH, B. B., AND OUSTERHOUT, J. K. 1988. Caching in the Sprite Network File System. ACM Transactions on Computer Systems 6, 1, 134–154.
NELSON, R. AND SQUILLANTE, M. 1995. Stochastic Analysis of Affinity Scheduling and Load Balancing in Parallel Processing Systems. IBM Research Report RC 20145.
NI, L. M. AND HWANG, K. 1985. Optimal Load Balancing in a Multiple Processor System with Many Job Classes. IEEE Transactions on Software Engineering SE-11, 5, 491–496.
NICHOLS, D. 1987. Using Idle Workstations in a Shared Computing Environment. Proceedings of the 11th Symposium on OS Principles, 5–12.
NICHOLS, D. 1990. Multiprocessing in a Network of Workstations. Ph.D. Thesis, Technical Report CMU-CS-90-107, Carnegie Mellon University.
NUTTAL, M. 1994. Survey of Systems Providing Process or Object Migration. Operating Systems Review 28, 4, 64–79.
OMG. 1996. Common Object Request Broker Architecture and Specification. Object Management Group Document Number 96.03.04.
OUSTERHOUT, J., CHERENSON, A., DOUGLIS, F., NELSON, M., AND WELCH, B. 1988. The Sprite Network Operating System. IEEE Computer, 23–26.
OUSTERHOUT, J. 1994. Tcl and the Tk Toolkit. Addison-Wesley Longman.
PAINDAVEINE, Y. AND MILOJIČIĆ, D. 1996. Process v. Task Migration. Proceedings of the 29th Annual Hawaii International Conference on System Sciences, 636–645.
PARTRIDGE, C. 1994. Gigabit Networking. Addison Wesley.
PEINE, H. AND STOLPMANN, T. 1997. The Architecture of the Ara Platform for Mobile Agents. Proceedings of the First International Workshop on Mobile Agents (MA’97), LNCS 1219, Springer Verlag, 50–61.
PETRI, S. AND LANGENDORFER, H. 1995. Load Balancing and Fault Tolerance in Workstation Clusters: Migrating Groups of Communicating Processes. Operating Systems Review 29, 4, 25–36.
PHELAN, J. M. AND ARENDT, J. W. 1993. An OS/2 Personality on Mach. Proceedings of the Third USENIX Mach Symposium, 191–202.
PHILIPPE, L. 1993. Contribution à l’étude et la réalisation d’un système d’exploitation à image unique pour multicalculateur. Ph.D. Thesis, Technical Report 308, Université de Franche-Comté.
PIKE, R., PRESOTTO, D., THOMPSON, K., AND TRICKEY, H. 1990. Plan 9 from Bell Labs. Proceedings of the UKUUG Summer 1990 Conference, 1–9.
PLATFORM COMPUTING. 1996. LSF User’s and Administrator’s Guides, Version 2.2, Platform Computing Corporation.
POPEK, G., WALKER, B. J., CHOW, J., EDWARDS, D., KLINE, C., RUDISIN, G., AND THIEL, G. 1981. Locus: a Network-Transparent, High Reliability Distributed System. Proceedings of the 8th Symposium on Operating System Principles, 169–177.
POPEK, G. AND WALKER, B. 1985. The Locus Distributed System Architecture. MIT Press.
POWELL, M. AND MILLER, B. 1983. Process Migration in DEMOS/MP. Proceedings of the 9th Symposium on Operating Systems Principles, 110–119.
PU, C., AUTREY, T., BLACK, A., CONSEL, C., COWAN, C., INOUYE, J., KETHANA, L., WALPOLE, J., AND ZHANG, K. 1995. Optimistic Incremental Specialization. Proceedings of the 15th Symposium on Operating Systems Principles, 314–324.
QUISQUATER, J.-J. AND DESMEDT, Y. G. 1991. Chinese Lotto as an Exhaustive Code-Breaking Machine. IEEE Computer 24, 11, 14–22.
RANGANATHAN, M., ACHARYA, A., SHARMA, S. D., AND SALTZ, J. 1997. Network-aware Mobile Programs. Proceedings of the USENIX 1997 Annual Technical Conference, 91–103.
RASHID, R. AND ROBERTSON, G. 1981. Accent: a Communication Oriented Network Operating System Kernel. Proceedings of the 8th Symposium on Operating System Principles, 64–75.
RASHID, R. 1986. From RIG to Accent to Mach: The Evolution of a Network Operating System. Proceedings of the ACM/IEEE Computer Society Fall Joint Computer Conference, 1128–1137.
ROSENBERRY, W., KENNEY, D., AND FISHER, G. 1992. Understanding DCE. O’Reilly & Associates, Inc.
ROTHERMEL, K. AND HOHL, F. 1998. Mobile Agents. Proceedings of the Second International Workshop, MA’98, Springer Verlag.
ROUSH, E. T. 1995. The Freeze Free Algorithm for Process Migration. Ph.D. Thesis, Technical Report, University of Illinois at Urbana-Champaign.
ROUSH, E. T. AND CAMPBELL, R. 1996. Fast Dynamic Process Migration. Proceedings of the International Conference on Distributed Computing Systems, 637–645.
ROWE, L. AND BIRMAN, K. 1982. A Local Network Based on the UNIX Operating System. IEEE Transactions on Software Engineering SE-8, 2, 137–146.
ROZIER, M. 1992. Chorus (Overview of the Chorus Distributed Operating System). USENIX Workshop on Micro Kernels and Other Kernel Architectures, 39–70.
SCHILL, A. AND MOCK, M. 1993. DC++: Distributed Object Oriented System Support on top of OSF DCE. Distributed Systems Engineering 1, 2, 112–125.
SCHRIMPF, H. 1995. Migration of Processes, Files and Virtual Devices in the MDX Operating System. Operating Systems Review 29, 2, 70–81.
SHAMIR, E. AND UPFAL, E. 1987. A Probabilistic Approach to the Load Sharing Problem in Distributed Systems. Journal of Parallel and Distributed Computing 4, 5, 521–530.
SHAPIRO, M. 1986. Structure and Encapsulation in Distributed Systems: The PROXY Principle. Proceedings of the 6th International Conference on Distributed Computing Systems, 198–204.
SHAPIRO, M., DICKMAN, P., AND PLAINFOSSÉ, D. 1992. Robust, Distributed References and Acyclic Garbage Collection. Proceedings of the Symposium on Principles of Distributed Computing, 135–146.
SHAPIRO, M., GAUTRON, P., AND MOSSERI, L. 1989. Persistence and Migration for C++ Objects. Proceedings of ECOOP 1989, the European Conference on Object-Oriented Programming.
SHIVARATRI, N. G. AND KRUEGER, P. 1990. Two Adaptive Location Policies for Global Scheduling Algorithms. Proceedings of the 10th International Conference on Distributed Computing Systems, 502–509.
SHIVARATRI, N., KRUEGER, P., AND SINGHAL, M. 1992. Load Distributing for Locally Distributed Systems. IEEE Computer, 33–44.
SHOHAM, Y. 1997. An Overview of Agent-oriented Programming. In J. M. Bradshaw, editor, Software Agents, 271–290. MIT Press.
SHOCH, J. AND HUPP, J. 1982. The Worm Programs: Early Experience with Distributed Computing. Communications of the ACM 25, 3, 172–180.
SHUB, C. 1990. Native Code Process-Originated Migration in a Heterogeneous Environment. Proceedings of the 18th ACM Annual Computer Science Conference, 266–270.
SINGHAL, M. AND SHIVARATRI, N. G. 1994. Advanced Concepts in Operating Systems. McGraw-Hill.
SINHA, P., MAEKAWA, M., SHIMUZU, K., JIA, X., ASHIHARA, UTSUNOMIYA, N., PARK, AND NAKANO, H. 1991. The Galaxy Distributed Operating System. IEEE Computer 24, 8, 34–40.
SKORDOS, P. 1995. Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations. Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing.
SMITH, J. M. 1988. A Survey of Process Migration Mechanisms. Operating Systems Review 22, 3, 28–40.
SMITH, J. M. AND IOANNIDIS, J. 1989. Implementing Remote fork() with Checkpoint-Restart. IEEE Technical Committee on Operating Systems Newsletter 3, 1, 15–19.
SMITH, P. AND HUTCHINSON, N. 1998. Heterogeneous Process Migration: The Tui System. Software-Practice and Experience 28, 6, 611–639.
SOH, J. AND THOMAS, V. 1987. Process Migration for Load Balancing in Distributed Systems. TENCON, 888–892.
SQUILLANTE, M. S. AND NELSON, R. D. 1991. Analysis of Task Migration in Shared-Memory Multiprocessor Scheduling. Proceedings of the ACM SIGMETRICS Conference 19, 1, 143–155.
STANKOVIC, J. A. 1984. Simulation of Three Adaptive Decentralized Controlled Job Scheduling Algorithms. Computer Networks, 199–217.
STEENSGAARD, B. AND JUL, E. 1995. Object and Native Code Thread Mobility. Proceedings of the 15th Symposium on Operating Systems Principles, 68–78.
STEKETEE, C., ZHU, W., AND MOSELEY, P. 1994. Implementation of Process Migration in Amoeba. Proceedings of the 14th International Conference on Distributed Computer Systems, 194–203.
STONE, H. 1978. Critical Load Factors in Two-Processor Distributed Systems. IEEE Transactions on Software Engineering SE-4, 3, 254–258.
STONE, H. S. AND BOKHARI, S. H. 1978. Control of Distributed Processes. IEEE Computer 11, 7, 97–106.
STUMM, M. 1988. The Design and Implementation of a Decentralized Scheduling Facility for a Workstation Cluster. Proceedings of the Second Conference on Computer Workstations, 12–22.
SUN MICROSYSTEMS. 1998. Jini Software Simplifies Network Computing. http://www.sun.com/980713/jini/feature.jhtml.
SVENSSON, A. 1990. History, an Intelligent Load Sharing Filter. Proceedings of the 10th International Conference on Distributed Computing Systems, 546–553.
SWANSON, M., STOLLER, L., CRITCHLOW, T., AND KESSLER, R. 1993. The Design of the Schizophrenic Workstation System. Proceedings of the Third USENIX Mach Symposium, 291–306.
TANENBAUM, A. S., VAN RENESSE, R., VAN STAVEREN, H., SHARP, G. J., MULLENDER, S. J., JANSEN, A. J., AND VAN ROSSUM, G. 1990. Experiences with the Amoeba Distributed Operating System. Communications of the ACM 33, 12, 46–63.
TANENBAUM, A. 1992. Modern Operating Systems. Prentice Hall, Englewood Cliffs, New Jersey.
TARDO, J. AND VALENTE, L. 1996. Mobile Agent Security and Telescript. Proceedings of COMPCON’96, 52–63.
TEODOSIU, D. 2000. End-to-End Fault Containment in Scalable Shared-Memory Multiprocessors. Ph.D. Thesis, Technical Report, Stanford University.
THEIMER, M. H. AND HAYES, B. 1991. Heterogeneous Process Migration by Recompilation. Proceedings of the 11th International Conference on Distributed Computer Systems, 18–25.
THEIMER, M. AND LANTZ, K. 1988. Finding Idle Machines in a Workstation-Based Distributed System. IEEE Transactions on Software Engineering SE-15, 11, 1444–1458.
THEIMER, M., LANTZ, K., AND CHERITON, D. 1985. Preemptable Remote Execution Facilities for the V System. Proceedings of the 10th ACM Symposium on OS Principles, 2–12.
TRACEY, K. M. 1991. Processor Sharing for Cooperative Multi-task Applications. Ph.D. Thesis, Technical Report, Department of Electrical Engineering, Notre Dame, Indiana.
TRITSCHER, S. AND BEMMERL, T. 1992. Seitenorientierte Prozessmigration als Basis fuer Dynamischen Lastausgleich. GI/ITG Pars Mitteilungen, no. 9, 58–62.
TSCHUDIN, C. 1997. The Messenger Environment M0: A Condensed Description. In Mobile Object Systems: Towards the Programmable Internet, LNCS 1222, Springer Verlag, 149–156.
VAN DIJK, G. J. W. AND VAN GILS, M. J. 1992. Efficient Process Migration in the EMPS Multiprocessor System. Proceedings of the 6th International Parallel Processing Symposium, 58–66.
VAN RENESSE, R., BIRMAN, K. P., AND MAFFEIS, S. 1996. Horus: A Flexible Group Communication System. Communications of the ACM 39, 4, 76–85.
VASWANI, R. AND ZAHORJAN, J. 1991. The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed Shared Memory Multiprocessors. Proceedings of the Thirteenth Symposium on Operating Systems Principles, 26–40.
VENKATESH, R. AND DATTATREYA, G. R. 1990. Adaptive Optimal Load Balancing of Loosely Coupled Processors with Arbitrary Service Time Distributions. Proceedings of the 1990 International Conference on Parallel Processing, I, 22–25.
VIGNA, G. 1998. Mobile Agents Security. LNCS 1419, Springer Verlag.
VITEK, I., SERRANO, M., AND THANOS, D. 1997. Security and Communication in Mobile Object Systems. In Mobile Object Systems: Towards the Programmable Internet, LNCS 1222, Springer Verlag, 177–200.
WALKER, B., POPEK, G., ENGLISH, R., KLINE, C., AND THIEL, G. 1983. The LOCUS Distributed Operating System. Proceedings of the 9th Symposium on Operating Systems Principles 17, 5, 49–70.
WALKER, B. J. AND MATHEWS, R. M. 1989. Process Migration in AIX’s Transparent Computing Facility (TCF). IEEE Technical Committee on Operating Systems Newsletter 3, 1, 5–7.
WANG, Y.-T. AND MORRIS, R. J. T. 1985. Load Sharing in Distributed Systems. IEEE Transactions on Computers C-34, 3, 204–217.
WANG, C.-J., KRUEGER, P., AND LIU, M. T. 1993. Intelligent Job Selection for Distributed Scheduling. Proceedings of the 13th International Conference on Distributed Computing Systems, 288–295.
WELCH, B. B. AND OUSTERHOUT, J. K. 1988. Pseudo-Devices: User-Level Extensions to the Sprite File System. Proceedings of the USENIX Summer Conference, 7–49.
WELCH, B. 1990. Naming, State Management and User-Level Extensions in the Sprite Distributed File System. Ph.D. Thesis, Technical Report UCB/CSD 90/567, CSD (EECS), University of California, Berkeley.
WHITE, J. 1997. Telescript Technology: An Introduction to the Language. White Paper, General Magic, Inc., Sunnyvale, CA. Appeared in Bradshaw, J., Software Agents, AAAI/MIT Press.
WHITE, J. E., HELGESON, S., AND STEEDMAN, D. A. 1997. System and Method for Distributed Computation Based upon the Movement, Execution, and Interaction of Processes in a Network. United States Patent no. 5603031.
WIECEK, C. A. 1992. A Model and Prototype of VMS Using the Mach 3.0 Kernel. Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, 187–204.
WONG, R., WALSH, T., AND PACIOREK, N. 1997. Concordia: An Infrastructure for Collaborating Mobile Agents. Proceedings of the First International Workshop on Mobile Agents, LNCS 1219, Springer Verlag, 86–97.
XU, J. AND HWANG, K. 1990. Heuristic Methods for Dynamic Load Balancing in a Message-Passing Supercomputer. Proceedings of Supercomputing ’90, 888–897.
ZAJCEW, R., ROY, P., BLACK, D., PEAK, C., GUEDES, P., KEMP, B., LOVERSO, J., LEIBENSPERGER, M., BARNETT, M., RABII, F., AND NETTERWALA, D. 1993. An OSF/1 UNIX for Massively Parallel Multicomputers. Proceedings of the Winter USENIX Conference, 449–468.
ZAYAS, E. 1987a. Attacking the Process Migration Bottleneck. Proceedings of the 11th Symposium on Operating Systems Principles, 13–24.
ZAYAS, E. 1987b. The Use of Copy-on-Reference in a Process Migration System. Ph.D. Thesis, Technical Report CMU-CS-87-121, Carnegie Mellon University.
ZHOU, D. 1987. A Trace-Driven Simulation Study of Dynamic Load Balancing. Ph.D. Thesis, Technical Report UCB/CSD 87/305, CSD (EECS), University of California, Berkeley.
ZHOU, S. AND FERRARI, D. 1987. An Experimental Study of Load Balancing Performance. Proceedings of the 7th IEEE International Conference on Distributed Computing Systems, 490–497.
ZHOU, S. AND FERRARI, D. 1988. A Trace-Driven Simulation Study of Dynamic Load Balancing. IEEE Transactions on Software Engineering 14, 9, 1327–1341.
ZHOU, S., ZHENG, X., WANG, J., AND DELISLE, P. 1994. Utopia: A Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems. Software-Practice and Experience.
ZHU, W. 1992. The Development of an Environment to Study Load Balancing Algorithms, Process Migration and Load Data Collection. Ph.D. Thesis, Technical Report, University of New South Wales.
ZHU, W., STEKETEE, C., AND MUILWIJK, B. 1995. Load Balancing and Workstation Autonomy on Amoeba. Australian Computer Science Communications (ACSC’95) 17, 1, 588–597.
Received October 1996; revised December 1998 and July 1999; accepted August 1999