
Process State Saving Paper

The document describes a new process-internal state capture and recovery mechanism called Process Introspection. It allows capturing the state of a running process so it can later be recovered to run equivalently on a different system. The mechanism uses compiler support and library primitives to automate capturing global variables, stack state, and subroutine calls. It was found to offer good performance while reducing programmer effort compared to other process-internal state capture approaches.

Uploaded by

Kamal Kumar
Copyright
© Attribution Non-Commercial (BY-NC)

Process State Capture and Later Recovery

Pradeep Nagar, Pradeep Yadav, Vikas Mayer, Kamal Kumar
Lingayas Institute of Mgnt. & Technology, Nachauli, Jasana Road, Old Faridabad, Faridabad-121002, India

1. Abstract

The ability to capture the state of a process and later recover that state in the form of an equivalent running process is the basis for a number of important features in parallel and distributed systems. Adaptive load sharing and fault tolerance are well-known examples. Traditional state capture mechanisms have employed an external agent (such as the operating system kernel) to examine and capture process state. However, the increasing prevalence of heterogeneous cluster and metacomputing systems as high-performance computing platforms has prompted investigation of process-internal state capture mechanisms. Perhaps the greatest advantage of the process-internal approach is the ability to support cross-platform state capture and recovery, an important feature in heterogeneous environments. Among the perceived disadvantages of existing process-internal mechanisms are poor performance in multiple respects, and difficulty of use in terms of programmer effort. In this paper we describe a new process-internal state capture and recovery mechanism: Process Introspection. Experiences with this system indicate that the perceived disadvantages associated with process-internal mechanisms can be largely overcome, making this approach to state capture an appropriate one for cluster and metacomputing environments.

2. Introduction

The ability to capture the state of a process and later recover that state in the form of an equivalent running process is the basis for a number of important features in parallel and distributed systems. For example, process migration policies supporting adaptive load sharing and/or fault tolerance rely on a state capture facility. Process state capture and recovery is also the basis of a large class of backward error recovery schemes documented in the fault tolerance literature. Traditional state capture mechanisms have employed an external agent (such as the OS kernel) to examine and capture process state.

In this paper we describe a new process-internal state capture and recovery mechanism: Process Introspection. This system is based on a combination of library and compiler support to maximize the ease of use of process-internal state capture and recovery. For platform-independent modules, the compiler completely automates state capture and recovery. For modules where automatic transformation is not possible, a flexible library providing the needed primitive operations for cross-platform state capture and recovery makes adding state capture functionality straightforward. Our results indicate that a cross-platform process-internal state capture mechanism can offer good performance, and lead us to conclude that the process-internal approach to state capture is the appropriate one for cluster and metacomputing environments.
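As a rough illustration of the kind of cross-platform primitive such a library might provide, the following Python sketch writes a few variables in a fixed, host-independent byte layout and reads them back. It is not the Process Introspection library interface: the table `TYPE_TABLE`, the functions `encode` and `decode`, and the variable names are hypothetical, and Python's standard `struct` module with network byte order stands in for the data-format transformation routines.

```python
import struct

# Hypothetical description of the captured variables: name -> struct format code
# ("i" = 32-bit int, "d" = 64-bit float, "?" = bool).
TYPE_TABLE = {"iteration": "i", "residual": "d", "converged": "?"}


def encode(values, table=TYPE_TABLE):
    """Pack the described variables in network byte order ("!"), so the
    byte layout does not depend on the capturing host."""
    fmt = "!" + "".join(table.values())
    return struct.pack(fmt, *(values[name] for name in table))


def decode(blob, table=TYPE_TABLE):
    """Rebuild the variables on the recovering host from the same description."""
    fmt = "!" + "".join(table.values())
    return dict(zip(table, struct.unpack(fmt, blob)))


state = {"iteration": 42, "residual": 0.5, "converged": True}
blob = encode(state)
recovered = decode(blob)
```

Because the description table travels with the program rather than with the hardware, the capturing and recovering hosts only need to agree on the table and on the byte order, not on their native data representations.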

2. Design

The design of a process-internal state capture mechanism is naturally based upon the modification of user programs to render them both self-describing and self-recovering.

2.1. Model

In our model, a running process is defined to be in one of three states: normal execution, state-capture, or state-recovery. The state of the process is changed by the program itself, either in response to requests from outside sources or as the result of an internal trigger such as periodic checkpoint scheduling. We require that the program periodically execute poll points: points in the code at which the process determines whether it is in state-capture mode and, if so, produces the requested state description.

Certain parts of the process state are easily captured: for example, global variables and heap-allocated data structures are globally addressable and are thus easy to manipulate. The key difficulty in creating the state description is the capture of the subroutine invocation stack state. In the Process Introspection approach, the process utilizes the native subroutine return mechanism to capture stack state. When a poll point is encountered during state-capture mode, the currently active subroutine captures its own state, including its local variables and the logical location of the poll point at which the current call frame was saved (e.g., this can simply be an integer that uniquely identifies the poll point within the subroutine), and returns to the caller. After the return, the caller saves its own state in the same way, and this frame-by-frame stack capture repeats until the base subroutine has been reached, at which point the stack state capture is complete.

For this stack-saving mechanism to work, the program must execute a poll point after returning from each subroutine call. At this point, the program might be in normal execution mode, in which case it proceeds with normal processing, or it might be in state-capture mode, in which case the stack save process continues. We call these required poll points following subroutine returns mandatory poll points. More frequent checks for state capture initiation may be desirable, in which case additional optional poll points can be placed anywhere in a program.

2.2. System usage

This model of process-internal state capture and recovery appears at first glance to require significant programmer effort. In practice, many of the described code transformations can be automated. The Process Introspection system does this through the use of a source code compiler and a runtime support library. For computational modules that are specified in a platform-independent form (i.e., that are written in a high-level language, are type-safe, and do not rely on the underlying features of a particular hardware platform for correctness), the described code transformations can be completely automated.

Not every module can be transformed automatically, however. Consider a message passing interface module. A compiler attempting to incorporate state capture and recovery functionality into such a module would have no way to know how to encode the capture of state such as messages in transit, nor would it be able to determine the state capture coordination semantics required by the application. In such cases, our model requires that the module be augmented by hand to incorporate state capture and recovery functionality; this typically involves the creation of a state-capture-enabled wrapper module for the interface in question. We envision the creation of state-capture-enabled library modules as an infrequent activity undertaken by cluster and metacomputing software system designers. These wrapper libraries can then be reused by application programmers whose own modules will be transformed automatically.

To support the interoperation of state-capture-enabled library modules and automatically transformed application modules, and to ease the hand-coding of state capture mechanisms for library modules, our system supports a library interface. This library provides basic services such as cross-platform data-format transformation routines, routines for constructing descriptive meta-information about the data regions of the process (such as a data type description table), and an event model for allowing separately developed state-capture-enabled modules to interoperate.

5. Related work

Capturing the state of a running process on one kind of computer system and later restarting an equivalent process on a different kind of computer system is worthwhile because it preserves work in the event of a system crash. In our model, the compatible, well-defined states from which such a restart is possible are the process states reached when poll points are encountered. Some systems are expected to save all of a user's work automatically so that, if a fault occurs, execution can be recovered from the saved position. This is similar to hibernation, in which the current contents of RAM are saved.
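The role of poll points as the well-defined states from which a process can be restarted can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Process Introspection implementation: the names `Runtime`, `poll`, `save`, and `restore` are hypothetical, the capture request is simulated by a poll counter, the two subroutines are transformed by hand rather than by a compiler, and JSON stands in for a platform-neutral state description.

```python
import json
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()
    CAPTURE = auto()
    RECOVERY = auto()


class Runtime:
    """Hypothetical runtime support: process mode plus the saved call frames."""

    def __init__(self, capture_after_polls=None):
        self.mode = Mode.NORMAL
        self.frames = []  # saved frames, innermost subroutine first
        self.polls = 0
        self.capture_after_polls = capture_after_polls  # simulated request

    def poll(self):
        """Poll point: True means the caller must save its frame and return."""
        self.polls += 1
        if self.mode is Mode.NORMAL and self.polls == self.capture_after_polls:
            self.mode = Mode.CAPTURE
        return self.mode is Mode.CAPTURE

    def save(self, name, poll_id, local_vars):
        self.frames.append([name, poll_id, local_vars])

    def restore(self, name):
        # With several poll points per subroutine, poll_id would select
        # where to resume; each subroutine here has only one.
        saved_name, _poll_id, local_vars = self.frames.pop()
        assert saved_name == name
        if not self.frames:  # innermost frame restored: back to normal mode
            self.mode = Mode.NORMAL
        return local_vars


def inner(rt, n):
    """Sum 0..n-1 with one optional poll point per iteration."""
    i, total = 0, 0
    if rt.mode is Mode.RECOVERY:
        i, total = rt.restore("inner")
    while i < n:
        if rt.poll():  # optional poll point 1
            rt.save("inner", 1, [i, total])
            return None  # unwind through the native return mechanism
        total += i
        i += 1
    return total


def outer(rt, sizes):
    """Base subroutine: calls inner once per entry in sizes."""
    k, results = 0, []
    if rt.mode is Mode.RECOVERY:
        k, results = rt.restore("outer")
    while k < len(sizes):
        value = inner(rt, sizes[k])  # during recovery, inner restores itself
        # Mandatory poll point after the call. Simplification: it only
        # continues a capture that is already in progress.
        if rt.mode is Mode.CAPTURE:
            rt.save("outer", 1, [k, list(results)])
            return None
        results.append(value)
        k += 1
    return results


sizes = [3, 4, 5]
expected = outer(Runtime(), sizes)  # uninterrupted run

rt = Runtime(capture_after_polls=5)  # request capture at the fifth poll
interrupted = outer(rt, sizes)       # None: the stack has been unwound
state = json.dumps(rt.frames)        # platform-neutral state description

rt2 = Runtime()                      # "different system": a fresh runtime
rt2.frames = json.loads(state)
rt2.mode = Mode.RECOVERY
resumed = outer(rt2, sizes)          # continues from the saved poll point
```

In this sketch the recovered run re-enters `outer`, which restores its own locals and calls `inner` again, so the stack is rebuilt in the reverse of the order in which it was saved. The resumed run produces the same result as the uninterrupted one.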

6. Conclusions

We have presented Process Introspection, a process-internal mechanism for heterogeneous process state capture and recovery based on automatic code modification. Experiences with this system have produced encouraging results. First, we found that relatively simple poll-point placement policies can achieve acceptable levels of incurred overhead while at the same time providing good performance in terms of average checkpoint-request wait time. This result is important: it shows that process-internal state capture and recovery, made possible by periodic polling, can be used effectively and efficiently. Furthermore, the design of our system demonstrates that process-internal state capture and recovery need not place undue burden on the programmer: the typical usage mode for our system is fully automatic, requiring only an additional compiler translation of the user's application program.
