Efficient Checkpointing in Java
{jll,muller}@irisa.fr
tel:+33.2.99.84.72.87, fax:+33.2.99.84.71.71
Abstract
This paper investigates the optimization of language-level
checkpointing of Java programs. First, we describe how
to systematically associate incremental checkpoints with
Java classes. While being safe, the genericness of this
solution induces substantial execution overhead. Second,
to solve the dilemma of genericness versus performance,
we use automatic program specialization to transform the
generic checkpointing methods into highly optimized ones.
Specialization exploits two kinds of information: (i) structural properties about the program classes, (ii) knowledge
of unmodified data structures in specific program phases.
The latter information allows us to generate phase-specific
checkpointing methods. We evaluate our approach on two
benchmarks, a realistic application which consists of a program analysis engine, and a synthetic program which can
serve as a metric. Specialization gives a speedup proportional to the complexity of the object structure and the
modification pattern. Measured speedups for the program
analysis engine are up to 1.5, and for the synthetic program
are up to 15.
1 Introduction
Checkpointing is known to introduce overhead proportional to the checkpoint size [12, 28]. Traditionally, optimizations of the checkpointing process are targeted toward
scientific programs written in Fortran or C. Such programs
often have good locality and large regions of read-only
data. In this environment, an effective optimization technique is incremental checkpointing, which uses system-level facilities to identify modified virtual-memory pages
[7, 19, 25]. Each checkpoint contains only the pages that
have been modified since the previous checkpoint. Additionally, by using a mechanism such as copy-on-write, the
application need not be blocked, at the expense of deferring
The Java programmer has no control over the location of objects. Thus, it is impossible to ensure that
frequently modified objects are all stored in the same
page. Furthermore, a single page may contain both
live objects and objects awaiting garbage collection.
These arguments suggest that a user-driven language-level approach may be appropriate for Java programs.
Language-level checkpointing augments the source program with code to record the program state [16, 17, 26].
To promote safety, this checkpointing code should be introduced systematically, and interfere as little as possible
with the standard behavior of the program. One approach
is to add methods to each class to save and restore the local state. Checkpointing is then performed by a generic
checkpoint method that invokes the checkpointing methods of each checkpointable object. Incremental checkpointing can be implemented by associating a flag with
each object, indicating whether the object has been modified since the previous checkpoint. This checkpointing
code can either be added manually or generated automatically using a preprocessor [16, 17]. In either case, localizing the code for saving and restoring the state of an object
in its class definition respects encapsulation, thus enhancing overall program safety, and simplifies program maintenance.
Nevertheless, this generic programming model introduces overheads. First, because the checkpoint method
is independent of the objects being checkpointed, it must
interact with these objects using virtual calls. Virtual calls
are less efficient than direct function calls, and block traditional compiler optimizations, such as inlining. Second,
although the use of the modified flag reduces the size of
checkpoints, it does not eliminate the need to visit each
checkpointable object defined by the program.
This checkpointing strategy can be optimized by manually creating specialized checkpointing functions for recurring object structures in the program. When some of
the objects are known not to be modified between specific checkpoints, all code relating to the checkpointing of
those objects can be removed. Nevertheless, many specialized checkpointing routines may be needed, to account for
the range of compound object structures used in different
phases of the program. When the program is modified,
these manually optimized routines may need to be completely rewritten. Thus, while these kinds of optimizations
can yield significant performance improvements, performing them by hand is laborious and error-prone.
Our approach
In this paper, we propose to use automatic program specialization to optimize a generic checkpointing algorithm based on programmer-supplied information
about the fixed aspects of the object structure. Program
specialization is a technique for automatically and aggressively optimizing a generic program with respect to information about the program inputs [11, 15]. This technique
has been applied in a wide range of areas, including operating systems [20, 21, 31] and scientific programs [13, 23].
By specializing the checkpointing implementation with
respect to recurring structural and modification patterns,
we eliminate many tests, virtual calls, and traversals of unmodified data. Because specialization is automatic, these
transformations can be performed reliably, and are simple
to modify as the program evolves.
To assess the benefits of our approach in a realistic setting, we specialize the checkpointing of an implementation
of a program analysis engine, which performs the kinds
of analyses that are used in compilation or automatic program specialization. To analyze more precisely the benefits
of our approach, we also consider a synthetic program in
which we can vary the dimensions and modification pat-
For the synthetic example, we first specialize with respect to the structure, and then with respect to both
the structure and the modification pattern. Specialization with respect to the structure gives speedups
up to 3. Specialization with respect to the structure
and the modification pattern gives speedups proportional to the percentage of unmodified objects. When
three quarters of the objects are unmodified, we obtain
speedups up to 15.
2.1 Implementation
The implementation consists of the Checkpointable interface, which specifies the methods that must be provided
by each object to be checkpointed, and a Checkpoint object, which drives the checkpointing process. These are defined in Figure 1. For simplicity, we assume that the checkpointed objects do not contain cycles. We also assume that
checkpoints are constructed using a blocking protocol, and
are written from the output stream to stable storage asynchronously.
Each checkpointable object contains a unique identifier
and methods that describe how to record the state of the object and its children. Additionally, to implement incremental checkpointing, each object contains a flag indicating
whether any fields of the object have been modified since
the previous checkpoint. This functionality is captured by
the Checkpointable interface. The unique identifier and
the modification flag, which are defined in the same way
for all checkpointable objects, are factored into a separate
CheckpointInfo object, also defined in Figure 1.
public interface Checkpointable {
    public CheckpointInfo getCheckpointInfo();
    public void fold(Checkpoint c);
    public void record(OutputStream d);
}

public class Checkpoint {
    OutputStream d;

    public Checkpoint() {
        d = new OutputStream();
    }

    public void checkpoint(Checkpointable o) {
        CheckpointInfo info = o.getCheckpointInfo();
        if (info.modified()) {
            d.writeInt(info.getId());
            o.record(d);
            info.resetModified();
        }
        o.fold(this);
    }
}

public class CheckpointInfo {
    private int id;
    private boolean modified;

    public CheckpointInfo() {
        id = newId();
        modified = true;
    }

    // unique identifier
    public int getId() { return id; }
    private static int newId() { ... }

    // modification flag
    public boolean modified() { return modified; }
    public void setModified() { modified=true; }
    public void resetModified() { modified=false; }
}
Each checkpointable object must define the methods getCheckpointInfo(), record(), and fold().
The method getCheckpointInfo() accesses the associated CheckpointInfo structure.
The method record(OutputStream d) records the complete local state of the checkpointable
object in the output stream d (see footnote 1). A value of base type is written directly,
while a sub-object is represented by its unique identifier. The method fold(Checkpoint c)
recursively applies the checkpointing object c to each of the checkpointable sub-objects.
Checkpointing is initiated by creating a Checkpoint
object, which initializes the output stream. The user program then applies the checkpoint method to the root of
each compound structure to be recorded in the checkpoint. To
implement incremental checkpointing, checkpointing of an
object is divided into two steps. First, if the object has
been modified, its unique identifier is recorded in the output stream, and its record() method is invoked to record
its local state. The modified field is also reset. Then,
regardless of whether the object has been modified since
the previous checkpoint, the fold method of the object is
invoked to recursively apply the checkpointing process to
the children.
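The two-step process described above can be sketched as a small runnable program. This is only an illustration under stated assumptions: the output stream is a DataOutputStream over a ByteArrayOutputStream (as footnote 1 suggests), the interface methods declare IOException, identifiers come from a global counter, and the Cell class and the use of -1 for a missing sub-object are hypothetical names and conventions introduced here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

interface Checkpointable {
    CheckpointInfo getCheckpointInfo();
    void fold(Checkpoint c) throws IOException;
    void record(DataOutputStream d) throws IOException;
}

class CheckpointInfo {
    private static int nextId = 0;       // assumed id scheme: a global counter
    private final int id = nextId++;
    private boolean modified = true;     // a new object counts as modified
    int getId() { return id; }
    boolean modified() { return modified; }
    void setModified() { modified = true; }
    void resetModified() { modified = false; }
}

class Checkpoint {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream d = new DataOutputStream(bytes);

    void checkpoint(Checkpointable o) throws IOException {
        CheckpointInfo info = o.getCheckpointInfo();
        if (info.modified()) {           // step 1: record only modified objects
            d.writeInt(info.getId());
            o.record(d);
            info.resetModified();
        }
        o.fold(this);                    // step 2: always traverse the children
    }

    int size() { return bytes.size(); }
}

// Hypothetical checkpointable list cell, used only for this illustration.
class Cell implements Checkpointable {
    private final CheckpointInfo info = new CheckpointInfo();
    private int value;
    private final Cell next;

    Cell(int value, Cell next) { this.value = value; this.next = next; }

    void setValue(int v) { value = v; info.setModified(); }

    public CheckpointInfo getCheckpointInfo() { return info; }

    public void record(DataOutputStream d) throws IOException {
        d.writeInt(value);                                             // base-type field
        d.writeInt(next == null ? -1 : next.getCheckpointInfo().getId()); // sub-object id
    }

    public void fold(Checkpoint c) throws IOException {
        if (next != null) c.checkpoint(next);
    }
}

class IncrementalDemo {
    // Returns the sizes in bytes of three successive checkpoints.
    static int[] run() {
        try {
            Cell tail = new Cell(2, null);
            Cell head = new Cell(1, tail);
            Checkpoint first = new Checkpoint();
            first.checkpoint(head);      // both cells are new, so both are recorded
            Checkpoint second = new Checkpoint();
            second.checkpoint(head);     // nothing changed, but both cells are still visited
            tail.setValue(7);
            Checkpoint third = new Checkpoint();
            third.checkpoint(head);      // only the tail is recorded
            return new int[] { first.size(), second.size(), third.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

Each recorded cell occupies 12 bytes in this sketch (identifier, value, identifier of the next cell), so the three checkpoints have sizes 24, 0, and 12 bytes. The second checkpoint is empty, yet the traversal still visits every cell.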
As in other approaches to checkpointing of object-oriented programs, the state of each object is restored from
a checkpoint using a restore method local to the object. The
definition of such a method is the inverse of the definition
of record. The unique identifiers associated with each
object are used to reconstruct the state from a sequence of
incremental checkpoints. Because restoration is performed
rarely, specialization seems unlikely to be interesting here.
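A restore method and the replay of a sequence of incremental checkpoints might look as follows. This is a minimal sketch, not the implementation used in the paper: the Counter class, the identifier table, and the replay helper are hypothetical, the objects are assumed to have been re-created and registered by identifier before replay, and only base-type fields are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical object with a single base-type field.
class Counter {
    final int id;
    int value;

    Counter(int id, int value) { this.id = id; this.value = value; }

    void record(DataOutputStream d) throws IOException { d.writeInt(value); }

    // restore is the inverse of record: it reads exactly what record wrote.
    void restore(DataInputStream in) throws IOException { value = in.readInt(); }
}

class RestoreDemo {
    // Builds one incremental checkpoint containing the given modified objects.
    static byte[] checkpointOf(Counter... modified) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream d = new DataOutputStream(bytes);
        for (Counter c : modified) {
            d.writeInt(c.id);
            c.record(d);
        }
        d.flush();
        return bytes.toByteArray();
    }

    // Replays checkpoints oldest first, so later states overwrite earlier ones.
    static void replay(Map<Integer, Counter> table, byte[]... checkpoints) throws IOException {
        for (byte[] cp : checkpoints) {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(cp));
            while (in.available() > 0) {
                int id = in.readInt();       // the identifier selects the object
                table.get(id).restore(in);   // the object reads its own state
            }
        }
    }

    static int[] run() {
        try {
            Counter a = new Counter(0, 10);
            Counter b = new Counter(1, 20);
            byte[] first = checkpointOf(a, b);   // initial checkpoint: everything
            b.value = 25;
            byte[] second = checkpointOf(b);     // incremental: only b changed

            // Simulated recovery: fresh objects, registered by identifier.
            Counter ra = new Counter(0, 0);
            Counter rb = new Counter(1, 0);
            Map<Integer, Counter> table = new HashMap<>();
            table.put(ra.id, ra);
            table.put(rb.id, rb);
            replay(table, first, second);
            return new int[] { ra.value, rb.value };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

After replay, the first object holds the value from the initial checkpoint (10) and the second holds the value from the later incremental checkpoint (25).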
2.2 Defining checkpointable objects
The methods required by the Checkpointable interface
can be systematically defined either manually or automatically, as follows. A class implementing the Checkpointable interface creates a CheckpointInfo structure and defines the associated getCheckpointInfo()
accessor function. Such a class also defines record and
fold methods to record its local state and traverse its children, respectively. A class that extends a checkpointable
class defines record and fold methods corresponding
to its own local state. These methods invoke the respective methods of the parent class to checkpoint the inherited
fields.
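The convention for a class and its subclass can be sketched as follows. The Node and LabeledNode classes are hypothetical, as is the use of -1 to represent a missing sub-object; the stream type and the IOException declarations are likewise assumptions made so that the example runs as written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

interface Checkpointable {
    CheckpointInfo getCheckpointInfo();
    void fold(Checkpoint c) throws IOException;
    void record(DataOutputStream d) throws IOException;
}

class CheckpointInfo {
    private static int nextId = 0;       // assumed id scheme: a global counter
    private final int id = nextId++;
    private boolean modified = true;
    int getId() { return id; }
    boolean modified() { return modified; }
    void setModified() { modified = true; }
    void resetModified() { modified = false; }
}

class Checkpoint {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream d = new DataOutputStream(bytes);

    void checkpoint(Checkpointable o) throws IOException {
        CheckpointInfo info = o.getCheckpointInfo();
        if (info.modified()) {
            d.writeInt(info.getId());
            o.record(d);
            info.resetModified();
        }
        o.fold(this);
    }

    int size() { return bytes.size(); }
}

// A class implementing the interface creates its own CheckpointInfo.
class Node implements Checkpointable {
    private final CheckpointInfo info = new CheckpointInfo();
    int key;
    Node child;

    static int idOf(Node n) { return n == null ? -1 : n.getCheckpointInfo().getId(); }

    public CheckpointInfo getCheckpointInfo() { return info; }

    public void record(DataOutputStream d) throws IOException {
        d.writeInt(key);            // base-type field written directly
        d.writeInt(idOf(child));    // sub-object represented by its identifier
    }

    public void fold(Checkpoint c) throws IOException {
        if (child != null) c.checkpoint(child);
    }
}

// A subclass handles only its own fields and delegates the inherited ones.
class LabeledNode extends Node {
    int label;
    Node extra;

    @Override
    public void record(DataOutputStream d) throws IOException {
        super.record(d);            // inherited fields first
        d.writeInt(label);
        d.writeInt(idOf(extra));
    }

    @Override
    public void fold(Checkpoint c) throws IOException {
        super.fold(c);              // inherited children first
        if (extra != null) c.checkpoint(extra);
    }
}

class InheritanceDemo {
    // Returns the size in bytes of a first checkpoint of a three-object structure.
    static int run() {
        LabeledNode root = new LabeledNode();
        root.child = new Node();
        root.extra = new Node();
        Checkpoint c = new Checkpoint();
        try {
            c.checkpoint(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return c.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the root contributes 20 bytes (identifier plus four integers) and each plain Node 12 bytes, for a total of 44 bytes.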
As an example, we use part of the implementation of
the program analysis engine, presented in Section 4. Each
phase of the program analysis engine stores its result in
Footnote 1: In practice, we instantiate OutputStream as a DataOutputStream composed with a
ByteArrayOutputStream, as defined in the java.io package.
[Figure: specialization tool chain. Diagram labels: X.java, X.sc, javac+Harissa, JSCC, specialization directives, C files, Tempo, X-specialized.c, Harissa run-time, gcc, binary application, Assirah, Java files, javac, Java bytecode application.]
3 Program Specialization
Program specialization is the optimization of a program
based on supplementary information about its input. We
first describe this technique, and then consider how to use
it to optimize the checkpointing process.
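As a general illustration of the idea, independent of checkpointing, consider a generic power function and the residual code that a specializer could produce once the exponent is known to be 3. This is the classical textbook example, written by hand here; the class and method names are hypothetical and the code is not output of the tools used in this paper.

```java
class Power {
    // Generic program: works for any non-negative exponent.
    static int power(int base, int exp) {
        int result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    // Residual program for exp == 3: the loop is unrolled, and the loop
    // counter and its tests disappear because they depend only on exp.
    static int cube(int base) {
        return base * base * base;
    }

    public static void main(String[] args) {
        System.out.println(power(2, 3) + " " + cube(2));
    }
}
```

The same principle applies to checkpointing: computations that depend only on known information, such as the shape of a compound structure, can be performed once at specialization time instead of at every checkpoint.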
[Figure: structure of an Attributes object. Diagram labels: Attributes, SEEntry, BTEntry, ETEntry, BT, ET, Id.]
void checkpoint_attr(Checkpointable o) {
    Attributes attr = (Attributes)o;
    CheckpointInfo attrInfo = attr.getCheckpointInfo();
    if (attrInfo.modified()) {
        d.writeInt(attrInfo.getId());
        attr.record(d);
        attrInfo.resetModified();
    }
    SEEntry seEntry = attr.se;
    CheckpointInfo seEntryInfo = seEntry.getCheckpointInfo();
    if (seEntryInfo.modified()) {
        d.writeInt(seEntryInfo.getId());
        seEntry.record(d); /* records both lists */
        seEntryInfo.resetModified();
    }
    BTEntry btEntry = attr.bt;
    CheckpointInfo btEntryInfo = btEntry.getCheckpointInfo();
    if (btEntryInfo.modified()) {
        d.writeInt(btEntryInfo.getId());
        btEntry.record(d);
        btEntryInfo.resetModified();
    }
    BT bt = btEntry.bt;
    CheckpointInfo btInfo = bt.getCheckpointInfo();
    if (btInfo.modified()) {
        d.writeInt(btInfo.getId());
        bt.record(d); /* virtual call */
        btInfo.resetModified();
    }
    ETEntry etEntry = attr.et;
    CheckpointInfo etEntryInfo = etEntry.getCheckpointInfo();
    if (etEntryInfo.modified()) {
        d.writeInt(etEntryInfo.getId());
        etEntry.record(d);
        etEntryInfo.resetModified();
    }
    ET et = etEntry.et;
    CheckpointInfo etInfo = et.getCheckpointInfo();
    if (etInfo.modified()) {
        d.writeInt(etInfo.getId());
        et.record(d); /* virtual call */
        etInfo.resetModified();
    }
}
Figure 6: Specialization of checkpoint w.r.t. the modification properties of an Attributes object for the binding-time analysis
[Table 1 body, partially recovered: full ckp., min. size: 12.52, 5.3, -; full ckp., min. size: 11.04, 4.56, -]
Table 1: Checkpoint size (in Mb) and execution time (in seconds). (JDK 1.2.2 JVM, Sun Ultra2 300MHz)
5 A synthetic application
To assess the benefits of our approach independent of a
particular application, we consider a synthetic example, in
which we can vary the structure of the checkpointed objects. The goal of these tests is to provide a metric for
determining to what degree other applications can benefit from our approach. We consider checkpointing a set
of compound structures, each containing five linked lists.
We vary properties of these structures such as the length of
the lists, the percentage of modified list elements, and the
number of integer-typed fields stored in each list element.
The test program constructs 20,000 compound structures, randomly chooses constituent list elements to be
modified according to the constraints of the experiment,
and performs a single checkpoint. Our benchmarks present
the time to construct the checkpoint. Unless otherwise
stated, the Java programs were translated to C before specialization and then run in the Harissa JVM.
We first compare incremental checkpointing to full
checkpointing. When some objects are not modified, incremental checkpointing reduces the cost of recording the
current state. Nevertheless, incremental checkpointing also
introduces tests into the traversal of the compound structures. Figure 7 shows that even when all of the objects
are modified the added cost is negligible. The speedup obtained by incremental checkpointing increases as the number of modified objects decreases, and as the cost of recording the state of each object increases. When only a quarter of the objects are modified, and when 10 integers are
recorded for each modified object, incremental checkpointing is over 3 times faster than full checkpointing.
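The comparison between full and incremental checkpointing can be reproduced in miniature with the following sketch. It rests on several assumptions that differ from the experiments reported here: it measures checkpoint size in bytes rather than execution time, it marks every fourth element as modified instead of choosing elements at random (so that the result is reproducible), it builds 100 structures rather than 20,000, and the Element and SyntheticDemo names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical list element with a configurable number of integer fields.
class Element {
    final int id;
    final int[] fields;
    boolean modified;
    final Element next;

    Element(int id, int nFields, Element next) {
        this.id = id;
        this.fields = new int[nFields];
        this.next = next;
    }
}

class SyntheticDemo {
    static final int LISTS_PER_STRUCTURE = 5;

    // Builds the compound structures; element ids are consecutive integers.
    static Element[][] build(int nStructures, int listLength, int nFields, int modifiedEvery) {
        Element[][] heads = new Element[nStructures][LISTS_PER_STRUCTURE];
        for (int s = 0; s < nStructures; s++) {
            for (int l = 0; l < LISTS_PER_STRUCTURE; l++) {
                Element head = null;
                for (int k = listLength - 1; k >= 0; k--) {
                    int id = (s * LISTS_PER_STRUCTURE + l) * listLength + k;
                    head = new Element(id, nFields, head);
                    head.modified = (id % modifiedEvery == 0);
                }
                heads[s][l] = head;
            }
        }
        return heads;
    }

    // Performs a single checkpoint and returns its size in bytes.
    // Flags are not reset, since only one checkpoint is taken per run.
    static int checkpoint(Element[][] heads, boolean incremental) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream d = new DataOutputStream(bytes);
            for (Element[] structure : heads) {
                for (Element head : structure) {
                    for (Element e = head; e != null; e = e.next) {
                        if (incremental && !e.modified) continue;  // the added test
                        d.writeInt(e.id);
                        for (int f : e.fields) d.writeInt(f);
                        d.writeInt(e.next == null ? -1 : e.next.id);
                    }
                }
            }
            d.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns { full checkpoint size, incremental checkpoint size }.
    static int[] run(int nStructures, int listLength, int nFields, int modifiedEvery) {
        Element[][] heads = build(nStructures, listLength, nFields, modifiedEvery);
        return new int[] { checkpoint(heads, false), checkpoint(heads, true) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(100, 5, 10, 4)));
    }
}
```

With 100 structures, lists of length 5, 10 integers per element, and a quarter of the elements modified, the full checkpoint is 120000 bytes and the incremental one 30000 bytes. Both variants traverse every element; only the amount of recorded state differs.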
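[Figure 7: speedup of incremental checkpointing over full checkpointing versus percentage of modified elements (100%, 50%, 25%), for list lengths 1 and 5; panel shown: 10 integers recorded per modified object.]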
plete traversal of the compound structures to identify modified objects. Specialization with respect to properties of
the object structure optimizes the traversal. In particular,
we specialize with respect to the following structural information.
The shape of the compound structures.
[Figures: speedup plots for the specialization experiments. Panels: list length = 1 and list length = 5; 1 integer and 10 integers recorded per modified object. Axes: speedup versus percentage of modified elements (100%, 50%, 25%), and speedup versus number of possibly modified lists, with one curve per percentage.]
list. This is the worst case, because only tests, but not object traversals, are eliminated.
Because the number of eliminated tests depends on the length of the lists, we achieve
the best speedup for long lists. Figure 10 shows that for lists of length 5, when only one
value is recorded for each modified object, the speedup over unspecialized incremental
checkpointing ranges from 5 to 15, depending on the number of lists that may contain
modified objects. When 10 integers are recorded for each object, these speedups
range from 2 to 11.
So far, we have assessed the performance of specialized C code. For portability, we can
also translate the specialized C code back to Java using the Assirah tool. In our third
specialization experiment above (cf. Figure 10), we specialize with respect to both the
number of lists that may contain a modified object and the position at which a modified
object may occur in each list. Figure 11 compares the performance of the Java specialized
code with the performance of the unspecialized Java implementation of incremental
checkpointing, for lists of length 5. As shown in Figure 11a, using the JDK 1.2.2 JIT
compiler, we obtain speedups of up to 12. As shown in Figure 11b, combining JDK 1.2.2
with the state-of-the-art dynamic compiler HotSpot, we obtain speedups of up to 6 over
the performance of the unspecialized code, also running on HotSpot.
As shown in Table 2, the Harissa code is significantly faster than the code produced by
the JDK 1.2.2 JIT compiler or HotSpot. Table 2 also shows that the unspecialized code
run with HotSpot can be faster than the specialized code run without HotSpot. Thus, one
may wonder whether HotSpot subsumes program specialization. Nevertheless, Figure 11b
shows that the specialization further improves performance under HotSpot, demonstrating
that specialization and dynamic compilation are complementary.
[Figure 11: speedup of the specialized Java code over the unspecialized Java code versus number of possibly modified lists (lists of length 5); a) JDK 1.2.2, b) JDK 1.2.2 with HotSpot.]
                           Unspecialized code     Specialized code
Possibly mod. lists           1          5           1          5
Harissa      100%           1.05       1.80        0.17       0.70
             50%            0.98       1.36        0.10       0.42
             25%            0.95       1.14        0.08       0.27
JDK 1.2.2    100%           3.99      10.92        0.95       4.39
             50%            1.98       7.05        0.54       2.33
             25%            1.76       4.03        0.30       1.27
Table 2: Checkpoint execution time (in seconds), 10 integers written for each element
6 Related work
Automatic program-transformation techniques have already been used to improve the
reliability and performance of source-level checkpointing. The C-to-C compilers c2ftc and
porch, developed by Ramkumar and Strumpen [26, 30] and by Strumpen [29], respectively,
add code around each procedure call to enable a program to manage the checkpointing and
recovery of its control stack. A preprocessor in the Dome system provides a similar
facility for parallel C++ programs [5, 6]. Plank et al. propose to use data-flow analysis
to determine automatically, based on hints from the user, the regions of memory that are
not modified between checkpoints [4, 24]. Calls to functions in a checkpointing library
(libckpt for Sparc or CLIP for Intel Paragon) are then automatically inserted into the
source program. Killijian et al. and Kasbekar et al. use compile-time reflection provided
by OpenC++ [10] to add checkpointing code at the source level to the definitions of C++
objects [16, 17]. The reflection-based approaches are most closely related to ours.
Essentially, we use program specialization to optimize checkpointing methods of the form
generated by reflection.
Several of these source-level approaches address the
problem of incremental checkpointing. The analysis proposed by Plank et al. to detect unmodified regions of memory is performed at compile time, and is thus necessarily
approximate. The reflective approach of Killijian et al. associates a modification flag with each object field. Maintaining and testing these flags at run time adds substantial
overhead: extra space to store the modification flags, extra time on every assignment to update the associated flag,
and extra time during checkpointing to test the flags. Our
approach exploits both compile-time and run-time information. When it is possible to determine at compile time
that an object is not modified between checkpoints, specialization eliminates the code to save the state of the object. When it is not possible to determine this information
at compile time, the modified flag is retained in the specialized program and tested at run time. Because specialization
is automatic, it is feasible to create many implementations,
to account for the modification patterns of each phase of
the program, without changing the source code.
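The interplay between compile-time and run-time information can be illustrated with a hand-written sketch. Assume, hypothetically, a phase in which the programmer has declared that the se part of a structure is never modified, while the bt part may be. The Part, Attrs, and PhaseDemo names are introduced only for this example, and the phase-specific method is written by hand rather than produced by a specializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical checkpointable part with one integer of state.
class Part {
    final int id;
    int value;
    boolean modified = true;

    Part(int id, int value) { this.id = id; this.value = value; }

    void record(DataOutputStream d) throws IOException { d.writeInt(value); }
}

// Hypothetical compound object, loosely modeled on the Attributes example.
class Attrs {
    final Part se = new Part(1, 10);
    final Part bt = new Part(2, 20);
}

class PhaseDemo {
    static void save(Part p, DataOutputStream d) throws IOException {
        if (p.modified) {
            d.writeInt(p.id);
            p.record(d);
            p.modified = false;
        }
    }

    // Generic version: tests the flag of every part.
    static void checkpointGeneric(Attrs a, DataOutputStream d) throws IOException {
        save(a.se, d);
        save(a.bt, d);
    }

    // Phase-specific version, valid only under the declared assumption that
    // se is unmodified in this phase: all code for se is gone, while the flag
    // of bt is still tested at run time and its record call is inlined.
    static void checkpointBtPhase(Attrs a, DataOutputStream d) throws IOException {
        Part bt = a.bt;
        if (bt.modified) {
            d.writeInt(bt.id);
            d.writeInt(bt.value);
            bt.modified = false;
        }
    }

    // Runs one phase that modifies only bt and returns the resulting checkpoint.
    static byte[] run(boolean specialized) {
        try {
            Attrs a = new Attrs();
            // Initial full checkpoint, discarded here; it clears all flags.
            checkpointGeneric(a, new DataOutputStream(new ByteArrayOutputStream()));
            a.bt.value = 21;
            a.bt.modified = true;

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream d = new DataOutputStream(bytes);
            if (specialized) {
                checkpointBtPhase(a, d);
            } else {
                checkpointGeneric(a, d);
            }
            d.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(run(true), run(false)));
    }
}
```

As long as the declared assumption holds, both versions produce the same 8-byte checkpoint (identifier and value of bt). The phase-specific version is only safe for phases in which that assumption is actually true.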
Language-level checkpointing for Java provides independence from the virtual machine. Other approaches have
simplified the checkpointing process and reduced checkpoint size by omitting aspects of the underlying language
implementation. The Stardust [9] and Dome [5, 6] systems
for SIMD parallelism in heterogeneous environments restrict checkpointing to synchronization points in the main
function, eliminating the need to record the stack. In the
context of Java, Killijian et al. also record only object
fields, and thus omit the stack [17].
Checkpointing is conceptually similar to serialization,
the conversion of an object structure into a flat representation. In Java, serialization is implemented using run-time
reflection. Reflection is used both to determine the static
structure of each object (its type, field names, etc.), and to
access the recorded field values. The structure of an object, however, does not change during execution. Thus,
repetitively determining this information at run time is inefficient. Braux and Noye propose to eliminate the over-
Availability
Examples described in this paper are available at
http://www.irisa.fr/compose/jspec/checkpoint.