Android JIT
Introduction
Just-In-Time (JIT)/Dynamic
Compilation
JIT Design
Dalvik JIT
JIT Compiler
Intermediate Representation
Optimization Techniques
Introduction:
The Java language is designed to be interpreted, in order to achieve the critical goal of application portability.
[Figure: HW.java and other source classes are compiled by javac into HW.class files (bytecode), which the Java Virtual Machine executes]
Just as microprocessors have instruction sets that define the operations they can perform, the VM has its own instruction set; programs are compiled into this format, known as bytecode.
It is through the VM that Java classes, in bytecode form, are executed and ultimately routed to the appropriate native system calls.
Problem:
The conventional approach resulted in significantly lower performance than compiled languages like C/C++, because of the additional processor and memory usage incurred during interpretation.
As a result, slow and space-constrained computing devices have tended not to include virtual computing technology (i.e., the JVM).
Initiatives:
[Figure: JIT architecture — bytecodes flow into the JVM; the Just-In-Time compiler comprises an Intermediate Representation Generator, an Optimizer, a Code Generator, and a Runtime Profiler, alongside the GC]
[Figure: a V-table with entries for Methods 1-4, each pointing to that method's bytecode]
Each address in the V-table points to the executable bytecode for the particular method.
[Figure: the V-table entries for Method 4 and Method 5 now point to the Just-In-Time compiler]
When the VM calls a method through the address in the V-table, the JIT compiler is executed instead.
[Figure: after compilation, the V-table entry for Method 5 points to Method 5's native code]
From now on, each call to the method results in a call to the native version.
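The stub-and-patch mechanism described above can be sketched in plain Java. This is a minimal illustration, not Dalvik's or any real VM's code: the table, the jitStub helper, and the pretend "compiled" method body are all hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Sketch: every V-table entry starts as a stub that "compiles" the method
// on first call, installs the compiled version in the table, and runs it.
// From then on, calls through the table go straight to the compiled code.
class VTableSketch {
    static IntUnaryOperator[] vtable = new IntUnaryOperator[1];
    static int compileCount = 0; // counts how often the "JIT" actually ran

    static IntUnaryOperator jitStub(int slot) {
        return arg -> {
            compileCount++;                         // the JIT compiler is executed instead
            IntUnaryOperator compiled = x -> x * x; // stand-in for the native version
            vtable[slot] = compiled;                // patch the V-table entry
            return compiled.applyAsInt(arg);        // run the freshly compiled code
        };
    }

    static int call(int slot, int arg) {            // the VM calls through the table
        return vtable[slot].applyAsInt(arg);
    }
}
```

Calling the same slot twice "compiles" only once: the second call goes directly to the patched-in code.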
JIT Design:
Challenges (Price of Platform neutrality):
The time it takes to compile the code is added to the program's running time.
JIT typically causes a slight delay in initial execution of an application, due to the
time taken to load and compile the bytecode.
Optimizations:
1. The JVM interprets a method until its call count exceeds a JIT threshold.
2. After a method is compiled, its call count is reset to zero; subsequent calls to the method continue to increment its count.
3. When the call count of a method reaches a JIT recompilation threshold, the JIT compiles it a second time, this time applying a larger selection of optimizations than on the previous compilation (because the method has proven to be a significant part of the whole program).
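The counter scheme in steps 1-3 can be sketched as follows. The class, field names, and threshold values here are hypothetical; real VMs pick thresholds empirically.

```java
// Sketch of tiered compilation driven by per-method call counts:
// interpret until the count crosses JIT_THRESHOLD, compile and reset
// the counter, then recompile with more optimization once the count
// crosses RECOMPILE_THRESHOLD.
class TieredCounter {
    static final int JIT_THRESHOLD = 10;       // made-up value
    static final int RECOMPILE_THRESHOLD = 10; // made-up value
    int callCount = 0;
    int tier = 0; // 0 = interpreted, 1 = compiled, 2 = recompiled/optimized

    void onCall() {
        callCount++;
        if (tier == 0 && callCount > JIT_THRESHOLD) {
            tier = 1;       // first compilation
            callCount = 0;  // "call count is reset to zero"
        } else if (tier == 1 && callCount > RECOMPILE_THRESHOLD) {
            tier = 2;       // recompiled with a larger set of optimizations
        }
    }
}
```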
[Figure: the V-table with Methods 1-4 pointing to their bytecode, with the Just-In-Time compiler ready to intercept calls]
JIT Design (Contd.):
[Figure: with JIT=OFF, the JVM simply interprets .class files; with JIT=ON and Threshold=10, methods called fewer than 10 times are interpreted, while methods called 10 or more times are JIT-compiled to native code that runs on the operating system]
Dalvik JIT:
Dalvik Execution Environment:
1. Register-based architecture (register machine)
Stack-based machines (JVMs) must use instructions to load data on the stack and manipulate that data, and thus require more instructions than register machines.
2. Very compact representation
Java bytecode is converted into an alternate instruction set used by the Dalvik VM.
dx is a tool used to convert some (but not all) Java .class files into the .dex format.
3. Emphasis on code/data sharing to reduce memory usage
Multiple classes are included in a single .dex file.
4. Highly tuned, very fast Dalvik interpreter (roughly 2x a comparable interpreter), good enough for most applications.
For compute-intensive applications, the Native Development Kit was released to allow Dalvik applications to call out to statically compiled (native) methods.
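The instruction-count difference behind point 1 can be sketched with two toy evaluators of `c = a + b`. The class and method names are illustrative, but the opcodes in the comments (iload/iadd/istore, add-int) are real JVM and Dalvik mnemonics.

```java
import java.util.ArrayDeque;

// Sketch: the same addition needs four stack-machine steps but a single
// three-address register-machine instruction.
class StackVsRegister {
    // Stack machine (JVM-style): four instructions for c = a + b.
    static int stackMachine(int a, int b) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(a);                         // iload_1: push a
        stack.push(b);                         // iload_2: push b
        stack.push(stack.pop() + stack.pop()); // iadd: pop two, push sum
        return stack.pop();                    // istore_3: pop into c
    }

    // Register machine (Dalvik-style): one instruction.
    static int registerMachine(int a, int b) {
        int[] v = {0, a, b, 0};
        v[3] = v[1] + v[2];                    // add-int v3, v1, v2
        return v[3];
    }
}
```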
2. Trace Compiler
- Most common model for low-level code migration systems
- Interprets with profiling to identify hot execution paths
- Compiled fragments chained together in translation cache
- Strengths
Only hottest of hot code is compiled, minimizing memory usage
Tight integration with interpreter allows focus on common cases
Very rapid return of performance boost once hotness detected
- Weaknesses
Smaller optimization window limits peak gain
More frequent state synchronization with interpreter
Difficult to share translation cache across processes
[Figure: Method JIT gives the best optimization window; Trace JIT gives the best speed/space tradeoff. Example: full program = 4,695,780 bytes; hot methods = 396,230 bytes (8% of the program); hot traces = 26% of the hot methods (2% of the program)]
[Figure: trace-JIT control flow — starting at a location, check whether a translation exists in the translation cache; if YES, execute the translation (its exits 0/1 chain to further translations or return to the interpreter); if NO, update the profile count for this location; while the count is below the threshold, keep interpreting; once it crosses the threshold, interpret while building a trace request, submit a compilation request to the compiler thread, and install the new translation]
JIT Compiler:
JIT Compiler Work Flow:
In order to execute bytecode, the JIT compiler goes through three stages.
1. Baseline: generates code that is obviously correct.
The process involves generating an internal representation of the Java code that is different from bytecode but at a higher level than the target processor's native instructions: the Intermediate Representation (IR).
The IR allows more effective machine-specific optimizations.
2. Optimizing: applies a set of optimizations to a class when it is loaded at run time.
3. Adaptive: methods are compiled with a non-optimizing compiler first, and hot methods are then selected for recompilation based on run-time profiling information.
A key part of the JIT design was to split the compilation process into two passes. The first pass transforms the standard, stack-based bytecodes into a simple 3-address intermediate representation in which all temporary statement results are placed into new local variables instead of entries on an evaluation stack. The second pass converts this three-address form into native machine code.
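The first pass can be sketched with a toy translator. The instruction set here (push/add/store) is a hypothetical simplification, not real JVM opcodes; the point is that each intermediate result gets a fresh local (t0, t1, ...) instead of an operand-stack slot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Sketch: convert a stack-based instruction sequence into 3-address form,
// naming every temporary result instead of leaving it on a stack.
class ThreeAddressPass {
    static List<String> toThreeAddress(String[] stackCode) {
        ArrayDeque<String> stack = new ArrayDeque<>(); // tracks symbolic operands
        List<String> out = new ArrayList<>();
        int temp = 0;
        for (String insn : stackCode) {
            if (insn.startsWith("push ")) {
                stack.push(insn.substring(5));         // operand goes on the stack
            } else if (insn.equals("add")) {
                String rhs = stack.pop(), lhs = stack.pop();
                String t = "t" + temp++;               // fresh local, not a stack slot
                out.add(t + " := " + lhs + " + " + rhs);
                stack.push(t);
            } else if (insn.startsWith("store ")) {
                out.add(insn.substring(6) + " := " + stack.pop());
            }
        }
        return out;
    }
}
```

For the sequence push x; push 5; add; store y, the pass emits `t0 := x + 5` followed by `y := t0`.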
Intermediate Representation:
An IR must be convenient to translate into real assembly code for all desired target machines.
Example (SSA form): in the code
y := 1
y := 2
x := y
the first assignment of y is not necessary; the value of y being used in the third line comes from the second assignment. SSA form makes this explicit:
y1 := 1
y2 := 2
x := y2
[Figure: IR levels form a spectrum from bytecode down to machine code]
1. IRs that are close to a high-level language are called high-level IRs, and IRs
that are close to assembly are called low-level IRs.
2. A high-level IR might preserve things like array subscripts or field accesses
whereas a low-level IR converts those into explicit addresses and offsets.
Original:
float a[10][20]
... a[i][j+2] ...

HIR:
t1 = a[i, j+2]

MIR:
t1 = j+2
t2 = i*20
t3 = t1+t2
t4 = 4*t3
t5 = addr a
t6 = t5+t4
t7 = *t6

LIR:
r1 = [fp-4]
r2 = r1+2
r3 = [fp-8]
r4 = r3*20
r5 = r4+r2
r6 = 4*r5
f1 = [fp-216+r6]
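The MIR above is just row-major address arithmetic made explicit. A small sketch (with hypothetical helper names) checks the computation for float a[10][20]:

```java
// Sketch: for float a[10][20], element a[i][j+2] lives at byte offset
// 4 * (i*20 + (j+2)) from the array base -- exactly the t1..t4 steps
// of the MIR sequence.
class RowMajorOffset {
    static int byteOffset(int i, int j) {
        int t1 = j + 2;   // t1 = j+2
        int t2 = i * 20;  // t2 = i*20 (20 floats per row)
        int t3 = t1 + t2; // flat element index
        return 4 * t3;    // 4 bytes per float
    }

    // Reference: the same element read from a flattened 20-column array.
    static float load(float[] flat, int i, int j) {
        return flat[i * 20 + (j + 2)];
    }
}
```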
class AdditionMethodTest {
    public static void main(String[] args) {
        int a = 3;
        int b = 4;
        int c = a + b;
        int d = getNewValue(c);
        return;
    } // End method main

    public static int getNewValue(int var) {
        return var * var;
    } // End method getNewValue
}
Java Code → Bytecode → Generated IR.
Example: for the statement y = x + 5, the generated IR (optimization off) is:
INT_ADD tint, xint, 5
INT_MOVE yint, tint
With optimization on, the temporary and the move can be eliminated (e.g., INT_ADD yint, xint, 5).
Optimization Techniques:
Why Optimization:
Optimization Techniques:
1. In-lining (also at lower levels)
2. Specialization
3. Constant folding
4. Constant propagation
5. Value numbering
6. Dead code elimination
7. Loop-invariant code motion
8. Common sub-expression elimination
9. Strength reduction
10. Branch prediction/optimization
11. Register allocation
12. Loop unrolling
13. Cache optimization
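A few of these techniques can be illustrated at the Java source level. A real JIT applies them to the IR, not to the source, and all names in this sketch are hypothetical.

```java
// Source-level illustrations of constant folding (item 3), strength
// reduction (item 9), and loop-invariant code motion (item 7).
class OptimizationExamples {
    // Constant folding: 60 * 60 * 24 is evaluated at compile time.
    static int secondsPerDay() {
        return 60 * 60 * 24; // folded to the constant 86400
    }

    // Strength reduction: replace a multiply with a cheaper shift.
    static int timesEightNaive(int x)   { return x * 8; }
    static int timesEightReduced(int x) { return x << 3; }

    // Loop-invariant code motion: hoist a computation that never
    // changes across iterations out of the loop.
    static int sumScaledNaive(int[] a, int n, int k) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * (k * n); // k * n recomputed every iteration
        }
        return sum;
    }
    static int sumScaledHoisted(int[] a, int n, int k) {
        int scale = k * n;         // computed once, outside the loop
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * scale;
        }
        return sum;
    }
}
```

Both variants of each pair compute the same result; the optimized form simply does less work per call or per iteration.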