0% found this document useful (0 votes)
1K views58 pages

Dalvik VM Internals

This document summarizes the Dalvik virtual machine (VM) used in Android. It discusses how the Dalvik VM aims to be memory efficient by: 1. Sharing common strings, methods, and other constants between applications and system libraries using a shared constant pool in DEX files to minimize repetition. 2. Using per-type pools and implicit typing and labeling to further reduce memory usage. 3. Compressing application sizes by 44-50% on average compared to uncompressed JAR files through the use of optimized DEX files.

Uploaded by

gopishk
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1K views58 pages

Dalvik VM Internals

This document summarizes the Dalvik virtual machine (VM) used in Android. It discusses how the Dalvik VM aims to be memory efficient by: 1. Sharing common strings, methods, and other constants between applications and system libraries using a shared constant pool in DEX files to minimize repetition. 2. Using per-type pools and implicit typing and labeling to further reduce memory usage. 3. Compressing application sizes by 44-50% on average compared to uncompressed JAR files through the use of optimized DEX files.

Uploaded by

gopishk
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 58

Dalvik VM Internals

Dan Bornstein
Google
• Intro

• Memory

• CPU

• Advice

• Conclusion
Dalvík, Iceland
The Big Picture
The Big Picture
What is the Dalvik VM?
It is a virtual machine to…

• run on a slow CPU

• with relatively little RAM

• on an OS without swap space

• while powered by a battery


What is the Dalvik VM?
It is a virtual machine to…

• run on a slow CPU

• with relatively little RAM

• on an OS without swap space

• while powered by a battery


Memory Efficiency

• Intro

• Memory

• CPU

• Advice

• Conclusion
Problem: Memory Efficiency

• total system RAM: 64 MB


• available RAM after low-level startup: 40 MB

• available RAM after high-level services have started: 20 MB

• multiple independent mutually-suspicious processes


• separate address spaces, separate memory

• large system library: 10 MB jar)


Problem: Memory Efficiency

• total system RAM: 64 MB


• available RAM after low-level startup: 40 MB

• available RAM after high-level services have started: 20 MB

• multiple independent mutually-suspicious processes


• separate address spaces, separate memory

• large system library: 10 MB


Dex File Anatomy
header
"Hello World"
"Lcom/google/Blort;" int
"println" string_ids String[]
… com.google.Blort

void fn(int)
type_ids
double fn(Object, int)
String fn() proto_ids String.offset
… Integer.MAX_VALUE

field_ids
PrintStream.println(…)
Collection.size()
… method_ids

class_defs

data
Dex File Anatomy
header
"Hello World"
"Lcom/google/Blort;" int
"println" string_ids String[]
… com.google.Blort

void fn(int)
type_ids
double fn(Object, int)
String fn() proto_ids String.offset
… Integer.MAX_VALUE

field_ids
PrintStream.println(…)
Collection.size()
… method_ids

class_defs

data
Dex File Anatomy
header
"Hello World"
"Lcom/google/Blort;" int
"println" string_ids String[]
… com.google.Blort

void fn(int)
type_ids
double fn(Object, int)
String fn() proto_ids String.offset
… Integer.MAX_VALUE

field_ids
PrintStream.println(…)
Collection.size()
… method_ids

class_defs

data
Dex File Anatomy
header
"Hello World"
"Lcom/google/Blort;" int
"println" string_ids String[]
… com.google.Blort

void fn(int)
type_ids
double fn(Object, int)
String fn() proto_ids String.offset
… Integer.MAX_VALUE

field_ids
PrintStream.println(…)
Collection.size()
… method_ids

class_defs

data
Dex File Anatomy
.jar file
.class file
heterogeneous
constant pool .dex file
string_ids
other data constant pool
type_ids
constant pool
.class file proto_ids
heterogeneous constant pool
constant pool field_ids
constant pool
method_ids
other data
constant pool

.class file
heterogeneous other data
constant pool

other data
Shared Constant Pool
public interface Zapper {
public String zap(String s, Object o);
}
public class Blort implements Zapper {
public String zap(String s, Object o) {
...;
}
}
public class ZapUser {
public void useZap(Zapper z) {
z.zap(...);
}
}
Shared Constant Pool
Original .class files
class Zapper class Blort
"Zapper" "SourceFile" "Blort" "Zapper"
"zap" "java/lang/Object" method ref "zap" "LineNumberTable"
"Zapper.java" "Blort.java" "(Ljava/lang/String;Ljava/lang/
"(Ljava/lang/String;Ljava/lang/ "Code" Object;)Ljava/lang/String;"
Object;)Ljava/lang/String;"
method ref "java/lang/Object"

"<init>" "SourceFile"
"()V"
class ZapUser
"ZapUser" "Zapper" "ZapUser.java"
method ref "zap" "LineNumberTable"
"(Ljava/lang/String;Ljava/lang/
"Code" Object;)Ljava/lang/String;"
method ref "java/lang/Object"

"<init>" "SourceFile"
"()V"
"useZap" "(LZapper;)V"
Shared Constant Pool
Original .class files
class Zapper class Blort
"Zapper" "SourceFile" "Blort" "Zapper"
"zap" "java/lang/Object" method ref "zap" "LineNumberTable"
"Zapper.java" "Blort.java" "(Ljava/lang/String;Ljava/lang/
"(Ljava/lang/String;Ljava/lang/ "Code" Object;)Ljava/lang/String;"
Object;)Ljava/lang/String;"
method ref "java/lang/Object"

"<init>" "SourceFile"
"()V"
class ZapUser
"ZapUser" "Zapper" "ZapUser.java"
method ref "zap" "LineNumberTable"
"(Ljava/lang/String;Ljava/lang/
"Code" Object;)Ljava/lang/String;"
method ref "java/lang/Object"

"<init>" "SourceFile"
"()V"
"useZap" "(LZapper;)V"
Shared Constant Pool
Original .class files
class Zapper class Blort
"Zapper" "SourceFile" "Blort" "Zapper"
"zap" "java/lang/Object" method ref "zap" "LineNumberTable"
"Zapper.java" "Blort.java" "(Ljava/lang/String;Ljava/lang/
"(Ljava/lang/String;Ljava/lang/ "Code" Object;)Ljava/lang/String;"
Object;)Ljava/lang/String;"
method ref "java/lang/Object"

"<init>" "SourceFile"
"()V"
class ZapUser
"ZapUser" "Zapper" "ZapUser.java"
method ref "zap" "LineNumberTable"
"(Ljava/lang/String;Ljava/lang/
"Code" Object;)Ljava/lang/String;"
method ref "java/lang/Object"

"<init>" "SourceFile"
"()V"
"useZap" "(LZapper;)V"
Shared Constant Pool
.dex file
method id "useZap"
"Blort.java"
"LZapUser;" proto id "Zapper.java"
"ZapUser.java"
method id
"LZapper;"
"<init>"
proto id "V"
method id
method id method id
"LBlort;"

method id "zap"
method id
proto id "Ljava/lang/String;"

"Ljava/lang/Object;"
Shared Constant Pool
Memory is saved via…

• minimal repetition

• per-type pools (implicit typing)

• implicit labeling
Size Comparison
common system libraries
(U) 21445320 — 100% (U) uncompressed jar file
(J) 10662048 — 50% (J) compressed jar file
(D) 10311972 — 48% (D) uncompressed dex file

web browser app


(U) 470312 — 100%
(J) 232065 — 49%
(D) 209248 — 44%

alarm clock app


(U) 119200 — 100%
(J) 61658 — 52%
(D) 53020 — 44%
4 Kinds Of Memory

• clean vs. dirty

• clean: mmap()ed and unwritten

• dirty: malloc()ed

• shared vs. private


• shared: used by many processes

• private: used by only one process


4 Kinds Of Memory
• clean (shared or private)
• common dex files (libraries)

• application-specific dex files

• shared dirty
• ???

• private dirty
• application “live” dex structures

• application heap
Enter The Zygote

• nascent VM process

• starts at boot time

• preloads and preinitializes classes

• fork()s on command
Enter The Zygote
Zygote Maps
Zygote heap
Maps dex file Browser
(shared dirty,
copy-on-write; (mmap()ed) Browser dex file
Home
rarely written)
Maps live code (mmap()ed) Home dex file
and heap
Browser live (mmap()ed)
core library dex (private dirty) code and heap
files
Home live code
shared from (private dirty) and heap
(mmap()ed) Zygote
shared from (private dirty)
Zygote
shared from
"live" core Zygote
libraries

(shared dirty;
read-only)
4 Kinds Of Memory
• clean (shared or private)
• common dex files (libraries)

• application-specific dex files

• shared dirty
• library “live” dex structures

• shared copy-on-write heap (mostly not written)

• private dirty
• application “live” dex structures

• application heap
GC And Sharing
embedded separated
mark bits mark bits
mark bits
object data
object data
parallel
mark bits
mark bits object data

object data
object data
mark bits

object data object data

. .
. .
. .
GC And Sharing

• separate process, separate heaps, separate GCs

• GCs must be independent

• GC should respect the sharing!


GC And Sharing
Mark bits kept separate from other heap memory.

object data
parallel
• avoids un-sharing pages
object data
mark bits

• better small cache behavior


object data
doesn’t waste memory
object data

.
.
.
CPU Efficiency

• Intro

• Memory

• CPU

• Advice

• Conclusion
Problem: CPU Efficiency

• CPU speed: 250-500MHz

• bus speed: 100MHz

• data cache: 16-32K

• available RAM for apps: 20 MB


No JIT

• usually doesn’t matter

• lots of native code


• system provides libs for graphics, media

• JNI available

• hardware support common (graphics, audio)


Install-Time Work

• verification
• dex structures aren’t “lying”
• valid indices

• valid offsets

• code can’t misbehave


Install-Time Work

• optimization
• byte-swapping and padding (unnecessary on ARM)

• static linking

• “inlining” special native methods

• pruning empty methods

• adding auxiliary data


Register Machine
Why?

• avoid instruction dispatch

• avoid unnecessary memory access

• consume instruction stream efficiently


• higher semantic density per instruction
Register Machine
The stats

• 30% fewer instructions

• 35% fewer code units

• 35% more bytes in the instruction stream


• but we get to consume two at a time
Example #1: Source

public static long sumArray(int[] arr) {


long sum = 0;
for (int i : arr) {
sum += i;
}
return sum;
}
Example #1: .class
0000: lconst_0
0001:
0002:
lstore_1
aload_0
• 25 bytes
0003: astore_3
0004: aload_3 • 14 dispatches
0005: arraylength
0006:
0008:
istore 04
iconst_0
• 45 reads
0009:
000b:
istore 05
iload 05 // rl ws • 16 writes
000d: iload 04 // rl ws
000f: if_icmpge 0024 // rs rs
0012: aload_3 // rl ws
0013: iload 05 // rl ws
0015: iaload // rs rs ws
0016: istore 06 // rs wl
0018: lload_1 // rl rl ws ws
0019: iload 06 // rl ws
001b: i2l // rs ws ws read stack
001c: ladd // rs rs rs rs ws ws
001d: lstore_1 // rs rs wl wl write stack
001e: iinc 05, #+01 // rl wl
0021: goto 000b read local write local
0024: lload_1
0025: lreturn
Example #1: .dex
• 18 bytes
0000: const-wide/16 v0, #long 0 • 6 dispatches
0002: array-length v2, v8
0003: const/4 v3, #int 0 • 19 reads
0004: move v7, v3 • 6 writes
0005: move-wide v3, v0
0006: move v0, v7
0007: if-ge v0, v2, 0010 // r r
0009: aget v1, v8, v0 // r r w
000b: int-to-long v5, v1 // r w w
000c: add-long/2addr v3, v5 // r r r r w w
000d: add-int/lit8 v0, v0, #int 1 // r w
000f: goto 0007
0010: return-wide v3
Example #2: Source

private static final int[] S33KR1T_1NF0RM4T10N = {


0x4920616d, 0x20726174,
0x68657220, 0x666f6e64,
0x206f6620, 0x6d756666,
0x696e732e
};
Example #2: .class
0000: bipush #+07
0002: newarray int
0004: dup
0005: iconst_0
0006:
0008:
ldc #+4920616d
iastore ...
0009: dup
000a:
000b:
iconst_1
ldc #+20726174 dup
000d: iastore
000e:
000f:
dup
iconst_2 bipush #+NN
0010: ldc #+68657220
0012:
0013:
iastore
dup
ldc #VVVVVVVV
0014: iconst_3
0015:
0017:
ldc #+666f6e64
iastore
iastore
0018:
0019:
001a:
dup
iconst_4
ldc #+206f6620
...
001c: iastore
001d: dup
001e: iconst_5
001f: ldc #+6d756666
0021: iastore
0022: dup
0023: bipush #+06
0025: ldc #+696e732e
0027: iastore
0028: putstatic Example2.S33KR1T_1NF0RM4T10N:[I
002b: return
Example #2: Hack!
private static final int[] S33KR1T_1NF0RM4T10N;
static {
String s =
"\u4920\u616d\u2072\u6174\u6865" +
"\u7270\u666f\u6e64\u206f\u6620" +
"\u6d75\u6666\u696e\u732e";
S33KR1T_1NF0RM4T10N = new int[7];
for (int i = 0, j = 0; i < 7; i++, j += 2) {
S33KR1T_1NF0RM4T10N[i] =
(s.charAt(j) << 16) | s.charAt(j+1);
}
}
Example #2: .dex
0000: const/4 v0, #int 7 // #7
0001: new-array v0, v0, int[]
0003: fill-array-data v0, 000a
0006: sput-object v0,
Example2.S33KR1T_1NF0RM4T10N:int[]
0008: return-void
0009: nop // spacer
000a: array-data // for fill-array-data @ 0003
0: 1226858861 // #4920616d
1: 544366964 // #20726174
2: 1751478816 // #68657220
3: 1718578788 // #666f6e64
4: 544171552 // #206f6620
5: 1836410470 // #6d756666
6: 1768846126 // #696e732e
0026:
Example #2: .dex
0000: const/4 v0, #int 7 // #7
0001: new-array v0, v0, int[]
0003: fill-array-data v0, 000a
0006: sput-object v0,
Example2.S33KR1T_1NF0RM4T10N:int[]
0008: return-void
0009: nop // spacer
000a: array-data // for fill-array-data @ 0003
0: 1315272293 // #4e657665
1: 1914726255 // #7220676f
2: 1852727584 // #6e6e6120
3: 1734964837 // #67697665
4: 544829301 // #20796f75
5: 544567355 // #2075703b
6: 544105846 // #206e6576
7: 1701978215 // #65722067
8: 1869508193 // #6f6e6e61
9: 543974772 // #206c6574
10: 544829301 // #20796f75
11: 543453047 // #20646f77
12: 1848520238 // #6e2e2e2e
003e:
Interpreters 101
The portable way
static void interp(const char* s) {
for (;;) {
switch (*(s++)) {
case 'a': printf("Hell"); break;
case 'b': printf("o"); break;
case 'c': printf(" w"); break;
case 'd': printf("rld!\n"); break;
case 'e': return;
}
}
}
int main(int argc, char** argv) {
interp("abcbde");
}
Interpreters 101
The gcc way
#define DISPATCH() \
{ goto *op_table[*((s)++) - 'a']; }
static void interp(const char* s) {
static void* op_table[] = {
&&op_a, &&op_b, &&op_c, &&op_d, &&op_e
};
DISPATCH();
op_a: printf("Hell"); DISPATCH();
op_b: printf("o"); DISPATCH();
op_c: printf(" w"); DISPATCH();
op_d: printf("rld!\n"); DISPATCH();
op_e: return;
}
Interpreters 101
ARM assembly

op_table:
Two memory reads
.word op_a
.word op_b
...
#define DISPATCH() ldrb r0, [rPC], #1 \
ldr pc, [rOP_TABLE, r0, lsl #2]
op_a: ...
DISPATCH()
op_b: ...
DISPATCH()
...
Interpreters 101
ARM assembly (cleverer)
One memory read

#define DISPATCH() ldrb r0, [rPC], #1 \


add pc, rFIRST_OP, r0, lsl #6
.align 64
op_a: // address gets stored in rFIRST_OP
... up to 16 instructions ...
op_b:
... up to 16 instructions ...
op_c:
... up to 16 instructions ...
...
Optimizing Your Code

• Intro

• Memory

• CPU

• Advice

• Conclusion
Time Scale
• human interaction scale
• 10-30 interactions / sec

• human perception scale


• 25-30 image frames / sec

• continuous audio, synched within 100 msec

• computer scale
• run as much and as fast as possible
Get Plenty Of Rest
A well-behaved app…

• spends most of its time sleeping

• reacts quickly and decisively to user and network input


Loop Wisely
(1) for (int i = initializer; i >= 0; i--)
(2) int limit = calculate limit;
for (int i = 0; i < limit; i++)
(3) Type[] array = get array;
for (Type obj : array)
(4) for (int i = 0; i < array.length; i++)
(5) for (int i = 0; i < this.var; i++)
(6) for (int i = 0; i < obj.size(); i++)
(7) Iterable<Type> list = get list;
for (Type obj : list)
Loop Wisely
(1) for (int i = initializer; i >= 0; i--)
(2) int limit = calculate limit;
for (int i = 0; i < limit; i++)
(3) Type[] array = get array;
for (Type obj : array)
(4) for (int i = 0; i < array.length; i++)
(5) for (int i = 0; i < this.var; i++)
(6) for (int i = 0; i < obj.size(); i++)
(7) Iterable<Type> list = get list;
for (Type obj : list)
Loop Wisely
(1) for (int i = initializer; i >= 0; i--)
(2) int limit = calculate limit;
for (int i = 0; i < limit; i++)
(3) Type[] array = get array;
for (Type obj : array)
(4) for (int i = 0; i < array.length; i++)
(5) for (int i = 0; i < this.var; i++)
Danger! Danger!
(6) for (int i = 0; i < obj.size(); i++)
(7) Iterable<Type> list = get list;
for (Type obj : list)

Danger! Danger!
Avoid Allocation

• short-lived objects need to be GCed

• long-lived objects take precious memory


That’s all!

• Intro

• Memory

• CPU

• Advice

• Conclusion
Questions?

You might also like