A_lightweight_software_control_system_fo
University of Virginia
[Figure: Strata as a software control system. Labeled components: original application code and library, the software dynamic translator (Strata), sensor library and sensors, control logic and database, application-specific goals, actuator library, response actions, fragment cache, and the controlled application.]
[Figure: the Strata translation loop, with stages labeled New PC, Cached?, New Fragment, Fetch, Decode, Translate, Next PC, Finished?, and Context Switch.]
The Takeaway:
• Strata’s fragment construction process for basic blocks ending in conditional branches
• Fragment linking avoids excess overhead related to reentering the translator
[Figure labels: non-control instruction, direct conditional branch, trampoline]
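The effect of fragment linking can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not Strata's code: `ToyTranslator`, the `successors` map, and the single-successor blocks are all hypothetical. Each new fragment ends in a trampoline back to the translator; with linking enabled, that trampoline is patched into a direct jump once its target has been translated.

```python
class ToyTranslator:
    """Toy SDT loop that counts how often control re-enters the translator."""

    def __init__(self, successors, link_fragments):
        self.successors = successors        # app address -> next app address (toy program)
        self.link_fragments = link_fragments
        self.fragment_cache = {}            # app address -> fragment record
        self.translator_entries = 0

    def _translate(self, pc):
        # A new fragment ends in a trampoline that returns to the translator.
        self.fragment_cache[pc] = {"target": self.successors[pc], "linked": False}

    def run(self, start, steps):
        pc, previous, via_translator = start, None, True
        for _ in range(steps):
            if via_translator:
                self.translator_entries += 1
                if pc not in self.fragment_cache:
                    self._translate(pc)
                if previous is not None and self.link_fragments:
                    # Patch the previous fragment's trampoline into a direct jump.
                    previous["linked"] = True
            fragment = self.fragment_cache[pc]
            previous = fragment
            via_translator = not fragment["linked"]
            pc = fragment["target"]
        return self.translator_entries


if __name__ == "__main__":
    loop = {0: 1, 1: 2, 2: 0}  # three blocks forming a loop
    print(ToyTranslator(loop, link_fragments=False).run(0, 30))  # 30
    print(ToyTranslator(loop, link_fragments=True).run(0, 30))   # 4
```

In this toy loop, the unlinked run enters the translator on every block execution, while the linked run enters it only until every trampoline in the loop has been patched.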
This takes FOREVER, right?
UltraSPARC-IIi Results
[Figure: normalized runtime per benchmark, y-axis from 0.8 to 1.4, with integer, floating-point, and overall averages]
• Average overhead: 4%
• Integer benchmarks: 8%
• Worst-case overhead, most indirect branches: 33%
Source: Evaluating Fragment Construction Policies for SDT Systems, in VEE’06
Strata as a Control System
Questions?
Jason Hiser
University of Virginia
chksm = 0;
for (int i = start; i < end; i++)
    chksm += *(int*)i;
%ebp += chksm;
[Figure label: EXPECTED_CHECKSUM]
• Advantages
– Provides circular protection
– Reasonable overhead
• Disadvantages
– Applied once at link time
– Execution of guard may reveal its location
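A checksum guard of this kind can be sketched as follows. The slide does not show how EXPECTED_CHECKSUM enters the computation, so this sketch assumes the guard adds the difference between the computed and expected checksum to the frame pointer: intact code leaves it unchanged, tampered code corrupts it. The function names are hypothetical. Unlike the slide's loop, which advances one byte at a time, this version sums non-overlapping 32-bit little-endian words and assumes the range length is a multiple of 4.

```python
import struct


def checksum_words(code, start, end):
    """Sum the 32-bit little-endian words in code[start:end], modulo 2**32."""
    total = 0
    for offset in range(start, end, 4):
        (word,) = struct.unpack_from("<I", code, offset)
        total = (total + word) & 0xFFFFFFFF
    return total


def run_guard(code, start, end, expected, frame_pointer):
    """Assumed guard behaviour: perturb the frame pointer by the checksum error."""
    delta = (checksum_words(code, start, end) - expected) & 0xFFFFFFFF
    return (frame_pointer + delta) & 0xFFFFFFFF


if __name__ == "__main__":
    code = bytes(range(16))                 # stand-in for a guarded code region
    expected = checksum_words(code, 0, 16)  # would be computed at link time
    print(hex(run_guard(code, 0, 16, expected, 0x1000)))  # unchanged: 0x1000

    tampered = bytearray(code)
    tampered[5] ^= 0xFF                     # simulate a patched instruction byte
    print(run_guard(bytes(tampered), 0, 16, expected, 0x1000) != 0x1000)  # True
```

Folding the check into a register the program depends on avoids an explicit compare-and-branch, although, as the slide notes, executing the guard may still reveal its location.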
[Figure: application text present in the fragment cache (% of total, 0% to 45%) versus runtime (0 to 22 seconds), for no flushing and for flushing every 10 s, 1 s, and 0.1 s]
SDT does well to start with: even without flushing, no more than 45% of application text is in the fragment cache (F$)!
Flushing helps
• Flushing every 1 sec. => less than 10% of app. text in F$
• Flushing every 0.1 sec. => less than 3% of app. text in F$
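The trade-off behind periodic flushing can be sketched with a toy trace. This is an illustration only: the trace, the tick-based flush interval, and the function names are hypothetical and unrelated to the measured results above. The model runs 100 startup blocks once, then a 5-block hot loop, and reports how many blocks remain in the fragment cache at the end and how many translations were needed.

```python
def toy_trace():
    """100 startup blocks executed once, then a 5-block hot loop run 200 times."""
    startup = list(range(100))
    hot_loop = [100 + (i % 5) for i in range(1000)]
    return startup + hot_loop


def simulate(trace, flush_interval=None):
    """Return (blocks resident at the end, total translations performed)."""
    cache = set()
    translations = 0
    for tick, block in enumerate(trace):
        if flush_interval and tick > 0 and tick % flush_interval == 0:
            cache.clear()              # periodic flush of the whole fragment cache
        if block not in cache:
            cache.add(block)           # translate the block on a cache miss
            translations += 1
    return len(cache), translations


if __name__ == "__main__":
    print(simulate(toy_trace()))                     # (105, 105)
    print(simulate(toy_trace(), flush_interval=50))  # (5, 200)
```

With flushing, only the hot loop stays resident and the startup code leaves the cache, at the cost of retranslating the hot blocks after each flush.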
8048330: 80 3d ee ac 08 41 00 74 02 8d 84 c3 34 12 84 80
[Figure: normalized runtime per benchmark (y-axis 0.8 to 1.8) for no flushing and for flushing every 10 s, 1 s, and 0.1 s]
• SDT Applications
– Obf/AT
Jason D. Hiser
http://www.cs.virginia.edu/~jdh8d
Questions?
Application Binary:
    ...
    r1 = …
    ...
    jmp r1
    ...
L0: ...

Fragment Cache:
    ...
    r1 = …
    ...
    save t0
    t0 = APPADDR_1
    if (r1 == t0)
        jmp FRAGADDR_100
    restore t0
    t0 = APPADDR_2
    if (r1 == t0)
        jmp FRAGADDR_120
    restore t0
    <backing mechanism>
The “rarity” of IBs
[Figure: millions of indirect calls + switches per second (y-axis 0 to 45) per benchmark, with integer, floating-point, and overall averages; data labels range from 0.9 to 39.5]
How to Handle IBs, Option 1:
Indirect Branch Translation Cache
• Mapping done with table in data memory (memory accesses)
– Table entry: <AppAddr, FragAddr>
• Table indexed by application address
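An IBTC lookup can be sketched as a direct-mapped table of <AppAddr, FragAddr> pairs. This is a minimal sketch under assumptions the slide does not state: the table size, the hash (drop the low two address bits, then mask), the replace-on-conflict policy, and the `translate` fallback are all hypothetical.

```python
class IBTC:
    """Direct-mapped table of (app_addr, frag_addr) entries."""

    def __init__(self, size, translate):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.mask = size - 1
        self.entries = [None] * size
        self.translate = translate   # fallback: ask the translator for the fragment
        self.misses = 0

    def lookup(self, app_addr):
        index = (app_addr >> 2) & self.mask      # assumed hash of the application address
        entry = self.entries[index]
        if entry is not None and entry[0] == app_addr:
            return entry[1]                      # hit: jump straight to the fragment
        self.misses += 1                         # miss: fall back and refill the slot
        frag_addr = self.translate(app_addr)
        self.entries[index] = (app_addr, frag_addr)
        return frag_addr


if __name__ == "__main__":
    ibtc = IBTC(4, translate=lambda addr: 0x70000000 + addr)
    ibtc.lookup(0x1000)   # miss
    ibtc.lookup(0x1000)   # hit
    ibtc.lookup(0x1010)   # maps to the same slot, evicts 0x1000
    ibtc.lookup(0x1000)   # miss again
    print(ibtc.misses)    # 3
```

A hit costs a few memory accesses for the table probe, which matches the slide's point that the mapping lives in data memory.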
[Figure: runtime overhead (y-axis 0.8 to 1.5) for 177.mesa, 176.gcc, 186.crafty, 252.eon, 253.perlbmk, 254.gap, 255.vortex, and the graph average]
[Figure: runtime overhead (y-axis 0.8 to 2) for 177.mesa, 176.gcc, 186.crafty, 252.eon, 253.perlbmk, 254.gap, 255.vortex, and the graph average]
... branches in application text
– Trampoline pool improves locality
– Trampolines can be recycled
• Disadvantages
– May translate unrequested basic blocks (waste of time and F$ space)
[Figure: the Strata translation loop with a trampoline pool (Tramp. Pool)]
Wednesday, August 12, 2009, University of Virginia
Fragment Construction Policies
1) Conditional branch policies
2) Unconditional branch policies
3) Call policies
• Partial inlining
• Lazy vs. eager target translation
4) Fragment alignment
5) Trampoline placement
Conditional Branches
[Figure: normalized runtime per benchmark (y-axis 0.8 to 1.8) on an Opteron 244, comparing the "always stop" and "always continue" policies]
“Always continue” reduces overhead from 39% to 28% for integer benchmarks
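The two conditional-branch policies can be contrasted with a toy fragment builder. The slides do not define the policies, so this sketch assumes that "always stop" ends the fragment at a conditional branch with trampolines for both outcomes, while "always continue" keeps translating the fall-through path into the same fragment. The program encoding and the emitted tuples are hypothetical.

```python
def build_fragment(program, start, policy):
    """Build one fragment from a toy program.

    program maps consecutive integer addresses to (kind, target), where kind is
    "op" (non-control), "cond" (direct conditional branch), or "end" (an
    instruction that always terminates the fragment). Every path must reach "end".
    """
    if policy not in ("always_stop", "always_continue"):
        raise ValueError(policy)
    fragment = []
    pc = start
    while True:
        kind, target = program[pc]
        if kind == "op":
            fragment.append(("copy", pc))
        elif kind == "cond":
            # The taken path always leaves through a trampoline.
            fragment.append(("branch_to_trampoline", target))
            if policy == "always_stop":
                fragment.append(("trampoline", pc + 1))  # fall-through exits too
                return fragment
            # always_continue: fall through and keep translating inline
        else:
            fragment.append(("end", pc))
            return fragment
        pc += 1


if __name__ == "__main__":
    program = {0: ("op", None), 1: ("cond", 5), 2: ("op", None), 3: ("end", None)}
    print(build_fragment(program, 0, "always_stop"))
    print(build_fragment(program, 0, "always_continue"))
```

Under these assumptions, "always continue" saves the fall-through trampoline but translates the fall-through block before it is known to execute.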
Partial Inlining/Branch Eliding
• Advantages
– Provides opportunity for optimization
– Eliminates call/branch instructions
• Disadvantages
– Increased code duplication
– Calls not matched with return instructions =>
Bad branch predictor performance!
Partial Inlining Performance
[Figure: normalized runtime per benchmark (y-axis 0.8 to 1.8) on an Opteron 244, comparing partial inlining with no partial inlining]
“No partial inlining” reduces overhead from 24% to 10% for integer benchmarks