A Lightweight Software Control System

for Cyber Awareness and Security

Michele Co, Clark L. Coleman, Jack W. Davidson,
Sudeep Ghosh, Jason D. Hiser, John C. Knight, and
Anh Nguyen-Tuong

University of Virginia

Wednesday, August 12, 2009 University of Virginia


The Problem
• Modern society relies critically on properly
functioning software
– Transportation, power distribution, communication,
water and sanitation systems
• Software is not and cannot be made perfect
– Rigorous testing/validation still yield flawed software
– Software bugs/vulnerabilities are a growing problem
• A method for improving the resilience of
software is needed



Software Dynamic Translation (SDT)

• The programmatic modification of a running program’s binary instructions
• A software layer mediates program execution by modifying (translating) instructions before they execute on the host CPU

[Figure: layered stack, top to bottom: Application Binary, Dynamic Translator, Operating System, CPU]

If built correctly, SDT systems are reliable, transparent, very fast, have low memory overhead, and act as a control system


Overview
• Introduction
• Overview
• Software Dynamic Translation
– Strata
• Summary



Strata Control System Framework
[Figure: Strata control system framework.
Build time: the original application code (binary), the Strata software dynamic translator library, the application-specific sensor library and database, and the goals, control logic, and response actions feed a binary rewriter.
Run time: the resulting controlled application contains the software dynamic translator (Strata), sensors, actuators, the control logic with its goals and response actions, and the translated application code in the fragment cache.]


Strata Virtual Machine
[Figure: Strata virtual machine. The dynamic translator sits between the application binary and the host: context capture, new PC, "cached?" check, new fragment, fetch / decode / translate loop, "finished?" check, next PC, context switch.]


Strata Virtual Machine
[Figure: Strata virtual machine with the fragment cache. At system start the first PC enters the translator: context capture, new PC, "cached?" check, new fragment, fetch / decode / translate loop, "finished?" check, next PC, context switch. The fragment shown is built from non-control instructions, a direct conditional branch, and a trampoline.]

The Takeaway:
• Strata’s fragment construction process for basic blocks ending in conditional branches
• Fragment linking avoids excess overhead related to reentering the translator
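The translator loop in the diagram can be illustrated with a small simulation. This is a minimal sketch, not Strata's implementation: the toy instruction set (`nop`, `jmp`, `halt`) and the names `Instr`, `Translator`, and `build_fragment` are all hypothetical. It only shows that a fragment is built once per entry PC and then reused from the fragment cache. Fragment linking and trampolines are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    op: str          # "nop", "jmp", or "halt" in this toy instruction set
    target: int = -1  # branch target, used only by "jmp"


@dataclass
class Translator:
    program: dict                       # application PC -> Instr
    fragment_cache: dict = field(default_factory=dict)
    translations: int = 0               # how many fragments were built

    def build_fragment(self, pc):
        """Fetch/decode/translate until a control transfer ends the fragment."""
        body = []
        while True:
            instr = self.program[pc]
            body.append(pc)
            if instr.op == "jmp":
                next_pc = instr.target
                break
            if instr.op == "halt":
                next_pc = None
                break
            pc += 1
        self.translations += 1
        return body, next_pc

    def run(self, start_pc, max_fragments=100):
        """Execute fragments, translating only on a fragment-cache miss."""
        trace = []
        pc = start_pc
        for _ in range(max_fragments):
            if pc is None:
                break
            if pc not in self.fragment_cache:          # "Cached?" == no
                self.fragment_cache[pc] = self.build_fragment(pc)
            body, pc = self.fragment_cache[pc]          # run cached fragment
            trace.extend(body)
        return trace
```

Running a two-instruction loop for three fragment executions translates the loop body only once; every later iteration is served from the cache.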


This takes FOREVER, right?
UltraSPARC-IIi Results

[Figure: bar chart of runtime (normalized, y-axis 0.8 to 1.4) for each SPEC CPU2000 benchmark, with floating-point, integer, and overall averages.]

• Average overhead: 4%
• Integer benchmarks: 8%
• Worst-case overhead, most indirect branches: 33%

Evaluating Fragment Construction Policies for SDT Systems in VEE’06


Strata as a Control System


Summary
• Software protects critical infrastructures
and needs to be monitored
• Software Dynamic Translation provides a
useful mechanism for such monitoring
– Detecting memory errors
– Detecting tampering



A Lightweight Software Control System
for Cyber Awareness and Security

Questions?

Jason Hiser
University of Virginia



Backup Slides



Obfuscation and Anti-Tampering
• Anti-tampering (AT): making a program hard to
(meaningfully) modify
• Obfuscation (Obf): making a program hard to
understand
• Why?
– Protect Intellectual Property (IP)
• Preventing reverse engineering or code extraction
• Digital watermarking and fingerprinting
– Digital Rights Management (DRM)
– Security
• Anti-virus
• Anti-hacker
• Insider threats
– Obfuscation used to hide AT techniques



Limitations of Previous
Obf/AT Work
• Static
– Applied once at software build time
– Often NP-Hard if only static information is used, but dynamic
information easily breaks many techniques
• Slow
– High runtime/memory overhead
• Special Hardware
– Trusted network connection with bounded time
– Can trust that the CPU will generate result in bounded time
• Unrealistic threat model
– OS, network, or memory trusted
– Known optimal algorithms to calculate program checksums



The Problem with all Obf/AT work,
even mine!
• For an arbitrary application, strong Obf/AT
is impossible!
– The player holds all the cards; eventually they
will figure out how the application works and
how to change it (Barak’01)
• The good news…
– Sufficient to make an attacker’s job harder
than re-writing the application
– Some functions can be obfuscated (Wee’05,
Hohenberger’07)


Goals
• Significantly stronger Obf/AT algorithms
– Not just against static attacks, but against realistic
hybrid dynamic/static attacks
• No reliance on custom, trusted HW/SW
– Should work on the machine currently in your office
• Low runtime overhead
– 100x slowdown is unacceptable
• If necessary, configurable tradeoff between protection level
and overhead, perhaps on per-module basis



Guards (Atallah’02)
• Code segments that check some property
of the program and react based on the
outcome
– Most commonly, check that the code is
unchanged and subtly fail if tampering is
detected.
– Each guard can protect other guards to form a
network



Guards Example
[Figure: a guard embedded in the application binary; the diagram labels the value EXPECTED_CHECKSUM.]

chksm = 0;
for (int i = start; i < end; i++)
    chksm += *(int*)i;
%ebp += chksm;

• Advantages
– Provides circular protection
– Reasonable overhead
• Disadvantages
– Applied once at link time
– Execution of guard may reveal its location
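The guard idea can be sketched in a few lines. The example below is an illustration under stated assumptions, not the slide's exact code: it sums whole little-endian 32-bit words (stepping four bytes at a time, unlike the slide's `i++`), and it models the "subtly fail" reaction by adding the difference between the computed and expected checksum to some program value. The names `checksum` and `guarded_value` are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def checksum(code: bytes, start: int, end: int) -> int:
    """Sum the little-endian 32-bit words in code[start:end], modulo 2**32."""
    total = 0
    for offset in range(start, end - 3, 4):
        (word,) = struct.unpack_from("<I", code, offset)
        total = (total + word) & MASK32
    return total


def guarded_value(code: bytes, start: int, end: int,
                  expected: int, value: int) -> int:
    """Return value unchanged if the region is intact, perturbed otherwise.

    Assumption: the perturbation stands in for corrupting a register such
    as the frame pointer, so that tampering causes a delayed, indirect
    failure rather than an explicit error.
    """
    delta = (checksum(code, start, end) - expected) & MASK32
    return (value + delta) & MASK32
```

With an intact region the guarded value passes through untouched. Flipping a single byte changes the checksum, so the value is silently corrupted.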


Guards with SDT
[Figure: Strata copies guards from the application binary into the fragment cache (F$).]

• Advantages: Guards are copied to the F$ differently in each run of the program, so execution of a guard does not reveal its location in the application text.
• Disadvantage: An attacker can attempt to attack guards one at a time, and guards still look the same during each execution of the program, even if at different locations.


Guards with SDT
[Figure: Strata copies a guard from the application binary into the fragment cache (F$).]

• Advantages: Guards are copied to the F$ differently in each run of the program, so execution of a guard does not reveal its location in the application text.
• Disadvantage: Guards still look the same during each execution of the program.


Addressing Shortcomings
• Flush the F$ periodically
– Move the guards around
• Encrypt the application code and decrypt on
demand
– Hides app. code from static disassembly/analysis
• Hide key with white-box AES techniques
– To succeed in an attack, encryption blocks must be
modified as a unit
• One-off changes, attacking guards one at a time, or playing
what-if games with single instructions will fail!



But, isn’t the entire app. just in F$?
Preliminary Results: Case Study gcc
[Figure: chart of application text translated (% of total, y-axis 0% to 45%) against runtime (0 to 22 seconds) for four configurations: no flushing, 10s flushing, 1s flushing, 0.1s flushing.]

SDT does well to start with: no more than 45% of the application text is in the F$!
Flushing helps
• Flushing every 1 sec. => less than 10% of app. text in F$
• Flushing every 0.1 sec. => less than 3% of app. text in F$
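Periodic flushing can be pictured as a fragment cache that discards all of its contents once a fixed interval has elapsed. The sketch below is a simplified model, not Strata's mechanism: the class name, the `translate` callback, and the explicit `now` argument (a simulated clock in seconds, passed in so that the behavior is deterministic) are all assumptions.

```python
class FlushingFragmentCache:
    """Toy fragment cache that is emptied every flush_interval seconds."""

    def __init__(self, flush_interval=None):
        self.flush_interval = flush_interval  # None disables flushing
        self.fragments = {}                   # application PC -> fragment
        self.last_flush = 0.0

    def lookup_or_translate(self, app_pc, now, translate):
        # Flush first, so stale fragments (and any guards inside them)
        # are rebuilt, possibly at new locations, on their next use.
        if (self.flush_interval is not None
                and now - self.last_flush >= self.flush_interval):
            self.fragments.clear()
            self.last_flush = now
        if app_pc not in self.fragments:
            self.fragments[app_pc] = translate(app_pc)
        return self.fragments[app_pc]

    def resident_fraction(self, total_blocks):
        """Fraction of the application's blocks currently in the cache."""
        return len(self.fragments) / total_blocks
```

A shorter interval keeps fewer translated blocks resident at any instant, at the cost of retranslating code that is still in use.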


Dynamic transforms
• Apply dynamic Obf/AT transforms on fragments
– Application never runs the same way twice
– Guards appear different each time they execute!
• Examples
– Dynamic disassembly resistance
– Dynamic control flow graph obfuscation
– Dynamic guards
– Instruction morphing
– Algebraic Identities



Dynamic Disassembly Resistance
• Goal – make it harder to disassemble F$

Raw bytes:
8048330: 80 3d ee ac 08 41 00 74 02 8d 84 c3 34 12 84 80

Decoded straight through (the junk bytes are swallowed into a lea):
8048330: 80 3d ee ac 08 41 00    cmpb $0x0, *(0x4108acee)
8048337: 74 02                   je 804833b
8048339: 8d 84 c3 34 12 84 80    lea 0x80841234(%ebx,%eax,8),%eax

Decoded along the path through the je target:
8048330: 80 3d ee ac 08 41 00    cmpb $0x0, *(0x4108acee)
8048337: 74 02                   je 804833b
8048339: 8d 84                   .byte 0x8d 0x84
804833b: c3                      ret
804833c: 34 12 84 80 …

Lightweight transform performed randomly for each fragment build


(Dynamic) Opaque Predicates
• Generate predicates at run time that are hard to decipher after generation

a=a->next;
b=b->next;
c=c->next;
if(a==b)
    ...
else if(b==c)
    ...

[Figure: pointers a, b, and c into a linked structure]
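One common way to build such predicates is through pointer aliasing, in the spirit of the opaque predicates cited in the related work (Collberg’98). The sketch below is a generic illustration, not necessarily what Strata generates: it builds two disjoint rings of randomly chosen size at run time, places `a` in one ring and `b` and `c` together in the other, and advances all three in lockstep. Under those assumptions `a is b` is always false and `b is c` is always true, yet an observer must reason about the heap shape to know that. All names here are hypothetical.

```python
import random


class Node:
    __slots__ = ("next",)

    def __init__(self):
        self.next = None


def make_ring(size):
    """Build a circular singly linked list and return its nodes."""
    nodes = [Node() for _ in range(size)]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % size]
    return nodes


def make_opaque_state(rng):
    """Create pointers a, b, c with invariants hidden in the heap shape."""
    ring1 = make_ring(rng.randint(3, 8))
    ring2 = make_ring(rng.randint(3, 8))
    a = rng.choice(ring1)   # a only ever walks ring1
    b = rng.choice(ring2)   # b only ever walks ring2
    c = b                   # c aliases b and moves in lockstep with it
    return a, b, c


def step(a, b, c):
    """Advance all three pointers, as in the slide's a=a->next; ..."""
    return a.next, b.next, c.next
```

Because the ring sizes and starting nodes are drawn at run time, the concrete pointer values differ from run to run while the two invariants always hold.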


Surely all this must take forever!
Preliminary Results: Flushing + Encryption
[Figure: bar chart of runtime (y-axis 0.8 to 2.4) per benchmark for four configurations: no flushing, 10s flushing, 1s flushing, 0.1s flushing.]

• Flushing every 1 sec. => 3% slower than no flushing
• Flushing every 0.1 sec. => lots of slowdown, but maybe we can improve that
– Selectively flushing
– Using extra CPUs in a multi-core machine


How well does dynamic Obf/AT
protect applications?

• Evaluation is ongoing as part of the CyberTrust’07 grant


Overview
• Introduction
• Overview
• Strata
– SDT Concepts

• SDT Applications
– Obf/AT

• Related Work and Summary



Related Work - SDTs
• SDT Applications
– Security policy enforcement (Code Diversity, Program
Shepherding)
– Software migration (Apple’s Rosetta)
– Dynamic instrumentation (PIN, FIST)
– Dynamic patching and debugging (Arachne)
• SDT Optimizations
– Dynamic optimizers (Dynamo/DynamoRIO, JITs)
• Bala, Duesterwald, Bruening, Suganuma, Arnold, …
– Trace selection:
• NET (Duesterwald’00)
• LEI (Hiniker’05)
– Code cache management (Hazelwood’06)
• Many many more..



Related Work – Obf/AT
• Guards (Atallah ’02)
– Breaking guards (Wurster’05)
– Self-modifying guards (Giffin’05)
• Opaque Predicates (Collberg’98)
• Data Obfuscation (Collberg’98)
• Control flow flattening (Wang’00)
• Dynamic code mutation (Madou’05)
• So many others…



Summary
• Software dynamic translation
– Efficient, powerful technology to dynamically modify
programs
– Low overhead
• With recent optimizations, only 4% slower than native
execution on the Spec2k benchmarks!
• Obfuscation and anti-tampering
– Important for DRM/IP/Security
– Current technology has many shortcomings against
realistic threat models
– Combining previous static techniques with SDT yields
significantly stronger Obf/AT protection



Optimizing Software Dynamic Translation
(for Program Obfuscation and Anti-tampering)

Jason D. Hiser
http://www.cs.virginia.edu/~jdh8d

Questions?



Experimental Setup
• Strata running on 3 machines
– Opteron 244, 1.8GHz, Linux, gcc 4.0
– Xeon, 2.8GHz, Linux, gcc 3.3
– UltraSPARC-IIi, 500MHz, Solaris 5.9,
SUNWspro cc
• Results are compared to native execution (no SDT)
• Indirect branches handled efficiently with
indirect branch translation cache
mechanism



Fast Returns
• Translate call instructions to push fragment
cache return address instead of application ret.
addr.
+ Copy return instructions directly to F$
= Fast returns
Advantages: Very fast, minimal F$ space
Disadvantages: May break some programs with
nonstandard usage of call instruction.
Alternatives: Use IBTC/Sieve or Return Cache



How to Handle IBs, Option 3:
Inline Mappings
• Instructions emitted at each branch to perform translation
• No hashing – compare app. address against inlined addresses

Application Binary:
...
r1 = …
...
jmp r1
...
L0: ...

Fragment Cache:
...
r1 = …
...
save t0
t0 = APPADDR_1
if (r1 == t0)
    jmp FRAGADDR_100
restore t0
t0 = APPADDR_2
if (r1 == t0)
    jmp FRAGADDR_120
restore t0
<backing mechanism>
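The inline-mapping idea reduces to a short chain of equality tests followed by a fallback. The following sketch models that control flow only. The function name `make_inline_lookup`, the address values, and the `backing` callback (standing in for the backing mechanism) are hypothetical, and register save/restore is not modeled.

```python
def make_inline_lookup(inlined, backing):
    """Build a lookup for one indirect branch site.

    inlined: list of (app_addr, frag_addr) pairs, compared in order,
             mirroring the comparisons emitted inline at the branch.
    backing: callable used on a miss (the backing mechanism).
    """
    def lookup(app_addr):
        for candidate, frag_addr in inlined:
            if app_addr == candidate:   # no hashing, no table access
                return frag_addr
        return backing(app_addr)        # fall through to the backing mechanism
    return lookup
```

The cost of a hit depends on how far down the chain the matching entry sits, which is why the choice and ordering of inlined targets matters.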


Indirect Branch Translation Cache
• Table in memory
– Advantage: Small code footprint & minimal branches
– Disadvantage: Memory accesses & data cache pressure
– Other considerations
• Uses two temporary registers & comparison
• Many options
– Sharing (one for all branches or one per branch) => One for all
– Appropriate size (number of entries) => 4K entries
– Resizing (dynamically adjust size) => Not necessary
– Reprobing (where to look on collision) => Space constraints
– Lookup code placement
• Inline in fragment or a separate “function” => Space v. Speed


Sieve
• Table as an instruction sequence
– Advantage: Fewer data memory accesses
– Disadvantage: More branches and possibly pressure
on instruction cache
– Other considerations
• Uses one temporary register
• Uses an address-sized constant compared to register
• Options
– Table size => 16K entries
– Others possible, but seem to not matter


Back to Indirect Branches (IB)

[Figure: Strata virtual machine diagram (fragment cache, application binary, and the translator loop: context capture, new PC, "cached?" check, new fragment, fetch / decode / translate, "finished?" check, next PC, context switch), revisited for indirect branches.]

How necessary is this? Aren’t indirect branches pretty rare?


The “rarity” of IBs

[Figure: per-benchmark chart for the SPEC CPU2000 benchmarks with two labeled quantities, "Millions of Indirect Calls+Switches/Second" (axis 0 to 45) and "Overhead (Normalized to Native)", plus floating-point, integer, and overall averages.]


How to Handle IBs, Option 1:
Indirect Branch Translation Cache
• Mapping done with table in data memory (memory accesses)
– Table entry: <AppAddr, FragAddr>
• Table indexed by application address

Application Binary:
...
r1 = …
...
jmp r1
...
L0: ...

Fragment Cache:
...
r1 = …
...
save t0, t1
t0 = hash(r1)
if (IBTC[t0].AppAddr == r1)
    t1 = IBTC[t0].FragAddr
    jmp t1
restore t0, t1
else
    jmp translator
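A direct-mapped table captures the essence of the IBTC lookup. This is a minimal sketch under assumptions: the hash (`app_addr >> 2` masked to the table size), the replace-on-collision policy, and the `translate` callback standing in for re-entering the translator are illustrative choices, not details confirmed by the slides. The default of 4096 entries follows the 4K-entry size mentioned earlier.

```python
class IBTC:
    """Toy indirect branch translation cache: a direct-mapped data table."""

    def __init__(self, entries=4096):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.table = [None] * entries   # each slot: (app_addr, frag_addr)

    def _index(self, app_addr):
        # Hypothetical hash: drop the low bits, then mask to the table size.
        return (app_addr >> 2) & self.mask

    def lookup(self, app_addr, translate):
        i = self._index(app_addr)
        entry = self.table[i]
        if entry is not None and entry[0] == app_addr:
            return entry[1]                      # hit: jump to the fragment
        frag_addr = translate(app_addr)          # miss: re-enter translator
        self.table[i] = (app_addr, frag_addr)    # replace whatever was there
        return frag_addr
```

Two application addresses that hash to the same slot evict each other, so alternating between them misses every time. That is the kind of behavior the sizing and reprobing options are meant to address.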


How to Handle IBs, Option 2: Sieve
• Mapping done by executing instruction sequence
[Figure: sieve. An indirect branch dispatches into a sieve table of jumps to buckets (Bucket1 through Bucket5). Each bucket is a chain of address comparisons (Addr4, Addr8, Addr10, Addr12, Addr16) that jump to the matching fragment in the fragment cache (Frag10, Frag16, Frag99, Frag111, Frag204); a miss returns to the translator.]
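The sieve can be modeled as a table of buckets, where each bucket is a chain of compare-and-jump stubs. In the real mechanism the chain is executable code. In this sketch it is an ordinary list, which keeps the control flow visible but ignores the instruction-cache effects discussed above. The bucket hash, the class name, and the `translate` callback are hypothetical.

```python
class Sieve:
    """Toy sieve: each bucket is a chain of (app_addr, frag_addr) stubs."""

    def __init__(self, buckets=16384):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, app_addr):
        # Hypothetical dispatch: pick a bucket from the application address.
        return self.buckets[(app_addr >> 2) % len(self.buckets)]

    def lookup(self, app_addr, translate):
        chain = self._bucket(app_addr)
        for addr, frag_addr in chain:    # one compare-and-jump per stub
            if addr == app_addr:
                return frag_addr
        frag_addr = translate(app_addr)  # end of chain: return to translator
        chain.append((app_addr, frag_addr))
        return frag_addr
```

Unlike a direct-mapped table, colliding addresses are chained rather than evicted, so both remain resolvable without re-entering the translator.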


Combined: Inline Mapping
• Inlining mappings at indirect branches
– Advantage: No hashing, no mem. access, min. branches
– Disadvantage: Code growth & hit cost depends on hit entry
– Other considerations
• Possibly one register and constant address comparison to register
• Options (it depends..)
– Number of inline entries
• Should the translator decide the amount of inlining?
– Target to inline
– Execution point when that target should be selected
– Backing mechanism to use (what to do on a miss)


IBTC Vs. Sieve
UltraSPARC-IIi

[Figure: bar chart of runtime overhead (y-axis 0.8 to 1.6) for 177.mesa, 176.gcc, 186.crafty, 252.eon, 253.perlbmk, 254.gap, 255.vortex, and the average, comparing a 32K-entry IBTC with a 1K-entry sieve.]

Sieve: 2 instructions to generate address-sized constant, more control transfers


IBTC Vs. Sieve
Pentium IV Xeon

[Figure: bar chart of runtime overhead (y-axis 0.8 to 2.2) for 177.mesa, 176.gcc, 186.crafty, 252.eon, 253.perlbmk, 254.gap, 255.vortex, and the average, comparing a 32K-entry IBTC, a 32K-entry sieve, and a 16K-entry sieve.]

• Sieve: 1 instruction to generate address-sized constant
• Sieve: No need to save/restore eflags for 16k-entries => Big win!

Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems in CGO’07


Why SDT for Obf/AT?
• Efficient: 2-10% overhead
• Monitors program execution
– Dynamically apply Obf/AT transformations
– Malicious user first has to figure out the SDT,
then the application
• Ever try to debug a program running under a simulator without
source code?!
– The SDT can protect the application, and the application can
protect the SDT: a circular level of trust


Unconditional Direct Branches
[Figure: Strata virtual machine diagram (system start, fragment cache, application binary, and the translator loop) translating a direct unconditional branch.]

Elide direct branches (and calls) to avoid extra instructions


F$ Inefficiencies

• Each conditional branch transfers control
to a trampoline
+ Trampolines patched to jump directly to
target fragment
= 2 F$ branches executed for every one
executed branch in the original program!
• Patched trampolines leave wasted F$
space – reduced locality?
• Possible code duplication
– 100 calls to strcpy() executed lead to 100
copies of the first basic block of strcpy thanks
to partial inlining and unconditional branch
eliding



Improving Performance
[Figure: Strata virtual machine diagram with a trampoline pool (Tramp. Pool) in the fragment cache.]

• Advantages
– One branch in F$ for most branches in application text
– Trampoline pool improves locality
– Trampolines can be recycled
• Disadvantages
– May translate unrequested basic blocks (waste of time and F$ space)


Fragment Construction Policies
1) Conditional branch policies
2) Unconditional branch policies
3) Call policies
• Partial inlining
• Lazy vs. eager target translation
4) Fragment alignment
5) Trampoline placement



Conditional Branch Handling
• Always stop translating
• Always continue translating
• Stop if…
– Target already translated
– Fall through already translated
– Target OR fall through translated
– Target AND fall through translated
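
The policies listed above can be written down as a single decision function. This is a sketch for illustration only: the enum names and the function `should_stop` are hypothetical, and the real translator's decision may consider more state than the two flags used here.

```python
from enum import Enum, auto


class Policy(Enum):
    ALWAYS_STOP = auto()
    ALWAYS_CONTINUE = auto()
    STOP_IF_TARGET = auto()        # target already translated
    STOP_IF_FALLTHROUGH = auto()   # fall through already translated
    STOP_IF_EITHER = auto()        # target OR fall through translated
    STOP_IF_BOTH = auto()          # target AND fall through translated


def should_stop(policy, target_translated, fallthrough_translated):
    """Decide whether fragment construction ends at a conditional branch."""
    if policy is Policy.ALWAYS_STOP:
        return True
    if policy is Policy.ALWAYS_CONTINUE:
        return False
    if policy is Policy.STOP_IF_TARGET:
        return target_translated
    if policy is Policy.STOP_IF_FALLTHROUGH:
        return fallthrough_translated
    if policy is Policy.STOP_IF_EITHER:
        return target_translated or fallthrough_translated
    if policy is Policy.STOP_IF_BOTH:
        return target_translated and fallthrough_translated
    raise ValueError(f"unknown policy: {policy}")
```

For example, with only the branch target already translated, the OR policy stops construction while the AND policy keeps translating.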



Conditional Branches
Opteron 244

[Figure: bar chart of runtime (normalized, y-axis 0.8 to 1.8) for each SPEC CPU2000 benchmark and the floating-point, integer, and overall averages, comparing "always stop" with "always continue".]

“Always continue” reduces overhead from 39% to 28% for integer benchmarks


Partial Inlining/Branch Eliding
• Advantages
– Provides opportunity for optimization
– Eliminates call/branch instructions
• Disadvantages
– Increased code duplication
– Calls not matched with return instructions =>
Bad branch predictor performance!



Partial Inlining Performance
Opteron 244

[Figure: bar chart of runtime (normalized, y-axis 0.8 to 1.8) for each SPEC CPU2000 benchmark and the floating-point, integer, and overall averages, comparing "partial inlining" with "no partial inlining".]

“No partial inlining” reduces overhead from 24% to 10% for integer benchmarks
