Edgar Barbosa
  H2HC 2011
São Paulo - Brazil
Who am I?
                      
 Edgar Barbosa
 Senior Security Researcher at COSEINC (Singapore)
 One of the developers of Blue Pill, a hardware-based
  virtualization rootkit. Also presented a way to detect this type
  of rootkit.
 Discovered the Windows kernel KdVersionBlock data structure
  used by some forensic tools.
 Focus: RCE, Windows Internals, Virtualization and Program
  Analysis.
 Currently working on the COSEINC SMT Project, which aims
  to automate the bug finding process with the help of SMT
  solvers. The current presentation is part of the research done for
  the SMT project.
Control Flow Analysis
 Control Flow Analysis (CFA)
 Static analysis technique to discover the hierarchical flow of
  control within a procedure (function).
 Analysis of all possible execution paths inside a program or
  procedure.
 Represents the control structure of the procedure using
  Control Flow Graphs.
 Compiler theory - optimization
 The focus of this presentation is to demonstrate CFA for
  Reverse Code Engineering, where the source code isn’t
  available.
RCE and CFA
           
 Executable (binary format) -> Disassembler -> Extract control flow information -> Control Flow Graph
What is a CFG?
                
 A Control Flow Graph (CFG) is a directed graph
  G(V, E) consisting of a set of vertices (nodes) V
  and a set of edges E, which indicate possible flow of
  control between nodes.
 Equivalently, it is a directed graph that represents a superset of
  all possible execution paths of a procedure.
 Graph nodes represent objects called Basic Blocks
  (BB).
CFG
 Nodes
 Edges: each directed edge runs from its tail node to its head node
BinNavi
             
 Views
 Nodes
 Edges
CFG properties
                
 In the CFA literature the algorithms assume the following
  CFG properties:
    Unique Start node (Entry node)
    All the nodes must be reachable from the START node.
    Unique Exit node
 Real-world:
    Easy to find multiple exit nodes (RETURN) on the
     disassembly of a function
 Create a new exit node, add it to the graph and modify
  the return instructions to jump to the new node.
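
The exit-node fix described above can be sketched in a few lines. This is only an illustration under assumptions: the CFG is stored as a dictionary mapping each node to its successor list, a RETURN block is any node without successors, and the node names and the `EXIT` label are hypothetical.

```python
from collections import deque

def add_unique_exit(cfg, exit_name="EXIT"):
    """Return a copy of cfg in which every node without successors
    (a RETURN block) gets an edge to one new synthetic exit node."""
    new_cfg = {node: list(succs) for node, succs in cfg.items()}
    for node, succs in cfg.items():
        if not succs:
            new_cfg[node].append(exit_name)
    new_cfg[exit_name] = []
    return new_cfg

def reachable(cfg, start):
    """Set of nodes reachable from start (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        for succ in cfg[queue.popleft()]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

# Hypothetical function with two RETURN blocks, "ret1" and "ret2".
cfg = {"entry": ["a", "b"], "a": ["ret1"], "b": ["ret2"],
       "ret1": [], "ret2": []}
fixed = add_unique_exit(cfg)

# After the fix there is one exit node and every node is reachable
# from the entry node, as the CFA algorithms assume.
exits = [node for node, succs in fixed.items() if not succs]
all_reachable = reachable(fixed, "entry") == set(fixed)
```

The original graph is left untouched, so the same CFG can still be shown to the analyst with its real RETURN blocks.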
BB identification
                
 In general, the problem of discovering all the
  possible execution paths of a piece of code is undecidable (cf.
  the Halting problem).
 The first step of CFG reconstruction is to identify all the
  basic blocks.
 A basic block is a maximal sequence of instructions
  that can be entered only at the first of them and
  exited only from the last of them
Basic Block (BB)
       
Basic Blocks
                      
 First instruction of a BB (the leader instruction):
   1.   The entry point of the routine
   2.   The target of a branch instruction
   3.   The instruction immediately following a branch
 Although CALL is a branch instruction, the target
  function is assumed to always return and therefore it is
  allowed in the middle of a BB.
 To build the BB’s we need to identify all the leader
  instructions. This requires the disassembly of the
  instructions.
 Two disassembly algorithms
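
The three leader rules can be sketched as follows, assuming the instructions have already been disassembled. The tuple layout `(address, mnemonic, branch_target)`, the mnemonic set and the sample listing are assumptions made for this illustration, not the output of any real disassembler.

```python
# Mnemonics treated as branches in this sketch. CALL is deliberately
# left out: the callee is assumed to return, so CALL may sit mid-block.
BRANCHES = {"jmp", "jz", "jnz", "ret"}

def find_leaders(instructions):
    """instructions: list of (address, mnemonic, branch_target_or_None),
    sorted by address. Returns the sorted leader addresses."""
    leaders = {instructions[0][0]}                     # rule 1: entry point
    for i, (addr, mnem, target) in enumerate(instructions):
        if mnem in BRANCHES:
            if target is not None:
                leaders.add(target)                    # rule 2: branch target
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])    # rule 3: next instruction
    return sorted(leaders)

def basic_blocks(instructions):
    """Group instruction addresses into blocks, starting a new block
    at every leader."""
    leaders = set(find_leaders(instructions))
    blocks = []
    for addr, _mnem, _target in instructions:
        if addr in leaders:
            blocks.append([])
        blocks[-1].append(addr)
    return blocks

# Hypothetical listing: the jz at 0x02 skips over the call/inc pair.
listing = [
    (0x00, "cmp", None),
    (0x02, "jz", 0x08),
    (0x04, "call", 0x100),
    (0x06, "inc", None),
    (0x08, "dec", None),
    (0x0A, "ret", None),
]
leaders = find_leaders(listing)
blocks = basic_blocks(listing)
```

The CALL at 0x04 does not end its block, which matches the assumption stated in the slide that the target function always returns.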
1. Linear Sweep
                  
 A linear sweep algorithm starts with the first byte in the
  code section and decodes one instruction after another until an
  illegal instruction is encountered. [a]




>> 8B FF 55 8B EC 8B 45 08

8B FF         mov    edi, edi
55            push   ebp
8B EC         mov    ebp, esp
8B 45 08      mov    eax, [ebp+8]
2. Recursive Traversal
             
 Linear sweep algorithm doesn’t take into account the
  control flow behaviour of some instructions.
>> EB 01 FF 8B 45 FC

 EB 01      jmp short 0x401020
 FF         ???     ;invalid
 Recursive traversal disassemblers interpret branch
  instructions in the program to translate only those
  bytes which can actually be reached by control flow.   [b]
2. Recursive Traversal
           
EB 01 FF 8B 45 FC



EB 01    jmp short 0x401020
FF       ???   (UNREACHABLE)
8B 45 FC mov eax, dword ptr ss:[ebp-4]
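
The difference between the two algorithms can be reproduced with a toy decoder. The sketch below is not a real x86 disassembler: `decode` is a hypothetical helper that only knows the few encodings appearing in these slides, and the only branch it understands is the unconditional `jmp short`.

```python
def decode(code, off):
    """Toy decoder: returns (length, mnemonic, jump_target) or None.
    Only the handful of encodings used in the slides are recognised."""
    b = code[off]
    if b == 0x55:
        return 1, "push ebp", None
    if b == 0xEB and off + 1 < len(code):            # jmp short rel8
        rel = code[off + 1]
        rel = rel - 256 if rel >= 0x80 else rel      # sign-extend rel8
        return 2, "jmp short", off + 2 + rel
    if b == 0x8B and off + 1 < len(code):
        modrm = code[off + 1]
        if modrm in (0xFF, 0xEC):                    # register-to-register mov
            return 2, "mov r32, r32", None
        if modrm == 0x45 and off + 2 < len(code):    # mov eax, [ebp+disp8]
            return 3, "mov eax, [ebp+disp8]", None
    return None

def linear_sweep(code):
    """Decode sequentially from offset 0 until an invalid instruction."""
    off, found = 0, []
    while off < len(code):
        ins = decode(code, off)
        if ins is None:
            break
        found.append(off)
        off += ins[0]
    return found

def recursive_traversal(code, entry=0):
    """Decode only offsets reachable by control flow from the entry."""
    found, worklist = set(), [entry]
    while worklist:
        off = worklist.pop()
        while 0 <= off < len(code) and off not in found:
            ins = decode(code, off)
            if ins is None:
                break
            found.add(off)
            length, _mnem, target = ins
            if target is not None:       # unconditional jump: follow the target only
                worklist.append(target)
                break
            off += length
    return sorted(found)

prologue = bytes.fromhex("8BFF558BEC8B4508")
tricky = bytes.fromhex("EB01FF8B45FC")

sweep_prologue = linear_sweep(prologue)
sweep_tricky = linear_sweep(tricky)
recursive_tricky = recursive_traversal(tricky)
```

On the second byte sequence the linear sweep stops at the `FF` byte, while the recursive traversal skips it and reaches the `mov` at offset 3. A real implementation would also queue the fall-through address of conditional branches.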
State-of-the-art CFG
          reconstruction
                
 Once the basic blocks have been identified, the CFG
  is completed by adding the edges between them.
 CFG construction is especially difficult when the
  code includes indirect calls (e.g. call dword ptr [eax]).
 A state-of-the-art CFG construction tool is the open-source
  Jakstab (Java Toolkit for Static Analysis
  of Binaries) from Johannes Kinder.
 Provides better results than IDA Pro.
Jakstab   [d]




   
Self-modifying code
        
     Control Flow Analysis
Self-modifying code
              
 Consider the following example (not real x86 opcodes)
                                   [c]



      Address      Assembly               Binary
      0x0          movb 0xc 0x8           c6 0c 08
      0x3          inc %ebx               40 01
      0x5          movb 0xc 0x5           c6 0c 05
      0x8          inc %edx               40 03
      0xa          push %ecx              ff 02
      0xc          dec %ebx               48 01
 A linear sweep or recursive traversal algorithm execution on
  the above code would result in a single Basic Block (single
  entry/single exit/no branches)
SMC
       CFG 1            CFG 2          CFG 3


0x0   movb 0xc 0x8   movb 0xc 0x8   movb 0xc 0x8
0x3   inc %ebx
0x5   movb 0xc 0x5
0x8   inc %edx       inc %ebx       inc %ebx
0xa   push %ecx      movb 0xc 0x5   jmp 0xc
0xc   dec %ebx       jmp 0x3

                                    jmp 0x3

                     push %ecx      push %ecx
                     dec %ebx


                                    dec %ebx
SE-CFG
                    
 State-Enhanced Control Flow Graph (SE-CFG)
 CFG augmented with extensions to support SMC.
 Allows the use of control flow analysis algorithms
  for SMC.
 “A Model for Self-Modifying Code”
 Codebyte extensions – Codebyte conditional edges
 Implemented in a link-time binary rewriter: Diablo.
 It can be downloaded from
   http://www.elis.ugent.be/diablo
SMC - CFG
             
            movb 0xc 0x8

            inc %ebx


jmp 0xc     movb 0xc 0x5


            inc %edx       jmp 0x3
            push %ecx


            dec %ebx
Dominators
   
 Control Flow Analysis
Dominance relation
            
 Relation about the nodes of a control flow graph.
 “Node A dominates Node B if every path from the
  entry node to B includes A”.
 Representation: A dom B
 Properties:
   Antisymmetric (if A dom B and B dom A, then A = B)
   Reflexive (A dom A)
   Transitive (If A dom B and B dom C then A dom C)
 Can be represented by a tree, the Dominator Tree.
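
As an illustration of the dominance relation, here is a minimal sketch of the simple iterative data-flow algorithm (not the faster Lengauer-Tarjan algorithm). It assumes the CFG is a dictionary of successor lists and that every node is reachable from the entry. The sample graph and its node names are hypothetical.

```python
def dominators(cfg, entry):
    """Map each node to the set of nodes that dominate it."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def immediate_dominators(cfg, entry):
    """Parent of each node in the dominator tree: the strict dominator
    that is itself dominated by all other strict dominators."""
    dom = dominators(cfg, entry)
    return {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in cfg if n != entry}

# Hypothetical CFG: an if/else diamond (a -> b|c -> d) inside a loop (d -> a).
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
       "d": ["a", "exit"], "exit": []}
dom = dominators(cfg, "entry")
idom = immediate_dominators(cfg, "entry")
```

Here `d` is dominated by `a` but by neither `b` nor `c`, because each of the two branches can be bypassed through the other.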
Control Flow Graph
                 
Entry
Node




Exit
node
Dominator Tree
     
Implementations
              
 Classic reference:
    Lengauer-Tarjan algorithm
 Boost C++ library
 Immunity Debugger
    libcontrolflow.py
      Class DominatorTree
 BinNavi API
    GraphAlgorithms getDominatorTree()
    DEMO: Gui plugin
Natural loops
                 
 We can use the Dominator Tree to identify loops.
 Locate the back edges
 Back edge:
   An edge whose head dominates its tail.
 A loop consists of all nodes dominated by its entry node
   (the head of the back edge) from which the entry node
   can be reached.
 These loops are named Natural Loops.
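
A minimal sketch of back-edge detection and natural-loop collection follows. It reuses the simple iterative dominator computation and assumes a CFG stored as a dictionary of successor lists with every node reachable from the entry. The graph is a hypothetical example, and this is not the code of the Immunity Debugger or BinNavi implementations.

```python
def dominators(cfg, entry):
    """Iterative dominator sets; cfg maps node -> successor list."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def back_edges(cfg, entry):
    """Edges tail -> head whose head dominates their tail."""
    dom = dominators(cfg, entry)
    return [(tail, head) for tail, succs in cfg.items()
            for head in succs if head in dom[tail]]

def natural_loop(cfg, tail, head):
    """The head plus every node that can reach the tail without
    passing through the head (backward search from the tail)."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    loop, stack = {head}, [tail]
    while stack:
        n = stack.pop()
        if n not in loop:
            loop.add(n)
            stack.extend(preds[n])
    return loop

# Hypothetical CFG: the edge d -> a closes a loop around an if/else diamond.
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
       "d": ["a", "exit"], "exit": []}
edges = back_edges(cfg, "entry")
loops = {edge: natural_loop(cfg, *edge) for edge in edges}
```

Because the backward search stops at the head, nodes outside the loop such as `entry` and `exit` are never collected.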
Loop
 Header




Back Edge
ImmunityDbg !findloop
         
         ImmDbg\PyCommands\findloop.py
Strongly connected
    components
        
     Control Flow Analysis
SCC
                        
 SCC: Strongly connected components
 A directed graph is called strongly
  connected if there is a path from each vertex to every
  other vertex.
 A strongly connected component is a maximal strongly connected subgraph.
 The nodes of any loop belong to a single strongly connected component.
SCC
            a


    b                   This graph is
                    d   not strongly
c
                        connected.
        e


                f
SCC
            a
                    SCC

    b
                          But it contains
                          a subgraph
                    d
c                         which is
        e                 strongly-
                          connected.
                f
SCC - algorithms
               
 Tarjan's algorithm
   fast (a single depth-first traversal), but more complex
 Kosaraju-Sharir algorithm
   simple, but slower than Tarjan’s algorithm (two traversals)
 Implementations are available for many languages:
   C#/Python/Lua/Ruby/Java
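
As an example of the simpler of the two, here is a sketch of the Kosaraju-Sharir algorithm: one depth-first pass records the finishing order, and a second pass over the reversed graph collects the components. The graph below reuses the node names a to f from the earlier slides, but its edges are an assumption, since the original drawing is not recoverable from the text.

```python
def kosaraju_scc(graph):
    """Strongly connected components of a directed graph given as
    {node: [successors]}. Returns a list of sets."""
    order, seen = [], set()

    def visit(start):
        # Iterative DFS that appends nodes to `order` as they finish.
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, successors = stack[-1]
            for succ in successors:
                if succ not in seen:
                    seen.add(succ)
                    stack.append((succ, iter(graph[succ])))
                    break
            else:
                order.append(node)
                stack.pop()

    for n in graph:
        if n not in seen:
            visit(n)

    reverse = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            reverse[s].append(n)

    # Second pass: explore the reversed graph in decreasing finish time.
    components, assigned = [], set()
    for n in reversed(order):
        if n in assigned:
            continue
        comp, stack = set(), [n]
        assigned.add(n)
        while stack:
            v = stack.pop()
            comp.add(v)
            for p in reverse[v]:
                if p not in assigned:
                    assigned.add(p)
                    stack.append(p)
        components.append(comp)
    return components

# Assumed edges: b -> c -> e -> b forms the only cycle.
graph = {"a": ["b", "d"], "b": ["c"], "c": ["e"], "d": ["f"],
         "e": ["b", "f"], "f": []}
components = kosaraju_scc(graph)
nontrivial = [c for c in components if len(c) > 1]
```

Components with more than one node (or a single node with a self-loop) are the candidates for loops. The result does not say how loops nest inside a component.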
Interval Analysis
       
    Control Flow Analysis
Regions and intervals
            
 Unfortunately, SCC isn’t able to identify nested loops.
 Interval Analysis
    Divides the CFG into regions and consolidates them into
     new nodes (abstract nodes), resulting in an abstract flowgraph.
 We need to identify regions and pre-intervals.
 Region:
    A region in a flow graph is a subgraph H with a unique
     entry node h.
 Pre-Interval:
    A pre-interval in a flow graph is a region <H,h> such that
     every cycle (loop) in H includes the header h.
 Similar to a unique-entry SCC.
Nested Intervals
T1/T2 transformations
           
 Reduction of graphs
 We can collapse the nodes of a region into a single
  node. This is done with the T1/T2 transformations. If we apply
  them to all loops, the graph becomes cycle-free.
 Cycle-free graphs are easier to analyze.
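
The two transformations can be sketched as a reducibility test: T1 removes a self-loop, T2 merges a node into its unique predecessor, and a graph that collapses to a single node is reducible. The sketch assumes a CFG stored as a dictionary of successor lists with every node reachable from the entry. Both sample graphs are hypothetical, the second one modelled on a two-entry loop.

```python
def is_reducible(cfg, entry):
    """Apply T1 and T2 until neither applies; True if one node remains."""
    succs = {n: set(s) for n, s in cfg.items()}
    changed = True
    while changed:
        changed = False
        for n in succs:                        # T1: delete self-loops
            if n in succs[n]:
                succs[n].discard(n)
                changed = True
        for n in list(succs):                  # T2: merge into unique predecessor
            if n == entry:
                continue
            preds = [p for p in succs if n in succs[p]]
            if len(preds) == 1:
                p = preds[0]
                succs[p].discard(n)
                succs[p] |= succs.pop(n)       # p inherits n's outgoing edges
                changed = True
                break                          # restart so T1 can clean up
    return len(succs) == 1

# Single-entry loop: a is the only way into the cycle a <-> b.
natural = {"entry": ["a"], "a": ["b"], "b": ["a", "exit"], "exit": []}

# Two-entry loop: the cycle a <-> b can be entered at a or at b.
improper = {"e": ["s"], "s": ["a", "b"], "a": ["b"], "b": ["a", "r"], "r": []}

natural_ok = is_reducible(natural, "entry")
improper_ok = is_reducible(improper, "e")
```

On the second graph the reduction gets stuck with three nodes: both `a` and `b` keep two predecessors, so T2 can never be applied to either of them.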
Interval analysis
            
 DEMO
GOTO considered
               harmful…
                       




http://xkcd.com/292/
Irreducible graphs
               
 All the loops identified by the previous methods
  (dominator tree/interval analysis) are called natural
  loops.
 They are unique-entry loops.
 There is another type of loop, with multiple entries:
    irreducible graphs or improper regions
Irreducible graph
                
                      e
                                  Loop (a , b)
Entry
                                  2 entries! b or a
                  s


              a           b



Return
                              r
Irreducible graphs
              
 Who codes like that?
   Anyone who uses GOTO
   It is rare, but it does exist
      notepad.exe
      ntoskrnl.exe (Windows Kernel)
 What’s the problem?
   Most of the algorithms are unable to handle
    irreducible graphs!!! Including Interval analysis.
   Can’t apply T1/T2
TranslateString
                           
int *__stdcall TranslateString(int a1)
{
    wchar_t v1; // cx@1
    …
    if ( v1 )
    {
        while ( 1 )
        {
            v5 = &v22 + v26;
            …
LABEL_49:                       // jump inside the WHILE statement
            v1 = *(_WORD *)v7;
            …
        }
    }
    goto LABEL_49;
}
Solutions
                    
 There are 2 main solutions to handle irreducible
  graphs:
    Structural Analysis
    DJ-Graphs
Structural Analysis
        
     Control Flow Analysis
Structural Analysis
              
 Structural analysis will identify the main language
  constructs inside a flow graph using region schemas.
 Do you want to build your own decompiler?
    Hex-Rays decompiler internally uses Structural
     Analysis
 Created by Micha Sharir
 Reference paper:
    Structural analysis: a new approach to flow analysis in
     optimizing compilers (1979)
Acyclic schemas
       
Cyclic schemas
       
DJ-Graphs
                    
 Another way to handle irreducible graphs.
 It is also able to identify all types of structures,
  including improper regions and nested structures.
 Combines the dominator tree with the
  original flowgraph, using two types of
  edges:
    D edges (dominator tree edges)
    J edges (join edges: the remaining flowgraph edges)
 Paper: Identifying loops using DJ graphs. [e]
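
Constructing the two edge sets can be sketched as follows. The D edges come from the dominator tree, and a flowgraph edge x -> y is taken as a J edge when x is not the immediate dominator of y. The sketch covers only the construction, not the loop-identification algorithm of the paper. It assumes a CFG stored as a dictionary of successor lists with every node reachable from the entry, and the sample graph is a hypothetical two-entry loop.

```python
def dominators(cfg, entry):
    """Iterative dominator sets; cfg maps node -> successor list."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def dj_graph(cfg, entry):
    """Return (d_edges, j_edges) as sets of (source, destination) pairs."""
    dom = dominators(cfg, entry)
    # Immediate dominator: the strict dominator deepest in the tree.
    idom = {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in cfg if n != entry}
    d_edges = {(idom[n], n) for n in idom}
    j_edges = {(x, y) for x, succs in cfg.items() for y in succs
               if y == entry or idom[y] != x}
    return d_edges, j_edges

# Hypothetical two-entry loop: the cycle a <-> b is entered from s at a or b.
cfg = {"e": ["s"], "s": ["a", "b"], "a": ["b"], "b": ["a", "r"], "r": []}
d_edges, j_edges = dj_graph(cfg, "e")
```

In this graph the only J edges are a -> b and b -> a. Neither edge's head dominates its tail, so a back-edge search over the dominator tree alone finds no loop here, while the J edges still expose the cycle.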
DJ-Graphs
    
Applications
                  
 Taint analysis
    Control dependency (dominators, post-dominators)
 Diff Slicing
    Execution Indexing (view the CFG as a grammar)
      Execution alignment
    Identification of root causes of software crashes
 Decompilation
 Code coverage
 Bug finding
References
                             
 a - http://www.usenix.org/event/usenix03/tech/full_papers/prasad/prasad_html/node5.html
 b - An Abstract Interpretation-Based Framework for Control Flow Reconstruction
  from Binaries
 c - Bertrand Anckaert, Matias Madou, and Koen De Bosschere. 2006. A model
    for self-modifying code. In Proceedings of the 8th international conference on
    Information hiding (IH'06)
 d - http://www.jakstab.org/
 e - Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee. 1996. Identifying
  loops using DJ graphs. ACM Trans. Program. Lang. Syst. 18, 6 (November 1996), 649-658.
 f - Advanced Compiler Design and Implementation - Steven Muchnick
 g - Notes on Graph Algorithms Used in Optimizing Compilers - Carl D. Offner
Questions?
                 
 Contact: edgarmb@gmail.com
            edgar@research.coseinc.com
 twitter: @embarbosa
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Ad

Control Flow Analysis

  • 1. Edgar Barbosa H2HC 2011 São Paulo - Brazil
  • 2. Who am I?
      • Edgar Barbosa
      • Senior Security Researcher at COSEINC (Singapore)
      • One of the developers of Blue Pill, a hardware-based virtualization rootkit. Also presented a way to detect this type of rootkit.
      • Discovered the Windows kernel KdVersionBlock data structure used by some forensic tools.
      • Focus: RCE, Windows Internals, Virtualization and Program Analysis.
      • Currently working on the COSEINC SMT Project, which aims to automate the bug finding process with the help of SMT solvers. This presentation is part of the research done for the SMT project.
  • 4. Control Flow Analysis (CFA)
      • Static analysis technique to discover the hierarchical flow of control within a procedure (function).
      • Analysis of all possible execution paths inside a program or procedure.
      • Represents the control structure of the procedure using Control Flow Graphs.
      • Comes from compiler theory, where it is used for optimization.
      • The focus of this presentation is to demonstrate CFA for Reverse Code Engineering, where the source code isn’t available.
  • 5. RCE and CFA: Executable (binary format) → Disassembler → Extract control flow information → Control Flow Graph
  • 6. What is a CFG?
      • A Control Flow Graph (CFG) is a directed graph G(V, E) which consists of a set of vertices (nodes) V and a set of edges E, which indicate possible flow of control between nodes.
      • Equivalently, it is a directed graph that represents a superset of all possible execution paths of a procedure.
      • Graph nodes represent objects called Basic Blocks (BB).
  • 7. CFG Nodes
  • 8. Edges [Figure: edges drawn as arrows, each labelled with its tail and its head]
  • 9. CFG Edges
  • 10. BinNavi
      • Views
      • Nodes
      • Edges
  • 11. CFG properties
      • In the CFA literature the algorithms assume the following CFG properties:
          • Unique Start node (Entry node)
          • All nodes of the graph must be reachable from the START node.
          • Unique Exit node
      • Real-world:
          • It is easy to find multiple exit nodes (RETURN) in the disassembly of a function.
          • Create a new exit node, add it to the graph and modify the return instructions to jump to the new node.
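The normalization described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's tooling: the CFG is assumed to be a plain dictionary mapping each block name to its successor list, a block without successors is assumed to be a RETURN block, and the names `reachable`, `normalize` and `EXIT` are hypothetical.

```python
from collections import deque


def reachable(succ, start):
    """Return the set of nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def normalize(succ, entry, exit_name="EXIT"):
    """Drop unreachable nodes and route every return block to one new exit node."""
    live = reachable(succ, entry)
    cfg = {n: list(succ.get(n, ())) for n in live}
    for node, outs in cfg.items():
        if not outs:                 # no successors: treated as a RETURN block
            outs.append(exit_name)
    cfg[exit_name] = []
    return cfg


# Hypothetical function with two return blocks and one unreachable block.
raw = {
    "entry": ["a", "b"],
    "a": ["ret1"],
    "b": ["ret2"],
    "ret1": [],
    "ret2": [],
    "dead": ["a"],
}
cfg = normalize(raw, "entry")
print(sorted(cfg))
```

After the call, both return blocks point at the single `EXIT` node and the unreachable block is gone. Blocks that never return (for example an infinite loop) are not handled by this sketch.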
  • 12. BB identification
      • In general, the problem of discovering all the possible execution paths of a program is undecidable (cf. the Halting problem).
      • The first step of CFG reconstruction is to identify all the basic blocks.
      • A basic block is a maximal sequence of instructions that can be entered only at the first of them and exited only from the last of them.
  • 14. Basic Blocks
      • The first instruction of a BB (the leader instruction) is one of:
          1. The entry point of the routine
          2. The target of a branch instruction
          3. The instruction immediately following a branch
      • Although CALL is a branch instruction, the target function is assumed to always return and therefore it is allowed in the middle of a BB.
      • To build the BBs we need to identify all the leader instructions. This requires the disassembly of the instructions.
      • There are two disassembly algorithms.
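The three leader rules translate directly into code. The sketch below assumes an already-disassembled routine given as a list of hypothetical `(address, mnemonic, branch_target)` tuples; the mnemonic set and the sample routine are invented for illustration.

```python
BRANCHES = {"jmp", "jz", "jnz", "ret"}   # toy set; CALL is deliberately excluded


def find_leaders(instructions):
    """Apply the three leader rules to a list sorted by address."""
    leaders = {instructions[0][0]}                     # rule 1: routine entry point
    for i, (addr, mnemonic, target) in enumerate(instructions):
        if mnemonic in BRANCHES:
            if target is not None:
                leaders.add(target)                    # rule 2: branch target
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])    # rule 3: instruction after a branch
    return leaders


def basic_blocks(instructions):
    """Split the instruction list at every leader."""
    leaders = find_leaders(instructions)
    blocks, current = [], []
    for ins in instructions:
        if ins[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    if current:
        blocks.append(current)
    return blocks


# Hypothetical routine: the CALL stays in the middle of a block.
code = [
    (0x00, "mov", None),
    (0x02, "cmp", None),
    (0x04, "jz", 0x0A),
    (0x06, "call", 0x100),
    (0x08, "jmp", 0x0C),
    (0x0A, "inc", None),
    (0x0C, "ret", None),
]
for block in basic_blocks(code):
    print([hex(addr) for addr, _, _ in block])
```

The routine splits into four blocks starting at 0x00, 0x06, 0x0A and 0x0C. Indirect branches, whose targets are not known statically, are outside the scope of this sketch.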
  • 15. 1. Linear Sweep
      • A linear sweep algorithm starts with the first byte in the code section and proceeds by decoding each byte until an illegal instruction is encountered. [a]
            >> 8B FF 55 8B EC 8B 45 08
               8B FF       mov edi, edi
               55          push ebp
               8B EC       mov ebp, esp
               8B 45 08    mov eax, [ebp+8]
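To make the sweep concrete, here is a toy version in Python. The decoder is an assumption made for illustration: it only understands `push ebp` (0x55) and the two `mov r32, r/m32` (0x8B) forms that occur in the slide's bytes, which is nowhere near a real x86 decoder.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code, off):
    """Toy decoder: return (length, text), or None for anything it does not know."""
    op = code[off]
    if op == 0x55:
        return 1, "push ebp"
    if op == 0x8B and off + 1 < len(code):               # mov r32, r/m32
        modrm = code[off + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 3:                                      # register to register
            return 2, f"mov {REGS[reg]}, {REGS[rm]}"
        if mod == 1 and rm != 4 and off + 2 < len(code):  # [reg+disp8], no SIB byte
            disp = int.from_bytes(code[off + 2:off + 3], "little", signed=True)
            return 3, f"mov {REGS[reg]}, [{REGS[rm]}{disp:+d}]"
    return None


def linear_sweep(code):
    """Decode instructions back to back until the buffer ends or decoding fails."""
    off, listing = 0, []
    while off < len(code):
        decoded = decode(code, off)
        if decoded is None:
            listing.append((off, "???"))
            break
        length, text = decoded
        listing.append((off, text))
        off += length
    return listing


for off, text in linear_sweep(bytes.fromhex("8BFF558BEC8B4508")):
    print(f"{off:02x}  {text}")
```

The sweep never looks at where control actually goes: it simply advances by the length of each decoded instruction, which is why data or junk bytes placed between instructions derail it.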
  • 16. 2. Recursive Traversal
      • The linear sweep algorithm doesn’t take into account the control flow behaviour of some instructions.
            >> EB 01 FF 8B 45 FC
               EB 01       jmp short 0x401020
               FF          ???  ; invalid
      • Recursive traversal disassemblers interpret branch instructions in the program to translate only those bytes which can actually be reached by control flow. [b]
  • 17. 2. Recursive Traversal
            >> EB 01 FF 8B 45 FC
               EB 01       jmp short 0x401020
               FF          ???  (UNREACHABLE)
               8B 45 FC    mov eax, dword ptr ss:[ebp-4]
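The same example can be run through a toy recursive traversal. As before, the decoder is a deliberately tiny assumption (only `jmp short` and `mov r32, [reg+disp8]`), and addresses are buffer offsets rather than the 0x401020-style virtual addresses shown on the slide.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code, off):
    """Toy decoder: return (length, text, branch_target) or None if unknown."""
    op = code[off]
    if op == 0xEB and off + 1 < len(code):                # jmp short rel8
        rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
        target = off + 2 + rel
        return 2, f"jmp short {target:#x}", target
    if op == 0x8B and off + 2 < len(code):                # mov r32, [reg+disp8]
        modrm = code[off + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 1 and rm != 4:
            disp = int.from_bytes(code[off + 2:off + 3], "little", signed=True)
            return 3, f"mov {REGS[reg]}, [{REGS[rm]}{disp:+d}]", None
    return None


def recursive_traversal(code, entry=0):
    """Decode only the offsets that control flow can reach from `entry`."""
    listing, worklist = {}, [entry]
    while worklist:
        off = worklist.pop()
        while 0 <= off < len(code) and off not in listing:
            decoded = decode(code, off)
            if decoded is None:
                listing[off] = "???"
                break
            length, text, target = decoded
            listing[off] = text
            if target is not None:
                # Unconditional jump: follow the target, never the fall-through.
                # A conditional branch would queue both successors.
                worklist.append(target)
                break
            off += length
    return sorted(listing.items())


for off, text in recursive_traversal(bytes.fromhex("EB01FF8B45FC")):
    print(f"{off:02x}  {text}")
```

The byte at offset 2 (FF) is never decoded because no control flow path reaches it, so the `mov` that follows it is recovered correctly.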
  • 18. State-of-the-art CFG reconstruction
      • Once the basic blocks have been identified, the CFG is completed by adding the edges between them.
      • CFG construction is especially difficult when the code includes indirect calls (call dword ptr [eax]).
      • The state-of-the-art CFG construction tool available is the open-source Jakstab (Java Toolkit for Static Analysis of Binaries) from Johannes Kinder.
      • It provides better results than IDA Pro.
  • 19. Jakstab [d] 
  • 20. Self-modifying code  Control Flow Analysis
  • 21. Self-modifying code
      • Consider the following example (not real x86 opcodes) [c]
            Address   Assembly          Binary
            0x0       movb 0xc 0x8      c6 0c 08
            0x3       inc %ebx          40 01
            0x5       movb 0xc 0x5      c6 0c 05
            0x8       inc %edx          40 03
            0xa       push %ecx         ff 02
            0xc       dec %ebx          48 01
      • Running a linear sweep or recursive traversal algorithm on the above code would result in a single Basic Block (single entry/single exit/no branches).
  • 22. SMC [Figure: three control flow graphs of the example above, labelled CFG 1, CFG 2 and CFG 3. Besides the instructions of the static listing, the panels contain jmp 0x3 and jmp 0xc.]
  • 23. SE-CFG
      • State-Enhanced Control Flow Graph (SE-CFG)
      • CFG augmented with extensions to support SMC.
      • Allows the use of control flow analysis algorithms for SMC.
      • Introduced in “A Model for Self-Modifying Code” [c]
      • Extensions: codebytes and codebyte conditional edges
      • Implemented in a link-time binary rewriter: Diablo.
      • It can be downloaded from http://www.elis.ugent.be/diablo
  • 24. SMC - CFG [Figure: CFG of the self-modifying example with the nodes movb 0xc 0x8, inc %ebx, jmp 0xc, movb 0xc 0x5, inc %edx, jmp 0x3, push %ecx, dec %ebx]
  • 26. Dominators  Control Flow Analysis
  • 27. Dominance relation
      • A relation between the nodes of a control flow graph.
      • “Node A dominates Node B if every path from the entry node to B includes A”.
      • Representation: A dom B
      • Properties:
          • Antisymmetric (if A dom B and B dom A, then A = B)
          • Reflexive (A dom A)
          • Transitive (if A dom B and B dom C, then A dom C)
      • Can be represented by a tree, the Dominator Tree.
  • 28. Control Flow Graph [Figure: example control flow graph with the Entry node and Exit node labelled]
  • 30. Implementations
      • Classic reference: Lengauer-Tarjan algorithm
      • Boost C++ library
      • Immunity Debugger: libcontrolflow.py, class DominatorTree
      • BinNavi API: GraphAlgorithms getDominatorTree()
      • DEMO: GUI plugin
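For small graphs the dominance relation can be computed with a simple fixed-point iteration instead of Lengauer-Tarjan. The sketch below is that simpler textbook method, not the algorithm used by the tools mentioned in the talk; it assumes a successor dictionary in which every node appears as a key and every node is reachable from the entry. The example graph is hypothetical.

```python
def dominators(succ, entry):
    """Iterative data-flow computation of the dominator set of every node."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            pred[s].add(n)

    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]              # dominators shared by all predecessors
            new.add(n)                     # reflexive: n dom n
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom):
    """idom(n) is the strict dominator of n that itself has the most dominators."""
    idom = {}
    for n, ds in dom.items():
        strict = ds - {n}
        if strict:
            idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


# Hypothetical CFG: a diamond (a -> b | c -> d) inside a loop d -> a.
cfg = {
    "entry": ["a"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["a", "exit"],
    "exit": [],
}
dom = dominators(cfg, "entry")
idom = immediate_dominators(dom)
print(sorted(dom["d"]), idom["d"])
```

Neither `b` nor `c` dominates `d`, because `d` can be reached through either arm of the diamond. The `idom` mapping is the parent relation of the Dominator Tree.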
  • 31. Natural loops
      • We can use the Dominator Tree to identify loops.
      • Locate the back edges.
      • Back edge: an edge whose head dominates its tail.
      • A loop consists of all nodes dominated by its entry node (the head of the back edge) from which the entry node can be reached.
      • These loops are named Natural Loops.
  • 33. ImmunityDbg !findloop: ImmDbg\PyCommands\findloop.py
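A short sketch of this procedure follows. It uses a simple iterative dominator computation and the usual construction of a natural loop (the header plus every node that can reach the tail of the back edge without passing through the header). The function names and the nested-loop graph are hypothetical, and all nodes are assumed reachable from the entry.

```python
def predecessors(succ):
    pred = {n: set() for n in succ}
    for n, outs in succ.items():
        for s in outs:
            pred[s].add(n)
    return pred


def dominators(succ, entry):
    """Iterative dominator sets."""
    nodes, pred = set(succ), predecessors(succ)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def natural_loops(succ, entry):
    """Map each back edge (tail, head) to the node set of its natural loop."""
    dom, pred = dominators(succ, entry), predecessors(succ)
    loops = {}
    for tail, outs in succ.items():
        for head in outs:
            if head in dom[tail]:          # back edge: head dominates tail
                body, stack = {head}, [tail]
                while stack:               # walk backwards from the tail up to the head
                    n = stack.pop()
                    if n not in body:
                        body.add(n)
                        stack.extend(pred[n])
                loops[(tail, head)] = body
    return loops


# Hypothetical CFG with an inner loop (h2, body) nested in an outer loop headed by h1.
cfg = {
    "entry": ["h1"],
    "h1": ["h2", "exit"],
    "h2": ["body"],
    "body": ["h2", "latch"],
    "latch": ["h1"],
    "exit": [],
}
loops = natural_loops(cfg, "entry")
for edge, body in loops.items():
    print(edge, sorted(body))
```

The inner loop is a subset of the outer one, which is how nesting shows up in this representation.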
  • 34. Strongly connected components  Control Flow Analysis
  • 35. SCC
      • SCC: strongly connected components
      • A directed graph is called strongly connected if there is a path from each vertex to every other vertex.
      • Every loop lies within a strongly connected component.
  • 36. SCC [Figure: directed graph with nodes a to f] This graph is not strongly connected.
  • 37. SCC [Figure: the same graph with an SCC highlighted] But it contains a subgraph which is strongly connected.
  • 38. SCC - algorithms
      • Tarjan’s algorithm: fast, but more complex
      • Kosaraju-Sharir algorithm: simple, but slower than Tarjan’s algorithm
      • Implementations are available for many languages: C#/Python/Lua/Ruby/Java
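The Kosaraju-Sharir algorithm is short enough to sketch here: one depth-first pass records finish order, a second pass over the reversed graph collects the components. The graph below reuses the node names a to f from the previous slides, but its edges are an assumption, since the figure did not survive in this transcript. The recursive DFS is fine for small graphs; large CFGs would need an explicit stack.

```python
def kosaraju_scc(succ):
    """Return the strongly connected components of a directed graph as sets."""
    # Pass 1: depth-first search, recording nodes in order of completion.
    visited, order = set(), []

    def dfs(n):
        visited.add(n)
        for s in succ.get(n, ()):
            if s not in visited:
                dfs(s)
        order.append(n)

    for n in succ:
        if n not in visited:
            dfs(n)

    # Build the reversed graph.
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for s in outs:
            rev.setdefault(s, []).append(n)

    # Pass 2: sweep the reversed graph in decreasing finish time.
    assigned, components = set(), []

    def collect(n, comp):
        assigned.add(n)
        comp.add(n)
        for p in rev.get(n, ()):
            if p not in assigned:
                collect(p, comp)

    for n in reversed(order):
        if n not in assigned:
            comp = set()
            collect(n, comp)
            components.append(comp)
    return components


# Hypothetical edges: b -> c -> d -> b forms the only non-trivial component.
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["b", "e"], "e": ["f"], "f": []}
components = kosaraju_scc(graph)
print([sorted(c) for c in components if len(c) > 1])
```

Components with more than one node (or a single node with a self-edge) are the candidates for loops.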
  • 40. Interval Analysis  Control Flow Analysis
  • 41. Regions and intervals
      • Unfortunately, SCC detection alone isn’t able to identify nested loops.
      • Interval Analysis divides the CFG into regions and consolidates them into new nodes (abstract nodes), resulting in an abstract flowgraph.
      • We need to identify regions and pre-intervals.
      • Region: a region in a flow graph is a subgraph H with a unique entry node h.
      • Pre-interval: a pre-interval in a flow graph is a region <H,h> such that every cycle (loop) in H includes the header h.
      • Similar to a unique-entry SCC.
  • 43. T1/T2 transformations
      • Reduction of graphs
      • We can collapse the nodes of a region into a single node. This is called the T1/T2 transformation. If we apply it to all loops, the graph becomes cycle-free.
      • Cycle-free graphs are easier to analyze.
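The slide does not spell out the two rules, so the sketch below uses their usual textbook form: T1 deletes a self-loop edge, and T2 merges a node that has exactly one predecessor into that predecessor. A graph whose loops all have a single entry collapses to one node under repeated application. The function name and both example graphs are hypothetical, and every node is assumed to be reachable from the entry.

```python
def is_reducible(succ, entry):
    """Apply T1/T2 until nothing changes; reducible if one node remains."""
    g = {n: set(outs) for n, outs in succ.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        # T1: remove self-loops.
        for n in g:
            if n in g[n]:
                g[n].discard(n)
                changed = True
        # T2: merge a node with a unique predecessor into that predecessor.
        for n in list(g):
            if n == entry:
                continue
            preds = [p for p in g if n in g[p]]
            if len(preds) == 1:
                p = preds[0]
                g[p].discard(n)
                g[p] |= g.pop(n)     # p inherits the successors of n
                changed = True
                break                # graph changed: rescan from the start
    return len(g) == 1


# Nested single-entry loops: collapses completely.
nested = {
    "entry": ["h1"],
    "h1": ["h2", "exit"],
    "h2": ["body"],
    "body": ["h2", "latch"],
    "latch": ["h1"],
    "exit": [],
}
# A loop (a, b) that can be entered at either a or b: gets stuck.
two_entries = {"s": ["a", "b"], "a": ["b"], "b": ["a", "r"], "r": []}

print(is_reducible(nested, "entry"), is_reducible(two_entries, "s"))
```

In the second graph neither `a` nor `b` ever has a unique predecessor, so no transformation applies and the cycle cannot be collapsed.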
  • 46. Interval analysis   DEMO
  • 47. GOTO considered harmful… http://xkcd.com/292/
  • 48. Irreducible graphs
      • All the loops identified by the previous methods (dominator tree/interval analysis) are called natural loops.
      • They are unique-entry loops.
      • There is another type of loop: the multiple-entry loops found in irreducible graphs, also called improper regions.
  • 49. Irreducible graph [Figure: graph with nodes e (Entry), a, b and r (Return). The loop (a, b) has 2 entries: it can be entered at b or at a.]
  • 50. Irreducible graphs
      • Who codes like that? Anyone who uses GOTO.
      • It is rare, but it does exist:
          • notepad.exe
          • ntoskrnl.exe (Windows Kernel)
      • What’s the problem?
          • Most of the algorithms are unable to handle irreducible graphs, including interval analysis.
          • T1/T2 cannot collapse them.
  • 51. translateString (note the jump into the body of the WHILE statement)
          int *__stdcall TranslateString(int a1)
          {
            wchar_t v1; // cx@1
            …
            if ( v1 )
            {
              while ( 1 )
              {
                v5 = &v22 + v26;
                …
          LABEL_49:
                v1 = *(_WORD *)v7;   // jump inside the WHILE statement
                …
              }
            }
            goto LABEL_49;
          }
  • 52. Solutions
      • There are 2 main solutions to handle irreducible graphs:
          • Structural Analysis
          • DJ-Graphs
  • 53. Structural Analysis  Control Flow Analysis
  • 54. Structural Analysis
      • Structural analysis identifies the main language constructs inside a flow graph using region schemas.
      • Do you want to build your own decompiler?
      • The Hex-Rays decompiler internally uses Structural Analysis.
      • Created by Micha Sharir
      • Reference paper: Structural analysis: a new approach to flow analysis in optimizing compilers (1979)
  • 57. DJ-Graphs
      • Another way to handle irreducible graphs.
      • It is also able to identify all types of structures, including improper regions and nested structures.
      • Combines the dominator tree and the original flowgraph using two types of edges:
          • D edges (dominator tree edges)
          • J edges (join edges)
      • Paper: Identifying loops using DJ graphs. [e]
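As a rough illustration of how the two edge sets are obtained, the sketch below builds them for the small two-entry loop discussed earlier. It assumes the usual definition in which a D edge links idom(n) to n and a J edge is any flowgraph edge x → y where x is not the immediate dominator of y. The helper names are hypothetical and this is not the loop-identification algorithm of the paper itself, only the graph construction step.

```python
def dominators(succ, entry):
    """Iterative dominator sets; every node must be a key of `succ`."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def dj_graph(succ, entry):
    """Return (d_edges, j_edges) for the flowgraph `succ`."""
    dom = dominators(succ, entry)
    idom = {}
    for n in succ:
        strict = dom[n] - {n}
        if strict:
            # The closest strict dominator is the one with the most dominators.
            idom[n] = max(strict, key=lambda d: len(dom[d]))
    d_edges = {(parent, n) for n, parent in idom.items()}
    j_edges = {
        (x, y)
        for x, outs in succ.items()
        for y in outs
        if idom.get(y) != x
    }
    return d_edges, j_edges


# The two-entry loop: s enters the cycle a <-> b at either node.
cfg = {"s": ["a", "b"], "a": ["b"], "b": ["a", "r"], "r": []}
d_edges, j_edges = dj_graph(cfg, "s")
print(sorted(d_edges), sorted(j_edges))
```

Here the cycle between `a` and `b` is made only of J edges whose target does not dominate their source, so no back edge in the dominator sense exists for this loop.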
  • 58. DJ-Graphs
  • 59. Applications
      • Taint analysis
      • Control dependency (dominators, post-dominators)
      • Diff Slicing
      • Execution Indexing (view the CFG as a grammar)
      • Execution alignment
      • Identification of root causes of software crashes
      • Decompilation
      • Code coverage
      • Bug finding
  • 60. References
      • a - http://www.usenix.org/event/usenix03/tech/full_papers/prasad/prasad_html/node5.html
      • b - An Abstract Interpretation-Based Framework for Control Flow Reconstruction from Binaries
      • c - Bertrand Anckaert, Matias Madou, and Koen De Bosschere. 2006. A model for self-modifying code. In Proceedings of the 8th international conference on Information hiding (IH'06)
      • d - http://www.jakstab.org/
      • e - Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee. 1996. Identifying loops using DJ graphs. ACM Trans. Program. Lang. Syst. 18, 6 (November 1996), 649-658.
      • f - Advanced compiler implementation - Steven Muchnick
      • g - Notes on Graph Algorithms Used in Optimizing Compilers - Carl D. Offner
  • 61. Questions?   Contact: [email protected] [email protected]  twitter: @embarbosa