Emulator
Contents
Emulation, Basic Interpretation
Threaded Interpretation
Microprocessor Architecture & System Software Lab
Emulation
“Implementing the interface/functionality of one system on
a system with different interface/functionality”
In a VM, it means instruction set emulation
• An implementation on one ISA (the target) reproduces the behavior of
software compiled for another ISA (the source)
Emulation Methods
Two methods of emulation: interpretation & binary translation
Interpretation
• Repeats a cycle: fetch a source instruction, analyze it, perform the required operation
Binary translation
• Translates a block of source instructions to a block of target instructions
• Saves the translated code for repeated use
• Bigger initial translation cost with smaller execution cost
• More advantageous if translated code is executed frequently
Some in-between techniques
• Threaded interpretation
• Predecoding
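The trade-off between interpretation and binary translation can be made concrete with a rough break-even estimate. The symbols are illustrative assumptions, not taken from the slides: T is the one-time cost of translating a block, c_i the cost of interpreting the block once, c_t the cost of running its translated form once, and n the number of times the block executes.

```latex
\[
T + n\,c_t < n\,c_i
\quad\Longleftrightarrow\quad
n > \frac{T}{c_i - c_t}
\qquad (c_i > c_t)
\]
```

With made-up values T = 1000, c_i = 50, c_t = 5 (arbitrary units), the bound is n > 1000/45 ≈ 22.2, so translation would pay off from the 23rd execution of the block onward.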
Basic Interpreter
• Emulates the whole source machine state
• Guest memory and the context block are kept in the interpreter’s memory (heap)
• Context block holds general-purpose registers, PC, CC, control registers
[Figure: Interpreter Overview. Source memory state (code) and source context block (program counter, condition codes, Reg 0, Reg 1, ..., Reg n-1)]
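A minimal sketch of how the source machine state might be laid out in the interpreter's heap. The names (struct context, struct machine, machine_new, machine_free), the register count, and the single ctrl word standing in for the control registers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_REGS 32            /* assumed register count (n in the figure) */

/* Source context block: the architected register state of the guest. */
struct context {
    uint32_t pc;               /* program counter */
    uint32_t cc;               /* condition codes */
    uint32_t ctrl;             /* stand-in for the control registers */
    uint32_t reg[NUM_REGS];    /* Reg 0 .. Reg n-1 */
};

/* Whole source machine state, held in the interpreter's heap. */
struct machine {
    struct context ctx;
    uint8_t *mem;              /* guest memory image */
    size_t mem_size;           /* size of the guest memory in bytes */
};

/* Allocate a zeroed machine with mem_size bytes of guest memory.
   Returns NULL if allocation fails. */
struct machine *machine_new(size_t mem_size)
{
    struct machine *m;

    assert(mem_size > 0);
    m = calloc(1, sizeof *m);
    if (m == NULL)
        return NULL;
    m->mem = calloc(mem_size, 1);
    if (m->mem == NULL) {
        free(m);
        return NULL;
    }
    m->mem_size = mem_size;
    return m;
}

/* Release the guest memory and the machine itself. */
void machine_free(struct machine *m)
{
    if (m != NULL) {
        free(m->mem);
        free(m);
    }
}
```

Every interpreter routine would then read and write guest state only through such a structure, never through host registers directly.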
Decode-and-dispatch interpreter
Interpretation repeats two steps
• Decode an instruction
• Dispatch it to an interpretation routine based on the type of
instruction
Code for interpreting PPC ISA
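while (!halt && !interrupt) {
    inst = code[PC];                  // fetch the source instruction
    opcode = extract(inst, 31, 6);    // decode the 6-bit primary opcode
    switch (opcode) {                 // dispatch on the instruction type
        case LoadWordAndZero: LoadWordAndZero(inst); break;
        case ALU:             ALU(inst);             break;
        case Branch:          Branch(inst);          break;
        · · · · ·
    }
}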
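The loop above can be tried out on a toy machine. The following is a sketch under stated assumptions, not the PPC interpreter from the slides: the three opcodes, their encodings (6-bit opcode, two 5-bit register fields, then a 16-bit immediate or a third register field), struct cpu, and the enc helper are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_ADDI = 1, OP_ADD = 2 };   /* hypothetical opcodes */

struct cpu {
    uint32_t pc;        /* index of the next instruction word */
    uint32_t reg[32];
    int halt;
};

/* Return `width` bits of `inst` whose most significant bit is bit `hi`
   (bit 0 is the least significant bit). */
uint32_t extract(uint32_t inst, int hi, int width)
{
    assert(width > 0 && width < 32 && hi >= width - 1 && hi < 32);
    return (inst >> (hi - width + 1)) & ((1u << width) - 1u);
}

/* rd = rs1 + sign-extended 16-bit immediate */
void addi(struct cpu *c, uint32_t inst)
{
    uint32_t rd = extract(inst, 25, 5);
    uint32_t rs1 = extract(inst, 20, 5);
    uint32_t raw = extract(inst, 15, 16);
    int32_t imm = (int32_t)(raw ^ 0x8000u) - 0x8000;   /* sign-extend */
    c->reg[rd] = c->reg[rs1] + (uint32_t)imm;
    c->pc++;
}

/* rd = rs1 + rs2 */
void add(struct cpu *c, uint32_t inst)
{
    uint32_t rd = extract(inst, 25, 5);
    uint32_t rs1 = extract(inst, 20, 5);
    uint32_t rs2 = extract(inst, 15, 5);
    c->reg[rd] = c->reg[rs1] + c->reg[rs2];
    c->pc++;
}

/* Decode-and-dispatch loop: runs until HALT, an unknown opcode,
   or the end of the code array. */
void interpret(struct cpu *c, const uint32_t *code, uint32_t ncode)
{
    while (!c->halt && c->pc < ncode) {
        uint32_t inst = code[c->pc];             /* fetch */
        uint32_t opcode = extract(inst, 31, 6);  /* decode */
        switch (opcode) {                        /* dispatch */
        case OP_ADDI: addi(c, inst); break;
        case OP_ADD:  add(c, inst);  break;
        case OP_HALT:
        default:      c->halt = 1;   break;
        }
    }
}

/* Build an instruction word; low16 holds the immediate, or rs2 << 11. */
uint32_t enc(uint32_t op, uint32_t rd, uint32_t rs1, uint32_t low16)
{
    return (op << 26) | (rd << 21) | (rs1 << 16) | (low16 & 0xFFFFu);
}
```

Note that every interpreted instruction re-extracts its fields, which is the parsing cost the later slides try to remove.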
Instruction Functions
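As an illustration of what one instruction function might look like, here is a sketch of a routine for the PowerPC lwz (load word and zero) instruction. It is not the code from the slides. The globals regs, data, and PC, the word-addressed data array, and the alignment restriction are simplifying assumptions. The field positions follow the PowerPC D-form layout (6-bit opcode, RT, RA, 16-bit displacement).

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 1024          /* assumed guest data size, in 32-bit words */

uint32_t regs[32];              /* guest general-purpose registers */
uint32_t data[MEM_WORDS];       /* guest data memory, stored as words for brevity */
uint32_t PC;                    /* guest program counter, in bytes */

/* Bits hi..hi-width+1 of inst, with bit 0 as the least significant bit. */
uint32_t extract(uint32_t inst, int hi, int width)
{
    assert(width > 0 && width < 32 && hi >= width - 1 && hi < 32);
    return (inst >> (hi - width + 1)) & ((1u << width) - 1u);
}

/* lwz RT, D(RA): load the word at (RA|0) + sign-extended D into RT. */
void LoadWordAndZero(uint32_t inst)
{
    uint32_t RT = extract(inst, 25, 5);
    uint32_t RA = extract(inst, 20, 5);
    uint32_t raw = extract(inst, 15, 16);
    int32_t displacement = (int32_t)(raw ^ 0x8000u) - 0x8000;  /* sign-extend */
    uint32_t source = (RA == 0) ? 0 : regs[RA];  /* RA = 0 means the value zero */
    uint32_t address = source + (uint32_t)displacement;

    /* Simplification of this sketch: only aligned, in-range accesses. */
    assert(address % 4 == 0 && address / 4 < MEM_WORDS);
    regs[RT] = data[address / 4];
    PC = PC + 4;                /* advance to the next source instruction */
}
```

For example, the word 0x80640008 encodes lwz r3, 8(r4), so with r4 = 16 the routine loads the word at byte address 24 into r3.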
Decode-and-dispatch interpreter
Advantages
• Low memory requirements
• Zero start-up time
Disadvantages
• Steady-state performance is slow
- A source instruction must be parsed each time it is emulated
- Many branches degrade performance
Branches in Decode-&-Dispatch
while (!halt && !interrupt) {
    switch (opcode) {
        case ALU: ALU(inst); break;
        ·····
    }
}
Branches executed per interpreted instruction:
1. Switch statement -> case: indirect
2. Call to ALU(inst): direct
3. Return from the routine: indirect
4. Loop back-edge: direct
Threaded Interpretation (2)
Threaded Interpretation (3)
Threaded Interpretation
One key point is that dispatch occurs indirectly through a
dispatch table
routine = dispatch[opcode][extended_opcode];
goto *routine;   // computed goto (GNU C extension)
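The dispatch above can be sketched as a small indirect threaded interpreter. This is an illustration under assumptions, not the slides' code: the three-opcode toy ISA is hypothetical, the table is indexed by the opcode alone (no extended opcode), the immediate is zero-extended, and the program must end with HALT. It relies on the GNU C "labels as values" extension, so it needs GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_ADDI = 1, OP_ADD = 2, NUM_OPS = 3 };  /* hypothetical */

/* Runs `code` (which must end with OP_HALT) and returns guest register 3. */
uint32_t run_threaded(const uint32_t *code)
{
    /* Dispatch table: opcode -> address of its interpreter routine. */
    static void *const dispatch[NUM_OPS] = { &&do_halt, &&do_addi, &&do_add };
    uint32_t reg[32] = {0};
    uint32_t pc = 0, inst, opcode;

/* Fetch, decode, and jump through the table. This code is appended to
   every routine, so there is no return to a central loop. */
#define NEXT()                                   \
    do {                                         \
        inst = code[pc++];                       \
        opcode = inst >> 26;                     \
        assert(opcode < NUM_OPS);                \
        goto *dispatch[opcode];                  \
    } while (0)

    NEXT();

do_addi:    /* rd = rs1 + zero-extended 16-bit immediate */
    reg[(inst >> 21) & 31] = reg[(inst >> 16) & 31] + (inst & 0xFFFFu);
    NEXT();
do_add:     /* rd = rs1 + rs2 */
    reg[(inst >> 21) & 31] = reg[(inst >> 16) & 31] + reg[(inst >> 11) & 31];
    NEXT();
do_halt:
    return reg[3];
#undef NEXT
}
```

Compared with decode-and-dispatch, each interpreted instruction now costs one indirect jump instead of a switch jump, a call, a return, and a loop back-edge.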
Predecoding
Extracting the various fields of an instruction is complicated
• Fields are not aligned, requiring complex bit extraction
• Some related fields needed for decoding are not adjacent
If the instruction is in a loop, this extraction work is repeated on every iteration
Predecoding for PPC
In PPC, the opcode and extended opcode fields are separated, and register
specifiers are not byte-aligned
Define a predecoded instruction format, and define a predecoded instruction array
based on that format
#include <stdint.h>   // fixed-width types, so the sizes below hold on any host
struct instruction {
    uint32_t op;      // 32 bits
    uint8_t  dest;    // 8 bits
    uint8_t  src1;    // 8 bits
    uint16_t src2;    // 16 bits
} code[CODE_SIZE];
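A predecoder that fills such an array might look like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, only two cases are handled (primary opcode 31 with a 10-bit extended opcode in bits 10..1, and everything else treated as a form with a 16-bit immediate), and the way op combines the two opcode fields is one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Predecoded format, following the struct on the slide. */
struct instruction {
    uint32_t op;      /* primary and extended opcode merged into one value */
    uint8_t  dest;
    uint8_t  src1;
    uint16_t src2;    /* second source register, or 16-bit immediate */
};

#define PRIMARY_EXT 31u   /* PowerPC primary opcode that uses an extended opcode */

/* Predecode one raw 32-bit PowerPC instruction word. */
struct instruction predecode_one(uint32_t inst)
{
    struct instruction p;
    uint32_t primary = inst >> 26;

    p.dest = (uint8_t)((inst >> 21) & 31);   /* not byte-aligned in the source */
    p.src1 = (uint8_t)((inst >> 16) & 31);
    if (primary == PRIMARY_EXT) {
        uint32_t ext = (inst >> 1) & 0x3FFu; /* extended opcode, far from the primary */
        p.op = (primary << 10) | ext;
        p.src2 = (uint16_t)((inst >> 11) & 31);
    } else {
        p.op = primary << 10;
        p.src2 = (uint16_t)(inst & 0xFFFFu);
    }
    return p;
}

/* Predecode n source instructions into the array `code`. */
void predecode(const uint32_t *src, struct instruction *code, size_t n)
{
    size_t i;

    assert(src != NULL && code != NULL);
    for (i = 0; i < n; i++)
        code[i] = predecode_one(src[i]);
}
```

After this pass the interpreter routines can read code[i].dest and similar fields directly, with no shifting or masking at run time.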
Predecoding Example
Previous Interpreter Code
New Interpreter Code
Direct Threaded Interpretation
Even with predecoding, indirect threading includes a centralized
dispatch table, which requires
• A memory access and an indirect jump on every dispatch
To remove this overhead, replace the instruction opcode in the predecoded
format with the address of the interpreter routine
Example: the predecoded instruction (op 07, dest 1, src1 2, src2 08) becomes (001048d0, 1, 2, 08)
[Figure: panels (a), (b), (c); labels: dispatch loop, indirection table]
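The replacement of opcodes by routine addresses can be sketched as follows. This is an illustration under assumptions, not the slides' implementation: the toy ISA, struct pinst, and run_direct are hypothetical, the immediate is zero-extended, and the program must end with HALT. Predecoding and execution share one function because GNU C label addresses (GCC or Clang required) are only meaningful inside the function that defines them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_ADDI = 1, OP_ADD = 2, NUM_OPS = 3 };  /* hypothetical */

/* Predecoded instruction: `routine` takes the place of the opcode field. */
struct pinst {
    void *routine;      /* address of the interpreter routine */
    uint8_t dest;
    uint8_t src1;
    uint16_t src2;      /* register number (ADD) or immediate (ADDI) */
};

/* Predecodes the n words of `src` into `code`, runs them, and returns
   guest register 3. The last instruction must be OP_HALT. */
uint32_t run_direct(const uint32_t *src, struct pinst *code, size_t n)
{
    /* Used only while predecoding, never during dispatch. */
    void *const routine_of[NUM_OPS] = { &&do_halt, &&do_addi, &&do_add };
    uint32_t reg[32] = {0};
    const struct pinst *ip = code;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        uint32_t op = src[i] >> 26;
        assert(op < NUM_OPS);
        code[i].routine = routine_of[op];   /* opcode -> routine address */
        code[i].dest = (uint8_t)((src[i] >> 21) & 31);
        code[i].src1 = (uint8_t)((src[i] >> 16) & 31);
        code[i].src2 = (uint16_t)(op == OP_ADD ? (src[i] >> 11) & 31
                                               : src[i] & 0xFFFFu);
    }

    goto *ip->routine;          /* dispatch without any table lookup */

do_addi:    /* dest = src1 + immediate */
    reg[ip->dest] = reg[ip->src1] + ip->src2;
    ip++;
    goto *ip->routine;
do_add:     /* dest = src1 + src2 */
    reg[ip->dest] = reg[ip->src1] + reg[ip->src2];
    ip++;
    goto *ip->routine;
do_halt:
    return reg[3];
}
```

One consequence of this design is that the predecoded code holds host addresses, so it is tied to one build of the interpreter and cannot be saved and reused by another.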
Comparison
[Figure: panels (d), (e); labels: predecoded indirect threaded, direct threaded, indirection table, predecoder]
Comparison
[Table columns: Decode-and-Dispatch, Indirect Threaded Interpreter, Direct Threaded Interpreter]
DSVM
Dynamic Samsung Virtual Machine
Split interpreter
• Inner, Outer loop
• Instruction cache
Indirect threaded interpretation
Interpreting CISC ISA
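RISC ISA (PowerPC): 32-bit registers, fixed 32-bit instruction length
[Figure: instruction formats. One format has field boundaries at bits 31, 25, 20, 15, 10, 0; the register-immediate format has field boundaries at bits 31, 25, 20, 15, 0 with fields Op, Rd, Rs1, Const]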
Interpreting a Complex Instruction Set
A CISC instruction set has a wide variety of formats, variable instruction
lengths, and variable field lengths (x86 instruction lengths: 1 to 15 bytes)
Byte    Bits 7-6  Bits 5-3    Bits 2-0
ModR/M  Mod       Reg/Opcode  R/M
SIB     Scale     Index       Base
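The two byte layouts above can be unpacked with a few shifts and masks. This is a small sketch. The struct and function names are made up for illustration, and has_sib covers only 32-bit addressing mode, where a SIB byte follows the ModR/M byte when Mod is not 3 and R/M is 4.

```c
#include <assert.h>
#include <stdint.h>

struct modrm { uint8_t mod, reg, rm; };        /* bits 7-6, 5-3, 2-0 */
struct sib   { uint8_t scale, index, base; };  /* bits 7-6, 5-3, 2-0 */

/* Split a ModR/M byte into its three fields. */
struct modrm decode_modrm(uint8_t b)
{
    struct modrm m;
    m.mod = (uint8_t)(b >> 6);
    m.reg = (uint8_t)((b >> 3) & 7);   /* register or opcode extension */
    m.rm  = (uint8_t)(b & 7);
    return m;
}

/* Split a SIB byte into its three fields. */
struct sib decode_sib(uint8_t b)
{
    struct sib s;
    s.scale = (uint8_t)(b >> 6);       /* index is multiplied by 1 << scale */
    s.index = (uint8_t)((b >> 3) & 7);
    s.base  = (uint8_t)(b & 7);
    return s;
}

/* 32-bit addressing only: does a SIB byte follow this ModR/M byte? */
int has_sib(struct modrm m)
{
    return m.mod != 3 && m.rm == 4;
}
```

Because the presence of later bytes depends on fields of earlier ones, an x86 interpreter cannot know an instruction's length until it has partly decoded it, unlike the fixed-length RISC case.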
Interpreting a Complex Instruction Set
Decode and dispatch
• Decode fields and fill in a general template
• Jump to routines
Slow due to generality
Solution
• Make the common case faster
[Figure: general decode (fill in instruction structure), then dispatch]
Some optimizations
[Figure: dispatch on first byte; shared routines]
Threaded Interpretation
[Figure: complex decode/dispatch]