Intermediate Representation and Symbol Table
Design
• More wizardry than science
• HIR (high level IR) preserves loop structure and array bounds
– language independent
– good for code generation for one or more architectures
– appropriate for most optimizations
Issues in IR Design
• source language and target language
[Figure: Front end → ucode → SLLIC → Optimizer. ucode was used by the HP3000, as these were stack machines; SLLIC is the Spectrum Low Level Intermediate Code.]
Issues in new IR Design …
• Use more than one IR for more than one
optimization
float a[20][10];
use a[i][j+2]

HIR:
t1 ← a[i,j+2]

MIR:
t1 ← j+2
t2 ← i*20
t3 ← t1+t2
t4 ← 4*t3
t5 ← addr a
t6 ← t4+t5
t7 ← *t6

LIR:
r1 ← [fp-4]
r2 ← r1+2
r3 ← [fp-8]
r4 ← r3*20
r5 ← r4+r2
r6 ← 4*r5
r7 ← fp-216
f1 ← [r7+r6]
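As a rough sketch of the HIR-to-MIR step in this example, the Python function below emits the seven MIR statements for a two-dimensional array reference. The function names, the ASCII arrow `<-`, and the parameters `row_stride` and `elem_size` are illustrative assumptions, not part of the slides. The row multiplier is passed in explicitly: the example uses 20, whereas a row-major layout of `float a[20][10]` would have 10 elements per row.

```python
def lower_array_ref(array, row, col, row_stride, elem_size):
    """Return MIR-style statements that load array[row][col] into t7."""
    return [
        f"t1 <- {col}",                 # column index expression
        f"t2 <- {row}*{row_stride}",    # elements skipped by whole rows
        "t3 <- t1+t2",                  # linear element index
        f"t4 <- {elem_size}*t3",        # byte offset from the array base
        f"t5 <- addr {array}",          # base address
        "t6 <- t4+t5",                  # address of the element
        "t7 <- *t6",                    # load the element
    ]


def element_address(base, i, j, row_stride, elem_size):
    """Numeric version of the same computation."""
    return base + elem_size * (i * row_stride + j)


mir = lower_array_ref("a", "i", "j+2", row_stride=20, elem_size=4)
```

Keeping the stride as a parameter makes it easy to compare the multiplier used in the example with the one a particular array layout would actually require.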
High level IR
int f(int a, int b) {
int c;
c = a + 2;
print(b, c);
}
[Figure: syntax tree for f, rooted at a function node]
• Low level IR
– corresponds one to one to target machine instructions
– architecture dependent
• Multi-level IR
– has features of MIR and LIR
– may also have some features of HIR
Abstract Syntax Tree/DAG
• Condensed form of parse tree
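[Figure: syntax tree (left) and DAG (right) for an assignment to a; the tree contains two * subtrees over c, the DAG a single shared one]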
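One common way to obtain a DAG instead of a tree is to look up each node by its operator and children before creating it, so identical subexpressions are created only once. The sketch below assumes a hypothetical statement `a = b*c + b*c` and a nested-tuple tree encoding; the class and function names are illustrative, not taken from the slides.

```python
class DagBuilder:
    """Creates each distinct (op, left, right) node only once."""

    def __init__(self):
        self.nodes = []   # node id -> (op, left id, right id)
        self.index = {}   # (op, left id, right id) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:          # reuse an existing node if present
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


def build(dag, tree):
    """Convert a nested-tuple syntax tree into DAG nodes; return the root id."""
    if isinstance(tree, str):              # leaf: identifier
        return dag.node(tree)
    op, left, right = tree
    return dag.node(op, build(dag, left), build(dag, right))


# Hypothetical input: a = b*c + b*c
tree = ("assign", "a", ("+", ("*", "b", "c"), ("*", "b", "c")))
dag = DagBuilder()
root = build(dag, tree)
# The tree has 9 nodes; the DAG needs only 6 because b, c and b*c are shared.
```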
Postfix notation
• Linearized representation of a syntax tree
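A minimal sketch of the linearization: a post-order walk that emits the operands of each node before its operator. The nested-tuple tree encoding and the function name `to_postfix` are assumptions made for illustration.

```python
def to_postfix(tree):
    """Post-order walk: operands first, then the operator."""
    if isinstance(tree, str):          # leaf: identifier or constant
        return [tree]
    op, *operands = tree
    out = []
    for child in operands:
        out.extend(to_postfix(child))
    out.append(op)
    return out


# a + b * c  is written  a b c * +
tokens = to_postfix(("+", "a", ("*", "b", "c")))
```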
Postfix notation …
• No parentheses are needed in postfix notation
because
– the position and arity of the operators permit
only one decoding of a postfix expression
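The unique decoding can be illustrated with a stack: each operator pops as many operands as its arity and pushes the combined subtree. The `ARITY` table, the `neg` operator, and the function name are hypothetical choices for this sketch.

```python
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "neg": 1}   # assumed operator set


def from_postfix(tokens):
    """Rebuild the (unique) syntax tree of a postfix token sequence."""
    stack = []
    for tok in tokens:
        n = ARITY.get(tok, 0)
        if n == 0:                       # operand: push as a leaf
            stack.append(tok)
            continue
        if len(stack) < n:
            raise ValueError(f"operator {tok!r} is missing operands")
        args = stack[-n:]                # the last n entries are its operands
        del stack[-n:]
        stack.append((tok, *args))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


tree = from_postfix(["a", "b", "c", "*", "+"])   # a + (b * c)
```

Because the arity of every operator is fixed, the decoder never has a choice about how many operands to take, which is why no parentheses are required.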
Three address code
• It is a sequence of statements of the
general form X := Y op Z, where X, Y, and Z are names, constants, or
compiler-generated temporaries and op is an operator
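As an illustration, an expression tree can be flattened into statements of this form by giving every interior node its own temporary. This is only a sketch: the tuple encoding of trees, the temporary names `t1, t2, ...`, and the function names are assumptions.

```python
import itertools


def gen_tac(tree, code, temps):
    """Emit statements for tree into code; return the name holding its value."""
    if isinstance(tree, str):            # a name or constant needs no code
        return tree
    op, left, right = tree
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                # fresh compiler-generated temporary
    code.append(f"{t} := {l} {op} {r}")
    return t


def translate(target, tree):
    """Translate the assignment  target := tree  into three address code."""
    code = []
    result = gen_tac(tree, code, itertools.count(1))
    code.append(f"{target} := {result}")
    return code


code = translate("x", ("+", "a", ("*", "b", "c")))
# code == ["t1 := b * c", "t2 := a + t1", "x := t2"]
```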
Three address instructions
• Assignment
– x = y op z
– x = op y
– x = y
• Jump
– goto L
– if x relop y goto L
• Indexed assignment
– x = y[i]
– x[i] = y
• Function
– param x
– call p,n
– return y
• Pointer
– x = &y
– x = *y
– *x = y
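To make the semantics of a few of these instruction kinds concrete, here is a small interpreter sketch covering assignment, jumps, and indexed assignment. The tuple encoding (`"binop"`, `"copy"`, `"ifgoto"`, and so on) and the explicit `"label"` pseudo-instruction are assumptions for illustration; function and pointer instructions are left out to keep it short.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "<": operator.lt, "==": operator.eq}


def run(program, env):
    """Execute a list of three address instructions over the variable map env."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}

    def val(operand):                     # names are looked up, constants used as is
        return env[operand] if isinstance(operand, str) else operand

    pc = 0
    while pc < len(program):
        ins = program[pc]
        pc += 1
        kind = ins[0]
        if kind == "binop":               # x = y op z
            _, x, y, op, z = ins
            env[x] = OPS[op](val(y), val(z))
        elif kind == "copy":              # x = y
            _, x, y = ins
            env[x] = val(y)
        elif kind == "index_load":        # x = y[i]
            _, x, y, i = ins
            env[x] = env[y][val(i)]
        elif kind == "index_store":       # x[i] = y
            _, x, i, y = ins
            env[x][val(i)] = val(y)
        elif kind == "goto":              # goto L
            pc = labels[ins[1]]
        elif kind == "ifgoto":            # if x relop y goto L
            _, x, relop, y, target = ins
            if OPS[relop](val(x), val(y)):
                pc = labels[target]
        elif kind != "label":
            raise ValueError(f"unknown instruction {ins!r}")
    return env


# Sum the elements of y: s = y[0] + ... + y[n-1]
program = [
    ("copy", "s", 0),
    ("copy", "i", 0),
    ("label", "L1"),
    ("ifgoto", "i", "==", "n", "L2"),
    ("index_load", "t", "y", "i"),
    ("binop", "s", "s", "+", "t"),
    ("binop", "i", "i", "+", 1),
    ("goto", "L1"),
    ("label", "L2"),
]
env = run(program, {"y": [3, 4, 5], "n": 3})
```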
Other representations
• SSA: Static Single Assignment
• RTL: Register transfer language
• Stack machines: P-code
• CFG: Control Flow Graph
• Dominator Trees
• DJ-graph: dominator tree augmented with join edges
• PDG: Program Dependence Graph
• VDG: Value Dependence Graph
• GURRR: Global unified resource requirement
representation. Combines PDG with resource
requirements
• Java intermediate bytecodes
• The list goes on ......
Symbol Table
• Compiler uses symbol table to keep track of scope and binding
information about names
• format need not be uniform because information depends upon the usage of
the name
• to keep records uniform some entries may be outside the symbol table
• symbol table entry may be set up when role of name becomes clear
• a name may denote several objects in the same block
– int x;
struct x {float y, z; }
– lexical analyzer returns the name itself and not a pointer to the symbol table
entry
– record in the symbol table is created when role of the name becomes
clear
– in this case two symbol table entries will be created
• characters in a name
– there is a distinction between token id, lexeme and attributes of the
names
– it is difficult to work with lexemes
– if there is a modest upper bound on length then lexemes can be stored
in the symbol table
– if the limit is large, store lexemes separately
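Storing lexemes separately can be sketched as a single character buffer, with each symbol table entry holding only a fixed-size (start, length) reference into it. The class name `LexemePool` and its methods are hypothetical.

```python
class LexemePool:
    """Keeps all lexemes in one buffer; entries refer to them by (start, length)."""

    def __init__(self):
        self.buffer = ""      # all lexemes, stored back to back
        self.index = {}       # lexeme -> (start, length), to avoid duplicates

    def intern(self, lexeme):
        """Store lexeme if it is new and return its (start, length) reference."""
        if lexeme not in self.index:
            self.index[lexeme] = (len(self.buffer), len(lexeme))
            self.buffer += lexeme
        return self.index[lexeme]

    def text(self, ref):
        """Recover the characters of a lexeme from its reference."""
        start, length = ref
        return self.buffer[start:start + length]


pool = LexemePool()
ref_count = pool.intern("count")   # (0, 5)
ref_x = pool.intern("x")           # (5, 1)
```

With this layout every symbol table record has the same size regardless of how long the names are.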
Storage Allocation Information
• information about storage locations is kept in the symbol table
• if target is assembly code then assembler can take care of storage for
various names
Data Structures
• List data structure
– simplest to implement
– use a single array to store names and information
– search for a name is linear
– entry and lookup are independent operations
– the cost of entry and search operations is very high, and
a lot of time goes into bookkeeping
• Hash table
– insertion and lookup take constant time on average, independent of the number of names
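For comparison with the linear list, a chained hash table can be sketched as follows. The bucket count, the multiplier 31 in the hash function, and the class name are arbitrary assumptions chosen for illustration.

```python
class HashSymbolTable:
    """Symbol table as an array of buckets, each a short list of (name, info)."""

    def __init__(self, buckets=211):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, name):
        h = 0
        for ch in name:                       # simple polynomial string hash
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return self.buckets[h]

    def insert(self, name, info):
        bucket = self._bucket(name)
        for i, (existing, _) in enumerate(bucket):
            if existing == name:              # replace an existing entry
                bucket[i] = (name, info)
                return
        bucket.append((name, info))

    def lookup(self, name):
        for existing, info in self._bucket(name):   # scan one bucket only
            if existing == name:
                return info
        return None


table = HashSymbolTable()
table.insert("x", {"type": "int"})
```

A lookup touches only the names that hash to the same bucket, so its cost depends on the bucket length rather than on the total number of names.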
Representing Scope Information
• entries are declarations of names
• most closely nested scope rule can be implemented in data
structures discussed so far
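One way to implement the most closely nested scope rule is a stack with one table per open scope, searched from the innermost scope outward. This is a sketch under that assumption; the class and method names are hypothetical.

```python
class ScopedSymbolTable:
    """A stack of per-scope tables; lookup proceeds from innermost to outermost."""

    def __init__(self):
        self.scopes = [{}]                    # start with the outermost scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        if name in self.scopes[-1]:
            raise KeyError(f"{name!r} already declared in this scope")
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):   # most closely nested first
            if name in scope:
                return scope[name]
        return None


st = ScopedSymbolTable()
st.declare("a", "outer a")
st.enter_scope()
st.declare("a", "inner a")
inner = st.lookup("a")     # the inner declaration hides the outer one
st.exit_scope()
outer = st.lookup("a")     # the outer declaration is visible again
```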
Symbol table structure
• Assign variables to storage classes that prescribe scope, visibility, and
lifetime
– scope rules prescribe the symbol table structure
– scope: unit of static program structure with one or more variable
declarations
– scope may be nested
• Pascal: procedures are scoping units
• C: blocks, functions, files are scoping units
• Static variables
Symbol attributes and symbol table entries
• Symbols have associated attributes
Name     Type
name     character string
class    enumeration
size     integer
type     enumeration
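The four attributes listed above map naturally onto a record type. In the sketch below the members of the two enumerations are hypothetical examples, and the field is called `sym_class` because `class` is a reserved word in Python.

```python
from dataclasses import dataclass
from enum import Enum


class SymClass(Enum):          # hypothetical storage classes
    LOCAL = "local"
    GLOBAL = "global"
    PARAMETER = "parameter"


class SymType(Enum):           # hypothetical source-level types
    INTEGER = "integer"
    REAL = "real"
    BOOLEAN = "boolean"


@dataclass
class SymbolEntry:
    name: str                  # character string
    sym_class: SymClass        # enumeration
    size: int                  # integer, here taken to be a size in bytes
    type: SymType              # enumeration


entry = SymbolEntry("a", SymClass.LOCAL, 4, SymType.INTEGER)
```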
Local Symbol Table Management
NewSymTab: SymTab → SymTab
• A major consideration in designing a symbol table is
that insertion and retrieval should be as fast as
possible
Nesting structure of an example Pascal program
program e;
var a, b, c: integer;

procedure f;
var a, b, c: integer;
begin
a := b+c
end;

procedure g;
var a, b: integer;
procedure h;
var c, d: integer;
begin
c := a+d
end;
procedure i;
var b, d: integer;
begin
b := a+c
end;

procedure j;
var b, d: integer;
begin
b := a+d
end;

begin
a := b+c
end.
Global Symbol table structure
• scope and visibility rules determine the structure of the global symbol table
[Figure: e( )'s symtab, containing Integer a, Integer b, Integer c]
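A global symbol table that mirrors the scope structure can be sketched as a tree of per-procedure tables linked to their enclosing table. The example uses the declarations of program e and procedure j from the Pascal program, and assumes that j is declared directly inside e, which the flattened listing does not make certain. The `SymTab` class and its methods are hypothetical.

```python
class SymTab:
    """Symbol table of one scoping unit, linked to the enclosing unit's table."""

    def __init__(self, owner, parent=None):
        self.owner = owner
        self.parent = parent
        self.children = []
        self.entries = {}
        if parent is not None:
            parent.children.append(self)

    def declare(self, name, type_name):
        self.entries[name] = type_name

    def resolve(self, name):
        """Return the owner of the nearest enclosing table that declares name."""
        table = self
        while table is not None:
            if name in table.entries:
                return table.owner
            table = table.parent
        return None


e = SymTab("e")
for name in ("a", "b", "c"):
    e.declare(name, "integer")

j = SymTab("j", parent=e)      # assumption: j is nested directly in e
for name in ("b", "d"):
    j.declare(name, "integer")

# In j's body  b := a+d : b and d are j's own, a comes from e.
owners = {name: j.resolve(name) for name in ("b", "a", "d")}
```

Unlike a stack of open scopes, this tree survives after a scope has been closed, so later phases can still consult it.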
Storage binding and symbolic
registers
• Translates variable names into addresses
• global/static: fixed relocatable address or offset with
respect to base as global pointer
a: global    b: local    c[0..9]: local
gp: global pointer    fp: frame pointer

MIR:
a ← a*2
b ← a+c[1]

LIR (names bound to locations):
r1 ← [gp+8]
r2 ← r1*2
[gp+8] ← r2
r3 ← [gp+8]
r4 ← [fp-28]
r5 ← r3+r4
[fp-20] ← r5

LIR (names bound to symbolic registers):
s0 ← s0*2
s1 ← [fp-28]
s2 ← s0+s1
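The binding of names to locations can be sketched as a table lookup followed by explicit loads and stores. The offsets below are copied from the example (a at gp+8, b at fp-20, c[1] at fp-28); the `BINDINGS` table, the `Lowerer` class, and the ASCII arrow `<-` are assumptions made for illustration.

```python
BINDINGS = {                   # name -> (base register, offset), from the example
    "a": ("gp", 8),
    "b": ("fp", -20),
    "c[1]": ("fp", -28),
}


def location(name):
    """Render the memory location bound to name, e.g. [gp+8]."""
    base, offset = BINDINGS[name]
    return f"[{base}{offset:+d}]"


class Lowerer:
    """Turns MIR assignments over names into LIR over registers and memory."""

    def __init__(self):
        self.code = []
        self.next_reg = 1

    def new_reg(self):
        reg = f"r{self.next_reg}"
        self.next_reg += 1
        return reg

    def operand(self, x):
        if x in BINDINGS:                      # a variable: load it first
            reg = self.new_reg()
            self.code.append(f"{reg} <- {location(x)}")
            return reg
        return x                               # a constant is used directly

    def assign(self, target, left, op, right):
        l = self.operand(left)
        r = self.operand(right)
        result = self.new_reg()
        self.code.append(f"{result} <- {l}{op}{r}")
        self.code.append(f"{location(target)} <- {result}")


lowerer = Lowerer()
lowerer.assign("a", "a", "*", "2")        # a <- a*2
lowerer.assign("b", "a", "+", "c[1]")     # b <- a+c[1]
```

Under these assumptions `lowerer.code` reproduces the seven statements of the location-bound LIR in the example; binding to symbolic registers would instead keep a in a register across both statements.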
Local Variables in Frame
• assign to consecutive locations; allow
enough space for each
– may put word-sized objects on half-word
boundaries
– requires two half word loads
– requires shift, or, and
How to store large local data
structures
• Requires large space in local frames and therefore large
offsets
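Unsorted aligned frame (offsets from the frame base):
i: 0 to -4, padding: -4 to -8, x: -8 to -16, j: -16 to -18, padding: -18 to -20, y: -20 to -24
Sorted frame:
x: 0 to -8, i: -8 to -12, y: -12 to -16, j: -16 to -18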
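The two layouts can be reproduced with a short offset-assignment routine. The variable sizes (i: 4, x: 8, j: 2, y: 4 bytes) are inferred from the offsets in the example, and the rule that each object is aligned to its own size is an assumption; the function name `layout` is hypothetical.

```python
def layout(variables, sort_by_size=False):
    """Assign each (name, size) a negative frame offset, aligned to its size."""
    if sort_by_size:
        # Largest first; sorted() is stable, so equal sizes keep their order.
        variables = sorted(variables, key=lambda v: -v[1])
    offsets = {}
    used = 0                          # bytes of the frame consumed so far
    for name, size in variables:
        used += size
        used += -used % size          # pad up to a multiple of the object size
        offsets[name] = -used
    return offsets


frame_vars = [("i", 4), ("x", 8), ("j", 2), ("y", 4)]   # inferred sizes
unsorted_frame = layout(frame_vars)
sorted_frame = layout(frame_vars, sort_by_size=True)
# unsorted_frame == {"i": -4, "x": -16, "j": -18, "y": -24}
# sorted_frame   == {"x": -8, "i": -12, "y": -16, "j": -18}
```

Sorting by decreasing size removes the padding in this example, shrinking the frame from 24 to 18 bytes and keeping the offsets smaller.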