Intermediate Representation and Symbol Table

The document discusses intermediate representations (IRs) used in compilers. It notes that compilers typically use 2-3 IRs including a high-level IR (HIR) that preserves structure, a mid-level IR (MIR) for optimizations and code generation, and a low-level IR (LIR) similar to machine code. While there is no standard IR, they allow expressing programs in a form machines can understand while enabling analysis and transformations. IR design involves balancing various issues around languages, optimizations, and code generation.

Uploaded by

SANDJITH


Intermediate Representation Design
• More wizardry than science

• Each compiler typically uses 2-3 IRs

• HIR (high-level IR) preserves loop structure and array bounds

• MIR (medium-level IR) reflects the range of features in a set of source languages
– language independent
– good for code generation for one or more architectures
– appropriate for most optimizations

• LIR (low-level IR) is close to the target machine
• Compiler writers have tried to define universal IRs and have failed (e.g. UNCOL in 1958)

• There is no standard intermediate representation. An IR is a step in expressing a source program so that a machine can understand it

• As translation proceeds, the IR is repeatedly analyzed and transformed

• Compiler users want analysis and translation to be fast and correct

• Compiler writers want optimizations to be simple to write, easy to understand and easy to extend

• The IR should be simple and lightweight while allowing easy expression of optimizations and transformations.
Issues in IR Design
• source language and target language

• porting cost, or reuse of an existing design

• whether it is appropriate for optimizations

• U-code IR was used on PA-RISC and MIPS. It is suitable for expression evaluation on stacks but less suited to load-store architectures

• both compilers translate U-code to another form
– HP translates it to a very low-level representation
– MIPS translates it to MIR, and translates back to U-code for the code generator
Issues in new IR Design
• how machine dependent it is

• expressiveness: how many languages are covered

• appropriateness for code optimization

• appropriateness for code generation

• use more than one IR (as in PA-RISC)

Figure (PA-RISC): Front end → U-code → SLLIC → Optimizer. U-code was used by the HP 3000, as these were stack machines; SLLIC is the Spectrum Low Level Intermediate Code.
Issues in new IR Design …
• Use more than one IR for more than one optimization

• represent subscripts as a list of subscripts:
– suitable for dependence analysis

• make addresses explicit in linearized form:
– suitable for constant folding, strength reduction, loop-invariant code motion and other basic optimizations
float a[20][10];
use a[i][j+2]

HIR
t1 ← a[i,j+2]

MIR
t1 ← j+2
t2 ← i*10
t3 ← t1+t2
t4 ← 4*t3
t5 ← addr a
t6 ← t4+t5
t7 ← *t6

LIR
r1 ← [fp-4]
r2 ← r1+2
r3 ← [fp-8]
r4 ← r3*10
r5 ← r4+r2
r6 ← 4*r5
r7 ← fp-216
f1 ← [r7+r6]
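The MIR column spells out row-major address arithmetic, where the row stride is the number of columns. As a minimal sketch, assuming C's zero-based, row-major layout and 4-byte floats (the function name `element_address` and the sample base address are hypothetical), the same steps look like this:

```python
FLOAT_SIZE = 4  # bytes per float; assumed, matching the 4*t3 scaling


def element_address(base, i, j, cols=10):
    """Byte address of a[i][j+2] for a row-major `float a[rows][cols]`."""
    t1 = j + 2             # column index
    t2 = i * cols          # elements in the rows before row i
    t3 = t1 + t2           # linear element index
    t4 = FLOAT_SIZE * t3   # byte offset from the start of a
    t5 = base              # addr a
    t6 = t4 + t5           # address of the element; t7 <- *t6 would load it
    return t6


print(element_address(1000, 3, 4))  # 1000 + 4 * (3*10 + 6) = 1144
```

Once each step is a separate instruction, loop-invariant parts such as `i * cols` and `addr a` become visible to the optimizer.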
High level IR
int f(int a, int b) {
int c;
c = a + 2;
print(b, c);
}

• Abstract syntax tree
– keeps enough information to reconstruct the source form
– keeps symbol table information (identifiers point to symbol table entries)
AST for f (identifiers are actually pointers to the symbol table entries):

function
  ident f
  paramlist
    ident a
    paramlist
      ident b
      end
  body
    declist
      ident c
      end
    stmtlist
      =
        ident c
        +
          ident a
          const 2
      stmtlist
        call
          ident print
          arglist
            ident b
            arglist
              ident c
              end
        end
• Medium level IR
– reflects the range of features in a set of source languages
– language independent
– good for code generation for a number of architectures
– appropriate for most optimizations
– normally three-address code

• Low level IR
– corresponds one to one to target machine instructions
– architecture dependent

• Multi-level IR
– has features of MIR and LIR
– may also have some features of HIR
Abstract Syntax Tree/DAG
• Condensed form of a parse tree

• useful for representing language constructs

• Depicts the natural hierarchical structure of the source program
– Each internal node represents an operator
– Children of a node represent its operands
– Leaf nodes represent operands

• A DAG is more compact than an abstract syntax tree because each common subexpression is represented by a single shared node
a := b * -c + b * -c

Abstract syntax tree
assign
  a
  +
    *
      b
      uminus
        c
    *
      b
      uminus
        c

Directed acyclic graph (both operands of + point to the same * node)
assign
  a
  +
    *  (shared)
      b
      uminus
        c
Postfix notation
• Linearized representation of a syntax tree

• List of the nodes of the tree

• Each node appears immediately after its children

• The postfix notation for an expression E is defined as follows:
– If E is a variable or constant, then the postfix notation is E itself
– If E is an expression of the form E1 op E2, where op is a binary operator, then the postfix notation for E is
• E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2 respectively
– If E is an expression of the form (E1), then the postfix notation for E1 is also the postfix notation for E
Postfix notation …
• No parentheses are needed in postfix notation because
– the position and arity (number of operands) of the operators permit only one decoding of a postfix expression

• Postfix notation for
a = b * -c + b * -c
is
a b c uminus * b c uminus * + =
(unary minus is written uminus so that it cannot be confused with binary -)
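A minimal sketch, assuming expression trees are nested tuples; the function names and the `OPS` table are hypothetical. `to_postfix` follows the recursive definition, and `eval_postfix` shows why no parentheses are needed: each operator's arity tells the evaluator exactly how many stacked values it consumes.

```python
def to_postfix(tree):
    """Post-order walk: operands left to right, then the operator."""
    if isinstance(tree, str):              # variable or constant: E itself
        return [tree]
    op, *operands = tree
    out = []
    for operand in operands:
        out += to_postfix(operand)
    out.append(op)
    return out


# operator -> (arity, function); the arity is what makes decoding unambiguous
OPS = {
    "+": (2, lambda x, y: x + y),
    "*": (2, lambda x, y: x * y),
    "uminus": (1, lambda x: -x),
}


def eval_postfix(tokens, env):
    stack = []
    for tok in tokens:
        if tok in OPS:
            arity, fn = OPS[tok]
            args = stack[-arity:]          # the last `arity` values
            del stack[-arity:]
            stack.append(fn(*args))
        else:
            stack.append(env[tok])         # operand: push its value
    (result,) = stack                      # well formed: exactly one value left
    return result


rhs = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
print(" ".join(to_postfix(("=", "a", rhs))))  # a b c uminus * b c uminus * + =
print(eval_postfix(to_postfix(rhs), {"b": 3, "c": 2}))  # -12
```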
Three address code
• A sequence of statements of the general form X := Y op Z, where
– X, Y and Z are names, constants or compiler-generated temporaries
– op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator
Three address code …
• At most one operator is allowed on the right-hand side

• A source expression like x + y * z might be translated into

t1 := y * z
t2 := x + t1

where t1 and t2 are compiler-generated temporary names

• Unraveling complicated arithmetic expressions and nested control flow makes three-address code desirable for code generation and optimization

• Using names for intermediate values allows three-address code to be rearranged easily

• Three-address code is a linearized representation of a syntax tree (or DAG) in which explicit names correspond to the interior nodes of the graph
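A minimal sketch of how the temporaries arise: a post-order walk that gives each interior node a fresh name. It assumes nested-tuple trees with at most binary operators; `gen_three_address` is a hypothetical name, not a standard API.

```python
import itertools


def gen_three_address(tree):
    """Return (instructions, name holding the value) for an expression tree."""
    code = []
    temps = itertools.count(1)

    def walk(t):
        if isinstance(t, str):              # names and constants need no code
            return t
        op, *operands = t
        args = [walk(o) for o in operands]  # operands first, left to right
        temp = f"t{next(temps)}"            # fresh temporary for this interior node
        if len(args) == 1:
            code.append(f"{temp} := {op} {args[0]}")
        else:
            code.append(f"{temp} := {args[0]} {op} {args[1]}")
        return temp

    result = walk(tree)
    return code, result


code, result = gen_three_address(("+", "x", ("*", "y", "z")))
print("\n".join(code))
# t1 := y * z
# t2 := x + t1
```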
Three address instructions
• Assignment
– x = y op z
– x = op y
– x = y

• Jump
– goto L
– if x relop y goto L

• Indexed assignment
– x = y[i]
– x[i] = y

• Function
– param x
– call p,n
– return y

• Pointer
– x = &y
– x = *y
– *x = y
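To make the assignment and jump forms concrete, here is a minimal interpreter sketch. The tuple encoding, the explicit `label` pseudo-instruction and the `run` function are assumptions for illustration; the param/call, indexed and pointer forms are left out.

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOP = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
         ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def run(prog, env):
    """Execute a list of three-address tuples; return the value of `return`."""
    labels = {ins[1]: pc for pc, ins in enumerate(prog) if ins[0] == "label"}

    def val(a):                              # constant or variable operand
        return int(a) if a.lstrip("-").isdigit() else env[a]

    pc = 0
    while pc < len(prog):
        ins = prog[pc]
        kind = ins[0]
        if kind == "copy":                   # x = y
            env[ins[1]] = val(ins[2])
        elif kind == "binop":                # x = y op z
            _, x, y, op, z = ins
            env[x] = ARITH[op](val(y), val(z))
        elif kind == "goto":                 # goto L
            pc = labels[ins[1]]
            continue
        elif kind == "if":                   # if x relop y goto L
            _, x, rel, y, target = ins
            if RELOP[rel](val(x), val(y)):
                pc = labels[target]
                continue
        elif kind == "return":               # return y
            return val(ins[1])
        pc += 1                              # "label" is a no-op
    return None


# s = 1 + 2 + ... + n
prog = [
    ("copy", "s", "0"),
    ("copy", "i", "1"),
    ("label", "L1"),
    ("if", "i", ">", "n", "L2"),
    ("binop", "s", "s", "+", "i"),
    ("binop", "i", "i", "+", "1"),
    ("goto", "L1"),
    ("label", "L2"),
    ("return", "s"),
]
print(run(prog, {"n": 5}))  # 15
```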
Other representations
• SSA: Static Single Assignment
• RTL: Register transfer language
• Stack machines: P-code
• CFG: Control Flow Graph
• Dominator Trees
• DJ-graph: dominator tree augmented with join edges
• PDG: Program Dependence Graph
• VDG: Value Dependence Graph
• GURRR: Global unified resource requirement
representation. Combines PDG with resource
requirements
• Java intermediate bytecodes
• The list goes on ...
Symbol Table
• The compiler uses a symbol table to keep track of scope and binding information about names

• The symbol table is searched every time a name is encountered in the source; changes to the table occur
– if a new name is discovered
– if new information about an existing name is discovered

• The symbol table must have mechanisms to:
– add new entries
– find existing information efficiently

• Two common mechanisms:
– linear lists: simple to implement, poor performance
– hash tables: greater programming/space overhead, good performance

• The compiler should be able to grow the symbol table dynamically

• If the size is fixed, it must be large enough for the largest program
Symbol Table Entries
• Each entry corresponds to a declaration of a name

• The format need not be uniform, because the information kept depends on how the name is used

• Each entry is a record consisting of consecutive words

• To keep records uniform, some information may be kept outside the symbol table entry, with the record holding a pointer to it

• Information is entered into the symbol table at various times
– keywords are entered initially
– identifier lexemes are entered by the lexical analyzer

• A symbol table entry may be set up when the role of a name becomes clear

• Attribute values are filled in as the information becomes available
• A name may denote several objects in the same block
– int x;
  struct x { float y, z; };
– the lexical analyzer returns the name itself, not a pointer to a symbol table entry
– the record in the symbol table is created when the role of the name becomes clear
– in this case two symbol table entries will be created

• Attributes of a name are entered in response to declarations

• Labels are often identified by a colon

• The syntax of a procedure/function declaration specifies that certain identifiers are formals

• Characters in a name
– there is a distinction between the token id, the lexeme and the attributes of a name
– it is difficult to work with lexemes directly
– if there is a modest upper bound on length, lexemes can be stored in the symbol table
– if the limit is large, store lexemes separately
Storage Allocation Information
• Information about storage locations is kept in the symbol table

• If the target is assembly code, the assembler can take care of storage for the various names

• The compiler needs to generate data definitions to be appended to the assembly code

• If the target is machine code, the compiler does the allocation

• For names whose storage is allocated at run time, no storage allocation is done at compile time

• The compiler plans out activation records
Data Structures
• List data structure
– simplest to implement
– uses a single array to store names and information
– search for a name is linear
– entry and lookup are independent operations
– the cost of entry and search operations is very high, and a lot of time goes into bookkeeping

• Hash table
– insertion and lookup take roughly constant time on average, at the cost of more programming and space overhead
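A minimal sketch of the list organization, assuming Python and hypothetical method names. The comparison counter is only there to make the linear cost visible; a hash-based table would make the same lookups roughly constant time.

```python
class LinearSymbolTable:
    """Symbol table kept as a single growable array of (name, attributes)."""

    def __init__(self):
        self.entries = []
        self.comparisons = 0    # bookkeeping: name comparisons done so far

    def insert(self, name, **attrs):
        # entry is independent of lookup: a new record is simply appended
        self.entries.append((name, dict(attrs)))

    def lookup(self, name):
        # linear search, newest entry first: O(n) comparisons per lookup
        for entry_name, attrs in reversed(self.entries):
            self.comparisons += 1
            if entry_name == name:
                return attrs
        return None


table = LinearSymbolTable()
for k in range(100):
    table.insert(f"v{k}", type="int")
table.lookup("v0")
print(table.comparisons)  # 100: the oldest name is found only after a full scan
```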
Representing Scope Information
• Entries are declarations of names

• When a lookup is done, the entry for the appropriate declaration must be returned

• Scope rules determine which entry is appropriate

• Maintain a separate table for each scope

• The symbol table for a procedure or scope is the compile-time equivalent of an activation record

• Information about nonlocal names is found by scanning the symbol tables of the enclosing procedures

• The symbol table can be attached to the abstract syntax of the procedure (integrated into the intermediate representation)
• The most closely nested scope rule can be implemented with the data structures discussed so far

• Give each procedure a unique number

• Blocks must also be numbered

• The procedure number is part of all local declarations

• A name is represented as a pair of a number and a name

• Names are entered in the symbol table in the order they occur

• The most closely nested rule can be implemented in terms of the following operations:
– lookup: find the most recently created entry
– insert: make a new entry
– delete: remove the most recently created entry
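A minimal sketch of the three operations over a single list, assuming Python; the class and method names, and the procedure numbers used for program e, g and h, are hypothetical. Because local declarations are entered after those of the enclosing procedures, the newest matching entry is always the most closely nested one:

```python
class NestedScopeTable:
    """Entries are (procedure number, name, attributes) in creation order."""

    def __init__(self):
        self.entries = []

    def insert(self, proc, name, **attrs):
        """insert: make a new entry."""
        self.entries.append((proc, name, attrs))

    def lookup(self, name):
        """lookup: the most recently created entry for name wins."""
        for proc, entry_name, attrs in reversed(self.entries):
            if entry_name == name:
                return proc, attrs
        return None

    def delete(self):
        """delete: remove the most recently created entry."""
        return self.entries.pop()

    def leave(self, proc):
        """On leaving procedure proc, its declarations are the newest entries."""
        while self.entries and self.entries[-1][0] == proc:
            self.delete()


table = NestedScopeTable()
for name in ("a", "b", "c"):
    table.insert(1, name, type="integer")   # program e
for name in ("a", "b"):
    table.insert(2, name, type="integer")   # procedure g
for name in ("c", "d"):
    table.insert(3, name, type="integer")   # procedure h, nested in g
print(table.lookup("a")[0], table.lookup("c")[0])  # 2 3
table.leave(3)
print(table.lookup("c")[0])  # 1
```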
Symbol table structure
• Assign variables to storage classes that prescribe scope, visibility and lifetime
– scope rules prescribe the symbol table structure
– scope: unit of static program structure with one or more variable declarations
– scopes may be nested
• Pascal: procedures are scoping units
• C: blocks, functions and files are scoping units

• Visibility, lifetimes, global variables

• Common (in Fortran)

• Automatic or stack storage

• Static variables
Symbol attributes and symbol table entries
• Symbols have associated attributes

• Typical attributes are name, type, scope, size, addressing mode, etc.

• A symbol table entry collects these attributes together so that they can be easily set and retrieved

• Examples of typical fields in a symbol table entry

Field    Type
name     character string
class    enumeration
size     integer
type     enumeration
Local Symbol Table Management
NewSymTab: SymTab → SymTab

DestSymTab: SymTab → SymTab

InsertSym: SymTab × Symbol → boolean

LocateSym: SymTab × Symbol → boolean

GetSymAttr: SymTab × Symbol × Attr → boolean

SetSymAttr: SymTab × Symbol × Attr × Value → boolean

NextSym: SymTab × Symbol → Symbol

MoreSyms: SymTab × Symbol → boolean
• A major consideration in designing a symbol table is that insertion and retrieval should be as fast as possible

• One-dimensional table: search is very slow

• Balanced binary tree: quick insertion, searching and retrieval; extra work is required to keep the tree balanced

• Hash tables: quick insertion, searching and retrieval; extra work to compute hash keys

• Hashing with a chain of entries is generally a good approach
Hashed local symbol table
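A minimal Python sketch of a hashed local symbol table with chaining, implementing the operations listed under Local Symbol Table Management. The snake_case names, the simple string hash and the bucket count are assumptions; this GetSymAttr returns the attribute value (or None) instead of a boolean, and DestSymTab simply returns the enclosing table.

```python
class HashedSymTab:
    """Local symbol table: an array of buckets, each a chain of entries."""

    def __init__(self, parent=None, nbuckets=31):
        self.parent = parent                        # enclosing scope's table
        self.buckets = [[] for _ in range(nbuckets)]
        self.order = []                             # insertion order for NextSym

    def _chain(self, symbol):
        h = 0
        for ch in symbol:                           # simple deterministic string hash
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return self.buckets[h]

    def _entry(self, symbol):
        for entry in self._chain(symbol):           # walk the collision chain
            if entry["name"] == symbol:
                return entry
        return None

    def insert_sym(self, symbol):                   # InsertSym
        if self._entry(symbol) is not None:
            return False                            # already declared here
        self._chain(symbol).append({"name": symbol})
        self.order.append(symbol)
        return True

    def locate_sym(self, symbol):                   # LocateSym
        return self._entry(symbol) is not None

    def set_sym_attr(self, symbol, attr, value):    # SetSymAttr
        entry = self._entry(symbol)
        if entry is None:
            return False
        entry[attr] = value
        return True

    def get_sym_attr(self, symbol, attr):           # GetSymAttr (value or None)
        entry = self._entry(symbol)
        return None if entry is None else entry.get(attr)

    def next_sym(self, symbol=None):                # NextSym; None asks for the first
        i = 0 if symbol is None else self.order.index(symbol) + 1
        return self.order[i] if i < len(self.order) else None

    def more_syms(self, symbol):                    # MoreSyms
        return self.order.index(symbol) + 1 < len(self.order)


def new_sym_tab(parent):                            # NewSymTab
    return HashedSymTab(parent)


def dest_sym_tab(table):                            # DestSymTab: back to the parent
    return table.parent


scope = new_sym_tab(None)
scope.insert_sym("a")
scope.set_sym_attr("a", "type", "integer")
print(scope.get_sym_attr("a", "type"))  # integer
```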
Nesting structure of an example Pascal program

program e;
  var a, b, c: integer;
  procedure f;
    var a, b, c: integer;
    begin
      a := b+c
    end;
  procedure g;
    var a, b: integer;
    procedure h;
      var c, d: integer;
      begin
        c := a+d
      end;
    procedure i;
      var b, d: integer;
      begin
        b := a+c
      end;
  procedure j;
    var b, d: integer;
    begin
      b := a+d
    end;
  begin
    a := b+c
  end.
Global Symbol table structure
• Scope and visibility rules determine the structure of the global symbol table

• For the Algol class of languages, scoping rules structure the symbol table as a tree of local tables
– global scope as root
– tables for nested scopes as children of the table for the scope they are nested in

Figure: tree of local symbol tables for program e
e( )'s symtab: integer a, b, c
  f( )'s symtab: integer a, b, c
  g( )'s symtab: integer a, b
    h( )'s symtab: integer c, d
    i( )'s symtab: integer b, d
  j( )'s symtab: integer b, d
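A minimal sketch of this tree of local tables for program e, assuming Python dictionaries for the local tables; `Scope` and `resolve` are hypothetical names. A lookup starts in the current procedure's table and follows parent links toward the global table, which is how nonlocal names are found:

```python
class Scope:
    """One local symbol table in the tree; `parent` is the enclosing scope."""

    def __init__(self, name, decls, parent=None):
        self.name = name
        self.symbols = {var: "integer" for var in decls}
        self.parent = parent

    def resolve(self, var):
        """Name of the scope whose declaration of var is visible here."""
        scope = self
        while scope is not None:             # walk toward the root (global scope)
            if var in scope.symbols:
                return scope.name
            scope = scope.parent
        return None


e = Scope("e", ["a", "b", "c"])
f = Scope("f", ["a", "b", "c"], e)
g = Scope("g", ["a", "b"], e)
h = Scope("h", ["c", "d"], g)
i = Scope("i", ["b", "d"], g)
j = Scope("j", ["b", "d"], e)

# b := a+c inside i: b is i's own, a comes from g, c from e
print([i.resolve(v) for v in ("b", "a", "c")])  # ['i', 'g', 'e']
```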
Storage binding and symbolic registers
• Translates variable names into addresses

• This process must occur before or during code generation

• Each variable is assigned an address or an addressing method

• Each variable is assigned an offset with respect to a base, whose value may change with every invocation

• Variables fall into four classes: global, global static, stack, stack static
• global/static: a fixed relocatable address, or an offset with respect to a base such as the global pointer

• stack variable: offset from the stack/frame pointer

• allocate stack/global variables in registers

• registers are not indexable, therefore arrays cannot be kept in registers

• assign symbolic registers to scalar variables

• symbolic registers are then used by graph coloring for global register allocation
a: global   b: local   c[0..9]: local
gp: global pointer   fp: frame pointer

MIR
a ← a*2
b ← a+c[1]

LIR, names bound to locations
r1 ← [gp+8]
r2 ← r1*2
[gp+8] ← r2
r3 ← [gp+8]
r4 ← [fp-28]
r5 ← r3+r4
[fp-20] ← r5

LIR, names bound to symbolic registers
s0 ← s0*2
s1 ← [fp-28]
s2 ← s0+s1
Local Variables in Frame
• Assign variables to consecutive locations, allowing enough space for each
– a word-size object may then fall on a half-word boundary
– loading it requires two half-word loads
– plus shift, or and and instructions to combine the halves

• Align on double-word boundaries
– wastes space
– the machine may allow only small offsets
• Sort variables by the alignment they need

• Store the largest variables first
– automatically aligns all the variables (when sizes are powers of two and each variable's alignment equals its size)
– does not require padding

• Store the smallest variables first
– requires more space (padding)
– for a large stack frame, makes more variables accessible with small offsets
How to store large local data structures
• They require large space in local frames and therefore large offsets

• If a large object is put near the boundary, other objects require large offsets, either from fp (if it is put near the beginning) or from sp (if it is put near the end)

• Allocate another base register to access large objects

• Allocate space in the middle or elsewhere, and store pointers to these locations at a small offset from fp

• Requires extra loads
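int i;
double x;
short int j;
float y;

Unsorted, aligned (declaration order; byte ranges relative to fp)
i       -4 to 0
(pad)   -8 to -4
x       -16 to -8
j       -18 to -16
(pad)   -20 to -18
y       -24 to -20

Sorted frame (largest first)
x       -8 to 0
i       -12 to -8
y       -16 to -12
j       -18 to -16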
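The offsets above can be reproduced with a minimal sketch that aligns each variable to its own size, assuming the sizes implied by the example (int and float 4 bytes, double 8, short 2) and a frame that grows downward from fp; the `layout` function is hypothetical:

```python
SIZES = {"int": 4, "double": 8, "short": 2, "float": 4}   # assumed byte sizes


def layout(decls):
    """Give each variable a negative offset from fp, aligned to its own size."""
    offset = 0
    offsets = {}
    for name, ctype in decls:
        size = SIZES[ctype]
        offset -= size
        offset = (offset // size) * size   # round down to a multiple of size
        offsets[name] = offset             # variable occupies [offset, offset + size)
    return offsets, -offset                # offsets and frame size in bytes


decls = [("i", "int"), ("x", "double"), ("j", "short"), ("y", "float")]
print(layout(decls))
# ({'i': -4, 'x': -16, 'j': -18, 'y': -24}, 24)

# sorted() is stable, so i stays ahead of y even with reverse=True
largest_first = sorted(decls, key=lambda d: SIZES[d[1]], reverse=True)
print(layout(largest_first))
# ({'x': -8, 'i': -12, 'y': -16, 'j': -18}, 18)
```

Sorting saves the 6 bytes of padding the declaration order needs.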
