Introduction and Structure of A Compiler
Compiler Design
Prof. SAM 1
UNIT I: INTRODUCTION TO COMPILERS 9 CS8602 Syllabus Compiler Design
Structure of a compiler – Lexical Analysis – Role of Lexical Analyzer – Input Buffering – Specification of
Tokens – Recognition of Tokens – Lex – Finite Automata – Regular Expressions to Automata –
Minimizing DFA.
UNIT IV: RUN-TIME ENVIRONMENT AND CODE GENERATION 8 CS8602 Syllabus Compiler
Design
Storage Organization, Stack Allocation Space, Access to Non-local Data on the Stack, Heap Management –
Issues in Code Generation – Design of a simple Code Generator.
Text Book
Compilers: Principles, Techniques, and Tools (2nd Edition), by Alfred V. Aho,
Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman.
Examples for Translators & Compilers
Translators
• Interpreters
• Assemblers
• Compilers
Compilers
• Cross Compilers
• One pass Compilers
• Multi pass Compilers
• Source to Source Compilers
• Silicon Compilers
Compilers
• “Compilation”
– Translation of a program written in a source language into a
semantically equivalent program written in a target language
Source Program → Compiler → Target Program (which then takes the Input)
Source Program + Input → Interpreter → Output
Error messages are reported during translation.
Cross Compilers
• A cross compiler is a compiler capable of
creating executable code for a platform other
than the one on which the compiler is
running.
Preprocessors, Compilers, Assemblers, and Linkers
Skeletal Source Program
→ Preprocessor
→ Source Program
→ Compiler
→ Target Assembly Program
→ Assembler
→ Relocatable Object Code
→ Linker (together with Libraries and Relocatable Object Files)
→ Absolute Machine Code
Try for example:
gcc -S myprog.c
javap Class
Analysis of Source Programs
source program
Lexical Analysis
tokens
Syntax Analysis
parse tree
Semantic Analysis
type checking
type conversion
Symbol Table
Tools for Analysis: Examples
• Structure Analysis – Automatic structure
• Pretty Printers – analyze a program and print comment
lines in a special font
• Static Checkers – find potential bugs without
running the program
• Interpreters – perform the operations implied by
the source program
Synthesis of Object Code
parse tree & symbol table
code optimizer
optimized intermediate code
code generator
target program
Intermediate Code Generation
Code Optimization
Code Generation
Model of A Compiler
• A compiler must perform two tasks:
- Analysis of the source program: the analysis part breaks up
the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to
create an intermediate representation of the source program.
- Synthesis of the target program: the synthesis part constructs the
desired target program from the intermediate representation
and the information in the symbol table.
• The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
Phases of a Compiler
• Compiler operates in phases
• Each phase transforms source program from
one representation to another
Phases of the Compiler
Source Program
1 Lexical Analyzer
2 Syntax Analyzer
3 Semantic Analyzer
4 Intermediate Code Generator
5 Code Optimizer
6 Code Generator
Target Program
Phases of Modern Compiler
Error
handler
Lexical Analysis (scanner): The first phase of a compiler
Reads the characters in the source program and groups them into
tokens.
A token represents an identifier, a keyword, a punctuation
character or an operator.
The character sequence forming a token is called the lexeme for
the token.
The lexical analyzer not only generates tokens but also enters the
lexeme into the symbol table.
Each token has the form (token-name, attribute-value)
• token-name: an abstract symbol that is used during syntax analysis, and
• attribute-value: points to an entry in the symbol table for this
token.
Example: position = initial + rate * 60
1. "position" is a lexeme mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol-table entry for position. The
symbol-table entry for an identifier holds information about the identifier, such as its
name and type.
2. = is a lexeme that is mapped into the token (=). Since this token needs no attribute-value,
we have omitted the second component. For notational convenience, the
lexeme itself is used as the name of the abstract symbol.
3. "initial" is a lexeme that is mapped into the token (id, 2), where 2 points to the
symbol-table entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. "rate" is a lexeme mapped into the token (id, 3), where 3 points to the symbol-table
entry for rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
Blanks separating the lexemes would be discarded by the lexical analyzer.
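The mapping from lexemes to tokens listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the scanner of any real compiler: the function name tokenize, the regular expression and the list-based symbol table are assumptions made for this example, and only identifiers, integer literals and the operators =, + and * are recognized.

```python
import re

# Hypothetical token pattern: identifiers, integer literals, and the operators = + *
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))")

def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []              # entry i-1 holds the lexeme of identifier i
    tokens = []
    pos = 0
    source = source.rstrip()       # trailing blanks carry no token
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)   # leading blanks are skipped by \s*
        if match is None:
            raise ValueError(f"lexical error at position {pos}")
        pos = match.end()
        if match.group("id"):
            lexeme = match.group("id")
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)   # enter the lexeme once
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif match.group("num"):
            tokens.append((match.group("num"),))   # no attribute value needed
        else:
            tokens.append((match.group("op"),))
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

For the statement above this yields the tokens (id, 1), (=), (id, 2), (+), (id, 3), (*), (60), with position, initial and rate entered in the symbol table in that order.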
Syntax Analysis (parser) : The second phase of the compiler
• The parser uses the first components of the tokens produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation
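As a rough sketch of how a parser can build such a syntax tree, the following recursive-descent parser handles only statements of the form id = expr with + and *. The function name parse_assignment, the token spelling (plain strings such as "id1") and the tuple representation (operator, left child, right child) are assumptions made for this example.

```python
def parse_assignment(tokens):
    """Parse 'id = expr' where expr uses + and * with the usual precedence."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # An operand is any token that is not an operator.
        if peek() is None or peek() in ("=", "+", "*"):
            raise SyntaxError(f"operand expected at token {pos}")
        return advance()

    def term():
        node = factor()
        while peek() == "*":          # * binds tighter than +
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("'=' expected")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment(["id1", "=", "id2", "+", "id3", "*", "60"]))
```

Each interior node of the result is an operator and its children are the arguments, so id3 * 60 becomes a subtree of the + node because * has higher precedence.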
Syntax Analysis Example
Assignment stmt
identifier := expression
Semantic Analysis: The third phase of the compiler
It uses the syntax tree and the information in the symbol table
to check the source program for semantic consistency with the
language definition.
It gathers type information and saves it in either the syntax
tree or the symbol table, for subsequent use during
intermediate-code generation.
An important part of semantic analysis is type checking,
where the compiler checks that each operator has matching
operands.
For example, many programming language definitions require an array
index to be an integer; the compiler must report an error if a
floating-point number is used to index an array.
The language specification may permit some type conversions
called coercions. For example, a binary arithmetic operator
may be applied to either a pair of integers or to a pair of
floating-point numbers. If it is applied to a floating-point number
and an integer, the compiler may convert (coerce) the integer into a
floating-point number.
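A coercion of this kind can be sketched as a walk over the syntax tree that inserts a conversion node wherever an integer meets a floating-point operand. The function name check, the tuple tree format, the two type names "int" and "real" and the assumption that all three identifiers are declared real are illustrative choices, not part of the slides; the node name inttoreal follows the worked example in this deck.

```python
def check(node, types):
    """Return (typed_tree, type); insert inttoreal where an int meets a real."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"          # integer literal
        return node, types[node]        # identifier: look up declared type
    op, left, right = node
    left, ltype = check(left, types)
    right, rtype = check(right, types)
    if ltype != rtype:
        if op == "=" and ltype == "int":
            # Assumed rule for this sketch: no implicit real-to-int narrowing.
            raise TypeError("cannot assign a real value to an int variable")
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        ltype = "real"
    return (op, left, right), ltype

# Assumption: position, initial and rate (id1, id2, id3) are declared real.
types = {"id1": "real", "id2": "real", "id3": "real"}
tree = ("=", "id1", ("+", "id2", ("*", "id3", "60")))
typed_tree, result_type = check(tree, types)
print(typed_tree)
```

Here the literal 60 is wrapped in an inttoreal node because it is multiplied by the real-valued id3.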
Intermediate Code Generation: three-address code
After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation
(a program for an abstract machine).
This intermediate representation should have two important properties:
– it should be easy to produce and
– it should be easy to translate into the target machine.
Common intermediate representations are syntax trees, three-address code (TAC)
and postfix notation.
The intermediate form considered here is three-address code, which consists
of a sequence of assembly-like instructions with at most three operands per
instruction. Each operand can act like a register.
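To make three-address code concrete, here is a small sketch that flattens a type-checked syntax tree into a list of instructions. The function name gen_tac, the tuple tree format and the temporary naming scheme temp1, temp2, ... are assumptions for this example; every interior node simply receives a fresh temporary.

```python
def gen_tac(tree):
    """Translate ('=', target, expr) into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):
            return node                      # identifier or constant: use as is
        if node[0] == "inttoreal":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    code.append(f"{target} := {walk(expr)}")
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
for line in gen_tac(tree):
    print(line)
```

For this tree the sketch prints temp1 := inttoreal(60), temp2 := id3 * temp1, temp3 := id2 + temp2 and id1 := temp3, each with at most one operator on the right-hand side.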
Code Optimization: to generate better target code
• There are simple optimizations that significantly improve the running time of the
target program without slowing down compilation too much.
• It involves:
- Detection and removal of dead code.
- Calculation of constant expressions and terms at compile time.
- Moving loop-invariant code outside of loops.
- Removal of unnecessary temporary variables.
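Two of these optimizations, evaluating a constant conversion at compile time and removing an unnecessary temporary, can be sketched on three-address code as follows. The function name optimize and the tuple format (dest, op, arg1, arg2) are assumptions for this example, and the copy removal is only safe under the stated assumption that each temporary is used exactly once.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2); 'copy' and 'inttoreal' use arg1 only."""
    constants = {}                 # temporary -> constant computed at compile time
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)    # substitute known constants for temporaries
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = f"{a}.0"     # do the conversion once, at compile time
            continue
        folded.append((dest, op, a, b))
    # Drop a trailing copy "x := t" when t is produced by the preceding instruction
    # (assumes t is a single-use temporary).
    if len(folded) >= 2 and folded[-1][1] == "copy" and folded[-2][0] == folded[-1][2]:
        dest = folded[-1][0]
        _, op, a, b = folded[-2]
        folded[-2:] = [(dest, op, a, b)]
    return folded

code = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(code):
    print(instruction)
```

The four instructions shrink to two: temp2 := id3 * 60.0 and id1 := id2 + temp2. A production optimizer would also renumber the surviving temporaries and would rely on data-flow analysis rather than the single-use assumption made here.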
Code Generation
• It takes as input an intermediate representation of the source program and
maps it into the target language.
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of
machine instructions that perform the same task.
• A crucial aspect of code generation is the careful assignment of registers
to hold variables.
• This involves:
– Allocation of registers and memory.
– Generation of correct references.
– Generation of correct types.
– Generation of machine code.
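A very naive code generator for three-address instructions might look like the sketch below. The mnemonics MOVF, MULF and ADDF and the #constant notation are taken from the worked example in this deck; the function name generate, the tuple format (dest, op, arg1, arg2) and the unlimited supply of registers are simplifying assumptions, so no real register allocation is performed.

```python
OPCODES = {"*": "MULF", "+": "ADDF"}

def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into two-address assembly."""
    asm = []
    location = {}          # temporary -> register that currently holds it
    next_reg = 1           # assumption: registers never run out

    def operand(name):
        if name in location:
            return location[name]            # temporary kept in a register
        if name.replace(".", "", 1).isdigit():
            return f"#{name}"                # immediate constant
        return name                          # variable in memory

    for dest, op, a, b in code:
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"MOVF {operand(a)}, {reg}")          # load first operand
        asm.append(f"{OPCODES[op]} {operand(b)}, {reg}")  # combine with second
        if dest.startswith("temp"):
            location[dest] = reg             # result stays in the register
        else:
            asm.append(f"MOVF {reg}, {dest}")  # store result to memory
    return asm

code = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
for line in generate(code):
    print(line)
```

The output has the same shape as the hand-written target code in the worked example, differing only in which register numbers are chosen. Reusing registers once a temporary is dead is the part that a careful register assignment would add.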
Symbol Table
• It is a data structure containing a record for each identifier, with fields for
the attributes of the identifier.
• The data structure allows us to find the record for each identifier quickly
and to store or retrieve data from that record quickly.
• These attributes may provide information about the storage allocated for a
name, its type, and its scope (where in the program its value may be used).
• For procedure names, the attributes also include the number and types of the
arguments, the method of passing each argument (for example, by value or
by reference), and the type returned.
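One common way to get fast insertion and lookup is a hash table per scope, chained to the enclosing scope. The sketch below is an illustration under that assumption; the class name SymbolTable, its methods and the attribute names (type, kind, params, returns) are hypothetical and not taken from the slides.

```python
class SymbolTable:
    """One table per scope; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}            # dict gives fast average-case lookup

    def declare(self, name, **attributes):
        if name in self.entries:
            raise KeyError(f"{name!r} already declared in this scope")
        self.entries[name] = {"name": name, **attributes}
        return self.entries[name]

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        return None                  # undeclared identifier

global_scope = SymbolTable()
global_scope.declare("rate", type="real")
global_scope.declare(
    "area", kind="procedure", returns="real",
    params=[("radius", "real", "by value")],
)
inner_scope = SymbolTable(parent=global_scope)
inner_scope.declare("rate", type="int")      # shadows the outer rate

print(inner_scope.lookup("rate")["type"])
print(inner_scope.lookup("area")["params"])
```

The inner declaration of rate hides the outer one, while area is found by following the parent link.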
Error Detection and Reporting
• Each phase can encounter errors.
• After detecting an error, a phase must deal with that error so that
compilation can proceed, allowing further errors in the source program to
be detected.
E.g.:
i) Lexical errors (the characters do not form any token of the language):
"in, floa, switc etc."
ii) Syntax errors (the token stream violates the structure rules of the
language):
"missing parentheses, braces etc."
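The idea of continuing after an error can be illustrated at the lexical level with a simple form of panic-mode recovery: an illegal character is reported and deleted, and scanning goes on so that later errors are still found. The function name scan_with_recovery and the set of legal punctuation are assumptions made for this sketch.

```python
def scan_with_recovery(source, legal="=+*() "):
    """Report every illegal character, drop it, and keep scanning."""
    errors = []
    kept = []
    for column, ch in enumerate(source, start=1):
        if ch.isalnum() or ch == "_" or ch in legal:
            kept.append(ch)
        else:
            # Record the error and continue instead of stopping at the first one.
            errors.append(f"column {column}: illegal character {ch!r}")
    return "".join(kept), errors

cleaned, errors = scan_with_recovery("a = b @ c $ 2")
print(cleaned)
for message in errors:
    print(message)
```

Both the @ and the $ are reported in a single run, which is the behaviour the second bullet asks for.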
Example
Consider the following statement: position = initial + rate * 60
Show the output of each phase.
Example
position := initial + rate * 60
lexical analyzer
id1 := id2 + id3 * 60
syntax analyzer
      :=
     /  \
  id1    +
        / \
     id2   *
          / \
       id3   60
semantic analyzer
      :=
     /  \
  id1    +
        / \
     id2   *
          / \
       id3   inttoreal
                |
                60
intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
(three-address code)
code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Symbol Table (shared by all phases): position ...., initial ...., rate ....
Errors from every phase are reported to the error handler.
Grouping of Compiler Phases
• Front end
Consists of those phases that depend on the source language but are
largely independent of the target machine.
• Back end
Consists of those phases that are usually target-machine dependent,
such as optimization and code generation.
• Example 2:
while (y < z) {
    int x = a + b;
    y += x;
}
Phases applied in order: Lexical Analysis, Syntax Analysis, Semantic Analysis,
IR Generation, IR Optimization, Code Generation, Code Optimization
Lexical Analysis
• Outcome of Lexical Analyzer
Syntax Analysis
• Groups the tokens into syntactic structures.
Semantic Analysis
• Ensures that the components of a program fit together meaningfully.
• Gathers type information and checks for type compatibility.
IR Generation
• The simple instructions produced from the output of the
syntax analyzer form the intermediate representation (IR).
• E.g.: TAC
IR Optimization
• Improves the intermediate code so that
the ultimate object program runs faster
and/or takes less space.
Code Generation
• Machine code is generated.