1 Basics of Compiler Design
Outline
1.1 Overview and History
1.2 What Do Compilers Do?
1.3 The Structure of a Compiler
1.4 The Syntax and Semantics of Programming Languages
1.5 Compiler Design and Programming Language Design
1.7 Computer Architecture and Compiler Design
1.8 Compiler Design Considerations
Overview and History (1)
Cause
Software for early computers was written in assembly language
What Do Compilers Do (1)
Programming Language (Source) -> Compiler -> Machine Language (Target)
What Do Compilers Do (2)
A compiler is a program that can read a program in one
language (the source language) and translate it into an
equivalent program in another language (the target language).
An important role of the compiler is to report any errors in the
source program that it detects during the translation process.
source program -> Compiler -> target program
input -> Target program -> output
What Do Compilers Do (3)
Compilers may generate three types of code:
Pure Machine Code
Machine instruction set without assuming the existence of any
operating system or library.
Mostly used for operating systems or embedded applications.
Augmented Machine Code
Code with OS routines and runtime support routines.
Virtual Machine Code
Virtual instructions that can be run on any architecture with a virtual
machine interpreter or a just-in-time compiler.
Ex. Java
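The idea of virtual machine code can be sketched with a tiny stack machine. The instruction names (PUSH, ADD, MUL) and the function run are hypothetical, chosen only for illustration; they are not the instruction set of Java or any real virtual machine.

```python
# Hypothetical virtual instruction set: PUSH n, ADD, MUL (illustrative names only).
def run(program):
    """Interpret a list of (opcode, operand) pairs on a simple stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Virtual code for the expression 2 + 3 * 4.
program = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None)]
result = run(program)
```

The same program list runs unchanged wherever this interpreter runs, which is the portability argument made for virtual machine code.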
Interpreter
Hybrid Compiler
Compilation: Translator -> intermediate program
Interpretation: intermediate program + input -> Virtual Machine -> output
Is the Compiler Enough?
Several other programs may be required to create an executable target
program.
Preprocessor: collects the separated parts of the source program into one
piece.
Linker: links the generated machine code with the needed object files and
library files by resolving external memory addresses.
Loader: loads all the executable object files into memory for
execution.
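How a linker resolves external addresses can be sketched as a two-pass procedure. The object-file layout used here (fields size, defs, refs) and the function link are assumptions made for illustration; real object formats carry far more information.

```python
def link(objects):
    """Lay out objects one after another and resolve external symbol references.

    Each object is a dict with hypothetical fields:
      'size' - number of words of code,
      'defs' - {symbol: offset of its definition inside this object},
      'refs' - symbols this object uses.
    """
    base, symtab = 0, {}
    for obj in objects:                      # pass 1: assign addresses to definitions
        for name, offset in obj["defs"].items():
            if name in symtab:
                raise ValueError(f"duplicate symbol: {name}")
            symtab[name] = base + offset
        base += obj["size"]
    resolved = []
    for obj in objects:                      # pass 2: look up every reference
        for name in obj["refs"]:
            if name not in symtab:
                raise ValueError(f"undefined symbol: {name}")
        resolved.append({name: symtab[name] for name in obj["refs"]})
    return symtab, resolved


main_o = {"size": 10, "defs": {"main": 0}, "refs": ["square"]}
lib_o = {"size": 4, "defs": {"square": 1}, "refs": []}
symtab, resolved = link([main_o, lib_o])
```

Here square is defined at offset 1 of the second object, which starts at address 10, so the reference from the first object resolves to address 11.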
Program Build Cycle
source program
Preprocessor
Compiler
Assembler
Linker / Loader
Analysis Synthesis
The Structure of a Compiler (2)
Two fundamental parts:
Analysis (front end):
1. Decomposes source program into pieces and imposes grammatical structure on
them.
2. If the source program is syntactically or semantically ill-formed, it must provide
informative messages so the user can take corrective action.
3. Collects information about the source program and stores it in a data structure
called a Symbol Table.
Synthesis (back end):
1. Generates intermediate representation of the source program.
2. Optimizes the intermediate code.
3. Constructs the desired target program from the intermediate representation
and the information in the symbol table.
The Structure of a Compiler (2)
Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Code Generator -> Target machine code
The Structure of a Compiler (3)
Analysis
The Analysis Task for Compilation
Comprises three phases:
1. Lexical / Scanning Analysis:
Left-to-right scan to identify tokens.
Tokens: sequences of characters having a collective meaning.
2. Syntax / Parsing Analysis:
Grouping tokens into units as specified by the grammar.
3. Semantic Analysis:
Checking to ensure correctness of components.
The Structure of a Compiler (Scanning)
Figure: Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code. Symbol and Attribute Tables (used by all phases of the compiler).
Scanner
The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
characters into individual words and symbols (tokens).
RE (Regular Expression)
NFA (Non-deterministic Finite Automata)
DFA (Deterministic Finite Automata)
LEX
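A minimal sketch of a scanner driven by regular expressions, using Python's standard re module. The token classes (NUMBER, ID, OP) and the function scan are assumptions for illustration; a LEX specification would describe the same patterns declaratively.

```python
import re

# Hypothetical token classes for a tiny assignment language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[=+*]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Read the input left to right and group characters into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":        # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("id1 = id2 + id3 * 60")
```

For this input the scanner yields four kinds of information in order: three identifiers, two operators besides the assignment, and the number 60, each paired with its text.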
The Structure of a Compiler (Parsing)
Figure: structure of a compiler (Source Program -> Scanner -> Parser -> Semantic Routines -> Optimizer -> Code Generator -> Target machine code, with Symbol and Attribute Tables used by all phases of the compiler).
Parser
Given a formal syntax specification (typically as a context-free
grammar [CFG]), the parser reads tokens and groups
them into units as specified by the productions of the CFG
being used.
As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
tree.
CFG (Context-Free Grammar)
BNF (Backus-Naur Form)
GAA (Grammar Analysis Algorithms)
LL, LR, SLR, LALR Parsers
YACC
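A minimal sketch of a hand-written recursive-descent (LL-style) parser that builds a syntax tree. The grammar, the (kind, text) token format, and the nested-tuple tree are assumptions for illustration; a generated LALR parser would accept the same language by a different method.

```python
# Assumed grammar (illustrative):
#   stmt   -> ID '=' expr
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> ID | NUMBER
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def expect(kind, text=None):
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, found {tok[1] or tok[0]!r}")
        pos += 1
        return tok

    def factor():
        if peek()[0] == "ID":
            return ("id", expect("ID")[1])
        return ("num", expect("NUMBER")[1])

    def term():
        node = factor()
        while peek() == ("OP", "*"):
            expect("OP", "*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            expect("OP", "+")
            node = ("+", node, term())
        return node

    target = expect("ID")
    expect("OP", "=")
    tree = ("=", ("id", target[1]), expr())
    expect("EOF")                            # reject trailing tokens
    return tree


tree = parse([("ID", "id1"), ("OP", "="), ("ID", "id2"), ("OP", "+"),
              ("ID", "id3"), ("OP", "*"), ("NUMBER", "60")])
```

Because term is called from expr, multiplication binds tighter than addition, so id3 * 60 becomes a subtree of the + node.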
Syntax / Parsing Analysis Example
Figure: structure of a compiler (Source Program -> Scanner -> Parser -> Semantic Routines -> Optimizer -> Code Generator -> Target machine code, with Symbol and Attribute Tables used by all phases of the compiler).
Semantic Routines
Perform two functions:
Check the static semantics of each construct
Do the actual translation
The heart of a compiler
Syntax Directed Translation
Semantic Processing Techniques
IR (Intermediate Representation)
Semantic Analysis
Example:
Before type checking:
=
  <id,1>
  +
    <id,2>
    *
      <id,3>
      60
After type checking (integer-to-float conversion inserted):
=
  <id,1>
  +
    <id,2>
    *
      <id,3>
      inttofloat
        60
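The conversion step in this example can be sketched as a type-checking pass over the syntax tree. The nested-tuple tree format, the two-type system (int and float), and the function check are assumptions for illustration only.

```python
def check(node, symtab):
    """Return (typed_tree, type), wrapping int operands in 'inttofloat' where types mix.

    Assumes only two types exist, 'int' and 'float', and that symtab maps
    identifier names to one of them.
    """
    kind = node[0]
    if kind == "num":
        return node, ("float" if "." in node[1] else "int")
    if kind == "id":
        if node[1] not in symtab:
            raise NameError(f"undeclared identifier: {node[1]}")
        return node, symtab[node[1]]
    if kind == "=":
        target, target_type = check(node[1], symtab)
        value, value_type = check(node[2], symtab)
        if target_type == "float" and value_type == "int":
            value = ("inttofloat", value)
        elif target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return ("=", target, value), target_type
    # Binary operator: convert the int side when the operand types differ.
    left, left_type = check(node[1], symtab)
    right, right_type = check(node[2], symtab)
    if left_type == right_type:
        return (kind, left, right), left_type
    if left_type == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (kind, left, right), "float"


symtab = {"id1": "float", "id2": "float", "id3": "float"}
tree = ("=", ("id", "id1"), ("+", ("id", "id2"), ("*", ("id", "id3"), ("num", "60"))))
typed_tree, result_type = check(tree, symtab)
```

With all three identifiers declared as float, only the literal 60 is wrapped, which mirrors the second tree in the example.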
The Structure of a Compiler
Synthesis
The Synthesis Task for Compilation
Comprises three phases:
4. Intermediate Code Generation.
5. Code Optimization.
6. Code Generation.
Intermediate Code Generation
Example:
Syntax tree:
=
  <id,1>
  +
    <id,2>
    *
      <id,3>
      inttofloat
        60
Intermediate code:
t1 = inttofloat (60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
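A minimal sketch of generating three-address code from a syntax tree by a post-order walk. The nested-tuple tree format and the function gen_tac are assumptions for illustration; the temporaries are named t1, t2, ... to match the example.

```python
def gen_tac(tree):
    """Translate an assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return node[1]                   # leaves need no code
        if kind == "inttofloat":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        left = walk(node[1])                 # operands first, then the operator
        right = walk(node[2])
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    _, target, value = tree
    code.append(f"{target[1]} = {walk(value)}")
    return code


tree = ("=", ("id", "id1"),
        ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttofloat", ("num", "60")))))
code = gen_tac(tree)
```

Each interior node of the tree produces exactly one instruction and one fresh temporary, which is why the instruction sequence follows the tree from the bottom up.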
The Structure of a Compiler (Optimization)
Figure: structure of a compiler (Source Program -> Scanner -> Parser -> Semantic Routines -> Optimizer -> Code Generator -> Target machine code, with Symbol and Attribute Tables used by all phases of the compiler).
Optimizer
The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
improved IR code.
This phase can be very complex and slow.
Peephole optimization,
loop optimization, register allocation, code scheduling
Register and Temporary Management
Peephole Optimization
Code Optimization (Example)
Example:
Before optimization:
t1 = inttofloat (60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
After optimization:
t1 = id3 * 60.0
id1 = id2 + t1
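The two improvements visible in this example, folding the conversion of a constant at compile time and removing the final copy, can be sketched as follows. The tuple instruction format (dest, op, arg1, arg2), the opcode name copy, and the function optimize are assumptions for illustration; real optimizers need data-flow information to apply such rewrites safely.

```python
def optimize(code):
    """Improve a list of (dest, op, arg1, arg2) instructions.

    Assumes temporaries are assigned once and that the destination of the
    second-to-last instruction is used only by the trailing copy.
    """
    constants = {}                           # temp name -> folded constant text
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)              # substitute already folded constants
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(a))  # do the conversion at compile time
            continue
        folded.append((dest, op, a, b))
    # Remove a trailing copy 'x = t' by computing the result directly into x.
    if len(folded) >= 2 and folded[-1][1] == "copy" and folded[-1][2] == folded[-2][0]:
        _, op, a, b = folded[-2]
        folded[-2:] = [(folded[-1][0], op, a, b)]
    return folded


code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
improved = optimize(code)
```

The result has two instructions, like the optimized code in the example, but this sketch does not renumber temporaries, so the product is held in t2 rather than t1.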
The Structure of a Compiler (Code Generation)
Figure: structure of a compiler (Source Program -> Scanner -> Parser -> Semantic Routines -> Optimizer -> Code Generator -> Target machine code).
Code Generator
Interpretive Code Generation
Generating Code from Tree/Dag
Grammar-Based Code Generator
Code Generation (Example)
Example:
The Structure of a Compiler (Complete Example)
Figure: stages and the data passed between them.
Tokens -> Parser [Syntax Analyzer] -> Parse tree -> Semantic Process [Semantic analyzer]
Code Generator [Intermediate Code Generator] -> Code Optimizer -> Optimized Intermediate Code -> Code Optimizer -> Target machine code
The Structure of a Compiler (Tools)
Compiler writing tools
Compiler generators or compiler-compilers
E.g. scanner and parser generators
Examples : Yacc, Lex
The Syntax and Semantics of Programming Language (1)
A programming language must include the specification of
syntax (structure) and semantics (meaning).
Syntax typically means the context-free syntax because of
the almost universal use of context-free grammars (CFGs).
Ex.
a = b + c is syntactically legal
b + c = a is illegal
The Syntax and Semantics of Programming Language (2)
The semantics of a programming language are commonly
divided into two classes:
Static semantics
Semantic rules that can be checked at compile time.
Ex. The type and number of a function’s arguments
Runtime semantics
Semantic rules that can be checked only at run time.
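The difference between the two classes can be sketched with one check of each kind. The signature table SIGNATURES, the function name max2, and both checking functions are hypothetical; the array-bounds check is used here as an assumed example of a rule that depends on run-time values.

```python
SIGNATURES = {"max2": ("float", "float")}    # hypothetical function table


def check_call(name, arg_types):
    """Static check: needs only the program text, so it can run at compile time."""
    expected = SIGNATURES[name]
    if len(arg_types) != len(expected):
        return f"{name}: expected {len(expected)} arguments, got {len(arg_types)}"
    for i, (got, want) in enumerate(zip(arg_types, expected), start=1):
        if got != want:
            return f"{name}: argument {i} has type {got}, expected {want}"
    return None                              # no error found


def safe_index(array, i):
    """Runtime check: the value of i is known only while the program executes."""
    if not 0 <= i < len(array):
        raise IndexError(f"index {i} out of range 0..{len(array) - 1}")
    return array[i]


arity_error = check_call("max2", ["float"])
type_error = check_call("max2", ["float", "int"])
ok = check_call("max2", ["float", "float"])
```

A compiler can report the first two errors without running the program, whereas code equivalent to safe_index has to be emitted into the target program and executed on every access.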
Computer Architecture and Compiler Design
Compilers should exploit hardware-specific features
and computing capabilities to optimize code.
The problems encountered in modern computing
platforms:
Instruction sets for some popular architectures are highly
nonuniform.
High-level programming language operations are not always
easy to support.
Ex. exceptions, threads, dynamic heap access …
Exploiting architectural features such as caches, distributed
processors, and memory
Effective use of a large number of processors
Compiler Design Considerations
Debugging Compilers
Designed to aid in the development and debugging of
programs.
Optimizing Compilers
Designed to produce efficient target code
Retargetable Compilers
A compiler whose target architecture can be changed without
its machine-independent components having to be rewritten.