Compiler CH1
Compiler Design
1. Introduction
Overview and History
Cause:
Software for early computers was written in assembly language.
The benefits of reusing software on different CPUs started to become significantly greater than the cost of writing a compiler.
Why Study Compilers?
Build a large, ambitious software system.
Learn how to build programming languages.
Learn how programming languages work.
Learn tradeoffs in language design.
Build compilers for new platforms and for new languages.
What Do Compilers Do?
A compiler acts as a translator,
transforming human-oriented programming languages
into computer-oriented machine languages.
In addition to the target program, the compiler's output includes error messages.
The Structure of a Compiler
There are two major parts of a compiler: Analysis and Synthesis
Compiler
Analysis: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer
Synthesis: Intermediate Code Generator, Code Optimizer, Code Generator
Analysis and Synthesis
Analysis: The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
Often a special kind of tree called a syntax tree is used, in which each node represents an operation and the children of a node represent the arguments of the operation (see the sketch below).
Synthesis: The synthesis part constructs the desired target program from the intermediate representation. Of the two parts, synthesis requires the most specialized techniques.
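As an illustration, the assignment newval := oldval + 12 used on later slides could be represented by a syntax tree whose root is the := operation. The following C sketch is only illustrative; the type and field names are assumptions, not part of these slides.

/* Minimal syntax-tree sketch (illustrative names). */
#include <stdio.h>

typedef enum { N_ASSIGN, N_ADD, N_IDENT, N_NUMBER } NodeKind;

typedef struct Node {
    NodeKind kind;        /* which operation or leaf this node is      */
    const char *name;     /* identifier name, used for N_IDENT nodes   */
    int value;            /* literal value, used for N_NUMBER nodes    */
    struct Node *left;    /* first argument (child) of the operation   */
    struct Node *right;   /* second argument (child) of the operation  */
} Node;

int main(void) {
    /* newval := oldval + 12  ==>  :=(newval, +(oldval, 12)) */
    Node oldval = { N_IDENT,  "oldval", 0,  NULL,    NULL    };
    Node twelve = { N_NUMBER, NULL,     12, NULL,    NULL    };
    Node plus   = { N_ADD,    NULL,     0,  &oldval, &twelve };
    Node newval = { N_IDENT,  "newval", 0,  NULL,    NULL    };
    Node assign = { N_ASSIGN, NULL,     0,  &newval, &plus   };
    printf("root operation kind = %d\n", assign.kind);  /* 0 = N_ASSIGN */
    return 0;
}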
Phases of A Compiler
Source Program
→ Lexical Analyzer
→ Syntax Analyzer
→ Semantic Analyzer
→ Intermediate Code Generator
→ Code Optimizer
→ Code Generator
→ Target Program
(Symbol Table and Error Handlers are used by all phases.)
• Each phase transforms the source program from one representation
into another representation.
• They communicate with error handlers.
• They communicate with the symbol table.
Lexical Analyzer
Lexical Analyzer reads the source program character by character
and returns the tokens of the source program.
A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
Ex: newval := oldval + 12   =>   tokens:
newval    identifier
:=        assignment operator
oldval    identifier
+         add operator
12        a number
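As a rough sketch (not from the slides), the tokens returned by the lexical analyzer could be modeled in C as follows; the enum and field names are assumptions.

/* Illustrative token representation. */
typedef enum { TOK_IDENTIFIER, TOK_ASSIGN, TOK_ADD, TOK_NUMBER } TokenKind;

typedef struct {
    TokenKind kind;       /* which pattern the characters matched   */
    const char *lexeme;   /* the matched characters, e.g. "newval"  */
} Token;

/* newval := oldval + 12 would then yield the token stream:
   {TOK_IDENTIFIER,"newval"} {TOK_ASSIGN,":="} {TOK_IDENTIFIER,"oldval"}
   {TOK_ADD,"+"} {TOK_NUMBER,"12"}                                       */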
Syntax Analyzer versus Lexical Analyzer
Which constructs of a program should be recognized by the
lexical analyzer, and which ones by the syntax analyzer?
Both of them do similar things, but the lexical analyzer deals with simple, non-recursive constructs of the language, while the syntax analyzer deals with recursive constructs of the language.
The lexical analyzer simplifies the job of the syntax analyzer.
The lexical analyzer recognizes the smallest meaningful units
(tokens) in a source program.
The syntax analyzer works on the smallest meaningful units
(tokens) in a source program to recognize meaningful structures
in our programming language.
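For example (an illustrative sketch, not taken from the slides), an identifier is a non-recursive construct that the lexical analyzer can describe with a regular expression, while nested arithmetic expressions need a recursive grammar rule handled by the syntax analyzer:
identifier :  letter (letter | digit)*       -- lexical analyzer (non-recursive)
expr       :  expr + term | term             -- syntax analyzer (recursive)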
Semantic Analyzer
A semantic analyzer checks the source program for
semantic errors and collects the type information for the
code generation.
Type-checking is an important part of the semantic analyzer.
Normally, semantic information cannot be represented by the context-free grammars used in syntax analysis.
The context-free grammars used in syntax analysis are therefore augmented with attributes (semantic rules); the result is a syntax-directed translation (attribute grammars).
Ex: newval := oldval + 12
The type of the identifier newval must match the type of the expression (oldval + 12).
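A minimal C sketch of such a check is shown below; the helper functions and type names are assumptions about how an analyzer might be organized, not the slides' implementation.

/* Illustrative type check for an assignment (names are assumptions). */
typedef enum { TY_INT, TY_REAL, TY_ERROR } Type;

/* Stub helpers; a real analyzer would consult the symbol table and
   walk the expression's syntax tree.                                */
static Type type_of_identifier(const char *name) { (void)name; return TY_INT; }
static Type type_of_expression(void)             { return TY_INT; }

/* The type of the left-hand side must match the type of the expression. */
static Type check_assignment(const char *lhs_name) {
    Type lhs = type_of_identifier(lhs_name);   /* type of newval             */
    Type rhs = type_of_expression();           /* type of (oldval + 12)      */
    return (lhs == rhs) ? lhs : TY_ERROR;      /* mismatch => semantic error */
}

int main(void) { return check_assignment("newval") == TY_ERROR; }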
Intermediate Code Generation
A compiler may produce explicit intermediate code representing the source program.
This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
Ex:
newval := oldval * fact + 1
=>
MULT id2,id3,temp1
ADD  temp1,#1,id1
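Such three-address instructions are commonly stored as quadruples of the form (Operation, Left Operand, Right Operand, Result). A rough C sketch (field names are assumptions):

/* Illustrative quadruple representation of intermediate code. */
typedef struct {
    const char *op;       /* operation, e.g. "MULT" or "ADD"  */
    const char *arg1;     /* left operand                     */
    const char *arg2;     /* right operand                    */
    const char *result;   /* where the result is stored       */
} Quad;

/* newval := oldval * fact + 1 as two quadruples: */
static const Quad code[] = {
    { "MULT", "id2",   "id3", "temp1" },   /* temp1 = oldval * fact */
    { "ADD",  "temp1", "#1",  "id1"   },   /* id1   = temp1 + 1     */
};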
Code Generator
Produces the target code for a specific architecture.
The target program is normally a relocatable object file containing machine code.
Ex:
(assume an architecture in which at least one operand of each instruction must be a machine register)
MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
The Structure of a Compiler (More Detailed Example)
Tokens
→ Parser [Syntax Analyzer]
→ Parse tree
→ Semantic Process [Semantic Analyzer / Intermediate Code Generator]
→ Intermediate code
→ Code Optimizer
→ Optimized Intermediate Code
→ Code Generator
→ Target machine code
Compiler Construction Tools
Programs to be discussed: lex, yacc, and gcc.
General Compiler Infrastructure
Program source (stream of characters)
→ Scanner (tokenizer) → Tokens
→ Parser → Syntactic Structure
→ Semantic Routines → IR: Intermediate Representation (1)
→ Analysis / Transformations / Optimizations → IR: Intermediate Representation (2)
→ Code Generator → Assembly code
(Symbol and Attribute Tables are shared by all phases.)
lex Programming Utility
General Information:
• Input is stored in a file with *.l extension
• File consists of three main sections
• lex generates a C function stored in lex.yy.c
Using lex:
1) Specify the words to be used as tokens (an extension of regular expressions)
2) Run the lex utility on the source file to generate yylex(), a C function
3) lex declares the global variables char *yytext and int yyleng
lex Programming Utility
%{
#include <stdio.h>   /* lex patterns and actions follow below */
int i;
%}
INT [0-9]+
%%
{INT}   { sscanf(yytext, "%d", &i);
          printf("INTEGER\n"); }
%%
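A typical way to use the generated scanner on its own (the file name scanner.l is an assumption) is:
lex scanner.l          generates lex.yy.c containing yylex()
cc lex.yy.c -ll        compiles it; on many systems -ll links the lex library, which supplies main() and yywrap()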
Using yacc:
1) Generates a C function called yyparse()
2) yyparse() may include calls to yylex()
3) Compile this function to obtain the compiler
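A minimal sketch of what a yacc source file might look like is shown below; the grammar, token name, and error handling are illustrative assumptions, not from the slides. It expects yylex() to be supplied by a companion lex file.

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token INTEGER
%%
expr : expr '+' INTEGER      /* recursive rule: sums of integers */
     | INTEGER
     ;
%%
int main(void) { return yyparse(); }   /* yyparse() calls yylex() for tokens */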
yacc Parser Generator
yacc source → yacc → y.tab.c
lex source → lex → lex.yy.c   (y.tab.c can #include "lex.yy.c")
y.tab.c + lex.yy.c → cc → a.out
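The corresponding commands might look like the following; the file names grammar.y and scanner.l are assumptions:
yacc grammar.y              produces y.tab.c containing yyparse()
lex scanner.l               produces lex.yy.c containing yylex()
cc y.tab.c lex.yy.c -ll     compiles both into a.out (on many systems -ll supplies lex's default yywrap())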
Lex & Yacc
[Figure: intermediate code represented as quadruples of the form (Operation, Left Operand, Right Operand, Result)]
gcc Compiler
General Information:
gcc is the GNU Project C compiler.
It is a command-line program.
gcc takes C source files as input.
By default, it outputs an executable named a.out.
You can specify a different output filename.
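Ex (a typical invocation; the file name myprog.c is an assumption):
gcc myprog.c               compiles myprog.c and produces a.out
gcc myprog.c -o myprog     the -o option names the output file myprog instead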