
Compiler Construction

COMP 451
Introduction

Today’s Agenda
§ A basic intro in which we will see
— Types of compilers
— Major parts of the compilation process
— Other programs that work with the compiler to execute a program
§ Structure / phases of a compiler
§ A brief overview of all the phases of a compiler

Compiler
§ A compiler is a program that reads a program written in one language and
translates it into an equivalent program in another language.
§ A compiler also reports errors present in the source program as part of
the translation process.

Compilation Process
Major Parts of the Compilation Process
§ Analysis (Front End)
— Takes the input source program and breaks it into parts.
— Creates an intermediate representation of the source program.
— Generates symbol table.
§ Synthesis (Back End)
— Takes the intermediate representation as input.
— Creates desired target program.
§ These two parts are also called passes of the compilation process.

Phases of a Compiler

[Figure: the phases of a compiler, from the lexical analyzer through the target code generator]
Phases of a Compiler
Lexical Analyzer
§ It reads the source program and converts it into tokens.
§ Tokens are defined by regular expressions which are understood by the
lexical analyzer.
Syntax Analyzer
§ Also called parser.
§ It takes all the tokens one by one and uses Context Free Grammar to
construct the parse tree.
Semantic Analyzer
§ Checks whether the parse tree is meaningful, i.e., semantically consistent.
§ Produces a verified/annotated parse tree.
§ It also performs type checking.
Intermediate Code Generator
§ Last phase of front end.
§ Intermediate code is a representation that can be readily translated into
machine code.
§ Three-address code is a popular example of intermediate code (IC).

Phases of a Compiler
Intermediate Code Generator …
§ Up to the intermediate code, compilation is essentially the same for every
compiler; after that, it depends on the target platform.
§ To build a new compiler, we don’t need to build it from scratch.
§ We can take the intermediate code from an already existing compiler and
build only the last two parts.

Phases of a Compiler
Code Optimizer
§ It transforms the code so that it consumes fewer resources and runs
faster.
§ The meaning of the code being transformed is not altered.

Target Code Generator
§ This is the final stage of compilation.
§ The main purpose of the target code generator is to emit code that the
machine can understand.
§ The output depends on the target assembler.

Symbol Table
§ It stores information
— about scope
— about instances of various entities such as variable and function
names, classes, objects, etc.
§ It is built during the lexical and syntax analysis phases.
§ The information is collected by the analysis phases of the compiler and
is used by the synthesis phases to generate code.
§ It is used by the compiler to achieve compile-time efficiency.
§ Each entry in the symbol table is associated with attributes that support
the compiler in different phases.
§ Items stored in symbol table:
— Variable names and constants
— Procedure and function names
— Literal constants and strings
— Compiler generated temporaries
— Labels in source languages
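To make this concrete, here is a minimal sketch of what a symbol-table entry might look like in C. This is illustrative only; the field names and kinds are assumptions, not taken from any particular compiler.

/* Illustrative sketch of a symbol-table entry; real compilers
   store many more attributes per entry. */
enum sym_kind { SYM_VAR, SYM_FUNC, SYM_CONST, SYM_TEMP, SYM_LABEL };

struct sym_entry {
    char         *name;   /* the lexeme, e.g. "x3"                */
    enum sym_kind kind;   /* variable, function, constant, ...    */
    int           type;   /* a code for int, real, pointer, ...   */
    int           scope;  /* nesting level where it was declared  */
    int           offset; /* storage location, filled in later    */
};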

Types of Compilers

§ Single Pass Compiler
§ Multi Pass Compiler

Single Pass Compiler

A single pass compiler
• passes through each compilation unit exactly once.
• is faster and smaller than a multi-pass compiler.
• produces less efficient code than a multi-pass compiler.
• processes the input exactly once, going directly from lexical analysis to
code generation, and then returning for the next read.
• cannot back up and re-process the input, so the grammar must be kept
simple.
• Examples include command interpreters such as bash/sh/tcsh.
Two Pass Compiler

A two pass compiler
• processes the source code of a program multiple times.
• In a multi-pass compiler, the phases are divided into two passes.
• The first pass is called the front end and is platform independent.
• The second pass is called the back end and is platform dependent.

Front End / Back End Division
§ As discussed earlier, modern compilers contain two (large) parts, each of
which is often subdivided further.
§ These two parts are the front end, and the back end.
§ The front end
— analyzes the source program,
— determines its constituent parts, and
— constructs an intermediate representation of the program.
§ Typically the front end is independent of the target language.
§ The back end
— synthesizes the target program from the intermediate representation
produced by the front end.
§ Typically the back end is independent of the source language.

Front End / Back End Division
§ This front/back division greatly reduces the work for a compiling system
that must handle several (N) source languages and several (M) target
languages.
§ Instead of N×M compilers, we need only N front ends and M back ends.
§ For gcc (originally abbreviating "GNU C Compiler", now "GNU Compiler
Collection"), N=7 and M≈30, so we need about 37 components instead of
roughly 210 compilers; the savings are considerable.

Front End / Back End Division

• A compiler for more than one programming language but the same
machine.
• For each programming language a separate front end (first pass) is
required, while only one back end (second pass) is needed, as shown.

Front End / Back End Division

• A compiler for the same programming language but different
machines/systems.
• In this case we build different back ends for the different machines and
only one front end for the programming language, as shown.

Programs That Help the Compiler Accomplish Its Task

Cousins of Compilers
§ These are different programs that work with the compiler to generate
machine code.
§ Besides the compiler itself, three programs work together to do the task:
— Preprocessor
— Assembler
— Loader/Linker

Cousins of Compilers
Preprocessor
§ Performs macro expansion.
§ Performs file inclusion.
§ Strips off comments.
§ Provides a modified (expanded) source program.
Compiler
§ Checks for syntax and semantic errors.
§ If there are no errors, the expanded code is converted into assembly code
specific to the processor architecture.
Assembler
§ Produces relocatable machine code.

Cousins of Compilers
Linker
§ Combines the object files and library routines required to run the
complete program.
§ It then hands the result over to the loader.
Loader
§ Loads the entire program (as produced by the assembler and the linker)
into main memory for execution.

Cousins of Compilers
§ Write a simple program in C on *NIX.
§ Compile it using
$ gcc -save-temps prg.c
§ This will produce four files: prg.i, prg.s, prg.o, and a.out.
§ Go through these files to understand the working of the cousins of
compilers discussed in the previous slides.
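For instance, prg.c could be as simple as the following (any small C program will do):

/* prg.c: a minimal program for experimenting with gcc -save-temps */
#include <stdio.h>

int main(void) {
    printf("hello, compiler\n");
    return 0;
}

Here prg.i is the preprocessed source, prg.s the generated assembly, prg.o the relocatable object file, and a.out the linked executable.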

Cousins of Compilers
§ Linking is of two types.
— Static linking
• Code of the external functions is copied into the executable.
— Dynamic linking
• Code placement is delayed until load time / run time.
§ To illustrate the concept, try compiling the program prg.c as follows:
$ gcc -static prg.c -o prgstatic
and
$ gcc prg.c -o prgdyn
§ Now, assuming both prgstatic and prgdyn are in the current working
directory, run
$ ls -li
§ Look at the size of both executables; the statically linked one is
typically much larger.

Coming Back to the Phases of a Compiler

Understanding with an Example

Lexical Analysis (Scanning)
Source Text into Tokens
§ Reading the input text (character by character) and grouping individual
characters into tokens such as identifiers, integers, reserved words, and
delimiters.
§ The main action of building tokens is often driven by token descriptions.
§ How to describe these tokens?
§ Regular expression notation is an effective approach to describing
tokens.
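For example, C-like tokens can be described with patterns such as the following (illustrative regular expressions, not taken from the slides):

identifier:  [a-zA-Z_][a-zA-Z0-9_]*
integer:     [0-9]+
reserved:    if | else | while | int | float | return
delimiter:   ; | , | ( | ) | { | }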

Lexical Analysis (Scanning)
Lexemes and Tokens
§ The character-stream input is grouped into meaningful units called
lexemes, which are then mapped into tokens.
§ Tokens are the output of the lexical analyzer.
§ For example, any one of the following C statements
x3 = y + 3;
x3=y + 3 ;
x3=y+3 ;
(but not x 3 = y + 3;)
would be grouped into the lexemes x3, =, y, +, 3, and ;

Lexical Analysis (Scanning)
Lexemes and Tokens
§ Recall that x3 = y + 3; is grouped into the lexemes x3, =, y, +, 3, and ;
§ A token is a <token-name, attribute-value> pair. For example
— The lexeme x3 would be mapped to a token such as <id,1>.
— The name id is short for identifier.
— The value 1 is the index of the entry for x3 in the symbol table
produced by the compiler.
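As a sketch, such a pair could be represented in C roughly as follows; the enum names and the attribute convention are my assumptions:

/* Illustrative <token-name, attribute-value> pair. */
enum tok_name { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_SEMI };

struct token {
    enum tok_name name; /* e.g. TOK_ID                               */
    int           attr; /* e.g. symbol-table index; ignored for      */
};                      /* single-purpose tokens such as '=' and '+' */

/* For x3 = y + 3; a scanner would emit roughly:
   {TOK_ID,1} {TOK_ASSIGN,0} {TOK_ID,2} {TOK_PLUS,0} {TOK_NUM,...} {TOK_SEMI,0} */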

Lexical Analysis (Scanning)
Lexemes and Tokens
§ The lexeme =
— would be mapped to the token <=>.
— In reality it is probably mapped to a pair, whose second component is
ignored.
— The point is that there are many different identifiers so we need the
second component, but there is only one assignment symbol =.
§ The lexeme y is mapped to the token <id,2>
§ The lexeme + is mapped to the token <+>.

Lexical Analysis (Scanning)
Lexemes and Tokens
§ The lexeme ‘3’ is somewhat interesting and will be discussed later.
§ The lexeme ‘;’ is mapped to the token <;>
§ Note that non-significant blanks are normally removed during scanning.
§ In C, most blanks are non-significant.
§ That does not mean the blanks are unnecessary. Consider
int x; and
intx;
§ The blank between int and x is clearly necessary, but it does not become
part of any token.
§ Blanks inside strings are an exception; they are considered part of the
token.

Syntax Analysis (Parsing)
§ The parser is based on a formal syntax specification such as a CFG.
§ What is a CFG?
— Context-free Grammars allow a non-terminal to be replaced by a
corresponding production rule whenever it appears in a derivation
process.
— The replacement occurs irrespective of what lies before or after the
non-terminal.
§ And what is a CSG?
— Context-sensitive Grammars allow replacement of a non-terminal
based on what lies before or after the non-terminal.

Syntax Analysis (Parsing)
§ Consider the rule

A → 0A1

§ What this says is "wherever you find A, you can replace it with 0A1".
§ Now, consider the rule

BAC → B0A1C

§ This says "You can replace A with 0A1 only if it is preceded by B and
followed by C".
§ Here, a condition is imposed on when A can be replaced with 0A1.
§ You can apply this rule only if A appears in this particular context.

Syntax Analysis (Parsing)
§ Parsing involves a further grouping in which tokens are grouped into
grammatical phrases, which are often represented in a parse tree.
§ For example
x3 = y + 3;
would be parsed into a tree.
§ This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
§ Note the recursive definition of expression (expr).
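To make the grammar concrete, here is a minimal recursive-descent sketch in C (an illustration, not from the slides). Because expr → expr + expr is left-recursive, it is rewritten here as expr → term { + term } so it can be parsed top-down; next() and syntax_error() are assumed scanner helpers.

/* Illustrative recursive-descent parser for:
     asst-stmt -> id = expr ;
     expr      -> term { + term }   (left recursion removed)
     term      -> number | id                                 */
enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_SEMI, TOK_EOF };

static int lookahead;               /* current token              */
int  next(void);                    /* assumed: fetch next token  */
void syntax_error(const char *msg); /* assumed: report an error   */

static void expect(int t) {
    if (lookahead == t) lookahead = next();
    else                syntax_error("unexpected token");
}

static void term(void) {            /* term -> number | id */
    if (lookahead == TOK_NUM || lookahead == TOK_ID)
        lookahead = next();
    else
        syntax_error("expected number or id");
}

static void expr(void) {            /* expr -> term { + term } */
    term();
    while (lookahead == TOK_PLUS) {
        lookahead = next();
        term();
    }
}

static void asst_stmt(void) {       /* asst-stmt -> id = expr ; */
    expect(TOK_ID);
    expect(TOK_ASSIGN);
    expr();
    expect(TOK_SEMI);
}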

Syntax Analysis (Parsing)

[Figure: parse tree for x3 = y + 3;]
Syntax Analysis (Parsing)

§ Often we utilize a simpler tree called the syntax tree with operators as
interior nodes and operands as the children of the operator.
§ The syntax tree for our example (figure omitted) corresponds to the parse
tree on the previous slide.
§ More on this point later.

Semantic Analysis
§ Also called the Type Checker.
§ Type checking is purely dependent on the semantic rules of the source
language. It is independent of the compiler’s target.
§ The semantics of a language provide a set of rules that specify which
syntactically legal programs are actually valid.
§ Such rules typically require that
— all identifiers be declared,
— operators and operands be type-compatible, and
— procedures be called with the proper number of parameters.
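For instance, the following is syntactically legal C, but each flagged line violates one of these semantic rules (an illustrative example, not from the slides):

int f(int a, int b) { return a + b; }

int main(void) {
    int *p;
    x = 1;      /* error: identifier x is not declared         */
    p = p + p;  /* error: pointer + pointer, operands are not  */
                /* type-compatible                             */
    f(1);       /* error: wrong number of parameters           */
    return 0;
}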

Semantic Analysis
§ The compiler needs semantic information, e.g., the types (integer, real,
pointer to array of integers, etc.) of the objects involved.
§ This enables checking for semantic errors and inserting type conversion
where necessary.
§ For example, if y was declared to be real and x3 an integer, we need to
insert the conversion operators "inttoreal" and "realtoint", as sketched
below.
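As a rough sketch of the idea (an assumption-laden illustration, not the slides' method), a type checker might pick the conversion to insert like this:

#include <stddef.h> /* for NULL */

/* Illustrative only: which conversion operator, if any, turns a
   value of type 'have' into a value of type 'want'? */
enum type { TY_INT, TY_REAL };

static const char *conversion(enum type have, enum type want) {
    if (have == want)                      return NULL; /* none needed */
    if (have == TY_INT && want == TY_REAL) return "inttoreal";
    return "realtoint";                    /* TY_REAL -> TY_INT */
}

/* For x3 = y + 3; with y real and x3 int:
   conversion(TY_INT,  TY_REAL) yields "inttoreal" (applied to 3), and
   conversion(TY_REAL, TY_INT)  yields "realtoint" (applied to the sum). */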

Intermediate Code Generation
§ Many compilers internally generate intermediate code for an "idealized
machine".
§ For example, the intermediate code generated would assume that the
target has an unlimited number of registers and that any register can be
used for any operation.
§ With these assumptions (a machine with an unlimited number of registers
and instructions with three operands), one generates "three-address code"
by walking the semantic tree.
§ Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3

Intermediate Code Generation
§ Sometimes three-address code is called quadruples, because one can view
the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
§ Each "quad" has the form
operation target source1 source2
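As a sketch, such quads could be represented in C like this (the type and field names are my own, illustrative choices):

#include <stddef.h> /* for NULL */

enum op { OP_INTTOREAL, OP_ADD, OP_REALTOINT, OP_ASSIGN };

struct quad {
    enum op     operation;
    const char *target;  /* e.g. "temp1"              */
    const char *source1; /* e.g. "3"                  */
    const char *source2; /* NULL when the op is unary */
};

/* The code sequence above, written as data: */
struct quad code[] = {
    { OP_INTTOREAL, "temp1", "3",     NULL    },
    { OP_ADD,       "temp2", "id2",   "temp1" },
    { OP_REALTOINT, "temp3", "temp2", NULL    },
    { OP_ASSIGN,    "id1",   "temp3", NULL    },
};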

Code Optimization
§ This is a very serious subject, one that we will not really do justice to in
this introductory course. Some optimizations are fairly easy to see.
§ Since 3 is a constant, the compiler can perform the int-to-real conversion
at compile time and replace the first two quads with
add temp2 id2 3.0
§ The last two quads can be combined into
realtoint id1 temp2
§ In addition to optimizations performed on the intermediate code, further
optimizations can be performed on the machine code by the machine-
dependent back end.

Code Generation
§ Modern processors have only a limited number of registers.
§ Although some processors, such as the x86, can perform operations
directly on memory locations, we will for now assume only register
operations.
§ Some processors (e.g., the MIPS architecture) use three-address
instructions.
§ We follow this model.
§ Other processors permit only two addresses; the result overwrites one of
the sources.
§ Using three-address instructions restricted to registers (except for load
and store instructions, which naturally must also reference memory),
code something like the following would be produced for our example,
after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, #3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2

