Compiler Construction (COMP 451)
Compiler: An Overview
Today’s Agenda
▪ A basic intro in which we will see
— Types of compilers
— Major parts of the compilation process
— Other programs that work with the compiler to execute a program
— Structure/phases of a compiler
— A brief overview of the functionality of all the phases of a compiler, using an example.
Compiler
▪ A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language.
▪ A compiler also reports errors present in the source program as part of the translation process.
Compilation Process
Major Parts of the Compilation Process
▪ Analysis (Front End)
— Takes the input source program and breaks it into parts.
— Creates an intermediate representation of the source program.
— Generates the symbol table.
▪ Synthesis (Back End)
— Takes the intermediate representation as input.
— Creates the desired target program.
▪ These two parts are also called the passes of the compilation process.
Phases of a Compiler
Lexical Analyzer
▪ It reads the program and converts it into tokens.
▪ Tokens are defined by regular expressions, which are understood by the lexical analyzer.
Phases of a Compiler
Syntax Analyzer
▪ Also called the parser.
▪ It takes the tokens one by one and uses a context-free grammar to construct the parse tree.
Phases of a Compiler
Semantic Analyzer
▪ Verifies whether the parse tree is meaningful.
▪ Produces a verified/corrected/annotated parse tree.
▪ It also performs type checking.
Phases of a Compiler
Intermediate Code Generator
▪ The last phase of the front end.
▪ Intermediate code is a machine-independent representation that can be readily translated into target machine code.
▪ Three-address code is a popular example of intermediate code.
▪ Up to the intermediate code, compilation is essentially the same for every compiler; after that, it depends on the target platform.
▪ To build a new compiler, we don’t need to build it from scratch.
▪ We can take the intermediate code from an already existing compiler and build only the last two phases.
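For instance, a statement such as x3 = y + 3; might be lowered to three-address code of the following form (a sketch; the temporary name t1 is illustrative, and exact formats vary by compiler):

    t1 = y + 3
    x3 = t1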
Phases of a Compiler
Code Optimizer
▪ It transforms the code so that it consumes fewer resources and runs faster.
▪ The meaning of the code being transformed is not altered.
Symbol Table
▪ It stores information
— about scope
— about instances of various entities such as variable and function names, classes, objects, etc.
▪ It is populated in the lexical and syntax analysis phases.
▪ The information is collected by the analysis phases of the compiler and is used by the synthesis phases to generate code.
▪ It is used by the compiler to achieve compile-time efficiency.
▪ Each entry in the symbol table is associated with attributes that support the compiler in different phases.
▪ Items stored in the symbol table:
— Variable names and constants
— Procedure and function names
— Literal constants and strings
— Compiler-generated temporaries
— Labels in the source language
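As an illustrative sketch (the exact attribute layout varies by compiler), the entries created while scanning x3 = y + 3; might look like this; the indices 1 and 2 are the same ones used in the token examples later in this lecture:

    Index  Name  Kind      Attributes (type, scope, ...)
    1      x3    variable  filled in during later phases
    2      y     variable  filled in during later phases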
Broad Categories of Compilers
Single Pass Compiler
[Figure: structure of a single-pass compiler]
Two Pass Compiler
• A two-pass compiler processes the source code of a program multiple times.
• In a two-pass compiler, we divide the phases into two passes.
• The first pass is called the front end and is platform independent.
• The second pass is called the back end and is platform dependent.
Front End / Back End Division
▪ As discussed earlier, modern compilers contain two (large) parts, each of which is often subdivided further.
▪ These two parts are the front end and the back end.
▪ The front end
— analyzes the source program,
— determines its constituent parts, and
— constructs an intermediate representation of the program.
▪ Typically, the front end is independent of the target language.
▪ The back end
— synthesizes the target program from the intermediate representation produced by the front end.
▪ Typically, the back end is independent of the source language.
Front End / Back End Division
▪ This front/back division greatly reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages.
▪ Instead of N×M compilers, we need only N front ends and M back ends.
▪ For gcc (originally abbreviating "GNU C Compiler", but now abbreviating "GNU Compiler Collection"), N=7 and M≈30, so the savings are considerable: roughly 37 components instead of about 210 separate compilers.
Programs Which Help the Compiler Accomplish Its Task
Cousins of Compilers
▪ These are the different programs that work with the compiler to generate machine code.
▪ Other than the compiler, there are three programs that work together to do the task:
— Preprocessor
— Assembler
— Linker/Loader
Cousins of Compilers
Preprocessor
▪ Macro expansion
▪ File inclusion
▪ Strips off comments
▪ Provides a modified source program.
Compiler
▪ Checks for syntax and semantic errors.
▪ If there are no errors, the expanded code is converted into assembly code specific to the processor architecture.
Assembler
▪ Produces relocatable machine code.
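As a quick illustration of macro expansion and file inclusion, consider a source file such as the following (a hypothetical prg.c); the preprocessor pastes in the contents of stdio.h, expands SQUARE(5) to ((5) * (5)), and strips the comments before the compiler proper ever sees the code:

    #include <stdio.h>              /* file inclusion */
    #define SQUARE(x) ((x) * (x))   /* macro definition */

    int main(void) {
        printf("%d\n", SQUARE(5));  /* expands to ((5) * (5)) */
        return 0;
    }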
Cousins of Compilers
Linker
▪ Combines object files and the library routines required to run the complete source program.
▪ It then hands the result over to the loader.
Loader
▪ Loads the entire program (the one provided by the assembler and the linker) into main memory for execution.
Cousins of Compilers: Practical Review
▪ Write a simple program in C on *NIX.
▪ Compile it using
$ gcc -save-temps prg.c
▪ This will produce four files: prg.i (the preprocessed source), prg.s (the assembly code), prg.o (the relocatable object code), and a.out (the linked executable).
▪ Go through these files to understand the working of the cousins of the compiler discussed in the previous slides.
Linking: Static vs. Dynamic
▪ Linking is of two types.
— Static linking
• The code of external functions is placed in the executable.
— Dynamic linking
• Code placement is delayed until load time / run time.
▪ To illustrate the concept, try compiling any C program as follows
$ gcc -static prg.c -o prgstatic
and
$ gcc prg.c -o prgdyn
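To see the difference, compare the two executables (a sketch; exact output varies by system, but the static binary is typically much larger):

    $ ls -l prgstatic prgdyn   # prgstatic is far bigger
    $ ldd prgdyn               # lists the shared libraries loaded at run time
    $ ldd prgstatic            # typically reports "not a dynamic executable"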
Coming Back to the Phases of a Compiler
Lexical Analysis (Scanning)
Source Text into Tokens
▪ Reading the input text (character by character) and grouping individual characters into tokens such as identifiers, integers, reserved words, and delimiters.
▪ How to describe these tokens?
▪ Regular expression notation is an effective approach to describing tokens.
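For example, common token classes can be described with regular expressions along these lines (a sketch following typical C conventions, not the exact ISO grammar):

    identifier → [A-Za-z_][A-Za-z0-9_]*
    integer    → [0-9]+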
Lexical Analysis (Scanning)
Lexemes and Tokens
▪ The character-stream input is grouped into units called lexemes, which are then mapped into tokens.
▪ Tokens are the output of the lexical analyzer.
▪ For example, any one of the following C statements
x3 = y + 3;
x3 = y + 3 ;
x3 =y+3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;
▪ A token is a <token-name, attribute-value> pair. For example
— The lexeme x3 would be mapped to a token such as <id,1>.
— The name id is short for identifier.
— The value 1 is the index of the entry for x3 in the symbol table produced by the compiler.
Lexical Analysis (Scanning)
Lexemes and Tokens
▪ The lexeme =
— would be mapped to the token <=>.
— In reality it is probably mapped to a pair whose second component is ignored.
— The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
▪ The lexeme y is mapped to the token <id,2>.
▪ The lexeme + is mapped to the token <+>.
Lexical Analysis (Scanning)
Lexemes and Tokens
▪ The lexeme ‘3’ is somewhat interesting and will be discussed later.
▪ The lexeme ‘;’ is mapped to the token <;>.
▪ Note that non-significant blanks are normally removed during scanning.
▪ In C, most blanks are non-significant.
▪ That does not mean the blanks are unnecessary. Consider
int x; and
intx;
▪ The blank between int and x is clearly necessary, but it does not become part of any token.
▪ Blanks inside strings are an exception; they are strictly considered part of the token.
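Putting the pieces together, the statement x3 = y + 3; would be scanned into a token stream along these lines (the exact token for the constant 3 is discussed later; <number,3> is one common choice):

    <id,1> <=> <id,2> <+> <number,3> <;>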
Syntax Analysis (Parsing)
▪ The parser is based on a formal syntax specification such as a CFG.
▪ What is a CFG?
— Context-free grammars allow a non-terminal to be replaced by the right-hand side of a corresponding production rule whenever it appears in a derivation.
— The replacement occurs irrespective of what lies before or after the non-terminal.
▪ And what is a CSG?
— Context-sensitive grammars allow the replacement of a non-terminal based on what lies before or after the non-terminal.
Syntax Analysis (Parsing)
▪ Consider the rule
A → 0A1
▪ What this says is "wherever you find A, you can replace it with 0A1".
▪ Now, consider the rule
CAB → C0A1B
▪ This says that A may be replaced with 0A1 only when it appears between C and B; the replacement depends on the surrounding context, which is what makes such a grammar context-sensitive.
Syntax Analysis (Parsing)
▪ Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree.
▪ For example
x3 = y + 3;
would be parsed into a tree.
▪ This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
▪ Note the recursive definition of expression (expr).
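A sketch of a derivation for the example using these two rules:

    asst-stmt ⇒ id = expr ;
              ⇒ id = expr + expr ;
              ⇒ id = id + expr ;
              ⇒ id = id + number ;

which matches the token stream for x3 = y + 3;.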
Semantic Analysis
▪ Also called the Type Checker.
▪ Type checking is purely dependent on the semantic rules of the source language. It is independent of the compiler’s target.
▪ The semantics of a language provide a set of rules that specify which syntactically legal programs are actually valid.
▪ Such rules typically require that
— all identifiers be declared,
— operators and operands be type-compatible, and
— procedures be called with the proper number of parameters.
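For instance, each of the following C fragments is syntactically legal but violates one of these rules (a sketch; the exact diagnostics vary by compiler):

    y = x + 1;                  /* invalid if x was never declared */
    int *p = 3.14;              /* operands not type-compatible: pointer vs. double */
    double d = sqrt(2.0, 3.0);  /* sqrt takes one parameter, not two */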
Intermediate Code Generation
▪ Many compilers internally generate intermediate code for an "idealized machine".
▪ For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation.
▪ With these assumptions of a machine with an unlimited number of registers and instructions with three operands, one generates "three-address code" by walking the semantic tree.
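As a sketch, assume (as in the code-generation example at the end of this lecture) that id2 holds a float, the constant is 3.0, and id1 is an int. Walking the tree for id1 = id2 + 3.0 might then yield:

    t1 = id2 + 3.0      // float addition into a fresh temporary
    t2 = (int) t1       // convert real to int
    id1 = t2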
Code Optimization
▪ This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
▪ In addition to optimizations performed on the intermediate code, further optimizations can be performed on the machine code by the machine-dependent back end.
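For instance, copy propagation followed by dead-temporary elimination (two easy optimizations) would shrink the three-address code sketched above:

    t1 = id2 + 3.0      // before: t1 = id2 + 3.0; t2 = (int) t1; id1 = t2
    id1 = (int) t1      // after: the copy through t2 is eliminated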
Code Generation
▪ Modern processors have only a limited number of registers.
▪ Some processors (e.g., the MIPS architecture) use three-address instructions.
▪ Other processors permit only two addresses; the result overwrites one of the sources.
▪ Using three-address instructions restricted to registers (except for load and store instructions, which naturally must also reference memory), code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2          // load id2 from memory into register R1
ADDF R1, R1, #3.0   // add the float constant 3.0
RTOI R2, R1         // convert real to int
ST id1, R2          // store the result into id1's memory location