Compiler Construction (COMP 451)
Compiler: An Overview
Today’s Agenda
▪ A basic intro in which we will see
— Types of compilers
— Major parts of the compilation process
— Other programs that work with the compiler to execute a program
— Structure/phases of a compiler
— A brief overview of the functionality of all the phases of a compiler, using an example.
Compiler
▪ A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language.
▪ A compiler also reports errors present in the source program as part of the translation process.
Compilation Process
Major Parts of the Compilation Process
▪ Analysis (Front End)
— Takes the input source program and breaks it into parts.
— Creates an intermediate representation of the source program.
— Generates the symbol table.
▪ Synthesis (Back End)
— Takes the intermediate representation as input.
— Creates the desired target program.
▪ These two parts are also called the passes of the compilation process.
Phases of a Compiler
Lexical Analyzer
▪ It reads the program and converts it into tokens.
▪ Tokens are defined by regular expressions, which are understood by the lexical analyzer.
Phases of a Compiler
Syntax Analyzer
▪ Also called the parser.
▪ It takes the tokens one by one and uses a context-free grammar to construct the parse tree.
Phases of a Compiler
Semantic Analyzer
▪ Verifies whether the parse tree is meaningful.
▪ Produces a verified/corrected/annotated parse tree.
▪ It also performs type checking.
Phases of a Compiler
Intermediate Code Generator
▪ The last phase of the front end.
▪ Intermediate code is a machine-independent representation that can be readily translated into target machine code.
▪ Three-address code is a popular example of intermediate code.
▪ Up to the intermediate code, compilation is essentially the same for every compiler; after that, it depends on the target platform.
▪ To build a new compiler, we don’t need to build it from scratch.
▪ We can take the intermediate code from an already existing compiler and build only the last two phases.
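For instance, a statement such as x3 = y + 3; might be lowered to three-address code of the following form (a sketch; the temporary name t1 is illustrative, and exact formats vary by compiler):

    t1 = y + 3
    x3 = t1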
Phases of a Compiler
Code Optimizer
▪ It transforms the code so that it consumes fewer resources and runs faster.
▪ The meaning of the code being transformed is not altered.
Symbol Table
▪ It stores information
— about scope
— about instances of various entities such as variable and function names, classes, objects, etc.
▪ It is populated in the lexical and syntax analysis phases.
▪ The information is collected by the analysis phases of the compiler and is used by the synthesis phases to generate code.
▪ It is used by the compiler to achieve compile-time efficiency.
▪ Each entry in the symbol table is associated with attributes that support the compiler in different phases.
▪ Items stored in the symbol table:
— Variable names and constants
— Procedure and function names
— Literal constants and strings
— Compiler-generated temporaries
— Labels in the source language
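As an illustrative sketch (the exact attribute layout varies by compiler), the entries created while scanning x3 = y + 3; might look like this; the indices 1 and 2 are the same ones used in the token examples later in this lecture:

    Index  Name  Kind      Attributes (type, scope, ...)
    1      x3    variable  filled in during later phases
    2      y     variable  filled in during later phases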
Broad Categories of Compilers
Single Pass Compiler
[Figure: structure of a single-pass compiler]
Two Pass Compiler
• A two-pass compiler processes the source code of a program multiple times.
• In a two-pass compiler, we divide the phases into two passes.
• The first pass is called the front end and is platform independent.
• The second pass is called the back end and is platform dependent.
Front End / Back End Division
▪ As discussed earlier, modern compilers contain two (large) parts, each of which is often subdivided further.
▪ These two parts are the front end and the back end.
▪ The front end
— analyzes the source program,
— determines its constituent parts, and
— constructs an intermediate representation of the program.
▪ Typically, the front end is independent of the target language.
▪ The back end
— synthesizes the target program from the intermediate representation produced by the front end.
▪ Typically, the back end is independent of the source language.
Front End / Back End Division
▪ This front/back division greatly reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages.
▪ Instead of N×M compilers, we need only N front ends and M back ends.
▪ For gcc (originally abbreviating "GNU C Compiler", but now abbreviating "GNU Compiler Collection"), N=7 and M≈30, so the savings are considerable: roughly 37 components instead of about 210 separate compilers.
Programs Which Help the Compiler Accomplish Its Task
Cousins of Compilers
▪ These are the different programs that work with the compiler to generate machine code.
▪ Other than the compiler, there are three programs that work together to do the task:
— Preprocessor
— Assembler
— Linker/Loader
Cousins of Compilers
Preprocessor
▪ Macro expansion
▪ File inclusion
▪ Strips off comments
▪ Provides a modified source program.
Compiler
▪ Checks for syntax and semantic errors.
▪ If there are no errors, the expanded code is converted into assembly code specific to the processor architecture.
Assembler
▪ Produces relocatable machine code.
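As a quick illustration of macro expansion and file inclusion, consider a source file such as the following (a hypothetical prg.c); the preprocessor pastes in the contents of stdio.h, expands SQUARE(5) to ((5) * (5)), and strips the comments before the compiler proper ever sees the code:

    #include <stdio.h>              /* file inclusion */
    #define SQUARE(x) ((x) * (x))   /* macro definition */

    int main(void) {
        printf("%d\n", SQUARE(5));  /* expands to ((5) * (5)) */
        return 0;
    }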
Cousins of Compilers
Linker
▪ Combines object files and the library routines required to run the complete source program.
▪ It then hands the result over to the loader.
Loader
▪ Loads the entire program (the one provided by the assembler and the linker) into main memory for execution.
Cousins of Compilers: Practical Review
▪ Write a simple program in C on *NIX.
▪ Compile it using
$ gcc -save-temps prg.c
▪ This will produce four files: prg.i (the preprocessed source), prg.s (the assembly code), prg.o (the relocatable object code), and a.out (the linked executable).
▪ Go through these files to understand the working of the cousins of the compiler discussed in the previous slides.
Linking: Static vs. Dynamic
▪ Linking is of two types.
— Static linking
• The code of external functions is placed in the executable.
— Dynamic linking
• Code placement is delayed until load time / run time.
▪ To illustrate the concept, try compiling any C program as follows
$ gcc -static prg.c -o prgstatic
and
$ gcc prg.c -o prgdyn
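To see the difference, compare the two executables (a sketch; exact output varies by system, but the static binary is typically much larger):

    $ ls -l prgstatic prgdyn   # prgstatic is far bigger
    $ ldd prgdyn               # lists the shared libraries loaded at run time
    $ ldd prgstatic            # typically reports "not a dynamic executable"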
Coming Back to the Phases of a Compiler
Lexical Analysis (Scanning)
Source Text into Tokens
▪ Reading the input text (character by character) and grouping individual characters into tokens such as identifiers, integers, reserved words, and delimiters.
▪ How to describe these tokens?
▪ Regular expression notation is an effective approach to describing tokens.
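For example, common token classes can be described with regular expressions along these lines (a sketch following typical C conventions, not the exact ISO grammar):

    identifier → [A-Za-z_][A-Za-z0-9_]*
    integer    → [0-9]+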
Lexical Analysis (Scanning)
Lexemes and Tokens
▪ The character-stream input is grouped into units called lexemes, which are then mapped into tokens.
▪ Tokens are the output of the lexical analyzer.
▪ For example, any one of the following C statements
x3 = y + 3;
x3 = y + 3 ;
x3 =y+3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;
▪ A token is a <token-name, attribute-value> pair. For example
— The lexeme x3 would be mapped to a token such as <id,1>.
— The name id is short for identifier.
— The value 1 is the index of the entry for x3 in the symbol table produced by the compiler.
Lexical Analysis (Scanning)
Lexemes and Tokens
▪ The lexeme =
— would be mapped to the token <=>.
— In reality it is probably mapped to a pair whose second component is ignored.
— The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
▪ The lexeme y is mapped to the token <id,2>.
▪ The lexeme + is mapped to the token <+>.
Lexical Analysis (Scanning)
Lexemes and Tokens
▪ The lexeme ‘3’ is somewhat interesting and will be discussed later.
▪ The lexeme ‘;’ is mapped to the token <;>.
▪ Note that non-significant blanks are normally removed during scanning.
▪ In C, most blanks are non-significant.
▪ That does not mean the blanks are unnecessary. Consider
int x; and
intx;
▪ The blank between int and x is clearly necessary, but it does not become part of any token.
▪ Blanks inside strings are an exception; they are strictly considered part of the token.
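Putting the pieces together, the statement x3 = y + 3; would be scanned into a token stream along these lines (the exact token for the constant 3 is discussed later; <number,3> is one common choice):

    <id,1> <=> <id,2> <+> <number,3> <;>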
Syntax Analysis (Parsing)
▪ The parser is based on a formal syntax specification such as a CFG.
▪ What is a CFG?
— Context-free grammars allow a non-terminal to be replaced by the right-hand side of a corresponding production rule whenever it appears in a derivation.
— The replacement occurs irrespective of what lies before or after the non-terminal.
▪ And what is a CSG?
— Context-sensitive grammars allow the replacement of a non-terminal based on what lies before or after the non-terminal.
Syntax Analysis (Parsing)
▪ Consider the rule
A → 0A1
▪ What this says is "wherever you find A, you can replace it with 0A1".
▪ Now, consider the rule
CAB → C0A1B
▪ This says that A may be replaced with 0A1 only when it appears between C and B; the replacement depends on the surrounding context, which is what makes such a grammar context-sensitive.
Syntax Analysis (Parsing)
▪ Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree.
▪ For example
x3 = y + 3;
would be parsed into a tree.
▪ This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
▪ Note the recursive definition of expression (expr).
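A sketch of a derivation for the example using these two rules:

    asst-stmt ⇒ id = expr ;
              ⇒ id = expr + expr ;
              ⇒ id = id + expr ;
              ⇒ id = id + number ;

which matches the token stream for x3 = y + 3;.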
Semantic Analysis
▪ Also called the Type Checker.
▪ Type checking is purely dependent on the semantic rules of the source language. It is independent of the compiler’s target.
▪ The semantics of a language provide a set of rules that specify which syntactically legal programs are actually valid.
▪ Such rules typically require that
— all identifiers be declared,
— operators and operands be type-compatible, and
— procedures be called with the proper number of parameters.
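For instance, each of the following C fragments is syntactically legal but violates one of these rules (a sketch; the exact diagnostics vary by compiler):

    y = x + 1;                  /* invalid if x was never declared */
    int *p = 3.14;              /* operands not type-compatible: pointer vs. double */
    double d = sqrt(2.0, 3.0);  /* sqrt takes one parameter, not two */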
Intermediate Code Generation
▪ Many compilers internally generate intermediate code for an "idealized machine".
▪ For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation.
▪ With these assumptions of a machine with an unlimited number of registers and instructions with three operands, one generates "three-address code" by walking the semantic tree.
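As a sketch, assume (as in the code-generation example at the end of this lecture) that id2 holds a float, the constant is 3.0, and id1 is an int. Walking the tree for id1 = id2 + 3.0 might then yield:

    t1 = id2 + 3.0      // float addition into a fresh temporary
    t2 = (int) t1       // convert real to int
    id1 = t2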
Code Optimization
▪ This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
▪ In addition to optimizations performed on the intermediate code, further optimizations can be performed on the machine code by the machine-dependent back end.
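For instance, copy propagation followed by dead-temporary elimination (two easy optimizations) would shrink the three-address code sketched above:

    t1 = id2 + 3.0      // before: t1 = id2 + 3.0; t2 = (int) t1; id1 = t2
    id1 = (int) t1      // after: the copy through t2 is eliminated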
Code Generation
▪ Modern processors have only a limited number of registers.
▪ Some processors (e.g., the MIPS architecture) use three-address instructions.
▪ Other processors permit only two addresses; the result overwrites one of the sources.
▪ Using three-address instructions restricted to registers (except for load and store instructions, which naturally must also reference memory), code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2          // load id2 from memory into register R1
ADDF R1, R1, #3.0   // add the float constant 3.0
RTOI R2, R1         // convert real to int
ST id1, R2          // store the result into id1's memory location