Compiler Construction Week 2
Zulfiqar Ali
UIT University
Week 2 - Contents
• More on Language Processors
• More on Compiler Structure
More on Language Processors
• In addition to a compiler,
several other programs may
be required to create an
executable target program.
These are:
• Preprocessor
• Compiler
• Assembler
• Linker/Loader
Preprocessors
• A program may need to include
modules that are stored in
separate files.
• The preprocessor combines all required
modules and prepares a complete
source program.
• It also expands macros.
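The two preprocessor jobs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `#include "..."` and `#define NAME value` directive syntax is borrowed from C, the file names and the in-memory `FILES` dictionary are hypothetical, and there is no guard against a file that includes itself.

```python
import re

# Hypothetical in-memory "files"; a real preprocessor would read them from disk.
FILES = {
    "main.src": '#include "util.src"\n#define RATE 60\ntotal = base + RATE\n',
    "util.src": "base = 10\n",
}

def preprocess(name, files, macros=None):
    """Return the complete source for `name` with includes and macros expanded."""
    macros = {} if macros is None else macros
    output = []
    for line in files[name].splitlines():
        include = re.fullmatch(r'#include\s+"([^"]+)"', line.strip())
        define = re.fullmatch(r"#define\s+(\w+)\s+(.+)", line.strip())
        if include:
            # Splice in the contents of the included module.
            output.append(preprocess(include.group(1), files, macros))
        elif define:
            # Record the macro; the directive itself produces no output.
            macros[define.group(1)] = define.group(2)
        else:
            # Replace whole-word occurrences of each macro name with its body.
            for macro_name, body in macros.items():
                pattern = rf"\b{re.escape(macro_name)}\b"
                line = re.sub(pattern, lambda _m, b=body: b, line)
            output.append(line)
    return "\n".join(output)

print(preprocess("main.src", FILES))
```

The result is a single source text in which the included module appears in place of the `#include` line and `RATE` has been replaced by `60`.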
Compiler
• The source program is passed to the
compiler, which may produce assembly
code (an intermediate form) as output.
• Assemblers differ depending on the
hardware and OS, so assembly code
written for one platform may not
work on another
hardware/OS/platform.
Assembler
• The assembler then produces
relocatable machine code.
• There are two types of machine code.
– Relocatable machine code: it can be
loaded at any point in memory and
run.
– Absolute machine code: it can be
loaded only at a fixed storage location
and runs only at that location.
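The difference between the two kinds of machine code can be illustrated with a small sketch. The representation is an assumption made for this example: a module is a list of integer words, and `relocation_offsets` lists the words that hold module-relative addresses and must therefore be adjusted when the module is loaded.

```python
def load_relocatable(words, relocation_offsets, base_address):
    """Return a copy of `words` adjusted to run at `base_address`."""
    loaded = list(words)
    for offset in relocation_offsets:
        # Each marked word holds an address relative to the module start.
        loaded[offset] += base_address
    return loaded

# Hypothetical 4-word module: word 1 holds the address of word 3.
module = [7, 3, 0, 42]
relocations = [1]

print(load_relocatable(module, relocations, 100))   # [7, 103, 0, 42]
print(load_relocatable(module, relocations, 5000))  # [7, 5003, 0, 42]
```

Absolute machine code corresponds to shipping the already adjusted list, for example `[7, 103, 0, 42]`, which is only correct when the module is placed at address 100.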
Linker/Loader
• Large programs are often compiled in
pieces called relocatable objects, and all
of these objects need to be linked together.
• Linker: The linker resolves external
memory addresses, where the code in
one file may refer to a location in another
file.
• Loader: The loader then puts all of the
executable object files into memory for
execution.
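Resolving external addresses can be sketched as a two-pass procedure. The object-file layout below (dictionaries with `code`, `defines`, and `references` fields) and the names `main.o`, `util.o`, and `helper` are hypothetical and chosen only for illustration; real object formats are far richer.

```python
def link(objects):
    """Concatenate object files and patch each external reference."""
    # Pass 1: place the objects one after another and record symbol addresses.
    symbol_address = {}
    next_free = 0
    for obj in objects:
        for symbol, offset in obj["defines"].items():
            symbol_address[symbol] = next_free + offset
        next_free += len(obj["code"])

    # Pass 2: copy the code and fill in every referenced address.
    image = []
    for obj in objects:
        code = list(obj["code"])
        for offset, symbol in obj["references"]:
            if symbol not in symbol_address:
                raise KeyError(f"undefined symbol: {symbol}")
            code[offset] = symbol_address[symbol]
        image.extend(code)
    return image, symbol_address

# main.o refers to `helper`, which is defined in util.o.
main_obj = {"name": "main.o", "code": [1, None, 2],
            "defines": {"main": 0}, "references": [(1, "helper")]}
util_obj = {"name": "util.o", "code": [9, 8],
            "defines": {"helper": 1}, "references": []}

image, symbols = link([main_obj, util_obj])
print(image)    # [1, 4, 2, 9, 8]
print(symbols)  # {'main': 0, 'helper': 4}
```

The placeholder `None` in `main.o` stands for the location the linker has to fill in once it knows where `helper` ends up in the combined program.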
Structure of a Compiler
• If we examine the compilation process in more
detail, we see that it operates as a sequence
of phases, each of which transforms one
representation of the source program to
another.
• We have already studied the basics of
compiler structure.
• The symbol table,
which stores
information about the
entire source program,
is used by all phases of
the compiler.
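A symbol table shared by all phases might look like the following sketch. The class name `SymbolTable`, its methods, and the numbering of entries from 1 are assumptions for illustration; the type `float` assigned below is also just an assumed value.

```python
class SymbolTable:
    """Maps identifier names to numbered entries holding name and type."""

    def __init__(self):
        self._number_by_name = {}
        self.entries = []  # entries[i - 1] describes symbol number i

    def insert(self, name):
        """Return the entry number for `name`, creating the entry if needed."""
        if name not in self._number_by_name:
            self.entries.append({"name": name, "type": None})
            self._number_by_name[name] = len(self.entries)
        return self._number_by_name[name]

    def lookup(self, number):
        """Return the entry with the given number."""
        return self.entries[number - 1]

table = SymbolTable()
first = table.insert("position")       # an early phase records the name
table.lookup(first)["type"] = "float"  # a later phase fills in the type (assumed)
print(table.lookup(first))  # {'name': 'position', 'type': 'float'}
```

Because every phase works on the same table object, information added by one phase is visible to all later phases.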
• In practice, several phases may be grouped together, and the
intermediate representations between the grouped phases
need not be constructed explicitly.
• Some compilers have a machine-independent optimization
phase between the front end and the back end.
• The purpose of this optimization phase is to perform
transformations on the intermediate representation.
Lexical Analysis
• The first phase of a compiler is called lexical analysis or
scanning.
• It reads the stream of characters making up the source
program and groups the characters into meaningful
sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a
token of the form
(token-name; attribute-value)
that it passes on to the subsequent phase, syntax analysis.
• In the token, the first component token-name is an
abstract symbol that is used during syntax analysis,
and the second component attribute-value points to
an entry in the symbol table for this token.
• Information from the symbol-table entry is needed
for semantic analysis and code generation.
Lexical Units
• Lexical analysis identifies the different lexical units in the
source code.
Some lexical classes are:
• Identifiers
• Constants
• Keywords
• Operators
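As a rough sketch, a lexeme can be assigned to one of these classes with a few pattern checks. The keyword set, the operator set, and the exact patterns are assumptions for illustration; every real language defines its own.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # assumed keyword set
OPERATORS = {"=", "+", "-", "*", "/"}         # assumed operator set

def classify(lexeme):
    """Return the lexical class of a single lexeme."""
    if lexeme in KEYWORDS:
        return "Keyword"  # checked first: keywords also look like identifiers
    if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
        return "Identifier"
    if re.fullmatch(r"\d+(\.\d+)?", lexeme):
        return "Constant"
    if lexeme in OPERATORS:
        return "Operator"
    return "Unknown"

for lexeme in ["while", "rate", "60", "*"]:
    print(lexeme, classify(lexeme))
```

The keyword check comes before the identifier check because a keyword such as `while` would otherwise match the identifier pattern as well.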
Example
Suppose a source program contains the assignment statement
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes:
Lexeme      Lexical class
position    Identifier
=           Operator
initial     Identifier
+           Operator
rate        Identifier
*           Operator
60          Constant
These lexemes are mapped into the following tokens, which are
passed on to the syntax analyzer.
position is a lexeme that would be mapped into a token (id; 1),
where id is an abstract symbol standing for identifier and 1
points to the symbol-table entry for position. The symbol-table
entry for an identifier holds information about the identifier,
such as its name and type.
The assignment symbol = is a lexeme that is mapped into the
token (=). Since this token needs no attribute-value, the second
component is omitted.
initial is a lexeme that is mapped into the token (id; 2), where 2
points to the symbol-table entry for initial.
+ is a lexeme that is mapped into the token (+).
rate is a lexeme that is mapped into the token (id; 3), where 3
points to the symbol-table entry for rate.
* is a lexeme that is mapped into the token (*).
60 is a lexeme that is mapped into the token (60).
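The mapping described above can be reproduced with a small lexical analyzer. This is a minimal sketch, assuming only identifiers, unsigned integers, and the operators `=`, `+`, and `*`; the function name `tokenize` and the use of Python tuples for tokens are choices made for this example.

```python
import re

# One alternative per lexical class: identifier, integer constant, operator.
TOKEN_PATTERN = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|([=+*]))")

def tokenize(source):
    """Return (tokens, symbol_table) for a one-line source string."""
    symbol_table = []  # symbol_table[i - 1] is the name of identifier i
    tokens = []
    offset = 0
    source = source.rstrip()
    while offset < len(source):
        match = TOKEN_PATTERN.match(source, offset)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {offset}")
        identifier, number, operator = match.groups()
        if identifier is not None:
            if identifier not in symbol_table:
                symbol_table.append(identifier)
            # The attribute-value is the entry number in the symbol table.
            tokens.append(("id", symbol_table.index(identifier) + 1))
        elif number is not None:
            tokens.append((number,))    # no attribute-value
        else:
            tokens.append((operator,))  # no attribute-value
        offset = match.end()
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
print(table)
# ['position', 'initial', 'rate']
```

Each identifier is entered into the symbol table the first time it is seen, so a repeated identifier would receive the same entry number again.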