
Compiler Construction

Zulfiqar Ali
UIT University
Week 2 - Contents
• More on Language Processors
• More on Compiler Structure
More on Language Processors
• In addition to a compiler, several other programs may be required to create an executable target program. These include:
• Preprocessor
• Compiler
• Assembler
• Linker/Loader
Preprocessors
• A program may need to include modules that are kept in separate source files.
• The preprocessor combines all required modules to prepare a complete source program.
• It also expands macros.
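As a small, self-contained C illustration (not taken from these slides; the macro SQUARE is invented for the sketch), the preprocessor copies the contents of included files into the source and replaces each macro use with its definition. On gcc or clang, compiling with the -E flag stops after preprocessing so the expanded source can be inspected.

    /* main.c, before preprocessing */
    #include <stdio.h>              /* the preprocessor copies the contents of stdio.h in here */
    #define SQUARE(x) ((x) * (x))   /* a macro definition                                      */

    int main(void) {
        printf("%d\n", SQUARE(5));  /* the preprocessor expands this to ((5) * (5))            */
        return 0;
    }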
Compiler
• The source program is given to the compiler, which may produce assembly code as intermediate output.
• Depending on the hardware and OS we may have different assemblers, and assembly code written for one platform may not work on another hardware/OS/platform.
Assembler
• The assembler then produces relocatable machine code.
• There are two types of machine code:
– Relocatable machine code: it can be loaded at any address in memory and run there.
– Absolute machine code: it can only be loaded at a fixed storage location and run at that location.
Linker/Loader
• Large programs are often compiled in pieces called relocatable object files, and all of these objects need to be linked together.
• Linker: the linker resolves external memory addresses, where the code in one file may refer to a location in another file.
• Loader: the loader then puts all of the executable object files together into memory for execution.
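A minimal two-file C sketch of what the linker resolves (the file and variable names are invented for this illustration): the reference to rate in main.o is an external address that the linker patches using the definition found in util.o. With a typical toolchain, cc -c util.c and cc -c main.c produce the two object files, and cc main.o util.o links them into one executable.

    /* util.c -- compiled separately into util.o */
    int rate = 60;          /* the definition of 'rate' lives in this object file        */

    /* main.c -- compiled separately into main.o */
    extern int rate;        /* a reference to a symbol defined in some other object file */

    int main(void) {
        return rate;        /* the linker fills in the actual address of 'rate' here     */
    }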
Structure of a Compiler
• If we examine the compilation process in more
detail, we see that it operates as a sequence
of phases, each of which transforms one
representation of the source program to
another.
Structure of a Compiler
• We have already studied the basic structure of compilers; examining the compilation process in more detail, we see that it operates as a sequence of phases, each of which transforms one representation of the source program into another.
• The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
• In practice, several phases may be grouped together, and the
intermediate representations between the grouped phases
need not be constructed explicitly.
• Some compilers have a machine-independent optimization
phase between the front end and the back end.
• The purpose of this optimization phase is to perform
transformations on the intermediate representation.
Lexical Analysis
• The first phase of a compiler is called lexical analysis or
scanning.
• It reads the stream of characters making up the source
program and groups the characters into meaningful
sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.
(token-name, attribute-value)
• In the token, the first component token-name is an
abstract symbol that is used during syntax analysis,
and the second component attribute-value points to
an entry in the symbol table for this token.
• Information from the symbol-table entry is needed
for semantic analysis and code generation.
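As a rough sketch (the type and field names here are invented, not the textbook's), a token of the form (token-name, attribute-value) can be represented in C as a small struct:

    /* One token: an abstract token name plus an optional attribute; for an
       identifier the attribute is an index into the symbol table.           */
    enum TokenName { TOK_ID, TOK_NUMBER, TOK_ASSIGN, TOK_PLUS, TOK_STAR };

    struct Token {
        enum TokenName name;      /* abstract symbol used during syntax analysis   */
        int            attribute; /* e.g. symbol-table index; unused for operators */
    };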
Lexical Units
• Lexical analysis identifies the different lexical units in a source program.
Some lexical classes are:
• Identifiers
• Constants
• Keywords
• Operators
Example
Suppose a source program contains the assignment statement
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes:

position → Identifier
= → Operator
initial → Identifier
+ → Operator
rate → Identifier
* → Operator
60 → Constant
These lexemes are mapped into the following tokens and passed on to the syntax analyzer.
position is a lexeme that would be mapped into the token (id, 1), where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
The assignment symbol = is a lexeme that is mapped into the token (=); this token needs no attribute-value.
initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table entry for initial.
+ is a lexeme that is mapped into the token (+).
rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table entry for rate.
* is a lexeme that is mapped into the token (*).
60 is a lexeme that is mapped into the token (60).
Blanks separating the lexemes would be discarded by the lexical analyzer.
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
• In this representation, the token names =, + and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.
• Here the lexeme 60 should have a token of the form (id, 60), but we are not doing this here; we will discuss this in detail in upcoming lectures.
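To make this concrete, here is a toy, self-contained C scanner for just this one statement. It is only a sketch (the fixed-size symbol table and the helper lookup_or_insert are invented for illustration), but running it prints the token stream shown above.

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    static char symtab[100][32];   /* symtab[i] holds the lexeme of identifier i+1 */
    static int  symcount = 0;

    /* Return the symbol-table index of name, inserting it if it is new. */
    static int lookup_or_insert(const char *name) {
        for (int i = 0; i < symcount; i++)
            if (strcmp(symtab[i], name) == 0)
                return i + 1;
        strcpy(symtab[symcount], name);
        return ++symcount;
    }

    int main(void) {
        const char *p = "position = initial + rate * 60";

        while (*p) {
            if (isspace((unsigned char)*p)) {          /* blanks are discarded      */
                p++;
            } else if (isalpha((unsigned char)*p)) {   /* identifier lexeme         */
                char buf[32];
                int n = 0;
                while (isalnum((unsigned char)*p) && n < 31)
                    buf[n++] = *p++;
                buf[n] = '\0';
                printf("(id, %d) ", lookup_or_insert(buf));
            } else if (isdigit((unsigned char)*p)) {   /* constant lexeme           */
                int value = 0;
                while (isdigit((unsigned char)*p))
                    value = value * 10 + (*p++ - '0');
                printf("(%d) ", value);
            } else {                                   /* single-character operator */
                printf("(%c) ", *p++);
            }
        }
        printf("\n");   /* prints: (id, 1) (=) (id, 2) (+) (id, 3) (*) (60) */
        return 0;
    }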
Syntax Analysis
• The second phase of the compiler is syntax analysis or
parsing. The parser uses the first components of the
tokens produced by the lexical analyzer to create a tree-
like intermediate representation that depicts the
grammatical structure of the token stream.
• A typical representation is a syntax tree in which each
interior node represents an operation and the children
of the node represent the arguments of the operation.
position = initial + rate * 60
• This tree shows the order in which the operations in the assignment
are to be performed
• The tree has an interior node labeled * with (id, 3) as its left child
and the integer 60 as its right child.
• The node (id, 3) represents the identifier rate.
• The node labeled * makes it explicit that we must first multiply the
value of rate by 60.
position = initial + rate * 60
• The node labeled + indicates that we must add the result of this
multiplication to the value of initial.
• The root of the tree, labeled =, indicates that we must store the
result of this addition into the location for the identifier position.
• This ordering of operations is consistent with the usual conventions
of arithmetic which tell us that multiplication has higher precedence
than addition, and hence that the multiplication is to be performed
before the addition.
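As a rough C sketch of how such a syntax tree could be represented (the Node type and helper names are invented for illustration, and this simple version does not distinguish identifier leaves from constant leaves):

    #include <stdlib.h>

    /* A toy syntax-tree node: an interior node holds an operator and two
       children; a leaf holds a symbol-table index or a constant value.   */
    struct Node {
        char         op;          /* '=', '+', '*', or 0 for a leaf           */
        int          value;       /* symbol-table index or constant at leaves */
        struct Node *left, *right;
    };

    static struct Node *make_leaf(int value) {
        struct Node *n = calloc(1, sizeof *n);
        n->value = value;
        return n;
    }

    static struct Node *make_op(char op, struct Node *l, struct Node *r) {
        struct Node *n = calloc(1, sizeof *n);
        n->op = op;
        n->left = l;
        n->right = r;
        return n;
    }

    int main(void) {
        /* position = initial + rate * 60, with (id,1)=position, (id,2)=initial,
           (id,3)=rate, built bottom-up: * first, then +, then = at the root.    */
        struct Node *tree =
            make_op('=', make_leaf(1),
                    make_op('+', make_leaf(2),
                            make_op('*', make_leaf(3), make_leaf(60))));
        (void)tree;   /* later phases would walk this tree */
        return 0;
    }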
Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with
the language definition.
• It also gathers type information and saves it in either the syntax tree or the
symbol table, for subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
• For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.
• The language specification may permit some type conversions called
coercions. For example, a binary arithmetic operator may be applied to either
a pair of integers or to a pair of floating-point numbers. If the operator is
applied to a floating-point number and an integer, the compiler may convert or
coerce the integer into a floating-point number.
• Such a coercion appears in Fig. 1.7. Suppose that position, initial, and rate have
been declared to be floating-point numbers, and that the lexeme 60 by itself
forms an integer. The type checker in the semantic analyzer in Fig. 1.7
discovers that the operator * is applied to a floating-point number rate and an
integer 60. In this case, the integer may be converted into a floating-point
number.
Intermediate Code Generator
• After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine.
• This intermediate representation should have two important properties: it should be easy to produce and it should be easy to translate into the target machine.
• The output of the intermediate code generator is a three-address code sequence; a sketch for the running example is shown below.
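For the running example, after the coercion of 60 to a floating-point value discussed under semantic analysis, the three-address code looks like the following (in the style of the textbook's Fig. 1.7; t1, t2, t3 are compiler-generated temporaries, and id1, id2, id3 stand for position, initial, and rate):

    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3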
Reference
• Compilers: Principles, Techniques, and Tools, A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Addison-Wesley, 2nd ed., 2006.
THANK YOU
