Compiler 2 (3rd Class)
Lecture -1- : Types of parsers in compiler design
The parser is the phase of the compiler that takes a token string as input
and, with the help of the existing grammar, converts it into the corresponding
Intermediate Representation. The parser is also known as the Syntax Analyzer.
Types of Parser:
The parser is mainly classified into two categories, i.e. Top-down Parser, and
Bottom-up Parser. These are explained below:
1- Top-Down Parser:
The top-down parser is the parser that generates the parse tree for the given
input string with the help of grammar productions by expanding the non-terminals, i.e. it
starts from the start symbol and ends on the terminals. It uses leftmost derivation.
Further, the top-down parser is classified into 2 types: the recursive descent parser
and the non-recursive descent parser.
The recursive descent parser is also known as the brute-force parser or the
backtracking parser. It generates the parse tree by using brute force and
backtracking.
The non-recursive descent parser is also known as the LL(1) parser, predictive
parser, non-backtracking parser, or dynamic parser. It uses a parsing table to
generate the parse tree instead of backtracking.
2- Bottom-up Parser:
The bottom-up parser is the parser that generates the parse tree for the given
input string with the help of grammar productions by compressing the string of
terminals, i.e. it starts from the terminals and ends on the start symbol. It uses
the reverse of the rightmost derivation.
Further, the bottom-up parser is classified into two types: the LR parser and the
operator precedence parser.
The LR parser is the bottom-up parser that generates the parse tree for the given
string by using unambiguous grammar. It follows the reverse of the rightmost
derivation.
The LR parser is of four types:
a- LR(0)  b- SLR(1)  c- LALR(1)  d- CLR(1)
The operator precedence parser generates the parse tree from the given grammar
and string, but the only condition is that two consecutive non-terminals and epsilon
never appear on the right-hand side of any production.
Bottom Up Parsers / Shift Reduce Parsers
Bottom up parsers start from the sequence of terminal symbols and work
their way back up to the start symbol by repeatedly replacing grammar rules' right
hand sides by the corresponding non-terminal. This is the reverse of the derivation
process, and is called "reduction".
Example 1: Consider the grammar
S→ aABe
A→ Abc|b
B→ d
The sentence abbcde can be reduced to S by the following steps:
Sol:
abbcde
aAbcde
aAde
aABe
S
Example 2: Consider the grammar
S→ aABe
A→ Abc|bc
B→ dd
The sentence abcbcdde can be reduced to S by the following steps:
Sol:
abcbcdde
aAbcdde
aAdde
aABe
S
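The reduction steps in both examples can be checked mechanically. Below is a small Python sketch (the names are illustrative) of a brute-force, backtracking reducer: it tries replacing any substring that matches a production body with the production's head and searches for a sequence of reductions ending at the start symbol.

```python
# Grammars as (head, body) pairs; G1 and G2 are Examples 1 and 2 above.
G1 = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]
G2 = [("S", "aABe"), ("A", "Abc"), ("A", "bc"), ("B", "dd")]

def reduce_to_start(sentence, grammar, start="S"):
    """Return the list of sentential forms from the input to start, or None."""
    if sentence == start:
        return [sentence]
    for head, body in grammar:
        pos = sentence.find(body)
        while pos != -1:
            shorter = sentence[:pos] + head + sentence[pos + len(body):]
            path = reduce_to_start(shorter, grammar, start)
            if path is not None:
                return [sentence] + path
            pos = sentence.find(body, pos + 1)   # try the next occurrence
    return None   # dead end: backtrack

print(reduce_to_start("abbcde", G1))
print(reduce_to_start("abcbcdde", G2))
```

For Example 1 this prints the same sequence of sentential forms as the solution above, abbcde → aAbcde → aAde → aABe → S.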
Example 3: Using the following arithmetic grammar
E → E+T | T
T → T*F | F
F → (E) | id
This derivation is in fact a RightMost Derivation (RMD):
Handle Pruning:
Bottom-up parsing during a left-to-right scan of the input constructs a
rightmost derivation in reverse. Informally, a "handle" is a substring that matches
the body of a production, and whose reduction represents one step along the reverse
of a rightmost derivation.
For example, adding subscripts to the tokens id for clarity, consider the handles
during the parse of id1 * id2 according to the expression grammar
E → E+T | T
T → T*F | F
F → (E) | id
Although T is the body of the production E → T, the symbol T is not a handle in
the sentential form T * id2.
If T were indeed replaced by E, we would get the string E * id2, which cannot be
derived from the start symbol E.
Thus, the leftmost substring that matches the body of some production need not be
a handle.
E ⇒ T ⇒ T * F ⇒ T * id2 ⇒ F * id2 ⇒ id1 * id2 … (rightmost derivation)
H.W.
For this grammar
E → E+T | T
T → T*F | F
F → id | (E)
Parse the input id * id + id
Lecture -2- : LR Parser Family
Parsing Table:
The parsing table is divided into two parts: the ACTION table and the GOTO
table. The ACTION table tells the parser which move to make for the given
current state and current terminal in the input stream.
1. The ACTION function takes as arguments a state i and a terminal a (or $, the
input endmarker).
The value of ACTION [i, a] can have one of four forms:
a. Shift j, where j is a state: the parser effectively shifts input a onto the
stack, but uses state j to represent a.
b. Reduce A → β: the parser effectively reduces β on the top of the stack
to the head A.
c. Accept: the parser accepts the input and finishes parsing.
d. Error: the parser discovers an error in its input and takes some corrective
action.
2. We extend the GOTO function, defined on sets of items, to states:
if GOTO [Ii, A] = Ij, then GOTO also maps a state i and a nonterminal A to
state j.
Example: The ACTION and GOTO functions of an LR-parsing table for the
following expression grammar,
E → E+T | T
T → T*F | F
F → (E) | id
By: Dr. Ielaf Osamah
Repeated with the productions numbered:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
The codes for the actions are:
1. si means shift and stack state i,
2. rj means reduce by the production numbered j,
3. acc means accept,
4. blank means error.
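To make the roles of si, rj, acc, and blank entries concrete, here is a Python sketch of the table-driven LR parsing loop. The ACTION/GOTO entries below are the usual SLR(1) table for this grammar as presented in compiler textbooks; the particular state numbering (0-11) is an assumption of this sketch.

```python
# PROD[j] gives the head and the body length of production j above.
PROD = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
        4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {  # (state, terminal) -> ("s", j) shift, ("r", j) reduce, ("acc",)
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): ("acc",),
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    """Run the LR loop; return the sequence of actions taken (rj / accept / error)."""
    tokens = tokens + ["$"]
    stack, i, trace = [0], 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:                # blank entry: error
            return trace + ["error"]
        if act[0] == "acc":
            return trace + ["accept"]
        if act[0] == "s":              # shift: push state j, advance the input
            stack.append(act[1])
            i += 1
        else:                          # reduce by production j
            head, length = PROD[act[1]]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            trace.append(f"r{act[1]}")

print(lr_parse(["id", "*", "id", "+", "id"]))
```

For id * id the trace is r6, r4, r6, r3, r2, accept, which is exactly the handle-pruning order: id1, then F, then id2, then T * F, then T.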
First construct the set of items I0:
I0:
E → •E + T r1
E → •T r2
T → •T * F r3
T → •F r4
F → •(E) r5
F → •id r6
I5: Goto [I0, id]
F → id• … Complete
I11: Goto [I8, )]
F → (E)• … Complete
Follow(E) = {$, +, )}
Follow(T) = {$, +, ), *}
Follow(F) = {$, +, ), *}
2. The parsing actions for state i are determined as follows:
a. If [A → α•aβ] is in Ii and GOTO (Ii, a) = Ij, then set ACTION [i, a] to
"shift j." Here a must be a terminal.
b. If [A → α•] is in Ii, then set ACTION [i, a] to "reduce A → α" for all a
in FOLLOW(A); here A may not be S'.
c. If [S' → S•] is in Ii, then set ACTION [i, $] to "accept."
If any conflicting actions result from the above rules, we say the grammar is
not SLR(1). The algorithm fails to produce a parser in this case.
3. The goto transitions for state i are constructed for all nonterminals A using the
rule: if GOTO (Ii, A) = Ij, then GOTO [i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error."
5. The initial state of the parser is the one constructed from the set of items
containing [S' → •S].
Example:
Let us construct the SLR table for the augmented expression grammar.
The canonical collection of sets of LR(0) items for the grammar.
I0:
E' → •E r1
E → •E + T r2
E → •T r3
T → •T * F r4
T → •F r5
F → •(E) r6
F → •id r7
I4: Goto [I0, (]
F → (•E)
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
I5: Goto [I0, id]
F → id• … Complete
I10: Goto [I7, F]
T → T * F• … Complete
Goto [I7, (] = I4
Goto [I7, id] = I5
Follow(E) = {$, +, )}
Follow(T) = {$, +, ), *}
Follow(F) = {$, +, ), *}
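The FOLLOW sets listed above can be computed mechanically. The Python sketch below is a straightforward fixpoint iteration; this grammar has no epsilon productions, which simplifies the FIRST computation.

```python
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def first_of(sym, first):
    # FIRST of a single symbol: itself if terminal, its FIRST set otherwise
    return first[sym] if sym in NONTERMS else {sym}

def compute_follow(start="E"):
    # FIRST sets by fixpoint iteration (no epsilon productions here)
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                f = first_of(body[0], first)
                if not f <= first[head]:
                    first[head] |= f
                    changed = True
    # FOLLOW sets by fixpoint iteration
    follow = {n: set() for n in NONTERMS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in NONTERMS:
                        continue
                    if i + 1 < len(body):         # FOLLOW(sym) gets FIRST of what follows
                        new = first_of(body[i + 1], first)
                    else:                         # sym ends the body: gets FOLLOW(head)
                        new = follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

print(compute_follow())
```

The result reproduces Follow(E) = {$, +, )} and Follow(T) = Follow(F) = {$, +, ), *}.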
Lecture -3- : Syntax Directed Translation
Syntax Directed Translation adds augmented rules to the grammar that
facilitate semantic analysis. SDT involves passing information bottom-up and/or
top-down through the parse tree in the form of attributes attached to the nodes.
Syntax-directed translation rules use:
1. Lexical values of nodes.
2. Constants.
3. Attributes associated with the non-terminals in their definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or
syntax tree and compute the values of attributes at the nodes of the tree by visiting
them in some order. In many cases, translation can be done during parsing without
building an explicit tree.
Example
E → E+T | T
T → T*F | F
F → id
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis we will augment SDT rules
to this grammar, in order to pass some information up the parse tree and check for
semantic errors, if any. In this example, we will focus on the evaluation of the given
expression, as we don’t have any semantic assertions to check in this very basic
example.
1. E → E + T { E.val = E.val + T.val }
2. E → T { E.val = T.val }
3. T → T * F { T.val = T.val * F.val }
4. T → F { T.val = F.val }
5. F → id { F.val = id.lexval }
(Figure: annotated parse tree showing the semantic analysis of S = 2 + 3 * 4.)
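The bottom-up evaluation that the annotated parse tree performs can be sketched in Python. Each branch of val() mirrors one semantic rule from the SDT above; the parse tree for 2 + 3 * 4 is written out by hand as nested tuples, since building it is the parser's job.

```python
def val(node):
    """Compute the synthesized val attribute of a parse-tree node bottom-up."""
    if node[0] == "id":                  # F -> id   { F.val = id.lexval }
        return node[1]
    if node[0] == "+":                   # E -> E+T  { E.val = E.val + T.val }
        return val(node[1]) + val(node[2])
    if node[0] == "*":                   # T -> T*F  { T.val = T.val * F.val }
        return val(node[1]) * val(node[2])

tree = ("+", ("id", 2), ("*", ("id", 3), ("id", 4)))   # parse tree of 2 + 3 * 4
print(val(tree))
```

This prints 14, the value the annotated tree computes at its root.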
S–attributed and L–attributed SDTs in Syntax directed translation
1- Synthesized attributes
A Synthesized attribute is an attribute of the non-terminal on the left-hand
side of a production. Synthesized attributes represent information that is being
passed up the parse tree. The attribute can take value only from its children
(Variables in the RHS of the production).
For example, let's say A -> BC is a production of a grammar; if A's attribute
depends on B's attributes or C's attributes, then it is a synthesized attribute.
2- Inherited attributes
An attribute of a nonterminal on the right-hand side of a production is called
an inherited attribute. The attribute can take value either from its parent or from its
siblings (variables in the LHS or RHS of the production).
For example, let's say A -> BC is a production of a grammar; if B's
attribute depends on A's attributes or C's attributes, then it is an inherited
attribute.
S-attributed SDT:
If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
S-attributed SDTs are evaluated in bottom-up parsing, as the values of the
parent nodes depend upon the values of the child nodes.
Semantic actions are placed in the rightmost place of the RHS.
L-attributed SDT:
If an SDT uses both synthesized attributes and inherited attributes, with the
restriction that an inherited attribute can inherit values from left siblings
only, it is called an L-attributed SDT.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right
parsing manner.
Semantic actions are placed anywhere in the RHS.
For example: A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar, since Y.S = A.S and Y.S = X.S are allowed
but Y.S = Z.S violates the L-attributed SDT definition, as the attribute inherits
a value from its right sibling.
Lecture -4- : Semantic Analysis in Compiler Design
Semantic Analysis is the third phase of the compiler. Semantic analysis makes
sure that the declarations and statements of the program are semantically correct. It is
a collection of procedures which are called by the parser as and when required by the
grammar. Both the syntax tree of the previous phase and the symbol table are used to
check the consistency of the given code. Type checking is an important part of semantic
analysis, where the compiler makes sure that each operator has matching operands.
Semantic Analyzer:
It uses the syntax tree and the symbol table to check whether the given program is
semantically consistent with the language definition. It gathers type information and
stores it in either the syntax tree or the symbol table. This type information is
subsequently used by the compiler during intermediate-code generation.
Semantic Errors:
Errors recognized by semantic analyzer are as follows:
1. Type mismatch
2. Undeclared variables
3. Reserved identifier misuse
4. Multiple declaration of variable in a scope.
5. Accessing an out-of-scope variable.
6. Actual and formal parameter mismatch.
The semantic analyzer also keeps a check that control structures are used in a
proper manner (for example: no break statement outside a loop).
Example:
float x = 10.1;
float y = x*30;
In the above example, the integer 30 will be type-cast to the float 30.0 before
the multiplication by the semantic analyzer.
Static and Dynamic Semantics:
In many compilers, the work of the semantic analyzer takes the form of
semantic action routines, invoked by the parser when it realizes that it has reached
a particular point within a grammar rule.
Of course, not all semantic rules can be checked at compile time. Those that
can are referred to as the static semantics of the language. Those that must be
checked at run time are referred to as the dynamic semantics of the language. C has
very little in the way of dynamic checks.
Examples of rules that other languages enforce at run time include the
following:
■ Variables are never used in an expression unless they have been given a value.
■ Pointers are never dereferenced unless they refer to a valid object.
■ Array subscript expressions lie within the bounds of the array.
■ Arithmetic operations do not overflow.
Semantic analysis judges whether the syntax structure constructed in the source
program derives any meaning or not.
CFG + semantic rules = Syntax Directed Definitions
For example:
int a = “value”;
This should not issue an error in the lexical and syntax analysis phases, as it is
lexically and structurally correct, but it should generate a semantic error as the
type of the assignment differs. These rules are set by the grammar of the language
and evaluated in semantic analysis. The following tasks should be performed in
semantic analysis:
Scope resolution
Type checking
Array-bound checking
If a semantic analyzer has a symbol table for each separate procedure, it can find
semantic errors that occur because of the following mistakes:
Names that aren’t declared
Operands of the wrong type for the operator they’re used with
Values that have the wrong type for the name to which they're assigned
If a semantic analyzer has a symbol table for the program as a whole, it can find
semantic errors that occur because of the following mistakes:
Procedures that are invoked with the wrong number of arguments
Procedures that are invoked with the wrong type of arguments
Function return values that are the wrong type for the context in which
they're used
If a semantic analyzer has control-flow and data-flow information for each separate
procedure, it can find semantic errors that occur because of the following mistakes:
Local variables that are used before being initialized or assigned
Local variables that are initialized or assigned, but not used
If a semantic analyzer has control-flow and data-flow information for the program
as a whole, it can find semantic errors that occur because of the following
mistakes:
Procedures that are never invoked
Procedures that have no effect
Global variables that are used before being initialized or assigned
Global variables that are initialized or assigned, but not used
Examples
1- The following code is correct:
    while (x <= 5)
        writeOut "OK";
        break;
    ;
Whereas the following one isn't, and should be rejected:
    while (x <= 5)
        writeOut "OK";
    ;
    break;
2- x = 3;
z = "abc";
y = x + z;
The three lines above should also generate a compilation error. The reason
is that the operator + is used with an int type (x) and a string type (z), even
though this kind of operation may be allowed in some languages.
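The check behind example 2 can be sketched in Python; the symbol-table contents mirror x and z above, and the error-message text is illustrative.

```python
# A minimal symbol table mapping names to their declared types.
symbol_table = {"x": "int", "z": "string"}

def check_add(lhs, rhs):
    """Type-check lhs + rhs: reject the operator when operand types differ."""
    t1, t2 = symbol_table[lhs], symbol_table[rhs]
    if t1 != t2:
        return f"type error: '+' applied to {t1} and {t2}"
    return t1   # the result type when the operands agree

print(check_add("x", "z"))
```

A real analyzer would also apply the language's coercion rules before rejecting, as the float example earlier showed.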
Lecture -5- : Semantic Analysis (TYPE checking)
A semantic analyzer checks the source program for semantic errors. Type
checking is an important part of the semantic analyzer: it is the process of
verifying and enforcing constraints of types in values, and it attempts to catch
programming errors based on the theory of types.
Two types of semantic checks are performed within this phase; these are:-
1. Static Semantic Checks are performed at compile time like:-
Type checking.
Every variable is declared before used.
Identifiers are used in appropriate contexts.
Check labels
2. Dynamic Semantic Checks are performed at run time, and the compiler
produces code that performs these checks:-
Array subscript values are within bounds.
Arithmetic errors, e.g. division by zero.
A variable is used but hasn’t been initialized.
Three kinds of languages:
1- Statically typed: All or almost all checking of types is done as part of
compilation (C, Java).
2- Dynamically typed: Almost all checking of types is done as part of program
execution (Scheme).
3- Un-typed: No type checking (machine code).
NOTE: Some programming languages such as C combine both static and
dynamic typing, i.e., some types are checked before execution while others
are checked during execution.
The design of a type checker depends on:
1- The syntactic structure of language constructs.
2- The type expressions of the language.
3- The rules for assigning types to constructs.
Type Expression and Type Systems
Type Expression
The type of a language construct will be denoted by a type expression. A type
expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions.
1- Basic type
• Integer: 7, 34, 909.
• Floating point: 5.34, 123, 87.
• Character: a, A.
• Boolean: not, and, or, xor.
2- Type constructor
Arrays: If T is a type expression, then array (I, T) is a type expression
denoting the type of an array with elements of type T and index set I.
Products: If T1 and T2 are type expressions, then their Cartesian
product T1×T2 is a type expression.
Records: The type of a record is in a sense the product of the types of
its fields. The difference between a record and a product is that the fields
of a record have names.
Pointers: If T is a type expression, then pointer (T) is a type expression
denoting the type pointer to an object of type T.
Functions: Functions take values in some domain and map them into
values in some range.
Type System
A type system is a collection of rules for assigning type expressions. In most
languages, type systems include:
1- Basic types are the atomic types with no internal structure as far as the
programmer is concerned (int, char, float,….).
2- Constructed types are arrays, records, and sets. In addition, pointers and
functions can also be treated as constructed types.
3- Type Equivalence:
Name equivalence: Types are equivalent only when they have the
same name.
Structural equivalence: Types are equivalent when they have the
same structure.
Example: C uses name equivalence for structs and structural
equivalence for arrays/pointers.
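The difference can be sketched in Python by encoding type expressions as nested tuples, e.g. ("array", bounds, element-type) or ("pointer", t); the tags and layout are an illustrative encoding, not a fixed standard. Under structural equivalence, two separately declared types compare equal whenever their trees match.

```python
def same_structure(t1, t2):
    """Structural equivalence: compare two type-expression trees node by node."""
    # atoms (basic-type names, index bounds) compare by value
    if not isinstance(t1, tuple) or not isinstance(t2, tuple):
        return t1 == t2
    return len(t1) == len(t2) and all(
        same_structure(a, b) for a, b in zip(t1, t2))

RowA = ("array", (1, 10), "int")   # two separately declared array types
RowB = ("array", (1, 10), "int")
print(same_structure(RowA, RowB))                               # same structure
print(same_structure(("pointer", "int"), ("pointer", "float"))) # different structure
```

Under name equivalence, RowA and RowB would be two distinct types despite comparing equal here, because each declaration introduces its own name.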
Lecture -6-: Intermediate Code Generation
If we generate machine code directly from source code, then for n target
machines we will have n optimizers and n code generators, but if we have a
machine-independent intermediate code, we will have only one optimizer.
Intermediate code can be either language-specific (e.g., bytecode for Java) or
language-independent (three-address code).
1- Postfix Notation –
The ordinary (infix) way of writing the sum of a and b is with the operator in the
middle: a + b.
The postfix notation for the same expression places the operator at the right
end, as ab+. In general, if e1 and e2 are any postfix expressions and + is any binary
operator, the result of applying + to the values denoted by e1 and e2 is indicated in
postfix notation by e1e2+. No parentheses are needed in postfix notation because the
position and arity (number of arguments) of the operators permit only one way to
decode a postfix expression. In postfix notation the operator follows the operands.
Example –
The postfix representation of the expression (a – b) * (c + d) + (a – b) is:
ab- cd+ * ab- +
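The claim that postfix needs no parentheses is easy to see operationally: a single left-to-right pass with a stack decodes it. A minimal Python sketch, using numeric values in place of the variable names:

```python
def eval_postfix(expr):
    """Evaluate a space-separated postfix expression with a stack."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()   # right operand is on top of the stack
            a = stack.pop()   # left operand is underneath
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))   # operand: push its value
    return stack.pop()

# (a - b) * (c + d) + (a - b) with a=5, b=3, c=2, d=4
print(eval_postfix("5 3 - 2 4 + * 5 3 - +"))
```

With those values the expression is (5-3)*(2+4)+(5-3) = 14.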
2- Three-Address Code –
A statement involving no more than three references (two for operands and
one for the result) is known as a three-address statement. A sequence of three-address
statements is known as three-address code. A three-address statement is of the form
x = y op z, where x, y, and z have addresses (memory locations).
Sometimes a statement might contain fewer than three references, but it is still
called a three-address statement.
Example – The three address code for the expression a + b * c + d :
T1=b*c
T2=a+T1
T3=T2+d
T1, T2, T3 are temporary variables.
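Generating the temporaries T1, T2, T3 can be sketched as a post-order walk of the expression tree; the tree for a + b * c + d is written out by hand here, grouped left-associatively as ((a + (b * c)) + d).

```python
import itertools

def gen_tac(node, code, counter):
    """Post-order walk: emit code for children, then a fresh temporary."""
    if isinstance(node, str):            # leaf: a variable name
        return node
    op, left, right = node
    l = gen_tac(left, code, counter)
    r = gen_tac(right, code, counter)
    temp = f"T{next(counter)}"           # fresh temporary T1, T2, ...
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code = []
gen_tac(("+", ("+", "a", ("*", "b", "c")), "d"), code, itertools.count(1))
print("\n".join(code))
```

This reproduces the sequence above: T1 = b * c, then T2 = a + T1, then T3 = T2 + d.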
3- Syntax Tree –
A syntax tree is nothing more than a condensed form of a parse tree. The
operator and keyword nodes of the parse tree are moved to their parents, and a
chain of single productions is replaced by a single link. In a syntax tree the internal
nodes are operators and the leaf nodes are operands. To form a syntax tree, put
parentheses in the expression; this way it's easy to recognize which operand
should come first.
Example –
x = (a + b * c) / (a – b * c)
The operation that changes the high-level language into an assembly-like form
is called intermediate code generation; in it, each statement is divided up so
that every resulting statement has a single operation.
Example: X=A+B*C/D-Y*N
T1= B*C
T2=T1/D
T3=Y*N
T4=A+T2
T5=T4-T3
Example: Y= Cos(A*B)+C/N-X*P
T1=A*B
T2=Cos(T1)
T3=X*P
T4=C/N
T5=T2+T4
T6=T5-T3
If Condition Statement:
Example:
X=1;
If (X>Y)
{ A=A+1;
B=B-A+2;
}
P=P+1;
Example:
X=1
If ((X>Y) && (Y>=2))
{
A=A+1
B=B-A+2
}
Else X=X+1;
P=P+2+X;
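One possible three-address translation of the if/else example, written out by hand as a Python list; the label and temporary names (L1, L2, T1, ...) are invented for illustration, and && is compiled with short-circuit jumps.

```python
tac = [
    "X = 1",
    "T1 = X > Y",
    "ifFalse T1 goto L1",   # first operand of && is false -> else part
    "T2 = Y >= 2",
    "ifFalse T2 goto L1",   # second operand of && is false -> else part
    "T3 = A + 1",
    "A = T3",
    "T4 = B - A",
    "T5 = T4 + 2",
    "B = T5",
    "goto L2",              # skip over the else part
    "L1: T6 = X + 1",
    "X = T6",
    "L2: T7 = P + 2",       # code after the if/else continues here
    "P = T7 + X",
]
print("\n".join(tac))
```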
For - Loop
Example:
For (i=1; i<=10;i++)
X = X+ (i*Y);
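Similarly, a hand-written three-address sketch of the for-loop, with label and temporary names invented for illustration:

```python
loop_tac = [
    "i = 1",
    "L1: if i > 10 goto L2",   # loop exit test
    "T1 = i * Y",
    "T2 = X + T1",
    "X = T2",
    "T3 = i + 1",              # i++
    "i = T3",
    "goto L1",
    "L2:",                     # code after the loop continues here
]
print("\n".join(loop_tac))
```

The loop header turns into an explicit test-and-jump at L1 plus a back-edge (goto L1), which is the general pattern for counting loops.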
Issues in the design of a code generator
The code generator converts the intermediate representation of the source code into
a form that can be readily executed by the machine. A code generator is expected
to generate correct code. The design of the code generator should be done in such a
way that it can be easily implemented, tested, and maintained.
1. Input to code generator
The input to the code generator is the intermediate code generated by the front
end, along with information in the symbol table that determines the run-time
addresses of the data objects denoted by the names in the intermediate
representation. Intermediate code may be represented as quadruples,
triples, indirect triples, postfix notation, syntax trees, DAGs, etc. The code
generation phase proceeds on the assumption that the input is free from
all syntactic and static semantic errors, that the necessary type checking has
taken place, and that the type-conversion operators have been inserted wherever
necessary.
2. Target program
The target program is the output of the code generator. The output may be
absolute machine language, relocatable machine language, or assembly
language.
1. Absolute machine language as output has advantages that it can be
placed in a fixed memory location and can be immediately executed.
2. Relocatable machine language as an output allows subprograms and
subroutines to be compiled separately. Relocatable object modules can
be linked together and loaded by linking loader. But there is added
expense of linking and loading.
3. Assembly language as output makes code generation easier. We can
generate symbolic instructions and use the macro facilities of the assembler
in generating code. However, we need an additional assembly step after code
generation.
3. Memory Management:
Mapping the names in the source program to the addresses of data objects
is done cooperatively by the front end and the code generator. A name in a
three-address statement refers to the symbol table entry for the name. Then,
from the symbol table entry, a relative address can be determined for the name.
4. Instruction selection:
Selecting the best instructions will improve the efficiency of the program.
The instruction set should be complete and uniform. Instruction speeds and
machine idioms also play a major role when efficiency is considered. If we do
not care about the efficiency of the target program, then instruction selection
is straightforward.
For example, the three-address statements
P:=Q+R
S:=P+T
would be translated, statement by statement, into the code sequence:
MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
Here the fourth statement, MOV P, R0, is redundant: it reloads the value that
the third statement has just stored.
5. Register allocation:
Instructions involving register operands are usually shorter and faster than
those involving operands in memory, so efficient utilization of registers is
particularly important. Register allocation selects the set of variables that
will reside in registers at each point in the program; register assignment
then picks the specific register in which each variable will reside.
6. Evaluation order:
The code generator decides the order in which the instructions will be
executed. The order of computations affects the efficiency of the target code.
Among the many computational orders, some will require fewer registers to
hold the intermediate results. However, picking the best order in the general
case is a difficult NP-complete problem.
7. Approaches to code generation issues:
The code generator must always generate correct code. This is essential
because of the number of special cases that a code generator might face.
Some of the design goals of a code generator are:
Correct
Easily maintainable
Testable
Efficient