Syllabus
• Syntax Analysis - CFG, top-down and bottom-up
parsers, RDP, Predictive parser, SLR, LR(1),
LALR parsers, using ambiguous grammar, Error
detection and recovery, automatic
construction of parsers using YACC,
Introduction to Semantic Analysis - need of
semantic analysis, type checking and type
conversion.
1- By Jaydeep Patil AISSMS's IOIT Pune
UNIT 2
Syntax Analysis
2- By Jaydeep Patil AISSMS's IOIT Pune
Grammar
• A set of formal rules for generating syntactically correct
sentences.
• It is defined by the 4-tuple G = (V, T, P, S)
– V - Variables (non-terminals)
– T - Terminals
– P - Productions
– S - Start symbol (a variable)
• Terminals - {a, b, c, ..., z, 0-9}
• Non-terminals (variables) - {A-Z}
• Rule: the LHS of every production must contain at least one
non-terminal, that is, a variable.
3- By Jaydeep Patil AISSMS's IOIT Pune
Grammar
Example 1
E -> E+E
E -> E*E
E -> id
G = ({E}, {id, *, +}, P, E)
L(G) = {id, id+id, id*id, id+id*id, ...}
Example 2
S -> Xa
X -> aX | bX | a | b
G = ({S, X}, {a, b}, P, S)
L(G) = {aa, ba, aaa, baa, ...}
4- By Jaydeep Patil AISSMS's IOIT Pune
CFG
• RULE: 1. Every production is of the form
𝐴 → α
where A is a single variable (non-terminal) and α is any string of terminals and/or non-terminals.
2. A production of the form α → β, with an arbitrary string on the
LHS, is not allowed: on the LHS of a production there must be
only a single non-terminal (variable), not a string.
CFL (Context-Free Language) -> the set of
sentences derived from the start symbol of a CFG is
called a CFL.
6- By Jaydeep Patil AISSMS's IOIT Pune
Leftmost and Rightmost Derivations
E->E+E
E->E*E
E->id
• Derive id+id*id
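• For example, one leftmost and one rightmost derivation of id+id*id are:
Leftmost:  E => E+E => id+E => id+E*E => id+id*E => id+id*id
Rightmost: E => E+E => E+E*E => E+E*id => E+id*id => id+id*id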
7- By Jaydeep Patil AISSMS's IOIT Pune
8- By Jaydeep Patil AISSMS's IOIT Pune
Ambiguity
• A grammar that produces more than one
parse tree for some sentence is said to be
ambiguous. Put another way, an ambiguous
grammar is one that produces more than one
leftmost derivation or more than one
rightmost derivation for the same sentence.
9- By Jaydeep Patil AISSMS's IOIT Pune
10- By Jaydeep Patil AISSMS's IOIT Pune
Left Recursion
• A grammar is left recursive if it has a nonterminal A such that
there is a derivation
A => Aα
for some string α.
• Top-down parsing methods cannot handle left-recursive
grammars, so a transformation is needed to eliminate left
recursion.
A -> Aα | β
becomes
A -> βA'
A' -> αA' | ε
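• Applying this transformation to the left-recursive expression
grammar E -> E+T | T, for example, gives the equivalent grammar:
E -> TE'
E' -> +TE' | ε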
11- By Jaydeep Patil AISSMS's IOIT Pune
Left Factoring
• Left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive, or top-down, parsing. When the choice
between two alternative A-productions is not clear, we may be able to
rewrite the productions to defer the decision until enough of the input has
been seen that we can make the right choice.
• For example, given the two productions
A -> αβ1 | αβ2
we left-factor them as
A -> αA'
A' -> β1 | β2
12- By Jaydeep Patil AISSMS's IOIT Pune
Left Factoring
S -> cAd
A -> ab | a
After left factoring:
S -> cAd
A -> aA'
A' -> b | ε
13- By Jaydeep Patil AISSMS's IOIT Pune
Top Down Parsing
• Top-down parsing can be viewed as the problem of
constructing a parse tree for the input string, starting from the
root and creating the nodes of the parse tree in preorder.
Equivalently, top-down parsing can be viewed as finding a
leftmost derivation for an input string.
14- By Jaydeep Patil AISSMS's IOIT Pune
Recursive-Descent Parsing
• A recursive-descent parsing program consists
of a set of procedures, one for each
nonterminal. Execution begins with the
procedure for the start symbol, which halts
and announces success if its procedure body
scans the entire input string.
15- By Jaydeep Patil AISSMS's IOIT Pune
Recursive-Descent Parsing
16- By Jaydeep Patil AISSMS's IOIT Pune
Recursive-Descent Parsing
• General recursive-descent may require
backtracking; that is, it may require repeated
scans over the input. However, backtracking is
rarely needed to parse programming language
constructs, so backtracking parsers are not
seen frequently.
17- By Jaydeep Patil AISSMS's IOIT Pune
Recursive-Descent Parsing
Consider the grammar
S -> cAd
A -> ab | a
and the input string w = cad.
S => cAd
  => cad (the parser first tries A -> ab, fails to match d, backtracks, and then succeeds with A -> a)
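A minimal recursive-descent sketch in C for this grammar, with backtracking over the alternatives of A (the input buffer, cursor and function names are illustrative assumptions, not part of the original slides):
#include <stdio.h>
#include <string.h>

static const char *input;   /* assumed global input buffer, e.g. "cad" */
static int pos;             /* cursor into the input */

static int match(char c) { if (input[pos] == c) { pos++; return 1; } return 0; }

static int A(void) {        /* A -> ab | a */
    int save = pos;
    if (match('a') && match('b')) return 1;   /* try A -> ab first */
    pos = save;                               /* backtrack */
    return match('a');                        /* then try A -> a */
}

static int S(void) {        /* S -> cAd */
    return match('c') && A() && match('d');
}

int main(void) {
    input = "cad"; pos = 0;
    printf(S() && pos == (int)strlen(input) ? "accepted\n" : "rejected\n");
    return 0;
}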
18- By Jaydeep Patil AISSMS's IOIT Pune
Recursive-Descent Parsing
• A left-recursive grammar can cause a
recursive-descent parser, even one with
backtracking, to go into an infinite loop. That
is, when we try to expand a nonterminal A, we
may eventually find ourselves again trying to
expand A without having consumed any input.
19- By Jaydeep Patil AISSMS's IOIT Pune
FIRST and FOLLOW
• The construction of both top-down and
bottom-up parsers is aided by two functions,
FIRST and FOLLOW, associated with a
grammar G. During top-down parsing, FIRST
and FOLLOW allow us to choose which
production to apply, based on the next input
symbol. During panic-mode error recovery,
sets of tokens produced by FOLLOW can be
used as synchronizing tokens.
20- By Jaydeep Patil AISSMS's IOIT Pune
FIRST
• FIRST is a function which gives the set of terminals
that can begin the strings derived from a grammar
symbol.
• Rules:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X -> ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal, then:
3.a) If X -> Yα, then FIRST(Y) - {ε} is included in FIRST(X).
3.b) If FIRST(Y) contains ε and X -> YZ, then
FIRST(X) = (FIRST(Y) - {ε}) ∪ FIRST(Z).
21- By Jaydeep Patil AISSMS's IOIT Pune
Follow
• FOLLOW is a function which gives the set of terminals that
can appear immediately to the right of the given symbol in some sentential form.
• Rules:
1. $ is in FOLLOW(S), where S is the start symbol.
2. If A -> αBβ is a production, then FIRST(β) - {ε} is included in FOLLOW(B).
3. If A -> αB, or A -> αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
22- By Jaydeep Patil AISSMS's IOIT Pune
FIRST
• E->TE’
• E’->+TE’/ε
• T->FT’
• T’->*FT’/ ε
• F->(E)/id
First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }
23- By Jaydeep Patil AISSMS's IOIT Pune
S-> iEtSS’/a
S’-> eS/ ε
E->b
First(S)={i,a}
First(S')={e, ε}
First(E)={b}
24- By Jaydeep Patil AISSMS's IOIT Pune
S->A
A->aB/Ad
B->aBC/f
C->g
First(S)=First(A) ={a}
First(B)={a,f}
First(C)={g}
25- By Jaydeep Patil AISSMS's IOIT Pune
First(S)={1, ε}
First(A)={1,0}
First(B)={0}
First(C)={1}
S->1AB/ ε
A->1AC/0C
B->0S
C->1
26- By Jaydeep Patil AISSMS's IOIT Pune
S->AaAb/BbBa
A-> ε
B-> ε
First(A)={ε}
First(B)={ε}
First(S) = (First(A) - {ε}) ∪ First(a) ∪ (First(B) - {ε}) ∪ First(b) = {a, b}
27- By Jaydeep Patil AISSMS's IOIT Pune
S->aBbDh
B->cC
C->bc/ ε
D->EF
E->g/ ε
F->f/ ε
First(S)={a}
First(B)={c}
First(C) = {b, ε}
First(D) = (First(E) - {ε}) ∪ First(F) = {g, f, ε}
First(E) = {g, ε}
First(F) = {f, ε}
28- By Jaydeep Patil AISSMS's IOIT Pune
FIRST & FOLLOW
• E->TE’
• E’->+TE’/ε
• T->FT’
• T’->*FT’/ ε
• F->(E)/id
First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }
Follow(E)= { $,) }
Follow(E’)={$,) }
Follow(T) = (First(E') - {ε}) ∪ Follow(E') = { +, $, ) }
Follow(T') = { +, $, ) }
Follow(F) = (First(T') - {ε}) ∪ Follow(T') = { *, +, $, ) }
29- By Jaydeep Patil AISSMS's IOIT Pune
S-> iEtSS’/a
S’-> eS/ ε
E->b
First(S)={i, a}
First(S')={e, ε}
First(E)={b}
Follow(S)={$, e}
Follow(S')={$, e}
Follow(E)={t}
30- By Jaydeep Patil AISSMS's IOIT Pune
S->A
A->aB/Ad
B->aBC/f
C->g
First(S)=First(A) ={a}
First(B)={a,f}
First(C)={g}
Follow(S)={$}
Follow(A)={$,d}
Follow(B)={$,d,g}
Follow(C)={$,d,g}
31- By Jaydeep Patil AISSMS's IOIT Pune
First(S)={1, ε}
First(A)={1,0}
First(B)={0}
First(C)={1}
Follow(S)={$}
Follow(A)={0,1}
Follow(B)={$}
Follow(C)={0,1}
S->1AB/ ε
A->1AC/0C
B->0S
C->1
32- By Jaydeep Patil AISSMS's IOIT Pune
S->AaAb/BbBa
A-> ε
B-> ε
First(A)={ε}
First(B)={ε}
First(S) = (First(A) - {ε}) ∪ First(a) ∪ (First(B) - {ε}) ∪ First(b) = {a, b}
Follow(S) = {$}
Follow(A) = {a, b}
Follow(B) = {b, a}
33- By Jaydeep Patil AISSMS's IOIT Pune
S->aBbDh
B->cC
C->bc/ ε
D->EF
E->g/ ε
F->f/ ε
First(S)={a}
First(B)={c}
First(C) = {b, ε}
First(D) = (First(E) - {ε}) ∪ First(F) = {g, f, ε}
First(E) = {g, ε}
First(F) = {f, ε}
Follow(S)={$}
Follow(B)={b}
Follow(C)={b}
Follow(D)={h}
Follow(E)={f,h}
Follow(F)={h}
34- By Jaydeep Patil AISSMS's IOIT Pune
S->aBDh
B->cC
C->bc/ ε
D->EF
E->g/ ε
F->f/ ε
First(S)={a}
First(B)={c}
First(C) = {b, ε}
First(D) = (First(E) - {ε}) ∪ First(F) = {g, f, ε}
First(E) = {g, ε}
First(F) = {f, ε}
Follow(S)={$}
Follow(B)={g,f,h}
Follow(C)={g,f,h}
Follow(D)={h}
Follow(E)={f,h}
Follow(F)={h}
35- By Jaydeep Patil AISSMS's IOIT Pune
• E->TA
• A->+TA/ε
• T->FB
• B->*FB/ ε
• F->(E)/id
First(E) = First(T) = First(F) = { (, id }
First(A) = { +, ε }
First(B) = { *, ε }
Follow(E) = { $, ) }
Follow(A) = { $, ) }
Follow(T) = (First(A) - {ε}) ∪ Follow(A) = { +, $, ) }
Follow(B) = { +, $, ) }
Follow(F) = (First(B) - {ε}) ∪ Follow(B) = { *, +, $, ) }
36- By Jaydeep Patil AISSMS's IOIT Pune
LL ( 1 ) Grammars
• Predictive parsers, that is, recursive-descent parsers
needing no backtracking, can be constructed for a class
of grammars called LL(1). The first "L" in LL(1) stands
for scanning the input from left to right, the second "L"
for producing a leftmost derivation, and the "1" for
using one input symbol of lookahead at each step to
make parsing action decisions.
• The class of LL(1) grammars is rich enough to cover
most programming constructs, although care is needed
in writing a suitable grammar for the source language.
For example, no left-recursive or ambiguous grammar
can be LL(1).
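• The LL(1) parsing table M is constructed from FIRST and FOLLOW: for each
production A -> α, add A -> α to M[A, a] for every terminal a in FIRST(α);
if ε is in FIRST(α), also add A -> α to M[A, b] for every b in FOLLOW(A)
(including $). The grammar is LL(1) exactly when no cell of M receives more
than one production.
• For instance, with First(E') = {+, ε} and Follow(E') = {$, )}, the
production E' -> +TE' is placed in M[E', +], and E' -> ε is placed in
M[E', $] and M[E', )].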
37- By Jaydeep Patil AISSMS's IOIT Pune
Predictive Parsing
38- By Jaydeep Patil AISSMS's IOIT Pune
Grammar is LL(1) (no multiple entries in the parsing table)
39- By Jaydeep Patil AISSMS's IOIT Pune
Grammar is not LL(1) (multiple entries in the parsing table)
40- By Jaydeep Patil AISSMS's IOIT Pune
Nonrecursive Predictive Parsing
• A non recursive predictive parser can be built
by maintaining a stack explicitly, rather than
implicitly via recursive calls. The parser mimics
a leftmost derivation. If w is the input that has
been matched so far , then the stack holds a
sequence of grammar symbols.
41- By Jaydeep Patil AISSMS's IOIT Pune
43- By Jaydeep Patil AISSMS's IOIT Pune
44- By Jaydeep Patil AISSMS's IOIT Pune
• If X = a = $, the parser halts and announces successful
completion of parsing.
• If X = a ≠ $, the parser pops X off the stack and advances the
input pointer to the next input symbol.
• If X is a non-terminal, the program consults entry M[X, a]
of the parsing table M. This entry will be either an X-
production of the grammar or an error entry. If, for
example, M[X, a] = {X -> UVW}, the parser replaces X on the
top of the stack by WVU (with U on top of the stack). As output,
we shall assume that the parser just prints the
production used.
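A compact C sketch of this table-driven loop for the grammar E->TE', E'->+TE'|ε, T->FT', T'->*FT'|ε, F->(E)|id follows; the symbol encoding, the hard-coded table M, and the single-character tokenizer are illustrative assumptions made only for this sketch.
#include <stdio.h>

enum { ID, PLUS, STAR, LP, RP, END,      /* terminals; END plays the role of $ */
       E, Ep, T, Tp, F };                /* nonterminals (Ep = E', Tp = T') */

/* Production bodies, stored right-to-left so they can be pushed directly;
   -1 terminates each body (an empty body encodes an epsilon-production). */
static const int prod[][4] = {
    /*0: E  -> T E'   */ { Ep, T, -1 },
    /*1: E' -> + T E' */ { Ep, T, PLUS, -1 },
    /*2: E' -> eps    */ { -1 },
    /*3: T  -> F T'   */ { Tp, F, -1 },
    /*4: T' -> * F T' */ { Tp, F, STAR, -1 },
    /*5: T' -> eps    */ { -1 },
    /*6: F  -> ( E )  */ { RP, E, LP, -1 },
    /*7: F  -> id     */ { ID, -1 },
};

/* Parsing table M[nonterminal][terminal]; -1 marks an error entry. */
static const int M[5][6] = {
    /* E  */ {  0, -1, -1,  0, -1, -1 },
    /* E' */ { -1,  1, -1, -1,  2,  2 },
    /* T  */ {  3, -1, -1,  3, -1, -1 },
    /* T' */ { -1,  5,  4, -1,  5,  5 },
    /* F  */ {  7, -1, -1,  6, -1, -1 },
};

static int lex(char c) {                 /* map one character to a terminal code */
    switch (c) {
    case 'i': return ID;  case '+': return PLUS; case '*': return STAR;
    case '(': return LP;  case ')': return RP;   default:  return END;
    }
}

int main(void) {
    const char *w = "i+i*i";             /* 'i' stands for the token id */
    int stack[100], top = 0, ip = 0;
    stack[top++] = END;                  /* $ marks the bottom of the stack */
    stack[top++] = E;                    /* start symbol on top */
    int a = lex(w[ip]);
    while (top > 0) {
        int X = stack[--top];
        if (X <= END) {                  /* X is a terminal or $ */
            if (X != a) { printf("error\n"); return 1; }
            if (X == END) break;         /* X = a = $ : accept */
            a = lex(w[++ip]);            /* matched: advance the input pointer */
        } else {                         /* X is a nonterminal: consult M[X, a] */
            int p = M[X - E][a];
            if (p < 0) { printf("error\n"); return 1; }
            printf("output production %d\n", p);
            for (int i = 0; prod[p][i] >= 0; i++)
                stack[top++] = prod[p][i];   /* push body, leftmost symbol on top */
        }
    }
    printf("input accepted\n");
    return 0;
}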
45- By Jaydeep Patil AISSMS's IOIT Pune
46- By Jaydeep Patil AISSMS's IOIT Pune
Bottom-Up Parsing
47- By Jaydeep Patil AISSMS's IOIT Pune
Bottom-Up Parsing
• A bottom-up parse corresponds to the
construction of a parse tree for an input string
beginning at the leaves (the bottom) and
working up towards the root (the top) .
48- By Jaydeep Patil AISSMS's IOIT Pune
49- By Jaydeep Patil AISSMS's IOIT Pune
Reductions
• We can think of bottom-up parsing as the
process of "reducing" a string w to the start
symbol of the grammar. At each reduction
step, a specific substring matching the body of
a production is replaced by the nonterminal at
the head of that production.
• The key decisions during bottom-up parsing
are about when to reduce and about what
production to apply, as the parse proceeds.
50- By Jaydeep Patil AISSMS's IOIT Pune
• The goal of bottom-up parsing is therefore to
construct a derivation in reverse. The
following derivation corresponds to the parse
in
• E => T => T * F => T * id => F * id => id * id
• This derivation is in fact a rightmost
derivation.
51- By Jaydeep Patil AISSMS's IOIT Pune
Handle
• Bottom-up parsing during a left-to-right scan
of the input constructs a rightmost derivation
in reverse. Informally, a "handle" is a substring
that matches the body of a production, and
whose reduction represents one step along
the reverse of a rightmost derivation.
52- By Jaydeep Patil AISSMS's IOIT Pune
Shift-Reduce Parsing
• Shift-reduce parsing is a form of bottom-up parsing in
which a stack holds grammar symbols and an input
buffer holds the rest of the string to be parsed.
• As we shall see, the handle always appears at the top of
the stack just before it is identified as the handle.
• We use $ to mark the bottom of the stack and also the
right end of the input. Conventionally, when discussing
bottom-up parsing, we show the top of the stack on the
right, rather than on the left as we did for top-down
parsing. Initially, the stack is empty, and the string w is
on the input, as follows:
55- By Jaydeep Patil AISSMS's IOIT Pune
• During a left-to-right scan of the input string,
the parser shifts zero or more input symbols
onto the stack, until it is ready to reduce a
string β of grammar symbols on top of the
stack. It then reduces β to the head of the
appropriate production. The parser repeats
this cycle until it has detected an error or until
the stack contains the start symbol and the
input is empty:
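• For the grammar E -> E+T | T, T -> T*F | F, F -> (E) | id, a shift-reduce
parse of the input id*id proceeds as follows:
Stack        Input      Action
$            id*id$     shift
$ id         *id$       reduce by F -> id
$ F          *id$       reduce by T -> F
$ T          *id$       shift
$ T *        id$        shift
$ T * id     $          reduce by F -> id
$ T * F      $          reduce by T -> T*F
$ T          $          reduce by E -> T
$ E          $          accept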
56- By Jaydeep Patil AISSMS's IOIT Pune
57- By Jaydeep Patil AISSMS's IOIT Pune
LR Parsing: Simple LR
• The most prevalent type of bottom-up parser
today is based on a concept called LR(k) parsing;
the "L" is for left-to-right scanning of the input,
the "R" for constructing a rightmost derivation in
reverse, and the k for the number of input
symbols of lookahead that are used in making
parsing decisions. The cases k = 0 and k = 1 are of
practical interest, and we shall only consider LR
parsers with k <= 1 here. When (k) is omitted, k is
assumed to be 1.
58- By Jaydeep Patil AISSMS's IOIT Pune
Why LR Parsers
• LR parsers are table-driven, much like the
nonrecursive LL parsers. A grammar for which
we can construct a parsing table using one of
the methods in this section and the next is
said to be an LR grammar. Intuitively, for a
grammar to be LR it is sufficient that a left-to-
right shift-reduce parser be able to recognize
handles of right- sentential forms when they
appear on top of the stack.
59- By Jaydeep Patil AISSMS's IOIT Pune
60- By Jaydeep Patil AISSMS's IOIT Pune
• The principal drawback of the LR method is that it is too
much work to construct an LR parser by hand for a typical
programming-language grammar. A specialized tool, an LR
parser generator, is needed.
• Fortunately, many such generators are available, and we
shall discuss one of the most commonly used ones, Yacc .
• Such a generator takes a context-free grammar and
automatically produces a parser for that grammar. If the
grammar contains ambiguities or other constructs that are
difficult to parse in a left-to-right scan of the input, then the
parser generator locates these constructs and provides
detailed diagnostic messages.
61- By Jaydeep Patil AISSMS's IOIT Pune
Items and the LR(0) Automaton
62- By Jaydeep Patil AISSMS's IOIT Pune
Augmented grammar
• If G is a grammar with start symbol S, then G',
the augmented grammar for G, is G with a new
start symbol S' and production S' -> S. The
purpose of this new starting production is to
indicate to the parser when it should stop
parsing and announce acceptance of the
input. That is, acceptance occurs when and
only when the parser is about to reduce by
S' -> S.
63- By Jaydeep Patil AISSMS's IOIT Pune
64- By Jaydeep Patil AISSMS's IOIT Pune
E’-> E
E-> E+T|T
T-> T*F|F
F->(E)|id
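• For this augmented grammar, the initial set of LR(0) items is
I0 = CLOSURE({E' → .E}):
E' → .E
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id
• and, for example, GOTO(I0, E) = { E' → E. , E → E.+T }.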
65- By Jaydeep Patil AISSMS's IOIT Pune
66- By Jaydeep Patil AISSMS's IOIT Pune
67- By Jaydeep Patil AISSMS's IOIT Pune
68- By Jaydeep Patil AISSMS's IOIT Pune
69- By Jaydeep Patil AISSMS's IOIT Pune
70- By Jaydeep Patil AISSMS's IOIT Pune
Canonical LR (CLR) Parsing
71- By Jaydeep Patil AISSMS's IOIT Pune
72- By Jaydeep Patil AISSMS's IOIT Pune
73- By Jaydeep Patil AISSMS's IOIT Pune
74- By Jaydeep Patil AISSMS's IOIT Pune
75- By Jaydeep Patil AISSMS's IOIT Pune
Operator Precedence Parsing
• Operator Grammar: For a small but important
class of grammars, we can easily construct
efficient shift-reduce parsers by hand.
Operator Grammars have the property that no
production right side is empty or has two
adjacent non-terminals.
76- By Jaydeep Patil AISSMS's IOIT Pune
• Eg: E -> EAE | (E) | -E | id
• A -> + | - | * | /
• The above grammar is not an operator grammar,
but we can rewrite it into one.
77- By Jaydeep Patil AISSMS's IOIT Pune
• In operator precedence parsing we define
three disjoint precedence relations <· , ≐ , ·>
between certain pairs of terminals. These
precedence relations guide the selection of
handles and have the following meanings:
78- By Jaydeep Patil AISSMS's IOIT Pune
• Basic Principle
• Having precedence relations allows identifying handles as follows:
• 1. Scan the string from the left until the first ·> is seen and put a pointer there.
• 2. Scan backwards from that point (right to left) until a <· is seen.
• 3. Everything between the <· and the ·> forms the handle.
• 4. Replace the handle with the head of the corresponding production.
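• For example, assuming the usual relations in which id has the highest
precedence and * takes precedence over +, the input id + id * id between $
markers is annotated as
$ <· id ·> + <· id ·> * <· id ·> $
so each id is a handle and is reduced to E first; in the remaining terminal
string $ + * $ we have $ <· +, + <· * and * ·> $, so the handle lies around *
and E*E is reduced before E+E, as expected.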
79- By Jaydeep Patil AISSMS's IOIT Pune
80- By Jaydeep Patil AISSMS's IOIT Pune
Conflicts in LR Parsing
• Every SLR grammar is unambiguous, but not every unambiguous
grammar is an SLR grammar.
81- By Jaydeep Patil AISSMS's IOIT Pune
shift/reduce and reduce/reduce
conflicts
• If a state does not know whether it will make a shift
operation or reduction for a terminal, we say that there
is a shift/reduce conflict.
• If a state does not know whether it will make a
reduction operation using the production rule i or j
for a terminal, we say that there is a reduce/reduce
conflict.
• If the SLR parsing table of a grammar G has a conflict,
we say that the grammar is not an SLR grammar.
82- By Jaydeep Patil AISSMS's IOIT Pune
83- By Jaydeep Patil AISSMS's IOIT Pune
84- By Jaydeep Patil AISSMS's IOIT Pune
Using Ambiguous Grammars
• All grammars used in the construction of LR-parsing
tables must be unambiguous.
• Can we create LR-parsing tables for ambiguous
grammars ?
– Yes, but they will have conflicts.
– We can resolve these conflicts in favor of one of them to disambiguate the grammar.
– At the end, we will have again an unambiguous grammar.
• Why we want to use an ambiguous grammar?
– Some ambiguous grammars are much more natural, and a corresponding unambiguous
grammar can be very complex.
– Usage of an ambiguous grammar may eliminate unnecessary reductions.
• Ex. Ambiguous grammar:
E → E+E | E*E | (E) | id
Equivalent unambiguous grammar:
E → E+T | T
T → T*F | F
F → (E) | id
85- By Jaydeep Patil AISSMS's IOIT Pune
Sets of LR(0) Items for Ambiguous
Grammar
I0: E' → .E
 E → .E+E
 E → .E*E
 E → .(E)
 E → .id
I1: E' → E.
 E → E.+E
 E → E.*E
I2: E → (.E)
 E → .E+E
 E → .E*E
 E → .(E)
 E → .id
I3: E → id.
I4: E → E+.E
 E → .E+E
 E → .E*E
 E → .(E)
 E → .id
I5: E → E*.E
 E → .E+E
 E → .E*E
 E → .(E)
 E → .id
I6: E → (E.)
 E → E.+E
 E → E.*E
I7: E → E+E.
 E → E.+E
 E → E.*E
I8: E → E*E.
 E → E.+E
 E → E.*E
I9: E → (E).
(Figure: the GOTO graph for these item sets - transitions on ( lead to I2, on id to I3, on + to I4, on * to I5, on ) from I6 to I9, and transitions on E lead to I1, I6, I7 and I8 from I0, I2, I4 and I5 respectively.)
86- By Jaydeep Patil AISSMS's IOIT Pune
Using ambiguous grammars
87- By Jaydeep Patil AISSMS's IOIT Pune
88- By Jaydeep Patil AISSMS's IOIT Pune
89- By Jaydeep Patil AISSMS's IOIT Pune
90- By Jaydeep Patil AISSMS's IOIT Pune
91- By Jaydeep Patil AISSMS's IOIT Pune
92- By Jaydeep Patil AISSMS's IOIT Pune
Error Recovery in LR Parsing
• An LR parser will detect an error when it consults the
parsing action table and finds an error entry. All empty
entries in the action table are error entries.
• Errors are never detected by consulting the goto table.
• An LR parser will announce error as soon as there is no
valid continuation for the scanned portion of the input.
• A canonical LR parser (LR(1) parser) will never make
even a single reduction before announcing an error.
• The SLR and LALR parsers may make several reductions
before announcing an error.
• But, all LR parsers (LR(1), LALR and SLR parsers) will
never shift an erroneous input symbol onto the stack.
93- By Jaydeep Patil AISSMS's IOIT Pune
ERROR RECOVERY IN LR PARSING
• An LR parser will detect an error when it consults the
parsing table and finds an error entry. A canonical LR
parser will never make even a single reduction before
announcing an error. The SLR and LALR parsers may
make several reductions before announcing an error,
but they will never shift an erroneous input symbol onto the
stack.
• We can implement two modes of recovery:
94- By Jaydeep Patil AISSMS's IOIT Pune
Panic Mode
• We scan down the stack until a state s with a goto
on a particular non-terminal A is found. Zero or more
input symbols are then discarded until a symbol a is
found that can legitimately follow A. The parser then
pushes the state goto[s, A] and resumes normal
parsing. Normally there may be many choices for the
non-terminal A; typically these would be non-
terminals representing major program pieces, such
as an expression, statement, or block.
95- By Jaydeep Patil AISSMS's IOIT Pune
Phrase Level Recovery
• It is implemented by examining each error
entry in the LR parsing table and deciding, on
the basis of the language, the most likely programming
error that would give rise to that error entry. An
appropriate error-recovery procedure can then
be constructed; presumably the top
of the stack and/or the first input symbols would
be modified in a way deemed appropriate for
each error.
96- By Jaydeep Patil AISSMS's IOIT Pune
• As an example consider the grammar (1).
E  E + E | E * E | ( E ) | id ---- (1)
The parsing table contains error routines that
have the effect of detecting errors before any
shift move takes place.
97- By Jaydeep Patil AISSMS's IOIT Pune
State  id   +    *    (    )    $   | E
0      s3   e1   e1   s2   e2   e1  | 1
1      e3   s4   s5   e3   e2   acc |
2      s3   e1   e1   s2   e2   e1  | 6
3      r4   r4   r4   r4   r4   r4  |
4      s3   e1   e1   s2   e2   e1  | 7
5      s3   e1   e1   s2   e2   e1  | 8
6      e3   s4   s5   e3   s9   e4  |
7      r1   r1   s5   r1   r1   r1  |
8      r2   r2   r2   r2   r2   r2  |
9      r3   r3   r3   r3   r3   r3  |
The LR parsing table with error routines
98- By Jaydeep Patil AISSMS's IOIT Pune
Error routines : e1
• This routine is called from states 0, 2, 4 and 5,
all of which expect the beginning of an operand,
either an id or a left parenthesis. Instead, an
operator (+ or *) or the end of the input was
found.
• Action: Push an imaginary id onto the stack
and cover it with state 3 (the goto of
states 0, 2, 4 and 5 on id).
• Print: Issue diagnostic "missing operand"
99- By Jaydeep Patil AISSMS's IOIT Pune
Error routines : e2
• This routine is called from states 0, 1, 2, 4 and
5 on finding a right parenthesis.
• Action: Remove the right parenthesis from the
input
• Print: Issue diagnostic “Unbalanced right
parenthesis”
100- By Jaydeep Patil AISSMS's IOIT Pune
Error routines : e3
• This routine is called from states 1 or 6 when
expecting an operator, and an id or right
parenthesis is found.
• Action: Push + onto the stack and cover it with
state 4.
• Print: Issue diagnostic “Missing operator”
101- By Jaydeep Patil AISSMS's IOIT Pune
Error routines : e4
• This routine is called from state 6 when the
end of input is found while expecting operator
or a right parenthesis.
• Action: Push a right parenthesis onto the stack
and cover it with a state 9.
• Print: Issue diagnostic “Missing right
parenthesis”
102- By Jaydeep Patil AISSMS's IOIT Pune
Automatic construction of parsers
(YACC), YACC specifications.
103- By Jaydeep Patil AISSMS's IOIT Pune
104
Automatic construction of parsers
(YACC), YACC specifications.
• Two classical tools for compilers:
– Lex: A Lexical Analyzer Generator
– Yacc: “Yet Another Compiler Compiler” (Parser Generator)
• Lex creates programs that scan your tokens one by one.
• Yacc takes a grammar (sentence structure) and generates a
parser.
Lex: lexical rules -> yylex(), which scans the input token by token
Yacc: grammar rules -> yyparse(), which produces the parsed input
- By Jaydeep Patil AISSMS's IOIT Pune
105
Automatic construction of parsers
(YACC), YACC specifications.
• Lex and Yacc generate C code for your analyzer & parser.
(Figure: lexical rules go into Lex, which generates C code for yylex(); grammar rules go into Yacc, which generates C code for yyparse(). The char stream feeds the lexical analyzer (tokenizer), whose token stream feeds the parser, which produces the parsed input.)
- By Jaydeep Patil AISSMS's IOIT Pune
106
Automatic construction of parsers
(YACC), YACC specifications.
• Often, instead of the standard Lex and Yacc,
Flex and Bison are used:
– Flex: A fast lexical analyzer
– (GNU) Bison: A drop-in replacement for (backwards
compatible with) Yacc
• Byacc is the Berkeley implementation of Yacc (so it
is Yacc).
• Resources:
– http://en.wikipedia.org/wiki/Flex_lexical_analyser
– http://en.wikipedia.org/wiki/GNU_Bison
• The Lex & Yacc Page (manuals, links):
– http://dinosaur.compilertools.net/
- By Jaydeep Patil AISSMS's IOIT Pune
107
Automatic construction of parsers
(YACC), YACC specifications.
• Yacc is not a new tool, and yet, it is still used in many
projects.
• Yacc syntax is similar to Lex/Flex at the top level.
• Lex/Flex rules were regular expression – action pairs.
• Yacc rules are grammar rule – action pairs.
declarations
%%
rules
%%
programs
- By Jaydeep Patil AISSMS's IOIT Pune
108- By Jaydeep Patil AISSMS's IOIT Pune
• Declaration Section
• There are two sections in the declarations part of a Yacc program;
both are optional. In the first section, we put ordinary C
declarations, delimited by %{ and %}. Here we place declarations of
any temporaries used by the translation rules or procedures of the
second and third sections.
Ex. #include <ctype.h>
• The C preprocessor is used to include the standard header file <ctype.h>,
which contains the predicate isdigit.
• Also in the declarations part are declarations of grammar tokens.
• %token DIGIT
• Tokens declared in this section can then be used in the second and
third parts of the Yacc specification. If Lex is used to create the
lexical analyzer that passes tokens to the Yacc parser, then these
token declarations are also made available to the analyzer
generated by Lex.
109- By Jaydeep Patil AISSMS's IOIT Pune
• The Translation Rules Part
• In the part of the Yacc specification after the
first %% pair, we put the translation rules .
Each rule consists of a grammar production
and the associated semantic action. A set of
productions that we have been writing:
110- By Jaydeep Patil AISSMS's IOIT Pune
• In a Yacc production, unquoted strings of letters
and digits not declared to be tokens are taken to
be non-terminals. A quoted single character, e.g.
'c', is taken to be the terminal symbol c, as well
as the integer code for the token represented by
that character (i.e., Lex would return the
character code for 'c' to the parser, as an
integer). Alternative bodies can be separated by
a vertical bar, and a semicolon follows each head
with its alternatives and their semantic actions.
The first head is taken to be the start symbol.
111- By Jaydeep Patil AISSMS's IOIT Pune
• A Yacc semantic action is a sequence of C statements. In a
semantic action, the symbol $$ refers to the attribute value
associated with the nonterminal of the head, while $i refers to
the value associated with the ith grammar symbol (terminal or
nonterminal) of the body. The semantic action is performed
whenever we reduce by the associated production, so
normally the semantic action computes a value for $$ in
terms of the $i's. In the Yacc specification, we have written
the two E-productions
112- By Jaydeep Patil AISSMS's IOIT Pune
• Note that the nonterminal term in the first production is
the third grammar symbol of the body, while + is the
second. The semantic action associated with the first
production adds the value of the expr and the term of the
body and assigns the result as the value for the
nonterminal expr of the head. We have omitted the
semantic action for the second production altogether, since
copying the value is the default action for productions with
a single grammar symbol in the body. In general, { $$ = $ 1 ;
} is the default semantic action. Notice that we have added
a new starting production
• line : expr ' n ' { print f C " %dn" , $ 1 ) ; }
• to the Yacc specification. This production says that an input
to the desk calculator is to be an expression followed by a
newline character. The semantic action associated with this
production prints the decimal value of the expression
followed by a newline character.
113- By Jaydeep Patil AISSMS's IOIT Pune
• The Supporting C-Routines Part
• The third part of a Yacc specification consists of supporting C-
routines. A lexical analyzer by the name yylex() must be provided.
Using Lex to produce yylex() is a common choice; The lexical
analyzer yylex() produces tokens consisting of a token name and its
associated attribute value. If a token name such as DIGIT is
returned, the token name must be declared in the first section of
the Yacc specification.
• The attribute value associated with a token is communicated to the
parser through a Yacc-defined variable yylval. It reads input
characters one at a time using the C-function getchar() . If the
character is a digit, the value of the digit is stored in the variable
yylval, and the token name DIGIT is returned. Otherwise, the
character itself is returned as the token name.
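• Putting the three parts together, the desk-calculator specification described
above might look roughly like the following sketch (the yyerror and main
routines are added here only so that the example is self-contained):
%{
#include <ctype.h>
#include <stdio.h>
%}
%token DIGIT
%%
line   : expr '\n'        { printf("%d\n", $1); }
       ;
expr   : expr '+' term    { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor  { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'     { $$ = $2; }
       | DIGIT
       ;
%%
int yylex(void) {                 /* supporting C-routine: the lexical analyzer */
    int c = getchar();
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    return c;                     /* any other character is its own token */
}
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
int main(void) { return yyparse(); }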
114- By Jaydeep Patil AISSMS's IOIT Pune
115- By Jaydeep Patil AISSMS's IOIT Pune
yacc –d bas.y # create y.tab.h, y.tab.c
lex bas.l # create lex.yy.c
cc lex.yy.c y.tab.c –o bas.exe # compile/link
116- By Jaydeep Patil AISSMS's IOIT Pune
• Yacc reads the grammar descriptions in bas.y and generates a
syntax analyzer (parser), that includes function yyparse, in file
y.tab.c. The –d option causes yacc to generate definitions for
tokens and place them in file y.tab.h. Lex reads the pattern
descriptions in bas.l, includes file y.tab.h, and generates a
lexical analyzer, that includes function yylex, in file lex.yy.c.
• Finally, the lexer and parser are compiled and linked together
to create executable bas.exe. From main we call yyparse to
run the compiler. Function yyparse automatically calls yylex
to obtain each token.
117- By Jaydeep Patil AISSMS's IOIT Pune
• %token INTEGER
• This definition declares an INTEGER token. Yacc generates a parser in file y.tab.c
and an include file, y.tab.h:
• #ifndef YYSTYPE
• #define YYSTYPE int
• #endif
• #define INTEGER 258
• extern YYSTYPE yylval;
• Lex includes this file and utilizes the definitions for token values. To obtain tokens
yacc calls yylex. Function yylex has a return type of int that returns a token. Values
associated with the token are returned by lex in variable yylval. For example,
• [0-9]+ { yylval = atoi(yytext); return INTEGER; }
• would store the value of the integer in yylval, and return token INTEGER to yacc.
The type of yylval is determined by YYSTYPE. Since the default type is integer this
works well in this case. Token values 0-255 are reserved for character values. For
example, if you had a rule such as
• [-+] return *yytext; /* return operator */
• the character value for minus or plus is returned. Note that we placed the minus
sign first so that it wouldn’t be mistaken for a range designator. Generated token
values typically start around 258 because lex reserves several values for end-of-file
and error processing.
118- By Jaydeep Patil AISSMS's IOIT Pune
• By default yylval is of type int, but you can override that from the
YACC file by re#defining YYSTYPE.
• The Lexer needs to be able to access yylval. In order to do so, it
must be declared in the scope of the lexer as an extern variable.
The original YACC neglects to do this for you, so you should add the
following to your lexer, just beneath
• #include <y.tab.h>:
• extern YYSTYPE yylval;
• Bison does this for you automatically.
• #ifndef checks whether the given token has been #defined earlier
in the file or in an included file; if not, it includes the code between
it and the closing #else or, if no #else is present, #endif statement.
119- By Jaydeep Patil AISSMS's IOIT Pune
120- By Jaydeep Patil AISSMS's IOIT Pune
• Internally yacc maintains two stacks in memory; a
parse stack and a value stack. The parse stack
contains terminals and nonterminals that
represent the current parsing state. The value
stack is an array of YYSTYPE elements and
associates a value with each element in the parse
stack. For example when lex returns an INTEGER
token yacc shifts this token to the parse stack. At
the same time the corresponding yylval is shifted
to the value stack. The parse and value stacks are
always synchronized so finding a value related to
a token on the stack is easily accomplished.
121- By Jaydeep Patil AISSMS's IOIT Pune
122- By Jaydeep Patil AISSMS's IOIT Pune
• The left-hand side of a production, or nonterminal, is entered left-justified
and followed by a colon. This is followed by the right-hand side of the
production. Actions associated with a rule are entered in braces.
• With left-recursion, we have specified that a program consists of zero or
more expressions. Each expression terminates with a newline. When a
newline is detected we print the value of the expression. When we apply
the rule
• expr: expr '+' expr { $$ = $1 + $3; }
• we replace the right-hand side of the production in the parse stack with
the left-hand side of the same production. In this case we pop “expr '+'
expr” and push “expr”. We have reduced the stack by popping three terms
off the stack and pushing back one term. We may reference positions in
the value stack in our C code by specifying “$1” for the first term on the
right-hand side of the production, “$2” for the second, and so on. “$$”
designates the top of the stack after reduction has taken place. The above
action adds the value associated with two expressions, pops three terms
off the value stack, and pushes back a single sum. As a consequence the
parse and value stacks remain synchronized.
123- By Jaydeep Patil AISSMS's IOIT Pune
• Numeric values are initially entered on the stack when we
reduce from INTEGER to expr. After INTEGER is shifted to
the stack we apply the rule
• expr: INTEGER { $$ = $1; }
• The INTEGER token is popped off the parse stack followed
by a push of expr. For the value stack we pop the integer
value off the stack and then push it back on again. In other
words we do nothing. In fact this is the default action and
need not be specified. Finally, when a newline is
encountered, the value associated with expr is printed.
• In the event of syntax errors yacc calls the user-supplied
function yyerror. If you need to modify the interface to
yyerror then alter the canned file that yacc includes to fit
your needs. The last function in our yacc specification is
main. This example still has an ambiguous grammar.
Although yacc will issue shift-reduce warnings it will still
process the grammar using shift as the default operation.
124- By Jaydeep Patil AISSMS's IOIT Pune
• The lexical analyzer returns VARIABLE and INTEGER tokens. For variables yylval
specifies an index to the symbol table sym. For this program sym merely holds the
value of the associated variable. When INTEGER tokens are returned, yylval
contains the number scanned.
125- By Jaydeep Patil AISSMS's IOIT Pune
• The input specification for yacc follows. The
tokens for INTEGER and VARIABLE are utilized by
yacc to create #defines in y.tab.h for use in lex.
This is followed by definitions for the arithmetic
operators. We may specify %left, for left-
associative or %right for right associative. The last
definition listed has the highest precedence.
Consequently multiplication and division have
higher precedence than addition and subtraction.
All four operators are left-associative. Using this
simple technique we are able to disambiguate
our grammar.
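• In the Yacc declarations part this might be written as the following sketch
(the token names are the ones mentioned above):
%token INTEGER
%token VARIABLE
%left '+' '-'
%left '*' '/'
• Because %left '*' '/' is listed last, * and / have higher precedence than
+ and -, and all four operators are left-associative.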
126- By Jaydeep Patil AISSMS's IOIT Pune
127- By Jaydeep Patil AISSMS's IOIT Pune
• extern void *malloc();
• malloc accepts an argument of type size_t,
and size_t may be defined as unsigned long. If
you are passing ints (or even unsigned ints),
malloc may be receiving garbage (or similarly
if you are passing a long but size_t is int).
128- By Jaydeep Patil AISSMS's IOIT Pune
Semantic Analysis
129- By Jaydeep Patil AISSMS's IOIT Pune
Beyond syntax analysis
•An identifier named x has been recognized.
–Is x a scalar, array or function?
–How big is x?
–If x is a function, how many and what type of arguments does it take?
–Is x declared before being used?
–Where can x be stored?
–Is the expression x+y type-consistent?
•Semantic analysis is the phase where we collect information about the types
of expressions and check for type related errors.
•The more information we can collect at compile time, the less overhead we
have at run time.
130- By Jaydeep Patil AISSMS's IOIT Pune
Semantic Analysis
•The syntax of a programming language
describes the proper form of its programs,
•while the semantics of the language defines
what its programs mean; that is, what each
program does when it executes.
131- By Jaydeep Patil AISSMS's IOIT Pune
Semantic analysis
•Collecting type information may involve "computations"
–What is the type of x+y given the types of x and y?
•Tool: attribute grammars
–Each grammar symbol has a number of associated attributes:
–The type of a variable or expression
–The value of a variable or expression
–The code for a statement
–Etc.
–The grammar is augmented with special equations (called semantic
actions) that specify how the values of attributes are computed from other
attributes.
–The process of using semantic actions to evaluate attributes is called
syntax-directed translation.
132- By Jaydeep Patil AISSMS's IOIT Pune
•TYPE Checking
133- By Jaydeep Patil AISSMS's IOIT Pune
•A compiler must check that the source program
follows both the syntactic and semantic conventions of the
source language.
•This checking, called static checking (to distinguish it
from dynamic checking during execution of the target
program), ensures that certain kinds of programming
errors will be detected and reported.
134- By Jaydeep Patil AISSMS's IOIT Pune
•Example of Static Checks:
–Type Checks: A compiler should report an error if an operator
is applied to an incompatible operand.
–Flow of Control Checks: Statements that cause flow of
control to leave a construct must have some place to which
to transfer the flow of control.
–Uniqueness Check: There are some situations in which an
object must be defined only once.
–Name-Related Check: Sometimes the same name must
appear two or more times. Ex. in Ada, a loop or block may
have a name that appears at the beginning and end of the
construct. The compiler must check that the same name is used at
both places.
135- By Jaydeep Patil AISSMS's IOIT Pune
Type Checking
•TYPE CHECKING is the main activity in semantic
analysis.
•Goal: calculate and ensure consistency of the type of
every expression in a program
•If there are type errors, we need to notify the user.
•Otherwise, we need the type information to generate
code that is correct.
136- By Jaydeep Patil AISSMS's IOIT Pune
137
Type Systems and Type
Expressions
137- By Jaydeep Patil AISSMS's IOIT Pune
Type systems
•Every language has a set of types and rules for
assigning types to language constructs.
•Example from the C specification:
–“The result of the unary & operator is a pointer to the
object referred to by the operand. If the type of the operand
is ‘…’ then the type of the result is ‘pointer to …’
•Usually, every expression has a type.
•Types have structure: the type 'pointer to int' is
•CONSTRUCTED from the type 'int'
138- By Jaydeep Patil AISSMS's IOIT Pune
Basic vs. constructed types
•Most programming languages have basic and
constructed types.
•BASIC TYPES are the atomic types provided by the
language.
–Pascal: boolean, character, integer, real
–C: char, int, float, double
•CONSTRUCTED TYPES are built up from basic types.
–Pascal: arrays, records, sets, pointers
–C: arrays, structs, pointers
139- By Jaydeep Patil AISSMS's IOIT Pune
Type expressions
•We denote the type of language constructs with TYPE
EXPRESSIONS.
•Type expressions are built up with TYPE
CONSTRUCTORS.
1.A basic type is a type expression. The basic types are
boolean, char, integer, and real. The special basic type
type_error signifies an error. The special type void
signifies “no type”
2.A type name is a type expression (type names are like
typedefs in C)
140- By Jaydeep Patil AISSMS's IOIT Pune
Type expressions
1.A type constructor applied to type expressions is a type expression.
a.Arrays: if T is a type expression and I is an index set, then array(I, T) is a type
expression denoting the type "array of elements of type T, indexed by I"
b.Products: if T1 and T2 are type expressions, then their Cartesian product T1 ×
T2 is also a type expression.
c.Records: a record is a special kind of product in which the fields have names
(examples below)
d.Pointers: if T is a type expression, then pointer(T) is a type expression denoting
the type "pointer to an object of type T"
e.Functions: functions map elements of a domain D to a range R, so we write D ->
R to denote "function mapping objects of type D to objects of type R" (examples
below)
2.Type expressions may contain variables, whose values are themselves type
expressions → polymorphism
141- By Jaydeep Patil AISSMS's IOIT Pune
Record type expressions
•The Pascal code
• type row = record
• address: integer;
• lexeme: array[1..15] of char
• end;
• var table: array[1..10] of row;
•associates the type expression
•record((address × integer) × (lexeme × array(1..15, char)))
•with the type name row, and the type expression
•array(1..10, record((address × integer) × (lexeme × array(1..15, char))))
•with the variable table
142- By Jaydeep Patil AISSMS's IOIT Pune
Function type expressions
•The C declaration
•int *foo( char a, char b );
•would associate type expression
•char × char -> pointer(integer)
•with foo. Some languages (like ML) allow all sorts of
crazy function types, e.g.
• (integer -> integer) -> (integer -> integer)
•denotes functions taking a function as input and
returning another function
143- By Jaydeep Patil AISSMS's IOIT Pune
Graph representation of type expressions
•The recursive structure of a type can be represented
with a tree, e.g. for char × char -> pointer(integer):
•Some compilers explicitly use graphs like these to
represent the types of expressions.
144- By Jaydeep Patil AISSMS's IOIT Pune
Type systems and checkers
•A TYPE SYSTEM is a set of rules for assigning
type expressions to the parts of a program.
•Every type checker implements some type
system.
•Syntax-directed type checking is a simple
method to implement a type checker.
145- By Jaydeep Patil AISSMS's IOIT Pune
Static vs. dynamic type checking
•STATIC type checking is done at compile time.
•DYNAMIC type checking is done at run time.
•Any kind of type checking CAN be done at run time.
•But this reduces run-time efficiency, so we want to do static
checking when possible.
•A SOUND type system is one in which ALL type errors can be
found statically.
•If the compiler guarantees that every program it accepts will run
without type errors, then the language is STRONGLY TYPED.
146- By Jaydeep Patil AISSMS's IOIT Pune
147
An Example Type Checker
147- By Jaydeep Patil AISSMS's IOIT Pune
Example type checker
•Let’s build a translation scheme to synthesize
the type of every expression from its
subexpressions.
•Here is a Pascal-like grammar for a sequence of
declarations (D) followed by an expression (E)
•Example program: key: integer;
• key mod 1999
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑
148- By Jaydeep Patil AISSMS's IOIT Pune
The type system
•The basic types are char and integer.
•type_error signals an error.
•All arrays start at 1, so
•array[256] of char
•leads to type expression: array(1..256,char)
•The symbol ↑ in a declaration specifies a pointer
type,
•so
• ↑ integer
•leads to type expression: pointer(integer)
149- By Jaydeep Patil AISSMS's IOIT Pune
Translation scheme for
declarations
•P → D ; E
•D → D ; D
•D → id : T { addtype(id.entry, T.type) }
•T → char { T.type := char }
•T → integer { T.type := integer }
•T → ↑T1 { T.type := pointer(T1.type) }
•T → array [ num ] of T1
• { T.type := array(1 .. num.val, T1.type) }
150- By Jaydeep Patil AISSMS's IOIT Pune
Type checking for expressions
•E → literal { E.type := char }
•E → num { E.type := integer }
•E → id { E.type := lookup(id.entry) }
•E → E1 mod E2 { if E1.type =integer and E2.type = integer
• then E.type := integer
• else E.type := type_error }
•E → E1 [ E2 ] { if E2.type = integer and E1.type = array(s,t)
• then E.type := t else E.type := type_error }
•E → E1↑ { if E1.type = pointer(t)
• then E.type := t else E.type := type-error }
Once the identifiers and their types have been inserted into the symbol table, we can
check the type of the elements of an expression:
151- By Jaydeep Patil AISSMS's IOIT Pune
How about boolean types?
•Try adding
• T -> boolean
• Relational operators: < <= = >= > <>
• Logical connectives: and or not
to the grammar, then add appropriate type-checking
semantic actions.
152- By Jaydeep Patil AISSMS's IOIT Pune
Type checking for statements
•Usually we assign the type VOID to statements.
•If a type error is found during type checking,
though, we should set the type to type_error
•Let’s change our grammar allow statements:
• P → D ; S
•i.e., a program is a sequence of declarations
followed by a sequence of statements.
153- By Jaydeep Patil AISSMS's IOIT Pune
Type checking for statements
•S → id := E { if id.type = E.type then S.type := void
• else S.type := type_error }
•S → if E then S1 { if E.type = boolean
• then S.type := S1.type
• else S.type := type_error }
•S → while E do S1 { if E.type = boolean
• then S.type := S1.type
• else S.type := type_error }
•S → S1 ; S2 { if S1.type = void and S2.type = void
• then S.type := void
• else S.type := type_error }
Now we need to add productions and semantic actions:
154- By Jaydeep Patil AISSMS's IOIT Pune
Type checking for function calls
•Suppose we add a production E → E ( E )
•Then we need productions for function declarations:
T → T1 '→' T2 { T.type := T1.type → T2.type }
•and function calls:
E → E1 ( E2 ) { if E2.type = s and E1.type = s → t
 then E.type := t
 else E.type := type_error }
155- By Jaydeep Patil AISSMS's IOIT Pune
Type checking for function calls
•Multiple-argument functions, however, can be
modeled as functions that take a single PRODUCT
argument.
• root : ( real → real ) × real → real
•This would model a function that takes a real function
•over the reals, and a real, and returns a real.
•In C: float root( float (*f)(float), float x );
156- By Jaydeep Patil AISSMS's IOIT Pune
Type expression equivalence
•Type checkers need to ask questions like:
• – “if E1.type == E2.type, then …”
•What does it mean for two type expressions to be
equal?
•STRUCTURAL EQUIVALENCE says two types are the
same if they are made up of the same basic types and
constructors.
•NAME EQUIVALENCE says two types are the same if
their constituents have the SAME NAMES.
157- By Jaydeep Patil AISSMS's IOIT Pune
Structural Equivalence
•boolean sequiv( s, t )
•{
• if s and t are the same basic type
• return TRUE;
• else if s == array( s1, s2 ) and t == array( t1, t2 )
• return sequiv( s1, t1 ) and sequiv( s2, t2 )
• else if s == s1 × s2 and t == t1 × t2
• return sequiv( s1, t1 ) and sequiv( s2, t2 )
• else if s == pointer( s1 ) and t == pointer( t1 )
• return sequiv( s1, t1 )
• else if s == s1 → s2 and t == t1 → t2
• return sequiv( s1, t1 ) and sequiv( s2, t2 )
• else return FALSE
•}
158- By Jaydeep Patil AISSMS's IOIT Pune
Relaxing structural equivalence
•We don’t always want strict structural equivalence.
•E.g. for arrays, we want to write functions that accept
arrays of any length.
•To accomplish this, we would modify sequiv() to
accept any bounds:
• …
• else if s == array( s1, s2 ) and t == array( t1, t2 )
• return sequiv( s2, t2 )
• …
159- By Jaydeep Patil AISSMS's IOIT Pune
Encoding types
•Recursive routines are very slow.
•Recursive type checking routines increase the
compiler’s run time.
•In the compilers of the 1970’s and 1980’s,
compilers took too long to run.
•So designers came up with ENCODINGS for
types that allowed for faster type checking.
160- By Jaydeep Patil AISSMS's IOIT Pune
Name equivalence
•Most languages allow association of names with type expressions. This
makes type equivalence trickier.
•Example from Pascal:
• type link = ↑cell;
• var next: link;
• last: link;
• p: ↑ cell;
• q,r: ↑ cell;
•Do next, last, p, q, and r have the same type?
•In Pascal, it depends on the implementation!
•In structural equivalence, the types would be the same.
•But NAME EQUIVALENCE requires identical NAMES.
161- By Jaydeep Patil AISSMS's IOIT Pune
Handling cyclic types
•Suppose we had the Pascal declaration
• type link = ↑cell;
• cell = record
• info: integer;
• next: link;
• end;
•The declaration of cell contains itself (via the next
pointer).
•The graph for this type therefore contains a cycle.
162- By Jaydeep Patil AISSMS's IOIT Pune
Cyclic types
•The situation in C is slightly different, since it is
impossible to refer to an undeclared name.
• typedef struct _cell {
• int info;
• struct _cell *next;
• } cell;
• typedef cell *link;
•But the name link is just shorthand for
• (struct _cell *).
•C uses name equivalence for structs to avoid recursion
•(after expanding typedef’s).
•But it uses structural equivalence elsewhere.
163- By Jaydeep Patil AISSMS's IOIT Pune
Type conversion
•Suppose we encounter an expression x+i where x has type float and i has
type int. CPU instructions for addition could take EITHER float OR int as
operands, but not a mix.
•This means the compiler must sometimes convert the operands of
arithmetic expressions to ensure that operands are consistent with operators.
•With postfix as an intermediate language for expressions, we could express
the conversion as follows:
x i inttoreal real+
•where real+ is the floating point addition operation.
164- By Jaydeep Patil AISSMS's IOIT Pune
Type coercion
•If type conversion is done by the compiler without the
programmer requesting it, it is called IMPLICIT
conversion or type COERCION.
•EXPLICIT conversions are those that the programmer
• specifies, e.g.
• x = (int)y * 2;
•Implicit conversion of CONSTANT expressions should
be done at compile time.
165- By Jaydeep Patil AISSMS's IOIT Pune
Type checking example with coercion
•Production Semantic Rule
•E -> num E.type := integer
•E -> num . num E.type := real
•E -> id E.type := lookup( id.entry )
•E -> E1 op E2 E.type := if E1.type == integer and E2.type == integer
• then integer
• else if E1.type == integer and E2.type == real
• then real
• else if E1.type == real and E2.type == integer
• then real
• else if E1.type == real and E2.type == real
• then real
• else type_error
166- By Jaydeep Patil AISSMS's IOIT Pune
END of Unit 2
167- By Jaydeep Patil AISSMS's IOIT Pune
Department of Environment (DOE) Mix Design with Fly Ash.Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.
MdManikurRahman
 
DIY Gesture Control ESP32 LiteWing Drone using Python
DIY Gesture Control ESP32 LiteWing Drone using  PythonDIY Gesture Control ESP32 LiteWing Drone using  Python
DIY Gesture Control ESP32 LiteWing Drone using Python
CircuitDigest
 
Application Security and Secure Software Development Lifecycle
Application  Security and Secure Software Development LifecycleApplication  Security and Secure Software Development Lifecycle
Application Security and Secure Software Development Lifecycle
DrKavithaP1
 
ISO 5011 Air Filter Catalogues .pdf
ISO 5011 Air Filter Catalogues      .pdfISO 5011 Air Filter Catalogues      .pdf
ISO 5011 Air Filter Catalogues .pdf
FILTRATION ENGINEERING & CUNSULTANT
 
BEC602-Module-3-1_Notes.pdf. Vlsi design and testing notes
BEC602-Module-3-1_Notes.pdf. Vlsi design and testing notesBEC602-Module-3-1_Notes.pdf. Vlsi design and testing notes
BEC602-Module-3-1_Notes.pdf. Vlsi design and testing notes
VarshithaP6
 
Unit 6 Message Digest Message Digest Message Digest
Unit 6  Message Digest  Message Digest  Message DigestUnit 6  Message Digest  Message Digest  Message Digest
Unit 6 Message Digest Message Digest Message Digest
ChatanBawankar
 
Tesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia - A Leader In Her IndustryTesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia - A Leader In Her Industry
Tesia Dobrydnia
 
Introduction of Structural Audit and Health Montoring.pptx
Introduction of Structural Audit and Health Montoring.pptxIntroduction of Structural Audit and Health Montoring.pptx
Introduction of Structural Audit and Health Montoring.pptx
gunjalsachin
 
Kevin Corke Spouse Revealed A Deep Dive Into His Private Life.pdf
Kevin Corke Spouse Revealed A Deep Dive Into His Private Life.pdfKevin Corke Spouse Revealed A Deep Dive Into His Private Life.pdf
Kevin Corke Spouse Revealed A Deep Dive Into His Private Life.pdf
Medicoz Clinic
 
Introduction to Machine Vision by Cognex
Introduction to Machine Vision by CognexIntroduction to Machine Vision by Cognex
Introduction to Machine Vision by Cognex
RicardoCunha203173
 
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notesBEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
BEC602- Module 3-2-Notes.pdf.Vlsi design and testing notes
VarshithaP6
 
"The Enigmas of the Riemann Hypothesis" by Julio Chai
"The Enigmas of the Riemann Hypothesis" by Julio Chai"The Enigmas of the Riemann Hypothesis" by Julio Chai
"The Enigmas of the Riemann Hypothesis" by Julio Chai
Julio Chai
 
Forensic Science – Digital Forensics – Digital Evidence – The Digital Forensi...
Forensic Science – Digital Forensics – Digital Evidence – The Digital Forensi...Forensic Science – Digital Forensics – Digital Evidence – The Digital Forensi...
Forensic Science – Digital Forensics – Digital Evidence – The Digital Forensi...
ManiMaran230751
 
UNIT-5-PPT Computer Control Power of Power System
UNIT-5-PPT Computer Control Power of Power SystemUNIT-5-PPT Computer Control Power of Power System
UNIT-5-PPT Computer Control Power of Power System
Sridhar191373
 
world subdivision.pdf...................
world subdivision.pdf...................world subdivision.pdf...................
world subdivision.pdf...................
bmmederos12
 
하이플럭스 락피팅 카달로그 2025 (Lok Fitting Catalog 2025)
하이플럭스 락피팅 카달로그 2025 (Lok Fitting Catalog 2025)하이플럭스 락피팅 카달로그 2025 (Lok Fitting Catalog 2025)
하이플럭스 락피팅 카달로그 2025 (Lok Fitting Catalog 2025)
하이플럭스 / HIFLUX Co., Ltd.
 
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCHUNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCH
Sridhar191373
 
Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.Department of Environment (DOE) Mix Design with Fly Ash.
Department of Environment (DOE) Mix Design with Fly Ash.
MdManikurRahman
 
DIY Gesture Control ESP32 LiteWing Drone using Python
DIY Gesture Control ESP32 LiteWing Drone using  PythonDIY Gesture Control ESP32 LiteWing Drone using  Python
DIY Gesture Control ESP32 LiteWing Drone using Python
CircuitDigest
 
Application Security and Secure Software Development Lifecycle
Application  Security and Secure Software Development LifecycleApplication  Security and Secure Software Development Lifecycle
Application Security and Secure Software Development Lifecycle
DrKavithaP1
 
BEC602-Module-3-1_Notes.pdf. Vlsi design and testing notes
BEC602-Module-3-1_Notes.pdf. Vlsi design and testing notesBEC602-Module-3-1_Notes.pdf. Vlsi design and testing notes
BEC602-Module-3-1_Notes.pdf. Vlsi design and testing notes
VarshithaP6
 

Compiler: Syntax Analysis

• 17. Recursive-Descent Parsing Consider the grammar S -> cAd, A -> ab | a and the input w = cad. The parser expands S -> cAd, tries A -> ab first, fails to match d, backtracks, and then succeeds with A -> a, giving S -> cAd -> cad (a small code sketch follows). 18- By Jaydeep Patil AISSMS's IOIT Pune
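To make the backtracking concrete, here is a minimal C sketch (my own illustration, not code from the slides) of a backtracking recursive-descent parser for S -> cAd, A -> ab | a; the names match, input and pos are assumptions of this sketch.

    #include <stdio.h>

    static const char *input;   /* string being parsed           */
    static int pos;             /* current position in the input */

    static int match(char c) {                    /* consume one terminal */
        if (input[pos] == c) { pos++; return 1; }
        return 0;
    }

    static int A(void) {                          /* A -> a b | a */
        int save = pos;
        if (match('a') && match('b')) return 1;   /* try A -> a b    */
        pos = save;                               /* backtrack       */
        return match('a');                        /* then try A -> a */
    }

    static int S(void) {                          /* S -> c A d */
        return match('c') && A() && match('d');
    }

    int main(void) {
        input = "cad";
        pos = 0;
        printf("%s\n", (S() && input[pos] == '\0') ? "accepted" : "rejected");
        return 0;
    }

On the input cad the first alternative A -> ab consumes a, fails on b, the position is restored, and A -> a succeeds, which is exactly the backtracking step described above.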
  • 18. Recursive-Descent Parsing • A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop. That is, when we try to expand a nonterminal A, we may eventually find ourselves again trying to expand A without having consumed any input. 19- By Jaydeep Patil AISSMS's IOIT Pune
• 19. FIRST and FOLLOW • The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G. During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol. During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens. 20- By Jaydeep Patil AISSMS's IOIT Pune
• 20. FIRST • FIRST is a function that gives the set of terminals that can begin a string derived from a grammar symbol. • Rules: 1. If x is a terminal then First(x) = {x}. 2. If X -> ε is a production, then add ε to First(X). 3. If X is a nonterminal: 3.a) If X -> Y…, then First(Y) is included in First(X). 3.b) If First(Y) contains ε and X -> YZ, then First(X) = (First(Y) - {ε}) ∪ First(Z). 21- By Jaydeep Patil AISSMS's IOIT Pune
• 21. Follow • FOLLOW is a function which gives the set of terminals that can appear immediately to the right of a given symbol in some sentential form. • Rules: 1. Place $ in Follow(S), where S is the start symbol. 2. If there is a production A -> αBβ, then everything in First(β) except ε is in Follow(B). 3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B). 22- By Jaydeep Patil AISSMS's IOIT Pune
• 22. FIRST • E->TE’ • E’->+TE’ | ε • T->FT’ • T’->*FT’ | ε • F->(E) | id First(E) = First(T) = First(F) = { (, id } First(E’) = { +, ε } First(T’) = { *, ε } 23- By Jaydeep Patil AISSMS's IOIT Pune
• 23. S -> iEtSS’ | a S’ -> eS | ε E -> b First(S) = {i, a} First(S’) = {e, ε} First(E) = {b} 24- By Jaydeep Patil AISSMS's IOIT Pune
• 26. S -> AaAb | BbBa A -> ε B -> ε First(A) = {ε} First(B) = {ε} First(S) = (First(A) - {ε}) ∪ First(a) ∪ (First(B) - {ε}) ∪ First(b) = {a, b} 27- By Jaydeep Patil AISSMS's IOIT Pune
• 27. S -> aBbDh B -> cC C -> bc | ε D -> EF E -> g | ε F -> f | ε First(S) = {a} First(B) = {c} First(C) = {b, ε} First(D) = (First(E) - {ε}) ∪ First(F) = {g, f, ε} First(E) = {g, ε} First(F) = {f, ε} 28- By Jaydeep Patil AISSMS's IOIT Pune
• 28. FIRST & FOLLOW • E->TE’ • E’->+TE’ | ε • T->FT’ • T’->*FT’ | ε • F->(E) | id First(E) = First(T) = First(F) = { (, id } First(E’) = { +, ε } First(T’) = { *, ε } Follow(E) = { $, ) } Follow(E’) = { $, ) } Follow(T) = (First(E’) - {ε}) ∪ Follow(E’) = { +, $, ) } Follow(T’) = { +, $, ) } Follow(F) = (First(T’) - {ε}) ∪ Follow(T’) = { *, +, $, ) } 29- By Jaydeep Patil AISSMS's IOIT Pune
• 29. S -> iEtSS’ | a S’ -> eS | ε E -> b First(S) = {i, a} First(S’) = {e, ε} First(E) = {b} Follow(S) = {$, e} Follow(S’) = {$, e} Follow(E) = {t} 30- By Jaydeep Patil AISSMS's IOIT Pune
• 32. S -> AaAb | BbBa A -> ε B -> ε First(A) = {ε} First(B) = {ε} First(S) = (First(A) - {ε}) ∪ First(a) ∪ (First(B) - {ε}) ∪ First(b) = {a, b} Follow(S) = {$} Follow(A) = {a, b} Follow(B) = {b, a} 33- By Jaydeep Patil AISSMS's IOIT Pune
• 33. S -> aBbDh B -> cC C -> bc | ε D -> EF E -> g | ε F -> f | ε First(S) = {a} First(B) = {c} First(C) = {b, ε} First(D) = (First(E) - {ε}) ∪ First(F) = {g, f, ε} First(E) = {g, ε} First(F) = {f, ε} Follow(S) = {$} Follow(B) = {b} Follow(C) = {b} Follow(D) = {h} Follow(E) = {f, h} Follow(F) = {h} 34- By Jaydeep Patil AISSMS's IOIT Pune
• 34. S -> aBDh B -> cC C -> bc | ε D -> EF E -> g | ε F -> f | ε First(S) = {a} First(B) = {c} First(C) = {b, ε} First(D) = (First(E) - {ε}) ∪ First(F) = {g, f, ε} First(E) = {g, ε} First(F) = {f, ε} Follow(S) = {$} Follow(B) = {g, f, h} Follow(C) = {g, f, h} Follow(D) = {h} Follow(E) = {f, h} Follow(F) = {h} 35- By Jaydeep Patil AISSMS's IOIT Pune
• 35. • E->TA • A->+TA | ε • T->FB • B->*FB | ε • F->(E) | id First(E) = First(T) = First(F) = { (, id } First(A) = { +, ε } First(B) = { *, ε } Follow(E) = { $, ) } Follow(A) = { $, ) } Follow(T) = (First(A) - {ε}) ∪ Follow(A) = { +, $, ) } Follow(B) = { +, $, ) } Follow(F) = (First(B) - {ε}) ∪ Follow(B) = { *, +, $, ) } 36- By Jaydeep Patil AISSMS's IOIT Pune
• 36. LL(1) Grammars • Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1). The first "L" in LL(1) stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions. Formally, a grammar is LL(1) if, for each pair of alternatives A -> α | β, First(α) and First(β) are disjoint, and, when β derives ε, First(α) is also disjoint from Follow(A). • The class of LL(1) grammars is rich enough to cover most programming constructs, although care is needed in writing a suitable grammar for the source language. For example, no left-recursive or ambiguous grammar can be LL(1). 37- By Jaydeep Patil AISSMS's IOIT Pune
  • 37. Predictive Parsing 38- By Jaydeep Patil AISSMS's IOIT Pune
  • 38. Grammar is LL(1)(No Multiple Entries) 39- By Jaydeep Patil AISSMS's IOIT Pune
  • 39. Grammar is Not LL(1)(Multiple Entries) 40- By Jaydeep Patil AISSMS's IOIT Pune
• 40. Nonrecursive Predictive Parsing • A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S =>* wα. 41- By Jaydeep Patil AISSMS's IOIT Pune
  • 41. 43- By Jaydeep Patil AISSMS's IOIT Pune
  • 42. 44- By Jaydeep Patil AISSMS's IOIT Pune
• 43. • If X = a = $, the parser halts and announces successful completion of parsing. • If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol. • If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X -> UVW}, the parser replaces X on the top of the stack by WVU (with U on top of the stack). As output, we shall assume that the parser just prints the production used. 45- By Jaydeep Patil AISSMS's IOIT Pune
  • 44. 46- By Jaydeep Patil AISSMS's IOIT Pune
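The parsing moves just described can be sketched in C for the grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id. This is an illustrative sketch only, not the slides' code: 'i' stands for the token id, 'A' for E', 'B' for T', and table() hard-codes the standard LL(1) entries for this grammar.

    #include <stdio.h>
    #include <string.h>

    /* Nonterminals: E T F, A = E', B = T'.  Terminals: i + * ( ) $. */
    static char stk[100];
    static int top = -1;
    static void push(char c) { stk[++top] = c; }

    /* M[X, a]: production body to push, "" for an epsilon production,
       NULL for an error entry.                                         */
    static const char *table(char X, char a) {
        switch (X) {
        case 'E': if (a == 'i' || a == '(') return "TA"; break;
        case 'A': if (a == '+') return "+TA";
                  if (a == ')' || a == '$') return "";
                  break;
        case 'T': if (a == 'i' || a == '(') return "FB"; break;
        case 'B': if (a == '*') return "*FB";
                  if (a == '+' || a == ')' || a == '$') return "";
                  break;
        case 'F': if (a == 'i') return "i";
                  if (a == '(') return "(E)";
                  break;
        }
        return NULL;
    }

    int main(void) {
        const char *w = "i+i*i$";                 /* id + id * id */
        int ip = 0;
        push('$'); push('E');
        while (stk[top] != '$') {
            char X = stk[top], a = w[ip];
            if (X == a) { top--; ip++; }          /* match a terminal */
            else {
                const char *body = table(X, a);
                if (!body) { printf("error at '%c'\n", a); return 1; }
                printf("%c -> %s\n", X, *body ? body : "e");
                top--;                            /* pop X, push its body reversed */
                for (int k = (int)strlen(body) - 1; k >= 0; k--) push(body[k]);
            }
        }
        puts(w[ip] == '$' ? "accepted" : "rejected");
        return 0;
    }

The productions printed by the loop form exactly the leftmost derivation of id + id * id.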
  • 45. Bottom-Up Parsing 47- By Jaydeep Patil AISSMS's IOIT Pune
  • 46. Bottom-Up Parsing • A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top) . 48- By Jaydeep Patil AISSMS's IOIT Pune
  • 47. 49- By Jaydeep Patil AISSMS's IOIT Pune
  • 48. Reductions • We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol of the grammar. At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production. • The key decisions during bottom-up parsing are about when to reduce and about what production to apply, as the parse proceeds. 50- By Jaydeep Patil AISSMS's IOIT Pune
• 49. • The goal of bottom-up parsing is therefore to construct a derivation in reverse. The following derivation corresponds to the parse shown in the previous figure: • E => T => T * F => T * id => F * id => id * id • This derivation is in fact a rightmost derivation. 51- By Jaydeep Patil AISSMS's IOIT Pune
  • 50. Handle • Bottom-up parsing during a left-to-right scan of the input constructs a rightmost derivation in reverse. Informally, a "handle" is a substring that matches the body of a production, and whose reduction represents one step along the reverse of a rightmost derivation. 52- By Jaydeep Patil AISSMS's IOIT Pune
  • 51. Shift-Reduce Parsing • Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed. • As we shall see, the handle always appears at the top of the stack just before it is identified as the handle. • We use $ to mark the bottom of the stack and also the right end of the input. Conventionally, when discussing bottom-up parsing, we show the top of the stack on the right, rather than on the left as we did for top-down parsing. Initially, the stack is empty, and the string w is on the input, as follows: 55- By Jaydeep Patil AISSMS's IOIT Pune
  • 52. • During a left-to-right scan of the input string, the parser shifts zero or more input symbols onto the stack, until it is ready to reduce a string β of grammar symbols on top of the stack. It then reduces β to the head of the appropriate production. The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and the input is empty: 56- By Jaydeep Patil AISSMS's IOIT Pune
  • 53. 57- By Jaydeep Patil AISSMS's IOIT Pune
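As a worked illustration (not taken from the slides' figure), a shift-reduce parser for E -> E+T | T, T -> T*F | F, F -> (E) | id processes id * id as follows:

    Stack        Input        Action
    $            id * id $    shift
    $ id         * id $       reduce by F -> id
    $ F          * id $       reduce by T -> F
    $ T          * id $       shift
    $ T *        id $         shift
    $ T * id     $            reduce by F -> id
    $ T * F      $            reduce by T -> T * F
    $ T          $            reduce by E -> T
    $ E          $            accept

Reading the reductions from bottom to top gives exactly the rightmost derivation shown earlier, in reverse.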
• 54. LR Parsing: Simple LR • The most prevalent type of bottom-up parser today is based on a concept called LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions. The cases k = 0 and k = 1 are of practical interest, and we shall only consider LR parsers with k <= 1 here. When (k) is omitted, k is assumed to be 1. 58- By Jaydeep Patil AISSMS's IOIT Pune
• 55. Why LR Parsers • LR parsers are table-driven, much like the nonrecursive LL parsers. A grammar for which we can construct a parsing table using one of the methods in this section and the next is said to be an LR grammar. Intuitively, for a grammar to be LR it is sufficient that a left-to-right shift-reduce parser be able to recognize handles of right-sentential forms when they appear on top of the stack. 59- By Jaydeep Patil AISSMS's IOIT Pune
  • 56. 60- By Jaydeep Patil AISSMS's IOIT Pune
  • 57. • The principal drawback of the LR method is that it is too much work to construct an LR parser by hand for a typical programming-language grammar. A specialized tool, an LR parser generator, is needed. • Fortunately, many such generators are available, and we shall discuss one of the most commonly used ones, Yacc . • Such a generator takes a context-free grammar and automatically produces a parser for that grammar. If the grammar contains ambiguities or other constructs that are difficult to parse in a left-to-right scan of the input, then the parser generator locates these constructs and provides detailed diagnostic messages. 61- By Jaydeep Patil AISSMS's IOIT Pune
• 58. Items and the LR(0) Automaton 62- By Jaydeep Patil AISSMS's IOIT Pune
• 59. Augmented grammar • If G is a grammar with start symbol S, then G’, the augmented grammar for G, is G with a new start symbol S’ and production S’ -> S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by • S’ -> S. 63- By Jaydeep Patil AISSMS's IOIT Pune
  • 60. 64- By Jaydeep Patil AISSMS's IOIT Pune
  • 61. E’-> E E-> E+T|T T-> T*F|F F->(E)|id 65- By Jaydeep Patil AISSMS's IOIT Pune
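As a hand computation (the standard construction, independent of the figures that follow), the initial state of the LR(0) automaton for this augmented grammar is obtained by taking the closure of E' -> .E:

    I0 = closure({ E' -> .E })
       = { E' -> .E,
           E  -> .E + T,   E -> .T,
           T  -> .T * F,   T -> .F,
           F  -> .( E ),   F -> .id }

    GOTO(I0, E) = { E' -> E.,  E -> E. + T }

The remaining states shown in the following slides are built the same way, by repeatedly applying GOTO and closure.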
  • 62. 66- By Jaydeep Patil AISSMS's IOIT Pune
  • 63. 67- By Jaydeep Patil AISSMS's IOIT Pune
  • 64. 68- By Jaydeep Patil AISSMS's IOIT Pune
  • 65. 69- By Jaydeep Patil AISSMS's IOIT Pune
  • 66. 70- By Jaydeep Patil AISSMS's IOIT Pune
• 67. Canonical LR (CLR) Parsing 71- By Jaydeep Patil AISSMS's IOIT Pune
  • 68. 72- By Jaydeep Patil AISSMS's IOIT Pune
  • 69. 73- By Jaydeep Patil AISSMS's IOIT Pune
  • 70. 74- By Jaydeep Patil AISSMS's IOIT Pune
  • 71. 75- By Jaydeep Patil AISSMS's IOIT Pune
• 72. Operator Precedence Parsing • Operator Grammar: For a small but important class of grammars, we can easily construct efficient shift-reduce parsers by hand. Operator grammars have the property that no production right side is empty or has two adjacent non-terminals. 76- By Jaydeep Patil AISSMS's IOIT Pune
• 73. • Eg: E -> EAE | (E) | -E | id • A -> + | - | * | / • The above grammar is not an operator grammar (EAE has two adjacent non-terminals), but we can readjust the grammar into an equivalent operator grammar. 77- By Jaydeep Patil AISSMS's IOIT Pune
• 74. • In operator precedence parsing we define three disjoint precedence relations <·, =·, and ·> between certain pairs of terminals. These precedence relations guide the selection of handles and have the following meaning: 78- By Jaydeep Patil AISSMS's IOIT Pune
  • 75. • Basic Principle • Having precedence relations allows identifying handles as follows: • 1. Scan the string from left until seeing ·> and put a pointer. • 2. Scan backwards the string from right to left until seeing <· • 3. Everything between the two relations <· and ·> forms the handle • 4. Replace handle with the head of the production. 79- By Jaydeep Patil AISSMS's IOIT Pune
  • 76. 80- By Jaydeep Patil AISSMS's IOIT Pune
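For the operators of E -> E+E | E*E | id, with * given higher precedence than + and both operators left-associative, the usual textbook precedence relations are sketched below (the slide's figure may present the same information with a different layout):

           +     *     id    $
      +    ·>    <·    <·    ·>
      *    ·>    ·>    <·    ·>
      id   ·>    ·>          ·>
      $    <·    <·    <·

For example, + <· * means a * seen after a + is shifted (its handle is not yet complete), while * ·> + means a + seen after a * forces a reduction; this is exactly how precedence and left-associativity are enforced.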
• 77. Conflicts in LR Parsing • Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar. 81- By Jaydeep Patil AISSMS's IOIT Pune
• 78. shift/reduce and reduce/reduce conflicts • If a state does not know whether to perform a shift or a reduction for a terminal, we say that there is a shift/reduce conflict. • If a state does not know whether to reduce using production rule i or production rule j for a terminal, we say that there is a reduce/reduce conflict. • If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar. 82- By Jaydeep Patil AISSMS's IOIT Pune
  • 79. 83- By Jaydeep Patil AISSMS's IOIT Pune
  • 80. 84- By Jaydeep Patil AISSMS's IOIT Pune
• 81. Using Ambiguous Grammars • All grammars used in the construction of LR-parsing tables must be unambiguous. • Can we create LR-parsing tables for ambiguous grammars? – Yes, but they will have conflicts. – We can resolve these conflicts in favor of one of the alternatives to disambiguate the grammar. – At the end, we will again have an unambiguous grammar. • Why do we want to use an ambiguous grammar? – Some ambiguous grammars are more natural, and a corresponding unambiguous grammar can be very complex. – Usage of an ambiguous grammar may eliminate unnecessary reductions. • Ex. Unambiguous: E -> E+T | T, T -> T*F | F, F -> (E) | id versus ambiguous: E -> E+E | E*E | (E) | id 85- By Jaydeep Patil AISSMS's IOIT Pune
• 82. Sets of LR(0) Items for Ambiguous Grammar
I0: E’ -> .E, E -> .E+E, E -> .E*E, E -> .(E), E -> .id
I1: E’ -> E., E -> E.+E, E -> E.*E
I2: E -> (.E), E -> .E+E, E -> .E*E, E -> .(E), E -> .id
I3: E -> id.
I4: E -> E+.E, E -> .E+E, E -> .E*E, E -> .(E), E -> .id
I5: E -> E*.E, E -> .E+E, E -> .E*E, E -> .(E), E -> .id
I6: E -> (E.), E -> E.+E, E -> E.*E
I7: E -> E+E., E -> E.+E, E -> E.*E
I8: E -> E*E., E -> E.+E, E -> E.*E
I9: E -> (E).
(The slide's figure also shows the GOTO graph connecting these states on E, +, *, (, ) and id.)
86- By Jaydeep Patil AISSMS's IOIT Pune
  • 83. Using ambiguous grammars 87- By Jaydeep Patil AISSMS's IOIT Pune
  • 84. 88- By Jaydeep Patil AISSMS's IOIT Pune
  • 85. 89- By Jaydeep Patil AISSMS's IOIT Pune
  • 86. 90- By Jaydeep Patil AISSMS's IOIT Pune
  • 87. 91- By Jaydeep Patil AISSMS's IOIT Pune
  • 88. 92- By Jaydeep Patil AISSMS's IOIT Pune
  • 89. Error Recovery in LR Parsing • An LR parser will detect an error when it consults the parsing action table and finds an error entry. All empty entries in the action table are error entries. • Errors are never detected by consulting the goto table. • An LR parser will announce error as soon as there is no valid continuation for the scanned portion of the input. • A canonical LR parser (LR(1) parser) will never make even a single reduction before announcing an error. • The SLR and LALR parsers may make several reductions before announcing an error. • But, all LR parsers (LR(1), LALR and SLR parsers) will never shift an erroneous input symbol onto the stack. 93- By Jaydeep Patil AISSMS's IOIT Pune
• 90. ERROR RECOVERY IN LR PARSING • An LR parser will detect an error when it consults the parsing table and finds an error entry. A canonical parser will never make even a single reduction before announcing an error. The SLR and LALR parsers may make several reductions before announcing an error, but they will never shift an erroneous input symbol onto the stack. • We can implement two modes of recovery: 94- By Jaydeep Patil AISSMS's IOIT Pune
• 91. Panic Mode • We scan down the stack until a state s with a goto on a particular non-terminal A is found. Zero or more input symbols are then discarded until a symbol a is found that can legitimately follow A. The parser then pushes the state goto[s, A] and resumes normal parsing. There may be many choices for the non-terminal A; normally these would be non-terminals representing major program pieces, such as an expression, statement, or block. 95- By Jaydeep Patil AISSMS's IOIT Pune
• 92. Phrase Level Recovery • It is implemented by examining each error entry in the LR parsing table and deciding, on the basis of the language, the most likely programmer error that could give rise to that error entry. An appropriate error recovery procedure can then be implemented; presumably the top of the stack and/or the first input symbols would be modified in a way deemed appropriate for each error. 96- By Jaydeep Patil AISSMS's IOIT Pune
• 93. • As an example consider the grammar (1): E -> E + E | E * E | ( E ) | id ---- (1) The parsing table contains error routines that have the effect of detecting errors before any shift move takes place. 97- By Jaydeep Patil AISSMS's IOIT Pune
• 94. The LR parsing table with error routines (action columns for id, +, *, (, ), $; goto column for E):
State   id   +    *    (    )    $    | E
  0     s3   e1   e1   s2   e2   e1   | 1
  1     e3   s4   s5   e3   e2   acc  |
  2     s3   e1   e1   s2   e2   e1   | 6
  3     r4   r4   r4   r4   r4   r4   |
  4     s3   e1   e1   s2   e2   e1   | 7
  5     s3   e1   e1   s2   e2   e1   | 8
  6     e3   s4   s5   e3   s9   e4   |
  7     r1   r1   s5   r1   r1   r1   |
  8     r2   r2   r2   r2   r2   r2   |
  9     r3   r3   r3   r3   r3   r3   |
98- By Jaydeep Patil AISSMS's IOIT Pune
• 95. Error routines: e1 • This routine is called from states 0, 2, 4 and 5, all of which expect the beginning of an operand, either an id or a left parenthesis. Instead, an operator (+ or *) or the end of input was found. • Action: Push an imaginary id onto the stack and cover it with state 3 (the goto of states 0, 2, 4 and 5 on id). • Print: Issue diagnostic "missing operand". 99- By Jaydeep Patil AISSMS's IOIT Pune
• 96. Error routines: e2 • This routine is called from states 0, 1, 2, 4 and 5 on finding a right parenthesis. • Action: Remove the right parenthesis from the input. • Print: Issue diagnostic "unbalanced right parenthesis". 100- By Jaydeep Patil AISSMS's IOIT Pune
• 97. Error routines: e3 • This routine is called from states 1 or 6 when expecting an operator, and an id or a left parenthesis is found. • Action: Push + onto the stack and cover it with state 4. • Print: Issue diagnostic "missing operator". 101- By Jaydeep Patil AISSMS's IOIT Pune
• 98. Error routines: e4 • This routine is called from state 6 when the end of input is found while expecting an operator or a right parenthesis. • Action: Push a right parenthesis onto the stack and cover it with state 9. • Print: Issue diagnostic "missing right parenthesis". 102- By Jaydeep Patil AISSMS's IOIT Pune
  • 99. Automatic construction of parsers (YACC), YACC specifications. 103- By Jaydeep Patil AISSMS's IOIT Pune
• 100. 104 Automatic construction of parsers (YACC), YACC specifications. • Two classical tools for compilers: – Lex: A Lexical Analyzer Generator – Yacc: "Yet Another Compiler Compiler" (Parser Generator) • Lex creates programs that scan the input and hand over tokens one by one. • Yacc takes a grammar (sentence structure) and generates a parser. (Diagram: Lexical Rules -> Lex -> yylex(); Grammar Rules -> Yacc -> yyparse(); Input -> Parsed Input) - By Jaydeep Patil AISSMS's IOIT Pune
• 101. 105 Automatic construction of parsers (YACC), YACC specifications. • Lex and Yacc generate C code for your analyzer & parser. (Diagram: Lex emits C code compiled into the lexical analyzer/tokenizer yylex(), which turns a char stream into a token stream; Yacc emits C code compiled into the parser yyparse(), which consumes that token stream to produce the parsed input.) - By Jaydeep Patil AISSMS's IOIT Pune
• 102. 106 Automatic construction of parsers (YACC), YACC specifications. • Often, instead of the standard Lex and Yacc, Flex and Bison are used: – Flex: A fast lexical analyzer – (GNU) Bison: A drop-in replacement for (backwards compatible with) Yacc • Byacc is the Berkeley implementation of Yacc (so it is Yacc). • Resources: – http://en.wikipedia.org/wiki/Flex_lexical_analyser – http://en.wikipedia.org/wiki/GNU_Bison • The Lex & Yacc Page (manuals, links): – http://dinosaur.compilertools.net/ - By Jaydeep Patil AISSMS's IOIT Pune
  • 103. 107 Automatic construction of parsers (YACC), YACC specifications. • Yacc is not a new tool, and yet, it is still used in many projects. • Yacc syntax is similar to Lex/Flex at the top level. • Lex/Flex rules were regular expression – action pairs. • Yacc rules are grammar rule – action pairs. declarations %% rules %% programs - By Jaydeep Patil AISSMS's IOIT Pune
  • 104. 108- By Jaydeep Patil AISSMS's IOIT Pune
• 105. • Declaration Section • There are two sections in the declarations part of a Yacc program; both are optional. In the first section, we put ordinary C declarations, delimited by %{ and %}. Here we place declarations of any temporaries used by the translation rules or procedures of the second and third sections, e.g. #include <ctype.h> • The C preprocessor is told to include the standard header file <ctype.h>, which contains the predicate isdigit. • Also in the declarations part are declarations of grammar tokens: • %token DIGIT • Tokens declared in this section can then be used in the second and third parts of the Yacc specification. If Lex is used to create the lexical analyzer that passes tokens to the Yacc parser, then these token declarations are also made available to the analyzer generated by Lex. 109- By Jaydeep Patil AISSMS's IOIT Pune
  • 106. • The Translation Rules Part • In the part of the Yacc specification after the first %% pair, we put the translation rules . Each rule consists of a grammar production and the associated semantic action. A set of productions that we have been writing: 110- By Jaydeep Patil AISSMS's IOIT Pune
• 107. • In a Yacc production, unquoted strings of letters and digits not declared to be tokens are taken to be nonterminals. A quoted single character, e.g. 'c', is taken to be the terminal symbol c, as well as the integer code for the token represented by that character (i.e., Lex would return the character code for 'c' to the parser, as an integer). Alternative bodies can be separated by a vertical bar, and a semicolon follows each head with its alternatives and their semantic actions. The first head is taken to be the start symbol. 111- By Jaydeep Patil AISSMS's IOIT Pune
  • 108. • A Yacc semantic action is a sequence of C statements. In a semantic action, the symbol $$ refers to the attribute value associated with the nonterminal of the head, while $i refers to the value associated with the ith grammar symbol (terminal or nonterminal) of the body. The semantic action is performed whenever we reduce by the associated production, so normally the semantic action computes a value for $$ in terms of the $i's. In the Yacc specification, we have written the two E-productions 112- By Jaydeep Patil AISSMS's IOIT Pune
• 109. • Note that the nonterminal term in the first production is the third grammar symbol of the body, while + is the second. The semantic action associated with the first production adds the value of the expr and the term of the body and assigns the result as the value for the nonterminal expr of the head. We have omitted the semantic action for the second production altogether, since copying the value is the default action for productions with a single grammar symbol in the body. In general, { $$ = $1; } is the default semantic action. Notice that we have added a new starting production • line : expr '\n' { printf("%d\n", $1); } • to the Yacc specification. This production says that an input to the desk calculator is to be an expression followed by a newline character. The semantic action associated with this production prints the decimal value of the expression followed by a newline character. 113- By Jaydeep Patil AISSMS's IOIT Pune
  • 110. • The Supporting C-Routines Part • The third part of a Yacc specification consists of supporting C- routines. A lexical analyzer by the name yylex() must be provided. Using Lex to produce yylex() is a common choice; The lexical analyzer yylex() produces tokens consisting of a token name and its associated attribute value. If a token name such as DIGIT is returned, the token name must be declared in the first section of the Yacc specification. • The attribute value associated with a token is communicated to the parser through a Yacc-defined variable yylval. It reads input characters one at a time using the C-function getchar() . If the character is a digit, the value of the digit is stored in the variable yylval, and the token name DIGIT is returned. Otherwise, the character itself is returned as the token name. 114- By Jaydeep Patil AISSMS's IOIT Pune
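Putting the three parts together, the desk-calculator specification described on the last few slides looks roughly like this. This is a reconstruction for illustration (error handling is minimal and the nonterminal names term and factor are assumed), not a verbatim copy of the slides' figure:

    %{
    #include <ctype.h>
    #include <stdio.h>
    %}
    %token DIGIT
    %%
    line   : expr '\n'         { printf("%d\n", $1); }
           ;
    expr   : expr '+' term     { $$ = $1 + $3; }
           | term
           ;
    term   : term '*' factor   { $$ = $1 * $3; }
           | factor
           ;
    factor : '(' expr ')'      { $$ = $2; }
           | DIGIT
           ;
    %%
    int yylex(void) {                       /* hand-written lexical analyzer */
        int c = getchar();
        if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
        return c;                           /* operators, parens, newline    */
    }
    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    int main(void) { return yyparse(); }

Running this through yacc and cc and then typing 2+3*4 followed by a newline prints 14.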
  • 111. 115- By Jaydeep Patil AISSMS's IOIT Pune
• 112. yacc -d bas.y # create y.tab.h, y.tab.c
lex bas.l # create lex.yy.c
cc lex.yy.c y.tab.c -o bas.exe # compile/link
116- By Jaydeep Patil AISSMS's IOIT Pune
  • 113. • Yacc reads the grammar descriptions in bas.y and generates a syntax analyzer (parser), that includes function yyparse, in file y.tab.c. The –d option causes yacc to generate definitions for tokens and place them in file y.tab.h. Lex reads the pattern descriptions in bas.l, includes file y.tab.h, and generates a lexical analyzer, that includes function yylex, in file lex.yy.c. • Finally, the lexer and parser are compiled and linked together to create executable bas.exe. From main we call yyparse to run the compiler. Function yyparse automatically calls yylex to obtain each token. 117- By Jaydeep Patil AISSMS's IOIT Pune
  • 114. • %token INTEGER • This definition declares an INTEGER token. Yacc generates a parser in file y.tab.c and an include file, y.tab.h: • #ifndef YYSTYPE • #define YYSTYPE int • #endif • #define INTEGER 258 • extern YYSTYPE yylval; • Lex includes this file and utilizes the definitions for token values. To obtain tokens yacc calls yylex. Function yylex has a return type of int that returns a token. Values associated with the token are returned by lex in variable yylval. For example, • [0-9]+ { yylval = atoi(yytext); return INTEGER; } • would store the value of the integer in yylval, and return token INTEGER to yacc. The type of yylval is determined by YYSTYPE. Since the default type is integer this works well in this case. Token values 0-255 are reserved for character values. For example, if you had a rule such as • [-+] return *yytext; /* return operator */ • the character value for minus or plus is returned. Note that we placed the minus sign first so that it wouldn’t be mistaken for a range designator. Generated token values typically start around 258 because lex reserves several values for end-of-file and error processing. 118- By Jaydeep Patil AISSMS's IOIT Pune
• 115. • By default yylval is of type int, but you can override that from the Yacc file by redefining YYSTYPE with #define. • The lexer needs to be able to access yylval. In order to do so, it must be declared in the scope of the lexer as an extern variable. The original Yacc neglects to do this for you, so you should add the following to your lexer, just beneath • #include <y.tab.h>: • extern YYSTYPE yylval; • Bison does this for you automatically. • #ifndef checks whether the given name has been #defined earlier in the file or in an included file; if not, it includes the code between it and the closing #else or, if no #else is present, #endif statement. 119- By Jaydeep Patil AISSMS's IOIT Pune
  • 116. 120- By Jaydeep Patil AISSMS's IOIT Pune
  • 117. • Internally yacc maintains two stacks in memory; a parse stack and a value stack. The parse stack contains terminals and nonterminals that represent the current parsing state. The value stack is an array of YYSTYPE elements and associates a value with each element in the parse stack. For example when lex returns an INTEGER token yacc shifts this token to the parse stack. At the same time the corresponding yylval is shifted to the value stack. The parse and value stacks are always synchronized so finding a value related to a token on the stack is easily accomplished. 121- By Jaydeep Patil AISSMS's IOIT Pune
  • 118. 122- By Jaydeep Patil AISSMS's IOIT Pune
  • 119. • The left-hand side of a production, or nonterminal, is entered left-justified and followed by a colon. This is followed by the right-hand side of the production. Actions associated with a rule are entered in braces. • With left-recursion, we have specified that a program consists of zero or more expressions. Each expression terminates with a newline. When a newline is detected we print the value of the expression. When we apply the rule • expr: expr '+' expr { $$ = $1 + $3; } • we replace the right-hand side of the production in the parse stack with the left-hand side of the same production. In this case we pop “expr '+' expr” and push “expr”. We have reduced the stack by popping three terms off the stack and pushing back one term. We may reference positions in the value stack in our C code by specifying “$1” for the first term on the right-hand side of the production, “$2” for the second, and so on. “$$” designates the top of the stack after reduction has taken place. The above action adds the value associated with two expressions, pops three terms off the value stack, and pushes back a single sum. As a consequence the parse and value stacks remain synchronized. 123- By Jaydeep Patil AISSMS's IOIT Pune
  • 120. • Numeric values are initially entered on the stack when we reduce from INTEGER to expr. After INTEGER is shifted to the stack we apply the rule • expr: INTEGER { $$ = $1; } • The INTEGER token is popped off the parse stack followed by a push of expr. For the value stack we pop the integer value off the stack and then push it back on again. In other words we do nothing. In fact this is the default action and need not be specified. Finally, when a newline is encountered, the value associated with expr is printed. • In the event of syntax errors yacc calls the user-supplied function yyerror. If you need to modify the interface to yyerror then alter the canned file that yacc includes to fit your needs. The last function in our yacc specification is main. This example still has an ambiguous grammar. Although yacc will issue shift-reduce warnings it will still process the grammar using shift as the default operation. 124- By Jaydeep Patil AISSMS's IOIT Pune
  • 121. • The lexical analyzer returns VARIABLE and INTEGER tokens. For variables yylval specifies an index to the symbol table sym. For this program sym merely holds the value of the associated variable. When INTEGER tokens are returned, yylval contains the number scanned. 125- By Jaydeep Patil AISSMS's IOIT Pune
• 122. • The input specification for yacc follows. The tokens for INTEGER and VARIABLE are utilized by yacc to create #defines in y.tab.h for use in lex. This is followed by definitions for the arithmetic operators. We may specify %left for left-associative or %right for right-associative operators. The last definition listed has the highest precedence. Consequently multiplication and division have higher precedence than addition and subtraction. All four operators are left-associative. Using this simple technique we are able to disambiguate our grammar. 126- By Jaydeep Patil AISSMS's IOIT Pune
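In the .y file, the declarations the slide describes would look like this (token names as in the slide's calculator example):

    %token INTEGER VARIABLE
    %left '+' '-'        /* lower precedence, left-associative  */
    %left '*' '/'        /* higher precedence, left-associative */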
  • 123. 127- By Jaydeep Patil AISSMS's IOIT Pune
  • 124. • extern void *malloc(); • malloc accepts an argument of type size_t, and size_t may be defined as unsigned long. If you are passing ints (or even unsigned ints), malloc may be receiving garbage (or similarly if you are passing a long but size_t is int). 128- By Jaydeep Patil AISSMS's IOIT Pune
  • 125. Semantic Analysis 129- By Jaydeep Patil AISSMS's IOIT Pune
  • 126. Beyond syntax analysis •An identifier named x has been recognized. –Is x a scalar, array or function? –How big is x? –If x is a function, how many and what type of arguments does it take? –Is x declared before being used? –Where can x be stored? –Is the expression x+y type-consistent? •Semantic analysis is the phase where we collect information about the types of expressions and check for type related errors. •The more information we can collect at compile time, the less overhead we have at run time. 130- By Jaydeep Patil AISSMS's IOIT Pune
  • 127. Semantic Analysis •The syntax of a programming language describes the proper form of its programs, •while the semantics of the language defines what its programs mean; that is, what each program does when it executes. 131- By Jaydeep Patil AISSMS's IOIT Pune
  • 128. Semantic analysis •Collecting type information may involve "computations" –What is the type of x+y given the types of x and y? •Tool: attribute grammars –Each grammar symbol has a number of associated attributes: –The type of a variable or expression –The value of a variable or expression –The code for a statement –Etc. –The grammar is augmented with special equations (called semantic actions) that specify how the values of attributes are computed from other attributes. –The process of using semantic actions to evaluate attributes is called syntax-directed translation. 132- By Jaydeep Patil AISSMS's IOIT Pune
  • 129. •TYPE Checking 133- By Jaydeep Patil AISSMS's IOIT Pune
• 130. •A compiler must check that the source program follows both the syntactic and semantic conventions of the source language. •This checking, called static checking (to distinguish it from dynamic checking during execution of the target program), ensures that certain kinds of programming errors will be detected and reported. 134- By Jaydeep Patil AISSMS's IOIT Pune
• 131. •Examples of Static Checks: –Type Checks: A compiler should report an error if an operator is applied to an incompatible operand. –Flow-of-Control Checks: Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. –Uniqueness Checks: There are some situations in which an object must be defined exactly once. –Name-Related Checks: Sometimes the same name must appear two or more times. Ex. In Ada, a loop or block may have a name that appears at the beginning and end of the construct. The compiler must check that the same name is used at both places. 135- By Jaydeep Patil AISSMS's IOIT Pune
  • 132. Type Checking •TYPE CHECKING is the main activity in semantic analysis. •Goal: calculate and ensure consistency of the type of every expression in a program •If there are type errors, we need to notify the user. •Otherwise, we need the type information to generate code that is correct. 136- By Jaydeep Patil AISSMS's IOIT Pune
  • 133. 137 Type Systems and Type Expressions 137- By Jaydeep Patil AISSMS's IOIT Pune
• 134. Type systems •Every language has a set of types and rules for assigning types to language constructs. •Example from the C specification: –"The result of the unary & operator is a pointer to the object referred to by the operand. If the type of the operand is '…' then the type of the result is 'pointer to …'" •Usually, every expression has a type. •Types have structure: the type 'pointer to int' is CONSTRUCTED from the type 'int'. 138- By Jaydeep Patil AISSMS's IOIT Pune
  • 135. Basic vs. constructed types •Most programming languages have basic and constructed types. •BASIC TYPES are the atomic types provided by the language. –Pascal: boolean, character, integer, real –C: char, int, float, double •CONSTRUCTED TYPES are built up from basic types. –Pascal: arrays, records, sets, pointers –C: arrays, structs, pointers 139- By Jaydeep Patil AISSMS's IOIT Pune
  • 136. Type expressions •We denote the type of language constructs with TYPE EXPRESSIONS. •Type expressions are built up with TYPE CONSTRUCTORS. 1.A basic type is a type expression. The basic types are boolean, char, integer, and real. The special basic type type_error signifies an error. The special type void signifies “no type” 2.A type name is a type expression (type names are like typedefs in C) 140- By Jaydeep Patil AISSMS's IOIT Pune
• 137. Type expressions 1.A type constructor applied to type expressions is a type expression. a.Arrays: if T is a type expression and I is an index set, then array(I, T) is a type expression denoting the type "array of elements of type T with index set I". b.Products: if T1 and T2 are type expressions, then their Cartesian product T1 × T2 is also a type expression. c.Records: a record is a special kind of product in which the fields have names (examples below). d.Pointers: if T is a type expression, then pointer(T) is a type expression denoting the type "pointer to an object of type T". e.Functions: functions map elements of a domain D to a range R, so we write D -> R to denote "function mapping objects of type D to objects of type R" (examples below). 2.Type expressions may contain variables, whose values are themselves type expressions (this gives polymorphism). 141- By Jaydeep Patil AISSMS's IOIT Pune
• 138. Record type expressions •The Pascal code • type row = record • address: integer; • lexeme: array[1..15] of char • end; • var table: array[1..10] of row; •associates the type expression •record((address × integer) × (lexeme × array(1..15, char))) •with the type name row, and the type expression •array(1..10, record((address × integer) × (lexeme × array(1..15, char)))) •with the variable table 142- By Jaydeep Patil AISSMS's IOIT Pune
  • 139. Function type expressions •The C declaration •int *foo( char a, char b ); •would associate type expression •char × char -> pointer(integer) •with foo. Some languages (like ML) allow all sorts of crazy function types, e.g. • (integer -> integer) -> (integer -> integer) •denotes functions taking a function as input and returning another function 143- By Jaydeep Patil AISSMS's IOIT Pune
  • 140. Graph representation of type expressions •The recursive structure of a type can be represented with a tree, e.g. for char × char -> pointer(integer): •Some compilers explicitly use graphs like these to represent the types of expressions. 144- By Jaydeep Patil AISSMS's IOIT Pune
  • 141. Type systems and checkers •A TYPE SYSTEM is a set of rules for assigning type expressions to the parts of a program. •Every type checker implements some type system. •Syntax-directed type checking is a simple method to implement a type checker. 145- By Jaydeep Patil AISSMS's IOIT Pune
  • 142. Static vs. dynamic type checking •STATIC type checking is done at compile time. •DYNAMIC type checking is done at run time. •Any kind of type checking CAN be done at run time. •But this reduces run-time efficiency, so we want to do static checking when possible. •A SOUND type system is one in which ALL type errors can be found statically. •If the compiler guarantees that every program it accepts will run without type errors, then the language is STRONGLY TYPED. 146- By Jaydeep Patil AISSMS's IOIT Pune
  • 143. 147 An Example Type Checker 147- By Jaydeep Patil AISSMS's IOIT Pune
• 144. Example type checker •Let’s build a translation scheme to synthesize the type of every expression from its subexpressions. •Here is a Pascal-like grammar for a sequence of declarations (D) followed by an expression (E):
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑
•Example program: key: integer; key mod 1999 148- By Jaydeep Patil AISSMS's IOIT Pune
• 145. The type system •The basic types are char and integer. •type_error signals an error. •All arrays start at 1, so •array[256] of char •leads to the type expression array(1..256, char). •The symbol ↑ in a declaration specifies a pointer type, so • ↑ integer •leads to the type expression pointer(integer). 149- By Jaydeep Patil AISSMS's IOIT Pune
  • 146. Translation scheme for declarations •P → D ; E •D → D ; D •D → id : T { addtype(id.entry, T.type) } •T → char { T.type := char } •T → integer { T.type := integer } •T → ↑T1 { T.type := pointer(T1.type) } •T → array [ num ] of T1 • { T.type := array(1 .. num.val, T1.type) } 150- By Jaydeep Patil AISSMS's IOIT Pune
• 147. Type checking for expressions Once the identifiers and their types have been inserted into the symbol table, we can check the type of the elements of an expression: •E → literal { E.type := char } •E → num { E.type := integer } •E → id { E.type := lookup(id.entry) } •E → E1 mod E2 { if E1.type = integer and E2.type = integer • then E.type := integer • else E.type := type_error } •E → E1 [ E2 ] { if E2.type = integer and E1.type = array(s,t) • then E.type := t else E.type := type_error } •E → E1↑ { if E1.type = pointer(t) • then E.type := t else E.type := type_error } 151- By Jaydeep Patil AISSMS's IOIT Pune
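A direct transcription of these synthesized-attribute rules into code might look like the C sketch below. The Type/Expr representation and the helper names are my own assumptions for illustration; the slides only give the semantic rules.

    #include <stdio.h>

    typedef enum { T_CHAR, T_INTEGER, T_ARRAY, T_POINTER, T_ERROR } Kind;
    typedef struct Type { Kind kind; struct Type *elem; } Type;  /* elem: array/pointer element type */

    typedef enum { E_LITERAL, E_NUM, E_ID, E_MOD, E_INDEX, E_DEREF } Op;
    typedef struct Expr {
        Op op;
        Type *declared;                 /* for E_ID: type found in the symbol table */
        struct Expr *left, *right;
    } Expr;

    static Type t_char = { T_CHAR, 0 }, t_int = { T_INTEGER, 0 }, t_err = { T_ERROR, 0 };

    /* Synthesize E.type, mirroring the semantic rules above. */
    static Type *check(Expr *e) {
        switch (e->op) {
        case E_LITERAL: return &t_char;
        case E_NUM:     return &t_int;
        case E_ID:      return e->declared;                      /* lookup(id.entry) */
        case E_MOD: {                                            /* E1 mod E2 */
            Type *a = check(e->left), *b = check(e->right);
            return (a->kind == T_INTEGER && b->kind == T_INTEGER) ? &t_int : &t_err;
        }
        case E_INDEX: {                                          /* E1 [ E2 ] */
            Type *a = check(e->left), *b = check(e->right);
            return (a->kind == T_ARRAY && b->kind == T_INTEGER) ? a->elem : &t_err;
        }
        case E_DEREF: {                                          /* E1 ^ */
            Type *a = check(e->left);
            return (a->kind == T_POINTER) ? a->elem : &t_err;
        }
        }
        return &t_err;
    }

    int main(void) {
        Expr key  = { E_ID, &t_int, 0, 0 };                      /* key: integer */
        Expr num  = { E_NUM, 0, 0, 0 };
        Expr expr = { E_MOD, 0, &key, &num };                    /* key mod 1999 */
        printf("integer? %d\n", check(&expr)->kind == T_INTEGER);  /* prints 1 */
        return 0;
    }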
• 148. How about boolean types? •Try adding • T -> boolean • relational operators < <= = >= > <> • and logical connectives and, or, not to the grammar, then add appropriate type checking semantic actions. 152- By Jaydeep Patil AISSMS's IOIT Pune
  • 149. Type checking for statements •Usually we assign the type VOID to statements. •If a type error is found during type checking, though, we should set the type to type_error •Let’s change our grammar allow statements: • P → D ; S •i.e., a program is a sequence of declarations followed by a sequence of statements. 153- By Jaydeep Patil AISSMS's IOIT Pune
• 150. Type checking for statements Now we need to add productions and semantic actions: •S → id := E { if id.type = E.type then S.type := void • else S.type := type_error } •S → if E then S1 { if E.type = boolean • then S.type := S1.type • else S.type := type_error } •S → while E do S1 { if E.type = boolean • then S.type := S1.type • else S.type := type_error } •S → S1 ; S2 { if S1.type = void and S2.type = void • then S.type := void • else S.type := type_error } 154- By Jaydeep Patil AISSMS's IOIT Pune
• 151. Type checking for function calls •Suppose we add a production E → E ( E ) •Then we need a production for function declarations: T → T1 → T2 { T.type := T1.type → T2.type } and a rule for function calls: E → E1 ( E2 ) { if E2.type = s and E1.type = s → t then E.type := t else E.type := type_error } 155- By Jaydeep Patil AISSMS's IOIT Pune
• 152. Type checking for function calls •Multiple-argument functions, however, can be modeled as functions that take a single PRODUCT argument. • root : ( real → real ) × real → real •This would model a function that takes a real function over the reals, and a real, and returns a real. •In C: float root( float (*f)(float), float x ); 156- By Jaydeep Patil AISSMS's IOIT Pune
  • 153. Type expression equivalence •Type checkers need to ask questions like: • – “if E1.type == E2.type, then …” •What does it mean for two type expressions to be equal? •STRUCTURAL EQUIVALENCE says two types are the same if they are made up of the same basic types and constructors. •NAME EQUIVALENCE says two types are the same if their constituents have the SAME NAMES. 157- By Jaydeep Patil AISSMS's IOIT Pune
• 154. Structural Equivalence
•boolean sequiv( s, t )
•{
• if s and t are the same basic type
• return TRUE;
• else if s == array( s1, s2 ) and t == array( t1, t2 ) then
• return sequiv( s1, t1 ) and sequiv( s2, t2 );
• else if s == s1 x s2 and t == t1 x t2 then
• return sequiv( s1, t1 ) and sequiv( s2, t2 );
• else if s == pointer( s1 ) and t == pointer( t1 ) then
• return sequiv( s1, t1 );
• else if s == s1 → s2 and t == t1 → t2 then
• return sequiv( s1, t1 ) and sequiv( s2, t2 );
• return FALSE;
•}
158- By Jaydeep Patil AISSMS's IOIT Pune
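The pseudocode above translates almost line-for-line into C over a small type-expression representation. This is a sketch under my own assumed representation (TypeExpr, Ctor), not code from the slides:

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    typedef enum { BASIC, ARRAY, PRODUCT, POINTER, FUNCTION } Ctor;
    typedef struct TypeExpr {
        Ctor ctor;
        const char *name;                     /* for BASIC: "integer", "char", ...   */
        const struct TypeExpr *left, *right;  /* sub-expressions of the constructor  */
    } TypeExpr;

    bool sequiv(const TypeExpr *s, const TypeExpr *t) {
        if (s->ctor != t->ctor) return false;
        switch (s->ctor) {
        case BASIC:   return strcmp(s->name, t->name) == 0;   /* same basic type */
        case POINTER: return sequiv(s->left, t->left);        /* pointer(s1)     */
        case ARRAY:                                           /* array(s1, s2)   */
        case PRODUCT:                                         /* s1 x s2         */
        case FUNCTION:                                        /* s1 -> s2        */
            return sequiv(s->left, t->left) && sequiv(s->right, t->right);
        }
        return false;
    }

    int main(void) {
        TypeExpr integer = { BASIC, "integer", 0, 0 };
        TypeExpr p1 = { POINTER, 0, &integer, 0 };
        TypeExpr p2 = { POINTER, 0, &integer, 0 };
        printf("%d\n", sequiv(&p1, &p2));      /* 1: both are pointer(integer) */
        return 0;
    }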
  • 155. Relaxing structural equivalence •We don’t always want strict structural equivalence. •E.g. for arrays, we want to write functions that accept arrays of any length. •To accomplish this, we would modify sequiv() to accept any bounds: • … • else if s == array( s1, s2 ) and t == array( t1, t2 ) • return sequiv( s2, t2 ) • … 159- By Jaydeep Patil AISSMS's IOIT Pune
• 156. Encoding types •Recursive routines can be slow. •Recursive type checking routines increase the compiler’s run time. •In the compilers of the 1970’s and 1980’s, this made compilation take too long. •So designers came up with ENCODINGS for types that allowed for faster type checking. 160- By Jaydeep Patil AISSMS's IOIT Pune
  • 157. Name equivalence •Most languages allow association of names with type expressions. This makes type equivalence trickier. •Example from Pascal: • type link = ↑cell; • var next: link; • last: link; • p: ↑ cell; • q,r: ↑ cell; •Do next, last, p, q, and r have the same type? •In Pascal, it depends on the implementation! •In structural equivalence, the types would be the same. •But NAME EQUIVALENCE requires identical NAMES. 161- By Jaydeep Patil AISSMS's IOIT Pune
  • 158. Handling cyclic types •Suppose we had the Pascal declaration • type link = ↑cell; • cell = record • info: integer; • next: link; • end; •The declaration of cell contains itself (via the next pointer). •The graph for this type therefore contains a cycle. 162- By Jaydeep Patil AISSMS's IOIT Pune
• 159. Cyclic types •The situation in C is slightly different, since it is impossible to refer to an undeclared name. • typedef struct _cell { • int info; • struct _cell *next; • } cell; • typedef cell *link; •But the name link is just shorthand for • (struct _cell *). •C uses name equivalence for structs to avoid recursion •(after expanding typedef’s). •But it uses structural equivalence elsewhere. 163- By Jaydeep Patil AISSMS's IOIT Pune
• 160. Type conversion •Suppose we encounter an expression x+i where x has type real and i has type int. CPU instructions for addition could take EITHER real OR int as operands, but not a mix. •This means the compiler must sometimes convert the operands of arithmetic expressions to ensure that operands are consistent with operators. •With postfix as an intermediate language for expressions, we could express the conversion as follows: x i inttoreal real+ •where real+ is the floating point addition operation. 164- By Jaydeep Patil AISSMS's IOIT Pune
• 161. Type coercion •If type conversion is done by the compiler without the programmer requesting it, it is called IMPLICIT conversion or type COERCION. •EXPLICIT conversions are those that the programmer specifies, e.g. • x = (int)y * 2; •Implicit conversion of CONSTANT expressions should be done at compile time. 165- By Jaydeep Patil AISSMS's IOIT Pune
  • 162. Type checking example with coercion •Production Semantic Rule •E -> num E.type := integer •E -> num . num E.type := real •E -> id E.type := lookup( id.entry ) •E -> E1 op E2 E.type := if E1.type == integer and E2.type == integer • then integer • else if E1.type == integer and E2.type == real • then real • else if E1.type == real and E2.type == integer • then real • else if E1.type == real and E2.type == real • then real • else type_error 166- By Jaydeep Patil AISSMS's IOIT Pune
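The rule for E -> E1 op E2 boils down to a small helper function; a minimal C sketch (the enum and function name are assumptions, not from the slides):

    #include <stdio.h>

    typedef enum { TYPE_INTEGER, TYPE_REAL, TYPE_ERROR } Type;

    /* Result type of E1 op E2 with implicit int-to-real coercion. */
    Type op_type(Type t1, Type t2) {
        if (t1 == TYPE_INTEGER && t2 == TYPE_INTEGER) return TYPE_INTEGER;
        if ((t1 == TYPE_INTEGER || t1 == TYPE_REAL) &&
            (t2 == TYPE_INTEGER || t2 == TYPE_REAL)) return TYPE_REAL;
        return TYPE_ERROR;
    }

    int main(void) {
        printf("%d\n", op_type(TYPE_REAL, TYPE_INTEGER) == TYPE_REAL);   /* prints 1 */
        return 0;
    }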
  • 163. END of Unit 2 167- By Jaydeep Patil AISSMS's IOIT Pune