@vtucode - in Module 3 2021 Scheme
Where A production represents any number of a’s and b’s and is given by:
A → aA | bA | ɛ
Therefore the resulting grammar is G = ( V, T, P, S) where,
V = { S, A }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → AabA
A → aA | bA | ɛ
iii. Obtain CFG for the language L = { ( 011 + 1)* 01 }
L can be re-written as:
S → A01
A → 011A | 1A | ɛ
iv. Obtain CFG for the language L = { w | w ∈ (0,1)* with at least one occurrence of ‘101’ }.
The regular expression corresponding to the language is (0 + 1)* 101 (0 + 1)*, i.e. every string has the form x 101 y with x, y in (0,1)*.
The A production generates any number of 0’s and 1’s and is given by:
A → 0A | 1A | ɛ
Therefore the resulting grammar is G = ( V, T, P, S) where,
V = { S, A }, T = { 0, 1 }, S is the start symbol, and P is the set of productions shown below:
S → A101A
A → 0A | 1A | ɛ
v. Obtain CFG for the language L = { wab | w ∈ (a,b)* }.
OR
Obtain CFG for the language containing strings of a’s and b’s ending with ‘ab’.
The resulting grammar is G = ( V, T, P, S) where,
V = { S, A }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → Aab
A → aA | bA | ɛ
vi. Obtain CFG for the language containing strings of a’s and b’s ending with ‘ab’ or ‘ba’.
OR
Obtain the context free grammar for the language L = { XY | X ∈ (a, b)* and Y ∈ { ab, ba } }
The regular expression corresponding to the language is w (ab + ba) where w is in (a, b)*
X→ aX | bX | ɛ
Y → ab | ba
The resulting grammar is G = ( V, T, P, S) where,
V = { S, X, Y }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → XY
X→ aX | bX | ɛ
Y → ab | ba
Obtain the CFG for the language L = { w ∈ (a, b)* | Na(w) = Nb(w) }
OR
Obtain the CFG for the language containing strings of a’s and b’s with equal number of a’s and b’s.
Answer:
To get an equal number of a’s and b’s, we start from 3 cases:
i. The empty string ɛ has an equal number of a’s and b’s.
ii. An ‘a’, followed by a string with equal numbers of a’s and b’s, followed by a ‘b’.
iii. A ‘b’, followed by a string with equal numbers of a’s and b’s, followed by an ‘a’.
The corresponding productions for these 3 cases can be written as
S→ ɛ
S→ aSb
S→ bSa
Using these productions, strings such as ɛ, ab, ba, aabb, abab, baba, etc. can be generated.
But strings such as abba and baab, which start and end with the same symbol, cannot be generated from these productions. To generate such strings we concatenate two strings that each contain equal numbers of a’s and b’s. The corresponding production is S → SS.
The resulting grammar for the language of strings with equal numbers of a’s and b’s is G = ( V, T, P, S) where,
V = { S }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S→ɛ
S→ aSb
S → bSa
S → SS
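As a sanity check on this grammar, the following Python sketch decides membership by trying each production directly and compares the answer with a plain count of a's and b's. The function names derives and all_strings are illustrative assumptions, not part of the notes, and the check is brute force over short strings only.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w: str) -> bool:
    """Return True if S =>* w for S -> ε | aSb | bSa | SS."""
    if w == "":
        return True                       # S -> ε
    if w[0] == "a" and w[-1] == "b" and derives(w[1:-1]):
        return True                       # S -> aSb
    if w[0] == "b" and w[-1] == "a" and derives(w[1:-1]):
        return True                       # S -> bSa
    # S -> SS: split into two non-empty parts (an empty part adds nothing new)
    return any(derives(w[:i]) and derives(w[i:]) for i in range(1, len(w)))


def all_strings(max_len: int):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


if __name__ == "__main__":
    for w in all_strings(8):
        assert derives(w) == (w.count("a") == w.count("b")), w
    print("grammar agrees with the counting test up to length 8")
```

Strings such as abba are accepted only through the S → SS case, which is why that production is needed.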
Obtain the CFG for the language L = { w ∈ (a, b)* | Na(w) = Nb(w) + 1 }
The language contains strings of a’s and b’s in which the number of a’s is one more than the number of b’s.
The one extra ‘a’ may appear at the beginning, at the end, or in the middle of the string.
We can write the A production for equal numbers of a’s and b’s as
A → ɛ | aAb | bAa | AA
and then insert one extra ‘a’ between two A’s, i.e.:
S → AaA
We know the CFG corresponding to the language { 0^m 1^m | m ≥ 1 }, by referring to the basic building block grammar of { a^n b^n | n ≥ 1 }.
The equivalent A production is:
A → 0A1
A → 01
Here B represents any number of 2’s with at least one 2 (n ≥ 1), which is similar to the a^n grammar.
The equivalent B production is:
B → 2B
B → 2
So the context free grammar for the language L = { 0^m 1^m 2^n | m, n ≥ 1 } is G = ( V, T, P, S) where,
V = { S, A, B }, T = { 0, 1, 2 }, S is the start symbol, and P is the set of productions shown below:
S → AB
A → 0A1 | 01
B → 2B | 2
Obtain the context free grammar for the language L = { a^(2n) b^m | m, n ≥ 0 }
Answer:
Since the number of a’s is expressed in terms of n and the number of b’s in terms of m, we can re-write the language as:
Case 2: when j = k
Where C→ 1C| 1
So the context free grammar for the language L = { 0^i 1^j | i ≠ j where i, j ≥ 0 }
is G = ( V, T, P, S) where,
V = {S, A, B, C}, T = {0, 1}, S is the start symbol, and P is the set of productions shown below:
S→ AB |BC
A→ 0A| 0
B→ 0B1| ɛ
C→ 1C| 1
Obtain the context free grammar for the language L = { a^n b^m | n = 2m where m ≥ 0 }
Answer:
By substituting n = 2m we have
L = { a^(2m) b^m | m ≥ 0 }
Here for every two a’s one ‘b’ has to be generated. This is obtained by suffixing ‘aaS’ with one ‘b’. The minimum string is ɛ.
So the context free grammar for the language L = { a^n b^m | n = 2m where m ≥ 0 }
is G = ( V, T, P, S) where,
V = {S}, T = {a, b}, S is the start symbol, and P is the set of productions shown below:
S → aaSb
S→ɛ
Obtain the context free grammar for the language L = { a^n b^m | n ≠ 2m where n, m ≥ 1 }
Answer:
Here n ≠ 2m means n > 2m or n < 2m, which results in two possible cases of language L.
Case 1: when n > 2m, we can re-write the language L by taking n = 2m + 1
L = { a^(2m+1) b^m | m ≥ 1 }; by referring to the basic building block grammar example, the resulting production for a^(2m) b^m is given by:
A → aaAb
The minimum string when m = 1 is ‘aaab’.
ie : A → a
is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b, c}, S is the start symbol, and P is the production rule is as shown below:
S→ aSc| A
A→ bAcc| ɛ
Obtain the context free grammar for the language L = { w a^n b^n w^R | w is in (0, 1)* and n ≥ 0 }
Answer: we can re-write the language L as
The corresponding A production is given by: A → aAb | ɛ ; the minimum value is ɛ when n = 0
We can insert this substring A production between the w w^R production represented by S.
The corresponding S production is S → 0S0 | 1S1 | A
Note: In the S production the minimum value is A, when w w^R results in ɛ; ie: only the middle substring A appears.
So the context free grammar for the language L = { w a^n b^n w^R | w is in (0, 1)* and n ≥ 0 }
is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b, 0, 1}, S is the start symbol, and P is the set of productions shown below:
S → 0S0 | 1S1 | A
A→ aAb | ɛ
Obtain the context free grammar for the language L = { a^n w w^R b^n | w is in (0, 1)* and n ≥ 2 }
L=
S1 production is ; S1 → AB
A → aAb |ɛ
B → cB | c
S2 production is ; S2 → AC
C → cCd | ɛ
So the context free grammar for the language L = { a^n b^n c^i | n ≥ 0, i ≥ 1 } U { a^n b^n c^m d^m | n, m ≥ 0 }
is G = ( V, T, P, S) where,
V = {S, S1, S2, A, B, C}, T = {a, b, c, d}, S is the start symbol, and P is the set of productions shown below:
S → S1| S2
S1 → AB
A → aAb |ɛ
B → cB | c
S2 → AC
C → cCd | ɛ
Obtain the context free grammar for the language L1L2 where L1 = { a^n b^n c^i | n ≥ 0, i ≥ 1 } and L2 = { 0^n 1^(2n) | n ≥ 0 }
Answer:
S1 production is: S1 → AB
A → aAb | ɛ
B → cB | c
S2 production is: S2 → 0 S2 11 | ɛ
A → aAb |ab
B→ bB |ɛ ; and S production is S → aAB
So the context free grammar for the language L = { a^(n+2) b^m | n ≥ 0, m > n } is G = ( V, T, P, S) where,
V = {S, A, B}, T = {a, b}, S is the start symbol, and P is the set of productions shown below:
S → aAB
A → aAb |ab
B→ bB |ɛ
******* Obtain the context free grammar for the language L = { a^n b^m | n ≥ 0, m > n }
n = 0: m = 1, 2, 3, … gives b, bb, bbb, … i.e. ɛ b+
n = 1: m = 2, 3, 4, … gives abb, abbb, abbbb, … i.e. ab b+
n = 2: m = 3, 4, 5, … gives aabbb, aabbbb, aabbbbb, … i.e. aabb b+
In general the strings have the form a^n b^n b+ where n ≥ 0.
We observe that the language consists of strings with n a’s followed by n b’s, which are in turn followed by any number of b’s with at least one b:
L = { a^n b^n b+ | n ≥ 0 }
******* Obtain the context free grammar for the language L = { a^n b^(n-3) | n ≥ 3 }
Answer:
L = { aaa, aaaab, aaaaabb, aaaaaabbb, … }
So we can re-write the language as:
L = { aaa a^n b^n | n ≥ 0 }
So the context free grammar for the language L = { a^n b^(n-3) | n ≥ 3 } is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b}, S is the start symbol, and P is the set of productions shown below:
S → aaaA
A → aAb | ɛ
******* Obtain the context free grammar for the language L = { w ∈ (a, b)* | |w| mod 3 ≠ |w| mod 2 }
DFA:
Note: The derivation process ends when one of the following happens.
i. The working string no longer contains any non-terminal symbols (including, as a special case, when the working string is ε), i.e. a terminal string has been generated.
ii. There are non-terminal symbols in the working string, but none of them matches the left-hand side of any rule in the grammar. For example, if the working string were AaBb, this would happen if the only left-hand side were C.
Left Most Derivation (LMD): In derivation process, if a leftmost variable is replaced at every step,
then the derivation is said to be leftmost.
Example: E → E+E | E*E | a | b
Let us derive a string a+b*a by applying LMD.
E => E*E
=> E+E*E
=> a+E*E
=> a+b*E
=> a+b*a
Right Most Derivation (RMD): In the derivation process, if a rightmost variable is replaced at every
step, then the derivation is said to be rightmost.
Example: E → E+E | E*E | a | b
Let us derive a string a+b*a by applying RMD.
E => E+E
=> E+E*E
=> E+E*a
=> E+b*a
=> a+b*a
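The two derivations above can be replayed mechanically. The sketch below is a hypothetical helper (the name derive and its parameters are assumptions made for illustration) that rewrites the leftmost or rightmost occurrence of the single variable E with each production body in turn.

```python
def derive(start, bodies, leftmost=True, variable="E"):
    """Apply each production body to the leftmost (or rightmost) variable.

    Assumes the grammar has a single one-character variable, as in
    E -> E+E | E*E | a | b.
    """
    form = start
    steps = [form]
    for body in bodies:
        pos = form.find(variable) if leftmost else form.rfind(variable)
        if pos < 0:
            raise ValueError("no variable left to replace")
        form = form[:pos] + body + form[pos + len(variable):]
        steps.append(form)
    return steps


# Production bodies in the order they are applied in the notes.
lmd = derive("E", ["E*E", "E+E", "a", "b", "a"], leftmost=True)
rmd = derive("E", ["E+E", "E*E", "a", "b", "a"], leftmost=False)

if __name__ == "__main__":
    print(" => ".join(lmd))
    print(" => ".join(rmd))
```

Both lists end in the same sentence a+b*a; only the order in which variables are replaced differs.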
Sentential form: For a context free grammar G, any string ‘w’ in (V U T)* that can be derived from the start symbol, i.e. that appears in some derivation step, is called a sentential form. A sentential form that contains only terminals is called a sentence.
There are two kinds of sentential form:
i. Left sentential form
ii. Right sentential form
Example: S => AB
=> aAbB
=> abB
=> abbB
=> abb
Here every string in {S, AB, aAbB, abB, abbB, abb} can be obtained from the start symbol S, so each string in the set is a sentential form.
Left Sentential form: For a context free grammar G, any string ‘w’ in (V U T)* which appears in a step of a leftmost derivation is called a left sentential form.
Example: E => E*E
=> E+E*E
=> a+E*E
=> a+b*E
=> a+b*a
Left sentential form = {E, E*E, E+E*E, a+E*E, a+b*E, a+b*a}
Right Sentential form: For a context free grammar G, any string ‘w’ in (V U T)* which appears in a step of a rightmost derivation is called a right sentential form.
Example: E => E+E
=> E+E*E
=> E+E*a
=> E+b*a
=> a+b*a
Right sentential form = {E, E+E, E+E*E, E+E*a, E+b*a, a+b*a}
PARSE TREE: ( DERIVATION TREE)
What is parse tree?
The derivation process can be shown in the form of a tree. Such trees are called derivation trees or
Parse trees.
Example: E → E+E | E*E | a | b
The Parse tree for the LMD of the string a+b*a is as shown below:
YIELD OF A TREE
What is Yield of a tree?
The yield of a tree is the string of terminal symbols obtained by only reading the leaves of the tree
from left to right without considering the ɛ symbols.
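A minimal sketch of this definition, assuming a parse tree is stored as a nested (label, children) tuple with terminals as plain strings (this representation and the name tree_yield are assumptions for illustration only):

```python
EPSILON = "ε"


def tree_yield(node):
    """Concatenate the leaves of a parse tree from left to right, skipping ε."""
    if isinstance(node, str):                 # a leaf: terminal or ε
        return "" if node == EPSILON else node
    _label, children = node                   # an interior node
    return "".join(tree_yield(child) for child in children)


# Parse tree for a+b*a built from the leftmost derivation
# E => E*E => E+E*E => a+E*E => a+b*E => a+b*a
tree = ("E", [
    ("E", [("E", ["a"]), "+", ("E", ["b"])]),
    "*",
    ("E", ["a"]),
])

if __name__ == "__main__":
    print(tree_yield(tree))
```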
Example:
Problem 1:
Consider the following grammar G:
S → aAS |a
A→ SbA |SS |ba
Obtain: i) LMD; ii. RMD iii. Parse tree for LMD iv. Parse tree for RMD for the string
‘aabbaa’
Problem 2:
Design a grammar for valid expressions over operator – and /. The arguments of expressions are
valid identifier over symbols a, b, 0 and 1. Derive LMD and RMD for string w = (a11 – b0) / (b00 –
a01). Write parse tree for LMD
Answer:
Grammar for valid expression:
E → E – E | E / E | (E) |I
I → a | b | Ia |Ib | I0 |I1
Problem 3:
Consider the following grammar G:
E → + EE | * EE | - EE | x | y
Find the: i) LMD; ii. RMD iii. Parse tree for the string ‘+*-xyxy’
Answer:
E → + EE | * EE | - EE | x | y
LMD: RMD:
Problem 4:
Show the derivation tree for the string ‘aabbbb’ with grammar:
S → AB |ɛ
A → aB
B → Sb
Give a verbal description of the language generated by this grammar.
Answer: Derivation tree:
Problem 6:
Consider the following grammar:
S → AbB
A →aA |ɛ
B → aB | bB |ɛ
Give LMD, RMD and parse tree for the string aaabab
LMD: RMD:
Obtain the context free grammar for generating integers and derive the integer 1278 by applying
LMD.
The context free grammar corresponding to the language containing the set of integers is G = ( V, T, P, I) where, V = { I, S, N, D }, T = { +, -, 0, 1, 2, ……, 9 }, I is the start symbol, and P is the set of productions shown below:
I → N | SN
S→+|-|ε
N → D | DN | ND
D → 0 | 1 | 2 | 3 | ……….| 9
LMD for the integer 1278:
I => N
ND
NDD
NDDD
DDDD
1DDD
12DD
127D
1278
AMBIGUOUS GRAMMAR
Sometimes a context free grammar may produce more than one parse tree for some (or all) of the strings it generates. When this happens, we say that the grammar is ambiguous. More precisely, a grammar G is ambiguous if there is at least one string in L(G) for which G produces more than one parse tree.
***What is an ambiguous grammar?
A context free grammar G is an ambiguous grammar if and only if there exists at least one string ‘w’ in L(G) for which G produces two or more different parse trees, i.e. two or more different leftmost derivations (or, equivalently, two or more different rightmost derivations).
Show how ambiguity in grammars are verified with an example.
Testing of ambiguity in a CFG by the following rules:
i. Obtain the string ‘w’ in L(G) by applying LMD twice and construct the parse tree. If the two parse
trees are different, then the grammar is ambiguous.
ii. Obtain the string ‘w’ in L(G) by applying RMD twice and construct the parse tree. If the
two parse trees are different, then the grammar is ambiguous.
iii. Obtain the LMD and get a string ‘w’. Obtain the RMD and get the same string ‘w’
for both the derivations construct the parse tree. If there are two different parse trees
then the grammar is ambiguous.
Show that the following grammar is ambiguous:
S → AB | aaB
A → a | Aa
B→b
Let us take the string w= aab
This string has two parse trees by applying LMD twice so the grammar is ambiguous;
The grammar is ambiguous, because we are getting two different parse trees for the same string by
applying LMD twice.
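The claim can be checked by counting parse trees. The sketch below is an illustrative brute-force counter (the names GRAMMAR, count_trees and count_body are assumptions); it relies on the grammar having no ε-productions and no cycles of unit productions, which holds for S → AB | aaB, A → a | Aa, B → b.

```python
from functools import lru_cache

# Each body is a tuple of symbols; symbols that are not keys are terminals.
GRAMMAR = {
    "S": (("A", "B"), ("a", "a", "B")),
    "A": (("a",), ("A", "a")),
    "B": (("b",),),
}


@lru_cache(maxsize=None)
def count_trees(symbol, s):
    """Number of distinct parse trees with root `symbol` and yield `s`."""
    if symbol not in GRAMMAR:                      # terminal symbol
        return 1 if s == symbol else 0
    return sum(count_body(body, s) for body in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def count_body(body, s):
    """Number of ways the symbols of `body` can jointly derive `s`."""
    if not body:
        return 1 if s == "" else 0
    head, rest = body[0], body[1:]
    total = 0
    # Every symbol derives at least one character (no ε-productions),
    # so leave at least len(rest) characters for the remaining symbols.
    for i in range(1, len(s) - len(rest) + 1):
        total += count_trees(head, s[:i]) * count_body(rest, s[i:])
    return total


if __name__ == "__main__":
    print(count_trees("S", "aab"))   # more than 1 means the grammar is ambiguous
```

For w = aab one tree uses S → AB (with A deriving aa) and the other uses S → aaB.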
Associativity and Precedence Priority in CFG:
Example:
E → E+E| E-E
E →E*E
E →a|b|c
Associativity:
Let us consider the string : a + b + c
Parse Tree for LMD1: Parse Tree for LMD2:
Two different parse trees exist because the grammar does not fix the associativity of ‘+’. In the string a + b + c there is an operator on either side of the operand ‘b’, and the grammar does not say which one ‘b’ should be associated with: the operator on its left (left associative) or the operator on its right (right associative). Since ‘+’ is normally left associative, the first parse tree is the correct one, where the leftmost ‘+’ is evaluated first.
How to resolve the associtivity rules:
E →E+E
E →a|b|c
Here the grammar does not constrain the direction of growth, i.e. the parse tree can grow either to the left or to the right.
The first parse tree grows to the left, which means it is left associative. The second parse tree grows to the right, i.e. it is right associative.
The normal associativity rule is left associative, so we have to prevent the parse tree from growing to the right by modifying the above grammar as:
E →E+I|I
I→ a | b | c
The parse tree corresponding to the string: a+b+c:
The parse tree grows to the left since the grammar is left recursive, therefore the operator is left associative. Only one parse tree exists for the given string, so the grammar is un-ambiguous.
Note: For the operators to be left associative, grammar should be left recursive. Also for the
operators to be right associative, grammar should be right recursive.
Left Recursive grammar: A production in which the leftmost symbol of the body is same as the
non-terminal at the head of the production is called a left recursive production.
Example: E → E + T
Right Recursive grammar: A production in which the rightmost symbol of the body is same as
the non-terminal at the head of the production is called a right recursive production.
Example: E → T + E
Precedence of operators in CFG:
Let us consider the string: a + b * c
LMD 1 for the string: a+b*c LMD 2 for the string: a+b*c
The first parse tree is valid, because the higher precedence operator ‘*’ is evaluated before ‘+’ (see the lower level of the parse tree, where ‘*’ is evaluated first). The second parse tree is not valid, since the expression containing ‘+’ is evaluated first. We get two parse trees here because precedence is not taken care of.
So if we take care of the associativity and precedence of operators in a CFG, the grammar becomes un-ambiguous.
NOTE:
Normal precedence rule: If we have operators such as +, –, *, / and exponentiation, the highest precedence operator (exponentiation) is evaluated first.
Next the operators * and / are evaluated. Finally the least precedence operators + and – are evaluated.
Normal Associativity rule: Grammar should be left associative.
E →E+T|T
Similarly at the second level, we have to generate all ‘*’s.
T → T * F ; * is left associative.
If the expression does not contain any ‘*’s, then we have to bypass the grammar T → T * F
T → F
Finally the second level grammar is
T →T*F|F
Third level:
F →a|b|c
So the resultant un-ambiguous grammar is:
E →E+T|T
T →T*F|F
F →a|b|c
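To see the effect of the layered grammar, here is a small hand-written parser sketch for E → E+T | T, T → T*F | F, F → a | b | c. It is only an illustration under stated assumptions: the left-recursive rules are implemented as loops (a standard way to keep left associativity in hand-written code), the result is an operator tree rather than a full parse tree, and the name parse is hypothetical.

```python
def parse(text):
    """Build an operator tree for E -> E+T | T, T -> T*F | F, F -> a | b | c."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def factor():                         # F -> a | b | c
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in ("a", "b", "c"):
            raise SyntaxError(f"operand expected at position {pos}")
        pos += 1
        return tokens[pos - 1]

    def term():                           # T -> T*F | F, written as a loop
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, factor())  # tree built so far is the left operand
        return node

    def expr():                           # E -> E+T | T, written as a loop
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return tree


if __name__ == "__main__":
    print(parse("a+b*c"))
    print(parse("a+b+c"))
```

In the tree for a+b*c the ‘*’ node sits below the ‘+’ node (higher precedence), and in the tree for a+b+c the left ‘+’ sits below the right one (left associativity).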
So the operator which is closest to the start symbol has least precedence and the operator which is
farthest away from start symbol has the highest precedence.
Un-Ambiguous Grammar:
For a grammar to be un-ambiguous we have to resolve the two properties such as:
i. Associativity of operators: This can be resolved by writing the grammar recursion.
ii. Precedence of operators: can be resolved by writing the grammar in different levels.
Is the following grammar ambiguous?
If the grammar is ambiguous, obtain the un-ambiguous grammar assuming normal precedence and associativity.
E →E+E
E →E*E
E →E/E
E →E-E
E → ( E ) | a | b | c
Answer:
Let us consider the string: a + b * c
LMD 1 for the string: a+b*c LMD 2 for the string: a+b*c
For the given string there exists two different parse trees, by applying LMD twice. So the above
grammar is ambiguous.
The equivalent un-ambiguous grammar is obtained by writing all the operators as left associative
and writing the operators +, – at the first level and *, / at the next level.
Equivalent un-ambiguous grammar:
E →E+T|E–T|T
T →T*F|T/F|F
F → ( E) | a | b | c
Is the following grammar ambiguous?
If the grammar is ambiguous, obtain the un-ambiguous grammar assuming the operators + and – are left associative and * and / are right associative with normal precedence.
E →E+E
E →E*E
E →E/E
E →E-E
E → (E ) | a | b| c
Ambiguous grammar------- see the previous answer.
Equivalent un-ambiguous grammar:
E →E+T|E–T|T
T →F*T|F/T|F
F → ( E) | a | b | c
LMD 1 for the string ‘aab’: LMD 2 for the string ‘aab’:
RMD 1 for the string ‘aab’: RMD 2 for the string ‘aab’:
The above grammar is ambiguous, since we are getting two parse trees for the same string ‘aab’ by
applying LMD twice.
Two LMDs:
means the evaluation starts from right side; therefore the operator is right associative.
Show that the following grammar is ambiguous. Also find the un-ambiguous grammar equivalent to
the grammar by normal precedence and associative rules.
E → E + E | E - E
E → E * E | E / E
E → E ^ E
E → ( E ) | a | b
Answer:
We already proved that the above grammar is ambiguous
Equivalent Un-ambiguous grammar:
E → E + T | E – T | T
T → T * F | T / F | F
F → G ^ F | G
G → ( E ) | a | b
The given string has two parse trees by applying LMD twice so the grammar is ambiguous;
Show that the following grammar is ambiguous using the string “ ibtibtaea”
S → iCtS | iCtSeS | a
C→ b
Answer:
String w = ibtibtaea
The given string has two parse trees by applying LMD twice:
stmt → if expr then stmt | if expr then stmt else stmt | other
Terminals are keywords if, then and else.
Non terminals are expr and stmt.
Here “other” stands for any other statement. According to this grammar one of the compound
conditional statement can be written as
if E1 then S1 else if E2 then S2 else S3
It has the parse tree as shown below:
In all programming languages with conditional statements of this form, the first parse tree is
preferred. The general rule is match each else with the closest unmatched then.
Unambiguous grammar for this if else statements:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt
Unambiguous grammar :
S → M |U
M → iEtMeM |a
U → iEtS | iEtMeU
E → b
LEFT RECURSION
A production in which the leftmost symbol of the body is same as the non-terminal at the head of
the production is called a left recursive production.
Example: E → E + T
Immediate Left recursive production:
A production of the form A → Aα is called an immediate left recursive production. Consider a
non-terminal A with two productions
A → Aα | β
Where α and β are sequence of terminals and non-terminals that do not start with A.
Repeated application of this production results in sequence of α’s to the right of A. When A is
finally replaced by β, we have β followed by a sequence of zero or more αs.
Therefore a non-left recursive production for A → Aα | β is given by
A → βA’
A’ → αA’ | ε
Note: In general we can eliminate any immediate left recursive production of the form
A → Aα1 | A α2 | Aα3 ………… | Aαm | β1 | β2| β3|…………| βn
By replacing A production by
A → β1A’ | β2 A’| β3 A’|…………| βn A’
A’ → α1 A’ | α2 A’| α3 A’| …………..|αm A’ | ε
no βi begins with A
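The rule above translates almost directly into code. The following is a minimal sketch, assuming productions are stored as tuples of symbols, the empty tuple stands for ε, and the new non-terminal is named by appending an ASCII apostrophe; the function name eliminate_immediate is hypothetical.

```python
EPSILON = ()   # the empty body stands for ε


def eliminate_immediate(head, bodies):
    """Rewrite A -> Aα1 | ... | Aαm | β1 | ... | βn without left recursion."""
    new_head = head + "'"
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:                       # nothing to do for this non-terminal
        return {head: list(bodies)}
    return {
        # A  -> β1 A' | β2 A' | ... | βn A'
        head: [beta + (new_head,) for beta in betas],
        # A' -> α1 A' | α2 A' | ... | αm A' | ε
        new_head: [alpha + (new_head,) for alpha in alphas] + [EPSILON],
    }


if __name__ == "__main__":
    # E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
    print(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))
```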
What is left recursion?
A grammar is left recursive if it has a non-terminal A such that there is a derivation A +=> Aα (one or more steps) for some string α.
Top down parsing methods cannot handle left recursive grammars, so a transformation is needed to eliminate left recursion.
Left recursion that only shows up after two or more derivation steps (indirect left recursion) can be eliminated using the following algorithm.
Algorithm to eliminate left recursion from a grammar having no ε production:
Write an algorithm to eliminate left recursion from a grammar.
1. Arrange the non-terminals in some order A1, A2, . . . , An
2. for ( each i from 1 to n )
{
S → Aa | b
A → Ac| Sd | a
By applying elimination algorithm,
Arrange the non-terminals as A1 = S and A2 = A
Since there is no immediate left recursion among S production, so nothing happens during the
outer loop for i =1.
For i =2, we substitute for S in A → Sd to obtain the following A productions.
A → Ac| Aad | bd | a
Eliminating the immediate left recursion among these A- productions yields the following
grammar
S → Aa | b
A → bdA’| aA’
A’ → cA’| adA’ | ε
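The worked example above can be reproduced with a short sketch of the standard textbook procedure: for each non-terminal in the chosen order, substitute the productions of earlier non-terminals at the front of a body, then remove immediate left recursion. The function names and the tuple representation are assumptions made for illustration, and the grammar is assumed to have no ε-productions and no cycles.

```python
def eliminate_immediate(head, bodies):
    """A -> Aα | β  becomes  A -> βA',  A' -> αA' | ε  (ε is the empty tuple)."""
    new_head = head + "'"
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}
    return {head: [b + (new_head,) for b in betas],
            new_head: [a + (new_head,) for a in alphas] + [()]}


def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, processing `order` in turn."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Replace Ai -> Aj γ by Ai -> δ γ for every Aj -> δ
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


if __name__ == "__main__":
    grammar = {"S": [("A", "a"), ("b",)],
               "A": [("A", "c"), ("S", "d"), ("a",)]}
    for head, bodies in eliminate_left_recursion(grammar, ["S", "A"]).items():
        print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```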
C → CAB’CB| abB’CB | CC | aB | a
Eliminating the immediate left recursion among these C- productions results in new C
productions as
C → abB’CBC’ | aBC’ |aC’
C’ → AB’CB C’ | CC’ | ε
The equivalent non- left recursive grammar is given by:
A → BC | a
B → CAB’| abB’
B’ → CbB’ | ε
C → abB’CBC’ | aBC’ |aC’
C’ → AB’CB C’ | CC’ | ε
Eliminate left recursion from the following grammar.
Lp → no | Op Ls
Op → + | – | *
Ls → Ls Lp | Lp
For i = 1 and 2 nothing happens to the production Lp and Op.
For i= 3
By removing immediate left recursion,
Ls → Lp Ls’
Ls’ → Lp Ls’ | ε
The equivalent non- left recursive grammar is given by:
Lp → no | Op Ls
Op → + | – | *
Ls → Lp Ls’
Ls’ → Lp Ls’ | ε
S → aB | aC | Sd | Se
B → bBc| f
C → g
For i =1 , results in a new S productions as
S → aB S’ | aC S’
S’ → d S’ | eS’ |ε
For i =2 nothing happens to B productions, B → bBc| f
For i =3 nothing happens to C productions C → g
The equivalent non- left recursive grammar is given by:
S → aB S’ | aC S’
S’ → d S’ | eS’ |ε
B → bBc| f
C → g
LEFT FACTORING (Non-deterministic to Deterministic CFG conversion)
It is a grammar transformation used in parsers. When the choice between two alternative A-productions is not clear, we can rewrite the productions to defer the decision until enough of the input has been seen to make the right choice.
A → αβ1 | αβ2 | ……….. | αβn | γ
By left factoring this grammar, we get
A → αA’ | γ
A’ → β1 | β2 | …………….. | βn
γ stands for the other alternatives that do not begin with α.
A predictive parser (a top-down parser without backtracking) requires the grammar to be left-factored.
What is left factoring?
Left factoring is removing the common left factor that appears in two or more productions of the
Same non-terminal.
Example: S → i EtSeS | iEtS | a
E→b
S → i EtSS’ | a
S’ → eS | ε
E→b
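One round of left factoring can be sketched as follows. This is an illustrative helper, not a complete algorithm: it groups the alternatives of one non-terminal by their first symbol, pulls out the longest common prefix of each group, and names new non-terminals by appending ASCII apostrophes. The function names and the tuple representation (empty tuple for ε) are assumptions.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if all(sym == symbols[0] for sym in symbols):
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)


def left_factor_once(head, bodies):
    """Apply A -> αβ1 | αβ2 | γ  =>  A -> αA' | γ,  A' -> β1 | β2 once."""
    groups = {}
    for body in bodies:
        key = body[0] if body else None
        groups.setdefault(key, []).append(body)

    result = {head: []}
    fresh = head + "'"
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)          # nothing to factor here
            continue
        prefix = common_prefix(group)
        result[head].append(prefix + (fresh,))
        result[fresh] = [body[len(prefix):] for body in group]
        fresh += "'"
    return result


if __name__ == "__main__":
    s_bodies = [("i", "E", "t", "S", "e", "S"), ("i", "E", "t", "S"), ("a",)]
    print(left_factor_once("S", s_bodies))
```

A remaining common prefix inside the new non-terminal would need another round of the same step.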
Perform left factoring for the grammar.
E → E+T|T
T → id | id [ ] | id [ X ]
X → E,E|E
The equivalent non-left recursive grammar is given by:
E → TE’
E’ → +TE’ | ε
T → id | id [ ] | id [ X ]
X → E,E|E
After left factoring the grammar, we get
E → TE’
E’ → +TE’ | ε
T → id T’
T’ → ε | [ ] | [ X ]
X → E X’
X’ → ,E|ε
Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root (top) and working down towards the leaves (bottom).
Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input
string.
At each step of a top-down parse, the key problem is that of determining the production to
be applied for a non-terminal, say A.
Once an A-production is chosen, the rest of the parsing process consists of "matching" the
terminal symbols in the production body with the input string.
RECURSIVE-DESCENT PARSING
Backtracking is needed (If a choice of a production rule does not work, we backtrack to try
other alternatives.)
It is a general parsing technique, but not widely used.
Not an efficient parsing method.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop, so we
have to eliminate left recursion from a grammar
Recursive-Descent Parsing Algorithm:
Explain Recursive-Descent Parsing Algorithm.
void A ( )
{
1.   Choose an A-production, A → X1 X2 X3 … Xk ;
2.   for ( i = 1 to k )
     {
3.       if ( Xi is a non-terminal )
4.           call procedure Xi ( ) ;
5.       else if ( Xi equals the current input symbol a )
6.           advance the input to the next symbol ;
7.       else /* an error has occurred */ ;
     }
}
The leftmost leaf labeled c, matches the first symbol of input w, so we advance the input pointer to
a, the second symbol of w, and consider the next leftmost leaf labeled A.
Expand A using the first alternative A → a b to obtain the following tree:
Now we have a match for the second input symbol a, with the leftmost leaf labeled a, so we
advance the input pointer to d, third input symbol of w.
Now compare the current input symbol d against the next leaf labeled b. Since b does not match d
,we report failure and go back to A (Back tracking) to see whether there is another alternative for
A that has not been tried, but that might produce a match.
Now the leftmost leaf labeled a matches the current input symbol a, ie: the second symbol of w,
then advance the pointer to the next input symbol d.
Now the next leaf d matches the third input symbol d, later when it finds $ nothing is left out to be
read in the tree. Since it produces a parse tree for the string w, it halts and announce successful
completion of parsing.
return success
Write a recursive descent parser for the grammar:
S → aBc
B → bc | b
Input: abc
Begin with a tree consisting of a single node labeled S with input pointer pointing to first input
symbol a.
Since the input a matches with leftmost leaf labeled a, advance the pointer to next input symbol
b.
Expand B using the alternative B → bc
We have a match for the second input symbol b. Moving the pointer again, we find a match for the third symbol c. Now the pointer is pointing to $, indicating the end of the string, but in the tree there is one more symbol c to be read, so this alternative fails.
The pointer is reset to position 2, the second alternative B → b is tried, and the following tree is generated:
Now the pointer at the 2nd symbol finds a match, then advances to the 3rd symbol and finds a match; when it then encounters ‘$’, nothing is left to be read in the tree. Thus it halts and announces successful completion of parsing.
return success
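The backtracking walk-through above can be sketched with Python generators: each call yields every input position at which a symbol could finish, so a failed alternative simply falls through to the next one. The names GRAMMAR, match_symbol, match_body and accepts are illustrative assumptions, and the grammar must not be left recursive or the recursion would not terminate.

```python
# S -> aBc,  B -> bc | b   (alternatives are tried in the listed order)
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}


def match_symbol(symbol, text, pos):
    """Yield every position where `symbol` can end when started at `pos`."""
    if symbol in GRAMMAR:                       # non-terminal: try each alternative
        for body in GRAMMAR[symbol]:
            yield from match_body(body, text, pos)
    elif text.startswith(symbol, pos):          # terminal: must match the input
        yield pos + len(symbol)


def match_body(body, text, pos):
    """Yield every position where the whole production body can end."""
    if not body:
        yield pos
        return
    for mid in match_symbol(body[0], text, pos):
        # If the rest of the body fails, the loop backtracks to the next `mid`.
        yield from match_body(body[1:], text, mid)


def accepts(text, start="S"):
    """True if some sequence of choices consumes the entire input."""
    return any(end == len(text) for end in match_symbol(start, text, 0))


if __name__ == "__main__":
    print(accepts("abc"))
```

On input abc the alternative B → bc consumes "bc" and leaves nothing for the final c, so the search backtracks to B → b, exactly as in the trace.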
Show that recursive descent parsing fails for the input string ‘acdb’ for the grammar.
S → aAb
A → cd | c
The first input symbol a matches with left most leaf a and advance the pointer to next input
symbol c.
Now expand A using the second alternative A → c
We have a match for second input symbol c with left leaf node c. Advance the pointer to the next
input symbol d.
Now compare the input symbol d against the next leaf, labeled b. Since b does not match d, we
report failure and go back to A to see another alternative for A and reset the pointer to position 2.
FOLLOW ( ) FUNCTION:
FOLLOW (A) is defined as the set of terminal symbols that can appear immediately to the right of A in some sentential form, i.e.
FOLLOW (A) = { a | S *=> αAaβ }, where α and β are strings of grammar symbols (terminals or non-terminals).
Rules used in computation of FOLLOW function:
1. For the start symbol S place ‘$’ in FOLLOW (S).
2. If there is a production A → αBβ, then everything in FIRST (β) except ε is in FOLLOW (B).
3. If there is a production A → αB, or a production A → αBβ where FIRST (β) contains ε, then everything in FOLLOW (A) is in FOLLOW (B).
T →T * F|F
F → (E) | id
The above grammar contains left recursive productions, so by eliminating left recursive, grammar
G becomes:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ |ε
F → (E) | id
Computation of FIRST set:
From T → FT’
FOLLOW (T’) = FOLLOW (T) = { +, ), $ }
From T’ → *FT’ | ε
FOLLOW (T’) = FOLLOW (T’)
Therefore FOLLOW (T’) = { +, ), $ }
From T → FT’
FOLLOW (F) = ( FIRST (T’) – { ε } ) U FOLLOW (T) = { *, +, ), $ }, i.e. by applying the 3rd rule, since β = T’ can derive ε
From T’ → *FT’ | ε
FOLLOW (F) = ( FIRST (T’) – { ε } ) U FOLLOW (T’)
Therefore FOLLOW (F) = { *, +, ), $ }
NOTE:
For any non-terminal, FOLLOW set is computed by selecting the productions in which, that non-
terminal appears on RHS of production.
Non-terminal   FIRST         FOLLOW
E              { (, id }     { ), $ }
T              { (, id }     { +, ), $ }
F              { (, id }     { *, +, ), $ }
E’             { +, ε }      { ), $ }
T’             { *, ε }      { +, ), $ }
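The table can be reproduced by iterating the FIRST and FOLLOW rules until nothing changes. The sketch below is one possible fixed-point implementation for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id; all function and variable names are assumptions made for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                       # every symbol can vanish
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")                                   # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}                       # rule 2
                    if EPS in tail:
                        new |= follow[head]                  # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    first = compute_first()
    follow = compute_follow(first)
    for nt in GRAMMAR:
        print(nt, sorted(first[nt]), sorted(follow[nt]))
```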
T →T * F|F
F → (E) | id
F { (, id } { *, +, ), $ }
FOLLOW ( E ) = { ) } from F → (E) and FOLLOW ( E ) = { + } from E → E +
T .
FOLLOW ( T ) = {*} from T → T * F and FOLLOW ( T ) = FOLLOW{E} from
E →E + T |T
Stmt_seq’ → ; Stmt_sequence | ε
Stmt → s
Non-terminal   FIRST         FOLLOW
Stmt_sequence  { s }         { $ }
Stmt_seq’      { ;, ε }      { $ }
Stmt           { s }         { ;, $ }
Construct the predictive parsing table by making necessary changes to the grammar given
below:
E →E + T |T
T →T * F|F
F → (E) | id
Also check whether the modified grammar is LL(1) grammar or not.
The above grammar contains left recursive productions, so we eliminate left recursive
productions.
T              { (, id }     { +, ), $ }
F              { (, id }     { *, +, ), $ }
E’             { +, ε }      { ), $ }
T’             { *, ε }      { +, ), $ }
Non-terminal  id          +           *           (           )           $
E             E → TE’                             E → TE’
E’                        E’ → +TE’                           E’ → ε      E’ → ε
T             T → FT’                             T → FT’
T’                        T’ → ε      T’ → *FT’               T’ → ε      T’ → ε
F             F → id                              F → (E)
The above modified grammar is an LL(1) grammar, since each parsing table entry uniquely identifies a production or signals an error.
Construct the LL(1) parsing table for the grammar given below:
E →E * T |T
T → id + T | id
T → id + T | id
E’             { *, ε }      { $ }
T              { id }        { *, $ }
T’             { +, ε }      { *, $ }
Construction of predictive parsing table:
Non-terminal  id          +           *           $
E             E → TE’
E’                                    E’ → *TE’   E’ → ε
T             T → idT’
T’                        T’ → +T     T’ → ε      T’ → ε
Do necessary modifications and Construct the LL(1) parsing table for the resultant grammar .
By eliminating left recursive productions:
E → TE’
E’ → ATE’ | ε
A→+|-
T → FT’
T’ → MFT’ | ε
M →*
F → (E) | num
E’             { +, -, ε }   { ), $ }
T              { (, num }    { +, -, ), $ }
T’             { *, ε }      { +, -, ), $ }
A              { +, - }      { (, num }
M              { * }         { (, num }
F              { (, num }    { *, +, -, ), $ }
Construction of predictive parsing table (columns are input symbols):
Non-terminal  num         +           -           *           (           )           $
E             E → TE’                                         E → TE’
E’                        E’ → ATE’   E’ → ATE’                           E’ → ε      E’ → ε
T             T → FT’                                         T → FT’
T’                        T’ → ε      T’ → ε      T’ → MFT’               T’ → ε      T’ → ε
A                         A → +       A → -
M                                                 M → *
F             F → num                                         F → (E)
Construct the LL(1) parsing table for the grammar given below:
S → AaAb | BbBa
A →ε
B →ε
Answer:
Non-terminal   FIRST         FOLLOW
S              { a, b }      { $ }
A              { ε }         { a, b }
B              { ε }         { a, b }
Parsing Table:
Non-terminal  a             b             $
S             S → AaAb      S → BbBa
A             A → ε         A → ε
B             B → ε         B → ε
Construct the LL(1) parsing table for the grammar given below:
S →A
A → aB
B → bBC | f
C →g
Non-terminal symbol FIRST
S {a}
A {a}
B { b, f }
C {g}
Note: Since the grammar is ε- free, FOLLOW sets are not required to be computed in order to enter
the productions into the parsing table.
Parsing Table:
a b f g d
Non-terminal
S S→A
A A → aB A→d
B B → bBC B→f
C C→ g
Construct the LL(1) parsing table for the grammar given below:
S → aBDh
B → cC
C → bC | ε
D → EF
E →g|ε
F →f|ε
B              { c }         { g, f, h }
C              { b, ε }      { g, f, h }
D              { g, f, ε }   { h }
E              { g, ε }      { f, h }
F              { f, ε }      { h }
Parsing Table:
NT  a           b           c           g           f           h           $
S   S → aBDh
B                           B → cC
C               C → bC                  C → ε       C → ε       C → ε
D                                       D → EF      D → EF      D → EF
E                                       E → g       E → ε       E → ε
F                                                   F → f       F → ε
T              { (, id }     { +, ), $ }
F              { (, id }     { *, +, ), $ }
E’             { +, ε }      { ), $ }
T’             { *, ε }      { +, ), $ }
Non-terminal  id          +           *           (           )           $
E             E → TE’                             E → TE’
E’                        E’ → +TE’                           E’ → ε      E’ → ε
T             T → FT’                             T → FT’
T’                        T’ → ε      T’ → *FT’               T’ → ε      T’ → ε
F             F → id                              F → (E)
iv. The above modified grammar is an LL(1) grammar, since each parsing table entry uniquely identifies a production or signals an error.
v. Moves made by predictive parser on input id + id * id
MATCHED   STACK       INPUT             ACTION
          E$          id + id * id$
          TE’$        id + id * id$     Output E → TE’
          FT’E’$      id + id * id$     Output T → FT’
          idT’E’$     id + id * id$     Output F → id
id        T’E’$       + id * id$        match id
id        E’$         + id * id$        Output T’ → ε
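The moves in this trace follow a simple loop over a stack and the parsing table. The sketch below hard-codes the table for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id and returns the list of productions that are output; the names TABLE, NONTERMINALS and predictive_parse are assumptions for illustration, and an empty list stands for an ε body.

```python
# M[non-terminal, lookahead] = production body (empty list means ε)
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                 # top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:             # blank table entry: error
                raise SyntaxError(f"no entry for M[{top}, {look}]")
            output.append(f"{top} → {' '.join(body) if body else 'ε'}")
            stack.extend(reversed(body)) # push the body, leftmost symbol on top
        elif top == look:                # terminal (or $) matches the input
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return output


if __name__ == "__main__":
    for production in predictive_parse(["id", "+", "id", "*", "id"]):
        print("Output", production)
```

The first productions printed correspond to the ACTION column of the trace above.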
L              { (, a }      { ) }
L’             { ‘,’, ε }    { ) }
iv. The above modified grammar is LL(1) grammar, since the parsing table entry uniquely
identifies a production or signals an error.
v. Moves made by predictive parser on input (a, (a,a))
MATCHED STACK INPUT ACTION
S$ (a,(a, a))$
(L)$ (a,(a, a))$ Output S → (L)
( L)$ a,(a, a))$ match (
( SL’)$ a,(a, a))$ Output L →SL’
( aL’)$ a,(a, a))$ Output S→a
(a L’)$ ,(a, a))$ match a
(a ,SL’)$ ,(a, a))$ Output L’→,SL’
(a, SL’)$ (a, a))$ match ,
(a, (L)L‟)$ (a, a))$ Output S → (L)
(a,( L)L’)$ a, a))$ match (
(a,( SL’)L’)$ a, a))$ Output L →SL’
(a,( a L’)L’)$ a, a))$ Output S→a
(a,(a L’)L’)$ , a))$ match a
(a,(a ,SL’)L’)$ , a))$ Output L’→,SL’
(a,(a, SL’)L’)$ a))$ match ,
(a,(a, aL’)L’)$ a))$ Output S→a
(a,(a,a L’)L’)$ ))$ match a
(a,(a,a )L’)$ ))$ Output L’→ε
(a,(a,a) L’)$ )$ match )
(a,(a,a) )$ )$ Output L’→ε
(a,(a,a)) $ $ match )
A { c, ε } { d, b }
B { d, ε} {b}
Parsing Table:
S’             { e, ε }      { e, $ }
E              { b }         { t }
Non-terminal  a           b           e           i           t           $
S             S → a                               S → iEtSS’
S’                                    S’ → eS                             S’ → ε
                                      S’ → ε
E                         E → b
The above parsing table contains two production rules for M [S’, e]. So the given grammar is
not LL(1) grammar.
Here the grammar is ambiguous, and the ambiguity is manifested by a choice in what production to
use when an e (else) is seen. We can resolve this ambiguity by choosing S’ → eS.
A { a, c } { b, d }
Explain how panic mode error recovery techniques used for the following grammar:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ |ε
F → (E) | id
T              { (, id }     { +, ), $ }
F              { (, id }     { *, +, ), $ }
E’             { +, ε }      { ), $ }
T’             { *, ε }      { +, ), $ }
FT‟E’$ id$
idT‟E’$ id$
T‟E’$ $
E’$ $
$ $
NOTE:
How to determine whether a context free grammar is LL(1) without constructing the parsing table:
1. For any CFG of the form:
A → α1 | α2 | α3 | ……..
If there is no ε in any of these alternatives, find FIRST(α1), FIRST(α2), FIRST(α3) and so on, and take the intersection of these FIRST sets pair-wise.
If FIRST (αi) ∩ FIRST (αj) = Ø for every pair i ≠ j (no common terminals), then the grammar is LL(1); otherwise it is not LL(1).
Example:
Check whether the following grammar is LL(1) or not without constructing parsing table.
1. S → aSa | bS | c
Answer:
FIRST(α1) = FIRST(aSa) = {a}
FIRST(α2) = FIRST(bS) = {b}
FIRST(α3) = FIRST(c) = {c}
{a} ∩ {b} = Ø, {a} ∩ {c} = Ø and {b} ∩ {c} = Ø, so the FIRST sets are pair-wise disjoint.
Therefore the given grammar is an LL(1) grammar.
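The pair-wise test is easy to automate once the FIRST sets of the alternatives are known. The helper below is a small sketch (the name pairwise_disjoint is an assumption); it also shows why a single intersection of all the sets is not enough.

```python
from itertools import combinations


def pairwise_disjoint(first_sets):
    """True if no two FIRST sets share a terminal."""
    return all(a.isdisjoint(b) for a, b in combinations(first_sets, 2))


# S -> aSa | bS | c : FIRST sets of the three alternatives
ok = pairwise_disjoint([{"a"}, {"b"}, {"c"}])

# Alternatives with FIRST sets {a}, {a}, {b}: the intersection of all three
# is empty, but the first two overlap, so the choice on input 'a' is not unique.
overlap = pairwise_disjoint([{"a"}, {"a"}, {"b"}])

if __name__ == "__main__":
    print(ok, overlap)
```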
Check whether the following grammar is LL(1) or not without constructing parsing table
S → iCtSS1| bS | a
S1 → eS | ε
C→b