6_Simplification of CFG
Forms
Why and How to “simplify”
CFGs?
Why simplify?
1) Simpler grammars are easier to understand
2) Simplification can lead to faster parsing
3) Restricted forms are useful for some parsing algorithms
4) Restricted forms can give more knowledge about derivations
Three ways to simplify/clean a CFG
(clean)
1. Eliminate useless symbols
(simplify)
2. Eliminate ε-productions: A ➔ ε
3. Eliminate unit productions: A ➔ B
Eliminating useless symbols
Grammar cleanup
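Eliminating useless symbols
A symbol X is reachable if there exists:
◼ S ➔* αXβ
A symbol X is generating if there exists:
◼ X ➔* w, for some w ∈ T*
For a symbol X to be useful, it has to be both reachable and generating.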
Algorithm to detect useless symbols
1. First, eliminate all symbols that are not generating
2. Next, eliminate all symbols that are not reachable
(The order of these two steps matters.)
Example: Useless symbols
◼ S➔AB | a
◼ A➔ b
1. A, S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminating B… (i.e., remove all productions that involve B)
1. S➔ a
2. A➔b
4. Now, A is not reachable and therefore is useless
Algorithm to find all generating symbols
◼ Given: G=(V,T,P,S)
◼ Basis:
◼ Every symbol in T is obviously generating.
◼ Induction:
◼ Suppose for a production A➔α, where every symbol in α is generating
◼ Then, A is also generating
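The basis and induction above amount to a fixed-point computation. A minimal Python sketch follows; the grammar encoding (a dict from each variable to a list of bodies, one character per symbol) and the name `generating_symbols` are assumptions made for illustration, not part of the slides.

```python
def generating_symbols(productions, terminals):
    """Return the set of generating symbols of a CFG.

    productions: dict mapping a variable to a list of bodies; each body is a
                 string with one character per symbol ("" is the empty body).
    terminals:   set of terminal symbols.
    """
    generating = set(terminals)      # basis: every terminal is generating
    changed = True
    while changed:                   # induction: repeat until nothing new is found
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            # head is generating if some body uses only generating symbols
            if any(all(sym in generating for sym in body) for body in bodies):
                generating.add(head)
                changed = True
    return generating


# Grammar S -> AB | a, A -> b (B has no productions at all)
example = {"S": ["AB", "a"], "A": ["b"]}
print(sorted(generating_symbols(example, {"a", "b"})))  # ['A', 'S', 'a', 'b']
```

B never enters the set, which matches the observation that B is not generating.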
Algorithm to find all reachable symbols
(X is reachable if S ➔* αXβ)
◼ Given: G=(V,T,P,S)
◼ Basis:
◼ S is obviously reachable (from itself)
◼ Induction:
◼ Suppose for a production A➔α1α2…αk, where A is reachable
◼ Then, all symbols on the right hand side, {α1, α2, …, αk}, are also reachable.
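Reachability can be computed with a simple worklist starting from S. The sketch below assumes the same illustrative encoding as before (dict of variable to list of one-character-per-symbol bodies); `reachable_symbols` is a hypothetical helper name.

```python
def reachable_symbols(productions, start):
    """Return every symbol (variable or terminal) reachable from `start`."""
    reachable = {start}              # basis: the start symbol is reachable
    worklist = [start]
    while worklist:
        head = worklist.pop()
        # induction: every symbol in a body of a reachable head is reachable
        for body in productions.get(head, []):
            for sym in body:
                if sym not in reachable:
                    reachable.add(sym)
                    worklist.append(sym)
    return reachable


# Grammar left after removing the non-generating symbol B: S -> a, A -> b
cleaned = {"S": ["a"], "A": ["b"]}
print(sorted(reachable_symbols(cleaned, "S")))  # ['S', 'a']
```

Here A is not in the result, so A is unreachable and therefore useless.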
Example 1
◼ S -> AB | a
A -> BC | b
B -> aB | C
C -> aC | B
Generating Symbols = {a, b, S, A}
After eliminating B and C:
S -> a
A -> b
Reachable Symbols = {S, a}
Final Grammar is:
S -> a
Example 2
◼ S -> AC | B
A -> a
C -> c | BC
E -> aA | e
Generating Symbols = {a, c, e, A, C, E, S}
After eliminating B:
S -> AC
A -> a
C -> c
E -> aA | e
Reachable Symbols = {S, A, C, a, c}
Final Grammar is:
S -> AC
A -> a
C -> c
Example 3
◼ S -> AB | AC
A -> aAb | bAa | a
B -> bbA | aaB | AB
C -> abCA | aDb
D -> bD | aC
Generating Symbols = {a, b, A, B, S}
After eliminating C and D:
S -> AB
A -> aAb | bAa | a
B -> bbA | aaB | AB
Reachable Symbols = {S, A, B, a, b}
Final Grammar is:
S -> AB
A -> aAb | bAa | a
B -> bbA | aaB | AB
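The two passes can be combined into one cleanup routine and checked against Example 3. This is a sketch under an assumed encoding (dict of variable to list of bodies, one character per symbol); `remove_useless` is a hypothetical name.

```python
def remove_useless(productions, terminals, start):
    """Drop non-generating symbols first, then unreachable ones."""
    # Step 1: find generating symbols (terminals are the basis).
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in gen and any(all(s in gen for s in b) for b in bodies):
                gen.add(head)
                changed = True
    # Keep only productions built entirely from generating symbols.
    step1 = {h: [b for b in bodies if all(s in gen for s in b)]
             for h, bodies in productions.items() if h in gen}
    # Step 2: find symbols reachable from the start symbol in what is left.
    reach, work = {start}, [start]
    while work:
        for body in step1.get(work.pop(), []):
            for s in body:
                if s not in reach:
                    reach.add(s)
                    work.append(s)
    return {h: bodies for h, bodies in step1.items() if h in reach}


example3 = {
    "S": ["AB", "AC"],
    "A": ["aAb", "bAa", "a"],
    "B": ["bbA", "aaB", "AB"],
    "C": ["abCA", "aDb"],
    "D": ["bD", "aC"],
}
print(remove_useless(example3, {"a", "b"}, "S"))
```

Running the reachability pass before the generating pass could leave symbols such as A in `S -> AB | a, A -> b`, which is why the sketch keeps the order used in the slides.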
Eliminating ε-productions
A ➔ ε
Eliminating ε-productions
Caveat: It is not possible to eliminate ε-productions for languages which include ε in their word set
So we will target the grammar for the rest of the language
Theorem: If G=(V,T,P,S) is a CFG for a language L, then L − {ε} has a CFG without ε-productions
Algorithm to detect all nullable
variables
◼ Basis:
◼ If A➔ε is a production in G, then A is nullable
(note: A can still have other productions)
◼ Induction:
◼ If there is a production B➔ C1C2…Ck,
where every Ci is nullable, then B is also
nullable
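The nullable-variable detection above can be sketched as another fixed-point loop. The encoding is an assumption for illustration: a dict from variable to list of bodies, one character per symbol, with the empty string "" standing for an ε body.

```python
def nullable_variables(productions):
    """Return the set of variables A with A =>* ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # Basis: an empty body makes all(...) true.
            # Induction: a body whose symbols are all nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# S -> ABA, A -> aA | ε, B -> bB | ε
grammar = {"S": ["ABA"], "A": ["aA", ""], "B": ["bB", ""]}
print(sorted(nullable_variables(grammar)))  # ['A', 'B', 'S']
```

Terminals never enter the set, so any body containing a terminal cannot make its head nullable.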
Eliminating ε-productions
Given: G=(V,T,P,S)
Algorithm:
1. Detect all nullable variables in G
2. Then construct G1=(V,T,P1,S) as follows:
i. For each production of the form A➔X1X2…Xk, where k≥1, suppose m out of the k Xi’s are nullable symbols
ii. Then G1 will have 2^m versions of this production, i.e., all combinations where each nullable Xi is either present or absent (except that the version with an empty body, A➔ε, is left out when m=k)
iii. Alternatively, if a production is of the form A➔ε, then remove it
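A possible Python sketch of this construction is shown below. The encoding (dict of variable to list of bodies, one character per symbol, "" for an ε body) and the name `eliminate_epsilon` are assumptions for illustration. The sketch never emits an empty body, so a production whose symbols are all nullable yields one version fewer than the full set of present/absent combinations.

```python
from itertools import product


def eliminate_epsilon(productions):
    """Return productions for L(G) - {ε} without ε-productions."""
    # Step 1: detect all nullable variables.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(s in nullable for s in b) for b in bodies
            ):
                nullable.add(head)
                changed = True
    # Step 2: for every body, keep or drop each nullable symbol.
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                # Skip the empty body and avoid duplicates.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# S -> AB, A -> aAA | ε, B -> bBB | ε
g = {"S": ["AB"], "A": ["aAA", ""], "B": ["bBB", ""]}
print(eliminate_epsilon(g))
```

A variable whose only production was A➔ε ends up with an empty list of bodies; removing it is left to the useless-symbol cleanup.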
Example: Eliminating ε-productions
◼ Let L be the language represented by the following CFG G:
i. S➔AB
ii. A➔aAA | ε
iii. B➔bBB | ε
Goal: To construct G1, which is the grammar for L−{ε}
Simplified grammar (A, B and S are nullable):
i. S➔AB | A | B
ii. A➔aAA | aA | a
iii. B➔bBB | bB | b
Example: Eliminating ε-productions
◼ Let L be the language represented by the following CFG G:
i. S➔aSb | aAb
ii. A➔ε
Goal: To construct G1, which is the grammar for L−{ε}
Simplified grammar (A is nullable; once A➔ε is removed, A has no productions left, so S➔aAb is dropped as well):
S➔aSb | ab
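Example: Eliminating ε-productions
◼ Let L be the language represented by the following CFG G:
i. S➔ABA
ii. A➔aA | ε
iii. B➔bB | ε
Goal: To construct G1, which is the grammar for L−{ε}
Simplified grammar (A, B and S are nullable):
i. S➔ABA | AB | BA | AA | A | B
ii. A➔aA | a
iii. B➔bB | b
Example: Eliminating ε-productions
◼ S -> a | Xb | aYa
X -> Y | ε
Y -> b | X
Simplified grammar (X and Y are nullable):
S -> a | Xb | b | aYa | aa
X -> Y
Y -> b | X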
Eliminating unit productions
A ➔ B, where B has to be a variable
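The Unit Pair Algorithm: to remove unit productions
◼ Suppose A➔B1➔B2➔ … ➔Bn➔α
◼ Action: Replace all intermediate productions to produce α directly
◼ i.e., A➔α; B1➔α; … Bn➔α
◼ Definition: (A,B) is a unit pair if A➔*B using only unit productions
◼ Basis: Every pair (A,A) is a unit pair; if A➔B is a unit production, then (A,B) is a unit pair
◼ Induction: If (A,B) and (B,C) are unit pairs, then (A,C) is also a unit pair.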
The Unit Pair Algorithm:
to remove unit productions
Input: G=(V,T,P,S)
Goal: to build G1=(V,T,P1,S) devoid of unit
productions
Algorithm:
1. Find all unit pairs in G
2. For each unit pair (A,B) in G:
1. Add to P1 a new production A➔α, for every
B➔α which is a non-unit production
2. If a resulting production is already there in P1,
then there is no need to add it.
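The unit pair algorithm might be sketched in Python as follows. The encoding is an assumption for illustration: a dict from variable to list of bodies, one character per symbol, where every variable appears as a key, so a body is a unit production exactly when it is a single character that is also a key.

```python
def eliminate_unit_productions(productions):
    """Return an equivalent set of productions without unit productions."""
    variables = set(productions)

    def is_unit(body):
        return len(body) == 1 and body in variables

    # Basis: (A, A) is a unit pair for every variable A.
    pairs = {(a, a) for a in variables}
    # Induction: extend (A, B) along each unit production B -> C.
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in productions[b]:
                if is_unit(body) and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    # For each unit pair (A, B), copy every non-unit body of B to A.
    result = {a: [] for a in productions}
    for a, b in sorted(pairs):
        for body in productions[b]:
            if not is_unit(body) and body not in result[a]:
                result[a].append(body)
    return result


# S -> Aa | B, B -> A | bb, A -> a | bc | B
g = {"S": ["Aa", "B"], "B": ["A", "bb"], "A": ["a", "bc", "B"]}
print(eliminate_unit_productions(g))
```

Because (A,A) is always a unit pair, the original non-unit productions of A are carried over automatically.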
Examples
◼ S -> Aa | B
B -> A | bb
A -> a | bc | B
Result:
S -> Aa | bb | a | bc
B -> bb | a | bc
A -> a | bc | bb
◼ S -> A | bb
A -> B | b
B -> S | a
Result:
S -> a | b | bb
(A and B get the same bodies, but they are no longer reachable from S)
◼ S -> AB
A -> a
B -> C | b
C -> D
D -> E
E -> a
◼ S -> a | Xb | aYa | b | aa
X -> Y
Y -> b | X
Putting all this together…
◼ Theorem: If G is a CFG for a language that contains at least one string other than ε, then there is another CFG G1, such that L(G1)=L(G)−{ε}, and G1 has:
◼ no ε-productions
◼ no unit productions
◼ no useless symbols
◼ Algorithm:
Step 1) eliminate ε-productions
Step 2) eliminate unit productions
Step 3) eliminate useless symbols
Again, the order is important! Why?
Normal Forms
Why normal forms?
◼ If all productions of the grammar could be
expressed in the same form(s), then:
Chomsky Normal Form (CNF)
Let G be a CFG for some L−{ε}
Definition:
G is said to be in Chomsky Normal Form if all its productions are in one of the following two forms:
i. A➔BC where A,B,C are variables, or
ii. A➔a where a is a terminal
◼ G has no useless symbols
◼ G has no unit productions
◼ G has no ε-productions
CNF checklist
Is this grammar in CNF?
G1:
1. E ➔ E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T ➔ T*F | (E) | Ia | Ib | I0 | I1
3. F ➔ (E) | Ia | Ib | I0 | I1
4. I ➔ a | b | Ia | Ib | I0 | I1
Checklist:
• G has no ε-productions
• G has no unit productions
• G has no useless symbols
• But…
• the normal form for productions is violated
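The form check on the productions can be automated. The sketch below only tests the two allowed body shapes (A➔BC or A➔a); it does not look for useless symbols. The encoding (dict of variable to list of bodies, one character per symbol) and the name `cnf_violations` are assumptions for illustration.

```python
def cnf_violations(productions):
    """Return the (head, body) pairs that are not of the form A -> BC or A -> a."""
    variables = set(productions)
    bad = []
    for head, bodies in productions.items():
        for body in bodies:
            two_variables = len(body) == 2 and all(s in variables for s in body)
            one_terminal = len(body) == 1 and body not in variables
            if not (two_variables or one_terminal):
                bad.append((head, body))
    return bad


# The expression grammar G1 from the checklist
g1 = {
    "E": ["E+T", "T*F", "(E)", "Ia", "Ib", "I0", "I1"],
    "T": ["T*F", "(E)", "Ia", "Ib", "I0", "I1"],
    "F": ["(E)", "Ia", "Ib", "I0", "I1"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}
for head, body in cnf_violations(g1):
    print(head, "->", body)
```

Only `I -> a` and `I -> b` pass; every other body either is too long or mixes a variable with a terminal.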
Example #1
G:
S => AS | BABC
A => A1 | 0A1 | 01
B => 0B | 0
C => 1C | 1
G in CNF:
X0 => 0
X1 => 1
S => AS | BY1
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1
Example #2
G:
1. E ➔ E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T ➔ T*F | (E) | Ia | Ib | I0 | I1
3. F ➔ (E) | Ia | Ib | I0 | I1
4. I ➔ a | b | Ia | Ib | I0 | I1
Step (1), with a new variable Xc for each terminal c that occurs in a body of length 2 or more:
1. E ➔ E X+ T | T X* F | X( E X) | I Xa | I Xb | I X0 | I X1
2. T ➔ T X* F | X( E X) | I Xa | I Xb | I X0 | I X1
3. F ➔ X( E X) | I Xa | I Xb | I X0 | I X1
4. I ➔ a | b | I Xa | I Xb | I X0 | I X1
5. X+ ➔ +
6. X* ➔ *
7. X( ➔ (
8. X) ➔ )
9. …….
Languages with ε
◼ For languages that include ε,
◼ Write down the rest of grammar in CNF
◼ Then add production “S => ε” at the end
E.g., consider:
G:
S => AS | BABC
A => A1 | 0A1 | 01 | ε
B => 0B | 0 | ε
C => 1C | 1 | ε
G in CNF:
X0 => 0
X1 => 1
S => AS | BY1 | ε
Y1 => AY2
Y2 => BC
Return of the Pumping Lemma !!
Why pumping lemma?
◼ A result that will be useful in proving that some languages are not CFLs
◼ (just like we did for regular languages)
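Parse tree height and yield length (h = height of a parse tree for a CNF grammar, w = its yield)
Claim: |w| ≤ 2^(h−1)
Ind. Hyp: h = k−1
➔ |w| ≤ 2^(k−2)
Ind. Step: h = k
S will have exactly two children: S➔AB
The subtrees rooted at A and B have height at most k−1, so |w| ≤ 2·2^(k−2) = 2^(k−1)
Implication:
◼ If |w| ≥ 2^m, then
◼ Its parse tree’s height is at least m+1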
The Pumping Lemma for CFLs
Let L be a CFL.
Then there exists a constant N, s.t.,
◼ if z ∈ L s.t. |z|≥N, then we can write z=uvwxy, such that:
1. |vwx| ≤ N
2. vx≠ε
3. For all k≥0: u v^k w x^k y ∈ L
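Meaning: Repetition in the last m+1 variables
[Figure: parse tree for z = uvwxy with height h ≥ m+1. Its longest path runs through the variables A1, A2, …, Ah−1 and ends at the leaf Ah = a. Among the last m+1 variables on this path, two are the same: Ai = Aj, with Ai above Aj. The subtree rooted at Aj yields w, and the subtree rooted at Ai yields vwx.]
• In CNF, Ai has two children, and the child that does not lead to Aj yields a non-empty string
• Therefore, vx≠ε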
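Extending the parse tree…
[Figure: two parse trees rooted at S = A0. Replacing the subtree rooted at Ai by the subtree rooted at Aj gives z = uwy. Replacing the subtree rooted at Aj by a copy of the subtree rooted at Ai, repeatedly, gives z = u v^k w x^k y.]
==> For all k≥0: u v^k w x^k y ∈ L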
Proof contd..
• Also, since Ai’s subtree is no taller than m+1, its yield vwx is no longer than 2^m
But, 2^m = N
==> |vwx| ≤ N
Application of Pumping Lemma for CFLs
Example 1: L = {a^m b^m c^m | m>0}
Claim: L is not a CFL
Proof:
◼ Let N <== P/L constant
◼ Pick z = a^N b^N c^N
◼ Apply pumping lemma to z and show that there exists at least one other string constructed from z (obtained by pumping up or down) that is ∉ L
Proof contd…
◼ z = uvwxy
◼ As z = a^N b^N c^N and |vwx| ≤ N and vx≠ε
◼ ==> v, x cannot contain all three symbols (a,b,c)
◼ ==> we can pump up or pump down to build another string which is ∉ L
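The argument can be sanity-checked by brute force for one small value of N. The sketch below enumerates every split z = uvwxy allowed by the lemma and confirms that pumping down (k = 0) or up (k = 2) leaves the language. It is only a finite check for a chosen N, not a proof, and the function names are hypothetical.

```python
def in_language(s):
    """Membership test for L = { a^m b^m c^m | m > 0 }."""
    m = len(s) // 3
    return m > 0 and s == "a" * m + "b" * m + "c" * m


def every_split_fails(z, n):
    """True if every split z = uvwxy with |vwx| <= n and vx != ε
    produces a string outside L for k = 0 or k = 2."""
    length = len(z)
    for i in range(length + 1):                        # u = z[:i]
        for end in range(i, min(i + n, length) + 1):   # vwx = z[i:end]
            for j in range(i, end + 1):                # v = z[i:j]
                for k in range(j, end + 1):            # w = z[j:k], x = z[k:end]
                    u, v, w, x, y = z[:i], z[i:j], z[j:k], z[k:end], z[end:]
                    if not v and not x:
                        continue                       # the lemma requires vx != ε
                    pumped = [u + v * t + w + x * t + y for t in (0, 2)]
                    if all(in_language(p) for p in pumped):
                        return False                   # this split survives pumping
    return True


N = 4
z = "a" * N + "b" * N + "c" * N
print(every_split_fails(z, N))  # True
```

Since the window vwx has length at most N, it can touch at most two of the three letter blocks, so pumping always unbalances the counts.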
Example #2 for P/L application
◼ L = { ww | w is in {0,1}*}
Example 3
◼ L = {0^(k^2) | k is any integer}
Example 4
◼ L = {a^i b^j c^k | i<j<k}
CFL Closure Properties
Closure Property Results
◼ CFLs are closed under:
◼ Union
◼ Concatenation
◼ Kleene closure operator
◼ Substitution
◼ Homomorphism, inverse homomorphism
◼ reversal
◼ CFLs are not closed under:
◼ Intersection
◼ Difference
◼ Complementation
Note: Reg languages are closed under these operators
Strategy for Closure Property
Proofs
◼ First prove “closure under substitution”
◼ Using the above result, prove other closure properties
◼ CFLs are closed under:
◼ Union
◼ Concatenation
◼ Kleene closure operator
◼ Substitution (prove this first)
◼ Homomorphism, inverse homomorphism
◼ Reversal
Note: s(L) can use
a different alphabet
CFLs are closed under Substitution
IF L is a CFL and a substitution defined on L, s(L), is s.t., s(a) is a CFL for every symbol a, THEN:
◼ s(L) is also a CFL
What is s(L)?
[Figure: s maps each string w1, w2, w3, w4, … of L to the set s(w1), s(w2), s(w3), s(w4), …; s(L) is the union of these sets.]
Note: each s(w) is itself a set of strings
CFLs are closed under
Substitution
◼ G=(V,T,P,S) : CFG for L
◼ Because every s(a) is a CFL, there is a CFG for each s(a)
◼ Let Ga = (Va,Ta,Pa,Sa)
◼ Construct G’=(V’,T’,P’,S) for s(L)
◼ P’ consists of:
◼ The productions of P, but with every occurrence of terminal “a” in
their bodies replaced by Sa.
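◼ All productions in any Pa, for any a ∈ ∑
Example of a resulting grammar G’ (S0 and S1 are the start symbols of the grammars for s(0) and s(1)):
S => S0 S S0 | S1 S S1 | ε
S0 => aS0b | ab
S1 => xx | yy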
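The construction of P’ can be sketched in a few lines of Python. Bodies are lists of symbol names here so that variables such as S0 can have multi-character names. The starting grammar S => 0S0 | 1S1 | ε is an assumption inferred from the resulting grammar shown above, and the sketch also assumes that the variable names of G and of each Ga do not clash.

```python
def substitute(productions, terminal_grammars):
    """Build P' for s(L).

    productions:       dict head -> list of bodies (each body a list of symbols)
    terminal_grammars: dict terminal a -> (Sa, Pa), the start symbol and
                       productions of a grammar for s(a)
    """
    new_productions = {}
    for head, bodies in productions.items():
        # Replace every occurrence of terminal a in a body by Sa.
        new_productions[head] = [
            [terminal_grammars[s][0] if s in terminal_grammars else s for s in body]
            for body in bodies
        ]
    # Add all productions of every Pa.
    for _start_a, prods_a in terminal_grammars.values():
        new_productions.update(prods_a)
    return new_productions


# Assumed grammar for L: S => 0S0 | 1S1 | ε (the empty list is the ε body)
g = {"S": [["0", "S", "0"], ["1", "S", "1"], []]}
subs = {
    "0": ("S0", {"S0": [["a", "S0", "b"], ["a", "b"]]}),
    "1": ("S1", {"S1": [["x", "x"], ["y", "y"]]}),
}
for head, bodies in substitute(g, subs).items():
    print(head, "=>", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The start symbol of G’ stays S, and the new terminal alphabet is whatever the grammars Ga use, which is why s(L) can live over a different alphabet.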
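CFLs are closed under union
Let L1 and L2 be CFLs
To show: L1 U L2 is also a CFL
Let us show by using the result of Substitution
◼ Make a new language Lnew = {a, b} s.t. s(a) = L1 and s(b) = L2
◼ Then, L1 U L2 = s(Lnew)
CFLs are closed under Kleene closure
◼ Let Lnew = {a}* and s(a) = L
◼ Then, L* = s(Lnew)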
We won’t use substitution to prove this result
Some negative closure results
◼ CFLs are not closed under complementation
◼ L1 ∩ L2 = complement( complement(L1) U complement(L2) )
Logic: if CFLs were to be closed under complementation
➔ the whole right hand side becomes a CFL (because CFL is closed for union)
➔ the left hand side (intersection) is also a CFL
➔ but we just showed CFLs are NOT closed under intersection!
➔ CFLs cannot be closed under complementation.
Some negative closure results
Decision Properties
◼ Emptiness test
◼ Generating test
◼ Reachability test
◼ Membership test
◼ PDA acceptance
“Undecidable” problems for
CFL
◼ Is a given CFG G ambiguous?
◼ Is a given CFL inherently ambiguous?
◼ Is the intersection of two CFLs empty?
◼ Are two CFLs the same?
◼ Is a given L(G) equal to ∑*?
Summary
◼ Normal Forms
◼ Chomsky Normal Form
◼ Greibach Normal Form
◼ Useful in proving P/L
◼ Pumping Lemma for CFLs
◼ Main difference: z=u v^i w x^i y
◼ Closure properties
◼ Closed under: union, concatenation, reversal, Kleene closure, homomorphism, substitution
◼ Not closed under: intersection, complementation, difference