Automata and Formal Languages: CS138, Winter 2006
Formalities
The new Homework 3 is due on Friday afternoon. Questions?
This Week
This week: Simplification of Context-Free Grammars from An Introduction to Formal Languages and Automata by Peter Linz [Reader, pp. 71-85]. We will look at the important task of rewriting context-free grammars into equivalent ones to ease the computational problem of parsing words of the CFG. Ultimately, we will describe a parsing algorithm that runs in polynomial time O(|w|^3).
Removing λ-Productions
A λ-production is a rule of the form A → λ. Any variable A for which we have A ⇒* λ is called nullable.
Theorem 6.3: If the language of a CFG G is λ-free, then we can efficiently rewrite G to an equivalent CFG G' without λ-production rules.
Proof: By backtracking, collect all nullable variables in V_N. Add to P' all production rules A → x1x2…xm, as well as the rules obtained by replacing variables from V_N by λ in every possible combination. The one exception: if all xj are nullable, the rule A → λ is not added. Again, see the Reader for more details.
CS138, Wim van Dam, UCSB
Example 6.5
Take the CFG defined by
S → ABaC
A → BC
B → b | λ
C → D | λ
D → d
The nullable variables are: V_N = {A, B, C}.
Thus we get the equivalent, λ-production-free CFG:
S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a
A → BC | B | C
B → b
C → D
D → d
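The construction behind Theorem 6.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the Reader's formulation: the dictionary encoding of the grammar (variables as keys, right-hand sides as tuples, the empty tuple standing for λ) and the function names nullable_variables and remove_lambda_productions are assumptions made for this example.

```python
from itertools import product

def nullable_variables(grammar):
    """Return every variable A with A =>* lambda (the set V_N)."""
    nullable = set()
    changed = True
    while changed:  # repeat until a full pass finds nothing new
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A is nullable if some body consists only of nullable symbols;
            # the empty body () stands for lambda and qualifies trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_lambda_productions(grammar):
    """Return an equivalent rule set without lambda-productions."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            options = [((sym,), ()) if sym in nullable else ((sym,),)
                       for sym in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:  # never add the rule A -> lambda itself
                    new_bodies.add(new_body)
        result[head] = new_bodies
    return result

# The grammar of Example 6.5 in the assumed encoding.
grammar = {
    "S": [("A", "B", "a", "C")],
    "A": [("B", "C")],
    "B": [("b",), ()],
    "C": [("D",), ()],
    "D": [("d",)],
}
result = remove_lambda_productions(grammar)
print(sorted(nullable_variables(grammar)))  # ['A', 'B', 'C']
```

For the grammar of Example 6.5, result["S"] contains the eight right-hand sides listed above, and result["A"] contains BC, B and C.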
Example 6.6
Take the CFG
S → Aa | B
B → A | bb
A → a | bc | B
The unit derivations for the CFG are: S ⇒* B and S ⇒* A, B ⇒* A, A ⇒* B.
Dropping the unit rules leaves S → Aa, A → a | bc, B → bb, to which we add:
S → bb | a | bc
B → a | bc
A → bb
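The removal of unit productions can be sketched as follows. This is a minimal sketch under stated assumptions: the grammar is λ-free, variables are exactly the dictionary keys, right-hand sides are tuples of symbols, and the function name remove_unit_productions is made up for this example.

```python
def remove_unit_productions(grammar):
    """Return an equivalent rule set without unit productions A -> B.

    Assumes a lambda-free grammar whose variables are the dict keys.
    """
    def is_unit(body):
        return len(body) == 1 and body[0] in grammar

    result = {}
    for head in grammar:
        # Collect every variable B with head =>* B via unit productions only.
        reachable = {head}
        stack = [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # head inherits all non-unit bodies of every reachable variable.
        result[head] = {body
                        for var in reachable
                        for body in grammar[var]
                        if not is_unit(body)}
    return result

# The grammar of Example 6.6 in the assumed encoding.
grammar = {
    "S": [("A", "a"), ("B",)],
    "B": [("A",), ("b", "b")],
    "A": [("a",), ("b", "c"), ("B",)],
}
result = remove_unit_productions(grammar)
```

On Example 6.6 this yields S → Aa | bb | a | bc, B → bb | a | bc and A → a | bc | bb, which agrees with the rules listed above.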
Today
Last Monday we saw how to transform a (λ-free) CFG into an equivalent CFG that has:
1. no λ-productions (no A ⇒* λ)
2. no unit-productions (no A ⇒* B)
3. no useless variables or useless productions
Today we will discuss two important normal forms: the Chomsky Normal Form and the Greibach Normal Form, and the fast parsing of CFGs in CNF [Reader, pp. 80-84].
Theorem 6.6
Theorem 6.6: Every λ-free CFG G can be described by an equivalent CFG G' in Chomsky normal form. The transformation from G to G' can be done efficiently.
Outline of Proof:
1. Rewrite G to eliminate unit and λ-productions.
2. Rewrite such that all terminal-producing rules are of the form B_a → a.
3. Rewrite such that all variable-producing rules are of the form A → CD with C, D ∈ V.
Details of Proof
Step 2: How do you transform general production rules of the kind A → y1…yn with yj ∈ V ∪ T to rules that are of the kind A → y1…yn with yj ∈ V, or A → y with y ∈ T?
Answer: Introduce terminal-producing variables B_y → y for each y ∈ T and replace y by B_y in all relevant rules.
Step 3: How do you transform production rules of the kind A → C1…Cn with Cj ∈ V to rules of the kind A → C1C2?
Answer: Make a chain of rules to produce C1…Cn: A → C1D1 and D1 → C2D2 and … and D_{n-2} → C_{n-1}C_n.
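Steps 2 and 3 can be sketched in Python as below. This is an illustrative sketch only: it assumes the grammar already has no unit or λ-productions, that variables are exactly the dictionary keys, and that the fresh names B_y and D1, D2, … do not clash with existing variables. The function name to_cnf and the small test grammar are chosen for illustration and are not taken from the slides.

```python
def to_cnf(grammar):
    """Apply Steps 2 and 3 to a grammar without unit or lambda-productions."""
    variables = set(grammar)
    cnf = {}
    counter = 0  # numbers the fresh chain variables D1, D2, ...

    def add(head, body):
        cnf.setdefault(head, set()).add(body)

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                add(head, body)  # A -> y with y a terminal is already fine
                continue
            # Step 2: replace each terminal y by its variable B_y.
            symbols = []
            for sym in body:
                if sym in variables:
                    symbols.append(sym)
                else:
                    fresh_terminal_var = f"B_{sym}"  # assumed unused name
                    add(fresh_terminal_var, (sym,))
                    symbols.append(fresh_terminal_var)
            # Step 3: break A -> C1...Cn into a chain of two-variable rules.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"D{counter}"  # assumed unused name
                add(current, (symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            add(current, tuple(symbols))
    return cnf

# A small illustrative grammar (hypothetical, not from the slides).
grammar = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}
cnf = to_cnf(grammar)
```

Here S → ABa becomes S → A D1 and D1 → B B_a, while A → aab becomes A → B_a D2 and D2 → B_a B_b, together with the terminal rules B_a → a, B_b → b and B_c → c.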
CYK in Action
Take the grammar
S → AB | CC
A → CC
B → BC | 0
C → 0 | 1
with w = 1101. Is w ∈ L?
V11 = {C}, V22 = {C}, V33 = {B,C}, V44 = {C}
V12 = {S,A}, V23 = {S,A}, V34 = {S,A,B}
V13 = {S}, V24 = {}
V14 = {S}
Since S ∈ V14, we have w ∈ L. A derivation: S ⇒ AB ⇒ CCB ⇒ 1CB ⇒ 11B ⇒ 11BC ⇒ 110C ⇒ 1101
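The table above can be reproduced with a short CYK sketch in Python. The encoding (a dictionary from variables to lists of right-hand-side tuples, and a table V indexed by 1-based pairs (i, k)) and the name cyk_table are assumptions made for this example; the grammar must already be in Chomsky Normal Form.

```python
def cyk_table(grammar, word):
    """Return the CYK table V, where V[i, k] holds every variable that
    derives the substring w_i ... w_k (1-based, inclusive)."""
    n = len(word)
    V = {}
    # Substrings of length 1: use the terminal rules A -> a.
    for i in range(1, n + 1):
        V[i, i] = {A for A, bodies in grammar.items()
                   if (word[i - 1],) in bodies}
    # Longer substrings: combine two shorter ones at every split point j.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            k = i + length - 1
            V[i, k] = set()
            for j in range(i, k):
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in V[i, j]
                                and body[1] in V[j + 1, k]):
                            V[i, k].add(A)
    return V

# The grammar and word from this slide.
grammar = {
    "S": [("A", "B"), ("C", "C")],
    "A": [("C", "C")],
    "B": [("B", "C"), ("0",)],
    "C": [("0",), ("1",)],
}
V = cyk_table(grammar, "1101")
print("S" in V[1, 4])  # True: 1101 is in the language
```

The word is accepted exactly when the start variable S appears in V[1, n].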
Complexity of CYK
There are O(n^2) variable sets V_ik that we have to construct. For each set V_ik there are no more than n pairs (V_ij, V_{j+1,k}) that we have to consider to determine V_ik. In total, the running time is upper bounded by O(n^3). Note that this does not include the time required to bring the CFG into Chomsky Normal Form (which can be done efficiently though).
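As a sanity check on the cubic bound, the following sketch counts the pairs (V_ij, V_{j+1,k}) that are inspected for a word of length n and compares the count with the closed form (n^3 - n)/6. The function name count_split_points is made up for this example, and the count ignores the grammar-dependent work done per pair.

```python
def count_split_points(n):
    """Count all triples (i, j, k) with 1 <= i <= j < k <= n, i.e. every
    pair (V_ij, V_{j+1,k}) examined while filling the CYK table."""
    return sum(1
               for i in range(1, n + 1)
               for k in range(i + 1, n + 1)
               for j in range(i, k))

print(count_split_points(4))  # 10 pairs for a word of length 4, such as 1101
```

The count grows like n^3/6, which is consistent with the O(n^3) upper bound.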
Formalities
The new Homework 3 is due today, 5pm. New homework will be announced this weekend. Midterm on context free grammars will probably be later than originally planned (so, after Friday March 3). Coming Monday there will be no class. Questions?
CYK in Action
Take the grammar
S → AB | CC
A → CC
B → BC | 0
C → 0 | 1
with w = 1101. Is w ∈ L?
V11 = {C}, V22 = {C}, V33 = {B,C}, V44 = {C}
V12 = {S,A}, V23 = {S,A}, V34 = {S,A,B}
V13 = {S}, V24 = {}
V14 = {S}
Retracing the V14 = {S} result gives the derivation:
S ⇒ AB ⇒ CCB ⇒ 1CB ⇒ 11B ⇒ 11BC ⇒ 110C ⇒ 1101
An Exercise (1)
Write into Chomsky Normal Form the CFG:
S → aA | aBB
A → aaA | λ
B → bC | bbC
C → B
Answer (1): First you remove the λ-productions (A → λ):
S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → B
An Exercise (2)
Answer (2): Next you remove the unit-productions from:
S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → B
Removing C → B, we have to include the C ⇒* B possibility, which can be done by substitution (Thm 6.4) and gives:
S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → bC | bbC