SSK5204 Chapter 5: Context-Free Grammars and Languages
Introduction
Finite automata and regular expressions are two different,
though equivalent, methods of describing languages.
Introduction (cont.)
An important application of context-free grammars (CFGs) occurs in the
specification and compilation of programming
languages (PL).
A grammar for a PL often appears as a reference for
people trying to learn the language syntax.
Designers of compilers and interpreters for PL often
start by obtaining a grammar for the language.
Context-free languages
[Figure: two parse trees for the same arithmetic expression using + and *, built from EXPR, TERM and NUM nodes. One grouping evaluates to 25, the other to 17.]
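SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → A-NOUN
    or → A-NOUN PREP-PHRASE
Example: "a girl" is an A-NOUN; "a girl with a flower" is an A-NOUN followed by a PREP-PHRASE
PREP-PHRASE → PREP NOUN-PHRASE
Example: "with a flower" is a PREP followed by a NOUN-PHRASE
The structure is recursive: a NOUN-PHRASE can contain a PREP-PHRASE, which contains another NOUN-PHRASE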
ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with
[Figure: parse tree of a sentence. The root SENTENCE splits into NOUN-PHRASE and VERB-PHRASE; below them are CMPLX-VERB, PREP-PHRASE and A-NOUN nodes, which expand to ARTICLE NOUN, PREP ARTICLE NOUN and VERB ARTICLE NOUN.]
Context-free grammar
A → 0A1
A → B
B → #
Variables: A, B
Start variable: A
Terminals: 0, 1, #
Productions: the three rules above
Derivation: A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
Context-free grammar
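A context-free grammar consists of variables, terminals, productions and a start variable.
Each production has the form A → α,
where A is a variable and α is a string of variables and terminals.
S is a variable called the start variable.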
E → E + E
E → (E)
E → N
N → 0N
N → 1N
N → 0
N → 1
Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E
Shorthand:
E → E + E | (E) | N
N → 0N | 1N | 0 | 1
Conventions:
Variables in UPPERCASE
Start variable comes first
Derivation
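A derivation is a sequential application of productions:
E ⇒ E + E
  ⇒ (E) + E
  ⇒ (E) + N
  ⇒ (E) + 1
  ⇒ (E + E) + 1
  ⇒ (E + N) + 1
  ⇒ (N + N) + 1
  ⇒ (N + 1N) + 1
  ⇒ (N + 10) + 1
  ⇒ (1 + 10) + 1
In short: E ⇒* (1 + 10) + 1
Grammar used:
E → E + E | (E) | N
N → 0N | 1N | 0 | 1
Notation:
α ⇒ β means β is obtained from α by one production
α ⇒* β means β is obtained from α by a derivation (zero or more productions)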
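To make "one production per step" concrete, here is a minimal Python sketch that replays the derivation of (1+10)+1 and checks that every step applies exactly one production. The names PRODUCTIONS, one_step and is_derivation are made up for this illustration, strings are written without spaces, and the step (E)+1 is written out explicitly.

```python
# Grammar from the slide: E -> E+E | (E) | N,  N -> 0N | 1N | 0 | 1
PRODUCTIONS = {
    "E": ["E+E", "(E)", "N"],
    "N": ["0N", "1N", "0", "1"],
}

def one_step(alpha):
    """Return every string obtainable from alpha by applying one production."""
    results = set()
    for i, symbol in enumerate(alpha):
        for body in PRODUCTIONS.get(symbol, []):
            results.add(alpha[:i] + body + alpha[i + 1:])
    return results

def is_derivation(steps):
    """True if each string follows from the previous one by one production."""
    return all(b in one_step(a) for a, b in zip(steps, steps[1:]))

steps = ["E", "E+E", "(E)+E", "(E)+N", "(E)+1", "(E+E)+1", "(E+N)+1",
         "(N+N)+1", "(N+1N)+1", "(N+10)+1", "(1+10)+1"]
print(is_derivation(steps))
```

Skipping a step, for example going from E directly to (E)+E, makes is_derivation return False, because two productions would be needed.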
Context-free languages
Analysis example 1
A → 0A1 | B
B → #
L(G) = {0^n#1^n : n ≥ 0}
Examples:
00#11 is in L(G)
# is in L(G): A ⇒ B ⇒ #
00#111 is not in L(G)
00##11 is not in L(G)
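The claim L(G) = {0^n#1^n : n ≥ 0} can be checked on small inputs with a short recursive test that mirrors the two rules. This is only a sketch; the function name derives_A is an assumption made for the illustration.

```python
def derives_A(w):
    """True if the variable A derives w, for A -> 0A1 | B and B -> #."""
    if w == "#":
        # A => B => #
        return True
    if len(w) >= 3 and w[0] == "0" and w[-1] == "1":
        # A => 0A1: strip the outer 0 and 1, then check the middle
        return derives_A(w[1:-1])
    return False

for w in ["00#11", "#", "00#111", "00##11"]:
    print(w, derives_A(w))
```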
Analysis example 2
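S → SS | (S) | ε
Rules: (1) S → SS, (2) S → (S), (3) S → ε
Strings that can be derived: (), (()())
Derivation of (()()):
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())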
Parse trees
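S → SS | (S) | ε
Derivation: S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
[Figure: parse tree for (()()). The root S expands to ( S ), the inner S expands to S S, and each of these expands to ( S ) with S → ε.]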
Parse trees
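Several derivations can share one parse tree:
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())
[Figure: the single parse tree for (()()) shared by all four derivations: S expands to ( S ), then to S S, then to ( S )( S ).]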
Analysis example 2
S → SS | (S) | ε
Strings that cannot be derived: (()() and ())(()
Analysis example 2
S → SS | (S) | ε
L(G) = {w : w has the same number of ( and ),
and no prefix of w has more ) than (}
[Figure: the string (()())() split into two blocks, (()()) and (), each derived from its own S.]
Parsing rules:
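The two conditions that describe L(G) can be tested with a single counter. The sketch below is an illustration of that characterization, not of a parsing procedure from the slides; the function name in_language is hypothetical.

```python
def in_language(w):
    """Check the two conditions for S -> SS | (S) | ε on a string of parentheses."""
    depth = 0  # number of ( minus number of ) seen so far
    for ch in w:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            return False  # not a string over the alphabet { (, ) }
        if depth < 0:
            return False  # some prefix has more ) than (
    return depth == 0     # same number of ( and )

for w in ["(()())", "(()())()", "(()()", "())(()"]:
    print(w, in_language(w))
```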
Design example 1
L = {0^n 1^n | n ≥ 0}
These strings have recursive structure:
000000111111
0000011111
00001111
000111
0011
01
S → 0S1 | ε
Design example 2
L = numbers without leading zeros
allowed: 0, 109, 2, 23
not allowed: ε, 01, 003
S → 0 | LN
N → DN | ε
D → 0 | L
L → 1|2|3|4|5|6|7|8|9
Example: 1052870032 is a leading digit L (the 1) followed by any number N (052870032)
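A direct check that follows the shape of the rules S → 0 | LN can be sketched in a few lines. The function name derives_S is an assumption for this illustration.

```python
def derives_S(w):
    """True if w is a number without leading zeros (S -> 0 | LN)."""
    if w == "0":
        return True  # S -> 0
    # S -> LN: a leading digit 1-9, then any (possibly empty) string of digits
    return (len(w) >= 1
            and w[0] in "123456789"
            and all(c in "0123456789" for c in w[1:]))

for w in ["0", "109", "2", "23", "", "01", "003"]:
    print(repr(w), derives_S(w))
```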
Design examples
L = {0^n 1^n 0^m 1^m | n ≥ 0, m ≥ 0}
Examples: 010011, 00110011, 000111
These strings have two parts:
L = L1L2
L1 = {0^n 1^n | n ≥ 0}
L2 = {0^m 1^m | m ≥ 0}
S → S1 S1
S1 → 0 S1 1 | ε
Design examples
L = {0^n 1^m 0^m 1^n | n ≥ 0, m ≥ 0}
Examples: 011001, 0011, 1100, 00110011
These strings have nested structure:
outer part: 0^n 1^n
inner part: 1^m 0^m
S → 0S1 | I
I → 1I0 | ε
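[Figure: regular expressions, NFAs and DFAs describe the same languages; a regular expression can also be converted to a CFG.]
regular expression      CFG
a (alphabet symbol)     S → a
E1 + E2                 S → S1 | S2
E1E2                    S → S1S2
E1*                     S → SS1 | ε
Here S1 and S2 are the start variables of the grammars for E1 and E2.
S → 0S1 | ε
L = {0^n 1^n : n ≥ 0}
[Figure: the regular languages drawn inside the context-free languages; L is context-free but not regular.]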
Ambiguity
Parsing algorithms
Ambiguity
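E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
The string 1+2*2 has two parse trees:
[Figure: with E → E + E at the root, the tree groups the string as 1 + (2 * 2) = 5.]
[Figure: with E → E * E at the root, the tree groups the string as (1 + 2) * 2 = 6.]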
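The effect of the ambiguity can be illustrated by computing one value per parse tree. The sketch below tries every operator as the root of the tree; it handles only digits, + and * (no parentheses), and the name all_values is made up for this example.

```python
def all_values(expr):
    """Values of expr under every parse tree of E -> E+E | E*E | N."""
    if expr.isdigit():
        return [int(expr)]  # E -> N
    values = []
    for i, ch in enumerate(expr):
        if ch in "+*":
            # Use the operator at position i as the root of the parse tree
            for left in all_values(expr[:i]):
                for right in all_values(expr[i + 1:]):
                    values.append(left + right if ch == "+" else left * right)
    return values

print(sorted(all_values("1+2*2")))  # [5, 6]
```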
Example
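Is S → SS | x
ambiguous?
Yes, because xxx has two parse trees:
[Figure: two parse trees for xxx, one grouping the string as (xx)x and the other as x(xx).]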
Disambiguation
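S → SS | x is ambiguous
S → Sx | x generates the same strings and is unambiguous
[Figure: the only parse tree for xxx under S → Sx | x, a chain of three S nodes.]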
Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
In this grammar + and * have the same precedence!
Divide expression
into terms and factors
[Figure: the expression 2 * (1 + 2 * 2) with its terms (T) and factors (F) marked.]
Disambiguation
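E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
An expression is a sum of
one or more terms
E → T | E + T
A term is a product of one or more factors
T → F | T * F
A factor is a parenthesized expression or a number
F → (E) | 1 | 2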
Parsing example
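E → T | E + T
T → F | T * F
F → (E) | 1 | 2
[Figure: parse tree for 2 * (1 + 1 + 2 * 2) + 1 under this grammar. The root E splits into E + T; the left E is the term 2 * (1 + 1 + 2 * 2), whose parenthesized factor is again a sum of terms.]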
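As a sketch of how the term and factor structure fixes precedence, the evaluator below follows the grammar E → T | E + T, T → F | T * F, F → (E) | 1 | 2. It assumes that the left-recursive rules are implemented as loops, which is a standard substitution and not something stated on the slides; the function names are hypothetical.

```python
def evaluate(text):
    """Evaluate an expression over 1, 2, +, *, ( and ) using the E/T/F grammar."""
    tokens = [c for c in text if c != " "]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():  # F -> (E) | 1 | 2
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            value = expr()
            if peek() != ")":
                raise ValueError("expected )")
            pos += 1
            return value
        if tok in ("1", "2"):
            pos += 1
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    def term():  # T -> F | T * F, written as a loop over factors
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def expr():  # E -> T | E + T, written as a loop over terms
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result

print(evaluate("2 * (1 + 1 + 2 * 2) + 1"))  # 13
```

With this grammar 1+2*2 has only one reading, 1 + (2 * 2), so evaluate("1+2*2") returns 5.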
Disambiguation
Ambiguity in English
Parsing
S → 0S1 | 1S0S | T
T → S | ε
input: 0011
Parsing
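S → 0S1 | 1S0S | T
T → S | ε
input: 0011
Try all derivations:
[Figure: tree of derivations starting from S. Its branches include 0S1, 1S0S, 0T1, 00S11, 01S0S1, 10S10S, 000S111 and others; the branch S ⇒ 0S1 ⇒ 00S11 ⇒ 00T11 ⇒ 0011 reaches the input.]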
Problems
When to stop
S → 0S1 | 1S0S | T
T → S | ε
Problems:
1. S ⇒ 0S1 ⇒ 0T1 ⇒ 01
   The derived string may shrink because of ε-productions
2. S ⇒ T ⇒ S ⇒ T ⇒ ...
   Derivation may loop because of unit productions
Removal of ε-productions
Example
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D
Eliminating ε-productions
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable: B, C, D
For every nullable N:
If you see X → αNβ, add X → αβ
If you see N → ε, remove it.
Productions added:
D → C
S → AD
D → B
D → ε
S → AC
S → A
C → E
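The set of nullable variables can be found by repeating one check until nothing changes: a variable is nullable if some production body consists only of nullable variables (the empty body counts). The sketch below assumes a grammar stored as a dictionary of lists of symbols; that representation and the name nullable_variables are choices made for this illustration.

```python
def nullable_variables(grammar):
    """grammar maps a variable to its bodies; a body is a list of symbols, [] is ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # all() over an empty body is True, so N -> ε makes N nullable
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

grammar = {
    "S": [["A", "C", "D"]],
    "A": [["a"]],
    "B": [[]],
    "C": [["E", "D"], []],
    "D": [["B", "C"], ["b"]],
    "E": [["b"]],
}
print(sorted(nullable_variables(grammar)))  # ['B', 'C', 'D']
```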
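Eliminating unit productions
A unit production has the form A → B
grammar:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR
[Figure: graph of the unit productions S → T, T → S and T → R.]
Rule 1: If there is a cycle of unit productions
A → B → ... → C → A
delete it and replace everything with A
Here the cycle is S → T → S: replace T by S
S → 0S1 | 1S0S
S → R | ε
R → 0SR
Rule 2: Replace any chain
A → B → ... → C → α
by A → α, B → α, ..., C → α
S → 0S1 | 1S0S | R | ε
R → 0SR
becomes
S → 0S1 | 1S0S | 0SR | ε
R → 0SR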
Recap
Problem: when to stop trying derivations
Solution:
Eliminate ε-productions
Eliminate unit productions
Try all possible derivations but
stop parsing when
|derived string| > |input|
Example
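S → 0S1 | 0S0S | T
T → S | 0
input: 0011
After eliminating unit productions:
S → 0S1 | 0S0S | 0
Derivations tried:
S ⇒ 0
S ⇒ 0S1 ⇒ 001
S ⇒ 0S1 ⇒ 00S11 too long
S ⇒ 0S1 ⇒ 00S0S1 too long
S ⇒ 0S0S ⇒ 000S ⇒ 0000
S ⇒ 0S0S ⇒ 000S ⇒ 0000S1 too long
S ⇒ 0S0S ⇒ 000S ⇒ 0000S0S too long
S ⇒ 0S0S ⇒ 00S10S too long
S ⇒ 0S0S ⇒ 00S0S0S too long
conclusion:
0011 ∉ L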
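The search with the "too long" cutoff can be sketched as a breadth-first search over leftmost derivations. The code assumes a grammar without ε-productions and unit productions (so derived strings never shrink or loop), single-character symbols, and the grammar S → 0S1 | 0S0S | 0 from the example; the function name brute_force_parse is hypothetical.

```python
from collections import deque

def brute_force_parse(productions, start, target):
    """Try all leftmost derivations, dropping strings longer than the input."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Index of the leftmost variable, or None if form has only terminals
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            continue
        for body in productions[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            # Stop exploring when |derived string| > |input|
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False

productions = {"S": ["0S1", "0S0S", "0"]}
print(brute_force_parse(productions, "S", "0011"))  # False
print(brute_force_parse(productions, "S", "0000"))  # True
```

The search always terminates because only finitely many strings have length at most |input|, and each is explored once.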
Problems
A faster way to parse:
the Cocke-Younger-Kasami algorithm
Preparations
To use it, the grammar is first prepared:
Eliminate ε-productions
Eliminate unit productions
Convert CFG to Chomsky Normal Form
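Chomsky Normal Form (named after Noam Chomsky)
Every production has the form
A → BC
or
A → a
Longer productions are converted: break up sequences with new variables
A → BCDE
C → c
becomes
A → BX
X → CY
Y → DE
C → c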
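The "break up sequences with new variables" step can be sketched as follows. The function name break_up and the way fresh variable names are supplied are assumptions for this illustration; the caller must pass enough unused names.

```python
def break_up(head, body, fresh_names):
    """Split head -> body (a list of two or more variables) into two-variable rules."""
    rules = []
    fresh = iter(fresh_names)
    while len(body) > 2:
        new_var = next(fresh)  # a new variable standing for the rest of the body
        rules.append((head, [body[0], new_var]))
        head, body = new_var, body[1:]
    rules.append((head, list(body)))
    return rules

print(break_up("A", ["B", "C", "D", "E"], ["X", "Y"]))
# [('A', ['B', 'X']), ('X', ['C', 'Y']), ('Y', ['D', 'E'])]
```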
Cocke-Younger-Kasami algorithm
S AB | BC
A BA | a
B CC | b
C AB | a
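x = baaba
Each cell lists the variables that derive one substring of x (row = substring length, column = start position):
length 5:  S,A,C
length 4:  ∅       S,A,C
length 3:  ∅       B       B
length 2:  S,A     B       S,C     S,A
length 1:  B       A,C     A,C     B       A,C
           b       a       a       b       a
S appears in the top cell, so baaba can be derived.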
Cocke-Younger-Kasami algorithm
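Input: a grammar without ε-productions and unit productions,
in Chomsky Normal Form
Input string x = x1...xk
[Figure: triangular table of cells ij for 1 ≤ i ≤ j ≤ k. The bottom row holds cells 11, 22, ..., kk above the symbols x1, x2, ..., xk; the next row holds 12, 23, ...; the top cell is 1k.]
Cell ij holds the variables that derive the substring xi...xj.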
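A minimal sketch of the table-filling idea, using the example grammar S → AB | BC, A → BA | a, B → CC | b, C → AB | a and the input baaba. The grammar encoding (tuples for two-variable bodies, one-character strings for terminals), the 0-based cell indices and the name cyk_table are assumptions made for this illustration.

```python
def cyk_table(grammar, x):
    """Return table[(i, j)] = variables deriving x[i..j] (0-based, inclusive).
    Assumes x is non-empty and the grammar is in Chomsky Normal Form."""
    k = len(x)
    table = {}
    # Substrings of length 1: rules of the form A -> a
    for i, ch in enumerate(x):
        table[(i, i)] = {A for A, bodies in grammar.items() if ch in bodies}
    # Longer substrings: rules A -> BC, trying every split point m
    for length in range(2, k + 1):
        for i in range(k - length + 1):
            j = i + length - 1
            cell = set()
            for m in range(i, j):
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (isinstance(body, tuple)
                                and body[0] in table[(i, m)]
                                and body[1] in table[(m + 1, j)]):
                            cell.add(A)
            table[(i, j)] = cell
    return table

grammar = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), "a"],
    "B": [("C", "C"), "b"],
    "C": [("A", "B"), "a"],
}
x = "baaba"
table = cyk_table(grammar, x)
print("S" in table[(0, len(x) - 1)])  # True: baaba can be derived
```

The string is accepted exactly when the start variable appears in the cell covering the whole input.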