SSK5204 Chapter 5: Context-Free Grammars and Languages


Uploaded by

scodrash
SSK5204

Chapter 5: Context-Free Grammars and Languages
Dr. Nor Fazlida Mohd Sani, Dept. of Computer Science,
Fac. of Computer Science and Information Technology, UPM.

Introduction
Finite automata and regular expressions are two different, though equivalent, methods of describing languages.

We showed that many languages can be described in this way, but that some simple languages, such as {0^n 1^n : n ≥ 0}, cannot.

Context-free grammars (CFGs) are a more powerful method of describing languages.

Such grammars can describe certain features that have a recursive structure, which makes them useful in a variety of applications.

CFGs were first used in the study of human languages.

One way of understanding the relationship between terms such as noun, verb, and preposition and their respective phrases leads to a natural recursion, because noun phrases may appear inside verb phrases and vice versa.

CFGs can capture important aspects of these relationships.

Introduction (cont.)
An important application of CFGs occurs in the specification and compilation of programming languages (PLs).
A grammar for a PL often appears as a reference for people trying to learn the language syntax.
Designers of compilers and interpreters for PLs often start by obtaining a grammar for the language.

Compilers and interpreters contain a component called a parser, which extracts the meaning of a program prior to generating compiled code or performing interpreted execution.
Some tools even generate the parser automatically from the grammar.

Context-free languages

Precedence in arithmetic expressions


bash-3.2$ python
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55)
>>> 2+3*5
17

Two ways to group 2+3*5:

(2 + 3) * 5 = 25

or

2 + (3 * 5) = 17

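Grammars describe meaning

Rules for valid (simple) arithmetic expressions:
EXPR → EXPR + TERM
EXPR → TERM
TERM → TERM * NUM
TERM → NUM
NUM → 0-9

[Figure: parse tree for an expression of the form NUM + NUM * NUM. The root EXPR splits into EXPR + TERM, and the product forms a single TERM below the sum.]

Rules always yield the correct meaning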
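The claim that the rules yield the correct meaning can be checked with a small evaluator that follows the grammar's structure. The sketch below is an illustration, not part of the original slides: the function name `evaluate` is hypothetical, numbers are assumed to be single digits (as in NUM → 0-9), and the left-recursive rules are implemented as loops, a common substitution in hand-written parsers.

```python
def evaluate(text):
    """Evaluate single-digit sums and products following
    EXPR -> EXPR + TERM | TERM,  TERM -> TERM * NUM | NUM,  NUM -> 0-9."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def num():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in "0123456789":
            raise ValueError("digit expected at position %d" % pos)
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        # TERM -> TERM * NUM | NUM, written as NUM (* NUM)*
        nonlocal pos
        value = num()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            value *= num()
        return value

    def expr():
        # EXPR -> EXPR + TERM | TERM, written as TERM (+ TERM)*
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            value += term()
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected symbol at position %d" % pos)
    return result


print(evaluate("2+3*5"))  # 17, the same answer as the Python session
```

Because a product can only appear inside a TERM, the multiplication is always grouped before the addition, which is exactly the precedence the grammar encodes.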

The grammar of English

SENTENCE → NOUN-PHRASE VERB-PHRASE
Example: a girl likes the boy
(NOUN-PHRASE: a girl; VERB-PHRASE: likes the boy)

NOUN-PHRASE → A-NOUN
or A-NOUN PREP-PHRASE
a girl: A-NOUN
a girl with a flower: A-NOUN PREP-PHRASE

PREP-PHRASE → PREP NOUN-PHRASE
with a flower: PREP NOUN-PHRASE

Recursive structure: a NOUN-PHRASE may contain a PREP-PHRASE, which in turn contains a NOUN-PHRASE.

The grammar of (parts of) English

SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → A-NOUN
NOUN-PHRASE → A-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP A-NOUN
A-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB

ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with

The meaning of sentences

a girl with a flower likes the boy

SENTENCE
  NOUN-PHRASE
    A-NOUN
      ARTICLE: a
      NOUN: girl
    PREP-PHRASE
      PREP: with
      A-NOUN
        ARTICLE: a
        NOUN: flower
  VERB-PHRASE
    CMPLX-VERB
      VERB: likes
      NOUN-PHRASE
        A-NOUN
          ARTICLE: the
          NOUN: boy

Context-free grammar

A → 0A1
A → B
B → #

Start variable: A
Variables: A, B
Terminals: 0, 1, #
Productions: the three rules above

Derivation:
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111

Context-free grammar

A context-free grammar is given by (V, Σ, R, S) where

V is a finite set of variables or non-terminals
Σ is a finite set of terminals
R is a set of productions or substitution rules of the form A → α, where A is a variable and α is a string of variables and terminals
S is a variable called the start variable
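As an illustration of the four components, the example grammar A → 0A1 | B, B → # can be written down as a small data structure and used to replay a derivation. This is a minimal sketch: the class name `CFG` and the helpers `apply_production` and `derive` are hypothetical names chosen here, and every symbol is assumed to be a single character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Σ
    productions: tuple     # R, as pairs (variable, right-hand side)
    start: str             # S


def apply_production(form, production):
    """Rewrite the leftmost occurrence of the production's variable."""
    variable, rhs = production
    if variable not in form:
        raise ValueError("%s does not occur in %s" % (variable, form))
    return form.replace(variable, rhs, 1)


def derive(grammar, rule_indices):
    """Apply the productions with the given indices, starting from S."""
    forms = [grammar.start]
    for index in rule_indices:
        forms.append(apply_production(forms[-1], grammar.productions[index]))
    return forms


G = CFG(
    variables=frozenset("AB"),
    terminals=frozenset("01#"),
    productions=(("A", "0A1"), ("A", "B"), ("B", "#")),
    start="A",
)

print(" => ".join(derive(G, [0, 0, 0, 1, 2])))
# A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111
```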

Notation and conventions

E → E + E
E → (E)
E → N
N → 0N
N → 1N
N → 0
N → 1

Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E

Shorthand:
E → E + E | (E) | N
N → 0N | 1N | 0 | 1

Conventions:
Variables in UPPERCASE
Start variable comes first

Derivation

E → E + E | (E) | N
N → 0N | 1N | 0 | 1

A derivation is a sequential application of productions:
E ⇒ E + E
  ⇒ (E) + E
  ⇒ (E) + N
  ⇒ (E) + 1
  ⇒ (E + E) + 1
  ⇒ (E + N) + 1
  ⇒ (N + N) + 1
  ⇒ (N + 1N) + 1
  ⇒ (N + 10) + 1
  ⇒ (1 + 10) + 1

In short: E ⇒* (1 + 10) + 1

α ⇒ β: β follows from α by one production
α ⇒* β: β follows from α by a derivation (zero or more productions)

Context-free languages

The language of a CFG is the set of all strings at the end of a derivation:
L(G) = {w : w ∈ Σ* and S ⇒* w}

Questions we will ask:
I give you a CFG, what is the language?
I give you a language, write a CFG for it

Analysis example 1
A → 0A1 | B
B → #

L(G) = {0^n # 1^n : n ≥ 0}

Can you derive:

00#11: Yes, A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11

#: Yes, A ⇒ B ⇒ #

00#111: No, the numbers of 0s and 1s are unequal

00##11: No, there are too many #

Analysis example 2
S → SS | (S) | ε     (rules 1, 2, 3)

Can you derive

(): S ⇒ (S) (rule 2) ⇒ () (rule 3)

(()()): S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

Parse trees
S → SS | (S) | ε

A parse tree gives a more compact representation of the derivation
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

S
  (
  S
    S
      (
      S → ε
      )
    S
      (
      S → ε
      )
  )

Parse trees

All of the following derivations of (()()) have the parse tree shown above:

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())

One parse tree can represent many derivations

Analysis example 2
S → SS | (S) | ε

Can you derive

(()(): No, because the numbers of ( and ) are unequal

())((): No, because there is a prefix with an excess of )

Analysis example 2
S → SS | (S) | ε

L(G) = {w : w has the same number of ( and ), and no prefix of w has more ) than (}

Example: ( ( ) ( ) ) ( ) consists of two blocks, ( ( ) ( ) ) and ( ), each derived from one S of S → SS.

Parsing rules:

Divide w up in blocks with the same number of ( and )

Each block is in L(G)

Parse each block recursively
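The characterization of L(G) and the block-splitting rule translate directly into code. The following is a minimal sketch; the function names `in_language` and `blocks` are hypothetical, and `blocks` assumes its argument has already been checked to be in L(G).

```python
def in_language(w):
    """True if w has equally many ( and ) and no prefix has more ) than (."""
    depth = 0
    for symbol in w:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:       # a prefix with an excess of )
                return False
        else:
            return False        # not a string over the alphabet {(, )}
    return depth == 0


def blocks(w):
    """Split a string of L(G) into its shortest balanced blocks."""
    result, depth, start = [], 0, 0
    for i, symbol in enumerate(w):
        depth += 1 if symbol == "(" else -1
        if depth == 0:          # the counts of ( and ) are equal again
            result.append(w[start:i + 1])
            start = i + 1
    return result


print(in_language("(()())"))   # True
print(in_language("(()()"))    # False
print(in_language("())(()"))   # False
print(blocks("(()())()"))      # ['(()())', '()']
```

Each block has the form ( v ) with v again in L(G), so parsing can continue recursively on the inside of every block.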

Design example 1
L = {0^n 1^n | n ≥ 0}
These strings have recursive structure:
000000111111
0000011111
00001111
000111
0011
01
ε

S → 0S1 | ε

Design example 2
L = numbers without leading zeros
allowed: 0, 109, 2, 23
not allowed: ε, 01, 003

S → 0 | LN
N → DN | ε
D → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Example: 1052870032 is a leading digit L (1) followed by any number N (052870032)

Design examples
L = {0^n 1^n 0^m 1^m | n ≥ 0, m ≥ 0}
Examples: 010011, 00110011, 000111

These strings have two parts:
L = L1L2
L1 = {0^n 1^n | n ≥ 0}
L2 = {0^m 1^m | m ≥ 0}

rules for L1: S1 → 0 S1 1 | ε
L2 is the same as L1

S → S1 S1
S1 → 0 S1 1 | ε

Design examples
L = {0^n 1^m 0^m 1^n | n ≥ 0, m ≥ 0}
Examples: 011001, 0011, 1100, 00110011

These strings have nested structure:
outer part: 0^n 1^n
inner part: 1^m 0^m

S → 0S1 | I
I → 1I0 | ε

Context-free versus regular

Write a CFG for the language (0 + 1)*111

S → U111
U → 0U | 1U | ε

Can you do so for every regular language?

Every regular language is context-free

[Figure: the equivalent descriptions of regular languages: regular expression, NFA, DFA]

From regular to context-free

regular expression       CFG
∅                        grammar with no rules
ε                        S → ε
a (alphabet symbol)      S → a
E1 + E2                  S → S1 | S2
E1E2                     S → S1S2
E1*                      S → SS1 | ε

(S becomes the new start symbol)

Context-free versus regular

Is every context-free language regular?

S → 0S1 | ε

L = {0^n 1^n : n ≥ 0} is context-free but not regular

[Figure: the regular languages drawn as a region inside the context-free languages]

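Ambiguity
Parsing algorithms

Ambiguity
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2

1+2*2 has two parse trees:

E → E + E at the root, with 1 on the left and E * E (2 * 2) on the right: 1 + (2 * 2) = 5

E → E * E at the root, with E + E (1 + 2) on the left and 2 on the right: (1 + 2) * 2 = 6

A CFG is ambiguous if some string has more than one parse tree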

Example
Is S → SS | x ambiguous?

Yes, because xxx has two parse trees: one groups the string as (xx)x, the other as x(xx).
Disambiguation
S → SS | x can be rewritten as S → Sx | x

[Figure: under S → Sx | x the parse tree is a single chain of S nodes]

Sometimes we can rewrite the grammar to remove ambiguity

Disambiguation

E → E + E | E * E | (E) | N     (+ and * have the same precedence!)
N → 1N | 2N | 1 | 2

Divide expression into terms and factors

Example: 2 * (1 + 2 * 2)
[Figure: the example expression with its terms (T) and factors (F) marked]
Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2

An expression is a sum of one or more terms
E → T | E + T

Each term is a product of one or more factors
T → F | T * F

Each factor is a parenthesized expression or a number
F → (E) | 1 | 2

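Parsing example
E → T | E + T
T → F | T * F
F → (E) | 1 | 2

2 * (1 + 1 + 2 * 2) + 1

[Figure: parse tree of this expression. The root E splits into E + T; the left E is the term 2 * (1 + 1 + 2 * 2), whose parenthesized factor is parsed recursively into sums (E + T) and products (T * F).]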
Disambiguation

Disambiguation is not always possible because

There exist inherently ambiguous languages
There is no general procedure for disambiguation

In programming languages, ambiguity comes from precedence rules, and we can resolve it as in the expression example

In English, ambiguity is sometimes a problem:
He ate the cookies on the floor

Ambiguity in English

He ate the cookies on the floor
(Were the cookies on the floor, or was he on the floor while eating them?)

Parsing
S → 0S1 | 1S0S | T
T → S | ε

input: 0011

How would we program the computer to build a parse tree for us?

Parsing
S → 0S1 | 1S0S | T
T → S | ε

input: 0011

First idea: Try all derivations

S ⇒ 0S1, 1S0S, ...
0S1 ⇒ 00S11, 01S0S1, 0T1
1S0S ⇒ 10S10S, ...
00S11 ⇒ 000S111, 00T11, ...
00T11 ⇒ 00S11, 0011

Problems

Trying all derivations may take a very long time

If input is not in the language, parsing will never stop

When to stop
S → 0S1 | 1S0S | T
T → S | ε

Idea 2: Stop when |derived string| > |input|

Problems:

S ⇒ 0S1 ⇒ 0T1 ⇒ 01
Derived strings may shrink because of ε-productions

S ⇒ T ⇒ S ⇒ T ⇒ ...
Derivation may loop because of unit productions

Task: remove ε-productions and unit productions

Removal of ε-productions

A variable N is nullable if it derives the empty string: N ⇒* ε

Identify all nullable variables N

Remove nullable variables carefully

If start variable S is nullable:
Add a new start variable S'
Add special productions S' → S | ε

Example
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

Identify all nullable variables. Repeat the following:
If X → ε, mark X as nullable
If X → YZ…W, with Y, Z, …, W all marked nullable, mark X as nullable also.

nullable variables: B, C, D
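The marking procedure is a fixed-point computation: keep scanning the productions until no new variable can be marked. The sketch below is an illustration; the function name `nullable_variables` and the encoding of a production as a pair (variable, tuple of symbols), with ε as the empty tuple, are assumptions made here.

```python
def nullable_variables(productions):
    """Return the set of variables that derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for variable, rhs in productions:
            # An empty right-hand side (ε) passes the test trivially;
            # otherwise every symbol must already be marked nullable.
            if variable not in nullable and all(s in nullable for s in rhs):
                nullable.add(variable)
                changed = True
    return nullable


PRODUCTIONS = [
    ("S", ("A", "C", "D")),
    ("A", ("a",)),
    ("B", ()),
    ("C", ("E", "D")),
    ("C", ()),
    ("D", ("B", "C")),
    ("D", ("b",)),
    ("E", ("b",)),
]

print(sorted(nullable_variables(PRODUCTIONS)))  # ['B', 'C', 'D']
```

S is not marked because A only derives the terminal a, so ACD can never vanish entirely.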

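Eliminating ε-productions
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable: B, C, D

Remove nullable variables carefully. For every nullable N:
If you see X → αNβ, add X → αβ
If you see N → ε, remove it.

Productions added:
D → C
S → AD
D → B
D → ε
S → AC
S → A
C → E

The ε-productions B → ε, C → ε, and D → ε are then removed.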
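One way to carry out both rules at once is to generate, for every production, all variants in which each nullable symbol is either kept or dropped, and to discard any variant that is empty. The sketch below follows that idea. The names `nullable_variables` and `eliminate_epsilon` are hypothetical, symbols are assumed to be single characters, ε is written as the empty string, and the special case of a nullable start variable (S' → S | ε) is not handled.

```python
from itertools import product


def nullable_variables(grammar):
    """Fixed-point marking of the variables that derive ε."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for variable, bodies in grammar.items():
            if variable not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(variable)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    """Return an equivalent grammar (up to ε itself) without ε-productions."""
    nullable = nullable_variables(grammar)
    result = {}
    for variable, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; others are kept.
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for pick in product(*choices):
                candidate = "".join(pick)
                if candidate:            # never emit X -> ε
                    new_bodies.add(candidate)
        result[variable] = sorted(new_bodies)
    return result


GRAMMAR = {
    "S": ["ACD"], "A": ["a"], "B": [""],
    "C": ["ED", ""], "D": ["BC", "b"], "E": ["b"],
}

for variable, bodies in eliminate_epsilon(GRAMMAR).items():
    print(variable, "->", " | ".join(bodies) if bodies else "(no productions)")
```

For the example grammar this yields S → A | AC | ACD | AD, C → E | ED and D → B | BC | C | b, which are the original productions together with the added ones listed on the slide; B is left with no productions at all.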

Eliminating unit productions

A unit production is a production of the form A → B

grammar:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR

unit productions graph: S → T, T → S, T → R
Removal of unit productions

If there is a cycle of unit productions A → B → ... → C → A, delete it and replace everything with A

Before:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR

The unit productions S → T and T → S form a cycle: replace T by S

After:
S → 0S1 | 1S0S
S → R | ε
R → 0SR
Removal of unit productions

Replace every chain A → B → ... → C → α by A → α, B → α, ..., C → α

Before:
S → 0S1 | 1S0S | R | ε
R → 0SR

After:
S → 0S1 | 1S0S | 0SR | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
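The two steps (collapse cycles, then replace chains) can also be combined into a single pass. The sketch below uses a different but standard formulation, not the slide's exact procedure: for every variable it collects all variables reachable through unit productions and copies their non-unit right-hand sides. The function name `eliminate_unit` is hypothetical, symbols are assumed to be single characters, and ε is written as the empty string.

```python
def eliminate_unit(grammar):
    """Return a grammar with the same language and no unit productions."""
    def is_unit(body):
        return len(body) == 1 and body in grammar   # a single variable

    result = {}
    for variable in grammar:
        # Variables reachable from `variable` using unit productions only.
        reachable, stack = {variable}, [variable]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # Copy every non-unit right-hand side of a reachable variable.
        bodies = {b for v in reachable for b in grammar[v] if not is_unit(b)}
        result[variable] = sorted(bodies)
    return result


GRAMMAR = {"S": ["0S1", "1S0S", "T"], "T": ["S", "R", ""], "R": ["0SR"]}

for variable, bodies in eliminate_unit(GRAMMAR).items():
    print(variable, "->", " | ".join(b or "ε" for b in bodies))
# S -> ε | 0S1 | 0SR | 1S0S
# T -> ε | 0S1 | 0SR | 1S0S
# R -> 0SR
```

The productions for S and R agree with the result on the slide. T is kept here with the same right-hand sides as S instead of being merged into S; it is no longer reachable from the start variable and could be dropped.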

Recap
Problem:

If input is not in the language, parsing will never stop

Solution:

Eliminate ε-productions
Eliminate unit productions
Try all possible derivations but stop parsing when |derived string| > |input|

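Example
S → 0S1 | 0S0S | T
T → S | 0

After eliminating unit productions:
S → 0S1 | 0S0S | 0

input: 0011

S ⇒ 0
S ⇒ 0S1
    0S1 ⇒ 001
    0S1 ⇒ 00S11 (too long)
    0S1 ⇒ 00S0S1 (too long)
S ⇒ 0S0S
    0S0S ⇒ 000S
        000S ⇒ 0000
        000S ⇒ 0000S1 (too long)
        000S ⇒ 0000S0S (too long)
    0S0S ⇒ 00S10S (too long)
    0S0S ⇒ 00S0S0S (too long)

conclusion: 0011 ∉ L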
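The search in this example can be sketched as a breadth-first exploration of leftmost derivations that discards every sentential form longer than the input. This is an illustration under stated assumptions: the function name `derives` is hypothetical, symbols are single characters, and the grammar S → 0S1 | 0S0S | 0 (the form that the derivations above expand) has no ε-productions or unit productions, which is what guarantees that the search stops.

```python
from collections import deque


def derives(grammar, start, target):
    """True if `target` can be derived from `start`.

    Assumes the grammar has no ε-productions and no unit productions,
    so sentential forms never shrink and the search space is finite.
    """
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Position of the leftmost variable, if any.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            continue                      # a terminal string other than target
        for body in grammar[form[index]]:
            new_form = form[:index] + body + form[index + 1:]
            # Stop rule: drop forms that are longer than the input.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return False


GRAMMAR = {"S": ["0S1", "0S0S", "0"]}

print(derives(GRAMMAR, "S", "0011"))  # False
print(derives(GRAMMAR, "S", "001"))   # True
```

Restricting the search to leftmost derivations loses nothing, since every derivable string has a leftmost derivation.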

Problems

Trying all derivations may take a very long time

If input is not in the language, parsing will never stop

Preparations
A faster way to parse: the Cocke-Younger-Kasami algorithm

To use it we must prepare the CFG:

Eliminate ε-productions
Eliminate unit productions
Convert CFG to Chomsky Normal Form

Chomsky Normal Form

A CFG is in Chomsky Normal Form if every production* has the form

A → BC   or   A → a

* Exception: We allow S → ε for start variable only

Convert to Chomsky Normal Form:

A → BcDE

Replace terminals with new variables:
A → BCDE
C → c

Break up sequences with new variables:
A → BX
X → CY
Y → DE
C → c

(Photo: Noam Chomsky)
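The two conversion steps can be sketched as follows. This is an illustration only: the function name `to_chomsky` and the naming scheme for fresh variables (`T_c` for terminal c, `X1`, `X2`, ... for sequence breaks) are assumptions, terminals are assumed to be lowercase letters, and the input is assumed to be free of ε-productions and unit productions, so a right-hand side of length 1 is a single terminal.

```python
def to_chomsky(productions):
    """productions: list of (variable, list of symbols)."""
    result = []
    terminal_vars = {}      # terminal -> variable standing for it
    counter = 0             # numbering for fresh X-variables
    for head, body in productions:
        if len(body) == 1:
            result.append((head, list(body)))   # already of the form A -> a
            continue
        # Step 1: replace terminals with new variables.
        symbols = []
        for s in body:
            if s.islower():
                terminal_vars.setdefault(s, "T_" + s)
                symbols.append(terminal_vars[s])
            else:
                symbols.append(s)
        # Step 2: break up sequences longer than two with new variables.
        while len(symbols) > 2:
            counter += 1
            fresh = "X%d" % counter
            result.append((head, [symbols[0], fresh]))
            head, symbols = fresh, symbols[1:]
        result.append((head, symbols))
    # Productions for the variables introduced in step 1.
    for terminal, variable in terminal_vars.items():
        result.append((variable, [terminal]))
    return result


for head, body in to_chomsky([("A", ["B", "c", "D", "E"])]):
    print(head, "->", " ".join(body))
# A -> B X1
# X1 -> T_c X2
# X2 -> D E
# T_c -> c
```

Up to the names of the new variables, this is the slide's result A → BX, X → CY, Y → DE, C → c.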

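Cocke-Younger-Kasami algorithm
S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Table (each cell lists the variables that derive a substring; the bottom row covers single symbols, the top cell the whole string):

length 5:  {S,A,C}
length 4:  ∅        {S,A,C}
length 3:  ∅        {B}      {B}
length 2:  {S,A}    {B}      {S,C}    {S,A}
length 1:  {B}      {A,C}    {A,C}    {B}      {A,C}
x:         b        a        a        b        a

Idea: We generate each substring of x bottom up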

Parse tree reconstruction

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

[Figure: the CYK table for x = baaba, with the derivations traced back from S in the top cell]

Tracing back the derivations, we obtain the parse tree

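Cocke-Younger-Kasami algorithm
Input: a grammar without ε-productions and unit productions, in Chomsky Normal Form, and an input string x = x1…xk

For all cells in the last row:
If there is a production A → xi, put A in table cell ii

For cells st in other rows:
If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st

[Figure: triangular table with cell 1k at the top, cells 12, 23, ... in the row above the last, and cells 11, 22, ..., kk in the last row, above x1, x2, ..., xk; cell st covers the symbols from column s to column t]

Cell ij records every variable that derives the substring xi…xj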
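The table-filling procedure can be sketched directly from the description above. The function name `cyk` and the grammar encoding (a dictionary from each variable to a list of right-hand sides, each either one terminal or two single-character variables) are assumptions made for this illustration; cells are indexed (s, t) with 1-based, inclusive positions as on the slide.

```python
def cyk(grammar, x):
    """Return the CYK table for x; table[s, t] is the set of variables
    that derive the substring x_s ... x_t."""
    k = len(x)
    table = {}
    # Last row: cell ii holds every A with a production A -> x_i.
    for i in range(1, k + 1):
        table[i, i] = {A for A, bodies in grammar.items() if x[i - 1] in bodies}
    # Other rows, from shorter substrings to longer ones.
    for length in range(2, k + 1):
        for s in range(1, k - length + 2):
            t = s + length - 1
            cell = set()
            for j in range(s, t):                 # split into s..j and j+1..t
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[s, j]
                                and body[1] in table[j + 1, t]):
                            cell.add(A)
            table[s, t] = cell
    return table


GRAMMAR = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}

table = cyk(GRAMMAR, "baaba")
print(sorted(table[1, 5]))   # ['A', 'C', 'S']
print("S" in table[1, 5])    # True: baaba is in the language
```

The input is in the language exactly when the start variable appears in cell 1k. Storing, for each variable in a cell, the split point j and the pair B, C that produced it would allow the parse tree to be traced back.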
