Lecture 05

The document is a lecture on parsing in computer science, covering topics such as the limitations of regular languages, the functionality of parsers, and context-free grammars (CFGs). It explains the structure of CFGs, derivations, parse trees, and addresses issues of ambiguity in grammars. The lecture emphasizes the importance of distinguishing valid strings of tokens and handling ambiguity effectively in programming languages.

Uploaded by

itsmeshinoo

Introduction to Parsing

CS143
Lecture 5

Instructor: Fredrik Kjolstad


Slide design by Prof. Alex Aiken, with modifications
1
Outline

• Limitations of regular languages

• Parser overview

• Context-free grammars (CFG’s)

• Derivations

• Ambiguity
2
Languages and Automata

• Formal languages are very important in CS


– Especially in programming languages

• Regular languages
– The weakest formal languages widely used
– Many applications

• We will today study context-free languages

3
Beyond Regular Languages

• Many languages are not regular

• Strings of balanced parentheses are not regular:

{ (^i )^i | i ≥ 0 }

4
What Can Regular Languages Express?

• Languages requiring counting modulo a fixed integer

• Intuition: A finite automaton that runs long enough must repeat states

• A finite automaton can’t remember the number of times it has visited a particular state
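
The contrast between counting modulo a fixed integer and unbounded counting can be sketched in Python. This is only an illustration: the function names, the symbol "a", and the modulus 3 are assumptions, not part of the lecture. The first function needs a fixed number of states, so it behaves like a finite automaton; the second needs a counter that can grow without bound.

```python
def count_mod_dfa(text: str, symbol: str = "a", modulus: int = 3) -> bool:
    """Accept iff the number of `symbol` occurrences is a multiple of `modulus`."""
    state = 0  # the automaton's entire memory: one of `modulus` states
    for ch in text:
        if ch == symbol:
            state = (state + 1) % modulus
    return state == 0


def is_nested(text: str) -> bool:
    """Accept exactly the strings (^i )^i, using an unbounded counter."""
    depth = 0
    pos = 0
    # Count the opening parentheses.
    while pos < len(text) and text[pos] == "(":
        depth += 1
        pos += 1
    # Cancel them against the closing parentheses.
    while pos < len(text) and text[pos] == ")":
        depth -= 1
        pos += 1
    return pos == len(text) and depth == 0
```

The variable `depth` has no upper bound, which is exactly the kind of memory a finite automaton lacks.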

5
The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: parse tree of the program
(Conceptually, but in practice parsers return an AST)

6
Example

• Cool
if x = y then 1 else 2 fi
• Parser input
IF ID = ID THEN INT ELSE INT FI
• Parser output
        IF-THEN-ELSE
       /     |      \
      =     INT     INT
     / \
   ID   ID
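
To make the parser input concrete, here is a minimal lexer sketch in Python that turns the Cool text above into the token sequence shown. It is an assumption-laden illustration: the names `tokenize`, `KEYWORDS`, and `TOKEN_RE` are invented here, only the four keywords of this example are handled, and real Cool lexing has many more rules.

```python
import re

KEYWORDS = {"if", "then", "else", "fi"}  # only the keywords used in this example
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(=))")


def tokenize(source: str) -> list[str]:
    """Map a character string to a list of token names."""
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, equals = match.groups()
        if number is not None:
            tokens.append("INT")
        elif word is not None:
            # Keywords become their own token; everything else is an identifier.
            tokens.append(word.upper() if word in KEYWORDS else "ID")
        else:
            tokens.append("=")
        pos = match.end()
    return tokens
```

The parser never sees the names x and y or the values 1 and 2 in this sketch, only the token kinds.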
7
Comparison with Lexical Analysis

String of characters → [Lexer] → String of tokens → [Parser] → Parse tree

8
The Role of the Parser

• Not all strings of tokens are programs . . .

• . . . parser must distinguish between valid and invalid strings of tokens

• We need
– A language for describing valid strings of tokens
– A method for distinguishing valid from invalid strings of tokens

9
Context-Free Grammars

• Programming language constructs have recursive structure

• An EXPR is
if EXPR then EXPR else EXPR fi
while EXPR loop EXPR pool

• Context-free grammars are a natural notation for this recursive structure

10
CFGs (Cont.)

• A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions

X → Y1 Y2 … Yn
where X ∈ N and Yi ∈ T ∪ N ∪ {ε}
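
The four components of a CFG map directly onto a small data structure. The Python sketch below is one possible encoding, not a standard one: the class name `Grammar`, the `check` method, and the constant `EPSILON` are assumptions made for illustration.

```python
from dataclasses import dataclass

EPSILON = "ε"  # marker for the empty string on a right-hand side


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    start: str
    productions: tuple  # pairs (lhs, rhs) where rhs is a tuple of symbols

    def check(self) -> None:
        """Raise ValueError if the grammar violates the definition above."""
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        allowed = self.terminals | self.nonterminals | {EPSILON}
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            for symbol in rhs:
                if symbol not in allowed:
                    raise ValueError(f"unknown symbol {symbol!r} in production")


# The arithmetic grammar E → E ∗ E | E + E | (E) | id in this encoding.
ARITH = Grammar(
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(
        ("E", ("E", "*", "E")),
        ("E", ("E", "+", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
)
ARITH.check()
```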

11
Notational Conventions

• In these lecture notes
– Non-terminals are written upper-case
– Terminals are written lower-case
– The start symbol is the left-hand side of the first production

12
Examples of CFGs

A fragment of Cool:

EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id

13
Examples of CFGs (cont.)

Simple arithmetic expressions:

E → E ∗ E
  | E + E
  | (E)
  | id
(Running example this week and next)
14
The Language of a CFG

Read productions as rules:

X → Y1 … Yn

means X can be replaced by Y1 … Yn

15
Key Idea

1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal X in the string by the right-hand side of some production
X → Y1 … Yn
3. Repeat (2) until there are no non-terminals in the string
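
The three steps above can be sketched as a short Python loop. This is an illustration under assumptions: the grammar S → (S) | ε is borrowed from the balanced-parentheses example, the dictionary encoding and the name `derive` are invented here, and productions are picked at random, which is just one way to make the choice in step 2.

```python
import random

# S → (S) | ε, with ε encoded as an empty right-hand side.
GRAMMAR = {"S": [["(", "S", ")"], []]}


def derive(start, grammar, rng, max_steps=50):
    """Return every intermediate string of one randomly chosen derivation."""
    form = [start]                      # step 1: begin with the start symbol
    history = [list(form)]
    steps = 0
    while any(sym in grammar for sym in form):   # step 3: repeat
        if steps == max_steps:
            raise RuntimeError("derivation did not finish within max_steps")
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        i = rng.choice(positions)       # step 2: pick any non-terminal ...
        rhs = rng.choice(grammar[form[i]])
        form[i:i + 1] = rhs             # ... and replace it by a right-hand side
        history.append(list(form))
        steps += 1
    return history
```

Calling `derive("S", GRAMMAR, random.Random(0))` returns a list that starts with `["S"]` and ends with a string of terminals only.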

16
The Language of a CFG (Cont.)

More formally, write

X1 … Xi-1 Xi Xi+1 ... Xn → X1 … Xi-1 Y1 ... Ym Xi+1 ... Xn

if there is a production

Xi → Y1 … Ym

17
The Language of a CFG (Cont.)

Write
X1 … Xn →* Y1 ... Ym
if
X1 … Xn → … → … → Y1 ... Ym

in 0 or more steps

18
The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language of G is:

{a1…an | S →* a1…an and every ai is a terminal }
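
For a small grammar the definition can be explored directly by enumerating every string of terminals reachable from the start symbol. The Python sketch below does this for the arithmetic grammar up to a length bound. The name `language_up_to` and the encoding are assumptions, and the length-based pruning is only valid because no production in this grammar shrinks the string (there are no ε-productions).

```python
from collections import deque

GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}


def language_up_to(start, grammar, max_len):
    """All terminal strings of at most max_len symbols derivable from start."""
    seen = {(start,)}
    queue = deque(seen)
    found = set()
    while queue:
        form = queue.popleft()
        # Position of the left-most non-terminal, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add(form)             # only terminals left: a member of L(G)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Safe to prune: strings never get shorter in this grammar.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found
```

With a bound of 3 symbols this finds id, id + id, id ∗ id and (id).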

19
Terminals

• Terminals are so-called because there are no rules for replacing them

• Once generated, terminals are permanent

• Terminals ought to be tokens of the language

20
Examples

L(G) is the language of CFG G

Strings of balanced parentheses { (^i )^i | i ≥ 0 }

Two grammars:

S → (S)              S → (S)
S → ε        OR        | ε
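
A recognizer for this grammar can mirror the two productions with one recursive function. The sketch below is a hand-written illustration in Python; the names `match_s` and `in_language` are assumptions.

```python
def match_s(text: str, pos: int = 0) -> int:
    """Consume one S starting at pos and return the position just after it."""
    if pos < len(text) and text[pos] == "(":
        # S → ( S )
        after_inner = match_s(text, pos + 1)
        if after_inner < len(text) and text[after_inner] == ")":
            return after_inner + 1
        raise ValueError(f"expected ')' at position {after_inner}")
    # S → ε
    return pos


def in_language(text: str) -> bool:
    """True iff the whole text is derived from S."""
    try:
        return match_s(text) == len(text)
    except ValueError:
        return False
```

The recursion depth plays the role of the counter that a finite automaton cannot keep.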

21
Cool Example

A fragment of Cool:

EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id

22
Cool Example (Cont.)

Some elements of the language of this Cool fragment

id
if id then id else id fi
while id loop id pool
if while id loop id pool then id else id fi
if if id then id else id fi then id else id fi

23
Arithmetic Example

Simple arithmetic expressions:

E → E+E | E ∗ E | (E) | id

Some elements of the language:
id            id + id
(id)          id ∗ id
(id) ∗ id     id ∗ (id)
24
Notes

The idea of a CFG is a big step. But:

• Membership in a language is “yes” or “no”
– We also need a parse tree of the input

• Must handle errors gracefully

• Need an implementation of CFG’s (e.g., bison)

25
More Notes

• Form of the grammar is important
– Many grammars generate the same language
– Tools are sensitive to the grammar

– Note: Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice

26
Derivations and Parse Trees

A derivation is a sequence of productions leading to a string of only terminals
S→…→d…

A derivation can be drawn as a tree
– Start symbol is the tree’s root
– For a production X → Y1 … Yn add children Y1 … Yn to node X

27
Derivation Example

• Grammar
E → E+E | E ∗ E | (E) | id
• String
id ∗ id + id

28
Derivation Example (cont.)

E
→ E + E
→ E ∗ E + E
→ id ∗ E + E
→ id ∗ id + E
→ id ∗ id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id
29
Derivation in Detail (1)

30
Derivation in Detail (2)

E
→ E + E

    E
  / | \
 E  +  E

31
Derivation in Detail (3)

E
→ E + E
→ E ∗ E + E

        E
      / | \
     E  +  E
   / | \
  E  *  E

32
Derivation in Detail (4)

E
→ E + E
→ E ∗ E + E
→ id ∗ E + E

        E
      / | \
     E  +  E
   / | \
  E  *  E
  |
  id

33
Derivation in Detail (5)

E
→ E + E
→ E ∗ E + E
→ id ∗ E + E
→ id ∗ id + E

        E
      / | \
     E  +  E
   / | \
  E  *  E
  |     |
  id    id

34
Derivation in Detail (6)

E
→ E + E
→ E ∗ E + E
→ id ∗ E + E
→ id ∗ id + E
→ id ∗ id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id
35
Notes on Derivations

• A parse tree has
– Terminals at the leaves
– Non-terminals at the interior nodes

• An in-order traversal of the leaves is the original input

• The parse tree shows the association of operations, the input string does not
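
These properties can be checked on the parse tree of id ∗ id + id from the derivation example. The Python sketch below uses an assumed encoding (a terminal is a string, an interior node is a pair of a non-terminal and its list of children); the helper names `leaves` and `bracketed` are invented for this illustration.

```python
# Parse tree for id * id + id: terminals are strings,
# interior nodes are (non-terminal, [children]).
TREE = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])


def leaves(tree):
    """Left-to-right traversal of the leaves: recovers the input string."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def bracketed(tree):
    """Render the grouping that the tree encodes and the flat input does not."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    if len(children) == 1:
        return bracketed(children[0])
    return "(" + " ".join(bracketed(child) for child in children) + ")"
```

Here `leaves(TREE)` gives back the token sequence id ∗ id + id, while `bracketed(TREE)` makes the association explicit as ((id * id) + id).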

36
Left-most and Right-most Derivations

• The example is a left-most derivation
– At each step, replace the left-most non-terminal

• There is an equivalent notion of a right-most derivation

E
→ E + E
→ E + id
→ E ∗ E + id
→ E ∗ id + id
→ id ∗ id + id
37
Right-most Derivation in Detail (1)

38
Right-most Derivation in Detail (2)

E
→ E + E

    E
  / | \
 E  +  E

39
Right-most Derivation in Detail (3)

E
→ E + E
→ E + id

    E
  / | \
 E  +  E
       |
       id

40
Right-most Derivation in Detail (4)

E
→ E + E
→ E + id
→ E ∗ E + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id

41
Right-most Derivation in Detail (5)

E
→ E + E
→ E + id
→ E ∗ E + id
→ E ∗ id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
        |
        id

42
Right-most Derivation in Detail (6)

E
→ E + E
→ E + id
→ E ∗ E + id
→ E ∗ id + id
→ id ∗ id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id
43
Derivations and Parse Trees

• Note that right-most and left-most derivations have the same parse tree

• The difference is the order in which branches are added
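
The claim can be checked by replaying both derivations of id ∗ id + id and comparing the trees they build. The Python sketch below is an illustration under assumptions: the names `build_tree`, `freeze`, `LEFT`, and `RIGHT` are invented, and each derivation is given as the list of productions applied, in order.

```python
NONTERMINALS = {"E"}


def freeze(node):
    """Convert the mutable [symbol, children] form into nested tuples."""
    symbol, children = node
    if children is None:
        return symbol
    return (symbol, tuple(freeze(child) for child in children))


def build_tree(start, steps, leftmost=True):
    """Replay a derivation, expanding the left-most or right-most non-terminal."""
    root = [start, None]       # children is None until the node is expanded
    frontier = [root]          # current leaves, left to right
    for lhs, rhs in steps:
        candidates = [i for i, n in enumerate(frontier) if n[0] in NONTERMINALS]
        if not candidates:
            raise ValueError("no non-terminal left to expand")
        i = candidates[0] if leftmost else candidates[-1]
        node = frontier[i]
        if node[0] != lhs:
            raise ValueError(f"cannot apply a production for {lhs!r} here")
        node[1] = [[symbol, None] for symbol in rhs]
        frontier[i:i + 1] = node[1]   # the new children replace the expanded leaf
    return freeze(root)


PLUS, TIMES, ID = ("E", "+", "E"), ("E", "*", "E"), ("id",)
LEFT = [("E", PLUS), ("E", TIMES), ("E", ID), ("E", ID), ("E", ID)]
RIGHT = [("E", PLUS), ("E", ID), ("E", TIMES), ("E", ID), ("E", ID)]
```

Both `build_tree("E", LEFT, leftmost=True)` and `build_tree("E", RIGHT, leftmost=False)` produce the same nested tuple; only the order in which branches are filled in differs.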

44
Summary of Derivations

• We are not just interested in whether s ∈ L(G)
– We need a parse tree for s

• A derivation defines a parse tree
– But one parse tree may have many derivations

• Left-most and right-most derivations are important in parser implementation

45
Ambiguity

• Grammar E → E+E | E ∗ E | (E) | id


• String id ∗ id + id

46
Ambiguity (Cont.)

This string has two parse trees


        E                     E
      / | \                 / | \
     E  +  E               E  *  E
   / | \   |               |   / | \
  E  *  E  id              id E  +  E
  |     |                     |     |
  id    id                    id    id

47
Ambiguity (Cont.)

• A grammar is ambiguous if it has more than one parse tree for some string
– Equivalently, there is more than one right-most or left-most derivation for some string

• Ambiguity is BAD
– Leaves meaning of some programs ill-defined
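
For the arithmetic grammar E → E+E | E ∗ E | (E) | id, the number of parse trees of a token string can be counted directly, which gives a concrete test for this definition on individual strings. The Python sketch below is hard-wired to that one grammar; the name `count_parse_trees` and the span-based recursion are assumptions made for illustration, not a general ambiguity checker.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees deriving `tokens` from E in the grammar above."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":                     # E → id
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                         # E → (E)
        for k in range(i + 1, j - 1):                            # E → E+E | E ∗ E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A result greater than 1 for any string shows the grammar is ambiguous; `count_parse_trees("id * id + id".split())` returns 2.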

48
Dealing with Ambiguity

E → E+E | E ∗ E | (E) | id

• There are several ways to handle ambiguity

• Most direct method is to rewrite grammar unambiguously
E → Eʹ + E | Eʹ
Eʹ → id ∗ Eʹ | id | (E) ∗ Eʹ | (E)

• Enforces precedence of * over +
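
Because the rewritten grammar is unambiguous, each non-terminal can be turned into one function of a hand-written recursive-descent parser. The Python sketch below is one such illustration: the names `parse`, `parse_e`, and `parse_e_prime` are assumptions, and it returns a compact AST of nested tuples rather than the full parse tree.

```python
def parse(tokens):
    """Parse with E → Eʹ + E | Eʹ and Eʹ → id ∗ Eʹ | id | (E) ∗ Eʹ | (E)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r} at token {pos}")
        pos += 1

    def parse_e():
        left = parse_e_prime()
        if peek() == "+":                  # E → Eʹ + E
            expect("+")
            return ("+", left, parse_e())
        return left                        # E → Eʹ

    def parse_e_prime():
        if peek() == "id":
            expect("id")
            left = "id"
        else:
            expect("(")
            left = parse_e()
            expect(")")
        if peek() == "*":                  # ... ∗ Eʹ
            expect("*")
            return ("*", left, parse_e_prime())
        return left

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

For id ∗ id + id the result is `("+", ("*", "id", "id"), "id")`: the multiplication sits below the addition, so there is only one way to group the string.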

49
Ambiguity in Arithmetic Expressions

• Recall the grammar
E → E + E | E * E | ( E ) | int

• The string int * int + int has two parse trees:

         E                    E
       / | \                / | \
      E  +  E              E  *  E
    / | \   |              |   / | \
   E  *  E  int           int E  +  E
   |     |                    |     |
  int   int                  int   int


50
Ambiguity: The Dangling Else

• Consider the grammar
E → if E then E
  | if E then E else E
  | OTHER

• This grammar is also ambiguous

51
The Dangling Else: Example

• The expression
if E1 then if E2 then E3 else E4
has two parse trees

      if                        if
    / |  \                     /  \
  E1  if  E4                 E1    if
     /  \                        / |  \
   E2    E3                    E2  E3  E4

• Typically we want the second form

52
The Dangling Else: A Fix

E → if E then E
  | if E then E else E
  | OTHER

• else matches the closest unmatched then

• We can describe this in the grammar
E → MIF /* all then are matched */
  | UIF /* some then is unmatched */

MIF → if E then MIF else MIF
    | OTHER

UIF → if E then E
    | if E then MIF else UIF

Key: Disallow if-then inside then-clause

53
The Dangling Else: Example Revisited

• The expression if E1 then if E2 then E3 else E4

UIF → if E then E
    | if E then MIF else UIF

    if                          if
   /  \                       / |  \
 E1    if                   E1  if  E4
     / |  \                    /  \
   E2  E3  E4                E2    E3

• Left: a valid parse tree (for a UIF)
• Right: not valid because the then expression is not a MIF
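
The rule that else matches the closest unmatched then is also what a hand-written recursive-descent parser does if it greedily consumes an else whenever one follows a then-branch. The Python sketch below shows that greedy approach; it is a different technique from the MIF/UIF grammar rewrite, and the name `parse_if` and the tuple encoding are assumptions. Any token other than if, then, and else stands for OTHER.

```python
def parse_if(tokens):
    """Greedy parser: every else attaches to the closest unmatched then."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_expr():
        if peek() == "if":
            advance()
            cond = parse_expr()
            if peek() != "then":
                raise SyntaxError(f"expected 'then' at token {pos}")
            advance()
            then_branch = parse_expr()
            if peek() == "else":           # taken by the innermost open if
                advance()
                return ("if", cond, then_branch, parse_expr())
            return ("if", cond, then_branch)
        if peek() in (None, "then", "else"):
            raise SyntaxError(f"expected an expression at token {pos}")
        return advance()                   # OTHER

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

For the expression above the result is `("if", "E1", ("if", "E2", "E3", "E4"))`, the tree in which E4 belongs to the inner if.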
54
Ambiguity

• No general techniques for handling ambiguity

• Impossible to automatically convert an ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the grammar
– Sometimes allows more natural definitions
– We need disambiguation mechanisms

55
Precedence and Associativity Declarations

• Instead of rewriting the grammar
– Use the more natural (ambiguous) grammar
– Along with disambiguating declarations

• Most tools allow precedence and associativity declarations to disambiguate grammars

• Examples …

56
Associativity Declarations

• Consider the grammar E → E + E | int

• Ambiguous: two parse trees of int + int + int

         E                    E
       / | \                / | \
      E  +  E              E  +  E
    / | \   |              |   / | \
   E  +  E  int           int E  +  E
   |     |                    |     |
  int   int                  int   int

• Left associativity declaration: %left +


57
Precedence Declarations

• Consider the grammar E → E + E | E * E | int
– And the string int + int * int

         E                    E
       / | \                / | \
      E  *  E              E  +  E
    / | \   |              |   / | \
   E  +  E  int           int E  *  E
   |     |                    |     |
  int   int                  int   int

• Precedence declarations:
%left +
%left *
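
The effect of these declarations can be imitated in a small hand-written parser. The Python sketch below uses precedence climbing, which is a different technique from the way a parser generator applies such declarations, but it selects the same tree for each string. The table `DECLARATIONS` mirrors the two %left lines (later entries bind tighter); all names are assumptions made for illustration.

```python
# Mirrors "%left +" followed by "%left *": later entries bind tighter.
DECLARATIONS = [("left", "+"), ("left", "*")]
PRECEDENCE = {op: level for level, (_, op) in enumerate(DECLARATIONS, start=1)}
ASSOC = {op: assoc for assoc, op in DECLARATIONS}


def parse(tokens):
    """Parse E → E + E | E * E | int using the declarations above."""
    pos = 0

    def parse_expr(min_level):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "int":
            raise SyntaxError(f"expected 'int' at token {pos}")
        pos += 1
        left = "int"
        while (pos < len(tokens)
               and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_level):
            op = tokens[pos]
            pos += 1
            # Left-associative: the right operand may only contain tighter operators.
            next_min = PRECEDENCE[op] + 1 if ASSOC[op] == "left" else PRECEDENCE[op]
            right = parse_expr(next_min)
            left = (op, left, right)
        return left

    tree = parse_expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

With these declarations int + int * int groups the multiplication first, and int + int + int groups to the left.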
58
