
Chapter 4

Chapter 4 discusses grammars and parsing, focusing on the differences between context-free grammars (CFG) and regular expressions (r.e.), including their applications in programming languages. It covers the structure and capabilities of CFGs, parsing techniques, and the importance of error handling in grammars. The chapter also addresses ambiguity in grammars and presents algorithms for grammar analysis.

Chapter 4:

Grammars and Parsing
陳奇業, Department of Computer Science and Information Engineering, National Cheng Kung University

1
Parsing: Syntax Analysis
- Decides which parts of the incoming token stream should be grouped together.
- The output of parsing is some representation of a parse tree.
- The intermediate code generator then transforms the parse tree into an intermediate language.

2
Comparisons between regular expressions and context-free grammars
- A context-free grammar:
    exp → exp op exp | ( exp ) | number
    op → + | − | ∗
- A regular expression:
    number = digit digit*
    digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The major difference is that the rules of a context-free grammar are recursive.

3
Rules from F.A. (r.e.) to CFG
1. For each state i there is a nonterminal symbol A_i.
2. If state i has a transition to state j on symbol a, introduce A_i → a A_j.
3. If state i goes to state j on input ε, introduce A_i → A_j.
4. If i is an accepting state, introduce A_i → ε.
5. Make the nonterminal of the start state of the NFA the start symbol of the grammar.

4
Examples
(1) r.e.: (a|b)(a|b|0|1)*
c.f.g.: S → aA | bA
        A → aA | bA | 0A | 1A | ε

(2) r.e.: (a|b)*abb
c.f.g.: S → aS | bS | aA
        A → bB
        B → bC
        C → ε
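
The five rules for turning a finite automaton into a grammar can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `nfa_to_grammar`, the transition-triple format, and the use of the state names S, A, B, C as nonterminals are all assumptions made for the example.

```python
EPS = "ε"

def nfa_to_grammar(transitions, start, accepting):
    """Build a right-linear grammar from an NFA.

    transitions: iterable of (state, symbol, next_state); symbol None is an ε-move.
    Each state name doubles as its nonterminal (rule 1).
    """
    grammar = {}
    for src, sym, dst in transitions:
        grammar.setdefault(src, [])
        grammar.setdefault(dst, [])
        # Rule 2: A_i → a A_j; rule 3: A_i → A_j for an ε-move.
        grammar[src].append([dst] if sym is None else [sym, dst])
    for state in accepting:
        # Rule 4: an accepting state may derive the empty string.
        grammar.setdefault(state, []).append([EPS])
    # Rule 5: the start state's nonterminal is the start symbol.
    return start, grammar

# Hypothetical NFA for (a|b)*abb with states S, A, B, C (C accepting).
transitions = [
    ("S", "a", "S"), ("S", "b", "S"), ("S", "a", "A"),
    ("A", "b", "B"), ("B", "b", "C"),
]
start, grammar = nfa_to_grammar(transitions, "S", {"C"})
for lhs, alternatives in grammar.items():
    print(lhs, "→", " | ".join(" ".join(rhs) for rhs in alternatives))
```

With this NFA the printed productions match example (2): S → a S | b S | a A, A → b B, B → b C, C → ε.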

5
Why don't we use c.f.g. to replace r.e.?
- An r.e. gives an easy and clear description of tokens.
- An r.e. yields an efficient token recognizer.
- Keeping both modularizes the components: the grammar rules use regular expressions as components.

6
Features of programming languages
 contents:
- declarations
- sequential statements
- iterative statements
- conditional statements

7
Description of programming languages
 Syntax Diagrams
 Context Free Grammars (CFG)

8
Context-Free Grammar (in BNF)

exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number
History
- In 1956, Chomsky introduced context-free grammars for the description of natural language.
- The syntactic specification of programming languages is given as a CFG written in BNF (Backus-Naur Form), a notation introduced for ALGOL around 1960.

11
Capabilities of Context-free grammars
- Give a precise syntactic specification of programming languages.
- A parser can be constructed automatically from a CFG.
- The syntactic entities specified in a CFG can be used for translation into object code.
- Useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else, etc.

12
Context-Free Grammars: Concepts and Notation
- A context-free grammar G = (V_t, V_n, S, P)
  - A finite terminal vocabulary V_t
    - The token set produced by the scanner
  - A finite nonterminal vocabulary V_n
    - Intermediate symbols
  - A start symbol S ∈ V_n that starts all derivations
    - Also called the goal symbol
  - P, a finite set of productions (rewriting rules) of the form A → X_1 X_2 … X_m
    - A ∈ V_n, X_i ∈ V_n ∪ V_t, 1 ≤ i ≤ m
    - A → ε (the empty string) is a valid production
13
Context-Free Grammars: Concepts and Notation
- G = (V_t, V_n, S, P) with V_t = {+, −, ∗, (, ), number}, V_n = {exp, op}, S = exp
- P: exp → exp op exp | ( exp ) | number, op → + | − | ∗
- Example sentence: (number − number) ∗ number

14
Context-Free Grammars: Concepts and Notation (Cont’d)
- Other notations
  - Vocabulary V of G: V = V_t ∪ V_n
  - L(G), the set of strings derivable from S
    - The context-free language of grammar G
- Notational conventions
  - a, b, c, … denote symbols in V_t
  - A, B, C, … denote symbols in V_n
  - U, V, W, … denote symbols in V
  - α, β, γ, … denote strings in V*
  - u, v, w, … denote strings in V_t*
15
Context-Free Grammars: Concepts and Notation (Cont’d)
- Derivation
  - One-step derivation: if A → γ, then αAβ ⇒ αγβ
  - One or more steps: ⇒+
  - Zero or more steps: ⇒*
- If S ⇒* β, then β is said to be a sentential form of the CFG
  - SF(G) is the set of sentential forms of grammar G (they may contain nonterminals)
  - L(G) = { w ∈ V_t* | S ⇒+ w }
  - L(G) = SF(G) ∩ V_t*
16
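
The relation between sentential forms and the language, L(G) = SF(G) ∩ V_t*, can be made concrete with a bounded search. The sketch below is only an illustration: SF(G) is usually infinite, so it enumerates forms up to an assumed length limit `max_len`, and the function names and the tuple encoding of productions are choices made for this example. The grammar is the one given earlier for (a|b)*abb.

```python
from collections import deque

# S → aS | bS | aA, A → bB, B → bC, C → ε (the empty tuple encodes ε).
GRAMMAR = {
    "S": [("a", "S"), ("b", "S"), ("a", "A")],
    "A": [("b", "B")],
    "B": [("b", "C")],
    "C": [()],
}

def sentential_forms(grammar, start, max_len):
    """All sentential forms of at most max_len symbols reachable from start."""
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        for i, symbol in enumerate(form):
            if symbol not in grammar:
                continue  # terminals are never rewritten
            for rhs in grammar[symbol]:
                # One-step derivation: αAβ ⇒ αγβ for a production A → γ.
                new_form = form[:i] + rhs + form[i + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return seen

def language(grammar, start, max_len):
    """Sentential forms that consist of terminals only, i.e. SF(G) ∩ V_t*."""
    return {
        "".join(form)
        for form in sentential_forms(grammar, start, max_len)
        if all(symbol not in grammar for symbol in form)
    }

print(sorted(language(GRAMMAR, "S", 5)))  # ['aabb', 'abb', 'babb']
```

Forms such as `a a A` are in SF(G) but not in L(G), because they still contain a nonterminal.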
Parsers and Recognizers (Cont’d)
- Naming of parsing techniques
  - First letter: the way the token sequence is parsed (L: left to right)
  - Second letter: the derivation produced (L: leftmost, R: rightmost)
  - Top-down: LL
  - Bottom-up: LR

46
Context-Free Grammars: Concepts and Notation (Cont’d)
- Leftmost derivation (the order used by top-down parsers): the leftmost nonterminal is expanded at each step
  - Notation: ⇒lm, ⇒lm+, ⇒lm*
  - E.g. the leftmost derivation of f(v+v)

17
Context-Free Grammars: Concepts and Notation (Cont’d)
- Rightmost derivation (canonical derivation): the rightmost nonterminal is expanded at each step
  - Notation: ⇒rm, ⇒rm+, ⇒rm*
  - E.g. the rightmost derivation of f(v+v)

18
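
The difference between leftmost and rightmost derivations of f(v+v) can be traced with a small helper. The grammar used here is an assumption, since the slides' grammar figure did not survive extraction: E → Prefix ( E ) | v Tail, Prefix → f | ε, Tail → + E | ε. The function `derive` and the step lists are likewise illustrative.

```python
NONTERMINALS = {"E", "Prefix", "Tail"}

def derive(start, steps, leftmost=True):
    """Apply each (lhs, rhs) production to the leftmost or rightmost nonterminal."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"cannot apply {lhs} → {rhs} at {form[i]}")
        form[i:i + 1] = rhs  # replace the chosen nonterminal by the right-hand side
        forms.append(" ".join(form))
    return forms

# Same productions, applied in different orders (an empty list encodes ε).
LEFTMOST_STEPS = [
    ("E", ["Prefix", "(", "E", ")"]), ("Prefix", ["f"]), ("E", ["v", "Tail"]),
    ("Tail", ["+", "E"]), ("E", ["v", "Tail"]), ("Tail", []),
]
RIGHTMOST_STEPS = [
    ("E", ["Prefix", "(", "E", ")"]), ("E", ["v", "Tail"]), ("Tail", ["+", "E"]),
    ("E", ["v", "Tail"]), ("Tail", []), ("Prefix", ["f"]),
]

print(" ⇒lm ".join(derive("E", LEFTMOST_STEPS, leftmost=True)))
print(" ⇒rm ".join(derive("E", RIGHTMOST_STEPS, leftmost=False)))
```

Both derivations end in `f ( v + v )` and correspond to the same parse tree; only the order in which nonterminals are expanded differs.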

19
Method      Classic approach       Modern approach
top-down    recursive descent      LL parsing (produces a leftmost derivation)
bottom-up   operator precedence    LR parsing (shift-reduce parsing; produces a rightmost derivation in reverse order)

20
Context-Free Grammars: Concepts and Notation (Cont’d)
- A parse tree
  - is rooted at the start symbol
  - has leaves that are grammar symbols or ε
  - is a graphical representation of a derivation
  - (Note the difference between a parse tree and a syntax tree.)
- Often the parse tree is produced in only a figurative sense; in reality, the parse tree exists only as a sequence of actions made by stepping through the tree construction process.

21
Errors in Context-Free Grammars
- CFGs are a definitional mechanism. They may have errors, just as programs may.
- Flawed CFG
  - Useless nonterminals
    - Unreachable
    - Derive no terminal string

S → A | B
A → a
B → B b      (nonterminal B derives no terminal string)
C → c        (nonterminal C cannot be reached from S)

S is the start symbol.
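
Both kinds of useless nonterminal can be found mechanically. The following is a minimal sketch, assuming a dictionary encoding of the grammar above; the function names `productive` and `reachable` are chosen for this example and are not from the slides.

```python
GRAMMAR = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}

def productive(grammar):
    """Nonterminals that derive at least one terminal string (fixed-point marking)."""
    done = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in done:
                continue
            # A right-hand side is fine if every symbol is a terminal or already marked.
            if any(all(s in done or s not in grammar for s in rhs) for rhs in alternatives):
                done.add(lhs)
                changed = True
    return done

def reachable(grammar, start):
    """Nonterminals that occur in some sentential form derived from start."""
    seen, stack = {start}, [start]
    while stack:
        for rhs in grammar[stack.pop()]:
            for symbol in rhs:
                if symbol in grammar and symbol not in seen:
                    seen.add(symbol)
                    stack.append(symbol)
    return seen

print(sorted(set(GRAMMAR) - productive(GRAMMAR)))      # ['B']
print(sorted(set(GRAMMAR) - reachable(GRAMMAR, "S")))  # ['C']
```

The two checks are independent: B is reachable but derives no terminal string, while C derives a terminal string but is unreachable.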


22
Errors in Context-Free Grammars
- Ambiguous:
  - Grammars that allow different parse trees for the same terminal string
  - In general it is undecidable whether a given CFG is ambiguous: no algorithm can decide this for every CFG

23
Ambiguity
Ambiguous Grammars
- Def.: A context-free grammar that can produce more than one parse tree for some sentence.
- Ways to disambiguate a grammar: (1) specify the intention separately (e.g. associativity and precedence for arithmetic operators); (2) rewrite the grammar to incorporate the intention into the grammar itself.

24
For (1) Precedence: ( ) > negate > exponent > * / > + -
        Associativity: exponent is right associative; the others are left associative
For (2) Introduce one nonterminal for each precedence level.

25
Example 1
E -> E + E | E-E | E * E | E / E | E ↑ E | ( E ) | - E | id
is ambiguous (↑ is exponent operator with right associativity.)

26
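[Figure: two parse trees for id + id * id. In one the root expands to E + E and its right operand expands to E * E, i.e. id + (id * id); in the other the root expands to E * E and its left operand expands to E + E, i.e. (id + id) * id.]
More than one parse tree for the sentence id + id * id

[Figure: the two corresponding syntax trees, one rooted at + and one rooted at *.]
More than one syntax tree for the sentence id + id * id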
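
The ambiguity can be checked by counting parse trees. The sketch below is deliberately narrow: it handles only the fragment E → E + E | E * E | id of the example grammar, and the function name `count_parse_trees` is an assumption made for this illustration.

```python
from functools import lru_cache

OPERATORS = {"+", "*"}

def count_parse_trees(tokens):
    """Count parse trees of tokens under E → E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                # tokens[k] is the operator at the root: E op E.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))  # 2
```

Any count above 1 for some sentence shows that the grammar is ambiguous; a count of 1 for one sentence proves nothing about the grammar as a whole.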
The corresponding grammar shown below is unambiguous
element → ( expression ) | id /* what is inside the parentheses is evaluated first */
primary → - primary | element
factor → primary ↑ factor | primary /* has right associativity */
term → term * factor | term / factor | factor
expression → expression + term | expression - term | term

29
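Ex: id + id * id
[Figure: parse tree for id + id * id. The root expression expands to expression + term. The left expression derives the first id through term, factor, primary and element. The right term expands to term * factor, and each of its operands derives an id through primary and element.]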
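
How the layered grammar enforces precedence and associativity can be seen in a small recursive-descent sketch. This is an illustration under assumptions, not the slides' parser: the left-recursive rules for expression and term are replaced by loops (a standard substitution that keeps left associativity), tokens are plain strings, and the tuple-based tree format is made up for the example.

```python
def parse(tokens):
    """Parse a token list with one function per precedence level."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expression():          # expression → expression (+|-) term | term
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())   # loop builds a left-leaning tree
        return node

    def term():                # term → term (*|/) factor | factor
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, factor())
        return node

    def factor():              # factor → primary ↑ factor | primary
        node = primary()
        if peek() == "↑":
            take()
            node = ("↑", node, factor())  # recursion on the right: right associative
        return node

    def primary():             # primary → - primary | element
        if peek() == "-":
            take()
            return ("neg", primary())
        return element()

    def element():             # element → ( expression ) | id
        if peek() == "(":
            take()
            node = expression()
            take(")")
            return node
        return take("id")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("id + id * id".split()))  # ('+', 'id', ('*', 'id', 'id'))
```

Because `*` is handled one level below `+`, the multiplication is grouped first, and only one tree is ever produced for a given input.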
Example 2
stat → IF cond THEN stat | IF cond THEN stat ELSE stat | other-stat

is an ambiguous grammar

33
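Dangling else problem
[Figure: two parse trees for the statement if c1 then if c2 then s2 else s3. In the first, the root is IF cond THEN stat and the ELSE belongs to the inner IF: if c1 then (if c2 then s2 else s3). In the second, the root is IF cond THEN stat ELSE stat and the ELSE belongs to the outer IF: if c1 then (if c2 then s2) else s3.]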
The corresponding grammar shown below is unambiguous.
stat → matched-stat | unmatched-stat
matched-stat → IF cond THEN matched-stat ELSE matched-stat | other-stat
unmatched-stat → IF cond THEN stat | IF cond THEN matched-stat ELSE unmatched-stat
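
The effect of the matched/unmatched grammar is that every ELSE pairs with the nearest unpaired IF. A hand-written parser gets the same grouping by greedily consuming an `else` right after the inner statement. The sketch below uses that greedy rule rather than the grammar itself, and its token format and function name `parse_stat` are assumptions for illustration.

```python
def parse_stat(tokens, pos=0):
    """Parse one statement starting at pos; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1           # other-stat: a single token
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then' after the condition")
    cond = tokens[pos + 1]
    then_part, pos = parse_stat(tokens, pos + 3)
    # Greedy rule: an 'else' seen here belongs to this (the nearest) 'if'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stat(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return ("if", cond, then_part), pos

tree, _ = parse_stat("if c1 then if c2 then s2 else s3".split())
print(tree)  # ('if', 'c1', ('if', 'c2', 's2', 's3'))
```

The inner call returns before the outer one ever looks for an `else`, so the outer IF in this input ends up without an ELSE part.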

35
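Transforming Extended BNF Grammars
- Extended BNF ≡ BNF (both describe the same class of languages)
- Extended BNF allows
  - Square brackets [ ]: an optional construct
  - Braces { }: an optional list (zero or more repetitions)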

37
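
The bracket and brace forms of Extended BNF can be rewritten into plain BNF by introducing one fresh nonterminal per construct: [ X ] becomes N with N → X | ε, and { X } becomes N with N → X N | ε. The sketch below assumes an encoding invented for this example, with `("opt", [...])` for brackets and `("rep", [...])` for braces, and generated names N1, N2, … that are presumed not to clash with existing nonterminals.

```python
from itertools import count

def ebnf_to_bnf(productions):
    """Rewrite EBNF productions into BNF.

    productions: list of (lhs, rhs); rhs items are symbols (str),
    ("opt", [items]) for [ ... ], or ("rep", [items]) for { ... }.
    An empty right-hand side encodes ε.
    """
    fresh = count(1)
    result = []
    work = list(productions)
    while work:
        lhs, rhs = work.pop(0)
        new_rhs = []
        for item in rhs:
            if isinstance(item, tuple):
                kind, body = item
                name = f"N{next(fresh)}"      # hypothetical fresh nonterminal
                new_rhs.append(name)
                # { X } gives N → X N; [ X ] gives N → X. Both also get N → ε.
                work.append((name, body + [name] if kind == "rep" else body))
                work.append((name, []))
            else:
                new_rhs.append(item)
        result.append((lhs, new_rhs))
    return result

for lhs, rhs in ebnf_to_bnf([("exp", ["term", ("rep", ["addop", "term"])])]):
    print(lhs, "→", " ".join(rhs) or "ε")
```

New productions go back onto the work list, so nested brackets and braces are expanded as well.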
38
39
Parsers and Recognizers
- Recognizer
  - An algorithm that performs a boolean-valued test
  - “Is this input syntactically valid?”
- Parser
  - Answers more general questions
  - Is this input valid?
  - And, if it is, what is its structure (parse tree)?

40
Parsers and Recognizers (Cont’d)
- Two general approaches to parsing
  - Top-down parser
    - Expands the parse tree (via predictions) in a depth-first manner
    - Preorder traversal of the parse tree
    - Predictive in nature
    - Produces a leftmost derivation (lm)
    - LL parsers

41
Parsers and Recognizers (Cont’d)
  - Bottom-up parser
    - Begins at the bottom of the parse tree (the leaves, which are terminal symbols) and determines the productions used to generate the leaves
    - Postorder traversal of the parse tree
    - Produces a rightmost derivation in reverse (rm)
    - LR parsers

42
Parsers and Recognizers (Cont’d)

To parse
begin SimpleStmt; SimpleStmt; end $
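
A top-down parse of this input can be sketched with recursive descent. The grammar is an assumption, because the slide's grammar figure is missing: Program → begin Stmts end $, Stmts → Stmt ; Stmts | ε, Stmt → SimpleStmt | begin Stmts end. The function `parse_program` and its trace format are also made up for the example.

```python
def parse_program(tokens):
    """Return the productions predicted by a top-down parse, in order."""
    pos = 0
    trace = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1

    def program():
        trace.append("Program → begin Stmts end $")
        match("begin")
        stmts()
        match("end")
        match("$")

    def stmts():
        # Predict from the next token which alternative applies.
        if peek() in ("SimpleStmt", "begin"):
            trace.append("Stmts → Stmt ; Stmts")
            stmt()
            match(";")
            stmts()
        else:
            trace.append("Stmts → ε")

    def stmt():
        if peek() == "begin":
            trace.append("Stmt → begin Stmts end")
            match("begin")
            stmts()
            match("end")
        else:
            trace.append("Stmt → SimpleStmt")
            match("SimpleStmt")

    program()
    if pos != len(tokens):
        raise SyntaxError("extra input after $")
    return trace

for step in parse_program("begin SimpleStmt ; SimpleStmt ; end $".split()):
    print(step)
```

The productions are recorded in the order they are predicted, which is a preorder walk of the parse tree and therefore the order of a leftmost derivation.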
43
44
45
Grammar Analysis Algorithms
 Goal of this section:
 Discuss a number of important analysis algorithms for Grammars

47
Grammar Analysis Algorithms (Cont’d)
 The data structure of a grammar G

48
Grammar Analysis Algorithms (Cont’d)
- Which nonterminals can derive ε?
  A ⇒ BCD ⇒ BC ⇒ B ⇒ ε
- An iterative marking algorithm

49
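
An iterative marking algorithm for finding the nonterminals that can derive ε might look like the following. This is a sketch under assumptions: the grammar is a made-up one in which A ⇒ BCD ⇒ BC ⇒ B ⇒ ε is possible, the dictionary encoding uses an empty list for ε, and the name `derives_empty` is chosen for this example.

```python
# Hypothetical grammar: A → B C D, B → ε, C → c | ε, D → ε, E → e A.
GRAMMAR = {
    "A": [["B", "C", "D"]],
    "B": [[]],
    "C": [["c"], []],
    "D": [[]],
    "E": [["e", "A"]],
}

def derives_empty(grammar):
    """Mark nonterminals that can derive ε, repeating until nothing changes."""
    marked = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            # lhs derives ε if some right-hand side consists only of marked symbols
            # (an empty right-hand side qualifies trivially).
            if any(all(symbol in marked for symbol in rhs) for rhs in alternatives):
                marked.add(lhs)
                changed = True
    return marked

print(sorted(derives_empty(GRAMMAR)))  # ['A', 'B', 'C', 'D']
```

A is only marked on the second pass, after B, C and D have been marked, which is why the loop must repeat until a pass makes no change.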
50
Grammar Analysis Algorithms (Cont’d)

51
Grammar Analysis Algorithms (Cont’d)
- Follow(A)
  - A is any nonterminal
  - Follow(A) is the set of terminals that may follow A in some sentential form
  - Follow(A) = { a ∈ V_t | S ⇒+ …Aa… } ∪ (if S ⇒+ αA then {ε} else ∅)
- First(α)
  - The set of all the terminal symbols that can begin a sentential form derivable from α
  - If α is the right-hand side of a production, then First(α) contains the terminal symbols that begin strings derivable from α
  - First(α) = { a ∈ V_t | α ⇒* aβ } ∪ (if α ⇒* ε then {ε} else ∅)

52
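
First and Follow sets are usually computed by iterating to a fixed point. The Python sketch below is an illustration of that idea and not the slides' C routines: the names `first_of`, `first_sets` and `follow_sets` are assumptions, an empty list encodes ε, and Follow of the start symbol is seeded with ε to mean that the end of the input may follow it. The example grammar is S → ABc, A → a | ε, B → b | ε.

```python
EPS = "ε"

GRAMMAR = {
    "S": [["A", "B", "c"]],
    "A": [["a"], []],
    "B": [["b"], []],
}

def first_of(symbols, first, grammar):
    """First set of a string of symbols, given the First sets computed so far."""
    out = set()
    for symbol in symbols:
        f = first[symbol] if symbol in grammar else {symbol}
        out |= f - {EPS}
        if EPS not in f:
            return out          # this symbol cannot vanish, so stop here
    out.add(EPS)                # every symbol can derive ε (or the string is empty)
    return out

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first, grammar)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EPS)      # assumption: ε marks "end of input may follow"
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue
                    tail = first_of(rhs[i + 1:], first, grammar)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[lhs]   # the rest can vanish: inherit Follow(lhs)
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

first = first_sets(GRAMMAR)
follow = follow_sets(GRAMMAR, "S", first)
print({nt: sorted(s) for nt, s in first.items()})
print({nt: sorted(s) for nt, s in follow.items()})
```

For this grammar the sketch gives First(S) = {a, b, c}, First(A) = {a, ε}, First(B) = {b, ε}, Follow(A) = {b, c} and Follow(B) = {c}.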
Grammar Analysis Algorithms (Cont’d)
- Definition of C data structures and subroutines
  - first_set[X]
    - contains terminal symbols and ε
    - X is any single vocabulary symbol
  - follow_set[A]
    - contains terminal symbols and ε
    - A is a nonterminal symbol

53
It is a subroutine of
fill_first_set()

54
55
The execution of fill_first_set() using grammar G0

56
57
The execution of fill_follow_set() using grammar G0

58
More examples
S → aSe
S → B
B → bBe
B → C
C → cCe
C → d

59

60
More examples
S → ABc
A → a
A → ε
B → b
B → ε

61

62
