Chapter 4
Grammars and Parsing
陳奇業, Department of Computer Science and Information Engineering, National Cheng Kung University
Parsing: Syntax Analysis
Parsing decides which parts of the incoming token stream should be grouped together.
The output of parsing is some representation of a parse tree.
The intermediate code generator transforms the parse tree into an intermediate language.
Comparisons between regular expressions and context-free grammars
A context-free grammar:
exp → exp op exp | ( exp ) | number
op → + | − | ∗
A regular expression:
number = digit digit∗
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The major difference is that the rules of a context-free grammar are recursive.
Rules from F.A. (r.e.) to CFG
1. For each state there is a nonterminal symbol.
2. If state A has a transition to state B on symbol a, introduce A → aB.
3. If A goes to B on input ε, introduce A → B.
4. If A is an accepting state, introduce A → ε.
5. Make the start state of the NFA the start symbol of the grammar.
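The five rules can be sketched in a few lines of Python. This is only an illustration under assumptions: the names nfa_to_grammar, show, TRANSITIONS and EPSILON are hypothetical, states are written as single capital letters, and the sample automaton is one possible NFA for (a|b)(a|b|0|1)*.

```python
EPSILON = None  # assumed label for an ε-transition

def nfa_to_grammar(transitions, start, accepting):
    """Build a right-linear grammar from NFA transitions (state, symbol, target)."""
    productions = {}                       # Rule 1: one nonterminal per state
    for state, symbol, target in transitions:
        # Rule 3 for ε-moves (A → B), rule 2 otherwise (A → aB).
        rhs = (target,) if symbol is EPSILON else (symbol, target)
        productions.setdefault(state, []).append(rhs)
        productions.setdefault(target, [])
    for state in accepting:                # Rule 4: A → ε for accepting states
        productions.setdefault(state, []).append(())
    return start, productions              # Rule 5: start state = start symbol

def show(productions):
    """Format each nonterminal with its alternatives on one line."""
    lines = []
    for lhs, alternatives in productions.items():
        texts = ["".join(rhs) if rhs else "ε" for rhs in alternatives]
        lines.append(f"{lhs} → {' | '.join(texts)}")
    return lines

# A two-state automaton for (a|b)(a|b|0|1)*: S is the start state, A accepts.
TRANSITIONS = [("S", "a", "A"), ("S", "b", "A"),
               ("A", "a", "A"), ("A", "b", "A"), ("A", "0", "A"), ("A", "1", "A")]

start, productions = nfa_to_grammar(TRANSITIONS, "S", {"A"})
for line in show(productions):
    print(line)
```

Each transition contributes exactly one production, so the grammar has the same size as the automaton.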
Examples
(1) r.e.: (a|b)(a|b|0|1)*
c.f.g.: S → aA | bA
        A → aA | bA | 0A | 1A | ε
Why don’t we use c.f.g. to replace r.e.?
r.e. => an easy and clear description of tokens.
r.e. => an efficient token recognizer.
Modularizing the components (the grammar rules use regular expressions as components).
Features of programming languages
contents:
- declarations
- sequential statements
- iterative statements
- conditional statements
Description of programming languages
Syntax Diagrams
Context Free Grammars (CFG)
Context-Free Grammar (in BNF)
addop → + | -
mulop → *
factor → ( exp ) | number
History
In 1956, Chomsky introduced context-free grammars for the description of natural language; BNF (Backus-Naur Form), an equivalent notation, was later used to define ALGOL 60.
The Syntactic Specification of Programming Languages: CFG (a BNF description)
Capabilities of Context-free grammars
They give a precise syntactic specification of programming languages.
A parser can be constructed automatically from a CFG.
The syntactic entities specified in a CFG can be used for translating into object code.
They are useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else, etc.
Context-Free Grammars: Concepts and Notation
A context-free grammar G = (Vt, Vn, S, P)
Vt: a finite terminal vocabulary
The token set produced by the scanner
Vn: a finite nonterminal vocabulary
Intermediate symbols
S: the start symbol
P: a finite set of productions
(number-number)*number
Context-Free Grammars: Concepts and Notation (Cont’d)
Other notations
Vocabulary V of G:
V = Vt ∪ Vn (the terminals together with the nonterminals)
Context-Free Grammars: Concepts and Notation (Cont’d)
Leftmost derivation, used by top-down parsers
Notation: ⇒lm, ⇒lm+, ⇒lm*
E.g. leftmost derivation of f(v+v)
Context-Free Grammars: Concepts and Notation (Cont’d)
Rightmost derivation (canonical derivation)
Notation: ⇒rm, ⇒rm+, ⇒rm*
E.g. rightmost derivation of f(v+v)
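A leftmost and a rightmost derivation of f(v+v) can be traced mechanically. The sketch below is an illustration only: the grammar for f(v+v) does not appear in this text, so the grammar used here (E → Prefix ( E ) | v Tail, Prefix → f | λ, Tail → + E | λ) is an assumption, and derive, LEFTMOST_STEPS and RIGHTMOST_STEPS are hypothetical names.

```python
# Assumed grammar (not given in the text):
#   E → Prefix ( E ) | v Tail     Prefix → f | λ     Tail → + E | λ
# λ is encoded as an empty right-hand side.
NONTERMINALS = {"E", "Prefix", "Tail"}

def derive(start, steps, leftmost=True):
    """Apply each (lhs, rhs) step to the leftmost (or rightmost) nonterminal
    of the current sentential form; return every sentential form produced."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"must expand {form[i]}, but the step rewrites {lhs}")
        form[i:i + 1] = rhs               # replace the nonterminal by the rhs
        forms.append(" ".join(form))
    return forms

LEFTMOST_STEPS = [
    ("E", ["Prefix", "(", "E", ")"]), ("Prefix", ["f"]), ("E", ["v", "Tail"]),
    ("Tail", ["+", "E"]), ("E", ["v", "Tail"]), ("Tail", []),
]
RIGHTMOST_STEPS = [
    ("E", ["Prefix", "(", "E", ")"]), ("E", ["v", "Tail"]), ("Tail", ["+", "E"]),
    ("E", ["v", "Tail"]), ("Tail", []), ("Prefix", ["f"]),
]

print(" ⇒lm ".join(derive("E", LEFTMOST_STEPS, leftmost=True)))
print(" ⇒rm ".join(derive("E", RIGHTMOST_STEPS, leftmost=False)))
```

Both traces use the same six productions and end in the same sentence; only the order in which the nonterminals are expanded differs.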
Method      Classic approach      Modern approach
top-down    recursive descent     LL parsing (produces a leftmost derivation)
bottom-up   operator precedence   LR parsing (shift-reduce parsing; produces a rightmost derivation in reverse order)
Context-Free Grammars: Concepts and Notation (Cont’d)
A parse tree
is rooted by the start symbol.
Its leaves are grammar symbols or λ.
It is a graphical representation for derivations.
(Note the difference between a parse tree and a syntax tree.)
Often the parse tree is produced in only a figurative sense; in reality, the parse tree exists only as a sequence of actions made by stepping through the tree construction process.
Errors in Context-Free Grammars
CFGs are a definitional mechanism. They may have errors, just as programs may.
Flawed CFG
Useless nonterminals
Unreachable
Derive no terminal string
S → A | B
A → a
B → Bb
C → c
Nonterminal C cannot be reached from S.
Nonterminal B derives no terminal string.
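Both kinds of useless nonterminal can be found with simple marking passes. The following is a minimal sketch on the grammar S → A | B, A → a, B → Bb, C → c; the function names reachable and generating and the dictionary encoding of the grammar are assumptions made for illustration.

```python
GRAMMAR = {                      # nonterminal -> list of right-hand sides
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}

def reachable(grammar, start):
    """Nonterminals that appear in some sentential form derived from start."""
    seen, work = {start}, [start]
    while work:
        for rhs in grammar[work.pop()]:
            for sym in rhs:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    work.append(sym)
    return seen

def generating(grammar):
    """Nonterminals that derive at least one string of terminals."""
    marked = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            # lhs is generating if one alternative uses only terminals
            # and nonterminals that are already marked.
            if any(all(s not in grammar or s in marked for s in rhs)
                   for rhs in alternatives):
                marked.add(lhs)
                changed = True
    return marked

print("unreachable:", set(GRAMMAR) - reachable(GRAMMAR, "S"))
print("derive no terminal string:", set(GRAMMAR) - generating(GRAMMAR))
```

B fails the second test because its only production, B → Bb, needs B itself to be marked first.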
Ambiguity
Ambiguous Grammars
- Def.: A context-free grammar that can produce more than one parse tree for some sentence.
- The ways to disambiguate a grammar: (1) specify the intention separately (e.g. associativity and precedence for arithmetic operators); (2) rewrite the grammar to incorporate the intention into the grammar itself.
For (1) Precedence: ( ) > negate > exponent > * / > + -
Associativity: exponent → right associativity
others → left associativity
For (2) 1. Introduce one nonterminal for each precedence level.
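One nonterminal per precedence level maps directly onto one function per level in a recursive descent evaluator. The sketch below is an assumption-laden illustration: the level names expression, term, factor, primary and element, the use of ^ for the exponent operator, and the function evaluate are all choices made here.

```python
import re

def evaluate(text):
    """Evaluate an arithmetic expression with one function per precedence level."""
    tokens = re.findall(r"\d+|[-+*/^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expression():            # + and -: lowest precedence, left associative
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    def term():                  # * and /: left associative
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if take() == "*" else value / factor()
        return value

    def factor():                # exponent: right associative via right recursion
        base = primary()
        if peek() == "^":
            take()
            return base ** factor()
        return base

    def primary():               # unary negate binds tighter than exponent
        if peek() == "-":
            take()
            return -primary()
        return element()

    def element():               # parentheses: highest precedence
        token = take()
        if token == "(":
            value = expression()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(token)

    result = expression()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(evaluate("2+3*4"), evaluate("2^3^2"), evaluate("8-3-2"))
```

Left associativity comes from the loops in expression and term, right associativity from the recursive call in factor.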
Example 1
E → E + E | E - E | E * E | E / E | E ↑ E | ( E ) | - E | id
is ambiguous (↑ is exponent operator with right associativity.)
[Figure: two different parse trees for the same sentence over id, + and *; one is rooted at E + E, the other at E * E.]
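Ambiguity can be checked on small inputs by counting parse trees. The sketch below does this for the fragment E → E + E | E * E | id of the grammar in Example 1; restricting the grammar to two operators and the name count_parse_trees are assumptions made to keep the example short.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E → E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct parse trees deriving tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))
```

Any count above one for some sentence shows that the grammar is ambiguous.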
Ex: id + id * id
[Figure: parse tree for id + id * id using the nonterminals expression, term, factor, primary and element; the root expands to expression + term, and the * is grouped inside the right-hand term.]
Example 2
stat → IF cond THEN stat | IF cond THEN stat ELSE stat | other stat
is an ambiguous grammar
Dangling else problem
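A common way to resolve the dangling else is to attach each ELSE to the nearest unmatched IF. The sketch below shows that convention in a small recursive descent parser; the convention itself is an assumption here (the text does not state which resolution it adopts), and parse_stat and the tuple-shaped trees are hypothetical.

```python
def parse_stat(tokens, pos=0):
    """Parse one stat starting at tokens[pos]; return (tree, next position).
    Conditions and other statements are single tokens in this sketch."""
    if tokens[pos] == "IF":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "THEN":
            raise SyntaxError("expected THEN")
        then_part, pos = parse_stat(tokens, pos + 3)
        # Greedy choice: an ELSE seen here is claimed by the innermost open IF.
        if pos < len(tokens) and tokens[pos] == "ELSE":
            else_part, pos = parse_stat(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    return tokens[pos], pos + 1

tree, end = parse_stat("IF c1 THEN IF c2 THEN s1 ELSE s2".split())
print(tree)
```

The ELSE ends up inside the inner IF, so the outer IF has no else part.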
Transforming Extended BNF Grammars
Extended BNF ≡ BNF
Extended BNF allows
Square brackets [ ]: an optional part
Braces { }: an optional list (zero or more repetitions)
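The equivalence can be made concrete by rewriting each bracketed part into a fresh nonterminal. The sketch below is one possible transformation under assumptions: the tuple encoding ("opt", [...]) for [ ] and ("rep", [...]) for { }, the generated names N1, N2, ..., and the function ebnf_to_bnf are all hypothetical.

```python
import itertools

def ebnf_to_bnf(rules):
    """rules: list of (lhs, rhs) where rhs items are symbol names,
    ("opt", items) for [ items ] or ("rep", items) for { items }.
    Returns plain BNF (lhs, rhs) pairs; an empty rhs stands for λ."""
    fresh = itertools.count(1)
    output = []

    def flatten(rhs):
        plain = []
        for item in rhs:
            if isinstance(item, str):
                plain.append(item)
                continue
            kind, body = item
            name = f"N{next(fresh)}"         # hypothetical fresh nonterminal
            body = flatten(body)
            if kind == "opt":                # [ body ]  becomes  N → body | λ
                output.append((name, body))
            else:                            # { body }  becomes  N → body N | λ
                output.append((name, body + [name]))
            output.append((name, []))
            plain.append(name)
        return plain

    for lhs, rhs in rules:
        output.append((lhs, flatten(rhs)))
    return output

EBNF_RULES = [
    ("stmt_list", ["stmt", ("rep", [";", "stmt"])]),   # stmt { ; stmt }
    ("decl", ["id", ("opt", ["=", "exp"])]),           # id [ = exp ]
]
for lhs, rhs in ebnf_to_bnf(EBNF_RULES):
    print(lhs, "→", " ".join(rhs) if rhs else "λ")
```

The helper productions are emitted before the rule that uses them, which does not affect the language generated.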
Parsers and Recognizers
Recognizer
An algorithm that performs a boolean-valued test
“Is this input syntactically valid?”
Parser
Answers more general questions
Is this input valid?
And, if it is, what is its structure (parse tree)?
Parsers and Recognizers (Cont’d)
Two general approaches to parsing
Top-down parser
Expanding the parse tree (via predictions) in a depth-first manner
Preorder traversal of the parse tree
Predictive in nature
Produces a leftmost derivation (lm)
LL parsing
Parsers and Recognizers (Cont’d)
Bottom-up parser
Beginning at the bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves
Postorder traversal of the parse tree
Produces a rightmost derivation (rm) in reverse order
LR parsing
Parsers and Recognizers (Cont’d)
To parse
begin SimpleStmt; SimpleStmt; end $
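A top-down parse of this input can be sketched as a small predictive parser that records each production as it is predicted. The grammar is not shown in this text, so the one used below (Program → begin Stmts end $, Stmts → Stmt ; Stmts | λ, Stmt → SimpleStmt | begin Stmts end) is an assumption, as is the name parse_program.

```python
def parse_program(text):
    """Return the productions predicted, in order, or raise SyntaxError."""
    tokens = text.replace(";", " ; ").split()
    pos = 0
    applied = []                      # productions in the order they are predicted

    def match(expected):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r} at token {pos}")
        pos += 1

    def stmts():
        # Predict Stmts → Stmt ; Stmts only when the lookahead can start a Stmt.
        if pos < len(tokens) and tokens[pos] in ("SimpleStmt", "begin"):
            applied.append("Stmts → Stmt ; Stmts")
            stmt()
            match(";")
            stmts()
        else:
            applied.append("Stmts → λ")

    def stmt():
        if tokens[pos] == "SimpleStmt":
            applied.append("Stmt → SimpleStmt")
            match("SimpleStmt")
        else:
            applied.append("Stmt → begin Stmts end")
            match("begin")
            stmts()
            match("end")

    applied.append("Program → begin Stmts end $")
    match("begin")
    stmts()
    match("end")
    match("$")
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return applied

for production in parse_program("begin SimpleStmt; SimpleStmt; end $"):
    print(production)
```

The recorded order is a preorder traversal of the parse tree, which corresponds to a leftmost derivation.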
Grammar Analysis Algorithms
Goal of this section:
Discuss a number of important analysis algorithms for Grammars
Grammar Analysis Algorithms (Cont’d)
The data structure of a grammar G
Grammar Analysis Algorithms (Cont’d)
What nonterminals can derive λ?
A ⇒ BCD ⇒ BC ⇒ B ⇒ λ
An iterative marking algorithm
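An iterative marking algorithm of this kind can be sketched as follows. This is an illustration, not the algorithm from the slides: the function name derives_lambda, the dictionary encoding and the sample grammar are assumptions, and a λ-production is written as an empty right-hand side.

```python
def derives_lambda(grammar):
    """grammar: nonterminal -> list of right-hand sides (lists of symbols).
    Returns the set of nonterminals that can derive λ."""
    marked = set()
    changed = True
    while changed:                         # repeat until a pass marks nothing new
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            # lhs derives λ if every symbol of some alternative already does;
            # an empty alternative (a λ-production) satisfies this trivially.
            if any(all(sym in marked for sym in rhs) for rhs in alternatives):
                marked.add(lhs)
                changed = True
    return marked

SAMPLE = {                                 # hypothetical grammar
    "A": [["B", "C", "D"]],
    "B": [[], ["b"]],
    "C": [[]],
    "D": [[], ["d"]],
    "E": [["e"]],
}
print(sorted(derives_lambda(SAMPLE)))
```

A is marked only on the second pass, after B, C and D have been marked, which is why the loop must iterate until nothing changes.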
Grammar Analysis Algorithms (Cont’d)
Follow(A)
A is any nonterminal
Follow(A) is the set of terminals that may follow A in some sentential form
Follow(A) = { a ∈ Vt | S ⇒* …Aa… } ∪ ( if S ⇒+ αA then {λ} else ∅ )
First(α)
The set of all the terminal symbols that can begin a sentential form derivable from α
If α is the right-hand side of a production, then First(α) contains terminal symbols that begin strings derivable from α
First(α) = { a ∈ Vt | α ⇒* aβ } ∪ ( if α ⇒* λ then {λ} else ∅ )
Grammar Analysis Algorithms (Cont’d)
Definition of C data structures and subroutines
first_set[X]
contains terminal symbols and λ
X is any single vocabulary symbol
follow_set[A]
contains terminal symbols and λ
A is a nonterminal symbol
It is a subroutine of
fill_first_set()
The execution of fill_first_set() using grammar G0
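The computation can be reproduced with a short fixed-point sketch. It is written in Python rather than C for brevity, and it rests on assumptions: grammar G0 is not shown in this text, so the productions below are assumed, and compute_first is a hypothetical name for the helper that handles a string of symbols.

```python
LAMBDA = "λ"

G0 = [   # assumed productions; an empty right-hand side is a λ-production
    ("E", ["Prefix", "(", "E", ")"]),
    ("E", ["v", "Tail"]),
    ("Prefix", ["f"]),
    ("Prefix", []),
    ("Tail", ["+", "E"]),
    ("Tail", []),
]

def compute_first(symbols, first_set):
    """First of a string of symbols, using the sets computed so far."""
    result = set()
    for sym in symbols:
        result |= first_set[sym] - {LAMBDA}
        if LAMBDA not in first_set[sym]:
            return result              # sym cannot vanish, so stop here
    result.add(LAMBDA)                 # every symbol (or none) can derive λ
    return result

def fill_first_set(productions):
    nonterminals = {lhs for lhs, _ in productions}
    first_set = {nt: set() for nt in nonterminals}
    for _, rhs in productions:
        for sym in rhs:
            if sym not in nonterminals:
                first_set[sym] = {sym}     # First of a terminal is itself
    changed = True
    while changed:                         # iterate until no set grows
        changed = False
        for lhs, rhs in productions:
            new = compute_first(rhs, first_set)
            if not new <= first_set[lhs]:
                first_set[lhs] |= new
                changed = True
    return first_set

first_set = fill_first_set(G0)
for symbol in ("E", "Prefix", "Tail"):
    print(symbol, sorted(first_set[symbol]))
```

First(E) picks up "(" only on the second pass, once λ is known to be in First(Prefix).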
The execution of fill_follow_set() using grammar G0
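The Follow computation can be sketched in the same fixed-point style. As before, this is a Python illustration under assumptions: the productions of G0 are assumed rather than taken from this text, the First sets are supplied as a literal table (FIRST), and first_of is a hypothetical helper.

```python
LAMBDA = "λ"

G0 = [   # assumed productions; an empty right-hand side is a λ-production
    ("E", ["Prefix", "(", "E", ")"]),
    ("E", ["v", "Tail"]),
    ("Prefix", ["f"]),
    ("Prefix", []),
    ("Tail", ["+", "E"]),
    ("Tail", []),
]
# First sets of the nonterminals of the assumed G0, given as input here.
FIRST = {"E": {"f", "(", "v"}, "Prefix": {"f", LAMBDA}, "Tail": {"+", LAMBDA}}

def first_of(symbols):
    """First of a string of symbols; a terminal's First set is itself."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {LAMBDA}
        if LAMBDA not in sym_first:
            return result
    result.add(LAMBDA)
    return result

def fill_follow_set(productions, start):
    nonterminals = {lhs for lhs, _ in productions}
    follow_set = {nt: set() for nt in nonterminals}
    follow_set[start].add(LAMBDA)          # λ: the end of input may follow start
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                tail_first = first_of(rhs[i + 1:])
                new = tail_first - {LAMBDA}
                if LAMBDA in tail_first:   # everything after sym can vanish
                    new |= follow_set[lhs]
                if not new <= follow_set[sym]:
                    follow_set[sym] |= new
                    changed = True
    return follow_set

follow_set = fill_follow_set(G0, "E")
for symbol in ("E", "Prefix", "Tail"):
    print(symbol, sorted(follow_set[symbol]))
```

Whatever can follow the left-hand side is inherited by a nonterminal at the end of a right-hand side, which is how Tail receives everything in Follow(E).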
More examples
S → aSe
S → B
B → bBe
B → C
C → cCe
C → d
More examples
S → ABc
A → a
A → λ
B → b
B → λ