cs3304 4
An Example Grammar
<program> -> <stmts>
<stmts> -> <stmt>
| <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
| <term> - <term>
<term> -> <var>
| const
An Exemplar Derivation
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const   (a sentence: only terminals remain)
9/1/16
Sentential Forms
• Every string of symbols in the
derivation is a sentential form
• A sentence is a sentential form that has
only terminal symbols
• A leftmost derivation is one in which
the leftmost non-terminal in each
sentential form is the one that is
expanded next in the derivation
Sentential Forms
• A left-sentential form is a sentential
form that occurs in the leftmost
derivation
• A rightmost derivation expands the rightmost
non-terminal in each sentential form instead
• A right-sentential form is a sentential
form that occurs in the rightmost
derivation
• Some derivations are neither leftmost nor
rightmost
Why BNF?
Context-Free Grammars
• The syntax of simple arithmetic
expressions
expr -> id | number | - expr | ( expr )
| expr op expr
op -> + | - | * | /
• What are the terminal symbols and
nonterminal symbols?
• What is the start symbol?
Another Example
• G = {T, N, S, P}
<program> -> <stmts>
<stmts> -> <stmt>
| <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
| <term> - <term>
<term> -> <var>
| const
• What are the terminals?
• What are the nonterminals?
• What is the start symbol?
• Possible strings?
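One way to check answers to these questions is to write the grammar down as data and compute the sets. The sketch below is an assumption-laden illustration: the names `PRODUCTIONS`, `START`, `NONTERMINALS`, `TERMINALS`, and `count_strings` are hypothetical. The counting helper only works for non-recursive symbols such as <stmt>; it would not terminate on <stmts>, which is recursive and generates infinitely many strings.

```python
# P: the productions, keyed by left-hand side.
PRODUCTIONS = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}

# S: the start symbol.
START = "<program>"

# N: every symbol that appears on a left-hand side.
NONTERMINALS = set(PRODUCTIONS)

# T: every right-hand-side symbol that is not a non-terminal.
TERMINALS = {
    sym for alts in PRODUCTIONS.values() for alt in alts for sym in alt
} - NONTERMINALS


def count_strings(symbol):
    """Count the strings derivable from a NON-recursive symbol."""
    if symbol not in PRODUCTIONS:
        return 1  # a terminal derives only itself
    total = 0
    for alt in PRODUCTIONS[symbol]:
        ways = 1
        for sym in alt:
            ways *= count_strings(sym)
        total += ways
    return total


print(sorted(TERMINALS))
print(count_strings("<stmt>"))  # number of distinct single statements
```

Because this grammar is unambiguous, counting derivations of <stmt> is the same as counting distinct single-statement strings.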
Parse Tree
• A parse tree is
– a hierarchical representation of a
derivation
– to represent the structure of the
derivation of a terminal string from some
non-terminal
– to describe the hierarchical syntactic
structure of programs for any language
An Example
• Given the simple assignment statement
syntax
<assign> -> <id> = <expr>
<id> -> A | B | C
<expr> -> <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>
• Using a leftmost derivation, how is
A = B * (A + C) generated?
Derivation for A = B * (A + C)
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr>)
=> A = B * (A + <expr>)
=> A = B * (A + <id>)
=> A = B * (A + C)
Parse tree for A = B * (A + C)
<assign>
  <id>
    A
  =
  <expr>
    <id>
      B
    *
    <expr>
      (
      <expr>
        <id>
          A
        +
        <expr>
          <id>
            C
      )
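The parse tree for A = B * (A + C) can also be built mechanically. The following is a minimal recursive-descent sketch for the assignment grammar, not a definitive implementation. It assumes every token is a single character, and the names `tokenize`, `parse_assign`, `parse_expr`, `parse_id`, `leaves`, and `render` are hypothetical. A tree node is a pair of a label and a list of children; leaves are plain strings.

```python
IDS = {"A", "B", "C"}


def tokenize(text):
    """Assumption: every token is one character; whitespace is dropped."""
    return [ch for ch in text if not ch.isspace()]


def parse_id(tokens, pos):
    # <id> -> A | B | C
    if pos < len(tokens) and tokens[pos] in IDS:
        return ("<id>", [tokens[pos]]), pos + 1
    raise SyntaxError(f"expected an identifier at token {pos}")


def parse_expr(tokens, pos):
    # <expr> -> ( <expr> )
    if pos < len(tokens) and tokens[pos] == "(":
        inner, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError(f"expected ')' at token {pos}")
        return ("<expr>", ["(", inner, ")"]), pos + 1
    # <expr> -> <id> + <expr> | <id> * <expr> | <id>
    left, pos = parse_id(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("+", "*"):
        op = tokens[pos]
        right, pos = parse_expr(tokens, pos + 1)
        return ("<expr>", [left, op, right]), pos
    return ("<expr>", [left]), pos


def parse_assign(tokens):
    # <assign> -> <id> = <expr>
    target, pos = parse_id(tokens, 0)
    if pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected '='")
    expr, pos = parse_expr(tokens, pos + 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return ("<assign>", [target, "=", expr])


def leaves(tree):
    """Read the terminal symbols off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]


def render(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    if isinstance(tree, str):
        return ["  " * depth + tree]
    lines = ["  " * depth + tree[0]]
    for child in tree[1]:
        lines.extend(render(child, depth + 1))
    return lines


tree = parse_assign(tokenize("A = B * (A + C)"))
print("\n".join(render(tree)))
```

Reading the leaves from left to right gives back the original sentence, which is a quick check that the tree really describes that string.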
Parse Tree
• A grammar is ambiguous if it generates
a sentential form that has two or more
distinct parse trees
An Ambiguous Grammar
expr -> id | number | - expr | ( expr )
| expr op expr
op -> + | - | * | /
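To see the ambiguity concretely, one can count how many distinct parse trees this grammar gives a token string. The sketch below is a hedged illustration: `count_parse_trees` and `OPS` are hypothetical names, and the input is assumed to be already tokenized into `id`, `number`, operators, and parentheses. Any string with a count above 1 shows the grammar is ambiguous.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` rooted at expr."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from expr.
        n = 0
        if j - i == 1 and tokens[i] in ("id", "number"):
            n += 1  # expr -> id | number
        if j - i >= 2 and tokens[i] == "-":
            n += count(i + 1, j)  # expr -> - expr
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)  # expr -> ( expr )
        for k in range(i + 1, j - 1):
            if tokens[k] in OPS:
                # expr -> expr op expr, with op at position k
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


print(count_parse_trees("id - id - id".split()))
print(count_parse_trees("id + id * id".split()))
```

Both strings have more than one tree: the grammar fixes neither the associativity of - nor the relative precedence of + and *.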
Operator Associativity
• Single recursion in production rules
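<expr> -> <expr> - <expr> | const
✗ Ambiguous (recursive on both sides of -)
✓ Unambiguous if the rule recurses on one side only,
e.g. <expr> -> <expr> - const | const (left associative)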
Operator Precedence
• Use stratification in production rules
– Intentionally put operators at different
levels of parse trees
<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const
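The effect of stratification can be checked with a small evaluator. This is a sketch under assumptions, not the slides' own code: `evaluate` is a hypothetical helper, tokens are assumed to be separated by spaces, and the left-recursive rules are implemented as loops, which is a common hand-written substitute that keeps the same left-to-right grouping.

```python
def evaluate(text):
    """Evaluate `text` using the stratified <expr>/<term> grammar."""
    tokens = text.split()  # assumption: tokens are space-separated
    pos = 0

    def const():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected a constant")
        value = float(tokens[pos])
        pos += 1
        return value

    def term():
        # <term> -> <term> / const | const
        nonlocal pos
        value = const()
        while pos < len(tokens) and tokens[pos] == "/":
            pos += 1
            value /= const()
        return value

    def expr():
        # <expr> -> <expr> - <term> | <term>
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            value -= term()
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("8 - 4 / 2"))  # / sits lower in the tree, so it binds tighter
print(evaluate("8 - 4 - 2"))  # left recursion makes - left associative
```

Because / appears only in <term>, one level below <expr>, the division is grouped before the subtraction without any extra precedence table.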
[Figure: parse tree fragment with nodes term + factor, factor * id(x), and id(slope)]
Scanner
• Pattern matcher for character strings
– If a character sequence matches a pattern,
it is identified as a token
• Responsibilities
– Tokenize source, report lexical errors if
any, remove comments and whitespace, save
text of interesting tokens, save source
locations, (optional) expand macros and
implement preprocessor functions
Tokenizing Source
• Given a program, identify all lexemes and
their categories (tokens)
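A small regular-expression scanner illustrates the idea. This is a minimal sketch, not a production lexer. The token categories, the `#` comment syntax, and the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are all assumptions made for the example. It uses Python's standard `re` module with named groups, drops whitespace and comments, keeps each lexeme with its source offset, and reports a lexical error on any unexpected character.

```python
import re

# (token category, pattern) pairs; earlier entries win when several match.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),       # assumed comment syntax
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),              # whitespace, discarded
    ("MISMATCH", r"."),            # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, lexeme, offset) triples."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind in ("SKIP", "COMMENT"):
            continue  # remove comments and whitespace
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        tokens.append((kind, match.group(), match.start()))
    return tokens


for token in tokenize("y = slope * x + intercept  # a line"):
    print(token)
```

Each triple pairs a lexeme with its token category, and the saved offset is what a later phase would use to report source locations.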