Lecture03 Parsing 1
Kenneth C. Louden
3. Context-Free Grammars and
Parsing
PART ONE
Contents
PART ONE
3.1 The Parsing Process
3.2 Context-Free Grammars
3.3 Parse Trees and Abstract Syntax Trees
3.4 Ambiguity
PART TWO
3.5 Extended Notations: EBNF and Syntax Diagrams
3.6 Formal Properties of Context-Free Languages
3.7 Syntax of the TINY Language
Introduction
• Parsing is the task of syntax analysis:
determining the syntax, or structure, of a program.
• The syntax is defined by the grammar
rules of a context-free grammar.
• The rules of a context-free grammar are recursive.
• The basic data structure of syntax analysis
is the parse tree or syntax tree.
• Because the syntactic structure of a language is
recursive, this tree structure must also be recursive.
3.1 The Parsing Process
Function of a Parser
• Takes the sequence of tokens produced by
the scanner as its input and produces the
syntax tree as its output.
• Sequence of tokens → Parser → Syntax tree
Issues of the Parsing
• The sequence of tokens is not an explicit input
parameter
– The parser calls a scanner procedure getToken to
fetch the next token from the input as it is needed
during the parsing process.
– The parsing step of the compiler reduces to a call to
the parser as follows: SyntaxTree = parse( )
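The on-demand interface between parser and scanner can be sketched as follows. This is a minimal sketch, not the TINY scanner: the token kinds, `setInput`, `tokenVal` and `countTokens` are names assumed here for illustration. `countTokens` stands in for `parse( )`; it builds no tree, but it pulls tokens one at a time through `getToken` exactly as a parser would.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for the expression examples in these notes. */
typedef enum { NUM, PLUS, MINUS, TIMES, LPAREN, RPAREN, ENDFILE, ERROR } TokenType;

const char *input = "";   /* source text being scanned */
size_t pos = 0;           /* current scanner position */
int tokenVal = 0;         /* value of the most recent NUM token */

/* Point the scanner at a new source string. */
void setInput(const char *s) { input = s; pos = 0; }

/* Scanner procedure: returns the next token each time it is called. */
TokenType getToken(void)
{
    while (input[pos] == ' ') pos++;
    char c = input[pos];
    if (c == '\0') return ENDFILE;
    if (isdigit((unsigned char)c)) {
        tokenVal = 0;
        while (isdigit((unsigned char)input[pos]))
            tokenVal = tokenVal * 10 + (input[pos++] - '0');
        return NUM;
    }
    pos++;  /* consume the single character, even an offending one */
    switch (c) {
    case '+': return PLUS;
    case '-': return MINUS;
    case '*': return TIMES;
    case '(': return LPAREN;
    case ')': return RPAREN;
    default:  return ERROR;  /* scanner-level error token */
    }
}

/* Stand-in for parse(): no token list is passed in; tokens are fetched
   from the scanner only as they are needed. */
int countTokens(void)
{
    int n = 0;
    while (getToken() != ENDFILE) n++;
    assert(input[pos] == '\0');  /* the whole input was consumed */
    return n;
}
```

Note that the token sequence never exists as a whole in memory; the caller only ever sees the current token.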
Issues of the Parsing
• In a single-pass compiler, the parser incorporates
all the other phases of the compiler
• No explicit syntax tree needs to be constructed
• The parser steps themselves represent the
syntax tree implicitly, and the call is simply parse( )
Issues of the Parsing
• In a multi-pass compiler, the further passes use the
syntax tree as their input
– The structure of the syntax tree is heavily dependent on
the particular syntactic structure of the language
– This tree is usually defined as a dynamic data structure
– Each node consists of a record whose fields include the
attributes needed for the remainder of the compilation
process (i.e., not just those computed by the parser).
Issues of the Parsing
• What is more difficult for the parser than
the scanner is the treatment of errors.
• Error in the scanner
– Generate an error token and consume the
offending character.
Issues of the Parsing
• Error in the parser
– The parser must not only report an error message,
but also recover from the error and continue
parsing (to find as many errors as possible)
• A parser may perform error repair
– Error recovery is the reporting of meaningful error
messages and the resumption of parsing as close to the
actual error as possible
3.2 Context-Free Grammars
Basic Concept
• A context-free grammar is a specification for the
syntactic structure of a programming language
– Similar to the specification of the lexical structure of a
language using regular expressions
– Except involving recursive rules
• For example:
exp → exp op exp | (exp) | number
op → + | – | *
3.2.1 Comparison to Regular
Expression Notation
Comparing an Example
• The context-free grammar:
exp → exp op exp | (exp) | number
op → + | – | *
3.3 Parse trees and abstract
syntax trees
3.3.1 Parse trees
Derivation vs. Structure
• Derivations do not uniquely represent the
structure of the strings
– There are many derivations for the same
string.
• The string of tokens:
– (number - number ) * number
• There exist two different derivations for
the above string
Derivation vs. Structure
        exp
  exp   op   exp
number   +   number
• The above parse tree corresponds to the
three derivations:
Parse Tree
• Leftmost derivation
(1) exp => exp op exp
(2)     => number op exp
(3)     => number + exp
(4)     => number + number
• The parse tree, with each node numbered by the
derivation step that replaces it:
        1 exp
2 exp   3 op   4 exp
number    +    number
Example: The expression (34-3)*42
• The parse tree for the above arithmetic expression
1 exp
4 exp 3 op 2 exp
( 5 exp ) * number
8 exp 7 op 6 exp
number – number
3.3.2 Abstract syntax trees
Why Abstract Syntax-Tree
• The parse tree contains more information than
is absolutely necessary for a compiler
• For the example: 3+4
        exp
  exp   op   exp
number   +   number
 (3)          (4)
Why Abstract Syntax-Tree
• The principle of syntax-directed translation
– The meaning, or semantics, of the string 3+4 should
be directly related to its syntactic structure as
represented by the parse tree.
• In this case, the parse tree should imply that
the value 3 and the value 4 are to be added.
• A much simpler way to represent this same
information, namely, as the tree
+
3 4
Tree for expression (34-3)*42
• The expression (34-3)*42 whose parse tree can be
represented more simply by the tree:
*
- 42
34 3
• The parentheses tokens have disappeared, yet the tree
still represents precisely the semantic content of
subtracting 3 from 34, and then multiplying by 42.
Abstract Syntax Trees or Syntax
Trees
• Syntax trees represent abstractions of the
actual source code token sequences,
– The token sequences cannot be recovered
from them (unlike parse trees).
– Nevertheless they contain all the information
needed for translation, in a more efficient
form than parse trees.
Abstract Syntax Trees or Syntax
Trees
• A parse tree is a representation for the structure
of ordinary syntax (called concrete syntax when
comparing it to abstract syntax).
• Abstract syntax can be given a formal definition
using a BNF-like notation, just like concrete
syntax.
• The BNF-like rules for the abstract syntax of the
simple arithmetic expression:
exp → OpExp(op,exp,exp) | ConstExp(integer)
op → Plus | Minus | Times
Abstract Syntax Trees or Syntax
Trees
• The corresponding C data type
declarations:
typedef enum {Plus,Minus,Times} OpKind;
typedef enum {OpK,ConstK} ExpKind;
typedef struct streenode
{ ExpKind kind;
OpKind op;
struct streenode *lchild,*rchild;
int val;
} STreeNode;
typedef STreeNode *SyntaxTree;
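As an illustration of how these declarations might be used, the sketch below builds the syntax tree for (34-3)*42 and evaluates it by a recursive walk. The helpers `newConst`, `newOp`, `eval` and `freeTree` are hypothetical names introduced here; they are not part of the declarations above.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum {Plus, Minus, Times} OpKind;
typedef enum {OpK, ConstK} ExpKind;
typedef struct streenode {
    ExpKind kind;
    OpKind op;
    struct streenode *lchild, *rchild;
    int val;
} STreeNode;
typedef STreeNode *SyntaxTree;

/* Hypothetical constructor for a ConstExp(integer) node. */
SyntaxTree newConst(int val)
{
    SyntaxTree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = ConstK;
    t->op = Plus;                 /* unused for constants */
    t->lchild = t->rchild = NULL;
    t->val = val;
    return t;
}

/* Hypothetical constructor for an OpExp(op,exp,exp) node. */
SyntaxTree newOp(OpKind op, SyntaxTree l, SyntaxTree r)
{
    SyntaxTree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = OpK;
    t->op = op;
    t->lchild = l;
    t->rchild = r;
    t->val = 0;                   /* unused for operators */
    return t;
}

/* The value of a tree is computed from the values of its subtrees. */
int eval(SyntaxTree t)
{
    if (t->kind == ConstK) return t->val;
    int l = eval(t->lchild), r = eval(t->rchild);
    switch (t->op) {
    case Plus:  return l + r;
    case Minus: return l - r;
    default:    return l * r;     /* Times */
    }
}

void freeTree(SyntaxTree t)
{
    if (t == NULL) return;
    freeTree(t->lchild);
    freeTree(t->rchild);
    free(t);
}
```

The tree `newOp(Times, newOp(Minus, newConst(34), newConst(3)), newConst(42))` contains no parenthesis nodes, yet `eval` returns 1302 for it, because the grouping is carried by the shape of the tree.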
Examples
• Example 3.8:
– The grammar for simplified if-statements
statement → if-stmt | other
if-stmt → if ( exp ) statement
| if ( exp ) statement else statement
exp → 0 | 1
Examples
• The parse tree for the string:
– if (0) other else other
              statement
               if-stmt
if ( exp ) statement else statement
      0      other         other
Examples
• Using the grammar of Example 3.6
statement → if-stmt | other
if-stmt → if ( exp ) statement else-part
else-part → else statement | ε
exp → 0 | 1
Examples
• This same string has the following parse tree:
– if (0) other else other
           statement
            if-stmt
if ( exp ) statement else-part
      0      other   else statement
                            other
Examples
• A syntax tree for the previous string (using either
the grammar of Example 3.4 or 3.6) would be:
– if (0) other else other
if
0 other other
Examples
• A set of C declarations that would be appropriate for the
structure of the statements and expressions in this example
is as follows:
typedef enum {ExpK, StmtK} NodeKind;
typedef enum {Zero, One} ExpKind;
typedef enum {IfK, OtherK} StmtKind;
typedef struct streenode
{ NodeKind kind;
ExpKind ekind;
StmtKind skind;
struct streenode
*test,*thenpart,*elsepart;
} STreeNode;
typedef STreeNode * SyntaxTree;
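A short sketch of how such declarations could be used to build the syntax tree for if (0) other else other. The constructor names (`newExp`, `newOther`, `newIf`) and the `countNodes` helper are assumptions made for this example only.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum {ExpK, StmtK} NodeKind;
typedef enum {Zero, One} ExpKind;
typedef enum {IfK, OtherK} StmtKind;
typedef struct streenode {
    NodeKind kind;
    ExpKind ekind;
    StmtKind skind;
    struct streenode *test, *thenpart, *elsepart;
} STreeNode;
typedef STreeNode *SyntaxTree;

/* Hypothetical helper: allocate a node with all three links empty. */
static SyntaxTree newNode(NodeKind kind)
{
    SyntaxTree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = kind;
    t->ekind = Zero;
    t->skind = OtherK;
    t->test = t->thenpart = t->elsepart = NULL;
    return t;
}

SyntaxTree newExp(ExpKind ekind)
{
    SyntaxTree t = newNode(ExpK);
    t->ekind = ekind;
    return t;
}

SyntaxTree newOther(void)
{
    return newNode(StmtK);        /* skind is already OtherK */
}

/* elsepart may be NULL when the if-statement has no else-part. */
SyntaxTree newIf(SyntaxTree test, SyntaxTree thenpart, SyntaxTree elsepart)
{
    SyntaxTree t = newNode(StmtK);
    t->skind = IfK;
    t->test = test;
    t->thenpart = thenpart;
    t->elsepart = elsepart;
    return t;
}

int countNodes(SyntaxTree t)
{
    if (t == NULL) return 0;
    return 1 + countNodes(t->test) + countNodes(t->thenpart)
             + countNodes(t->elsepart);
}

void freeTree(SyntaxTree t)
{
    if (t == NULL) return;
    freeTree(t->test);
    freeTree(t->thenpart);
    freeTree(t->elsepart);
    free(t);
}
```

`newIf(newExp(Zero), newOther(), newOther())` yields a tree of four nodes: the if node and its three children. The keywords and parentheses of the source string do not appear in it.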
Examples
• Example 3.9:
– The grammar of a sequence of statements
separated by semicolons from Example 3.7:
stmt-sequence → stmt ; stmt-sequence | stmt
stmt → s
Examples
• The string s; s; s has the following parse tree
with respect to this grammar:
stmt-sequence
stmt ; stmt-sequence
s stmt ; stmt-sequence
s stmt
s
Examples
• A possible syntax tree for this same string is:
;
s ;
s s
s s s
Problem & Solution
• The solution: use the standard leftmost-child
right-sibling representation for a tree (presented
in most data structures texts) to deal with arbitrary
number of children
– The only physical link from the parent to its children is
to the leftmost child.
– The children are then linked together from left to right
in a standard linked list, which are called sibling links
to distinguish them from parent-child links.
Problem & Solution
• The previous tree now becomes, in the
leftmost-child right-sibling arrangement:
seq
s s s
• With this arrangement, we can also do away with
the connecting seq node, and the syntax tree then
becomes simply:
s s s
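The leftmost-child right-sibling arrangement can be sketched in C as below. The node type and the helper names are assumptions for illustration (the label 'q' simply marks a seq node); the point is that each node needs only two links, however many children it has.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct treenode {
    char label;                 /* 's' for a statement, 'q' for a seq node */
    struct treenode *child;     /* the only link from parent to children */
    struct treenode *sibling;   /* link to the next sibling on the right */
} TreeNode;

TreeNode *newTreeNode(char label)
{
    TreeNode *t = malloc(sizeof *t);
    assert(t != NULL);
    t->label = label;
    t->child = t->sibling = NULL;
    return t;
}

/* Append node at the right end of a sibling list; returns the list head. */
TreeNode *appendSibling(TreeNode *first, TreeNode *node)
{
    if (first == NULL) return node;
    TreeNode *t = first;
    while (t->sibling != NULL) t = t->sibling;
    t->sibling = node;
    return first;
}

/* Number of children of parent, found by walking the sibling links. */
int countChildren(const TreeNode *parent)
{
    int n = 0;
    for (const TreeNode *c = parent->child; c != NULL; c = c->sibling) n++;
    return n;
}

void freeTreeNodes(TreeNode *t)
{
    if (t == NULL) return;
    freeTreeNodes(t->child);
    freeTreeNodes(t->sibling);
    free(t);
}
```

For s; s; s the seq node has a single child link to the first s, and the three statements are chained by sibling links. Dropping the seq node leaves just that sibling list.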
3.4 Ambiguity
What is Ambiguity
• Parse trees and syntax trees uniquely express the
structure of syntax
• But it is possible for a grammar to permit a string
to have more than one parse tree
• For example, in the simple integer arithmetic
grammar:
exp → exp op exp | ( exp ) | number
op → + | - | *
the string 34-3*42 has two different parse trees,
corresponding to the two syntax trees:
      *                   -
   -     42     and    34     *
 34   3                     3   42
An Ambiguous Grammar
• A grammar that generates a string with two
distinct parse trees
• Such a grammar represents a serious problem for a
parser
– It does not specify precisely the syntactic structure of a
program
• In some sense, an ambiguous grammar is like a
non-deterministic automaton
– Two separate paths can accept the same string
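How quickly the ambiguity grows can be checked with a small count. The sketch below assumes a string of the form number op number ... op number without parentheses, and counts its parse trees under exp → exp op exp | number: any operator may be the one introduced at the root, which splits the operands into a left group and a right group. The function name and the bound `MAX_OPERANDS` are choices made for this example.

```c
#include <assert.h>

#define MAX_OPERANDS 16

/* Number of distinct parse trees for a parenthesis-free string with the
   given number of operands under  exp -> exp op exp | number. */
long countParseTrees(int operands)
{
    long trees[MAX_OPERANDS + 1] = {0};
    assert(operands >= 1 && operands <= MAX_OPERANDS);
    trees[1] = 1;                        /* a single number: one tree */
    for (int n = 2; n <= operands; n++)
        /* the root operator leaves `left` operands on its left side */
        for (int left = 1; left < n; left++)
            trees[n] += trees[left] * trees[n - left];
    return trees[operands];
}
```

For 34-3*42 (three operands) the count is 2. The two trees do not even agree on the value: grouping as (34-3)*42 gives 1302, while 34-(3*42) gives -92.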
An Ambiguous Grammar
• Ambiguity in grammars cannot be removed
nearly as easily as non-determinism in finite
automata
– There is no algorithm for doing so, unlike the situation in the
case of automata
• Ambiguous grammars always fail the tests that
we introduce later for the standard parsing
algorithms
– A body of standard techniques has been developed to
deal with typical ambiguities that come up in
programming languages.
Two Basic Methods dealing with
Ambiguity
• One is to state a rule that specifies in each
ambiguous case which of the parse trees (or
syntax trees) is the correct one, called a
disambiguating rule.
– The advantage: it corrects the ambiguity without
changing (and possibly complicating) the grammar.
– The disadvantage: the syntactic structure of the
language is no longer given by the grammar alone.
Two Basic Methods dealing with
Ambiguity
• Change the grammar into a form that
forces the construction of the correct parse
tree, thus removing the ambiguity.
• Of course, in either method we must first
decide which of the trees in an ambiguous
case is the correct one.
Remove The Ambiguity in Simple
Expression Grammar
• Simply state a disambiguating rule that
establishes the relative precedence of the
three operations represented.
– The standard solution is to give addition and
subtraction the same precedence, and to give
multiplication a higher precedence.
• A further disambiguating rule is the associativity
of each of the operations of addition, subtraction,
and multiplication.
– Specify that all three of these operations are left
associative
Remove the Ambiguity in simple
Expression Grammar
• Specify that an operation is nonassociative
– A sequence of more than one operator in an expression
is not allowed.
• For instance, writing simple expression grammar
in the following form: fully parenthesized
expressions
exp → factor op factor | factor
factor → ( exp ) | number
op → + | - | *
Remove the Ambiguity in simple
Expression Grammar
• Strings such as 34-3-42 and even 34-3*42
are now illegal, and must instead be written
with parentheses
– such as (34-3) -42 and 34- (3*42).
• We have changed not only the grammar but also
the language being recognized.
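A small recognizer makes the change of language concrete. This is a sketch under simplifying assumptions: the input is a character string, numbers are digit runs, spaces are skipped, and the function names are invented for the example. It follows the rules exp → factor op factor | factor and factor → ( exp ) | number directly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;                    /* cursor into the input */

static void skipSpaces(void) { while (*p == ' ') p++; }
static int parseExp(void);

/* factor -> ( exp ) | number */
static int parseFactor(void)
{
    skipSpaces();
    if (*p == '(') {
        p++;
        if (!parseExp()) return 0;
        skipSpaces();
        if (*p != ')') return 0;
        p++;
        return 1;
    }
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    return 1;
}

/* exp -> factor op factor | factor   (at most one operator per exp) */
static int parseExp(void)
{
    if (!parseFactor()) return 0;
    skipSpaces();
    if (*p == '+' || *p == '-' || *p == '*') {
        p++;
        return parseFactor();
    }
    return 1;
}

/* Returns 1 if s is a legal expression of the nonassociative grammar. */
int isLegal(const char *s)
{
    assert(s != NULL);
    p = s;
    if (!parseExp()) return 0;
    skipSpaces();
    return *p == '\0';                   /* no input may be left over */
}
```

With this recognizer, `isLegal("34-3-42")` and `isLegal("34-3*42")` return 0 because a second operator is left over, while `isLegal("(34-3)-42")` and `isLegal("34-(3*42)")` return 1.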
3.4.2 Precedence and Associativity
Group of Equal Precedence
New Parse Tree
• The parse tree for the expression 34-3-42
[Figure: parse tree for 34-3-42]
• The precedence cascades cause the parse trees to become much more
complex
• The syntax trees, however, are not affected
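The effect of a precedence cascade can be tried out with a small evaluator. The grammar assumed here is the usual textbook cascade, which is an assumption since the rules themselves are not shown above: exp → exp addop term | term, term → term * factor | factor, factor → ( exp ) | number. Each left-recursive rule is written as a loop, which groups operators of equal precedence from the left. All function names are invented for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cur;      /* cursor into the input */
static int ok;               /* cleared on a syntax error */

static void skip(void) { while (*cur == ' ') cur++; }
static int expr(void);

/* factor -> ( exp ) | number */
static int factor(void)
{
    skip();
    if (*cur == '(') {
        cur++;
        int v = expr();
        skip();
        if (*cur == ')') cur++; else ok = 0;
        return v;
    }
    if (!isdigit((unsigned char)*cur)) { ok = 0; return 0; }
    int v = 0;
    while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
    return v;
}

/* term -> term * factor | factor   (loop gives left associativity) */
static int term(void)
{
    int v = factor();
    skip();
    while (ok && *cur == '*') {
        cur++;
        v *= factor();
        skip();
    }
    return v;
}

/* exp -> exp addop term | term   (lower precedence than term) */
static int expr(void)
{
    int v = term();
    skip();
    while (ok && (*cur == '+' || *cur == '-')) {
        char op = *cur++;
        int r = term();
        v = (op == '+') ? v + r : v - r;
        skip();
    }
    return v;
}

/* Evaluate s; *valid is set to 0 when s is not a legal expression. */
int evaluate(const char *s, int *valid)
{
    assert(s != NULL && valid != NULL);
    cur = s;
    ok = 1;
    int v = expr();
    skip();
    if (*cur != '\0') ok = 0;
    *valid = ok;
    return v;
}
```

Under these assumptions 34-3-42 evaluates to -11 (left associativity), 34-3*42 to -92 (multiplication binds tighter), and (34-3)*42 to 1302.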
3.4.3 The dangling else problem
An Ambiguous Grammar
• Consider the following grammar:
statement → if-stmt | other
if-stmt → if ( exp ) statement
| if ( exp ) statement else statement
exp → 0 | 1
• This grammar is ambiguous as a result of
the optional else. Consider the string
if (0) if (1) other else other
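• First parse tree:
                 statement
                  if-stmt
if ( exp )   statement   else statement
      0       if-stmt          other
        if ( exp ) statement
              1      other
• Second parse tree:
        statement
         if-stmt
if ( exp ) statement
      0     if-stmt
   if ( exp ) statement else statement
         1      other          other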
Dangling else problem
• Which tree is correct depends on whether the single
else-part is associated with the first or the second
if-statement.
– The first parse tree associates the else-part with the
first if-statement;
– The second associates it with the second if-statement.
• This ambiguity is called the dangling else problem
• The usual disambiguating rule is the most closely
nested rule: an else-part is associated with the nearest
preceding if-statement that does not already have one
– This implies that the second parse tree above is the
correct one.
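One common way to apply the most closely nested rule is inside the parser itself, leaving the ambiguous grammar unchanged. The sketch below is a hypothetical recursive-descent parser for the if-statement grammar; the `Stmt` type and all function names are assumptions for illustration. After parsing the statement that follows the condition, it takes an else immediately if one is next, so the innermost if-statement that is still open gets the else-part.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct stmt {
    int isIf;                    /* 1 for an if-statement, 0 for other */
    int test;                    /* 0 or 1 when isIf is set */
    struct stmt *thenpart, *elsepart;
} Stmt;

static const char *src;          /* cursor into the input */

static void skipWs(void) { while (*src == ' ') src++; }

/* Consume word if it comes next in the input; return 1 on success. */
static int match(const char *word)
{
    skipWs();
    size_t n = strlen(word);
    if (strncmp(src, word, n) != 0) return 0;
    src += n;
    return 1;
}

void freeStmt(Stmt *s)
{
    if (s == NULL) return;
    freeStmt(s->thenpart);
    freeStmt(s->elsepart);
    free(s);
}

/* statement -> other | if ( exp ) statement [ else statement ] */
static Stmt *statement(void)
{
    Stmt *s = malloc(sizeof *s);
    assert(s != NULL);
    s->isIf = 0; s->test = 0; s->thenpart = s->elsepart = NULL;
    if (match("other")) return s;
    s->isIf = 1;
    if (!match("if") || !match("(")) { freeStmt(s); return NULL; }
    skipWs();
    if (*src != '0' && *src != '1') { freeStmt(s); return NULL; }
    s->test = *src++ - '0';
    if (!match(")")) { freeStmt(s); return NULL; }
    s->thenpart = statement();
    if (s->thenpart == NULL) { freeStmt(s); return NULL; }
    /* Most closely nested rule: take the else now, if there is one. */
    if (match("else")) {
        s->elsepart = statement();
        if (s->elsepart == NULL) { freeStmt(s); return NULL; }
    }
    return s;
}

/* Returns the tree for text, or NULL if text is not a legal statement. */
Stmt *parseStatement(const char *text)
{
    assert(text != NULL);
    src = text;
    Stmt *s = statement();
    skipWs();
    if (s != NULL && *src != '\0') { freeStmt(s); s = NULL; }
    return s;
}
```

For if (0) if (1) other else other the outer node ends up with no else-part and the inner node carries it, which matches the tree selected by the most closely nested rule.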
An Example
• For example:
if (x != 0)
  if (y == 1/x) ok = TRUE;
  else z = 1/x;
• Note that, if we wanted, we could associate the
else-part with the first if-statement by using
braces {...} in C, as in
if (x != 0)
{ if (y == 1/x) ok = TRUE; }
else z = 1/x;
A Solution to the dangling else
ambiguity in the BNF
statement → matched-stmt | unmatched-stmt
matched-stmt → if ( exp ) matched-stmt else matched-stmt |
other
unmatched-stmt → if ( exp ) statement
| if ( exp ) matched-stmt else unmatched-stmt
exp → 0 | 1
• Permitting only a matched-stmt to come before an
else in an if-statement
– forcing all else-parts to be matched as soon as possible.
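• The string if (0) if (1) other else other now has the
single parse tree:
        statement
      unmatched-stmt
if ( exp ) statement
      0    matched-stmt
   if ( exp ) matched-stmt else matched-stmt
         1       other           other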
More about dangling else
• The dangling else problem has its origins in the
syntax of Algol60.
• It is possible to design the syntax in such a way
that the dangling else problem does not appear.
– Require the presence of the else-part; this method
has been used in LISP and other functional languages
(where a value must also be returned).
– Use a bracketing keyword for the if-statement;
languages that use this solution include Algol68 and
Ada.
More About Dangling else
• For example, in Ada, to associate the else-part with
the second if-statement, the programmer writes
if x /= 0 then
  if y = 1/x then ok := true;
  else z := 1/x;
  end if;
end if;
• To associate the else-part with the first if-statement,
the programmer writes
if x /= 0 then
  if y = 1/x then ok := true;
  end if;
else z := 1/x;
end if;
More about dangling else
• BNF in Ada (somewhat simplified) is