Introduction To Syntax Analysis: CSCI4160: Compiler Design and Software Development
Roadmap
1 Overview
2 Context-Free Grammar
3 Ambiguous Grammar
Syntax Analysis
The parser (syntax analyzer) receives the source code in the form of tokens from the lexical analyzer and performs syntax analysis, which creates a tree-like intermediate representation that depicts the grammatical structure of the token stream.
Role of Parser
Parser
Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness
Determines whether the input is syntactically well formed
Guides checking at deeper levels than syntax (static semantic checking)
Builds an intermediate representation (IR) of the code
Study of Parsing
The parser
Needs the syntax of programming language constructs, which can be specified by context-free grammars or BNF (Backus-Naur Form)
Needs an algorithm for testing membership in the language of the grammar
Roadmap
The roadmap for study of parsing
Context-free grammars and derivations
Top-down parsing
Recursive descent (predictive parsing)
LL (Left-to-right, Leftmost derivation) methods
Bottom-up parsing
Operator precedence parsing
LR (Left-to-right, Rightmost derivation) methods
SLR, canonical LR, LALR
Expressive Power of Different Parsing Techniques
[Figure not reproduced]
Benefits Offered by Grammar
Grammars offer significant benefits for both language designers and compiler writers:
A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
Parsers can be constructed automatically for certain classes of grammars.
The parser-construction process can reveal syntactic ambiguities and trouble spots.
A grammar imparts structure to a language.
The structure is useful for translating source programs into correct object code and for detecting errors.
A grammar allows a language to evolve.
New constructs can be integrated more easily into an implementation that follows the grammatical structure of the language.
Why Not Use RE/DFA?
Advantages of RE/DFA
Limits of RE/DFA
Grammars are a more powerful notation than regular expressions.
Every construct that can be described by a regular expression can be described by a grammar, but not vice versa.
Every regular language is a context-free language, but not vice versa.
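A standard illustration of this gap (an example not taken from the slides) is the language of balanced parentheses: no regular expression or DFA can describe it, because recognizing it requires counting unboundedly deep nesting, yet the grammar S → ( S ) S | ε generates it. The sketch below is a recursive-descent recognizer for that grammar; the function name `balanced` is hypothetical.

```python
def balanced(text: str) -> bool:
    """Recognize strings generated by S -> ( S ) S | epsilon."""
    pos = 0

    def parse_s() -> bool:
        nonlocal pos
        # S -> ( S ) S  is chosen when the next symbol is '('.
        if pos < len(text) and text[pos] == "(":
            pos += 1
            if not parse_s():
                return False
            if pos >= len(text) or text[pos] != ")":
                return False
            pos += 1
            return parse_s()
        # Otherwise S -> epsilon.
        return True

    # Accept only if S derives the entire input.
    return parse_s() and pos == len(text)


if __name__ == "__main__":
    for s in ["", "(()())", "(()", ")("]:
        print(repr(s), balanced(s))
```

The recursion depth tracks the nesting depth, which is exactly the unbounded memory a finite automaton lacks.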
Context-Free Grammar
Definition
A context-free grammar (CFG) has four components:
A set of terminal symbols, sometimes referred to as "tokens."
A set of nonterminal symbols, sometimes called "syntactic variables."
One nonterminal is distinguished as the start symbol.
A set of productions of the form LHS → RHS, where
LHS (called the head, or left side) is a single nonterminal symbol;
RHS (called the body, or right side) consists of zero or more terminals and nonterminals.
The terminals are the elementary symbols of the language defined by the grammar.
Nonterminals impose a hierarchical structure on the language that is key to syntax analysis and translation.
Conventionally, the productions for the start symbol are listed first.
The productions specify the manner in which the terminals and nonterminals can be combined to form strings.
CFG Example
[Figure not reproduced: a CFG]
Productions with the same head can be grouped. Therefore, the previous grammar is equivalent to the one below.
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → ( expr ) | id
where
expr, term, and factor are nonterminals
id, +, -, *, /, (, and ) are terminals
expr is the start symbol
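One way to make the four components concrete is to store the grouped expression grammar as data. The sketch below assumes a plain Python dictionary that maps each head to a list of bodies, each body a tuple of symbols; the names `GRAMMAR`, `START`, `NONTERMINALS`, and `TERMINALS` are hypothetical. Under this encoding, the nonterminals are exactly the heads, and every other symbol occurring in a body is a terminal.

```python
# Grouped expression grammar: head -> list of alternative bodies.
GRAMMAR = {
    "expr": [("expr", "+", "term"), ("expr", "-", "term"), ("term",)],
    "term": [("term", "*", "factor"), ("term", "/", "factor"), ("factor",)],
    "factor": [("(", "expr", ")"), ("id",)],
}
START = "expr"  # productions for the start symbol are listed first

# Nonterminals are the heads; any other symbol appearing in a body is a terminal.
NONTERMINALS = set(GRAMMAR)
TERMINALS = {
    sym
    for bodies in GRAMMAR.values()
    for body in bodies
    for sym in body
    if sym not in NONTERMINALS
}

if __name__ == "__main__":
    print(sorted(NONTERMINALS))  # ['expr', 'factor', 'term']
    print(sorted(TERMINALS))     # ['(', ')', '*', '+', '-', '/', 'id']
```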
Notational Conventions
To avoid confusion between terminals and nonterminals, the following notational conventions for grammars will be used.
Terminal symbols:
lowercase letters such as a, b, c
digits, operator symbols, and punctuation symbols, such as +, *, (, ), 0, 1, ..., 9
boldface strings such as id or if, each of which represents a single terminal symbol
Nonterminal symbols
Grammar symbols (i.e., either terminals or nonterminals)
Derivations
A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production for that nonterminal. This sequence of replacements is called a derivation.
Derivation Example
Given the grammar:
exp → exp op exp | ( exp ) | number
op → + | - | *
The following is a derivation for an expression. At each step, the grammar rule choice used for the replacement is given on the right.
[Figure not reproduced: derivation]
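The definition of a derivation can be checked mechanically: each step must replace exactly one nonterminal by the body of one of its productions. The sketch below assumes the exp/op grammar above, encoded as a dictionary, with sentential forms written as space-separated symbols; the functions `is_step` and `is_derivation` and the example step list are hypothetical illustrations.

```python
# Grammar from the example:
#   exp -> exp op exp | ( exp ) | number
#   op  -> + | - | *
GRAMMAR = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}


def is_step(before: str, after: str) -> bool:
    """True if `after` results from replacing one nonterminal in `before`
    by the body of one of that nonterminal's productions."""
    b, a = before.split(), after.split()
    for i, sym in enumerate(b):
        for body in GRAMMAR.get(sym, []):  # terminals have no productions
            if a == b[:i] + list(body) + b[i + 1:]:
                return True
    return False


def is_derivation(forms: list[str], start: str = "exp") -> bool:
    """Check that a sequence of sentential forms is a derivation from `start`."""
    if not forms or forms[0].split() != [start]:
        return False
    return all(is_step(x, y) for x, y in zip(forms, forms[1:]))


if __name__ == "__main__":
    steps = ["exp", "exp op exp", "number op exp", "number + exp", "number + number"]
    print(is_derivation(steps))
```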
Context-Free Language
New notations: $\stackrel{*}{\Rightarrow}$ and $\stackrel{+}{\Rightarrow}$
$\alpha_1 \stackrel{*}{\Rightarrow} \alpha_n$ means $\alpha_1$ derives $\alpha_n$ in zero or more steps.
$\alpha_1 \stackrel{+}{\Rightarrow} \alpha_n$ means $\alpha_1$ derives $\alpha_n$ in one or more steps.
Definition
If $S \stackrel{*}{\Rightarrow} \alpha$, where $S$ is the start symbol of grammar $G$, then $\alpha$ is called a sentential form of $G$. A sentential form may contain both terminals and nonterminals.
A sentence of G is a sentential form with no nonterminals.
The language generated by a grammar G is its set of sentences, denoted L(G).
A language that can be generated by a context-free grammar is said to be a context-free language.
If two grammars generate the same language, the grammars are said to be equivalent.
The process of discovering a derivation is called parsing.
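L(G) is usually infinite, but its short sentences can be listed mechanically by exploring sentential forms from the start symbol and keeping those with no nonterminals. The sketch below assumes the exp/op grammar from the derivation example; the function name `sentences` and the length bound `max_len` are hypothetical. It expands only the leftmost nonterminal, which is enough because every sentence has a leftmost derivation, and it prunes long forms, which is safe here because no production body is empty, so forms never shrink.

```python
from collections import deque

GRAMMAR = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}


def sentences(start: str, max_len: int) -> set[tuple[str, ...]]:
    """Collect every sentence of L(G) with at most max_len symbols."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is a sentence.
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:
            found.add(form)
            continue
        for body in GRAMMAR[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            # Forms never shrink, so anything longer than max_len is a dead end.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


if __name__ == "__main__":
    for s in sorted(sentences("exp", 3)):
        print(" ".join(s))
```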
Leftmost and Rightmost Derivations
Leftmost Derivation of (number - number) * number
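The difference between the two strategies is only which nonterminal gets replaced at each step: the leftmost one or the rightmost one. The sketch below assumes the exp/op grammar from the derivation example and drives a derivation with an explicit list of production bodies; `derive`, `LEFT`, `RIGHT`, and the body constants are hypothetical names. Both choice lists derive (number - number) * number, but they need the productions in different orders.

```python
# Grammar from the derivation example:
#   exp -> exp op exp | ( exp ) | number
#   op  -> + | - | *
GRAMMAR = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}

BIN, PAREN, NUM = ("exp", "op", "exp"), ("(", "exp", ")"), ("number",)
MINUS, TIMES = ("-",), ("*",)


def derive(choices, leftmost=True, start="exp"):
    """Apply each chosen body to the leftmost (or rightmost) nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for body in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if body not in GRAMMAR[form[i]]:
            raise ValueError(f"{' '.join(body)} is not a body of {form[i]}")
        form[i:i + 1] = body  # replace the nonterminal by the body
        steps.append(" ".join(form))
    return steps


# Production choices for (number - number) * number, in the order each strategy needs them.
LEFT = [BIN, PAREN, BIN, NUM, MINUS, NUM, TIMES, NUM]
RIGHT = [BIN, NUM, TIMES, PAREN, BIN, NUM, MINUS, NUM]

if __name__ == "__main__":
    for name, steps in (("leftmost", derive(LEFT)), ("rightmost", derive(RIGHT, leftmost=False))):
        print(name)
        print("\n  => ".join(steps))
```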
Definition
A parse tree is a labeled tree representation of a derivation that filters out the order in which productions are applied to replace nonterminals.
The interior nodes are labeled by nonterminals.
The leaf nodes are labeled by terminals.
The children of each interior node A are labeled, from left to right, by the symbols in the body of the production by which this A was replaced during the derivation.
The following is a parse tree for the two derivations discussed here.
[Figure not reproduced: parse tree]
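The three conditions in the definition translate directly into a validity check, and reading the leaves from left to right recovers the derived sentence. The sketch below assumes the exp/op grammar from the derivation example; `Node`, `mk`, `is_valid`, and `tree_yield` are hypothetical names, and the tree built at the end is the one for (number - number) * number.

```python
from dataclasses import dataclass, field

GRAMMAR = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def mk(label, *children):
    """Build a node; a call with no children makes a leaf."""
    return Node(label, list(children))


def is_valid(node: Node) -> bool:
    """Leaves must be terminals; each interior node's children must spell
    out the body of one of its productions, left to right."""
    if not node.children:
        return node.label not in GRAMMAR
    body = tuple(child.label for child in node.children)
    return body in GRAMMAR.get(node.label, []) and all(is_valid(c) for c in node.children)


def tree_yield(node: Node) -> str:
    """Concatenate the leaf labels from left to right."""
    if not node.children:
        return node.label
    return " ".join(tree_yield(c) for c in node.children)


# Parse tree for (number - number) * number.
tree = mk("exp",
          mk("exp", mk("("),
             mk("exp", mk("exp", mk("number")), mk("op", mk("-")), mk("exp", mk("number"))),
             mk(")")),
          mk("op", mk("*")),
          mk("exp", mk("number")))

if __name__ == "__main__":
    print(is_valid(tree), tree_yield(tree))
```

The leftmost and rightmost derivations of the same string both correspond to this one tree; only the order of expansion differs.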
Ambiguous Grammars
Definition
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
The grammar:
exp → exp op exp | id | number
op → + | - | * | /
is ambiguous because there are two different parse trees for the sentence id - number * id.
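Ambiguity matters because the two trees can mean different things. The sketch below encodes the two groupings of id - number * id as abbreviated trees that keep only the operators (a simplification, not full parse trees), checks that both have the same frontier, and evaluates them with hypothetical values id = 10 and number = 2; `tree_a`, `tree_b`, `evaluate`, `frontier`, and `VALUES` are illustrative names.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Hypothetical operand values, chosen only to make the difference visible.
VALUES = {"id": 10, "number": 2}

# Abbreviated trees: interior nodes are (op, left, right); leaves are terminals.
tree_a = ("*", ("-", "id", "number"), "id")  # (id - number) * id
tree_b = ("-", "id", ("*", "number", "id"))  # id - (number * id)


def frontier(t):
    """Read the terminals of an abbreviated tree from left to right."""
    if isinstance(t, str):
        return [t]
    op, left, right = t
    return frontier(left) + [op] + frontier(right)


def evaluate(t):
    """Evaluate an abbreviated tree bottom-up."""
    if isinstance(t, str):
        return VALUES[t]
    op, left, right = t
    return OPS[op](evaluate(left), evaluate(right))


if __name__ == "__main__":
    print(" ".join(frontier(tree_a)), evaluate(tree_a), evaluate(tree_b))
```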
Solving Ambiguity
There are two basic methods for dealing with ambiguity.
Approach 1: Disambiguating Rule
State a rule that specifies, in each ambiguous case, which of the parse trees is the correct one. Such a rule is called a disambiguating rule.
Advantage: no need to change the grammar itself.
Disadvantage: the syntactic structure of the language is no longer given by the grammar alone.
To use Approach 1 to remove ambiguity from the above ambiguous grammar, the following disambiguating rules are defined:
All operators (+, -, *, /) are left associative.
+ and - have the same precedence.
* and / have the same precedence.
* and / have higher precedence than + and -.
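These rules can be handed to a parser as a table instead of being built into the grammar. The sketch below uses precedence climbing, a technique swapped in here for illustration rather than one named in the slides, on token lists for the ambiguous exp → exp op exp | id | number grammar; `PRECEDENCE`, `parse`, and the tuple tree shape are hypothetical. Requiring the right operand to bind strictly tighter (`PRECEDENCE[op] + 1`) is what makes every operator left associative.

```python
# Disambiguating rules as data: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens):
    """Parse a token list for exp -> exp op exp | id | number into
    (op, left, right) tuples, resolving ambiguity with PRECEDENCE."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in PRECEDENCE:
            raise SyntaxError(f"expected an operand at position {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def climb(min_prec):
        nonlocal pos
        left = operand()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand may contain only tighter operators.
            right = climb(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


if __name__ == "__main__":
    print(parse("id - number * id".split()))
```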
We can add precedence to the above ambiguous grammar to remove the ambiguity. To add precedence:
Group operators into precedence levels.
Create a nonterminal for each level of precedence.
[Figure not reproduced: how to add precedence to a grammar]
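The layered expr/term/factor grammar from the CFG example has exactly this shape: one nonterminal per precedence level, with the loosest-binding operators nearest the start symbol. The sketch below is a recursive-descent parser for that grammar. Its left-recursive rules cannot be called directly (expr would call itself forever), so, as an assumption of this sketch, each is implemented as a loop, a standard rewrite that preserves left-associative grouping. `Parser` and `parse` are hypothetical names.

```python
class Parser:
    """Recursive-descent parser for
        expr   -> expr + term | expr - term | term
        term   -> term * factor | term / factor | factor
        factor -> ( expr ) | id
    Left recursion is implemented as loops; trees are (op, left, right) tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):  # lowest precedence level
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):  # higher precedence level
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        self.expect("id")
        return "id"


def parse(text):
    p = Parser(text.split())
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("id - id * id"))
    print(parse("( id - id ) * id"))
```

Because precedence now lives in the grammar, no separate disambiguating table is needed.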
Dangling Else Problem
Dangling Else Grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
The above grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has two distinct parse trees.
[Figure not reproduced: the two parse trees]
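The two readings differ only in which if owns the else. The sketch below writes both as abbreviated trees (nested tuples that omit the stmt and expr nodes, a simplification of full parse trees) and flattens each back to its token string, confirming that both yield the same sentence; `tree_inner_else`, `tree_outer_else`, and `tokens` are hypothetical names, and E1, E2, S1, S2 are treated as single tokens.

```python
# ("if", cond, then_stmt) or ("if", cond, then_stmt, else_stmt); strings are leaves.
tree_inner_else = ("if", "E1", ("if", "E2", "S1", "S2"))  # else belongs to the inner if
tree_outer_else = ("if", "E1", ("if", "E2", "S1"), "S2")  # else belongs to the outer if


def tokens(t):
    """Flatten a statement tree back to its token sequence (the tree's yield)."""
    if isinstance(t, str):
        return [t]
    if len(t) == 3:
        _, cond, then_part = t
        return ["if", cond, "then"] + tokens(then_part)
    _, cond, then_part, else_part = t
    return ["if", cond, "then"] + tokens(then_part) + ["else"] + tokens(else_part)


if __name__ == "__main__":
    print(" ".join(tokens(tree_inner_else)))
    print(" ".join(tokens(tree_outer_else)))
```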
Dangling Else Problem
Two ways to solve the dangling-else problem:
Approach 1: Create the following disambiguating rule:
Match each else with the closest unmatched then.
Approach 2: Rewrite the grammar so that the disambiguating rule is incorporated directly into the grammar.
Dangling Else Problem
The following explains the idea for rewriting the dangling-else grammar to remove the ambiguity.
A statement appearing between a then and an else must be "matched"; that is, the interior statement must not end with an unmatched or open then.
A matched statement is either an if-then-else statement containing no open statements or any other kind of unconditional statement.
With regard to the dangling-else problem, rewriting the grammar is usually not the approach taken; instead, the disambiguating rule is preferred.
The principal reason is that parsing methods are easy to configure so that the most-closely-nested rule is obeyed.
Another reason is the added complexity of the rewritten grammar.
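The "easy to configure" point is visible in a recursive-descent parser for the original ambiguous grammar: after parsing a then-branch, the parser consumes an else as soon as it sees one, so the else attaches to the closest unmatched then without any change to the grammar. The sketch below is a hypothetical illustration (`parse_stmt` is not from the slides) in which conditions and non-if statements are single tokens.

```python
def parse_stmt(tokens, pos=0):
    """Parse one stmt starting at tokens[pos]; return (tree, next_pos).

    Grammar (ambiguous): stmt -> if expr then stmt [ else stmt ] | other
    Disambiguating rule: an 'else' right after a then-branch is consumed
    immediately, so it matches the closest unmatched 'then'."""
    if pos < len(tokens) and tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        cond = tokens[pos + 1]
        then_part, pos = parse_stmt(tokens, pos + 3)
        # Greedy choice: the innermost active call sees the 'else' first.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    if pos >= len(tokens) or tokens[pos] in ("then", "else"):
        raise SyntaxError("expected a statement")
    return tokens[pos], pos + 1  # 'other' statement


if __name__ == "__main__":
    tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
    print(tree)
```

The result groups the else with the inner if, which is the reading the disambiguating rule selects.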
Class Problem
Modify the grammar below to enforce the standard arithmetic precedence rules