Lecture 03

Syntax analysis is the second phase of a compiler that checks the source code for grammatical structure and ensures adherence to language grammar rules. Parsing, a key component of this phase, involves analyzing token sequences to construct parse trees and facilitate semantic analysis. Context-Free Grammar (CFG) is used to define the syntax of programming languages, consisting of production rules that describe how valid strings can be generated.


Syntax Analysis

Overview of Syntax Analysis


• Definition:
Syntax analysis is the second phase of the compiler.
It checks the source code for its grammatical structure.

• Purpose:
Ensures that the program follows the grammar rules of the source language.
Role of Parsing in Compiler Design
Parsing is the process of analyzing a sequence of tokens to determine its
grammatical structure based on a given formal grammar.

Main Responsibilities:
• Check for syntax errors: ensures the program adheres to the syntax rules of the language.
• Construct the parse tree: represents the structure of the source code hierarchically.
• Facilitate semantic analysis: provides structured input for the next compiler phase (semantic analysis).
• Assist in code generation: helps in translating the parse tree into intermediate code or machine code.
• Example: converting a mathematical expression into a parse tree and using it for further analysis.
Position of the parser in the compiler model
(The parser receives the token stream produced by the lexical analyzer and builds a parse tree that is passed on to the rest of the front end.)

❑ Parsing Approaches
• Top-down
• Bottom-up
Context-Free Grammar (CFG)
A Context-Free Grammar (CFG) is a formal system used to define the syntax of programming languages. It consists of a set of production rules that describe how strings in a language can be generated.
• A formal way to describe the syntax of programming languages.
• Defines the syntactic structure of valid strings in a language.
• Composed of rules, called productions.

Notation:
G = (V, Σ, P, S)
A CFG is defined by a 4-tuple:
• V: a set of non-terminal symbols (also called variables).
• Σ: a set of terminal symbols (disjoint from V).
• P: a set of production rules, where each rule maps a non-terminal to a string of terminals and/or non-terminals.
• S: the start symbol, which is a non-terminal.
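As a concrete illustration, the four components can be written down as plain Python data. This is only a sketch: the dictionary keys are illustrative names, and the grammar shown (S → aS | ε over Σ = {a}) is the one used in a later example.

# A minimal sketch of the 4-tuple G = (V, Σ, P, S) as plain Python data.
grammar = {
    "non_terminals": {"S"},             # V: variables
    "terminals": {"a"},                 # Σ: terminal symbols, disjoint from V
    "productions": {"S": ["aS", ""]},   # P: head -> list of alternative bodies ("" stands for ε)
    "start": "S",                       # S: start symbol
}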
Example: CFG for a simple arithmetic expression (the full grammar appears below).
Formal Definition of CFG
1. Terminals (T) are the basic symbols from which strings are formed.
• The term "token name" is a synonym for "terminal", and we will frequently use the word "token" for terminal.
• Terminals are the first components of the tokens output by the lexical analyzer.
• Examples: the keywords if and else, and the symbols ( and ).

stmt → if ( expr ) stmt else stmt
2. Non-terminals (V) are syntactic variables that denote sets of strings.
• Non-terminals: stmt, expr
• The sets of strings denoted by non-terminals help define the language generated by the grammar.
• Non-terminals impose a hierarchical structure on the language that is key to syntax analysis and translation.

stmt → if ( expr ) stmt else stmt

3. Start symbol (S): one non-terminal is distinguished as the start symbol.
✔ The set of strings it denotes is the language generated by the grammar.
✔ Conventionally, the productions for the start symbol are listed first.

stmt → if ( expr ) stmt else stmt
4. Productions (P) of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of:
• (a) A non-terminal called the head or left side of the production; the production defines some of the strings denoted by the head.
• (b) The symbol →. Sometimes ::= has been used in place of the arrow.
• (c) A body or right side consisting of zero or more terminals and non-terminals.
Example: Grammar for simple arithmetic expressions

expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id

Terminals: id, +, -, *, /, (, )
Non-terminals: expression, term, factor
Start symbol: expression
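The same grammar can also be written directly as data. This is only a sketch (the variable names are illustrative), but it shows how the terminals fall out as "every symbol that is not the head of some production".

# The arithmetic-expression grammar as plain Python data (a sketch).
# Each non-terminal maps to its list of alternative bodies; a body is a list of symbols.
productions = {
    "expression": [["expression", "+", "term"],
                   ["expression", "-", "term"],
                   ["term"]],
    "term":       [["term", "*", "factor"],
                   ["term", "/", "factor"],
                   ["factor"]],
    "factor":     [["(", "expression", ")"],
                   ["id"]],
}
start_symbol = "expression"
non_terminals = set(productions)                      # {"expression", "term", "factor"}
terminals = {sym for bodies in productions.values()   # every symbol that is not a head
             for body in bodies for sym in body} - non_terminals
print(terminals)                                      # {'id', '+', '-', '*', '/', '(', ')'}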
Notational Conventions
1. Terminals:
- Lowercase letters early in the alphabet: a, b, c.
- Operator symbols: +, *, etc.
- Punctuation symbols: parentheses, comma, etc.
- Digits: 0, 1, ..., 9.
- Boldface strings such as id or if, each of which represents a single terminal symbol.
2. Non-terminals:
- Uppercase letters early in the alphabet: A, B, C.
- The letter S, which, when it appears, is usually the start symbol.
- Lowercase, italic names such as expr, stmt.
- Uppercase letters such as E, T, F.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols (either terminals or non-terminals).
4. Lowercase letters late in the alphabet, such as u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters α, β, δ represent (possibly empty) strings of grammar symbols. Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk. Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.
Example
Construct the CFG for the language having any number of a's over the set Σ = {a}.
Solution:
r.e. = a*
The production rules for this regular expression are:
S → aS      (rule 1)
S → ε       (rule 2)
To derive the string "aaaaaa", we start with the start symbol:
S
⇒ aS        (rule 1)
⇒ aaS       (rule 1)
⇒ aaaS      (rule 1)
⇒ aaaaS     (rule 1)
⇒ aaaaaS    (rule 1)
⇒ aaaaaaS   (rule 1)
⇒ aaaaaa    (rule 2, S → ε)
The r.e. a* generates the set of strings {ε, a, aa, aaa, ...}. We can derive the null string because S is the start symbol and rule 2 gives S → ε.
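The derivation above is mechanical enough to script. A minimal sketch (the function name is illustrative) that applies rule 1 n times and then rule 2:

# Derive a string of n a's from S -> aS | ε: apply rule 1 n times, then rule 2.
def derive_a_star(n):
    form = "S"
    steps = [form]
    for _ in range(n):                        # rule 1: S -> aS
        form = form.replace("S", "aS", 1)
        steps.append(form)
    steps.append(form.replace("S", "", 1))    # rule 2: S -> ε
    return steps

print(" => ".join(derive_a_star(6)))          # S => aS => ... => aaaaaaS => aaaaaa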
Construct a CFG for the language L = { aⁿb²ⁿ | n ≥ 1 }.
Solution:
The strings that can be generated for this language are {abb, aabbbb, aaabbbbbb, ...}.
The grammar could be:

S → aSbb | abb

To derive the string "aabbbb", we start with the start symbol:
S ⇒ aSbb ⇒ aabbbb
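A similar sketch for this grammar (the helper name is illustrative): apply S → aSbb (n - 1) times, then finish with S → abb.

# Derive a^n b^(2n) from S -> aSbb | abb.
def derive_anb2n(n):
    form = "S"
    for _ in range(n - 1):                    # S -> aSbb
        form = form.replace("S", "aSbb", 1)
    return form.replace("S", "abb", 1)        # S -> abb

print(derive_anb2n(2))                        # aabbbb
print(derive_anb2n(3) == "a" * 3 + "b" * 6)   # True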
Example 3:
Construct a CFG for the language L = { wcwᴿ | w ∈ (a, b)* }.
Solution:
The strings that can be generated for this language are {aacaa, bcb, abcba, bacab, abbcbba, ...}.
The grammar could be:
1. S → aSa    (rule 1)
2. S → bSb    (rule 2)
3. S → c      (rule 3)
To derive the string "abbcbba", we start with the start symbol:
S ⇒ aSa        (rule 1)
  ⇒ abSba      (rule 2)
  ⇒ abbSbba    (rule 2)
  ⇒ abbcbba    (rule 3)
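The choice of rule 1 or rule 2 at each step simply mirrors the symbols of w, which makes the derivation easy to sketch in code (the function name is illustrative):

# Derive w c w^R from S -> aSa | bSb | c by following the symbols of w.
def derive_wcwr(w):
    form = "S"
    for ch in w:                                 # rule 1 (for a) or rule 2 (for b)
        form = form.replace("S", ch + "S" + ch, 1)
    return form.replace("S", "c", 1)             # rule 3

print(derive_wcwr("abb"))                        # abbcbba
print(derive_wcwr("ba"))                         # bacab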
Derivation

A derivation is the process of using the production rules of the grammar to derive the input string. At each step, the parser must make two decisions:
1. Which non-terminal to replace. There are two standard options:
   a) Left-most derivation: at each step, the leftmost non-terminal in the sentential form is replaced.
   b) Right-most derivation: at each step, the rightmost non-terminal in the sentential form is replaced.
2. Which production rule to use when replacing that non-terminal.
Example:
Production rules:
• S → S + S
• S → S - S
• S → a | b | c
Input:
a - b + c
The left-most derivation is:
S ⇒ S + S
  ⇒ S - S + S
  ⇒ a - S + S
  ⇒ a - b + S
  ⇒ a - b + c
Example:
Production rules:
• S → S + S
• S → S - S
• S → a | b | c
Input:
a - b + c
The right-most derivation is:
S ⇒ S - S
  ⇒ S - S + S
  ⇒ S - S + c
  ⇒ S - b + c
  ⇒ a - b + c
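The only difference between the two strategies is which occurrence of the non-terminal gets replaced at each step. A small sketch (names illustrative) that reproduces the left-most derivation above:

# Replace the leftmost (or rightmost) occurrence of the non-terminal S with a rule body.
def apply(form, body, leftmost=True):
    i = form.find("S") if leftmost else form.rfind("S")
    return form[:i] + body + form[i + 1:]

# Leftmost derivation of "a - b + c" using S -> S + S | S - S | a | b | c
form = "S"
for body in ["S + S", "S - S", "a", "b", "c"]:
    form = apply(form, body, leftmost=True)
    print(form)
# S + S
# S - S + S
# a - S + S
# a - b + S
# a - b + c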
Parse Tree
• Definition: A hierarchical tree that represents the derivation of a string according to a grammar.
A parse tree has the following properties:
• The root node is always labelled with the start symbol.
• The derivation is read from left to right.
• The leaf nodes are always terminals.
• The interior nodes are always non-terminals.
Example: Illustration of a parse tree for an arithmetic expression.
Input string: id + id * id
Grammar rules:
• E → E + T | T
• T → T * F | F
• F → id | ( E )
Parse tree:
          E
        / | \
       E  +  T
       |    /|\
       T   T * F
       |   |   |
       F   F   id
       |   |
       id  id
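To make the tree concrete, here is a small sketch (the Node class is illustrative, not part of the lecture) that builds the same parse tree by hand and checks that its leaves, read left to right, spell out the input string.

# Build the parse tree for "id + id * id" by hand and read back its leaves.
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def leaves(self):
        if not self.children:                 # a leaf: a terminal symbol
            return [self.label]
        return [leaf for c in self.children for leaf in c.leaves()]

# E -> E + T ; the left E derives "id", the right T derives "id * id"
tree = Node("E", [
    Node("E", [Node("T", [Node("F", [Node("id")])])]),
    Node("+"),
    Node("T", [Node("T", [Node("F", [Node("id")])]),
               Node("*"),
               Node("F", [Node("id")])]),
])

print(" ".join(tree.leaves()))                # id + id * id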
Production rules:
1. E → E + E
2. E → E * E
3. E → a | b | c
Input:
a*b+c

Draw a derivation tree for the string "bab" from the CFG given by
S → bSb | a | b

The derivation tree for the string "bab" is as follows (it corresponds to the derivation S ⇒ bSb ⇒ bab):
Construct a derivation tree for the string "aabbabba" for the CFG given by:
1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB
The derivation tree is as follows:
Example
Show the derivation tree for the string "aabbbb" with the following grammar (a brute-force derivation search is sketched below).
1. S → AB | ε
2. A → aB
3. B → Sb
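Finding such a derivation by hand takes a little trial and error. Here is a brute-force sketch (not an efficient parser; the length bound is chosen just for this example) that searches leftmost derivations breadth-first:

from collections import deque

# Grammar: S -> AB | ε, A -> aB, B -> Sb (non-terminals are the uppercase letters)
productions = {"S": ["AB", ""], "A": ["aB"], "B": ["Sb"]}

def find_derivation(target, start="S", max_len=10):
    """Breadth-first search over leftmost derivations; returns one derivation or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        form, steps = queue.popleft()
        if form == target:
            return steps
        # locate the leftmost non-terminal; if none, this form is a dead end
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:
            continue
        for body in productions[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append((new, steps + [new]))
    return None

print(" => ".join(find_derivation("aabbbb")))   # prints one leftmost derivation ending in aabbbb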
Parse Tree & Derivation
Example: a leftmost derivation (LMD) of -(id + id).
Assuming the usual expression grammar E → E + E | E * E | - E | ( E ) | id, the leftmost derivation is:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for some input string. If the grammar is not ambiguous, it is called unambiguous.
An ambiguous grammar is not suitable for compiler construction. No method can automatically detect and remove ambiguity in general, but we can often eliminate it by rewriting the grammar unambiguously.
Example 1:
Let us consider a grammar G with the production rules
1. E → I
2. E → E + E
3. E → E * E
4. E → ( E )
5. I → ε | 0 | 1 | 2 | ... | 9
For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation:
Example 2:
Check whether the given grammar G is ambiguous or not.
• E → E + E
• E → E - E
• E → id
Derive the string "id + id - id" from the above grammar.

First leftmost derivation:
E ⇒ E + E
  ⇒ id + E
  ⇒ id + E - E
  ⇒ id + id - E
  ⇒ id + id - id

Second leftmost derivation:
E ⇒ E - E
  ⇒ E + E - E
  ⇒ id + E - E
  ⇒ id + id - E
  ⇒ id + id - id

Since "id + id - id" has two distinct leftmost derivations (and therefore two parse trees), the grammar G is ambiguous.
Grammar for mathematical expressions (slides adapted from Prof. Busch, LSU; figures not reproduced here)

The original slides show the expression grammar and example strings it generates (with a non-terminal that denotes any number), a leftmost derivation of an example expression, another leftmost derivation of the same expression, the two resulting derivation trees (a "good" tree and a "bad" tree), and the computation of the expression's result using each tree.
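Those slides make the practical point that an ambiguous grammar's two trees can evaluate to different results. A small sketch, using 3 * 2 + 5 from Example 1 as a stand-in expression since the slide figures are not reproduced:

import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """A tree is either a number (leaf) or a (left, op, right) triple (interior node)."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

good_tree = ((3, "*", 2), "+", 5)   # groups * below +, matching the usual precedence
bad_tree  = (3, "*", (2, "+", 5))   # the other parse tree groups + below *
print(evaluate(good_tree), evaluate(bad_tree))   # 11 21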


Types of Parsers

• Top-Down Parsing:
• Builds the parse tree from the root (start symbol) down to the leaves.
• Examples: Recursive Descent Parser, LL Parser (see the sketch below).
• Bottom-Up Parsing:
• Builds the parse tree from the leaves up to the root.
• Examples: Shift-Reduce Parser, LR Parser.
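As an illustration of top-down parsing (a sketch, not code from the lecture), here is a minimal recursive-descent parser for the arithmetic grammar used earlier, rewritten without left recursion so it can be parsed top-down; the token list stands in for the output of the lexical analyzer.

# Recursive-descent (top-down) parser for:
#   E -> T { (+|-) T }     T -> F { (*|/) F }     F -> ( E ) | id
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def factor():                            # F -> ( E ) | id
        if peek() == "(":
            eat("("); node = expression(); eat(")")
            return node
        eat("id")
        return "id"

    def term():                              # T -> F { (*|/) F }
        node = factor()
        while peek() in ("*", "/"):
            op = peek(); eat(op)
            node = (op, node, factor())
        return node

    def expression():                        # E -> T { (+|-) T }
        node = term()
        while peek() in ("+", "-"):
            op = peek(); eat(op)
            node = (op, node, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["id", "+", "id", "*", "id"]))   # ('+', 'id', ('*', 'id', 'id'))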
