CD -2 Notes
The syntax analyzer (parser)
checks the syntactical structure of the given input, i.e. whether the given
input is in the correct syntax of the language in which it has been
written or not. It does so by building a data structure called a parse tree or
syntax tree. The parse tree is constructed using the pre-defined grammar
of the language and the input string. If the given input string can be produced
with the help of the syntax tree (in the derivation process), the input string is
found to be in the correct syntax. If not, an error is reported by the syntax
analyzer.
The main goal of syntax analysis is to create a parse tree or abstract syntax
tree (AST) of the source code, which is a hierarchical representation of the
source code that reflects the grammatical structure of the program.
● Syntax analysis can be carried out with either a top-down or a bottom-up parsing technique, chosen according to the grammar and the desired ease of implementation.
● A top-down parser builds the tree from the root (start symbol) down to the leaves, whereas a bottom-up parser starts from the leaves of the parse tree and constructs the tree by successively reducing substrings of the input toward the start symbol; this choice determines the structure of the parser.
● The parse tree or AST can also be used in the code generation phase of the compiler to generate machine code.
A pushdown automaton (PDA) is the underlying model used to design the syntax analysis phase.
Consider, for example, the grammar S -> cAd, A -> bc | a. The parser attempts to construct a syntax tree from this grammar for the
given input string, applying the production rules as needed to generate the string.
To generate the string "cad" it applies the rules in the following steps:
(i) Start with the start symbol S.
(ii) Apply S -> cAd, giving the sentential form cAd.
(iii) Apply A -> bc, giving cbcd.
(iv) Backtrack and apply A -> a instead, giving cad.
In step (iii) above, the production rule A -> bc was not a suitable one to apply
(the string produced is "cbcd", not "cad"), so the parser needs to
backtrack and apply the next production rule available for A, as
shown in step (iv), which produces the string "cad".
Thus, the given input can be produced by the given grammar, so the
input is syntactically correct. But backtracking was needed to get the correct syntax
tree, which makes this approach complex to implement.
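As a rough illustration, here is a minimal Python sketch of such a brute-force, backtracking top-down parser for the grammar S -> cAd, A -> bc | a; the dictionary-based grammar encoding and the `parse` helper are assumptions made for this sketch, not a fixed algorithm from these notes.

```python
# Sketch of a brute-force, backtracking top-down parser for:
#   S -> c A d
#   A -> b c | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}

def parse(symbol, text, pos):
    """Try to match `symbol` starting at text[pos].
    Return the new position on success, or None on failure."""
    if symbol not in GRAMMAR:                  # terminal: must match the next character
        if pos < len(text) and text[pos] == symbol:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:         # try each alternative in order
        cur = pos
        for sym in production:
            cur = parse(sym, text, cur)
            if cur is None:
                break                          # alternative failed -> backtrack
        else:
            return cur                         # the whole alternative matched
    return None                                # no alternative worked

# For "cad", A -> b c fails first; the parser backtracks and uses A -> a.
print(parse("S", "cad", 0) == len("cad"))      # True  (syntax is correct)
print(parse("S", "cbd", 0) == len("cbd"))      # False (cannot be derived)
```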
Advantages:
● Advantages of using syntax analysis in compiler design include:
● It allows the compiler to check whether the source program follows the grammatical rules of the language and to detect and report errors in the source code.
● It builds a parse tree or abstract syntax tree (AST) of the source code, which can be used by the later phases of the compiler (semantic analysis, optimization, and code generation).
Disadvantages:
● Disadvantages of using syntax analysis in compiler design include:
● Some constructs of real programming languages cannot be expressed using context-free grammars.
● Parsing adds complexity and overhead to the compilation process, which can slow down the compiler.
### **Role of the Parser**
**1. The Parser:**
- **Definition:** The parser is the phase of the compiler that takes the sequence of tokens produced by the lexical analyzer and arranges them into a syntactic structure according to the grammar of the programming language.
- **Output:** It produces a parse tree or abstract syntax tree (AST) that represents the syntactic structure of the program.
- **Error Reporting:** Identifies and reports syntax errors in the source code, indicating where the input deviates from the grammar.

### **Context-Free Grammars**
**2. Context-Free Grammars (CFG):**
- **Definition:** A context-free grammar is a set of production rules, each with a single non-terminal on the left-hand side and a right-hand side consisting of terminals and non-terminals.
- **Components:** Terminals, non-terminals, production rules, and a start symbol from which the derivation process begins.
- **Production Rules:**
- `E → E + T | T`
- `T → T * F | F`
- `F → (E) | id`
### **Derivations**
**3. Derivations:**
- **Definition:** A derivation is a sequence of production rule applications
that starts from the start symbol and results in a string composed of terminal
symbols.
- **Types of Derivations:**
- **Leftmost Derivation:** at each step, the leftmost non-terminal in the sentential form is replaced.
- **Rightmost Derivation:** at each step, the rightmost non-terminal is replaced.
For the grammar given above, a leftmost derivation of the string `id + id * id`:
1. Start with the start symbol: `E`
2. Apply `E → E + T`: `E + T`
3. Apply `E → T` in `E + T`: `T + T`
4. Apply `T → F` in `T + T`: `F + T`
5. Apply `F → id`: `id + T`
6. Apply `T → T * F`: `id + T * F`
7. Apply `T → F`: `id + F * F`
8. Apply `F → id`: `id + id * F`
9. Apply `F → id`: `id + id * id`
### **Parse Trees**
**4. Parse Trees:**
- **Definition:** A parse tree is a graphical representation of a derivation, showing how the start symbol of the grammar derives the input string.
- **Structure:**
- **Root Node:** Represents the start symbol of the grammar.
- **Internal Nodes:** Represent non-terminals; the children of a node correspond to the right-hand side of the production applied at that node.
- **Leaf Nodes:** Represent terminal symbols; read from left to right, they yield the input string.
- **Example:** Parse tree for `id + id * id`:
```
            E
          / | \
         E  +  T
         |   / | \
         T  T  *  F
         |  |     |
         F  F     id
         |  |
         id id
```
### **Ambiguity**
**5. Ambiguity:**
- **Definition:** A grammar is ambiguous if at least one string can be generated by the grammar in more than one way, meaning the string has more than one parse tree (equivalently, more than one leftmost or rightmost derivation).
- **Problem:** Multiple parse trees exist for the same string, making it difficult for the parser to determine the correct syntactic structure.
**5.1. Example of Ambiguity:**
- **Production Rules:**
- `E → E + E | E * E | id`
For the string `id + id * id`, this grammar is ambiguous because it can be derived in two different ways: one parse tree groups the operands as `(id + id) * id` and the other as `id + (id * id)`.
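To make the ambiguity concrete, two distinct leftmost derivations of `id + id * id` under this grammar are:

```
Leftmost derivation 1 ('+' at the root):    Leftmost derivation 2 ('*' at the root):
E => E + E                                  E => E * E
  => id + E                                   => E + E * E
  => id + E * E                               => id + E * E
  => id + id * E                              => id + id * E
  => id + id * id                             => id + id * id
```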
### **Left Recursion**
**6. Left Recursion:**
- **Definition:** A grammar is left recursive if it contains a non-terminal that can eventually derive itself as the leftmost symbol in its production rules (e.g. `A → Aα`). Left recursion makes a top-down parser loop forever, so it must be eliminated.
- **Types:**
- **Immediate (direct) left recursion:** a production of the form `A → Aα`.
- **Indirect left recursion:** the non-terminal derives itself as the leftmost symbol only after two or more production steps.
**6.1. Eliminating Immediate Left Recursion:**
Consider a grammar:
- **Production Rules:**
- `A → Aα | β`
- **Transformed Rules:**
- `A → βA'`
- `A' → αA' | ε`
**6.2. Example:**
Consider a grammar:
- **Production Rules:**
- `E → E + T | E * T | T`
- **Transformed Rules:**
- `E → T E'`
- `E' → + T E' | * T E' | ε`
Here, `E'` is introduced to handle the remaining part of the production after the first `T`, so the left recursion is eliminated while the same strings can still be generated.
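For illustration, a minimal Python sketch of a recursive-descent parser for the transformed grammar; `T` is simplified to the single terminal `id` here, and the function names are placeholders chosen for this sketch.

```python
# Sketch of a recursive-descent parser for the left-recursion-free grammar
#   E  -> T E'
#   E' -> + T E' | * T E' | ε
# with T simplified to the single terminal "id" for this illustration.
def parse_expression(tokens):
    pos = 0

    def T():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "id":
            pos += 1
            return True
        return False

    def E_prime():
        nonlocal pos
        # E' -> + T E' | * T E' | ε  (the ε case simply consumes nothing)
        if pos < len(tokens) and tokens[pos] in ("+", "*"):
            pos += 1
            return T() and E_prime()
        return True

    def E():
        return T() and E_prime()

    return E() and pos == len(tokens)

print(parse_expression(["id", "+", "id", "*", "id"]))  # True
print(parse_expression(["id", "+", "+"]))              # False
```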
### **Summary**
The parser plays a critical role in interpreting and validating the syntactic
structure of source code. Context-free grammars define the syntax rules, and
derivations show how strings are generated from these grammars. Parse
trees represent these derivations hierarchically, and ambiguity arises when a
string has more than one parse tree in the grammar. Left recursion elimination and
left factoring are techniques used to address properties that make a grammar
unsuitable for top-down parsing.
FIRST Set
FIRST(X), for a grammar symbol X, is the set of terminals that can begin a string derived from X (it also contains Є if X can derive the empty string). Rules for computing FIRST sets:
1. If X is a terminal, then FIRST(X) = { X }.
2. If X -> Є is a production, then add Є to FIRST(X).
3. If X -> Y1 Y2 … Yk is a production, then FIRST(X) = FIRST(Y1); if FIRST(Y1) contains Є, then FIRST(X) = { FIRST(Y1) – Є } U FIRST(Y2), and so on; if every FIRST(Yi) contains Є, then add Є to FIRST(X).
Example 1:
Production Rules of Grammar
E -> TE’
E’ -> +T E’ | Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST sets
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
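As a cross-check, here is a small Python sketch (the names and grammar representation are illustrative assumptions) that computes these FIRST sets iteratively for the Example 1 grammar:

```python
# Sketch: iterative computation of FIRST sets for the Example 1 grammar.
# "Є" stands for the empty string.
EPS = "Є"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Anything that is not a non-terminal is its own FIRST set.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                        # repeat until no FIRST set grows
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                all_nullable = True
                for sym in prod:
                    f = first_of(sym)
                    before = len(first[nt])
                    first[nt] |= f - {EPS}
                    changed |= len(first[nt]) != before
                    if EPS not in f:
                        all_nullable = False
                        break
                if all_nullable and EPS not in first[nt]:
                    first[nt].add(EPS)    # every symbol of the body can derive Є
                    changed = True
    return first

for nt, f in compute_first(GRAMMAR).items():
    print(nt, "=", sorted(f))   # e.g. E = ['(', 'id'],  E' = ['+', 'Є']
```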
Example 2:
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є
FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
         = { d, g, h, b, a, Є }
FIRST(A) = { d } U FIRST(BC)
         = { d, g, h, Є }
FIRST(B) = { g , Є }
FIRST(C) = { h , Є }
Note: FIRST() can be computed for terminals as well as non-terminals, whereas FOLLOW() is defined only for Non-Terminals.
FOLLOW Set
We define Follow(X) to be the set of terminals that can appear immediately to the right
of Non-Terminal X in some sentential form.
Example:
S -> Aa | Ac
A -> b
The two possible derivation trees are:
      S              S
     / \            / \
    A   a          A   c
    |              |
    b              b
Here a and c appear immediately to the right of A, so FOLLOW(A) = { a, c }.
Rules for computing FOLLOW sets:
1. FOLLOW(S) = { $ }, where S is the start symbol and $ is the end-of-input marker.
2. If there is a production A -> αBβ, then everything in FIRST(β) except Є is in FOLLOW(B).
3. If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains Є, then everything in FOLLOW(A) is in FOLLOW(B).
Example 1:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
FOLLOW Set
FOLLOW(E) = { $ , ) }   // ')' is there because of the 5th rule, F -> (E)
FOLLOW(E’) = FOLLOW(E) = { $, ) }   // see the 1st production rule, E -> TE’
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
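Similarly, a small Python sketch of the iterative FOLLOW computation for this grammar; for brevity it takes the FIRST sets listed above as given (hard-coded) rather than recomputing them:

```python
# Sketch: iterative computation of FOLLOW sets for the grammar
#   E -> TE',  E' -> +TE' | Є,  T -> FT',  T' -> *FT' | Є,  F -> (E) | id
# The FIRST sets listed above are taken as given.
EPS = "Є"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of(sym):
    return FIRST[sym] if sym in GRAMMAR else {sym}

def compute_follow(grammar, start="E"):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for prod in productions:
                for i, b in enumerate(prod):
                    if b not in grammar:         # FOLLOW is defined only for non-terminals
                        continue
                    rest_nullable = True         # can everything after B derive Є?
                    for nxt in prod[i + 1:]:
                        f = first_of(nxt)
                        before = len(follow[b])
                        follow[b] |= f - {EPS}   # rule 2: FIRST of what follows B
                        changed |= len(follow[b]) != before
                        if EPS not in f:
                            rest_nullable = False
                            break
                    if rest_nullable:            # rule 3: FOLLOW(A) flows into FOLLOW(B)
                        before = len(follow[b])
                        follow[b] |= follow[a]
                        changed |= len(follow[b]) != before
    return follow

for nt, f in compute_follow(GRAMMAR).items():
    print(nt, "=", sorted(f))   # e.g. E = ['$', ')'],  F = ['$', ')', '*', '+']
```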
Example 2:
Production Rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є
FIRST set
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , Є }
FIRST(D) = FIRST(E) U FIRST(F) = { g, f, Є }
FIRST(E) = { g , Є }
FIRST(F) = { f , Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
Example 3:
Production Rules:
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є
FIRST set
FIRST(S) = FIRST(A) U FIRST(B) U FIRST(C) = { d, g, h, Є, b, a}
FIRST(A) = { d } U {FIRST(B)-Є} U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
Note: $ is the end-of-input marker; it is not part of the grammar and is hence used while parsing to indicate that the input string has been
completely processed. (As noted above, FOLLOW sets are computed only for Non-Terminals.)
Top-down parsing can be done in two ways:
1. With Backtracking (brute force): when a non-terminal is expanded, go with its first alternative and compare it with the given input string; if matching does not occur, go with the second alternative, and so on. Moreover, if matching occurs for at least one alternative, then the I/P string is parsed successfully; otherwise a syntax error is reported.
2. Without Backtracking: predictive parsing (table-driven / LL(1) parsing), which requires the grammar preparation described below.
Top-down parsing is a parsing strategy that starts with the start symbol of
the grammar and attempts to derive the input string by applying production
rules in a way that mimics the derivation process. Before applying the
top-down parsing technique, several pre-processing steps are required to
prepare the grammar and ensure that the parsing process is efficient and
unambiguous.
### **Eliminating Left Recursion**
**1.1. Definition:**
- A production of the form `A → Aα` is left recursive; a top-down parser expanding `A` would call itself forever, so such productions must be rewritten before parsing.
**1.2. Transformation:**
- For a left-recursive non-terminal `A → Aα | β`:
- Transform it into:
- `A → βA'`
- `A' → αA' | ε`
**1.3. Example:**
- Original Grammar:
- `E → E + T | T`
- Transformed Grammar:
- `E → T E'`
- `E' → + T E' | ε`
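A tiny Python sketch of this transformation (the helper name and grammar encoding are assumptions made for illustration):

```python
# Sketch of eliminating immediate left recursion:
#   A -> Aα | β   becomes   A -> βA' ,  A' -> αA' | ε
EPS = "ε"

def eliminate_left_recursion(nt, productions):
    """`productions` is a list of right-hand sides, each a list of symbols."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]   # the α parts
    betas  = [p for p in productions if not p or p[0] != nt]    # the β parts
    if not alphas:
        return {nt: productions}           # no immediate left recursion
    new_nt = nt + "'"
    return {
        nt:     [beta + [new_nt] for beta in betas],                 # A  -> βA'
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[EPS]],    # A' -> αA' | ε
    }

# E -> E + T | T   becomes   E -> T E' ,  E' -> + T E' | ε
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], ['ε']]}
```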
### **Left Factoring**
**2.1. Definition:**
- Left factoring is required when two or more alternatives of a non-terminal begin with the same prefix, so a predictive parser cannot decide which alternative to choose from the next input symbol alone.
**2.2. Transformation:**
- For `A → αβ1 | αβ2`, factor out the common prefix:
- `A → αA'`
- `A' → β1 | β2`
**2.3. Example:**
- Original Grammar:
- `E → T + T | T * T`
- Transformed Grammar:
- `E → T E'`
- `E' → + T | * T`
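Likewise, a tiny Python sketch of left factoring two alternatives that share a common prefix (the helper name and encoding are illustrative assumptions):

```python
# Sketch of left factoring two alternatives that share a common prefix:
#   A -> αβ1 | αβ2   becomes   A -> αA' ,  A' -> β1 | β2
EPS = "ε"

def left_factor(nt, prod1, prod2):
    """prod1 / prod2 are right-hand sides given as lists of symbols."""
    k = 0                                   # length of the longest common prefix
    while k < min(len(prod1), len(prod2)) and prod1[k] == prod2[k]:
        k += 1
    if k == 0:
        return {nt: [prod1, prod2]}         # nothing to factor out
    new_nt = nt + "'"
    return {
        nt:     [prod1[:k] + [new_nt]],                      # A  -> αA'
        new_nt: [prod1[k:] or [EPS], prod2[k:] or [EPS]],    # A' -> β1 | β2
    }

# E -> T + T | T * T   becomes   E -> T E' ,  E' -> + T | * T
print(left_factor("E", ["T", "+", "T"], ["T", "*", "T"]))
# {'E': [['T', "E'"]], "E'": [['+', 'T'], ['*', 'T']]}
```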
### **Computing First and Follow Sets**
**3.1. Definition:**
- First and Follow sets summarize, for every non-terminal, which terminals can begin or follow its derivations; they are used to decide which production a predictive parser should apply.
**3.2. First Sets:**
- For each non-terminal, compute the set of terminals that can appear as the first symbol of any string derived from that non-terminal.
- Include terminal symbols that are directly derivable from the non-terminal.
- If a symbol in the production can derive ε (the empty string), also include the First set of the symbol that follows it.
**3.3. Follow Sets:**
- Start by adding the end-of-input marker (usually `$`) to the Follow set of the start symbol.
- For a production `A → αBβ`, add `First(β)` (except ε) to `Follow(B)`; if `β` can derive ε or is empty, also add `Follow(A)` to `Follow(B)`.
**3.4. Example:**
- `A → aB | c`
- `B → b | ε`
- **First Sets:**
- `First(A) = {a, c}`
- `First(B) = {b, ε}`
- **Follow Sets:**
- `Follow(A) = {$}`
- `Follow(B) = {$}` (since `B` occurs at the end of `A → aB`, `Follow(A)` is added to `Follow(B)`)
### **Removing Ambiguity**
**4.1. Definition:**
- An ambiguous grammar allows more than one parse tree for the same string, so the parser cannot decide which structure is intended.
**4.2. Approach:**
- Rewrite the grammar so that operator precedence and associativity are encoded in separate non-terminals (one per precedence level).
**4.3. Example:**
- Original Grammar:
- `E → E + E | E * E | id`
- Transformed Grammar:
- `E → E1`
- `E1 → E1 + T | T`
- `T → T * F | F`
- `F → id`
### **Ensuring a Suitable Grammar Form**
**5.1. Definition:**
- Ensure that the grammar is in a form that is suitable for top-down parsing, such as a **predictive parsing** (LL(1)) grammar: no ambiguity, no left recursion, and no common prefixes.
**5.2. Example:**
- Given `E → E + T | T` and `T → T * F | F`, both non-terminals are left recursive, so the transformations above must be applied before a predictive parser can be built.
### **Summary**
The pre-processing steps for top-down parsing are essential for ensuring
that the grammar is suitable for efficient and unambiguous parsing. These
steps include eliminating left recursion and left factoring, computing First and
Follow sets, removing ambiguities, and ensuring the grammar is in a suitable
form. By performing these transformations, the grammar becomes more
manageable and ready for top-down parsing techniques such as recursive
descent parsing or LL(1) parsing.
Construction of the LL(1) Parsing Table
For each production A -> α of the grammar:
1. For every terminal a in FIRST(α), add A -> α to M[A, a].
2. If Є is in FIRST(α), add A -> α to M[A, b] for every terminal b (including $) in FOLLOW(A).
All remaining cells of M are error entries.
Example:
Consider the grammar, with productions numbered 1 to 5:
1. S -> (L)
2. S -> a
3. L -> SL’
4. L’ -> ,SL’
5. L’ -> Є
The parsing table M (entries are production numbers):
          (     )     a     ,     $
S         1           2
L         3           3
L’              5           4
For any grammar, if the table M has a cell with more than one entry, then the grammar is not LL(1).
Example:
S → iEtSS’ | a
S’ → eS | ε
E → b
Here ε ∈ FIRST(S’) and FOLLOW(S’) = { e, $ }, so both S’ → eS and S’ → ε are placed in M[S’, e]. Since that cell has two entries, this (dangling-else) grammar is not LL(1).
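To see the conflict mechanically, here is a short Python sketch that fills the table M for this grammar, taking its FIRST and FOLLOW sets as given, and flags any cell with more than one entry (names and encoding are assumptions for this sketch):

```python
# Sketch: filling the LL(1) parsing table M for
#   S -> iEtSS' | a ,  S' -> eS | Є ,  E -> b
# The FIRST and FOLLOW sets of this grammar are taken as given.
EPS = "Є"
PRODUCTIONS = [
    ("S",  ["i", "E", "t", "S", "S'"]),
    ("S",  ["a"]),
    ("S'", ["e", "S"]),
    ("S'", [EPS]),
    ("E",  ["b"]),
]
FIRST  = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

def first_of_body(symbols):
    """FIRST set of a right-hand side, built from the FIRST sets above."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})          # terminals are their own FIRST set
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                        # every symbol of the body was nullable
    return result

table = {}                                 # (non-terminal, terminal) -> list of bodies
for nt, body in PRODUCTIONS:
    f = first_of_body(body)
    terminals = f - {EPS}
    if EPS in f:
        terminals |= FOLLOW[nt]            # A -> α also goes under every b in FOLLOW(A)
    for t in terminals:
        table.setdefault((nt, t), []).append(body)

for (nt, t), entries in sorted(table.items()):
    conflict = "  <-- multiple entries: not LL(1)" if len(entries) > 1 else ""
    print(f"M[{nt}, {t}] = {entries}{conflict}")
# M[S', e] receives both ['e', 'S'] and ['Є'], so the grammar is not LL(1).
```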
Important Notes
If a grammar contains left factoring (i.e., two productions of a non-terminal share a common prefix), then it cannot be LL(1).
Eg - S -> aS | a
---- both productions go into the same cell M[S, a], giving a multiply-defined entry.
Every regular grammar need not be LL(1), because a regular grammar may still
contain left factoring, left recursion, or ambiguity.