CD -2 Notes
The syntax analyzer (parser)
checks the syntactical structure of the given input, i.e. whether the given
input is in the correct syntax of the language in which it has been
written or not. It does so by building a data structure called a parse tree or
syntax tree. The parse tree is constructed using the pre-defined grammar
of the language and the input string. If the given input string can be produced
with the help of the syntax tree (in the derivation process), the input string is
found to be in the correct syntax. If not, an error is reported by the syntax
analyzer.
The main goal of syntax analysis is to create a parse tree or abstract syntax
tree (AST) of the source code, which is a hierarchical representation of the
source code that reflects the grammatical structure of the program.
● Syntax analysis can be carried out with either a top-down or a bottom-up parsing technique, chosen according to the grammar and the desired ease of implementation.
● A top-down parser builds the tree from the root (start symbol) down to the leaves, whereas a bottom-up parser starts from the leaves of the parse tree and constructs the tree by successively reducing substrings of the input toward the start symbol; this choice determines the structure of the parser.
● The parse tree or AST can also be used in the code generation phase of the compiler to generate machine code.
A pushdown automaton (PDA) is the underlying model used to design the syntax analysis phase.
Consider, for example, the grammar S -> cAd, A -> bc | a. The parser attempts to construct a syntax tree from this grammar for the
given input string, applying the production rules as needed to generate the string.
To generate the string "cad" it applies the rules in the following steps:
(i) Start with the start symbol S.
(ii) Apply S -> cAd, giving the sentential form cAd.
(iii) Apply A -> bc, giving cbcd.
(iv) Backtrack and apply A -> a instead, giving cad.
In step (iii) above, the production rule A -> bc was not a suitable one to apply
(the string produced is "cbcd", not "cad"), so the parser needs to
backtrack and apply the next production rule available for A, as
shown in step (iv), which produces the string "cad".
Thus, the given input can be produced by the given grammar, so the
input is syntactically correct. But backtracking was needed to get the correct syntax
tree, which makes this approach complex to implement.
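As a rough illustration, here is a minimal Python sketch of such a brute-force, backtracking top-down parser for the grammar S -> cAd, A -> bc | a; the dictionary-based grammar encoding and the `parse` helper are assumptions made for this sketch, not a fixed algorithm from these notes.

```python
# Sketch of a brute-force, backtracking top-down parser for:
#   S -> c A d
#   A -> b c | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}

def parse(symbol, text, pos):
    """Try to match `symbol` starting at text[pos].
    Return the new position on success, or None on failure."""
    if symbol not in GRAMMAR:                  # terminal: must match the next character
        if pos < len(text) and text[pos] == symbol:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:         # try each alternative in order
        cur = pos
        for sym in production:
            cur = parse(sym, text, cur)
            if cur is None:
                break                          # alternative failed -> backtrack
        else:
            return cur                         # the whole alternative matched
    return None                                # no alternative worked

# For "cad", A -> b c fails first; the parser backtracks and uses A -> a.
print(parse("S", "cad", 0) == len("cad"))      # True  (syntax is correct)
print(parse("S", "cbd", 0) == len("cbd"))      # False (cannot be derived)
```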
Advantages:
● Advantages of using syntax analysis in compiler design include:
● It allows the compiler to check whether the source program follows the grammatical rules of the language and to detect and report errors in the source code.
● It builds a parse tree or abstract syntax tree (AST) of the source code, which can be used by the later phases of the compiler (semantic analysis, optimization, and code generation).
Disadvantages:
● Disadvantages of using syntax analysis in compiler design include:
● Some constructs of real programming languages cannot be expressed using context-free grammars.
● Parsing adds complexity and overhead to the compilation process, which can slow down the compiler.
### **Role of the Parser**
**1. The Parser:**
- **Definition:** The parser is the phase of the compiler that takes the sequence of tokens produced by the lexical analyzer and arranges them into a syntactic structure according to the grammar of the programming language.
- **Output:** It produces a parse tree or abstract syntax tree (AST) that represents the syntactic structure of the program.
- **Error Reporting:** Identifies and reports syntax errors in the source code, indicating where the input deviates from the grammar.

### **Context-Free Grammars**
**2. Context-Free Grammars (CFG):**
- **Definition:** A context-free grammar is a set of production rules, each with a single non-terminal on the left-hand side and a right-hand side consisting of terminals and non-terminals.
- **Components:** Terminals, non-terminals, production rules, and a start symbol from which the derivation process begins.
- **Production Rules:**
- `E → E + T | T`
- `T → T * F | F`
- `F → (E) | id`
### **Derivations**
**3. Derivations:**
- **Definition:** A derivation is a sequence of production rule applications
that starts from the start symbol and results in a string composed of terminal
symbols.
- **Types of Derivations:**
- **Leftmost Derivation:** at each step, the leftmost non-terminal in the sentential form is replaced.
- **Rightmost Derivation:** at each step, the rightmost non-terminal is replaced.
For the grammar given above, a leftmost derivation of the string `id + id * id`:
1. Start with the start symbol: `E`
2. Apply `E → E + T`: `E + T`
3. Apply `E → T` in `E + T`: `T + T`
4. Apply `T → F` in `T + T`: `F + T`
5. Apply `F → id`: `id + T`
6. Apply `T → T * F`: `id + T * F`
7. Apply `T → F`: `id + F * F`
8. Apply `F → id`: `id + id * F`
9. Apply `F → id`: `id + id * id`
### **Parse Trees**
**4. Parse Trees:**
- **Definition:** A parse tree is a graphical representation of a derivation, showing how the start symbol of the grammar derives the input string.
- **Structure:**
- **Root Node:** Represents the start symbol of the grammar.
- **Internal Nodes:** Represent non-terminals; the children of a node correspond to the right-hand side of the production applied at that node.
- **Leaf Nodes:** Represent terminal symbols; read from left to right, they yield the input string.
- **Example:** Parse tree for `id + id * id`:
```
            E
          / | \
         E  +  T
         |   / | \
         T  T  *  F
         |  |     |
         F  F     id
         |  |
         id id
```
### **Ambiguity**
**5. Ambiguity:**
- **Definition:** A grammar is ambiguous if at least one string can be generated by the grammar in more than one way, meaning the string has more than one parse tree (equivalently, more than one leftmost or rightmost derivation).
- **Problem:** Multiple parse trees exist for the same string, making it difficult for the parser to determine the correct syntactic structure.
**5.1. Example of Ambiguity:**
- **Production Rules:**
- `E → E + E | E * E | id`
For the string `id + id * id`, this grammar is ambiguous because it can be derived in two different ways: one parse tree groups the operands as `(id + id) * id` and the other as `id + (id * id)`.
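To make the ambiguity concrete, two distinct leftmost derivations of `id + id * id` under this grammar are:

```
Leftmost derivation 1 ('+' at the root):    Leftmost derivation 2 ('*' at the root):
E => E + E                                  E => E * E
  => id + E                                   => E + E * E
  => id + E * E                               => id + E * E
  => id + id * E                              => id + id * E
  => id + id * id                             => id + id * id
```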
### **Left Recursion**
**6. Left Recursion:**
- **Definition:** A grammar is left recursive if it contains a non-terminal that can eventually derive itself as the leftmost symbol in its production rules (e.g. `A → Aα`). Left recursion makes a top-down parser loop forever, so it must be eliminated.
- **Types:**
- **Immediate (direct) left recursion:** a production of the form `A → Aα`.
- **Indirect left recursion:** the non-terminal derives itself as the leftmost symbol only after two or more production steps.
**6.1. Eliminating Immediate Left Recursion:**
Consider a grammar:
- **Production Rules:**
- `A → Aα | β`
- **Transformed Rules:**
- `A → βA'`
- `A' → αA' | ε`
**6.2. Example:**
Consider a grammar:
- **Production Rules:**
- `E → E + T | E * T | T`
- **Transformed Rules:**
- `E → T E'`
- `E' → + T E' | * T E' | ε`
Here, `E'` is introduced to handle the remaining part of the production after the first `T`, so the left recursion is eliminated while the same strings can still be generated.
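For illustration, a minimal Python sketch of a recursive-descent parser for the transformed grammar; `T` is simplified to the single terminal `id` here, and the function names are placeholders chosen for this sketch.

```python
# Sketch of a recursive-descent parser for the left-recursion-free grammar
#   E  -> T E'
#   E' -> + T E' | * T E' | ε
# with T simplified to the single terminal "id" for this illustration.
def parse_expression(tokens):
    pos = 0

    def T():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "id":
            pos += 1
            return True
        return False

    def E_prime():
        nonlocal pos
        # E' -> + T E' | * T E' | ε  (the ε case simply consumes nothing)
        if pos < len(tokens) and tokens[pos] in ("+", "*"):
            pos += 1
            return T() and E_prime()
        return True

    def E():
        return T() and E_prime()

    return E() and pos == len(tokens)

print(parse_expression(["id", "+", "id", "*", "id"]))  # True
print(parse_expression(["id", "+", "+"]))              # False
```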
### **Summary**
The parser plays a critical role in interpreting and validating the syntactic
structure of source code. Context-free grammars define the syntax rules, and
derivations show how strings are generated from these grammars. Parse
trees represent these derivations hierarchically, and ambiguity arises when a
string has more than one parse tree in the grammar. Left recursion elimination and
left factoring are techniques used to address properties that make a grammar
unsuitable for top-down parsing.
FIRST Set
FIRST(X), for a grammar symbol X, is the set of terminals that can begin a string derived from X (it also contains Є if X can derive the empty string). Rules for computing FIRST sets:
1. If X is a terminal, then FIRST(X) = { X }.
2. If X -> Є is a production, then add Є to FIRST(X).
3. If X -> Y1 Y2 … Yk is a production, then FIRST(X) = FIRST(Y1); if FIRST(Y1) contains Є, then FIRST(X) = { FIRST(Y1) – Є } U FIRST(Y2), and so on; if every FIRST(Yi) contains Є, then add Є to FIRST(X).
Example 1:
Production Rules of Grammar
E -> TE’
E’ -> +T E’ | Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST sets
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
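As a cross-check, here is a small Python sketch (the names and grammar representation are illustrative assumptions) that computes these FIRST sets iteratively for the Example 1 grammar:

```python
# Sketch: iterative computation of FIRST sets for the Example 1 grammar.
# "Є" stands for the empty string.
EPS = "Є"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Anything that is not a non-terminal is its own FIRST set.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                        # repeat until no FIRST set grows
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                all_nullable = True
                for sym in prod:
                    f = first_of(sym)
                    before = len(first[nt])
                    first[nt] |= f - {EPS}
                    changed |= len(first[nt]) != before
                    if EPS not in f:
                        all_nullable = False
                        break
                if all_nullable and EPS not in first[nt]:
                    first[nt].add(EPS)    # every symbol of the body can derive Є
                    changed = True
    return first

for nt, f in compute_first(GRAMMAR).items():
    print(nt, "=", sorted(f))   # e.g. E = ['(', 'id'],  E' = ['+', 'Є']
```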
Example 2:
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є
FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
         = { d, g, h, b, a, Є }
FIRST(A) = { d } U FIRST(BC)
         = { d, g, h, Є }
FIRST(B) = { g , Є }
FIRST(C) = { h , Є }
Note: FIRST() can be computed for terminals as well as non-terminals, whereas FOLLOW() is defined only for Non-Terminals.
FOLLOW Set
We define Follow(X) to be the set of terminals that can appear immediately to the right
of Non-Terminal X in some sentential form.
Example:
S -> Aa | Ac
A -> b
The two possible derivation trees are:
      S              S
     / \            / \
    A   a          A   c
    |              |
    b              b
Here a and c appear immediately to the right of A, so FOLLOW(A) = { a, c }.
Rules for computing FOLLOW sets:
1. FOLLOW(S) = { $ }, where S is the start symbol and $ is the end-of-input marker.
2. If there is a production A -> αBβ, then everything in FIRST(β) except Є is in FOLLOW(B).
3. If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains Є, then everything in FOLLOW(A) is in FOLLOW(B).
Example 1:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
FOLLOW Set
FOLLOW(E) = { $ , ) }   // ')' is there because of the 5th rule, F -> (E)
FOLLOW(E’) = FOLLOW(E) = { $, ) }   // see the 1st production rule, E -> TE’
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
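Similarly, a small Python sketch of the iterative FOLLOW computation for this grammar; for brevity it takes the FIRST sets listed above as given (hard-coded) rather than recomputing them:

```python
# Sketch: iterative computation of FOLLOW sets for the grammar
#   E -> TE',  E' -> +TE' | Є,  T -> FT',  T' -> *FT' | Є,  F -> (E) | id
# The FIRST sets listed above are taken as given.
EPS = "Є"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of(sym):
    return FIRST[sym] if sym in GRAMMAR else {sym}

def compute_follow(grammar, start="E"):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for prod in productions:
                for i, b in enumerate(prod):
                    if b not in grammar:         # FOLLOW is defined only for non-terminals
                        continue
                    rest_nullable = True         # can everything after B derive Є?
                    for nxt in prod[i + 1:]:
                        f = first_of(nxt)
                        before = len(follow[b])
                        follow[b] |= f - {EPS}   # rule 2: FIRST of what follows B
                        changed |= len(follow[b]) != before
                        if EPS not in f:
                            rest_nullable = False
                            break
                    if rest_nullable:            # rule 3: FOLLOW(A) flows into FOLLOW(B)
                        before = len(follow[b])
                        follow[b] |= follow[a]
                        changed |= len(follow[b]) != before
    return follow

for nt, f in compute_follow(GRAMMAR).items():
    print(nt, "=", sorted(f))   # e.g. E = ['$', ')'],  F = ['$', ')', '*', '+']
```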
Example 2:
Production Rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є
FIRST set
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , Є }
FIRST(D) = FIRST(E) U FIRST(F) = { g, f, Є }
FIRST(E) = { g , Є }
FIRST(F) = { f , Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
Example 3:
Production Rules:
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є
FIRST set
FIRST(S) = FIRST(A) U FIRST(B) U FIRST(C) = { d, g, h, Є, b, a}
FIRST(A) = { d } U {FIRST(B)-Є} U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
Note: $ is the end-of-input marker; it is not part of the grammar and is hence used while parsing to indicate that the input string has been
completely processed. (As noted above, FOLLOW sets are computed only for Non-Terminals.)
Top-down parsing can be done in two ways:
1. With Backtracking (brute force): when a non-terminal is expanded, go with its first alternative and compare it with the given input string; if matching does not occur, go with the second alternative, and so on. Moreover, if matching occurs for at least one alternative, then the I/P string is parsed successfully; otherwise a syntax error is reported.
2. Without Backtracking: predictive parsing (table-driven / LL(1) parsing), which requires the grammar preparation described below.
Top-down parsing is a parsing strategy that starts with the start symbol of
the grammar and attempts to derive the input string by applying production
rules in a way that mimics the derivation process. Before applying the
top-down parsing technique, several pre-processing steps are required to
prepare the grammar and ensure that the parsing process is efficient and
unambiguous.
### **Eliminating Left Recursion**
**1.1. Definition:**
- A production of the form `A → Aα` is left recursive; a top-down parser expanding `A` would call itself forever, so such productions must be rewritten before parsing.
**1.2. Transformation:**
- For a left-recursive non-terminal `A → Aα | β`:
- Transform it into:
- `A → βA'`
- `A' → αA' | ε`
**1.3. Example:**
- Original Grammar:
- `E → E + T | T`
- Transformed Grammar:
- `E → T E'`
- `E' → + T E' | ε`
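A tiny Python sketch of this transformation (the helper name and grammar encoding are assumptions made for illustration):

```python
# Sketch of eliminating immediate left recursion:
#   A -> Aα | β   becomes   A -> βA' ,  A' -> αA' | ε
EPS = "ε"

def eliminate_left_recursion(nt, productions):
    """`productions` is a list of right-hand sides, each a list of symbols."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]   # the α parts
    betas  = [p for p in productions if not p or p[0] != nt]    # the β parts
    if not alphas:
        return {nt: productions}           # no immediate left recursion
    new_nt = nt + "'"
    return {
        nt:     [beta + [new_nt] for beta in betas],                 # A  -> βA'
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[EPS]],    # A' -> αA' | ε
    }

# E -> E + T | T   becomes   E -> T E' ,  E' -> + T E' | ε
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], ['ε']]}
```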
### **Left Factoring**
**2.1. Definition:**
- Left factoring is required when two or more alternatives of a non-terminal begin with the same prefix, so a predictive parser cannot decide which alternative to choose from the next input symbol alone.
**2.2. Transformation:**
- For `A → αβ1 | αβ2`, factor out the common prefix:
- `A → αA'`
- `A' → β1 | β2`
**2.3. Example:**
- Original Grammar:
- `E → T + T | T * T`
- Transformed Grammar:
- `E → T E'`
- `E' → + T | * T`
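Likewise, a tiny Python sketch of left factoring two alternatives that share a common prefix (the helper name and encoding are illustrative assumptions):

```python
# Sketch of left factoring two alternatives that share a common prefix:
#   A -> αβ1 | αβ2   becomes   A -> αA' ,  A' -> β1 | β2
EPS = "ε"

def left_factor(nt, prod1, prod2):
    """prod1 / prod2 are right-hand sides given as lists of symbols."""
    k = 0                                   # length of the longest common prefix
    while k < min(len(prod1), len(prod2)) and prod1[k] == prod2[k]:
        k += 1
    if k == 0:
        return {nt: [prod1, prod2]}         # nothing to factor out
    new_nt = nt + "'"
    return {
        nt:     [prod1[:k] + [new_nt]],                      # A  -> αA'
        new_nt: [prod1[k:] or [EPS], prod2[k:] or [EPS]],    # A' -> β1 | β2
    }

# E -> T + T | T * T   becomes   E -> T E' ,  E' -> + T | * T
print(left_factor("E", ["T", "+", "T"], ["T", "*", "T"]))
# {'E': [['T', "E'"]], "E'": [['+', 'T'], ['*', 'T']]}
```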
### **Computing First and Follow Sets**
**3.1. Definition:**
- First and Follow sets summarize, for every non-terminal, which terminals can begin or follow its derivations; they are used to decide which production a predictive parser should apply.
**3.2. First Sets:**
- For each non-terminal, compute the set of terminals that can appear as the first symbol of any string derived from that non-terminal.
- Include terminal symbols that are directly derivable from the non-terminal.
- If a symbol in the production can derive ε (the empty string), also include the First set of the symbol that follows it.
**3.3. Follow Sets:**
- Start by adding the end-of-input marker (usually `$`) to the Follow set of the start symbol.
- For a production `A → αBβ`, add `First(β)` (except ε) to `Follow(B)`; if `β` can derive ε or is empty, also add `Follow(A)` to `Follow(B)`.
**3.4. Example:**
- `A → aB | c`
- `B → b | ε`
- **First Sets:**
- `First(A) = {a, c}`
- `First(B) = {b, ε}`
- **Follow Sets:**
- `Follow(A) = {$}`
- `Follow(B) = {$}` (since `B` occurs at the end of `A → aB`, `Follow(A)` is added to `Follow(B)`)
### **Removing Ambiguity**
**4.1. Definition:**
- An ambiguous grammar allows more than one parse tree for the same string, so the parser cannot decide which structure is intended.
**4.2. Approach:**
- Rewrite the grammar so that operator precedence and associativity are encoded in separate non-terminals (one per precedence level).
**4.3. Example:**
- Original Grammar:
- `E → E + E | E * E | id`
- Transformed Grammar:
- `E → E1`
- `E1 → E1 + T | T`
- `T → T * F | F`
- `F → id`
### **Ensuring a Suitable Grammar Form**
**5.1. Definition:**
- Ensure that the grammar is in a form that is suitable for top-down parsing, such as a **predictive parsing** (LL(1)) grammar: no ambiguity, no left recursion, and no common prefixes.
**5.2. Example:**
- Given `E → E + T | T` and `T → T * F | F`, both non-terminals are left recursive, so the transformations above must be applied before a predictive parser can be built.
### **Summary**
The pre-processing steps for top-down parsing are essential for ensuring
that the grammar is suitable for efficient and unambiguous parsing. These
steps include eliminating left recursion and left factoring, computing First and
Follow sets, removing ambiguities, and ensuring the grammar is in a suitable
form. By performing these transformations, the grammar becomes more
manageable and ready for top-down parsing techniques such as recursive
descent parsing or LL(1) parsing.
Construction of the LL(1) Parsing Table
For each production A -> α of the grammar:
1. For every terminal a in FIRST(α), add A -> α to M[A, a].
2. If Є is in FIRST(α), add A -> α to M[A, b] for every terminal b (including $) in FOLLOW(A).
All remaining cells of M are error entries.
Example:
Consider the grammar, with productions numbered 1 to 5:
1. S -> (L)
2. S -> a
3. L -> SL’
4. L’ -> ,SL’
5. L’ -> Є
The parsing table M (entries are production numbers):
          (     )     a     ,     $
S         1           2
L         3           3
L’              5           4
For any grammar, if the table M has a cell with more than one entry, then the grammar is not LL(1).
Example:
S → iEtSS’ | a
S’ → eS | ε
E → b
Here ε ∈ FIRST(S’) and FOLLOW(S’) = { e, $ }, so both S’ → eS and S’ → ε are placed in M[S’, e]. Since that cell has two entries, this (dangling-else) grammar is not LL(1).
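To see the conflict mechanically, here is a short Python sketch that fills the table M for this grammar, taking its FIRST and FOLLOW sets as given, and flags any cell with more than one entry (names and encoding are assumptions for this sketch):

```python
# Sketch: filling the LL(1) parsing table M for
#   S -> iEtSS' | a ,  S' -> eS | Є ,  E -> b
# The FIRST and FOLLOW sets of this grammar are taken as given.
EPS = "Є"
PRODUCTIONS = [
    ("S",  ["i", "E", "t", "S", "S'"]),
    ("S",  ["a"]),
    ("S'", ["e", "S"]),
    ("S'", [EPS]),
    ("E",  ["b"]),
]
FIRST  = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

def first_of_body(symbols):
    """FIRST set of a right-hand side, built from the FIRST sets above."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})          # terminals are their own FIRST set
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                        # every symbol of the body was nullable
    return result

table = {}                                 # (non-terminal, terminal) -> list of bodies
for nt, body in PRODUCTIONS:
    f = first_of_body(body)
    terminals = f - {EPS}
    if EPS in f:
        terminals |= FOLLOW[nt]            # A -> α also goes under every b in FOLLOW(A)
    for t in terminals:
        table.setdefault((nt, t), []).append(body)

for (nt, t), entries in sorted(table.items()):
    conflict = "  <-- multiple entries: not LL(1)" if len(entries) > 1 else ""
    print(f"M[{nt}, {t}] = {entries}{conflict}")
# M[S', e] receives both ['e', 'S'] and ['Є'], so the grammar is not LL(1).
```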
Important Notes
If a grammar contains left factoring (i.e., two productions of a non-terminal share a common prefix), then it cannot be LL(1).
Eg - S -> aS | a
---- both productions go into the same cell M[S, a], giving a multiply-defined entry.
Every regular grammar need not be LL(1), because a regular grammar may still
contain left factoring, left recursion, or ambiguity.