CD Module2 16 03 23 PDF

The document discusses different types of parsers used in compiler design. It describes that a parser breaks down source code into tokens during syntax analysis. There are three main stages - lexical analysis, syntactic analysis, and semantic analysis. Top-down and bottom-up parsers are discussed along with LR, LL, recursive descent, and predictive parsers. Context-free grammars are also summarized.

Uploaded by

Souvik Das
Copyright: © All Rights Reserved
Compiler Design (YCS6003)

Syntax Analysis
Soumya Majumdar
Parser
● A parser is a program that is usually part of a compiler.
● It receives input in the form of sequential source program instructions, interactive online
commands, markup tags or some other defined interface.
● Parsing happens during the analysis stage of compilation.
● In parsing, code is taken from the preprocessor, broken into smaller pieces and analyzed
so other software can understand it. The parser does this by building a data structure out
of the pieces of input.
● In this broad sense, parsing covers three stages, each handled by a separate component:
lexical analysis, syntactic analysis and semantic analysis.

Lexical Analysis
● The lexical analyzer, or scanner, takes code from the preprocessor and breaks it into smaller
pieces.
● It groups the input characters into sequences called lexemes, each of which
corresponds to a token.
● Tokens are the units of grammar in the programming language that the compiler understands.
● Lexical analyzers also discard white space and comments, and report lexical errors
such as illegal characters.
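The scanner step can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical toy language with identifiers, integer literals, a few operators and `#` comments; the token names and the `tokenize` helper are not from the slides.

```python
import re

# Hypothetical token classes for a toy language; names are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("COMMENT", r"#[^\n]*"),
    ("WS", r"\s+"),
    ("ERROR", r"."),          # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (token, lexeme) pairs; skip white space and comments."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("WS", "COMMENT"):
            continue  # discarded, never passed on to the syntax analyzer
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {m.start()}")
        tokens.append((kind, lexeme))
    return tokens
```

Calling `tokenize("x = 42 + y # note")` yields five (token, lexeme) pairs, with the white space and the comment dropped.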
Syntactic Analysis
● Checks the syntactical structure of the input using a data structure called a parse tree or
derivation tree.
● Syntax analyzer uses tokens to construct a parse tree that combines the predefined
grammar of the programming language with the tokens of the input string.
● Syntactic analyzer reports a syntax error if the syntax is incorrect.
Semantic Analysis
● Verifies the parse tree against a symbol table and determines whether it is semantically
consistent. This process is also known as context sensitive analysis.
● Includes data type checking, label checking and flow control checking.

Types of Parser
The examples that follow use this small grammar:

<sentence> ::= <subject> <verb> <object>
<subject> ::= <article> <noun>
<article> ::= the | a
<noun> ::= dog | cat | person
<verb> ::= pets | fed
<object> ::= <article> <noun>

● Top-down parsers. These start with the rule at the top, such as <sentence> ::=
<subject> <verb> <object>. Given the input string "The person fed a cat", the parser
looks at the first rule and works its way down through the rules, checking that the input
matches each one. Here the first two words, "The person", form a <subject> because they
match <article> <noun>, so the parser continues reading the sentence, looking for a <verb>.
● Bottom-up parsers. These start with the input words and work up toward the top rule.
Here the parser would first recognize "The" as an <article> and "person" as a <noun>,
combine them into a <subject>, and keep combining until the whole input is reduced to
<sentence>.
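The top-down walk over "The person fed a cat" can be sketched as a small recursive-descent parser for the <sentence> grammar. The function names and the nested-tuple parse tree are illustrative assumptions, not part of the slides.

```python
# Terminals of the example grammar, grouped by the rule that accepts them.
ARTICLES = {"the", "a"}
NOUNS = {"dog", "cat", "person"}
VERBS = {"pets", "fed"}

def parse_sentence(text):
    """Top-down parse of <sentence>; returns a nested-tuple parse tree."""
    words = text.lower().rstrip(".").split()
    pos = 0

    def expect(allowed, rule):
        # Match one terminal of the given rule at the current position.
        nonlocal pos
        if pos >= len(words) or words[pos] not in allowed:
            found = words[pos] if pos < len(words) else "end of input"
            raise SyntaxError(f"expected <{rule}>, found {found!r}")
        word = words[pos]
        pos += 1
        return (rule, word)

    def noun_phrase(rule):
        # <subject> and <object> have the same body: <article> <noun>
        return (rule, expect(ARTICLES, "article"), expect(NOUNS, "noun"))

    # Start at the top rule and work down, left to right.
    tree = ("sentence", noun_phrase("subject"), expect(VERBS, "verb"), noun_phrase("object"))
    if pos != len(words):
        raise SyntaxError(f"unexpected trailing word {words[pos]!r}")
    return tree
```

For the input "The person fed a cat" the result has a `subject` node over "the person", a `verb` node over "fed" and an `object` node over "a cat"; an input such as "the fed a cat" raises `SyntaxError` because no <noun> follows the article.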
Types of Parser in terms of derivation
● LL parsers: parse input from left to right and construct a leftmost derivation. At each step the
parser expands the leftmost nonterminal of the partially built parse tree, choosing a rule that
matches the input.
● LR parsers: parse input from left to right and construct a rightmost derivation in reverse.
Instead of expanding nonterminals, the parser reduces substrings that match the right-hand side
of a rule until only the start symbol remains.
Types of Parser
● Recursive descent parsers: These are top-down parsers built from mutually recursive procedures,
typically one per grammar rule. In the general form they backtrack: when a chosen alternative
fails to match, the parser returns to the decision point and tries the next alternative.
● Predictive parser: A predictive parser is a recursive descent parser that needs no backtracking
or backup. At each step, the rule to be expanded is chosen from the next terminal symbol of
the input.
● Earley parsers: These can parse any context-free grammar, whereas LL and LR parsers handle
only restricted subsets. In practice, programming-language grammars are usually written to fit
the LL or LR classes, so compilers seldom need this generality.
● Shift-reduce parsers: These shift input symbols onto a stack and reduce them. Whenever the
symbols on top of the stack match the right-hand side of a grammar rule, they are replaced by
the rule's left-hand side. This continues until the whole input has been reduced to the start
symbol.
Top-down parser
● When the parser starts constructing the parse tree from the start symbol and then tries to
transform the start symbol into the input, it is called top-down parsing.
● Recursive descent parsing: This is a common form of top-down parsing. It is called recursive
because it uses recursive procedures to process the input. Recursive descent parsing may
require backtracking.
● Backtracking: If one alternative of a production fails, the syntax analyzer restarts from the
same point using a different alternative of the same production. This technique may process
the input string more than once to determine the right production.

Recursive-descent parser
● Recursive descent is a top-down parsing technique that constructs the parse tree from the top
while reading the input from left to right.
● It parses the input recursively to build the parse tree, and may or may not require
back-tracking.
● A form of recursive-descent parsing that does not require any back-tracking is known as
predictive parsing.
● The technique is called recursive because its procedures call one another recursively,
mirroring the recursive structure of the context-free grammar.
Back-tracking
Grammar:

S → rXd | rZd
X → oa | ea
Z → ai

Input string: read
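For this grammar and input, the parser first tries X → oa, fails on the letter e, and has to back up and try X → ea. A minimal sketch of that behaviour follows; the `parse` and `accepts` helpers and the log format are illustrative assumptions.

```python
# Grammar from the slide; uppercase letters are nonterminals.
GRAMMAR = {
    "S": ["rXd", "rZd"],
    "X": ["oa", "ea"],
    "Z": ["ai"],
}

def parse(symbols, text, pos, log):
    """Try to match the symbol sequence against text[pos:].

    Returns the position after the match, or None on failure.
    """
    if not symbols:
        return pos
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative in order, backing up on failure.
        for alternative in GRAMMAR[head]:
            log.append(f"try {head} -> {alternative} at position {pos}")
            end = parse(alternative + rest, text, pos, log)
            if end is not None:
                return end
            log.append(f"backtrack from {head} -> {alternative}")
        return None
    # Terminal: must equal the current input character.
    if pos < len(text) and text[pos] == head:
        return parse(rest, text, pos + 1, log)
    return None

def accepts(text):
    """Return (accepted, log of the alternatives tried)."""
    log = []
    end = parse("S", text, 0, log)
    return end == len(text), log
```

For "read" the log records one backtrack, from X → oa to X → ea. For "raid" the parser abandons S → rXd entirely before succeeding with S → rZd, which shows how the same input prefix can be scanned more than once.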
Predictive Parser
● A predictive parser is a recursive descent parser that can predict which production to use
to expand the current nonterminal.
● A predictive parser does not suffer from backtracking.
● It uses a look-ahead pointer, which points to the next input symbols.
● To stay free of back-tracking, a predictive parser puts some constraints on the grammar and
accepts only a class of grammars known as LL(k) grammars.
● Table-driven predictive parsing uses a stack and a parsing table to parse the input and
generate a parse tree.
● Both the stack and the input contain an end symbol $, which marks the bottom of the stack
and the end of the input.
● The parser consults the parsing table to decide what to do for each combination of input
symbol and stack element.
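The stack-and-table mechanism can be sketched for the small <sentence> grammar from the "Types of Parser" slide, which happens to be LL(1). The table layout, the names and the returned list of productions are assumptions made for illustration.

```python
# LL(1) table for the <sentence> grammar: (nonterminal, lookahead) -> right-hand side.
ARTICLES, NOUNS, VERBS = ("the", "a"), ("dog", "cat", "person"), ("pets", "fed")
TABLE = {}
for art in ARTICLES:
    TABLE[("sentence", art)] = ["subject", "verb", "object"]
    TABLE[("subject", art)] = ["article", "noun"]
    TABLE[("object", art)] = ["article", "noun"]
    TABLE[("article", art)] = [art]
for noun in NOUNS:
    TABLE[("noun", noun)] = [noun]
for verb in VERBS:
    TABLE[("verb", verb)] = [verb]
NONTERMINALS = {"sentence", "subject", "verb", "object", "article", "noun"}

def predictive_parse(words):
    """Return the list of productions applied, or raise SyntaxError."""
    tokens = list(words) + ["$"]          # $ marks the end of the input
    stack = ["$", "sentence"]             # $ marks the bottom of the stack
    applied, i = [], 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))  # at most one entry: no backtracking
            if rhs is None:
                raise SyntaxError(f"no rule for <{top}> on {look!r}")
            applied.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == look:
            i += 1                        # terminal (or $) matched
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return applied
```

Each table cell holds at most one production, so the parser never has to undo a choice. The returned list is the leftmost derivation of the input.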
Recursive-descent vs Predictive Parser
● In recursive descent parsing, the parser may have more than one production to choose from for a
single instance of input.
● In a predictive parser, each step has at most one production to choose.
● There might be instances where no production matches the input string, causing the parse to
fail.
LL Parser
● An LL parser accepts an LL grammar.
● LL grammars are a restricted subset of context-free grammars; the restrictions make the
parser easy to implement.
● An LL parser can be implemented with either algorithm: recursive descent or
table-driven.
● An LL parser is denoted LL(k). The first L stands for parsing the input from left to right, the
second L stands for left-most derivation, and k is the number of look-ahead symbols.
Generally k = 1, in which case the parser is written LL(1).
Bottom-up Parser
● Bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the
start symbol.
● It starts from the leaf nodes of the tree and works upward until it reaches the root node.
Bottom-up Parser
Grammar:

1. S → S+S
2. S → S-S
3. S → (S)
4. S → a

Input string: a1-(a2+a3)
Parsing table: [figure: parsing table for the input a1-(a2+a3)]
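The shift and reduce steps for this input can be traced with a short sketch. It treats a1, a2 and a3 as three occurrences of the terminal a and uses a greedy strategy, reducing whenever the top of the stack matches a right-hand side. That strategy is an assumption that works for this input; the grammar is ambiguous, and a real LR parser would resolve such choices through its table.

```python
# Productions from the slide, numbered as in the grammar listing.
PRODUCTIONS = [
    (1, "S", ["S", "+", "S"]),
    (2, "S", ["S", "-", "S"]),
    (3, "S", ["(", "S", ")"]),
    (4, "S", ["a"]),
]

def shift_reduce(tokens):
    """Greedy shift-reduce recognizer; returns (accepted, trace)."""
    stack, trace = [], []
    remaining = list(tokens)

    def reduce_all():
        # Keep reducing while some right-hand side sits on top of the stack.
        reduced = True
        while reduced:
            reduced = False
            for number, head, body in PRODUCTIONS:
                if stack[-len(body):] == body:
                    del stack[-len(body):]
                    stack.append(head)
                    trace.append(f"reduce by rule {number}: {' '.join(stack)}")
                    reduced = True
                    break

    while remaining:
        stack.append(remaining.pop(0))            # shift the next input symbol
        trace.append(f"shift: {' '.join(stack)}")
        reduce_all()
    return stack == ["S"], trace
```

Running `shift_reduce(list("a-(a+a)"))` accepts the input after seven shifts and six reductions, and the last step is a reduction by rule 2 that leaves only S on the stack.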
Context free grammar
● A context free grammar (CFG) is a formal grammar which is used to generate all the possible
patterns of strings in a given formal language. It is defined as a 4-tuple:

G = (V, T, P, S)

● G is the grammar, which is used to generate the strings of a language.
● T is the finite set of terminal symbols, denoted by lower case letters.
● V is the finite set of non-terminal symbols, denoted by capital letters.
● P is the set of production rules, which replace a non-terminal symbol (on the left side of a
production) in a string with a sequence of terminals and non-terminals (on the right side of
the production).
● S is the start symbol, from which strings are derived.
Context free grammar
● A context free grammar consists of terminals, non-terminals, a start symbol and productions.
● Terminal: the basic symbols from which strings are formed (also called "token names").
● Non-terminal: syntactic variables that denote sets of strings. The sets of strings denoted by
non-terminals help define the language generated by the grammar.
● Start symbol: one non-terminal is distinguished as the start symbol.
● Production: specifies the manner in which terminals and non-terminals can be combined to form
strings.
● A production consists of (i) a non-terminal symbol (the head, or left side of the production),
(ii) the symbol -> or ::=, and (iii) a sequence of terminals and non-terminals (the body, or
right side of the production).
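The 4-tuple can be written down directly as a data structure. The sketch below uses the grammar S → 0S0 | 1S1 | 2 from Problem 3 as its example; the `Grammar` type and the helpers `is_well_formed` and `derive` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    V: frozenset   # non-terminals
    T: frozenset   # terminals
    P: tuple       # productions as (head, body) pairs
    S: str         # start symbol

def is_well_formed(g):
    """Check the constraints stated in the 4-tuple definition."""
    if g.V & g.T or g.S not in g.V:
        return False
    symbols = g.V | g.T
    return all(head in g.V and all(s in symbols for s in body) for head, body in g.P)

# Example instance: S → 0S0 | 1S1 | 2
G = Grammar(
    V=frozenset({"S"}),
    T=frozenset({"0", "1", "2"}),
    P=(("S", ("0", "S", "0")), ("S", ("1", "S", "1")), ("S", ("2",))),
    S="S",
)

def derive(g, choices):
    """Leftmost derivation: apply production g.P[i] for each index in choices."""
    form = [g.S]
    steps = ["".join(form)]
    for i in choices:
        head, body = g.P[i]
        at = next(k for k, s in enumerate(form) if s in g.V)  # leftmost non-terminal
        if form[at] != head:
            raise ValueError(f"production {i} does not apply to {form[at]}")
        form[at:at + 1] = body   # replace the head by the production body
        steps.append("".join(form))
    return steps
```

`derive(G, [1, 0, 2])` applies S → 1S1, then S → 0S0, then S → 2, producing the sentential forms S, 1S1, 10S01 and finally the string 10201.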
LR Parser
● An LR parser is a bottom-up parser for context-free grammars that is widely used by
programming language compilers and associated tools.
● An LR parser reads its input from left to right and produces a right-most derivation in reverse.
● It is called a bottom-up parser because it builds the parse tree up from the leaves, reducing
toward the top-level grammar productions.
● LR parsers are the most powerful of the deterministic parsers used in practice.
● LR(k) parser: the L refers to left-to-right scanning and the R to a rightmost derivation
in reverse.
● k refers to the number of lookahead input symbols used in making parsing decisions.
LR Parser advantages
● An LR parser can be constructed to recognise virtually all programming language constructs for
which a context free grammar can be written.
● LR parsing is the most general non-backtracking shift-reduce parsing method.
● An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right
scan of the input.

Disadvantage: it is too much work to construct an LR parser by hand for a typical programming
language grammar.
LR(0) item
● An LR(0) item is a production of the grammar with exactly one dot on the right-hand side.
● For example, production T → T * F leads to four LR(0) items:

T → ⋅ T * F
T → T ⋅ * F
T → T * ⋅ F
T → T * F ⋅

● What is to the left of the dot has just been read, and the parser is ready to read the remainder,
after the dot.
● Two LR(0) items that come from the same production but have the dot in different places are
considered different LR(0) items.
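Enumerating the items of a production is mechanical: place the dot at every position from before the first symbol to after the last. A minimal sketch, with `lr0_items` as a hypothetical helper name:

```python
def lr0_items(head, body):
    """Return every LR(0) item of head → body, one per dot position."""
    items = []
    for dot in range(len(body) + 1):
        # Symbols before the dot have been read; those after are still expected.
        rhs = list(body[:dot]) + ["⋅"] + list(body[dot:])
        items.append(f"{head} → {' '.join(rhs)}")
    return items
```

A production with n symbols on its right-hand side therefore yields n + 1 items, which gives the four items listed for T → T * F.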
Closure of LR(0) item
Let S be a set of LR(0) items. The following rules tell how to build closure(S), the closure of S. We
keep adding LR(0) items until there are no more to add.

● All members of S are in closure(S).
● Suppose closure(S) contains an item A → α ⋅ B β, where B is a nonterminal. Find all productions
B → γ1, …, B → γn with B on the left-hand side. Add the LR(0) items B → ⋅ γ1, …, B → ⋅ γn to
closure(S).

For example, let's take the closure of the set {E → E + ⋅ T}.

Since there is an item with a dot immediately before nonterminal T, we add T → ⋅ F and T → ⋅ T * F.
The set now contains the following LR(0) items.

E → E + ⋅ T
T → ⋅ F
T → ⋅ T * F

Now there is an item in the set with a dot immediately before F. So we add items F → ⋅ n and
F → ⋅ ( E ). The set now contains the following items.

E → E + ⋅ T
T → ⋅ F
T → ⋅ T * F
F → ⋅ n
F → ⋅ ( E )

No more LR(0) items need to be added, so the closure is finished.
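The closure rules translate into a small worklist loop. The sketch below represents an item as a (head, body, dot position) triple, which is an illustrative encoding. It lists only the productions that appear in the example; any further E-productions of the full grammar are not shown on the slides and would not change this particular closure.

```python
# Productions used in the example (other E-productions are not shown in the slides).
GRAMMAR = {
    "E": [("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("n",), ("(", "E", ")")],
}

def closure(items):
    """items: set of (head, body, dot) triples; returns their closure as a set."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:    # dot sits before a nonterminal B
            for gamma in GRAMMAR[body[dot]]:
                new_item = (body[dot], gamma, 0)         # B → ⋅γ
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result
```

Starting from the single item E → E + ⋅ T, encoded as `("E", ("E", "+", "T"), 2)`, the loop adds the two T items and then the two F items, and stops with the same five items as the worked example.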


Closure of LR(0) item
What is the point of the closure?

● LR(0) item E → E + ⋅ T indicates that the parser has just finished reading an expression
followed by a + sign. In fact, E + are the top two symbols on the stack.
● Now, the parser is looking to see if there is a T next. (It does not predict that there is a T next. It
is just considering that as a possibility.)
● But that means it should be looking for something that is the right-hand side of a production for
T. So we add items for T with the dot at the beginning.
Problem 1
Consider the following grammar:

E → E – E
E → E x E
E → id

Parse the input string id – id x id using a shift-reduce parser.

Problem 2
Consider the following grammar:

S → ( L ) | a
L → L , S | S

Parse the input string ( a , ( a , a ) ) using a shift-reduce parser.

Problem 3
Considering the string "10201", design a shift-reduce parser for the following grammar:

S → 0S0 | 1S1 | 2
