
Unit – I

Formal Language and Regular Expressions

Finite State Machine:

Finite-state machines provide a simple computational model with many applications. Finite-state
machines are also called finite-state automata (singular: automaton) or just finite automata.
Definition: An FSM (Finite State Machine), also called a finite automaton, is a machine, or a mathematical
model of a machine, which can only reach a finite number of states and transitions. It is used in mathematical
problem analysis. Computation begins in the start state with an input string, and the machine changes to new
states depending on the transition function.

Examples of a FSM:

1. Counting to five
2. Getting up in the morning
3. A playing board
4. A traffic light
5. A vending machine

Components of a Finite State Machine

An FSM is a device consisting of three components:

1. an input tape,
2. a control circuit, and
3. a read head

which satisfy the following conditions:
1. The tape starts at the left end and extends to the right without an end.
2. The tape is divided into squares, each holding a symbol.
3. The tape has a read-only head.
4. The head moves one square to the right every time it reads a symbol. It never moves to the
left. When it sees no symbol, it stops and the automaton terminates its operation.
5. There is a control unit that determines the state of the automaton and also controls the movement of
the head.

Mathematical representation of a Finite State Machine


A finite automaton M is defined by a 5-tuple (Q, Σ, δ, q0, F), where
 Q is the finite non-empty set of states of M
 Σ is the set of symbols representing input to M
 δ : Q × Σ → Q is the transition function
 q0 ∈ Q is the start state of M
 F ⊆ Q is the set of final states of M

Symbols, Alphabets and Strings, Operations on Strings

Alphabet: An alphabet is a non-empty finite set. We normally use the symbols a, b, c…with or
without subscripts or 0, 1, 2, . . ., etc. for the elements of an alphabet.

String: A string over an alphabet Σ is a finite sequence of symbols of Σ.


Example 1. Let Σ = {a, b} be an alphabet; then aa, ab, bba, baaba, … are some examples of strings
over Σ.

 Since the empty sequence is a finite sequence, it is also a string. We use ε, to denote the
empty string.

 The set of all strings over an alphabet Σ is denoted by Σ∗.

For example, if Σ = {0, 1}, then Σ*= {ε, 0, 1, 00, 01, 10, 11, 000, 001, . . .}. Although the
set Σ* is infinite, it is a countable set. In fact, Σ∗ is countably infinite for any alphabet Σ.

Operations on Strings:

1. Concatenation:

Let x = a1a2 · · · an and y = b1b2 · · · bm be two strings.

The concatenation of the pair x, y denoted by xy is the string a1a2 · · · anb1b2 · · · bm

 Concatenation is associative, i.e., w1(w2w3) = (w1w2)w3.

 Concatenation is usually not commutative, i.e., w1w2 ≠ w2w1


2. Length of a string: its length as a sequence (the number of symbols), denoted |w|; if w = abcd, then |w| = 4.

3. Substring: If w is a string, then v is a substring of w if there exist strings x and y such that w = xvy.

Here x is called a prefix, and y is called a suffix, of w.

4. Reversal: The reversal of a string w, denoted wR, is the string spelled backwards.

Example 1: If w = abbabab then wR = bababba

5. Kleene Closure: Let w be a string. w* is the set of strings obtained by applying any number of

concatenations of w with itself, including the empty string.

Example: if w = a is a string then a* = a0 ∪ a1 ∪ a2 ∪ a3 ∪ …, where

a0 = {ε}, a1 = {a}, a2 = {aa}, a3 = {aaa}, …

a* = a0 ∪ a1 ∪ a2 ∪ a3 ∪ … = {ε, a, aa, aaa, aaaa, …}


6. Kleene Plus: It is denoted by w+ and defined as w+ = w* − {ε}

a+ = a1 ∪ a2 ∪ a3 ∪ … = {a, aa, aaa, aaaa, …}
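These string operations are easy to experiment with. The following is a minimal Python sketch (the function names are our own, not from the text); since w* is an infinite set, the Kleene star is enumerated only up to a bound:

def concat(x, y):
    return x + y                    # concatenation: xy

def reversal(w):
    return w[::-1]                  # wR: the string spelled backwards

def kleene_star(w, max_copies=4):
    # w* is infinite; enumerate only up to max_copies concatenations.
    # n = 0 contributes the empty string ε.
    return {w * n for n in range(max_copies + 1)}

assert concat("a1a2", "b1b2") == "a1a2b1b2"
assert reversal("abbabab") == "bababba"
assert kleene_star("a", 3) == {"", "a", "aa", "aaa"}
assert kleene_star("a", 3) - {""} == {"a", "aa", "aaa"}   # w+ = w* − {ε}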

Formal Languages, Operations on Languages

Formal Languages:

A formal language is an abstraction of the general characteristics of programming languages; it can
be defined as a set of strings over an alphabet Σ.
A language L is a possibly infinite set of strings over a finite alphabet Σ. It is denoted by L.

 L(M) is the notation for a language defined by a machine M. The machine M accepts a certain set
of strings, thus a language.

 L(G) is the notation for a language defined by a grammar G. The grammar G

generates a certain set of strings, thus a language.

 L(r) is the notation for a language defined by a regular expression r.

Operations on Languages:

1. Concatenation of Languages :

Given languages L1 and L2, we define their concatenation to be the language
L1 ◦ L2 = {xy | x ∈ L1, y ∈ L2}

Example:
• L1 = {hello} and L2 = {world}; then L1 ◦ L2 = {helloworld}

2. Kleene Closure:
We write Ln to denote the language obtained by concatenating n copies of L. More formally,
L0 = {ε} and
Ln = Ln−1 ◦ L, for n ≥ 1.

The Kleene star or Kleene closure of a language L, denoted by L*, is defined as L* = ∪n≥0 Ln.

Example:

i. The Kleene star of the language {01} is {ε, 01, 0101, 010101, …}

3. Kleene Plus: The positive closure of a language L, denoted by L+, is defined as L+ = ∪n≥1 Ln.

4. Union: A string x ∈ L1 ∪ L2 iff x ∈ L1 or x ∈ L2.

Example: L1 = {0, 11, 01, 011}, L2 = {1, 01, 110}; then L1 ∪ L2 = {0, 1, 11, 01, 011, 110}

5. Intersection: A string x ∈ L1 ∩ L2 iff x ∈ L1 and x ∈ L2.

Example: L1 = {0, 11, 01, 011}, L2 = {1, 01, 110}; then L1 ∩ L2 = {01}

6. Relative Complement: Given some alphabet Σ, for any two languages S, T over Σ, the
difference S − T of S and T is the language S − T = {w ∈ Σ* | w ∈ S and w ∉ T}.

A special case of the difference is obtained when S = Σ*, in which case we define the
complement L′ of a language L as L′ = {w ∈ Σ* | w ∉ L}.

7. Reversal of a Language: Given a language L over Σ, we define the reverse of L as LR ={w R | w ∈ L}.
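For finite languages, all of these operations map directly onto Python set operations. A minimal illustrative sketch, using the example sets above:

L1 = {"0", "11", "01", "011"}
L2 = {"1", "01", "110"}

concatenation = {x + y for x in L1 for y in L2}   # L1 ◦ L2 = {xy | x ∈ L1, y ∈ L2}
union         = L1 | L2
intersection  = L1 & L2
difference    = L1 - L2                           # {w | w ∈ L1 and w ∉ L2}
reverse       = {w[::-1] for w in L1}             # L^R = {wR | w ∈ L}

assert intersection == {"01"}
assert union == {"0", "1", "11", "01", "011", "110"}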
Regular Language

Regular Language: The regular languages are defined as follows. Given a finite alphabet Σ:
1. ∅ is a regular language.
2. For any string x ∈ Σ*, {x} is a regular language.
3. A language L is regular if there exists an FSA M such that L(M) = L.
4. A language L is regular if there exists a regular expression r such that L(r) = L.
These languages are accepted by DFAs and NFAs.

Closure Properties:

There are a variety of operations which preserve regularity, i.e., the universe of regular languages is closed
under these operations.
1. Regular languages are closed under union ∪, concatenation ◦ and closure *.

L1 ∪ L2 = L(R1 + R2) ⇒ L1 ∪ L2 is regular.

L1 ◦ L2 = L(R1R2) ⇒ L1 ◦ L2 is regular.

L1* = L(R1*) ⇒ L1* is regular.

2. Regular languages are closed under complementation, i.e., if L is regular then L′ =
Σ* \ L is also regular.
3. Regular languages are closed under intersection, i.e., if L1 and L2 are regular then
L1 ∩ L2 is also regular.
4. Regular languages are closed under homomorphism, i.e., if L is a regular language
and h is a homomorphism, then h(L) is also regular.
5. Regular languages are closed under the difference of two languages, i.e., L1 − L2.
6. The reverse LR of L is also regular.

Note: To prove that a given language is not regular, we use

1. Pumping Lemma
2. Myhill-Nerode Theorem

Regular languages, Regular expressions

Regular Language

The languages accepted by FA are regular languages and these languages are easily described by
simple expressions called regular expressions.

For any regular expressions r and s over Σ corresponding to the languages Lr and Ls respectively, each
of the following is a regular expression corresponding to the language indicated.
 (rs) corresponding to the language LrLs
 (r + s) corresponding to the language Lr ∪ Ls
 r* corresponding to the language Lr*

Some examples of regular expression are

1. L(01) = {01}
2. L(01 + 0) = {01, 0}
3. L(0(1 + 0)) = {01, 00}
4. L(0*) = {ε, 0, 00, 000, …}
5. L((0 + 10)*(ε + 1)) = all strings of 0's and 1's without two consecutive 1's.
 If L1 and L2 are regular languages in Σ*, then L1 ∪ L2, L1 ∩ L2, L1 − L2 and L1′ (the complement of
L1) are all regular languages.
 Pumping lemma is a useful tool to prove that a certain language is not regular.

Regular Expression

Regular expressions are a means of representing certain sets of strings in an algebraic fashion. A regular
expression over the alphabet Σ is defined as follows:

 ϕ is a regular expression corresponding to the empty language ϕ.


 ε is a regular expression corresponding to the language {ε}.
 For each symbol a ∈Σ a is a regular expression corresponding to the language {a}.

Regular Set

A set represented by a regular expression is called a regular set, e.g., if Σ = {a, b} is an alphabet, then
the set of all strings over {a, b}, denoted by the regular expression (a + b)*, is a regular set.


Identities for Regular Expressions

The following points are the some identities for regular expressions.

 ϕ + R = R + ϕ = R
 εR = Rε = R
 R + R = R, where R is a regular expression.
 (R*)* = R*
 ϕR = Rϕ = ϕ
 ε* = ε and ϕ* = ε
 RR* = R*R = R+
 R*R* = R*
 (P + Q)* = (P*Q*)* = (P* + Q*)*, where P and Q are regular expressions.
 R(P + Q) = RP + RQ and (P + Q)R = PR + QR
 P(QP)* = (PQ)*P

Properties of Regular Language

Regular languages are closed under following properties

1. Union
2. Concatenation
3. Kleene closure
4. Complementation
5. Transpose
6. Intersection
Finite Automata, Deterministic Finite Automata(DFA)

Automata (singular: automaton) are a particularly simple, but useful, model of computation. They were
initially proposed as a simple model for the behavior of neurons.
States, Transitions and Finite-State Transition System:
Let us first give some intuitive idea about a state of a system and state transitions before describing finite
automata. Informally, a state of a system is an instantaneous description of that system which gives all
relevant information necessary to determine how the system can evolve from that point on.
Transitions are changes of state that can occur spontaneously or in response to inputs to the states.
Though transitions usually take time, we assume that state transitions are instantaneous (which is an
abstraction). Some examples of state transition systems are digital systems, vending machines, etc. A
system containing only a finite number of states and transitions among them is called a finite-state
transition system. Finite-state transition systems can be modeled abstractly by a mathematical model
called a finite automaton.
Deterministic Finite (-state) Automata
Informally, a DFA (Deterministic Finite State Automaton) is a simple machine that reads an input string,
one symbol at a time, and then, after the input has been completely read, decides whether to accept or
reject the input. As the symbols are read from the tape, the automaton can change its state to reflect how
it reacts to what it has seen so far. A machine for which a deterministic code can be formulated, with only
one unique way to formulate that code, is called a deterministic finite automaton.
Thus, a DFA conceptually consists of 3 parts:
1. A tape to hold the input string. The tape is divided into a finite number of cells. Each cell
holds a symbol from Σ.
2. A tape head for reading symbols from the tape.
3. A control, which itself consists of 3 things:

 a finite number of states that the machine is allowed to be in (zero or more states
are designated as accept or final states),
 a current state, initially set to a start state,
 a state transition function for changing the current state.
Deterministic Finite State Automaton: A Deterministic Finite State Automaton (DFA) is

a 5-tuple (Q, Σ, δ, q0, F):
• Q is a finite set of states.
• Σ is a finite set of input symbols, the alphabet.
• δ : Q × Σ → Q is the "next state" transition function (which is total). Intuitively, δ is a function that tells
which state to move to in response to an input, i.e., if M is in state q and sees input a, it moves to state
δ(q, a).
• q0 ∈ Q is the start state.
• F ⊆ Q is the set of accept or final states.
Design of DFAs

Language Accepted or Recognized by a DFA :


The language accepted or recognized by a DFA M is the set of all strings accepted by M, and is denoted by
L(M) = {w ∈ Σ* | M accepts w}.
The notion of acceptance can be made more precise by extending the transition function.

Extended transition function:

Extend δ (which is a function on symbols) to a function δ̂ on strings, i.e. δ̂ : Q × Σ* → Q.

That is, δ̂(q, w) is the state the automaton reaches when it starts from the state q and finishes processing
the string w.
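As a minimal sketch (the class and attribute names below are our own, not from the text), the 5-tuple, the extended transition function, and acceptance can be coded directly:

class DFA:
    def __init__(self, Q, sigma, delta, q0, F):
        self.Q, self.sigma, self.delta = Q, sigma, delta
        self.q0, self.F = q0, F

    def extended_delta(self, q, w):
        # delta-hat: apply delta symbol by symbol along the string w
        for a in w:
            q = self.delta[(q, a)]
        return q

    def accepts(self, w):
        # w ∈ L(M) iff delta-hat(q0, w) lands in a final state
        return self.extended_delta(self.q0, w) in self.F

# Example: strings over {0, 1} with an even number of 1s.
even_ones = DFA(
    Q={"e", "o"}, sigma={"0", "1"},
    delta={("e", "0"): "e", ("e", "1"): "o",
           ("o", "0"): "o", ("o", "1"): "e"},
    q0="e", F={"e"})
assert even_ones.accepts("1010") and not even_ones.accepts("10")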

Non Deterministic Finite Automata (NFA)

Non-Deterministic Finite Automata


Nondeterminism is an important abstraction in computer science. The importance of nondeterminism is found
in the design of algorithms. For example, there are many problems with efficient nondeterministic
solutions but no known efficient deterministic solutions (travelling salesman, Hamiltonian cycle,
clique, etc.). The behaviour of a process in a distributed system is also a good example of a nondeterministic
situation, because the behaviour of a process might depend on messages from other processes that
might arrive at arbitrary times with arbitrary contents. It is often easier to construct and comprehend an NFA
than a DFA for a given regular language. The concept of NFA can also be used in proving many theorems
and results. Hence, it plays an important role in this subject.
In the context of FA, nondeterminism can be incorporated naturally. That is, an NFA is defined in
the same way as the DFA but with the following two exceptions:
• multiple next states.
• ε-transitions.
Multiple Next States:
• In contrast to a DFA, the next state is not necessarily uniquely determined by the current state and
input symbol in the case of an NFA. (Recall that, in a DFA there is exactly one start state and exactly
one transition out of every state for each symbol in Σ.)
• This means that, in a state q and with input symbol a, there could be one, more than one or zero
next states to go to, i.e. the value of δ(q, a) is a subset of Q. Thus

δ(q, a) = {q1, q2, …, qk}, which means that any one of q1, q2, …, qk could be the next

state.
Non-Deterministic Automata with Є-moves

ε-transitions:
In an ε-transition, the tape head doesn't do anything: it does not read and it does not move.
However, the state of the automaton can be changed, that is, it can go to zero, one

or more states. This is written formally as δ(q, ε) = {q1, q2, …, qk}, implying that the next state
could be any one of q1, q2, …, qk without consuming the next input symbol.
Formal definition of NFA:

Formally, an NFA is a quintuple (Q, Σ, δ, q0, F) where Q, Σ, q0, and F bear the

same meaning as for a DFA, but the transition function δ is redefined as follows:

δ : Q × (Σ ∪ {ε}) → P(Q)

where P(Q) is the power set of Q, i.e. 2^Q.
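A minimal sketch of this definition (the representation is our own): delta maps a (state, symbol) pair to a set of states, with the empty string standing for ε, and acceptance tracks the whole set of states the NFA could be in:

EPS = ""   # ε is represented by the empty string

class NFA:
    def __init__(self, delta, q0, F):
        self.delta, self.q0, self.F = delta, q0, F   # delta: dict -> set of states

    def eps_closure(self, states):
        # all states reachable from `states` on zero or more ε-transitions
        stack, closure = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, EPS), set()):
                if r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure

    def accepts(self, w):
        current = self.eps_closure({self.q0})
        for a in w:
            moved = set().union(*(self.delta.get((q, a), set()) for q in current))
            current = self.eps_closure(moved)
        return bool(current & self.F)

# Example: a*b, with one ε-move from the start state.
n = NFA(delta={("s", EPS): {"p"},
               ("p", "a"): {"p"},
               ("p", "b"): {"q"}},
        q0="s", F={"q"})
assert n.accepts("aab") and not n.accepts("aa")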

Equivalence of NFA and DFA

It is worth noting that a DFA is a special type of NFA and hence the class of languages accepted by DFAs
is a subset of the class of languages accepted by NFAs. Surprisingly, these two classes are in fact equal.
NFAs appear to have more power than DFAs because of the generality enjoyed in terms of ε-transitions and
multiple next states. But they are no more powerful than DFAs in terms of the languages they accept.

Equivalence of NFA and DFA

ε-closure:

In the equivalent DFA, at every step, we need to modify the transition function to keep track of all
the states where the NFA can go on ε-transitions. This is done by replacing δ(q, a) by
ε-closure(δ(q, a)), i.e. we now compute, at every step, the set of states reachable on the input symbol
and then close that set under ε-transitions.

Besides this, the initial state of the DFA D has to be modified to keep track of all the states that can be
reached from the initial state of the NFA on zero or more ε-transitions.

This can be done by changing the initial state to ε-closure(q0).

It is clear that, at every step in the processing of an input string by the DFA D, it enters a state that
corresponds to the subset of states that the NFA N could be in at that particular point. This has been
proved in the construction of an equivalent NFA for any ε-NFA.

If the number of states in the NFA is n, then there are 2^n states in the DFA. That is, each state in the
DFA is a subset of the states of the NFA. But, it is important to note that most of these states are
inaccessible from the start state and hence can be removed from the DFA without changing the accepted
language. Thus, in fact, the number of states in the equivalent DFA is often much less than 2^n.
It is interesting to note that we can avoid encountering all those inaccessible or unnecessary states in
the equivalent DFA by performing the following two steps inductively.

1. If q0 is the start state of the NFA, then make ε-closure(q0) the start state of the equivalent
DFA. This is definitely the only accessible state initially.

2. If we have already computed a set T of states which are accessible, then for each input symbol a,
compute ε-closure(δ(T, a)), because these sets of states will also be accessible (see the sketch below).
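A sketch of this inductive construction, reusing the NFA representation from the previous sketch; each DFA state is a frozenset of NFA states, and only the accessible subsets are ever generated:

def nfa_to_dfa(n, alphabet):
    start = frozenset(n.eps_closure({n.q0}))
    dfa_delta, accepting = {}, set()
    worklist, seen = [start], {start}
    while worklist:
        T = worklist.pop()
        if T & n.F:                      # a subset containing a final state is final
            accepting.add(T)
        for a in alphabet:
            moved = set().union(*(n.delta.get((q, a), set()) for q in T))
            U = frozenset(n.eps_closure(moved))   # ε-closure(δ(T, a))
            dfa_delta[(T, a)] = U
            if U not in seen:            # only accessible subsets are explored
                seen.add(U)
                worklist.append(U)
    return start, dfa_delta, accepting

On an n-state NFA there are 2^n possible subsets, but the worklist only ever visits the accessible ones.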

Conversion of Finite Automata to Regular Expressions (Arden's Theorem)

Arden's Theorem

In order to find a regular expression for a finite automaton, we use Arden's Theorem along with
the properties of regular expressions.

Statement −

Let P and Q be two regular expressions.

If P does not contain the null string ε, then R = Q + RP has a unique solution, namely R = QP*

Proof −

R = Q + (Q + RP)P [substituting Q + RP for R]

= Q + QP + RP²

When we substitute the value of R recursively again and again, we get the following equation −

R = Q + QP + QP² + QP³ + …

R = Q(ε + P + P² + P³ + …)

R = QP* [as P* represents ε + P + P² + P³ + …]

Hence, proved.

Assumptions for Applying Arden's Theorem −

 The transition diagram must not have NULL transitions


 It must have only one initial state
Problem

Construct a regular expression corresponding to the automaton given below (transition diagram omitted) −

Solution

Here the initial state is q1 and the final state is also q1.

The equations for the three states q1, q2, and q3 are as follows −

q1 = q1a + q3a + ε (the ε is because q1 is the initial state)

q2 = q1b + q2b + q3b

q3 = q2a

Now, we will solve these three equations −

q2 = q1b + q2b + q3b

= q1b + q2b + (q2a)b (Substituting value of q3)

= q1b + q2(b + ab)

= q1b(b + ab)* (Applying Arden's Theorem)

q1 = q1a + q3a + ε

= q1a + q2aa + ε (Substituting value of q3)

= q1a + q1b(b + ab)*aa + ε (Substituting value of q2)

= q1(a + b(b + ab)*aa) + ε

= ε (a+ b(b + ab)*aa)*

= (a + b(b + ab)*aa)*

Hence, the regular expression is (a + b(b + ab)*aa)*.
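Arden's theorem can also be checked numerically on small finite cases. The following sketch (illustrative only, not a proof) picks a finite P not containing ε, iterates R := Q ∪ RP to a fixed point, and verifies that the result agrees with QP* on all strings up to a length bound:

MAXLEN = 6

def trunc(L):
    return {w for w in L if len(w) <= MAXLEN}

def cat(A, B):
    return trunc({x + y for x in A for y in B})

P, Q = {"a", "ab"}, {"b"}       # P must not contain the empty string

star = {""}                      # P* up to the length bound
while True:
    bigger = star | cat(star, P)
    if bigger == star:
        break
    star = bigger

R = set(Q)                       # solve R = Q + RP by iteration
while True:
    bigger = Q | cat(R, P)
    if bigger == R:
        break
    R = bigger

assert R == cat(Q, star)         # the unique solution is R = QP*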


Equivalence of NFA and Regular Expression

In general, any regular expression X can be converted to an equivalent NFA, called NFA_X,
containing a single start state and a single accepting state.

Case 1: sequence of symbols

The output is a sequence of states with transitions accepting those symbols; e.g., the
regular expression abba yields an NFA that is a chain of states reading a, b, b, a.

e.g., the regular expression ε yields an NFA with a single ε-transition from its start state to its accepting state.

Case 2: disjunction

If A and B are regular expressions whose equivalent NFAs are NFA_A and NFA_B, then we can
construct an NFA called NFA_A|B that accepts the language generated by A|B as follows:

create start and accepting states of NFA_A|B

use ε-transitions to connect the start state of NFA_A|B to the start states of NFA_A and NFA_B

change the start states of NFA_A and NFA_B so that they are no longer start states

use ε-transitions to connect the accepting states of NFA_A and NFA_B to the accepting state of NFA_A|B

change the accepting states of NFA_A and NFA_B so that they are no longer accepting states

e.g., the NFA recognizing the language generated by abba|bab.


Case 3: repetition

If A is a regular expression whose equivalent NFA is NFA_A, then we can construct an NFA called
NFA_A* which accepts the language generated by A* as follows:

create start and accepting states of NFA_A*

create an ε-transition from the start state to the accepting state of NFA_A*

create an ε-transition from the accepting state to the start state of NFA_A*

create an ε-transition from the start state of NFA_A* to the start state of NFA_A

change the start state of NFA_A so that it is not a start state

create ε-transitions from the accepting state of NFA_A to the accepting state of NFA_A*

change the accepting state of NFA_A so that it is not an accepting state

e.g., construct the NFA that recognizes the language generated by (abba|bab)*

Case 4: concatenation

If A and B are regular expressions whose equivalent NFAs are NFA_A and NFA_B, then we can
construct an NFA called NFA_AB that accepts the language generated by AB as follows:

create start and accepting states of NFA_AB

create an ε-transition from the start state of NFA_AB to the start state of NFA_A, and change the start
state of NFA_A so it is not a start state

create an ε-transition from the accepting state of NFA_A to the start state of NFA_B, and change the
accepting state of NFA_A so it is not an accepting state

change the start state of NFA_B so it is not a start state

create an ε-transition from the accepting state of NFA_B to the accepting state of NFA_AB, and change
the accepting state of NFA_B so it is not an accepting state

e.g., construct the NFA that recognizes (a|b)c:

first part: (a|b)

second part: c

overall NFA: (a|b)c
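The four cases can be sketched compactly in Python. Each fragment below carries a transition list, one start state and one accepting state, with the empty string standing for ε (the representation and helper names are our own):

import itertools

EPS = ""
_ids = itertools.count()

def new_state():
    return next(_ids)

def frag(trans, start, accept):
    # an NFA fragment: transition triples, one start state, one accepting state
    return {"trans": trans, "start": start, "accept": accept}

def symbol(a):                        # Case 1: a single symbol a
    s, f = new_state(), new_state()
    return frag([(s, a, f)], s, f)

def union(A, B):                      # Case 2: disjunction A|B
    s, f = new_state(), new_state()
    t = A["trans"] + B["trans"] + [
        (s, EPS, A["start"]), (s, EPS, B["start"]),
        (A["accept"], EPS, f), (B["accept"], EPS, f)]
    return frag(t, s, f)

def star(A):                          # Case 3: repetition A*
    s, f = new_state(), new_state()
    t = A["trans"] + [
        (s, EPS, f), (s, EPS, A["start"]),
        (A["accept"], EPS, f), (A["accept"], EPS, A["start"])]
    return frag(t, s, f)

def concat(A, B):                     # Case 4: concatenation AB
    s, f = new_state(), new_state()
    t = A["trans"] + B["trans"] + [
        (s, EPS, A["start"]), (A["accept"], EPS, B["start"]),
        (B["accept"], EPS, f)]
    return frag(t, s, f)

# the (a|b)c example from the text:
nfa_abc = concat(union(symbol("a"), symbol("b")), symbol("c"))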

Applications of Finite automata to Lexical Analysis

The concepts of finite automata are used in various fields. In the design of a compiler, they are used in the
lexical analyzer to produce tokens in the form of identifiers, keywords and constants from the input program.
In pattern recognition, they are used to search for keywords by using string-matching algorithms.

Lex converts regular expressions into transition diagrams, then translates the transition diagrams into C code to
recognize tokens in the input stream. There are many possible algorithms; the simplest is
RE → NFA → DFA.
Unit – II

Context free Grammars and Parsing

Types of Grammar, Noam Chomsky's Classification

In the literary sense of the term, grammars denote syntactical rules for conversation in natural languages.
Linguists have attempted to define grammars since the inception of natural languages like English,
Sanskrit, Mandarin, etc. The theory of formal languages finds its applicability extensively in the fields of
Computer Science. Noam Chomsky gave a mathematical model of grammar in 1956 which is effective
for writing computer languages.

Grammar

A grammar G can be formally written as a 4-tuple (N, T, S, P) where

 N or VN is a set of Non-terminal symbols

 T or Σ is a set of Terminal symbols

 S is the Start symbol, S ∈ N

 P is the set of Production rules for Terminals and Non-terminals

Example

Grammar G1 −

({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})

Here,

S, A, and B are Non-terminal symbols;

a and b are Terminal symbols

S is the Start symbol, S ∈ N

Productions, P : S → AB, A → a, B → b

Derivations from a Grammar

Strings may be derived from other strings using the productions in a grammar. If a grammar G has a
production α → β, we can say that xαy derives xβy in G. This derivation is written as −

xαy ⇒G xβy
Example

Let us consider the grammar −

G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )


Some of the strings that can be derived are −

S ⇒ aAb using production S → aAb

⇒ aaAbb using production aA → aaAb

⇒ aaaAbbb using production aA → aaAb

⇒ aaabbb using production A → ε

According to Noam Chomsky, there are four types of grammars − Type 0, Type 1, Type 2, and Type 3.
The following table shows how they differ from each other −

Grammar Type | Grammar Accepted | Language Accepted | Automaton
Type 0 | Unrestricted grammar | Recursively enumerable language | Turing machine
Type 1 | Context-sensitive grammar | Context-sensitive language | Linear-bounded automaton
Type 2 | Context-free grammar | Context-free language | Pushdown automaton
Type 3 | Regular grammar | Regular language | Finite state automaton

Each type is properly contained in the type above it: Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0.
Type - 3 Grammar

Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the
left-hand side and a right-hand side consisting of a single terminal or single terminal followed by a single
non-terminal.

The productions must be in the form X → a or X → aY

where X, Y ∈ N (Non terminal) and a ∈ T (Terminal)

Type - 2 Grammar

Type-2 grammars generate context-free languages.

The productions must be in the form A → γ

where A ∈ N (Non terminal)


and γ ∈ (T∪N)* (String of terminals and non-terminals).

The languages generated by these grammars are recognized by a non-deterministic pushdown

automaton.

Example

S→Xa
X → a
X → aX
X → abc
X→ε

Type - 1 Grammar

Type-1 grammars generate context-sensitive languages. The productions must be in the form

αAβ→αγβ

where A ∈ N (Non-terminal)

and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)

The strings α and β may be empty, but γ must be non-empty.

The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated
by these grammars are recognized by a linear bounded automaton.

Example

AB → AbBc
A → bcA
B→b
Type - 0 Grammar

Type-0 grammars generate recursively enumerable languages. The productions have no restrictions.
They include all phrase structure grammars, i.e., all formal grammars.

They generate the languages that are recognized by a Turing machine.

The productions can be in the form of α → β where α is a string of terminals and non-terminals with at
least one non-terminal and α cannot be null. β is a string of terminals and non-terminals.

Example

S → ACaB
Bc → acB
CB → DB
aD → Db

Derivation Tree
A derivation tree or parse tree is an ordered rooted tree that graphically represents how a
string is derived from a context-free grammar.

Representation Technique
 Root vertex − Must be labeled by the start symbol.
 Vertex − Labeled by a non-terminal symbol.
 Leaves − Labeled by a terminal symbol or ε.

If S → x1x2 … xn is a production rule in a CFG, then the parse tree / derivation tree has root S with children x1, x2, …, xn from left to right.

There are two different approaches to draw a derivation tree −


Top-down Approach −
 Starts with the starting symbol S
 Goes down to tree leaves using productions

Bottom-up Approach −
 Starts from tree leaves
 Proceeds upward to the root which is the starting symbol S

Derivation or Yield of a Tree


The derivation or the yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the
tree from left to right, ignoring the Nulls. However, if all the leaves are Null, derivation is Null.
Example
Let a CFG {N,T,P,S} be
N = {S}, T = {a, b}, Starting symbol = S, P = S → SS | aSb | ε
One string that can be derived from the above CFG is "abaabb":
S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
Partial Derivation Tree
A partial derivation tree is a sub-tree of a derivation tree/parse tree such that either all of its children are
in the sub-tree or none of them are in the sub-tree.
Example
If in any CFG the productions are −
S → AB, A → aaA | ε, B → Bb| ε

a partial derivation tree can be, for instance, the root S with its two children A and B not yet expanded.

If a partial derivation tree contains the root S, its yield is called a sentential form. The above sub-tree is
also in sentential form.

Leftmost and Rightmost Derivation of a String


 Leftmost derivation − A leftmost derivation is obtained by applying production to the leftmost variable
in each step.
 Rightmost derivation − A rightmost derivation is obtained by applying production to the rightmost
variable in each step.
Example
Let the set of production rules in a CFG be
X → X+X | X*X | X | a
over the alphabet {a, +, *}.
The leftmost derivation for the string "a+a*a" may be −
X → X+X → a+X → a + X*X → a+a*X → a+a*a
The rightmost derivation for the above string "a+a*a" may be −
X → X*X → X*a → X+X*a → X+a*a → a+a*a
Left and Right Recursive Grammars
In a context-free grammar G, if there is a production in the form X → Xa where X is a non-terminal and 'a' is a
string of terminals, it is called a left recursive production. A grammar having a left recursive production is
called a left recursive grammar.
And if in a context-free grammar G there is a production in the form X → aX where X is a non-terminal
and 'a' is a string of terminals, it is called a right recursive production. A grammar having a right recursive
production is called a right recursive grammar.
Ambiguous and Unambiguous Grammars

Eliminate the Ambiguity

We can remove the ambiguity by removing left recursion and by left factoring.
Left Recursion

A production of the context-free grammar G = (VN, Σ, P, S) is said to be left recursive if it is of the
form

A → Aα,

where A is a non-terminal and α ∈ (VN ∪ Σ)*

Removal of Left Recursion

Let the variable A have left recursive productions as follows:

(i) A → Aα1 | Aα2 | Aα3 | … | Aαn | β1 | β2 | β3 | … | βm

where β1, β2, …, βm do not begin with A. Then we replace the A-productions in the form

(ii) A → β1A1 | β2A1 | … | βmA1, where

A1 → α1A1 | α2A1 | α3A1 | … | αnA1 | ε

Left Factoring

Two or more productions of a variable A of the grammar G = (VN, Σ, P, S) are said to have left
factoring if the productions are of the form

A → αβ1 | αβ2 | … | αβn, where β1, …, βn ∈ (VN ∪ Σ)*

Removal of Left Factoring

Let the variable A have left factoring productions as follows:

A → αβ1 | αβ2 | … | αβn | y1 | y2 | y3 | … | ym

where β1, β2, …, βn have the common factor α, and y1, y2, …, ym do not contain α as a prefix.
Then we replace the productions in the form

A → αA1 | y1 | y2 | … | ym, where

A1 → β1 | β2 | … | βn
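The removal of immediate left recursion described above is mechanical enough to sketch directly. In the sketch below (the grammar encoding as symbol tuples is our own), () stands for the ε-production:

def remove_left_recursion(A, productions):
    # A → Aα1|…|Aαn | β1|…|βm  becomes  A → β1A'|…|βmA',  A' → α1A'|…|αnA'|ε
    A_prime = A + "'"
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == A]
    betas  = [rhs     for rhs in productions if not rhs or rhs[0] != A]
    if not alphas:
        return {A: productions}          # nothing to do
    return {
        A:       [beta + (A_prime,) for beta in betas],
        A_prime: [alpha + (A_prime,) for alpha in alphas] + [()],  # () is ε
    }

# E → E + T | T   becomes   E → T E',   E' → + T E' | ε
print(remove_left_recursion("E", [("E", "+", "T"), ("T",)]))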
Eliminate the Useless Productions/Symbols

The symbols that cannot be used in any productions due to their unavailability in the productions or
inability in deriving the terminals, are known as useless symbols.

e.g., consider the grammar G with the following production rules

S → aS | A | C

A → a

B → aa

C → aCb

Step 1 Generate the list of variables that produce terminal strings:

U = {A, B, S}

Because C does not derive any terminal string, its productions are deleted. Now the modified
productions are

S → aS | A

A → a

B → aa

Step 2 Identify the variable dependency graph of the remaining productions.

In this graph, the variable B is not reachable from S, so it is deleted as well. Now the productions are

S → aS | A

A → a
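Both steps above are fixed-point computations and can be sketched as follows (the grammar encoding, a dict from variable to a list of right-hand-side strings, is our own):

def generating(grammar, terminals):
    # Step 1: variables that can derive a string of terminals
    gen, changed = set(), True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            if A not in gen and any(all(s in terminals or s in gen for s in rhs)
                                    for rhs in rhss):
                gen.add(A)
                changed = True
    return gen

def reachable(grammar, start):
    # Step 2: variables reachable from the start symbol
    seen, stack = {start}, [start]
    while stack:
        A = stack.pop()
        for rhs in grammar.get(A, []):
            for s in rhs:
                if s in grammar and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen

G = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(generating(G, {"a", "b"}))    # {'A', 'B', 'S'} — C is not generating
print(reachable({"S": ["aS", "A"], "A": ["a"], "B": ["aa"]}, "S"))  # {'S', 'A'}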
Elimination of Є - Productions

Eliminate Null Productions

If any variable derives ε, then it is called a nullable variable.

e.g., if A → ε, then the variable A is said to be a nullable variable.

Step 1 Scan the nullable variables in the given production list.

Step 2 Find all productions which do not include null productions.
e.g., consider the CFG with the following productions:

S → ABaC

A → BC

B → b | ε

C → D | ε

D → d

Solve Step 1: find the nullable variables. Initially the set is empty:

N = {}

N = {B, C}

N = {A, B, C}

Because of the nullable variables B and C, A is also a nullable variable.

Step 3 {New productions with nullable variables dropped}

S → BaC | AaC | ABa | aC | Ba | Aa | a

A → B | C

B → b

C → D

D → d

The above productions are every possible combination except ε itself. Now combine this new
grammar with the original grammar, minus the null productions:

S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a
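Step 1, finding the nullable variables, is another fixed-point computation; a minimal sketch (encoding our own, with "" standing for ε):

def nullable_vars(grammar):
    nullable, changed = set(), True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            # rhs == "" also qualifies: all() over an empty sequence is True
            if A not in nullable and any(all(s in nullable for s in rhs)
                                         for rhs in rhss):
                nullable.add(A)
                changed = True
    return nullable

G = {"S": ["ABaC"], "A": ["BC"], "B": ["b", ""], "C": ["D", ""], "D": ["d"]}
print(nullable_vars(G))   # {'B', 'C', 'A'}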

Elimination of Unit Productions

Eliminate the Unit-Productions

A production of the type A → B, where A and B are variables, is called a unit production.

e.g., consider the grammar

S → Aa | B

A → a | bc | B

B → A | bb

Step 1 Using the unit productions, we create the dependency graph:

S ⇒ B, B ⇒ A, A ⇒ B

∵ S ⇒ B and B ⇒ A, ∴ S ⇒ A

Step 2 Now replace each unit production by the non-unit productions it can reach:

S → Aa gains bb | a | bc (via B and A)

B → bb gains a | bc (via A)

A → a | bc gains bb (via B)

Now the final grammar is

S → Aa | bb | a | bc

B → bb | a | bc

A → a | bc | bb

Applications of Context-Free Grammars

Parsers

Markup languages
THE ROLE OF PARSER
The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and verifies that the

string can be generated by the grammar for the source language. It reports any syntax errors in the program.
It also recovers from commonly occurring errors so that it can continue processing its input.
PARSING
It is the process of analyzing a continuous stream of input in order to determine its grammatical structure
with respect to a given formal grammar.
Parse tree:
Graphical representation of a derivation or deduction is called a parse tree. Each interior node of the parse
tree is a non-terminal; the children of the node can be terminals or nonterminals.
Types of parsing:
1. Top-down parsing
2. Bottom-up parsing
Top-down parsing: A parser can start with the start symbol and try to transform it to the
input string. Example: LL parsers.
Bottom-up parsing: A parser can start with the input and attempt to rewrite it into the start
symbol. Example: LR parsers.
Top-down parsing:
Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from
the root and creating the nodes of the parse tree in preorder. Equivalently, top-down parsing can be viewed as
finding a leftmost derivation for an input string.

Example
The sequence of trees built this way corresponds to a leftmost derivation of the input.
At each step of a top-down parse, the key problem is that of determining the production to be
applied for a nonterminal, say A. Once an A-production is chosen, the rest of the parsing process
consists of "matching" the terminal symbols in the production body with the input string.

Recursive-Descent Parsing
A recursive-descent parsing program consists of a set of procedures, one for each nonterminal.
Execution begins with the procedure for the start symbol, which halts and announces success if its
procedure body scans the entire input string. The pseudo-code for a typical nonterminal is
nondeterministic, since it begins by choosing the A-production to apply in a manner that is not specified.
General recursive descent may require backtracking; that is, it may require repeated scans over the input.
However, backtracking is rarely needed to parse programming language constructs, so backtracking parsers
are not seen frequently.
Example of recursive-descent parsing:
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. Hence,
elimination of left recursion must be done before parsing. Consider the grammar for arithmetic expressions:
E → E + T | T
T → T * F | F
F → (E) | id
After eliminating the left recursion the grammar becomes:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Now we can write the procedure for grammar as follows:
Recursive procedures:
Procedure E( )
begin
  T( );
  EPRIME( );
end
Procedure EPRIME( )
begin
  if input_symbol = '+' then
  begin
    ADVANCE( );
    T( );
    EPRIME( );
  end
end
Procedure T( )
begin
  F( );
  TPRIME( );
end
Procedure TPRIME( )
begin
  if input_symbol = '*' then
  begin
    ADVANCE( );
    F( );
    TPRIME( );
  end
end
Procedure F( )
begin
  if input_symbol = 'id' then ADVANCE( )
  else if input_symbol = '(' then
  begin
    ADVANCE( );
    E( );
    if input_symbol = ')' then ADVANCE( )
    else ERROR( )
  end
  else ERROR( )
end
Stack implementation:
To recognize input id+id*id (the second column shows the input remaining after each call):

PROCEDURE      REMAINING INPUT
E( )           id+id*id
T( )           id+id*id
F( )           id+id*id
ADVANCE( )     +id*id
TPRIME( )      +id*id
EPRIME( )      +id*id
ADVANCE( )     id*id
T( )           id*id
F( )           id*id
ADVANCE( )     *id
TPRIME( )      *id
ADVANCE( )     id
F( )           id
ADVANCE( )     (empty)
TPRIME( )      (empty)
LL(1) Parsing
The construction of a predictive parser is aided by two functions associated with a grammar G:
1. FIRST
2. FOLLOW
Rules for FIRST( ):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2…Yk is a production, then place a in FIRST(X)
if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), …, FIRST(Yi−1); that is,
Y1…Yi−1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
Rules for FOLLOW( ):
1. If S is the start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B).

Algorithm for construction of predictive parsing table:
Input: Grammar G
Output: Parsing table M
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $
is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
Example:
Consider the following grammar :
E→E+T|T
T→T*F|F
F→(E)|id
After eliminating left recursion the grammar is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
First( ) :
FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T') = { +, $, ) }
FOLLOW(F) = { +, *, $, ) }
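The FIRST/FOLLOW rules above are again fixed-point computations. A sketch for this grammar (the encoding, with RHSs as tuples of symbols and () as ε, is our own):

EPS = "ε"
G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
NT = set(G)

def first_of_seq(seq, FIRST):
    # FIRST of a string of symbols, per rule 4
    out = set()
    for s in seq:
        f = FIRST[s] if s in NT else {s}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)              # every symbol in seq can derive ε
    return out

FIRST = {A: set() for A in NT}
changed = True
while changed:
    changed = False
    for A, rhss in G.items():
        for rhs in rhss:
            add = first_of_seq(rhs, FIRST)
            if not add <= FIRST[A]:
                FIRST[A] |= add
                changed = True

FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")          # rule 1: $ follows the start symbol
changed = True
while changed:
    changed = False
    for A, rhss in G.items():
        for rhs in rhss:
            for i, B in enumerate(rhs):
                if B in NT:   # rules 2 and 3
                    rest = first_of_seq(rhs[i + 1:], FIRST)
                    add = (rest - {EPS}) | (FOLLOW[A] if EPS in rest else set())
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add
                        changed = True

print(FIRST["E"], FOLLOW["F"])   # {'(', 'id'}  {'+', '*', ')', '$'}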
Predictive parsing table: (the table M itself is constructed from these FIRST and FOLLOW sets by the
algorithm above). The moves made by the parser on the input id+id*id are:

Stack | Input | Output
$E | id+id*id$ |
$E'T | id+id*id$ | E → TE'
$E'T'F | id+id*id$ | T → FT'
$E'T'id | id+id*id$ | F → id
$E'T' | +id*id$ |
$E' | +id*id$ | T' → ε
$E'T+ | +id*id$ | E' → +TE'
$E'T | id*id$ |
$E'T'F | id*id$ | T → FT'
$E'T'id | id*id$ | F → id
$E'T' | *id$ |
$E'T'F* | *id$ | T' → *FT'
$E'T'F | id$ |
$E'T'id | id$ | F → id
$E'T' | $ |
$E' | $ | T' → ε
$ | $ | E' → ε

LL(1) grammar:
If the parsing table has no multiply-defined entries, i.e., each location holds at most one production,
the grammar is called an LL(1) grammar.
Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
FIRST(S) = { i, a }
FIRST(S') = { e, ε }
FIRST(E) = { b }
FOLLOW(S) = { $, e }
FOLLOW(S') = { $, e }
FOLLOW(E) = { t }

Since the entry M[S', e] contains more than one production (S' → eS and S' → ε), the grammar is not
an LL(1) grammar.
Actions performed in predictive parsing:
1. Shift
2. Reduce
3. Accept
4. Error
Implementation of predictive parser:
1. Elimination of left recursion, left factoring and ambiguous grammar.
2. Construct FIRST() and FOLLOW() for all non-terminals.
3. Construct predictive parsing table.
4. Parse the given input string using the stack and parsing table (a sketch of this loop follows).
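A sketch of step 4, the table-driven parsing loop. The table M below encodes the predictive parsing table for the expression grammar; its entries follow from the FIRST and FOLLOW sets computed earlier (the encoding is our own):

def ll1_parse(M, start, tokens, nonterminals):
    stack = ["$", start]
    i = 0
    while stack:
        top = stack.pop()
        if top in nonterminals:
            rhs = M.get((top, tokens[i]))   # expand by the table entry
            if rhs is None:
                return False                # blank entry: syntax error
            stack.extend(reversed(rhs))     # push RHS, leftmost symbol on top
        elif top == tokens[i]:
            i += 1                          # match a terminal (or the end-marker $)
        else:
            return False                    # terminal mismatch
    return i == len(tokens)

M = {("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
     ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
     ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
     ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
     ("T'", ")"): (), ("T'", "$"): (),
     ("F", "id"): ("id",), ("F", "("): ("(", "E", ")")}
assert ll1_parse(M, "E", ["id", "+", "id", "*", "id", "$"],
                 {"E", "E'", "T", "T'", "F"})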

Error Recovery in Predictive Parsing


This discussion of error recovery refers to the stack of a table-driven predictive parser, since it makes explicit
the terminals and nonterminals that the parser hopes to match with the remainder of the input; the techniques
can also be used with recursive-descent parsing.
An error is detected during predictive parsing when the terminal on top of the stack does not match the next
input symbol, or when nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is error
(i.e., the parsing-table entry is empty).

 Bottom-up parsers build parse trees from the leaves and work up to the root.

 Bottom-up syntax analysis is also known as shift-reduce parsing.

 An easy-to-implement shift-reduce parser is called operator precedence parsing.

 General method of shift-reduce parsing is called LR parsing.

 Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top). At each reduction step a particular substring
matching the right side of a production is replaced by the symbol on the left of that production, and if the
substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
Example 2.7.1
Consider the grammar
S → aABe
A → Abc | b
B→d
The sentence abbcde can be reduced to S by the following steps.
abbcde
aAbcde
aAde
aABe
S

Handles:
A handle of a string is a substring that matches the right side of a production, and whose reduction
to the nonterminal on the left side of the production represents one step along the reverse of a rightmost
derivation.

Handle Pruning:
A rightmost derivation in reverse can be obtained by handle pruning, i.e., start with a string of
terminals w that is to be parsed. If w is a sentence of the grammar at hand, then w = γn, where γn is the nth
right-sentential form of some as yet unknown rightmost derivation
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn−1 ⇒ γn = w.
Example grammar for right-sentential forms and handles:
E → E + E
E → E * E
E → (E)
E → id

Shift Reduce Parsing

A shift-reduce parser uses a parse stack which (conceptually) contains grammar symbols. During the
operation of the parser, symbols from the input are shifted onto the stack. If a prefix of the symbols on top of
the stack matches the RHS of a grammar rule which is the correct rule to use within the current context, then
the parser reduces the RHS of the rule to its LHS, replacing the RHS symbols on top of the stack with the
nonterminal occurring on the LHS of the rule. This shift-reduce process continues until the parser terminates,
reporting either success or failure. It terminates with success when the input is legal and is accepted by the
parser. It terminates with failure if an error is detected in the input.

The parser is nothing but a stack automaton which may be in one of several discrete states. A state is usually
represented simply as an integer. In reality, the parse stack contains states rather than grammar symbols.
However, since each state corresponds to a unique grammar symbol, the state stack can be mapped onto the
grammar symbol stack mentioned earlier. The operation of the parser is controlled by a couple of tables.

ACTION TABLE
The action table is a table with rows indexed by states and columns indexed by terminal symbols. When
the parser is in some state s and the current lookahead terminal is t, the action taken by the parser depends on
the contents of action[s][t], which can contain four different kinds of entries:

Shift s': shift state s' onto the parse stack.
Reduce r: reduce by rule r. This is explained in more detail below.
Accept: terminate the parse with success, accepting the input.
Error: signal a parse error.

GOTO TABLE

The goto table is a table with rows indexed by states and columns indexed by nonterminal symbols.
When the parser is in state s immediately after reducing by rule N, then the next state to enter is given by
goto[s][N].

Implementation of Shift-Reduce Parser:


 To implement shift-reduce parser, use a stack to hold grammar symbols and an input buffer to hold the
string w to be parsed.
 Use $ to mark the bottom of the stack and also the right end of the input.
 Initially the stack is empty, and the string w is on the input, as follows:
Stack Input
$ w$
 The parser operates by shifting zero or more input symbols onto the stack until a handle β is on top of the
stack.
 The parser then reduces β to the left side of the appropriate production.
 The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and
the input is empty:

Stack Input
$S $
 After entering this configuration, the parser halts and announces successful completion of parsing.
 There are four possible actions that a shift-reduce parser can make: 1) shift 2) reduce 3) accept 4) error.
 In a shift action, the next symbol is shifted onto the top of the stack.
 In a reduce action, the parser knows the right end of the handle is at the top of the stack. It must then
locate the left end of the handle within the stack and decide with what nonterminal to replace the handle.
 In an accept action, the parser announces successful completion of parsing.
 In an error action, the parser discovers that a syntax error has occurred and calls an error recovery
routine.
 Note: an important fact that justifies the use of a stack in shift-reduce parsing: the handle will always
appear on top of the stack, and never inside.

Example 2.8.1
Consider the grammar
E→E+E
E→E*E
E→(E)
E → id and the input string id1 + id2 * id3. Use the shift-reduce parser to check whether the input
string is accepted by the Grammar
Conflicts during shift-reduce parsing:

 There are CFGs for which shift-reduce parsing cannot be used.


 Every shift-reduce parser for such a grammar can reach a configuration in which the parser cannot
decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several
reductions to make (a reduce/reduce conflict), even knowing the entire stack contents and the next input
symbol.
Example of such grammars:
 These grammars are not LR(k) class grammars, refer them as no-LR grammars.
 The k in the LR(k) grammars refer to the number of symbols of lookahead on the input.
 Grammars used in compiling usually fall in the LR(1) class, with one symbol lookahead.
 An ambiguous grammar can never be LR.
Stmt → if expr then stmt
| if expr then stmt else stmt
| other
In this grammar a shift/reduce conflict occurs for some input strings, so this grammar is not an
LR(1) grammar. The current state of a shift-reduce parser is the state on top of the state stack.

The detailed operation of such a parser is as follows:


 Initialize the parse stack to contain a single state s0, where s0 is the distinguished initial state of the
parser.

 Use the state s on top of the parse stack and the current lookahead t to consult the action table entry
action[s][t]:

 If the action table entry is shift s' then push state s' onto the stack and advance the input so that the
lookahead is set to the next token.

 If the action table entry is reduce r and rule r has m symbols in its RHS, then pop m symbols off the parse
stack. Let s' be the state now revealed on top of the parse stack and N be the LHS nonterminal for rule r.
Then consult the goto table and push the state given by goto[s'][N] onto the stack. The lookahead token is
not changed by this step.

 If the action table entry is accept, then terminate the parse with success.
 If the action table entry is error, then signal an error
 Repeat step (2) until the parser terminates.

Model of LR Parsers

There are three types of LR parsers: canonical LR(k), simple LR(k), and lookahead LR(k) (abbreviated LR(k),
SLR(k), and LALR(k)). The k identifies the number of tokens of lookahead. We will usually only concern ourselves
with 0 or 1 tokens of lookahead, but the techniques do generalize to k > 1. The different classes of parsers all
operate the same way (as shown above, being driven by their action and goto tables), but they differ in how their
action and goto tables are constructed, and in the size of those tables.
We will consider LR(0) parsing first, which is the simplest of all the LR parsing methods. It is also the
weakest, and although of theoretical importance, it is not used much in practice because of its limitations. LR(0)
parses without using any lookahead at all.
Adding just one token of lookahead to get LR(1) vastly increases the parsing power. Very few grammars
can be parsed with LR(0), but most unambiguous CFGs can be parsed with LR(1). The drawback of adding the
lookahead is that the algorithm becomes somewhat more complex and the parsing table gets much, much bigger.
The full LR(1) parsing table for a typical programming language has many thousands of states compared to the
few hundred needed for LR (0). A compromise in the middle is found in the two variants SLR(1) and LALR(1)
which also use one token of lookahead but employ techniques to keep the table as small as LR(0). SLR(k) is an
improvement over LR(0) but much weaker than full LR(k) in terms of the number of grammars for which it is
applicable. LALR(k) parses a larger set of languages than SLR(k) but not quite as many as LR(k). LALR(1) is
the method used by the yacc parser generator.

Difference between LR and LL Parsers

LL | LR
Does a leftmost derivation. | Does a rightmost derivation in reverse.
Starts with the root nonterminal on the stack. | Ends with the root nonterminal on the stack.
Ends when the stack is empty. | Starts with an empty stack.
Uses the stack for designating what is still to be expected. | Uses the stack for designating what is already seen.
Builds the parse tree top-down. | Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack, and pushes the corresponding right-hand side. | Tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.
Expands the non-terminals. | Reduces the non-terminals.
Reads the terminals when it pops one off the stack. | Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree. | Post-order traversal of the parse tree.

Construction of SLR table


Table Construction
Given a CFG, we construct the finite automaton (DFA) that recognizes handles. After that, constructing
the ACTION and GOTO tables is straightforward. The states of the finite state machine correspond to item
sets. An item (or configuration) is a production with a dot (.) in the RHS that indicates how far we have
progressed using this rule to parse the input. For example, the item E ::= E + . E indicates that we are using the
rule E ::= E + E and that, using this rule, we have parsed E, we have seen a token +, and we are ready to parse
another E. Now, why do we need a set (an item set) for each state in the state machine? Because many
production rules may be applicable at this point; later, when we have scanned more input tokens, we will be able
to tell exactly which production to use. This is the time when we are ready to reduce by the chosen production.

For example, the closure of the item E ::= E +. T is


the set: E ::= E + . T
T ::= . T * F
T ::= . T / F
T ::= . F
F ::= . num
F ::= . id

The initial state of the DFA (state 0) is the closure of the item S ::= . a $, where S ::= a $ is the first rule.
In simple words, if there is an item X ::= a . s b in an item set, where s is a symbol (terminal or nonterminal), we
have a transition labelled by s to an item set that contains X ::= a s . b. But it is a little bit more complex
than that:

 If we have more than one item with a dot before the same symbol s, say X ::= a . s b and Y ::= c . s d,
then the new item set contains both X ::= a s . b and Y ::= c s . d.
 We need to take the closure of the new item set.
 We have to check whether this item set has appeared before, so that we don't create it
again.

As another example, the CFG:

1) S ::= E $
2) E ::= E + T
3) E ::= T
4) T ::= id
5) T ::= ( E )

has the following item sets:

I0: S ::= . E $
    E ::= . E + T
    E ::= . T
    T ::= . id
    T ::= . ( E )

I1: S ::= E . $
    E ::= E . + T

I2: S ::= E $ .

I3: E ::= E + . T
    T ::= . id
    T ::= . ( E )

I4: E ::= E + T .

I5: T ::= id .

I6: T ::= ( . E )
    E ::= . E + T
    E ::= . T
    T ::= . id
    T ::= . ( E )

I7: T ::= ( E . )
    E ::= E . + T

I8: T ::= ( E ) .

I9: E ::= T .

The ACTION and GOTO tables that correspond to this DFA are:

state | action: id   (    )    +    $   | goto: S  E  T
0     |         s5   s6                  |          1  9
1     |                          s3   s2 |
2     |         acc  acc  acc  acc  acc  |
3     |         s5   s6                  |             4
4     |         r2   r2   r2   r2   r2   |
5     |         r4   r4   r4   r4   r4   |
6     |         s5   s6                  |          7  9
7     |                    s8   s3       |
8     |         r5   r5   r5   r5   r5   |
9     |         r3   r3   r3   r3   r3   |

As another example, consider the following augmented grammar:

0) S' ::= S $
1) S ::= B B
2) B ::= a B
3) B ::= c

The state diagram is omitted here. The ACTION and GOTO parsing tables are:

state | action: a    c    $   | goto: S'  S  B
0     |         s5   s7       |           1  3
1     |                   s2  |
2     |         acc  acc  acc |
3     |         s5   s7       |              4
4     |         r1   r1   r1  |
5     |         s5   s7       |              6
6     |         r2   r2   r2  |
7     |         r3   r3   r3  |
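The detailed LR operation described earlier can be sketched directly against these tables. The encodings below ("s5" = shift 5, "r2" = reduce by rule 2, "acc" = accept) are our own shorthand:

RULES = {1: ("S", 2), 2: ("B", 2), 3: ("B", 1)}   # rule -> (LHS, RHS length)
ACTION = {(0, "a"): "s5", (0, "c"): "s7", (1, "$"): "s2",
          (2, "a"): "acc", (2, "c"): "acc", (2, "$"): "acc",
          (3, "a"): "s5", (3, "c"): "s7",
          (4, "a"): "r1", (4, "c"): "r1", (4, "$"): "r1",
          (5, "a"): "s5", (5, "c"): "s7",
          (6, "a"): "r2", (6, "c"): "r2", (6, "$"): "r2",
          (7, "a"): "r3", (7, "c"): "r3", (7, "$"): "r3"}
GOTO = {(0, "S"): 1, (0, "B"): 3, (3, "B"): 4, (5, "B"): 6}

def lr_parse(tokens):
    stack, i = [0], 0                       # the parse stack holds states
    while True:
        tok = tokens[i] if i < len(tokens) else "$"
        act = ACTION.get((stack[-1], tok))
        if act is None:
            return False                    # error entry
        if act == "acc":
            return True
        if act.startswith("s"):             # shift: push state, advance input
            stack.append(int(act[1:]))
            i += 1
        else:                               # reduce by rule r: pop |RHS| states,
            lhs, n = RULES[int(act[1:])]    # then consult the goto table
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

assert lr_parse(["a", "c", "c", "$"])       # a c c is a sentence of S ::= B B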

LR Parser(LR(1)) -CLR & LALR


Canonical LR(1) Items

Recall that in the SLR method, state i calls for reduction by A → α if the set of items Ii contains
item A → α· and a is in FOLLOW(A). In some situations, however, when state i appears on top of the
stack, the viable prefix βα on the stack is such that βA cannot be followed by a in any right-sentential form.
Thus, the reduction by A → α would be invalid on input a.

Formally, we say LR(1) item [A → α·β, a] is valid for a viable prefix γ if there is a derivation
S ⇒* δAω ⇒ δαβω, where

γ = δα, and either a is the first symbol of ω, or ω is ε and a is $.

Example: Let us consider the grammar

S → BB

B → aB | b

There is a rightmost derivation S ⇒* aaBab ⇒ aaaBab. We see that item [B → a·B, a] is valid for the viable
prefix γ = aaa by letting δ = aa, A = B, ω = ab, α = a, and β = B in the above definition. There is also a
rightmost derivation
S ⇒* BaB ⇒ BaaB.
From this derivation we see that item [B → a·B, $] is valid for the viable prefix Baa.

Constructing LR(1) Sets of Items

Input: An augmented grammar G'.

Output: The sets of LR(1) items that are valid for one or more viable prefixes of G'.
Example: Consider the following augmented grammar:
S' → S
S → CC
C → cC | d

The initial set of items is:

I0: S' → ·S, $
    S → ·CC, $
    C → ·cC, c | d
    C → ·d, c | d

We have the next sets of items as:

I1: S' → S·, $

I2: S → C·C, $
    C → ·cC, $
    C → ·d, $

I3: C → c·C, c | d
    C → ·cC, c | d
    C → ·d, c | d

I4: C → d·, c | d

I5: S → CC·, $

I6: C → c·C, $
    C → ·cC, $
    C → ·d, $

I7: C → d·, $

I8: C → cC·, c | d

I9: C → cC·, $

Construction of the canonical LR parsing table

Input: An augmented grammar G'.

Output: The canonical LR parsing table functions action and goto for G'.

Method:

 Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items for G'.

 State i of the parser is constructed from Ii. The parsing actions for state i are determined as
follows:
 If [A → α·aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j." Here, a is required
to be a terminal.
 If [A → α·, a] is in Ii, A ≠ S', then set action[i, a] to "reduce A → α."
 If [S' → S·, $] is in Ii, then set action[i, $] to "accept."

If a conflict results from the above rules, the grammar is said not to be LR(1), and the algorithm is said to fail.

 The goto transitions for state i are determined as follows: if goto(Ii, A) = Ij, then goto[i, A] = j.
 All entries not defined by rules (2) and (3) are made "error."
 The initial state of the parser is the one constructed from the set containing item [S' → ·S, $].
Example:
Grammar:
0. S' → S
1. S → CC
2. C → cC
3. C → d

States:

I0: S' → ·S, $
    S → ·CC, $
    C → ·cC, c/d
    C → ·d, c/d

I1: S' → S·, $

I2: S → C·C, $
    C → ·cC, $
    C → ·d, $

I3: C → c·C, c/d
    C → ·cC, c/d
    C → ·d, c/d

I4: C → d·, c/d

I5: S → CC·, $

I6: C → c·C, $
    C → ·cC, $
    C → ·d, $

I7: C → d·, $

I8: C → cC·, c/d

I9: C → cC·, $
Canonical Parsing Table:

STATE | action: c   d   $   | goto: S  C
0     |         s3  s4      |        1  2
1     |                 acc |
2     |         s6  s7      |           5
3     |         s3  s4      |           8
4     |         r3  r3      |
5     |                 r1  |
6     |         s6  s7      |           9
7     |                 r3  |
8     |         r2  r2      |
9     |                 r2  |
NOTE: For the goto graph, see the construction used in the canonical LR method.

LALR Parsing Table (obtained by merging the states of the canonical (CLR) table that have the same
items but different lookaheads, e.g., 3 with 6, 4 with 7, and 8 with 9):

STATE | action: c    d    $   | goto: S  C
0     |         s36  s47      |        1  2
1     |                   acc |
2     |         s36  s47      |           5
36    |         s36  s47      |           89
47    |         r3   r3   r3  |
5     |                   r1  |
89    |         r2   r2   r2  |
UNIT-3

Syntax-Directed Definitions

 A syntax-directed definition uses a CFG to specify the syntactic structure of the input.
 A syntax-directed definition associates a set of attributes with each grammar symbol.
 A syntax-directed definition associates a set of semantic rules with each production rule.

For example, let the grammar contains the production:

X→YZ

And also let that nodes X, Y and Z have associated attributes X.a, Y.a and Z.a respectively.

In the annotated parse tree, the node for X has children Y and Z, each node carrying its attribute a.

If the semantic rule


{X.a := Y.a + Z.a}
is associated with the production
X→YZ
then parser should add the attribute 'a' of node Y and attribute 'a' of node Z together and set the attribute 'a' of node X
to their sum.

Synthesized Attributes
An attribute is synthesized if its value at a parent node can be determined from attributes of its children.

Since in this example the value of node X can be determined from the 'a' attributes of the Y and Z nodes,
'a' is a synthesized attribute.
Synthesized attributes can be evaluated by a single bottom-up traversal of the parse tree.

Example 2.6: The following table shows the syntax-directed definition of an infix-to-postfix translator.

PRODUCTION | SEMANTIC RULE
expr → expr1 + term | expr.t := expr1.t || term.t || '+'
expr → expr1 − term | expr.t := expr1.t || term.t || '−'
expr → term | expr.t := term.t
term → 0 | term.t := '0'
term → 1 | term.t := '1'
⋮ | ⋮
term → 9 | term.t := '9'
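A sketch of this translation scheme as a small recursive translator; the left-recursive productions are implemented with a loop, which performs the same left-to-right evaluation of the semantic rules (the function names are our own):

def translate(s):
    pos = 0
    def term():
        nonlocal pos
        ch = s[pos]
        assert ch.isdigit(), "expected a digit"
        pos += 1
        return ch                      # term.t := the digit itself
    def expr():
        nonlocal pos
        t = term()                     # expr → term:   expr.t := term.t
        while pos < len(s) and s[pos] in "+-":
            op = s[pos]; pos += 1
            t = t + term() + op        # expr.t := expr1.t || term.t || op
        return t
    return expr()

assert translate("9-5+2") == "95-2+"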

Attribute Grammar
Attribute grammar is a special form of context-free grammar where some additional
information (attributes) is appended to one or more of its non-terminals in order to provide
context-sensitive information. Each attribute has a well-defined domain of values, such as
integer, float, character, string, and expressions.

Attribute grammar is a medium to provide semantics to the context-free grammar and it


can help specify the syntax and semantics of a programming language. Attribute grammar
(when viewed as a parse-tree) can pass values or information among the nodes of a tree.

Example:

E → E + T {E.value=E.value+T.value}

The right part of the CFG contains the semantic rules that specify how the grammar should
be interpreted. Here, the values of non-terminals E and T are added together and the result
is copied to the non-terminal E.

Semantic attributes may be assigned to their values from their domain at the time of
parsing and evaluated at the time of assignment or conditions. Based on the way the
attributes get their values, they can be broadly divided into two categories : synthesized
attributes and inherited attributes.

Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate,
assume the following production:

S → ABC

If S is taking values from its child nodes (A,B,C), then it is said to be a synthesized attribute,
as the values of ABC are synthesized to S.

As in our previous example (E → E + T), the parent node E gets its value from its child nodes.
Synthesized attributes never take values from their parent nodes or any sibling nodes.

Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or
siblings. As in the following production,

S → ABC

A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take
values from S, A, and B.

Expansion : When a non-terminal is expanded to terminals as per a grammatical rule

Reduction : When a terminal is reduced to its corresponding non-terminal according to


grammar rules. Syntax trees are parsed top-down and left to right. Whenever reduction
occurs, we apply its corresponding semantic rules (actions).

Semantic analysis uses Syntax Directed Translations to perform the above tasks.

Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax
analysis).

Semantic analyzer attaches attribute information with AST, which are called Attributed AST.

An attribute is a two-tuple, <attribute name, attribute value>.

For example:

int value = 5;

<type, "integer">
<presentvalue, "5">

For every production, we attach a semantic rule.

S-attributed SDT
If an SDT uses only synthesized attributes, it is called as S-attributed SDT. These attributes
are evaluated using S-attributed SDTs that have their semantic actions written after the
production (right hand side).

Attributes in S-attributed SDTs are evaluated in bottom-up parsing, as

the values of the parent nodes depend upon the values of the child nodes.

L-attributed SDT
This form of SDT uses both synthesized and inherited attributes with restriction of not taking
values from right siblings.

In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling nodes.
As in the following production

S → ABC

S can take values from A, B, and C (synthesized). A can take values from S only. B can take
values from S and A. C can get values from S, A, and B. No non-terminal can get values from
the sibling to its right.

Attributes in L-attributed SDTs are evaluated by depth-first and left-to-right parsing manner.
We may conclude that if a definition is S-attributed, then it is also L-attributed as L-
attributed definition encloses S-attributed definitions.

Intermediate code can be either language specific (e.g., Byte Code for Java) or language independent (three-address
code).

Three-Address Code

The intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the form of an
annotated syntax tree. That syntax tree can then be converted into a linear representation, e.g., postfix notation.
Intermediate code tends to be machine-independent code. Therefore, the code generator assumes an unlimited
number of memory locations (registers) to generate code.

For example:

a = b + c * d;

The intermediate code generator will try to divide this expression into sub-expressions and then generate the
corresponding code.

r1 = c * d;
r2 = b + r1;
a = r2;

r1 and r2 being used as registers in the target program.

A three-address code has at most three address locations to calculate the expression. A three-address code can be
represented in two forms : quadruples and triples.

Quadruples

Each instruction in quadruples presentation is divided into four fields: operator, arg1, arg2, and result. The above
example is represented below in quadruples format:

Op arg1 arg2 result


* c d r1
+ b r1 r2
+ r2 r1 r3
= r3 A
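A sketch of how such quadruples can be generated from an operator tree (the tree encoding and helper names are our own):

temp_count = 0
def new_temp():
    global temp_count
    temp_count += 1
    return f"r{temp_count}"

def gen_assign(target, tree):
    # tree is a nested tuple, e.g. ("+", "b", ("*", "c", "d")) for b + c * d
    quads = []
    def walk(node):
        if isinstance(node, str):
            return node                       # a plain name needs no code
        op, left, right = node
        l, r = walk(left), walk(right)        # sub-expressions first
        t = new_temp()
        quads.append((op, l, r, t))           # (op, arg1, arg2, result)
        return t
    quads.append(("=", walk(tree), None, target))
    return quads

for q in gen_assign("a", ("+", "b", ("*", "c", "d"))):
    print(q)
# ('*', 'c', 'd', 'r1')
# ('+', 'b', 'r1', 'r2')
# ('=', 'r2', None, 'a')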
Triples

Each instruction in triples presentation has three fields : op, arg1, and arg2.The results of respective sub-expressions
are denoted by the position of expression. Triples represent similarity with DAG and syntax tree. They are equivalent
to DAG while representing expressions.

Op | arg1 | arg2
*  | c    | d
+  | b    | (0)
=  | (1)  |

Triples face the problem of code immovability while optimization, as the results are positional and changing the
order or position of an expression may cause problems.

Indirect Triples

This representation is an enhancement over triples representation. It uses pointers instead of position to store results.
This enables the optimizers to freely re-position the sub-expression to produce an optimized code.
