ATC-21CS51 Module 1 To 5 Notes
V SEMESTER
Prepared By
Mr. Athmaranjan K
AUTOMATA THEORY & COMPILER DESIGN
21CS51
Automata Theory & Compiler Design 21CS51 Module 1
Imagine a Modern CPU. Every bit in a machine can only be in two states (0 or 1). Therefore, there
are a finite number of possible states. In addition, when considering the parts of a computer a CPU
interacts with, there are a finite number of possible inputs from the computer's mouse, keyboard,
hard disk, different slot cards, etc. As a result, one can conclude that a CPU can be modeled as a
finite-state machine.
The Turing machine can be thought of as a finite automaton or control unit equipped with an infinite
storage (memory). Its memory consists of an infinite one-dimensional array of cells.
Turing's machine is essentially an abstract model of modern-day computer execution and storage,
developed in order to provide a precise mathematical definition of an algorithm or mechanical
procedure.
Why to Study Theory of Computation:
Why do we need to study Automata Theory (Theory of Computation)?
Theory of computation lays a strong foundation for many abstract areas of computer science. TOC
teaches you the elementary ways in which a computer can be made to think. Implementations
come and go: today's programmers cannot read code from 50 years ago, and programmers from the
early days could never have imagined what a program of today would look like. In the face of that
kind of change, TOC is important because it studies the mathematical properties of problems and of
algorithms for solving problems, properties that depend neither on the details of today's technology
nor on the programming fashion of the early days. It is desirable to know which problems can be
algorithmically solved and which cannot. Understanding which problems can be algorithmically
solved is one of the main objectives of the theory of computation.
TOC provides a set of abstract structures that are useful for solving certain classes of
problems. These abstract structures can be implemented on whatever hardware/software
platform is available.
Using these abstract structures, the design effort required for an actual implementation can be
estimated.
Using TOC, problems are analyzed by finding the fundamental properties of the problems
themselves such as:
1. Is there any computational solution to the problem? If not, is there a restricted
but useful variation of the problem for which a solution does exist?
2. If a solution exists, can it be implemented using some fixed amount of
memory?
3. If a solution exists, how efficient is it? More specifically, how do its time and
space requirements grow as the size of the problem grows?
4. Are there groups of problems that are equivalent in the sense that if there is an
efficient solution to one member of the group there is an efficient solution to
all the others?
TOC plays an important role in compiler design, in switching theory, design and analysis of
digital circuits, etc.
Abstract Machine:
An abstract machine or abstract computer is a conceptual or theoretical model of a computer
hardware or software system; such machines are hypothetical computers that do not physically exist.
These machines have commonly encountered hardware features and concepts but avoid most of the
details that are found in real computers or machines.
Concatenation of strings: The concatenation of two strings s and t is the string formed by appending
the string t to the string s.
It is denoted by: s||t or st
Example: If the string s = good and the string t = bye then st = goodbye.
NOTE: 1. |xy| = |x| + |y|
2. The empty string ε is the identity for concatenation of strings: for all x, xε = εx = x.
3. Concatenation, as a function defined on strings, is associative: for all s, t, w,
(st)w = s(tw).
Replication: For each string w and each natural number k, the string w^k is defined as:
w^0 = ε
w^(k+1) = w^k w
For example: a^3 = aaa
(bye)^2 = byebye
a^2 b^3 = aabbb
Reversal of a string: The reversal of a string w is obtained by writing the symbols in reverse order;
it is denoted by w^R.
Example: w = 11001, w^R = 10011
NOTE: 1. If |w| = 0 then w^R = w = ε
2. If |w| ≥ 1, then there exist a ∈ Σ and u ∈ Σ* such that w = ua (i.e., the last character of w is a), and then w^R = a u^R.
3. If w and x are strings, then (wx)^R = x^R w^R
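The recursive definitions above translate directly into code. Below is a tiny sketch (my own illustration, not part of the notes) of replication and reversal over ordinary Python strings:

def rep(w, k):
    # w^0 = eps ; w^(k+1) = w^k w
    return "" if k == 0 else rep(w, k - 1) + w

def rev(w):
    # if w = ua then w^R = a u^R
    return w if w == "" else w[-1] + rev(w[:-1])

print(rep("bye", 2))   # byebye
print(rev("11001"))    # 10011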
Relations on Strings
Substring: A string s is a substring of a string t if s occurs contiguously as part of t.
Example: aaa is a substring of the string aaabbbaaa
aaaaaa is not a substring of the string aaabbbaaa
Proper Substring: A string s is a proper substring of a string t if s is a substring of t and
s ≠ t.
Example: aaabbbaaa is not a proper substring of the string aaabbbaaa
NOTE: 1. Every string is a substring (although not a proper substring) of itself.
2. The empty string ε is a substring of every string.
Prefix of a string: A prefix is a string of any number of leading symbols. A string s is a prefix of t
if there exists x ∈ Σ* such that t = sx.
Example: The prefixes of the string abba are: ε, a, ab, abb, abba.
Proper Prefix: A string s is a proper prefix of a string t if s is a prefix of t and s ≠ t.
Example: The proper prefixes of the string abba are: ε, a, ab, abb (abba itself is not a proper prefix).
NOTE: 1. Every string is a prefix (although not a proper prefix) of itself.
2. The empty string ε is a prefix of every string.
Suffix of a string: A suffix is a string of any number of trailing symbols. A string s is a suffix of t
if there exists x ∈ Σ* such that t = xs.
Language: A language is a (finite or infinite) set of strings, all of which are chosen from some Σ*,
where Σ is a particular finite alphabet.
Example: Σ = {a, b}
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, …}
Suppose the language contains the set of all strings of a's and b's with an equal number of each; it is given by:
L = {ε, ab, ba, aabb, baab, baba, …}
Powers of an alphabet: A power of an alphabet is the set of strings of a certain length k obtained
from an alphabet Σ. It is denoted by Σ^k.
Example: If Σ = {0, 1}, then Σ^0 = {ε}, Σ^1 = {0, 1}, Σ^2 = {00, 01, 10, 11}, Σ^3 = {000, 001,
010, 011, 100, 101, 110, 111} and so on.
The set of languages defined on Σ is P(Σ*), the power set of Σ*, i.e. the set of all subsets of Σ*. If Σ =
Ø then Σ* is {ε} and P(Σ*) is {Ø, {ε}}.
NOTE: 1. L = {} = Ø, the empty language, is a language over any alphabet.
2. L = {ε}, the language consisting of only the empty string, is also a language over any
alphabet.
Rules: For the above language, all the rule says is that any a's must come before all the b's (if any).
If there are no a's or no b's, then there are none that violate the rule. So the strings ε, a, aa, and bb
trivially satisfy the rule and are in L.
Example 2: Let L = {x : ∃y ∈ {a, b}* (x = ya)}. Give an English description of the language.
L = {a, aa, ba, aaa, baa, bbaa, …}
Language L contains strings of a's and b's ending with a.
Note: {a, b}* means all strings that can be formed by concatenating the symbols a and b zero or more
times.
Example 3: Let L = {x # y : x, y ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}* and, when x and y are viewed as the
decimal representations of natural numbers, square(x) = y}.
The strings 3#9 and 12#144 are in L.
Concatenation of Languages: Let L1 and L2 be two languages defined over some alphabet Σ. Then
their concatenation, written L1L2, is:
L1L2 = {w ∈ Σ* : ∃s ∈ L1 ∃t ∈ L2 (w = st)}
Example 1:
Let: L1 = {cat, dog, mouse, bird}
L2 = {bone, food}
L1L2 = {catbone, catfood, dogbone, dogfood, mousebone, mousefood, birdbone, birdfood}
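Since languages are just sets of strings, the definition above can be expressed directly as a set comprehension. A one-line sketch (my own, mirroring Example 1):

L1 = {"cat", "dog", "mouse", "bird"}
L2 = {"bone", "food"}
L1L2 = {s + t for s in L1 for t in L2}   # {w : w = st, s in L1, t in L2}
print(sorted(L1L2))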
Input tape: It is divided into a number of cells, each of which can hold one symbol.
Control unit: The machine has some finite set of states (q0, q1, q2, q3, q4, …), one of which is the start state (q0).
Based on the current input symbol, the state of the machine can change.
Output unit: The output may be accept or reject. When the end of the input is encountered, the control unit
is in either an accepting or a rejecting state.
Transition Table:
It is a tabular representation of the transition function δ. For the above transition diagram, δ is given
in the table, and the extended transition function δ* processes the string w = 01101 step by step:
δ*(q1, 0) = δ(q1, 0) = q1
δ*(q1, 01) = δ(δ*(q1, 0), 1) = δ(q1, 1) = q2
δ*(q1, 011) = δ(δ*(q1, 01), 1) = δ(q2, 1) = q2
δ*(q1, 0110) = δ(δ*(q1, 011), 0) = δ(q2, 0) = q1
δ*(q1, 01101) = δ(δ*(q1, 0110), 1) = δ(q1, 1) = q2
After reading the string w = 01101, the machine is in state q2, which is a final
state. So the string 01101 is accepted by the DFA.
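The trace above can be reproduced in a few lines of code. This is a minimal sketch (my own, assuming the transition function read off the trace: δ(q1,0)=q1, δ(q1,1)=q2, δ(q2,0)=q1, δ(q2,1)=q2, with q2 final):

delta = {("q1", "0"): "q1", ("q1", "1"): "q2",
         ("q2", "0"): "q1", ("q2", "1"): "q2"}

def run(start, w):
    state = start
    for a in w:                  # delta*(q, wa) = delta(delta*(q, w), a)
        state = delta[(state, a)]
    return state

print(run("q1", "01101"))        # q2, a final state, so 01101 is accepted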
DFA/DFSM Design Techniques
Pattern recognition problems:
i. Identify the minimum string.
ii. Construct a DFA for the minimum string using Σ.
iii. Identify the transitions which are not defined in the minimum-string DFA.
iv. Construct the complete DFA for the given alphabet by referring to the minimum-string DFA.
Draw a DFA or DFSM to accept the language containing strings of a's having at least one a.
Answer:
The language contains the minimum string of a single a.
Note: if the length of the minimum string is m, then naturally we need m+1 states to design the FA.
So the DFA corresponding to this minimum string is drawn first.
To reach state q1 from q0 we need one input a, and in state q1 we define input a; the resultant string aa
is also accepted in q1, as it is defined in the language.
i.e.: δ(q1, a) = q1
Therefore the DFA for the above problem is given by M = ({q0, q1}, {a}, δ, q0, {q1})
Draw a DFA or DFSM to accept the language containing strings of a's and b's having at least one a.
The language contains the minimum string of a single a, and there is no restriction on b, so it could be ε in
the minimum case or any number of b's.
So the DFA corresponding to this minimum string is drawn first.
Therefore the DFA for the above problem is given by M = ({q0, q1}, {a, b}, δ, q0, {q1})
Draw a DFA or DFSM to accept the language containing strings of a's and b's having exactly one a.
There is no restriction on the number of b's, but L contains only one a.
Therefore the DFA for the above problem is given by M = ({q0, q1}, {a, b}, δ, q0, {q1})
Therefore the DFA for the given problem (strings of a's and b's ending with abb) is M = ({q0, q1, q2, qf}, {a, b}, δ, q0, {qf})
where δ is as shown in the transition diagram.
All strings of a's and b's except those which end with abb (i.e., not ending with abb):
The answer is similar to that of the previous problem, except that the non-final states of the previous problem
become final states and the final state becomes non-final.
The DFA for the problem of strings not ending with abb is M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q0, q1,
q2})
Obtain a DFA/DFSM to accept the language L containing strings of a's and b's except those having
substring aab.
Note: The DFA design procedure is the same as that of the previous problem, except that the final state
becomes non-final and the non-final states become final.
DFA M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q0, q1, q2})
where δ is as shown in the transition diagram.
(Strings beginning with ab:) In state q0 on input b and in state q1 on input a, the machine enters the trap state;
a string starting with ab followed by any number of a's and b's is accepted in state qf.
DFA M = ({q0, q1, qf, qt}, {a, b}, δ, q0, {qf})
where δ is as shown in the transition diagram.
Obtain a DFA/DFSM to accept the language containing strings of 0's and 1's having exactly three
consecutive 0's.
The minimum string is 000 and its DFA is shown in the diagram.
Draw a DFA/DFSM to accept strings of a's and b's such that L = {awa | w ∈ (a+b)*}
OR
Show that the language L = {awa | w ∈ (a+b)*} is regular.
Note: A language is regular if it is accepted by a DFA.
That means if it is possible to design a DFA for the given language L = {awa | w ∈ (a+b)*}, then we say
that L is regular.
The minimum string for the language L = {awa} is aa, and its DFA is shown in the diagram.
Draw a DFA/DFSM to accept strings of 0's, 1's and 2's beginning with a 0 followed by an odd number
of 1's and ending with a 2.
The minimum string is 012 and its DFA is shown in the diagram.
DFA M = ({q0, q1, q2, qf, qt}, {0, 1, 2}, δ, q0, {qf}) where qt is the trap state and δ is as shown in the
transition diagram.
Draw a DFA/DFSM to accept strings of a's and b's with at most two consecutive b's.
The minimum string is bb and its DFA is shown in the diagram.
L = {ε, b, bb, ab, abb, bba, a, baa, aa, …}
DFA M = ({q0, q1, qf, qt}, {a, b}, δ, q0, {qf}) where qt is the trap state and δ is as shown in the
transition diagram.
Draw a DFA to accept strings of 0's and 1's starting with at least two 0's and ending with at least
two 1's.
The minimum string is 0011 and its DFA is shown in the diagram.
The DFA M = ({q0, q1, q2, q3, qf, qt}, {0, 1}, δ, q0, {qf}) where qt is the trap state and δ is as shown
in the transition diagram.
Draw a DFA to accept strings of a's and b's having not more than three a's.
OR
Draw a DFA to accept the language L = {w : Na(w) ≤ 3, w ∈ (a, b)*}
Minimum strings: L = {ε, a, aa, aaa, b, ba, bba, abbb, …}
The DFA M = ({q0, q1, q2, q3, qt}, {a, b}, δ, q0, {q0, q1, q2, q3}) where qt is the trap state and δ is as shown in the
transition diagram. (All of q0 to q3 are accepting, since any string with at most three a's is in L.)
i. A DFA to accept the strings of a's and b's starting with ab can be written as shown
below.
A DFA to accept the strings of a's and b's ending with ab can be written as shown below.
The two DFAs can be joined to accept the strings of a's and b's beginning with ab or ending with
ab or both, as shown below.
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q2, q5}) where δ is as
shown in the transition diagram.
ii. The set of strings with at least one a and exactly two b's.
The minimum strings may be abb or bab or bba, and the DFA is as shown below.
Therefore the DFA M = ({q0, q1, q2, q3, q4, q5, qt}, {a, b}, δ, q0, {q3}), where qt is the trap state and δ
is as shown in the transition diagram.
Obtain a DFA to accept the set of all strings that begin with 01 and end with 11.
Minimum-length string = 011 and its DFA is shown in the diagram.
The DFA M = ({q0, q1, q2, q3, q4, qt}, {0, 1}, δ, q0, {q3}) where qt is the trap state and δ is as shown
in the transition diagram.
Obtain a DFA to accept the set of all strings that begin with 01 and end with 10.
Minimum-length string = 010 and its DFA is shown in the diagram.
The DFA M = ({q0, q1, q2, q3, q4, qt}, {0, 1}, δ, q0, {q3}) where qt is the trap state and δ is as shown
in the transition diagram.
Obtain a DFA to accept the language containing strings of 0's and 1's with an odd number of 1's followed
by an even number of 0's.
The DFA M = ({q0, q1, q2, q3, qt}, {0, 1}, δ, q0, {q1, q3}) where δ is as shown in the transition diagram.
Obtain a DFA to accept the language L = {w | w is of even length and begins with 01}
The DFA M = ({q0, q1, q2, q3, qt}, {0, 1}, δ, q0, {q2}) where δ is as shown in the transition diagram.
Obtain a DFA to accept the language containing strings of binary odd numbers.
The DFA M = ({q0, q1}, {0, 1}, δ, q0, {q1}) where δ is as shown in the transition diagram.
Obtain a DFA to accept the set of all strings that, when interpreted as a binary integer, form an odd or an even number.
The DFA M = ({q0, q1, q2}, {0, 1}, δ, q0, {q1, q2}) where δ is as shown in the transition diagram.
Obtain a DFA for the language L = {w ∈ {a, b}* : no two consecutive characters are the same}.
Answer:
L = {ε, a, b, ab, ba, aba, bab, abab, …}
The DFSM M = ({q0, q1, q2, d}, {a, b}, δ, q0, {q0, q1, q2}) where δ is as shown in the transition diagram;
state d is the dead state or trap state.
Obtain a DFA for the language L = {w ∈ {a, b}* : every a region in w is of even length}.
Answer:
Language L contains strings of a's and b's in which every maximal run of a's (each immediately preceded or
followed by b's, if anything) has even length.
The DFSM M = ({q0, q1, d}, {a, b}, δ, q0, {q0}) where δ is as shown in the transition diagram.
Modulo-n-problems
Obtain a DFA to accept the language L = {w : |w| mod 3 = 0, where w ∈ (a, b)*}
Answer:
Modulo 3 gives three remainders: 0, 1, 2. In state q0 no input symbol is required to reach the
state, so the length is 0; therefore q0 is identified as the remainder-0 state. Similarly q1 (length 1) is the remainder-1
state and q2 (length 2) the remainder-2 state. Afterwards the machine re-enters q0 and the same process repeats. The final
state is q0, since |w| mod 3 = 0 (the remainder-0 state, which is q0).
The DFA M = ({q0, q1, q2}, {a, b}, δ, q0, {q0}) where δ is as shown in the transition diagram.
Obtain a DFA to accept the language L = {w : |w| mod 3 ≠ 0, where w ∈ (a, b)*}
(i.e., |w| mod 3 not equal to 0)
The DFA M = ({q0, q1, q2}, {a, b}, δ, q0, {q1, q2}) where δ is as shown in the transition diagram.
Obtain a DFA to accept the language L = {w : |w| mod 5 ≠ 0, w ∈ (a, b)*}
The DFA M = ({q0, q1, q2, q3, q4}, {a, b}, δ, q0, {q1, q2, q3, q4}) where δ is as shown in the transition
diagram.
Obtain a DFA to accept the language L = {w : |w| mod 3 ≥ |w| mod 2 and w ∈ (a, b)*}.
Answer:
Here mod 3 gives three remainders 0, 1, 2 and mod 2 gives two remainders 0, 1.
Let us consider |w| mod 3 = x, which gives three states, say Q1 = {0, 1, 2},
and |w| mod 2 = y, which gives two states, say Q2 = {0, 1}.
Therefore the number of states required to design the DFA for the given language can be obtained by
taking the cross product of Q1 and Q2:
Q = Q1 × Q2
Q = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}
Here (0, 0) is considered the start state.
(In the start state the length of the string required to reach that state is 0; that means |w| mod 3 = 0
and |w| mod 2 = 0, giving the state (0, 0).)
Final states: to accept strings w such that |w| mod 3 ≥ |w| mod 2, the pairs (x, y) such that x ≥
y are final states.
So the final states are {(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)}
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q0, q1, q2, q4, q5}) where δ is as shown in the
transition diagram.
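The cross-product idea can be checked with a short sketch (my own illustration, not from the notes): track the pair (|w| mod 3, |w| mod 2) while reading the input and accept exactly in the pairs with x ≥ y:

def accepts(w):
    x, y = 0, 0                          # start state (0, 0)
    for _ in w:                          # every symbol advances both counters
        x, y = (x + 1) % 3, (y + 1) % 2
    return x >= y                        # final states: pairs with x >= y

for w in ["", "a", "ab", "aba", "abab", "ababa"]:
    print(repr(w), accepts(w))           # only length 3 (state (0, 1)) is rejected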
Obtain a DFA to accept the language L = {w : |w| mod 3 ≤ |w| mod 2 and w ∈ (a, b)*}.
Q1 = {0, 1, 2}
Q2 = {0, 1}
Q = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}
Here (0, 0) is considered the start state.
Final states are the pairs with x ≤ y, i.e. {(0, 0), (0, 1), (1, 1)}
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q0, q1, q3}) where δ is as shown in the transition
diagram.
Obtain a DFA to accept the language L = {w : |w| mod 3 ≠ |w| mod 2 and w ∈ (a, b)*}.
Answer:
Q = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}
Here (0, 0) is considered the start state.
Final states are the pairs with x ≠ y, i.e. {(0, 1), (1, 0), (2, 0), (2, 1)}
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q2, q3, q4, q5}) where δ is as shown in the transition
diagram.
Obtain a DFA to accept the language L = {w | w ∈ (a, b)*; Na(w) mod 3 = 2 and Nb(w) mod 2 = 1}
Answer:
Na(w) mod 3 gives remainders 0, 1, 2, and the states corresponding to these remainders can be
represented as Q1 = {A0, A1, A2}.
Nb(w) mod 2 gives remainders 0, 1, and the states corresponding to these remainders can be
represented as Q2 = {B0, B1}.
The possible states for the given DFA are Q = Q1 × Q2:
Q = {(A0, B0), (A0, B1), (A1, B0), (A1, B1), (A2, B0), (A2, B1)}
Here (A0, B0) is the start state, and since the language contains strings of a's and b's such that Na(w) mod 3
= 2 and Nb(w) mod 2 = 1, the final state is (A2, B1).
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q5}) where δ is as shown in the transition
diagram.
Obtain a DFA to accept the language L = {w | w ∈ (a, b)*; Na(w) mod 3 ≥ 1 and Nb(w) mod 2 ≤ 1}
The answer is the same as that of the previous problem, differing only in the final states.
Final states = {(A1, B0), (A1, B1), (A2, B0), (A2, B1)}
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q1, q2, q4, q5}) where δ is as shown in the
transition diagram.
****Obtain a DFA to accept the language containing strings of a's and b's such that:
i. the set of all strings having an even number of a's and an even number of b's;
ii. the set of all strings having an even number of a's and an odd number of b's;
iii. the set of all strings having an odd number of a's and an even number of b's;
iv. the set of all strings having an odd number of a's and an odd number of b's.
i. Set of all strings having an even number of a's and an even number of b's:
OR
Number of a's divisible by 2 and number of b's divisible by 2.
OR
Number of a's a multiple of 2 and number of b's a multiple of 2.
Answer:
An even number of a's means Na(w) mod 2 = 0.
An even number of b's means Nb(w) mod 2 = 0.
So the possible remainders in each case are 0, 1:
Q1 = {A0, A1}
Q2 = {B0, B1}. Therefore Q = Q1 × Q2 = {(A0, B0), (A0, B1), (A1, B0), (A1, B1)}
Start state: (A0, B0)
The final state to accept the language containing an even number of a's and an even number of b's is (A0, B0).
So the DFA M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q0}) where δ is as shown in the transition diagram.
ii. Set of all strings having an even number of a's and an odd number of b's.
The answer is the same as that of the previous problem, differing only in the final state, i.e. (A0, B1):
An even number of a's means Na(w) mod 2 = 0.
An odd number of b's means Nb(w) mod 2 = 1.
DFA M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q2}) where δ is as shown in the transition diagram.
iii. Set of all strings having an odd number of a's and an even number of b's.
The answer is the same as that of the previous problem, differing only in the final state, i.e. (A1, B0):
An odd number of a's means Na(w) mod 2 = 1.
An even number of b's means Nb(w) mod 2 = 0.
DFA M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q1}) where δ is as shown in the transition diagram.
iv. Set of all strings having an odd number of a's and an odd number of b's.
An odd number of a's means Na(w) mod 2 = 1.
An odd number of b's means Nb(w) mod 2 = 1.
The answer is the same as that of the previous problem, differing only in the final state, i.e. (A1, B1):
DFA M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q3}) where δ is as shown in the transition diagram.
3. For example, if the numbers are interpreted as binary numbers and must be divisible by 5, then we need
a total of five states to design the DFA.
4. The states of the DFA: start state q0 represents remainder 0, q1 represents remainder 1, q2
remainder 2, q3 remainder 3 and q4 remainder 4 respectively.
5. The final state of the DFA is the remainder-0 state, which is q0 (since all numbers divisible by n
leave remainder 0).
Design a DFA to accept all binary numbers which are divisible by 3.
OR
Design a DFA to accept all binary integers which are multiples of 3.
Answer:
A number divided by 3 leaves remainders 0, 1 and 2.
Therefore the number of states required to design the DFA for this problem = 3.
Remainder 0 corresponds to state q0, remainder 1 corresponds to q1 and remainder 2 corresponds to
state q2.
Final state: q0, where every binary number divisible by 3 (remainder 0) is accepted.
In q0 the machine reads any number of 0's and stays at remainder 0 (0 mod 3); when the input in q0 is 1,
the machine enters the remainder-1 state (1 mod 3).
In q1, on input 0 the string read so far is 10 (= 2, remainder 2), so it enters q2.
In q1, on input 1 the string is 11 (= 3, remainder 0), so it enters q0.
In q2, on input 0 the string is 100 (= 4, remainder 1), so it enters q1.
In q2, on input 1 the string is 101 (= 5, remainder 2), so it remains in q2.
(In general, appending a bit b doubles the number and adds b, so remainder r moves to (2r + b) mod 3.)
The transition function for the above problem:

State     0     1
→*q0      q0    q1
q1        q2    q0
q2        q1    q2
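A minimal sketch (my own, not from the notes) built on the (2r + b) mod 3 rule noted above, reproducing the table's moves:

def divisible_by_3(bits):
    r = 0                         # state q0 = remainder 0
    for b in bits:
        r = (2 * r + int(b)) % 3  # same moves as the transition table above
    return r == 0                 # accept in the remainder-0 state

print(divisible_by_3("110"))      # 6  -> True
print(divisible_by_3("101"))      # 5  -> False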
SHORTCUT METHOD
1. For any divisible-by-n problem, first write the transition table for n states
and then translate the transition table into a transition diagram.
Design a DFA to accept all binary numbers which are divisible by 5, i.e. which are multiples of 5.
This is easy to solve using the shortcut method.
Write the transition table for binary numbers divisible by 5; the total number of states required is 5:

State     0     1
→*q0      q0    q1
q1        q2    q3
q2        q4    q0
q3        q1    q2
q4        q3    q4
Design a DFA to accept the set of all strings beginning with a 1 that, when interpreted as a binary
integer, are a multiple of 5. For example 101, 1010, 1111 etc. are in the language, and 0, 0101, 100, 111,
01111 etc. are not.
The answer remains the same as that of the previous problem, but the number should always start with a 1.
If a binary number starts with a 0, that number should never be accepted, and the machine enters the trap
state on input 0. So rename the final state of the previous problem as qf and add a new start state
q0; from this state on input 1 the machine enters state q1, and the remaining procedure is the same as
in the previous problem.
Therefore the DFA for the above problem is M = ({q0, q1, q2, q3, q4, qf, qt}, {0, 1}, δ, q0, {qf}) where δ is
as shown in the transition diagram.
Design a DFA to accept the set of all strings that, when interpreted in reverse as a binary integer, are
divisible by 5. Examples of strings in the language are 0, 10011, 1001100 and 0101.
The answer is the same as that of the divisible-by-5 problem, but with the direction of all arrows reversed,
except the arrow labeled start.
Therefore the DFA for the above problem is M = ({q0, q1, q2, q3, q4}, {0, 1}, δ, q0, {q0}) where δ is
as shown in the transition diagram.
v. L = {awa | w ∈ (a, b)*}, i.e. starting with a and ending with a.
i. L = {abab^n or aba^n | n ≥ 0}
The NFA for the above problem is M = ({q0, q1, q2, q3, q4, qf}, {a, b}, δ, q0, {qf, q4}) where δ is as
shown in the transition diagram.
v. L = {wab | w ∈ (a, b)*}
The NFA for the above problem is M = ({q0, q1, q2}, {a, b}, δ, q0, {q2}) where δ is as shown in the
transition diagram.
Obtain an NFA which accepts exactly those strings that have the symbol 1 in the second-last position
over Σ = {0, 1}.
δD({q0, q1, q2}, a) = δN(q0, a) ∪ δN(q1, a) ∪ δN(q2, a) = {q0, q1} ∪ φ ∪ φ = {q0, q1}
δD({q0, q1, q2}, b) = δN(q0, b) ∪ δN(q1, b) ∪ δN(q2, b) = {q0} ∪ {q2} ∪ φ = {q0, q2}
Transition table of the DFA:

δ             a           b
φ             φ           φ
→{q0}         {q0, q1}    {q0}
{q0, q1}      {q0, q1}    {q0, q2}
*{q0, q2}     {q0, q1}    {q0}

From the above table we observe that only {q0}, {q0, q1} and {q0, q2} are reachable from the start state
{q0}, and all other states are inaccessible. So by discarding all the inaccessible states from the
above transition table we get the DFA equivalent to the given NFA:

δ             a           b
→{q0}         {q0, q1}    {q0}
{q0, q1}      {q0, q1}    {q0, q2}
*{q0, q2}     {q0, q1}    {q0}

The final state of the DFA is FD = {q0, q2} (since q2 is the final state of the NFA).
The DFA M = ({(q0), (q0, q1), (q0, q2)}, {a, b}, δ, {q0}, {(q0, q2)}) where δ is as shown in the transition
diagram.
δ        0         1
→p       {p, q}    {p}
q        φ         {r}
*r       {p, r}    {q}
Answer:
Transition table of the DFA using the subset construction method:

δ        0         1
φ        φ         φ
→{p}     {p, q}    {p}
{q}      φ         {r}

From the above table we observe that only {p}, {p, q}, {p, r} and {p, q, r} are reachable from the start
state {p}; all other states are inaccessible. So by discarding all the inaccessible states from
the above transition table we get the DFA:

δ            0            1
→{p}         {p, q}       {p}
{p, q}       {p, q}       {p, r}
*{p, r}      {p, q, r}    {p, q}
*{p, q, r}   {p, q, r}    {p, q, r}

The final states of the DFA are FD = {(p, r), (p, q, r)} (since r is the final state of the NFA).
The DFA M = ({(p), (p, q), (p, r), (p, q, r)}, {0, 1}, δ, {p}, {(p, r), (p, q, r)}) where δ is as shown in the
transition diagram.
That is, to compute δD(QD, a) we look at all the states p in QD, see which states the NFA
goes to from p on input a, and take the union of all those states.
iii. Identify the final states of the DFA, FD: these are the sets that include at least one accepting state of
the NFA.
Note: if a particular method is not specified in an NFA-to-DFA conversion problem, always use
the LAZY evaluation method, building only the reachable subsets, as sketched below:
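A compact sketch of the lazy method (my own illustration; the NFA below is the ends-with-ab machine from the earlier worked example, written as a dictionary):

from collections import deque

def nfa_to_dfa(delta, start, finals, alphabet):
    start_set = frozenset([start])
    table, todo = {}, deque([start_set])
    while todo:
        S = todo.popleft()
        if S in table:
            continue                       # already expanded this subset
        table[S] = {}
        for a in alphabet:                 # union of the moves of all states in S
            T = frozenset(r for q in S for r in delta.get((q, a), ()))
            table[S][a] = T
            todo.append(T)                 # expand lazily, reachable sets only
    dfa_finals = {S for S in table if S & finals}
    return table, start_set, dfa_finals

delta = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
table, start, finals = nfa_to_dfa(delta, "q0", {"q2"}, "ab")
for S, row in table.items():
    print(sorted(S), {a: sorted(T) for a, T in row.items()})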
Convert the following NFA to DFA.

δ        0       1
→p       {q}     φ
*q       {p}     {q, r}
r        φ       {q}

The start state of the DFA = {p} (since p is the start state of the NFA).
Initially the DFA has only one state, the start state: QD = {{p}}.
Find the transitions from {p} on inputs 0 and 1:
δD((p), 0) = δN(p, 0) = {q}, δD((p), 1) = δN(p, 1) = φ
Add the new state {q} to QD = {{p}, {q}}

δ          0       1
→{p}       {q}     φ
*{q}       {p}     {q, r}
*{q, r}    {p}     {q, r}
δ        a        b
→q0      {q1}     φ

The equivalent DFA is given by M = ({{q0}, {q1}, {q1, q2}}, {a, b}, δ, {q0}, {{q1}, {q1, q2}})
Transition table of the DFA:

δ        a        b
→q0      {q1}     φ

In a DFA, φ indicates that there is no transition defined (the machine enters a trap state).
Answer:
Step 1: The start state of the DFA is the start state of the NFA, i.e. q0.
Initially the set of DFA states is QD = {(q0)}.
Write the transition function in state q0:
δD(q0, a) = δN(q0, a) = {q0, q1}
δD(q0, b) = δN(q0, b) = {q0, q3}
Add the new states {q0, q1} and {q0, q3} to QD, write the transition function for the new states, and
repeat the same process until there are no more new states.
δD({q0, q1}, a) = δN(q0, a) ∪ δN(q1, a)
= {q0, q1} ∪ φ
= {q0, q1} (already existing state, no need to add to QD)
δD({q0, q1}, b) = δN(q0, b) ∪ δN(q1, b)
= {q0, q3} ∪ {q2}
= {q0, q2, q3} (new state, added to QD)
δD({q0, q2, q3}, b) = δN(q0, b) ∪ δN(q2, b) ∪ δN(q3, b)
= {q0, q3} ∪ φ ∪ φ
= {q0, q3} (already existing state)
δD({q0, q1, q4}, a) = δN(q0, a) ∪ δN(q1, a) ∪ δN(q4, a)
= {q0, q1} ∪ φ ∪ φ
= {q0, q1} (already existing state)
δD({q0, q1, q4}, b) = δN(q0, b) ∪ δN(q1, b) ∪ δN(q4, b)
= {q0, q3} ∪ {q2} ∪ φ
= {q0, q2, q3} (already existing state)
Since there are no more new states, we stop the process; the DFA finally has 5 states:
QD = {{q0}, {q0, q1}, {q0, q3}, {q0, q2, q3}, {q0, q1, q4}}
Since q2 and q4 are the final states of the NFA, every DFA state containing at least one of these
states is a final state. Therefore the final states of the DFA = {{q0, q2, q3}, {q0, q1, q4}}
δ        0          1
→p       {p, r}     {q}
q        {r, s}     {p}
*r       {p, s}     {r}
*s       {q, r}     φ

δD(q, 0) = δN(q, 0) = {r, s}; δD(q, 1) = δN(q, 1) = {p} (already existing state)
δD((p, r), 0) = δN(p, 0) ∪ δN(r, 0) = {p, r} ∪ {p, s} = {p, r, s}
δD((p, r), 1) = δN(p, 1) ∪ δN(r, 1) = {q} ∪ {r} = {q, r}
Add the new states {r, s}, {p, r, s}, {q, r} to QD, i.e. QD = {(p), (q), (p, r), (r, s), (p, r, s), (q, r)}
DFA:

δ               0               1
→{p}            {p, r}          {q}
{q}             {r, s}          {p}
*{r}            {p, s}          {r}
*{p, r}         {p, r, s}       {q, r}
*{r, s}         {p, q, r, s}    {r}
*{q, r}         {p, r, s}       {p, r}
*{p, s}         {p, q, r}       {q}
*{p, r, s}      {p, q, r, s}    {q, r}
*{p, q, r}      {p, r, s}       {p, q, r}
*{p, q, r, s}   {p, q, r, s}    {p, q, r}
Define ε-NFA.
An ε-NFA is a five-tuple E = (Q, Σ, δ, q0, F), a non-deterministic machine with ε-moves, where
Q --- non-empty finite set of states
Σ --- non-empty finite set of input symbols (alphabet)
δ --- transition function, which maps Q × (Σ ∪ {ε}) → 2^Q
q0 ∈ Q is the start (initial) state.
F ⊆ Q is the set of final (accepting) states.
Language accepted by an ε-NFA:
Let E = (Q, Σ, δ, q0, F) be an ε-NFA. A string w is accepted by the machine E if and only if the transition
for w takes the initial state q0 to a final state in F,
i.e. δ*(q0, w) is in F, i.e. δ*(q0, w) contains at least one accepting state.
Epsilon-Closure:
What is epsilon-closure?
The epsilon-closure of any state q is the set of all states which are reachable from state q on ε-transitions
only. ε-closure(q) is denoted by ECLOSE(q).
Recursive definition of epsilon-closure:
Basis: state q is in ECLOSE(q).
Induction: if state p is in ECLOSE(q) and there is an ε-transition from p to a state r, then r is also in ECLOSE(q).
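Following the recursive definition, ECLOSE(q) can be computed with a simple worklist. A small sketch (my own; the ε-moves shown are hypothetical):

def eclose(q, eps):
    closure, stack = {q}, [q]          # basis: q is in ECLOSE(q)
    while stack:
        p = stack.pop()
        for r in eps.get(p, ()):       # induction: p in ECLOSE(q), r reachable by one ε-move
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

eps = {"p": {"q"}, "q": {"r"}}         # hypothetical ε-moves: p -> q -> r
print(eclose("p", eps))                # {'p', 'q', 'r'}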
Example:
i. The set of strings consisting of zero or more a's followed by zero or more b's followed by
zero or more c's.
ii. The set of strings consisting of either 01 repeated one or more times or 010 repeated one or
more times, i.e. L = (01)+ + (010)+.
q1: the situation in which we have seen the sign, if there is one, but none of the digits or the decimal
point.
q2: the situation in which we have just seen the decimal point, and may or may not have seen prior digits.
q4: we have definitely seen at least one digit but not the decimal point.
q3: we have seen a decimal point and at least one digit, either before or after the decimal point. We
may stay in q3 reading whatever digits there are, and we also have the option of guessing that the string of
digits is complete and going spontaneously to q5, an accepting state.
Design an NDFSM (NFA) for L = {w ∈ {a, b}* : w is made up of an optional a followed by aa
followed by zero or more b's}.
NDFSM (NFA) M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q3}) where δ is as shown in the transition diagram.
NDFSM (NFA) M = ({q0, q1, q2, q3, q4, q5, q6}, {a, b}, δ, q0, {q4, q5}) where δ is as shown in the
transition diagram.
a) aabbba: Yes.
b) bab: No.
c) baba: Yes.
3. Identify the final states: if any state in QD contains a final state of the ε-NFA, then that
state is a final state of the DFA.
Also give the set of all strings of length 3 or less accepted by the automaton.
ε-closure(p) = {p, q, r}
ε-closure(q) = {q}
ε-closure(r) = {p, q, r}
The start state of the DFA is ε-closure(p), where p is the start state of the ε-NFA.
Transition function:
The set of all strings of length 3 or less accepted by the automaton is given by:
L = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, baa, bab, bba, bbb}
Convert the following ε-NFA to DFA by computing the ε-closure of each state.
Answer:
MINIMIZATION OF DFA
The language accepted by a finite automaton is called a regular language.
Using the decision properties of regular languages, we can decide whether two automata define the same
language. If so, we can minimize the automata to as few states as possible. Minimization
of automata is very important in the design of switching circuits: as the number of states of an automaton
decreases, the size of the circuit decreases and hence the cost decreases.
The Table-Filling Algorithm is used to find the sets of distinguishable and indistinguishable
states.
Minimization of Automata using Table Filling Algorithm:
Procedure:
1. Eliminate all the states which are not reachable from start state.
2. Identify the initial markings: for each pair of states (p, q) such that p is an accepting state and
q is a non-accepting state (or vice versa), the pair (p, q) is distinguishable; mark that pair (p,
q) with an X.
3. Identify the subsequent markings: for each unmarked pair (p, q) and each a ∈ Σ, find
δ(p, a) = r and δ(q, a) = s. If the pair (r, s) is already marked as distinguishable then the pair
(p, q) is also distinguishable, so mark the pair (p, q) with an X.
Repeat step 3 until no previously unmarked pairs are marked.
4. Obtain the states of the minimized DFA: the pairs left unmarked after step 3 are
indistinguishable (equivalent) states and can be merged into single states; the remaining
states stay as individual distinguishable states.
5. Identify the start state of the minimized DFA: the group of states [p1, p2, …, pn] that contains
the start state of the given DFA is the start state of the minimized DFA.
6. Identify the final states of the minimized DFA: every group of states [p1, p2, …, pn] that contains
a final state of the given DFA is a final state of the minimized DFA.
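The whole procedure fits in a few lines of code. Below is a compact sketch (my own illustration, under the assumption that the DFA is given as a complete transition dictionary):

from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    # step 2: initial markings (one accepting, one non-accepting)
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:                       # step 3: subsequent markings
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked                        # unmarked pairs are equivalent

# hypothetical DFA: q2 and q3 behave identically, so (q2, q3) stays unmarked
delta = {("q1", "0"): "q2", ("q1", "1"): "q3",
         ("q2", "0"): "q2", ("q2", "1"): "q2",
         ("q3", "0"): "q2", ("q3", "1"): "q2"}
print(distinguishable_pairs(["q1", "q2", "q3"], "01", delta, {"q2", "q3"}))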
i. Draw the table of distinguishable and indistinguishable states for the automaton shown below.
ii. Construct the minimum-state equivalent automaton.
Answer:
Step 1: State D is not reachable from the start state (in the transition table D never appears in column 0 or
column 1), so we can eliminate state D.
Step 2: Identify the initial markings (X marks a distinguishable pair; C is the only accepting state):

B    .
*C   X    X
E    .    .    X
F    .    .    X
G    .    .    X
H    .    .    X
     A    B    *C   E    F    G
Step 3: Identify the subsequent markings.
Finally the unmarked pairs of states (A, E) and (B, H) are considered indistinguishable or equivalent
states.
The individual states C, F and G are distinguishable (not equivalent) states.
Transition table of the minimized DFA:

δ          0        1
→(A,E)     (B,H)    F
(B,H)      G        C
*C         (A,E)    C
F          C        G
G          G        (A,E)
Draw the table of distinguishable and indistinguishable states for the automaton shown below and
hence find the minimum-state equivalent automaton.

δ       0       1
→A      B       C
B       D       E
C       F       G
*D      D       E
E       F       G
*F      D       E
*G      F       G
Identify the initial markings by considering pairs of states in which one is final and the other non-final:

B     .
C     .    .
D     X    X    X
E     .    .    .    X
*F    X    X    X    .    X
*G    X    X    X    .    X    .
      A    B    C    *D   E    *F

Finally the unmarked pairs of states (C, E) and (D, F) are considered indistinguishable or equivalent
states.
The remaining individual states A, B and G are distinguishable (not equivalent) states.
Start state of the minimized DFA = A
i. Draw the table of distinguishable and indistinguishable states for the automaton.
ii. Construct the minimum-state equivalent automaton.

δ       0       1
→A      B       E
B       C       F
*C      D       H
D       E       H
E       F       I
*F      G       B
G       H       B
H       I       C
*I      A       E

Identify the initial markings by considering pairs of states in which one is final and the other non-final:

B     .
*C    X    X
D     .    .    X
E     .    .    X    .
*F    X    X    .    X    X
G     .    .    X    .    .    X
H     .    .    X    .    .    X    .
*I    X    X    .    X    X    .    X    X
      A    B    *C   D    E    *F   G    H
Consider the two DFAs shown below. Using the table-filling algorithm, show that the language
accepted by both DFAs is the same.
Answer:
Two DFAs accept the same language if their start states are equivalent.
Using the table-filling algorithm, we have to show that the start state A of the first DFA is equivalent
to the start state C of the second DFA.

B     X
*C    .    X
*D    .    X    .
E     X    .    X    X
      *A   B    *C   *D

From the table-filling algorithm we observe that the states (A, C) are indistinguishable or equivalent, so
both DFAs accept the same language.
Also (A, D), (B, E) and (C, D) are indistinguishable.
So the minimized automaton contains the states (A, C, D) and (B, E).
Transition table of the minimized DFA:
Step 2: Identify the subsequent markings for each unmarked pair (q1, q2), (q1, q4), (q2, q4) and (q3, q5)
by referring to the transition table of the DFA.

Pair        0           1
(q1, q2)    (q2, q3)    (q3, q5)
(q1, q4)    (q2, q3)    (q3, q5)
(q2, q4)    (q3, q3)    (q5, q5)
(q3, q5)    (q2, q4)    (q3, q5)

q2     X
*q3    X    X
q4     X    .    X
*q5    X    X    .    X
       q1   q2   *q3  q4

From the above table we observe that the pairs (q2, q4) and (q3, q5) are not marked. So these pairs are
considered indistinguishable (equivalent) states, and the state q1 is a distinguishable state.
The minimized DFA will have the groups of distinguishable and indistinguishable states.
Thus the minimized DFA will have 3 states: q1, (q2, q4) and (q3, q5).
INTERPRETER
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in the
source program on inputs supplied by the user.
LEXICAL ANALYZER
The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer
reads the stream of characters making up the source program and groups the characters into
meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as
output a token of the form: (token-name, attribute-value), that it passes on to the subsequent
phase, syntax analysis.
In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for
this token. Information from the symbol-table entry is needed for semantic analysis and code
generation.
SYNTAX ANALYZER (PARSER)
It is the second phase of the compiler. The parser uses the first components of the tokens
produced by the lexical analyzer to create a tree-like intermediate representation that depicts
the grammatical structure of the token stream. A typical representation is a syntax tree in
which each interior node represents an operation and the children of the node represent the
arguments of the operation.
SEMANTIC ANALYZER
The semantic analyzer uses the syntax tree and the information in the symbol table to check
the source program for semantic consistency with the language definition. It also gathers
type information and saves it in either the syntax tree or the symbol table, for subsequent
use during intermediate-code generation.
INTERMEDIATE CODE GENERATOR
In the process of translating a source program into target code, a compiler may construct
one or more intermediate representations, which can have a variety of forms. Syntax trees
are a form of intermediate representation; they are commonly used during syntax and
semantic analysis. The low-level or machine-like intermediate representation should be
easy to produce and easy to translate into the target machine. It can be three-address code,
quadruples, triples, etc.
MACHINE-INDEPENDENT CODE OPTIMIZATION
The optimization phase is optional. This phase produces a better target program than would
otherwise be produced from the un-optimized form (both the machine-dependent and
machine-independent optimizers are optional phases). This phase attempts to improve the intermediate
code by eliminating unwanted code, so that better target code will result. Usually better
means faster, but other objectives may be desired, such as shorter code, or target code that
consumes less power.
CODE GENERATOR
The code generator takes as input an intermediate representation of the source program and
maps it into the target language. If the target language is machine code, registers or memory
locations are selected for each of the variables used by the program. Then, the intermediate
instructions are translated into sequences of machine instructions that perform the same
task. A crucial aspect of code generation is the judicious assignment of registers to hold
variables.
Symbol Table
Symbol Table will interact with all phases of compilation. A symbol table is a data structure
containing a record for each identifier with fields for the attributes of the identifier. When an
identifier in the source program is detected by the lexical analyzer, the identifier is entered
into the symbol table.
Show the translations for the assignment statement position = initial + rate * 60, clearly indicating the
output of each phase.
Lexical analyzer phase:
Input to lexical analyzer phase is position = initial + rate * 60
position is a lexeme that would be mapped into a token < id, 1 >
The assignment symbol = is a lexeme that is mapped into the token < = >
initial is a lexeme that is mapped into the token < id, 2>
+ is a lexeme that is mapped into the token < +>
rate is a lexeme that is mapped into the token < id, 3 >
* is a lexeme that is mapped into the token < * >
60 is a lexeme that is mapped into the token < 60 >
The output of lexical analyzer phase is:
< id, 1 > < = > < id, 2 > < + > <id, 3 > < * > < 60 >
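A toy sketch (my own, not the scanner of any real compiler) of how such (token-name, attribute-value) pairs and the symbol-table entries could be produced:

import re

TOKEN_SPEC = [("id", r"[A-Za-z_]\w*"), ("number", r"\d+"),
              ("op", r"[=+\-*/]"), ("skip", r"\s+")]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    symbol_table, tokens = {}, []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "id":                 # identifiers get a symbol-table entry
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(("id", index))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table

print(tokenize("position = initial + rate * 60"))
# ([('id', 1), ('op', '='), ('id', 2), ('op', '+'), ('id', 3), ('op', '*'),
#   ('number', '60')], {'position': 1, 'initial': 2, 'rate': 3})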
Syntax analyzer phase:
Input to syntax analyzer phase is: < id, 1 > < = > < id, 2 > < + > <id, 3 > < * > < 60 >
Syntax analysis produces output in the form of a tree called a syntax tree, in which operators are
interior nodes and operands are children of the nodes, following the normal precedence rules.
The output of the syntax analyzer is a syntax tree of the following form:
Suppose that position, initial and rate have been declared to be floating-point numbers and the
lexeme 60 by itself forms an integer. The type checker in the semantic analyzer discovers that the operator
* is applied to a floating-point number rate and the integer 60. The integer may be converted into a floating-point
number. The output of this phase has an extra node for the operator inttofloat; the output is the
modified version of the syntax tree.
The intermediate code generator and code optimizer phases produce three-address code, which after
machine-independent optimization becomes:
t1 = id3 * 60.0
id1 = id2 + t1
Code generator phase:
Here registers or memory locations are selected for each of the variables used by the program.
Output of code generator phase is (Here all the identifiers are floating point type)
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
4. Data-flow analysis engines that facilitate the gathering of information about how values
are transmitted from one part of a program to each other part. Data-flow analysis is a key
part of code optimization.
5. Compiler-construction toolkits that provide an integrated set of routines for constructing
various phases of a compiler.
----------------------------------------------------------------------------------------------------------------
Textbooks:
1. John E Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, "Introduction to Automata Theory,
Languages and Computation", Third Edition, Pearson.
Automata Theory & Compiler Design 21CS51 Module 2
If R is a regular expression denoting the language LR and S is a regular expression denoting the
language LS, then R + S is a regular expression corresponding to the language LR ∪ LS.
R.S is a regular expression corresponding to the language LR.LS.
R* is a regular expression corresponding to the language (LR)*, the Kleene closure of LR. Thus the
expressions obtained by applying any of these rules are regular expressions.
Examples of Regular expressions

Regular expression      Meaning
a*                      Strings of any number of a's (zero or more a's)
a+                      Strings of at least one a (one or more a's)
(a + b)                 A string consisting of either a or b
(a + b)*                Strings of any number of a's and b's, including ε
(a + b)* ab             Strings of a's and b's ending with ab
ab(a + b)*              Strings of a's and b's starting with ab
(a + b)* ab (a + b)*    Strings of a's and b's with substring ab
Obtain a regular expression to accept the language containing strings of a's and b's such that L = {
a^(2n+1) b^(2m+1) | n, m ≥ 0}.
a^(2n+1) means an odd number of a's; regular expression = a(aa)*
b^(2m+1) means an odd number of b's; regular expression = b(bb)*
The regular expression for the given language = a(aa)* b(bb)*
Obtain a regular expression to accept the language containing strings of 0's and 1's with exactly
one 1 and an even number of 0's.
Regular expression for exactly one 1 = 1
Even number of 0's = (00)*
Here the 1 can be preceded and followed by an even number of 0's, or the 1 can be preceded and followed
by an odd number of 0's.
The regular expression for the given language = (00)* 1 (00)* + 0(00)* 1 0(00)*
Obtain regular expression to accept the language containing strings of 0’s and 1’s having no two
consecutive 0’s. OR
Obtain regular expression to accept the language containing strings of 0’s and 1’s with no pair of
consecutive 0’s.
Whenever a 0 occurs it should be followed by 1. But there is no restriction on number of 1’s. So
it is a string consisting of any combinations of 1’s and 01’s, ie regular expression = (1+01)*
Suppose string ends with 0, the above regular expression can be modified by inserting (0 + ε ) at
the end.
Regular expression for the given language = (1+01)* (0 + ε )
Obtain regular expression to accept the language containing strings of 0’s and 1’s having no two
consecutive 1’s. OR
Obtain regular expression to accept the language containing strings of 0’s and 1’s with no pair of
consecutive 1’s.
Whenever a 1 occurs it should be followed by 0. But there is no restriction on number of 0’s. So
it is a string consisting of any combinations of 0’s and 10’s, ie regular expression = (0+10)*
Suppose string ends with 1, the above regular expression can be modified by inserting (1 + ε ) at
the end.
Regular expression for the given language = (0+10)* (1 + ε )
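The last two answers are easy to sanity-check mechanically. A quick sketch (my own) using Python's re module, writing + as | and (1 + ε) as an optional group:

import re

pattern = re.compile(r"(0|10)*1?")        # (0+10)*(1+eps)

for w in ["", "1", "010", "0101", "110", "1011"]:
    ok = pattern.fullmatch(w) is not None
    assert ok == ("11" not in w)          # agrees with: no two consecutive 1's
    print(repr(w), ok)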
Obtain regular expressions to accept the following languages over Σ = {a, b}:
i. Strings of a's and b's with substring aab.
So the regular expression for the given language = [(a+b) (a+b) (a+b)]*+ [(a+b)
(a+b)]*
ix. Obtain the regular expression to accept the language L = {a^n b^m | m + n is even}.
Here n represents the number of a's and m represents the number of b's.
m + n even gives two possible cases:
case i. an even number of a's followed by an even number of b's;
regular expression: (aa)*(bb)*
case ii. an odd number of a's followed by an odd number of b's;
regular expression = a(aa)* b(bb)*
So the regular expression for the given language = (aa)*(bb)* + a(aa)* b(bb)*
x. Obtain the regular expression to accept the language L = {a^n b^m | n ≥ 4 and m ≤ 3}.
Here n ≥ 4 means at least 4 a's; the regular expression for this = aaaa(a)*
m ≤ 3 means at most 3 b's; the regular expression for this = (ε + b)(ε + b)(ε + b).
So the regular expression for the given language = aaaa(a)* (ε + b)(ε + b)(ε + b).
xi. Obtain the regular expression to accept the language L = {a^n b^m c^p | n ≥ 4, m ≤ 3 and p
≤ 2}.
Here n ≥ 4 means at least 4 a's; the regular expression for this = aaaa(a)*
m ≤ 3 means at most 3 b's; the regular expression for this = (ε + b)(ε + b)(ε + b).
p ≤ 2 means at most 2 c's; the regular expression for this = (ε + c)(ε + c)
So the regular expression for the given language = aaaa(a)*(ε + b)(ε + b)(ε + b)(ε + c)
(ε + c).
xii. All strings of a's and b's that do not end with ab.
Strings of length 2 that do not end with ab are ba, aa and bb.
So the regular expression = (a+b)*(aa + ba + bb) + ε + a + b (the short strings ε, a and b also do not end with ab)
xiii. All strings of a's, b's and c's with exactly one a.
The regular expression = (b+c)* a (b+c)*
xiv. All strings of a's and b's with at least one occurrence of each symbol in Σ = {a, b}.
At least one occurrence of each of a and b means an a appears before some b or a b appears
before some a, with any number of a's and b's around them.
So the regular expression = (a+b)* a (a+b)* b (a+b)* + (a+b)* b (a+b)* a (a+b)*
Case ii. Since nm ≥ 3, if m = 1 then n should be ≥ 3. The equivalent regular expression is given
by: RE = aaa(a)* b
Case iii. Since nm ≥ 3, if m ≥ 2 and n ≥ 2 then the equivalent regular expression is given by:
RE = aa(a)* bb(b)*
The final regular expression is obtained by adding all the above regular expressions:
Regular expression = abbb(b)* + aaa(a)* b + aa(a)* bb(b)*
Application of Regular expression:
1. Regular expressions are used in UNIX.
2. Regular expressions are extensively used in the design of Lexical analyzer phase.
3. Regular expressions are used to search patterns in text.
FINITE AUTOMATA AND REGULAR EXPRESSIONS
1. ****Converting Regular Expressions to Automata:
Prove that every language defined by a regular expression is also defined by a finite automata.
Proof:
Suppose L = L(R) for a regular expression R; we show that L = L(E) for some ε-NFA E with:
a. Exactly one accepting state.
b. No arcs into the initial state.
c. No arcs out of the accepting state.
The proof must be discussed with the following transition diagrams for the basis of the
construction of an automaton.
Starting at the new start state, we can go to the start state of the automaton for either R or S. We then
reach the accepting state of one of these automata, and from there we follow an ε-arc to the
accepting state of the new automaton.
The automaton for R.S is given by:
The start state of the first (R) automaton becomes the start state of the whole, and the final state of
the second (S) automaton becomes the final state of the whole.
The automaton for R* is given by:
From the new start state we can go either directly to the final state on an arc labeled ε (for ε in R*), or to
the start state of the automaton for R, pass through that automaton one or more times, and then go to the final state.
Finally, the ε-NFA for the regular expression (0+1)*1(0+1) is given by:
3. If the start state of the FSM M is part of a loop (i.e., it has any transitions coming into it), then
create a new start state s and connect it to M's start state via an ε-transition. This new
start state s will have no transitions into it.
4. If the FSM M has more than one accepting state, or if there is just one but there are any
transitions out of it, create a new accepting state and connect each of M's accepting states
to it via an ε-transition. Remove the old accepting states from the set of accepting states.
Note that the new accepting state will have no transitions out of it.
5. At this point, if M has only one state, then that state is both the start state and the
accepting state and M has no transitions. So L(M) = {ε}. Halt and return the simple
regular expression ε.
6. Until only the start state and the accepting state remain, do:
6.1 Select some state s of M, any state except the start state or the accepting
state.
6.2 Remove that state s from M.
6.3 Modify the transitions among the remaining states so that M accepts the same
strings. The labels on the rewritten transitions may be any regular expressions.
7. Return the regular expression that labels the one remaining transition from the start state
to the accepting state.
Consider the following FSM M: Show a regular expression for L(M).
OR
Obtain the regular expression for the following finite automata using state elimination method.
We can build an equivalent machine M' by eliminating state q2 and replacing it by a transition
from q1 to q3 labeled with the regular expression ab*a.
So M' is:
Obtain the regular expression for the following finite automata using state elimination method.
There is no incoming edge into the initial state and no outgoing edge from the final state. So
there are only two states, the initial and the final.
There is no incoming edge into the initial state as well as no outgoing edge from final state.
After eliminating the state B:
Regular expression = ab
Obtain the regular expression for the following finite automata using state elimination method.
There is no incoming edge into the initial state as well as no outgoing edge from final state.
After eliminating the state B:
Obtain the regular expression for the following finite automata using the state elimination method.
Since the initial state has an incoming edge and the final state has an outgoing edge, we have to create a new
initial and a new final state, connecting the new initial state to the old initial state through ε and the old final
state to the new final state through ε. Make the old final state a non-final state.
Since there are multiple final states, we have to create a new final state.
Obtain the regular expression for the following finite automata using the state elimination method.
Obtain the regular expression for the following finite automata using the state elimination method.
Since start state 1 has incoming transitions, we create a new start state and link it to state
1 through ε.
Since accepting states 1 and 2 have outgoing transitions, we create a new accepting state and link
states 1 and 2 to it through ε. Remove the old accepting states from the set of
accepting states (i.e., treat 1 and 2 as non-final states).
Finally we have only the start and final states, with one transition from the start state to the final state;
the label on the transition path is the regular expression.
Regular Expression = (ab ∪ aaa*b)* (a ∪ ε)
The first piece goes from state i to state k without passing through k, the last piece goes from k to j
without passing through k, and all the pieces in the middle go from k to itself without passing
through k. When we combine the expressions for the paths of the two types above, we have the
expression for the labels of all paths from state i to state j that go through no state higher than k.
R_ij^(0) = regular expressions for the paths that go through no intermediate states at all.
R_ij^(1) = regular expressions for the paths that may go through intermediate state 1 only.
R_ij^(2) = regular expressions for the paths that may go through intermediate states 1 and 2 only.
R_ij^(3) = regular expressions for the paths that may go through intermediate states 1, 2 and 3 only.
In general: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
Write the regular expression for the language accepted by the following DFA:
Answer:
When k = 0 (passing through no intermediate state), the various regular expressions are:
When k = 1 (passing through state 1 as an intermediate state), the various regular expressions are:
Therefore the regular expression corresponding to the language accepted by the DFA is given by
R_12^(2) (state 1 (i) is the start state and state 2 (j) is the final state), using the formula above.
Answer:
Number of states in DFA = 3, i.e. k = 3.
By renaming the states of the DFA:
Regular expressions for paths that can go through a) no state, b) state 1 only and c) states 1 and 2
only.
Therefore the regular expression corresponding to the language accepted by the DFA is given by
R_13^(3) (state 1 (i) is the start state and state 3 (j) is the final state), using the formula above.
q2      q2      q3
*q3     q3      q2
Answer:
Number of states in DFA = 3, i.e. k = 3.
Renaming the states of the DFA as q1 = 1, q2 = 2, q3 = 3, the
transition diagram of the DFA is drawn.
Regular expressions for paths that can go through 3 intermediate states (states 1, 2 and
3 only):

R_11^(3) = Ø + ε = ε
R_12^(3) = b
R_13^(3) = (a + bb)b*
R_14^(3) = ab*a + bbb*a
R_21^(3) = Ø
R_22^(3) = Ø + ε = ε
R_23^(3) = bb*
R_24^(3) = bb*a
R_31^(3) = Ø
R_32^(3) = Ø
R_33^(3) = b*
R_34^(3) = b*a
R_41^(3) = Ø
R_42^(3) = Ø
R_43^(3) = Ø
R_44^(3) = Ø + ε = ε

The regular expression corresponding to the language accepted by the DFA is given by R_14^(4)
(state 1 (i) is the start state and state 4 (j) is the final state), using the formula above.
Let L be a regular language. Then there exists a constant n (which depends on L) such that for
every string w in L with |w| ≥ n, we can break w into three strings, w = xyz, such that:
1. y ≠ ε
2. |xy| ≤ n
3. for all k ≥ 0, the string xy^k z is also in L.
Proof: Let w = a1 a2 … am, where
each ai is an input symbol. Since we have m input symbols, the accepting path visits m+1
states, in sequence q0, q1, q2, …, qm, where q0 is the start state and qm is a final state.
Since |w| ≥ n, by the pigeonhole principle it is not possible for all these states to be distinct, since there
are only n different states. So at least one state repeats, forming a loop. Thus we can find two different
integers i and j with 0 ≤ i < j ≤ n such that qi = qj. Now we can break the string w = xyz as
follows:
x = a1 a2 a3 … ai
y = a(i+1) a(i+2) … aj (the loop string, since qi = qj)
z = a(j+1) a(j+2) … am
The relationships among the strings and states are given in the figure below:
x may be empty in the case that i = 0. Also z may be empty if j = n = m. However, y cannot be
empty, since i is strictly less than j.
Thus for any k ≥ 0, xy^k z is also accepted by the DFA A; that is, for a language L to be regular,
xy^k z must be in L for all k ≥ 0.
Applications of Pumping lemma:
1. It is useful to prove certain languages are non-regular.
2. It is possible to check whether a language accepted by FA is finite or infinite.
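Before the worked problems, here is a small sketch (my own illustration) of the usual proof pattern for L = {a^n b^n}: for every split w = xyz with |xy| ≤ n and |y| ≥ 1, y lies inside the leading a's, and pumping with k = 0 pushes the string out of L:

def in_L(s):                      # membership test for {a^n b^n | n >= 0}
    half = len(s) // 2
    return s == "a" * half + "b" * half

n = 5                             # pretend the DFA has n states
w = "a" * n + "b" * n
for i in range(n):                # all splits with |xy| <= n and |y| >= 1
    for j in range(i + 1, n + 1):
        x, y, z = w[:i], w[i:j], w[j:]
        assert not in_L(x + z)    # k = 0 removes some a's, so xz leaves L
print("every legal split fails, so L cannot be regular")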
Show that L = { a^n b^n | n ≥ 0 } is not regular.
Assume L is a regular language and let 'n' be the number of states in the FA accepting it.
Consider the string w = a^n b^n.
Since |w| = n + n = 2n ≥ n, we can split 'w' into xyz such that |xy| ≤ n and |y| ≥ 1, as
x = a^(n-1), y = a, z = b^n
where |x| = n-1 and |y| = 1, so that |xy| = n-1+1 = n ≤ n, which is true.
According to the pumping lemma, xy^k z ∈ L for all k ≥ 0.
If k = 0, the string 'y' does not appear, so the pumped string has n-1 a's followed by n b's;
ie: xy^0 z = a^(n-1) b^n.
But every string in L must have an equal number of a's and b's, so xy^0 z is not in L, which
is a contradiction to the assumption that the language is regular.
So the language L = { a^n b^n | n ≥ 0 } is not a regular language.
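The contradiction can be checked mechanically for a small value of n. The following Python sketch is illustrative only; n = 5 and the membership test in_L are assumptions made for the demo:

def in_L(s):                            # membership test for L = { a^n b^n | n >= 0 }
    half = len(s) // 2
    return s == "a" * half + "b" * half

n = 5                                   # playing the role of the pumping-lemma constant
w = "a" * n + "b" * n                   # w = a^n b^n, |w| = 2n >= n
x, y, z = w[:n - 1], w[n - 1], w[n:]    # the split x = a^(n-1), y = a, z = b^n
for k in range(3):
    pumped = x + y * k + z
    print(k, pumped, in_L(pumped))      # False for k = 0 and k = 2: pumping leaves L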
Show that L = { w | na(w) < nb(w) } is not regular.
Assume L is a regular language and let 'n' be the number of states in the FA accepting it.
Consider the string w = a^(n-1) b^n.
Since |w| = (n-1) + n = 2n-1 ≥ n, we can split 'w' into xyz such that |xy| ≤ n and |y| ≥ 1, as
x = a^(n-1), y = b, z = b^(n-1)
where |x| = n-1 and |y| = 1, so that |xy| = n-1+1 = n ≤ n, which is true.
According to the pumping lemma, xy^k z ∈ L for all k ≥ 0.
If k = 0, the string 'y' does not appear, so the pumped string has n-1 a's followed by n-1 b's;
ie: xy^0 z = a^(n-1) b^(n-1).
This string has an equal number of a's and b's, so na is not less than nb and the string is not in L,
which is a contradiction to the assumption that the language is regular.
So the language L = { w | na(w) < nb(w) } is not regular.
Show that L = { w | na(w) = nb(w) } is not regular.
We can prove that L is not regular by taking the string w = a^n b^n, n ≥ 0.
For the solution refer to Problem 1.
Show that L = { a^i b^j | i ≠ j } is not regular.
ie: i ≠ j means i > j or i < j, so we can take the string w = a^(n+1) b^n or w = a^(n-1) b^n.
The solution is similar to the previous problems.
Show that L = { a^n b^m c^(n+m) | n, m ≥ 0 } is not regular.
Assume L is a regular language. Since regular languages are closed under homomorphism, h(L)
must also be regular for any homomorphism h. Take h(a) = a, h(b) = a and h(c) = c.
Then h(L) = { a^n a^m c^(n+m) | n, m ≥ 0 }
ie: h(L) = { a^(n+m) c^(n+m) | n, m ≥ 0 } = { a^i c^i | i ≥ 0 }, which has the same form as
L = { a^i b^i | i ≥ 0 }.
Let 'n' be the number of states in the FA accepting h(L), and consider w = a^n c^n.
Since |w| = n + n = 2n ≥ n, we can split 'w' into xyz such that |xy| ≤ n and |y| ≥ 1, as
x = a^(n-1), y = a, z = c^n
where |x| = n-1 and |y| = 1, so that |xy| = n-1+1 = n ≤ n, which is true.
According to the pumping lemma, xy^k z ∈ h(L) for all k ≥ 0.
If k = 0, the string 'y' does not appear, so the number of a's becomes less than the number of c's;
ie: xy^0 z = a^(n-1) c^n, which is not in h(L).
This is a contradiction, so h(L) is not regular, and therefore the given language
L = { a^n b^m c^(n+m) | n, m ≥ 0 } is not regular.
Show that L = { ww | w ∈ (a+b)* } is not regular.
Assume L is a regular language and let 'n' be the number of states in the FA accepting it.
Consider the string w = a^n b^n, so that ww = a^n b^n a^n b^n.
Since |ww| = 4n ≥ n, we can split the string into xyz such that |xy| ≤ n and |y| ≥ 1, as
x = a^(n-1), y = a, z = b^n a^n b^n
where |x| = n-1 and |y| = 1, so that |xy| = n-1+1 = n ≤ n, which is true.
According to the pumping lemma, xy^k z ∈ L for all k ≥ 0.
If k = 0, the string 'y' does not appear, so the number of a's to the left of the first b becomes
smaller than the number of a's after the first b;
ie: the pumped string is a^(n-1) b^n a^n b^n, which cannot be written as uu for any string u.
This is a contradiction to the assumption that the language is regular.
So the language L = { ww | w ∈ (a+b)* } is not regular.
Show that L = { ww^R | w ∈ (a+b)* } is not regular.
Assume L is a regular language and let 'n' be the number of states in the FA accepting it.
Consider the string w = a^n b^n, therefore ww^R = a^n b^n b^n a^n.
Since |ww^R| = n+n+n+n = 4n ≥ n, we can split the string into xyz such that |xy| ≤ n and |y| ≥ 1, as
x = a^(n-1)
y = a
z = b^n b^n a^n
where |x| = n-1 and |y| = 1, so that |xy| = n-1+1 = n ≤ n, which is true.
According to the pumping lemma, xy^k z ∈ L for all k ≥ 0.
If k = 0, the string 'y' does not appear, so the number of leading a's becomes smaller than the
number of trailing a's;
ie: the pumped string is a^(n-1) b^n b^n a^n, which is not a palindrome and hence not of the form ww^R.
This is a contradiction to the assumption that the language is regular.
So the language L = { ww^R | w ∈ (a+b)* } is not regular.
Show that L = { a^(n!) | n ≥ 0 } is not regular.
Assume L is a regular language and let 'n' be the number of states in the FA accepting it.
Note: The speed of lexical analysis is a concern in compiler design, since only this phase reads the
source program character by character.
Discuss the various issues of lexical analysis.
1. The lexical analyzer reads the source program character by character to produce tokens.
2. Normally a lexical analyzer does not return a list of tokens in one shot; it returns a token
when the parser asks for one.
3. Normally the lexical analyzer does not return a comment as a token; it skips the comment and
returns the next token (which is not a comment) to the parser.
4. Correlating error messages: it can associate a line number with each error message. In
some compilers it makes a copy of the source program with the error messages inserted
at the appropriate positions.
5. If the source program uses a macro-preprocessor, the expansion of macros may be
performed by the lexical analyzer.
Role of Lexical Analyzer
Explain the role of lexical analyzer with a block diagram.
• Reads the input characters of the source program, groups them into lexemes and produces
as output a sequence of tokens.
• It interacts with the symbol table.
• Initially parser calls the lexical analyzer, by means of getNextToken command.
• In response to this command LA read characters from its input until it can identify the
next lexeme and produce a token for that lexeme, which can be returned to parser.
• It eliminates comments and white space.
Define the following terms:
i. Token
ii. Pattern
iii. Lexeme
Token: It describes the class or category of input string. A token is a pair consisting of a token
name and an optional attribute value.
For example, identifier, keywords, constants are called tokens.
Pattern: A set of rules that describes the tokens. It is a description of the form that the lexemes of a
token may take.
Example: letter [A-Za-z].
Lexeme: Sequence of characters in the source program that are matched with the pattern of the
token.
Example: int, a, num, ans etc.
Token representation:
In many programming languages, the following classes cover most or all of the tokens:
i. One token for each keyword; The pattern for a keyword is the same as the keyword itself.
ii. Tokens for the operators, either individually or in classes such as the token comparison.
iii. One token representing all identifiers.
iv. One or more tokens representing constants, such as numbers and literal.
v. Tokens for each punctuation symbol, such as left and right parentheses, comma, and
semicolon.
Attributes for tokens
A token has only a single attribute that is a pointer to the symbol-table entry in which the
information about the token is kept.
Example: The token names and associated attribute values for the statement E = M * C ** 2 are
written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
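A toy lexer sketch in Python can produce exactly these pairs (the token names follow the example above, while the regular expressions and the symbol-table handling are simplified assumptions made for the demo):

import re

token_spec = [
    ("number",    r"\d+"),
    ("id",        r"[A-Za-z_]\w*"),
    ("exp_op",    r"\*\*"),             # must be tried before mult_op
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("ws",        r"\s+"),
]
pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in token_spec)

symbol_table = {}
def tokenize(src):
    for m in re.finditer(pattern, src):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                            # the lexer strips white space
        if kind == "id":
            ptr = symbol_table.setdefault(lexeme, len(symbol_table))
            yield (kind, ptr)                   # attribute: symbol-table entry
        elif kind == "number":
            yield (kind, int(lexeme))           # attribute: the integer value
        else:
            yield (kind,)                       # operators need no attribute

print(list(tokenize("E = M * C ** 2")))
# [('id', 0), ('assign_op',), ('id', 1), ('mult_op',), ('id', 2), ('exp_op',), ('number', 2)]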
Lexical errors:
1. It is hard for a lexical analyzer to tell, without the aid of other components, that there is a
source-code error. For instance, if the string fi is encountered for the first time in a C
program in the context:
fi ( a == f ( x ) )
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared
function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return
the token id to the parser and let some other phase of the compiler, probably the parser in
this case, handle the error due to transposition of the letters.
2. Suppose a situation arises in which the lexical analyzer is unable to proceed because none of
the patterns for tokens matches any prefix of the remaining input. The simplest recovery
strategy is "panic mode" recovery. We delete successive characters from the remaining input,
until the lexical analyzer can find a well-formed token at the beginning of what input is left.
This recovery technique may confuse the parser, but in an interactive computing environment
it may be quite adequate.
The forward pointer moves ahead to search for the end of the lexeme. As soon as a blank space is
encountered, it indicates the end of the lexeme; in the example above, the lexeme is identified as soon
as the forward pointer encounters a blank space.
The forward pointer (fp) ignores white space and moves ahead. Then both fp and the
lexeme-beginning pointer (bp) are set to the start of the next token.
There are two input buffering schemes:
1. One-buffer scheme
2. Two-buffer scheme
One-buffer scheme:
Here only one buffer is used to store the input string. The problem with this scheme is that if
a lexeme is very long, it crosses the buffer boundary. To scan the remaining part of the lexeme
the buffer has to be refilled, which overwrites the first part of the lexeme and may result in
loss of data.
Two Buffer scheme:
Why is the two-buffer scheme used in lexical analysis? Explain.
Because of the amount of time taken to process characters and the large number of characters
that must be processed during the compilation of a large source program, a specialized two-buffer
technique has been developed to reduce the amount of overhead required to process a single
input character.
Here a buffer (array) is divided into two N-character halves, where N is the number of
characters in one disk block, e.g. 4096 bytes. If fewer than N characters remain in the
input file, then a special character, represented by eof, marks the end of the source file; it is
different from any input character.
One read command is used to read N characters. Two pointers are maintained: beginning
of the lexeme pointer and forward pointer.
Initially, both pointers point to the first character of the next lexeme.
Using this method we can overcome the problem faced by the one-buffer scheme: even
when a lexeme crosses a buffer boundary, scanning can continue from the beginning of the next
half while the contents of the previous half are still available, so there is no scope for loss of
any data.
Sentinels:
In the two-buffer scheme we must check the forward pointer each time it is incremented. Thus
we make two tests: one for the end of the buffer, and one to determine what character was read.
We can combine these two tests if we use a sentinel character at the end of each buffer half.
Sentinel is a special character inserted at the end of buffer, that cannot be a part of source
program; eof is used as sentinel.
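A small Python sketch illustrates the idea: the scanning loop makes a single comparison per character, and only when that comparison sees the sentinel does it decide between "reload the other half" and "real end of input". The buffer size N = 8 and the reload logic are simplified assumptions made for the demo:

N = 8                                   # one "disk block" per half, kept tiny here
EOF = "\0"                              # the sentinel character

def load(buf, half, text, pos):
    chunk = text[pos:pos + N]
    base = half * (N + 1)
    buf[base:base + len(chunk)] = list(chunk)
    buf[base + len(chunk)] = EOF        # a sentinel ends every half
    return pos + len(chunk)

text = "position := initial + rate * 60"
buf = [EOF] * (2 * (N + 1))             # two halves, each with a sentinel slot
pos = load(buf, 0, text, 0)
forward, half, out = 0, 0, []
while True:
    c = buf[forward]
    if c == EOF:                        # one test covers both cases:
        if forward == half * (N + 1) + N:       # sentinel at the end of a half
            half = 1 - half
            pos = load(buf, half, text, pos)
            forward = half * (N + 1)
            continue
        break                           # otherwise it is the real end of input
    out.append(c)
    forward += 1
print("".join(out) == text)             # True: every character was scanned once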
Look ahead code:
Operations on Languages:
Give the formal definitions of operations on languages with notations.
In lexical analysis the most important operations on languages are:
i. Union
ii. Concatenation
iii. Star closure
iv. Positive closure.
These operations are formally defined as follows:
Union: L ∪ M = { s | s is in L or s is in M }
Concatenation: LM = { st | s is in L and t is in M }
Star (Kleene) closure: L* = L^0 ∪ L^1 ∪ L^2 ∪ …, where L^0 = { ε } and L^i = L^(i-1)L
Positive closure: L+ = L^1 ∪ L^2 ∪ L^3 ∪ …
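For small finite languages the four operations can be tried out directly. The Python sketch below is illustrative; since L* and L+ are infinite, they are truncated at i = 3:

L = {"a", "ab"}
M = {"b", ""}

union = L | M                                    # L ∪ M
concat = {s + t for s in L for t in M}           # LM

def power(lang, i):                              # L^i = L concatenated i times
    out = {""}
    for _ in range(i):
        out = {s + t for s in out for t in lang}
    return out

star = set().union(*(power(L, i) for i in range(4)))     # L*, truncated
plus = set().union(*(power(L, i) for i in range(1, 4)))  # L+, truncated
print(sorted(union), sorted(concat), sorted(plus - star))  # [] : L+ is contained in L*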
Regular Expressions
We use regular expressions to describe tokens of a programming language.
A regular expression is built up of simpler regular expressions (using defining rules)
Algebraic properties:
r|s = s|r (| is commutative)
r|(s|t) = (r|s)|t (| is associative)
(rs)t = r(st) (concatenation is associative)
r(s|t) = rs|rt
(s|t)r = sr|tr (concatenation distributes over |)
εr = r
rε = r (ε is the identity for concatenation)
r* = (r|ε)* (ε is guaranteed in a closure)
r** = r* (* is idempotent)
OR
letter → [A-Za-z_]
digit → [0-9]
digits → ( digit )+
optionalFraction → . digits | ε
optionalExponent → ( E ( + | - | ε ) digits ) | ε
number → digits optionalFraction optionalExponent
OR
digit → [0-9]
digits → digit+
number → digits ( . digits ) ? ( E [+-]? digits )?
Recognition of Tokens
Our current goal is to perform the lexical analysis needed for the following grammar.
Specification of Token
To specify tokens Regular Expressions are used.
Recognition of Token: To recognize tokens there are 2 steps
1. Design of Transition Diagram
2. Implementation of Transition Diagram
Transition Diagrams
A transition diagram is similar to a flowchart for (a part of) the lexer. We draw one for each
possible token. It shows the decisions that must be made based on the input seen. The two main
components are circles representing states (think of them as decision points of the lexer) and arrows
representing edges (think of them as the decisions made).
It is fairly clear how to write code corresponding to this diagram. You look at the first character, if
it is <, you look at the next character. If that character is =, you return (relop, LE) to the parser. If
instead that character is >, you return (relop, NE). If it is another character, return (relop, LT) and
adjust the input buffer so that you will read this character again since you have not used it for the
current lexeme. If the first character was =, you return (relop, EQ).
Write the transition diagram to recognize the token given below:
i. relop (relational operator)
ii. Identifier and keyword
iii. Unsigned number
iv. Integer constant
v. Whitespace
i. Transition diagram for relop:
v. Whitespace:
Whitespace characters are represented by delimiter, where delim includes the characters like
blank, tab, new line and other characters that are not considered by the language design to be part
of any token.
There are two ways we can handle reserved words that look like identifiers:
1. Install the reserved words in the symbol table initially: When we find an identifier, a call
to installID( ) function places that identifier into the symbol table if it is not already there
and returns a pointer to the symbol table entry. The function getToken( ) examines the
symbol table for the lexeme found, and returns token name as either id or one of the
keyword token that was initially installed in the table.
2. Create separate transition diagrams for each keyword
Architecture of a transition diagram based lexical analyzer
The idea is that we write a piece of code for each decision diagram. This piece of code contains a
case for each state, which typically reads a character and then goes to the next case depending on
the character read. nextchar() is used to read a next char from the input buffer. The numbers in the
circles are the names of the cases. Accepting states often need to take some action and return to the
parser. Many of these accepting states (the ones with stars) need to restore one character of input.
This is called retract() in the code.
What should the code for a particular diagram do if at one state the character read is not one of
those for which a next state has been defined? That is, what if the character read is not the label of
any of the outgoing arcs? This means that we have failed to find the token corresponding to this
diagram.
The code calls fail( ); this is not an error case. It simply means that the current input does not match
this particular token. So we need to go to the code section for another diagram after restoring the
input pointer, so that we start the next diagram at the point where this failing diagram started. If
we have tried all the diagrams, then we have a real failure and need to print an error message and
attempt recovery.
Coding part:
TOKEN getRelop( )
{
    TOKEN retToken = new(RELOP);
    while (1)
    {   /* repeat character processing until a return or failure occurs */
        switch (state)
        {
            case 0: c = nextChar( );
                if ( c == '<' ) state = 1;
                else if ( c == '=' ) state = 5;
                else if ( c == '>' ) state = 6;
                else fail( );        /* lexeme is not a relational operator */
                break;
            case 1: c = nextChar( );
                if ( c == '=' ) state = 2;
                else if ( c == '>' ) state = 3;
                else state = 4;      /* any other character: the operator is < alone */
                break;
            ……..
            ……..
            case 8: retract( );      /* we have read one character too many */
                retToken.attribute = GT;
                return (retToken);
        }
    }
}
----------------------------------------------------------------------------------------------------------------
Where the A production represents any number of a's and b's and is given by:
A → aA | bA | ε
Therefore the resulting grammar is G = (V, T, P, S) where
V = { S, A }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → AabA
A → aA | bA | ε
iii. Obtain CFG for the language L = { ( 011 + 1)* 01 }
L can be re-written as:
S → A01
A → 011A | 1A | ɛ
iv. Obtain a CFG for the language L = { w | w ∈ (0+1)* with at least one occurrence of '101' }.
The regular expression corresponding to the language is (0+1)* 101 (0+1)*.
Where the A production represents any number of 0's and 1's and is given by:
A → 0A | 1A | ε
Therefore the resulting grammar is G = (V, T, P, S) where
V = { S, A }, T = { 0, 1 }, S is the start symbol, and P is the set of productions shown below:
S → A101A
A → 0A | 1A | ε
v. Obtain a CFG for the language L = { wab | w ∈ (a+b)* }.
OR
Obtain a CFG for the language containing strings of a's and b's ending with 'ab'.
The resulting grammar is G = (V, T, P, S) where
V = { S, A }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → Aab
A → aA | bA | ε
vi. Obtain a CFG for the language containing strings of a's and b's ending with 'ab' or 'ba'.
OR
Obtain the context free grammar for the language L = { XY | X ∈ (a+b)* and Y ∈ { ab, ba } }.
The regular expression corresponding to the language is w(ab + ba), where w is in (a+b)*.
X → aX | bX | ε
Y → ab | ba
The resulting grammar is G = (V, T, P, S) where
V = { S, X, Y }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → XY
X → aX | bX | ε
Y → ab | ba
Obtain the CFG for the language L = { w | Na(w) = Nb(w), w ∈ (a+b)* }
OR
Obtain the CFG for the language containing strings of a's and b's with an equal number of a's and b's.
Answer:
To get an equal number of a's and b's, we know that there are 3 cases:
i. The empty string ε has an equal number of a's and b's.
ii. An equal number of a's followed by an equal number of b's.
iii. An equal number of b's followed by an equal number of a's.
The corresponding productions for these 3 cases can be written as
S → ε
S → aSb
S → bSa
Using these productions, strings of the form ε, ab, ba, aabb, bbaa, etc. can be generated.
But strings such as abba, baab, etc., where the string starts and ends with the same symbol,
cannot be generated from these productions. To generate this type of string, we need to
concatenate two such sentential forms, one generating equal a's then b's and one generating
equal b's then a's, or vice versa. The corresponding production is S → SS.
The resulting grammar corresponding to the language with an equal number of a's and b's
is G = (V, T, P, S) where
V = { S }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → ε
S → aSb
S → bSa
S → SS
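A quick experimental check of this grammar: the Python sketch below derives random strings from S → ε | aSb | bSa | SS and verifies that every generated string has equal counts of a's and b's (the derivation strategy and the depth cap are arbitrary choices made for the demo):

import random

rules = {"S": ["", "aSb", "bSa", "SS"]}

def derive(sym="S", depth=6):
    # at depth 0 force the ε alternative so the derivation terminates
    body = random.choice(rules[sym] if depth > 0 else [""])
    return "".join(derive(c, depth - 1) if c in rules else c for c in body)

for _ in range(5):
    w = derive()
    print(repr(w), w.count("a") == w.count("b"))   # always True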
Obtain the CFG for the language L = { w | Na(w) = Nb(w) + 1, w ∈ (a+b)* }
The language contains strings of a's and b's with the number of a's one more than the number of b's.
Here we should have one extra 'a' either at the beginning, at the end or in the middle.
We can write the A production with an equal number of a's and b's as
A → ε | aAb | bAa | AA
and finally insert one extra 'a' between two such A's; ie:
S → AaA
Obtain the context free grammar for the language L = { 0^m 1^m 2^n | m, n ≥ 1 }
We know the CFG corresponding to the language 0^m 1^m, m ≥ 1, by referring to the basic
building-block grammar of a^n b^n, n ≥ 1.
The equivalent A production is:
A → 0A1
A → 01
Here B represents any number of 2's with at least one 2 (n ≥ 1), which is similar to the a^n grammar.
The equivalent B production is:
B → 2B
B → 2
So the context free grammar for the language L = { 0^m 1^m 2^n | m, n ≥ 1 } is G = (V, T, P, S) where
V = { S, A, B }, T = { 0, 1, 2 }, S is the start symbol, and P is the set of productions shown below:
S → AB
A → 0A1 | 01
B → 2B | 2
Obtain the context free grammar for the language L = { a^(2n) b^m | m, n ≥ 0 }
Answer:
Since 'a' is expressed in terms of 'n' and 'b' in terms of 'm', the two parts are independent, and we
can re-write the language as:
Case 2: when i < j
where C → 1C | 1
So the context free grammar for the language L = { 0^i 1^j | i ≠ j, where i, j ≥ 0 }
is G = (V, T, P, S) where
V = { S, A, B, C }, T = { 0, 1 }, S is the start symbol, and P is the set of productions shown below:
S → AB | BC
A → 0A | 0
B → 0B1 | ε
C → 1C | 1
Obtain the context free grammar for the language L = { a^n b^m | n = 2m, where m ≥ 0 }
Answer:
By substituting n = 2m we have
L = { a^(2m) b^m | m ≥ 0 }
Here for every two a's one b has to be generated. This is obtained by wrapping S with 'aa' on
the left and one 'b' on the right. The minimum string is ε.
So the context free grammar for the language L = { a^n b^m | n = 2m, where m ≥ 0 }
is G = (V, T, P, S) where
V = { S }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → aaSb
S → ε
Obtain the context free grammar for the language L = { a^n b^m | n ≠ 2m, where n, m ≥ 1 }
Answer:
Here n ≠ 2m means n > 2m or n < 2m, which results in two possible cases for the language L.
Case 1: when n > 2m, we can re-write the language L by taking n = 2m + 1:
L = { a^(2m+1) b^m | m ≥ 1 }; by referring to the basic building-block grammar example, the resulting
production (for a^(2m) b^m) is given by:
A → aaAb
The minimum string, when m = 1, is 'aaab';
ie: A → aaab
is G = (V, T, P, S) where
V = { S, A }, T = { a, b, c }, S is the start symbol, and P is the set of productions shown below:
S → aSc | A
A → bAcc | ε
Obtain the context free grammar for the language L = { w a^n b^n w^R | w is in (0+1)* and n ≥ 0 }
Answer: we can re-write the language L as shown.
The corresponding A production is given by A → aAb | ε; the minimum value is ε, when n = 0.
We can insert this substring's A production between the w w^R productions represented by S.
The corresponding S production is S → 0S0 | 1S1 | A
Note: In the S production the minimum value is A, when w w^R results in ε; ie: only the middle
substring A appears.
So the context free grammar for the language L = { w a^n b^n w^R | w is in (0+1)* and n ≥ 0 }
is G = (V, T, P, S) where
V = { S, A }, T = { a, b, 0, 1 }, S is the start symbol, and P is the set of productions shown below:
S → 0S0 | 1S1 | A
A → aAb | ε
Obtain the context free grammar for the language L = { a^n w w^R b^n | w is in (0+1)* and n ≥ 2 }
The S1 productions are: S1 → AB
A → aAb | ε
B → cB | c
The S2 productions are: S2 → AC
C → cCd | ε
So the context free grammar for the language
L = { a^n b^n c^i | n ≥ 0, i ≥ 1 } ∪ { a^n b^n c^m d^m | n, m ≥ 0 }
is G = (V, T, P, S) where
V = { S, S1, S2, A, B, C }, T = { a, b, c, d }, S is the start symbol, and P is the set of productions
shown below:
S → S1 | S2
S1 → AB
A → aAb | ε
B → cB | c
S2 → AC
C → cCd | ε
Obtain the context free grammar for the language L1L2, where L1 = { a^n b^n c^i | n ≥ 0, i ≥ 1 } and L2
= { 0^n 1^(2n) | n ≥ 0 }
Answer:
The S1 productions are: S1 → AB
A → aAb | ε
B → cB | c
The S2 production is: S2 → 0S211 | ε
For the concatenation L1L2 the start production is S → S1S2.
A → aAb | ab
B → bB | ε ; and the S production is S → aAB
So the context free grammar for the language L = { a^(n+2) b^m | n ≥ 0, m > n } is G = (V, T, P, S)
where
V = { S, A, B }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → aAB
A → aAb | ab
B → bB | ε
******* Obtain the context free grammar for the language L = { a^n b^m | n ≥ 0, m > n }
n = 0: m = 1, 2, 3, … → b, bb, bbb, … = ε b+
n = 1: m = 2, 3, 4, … → abb, abbb, abbbb, … = ab b+
n = 2: m = 3, 4, 5, … → aabbb, aabbbb, aabbbbb, … = aabb b+
In general: a^n b^n b+, where n ≥ 0.
We observe that the language consists of strings of a's and b's with n a's followed by
n b's, which are in turn followed by any number of b's with at least one b:
L = { a^n b^n b+ | n ≥ 0 }
So the productions are S → AB, A → aAb | ε, B → bB | b.
******* Obtain the context free grammar for the language L = { a^n b^(n-3) | n ≥ 3 }
Answer:
L = { aaa, aaaab, aaaaabb, aaaaaabbb, ………………………………….. }
So we can re-write the language as
L = { aaa a^n b^n | n ≥ 0 }
So the context free grammar for the language L = { a^n b^(n-3) | n ≥ 3 } is G = (V, T, P, S) where
V = { S, A }, T = { a, b }, S is the start symbol, and P is the set of productions shown below:
S → aaaA
A → aAb | ε
******* Obtain the context free grammar for the language L = { w ∈ (a+b)* | |w| mod 3 ≠ |w| mod 2 }
DFA:
Note: The derivation process may end whenever one of the following things happens.
i. The working string no longer contains any non-terminal symbols (including, as a special case,
when the working string is ε); ie: the working string has been generated.
ii. There are non-terminal symbols in the working string but there is no match with the left-hand
side of any rule in the grammar. For example, if the working string were AaBb, this would
happen if the only left-hand side were C.
Left Most Derivation (LMD): In derivation process, if a leftmost variable is replaced at every step,
then the derivation is said to be leftmost.
Example: E → E+E | E*E | a | b
Let us derive a string a+b*a by applying LMD.
E => E*E
=> E+E*E
=> a+E*E
=> a+b*E
=> a+b*a
Right Most Derivation (RMD): In the derivation process, if a rightmost variable is replaced at every
step, then the derivation is said to be rightmost.
Example: E → E+E | E*E | a | b
Let us derive a string a+b*a by applying RMD.
E => E+E
=> E+E*E
=> E+E*a
=> E+b*a
=> a+b*a
Sentential form: For a context free grammar G, any string 'w' in (V ∪ T)* that appears at some step
of a derivation from S is called a sentential form; a sentential form consisting only of terminal
symbols is called a sentence.
Two ways we can generate sentence:
i. Left sentential form
ii. Right sentential form
Example: S => AB
=> aAbB
=> abB
=> abbB
=> abb
Here {S, AB, aAbB, abB, abbB, abb } can be obtained from start symbol S, Each string in the set is
called sentential form.
Left sentential form: For a context free grammar G, any string 'w' in (V ∪ T)* that appears in
some step of a Left Most Derivation is called a left sentential form.
Example: E => E*E
=> E+E*E
=> a+E*E
=> a+b*E
=> a+b*a
Left sentential forms = { E, E*E, E+E*E, a+E*E, a+b*E, a+b*a }
Right sentential form: For a context free grammar G, any string 'w' in (V ∪ T)* that appears in
some step of a Right Most Derivation is called a right sentential form.
Example: E => E+E
=> E+E*E
=> E+E*a
=> E+b*a
=> a+b*a
Right sentential forms = { E, E+E, E+E*E, E+E*a, E+b*a, a+b*a }
PARSE TREE: ( DERIVATION TREE)
What is parse tree?
The derivation process can be shown in the form of a tree. Such trees are called derivation trees or
Parse trees.
Example: E → E+E | E*E | a | b
The Parse tree for the LMD of the string a+b*a is as shown below:
YIELD OF A TREE
What is Yield of a tree?
The yield of a tree is the string of terminal symbols obtained by only reading the leaves of the tree
from left to right without considering the ɛ symbols.
Example:
Problem 1:
Consider the following grammar G:
S → aAS |a
A→ SbA |SS |ba
Obtain: i. LMD; ii. RMD; iii. Parse tree for LMD; iv. Parse tree for RMD, for the string
'aabbaa'.
Problem 2:
Design a grammar for valid expressions over the operators - and /. The arguments of expressions are
valid identifiers over the symbols a, b, 0 and 1. Derive the LMD and RMD for the string w = (a11 - b0) /
(b00 - a01). Write the parse tree for the LMD.
Answer:
Grammar for valid expressions:
E → E - E | E / E | (E) | I
I → a | b | Ia | Ib | I0 | I1
Problem 3:
Consider the following grammar G:
E → + EE | * EE | - EE | x | y
Find: i. LMD; ii. RMD; iii. Parse tree, for the string '+*-xyxy'
Answer:
E → + EE | * EE | - EE | x | y
LMD: RMD:
Problem 4:
Show the derivation tree for the string 'aabbbb' with the grammar:
S → AB |ɛ
A → aB
B → Sb
Give a verbal description of the language generated by this grammar.
Answer: Derivation tree:
Problem 6:
Consider the following grammar:
S → AbB
A →aA |ɛ
B → aB | bB |ɛ
Give LMD, RMD and parse tree for the string aaabab
LMD: RMD:
Obtain the context free grammar for generating integers and derive the integer 1278 by applying
LMD.
The context free grammar corresponding to the language containing the set of integers is G = (V, T, P,
S) where V = { I, N, S, D }, T = { +, -, 0, 1, …, 9 }, I is the start symbol, and P is the set of productions
shown below:
I → N | SN
S → + | - | ε
N → D | DN | ND
D → 0 | 1 | 2 | 3 | ……….| 9
LMD for the integer 1278:
I => N
=> ND
=> NDD
=> NDDD
=> DDDD
=> 1DDD
=> 12DD
=> 127D
=> 1278
AMBIGUOUS GRAMMAR
Sometimes a context free grammar may produce more than one parse tree for some (or all) of the
strings it generates. When this happens, we say that the grammar is ambiguous. More precisely, a
grammar G is ambiguous if there is at least one string in L(G) for which G
produces more than one parse tree.
***What is an ambiguous grammar?
A context free grammar G is an ambiguous grammar if and only if there exists at least one string
'w' in L(G) for which grammar G produces two or more different parse trees by applying either
LMD or RMD.
Show how ambiguity in grammars are verified with an example.
Testing of ambiguity in a CFG by the following rules:
i. Obtain a string 'w' in L(G) by applying LMD twice and construct the parse trees. If the two parse
trees are different, then the grammar is ambiguous.
ii. Obtain a string 'w' in L(G) by applying RMD twice and construct the parse trees. If the
two parse trees are different, then the grammar is ambiguous.
iii. Obtain the same string 'w' by one LMD and one RMD, and construct the parse tree
for each derivation. If the two parse trees are different,
then the grammar is ambiguous.
Show that the following grammar is ambiguous:
S → AB | aaB
A → a | Aa
B→b
Let us take the string w = aab.
This string has two different parse trees, obtained by applying LMD twice, so the grammar is
ambiguous.
Associativity and Precedence Priority in CFG:
Example:
E → E+E| E-E
E →E*E
E →a|b|c
Associativity:
Let us consider the string : a + b + c
Parse Tree for LMD1: Parse Tree for LMD2:
The two different parse trees exist because the associativity rule fails. For the given
string a + b + c there is an operator on either side of the operand 'b'. Which operator should
be associated with the operand b? Either the left-side operator (left associative) or the
right-side operator (right associative). The first parse tree
is the correct one, where the leftmost '+' is evaluated first.
How to resolve the associativity rule:
E →E+E
E →a|b|c
Here the grammar is not defined in the proper order, ie: the growth of the tree is in either left
direction or right direction.
The growth of the first parse tree is in the left direction; that means it is left associative. The growth
of the second parse tree is in the right direction, ie: right associative.
For normal associative rule is left associative, so we have to restrict the growth of parse tree in right
direction by modifying the above grammar as:
E →E+I|I
I→ a | b | c
The parse tree corresponding to the string: a+b+c:
The growth of the parse tree is in the left direction, since the grammar is left recursive; therefore it is
left associative. Only one parse tree exists for the given string, so the grammar is
unambiguous.
Note: For the operators to be left associative, grammar should be left recursive. Also for the
operators to be right associative, grammar should be right recursive.
Left Recursive grammar: A production in which the leftmost symbol of the body is same as the
non-terminal at the head of the production is called a left recursive production.
Example: E → E + T
Right Recursive grammar: A production in which the rightmost symbol of the body is same as
the non-terminal at the head of the production is called a right recursive production.
Example: E → T + E
Precedence of operators in CFG:
Let us consider the string: a + b * c
LMD 1 for the string: a+b*c LMD 2 for the string: a+b*c
The first parse tree is valid, because the highest precedence operator '*' is evaluated first, compared
to '+' (see the lower level of the parse tree, where '*' is evaluated first). The second parse tree is not
valid, since the expression containing '+' is evaluated first. So here we get two parse trees because
precedence is not taken care of.
So if we take care of associativity and precedence of operators in the CFG, then the grammar is
unambiguous.
NOTE:
Normal precedence rule: If we have operators such as +, -, *, / and ↑, then the highest precedence
operator (↑) is evaluated first.
Next the higher precedence operators * and / are evaluated. Finally the least precedence operators + and -
are evaluated.
Normal associativity rule: the grammar should be left recursive (left associative).
At the first level we generate all the '+'s:
E → E + T | T
Similarly, at the second level we generate all the '*'s:
T → T * F ; * is left associative.
If the expression does not contain any '*'s, then we bypass the production T → T * F:
T → F
Finally the second-level grammar is
T → T * F | F
Third level:
F →a|b|c
So the resultant un-ambiguous grammar is:
E →E+T|T
T →T*F|F
F →a|b|c
So the operator which is closest to the start symbol has least precedence and the operator which is
farthest away from start symbol has the highest precedence.
Un-Ambiguous Grammar:
For a grammar to be unambiguous we have to resolve two properties:
i. Associativity of operators: resolved by writing the grammar with the proper recursion (left
recursive for left associative operators, right recursive for right associative operators).
ii. Precedence of operators: resolved by writing the grammar in different levels.
Is the following grammar ambiguous?
If the grammar is ambiguous, obtain the un-ambiguous grammar assuming normal precedence and
associativity.
E →E+E
E →E*E
E →E/E
E →E-E
E → (E) | a | b | c
Answer:
Let us consider the string: a + b * c
LMD 1 for the string: a+b*c LMD 2 for the string: a+b*c
For the given string there exists two different parse trees, by applying LMD twice. So the above
grammar is ambiguous.
The equivalent unambiguous grammar is obtained by making all the operators left associative
and writing the operators +, - at the first level and *, / at the next level.
Equivalent unambiguous grammar:
E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | a | b | c
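The effect of the two levels can be seen by evaluating with this grammar. The Python sketch below is a hand-written parser for the unambiguous grammar (the left recursion is realised as a loop, which also makes + - and * / left associative; the values chosen for a, b and c are arbitrary assumptions for the demo):

values = {"a": 2, "b": 3, "c": 4}

def parse(src):
    toks = list(src.replace(" ", ""))
    pos = [0]
    def peek(): return toks[pos[0]] if pos[0] < len(toks) else None
    def eat():  pos[0] += 1; return toks[pos[0] - 1]
    def F():                            # F -> (E) | a | b | c
        if peek() == "(":
            eat(); v = E(); eat()       # consume '(' E ')'
            return v
        return values[eat()]
    def T():                            # T -> T * F | T / F | F
        v = F()
        while peek() in ("*", "/"):     # loop = left associativity
            v = v * F() if eat() == "*" else v / F()
        return v
    def E():                            # E -> E + T | E - T | T
        v = T()
        while peek() in ("+", "-"):
            v = v + T() if eat() == "+" else v - T()
        return v
    return E()

print(parse("a + b * c"))   # 14, not 20: '*' binds tighter than '+'

Because F( ) is reached from T( ), and T( ) from E( ), the operators farthest from the start symbol bind tightest, which is exactly the precedence rule stated above.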
Is the following grammar ambiguous?
If the grammar is ambiguous, obtain the unambiguous grammar assuming the operators + and - are
left associative and * and / are right associative, with normal precedence.
E →E+E
E →E*E
E →E/E
E →E-E
E → (E ) | a | b| c
Ambiguous grammar: see the previous answer.
Equivalent unambiguous grammar:
E → E + T | E - T | T
T → F * T | F / T | F
F → (E) | a | b | c
LMD 1 for the string 'aab': LMD 2 for the string 'aab':
RMD 1 for the string 'aab': RMD 2 for the string 'aab':
The above grammar is ambiguous, since we get two different parse trees for the same string 'aab' by
applying LMD twice.
Two LMDs:
Here the evaluation starts from the right side; therefore the operator is right associative.
Show that the following grammar is ambiguous. Also find the unambiguous grammar equivalent to
it under the normal precedence and associativity rules.
E → E + E | E - E
E → E * E | E / E
E → E ↑ E
E → (E) | a | b
Answer:
We already proved that the above grammar is ambiguous
Equivalent unambiguous grammar:
E → E + T | E - T | T
T → T * F | T / F | F
F → G ↑ F | G
G → (E) | a | b
The given string has two parse trees by applying LMD twice so the grammar is ambiguous;
Show that the following grammar is ambiguous using the string "ibtibtaea".
S → iCtS | iCtSeS | a
C→ b
Answer:
String w = ibtibtaea
The given string has two parse trees by applying LMD twice:
stmt → if expr then stmt | if expr then stmt else stmt | other
Terminals are keywords if, then and else.
Non terminals are expr and stmt.
Here "other" stands for any other statement. According to this grammar, a compound
conditional statement such as
if E1 then if E2 then S1 else S2
has two parse trees, as shown below:
In all programming languages with conditional statements of this form, the first parse tree is
preferred. The general rule is match each else with the closest unmatched then.
Unambiguous grammar for this if else statements:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt
Unambiguous grammar :
S → M |U
M → iEtMeM |a
U → iEtS | iEtMeU
E → b
LEFT RECURSION
A production in which the leftmost symbol of the body is same as the non-terminal at the head of
the production is called a left recursive production.
Example: E → E + T
Immediate Left recursive production:
A production of the form A → Aα is called an immediate left recursive production. Consider a
non-terminal A with two productions
A → Aα | β
where α and β are sequences of terminals and non-terminals that do not start with A.
Repeated application of this production results in a sequence of α's to the right of A. When A is
finally replaced by β, we have β followed by a sequence of zero or more α's.
Therefore a non-left recursive production for A → Aα | β is given by
A → βA’
A’ → αA’ | ε
Note: In general, we can eliminate any immediate left recursion of the form
A → Aα1 | Aα2 | Aα3 | ………… | Aαm | β1 | β2 | β3 | ………… | βn
(where no βi begins with A) by replacing the A productions with
A → β1A' | β2A' | β3A' | ………… | βnA'
A' → α1A' | α2A' | α3A' | ………….. | αmA' | ε
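This replacement rule is easy to express in code. The Python sketch below is an illustrative helper (productions are encoded as plain strings, and the head is assumed to be a single symbol):

def eliminate_immediate(head, bodies):
    alphas = [b[1:] for b in bodies if b.startswith(head)]      # A -> A alpha
    betas  = [b     for b in bodies if not b.startswith(head)]  # A -> beta
    if not alphas:
        return {head: bodies}           # nothing to eliminate
    new = head + "'"
    return {
        head: [b + new for b in betas],              # A  -> beta A'
        new:  [a + new for a in alphas] + ["ε"],     # A' -> alpha A' | ε
    }

# A -> Ac | Aad | bd | a  becomes  A -> bdA' | aA'  and  A' -> cA' | adA' | ε
print(eliminate_immediate("A", ["Ac", "Aad", "bd", "a"]))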
What is left recursion?
A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for
some string α.
Top down parsing methods cannot handle left recursive grammars, so a transformation is needed
to eliminate left recursion.
A grammar containing productions results in left recursive productions, after applying two or
more steps of derivations can be eliminated using the following algorithm.
Algorithm to eliminate left recursion from a grammar having no ε production:
Write an algorithm to eliminate left recursion from a grammar.
1. Arrange the non-terminals in some order A1, A2, . . . , An
2. for ( each i from 1 to n )
{
3.    for ( each j from 1 to i-1 )
      {
4.       replace each production of the form Ai → Aj γ by the productions
         Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are all the current Aj productions
      }
5.    eliminate the immediate left recursion among the Ai productions
}
Example:
S → Aa | b
A → Ac| Sd | a
By applying elimination algorithm,
Arrange the non-terminals as A1 = S and A2 = A
Since there is no immediate left recursion among S production, so nothing happens during the
outer loop for i =1.
For i =2, we substitute for S in A → Sd to obtain the following A productions.
A → Ac| Aad | bd | a
Eliminating the immediate left recursion among these A- productions yields the following
grammar
S → Aa | b
A → bdA’| aA’
A’ → cA’| adA’ | ε
C → CAB'CB | abB'CB | CC | aB | a
Eliminating the immediate left recursion among these C productions results in the new C
productions
C → abB'CBC' | aBC' | aC'
C' → AB'CBC' | CC' | ε
The equivalent non-left-recursive grammar is given by:
A → BC | a
B → CAB' | abB'
B' → CbB' | ε
C → abB'CBC' | aBC' | aC'
C' → AB'CBC' | CC' | ε
Eliminate left recursion from the following grammar.
Lp → no | Op Ls
Op → + | - | *
Ls → Ls Lp | Lp
For i = 1 and 2 nothing happens to the Lp and Op productions.
For i = 3:
By removing the immediate left recursion,
Ls → Lp Ls'
Ls' → Lp Ls' | ε
The equivalent non-left-recursive grammar is given by:
Lp → no | Op Ls
Op → + | - | *
Ls → Lp Ls'
Ls' → Lp Ls' | ε
S → aB | aC | Sd | Se
B → bBc| f
C → g
For i = 1, eliminating the immediate left recursion results in new S productions:
S → aBS' | aCS'
S' → dS' | eS' | ε
For i = 2 nothing happens to the B productions: B → bBc | f
For i = 3 nothing happens to the C production: C → g
The equivalent non-left-recursive grammar is given by:
S → aBS' | aCS'
S' → dS' | eS' | ε
B → bBc | f
C → g
LEFT FACTORING (Non-deterministic to Deterministic CFG conversion)
It is a grammar transformation method used in parser. When the choice between two alternative A
productions is not clear, we can rewrite the productions so to make the right choice.
A → αβ1 | αβ2 | ……….. | αβn | Γ
By left factoring this grammar, we get
A → αA' | Γ
A' → β1 | β2 | …………….. | βn
where Γ denotes the other alternatives that do not begin with α.
A predictive parser (a top-down parser without backtracking) insists that the grammar must be
left-factored.
What is left factoring?
Left factoring is removing the common left factor that appears in two or more productions of the
Same non-terminal.
Example: S → iEtSeS | iEtS | a
E → b
After left factoring:
S → iEtSS' | a
S' → eS | ε
E → b
Perform left factoring for the grammar.
E → E+T|T
T → id | id [ ] | id [ X ]
X → E,E|E
First eliminate the left recursion among the E productions; the grammar becomes:
E → TE'
E' → +TE' | ε
T → id | id [ ] | id [ X ]
X → E,E | E
After left factoring the grammar, we get
E → TE'
E' → +TE' | ε
T → id T'
T' → [ ] | [ X ] | ε
X → E X'
X' → ,E | ε
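One round of left factoring can likewise be sketched in Python (an illustrative helper: bodies are plain strings, and only alternatives that actually share a common prefix should be passed in):

import os

def left_factor(head, bodies):
    prefix = os.path.commonprefix(bodies)   # longest common left factor
    if not prefix:
        return {head: bodies}               # nothing to factor
    new = head + "'"
    tails = [b[len(prefix):] or "ε" for b in bodies]
    return {head: [prefix + new], new: tails}

# S -> iEtSeS | iEtS  becomes  S -> iEtSS'  and  S' -> eS | ε
print(left_factor("S", ["iEtSeS", "iEtS"]))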
Top-down parsing can be viewed as the problem of constructing a parse tree for the input
string, starting from the root (top) and working down towards the leaves.
Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input
string.
At each step of a top-down parse, the key problem is that of determining the production to
be applied for a non-terminal, say A.
Once an A-production is chosen, the rest of the parsing process consists of "matching" the
terminal symbols in the production body with the input string.
RECURSIVE-DESCENT PARSING
Backtracking is needed (If a choice of a production rule does not work, we backtrack to try
other alternatives.)
It is a general parsing technique, but not widely used.
Not an efficient parsing method.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop, so we
have to eliminate left recursion from a grammar
Recursive-Descent Parsing Algorithm:
Explain Recursive-Descent Parsing Algorithm.
void A( )
{
1. Choose an A-production, A → X1 X2 X3 …………………………….. Xk ;
2. for ( i = 1 to k )
{
3.    if ( Xi is a non-terminal )
4.       call procedure Xi( ) ;
5.    else if ( Xi equals the current input symbol a )
6.       advance the input to the next symbol ;
7.    else /* an error has occurred: report it and backtrack to try another A-production */
}
}
The leftmost leaf labeled c, matches the first symbol of input w, so we advance the input pointer to
a, the second symbol of w, and consider the next leftmost leaf labeled A.
Expand A using the first alternative A → ab to obtain the following tree:
Now we have a match for the second input symbol a, with the leftmost leaf labeled a, so we
advance the input pointer to d, third input symbol of w.
Now compare the current input symbol d against the next leaf labeled b. Since b does not match d
,we report failure and go back to A (Back tracking) to see whether there is another alternative for
A that has not been tried, but that might produce a match.
Now the leftmost leaf labeled a matches the current input symbol a, ie: the second symbol of w,
then advance the pointer to the next input symbol d.
Now the next leaf d matches the third input symbol d, later when it finds $ nothing is left out to be
read in the tree. Since it produces a parse tree for the string w, it halts and announce successful
completion of parsing.
return success
Write a recursive descent parser for the grammar:
S → aBc
B → bc | b
Input: abc
Begin with a tree consisting of a single node labeled S with input pointer pointing to first input
symbol a.
Since the input a matches with leftmost leaf labeled a, advance the pointer to next input symbol
b.
Expand B using the alternative B → bc
We have a match for the second input symbol b. Moving the pointer again, it finds a match for the third
symbol c. Now the pointer points to $, indicating the end of the string, but one more symbol c remains
to be read in the tree; thus it fails.
When the pointer is set to position 2, it checks the second alternative and generates the tree ;
Now the pointer moves to the 2nd symbol and finds a match, then advances to the 3rd symbol and
finds a match; later, when it encounters '$', nothing is left to be read in the tree. Thus it halts and
announces successful completion of parsing.
return success
Show that recursive descent parsing fails for the input string „acdb‟ for the grammar.
S → aAb
A → cd | c
The first input symbol a matches with left most leaf a and advance the pointer to next input
symbol c.
Now expand A using the second alternative A → c
We have a match for second input symbol c with left leaf node c. Advance the pointer to the next
input symbol d.
Now compare the input symbol d against the next leaf, labeled b. Since b does not match d, we
report failure and go back to A to see another alternative for A and reset the pointer to position 2.
FOLLOW ( ) FUNCTION:
FOLLOW(A) is defined as the set of terminal symbols that can appear immediately to the right of A
in some sentential form; ie: FOLLOW(A) = { a | S ⇒* αAaβ, where α and β are strings of grammar
symbols (terminals or non-terminals) }.
Rules used in computation of FOLLOW function:
1. For the start symbol S, place '$' in FOLLOW(S).
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B).
Consider the grammar:
E → E + T | T
T → T * F | F
F → (E) | id
The above grammar contains left recursive productions, so by eliminating the left recursion, grammar
G becomes:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Computation of the FOLLOW sets:
From T → FT':
FOLLOW(T') = FOLLOW(T) = { +, ), $ }
From T' → *FT' | ε:
FOLLOW(T') = FOLLOW(T')
Therefore FOLLOW(T') = { +, ), $ }
From T → FT':
FOLLOW(F) = FIRST(T') - { ε } ∪ FOLLOW(T) = { *, +, ), $ }; ie: by
applying the 3rd rule, since β = T' derives ε
From T' → *FT' | ε:
FOLLOW(F) = FIRST(T') - { ε } ∪ FOLLOW(T')
Therefore FOLLOW(F) = { *, +, ), $ }
NOTE:
For any non-terminal, the FOLLOW set is computed by selecting the productions in which that
non-terminal appears on the RHS of a production.
Non-terminal symbol | FIRST | FOLLOW
E  | { (, id } | { ), $ }
T  | { (, id } | { +, ), $ }
F  | { (, id } | { *, +, ), $ }
E' | { +, ε } | { ), $ }
T' | { *, ε } | { +, ), $ }
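The FIRST and FOLLOW rules can be applied mechanically by iterating until no set changes. The Python sketch below reproduces the table above for the same grammar (the grammar encoding, with [] for an ε body, is an assumption made for the demo):

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],        # [] denotes an ε body
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(grammar)
FIRST = {A: set() for A in NT}
FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                     # rule 1: $ goes into FOLLOW(start)

def first_of(seq):                       # FIRST of a string of symbols
    out = set()
    for X in seq:
        if X not in NT:
            out.add(X); return out       # a terminal stops the scan
        out |= FIRST[X] - {"ε"}
        if "ε" not in FIRST[X]:
            return out
    out.add("ε")                         # every symbol can vanish
    return out

changed = True
while changed:                           # iterate to a fixed point
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            for x in first_of(body):
                if x not in FIRST[A]:
                    FIRST[A].add(x); changed = True
            for i, X in enumerate(body):
                if X in NT:              # rules 2 and 3 for FOLLOW
                    rest = first_of(body[i + 1:])
                    add = (rest - {"ε"}) | (FOLLOW[A] if "ε" in rest else set())
                    if not add <= FOLLOW[X]:
                        FOLLOW[X] |= add; changed = True

for A in sorted(NT):
    print(A, sorted(FIRST[A]), sorted(FOLLOW[A]))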
T → T * F | F
F → (E) | id
F { (, id } { *, +, ), $ }
FOLLOW(E) = { ) } from F → (E) and FOLLOW(E) = { + } from E → E + T.
FOLLOW(T) = { * } from T → T * F and FOLLOW(T) = FOLLOW(E) from
E → E + T | T
Stmt_seq' → ; Stmt_sequence | ε
Stmt → s
Non-terminal symbol | FIRST | FOLLOW
Stmt_sequence | { s } | { $ }
Stmt_seq' | { ;, ε } | { $ }
Stmt | { s } | { ;, $ }
Construct the predictive parsing table by making necessary changes to the grammar given
below:
E →E + T |T
T →T * F|F
F → (E) | id
Also check whether the modified grammar is LL(1) grammar or not.
The above grammar contains left recursive productions, so we eliminate left recursive
productions.
T  { (, id } { +, ), $ }
F  { (, id } { *, +, ), $ }
E' { +, ε } { ), $ }
T' { *, ε } { +, ), $ }
Non-terminal | id | + | * | ( | ) | $
E  | E → TE' | | | E → TE' | |
E' | | E' → +TE' | | | E' → ε | E' → ε
T  | T → FT' | | | T → FT' | |
T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε
F  | F → id | | | F → (E) | |
The above modified grammar is LL(1) grammar, since the parsing table entry uniquely identifies a
production or signals an error
Construct the LL(1) parsing table for the grammar given below:
E →E * T |T
T → id + T | id
E' { *, ε } { $ }
T  { id } { *, $ }
T' { +, ε } { *, $ }
Construction of predictive parsing table:
Non-terminal | id | + | * | $
E  | E → TE' | | |
E' | | | E' → *TE' | E' → ε
T  | T → id T' | | |
T' | | T' → +T | T' → ε | T' → ε
Make the necessary modifications and construct the LL(1) parsing table for the resultant grammar.
By eliminating the left recursive productions:
E → TE'
E' → ATE' | ε
A → + | -
T → FT'
T' → MFT' | ε
M → *
F → (E) | num
E' { +, -, ε } { ), $ }
T  { (, num } { +, -, ), $ }
T' { *, ε } { +, -, ), $ }
A  { +, - } { (, num }
M  { * } { (, num }
F  { (, num } { *, +, -, ), $ }
Construction of predictive parsing table:
Input symbol:
Non-terminal | num | + | - | * | ( | ) | $
E  | E → TE' | | | | E → TE' | |
E' | | E' → ATE' | E' → ATE' | | | E' → ε | E' → ε
T  | T → FT' | | | | T → FT' | |
T' | | T' → ε | T' → ε | T' → MFT' | | T' → ε | T' → ε
A  | | A → + | A → - | | | |
M  | | | | M → * | | |
F  | F → num | | | | F → (E) | |
Construct the LL(1) parsing table for the grammar given below:
S → AaAb | BbBa
A →ε
B →ε
Answer:
Non-terminal symbol FIRST FOLLOW
S { a, b } { $}
A {ε} { a, b }
B {ε} { a, b }
Parsing Table:
Non-terminal | a | b | $
S | S → AaAb | S → BbBa |
A | A → ε | A → ε |
B | B → ε | B → ε |
Construct the LL(1) parsing table for the grammar given below:
S →A
A → aB
B → bBC | f
C →g
Non-terminal symbol FIRST
S {a}
A {a}
B { b, f }
C {g}
Note: Since the grammar is ε- free, FOLLOW sets are not required to be computed in order to enter
the productions into the parsing table.
Parsing Table:
Non-terminal | a | b | f | g | d
S | S → A | | | |
A | A → aB | | | | A → d
B | | B → bBC | B → f | |
C | | | | C → g |
Construct the LL(1) parsing table for the grammar given below:
S → aBDh
B → cC
C → bC | ε
D → EF
E →g|ε
F →f|ε
B {c} { g, f, h }
C { b, ε } { g, f, h }
D { g, f, ε } {h}
E { g, ε } { f, h }
F { f, ε } { h}
Parsing Table:
NT | a | b | c | g | f | h | $
S | S → aBDh | | | | | |
B | | | B → cC | | | |
C | | C → bC | | C → ε | C → ε | C → ε |
D | | | | D → EF | D → EF | D → EF |
E | | | | E → g | E → ε | E → ε |
F | | | | | F → f | F → ε |
T  { (, id } { +, ), $ }
F  { (, id } { *, +, ), $ }
E' { +, ε } { ), $ }
T' { *, ε } { +, ), $ }
Non-terminal | id | + | * | ( | ) | $
E  | E → TE' | | | E → TE' | |
E' | | E' → +TE' | | | E' → ε | E' → ε
T  | T → FT' | | | T → FT' | |
T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε
F  | F → id | | | F → (E) | |
iv. The above modified grammar is LL(1), since each parsing-table entry uniquely
identifies a production or signals an error.
v. Moves made by the predictive parser on the input id + id * id:
MATCHED | STACK | INPUT | ACTION
   | E$      | id + id * id$ |
   | TE'$    | id + id * id$ | Output E → TE'
   | FT'E'$  | id + id * id$ | Output T → FT'
   | idT'E'$ | id + id * id$ | Output F → id
id | T'E'$   | + id * id$    | match id
id | E'$     | + id * id$    | Output T' → ε
L  { (, a } { ) }
L' { ,, ε } { ) }
iv. The above modified grammar is LL(1), since each parsing-table entry uniquely
identifies a production or signals an error.
v. Moves made by the predictive parser on the input (a, (a, a)):
MATCHED | STACK | INPUT | ACTION
          | S$        | (a,(a,a))$ |
          | (L)$      | (a,(a,a))$ | Output S → (L)
(         | L)$       | a,(a,a))$  | match (
(         | SL')$     | a,(a,a))$  | Output L → SL'
(         | aL')$     | a,(a,a))$  | Output S → a
(a        | L')$      | ,(a,a))$   | match a
(a        | ,SL')$    | ,(a,a))$   | Output L' → ,SL'
(a,       | SL')$     | (a,a))$    | match ,
(a,       | (L)L')$   | (a,a))$    | Output S → (L)
(a,(      | L)L')$    | a,a))$     | match (
(a,(      | SL')L')$  | a,a))$     | Output L → SL'
(a,(      | aL')L')$  | a,a))$     | Output S → a
(a,(a     | L')L')$   | ,a))$      | match a
(a,(a     | ,SL')L')$ | ,a))$      | Output L' → ,SL'
(a,(a,    | SL')L')$  | a))$       | match ,
(a,(a,    | aL')L')$  | a))$       | Output S → a
(a,(a,a   | L')L')$   | ))$        | match a
(a,(a,a   | )L')$     | ))$        | Output L' → ε
(a,(a,a)  | L')$      | )$         | match )
(a,(a,a)  | )$        | )$         | Output L' → ε
(a,(a,a)) | $         | $          | match )
A { c, ε } { d, b }
B { d, ε} {b}
Parsing Table:
S' { e, ε } { e, $ }
E  { b } { t }
Non-terminal | a | b | e | i | t | $
S  | S → a | | | S → iEtSS' | |
S' | | | S' → eS , S' → ε | | | S' → ε
E  | | E → b | | | |
The above parsing table contains two production rules in the single entry M[S', e]. So the given
grammar is not LL(1).
Here the grammar is ambiguous, and the ambiguity is manifested by a choice in what production to
use when an e (else) is seen. We can resolve this ambiguity by choosing S’ → eS.
A { a, c } { b, d }
Explain how the panic mode error recovery technique is used for the following grammar:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ |ε
F → (E) | id
T  { (, id } { +, ), $ }
F  { (, id } { *, +, ), $ }
E' { +, ε } { ), $ }
T' { *, ε } { +, ), $ }
STACK | INPUT
FT'E'$ | id$
idT'E'$ | id$
T'E'$ | $
E'$ | $
$ | $
How to determine a Context free grammar is LL(1) or Not? without constructing parsing Table
1. For any CFG of the form:
A → α1 | α2 | α3 | ……..
If there is no ε in any of these rules, then find FIRST(α1), FIRST(α2), FIRST(α3) and so on,
and take the pair-wise intersection of these FIRST sets:
FIRST(α1) ∩ FIRST(α2) = Ø, FIRST(α1) ∩ FIRST(α3) = Ø, FIRST(α2) ∩ FIRST(α3) = Ø, … (no common terminals)
If all the pair-wise intersections are empty, then the grammar is LL(1); otherwise it is not LL(1).
[Find the pair-wise intersection of the FIRST( ) sets]
Example:
Check whether the following grammar is LL(1) or not without constructing parsing table.
1. S → aSa | bS | c
Answer:
FIRST(α1) = FIRST(aSa) = {a}
FIRST(α2) = FIRST(bS) = {b}
FIRST(α3) = FIRST(c) = {c}
FIRST (α1) ∩ FIRST (α2) ∩ FIRST (α3) = {a}∩ {b}∩ {c} = Ø
Therefore the given grammar is LL(1) grammar
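For the ε-free case this check is a one-liner in Python (a sketch; the FIRST sets are written out by hand for S → aSa | bS | c):

firsts = [{"a"}, {"b"}, {"c"}]          # FIRST of each alternative of S
pairwise_disjoint = all(
    not (firsts[i] & firsts[j])
    for i in range(len(firsts)) for j in range(i + 1, len(firsts))
)
print("LL(1)" if pairwise_disjoint else "not LL(1)")   # prints LL(1)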
Check whether the following grammar is LL(1) or not without constructing parsing table
S → iCtSS1| bS | a
S1 → eS | ε
C→b
For S production rule:
----------------------------------------------------------------------------------------------------------------
A Pushdown Automaton has seven components, say P = (Q, Σ, Γ, δ, q0, Z0, F) where
Q: A finite set of states.
Σ: A finite set of input symbols (the input alphabet).
Γ: A finite stack alphabet.
δ: The transition function, which takes a state, an input symbol (or ε) and a stack symbol as
arguments, and returns a finite set of pairs (p, α), where p is a new state and α is the string of
stack symbols that replaces the top of the stack.
q0: The start state.
Z0: The start (initial) stack symbol.
F: The set of accepting (final) states.
c. The arcs correspond to transitions of the PDA in the following sense. An arc labeled a, X/α
from state q to state p means that δ ( q, a, X ) contains the pair (p, α ). It tells what input is
used, and also gives the old and new tops of the stack.
INSTANTANEOUS DESCRIPTIONS OF A PDA (I D)
How PDA processes the input string, that means the PDA goes from configuration to configuration,
in response to input symbols (or ε ) can be represented using Instantaneous Descriptions of PDA.
Definition of Instantaneous Descriptions (ID)
Let P = ( Q, Σ, Γ , δ, q0, Z0, F ) be a PDA, the Instantaneous Descriptions of a PDA has a triplet
form (q, w, γ ) where q is the state.
w is the remaining input, and
γ is the stack contents.
Example: let the current configuration of PDA be ( q, aw, Zα), it means
q is the current state.
aw is the string to be processed.
Zα is the current content of the stack, with Z as the topmost symbol on the stack.
(q, aw, Zα) ⊢* (p, w, βα) means that the current configuration of the PDA is (q, aw, Zα) and,
after applying zero or more transitions, the PDA enters the new configuration (p, w, βα).
1. Acceptance by final state:
Let P = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. Then the language accepted by P by final state is
L(P) = { w | (q0, w, Z0) ⊢* (q, ε, α) } for some state q in F and any stack string α.
2. Acceptance by empty stack:
Let P = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. Then the language accepted by PDA P by empty stack
is N(P) = { w | (q0, w, Z0) ⊢* (q, ε, ε) } for any state q. That is, N(P) is the set of inputs w that
PDA P can consume and at the same time empty its stack.
Design a PDA to accept the language L = { a^n b^n | n ≥ 0 }. Draw the graphical representation of the
PDA obtained. Also write the IDs for the string 'aaabbb'.
Procedure: Since the language contains strings of n a's followed by n b's, the machine can read n a's
in the start state. Let us push every scanned input symbol 'a' onto the stack. When the machine
encounters input 'b', we check that for each input 'b' there is a corresponding symbol 'a' on the stack
to pop. Finally, if there is no more input (ε) and no a's remain on the stack, the string scanned has
n a's followed by n b's.
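The construction described in the procedure can be tested with a tiny simulator. The Python sketch below uses one standard choice of transitions consistent with the procedure (acceptance by final state; the δ in the missing figure may differ slightly):

delta = {
    ("q0", "a", "Z0"): ("q0", ["a", "Z0"]),   # push the first a
    ("q0", "a", "a"):  ("q0", ["a", "a"]),    # keep pushing a's
    ("q0", "b", "a"):  ("q1", []),            # first b: pop one a
    ("q1", "b", "a"):  ("q1", []),            # each further b pops one a
    ("q1", "", "Z0"):  ("qf", ["Z0"]),        # input consumed, all a's matched
    ("q0", "", "Z0"):  ("qf", ["Z0"]),        # the n = 0 case
}

def accepts(w, state="q0", stack=("Z0",)):
    if not w and state == "qf":
        return True
    if not stack:
        return False
    moves = []
    if w and (state, w[0], stack[0]) in delta:            # consume one input symbol
        moves.append((delta[(state, w[0], stack[0])], w[1:]))
    if (state, "", stack[0]) in delta:                    # ε-move
        moves.append((delta[(state, "", stack[0])], w))
    return any(accepts(rest, q, tuple(push) + stack[1:]) for (q, push), rest in moves)

for w in ["", "ab", "aabb", "aab", "ba"]:
    print(repr(w), accepts(w))          # True exactly for a^n b^n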
Procedure: Since the language contains strings of n a's followed by 2n b's, the machine can read n a's
in the start state. For each input symbol 'a', push two a's onto the stack. When the machine
encounters input 'b', we check that for each input 'b' there is a corresponding symbol 'a' on the stack
to pop. Finally, if there is no more input (ε) and the stack is empty of a's, the string scanned has
n a's followed by 2n b's.
Design a PDA to accept the language L = { 0^(2n) 1^n | n ≥ 1 }. Draw the transition diagram for the
constructed PDA. Also show the moves made by the PDA for the string '000011'.
Procedure: Since the language contains strings of 2n 0's followed by n 1's, the machine can read
2n 0's in the start state. In the start state q0, push every scanned input symbol '0' onto the stack.
When it reads '1' in q0, change the state to q1 and pop one '0' from the stack. In state q1, without
consuming any input (ε), change the state to q2 and pop one more '0' from the stack. In state q2 the
machine reads input symbol '1', changes the state to q1, pops one '0' from the stack, and this process
is repeated. When there is no more input (ε) in state q2 and only Z0 remains on the stack, change the
state to the final state qf. This indicates that the string scanned has 2n 0's followed by n 1's.
The PDA to accept L = { 0^(2n) 1^n | n ≥ 1 } is given by:
P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, 0, Z0) = (q0, 0Z0)
δ(q0, 0, 0) = (q0, 00)
δ(q0, 1, 0) = (q1, ε)
δ(q1, ε, 0) = (q2, ε)
δ(q2, 1, 0) = (q1, ε)
δ(q2, ε, Z0) = (qf, Z0) Q = { q0, q1, q2, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { 0, 1}, Γ = { 0, Z0 } and F = { qf }
Transition diagram:
Design a PDA to accept the language L = { w | w ∈ (a+b)* and Na(w) = Nb(w) }. Draw the transition
diagram for the constructed PDA. Also show the moves made by the PDA for the string 'abbaaabb'.
Procedure: Push the first scanned input symbol, whether 'a' or 'b', onto the stack. From this point
onwards, if the scanned input symbol and the top-of-stack symbol are the same, push the current
input symbol onto the stack. If the input symbol and the top-of-stack symbol are different, pop one
symbol from the stack, and repeat the process. Finally, when the end of the string is reached, if the
stack is empty then the string w has an equal number of a's and b's; otherwise the numbers of a's
and b's differ.
Transition diagram:
Design a PDA to accept the language L = { w | w ∈ (a+b)* and Na(w) > Nb(w) }. Draw the transition
diagram for the constructed PDA. Also show the moves made by the PDA for the string 'baaabbaa'.
Note: The procedure remains the same as in the previous problem; only the final transitions change.
Once the end of the input string is reached (ε), the stack should contain at least one 'a'. From this
point change the state to q1 and keep popping the symbol 'a' from the stack until only Z0 remains.
When the stack is down to Z0 and the input is already empty, go to the final state and accept the
language.
The PDA to accept L = { w | w ∈ (a+b)* and Na(w) > Nb(w) } is given by:
P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ε)
δ(q0, b, a) = (q0, ε)
δ(q0, ε, a) = (q1, ε)
δ(q1, ε, a) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0) Q = { q0, q1, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { a, b}, Γ = { a, b, Z0 } and F = { qf }
Transition Diagram:
Design a PDA to accept the language L = { w | w ∈ (a+b)* and Na(w) < Nb(w) }. Draw the transition
diagram for the constructed PDA. Also show the moves made by PDA for the string 'aabbbbab'.
Note: The procedure remains the same as in the previous problem; only the final-state transition
functions change. That is, once the end of the input string is reached (ε), the stack should contain at least one
'b'. From this point onwards change state to q1 and keep popping the symbol 'b' from the stack until the stack
is empty. When only Z0 remains on the stack and the input is already empty, go to the final state and accept the
string.
PDA to accept L = { w | w ∈ (a+b)* and Na(w) < Nb(w) } is given by:
P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ε)
δ(q0, b, a) = (q0, ε)
δ(q0, ε, b) = (q1, ε)
δ(q1, ε, b) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0) Q = { q0, q1, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { a, b}, Γ = { a, b, Z0 } and F = { qf }
Transition diagram:
Design a PDA to accept the language L = { wCwR | w ∈ (a+b)* }. Draw the transition diagram and
also write the moves made by PDA for the string 'baaCaab'.
Procedure: To check for a palindrome, push all scanned input symbols onto the stack until we
encounter the letter C. Once past the middle of the string, if the string is a palindrome, each scanned input
symbol should have a corresponding matching symbol on top of the stack. Finally, if there is
no more input and the stack is empty, we say that the given string is a palindrome and is accepted by the PDA.
PDA to accept L = { wCwR | w ∈ (a+b)* } is given by:
P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, C, a) = (q1, a)
δ(q0, C, b) = (q1, b)
δ(q1, a, a) = (q1, ε)
δ(q1, b, b) = (q1, ε)
δ(q0, C, Z0) = (q1, Z0) ; for w = ε
δ(q1, ε, Z0) = (qf, Z0)
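Moves made by the PDA for the string 'baaCaab' (derived from the δ above):
(q0, baaCaab, Z0) ⊢ (q0, aaCaab, bZ0) ⊢ (q0, aCaab, abZ0) ⊢ (q0, Caab, aabZ0) ⊢ (q1, aab, aabZ0)
⊢ (q1, ab, abZ0) ⊢ (q1, b, bZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)
Hence 'baaCaab' is accepted.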
Design an NPDA to accept the language L = { wwR | w ∈ (a+b)* }. Draw the transition diagram and
also write the moves made by PDA for the string 'baaaab'.
Procedure: To check for a palindrome, push all scanned input symbols onto the stack until we
reach the midpoint; since the midpoint is not marked, the NPDA guesses it nondeterministically. Once past the
middle, if the string is a palindrome, each scanned input symbol should have a corresponding matching symbol
on top of the stack. Finally, if there is no more input and the stack is empty, we say that the given string is a palindrome.
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, ε, a) = (q1, a)
δ(q0, ε, b) = (q1, b)
δ(q1, a, a) = (q1, ε)
δ(q1, b, b) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0)
δ(q0, ε, Z0) = (qf, Z0) ; for w= ε
Q = { q0, q1, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { a, b,}, Γ = { a, b, Z0 } and F = { qf }
Moves made by PDA for the string “baaaab”:
(q0, baaaab, Z0) ⊢ (q0, aaaab, bZ0) ⊢ (q0, aaab, abZ0) ⊢ (q0, aab, aabZ0) ⊢ (q1, aab, aabZ0) ⊢ (q1, ab, abZ0)
⊢ (q1, b, bZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)
Transition diagram:
Design a PDA to accept the language L = { 0^n 1^m 0^n | m, n ≥ 1 }. Draw the transition diagram and also
write the moves made by PDA for the string '0011100'.
Procedure: Initially (q0) the machine reads n 0's, pushing each scanned input symbol '0' onto the
stack. When the machine reads '1' in start state q0, it changes state to q1 without altering the stack.
In state q1 the machine reads the 1's and ignores them. When the machine reads '0' in state q1, each scanned
input symbol '0' should have a corresponding symbol '0' on the stack, so it changes
state to q2 and pops one '0' from the stack. Finally, if there is no more input (ε) and the stack is empty, we say that the
string w has n 0's followed by m 1's followed by n 0's.
PDA to accept L = { 0^n 1^m 0^n | m, n ≥ 1 } is given by:
P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, 0, Z0) = (q0, 0Z0)
δ(q0, 0, 0) = (q0, 00)
δ(q0, 1, 0) = (q1, 0)
δ(q1, 1, 0) = (q1, 0)
δ(q1, 0, 0) = (q2, ε)
δ(q2, 0, 0) = (q2, ε)
δ(q2, ε, Z0) = (qf, Z0)
Q = { q0, q1, q2, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { 0,1,}, Γ = { 0, Z0 } and F = { qf }
Moves made by PDA for the string “0011100”:
(q0, 0011100, Z0) ⊢ (q0, 011100, 0Z0) ⊢ (q0, 11100, 00Z0) ⊢ (q1, 1100, 00Z0) ⊢ (q1, 100, 00Z0)
⊢ (q1, 00, 00Z0) ⊢ (q2, 0, 0Z0) ⊢ (q2, ε, Z0) ⊢ (qf, ε, Z0)
Transition diagram:
δ(q1, c, b) = (q2, ε)
δ(q2, c, b) = (q2, ε)
δ(q2, c, a) = (q3, ε)
δ(q3, c, a) = (q3, ε)
δ(q3, ε, Z0) = (qf, Z0 )
Q = { q0, q1, q2,q3, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { a,b,c}, Γ = {a, b, Z0 } and F = { qf }
Design a PDA to accept the language L = { 0^n 1^m 0^m 1^n | m, n ≥ 1 }. Draw the transition diagram and
also write the moves made by PDA for the string '0011100011'.
Procedure: Initially (q0) the machine reads n 0's, pushing each scanned input symbol '0' onto the
stack. When the machine reads '1' in the start state, it changes state to q1 and pushes that input symbol onto the
stack. In state q1 the machine reads the remaining 1's and pushes each onto the stack. When the
machine reads '0' in state q1, each scanned input symbol '0' should have a
corresponding symbol '1' on the stack, so it changes state to q2 and pops one '1' from the stack. Again, in q2 the
machine keeps reading 0's, and for each scanned '0' it pops one '1' from the stack. In q2, if the machine reads a '1', it
changes state to q3; now each scanned input symbol '1' should have a
corresponding symbol '0' on the stack, so it pops one '0' from the stack. In q3 the machine reads the remaining
1's, popping one '0' each time. Finally, in q3, if there is no more input (ε) and the stack is empty, we say
that the string w has n 0's followed by m 1's followed by m 0's
followed by n 1's.
δ(q1, 0, 1) = (q2, ε)
δ(q2, 0, 1) = (q2, ε)
δ(q2, 1, 0) = (q3, ε)
δ(q3, 1, 0) = (q3, ε)
δ(q3, ε, Z0) = (qf, Z0)
Q = { q0, q1, q2,q3, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { 0,1,}, Γ = { 0,1,Z0 } and F = { qf }
Moves made by PDA for the string “0011100011”:
(q0, 0011100011, Z0) ⊢ (q0, 011100011, 0Z0) ⊢ (q0, 11100011, 00Z0) ⊢ (q1, 1100011, 100Z0)
⊢ (q1, 100011, 1100Z0) ⊢ (q1, 00011, 11100Z0) ⊢ (q2, 0011, 1100Z0) ⊢ (q2, 011, 100Z0) ⊢ (q2, 11, 00Z0)
⊢ (q3, 1, 0Z0) ⊢ (q3, ε, Z0) ⊢ (qf, ε, Z0)
Graphical representation of PDA (Transition diagram):
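Any of these PDAs can be checked mechanically. Below is a minimal sketch in Python (the function and variable names are our own, not part of any library); it encodes the transition table of the PDA for L = { 0^n 1^m 0^n | m, n ≥ 1 } from above and searches all reachable configurations, so the same loop also handles nondeterministic machines such as the wwR PDA:

from collections import deque

# Transitions of the PDA for L = { 0^n 1^m 0^n | m, n >= 1 } given above.
# Key: (state, input symbol or '' for an epsilon move, stack top)
# Value: (next state, string that replaces the top; '' means pop)
delta = {
    ('q0', '0', 'Z'): ('q0', '0Z'),
    ('q0', '0', '0'): ('q0', '00'),
    ('q0', '1', '0'): ('q1', '0'),
    ('q1', '1', '0'): ('q1', '0'),
    ('q1', '0', '0'): ('q2', ''),
    ('q2', '0', '0'): ('q2', ''),
    ('q2', '', 'Z'): ('qf', 'Z'),
}

def accepts(w, start='q0', final=('qf',), bottom='Z'):
    # Breadth-first search over configurations (state, remaining input, stack).
    seen, frontier = set(), deque([(start, w, bottom)])
    while frontier:
        q, rest, stack = frontier.popleft()
        if (q, rest, stack) in seen or not stack:
            continue
        seen.add((q, rest, stack))
        if not rest and q in final:
            return True                     # input consumed in a final state
        top = stack[0]
        if (q, '', top) in delta:           # epsilon move
            p, push = delta[(q, '', top)]
            frontier.append((p, rest, push + stack[1:]))
        if rest and (q, rest[0], top) in delta:   # consuming move
            p, push = delta[(q, rest[0], top)]
            frontier.append((p, rest[1:], push + stack[1:]))
    return False

print(accepts('0011100'))   # True, matching the moves traced above
print(accepts('00110'))     # False: the two groups of 0's do not match

The seen set keeps repeated configurations from being explored twice; the sketch terminates as long as ε-moves cannot grow the stack forever.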
BOTTOM - UP PARSING
What is bottom – up parsing?
A bottom-up parser creates the parse tree of the given input string, starting from leaves working
towards the root (start symbol).
A bottom-up parser tries to find the right-most derivation of the given input in the reverse order.
Example:
E →E +T |T
T → T*F |F
F → ( E ) | id
Construct a bottom-up parse tree for the input string id * id
The above bottom-up parse tree construction is the same as deriving the input string by rightmost
derivation (RMD) in reverse order.
REDUCTIONS
What is reduction?
Reduction is the reverse of a derivation step: a substring of the input matching the RHS (body) of a
production is replaced by the non-terminal at the LHS (head) of that production.
We can think of bottom – up parsing as the process of “reducing” a string w to the start symbol of
the grammar. At each reduction step, a specific substring matching the body of the (RHS)
production is replaced by the non-terminal at the head (LHS) of that production.
The key decisions during bottom-up parsing are about:
When to reduce the input substring.
What production to apply, as the parse proceeds.
For the above example reductions will be discussed in terms of the sequence of strings:
id * id, F * id, T * id, T * F, T, E
Here sequence starts with id * id.
The first reduction process generates the sequence F * id by reducing the leftmost id to F, using the
production F → id
The second reduction produces T * id by reducing F to T, using T → F.
Now we have a choice between reducing string T, which is the body of E → T, and the string
consisting of second id, which is the body of F → id. Rather than reduce T to E, the second id is
reduced to F, resulting in the string T * F. This string is reduced to T. The parse completes with
reduction of T to the start symbol E.
HANDLE
Define handle with an example.
OR
For the following grammar indicate the handle for the right sentential form id1 * id2
A handle of a right-sentential form is a substring that matches the right side (body) of a production rule,
and whose reduction represents one step of a rightmost derivation in reverse.
Not every substring that matches the right side of a production rule is a handle.
Example:
The reductions made during the parse of input id1 * id2 according to the grammar:
E →E +T |T
T → T*F |F
F → ( E ) | id
Reduction sequences: id1 * id2, F * id2, T * id2, T * F, T, E
NOTE: The symbol T is not a handle in the sentential form T * id2. If T were indeed replaced by
E, we would get the string E * id2, which cannot be derived from start symbol E.
If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one
handle.
HANDLE PRUNING:
What is handle pruning ? Give a bottom up parse for the input : aaa * a++ and the grammar:
S → SS + | SS * | a
Bottom-up parsing is an attempt to detect the handle of a right-sentential form; whenever a
handle is detected, the reduction is performed. This is equivalent to performing a rightmost
derivation in reverse and is called "handle pruning".
Bottom up parse for the input aaa*a++:
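One possible sequence of shift and reduce steps (detecting the handle at each step) is:
STACK INPUT ACTION
$ aaa*a++$ Shift a
$a aa*a++$ Reduce by S → a
$S aa*a++$ Shift a
$Sa a*a++$ Reduce by S → a
$SS a*a++$ Shift a
$SSa *a++$ Reduce by S → a
$SSS *a++$ Shift *
$SSS* a++$ Reduce by S → SS*
$SS a++$ Shift a
$SSa ++$ Reduce by S → a
$SSS ++$ Shift +
$SSS+ +$ Reduce by S → SS+
$SS +$ Shift +
$SS+ $ Reduce by S → SS+
$S $ ACCEPT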
Working Principle:
During left to right scan of the input string, shift reduce parser goes on shifting the input
symbols onto the stack until a handle comes on the top of the stack.
When a handle appears on the top of the stack, it performs reduction.
The parser repeats the cycle (shift/reduce) until it has detected an error or the stack contains
the start symbol and the input is empty (successful).
NOTE: In bottom-up parsing we show the top of the stack on the right, rather than on the left as
we did for top down parsing.
ACTIONS OF SHIFT REDUCE PARSER
Explain with an example, the stack implementation of a shift reduce parser.
List and explain the actions of shift reduce parser.
$E * id3 $ Shift *
$E * id3 $ Reduce by E→ id
$E * E $ Reduce by E→ E * E
$E $ ACCEPT
For the grammar S → 0S1 | 01, give shift reduce configuration on input string 000111
Shift reduce configuration for 000111
STACK INPUT ACTION
$ 000111 $ Shift 0
$0 00111 $ Shift 0
$00 0111 $ Shift 0
$000 111 $ Shift 1
$0001 11 $ Reduce by S→ 01
$00S 11 $ Shift 1
$00S1 1 $ Reduce by S→ 0S1
$0S 1 $ Shift 1
$0S1 $ Reduce by S→ 0S1
$S $ ACCEPT
Consider the following grammars and parse the respective strings using shift- reduce parser.
E →E+T | T
T → T*F | F
F → (E) | id
string is “id + id * id”
Here we follow two rules:
1. If the incoming operator has higher priority than the operator inside the stack, then perform a shift.
2. If the operator inside the stack has the same or higher priority than the incoming operator, then
perform a reduce.
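Applying these rules (together with handle pruning) to the string id + id * id gives one possible parse:
STACK INPUT ACTION
$ id+id*id$ Shift id
$id +id*id$ Reduce by F → id
$F +id*id$ Reduce by T → F
$T +id*id$ Reduce by E → T
$E +id*id$ Shift +
$E+ id*id$ Shift id
$E+id *id$ Reduce by F → id
$E+F *id$ Reduce by T → F
$E+T *id$ Shift * (incoming * has higher priority than + in the stack)
$E+T* id$ Shift id
$E+T*id $ Reduce by F → id
$E+T*F $ Reduce by T → T*F
$E+T $ Reduce by E → E+T
$E $ ACCEPT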
Write the context free grammar and perform shift reduce parsing for the input int a, b, c;
Context free grammar for int id, id, id;
S → T L;
T → int
L → L, id | id
Configuration of shift reduce parser on input: int id, id, id;
STACK INPUT ACTION
$ int id, id, id;$ Shift int
$int id, id, id;$ Reduce by T → int
$T id, id, id;$ Shift id
$T id , id, id;$ Reduce by L → id
$T L , id, id;$ Shift ,
$T L, id, id;$ Shift id
$T L, id , id;$ Reduce by L → L, id
$T L , id;$ Shift ,
$T L, id;$ Shift id
$T L, id ;$ Reduce by L → L, id
$T L ;$ Shift ;
$T L ; $ Reduce by S → T L;
$S $ ACCEPT
STACK INPUT
$……if expr then Statement else………….$
Here depending on what follows the else on the input:
it might be correct to reduce if expr then Statement to Statement, or
it might be correct to shift else and then look for another Statement to complete the
alternative: if expr then Statement else Statement.
The above shift/reduce conflict can be resolved by shifting else onto the stack.
Reduce/reduce conflict:
The situation in which parser cannot make decision about which of several reductions to apply are
called reduce/reduce conflict
Example:
E → E + id
E → id
Suppose the input string is: id + id
If we have shift- reduce parser in configuration:
STACK INPUT
$ E + id $
Here parser can perform reduction of id to E or it can perform reduction E + id to E. This conflict
can be resolved by reducing E + id to E.
Shift reduce implementation does not tell us anything about the technique used for detection of
handles. Depending upon the technique used for detection of handles, we get different shift reduce
parsers.
i. Operator precedence parser: Uses the precedence relationship between certain pairs of
terminals to guide the selection of handles.
ii. LR parser: It uses DFA that recognizes the set of all viable prefixes; by reading the stack
from bottom to top, to determine what handle, if any, is on the top of the stack.
LR PARSER
What is LR parser? What is the meaning of L and R in LR grammars?
An LR parser is a shift-reduce parser that uses a DFA to recognize handles, based on the concept of
LR(k) parsing, where L stands for a left-to-right scan of the input, R for constructing the rightmost
derivation in reverse, and k for the number of look-ahead input symbols used in making parsing
decisions.
Why is LR parsing more attractive?
LR parsers can be constructed for all programming language constructs for which CFGs can
be written.
LR parser is more efficient.
LR parser can quickly detect a syntactic error.
LR parser constructed for LR grammars can describe more languages than LL grammars.
Drawback of LR methods:
It is too much work to construct an LR parser by hand for a typical programming-language grammar.
But automatic parser generators like YACC take a CFG as input and produce a parser for that
grammar.
Items or LR(0) items
How does a shift-reduce parser know when to shift and when to reduce? For example, with the stack
containing $T and the next input symbol * in the following configuration:
Stack Input
$T * ... $
How does the parser know that T on the top of the stack is not a handle, so that the appropriate action is
to shift, and not to reduce T to E?
An LR parser makes shift-reduce decisions by maintaining states, to keep track of where we are in a
parse.
Define LR(0) item (Item).
An LR(0) item of a grammar G is a production rule of G with a dot placed at some position of the
right hand side of the rule.
Example: A grammar G has production rule A → XYZ results in four LR(0) items as:
A → .XYZ
A → X .YZ
A → XY .Z
A → XYZ .
The dot (.) indicates how much of the right hand side of the production is seen at a given point in
the parsing process.
Item A → .XYZ indicates that we hope to see a string derivable from XYZ next on the input.
Item A → X .YZ indicates that we have just seen on the input a string derivable from X and that we
next hope to see a string derivable from YZ.
Item A → XYZ . indicates that we have seen the body XYZ and that it may be time to reduce
XYZ to A (as a handle).
CLOSURE FUNCTION:
If I is a set of items for grammar G, then CLOSURE(I) is the set of items constructed from I by the
two rules:
i. Initially, add every item in I to CLOSURE(I).
ii. If A → α.Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → .γ to
CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added.
Example: for the expression grammar used earlier,
CLOSURE( { E → .E + T } ) = { E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id }
GOTO FUNCTION:
GOTO ( I, X ) is the transition from I on X, first identify all the items in I in which the dot precedes
X on the right side. Then move the dot in all the selected items one position to the right (over X)
and then take the closure of the set of these items.
Example: if set I = {E → .E + T
E→.T
T→.T*F
T→.F
F → . (E)
F→ . id
}
Then GOTO ( I, T ) = CLOSURE ( { E → T.
T → T. * F }
)
= { E → T.
T → T. * F
}
*****What are Kernel and non-kernel items?
Kernel Items:
The initial item S' → .S, together with all items whose dots are not at the left end, are called
kernel items.
Example: S‟ → .S
E → T.
T → T. * F
Non-Kernel Items:
All items with their dots at the left end except for S‟→ .S are called non-kernel items.
Example:
E → .E + T
E→.T
T→.T*F
Viable Prefixes:
The prefixes of right sentential forms that can appear on the stack of a shift reduce parser are called
viable prefixes.
C = { CLOSURE( { S' → .S } ) };
repeat
    for ( each set of items I in C )
        for ( each grammar symbol X )
            if ( GOTO(I, X) is not empty and not in C )
                add GOTO(I, X) to C;
until no new sets of items are added to C on a round;
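This construction is easy to script. A minimal sketch in Python (all names are our own), using the grammar S → CC, C → cC | d of the exercise below:

# LR(0) items: (head, body, dot position). Item sets are frozensets so
# duplicates are detected when building the canonical collection.
GRAMMAR = {
    "S'": [["S"]],
    "S":  [["C", "C"]],
    "C":  [["c", "C"], ["d"]],
}
NONTERMS = set(GRAMMAR)

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                for prod in GRAMMAR[body[dot]]:     # add B -> .gamma
                    item = (body[dot], tuple(prod), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)

def goto(I, X):
    # Move the dot over X in every item of I where the dot precedes X.
    moved = [(h, b, d + 1) for (h, b, d) in I if d < len(b) and b[d] == X]
    return closure(moved) if moved else None

def canonical_collection():
    I0 = closure([("S'", ("S",), 0)])
    C, work = [I0], [I0]
    symbols = {s for prods in GRAMMAR.values() for body in prods for s in body}
    while work:
        I = work.pop()
        for X in symbols:
            J = goto(I, X)
            if J is not None and J not in C:
                C.append(J)
                work.append(J)
    return C

print(len(canonical_collection()))   # prints 7: the sets I0 .. I6 of the answer below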
Obtain the sets of canonical collection of sets of valid LR(0) items for the grammar given below:
S → CC
C → cC | d
Answer
Grammar G:
S → CC
C → cC
C→d
Augmented grammar G‟:
S‟ → S
S → CC
C → cC
C→d
Canonical collection of sets of LR(0) items are computed as follows:
I0 = CLOSURE ( { S‟ → .S } ) = { S’ → .S
S → .CC
C → .cC
C → .d }
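Applying GOTO to I0 and to the resulting sets gives the remaining item sets:
GOTO(I0, S) = { S' → S. } = I1
GOTO(I0, C) = { S → C.C, C → .cC, C → .d } = I2
GOTO(I0, c) = { C → c.C, C → .cC, C → .d } = I3
GOTO(I0, d) = { C → d. } = I4
GOTO(I2, C) = { S → CC. } = I5
GOTO(I2, c) = I3, GOTO(I2, d) = I4
GOTO(I3, C) = { C → cC. } = I6
GOTO(I3, c) = I3, GOTO(I3, d) = I4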
I1 contains the item in which dot is already moved to the rightmost end, so there is no GOTO
function or transition in I1 .
The canonical collection of LR(0) items for the given grammar is C = { I0, I1, I2, I3, I4, I5, I6 }
LR(0) Automaton for the given grammar is:
}
GOTO ( I0, Stmt_sequence ) = CLOSURE ( { Stmt_sequence‟ → Stmt_sequence.
Stmt_sequence → Stmt_sequence . ; stmt
})
= { Stmt_sequence‟ → Stmt_sequence.
Stmt_sequence → Stmt_sequence . ; stmt ------- I1
}
GOTO ( I0, stmt) = CLOSURE ( { Stmt_sequence → stmt. } )
= { Stmt_sequence → stmt. } --------------- I2
GOTO ( I0, s) = CLOSURE ( { stmt → s. })
= { stmt → s. } ---------------- I3
GOTO ( I1, ;) = CLOSURE ( { Stmt_sequence → Stmt_sequence ; . stmt })
A → .(A)--------------------- I0
A → .a
}
GOTO ( I0, A) = CLOSURE ( { A’→ A. } ) = { A‟→ A. }-------I1
GOTO ( I0, ( ) = CLOSURE ( { A →(. A) } ) = { A → (.A)
A → .(A)------- I2
A → .a
}
GOTO ( I0, a ) = CLOSURE ( { A → a. } ) = { A → a. } ---------- I3
GOTO ( I2, A ) = CLOSURE ( { A → (A .) } ) = { A → (A.) } ---- I4
GOTO ( I2, ( ) = CLOSURE ( { A → (. A ) }) = { A → (.A)
A → .(A) ------- I2
A → .a
}
GOTO ( I2, a ) = CLOSURE ( { A → a. } ) = { A → a. } ---------- I3
B → .--------------- I5
}
Goto( I4, A ) = Closure ( { S →AaA.b } ) = { S →AaA.b } ------------- I6
Goto( I5, B ) = Closure ( { S →BbB.a } ) = { S →BbB.a } ------------- I7
Goto( I6, b ) = Closure ( { S →AaAb. } ) = { S →AaAb. } ------------- I8
Goto( I7, a ) = Closure ( { S →BbBa. } ) = { S →BbBa. } ------------- I9
LR(0) automaton:
Write the canonical collection of sets of LR(0) items for the grammar:
S→L=R|R
L → * R | id
R→L
Augmented grammar will be:
S‟ → S
S→L=R
S→R
L→*R
L → id
R→L
E→E+T
E→T
T→T*F
T→F
F→(E)
F → id
F → .(E)
F → .id
}
LR(0) Automaton:
Simple LR (SLR) parsing is based on the LR(0) automaton constructed from the grammar. The
states of this automaton are the sets of items from the canonical LR(0) collection, and the
transitions are given by the GOTO function.
How can LR(0) automata help with shift-reduce decisions?
Shift-reduce decisions can be made as follows;
Suppose that the string of grammar symbols read so far takes the LR(0) automaton from the start state to some
state j. Then perform a shift on the next input symbol 'a' if state j has a transition on
'a'; otherwise perform a reduce operation. During reduction, the items in state j tell us which
production to use.
Example: id * id
By looking the above LR(0) automaton; the following table illustrates the actions of a shift reduce
parser on input id * id
At line (1) the next input is 'id' and state 0 (I0) has a transition on 'id' to state 5 (refer to the LR(0)
automaton), therefore we shift. At line (2) the next state number 5 (symbol 'id') has been pushed
onto the stack. There is no transition from state 5 (I5) on input *, so we reduce. The item in state
5, F → id., is used for the reduction (the production in which the dot appears at the rightmost end). So the
reduction is by production F → id; it is implemented by popping the body of the production
(id) from the stack (at line 2) and pushing the head of the production (F in this case).
So when we pop state 5 from the stack, state 0 becomes the top, and we look for a transition on F (the head
of the production). State 0 has a transition on F to state 3, so we push state 3 with the
corresponding symbol F (at line 3). Each of the remaining moves is determined similarly.
MODEL OF LR PARSER: (Structure of LR parser)
GOTO Table: It simply maps the transitions of the automaton on non-terminals. If GOTO[Ii, A] = Ij,
then in the GOTO table we make an entry of state j in row i, column A.
Behavior of the LR parser:
Discuss the behavior of LR parser.
The behavior of the LR parser for the given input is determined by reading the current input symbol
ai and state Sm on top of the stack, and consulting the entry ACTION [ Sm, ai] in the parsing action
Table.
i. If ACTION[Sm, ai] = Shift j (sj), the parser performs a shift operation, in which the
next state j is pushed onto the stack.
ii. If ACTION[Sm, ai] = Reduce k (rk), the parser performs a reduce operation, in which the
production used for the reduction is identified by the number k. The reduction is
implemented by popping n states from the stack, where n is the number of symbols in the
body of the production used for the reduction. The head of the
production is then pushed onto the stack by consulting the entry GOTO[sm,
A], where sm is the state now on top of the stack and A is the non-terminal corresponding to the
head of the production used in the reduction.
iii. If ACTION[sm, ai] = Accept, parsing is completed.
iv. If ACTION[sm, ai] = Error, the parser has discovered an error and calls an error
recovery routine.
LR PARSING ALGORITHM:
With neat diagram explain LR parsing algorithm.
Input: An input string „w‟ and an LR parsing table with ACTION and GOTO for grammar G.
Output: w is L(G) after reduction, otherwise an error indication.
Parsing Algorithm:
Let a be the first symbol of the given input string w$
while (1) / * repeat forever*/
{
let s be the state on top of the stack;
if ( ACTION [ s, a ] = shift t )
{
push t onto the stack;
let a be the next input symbol;
}
else if ( ACTION [ s, a ] = reduce A → β )
{
pop |β| symbols (states) off the stack;
let state t now be on top of the stack;
push GOTO [ t, A ] onto the stack;
output the production A → β;
}
else if ( ACTION [ s, a ] = accept ) break; /* parsing is completed */
else call an error-recovery routine;
}
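A minimal sketch of this driver in Python (a toy, with the table hard-coded and all names our own). The ACTION/GOTO entries are the SLR table for the grammar (1) E → E + n, (2) E → n that is constructed later in these notes:

ACTION = {
    (0, 'n'): ('s', 2),
    (1, '+'): ('s', 3), (1, '$'): ('acc', 0),
    (2, '+'): ('r', 2), (2, '$'): ('r', 2),
    (3, 'n'): ('s', 4),
    (4, '+'): ('r', 1), (4, '$'): ('r', 1),
}
GOTO = {(0, 'E'): 1}
PRODS = {1: ('E', 3), 2: ('E', 1)}    # production number -> (head, length of body)

def parse(tokens):
    stack, i = [0], 0                  # stack of states; i indexes the input
    tokens = list(tokens) + ['$']
    while True:
        entry = ACTION.get((stack[-1], tokens[i]))
        if entry is None:
            return False               # blank entry: syntax error
        kind, k = entry
        if kind == 's':                # shift: push state k, advance the input
            stack.append(k)
            i += 1
        elif kind == 'r':              # reduce by production k
            head, n = PRODS[k]
            del stack[-n:]             # pop |body| states
            stack.append(GOTO[(stack[-1], head)])
            print('reduce by production', k)
        else:
            return True                # accept

print(parse(['n', '+', 'n', '+', 'n']))   # True
print(parse(['n', '+']))                  # False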
TYPES OF LR PARSER:
The structure of LR parser for different types will change only in parsing table.
As discussed earlier, there are three types of LR parsers that employ the
bottom-up method of parsing a string in a given CFG. They are
i. Simple LR Parsers or LR(0) parsers (SLR)
ii. LR(1) Parsers or Canonical LR Parser (CLR)
iii. Look-ahead LR parsers(LALR):
A → .a
}
GOTO ( I0, A) = CLOSURE ( { A’→ A. } ) = { A‟→ A. }------- I1
GOTO ( I0, ( ) = CLOSURE ( { A → (. A) } ) = { A → (.A)
A → .(A)------- I2
A → .a
}
GOTO ( I0, a ) = CLOSURE ( { A → a. } ) = { A → a. } ---------- I3
GOTO ( I2, A ) = CLOSURE ( { A → (A.) } ) = { A → (A.) } ---- I4
GOTO ( I2, ( ) = CLOSURE ( { A → (. A) } ) = { A → (.A)
A → .(A) ------- I2
A → .a
}
GOTO ( I2, a ) = CLOSURE ( { A → a. } ) = { A → a. } ---------- I3
GOTO ( I4, ) ) = CLOSURE ( { A → (A). } ) = { A → (A). } ---------- I5
ACTION GOTO
STATE a ( ) $ A
0 S3 S2 1
1 accept
2 S3 S2 4
3 r2 r2
4 S5
5 r1 r1
By giving numbers to the productions of the grammar G:
(1) A → (A)
(2) A→a
Here the augmented production A' → A. is present in item set I1, so we make the action
entry [ 1, $ ] = accept.
Identify the productions of the form A → α .
Item set I3 contains the production A → a. and I5 contains A → (A).
For the given grammar Follow (A) = { ) , $ }
Therefore in state number 3 on input ) make an entry in action table as [ 3, ) ] = r2 ( reduce by
A → a production)
[3, $ ] = r2
In state number 5 on input ) make an entry in action table as [ 5, ) ] = r1 ( reduce by A → (A)
production)
[ 5, $] = r1
GOTO table entry:
GOTO ( I0, A ) = I1 makes an entry in goto table as [0, A ] = 1
GOTO ( I2, A ) = I4 makes an entry in goto table as [2, A ] = 4
NOTE: In reduction process, pop n states from stack, where n = number of terms on RHS of
reducing production.
The canonical collection of LR(0) items for the given grammar is C = { I0, I1, I2, I3, I4, I5, I6 }
By numbering the grammar G:
(1) S → CC
(2) C → cC
(3) C→d
Follow (S ) = { $ }
Follow (C ) = { c, d }
STATE c d $ S C
0 S3 S4 1 2
1 Accept
2 S3 S4 5
3 S3 S4 6
4 r3 r3
5 r1
6 r2 r2
I0 : CLOSURE ( { E‟ → .E} )
{ E‟ → .E
E→.(E)
E → . id
}
Goto [ I0, E ] = I1 = { E‟ → E.
}
Goto [ I0, ( ] = I2 = { E → (. E )
E→.(E)
E → . id
}
Follow( E ) = { ), $}
S→b
A → SA
A→a
Closure { S‟ → .S }
= { S‟ → .S
S → .AS
S →. b
→ I0
A → .SA
A → .a
}
Goto [ I0, S ] = { S‟ → S. }
→ I1
A → S. A
S → .AS
S →. b
A → .SA
A → .a
}
Goto [ I0, A ] = { S → A. S
→ I2
S → .AS
S →. b
A → .SA
A → .a
}
Goto [ I0, b ] = {S→b.}
→ I3
Goto [ I0, a ] = { A → a. } → I4
Goto [ I1, A ] = { A → S A.
→ I5
S → A.S
S → .AS
S →. b
A → .SA
A → .a
}
Goto [ I1, b ] =
→ I3
Goto [ I1, a ] =
→ I4
Goto [ I1, S ] = { A → S.A
→ I6
S → .AS
S →. b
A → .SA
A → .a
}
Goto [ I2, S ] = { S → AS.
A → S.A
S → .AS -- I7
S →. b
A → .SA
A → .a
}
Goto [ I2, A] = { S → A.S
S → .AS -- I2
S →. b
A → .SA
A → .a
}
Goto [ I2, b ] =
→ I3
Goto [ I2, a ] =
→ I4
Goto [ I5, S ] =
→ I7
Goto [ I5, A ] =
→ I2
Goto [ I5, b ] =
→ I3
Goto [ I5, a ] =
→ I4
Goto [ I6, a ] =
→ I4
Goto [ I6, b ] =
→ I3
Goto [ I6, A ] = → I5
Goto [ I6, S ] =
→ I6
Goto [ I7, b ] =
→ I3
Goto [ I7, a ] =
→ I4
Goto [I7, A] = ---- I5
Goto[I7, S] = --- I6
STATE
id ( ) + * $ E T F
0 S5 S4 1 2 3
1 S6 accept
2 r2 r2 S7 r2
3 r4 r4 r4 r4
4 S5 S4 8 2 3
5 r6 r6 r6 r6
6 S5 S4 9 3
7 S5 S4 10
8 S11 S6
9 r1 r1 S7 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
S → .AaAb
S → .BbBa --------------------- I0
A→.
B→.
L→*R
L → id
R→L
Canonical collection of sets of LR(0) items are computed as follows:
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
Follow(S) = { $}
Follow(L) = { =, $ }
Follow(R) = { =, $}
Since there is a multiple entry, ie: both a shift and a reduce entry in ACTION [ 2, =], state 2 has a
shift/reduce conflict on input symbol „= „ so the given grammar is not SLR(1).
NOTE: The above grammar is not ambiguous, the shift/reduce conflict arises from the fact that the
SLR parser is not powerful enough to remember enough left context to decide what action the
parser should take on input =, having seen a string reducible to L.
S → .bBa
A → .d
B → .d
}
Goto (I0, d) = { A → d.
B → d. }---------- I5
Goto (I2, a) = { S → Aa. } -------------- I6
ACTION GOTO
STATE a b c d $ S A B
0 S4 S5 1 2 3
1 Accept
2 S6
3 S7
4 S5 8
5 r5/ r6 r5 9
6 r1
7 r3
8 S10
9 S11
10 r2
11 r4
(1) S → Aa
(2) S → bAc
(3) S → Ba
(4) S → bBa
(5) A→d
(6) B→d
Follow(S) = { $}
Follow(A) = { a, c}
Follow(B) = { a}
The above SLR parsing table contains multiple entries in state 5 on input 'a': ACTION[5, a] = r5 /
r6, a reduce/reduce conflict. So the given grammar is not an SLR(1) grammar.
Show that the following grammar is SLR(1)
S → SA | A
A→ a
Augmented grammar:
S‟ → S
S → SA
S →A
A→ a
LR(0) items:
I0 : { S‟ → .S
S → .SA
S → .A
A → .a }
Goto( I0, S) = { S‟ → S.
S → S.A I1
A → .a
}
Goto ( I0, A) = { S → A. } I2
Goto ( I0, a) = { A → a. } I3
Goto ( I1, A) = { S → SA. } I4
Goto ( I1, a) = { A → a. } = I3
(1) S → SA
(2) S → A
(3) A→ a
Follow(S) = { a, $ }
Follow(A) = { a, $ }
ACTION GOTO
STATE a $ S A
0 S3 1 2
1 S3 Accept 4
2 r2 r2
3 r3 r3
4 r1 r1
From the above SLR parsing table we observe that, each parsing table entry uniquely identifies
shift or reduce operation or signals an error (blank entry). So the given grammar is SLR(1).
Consider the grammar :
E→E+n|n
ii. Construct SLR parsing table and parse the input string n + n + n
Augmented grammar:
E‟ → E
E→E+n
E→n
LR(0) items:
I0 { E‟ → .E
E → .E + n
E → .n
}
Goto (I0, E) = { E‟ → E. I1
E → E .+ n
}
Goto (I0, n) = { E → n. } I2
Goto (I1, + ) = { E → E +. n } I3
Goto (I3, n ) = { E → E + n. } I4
(1) E → E + n
(2) E → n
Follow (E) = { +, $ }
An item of the form [ A → α.β, a ] is called an LR(1) item, because the length of the look-ahead 'a' is one.
An item without look-ahead is one with look-ahead of length zero; hence it is an LR(0) item.
In SLR parsing method, we were working with LR (0) items.
An LR(1) item is comprised of two parts:
LR( 0 ) item and the look-ahead associated with the item.
There are two different methods for LR parsing based on look-ahead symbol on the input:
i. The Canonical LR or LR (1 ) or just LR parser: which makes full use of the look- ahead
symbol(s). This method uses a large set of items, called the LR (1) items.
ii. Look-ahead LR or LALR:
a. Which is based on the LR(0) sets of items and has many fewer states than typical
parsers based on the LR(1) items. (CLR)
b. We can handle many more grammars with the LALR method than the SLR method,
by introducing look-ahead‟s into the LR(0) items.
c. An LALR parsing table is no bigger than an SLR table.
d. It is the most widely used type of parser.
LR (1) Or Canonical LR Parser (CLR):
Every state of the LR(1) or CLR parser will correspond to a set of LR (1) items.
When parser „looks-ahead‟ in the input buffer to decide whether or not reduction is to be
done; the information about the terminals will be available in the state of the parser itself.
Canonical collection of LR(1) items can be obtained by just modifying the CLOSURE and GOTO
functions with look-ahead symbols.
Augmented grammar:
S‟ → S
S → CC
C → cC
C → d
LR(1) items:
The initial item set I0 is obtained by computing the
CLOSURE ( { S‟ → .S, $ } )
β = ε and a =$
Add all S productions with dot at the leftmost end and look-ahead symbol $ to I0:
S → .CC, $
For the item [ S → .CC, $ ], the look-ahead for the C productions is FIRST(C$) = { c, d }, so add:
C → .cC, c/d
C → .d, c/d
Goto (I0, S ) = { S‟ → S. , $ } → I1
Goto (I0, C ) = { S → C.C, $
C → .cC , $ → I2
C → .d , $
}
Goto (I0, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
}
Goto (I0, d) = { C → d. , c / d } → I4
Goto (I2, C) = { S → CC. , $ } → I5
Goto (I2, c) = { C → c.C , $
C → .cC , $ → I6
C → . d, $ }
Goto (I2, d) = { C → d. , $ } → I7
Goto (I3, C) = { C → c C. , c / d } → I8
Goto (I3, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
}
Goto (I3, d) = { C → d. , c / d } → I4
Goto (I6, C) = { C → cC. , $ } → I9
Goto (I6, c) = { C → c.C , $
C → .cC , $ → I6
C → . d, $ }
Goto (I6, d) = { C → d. , $ } → I7
Goto (I5, a) = { A → a. , ) } I6
Goto (I8, ) ) = { A → (A) . , ) } I9
S → AA
A → Aa | b
Augmented grammar:
S‟ → S
S → AA
A → Aa
A→ b
LR(1) items:
I0 : = {
S‟ → .S, $
S → .AA, $
A → .Aa, b/a
A → .b, b/a
Goto (I0, S ) = { S' → S. , $ } I1
A → .b, $/a
A → A.a , b/a I2
}
Goto (I2, b ) = { A → b. , $/a } I5
GOTO graph:
1. Construct the canonical collection of LR(1) item set C‟ = { I0, I1, I2…………….. In } for the
augmented grammar G‟.
2. State „i‟ of the parser is constructed from Ii, the parsing action for state „i‟, for every terminal
symbol „a‟ is determined as follows:
a) If GOTO ( Ii, a ) = Ij then make an Action [ i, a] = Sj
b) For every state Ii in C‟ whose underlying set of LR(1) items contains an item of the form
{ A → α., a } , make an Action [ i, a ] = rk where k is the number of the production A→ α.
c) If [ S' → S. , $ ] is in Ii, then set ACTION[i, $] = accept.
3. For state „i‟ make an entry in GOTO table for non-terminals A, using the rule GOTO[i, A] = j
4. All entries not defined by rules (2) and (3) are made „error‟.
5. The initial state of the parser is the one constructed from the set of items containing [ S‟→ .S,
$]
NOTE:
If canonical LR (1) parsing table, action function has no multiply defined entries, then the given
grammar is called an LR(1) grammar or CLR grammar.
Construct LR(1) items and LR(1) or CLR parsing table for the following grammar:
S → CC
C → cC | d
Augmented grammar:
S‟ → S
S → CC
C → cC
C → d
LR(1) items:
I0 :
Goto (I0, S ) = { S‟ → S. , $ } → I1
C → .d , $
}
Goto (I0, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
}
Goto (I0, d) = { C → d. , c / d } → I4
Goto (I2, C) = { S → CC. , $ } → I5
Goto (I2, c) = { C → c.C , $
C → .cC , $ → I6
C → . d, $ }
Goto (I2, d) = { C → d. , $ } → I7
Goto (I3, C) = { C → c C. , c / d } → I8
Goto (I3, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
Goto (I3, d) = { C → d. , c / d } → I4
C → .cC , $ → I6
C → . d, $ }
Goto (I6, d) = { C → d. , $ } → I7
(1) S → CC
(2) C → cC
(3) C → d
Canonical LR Parsing Table:
ACTION GOTO
STATE c d $ S C
0 S3 S4 1 2
1 Accept
2 S6 S7 5
3 S3 S4 8
4 r3 r3
5 r1
6 S6 S7 9
7 r3
8 r2 r2
9 r2
Augmented grammar:
S‟ → S
S → L=R
S → R
L → *R
L → id
R → L
LR(1) items:
S‟ → .S, $ I0
S → . L=R, $
S→.R,$
L→.*R,=
L → . id , =
R→.L,$
L→.*R,$
L → . id , $
Goto (I0, S) = S‟ → S. , $ I1
Goto (I0, L) = S → L. = R, $
R → L. , $ I2
Goto (I0, R) = S →R . , $ I3
Goto (I0, *) = L → *. R , = / $
R → . L , = / $
L → .* R , = / $ ----- I4
L → . id , = / $
Goto (I0, id) = L → id. , = / $ I5
Goto (I2, =) = S → L =. R , $
R → . L , $
L → .* R , $ I6
L → . id , $
Goto (I4, R) = L → * R. , = / $ I7
Goto (I4, L) = R → L. , = / $ I8
Goto (I4, *) = L → *. R , = / $
R → . L , = / $
L → .* R , = / $ I4
L → . id , = / $
Goto (I4, id) = L → id. , = / $ I5
Goto (I6, R) = S → L = R. , $ I9
Goto (I6, L) = R → L. , $ I10
Goto (I6, *) = L → *. R , $
R → .L , $
L → .* R , $ I11
L → .id , $
Goto (I6, id) = L → id. , $ I12
Goto (I11, R) = L → * R. , $ I13
Goto (I11, L) = R → L. , $ I10
Goto (I11, *) = L → *. R , $
R → .L , $
L → .* R , $ I11
L → .id , $
Goto (I11, id) = L → id. , $ I12
(1) S → L = R
(2) S → R
(3) L → *R
(4) L → id
(5) R → L
CLR Parsing Table:
ACTION GOTO
STATE id * = $ S L R
0 S5 S4 1 2 3
1 Accept
2 S6
3 r2
4 S5 S4 8 7
5 r4 r4
6 S12 S11 10 9
7 r3 r3
8 r5 r5
9 r1
10 r5
11 S12 S11 10 13
12 r4
13 r3
S → . AaAb, $
S → . BbBa, $ I0
A → ., a
B → ., b
Goto (I0, S) = S‟ → S. , $ I1
Goto (I0, A) = S → A.aAb, $ I2
Goto (I0, B) = S →B . bBa, $ I3
Goto (I2, a) = S → Aa. Ab, $
A → ., b I4
Goto (I3, b) = S →B b.Ba, $ I5
B → ., a
Goto (I4, A) = S → AaA. b, $ I6
1. S → AaAb
2. S → BbBa
3. A → ε
4. B → ε
LR(1) Parsing table:
ACTION GOTO
STATE a b $ S A B
0 r3 r4 1 2 3
1 Accept
2 S4
3 S5
4 r3 6
5 r4 7
6 S8
7 S9
8 r1
9 r2
Goto (I0, E) = S → E. , $ I2
Goto (I0, ( ) = E → (. L), $
L → .EL, ) I3
E → .( L), a /(
E → .a, a /(
Goto (I0, a) = E → a., $ I4
Goto (I3, L) = E → (L.), $ I5
Goto (I3, E) = L → E.L, )
L → .EL, ) I6
E → .( L), a /(
E → .a, a /(
E → .a, a /(
Goto (I3, a) = E → a., a /( I8
E → .a, a /(
Goto (I6, ( ) = E → (. L), a /(
L → .EL, )
I7
E → .( L), a /(
E → .a, a /(
Goto (I6, a) = E → a., a /( I8
Goto (I7, L) = E → (L.), a /( I11
Goto (I7, E) = L → E.L, )
L → .EL, ) I6
E → .( L), a /(
E → .a, a /(
Goto (I7, ( ) = E → (. L), a /(
L → .EL, )
I7
E → .( L), a /(
E → .a, a /(
Goto (I7, a) = E → a., a /( I8
2. State „i‟ of the parser is constructed from Ii, the parsing action for state „i‟, for every
terminal symbol „a‟ is determined as follows:
b) For every state Ii in C' whose underlying set of LR(1) items contains an item of the form
[ A → α. , a ], make an entry ACTION[i, a] = rk, where k is the number of the production A → α.
3. For state „i‟ make an entry in GOTO table for non-terminals A, using the rule GOTO[i, A]
=j
4. All entries not defined by rules (2) and (3) are made „error‟.
5. The initial state of the parser is the one constructed from the set of items containing
[ S‟→ .S, $ ]
C → cC | d
Also parse the input string „ccdd‟ using LALR parsing table
Augmented grammar:
S’ → S
S → CC
C → cC
C → d
LR(1) items:
I0 :
Goto (I0, S ) = { S‟ → S. , $ } → I1
C → .d , $
}
Goto (I0, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
}
Goto (I0, d) = { C → d. , c / d } → I4
Goto (I2, C) = { S → CC. , $ } → I5
Goto (I2, c) = { C → c.C , $
C → .cC , $ → I6
C → . d, $ }
Goto (I2, d) = { C → d. , $ } → I7
Goto (I3, C) = { C → c C. , c / d } → I8
Goto (I3, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
Goto (I3, d) = { C → d. , c / d } → I4
Goto (I6, C) = { C → cC. , $ } → I9
C → .cC , $ → I6
C → . d, $ }
Goto (I6, d) = { C → d. , $ } → I7
From the above LR(1) items we see that, I3, I6 have identical LR(0) items that differ only in their
look-ahead‟s. The same goes for the pair of states I4, I7 and the pair of states I8, I9. Hence we can
combine I3 with I6, I4 with I7 and I8 with I9 to obtain the reduced collection of LR(1) items as shown
below:
I0 :
{ S‟ → S. , $ } → I1
{ S → C.C, $
C → .cC , $ → I2
C → .d , $
{ C → c .C , c / d /$
C → .cC , c / d/ $ → I36
C → . d, c / d/ $
{ C → d. , c / d /$ } → I47
{ S → CC. , $ } → I5
{ C → c C. , c / d /$ } → I89
LALR parsing table (for the merged states):
ACTION GOTO
STATE c d $ S C
0 S36 S47 1 2
1 Accept
2 S36 S47 5
36 S36 S47 89
47 r3 r3 r3
5 r1
89 r2 r2 r2
Moves of the LALR parser on the input 'ccdd':
STACK SYMBOLS INPUT ACTION
0 ccdd$ Shift 36
0 36 c cdd$ Shift 36
0 36 36 cc dd$ Shift 47
0 36 36 47 ccd d$ Reduce by C → d
0 36 36 89 ccC d$ Reduce by C → cC
0 36 89 cC d$ Reduce by C → cC
0 2 C d$ Shift 47
0 2 47 Cd $ Reduce by C → d
0 2 5 CC $ Reduce by S → CC
0 1 S $ Accept
Goto (I0, E) = S → E. , $ I2
Goto (I0, ( ) = E → (. L), $
L → .EL, ) I3
E → .( L), a /(
E → .a, a /(
E → .a, a /(
Goto (I7, ( ) = E → (. L), a /(
L → .EL, )
E → .( L), a /( I7
E → .a, a /(
Goto (I7, a) = E → a., a /( I8
From the above LR(1) items we see that, I3, I7 have identical LR(0) items that differ only in their
look-ahead‟s. The same goes for the pair of states I4, I8 and the pair of states I5, I11 and I9, I12. Hence
we can combine I3 with I7, I4 with I8 , I5 with I11 and I9 with I12 to obtain the reduced collection of
LR(1) items as shown below:
I37 :
E → (. L), $ / a /(
L → .EL, )
E → .( L), a /(
E → .a, a /(
I48:
E → a., $ /a /(
I511:
E → (L.), $ / a /(
I912:
E → (L) . , $ / a /(
LALR Parsing Table:
ACTION GOTO
STATE a ( ) $ S E L
0 S48 S37 1 2
1 accept
2 r1
37 S48 S37 6 511
48 r3 r3 r3
511 S912
6 S48 S37 6 10
912 r2 r2 r2
10 S912 / r4
1. S → E
2. E → ( L )
3. E → a
4. L → EL
The above grammar is not an LALR(1) grammar, since the LALR parsing table contains multiple
entries in state 10 on input „)‟.
TURING MACHINE
Introduction:
In the early 1930s, mathematicians were trying to define effective computation. Alan Turing in
1936, Alonzo Church in 1933, S.C. Kleene in 1935 and Schönfinkel in 1924 gave various models using
the concepts of Turing machines, λ-calculus, combinatory logic, Post systems and μ-recursive
functions. It is interesting to note that these were formulated much before the
electro-mechanical/electronic computers were devised. Although these formalisms describing effective
computations are dissimilar, they turn out to be equivalent.
Among these formalisms, the Turing's formulation is accepted as a model of algorithm or
computation.
Turing machines are useful in several ways. As an automaton, the Turing machine is the most
general model. It accepts type-0 languages (those generated by unrestricted grammars). It can also
be used for computing functions. It turns out to be a mathematical model of partial recursive
functions. Turing machines are also used for determining the un-decidability of certain languages
and measuring the space and time complexity of problems.
Type-0 Grammar: Any Grammar in which the production rule is of type: α → β where α is a string of
terminals and non-terminals with at least one non-terminal and α cannot be null. β is a string of
terminals and non-terminals.
Type-0 Grammar generates Recursively Enumerable Languages.
Turing’s Thesis:
• Any computation that can be carried out by a mechanical means can be performed by some
Turing Machine.
• The Church-Turing thesis states that any algorithmic procedure that can be carried out by
human beings/computer can be carried out by a Turing machine.
• It has been universally accepted by computer scientists that the Turing machine provides an
ideal theoretical model of a computer.
Few arguments for accepting this thesis are
1. Anything that can be done on existing digital computer can also be done by Turing Machine.
2. No one has yet been able to suggest a problem solvable by what we consider an algorithm, for
which a Turing machine program cannot be written.
The Turing machine model uses an infinite tape as its unlimited memory. The input symbols occupy
some of the tape‟s cells. Input symbols can be preceded and followed by infinite number of blank
(B) characters. Each cell can store only one symbol. The input to and the output from the finite state
automaton are effected by the R/W head which can examine one cell at a time.
A move of the Turing machine is a function of the state of the finite control and the tape symbol
scanned. In one move, the TM will change state. The next state optionally may be the same as the
current state.
At each step of the computation the machine can:
1. Read/scan the symbol below the R/W head
2. Update/write a symbol in the cell below the R/W head
3. Move the R/W head one step LEFT, or
4. Move the R/W head one step RIGHT
Finite Control is with a sort of FSM which has
• Initial state
• Final states or Accepting state.
• Rejecting state
Computation can either: Halt and ACCEPT or Halt and REJECT or LOOP (the machine fails to
HALT).
REPRESENTATION OF TURING MACHINES
We can describe a Turing machine by employing
1. Instantaneous descriptions (IDs) using move-relations
2. Transition tables
3. Transition diagrams
A TM that always halt irrespective of whether they accept or not, are a good model for an
algorithm. If an algorithm exist for a given problem, then the problem is decidable otherwise it is
un-decidable problem.
DESIGN OF TURING MACHINES
Basic guidelines for designing a Turing machine:
• The fundamental objective in scanning a symbol by the R/W head is to 'know‟ what to do in the
future.
• The machine must remember the past symbols scanned. The Turing machine can remember this
by going to the next unique state.
• The number of states must be minimized. This can be achieved by changing the states only
when there is a change in the written symbol or when there is a change in the movement of the
R/W head.
********Design a Turing Machine to accept the language L = { a^n b^n | n ≥ 1 }. Write the transition
diagram, also show the moves made by the TM for the string 'aabb'.
General Procedure:
Starting from the left end, the machine checks the first input symbol 'a', changes it to X, and moves
the R/W head towards the right until it sees the leftmost 'b'. When it finds the leftmost 'b', it replaces it
by Y and moves the R/W head towards the left. At this point one 'a' has been matched with one
'b'. The same process is repeated until all a's and b's are replaced by X's and Y's respectively. In
the start state, if there are no more a's (only Y), the machine changes state and checks that there are no
remaining b's. Finally, when the machine reads B, the string contains n a's followed by n b's and is accepted.
δ ( q0, a ) = ( q1, X, R ) ; replace a by X and move right
δ ( q1, a ) = ( q1, a, R ) ; In right move, ignore all a‟s and Y‟s
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, L ) ; replace b by Y, and move left
δ ( q2, a ) = ( q2, a, L ) ;In left move, ignore all a‟s and Y‟s
δ ( q2, Y ) = ( q2, Y, L )
δ ( q2, X ) = ( q0, X, R ) ; when it finds X in q2, move right, go to q0 and repeat the process.
After replacing all a‟s by X‟s and b‟s by Y‟s, and machine is in state q0 reads Y, it means that there
are no a‟s, we should see that there are no b‟s. For this change state to q3 and replace by Y by Y
and move right.
δ ( q0, Y ) = ( q3, Y, R )
In state q3 we should see that there are only Y‟s and no more b‟s. So as we scan Y‟s, replace Y by
Y and remain in q3 only.
δ ( q3, Y ) = ( q3, Y, R )
In state q3 if it reads B, it indicates that there no b‟s and we say that the language accepted, since it
contains n number of a‟s followed by n number of b‟s.
δ ( q3, B ) = ( qf, B, R )
Answer:
The TM for the language L = { a^n b^n | n ≥ 1 } is given by
M = ({ q0, q1, q2, q3, qf }, { a, b }, { a, b, X, Y, B }, δ, q0, B, {qf}) where δ is the transition function
given by:
δ ( q0, a ) = ( q1, X, R )
δ ( q1, a ) = ( q1, a, R )
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, L )
δ ( q2, a) = ( q2, a, L )
δ ( q2, Y ) = ( q2, Y, L )
δ ( q2, X ) = ( q0, X, R )
δ ( q0, Y ) = ( q3, Y, R )
δ ( q3, Y ) = ( q3, Y, R )
δ ( q3, B ) = ( qf, B, R )
OR
δ is given by the transition table (blank entries are undefined):
        a            b            X            Y            B
→q0  (q1, X, R)                             (q3, Y, R)
q1   (q1, a, R)   (q2, Y, L)                (q1, Y, R)
q2   (q2, a, L)                (q0, X, R)   (q2, Y, L)
q3                                          (q3, Y, R)   (qf, B, R)
qf
Transition diagram:
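Moves made by the TM for the string 'aabb' (derived from the δ above; in each ID the state is written to the left of the scanned symbol):
q0aabb ⊢ Xq1abb ⊢ Xaq1bb ⊢ Xq2aYb ⊢ q2XaYb ⊢ Xq0aYb ⊢ XXq1Yb ⊢ XXYq1b ⊢ XXq2YY ⊢ Xq2XYY ⊢ XXq0YY ⊢ XXYq3Y ⊢ XXYYq3B ⊢ XXYYBqf
Hence 'aabb' is accepted.
These moves can also be generated mechanically. A minimal single-tape TM simulator in Python (a sketch; all names are our own), loaded with the δ above:

# Transition function: (state, scanned symbol) -> (next state, written symbol, direction)
delta = {
    ('q0', 'a'): ('q1', 'X', 'R'), ('q0', 'Y'): ('q3', 'Y', 'R'),
    ('q1', 'a'): ('q1', 'a', 'R'), ('q1', 'Y'): ('q1', 'Y', 'R'),
    ('q1', 'b'): ('q2', 'Y', 'L'),
    ('q2', 'a'): ('q2', 'a', 'L'), ('q2', 'Y'): ('q2', 'Y', 'L'),
    ('q2', 'X'): ('q0', 'X', 'R'),
    ('q3', 'Y'): ('q3', 'Y', 'R'), ('q3', 'B'): ('qf', 'B', 'R'),
}

def run(w, start='q0', final='qf', blank='B', limit=1000):
    tape, head, q = list(w) + [blank], 0, start
    for _ in range(limit):
        print(''.join(tape[:head]) + q + ''.join(tape[head:]))   # one ID per step
        if q == final:
            return True
        move = delta.get((q, tape[head]))
        if move is None:
            return False              # no move defined: halt without accepting
        q, tape[head], d = move
        head += 1 if d == 'R' else -1
        if head == len(tape):
            tape.append(blank)        # the tape is unbounded: extend with blanks
        if head < 0:
            tape.insert(0, blank)     # ...in both directions
            head = 0
    return False                      # treat exceeding the step limit as looping

print(run('aabb'))    # True; the printed IDs match the sequence shown above
print(run('aab'))     # False: the machine halts without accepting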
δ ( q3, X ) = ( q0, X, R) ; when it finds X in left move, repeat the process from q0
After replacing a's by X's, b's by Y's, and c's by Z's, if the machine in state q0 reads Y, it
means that there are no more a's, and we must check that there are no unmatched b's and c's. For this, change the state to q4,
replace Y by Y and move right.
δ ( q0, Y ) = ( q4, Y, R )
In state q4 we should see only Y's and no more b's. So as we scan Y's, replace Y by
Y and remain in q4.
δ ( q4, Y ) = ( q4, Y, R )
In state q4 if it reads Z, it means that there are no b‟s, we should see that there are no c‟s and only
Z‟s should be present. So on scanning first Z change state to q5, replace Z by Z and move right.
δ ( q4, Z ) = ( q5, Z, R ) .
In state q5 only Z‟s should be present, so as long as scanned symbol is Z, remain in q5 and replace Z
by Z and move right.
δ ( q5, Z ) = ( q5, Z, R ) .
Once blank symbol is encountered, change state to qf, replace B by B and move right, and we say
that language is accepted by qf. . δ ( q5, B ) = ( qf, B, R ) .
Answer:
The TM for the language L = { a^n b^n c^n | n ≥ 1 } is given by
M = ({ q0, q1, q2, q3, q4, q5, qf }, { a, b, c }, { a, b, c, X, Y, Z, B }, δ, q0, B, {qf}) where δ is the transition
function given by:
δ ( q0, a ) = ( q1, X, R )
δ ( q1, a ) = ( q1, a, R )
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, R )
δ ( q2, b) = ( q2, b, R)
δ ( q2, Z) = ( q2, Z, R)
δ ( q2, c) = ( q3, Z, L)
δ ( q3, a ) = ( q3, a, L)
δ ( q3, b ) = ( q3, b, L)
δ ( q3, Y) = ( q3, Y, L)
δ ( q3, Z ) = ( q3, Z, L)
δ ( q3, X ) = ( q0, X, R)
δ ( q0, Y ) = ( q4, Y, R )
δ ( q4, Y ) = ( q4, Y, R ) .
δ ( q4, Z ) = ( q5, Z, R )
δ ( q5, Z ) = ( q5, Z, R ) .
δ ( q5, B ) = ( qf, B, R ) .
Transition Diagram:
********Design a Turing machine to accept the language consisting of all palindromes of 0‟s and
1‟s. Write the transition diagram. Also write the moves made by TM for the string 101.
General Procedure:
Starting at the left end machine checks the first input symbol, if it is a 0, change it to X. Similarly
if it is a 1, change it to Y and move the r/w head towards right until it sees a blank. Now when it
finds a blank (B) move the r/w head towards left and check whether the scanned input symbol
matches the one most recently changed. If so it is also changed correspondingly and the machine
moves back left until it finds the left most 0 or 1. This process is continued by moving left and right
alternately until all 0‟s and 1‟s have been matched.
δ ( q0, 0 ) = ( q1, X, R ) ;In start state q0, replace 0 by X, change state to q1 and move right.
δ ( q0, 1 ) = ( q2, Y, R ) ;In start state q0, replace 1 by Y, change state to q2 and move right.
δ ( q1, 0 ) = ( q1, 0, R )
δ ( q1, 1 ) = ( q1,1, R ) ; In state q1 or q2 ignore 0‟s and 1‟s and move right until it sees B
or X or Y
δ ( q2, 0 ) = ( q2, 0, R )
δ ( q2, 1 ) = ( q2, 1, R )
δ ( q1, B) = ( q3, B, L ) ; In q1 or q2 when it finds B or X or Y change state to q3 or q4 and
δ ( q1, X) = ( q3, X, L ) move left
δ ( q1, Y) = ( q3, Y, L )
δ ( q2, B) = ( q4, B, L )
δ ( q2, X) = ( q4, X, L )
δ ( q2, Y) = ( q4, Y, L )
In state q3 it verifies that the symbol read is 0 and changes the 0 to an X and goes to state q5 .or in
q4 it verifies that the symbol read is 1 and changes 1 to a Y and goes to state q5.
δ ( q3, 0) = ( q5, X, L)
δ ( q4, 1) = ( q5, Y, L).
In state q5 machine moves left by ignoring 0‟s and 1‟s encountered, until it finds an X or Y.
δ ( q5, 0) = ( q5, 0, L)
δ ( q5, 1) = ( q5, 1, L)
Now when it finds X or Y, once again machine changes state to q0 and moves right.
δ ( q5, X) = ( q0, X, R)
δ ( q5, Y) = ( q0, Y, R)
Once again there are two possible cases in state q0:
1. If machine sees 0‟s and 1‟s, it repeats the above matching cycle process, we have just
described.
2. If machine sees X or Y, then it indicates that machine has changed all 0‟s to X‟s and 1‟s to
Y‟s; the input was of palindrome of even length, and hence machine should accept. Thus
machine enters state qf and halts.
δ ( q0, X) = ( qf, X, R)
δ ( q0, Y) = ( qf, Y, R)
In case machine in state q3 or q4 and reads an X or a Y instead of a 0 or 1, it concludes that input
was a palindrome of odd length.
In this case it changes to state qf.
δ ( q3, X) = ( qf, X, R)
δ ( q3, Y) = ( qf, Y, R)
δ ( q4, X) = ( qf, X, R)
δ ( q4, Y) = ( qf, Y, R).
Note: If machine encounters a 1 in state q3 or a 0 in state q4, then the input is not a palindrome
and so machine dies without accepting.
Answer:
The TM for the language consisting of all palindromes of 0‟s and 1‟s is given by
M = ({q0, q1, q2, q3, q4, q5, qf}, { 0, 1}, { 0, 1, X, Y, B}, δ, q0, B, {qf}) where δ is the transition
function given by:
δ ( q0, 0 ) = ( q1, X, R )
δ ( q0, 1 ) = ( q2, Y, R )
δ (q0, B) = ( qf, B, R)
δ ( q0, X) = ( qf, X, R)
δ ( q0, Y) = ( qf, Y, R)
δ ( q1, 0 ) = ( q1, 0, R )
δ ( q1, 1 ) = ( q1,1, R )
δ ( q2, 0 ) = ( q2, 0, R )
δ ( q2, 1 ) = ( q2, 1, R )
δ ( q1, B) = ( q3, B, L )
δ ( q1, X) = ( q3, X, L )
δ ( q1, Y) = ( q3, Y, L )
δ ( q2, B) = ( q4, B, L )
δ ( q2, X) = ( q4, X, L )
δ ( q2, Y) = ( q4, Y, L )
δ ( q3, 0) = ( q5, X, L)
δ ( q3, X) = ( qf, X, R)
δ ( q3, Y) = ( qf, Y, R)
δ ( q4, 1) = ( q5, Y, L)
δ ( q4, X) = ( qf, X, R)
δ ( q4, Y) = ( qf, Y, R)
δ ( q5, 0) = ( q5, 0, L)
δ ( q5, 1) = ( q5, 1, L)
δ ( q5, X) = ( q0, X, R)
δ ( q5, Y) = ( q0, Y, R)
Transition Diagram:
******Design a TM that accepts the language L = { wwR | w ∈ (0, 1)* }. Write its transition diagram.
Also show the moves made by the TM for the string 0110.
Note: Answer is same as that of previous problem except in wwR (string of palindrome of even
length) from states q3 and q4 no transitions are defined on input symbols X and Y.
Answer:
The TM for the language L = { wwR | w ∈ (0, 1)* } is given by
M = ({q0, q1, q2, q3, q4, q5, qf}, {0, 1}, { 0, 1, X, Y, B}, δ, q0, B, {qf}) where δ is the transition
function given by:
δ ( q0, 0 ) = ( q1, X, R )
δ ( q0, 1 ) = ( q2, Y, R )
δ (q0, B) = ( qf, B, R)
δ ( q0, X) = ( qf, X, R)
δ ( q0, Y) = ( qf, Y, R)
δ ( q1, 0 ) = ( q1, 0, R )
δ ( q1, 1 ) = ( q1,1, R )
δ ( q2, 0 ) = ( q2, 0, R )
δ ( q2, 1 ) = ( q2, 1, R )
δ ( q1, B) = ( q3, B, L )
δ ( q1, X) = ( q3, X, L )
δ ( q1, Y) = ( q3, Y, L )
δ ( q2, B) = ( q4, B, L )
δ ( q2, X) = ( q4, X, L )
δ ( q2, Y) = ( q4, Y, L )
δ ( q3, 0) = ( q5, X, L)
δ ( q4, 1) = ( q5, Y, L)
δ ( q5, 0) = ( q5, 0, L)
δ ( q5, 1) = ( q5, 1, L)
δ ( q5, X) = ( q0, X, R)
δ ( q5, Y) = ( q0, Y, R)
Transition Diagram:
******Design a TM that accepts the language L = { w | Na(w) = Nb(w) for all w ∈ (a, b)* }. Write its
transition diagram. Also show the moves made by the TM for the string bbabaa.
General Procedure:
Three possible cases:
1. On encountering B in start state, machine directly enters into final state qf.
2. On encountering a in stateq0.
3. On encountering b in state q0.
On encountering B in start state, machine directly enters into final state qf.
δ ( q0, B) = ( qf, B, R)
In start state q0 on encountering a, we skip all subsequent symbols till we get b. Then come back to
the next leftmost symbol and repeat any of the 3 cases based on the next input symbol to be
scanned.
In start state q0 on encountering b, we skip all subsequent symbols till we get a. Then come back to
the next leftmost symbol and repeat any of the 3 cases based on the next input symbol to be
scanned.
On encountering a:
δ ( q0, a) = ( q1, X, R) ; replace a by X and move right to get b
δ ( q1, a) = ( q1, a, R) ; ignore all a‟s and Y‟s till we get b
δ ( q1, Y) = ( q1, Y, R)
δ ( q1, b) = ( q2, Y, L) ; replace b by Y and move left and find the next leftmost symbol.
δ ( q2, a) = ( q2, a, L) ; when searching for X, we may encounter a‟s and Y‟s, so ignore that symbol.
δ ( q2, Y) = ( q2, Y, L)
δ ( q2, X) = ( q0, X, R) ; when it finds X, go to q0 and repeat.
On encountering b:
δ ( q0, b) = ( q3, X, R) ; replace b by X and move right to get a
δ ( q3, b) = ( q3, b, R) ; ignore all b‟s and Y‟s until it sees a
δ ( q3, Y) = ( q3, Y, R)
δ ( q3, a) = ( q4, Y, L) ; replace a by Y and move left and find the next leftmost symbol.
δ ( q4, b) = ( q4, b, L) ; when searching for X, we may encounter b‟s and Y‟s, so ignore that symbol.
δ ( q4, Y) = ( q4, Y, L)
δ ( q4, X) = ( q0, X, R) ; when it finds X, go to q0 and repeat.
In state q0 if machine reads Y, it indicates that so far the scanned symbols have equal number of a‟s
and b‟s. So replace Y by Y and move the r/w head towards right, remain in q 0 and repeat any one
of the three cases
δ ( q0, Y) = ( q0, Y, R)
Finally the language is accepted when there is no input in q0, machine enters to final state qf
Answer:
The TM for the language L = { w | Na(w) = Nb(w) for all w ∈ (a, b)* } is given by
M = ({q0, q1, q2, q3, q4, qf}, { a, b }, { a, b, X, Y, B }, δ, q0, B, {qf}) where δ is the transition function
given by:
δ ( q0, B) = ( qf, B, R)
δ ( q0, a) = ( q1, X, R)
δ ( q0, Y) = ( q0, Y, R)
δ ( q1, a) = ( q1, a, R)
δ ( q1, Y) = ( q1, Y, R)
δ ( q1, b) = ( q2, Y, L)
δ ( q2, a) = ( q2, a, L)
δ ( q2, Y) = ( q2, Y, L)
δ ( q2, X) = ( q0, X, R)
δ ( q0, b) = ( q3, X, R)
δ ( q3, b) = ( q3, b, R)
δ ( q4, X) = ( q0, X, R)
δ ( q3, a) = ( q4, Y, L)
δ ( q4, b) = ( q4, b, L)
δ ( q4, Y) = ( q4, Y, L)
δ ( q3, Y) = ( q3, Y, R)
Transition Diagram:
*********Given a string w, design a Turing machine that generates the string ww, where w ∈ a*.
General Procedure:
1. Replace each symbol in w with X
2. Find the rightmost X
3. Replace the rightmost X by the symbol a
4. Move the R/W head towards right of rightmost a and replace B by a
5. Find the rightmost X
6. Repeat through step 3 till we find no more X‟s
In state q0, keep on replacing the input symbol a by X and move the r/w head towards right till we
find B.
δ ( q0, a) = ( q0, X, R)
In state q0, when it finds B, replace B by B and change the state to q1 and move r/w head towards
left, till we get X.
δ ( q0, B) = ( q1, B, L)
If we get a in state q1, replace a by a and move left.
δ ( q1, a) = ( q1, a, L)
When we get X in q1, replace it by a, change the state to q2 and move right.
δ ( q1, X) = ( q2, a, R)
If we get a in state q2, replace a by a and move right till we get B
δ ( q2, a) = ( q2, a, R)
In state q2, when it finds B, replace B by a and change the state to q1 and move r/w head towards
left, till we get X and repeat the above steps.
δ ( q2, B) = ( q1, a, L)
Finally when there is no more X‟s, and in state q1, machine reads B as the input, change the state to
qf, replace B by B and move right.
δ ( q1, B) = ( qf, B, R)
The Turing machine that generates the string ww where w € a* is given by
M = ({q0, q1, q2, qf }, {a}, { a, X, B}, δ, q0, B, {qf}) where δ is the transition function given by:
δ ( q0, a) = ( q0, X, R)
δ ( q0, B) = ( q1, B, L)
δ ( q1, a) = ( q1, a, L)
δ ( q1, X) = ( q2, a, R)
δ ( q2, a) = ( q2, a, R)
δ ( q2, B) = ( q1, a, L)
δ ( q1, B) = ( qf, B, R)
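For example, for w = 'a' the machine makes the following moves (derived from the δ above):
q0a ⊢ Xq0B ⊢ q1XB ⊢ aq2B ⊢ q1aa ⊢ q1Baa ⊢ Bqfaa
It halts in qf with ww = aa on the tape.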
DESCRIPTION OF TURING MACHINES
In the examples discussed so far, the transition function δ was described as a partial function
(the function δ: Q x Г → Q x Г x {L, R} is not defined for all (q, x)) by spelling out the current state,
the input symbol, the resulting state, the tape symbol replacing the input symbol and the movement
of the R/W head to the left or right. We can call this a formal description of a TM. Just as we have
machine language and higher-level languages for a computer, we can have a higher level of
description, called the implementation description. In this case we describe the movement of the
head, the symbols stored etc. in English.
For example, a single instruction like 'move to right till the end of the input string' requires several
moves. A single instruction in the implementation description is equivalent to several moves of a
standard TM.
At a higher level we can give instructions in English language even without specifying the state or
transition function. This is called a high-level description. In next section we give implementation
description or high-level description.
TECHNIQUES FOR TM CONSTRUCTION
The Turing machine, which we have discussed so far, is called the standard or Basic Turing
machine. In this section we give some high-level conceptual tools to make the construction of TMs
easier.
1. Turing Machine with stationary head
2. Storage in the state
3. Multiple Track Turing Machine
4. Subroutines
TURING MACHINE WITH STATIONARY HEAD
In the definition of a TM we defined δ(q, a) = (p, Y, D) where D = L or R. So the head moves to
the left or right after reading an input symbol. Suppose we want to include the option that the head
can continue to be in the same cell for some input symbol. Then we define δ(q, a) = (p, Y, S). This
means that the TM, on reading the input symbol a, changes the state to p and writes Y in
the current cell in place of a and R/W head continues to remain in the same cell.
A stationary move can be simulated by the standard TM with two moves (for example, write Y and
move right, then move back left leaving the symbol unchanged).
State [ q0, B] → In the initial state, M is in q0 and TM has seen only B in its data portion.
In state [q0, B] on seeing the first symbol as 0, of the input sting w, M moves right, enters the state
[q1, 0]
In state [q0, B] on seeing the first symbol as 1, of the input sting w, M moves right, enters the state
[q2, 1]
In [q1, 0] → M moves right without changing state for input symbol 1.
In [q2, 1] → M moves right without changing state for input symbol 0.
In state [q1, 0] if its next symbol is B, M enters [qf, B], an accepting state.
In state [q2, 1] if its next symbol is B, M enters [qf, B], an accepting state.
Example:
Here the input symbols are tape symbols defined on 3 tracks, i.e. Γ³; for example, the input is [c, a, b]:
δ ( q, [c, a, b] ) = ( qnext, [Z, X, Y], R )
The resultant tape structure is as shown below:
SUBROUTINES
Subroutines are used in computer languages, when some task has to be done repeatedly. We can
implement this facility for TMs as well.
TM subroutine is a set of states that perform some pre-defined task. The TM subroutine has a start state
and a state without any moves. This state which has no moves serves as the return state and passes the
control to the state which calls the subroutine.
Design a TM which can multiply two positive integers
The input (m, n), with m and n given positive integers, is represented in unary as 0^m 1 0^n. M starts
with 0^m 1 0^n on its tape. At the end of the computation, 0^mn (mn in unary representation) surrounded
by B's is obtained as the output.
General Procedure:
1. 0m10n1 is placed on the tape and output will be written after the rightmost 1.
2. The Leftmost 0 is erased by replacing 0 by B.
3. A block of n 0‟s is copied onto the right end.
4. Step 2 and 3 is repeated m times and 10m10mn is obtained on the tape
5. The prefix 10n1 of 10n10mn is erased. (replacing all 0‟s and 1‟s by B) leaving the product mn
as the output.
For example multiply 2 and 4: Initially tape contains these two unary numbers is as follows:
Here we have to copy n number of 0‟s from the second group to the last group by replacing n number of
B‟s by n number of 0‟s
In the start state q0, replace the leftmost 0 by B, change state to q1 and move the R/W head towards the right till we get 1.
δ (q0, 0) = ( q1, B, R)
Now we should copy the n 0's of the second group to the last group.
δ (q1, 0) = ( q1, 0, R)
δ (q1, 1) = ( q2, 1, R)
Now the R/W head is pointing to the first 0 of the second group (COPY subroutine, start state q2).
In q2, replace 0 by X, change the state to q3 and move the R/W head towards the right till we get B.
δ (q2, 0) = ( q3, X, R)
δ (q3, 0) = ( q3, 0, R)
δ (q3, 1) = ( q3, 1, R)
In q3, when it reads B, replace B by 0, change state to q4 and move left (at this point one symbol has been copied from the second group to the last group).
δ (q3, B) = ( q4, 0, L)
In q4 we should search for the rightmost X. While moving left in q4, replace 0 by 0 and 1 by 1 till we get X.
δ (q4, 0) = ( q4, 0, L)
δ (q4, 1) = ( q4, 1, L)
When it reads X, change state to q2, replace X by X and move right.
δ (q4, X) = ( q2, X, R)
When the n B's of the last group have been replaced by n 0's (one complete copy of the second group), the machine in state q2 reads 1 instead of 0; it then changes state to q5 and moves left.
δ (q2, 1) = ( q5, 1, L)
In state q5, while moving left, replace all X's in the second group by 0's, till we get 1.
δ (q5, X) = ( q5, 0, L)
When the machine reads 1 in q5, replace 1 by 1, move right and change state to q6.
δ (q5, 1) = ( q6, 1, R)
In state q6 the machine reads 0 (pointing to the first symbol of the 2nd group); it moves left and changes state to q7.
δ (q6, 0) = ( q7, 1, L)
In q7 the machine reads 1, the delimiter between the 1st and 2nd groups. Change state to q8 and move left, so that the machine enters the 1st group.
δ (q7, 1) = ( q8, 1, L)
In q8, when it reads 0, change state to q9 and move left.
δ (q8, 0) = ( q9, 0, L)
In q9, on any number of 0's, move the R/W head towards the left.
δ (q9, 0) = ( q9, 0, L)
In q9, if we encounter B, change the state to q0 and move right.
δ (q9, B) = ( q0, B, R)
But if, in state q8, the machine encounters B instead of 0, it means that the n 0's have been copied from the second group to the last group m times.
Now replace the delimiter 1's which precede and follow the second group, and the 0's of the second group, by B's.
δ (q8, B) = ( q10, B, R)
δ (q10, 1) = ( q11, B, R)
δ (q11, 0) = ( q11, B, R)
δ (q11, 1) = ( q12, B, R)
Final tape contents (the product 0^8 = 0^(2x4), surrounded by B's):
B B B B B B B B B B B B B 0 0 0 0 0 0 0 0 B B
Transition diagram:
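Such machines can be checked mechanically. The following is a minimal single-tape TM simulator in Python (a sketch; the dictionary encoding of δ and the toy example machine are our own, not part of the notes):

# delta maps (state, symbol) -> (new_state, written_symbol, L/R); 'B' is the blank.
def run_tm(delta, tape, state, final_states, max_steps=10000):
    cells = dict(enumerate(tape))          # sparse tape: cell index -> symbol
    head = 0
    for _ in range(max_steps):
        if state in final_states:
            return True, cells             # accept halt
        sym = cells.get(head, 'B')
        if (state, sym) not in delta:      # no move defined: halt without accepting
            return False, cells
        state, cells[head], move = delta[(state, sym)]
        head += 1 if move == 'R' else -1
    raise RuntimeError('no halt within step bound')

# Hypothetical example machine: replace every a by X, accept at the first blank.
delta = {('q0', 'a'): ('q0', 'X', 'R'),
         ('q0', 'B'): ('qf', 'B', 'R')}
print(run_tm(delta, 'aaa', 'q0', {'qf'})[0])   # True

Feeding in all the δ-entries of the multiplication machine above (with input 0 0 1 0 0 0 0 1) lets one watch the COPY subroutine at work step by step.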
Multi-Tape TM:
A Turing machine M with more than one tape.
A multi-tape TM has a finite set Q of states, an initial state q0, a subset F of Q called the set of final states, a set Г of tape symbols, and a blank symbol B in Г.
• There are k tapes, each divided into cells. The first tape holds the input string w.
• Initially all the other tapes hold the blank symbol.(B)
• Initially the head of the first tape (input tape) is at the left end of the input w.
• All the other heads can be placed at any cell initially.
• δ is a partial transition function from Q x Гk into Q x Гk x {L, R, S}k. where k is the
number of tapes.
• A multi-tape TM appears more powerful than a single-tape TM, but the languages accepted by multi-tape TMs are exactly the recursively enumerable languages. That means any language accepted by a multi-tape TM is also accepted by a basic or standard TM; multi-tape TMs and standard TMs are equivalent.
In one move, the multi-tape TM M enters a new state; on each tape a new symbol is written in the cell under the head; and each tape head moves to the left or right or remains stationary. The heads move independently: some may move to the left, some to the right, and the remaining heads may not move.
The initial ID has the initial state q0, the input string w on the first tape (the input tape), and only B's on the remaining k - 1 tapes.
An accepting ID has a final state and some string on each of the k tapes.
Theorem: Every language accepted by a multi-tape TM is also accepted by a single-tape TM.
OR
Show that multi-tape TMs and basic (standard) TMs are equivalent.
Proof:
Suppose a language L is accepted by a k-tape (multi-tape) TM M. We simulate M with a single-tape TM M1 having 2k tracks. Let us give the implementation description for k = 2.
We will prove this theorem by simulating the working of a 2-tape TM M by a single-tape 4-track TM M1. In general, the second, fourth, ..., (2k)th tracks hold the contents of the k tapes, while the first, third, ..., (2k - 1)th tracks hold a R/W head marker (a symbol, say X) to indicate the position of the respective tape head.
Initially the R/W head markers (X) of the first and third tracks are at the cells containing the first symbols.
To simulate one move of the multi-tape TM M, the single-tape TM M1 has to visit the two R/W head markers and store the scanned symbols in its finite control.
The finite control of the single-tape TM M1 also stores the information about the states of the multi-tape TM M and its moves.
Now M1 revisits each of the head markers to perform the following operations:
It changes the tape symbol in the corresponding track of M1 according to the move of the 2-tape TM M for the current state of M and the scanned tape symbols; it moves the head markers to the left or right as required; and M1 updates the state of M recorded in its finite control.
The non-deterministic TM is in fact no more powerful than the deterministic TM: any language accepted by a non-deterministic TM can also be accepted by some deterministic TM.
An LBA is formally defined as a 9-tuple M = (Q, ∑, Г, δ, q0, B, ¢, $, F) where
• Q is a finite nonempty set of states.
• Г is a finite nonempty set of tape symbols.
• B is the blank symbol.
• ∑ is a nonempty set of input symbols, a subset of Г with B ∉ ∑; it includes two special symbols ¢ and $ (the end markers).
• δ is the transition function mapping (q, x) onto (q', y, D), where D denotes the direction of movement of the R/W head: D = L or R according as the movement is to the left or right. That is,
δ: Q x Г → Q x Г x {L, R}
• q0 ∈ Q is the initial state, and
• F, a subset of Q, is the set of final states.
¢ is the left end marker, which is entered in the leftmost cell of the input tape and prevents the R/W head from getting off the left end of the tape.
$ is the right end marker, which is entered in the rightmost cell of the input tape and prevents the
R/W head from getting off the right end of the tape. Both the end markers should not appear on
any other cell within the input tape. R/W head should not print any other symbol over both the
end markers.
There are two tapes: one is called the input tape, and the other, working tape.
• On the input tape the head never prints and never moves to the left.
• On the working tape the head can modify the contents in any way, without any restriction.
The set of strings accepted by nondeterministic LBAs is exactly the set of strings generated by context-sensitive grammars (excluding the null string), that is, the context-sensitive languages.
DECIDABILITY
The notion of a recursively enumerable language and a recursive language existed even before
the invention of computers. These languages are also defined using Turing machines as follows:
A TM halts when it reaches a final state after reading the entire input string w, or when it reaches a state q scanning a symbol a such that δ(q, a) is undefined.
There are TMs that, on some inputs, halt in neither of these ways (they may enter an infinite loop).
So we make a distinction between the languages accepted by TMs that halt on all input strings and those accepted by TMs that may never halt on some input strings. This leads to the notions of decidability and un-decidability of languages.
Suppose there were a TM M1 which decides whether or not the computation of any other TM M will ever halt, when a description of M and its input w are given; that is, the input to M1 is the (machine, tape) pair (M, w).
Then for every input (M, w) to M1: if TM M halts on input w, then M1 halts in an Accept state (accept halt).
Similarly, if M does not halt on input w, then M1 halts in a Reject state (reject halt).
Now construct from M1 a new TM M2 which takes a machine description M as input and does the opposite of what M1 predicts: M2 runs M1 on (M, M); if M1 answers "halts", M2 enters an infinite loop, and if M1 answers "does not halt", M2 halts.
As M2 is itself a TM, we can give M2 its own description as input. Thus the machine M2 halts on input M2 if and only if M2 does not halt on M2. This is a contradiction. That means a machine M1 which can tell whether any other Turing machine will halt on a particular input does not exist. Hence the halting problem is un-decidable.
THE POST CORRESPONDENCE PROBLEM
The Post Correspondence Problem (PCP) was first introduced by Emil Post in 1946. Later, the
problem was found to have many applications in the theory of formal languages. The problem over
an alphabet ∑ belongs to a class of yes/no problems and is stated as follows:
Consider the two lists of non-empty strings over an alphabet ∑ = { 0, 1}
x = (x1, x2, x3, ..., xn)
y = (y1, y2, y3, ..., yn)
The PCP is to determine whether or not there exist indices i1, i2, ..., im, where 1 ≤ ij ≤ n, such that
xi1 xi2 ... xim = yi1 yi2 ... yim
The indices ij need not be distinct and m may be greater than n. Also, if there exists a solution to
PCP, there exist infinitely many solutions.
Does the PCP with two lists x = (b, bab^3, ba) and y = (b^3, ba, a) have a solution?
Answer:
We have to determine whether or not there exists a sequence of substrings of x such that the string
formed by this sequence and the string formed by the sequence of corresponding substrings of y are
identical.
x = (b, bab^3, ba) and y = (b^3, ba, a)
The required sequence is given by:
i1 = 2, i2 = 1, i3 = 1, i4 = 3, ie: (2, 1, 1, 3) and m = 4
The corresponding strings are:
bab^3 · b · b · ba = ba · b^3 · b^3 · a, i.e., babbbbbba = babbbbbba; thus the PCP has a solution.
Does the PCP with two lists x = (11, 100, 111) and y = (111, 001, 11) have a solution?
The required sequence is given by:
i1 = 1, i2 = 2, i3 = 3, i.e., (1, 2, 3) and m = 3
The corresponding strings are:
11 · 100 · 111 = 111 · 001 · 11, i.e., 11100111 = 11100111; thus this PCP also has a solution.
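Because a solution is just a finite sequence of indices, candidate sequences can be enumerated by increasing length. A small brute-force sketch in Python (our own illustration; it can confirm a solution but can never refute PCP in general, since PCP is un-decidable, and it may find a shorter solution than the one quoted above):

from itertools import product

def pcp_solution(x, y, max_len=6):
    # Try every index sequence of length 1..max_len over the n pairs.
    n = len(x)
    for m in range(1, max_len + 1):
        for seq in product(range(n), repeat=m):
            if ''.join(x[i] for i in seq) == ''.join(y[i] for i in seq):
                return [i + 1 for i in seq]        # 1-based indices
    return None                                     # none found up to max_len

print(pcp_solution(['b', 'babbb', 'ba'], ['bbb', 'ba', 'a']))   # [2, 1, 1, 3]
print(pcp_solution(['11', '100', '111'], ['111', '001', '11'])) # [1, 3], also a solution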
If L1 and L2 are recursive languages then Show that L1 U L2 is also recursive language.
OR
Show that the recursive languages are closed under union.
Proof:
Let L1 and L2 be recursive languages. Then there exist a TM M1 that accepts (and decides) L1 and a TM M2 that accepts (and decides) L2. Now we construct a TM M that accepts the language L = L1 U L2.
Construction of TM M is as follows:
On input w, M first runs M1 on w; since L1 is recursive, M1 is guaranteed to halt. If M1 accepts, M accepts; otherwise M runs M2 on w, which is also guaranteed to halt, and M accepts if and only if M2 accepts.
If the string w ∈ L1 U L2, then either w ∈ L1 or w ∈ L2 or w belongs to both. That means M1 accepts w if w ∈ L1, or M2 accepts w if w ∈ L2 (both accept if w belongs to both). Thus the simulated TM M produces the output Accept (yes).
Similarly, if the string w does not belong to L1 U L2, then w belongs to neither L1 nor L2, so both machines M1 and M2 halt with output Reject, and the simulated TM M also produces the output Reject (no).
Thus the TM M decides L = L1 U L2, always halting with one of the two outputs Y or N; hence L1 U L2 is recursive.
If L1 and L2 are recursively enumerable languages then Show that L1 U L2 is also recursively
enumerable language.
OR
Show that the recursively enumerable languages are closed under union.
Proof:
Let L1 and L2 be recursively enumerable languages. Then there exist a TM M1 that accepts L1 and a TM M2 that accepts L2. Now we construct a TM M that accepts the language L = L1 U L2.
Construction of TM M is as follows:
Since M1 and M2 may loop forever on strings outside their languages, M cannot simply run them one after the other; instead M simulates M1 and M2 in parallel (for example, alternating one move of M1 with one move of M2).
If the string w ∈ L1 U L2, then either w ∈ L1 or w ∈ L2 or w belongs to both. That means M1 accepts w if w ∈ L1, or M2 accepts w if w ∈ L2 (both accept if w belongs to both). In the parallel simulation at least one of them accepts after finitely many moves, so the simulated TM M produces the output Accept (yes).
If the string w does not belong to L1 U L2, then w belongs to neither L1 nor L2. In that case each of M1 and M2 either halts with Reject or loops forever: if both halt, the simulated TM M produces the output Reject (no); if either of them loops forever, M also loops forever.
Thus the TM M accepts exactly L = L1 U L2 (on each input it produces Y, produces N, or loops forever), so L1 U L2 is recursively enumerable.
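The "in parallel" simulation can be pictured with Python generators, each yield being one move of a machine (a sketch under our own encoding; a real TM interleaves moves of M1 and M2 in the same alternating way):

def run_in_parallel(rec1, rec2):
    # rec1, rec2: generators yielding None while working and True on acceptance.
    # Alternate one step each; accept as soon as either accepts. If neither ever
    # accepts, this may loop forever -- exactly what an RE recognizer may do.
    machines = [rec1, rec2]
    while machines:
        for m in machines[:]:
            try:
                if next(m) is True:
                    return True
            except StopIteration:          # this machine halted without accepting
                machines.remove(m)
    return False                            # both halted and rejected

def loops_forever(w):                       # a recognizer that never halts on w
    while True:
        yield None

def accepts_if_starts_with_a(w):            # a recognizer that accepts after two moves
    yield None
    yield w.startswith('a')

print(run_in_parallel(loops_forever('abc'), accepts_if_starts_with_a('abc')))  # True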
Show that complement of a recursive language is also recursive language.
OR
If L is recursive, show that its complement L' is also recursive.
Proof:
If L is recursive then there exists a TM M with two outputs, Yes (Accept) or No (Reject). Thus the machine always halts and T(M) = L.
Let us construct a new machine M1 such that L' = T(M1), with the following steps:
1. The accepting states of M are made non-accepting states of M1, with no transitions from these states in M1. That means we have created states in M1 that halt without accepting.
2. Create a new accepting state qf for M1; there are no transitions from qf.
3. If q is a non-accepting state of M and δ(q, x) is not defined, then add a transition from q to qf in M1.
Since M is guaranteed to halt, M1 is also guaranteed to halt. In fact M1 accepts exactly those strings that M does not accept, so M1 accepts L':
If the input w belongs to L, then M accepts and halts, so M1 halts without accepting, i.e. M1 rejects w.
If the input string w does not belong to L, i.e. w ∈ L', then M halts without accepting, so M1 accepts w.
Thus in both cases M1 eventually halts, and by the construction it is clear that T(M1) = L'.
Hence L' is recursive.
COMPLEXITY
The efficiency of an algorithm can be decided by measuring the performance of an algorithm. We
can measure the performance of an algorithm by computing two factors:
i. Amount of time required by an algorithm to execute
ii. Amount of storage required by an algorithm.
Hence we define two terms- Time complexity and space complexity.
Time complexity: of an algorithm means the amount of time taken by algorithm to run. By
computing time complexity we come to know whether the algorithm is slow or fast
Space complexity of an algorithm means the amount of space (memory) taken by an algorithm. By
computing space complexity we can analyze whether an algorithm requires more or less space.
To select the best algorithm, we need to check the efficiency of each algorithm. Efficiency can be measured by computing the time complexity of each algorithm. Asymptotic notations such as Ω, Θ and O are shorthand ways of representing time complexity; using them we can describe the running time as "fastest possible", "slowest possible" or "average".
Big Oh Notation: The Big Oh notation, denoted O, is a method of representing the upper bound of an algorithm's running time. Using Big Oh notation we can give the longest amount of time taken by the algorithm to complete.
Definition of Big Oh notation:
Let f(n) and g(n) be two non-negative functions. If there exist a constant c > 0 and an integer n0 (denoting some value of the input size) such that
f(n) ≤ c * g(n) for all n > n0,
then f(n) is Big Oh of g(n), i.e. f(n) = O(g(n)).
Consider f(n) = 2n + 2 and g(n) = n^2, and find a constant c so that f(n) <= c * g(n). Take c = 1 and check successive values of n starting from n0 = 1:
n = 1: f(n) = 4 and g(n) = 1, so f(n) > g(n)
n = 2: f(n) = 6 and g(n) = 4, so f(n) > g(n)
n = 3: f(n) = 8 and g(n) = 9, so f(n) < g(n)
Hence we can conclude that for all n > 2 we have f(n) < g(n), so f(n) = O(n^2). Thus Big Oh notation gives an upper bound on the running time.
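The same comparison can be tabulated mechanically (a tiny sketch with c = 1, matching the computation above):

f = lambda n: 2 * n + 2
g = lambda n: n * n
c = 1
for n in range(1, 8):
    print(n, f(n), g(n), f(n) <= c * g(n))
# False for n = 1, 2 and True from n = 3 onwards: f(n) = O(n^2) with c = 1, n0 = 2.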
GROWTH RATE OF FUNCTIONS
When we have two algorithms for the same problem, we may require a comparison between the
running times of these two algorithms.
Measuring the performance of an algorithm in relation to the input size n is called the order of growth.
Quantum Computation is the area of study that focuses on the development of computer technology based on the principles of quantum theory.
Quantum Computer:
We know that a bit (a 0 or a 1) is the fundamental concept of classical computation and information. A classical computer is built from electronic circuits containing wires and logic gates. Let us study quantum bits (qubits) and quantum circuits, which are analogous to classical bits and circuits.
A quantum computer maintains a sequence of qubits. A qubit can be described mathematically as follows: a classical bit has two states, 0 and 1; the two corresponding basis states for a qubit are |0> and |1> (a qubit is written using the notation | >). Unlike a classical bit, a qubit can be in infinitely many states other than |0> and |1>: it can be in any superposition state
α|0> + β|1>, where α and β are complex numbers with |α|^2 + |β|^2 = 1.
It is not possible to determine a quantum state by observation, whereas in a classical computer the bit values 0 and 1 can be observed directly.
Multiple qubits can be defined.
Example: a two-qubit system has 4 basis states:
|00>, |01>, |10>, |11>
Quantum states can be superpositions of these basis states. The qubit NOT gate exchanges the roles of |0> and |1>, so the state α|0> + β|1> is changed to α|1> + β|0>.
The action of the qubit NOT gate can be represented by the 2 x 2 matrix X with rows (0 1) and (1 0).
Thus the quantum computer is a system built from quantum circuits, containing wires and
elementary quantum gates to carry out manipulation of quantum information
CHURCH-TURING THESIS
Any algorithm that can be performed on any computing machine can be performed on a Turing
machine as well.
Any algorithmic process can be simulated efficiently by a Turing machine
• But a challenge to the strong Church-Turing thesis arose from analog computation. Certain types of analog computers solved some problems efficiently for which no efficient solution on a Turing machine was known; but when the presence of noise was taken into account, the power of the analog computers disappeared.
• In the mid-1970s, Robert Solovay and Volker Strassen gave a randomized algorithm for testing the primality of a number. (A deterministic polynomial-time algorithm was given by Manindra Agrawal, Neeraj Kayal and Nitin Saxena of IIT Kanpur in 2002.) This led to a modification of the thesis.
Strong Church-Turing Thesis (modified form): any algorithmic process can be simulated efficiently using a probabilistic Turing machine.
5. RECURSIVELY ENUMERABLE LANGUAGES
A language L which is a subset of ∑* is a recursively enumerable (RE) language if there exists a TM M such that L = T(M). (On a given input, M may halt or enter an infinite loop.)
That means languages accepted by a TM are called RE languages.
Structure of RE languages:
1. Some RE languages have a TM that not only recognizes the language but also tells us when it has decided that the input string is not in the language; such a TM always halts eventually, regardless of whether or not it reaches an accepting state.
2. The remaining RE languages are not accepted by any TM with a guarantee of halting. These languages are accepted in an inconvenient way:
i. If the input is in the language, then it is accepted by the TM.
ii. If the input is not in the language, then the TM may run forever, and we shall never be sure the input won't be accepted eventually.
6. RECURSIVE LANGUAGES
A language L which is a subset of ∑* is a recursive language if there exists a TM M that satisfies the following two conditions:
i. If the string w is in the language, then the TM accepts w and halts.
ii. If the string w is not in the language, then the TM eventually halts without reaching an accepting state.
The definition of a recursive language assures us that the TM always halts. Clearly every recursive language is also recursively enumerable.
Recursive languages are also called decidable languages. A TM that always halts, irrespective of whether it accepts or not, is a good model of an algorithm. If an algorithm exists to solve a problem, then the problem is decidable; otherwise (when the language is not recursive) it is un-decidable.
The existence or non-existence of an algorithm to solve a problem is often more important than the existence of some TM to solve it. Thus dividing languages into decidable and un-decidable ones is often more important than dividing them into recursively enumerable languages (those that have some sort of TM) and non-recursively-enumerable languages (those that have no TM at all).
The relationship between the classes of languages is: recursive languages ⊆ recursively enumerable languages ⊆ all languages over ∑*.
If the first pair of substrings used in a solution must always be x1 and y1, then the PCP is known as the Modified Post Correspondence Problem (MPCP).
Simply simulating M on w does not settle the question, because the machine may be in a loop during a very long computation. What is required is an algorithm that can determine the correct answer for any M and w by performing some analysis on the machine's description and the input.
Formally, the halting problem of TMs is stated as: "given an arbitrary TM M = (Q, ∑, Г, δ, q0, B, F) and an input w ∈ ∑*, does M halt on input w?"
Thus the halting problem of TMs is the collection of strings (a language) of the form (M, w), where M is a TM and w is some input string for it, such that M halts on w. The halting problem of TMs is un-decidable.
10. P and NP PROBLEMS
Problems can be classified under two groups:
1. P-problems: problems that can be solved in polynomial time.
Examples: searching for an element in a sorted list (binary search), O(log n); sorting a list of elements, O(n log n).
2. NP-problems: problems that can be solved in non-deterministic polynomial time.
Examples: the knapsack problem, O(2^(n/2)), and the travelling salesperson problem, O(n!).
P problem:
A Turing machine M is said to be of time complexity T(n) if the following holds:
Given an input w of length n, M halts after making at most T(n) moves
A language L is in class P if there exists some polynomial T(n) such that L = T(M) for some
deterministic TM M of time complexity T(n).
Construct the time complexity T(n) for the Turing machine M accepting the language L = {a^n b^n | n ≥ 1}.
• The TM repeatedly goes through the input string (a^n b^n) forward and backward, replacing the leftmost a by X and the leftmost b by Y. So we require at most 2n moves to match one a with one b.
• The above step is repeated n times.
• Hence the number of moves for accepting a^n b^n is at most (2n) · n.
For strings not of the form a^n b^n, the TM halts in fewer than 2n^2 steps.
Hence T(n) = O(n^2).
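The quadratic bound can be observed by mimicking the marking strategy in Python and counting head moves (a rough sketch; the exact constants depend on the particular machine, but the growth rate is what matters):

def anbn_moves(w):
    # Repeatedly replace the leftmost a by X, scan right to the leftmost b,
    # replace it by Y, and scan back -- counting the head moves.
    tape, moves = list(w), 0
    while 'a' in tape:
        i = tape.index('a'); tape[i] = 'X'
        if 'b' not in tape:
            return None                     # reject: unmatched a
        j = tape.index('b'); tape[j] = 'Y'
        moves += 2 * (j - i)                # go there and come back
    return None if 'b' in tape else moves   # reject leftover b's

for n in (1, 2, 4, 8):
    print(n, anbn_moves('a' * n + 'b' * n))  # 2, 8, 32, 128: grows like 2n^2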
NP Problem
A language L is in class NP if there is a non-deterministic TM M and a polynomial time complexity
T(n) such that L = T(M) and M executes at most T(n) moves for every input w of length n.
We have seen that a deterministic TM M1 simulating a non-deterministic TM M exists. If T(n) is the time complexity of M, then the complexity of the equivalent deterministic TM M1 is 2^O(T(n)).
11. COMPLEXITY of an ALGORITHM
Time complexity, space complexity and the Big Oh notation are as defined earlier under COMPLEXITY.
When we have two algorithms for the same problem, we may require a comparison between their running times. Measuring the performance of an algorithm in relation to the input size n is called the order of growth. In particular, the exponential function grows at a very fast rate compared with any polynomial of fixed degree; a precise statement comparing the growth rates of polynomials and the exponential function can be proved.
A Turing machine M is said to be of time complexity T(n) if, given an input w of length n, M halts after making at most T(n) moves. A language L is in class P if there exists some polynomial T(n) such that L = T(M) for some deterministic TM M of time complexity T(n). In the case of an algorithm, T(n) denotes the running time for solving a problem with an input of size n.
We have seen that a deterministic TM M1 simulating a non-deterministic TM M exists. If T(n) is the time complexity of M, then the complexity of the equivalent deterministic TM M1 is 2^O(T(n)). It is not known whether the complexity of M1 can be made less than 2^O(T(n)).
12. *********QUANTUM COMPUTER
A quantum computer is a system built from quantum circuits, containing wires and elementary quantum gates, to carry out manipulation of quantum information.
A classical computer has a memory made up of bits, each 0 or 1. A quantum computer maintains a sequence of qubits, which can be represented mathematically as follows: the two basis states of a qubit, corresponding to the classical bit values 0 and 1, are |0> and |1> (a qubit is written using the notation | >), and a qubit can be in infinitely many states other than |0> and |1>: any superposition α|0> + β|1> with |α|^2 + |β|^2 = 1.
It is possible to define logical gates on qubits. The classical NOT gate changes 0 to 1 and 1 to 0; in the case of a qubit, the NOT gate changes α|0> + β|1> to α|1> + β|0>. The action of the qubit NOT gate can be represented by the 2 x 2 matrix X with rows (0 1) and (1 0).
Universal Turing Machine is a Turing machine that can simulate an arbitrary Turing machine on
arbitrary input. The universal machine essentially achieves this by reading both the description of
the Turing machine to be simulated as well as the input thereof from its own tape.
The language accepted by the Universal TM is called a universal language. It contains multiple
tapes. Single TM can be used as a stored program computer, taking its program as well as its data
from one or more tapes on which input is placed. The same idea is implemented in the universal TM. It is easiest to describe the universal TM U as a multi-tape TM: the transitions of M are stored initially on the first tape, along with the string w. A second tape is used to hold the simulated tape of M, using the same format as for the code of M; that is, tape symbol Xi of M is represented by 0^i, and tape symbols are separated by single 1's. The third tape of U holds the state of M, with state qi represented by i 0's. The universal TM U accepts the coded pair (M, w) if and only if M accepts w. If M rejects w then U also rejects w; and if M enters an infinite loop on input w, then U does the same.
A formalism called a syntax directed definition (SDD) is used for specifying translations for programming language constructs.
Definition: A syntax directed definition (SDD) is a context free grammar together with attributes
and rules. Attributes are associated with grammar symbol and rules are associated with
productions.
Semantic rules: set up dependencies between attributes which can be represented by a dependency
graph.
This dependency graph determines the evaluation order of these semantic rules.
Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have
some side effects such as printing a value.
If X is a grammar symbol and a is one of its attributes, then we write X.a to denote the value of a at a particular parse-tree node labeled X. There are two kinds of attributes for non-terminals:
i. Synthesized Attribute.( S- Attribute): An attribute is said to be synthesized attribute if its
value at a parse tree node is determined from attribute values at the children of the node.
ii. Inherited Attribute: An inherited attribute is one whose value at parse tree node is
determined in terms of attributes at the parent and / or siblings of that node.
The attribute can be a string, a number, a type, a memory location, or anything else.
Terminals can have synthesized attributes, but not inherited attributes. Attributes for terminals have
lexical value (lexval) that are supplied by the lexical analyzer.
Example: L → En
E → E1 + T
E→T
T →F
F → digit
We can write the syntax directed definition for the above grammar by giving every non-terminal symbol a synthesized attribute val, and the terminal digit a synthesized attribute lexval:
L.val = E.val
E.val = E1.val + T.val
T.val = F.val
F.val = digit.lexval
Example of an inherited attribute: consider the production T → F T'. Here F and T' are siblings; F and T have a synthesized attribute val, and the terminal (digit) has a synthesized attribute lexval. The non-terminal T' has an inherited attribute inh, whose value is determined in terms of the attribute of its sibling F:
T'.inh = F.val
What do you mean by annotating or decorating the parse tree?
The process of computing the attribute values at the parse tree node is called annotating or
decorating the parse tree.
Annotated Parse Tree:
What is annotated parse tree?
A parse tree showing the values of attributes at each node is called an annotated parse tree.
For the grammar:
L → En
E → E1 + T
E→T
T → T1 * F
T →F
F →(E)
F → digit
Annotated parse tree for the input 3 * 5 + 4 n (figure): at the root L.val = E.val = 19, with E1.val = 15 (from T.val = 3 * 5 = 15) and T.val = F.val = 4; the leaves carry digit.lexval = 3, 5 and 4.
Write the syntax-directed definition for simple desk calculator and give parse tree and annotated
parse tree for the expression:
i. (7 – 2 ) * (8 – 1 )n
ii. 5 + 6 * 7;
iii. 4 * ( 3 + 5) – 7
Context free grammar for simple desk calculator is given by:
L → En
E → E1 + T
E → E1 - T
E→T
T → T1 * F
T → T1 / F
T →F
F →(E)
F → digit
Syntax directed definition for Simple Desk Calculator:
Let us consider each of the non-terminals has a single synthesized attribute called val. Also the
terminal digit has a synthesized attribute lexval, which is an integer value returned by the lexical
analyzer.
PRODUCTION       SEMANTIC RULES
L → E n          L.val = E.val
E → E1 + T       E.val = E1.val + T.val
E → E1 - T       E.val = E1.val - T.val
E → T            E.val = T.val
T → T1 * F       T.val = T1.val * F.val
T → T1 / F       T.val = T1.val / F.val
T → F            T.val = F.val
F → ( E )        F.val = E.val
F → digit        F.val = digit.lexval
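These semantic rules translate directly into a recursive evaluator over the parse tree, each call returning the synthesized attribute val of its node (a sketch with our own tree encoding: a node is either a number, standing for digit.lexval, or a tuple (op, left, right)):

def val(node):
    # Synthesized attribute: a node's val is computed from its children's vals.
    if isinstance(node, (int, float)):
        return node                               # F -> digit: F.val = digit.lexval
    op, left, right = node
    a, b = val(left), val(right)
    return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op]

# (7 - 2) * (8 - 1), the first expression above:
print(val(('*', ('-', 7, 2), ('-', 8, 1))))       # 35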
Consider next the following grammar for binary numbers, with an SDD in which every non-terminal has a synthesized attribute val:
PRODUCTION     SEMANTIC RULES
L → L1 B       L.val = 2 * L1.val + B.val
L → B          L.val = B.val
B → 0          B.val = 0
B → 1          B.val = 1
Construct the parse tree and annotated parse tree for the input string 11001.
Parse tree for the input string 11001 (figure): the root L derives L1 B with B → 1; that L1 again derives L1 B with B → 0, and so on, down to a single B → 1 at the leftmost leaf.
Annotated parse tree (figure): at the root L.val = 25, the value of binary 11001; going up the tree the values are 1, 3, 6, 12 and finally 25 = 2 * 12 + 1 (so at the root L1.val = 12 and B.val = 1).
Another small example: consider the productions A → BC, B → D, C → ɛ and D → 1, where 1 is a digit with lexical value lexval; in its annotated parse tree D.val = 1.
From the above annotated parse tree we observe that A, B and D have a synthesized attribute, say val:
D.val = digit.lexval = 1
B.val = D.val = 1 (by the definition of a synthesized attribute)
C has two attributes: an inherited attribute, say C.inh, and a synthesized attribute, say C.syn.
From C → ɛ alone we cannot compute the synthesized attribute C.syn directly; therefore we first have to compute its other attribute, C.inh.
In the production A → BC, B and C are siblings, so the attribute value at node B, i.e. B.val = 1, can be inherited by C. Therefore we pass B.val to C as:
C.inh = B.val
From C → ɛ, the synthesized attribute of C is then computed at node C itself:
C.syn = C.inh = 1
2. For the grammar T → F T', T' → * F T', T' → ɛ, F → digit, a fragment of an annotated parse tree (figure) shows the inherited and synthesized attributes of T', e.g. T'.inh = 8, T'.syn = 24 and digit.lexval = 3.
For the given grammar, write the annotated parse tree for 3 * 5 using the top-down approach, and write the semantic rule for each step.
T → F T'
T' → * F T'
T' → ɛ
F → digit
Let us consider each of the non-terminals T and F to have a single synthesized attribute called val, and the terminal digit to have a synthesized attribute lexval, which is an integer value returned by the lexical analyzer. The non-terminal T' has two attributes: an inherited attribute inh and a synthesized attribute syn.
Annotated parse tree for the input 3 * 5 (figure): F.val = 3 with digit.lexval = 3; T'.inh = 3; below it T1'.inh = 3 * 5 = 15 with F.val = 5 and digit.lexval = 5; T1' → ɛ gives T1'.syn = T1'.inh = 15, which propagates up as T'.syn = 15 and finally T.val = 15.
Semantic rules:
Productions      Semantic Rules
T → F T'         T'.inh = F.val
                 T.val = T'.syn
T' → * F T1'     T1'.inh = T'.inh * F.val
                 T'.syn = T1'.syn
T' → ɛ           T'.syn = T'.inh
F → digit        F.val = digit.lexval
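In a recursive-descent (top-down) parser the inherited attribute inh becomes a parameter and the synthesized attribute syn becomes the return value. A sketch for the rules above (the token list and function names are our own):

tokens, pos = [3, '*', 5], 0        # token stream for the input 3 * 5

def parse_T():
    f = parse_F()                   # F.val
    return parse_Tp(f)              # T'.inh = F.val; T.val = T'.syn

def parse_Tp(inh):
    global pos
    if pos < len(tokens) and tokens[pos] == '*':    # T' -> * F T1'
        pos += 1
        f = parse_F()
        return parse_Tp(inh * f)                    # T1'.inh = T'.inh * F.val
    return inh                                      # T' -> epsilon: T'.syn = T'.inh

def parse_F():
    global pos
    v = tokens[pos]                 # F -> digit: F.val = digit.lexval
    pos += 1
    return v

print(parse_T())                    # 15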
Write the annotated parse tree for 3 * 5 + 4 n using a grammar suitable for a top-down parser, and write the semantic rule for each step.
Answer:
For this problem we can make use of the desk calculator grammar without left-recursion; that is, in the top-down approach we have to eliminate left-recursion from the grammar. The equivalent grammar without left-recursion is given by:
E → T E'
E' → + T E1'
E' → ɛ
T → F T'
T' → * F T1'
T' → ɛ
F → digit
Annotated parse tree for the input 3 * 5 + 4n :
Semantic rules:
Productions      Semantic Rules
E → T E'         E'.inh = T.val
                 E.val = E'.syn
E' → + T E1'     E1'.inh = E'.inh + T.val
                 E'.syn = E1'.syn
E' → ɛ           E'.syn = E'.inh
T → F T'         T'.inh = F.val
                 T.val = T'.syn
T' → * F T1'     T1'.inh = T'.inh * F.val
                 T'.syn = T1'.syn
T' → ɛ           T'.syn = T'.inh
F → digit        F.val = digit.lexval
Write the syntax directed definition for the simple desk calculator and also write the annotated
parse tree for the expression:
i. ( 3 + 4) * ( 5 + 6)n
ii. 1 * 2 * 3 * (4 + 5 )n
iii. ( 9 + 8 * ( 7+ 6 ) + 5) * 4n
Syntax directed definition:
PRODUCTION       SEMANTIC RULES
L → E n          L.val = E.val
E → E1 + T       E.val = E1.val + T.val
T → F            T.val = F.val
(the remaining productions and rules are as in the desk calculator SDD given earlier)
Give an SDD to process a simple variable declaration in C, and give the annotated parse tree for the following declaration: int a, b, c
Answer:
A simple declaration D consists of a basic type T followed by a list L of identifiers; T can be int or float. For each identifier on the list, the type is entered into the symbol-table entry for the identifier. Entries can be updated in any order.
Context free grammar for C declaration statement:
D→TL
T→ int
T → float
L → L1 , id
L → id
Let us consider the non-terminal T to have one synthesized attribute, T.type, which is the type in the declaration D. The non-terminal L has one inherited attribute, L.inh. This attribute value is passed down the list of identifiers, so that the type can be added to the appropriate symbol-table entries.
The value of L1.inh at a parse-tree node is computed by copying the value of L.inh from the parent of that node (the head of the production).
Annotated parse tree for the input int a, b, c (figure): the root D has children T.type = integer and L.inh = integer; each L node passes inh = integer down the list, and at each leaf with id.entry = a, b or c the call addType(id.entry, L.inh) is made.
The function addType(id.entry, L.inh) is called whenever an identifier with its appropriate type is to be added to the symbol table:
id.entry is a lexical value that points to a symbol-table object, and
L.inh is the type being assigned to every identifier on the list.
Syntax Directed Definition (SDD) for type declaration in C:
Productions    Semantic rules
D → T L        L.inh = T.type
T → int        T.type = integer
T → float      T.type = float
L → L1 , id    L1.inh = L.inh
               addType(id.entry, L.inh)
L → id         addType(id.entry, L.inh)
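The effect of these rules is simply a left-to-right walk that carries the declared type down to every identifier on the list. A sketch (symbol_table and the function names are our own stand-ins for the compiler's symbol-table interface):

symbol_table = {}

def add_type(entry, typ):
    # addType(id.entry, L.inh): record the type in the identifier's entry.
    symbol_table[entry] = typ

def declare(typ, id_list):
    # D -> T L: L.inh = T.type; L passes the inherited type to every id.
    for name in id_list:
        add_type(name, typ)

declare('int', ['a', 'b', 'c'])
print(symbol_table)                 # {'a': 'int', 'b': 'int', 'c': 'int'}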
Annotated parse tree and dependency graph for the input float id1, id2, id3 (figure). Here the dotted lines represent the parse-tree edges and the solid lines indicate the dependency-graph edges.
NOTE:
If a semantic rule associated with a production p defines the value of an attribute B.c in terms of the value X.a, then the dependency graph has an edge from X.a to B.c.
Example: for the rule F.val = digit.lexval, in the dependency graph we draw an edge from digit.lexval to F.val.
Draw dependency graph for the expression 3 * 5 by using the desk calculator grammar suitable for
top-down parsing.
SDD for the grammar:
Productions      Semantic Rules
T → F T'         T'.inh = F.val
                 T.val = T'.syn
T' → * F T1'     T1'.inh = T'.inh * F.val
                 T'.syn = T1'.syn
T' → ɛ           T'.syn = T'.inh
F → digit        F.val = digit.lexval
Dependency graph for the expression 3 * 5:
Explanation:
Here the dotted lines represent the parse tree edges and solid lines represent the edges of the
dependency graph.
Dependency graph nodes are represented by the numbers 1 through 9, correspond to the attributes
in the annotated parse tree.
Nodes 1 and 2 represent the attribute lexval associated with the two leaves labeled digit.
Nodes 3 and 4 represent the attribute val associated with the two nodes labeled F.
Edges to node 3 from node 1 and to node 4 from node 2 result from the semantic rule F.val = digit.lexval; note that an edge in the dependency graph represents dependence, not equality.
Nodes 5 and 6 represent the inherited attribute T’.inh associated with each of the occurrences of
non-terminal T‟.
The edge to 5 from 3 is due to the rule T'.inh = F.val.
The edges to 6 from 5 (for T'.inh) and from node 4 (for F.val) are due to the rule T1'.inh = T'.inh * F.val.
Nodes 7 and 8 represent the synthesized attribute syn associated with the occurrences of T‟.
The edge to 7 from 6 is due to the semantic rule T'.syn = T'.inh (for T' → ɛ).
The edge to node 8 from 7 is due to the semantic rule T'.syn = T1'.syn.
The edge to node 9 from 8 is due to the semantic rule T.val = T'.syn.
Obtain the syntax directed definition for simple type declarations:
D→TL
T→ int
T → float
L → L1 , id
L → id
Also obtain the dependency graph for a declaration float id1, id2, id3
Solution:
Syntax Directed Definition (SDD) for type declaration:
Productions    Semantic rules
D → T L        L.inh = T.type
T → int        T.type = integer
T → float      T.type = float
L → L1 , id    L1.inh = L.inh
               addType(id.entry, L.inh)
L → id         addType(id.entry, L.inh)
Nodes 1, 2, and 3 represent the attribute entry associated with each of the leaves labeled id.
Node 4 represents the attribute T.type, which is actually where attribute evaluation begins. This type is then passed to nodes 5, 7 and 9, representing L.inh associated with each of the occurrences of the non-terminal L.
Nodes 6, 8 and 10 are the dummy attributes that represent the application of the function addType
to a type and one of these entry values.
Give the SDD for the simple desk calculator and draw the dependency graph for the expression 1 * 2 * 3 * (4 + 5) n.
Syntax directed definition:
PRODUCTION       SEMANTIC RULES
L → E n          L.val = E.val
E → E1 + T       E.val = E1.val + T.val
E → T            E.val = T.val
T → T1 * F       T.val = T1.val * F.val
T → F            T.val = F.val
F → ( E )        F.val = E.val
F → digit        F.val = digit.lexval
Dependency graph for 1 * 2 * 3 * (4 + 5) n (figure).
L-attributed definitions:
A syntax directed definition is said to be L-attributed if every attribute in the SDD is either a synthesized attribute or an inherited attribute used in a restricted fashion.
A syntax-directed definition is L-attributed if each inherited attribute of Xj, where 1 ≤ j ≤ n, on the right side of A → X1 X2 ... Xn depends only on:
1. the attributes of the symbols X1, ..., Xj-1 to the left of Xj in the production, and
2. the inherited attributes of A.
Every S-attributed definition is L-attributed; the restrictions apply only to the inherited attributes (not to the synthesized attributes).
Example:
Productions      Semantic Rules
T → F T'         T'.inh = F.val
                 T.val = T'.syn
T' → * F T1'     T1'.inh = T'.inh * F.val
                 T'.syn = T1'.syn
The first rule defines the inherited attribute T'.inh using only F.val, and F appears to the left of T' in the production body, so the rule is L-attributed.
The second rule defines the inherited attribute T1'.inh in terms of the inherited attribute T'.inh of the head and of F.val, where F appears to the left of T1' in the production body; this too is permitted in an L-attributed definition.
High level representations are close to source language and low level representations are
close to the target machine.
Three types of intermediate representation:
1. Syntax Trees
2. Postfix notation
3. Three Address Code
Syntax Tree:
A syntax tree is nothing more than a condensed form of the parse tree. Nodes in a syntax tree represent constructs in the source program; the children of a node represent the meaningful components of a construct.
Postfix Notation:
In postfix notation the operator follows its operands.
Example: for (a - b) * (c + d) + (a - b), the postfix representation is ab-cd+*ab-+.
Three Address Code:
• It is a sequence of statements of the form x = y op z.
• It has at most one operator on the right side of an instruction.
• No built-up arithmetic expressions are permitted.
• It is a linearized representation of a syntax tree or a DAG
Variants of syntax trees:
• Nodes of syntax tree represent constructs in the source program; the children of a node
represent the meaningful components of a construct.
• A directed acyclic graph (DAG) for an expression identifies the common sub expressions of
the expression. (sub expressions that appears more than once)
• A DAG has leaves corresponding to atomic operands and interior nodes corresponding
to operators. A node N in a DAG has more than one parent if N represents a common
sub expression; in a syntax tree.
DIRECTED ACYCLIC GRAPH (DAG):
What is a DAG? How does it differ from a syntax tree?
A directed acyclic graph (DAG) for an expression identifies the common sub expressions of the
expression. (sub expressions that appears more than once).
DAG gives the compiler important clues regarding the generation of efficient code to evaluate the
expressions.
Comparison between DAG and syntax tree:
• A node N in DAG has more than one parent if N represents a common sub-expression;
• In syntax tree the tree for common sub-expression would be replicated as many times as the
sub-expression appears in the original expression.
• DAG gives the compiler important clues regarding the generation of efficient code to
evaluate the expressions.
Construction of DAG:
The functions Node( ) and Leaf( ), which create a fresh interior node and a fresh leaf node respectively during syntax-tree construction, can also be used for a DAG, with one change: before creating a new node or leaf, these functions first check whether an identical node already exists. If a previously created identical node exists, the existing node is returned; otherwise a new node is created.
******Develop SDD to produce directed acyclic graph for an expression and draw the DAG for the
expression a + a * ( b – c) + ( b – c) * d. Show the steps for constructing the same.
SDD to produce directed acyclic graph for an expression:
E → E1 + T     E.node = new Node('+', E1.node, T.node)
E → E1 - T     E.node = new Node('-', E1.node, T.node)
E → T          E.node = T.node
T → T1 * F     T.node = new Node('*', T1.node, F.node)
T → F          T.node = F.node
F → ( E )      F.node = E.node
F → id         F.node = new Leaf(id, id.entry)
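The check "does an identical node already exist?" is usually implemented with a table keyed by the signature (op, left, right), the value-number table. A sketch in Python (the encoding is ours):

nodes = []     # nodes[i] holds the signature of the node with value number i + 1
table = {}     # signature -> value number

def lookup(sig):
    # Return the existing value number for an identical node, else create one.
    if sig not in table:
        nodes.append(sig)
        table[sig] = len(nodes)
    return table[sig]

def leaf(ident):
    return lookup(('id', ident, None))

def node(op, left, right):
    return lookup((op, left, right))

# a + b + (a + b):
a, b = leaf('a'), leaf('b')                 # value numbers 1 and 2
t = node('+', a, b)                         # value number 3
root = node('+', t, node('+', a, b))        # the inner a + b reuses node 3
print(nodes)   # [('id','a',None), ('id','b',None), ('+',1,2), ('+',3,3)]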
DAG:
Construct the DAG and identify the value number for the sub-expressions of the following
expressions, assuming + associates from left.
i. a + b + ( a + b)
DAG:
Array representation:
1  id  → entry for a
2  id  → entry for b
3  +   1  2
4  +   3  3
The parenthesized a + b finds the previously created node 3, so the root is node 4 = + (3, 3).
ii. a + b + a + b
DAG:
Array representation:
1 id → entry for a
2 id → entry for b
3 + 1 2
4 + 3 1
5 + 4 2
b) Conditional jumps of the form if x goto L and ifFalse x goto L. These instructions execute the instruction with label L next if x is true or false, respectively; otherwise the following three-address instruction in sequence is executed next, as usual.
c) Conditional jumps such as if x relop y goto L, which apply a relational operator (<, ==, >=, etc.) to x and y, and execute the instruction with label L next if x stands in relation relop to y; if not, the three-address instruction following if x relop y goto L is executed in sequence.
Example: if x < y goto L1
d) Procedure calls and returns are implemented using the following instructions: param x for parameters; call p, n and y = call p, n for procedure and function calls respectively; and return y, where y, representing the returned value, is optional.
Example:
param x1
param x2
…
param xn
call p, n
Three- address instruction (code) Representation:
The above 3-address instruction format specifies the components of each type of instruction, but
does not specify the representation of these instructions in a data structure. In a compiler these 3-
address instructions can be implemented as records with fields for the operator and operands.
Three such representations are called
1. Quadruples
2. Triples
3. Indirect-triples
Explain in detail the implementation of three address statements (code)
OR
Explain the following with an example
i. Quadruples: A quadruple or quad has four fields which we call op, arg1, arg2 and result. The
op field contains an internal code for the operator.
Example: the three-address code t1 = x + y can be represented in quadruple form as follows:
op  arg1  arg2  result
+   x     y     t1
ii. Triples: A triple has only three fields, which we call op, arg1 and arg2. In triples form result of
an operation say x op y is referred by its position rather than by an explicit temporary name.
Example: Three address codes t1 = x + y
t2 = z * t1 can be represented in triples form as follows:
op arg1 arg2
0 + x y
1 * z (0)
. . . .
. . . .
Here the second three-address instruction contains the temporary name t1. In triples form, t1 is referred to by its position, i.e. (0); the parenthesized numbers represent pointers into the triple structure itself.
iii. Indirect Triples: Indirect triples consist of a listing of pointers to triples, rather than a listing of
triples themselves.
As we know with triples the result of an operation is referred to by its position. So moving an
instruction may require us to change all references to that result. This problem does not occur in
indirect triples form.
Example: the three-address codes
t1 = x + y
t2 = z * t1
can be represented in indirect-triples form as follows:
Instruction list:      Triples:
35: (0)                0: +  x  y
36: (1)                1: *  z  (0)
Here the instruction list contains pointers to the triples, so reordering the instruction list does not disturb the triples.
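The three representations differ only in how a result is named. A sketch of the corresponding data structures for t1 = x + y and t2 = z * t1 (the Python encoding is ours, with the field order op, arg1, arg2, result as in the notes):

# Quadruples: an explicit result field.
quads = [('+', 'x', 'y', 't1'),
         ('*', 'z', 't1', 't2')]

# Triples: the result is the instruction's own position; ('ref', 0) stands for
# the parenthesized reference (0) to the value computed by the triple at position 0.
triples = [('+', 'x', 'y'),
           ('*', 'z', ('ref', 0))]

# Indirect triples: a separate list of pointers into the triple list, so
# instructions can be reordered without editing the triples themselves.
instruction_list = [0, 1]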
Static single-assignment (SSA) form has two distinctive aspects: (i) each variable is assigned a value exactly once, with distinct definitions receiving distinct names; and (ii) a notational convention called the Ø-function is used to combine two definitions of the same variable.
Example: if ( flag )
{
x = -1;
}
else
x = 1;
y = x * a
The above program has two control paths in which the variable x gets defined.
In SSA, if we use different names for x then the source program becomes
if ( flag )
{
x1 = -1;
}
else
x2 = 1;
y = x * a
Now which variable should we use in the assignment y = x * a ?
This can be answered by considering the second aspect of SSA, where a notational convention
called Ø function is used to combine the two definitions of x: as
if ( flag )
{
x1 = -1;
}
else
x2 = 1;
x3 = Ø ( x1, x2 );
y = x3 * a
Here Ø(x1, x2) has the value x1 if the control flow passes through the true part of the conditional and the value x2 if it passes through the false part; the final assignment then uses x3.
Advantages and disadvantages of quadruple, triples and indirect-triples:
The benefit of Quadruples over Triples can be seen in an optimizing compiler, where
instructions are often moved around.
With quadruples, if we move an instruction that computes a temporary t, then the
instructions that use t require no change.
With triples, the result of an operation is referred to by its position, so moving an instruction
may require changing all references to that result. This problem does not occur with indirect
triples
With indirect-triples an optimizing compiler can move an instruction by re-ordering the
instruction list without affecting the triples themselves.
iii. Triples:
op arg1 arg2
0 minus c
1 * b (0)
2 minus c
3 * b (2)
4 + ( 1) (3)
5 = a (4)
. ….. …… …..
iv. Indirect Triples:
Instruction list:      Triples:
25: (0)                0: minus  c
26: (1)                1: *      b    (0)
27: (2)                2: minus  c
28: (3)                3: *      b    (2)
29: (4)                4: +      (1)  (3)
30: (5)                5: =      a    (4)
Obtain the DAG and three-address code for the expression
( a+ b) * ( c + d) - ( a+ b)
DAG:
Three-address code (generated from the DAG, so the common sub-expression a + b is computed only once):
t1 = a + b
t2 = c + d
t3 = t1 * t2
t4 = t3 - t1
Translate the arithmetic expression a + - ( b +c ) into:
i. A syntax tree
ii. Three address code
iii. Quadruples
iv. Triples
v. Indirect-Triples
i. Syntax Tree: (figure)
ii. Three address code:
t1 = b + c
t2 = minus t1
t3 = a + t2
iii. Quadruple:
op       arg1   arg2   result
0  +      b      c      t1
1  minus  t1            t2
2  +      a      t2     t3
iv. Triples:
op       arg1   arg2
0  +      b      c
1  minus  (0)
2  +      a      (1)
v. Indirect-triples:
Instruction list:      Triples:
25: (0)                0: +      b    c
26: (1)                1: minus  (0)
27: (2)                2: +      a    (1)
Re-locatable code must be linked and loaded into machine level code before execution. Re-locatable code provides a great deal of flexibility, as functions can be compiled separately before generation of the object code.
Memory Management:
During the code generation process, symbol-table entries have to be mapped to actual physical addresses, and labels have to be mapped to instruction addresses.
Mapping names in the source program to addresses of data is done cooperatively by the front end and the code generator.
Local variables are stack-allocated in the activation record, while global variables are allocated in the static area.
Instruction Selection:
The code generator must map the intermediate-representation program (3-address code) into
sequence of codes that can be executed by the target machine. This mapping can be determined by
considering the factors such as:
a. The level of intermediate-representation
b. The nature of instruction set architecture.
c. Quality of the generated code.
If the intermediate-representation level is high, the code generator produces poor code that needs further optimization; if the IR reflects low-level details of the underlying machine, the code generator can produce more efficient code sequences.
The instruction set should be complete, in the sense that all operations can be implemented. Sometimes a single operation may be implementable by many different instruction sequences, and the code generator should choose the most appropriate one: instructions should be chosen in such a way that execution is fastest, or that the utilization of other machine resources is minimized.
Example: Consider the set of statements
a=b+c
d=a+e
would be translated into:
LD R0, b
ADD R0, R0, c
ST a, R0
LD R0, a
ADD R0, R0, e
ST d, R0
Here the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used. The redundant instructions should be eliminated.
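The redundancy noted above (a load of a value that has just been stored) is a classic peephole-optimization pattern. A minimal sketch over instruction tuples (the encoding is ours):

def drop_redundant_loads(code):
    # Remove 'LD R, x' when it immediately follows 'ST x, R':
    # the register already holds the value.
    out = []
    for ins in code:
        if (out and ins[0] == 'LD' and out[-1][0] == 'ST'
                and out[-1][1] == ins[2]        # same memory location
                and out[-1][2] == ins[1]):      # same register
            continue
        out.append(ins)
    return out

code = [('LD', 'R0', 'b'), ('ADD', 'R0', 'R0', 'c'), ('ST', 'a', 'R0'),
        ('LD', 'R0', 'a'), ('ADD', 'R0', 'R0', 'e'), ('ST', 'd', 'R0')]
print(drop_redundant_loads(code))               # the load of a is removed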
Example: a = a + 1 would be translated into:
LD R0, a
ADD R0, R0, #1
ST a, R0
If the target machine has an increment (INC) instruction, then the above 3-address code may be
implemented by the single instruction INC a, rather than by a more obvious sequence that loads a
into a register, adds 1 to the register, and then stores the result back into a;
Thus instruction cost is also an important issue in the design of a code generator. The cost of an instruction is defined as the cost of execution plus the number of memory accesses.
Register Allocation:
If the operands are in registers, execution is faster; hence the set of variables whose values are required at a point in the program should be retained in registers.
In register allocation we select the set of variables that will reside in registers.
In register assignment we pick the specific register that each variable resides in.
Consider a hypothetical byte-addressable machine as the target machine. It has n general purpose registers R1, R2, ..., Rn. The machine instructions are two-address instructions of the form:
op-code source address destination address
Example:
MOV R0, R1
ADD R1, R2
Target Machine supports for the following addressing modes:
a. Absolute addressing mode.
Example: MOV R0, M, where M is the address of the memory location of one of the operands; MOV R0, M moves the contents of register R0 to memory location M.
b. Register addressing mode where both the operands are in register.