Unit 1
Dr. M. NARAYANAN
B.E., M.E., Ph.D in CSE
Professor
Department of CSE
School of Engineering
Malla Reddy University
MR20-1CS0112 - AUTOMATA & COMPILER DESIGN
Course Objectives:
To introduce the Finite Automata, NFA and DFA.
To gain insight into Context Free Languages.
To study the Phases of a Compiler, Lexical Analysis and Syntax Analysis.
To acquaint students with Intermediate Code Generation, Code Optimization and Code Generation.
Course Outcomes:
After completion of the course, the students will be able to Understand the concept of
Finite Automata, NFA and DFA.
Understand Context Free Languages.
Explain the concept of Phases of a Compiler, Lexical Analysis and Syntax Analysis.
Describe the Intermediate code generation, Code Optimization and Code Generation.
UNIT –I
Finite Automata and Regular Expressions: Finite Automata- Examples and Definitions -
Accepting the Union, Intersection, Difference of Two Languages. Regular Expressions:
Regular Languages and Regular Expressions– Conversion from Regular Expression to NFA
and Deterministic Finite Automata. Context free grammar: Derivation trees and ambiguity –
Simplified forms and Normal forms.
TEXT BOOKS:
1. John E. Hopcroft and Jeffrey D. Ullman, “Introduction to Automata Theory, Languages,
and Computation”, 3rd Edition, Pearson Education.
2. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, “Compilers: Principles, Techniques, and
Tools”, Addison-Wesley, 2nd Edition, 2007.
3. John C. Martin, “Introduction to Languages and the Theory of Computation”, McGraw
Hill, 3rd Edition, 2007.
Finite Automata (FA) is the simplest machine to recognize patterns. A finite automaton, or
finite state machine, is an abstract machine defined by five elements (a 5-tuple).
It has a set of states and rules for moving from one state to another, depending on the
input symbol that is applied. It is an abstract model of a digital computer.
A finite automaton takes a string of symbols as input and changes its state as it reads each
symbol. A transition occurs each time an input symbol is read.
At each transition, the automaton either moves to another state or stays in the same state.
When the whole input string has been processed, the automaton either accepts or rejects it:
it accepts if it has reached a final (accepting) state, and rejects otherwise. A finite
automaton is a 5-tuple (Q, ∑, δ, q0, F), and it can be represented by an input tape and a
finite control.
Finite Automata and Regular Expressions: Finite Automata
Automata theory is the study of abstract computing devices, or machines. Before there were
computers, in the 1930s, A. Turing studied an abstract machine that had all the capabilities
of today's computers, at least as far as what they could compute. Turing's goal was to
describe precisely the boundary between what a computing machine could do and what it
could not do. His conclusions apply not only to his abstract Turing machines but also to
today's real machines.
In the 1940s and 1950s, simpler kinds of machines, which we today call finite automata,
were studied by a number of researchers.
These automata, originally proposed to model brain function, turned out to be particularly
useful for a variety of other purposes. Also in the late 1950s, the linguist N. Chomsky began
the study of formal grammars.
While not strictly machines, these grammars have close relationships to abstract automata
and serve today as the basis of some important software components, including parts of
compilers.
An abstract machine is a computer science theoretical model that allows for a detailed and
precise analysis of how a computer system functions.
In 1969, S. Cook extended Turing's study of what could and what could not be computed.
Cook was able to separate those problems that can be solved efficiently by computer from
those problems that can in principle be solved, but in practice take so much time that
computers are useless for all but very small instances of the problem.
The latter class of problems is called intractable, or NP-hard. It is highly unlikely that even
the exponential improvement in computing speed that computer hardware has been
following (Moore's Law) will have a significant impact on our ability to solve large instances
of intractable problems.
All of these theoretical developments bear directly on what computer scientists do today.
Some of the concepts, like finite automata and certain kinds of formal grammars, are used in
the design and construction of important kinds of software. Other concepts, like the Turing
machine, help us understand what we can expect from our software.
Why Study Automata Theory
There are several reasons why the study of automata and complexity is an important part
of the core of Computer Science.
Automata theory is important because it allows scientists to understand how machines
solve problems. An automaton is any machine that uses a specific, repeatable process to
convert information into different forms. Modern computers are a common example of
an automaton.
If scientists didn’t study automata theory, they would have a much more difficult time
designing systems that could perform repeatable actions based on specific inputs and
outputs. Scientists are able to design systems that can perform specific tasks, such as
personal computer systems, automatic aircraft pilots and many more, by using
automata theory.
There are a number of other examples of automata. These range from basic devices,
such as a pendulum clock, to missile guidance systems and complex telephone
networks.
Thermostats are a familiar example of an automaton.
A thermostat checks the temperature of its surrounding environment at specific intervals,
and then turns on when the temperature reaches a certain level. In this case, there are only
two potential states for the thermostat: on or off.
Automata can be much more complex than a thermostat. Modern computers have a large
number of data inputs and potential states.
Automata theory is used to design computers that respond to inputs by producing
reliable outputs.
A finite automaton is a 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of input symbols (the alphabet)
δ: transition function
q0: initial state
F: set of final states
Finite Automata Model:
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape made up of cells. Each cell holds one input symbol.
Finite control: The finite control decides the next state on receiving particular input from
input tape. The tape reader reads the cells one by one from left to right, and at a time only
one input symbol is read.
Types of Automata:
There are two types of finite automata:
1.DFA(deterministic finite automata)
2.NFA(non-deterministic finite automata)
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the
computation. In a DFA, the machine goes to exactly one state for a particular input
character. A DFA does not allow null (ε) moves.
2. NFA
NFA stands for non-deterministic finite automata. An NFA can move to any number of
states for a particular input, and it can make null (ε) moves.
DFA (Deterministic finite automata)
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the
computation: the automaton reads the input string one symbol at a time, and for each state
and input symbol there is exactly one next state.
In a DFA, there is only one path for a specific input from the current state to the next
state.
A DFA does not allow null moves, i.e., a DFA cannot change state without reading an input
character.
A DFA can have multiple final states. DFAs are used in lexical analysis in compilers.
DFA uses include protocol analysis, text parsing, video game character behavior, security
analysis, CPU control units, natural language processing, and speech recognition.
Modern applications of automata theory go far beyond compiler techniques or hardware
verification. Automata are widely used for modelling and verification of software,
distributed systems, real-time systems, or structured data. They have been equipped with
features to model time and probabilities as well.
In the following diagram, we can see that from state q0 for input a, there is only one path
which is going to q1. Similarly, from q0, there is only one path for input b going to q2.
Formal Definition of a DFA
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
Q is a finite set of states.
∑ is a finite set of symbols called the alphabet.
δ is the transition function where δ: Q × ∑ → Q
q0 is the initial state from where any input is processed (q0 ∈ Q).
F is a set of final state/states of Q (F ⊆ Q).
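The 5-tuple can be written down almost literally in code. The sketch below is a minimal illustration, assuming δ is stored as a Python dictionary from (state, symbol) pairs to states; the class name `DFA` and the example machine `even_zeros` (strings over {0, 1} with an even number of 0s) are hypothetical. The constructor checks the conditions of the definition (q0 ∈ Q, F ⊆ Q, δ defined on all of Q × ∑), and `accepts` reads the input left to right, making exactly one move per symbol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    """A DFA given as the 5-tuple (Q, Sigma, delta, q0, F)."""
    states: frozenset    # Q
    alphabet: frozenset  # Sigma
    delta: dict          # delta: Q x Sigma -> Q, stored as {(state, symbol): state}
    start: str           # q0, must belong to Q
    finals: frozenset    # F, must be a subset of Q

    def __post_init__(self):
        # Check the conditions of the formal definition.
        if self.start not in self.states:
            raise ValueError("q0 must be an element of Q")
        if not self.finals <= self.states:
            raise ValueError("F must be a subset of Q")
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta({q}, {a}) must be defined and lie in Q")

    def accepts(self, word):
        """Read the word left to right; exactly one move per symbol."""
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"{symbol!r} is not in the alphabet")
            state = self.delta[(state, symbol)]
        return state in self.finals


# Hypothetical example: strings over {0, 1} with an even number of 0s.
even_zeros = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"0", "1"}),
    delta={("even", "0"): "odd", ("even", "1"): "even",
           ("odd", "0"): "even", ("odd", "1"): "odd"},
    start="even",
    finals=frozenset({"even"}),
)
```

Because δ is a function, the loop in `accepts` never has to choose between moves; this is exactly what "deterministic" means above.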
Graphical Representation of a DFA
A DFA is represented by a digraph called a state diagram.
The vertices (nodes) represent the states.
The arcs labeled with input symbols show the transitions.
The initial state is marked by a single incoming arc that has no source state.
The final states are indicated by double circles.
Example: Transition Diagram
Q = {q0, q1, q2}
∑ = {0, 1}
Initial state = q0
F = {q2}
Transitions are
δ(q0,0)= q0, δ(q0,1)= q1, δ(q1,1)= q1, δ(q1,0)= q2, δ(q2,0)= q2, δ(q2,1)= q2
Draw Transition Diagram and Transition Table.
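Transition Table:
Present State | Next State for Input 0 | Next State for Input 1
→q0 | q0 | q1
q1 | q2 | q1
*q2 | q2 | q2
(→ marks the initial state and * marks the final state.)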
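The example DFA can be simulated directly from its transitions δ(q0,0) = q0, δ(q0,1) = q1, δ(q1,1) = q1, δ(q1,0) = q2, δ(q2,0) = q2, δ(q2,1) = q2. The sketch below is a minimal illustration: the names `DELTA`, `run` and `accepts` are hypothetical, and `run` returns the sequence of states visited so that a trace through the state diagram can be checked by hand.

```python
# Transitions of the example DFA, stored as {(state, input): next_state}.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
START = "q0"
FINALS = {"q2"}


def run(word):
    """Return the list of states visited while reading word from left to right."""
    path = [START]
    for symbol in word:
        path.append(DELTA[(path[-1], symbol)])
    return path


def accepts(word):
    """Accept when the last state visited is a final state."""
    return run(word)[-1] in FINALS


print(run("0110"))  # ['q0', 'q0', 'q1', 'q1', 'q2']
```

Tracing strings this way shows that the DFA accepts exactly the strings that contain 10 as a substring: q1 remembers that a 1 has been read, and q2 is an accepting state that is never left.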
NFA (Non-Deterministic Finite Automata)
NFA stands for non-deterministic finite automata. It is often easier to construct an NFA than
a DFA for a given regular language.
A finite automaton is called an NFA when there can be more than one path for a specific
input from the current state to the next state.
Not every NFA is a DFA, but every NFA can be converted into an equivalent DFA.
An NFA is defined in the same way as a DFA, with two exceptions: it can have
multiple next states for the same input, and it can have ε-transitions.
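Because an NFA may have several next states and ε-transitions, one common way to simulate it is to track the set of all states it could currently be in, closing that set under ε-moves after every step. The sketch below assumes transitions are stored as a dictionary from (state, symbol) to a set of states, with the empty string standing for ε; the helper names and the small example NFA (which uses ε-moves to guess "all a's" or "all b's") are hypothetical.

```python
EPS = ""  # label used here for an epsilon (null) move


def epsilon_closure(states, delta):
    """All states reachable from `states` using epsilon moves only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(word, delta, start, finals):
    """Follow every possible path at once by tracking a set of current states."""
    current = epsilon_closure({start}, delta)
    for symbol in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & finals)


# Hypothetical NFA: from p0, an epsilon move guesses "all a's" (pa) or "all b's" (pb).
DELTA = {
    ("p0", EPS): {"pa", "pb"},
    ("pa", "a"): {"pa"},
    ("pb", "b"): {"pb"},
}
FINALS = {"pa", "pb"}
```

The string is accepted if at least one of the possible paths ends in a final state, which is why the last line tests the intersection with the final states.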
In the following image, we can see that from state q0 for input a, there are two next states q1
and q2, similarly, from q0 for input b, the next states are q0 and q1. Thus it is not fixed or
determined that with a particular input where to go next. Hence this FA is called non-
deterministic finite automata.
Solution:
Transition diagram:
Examples
Design an NFA for the transition table given below:
The given NFA Design
Design an NFA with ∑ = {0, 1} that accepts all strings ending with 01.
Hence, the NFA would be:
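The diagram is not reproduced here, so the sketch below assumes one common NFA for this language: q0 loops on both 0 and 1, q0 also moves to q1 on 0, q1 moves to q2 on 1, and q2 is the only final state. It then applies the subset construction (one standard way to convert an NFA into an equivalent DFA) and checks the resulting DFA against Python's `str.endswith` on all short binary strings. The names `NFA`, `step`, `subset_construction` and `dfa_accepts` are illustrative.

```python
from itertools import product

SIGMA = ("0", "1")
# One possible NFA for "ends with 01": q0 loops on both symbols and, on a 0,
# may also guess that the final 01 starts here (q1); q2 is the final state.
NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START, FINALS = "q0", {"q2"}


def step(states, symbol):
    """Set of NFA states reachable from `states` on one symbol."""
    return frozenset(r for q in states for r in NFA.get((q, symbol), set()))


def subset_construction():
    """Build an equivalent DFA whose states are sets of NFA states."""
    start = frozenset({START})
    dfa, todo = {}, [start]
    while todo:
        s = todo.pop()
        if s in dfa:
            continue
        dfa[s] = {a: step(s, a) for a in SIGMA}
        todo.extend(dfa[s].values())
    return start, dfa


DFA_START, DFA = subset_construction()


def dfa_accepts(word):
    """A DFA state (a set of NFA states) is final if it contains an NFA final state."""
    s = DFA_START
    for a in word:
        s = DFA[s][a]
    return bool(s & FINALS)


# Sanity check against str.endswith on every binary string of length <= 6.
for n in range(7):
    for w in map("".join, product(SIGMA, repeat=n)):
        assert dfa_accepts(w) == w.endswith("01")
```

Only three subsets are reachable here: {q0}, {q0, q1} and {q0, q2}, so the equivalent DFA has three states.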
Finite Automata With Epsilon Transitions
Regular Expression
The language accepted by a finite automaton can be described by simple expressions called
regular expressions. They are a compact way to represent such languages.
The languages described by regular expressions are called regular languages.
A regular expression can also be described as a sequence of characters that defines a pattern for strings.
Regular expressions are used to match character combinations in strings; string-searching
algorithms use these patterns to find matches in a string.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx,
xxx, xxxx, .....}
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx,
xxxx, .....}
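Python's standard `re` module uses the same meaning for `*` and `+`, so the two sets above can be checked directly. One difference to keep in mind for the examples that follow: in textbook notation `+` between two expressions means union, while `re` writes union as `|`. The helper name `matches` is hypothetical; `re.fullmatch` is used so that the whole string, not just a substring, must match.

```python
import re


def matches(pattern, s):
    """True if the whole of s is generated by the pattern (not just a substring)."""
    return re.fullmatch(pattern, s) is not None


words = ["", "x", "xx", "xxx"]
print([w for w in words if matches(r"x*", w)])  # ['', 'x', 'xx', 'xxx']
print([w for w in words if matches(r"x+", w)])  # ['x', 'xx', 'xxx']
```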
Definition of RE
Operations on Regular Language
The various operations on regular language are:
Union: If L and M are two regular languages, then their union L U M is also a regular language.
L U M = {s | s is in L or s is in M}
Intersection: If L and M are two regular languages, then their intersection L ⋂ M is also a
regular language.
L ⋂ M = {s | s is in L and s is in M}
Concatenation: If L and M are two regular languages, then their concatenation LM is also a
regular language.
LM = {st | s is in L and t is in M}
Kleene closure: If L is a regular language, then its Kleene closure L* is also a regular
language.
L* = zero or more occurrences of strings from L.
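One standard way to see why union and intersection (and also difference, which the unit syllabus lists) preserve regularity is the product construction: run a DFA for L and a DFA for M side by side on the same input, and choose which pairs of states count as accepting. The sketch below assumes both DFAs are complete and given as (transition dict, start state, final states); the function names and the two example DFAs (even number of a's; strings ending in b) are hypothetical.

```python
from itertools import product


def product_dfa(d1, d2, alphabet, mode):
    """Run two complete DFAs side by side; each is (delta, start, finals)
    with delta[(state, symbol)] -> state. `mode` decides which pairs accept."""
    delta1, start1, finals1 = d1
    delta2, start2, finals2 = d2
    states1 = {p for p, _ in delta1}
    states2 = {q for q, _ in delta2}
    delta = {
        ((p, q), a): (delta1[(p, a)], delta2[(q, a)])
        for p, q in product(states1, states2)
        for a in alphabet
    }
    rule = {
        "union": lambda p, q: p in finals1 or q in finals2,
        "intersection": lambda p, q: p in finals1 and q in finals2,
        "difference": lambda p, q: p in finals1 and q not in finals2,
    }[mode]
    finals = {(p, q) for p, q in product(states1, states2) if rule(p, q)}
    return delta, (start1, start2), finals


def accepts(dfa, word):
    delta, state, finals = dfa
    for a in word:
        state = delta[(state, a)]
    return state in finals


# Hypothetical inputs over {a, b}: L = even number of a's, M = strings ending in b.
EVEN_A = ({("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}, "e", {"e"})
ENDS_B = ({("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"}, "n", {"y"})

both = product_dfa(EVEN_A, ENDS_B, "ab", "intersection")
```

Only the accepting condition changes between the three modes; the states and transitions of the product machine are the same in every case.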
Example 1:
Write the regular expression for the language accepting all combinations of a's, over the set ∑ =
{a}
Solution:
All combinations of a's means a may be zero, single, double and so on. If a is appearing zero times,
that means a null string. That is we expect the set of {ε, a, aa, aaa, ....}. So we give a regular
expression for this as:
R = a*
That is, the Kleene closure of a.
Example 2:
Write the regular expression for the language accepting all combinations of a's except the null
string, over the set ∑ = {a}
Solution:
The regular expression has to be built for the language
L = {a, aa, aaa, ....}
This set indicates that there is no null string. So we can denote regular expression as:
R = a+
Example 3:
Write the regular expression for the language accepting all the string containing any
number of a's and b's.
Solution:
The regular expression will be:
r.e. = (a + b)*
This will give the set as L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination of a
and b.
The (a + b)* shows any combination with a and b even a null string.
Example 4:
Write the regular expression for the language accepting all the string which are starting
with 1 and ending with 0, over ∑ = {0, 1}.
Solution:
In a regular expression, the first symbol should be 1, and the last symbol should be 0. The
r.e. is as follows:
R = 1 (0+1)* 0
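A quick way to gain confidence in an answer like this is to compare the regular expression with the English description on every short string. The sketch below does that for R = 1 (0+1)* 0, rewriting the textbook union `+` as `|` for Python's `re` module; the names `described` and `agree_up_to` are hypothetical, and agreement up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product

# Textbook R = 1 (0+1)* 0; Python's re writes union as "|" instead of "+".
R = re.compile(r"1(0|1)*0")


def described(w):
    """The language in words: starts with 1 and ends with 0."""
    return w.startswith("1") and w.endswith("0")


def agree_up_to(n):
    """Compare regex and description on every binary string of length <= n."""
    return all(
        (R.fullmatch(w) is not None) == described(w)
        for k in range(n + 1)
        for w in map("".join, product("01", repeat=k))
    )
```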
Example 5:
Write the regular expression for the language starting and ending with a and having any
combination of b's in between.
Solution:
The regular expression will be:
R = a b* a
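Reading the description as "an a, then any number of b's, then an a" (so at least two symbols), the language is a b* a. The sketch below checks that reading against the description on all short strings over {a, b}; note that a b* b would instead accept ab, which ends in b. The helper names are hypothetical.

```python
import re
from itertools import product

R = re.compile(r"ab*a")  # textbook notation: a b* a


def described(w):
    """Starts with a, ends with a, and only b's in between (at least two symbols)."""
    return len(w) >= 2 and w[0] == "a" and w[-1] == "a" and set(w[1:-1]) <= {"b"}


def agree_up_to(n):
    """Compare regex and description on every string over {a, b} of length <= n."""
    return all(
        (R.fullmatch(w) is not None) == described(w)
        for k in range(n + 1)
        for w in map("".join, product("ab", repeat=k))
    )
```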
Example 6:
Write the regular expression for the language starting with a but not having consecutive b's.
Solution:
The regular expression has to be built for the language:
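L = {a, aa, ab, aaa, aab, aba, abab, .....}
The regular expression for the above language is:
R = (a + ab)(a + ab)*
Every b must be immediately preceded by an a, so each string is a non-empty sequence of the
blocks a and ab. The leading (a + ab) rules out the null string, which does not start with a.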
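The sketch below compares (a + ab)(a + ab)*, written as `(a|ab)+` in Python's `re` syntax, with the description "starts with a and has no two consecutive b's" on all short strings over {a, b}. It also shows that (a + ab)* alone matches the null string, which does not start with a. The helper names are hypothetical, and agreement up to a fixed length is a check rather than a proof.

```python
import re
from itertools import product

R = re.compile(r"(a|ab)+")  # textbook notation: (a + ab)(a + ab)*


def described(w):
    """Starts with a and never contains two consecutive b's."""
    return w.startswith("a") and "bb" not in w


def agree_up_to(n):
    """Compare regex and description on every string over {a, b} of length <= n."""
    return all(
        (R.fullmatch(w) is not None) == described(w)
        for k in range(n + 1)
        for w in map("".join, product("ab", repeat=k))
    )


print(re.fullmatch(r"(a|ab)*", "") is not None)  # True: the starred form accepts ε
```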
Example 7:
Write the regular expression for the language accepting all the string in which any number of
a's is followed by any number of b's is followed by any number of c's.
Solution:
As we know, any number of a's means a* any number of b's means b*, any number of c's
means c*. Since as given in problem statement, b's appear after a's and c's appear after b's. So
the regular expression could be:
R = a* b* c*
Example 8:
Write the regular expression for the language containing the string in which every 0 is
immediately followed by 11.
Solution:
The regular expression will be:
R = (011 + 1)*
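The condition "every 0 is immediately followed by 11" can be written as a direct check on the string, which makes it easy to compare with (011 + 1)* on all short binary strings. In this sketch the textbook union `+` becomes `|` for Python's `re`, and the helper names are hypothetical. Note that the empty string and strings of only 1's are included, since they contain no 0 at all.

```python
import re
from itertools import product

R = re.compile(r"(011|1)*")  # textbook notation: (011 + 1)*


def described(w):
    """Every 0 is immediately followed by 11."""
    return all(w[i + 1:i + 3] == "11" for i, c in enumerate(w) if c == "0")


def agree_up_to(n):
    """Compare regex and description on every binary string of length <= n."""
    return all(
        (R.fullmatch(w) is not None) == described(w)
        for k in range(n + 1)
        for w in map("".join, product("01", repeat=k))
    )
```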
Operations on Sets
The basic set operations are:
1. Union of Sets: Union of Sets A and B is defined to be the set of all those elements which
belong to A or B or both and is denoted by A∪B.
A∪B = {x: x ∈ A or x ∈ B}
Example:
Let A = {1, 2, 3}, B= {3, 4, 5, 6}
Then
A∪B = {1, 2, 3, 4, 5, 6}.
2. Intersection of Sets: Intersection of two sets A and B is the set of all those elements
which belong to both A and B and is denoted by A ∩ B.
A ∩ B = {x: x ∈ A and x ∈ B}
Example:
Let A = {11, 12, 13}, B = {13, 14, 15}
Then A ∩ B = {13}.
3. Difference of Sets: The difference of two sets A and B is the set of all those elements which
belong to A but not to B, and is denoted by A - B.
A - B = {x: x ∈ A and x ∉ B}
For two sets A and B, two differences can be formed: A - B and B - A.
• If A = {2, 3, 4} and B = {4, 5, 6}
A - B means elements of A which are not the elements of B.
i.e., in the above example A - B = {2, 3}
Example:
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}
then A - B = {1,2} and B - A = {5, 6}
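Python's built-in `set` type supports these operations directly with the operators `|` (union), `&` (intersection) and `-` (difference). The sketch below reproduces the worked examples of this section; `sorted` is used only so that the printed order is predictable.

```python
# The three examples from this section, using Python's built-in set type.
A1, B1 = {1, 2, 3}, {3, 4, 5, 6}
A2, B2 = {11, 12, 13}, {13, 14, 15}
A3, B3 = {1, 2, 3, 4}, {3, 4, 5, 6}

print(sorted(A1 | B1))  # union:        [1, 2, 3, 4, 5, 6]
print(sorted(A2 & B2))  # intersection: [13]
print(sorted(A3 - B3))  # difference:   [1, 2]
print(sorted(B3 - A3))  # difference:   [5, 6]
```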