Unit-III (Regular Expression)
Regular Expressions
Prepared By:
Ghanashyam BK
Regular Language
A language is said to be a REGULAR LANGUAGE if and only if some finite state machine (FSM) recognizes it.
So which languages are NOT REGULAR?
The languages:
which are not recognized by any FSM;
which require unbounded memory (for example, to count an arbitrary number of symbols).
Regular Expressions
Regular expressions are algebraic expressions used to represent regular languages, the languages accepted by finite automata.
They offer a declarative way to express the strings we want to accept.
Many systems use regular expressions as an input language, for example:
Search commands such as UNIX grep.
Lexical analyzer generators such as LEX or FLEX.
A lexical analyzer is the component of a compiler that breaks the source program into logical units called tokens.
Each regular expression ‘r’ denotes a language L(r)
The defining rules specify how L(r) is formed by combining the languages of simpler regular expressions in various ways.
Method:
Let Σ be an alphabet. The regular expressions over Σ are defined inductively as follows:
Basis steps:
Φ is a regular expression representing the empty language.
Є is a regular expression representing the language containing only the empty string, i.e. {Є}.
If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression representing the language {a}.
Inductive steps: the following operations over the basic regular expressions define more complex regular expressions.
If ‘r’ and ‘s’ are regular expressions representing the languages L(r) and L(s), then:
r U s (also written r + s) is a regular expression denoting the language L(r) U L(s).
r.s is a regular expression denoting the language L(r).L(s).
r* is a regular expression denoting the language (L(r))*.
(r) is a regular expression denoting the language L(r).
Note: any expression obtained from Φ, Є and a using the above operations, with parentheses where required, is a regular expression.
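The inductive definition can be mirrored directly in code. The sketch below is a minimal Python illustration under two assumptions. First, REs are encoded as nested tuples, one constructor per rule above; this encoding is hypothetical. Second, the hypothetical helper `lang(r, n)` lists only the strings of L(r) up to length n, because L(r) may be infinite.

```python
# Regular expressions as nested tuples, one constructor per rule of the definition:
# ("empty",) for Φ, ("eps",) for Є, ("sym", a), ("union", r, s), ("concat", r, s), ("star", r).

def lang(r, n):
    """Return the strings of L(r) of length at most n (a finite slice of L(r))."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "union":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":
        left, right = lang(r[1], n), lang(r[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":
        base = lang(r[1], n) - {""}
        result = frontier = {""}
        while frontier:
            # Extend the newest strings by one more piece, keeping only unseen ones.
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result = result | frontier
        return result
    raise ValueError(f"unknown constructor {kind!r}")


# 0.1* : a 0 followed by any number of 1's
r = ("concat", ("sym", "0"), ("star", ("sym", "1")))
print(sorted(lang(r, 3)))   # ['0', '01', '011']
```

Each branch of `lang` corresponds to exactly one basis or inductive rule. This is the same structural recursion the later proofs use.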
Regular Operators
There are three basic operators used to build regular languages.
Union (U / | / +): If L1 and L2 are any two regular languages, then
L1 U L2 = {s | s ∈ L1 or s ∈ L2}
For example:
L1 = {00, 11}, L2 = {Є, 10}; then
L1 U L2 = {Є, 00, 11, 10}
Concatenation (.):
If L1 and L2 are any two regular languages then,
L1.L2 = {l1l2 | l1 ∈ L1 and l2 ∈ L2}
For example:
L1 = {00, 11} and L2 = {Є, 10}; then
L1.L2 = {00, 11, 0010, 1110}
L2.L1 = {1000, 1011, 00, 11}
So L1.L2 ≠ L2.L1, i.e. concatenation is not commutative.
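For finite languages both operations can be computed directly. Below is a minimal Python sketch. It assumes languages are represented as sets of strings, with "" standing for Є; the helper name `concat` is hypothetical.

```python
# Finite languages as Python sets of strings; "" plays the role of Є.
L1 = {"00", "11"}
L2 = {"", "10"}

def concat(A, B):
    """L(A).L(B): each string of A followed by each string of B."""
    return {x + y for x in A for y in B}

print(sorted(L1 | L2))            # ['', '00', '10', '11']
print(sorted(concat(L1, L2)))     # ['00', '0010', '11', '1110']
print(sorted(concat(L2, L1)))     # ['00', '1000', '1011', '11']
```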
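Kleene Closure (*):
If L is any regular language, then
$L^* = \bigcup_{i \ge 0} L^i = L^0 \cup L^1 \cup L^2 \cup \cdots$, where $L^0 = \{\epsilon\}$ and $L^i$ is L concatenated with itself i times.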
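L* is an infinite union, so a program can only compute a finite part of it. The sketch below builds $L^0 \cup \cdots \cup L^k$. The helpers `power` and `star_upto` are hypothetical names, and "" stands for Є.

```python
def power(L, i):
    """L^i: i-fold concatenation of L with itself; L^0 = {""}, i.e. {Є}."""
    result = {""}
    for _ in range(i):
        result = {x + y for x in result for y in L}
    return result

def star_upto(L, k):
    """L^0 ∪ L^1 ∪ ... ∪ L^k, a finite part of the infinite union L*."""
    out = set()
    for i in range(k + 1):
        out |= power(L, i)
    return out

print(sorted(star_upto({"01"}, 3)))   # ['', '01', '0101', '010101']
print(star_upto(set(), 3))            # {''}: the closure of Φ contains only Є
```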
Precedence of regular operators:
The star operator has the highest precedence, i.e. it applies to the smallest well-formed RE on its left.
Concatenation has the next highest precedence.
Union has the lowest precedence. For example, 0+10* means 0 + (1(0*)).
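Python's standard `re` module follows the same precedence, although it writes union as | rather than +. Below is a small check using only the standard library. The sample patterns are hypothetical.

```python
import re

# In re, * binds tightest, then concatenation, then |. So "0|10*" means 0 + (1(0*)).
pattern = re.compile("0|10*")
for w in ["0", "1", "100", "00", "1010"]:
    print(w, bool(pattern.fullmatch(w)))   # True, True, True, False, False

# Parentheses override precedence: (0|1)0* is (0 + 1).0*, so it does match 00.
print(bool(re.fullmatch("(0|1)0*", "00")))   # True
```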
Regular Languages
Let Σ be an alphabet. The class of regular languages over Σ is defined inductively as:
Φ is a regular language (the empty language).
{Є} is a regular language (the language containing only the empty string).
For each a ∈ Σ, {a} is a regular language.
If L1, L2, …, Ln are regular languages, then so is L1 U L2 U … U Ln.
If L1, L2, …, Ln are regular languages, then so is L1.L2.….Ln.
If L is a regular language, then so is L*.
Applications of Regular Languages
Validation:
Determining that a string complies with a set of formatting constraints, such as email address or password validation.
Search and Selection:
Identifying a subset of items from a larger set on the basis of a pattern match.
Tokenization:
Converting a sequence of characters into tokens (like keywords and identifiers) for later interpretation.
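To illustrate tokenization, here is a minimal LEX-like tokenizer written with Python's standard `re` module. The token classes are hypothetical. The sketch also differs from LEX in one respect. LEX prefers the longest match, whereas Python's alternation takes the first alternative that matches. For that reason, keywords are listed before identifiers and guarded with \b.

```python
import re

# Hypothetical token classes; order matters because Python's alternation takes the
# first alternative that matches (LEX would instead prefer the longest match).
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",  r"[0-9]+"),
    ("OP",      r"[+\-*/=<>]"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token class, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("if x1 = 42"))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('OP', '='), ('NUMBER', '42')]
```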
Algebraic Rules for Regular Expressions
Commutativity:
Commutativity of an operator means we can switch the order of its operands and get the same result.
Union of regular expressions is commutative (r + s = s + r), but concatenation of regular expressions is not (in general rs ≠ sr).
Associativity:
Union and concatenation of regular expressions are both associative,
i.e. if t, r, s are regular expressions representing regular languages L(t), L(r) and L(s), then
t+(r+s) = (t+r)+s and t.(r.s) = (t.r).s
Distributive law:
For any regular expressions r, s, t representing regular languages L(r), L(s) and L(t):
r(s+t) = rs+rt ------ left distribution
(s+t)r = sr+tr ------ right distribution
Identity law:
Φ is the identity for union, i.e. for any regular expression r representing the regular language L(r),
r + Φ = Φ + r = r.
Є is the identity for concatenation, i.e. Є.r = r.Є = r.
Annihilator:
An annihilator for an operator is a value such that, when the operator is applied to the annihilator and any other value, the result is the annihilator.
Φ is the annihilator for concatenation,
i.e. Φ.r = r.Φ = Φ
Idempotent law of union:
For any regular expression r representing the regular language L(r), r + r = r.
Law of closure:
For any regular expression r representing the regular language L(r):
(r*)* = r*
Closure of Φ: Φ* = Є
Closure of Є: Є* = Є
Positive closure of r: r+ = rr*
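These laws can be spot-checked mechanically. The Python sketch below compares two `re` patterns on every string over {0, 1} up to a fixed length. Agreement on these strings is evidence, not a proof. It assumes r, s and t are the sample expressions 01, 1 and 00, which are hypothetical choices. Φ is encoded as the never-matching pattern (?!) and Є as the empty group (?:).

```python
import re
from itertools import product

def same_language(a, b, max_len=8):
    """True if patterns a and b fully match exactly the same strings over {0, 1}
    of length <= max_len. Agreement on finitely many strings is not a proof."""
    ra, rb = re.compile(a), re.compile(b)
    return all(bool(ra.fullmatch(w)) == bool(rb.fullmatch(w))
               for n in range(max_len + 1)
               for w in map("".join, product("01", repeat=n)))

PHI = "(?!)"    # a pattern that never matches: plays the role of Φ
EPS = "(?:)"    # an empty group: matches only the empty string, like Є
r, s, t = "(?:01)", "(?:1)", "(?:00)"   # hypothetical sample expressions

laws = {
    "r + s = s + r":      (f"{r}|{s}", f"{s}|{r}"),
    "r(s + t) = rs + rt": (f"{r}(?:{s}|{t})", f"{r}{s}|{r}{t}"),
    "(s + t)r = sr + tr": (f"(?:{s}|{t}){r}", f"{s}{r}|{t}{r}"),
    "r + Φ = r":          (f"{r}|{PHI}", r),
    "Є.r = r":            (f"{EPS}{r}", r),
    "Φ.r = Φ":            (f"{PHI}{r}", PHI),
    "r + r = r":          (f"{r}|{r}", r),
    "(r*)* = r*":         (f"(?:{r}*)*", f"{r}*"),
    "r+ = rr*":           (f"{r}+", f"{r}{r}*"),
}
for name, (left, right) in laws.items():
    print(name, same_language(left, right))

# Concatenation is not commutative: 011 is in L(rs) but not in L(sr).
print("rs = sr", same_language(f"{r}{s}", f"{s}{r}"))
```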
Regular Expressions Examples
Consider Σ = {0, 1}; then some regular expressions over Σ are:
0*10* is the RE for {w | w contains exactly one 1}.
Σ*1Σ* is the RE for {w | w contains at least one 1}.
Σ*001Σ* is the RE for {w | w contains 001 as a substring}.
(ΣΣ)*, i.e. ((0+1).(0+1))*, is the RE for {w | w has even length}.
1*(01*01*)* is the RE for {w | w contains an even number of 0’s}.
0*10*10*10* is the RE for {w | w contains exactly three 1’s}.
For strings that have 001 or 100 as a substring, the RE is (1+0)*.001.(1+0)* + (1+0)*.100.(1+0)*.
For strings that contain at most two 0’s, the RE is 1*.(0+Є).1*.(0+Є).1*.
For strings ending with 11, the RE is (1+0)*.11.
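Each example can be checked against its intended description by brute force. Below is a Python sketch. The REs are translated into `re` syntax, with + becoming | and (0+Є) becoming (0|). The predicates restate the set descriptions above. Checking all strings up to length 10 is evidence, not a proof.

```python
import re
from itertools import product

def all_strings(max_len):
    """Every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)

# (RE from the slides in Python syntax, predicate describing the intended language)
EXAMPLES = [
    ("0*10*",                           lambda w: w.count("1") == 1),
    ("(0|1)*1(0|1)*",                   lambda w: "1" in w),
    ("(0|1)*001(0|1)*",                 lambda w: "001" in w),
    ("((0|1)(0|1))*",                   lambda w: len(w) % 2 == 0),
    ("1*(01*01*)*",                     lambda w: w.count("0") % 2 == 0),
    ("0*10*10*10*",                     lambda w: w.count("1") == 3),
    ("(1|0)*001(1|0)*|(1|0)*100(1|0)*", lambda w: "001" in w or "100" in w),
    ("1*(0|)1*(0|)1*",                  lambda w: w.count("0") <= 2),
    ("(1|0)*11",                        lambda w: w.endswith("11")),
]

def mismatches(max_len=10):
    """Return the patterns whose matches differ from their predicate on some short string."""
    return [pat for pat, intended in EXAMPLES
            if any(bool(re.fullmatch(pat, w)) != intended(w) for w in all_strings(max_len))]

print(mismatches())   # [] when every RE agrees with its description
```

Replacing the even-length entry with (0|1)*(0|1)* makes `mismatches()` report it, because that expression denotes every string.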
Finite Automata and Regular expression
In order to show that REs define the same class of languages as finite automata (DFA, NFA and Є-NFA), we must show that:
Any language defined by one of these finite automata is also defined by an RE.
Every language defined by an RE is also defined by one of these finite automata.
Reduction of Regular Expression to ε-NFA
We can show that for every RE R, the language L(R) is also the language L(E) of some ε-NFA E.
Together with the conversion of automata to REs, this shows that REs and ε-NFAs are equivalent in terms of language representation.
Theorem 1
For any regular expression r, there is an Є-NFA that accepts the same language represented by r.
Proof:
Let L = L(r) be the language of the regular expression r. We have to show that there is an Є-NFA E such that L(E) = L.
The proof is by structural induction on r, following the recursive definition of regular expressions.
Basis: Φ, Є and ‘a’ are the regular expressions representing the languages Φ (the empty language), {Є} (the language containing only the empty string) and {a}, respectively.
The Є-NFAs accepting these languages can be constructed as follows:
Induction: the Є-NFA for r = r1 + r2 can be constructed as:
For r = r1.r2, the path from the start state to the accepting state goes first through the automaton for r1, where it must follow a path labeled by a string in L(r1),
and then through the automaton for r2, where it follows a path labeled by a string in L(r2).
Thus, the language accepted by this automaton is L(r1).L(r2).
For * (Kleene closure):
r* can be constructed as:
Examples (Conversion from RE to Є-NFA)
For the regular expression (1+0), the Є-NFA is:
For the regular expression (00+1)*10, the Є-NFA is:
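The inductive construction in the proof is commonly known as Thompson's construction. The sketch below is a minimal Python version of it. It assumes a hypothetical tuple encoding of REs and one particular layout of the fragments, so details such as the number of Є-edges may differ from the figures. It builds the Є-NFA for (00+1)*10 and simulates it using Є-closures.

```python
import re
from itertools import count, product

EPS = None   # label used for Є-transitions

class EpsNFA:
    """An Є-NFA as an adjacency list: state -> list of (label, target)."""
    def __init__(self):
        self.edges = {}
        self._fresh = count()

    def new_state(self):
        s = next(self._fresh)
        self.edges[s] = []
        return s

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

def build(nfa, r):
    """Add a fragment for the tuple-encoded RE r to nfa; return (start, accept)."""
    kind = r[0]
    if kind in ("empty", "eps", "sym"):
        s, f = nfa.new_state(), nfa.new_state()
        if kind == "eps":
            nfa.add_edge(s, EPS, f)
        elif kind == "sym":
            nfa.add_edge(s, r[1], f)
        return s, f                       # "empty": no edge, so nothing is accepted
    if kind == "union":                   # new start branches into both, both rejoin
        s1, f1 = build(nfa, r[1])
        s2, f2 = build(nfa, r[2])
        s, f = nfa.new_state(), nfa.new_state()
        for a, b in ((s, s1), (s, s2), (f1, f), (f2, f)):
            nfa.add_edge(a, EPS, b)
        return s, f
    if kind == "concat":                  # run r1, then continue into r2
        s1, f1 = build(nfa, r[1])
        s2, f2 = build(nfa, r[2])
        nfa.add_edge(f1, EPS, s2)
        return s1, f2
    if kind == "star":                    # skip r1 entirely, or loop back through it
        s1, f1 = build(nfa, r[1])
        s, f = nfa.new_state(), nfa.new_state()
        for a, b in ((s, s1), (s, f), (f1, s1), (f1, f)):
            nfa.add_edge(a, EPS, b)
        return s, f
    raise ValueError(f"unknown constructor {kind!r}")

def eps_closure(nfa, states):
    """All states reachable from `states` using Є-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        for label, t in nfa.edges[stack.pop()]:
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, start, accept, word):
    current = eps_closure(nfa, {start})
    for ch in word:
        current = eps_closure(nfa, {t for s in current for label, t in nfa.edges[s] if label == ch})
    return accept in current

def sym(c):
    return ("sym", c)

# (00+1)*10
R = ("concat",
     ("star", ("union", ("concat", sym("0"), sym("0")), sym("1"))),
     ("concat", sym("1"), sym("0")))
nfa = EpsNFA()
start, accept = build(nfa, R)

# Cross-check against Python's re (which writes + as |) on all strings up to length 7.
for n in range(8):
    for w in map("".join, product("01", repeat=n)):
        assert accepts(nfa, start, accept, w) == bool(re.fullmatch("(00|1)*10", w))
```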
Equivalence of Regular Expression and Finite Automata
Discussed in class.
Conversion of DFA to Regular Expression
Arden’s Theorem:
Let p and q be regular expressions over the alphabet Σ. If p does not contain the empty string (Є ∉ L(p)), then
r = q + rp has the unique solution r = qp*.
Proof:
Here, $r = q + rp$ ... (i)
Let us substitute $r = q + rp$ on the right-hand side of relation (i):
$r = q + (q + rp)p$
$r = q + qp + rp^2$ ... (ii)
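Again substituting $r = q + rp$ into relation (ii), we get:
$r = q + qp + (q + rp)p^2$
$r = q + qp + qp^2 + rp^3$
Continuing in the same way, we get:
$r = q + qp + qp^2 + qp^3 + \cdots$
$r = q(\epsilon + p + p^2 + p^3 + \cdots)$
Thus $r = qp^*$. Proved.
(Since Є ∉ L(p), every string in $rp^k$ has length at least k. For any fixed string, the remainder term $rp^k$ can therefore be ignored once k exceeds the string's length. This is where the condition on p is needed.)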
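Below is a quick sanity check of the theorem in Python. It compares patterns on all strings over {0, 1} up to length 8, which is evidence rather than proof. The sample values q = 0 and p = (1+01) are hypothetical choices. The second part shows why the condition on p matters. With p = (1+Є), both 01* and (0+1)* satisfy r = 0 + rp, so the solution is no longer unique.

```python
import re
from itertools import product

def same_language(a, b, max_len=8):
    """Compare two re patterns on every string over {0, 1} up to max_len (evidence, not proof)."""
    ra, rb = re.compile(a), re.compile(b)
    return all(bool(ra.fullmatch(w)) == bool(rb.fullmatch(w))
               for n in range(max_len + 1)
               for w in map("".join, product("01", repeat=n)))

# Hypothetical q and p, with Є not in L(p): Arden's solution r = qp* satisfies r = q + rp.
q, p = "0", "(?:1|01)"
r = f"{q}{p}*"
print(same_language(r, f"{q}|(?:{r}){p}"))            # True

# With p = (1+Є), which contains Є, two different languages both satisfy r = 0 + rp.
p_eps = "(?:1|)"
for candidate in ("01*", "(?:0|1)*"):
    print(candidate, same_language(candidate, f"0|(?:{candidate}){p_eps}"))   # True for both
print(same_language("01*", "(?:0|1)*"))               # False: the two solutions differ
```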
Use of Arden’s rule to find the regular expression for a DFA:
To convert a given DFA into a regular expression, we make the following assumptions about its transition system:
The transition diagram should not have Є-transitions.
There must be only one initial state.
The states of the DFA are q1, q2, …, qn, where q1 is the initial state (any qi may be a final state).
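wij denotes the regular expression representing the set of labels of the edges from qi to qj (wij = Φ if there is no such edge).
Writing qi also for the regular expression of the strings that lead from q1 to qi, we get the equations:
$q_1 = q_1w_{11} + q_2w_{21} + q_3w_{31} + \cdots + q_nw_{n1} + \epsilon$
$q_2 = q_1w_{12} + q_2w_{22} + q_3w_{32} + \cdots + q_nw_{n2}$
$q_3 = q_1w_{13} + q_2w_{23} + q_3w_{33} + \cdots + q_nw_{n3}$
$\vdots$
$q_n = q_1w_{1n} + q_2w_{2n} + q_3w_{3n} + \cdots + q_nw_{nn}$
Solving these equations (using Arden’s rule) for the final states in terms of the wij gives the regular expression equivalent to the given DFA. This is the union of the expressions obtained for all final states.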
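This elimination can be automated. The sketch below is a minimal Python version of the equation-solving method. It represents REs as Python `re` strings, with None for Φ and "" for Є. It eliminates one state at a time with Arden's rule and returns the union of the solutions for the final states. The example DFA is hypothetical. It is shaped like the worked example that follows (q1 initial and final, q2 = q1.0, q3 = q1.1), with an assumed dead state q4.

```python
import re
from itertools import product

# REs are built as Python re strings; None stands for Φ and "" for Є.
def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def concat(a, b):
    if a is None or b is None:          # Φ annihilates concatenation
        return None
    return a + b

def star(a):
    if a is None or a == "":            # Φ* = Є* = Є
        return ""
    return f"(?:{a})*"

def build_equations(states, start, delta):
    """eqs[j] encodes q_j = const + sum of q_i.w_ij over the edges q_i -> q_j."""
    eqs = {s: {"const": None, "coef": {}} for s in states}
    eqs[start]["const"] = ""            # the + Є term of the initial state
    for (src, sym), dst in delta.items():
        coef = eqs[dst]["coef"]
        coef[src] = union(coef.get(src), sym)
    return eqs

def eliminate(eqs, k):
    """Solve q_k = rest + q_k.w_kk by Arden's rule and substitute it everywhere else."""
    eq = eqs.pop(k)
    loop = star(eq["coef"].pop(k, None))          # q_k = rest . w_kk*
    const = concat(eq["const"], loop)
    coef = {i: concat(w, loop) for i, w in eq["coef"].items()}
    for other in eqs.values():
        c = other["coef"].pop(k, None)            # the term q_k.c in this equation
        if c is None:
            continue
        other["const"] = union(other["const"], concat(const, c))
        for i, w in coef.items():
            other["coef"][i] = union(other["coef"].get(i), concat(w, c))

def dfa_to_regex(states, start, finals, delta):
    result = None
    for f in finals:                    # solve for each final state, then take the union
        eqs = build_equations(states, start, delta)
        for k in states:
            if k != f:
                eliminate(eqs, k)
        eq = eqs[f]                     # now q_f = const + q_f.w, so q_f = const.w*
        result = union(result, concat(eq["const"], star(eq["coef"].get(f))))
    return result                       # None means the DFA accepts nothing

# Hypothetical DFA shaped like the worked example: q1 initial and final,
# q1 -0-> q2 -1-> q1, q1 -1-> q3 -0-> q1, and an assumed dead state q4.
STATES = ["q1", "q2", "q3", "q4"]
DELTA = {
    ("q1", "0"): "q2", ("q1", "1"): "q3",
    ("q2", "0"): "q4", ("q2", "1"): "q1",
    ("q3", "0"): "q1", ("q3", "1"): "q4",
    ("q4", "0"): "q4", ("q4", "1"): "q4",
}
regex = dfa_to_regex(STATES, "q1", ["q1"], DELTA)
print(regex)                            # (?:(?:01|10))*, i.e. (01+10)*

def dfa_accepts(word, state="q1"):
    for ch in word:
        state = DELTA[(state, ch)]
    return state == "q1"

for n in range(9):
    for w in map("".join, product("01", repeat=n)):
        assert bool(re.fullmatch(regex, w)) == dfa_accepts(w)
```

Solving one equation at a time and substituting mirrors the hand calculation. Only the bookkeeping is mechanical.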
Example: Convert the following DFA into a regular expression.
Putting the values of q2 and q3 in (i):
q1 = q1.01 + q1.10 + Є
i.e. q1 = q1(01+10) + Є
i.e. q1 = Є + q1(01+10) (in the form r = q + rp)
i.e. q1 = Є(01+10)* (using Arden’s rule)
Since q1 is the final state, the final regular expression for the DFA is
Є(01+10)* = (01+10)*
Exercises
Convert the following DFA into an RE.
Representation of Languages
Representations can be formal or informal.
Example (formal): represent a language by an RE or a DFA defining it.
Example (informal): a logical or prose statement about its strings:
$\{0^n1^n \mid n \text{ is a nonnegative integer}\}$
The set of strings consisting of some number of 0’s followed by the same number of 1’s.
Properties of Regular Languages
Language classes have two important kinds of properties:
Decision properties.
A decision property for a class of languages is an algorithm that
takes a formal description of a language (e.g., a DFA) and tells
whether or not some property holds.
Example: Is language L empty?
Closure properties.
A closure property of a language class says that given languages
in the class, an operator (e.g., union) produces another language in
the same class.
Example: the regular languages are closed under union, concatenation, and (Kleene) closure. This follows directly from the RE representation of languages.
Pumping Lemma
It can be shown that the class of languages known as regular languages has at least four different descriptions.
They are the languages accepted by DFAs, by NFAs and by Є-NFAs, and the languages defined by REs.
Not every language is regular.
A powerful technique for showing that a language is not regular is the Pumping Lemma.
Statement:
Let L be a regular language. Then there exists an integer constant n such that for any x ∈ L with |x| ≥ n, there are strings u, v, w such that $x = uvw$,
$v \neq \epsilon$, i.e. $|v| > 0$,
$|uv| \le n$, and
$uv^kw \in L$ for all $k \ge 0$.
Note: Here v is the string that can be pumped. Repeating v any number of times, or deleting it, keeps the resulting string in the language.
Proof:
Suppose L is a regular language; then L is accepted by some DFA M. Let M have n states, and let x ∈ L be any string with |x| = m ≥ n. (If L has no such string, the statement holds vacuously.)
Now suppose $x = a_1a_2\cdots a_m$, where each $a_i \in \Sigma$ is an input symbol to M. For $j = 0, 1, \ldots, m$, let $q_j$ be the state of M after reading $a_1\cdots a_j$, with $q_0$ the start state of M.
Then,
$(q_0, x) = (q_0, a_1a_2\cdots a_m) \vdash (q_1, a_2\cdots a_m) \vdash \cdots \vdash (q_m, \epsilon)$, where $q_m$ is a final state.
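Since M has only n states, the n + 1 states $q_0, q_1, \ldots, q_n$ cannot all be distinct, so $q_i = q_j$ for some $0 \le i < j \le n$.
Now we can break x = uvw as
$u = a_1a_2\cdots a_i$
$v = a_{i+1}\cdots a_j$
$w = a_{j+1}\cdots a_m$
so that |v| = j - i > 0 and |uv| = j ≤ n.
The string $a_{i+1}\cdots a_j$ takes M from state $q_i$ back to itself, since $q_i = q_j$. So M also accepts $a_1\cdots a_i(a_{i+1}\cdots a_j)^k a_{j+1}\cdots a_m$ for all k ≥ 0.
Hence $uv^kw \in L$ for all $k \ge 0$.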
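The proof is constructive. Running the DFA on x and waiting for the first repeated state yields u, v and w. Below is a minimal Python sketch. It assumes a hypothetical two-state DFA for strings with an even number of 0's, so n = 2.

```python
def pump_split(delta, start, x, n):
    """Split x (with |x| >= n, n = number of DFA states) into (u, v, w) as in the proof:
    among the states q0..qn visited on the first n symbols, some state repeats."""
    assert len(x) >= n
    first_seen = {start: 0}          # state -> number of symbols read when first seen
    state = start
    for j, ch in enumerate(x[:n], start=1):
        state = delta[(state, ch)]
        if state in first_seen:      # q_i = q_j with i < j <= n
            i = first_seen[state]
            return x[:i], x[i:j], x[j:]
        first_seen[state] = j
    raise AssertionError("impossible: n + 1 states cannot all differ in an n-state DFA")

# Hypothetical 2-state DFA accepting strings with an even number of 0's (start = final = "A").
DELTA = {("A", "0"): "B", ("A", "1"): "A", ("B", "0"): "A", ("B", "1"): "B"}

def accepts(word):
    state = "A"
    for ch in word:
        state = DELTA[(state, ch)]
    return state == "A"

u, v, w = pump_split(DELTA, "A", "0110", n=2)
print(u, v, w)                                       # 0 1 10
print(all(accepts(u + v * k + w) for k in range(6))) # True
```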
Application of Pumping Lemma
To prove that a language is not regular.
For example: Show that the language $L = \{0^r1^r \mid r \ge 0\}$ is not regular.
Solution:
Suppose L is regular, and let n be the constant of the pumping lemma. Take $x = 0^n1^n \in L$. Since $|x| \ge n$, there are strings u, v, w with $x = uvw$ and $|v| \ge 1$ such that $uv^kw \in L$ for all $k \ge 0$. We consider the possible contents of v.
Case I:
Let v contain 0’s only. Then suppose $u = 0^p$, $v = 0^q$, $w = 0^r1^s$,
where $p + q + r = s$ (since $uvw = 0^{p+q+r}1^s \in L$) and $q > 0$.
Now, $uv^kw = 0^p(0^q)^k0^r1^s = 0^{p+qk+r}1^s$.
Since $q > 0$, this string belongs to L only for k = 1, contradicting the pumping lemma.
Hence we conclude that the language is not regular.
Case II:
Let v contain 1’s only. Then $u = 0^p1^q$, $v = 1^r$, $w = 1^s$,
where $p = q + r + s$ and $r > 0$.
Now, $uv^kw = 0^p1^q(1^r)^k1^s = 0^p1^{q+rk+s}$, which belongs to L only for k = 1.
Hence the language is not regular.
Case III:
v contains both 0’s and 1’s. Then suppose $u = 0^p$, $v = 0^q1^r$, $w = 1^s$,
where $p + q = r + s$, $q > 0$ and $r > 0$.
Now, $uv^kw = 0^p(0^q1^r)^k1^s$.
For k ≥ 2 this string contains a 0 after a 1, so it is not in L. In particular, $uv^2w \notin L$.
Since every possible form of v leads to a contradiction, the language is not regular.
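For a concrete pumping constant n, the argument can also be checked exhaustively. The Python sketch below tries every split of $x = 0^n1^n$ with |uv| ≤ n and |v| ≥ 1. For each split, it checks whether the split survives pumping with both k = 0 and k = 2. Under the condition |uv| ≤ n, only Case I can occur for this x. The function names are hypothetical. Checking finitely many values of n illustrates the proof but does not replace it.

```python
def in_L(s):
    """Membership in L = {0^r 1^r | r >= 0}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half

def surviving_split(x, n, ks=(0, 2)):
    """Return a split x = uvw with |uv| <= n and |v| >= 1 such that u v^k w is in L
    for every k in ks, or None if no such split exists."""
    for j in range(1, n + 1):        # j = |uv|
        for i in range(j):           # i = |u|, so |v| = j - i >= 1
            u, v, w = x[:i], x[i:j], x[j:]
            if all(in_L(u + v * k + w) for k in ks):
                return u, v, w
    return None

for n in range(1, 9):
    print(n, surviving_split("0" * n + "1" * n, n))   # None for every n
```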
Closure Properties of Regular Languages
The union of two regular languages is regular.
The intersection of two regular languages is regular.
The complement of a regular language is regular.
The difference of two regular languages is regular.
The reversal of a regular language is regular.
The closure (star) of a regular language is regular.
The concatenation of two regular languages is regular.
Properties of Regular Languages under Union (U)
Theorem: If L and M are regular languages, then so is L U M.
Proof:
Since L and M are regular, they have regular expressions, say L = L(R) and M = L(S). Then
L U M = L(R+S) by the definition of the + operator for regular expressions.
Properties of Regular Languages under Complement
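One standard way to see that the complement of a regular language is regular is to swap the final and non-final states of a complete DFA. This construction is not taken from the slides. Below is a minimal Python sketch. It assumes a DFA is given as a tuple (states, alphabet, delta, start, finals) and uses a hypothetical example DFA for the strings that end in 1.

```python
def accepts(dfa, word):
    states, alphabet, delta, start, finals = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals

def complement(dfa):
    """Swap final and non-final states. This is only correct for a complete DFA:
    every state must have a move on every symbol."""
    states, alphabet, delta, start, finals = dfa
    assert all((s, a) in delta for s in states for a in alphabet), "DFA must be complete"
    return states, alphabet, delta, start, set(states) - set(finals)

# Hypothetical complete DFA over {0, 1} accepting the strings that end in 1.
ENDS_IN_1 = ({"p", "q"}, {"0", "1"},
             {("p", "0"): "p", ("p", "1"): "q", ("q", "0"): "p", ("q", "1"): "q"},
             "p", {"q"})
NOT_ENDS_IN_1 = complement(ENDS_IN_1)

for w in ["", "0", "1", "10", "011"]:
    print(repr(w), accepts(ENDS_IN_1, w), accepts(NOT_ENDS_IN_1, w))
```

Completeness matters here. A missing move would reject a string in both the DFA and its "complement".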
Minimization of Finite State Machines:
Table Filling Algorithm
Given a DFA M that accepts a language L(M), we want to construct an equivalent DFA M’ with the minimum number of states. Minimization involves identifying the equivalent states and the distinguishable states.
Equivalent states: Two states p and q are called equivalent, denoted p ≡ q, if and only if for each input string x, δ̂(p, x) is a final state if and only if δ̂(q, x) is a final state.
Distinguishable states: Two states p and q are said to be distinguishable if there exists a string x such that exactly one of δ̂(p, x) and δ̂(q, x) is a final state.
The steps of the algorithm for identifying the pairs (p, q) with p ≠ q are:
List all the pairs of states (p, q) with p ≠ q.
Make a sequence of passes through the pairs.
On the first pass, mark each pair in which exactly one element is final (in F).
On each subsequent pass, mark the pair (r, s) if, for some a ∈ Σ, δ(r, a) = p, δ(s, a) = q and (p, q) is already marked.
After a pass in which no new pairs are marked, stop.
The marked pairs (p, q) are those for which p and q are not equivalent; the unmarked pairs are those for which p ≡ q.
Example
Now, to solve this problem, we first determine whether each pair is distinguishable or not.
Every pair containing c, the only final state, is marked on the first pass.
For pair (b, a)
(δ(b, 0), δ(a, 0)) = (g, h) – unmarked
(δ(b, 1), δ(a, 1)) = (c, f) – marked
Therefore (b, a) is distinguishable.
For pair (d, a)
(δ(d, 0), δ(a, 0)) = (c, b) – marked
Therefore (d, a) is distinguishable.
For pair (e, a)
(δ(e, 0), δ(a, 0)) = (h, h) – unmarked
(δ(e, 1), δ(a, 1)) = (f, f) – unmarked
Therefore (e, a) is not distinguishable.
For pair (g, a)
(δ(g, 0), δ(a, 0)) = (g, h) – unmarked
(δ(g, 1), δ(a, 1)) = (e, f) – unmarked
Neither pair is marked yet; (g, a) is marked in a later pass, once (f, e) has been marked.
For pair (h, a)
(δ(h, 0), δ(a, 0)) = (g, h) – unmarked
(δ(h, 1), δ(a, 1)) = (c, f) – marked
Therefore (h, a) is distinguishable.
For pair (d, b)
(δ(d, 0), δ(b, 0)) = (c, g) – marked
Therefore (d, b) is distinguishable.
For pair (e, b)
(δ(e, 0), δ(b, 0)) = (h, g) – unmarked
(δ(e, 1), δ(b, 1)) = (f, c) – marked
Therefore (e, b) is distinguishable.
For pair (f, b)
(δ(f, 0), δ(b, 0)) = (c, g) – marked
Therefore (f, b) is distinguishable.
For pair (g, b)
(δ(g, 0), δ(b, 0)) = (g, g) – unmarked
(δ(g, 1), δ(b, 1)) = (e, c) – marked
Therefore (g, b) is distinguishable.
For pair (h, b)
(δ(h, 0), δ(b, 0)) = (g, g) – unmarked
(δ(h, 1), δ(b, 1)) = (c, c) – unmarked
Therefore (h, b) is not distinguishable.
For pair (e, d)
(δ(e, 0), δ(d, 0)) = (h, c) – marked
Therefore (e, d) is distinguishable.
For pair (f, d)
(δ(f, 0), δ(d, 0)) = (c, c) – unmarked
(δ(f, 1), δ(d, 1)) = (g, g) – unmarked
Therefore (f, d) is not distinguishable.
For pair (g, d)
(δ(g, 0), δ(d, 0)) = (g, c) – marked
Therefore (g, d) is distinguishable.
For pair (h, d)
(δ(h, 0), δ(d, 0)) = (g, c) – marked
Therefore (h, d) is distinguishable.
For pair (f, e)
(δ(f, 0), δ(e, 0)) = (c, h) – marked
Therefore (f, e) is distinguishable.
For pair (g, e)
(δ(g, 0), δ(e, 0)) = (g, h) – unmarked
(δ(g, 1), δ(e, 1)) = (e, f) – marked
Therefore (g, e) is distinguishable.
For pair (h, e)
(δ(h, 0), δ(e, 0)) = (g, h) – unmarked
(δ(h, 1), δ(e, 1)) = (c, f) – marked
Therefore (h, e) is distinguishable.
For pair (g, f)
(δ(g, 0), δ(f, 0)) = (g, c) – marked
Therefore (g, f) is distinguishable.
For pair (h, f)
(δ(h, 0), δ(f, 0)) = (g, c) – marked
Therefore (h, f) is distinguishable.
For pair (h, g)
(δ(h, 0), δ(g, 0)) = (g, g) – unmarked
(δ(h, 1), δ(g, 1)) = (c, e) – marked
Therefore (h, g) is distinguishable.
Thus (a, e), (b, h) and (d, f) are the only equivalent pairs of states.
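The table-filling algorithm is easy to run mechanically. The Python sketch below implements the passes described above. Its transition table is read off the δ values used in this example, with two assumptions. First, δ(a, 0) is taken as b, as in the (d, a) step. The steps for (b, a), (e, a) and (h, a) use h instead, and that choice yields the same equivalent pairs. Second, the sketch assumes values for δ(c, ·), which the walkthrough never uses.

```python
from itertools import combinations

def table_filling(states, alphabet, delta, finals):
    """Return the set of marked (distinguishable) pairs, each as a frozenset of two states."""
    pairs = [frozenset(p) for p in combinations(states, 2)]
    # First pass: mark every pair in which exactly one state is final.
    marked = {p for p in pairs if len(p & finals) == 1}
    changed = True
    while changed:                    # stop after a pass that marks nothing new
        changed = False
        for p in pairs:
            if p in marked:
                continue
            r, s = tuple(p)
            for a in alphabet:
                succ = frozenset((delta[(r, a)], delta[(s, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(p)
                    changed = True
                    break
    return marked

STATES = "abcdefgh"
FINALS = {"c"}
DELTA = {
    ("a", "0"): "b", ("a", "1"): "f",
    ("b", "0"): "g", ("b", "1"): "c",
    ("c", "0"): "a", ("c", "1"): "c",   # assumption: δ(c, ·) is not given in the walkthrough
    ("d", "0"): "c", ("d", "1"): "g",
    ("e", "0"): "h", ("e", "1"): "f",
    ("f", "0"): "c", ("f", "1"): "g",
    ("g", "0"): "g", ("g", "1"): "e",
    ("h", "0"): "g", ("h", "1"): "c",
}

def equivalent_pairs(delta):
    marked = table_filling(STATES, "01", delta, FINALS)
    return [p for p in combinations(STATES, 2) if frozenset(p) not in marked]

print(equivalent_pairs(DELTA))   # [('a', 'e'), ('b', 'h'), ('d', 'f')]
```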
Hence the minimized DFA, whose states are the blocks {a, e}, {b, h}, {c}, {d, f} and {g}, is: