
Regular Expressions (2)

Uploaded by

Kriti Gautam
Copyright
© © All Rights Reserved

Regular Expressions Unit-2

Regular Expression
Regular expressions are algebraic expressions used to represent regular languages, the
languages accepted by finite automata. Regular expressions offer a declarative way to
express the strings we want to accept. This declarative description is what regular
expressions offer and automata do not.
A regular expression is built up out of simpler regular expressions using a set of defining
rules. Each regular expression ‘r’ denotes a language L(r). The defining rules specify how
L(r) is formed by combining, in various ways, the languages denoted by the
sub-expressions of ‘r’.
Let Σ be an alphabet. The regular expressions over the alphabet Σ are defined inductively
as follows:
- Φ is a regular expression denoting the empty language.
- Є is a regular expression denoting the language {Є}, which contains only the empty string.
- If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression denoting the language {a}.
- If ‘r’ and ‘s’ are regular expressions denoting the languages L(r) and L(s), then
  - r U s is a regular expression denoting the language L(r) U L(s).
  - r.s is a regular expression denoting the language L(r).L(s).
  - r* is a regular expression denoting the language (L(r))*.
  - (r) is a regular expression denoting the language L(r) [parentheses may be placed
    around regular expressions if we desire].

Regular operator:

Basically, there are three operators that are used to generate the languages that are
regular,
-Union (U / |):
If L1 and L2 are any two regular languages, then
L1 U L2 = {s | s ∈ L1 or s ∈ L2}
e.g.
L1 = {00, 11}, L2 = {Є, 10}
L1 U L2 = {Є, 00, 11, 10}

-Concatenation (.):

If L1 and L2 are any two regular languages, then
L1.L2 = {l1.l2 | l1 ∈ L1 and l2 ∈ L2}
e.g. L1 = {00, 11} and L2 = {Є, 10}
L1.L2 = {00, 0010, 11, 1110}

-Kleene Closure (*):

If L is any regular language, then
L* = U(i ≥ 0) L^i = L^0 U L^1 U L^2 U ………….
where L^0 = {Є} and L^i = L.L^(i-1).

-Positive closure (+):

L^+ = U(i ≥ 1) L^i = L^1 U L^2 U L^3 U ………….
When Є is not in L, this is the same as L^+ = L* - L^0.
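The three operators can be tried out on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings and the empty string Є is written as `""`. The helper names (`union`, `concat`, `power`, `kleene_upto`) are illustrative and not part of the notes. Since L* is infinite in general, `kleene_upto` only computes the finite approximation L^0 U L^1 U ... U L^k.

```python
def union(l1, l2):
    """L1 U L2: strings that belong to either language."""
    return l1 | l2

def concat(l1, l2):
    """L1.L2: every string of L1 followed by every string of L2."""
    return {x + y for x in l1 for y in l2}

def power(lang, i):
    """L^i, with L^0 = {Є}."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result

def kleene_upto(lang, max_power):
    """Finite approximation of L*: L^0 U L^1 U ... U L^max_power."""
    result = set()
    for i in range(max_power + 1):
        result |= power(lang, i)
    return result

L1 = {"00", "11"}
L2 = {"", "10"}
print(sorted(union(L1, L2)))
print(sorted(concat(L1, L2)))
print(sorted(kleene_upto({"0"}, 3)))
```

The first two results reproduce the union and concatenation of the example languages L1 and L2 used above.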
Precedence of regular operators:

- Closure (*) has the highest precedence.
- Concatenation (.) has the next highest precedence.
- Union (U / | / +) has the lowest precedence.

Regular language:

Let Σ be an alphabet. The class of regular languages over Σ is defined inductively as follows:
- Φ is a regular language (the empty language).
- {Є} is a regular language (the language containing only the empty string).
- For each a ∈ Σ, {a} is a regular language.
- If L1, L2, …, Ln are regular languages, then so is L1 U L2 U … U Ln.
- If L1, L2, L3, …, Ln are regular languages, then so is L1.L2.L3…Ln.
- If L is a regular language, then so is L*.

Every regular language can be written as an expression using the operations of union,
concatenation and Kleene closure. To simplify the writing of these formulas, we adopt the
following conventions:
- Writing a symbol, say ‘a’, by itself is shorthand for {a}. That is, we promote a
  symbol to the singleton set containing it.
- The concatenation symbol . can be dropped, as in xy instead of x.y.
- Parentheses need be used only when it is necessary to override the normal
  precedence of * over . over U.

Each of these formulas is called a regular construction, and by definition the regular
languages are exactly those languages that are generated by regular constructions.

Application of regular languages:

- Validation: Determining that a string complies with a set of formatting
  constraints, such as email address validation or password validation.
- Search and Selection: Identifying a subset of items from a larger set on the basis
  of a pattern match.
- Tokenization: Converting a sequence of characters into words or tokens (like
  keywords and identifiers) for later interpretation.

Algebraic Rules/laws for regular expression:

1. Commutativity: Union of regular expressions is commutative, but concatenation
of regular expressions is not. i.e. if r and l are regular expressions denoting
the languages L(r) and L(l), then
r + l = l + r, i.e. r U l = l U r,
but in general r.l ≠ l.r.
2. Associativity: Union as well as concatenation of regular expressions is
associative.
i.e. if l, r, s are regular expressions denoting the regular languages L(l), L(r) and
L(s), then
l+(r+s) = (l+r)+s
and l.(r.s) = (l.r).s
3. Distributive law: Concatenation distributes over union. i.e. if l, m, n are regular
expressions denoting the regular languages L(l), L(m) and L(n), then
l(m+n) = lm+ln ------ left distribution.
(m+n)l = ml+nl ------ right distribution.
4. Identity law:
Φ is the identity for union. i.e. for any regular expression r denoting the regular
language L(r),
r+Φ = Φ+r = r, i.e. Φ U r = r.
Є is the identity for concatenation. i.e. Є.r = r = r.Є
5. Annihilator: An annihilator for an operator is a value such that when the operator
is applied to the annihilator and some other value, the result is the annihilator.
Φ is the annihilator for concatenation. i.e. Φ.r = r.Φ = Φ
6. Idempotent law of union: For any regular expression r denoting the regular
language L(r), r + r = r.
7. Law of closure: For any regular expression r denoting the regular language
L(r),
(r*)* = r*
Closure of Φ: Φ* = Є
Closure of Є: Є* = Є
Positive closure of r: r^+ = rr*.
Examples
Consider Σ = {0, 1}. Some regular expressions over Σ:
- 0*10* = {w | w contains a single 1}
- Σ*1Σ* = {w | w contains at least one 1}
- Σ*001Σ* = {w | w contains the string 001 as a substring}
- (ΣΣ)* or ((0+1).(0+1))* = {w | w is a string of even length}
- 1*(01*01*)* = {w | w is a string containing an even number of 0’s}
- 0*10*10*10* = {w | w is a string with exactly three 1’s}
- For strings that have either 001 or 100 as a substring, the regular expression is
  (1+0)*.001.(1+0)* + (1+0)*.100.(1+0)*
- For strings that have at most two 0’s in them, the regular expression is
  1*.(0+Є).1*.(0+Є).1*
- For strings ending with 11, the regular expression is (1+0)*.(11)
- Regular expression that denotes the C identifiers:
  (Alphabet + _ )(Alphabet + digit + _ )*
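Several of the examples above can be checked mechanically. The sketch below is only an illustration: it translates the expressions into Python `re` syntax (an assumption of this sketch, where `[01]` stands for Σ, `|` for + and `0?` for (0+Є)) and compares each pattern against a plain predicate on every binary string up to a small length. The names `strings_upto`, `EXAMPLES` and `check` are hypothetical helpers.

```python
import re
from itertools import product

def strings_upto(n, alphabet="01"):
    """Yield every string over the alphabet of length 0..n."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# (pattern in Python re syntax, predicate describing the intended language)
EXAMPLES = [
    (r"0*10*", lambda w: w.count("1") == 1),
    (r"[01]*1[01]*", lambda w: "1" in w),
    (r"[01]*001[01]*", lambda w: "001" in w),
    (r"([01][01])*", lambda w: len(w) % 2 == 0),
    (r"1*(01*01*)*", lambda w: w.count("0") % 2 == 0),
    (r"0*10*10*10*", lambda w: w.count("1") == 3),
    (r"1*0?1*0?1*", lambda w: w.count("0") <= 2),
    (r"[01]*11", lambda w: w.endswith("11")),
]

def check(pattern, predicate, max_len=8):
    """True if the pattern and the predicate agree on all strings up to max_len."""
    return all(
        (re.fullmatch(pattern, w) is not None) == predicate(w)
        for w in strings_upto(max_len)
    )

results = [check(pattern, predicate) for pattern, predicate in EXAMPLES]
print(results)
```

Agreement on short strings is evidence, not a proof, that an expression denotes the intended language, but it catches slips such as misplaced parentheses quickly.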


Theorem 1
If L, M and N are any languages, then L(M U N) = L(M) U L(N), where L(M) denotes the
concatenation L.M.
Proof:
It is sufficient to show that a string w belongs to L(M U N) if and only if it
belongs to L(M) U L(N).

Now first consider the “if” part:
Let w ∈ L(M) U L(N)
This implies that w ∈ L(M) or w ∈ L(N) (by union rule)
If w ∈ L(M), then w = xy for some x ∈ L and y ∈ M (by concatenation rule)
If w ∈ L(N), then w = xy for some x ∈ L and y ∈ N (by concatenation rule)
In either case,
x ∈ L and y ∈ (M U N)
=> xy ∈ L(M U N) (concatenating above)
=> w ∈ L(M U N)
Now consider the “only if” part:
Let w ∈ L(M U N). Then w = xy for some x ∈ L and y ∈ (M U N) (by concatenation rule)
y ∈ (M U N) => y ∈ M or y ∈ N (by union rule)
Now, we have x ∈ L.
If y ∈ M then xy ∈ L(M) (by concatenation)
And if y ∈ N then xy ∈ L(N) (by concatenation)
In either case xy ∈ (L(M) U L(N)) (by union rule)
i.e. w ∈ (L(M) U L(N))
Thus,
we have L(M U N) = L(M) U L(N)

Finite Automata and Regular Expression:


The regular expression approach for describing language is fundamentally different from
the finite automaton approach. However, these two notations turn out to represent exactly
the same set of languages, which we call regular languages.

Regular expression to finite automata:

Theorem: 2
Every language defined by a regular expression is also defined by a finite
automaton. [For any regular expression r, there is an Є-NFA that accepts the same
language represented by r].

Proof:

Let L = L(r) be the language of the regular expression r. We have to show that there
is an Є-NFA E such that L(E) = L.
The proof is by structural induction on r, following the recursive
definition of regular expressions.
Basis step: Φ, Є and a are regular expressions denoting the empty language Φ, the
language {Є} and the language {a} respectively. The Є-NFAs accepting these
languages can be constructed as follows:
- For Φ: a start state and an accepting state with no transition between them.
- For Є: a start state with an Є-transition to the accepting state.
- For a: a start state with a transition labeled a to the accepting state.
This forms the basis step.

Inductive step:
Let r1 and r2 be regular expressions denoting the languages L(r1) and L(r2), and
assume (inductive hypothesis) that there are Є-NFAs M1 and M2 accepting L(r1) and
L(r2) respectively. The expression r is built from them with one of the operators
+ (union), . (concatenation) or * (closure).

Case I: r = r1 + r2 (union)
The Є-NFA for r has a new start state with Є-transitions to the start states of M1
and M2, and Є-transitions from the accepting states of M1 and M2 to a new accepting
state.
The language of this automaton is L(r1) U L(r2), which is also the language denoted
by the expression r1+r2.

Case II: r = r1.r2 (concatenation)
The Є-NFA for r connects the accepting state of M1 to the start state of M2 with an
Є-transition. The start state of M1 is the start state and the accepting state of M2
is the accepting state.
Here, every path from the start state to the accepting state goes first through the
automaton for r1, where it must follow a path labeled by a string in L(r1), and then
through the automaton for r2, where it follows a path labeled by a string in L(r2).
Thus, the language accepted by this automaton is L(r1).L(r2).

Case III: r = r1* (Kleene closure)
The Є-NFA for r has a new start state and a new accepting state, with Є-transitions
from the new start state to the start state of M1 and to the new accepting state, and
from the accepting state of M1 back to the start state of M1 and to the new accepting
state.
Clearly the language of this Є-NFA is L(r1*), as it accepts Є as well as the strings in
L(r1), L(r1)L(r1), L(r1)L(r1)L(r1) and so on, thus covering all strings in L(r1*).

Finally, for the regular expression (r), the automaton for r also serves as the automaton
for (r), since the parentheses do not change the language defined by the expression. This
completes the proof.
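The inductive construction in this proof translates almost directly into code. The following is a minimal sketch, not the notes' own implementation: the class `Fragment` and the functions `symbol`, `union`, `concat`, `star`, `closure` and `accepts` are hypothetical names, Є-transitions are labeled with `None`, and each sub-automaton is assumed to be used only once (fragments share state numbers, so the same fragment object should not be combined twice).

```python
from collections import defaultdict
from itertools import count

EPS = None        # label used for Є-transitions
_ids = count()    # source of fresh state numbers

class Fragment:
    """An Є-NFA with one start state and one accepting state."""
    def __init__(self):
        self.start = next(_ids)
        self.accept = next(_ids)
        self.delta = defaultdict(set)   # (state, label) -> set of next states

    def add(self, src, label, dst):
        self.delta[(src, label)].add(dst)

    def absorb(self, other):
        """Copy all transitions of a sub-automaton into this one."""
        for key, targets in other.delta.items():
            self.delta[key] |= targets

def symbol(a):
    """Basis: automaton for the single symbol a."""
    f = Fragment()
    f.add(f.start, a, f.accept)
    return f

def union(m1, m2):
    """Case I: new start and accepting states around M1 and M2."""
    f = Fragment()
    for m in (m1, m2):
        f.absorb(m)
        f.add(f.start, EPS, m.start)
        f.add(m.accept, EPS, f.accept)
    return f

def concat(m1, m2):
    """Case II: link the accepting state of M1 to the start state of M2."""
    f = Fragment()
    f.absorb(m1)
    f.absorb(m2)
    f.start, f.accept = m1.start, m2.accept
    f.add(m1.accept, EPS, m2.start)
    return f

def star(m):
    """Case III: allow zero or more passes through M."""
    f = Fragment()
    f.absorb(m)
    f.add(f.start, EPS, m.start)
    f.add(f.start, EPS, f.accept)   # accept Є without entering M
    f.add(m.accept, EPS, m.start)   # repeat M
    f.add(m.accept, EPS, f.accept)
    return f

def closure(nfa, states):
    """All states reachable from the given states by Є-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.delta.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, word):
    """Simulate the Є-NFA on word by tracking the set of current states."""
    current = closure(nfa, {nfa.start})
    for ch in word:
        moved = set()
        for s in current:
            moved |= nfa.delta.get((s, ch), set())
        current = closure(nfa, moved)
    return nfa.accept in current

def any_bit():
    return union(symbol("0"), symbol("1"))

# Є-NFA for (0+1)*1(0+1): strings whose second-to-last symbol is 1
nfa = concat(concat(star(any_bit()), symbol("1")), any_bit())
print(accepts(nfa, "0010"), accepts(nfa, "0001"))
```

The automaton built this way has many more states and Є-transitions than a hand-drawn one, which is expected: the construction favors a simple correctness argument over compactness.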

Examples
For regular expression (1+0) the Є-NFA is:

The Є-NFA for (0+1)*


Now, Є-NFA for whole regular expression (0+1)*1(0+1)

For regular expression (00+1)*10 the Є-NFA is as

For regular expression (0+1+10)

For Regular Expression (1+110)*0


For regular expression 1(01+10)*+0(11+10)*

Conversion from DFA to Regular Expression

Arden’s Theorem
Let p and q be regular expressions over the alphabet Σ. If p does not contain the
empty string, then r = q + rp has a unique solution, r = qp*.

Proof:
Here, r = q + rp ……………… (i)
Substituting r = q + rp on the right-hand side of relation (i):
r = q + (q + rp)p
r = q + qp + rp^2 ……………… (ii)
Again substituting r = q + rp in relation (ii), we get
r = q + qp + (q + rp)p^2
r = q + qp + qp^2 + rp^3
Continuing in the same way, we get
r = q + qp + qp^2 + qp^3 + ………………
r = q(Є + p + p^2 + p^3 + ………………)
Thus r = qp*. Proved.
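Arden’s theorem can be sanity-checked on finite approximations of the languages involved. The sketch below assumes languages are Python sets of strings truncated at a maximum length `MAX_LEN`, and the sample choices of q and p are arbitrary illustrations. It checks that r = qp* satisfies r = q + rp, and also shows why the condition on p matters: when p contains Є, a different language also satisfies the equation, so the solution is no longer unique.

```python
from itertools import product

MAX_LEN = 8   # all languages are truncated to strings of at most this length

def concat(l1, l2):
    """Concatenation, keeping only strings that fit within MAX_LEN."""
    return {x + y for x in l1 for y in l2 if len(x + y) <= MAX_LEN}

def star(lang):
    """Kleene closure restricted to strings of length <= MAX_LEN."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = concat(frontier, lang) - result
        result |= frontier
    return result

# Sample languages (hypothetical): p does not contain the empty string
q = {"", "0"}
p = {"01", "10"}

r = concat(q, star(p))      # candidate solution r = qp*
rhs = q | concat(r, p)      # right-hand side q + rp
print(r == rhs)

# If p contains Є, the solution is not unique: with q = {0} and p = {Є, 1},
# the set of all strings also satisfies r = q + rp, although it differs from qp*.
everything = {"".join(t) for n in range(MAX_LEN + 1) for t in product("01", repeat=n)}
p_bad = {"", "1"}
print(everything == {"0"} | concat(everything, p_bad))
print(everything == concat({"0"}, star(p_bad)))
```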

Use of Arden’s rule to find the regular expression for DFA:

To convert a given DFA into a regular expression, we make the following
assumptions about the transition system:
- The transition diagram does not have Є-transitions.
- There is only one initial state.
- The vertices or states of the DFA are
  q1, q2, ……………, qn (any qi may be a final state), with q1 the initial state.
- wij denotes the regular expression representing the set of labels of the
  edges from qi to qj. Thus we can write the equations
q1 = q1 w11 + q2 w21 + q3 w31 + ……………… + qn wn1 + Є
q2 = q1 w12 + q2 w22 + q3 w32 + ……………… + qn wn2
q3 = q1 w13 + q2 w23 + q3 w33 + ……………… + qn wn3
…………………………………………………
…………………………………………………
…………………………………………………
qn = q1 w1n + q2 w2n + q3 w3n + ……………… + qn wnn
Solving these equations for the final states qi in terms of the wij gives the regular
expression.

Examples: Convert the following DFAs into regular expressions.

Let the equations be

q1 = q2 1 + q3 0 + Є ………. (i)
q2 = q1 0 ………………… (ii)
q3 = q1 1 ………………… (iii)
q4 = q2 0 + q3 1 + q4 0 + q4 1 …… (iv)
Putting the values of q2 and q3 in (i):
q1 = q1 01 + q1 10 + Є
i.e. q1 = q1 (01+10) + Є
i.e. q1 = Є + q1 (01+10) (of the form r = q + rp)
i.e. q1 = Є(01+10)* (using Arden’s rule)
Since q1 is the final state, the final regular expression for the DFA
is Є(01+10)* = (01+10)*

Configure the regular expression for the following:

Let the equations be:

q1 = q1 0 + q4 0 + Є …………………. (i)
q2 = q1 1 + q2 1 + q4 1 ………………… (ii)
q3 = q2 0 …………………………… (iii)
q4 = q3 1 …………………………… (iv)
Putting the value of q3 in (iv):
q4 = q2 01 ……………………. (v)
Now, putting this value of q4 in equation (ii):
q2 = q1 1 + q2 1 + q2 011
= q1 1 + q2 (1+011)
Now using Arden’s rule:
q2 = q1 1(1+011)*
Then from equation (v):
q4 = q1 1(1+011)*01 …………. (vi)
Now putting the value of q4 from (vi) in (i):
q1 = q1 0 + q1 1(1+011)*010 + Є
= q1 (0 + 1(1+011)*010) + Є
Using Arden’s rule we get
q1 = Є(0 + 1(1+011)*010)* = (0 + 1(1+011)*010)*
Putting the value of q1 in equation (vi):
q4 = (0 + 1(1+011)*010)* 1(1+011)*01
Since q4 is the final state of the DFA, this is the equivalent regular expression for the DFA.

Given the following NFA, configure the equivalent regular expression.

Now, the equations are:

q1 = q1 0 + q1 1 + Є ………… (i)
q2 = q1 1 …………………… (ii)
q3 = q2 0 + q2 1 ………………. (iii)
q4 = q3 0 + q3 1 ………………. (iv)
From equation (i), we have
q1 = q1 (0+1) + Є
q1 = Є + q1 (0+1)
Using Arden’s rule,
q1 = Є(0+1)* = (0+1)*
Now from (ii):
q2 = (0+1)*1
Similarly from (iii):
q3 = q2 (0+1) = (0+1)*1(0+1)
And from (iv):
q4 = q3 (0+1) = (0+1)*1(0+1)(0+1)
Since q3 and q4 are the final states, the final regular expression is
q3 + q4 = (0+1)*1(0+1) + (0+1)*1(0+1)(0+1)

Configure the regular expression for the following.

Let the equations be:

q1 = q2 0 + Є ……… (i)
q2 = q1 0 …………… (ii)
q3 = q1 1 + q5 1 …………. (iii)
q4 = q2 1 + q3 0 + q4 0 + q5 0 ……… (iv)
q5 = q3 1 ………………………….. (v)
Here, putting the value of q2 in (i):
q1 = q1 00 + Є
i.e. q1 = Є + q1 00
Now, using Arden’s rule:
q1 = Є(00)* = (00)*
Putting q3 in q5:
q5 = (q1 1 + q5 1)1
= ((00)*1 + q5 1)1
= (00)*11 + q5 11
Using Arden’s rule:
q5 = (00)*11(11)*
Since q1 and q5 are the final states of the DFA, the final regular expression
is q1 + q5 = (00)* + (00)*11(11)*

Pumping lemma for regular languages:

Any regular language can be represented by each of the formalisms DFA, NFA and regular
expression.

If L is any finite language, then L is regular. Why?

Because, for example, we could construct an NFA with |L| paths from the start state to
an accepting state, each labeled by a different string in L.
But when L is an infinite language, it must contain arbitrarily long strings. Our
intuition tells us that, in general, the longer a string x is, the more memory/states it
will take to determine whether x ∈ L. Since DFAs have only a constant amount of
memory/states, they would not be able to process long strings unless the strings of L had
some kind of repeated pattern. Similarly, any regular expression that
generates L must have a pattern enclosed in a *.
This leads us to suspect that any infinite regular language must contain long
strings that have some type of simple repetitive pattern in them. This fact is captured by
the “Pumping Lemma”.

Statement: Let L be a regular language. Then there exists an integer constant n such that
for any x ∈ L with |x| ≥ n, there are strings u, v, w such that x = uvw, |uv| ≤ n, |v| > 0,
and uv^k w ∈ L for all k ≥ 0.
Proof:
Suppose L is a regular language; then L is accepted by some DFA M. Let M have n
states. Consider any string x accepted by M of length n or greater (such strings exist
whenever L is infinite). Let the length of x be |x| = m, where m ≥ n.
Now suppose
x = a1a2a3………………am, where each ai ∈ Σ is an input symbol to M.
For j = 0, 1, …………, m, let qj be the state of M after reading a1a2…aj. Then,
δ̂(q0, x) = δ̂(q0, a1a2………..am) [q0 being the start state of M]
= δ̂(q1, a2………am)
…………………
…………………
…………………
= δ̂(qm, Є) = qm [qm being a final state]
Since M has only n states, by the pigeonhole principle the n+1 states q0, q1, ………, qn
cannot all be distinct, so there exist some i and j with
0 ≤ i < j ≤ n such that qi = qj.

Now, the entire string x = a1…………..am can be broken as
x = uvw such that
u = a1a2…………..ai
v = ai+1……………aj
w = aj+1……………am
Here |v| = j - i > 0 and |uv| = j ≤ n.
The string ai+1………………aj takes M from state qi back to itself since qi = qj. So we can
say M accepts a1a2…………ai (ai+1…………aj)^k aj+1……………am for all k ≥ 0.
Hence, uv^k w ∈ L for all k ≥ 0.
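The pigeonhole argument in this proof is constructive, which a few lines of code make concrete. The sketch below is only illustrative: the two-state DFA (accepting strings over {0, 1} with an even number of 0's) and the names `DELTA`, `ACCEPT`, `run` and `pump_split` are assumptions of this sketch. `pump_split` follows the path of a word and cuts it at the first repeated state, which yields u, v, w with |uv| ≤ n and |v| > 0.

```python
def run(delta, start, word):
    """Return the state reached from start after reading word."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state

def pump_split(delta, start, word):
    """Split word into (u, v, w), where v labels the first loop on its path."""
    seen = {start: 0}   # state -> number of symbols read when first visited
    state = start
    for pos, ch in enumerate(word, start=1):
        state = delta[(state, ch)]
        if state in seen:
            i = seen[state]
            return word[:i], word[i:pos], word[pos:]
        seen[state] = pos
    return None   # no state repeats: word is shorter than the number of states

# Hypothetical DFA with n = 2 states: even number of 0's
DELTA = {
    ("even", "0"): "odd", ("even", "1"): "even",
    ("odd", "0"): "even", ("odd", "1"): "odd",
}
ACCEPT = {"even"}

u, v, w = pump_split(DELTA, "even", "0110")
print(u, v, w)
print([run(DELTA, "even", u + v * k + w) in ACCEPT for k in range(5)])
```

Every pumped string u v^k w is accepted because the loop labeled v returns the DFA to the same state, exactly as in the proof.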


Application of Pumping Lemma:

To prove that a language is not a regular language.

For example:
Show that the language L = {0^n 1^n | n ≥ 0} is not a regular language.
=> Suppose L is a regular language. Then by the pumping lemma, every sufficiently long
string x ∈ L can be written as x = uvw with |v| ≥ 1 such that uv^k w ∈ L for all k ≥ 0.
Consider the possible forms of v.
Case I:
Let v contain 0’s only. Then suppose u = 0^p, v = 0^q, w = 0^r 1^s, where p+q+r = s
(as x is of the form 0^n 1^n) and q > 0.
Now, uv^k w = 0^p (0^q)^k 0^r 1^s = 0^(p+qk+r) 1^s
The string 0^(p+qk+r) 1^s belongs to L only for k = 1, otherwise not.

Case II:
Let v contain 1’s only. Then u = 0^p 1^q, v = 1^r, w = 1^s,
where p = q+r+s and r > 0.
Now, uv^k w = 0^p 1^q (1^r)^k 1^s = 0^p 1^(q+rk+s)
The string 0^p 1^(q+rk+s) belongs to L only for k = 1, otherwise not.

Case III:
Let v contain both 0’s and 1’s. Then suppose
u = 0^p, v = 0^q 1^r, w = 1^s,
where p+q = r+s, q > 0 and r > 0.
Now, uv^k w = 0^p (0^q 1^r)^k 1^s
This string belongs to L for k = 1, but for k > 1 it contains a 0 after a 1, so it does
not belong to L.
In every case, pumping v produces strings that are not in L, which contradicts the
pumping lemma. Thus, the language L is not a regular language.
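The same conclusion can be explored by brute force for small values of the pumping constant. The sketch below is illustrative only: `in_language` and `survives_pumping` are hypothetical helper names, and checking k = 0..3 is an arbitrary cutoff that is already enough to reject every split. It uses the condition |uv| ≤ n from the statement of the lemma, under which v can only consist of 0's when x = 0^n 1^n.

```python
def in_language(s):
    """Membership test for L = {0^n 1^n | n >= 0}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half

def survives_pumping(x, n, max_k=3):
    """True if some split x = uvw with |uv| <= n and |v| > 0
    keeps u v^k w inside L for every k in 0..max_k."""
    for i in range(n + 1):               # i = |u|
        for j in range(i + 1, n + 1):    # j = |uv|
            u, v, w = x[:i], x[i:j], x[j:]
            if all(in_language(u + v * k + w) for k in range(max_k + 1)):
                return True
    return False

for n in range(1, 8):
    x = "0" * n + "1" * n
    print(n, survives_pumping(x, n))
```

For each candidate constant n the string 0^n 1^n admits no valid split, so no n can serve as the pumping constant for L.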

Minimization of DFA
Given a DFA M that accepts a language L(M), minimization constructs a DFA M' with the
minimum number of states that accepts the same language. The process involves
identifying the equivalent states and the distinguishable states.

Equivalent states: Two states p & q are called equivalent states, denoted by p ≡ q, if and
only if for each input string x, δ̂(p, x) is a final state if and only if δ̂(q, x) is a final
state.

Distinguishable states: Two states p & q are said to be distinguishable states if
there exists a string x such that exactly one of δ̂(p, x) and δ̂(q, x) is a final state.


For minimization, the table filling algorithm is used. It identifies the distinguishable
pairs (p, q) with p ≠ q as follows:
- List all the pairs of states (p, q) for which p ≠ q.
- Make a sequence of passes through the pairs.
- On the first pass, mark each pair in which exactly one element is a final state (in F).
- On each subsequent pass, mark the pair (r, s) if for some a ∈ Σ, δ(r, a) = p
  and δ(s, a) = q and (p, q) is already marked.
- After a pass in which no new pairs are marked, stop.
- The marked pairs (p, q) are those for which p ≢ q, and the unmarked pairs are
  those for which p ≡ q.
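The steps above can be sketched as a short program. This is a minimal illustration, not the notes' own code: the function name `table_filling` and the four-state DFA (in which C duplicates A and D duplicates B) are assumptions made for the example. Pairs are stored as `frozenset` objects so that (p, q) and (q, p) are the same pair.

```python
from itertools import combinations

def table_filling(states, alphabet, delta, finals):
    """Return the unmarked (equivalent) pairs of states of a DFA."""
    pairs = [frozenset(p) for p in combinations(states, 2)]
    # First pass: mark pairs in which exactly one state is final.
    marked = {pair for pair in pairs
              if len(pair & set(finals)) == 1}
    changed = True
    while changed:                       # repeat passes until nothing new is marked
        changed = False
        for pair in pairs:
            if pair in marked:
                continue
            r, s = tuple(pair)
            for a in alphabet:
                successors = frozenset((delta[(r, a)], delta[(s, a)]))
                if len(successors) == 2 and successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return [tuple(sorted(pair)) for pair in pairs if pair not in marked]

# Hypothetical DFA: strings ending in 1, with redundant copies of each state
STATES = ["A", "B", "C", "D"]
ALPHABET = ["0", "1"]
DELTA = {
    ("A", "0"): "C", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "D",
    ("C", "0"): "A", ("C", "1"): "D",
    ("D", "0"): "C", ("D", "1"): "B",
}
FINALS = {"B", "D"}

print(table_filling(STATES, ALPHABET, DELTA, FINALS))
```

Merging each group of equivalent states into a single state then gives the minimized DFA.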
Example:

To solve this problem, we first determine whether each pair is
distinguishable or not.

For pair (b, a)


(δ(b, 0 ), δ(a, 0)) = (g, h) – unmarked
(δ(b, 1), δ(a, 1)) = (c, f) – marked
For pair (d, a)
(δ(d, 0), δ(a, 0)) = (c, b) – marked
Therefore (d, a) is distinguishable.
For pair (e, a)
(δ(e, 0), δ(a, 0)) = (h, h) – unmarked.


(δ(e, 1), δ(a, 1)) = (f, f) – unmarked.
[(e, a) is not distinguishable]
For pair (g, a)
(δ(g, 0), δ( a, 0)) = (a, g) – unmarked.
(δ(g, 1), δ(a, 1)) = (e, f) – unmarked
For pair (h, a)
(δ(h, 0), δ(a, 0)) = (g, h) –unmarked
(δ(h, 1), δ(a, 1)) = (c, f) – marked
Therefore (h, a) is distinguishable.
For pair (d, b)
(δ(d, 0), δ(b,0)) = (c, g) – marked
Therefore (d, b) is distinguishable.
For pair (e, b)
(δ(e, 0), δ(b,0)) = (h, g) –unmarked
(δ(e, 1), δ(b, 1)) = (f, c) – marked.
For pair (f, b)
(δ(f, 0), δ(b,0)) = (c, g) – marked
For pair (g, b)
(δ(g, 0), δ(b, 0)) = (g, g) – unmarked
(δ(g, 1), δ(b, 1)) = (e, c) – marked
For pair (h, b)
(δ(h, 0), δ(b, 0)) = (g, g) – unmarked
(δ(h,1), δ(b,1)) = (c, c) - unmarked.
For pair (e, d)
(δ(e, 0), δ(d, 0)) = (h, c) – marked
(e, d) is distinguishable.
For pair (f, d)
(δ(f, 0), δ(d, 0)) = (c, c) – unmarked
(δ(f, 1), δ(d, 1)) = (g, g) – unmarked.
For pair (g, d)
(δ(g, 0), δ(d, 0)) = (g, c) – marked
For pair (h, d)
(δ(h, 0), δ(d, 0)) = (g, c) – marked
For pair (f, e)
(δ(f, 0), δ(e, 0)) = (c, h) – marked
For pair (g, e)
(δ(g, 0), δ(e, 0)) = (g, h) – unmarked
(δ(g,1), δ(e,1)) = (e, f) -marked.
For pair (h, e)
(δ(h, 0), δ(e, 0)) = (g, h) – unmarked
(δ(h,1), δ(e,1)) = (c, f) -marked.
For pair (g, f)
(δ(g, 0), δ(f, 0)) = (g, c) – marked
For pair (h, f)
(δ(h, 0), δ(f, 0)) = (g, c) – marked
For pair (h, g)


(δ(h, 0), δ(g, 0)) = (g, g) – unmarked


(δ(h,1), δ(g,1)) = (c, e) -marked.
Thus (a, e), (b, h) and (d, f) are equivalent pairs of states.

Hence the minimized DFA is

Minimize the following DFA:

Another simple approach.


- Maintain a partition of the states of the DFA. Initially the partition consists of two
  groups: the accepting states and the non-accepting states.
- The fundamental step is to take some group of states, say A = {s1, s2, s3,
  ………., sk}, and some input symbol a, and look at the transitions that states s1,
  s2, s3, ………., sk have on input symbol a. If these transitions lead to states that
  fall into two or more different groups of the current partition, then we must
  split A so that the transitions from each subset of A are all confined to a single
  group of the current partition. Suppose, for example, that s1 and s2 go to states
  t1 and t2 on input ‘a’, and t1 and t2 are in different groups of the partition.
  Then we must split A into at least two subsets so that one subset contains s1
  and the other s2. Note that t1 and t2 are distinguished by some string w, so s1 and
  s2 are distinguished by the string aw.
- We repeat this process of splitting groups in the current partition until no
  more groups need to be split.
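The splitting procedure can be sketched as follows. This is a hedged illustration: the function name `refine` is hypothetical, and the transition table is an assumed five-state DFA over {a, b} with accepting state E, chosen so that D goes to E on b, B goes to D on b, and A and C have identical transitions. Each state is given a signature (the groups its transitions lead to), and a group is split whenever its members have different signatures.

```python
def refine(states, alphabet, delta, finals):
    """Split the partition {finals, non-finals} until no group needs splitting."""
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        # Index of the group each state currently belongs to.
        group_of = {s: idx for idx, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            buckets = {}
            for s in group:
                # Groups reached from s on each input symbol.
                signature = tuple(group_of[delta[(s, a)]] for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # nothing was split
            return new_partition
        partition = new_partition

# Assumed transition table for illustration (accepting state E)
STATES = ["A", "B", "C", "D", "E"]
ALPHABET = ["a", "b"]
DELTA = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}

groups = refine(STATES, ALPHABET, DELTA, {"E"})
print(sorted(sorted(g) for g in groups))
```

Each group of the final partition becomes one state of the minimized DFA.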
Let us consider the following example


The initial partition consists of:

{A, B, C, D} {E}
= {A, B, C} {D} {E} [since D goes to E on input b]
= {A, C} {B} {D} {E} [since B goes to D on input b]
This is the final partition, as A and C have the same transitions on a and b.

So, the minimized DFA is;

Minimize the following DFA:

{S0, S1, S2, S3} {S4}
{S0, S1} {S2, S3} {S4}

So, the final DFA is:
