Regular Expressions (2)
Regular Expression
Regular expressions are algebraic expressions used to represent regular languages,
the languages accepted by finite automata. Regular expressions offer a declarative
way to express the strings we want to accept. This is what regular expressions
offer that automata do not.
A regular expression is built up out of simpler regular expressions using a set of defining
rules. Each regular expression ‘r’ denotes a language L(r). The defining rules specify how
L(r) is formed by combining, in various ways, the languages denoted by the
sub-expressions of ‘r’.
Let Σ be an alphabet. The regular expressions over the alphabet Σ are defined inductively
as follows;
- Φ is a regular expression representing the empty language.
- Є is a regular expression representing the language {Є}, which contains only the empty string.
- If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression representing the language {a}.
- If ‘r’ and ‘s’ are regular expressions representing the languages L(r) and L(s), then
  - r ∪ s is a regular expression denoting the language L(r) ∪ L(s).
  - r.s is a regular expression denoting the language L(r).L(s).
  - r* is a regular expression denoting the language (L(r))*.
  - (r) is a regular expression denoting the language L(r) [parentheses may be placed
    around regular expressions if we desire].
Regular operators:
Basically, there are three operators that are used to generate the languages that are
regular,
-Union (∪ / |):
If L1 and L2 are any two regular languages then
L1 ∪ L2 = {s | s ∈ L1 or s ∈ L2}
e.g.
L1 = {00, 11}, L2 = {Є, 10}
L1 ∪ L2 = {Є, 00, 11, 10}
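-Concatenation (.):
If L1 and L2 are any two regular languages then
L1.L2 = {xy | x ∈ L1 and y ∈ L2}
-Kleene closure (*):
L* = ∪_{i≥0} L^i = L^0 ∪ L^1 ∪ L^2 ∪ …, where L^0 = {Є}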
Regular Expressions Unit-2
L+ = ∪_{i≥1} L^i = L* − L^0 (positive closure; the last equality holds when Є ∉ L)
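The three operators can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original notes: languages are modelled as sets of strings, concatenation is taken as {xy | x ∈ L1, y ∈ L2}, and the helper names (`union`, `concat`, `power`, `star_upto`) are illustrative. Since L* is infinite in general, `star_upto` only builds the finite approximation L^0 ∪ L^1 ∪ … ∪ L^k.

```python
from itertools import product

EPS = ""  # the empty string Є


def union(l1, l2):
    """L1 U L2: strings that are in L1 or in L2."""
    return l1 | l2


def concat(l1, l2):
    """L1.L2: every string of L1 followed by every string of L2."""
    return {x + y for x, y in product(l1, l2)}


def power(lang, i):
    """L^i: L concatenated with itself i times, with L^0 = {Є}."""
    result = {EPS}
    for _ in range(i):
        result = concat(result, lang)
    return result


def star_upto(lang, k):
    """Finite approximation of L*: L^0 U L^1 U ... U L^k."""
    result = set()
    for i in range(k + 1):
        result |= power(lang, i)
    return result


L1 = {"00", "11"}
L2 = {EPS, "10"}
print(sorted(union(L1, L2)))         # ['', '00', '10', '11']
print(sorted(concat(L1, L2)))        # ['00', '0010', '11', '1110']
print(sorted(star_upto({"0"}, 3)))   # ['', '0', '00', '000']
```

The first printed set is the union example from the text, with Є shown as the empty string.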
Precedence of regular operators:
Regular language:
Let Σ be an alphabet. The class of regular languages over Σ is defined inductively as;
- Φ is a regular language, the empty language.
- {Є} is a regular language, the language containing only the empty string.
- For each a ∈ Σ, {a} is a regular language.
- If L1, L2, …, Ln are regular languages, then so is L1 ∪ L2 ∪ … ∪ Ln.
- If L1, L2, L3, …, Ln are regular languages, then so is L1.L2.L3…Ln.
- If L is a regular language, then so is L*.
Each of these rules is called a regular construction, and by definition the regular
languages are exactly those languages that are generated by regular constructions.
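Theorem 1
If L, M and N are any languages, then L(M ∪ N) = LM ∪ LN.
Proof:
Let w = xy be a string in L(M ∪ N), with x ∈ L and y ∈ M ∪ N. To prove the theorem it
is sufficient to show that w ∈ LM ∪ LN, and that every string of LM ∪ LN is in L(M ∪ N).
Since y ∈ M or y ∈ N, we have w = xy ∈ LM or w = xy ∈ LN, so w ∈ LM ∪ LN. Conversely, a
string in LM ∪ LN has the form xy with x ∈ L and y in M or in N, so y ∈ M ∪ N and
xy ∈ L(M ∪ N).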
Theorem 2
Every language defined by a regular expression is also defined by a finite
automaton. [For any regular expression r, there is an Є-NFA that accepts the same
language represented by r].
Proof:
Let L = L(r) be the language of the regular expression r. We have to show that there
is an Є-NFA E such that L(E) = L.
The proof can be done through structural induction on r, following the recursive
definition of regular expressions.
For the basis, we know that Φ, Є and a are the regular expressions representing the
empty language Φ, the language {Є} of the empty string, and the language {a}
respectively. The Є-NFA accepting these languages can be constructed as;
Case I: r = r1 + r2 (union)
The language of this automaton is L(r1) ∪ L(r2), which is also the language represented
by the expression r1 + r2.
Case II: r = r1.r2 (concatenation)
Here, the path from the starting state to the accepting state goes first through the
automaton for r1, where it must follow a path labeled by a string in L(r1), and then
through the automaton for r2, where it follows a path labeled by a string in L(r2).
Thus, the language accepted by the above automaton is L(r1).L(r2).
Case III: r = r1* (Kleene closure)
Now, the Є-NFA for r = r1* can be constructed as;
Clearly the language of this Є-NFA is L(r1*), as it can accept just Є as well as strings
in L(r1), L(r1)L(r1), L(r1)L(r1)L(r1) and so on, thus covering all strings in L(r1*).
Finally, for regular expression (r), the automaton for r also serves as the automaton for
(r), since the parentheses do not change the language defined by the expression. This
completes the proof.
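The inductive construction in this proof can be sketched in Python. This is only an illustration under stated assumptions, not the construction drawn in the original figures: regular expressions are given as nested tuples (a hypothetical encoding), every sub-expression gets a fresh start state and a fresh accepting state joined by Є-transitions, and the names `Builder`, `eclose` and `accepts` are made up for this example.

```python
from collections import defaultdict
from itertools import count

EPS = None  # label used for Є-transitions


class Builder:
    """Builds an Є-NFA with one start and one accepting state per sub-expression."""

    def __init__(self):
        self.delta = defaultdict(set)  # (state, label) -> set of next states
        self._ids = count()

    def build(self, r):
        """Return (start, accept) for the regular expression tree r."""
        kind = r[0]
        s, f = next(self._ids), next(self._ids)
        if kind == "empty":            # Φ: no path from s to f
            pass
        elif kind == "eps":            # Є
            self.delta[(s, EPS)].add(f)
        elif kind == "sym":            # a single symbol a
            self.delta[(s, r[1])].add(f)
        elif kind == "union":          # r1 + r2
            for sub in r[1:]:
                s1, f1 = self.build(sub)
                self.delta[(s, EPS)].add(s1)
                self.delta[(f1, EPS)].add(f)
        elif kind == "concat":         # r1 . r2
            s1, f1 = self.build(r[1])
            s2, f2 = self.build(r[2])
            self.delta[(s, EPS)].add(s1)
            self.delta[(f1, EPS)].add(s2)
            self.delta[(f2, EPS)].add(f)
        elif kind == "star":           # r1*
            s1, f1 = self.build(r[1])
            self.delta[(s, EPS)].update({s1, f})   # enter r1 or skip it
            self.delta[(f1, EPS)].update({s1, f})  # repeat r1 or leave
        else:
            raise ValueError(f"unknown node {kind!r}")
        return s, f


def eclose(delta, states):
    """Є-closure of a set of states."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(r, w):
    """Simulate the Є-NFA built for r on the input string w."""
    b = Builder()
    start, accept = b.build(r)
    current = eclose(b.delta, {start})
    for a in w:
        moved = set()
        for q in current:
            moved |= b.delta.get((q, a), set())
        current = eclose(b.delta, moved)
    return accept in current


# (0+1)*1(0+1): strings whose second-to-last symbol is 1
ZERO_OR_ONE = ("union", ("sym", "0"), ("sym", "1"))
R_EXAMPLE = ("concat", ("concat", ("star", ZERO_OR_ONE), ("sym", "1")), ZERO_OR_ONE)
print(accepts(R_EXAMPLE, "0110"))  # True
print(accepts(R_EXAMPLE, "0101"))  # False
```

Each branch of `build` corresponds to one case of the induction, so the code mirrors the structure of the proof rather than aiming for the smallest automaton.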
Examples
For regular expression (1+0) the Є-NFA is:
Arden’s Theorem
Let p and q be regular expressions over the alphabet Σ. If L(p) does not contain the
empty string, then r = q + rp has a unique solution, r = qp*.
Proof:
Here, r = q + rp ……………… (i)
Let us put the value r = q + rp on the right hand side of relation (i), so;
r = q + (q + rp)p
r = q + qp + rp^2 ……………… (ii)
Again putting the value r = q + rp in relation (ii), we get;
r = q + qp + (q + rp)p^2
r = q + qp + qp^2 + rp^3
Continuing in the same way, we will get;
r = q + qp + qp^2 + qp^3 + …
r = q(Є + p + p^2 + p^3 + …)
Thus r = qp*. Proved.
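Arden's theorem can be sanity-checked on small languages. The sketch below is an illustration only: it truncates every language to strings of length at most N, and the sets Q and P are hypothetical stand-ins for L(q) and L(p), with Є not in P. It checks that r = qp* satisfies r = q + rp, and that repeated substitution starting from the empty language reaches the same r.

```python
def concat(l1, l2, n):
    """Concatenation L1.L2, keeping only strings of length <= n."""
    return {x + y for x in l1 for y in l2 if len(x + y) <= n}


def star(lang, n):
    """All strings of L* with length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, lang, n) - result
        result |= frontier
    return result


def solve_by_iteration(q, p, n):
    """Iterate r <- q U r.p from the empty language until nothing changes."""
    r = set()
    while True:
        nxt = q | concat(r, p, n)
        if nxt == r:
            return r
        r = nxt


N = 6
Q = {"a"}          # hypothetical language of q
P = {"b", "cc"}    # hypothetical language of p, does not contain Є

R = concat(Q, star(P, N), N)             # the claimed solution r = qp*
assert R == Q | concat(R, P, N)          # r satisfies r = q + rp
assert R == solve_by_iteration(Q, P, N)  # repeated substitution gives the same r
```

A bounded check like this is not a proof, but it makes the substitution steps of the proof concrete.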
To convert a given DFA into a regular expression, here are some of the
assumptions regarding the transition system:
- The transition diagram should not have Є-transitions.
- There must be only one initial state, q1.
- The vertices or the states in the DFA are
  q1, q2, …, qn (any qi may be a final state).
- wij denotes the regular expression representing the set of labels of the
  edges from qi to qj. Thus we can write the equations as;
q1 = q1w11 + q2w21 + q3w31 + … + qnwn1 + Є
q2 = q1w12 + q2w22 + q3w32 + … + qnwn2
q3 = q1w13 + q2w23 + q3w33 + … + qnwn3
…
qn = q1w1n + q2w2n + q3w3n + … + qnwnn
Solving these equations for the final state(s) qi in terms of the wij gives the regular
expression.
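As a worked illustration, consider a hypothetical two-state DFA over {0, 1} that is not taken from the original notes: q1 is both the initial and the only final state, with δ(q1, 0) = q1, δ(q1, 1) = q2, δ(q2, 0) = q2 and δ(q2, 1) = q1, so it accepts the strings with an even number of 1's. Writing one equation per state from its incoming edges and applying Arden's theorem (r = q + rp gives r = qp*) twice yields:

```latex
\begin{align*}
q_1 &= q_1 0 + q_2 1 + \epsilon \\
q_2 &= q_1 1 + q_2 0 \\
q_2 &= q_1 1 0^{*} && \text{(Arden, with } q = q_1 1,\ p = 0\text{)} \\
q_1 &= q_1 0 + q_1 1 0^{*} 1 + \epsilon = \epsilon + q_1 (0 + 1 0^{*} 1) \\
q_1 &= \epsilon\,(0 + 1 0^{*} 1)^{*} = (0 + 1 0^{*} 1)^{*} && \text{(Arden again)}
\end{align*}
```

Since q1 is the final state, the regular expression for this DFA is (0 + 10*1)*.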
q3+q4 = (0+1)*1(0+1)+(0+1)*1(0+1)(0+1)
Any regular language can be represented by each of the formalisms: DFA, NFA or regular
expression.
Every finite language L is regular because, for example, we could produce an NFA with
|L| paths from the start state to an accepting state, each labeled by a different
string in L.
But when L is an infinite language, it must contain arbitrarily long strings. Our
intuition would tell us that, in general, the longer a string x is, the more memory/states
it will take to determine whether x ∈ L. Since DFAs have only a constant amount of
memory/states, they would not be able to process long strings unless the strings of L had
some kind of repeated pattern. Similarly, any regular expression that
generates L must have a pattern enclosed in a *.
This leads us to suspect that any infinite regular language must contain long
strings that have some type of simple repetitive pattern in them. This fact is captured by
the “Pumping Lemma”.
Statement: Let L be a regular language. Then there exists an integer constant n such that
for any x ∈ L with |x| ≥ n, there are strings u, v, w such that x = uvw, |uv| ≤ n, |v| > 0,
and uv^k w ∈ L for all k ≥ 0.
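Proof:
Suppose L is a regular language. Then L is accepted by some DFA M. Let M have n
states. If L contains no string of length n or greater, the statement holds trivially, so
let x ∈ L be a string of length |x| = m, where m ≥ n.
Now suppose
x = a1a2a3…am, where each ai ∈ Σ is an input symbol to M.
For j = 1, …, m, let qj = δ*(q0, a1a2…aj) be the state of M after reading the first j
symbols, where q0 is the start state of M and δ* is the extended transition function.
Then,
δ*(q0, x) = δ*(q0, a1a2…am)
= δ*(q1, a2…am)
…
= δ*(qm, Є) = qm [qm being a final state]
Since m ≥ n and M has only n states, by the pigeonhole principle the n + 1 states
q0, q1, …, qn cannot all be distinct, so there exist some i and j with
0 ≤ i < j ≤ n such that qi = qj
So, let u = a1…ai, v = ai+1…aj and w = aj+1…am. Then x = uvw, |uv| = j ≤ n and
|v| = j − i > 0. Reading v takes M from qi back to qj = qi, so v can be repeated any
number of times, or omitted, without changing the state reached before w is read. Hence
δ*(q0, uv^k w) = qm is a final state, and uv^k w ∈ L for all k ≥ 0.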
For example:
Show that the language L = {0^n 1^n | n ≥ 0} is not a regular language.
=> Suppose L is a regular language. Then by the pumping lemma, there are strings u, v, w
with |v| ≥ 1 such that uv^k w ∈ L for all k ≥ 0.
Case I:
Let v contain 0’s only. Then, suppose u = 0^p, v = 0^q, w = 0^r 1^s; p + q + r = s (as we
have 0^n 1^n) and q > 0.
Now, uv^k w = 0^p (0^q)^k 0^r 1^s = 0^(p+qk+r) 1^s
Of these strings 0^(p+qk+r) 1^s, only the one for k = 1 belongs to L; the others do not.
Case II
Let v contain 1’s only. Then u = 0^p 1^q, v = 1^r, w = 1^s
Then p = q + r + s and r > 0
Now, uv^k w = 0^p 1^q (1^r)^k 1^s = 0^p 1^(q+rk+s)
Of these strings 0^p 1^(q+rk+s), only the one for k = 1 belongs to L; the others do not.
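Case III
Let v contain both 0’s and 1’s, say v = 0^q 1^r with q > 0 and r > 0. Then uv^2 w contains
a 1 followed by a 0, so it is not of the form 0^n 1^n and does not belong to L.
In every case uv^k w ∉ L for some k, which contradicts the pumping lemma. Hence L is not
a regular language.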
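The argument can also be explored by brute force. The following Python sketch is an illustration under stated assumptions: the function names are made up, and pumping is only checked for k = 0, …, max_k, so a split that survives the check is merely a candidate. For x = 0^n 1^n every split with |uv| ≤ n and |v| > 0 fails, while a string of the regular language (01)* does have a split that pumps.

```python
def in_0n1n(s):
    """Membership test for L = {0^n 1^n | n >= 0}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half


def in_01_star(s):
    """Membership test for the regular language (01)*."""
    return len(s) % 2 == 0 and s == "01" * (len(s) // 2)


def pumpable_splits(x, n, member, max_k=3):
    """Return every split x = uvw with |uv| <= n and |v| > 0 for which
    u v^k w stays in the language for k = 0..max_k (a finite check only)."""
    good = []
    limit = min(n, len(x))
    for i in range(limit):                  # i = |u|
        for j in range(i + 1, limit + 1):   # j = |uv|, so |v| = j - i > 0
            u, v, w = x[:i], x[i:j], x[j:]
            if all(member(u + v * k + w) for k in range(max_k + 1)):
                good.append((u, v, w))
    return good


# No admissible split of 0^n 1^n can be pumped, whatever n is tried.
for n in range(1, 6):
    x = "0" * n + "1" * n
    assert pumpable_splits(x, n, in_0n1n) == []

# For contrast, a string of (01)* does have a split that pumps.
print(pumpable_splits("0101", 2, in_01_star))  # [('', '01', '01')]
```

Because |uv| ≤ n forces v to lie inside the leading 0's of 0^n 1^n, only the first case of the proof can actually occur for this choice of x.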
Minimization of DFA
Given a DFA M that accepts a language L(M), we construct a DFA M′ that accepts the same
language with as few states as possible.
The course of minimization involves identifying the equivalent states and the
distinguishable states.
Equivalent states: Two states p & q are called equivalent states, denoted by p ≡ q, if and
only if for each input string x, δ*(p, x) is a final state if and only if δ*(q, x) is a
final state, where δ* is the extended transition function.
Distinguishable states: Two states p & q are said to be distinguishable states if there
exists a string x such that exactly one of δ*(p, x) and δ*(q, x) is a final state.
For minimization, the table filling algorithm is used. The steps of the algorithm
for identifying the distinguishable pairs (p, q) with p ≠ q are;
- List all the pairs of states for which p ≠ q.
- Make a sequence of passes through the pairs.
- On the first pass, mark each pair in which exactly one element is final (in F).
- On each subsequent pass, mark the pair (r, s) if for some a ∈ Σ, δ(r, a) = p
  and δ(s, a) = q and (p, q) is already marked.
- After a pass in which no new pairs are marked, stop.
- Then the marked pairs (p, q) are those for which p ≢ q (distinguishable), and the
  unmarked pairs are those for which p ≡ q.
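The steps above can be sketched in Python as follows. This is a minimal illustration, not the worked example from the original notes: the function name `distinguishable_pairs` and the three-state DFA (which accepts the strings ending in 1) are assumptions made for the example, and the DFA is assumed to be complete, so that δ is defined for every state and symbol.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, finals):
    """Table-filling algorithm: return the set of marked (distinguishable) pairs."""
    pairs = [frozenset(pq) for pq in combinations(states, 2)]
    marked = set()
    # First pass: mark pairs in which exactly one state is final.
    for pair in pairs:
        p, q = tuple(pair)
        if (p in finals) != (q in finals):
            marked.add(pair)
    # Later passes: mark (r, s) if some symbol leads to an already marked pair.
    changed = True
    while changed:
        changed = False
        for pair in pairs:
            if pair in marked:
                continue
            r, s = tuple(pair)
            for a in alphabet:
                target = frozenset((delta[(r, a)], delta[(s, a)]))
                if target in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


# Hypothetical DFA: A is the start state, C is the only final state.
STATES = ["A", "B", "C"]
ALPHABET = ["0", "1"]
DELTA = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "B", ("B", "1"): "C",
    ("C", "0"): "B", ("C", "1"): "C",
}
FINALS = {"C"}

marked = distinguishable_pairs(STATES, ALPHABET, DELTA, FINALS)
unmarked = [set(p) for p in map(frozenset, combinations(STATES, 2)) if p not in marked]
print(unmarked)  # the only unmarked pair is A and B, so A ≡ B
```

Pairs are stored as frozensets so that (p, q) and (q, p) count as the same pair; when both transitions lead to the same state, the target set has a single element and is never marked.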
Example:
Now to solve this problem, first we should determine whether each pair of states is
distinguishable or not.