@vtucode - in Module 2 ATC 5th Semester 2021 Scheme
MODULE-2
Regular Expressions
Regular Expression (RE)
1. ø is a RE.
2. ε is a RE.
3. Every element in ∑ is a RE.
4. Given two REs α and β, αβ is a RE.
5. Given two REs α and β, α U β is a RE.
6. Given a RE α, α* is a RE.
7. Given a RE α, α+ is a RE.
8. Given a RE α, (α) is a RE.
ATC-Module-2- vtucode.in
L((a U b)*b) = (L(a) U L(b))*L(b)
= ({a} U {b})*{b}
= {a, b}*{b}
(a U b)*b is the set of all strings over the alphabet {a, b} that end in b.
L = {abb, aabb, babb, ababb, ...}
RE = (a U b)*abb
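The two expressions above can be tried out with Python's `re` module. This is only a sketch: Python writes union as `|` rather than `U`, and the helper name `matches` is an assumption introduced here, not part of the notes.

```python
import re

# The notes write union as "U"; Python's re module writes it as "|".
ENDS_IN_B = re.compile(r"(a|b)*b")
ENDS_IN_ABB = re.compile(r"(a|b)*abb")


def matches(pattern, w):
    """Return True when the whole string w is in the language of pattern."""
    return pattern.fullmatch(w) is not None


# Try the sample strings listed in the notes plus two that should fail for abb.
for w in ["abb", "aabb", "babb", "ababb", "ab", "abba"]:
    print(w, matches(ENDS_IN_B, w), matches(ENDS_IN_ABB, w))
```

`fullmatch` is used instead of `search` because an RE in these notes describes whole strings, not substrings.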
L = {a, aaa, ababa, bbaaaaba, ...}
RE = b*(ab*ab*)* a b* or b*ab*(ab*ab*)*
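The listed strings all contain an odd number of a's. Assuming that is the intended language (the description itself is not in the notes), a bounded brute-force check can compare both REs against that property. The function names below are hypothetical, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

# Both alternatives from the notes, with the spaces removed.
ODD_AS_1 = re.compile(r"b*(ab*ab*)*ab*")
ODD_AS_2 = re.compile(r"b*ab*(ab*ab*)*")


def odd_number_of_as(w):
    """Assumed description of L: w contains an odd number of a's."""
    return w.count("a") % 2 == 1


def agree_up_to(n):
    """Check both REs against the property on every string of length <= n."""
    for length in range(n + 1):
        for letters in product("ab", repeat=length):
            w = "".join(letters)
            expected = odd_number_of_as(w)
            if (ODD_AS_1.fullmatch(w) is not None) != expected:
                return False
            if (ODD_AS_2.fullmatch(w) is not None) != expected:
                return False
    return True


print(agree_up_to(8))
```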
1. Kleene star
2. Concatenation
3. Union
Kleene's Theorem
Theorem 1:
Any language that can be defined by a regular expression can be accepted by some
finite state machine.
Theorem 2:
Any language that can be accepted by a finite state machine can be defined by
some regular expression.
Note: Both theorems are proved below.
Figure (1)
Figure (2)
3. If α is ε, we construct the simple FSM shown in Figure (3).
Figure (3)
4. Let β and γ be regular expressions.
If L(β) is regular, then it is accepted by some FSM M1 = (K1, ∑, δ1, s1, A1).
If L(γ) is regular, then it is accepted by some FSM M2 = (K2, ∑, δ2, s2, A2).
If α is the RE β U γ, we construct an FSM M3 = (K3, ∑, δ3, s3, A3) such that
L(M3) = L(α) = L(β) U L(γ):
M3 = ({s3} U K1 U K2, ∑, δ3, s3, A1 U A2), where
δ3 = δ1 U δ2 U {((s3, ε), s1), ((s3, ε), s2)}.
α = β U γ
α = βγ
6. If α is the regular expression β* and L(β) is accepted by FSM M1 = (K1, ∑, δ1, s1, A1),
we construct an FSM M2 = (K2, ∑, δ2, s2, A2) such that
L(M2) = L(α) = L(β)*:
M2 = ({s2} U K1, ∑, δ2, s2, {s2} U A1), where
δ2 = δ1 U {((s2, ε), s1)} U {((q, ε), s1) : q ∈ A1}.
α = β*
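The union and Kleene star constructions can be sketched in Python as follows. The dictionary layout, the function names and the state names are assumptions made for illustration; ε is represented by the empty string, and the state sets of the machines being combined are assumed to be disjoint.

```python
EPS = ""  # stands for an epsilon transition


def symbol_fsm(c, name):
    """FSM for a single symbol c, with hypothetical state names name0, name1."""
    return {"K": {name + "0", name + "1"},
            "delta": {((name + "0", c), name + "1")},
            "s": name + "0", "A": {name + "1"}}


def union(m1, m2, s3):
    """New start state s3 with epsilon moves to both old start states."""
    return {"K": {s3} | m1["K"] | m2["K"],
            "delta": m1["delta"] | m2["delta"]
                     | {((s3, EPS), m1["s"]), ((s3, EPS), m2["s"])},
            "s": s3, "A": m1["A"] | m2["A"]}


def star(m1, s2):
    """New accepting start state s2; accepting states loop back to the old start."""
    return {"K": {s2} | m1["K"],
            "delta": m1["delta"] | {((s2, EPS), m1["s"])}
                     | {((q, EPS), m1["s"]) for q in m1["A"]},
            "s": s2, "A": {s2} | m1["A"]}


def eps_closure(m, states):
    """All states reachable from the given states using only epsilon moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (p, c), r in m["delta"]:
            if p == q and c == EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(m, w):
    """Simulate the nondeterministic FSM m on the string w."""
    current = eps_closure(m, {m["s"]})
    for ch in w:
        moved = {r for (p, c), r in m["delta"] if p in current and c == ch}
        current = eps_closure(m, moved)
    return bool(current & m["A"])


a_or_b = union(symbol_fsm("a", "p"), symbol_fsm("b", "q"), "u")
any_ab = star(a_or_b, "t")
print(accepts(a_or_b, "a"), accepts(a_or_b, "ab"), accepts(any_ab, "abba"))
```

Keeping the transition relation as a set of ((state, symbol), state) pairs mirrors the notation used for δ3 and δ2 above, so each construction is a direct transcription of its formula.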
An FSM for b
An FSM for a
An FSM for ab
fsmtoregexheuristic(M: FSM) =
3. If the start state of M has incoming transitions, create a new start
state s.
4. If there is more than one accepting state of M or one accepting state with
outgoing transitions from it, create a new accepting state.
6. Until only the start state and the accepting state remain do:
Let M be:
Step 1: Create a new start state and a new accepting state and link them to M.
1-2-1:ab U aaa*b
1-2-5:a
RE = (ab U aaa*b)*(a U ε)
Proof: By construction
L(M) = L(α)
If any of the transitions are missing, add them without changing L(M) by labeling
all of the new transitions with the RE ø.
Select a state rip, remove it, and modify the transitions as shown below.
Consider any states p and q. Once we remove rip, how can M get from p to q?
Let R(p,q) be the RE that labels the transition in M from p to q. Then, in the new
machine M' obtained by removing rip, the transition from p to q is labeled
R'(p,q) = R(p,q) U R(p,rip)R(rip,rip)*R(rip,q)
For example, with p = 1, q = 3 and rip = 2:
R'(1,3) = R(1,3) U R(1,2)R(2,2)*R(2,3)
= ø U ab*a
= ab*a
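The relabeling rule R'(p,q) = R(p,q) U R(p,rip)R(rip,rip)*R(rip,q) can be sketched as a small Python function over RE strings. The helper names and the labels R(1,2) = a, R(2,2) = b, R(2,3) = a are assumptions chosen to reproduce the result ab*a above; only the simplifications for ø and ε are applied.

```python
EMPTY = "ø"  # the RE for the empty language, as written in the notes


def wrap(r):
    """Parenthesize a union so that concatenation binds correctly."""
    return "(" + r + ")" if " U " in r else r


def concat(*parts):
    """Concatenate REs; anything concatenated with ø is ø, and ε is dropped."""
    if EMPTY in parts:
        return EMPTY
    return "".join(wrap(p) for p in parts if p != "ε") or "ε"


def star(r):
    """Kleene star; ø* and ε* both simplify to ε."""
    if r in (EMPTY, "ε"):
        return "ε"
    return r + "*" if len(r) == 1 else "(" + r + ")*"


def union(r1, r2):
    """Union of two REs; ø is the identity and union is idempotent."""
    if r1 == EMPTY:
        return r2
    if r2 == EMPTY:
        return r1
    return r1 if r1 == r2 else r1 + " U " + r2


def rip_label(R, p, q, rip):
    """New label from p to q after removing rip; missing labels count as ø."""
    def get(x, y):
        return R.get((x, y), EMPTY)
    return union(get(p, q),
                 concat(get(p, rip), star(get(rip, rip)), get(rip, q)))


R = {(1, 2): "a", (2, 2): "b", (2, 3): "a"}
print(rip_label(R, 1, 3, 2))  # ab*a
```

Treating a missing entry as ø matches the convention of adding absent transitions labeled ø before any state is removed.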
modified machine M
1. standardize(M: FSM)
iv. If there is more than one transition between states p and q, collapse them into
a single transition.
2. buildregex(M: FSM)
iii. Until only the start state and the accepting state remain do:
iv. Return the RE that labels the transition from the start state to the accepting state.
1-4-2 : bb
1-2: a U bb
1-3: (a U bb)b*a
RE = (a U bb)b*a
p-q-p: 01
p-r-p: 10
RE = (01 U 10)*
Building DFSM
• K can be defined by RE
Algorithm: buildkeywordFSM
• Builds a DFSM that accepts any string containing at least one of the specified
keywords.
buildkeywordFSM(K: set of keywords)
• More generally string processing, where the data need not be textual.
RE = -? ([0-9]+(\.[0-9]*)? | \.[0-9]+)
• (α)? means the RE α can occur 0 or 1 time.
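The decimal-number RE above carries over almost unchanged to Python's `re` module. In this sketch the spaces of the printed RE are removed, and the names `DECIMAL` and `is_decimal` are assumptions introduced here.

```python
import re

# Optional minus sign, then either digits with an optional fractional part,
# or a leading dot followed by at least one digit.
DECIMAL = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def is_decimal(w):
    """Return True when the whole string w matches the decimal-number RE."""
    return DECIMAL.fullmatch(w) is not None


for w in ["3.14", "-7", ".5", "12.", "-", ".", "1.2.3"]:
    print(repr(w), is_decimal(w))
```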
((a-z) U (A-Z))
• α* means that the pattern may occur any number of times (including zero).
• α{n,m} means that the pattern must occur at least n times but not more than
m times.
• So the RE of a legal password is:
• The RE for an IP address is:
RE = ((0-9){1,3}(\.(0-9){1,3}){3})
Examples: 121.123.123.123
118.102.248.226
10.1.23.45
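A quick way to try the IP address RE is Python's `re` module, reading the range (0-9) as the character class `[0-9]`; that reading and the names below are assumptions. Note that the RE as given only constrains the shape of the address, so it also accepts out-of-range values such as 999.999.999.999.

```python
import re

# One to three digits, followed by exactly three more ".digits" groups.
IP_ADDRESS = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")


def looks_like_ip(w):
    """Return True when the whole string w matches the IP address RE."""
    return IP_ADDRESS.fullmatch(w) is not None


for w in ["121.123.123.123", "118.102.248.226", "10.1.23.45", "1.2.3", "1234.1.1.1"]:
    print(w, looks_like_ip(w))
```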
• Union is Commutative
α U β = β U α
• Union is associative
(α U β) U γ = α U (β U γ)
α U Φ = Φ U α = α
• Union is idempotent
α U α = α
• Concatenation is associative
(αβ)γ = α(βγ)
αε = εα = α
αΦ = Φα = Φ
• Φ* = ε
• ε* = ε
• (α*)* = α*
• α*α* = α*
• If L(α*) ⊆ L(β*) then α*β* = β*
• (α U β)* = (α*β*)*
= a* U aa //(α*)* = α*
= a* // L(aa) ⊆ L(a*)
= b* // α*α* = α*
= b* //L(ε U b) ⊆ L(b*)
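Identities such as (α U β)* = (α*β*)* and α*α* = α* can be spot-checked by comparing two REs on every string up to a fixed length. This is a bounded check written with Python's `re` module, with α = a and β = b as sample instances; the function name is hypothetical, and agreement on short strings supports an identity without proving it.

```python
import re
from itertools import product


def same_language_up_to(r1, r2, alphabet, n):
    """Compare two REs on every string over alphabet of length <= n."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for length in range(n + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if (p1.fullmatch(w) is None) != (p2.fullmatch(w) is None):
                return False
    return True


print(same_language_up_to(r"(a|b)*", r"(a*b*)*", "ab", 6))  # (α U β)* vs (α*β*)*
print(same_language_up_to(r"a*a*", r"a*", "ab", 6))         # α*α* vs α*
print(same_language_up_to(r"(a*)*", r"a*", "ab", 6))        # (α*)* vs α*
print(same_language_up_to(r"(a|b)*", r"a*b*", "ab", 6))     # not an identity
```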
Chapter-7
Regular Grammars
and terminals.
X → Y
Legal Rules
S → a
S → ε
T → aS
Rules that are not legal
S → aSa
S → TT
aSa → T
S → T
• The start symbol of any grammar G is the symbol on the left-hand side of
the first rule in R_G.
DFSM accepting L
S → ε
S → aT
S → bT
T → aS
T → bS
S => aT
=> abS
=> abaT
=> ababS
=> abab
THEOREM
Statement:
The class of languages that can be defined with regular grammars is exactly the
regular languages.
L (M) = L (G):
Algorithm-Grammar to FSM
to # labeled w.
accepting.
from D to D labeled i.
Example 2: Grammar → FSM
RE = (a U b)*aaaa
Regular Grammar G
S → aS
S → bS
S → aB
B → aC
C → aD
D → a
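The grammar above can be turned into a nondeterministic FSM and tested. The sketch below assumes the usual construction: one state per nonterminal, an extra accepting state #, a transition from X to Y labeled w for each rule X → wY, a transition from X to # for each rule X → w, and X made accepting for each rule X → ε. Single-character terminals and nonterminals and the function names are assumptions.

```python
def grammar_to_fsm(rules, start):
    """Build (delta, start, accepting) from rules given as (lhs, rhs) pairs."""
    delta = {}            # (state, symbol) -> set of next states
    accepting = {"#"}     # "#" is the extra accepting state
    for lhs, rhs in rules:
        if rhs == "":                      # X -> epsilon: X becomes accepting
            accepting.add(lhs)
        elif len(rhs) == 1:                # X -> w: go to "#" on w
            delta.setdefault((lhs, rhs), set()).add("#")
        else:                              # X -> wY: go to Y on w
            delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return delta, start, accepting


def accepts(fsm, w):
    """Simulate the nondeterministic FSM by tracking the set of current states."""
    delta, start, accepting = fsm
    current = {start}
    for ch in w:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accepting)


RULES = [("S", "aS"), ("S", "bS"), ("S", "aB"),
         ("B", "aC"), ("C", "aD"), ("D", "a")]
M = grammar_to_fsm(RULES, "S")
for w in ["aaaa", "baaaa", "abaaaa", "aaa", "aaaab"]:
    print(w, accepts(M, w))
```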
RE = (a U bb)b*a
Grammar
A → aB
A → bD
B → bB
B → aC
D → bB
C → ε
A => aB
=> abB
=> abaC
=> aba
A => bD
=> bbB
=> bbaC
=> bba
number of b's}
Grammar
A → aB
A → bC
B → aA
B → bD
C → bA
C → aD
D → bB
D → aC
C → ε
A => aB
=> abD
=> abaC
=> ababA
=> ababbC
=> ababb
w ends in a}.
S → bS
S → aT
T → ε
T → aS
T → bX
X → aS
X → bX
• But regular grammars are not often used in practice, as FSMs and REs are easier
to work with.
• But as we move further there will no longer exist a technique like regular
expressions.
Chapter-8
Statement:
Proof:
languages:
regular languages.
Proof:
is regular.
the R.E: s1 U s2 U …U sn
• So it too is regular
Regular expressions are most useful when the elements of L match one or
more patterns.
FSMs are most useful when the elements of L share some simple structural
properties.
Examples:
current US president}.
Fn = 2^(2^n) + 1, n >= 0.
• All of them are prime. It appears likely that no other Fermat numbers are
prime. If that is true,then L6
• Union
• Concatenation
• Kleene star
• Complement
• Intersection
• Difference
• Reverse
• Letter substitution
Theorem:
Proof:
Steps:
M2=(K, ∑,δ,s,K-A)
Example:
RE = (0 U 1)*01
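The complement construction M2 = (K, ∑, δ, s, K-A) can be sketched for this example as follows. The three-state DFSM below is an assumption (the figure is not reproduced here): q0 is the start state, q1 means the last symbol read was 0, and q2 means the input so far ends in 01. Swapping accepting and non-accepting states is only correct because δ is total, that is, defined for every state and symbol.

```python
from itertools import product

K = {"q0", "q1", "q2"}
SIGMA = {"0", "1"}
DELTA = {("q0", "0"): "q1", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q2",
         ("q2", "0"): "q1", ("q2", "1"): "q0"}
M1 = (K, SIGMA, DELTA, "q0", {"q2"})   # accepts strings ending in 01


def complement(dfsm):
    """Same machine with accepting states K - A (delta must be total)."""
    states, sigma, delta, start, accepting = dfsm
    return (states, sigma, delta, start, states - accepting)


def accepts(dfsm, w):
    """Run the deterministic machine on w and test the final state."""
    states, sigma, delta, start, accepting = dfsm
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in accepting


M2 = complement(M1)
for letters in product("01", repeat=3):
    w = "".join(letters)
    print(w, accepts(M1, w), accepts(M2, w))
```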
Theorem:
Proof:
• Note that
• We have already shown that the regular languages are closed under both
complement and union.
• Example:
L = L1 ∩ L2, where
L = {w ∈ {a, b}* : w contains an even number of a’s and an odd number of b’s and
all a’s come in runs of three}.
Theorem:
Proof:
Theorem:
Proof:
Example:
By construction.
• Initially, let M′ be M.
• Example 1
sub(a) = 0, sub(b) = 11
• Example 2
h(0120) = h(0)h(1)h(2)h(0)
= aabbaa
h(01*2) = h(0)(h(1))*h(2)
= a(ab)*ba
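The two results above are consistent with h(0) = a, h(1) = ab and h(2) = ba; that mapping is inferred here, since the definition of h is not shown. Under that assumption, applying h symbol by symbol can be sketched in Python, and the image of strings in 01*2 can be compared with a(ab)*ba.

```python
import re

# Inferred mapping (an assumption): it reproduces h(0120) = aabbaa.
H = {"0": "a", "1": "ab", "2": "ba"}


def h(w):
    """Apply the homomorphism to each symbol of w and concatenate the results."""
    return "".join(H[c] for c in w)


print(h("0120"))  # aabbaa

# Every string 0 1^n 2 should map into the language of a(ab)*ba.
IMAGE = re.compile(r"a(ab)*ba")
for n in range(4):
    w = "0" + "1" * n + "2"
    print(w, h(w), IMAGE.fullmatch(h(w)) is not None)
```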
Proof:
• Each time M reads an input character, it visits some state. So, in processing a
string of length n, M makes a total of n+1 state visits.
• If n+1 > |K|, then, by the pigeonhole principle, some state must get more
than one visit.
• So, if n >= |K|, then M must visit at least one state more than once.
|xy| <= k,
y ≠ ε,and
Proof:
Let k be |K|
• We can carve w up and assign the name y to the first substring to drive M
through a loop.
• Then x is the part of w that precedes y and z is the part of w that follows y.
• We show that each of the last three conditions must then hold:
• |xy| <= k
• y≠ε
• ∀q >= 0 (xy^q z ∈ L)
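The way w is carved into x, y and z can be sketched by running a DFSM and stopping at the first repeated state. The two-state machine below (an even number of a's over {a, b}) and the function names are assumptions chosen for illustration.

```python
# Hypothetical DFSM: state "e" = even number of a's so far, "o" = odd.
DELTA = {("e", "a"): "o", ("e", "b"): "e",
         ("o", "a"): "e", ("o", "b"): "o"}
START, ACCEPTING = "e", {"e"}


def accepts(w):
    """Run the DFSM on w and test whether it ends in an accepting state."""
    q = START
    for ch in w:
        q = DELTA[(q, ch)]
    return q in ACCEPTING


def split_at_first_loop(w):
    """Return (x, y, z) where y drives M through its first loop, or None."""
    first_seen = {START: 0}          # state -> number of characters read
    q = START
    for j, ch in enumerate(w, start=1):
        q = DELTA[(q, ch)]
        if q in first_seen:          # first repeated state closes the loop
            i = first_seen[q]
            return w[:i], w[i:j], w[j:]
        first_seen[q] = j
    return None                      # no state repeated: w is too short


x, y, z = split_at_first_loop("abab")
print(x, y, z)
print([accepts(x + y * q + z) for q in range(4)])
```

Because the split stops at the first repeated state, at most |K| characters are read before it is found, which gives |xy| <= k, and the loop reads at least one character, which gives y ≠ ε.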
be in L.
1. Assume L is regular.
6. Our assumption is wrong and hence the given language is not regular.
Proof by contradiction.
[Figure: the string aaaaa…aaaaabbbb…bbbbbb, marked with regions 1 and 2 and divided into x, y and z, where |xy| <= k and y ≠ ε]
• Not regular.
• L consists of all strings of the form a*b* where the number of a’s is five
more than the number of b’s.
• Let w = a^(k+5) b^k.
• We can pump y out once, which will generate the string a^(k+5-p) b^k, which is not
in L because the number of a’s is less than 5 more than the number of b’s.
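The pumping argument can be illustrated for sample values of k. The sketch below assumes p = |y| and enumerates every split w = xyz with |xy| <= k and y ≠ ε, then checks that pumping y out leaves a string outside L. The function names are hypothetical, and testing a few values of k illustrates the proof without replacing it.

```python
import re


def in_L(w):
    """w has the form a*b* and has exactly five more a's than b's."""
    return (re.fullmatch(r"a*b*", w) is not None
            and w.count("a") == w.count("b") + 5)


def every_split_fails_when_pumped_out(k):
    """Check that xz is outside L for every legal split of w = a^(k+5) b^k."""
    w = "a" * (k + 5) + "b" * k
    assert in_L(w)
    for xy_len in range(1, k + 1):       # |xy| <= k
        for x_len in range(xy_len):      # y is nonempty
            x, z = w[:x_len], w[xy_len:]
            if in_L(x + z):              # pumping out: q = 0
                return False
    return True


print([every_split_fails_when_pumped_out(k) for k in range(1, 7)])
```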