Regular Expressions: Definitions Equivalence To Finite Automata
Regular Expressions: Definitions Equivalence To Finite Automata
Regular Expressions
Definitions
Equivalence to Finite Automata
2
REs: Introduction
Regular expressions describe
languages by an algebra.
They describe exactly the regular
languages.
If E is a regular expression, then L(E) is
the language it defines.
Well describe REs and their languages
recursively.
3
Operations on Languages
REs use three operations: union,
concatenation, and Kleene star.
The union of languages is the usual
thing, since languages are sets.
Example: {01,111,10}{00, 01} =
{01,111,10,00}.
4
Concatenation
The concatenation of languages L and
M is denoted LM.
It contains every string wx such that w is
in L and x is in M.
Example: {01,111,10}{00, 01} =
{0100, 0101, 11100, 11101, 1000,
1001}.
5
Kleene Star
If L is a language, then L*, the Kleene
star or just star, is the set of strings
formed by concatenating zero or more
strings from L, in any order.
L* = {} L LL LLL
Example: {0,10}* = {, 0, 10, 00, 010,
100, 1010,}
6
REs: Definition
Basis 1: If a is any symbol, then a is a
RE, and L(a) = {a}.
Note: {a} is the language containing one
string, and that string is of length 1.
Basis 2: is a RE, and L() = {}.
Basis 3: is a RE, and L() = .
7
REs: Definition (2)
Induction 1: If E
1
and E
2
are regular
expressions, then E
1
+E
2
is a regular
expression, and L(E
1
+E
2
) = L(E
1
)L(E
2
).
Induction 2: If E
1
and E
2
are regular
expressions, then E
1
E
2
is a regular
expression, and L(E
1
E
2
) = L(E
1
)L(E
2
).
Induction 3: If E is a RE, then E* is a RE,
and L(E*) = (L(E))*.
8
Precedence of Operators
Parentheses may be used wherever
needed to influence the grouping of
operators.
Order of precedence is * (highest),
then concatenation, then + (lowest).
9
Examples: REs
L(01) = {01}.
L(01+0) = {01, 0}.
L(0(1+0)) = {01, 00}.
Note order of precedence of operators.
L(0*) = {, 0, 00, 000, }.
L((0+10)*(+1)) = all strings of 0s
and 1s without two consecutive 1s.
10
Equivalence of REs and Finite
Automata
We need to show that for every RE,
there is a finite automaton that accepts
the same language.
Pick the most powerful automaton type: the
-NFA.
And we need to show that for every
finite automaton, there is a RE defining
its language.
Pick the most restrictive type: the DFA.
11
Converting a RE to an -NFA
Proof is an induction on the number of
operators (+, concatenation, *) in the
RE.
We always construct an automaton of a
special form (next slide).
12
Form of -NFAs Constructed
No arcs from outside,
no arcs leaving
Start state:
Only state
with external
predecessors
Final state:
Only state
with external
successors
13
RE to -NFA: Basis
Symbol a:
:
:
a
14
RE to -NFA: Induction 1 Union
For E
1
For E
2
For E
1
E
2
15
RE to -NFA: Induction 2
Concatenation
For E
1
For E
2
For E
1
E
2
16
RE to -NFA: Induction 3 Closure
For E
For E*
17
DFA-to-RE
A strange sort of induction.
States of the DFA are named 1,2,,n.
Induction is on k, the maximum state
number we are allowed to traverse
along a path.
18
k-Paths
A k-path is a path through the graph of
the DFA that goes though no state
numbered higher than k.
Endpoints are not restricted; they can
be any state.
n-paths are unrestricted.
RE is the union of REs for the n-paths
from the start state to each final state.
19
Example: k-Paths
1
3
2
0
0 0
1
1
1
0-paths from 2 to 3:
RE for labels = 0.
1-paths from 2 to 3:
RE for labels = 0+11.
2-paths from 2 to 3:
RE for labels =
(10)*0+1(01)*1
3-paths from 2 to 3:
RE for labels = ??
20
DFA-to-RE
Basis: k = 0; only arcs or a node by
itself.
Induction: construct REs for paths
allowed to pass through state k from
paths allowed only up to k-1.
21
k-Path Induction
Let R
ij
k
be the regular expression for
the set of labels of k-paths from state i
to state j.
Basis: k=0. R
ij
0
= sum of labels of arc
from i to j.
if no such arc.
But add if i=j.
22
Example: Basis
R
12
0
= 0.
R
11
0
= + = .
1
3
2
0
0 0
1
1
1
Notice algebraic law:
plus anything =
that thing.
23
k-Path Inductive Case
A k-path from i to j either:
1. Never goes through state k, or
2. Goes through k one or more times.
R
ij
k
= R
ij
k-1
+ R
ik
k-1
(R
kk
k-1
)* R
kj
k-1
.
Doesnt go
through k
Goes from
i to k the
first time
Zero or
more times
from k to k
Then, from
k to j
24
Illustration of Induction
States < k
k
i
j
Paths not going
through k
From k
to j
From k to k
Several times
Path to k
25
Final Step
The RE with the same language as the
DFA is the sum (union) of R
ij
n
, where:
1. n is the number of states; i.e., paths are
unconstrained.
2. i is the start state.
3. j is one of the final states.
26
Example
R
23
3
= R
23
2
+ R
23
2
(R
33
2
)*R
33
2
= R
23
2
(R
33
2
)*
R
23
2
= (10)*0+1(01)*1
R
33
2
= + 0(01)*(1+00) + 1(10)*(0+11)
R
23
3
= [(10)*0+1(01)*1] [ +
(0(01)*(1+00) + 1(10)*(0+11))]*
1
3
2
0
0 0
1
1
1
Start
27
Summary
Each of the three types of automata
(DFA, NFA, -NFA) we discussed, and
regular expressions as well, define
exactly the same set of languages: the
regular languages.
28
Algebraic Laws for REs
Union and concatenation behave sort of
like addition and multiplication.
+ is commutative and associative;
concatenation is associative.
Concatenation distributes over +.
Exception: Concatenation is not
commutative.
29
Identities and Annihilators
is the identity for +.
R + = R.
is the identity for concatenation.
R = R = R.
is the annihilator for concatenation.
R = R = .