Regular Language: Md. Mohibullah, Assistant Professor, Department of CSE, Comilla University
Expressions[Part-01]
Regular Language
A regular language is a language that can be described by a regular expression or, equivalently,
recognized by a deterministic or non-deterministic finite automaton (finite state machine). A language
is a set of strings made up of symbols from a specified alphabet. Every language over an alphabet is a
subset of the set of all strings over that alphabet, and the regular languages form a proper subclass of
all such languages. Regular languages are used in parsing and in the design of programming languages,
and they are among the first concepts taught in computability courses. They help computer scientists
recognize patterns in data and group related computational problems together; once problems are
grouped, similar approaches can be used to solve them. Regular languages are a key topic in
computability theory.
Operations on Regular Language
The main operations on regular languages are:
Union: If A and B are two regular languages, then their union A ∪ B is also a regular language.
A ∪ B = {w | w is in A or w is in B} or A ∪ B = {w: w ∈ A or w ∈ B}
Concatenation: If A and B are two regular languages, then their concatenation AB is also a regular language.
A∘B = {wx | w is in A and x is in B} or AB = {wx: w ∈ A and x ∈ B}.
Kleene closure or Star: If A is a regular language, then its Kleene closure A* is also a regular
language.
A* = {u1u2 . . . uk : k ≥ 0 and ui ∈ A for all i = 1, 2, . . . , k}.
In words, A* is obtained by taking any finite number of strings in A and gluing them together.
Observe that k = 0 is allowed; this corresponds to the empty string ε. Thus, ε ∈ A*.
That is, A* consists of zero or more occurrences of strings from A, concatenated together.
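The three operations can be tried out on small finite languages. The following is a minimal sketch in Python, with languages modelled as sets of strings; the function names `union`, `concat` and `star_up_to` are illustrative choices, not part of the notes. Because A* is infinite whenever A contains a non-empty string, `star_up_to` only builds the strings obtained with at most `max_k` factors.

```python
def union(A, B):
    """A ∪ B = {w : w ∈ A or w ∈ B}."""
    return A | B


def concat(A, B):
    """AB = {wx : w ∈ A and x ∈ B}."""
    return {w + x for w in A for x in B}


def star_up_to(A, max_k):
    """Finite part of A*: all u1u2...uk with 0 <= k <= max_k and each ui in A."""
    result = {""}   # k = 0 contributes the empty string
    layer = {""}    # strings built from exactly k factors
    for _ in range(max_k):
        layer = concat(layer, A)
        result |= layer
    return result


A = {"a", "ab"}
B = {"b", "ba"}

print(sorted(union(A, B)))                 # ['a', 'ab', 'b', 'ba']
print(sorted(concat(A, B)))                # ['ab', 'aba', 'abb', 'abba']
print(sorted(star_up_to({"0", "11"}, 2)))  # ['', '0', '00', '011', '11', '110', '1111']
```

Note that the empty string appears in the last result even though it is not in the language {0, 11}; this is the k = 0 case of the definition.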
Regular Expression
o The languages accepted by finite automata can be described by simple expressions called
Regular Expressions. They are a compact way to represent regular languages. In other words,
regular expressions can be thought of as an algebraic description of regular languages.
o A regular expression can also be described as a pattern that defines a set of strings.
o Regular expressions are used to match character combinations in strings. String-searching
algorithms use these patterns for find and find-and-replace operations on strings.
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx, xxx,
xxxx, ......} [Here the empty string may be written ε, Λ or λ]
and x+ means one or more occurrences of x. It can generate {x, xx, xxx, xxxx, ......}
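The difference between x* and x+ can be observed with Python's standard `re` module, whose `*` and `+` operators have the same meaning as above. This is a small sketch; `fullmatch` is used so that the whole string, not just a prefix, must fit the pattern, and the test strings are arbitrary examples.

```python
import re

star = re.compile(r"x*")  # zero or more occurrences of x
plus = re.compile(r"x+")  # one or more occurrences of x

for s in ["", "x", "xx", "xxx", "xy"]:
    in_star = star.fullmatch(s) is not None
    in_plus = plus.fullmatch(s) is not None
    # Print the string followed by its membership in x* and in x+
    print(repr(s), in_star, in_plus)
```

The only string on which the two patterns disagree is the empty string: it belongs to the language of x* but not to that of x+.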
Before formally defining the notion of a regular expression, we give some examples.
Consider the expression
(0 ∪ 1)01∗
The language described by this expression is the set of all binary strings
1. that start with either 0 or 1 (this is indicated by (0 ∪ 1)),
2. for which the second symbol is 0 (this is indicated by 0), and
3. that end with zero or more 1s (this is indicated by 1∗).
That is, the language described by this expression is
{00, 001, 0011, 00111, . . ., 10, 101, 1011, 10111, . . .}.
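The listed strings can be reproduced mechanically. The sketch below writes the expression in Python `re` syntax, where `|` plays the role of ∪, and filters all binary strings up to an arbitrarily chosen length of 5; the helper `binary_strings` is a hypothetical name introduced for this illustration.

```python
import re
from itertools import product

# (0 ∪ 1)01* written in Python regex syntax
pattern = re.compile(r"(0|1)01*")


def binary_strings(max_len):
    """Yield every binary string of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


matches = [s for s in binary_strings(5) if pattern.fullmatch(s)]
print(matches)
# ['00', '10', '001', '101', '0011', '1011', '00111', '10111']
```

Every match has length at least 2, because the expression forces a first symbol and a second symbol 0 before the optional run of 1s.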
Here is another example, where the alphabet is {0, 1}:
Definition 3
Let R1 and R2 be regular expressions and let L1 and L2 be the languages described by them,
respectively. If L1 = L2 (i.e., R1 and R2 describe the same language), then we will write R1 = R2.
Hence, even though (0 ∪ ε)1∗ and 01∗ ∪ 1∗ are different regular expressions, we write
(0 ∪ ε)1∗ = 01∗ ∪ 1∗
because they describe the same language.
We will not present the (boring) proofs of these identities, but urge you to convince yourself
informally that they make perfect sense. To give an example, we mentioned above that
(0 ∪ ε)1∗ = 01∗ ∪ 1∗.
We can verify this identity in the following way:
(0 ∪ ε)1∗ = 01∗ ∪ ε1∗ (by identity 7)
= 01∗ ∪ 1∗ (by identity 2)
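An identity of this kind can also be sanity-checked by brute force. The sketch below compares the two expressions on every binary string up to a chosen length, using Python's `re` module; the helper name `agree_up_to` and the bound 8 are assumptions made for this illustration. Agreement up to a bound is supporting evidence only, not a proof, since the languages are infinite.

```python
import re
from itertools import product

left = re.compile(r"(?:0|)1*")  # (0 ∪ ε)1*, with the empty alternative standing for ε
right = re.compile(r"01*|1*")   # 01* ∪ 1*


def agree_up_to(r1, r2, alphabet, max_len):
    """True if r1 and r2 accept exactly the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (r1.fullmatch(s) is None) != (r2.fullmatch(s) is None):
                return False
    return True


print(agree_up_to(left, right, "01", 8))  # True
```

Dropping the ∪ 1∗ part from the right-hand side makes the check fail, because the strings ε and 1 are then accepted by the left expression only.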
Exercise 3
Write a regular expression for the language consisting of all strings with any number of a's
and b's, i.e., all strings over ∑ = {a, b}.
Exercise 4
Write regular expressions for the following languages over ∑ = {0, 1}:
i. all strings starting with 1 and ending with 0
ii. all strings beginning with 1 and ending with 00
iii. all strings ending with either 010 or 0010
iv. all strings beginning with 00
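A candidate answer to any of these exercises can be tested against a plain-language description of the target strings. The following sketch is one possible checker, not the intended solution method: the helper `first_mismatch`, the length bound 8, and the candidate `1(0|1)*0` for part (i) are all assumptions introduced here. It returns the shortest string on which the regular expression and the description disagree, or `None` if none is found within the bound.

```python
import re
from itertools import product


def first_mismatch(pattern, predicate, alphabet="01", max_len=8):
    """Shortest string where the regex and the predicate disagree, else None."""
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (regex.fullmatch(s) is not None) != predicate(s):
                return s
    return None


def starts_1_ends_0(s):
    """Description for Exercise 4(i): starts with 1 and ends with 0."""
    return s.startswith("1") and s.endswith("0")


# Hypothetical candidate for part (i), written in Python regex syntax
candidate = r"1(0|1)*0"
print(first_mismatch(candidate, starts_1_ends_0))  # None
```

A faulty candidate such as `1(0|1)*` is caught at once: the checker returns the string 1, which that expression accepts even though it does not end with 0.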