Regular Expression Question Solution
Shakir Al Faraji
Computer Science Dept.,
Petra University
Amman - Jordan.
email: [email protected]
IMPORTANT NOTES
Students
This presentation is designed to be used in
class as part of a guided discovery
sequence. It is not self-explanatory! Please
use it only for revision purposes after having
taken the class. Simply going through the
slides will teach you nothing. You must be
actively thinking, doing and questioning to
learn!
Thank you,
Dr. Shakir Al Faraji
Course Strategy
Be
Material
There is a book:
Hopcroft, Motwani, & Ullman,
Introduction to Automata Theory, Languages, and Computation,
3rd Edition (2007), Addison Wesley
These were the lecture notes.
Well, apart from the slides.
Regular Expression
Definition
A regular expression, or RE,
describes strings of characters
(words or phrases or any
arbitrary text). It's a pattern that
matches certain strings and
doesn't match others.
Definition-Cont
Regular expressions are used to
generate patterns of strings. A
regular expression is an algebraic
formula whose value is a pattern
consisting of a set of strings,
called the language of the
expression.
Operands in a regular
expression
Operands in a regular expression can be:
characters from the alphabet over which
the regular expression is defined.
variables whose values are any pattern
defined by a regular expression.
epsilon which denotes the empty string
containing no characters.
null which denotes the empty set of
strings.
Operators used in
regular expressions
Union: If R1 and R2 are regular
expressions, then R1 | R2 (also written as
R1 U R2 or R1 + R2) is also a regular
expression.
L(R1|R2) = L(R1) U L(R2).
Concatenation: If R1 and R2 are regular
expressions, then R1R2 (also written as
R1.R2) is also a regular expression.
L(R1R2) = L(R1) concatenated with L(R2).
Operators used in
regular expressions
Kleene closure: If R1 is a regular
expression, then R1* (the Kleene closure
of R1) is also a regular expression.
L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U ...
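The three operators can be tried out on small finite languages. The following is a minimal sketch, assuming a language is represented as a Python set of strings; the function names `union`, `concat` and `star` are illustrative only, and because the true Kleene closure is infinite, `star` stops after a chosen number of repetitions.

```python
from itertools import product


def union(l1, l2):
    """L(R1 | R2): every string that belongs to either language."""
    return l1 | l2


def concat(l1, l2):
    """L(R1R2): every string x + y with x taken from l1 and y from l2."""
    return {x + y for x, y in product(l1, l2)}


def star(lang, max_reps):
    """Finite approximation of L(R1*): at most max_reps concatenated copies."""
    result = {""}  # zero copies gives the empty string (epsilon)
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, lang)  # one more copy than the previous layer
        result |= layer
    return result


L1 = {"a"}
L2 = {"b", "bb"}
print(sorted(union(L1, L2)))   # ['a', 'b', 'bb']
print(sorted(concat(L1, L2)))  # ['ab', 'abb']
print(sorted(star(L1, 3)))     # ['', 'a', 'aa', 'aaa']
```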
Examples
The set of strings over {0,1} that end in
3 consecutive 1's.
(0 | 1)* 111
OR (0 + 1)* 111
The set of strings over {0,1} that have
at least one 1.
0* 1 (0 + 1) *
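One way to gain confidence in expressions like these is to compare them with a plain-language predicate on every short string. The sketch below assumes Python's `re` module, where union is written `|`; the helpers `words` and `agrees` and the length bound of 8 are illustrative choices, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def agrees(pattern, predicate, alphabet="01", max_len=8):
    """True if the regex and the predicate accept exactly the same short strings."""
    rx = re.compile(pattern)
    return all(
        bool(rx.fullmatch(w)) == predicate(w) for w in words(alphabet, max_len)
    )


# Strings that end in three consecutive 1's.
print(agrees(r"(0|1)*111", lambda w: w.endswith("111")))  # True
# Strings that contain at least one 1.
print(agrees(r"0*1(0|1)*", lambda w: "1" in w))           # True
```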
Examples-Cont.
The set of strings over {0,1} that have at
most one 1.
0* | 0* 1 0*
The set of strings over {A..Z,a..z} that
contain the word "main".
Let <letter> = A | B | ... | Z | a | b | ... | z
<letter>* main <letter>*
Examples-Cont.
The set of strings over {A..Z,a..z} that
contain at least 3 x's.
<letter>* x <letter>* x <letter>* x <letter>*
Examples-Cont.
The set of identifiers in Pascal.
Let <letter> = A | B | ... | Z | a | b | ... | z
Let <digit> = 0 | 1 | 2 | 3 | ... | 9
<letter> (<letter> | <digit>)*
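In a practical regex engine the same definition becomes a character class followed by a starred class. This is a small sketch assuming Python's `re` module; `IDENTIFIER` and `is_identifier` are illustrative names, and the pattern follows the slide's definition exactly (letters and digits only).

```python
import re

# <letter> (<letter> | <digit>)* written with character classes
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """True if the whole string is a letter followed by letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["x", "count2", "2count", ""]:
    print(repr(candidate), is_identifier(candidate))
# 'x' True, 'count2' True, '2count' False, '' False
```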
Examples-Cont.
The set of real numbers in Pascal.
Let <digit> = 0 | 1 | 2 | 3 | ... | 9
Let <sign> = '+' | '-' | epsilon
Let <exp> = 'E' <sign> <digit> <digit>* | epsilon
Let <decimal> = '.' <digit> <digit>* | epsilon
<digit> <digit>* <decimal> <exp>
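The named parts can be assembled into one pattern. The sketch below assumes they are combined in the order digits, decimal part, exponent, and uses Python's `re` module; the constant names and `is_real` are illustrative. Under this assumption a plain digit string such as 42 is also accepted, because both optional parts may be epsilon.

```python
import re

DIGITS = r"[0-9]+"                   # <digit> <digit>*
SIGN = r"[+-]?"                      # '+' | '-' | epsilon
EXP = rf"(?:E{SIGN}{DIGITS})?"       # 'E' <sign> <digit> <digit>* | epsilon
DECIMAL = rf"(?:\.{DIGITS})?"        # '.' <digit> <digit>* | epsilon

# Assumed overall shape: <digit> <digit>* <decimal> <exp>
REAL = re.compile(DIGITS + DECIMAL + EXP)


def is_real(text):
    """True if the whole string matches the assembled pattern."""
    return REAL.fullmatch(text) is not None


for candidate in ["3.14", "1E-10", "2.5E+3", "42", ".5", "1."]:
    print(candidate, is_real(candidate))
# 3.14 True, 1E-10 True, 2.5E+3 True, 42 True, .5 False, 1. False
```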
Examples-Cont.
Consider Σ = { a }
L is the language in which each word
has odd length
a (aa)*
Examples-Cont.
Consider Σ = { a, b }
L is the language in which each word
must start with the letter b
b (a+b)*
Examples-Cont.
Consider Σ = { a, b }
L is the language in which each word
must start with the letter b and
end with the letter a
b (a+b)* a
Examples-Cont.
Consider Σ = { a, b, c }
L = { a, c, ab, cb, abb, cbb, abbb,
cbbb, abbbb, cbbbb, ... }
(a+c) b*
Examples-Cont.
Consider (a+b)*a(a+b)*
L = language of all words over the alphabet
Σ = { a, b } that have at least one a in them
L = { a, aa, ba, aab, aba, baa, bba,
aaaa, aaba, abaa, ... }
Examples-Cont.
Consider the following RE
(a+b)* a(a+b)* a(a+b)*
L = language of all words over the alphabet
Σ = { a, b } that have at least two
a's in them
Examples-Cont.
(a+b)* a(a+b)* a(a+b)* = b*ab*a(a+b)*
?
(a+b)* a(a+b)* a(a+b)* = (a+b)*ab*ab*
?
(a+b)* a(a+b)* a(a+b)* = b*a(a+b)*ab*
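The three proposed equalities can be tested by listing each language up to a fixed word length and comparing the sets. This is a sketch assuming Python's `re` module, where (a+b) is written `[ab]`; the helper `language`, the dictionary `candidates` and the bound of 8 are illustrative, and equality on short words is supporting evidence only.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=8):
    """Set of all words of length 0..max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    all_words = (
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
    )
    return {w for w in all_words if rx.fullmatch(w)}


reference = language(r"[ab]*a[ab]*a[ab]*")  # (a+b)* a(a+b)* a(a+b)*

candidates = {
    "b*ab*a(a+b)*": r"b*ab*a[ab]*",
    "(a+b)*ab*ab*": r"[ab]*ab*ab*",
    "b*a(a+b)*ab*": r"b*a[ab]*ab*",
}
for name, pattern in candidates.items():
    print(name, language(pattern) == reference)  # each line ends in True
```

Each candidate pins down a different pair of a's (the first two, the last two, or the first and the last), yet all three still describe words with at least two a's.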
Examples-Cont.
(a+b)* a(a+b)* b(a+b)*
Is this the language of all words that have at
least one a and at least one b?
Examples-Cont.
(a+b)* a(a+b)* b(a+b)*
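What about the word ba? It has an a and a b, but
the expression requires some a to come before some b.
The language of words with at least one a and at least one b MUST BE
(a+b)* a(a+b)* b(a+b)* + (a+b)* b(a+b)* a(a+b)*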
Examples-Cont.
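IS
(a+b)* a(a+b)* b(a+b)* + (a+b)* b(a+b)* a(a+b)*
THE SAME AS
(a+b)* a(a+b)* b(a+b)* + bb*aa* ?
YES: the only words with both letters that the first term misses
are those in which every b comes before every a, which is bb*aa*.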
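The role of the word ba and the shorter second term can both be checked mechanically. The sketch below assumes Python's `re` module, writing (a+b) as `[ab]` and the slide's + between terms as `|`; the names `FIRST_TRY`, `CORRECTED`, `SHORTER` and `language` are illustrative, and the comparison only covers words up to length 8.

```python
import re
from itertools import product

FIRST_TRY = r"[ab]*a[ab]*b[ab]*"                    # some a before some b
CORRECTED = r"[ab]*a[ab]*b[ab]*|[ab]*b[ab]*a[ab]*"  # either order
SHORTER = r"[ab]*a[ab]*b[ab]*|bb*aa*"               # simplified second term


def language(pattern, max_len=8):
    """Set of all words over {a, b} up to max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    all_words = (
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product("ab", repeat=n)
    )
    return {w for w in all_words if rx.fullmatch(w)}


# Target language: at least one a and at least one b.
has_both = {w for w in language(r"[ab]*") if "a" in w and "b" in w}

print(re.fullmatch(FIRST_TRY, "ba") is None)  # True: ba is missed
print(language(CORRECTED) == has_both)        # True
print(language(SHORTER) == has_both)          # True
```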
Examples-Cont.
b* + ab*
(epsilon + a) b*
b* + ab* = (epsilon + a) b*
More on RE.
Definition
If S and T are sets of strings of letters, we
define the product set of strings of letters to
be
ST = { all combinations of a string from S
concatenated with a string from T }
More on RE - Cont.
If S = { a, aa, aaa } , T = { bb, bbb }
then
ST = { abb, abbb, aabb, aabbb, aaabb, aaabbb }
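The product set is straightforward to compute for finite sets. A minimal sketch, assuming sets of Python strings and an illustrative function name `product_set`:

```python
def product_set(S, T):
    """All strings formed by a string from S followed by a string from T."""
    return {s + t for s in S for t in T}


S = {"a", "aa", "aaa"}
T = {"bb", "bbb"}
ST = product_set(S, T)
print(sorted(ST, key=lambda w: (len(w), w)))
# ['abb', 'aabb', 'abbb', 'aaabb', 'aabbb', 'aaabbb']

# Different pairs can produce the same string, so ST may have fewer
# than |S| * |T| elements: a + aa and aa + a both give aaa.
print(sorted(product_set({"a", "aa"}, {"a", "aa"})))  # ['aa', 'aaa', 'aaaa']
```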
Languages Associated
with RE.
Definition
Examples.
Can you describe the following RE
(a+b)* (aa+bb) (a+b)*
All strings of a's and b's that at some point
contain a double letter.
Examples.
Σ = { a, b }
What strings do not contain a double
letter?
(ab)*
Is it correct? NO: it misses words such as a, b, ba and aba.
(epsilon + b)(ab)*(epsilon + a)
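Counterexamples to a proposed expression can be found by brute force. The sketch below assumes Python's `re` module, where an optional letter (epsilon + b) is written `b?`; `words`, `no_double_letter` and `mismatches` are illustrative helpers, and only words up to length 8 are examined.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def no_double_letter(w):
    """True if w contains neither aa nor bb."""
    return "aa" not in w and "bb" not in w


def mismatches(pattern, max_len=8):
    """Words on which the pattern and the predicate disagree."""
    rx = re.compile(pattern)
    return [
        w for w in words("ab", max_len)
        if bool(rx.fullmatch(w)) != no_double_letter(w)
    ]


print(mismatches(r"(ab)*")[:5])   # ['a', 'b', 'ba', 'aba', 'bab']
print(mismatches(r"b?(ab)*a?"))   # []
```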
Examples.
Is ( a + b* )* = (a + b )* ?
YES: both describe every word over { a, b }.
Examples.
[aa + bb + ( ab + ba) (aa+bb)*(ab + ba) ]*
EVEN-EVEN: words with an even number of a's
and an even number of b's
type1 = aa
type2 = bb
type3 = (ab+ba)(aa+bb)*(ab+ba)
E = [ type1 + type2 + type3 ] *
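The claim that E describes exactly the words with an even number of a's and an even number of b's can be spot-checked on short words. This sketch assumes Python's `re` module with + written as `|`; `EVEN_EVEN`, `is_even_even` and `first_disagreement` are illustrative names, and the bound of 10 is arbitrary.

```python
import re
from itertools import product

# E = [ type1 + type2 + type3 ]* with type3 = (ab+ba)(aa+bb)*(ab+ba)
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")


def is_even_even(w):
    """True if w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


def first_disagreement(max_len=10):
    """Return the first short word where regex and predicate differ, else None."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if bool(EVEN_EVEN.fullmatch(w)) != is_even_even(w):
                return w
    return None


print(first_disagreement())  # None
```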
\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b
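A pattern of this shape is a common way to pick e-mail-like strings out of text. The sketch below assumes the pattern is meant to start with the word-boundary anchor \b and that it is used case-insensitively, since its classes list only upper-case letters; the sample text and addresses are made up for illustration, and the pattern is a rough filter, not a full address validator.

```python
import re

EMAIL = re.compile(
    r"\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b",
    re.IGNORECASE,  # let the A-Z classes match lower-case letters too
)


def find_emails(text):
    """Return every substring of text that matches the e-mail pattern."""
    return EMAIL.findall(text)


sample = "Contact alice@example.com or bob@mail.example.org today."
print(find_emails(sample))
# ['alice@example.com', 'bob@mail.example.org']
```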
Metacharacters
As you've learned, the backslash negates any
special meaning that the character
following it has to a regular expression. It
has another function, too: it can turn
ordinary characters into special ones.
Consider the tab. You don't see it on the
screen the way you see ordinary letters;
you see what it does.
Metacharacters-Cont.
If you turn on the show-invisibles function,
however, you generally see an indication
that there is a character there.
Regular expressions let you access these
invisible characters (usually called
metacharacters):
Metacharacters-Cont.
Metacharacter    Meaning
\n               Newline (or paragraph mark, or however you think of it)
\t               Tab character
\s               Any whitespace character (tab, space, or newline)
Metacharacters-Cont.
For purposes of modifiers like star and
plus, these metacharacters act like single
characters. So \n+ finds one or more
newlines.
A special caution with BBEdit: Because of
ancient OS wars, Macs and non-Macs
treat newlines differently. If a regular
expression containing \n isn't finding
what you think it should, try replacing \n
in your search pattern with \r.
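These metacharacters behave the same way in most programming-language regex engines. The following is a small sketch assuming Python's `re` module (not BBEdit); the function names and sample text are illustrative. In Python, \s also matches a few other whitespace characters, such as the carriage return.

```python
import re


def collapse_newlines(text):
    """Replace each run of one or more newlines with a single newline."""
    return re.sub(r"\n+", "\n", text)


def normalize_line_endings(text):
    """Turn CRLF and bare CR line endings into LF."""
    return re.sub(r"\r\n?", "\n", text)


sample = "first line\n\n\nsecond\tline  end"
print(repr(collapse_newlines(sample)))             # 'first line\nsecond\tline  end'
print(re.split(r"\s+", sample))                    # ['first', 'line', 'second', 'line', 'end']
print(repr(normalize_line_endings("a\r\nb\rc")))   # 'a\nb\nc'
```

Normalizing line endings first is one way to sidestep the \n versus \r difference described above.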
Metacharacters-Cont.
Depending on your regular-expression
engine or editing program, there may be
other metacharacters available to you.
Read the manual or help pages for details.
In addition, a few more special regular-expression characters provide useful
functions. Remember that to look for the
actual character, you must precede it with
a backslash.
END