Lec 4
Specification of Tokens
This section relies on the distinction between finite and infinite sets
and also uses the notion of a countable set.
Strings and Languages
Working with strings and languages requires several definitions:
Strings and Languages..
Def: A language over an alphabet is a countable set of strings over
the alphabet.
Ex: The set of all grammatical English sentences with five, eight, or
twelve words is a language over ASCII.
Strings and Languages...
Def: The proper prefixes, suffixes, and substrings of a string s are
those prefixes, suffixes, and substrings, respectively, of s that
are neither ε nor equal to s itself.
Ex: ban is a proper prefix of banana.
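The definition can be checked mechanically on a small string. The sketch below is only an illustration: the helper names (prefixes, suffixes, substrings, proper) are made up for this example, and it reads "proper" as excluding both ε and s itself.

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s):
    """All contiguous pieces of s, including the empty string and s itself."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def proper(parts, s):
    """Keep only the parts that are neither the empty string nor s itself."""
    return {p for p in parts if p != "" and p != s}

proper_prefixes = proper(prefixes("banana"), "banana")
proper_suffixes = proper(suffixes("banana"), "banana")
proper_substrings = proper(substrings("banana"), "banana")
```

Here proper_prefixes contains ban, but neither the empty string nor banana.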
Operation on Languages
The union of {English sentences with one, three, or five words} with
{English sentences with two or four words} is {English sentences with five
or fewer words}.
Operation on Languages..
The concatenation of L1 and L2 is the set of all strings st, where s is
a string of L1 and t is a string of L2.
Operation on Languages..
The (Kleene) closure of L, denoted L*, is L⁰ ∪ L¹ ∪ L² ∪ ...,
where L⁰ = {ε} and Lⁱ = Lⁱ⁻¹L.
The positive closure of L, denoted L+, is L¹ ∪ L² ∪ ...
Ex:
{a,b}* is {ε,a,b,aa,ab,ba,bb,aaa,aab,aba,abb,baa,bab,bba,bbb,...}
{a,b}+ is {a,b,aa,ab,ba,bb,aaa,aab,aba,abb,baa,bab,bba,bbb,...}
{ε,a,b}* is {ε,a,b,aa,ab,ba,bb,...}.
{ε,a,b}+ is the same as {ε,a,b}*.
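Since L* is infinite whenever L contains a nonempty string, a program can only enumerate it up to some bound. The sketch below, under that assumption, takes the union of the powers of L up to a chosen n; the names concat, power, and closure_up_to are hypothetical helpers for this example.

```python
def concat(l1, l2):
    """Every string s + t with s taken from l1 and t taken from l2."""
    return {s + t for s in l1 for t in l2}

def power(lang, i):
    """The i-th power of lang: power 0 is {""}, power i is power (i-1) concatenated with lang."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result

def closure_up_to(lang, n, positive=False):
    """Union of the powers 0..n of lang (1..n for the positive closure)."""
    start = 1 if positive else 0
    out = set()
    for i in range(start, n + 1):
        out |= power(lang, i)
    return out

star = closure_up_to({"a", "b"}, 2)
plus = closure_up_to({"a", "b"}, 2, positive=True)
eps_plus = closure_up_to({"", "a", "b"}, 2, positive=True)
```

With n = 2, star and plus differ only in the empty string, while eps_plus equals star: once ε is in the language, the positive closure already contains it.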
Regular Expressions
A regular expression is a sequence of characters that forms a
search pattern, mainly for use in pattern matching with strings.
Regular Expressions..
Rules that define the regular expressions over some alphabet Σ
and the languages that those expressions denote are:
Regular Expressions...
INDUCTION: There are four parts to the induction whereby larger
regular expressions are built from smaller ones.
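One way to see the induction at work is to model each construction as a node type and compute the denoted language by recursion. The sketch below assumes the usual four constructions (alternation, concatenation, closure, and parenthesization, which needs no node of its own); the class names Sym, Alt, Cat, Star and the function lang are invented for this example, and languages are cut off at strings of length n so that they stay finite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:       # basis: a single symbol, or "" for ε
    ch: str

@dataclass(frozen=True)
class Alt:       # (r)|(s) denotes the union of the two languages
    r: object
    s: object

@dataclass(frozen=True)
class Cat:       # (r)(s) denotes the concatenation of the two languages
    r: object
    s: object

@dataclass(frozen=True)
class Star:      # (r)* denotes the closure of the language of r
    r: object

def lang(e, n):
    """Strings of length <= n in the language denoted by e."""
    if isinstance(e, Sym):
        return {e.ch} if len(e.ch) <= n else set()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        return {s + t for s in lang(e.r, n) for t in lang(e.s, n)
                if len(s + t) <= n}
    if isinstance(e, Star):
        base = lang(e.r, n)
        out, frontier = {""}, {""}
        while frontier:  # stops once no new string of length <= n appears
            frontier = {s + t for s in frontier for t in base
                        if len(s + t) <= n} - out
            out |= frontier
        return out
    raise TypeError(f"not a regular expression node: {e!r}")

a, b = Sym("a"), Sym("b")
pair = Cat(Alt(a, b), Alt(a, b))   # (a|b)(a|b)
two_letter = lang(pair, 2)
```

Each branch of lang mirrors one part of the induction: the language of a larger expression is computed only from the languages of its subexpressions.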
Regular Expressions...
Regular expressions often contain unnecessary pairs of
parentheses. We may drop certain pairs of parentheses if we
adopt the conventions that:
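The usual conventions, assumed here, are that * binds tightest, concatenation comes next, and | binds loosest. Python's re module follows the same precedence, so the effect of dropping parentheses can be checked by brute force. This is only a sketch: strings_up_to and agree are hypothetical helpers, and the comparison covers strings up to a fixed length rather than proving equivalence.

```python
import re
from itertools import product

def strings_up_to(alphabet, n):
    """Yield every string over alphabet of length 0..n."""
    for k in range(n + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)

def agree(p, q, alphabet, n):
    """True if patterns p and q accept exactly the same strings of length <= n."""
    return all(
        bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
        for w in strings_up_to(alphabet, n)
    )

# Fully parenthesized form versus the form with parentheses dropped.
same = agree("(a)|((b)*(c))", "a|b*c", "abc", 4)

# Grouping differently changes the language: (a|b)*c accepts bac, a|b*c does not.
different = not agree("(a|b)*c", "a|b*c", "abc", 4)
```

Both same and different come out true: a|b*c is read as (a)|((b)*(c)), not as (a|b)*c.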
Regular Expressions...
Ex. Let Σ = {a, b}.
2. (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of
length two over the alphabet Σ.
Another regular expression for the same language is
aa | ab | ba | bb
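That the two expressions denote the same language can be confirmed by listing what each one accepts. The sketch below uses Python's re.fullmatch, whose | and grouping syntax coincide with the notation here; the helper name language and the length bound of 3 are assumptions for this example.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over alphabet of length <= max_len that pattern matches in full."""
    words = (
        "".join(letters)
        for k in range(max_len + 1)
        for letters in product(alphabet, repeat=k)
    )
    return {w for w in words if re.fullmatch(pattern, w)}

lhs = language("(a|b)(a|b)", "ab", 3)
rhs = language("aa|ab|ba|bb", "ab", 3)
```

Both sets are {aa, ab, ba, bb}; no string of length 0, 1, or 3 is accepted by either expression.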
Regular Expressions...
4. (a|b)* denotes the set of all strings consisting of zero or
more instances of a or b, that is, all strings of a's and b's:
{ε, a, b, aa, ab, ba, bb, aaa, ...}.
5. a | a*b denotes the language {a, b, ab, aab, aaab, ...}, that is,
the string a and all strings consisting of zero or more a's and
ending in b.
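The prose description in example 5 can be compared against the expression itself. In this sketch the function described encodes the English wording directly (a name made up for this example), and re.fullmatch evaluates a|a*b on every string over {a, b} of length at most 5.

```python
import re
from itertools import product

def described(w):
    """The prose reading: the string a, or zero or more a's followed by a single b."""
    return w == "a" or (w.endswith("b") and set(w[:-1]) <= {"a"})

# Every string over {a, b} of length 0..5, shortest first.
words = ["".join(letters) for k in range(6) for letters in product("ab", repeat=k)]

accepted = [w for w in words if re.fullmatch("a|a*b", w)]
by_description = [w for w in words if described(w)]
```

The two lists agree, and they begin a, b, ab, aab, aaab, matching the set written out in the example.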