
Lec 4

Uploaded by Mohammad Humayun
Copyright © All Rights Reserved

Contents

• Specification of Tokens
• Strings and Languages
• Operations on Languages
• Regular Expressions
• Regular Definitions
• Extensions of Regular Expressions

Specification of Tokens
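• This section relies on the distinction between finite and infinite sets, and on the notion of a countable set.

• A countable set is either a finite set or an infinite set whose elements can be counted. That is, the elements can be listed as a first, second, third, ... element so that every element appears at some finite position (a one-to-one correspondence with the natural numbers).

• The set of rational numbers, i.e., fractions in lowest terms, is countable.

• The set of real numbers is uncountable. It is strictly bigger than the set of natural numbers, so no such list, however it is arranged, contains every real number.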
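The following minimal Python sketch makes the countability of the rationals concrete. It lists every positive fraction in lowest terms exactly once (the generator name `positive_rationals` is our own). It walks the diagonals numerator + denominator = 2, 3, 4, .... Each diagonal is finite, so every fraction is reached at some finite position. Zero and the negative rationals could be added by interleaving 0, q, -q, but this sketch omits them.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, in lowest terms."""
    for total in count(2):                 # diagonal: numerator + denominator = total
        for num in range(1, total):
            den = total - num
            if gcd(num, den) == 1:         # skip non-reduced duplicates such as 2/4
                yield Fraction(num, den)

# Each positive rational q appears at some finite position in this list.
print([str(q) for q in islice(positive_rationals(), 8)])
# ['1', '1/2', '2', '1/3', '3', '1/4', '2/3', '3/2']
```

No comparable list can exist for the real numbers, which is exactly what "uncountable" means.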

Strings and Languages
• Strings and languages rest on the following definitions.

• Def: An alphabet is a finite set of symbols.
Ex: {0,1}, presumably φ (the empty set), ASCII, Unicode, EBCDIC.

• Def: A string over an alphabet is a finite sequence of symbols from that alphabet. Strings are often called words or sentences. The empty string, of length zero, is written ε.
Ex: Strings over {0,1}: ε, 0, 1, 111010.
Strings over ASCII: ε, system, the string consisting of 3 blanks.

Strings and Languages..
• Def: A language over an alphabet is a countable set of strings over the alphabet.
Ex: The set of all grammatical English sentences with five, eight, or twelve words is a language over ASCII.

• Def: The concatenation of strings s and t is the string formed by appending the string t to s. It is written st.
Ex: εs = sε = s for any string s.

• Def: The length of a string is the number of symbols (counting duplicates) in the string.
Ex: The length of vciit, written |vciit|, is 5.
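Python strings model these definitions directly if we assume that each character is one symbol. Under that assumption, `""` plays the role of ε, `+` is concatenation, and `len` gives |s|. The names `EPS`, `concat`, and `length` below are illustrative helpers, not a standard API.

```python
EPS = ""  # ε, the empty string (assumption: one character = one symbol)

def concat(s: str, t: str) -> str:
    """The concatenation st: the string t appended to the end of s."""
    return s + t

def length(s: str) -> int:
    """|s|: the number of symbols in s, counting duplicates."""
    return len(s)

# ε is the identity for concatenation: εs = sε = s.
assert concat(EPS, "vciit") == concat("vciit", EPS) == "vciit"

print(concat("ban", "ana"), length("vciit"), length(EPS))  # banana 5 0
```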
Strings and Languages...
• Def: A prefix of string s is any string obtained by removing zero or more symbols from the end of s.
Ex: ban, banana, and ε are prefixes of banana.

• Def: A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s.
Ex: nana, banana, and ε are suffixes of banana.

• Def: A substring of s is any string obtained by deleting any prefix and any suffix from s.
Ex: banana, nan, and ε are substrings of banana.
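Here is a short Python sketch that enumerates the prefixes, suffixes, and substrings of a string by slicing, assuming one character per symbol. `s[:i]` removes symbols from the end, and `s[i:]` removes them from the beginning. The function names are illustrative, and the `set[str]` annotations need Python 3.9 or later.

```python
def prefixes(s: str) -> set[str]:
    """All prefixes: remove zero or more symbols from the end of s."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s: str) -> set[str]:
    """All suffixes: remove zero or more symbols from the beginning of s."""
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s: str) -> set[str]:
    """All substrings: delete a prefix s[:i] and a suffix s[j:] from s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

s = "banana"
print(sorted(prefixes(s), key=len))  # ['', 'b', 'ba', 'ban', 'bana', 'banan', 'banana']
assert {"ban", "banana", ""} <= prefixes(s)
assert {"nana", "banana", ""} <= suffixes(s)
assert {"banana", "nan", ""} <= substrings(s)
```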

Strings and Languages...
• Def: The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are neither ε nor equal to s itself.
Ex: ban is a proper prefix of banana.

• Def: A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s.
Ex: baan is a subsequence of banana.
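This sketch continues the same style, with hypothetical helper names and one character per symbol. Proper prefixes exclude only the lengths 0 and |s|. A greedy left-to-right scan decides whether t is a subsequence of s. The last line contrasts subsequence with substring.

```python
def proper_prefixes(s: str) -> set[str]:
    """Prefixes of s that are neither ε nor s itself."""
    return {s[:i] for i in range(1, len(s))}

def is_subsequence(t: str, s: str) -> bool:
    """True if t can be formed from s by deleting zero or more positions.

    Each `ch in it` advances the shared iterator past the first match,
    so the symbols of t must be found in s in left-to-right order.
    """
    it = iter(s)
    return all(ch in it for ch in t)

print(sorted(proper_prefixes("banana"), key=len))  # ['b', 'ba', 'ban', 'bana', 'banan']
print(is_subsequence("baan", "banana"), "baan" in "banana")  # True False
```

So baan is a subsequence of banana but not a substring, because its symbols are not consecutive in banana.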

Operations on Languages

• In lexical analysis, the most important operations on languages are union, concatenation, and closure, which are defined as follows.

• The union of L1 and L2, written L1 ∪ L2, is simply the set-theoretic union, i.e., it consists of all words (strings) in either L1 or L2.

Ex: The union of {English sentences with one, three, or five words} with {English sentences with two or four words} is {English sentences with one to five words}.

Operations on Languages..
• The concatenation of L1 and L2, written L1L2, is the set of all strings st, where s is a string of L1 and t is a string of L2.

Ex: The concatenation of {a,b,c} and {1,2} is {a1,a2,b1,b2,c1,c2}.

• As with strings, it is natural to define powers of a language L:
$L^0 = \{\varepsilon\}$, the language containing only the empty string, which is not φ (the empty language);
$L^{i+1} = L^i L$ for $i \ge 0$.
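For finite languages, these operations map directly onto Python sets. The sketch below assumes finite languages, and the names `union`, `concat`, and `power` are our own. It also shows that L^0 is {ε} even when L is the empty language, while L^1 is then empty.

```python
from itertools import product

def union(L1: set[str], L2: set[str]) -> set[str]:
    """L1 ∪ L2: all strings in L1 or in L2."""
    return L1 | L2

def concat(L1: set[str], L2: set[str]) -> set[str]:
    """L1L2: all strings st with s in L1 and t in L2."""
    return {s + t for s, t in product(L1, L2)}

def power(L: set[str], i: int) -> set[str]:
    """L^i, using L^0 = {ε} and L^(i+1) = L^i L."""
    result = {""}                  # L^0 = {ε}, which is not the empty set
    for _ in range(i):
        result = concat(result, L)
    return result

print(sorted(union({"a", "b"}, {"b", "c"})))          # ['a', 'b', 'c']
print(sorted(concat({"a", "b", "c"}, {"1", "2"})))    # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
print(sorted(power({"a", "b"}, 2)))                   # ['aa', 'ab', 'ba', 'bb']
print(power(set(), 0), power(set(), 1))               # {''} set()
```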

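Operations on Languages..
• The (Kleene) closure of L, denoted L*, is $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$, i.e., the set of strings formed by concatenating zero or more strings of L.

• The positive closure of L, denoted L+, is $L^+ = L^1 \cup L^2 \cup \cdots$.

• Ex:
{a,b}* is {ε,a,b,aa,ab,ba,bb,aaa,aab,aba,abb,baa,bab,bba,bbb,...}
{a,b}+ is {a,b,aa,ab,ba,bb,aaa,aab,aba,abb,baa,bab,bba,bbb,...}
{ε,a,b}* is {ε,a,b,aa,ab,ba,bb,...}.
{ε,a,b}+ is the same as {ε,a,b}*, because ε is already in $L^1$ = {ε,a,b}, so dropping the $L^0$ term changes nothing.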
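L* is infinite whenever L contains a nonempty string, so a program can only produce a finite slice of it. The sketch below uses hypothetical helper names. It builds L^0, L^1, L^2, ... breadth-first and keeps only the strings of length at most `max_len`. Computing L+ as L*L is valid because L*L = L^1 ∪ L^2 ∪ ....

```python
def concat(L1: set[str], L2: set[str]) -> set[str]:
    """L1L2: all strings st with s in L1 and t in L2."""
    return {s + t for s in L1 for t in L2}

def kleene_closure(L: set[str], max_len: int) -> set[str]:
    """The strings of L* = L^0 ∪ L^1 ∪ L^2 ∪ ... of length <= max_len."""
    result = {""}        # L^0 = {ε}
    frontier = {""}      # strings first reached in the previous round
    while frontier:
        # Extend by one more string of L; keep only new, short-enough strings.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

def positive_closure(L: set[str], max_len: int) -> set[str]:
    """The strings of L+ = L^1 ∪ L^2 ∪ ... of length <= max_len, via L+ = L*L."""
    return {w for w in concat(kleene_closure(L, max_len), L) if len(w) <= max_len}

print(sorted(kleene_closure({"a", "b"}, 2), key=lambda w: (len(w), w)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(sorted(positive_closure({"a", "b"}, 2), key=lambda w: (len(w), w)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']
print(kleene_closure({"", "a", "b"}, 3) == positive_closure({"", "a", "b"}, 3))  # True
```

The last line checks, up to length 3, that {ε,a,b}+ and {ε,a,b}* coincide.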

Regular Expressions
• A regular expression is a sequence of characters that forms a search pattern, used mainly for pattern matching on strings.

• Informally, the regular expressions over an alphabet are built from the symbols of the alphabet using union, concatenation, and *. Stating this precisely takes a recursive definition.

• Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions.

Regular Expressions..
• The rules that define the regular expressions over some alphabet Σ, and the languages that those expressions denote, are:

• BASIS: There are two rules that form the basis:

1. ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.

2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with a in its one position.

Regular Expressions...
• INDUCTION: There are four parts to the induction whereby larger regular expressions are built from smaller ones.

Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.

1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r); that is, putting an extra pair of parentheses around an expression does not change the language it denotes.
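The basis and induction rules translate into a recursive function over a small expression tree. In this sketch, the classes `Eps`, `Sym`, `Alt`, `Cat`, and `Star` are our own names for the forms ε, a, (r)|(s), (r)(s), and (r)*. Rule 4 needs no class because the tree structure already fixes the grouping. Because L(r) may be infinite, `lang(e, n)` returns only the strings of L(e) whose length is at most n.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Eps:            # BASIS 1: ε
    pass

@dataclass(frozen=True)
class Sym:            # BASIS 2: a symbol a in Σ
    a: str

@dataclass(frozen=True)
class Alt:            # INDUCTION 1: (r)|(s)
    r: "Regex"
    s: "Regex"

@dataclass(frozen=True)
class Cat:            # INDUCTION 2: (r)(s)
    r: "Regex"
    s: "Regex"

@dataclass(frozen=True)
class Star:           # INDUCTION 3: (r)*
    r: "Regex"

Regex = Union[Eps, Sym, Alt, Cat, Star]

def lang(e: Regex, n: int) -> set[str]:
    """The strings of L(e) whose length is at most n."""
    if isinstance(e, Eps):
        return {""}                                    # L(ε) = {ε}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()       # L(a) = {a}
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)             # L(r) ∪ L(s)
    if isinstance(e, Cat):                             # L(r)L(s)
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):                            # (L(r))*, built breadth-first
        base = lang(e.r, n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")

a, b = Sym("a"), Sym("b")
print(sorted(lang(Cat(Alt(a, b), Alt(a, b)), 3)))                          # ['aa', 'ab', 'ba', 'bb']
print(sorted(lang(Alt(a, Cat(Star(a), b)), 3), key=lambda w: (len(w), w)))  # ['a', 'b', 'ab', 'aab']
```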

Regular Expressions...
• Regular expressions often contain unnecessary pairs of parentheses. We may drop certain pairs of parentheses if we adopt the following conventions:

a) The unary operator * has highest precedence and is left associative.
b) Concatenation has second highest precedence and is left associative.
c) | has lowest precedence and is left associative.

For example, under these conventions (a)|((b)*(c)) may be written as a|b*c.
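A recursive-descent parser encodes these precedence and associativity conventions directly. It has one function per precedence level, with the loosest operator (|) at the top, and loops that fold operands to the left. This is a minimal sketch that assumes single-character symbols, no ε, and no whitespace. The names `Parser` and `parenthesize` are hypothetical. It rewrites an expression into a fully parenthesized form, so the grouping the conventions imply becomes visible.

```python
class Parser:
    """Recursive-descent parser; one method per precedence level, loosest first:
         alt  := cat ('|' cat)*      | binds loosest, left associative
         cat  := star star*          concatenation, left associative
         star := atom '*'*           * binds tightest
         atom := symbol | '(' alt ')'
    """

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def eat(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def alt(self) -> str:
        left = self.cat()
        while self.peek() == "|":
            self.eat("|")
            left = f"({left}|{self.cat()})"    # fold leftwards: left associative
        return left

    def cat(self) -> str:
        left = self.star()
        while self.peek() not in ("", "|", ")"):
            left = f"({left}{self.star()})"    # fold leftwards: left associative
        return left

    def star(self) -> str:
        e = self.atom()
        while self.peek() == "*":
            self.eat("*")
            e = f"({e})*"
        return e

    def atom(self) -> str:
        ch = self.peek()
        if ch == "(":
            self.eat("(")
            e = self.alt()
            self.eat(")")
            return e
        if ch in ("", "|", ")", "*"):
            raise ValueError(f"unexpected {ch!r} at position {self.pos}")
        self.pos += 1
        return ch

def parenthesize(text: str) -> str:
    p = Parser(text)
    result = p.alt()
    if p.pos != len(text):
        raise ValueError(f"unexpected {p.peek()!r} at position {p.pos}")
    return result

print(parenthesize("a|b*c"))  # (a|((b)*c))
print(parenthesize("a|b|c"))  # ((a|b)|c)
```

Placing `alt` above `cat` and `cat` above `star` is what gives * the tightest binding and | the loosest. The `while` loops that wrap the accumulated `left` make each binary operator left associative.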

Regular Expressions...
• Ex. Let Σ = {a, b}.

1. The regular expression a|b denotes the language {a, b}.

2. (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ. Another regular expression for the same language is aa|ab|ba|bb.

3. a* denotes the language consisting of all strings of zero or more a's, that is, {ε, a, aa, aaa, ...}.

Regular Expressions...
4. (a|b)* denotes the set of all strings consisting of zero or more instances of a or b, that is, all strings of a's and b's: {ε, a, b, aa, ab, ba, bb, aaa, ...}. Another regular expression for the same language is (a*b*)*.

5. a|a*b denotes the language {a, b, ab, aab, aaab, ...}, that is, the string a and all strings consisting of zero or more a's and ending in b.
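Python's `re` module uses the same notation for |, concatenation, *, and parentheses. That means the examples above can be spot-checked by brute force: enumerate every string over Σ = {a, b} up to some length and see which ones `re.fullmatch` accepts. Agreement on all strings up to a bounded length is evidence that two expressions are equivalent, not a proof. The helper names `strings_over` and `language` are our own.

```python
import re
from itertools import product

def strings_over(sigma: str, max_len: int):
    """Yield every string over the alphabet sigma of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(sigma, repeat=n):
            yield "".join(symbols)

def language(pattern: str, max_len: int, sigma: str = "ab") -> set[str]:
    """Strings over sigma of length <= max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    return {w for w in strings_over(sigma, max_len) if rx.fullmatch(w)}

def by_len(w: str):
    return (len(w), w)

print(sorted(language("a|b", 3)))                                # ['a', 'b']
print(language("(a|b)(a|b)", 3) == language("aa|ab|ba|bb", 3))  # True
print(sorted(language("a*", 3), key=by_len))                     # ['', 'a', 'aa', 'aaa']
print(language("(a|b)*", 4) == language("(a*b*)*", 4))          # True
print(sorted(language("a|a*b", 3), key=by_len))                  # ['a', 'b', 'ab', 'aab']
```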

Thank You
