
ToA - Lecture 03 04 - Language Preliminaries Regular Expressions

Uploaded by

wosqa nisar
Lecture – 03 & 04

Language Preliminaries & Regular Expressions


Theory of Formal Languages
• An area of theoretical computer science dealing with questions concerning syntax.
• Language – a set of words
• Word – a sequence of symbols from some alphabet
• Alphabet – a set of symbols (or letters)
• Words and languages appear in computer science on many levels:
• Representation of input and output data
• Representation of programs
• Manipulation of character strings or files
Theory of Formal Languages - Motivation
• Examples of problem types where the theory of formal languages is useful:
• Construction of Compilers:
• Lexical analysis
• Syntactic analysis

• Searching in Text:
• Searching for a given text pattern
• Searching for a part of text specified by a regular expression
Alphabet, Word
• Alphabet – a nonempty finite set of symbols
• Example:
• Word – a finite sequence of symbols from the given alphabet
• Example:
• The set of all words over alphabet $\Sigma$ is denoted by $\Sigma^*$.
• For variables whose values are words, we will use names such as $w$, $u$, $v$, $x$, $y$, $z$, etc., possibly with indexes
(e.g., $w_1$, $w_2$)
• So, when we write, say, $w = abba$, it means that the value of variable $w$ is the word $abba$.
• Similarly, the notation $w \in \Sigma^*$ means that the value of variable $w$ is some word consisting of
symbols belonging to alphabet $\Sigma$.
Formal Languages
• A (formal) language $L$ over an alphabet $\Sigma$ is a subset of $\Sigma^*$, i.e., $L \subseteq \Sigma^*$

• Example:
• Language
• Language
Encoding of Input and Output
• Inputs and outputs of an algorithm can be encoded as words over some alphabet $\Sigma$.

• Example: for “Sorting” problem, we can take alphabet .


• An example of input data (as a word over alphabet $\Sigma$):

• and the corresponding output data (as a word over alphabet $\Sigma$):

• Remark: It is often the case that only some words over the given alphabet represent valid
input or output.
Encoding of Input and Output
• Example: If the input for a given problem is a graph, it can be represented as a pair of two
lists – a list of nodes and a list of edges:
• For example, the following graph

• could be represented as a word

• over alphabet .
Correspondence between Recognizing Formal Languages
and Decision Problems
• There is a close correspondence between recognizing words from a given language and
decision problems:
• For each language $L$ over some alphabet $\Sigma$, there is a corresponding decision problem:

• Input: A word $w$ over alphabet $\Sigma$.


• Question: Does $w$ belong to $L$?

• For each decision problem $P$ whose inputs are encoded as words over alphabet $\Sigma$, there is a corresponding
language:
• The language consisting of exactly those words over alphabet $\Sigma$ for which the answer to the question stated in
problem $P$ is “Yes”.
Correspondence between Recognizing Formal Languages
and Decision Problems
• Example: The following decision problem can be viewed as the language L given below and
vice versa.

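• Problem
• Input: A word $w$ over alphabet $\{a, b\}$.
• Question: Does the word $w$ contain an even number of occurrences of symbol $b$?

• Language: $L = \{ w \in \{a, b\}^* \mid w \text{ contains an even number of occurrences of symbol } b \}$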
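• Sketch: the correspondence can be illustrated in Python. This is a minimal sketch, assuming the alphabet $\{a, b\}$; the function name in_language is hypothetical and chosen only for illustration. Answering the decision problem for an input word is the same as testing whether the word belongs to the language.

```python
def in_language(w: str) -> bool:
    """Decide whether w contains an even number of occurrences of 'b'."""
    # Assumed alphabet {a, b}: words with other symbols are not valid inputs.
    if any(symbol not in "ab" for symbol in w):
        raise ValueError("word is not over the alphabet {a, b}")
    return w.count("b") % 2 == 0


print(in_language("abba"))  # two occurrences of b
print(in_language("ab"))    # one occurrence of b
```

• The set of all words for which in_language returns True is exactly the language of the decision problem.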
Models of Computation
• We can consider different types of machines that are able to perform an algorithm.
• There can be many kinds of differences between these types of machines:
• What types of instructions they can execute
• What types of data they can store in their memory and how this memory is organized

• Different kinds of such machines are called models of computation.


• Very simple machines of this kind are usually called automata in
formal language theory.
• In this course we will see several types of such automata.
Models of Computation
• For different models of computation, we analyze, for example:
• What algorithmic problems can be solved by such machines and what languages they can recognize.
• How efficiently they can execute different algorithms
• How machines of a certain type can simulate the computations of some other type of machines
• How the number of instructions executed by the machine in such a simulation grows compared
to the original machine
• …
Alphabet
• An alphabet is a finite, non-empty set of symbols
• We use the symbol $\Sigma$ (sigma) to denote an alphabet
• Examples:
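• Binary: $\Sigma = \{0, 1\}$
• All lower-case letters: $\Sigma = \{a, b, c, \ldots, z\}$
• Alphanumeric: $\Sigma = \{a, \ldots, z, A, \ldots, Z, 0, \ldots, 9\}$
• DNA molecule letters: $\Sigma = \{a, c, g, t\}$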
• …
Strings
• A string or word is a finite sequence of symbols chosen from $\Sigma$
• The empty string is denoted $\varepsilon$ (“epsilon”)
• Length of a string $w$, denoted by $|w|$, is equal to the number of (non-$\varepsilon$) characters in the string

• $xy$ = concatenation of two strings $x$ and $y$


Powers of an Alphabet
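• Let $\Sigma$ be an alphabet.
• $\Sigma^k$ = the set of all strings of length $k$ over $\Sigma$
• $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$
• $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots$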
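• Sketch: taking $\Sigma^k$ to mean the set of all strings of length $k$ over $\Sigma$ (the standard definition), the powers of a small alphabet can be enumerated in Python. The function name alphabet_power and the binary alphabet are assumptions made for illustration.

```python
from itertools import product


def alphabet_power(sigma, k):
    """Return the set of all strings of length k over the alphabet sigma."""
    # product(..., repeat=k) yields every k-tuple of symbols; join each into a string.
    return {"".join(symbols) for symbols in product(sorted(sigma), repeat=k)}


sigma = {"0", "1"}
print(sorted(alphabet_power(sigma, 0)))  # [''] : only the empty string
print(sorted(alphabet_power(sigma, 2)))  # ['00', '01', '10', '11']
```

• The set of all strings over the alphabet is the union of these powers for every $k \geq 0$; it is infinite, so a program can only enumerate it up to some chosen length.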
Languages
• $L$ is said to be a language over alphabet $\Sigma$ only if $L \subseteq \Sigma^*$
• this is because $\Sigma^*$ is the set of all strings (of all possible lengths, including $0$) over the given alphabet $\Sigma$

• Examples:
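• Let $L$ be the language of all strings consisting of $n$ $0$’s followed by $n$ $1$’s:
$L = \{\varepsilon, 01, 0011, 000111, \ldots\}$
• Let $L$ be the language of all strings with an equal number of $0$’s and $1$’s:
$L = \{\varepsilon, 01, 10, 0011, 1100, 0101, 1010, 1001, 0110, \ldots\}$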

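• Definition: $\emptyset$ denotes the Empty Language


• Let $L = \{\varepsilon\}$; is $L = \emptyset$?
• No: $L$ contains exactly one word (the empty string), whereas $\emptyset$ contains no words at all.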
The Membership Problem
• Given a string $w \in \Sigma^*$ and a language $L$ over $\Sigma$, decide whether $w \in L$.

• Example:
• Let
• Is ?
Concatenation of Words
• One of the operations we can perform on words is concatenation:
• For example, the concatenation of words $ab$ and $ba$ is the word $abba$.
• The operation of concatenation is denoted by symbol $\cdot$ (it is similar to multiplication). This
symbol can be omitted.
• So, for $u, v \in \Sigma^*$, the concatenation of words $u$ and $v$ is written as $u \cdot v$ or just $uv$.

• Remark: Formally, the concatenation of words over alphabet $\Sigma$ is a function of type
$\Sigma^* \times \Sigma^* \to \Sigma^*$

Concatenation of Words
• Concatenation is associative, i.e., for every three words $u$, $v$, and $w$, we have
$(u \cdot v) \cdot w = u \cdot (v \cdot w)$
• which means that we can omit parentheses when we write multiple concatenations. For example, we
can write $u \cdot v \cdot w$ instead of $(u \cdot v) \cdot w$.
• Word $\varepsilon$ is a neutral element for the operation of concatenation, so for every word $w$ we also
have:
$\varepsilon \cdot w = w \cdot \varepsilon = w$
• Remark: It is obvious that if the given alphabet contains at least two different symbols, the
operation of concatenation is not commutative, e.g., $a \cdot b \neq b \cdot a$
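• Sketch: Python strings behave like words under concatenation, so these properties can be checked on concrete words. The words chosen below are arbitrary illustrations, and the empty Python string plays the role of the empty word.

```python
EPSILON = ""  # the empty word

u, v, w = "ab", "ba", "b"

# Associativity: the grouping of concatenations does not matter.
assert (u + v) + w == u + (v + w)

# The empty word is a neutral element on both sides.
assert EPSILON + w == w + EPSILON == w

# Concatenation is not commutative once there are two different symbols.
assert "a" + "b" != "b" + "a"

print((u + v) + w)  # abbab
```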
Power of a Word
• For an arbitrary word $w \in \Sigma^*$ and arbitrary $k \in \mathbb{N}$ we can define the word $w^k$ as the word obtained by concatenating
$k$ copies of the word $w$.
• Example:
• For $w = ab$, we have $w^3 = ababab$.

• A slightly more formal definition looks as follows:
$w^0 = \varepsilon$
$w^{k+1} = w^k \cdot w$ for $k \in \mathbb{N}$
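• Sketch: the power of a word can be written as a recursive Python function, following the usual inductive scheme (power 0 is the empty word, power $k+1$ is power $k$ followed by one more copy). The name word_power is hypothetical.

```python
def word_power(w: str, k: int) -> str:
    """Return the word obtained by concatenating k copies of w."""
    if k < 0:
        raise ValueError("k must be a natural number")
    if k == 0:
        return ""  # base case: the empty word
    return word_power(w, k - 1) + w  # inductive step: one more copy of w


print(word_power("ab", 3))  # ababab
```

• In everyday Python the same result is obtained with the built-in repetition operator, e.g. "ab" * 3.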
Reverse of a Word
• The reverse of a word $w$ is the word $w$ written backwards (in the opposite order).
• The reverse of a word $w$ is denoted $w^R$.
• Example:

• So, if $w = a_1 a_2 \cdots a_n$ (where $a_i \in \Sigma$) then $w^R = a_n a_{n-1} \cdots a_1$.


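• We can define $w^R$ using the following inductively defined function
• $\mathrm{rev} : \Sigma^* \to \Sigma^*$ as the value $\mathrm{rev}(w)$.
• The function $\mathrm{rev}$ is defined as follows:
$\mathrm{rev}(\varepsilon) = \varepsilon$
• for $a \in \Sigma$ and $w \in \Sigma^*$ it holds that $\mathrm{rev}(a \cdot w) = \mathrm{rev}(w) \cdot a$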
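• Sketch: an inductive definition of the reverse translates directly into a recursive function. The version below assumes the usual scheme in which the reverse of the empty word is the empty word and the reverse of a symbol followed by a word is the reverse of that word followed by the symbol.

```python
def rev(w: str) -> str:
    """Return the reverse of the word w, defined by recursion on w."""
    if w == "":
        return ""  # the reverse of the empty word is the empty word
    first, rest = w[0], w[1:]
    return rev(rest) + first  # reverse the rest, then append the first symbol


print(rev("abc"))  # cba
```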


Prefix, Suffix & Subword
• Prefix
• A word $x$ is a prefix of a word $y$ if there exists a word $v$ such that $y = xv$.
• Prefixes of the word $abaab$ are $\varepsilon, a, ab, aba, abaa, abaab$.

• Suffix
• A word $x$ is a suffix of a word $y$ if there exists a word $u$ such that $y = ux$.
• Suffixes of the word $abaab$ are $\varepsilon, b, ab, aab, baab, abaab$.

• Subword
• A word $x$ is a subword of a word $y$ if there exist words $u$ and $v$ such that $y = uxv$.
• Subwords of the word $abaab$ are $\varepsilon, a, b, ab, ba, aa, aba, baa, aab, abaa, baab, abaab$.
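• Sketch: prefixes, suffixes and subwords of a word can be enumerated with Python slices. The function names and the sample word abaab are assumptions chosen for illustration.

```python
def prefixes(w: str) -> set:
    """All words x such that w = x + v for some word v."""
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w: str) -> set:
    """All words x such that w = u + x for some word u."""
    return {w[i:] for i in range(len(w) + 1)}


def subwords(w: str) -> set:
    """All words x such that w = u + x + v for some words u and v."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


word = "abaab"
print(sorted(prefixes(word), key=len))
print(sorted(suffixes(word), key=len))
print(len(subwords(word)))  # 12 distinct subwords, including the empty word
```

• Every prefix and every suffix is also a subword, and the empty word and the word itself are always among all three.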
Operations of Languages
• Let us say we have already described some languages. We can create new languages from
these languages using different operations on languages.
• So, the description of a complicated language can be decomposed: the language is described as the
result of applying some operations to simpler languages.
• Examples of important operations on languages:
• Union
• Intersection
• Complement
• Concatenation
• Iteration

• Remark: It is assumed that the languages involved in these operations use the same alphabet $\Sigma$.
Set Operation of Languages
• Since languages are sets, we can apply any set operations to them:
• Union:
• $L_1 \cup L_2$ is the language consisting of the words belonging to language $L_1$ or to language $L_2$ (or to both).

• Intersection
• $L_1 \cap L_2$ is the language consisting of the words belonging to language $L_1$ and to language $L_2$.

• Complement
• $\overline{L_1}$ is the language containing those words from $\Sigma^*$ that do not belong to $L_1$.

• Difference
• $L_1 - L_2$ is the language containing those words of $L_1$ that do not belong to $L_2$.

• Remark: We assume that $L_1, L_2 \subseteq \Sigma^*$ for some given alphabet $\Sigma$
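• Sketch: finite languages can be modelled as Python sets of strings, so the set operations are available directly. The two sample languages and the helper all_words are assumptions made for illustration. Because the complement is taken relative to the infinite set of all words, the sketch only computes the part of the complement up to a chosen maximum length.

```python
from itertools import product


def all_words(sigma, max_len):
    """All words over sigma of length at most max_len."""
    return {
        "".join(symbols)
        for k in range(max_len + 1)
        for symbols in product(sorted(sigma), repeat=k)
    }


sigma = {"a", "b"}
L1 = {"", "a", "ab"}
L2 = {"a", "b", "ba"}

union = L1 | L2
intersection = L1 & L2
difference = L1 - L2
# Words of length at most 2 that do not belong to L1.
complement_upto_2 = all_words(sigma, 2) - L1

print(sorted(union), sorted(intersection), sorted(difference))
print(sorted(complement_upto_2))
```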


Concatenation of Languages
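• Concatenation of languages $L_1$ and $L_2$, where $L_1, L_2 \subseteq \Sigma^*$, is the language $L \subseteq \Sigma^*$ such that for each $w \in \Sigma^*$ it holds that
$w \in L \iff (\exists u \in L_1)(\exists v \in L_2)(w = u \cdot v)$
• The concatenation of languages $L_1$ and $L_2$ is denoted $L_1 \cdot L_2$.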


• Example:

• The language $L_1 \cdot L_2$ contains the following words:

• Remark: Note that the concatenation of languages is associative, i.e., for arbitrary
languages $L_1, L_2, L_3$ it holds that: $L_1 \cdot (L_2 \cdot L_3) = (L_1 \cdot L_2) \cdot L_3$
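• Sketch: for finite languages, concatenation is a set comprehension over all pairs of words, one from each language. The function name concat and the sample languages are assumptions made for illustration.

```python
def concat(L1, L2):
    """Return the language of all words u + v with u in L1 and v in L2."""
    return {u + v for u in L1 for v in L2}


L1 = {"a", "ab"}
L2 = {"", "b"}
L3 = {"c"}

print(sorted(concat(L1, L2)))  # ['a', 'ab', 'abb']

# Associativity on this example.
assert concat(L1, concat(L2, L3)) == concat(concat(L1, L2), L3)
```

• Note that the result has three words rather than four, because the word ab arises from two different pairs.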
Power of a Language
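• Notation $L^k$, where $L \subseteq \Sigma^*$ and $k \in \mathbb{N}$, denotes the concatenation of the form $L \cdot L \cdots L$ where the language $L$ occurs
$k$ times, i.e., $L^1 = L$, $L^2 = L \cdot L$, $L^3 = L \cdot L \cdot L$, ...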

• Example: For , the language contains the following words:

• Formally, the power of a language $L$, denoted $L^k$, can be defined using the following inductive
definition:
• $L^0 = \{\varepsilon\}$, and $L^{k+1} = L^k \cdot L$ for $k \in \mathbb{N}$
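• Sketch: the power of a finite language can be computed by repeated concatenation, assuming the usual inductive scheme in which power 0 is the language containing only the empty word. The names concat and language_power and the sample language are hypothetical.

```python
def concat(L1, L2):
    """Return the language of all words u + v with u in L1 and v in L2."""
    return {u + v for u in L1 for v in L2}


def language_power(L, k):
    """Return the concatenation of k copies of the language L."""
    if k < 0:
        raise ValueError("k must be a natural number")
    result = {""}  # power 0: only the empty word
    for _ in range(k):
        result = concat(result, L)  # one more copy of L
    return result


L = {"a", "bb"}
print(sorted(language_power(L, 2)))  # ['aa', 'abb', 'bba', 'bbbb']
```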
Finite Automata
• Some Applications
• Software for designing and checking the behavior of digital circuits
• Lexical analyzer of a typical compiler
• Software for scanning large bodies of text (e.g., web pages) for pattern finding
• Software for verifying systems of all types that have a finite number of states (e.g., stock market
transaction, communication/network protocol)
Defining Languages
• Languages can be defined in different ways: by a descriptive definition, by a recursive
definition, by a regular expression (RE), by a finite automaton, etc.
• Descriptive Definition of Language:
• The language is defined by describing the conditions imposed on its words.
• Example:
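• The language $L$ of strings of odd length, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid |w| \text{ is odd} \}$
• The language $L$ of strings that do not start with $a$, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid w \text{ does not start with } a \}$
• The language $L$ of strings of length 2, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid |w| = 2 \}$
• The language $L$ of strings of length 3 ending in 0, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid |w| = 3 \text{ and } w \text{ ends in } 0 \}$
• The language EQUAL of strings with number of a’s equal to number of b’s, defined over $\Sigma$, can be written as $\text{EQUAL} = \{ w \in \Sigma^* \mid \text{the number of } a\text{’s in } w \text{ equals the number of } b\text{’s in } w \}$
• The language EVEN-EVEN of strings with even number of a’s and even number of b’s, defined over $\Sigma$, can be written
as $\text{EVEN-EVEN} = \{ w \in \Sigma^* \mid w \text{ contains an even number of } a\text{’s and an even number of } b\text{’s} \}$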
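• Sketch: a descriptive definition states a condition on words, so each such language corresponds to a Python predicate. The function names are hypothetical, and the alphabet $\{a, b\}$ is an assumption made for illustration.

```python
def is_odd_length(w: str) -> bool:
    """Condition for the language of strings of odd length."""
    return len(w) % 2 == 1


def in_equal(w: str) -> bool:
    """Condition for EQUAL: as many a's as b's."""
    return w.count("a") == w.count("b")


def in_even_even(w: str) -> bool:
    """Condition for EVEN-EVEN: an even number of a's and of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


for word in ["", "ab", "abba", "aab"]:
    print(repr(word), in_equal(word), in_even_even(word))
```

• The empty word satisfies both EQUAL and EVEN-EVEN, since zero is even and zero equals zero.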
Palindrome
• The language consisting of $\varepsilon$ and the strings $s$ defined over $\Sigma$ such that $s^R = s$, i.e., $s$ reads the same backwards as forwards
• Note that the words of PALINDROME are called palindromes.
• English language example:
• EYE, RADAR, LEVEL, NOON, etc.
• Example:
• For $\Sigma = \{a, b\}$, PALINDROME = $\{\varepsilon, a, b, aa, bb, aaa, aba, bab, bbb, \ldots\}$
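• Sketch: a word is a palindrome exactly when it equals its own reverse, which gives a one-line test in Python. The enumeration below assumes the alphabet $\{a, b\}$ and lists the palindromes of length at most 3; the function names are hypothetical.

```python
from itertools import product


def is_palindrome(s: str) -> bool:
    """A word is a palindrome when it equals its own reverse."""
    return s == s[::-1]


def palindromes_up_to(sigma, max_len):
    """List the palindromes over sigma of length at most max_len, shortest first."""
    words = (
        "".join(symbols)
        for k in range(max_len + 1)
        for symbols in product(sorted(sigma), repeat=k)
    )
    return [w for w in words if is_palindrome(w)]


print(palindromes_up_to({"a", "b"}, 3))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']
```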
Regular Expressions
• Offers a declarative way to express the pattern of any string we want to accept
• Example:
• Automata more machine-like
• <input: string, output: [accept/reject]>
• Regular expressions more program syntax-like
• Unix environments heavily use regular expressions
• bash shell, grep, vi & other editors, sed
• Perl scripting – good for string processing
• Lexical analyzers such as Lex or Flex
Regular Expressions

• [Diagram: Regular Expressions, Finite Automata (DFA, NFA), Regular Languages]
Regular Expressions - Definitions
• Regular expressions are:
• An algebraic way to describe languages (they describe exactly the regular languages).
• If E is a regular expression, then L(E) is the language it defines.
• We’ll describe RE’s and their languages recursively.

• Basis
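• Basis 1: If $a$ is any symbol, then $a$ is an RE, and $L(a) = \{a\}$.
• Basis 2: $\varepsilon$ is an RE, and $L(\varepsilon) = \{\varepsilon\}$.
• Basis 3: $\emptyset$ is an RE, and $L(\emptyset) = \emptyset$.
• Induction
• Induction 1: If $E_1$ and $E_2$ are regular expressions, then $E_1 + E_2$ is a regular expression, and $L(E_1 + E_2) = L(E_1) \cup L(E_2)$
• Induction 2: If $E_1$ and $E_2$ are regular expressions, then $E_1 E_2$ is a regular expression, and $L(E_1 E_2) = L(E_1) L(E_2)$
• Induction 3: If $E$ is an RE, then $E^*$ is an RE, and $L(E^*) = (L(E))^*$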
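• Sketch: the recursive definition can be mirrored by a small recursive evaluator. This is an illustration under stated assumptions, not a standard library API: regular expressions are encoded as nested tuples with the hypothetical tags "sym", "eps", "empty", "union", "concat" and "star", and because the language of a starred expression may be infinite, the function returns only the words of length at most max_len.

```python
def lang(expr, max_len):
    """Return the words of L(expr) that have length at most max_len."""
    kind = expr[0]
    if kind == "sym":      # basis: a single symbol
        return {expr[1]} if max_len >= 1 else set()
    if kind == "eps":      # basis: the empty string
        return {""}
    if kind == "empty":    # basis: the empty language
        return set()
    if kind == "union":    # induction: union of the two languages
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "concat":   # induction: concatenation of the two languages
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":     # induction: zero or more words of the language
        base = lang(expr[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Extend the newest words by one more word of the base language.
            frontier = {
                u + v for u in frontier for v in base if len(u + v) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# The expression 0 followed by any number of 1's.
expr = ("concat", ("sym", "0"), ("star", ("sym", "1")))
print(sorted(lang(expr, 3)))  # ['0', '01', '011']
```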
Language Operators
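• Union
• $L \cup M$ = all strings that are either in $L$ or in $M$
• Concatenation
• $LM$ = all strings that are of the form $xy$, where $x \in L$ and $y \in M$
• Kleene Closure (the * operator)
• $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$ = all strings formed by concatenating zero or more strings from $L$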
Kleene Closure – Example
• Let

• …
Kleene Closure – Special Notes
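• $L^*$ is an infinite set iff $L \neq \emptyset$ and $L \neq \{\varepsilon\}$. Why?
• If $L = \emptyset$, then $L^* = \{\varepsilon\}$. Why?
• If $L = \{\varepsilon\}$, then $L^* = \{\varepsilon\}$. Why?

• $\Sigma^*$ denotes the set of all words over an alphabet $\Sigma$


• Therefore, an abbreviated way of saying there is an arbitrary language $L$ over an alphabet $\Sigma$ is: $L \subseteq \Sigma^*$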
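• Sketch: the Kleene closure of a language with a nonempty word is infinite, so a program can only compute it up to a length bound. The function below is a minimal sketch with a hypothetical name; the language used in the last line is an arbitrary illustration. It also lets one check the two special cases: the closure of the empty language and the closure of the language containing only the empty word both consist of the empty word alone.

```python
def kleene_star_up_to(L, max_len):
    """Return the words of the Kleene closure of L of length at most max_len."""
    result = {""}    # zero words of L concatenated: the empty word
    frontier = {""}
    while frontier:
        # Append one more word of L to the words found in the previous round.
        frontier = {
            u + v for u in frontier for v in L if len(u + v) <= max_len
        } - result
        result |= frontier
    return result


print(kleene_star_up_to(set(), 5))    # {''}
print(kleene_star_up_to({""}, 5))     # {''}
print(sorted(kleene_star_up_to({"1", "00"}, 2), key=lambda w: (len(w), w)))
# ['', '1', '00', '11']
```

• Raising the bound max_len keeps adding new words whenever the language contains a nonempty word, which is the reason the closure is infinite in that case.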
Thank You 
Any Questions?
