Formal Languages and Chomsky Hierarchy
SAURABH SINGH
SCIENTIFIC ANALYSIS GROUP
DEFENCE R&D ORGANISATION, DELHI
[email protected]
1st March 2016
Language
Generative approach
A language is the set of strings generated by a grammar.
Generation process
Begin with the start symbol.
Expand with rewrite rules.
Stop when a word of the language is generated.
Recognition approach
A language is the set of strings accepted by an automaton.
Recognition process
Start in the initial state.
Move to other states, guided by the symbols of the string.
Continue until the whole string has been read and an accept/reject state is reached.
Formal language
A formal language is a set of words, that is, finite strings of symbols taken from the alphabet over which the language is defined.
Alphabet: a finite, non-empty set of symbols.
Example
\(\Sigma_1 = \{0, 1\}\)
\(\Sigma_2 = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}\)
\(\Sigma_3 = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F\}\)
\(\Sigma_4 = \{a, b, c, \dots, z\}\)
Notation
a, b, c, . . . denote symbols
Formal languages: definition and basic notions
Example
\(\Sigma^0 = \{\varepsilon\}\) for any alphabet \(\Sigma\)
\(\Sigma_1^1 = \{0, 1\}\)
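A minimal Python sketch of \(\Sigma^k\), assuming symbols are single characters and \(\varepsilon\) is represented by the empty string "" (the function name sigma_power is hypothetical). It reproduces both examples:

```python
from itertools import product


def sigma_power(alphabet, k):
    """Return Sigma^k: the set of all strings of length k over `alphabet`."""
    # product(..., repeat=0) yields exactly one empty tuple, so Sigma^0 = {""} (i.e. {ε}).
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=k)}


sigma_1 = {"0", "1"}
print(sigma_power(sigma_1, 0))          # {''}
print(sorted(sigma_power(sigma_1, 1)))  # ['0', '1']
print(sorted(sigma_power(sigma_1, 2)))  # ['00', '01', '10', '11']
```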
String operations:
vw is the concatenation of v and w
v is a prefix of w iff vy = w for some string y
v is a suffix of w iff xv = w for some string x
Example
\(\varepsilon w = w \varepsilon = w\)
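The prefix and suffix definitions can be checked directly in Python. In this sketch (function names hypothetical, \(\varepsilon\) represented as ""), the only possible witness y or x is the leftover part of w, so the existential "for some string" reduces to one comparison:

```python
def is_prefix(v, w):
    """v is a prefix of w iff v + y == w for some string y."""
    # If such a y exists it can only be the rest of w after len(v) characters.
    return len(v) <= len(w) and v + w[len(v):] == w


def is_suffix(v, w):
    """v is a suffix of w iff x + v == w for some string x."""
    # If such an x exists it can only be the first len(w) - len(v) characters of w.
    return len(v) <= len(w) and w[:len(w) - len(v)] + v == w


w = "abc"
print(is_prefix("ab", w), is_suffix("bc", w), is_prefix("bc", w))  # True True False
print("" + w == w + "" == w)  # True: the empty string is the identity for concatenation
```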
{}
Operations on languages
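Let \(L_1\) and \(L_2\) be languages over the alphabets \(\Sigma_1\) and \(\Sigma_2\), respectively.
Then:
\(L_1 \cup L_2 = \{w \mid w \in L_1 \lor w \in L_2\}\)
\(\overline{L_1} = \{w \in \Sigma_1^* \mid w \notin L_1\}\)
\(L_1^* = \{\varepsilon\} \cup L_1 \cup L_1^2 \cup \cdots\)
\(L_1 \cap L_2 = \{w \mid w \in L_1 \land w \in L_2\}\)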
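Languages are usually infinite, so a program can only show these operations on a finite slice. This Python sketch assumes finite example languages stored as sets, \(\varepsilon\) as "", and restricts complement and Kleene star to strings of length at most n; all function and variable names are hypothetical:

```python
from itertools import product


def words_upto(alphabet, n):
    """All strings over `alphabet` of length at most n (a finite slice of Sigma*)."""
    return {"".join(p) for k in range(n + 1) for p in product(sorted(alphabet), repeat=k)}


def complement_upto(lang, alphabet, n):
    """Complement of `lang`, restricted to strings of length <= n."""
    return words_upto(alphabet, n) - lang


def star_upto(lang, n):
    """L* = {ε} ∪ L ∪ L^2 ∪ ..., restricted to strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more word of L; keep only new strings.
        frontier = {u + v for u in frontier for v in lang if len(u + v) <= n} - result
        result |= frontier
    return result


L1 = {"a", "ab"}
L2 = {"ab", "b"}
print(sorted(L1 | L2))                             # ['a', 'ab', 'b']
print(sorted(L1 & L2))                             # ['ab']
print(sorted(complement_upto(L1, {"a", "b"}, 2)))  # ['', 'aa', 'b', 'ba', 'bb']
print(sorted(star_upto(L1, 3)))                    # ['', 'a', 'aa', 'aaa', 'aab', 'ab', 'aba']
```

The loop in star_upto terminates because each round adds only strings not seen before, and there are finitely many strings of length at most n.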
A grammar is a tuple G = (V,T,S,P) where
V is a finite, non-empty set of symbols called variables
(or non-terminals or syntactic categories)
T is an alphabet of symbols called terminals
\(S \in V\) is the start (or initial) symbol of the grammar
\(P\) is a finite set of productions \(\alpha \to \beta\), where \(\alpha \in (V \cup T)^+\) and \(\beta \in (V \cup T)^*\)
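The generation process (start from \(S\), apply productions, stop at a word of terminals) can be sketched as a breadth-first search over sentential forms. This Python sketch assumes single-character symbols with upper-case letters as variables, and bounds the search by a number of rewrite steps; the function names and the grammar \(S \to aSb \mid ab\) are hypothetical illustrations:

```python
def is_terminal_string(form):
    # Assumption: variables are upper-case letters, terminals are everything else.
    return not any(ch.isupper() for ch in form)


def generate(productions, start, max_steps):
    """Words derivable from `start` in at most `max_steps` rewrite steps.

    `productions` is a list of (alpha, beta) pairs meaning alpha -> beta;
    alpha must be non-empty, matching alpha in (V ∪ T)+.
    """
    words = set()
    frontier = {start}
    for _ in range(max_steps):
        next_forms = set()
        for form in frontier:
            for alpha, beta in productions:
                # Rewrite each occurrence of alpha, one occurrence at a time.
                i = form.find(alpha)
                while i != -1:
                    new_form = form[:i] + beta + form[i + len(alpha):]
                    if is_terminal_string(new_form):
                        words.add(new_form)  # a word of the language: stop expanding
                    else:
                        next_forms.add(new_form)
                    i = form.find(alpha, i + 1)
        frontier = next_forms
    return words


# Hypothetical example grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(generate(P, "S", 3), key=len))  # ['ab', 'aabb', 'aaabbb']
```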
Type-3 grammars generate regular languages. Each production must have a single non-terminal on the left-hand side and, on the right-hand side, either a single terminal or a single terminal followed by a single non-terminal.
The productions must be in the form \(X \to a\) or \(X \to aY\),
where \(X, Y \in N\) (non-terminals)
and \(a \in T\) (terminal).
The rule \(S \to \varepsilon\) is allowed if \(S\) does not appear on the right side of any rule.
Example
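\(S \to aB\)
\(B \to bB\)
\(B \to \varepsilon\)
What language does this define? \(ab^*\): an a followed by zero or more b's. Strictly, \(B \to \varepsilon\) is not of the form \(X \to a\) or \(X \to aY\); the same language is generated in that form by \(S \to a \mid aB\), \(B \to b \mid bB\).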
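A Type-3 grammar can also be used for recognition: each non-terminal acts as an NFA state, \(X \to aY\) is a move from X to Y on a, and \(X \to a\) is a move to an extra accepting state. This Python sketch assumes single-character symbols and "" for \(\varepsilon\); the names FINAL, accepts and the two grammar dictionaries are hypothetical. It runs the slide's grammar and the strict-form grammar \(S \to a \mid aB\), \(B \to b \mid bB\) side by side:

```python
FINAL = None  # extra accepting state used for productions of the form X -> a


def accepts(productions, start, word):
    """Recognize `word` with a Type-3 grammar by simulating its equivalent NFA.

    `productions` maps each non-terminal X to a list of right-hand sides:
    "a" for X -> a, "aY" for X -> aY, and "" for X -> ε.
    """
    current = {start}  # states (non-terminals) reachable after the symbols read so far
    for ch in word:
        next_states = set()
        for X in current:
            if X is FINAL:
                continue  # FINAL has no outgoing moves
            for rhs in productions[X]:
                if rhs == ch:
                    next_states.add(FINAL)   # X -> a consumes a final symbol
                elif len(rhs) == 2 and rhs[0] == ch:
                    next_states.add(rhs[1])  # X -> aY: read a, continue from Y
        current = next_states
    # Accept if we ended in FINAL or in a non-terminal with an ε-production.
    return any(X is FINAL or "" in productions[X] for X in current)


slide_grammar = {"S": ["aB"], "B": ["bB", ""]}         # S -> aB, B -> bB | ε
strict_grammar = {"S": ["a", "aB"], "B": ["b", "bB"]}  # same language, strict Type-3 form
for w in ["a", "abbb", "", "ba", "aba"]:
    print(repr(w), accepts(slide_grammar, "S", w), accepts(strict_grammar, "S", w))
```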
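Finite Automata
\(\Sigma\) is the alphabet, here \(\Sigma = \{0,1\}\)
\(\delta : Q \times \Sigma \to Q\) is the transition function
\(q_0 \in Q\) is the start state
[Figure: example DFA state diagram over {0,1}]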
Build an automaton that accepts all and only those strings that contain 001.
[Figure: DFA over {0,1} with states q, q0, q00 and q001, where q001 is the accepting state]
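A Python sketch of the "contains 001" automaton as a transition table. The transitions are a reconstruction from the state names in the figure (q, q0, q00 record how much of 001 has just been matched; q001 means 001 has already occurred), not a copy of the original diagram; DELTA and dfa_accepts are hypothetical names:

```python
# Transition table δ : Q × Σ → Q for the "contains 001" automaton (reconstructed).
DELTA = {
    ("q", "0"): "q0",      ("q", "1"): "q",
    ("q0", "0"): "q00",    ("q0", "1"): "q",
    ("q00", "0"): "q00",   ("q00", "1"): "q001",
    ("q001", "0"): "q001", ("q001", "1"): "q001",  # 001 already seen: stay
}
START, ACCEPTING = "q", {"q001"}


def dfa_accepts(word):
    state = START
    for ch in word:
        state = DELTA[(state, ch)]  # exactly one move per symbol: deterministic
    return state in ACCEPTING


for w in ["001", "11001", "000", "0101", "0100"]:
    print(w, dfa_accepts(w))
```

The loop for q00 on input 0 is the one subtle choice: after reading 000 the last two symbols are still 00, so the automaton stays in q00 rather than starting over.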
Limits of Regular languages and finite automata
What types of languages can't FAs accept? In other words, what limits are there on the complexity of regular languages?
An FA has no memory beyond its finite set of states, so one part of a string cannot depend on an unbounded count taken from another part; for example, \(\{a^n b^n : n \ge 0\}\) is not regular.
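The missing-memory argument can be made concrete with the pigeonhole principle: a DFA with k states reads the k + 1 prefixes \(a^0, \dots, a^k\), so two of them, \(a^i\) and \(a^j\) with \(i < j\), end in the same state. From then on the DFA cannot tell them apart, so it treats \(a^i b^i\) (in the language) and \(a^j b^i\) (not in it) identically. In this Python sketch, run, find_collision and the 3-state DFA are hypothetical illustrations:

```python
def run(delta, start, word):
    """Final state of a DFA with transition dict `delta` after reading `word`."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state


def find_collision(delta, start, num_states):
    """Return i < j with δ*(start, a^i) == δ*(start, a^j).

    Among the num_states + 1 prefixes a^0, ..., a^num_states, two must reach
    the same state (pigeonhole principle).
    """
    seen = {}
    for n in range(num_states + 1):
        state = run(delta, start, "a" * n)
        if state in seen:
            return seen[state], n
        seen[state] = n
    raise AssertionError("impossible: more distinct states than num_states")


# Hypothetical 3-state DFA over {a, b}: it counts a's modulo 3 and ignores b's.
delta = {(k, "a"): (k + 1) % 3 for k in range(3)}
delta.update({(k, "b"): k for k in range(3)})

i, j = find_collision(delta, 0, 3)
# a^i b^i is in {a^n b^n} and a^j b^i is not, yet the DFA ends in the same state
# on both, so it must accept both or reject both.
print(i, j, run(delta, 0, "a" * i + "b" * i) == run(delta, 0, "a" * j + "b" * i))  # 0 3 True
```

The same argument works for any DFA, whatever its transitions, which is why no finite automaton accepts exactly \(\{a^n b^n\}\).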
(Type-2) Context-free grammars
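Pushdown Automaton (PDA)
finite control and a single unbounded stack
Example: a PDA for \(L = \{a^n b^n \# : n \ge 1\}\). A transition labelled \(x, Y/Z\) reads input symbol \(x\), pops \(Y\) from the top of the stack and pushes \(Z\); \(\$\) marks the bottom of the stack.
\(a, \$/A\$\): the first a pushes A above \(\$\)
\(a, A/AA\): each further a pushes another A
\(b, A/\varepsilon\): each b pops one A
\(\#, \$/\varepsilon\): # pops \(\$\) once every A has been matched
[Figure: PDA state diagram with these transitions]
Input aaabbb#: the stack goes \(\$ \to A\$ \to AA\$ \to AAA\$ \to AA\$ \to A\$ \to \$\), then # pops \(\$\): accepting.
Theory of Computation, NTUEE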
Input aaabbbb#: after three a's and three b's the stack is back to \(\$\); the fourth b needs an A on top but finds \(\$\), so no transition applies: rejecting.
Input aaabb#: after three a's and two b's the stack is \(A\$\); # needs \(\$\) on top but finds A, so no transition applies: rejecting.
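The three PDA traces for \(L = \{a^n b^n \# : n \ge 1\}\) can be replayed with a small Python simulation. The original state diagram did not survive, so the two-phase control below (an "a" phase followed by a "b" phase) is an assumption standing in for it; pda_accepts is a hypothetical name, and a Python list serves as the unbounded stack:

```python
def pda_accepts(word):
    """Simulate the PDA for L = {a^n b^n # : n >= 1}.

    Assumption: the control has an "a" phase and a "b" phase, so an a after
    any b is rejected and # is only read in the b phase.
    """
    stack = ["$"]  # list used as a stack, top at the end; $ marks the bottom
    phase = "a"
    for pos, ch in enumerate(word):
        top = stack[-1]  # never empty here: reading # (which pops $) returns at once
        if ch == "a" and phase == "a":
            stack.append("A")  # a,$/A$ and a,A/AA both leave one more A on top
        elif ch == "b" and top == "A":
            phase = "b"
            stack.pop()  # b,A/ε
        elif ch == "#" and phase == "b" and top == "$":
            stack.pop()  # #,$/ε
            return pos == len(word) - 1  # accept only if # is the last symbol
        else:
            return False  # no transition applies: reject
    return False  # input ended without reading #


for w in ["aaabbb#", "aaabbbb#", "aaabb#"]:
    print(w, pda_accepts(w))  # True, then False, then False
```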
Limits on PDAs and CFGs
Adding a stack helps, but there are still significant limits on what a PDA can accomplish: for example, \(\{a^n b^n c^n : n \ge 1\}\) is not context-free, so no PDA accepts it.
Type-1 grammars generate context-sensitive languages. The productions must be in the form \(\alpha A \beta \to \alpha \gamma \beta\),
where \(A \in N\) (non-terminal) and \(\alpha, \beta, \gamma \in (T \cup N)^*\) (strings of terminals and non-terminals).
The strings \(\alpha\) and \(\beta\) may be empty, but \(\gamma\) must be non-empty.
The rule \(S \to \varepsilon\) is allowed if \(S\) does not appear on the right side of any rule. The languages generated by these grammars are recognized by a linear bounded automaton.
Example
\(AB \to AbBc\)
\(A \to bcA\)
\(B \to b\)
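Whether a production fits the form \(\alpha A \beta \to \alpha \gamma \beta\) can be checked by trying each non-terminal on the left as the rewritten \(A\). This Python sketch assumes single-character symbols and takes \(N = \{A, B\}\) from the example; is_context_sensitive is a hypothetical name:

```python
def is_context_sensitive(lhs, rhs, nonterminals):
    """True if lhs -> rhs has the form αAβ -> αγβ with A a non-terminal and γ non-empty."""
    for i, symbol in enumerate(lhs):
        if symbol not in nonterminals:
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]  # try this occurrence as the rewritten A
        gamma_len = len(rhs) - len(alpha) - len(beta)
        # γ must be non-empty, and the context α ... β must be kept unchanged.
        if gamma_len >= 1 and rhs.startswith(alpha) and rhs.endswith(beta):
            return True
    return False


N = {"A", "B"}
example = [("AB", "AbBc"), ("A", "bcA"), ("B", "b")]
print(all(is_context_sensitive(l, r, N) for l, r in example))  # True
print(is_context_sensitive("AB", "BA", N))                    # False
```

For \(AB \to AbBc\) the check succeeds with \(\alpha = A\), rewritten symbol \(B\), \(\beta = \varepsilon\) and \(\gamma = bBc\).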
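Alternate definition (noncontracting grammars):
\(P = \{\alpha \to \beta \mid |\alpha| \le |\beta|\}\), i.e. no production makes the sentential form shorter.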
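The noncontracting condition is a simple length test on each production. A Python sketch with a hypothetical function name, applied to the Type-1 example productions:

```python
def is_noncontracting(productions):
    """Alternate definition: every production α -> β satisfies |α| <= |β|."""
    return all(len(alpha) <= len(beta) for alpha, beta in productions)


example = [("AB", "AbBc"), ("A", "bcA"), ("B", "b")]
print(is_noncontracting(example))        # True: 2 <= 4, 1 <= 3, 1 <= 1
print(is_noncontracting([("AB", "A")]))  # False: AB -> A shortens the form
```

Because no step shortens the sentential form, a derivation of a word w never passes through a form longer than |w|, which is why these languages fit in a linear bounded automaton.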
It is based on Random Access Memory
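(Type-0) Unrestricted grammars: productions \(\alpha \to \beta\) with no further restriction
Example
\(S \to ACaB\)
\(Bc \to acB\)
\(CB \to DB\)
\(aD \to Db\)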
Turing Machines
At each step, the behaviour of the machine depends on the current state of the control unit and the tape symbol at the current read position.
Depending on these, the machine may then overwrite the current tape symbol with a new symbol, shift the tape left or right by one cell, and jump to a new control state.
This repeats until (let's say) the control unit enters some final state.
Turing Machines cont.
An initial state \(i \in Q\)
A tape alphabet \(\Gamma\)
An input alphabet \(\Sigma \subseteq \Gamma\)
A blank symbol \(\sqcup \in \Gamma\)
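The step loop described above (read the symbol under the head, write, move one cell, change state, stop in a final state) can be sketched in Python. Everything here is an illustrative assumption: "_" plays the blank symbol, the tape is a dictionary so it is unbounded in both directions, and the machine INCREMENT, which adds 1 to a binary number, is a hypothetical example whose initial state is "right", with tape alphabet {0, 1, _} and input alphabet {0, 1}:

```python
BLANK = "_"  # assumption: "_" stands for the blank symbol


def run_tm(transitions, tape_input, start, finals, max_steps=10_000):
    """Run a single-tape Turing machine and return the tape contents without outer blanks.

    `transitions` maps (state, symbol) -> (new_state, symbol_to_write, move),
    with move = -1 (left) or +1 (right).
    """
    tape = dict(enumerate(tape_input))  # sparse tape: missing cells are blank
    state, head, steps = start, 0, 0
    while state not in finals:
        if steps == max_steps:
            raise RuntimeError("no final state reached within max_steps")
        symbol = tape.get(head, BLANK)
        state, write, move = transitions[(state, symbol)]  # KeyError: no move defined
        tape[head] = write
        head += move
        steps += 1
    return "".join(tape[i] for i in sorted(tape)).strip(BLANK)


# Hypothetical machine: move right to the end of the input, then add 1 with carry.
INCREMENT = {
    ("right", "0"): ("right", "0", +1),
    ("right", "1"): ("right", "1", +1),
    ("right", BLANK): ("carry", BLANK, -1),
    ("carry", "1"): ("carry", "0", -1),   # 1 + carry = 0, carry continues left
    ("carry", "0"): ("done", "1", -1),    # 0 + carry = 1, finished
    ("carry", BLANK): ("done", "1", -1),  # carry past the leftmost digit
}
print(run_tm(INCREMENT, "1011", "right", {"done"}))  # 1100
print(run_tm(INCREMENT, "111", "right", {"done"}))   # 1000
```

The max_steps guard is a practical addition: a real Turing machine may run forever, so a simulator needs some cut-off to return at all.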