At Module-3
(i) x = uvw,
(ii) v ≠ ε,
(iii) |uv| ≤ n,
(iv) uv^k w ∈ L for all k ∈ N.
To prove that a language L is not regular, we use proof by contradiction.
|y| > 0
|xy| ≤ c
For all k ≥ 0, the string xy^k z is also in L.
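Solution −
At first, we assume that L is regular and n is the number of states.
Let w = a^n b^n. Thus |w| = 2n ≥ n.
By the pumping lemma, w = xyz with |xy| ≤ n and |y| > 0, so x and y consist only of a's:
x = a^p, y = a^q, z = a^r b^n, where p + q + r = n and q ≠ 0.
Let k = 2. Then xy^2 z = a^p a^(2q) a^r b^n.
Number of a's = (p + 2q + r) = (p + q + r) + q = n + q
Hence, xy^2 z = a^(n+q) b^n. Since q ≠ 0, xy^2 z is not of the form a^n b^n.
Thus xy^2 z is not in L, which contradicts the pumping lemma. Hence L is not regular.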
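The contradiction can be checked mechanically for small n. The Python sketch below is only an illustration: the helper names in_language and every_split_fails are hypothetical, and it assumes L is the language of strings a^n b^n. It tries every split w = xyz with |xy| ≤ n and |y| > 0 and confirms that pumping with k = 2 always produces a string outside L.

```python
def in_language(s):
    """True if s has the form a^n b^n for some n >= 0 (assumed definition of L)."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half


def every_split_fails(n, k=2):
    """Check that no split w = xyz with |xy| <= n and |y| > 0 survives pumping."""
    w = "a" * n + "b" * n
    for xy_len in range(1, n + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if in_language(x + y * k + z):
                return False  # this split can be pumped, so no contradiction
    return True


# Check the claim for a few small values of n.
print(all(every_split_fails(n) for n in range(1, 8)))
```

A finite check like this is not a proof; it only illustrates the case analysis that the proof carries out for every n.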
L' denotes the language of strings that do not both begin and end with a. Splitting the
non-empty strings into four cases (A, B, C, D), we have L = (A) and L' = (B, C, D):
L
A: begins with a and ends with a
L'
B: begins with a and ends with b
C: begins with b and ends with a
D: begins with b and ends with b
L' also contains the empty string ε.
The DFA for L' is obtained by flipping the final states of DFA(L) to non-final states and vice-versa.
The DFA for L' is given below.
q0 ensures ε is accepted.
q1 ensures all strings that begin with a and end with b are accepted.
q3 ensures all strings that begin with b (ending with either a or b) are accepted.
Important Note: While specifying the DFA for L, we have also included the dead state q3. It is
important to include the dead state(s) if we are going to derive the complement DFA, since the
dead state(s) also become final under complementation. If we do not add the dead state(s)
originally, the complement will not accept all the strings it is supposed to accept.
In the above example, if we had not included q3 originally, the complement would not accept strings
starting with b. Apart from ε, it would only accept strings that begin with a and end with b, which is
only a subset of the complement.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER COMPLEMENTATION.
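The flipping of final states, and the role of the dead state, can be sketched in Python. This is a minimal illustration under assumptions: the state names q0, q1 and q3 follow the description above, while q2 (the accepting state of L) and the exact transition table are assumed, since the figure is not reproduced here.

```python
def accepts(delta, start, finals, word):
    """Run a DFA; a missing transition means the word is rejected."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in finals


# Assumed DFA for L = strings over {a, b} that begin and end with a.
STATES = {"q0", "q1", "q2", "q3"}
DELTA = {
    ("q0", "a"): "q2", ("q0", "b"): "q3",
    ("q2", "a"): "q2", ("q2", "b"): "q1",   # q2: began with a, last symbol a
    ("q1", "a"): "q2", ("q1", "b"): "q1",   # q1: began with a, last symbol b
    ("q3", "a"): "q3", ("q3", "b"): "q3",   # q3: dead state (began with b)
}
FINALS_L = {"q2"}
FINALS_COMPLEMENT = STATES - FINALS_L       # flip final and non-final states

# The same machine with the dead state left out (a partial DFA).
PARTIAL_DELTA = {k: v for k, v in DELTA.items() if "q3" not in (k[0], v)}
PARTIAL_FINALS_COMPLEMENT = (STATES - {"q3"}) - FINALS_L

print(accepts(DELTA, "q0", FINALS_COMPLEMENT, "ba"))
print(accepts(PARTIAL_DELTA, "q0", PARTIAL_FINALS_COMPLEMENT, "ba"))
```

With the dead state present, the complement accepts "ba"; with the dead state omitted, the same string is wrongly rejected, which is the point of the note above.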
2. Union
An automaton for L1 ∪ L2 can be constructed from DFA(L1) and DFA(L2) by adding a new start state
and a new final state.
The new start state connects to the two start states of DFA(L1) and DFA(L2) by ε-transitions.
Similarly, ε-transitions are added from the final states of DFA(L1) and DFA(L2) to the new
final state.
Convert this resulting NFA to its equivalent DFA.
As an exercise you can try this approach of DFA construction for union for the given example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER UNION.
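The ε-transition construction for union can be simulated directly, without converting to a DFA first. The sketch below is an illustration only: the machines for a* and b*, the state names (N, A, B, F) and the helper names are all assumptions chosen for the example.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in eps.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_accepts(delta, eps, start, finals, word):
    """Simulate an ε-NFA by tracking the set of currently possible states."""
    current = epsilon_closure({start}, eps)
    for symbol in word:
        moved = {t for q in current for t in delta.get((q, symbol), ())}
        current = epsilon_closure(moved, eps)
    return bool(current & finals)


# Assumed example machines: state A accepts a*, state B accepts b*.
DELTA = {("A", "a"): {"A"}, ("B", "b"): {"B"}}
# New start state N and new final state F, wired up with ε-transitions.
EPS = {"N": {"A", "B"}, "A": {"F"}, "B": {"F"}}
START, FINALS = "N", {"F"}

for word in ["", "aaa", "bb", "ab"]:
    print(repr(word), nfa_accepts(DELTA, EPS, START, FINALS, word))
```

The combined machine accepts a string exactly when one of the two original machines does, so "ab" is rejected even though it uses only symbols from both languages.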
3. Intersection
If L1 and L2 are regular, then L1 ∩ L2 is regular.
Since a language denotes a set of (possibly infinite) strings and we have shown above that
regular languages are closed under union and complementation, De Morgan's law can be
applied to show that regular languages are closed under intersection too.
L1 and L2 are regular ⇒ L1' and L2' are regular (by Complementation property)
L1' ∪ L2' is regular (by Union property)
(L1' ∪ L2')' is regular (by Complementation property)
L1 ∩ L2 = (L1' ∪ L2')' is regular (by De Morgan's law)
For sets A and B: (A' ∪ B')' = A'' ∩ B'' = A ∩ B
In terms of DFA, we can say that a DFA(L1 ∩ L2) accepts those strings that are accepted by both
DFA(L1) and DFA(L2).
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER INTERSECTION.
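A DFA that accepts exactly the strings accepted by both machines can also be built directly with the product construction, which is a different route from the De Morgan argument used above. The sketch below is an illustration under assumptions: the two example DFAs (strings that begin with a, strings that end with b) and all state names are hypothetical.

```python
from itertools import product


def intersect(dfa1, dfa2, alphabet):
    """Product construction: run both DFAs in lockstep on the same input."""
    states1, delta1, start1, finals1 = dfa1
    states2, delta2, start2, finals2 = dfa2
    states = set(product(states1, states2))
    delta = {
        ((p, q), c): (delta1[(p, c)], delta2[(q, c)])
        for (p, q) in states
        for c in alphabet
    }
    # A pair is final only when both components are final.
    finals = {(p, q) for (p, q) in states if p in finals1 and q in finals2}
    return states, delta, (start1, start2), finals


def accepts(dfa, word):
    _, delta, state, finals = dfa
    for c in word:
        state = delta[(state, c)]
    return state in finals


# Assumed example: L1 = strings that begin with a, L2 = strings that end with b.
BEGINS_A = (
    {"s", "yes", "no"},
    {("s", "a"): "yes", ("s", "b"): "no",
     ("yes", "a"): "yes", ("yes", "b"): "yes",
     ("no", "a"): "no", ("no", "b"): "no"},
    "s",
    {"yes"},
)
ENDS_B = (
    {"e0", "e1"},
    {("e0", "a"): "e0", ("e0", "b"): "e1",
     ("e1", "a"): "e0", ("e1", "b"): "e1"},
    "e0",
    {"e1"},
)
BOTH = intersect(BEGINS_A, ENDS_B, "ab")
print(accepts(BOTH, "aab"), accepts(BOTH, "ba"))
```

Making a pair final when either component is final would give a DFA for the union instead.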
4. Concatenation
If L1 and L2 are regular, then L1 . L2 is regular.
L1 = a*
L2 = b*
L1 . L2 = a* . b* = a*b*
L1 ∪ L2 = L1 + L2 = L1 | L2 = a* | b* (note that this is not the same language as (a|b)*)
This can be easily proved by regular expressions. If R1 is a regular expression denoting L1 and
R2 is a regular expression denoting L2, then R1 . R2 is a regular expression denoting
L1 . L2. Therefore, L1 . L2 is regular.
In terms of DFA, we can say that an automaton for L1 . L2 can be constructed by adding an
ε-transition from the final state of DFA(L1) - which now ceases to be the final state - to the
start state of DFA(L2).
You can try showing this using an example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER CONCATENATION.
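The regular-expression argument can be tried out with Python's standard re module. This is a small sketch, assuming L1 = a* and L2 = b* as the example languages; the names R1, R2, R_CONCAT and in_concat are illustrative only.

```python
import re

R1 = "a*"  # regular expression denoting L1
R2 = "b*"  # regular expression denoting L2
R_CONCAT = f"(?:{R1})(?:{R2})"  # R1 . R2, each part grouped to keep precedence


def in_concat(word):
    """True if the whole word belongs to L1 . L2."""
    return re.fullmatch(R_CONCAT, word) is not None


for word in ["", "aab", "bbb", "ba"]:
    print(repr(word), in_concat(word))
```

Every string of a's followed by b's is accepted, including the empty string, while "ba" is rejected because the order of the two parts matters in a concatenation.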
5. Kleene star
If L is regular, then L* is regular.
This can be easily proved by regular expression.
If L is regular, then there exists a regular expression R denoting L.
We know that if R is a regular expression, R* is a regular expression too. R* denotes the
language L*. Therefore L* is regular.
In terms of DFA, in the DFA(L) we add two ε transitions, one from start state to final state and
another from final state to start state. This denotes DFA(L*). You can try showing this for an
example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER KLEENE STAR.
6. Difference
If L1 and L2 are regular, then L1 - L2 is regular.
We know that L1 - L2 = L1 ∩ L2'
L1 and L2 are regular ⇒ L1 and L2' are regular (by Complementation property)
L1 ∩ L2' is regular (by Intersection property)
L1 - L2 is regular (since L1 - L2 = L1 ∩ L2')
In terms of DFA, we can say that a DFA(L1 - L2) accepts those strings that are accepted by
DFA(L1) and not accepted by DFA(L2). You can try showing this for an example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER DIFFERENCE.
7. Reverse
If L is regular, then L^R (the reverse of L) is regular.
Let DFA(L) denote the DFA of L. Make the following modifications to construct DFA(L^R).
In case there is more than one final state in DFA(L), first add a new final state and add
ε-transitions to it from the old final states (which then cease to be final states), and
then perform this step.
(ab)* = {ε, ab, abab, ababab, ...}
while the Kleene star of the union gives
(a+b)* = (a|b)* = (a ∪ b)* = {ε, a, b, aa, ab, ba, bb, …}
3.4 Introduction to Grammars and Languages:
In the literary sense of the term, grammars denote syntactical rules for conversation in
natural languages. Linguists have attempted to define grammars since the inception
of natural languages like English, Sanskrit, Mandarin, etc.
The theory of formal languages finds its applicability extensively in the fields of
Computer Science. Noam Chomsky gave a mathematical model of grammar in 1956
which is effective for writing computer languages.
Grammar
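A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
N or V_N is a set of variables or non-terminal symbols.
T is a set of terminal symbols.
S is a special variable called the start symbol, S ∈ N.
P is a set of production rules. A production rule has the form α → β, where α and β are
strings over N ∪ T and at least one symbol of α belongs to V_N.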
Example
Grammar G2 −
({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε})
Here,
S and A are Non-terminal symbols.
a and b are Terminal symbols.
ε is an empty string.
S is the Start symbol, S ∈ N
Production P : S → aAb, aA → aaAb, A → ε
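A context-free grammar (CFG) is written as G = (V, T, P, S)
Where,
V is the set of non-terminal symbols, T is the set of terminal symbols, P is the set of
production rules, and S is the start symbol.
In a CFG, the start symbol is used to derive the string. You can derive the string by
repeatedly replacing a non-terminal by the right-hand side of a production, until all
non-terminals have been replaced by terminal symbols.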
Example:
Production rules:
1. S → aSa
2. S → bSb
3. S → c
Now check that the string abbcbba can be derived from the given CFG.
1. S ⇒ aSa
2. ⇒ abSba
3. ⇒ abbSbba
4. ⇒ abbcbba
By applying the productions S → aSa and S → bSb recursively and finally applying the
production S → c, we get the string abbcbba.
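The same derivation can be reproduced step by step in Python. The function below is a minimal sketch written for this particular grammar (S → aSa | bSb | c); the name derivation is hypothetical. It peels matching outer symbols off the target string and records each sentential form.

```python
def derivation(target):
    """Return the sentential forms deriving `target` from S, or None if impossible."""
    forms = ["S"]
    prefix, suffix, core = "", "", target
    while core != "c":
        # S -> aSa or S -> bSb requires matching first and last symbols.
        if len(core) < 3 or core[0] != core[-1] or core[0] not in "ab":
            return None
        prefix += core[0]
        suffix = core[-1] + suffix
        core = core[1:-1]
        forms.append(prefix + "S" + suffix)
    forms.append(prefix + "c" + suffix)  # finally apply S -> c
    return forms


print(" => ".join(derivation("abbcbba")))
print(derivation("abcab"))
```

For "abbcbba" this yields the four steps listed above; for a string such as "abcab", whose two halves do not mirror each other, it returns None.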
Example
Let us consider the grammar −
G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )
Some of the strings that can be derived are −
S ⇒ aAb using production S → aAb
⇒ aaAbb using production aA → aaAb
⇒ aaaAbbb using production aA → aaAb
⇒ aaabbb using production A → ε
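The set of all strings that can be derived from a grammar is said to be the language
generated from that grammar. The language generated by a grammar G is a subset of ∑*
formally defined by
L(G) = {w | w ∈ ∑*, S ⇒*_G w}
where S ⇒*_G w means that w can be derived from S in zero or more steps using the productions of G.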
If L(G1) = L(G2), the Grammar G1 is equivalent to the Grammar G2.
Example
If there is a grammar
G: N = {S, A, B} T = {a, b} P = {S → AB, A → a, B → b}
Here S produces AB, and we can replace A by a, and B by b. Here, the only accepted
string is ab, i.e.,
L(G) = {ab}
Example
Suppose we have the following grammar −
G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b}
The language generated by this grammar −
L(G) = {ab, a^2 b, ab^2, a^2 b^2, ………}
= {a^m b^n | m ≥ 1 and n ≥ 1}
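The language of this grammar can be explored by enumerating all derivable strings up to a length bound. The sketch below is an illustration, not a general tool: the function name generate and the dictionary encoding of productions are assumptions, and pruning by length is only valid because this grammar has no ε-productions, so sentential forms never shrink.

```python
from collections import deque
import re


def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Assumes single-character symbols and no ε-productions.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Expand the leftmost non-terminal, if any.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


PRODUCTIONS = {"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "b"]}
strings = generate(PRODUCTIONS, "S", 4)
print(sorted(strings, key=lambda s: (len(s), s)))
print(all(re.fullmatch("a+b+", s) for s in strings))
```

Every generated string has at least one a followed by at least one b, in line with the set {a^m b^n | m ≥ 1 and n ≥ 1}.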
Example
Problem − Suppose L(G) = {a^m b^n | m ≥ 0 and n > 0}. We have to find out the grammar G which produces L(G).
Example
Problem − Suppose L(G) = {a^m b^n | m > 0 and n ≥ 0}. We have to find out the grammar
G which produces L(G).
Every string in L(G) starts with at least one a, so the production set can be taken as
S → aA, A → aA, A → B, B → bB, B → λ
S ⇒ aA ⇒ aB ⇒ aλ ⇒ a (Accepted)
Thus, we can show that every string in L(G) is generated by this production set.
Hence the grammar −
G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB })
Take a look at the following illustration. It shows the scope of each type of grammar −
Type - 3 Grammar
Type-3 grammars generate regular languages. Type-3 grammars must have a single
non-terminal on the left-hand side and a right-hand side consisting of a single terminal
or single terminal followed by a single non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
X → ε
X → a | aY
Y → b
Type - 2 Grammar
Type-2 grammars generate context-free languages.
The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
The languages generated by these grammars are recognized by a non-deterministic
pushdown automaton.
Example
S → X a
X → a
X → aX
X → abc
X → ε
Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in
the form
αAβ→αγβ
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The
languages generated by these grammars are recognized by a linear bounded
automaton.
Example
AB → AbBc
A → bcA
B → b
Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have
no restrictions. They are phrase structure grammars, which include all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form of α → β where α is a string of terminals and
nonterminals with at least one non-terminal and α cannot be null. β is a string of
terminals and non-terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
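The four definitions can be turned into a small classifier that reports the most restrictive type whose production form a grammar satisfies. This is a sketch under assumptions: symbols are single characters, ε is written as the empty string, the function names are hypothetical, and the start symbol of the Type-1 example (not given above) is assumed to be A.

```python
def _is_type3(lhs, rhs, nts, eps_ok):
    if len(lhs) != 1 or lhs not in nts:
        return False
    if rhs == "":
        return eps_ok(lhs)
    if len(rhs) == 1:
        return rhs not in nts                      # X -> a
    return len(rhs) == 2 and rhs[0] not in nts and rhs[1] in nts  # X -> aY


def _is_type2(lhs, rhs, nts):
    return len(lhs) == 1 and lhs in nts            # A -> γ, any γ


def _is_type1(lhs, rhs, nts, eps_ok):
    if rhs == "":
        return eps_ok(lhs)
    for i, sym in enumerate(lhs):                  # try each A in αAβ
        if sym not in nts:
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(alpha) - len(beta)
        if gamma_len >= 1 and rhs.startswith(alpha) and rhs.endswith(beta):
            return True
    return False


def chomsky_type(productions, nts, start):
    """Most restrictive Chomsky type (3, 2, 1 or 0) that fits every production."""
    start_on_rhs = any(start in rhs for _, rhs in productions)

    def eps_ok(lhs):
        # S -> ε is allowed only if S never appears on a right-hand side.
        return lhs == start and not start_on_rhs

    if all(_is_type3(l, r, nts, eps_ok) for l, r in productions):
        return 3
    if all(_is_type2(l, r, nts) for l, r in productions):
        return 2
    if all(_is_type1(l, r, nts, eps_ok) for l, r in productions):
        return 1
    return 0


TYPE3 = [("X", ""), ("X", "a"), ("X", "aY"), ("Y", "b")]
TYPE2 = [("S", "Xa"), ("X", "a"), ("X", "aX"), ("X", "abc"), ("X", "")]
TYPE1 = [("AB", "AbBc"), ("A", "bcA"), ("B", "b")]
TYPE0 = [("S", "ACaB"), ("Bc", "acB"), ("CB", "DB"), ("aD", "Db")]

print(chomsky_type(TYPE3, {"X", "Y"}, "X"))
print(chomsky_type(TYPE2, {"S", "X"}, "S"))
print(chomsky_type(TYPE1, {"A", "B"}, "A"))   # start symbol A is an assumption
print(chomsky_type(TYPE0, {"S", "A", "B", "C", "D"}, "S"))
```

The classifier checks the form of the productions only; it says nothing about whether the generated language could also be described by a more restrictive grammar.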
Example
Representation Technique
Root vertex − Must be labeled by the start symbol.
Vertex − Labeled by a non-terminal symbol.
Leaves − Labeled by a terminal symbol or ε.
If S → x1 x2 …… xn is a production rule in a CFG, then the parse tree / derivation tree will
be as follows −
There are two different approaches to draw a derivation tree −
Top-down Approach −
Starts with the starting symbol S
Goes down to tree leaves using productions
Bottom-up Approach −
Starts from tree leaves
Proceeds upward to the root which is the starting symbol S
Example
Let any set of production rules in a CFG be
X → X+X | X*X | X | a
over an alphabet {a}.
The leftmost derivation for the string "a+a*a" may be −
X ⇒ X+X ⇒ a+X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
The stepwise derivation of the above string is shown as below −
The rightmost derivation for the above string "a+a*a" may be −
X ⇒ X*X ⇒ X*a ⇒ X+X*a ⇒ X+a*a ⇒ a+a*a
The stepwise derivation of the above string is shown as below −
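Both derivations can be replayed programmatically by always rewriting the leftmost (or rightmost) non-terminal. The sketch below is illustrative: the function name derive and the way the production choices are passed in as a list of right-hand sides are assumptions made for this example.

```python
def derive(start, choices, nonterminals, leftmost=True):
    """Apply the chosen right-hand sides in order to the leftmost or rightmost non-terminal."""
    form = start
    steps = [form]
    for rhs in choices:
        positions = [i for i, s in enumerate(form) if s in nonterminals]
        idx = positions[0] if leftmost else positions[-1]
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(form)
    return steps


NONTERMINALS = {"X"}
left = derive("X", ["X+X", "a", "X*X", "a", "a"], NONTERMINALS, leftmost=True)
right = derive("X", ["X*X", "a", "X+X", "a", "a"], NONTERMINALS, leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in a+a*a, but they pass through different sentential forms because a different non-terminal is expanded at each step.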
Left and Right Recursive Grammars
In a context-free grammar G, if there is a production in the form X → Xa where X is a
non-terminal and ‘a’ is a string of terminals, it is called a left recursive production.
The grammar having a left recursive production is called a left recursive grammar.
Similarly, if in a context-free grammar G there is a production of the form X →
aX where X is a non-terminal and ‘a’ is a string of terminals, it is called a right
recursive production. The grammar having a right recursive production is called
a right recursive grammar.
If a context free grammar G has more than one derivation tree for some string w ∈
L(G), it is called an ambiguous grammar. There exist multiple right-most or left-most
derivations for some string generated from that grammar.
Problem
Check whether the grammar G with production rules −
X → X+X | X*X | X | a
is ambiguous or not.
Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost derivations.
Derivation 1 − X ⇒ X+X ⇒ a+X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
Parse tree 1 −
Examples of strings with balanced parentheses are (()), ()(), (()()), while )( and (() are not
balanced. A grammar with the following productions generates all and only the strings
with balanced parentheses:
B → BB | (B) | λ
The first production, B → BB, says that the concatenation of two strings of balanced
parentheses is balanced. That is, we can match the parentheses in the two strings
independently. The second production, B → (B), says that if we place a pair of
parentheses around a balanced string, then the result is balanced. The third production,
B → λ, is the basis, which says that the empty string is balanced.
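The three productions translate almost literally into a recursive membership test. The following Python sketch is only an illustration (the name derivable is hypothetical); it uses memoisation so that repeated substrings are not re-examined, and it is not meant as an efficient parser.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derivable(s):
    """True if s can be derived from B using B -> BB | (B) | λ."""
    if s == "":
        return True                                   # B -> λ
    if s[0] == "(" and s[-1] == ")" and derivable(s[1:-1]):
        return True                                   # B -> (B)
    # B -> BB: split into two non-empty balanced parts.
    return any(derivable(s[:k]) and derivable(s[k:]) for k in range(1, len(s)))


for word in ["(())", "()()", "(()())", ")(", "(()"]:
    print(word, derivable(word))
```

The first three strings are reported as derivable and the last two are not, matching the examples above.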
Example-2:
There are numerous aspects of typical programming language that behave like
balanced parentheses. Beginning and ending of code blocks, such as begin and end in
Pascal, or the curly braces { . . . } of C, are examples.
A grammar that generates the possible sequences of if and else (represented by i and e,
respectively) is:
S → SS | iS | iSe | λ
For instance, ieie, iie, and iei are possible sequences of if's and else's, and each of these
strings is generated by the above grammar. Some examples of illegal sequences not
generated by the grammar are ei, ieeii, and iee.
Example-3:
We give below a CFG that describes some parts of the structure of HTML (Hypertext
Markup Language).
Char → a | A | . . .
Text → λ | Char Text
Doc → λ | Element Doc
Element → Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL>
List → λ | ListItem List
ListItem → <LI> Doc
Example-4:
Then the grammar generates, in particular, the following strings:
John bought car
Jim ate cheese
big Jim ate green cheese
John bought big car
big stout John bought big white car
Unfortunately, the grammar also generates sentences like:
big stout car bought big stout car
big cheese ate Jim
green Jim ate green big Jim
Syntax:
<!DOCTYPE element DTD identifier
[
first declaration
second declaration
.
.
nth declaration
]>
Example:
<?xml version="1.0"?>
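<!DOCTYPE address [
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
]>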
<address>
<name>
<first>Rohit</first>
<last>Sharma</last>
</name>
<email>[email protected]</email>
<phone>9876543210</phone>
<birthday>
<year>1987</year>
<month>June</month>
<day>23</day>
</birthday>
</address>
!ELEMENT address defines that the address element must contain four elements: “name, email, phone, birthday”.
!ELEMENT name defines that the name element must contain two elements: “first, last”.
!ELEMENT birthday defines that the birthday element must contain three elements: “year, month, day”.
!ELEMENT year defines the year element to be of type “#PCDATA”.
!ELEMENT month defines the month element to be of type “#PCDATA”.
!ELEMENT day defines the day element to be of type “#PCDATA”.
XML
<?xml version="1.0"?>
<address>
<name>
<first>Rohit</first>
<last>Sharma</last>
</name>
<email>[email protected]</email>
<phone>9876543210</phone>
<birthday>
<year>1987</year>
<month>June</month>
<day>23</day>
</birthday>
</address>
address.dtd:
3.11 Ambiguity in grammars and languages – ambiguous grammars.
Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one leftmost derivation or
more than one rightmost derivation or more than one parse tree for the given input
string. If the grammar is not ambiguous, then it is called unambiguous.
If the grammar has ambiguity, then it is not good for compiler construction. No method
can automatically detect and remove the ambiguity, but we can remove ambiguity by re-
writing the whole grammar without ambiguity.
Example 1:
Let us consider a grammar G with the production rule
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9
Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost
derivation:
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is
ambiguous.
Example 2:
Check whether the given grammar G is ambiguous or not.
1. E → E + E
2. E → E - E
3. E → id
Solution:
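From the above grammar, the string "id + id - id" can be derived in 2 ways:
First leftmost derivation:
1. E ⇒ E + E
2. ⇒ id + E
3. ⇒ id + E - E
4. ⇒ id + id - E
5. ⇒ id + id - id
Second leftmost derivation:
1. E ⇒ E - E
2. ⇒ E + E - E
3. ⇒ id + E - E
4. ⇒ id + id - E
5. ⇒ id + id - id
Since there are two leftmost derivations for a single string "id + id - id", the grammar G is
ambiguous.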
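For a grammar of this shape, the number of parse trees of a token string can be counted directly: a string has more than one parse tree exactly when more than one operator can serve as the root of the tree. The Python sketch below is an illustration written for E → E + E | E - E | id only; the name count_parse_trees and the token-list input are assumptions.

```python
from functools import lru_cache


def count_parse_trees(tokens, operators=("+", "-")):
    """Count parse trees of `tokens` under E -> E op E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in operators:      # tokens[k] is the root operator
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "-", "id"]))
```

A count greater than 1 for any string shows that the grammar is ambiguous; here "id + id - id" can be grouped either as id + (id - id) or as (id + id) - id.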
Example 3:
Check whether the given grammar G is ambiguous or not.
1. S → aSb | SS
2. S → ε
Solution:
For the string "aabb" the above grammar can generate two parse trees
Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.
Example 4:
Check whether the given grammar G is ambiguous or not.
1. A → AA
2. A → (A)
3. A → a
Solution:
For the string "a(a)aa" the above grammar can generate two parse trees:
Since there are two parse trees for a single string "a(a)aa", the grammar G is
ambiguous.