
MODULE 03

3. Properties of Regular Languages

3.1 Proving Languages not to be Regular
3.2 The Pumping Lemma for Regular Languages
3.3 Closure Properties of Regular Languages
 3.3.1 Closure of regular languages under Boolean operations – union
 3.3.2 Complementation
 3.3.3 Intersection and difference
 3.3.4 Equivalence and minimization of automata
3.4 Context-Free Grammars and Languages: definition of context-free grammars
3.5 Derivations using a grammar
3.6 Leftmost and rightmost derivations
3.7 The language of a grammar and sentential forms; exercise problems
3.8 Parse trees – constructing a parse tree, the yield of a parse tree, inference, derivations and parse trees
3.9 Applications of context-free grammars – markup languages
3.10 XML and document-type definitions
3.11 Ambiguity in grammars and languages – ambiguous grammars
3.1 Proving Languages not to be Regular

The Pumping Lemma is used for proving that a language is not regular. Here is the Pumping Lemma.

If L is a regular language, then there is an integer n > 0 with the property that:

(*) for any string x ∈ L where |x| ≥ n, there are strings u, v, w such that
(i) x = uvw,
(ii) v ≠ ε,
(iii) |uv| ≤ n,
(iv) uv^k w ∈ L for all k ∈ N.

To prove that a language L is not regular, we use proof by contradiction.

Here are the steps.


1. Suppose that L is regular.

2. Since L is regular, we apply the Pumping Lemma


and assert the existence of a number n > 0 that satisfies
the property (*).

3. Give a particular string x such that

(a) x ∈ L,
(b) |x| ≥ n.
This is the trickiest part. A wrong choice here will make step 4 impossible.

4. By the Pumping Lemma, there are strings u, v, w such that (i)-(iv) hold. Pick a particular number k ∈ N and argue that uv^k w ∉ L, thus yielding our desired contradiction.

What follows is an example proof using the Pumping Lemma.

3.2 The Pumping Lemma for Regular Languages


Theorem
Let L be a regular language. Then there exists a constant c such that for every string w in L with |w| ≥ c, we can break w into three strings, w = xyz, such that −

 |y| > 0
 |xy| ≤ c
 For all k ≥ 0, the string xy^k z is also in L.

Applications of Pumping Lemma


The Pumping Lemma is applied to show that certain languages are not regular. It should never be used to show that a language is regular.
 If L is regular, it satisfies the Pumping Lemma.
 If L does not satisfy the Pumping Lemma, it is non-regular.
(The converse does not hold: some non-regular languages also satisfy the Pumping Lemma, so satisfying it proves nothing.)

Method to prove that a language L is not regular


 At first, we have to assume that L is regular.
 So, the Pumping Lemma should hold for L, with some constant c.
 Use the Pumping Lemma to obtain a contradiction −
o Select w ∈ L such that |w| ≥ c.

o The lemma then splits w = xyz with |y| ≥ 1 and |xy| ≤ c; the remaining string is z.

o Select k such that the resulting string xy^k z is not in L. The argument must work for every split permitted by the lemma.

Hence L is not regular.


Problem
Prove that L = {a^i b^i | i ≥ 0} is not regular.

Solution −
 At first, we assume that L is regular and n is the pumping-lemma constant.
 Let w = a^n b^n. Thus |w| = 2n ≥ n.
 By the Pumping Lemma, w = xyz, where |xy| ≤ n and |y| > 0.
 Since |xy| ≤ n, both x and y lie within the leading a's. Let x = a^p, y = a^q, and z = a^r b^n, where p + q + r = n and q ≠ 0 (since |y| > 0).
 Let k = 2. Then xy^2 z = a^p a^(2q) a^r b^n.
 Number of a's = (p + 2q + r) = (p + q + r) + q = n + q.
 Hence, xy^2 z = a^(n+q) b^n. Since q ≠ 0, xy^2 z is not of the form a^n b^n.
 Thus, xy^2 z is not in L. Hence L is not regular.
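The argument above can be checked mechanically for a small n. The sketch below (illustrative code, not part of the proof) tries every split w = xyz of a^n b^n allowed by the lemma and confirms that pumping with k = 2 always leaves the language:

```python
# Brute-force illustration of the pumping-lemma argument for
# L = {a^i b^i | i >= 0}: for w = a^n b^n, EVERY split w = xyz with
# |xy| <= n and |y| > 0 fails to pump, i.e. x y^2 z leaves the language.

def in_L(s):
    """Membership test for L = {a^i b^i}."""
    i = len(s) // 2
    return s == "a" * i + "b" * i

def every_split_fails(n, k=2):
    w = "a" * n + "b" * n
    for xy_len in range(1, n + 1):          # |xy| <= n
        for y_len in range(1, xy_len + 1):  # |y| > 0
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if in_L(x + y * k + z):         # pumped string still in L?
                return False                # some split pumped successfully
    return True                             # no split survives pumping

print(every_split_fails(7))  # True: no valid split of a^7 b^7 can be pumped
```

Because |xy| ≤ n forces y to consist only of a's, every pumped string has more a's than b's, which is exactly what the proof argues.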
3.3 Properties of Regular Languages
So far we have seen different ways of specifying a regular language: DFA, NFA, ε-NFA, regular expressions and regular grammars. We noted that all these formalisms are equal in power by showing the equivalences. Regular expressions and grammars are considered generators of a regular language, while the machines (DFA, NFA, ε-NFA) are considered acceptors of the language.
Now we will look at the properties of regular languages. The properties can be broadly classified into two parts: (A) closure properties and (B) decision properties.

(A) Closure Properties


1. Complementation
If a language L is regular, its complement L' is regular.
Let DFA(L) denote the DFA for the language L. Modify the DFA as follows to obtain DFA(L').

1. Change the final states to non-final states.

2. Change the non-final states to final states.

Since there now exists a DFA(L'), L' is regular.
This can be shown by an example using a DFA. Let L denote the language containing strings that begin and end with a, over Σ = {a, b}. The DFA for L is given below (figure omitted); for instance, it accepts the string aa via q0 → q1 → q2.

Note: q3 denotes the dead state. Once you enter q3, you remain in it forever.

L' denotes the language that does not contain strings that begin and end with a. Of the four possibilities for a non-empty string over {a, b} −
 begins with a and ends with a (this is L)
 begins with a and ends with b
 begins with b and ends with a
 begins with b and ends with b
L' contains the last three, together with ε.
The DFA for L' is obtained by flipping the final states of DFA(L) to non-final states and vice-versa.
The DFA for L' is given below.


In DFA(L') (figure omitted):

 q0 ensures ε is accepted.

 q1 ensures all strings that begin with a and end with b are accepted.

 q3 ensures all strings that begin with b (ending with either a or b) are accepted.

Important Note: While specifying the DFA for L, we have also included the dead state q3. It is important to include the dead state(s) if we are going to derive the complement DFA, since the dead state(s) too would become final in the complementation. If we didn't add the dead state(s) originally, the complement would not accept all strings it is supposed to accept.
In the above example, if we didn't include q3 originally, the complement would not accept strings starting with b. It would only accept strings that begin with a and end with b, which is only a subset of the complement.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER COMPLEMENTATION.
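The flip-the-finals construction is easy to sketch in code. The transition table below is an assumed encoding of a DFA for "begins and ends with a" (including the dead state q3); it need not match the omitted figure exactly:

```python
# Complementing a DFA by flipping final and non-final states.
# q1: begins with a, currently ends with a (final in L)
# q2: begins with a, currently ends with b
# q3: dead state for strings that begin with b

def run_dfa(dfa, s):
    delta, start, finals = dfa
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals

delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q3",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q2",
    ("q3", "a"): "q3", ("q3", "b"): "q3",
}
L  = (delta, "q0", {"q1"})              # begins and ends with a
Lc = (delta, "q0", {"q0", "q2", "q3"})  # complement: finals flipped

print(run_dfa(L, "aba"))   # True  - begins and ends with a
print(run_dfa(Lc, "aba"))  # False - its complement rejects it
print(run_dfa(Lc, "ba"))   # True  - begins with b, lands in q3, now final
```

Note how the dead state q3 becomes a final state in the complement, exactly as the note above warns.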

2. Union

If L1 and L2 are regular, then L1 ∪ L2 is regular.


This is most easily proved using regular expressions. If L1 is regular, there exists a regular expression R1 that describes it. Similarly, if L2 is regular, there exists a regular expression R2 that describes it. R1 + R2 then denotes a regular expression that describes L1 ∪ L2. Therefore, L1 ∪ L2 is regular.
This again can be shown using an example. If L1 is the language of strings that begin with a and L2 is the language of strings that end with a, then L1 ∪ L2 denotes the language of strings that either begin with a or end with a.

- a(a+b)* is the regular expression that denotes L1.


- (a+b)*a is the regular expression that denotes L2.
- L1 ∪ L2 is denoted by the regular expression
a(a+b)* + (a+b)*a. Therefore, L1 ∪ L2 is regular.
In terms of DFA, we can say that a DFA(L1 ∪ L2) accepts those strings that are accepted by either
DFA(L1) or DFA(L2) or both.

 DFA(L1 ∪ L2) can be constructed by adding a new start state and a new final state.
 The new start state connects to the two start states of DFA(L1) and DFA(L2) by ε-transitions.
 Similarly, two ε-transitions are added from the final states of DFA(L1) and DFA(L2) to the new final state.
 Convert the resulting ε-NFA to its equivalent DFA.

As an exercise you can try this approach of DFA construction for union for the given example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER UNION.
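Instead of building the ε-NFA explicitly, a quick way to test union membership is to run both DFAs side by side (the product idea) and accept when either accepts. The two DFAs below are illustrative encodings of a(a+b)* and (a+b)*a:

```python
# Union of two regular languages: run both DFAs in lockstep
# and accept if EITHER component accepts.

def run_dfa(dfa, s):
    delta, start, finals = dfa
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals

# L1: strings over {a,b} that begin with a   (regex a(a+b)*)
L1 = ({("s", "a"): "yes", ("s", "b"): "no",
      ("yes", "a"): "yes", ("yes", "b"): "yes",
      ("no", "a"): "no", ("no", "b"): "no"}, "s", {"yes"})

# L2: strings over {a,b} that end with a     (regex (a+b)*a)
L2 = ({("0", "a"): "1", ("0", "b"): "0",
      ("1", "a"): "1", ("1", "b"): "0"}, "0", {"1"})

def union_accepts(d1, d2, s):
    return run_dfa(d1, s) or run_dfa(d2, s)

print(union_accepts(L1, L2, "ab"))  # True: begins with a
print(union_accepts(L1, L2, "ba"))  # True: ends with a
print(union_accepts(L1, L2, "bb"))  # False: neither
```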

3. Intersection
If L1 and L2 are regular, then L1 ∩ L2 is regular.
Since regular languages are closed under union and complementation (shown above), De Morgan's law can be applied to show that regular languages are closed under intersection too.

L1 and L2 are regular ⇒ L1' and L2' are regular (by the complementation property)
          ⇒ L1' ∪ L2' is regular (by the union property)
          ⇒ (L1' ∪ L2')' = L1 ∩ L2 is regular (by De Morgan's law and complementation)
In terms of DFA, we can say that a DFA(L1 ∩ L2) accepts those strings that are accepted by both
DFA(L1) and DFA(L2).
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER INTERSECTION.
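The same parallel-simulation sketch works for intersection: run both DFAs in lockstep and accept only when both accept. The DFA encodings below are illustrative ("begins with a" and "ends with a", so the intersection is "begins and ends with a"):

```python
# Intersection of two regular languages: run both DFAs in lockstep
# and accept only if BOTH components accept.

def run_dfa(dfa, s):
    delta, start, finals = dfa
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals

L1 = ({("s", "a"): "yes", ("s", "b"): "no",       # begins with a
      ("yes", "a"): "yes", ("yes", "b"): "yes",
      ("no", "a"): "no", ("no", "b"): "no"}, "s", {"yes"})
L2 = ({("0", "a"): "1", ("0", "b"): "0",          # ends with a
      ("1", "a"): "1", ("1", "b"): "0"}, "0", {"1"})

def intersection_accepts(d1, d2, s):
    return run_dfa(d1, s) and run_dfa(d2, s)

print(intersection_accepts(L1, L2, "aba"))  # True: begins AND ends with a
print(intersection_accepts(L1, L2, "ab"))   # False: does not end with a
```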

4. Concatenation
If L1 and L2 are regular, then L1 . L2 is regular.

For example, if L1 = a* and L2 = b*, then L1 . L2 = a*b* (note that a*b* is not the same language as (ab)*).
This can be easily proved by regular expressions. If R1 is a regular expression denoting L1 and R2 is a regular expression denoting L2, then R1 . R2 is a regular expression denoting L1 . L2. Therefore, L1 . L2 is regular.
In terms of automata, an NFA for L1 . L2 can be constructed by adding an ε-transition from each final state of DFA(L1) - which then ceases to be a final state - to the start state of DFA(L2).
You can try showing this using an example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER CONCATENATION.
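Python's re module illustrates the regular-expression side of this closure; the pattern below is an assumed encoding of the a*b* example:

```python
import re

# Concatenation of regular expressions: if R1 denotes L1 and R2 denotes L2,
# the pattern R1R2 denotes L1 . L2.  Here L1 = a*, L2 = b*.
concat = re.compile(r"a*b*\Z")  # \Z anchors the match at the end of the string

print(bool(concat.match("aaabb")))  # True:  in a* . b*
print(bool(concat.match("ab")))     # True
print(bool(concat.match("ba")))     # False: a's must come before b's
```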

5. Kleene star
If L is regular, then L* is regular.
This can be easily proved by regular expressions.
If L is regular, then there exists a regular expression R denoting it.
We know that if R is a regular expression, R* is a regular expression too. R* denotes the language L*. Therefore L* is regular.
In terms of automata, we add ε-transitions from each final state of DFA(L) back to its start state, and add a new start state that is also final (so that ε is accepted) with an ε-transition to the old start state. The resulting ε-NFA accepts L*. You can try showing this for an example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER KLEENE STAR.

6. Difference
If L1 and L2 are regular, then L1 - L2 is regular.
We know that L1 - L2 = L1 ∩ L2'.

L1 and L2 are regular ⇒ L1 and L2' are regular (by the complementation property)
          ⇒ L1 ∩ L2' is regular (by the intersection property)
          ⇒ L1 - L2 is regular (by the identity above)
In terms of DFA, we can say that a DFA(L1 - L2) accepts those strings that are accepted by DFA(L1) but not accepted by DFA(L2). You can try showing this for an example.
CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER DIFFERENCE.

7. Reverse
If L is regular, then L^R (the reversal of L) is regular.
Let DFA(L) denote the DFA of L. Make the following modifications to construct an automaton for L^R.

1. Make the final state of DFA(L) the start state.

In case there is more than one final state in DFA(L), first add a new state, add ε-transitions from the old final states (which cease to be final) to it, and use this new state as the start state.

2. Make the old start state of DFA(L) the only final state.

3. Reverse the direction of all the arrows.

The result is in general an NFA, which can be converted to an equivalent DFA. You can try showing this using an example.
You can try showing this using an example.


CONCLUSION: REGULAR LANGUAGES ARE CLOSED UNDER REVERSAL.

Difference Between Union & Concatenation

The Kleene star of a concatenation gives

(ab)* = {ε, ab, abab, ababab, ...}

while the Kleene star of a union gives

(a+b)* = (a|b)* = (a ∪ b)* = {ε, a, b, aa, ab, ba, bb, …}

so the latter contains every string over {a, b}, while the former contains only repetitions of the block ab.
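A short enumeration (illustrative code) makes the difference concrete:

```python
from itertools import product

# Contrast (ab)* with (a+b)* by listing their short members.
ab_star = {"ab" * i for i in range(3)}              # (ab)*: eps, ab, abab
union_star = {"".join(p) for n in range(3)          # (a+b)*: ALL strings
              for p in product("ab", repeat=n)}     # over {a,b}, length <= 2

print(sorted(ab_star))     # ['', 'ab', 'abab']
print(sorted(union_star))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```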
3.4 Introduction to Grammars and Languages:
In the literary sense of the term, grammars denote syntactical rules for conversation in natural languages. Linguists have attempted to define grammars since the inception of natural languages like English, Sanskrit, Mandarin, etc.

The theory of formal languages finds its applicability extensively in the field of Computer Science. Noam Chomsky gave a mathematical model of grammar in 1956 which is effective for writing computer languages.

Grammar
A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
 N (also written V_N) is a set of variables or non-terminal symbols.
 T (also written ∑) is a set of terminal symbols.
 S is a special variable called the start symbol, S ∈ N.
 P is a set of production rules for terminals and non-terminals. A production rule has the form α → β, where α and β are strings over V_N ∪ ∑ and at least one symbol of α belongs to V_N.


Example
Grammar G1 −
({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
Here,
 S, A, and B are Non-terminal symbols;
 a and b are Terminal symbols
 S is the Start symbol, S ∈ N
 Productions, P : S → AB, A → a, B → b

Example
Grammar G2 −
({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε})
Here,
 S and A are Non-terminal symbols.
 a and b are Terminal symbols.
 ε is an empty string.
 S is the Start symbol, S ∈ N
 Production P : S → aAb, aA → aaAb, A → ε

Definition of context-free grammars,


A context-free grammar is a formal grammar which is used to generate all the strings in a given formal language.

A context-free grammar G can be defined by a 4-tuple:

G = (V, T, P, S)

where,

V describes a finite set of non-terminal symbols,

T describes a finite set of terminal symbols,

P describes a set of production rules,

S ∈ V is the start symbol.

In a CFG, the start symbol is used to derive a string. You can derive a string by repeatedly replacing a non-terminal by the right-hand side of a production, until all non-terminals have been replaced by terminal symbols.

Example:

L = {wcw^R | w ∈ (a, b)*}

Production rules:

1. S → aSa
2. S → bSb
3. S → c

Now check that the string abbcbba can be derived from the given CFG:

S ⇒ aSa
 ⇒ abSba
 ⇒ abbSbba
 ⇒ abbcbba

By applying the productions S → aSa and S → bSb recursively and finally applying the production S → c, we get the string abbcbba.
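The derivation can be mirrored by a recursive membership checker (a sketch; the function name is made up) that peels matching outer symbols exactly as the productions S → aSa and S → bSb do:

```python
# Recursive membership test for L = {w c w^R | w in {a,b}*}, mirroring
# S -> aSa | bSb | c: peel matching outer symbols, bottom out at "c".

def in_wcwr(s):
    if s == "c":                          # production S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return in_wcwr(s[1:-1])           # production S -> aSa or S -> bSb
    return False

print(in_wcwr("abbcbba"))  # True: S => aSa => abSba => abbSbba => abbcbba
print(in_wcwr("abcab"))    # False: outer symbols do not match
```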

3.5 Derivations using a Grammar

Strings may be derived from other strings using the productions in a grammar. If a grammar G has a production α → β, we can say that xαy derives xβy in G. This derivation is written as −
xαy ⇒G xβy

Example
Let us consider the grammar −
G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )
Some of the strings that can be derived are −
S ⇒ aAb using production S → aAb
⇒ aaAbb using production aA → aaAb
⇒ aaaAbbb using production aA → aaAb
⇒ aaabbb using production A → ε
The set of all strings that can be derived from a grammar is said to be the language
generated from that grammar. A language generated by a grammar G is a subset
formally defined by
L(G) = {w | w ∈ ∑*, S ⇒G* w}
If L(G1) = L(G2), the Grammar G1 is equivalent to the Grammar G2.

Example
If there is a grammar
G: N = {S, A, B} T = {a, b} P = {S → AB, A → a, B → b}
Here S produces AB, and we can replace A by a and B by b. The only accepted string is ab, i.e.,
L(G) = {ab}

Example
Suppose we have the following grammar −
G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b}
The language generated by this grammar −
L(G) = {ab, a^2 b, ab^2, a^2 b^2, ………}
= {a^m b^n | m ≥ 1 and n ≥ 1}

Construction of a Grammar Generating a Language


We'll consider some languages and construct grammars G which produce those languages.

Example
Problem − Suppose L(G) = {a^m b^n | m ≥ 0 and n > 0}. We have to find out the grammar G which produces L(G).

Solution
Since L(G) = {a^m b^n | m ≥ 0 and n > 0},
the set of strings accepted can be rewritten as −


L(G) = {b, ab,bb, aab, abb, …….}
Here, the start symbol has to take at least one ‘b’ preceded by any number of ‘a’
including null.
To accept the string set {b, ab, bb, aab, abb, …….}, we have taken the productions −
S → aS , S → B, B → b and B → bB
S → B → b (Accepted)
S → B → bB → bb (Accepted)
S → aS → aB → ab (Accepted)
S → aS → aaS → aaB → aab(Accepted)
S → aS → aB → abB → abb (Accepted)
Thus, we can prove every single string in L(G) is accepted by the language generated
by the production set.
Hence the grammar −
G: ({S, B}, {a, b}, S, {S → aS | B, B → b | bB})
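The grammar can be checked by brute force: enumerate every string it derives up to a length bound via breadth-first search over sentential forms (illustrative code; RULES encodes S → aS | B, B → b | bB):

```python
from collections import deque

# Breadth-first enumeration of strings derivable from S -> aS | B, B -> b | bB.
# Non-terminals are uppercase, terminals lowercase.

RULES = {"S": ["aS", "B"], "B": ["b", "bB"]}

def generate(max_len=5):
    seen, out = set(), set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if all(c.islower() for c in form):   # no non-terminals left: a string
            out.add(form)
            continue
        i = next(i for i, c in enumerate(form) if c.isupper())  # leftmost NT
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # bound the search by the number of terminals already produced
            if len(new.replace("S", "").replace("B", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return out

print(sorted(generate(3), key=lambda s: (len(s), s)))
# ['b', 'ab', 'bb', 'aab', 'abb', 'bbb']
```

Every generated string has the shape a^m b^n with m ≥ 0 and n > 0, and every such string up to the bound appears, as claimed.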

Example
Problem − Suppose L(G) = {a^m b^n | m > 0 and n ≥ 0}. We have to find out the grammar G which produces L(G).

Solution −
Since L(G) = {a^m b^n | m > 0 and n ≥ 0}, the set of strings accepted can be rewritten as −
L(G) = {a, aa, ab, aaa, aab, abb, …….}


Here, the start symbol has to take at least one ‘a’ followed by any number of ‘b’
including null.
To accept the string set {a, aa, ab, aaa, aab, abb, …….}, we have taken the
productions –

S → aA, A → aA , A → B, B → bB ,B → λ

S → aA → aB → aλ → a (Accepted)

S → aA → aaA → aaB → aaλ → aa (Accepted)

S → aA → aB → abB → abλ → ab (Accepted)

S → aA → aaA → aaaA → aaaB → aaaλ → aaa (Accepted)

S → aA → aaA → aaB → aabB → aabλ → aab (Accepted)

S → aA → aB → abB → abbB → abbλ → abb (Accepted)

Thus, we can prove every single string in L(G) is accepted by the language generated
by the production set.
Hence the grammar −
G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB })

Chomsky Classification of Grammars

According to Noam Chomsky, there are four types of grammars − Type 0, Type 1, Type 2, and Type 3. The following table shows how they differ from each other −

Grammar Type | Grammar Accepted          | Language Accepted               | Automaton
Type 0       | Unrestricted grammar      | Recursively enumerable language | Turing machine
Type 1       | Context-sensitive grammar | Context-sensitive language      | Linear-bounded automaton
Type 2       | Context-free grammar      | Context-free language           | Pushdown automaton
Type 3       | Regular grammar           | Regular language                | Finite state automaton

The scope of each type of grammar nests as Type 3 ⊆ Type 2 ⊆ Type 1 ⊆ Type 0 (illustration omitted).

Type - 3 Grammar
Type-3 grammars generate regular languages. Type-3 grammars must have a single
non-terminal on the left-hand side and a right-hand side consisting of a single terminal
or single terminal followed by a single non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.

Example

X → ε
X → a | aY
Y → b

Type - 2 Grammar
Type-2 grammars generate context-free languages.
The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
The languages generated by these grammars are recognized by a non-deterministic pushdown automaton.

Example

S → X a
X → a
X → aX
X → abc
X → ε
Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in
the form
αAβ→αγβ
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The
languages generated by these grammars are recognized by a linear bounded
automaton.

Example

AB → AbBc
A → bcA
B → b
Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have no restrictions: they are arbitrary phrase-structure grammars, including all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be of the form α → β, where α is a string of terminals and non-terminals with at least one non-terminal (so α cannot be null) and β is a string of terminals and non-terminals.

Example
S → ACaB
Bc → acB
CB → DB
aD → Db

3.6 Leftmost and Rightmost Derivations,
3.7 The Language of a Grammar and Sentential Forms, Exercise Problems,
3.8 Parse Trees – constructing a parse tree, the yield of a parse tree, inference, derivations and parse trees

Context-Free Grammar Introduction

Definition − A context-free grammar (CFG) consisting of a finite set of grammar rules


is a quadruple (N, T, P, S) where
 N is a set of non-terminal symbols.
 T is a set of terminals where N ∩ T = ∅.
 P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of a production rule does not have any right context or left context.
 S is the start symbol.

Example

 The grammar ({A}, {a, b, c}, P, A), P : A → aA, A → abc.
 The grammar ({S}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε
 The grammar ({S, F}, {0, 1}, P, S), P: S → 00S | 11F, F → 00F | ε

Generation of Derivation Tree


A derivation tree or parse tree is an ordered rooted tree that graphically represents how a string is derived from a context-free grammar.

Representation Technique
 Root vertex − Must be labeled by the start symbol.
 Vertex − Labeled by a non-terminal symbol.
 Leaves − Labeled by a terminal symbol or ε.
If S → x1 x2 …… xn is a production rule in a CFG, then in the parse tree the root S has children x1, x2, ……, xn in order (figure omitted).
There are two different approaches to draw a derivation tree −
There are two different approaches to draw a derivation tree −
Top-down Approach −
 Starts with the starting symbol S
 Goes down to tree leaves using productions

Bottom-up Approach −
 Starts from tree leaves
 Proceeds upward to the root which is the starting symbol S

Derivation or Yield of a Tree


The derivation or the yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the tree from left to right, ignoring the ε-labels. However, if all the leaves are ε, the yield is ε.
Example
Let a CFG {N,T,P,S} be
N = {S}, T = {a, b}, Starting symbol = S, P = S → SS | aSb | ε
One derivation from the above CFG is “abaabb”
S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
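The yield computation can be sketched on trees encoded as nested tuples (an assumed representation; parse trees are usually drawn, not typed). The tree below is one parse of "abaabb" under S → SS | aSb | ε:

```python
# Computing the yield of a parse tree represented as nested tuples
# (label, children...).  Leaves are terminal strings, with "" for epsilon.

def tree_yield(node):
    if isinstance(node, str):      # leaf: a terminal or epsilon ("")
        return node
    label, *children = node
    return "".join(tree_yield(c) for c in children)  # leaves, left to right

# S -> S S; first S -> a S b with inner S -> eps; second S -> a S b,
# whose inner S -> a S b with S -> eps.
tree = ("S",
        ("S", "a", ("S", ""), "b"),
        ("S", "a", ("S", "a", ("S", ""), "b"), "b"))

print(tree_yield(tree))  # abaabb
```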
Sentential Form and Partial Derivation Tree
A partial derivation tree is a sub-tree of a derivation tree/parse tree such that either all
of its children are in the sub-tree or none of them are in the sub-tree.
Example
If in any CFG the productions are −
S → AB, A → aaA | ε, B → Bb | ε
the partial derivation tree can be the following − (figure omitted)
If a partial derivation tree contains the root S, its yield is called a sentential form. The yield of the above sub-tree is also a sentential form.

Leftmost and Rightmost Derivation of a String


 Leftmost derivation − A leftmost derivation is obtained by applying production
to the leftmost variable in each step.
 Rightmost derivation − A rightmost derivation is obtained by applying
production to the rightmost variable in each step.

Example
Let the production rules in a CFG be
X → X+X | X*X | X | a
over the alphabet {a, +, *}.
The leftmost derivation for the string "a+a*a" may be −
X → X+X → a+X → a+X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown as a tree (figure omitted).
The rightmost derivation for the above string "a+a*a" may be −
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown as a tree (figure omitted).
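Both derivations can be replayed mechanically (a sketch; step is a made-up helper) by always rewriting the leftmost or rightmost occurrence of X:

```python
# Replaying leftmost and rightmost derivations of "a+a*a" for the
# grammar X -> X+X | X*X | a.  Each step replaces the leftmost (or
# rightmost) occurrence of X with a chosen right-hand side.

def step(form, rhs, leftmost=True):
    i = form.find("X") if leftmost else form.rfind("X")
    return form[:i] + rhs + form[i + 1:]

# Leftmost: X => X+X => a+X => a+X*X => a+a*X => a+a*a
form = "X"
for rhs in ["X+X", "a", "X*X", "a", "a"]:
    form = step(form, rhs, leftmost=True)
print(form)  # a+a*a

# Rightmost: X => X*X => X*a => X+X*a => X+a*a => a+a*a
form = "X"
for rhs in ["X*X", "a", "X+X", "a", "a"]:
    form = step(form, rhs, leftmost=False)
print(form)  # a+a*a
```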
Left and Right Recursive Grammars
In a context-free grammar G, if there is a production in the form X → Xa where X is a
non-terminal and ‘a’ is a string of terminals, it is called a left recursive production.
The grammar having a left recursive production is called a left recursive grammar.
If in a context-free grammar G there is a production of the form X → aX, where X is a non-terminal and ‘a’ is a string of terminals, it is called a right recursive production. A grammar having a right recursive production is called a right recursive grammar.
If a context free grammar G has more than one derivation tree for some string w ∈
L(G), it is called an ambiguous grammar. There exist multiple right-most or left-most
derivations for some string generated from that grammar.
Problem
Check whether the grammar G with production rules −
X → X+X | X*X |X| a
is ambiguous or not.

Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost derivations.
Derivation 1 − X → X+X → a+X → a+X*X → a+a*X → a+a*a
Parse tree 1 − (figure omitted)

Derivation 2 − X → X*X → X+X*X → a+X*X → a+a*X → a+a*a
Parse tree 2 − (figure omitted)
Since there are two parse trees for a single string "a+a*a", the grammar G is ambiguous.

3.9 Applications of context-free grammars – markup languages

(From notes on applications of context-free grammars by Gur Saran Adhar.) Grammars are used to describe programming languages. Most importantly, there is a mechanical way of turning the description as a context-free grammar (CFG) into a parser, the component of the compiler that discovers the structure of the source program and represents that structure as a tree. For example, the Document Type Definition (DTD) feature of XML (Extensible Markup Language) is essentially a context-free grammar that describes the allowable tags and the ways in which these tags may be nested. For example, one could describe a sequence of characters that is intended to be interpreted as a phone number.

Example-1: Typical programming languages use parentheses and or brackets in a


nested and balanced fashion. That is, we must be able to match some left parenthesis
against a right parenthesis that appears immediately to its right, remove both of them
and repeat. If we eventually eliminate all the parenthesis, then the string was balanced.

Examples of strings with balanced parentheses are (()), ()(), (()()), while )( and (() are not balanced. A grammar with the following productions generates all and only the strings with balanced parentheses:

B → BB | (B) | λ
The first production, B → BB, says that concatenation of two strings of balanced
parenthesis is balanced. That is, we can match the parenthesis in two strings
independently. The second production, B → (B), says that if we place a pair of
parenthesis around a balanced string, then the result is balanced. The third production,
B → λ is the basis, which says that an empty string is balanced.
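Membership in this language is equivalent to a simple counter check (a sketch, not the grammar itself): scan left to right, count depth, and require that the count never dips below zero and ends at zero.

```python
# Counter-based balance check, equivalent to membership in the language
# generated by B -> BB | (B) | lambda.

def balanced(s):
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1  # '(' opens, ')' closes
        if depth < 0:                    # a ')' with no matching '('
            return False
    return depth == 0                    # everything opened was closed

for s in ["(())", "()()", "(()())", ")(", "(()"]:
    print(s, balanced(s))
# (()) True, ()() True, (()()) True, )( False, (() False
```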

Example-2:

There are numerous aspects of typical programming language that behave like
balanced parentheses. Beginning and ending of code blocks, such as begin and end in
Pascal, or the curly braces { . . . } of C, are examples.

There is a related pattern that appears occasionally, where "parentheses" can be balanced, with the exception that there can be unbalanced left parentheses.

An example is the treatment of if and else in C. An if-clause can appear unmatched by any else-clause, or it may be balanced by a matching else-clause.

A grammar that generates the possible sequences of if and else (represented by i and e, respectively) is:

S → SS | iS | iSe | λ

For instance, ieie, iie, and iei are possible sequences of if's and else's, and each of these strings is generated by the above grammar. Some examples of illegal sequences not generated by the grammar are ei, ieeii, and iee.

Example-3:

We give below a CFG that describes some parts of the structure of HTML (Hypertext Markup Language):

Char → a | A | . . .
Text → λ | Char Text
Doc → λ | Element Doc
Element → Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL>
List → λ | ListItem List
ListItem → <LI> Doc

Example-4:

Let G be a grammar with the set of variables:
V = {S, <Noun phrase>, <Verb phrase>, <Adjective phrase>, <Noun>, <Verb>, <Adjective>}
the alphabet set:
Σ = {big, stout, John, bought, white, car, Jim, cheese, ate, green}
and the rules:

(1) S → <Noun phrase> <Verb phrase>
(2) <Noun phrase> → <Noun> | <Adjective phrase> <Noun> | λ
(3) <Verb phrase> → <Verb> <Noun phrase>
(4) <Adjective phrase> → <Adjective phrase> <Adjective> | λ
(5) <Noun> → John | car | Jim | cheese
(6) <Verb> → bought | ate
(7) <Adjective> → big | stout | white | green

Then the grammar generates, in particular, the following strings:
John bought car
Jim ate cheese
big Jim ate green cheese
John bought big car
big stout John bought big white car
Unfortunately, the grammar also generates sentences like:
big stout car bought big stout car
big cheese ate Jim
green Jim ate green big Jim.

3.10 XML and document-type definitions

Document Type Definition – DTD

A Document Type Definition (DTD) describes the tree structure of a document and something about its data. It is a set of markup declarations that define a type of document for the SGML family of languages (GML, SGML, HTML, XML).
A DTD can be declared inside an XML document (inline) or as an external reference. A DTD determines how many times a node may appear and how its child nodes are ordered.

There are two data types, PCDATA and CDATA:

 PCDATA is parsed character data.
 CDATA is character data, not usually parsed.

Syntax:
<!DOCTYPE element DTD identifier
[
first declaration
second declaration
.
.
nth declaration
]>
Example:

The DTD below describes an address document (tree figure omitted).

XML Document with an Internal DTD:

<?xml version="1.0"?>

<!DOCTYPE address [

<!ELEMENT address (name, email, phone, birthday)>

<!ELEMENT name (first, last)>

<!ELEMENT first (#PCDATA)>

<!ELEMENT last (#PCDATA)>

<!ELEMENT email (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

<!ELEMENT birthday (year, month, day)>

<!ELEMENT year (#PCDATA)>

<!ELEMENT month (#PCDATA)>


<!ELEMENT day (#PCDATA)>

]>

<address>

<name>

<first>Rohit</first>

<last>Sharma</last>

</name>

<email>[email protected]</email>

<phone>9876543210</phone>

<birthday>

<year>1987</year>

<month>June</month>

<day>23</day>

</birthday>

</address>

The DTD above is interpreted like this:

 !DOCTYPE address defines that the root element of this document is


address.

 !ELEMENT address defines that the address element must contain four
elements: “name, email, phone, birthday”.
 !ELEMENT name defines that the name element must contain two elements:
“first, last”.

 !ELEMENT first defines the first element to be of type “#PCDATA”.


 !ELEMENT last defines the last element to be of type “#PCDATA”.

 !ELEMENT email defines the email element to be of type “#PCDATA”.

 !ELEMENT phone defines the phone element to be of type “#PCDATA”.

 !ELEMENT birthday defines that the birthday element must contain three
elements “year, month, day”.
 !ELEMENT year defines the year element to be of type “#PCDATA”.
 !ELEMENT month defines the month element to be of type
“#PCDATA”.
 !ELEMENT day defines the day element to be of type “#PCDATA”.

XML document with an external DTD:


<?xml version="1.0"?>

<!DOCTYPE address SYSTEM "address.dtd">

<address>

<name>

<first>Rohit</first>

<last>Sharma</last>

</name>

<email>[email protected]</email>

<phone>9876543210</phone>

<birthday>
<year>1987</year>

<month>June</month>

<day>23</day>

</birthday>

</address>

address.dtd:

 <!ELEMENT address (name, email, phone, birthday)>

 <!ELEMENT name (first, last)>


 <!ELEMENT first (#PCDATA)>
 <!ELEMENT last (#PCDATA)>

 <!ELEMENT email (#PCDATA)>

 <!ELEMENT phone (#PCDATA)>

 <!ELEMENT birthday (year, month, day)>


 <!ELEMENT year (#PCDATA)>
 <!ELEMENT month (#PCDATA)>
 <!ELEMENT day (#PCDATA)>
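The standard library's xml.etree.ElementTree does not validate against a DTD, but it can at least confirm that a document's element structure matches the content models declared above (a sketch with a placeholder email address; full DTD validation needs an external validator):

```python
import xml.etree.ElementTree as ET

# Parse an address document and check its structure against the DTD's
# content models: address(name,email,phone,birthday), name(first,last),
# birthday(year,month,day).

doc = """<?xml version="1.0"?>
<address>
  <name><first>Rohit</first><last>Sharma</last></name>
  <email>rohit@example.com</email>
  <phone>9876543210</phone>
  <birthday><year>1987</year><month>June</month><day>23</day></birthday>
</address>"""

root = ET.fromstring(doc)
print(root.tag)                                    # address
print([child.tag for child in root])               # ['name', 'email', 'phone', 'birthday']
print([child.tag for child in root.find("name")])  # ['first', 'last']
```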
Output: (browser rendering of the document omitted)

3.11 Ambiguity in grammars and languages – ambiguous
grammars.
Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one leftmost derivation or
more than one rightmost derivation or more than one parse tree for the given input
string. If the grammar is not ambiguous, then it is called unambiguous.

If the grammar has ambiguity, then it is not good for compiler construction. No method
can automatically detect and remove the ambiguity, but we can remove ambiguity by re-
writing the whole grammar without ambiguity.

Example 1:
Let us consider a grammar G with the production rule

1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9

Solution:

For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation (trees omitted): one grouping the expression as (3 * 2) + 5 and another as 3 * (2 + 5).
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.

Example 2:
Check whether the given grammar G is ambiguous or not.

1. E → E + E
2. E → E - E
3. E → id

Solution:

From the above grammar String "id + id - id" can be derived in 2 ways:

First leftmost derivation

1. E → E + E
2. → id + E
3. → id + E - E
4. → id + id - E
5. → id + id - id

Second leftmost derivation

1. E → E - E
2. → E + E - E
3. → id + E - E
4. → id + id - E
5. → id + id - id

Since there are two leftmost derivation for a single string "id + id - id", the grammar G is
ambiguous.

Example 3:
Check whether the given grammar G is ambiguous or not.

1. S → aSb | SS
2. S → ε

Solution:

For the string "aabb" the above grammar can generate two parse trees (trees omitted).

Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.

Example 4:
Check whether the given grammar G is ambiguous or not.

1. A → AA
2. A → (A)
3. A → a
Solution:

For the string "a(a)aa" the above grammar can generate two parse trees (trees omitted).

Since there are two parse trees for a single string "a(a)aa", the grammar G is
ambiguous.
