UNIT – III: GRAMMARS AND COMPILERS

Part-I - Context Free Grammars and Languages:


Context Free Grammars, Parse Trees and Ambiguity in Grammars: Chomsky Hierarchy of
Languages, Context-Free Languages Definition, Leftmost and Rightmost derivations, Sentential
forms, Derivation trees, Ambiguity in context free grammars and removing it.
Applications and Properties of CFLs: Closure Properties of CFL, Elimination of left recursion,
left factoring, Applications of CFL.
Part-II – Compilers-Lexical and Syntax Analysis:
Compilers - Lexical Analysis: Compiler, Phases of a compiler, Tokens, patterns and lexemes,
Attributes for tokens, Role of Lexical Analysis & Input Buffering, Lex tool & Sample Lex
programs.
Compilers - Syntax Analysis: Parsing, role of parser, top down parsing, Computing FIRST and
FOLLOW, Recursive descent parser, Predictive Parsers, LL (1) Grammar & LL (1) Parsers.

3.0 Chomsky Hierarchy Of Languages:


The Chomsky hierarchy classifies grammars, and the languages they generate, according to the
kind of machine that accepts those languages. According to the Chomsky hierarchy, grammars are
divided into four types:

1. Type 0 is known as unrestricted grammar.
2. Type 1 is known as context-sensitive grammar.
3. Type 2 is known as context-free grammar.
4. Type 3 is known as regular grammar.

Figure: Chomsky Hierarchy

The classes are nested, so this is a hierarchy: every language of type 3 is also of type 2, 1
and 0. Similarly, every language of type 2 is also of type 1 and type 0, and so on.

Mr. K. Leela Prasad, Asst. Prof., CSE Dept.
1. Type 0 Grammar:

Type 0 grammar is known as unrestricted grammar. There is no restriction on the production rules
of these grammars, other than that the left-hand side must contain at least one non-terminal. The
languages they generate are known as recursively enumerable languages, and they are exactly the
languages recognized by Turing machines.

These are of the form α → β, where α ∈ (V ∪ T)* V (V ∪ T)* and β ∈ (V ∪ T)*.

Example:

bAa → aa

S→s

2. Type 1 Grammar:

Type 1 grammar is known as context-sensitive grammar. A context-sensitive grammar generates a
context-sensitive language, and the language generated by the grammar is recognized by a linear
bounded automaton (LBA). A context-sensitive grammar follows these rules:

• It may have more than one symbol on the left-hand side of its production rules.
• The number of symbols on the left-hand side must not exceed the number of symbols on
the right-hand side.
• A rule of the form S → ε is allowed only if S is the start symbol and S does not occur on
the right-hand side of any rule.
• Every Type 1 grammar is also a Type 0 grammar. In Type 1, productions are of the form
α → β, where |α| ≤ |β|.

Example:

S → AT

T → xy

A→a

3. Type 2 Grammar:

Type 2 grammar is known as context-free grammar. Type 2 grammars generate context-free
languages, and the language generated by the grammar is recognized by a pushdown automaton.
Every Type 2 grammar without ε-productions is also Type 1. Productions are of the form A → β,
where A ∈ V is a single non-terminal (|A| = 1) and β ∈ (V ∪ T)*.

Example:

A → aBb
A→b

B→a

4. Type 3 Grammar:

Type 3 grammar is known as regular grammar. Regular languages are those languages which can
be described using regular expressions, and they are recognized by finite automata (NFA or DFA).

Type 3 is the most restricted form of grammar; every Type 3 grammar is also Type 2 and Type 1.
Its productions must be either all right linear (A → a or A → aB) or all left linear (A → a or
A → Ba), where A, B ∈ V and a ∈ T.

Example:

S → a | bA (Right linear grammar)

S → a | Ab (Left linear grammar)
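The four production forms above can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical encoding in which each production is a pair of strings (lhs, rhs), uppercase letters are non-terminals and every other character is a terminal; it reports the most restrictive type whose form every production satisfies.

```python
def is_nt(sym: str) -> bool:
    # Assumption: uppercase letters are non-terminals, everything else is a terminal.
    return sym.isupper()


def is_right_linear(lhs: str, rhs: str) -> bool:
    # A → a  or  A → aB
    if len(lhs) != 1 or not is_nt(lhs):
        return False
    if len(rhs) == 1:
        return not is_nt(rhs)
    return len(rhs) == 2 and not is_nt(rhs[0]) and is_nt(rhs[1])


def is_left_linear(lhs: str, rhs: str) -> bool:
    # A → a  or  A → Ba
    if len(lhs) != 1 or not is_nt(lhs):
        return False
    if len(rhs) == 1:
        return not is_nt(rhs)
    return len(rhs) == 2 and is_nt(rhs[0]) and not is_nt(rhs[1])


def chomsky_type(productions: list[tuple[str, str]]) -> int:
    """Return the most restrictive Chomsky type (3, 2, 1 or 0) that fits."""
    if not all(any(is_nt(c) for c in lhs) for lhs, _ in productions):
        raise ValueError("every left-hand side needs at least one non-terminal")
    if (all(is_right_linear(l, r) for l, r in productions)
            or all(is_left_linear(l, r) for l, r in productions)):
        return 3   # regular: all right linear or all left linear
    if all(len(l) == 1 for l, _ in productions):
        return 2   # context free: a single non-terminal on the left
    if all(len(l) <= len(r) for l, r in productions):
        return 1   # context sensitive: |α| ≤ |β|
    return 0       # unrestricted


print(chomsky_type([("S", "a"), ("S", "bA")]))          # 3
print(chomsky_type([("bAa", "aa"), ("S", "s")]))        # 0
```

Note that the Type 1 example above (S → AT, T → xy, A → a) is reported as type 2 by this check, because each of its productions also has a single non-terminal on the left; this is consistent with the nesting of the hierarchy.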

3.1 Context Free Languages:


A context-free language (CFL) is a language generated by a context-free grammar (a Type 2
grammar in the Chomsky classification); equivalently, it is a language accepted by a pushdown
automaton.

Context Free Grammar (CFG) –

Definition − A context-free grammar (CFG) is a formal grammar which is used to generate all
possible strings in a given formal language.

A context-free grammar G can be defined as a 4-tuple:

G = (V, T, P, S)

Where,

• G describes the grammar.
• T describes a finite set of terminal symbols.
• V describes a finite set of non-terminal symbols.
• P describes a finite set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ T)*.
• S ∈ V is the start symbol.

Example:

Let the language L = { w c wᴿ | w ∈ {a, b}* }, where wᴿ is the reverse of w.

Production rules:

S → aSa

S → bSb

S→c

Now check that the string abbcbba can be derived from the given CFG:

S ⇒ aSa
  ⇒ abSba
  ⇒ abbSbba
  ⇒ abbcbba

By applying the production S → aSa once and S → bSb twice, and finally applying the production
S → c, we get the string abbcbba.

3.1.1 Rules for writing a CFG:

1. A single non-terminal should be at L.H.S.

2. The rule should always be of the form L.H.S → R.H.S, where the R.H.S may be any
combination of non-terminal and terminal symbols.

3. A null (ε) production can be specified as NT → ε, where NT is a non-terminal.

4. One of the non-terminals must be the start symbol, and conventionally the rules for this
non-terminal are written first.

3.2 Derivation and Parse Trees:


A derivation is a sequence of production-rule applications that transforms the start symbol into
the input string. While constructing a derivation, two decisions are important at each step:

a) Which non-terminal to replace, when the sentential form contains several.

b) Which production rule to apply to the chosen non-terminal.

Example:

Let a CFG G = (V, T, P, S) be

V = {S}, T = {a, b}, start symbol = S, P: S → SS | aSb | ε

One derivation of the string “abaabb” from the above CFG is:

S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abaaSbb ⇒ abaabb
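Reading a as an opening bracket and b as a closing bracket, the grammar S → SS | aSb | ε generates exactly the balanced strings, so membership can be checked with a counter instead of searching for a derivation. A minimal Python sketch, with a hypothetical function name:

```python
def in_language(w: str) -> bool:
    """Membership in L(S → SS | aSb | ε): a acts as '(' and b as ')'."""
    depth = 0
    for ch in w:
        if ch == "a":
            depth += 1
        elif ch == "b":
            depth -= 1
            if depth < 0:          # a b with no unmatched a before it
                return False
        else:
            return False           # symbol outside T = {a, b}
    return depth == 0              # every a must be matched by a later b


print(in_language("abaabb"))   # True
print(in_language("abba"))     # False
```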

Sentential Form and Partial Derivation Tree

A partial derivation tree is a sub-tree of a derivation tree/parse tree such that either all of its
children are in the sub-tree or none of them are in the sub-tree.

Example:

If in any CFG the productions are −

S → AB, A → aaA | ε, B → Bb| ε

the partial derivation tree can be the following –

If a partial derivation tree has the start symbol S as its root, the string formed by its leaves
read from left to right is called a sentential form. The yield of the above sub-tree is therefore
also a sentential form.

3.2.1 Leftmost and Rightmost Derivation of a String –

Leftmost derivation − A leftmost derivation is obtained by applying a production to the leftmost
variable (non-terminal) in each step.

Rightmost derivation − A rightmost derivation is obtained by applying a production to the
rightmost variable (non-terminal) in each step.

Example:

Consider the given grammar below –

X → X+X | X*X | X | a

Obtain the leftmost and rightmost derivation for the string "a+a*a".

Solution: The leftmost derivation for the string "a+a*a" may be −

X ⇒ X+X
  ⇒ a+X
  ⇒ a+X*X
  ⇒ a+a*X
  ⇒ a+a*a

Figure: Stepwise construction of the leftmost derivation tree for "a+a*a" (Steps 1 to 5)

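The rightmost derivation for the above string "a+a*a" may be −

X ⇒ X*X
  ⇒ X*a
  ⇒ X+X*a
  ⇒ X+a*a
  ⇒ a+a*a

This rightmost derivation begins with X → X*X, whereas the leftmost derivation above begins
with X → X+X, so the two derivations correspond to different parse trees for the same string;
this shows that the grammar is ambiguous (see Section 3.3).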

Figure: Stepwise construction of the rightmost derivation tree for "a+a*a"
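The only difference between the two derivations is which non-terminal is replaced at each step. The following minimal Python sketch replays a derivation from a list of chosen right-hand sides; it assumes one-character symbols with uppercase letters as non-terminals, and the `derive` helper is hypothetical.

```python
def derive(grammar: dict[str, list[str]], start: str,
           rhs_choices: list[str], leftmost: bool = True) -> list[str]:
    """Apply each chosen right-hand side to the leftmost (or rightmost)
    non-terminal and return every sentential form, starting with start."""
    form = start
    forms = [form]
    for rhs in rhs_choices:
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        if not positions:
            raise ValueError(f"{form!r} has no non-terminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in grammar[form[i]]:
            raise ValueError(f"{form[i]} → {rhs} is not a production")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


# Grammar of the example: X → X+X | X*X | X | a
G = {"X": ["X+X", "X*X", "X", "a"]}

lmd = derive(G, "X", ["X+X", "a", "X*X", "a", "a"], leftmost=True)
rmd = derive(G, "X", ["X*X", "a", "X+X", "a", "a"], leftmost=False)
print(" ⇒ ".join(lmd))   # X ⇒ X+X ⇒ a+X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
print(" ⇒ ".join(rmd))   # X ⇒ X*X ⇒ X*a ⇒ X+X*a ⇒ X+a*a ⇒ a+a*a
```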
Exercise Problem 1: Consider the grammar

S → (L) | a

L → L, S | S

a) What are the terminals, non-terminals and start symbol?

b) Find parse trees for the following sentences:

i) (a, a)

ii) (a, (a, a))

iii) (a, ((a, a), (a, a)))

c) Construct a leftmost derivation for each of the sentences in (b)

d) Construct a rightmost derivation for each of the sentences in (b)

e) What language does the grammar generate?

Exercise Problem 2: Consider the following grammar

S → 0A | 1B | 0 | 1

A → 0S | 1B | 1

B → 0A | 1S

Construct leftmost derivations and parse trees for the following sentences:

i) 0101

ii) 1100101

3.2.2 Parse Tree:

• A parse tree is the graphical representation of a derivation. Each node is labelled with a
symbol, which can be a terminal or a non-terminal.
• In parsing, the string is derived from the start symbol, so the root of the parse tree is the
start symbol.
• A parse tree reflects the precedence of operators. The deepest sub-tree is evaluated first,
so the operator in a parent node has lower precedence than an operator in its sub-tree.

The parse tree follows these points:

• All leaf nodes have to be terminals (or ε).
• All interior nodes have to be non-terminals.
• Reading the leaves from left to right (the yield of the tree) gives the original input string.

Example:

Production rules:

S→S+S|S*S

S → a|b|c

Input:

a*b+c

Figure: Stepwise construction of the parse tree for the input a*b+c (Steps 1 to 5)
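The parse-tree rules above (leaves are terminals, interior nodes are non-terminals, the leaves read from left to right give the input) can be sketched with a small tree type. This is a minimal Python sketch; the `Node` class and the helper names are hypothetical, uppercase letters are assumed to be non-terminals, and the tree built at the end is one possible parse tree of a*b+c (the one with + at the root).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                     # terminal, non-terminal or "ε"
    children: list[Node] = field(default_factory=list)


def tree_yield(node: Node) -> str:
    """Read the leaves from left to right; ε leaves contribute nothing."""
    if not node.children:
        return "" if node.symbol == "ε" else node.symbol
    return "".join(tree_yield(child) for child in node.children)


def is_well_formed(node: Node) -> bool:
    """Leaves must be terminals (or ε); interior nodes must be non-terminals."""
    if not node.children:
        return not node.symbol.isupper()
    return node.symbol.isupper() and all(is_well_formed(c) for c in node.children)


# One parse tree of a*b+c for S → S+S | S*S | a | b | c, with + at the root
tree = Node("S", [
    Node("S", [Node("S", [Node("a")]), Node("*"), Node("S", [Node("b")])]),
    Node("+"),
    Node("S", [Node("c")]),
])
print(tree_yield(tree))       # a*b+c
print(is_well_formed(tree))   # True
```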
3.3 Ambiguity in context free grammars and removing it:
3.3.1 Ambiguous Grammar:

A grammar is said to be ambiguous if some input string has more than one leftmost derivation,
more than one rightmost derivation, or more than one parse tree (derivation tree). If the grammar
is not ambiguous, it is called unambiguous.

Example:

S → aSb | SS

S→ε

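For the string aabb, the above grammar generates two parse trees: one uses S → aSb at the root
(S ⇒ aSb ⇒ aaSbb ⇒ aabb), and the other uses S → SS at the root with the second S deriving ε
(S ⇒ SS ⇒ aSbS ⇒ aaSbbS ⇒ aabbS ⇒ aabb).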

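An ambiguous grammar is not suitable for compiler construction, because the parser cannot tell
which parse tree (and hence which meaning) is intended. No algorithm can automatically detect
and remove ambiguity in every context-free grammar, but ambiguity can often be removed by
rewriting the grammar into an equivalent unambiguous one.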

3.3.2 Removal of Ambiguity:

To convert an ambiguous grammar into an unambiguous grammar, the following rules are used:

1. If left-associative operators (+, -, *, /) are used in the production rules, apply left recursion
in those rules. Left recursion means that the leftmost symbol on the right-hand side is the same
as the non-terminal on the left-hand side. For example,

X → Xa

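2. If a right-associative operator (^) is used in the production rules, apply right recursion in
those rules. Right recursion means that the rightmost symbol on the right-hand side is the same
as the non-terminal on the left-hand side. For example,

X → aX

3. To enforce operator precedence, introduce one non-terminal per precedence level, with
lower-precedence operators closer to the start symbol (as E, T and F do in Example 1 below).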
Example 1:

Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.

E→E+E

E→E*E

E → id

Solution:

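Let us derive the string "id + id * id". It has two different parse trees: one with + at the root,
grouping the string as id + (id * id), and one with * at the root, grouping it as (id + id) * id.

As there are two different parse trees for deriving the same string, the given grammar is
ambiguous.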
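The number of parse trees can also be counted directly. A minimal Python sketch, assuming the input is already split into tokens: for each operator position k it combines the trees of the left and right parts, which is exactly how E → E+E and E → E*E split a string. The function name is hypothetical.

```python
from functools import lru_cache


def count_trees_ambiguous(tokens: list[str]) -> int:
    """Count parse trees of tokens under E → E+E | E*E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # number of ways E can derive toks[i:j]
        total = 1 if j - i == 1 and toks[i] == "id" else 0    # E → id
        for k in range(i + 1, j - 1):                         # operator at position k
            if toks[k] in ("+", "*"):                         # E → E+E | E*E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_trees_ambiguous(["id", "+", "id", "*", "id"]))   # 2
```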

Unambiguous grammar will be:

E→E+T

E→T

T→T*F

T→F

F → id
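Applying the same counting idea to the rewritten grammar gives exactly one parse tree, with * grouped below +. A minimal Python sketch with one counting function per non-terminal (names hypothetical, tokens assumed to be pre-split):

```python
from functools import lru_cache


def count_trees_unambiguous(tokens: list[str]) -> int:
    """Count parse trees under E → E+T | T, T → T*F | F, F → id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def F(i: int, j: int) -> int:                  # F → id
        return 1 if j - i == 1 and toks[i] == "id" else 0

    @lru_cache(maxsize=None)
    def T(i: int, j: int) -> int:
        total = F(i, j)                            # T → F
        for k in range(i + 1, j - 1):
            if toks[k] == "*":
                total += T(i, k) * F(k + 1, j)     # T → T*F
        return total

    @lru_cache(maxsize=None)
    def E(i: int, j: int) -> int:
        total = T(i, j)                            # E → T
        for k in range(i + 1, j - 1):
            if toks[k] == "+":
                total += E(i, k) * T(k + 1, j)     # E → E+T
        return total

    return E(0, len(toks))


print(count_trees_unambiguous(["id", "+", "id", "*", "id"]))   # 1
```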

Example 2:

Consider a grammar G is given as follows:

S → AB | aaB

A → a | Aa

B→b

Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an
unambiguous grammar equivalent to G.

Solution:

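Let us derive the string "aab". It has two different derivations, S ⇒ aaB ⇒ aab and
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab, which correspond to two different parse trees.

As there are two different parse trees for deriving the same string, the given grammar is
ambiguous.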

Unambiguous grammar will be:

S → AB

A → Aa | a

B→b

Example 3:

Check whether the given grammar is ambiguous or not. Also, find an equivalent unambiguous
grammar.

S→S+S

S→S*S

S→S^S

S→a

Solution:

The given grammar is ambiguous because a string such as a+a*a has two different parse trees:
one with + at the root, a+(a*a), and one with * at the root, (a+a)*a.

Unambiguous grammar will be:

S → S+A | A

A→A*B|B

B→C^B|C

C→a
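A hand-written parser for this unambiguous grammar shows how the two kinds of recursion fix associativity. In this minimal Python sketch (hypothetical names; single-character tokens assumed), the left-recursive rules for S and A are implemented as loops, the usual way a top-down parser handles left recursion, which produces left-associative trees, while the right-recursive rule for B is implemented with recursion, which produces right-associative trees for ^.

```python
def parse(text: str):
    """Build a tree for S → S+A | A, A → A*B | B, B → C^B | C, C → a."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_C():                 # C → a
        expect("a")
        return "a"

    def parse_B():                 # B → C^B | C  (right recursion: right-associative)
        left = parse_C()
        if peek() == "^":
            expect("^")
            return ("^", left, parse_B())
        return left

    def parse_A():                 # A → A*B | B  (left recursion as a loop)
        tree = parse_B()
        while peek() == "*":
            expect("*")
            tree = ("*", tree, parse_B())
        return tree

    def parse_S():                 # S → S+A | A  (left recursion as a loop)
        tree = parse_A()
        while peek() == "+":
            expect("+")
            tree = ("+", tree, parse_A())
        return tree

    tree = parse_S()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


print(parse("a+a*a"))   # ('+', 'a', ('*', 'a', 'a'))
print(parse("a^a^a"))   # ('^', 'a', ('^', 'a', 'a'))
```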

3.4 Applications and Properties of CFL:

3.4.1 Closure properties of CFL:


Context-free languages (CFLs) are accepted by pushdown automata. They can be generated by
context-free grammars, which have productions (substitution rules) of the form:

A → ρ (where A ∈ N, ρ ∈ (T ∪ N)*, N is the set of non-terminals and T is the set of terminals)

Properties of Context Free Languages –

a) Union: If L1 and L2 are two context-free languages, their union L1 ∪ L2 is also context free.

Example:

L1 = { aⁿbⁿcᵐ | m ≥ 0 and n ≥ 0 } and

L2 = { aⁿbᵐcᵐ | n ≥ 0 and m ≥ 0 }

L3 = L1 ∪ L2 = { aⁿbⁿcᵐ | n, m ≥ 0 } ∪ { aⁿbᵐcᵐ | n, m ≥ 0 } is also context free.

L1 says the number of a’s should be equal to the number of b’s, and L2 says the number of b’s
should be equal to the number of c’s. A string belongs to the union if either of the two conditions
is true, so the union is also a context-free language.

b) Concatenation: If L1 and L2 are two context-free languages, their concatenation L1.L2 is also
context free.

Example:

L1 = { aⁿbⁿ | n ≥ 0 } and L2 = { cᵐdᵐ | m ≥ 0 }

L3 = L1.L2 = { aⁿbⁿcᵐdᵐ | m ≥ 0 and n ≥ 0 } is also context free.

L1 says the number of a’s should be equal to the number of b’s, and L2 says the number of c’s
should be equal to the number of d’s. Their concatenation requires first that the number of a’s
equals the number of b’s, and then that the number of c’s equals the number of d’s. So we can
build a PDA which pushes for a’s, pops for b’s, then pushes for c’s and pops for d’s. Since the
language is accepted by a pushdown automaton, it is context free.

c) Kleene Closure: If L1 is context free, its Kleene closure L1* will also be context free.

Example:

L1 = { aⁿbⁿ | n ≥ 0 }

L1* = { aⁿbⁿ | n ≥ 0 }* is also context free.
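The closure under union, concatenation and Kleene closure can be shown by building a new grammar from the given ones: S → S1 | S2 for union, S → S1S2 for concatenation and S → S1S | ε for Kleene closure, where S is a new start symbol. A minimal Python sketch, assuming grammars are dictionaries from non-terminals to lists of alternatives and that the two grammars use disjoint non-terminals (all names are hypothetical):

```python
# Each non-terminal maps to its alternatives; an alternative is a tuple of
# symbols and the empty tuple () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def _require_fresh(start: str, *grammars: Grammar) -> None:
    # The new start symbol must not already be a non-terminal of either grammar.
    if any(start in g for g in grammars):
        raise ValueError(f"start symbol {start!r} is already in use")


def union(g1: Grammar, s1: str, g2: Grammar, s2: str, start: str = "S") -> Grammar:
    """L(G1) ∪ L(G2):  S → S1 | S2"""
    _require_fresh(start, g1, g2)
    return {start: [(s1,), (s2,)], **g1, **g2}


def concat(g1: Grammar, s1: str, g2: Grammar, s2: str, start: str = "S") -> Grammar:
    """L(G1).L(G2):  S → S1 S2"""
    _require_fresh(start, g1, g2)
    return {start: [(s1, s2)], **g1, **g2}


def star(g1: Grammar, s1: str, start: str = "S") -> Grammar:
    """L(G1)*:  S → S1 S | ε"""
    _require_fresh(start, g1)
    return {start: [(s1, start), ()], **g1}


g_ab: Grammar = {"A": [("a", "A", "b"), ()]}   # A → aAb | ε  generates { aⁿbⁿ | n ≥ 0 }
g_cd: Grammar = {"C": [("c", "C", "d"), ()]}   # C → cCd | ε  generates { cᵐdᵐ | m ≥ 0 }

print(concat(g_ab, "A", g_cd, "C"))
```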

Context-free languages are not closed under −

Intersection − If L1 and L2 are context-free languages, then L1 ∩ L2 is not necessarily context
free. For example, { aⁿbⁿcᵐ | n, m ≥ 0 } ∩ { aᵐbⁿcⁿ | n, m ≥ 0 } = { aⁿbⁿcⁿ | n ≥ 0 }, which is
not context free.

Complement − If L1 is a context-free language, then its complement L1’ may not be context free.

However, context-free languages are closed under intersection with a regular language − If L1 is
a regular language and L2 is a context-free language, then L1 ∩ L2 is a context-free language.

3.4.2 Elimination of Left Recursion:


A grammar G (V, T, P, S) is left recursive if it has a production of the form

A → Aα | β.

The above grammar is left recursive because the leftmost non-terminal on the right-hand side of
the production is the same as the non-terminal on the left-hand side. Left recursion can be
eliminated by rewriting the production rules as:

A → βA′

A′ → αA′

A′ → ϵ

In a left-recursive grammar, a top-down parser that expands A produces Aα, Aαα, Aααα, ...
without consuming any input, so it enters an infinite loop. This is a major problem in top-down
parsing, and therefore left recursion must be eliminated.

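Elimination of Left Recursion

In general, left recursion can be eliminated by introducing a new non-terminal A′ such that the
productions

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

(where no βi begins with A) are replaced by

A → β1A′ | β2A′ | … | βnA′

A′ → α1A′ | α2A′ | … | αmA′ | ε

This type of recursion is also called immediate left recursion.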
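The general rule above translates directly into code. A minimal Python sketch, assuming alternatives are tuples of symbols, ε is the empty tuple and the new non-terminal is named by appending a prime; the function name is hypothetical.

```python
PRIME = "\u2032"   # the ′ in A′


def eliminate_immediate_left_recursion(nt: str, alternatives: list[tuple[str, ...]]):
    """Rewrite  A → Aα1 | … | Aαm | β1 | … | βn  as
         A  → β1A′ | … | βnA′
         A′ → α1A′ | … | αmA′ | ε
    """
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:                           # not left recursive: unchanged
        return {nt: list(alternatives)}
    new_nt = nt + PRIME
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }


# E → E + T | T
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
# {'E': [('T', 'E′')], 'E′': [('+', 'T', 'E′'), ()]}
```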

Example1: Eliminate the left recursion from the grammar

E → E + T|T

T → T * F|F

F → (E)|id

Solution:

The production after removing the left recursion will be

E → TE′

E′ → +TE′ | ε

T → FT′

T′ → *FT′ | ε

F → (E)|id
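After left recursion is removed, each non-terminal of this grammar can be turned into one procedure, and an ε alternative is chosen whenever the next token cannot start the other alternative. A minimal recursive-descent recognizer in Python, assuming the input is a list of tokens to which the sketch itself appends an end marker $; the names are hypothetical.

```python
def recognize(tokens: list[str]) -> bool:
    """Recursive-descent recognizer for
        E → T E′      E′ → + T E′ | ε
        T → F T′      T′ → * F T′ | ε
        F → ( E ) | id
    """
    toks = list(tokens) + ["$"]          # $ marks the end of the input
    pos = 0

    def match(expected: str) -> None:
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {toks[pos]!r}")
        pos += 1

    def E() -> None:                     # E → T E′
        T()
        E_prime()

    def E_prime() -> None:               # E′ → + T E′ | ε
        if toks[pos] == "+":
            match("+")
            T()
            E_prime()
        # otherwise use E′ → ε and consume nothing

    def T() -> None:                     # T → F T′
        F()
        T_prime()

    def T_prime() -> None:               # T′ → * F T′ | ε
        if toks[pos] == "*":
            match("*")
            F()
            T_prime()

    def F() -> None:                     # F → ( E ) | id
        if toks[pos] == "(":
            match("(")
            E()
            match(")")
        else:
            match("id")

    try:
        E()
        match("$")                       # the whole input must be consumed
    except SyntaxError:
        return False
    return True


print(recognize(["id", "+", "id", "*", "id"]))   # True
print(recognize(["id", "+"]))                    # False
```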

Example2: Consider the following grammar

A → ABd|Aa|a

B → Be|b

Remove left recursion.

3.4.3 Left Factoring:


A grammar needs left factoring when it is of the form –

A → αβ1 | αβ2 | αβ3 | …… | αβn | γ, i.e. two or more productions for A start with the same
terminal (or string of symbols) α. On seeing the input α, we cannot immediately tell which
production to choose to expand A.

Left factoring is a grammar transformation that is useful for producing grammar suitable for
predictive or top-down parsing. When the choice between two alternative A-productions is not
clear, we may be able to rewrite the productions to defer the decision until enough of the input
has been seen to make the right choice.

For the grammar A → αβ1 | αβ2 | αβ3 | …… | αβn | γ

The equivalent left factored grammar will be –


A → αA′ | γ

A′ → β1 | β2 | β3 | …… | βn
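One round of left factoring groups the alternatives that start with the same symbol, takes their longest common prefix as α and moves the remainders into a new non-terminal. A minimal Python sketch, assuming alternatives are tuples of symbols, ε is the empty tuple and new non-terminals are named with primes; the helper names are hypothetical. The result may itself need another round of factoring.

```python
PRIME = "\u2032"   # the ′ in A′


def common_prefix(seqs: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Longest common prefix of a non-empty list of symbol tuples."""
    prefix = seqs[0]
    for seq in seqs[1:]:
        n = 0
        while n < len(prefix) and n < len(seq) and prefix[n] == seq[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def left_factor(nt: str, alternatives: list[tuple[str, ...]]):
    """One round: A → αβ1 | … | αβn | γ  becomes  A → αA′ | γ,  A′ → β1 | … | βn"""
    groups = {}
    for alt in alternatives:                     # group by first symbol
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {nt: []}
    primes = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:     # nothing to factor out
            result[nt].extend(group)
            continue
        alpha = common_prefix(group)
        primes += 1
        new_nt = nt + PRIME * primes             # A′, A′′, ... for several groups
        result[nt].append(alpha + (new_nt,))
        result[new_nt] = [alt[len(alpha):] for alt in group]
    return result


S_alts = [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]
print(left_factor("S", S_alts))
# {'S': [('i', 'E', 't', 'S', 'S′'), ('a',)], 'S′': [(), ('e', 'S')]}
```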

Example1:

Consider the following grammar to do left factoring.

S → iEtS | iEtSeS | a

E→b

Solution:

The left factored grammar becomes,

S → iEtSS′ | a

S′ → eS | ε

E → b

Example2:

Do left factoring in the following grammar –

A → Aab | aA | a

B → Bb | b

3.4.4 Applications of CFL:

• Context-free grammars are used in compilers (like GCC) for parsing. In this phase, the compiler
takes a program (a string of tokens) and checks its structure against the grammar of the language.
• Context-free grammars are used to define the high-level structure of programming languages.
• Every context-free grammar can be converted into a parser, which is the component of a
compiler that identifies the structure of a program and converts the program into a tree.
• A Document Type Definition (DTD) in XML is essentially a context-free grammar, which
describes the tags of a document type (such as HTML) and the rules for using the tags in a
nested fashion.

Following is the context-free grammar for HTML (with limited tags):

• Char → a | A | . . .
• Text → λ | Char Text
• Doc → λ | Element Doc
• Element → Text | < EM > Doc < /EM > | < P > Doc | < OL > List < /OL >
• List → λ | ListItem List
• ListItem → < li > Doc

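Every finite set of strings is a regular language, and every regular language is a context-free
language. Programming languages are infinite, but their syntax (the structure of statements,
expressions and blocks) can be represented by a context-free grammar; context-sensitive
requirements, such as declaring a variable before it is used, are checked separately during
semantic analysis.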

Algebraic Expressions can be represented using Context Free Grammar.

For example, the following rules form a context-free grammar for syntactically correct infix
expressions over three variables (x, y, z):

• S → ε
• S → (S)
• S → x
• S → y
• S → z
• S → S + S
• S → S - S
• S → S * S
• S → S / S
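A grammar like this can also be used to generate strings. A minimal Python sketch that picks a random alternative for each S; the depth limit is an assumption added so that the expansion always terminates (past the limit only the non-recursive alternatives are chosen), and the function name is hypothetical.

```python
import random

# Right-hand sides of the grammar above; "" stands for the ε alternative.
RULES = ["", "(S)", "x", "y", "z", "S+S", "S-S", "S*S", "S/S"]
NON_RECURSIVE = ["", "x", "y", "z"]


def generate(rng: random.Random, depth: int = 0, max_depth: int = 4) -> str:
    """Expand S once with a random alternative, recursively expanding each S."""
    choices = RULES if depth < max_depth else NON_RECURSIVE
    rhs = rng.choice(choices)
    return "".join(generate(rng, depth + 1, max_depth) if sym == "S" else sym
                   for sym in rhs)


rng = random.Random(1)
for _ in range(3):
    print(repr(generate(rng)))
```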
