Automata Lecture 5

The document discusses Context-Free Languages (CFLs) and Context-Free Grammars (CFGs), highlighting their significance in recognizing non-regular languages and their applications in programming languages and compilers. It provides examples of CFGs, including those for palindromes and balanced parentheses, and explains the structure of productions and derivations. The document emphasizes the importance of CFGs in parsing and syntax checking in various programming contexts.
MENOUFIA UNIVERSITY

FACULTY OF COMPUTERS AND INFORMATION

Third Level (Second Semester)


Software Engineering, CS Dept., (CS314)

FORMAL LANGUAGES AND AUTOMATA THEORY

Dr. Hamdy M. Mousa

Today's Proverb
Knowledge is power
Context-Free Languages & Grammars (CFLs & CFGs)

Not all languages are regular
• Not all languages are regular

• So what happens to the languages which are not regular?
• Can we still come up with a language recognizer?
• i.e., something that will accept (or reject) strings that belong (or do not belong) to the language?
Context-Free Languages (CFL)
• A language class larger than the class of regular languages
• Supports natural, recursive notation called “context-free grammar”
• Applications:
• Parse trees, compilers
• XML
[Figure: the regular languages (FA/RE) drawn inside the context-free languages (PDA/CFG)]
Example
• A palindrome is a word that reads identically from both ends
• E.g., madam, redivider, malayalam, 010010010
• Let L = { w | w is a binary palindrome}
• Is L regular?
• No.
• Proof:
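• Let w = 0^N 1 0^N (assuming N to be the pumping-lemma constant)
• By the pumping lemma, w can be rewritten as xyz, such that x y^k z is also in L (for any k ≥ 0)
• But |xy| ≤ N and y ≠ ε
• ==> y = 0^+ (y consists only of 0s from the first block)
• ==> x y^k z will NOT be in L for k = 0 (it becomes 0^(N−|y|) 1 0^N, which is not a palindrome)
• ==> Contradiction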
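The pumping argument above can be checked mechanically for a small value of N. The sketch below is only an illustration: N = 5 and the helper names are assumptions, not part of the lecture. It tries every split w = xyz with |xy| ≤ N and y ≠ ε and collects any pumped-down string (k = 0) that is still a palindrome.

```python
def is_palindrome(s):
    return s == s[::-1]

def pump_down_survivors(n):
    """Return every string xz (k = 0) that is still a palindrome."""
    w = "0" * n + "1" + "0" * n
    survivors = []
    # x = w[:i], y = w[i:j], z = w[j:], with |xy| = j <= n and y non-empty
    for j in range(1, n + 1):
        for i in range(j):
            x, z = w[:i], w[j:]
            if is_palindrome(x + z):
                survivors.append(x + z)
    return survivors

N = 5  # assumed stand-in for the pumping-lemma constant
print(pump_down_survivors(N))  # an empty list means no split survives
```

A finite check like this does not replace the proof, which holds for every N; it only makes the case analysis concrete.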
Palindromes
But the language of palindromes is a CFL, because it supports recursive substitution (in the form of a CFG)
• This is because we can construct a “grammar” like this:
1. A ==> ε
2. A ==> 0
3. A ==> 1
4. A ==> 0A0
5. A ==> 1A1
(Each numbered rule is a production; 0 and 1 are terminals; A is a variable, or non-terminal.)
Same as:
A => 0A0 | 1A1 | 0 | 1 | ε
How does this grammar work?
Context-Free Grammar (CFG)
• How does the CFG for palindromes work?
• An input string belongs to the language (i.e., is accepted) iff it can be generated by the CFG
Example: w=01110
G:
A => 0A0 | 1A1 | 0 | 1 | ε
• G can generate w as follows:
1. A => 0A0
2. => 01A10
3. => 01110
Generating a string from a grammar:
1. Pick and choose a sequence of productions that would allow us to generate the string.
2. At every step, substitute one variable with one of its productions.
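The two steps for generating a string can be sketched in code for the palindrome grammar. This is a minimal illustration, not the lecture's own algorithm; the function name `derive_palindrome` is hypothetical. It peels matching symbols off both ends (productions A => 0A0 and A => 1A1) and finishes with A => 0, A => 1 or A => ε.

```python
def derive_palindrome(w):
    """Return the sentential forms of a derivation of w from A, or None."""
    steps = ["A"]
    left, right = "", ""      # terminals already produced around A
    rest = w                  # what the remaining A must still generate
    while len(rest) >= 2:
        if rest[0] != rest[-1] or rest[0] not in "01":
            return None
        # apply A => 0A0 or A => 1A1
        left, right = left + rest[0], rest[-1] + right
        rest = rest[1:-1]
        steps.append(left + "A" + right)
    if rest not in ("", "0", "1"):
        return None
    steps.append(left + rest + right)   # apply A => 0, A => 1 or A => ε
    return steps

print(" => ".join(derive_palindrome("01110")))
```

For w = 01110 the returned forms are the same three steps listed on the slide, preceded by the start variable A.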
Context-Free Grammar (CFG)
Definition
• A context-free grammar G=(V,T,P,S), where:
• V: set of variables or non-terminals
• T: set of terminals (= alphabet ∪ {ε})
• P: set of productions, each of which is of the form
V ==> α1 | α2 | …
• Where each αi is an arbitrary string of variables and terminals
• S: start variable
Example:
CFG for the language of binary palindromes:
• G=({A},{0,1},P,A)
• P: A ==> 0 A 0 | 1 A 1 | 0 | 1 | ε
Example
• CFG
G = ({A, B, C, S}, {a, b, c}, P, S)
P:
(1) S –> ABC
(2) A –> aA
(3) A –> ε
(4) B –> bB
(5) B –> ε
(6) C –> cC
(7) C –> ε
Equivalently: A –> aA | ε, B –> bB | ε, C –> cC | ε
Is aabc ∈ L(G)?
Example
P: (1) S –> ABC
(2) A –> aA
(3) A –> ε
(4) B –> bB
(5) B –> ε
(6) C –> cC
(7) C –> ε
• Example Derivations:

S => ABC (1)
=> BC (3)
=> C (5)
=> ε (7)

S => ABC (1)
=> aABC (2)
=> aaABC (2)
=> aaBC (3)
=> aabBC (4)
=> aabC (5)
=> aabcC (6)
=> aabc (7)

• Note that G generates the language a*b*c*
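The numbered derivations can be replayed by applying each production to the leftmost occurrence of its head. The sketch below assumes that encoding (a dictionary `P` keyed by rule number, with the empty string standing for ε); the names are illustrative only.

```python
# Productions of G, numbered as on the slide; "" stands for ε.
P = {
    1: ("S", "ABC"),
    2: ("A", "aA"), 3: ("A", ""),
    4: ("B", "bB"), 5: ("B", ""),
    6: ("C", "cC"), 7: ("C", ""),
}

def apply_rules(rule_numbers, start="S"):
    """Apply each rule to the leftmost occurrence of its head."""
    form = start
    forms = [form]
    for r in rule_numbers:
        head, body = P[r]
        if head not in form:
            raise ValueError(f"rule {r} does not apply to {form!r}")
        form = form.replace(head, body, 1)
        forms.append(form)
    return forms

print(apply_rules([1, 2, 2, 3, 4, 5, 6, 7]))  # the derivation of aabc
print(apply_rules([1, 3, 5, 7]))              # the derivation of ε
```

Any rule sequence that runs without an error and ends in a string with no variables yields a word of a*b*c*.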


More examples
• Parenthesis matching in code
• Syntax checking
• In scenarios where there is a general need for:
• Matching a symbol with another symbol, or
• Matching a count of one symbol with that of another symbol, or
• Recursively substituting one symbol with a string of other symbols
Example #2
• Language of balanced parentheses
e.g., ()(((())))((()))….
• CFG?
G:
S => (S) | SS | ε

How would you “interpret” the string “(((()))()())” using this grammar?
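One way to interpret a string with the grammar S => (S) | SS | ε is to recover its nesting structure. The sketch below is an assumption-laden illustration (the function name `parse` and the nested-list encoding are not from the lecture): each inner list corresponds to one use of S => (S), and sibling lists correspond to S => SS.

```python
def parse(s):
    """Return nested lists mirroring S => (S) | SS | ε, or None if unbalanced."""
    stack = [[]]
    for ch in s:
        if ch == "(":
            stack.append([])         # start a new S => (S) group
        elif ch == ")":
            if len(stack) == 1:
                return None          # closing bracket with no partner
            group = stack.pop()
            stack[-1].append(group)  # siblings in one list come from S => SS
        else:
            return None              # symbol outside the alphabet
    return stack[0] if len(stack) == 1 else None

print(parse("(((()))()())"))
```

For the slide's string, the result shows one outer pair containing three groups: a triply nested pair followed by two empty pairs.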
Example #3
• A grammar for L = {0^m 1^n | m ≥ n}

• CFG?
G:
S => 0S1 | A
A => 0A | ε

How would you interpret the string “00000111” using this grammar?
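A possible reading of the question is to write out a derivation: use S => 0S1 once per 1, switch to A, then use A => 0A for each surplus 0. The sketch below builds that derivation; the function name `derive` is hypothetical, and other derivation orders are not explored.

```python
def derive(w):
    """Return a derivation of w as a list of sentential forms, or None."""
    m = len(w) - len(w.lstrip("0"))   # number of leading 0s
    n = len(w) - m                    # length of the remainder
    if w[m:] != "1" * n or m < n:
        return None                   # not of the form 0^m 1^n with m >= n
    forms = ["S"]
    forms += ["0" * i + "S" + "1" * i for i in range(1, n + 1)]            # S => 0S1
    forms.append("0" * n + "A" + "1" * n)                                  # S => A
    forms += ["0" * (n + k) + "A" + "1" * n for k in range(1, m - n + 1)]  # A => 0A
    forms.append(w)                                                        # A => ε
    return forms

for form in derive("00000111"):
    print(form)
```

For “00000111” this uses S => 0S1 three times, S => A once, A => 0A twice, and A => ε once.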
Example #4
A program containing if-then(-else) statements:

if Condition then Statement else Statement

OR

if Condition then Statement

CFG?
More examples
• L1 = {0^n | n ≥ 0}
• L2 = {0^n | n ≥ 1}
• L3 = {0^i 1^j 2^k | i=j or j=k, where i,j,k ≥ 0}
• L4 = {0^i 1^j 2^k | i=j or i=k, where i,j,k ≥ 1}
Applications of CFLs & CFGs
• Compilers use parsers for syntactic checking
• Parsers can be expressed as CFGs
1. Balancing parentheses:
• B ==> BB | (B) | Statement
• Statement ==> …
2. If-then-else:
• S ==> SS | if Condition then Statement else Statement | if Condition then Statement | Statement
• Condition ==> …
• Statement ==> …
3. C parenthesis matching { … }
4. Pascal begin-end matching
5. YACC (Yet Another Compiler-Compiler)
More applications
• Markup languages
• Nested Tag Matching
• HTML
• <html> … <p> … <a href=…> … </a> </p> … </html>

• XML
• <PC> … <MODEL> … </MODEL> .. <RAM> … </RAM> … </PC>
Tag-Markup Languages
Roll ==> <ROLL> Class Students </ROLL>
Class ==> <CLASS> Text </CLASS>
Text ==> Char Text | Char
Char ==> a | b | … | z | A | B | .. | Z
Students ==> Student Students | ε
Student ==> <STUD> Text </STUD>

Here, the left-hand side of each production denotes one non-terminal (e.g., “Roll”, “Class”, etc.).
Those symbols on the right-hand side for which no productions (i.e., substitutions) are defined are terminals (e.g., ‘a’, ‘b’, ‘<’, ‘>’, “ROLL”, etc.).
Structure of a production

A ==> α1 | α2 | … | αk
(head: A; derivation symbol: ==>; body: α1 | α2 | … | αk)

The above is same as:

1. A ==> α1
2. A ==> α2
3. A ==> α3
…
k. A ==> αk
CFG Conventions
• Terminal symbols <== a, b, c…

• Non-terminal symbols <== A,B,C, …

• Terminal or non-terminal symbols <== X,Y,Z

• Terminal strings <== w, x, y, z

• Arbitrary strings of terminals and non-terminals <== α, β, γ, ..
Syntactic Expressions
Syntactic Expressions in Programming Languages:
result = a*b + score + 10 * distance + c
[Figure labels on the expression: “variables”, “terminals”, and the note “Operators are also terminals”]

Regular languages have only terminals

• Reg expression = [a-z][a-z0-1]*
• If we allow only letters a & b, and 0 & 1 for
constants (for simplification)
• Regular expression = (a + b)(a+b+0+1)*
String membership
How to say if a string belongs to the language defined by a CFG?
1. Derivation
• Head to body
2. Recursive inference
• Body to head
Both are equivalent forms
Example:
G:
A => 0A0 | 1A1 | 0 | 1 | ε
• w = 01110
• Is w a palindrome?
A => 0A0
=> 01A10
=> 01110
Simple Expressions…
• We can write a CFG for accepting simple expressions
• G = (V,T,P,S)
• V = {E,F}
• T = {0,1,a,b,+,*,(,)}
• S = E
• P:
• E ==> E+E | E*E | (E) | F
• F ==> aF | bF | 0F | 1F | a | b | 0 | 1
Generalization of derivation
◼ Derivation is head ==> body

◼ A ==> X (A derives X in a single step)

◼ A ==>*G X (A derives X in zero or more steps)

◼ Transitivity:

IF A ==>*G B, and B ==>*G C, THEN A ==>*G C


Context-Free Language
• The language of a CFG, G=(V,T,P,S), denoted by L(G), is the set of terminal strings that have a derivation from the start variable S.

• L(G) = { w in T* | S ==>*G w }
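The definition of L(G) can be explored by enumerating derivations breadth-first. The sketch below is an illustration under stated assumptions: variables are single characters, ε is the empty string, and the function name `language_up_to` is hypothetical. It prunes any sentential form with more than `max_len` terminals, so it is only guaranteed to terminate for grammars, such as the palindrome grammar, whose forms cannot keep growing without gaining terminals.

```python
from collections import deque

def language_up_to(productions, start, max_len):
    """Collect every terminal string of length <= max_len derivable from start.

    productions maps a one-character variable to a list of bodies ("" is ε).
    """
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        pos = next((i for i, c in enumerate(form) if c in productions), None)
        if pos is None:
            words.add(form)           # no variables left: a word of L(G)
            continue
        for body in productions[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            terminals = sum(c not in productions for c in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

palindromes = {"A": ["0A0", "1A1", "0", "1", ""]}
print(sorted(language_up_to(palindromes, "A", 3), key=lambda s: (len(s), s)))
```

With the palindrome grammar and a length bound of 3, the result is exactly the binary palindromes of length at most 3.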
Left-most & Right-most Derivation Styles
G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε
Derive the string a*(ab+10) from G, i.e., show E ==>*G a*(ab+10):

Left-most derivation (always substitute the leftmost variable):
◼ E
◼ ==> E * E
◼ ==> F * E
◼ ==> aF * E
◼ ==> a * E
◼ ==> a * (E)
◼ ==> a * (E + E)
◼ ==> a * (F + E)
◼ ==> a * (aF + E)
◼ ==> a * (abF + E)
◼ ==> a * (ab + E)
◼ ==> a * (ab + F)
◼ ==> a * (ab + 1F)
◼ ==> a * (ab + 10F)
◼ ==> a * (ab + 10)

Right-most derivation (always substitute the rightmost variable):
◼ E
◼ ==> E * E
◼ ==> E * (E)
◼ ==> E * (E + E)
◼ ==> E * (E + F)
◼ ==> E * (E + 1F)
◼ ==> E * (E + 10F)
◼ ==> E * (E + 10)
◼ ==> E * (F + 10)
◼ ==> E * (aF + 10)
◼ ==> E * (abF + 10)
◼ ==> E * (ab + 10)
◼ ==> F * (ab + 10)
◼ ==> aF * (ab + 10)
◼ ==> a * (ab + 10)
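The leftmost style can be replayed in code: at each step, find the leftmost variable and substitute the chosen body. The sketch below assumes F has an ε production (the derivation relies on it); the names `G`, `leftmost_step` and `replay` are illustrative, and spaces are omitted from the sentential forms.

```python
# Grammar from the slide; "" stands for ε.
G = {"E": ["E+E", "E*E", "(E)", "F"],
     "F": ["aF", "bF", "0F", "1F", ""]}

def leftmost_step(form, body):
    """Replace the leftmost variable in form with body, if that is a production."""
    for i, sym in enumerate(form):
        if sym in G:
            if body not in G[sym]:
                raise ValueError(f"{sym} => {body!r} is not a production")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to substitute")

def replay(bodies, start="E"):
    forms = [start]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return forms

# Bodies chosen at each step of the leftmost derivation of a*(ab+10)
bodies = ["E*E", "F", "aF", "", "(E)", "E+E", "F", "aF", "bF", "",
          "F", "1F", "0F", ""]
for form in replay(bodies):
    print("==>", form)
```

A rightmost variant would scan the form from the right instead; the set of productions used is the same, only their order changes.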


Leftmost vs. Rightmost derivations
Q1) For every leftmost derivation, there is a rightmost derivation, and vice versa. True or False?
True - will use parse trees to prove this

Q2) Does every word generated by a CFG have a leftmost and a rightmost derivation?
Yes – easy to prove (reverse direction)
Q3) Could there be words which have more than one leftmost (or rightmost) derivation?
Yes – depending on the grammar
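Q3 can be made concrete by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The sketch below uses a simplified grammar, E => E+E | E*E | (E) | a, which is an assumption made here so that operands are single symbols; the function name is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees of s under E => E+E | E*E | (E) | a (simplified grammar)."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # number of parse trees deriving s[i:j] from E
        if j - i == 1:
            return 1 if s[i] == "a" else 0
        total = 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)                # E => (E)
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)  # E => E+E or E => E*E
        return total
    return count(0, len(s)) if s else 0

for word in ["a", "a+a*a", "(a+a)*a"]:
    print(word, count_parse_trees(word))
```

A count greater than 1 means the word has more than one leftmost derivation, which is the situation Q3 asks about.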
Example
Construct a CFG for the regular expression (0+1)*

Solution:
• The CFG can be given by,
• Production rule (P):
• S → 0S | 1S
• S → ε
• The rules prepend 0's and 1's to the start symbol in any combination, since (0+1)* denotes {ε, 0, 1, 01, 10, 00, 11, …}.
Example
• Construct a CFG for the language L = {w c w^R : w ∈ {a, b}*}.
• Solution:
• Strings in this language include {aacaa, bcb, abcba, bacab, abbcbba, ....}
• The grammar could be:
• S → aSa rule 1
• S → bSb rule 2
• S → c rule 3

• Now if we want to derive the string "abbcbba", we can start with the start symbol.
• S → aSa using rule 1
• → abSba using rule 2
• → abbSbba using rule 2
• → abbcbba using rule 3
• Thus any string of this kind can be derived from the given production rules.
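The claim that every string of this kind can be derived follows a fixed pattern: one use of rule 1 or rule 2 per symbol of w, then rule 3. The sketch below writes out that derivation for any w over {a, b}; the function name `derive_wcwr` is hypothetical.

```python
def derive_wcwr(w):
    """Sentential forms deriving w c w^R with S → aSa | bSb | c."""
    if set(w) - {"a", "b"}:
        raise ValueError("w must be a string over {a, b}")
    forms = ["S"]
    for i in range(1, len(w) + 1):
        prefix = w[:i]
        forms.append(prefix + "S" + prefix[::-1])   # rule 1 (a) or rule 2 (b)
    forms.append(w + "c" + w[::-1])                 # rule 3
    return forms

print(" → ".join(derive_wcwr("abb")))
```

For w = abb this reproduces the four steps shown above for "abbcbba".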
Example
• Let G(L) = ({S, A, B}, {a, b}, P, S) where P is given by:
• S → aA | bB | a | b rule 1
• A → aA | a rule 2
• B → bB | b rule 3
1- Is the string “aa” accepted or not?
S → aA using rule 1 (S → aA)
→ aa using rule 2 (A → a)
The string is accepted.
2- Is the string “bbb” accepted or not?
S → bB using rule 1 (S → bB)
→ bbB using rule 3 (B → bB)
→ bbb using rule 3 (B → b)
The string is accepted.
3- Is the string “aaba” accepted or not?
S → aA using rule 1 (S → aA)
→ aaA using rule 2 (A → aA)
The string is not accepted: the third symbol must be b, but A can only generate a's.
Example
• Let G(L) = ({S, B, C}, {a, b, c}, P, S) where P is given by:
• S → aSBC | aBC rule 1
• CB → BC rule 2
• aB → ab rule 3
• bB → bb rule 4
• bC → bc rule 5
• cC → cc rule 6
• Note: rules 2–6 have more than one symbol on the left-hand side, so this grammar is not context-free.
1- Is the string “abc” accepted or not?
S → aBC using rule 1 (S → aBC)
→ abC using rule 3 (aB → ab)
→ abc using rule 5 (bC → bc)
The string is accepted.
Example…
Let G(L) = ({S, B, C}, {a, b, c}, P, S) where P is given by:
S → aSBC | aBC rule 1
CB → BC rule 2
aB → ab rule 3
bB → bb rule 4
bC → bc rule 5
cC → cc rule 6

2- Is the string “a^2 b^2 c^2” (i.e., aabbcc) accepted or not?

S → aSBC using rule 1 (S → aSBC)
→ aaBCBC using rule 1 (S → aBC)
→ aaBBCC using rule 2 (CB → BC)
→ aabBCC using rule 3 (aB → ab)
→ aabbCC using rule 4 (bB → bb)
→ aabbcC using rule 5 (bC → bc)
→ aabbcc using rule 6 (cC → cc)
The string is accepted.
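Membership questions for this grammar can be explored with a breadth-first search over sentential forms. Rules such as CB → BC rewrite more than one symbol, so the sketch below rewrites substrings rather than single variables. It relies on one observation: no rule shortens a form, so any form longer than the target can be discarded, which keeps the search finite. The names `RULES` and `find_derivation` are illustrative.

```python
from collections import deque

RULES = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def find_derivation(target, start="S"):
    """Breadth-first search for a derivation of target; None if there is none."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            path = []
            while form is not None:      # walk back to the start symbol
                path.append(form)
                form = parent[form]
            return path[::-1]
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # no rule shortens a form, so longer forms can be discarded
                if len(new) <= len(target) and new not in parent:
                    parent[new] = form
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return None

print(find_derivation("abc"))
print(find_derivation("aabbcc"))
```

The search returns a shortest derivation when one exists, so its steps may be ordered differently from a derivation written by hand.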
Quiz
Let G(L) = ({S, B, C}, {a, b, c}, P, S) where P is given by:
• S → aSBC | aBC
• CB → BC
• aB → ab
• bB → bb
• bC → bc
• cC → cc
• 1- Is the string “a^3 b^3 c^3” accepted or not?
• 2- Is the string “a^3 b^2” accepted or not?
