
Pcdunit2 Continuation

The document discusses the specification and recognition of tokens using regular expressions, defining key concepts such as alphabets, strings, languages, and operations on languages. It explains the rules for constructing regular expressions, provides examples, and illustrates the use of transition diagrams for lexical analysis. Additionally, it contrasts parse trees with syntax trees, highlighting their differences and uses in representing grammatical structures.

Uploaded by

venkat Mohan
Copyright © All Rights Reserved

Specification and Recognition of Tokens
Regular Expression
• A regular expression is a pattern that
specifies a set of strings of characters; it is
said to match the strings in that set.
• It is a declarative way of describing
regular languages.
Example
• letter ( letter | digit )*
Terminology of Languages
• Alphabet: a finite set of symbols (for example, the ASCII characters)
• String:
– A finite sequence of symbols drawn from an alphabet
– "Sentence" and "word" are also used as synonyms for "string"
– ε is the empty string
– |s| is the length of string s
• Language: a set of strings over some fixed alphabet
– ∅, the empty set, is a language
– {ε}, the set containing only the empty string, is a language
• Operators on strings:
– Concatenation: xy represents the concatenation of strings x and y
– sε = s and εs = s
– s^n = s s s … s (n times), s^0 = ε
Operations on Languages
• Concatenation:
– L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
• Union:
– L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
• Exponentiation:
– L^0 = {ε}, L^1 = L, L^2 = LL
• Kleene closure:
– $L^{*} = \bigcup_{i=0}^{\infty} L^{i}$
– L* denotes "zero or more concatenations of" L
• Positive closure:
– $L^{+} = \bigcup_{i=1}^{\infty} L^{i}$
– L+ denotes "one or more concatenations of" L
Example
• L1 = {a,b,c,d}, L2 = {1,2}
• L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
• L1 ∪ L2 = {a,b,c,d,1,2}
• L1^3 = all strings of length three over {a,b,c,d}
• L1* = all strings over {a,b,c,d}, including the empty string
• L1+ = all strings over {a,b,c,d}, excluding the empty string
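The operations in this example can be tried out on finite sets. The following is a minimal Python sketch, not part of the original slides: the helper names `concat`, `power` and `closure_up_to` are made up for illustration, the empty Python string `""` stands for ε, and the closures are only approximated up to a chosen power because L* is infinite.

```python
from itertools import product

def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {s1 + s2 for s1, s2 in product(l1, l2)}

def power(lang, n):
    """Exponentiation: L^0 = {""} and L^n = L^(n-1) L."""
    result = {""}  # "" plays the role of the empty string
    for _ in range(n):
        result = concat(result, lang)
    return result

def closure_up_to(lang, max_power, positive=False):
    """Finite approximation of L* (or L+): union of L^i up to max_power."""
    start = 1 if positive else 0
    out = set()
    for i in range(start, max_power + 1):
        out |= power(lang, i)
    return out

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(concat(L1, L2)))  # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2', 'd1', 'd2']
print(sorted(L1 | L2))         # ['1', '2', 'a', 'b', 'c', 'd']
print(len(power(L1, 3)))       # 64, i.e. 4 * 4 * 4 strings of length three
print("" in closure_up_to(L1, 2))                 # True: L* contains the empty string
print("" in closure_up_to(L1, 2, positive=True))  # False: L+ does not
```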
Regular Set
• A language that can be defined by a
regular expression is called a regular set.
• A language that can be defined by a
context-free grammar is called a context-free
language.
• The set of regular sets ⊂ the set of
context-free languages
Rule for Regular Expressions
• The rules that define the regular expressions over
alphabet Σ are as follows.
1. ε is a regular expression denoting {ε}
2. If a is a symbol in Σ, then a is a regular
expression denoting {a}
3. Suppose r and s are regular expressions denoting the
languages L(r) and L(s). Then
a) (r) | (s) is a regular expression denoting
L(r) ∪ L(s)
b) (r)(s) is a regular expression denoting L(r)L(s)
c) (r)* is a regular expression denoting (L(r))*
Rule for Regular Expressions (Cont.)

Unnecessary parentheses can be avoided in
regular expressions if we adopt the following
conventions:
1. The unary operator * has the highest precedence
and is left associative.
2. Concatenation has the second highest
precedence and is left associative.
3. | has the lowest precedence and is left
associative.
Under these conventions, (a) | ((b)*(c)) can be written as a | b*c.
Rule for Regular Expressions (Example)
• Let Σ = {a, b}
1. a | b denotes {a, b}.
2. (a | b)(a | b) denotes {aa, ab, ba, bb}, the set of
all strings of a’s and b’s of length two.
3. a* denotes {ε, a, aa, aaa, …}, the set of all
strings of zero or more a’s.
4. (a | b)* denotes the set of all strings containing
zero or more instances of a or b.
5. a | a*b denotes the set containing the string a and all
strings consisting of zero or more a’s followed by b.
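These five examples can be checked with Python's `re` module, whose `|`, `*` and parentheses follow the same precedence conventions. This is a small sketch for illustration; the helper `in_language` and the sample strings are assumptions, not part of the slides. `re.fullmatch` is used so that the whole string must belong to the language.

```python
import re

def in_language(pattern, s):
    """Return True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# A few member strings for each example expression ("" is the empty string).
EXAMPLES = {
    r"a|b":        ["a", "b"],
    r"(a|b)(a|b)": ["aa", "ab", "ba", "bb"],
    r"a*":         ["", "a", "aa", "aaa"],
    r"(a|b)*":     ["", "abba", "bbb"],
    r"a|a*b":      ["a", "b", "ab", "aaab"],
}

for pattern, members in EXAMPLES.items():
    for s in members:
        assert in_language(pattern, s), (pattern, s)

# Precedence: a|a*b groups as a | ((a*)b), so "aa" is not in the language.
print(in_language(r"a|a*b", "aa"))     # False
# Adding parentheses changes the language.
print(in_language(r"(a|a)*b", "aab"))  # True
```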
Notational Shorthands
1. One or more instances: +
– a+ : the set of all strings of one or more a’s
– r+ = r r*, r* = r+ | ε
2. Zero or one instance: ?
– r? = r | ε
digit → 0 | 1 | … | 9
digits → digit+
optional_fraction → (. digits)?
optional_exponent → (E (+ | -)? digits)?
num → digits optional_fraction optional_exponent
3. Character class:
– [abc] = a | b | c
– [a-z] = a | b | … | z
– id → [A-Za-z][A-Za-z0-9]*
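The regular definitions for num and id translate almost directly into Python regular expressions. The sketch below builds num from its parts the same way the definitions do; the constant names (`DIGITS`, `OPTIONAL_FRACTION` and so on) and the sample inputs are illustrative assumptions.

```python
import re

# Each definition is built from the ones before it, as in the slide.
DIGITS = r"[0-9]+"                          # digits -> digit+
OPTIONAL_FRACTION = rf"(\.{DIGITS})?"       # (. digits)?
OPTIONAL_EXPONENT = rf"(E[+-]?{DIGITS})?"   # (E (+|-)? digits)?
NUM = re.compile(DIGITS + OPTIONAL_FRACTION + OPTIONAL_EXPONENT)
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")    # id -> [A-Za-z][A-Za-z0-9]*

for text in ["42", "3.14", "6.02E23", "1E-5", ".5", "1.", "1E"]:
    print(f"{text!r:10} num? {NUM.fullmatch(text) is not None}")

for text in ["x1", "count", "1x", ""]:
    print(f"{text!r:10} id?  {ID.fullmatch(text) is not None}")
```

Note that `.5` and `1.` are rejected: the definition requires digits on both sides of the decimal point.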
Recognition of Tokens
• Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
Recognition of Tokens (Cont.)
• The regular definitions for tokens are as follows:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter (letter | digit)*
num → digit+ (. digit+)? (E (+ | -)? digit+)?
delim → blank | tab | newline
ws → delim+
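One way to see these definitions at work is a small scanner that tries each pattern at the current input position. The sketch below is an illustration under stated assumptions, not the method the slides prescribe: the names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are made up, keywords are recognized by first matching an id and then looking the lexeme up in a keyword set, and because Python's `|` picks the first alternative that matches rather than the longest, the two-character operators are listed before the one-character ones.

```python
import re

KEYWORDS = {"if", "then", "else"}

# (token name, pattern) pairs; inner groups are non-capturing on purpose.
TOKEN_SPEC = [
    ("ws",    r"[ \t\n]+"),                              # delim+
    ("num",   r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("id",    r"[A-Za-z][A-Za-z0-9]*"),
    ("relop", r"<=|<>|>=|<|=|>"),                        # longer operators first
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token, lexeme) pairs, discarding white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue              # white space produces no token
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme         # if, then, else are their own tokens
        tokens.append((kind, lexeme))
    return tokens

for token in tokenize("if x1 <= 10 then y = 2.5E+3 else y <> z"):
    print(token)
```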
Regular-expression Patterns for Tokens
Transition Diagrams
• A lexical analyzer uses a transition diagram to
keep track of information about the characters
that are seen as the forward pointer scans
the input.
• Positions in a transition diagram are drawn
as circles and are called states. The states
are connected by arrows, called edges.
• A double circle indicates an accepting state,
a state in which a token has been found.
• A * on an accepting state indicates that input
retraction must take place.
Transition Diagrams for >=

• Start state: state 0 in the diagram.
• If the input character is >, go to state 6.
• other refers to any character that is not
indicated by any of the other edges leaving a state s.
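A transition diagram like this one can be simulated with a state variable and a forward pointer. The sketch below is a hedged illustration: only states 0 and 6 are named in the text, so the accepting-state numbers 7 and 8, the attribute names `GE` and `GT`, and the function name `scan_gt` are assumptions made for the example.

```python
def scan_gt(text, start=0):
    """Simulate the transition diagram for > and >=.

    Returns (token, attribute, forward), where forward is the index of the
    next unread character, or None if no such operator begins at start.
    """
    state = 0
    forward = start
    while True:
        # "" models end of input; it only ever matches the 'other' edge.
        ch = text[forward] if forward < len(text) else ""
        if state == 0:
            if ch != ">":
                return None
            state = 6
            forward += 1
        elif state == 6:
            forward += 1          # read one more character either way
            state = 7 if ch == "=" else 8
        elif state == 7:          # accepting state for >= (assumed number)
            return ("relop", "GE", forward)
        elif state == 8:          # starred accepting state for > (assumed number)
            forward -= 1          # retract: the 'other' character is not part of >
            return ("relop", "GT", forward)

print(scan_gt(">= 1"))  # ('relop', 'GE', 2)
print(scan_gt("> 1"))   # ('relop', 'GT', 1)
print(scan_gt("x"))     # None
```

The retraction in state 8 is what the * on an accepting state stands for: the character that took the scanner along the other edge is handed back to the input.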
Transition Diagrams for Relational Operators
[Figure: transition diagram for relational operators, with a table of token and attribute-value]
Transition Diagrams for Identifiers and Keywords

Transition Diagrams for White Space


Transition Diagrams for Unsigned Numbers
[Figure: transition diagrams for unsigned numbers, annotated with install_num()]
In the exercises below, + or / between two expressions denotes union (written | earlier).

Write an RE for the language L accepting all strings
ending with 00 over the alphabet Σ = {0, 1}.
R = (0+1)* 00

Construct an RE for the language L accepting the strings that
start with 1 and end with 0 over the alphabet
Σ = {0, 1}.
R = 1(0+1)*0

Write an RE to denote a language over the alphabet Σ = {a, b, c}
such that every string has at least one ‘a’
followed by at least one ‘b’ followed by at least one ‘c’.
R = a⁺b⁺c⁺ (here ⁺ is positive closure, not union)
Write an RE to denote a language L over Σ = {a, b} such that the third character from the
right end of the string is always a.
R = (a+b)* a (a+b)(a+b)

Construct an RE for the language L that accepts all strings with at least two b’s over
the alphabet Σ = {a, b}.
R = (a+b)* b (a+b)* b (a+b)*

Write an RE that denotes the language L over Σ = {1} of all strings of odd length.
(11)* gives the even-length strings ε, 11, 1111, 111111, …
R = 1(11)* gives the odd-length strings.
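Construct an RE for the language L over the alphabet Σ = {a, b} in which the
total number of a’s is divisible by 3.
With a’s only: (aaa)*
R = b*(ab*ab*ab*)*
Zero a’s also counts as divisible by 3, so ε and strings of b’s alone belong to L.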
Write an RE for the language of strings over {a, b, c} that
contain exactly one b.
R = (a/c)* b (a/c)*

Write an RE for the language of strings over {a, b, c} that
contain no two consecutive b’s.
R = (b/ε) (a/c/ab/cb)*

Write an RE for the language of strings over the alphabet
{a, b, c} containing an even number of a’s.
R = ((b/c)* a (b/c)* a)* (b/c)*
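Answers to exercises like these can be sanity-checked by brute force: enumerate every string over the alphabet up to some length and compare the expression's verdict with a direct statement of the property. The Python sketch below does this for several of the answers; the helpers `strings` and `agrees`, the length bound of 7, and the predicates are assumptions made for illustration, and union is written `|` as Python requires. Agreement up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agrees(pattern, predicate, alphabet, max_len=7):
    """True if pattern and predicate accept exactly the same short strings."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(s) is not None) == predicate(s)
               for s in strings(alphabet, max_len))

# (expression, property stated directly in Python, alphabet)
CHECKS = [
    (r"(0|1)*00",                lambda s: s.endswith("00"), "01"),
    (r"1(0|1)*0",                lambda s: s.startswith("1") and s.endswith("0"), "01"),
    (r"(a|b)*a(a|b)(a|b)",       lambda s: len(s) >= 3 and s[-3] == "a", "ab"),
    (r"(a|b)*b(a|b)*b(a|b)*",    lambda s: s.count("b") >= 2, "ab"),
    (r"1(11)*",                  lambda s: len(s) % 2 == 1, "1"),
    (r"(a|c)*b(a|c)*",           lambda s: s.count("b") == 1, "abc"),
    (r"(b|)(a|c|ab|cb)*",        lambda s: "bb" not in s, "abc"),
    (r"((b|c)*a(b|c)*a)*(b|c)*", lambda s: s.count("a") % 2 == 0, "abc"),
]

for pattern, predicate, alphabet in CHECKS:
    print(f"{pattern:28} agrees: {agrees(pattern, predicate, alphabet)}")
```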
Describe the languages denoted by the following REs:
• 0(0/1)*0
The set of all strings of 0’s and 1’s that start and end with 0 (length at least two).
• (0/1)* 0 (0/1)(0/1)
The set of all strings of 0’s and 1’s whose third symbol from the right end is 0.
• 0*10*10*10* (or, over {a, b}, a*ba*ba*ba*)
The set of all strings of 0’s and 1’s containing exactly three 1’s.
• (00/11)* ((01/10)(00/11)*(01/10)(00/11)*)*
The set of all strings of 0’s and 1’s with an even number of 0’s and an even number of 1’s.

(0 + 1)∗0((0 + 1)(0 + 1)(0 + 1))∗0(0 + 1)∗


Parse Trees Vs Syntax Trees
Syntax Trees:
Syntax trees are an abstract, compact representation of parse trees. They are also called
abstract syntax trees (ASTs).

Parse Tree | Syntax Tree
A parse tree is a graphical representation of the replacement process in a derivation. | A syntax tree is the compact form of a parse tree.
Each interior node represents a grammar rule. Each leaf node represents a terminal. | Each interior node represents an operator. Each leaf node represents an operand.
Parse trees retain every detail of the concrete syntax. | Syntax trees do not retain every detail of the concrete syntax.
Parse trees are comparatively less dense than syntax trees. | Syntax trees are comparatively more dense than parse trees.
Consider the following grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
( a + b ) * ( c – d ) + ( ( e / f ) * ( a + b ))
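The difference between the two kinds of tree can be made concrete with a small parser for the grammar above. The sketch below is an illustration under several assumptions: the multiplication operator in the T rule is read as `*`, only `+`, `*`, parentheses and identifiers are handled (so the longer expression above, which also uses – and /, is outside its scope), trees are plain nested tuples, and the left-recursive rules are implemented with loops rather than left recursion, which still yields the same left-leaning parse tree. All names (`Parser`, `to_ast`, `count_nodes`) are made up for the example.

```python
import re

def tokenize(text):
    """Identifiers, +, *, ( and ); any other character is silently skipped."""
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[+*()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_E(self):
        node = ("E", self.parse_T())                  # E -> T
        while self.peek() == "+":
            self.eat("+")
            node = ("E", node, "+", self.parse_T())   # E -> E + T
        return node

    def parse_T(self):
        node = ("T", self.parse_F())                  # T -> F
        while self.peek() == "*":
            self.eat("*")
            node = ("T", node, "*", self.parse_F())   # T -> T * F
        return node

    def parse_F(self):
        if self.peek() == "(":
            self.eat("(")
            inner = self.parse_E()
            self.eat(")")
            return ("F", "(", inner, ")")             # F -> ( E )
        tok = self.eat()
        if not tok[0].isalpha():
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return ("F", tok)                             # F -> id

def parse(text):
    """Return the parse tree of text, with nonterminals as tuple labels."""
    parser = Parser(tokenize(text))
    tree = parser.parse_E()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

def to_ast(node):
    """Collapse a parse tree into a syntax tree of operators and operands."""
    if isinstance(node, str):
        return node
    children = node[1:]
    if len(children) == 1:        # unit production: E -> T, T -> F, F -> id
        return to_ast(children[0])
    if children[0] == "(":        # F -> ( E ): the parentheses disappear
        return to_ast(children[1])
    left, op, right = children
    return (op, to_ast(left), to_ast(right))

def count_nodes(tree):
    """Count nodes; the first tuple element is the node's own label."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])

tree = parse("a + b * c")
ast = to_ast(tree)
print(tree)
print(ast)  # ('+', 'a', ('*', 'b', 'c'))
print(count_nodes(tree), "parse-tree nodes vs", count_nodes(ast), "syntax-tree nodes")
```

The syntax tree keeps only operators and operands, which is why it is smaller than the parse tree for the same input.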
