Chapter 3: Implementation of Lexical Analysis
Notation
• There is variation in regular expression notation
• At least one: AA* ≡ A+
• Union: A | B ≡ A + B
• Option: A + ε ≡ A?
• Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]
• Excluded range: complement of [a-z] ≡ [^a-z]
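The shorthands above can be spot-checked with Python's `re` module, which is itself one more notational variant: it writes union as `|` and keeps `+` for "at least one". This is a minimal sketch; the sample strings and the helper names `language` and `same_on_samples` are made up for illustration, and agreement on a few samples is a sanity check, not a proof of equivalence.

```python
import re
import string

# A small, hypothetical sample of strings to compare languages on.
SAMPLES = ["", "a", "aa", "aaa", "b", "ab", "z", "Z", "7"]

def language(pattern: str) -> set:
    """Return the sample strings that belong to L(pattern)."""
    return {s for s in SAMPLES if re.fullmatch(pattern, s)}

def same_on_samples(p: str, q: str) -> bool:
    """True if both patterns accept exactly the same sample strings."""
    return language(p) == language(q)

AT_LEAST_ONE = same_on_samples(r"a+", r"aa*")
OPTION = same_on_samples(r"a?", r"a|")   # "a|" means: a, or the empty string
RANGE = same_on_samples(r"[a-z]", "|".join(string.ascii_lowercase))
EXCLUDED = language(r"[^a-z]")           # single characters outside a-z
```

Here `EXCLUDED` holds only the one-character samples that are not lowercase letters.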
Regular Expressions in Lexical Specification
• Last lecture: a specification for the predicate s ∈ L(R), where L(R) is the set of strings denoted by R
Regular Expressions => Lexical Spec. (2)
Regular Expressions => Lexical Spec. (3)
3. Let input be x1…xn
   For 1 ≤ i ≤ n check x1…xi ∈ L(R)
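Step 3 can be sketched directly: try every prefix length i and record the ones whose prefix lies in L(R). This sketch assumes Python's `re` module as the matcher; the function name `matching_prefix_lengths` and the example pattern are illustrative, not part of the lecture.

```python
import re

def matching_prefix_lengths(pattern: str, text: str) -> list[int]:
    """Return every i (1 <= i <= n) such that text[:i] is in L(pattern)."""
    compiled = re.compile(pattern)
    # fullmatch(text, 0, i) asks whether exactly the prefix text[0:i] matches.
    return [i for i in range(1, len(text) + 1)
            if compiled.fullmatch(text, 0, i)]
```

For the pattern `[a-z]+|[0-9]+` and the input `abc12`, the prefixes of length 1, 2 and 3 match, while the longer ones do not.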
Ambiguities (1)
• There are ambiguities in the algorithm
• Solution for input that no rule matches:
– Write a rule matching all “bad” strings
– Put it last (lowest priority)
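A minimal tokenizer sketch shows how a lowest-priority catch-all rule keeps the lexer from getting stuck. It assumes two conventions that this section does not spell out: the longest matching prefix wins, and among equally long matches the rule listed first wins. The rule names and patterns are hypothetical.

```python
import re

# Hypothetical rules, highest priority first. The last rule matches any
# single character, so it catches every "bad" string.
RULES = [
    ("KEYWORD", r"if|else"),
    ("ID", r"[a-z][a-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("WS", r"[ \t\n]+"),
    ("ERROR", r"."),
]
COMPILED = [(name, re.compile(rx, re.DOTALL)) for name, rx in RULES]

def longest_match(rx, text, pos):
    """Largest end such that text[pos:end] is in the rule's language, else pos."""
    for end in range(len(text), pos, -1):
        if rx.fullmatch(text, pos, end):
            return end
    return pos

def tokenize(text):
    """Split text into (rule name, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in COMPILED:
            end = longest_match(rx, text, pos)
            if end > best_end:  # strict comparison: earlier rules win ties
                best_name, best_end = name, end
        if best_name != "WS":
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end  # ERROR always matches one character, so pos advances
    return tokens
```

Because the ERROR rule is last, it is chosen only when no other rule matches at the current position.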
Summary
• Regular expressions provide a concise
notation for string patterns
Finite Automata
• Regular expressions = specification
• Finite automata = implementation
• An accepting state
• A transition, labelled with its input symbol a
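A Simple Example
• A finite automaton that accepts only “1”
[Diagram: start state A, accepting state B, one transition A → B on input 1]
• Accepts ‘1’: A → B on 1, ending in the accepting state B
• Rejects ‘0’: there is no transition from A on 0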
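Another Simple Example
• A finite automaton accepting any number of 1’s followed by a single 0
• Alphabet: {0,1}
[Diagram: start state A with a self-loop on 1; transition A → B on 0; B is accepting]
• Accepts ‘110’: A → A on 1, A → A on 1, A → B on 0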
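The automaton for "any number of 1's followed by a single 0" can be simulated in a few lines. This is a sketch assuming the two-state layout described by the slide (A loops on 1, A goes to B on 0, B is accepting); the dictionary encoding and the name `accepts` are illustrative choices.

```python
# Transition function: a missing entry means "no transition", so the
# automaton gets stuck and rejects.
DELTA = {("A", "1"): "A", ("A", "0"): "B"}
START, ACCEPTING = "A", {"B"}

def accepts(text: str) -> bool:
    """Run the automaton on text and report whether it ends in B."""
    state = START
    for symbol in text:
        state = DELTA.get((state, symbol))
        if state is None:  # stuck: no transition on this symbol
            return False
    return state in ACCEPTING
```

With this encoding, `accepts("110")` and `accepts("0")` are true, while `accepts("1")` and `accepts("100")` are false.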
And Another Example
• Alphabet {0,1}
• What language does this recognize?
[Diagram: finite automaton over {0,1}; only the edge labels 1, 0, 0, 0, 1, 1 survive]
• Select the regular expression that denotes the same language as this finite automaton:
– (0 + 1)*
– (1* + 0)(1 + 0)
– 1* + (01)* + (001)* + (000*1)*
– (0 + 1)*00
Epsilon Moves
• Another kind of transition: ε-moves
[Diagram: states A and B joined by an ε-transition, which is taken without reading input]
Deterministic and Nondeterministic Automata
• Deterministic Finite Automata (DFA)
– One transition per input per state
– No ε-moves
Execution of Finite Automata
• A DFA can take only one path through the state graph
– Completely determined by input
Acceptance of NFAs
• An NFA can get into multiple states
[Diagram: NFA with states A, B and C; edge labels 1, 0, 0]
• Input: 1 0 0
• Possible states after each symbol: {A}, {A,B}, {A,B,C}
Rule: NFA accepts if it can get to a final state
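The "set of possible states" view can be sketched by tracking every state the NFA might be in after each symbol. The transition table here is an assumption chosen to reproduce the state sets listed above (A stays in A on 1, A goes to A or B on 0, B goes to C on 0), and treating C as the only final state is also an assumption.

```python
# Assumed transitions: each (state, symbol) maps to a SET of next states.
NFA = {("A", "1"): {"A"}, ("A", "0"): {"A", "B"}, ("B", "0"): {"C"}}
START, FINAL = "A", {"C"}  # assumption: C is the final state

def step(states, symbol):
    """All states reachable from any state in `states` on `symbol`."""
    result = set()
    for state in states:
        result |= NFA.get((state, symbol), set())
    return result

def trace(text):
    """Return the set of possible states after each input symbol."""
    current, history = {START}, []
    for symbol in text:
        current = step(current, symbol)
        history.append(set(current))
    return history

def accepts(text):
    """The NFA accepts if at least one possible state is final."""
    possible = trace(text)[-1] if text else {START}
    return bool(possible & FINAL)
```

Under these assumptions, `trace("100")` yields the three sets from the slide, and the input is accepted because the last set contains C.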
NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of languages (regular languages)
[Diagram: a DFA over {0,1} with edge labels 1, 0, 0, 0, 1, 1]
• A DFA can be exponentially larger than an NFA for the same language
Regular Expressions to Finite Automata
• High-level sketch
Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA
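Regular Expressions to NFA (1)
• For each kind of rexp, define an NFA
– Notation: NFA for rexp M
[Diagram: a box labelled M with a start state and an accepting state]
• For ε
[Diagram: a single ε-transition from the start state to the accepting state]
• For input a
[Diagram: a single transition on a from the start state to the accepting state]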
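Regular Expressions to NFA (2)
• For AB
[Diagram: the NFA for A followed by the NFA for B, joined by an ε-transition]
• For A + B
[Diagram: a new start state with ε-transitions into the NFAs for A and B, whose accepting states have ε-transitions to a new accepting state]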
Regular Expressions to NFA (3)
• For A*
[Diagram: the NFA for A wrapped in ε-transitions that allow skipping A or repeating it]
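The per-operator constructions can be sketched as small combinators that each return an NFA with one start and one accepting state. This is one common way to wire the ε-moves and may differ in detail from the lecture's diagrams; the class `NFA` and the functions `epsilon`, `symbol`, `concat`, `union`, `star`, `closure` and `accepts` are names chosen for this sketch.

```python
from dataclasses import dataclass
from itertools import count

_fresh = count()  # source of fresh state numbers

@dataclass
class NFA:
    start: int
    accept: int
    edges: list  # (source, label, target); label None is an epsilon-move

def epsilon():
    s, t = next(_fresh), next(_fresh)
    return NFA(s, t, [(s, None, t)])

def symbol(a):
    s, t = next(_fresh), next(_fresh)
    return NFA(s, t, [(s, a, t)])

def concat(x, y):
    # AB: accepting state of A is linked to the start state of B.
    return NFA(x.start, y.accept,
               x.edges + y.edges + [(x.accept, None, y.start)])

def union(x, y):
    # A + B: new start branches into both, both rejoin at a new accept.
    s, t = next(_fresh), next(_fresh)
    return NFA(s, t, x.edges + y.edges + [
        (s, None, x.start), (s, None, y.start),
        (x.accept, None, t), (y.accept, None, t)])

def star(x):
    # A*: skip A entirely, or run A and loop back to the new start.
    s, t = next(_fresh), next(_fresh)
    return NFA(s, t, x.edges + [
        (s, None, x.start), (s, None, t), (x.accept, None, s)])

def closure(nfa, states):
    """All states reachable from `states` using only epsilon-moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    """Simulate the NFA by tracking the set of possible states."""
    current = closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(nfa, moved)
    return nfa.accept in current

EXAMPLE = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
```

`EXAMPLE` encodes the expression (1+0)*1 and, with this wiring, uses ten states: two per symbol, two for the union and two for the star.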
Example of RegExp -> NFA conversion
• Consider the regular expression (1+0)*1
• The NFA is
[Diagram: NFA with states A through J; C → E on 1, D → F on 0, I → J on 1; the remaining edges are ε-moves]
NFA to DFA: The Trick
• Simulate the NFA
• Each state of DFA
= a non-empty subset of states of the NFA
• Start state
= ε-closure of the start state of NFA
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well
• Final states
= subsets that include at least one final state of NFA
ε-closure of a state
ε-closure(B) = {B,C,D}
ε-closure(G) = {A,B,C,D,G,H,I}
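The trick and the two ε-closures above can be sketched as a short subset construction. The labelled edges (C to E on 1, D to F on 0, I to J on 1) come from the NFA for (1+0)*1; the ε-edges in `EPS` are a reconstruction chosen to agree with the closures listed above, and taking A as the start state and J as the final state is likewise an assumption.

```python
from collections import deque

# Assumed epsilon-moves (a reconstruction, not given explicitly in the text).
EPS = {"A": {"B", "H"}, "B": {"C", "D"}, "E": {"G"}, "F": {"G"},
       "G": {"A", "H"}, "H": {"I"}}
MOVES = {("C", "1"): {"E"}, ("D", "0"): {"F"}, ("I", "1"): {"J"}}
START, FINAL, ALPHABET = "A", "J", "01"

def eps_closure(states):
    """States reachable from `states` using epsilon-moves only."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in EPS.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def subset_construction():
    """Return (start state, transition table, final states) of the DFA."""
    start = eps_closure({START})
    table, work = {}, deque([start])
    while work:
        subset = work.popleft()
        if subset in table:
            continue
        table[subset] = {}
        for a in ALPHABET:
            moved = set().union(*(MOVES.get((q, a), set()) for q in subset))
            target = eps_closure(moved)
            if target:  # DFA states are non-empty subsets
                table[subset][a] = target
                work.append(target)
    finals = {subset for subset in table if FINAL in subset}
    return start, table, finals
```

Each DFA state is a `frozenset` of NFA state names, so it can be used directly as a dictionary key in the transition table.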
NFA -> DFA Example
[Diagram: the NFA for (1+0)*1 from the previous example, states A through J]
DFA states and transitions:
• ABCDHI (start): on 0 go to FGHIABCD, on 1 go to EJGHIABCD
• FGHIABCD: on 0 go to FGHIABCD, on 1 go to EJGHIABCD
• EJGHIABCD (accepting, since it contains the NFA state J reached after the final 1): on 0 go to FGHIABCD, on 1 go to EJGHIABCD
Implementation
• A DFA can be implemented by a 2D table T
– One dimension is “states”
– Other dimension is “input symbol”
– For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
– If in state Si and input a, read T[i,a] = k and skip to state Sk
– Very efficient: one table lookup per input character
Table Implementation of a DFA
[Diagram: DFA with states S, T and U over {0,1}; the table lists its transitions]
     0   1
S    T   U
T    T   U
U    T   U
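The table above translates directly into a two-dimensional array indexed by state and input symbol. The table entries come from the slide; treating S as the start state and U as the only accepting state is an assumption, since the surviving text does not say which states are which.

```python
STATES = {"S": 0, "T": 1, "U": 2}
SYMBOLS = {"0": 0, "1": 1}

# TABLE[i][a] = k means: in state i on symbol a, go to state k.
TABLE = [
    [1, 2],  # S: on 0 -> T, on 1 -> U
    [1, 2],  # T: on 0 -> T, on 1 -> U
    [1, 2],  # U: on 0 -> T, on 1 -> U
]
START = STATES["S"]        # assumption: S is the start state
ACCEPTING = {STATES["U"]}  # assumption: U is the accepting state

def run(text: str) -> int:
    """Return the state number reached after reading text."""
    state = START
    for ch in text:
        state = TABLE[state][SYMBOLS[ch]]  # one lookup per character
    return state

def accepts(text: str) -> bool:
    return run(text) in ACCEPTING
```

All three rows are identical in this table, so the state after any non-empty input depends only on the last symbol read.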
Implementation (Cont.)
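DFA for recognizing two relational operators
[Diagram: from the state reached on “>”, an “other” transition leads to state 8, marked *, with action return(SYMBOL, >)]
The * means we have accepted “>” and have read one “other” character that must be unread, that is, the input pointer moves one character back.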
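DFA of Pascal relational operators
• start: state 0
• 0 on < → 1; 0 on = → 5; 0 on > → 6
• 1 on = → 2: return(SYMBOL, <=)
• 1 on > → 3: return(SYMBOL, <>)
• 1 on other → 4 *: return(SYMBOL, <)
• 5: return(SYMBOL, =)
• 6 on = → 7: return(SYMBOL, >=)
• 6 on other → 8 *: return(SYMBOL, >)
A state marked * unreads the “other” character.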
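The relational-operator DFA can be sketched as a hand-coded state machine in which the starred states give back the extra character. The state numbers follow the diagram; the function name `next_relop`, the return shape, and the treatment of end of input as an "other" character are assumptions made for this sketch.

```python
def next_relop(text, pos):
    """Recognize a relational operator starting at text[pos].

    Returns ((SYMBOL, op), new_pos), or None if no operator starts here.
    """
    state, i = 0, pos
    while True:
        ch = text[i] if i < len(text) else ""  # "" (end of input) is "other"
        i += 1
        if state == 0:
            if ch == "<":
                state = 1
            elif ch == "=":
                return ("SYMBOL", "="), i       # state 5
            elif ch == ">":
                state = 6
            else:
                return None
        elif state == 1:
            if ch == "=":
                return ("SYMBOL", "<="), i      # state 2
            if ch == ">":
                return ("SYMBOL", "<>"), i      # state 3
            return ("SYMBOL", "<"), i - 1       # state 4*: unread "other"
        else:  # state 6
            if ch == "=":
                return ("SYMBOL", ">="), i      # state 7
            return ("SYMBOL", ">"), i - 1       # state 8*: unread "other"
```

The `i - 1` in the starred states is the "input pointer one character back" step: the character that ended the operator is left for the next token.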
DFA for recognizing id and keyword
[Diagram: after an initial letter, a state loops on “letter or digit”; an “other” transition leads to a final state with action return(get_token(), install_id())]
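One way to read the action return(get_token(), install_id()) is sketched here. The keyword set, the symbol table representation, and the exact behaviour and signatures of `get_token` and `install_id` are assumptions for illustration; the slide only names the two functions. Note also that `str.isalpha` and `str.isalnum` accept non-ASCII letters and digits, which may be broader than the intended alphabet.

```python
KEYWORDS = {"if", "then", "else"}  # hypothetical keyword set
symbol_table = []                  # hypothetical symbol table: list of lexemes

def install_id(lexeme):
    """Assumed behaviour: store the lexeme once and return its index."""
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)

def get_token(lexeme):
    """Assumed behaviour: keywords get their own token, all else is ID."""
    return lexeme.upper() if lexeme in KEYWORDS else "ID"

def next_word(text, pos):
    """Recognize letter (letter | digit)* starting at text[pos]."""
    if pos >= len(text) or not text[pos].isalpha():
        return None
    i = pos + 1
    while i < len(text) and text[i].isalnum():  # loop on "letter or digit"
        i += 1
    # The loop stops at the "other" character, which is left unread.
    lexeme = text[pos:i]
    token = get_token(lexeme)
    attribute = install_id(lexeme) if token == "ID" else None
    return (token, attribute), i
```

Keywords and identifiers share one automaton here; the distinction is made only after the whole lexeme has been read.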
Lexical errors
• Some errors are beyond the power of the lexical analyzer to recognize:
fi (a == f(x)) …
(the lexer cannot tell whether fi is a misspelling of if or an ordinary identifier)
<id, 1> <op, => <id, 2> <op, +> <id, 3> <op, *> <num, 60>
Using Buffer to Enhance Efficiency
[Diagram: input buffer holding E = M * C * * 2 eof, with the current token marked]