Lecture02 Scanning 2

The document summarizes the process of constructing a deterministic finite automaton (DFA) from a regular expression. It discusses how to first construct a non-deterministic finite automaton (NFA) from the regular expression using Thompson's construction. This involves using epsilon transitions to connect the machines for each component of the regular expression. It then describes how to perform subset construction to convert the NFA into an equivalent DFA. Subset construction works by taking the epsilon closure of states and tracking which states are reachable from a set of states on each input symbol.

Uploaded by

Nada Shaaban

COMPILER CONSTRUCTION

Principles and Practice

Kenneth C. Louden
2. Scanning (Lexical Analysis)

PART TWO
Contents
PART ONE
2.1 The Scanning Process
2.2 Regular Expression
2.3 Finite Automata

PART TWO
2.4 From Regular Expressions to DFAs
2.5 Implementation of a TINY Scanner
2.6 Use of Lex to Generate a Scanner Automatically
2.4 From Regular Expressions to DFAs
Main Purpose
• Study an algorithm:
– Translating a regular expression into a DFA via
NFA.

Regular Expression → NFA → DFA → Program
Contents
• From a Regular Expression to an NFA
• From an NFA to a DFA
• Simulating an NFA using Subset Construction
• Minimizing the Number of States in a DFA
2.4.1 From a Regular Expression
to an NFA
The Idea of Thompson’s
Construction
• Use ε-transitions
– to “glue together” the machine of each piece of a regular
expression
– to form a machine that corresponds to the whole expression
• Basic regular expressions
– The NFAs for basic regular expressions of the form a, ε, or φ
[Figure: the NFA for a (a single transition on a) and the NFA for ε (a single ε-transition)]
The Idea of Thompson’s
Construction
• Concatenation: to construct an NFA equivalent to rs
– Connect the accepting state of the machine of r to
the start state of the machine of s by an ε-transition.
– The new machine has the start state of the machine of r as its start state and
the accepting state of the machine of s as its accepting
state.
– This machine accepts L(rs) = L(r)L(s) and so
corresponds to the regular expression rs.
[Figure: the machine of r connected by an ε-transition to the machine of s]
The Idea of Thompson’s
Construction
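• Choice among alternatives: to construct an NFA
equivalent to r | s
– Add a new start state and a new accepting state and
connect them to the machines of r and s using ε-transitions.
– Clearly, this machine accepts the language
L(r|s) = L(r) ∪ L(s), and so corresponds to the regular
expression r|s.
[Figure: a new start state with ε-transitions to the start states of the machines of r and s, whose accepting states have ε-transitions to a new accepting state]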
The Idea of Thompson’s
Construction
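• Repetition: given a machine that corresponds to r,
construct a machine that corresponds to r*
– Add two new states, a start state and an accepting state.
– The repetition is afforded by the new ε-transition from the
accepting state of the machine of r to its start state.
– Draw an ε-transition from the new start state to the new
accepting state, so that the empty string is accepted.
– This construction is not unique; simplifications are possible in
many cases.
[Figure: a new start state with ε-transitions to the start state of the machine of r and to the new accepting state; the accepting state of r has ε-transitions back to the start state of r and on to the new accepting state]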
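The three gluing rules can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: an NFA is represented as a tuple (start, accept, transitions), transitions are (from, label, to) triples, and the label None stands for ε. The function names are hypothetical.

```python
from itertools import count

EPS = None            # label used for ε-transitions (an assumption of this sketch)
_ids = count(1)       # source of fresh state numbers

def symbol(a):
    """NFA for a single character a: start --a--> accept."""
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]

def concat(r, s):
    """NFA for rs: ε-edge from r's accepting state to s's start state."""
    (r_start, r_acc, r_tr), (s_start, s_acc, s_tr) = r, s
    return r_start, s_acc, r_tr + s_tr + [(r_acc, EPS, s_start)]

def choice(r, s):
    """NFA for r|s: new start and accepting states joined by four ε-edges."""
    (r_start, r_acc, r_tr), (s_start, s_acc, s_tr) = r, s
    start, acc = next(_ids), next(_ids)
    glue = [(start, EPS, r_start), (start, EPS, s_start),
            (r_acc, EPS, acc), (s_acc, EPS, acc)]
    return start, acc, r_tr + s_tr + glue

def star(r):
    """NFA for r*: new start/accept, a loop back into r, and a bypass ε-edge."""
    r_start, r_acc, r_tr = r
    start, acc = next(_ids), next(_ids)
    glue = [(start, EPS, r_start), (r_acc, EPS, acc),
            (r_acc, EPS, r_start), (start, EPS, acc)]
    return start, acc, r_tr + glue

# The regular expression ab|a
nfa = choice(concat(symbol("a"), symbol("b")), symbol("a"))
```

With this representation the NFA for ab|a has eight states and eight transitions, five of them ε-transitions. The state numbers depend on construction order and need not match the figures.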
Examples of NFAs Construction
Example 2.12: Translate the regular expression ab|a into an NFA
[Figure: the machines for a and b, the machine for ab (joined by an ε-transition), and the complete NFA for ab|a]
Examples of NFAs Construction
Example 2.13: Translate the regular expression letter(letter|digit)* into an NFA
[Figure: the machines for letter and digit, for letter|digit, for (letter|digit)*, and the complete NFA for letter(letter|digit)*]
2.4.2 From an NFA to a DFA
Goal and Methods
• Goal
– Given an arbitrary NFA, construct an equivalent DFA. (i.e.,
one that accepts precisely the same strings)
• Some methods
– (1) Eliminating ε-transitions
• ε-closure: the set of all states reachable by ε-transitions from a state
or states
– (2) Eliminating multiple transitions from a state on a single input
character.
• Keeping track of the set of states that are reachable by matching a
single character
– Both these processes lead us to consider sets of states instead of
single states. Thus, it is not surprising that the DFA we construct
has sets of states of the original NFA as its states.
The Algorithm Called Subset Construction
• The ε-closure of a set of states:
– The ε-closure of a single state s is the set of states
reachable by a series of zero or more ε-transitions,
and we write this set as $\overline{s}$.
• Example 2.14: the regular expression a*
[Figure: NFA for a* with states 1 to 4: ε from 1 to 2, a from 2 to 3, ε from 3 to 4, ε from 1 to 4, and ε from 3 to 2]
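The Algorithm Called Subset Construction
[Figure: the same NFA for a*, states 1 to 4]

$\overline{1} = \{1, 2, 4\}$, $\overline{2} = \{2\}$, $\overline{3} = \{2, 3, 4\}$, and $\overline{4} = \{4\}$.

The ε-closure of a set of states is the union of the ε-closures of each individual state:
$\overline{S} = \bigcup_{s \in S} \overline{s}$

$\overline{\{1,3\}} = \overline{1} \cup \overline{3} = \{1, 2, 4\} \cup \{2, 3, 4\} = \{1, 2, 3, 4\}$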
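The closure computation is a plain graph search over ε-edges. The sketch below assumes the ε-transitions of the a* NFA are stored in a dictionary from a state to the set of its ε-successors; the names EPS_EDGES and eps_closure are hypothetical.

```python
# ε-edges of the NFA for a* (the a-edge from 2 to 3 plays no role here)
EPS_EDGES = {1: {2, 4}, 3: {2, 4}}

def eps_closure(states, eps_edges):
    """Return all states reachable from `states` by zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps_edges.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

print(sorted(eps_closure({1}, EPS_EDGES)))     # [1, 2, 4]
print(sorted(eps_closure({3}, EPS_EDGES)))     # [2, 3, 4]
print(sorted(eps_closure({1, 3}, EPS_EDGES)))  # [1, 2, 3, 4]
```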
The Subset Construction Algorithm
(1) Compute the ε-closure of the start state of M; this becomes the start state of the new DFA $\overline{M}$.
(2) For this set, and for each subsequent set, compute transitions on
characters a as follows.
Given a set S of states and a character a in the alphabet,
compute the set
$S'_a = \{\, t \mid \text{for some } s \in S \text{ there is a transition from } s \text{ to } t \text{ on } a \,\}$.
Then compute $\overline{S'_a}$, the ε-closure of $S'_a$.
This defines a new state in the subset construction, together with
a new transition $S \xrightarrow{a} \overline{S'_a}$.
(3) Continue this process until no new states or transitions are created.
(4) Mark as accepting those states constructed in this manner that contain
an accepting state of M.
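The four steps above can be sketched as follows. The representation is an assumption of this example: the NFA is a dictionary from (state, label) to a set of target states, with the label None for ε, and DFA states are frozensets of NFA states. The sample NFA is the eight-state machine for ab|a.

```python
def eps_closure(states, nfa):
    """ε-closure of a set of NFA states (None labels an ε-transition)."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, accepting, alphabet):
    """Return (dfa_start, dfa_transitions, dfa_accepting) for the given NFA."""
    dfa_start = eps_closure({start}, nfa)                # step (1)
    transitions, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:                                          # step (3)
        S = todo.pop()
        for a in alphabet:                               # step (2)
            moved = {t for s in S for t in nfa.get((s, a), ())}
            if not moved:
                continue          # no transition on a: implicit error
            target = eps_closure(moved, nfa)
            transitions[(S, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {S for S in seen if S & accepting}   # step (4)
    return dfa_start, transitions, dfa_accepting

# NFA for ab|a with states 1 to 8 and accepting state 8
NFA = {(1, None): {2, 6}, (2, "a"): {3}, (3, None): {4}, (4, "b"): {5},
       (5, None): {8}, (6, "a"): {7}, (7, None): {8}}
start, trans, acc = subset_construction(NFA, 1, {8}, "ab")
```

For this input the sketch yields the three DFA states {1,2,6}, {3,4,7,8}, and {5,8}, with the last two accepting.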
Examples of Subset Construction
[Figure: the NFA for a*, states 1 to 4]

M      ε-closure of M (S)    S′a
1      {1,2,4}               {3}
3      {2,3,4}               {3}

Resulting DFA: {1,2,4} goes to {2,3,4} on a; {2,3,4} goes to itself on a.
Examples of Subset Construction
[Figure: the NFA for ab|a, states 1 to 8]

M      ε-closure of M (S)    S′a      S′b
1      {1,2,6}               {3,7}
3,7    {3,4,7,8}                      {5}
5      {5,8}

Resulting DFA: {1,2,6} goes to {3,4,7,8} on a; {3,4,7,8} goes to {5,8} on b.
Examples of Subset Construction
[Figure: the NFA for letter(letter|digit)*, states 1 to 10]

M      ε-closure of M (S)    S′letter    S′digit
1      {1}                   {2}
2      {2,3,4,5,7,10}        {6}         {8}
6      {4,5,6,7,9,10}        {6}         {8}
8      {4,5,7,8,9,10}        {6}         {8}

Resulting DFA: {1} goes to {2,3,4,5,7,10} on letter. Each of {2,3,4,5,7,10}, {4,5,6,7,9,10}, and {4,5,7,8,9,10} goes to {4,5,6,7,9,10} on letter and to {4,5,7,8,9,10} on digit.
2.4.3 Simulating an NFA using
the Subset Construction
One Way of Simulating an NFA
• NFAs can be implemented in similar ways to
DFAs, except that NFAs are nondeterministic
– Many different sequences of transitions that
must be tried.
– Store up transitions that have not yet been tried
and backtrack to them on failure.
Another Way of Simulating an
NFA
• Use the subset construction
– Instead of constructing all the states of the associated
DFA,
– construct only the state at each point that is indicated
by the next input character
• The advantage: there is no need to construct the entire DFA
– Example: for the single-character input a, construct the start
state {1,2,6} and then the second state {3,4,7,8}, to which we
move on matching the a.
– Since no b follows, accept without ever generating the
state {5,8}
[Figure: the DFA for ab|a: {1,2,6} goes to {3,4,7,8} on a, which goes to {5,8} on b]
Another Way of Simulating an
NFA
• The disadvantage: a state may be constructed many times if the path
contains loops
– Example: given the input string r2d3, the sequence of states is
{1}, {2,3,4,5,7,10}, {4,5,7,8,9,10}, {4,5,6,7,9,10}, {4,5,7,8,9,10}
[Figure: the DFA for letter(letter|digit)* with states {1}, {2,3,4,5,7,10}, {4,5,6,7,9,10}, and {4,5,7,8,9,10}]

• If these states are constructed as the transitions occur, then all the states
of the DFA have been constructed, and the state {4,5,7,8,9,10} has even
been constructed twice
– Less efficient than constructing the entire DFA in the first place
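A minimal sketch of this on-the-fly simulation follows. It keeps only the current set of NFA states and never stores the DFA. The dictionary representation, the helper char_class, and the state numbering of the letter(letter|digit)* NFA (read off the ε-closure table for that example) are assumptions of this sketch.

```python
def eps_closure(states, nfa):
    """ε-closure of a set of NFA states (None labels an ε-transition)."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def char_class(ch):
    """Map a character to the symbol used on the NFA's edges."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return ch

def nfa_accepts(nfa, start, accepting, text):
    """Simulate the NFA by building one subset per input character."""
    current = eps_closure({start}, nfa)
    for ch in text:
        moved = {t for s in current for t in nfa.get((s, char_class(ch)), ())}
        current = eps_closure(moved, nfa)   # recomputed even if seen before
        if not current:
            return False                    # no state left: reject early
    return bool(current & accepting)

# NFA for letter(letter|digit)* with states 1 to 10, accepting state 10
ID_NFA = {(1, "letter"): {2}, (2, None): {3}, (3, None): {4, 10},
          (4, None): {5, 7}, (5, "letter"): {6}, (7, "digit"): {8},
          (6, None): {9}, (8, None): {9}, (9, None): {4, 10}}

print(nfa_accepts(ID_NFA, 1, {10}, "r2d3"))   # True
print(nfa_accepts(ID_NFA, 1, {10}, "2d"))     # False
```

On r2d3 the subset {4,5,7,8,9,10} is computed twice, which is the inefficiency the slide points out.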
2.4.4 Minimizing the Number of
States in a DFA
Why Minimize?
• The process of deriving a DFA algorithmically
from a regular expression has the unfortunate
property that
– the resulting DFA may be more complex than
necessary.
• Example: the DFA derived for the regular expression a*
has two states, while an equivalent DFA needs only one
[Figure: the derived two-state DFA for a* and an equivalent one-state DFA with a single a-loop]
An Important Result from Automata
Theory for Minimizing
• Given any DFA, there is an equivalent DFA
containing a minimum number of states, and
this minimum-state DFA is unique (except for
renaming of states).

• It is also possible to obtain this
minimum-state DFA directly from any given DFA.
Algorithm for Obtaining the Minimum-State
DFA
1. Begin with the most optimistic assumption possible: create two
sets, one consisting of all the accepting states and the other consisting
of all the non-accepting states.

2. Given this partition of the states of the original DFA, consider the
transitions on each character a of the alphabet.
(1) If all accepting states have transitions on a to accepting states, then
this defines an a-transition from the new accepting state (the set of all
the old accepting states) to itself.
(2) If all accepting states have transitions on a to non-accepting states,
then this defines an a-transition from the new accepting state to the
new non-accepting state (the set of all the old non-accepting states).
Algorithm for Obtaining the Minimum-State
DFA
(3) On the other hand, if there are two accepting states s and t that
have transitions on a that land in different sets, then no a-transition can
be defined for this grouping of the states. We say that a distinguishes
the states s and t, and the set must be split according to where the
a-transitions land.
(4) We must also consider error transitions to an error state that is
non-accepting. If there are accepting states s and t such that s has an
a-transition to another accepting state, while t has no a-transition at all
(i.e., an error transition), then a distinguishes s and t.

3. If any sets are split, we must return and repeat the process
from the beginning. This process continues until either all sets contain
only one element (in which case, we have shown the original DFA to
be minimal) or until no further splitting of sets occurs.
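The splitting procedure can be sketched as a partition refinement. The sketch assumes a DFA given as a dictionary delta from (state, character) to a state, where a missing entry is treated as a transition to the error state; the function names are hypothetical. Two states stay in the same set only if, on every character, their transitions land in the same set.

```python
def block_index(partition, state):
    """Index of the set containing `state`, or None for the error state."""
    for i, block in enumerate(partition):
        if state in block:
            return i
    return None

def minimize(states, alphabet, delta, accepting):
    """Partition DFA states into sets of mutually indistinguishable states."""
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [block for block in partition if block]
    while True:
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # where does s land on each character of the alphabet?
                signature = tuple(block_index(partition, delta.get((s, a)))
                                  for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # nothing was split: done
            return refined
        partition = refined

# DFA for (a|ε)b*: all three states accept; a distinguishes 1 from 2 and 3
delta = {(1, "a"): 2, (1, "b"): 3, (2, "b"): 3, (3, "b"): 3}
result = minimize({1, 2, 3}, "ab", delta, {1, 2, 3})
print(sorted(sorted(block) for block in result))   # [[1], [2, 3]]
```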
Examples of Minimizing DFA
Example 2.18: the regular expression letter(letter|digit)*
[Figure: the DFA from the subset construction, with states {1}, {2,3,4,5,7,10}, {4,5,6,7,9,10}, and {4,5,7,8,9,10}]

The accepting sets: {2,3,4,5,7,10}, {4,5,6,7,9,10}, {4,5,7,8,9,10}

The non-accepting sets: {1}

[Figure: the minimum-state DFA: state 1 goes to state 2 on letter; state 2 loops on letter and on digit]
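Examples of Minimizing DFA
Example 2.19: the regular expression (a|ε)b*
[Figure: DFA with states 1, 2, 3: a from 1 to 2, b from 1 to 3, b from 2 to 3, b from 3 to 3]

The accepting states: 1, 2, 3

The non-accepting states: none

a distinguishes state 1 from states 2 and 3,
and we must repartition the states into the sets {1} and {2,3}.

[Figure: the minimum-state DFA: {1} goes to {2,3} on a and on b; {2,3} loops on b]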
2.5 Implementation of a Tiny
Scanner
The Tiny language

• The features of a program in TINY:

– a sequence of statements separated by semicolons
– no procedures and no declarations
– all variables are integer variables
– two control statements: if-else and repeat
– read and write statements
– comments in curly brackets, which cannot be nested
– expressions are Boolean and integer arithmetic
expressions: Boolean expressions use < and =; arithmetic expressions
use +, -, *, /, parentheses, constants, and variables.
Boolean expressions appear only as
tests in control statements.
One Sample Program in TINY:
Factorial Function
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end
2.5.1 Implementing a Scanner for
the Sample Language TINY
Defining the tokens and their
attributes.
The tokens and token classes of TINY are summarized as follows:

Reserved Words    Special Symbols    Other
if                +                  number (1 or more digits)
then              -
else              *
end               /
repeat            =
until             <                  identifier (1 or more letters)
read              (
write             )
                  ;
                  :=
TINY has the following lexical
conventions.
1. Comments are enclosed in curly brackets
{…} and cannot be nested;
2. The code is free format; white space
consists of blanks, tabs, and newlines;
3. The principle of longest substring is
followed in recognizing tokens.
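The DFAs for the Tokens of TINY
The DFA for the special symbols except assignment:
[Figure: from START, each special symbol leads directly to an accepting state that returns its token, e.g. + returns PLUS and ; returns SEMI]
The DFA combined with DFAs that accept numbers and identifiers:
[Figure: states START, INNUM, INID, DONE. From START, digit goes to INNUM, letter goes to INID, and any of + - * / = < ( ) ; goes to DONE. INNUM loops on digit and INID loops on letter; each goes to DONE on [other]]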
The DFAs for the Tokens of TINY
• The DFA is extended by adding comments, white space, and
assignment
• The DFA treats reserved words as
identifiers and then looks the identifiers up in a table of
reserved words
[Figure: the previous DFA extended: START loops on white space; { goes from START to INCOMMENT, which loops on other and returns to START on }; : goes from START to INASSIGN, which goes to DONE on = and on [other]]
Ways to Translate a DFA or NFA
into Code
A better method:
• Use a variable to maintain the current state, and
• write the transitions as a doubly nested case statement inside a loop,
• where the first case statement tests the current state and the nested second level tests the input
character.

[Figure: DFA for identifiers: state 1 goes to state 2 on letter; state 2 loops on letter and digit and goes to state 3 on [other]]

The code of the DFA for identifiers:

  state := 1; { start }
  while state = 1 or 2 do
    case state of
    1: case input character of
       letter: advance the input;
               state := 2;
       else state := … { error or other };
       end case;
    2: case input character of
       letter, digit: advance the input;
                      state := 2; { actually unnecessary }
       else state := 3;
       end case;
    end case;
  end while;
  if state = 3 then accept else error;
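The same state-variable scheme can be written as runnable Python. This is a sketch under two assumptions: state 0 stands in for the unspecified error state, and Python's str.isalpha and str.isdigit stand in for the letter and digit classes (they also accept non-ASCII characters). The function name is hypothetical.

```python
def scan_identifier(text):
    """Run the identifier DFA on text; return (accepted, lexeme consumed)."""
    state, pos = 1, 0
    while state in (1, 2):
        ch = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 1:
            if ch.isalpha():
                pos += 1            # advance the input
                state = 2
            else:
                state = 0           # error state (hypothetical numbering)
        elif state == 2:
            if ch.isalpha() or ch.isdigit():
                pos += 1            # advance the input; state stays 2
            else:
                state = 3           # [other]: accept without consuming ch
    return state == 3, text[:pos]

print(scan_identifier("x1 := 2"))   # (True, 'x1')
print(scan_identifier("9a"))        # (False, '')
```

The [other] transition does not advance the input, so the character that ends the identifier is still available to the next token.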
The Code to Implement This DFA
Appendix B :(p511-516) Scan.h and Scan.c
The principal procedure : getToken (lines 674-793)
– consumes input characters and returns the next
token recognized according to the DFA
– uses the doubly nested case analysis described
in Section 2.3.3,
– a large case list based on the state, within which
are individual case lists based on the current
input character.
The code to implement this DFA
Appendix B :(p511-516) Scan.h and Scan.c
The tokens are defined as an enumerated type in
globals.h (lines 174-186)
– which include all the tokens listed above together with
the bookkeeping tokens endfile (when the end of the
file is reached) and ERROR (when an erroneous
character is encountered)
– The states of the scanner are defined as an enumerated
type, but within the scanner itself (lines 612-614).
The code to implement this DFA
Appendix B :(p511-516) Scan.h and Scan.c
The only attribute computed is the lexeme, or string value of
the token recognized
– placed in the variable tokenString.

This variable and getToken are the only services offered to
other parts of the compiler
– Their definitions are collected in the header file scan.h (lines 550-571).
– tokenString is declared with a fixed length of 41, so that
identifiers cannot be more than 40 characters (plus the ending null
character).
The code to implement this DFA
Appendix B :(p511-516) Scan.h and Scan.c
The scanner makes use of three global
variables:
– the file variables source and listing,
– the integer variable lineno.
These are declared in globals.h, and allocated and initialized in main.c.
The code to implement this DFA
Appendix B :(p511-516) Scan.h and Scan.c
The table reservedWords (lines 649-656) and the
procedure reservedLookup (lines 658-666)
– perform a lookup of reserved words after an identifier
is recognized by getToken
– the value of currentToken is changed accordingly
– A flag variable save is used to indicate whether a
character is to be added to tokenString.
The code to implement this DFA
Appendix B :(p511-516) Scan.h and Scan.c
Character input to the scanner is provided by the
getNextChar function (lines 627-642),
– fetches characters from lineBuf, a 256-character buffer internal to
the scanner.
– If the buffer is exhausted, getNextChar refreshes the buffer from
the source file using the standard C procedure fgets,
– assuming each time that a new source code line is being fetched
(and incrementing lineno).
– While this assumption allows for simpler code, a TINY program
with lines greater than 255 characters will not be handled quite
correctly
The code to implement this DFA
Appendix B :(p511-516) Scan.h and Scan.c

The ungetNextChar procedure (lines 644-647) backs up
one character in the input buffer.
Sample program in the TINY
language
{ sample program
  in TINY language -
  computes factorial
}
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end
Output of scanner given the TINY
program
TINY COMPILATION: sample.tny
1: { Sample program
2: in TINY language -
3: computes factorial
4: }
5: read x; { input an integer }
    5: reserved word: read
    5: id, name= x
    5: ;
6: if 0 < x then { don't compute if x <= 0 }
    6: reserved word: if
    6: num, val= 0
    6: <
    6: id, name= x
    6: reserved word: then
7: fact := 1;
    7: id, name= fact
    7: :=
    7: num, val= 1
    7: ;
8: repeat
    8: reserved word: repeat
9: fact := fact * x;
    9: id, name= fact
    9: :=
    9: id, name= fact
    9: *
    9: id, name= x
    9: ;
10: x := x - 1
    10: id, name= x
    10: :=
    10: id, name= x
    10: -
    10: num, val= 1
11: until x = 0;
    11: reserved word: until
    11: id, name= x
    11: =
    11: num, val= 0
    11: ;
12: write fact { output factorial of x }
    12: reserved word: write
    12: id, name= fact
13: end
    13: reserved word: end
14: EOF
2.5.2 Reserved Words Versus
Identifiers
Recognizing Reserved Words
• First considering them as identifiers and then
looking them up in a table of reserved words
– Efficiency depends on the lookup process in the
reserved word table
• linear search
• binary search
• hash table (Minimal perfect hash functions)
• Reserved words use the same table that
stores identifiers
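A small sketch of the identifier-then-lookup approach, using binary search over a sorted table of the TINY reserved words. The names RESERVED and classify are hypothetical, and the standard bisect module stands in for a hand-written binary search.

```python
from bisect import bisect_left

# The eight TINY reserved words, kept sorted so binary search applies
RESERVED = sorted(["if", "then", "else", "end", "repeat", "until", "read", "write"])

def classify(lexeme):
    """Return 'reserved' if lexeme is a reserved word, otherwise 'id'."""
    i = bisect_left(RESERVED, lexeme)
    if i < len(RESERVED) and RESERVED[i] == lexeme:
        return "reserved"
    return "id"

print(classify("repeat"))   # reserved
print(classify("fact"))     # id
```

With only eight reserved words a linear search would do equally well; a hash table (a Python set, for example) would be the third option listed above.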
2.5.3 Allocating space for
identifiers
Some Issues for Allocation
• In the TINY scanner, token strings can be at most
40 characters long, but identifiers may be arbitrarily long
• If a 40-character array is allocated for each identifier, much
of the space is wasted
– In the TINY compiler, the utility function copyString allocates only
the necessary space
– A solution to the size limitation of tokenString is to allocate
space only on an as-needed basis, using the realloc standard C function.
• An alternative is to allocate an initial large array for all
identifiers and then to perform do-it-yourself memory
allocation within this array
2.6 Use of Lex to Generate a
Scanner Automatically
Introduction to Lex
• Use the Lex scanner generator to generate a
scanner from a description of the tokens of
TINY as regular expressions
• A number of different versions of Lex exist;
the most popular version is called
flex (for Fast Lex)
Introduction to Lex
• Lex is a program
– Input: a text file containing regular expressions, together with the
actions to be taken when each expression is matched
– Output: a file containing C source code defining a procedure yylex that is
a table-driven implementation of a DFA corresponding to the
regular expressions of the input file, and that operates like a
getToken procedure
• The Lex output file, usually called lex.yy.c or
lexyy.c,
– is compiled and linked to a main program to get a running
program.
Ways to translate a DFA or NFA
into Code
The transition table of the DFA for C comments:

State    /    *    other    Accepting
1        2                  no
2             3             no
3        3    4    3        no
4        5    4    3        no
5                           yes

The code scheme:

  state := 1;
  ch := next input character;
  while not Accept[state] and not error(state) do
    newstate := T[state,ch];
    if Advance[state,ch] then ch := next input char;
    state := newstate;
  end while;
  if Accept[state] then accept;

Assumes:
• The transitions are kept in a transition array T indexed by states and input characters;
• The transitions that advance the input (i.e., those not marked with brackets in the table) are given by
the Boolean array Advance, also indexed by states and input characters;
• Accepting states are given by the Boolean array Accept, indexed by states.
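The table-driven scheme can be sketched in Python with the C-comment table stored as nested dictionaries. Several details are assumptions of this sketch: state 0 plays the role of the error state, missing table entries mean error, and because no entry of this particular table is bracketed, every transition advances the input, so no separate Advance array is needed.

```python
ERROR = 0                      # hypothetical error state, not in the slide's table

# Transition table T for C comments: T[state][column] -> next state
T = {
    1: {"/": 2},
    2: {"*": 3},
    3: {"/": 3, "*": 4, "other": 3},
    4: {"/": 5, "*": 4, "other": 3},
}
ACCEPT = {5}

def match_comment(text):
    """Return the length of the C comment at the start of text, or None."""
    state, pos = 1, 0
    while state not in ACCEPT and state != ERROR:
        if pos >= len(text):
            state = ERROR      # input ended before the comment was closed
            break
        ch = text[pos]
        column = ch if ch in ("/", "*") else "other"
        state = T[state].get(column, ERROR)
        pos += 1               # every transition in this table advances
    return pos if state in ACCEPT else None

print(match_comment("/* hi **/ x"))   # 9
print(match_comment("/* open"))       # None
```

The driver loop never mentions comments at all; changing the language recognized means changing only the tables T and ACCEPT.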
2.6.1 Lex conventions for
regular expression
Conventions
• Single characters, or strings of characters, are matched
by writing the characters in sequence.
• Metacharacters are matched as actual characters by
surrounding them in quotes. Quotes may also be
written around characters that are not
metacharacters, where they have no effect.
– to match a left parenthesis, we must write "("
– an alternative is to use the backslash metacharacter \
– to match the character sequence (*, we have to write \(\*
or "(*"
Conventions
• Metacharacters: *, +, (, ), |, ?
– Example: the set of strings of a's and b's that begin with either
aa or bb and have an optional c at the end.
– (aa|bb)(a|b)*c? or ("aa"|"bb")("a"|"b")*"c"?
• The Lex convention for character classes (sets of
characters) is to write them between square
brackets.
– The above example can be written as:
– (aa|bb)[ab]*c?
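For this particular pattern the notation happens to coincide with Python's re syntax, so the equivalence of the two forms can be checked by brute force. This sketch uses Python's re module rather than Lex, and the helper agree is hypothetical.

```python
import re
from itertools import product

WITH_ALTERNATION = re.compile(r"(aa|bb)(a|b)*c?")
WITH_CLASS = re.compile(r"(aa|bb)[ab]*c?")

def agree(max_len=5):
    """Check that both patterns accept the same strings over {a, b, c}."""
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            s = "".join(chars)
            if bool(WITH_ALTERNATION.fullmatch(s)) != bool(WITH_CLASS.fullmatch(s)):
                return False
    return True

print(agree())                                  # True
print(bool(WITH_CLASS.fullmatch("aabac")))      # True
print(bool(WITH_CLASS.fullmatch("abc")))        # False
```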
Conventions
• Ranges of characters are written using a hyphen
– The expression [0-9] means in Lex any of the digits
zero through nine.
• A period is a metacharacter that represents a set of
characters:
– It represents any character except a newline.
• Complementary sets are written in this notation
using the caret ^ as the first character inside the
brackets
– [^0-9abc] means any character that is not a digit and
is not one of the letters a, b, or c.
Conventions
• One curious feature is that inside square brackets
(representing a character class), most of the
metacharacters lose their special status and do not
need to be quoted.
– we can write [-+] instead of ("+"|"-") (but not [+-], because of the
metacharacter use of - to express a range of characters).
– [."?] means any of the three characters period, quotation mark,
or question mark
• Some characters, however, are still metacharacters
even inside the square brackets, and to get the actual
character, we must precede the character by a backslash.
– [\^\\] means either of the actual characters ^ or \.
Conventions
• A further important metacharacter convention in
Lex is the use of curly brackets to denote names of
regular expressions.
– that a regular expression can be given a name, and that
these names can be used in other regular expressions as
long as there are no recursive references.
• nat [0-9]+
• signedNat ("+"|"-")?{nat}
The table of Conventions
Pattern    Meaning
a          the character a
"a"        the character a, even if a is a metacharacter
\a         the character a when a is a metacharacter
a*         zero or more repetitions of a
a+         one or more repetitions of a
a?         an optional a
a|b        a or b
(a)        a itself
[abc]      any of the characters a, b, or c
[a-d]      any of the characters a, b, c, or d
[^ab]      any character except a or b
.          any character except a newline
{xxx}      the regular expression that the name xxx represents

2.6.2 The format of a Lex input
file
The format
{ definitions }
%%
{ rules }
%%
{ auxiliary routines }

• The definition section occurs before the first %%.
– Any C code that must be inserted external to any function should appear in
this section between the delimiters %{ and %}.
• Names for regular expressions must also be defined in this section.
– A name is defined by writing it on a separate line starting in the first
column and following it (after one or more blanks) by the regular
expression it represents.
The format
{ definitions }
%%
{ rules }
%%
{ auxiliary routines}
• The second section: rules
• These consist of a sequence of regular expressions,
– each followed by the C code that is to be executed when the
corresponding regular expression is matched.
The format
{ definitions }
%%
{ rules }
%%
{ auxiliary routines}
• The third section: auxiliary routines
• Routines are called in the second section and not defined
elsewhere.
– This section may also contain a main program, if we want to
compile the Lex output as a standalone program.
– This section can also be missing. (the second %% need not be
written. The first %% is always necessary.)
Examples
(1) The following Lex input specifies a scanner that adds line numbers to text, sending its
output to the screen.
%{
/* a Lex program that adds line numbers
   to lines of text, printing the
   new text to the standard output
*/
#include <stdio.h>
int lineno = 1;
%}
line .*\n
%%
{line} { printf("%5d %s",lineno++,yytext); }
%%
main()
{ yylex(); return 0; }
Examples
(1) Running the program obtained from Lex on this input file itself gives the following
output:
    1 %{
    2 /* a Lex program that adds line numbers
    3    to lines of text, printing the
    4    new text to the standard output
    5 */
    6 #include <stdio.h>
    7 int lineno = 1;
    8 %}
    9 line .*\n
   10 %%
   11 {line} { printf("%5d %s",lineno++,yytext); }
   12 %%
   13 main()
   14 { yylex(); return 0; }
Examples
Example 2.21: the Lex input file:
%{
/* a Lex program that changes all numbers
   from decimal to hexadecimal notation,
   printing a summary statistic to stderr
*/
#include <stdlib.h>
#include <stdio.h>
int count = 0;
%}
digit [0-9]
number {digit}+
%%
{number} { int n = atoi(yytext);
           printf("%x", n);
           if (n > 9) count++; }
%%
main()
{ yylex();
  fprintf(stderr, "number of replacements = %d", count);
  return 0;
}
Examples
Example 2.22: the following Lex input file:
%{
/* Selects only lines that end or
   begin with the letter 'a'.
   Deletes everything else.
*/
#include <stdio.h>
%}
ends_with_a .*a\n
begins_with_a a.*\n
%%
{ends_with_a} ECHO;
{begins_with_a} ECHO;
.*\n ;
%%
main()
{ yylex(); return 0; }
Additional feature of Lex input

• Lex has a priority system for resolving such ambiguities.

– First, Lex always matches the longest possible substring (so Lex always
generates a scanner that follows the longest substring principle).
– Then, if the longest substring still matches two or more rules, Lex picks
the first rule in the order they are listed in the action section.
• Suppose the rules and actions were instead listed as follows:
– .*\n ;
– {ends_with_a} ECHO;
– {begins_with_a} ECHO;
• The program produced by Lex would then generate no output at all for any
file, since every line of input would be matched by the first rule.
Summary
• Ambiguity resolution
– Lex's output will always first match the longest
possible substring to a rule.
– If two or more rules cause substrings of equal
length to be matched, then Lex's output will
pick the rule listed first in the action section.
– If no rule matches any nonempty sub-string,
then the default action copies the next character
to the output and continues.
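These three rules can be imitated with a short Python sketch. It is not how Lex works internally (Lex builds a single DFA), and Python's re alternation is not longest-match in general, so the sketch tries each rule separately and compares match lengths. The names make_scanner and scan and the rule name other are hypothetical.

```python
import re

def make_scanner(rules):
    """rules: list of (name, pattern) pairs in priority order."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]

    def scan(text):
        pos, out = 0, []
        while pos < len(text):
            best_name, best_len = None, 0
            for name, rx in compiled:
                m = rx.match(text, pos)
                # strict > keeps the earliest rule when lengths tie
                if m and m.end() - pos > best_len:
                    best_name, best_len = name, m.end() - pos
            if best_name is None:
                out.append(("ECHO", text[pos]))   # default: copy one character
                pos += 1
            else:
                out.append((best_name, text[pos:pos + best_len]))
                pos += best_len
        return out

    return scan

rules = [("ends_with_a", r".*a\n"), ("begins_with_a", r"a.*\n"), ("other", r".*\n")]
scan = make_scanner(rules)
print([name for name, _ in scan("banana\napple\nkiwi\n")])
# ['ends_with_a', 'begins_with_a', 'other']
```

Moving the catch-all rule to the front makes it win every tie, which mirrors the reordered example in which no line is ever echoed.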
Summary
• Insertion of C code
– Any text written between %{ and %} in the definition section will
be copied directly to the output program external to any procedure.
– Any text in the auxiliary procedures section will be copied directly
to the output program at the end of the Lex code.
– Any code that follows a regular expression (by at least one space)
in the action section (after the first %%) will be inserted at the
appropriate place in the recognition procedure yylex and will be
executed when a match of the corresponding regular expression
occurs.
– The C code representing an action may be either a single C
statement or a compound C statement consisting of any
declarations and statements surrounded by curly brackets.
Lex internal names
Lex Internal Name      Meaning/Use
lex.yy.c or lexyy.c    Lex output file name
yylex                  Lex scanning routine
yytext                 string matched on current action
yyin                   Lex input file (default: stdin)
yyout                  Lex output file (default: stdout)
input                  Lex buffered input routine
ECHO                   Lex default action (print yytext to yyout)
2.6.3 A tiny scanner using Lex

Appendix B gives a listing of a Lex input file tiny.l
End of Chapter Two

THANKS
