unit5
Scanner
Task of a scanner
• Deliver tokens
Token classes of KPL
• Unsigned integer
• Identifier
• Keywords: begin, end, if, then, while, do, call, const, var, procedure,
program, type, function, of, integer, char, else, for, to, array
• Character constant
• Operators:
• Arithmetic
+ - * /
• Relational
= != < > <= >=
• Assign :=
• Separators
( ) . : ; (. .)
Finite Automata
• Consider a one-way automatic door.
• This door has two pads that can sense when someone is standing on them: a front pad and a rear pad.
• We want people to walk through the front and toward the rear, but not allow someone to walk in the other direction.
[Figure: One Way Door, with a front pad and a rear pad; state diagram with start state C and state O]
DFA Example D1
state   a    b
q0      q1   q0
q1      q2   q0
q2      q3   q0
q3      q3   q3
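The transition table of D1 can be run directly as a table-driven automaton. The following is a minimal sketch, not part of the original slides: the function name d1_accepts is hypothetical, and the table does not say which states are accepting, so the sketch assumes q3 is the only accepting state (under that assumption D1 accepts every string that contains "aaa").

```c
#include <stddef.h>

/* Transition table of D1: rows are states q0..q3, columns are inputs a, b. */
static const int d1_next[4][2] = {
    {1, 0},  /* q0: a -> q1, b -> q0 */
    {2, 0},  /* q1: a -> q2, b -> q0 */
    {3, 0},  /* q2: a -> q3, b -> q0 */
    {3, 3}   /* q3: a -> q3, b -> q3 */
};

/* ASSUMPTION: q0 is the start state and q3 is the only accepting state. */
enum { D1_START = 0, D1_ACCEPT = 3 };

/* Returns 1 if D1 ends in the accepting state after reading input,
   0 otherwise (also 0 if input has a symbol other than 'a' or 'b'). */
int d1_accepts(const char *input)
{
    int state = D1_START;
    for (size_t i = 0; input[i] != '\0'; i++) {
        int column;
        if (input[i] == 'a') column = 0;
        else if (input[i] == 'b') column = 1;
        else return 0;
        state = d1_next[state][column];
    }
    return state == D1_ACCEPT;
}
```

For example, d1_accepts("baaab") returns 1 because the run reaches q3 and stays there, while d1_accepts("aabaa") returns 0 because the b sends the automaton back to q0.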
Input and output of a lexical analyzer (scanner)
Recognizing KPL’s tokens
The scanner as a Deterministic Finite Automaton
• Character classification
• Token recognition
Lexical rules of KPL
Classification of characters based on their ASCII code
typedef enum {
    CHAR_SPACE,         // whitespace (space, tab, backspace, …)
    CHAR_LETTER,        // letters
    CHAR_DIGIT,         // digits
    CHAR_PLUS,          // '+'
    CHAR_MINUS,         // '-'
    CHAR_TIMES,         // '*'
    CHAR_SLASH,         // '/'
    CHAR_LT,            // '<'
    CHAR_GT,            // '>'
    CHAR_EXCLAIMATION,  // '!'
    CHAR_EQ,            // '='
    CHAR_COMMA,         // ','
    CHAR_PERIOD,        // '.'
    CHAR_COLON,         // ':'
    CHAR_SEMICOLON,     // ';'
    CHAR_SINGLEQUOTE,   // '\''
    CHAR_LPAR,          // '('
    CHAR_RPAR,          // ')'
    CHAR_UNKNOWN        // invalid characters
} CharCode;
CharCode charCodes[256] = {……};
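The slide elides the initializer of charCodes. As an illustration only, the sketch below fills a similar 256-entry table at run time using the standard <ctype.h> predicates. It is not the original initializer: initCharCodes and classify are hypothetical helpers, the enum is a reduced copy of CharCode, and treating every isspace() character as CHAR_SPACE is an assumption.

```c
#include <ctype.h>

/* Reduced copy of the CharCode enum, enough for the illustration. */
typedef enum {
    CHAR_SPACE, CHAR_LETTER, CHAR_DIGIT, CHAR_PLUS, CHAR_MINUS,
    CHAR_LT, CHAR_GT, CHAR_LPAR, CHAR_RPAR, CHAR_UNKNOWN
} CharCode;

static CharCode charCodes[256];

/* Hypothetical helper: classify each of the 256 character codes once. */
void initCharCodes(void)
{
    for (int c = 0; c < 256; c++) {
        if (isspace(c))      charCodes[c] = CHAR_SPACE;   /* ASSUMPTION */
        else if (isalpha(c)) charCodes[c] = CHAR_LETTER;
        else if (isdigit(c)) charCodes[c] = CHAR_DIGIT;
        else                 charCodes[c] = CHAR_UNKNOWN;
    }
    /* Single-character symbols get their own class. */
    charCodes['+'] = CHAR_PLUS;
    charCodes['-'] = CHAR_MINUS;
    charCodes['<'] = CHAR_LT;
    charCodes['>'] = CHAR_GT;
    charCodes['('] = CHAR_LPAR;
    charCodes[')'] = CHAR_RPAR;
}

/* Hypothetical helper: table lookup that is safe for EOF (-1). */
CharCode classify(int ch)
{
    return (ch >= 0 && ch < 256) ? charCodes[ch] : CHAR_UNKNOWN;
}
```

With a table like this, the scanner classifies a character with one array lookup, so state 0 can switch on the class rather than on the raw character.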
Data structure for list of tokens
enum {
TK_NONE, TK_IDENT, TK_NUMBER, TK_CHAR, TK_EOF,
Scanner implementation based on DFA
state = 0;
currentChar = readChar();
token = getToken();
while (token != EOF)
{
    state = 0;   // every token starts from the initial state
    token = getToken();
}
Token recognizer
switch (state)
{
case 0:
    switch (currentChar)
    {
    case space:
        state = 2; break;
    case lpar:
        state = 35; break;   // state 35 decides between (, (. and (*
    case letter:
        state = 3; break;
    case digit:
        state = 7; break;
    case plus:
        state = 9; break;
    case lt:
        state = 13; break;
    ……
    }
Token recognizer (cont’d)
case 9:
    readChar();
    return SB_PLUS;
case 13:
    readChar();
    if (currentChar == EQ) state = 14; else state = 15;
    return getToken();   // continue from the new state
case 14:
    readChar();
    return SB_LE;
case 15:
    return SB_LT;
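States 13 to 15 implement one character of lookahead: after "<", the scanner reads one more character to decide between "<" and "<=". The sketch below is a self-contained version of just these three states, scanning from a string. It is an illustration under stated assumptions, not the course implementation: startScan and scanLess are hypothetical names, END_OF_INPUT stands in for EOF, and a loop replaces the recursive call to getToken().

```c
#define END_OF_INPUT (-1)

typedef enum { TK_NONE, SB_LT, SB_LE } TokenType;

static const char *src;   /* text being scanned */
static int pos;           /* index of the next unread character */
static int currentChar;   /* one-character lookahead */

/* Advance the lookahead by one character. */
static int readChar(void)
{
    currentChar = src[pos] ? (unsigned char)src[pos++] : END_OF_INPUT;
    return currentChar;
}

/* Hypothetical helper: start scanning text and load the first character. */
void startScan(const char *text)
{
    src = text;
    pos = 0;
    readChar();
}

/* Recognize "<" or "<=" starting at currentChar; TK_NONE if neither. */
TokenType scanLess(void)
{
    int state = 13;
    if (currentChar != '<') return TK_NONE;
    for (;;) {
        switch (state) {
        case 13:                 /* '<' seen: look at the next character */
            readChar();
            state = (currentChar == '=') ? 14 : 15;
            break;
        case 14:                 /* "<=": consume '=' as well */
            readChar();
            return SB_LE;
        default:                 /* state 15: "<" alone, nothing consumed */
            return SB_LT;
        }
    }
}
```

Note that state 15 returns without calling readChar(): the character that ended the token stays in currentChar and becomes the first character of the next token.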
Token recognizer (cont’d)
case 2:
    while (currentChar == space) // skip blanks
        readChar();
    state = 0;                   // restart from the initial state
    return getToken();
case 35:
    readChar();
    if (currentChar == EOF) state = 41;
    else
        switch (currentChar)
        {
        case period:
            state = 36; break;   // token lsel
        case times:
            state = 37; break;   // skip comment
        default:
            state = 41;          // token lpar
        }
    return getToken();
}
Skip comments
case 37: // skip comment
    readChar();
    while (currentChar != times)
    {
        state = 37;
        readChar();
    }
    state = 38;
    // fall through: a '*' has been seen
case 38:
    readChar();
    while (currentChar == times)
    {
        state = 38;
        currentChar = readChar();
    }
    // a comment is closed by "*)", so test for the right parenthesis
    if (currentChar == rpar) state = 39; else state = 40;
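The two comment states can be read as a small automaton of their own: state 37 means "inside the comment", state 38 means "one or more * just seen". The sketch below is a self-contained illustration of that idea, not the course code: skipComment is a hypothetical function that starts just after the opening "(*" and, unlike the fragment above, also stops at the end of the input so that an unterminated comment cannot loop forever.

```c
#include <stddef.h>

/* p points just after "(*". Returns a pointer to the first character
   after the closing "*)", or NULL if the comment is never closed. */
const char *skipComment(const char *p)
{
    int state = 37;                       /* inside the comment */
    for (; *p != '\0'; p++) {
        if (state == 37) {
            if (*p == '*') state = 38;    /* possible start of "*)" */
        } else {                          /* state 38: '*' just seen */
            if (*p == ')') return p + 1;  /* "*)" closes the comment */
            if (*p != '*') state = 37;    /* false alarm, keep skipping */
        }
    }
    return NULL;                          /* end of input inside comment */
}
```

For the text "ab**) z" the function returns a pointer to " z"; for "abc *" it returns NULL, which a scanner would report as an unterminated comment.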
Distinction between identifiers and keywords
case 4:
    if (checkKeyword(token) == TK_NONE) state = 5;   // not a keyword
    else state = 6;                                  // keyword
    return getToken();
case 5:
    install_ident();   // save the identifier to the symbol table
    return TK_IDENT;
case 6:
    return checkKeyword(token);
…………
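One simple way to implement checkKeyword is a linear search through a keyword table. The following is a sketch under assumptions, not the original implementation: the KW_* constants and classifyWord are hypothetical, only a few of the KPL keywords are listed, and the comparison is case-sensitive.

```c
#include <string.h>

typedef enum {
    TK_NONE, TK_IDENT,
    KW_BEGIN, KW_END, KW_IF, KW_THEN, KW_WHILE, KW_DO   /* hypothetical */
} TokenType;

/* A few of the KPL keywords; the full list would be added the same way. */
static const struct { const char *text; TokenType type; } keywords[] = {
    {"begin", KW_BEGIN}, {"end", KW_END},     {"if", KW_IF},
    {"then", KW_THEN},   {"while", KW_WHILE}, {"do", KW_DO}
};

/* Returns the keyword token for text, or TK_NONE if it is not a keyword. */
TokenType checkKeyword(const char *text)
{
    size_t count = sizeof keywords / sizeof keywords[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(text, keywords[i].text) == 0)
            return keywords[i].type;
    return TK_NONE;
}

/* Hypothetical helper: a scanned word is a keyword or else an identifier. */
TokenType classifyWord(const char *text)
{
    TokenType type = checkKeyword(text);
    return (type == TK_NONE) ? TK_IDENT : type;
}
```

The scanner first collects the whole word with the identifier rule and only then consults the table, so a word such as "iffy" is never mistaken for the keyword "if".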
Initialize a symbol table
Scanner Generators
Model of a compiler
[Figure: Source program (stream of characters) → Scanner → Tokens → Parser → Syntax structure → Semantic analyzer → Intermediate code → Target code generator → Assembly code]
Scanner and parser generator
• FLEX (LEX)
– Generate C code for the scanner
– Lexical rules are expressed by a set of regular expressions
• BISON (YACC)
– Generate C code for a bottom-up LR parser (LALR(1) by default)
– Grammar is expressed by BNF
Input and output of a scanner generator
NUMBER [0-9]+
DELIMITER [ \n\t\r]
CHAR \'[[:print:]]\'
IDENT [a-zA-Z][a-zA-Z0-9]*
COMMENT \(\*([^*]|(\*+[^*)]))*\*+\)
ERROR [^+\-*/,;.:()=a-zA-Z0-9<>]
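To see what the NUMBER and IDENT definitions above describe, each can be written by hand as a small C function that returns the length of the longest matching prefix. This is only a sketch for comparison with the regular expressions, not the code flex generates: matchNumber and matchIdent are hypothetical names, and the <ctype.h> predicates are assumed to run in the default "C" locale, where isalpha accepts exactly a-z and A-Z.

```c
#include <ctype.h>
#include <stddef.h>

/* NUMBER = [0-9]+ : length of the longest matching prefix, 0 if none. */
size_t matchNumber(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* IDENT = [a-zA-Z][a-zA-Z0-9]* : a letter, then letters or digits. */
size_t matchIdent(const char *s)
{
    size_t n;
    if (!isalpha((unsigned char)s[0]))
        return 0;                      /* must start with a letter */
    n = 1;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}
```

For the input "x1 := 2", matchIdent returns 2 (the lexeme "x1") and matchNumber returns 0, which mirrors how a generated scanner picks the rule with the longest match.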
Model of a scanner generator
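[Figure: the scanner generator takes the regular expressions R and produces a scanner program P; P accepts a token when the input string S ∈ L(R)]
• Applications
Regular expression → ε-NFA → NFA → DFA → Minimization → Minimum DFA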
FLEX: The fast lexical analyzer generator
2. Run FLEX on the input file. flex produces a C file called lex.yy.c
with the scanning function yylex().
[Figure: Rule file (*.l) → Flex compiler → lex.yy.c (scanner in C code)]
The flex input file consists of three sections, separated by lines
containing just %%
%{
auxiliary declarations
%}
regular definitions
%%
translation rules
%%
auxiliary procedures
Auxiliary declarations and regular definitions
Translation rules
Auxiliary procedures