L3_FSM
Compiler Design
Dr. Sadu Chiranjeevi
Assistant Professor
Department of Computer Science and Engineering
[email protected]
How to describe tokens?
• Programming language tokens can be
described by regular languages
• Regular languages
– Are easy to understand
– Have a well-understood and useful theory
– Have efficient implementations
• Regular languages have been discussed in
great detail in the “Theory of Computation”
course
How to specify tokens
• Regular definitions
– Let ri be a regular expression and di be a
distinct name
– Regular definition is a sequence of
definitions of the form
d1 → r1
d2 → r2
…..
dn → rn
– Where each ri is a regular expression
over Σ ∪ {d1, d2, …, di-1}
Examples
• My fax number
91-(512)-259-7586
• Σ = digit ∪ {-, (, )}
• country → digit+ (here digit², i.e. two digits)
• Unsigned number in C
digit → 0 | 1 | … | 9
digits → digit+
fraction → '.' digits | ε
exponent → (E ('+' | '-' | ε) digits) | ε
number → digits fraction exponent
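The regular definition for unsigned numbers can be turned into code almost line by line. The following is a minimal sketch, not a definitive implementation: the function names (match_digits, match_fraction, match_exponent, match_number) are hypothetical, and each returns how many characters of the input it accepted, with 0 standing for ε or for no match.

```c
#include <ctype.h>
#include <stddef.h>

/* digits -> digit+ : number of leading digits in s (0 if there are none). */
static size_t match_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* fraction -> '.' digits | ε : length matched, 0 for the ε alternative. */
static size_t match_fraction(const char *s) {
    if (s[0] != '.') return 0;
    size_t d = match_digits(s + 1);
    return d ? 1 + d : 0;
}

/* exponent -> (E ('+' | '-' | ε) digits) | ε : length matched, 0 for ε. */
static size_t match_exponent(const char *s) {
    if (s[0] != 'E') return 0;
    size_t i = 1;
    if (s[i] == '+' || s[i] == '-') i++;   /* optional sign */
    size_t d = match_digits(s + i);
    return d ? i + d : 0;
}

/* number -> digits fraction exponent
   Length of the longest prefix of s that is an unsigned number, or 0. */
size_t match_number(const char *s) {
    size_t n = match_digits(s);
    if (n == 0) return 0;                  /* digits is mandatory */
    n += match_fraction(s + n);
    n += match_exponent(s + n);
    return n;
}
```

For example, match_number("12.5E+3") accepts all 7 characters, while match_number("3.") accepts only the "3", because the fraction needs at least one digit after the point.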
Regular expressions in specifications
• Regular expressions describe many useful languages
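[Figure: transition diagram notation (final state, transition) and two example diagrams, one with edges labelled letter|_, digit and other (*), and one with edges labelled delim and other (*)]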
Transition diagram for unsigned numbers
Implementation of transition diagrams
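Token nexttoken() {
    while (1) {
        switch (state) {
        ……
        case 10: /* stay in state 10 while reading letters or digits */
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11; /* any other character leaves the loop */
            break;
        ……
        }
    }
}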
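The fragment above shows a single state. As a rough, self-contained sketch of how a whole diagram might be coded, the following recognizes an identifier (a letter or underscore followed by letters, digits or underscores). Only nextchar() and isletter() appear on the slide; the input cursor, retract(), scan_identifier() and the start state number 9 are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical input cursor; the slide only names nextchar(). */
static const char *input;
static size_t pos;

static int nextchar(void) { return (unsigned char)input[pos++]; }
static void retract(void) { pos--; }              /* un-read one character */
static int isletter(int c) { return isalpha(c) || c == '_'; }

/* Returns the length of the identifier at the start of src, or 0 if none. */
size_t scan_identifier(const char *src) {
    int state = 9;                                 /* assumed start state */
    int c;
    input = src;
    pos = 0;
    while (1) {
        switch (state) {
        case 9:                                    /* expect a first letter or '_' */
            c = nextchar();
            if (isletter(c)) state = 10;
            else return 0;                         /* not an identifier */
            break;
        case 10:                                   /* loop on letters and digits */
            c = nextchar();
            if (isletter(c) || isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11:                                   /* final state marked with * */
            retract();                             /* give back the "other" char */
            return pos;
        }
    }
}
```

With this sketch, scan_identifier("x1+y") returns 2: the '+' that ended the lexeme is read in state 10 and handed back in state 11, which is what the * on a final state denotes.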
Lexical analyzer generator
• Input to the generator
– List of regular expressions in priority order
– An associated action for each regular expression
(it generates the kind of token and other bookkeeping
information)
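To see why the order of the regular expressions matters, here is a small hand-written sketch of the selection step. It assumes the convention used by Lex-style generators: prefer the longest match, and on a tie prefer the rule listed first. The token kinds, matcher functions and classify() are all hypothetical names, and the hand-coded matchers stand in for regular expressions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_IF, TOK_ID, TOK_NUM, TOK_NONE } TokenKind;

/* Each matcher returns the length of the longest prefix it accepts, or 0. */
static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}
static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]) && s[0] != '_') return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}
static size_t match_num(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

typedef struct {
    TokenKind kind;
    size_t (*match)(const char *);
} Rule;

/* Rules in priority order: the keyword is listed before the identifier. */
static const Rule rules[] = {
    { TOK_IF,  match_if  },
    { TOK_ID,  match_id  },
    { TOK_NUM, match_num },
};

/* Picks the rule with the longest match; stores the lexeme length in *len. */
TokenKind classify(const char *s, size_t *len) {
    TokenKind best = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {        /* strict '>' keeps the earlier rule on a tie */
            best = rules[i].kind;
            *len = n;
        }
    }
    return best;
}
```

On the input "if" both the keyword rule and the identifier rule match two characters, so the earlier rule wins and the result is TOK_IF. On "iffy" the identifier rule matches four characters and wins on length.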
Format of Lex file
{ definitions }
%%
{ rules }
%%
{ user subroutines }
• Definitions include declarations of constants, variables,
and regular definitions.
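%{
/* Definitions section: i holds the word count (declaration assumed; the slide omits it) */
#include <stdio.h>
int i = 0;
%}
/* Rules Section*/
%%
([a-zA-Z0-9])+ {i++;} /* Rule for counting number of words; + never matches the empty string */
%%
/* User subroutines */
int yywrap(void){ return 1; } /* returning 1 means there is no further input */
int main()
{
    // The function that starts the analysis
    yylex();
    printf("%d\n", i); /* print the word count */
    return 0;
}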