
23CS2204

Compiler Design
Dr. Sadu Chiranjeevi
Assistant Professor
Department of Computer Science and Engineering
[email protected]

How to describe tokens?
• Programming language tokens can be described by regular languages
• Regular languages
– are easy to understand
– have a well-understood and useful theory
– have efficient implementations
• Regular languages are discussed in great detail in the “Theory of Computation” course
How to specify tokens
• Regular definitions
– Let ri be a regular expression and di be a distinct name
– A regular definition is a sequence of definitions of the form
      d1 → r1
      d2 → r2
      …
      dn → rn
– where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}
Examples
• My fax number
  91-(512)-259-7586
• Σ = digit ∪ {-, (, )}
• country → digit²       (91)
• area → ‘(‘ digit³ ‘)’  (512)
• exchange → digit³      (259)
• phone → digit⁴         (7586)
• number → country ‘-’ area ‘-’ exchange ‘-’ phone
Examples

• My email address
  [email protected]
• Σ = letter ∪ {@, . }
• letter → a | b | … | z | A | B | … | Z
• name → letter+
• address → name ‘@’ name ‘.’ name ‘.’ name
Examples …
• Identifier
  letter → a | b | … | z | A | B | … | Z
  digit → 0 | 1 | … | 9
  identifier → letter (letter|digit)* | _ (letter|digit)*

• Unsigned number in C
  digit → 0 | 1 | … | 9
  digits → digit+
  fraction → ’.’ digits | ε
  exponent → (E ( ‘+’ | ‘-’ | ε) digits) | ε
  number → digits fraction exponent
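The “unsigned number” definition can also be checked by hand, one sub-definition at a time. The sketch below mirrors digits, fraction, and exponent directly; the function names are mine, not from the slides:

```c
#include <ctype.h>

/* digits = digit+ ; returns pointer past the digits, or NULL if none */
static const char *scan_digits(const char *p) {
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    return p;
}

/* number = digits fraction exponent (illustrative hand-coded checker) */
int is_unsigned_number(const char *s) {
    const char *p = scan_digits(s);          /* digits */
    if (!p) return 0;
    if (*p == '.') {                         /* fraction = '.' digits | eps */
        p = scan_digits(p + 1);
        if (!p) return 0;
    }
    if (*p == 'E') {                         /* exponent = (E (+|-|eps) digits) | eps */
        p++;
        if (*p == '+' || *p == '-') p++;
        p = scan_digits(p);
        if (!p) return 0;
    }
    return *p == '\0';                       /* accept only if all input consumed */
}
```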
Regular expressions in specifications
• Regular expressions describe many useful languages
• Regular expressions are only specifications; implementation is still required
• Given a string s and a regular expression R, does s ∈ L(R)?
• The solution to this problem is the basis of lexical analyzers
• However, just the yes/no answer is not sufficient
• Goal: partition the input into tokens
  1. Write a regular expression for the lexemes of each token
     • number → digit+
  2. Construct R matching all lexemes of all tokens
     • R = R1 + R2 + R3 + …
  3. Let the input be x1…xn
     • for 1 ≤ i ≤ n, check x1…xi ∈ L(R)
  4. x1…xi ∈ L(R) ⟹ x1…xi ∈ L(Rj) for some j
     • the smallest such j is the token class of x1…xi
  5. Remove x1…xi from the input; go to (3)
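Steps (1)–(5) can be sketched as a brute-force matcher. The two token classes, all names, and the quadratic prefix loop below are illustrative simplifications of the scheme above, not the slides’ actual algorithm (real scanners use automata instead of re-checking every prefix):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { TOK_NONE = 0, TOK_NUM = 1, TOK_ID = 2 };   /* priority order: R1, R2 */

/* Is the prefix s[0..n) in L(R_cls)? */
static int in_L(int cls, const char *s, size_t n) {
    size_t i;
    if (n == 0) return 0;
    if (cls == TOK_NUM) {                          /* number -> digit+ */
        for (i = 0; i < n; i++)
            if (!isdigit((unsigned char)s[i])) return 0;
        return 1;
    }
    if (!isalpha((unsigned char)s[0])) return 0;   /* id -> letter (letter|digit)* */
    for (i = 1; i < n; i++)
        if (!isalnum((unsigned char)s[i])) return 0;
    return 1;
}

/* Class of the longest prefix of s in L(R); *len receives its length. */
int next_token(const char *s, size_t *len) {
    size_t i, best_len = 0;
    int best = TOK_NONE;
    for (i = 1; i <= strlen(s); i++) {             /* step (3): check x1..xi */
        int cls;
        for (cls = TOK_NUM; cls <= TOK_ID; cls++)  /* step (4): smallest j wins */
            if (in_L(cls, s, i)) { best = cls; best_len = i; break; }
    }
    *len = best_len;                               /* step (5): caller removes prefix */
    return best;
}
```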
Transition Diagrams
• Regular expressions are declarative specifications
• A transition diagram is an implementation
• A transition diagram consists of
– An input alphabet belonging to Σ
– A set of states S
– A set of transitions statei --input--> statej
– A set of final states F
– A start state n
• The transition s1 --a--> s2 is read:
  in state s1, on input a, go to state s2
• If the end of input is reached in a final state, then accept
• Otherwise, reject
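A transition diagram with these components can be coded directly as a transition function over states. The toy diagram below (strings over {a, b} with an even number of a’s) is my example for illustration, not one from the slides:

```c
/* States S = {0, 1}; start state = 0; final states F = {0}.
 * Transitions: on 'a' flip parity, on 'b' stay; anything else rejects. */
static int delta(int state, char c) {
    if (c == 'a') return 1 - state;   /* state_i --a--> state_j */
    if (c == 'b') return state;       /* state_i --b--> state_i */
    return -1;                        /* character not in the input alphabet */
}

int accepts(const char *s) {
    int state = 0;                    /* start state */
    for (; *s; s++) {
        state = delta(state, *s);
        if (state < 0) return 0;      /* reject on illegal input */
    }
    return state == 0;                /* accept iff end of input in a final state */
}
```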
Pictorial notation
• A state: a circle
• A final state: a double circle
• A transition: an arrow between states
• Transition from state i to state j on an input a:

      (i) --a--> (j)
How to recognize tokens
• Consider
    relop → < | <= | = | <> | >= | >
    id → (letter|_) (letter|digit)*
    num → digit+ (‘.’ digit+)? (E (‘+’|’-’)? digit+)?
    delim → blank | tab | newline
    ws → delim+

• Construct an analyzer that will return <token, attribute> pairs
Transition diagram for relops
  (diagram not reproduced: the start state branches on <, =, >, with further
  states for <=, <> and >=; states marked * retract the last input character)

Transition diagram for identifier

  start --letter|_--> (1) --other--> accept *
  (state 1 loops on letter|digit; * means the last character is retracted)

Transition diagram for white spaces

  start --delim--> (1) --other--> accept *
  (state 1 loops on delim)

Transition diagram for unsigned numbers
  (diagram not reproduced)
Implementation of transition diagrams
Token nexttoken() {
    while (1) {
        switch (state) {
        ……
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        ……
        }
    }
}
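A complete, runnable version of the identifier fragment above might look like the sketch below. The function name, return convention, and length output are my additions; the state numbering and the retraction in the starred final state follow the diagram:

```c
#include <ctype.h>

/* Scan one identifier -- (letter|_)(letter|digit)* -- at the start of input.
 * Returns 1 and the identifier's length in *len, or 0 if none matches. */
int scan_identifier(const char *input, int *len) {
    int state = 9;                         /* start state */
    const char *p = input;
    for (;;) {
        char c = *p++;                     /* c = nextchar() */
        if (state == 9) {                  /* first character: letter or '_' */
            if (isalpha((unsigned char)c) || c == '_') state = 10;
            else return 0;                 /* not an identifier */
        } else {                           /* state 10: loop on letter|digit */
            if (isalnum((unsigned char)c)) state = 10;
            else {                         /* state 11, marked '*': retract */
                *len = (int)(p - input) - 1;  /* do not count the lookahead char */
                return 1;
            }
        }
    }
}
```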
Lexical analyzer generator
• Input to the generator
– List of regular expressions in priority order
– An associated action for each regular expression (generates the kind of
  token and other bookkeeping information)

• Output of the generator
– A program that reads the input character stream and breaks it into tokens
– Reports lexical errors (unexpected characters), if any
LEX: A lexical analyzer generator

  Token specifications (lex.l) --> [LEX compiler] --> lex.yy.c
                                   (C code for the lexical analyzer)

  lex.yy.c --> [C compiler] --> a.out (object code: the lexical analyzer)

  Input program --> [lexical analyzer] --> tokens
Format of Lex file
• A Lex program is separated into three sections by %% delimiters. The format
  of a Lex source file is as follows:

{ definitions }
%%
{ rules }
%%
{ user subroutines }
Format of Lex file
• Definitions include declarations of constants, variables and regular
  definitions.

• Rules are statements of the form
      p1 {action1}
      p2 {action2}
      ....
      pn {actionn}
  where pi is a regular expression and actioni describes what action the
  lexical analyzer should take when pattern pi matches a lexeme.

• User subroutines are auxiliary procedures needed by the actions. The
  subroutines can be compiled separately and loaded with the lexical analyzer.
Lex Program
/* Lex program to count the number of words per line */
%{
#include <stdio.h>
#include <string.h>
int i = 0;
%}

/* Rules section */
%%
[a-zA-Z0-9]+   { i++; }                   /* rule for counting words */
"\n"           { printf("%d\n", i); i = 0; }
%%

int yywrap(void) { return 1; }

int main()
{
    /* yylex() starts the analysis */
    yylex();
    return 0;
}
