Flex
Ismaeel Alkrayyan
AI Department, 4th
Lexical analyzer
Scanner:
• This is the first phase of a compiler.
• It reads the source text as a stream of characters and divides it into tokens
by matching sequences of characters against patterns.
• It filters out comments and white space characters such as tab, space, and
newline.
• Pattern: a rule that describes the characters that can be grouped into a
token. It is expressed as a regular expression. The input stream of characters
is matched against the patterns to identify tokens.
• Lexeme: the actual text (character sequence) that matches a pattern and is
recognized as a token.
• For example, “int” is identified as a keyword token. Here “int” is the lexeme and keyword is the token.
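The distinction between token and lexeme can be sketched with a small hand-written scanner in C. This is only an illustration under stated assumptions: the names TokenKind, Token, and next_token are hypothetical, only “int” is treated as a keyword, and comment filtering is left out. It is not what flex generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes for this sketch. */
typedef enum {
    TOK_EOF, TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_UNKNOWN
} TokenKind;

typedef struct {
    TokenKind kind;   /* the token (class) */
    char lexeme[32];  /* the lexeme (matched text), truncated to fit */
} Token;

/* Scan one token starting at *src and advance *src past it. */
Token next_token(const char **src)
{
    Token tok = { TOK_EOF, "" };
    const char *p;
    size_t len;

    assert(src != NULL && *src != NULL);
    p = *src;

    /* Filter white space: space, tab, newline, ... */
    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {           /* end of input */
        *src = p;
        return tok;
    }

    const char *start = p;
    if (isalpha((unsigned char)*p) || *p == '_') {
        /* Pattern: [A-Za-z_][A-Za-z0-9_]* */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        tok.kind = TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        /* Pattern: [0-9]+ */
        while (isdigit((unsigned char)*p))
            p++;
        tok.kind = TOK_NUMBER;
    } else {
        p++;                    /* any other single character */
        tok.kind = TOK_UNKNOWN;
    }

    /* Copy the lexeme, truncating if it does not fit. */
    len = (size_t)(p - start);
    if (len >= sizeof tok.lexeme)
        len = sizeof tok.lexeme - 1;
    memcpy(tok.lexeme, start, len);
    tok.lexeme[len] = '\0';

    /* The lexeme "int" belongs to the keyword token class. */
    if (tok.kind == TOK_IDENTIFIER && strcmp(tok.lexeme, "int") == 0)
        tok.kind = TOK_KEYWORD;

    *src = p;
    return tok;
}
```

Calling next_token repeatedly on the text "int x1 42" yields a keyword, an identifier, and a number, each carrying its own lexeme, and then TOK_EOF.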
Scanner generators:
• They help write programs whose control flow is directed by instances of
regular expressions in the input stream.
• Input: a set of regular expressions + actions.
• flex (or lex) produces the output: C code implementing a scanner
(function: yylex(), file: lex.yy.c), which is then passed to the compiler.
• The user supplies the driver code: main() {…} or parser() {…}.
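Structure of a flex input file:
definitions
%%          (required)
rules
%%          (optional)
user code
Each rule in the rules section, between the two %% lines, has the form:
Pattern Action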
• The patterns at the heart of every flex scanner use a rich regular
expression language.
• A regular expression is a pattern description written in a metalanguage, a
language that you use to describe what you want the pattern to match.
• The metalanguage uses standard text characters, some of which
represent themselves and others of which represent patterns.
• All characters other than the metacharacters, including all letters and
digits, match themselves.
• Example, a pattern for a decimal number with an optional sign and exponent:
[-+]?([0-9]*\.?[0-9]+|[0-9]+\.)(E[-+]?[0-9]+)?
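To see which strings such a number pattern accepts, it can be tried outside flex. The sketch below uses the POSIX regex library (regcomp, regexec) rather than flex itself, so it is an approximation: POSIX extended syntax is close to, but not the same as, the flex pattern language. The function name is_number is hypothetical, the exponent sign is written as [-+]?, and the pattern is anchored with ^ and $ so the whole string must match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the whole string s matches the number pattern, else 0. */
int is_number(const char *s)
{
    /* Number pattern in POSIX extended syntax, anchored at both ends. */
    static const char *pattern =
        "^[-+]?([0-9]*\\.?[0-9]+|[0-9]+\\.)(E[-+]?[0-9]+)?$";
    regex_t re;
    int matched;

    assert(s != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no submatch positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;

    matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With this pattern, strings such as "42", "-3.14", "7." and ".5E+10" are accepted, while "abc", "." and "1e5" (lowercase exponent) are rejected.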
Example
• Defining and using a name (dgt), with a rule that matches a number and
returns its value to the calling routine.
• char * yytext; holds the text of the current match.
%{
#include <stdio.h>
#include <stdlib.h>
%}
dgt [0-9]
%%
{dgt}+    return atoi(yytext);   /* match a number, return its value */
%%
/* driver: averages the values returned by yylex(), which returns 0 at end of input */
int main(void) {
  int val, total = 0, n = 0;
  while ((val = yylex()) > 0) {
    total += val;
    n++;
  }
  if (n > 0) printf("ave = %d\n", total/n);
  return 0;
}
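• If compiled using “gcc -Wall”, the C code generated from the previous flex
file will produce compiler warnings:
lex.yy.c: … : warning: `yyunput' defined but not used
lex.yy.c: … : warning: `input' defined but not used
• These warnings can be avoided by adding “%option nounput” and
“%option noinput” to the definitions section.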
• Example of a rule with a start condition:
• <STRING>[^"]* { …match string body… }
• [^"] matches any character other than ".
• The rule is active only when the scanner is in the start condition STRING.
• INITIAL refers to the original state where no start conditions are
active.
• A rule prefixed with <*> is active in all start conditions.
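The effect of a start condition can be imitated by hand with a state variable. The C sketch below is not flex output. It is a hypothetical analogue in which SC_INITIAL and SC_STRING stand in for the INITIAL and STRING start conditions, and the comments show the flex rules each branch is assumed to correspond to (the rules that enter and leave STRING are assumptions, not taken from the example above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for flex start conditions. */
typedef enum { SC_INITIAL, SC_STRING } StartCondition;

/*
 * Copy the body of the first double-quoted string in src into out
 * (at most outsz - 1 characters). Returns 1 if a closing quote was
 * seen, 0 otherwise.
 */
int first_string_body(const char *src, char *out, size_t outsz)
{
    StartCondition state = SC_INITIAL;
    size_t n = 0;

    assert(src != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    for (const char *p = src; *p != '\0'; p++) {
        if (state == SC_INITIAL) {
            /* Like a rule  \"  { BEGIN(STRING); }  : enter STRING. */
            if (*p == '"')
                state = SC_STRING;
            /* Other characters are ignored in this sketch. */
        } else {
            if (*p == '"') {
                /* Like  <STRING>\"  { BEGIN(INITIAL); }  : string done. */
                out[n] = '\0';
                return 1;
            }
            /* Like  <STRING>[^"]*  : collect the string body. */
            if (n + 1 < outsz)
                out[n++] = *p;
        }
    }

    out[n] = '\0';
    return 0;  /* input ended without a closing quote */
}
```

The branch that collects the string body runs only while the state is SC_STRING, which mirrors how a rule prefixed with <STRING> is active only in that start condition.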