Flex

Summary: (1) Flex is a tool that generates scanners (lexical analyzers), which read input as a stream of characters and divide it into tokens by matching character sequences against regular expressions. (2) A flex program consists of a definitions section, a rules section, and a user code section. The rules section uses regular expressions to define patterns and associates each pattern with an action to perform when it is matched. (3) Flex generates a C program that implements a lexical analyzer function, yylex(), which scans the input and returns tokens. The user can write additional code to process the tokens.

Uploaded by boshra.xzx
Copyright © All Rights Reserved
flex

Ismaeel Alkrayyan
AI Department, 4th
Lexical analyzer (scanner):
• The first phase of a compiler.
• Reads the source text as a stream of characters and divides it into tokens
by matching sequences of characters against patterns.
• Filters out comments and white space characters such as tab, space, and
newline.

A quick tutorial on fLex 2


Tokens, Patterns, Lexemes

• Token: a group of characters with a logical meaning. The token is the logical
building block of the language. Examples: id, keyword, num.

• Pattern: a rule that describes which characters can be grouped into a
token. It is expressed as a regular expression. The input stream of characters
is matched against the patterns, and tokens are identified.

• Lexeme: the actual text (character sequence) that matches a pattern and is
recognized as a token.
• For example, "int" is recognized as the token keyword: "int" is the lexeme and
keyword is the token.



flex: Overview

Scanner generators:
• Help write programs whose control flow is directed by instances of regular
expressions in the input stream.

Input: a set of regular expressions + actions
→ flex (or lex) →
Output: C code implementing a scanner
• function: yylex()
• file: lex.yy.c



Using flex

lex input spec (regexps + actions) → flex → file lex.yy.c, containing yylex() { … }
lex.yy.c + user-supplied driver code (main() {…} or parser() {…}) → compiler



flex: input format

An input file has the following structure:

definitions
%%          (required)
rules
%%          (optional, together with the user code after it)
user code

Shortest possible legal flex input:

%%



Definitions

/* Options */
%option noyywrap
/* C code, copied verbatim into lex.yy.c */
%{
#include <stdio.h>
#include <stdlib.h>
int line_count = 0;
%}
/* Flex definitions: a name followed by the pattern it stands for */
whitespace   [ \t\v\f\r]+
Newline      [\n]
DIGIT        [0-9]
CommentStart "/*"
ID           [a-zA-Z][a-zA-Z0-9]*
%%
Rules

• The rules portion of the input contains a sequence of rules.


• Each rule has the form
pattern action
where:
• pattern describes a pattern to be matched on the input
• pattern must be un-indented
• action must begin on the same line.



Rules
%%
[0-9]+         { printf("%s is a number\n", yytext); }
{whitespace}   { printf("whitespace encountered\n"); }
{Newline}      { line_count++; }
.              { printf("Mysterious character found\n"); }
%%

Each rule is a pattern followed by an action.
Patterns are extended regular expressions.
Do not place any whitespace at the beginning of a pattern line.
Patterns

• Essentially, extended regular expressions.

• <<EOF>> matches "end of file".

• Character class expressions, written inside brackets: [[:alpha:]], [[:digit:]],
[[:alnum:]], [[:space:]].
• {name}, where name was defined earlier in the definitions section.
• "Start conditions" can be used to specify that a pattern matches only in
specific situations.



Regular Expressions

• The patterns at the heart of every flex scanner use a rich regular
expression language.
• A regular expression is a pattern description written in a metalanguage: a
language that you use to describe what you want the pattern to match.
• The metalanguage uses standard text characters, some of which represent
themselves and others of which (the metacharacters) represent patterns.
• All characters other than the metacharacters, including all letters and
digits, match themselves.

Regular Expressions
Metacharacter   Meaning and example
.               Matches any single character except the newline character (\n).
\               Escapes metacharacters, and forms the usual C escape sequences.
                Examples: \n is a newline; \* is a literal asterisk.
/               Trailing context: matches the regular expression before the
                slash, but only if it is followed by the regular expression
                after the slash. Only one slash is permitted per pattern.
                Example: 0/1 matches the 0 in the string 01, but matches
                nothing in the strings 0 or 02.
Regular Expressions

Metacharacter   Meaning and example
?               Zero or one occurrence of the preceding expression.
                Example: -?[0-9]+
{}              Refers to a previously defined name. Example: {whitespace}
                Specifies a number of occurrences. Example: 1{2}3{4}5{6}
|               Or. Examples: a|b, faith|hope|charity
()              Groups a series of regular expressions together.
                Example: (ab|cd)+
Regular Expressions

Metacharacter   Meaning and example
^               As the first character inside [], means "any character except
                the following ones". Example: [^ab]
                Otherwise, matches the start of a line. Example: ^ab
$               Matches the end of a line. Example: 124$
""              Matches the enclosed characters literally. Example: "^124$"
<<EOF>>         Matches end of file.


Regular Expressions
• A more complex pattern, for a decimal number in which:
• the exponent part is optional;
• the decimal point is optional;
• the sign is optional.

• [-+]?([0-9]*\.?[0-9]+|[0-9]+\.)(E[-+]?[0-9]+)?
Example

A flex program to read a file of (positive) integers and compute the average:

%{
#include <stdio.h>
#include <stdlib.h>
%}
dgt [0-9]
%%
{dgt}+  return atoi(yytext);
%%
int main(void)
{
    int val, total = 0, n = 0;

    while ( (val = yylex()) > 0 ) {
        total += val;
        n++;
    }
    if (n > 0) printf("ave = %d\n", total/n);
    return 0;
}

• definitions: the %{ … %} block of C code, and dgt, a name defined for a digit.
• rules: a rule that matches a number and returns its value to the calling routine.
• user code: the driver code (it could instead have been in a separate file).
Notes on the example:
• dgt [0-9] defines a name in the definitions section; the rule uses it as {dgt}.
• yytext (declared as char *yytext;) is a buffer that holds the input characters
that actually matched the pattern.
• Invoking the scanner, yylex(): each time yylex() is called, the scanner
continues processing the input from where it last left off. It returns 0 on
end-of-file.


Avoiding compiler warnings

• If compiled using "gcc -Wall", the previous flex file generates compiler
warnings:
lex.yy.c: … : warning: 'yyunput' defined but not used
lex.yy.c: … : warning: 'input' defined but not used

• These can be removed using '%option' declarations in the first part (the
definitions section) of the flex input file:
%option nounput
%option noinput



Matching the Input (Handling Ambiguous Patterns)
• When more than one pattern can match the same input, the scanner
behaves as follows:
• it always matches the longest possible string;
• if several rules match the same longest string, the rule listed first in
the flex input file is chosen;
• if no rule matches, the default is to copy the next character to stdout.
• The text that matched (the lexeme) is copied to the buffer yytext.



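Matching the Input (cont'd)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"

Input:

#include <stdio.h> /* definitions */

int main(int argc, char * argv[ ]) {
    if (argc <= 1) {
        printf("Error!\n"); /* no arguments */
    }
    printf("%d args given\n", argc);
    return 0;
}

Because of the longest-match rule, (.|\n)* keeps extending the match as long as
a later */ can still close it. The longest match therefore starts at the /* on
the #include line and ends at the */ after "no arguments". It swallows all the
code in between instead of matching the two comments separately.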



Start Conditions

• Used to activate rules conditionally.

• Any rule prefixed with <S> is active only when the scanner is in start
condition S.
• Declaring a start condition S:
• in the definitions section: %x S
• "%x" declares an exclusive start condition: while the scanner is in S,
only rules prefixed with <S> are active (%s declares an inclusive one,
in which unprefixed rules remain active too).
• Putting the scanner into start condition S:
• in an action: BEGIN(S)



Start Conditions (cont’d)

• Example:
• <STRING>[^"]* { …match string body… }
• [^"] matches any character other than "
• The rule is activated only if the scanner is in the start condition STRING.
• INITIAL refers to the original state where no start conditions are
active.
• <*> matches all start conditions.



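Using Start Conditions

• Start conditions let us explicitly simulate finite state machines.
• This lets us get around the "longest match" problem for C-style
comments.

FSA for C comments:
• INITIAL --"/"--> S1 --"*"--> S2
• S2 stays in S2 on any character other than "*"; S2 --"*"--> S3
• S3 stays in S3 on "*"; S3 --"/"--> INITIAL (end of comment);
S3 --any character other than "/" or "*"--> S2

flex input:

%x S1 S2 S3
%%
"/"          BEGIN(S1);
<S1>"*"      BEGIN(S2);
<S1>.|\n     { /* not a comment after all: output the slash, rescan this character */ yyless(0); putchar('/'); BEGIN(INITIAL); }
<S2>[^*]     ; /* stay in S2 */
<S2>"*"      BEGIN(S3);
<S3>"*"      ; /* stay in S3 */
<S3>[^*/]    BEGIN(S2);
<S3>"/"      BEGIN(INITIAL);

Characters outside comments match no rule in INITIAL, so the default rule copies
them to the output.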




Putting it all together

• The scanner is implemented as a function:

int yylex();
• the return value indicates the type of token found (encoded as a positive
integer; 0 signals end of input);
• the actual string matched is available in yytext.
• The scanner and parser need to agree on token type encodings:
• let yacc generate the token type encodings;
• yacc places these in the file y.tab.h (when run with -d);
• use #include "y.tab.h" in the definitions section of the flex input file
(inside the %{ … %} block).
• When compiling, link in the flex library using "-lfl" (it supplies default
versions of main() and yywrap()).
