Summer 2012 June 27, 2012
flex In A Nutshell
Handout written by Julie Zelenski with minor edits by Keith Schwarz.
flex is a fast lexical analyzer generator. You specify the scanner you want in the form
of patterns to match and actions to apply for each token. flex takes your specification
and generates a combined NFA to recognize all your patterns, converts it to an
equivalent DFA, minimizes the automaton as much as possible, and generates C code
that will implement it. flex is modeled on an earlier tool, lex, designed by Lesk and
Schmidt, and bears many similarities to it. While we will use flex in the course, almost
all of the features we use are also present in the original lex.
This handout is designed to give you a quick introduction to the flex tool. This should
serve as a useful reference for the first programming assignment. However, you should
be aware that in order to complete the first assignment, you may need to use some more
advanced features of flex that aren't covered here. To learn more about flex, run info
flex, or read the documentation at http://flex.sourceforge.net/manual/.
How It Works
flex is designed for use with C code and generates a scanner written in C. The scanner
is specified using regular expressions for patterns and C code for the actions. The
specification files are traditionally identified by their .l extension. You invoke flex on
a .l file and it creates lex.yy.c, a source file containing a wad of unrecognizable C
code that implements an FA encoding all your rules, along with the code for the actions
you specified. The file provides an extern function yylex() that will scan one token.
You compile that C file normally, link with the lex library, and you have built a
scanner! The scanner reads from stdin and writes to stdout by default.
Linking with the lex library provides a simple main that repeatedly calls yylex
until it reaches EOF. You can also compile and link the scanner into your project and use
your own main to control when tokens are scanned. The Makefiles we provide for the
projects will execute the compilation steps for you, but it is worthwhile to understand
the steps required.
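If you take the second route and supply your own main, the driver can be as simple as a
loop that calls yylex until it reports end of input. The sketch below is just that, a
sketch: it assumes the convention used later in this handout, where each rule's action
returns a nonzero integer token code, and it relies on yylex returning 0 once the input
is exhausted. (You would still link against the lex library, or otherwise provide
yywrap.)

/* main.c: a minimal hand-written driver for the generated scanner */
#include <stdio.h>

extern int yylex(void);   /* generated in lex.yy.c */
extern char *yytext;      /* text of the most recently matched lexeme */

int main()
{
    int token;
    while ((token = yylex()) != 0) {   /* 0 signals end of input */
        printf("token %d: \"%s\"\n", token, yytext);
    }
    return 0;
}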
A flex Input File
flex input files are structured as follows:
%{
Declarations
%}
Definitions
%%
Rules
%%
User subroutines
The optional Declarations and User subroutines sections are used for ordinary C
code that you want copied verbatim to the generated C file. Declarations are copied to
the top of the file, user subroutines to the bottom. The optional Definitions section is
where you specify options for the scanner and can set up definitions to give names to
regular expressions as a simple substitution mechanism that allows for more readable
entries in the Rules section that follows. The required Rules section is where you
specify the patterns that identify your tokens and the action to perform upon
recognizing each token.
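To make the layout concrete, here is a sketch of a tiny but complete input file that
touches all four parts. The names DIGIT, numCount, and numbersSeen are purely
illustrative:

%{
/* Declarations: ordinary C copied verbatim to the top of lex.yy.c */
#include <stdio.h>
int numCount = 0;               /* counter used by an action below */
%}

/* Definitions: give the name DIGIT to a pattern for reuse in the rules */
DIGIT    [0-9]

%%

{DIGIT}+      { numCount++; printf("number: %s\n", yytext); }
[ \t\n]+      { /* skip whitespace */ }
.             { ECHO; }

%%

/* User subroutines: ordinary C copied verbatim to the bottom of lex.yy.c */
int numbersSeen() { return numCount; }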
flex Rules
A rule has a regular expression (called the pattern) and an associated set of C statements
(called the action). The idea is that whenever the scanner reads an input sequence that
matches a pattern, it executes the associated action to process it. Attaching arbitrary C
actions to patterns is what makes the tool useful in so many different contexts.
In specifying patterns, flex supports a fairly rich set of conveniences (character classes,
specific repetition, etc.) beyond our formal language definition of a regular expression.
These features don't add expressive power, but simply allow you to construct
complicated patterns more succinctly. The table below shows some operators to give you
an idea of what is available. For more details, see the web or man pages.
Nonzero repetition    x+      x repeated one or more times; equivalent to xx*.
Specified repetition  x{n,m}  x repeated between n and m times.
Beginning of line     ^x      Match x at the beginning of a line only.
End of line           x$      Match x at the end of a line only.
Context sensitivity   ab/cd   Match ab, but only when followed by cd. The
                              lookahead characters cd are left in the input
                              stream to be read as part of the next token.
Literal strings       "x"     This means x even if x would normally have
                              special meaning. Thus, "x*" may be used to
                              match x followed by an asterisk. You can turn off
                              the special meaning of just one character by
                              preceding it with a backslash, e.g. \. matches
                              exactly the period character and nothing more.
Definitions           {name}  Replace with the earlier-defined pattern called
                              name. This kind of substitution allows you to
                              reuse pattern pieces and define more readable
                              patterns.
As the scanner reads characters from the file, it will gather them until it forms the
longest possible match for any of the available patterns. If two or more patterns match
an equally long sequence, the pattern listed first in the file is used.
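To make both principles concrete, consider a fragment with a keyword rule listed before
an identifier rule; the token codes here are hypothetical, in the style of Example 3
later in this handout:

"for"      { return T_For;        /* input "for": both rules match 3 chars, first rule wins */ }
[a-z]+     { return T_Identifier; /* input "fortune": this matches 7 chars, longest match wins */ }

Reversing the order of the two rules would cause the keyword for to be scanned as an
ordinary identifier.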
The code that you include in the actions depends on what processing you are trying to
do with each token. Perhaps the only action necessary is to print the matched token or
add it to a table, or to ignore it entirely, as in the case of whitespace or comments.
For a scanner designed to be used by a compiler, the action will usually record the token
attributes and return a code that identifies the token type.
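For example, an action with no return statement simply discards the match and lets yylex
keep scanning, while an action that returns hands the token back to the caller. In this
hedged sketch, the token code T_IntConstant is hypothetical, and yylval is the global
variable described in the next section:

[ \t]+      { /* whitespace: no return, so the scanner keeps going */ }
"//".*      { /* a single-line comment is likewise discarded */ }
[0-9]+      { yylval.integerConstant = atoi(yytext);
              return T_IntConstant;   /* report this token to the caller */ }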
flex Global Variables
The token-grabbing function yylex takes no arguments and returns an integer. Often
more information is needed about the token just read than that one integer code. The
usual way information about the token is communicated back to the caller is by having
the scanner set the contents of a global variable which can be read by the caller. After
counseling you for years that globals are absolute evil, we reluctantly sanction their
limited use here, because our tools require us to use them. Here are the specific global
variables used:
• yytext is a null-terminated string containing the text of the lexeme just
recognized as a token. This global variable is declared and managed in the
lex.yy.c file. Do not modify its contents. The buffer is overwritten with each
subsequent token, so you must make your own copy of a lexeme you need to
store more permanently.
• yyleng is an integer holding the length of the lexeme stored in yytext. This
global variable is declared and managed in the lex.yy.c file. Do not modify its
contents.
• yylval is the global variable used to store attributes about the token, e.g., for an
integer lexeme it might store the value; for a string literal, a pointer to its
characters; and so on. This variable is declared to be of type YYSTYPE, which is
usually a union of all the various fields needed for different token types. If you
are using a parser generator (such as yacc or bison), it will define this type for
you; otherwise, you must provide the definition yourself. Your scanner actions
should appropriately set the contents of the variable for each token.
• yylloc is the global variable that is used to store the location (line and column)
of the token. This variable is declared to be of type YYLTYPE. Again, the parser
generator can provide this or it may be your responsibility. Your scanner actions
should appropriately set the contents of the variable for each token.
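If no parser generator is supplying these types, a hand-rolled version might look
roughly like the following. The field names are only illustrative (they match Example 3
below), and the exact layout is up to you:

/* A sketch of hand-written definitions for when yacc/bison is not providing them */
typedef union {
    int   integerConstant;    /* value of an integer lexeme */
    char *identifier;         /* heap-allocated copy of an identifier's name */
} YYSTYPE;

typedef struct {
    int first_line, first_column;
    int last_line,  last_column;
} YYLTYPE;

YYSTYPE yylval;    /* the scanner actions fill these in ...                   */
YYLTYPE yylloc;    /* ... and the caller reads them after each call to yylex */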
Example 1
Here is a simple and complete specification for a scanner that replaces all numbers in a
stream of text with a question mark. It might be useful, for example, if you were a
particularly unscrupulous accountant:
%%
[0-9]+ printf("?");
. ECHO;
The first %% marks the beginning of the rules section, the only section required in the
input file. The pattern for the first rule matches any sequence of digits and the
associated action prints a question mark instead of the number itself. The second rule
matches any remaining character and uses the standard action ECHO (which just prints
the character unchanged). To build and run this program, you could use the following commands:
% flex hide-digits.l
% gcc -o hide-digits lex.yy.c -ll
% ./hide-digits
... at this point anything you type, the scanner echoes after ...
... replacing numbers with question-marks ...
Example 2
The following flex input file has all three sections: a definitions section (where you can
define substitutions, set up global variables, etc.), the rules section, and the user
subroutines section (where you can define helper functions). This scanner includes its
own main rather than using the one supplied by the flex library. What does this
program do?
%{
int numChars = 0, numWords = 0, numLines = 0;
%}
%%
\n {numLines++; numChars++;}
[^ \t\n]+ {numWords++; numChars += yyleng;}
. {numChars++;}
%%
int main() {
    yylex();
    printf("%d\t%d\t%d\n", numChars, numWords, numLines);
}
You can build and execute this scanner from the command line as follows:
% flex count.l
% gcc -o count lex.yy.c -ll
% ./count < count.l
243     34      17
Example 3
The following shows an excerpt of a scanner configured for use in a compiler. When the
scanner finds a token, it stores information about that token in the global variable
yylval, then returns a predefined token code to inform the compiler of the token type
just scanned. There's obviously a lot more that needs to be here, but that's what you get
to do for your first programming project!
%%
[+>;] { return yytext[0]; /* use ASCII code for single-char token */}
"for" { return T_For; }
[0-9]+  { yylval.integerConstant = atoi(yytext);
          return T_IntConstant; }
[a-z]+  { yylval.identifier = strdup(yytext);
          return T_Identifier; }
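The excerpt assumes that token codes such as T_For, T_IntConstant, and T_Identifier are
defined elsewhere, typically in a header shared with the rest of the compiler (or
generated for you by a parser generator). A hypothetical version of such a header might
look like this:

/* scanner.h (hypothetical): token codes start above 255 so they can never
 * collide with single-character tokens returned as their own ASCII values */
typedef enum {
    T_For = 256,
    T_IntConstant,
    T_Identifier
} TokenType;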
Bibliography
The Flex Project. “Lexical Analysis with Flex.” Accessed Online, 19 Jun 2011. URL:
http://flex.sourceforge.net/manual/
T. Mason and D. Brown. lex & yacc. Sebastopol, CA: O'Reilly & Associates, 1990.