LEX and YACC

The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner
generators
• Scanner generators systematically translate
regular definitions into C source code for
efficient scanning
• Generated code is easy to integrate in C
applications

Creating a Lexical Analyzer with Lex and Flex
lex source program (lex.l) → lex (or flex) → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens

Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
Regular Expressions in Lex
x         match the character x
\.        match the character .
"string"  match contents of string of characters
.         match any character except newline
^         match beginning of a line
$         match the end of a line
[xyz]     match one character x, y, or z (use \ to escape -)
[^xyz]    match any character except x, y, and z
[a-z]     match one of a to z
r*        closure (match zero or more occurrences)
r+        positive closure (match one or more occurrences)
r?        optional (match zero or one occurrence)
r1r2      match r1 then r2 (concatenation)
r1|r2     match r1 or r2 (union)
(r)       grouping
r1/r2     match r1 when followed by r2
{d}       match the regular expression defined by d
Example Lex Specification 1
%{
#include <stdio.h>
%}
%%
    /* translation rules */
[0-9]+  { printf("%s\n", yytext); /* yytext contains the matching lexeme */ }
.|\n    { }
%%
int main()
{ yylex();   /* invokes the lexical analyzer */
  return 0;
}

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
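
To make the behaviour of the two rules concrete, here is a hand-written C sketch of roughly the same scan. It is not the code that lex generates (lex emits a table-driven automaton); the function name scan_digits and its buffer interface are assumptions made for illustration, and it reads from a string rather than from standard input.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies each maximal run of digits in `input`
   into `out`, one lexeme per line, and returns how many were found.
   Mirrors the two rules of the example: [0-9]+ prints, .|\n discards. */
size_t scan_digits(const char *input, char *out, size_t out_size)
{
    size_t count = 0, used = 0;
    const char *p = input;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    while (*p != '\0') {
        if (isdigit((unsigned char)*p)) {
            const char *start = p;
            while (isdigit((unsigned char)*p))   /* take the longest match */
                p++;
            size_t len = (size_t)(p - start);
            if (used + len + 2 > out_size)       /* lexeme + '\n' + '\0' */
                break;
            memcpy(out + used, start, len);
            used += len;
            out[used++] = '\n';
            out[used] = '\0';
            count++;
        } else {
            p++;                                 /* rule .|\n: ignore */
        }
    }
    return count;
}
```

Calling scan_digits on the text "ab12 c345" with a sufficiently large buffer leaves the two lexemes 12 and 345 in the buffer, each followed by a newline, which is what the lex program would print.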
Example Lex Specification 2
%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
/* regular definition */
delim   [ \t]+
%%
    /* translation rules */
\n          { ch++; wd++; nl++; }
^{delim}    { ch+=yyleng; }
{delim}     { ch+=yyleng; wd++; }
.           { ch++; }
%%
int main()
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
  return 0;
}
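
The counting rules can be checked by hand with a plain C mirror of the four patterns. This is a sketch under assumptions: the struct counts type and the count_text function are hypothetical names, and the input is a string instead of standard input.

```c
#include <stddef.h>

struct counts { int nl, wd, ch; };   /* hypothetical result type */

/* Applies the four rules of the example to a string. */
struct counts count_text(const char *s)
{
    struct counts c = {0, 0, 0};
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == '\n') {                        /* rule: \n */
            c.ch++; c.wd++; c.nl++;
            i++;
        } else if (s[i] == ' ' || s[i] == '\t') {  /* rules: ^{delim}, {delim} */
            int at_line_start = (i == 0 || s[i - 1] == '\n');
            size_t start = i;
            while (s[i] == ' ' || s[i] == '\t')
                i++;
            c.ch += (int)(i - start);              /* ch += yyleng */
            if (!at_line_start)
                c.wd++;                            /* whitespace ends a word */
        } else {                                   /* rule: . */
            c.ch++;
            i++;
        }
    }
    return c;
}
```

Because the rules add a word at every newline and at every whitespace run that is not at the start of a line, a blank line or a trailing blank before a newline increases wd even though no word was read. The sketch reproduces that behaviour rather than correcting it.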
Example Lex Specification 3
%{
#include <stdio.h>
%}
/* regular definitions */
digit   [0-9]
letter  [A-Za-z]
id      {letter}({letter}|{digit})*
%%
    /* translation rules */
{digit}+   { printf("number: %s\n", yytext); }
{id}       { printf("ident: %s\n", yytext); }
.          { printf("other: %s\n", yytext); }
%%
int main()
{ yylex();
  return 0;
}
Example Lex Specification 4
%{ /* definitions of manifest constants */
#define LT (256)
…
%}
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}     { }
if       {return IF; /* return token to parser */ }
then     {return THEN;}
else     {return ELSE;}
{id}     {yylval = install_id(); return ID; /* yylval holds the token attribute */ }
{number} {yylval = install_num(); return NUMBER;}
"<"      {yylval = LT; return RELOP;}
"<="     {yylval = LE; return RELOP;}
"="      {yylval = EQ; return RELOP;}
"<>"     {yylval = NE; return RELOP;}
">"      {yylval = GT; return RELOP;}
">="     {yylval = GE; return RELOP;}
%%
int install_id()   /* install yytext as identifier in symbol table */
…

– Writing a parser with YACC (Yet Another Compiler Compiler).
• Automatically generates a parser for a context-free grammar (LALR parser)
– Allows syntax-directed translation by writing grammar productions and semantic actions
– For most practical grammars, LALR(1) is more powerful than LL(1); for example, it handles left-recursive productions directly.
• Works with lex: YACC calls yylex to get the next token.
– YACC and lex must agree on the values for each token.
• Like lex, YACC predates C++, so some constructs need a workaround when using C++ (an example will be given).
• YACC file format:
declarations /* specify tokens and non-terminals */
%%
translation rules /* specify grammar here */
%%
supporting C routines
• The command "yacc yaccfile" produces y.tab.c, which contains a routine yyparse().
– yyparse() calls yylex() to get tokens.
• yyparse() returns 0 if the program is grammatically correct, non-zero otherwise.
• The declarations part specifies tokens, non-terminal symbols, and other C/C++ constructs.
– To declare the tokens AAA and BBB:
• %token AAA BBB
– To assign a token number to a token (needed when using lex), write a nonnegative integer immediately after the first appearance of the token:
• %token EOFnumber 0
• %token SEMInumber 101
– Non-terminals do not need to be declared unless you want to associate them with a type to store attributes (discussed later).
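
The reason lex and YACC must agree on token numbers is that yylex() returns a plain integer that yyparse() compares against its own codes. With the numbered declarations above, a header generated by "yacc -d" would typically carry matching definitions; the sketch below writes them out by hand. The token_name helper is hypothetical and only serves to show the codes in use.

```c
#include <stddef.h>

/* Token codes as the %token declarations above would fix them.
   In a real build these come from the generated header, not from
   hand-written defines. */
#define EOFnumber   0
#define SEMInumber  101

/* Hypothetical helper for debugging output: maps a code returned by
   the scanner back to the token's name. */
const char *token_name(int code)
{
    switch (code) {
    case EOFnumber:  return "EOFnumber";
    case SEMInumber: return "SEMInumber";
    default:         return NULL;   /* code unknown to this sketch */
    }
}
```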
• Translation rules specify the grammar productions
exp : exp PLUSnumber exp
    | exp MINUSnumber exp
    | exp TIMESnumber exp
    | exp DIVIDEnumber exp
    | LPARENnumber exp RPARENnumber
    | ICONSTnumber
    ;
Equivalently, each alternative can be written as a separate rule:
exp : exp PLUSnumber exp
    ;
exp : exp MINUSnumber exp
    ;
• Yacc environment
– Yacc processes a yacc specification file and produces a y.tab.c file.
– An integer function yyparse() is produced by Yacc.
• Calls yylex() to get tokens.
• Return non-zero when an error is found.
• Return 0 if the program is accepted.
– Need main() and yyerror() functions.
– Example:
int yyerror(const char *str)
{ /* yyline: global line counter (see the note on global variables) */
  printf("yyerror: %s at line %d\n", str, yyline);
  return 0;
}
int main()
{
  if (!yyparse()) {printf("accept\n");}
  else printf("reject\n");
  return 0;
}
• lex.yy.c and y.tab.c may be compiled separately, or yacc
file may just include lex.yy.c as in example0.y
• Global variables such as yyline, yycolumn, and yylval can
be used in yacc routines.
– YACC automatically builds a parser for the grammar (LALR
parser).
• May have shift/reduce and reduce/reduce conflicts when the
grammar is not LALR
– In this case, you will need to modify grammar to make it LALR in order
for yacc to work properly.
• YACC tries to resolve conflicts automatically
– Default conflict resolution:
» shift/reduce --> shift
» reduce/reduce --> first production in the state
• ‘yacc -v *.y’ will generate a report in file ‘y.output’.
– Resolving conflicts
• modify the grammar. Use precedence and
associativity of operators.
– Using keywords %left, %right, %nonassoc in the
declarations section.
» All tokens on the same line are the same precedence
level and associativity.
» The lines are listed in order of increasing precedence.
• Attribute grammar with yacc
– Each symbol can be associated with some
attributes.
• Data structure of the attributes can be specified in the union in the
declarations. (see example4.y).

%union {
int semantic_value;
}
%token <semantic_value> INTEGERCONST 2
%type <semantic_value> exp
%type <semantic_value> term
%type <semantic_value> item

• The union is used to define yylval (you do not need to declare it again, and you can use yylval.semantic_value directly in the lex code).
• Semantic actions
– Semantic actions associated with productions can be specified.

item : LPARENnumber exp RPARENnumber
         {$$ = $2;}
     | ICONSTnumber
         {$$ = $1;}
     ;
• $$ is the attribute associated with the left-hand side of the production
• $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, …
• Semantic actions
– A semantic action can appear anywhere in a production; an action in the middle of a production is also counted as a symbol when numbering $1, $2, …

item : LPARENnumber {cout << "debug";} exp RPARENnumber
         {$$ = $3;}
     | ICONSTnumber
         {$$ = $1;}
     ;
ANTLR, Yacc, and Bison
• ANTLR tool
– Generates LL(k) parsers
• Yacc (Yet Another Compiler Compiler)
– Generates LALR parsers
• Bison
– Improved version of Yacc

Creating an LALR(1) Parser with Yacc/Bison
yacc specification (yacc.y) → Yacc or Bison compiler → y.tab.c
y.tab.c → C compiler → a.out
input stream → a.out → output stream

Yacc Specification
• A yacc specification consists of three parts:
yacc declarations, and C declarations within %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are productions with actions:
production1 { semantic action1 }
production2 { semantic action2 }
…
productionn { semantic actionn }

Writing a Grammar in Yacc
• Productions in Yacc are of the form
Nonterminal : tokens/nonterminals { action }
            | tokens/nonterminals { action }
            …
            ;
• Tokens that are single characters can be used
directly within productions, e.g. '+'
• Named tokens must be declared first in the
declaration part using
%token TokenName
Synthesized Attributes
• Semantic actions may refer to values of the
synthesized attributes of terminals and
nonterminals in a production:
X : Y1 Y2 Y3 … Yn { action }
– $$ refers to the value of the attribute of X
– $i refers to the value of the attribute of Yi
• For example
factor : '(' expr ')' { $$=$2; }
[Figure: parse tree for factor → ( expr ), in which $$=$2 copies expr.val=x up to factor.val=x]
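
One way to picture $$ and $i is as slots on a value stack that runs in parallel with the parser's symbol stack. The sketch below is a simplified model under assumptions, not the code Yacc generates: the struct value_stack type and the functions push_value and reduce_paren are hypothetical names, and attributes are plain ints.

```c
#include <stddef.h>

#define STACK_MAX 64

/* Hypothetical value stack: one attribute slot per grammar symbol
   currently on the parser's stack. */
struct value_stack {
    int val[STACK_MAX];
    size_t top;               /* number of slots in use */
};

void push_value(struct value_stack *s, int v)
{
    if (s->top < STACK_MAX)
        s->val[s->top++] = v;
}

/* Reduce by  factor : '(' expr ')'  { $$ = $2; }
   The three right-hand-side slots are $1, $2 and $3; they are popped
   and replaced by a single slot holding $$. */
void reduce_paren(struct value_stack *s)
{
    if (s->top < 3)
        return;               /* a real parser never reduces here */
    int *rhs = &s->val[s->top - 3];   /* rhs[0] is $1, rhs[1] is $2 */
    int result = rhs[1];              /* $$ = $2 */
    s->top -= 3;
    push_value(s, result);
}
```

After pushing slots for '(', an expr with value 7, and ')', a call to reduce_paren leaves a single slot with value 7, which is the attribute of factor.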
Example 1
%{
#include <ctype.h>
#include <stdio.h>
int yylex(void);
%}
%token DIGIT   /* also results in the definition #define DIGIT xxx */
%%
line : expr '\n' { printf("= %d\n", $1); }
 ;
expr : expr '+' term { $$ = $1 + $3; }
 | term { $$ = $1; }
 ;
term : term '*' factor { $$ = $1 * $3; /* $$: attribute of term (parent); $3: attribute of factor (child) */ }
 | factor { $$ = $1; }
 ;
factor : '(' expr ')' { $$ = $2; }
 | DIGIT { $$ = $1; /* $1: attribute of token (stored in yylval) */ }
 ;
%%
/* Example of a very crude lexical analyzer invoked by the parser */
int yylex()
{ int c = getchar();
  if (isdigit(c))
  { yylval = c-'0';
    return DIGIT;
  }
  return c;
}
Dealing With Ambiguous Grammars
• By defining operator precedence levels and
left/right associativity of the operators, we can
specify ambiguous grammars in Yacc, such as
E → E+E | E-E | E*E | E/E | (E) | -E | num
• To define precedence levels and associativity in
Yacc’s declaration part:
%left '+' '-'
%left '*' '/'
%right UMINUS
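
The effect of these declarations can be imitated by hand with precedence climbing, a different technique from the table-driven conflict resolution that Yacc performs, shown here only to make the resulting grouping visible. The function names (evaluate, parse_expr, parse_operand) are assumptions, numbers are read with strtod, and error handling is omitted.

```c
#include <stdlib.h>

/* Binding strength, mirroring the declaration order:
   %left '+' '-' (lowest), then %left '*' '/'.
   Unary minus (UMINUS) is handled in parse_operand and binds tightest. */
static int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not a binary operator */
    }
}

static void skip_blanks(const char **p)
{
    while (**p == ' ')
        (*p)++;
}

static double parse_expr(const char **p, int min_prec);

/* num | '(' E ')' | '-' E with UMINUS precedence */
static double parse_operand(const char **p)
{
    skip_blanks(p);
    if (**p == '-') {
        (*p)++;
        return -parse_operand(p);   /* tighter than any binary operator */
    }
    if (**p == '(') {
        (*p)++;
        double v = parse_expr(p, 1);
        skip_blanks(p);
        if (**p == ')')
            (*p)++;
        return v;
    }
    char *end;
    double v = strtod(*p, &end);
    *p = end;
    return v;
}

/* Parses binary operators whose precedence is at least min_prec. */
static double parse_expr(const char **p, int min_prec)
{
    double lhs = parse_operand(p);
    for (;;) {
        skip_blanks(p);
        char op = **p;
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec)
            break;
        (*p)++;
        /* Left associativity: the right operand may only contain
           operators that bind strictly tighter. */
        double rhs = parse_expr(p, prec + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
    return lhs;
}

double evaluate(const char *text)
{
    return parse_expr(&text, 1);
}
```

With this grouping, 8-3-2 is read as (8-3)-2 because '-' is left associative, 1+2*3 as 1+(2*3) because '*' is declared after '+', and -2+3 as (-2)+3 because the unary minus binds tightest.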
Example 2
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* double type for attributes and yylval */
int yylex(void);
int yyerror(const char *s);
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines : lines expr '\n' { printf("= %g\n", $2); }
 | lines '\n'
 | /* empty */
 ;
expr : expr '+' expr { $$ = $1 + $3; }
 | expr '-' expr { $$ = $1 - $3; }
 | expr '*' expr { $$ = $1 * $3; }
 | expr '/' expr { $$ = $1 / $3; }
 | '(' expr ')' { $$ = $2; }
 | '-' expr %prec UMINUS { $$ = -$2; }
 | NUMBER
 ;
%%
Example 2 (cont'd)
/* Crude lexical analyzer for fp doubles and arithmetic operators */
int yylex()
{ int c;
  while ((c = getchar()) == ' ')
    ;
  if ((c == '.') || isdigit(c))
  { ungetc(c, stdin);
    scanf("%lf", &yylval);
    return NUMBER;
  }
  return c;
}
/* Run the parser */
int main()
{ if (yyparse() != 0)
    fprintf(stderr, "Abnormal exit\n");
  return 0;
}
/* Invoked by parser to report parse errors */
int yyerror(const char *s)
{ fprintf(stderr, "Error: %s\n", s);
  return 0;
}
Combining Lex/Flex with Yacc/Bison
yacc specification (yacc.y) → Yacc or Bison compiler → y.tab.c, y.tab.h
Lex specification (lex.l) and token definitions (y.tab.h) → Lex or Flex compiler → lex.yy.c
lex.yy.c, y.tab.c → C compiler → a.out
input stream → a.out → output stream
Lex Specification for Example 2
%option noyywrap
%{
#define YYSTYPE double
#include "y.tab.h"      /* generated by Yacc, contains #define NUMBER xxx */
extern double yylval;   /* defined in y.tab.c */
%}
number [0-9]+\.?|[0-9]*\.[0-9]+
%%
[ ]      { /* skip blanks */ }
{number} { sscanf(yytext, "%lf", &yylval);
           return NUMBER;
         }
\n|.     { return yytext[0]; }

With yacc and lex:
yacc -d example2.y
lex example2.l
gcc y.tab.c lex.yy.c
./a.out

With bison and flex:
bison -d -y example2.y
flex example2.l
gcc y.tab.c lex.yy.c
./a.out
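
To see which inputs the number definition accepts, the pattern [0-9]+\.?|[0-9]*\.[0-9]+ can be traced with a small hand-written matcher. This is a sketch for illustration only; the function name match_number is an assumption, and the scanner produced by lex or flex works from generated tables instead.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s that matches
   [0-9]+\.?|[0-9]*\.[0-9]+ , or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    size_t i = 0, int_digits, frac_digits;

    while (isdigit((unsigned char)s[i]))
        i++;
    int_digits = i;

    if (s[i] == '.') {
        size_t j = i + 1;
        while (isdigit((unsigned char)s[j]))
            j++;
        frac_digits = j - (i + 1);
        if (frac_digits > 0)
            return j;          /* second alternative: [0-9]*\.[0-9]+ */
        if (int_digits > 0)
            return i + 1;      /* first alternative with the optional dot */
        return 0;              /* a lone "." is not a number */
    }
    return int_digits;         /* digits only, or no match at all */
}
```

A lone "." therefore does not match the number rule; in the specification above it falls through to the \n|. rule and is returned to the parser as a single-character token.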
