LEX and YACC
Creating a Lexical Analyzer with
Lex and Flex
lex source program (lex.l) → lex (or flex) → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
Regular Expressions in Lex
x match the character x
\. match the character .
"string" match contents of string of characters
. match any character except newline
^ match beginning of a line
$ match the end of a line
[xyz] match one character x, y, or z (use \ to escape -)
[^xyz] match any character except x, y, and z
[a-z] match one of a to z
r* closure (match zero or more occurrences)
r+ positive closure (match one or more occurrences)
r? optional (match zero or one occurrence)
r1r2 match r1 then r2 (concatenation)
r1|r2 match r1 or r2 (union)
(r) grouping
r1/r2 match r1 when followed by r2
{d} match the regular expression defined by d
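Most of the operators in this table can be tried outside lex. The sketch below uses the POSIX `regcomp`/`regexec` functions with extended regular expressions; the helper name `re_matches` is made up for this illustration. Note that the lex-only forms ("string" quoting, r1/r2 lookahead, and {d} definitions) are not part of POSIX syntax, so only the shared operators are exercised here.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the POSIX extended regular
   expression `pattern` matches somewhere in `text`, 0 if it does not,
   and -1 if the pattern fails to compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `re_matches("^[0-9]+$", "2024")` succeeds, while the same pattern fails on `"20x4"`. Unlike lex, `regexec` searches for a match anywhere in the text, which is why the patterns are anchored with ^ and $.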
Example Lex Specification 1
%{
#include <stdio.h>
%}
%%
    /* translation rules */
[0-9]+  { printf("%s\n", yytext); /* yytext contains the matching lexeme */ }
.|\n    { }
%%
int main(void)
{ yylex();    /* invokes the lexical analyzer */
  return 0;
}

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
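To make the behaviour of this specification concrete, here is a hand-written C sketch of roughly what the generated scanner does with these two rules. The function name `print_digit_runs` is invented for this illustration, and it scans a string rather than standard input.

```c
#include <ctype.h>
#include <stdio.h>

/* Hand-written stand-in for the rule  [0-9]+ { printf("%s\n", yytext); }
   Prints every maximal run of digits in `s` on its own line to `out`
   and returns how many runs were printed. All other characters are
   skipped, like the empty action for .|\n in the specification. */
int print_digit_runs(const char *s, FILE *out)
{
    int count = 0;
    while (*s != '\0') {
        if (isdigit((unsigned char)*s)) {
            const char *start = s;
            while (isdigit((unsigned char)*s))
                s++;                          /* take the longest match */
            fprintf(out, "%.*s\n", (int)(s - start), start);
            count++;
        } else {
            s++;                              /* .|\n : ignore */
        }
    }
    return count;
}
```

Calling `print_digit_runs("ab12 c345\n6", stdout)` prints 12, 345, and 6 on separate lines and returns 3.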
Example Lex Specification 2
%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
/* regular definition */
delim     [ \t]+
%%
    /* translation rules */
\n        { ch++; wd++; nl++; }
^{delim}  { ch+=yyleng; }
{delim}   { ch+=yyleng; wd++; }
.         { ch++; }
%%
int main(void)
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
  return 0;
}
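The counting rules can be traced by hand. The following C sketch applies the same four rules to a string; `struct counts`, `delim_len`, and `count_text` are names invented for this illustration, not part of lex.

```c
#include <stddef.h>

struct counts { int ch, wd, nl; };

/* Length of the run of blanks and tabs at s (the `delim` definition). */
static size_t delim_len(const char *s)
{
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t')
        n++;
    return n;
}

/* Applies the four rules of the specification to `s` by hand. */
struct counts count_text(const char *s)
{
    struct counts c = {0, 0, 0};
    int at_line_start = 1;
    while (*s != '\0') {
        size_t n = delim_len(s);
        if (*s == '\n') {                  /* \n        */
            c.ch++; c.wd++; c.nl++;
            s++;
            at_line_start = 1;
            continue;
        }
        if (n > 0) {
            c.ch += (int)n;                /* both delim rules count characters */
            if (!at_line_start)
                c.wd++;                    /* {delim} ends a word; ^{delim} does not */
            s += n;
        } else {
            c.ch++;                        /* .         */
            s++;
        }
        at_line_start = 0;
    }
    return c;
}
```

For `"hello world\n"` this gives 1 line, 2 words, and 12 characters. Because a word is counted at every newline and at every run of blanks that is not at the start of a line, input such as `"a \n"` is counted as two words; the specification shares this limitation.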
Example Lex Specification 3
%{
#include <stdio.h>
%}
/* regular definitions */
digit     [0-9]
letter    [A-Za-z]
id        {letter}({letter}|{digit})*
%%
    /* translation rules */
{digit}+  { printf("number: %s\n", yytext); }
{id}      { printf("ident: %s\n", yytext); }
.         { printf("other: %s\n", yytext); }
%%
int main(void)
{ yylex();
  return 0;
}
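Lex picks the longest match at the current position and, on a tie, the rule listed first. A rough hand-coded equivalent of the three rules is sketched below; the `enum kind` names and `next_token` are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

enum kind { END, NUMBER_TOK, IDENT_TOK, OTHER_TOK };

/* Classifies the token at the start of `s` and stores its length in *len.
   Unlike lex, where "." never matches a newline, this sketch reports a
   newline as OTHER_TOK. */
enum kind next_token(const char *s, size_t *len)
{
    size_t n = 0;
    if (s[0] == '\0') { *len = 0; return END; }
    if (isdigit((unsigned char)s[0])) {            /* {digit}+ */
        while (isdigit((unsigned char)s[n])) n++;
        *len = n;
        return NUMBER_TOK;
    }
    if (isalpha((unsigned char)s[0])) {            /* {id} */
        while (isalnum((unsigned char)s[n])) n++;
        *len = n;
        return IDENT_TOK;
    }
    *len = 1;                                      /* . */
    return OTHER_TOK;
}
```

On `"42abc"` the first call reports a number of length 2; the identifier `abc` is found by the next call, which mirrors how the lex rules split that input.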
Example Lex Specification 4
%{ /* definitions of manifest constants */
#define LT (256)
…
%}
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}      { }
if        {return IF; /* return token to parser */}
then      {return THEN;}
else      {return ELSE;}
{id}      {yylval = install_id(); /* token attribute */ return ID;}
{number}  {yylval = install_num(); return NUMBER;}
"<"       {yylval = LT; return RELOP;}
"<="      {yylval = LE; return RELOP;}
"="       {yylval = EQ; return RELOP;}
"<>"      {yylval = NE; return RELOP;}
">"       {yylval = GT; return RELOP;}
">="      {yylval = GE; return RELOP;}
%%
/* install yytext as identifier in symbol table */
int install_id()
…
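The slide leaves the body of install_id() out. Purely as an illustration of the idea of installing a lexeme in a symbol table, a minimal table could look like the sketch below. Everything in it is hypothetical: the name `install_symbol`, the fixed-size array, and the choice to return the table index as the attribute value are assumptions, not the slide's implementation.

```c
#include <string.h>

#define MAX_SYMBOLS 64      /* arbitrary limits chosen for this sketch */
#define MAX_NAME    32

static char symtab[MAX_SYMBOLS][MAX_NAME];
static int  nsymbols = 0;

/* Hypothetical helper: returns the table index of `lexeme`, adding it
   if it is not already present. Returns -1 if the table is full or the
   name is too long to store. */
int install_symbol(const char *lexeme)
{
    for (int i = 0; i < nsymbols; i++)
        if (strcmp(symtab[i], lexeme) == 0)
            return i;                      /* already installed */
    if (nsymbols == MAX_SYMBOLS || strlen(lexeme) >= MAX_NAME)
        return -1;
    strcpy(symtab[nsymbols], lexeme);
    return nsymbols++;
}
```

In a scanner, a function of this kind would be called with yytext, so that repeated occurrences of the same identifier yield the same attribute value.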
– Writing a parser with Yacc (Yet Another Compiler-Compiler).
• Yacc works together with lex: the parser that Yacc generates calls yylex() to get the next token.
– Yacc and lex must agree on the numeric value of each token.
%union {
int semantic_value;
}
%token <semantic_value> INTEGERCONST 2
%type <semantic_value> exp
%type <semantic_value> term
%type <semantic_value> item
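The declarations above determine how the scanner hands a value to the parser. The sketch below shows, in plain C, roughly what they amount to: a union type for yylval and a yylex() that fills the `semantic_value` member before returning the token code. The token code 258, the `set_input` helper, and the string-based input are assumptions made so that the example is self-contained; in a real build the type and the token codes come from the header that Yacc generates.

```c
#include <ctype.h>

/* Assumed shape of the type that Yacc derives from the %union declaration. */
typedef union {
    int semantic_value;
} YYSTYPE;

YYSTYPE yylval;

/* Assumed token code; Yacc writes the real value to its generated header. */
enum { INTEGERCONST = 258 };

static const char *input = "";        /* hypothetical input buffer */

void set_input(const char *s) { input = s; }

/* Returns the next token; for INTEGERCONST the value is left in yylval. */
int yylex(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return 0;                     /* 0 signals end of input to the parser */
    if (isdigit((unsigned char)*input)) {
        int v = 0;
        while (isdigit((unsigned char)*input))
            v = v * 10 + (*input++ - '0');
        yylval.semantic_value = v;
        return INTEGERCONST;
    }
    return (unsigned char)*input++;   /* single-character token */
}
```

Because the scanner and the parser share the token codes and the yylval type, both sides have to be compiled against the same generated definitions.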
Creating an LALR(1) Parser with
Yacc/Bison
yacc specification (yacc.y) → Yacc or Bison compiler → y.tab.c
y.tab.c → C compiler → a.out
input stream → a.out → output stream
Yacc Specification
• A yacc specification consists of three parts:
yacc declarations, and C declarations within %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are productions with actions:
production1 { semantic action1 }
production2 { semantic action2 }
…
productionn { semantic actionn }
Writing a Grammar in Yacc
• Productions in Yacc are of the form
Nonterminal: tokens/nonterminals { action }
| tokens/nonterminals { action }
…
;
• Tokens that are single characters can be used
directly within productions, e.g. '+'
• Named tokens must be declared first in the
declaration part using
%token TokenName
Synthesized Attributes
• Semantic actions may refer to values of the
synthesized attributes of terminals and
nonterminals in a production:
X : Y1 Y2 Y3 … Yn { action }
– $$ refers to the value of the attribute of X
– $i refers to the value of the attribute of Yi
• For example
factor : '(' expr ')' { $$=$2; }
The action $$=$2 copies the attribute of expr up to factor:
        factor.val=x
       /     |      \
      (  expr.val=x  )
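One way to picture $$ and $i is as positions on a value stack that the parser keeps next to its state stack. The sketch below models only that value stack; the names `push_value`, `dollar`, and `reduce` are invented for this illustration, bounds checks are omitted, and the code that Yacc actually generates is organized differently.

```c
#define STACK_MAX 100            /* arbitrary size for this sketch */

static int vstack[STACK_MAX];
static int vtop = 0;             /* number of values on the stack */

/* A shift pushes the attribute of the symbol just read. */
void push_value(int v) { vstack[vtop++] = v; }

/* Value of $i for a production with n right-hand-side symbols. */
int dollar(int i, int n) { return vstack[vtop - n + i - 1]; }

/* Finish a reduction: pop the n right-hand-side values and push $$. */
void reduce(int n, int result)
{
    vtop -= n;
    vstack[vtop++] = result;
}

/* factor : '(' expr ')' { $$ = $2; } */
void reduce_factor_paren(void)
{
    int result = dollar(2, 3);   /* $2 is the attribute of expr */
    reduce(3, result);           /* $$ becomes the attribute of factor */
}
```

After pushing values for '(', an expr with value 7, and ')', calling `reduce_factor_paren()` leaves a single value, 7, on the stack as the attribute of factor.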
Example 1
%{
#include <ctype.h>
#include <stdio.h>
%}
%token DIGIT   /* also results in the definition: #define DIGIT xxx */
%%
line   : expr '\n'         { printf("= %d\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | term              { $$ = $1; }
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor            { $$ = $1; }   /* $$: attribute of term (parent), $1: attribute of factor (child) */
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT             { $$ = $1; }   /* $1: attribute of token (stored in yylval) */
       ;
%%
/* Example of a very crude lexical analyzer invoked by the parser */
int yylex()
{ int c = getchar();
  if (isdigit(c))
  { yylval = c-'0';
    return DIGIT;
  }
  return c;
}
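For comparison, the same grammar can be evaluated by a small hand-written recursive-descent parser. This is a different technique from the LALR(1) parser that Yacc generates: the left-recursive productions are turned into loops. The function names are invented for this sketch, input comes from a string instead of getchar(), and errors are not reported.

```c
#include <ctype.h>

static const char *p;            /* current position in the input */

static int parse_expr(void);

/* factor : '(' expr ')' | DIGIT */
static int parse_factor(void)
{
    if (*p == '(') {
        p++;
        int v = parse_expr();
        if (*p == ')')
            p++;                 /* a missing ')' is silently ignored here */
        return v;
    }
    if (isdigit((unsigned char)*p))
        return *p++ - '0';       /* same attribute as yylval = c-'0' */
    return 0;                    /* no error reporting in this sketch */
}

/* term : term '*' factor | factor */
static int parse_term(void)
{
    int v = parse_factor();
    while (*p == '*') { p++; v *= parse_factor(); }
    return v;
}

/* expr : expr '+' term | term */
static int parse_expr(void)
{
    int v = parse_term();
    while (*p == '+') { p++; v += parse_term(); }
    return v;
}

/* Evaluates one expression made of single digits, '+', '*', and parentheses. */
int eval_line(const char *s) { p = s; return parse_expr(); }
```

`eval_line("2+3*4")` yields 14 and `eval_line("(2+3)*4")` yields 20, the values the Yacc actions would compute for the same input lines.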
Dealing With Ambiguous
Grammars
• By defining operator precedence levels and
left/right associativity of the operators, we can
specify ambiguous grammars in Yacc, such as
E → E+E | E-E | E*E | E/E | (E) | -E | num
• To define precedence levels and associativity in
Yacc’s declaration part:
%left '+' '-'
%left '*' '/'
%right UMINUS
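Operators declared on later lines bind more tightly, so here '*' and '/' take precedence over '+' and '-', and unary minus binds tightest. The effect of these declarations can be imitated by hand with precedence climbing, which is a different technique from Yacc's table-driven conflict resolution. The sketch below is an illustration only: the function names are invented, numbers are unsigned integer literals read from a string, and there is no error handling.

```c
#include <ctype.h>

static const char *q;                 /* current position in the input */

/* Precedence levels in declaration order: later lines bind tighter. */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;     /* %left '+' '-' */
    case '*': case '/': return 2;     /* %left '*' '/' */
    default:            return 0;     /* not a binary operator */
    }
}
enum { UMINUS_PREC = 3 };             /* %right UMINUS */

static double parse(int min_prec);

/* (E) | -E | num */
static double primary(void)
{
    if (*q == '-') { q++; return -parse(UMINUS_PREC); }
    if (*q == '(') {
        q++;
        double v = parse(1);
        if (*q == ')') q++;
        return v;
    }
    double v = 0;
    while (isdigit((unsigned char)*q))
        v = v * 10 + (*q++ - '0');
    return v;
}

/* Parses binary operators of precedence >= min_prec (min_prec >= 1). */
static double parse(int min_prec)
{
    double lhs = primary();
    while (prec(*q) >= min_prec) {
        char op = *q++;
        double rhs = parse(prec(op) + 1);   /* +1 gives left associativity */
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
    return lhs;
}

double eval_expr(const char *s) { q = s; return parse(1); }
```

With these levels, `1-2-3` groups as `(1-2)-3` and `-2*3` groups as `(-2)*3`, which is the grouping the %left and %right declarations select for the ambiguous grammar.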
Example 2
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* double type for attributes and yylval */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines : lines expr '\n'        { printf("= %g\n", $2); }
      | lines '\n'
      | /* empty */
      ;
expr  : expr '+' expr          { $$ = $1 + $3; }
      | expr '-' expr          { $$ = $1 - $3; }
      | expr '*' expr          { $$ = $1 * $3; }
      | expr '/' expr          { $$ = $1 / $3; }
      | '(' expr ')'           { $$ = $2; }
      | '-' expr %prec UMINUS  { $$ = -$2; }
      | NUMBER
      ;
%%
Example 2 (cont’d)
%%
/* Crude lexical analyzer for fp doubles and arithmetic operators */
int yylex()
{ int c;
  while ((c = getchar()) == ' ')
    ;
  if ((c == '.') || isdigit(c))
  { ungetc(c, stdin);
    scanf("%lf", &yylval);
    return NUMBER;
  }
  return c;
}
int main()
{ if (yyparse() != 0)            /* run the parser */
    fprintf(stderr, "Abnormal exit\n");
  return 0;
}
/* Invoked by parser to report parse errors */
int yyerror(char *s)
{ fprintf(stderr, "Error: %s\n", s);
  return 0;
}
Combining Lex/Flex with
Yacc/Bison
yacc specification (yacc.y) → Yacc or Bison compiler → y.tab.c and y.tab.h
Lex specification (lex.l) and token definitions (y.tab.h) → Lex or Flex compiler → lex.yy.c
lex.yy.c and y.tab.c → C compiler → a.out
input stream → a.out → output stream
Lex Specification for Example 2
%option noyywrap
%{
#define YYSTYPE double
#include "y.tab.h"       /* generated by Yacc, contains #define NUMBER xxx */
extern double yylval;    /* defined in y.tab.c */
%}
number    [0-9]+\.?|[0-9]*\.[0-9]+
%%
[ ]       { /* skip blanks */ }
{number}  { sscanf(yytext, "%lf", &yylval);
            return NUMBER;
          }
\n|.      { return yytext[0]; }
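The `number` definition accepts forms such as 12, 12., .5, and 3.14, but not a lone dot and not an exponent. As a check of what the pattern covers, the following C sketch computes the longest matching prefix by hand; the function name `match_number` is invented for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of `s` that matches
   [0-9]+\.?|[0-9]*\.[0-9]+ , or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    size_t i = 0, best = 0;
    while (isdigit((unsigned char)s[i]))
        i++;                                  /* leading digits */
    size_t int_digits = i;
    if (int_digits > 0)
        best = i;                             /* [0-9]+ */
    if (s[i] == '.') {
        i++;
        if (int_digits > 0)
            best = i;                         /* [0-9]+\. */
        size_t frac_start = i;
        while (isdigit((unsigned char)s[i]))
            i++;
        if (i > frac_start)
            best = i;                         /* [0-9]*\.[0-9]+ */
    }
    return best;
}
```

For `"3.14x"` the result is 4, because the second alternative gives a longer match than the `3.` allowed by the first; for `"."` the result is 0, so a lone dot falls through to the `\n|.` rule and is returned as a single-character token.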