ACD unit-2 part-2

The document provides an overview of lexical analysis in compiler design, detailing the role of the lexical analyzer, its functions, and the process of designing one using transition diagrams and finite automata. It distinguishes between lexical and syntax analyzers, explaining how the former identifies tokens while the latter deals with meaningful structures. Additionally, it introduces the LEX tool for generating lexical analyzers and outlines its components and operation.

Uploaded by

vanitha.thandur

Compiler Design

1. Introduction
Syntax Analyzer vs Lexical Analyzer

Contents:

1. Overview and role of the lexical analyzer

2. Simple approach to designing a lexical analyzer

3. Lexical errors

4. Syntax analyzer vs lexical analyzer

5. Lexical analyzer tool -- LEX


Lexical analysis
Overview of lexical analysis:

To identify the tokens we need some method of describing the possible tokens that can appear in the input stream.

For this purpose we use regular expressions, a notation that can be used to describe essentially all the tokens of a programming language.

After deciding what the tokens are, we need some mechanism to recognize these in the input stream. This is done by the token recognizers, which are designed using transition diagrams and finite automata.
Lexical analysis
Role of the lexical analyzer:

The lexical analyzer is the first phase of a compiler.

It reads the source program character by character and produces as output a sequence of tokens.

The lexical analyzer may also perform certain secondary tasks at the user interface. One such task is stripping out from the source program the comments and white space in the form of blank, tab and new-line characters. Another is correlating error messages from the compiler with the source program.
Lexical analysis
TOKEN, LEXEME, PATTERN:

Token: A token is a sequence of characters that can be treated as a single logical entity.
Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants

Pattern: A set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.

Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. For example, in the statement count = count + 1, the lexeme count matches the identifier pattern and so yields the identifier token.
Lexical analysis
Simple approach to designing a lexical analyzer

To design Lexical Analyzer :

1. Prepare a set of language constructs

2. Represent the constructs using transition diagrams.

3. Prepare algorithmic code for every state in the transition diagram

4. Convert the algorithms to programs.


Lexical analysis
Language constructs specify how the identifiers, constants and labels are formed, what the keywords and operators used in the language are, and so on.

For example, in a particular language an identifier can be defined as:

"beginning with a letter followed by any number of letters or digits"; the following is the equivalent automaton (transition diagram):

letter = a / b / c / d / ... / z
digit = 0 / 1 / 2 / ... / 9
delimiter = . / , / ;
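As a sketch, the identifier diagram above can be coded directly in C (the function name is an illustrative choice, not from the original; ASCII letters of either case are assumed):

```c
#include <ctype.h>

/* Sketch of the identifier transition diagram:
   state 0 --letter--> state 1 --letter|digit--> state 1; anything else rejects.
   Returns 1 if the whole string is a valid identifier, 0 otherwise. */
int is_identifier(const char *s) {
    if (!isalpha((unsigned char)s[0]))      /* state 0: must see a letter */
        return 0;
    for (int i = 1; s[i] != '\0'; i++)      /* state 1: letters or digits */
        if (!isalnum((unsigned char)s[i]))
            return 0;
    return 1;                               /* accepting state reached */
}
```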
Lexical analysis

In a particular language an integer constant can be defined as:

"beginning with a digit followed by any number of digits"

letter = a / b / c / d / ... / z
digit = 0 / 1 / 2 / ... / 9
delimiter = . / , / ;
Lexical analysis

In a particular language a floating-point constant can be defined as:

"beginning with a digit followed by any number of digits, then a dot followed by one or more digits"

letter = a / b / c / d / ... / z
digit = 0 / 1 / 2 / ... / 9
delimiter = . / , / ;
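The floating-point diagram can likewise be sketched in C (the function name is an assumption; it accepts exactly digit+ '.' digit+ as described above):

```c
#include <ctype.h>

/* Sketch of the floating-point-constant diagram: digit+ '.' digit+
   Returns 1 if the whole string matches, 0 otherwise. */
int is_float_const(const char *s) {
    int i = 0;
    if (!isdigit((unsigned char)s[i])) return 0;   /* must start with a digit */
    while (isdigit((unsigned char)s[i])) i++;      /* any number of digits */
    if (s[i] != '.') return 0;                     /* the required dot */
    i++;
    if (!isdigit((unsigned char)s[i])) return 0;   /* one or more digits */
    while (isdigit((unsigned char)s[i])) i++;
    return s[i] == '\0';                           /* whole string consumed */
}
```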
Lexical analysis
In a particular language the keywords are defined as:

"IF, THEN, ELSE, WHILE"

Lexical analysis
Now algorithmic code is written for each state, and those algorithms are then converted to programs.

Algorithm code for state 0:
c = getchar();
if letter(c) then goto state 1
else error();

Algorithm code for state 1:
c = getchar();
if letter(c) || digit(c) then goto state 1
else if delimiter(c) then goto state 2
else error();

Algorithm code for state 2:
retract();
install(id);
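The state-by-state algorithm can be turned into a runnable C sketch. Here getchar() is replaced by reading from a string, and install(id) is simplified to returning the lexeme length; the function name, the return convention and the concrete delimiter set are illustrative assumptions, not from the original:

```c
#include <ctype.h>

/* Sketch of the three-state identifier recognizer.
   Returns the number of characters in the recognized lexeme,
   or -1 if error() would be invoked. */
int scan_id(const char *input) {
    int i = 0;
    char c = input[i++];                              /* state 0 */
    if (!isalpha((unsigned char)c)) return -1;        /* error() */
    for (;;) {                                        /* state 1 */
        c = input[i++];
        if (isalnum((unsigned char)c)) continue;      /* stay in state 1 */
        if (c == '.' || c == ',' || c == ';' || c == ' ' || c == '\0') {
            /* state 2: retract() the delimiter, then install(id) */
            return i - 1;                             /* lexeme length */
        }
        return -1;                                    /* error() */
    }
}
```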
Lexical analysis

In a similar way, algorithmic code is written for all the states in all the transition diagrams. These algorithms are then converted to programs.

Now, when a keyword is used in a program, it is given as input to the corresponding automaton. If the keyword is correctly spelled and allowed by the language constructs, the automaton reaches a final state and the keyword is accepted; otherwise the error handler is invoked.
Lexical analysis
LEXICAL ERRORS:

Lexical errors are the errors thrown by your lexer when it is unable to continue, meaning that there is no way to recognize a lexeme as a valid token for your lexer.

Syntax errors, on the other hand, are thrown by your parser when a given sequence of already recognized, valid tokens does not match any of the right-hand sides of your grammar rules.

A simple panic-mode error handling system requires that we return to a high-level parsing function when a parsing or lexical error is detected.
Lexical analysis
LEXICAL ERRORS:

Error-recovery actions are:

i. Delete one character from the remaining input.
ii. Insert a missing character into the remaining input.
iii. Replace a character by another character.
iv. Transpose two adjacent characters.
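As an illustration of these four actions, the following C sketch (a hypothetical helper, not from the original) checks whether a misspelled lexeme can be repaired into a target keyword by exactly one such action:

```c
#include <string.h>

/* Sketch: returns 1 if `lexeme` becomes `target` via exactly one of the
   four recovery actions (delete, insert, replace, transpose), else 0. */
int one_edit_away(const char *lexeme, const char *target) {
    size_t n = strlen(lexeme), m = strlen(target);
    if (n == m) {
        int diff = 0;
        for (size_t i = 0; i < n; i++)
            if (lexeme[i] != target[i]) diff++;
        if (diff == 1) return 1;                      /* replace one char */
        for (size_t i = 0; i + 1 < n; i++)            /* transpose check */
            if (lexeme[i] != target[i])
                return lexeme[i] == target[i + 1]
                    && lexeme[i + 1] == target[i]
                    && strcmp(lexeme + i + 2, target + i + 2) == 0;
        return 0;                                     /* identical strings */
    }
    if (n > m) {                                      /* normalize: lexeme shorter */
        const char *t = lexeme; lexeme = target; target = t;
        size_t tmp = n; n = m; m = tmp;
    }
    if (m - n != 1) return 0;
    size_t i = 0;                                     /* delete / insert: */
    while (lexeme[i] && lexeme[i] == target[i]) i++;  /* skip common prefix */
    return strcmp(lexeme + i, target + i + 1) == 0;   /* rest must match */
}
```

For example, "whle" needs one insertion, "wihle" one transposition, and "whale" one replacement to become "while".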
Syntax Analyzer vs Lexical Analyzer

Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer?

Both of them do similar things; but the lexical analyzer deals with simple, non-recursive constructs of the language, while the syntax analyzer deals with recursive constructs of the language.

The lexical analyzer simplifies the job of the syntax analyzer.

The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.

The syntax analyzer works on these smallest meaningful units (tokens) to recognize meaningful structures in the given program.
Lexical analyzer tool --LEX
LEX:

Lex is a program that generates lexical analyzers. It is used with the YACC parser generator. The lexical analyzer is a program that transforms an input stream into a sequence of tokens. Lex reads a specification and produces as output C source code implementing the lexical analyzer.
Lexical analyzer tool --LEX
Step 1: An input file named lex.l, describing the lexical analyzer to be generated, is written in the Lex language. The Lex compiler transforms lex.l into a C program, in a file that is always named lex.yy.c.
Step 2: The C compiler compiles the lex.yy.c file into an executable file called a.out.
Step 3: The executable a.out takes a stream of input characters and produces a stream of tokens.
Lexical analyzer tool --LEX
Lex specifications:
A Lex program (the .l file) consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures
Lexical analyzer tool--LEX
LEX:

1. The declarations section includes declarations of variables, manifest constants (a manifest constant is an identifier that is declared to represent a constant, e.g. #define PIE 3.14), and regular definitions.

2. The translation rules of a Lex program are statements of the form:
p1 {action 1}
p2 {action 2}
p3 {action 3}
...
where each p is a regular expression and each action is a program fragment describing what action the lexical analyzer should take when a pattern p matches a lexeme. In Lex the actions are written in C.

3. The third section holds whatever auxiliary procedures are needed by the actions. Alternatively, these procedures can be compiled separately and loaded with the lexical analyzer.
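Putting the three parts together, a minimal Lex specification might look like this (a sketch; the token names, numeric codes and patterns are illustrative assumptions, not from the original):

```lex
%{
/* Declarations: C code and manifest constants (names are illustrative). */
#define ID  1
#define NUM 2
%}
letter  [a-zA-Z]
digit   [0-9]

%%
{letter}({letter}|{digit})*   { return ID;  /* pattern p1: identifier */ }
{digit}+                      { return NUM; /* pattern p2: integer constant */ }
[ \t\n]+                      { /* pattern p3: skip white space */ }
%%
/* Auxiliary procedures needed by the actions. */
int yywrap(void) { return 1; }
```

Running `lex scanner.l` produces lex.yy.c, and compiling that with the C compiler yields a.out, matching the three steps described earlier; this assumes a POSIX-style lex installation.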
Lexical analysis vs Parsing

Lexical analysis:
A scanner simply turns an input string (say a file) into a list of tokens. These tokens represent things like identifiers, parentheses, operators etc.
The lexical analyzer (the "lexer") parses individual symbols from the source code file into tokens.

Parsing:
A parser converts this list of tokens into a tree-like object representing how the tokens fit together to form sentences.
A parser does not give the nodes any meaning beyond structural cohesion. From there, the "parser" proper turns those whole tokens into sentences of your grammar.
