2.1 Lexical Analysis

Lexical Analysis

Role of the Lexical Analyzer


•  Reads the input and produces a sequence of tokens
•  Can act as a subroutine of the parser: get_next_token (see the sketch after the diagram below)
•  Secondary tasks
   – Strips blanks and newline characters
   – Correlates error messages with the source program
•  May itself be split into separate scanning and lexical-analysis phases
•  After lexical analysis, individual characters are no longer examined.
Role of Lexical Analyzer

[Diagram: the source program is read by the lexical analyzer, which returns a token to the parser each time the parser requests the next token; both the lexical analyzer and the parser consult the symbol table.]
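
A minimal sketch of the interaction pictured above, assuming hypothetical Token and parse definitions; only the get_next_token routine is named on the slide:

#include <iostream>
#include <string>

// Hypothetical token representation for illustration.
struct Token {
    std::string kind;    // e.g. "id", "num", "assign_op"
    std::string lexeme;  // the characters that were matched
};

// Stub lexical analyzer: returns one hard-coded token, then an end marker.
Token get_next_token() {
    static int calls = 0;
    if (calls++ == 0) return {"id", "count"};
    return {"eof", ""};
}

// Stub parser: repeatedly asks the lexical analyzer for the next token.
void parse() {
    for (Token t = get_next_token(); t.kind != "eof"; t = get_next_token())
        std::cout << "parser received <" << t.kind << ", " << t.lexeme << ">\n";
}

int main() { parse(); }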
Issues in lexical analysis
Why separate the lexical and syntactic analysis phases?

•  Simpler conceptual model
   – A parser that also had to handle the conventions for comments and white space would be significantly more complex.
•  Increased compiler efficiency
   – Specialized buffering techniques can be used for reading input characters and processing tokens (see the sketch after this list).
•  Separation promotes portability
   – Input-alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer.
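
A minimal sketch of buffered character input, assuming a hypothetical BufferedInput class; production lexers typically use a more elaborate two-buffer scheme with sentinel characters:

#include <cstdio>

// Hypothetical buffered reader: characters come from a block buffer
// instead of one read call per character.
class BufferedInput {
    static constexpr int SIZE = 4096;
    char buf[SIZE];
    int len = 0, pos = 0;
    std::FILE* in;
public:
    explicit BufferedInput(std::FILE* f) : in(f) {}
    // Return the next character, refilling the buffer only when it is exhausted.
    int next() {
        if (pos == len) {
            len = static_cast<int>(std::fread(buf, 1, SIZE, in));
            pos = 0;
            if (len == 0) return EOF;   // end of input
        }
        return static_cast<unsigned char>(buf[pos++]);
    }
};

int main() {
    BufferedInput input(stdin);
    int c, count = 0;
    while ((c = input.next()) != EOF) ++count;   // e.g. count input characters
    std::printf("%d characters read\n", count);
}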
Terminology
•  Tokens include keywords, operators, identifiers, constants, literal strings, and punctuation symbols.
•  A lexeme is a sequence of characters in the source program representing a token.
•  A pattern is a rule describing the set of lexemes that can represent a particular token.
Examples

Token      Sample Lexemes           Informal Description of Pattern
const      const                    const
if         if                       if
relation   <, <=, =, <>, >, >=      < or <= or = or <> or >= or >
id         pi, count, D2            letter followed by letters and digits
num        3.1416, 0, 6.02E23       any numeric constant
literal    "core dumped"            any characters between " and " except "
For tokens such as id and num, the actual lexeme values matter. The pattern classifies the lexeme, and the resulting information is:
1. Stored in the symbol table
2. Returned to the parser
Attributes for Tokens
• Attributes provide additional information about tokens.
• Tokens influence parsing decisions; their attributes influence the translation of the tokens.
• Lexical analyzers usually provide a single attribute per token: a pointer into the symbol table (see the sketch after the example below).
Attributes for Tokens
• Example: E = M * C ** 2
– <id, pointer to symbol-table entry for E>
– <assign_op, >
– <id, pointer to symbol-table entry for M>
– <mult_op, >
– <id, pointer to symbol-table entry for C>
– <exp_op, >
– <num, integer value 2>
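
A minimal sketch of one way to represent these token/attribute pairs, assuming a hypothetical TokenKind enumeration and an intern helper; the single attribute is an index into the symbol table, as described above:

#include <iostream>
#include <string>
#include <vector>

// Hypothetical token kinds mirroring the example above.
enum class TokenKind { Id, AssignOp, MultOp, ExpOp, Num };

struct Token {
    TokenKind kind;
    int attribute;   // symbol-table index for Id, literal value for Num, unused otherwise
};

int main() {
    std::vector<std::string> symbol_table;           // hypothetical symbol table
    auto intern = [&](const std::string& name) {     // add a name, return its index
        symbol_table.push_back(name);
        return static_cast<int>(symbol_table.size()) - 1;
    };

    // Token stream for  E = M * C ** 2
    std::vector<Token> tokens = {
        {TokenKind::Id, intern("E")}, {TokenKind::AssignOp, 0},
        {TokenKind::Id, intern("M")}, {TokenKind::MultOp, 0},
        {TokenKind::Id, intern("C")}, {TokenKind::ExpOp, 0},
        {TokenKind::Num, 2}
    };

    for (const Token& t : tokens)
        if (t.kind == TokenKind::Id)
            std::cout << "<id, symbol-table entry for " << symbol_table[t.attribute] << ">\n";
}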
What Tokens are Useful Here?

for (int k = 0; k < myArray[5]; ++k)
{
    cout << k << endl;
}

– Keywords: for, int
– Punctuation: { } ( ) [ ] ;
– Operators: = < ++ << >>
– Identifier (covering k, myArray, cout, endl)
– IntegerConstant (covering 0 and 5)

(A scanner sketch that classifies this statement follows the list.)
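
A minimal sketch of a scanner that classifies the statement above into the token classes listed; the regular expressions and class names are assumptions, not the slide's definitions:

#include <cctype>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::string src = "for (int k = 0; k < myArray[5]; ++k)";
    // Ordered patterns: keywords before identifiers, longer operators first.
    std::vector<std::pair<std::string, std::regex>> patterns = {
        {"keyword",  std::regex("^(for|int)\\b")},
        {"id",       std::regex("^[A-Za-z_][A-Za-z0-9_]*")},
        {"num",      std::regex("^[0-9]+")},
        {"operator", std::regex("^(\\+\\+|<<|>>|<=|<|=)")},
        {"punct",    std::regex("^[()\\[\\]{};]")}
    };
    std::size_t pos = 0;
    while (pos < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[pos]))) { ++pos; continue; }
        std::string rest = src.substr(pos);
        std::smatch m;
        bool matched = false;
        for (const auto& p : patterns) {
            if (std::regex_search(rest, m, p.second)) {
                std::cout << "<" << p.first << ", \"" << m.str() << "\">\n";
                pos += m.str().size();
                matched = true;
                break;
            }
        }
        if (!matched) {   // no pattern matches a prefix: a lexical error
            std::cerr << "lexical error at '" << src[pos] << "'\n";
            ++pos;
        }
    }
}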
Difficulties in Lexical Analysis
• Alignment of statements versus free-format input
• Treatment of blanks
  – In FORTRAN, blanks are not significant: DO 5 I = 1.25 is an assignment to the variable DO5I, while DO 5 I = 1,25 begins a DO loop.
• Keywords that are not reserved
  – In PL/I, keywords are not reserved, so IF THEN THEN THEN = ELSE; ELSE ELSE = THEN is a legal statement.
Lexical Error

• A lexical error is a sequence of characters that does not match the pattern of any token.
• This type of error is detected during the lexical analysis phase of compilation.

Lexical phase errors include:
• Spelling errors
• Exceeding the length limit of identifiers or numeric constants
• Appearance of illegal characters
• Removal of a character that should be present
• Replacement of a character with an incorrect character
• Transposition of two characters
In the following code, 1xab is neither a number nor an identifier, so the lexical analyzer reports an error:

int main()
{
    int x = 10, y = 20;
    char *a;
    a = (char *) &x;     /* cast added so the only remaining error is lexical */
    x = 1xab;            /* lexical error: 1xab matches no token pattern */
}

Two more examples:

/* Numeric constant out of range: 2147483647 + 1 overflows int */
#include <iostream>
int main() {
    int a = 2147483647 + 1;
    return 0;
}

/* Illegal character: '$' matches no token pattern */
#include <cstdio>
using namespace std;
int main() {
    int x = 12$34;
    printf("Hello");$
    return 0;
}
Lexical Errors
• Many errors cannot be detected at the lexical analysis stage alone.
• Example: fi (a == b). Here fi is a valid identifier, so the lexical analyzer returns it as an id token; only a later phase can tell that a keyword was probably misspelled.
• The lexical analyzer signals an error when it is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input.
Handling Lexical Errors
• Panic-mode recovery
  – Delete successive characters from the remaining input until the analyzer can find a well-formed token (see the sketch after this list).
  – May confuse the parser.
• Other possible error-recovery actions
  – Deleting or inserting input characters
  – Replacing or transposing characters
• Minimum-distance error correction
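
A minimal sketch of panic-mode recovery, assuming hypothetical prefix_matches_some_token and recover helpers; it simply discards characters until some token pattern matches a prefix of the remaining input:

#include <cctype>
#include <iostream>
#include <string>

// Return true if some token pattern matches a prefix of the input at position pos.
// Only two toy patterns are checked here: identifiers and integer constants.
bool prefix_matches_some_token(const std::string& src, std::size_t pos) {
    unsigned char c = static_cast<unsigned char>(src[pos]);
    return std::isalpha(c) || c == '_' || std::isdigit(c);
}

// Panic mode: delete successive characters until a well-formed token can start.
std::size_t recover(const std::string& src, std::size_t pos) {
    while (pos < src.size() && !prefix_matches_some_token(src, pos)) {
        std::cout << "discarding '" << src[pos] << "'\n";
        ++pos;
    }
    return pos;   // scanning resumes here
}

int main() {
    std::string src = "@#%count";        // '@', '#', '%' match no token pattern
    std::size_t pos = recover(src, 0);
    std::cout << "resume at \"" << src.substr(pos) << "\"\n";
}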
