Recognition of Token in Lexical Analysis-3
Examples of Non-Tokens:
•Comments, preprocessor directives, macros,
blanks, tabs, newlines, etc.
Lexeme: A lexeme is the sequence of input
characters that matches the pattern of a token
and so makes up a single instance of that token,
e.g. “float”, “abs_zero_Kelvin”, “=”, “-”,
“273”, “;”.
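The lexemes listed above can be recovered mechanically. The following is a minimal sketch, assuming they come from the statement float abs_zero_Kelvin = -273; (the statement itself is inferred from the listed lexemes, and the pattern and function name are illustrative, not taken from the slides).

```python
import re

# Hypothetical pattern: an identifier or keyword, an integer literal,
# or any single non-space character (operators and punctuation).
LEXEME_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def split_lexemes(source):
    """Return the lexemes of `source` in order; whitespace is skipped."""
    return LEXEME_PATTERN.findall(source)

lexemes = split_lexemes("float abs_zero_Kelvin = -273;")
print(lexemes)
# prints ['float', 'abs_zero_Kelvin', '=', '-', '273', ';']
```

Note that the minus sign and the digits come out as two separate lexemes, which matches the list on the slide.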
How Does a Lexical Analyzer Work?
1. Input preprocessing: This stage cleans up the input text, for example by
removing comments and redundant whitespace, and prepares it for lexical analysis.
2. Tokenization: This is the process of breaking the input text into a
sequence of tokens.
3. Token classification: In this stage, the lexer determines the type of each
token (keyword, identifier, operator, and so on).
4. Token validation: In this stage, the lexer checks that each token is valid
according to the rules of the programming language.
5. Output generation: In this final stage, the lexer generates the output of
the lexical analysis process, which is typically a list of tokens.
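The five stages above can be sketched in a few lines. This is only an illustration under stated assumptions: the keyword set, the token class names (NUMBER, WORD, OP, PUNCT) and the functions preprocess and lex are hypothetical, and the comment stripping ignores string literals, which this toy language does not have.

```python
import re

KEYWORDS = {"int", "float", "return", "if", "else", "while"}  # illustrative subset

# Each named group is one candidate token class; BAD catches anything else.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<WORD>[A-Za-z_]\w*)
  | (?P<OP>[=+\-*/])
  | (?P<PUNCT>[();{},])
  | (?P<SKIP>\s+)
  | (?P<BAD>.)
""", re.VERBOSE)

def preprocess(text):
    """Stage 1: strip /* ... */ and // comments, which are non-tokens."""
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", text)

def lex(text):
    tokens = []
    for m in TOKEN_RE.finditer(preprocess(text)):    # Stage 2: tokenization
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                                  # whitespace is a non-token
        if kind == "WORD":                            # Stage 3: classification
            kind = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
        if kind == "BAD":                             # Stage 4: validation
            raise SyntaxError(f"illegal character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens                                     # Stage 5: output

print(lex("int a = b + 2; // add two"))
```

For the input shown, the result is a list of seven (type, lexeme) pairs starting with ('KEYWORD', 'int') and ending with ('PUNCT', ';'); the trailing comment produces no token.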
You can observe that we have omitted comments.
As another example, consider the printf
statement below.
Lexeme  Token         Lexeme  Token
(       LPAREN        =       ASSIGNMENT
a       IDENTIFIER    a       IDENTIFIER
b       IDENTIFIER    2       INTEGER
)       RPAREN        ;       SEMICOLON
Advantages
1. Simplifies Parsing: Breaking down the source
code into tokens lets the parser work with a
sequence of classified symbols instead of raw
characters.
2. Error Detection: Lexical analysis detects
lexical errors, such as illegal characters,
malformed numeric literals, or unterminated
strings and comments, early in the compilation
process.
3. Efficiency: Once the source code is converted
into tokens, whitespace and comments are already
gone, so subsequent phases of compilation or
interpretation can operate more efficiently.
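As a rough illustration of early error detection, the sketch below scans source text and reports every character that falls outside a small legal alphabet, together with its line and column. The alphabet and the function name find_lexical_errors are assumptions made for this example; a real language defines its own character set.

```python
import string

# Hypothetical alphabet for a tiny C-like language; anything else is illegal.
LEGAL = set(string.ascii_letters + string.digits + "_ \t=+-*/();{},")

def find_lexical_errors(source):
    """Return (line, column, character) for each illegal character, 1-based."""
    errors = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch not in LEGAL:
                errors.append((line_no, col_no, ch))
    return errors

print(find_lexical_errors("int a = 5;\nint b = a $ 2;"))
# prints [(2, 11, '$')]
```

Recording the position alongside each error is what lets later messages point back to the exact place in the source.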
Disadvantages
1. Limited Context: Lexical analysis operates
on individual tokens and does not consider the
overall context of the code, so a misspelled
keyword is simply classified as an identifier.
2. Overhead: Although lexical analysis is
necessary for the compilation or interpretation
process, it adds an extra processing step that
costs time and memory.
3. Debugging Challenges: Lexical errors
detected during the analysis phase may not
clearly indicate where they originate in the
source code unless the lexer records line and
column positions.
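The limited-context point can be seen in a small sketch. Assuming a hypothetical keyword table and a helper named classify_words, a lexer that classifies words by table lookup alone accepts a misspelled keyword without complaint, because it is still a well-formed identifier; only a later phase can notice the problem.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # illustrative subset

def classify_words(source):
    """Label each word as KEYWORD or IDENTIFIER by table lookup alone."""
    return [("KEYWORD" if w in KEYWORDS else "IDENTIFIER", w)
            for w in re.findall(r"[A-Za-z_]\w*", source)]

print(classify_words("whlie (x) return y;"))
# prints [('IDENTIFIER', 'whlie'), ('IDENTIFIER', 'x'), ('KEYWORD', 'return'), ('IDENTIFIER', 'y')]
```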