Unit 2
1. Introduction to Lexical Analysis
Lexical analysis is the first phase of the compilation process in computer science and programming
language theory. It is also known as scanning or tokenization. The main objective of lexical analysis is
to break down the source code of a program into smaller units called tokens. These tokens represent the
fundamental building blocks of a programming language, such as keywords, identifiers, literals, and
operators.
1. Scanning: The source code is read character by character, and the scanner identifies sequences of
characters that form a token. It skips whitespace and comments that do not affect the structure of the
program.
2. Tokenization: The scanner groups characters together to form tokens based on predefined patterns or
regular expressions. Each token has a specific meaning in the programming language. Common types of
tokens include keywords like "if" and "while," identifiers like variable names, literals like numbers and
strings, and operators like "+" and "=".
3. Symbol Table: As the tokens are identified, a symbol table is typically maintained. The symbol table
is a data structure that keeps track of identifiers and their associated information, such as their data types
and memory locations.
4. Output: The output of the lexical analysis phase is a sequence of tokens, usually represented as a
stream of token names or token codes, along with any additional information such as the lexeme (the actual
character sequence that forms the token) and, if applicable, the location in the source code (line number,
column number) where the token was found, as sketched below.
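As a concrete sketch (the type and field names here are illustrative, not taken from any particular compiler), a token and a simplified symbol-table entry in a hand-written C lexer might be represented like this:

#include <stdio.h>

/* Illustrative token kinds for a tiny language. */
typedef enum { TK_IDENTIFIER, TK_NUMBER, TK_OPERATOR, TK_DELIMITER } TokenKind;

/* A token records its kind, the lexeme, and where it was found. */
typedef struct {
    TokenKind kind;
    char lexeme[64];    /* the matched character sequence */
    int line, column;   /* 1-based source position */
} Token;

/* A much-simplified symbol-table entry, maintained alongside scanning. */
typedef struct {
    char name[64];      /* the identifier */
    char type[16];      /* e.g. "int", once known */
    int offset;         /* e.g. a memory location or frame offset */
} Symbol;

/* Print a token the way a scanner trace might. */
static void print_token(const Token *t) {
    static const char *names[] = { "IDENTIFIER", "NUMBER", "OPERATOR", "DELIMITER" };
    printf("%s(\"%s\") at %d:%d\n", names[t->kind], t->lexeme, t->line, t->column);
}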
The token stream produced by the lexical analyzer is then passed to the next phase of the compiler
(usually the syntax analysis or parsing phase) to create an abstract syntax tree (AST) and check the
syntactic correctness of the source code. The AST is then used for further processing, optimization, and
eventually code generation to produce the executable program.
Figure 1: Interaction of Lexical Analyzer with Parser
• Compiler efficiency is improved. A large amount of time is spent reading the source program and
partitioning it into tokens. Buffering techniques are used for reading input characters and
processing tokens, which speed up the performance of the compiler.
• Compiler portability is enhanced.
2. Tokens, Patterns, Lexemes:
Token: Token is a sequence of characters in the input that form a meaningful word. In most languages,
the tokens fall into these categories:
• Keywords
• Operators
• Identifiers
• Constants
• Literal strings
• Punctuation.
Pattern: A pattern is a rule describing the set of strings in the input for which a given token is produced
as output.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a
token.
For example, consider the statement `x = 10 + y;`:
• x is an identifier token.
Pattern for an identifier: [a-zA-Z_][a-zA-Z0-9_]*
Lexeme: x
• = is an operator token.
Pattern for the assignment operator: =
Lexeme: =
• 10 is a number token.
Pattern for a number: [0-9]+
Lexeme: 10
• + is an operator token.
Pattern for the addition operator: +
Lexeme: +
• y is an identifier token.
Pattern for an identifier: [a-zA-Z_][a-zA-Z0-9_]*
Lexeme: y
• ; is a delimiter token.
Pattern for the semicolon delimiter: ;
Lexeme: ;
So, the token stream produced by the lexical analysis phase for this code snippet would be:
IDENTIFIER("x")
OPERATOR("=")
NUMBER("10")
OPERATOR("+")
IDENTIFIER("y")
DELIMITER(";")
3. Lexical Errors
A lexical error occurs when the scanner cannot match a sequence of input characters against any token
pattern. Common lexical errors include:
• Invalid Characters: If the source code contains characters that are not recognized by the programming
language or are not part of the defined regular expressions, the lexical analyzer will raise an error for
each invalid character.
• Unterminated Strings: If a string literal is not properly terminated with a closing quotation mark, it
will result in an error. For example, `"Hello, World!` is an unterminated string literal.
• Unterminated Comments: If a comment is not properly terminated, it can lead to lexical errors. For
example, in some languages the comment might begin with `/*` but not have a closing `*/`, causing the
rest of the code to be treated as a comment.
• Incorrect Identifiers: If an identifier does not follow the language's rules for naming variables or uses
reserved keywords, it will lead to a lexical error.
• Ambiguous Tokens: If a particular sequence of characters can be interpreted as multiple token types,
the lexer may not be able to decide the correct token, leading to ambiguity and potential errors.
• Integer Overflow: If a numeric literal exceeds the range that can be represented by the language's data
types, it will lead to an integer overflow error.
• Invalid Numeric Literals: Numeric literals that do not follow the language's rules for representing
numbers, such as having multiple decimal points, may result in lexical errors.
• Incomplete Operators: Some languages have multi-character operators like `<=`, `>=`, `==`, etc. If the
code contains an incomplete or unrecognized operator, it will lead to a lexical error.
• Missing Semicolons: In languages that require statements to end with semicolons, missing semicolons
at the end of statements will result in errors.
• Reserved Words Usage: If the code uses reserved words or keywords in an inappropriate context, the
lexical analyzer may raise errors.
During the lexical analysis phase, the lexer typically reports these errors and tries to recover as best as
possible to continue processing the rest of the source code. The error reporting mechanism may vary
depending on the compiler or programming language being used. In some cases, the lexer may halt
processing after encountering the first lexical error, while in other cases, it may continue scanning the entire
source code and report all errors in one go.
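As a sketch of this "report and recover" behaviour (the function names and globals here are illustrative, not from any particular compiler), the error path of a hand-written scanner in C often looks like this:

#include <stdio.h>

static int g_error_count = 0;

/* Report a lexical error with its source location; the caller then
 * skips the offending character and resumes scanning, so that all
 * errors in the file can be collected in one pass. */
static void lex_error(int line, int col, char bad) {
    g_error_count++;
    fprintf(stderr, "line %d, col %d: lexical error: invalid character '%c'\n",
            line, col, bad);
}

/* After scanning, the driver decides whether to stop compilation. */
static int had_errors(void) { return g_error_count > 0; }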
4. Lex Specification Example
The following Lex specification recognizes the tokens from the example above:
%{
/* This section may include code or declarations that will be
   copied into the generated lexer. */
#include <stdio.h>
%}
/* Regular definitions */
DIGIT [0-9]
LETTER [a-zA-Z]
IDENTIFIER {LETTER}({LETTER}|{DIGIT})*
NUMBER {DIGIT}+
ASSIGNMENT_OP =
ADDITION_OP \+
SUBTRACTION_OP -
SEMICOLON ;
/* Rules section */
%%
{IDENTIFIER}     { printf("IDENTIFIER(%s)\n", yytext); }
{NUMBER}         { printf("NUMBER(%s)\n", yytext); }
{ASSIGNMENT_OP}  { printf("ASSIGNMENT_OP\n"); }
{ADDITION_OP}    { printf("ADDITION_OP\n"); }
{SUBTRACTION_OP} { printf("SUBTRACTION_OP\n"); }
{SEMICOLON}      { printf("SEMICOLON\n"); }
[ \t\n]+         { /* Skip whitespace */ }
.                { /* Ignore unrecognized characters */ }
%%
// Custom functions or code can be included after the second '%%' section.
In this example, we use `%{ ... %}` to include C code that will be copied directly to the generated
lexical analyzer (lexer). The section after `%{ ... %}` is for defining regular definitions, which are named
patterns that can be used later in the rules section.
In the rules section, we define how to recognize different tokens based on the regular definitions. For
example, `{IDENTIFIER}` represents the pattern defined as `IDENTIFIER`, which matches any valid
identifier according to the specified regular expression. Similarly, `{NUMBER}` matches any sequence of
digits, and so on.
When a lexical analyzer is generated from this specification, it will recognize and print tokens based
on the patterns defined in the rules section. For example, given the input `x = 10 + y;`, the lexer would
produce the following token stream:
IDENTIFIER(x)
ASSIGNMENT_OP
NUMBER(10)
ADDITION_OP
IDENTIFIER(y)
SEMICOLON
The generated lexer skips whitespace via the `[ \t\n]+` rule and silently ignores any other unrecognized
characters via the final `.` rule (note that `.` in Lex does not match a newline, which is why newlines are
listed explicitly). This way, it focuses only on the meaningful tokens defined in the specification.
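Assuming the specification above is saved as tokens.l, a typical build with flex (the file names and flags shown are the common defaults, not mandated by the text) looks like:

flex tokens.l
cc lex.yy.c -o scanner -lfl
echo "x = 10 + y;" | ./scanner

The -lfl library supplies default main and yywrap functions; some systems use -ll instead.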
5. REGULAR EXPRESSIONS
A regular expression is a formula that describes a set of strings. The components of a regular
expression are:
x        the character x
.        any character, usually except newline
[xyz]    any one of the characters x, y, or z
R?       an R or nothing (i.e., an optional R)
R1R2     an R1 followed by an R2
A token is either a single string or one of a collection of strings of a certain type. If we view the
set of strings in each token class as a language, we can use regular-expression notation to
describe tokens.
Consider an identifier, which is defined to be a letter followed by zero or more letters or digits.
In regular-expression notation we would write:
identifier = letter (letter | digit)*
Here are the rules that define the regular expressions over an alphabet Σ:
• ε is a regular expression denoting {ε}, that is, the language containing only the empty
string.
• For each a in Σ, a is a regular expression denoting {a}, the language with only one
string, consisting of the single symbol a.
• If R and S are regular expressions, then R | S denotes L(R) ∪ L(S), RS denotes the
concatenation L(R)L(S), and R* denotes (L(R))*, the Kleene closure of L(R). For example,
(a | b)* denotes the set of all strings of a's and b's, including the empty string.
6. REGULAR DEFINITIONS
For notational convenience, we may wish to give names to regular expressions and to define
regular expressions using these names as if they were symbols.
Identifiers are the set of strings of letters and digits beginning with a letter. The following regular
definitions provide a precise specification for this class of strings.
Example 1:
ab*|cd? is equivalent to (a(b*)) | (c(d?))
Pascal identifier:
letter → A | B | …… | Z | a | b | …… | z
digit → 0 | 1 | 2 | …. | 9
id → letter (letter | digit)*
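Outside of Lex, the same identifier pattern can be tested with the POSIX regex API in C. This is only a sketch showing the pattern in action (the sample strings are made up); it is not part of any lexer generator.

#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    /* Anchored Pascal-style identifier: a letter, then letters or digits. */
    regcomp(&re, "^[A-Za-z][A-Za-z0-9]*$", REG_EXTENDED | REG_NOSUB);

    const char *samples[] = { "x", "count2", "2bad", "a_b" };
    for (int i = 0; i < 4; i++) {
        int ok = regexec(&re, samples[i], 0, NULL, 0) == 0;
        printf("%-8s %s\n", samples[i], ok ? "identifier" : "not an identifier");
    }
    regfree(&re);
    return 0;
}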
7. Recognition of tokens:
We have learned how to express patterns using regular expressions. Now we must study how to take the
patterns for all the needed tokens and build a piece of code that examines the input string and
finds a prefix that is a lexeme matching one of the patterns. Consider the following grammar fragment:
stmt → if expr then stmt
| if expr then stmt else stmt
| ε
expr → term relop term
| term
term → id
| number
For relop, we use the comparison operators of languages like Pascal or SQL, where = is
"equals" and <> is "not equals", because this presents an interesting structure of lexemes.
The terminals of the grammar, which are if, then, else, relop, id, and number, are the names of tokens
as far as the lexical analyzer is concerned. The patterns for these tokens are described using the
following regular definitions:
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id → letter (letter | digit)*
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
In addition, we assign the lexical analyzer the job of stripping out whitespace, by recognizing the
"token" ws defined by:
ws → (blank | tab | newline)+
Here, blank, tab, and newline are abstract symbols that we use to express the ASCII characters of
the same names. Token ws is different from the other tokens in that, when we recognize it, we do
not return it to the parser, but rather restart the lexical analysis from the character that follows the
whitespace. It is the following token that gets returned to the parser.
Lexeme    Token Name    Attribute Value
any ws    -             -
if        if            -
then      then          -
else      else          -
<=        relop         LE
=         relop         EQ
<>        relop         NE
8. TRANSITION DIAGRAM:
A transition diagram has a collection of nodes or circles, called states. Each state represents a
condition that could occur during the process of scanning the input looking for a lexeme that
matches one of several patterns.
Edges are directed from one state of the transition diagram to another. Each edge is labeled by a
symbol or set of symbols.
If we are in some state s, and the next input symbol is a, we look for an edge out of state s labeled
by a. If we find such an edge, we advance the forward pointer and enter the state of the transition
diagram to which that edge leads.
Some important conventions about transition diagrams are:
1. Certain states are said to be accepting, or final. These states indicate that a lexeme has
been found, although the actual lexeme may not consist of all positions between the
lexemeBegin and forward pointers. We always indicate an accepting state by a double
circle.
2. In addition, if it is necessary to retract the forward pointer one position, then we shall
additionally place a * near that accepting state.
3. One state is designated the start state, or initial state; it is indicated by an edge labeled "start"
entering from nowhere. The transition diagram always begins in the start state before any input
symbols have been used.
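As an illustration (following the classic textbook treatment of the relop diagram, with illustrative names), the transition diagram for relop can be coded in C as one function whose branches correspond to the diagram's states; the starred accepting states correspond to the ungetc retractions:

#include <stdio.h>

typedef enum { LT, LE, EQ, NE, GT, GE, ERR } Relop;

/* Recognize a Pascal-style relational operator from stdin by
 * simulating the relop transition diagram. */
Relop get_relop(void) {
    int c = getchar();
    if (c == '<') {
        c = getchar();
        if (c == '=') return LE;          /* lexeme "<=" */
        if (c == '>') return NE;          /* lexeme "<>" */
        ungetc(c, stdin);                 /* starred state: retract one char */
        return LT;                        /* lexeme "<"  */
    }
    if (c == '=') return EQ;              /* lexeme "="  */
    if (c == '>') {
        c = getchar();
        if (c == '=') return GE;          /* lexeme ">=" */
        ungetc(c, stdin);                 /* starred state: retract one char */
        return GT;                        /* lexeme ">"  */
    }
    ungetc(c, stdin);
    return ERR;                           /* not a relop */
}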
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA). This means
that we may use either a deterministic or a non-deterministic automaton as a lexical analyzer.
• Both deterministic and non-deterministic finite automata recognize exactly the regular sets.
• In a DFA, for each symbol a and state s, there is at most one edge labeled a leaving s; i.e., the
transition function maps a state-symbol pair to a single state (not to a set of states).
12. Converting RE to NFA (Thompson's Construction)
• This is one way to convert a regular expression into an NFA.
• It guarantees that the resulting NFA will have exactly one final state and one start state.
• If N(r1) and N(r2) are the NFAs for regular expressions r1 and r2, the constructions for
r1 | r2, r1 r2, and r1* combine them by adding new start/final states and ε-transitions.
Example: For the RE (a|b)*a, the NFA construction is sketched below.
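Since the original figure is not reproduced here, a compact equivalent NFA for (a|b)*a (smaller than the full Thompson construction, but accepting the same language) can be given as a transition table:

State    a         b        Accepting?
→ 0      {0, 1}    {0}      no
  1      {}        {}       yes

State 0 loops on both a and b (matching (a|b)*) and additionally guesses, on reading an a, that this is the final a of the input.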
13. Converting NFA to DFA (Subset Construction)
We merge together NFA states by looking at them from the point of view of the input
characters:
• From the point of view of the input, any two states that are connected by an ε-transition
may as well be the same, since we can move from one to the other without
consuming any character. Thus states which are connected by an ε-transition will be
represented by the same state in the DFA.
• If it is possible to have multiple transitions based on the same symbol, then we can regard a
transition on a symbol as moving from a state to a set of states (i.e., the union of all those
states reachable by a transition on the current symbol). Thus these states will be
combined into a single DFA state.
To perform this operation, let us define two functions:
• The ε-closure function takes a state and returns the set of states reachable from it based on
(zero or more) ε-transitions. Note that this will always include the state itself. We should
be able to get from a state to any state in its ε-closure without consuming any input.
• The function move takes a state and a character, and returns the set of states reachable
by one transition on this character.
We can generalise both these functions to apply to sets of states by taking the union of
the application to the individual states.
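A minimal sketch of these two functions in C, using bit masks for state sets (everything here is illustrative; a real implementation would also need the worklist loop that enumerates the DFA states):

#include <stdint.h>

#define MAX_STATES 32                     /* NFA states are bits in a uint32_t */

/* Transition tables, to be filled in by the NFA builder (omitted here):
 * eps[s]      = set of states reachable from s by one epsilon-transition
 * delta[s][c] = set of states reachable from s on input character c     */
static uint32_t eps[MAX_STATES];
static uint32_t delta[MAX_STATES][256];

/* epsilon-closure of a set of states: grow until nothing new is added.
 * The result always contains the input states themselves. */
static uint32_t eps_closure(uint32_t set) {
    uint32_t result = set, prev;
    do {
        prev = result;
        for (int s = 0; s < MAX_STATES; s++)
            if (result & (1u << s))
                result |= eps[s];
    } while (result != prev);
    return result;
}

/* move: union of the one-step transitions on c from every state in set. */
static uint32_t move_set(uint32_t set, unsigned char c) {
    uint32_t result = 0;
    for (int s = 0; s < MAX_STATES; s++)
        if (set & (1u << s))
            result |= delta[s][c];
    return result;
}

/* One step of the subset construction: the successor of DFA state D
 * (a set of NFA states) on character c is eps_closure(move(D, c)). */
static uint32_t dfa_step(uint32_t D, unsigned char c) {
    return eps_closure(move_set(D, c));
}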
14. The Lex Tool
A Lex specification has three sections, separated by %%: declarations, translation rules, and
auxiliary procedures. The translation rules take the form:
p1 {action 1}
p2 {action 2}
p3 {action 3}
……
where each p is a regular expression and each action is a program fragment describing
what action the lexical analyzer should take when the pattern p matches a lexeme. In Lex
the actions are written in C.
The third section holds whatever auxiliary procedures are needed by the
actions. Alternatively, these procedures can be compiled separately and loaded with the
lexical analyzer.
Note: You can refer to the sample Lex program given on page 109 of chapter 3 of the
book Compilers: Principles, Techniques, and Tools by Aho, Sethi & Ullman for more
clarity.
15. Input Buffering
The lexical analyzer scans the characters of the source program one at a time to discover tokens.
Because a large amount of time can be consumed scanning characters, specialized buffering
techniques have been developed to reduce the amount of overhead required to process an input
character. Two such techniques are:
1. Buffer pairs
2. Sentinels
Often, however, many characters beyond the next token may have to be examined before the
next token itself can be determined. For this and other reasons, it is desirable for the lexical
analyzer to read its input from an input buffer. Consider a buffer divided into two halves of,
say, 100 characters each. One pointer marks the beginning of the token being discovered; a
lookahead pointer scans ahead of the beginning point until the token is discovered. We view the
position of each pointer as being between the character last read and the character next to be read.
In practice, each buffering scheme adopts one convention: either a pointer is at the symbol last
read, or at the symbol it is ready to read.
The distance which the lookahead pointer may have to travel past the actual token may be
large. For example, in a PL/I program we may see
DECLARE (ARG1, ARG2, …, ARGn)
without knowing whether DECLARE is a keyword or an array name until we see the character
that follows the right parenthesis.
In either case, the token itself ends at the second E. If the lookahead pointer travels
beyond the buffer half in which it began, the other half must be loaded with the next
characters from the source file. Since the buffer is of limited size, there is an implied
constraint on how much lookahead can be used before the next token is discovered. In
the above example, if the lookahead traveled to the left half and all the way through the
left half to the middle, we could not reload the right half, because we would lose
characters that had not yet been grouped into tokens. While we can make the buffer
larger if we choose, or use another buffering scheme, we cannot ignore the fact that the
amount of lookahead is limited.
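A minimal sketch in C of the buffer-pairs scheme with sentinels (following the textbook idea; the buffer size, names, and I/O details are illustrative). Planting an EOF byte after each half lets the scanner test for "end of half" and "end of input" with the same single comparison per character; the sketch assumes the EOF byte does not occur in the source text.

#include <stdio.h>

#define HALF 1024                         /* size of each buffer half */

static char buf[2 * HALF + 2];            /* two halves, one sentinel slot each */
static char *forward;                     /* lookahead pointer */
static FILE *src;

/* Load one half from the source file and plant the sentinel after the
 * last character read (a mid-half sentinel marks real end of input). */
static void load_half(char *half) {
    size_t n = fread(half, 1, HALF, src);
    half[n] = (char)EOF;                  /* sentinel */
}

static void init_buffer(FILE *f) {
    src = f;
    load_half(buf);                       /* fill the first half */
    forward = buf;
}

/* Return the next input character with one comparison in the common case. */
static int next_char(void) {
    char c = *forward++;
    if (c != (char)EOF) return (unsigned char)c;          /* fast path */
    if (forward == buf + HALF + 1) {                      /* end of first half */
        load_half(buf + HALF + 1);                        /* reload second half */
        return next_char();
    }
    if (forward == buf + 2 * HALF + 2) {                  /* end of second half */
        load_half(buf);                                   /* reload first half */
        forward = buf;
        return next_char();
    }
    return EOF;                           /* sentinel inside a half: real EOF */
}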