Lexical Analysis
Lexical analysis is the first phase of a compiler. It takes the modified source code produced by
language preprocessors, written in the form of sentences, and breaks it into a series of tokens,
removing any whitespace and comments in the source code. If the lexical analyzer finds an
invalid token, it generates an error. The lexical analyzer works closely with the syntax analyzer:
it reads character streams from the source code, checks for legal tokens, and passes the data
to the syntax analyzer on demand.
Tokens
A lexeme is a sequence of characters (alphanumeric) that forms a token. There are predefined
rules for every lexeme to be identified as a valid token. These rules are defined by grammar
rules, by means of a pattern. A pattern explains what can be a token, and these patterns are
defined by means of regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers, operators, and
punctuation symbols can be considered tokens.
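For example, in the C statement int x = 10; the lexemes int, x, =, 10, and ; belong to the token
classes keyword, identifier, operator, number, and punctuation respectively. The sketch below
shows one common way to represent these classes; the enum and struct names are illustrative,
not from any real compiler:

    #include <stdio.h>

    /* Illustrative token classes for the statement "int x = 10;". */
    enum TokenType { KEYWORD, IDENTIFIER, OPERATOR, NUMBER, PUNCTUATION };

    struct Token {
        enum TokenType type;   /* the token class */
        const char    *lexeme; /* the matched character sequence */
    };

    int main(void) {
        struct Token tokens[] = {
            { KEYWORD,     "int" },
            { IDENTIFIER,  "x"   },
            { OPERATOR,    "="   },
            { NUMBER,      "10"  },
            { PUNCTUATION, ";"   },
        };
        const char *names[] = { "keyword", "identifier", "operator",
                                "number", "punctuation" };
        for (int i = 0; i < 5; i++)
            printf("%-4s -> %s\n", tokens[i].lexeme, names[tokens[i].type]);
        return 0;
    }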
Specifications of Tokens
Let us understand how language theory defines the following terms:
Alphabets
Any finite set of symbols is called an alphabet. For example, {0,1} is the binary alphabet,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet, and {a-z, A-Z} is the alphabet of
English letters.
Strings
Any finite sequence of symbols drawn from an alphabet is called a string. The length of a string
is the total number of occurrences of symbols in it, e.g., the length of the string tutorialspoint
is 14 and is denoted by |tutorialspoint| = 14. A string having no symbols, i.e., a string of zero
length, is known as the empty string and is denoted by ε (epsilon).
Special symbols
A typical high-level language contains special symbols such as arithmetic operators (+, -, *, /),
punctuation (comma, semicolon, parentheses), assignment (=), comparison operators (==, !=,
<, >), logical operators (&&, ||, !), and preprocessor symbols (#).
Language
A language is a set of strings over some finite alphabet. Computer languages are such sets,
and mathematically, set operations can be performed on them. Regular languages, including all
finite languages, can be described by means of regular expressions.
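For example, if L1 = {a, b} and L2 = {c} are languages over the alphabet {a, b, c}, then the
union is L1 ∪ L2 = {a, b, c} and the concatenation is L1L2 = {ac, bc}.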
Regular Expressions
The lexical analyzer needs to scan and identify only a finite set of valid strings/tokens/lexemes
that belong to the language at hand. It searches for the patterns defined by the language rules.
Regular expressions can express regular languages by defining patterns for strings of symbols.
The grammar defined by regular expressions is known as a regular grammar, and the language
defined by a regular grammar is known as a regular language.
A regular expression is an important notation for specifying patterns. Each pattern matches a
set of strings, so regular expressions serve as names for sets of strings. Programming language
tokens can be described by regular languages. The specification of regular expressions is an
example of a recursive definition. Regular languages are easy to understand and have efficient
implementations.
There are a number of algebraic laws that are obeyed by regular expressions, which can be
used to manipulate regular expressions into equivalent forms.
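As a concrete sketch, the program below uses the POSIX regex library to classify a few lexemes
against two illustrative token patterns, one for identifiers and one for unsigned integer
constants. The patterns are assumptions for the sake of the example, not part of any particular
language definition:

    #include <regex.h>
    #include <stdio.h>

    int main(void) {
        /* Illustrative token patterns: identifiers and unsigned
           integer constants, anchored to match the whole lexeme. */
        regex_t id_re, num_re;
        regcomp(&id_re, "^[A-Za-z_][A-Za-z0-9_]*$", REG_EXTENDED);
        regcomp(&num_re, "^[0-9]+$", REG_EXTENDED);

        const char *lexemes[] = { "abs_zero_Kelvin", "273", "float", "27a" };
        for (int i = 0; i < 4; i++) {
            /* Note: a keyword like "float" also matches the identifier
               pattern; real lexers separate keywords with a table lookup. */
            if (regexec(&id_re, lexemes[i], 0, NULL, 0) == 0)
                printf("%-16s -> identifier\n", lexemes[i]);
            else if (regexec(&num_re, lexemes[i], 0, NULL, 0) == 0)
                printf("%-16s -> number\n", lexemes[i]);
            else
                printf("%-16s -> no match\n", lexemes[i]);
        }
        regfree(&id_re);
        regfree(&num_re);
        return 0;
    }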
What is a Lexeme?
A lexeme is an actual string of characters that matches a pattern and generates a token,
e.g., "float", "abs_zero_Kelvin", "=", "-", "273", ";".
Lexical analysis is the first phase of the compiler, where the lexical analyzer operates as an
interface between the source code and the rest of the phases of the compiler. It reads the input
characters of the source program, groups them into lexemes, and produces a token for each
lexeme. The tokens are sent to the parser for syntax analysis.
If the lexical analyzer is implemented as a separate pass in the compiler, it may need an
intermediate file to hold its output, from which the parser would then take its input. To eliminate
the need for the intermediate file, the lexical analyzer and the syntax analyzer (parser) are often
grouped into the same pass, where the lexical analyzer operates either under the control of the
parser or as a subroutine called by the parser.
The lexical analyzer also interacts with the symbol table while passing tokens to the parser.
Whenever a token is discovered, the lexical analyzer returns a representation of that token to
the parser. If the token is a simple construct such as a parenthesis, comma, or colon, it returns
an integer code. If the token is a more complex item, such as an identifier or another token with
a value, the value is also passed to the parser.
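The fragment below sketches this convention; the token codes and the install routine are
hypothetical stand-ins for whatever representation a real compiler uses. Simple tokens are
returned as plain integer codes, while an identifier is paired with its symbol-table index:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical token codes; single-character tokens reuse their
       character value, identifiers get a code above the char range. */
    enum { TOK_ID = 256, TOK_COMMA = ',', TOK_LPAREN = '(', TOK_COLON = ':' };

    #define MAX_SYMS 64
    static char symtab[MAX_SYMS][32];  /* toy symbol table: names only */
    static int  nsyms = 0;

    /* Install a name, returning its index; reuse the entry if present.
       No bounds checking: this is a sketch, not production code. */
    static int install(const char *name) {
        for (int i = 0; i < nsyms; i++)
            if (strcmp(symtab[i], name) == 0) return i;
        strcpy(symtab[nsyms], name);
        return nsyms++;
    }

    int main(void) {
        /* A simple token is just an integer code... */
        printf("simple token code: %d\n", TOK_COMMA);
        /* ...while an identifier carries its symbol-table index. */
        int attr = install("abs_zero_Kelvin");
        printf("token code: %d, symbol-table index: %d\n", TOK_ID, attr);
        return 0;
    }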
The lexical analyzer separates the characters of the source language into groups that logically
belong together, called tokens. A token includes a token name, which is an abstract symbol that
defines a type of lexical unit, and an optional attribute value. Tokens can be identifiers,
keywords, constants, operators, and punctuation symbols such as commas and parentheses. A
rule that describes the set of input strings for which the same token is produced as output is
called a pattern.
The lexical analyzer may also perform secondary tasks. For example, it can keep track of all
newline characters so that it can associate a line number with each error message. It can also
implement the expansion of macros when a macro preprocessor is used on the source program.
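As a small illustration of the newline-tracking task (the toy source string and the choice of '@'
as an invalid character are assumptions for the example):

    #include <stdio.h>

    /* A minimal sketch of newline tracking: the lexer counts '\n'
       characters so any error message can carry a line number. */
    int main(void) {
        const char *source = "int a;\nint @b;\n";  /* '@' is illegal here */
        int line = 1;
        for (const char *p = source; *p != '\0'; p++) {
            if (*p == '\n')
                line++;                /* keep the line counter current */
            else if (*p == '@')        /* toy example of a bad character */
                printf("error: invalid character '@' at line %d\n", line);
        }
        return 0;
    }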
Input Buffering
Without buffering, the lexical analyzer would have to access secondary memory each time to
identify tokens, which is time-consuming and costly. So, the input string is stored in a buffer
and then scanned by the lexical analyzer.
The lexical analyzer scans the input string from left to right one character at a time to identify
tokens. It uses two pointers to scan tokens −
Begin Pointer (bptr) − It points to the beginning of the token being read.
Look Ahead Pointer (lptr) − It moves ahead to search for the end of the token.
Both pointers start at the beginning of the string, which is stored in the buffer.
For example, while scanning the declaration int a, the character (a blank space) beyond the
token "int" has to be examined before the token "int" can be determined. After the token "int"
is processed, both pointers are set to the start of the next token ('a'), and this process is
repeated for the whole program.
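A minimal sketch of this two-pointer scan, run on the statement int a; is shown below. The
tokenization rules are simplified assumptions for the example, treating any run of alphanumeric
characters or underscores as a single token:

    #include <ctype.h>
    #include <stdio.h>

    int main(void) {
        const char *input = "int a;";
        const char *begin = input;          /* begin pointer (bptr) */

        while (*begin != '\0') {
            if (isspace((unsigned char)*begin)) { begin++; continue; }
            const char *lookahead = begin;  /* look-ahead pointer (lptr) */
            if (isalnum((unsigned char)*lookahead) || *lookahead == '_') {
                /* extend over the whole identifier/keyword/number */
                while (isalnum((unsigned char)*lookahead) || *lookahead == '_')
                    lookahead++;
            } else {
                lookahead++;    /* single-character token such as ';' */
            }
            printf("token: %.*s\n", (int)(lookahead - begin), begin);
            begin = lookahead;  /* both pointers move to the next token */
        }
        return 0;
    }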
A buffer can be divided into two halves. If the look-ahead pointer moves beyond the first half,
the second half is filled with new characters to be read. If the look-ahead pointer moves towards
the right end of the second half, the first half is refilled with new characters, and so on.
Buffer Pairs
A specialized buffering technique can decrease the amount of overhead needed to process an
input character. It uses two buffers, each N characters in size, which are reloaded alternately.
Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to the start of the
current lexeme being discovered. forward scans ahead until a match for a pattern is found; once
the next lexeme is determined, forward is set to the character at its right end. After the lexeme
is processed, lexemeBegin is set to the character immediately after the lexeme just found.
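The sketch below implements this scheme in C for input read from a FILE stream. The function
and variable names are hypothetical, and a '\0' sentinel marks the end of each half, so the
sketch assumes the input contains no NUL bytes. Production lexers typically use a dedicated
sentinel character and must also handle lexemes that straddle the boundary between the two
halves, which this sketch omits:

    #include <stdio.h>

    #define N 4096                  /* size of each buffer half */

    static char buffer[2][N + 1];   /* two halves, each with a sentinel slot */
    static int  current = 0;        /* which half forward is scanning */
    static char *forward;           /* the look-ahead pointer */

    static void load_half(FILE *in, int half) {
        size_t n = fread(buffer[half], 1, N, in);
        buffer[half][n] = '\0';     /* place the sentinel after the data */
    }

    /* Return the next input character, reloading the other half when
       the forward pointer reaches a sentinel. */
    static int next_char(FILE *in) {
        if (*forward == '\0') {
            current = 1 - current;      /* switch halves ... */
            load_half(in, current);     /* ... and refill the new one */
            forward = buffer[current];
            if (*forward == '\0')       /* refill got nothing: end of input */
                return EOF;
        }
        return (unsigned char)*forward++;
    }

    int main(void) {
        load_half(stdin, 0);
        forward = buffer[0];
        long count = 0;
        while (next_char(stdin) != EOF)
            count++;
        printf("read %ld characters\n", count);
        return 0;
    }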