This document discusses lexical analysis and its relation to parsing in compilers. It introduces the basic terminology of tokens, patterns, lexemes, and attributes, and describes how a lexical analyzer scans the input, identifies tokens, and passes them to the parser. Regular expressions are used to specify the patterns for token recognition, and finite automata (nondeterministic and deterministic) built from those regular expressions are used to recognize tokens. Transition diagrams and the generation of lexical analyzers are also covered.
2. Lexical Analyzer
• Functions
• Grouping input characters into tokens
• Stripping out comments and white spaces
• Correlating error messages with the source program
• Issues (why separate lexical analysis from parsing)
• Simpler design
• Compiler efficiency
• Compiler portability (e.g. Linux to Win)
3. The Role of a Lexical Analyzer
[Figure: the lexical analyzer sits between the source program and the parser. It reads characters from the source program (and may put a character back); each time the parser asks for the next token, it passes back the token and its attribute value. Both the lexical analyzer and the parser consult the symbol table. The source program may be read entirely into memory first.]
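A minimal C sketch of this lexer/parser interface; the Token struct, the token names, and getNextToken are illustrative names assumed here, not taken from the slides.

enum TokenType { TOK_ID, TOK_NUM, TOK_IF, TOK_EOF };

struct Token {
    enum TokenType type;
    int attribute;               /* e.g. symbol-table index or literal value */
};

/* Implemented by the lexical analyzer: reads characters from the source
   program (possibly putting one back) and returns the next token with its
   attribute value. */
struct Token getNextToken(void) {
    struct Token t = { TOK_EOF, 0 };   /* stub so the sketch compiles */
    return t;
}

/* The parser never touches raw characters; it pulls tokens on demand. */
void parse(void) {
    struct Token tok = getNextToken();
    while (tok.type != TOK_EOF) {
        /* ... grammar analysis using tok.type and tok.attribute ... */
        tok = getNextToken();
    }
}

int main(void) { parse(); return 0; }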
4. Lexical Analysis
• What do we want to do? Example:
if (i == j)
Z = 0;
else
Z = 1;
• The input is just a string of characters:
\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Goal: Partition input string into substrings
• Where the substrings are tokens
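For this input the scanner would produce, roughly, the following token stream (the token names are illustrative, not fixed by the slides):
IF  LPAREN  ID(i)  RELOP(==)  ID(j)  RPAREN  ID(z)  ASSIGN  NUM(0)  SEMI  ELSE  ID(z)  ASSIGN  NUM(1)  SEMI
The whitespace (tabs and newlines) is stripped and never reaches the parser.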
8. What’s a Token?
• A syntactic category
• In English:
• noun, verb, adjective, …
• In a programming language:
• Identifier, Integer, Keyword, Whitespace, …
9. What are Tokens For?
• Classify program substrings according to role
• Output of lexical analysis is a stream of tokens, which is the input to the parser
• Parser relies on token distinctions
• An identifier is treated differently than a keyword
10. Tokens
• Tokens correspond to sets of strings.
• Identifier: strings of letters or digits, starting with a letter
• Integer: a non-empty string of digits
• Keyword: “else” or “if” or “begin” or …
• Whitespace: a non-empty sequence of blanks, newlines, and tabs
11. Typical Tokens in a PL
• Symbols: +, -, *, /, =, <, >, ->, …
• Keywords: if, while, struct, float, int, …
• Integer and Real (floating point) literals
123, 123.45
• Char (string) literals
• Identifiers
• Comments
• White space
13. Tokens, Patterns and Lexemes
• Pattern: A rule that describes a set of strings
• Token: A class of strings that match the same pattern
• Lexeme: The actual sequence of characters in the source program that matches the pattern for a token
Token     Sample Lexemes      Pattern
if        if                  if
id        abc, n, count, …    a letter followed by letters and digits
NUMBER    3.14, 1000          numerical constant
;         ;                   ;
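For example, in a statement like count = count + 10; the two occurrences of count are lexemes matching the pattern for the token id, 10 is a lexeme matching the pattern for NUMBER, and =, + and ; are lexemes of single-character operator and punctuation tokens.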
17. Token Attribute
• E = C1 ** 10
Token   Attribute
ID      index to symbol-table entry for E
=       (none)
ID      index to symbol-table entry for C1
**      (none)
NUM     10
18. Lexical Error and Recovery
• Error detection
• Error reporting
• Error recovery
• Delete the current character and restart scanning at
the next character
• Delete the first character read by the scanner and
resume scanning at the character following it.
• How about runaway strings and comments?
20. Specification of Tokens
• Regular expressions are an important notation for specifying lexeme
patterns. While they cannot express all possible patterns, they are very
effective in specifying those types of patterns that we actually need for
tokens.
21. Strings and Languages
• An alphabet is any finite set of symbols such as letters, digits, and
punctuation.
• The set {0,1} is the binary alphabet
• If x and y are strings, then the concatenation of x and y is also string,
denoted xy, For example, if x = dog and y = house, then xy = doghouse.
• The empty string is the identity under concatenation; that is, for any string
s, εs = sε = s.
• A string over an alphabet is a finite sequence of symbols drawn
from that alphabet.
• In language theory, the terms "sentence" and "word" are often used as
synonyms for "string."
• |s| represents the length of a string s, Ex: banana is a string of length 6
• The empty string, ε, is the string of length zero.
22. Strings and Languages (cont.)
• A language is any countable set of strings over some fixed alphabet.
• Let L = {A, …, Z}; then {“A”, ”B”, ”C”, …, ”BF”, …, ”ABZ”, …} is a language defined over L.
• Abstract languages like ∅, the empty set, or {ε}, the set containing only the empty string, are languages under this definition.
24. Operations on Languages
Example:
Let L be the set of letters {A, B, …, Z, a, b, …, z} and
let D be the set of digits {0, 1, …, 9}.
L and D are, respectively, the alphabets of uppercase and lowercase
letters and of digits.
Other languages can be constructed from L and D using the standard
operators on languages, listed below.
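For reference, the standard operations referred to are: union L ∪ M = { s : s is in L or s is in M }; concatenation LM = { st : s is in L and t is in M }; Kleene closure L*, the set of all strings obtained by concatenating zero or more strings of L; and positive closure L+, the same but with at least one string.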
26. Operations on Languages (cont.)
1. L ∪ D is the set of letters and digits: strictly speaking, the language with 62
(52 + 10) strings of length one, each of which is either one letter or one digit.
2. LD is the set of 520 (52 × 10) strings of length two, each consisting of one letter
followed by one digit.
Ex: A1, a1, B0, etc.
3. L⁴ is the set of all 4-letter strings (ex: aaba, bcef).
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.
27. Regular Expressions
• The standard notation for regular languages is regular expressions.
• Atomic regular expressions: ε, denoting the language {ε}, and a single symbol a of the alphabet, denoting the language {a}
• Compound regular expressions: built from smaller ones using union (r | s), concatenation (rs), and Kleene closure (r*), as detailed on the next slide
28. Cont.
Larger regular expressions are built from smaller ones. Let r and s be regular
expressions denoting the languages L(r) and L(s), respectively.
1. (r) | (s) is a regular expression denoting the language L(r) U L(s).
2. (r) (s) is a regular expression denoting the language L(r) L(s) .
3. (r) * is a regular expression denoting (L (r)) * .
4. (r) is a regular expression denoting L(r). This last rule says that we can
add additional pairs of parentheses around expressions without changing
the language they denote.
For example, we may replace the regular expression (a) | ((b)*(c)) by a | b*c.
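As a worked example, a | b*c denotes the language {a} ∪ {b}*{c} = { a, c, bc, bbc, bbbc, … }: either a single a, or zero or more b's followed by one c.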
31. Regular Definition
• C identifiers are strings of letters, digits, and underscores.
The regular definition for the language of C identifiers.
• letter_ → A | B | C | … | Z | a | b | … | z | _
• digit → 0 | 1 | 2 | … | 9
• id → letter_ ( letter_ | digit )*
• Unsigned numbers (integer or floating point) are strings such
as 5280, 0.01234, 6.336E4, or 1.89E-4. The regular
definition
• digit → 0 | 1 | 2 | … | 9
• digits → digit digit*
• optionalFraction → . digits | ε
• optionalExponent → ( E ( + | - | ε ) digits ) | ε
• number → digits optionalFraction optionalExponent
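A minimal C sketch of a checker for the id definition above; the function name is_c_identifier and the test strings are illustrative choices, not part of the slides.

#include <ctype.h>
#include <stdio.h>

/* Returns 1 if s matches id -> letter_ ( letter_ | digit )*, else 0,
   where letter_ is an ASCII letter or underscore as in the regular
   definition above. */
int is_c_identifier(const char *s) {
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;                        /* must start with letter_ */
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;                    /* later chars: letter_ or digit only */
    return 1;
}

int main(void) {
    printf("%d %d %d\n",
           is_c_identifier("count"),     /* 1 */
           is_c_identifier("_tmp1"),     /* 1 */
           is_c_identifier("1abc"));     /* 0: starts with a digit */
    return 0;
}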
33. RECOGNITION OF TOKENS
• Given the grammar of a branching statement, its terminals (if, then, else, relop, id, and number) are the names of tokens as used by the lexical analyzer, and each token has a pattern.
• The lexical analyzer also has the job of stripping out whitespace, by recognizing the "token" ws defined by ws → ( blank | tab | newline )+.
39. Recognition of Identifiers
• Ex2: ID = letter ( letter | digit )*
• Transition diagram: [Figure: from start state 9, a letter moves to state 10; state 10 loops on letter or digit; any other character moves to accepting state 11, marked # to indicate input retraction, where the analyzer returns id.]
40. Mapping transition diagrams into C code
The C code for the identifier transition diagram above (states 9, 10, 11):
switch (state) {
case 9:  /* start state: the current character must be a letter */
         if (isletter(c)) state = 10;
         else state = failure();
         break;
case 10: /* keep reading letters and digits */
         c = nextchar();
         if (isletter(c) || isdigit(c)) state = 10;
         else state = 11;
         break;
case 11: /* accept: retract the lookahead character and return the id token */
         retract(1);
         insert(id);
         return;
}
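A self-contained variant of the same diagram, reading from a stdio stream; getc/ungetc stand in for the slide's nextchar/retract helpers, and the names scan_identifier and TOKEN_ID are assumptions made for this sketch.

#include <ctype.h>
#include <stdio.h>

#define TOKEN_ID 1

/* State 9: the first character must be a letter.  State 10: loop on
   letters and digits.  State 11: put back the lookahead character
   (input retraction) and report an id token. */
int scan_identifier(FILE *in) {
    int c = getc(in);
    if (!isalpha(c)) {
        ungetc(c, in);
        return 0;                 /* failure: input does not start with a letter */
    }
    while (isalnum(c = getc(in)))
        ;                         /* consume the rest of the identifier */
    ungetc(c, in);                /* retract the one extra character read */
    return TOKEN_ID;
}

int main(void) {
    printf("%d\n", scan_identifier(stdin));  /* 1 if stdin starts with an identifier */
    return 0;
}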
41. Recognition of Reserved Words
• Install the reserved words in the symbol table initially. A field of the symbol-table entry indicates that these strings are never ordinary identifiers, and tells which token they represent.
• Alternatively, create separate transition diagrams for each keyword; for example, the transition diagram for the reserved word then spells out the characters t, h, e, n state by state before accepting.
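A minimal C sketch of the first approach; the keyword table here is a stand-in for the symbol table, and the token codes and names are illustrative assumptions.

#include <stdio.h>
#include <string.h>

enum { TOK_ID = 256, TOK_IF, TOK_THEN, TOK_ELSE };

struct Keyword { const char *lexeme; int token; };

/* Reserved words "installed" in advance, each with the token it represents. */
static const struct Keyword keywords[] = {
    { "if",   TOK_IF   },
    { "then", TOK_THEN },
    { "else", TOK_ELSE },
};

/* After scanning something that looks like an identifier, check whether
   its lexeme is actually a reserved word. */
int lookup_keyword_or_id(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].token;
    return TOK_ID;                /* not reserved: an ordinary identifier */
}

int main(void) {
    printf("%d %d\n", lookup_keyword_or_id("then"), lookup_keyword_or_id("count"));
    return 0;
}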
42. The transition diagram for token number
[Figure: the diagram has multiple accepting states: one accepting an integer (e.g. 12), one accepting a float with a fractional part (e.g. 12.31), and one accepting a float with an exponent (e.g. 12.31E4).]
43. Finite Automata
• A transition diagram is a finite automaton
• A Nondeterministic Finite Automaton (NFA) consists of:
• A set of states
• A set of input symbols
• A transition function, move(), that maps state-symbol
pairs to sets of states.
• A start state S0
• A set of states F as accepting (Final) states.
44. Example
[Figure: transition diagram of an NFA with states 0, 1, 2 and 3 and transitions on a and b.]
The set of states = {0, 1, 2, 3}
The input symbols = {a, b}
The start state is 0 and the accepting state is 3.
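A minimal C sketch of how such an automaton can be simulated with a transition table. To keep the code short it simulates a DFA rather than an NFA (a DFA is the special case with exactly one next state per move); the table below is the textbook DFA for (a|b)*abb, used purely for illustration and not taken from the slide's diagram.

#include <stdio.h>

enum { NSTATES = 4, START = 0, ACCEPT = 3 };

/* move[state][symbol]: symbol 0 = 'a', symbol 1 = 'b' */
static const int move[NSTATES][2] = {
    { 1, 0 },   /* state 0 */
    { 1, 2 },   /* state 1 */
    { 1, 3 },   /* state 2 */
    { 1, 0 },   /* state 3 */
};

/* Run the automaton over the input and report whether it ends in the
   accepting state.  The input is assumed to be over the alphabet {a, b}. */
int accepts(const char *input) {
    int state = START;
    for (const char *p = input; *p; p++) {
        int sym = (*p == 'a') ? 0 : 1;
        state = move[state][sym];
    }
    return state == ACCEPT;
}

int main(void) {
    printf("%d %d\n", accepts("aabb"), accepts("abab"));   /* prints 1 0 */
    return 0;
}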