COMPILER DESIGN
Topic: Lexical Analysis
By
RANJAN V
The Role of the Lexical Analyzer
• As the first phase of a compiler, the main task of the lexical analyzer
is to read the input characters of the source program, group them into
lexemes, and produce as output a sequence of tokens for each lexeme
in the source program.
• It is common for the lexical analyzer to interact with the symbol table
as well. When the lexical analyzer discovers a lexeme constituting an
identifier, it needs to enter that lexeme into the symbol table.
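The role of the lexical analyzer
[Figure: the source program is read by the lexical analyzer; the parser calls getNextToken and the lexical analyzer returns a token; both consult the symbol table; the parser's output goes on to semantic analysis.]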
Lexical Analyzer (continued)
• Since the lexical analyzer is the part of the compiler that reads the
source text, it may perform certain other tasks besides identifying
lexemes:
• One such task is stripping out comments and whitespace (blank,
newline, tab).
• Another task is correlating error messages generated by the compiler
with the source program.
• For instance, the lexical analyzer may keep track of the number of
newline characters seen, so it can associate a line number with each
error message.
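The two side tasks just described can be sketched in C as follows. This is a minimal illustration, not part of the original slides: the `Scanner` struct and the function name `skip_blanks_and_comments` are hypothetical, and only C-style `/* ... */` comments are assumed.

```c
#include <stddef.h>

/* Hypothetical scanner state: a cursor into the source text plus the
   current line number (1-based). Both names are illustrative. */
typedef struct {
    const char *pos;
    int line;
} Scanner;

/* Skip blanks, tabs, newlines and block comments, counting every newline
   so that later error messages can report a line number. */
void skip_blanks_and_comments(Scanner *s)
{
    for (;;) {
        char c = *s->pos;
        if (c == '\n') {
            s->line++;
            s->pos++;
        } else if (c == ' ' || c == '\t' || c == '\r') {
            s->pos++;
        } else if (c == '/' && s->pos[1] == '*') {
            s->pos += 2;                       /* step over the opener */
            while (*s->pos != '\0' &&
                   !(s->pos[0] == '*' && s->pos[1] == '/')) {
                if (*s->pos == '\n')
                    s->line++;                 /* newlines inside comments count too */
                s->pos++;
            }
            if (*s->pos != '\0')
                s->pos += 2;                   /* step over the closer */
        } else {
            return;                            /* first character of the next lexeme */
        }
    }
}
```

After the call, `pos` points at the first character of the next lexeme and `line` holds the line on which it appears.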
Sometimes, lexical analyzers are divided into a cascade of two
processes:
a) Scanning consists of the simple processes that do not require
tokenization of the input, such as deletion of comments and
compaction of consecutive whitespace characters into one.
b) Lexical analysis proper is the more complex portion, where the
scanner produces the sequence of tokens as output.
Why separate lexical analysis from parsing (syntax analysis)?
1. Simplicity of design
The separation of lexical and syntactic analysis often
allows us to simplify at least one of these tasks. For
example, a parser that had to deal with comments and
whitespace as syntactic units would be considerably more
complex than one that can assume comments and
whitespace have already been removed by the lexical
analyzer.
2. Improving compiler efficiency
A separate lexical analyzer allows us to apply specialized
techniques that serve only the lexical task, not the job of
parsing.
3. Enhancing compiler portability
Input-device-specific peculiarities can be restricted to the lexical analyzer.
Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional attribute value
• A pattern is a description of the form that the lexemes of a token may
take
• A lexeme is a sequence of characters in the source program that
matches the pattern for a token
Example
Token        Informal description                      Sample lexemes
if           Characters i, f                           if
else         Characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           Letter followed by letters and digits     pi, score, D2
number       Any numeric constant                      3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s          "core dumped"

printf("total = %d\n", score);
Attributes for tokens
• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
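The token/attribute pairs listed above can be produced by a small hand-written lexer. The following C sketch is an illustration only: the names `Token`, `TokenName`, `install_id` and `get_next_token`, the fixed-size symbol table, and the use of a table index in place of a pointer are all assumptions made for this example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_ID, TOK_NUMBER, TOK_ASSIGN_OP, TOK_MULT_OP,
               TOK_EXP_OP, TOK_EOF, TOK_ERROR } TokenName;

typedef struct {
    TokenName name;
    int attr;   /* symbol-table index for TOK_ID, value for TOK_NUMBER */
} Token;

#define MAX_SYMBOLS 32
#define MAX_LEXEME  32
static char symtab[MAX_SYMBOLS][MAX_LEXEME];   /* toy symbol table */
static int  symcount = 0;

/* Return the table index of the lexeme, entering it if it is new
   (-1 if the toy table is full). */
int install_id(const char *start, size_t len)
{
    if (len >= MAX_LEXEME) len = MAX_LEXEME - 1;
    for (int i = 0; i < symcount; i++)
        if (strlen(symtab[i]) == len && strncmp(symtab[i], start, len) == 0)
            return i;
    if (symcount == MAX_SYMBOLS) return -1;
    memcpy(symtab[symcount], start, len);
    symtab[symcount][len] = '\0';
    return symcount++;
}

/* Return the next token and advance *input past its lexeme. */
Token get_next_token(const char **input)
{
    const char *p = *input;
    Token t = { TOK_EOF, 0 };
    while (*p == ' ' || *p == '\t' || *p == '\n') p++;
    if (*p == '\0') {
        /* end of input: t is already TOK_EOF */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        const char *start = p;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        t.name = TOK_ID;
        t.attr = install_id(start, (size_t)(p - start));
    } else if (isdigit((unsigned char)*p)) {
        char *end;
        t.name = TOK_NUMBER;
        t.attr = (int)strtol(p, &end, 10);
        p = end;
    } else if (*p == '=') {
        t.name = TOK_ASSIGN_OP; p++;
    } else if (p[0] == '*' && p[1] == '*') {
        t.name = TOK_EXP_OP; p += 2;     /* one character of lookahead */
    } else if (*p == '*') {
        t.name = TOK_MULT_OP; p++;
    } else {
        t.name = TOK_ERROR; p++;
    }
    *input = p;
    return t;
}
```

Calling `get_next_token` repeatedly on `E = M * C ** 2` yields id, assign-op, id, mult-op, id, exp-op, number in that order, with E, M and C entered into the table as entries 0, 1 and 2.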
Lexical errors
It is hard for a lexical analyzer to tell, without the aid of other
components, that there is a source-code error.
• Some errors are beyond the power of the lexical analyzer to recognize:
• fi (a == f(x)) …
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an
undeclared function identifier. Since fi is a valid lexeme for the token id, the
lexical analyzer must return the token id to the parser and let some other
phase of the compiler (probably the parser in this case) handle the error.
• However, it may be able to recognize errors like:
• d = 2r
• Such errors are recognized when no token pattern matches the
character sequence.
Error recovery
Suppose a situation arises in which the lexical analyzer is unable to
proceed because none of the patterns for tokens matches any prefix of
the remaining input.
The simplest recovery strategy is "panic mode" recovery:
• Successive characters are deleted from the remaining input until the
lexical analyzer can find a well-formed token
Other possible error-recovery actions are:
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters
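Input buffering
• Sometimes the lexical analyzer needs to look ahead some symbols to
decide which token to return
• In C: after reading -, = or <, we need to look at the next character to decide what token to return
• In Fortran: in DO 5 I = 1.25, blanks are not significant, so only the . reveals an assignment to DO5I rather than the start of a DO loop (DO 5 I = 1,25)
• We need to introduce a two-buffer scheme to handle large lookaheads
safely
[Figure: input buffer holding E = M * C * * 2 eof]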
Sentinels
Each buffer ends with a sentinel character eof, so the common case needs only one test per character:
switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters;
}
[Figure: buffer pair with a sentinel eof at the end of each buffer: E = M eof * C * * 2 eof eof]
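A runnable C version of the sentinel scheme might look like the sketch below. It rests on several assumptions that are not in the slides: the source is an in-memory string standing in for a file, the NUL character plays the role of eof, the buffer size `N` is deliberately tiny so that reloads happen often, and the names `Input`, `input_init` and `next_char` are hypothetical. Retracting across a buffer boundary and the lexemeBegin pointer are left out.

```c
#include <stddef.h>
#include <string.h>

#define N 4                 /* tiny buffer size, chosen so reloads happen often */
#define SENTINEL '\0'       /* assumption: NUL never occurs in the source text */

typedef struct {
    const char *src;        /* unread source text (stands in for a file) */
    char buf[2 * (N + 1)];  /* two halves, each N characters plus a sentinel slot */
    char *forward;          /* next character to be scanned */
} Input;

/* Fill one half with up to N characters and terminate it with a sentinel.
   A sentinel in slot N means "end of buffer"; an earlier one means
   "end of input". */
static void reload(Input *in, int half)
{
    char *base = in->buf + half * (N + 1);
    size_t n = strlen(in->src);
    if (n > N) n = N;
    memcpy(base, in->src, n);
    in->src += n;
    base[n] = SENTINEL;
}

void input_init(Input *in, const char *src)
{
    in->src = src;
    reload(in, 0);
    in->forward = in->buf;
}

/* Return the next character, or -1 at end of input. */
int next_char(Input *in)
{
    for (;;) {
        char c = *in->forward++;
        if (c != SENTINEL)
            return (unsigned char)c;          /* common case: a single test */
        if (in->forward == in->buf + (N + 1)) {
            reload(in, 1);                    /* ran off the first half */
            in->forward = in->buf + (N + 1);
        } else if (in->forward == in->buf + 2 * (N + 1)) {
            reload(in, 0);                    /* ran off the second half */
            in->forward = in->buf;
        } else {
            in->forward--;                    /* stay on the sentinel */
            return -1;                        /* sentinel inside a buffer: end of input */
        }
    }
}
```

The buffer-boundary checks run only when a sentinel is actually read, which is the point of the technique.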
Specification of tokens
• In compiler theory, regular expressions are used to formalize
the specification of tokens
• Regular expressions are a means of specifying regular languages
• Example:
• letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying the form of strings
Regular expressions
• ε is a regular expression, L(ε) = {ε}
• If a is a symbol in ∑, then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn
• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
• One or more instances: (r)+
• Zero or one instance: r?
• Character classes: [abc]
• Example:
• letter_ -> [A-Za-z_]
• digit -> [0-9]
• id -> letter_ (letter_ | digit)*
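The regular definition for id translates almost directly into code. The C sketch below is illustrative: the function names are hypothetical and ASCII character ranges are assumed.

```c
#include <stddef.h>

/* letter_ -> [A-Za-z_]   (ASCII ranges assumed) */
int is_letter_(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* digit -> [0-9] */
int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* id -> letter_ (letter_ | digit)*
   Returns the length of the longest prefix of s that matches id,
   or 0 if s does not start with an identifier. */
size_t match_id(const char *s)
{
    size_t n;
    if (!is_letter_(s[0]))
        return 0;                 /* the first symbol must be a letter_ */
    n = 1;
    while (is_letter_(s[n]) || is_digit(s[n]))
        n++;                      /* the starred part: zero or more repetitions */
    return n;
}
```

For example, `match_id("D2")` returns 2, while `match_id("2r")` returns 0 because the string starts with a digit.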
Recognition of tokens
• The starting point is the language grammar, which shows which tokens are needed:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id
      | number
Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
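As an illustration of how such a pattern is recognized, the C sketch below matches the unsigned-number pattern digits (. digits)? (E [+-]? digits)? by hand. The function names are hypothetical, and only an uppercase E is accepted as the exponent marker, as in the pattern.

```c
#include <stddef.h>

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* digits -> digit+ : number of consecutive digits at the start of s */
static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (is_digit(s[n]))
        n++;
    return n;
}

/* number -> digits (. digits)? (E [+-]? digits)?
   Returns the length of the longest matching prefix of s, or 0 if none. */
size_t match_number(const char *s)
{
    size_t n = match_digits(s);
    size_t k, m;
    if (n == 0)
        return 0;
    /* optional fraction: taken only if at least one digit follows the dot */
    if (s[n] == '.' && (k = match_digits(s + n + 1)) > 0)
        n += 1 + k;
    /* optional exponent: taken only if digits follow the optional sign */
    if (s[n] == 'E') {
        m = n + 1;
        if (s[m] == '+' || s[m] == '-')
            m++;
        k = match_digits(s + m);
        if (k > 0)
            n = m + k;
    }
    return n;
}
```

An optional part that fails to complete is simply not consumed, so `match_number("7E+")` returns 1 and leaves `E+` for the next token.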
Transition diagrams
• Transition diagram for relop
Transition diagrams (cont.)
• Transition diagram for reserved words and identifiers
Transition diagrams (cont.)
• Transition diagram for unsigned numbers
Transition diagrams (cont.)
• Transition diagram for whitespace
Architecture of a transition-diagram-based lexical analyzer
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) { /* repeat character processing until a
                   return or failure occurs */
        switch (state) {
        case 0: c = nextchar();
                if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else fail(); /* lexeme is not a relop */
                break;
        case 1: …
        …
        case 8: retract();
                retToken.attribute = GT;
                return(retToken);
        }
    }
}
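A complete, runnable C counterpart of this relop recognizer is sketched below for the pattern relop -> < | > | <= | >= | = | <>. The states 0, 1, 5, 6 and 8 follow the fragment above; the numbering of states 2, 3, 4 and 7, the `RelopKind` enum and the `get_relop` signature are assumptions made for this example.

```c
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } RelopKind;

/* Recognize a relop at *input. On success, *input is advanced past the
   lexeme; on failure it is left unchanged and NOT_RELOP is returned. */
RelopKind get_relop(const char **input)
{
    const char *forward = *input;
    int state = 0;
    for (;;) {
        char c;
        switch (state) {
        case 0:
            c = *forward++;
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else return NOT_RELOP;           /* fail(): lexeme is not a relop */
            break;
        case 1:                              /* seen '<' */
            c = *forward++;
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: *input = forward; return LE;
        case 3: *input = forward; return NE;
        case 4: forward--;                   /* retract: give back the lookahead */
                *input = forward; return LT;
        case 5: *input = forward; return EQ;
        case 6:                              /* seen '>' */
            c = *forward++;
            state = (c == '=') ? 7 : 8;
            break;
        case 7: *input = forward; return GE;
        case 8: forward--;                   /* retract: give back the lookahead */
                *input = forward; return GT;
        default:
            return NOT_RELOP;
        }
    }
}
```

States 4 and 8 are the retracting states: the character that ended the lexeme is pushed back so that it can start the next token.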
Lexical Analyzer Generator - Lex
Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Structure of Lex programs
declarations
%%
translation rules
%%
auxiliary functions
Each translation rule has the form: Pattern { Action }
PUMPING LEMMA
Using the pumping lemma, we can prove by contradiction that certain
languages are not regular.
CFG
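Leftmost and rightmost derivation
Take the grammar below as an example.
For a CFG, each production rule must have the form A -> α, where A is a
single variable and α is a string of variables and terminals.
Leftmost derivation: at every step the leftmost variable is replaced.
Rightmost derivation: at every step the rightmost variable is replaced.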
Example 2 of leftmost and rightmost derivation
1. S -> AB | ε
2. A -> aB
3. B -> Sb
Derive "abb" using both a leftmost and a rightmost derivation.
Leftmost derivation:
S ⇒ AB
  ⇒ aBB      (A -> aB)
  ⇒ aSbB     (B -> Sb)
  ⇒ abB      (S -> ε)
  ⇒ abSb     (B -> Sb)
  ⇒ abb      (S -> ε)
Rightmost derivation:
S ⇒ AB
  ⇒ ASb      (B -> Sb)
  ⇒ Ab       (S -> ε)
  ⇒ aBb      (A -> aB)
  ⇒ aSbb     (B -> Sb)
  ⇒ abb      (S -> ε)
Sentential Forms
Parse tree or Derivation tree
• The parse tree is the pictorial representation of a derivation, so it
is also known as a derivation tree. The derivation tree is
independent of the order in which productions are used.
• A parse tree is an ordered tree in which nodes are labeled with the
left side of the productions and in which the children of a node
represent the corresponding right side. A parse tree is also known as a
syntax tree, generation tree, or production tree.
• A parse tree for a CFG G = (V, ∑, P, S) is a tree satisfying the following
conditions:
Conditions
1. Root has the label S, where S is the start symbol.
2. Each vertex of the parse tree has a label which can be a variable
(V), terminal (Σ), or ε.
3. If A → C1 C2 … Cn is a production, then C1, C2, …, Cn are the children of
a node labeled A.
4. Leaf nodes are labeled with terminals (Σ) or ε, and interior nodes with variables (V).
5. The label of an internal vertex is always a variable
Yield or result − Yield of Derivation Tree is the concatenation of labels
of the leaves in left to right ordering.
Consider the grammar given below:
E → E+E | E∗E | id
Find the leftmost and rightmost derivations for the string id+id+id.
Leftmost derivation:
E ⇒ E+E
  ⇒ E+E+E
  ⇒ id+E+E
  ⇒ id+id+E
  ⇒ id+id+id
Rightmost derivation:
E ⇒ E+E
  ⇒ E+E+E
  ⇒ E+E+id
  ⇒ E+id+id
  ⇒ id+id+id
Example 1: Suppose a CFG has the productions
S → aAS | a
A → SbA | SS | ba
Show that S ⇒* aabbaa and construct a parse tree whose yield is aabbaa.
Solution (leftmost derivation)
S ⇒ aAS        (S → aAS)
  ⇒ aSbAS      (A → SbA)
  ⇒ aabAS      (S → a)
  ⇒ aabbaS     (A → ba)
  ⇒ aabbaa     (S → a)
∴ S ⇒* aabbaa
Parse tree in bracketed form: S( a A( S(a) b A(b a) ) S(a) ); reading the leaves from left to right gives aabbaa.
Ambiguous Grammar
Example 2
• Consider the grammar E -> E+E | id. The string id+id+id has 2 distinct
parse trees, one grouping it as (id+id)+id and the other as id+(id+id), and
each tree corresponds to its own leftmost derivation. Because some string
has more than one parse tree, the grammar is ambiguous.
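The number of parse trees for this grammar can be counted with a short recursion: the root production E -> E+E picks one of the + signs as the top-level split, and the two sides are then parsed independently. The C sketch below is an illustration; the function name `count_parse_trees` is hypothetical, and the plain recursion is meant for small inputs only.

```c
/* Number of distinct parse trees of E -> E+E | id for a string made of
   n ids joined by '+'. For n >= 2 the root uses E -> E+E, and the left
   operand may cover k = 1 .. n-1 of the ids. */
unsigned long count_parse_trees(unsigned n)
{
    unsigned long total = 0;
    unsigned k;
    if (n == 0)
        return 0;        /* the empty string is not in the language */
    if (n == 1)
        return 1;        /* a single id: E -> id */
    for (k = 1; k < n; k++)
        total += count_parse_trees(k) * count_parse_trees(n - k);
    return total;
}
```

`count_parse_trees(3)` returns 2, matching the two trees for id+id+id, and the count grows quickly as the string gets longer.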
Recursions
Eliminating Left recursion
Left Factoring
• Left factoring converts a non-deterministic grammar into a deterministic one.
Productions that share a common prefix α, such as A -> αβ1 | αβ2, are
rewritten into the form
A -> αA'
A' -> β1 | β2
Example 2
A -> aAB | aA
B -> bB | b
Solution: convert the grammar into the form A -> αA', A' -> β1 | β2, where α
is the longest common prefix (aA for A, b for B):
A -> aAA'
A' -> B | ε
B -> bB'
B' -> B | ε
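Example 3
A -> aAB | aA | a
Solution: convert the grammar into the form A -> αA', A' -> β1 | β2. The
prefix common to all three alternatives is a:
A -> aA'
A' -> AB | A | ε
The alternatives AB and A of A' still share the prefix A, so factor once more:
A' -> AA'' | ε
A'' -> B | ε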
Examples
A -> aB | abc
Solution:
A -> aA'
A' -> B | bc

S -> iEtS | iEtSeS | a
E -> b
Solution:
S -> iEtSS' | a
S' -> ε | eS
E -> b
Examples
• S -> aSSbS | aSaSb | abb | b
• Factoring out a: S -> aS' | b, with S' -> SSbS | SaSb | bb
• Factoring out S from S': S' -> SS'' | bb
• S'' -> SbS | aSb
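The first step of left factoring, finding the prefix α shared by the alternatives of one production, can be sketched in C as below. This is an illustration under stated assumptions: each grammar symbol is taken to be a single character, all the alternatives passed in are assumed to belong to the group being factored, and the function name `common_prefix_len` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Length of the longest prefix shared by all n alternatives of one
   production. Each grammar symbol is assumed to be one character. */
size_t common_prefix_len(const char *alts[], size_t n)
{
    size_t len, i, j;
    if (n == 0)
        return 0;
    len = strlen(alts[0]);
    for (i = 1; i < n; i++) {
        j = 0;
        /* advance while alternative i still agrees with alternative 0 */
        while (j < len && alts[i][j] != '\0' && alts[i][j] == alts[0][j])
            j++;
        len = j;             /* the shared prefix can only shrink */
    }
    return len;
}
```

For the alternatives aAB and aA the result is 2 (α = aA); for iEtS and iEtSeS it is 4 (α = iEtS). The remainders after the prefix become the alternatives of the new variable, with ε standing in for an empty remainder.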
