
SSCD Chapter3

The lexical analyzer breaks the input program into tokens by identifying patterns like identifiers, keywords, operators, and delimiters. It builds a symbol table to track these lexical units. It may also perform tasks like removing whitespace and comments. The lexical analyzer then passes the tokenized input to the parser for the next phase of syntactic analysis.

Uploaded by

SAWAN J KOTIAN

MODULE 2

Semantic Analysis
Explain in detail the various phases of compilation for the input string
X=a*b+(c*d)/e

Lexical Analyzer

Token stream:
<id,1> <=> <id,2> <*> <id,3> <+> <(> <id,4> <*> <id,5> <)> </> <id,6>

Symbol Table
Entry  Symbol  Type
1      x       id
2      a       id
3      b       id
4      c       id
5      d       id
6      e       id
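The token stream and symbol table above can be reproduced with a short sketch. The function name `tokenize` and the list-based symbol table are illustrative assumptions, not something the slides define; identifiers are assumed to be a letter followed by letters or digits, and every other non-blank character is treated as a one-character token.

```python
import re

def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []          # position i holds the lexeme of entry i + 1
    tokens = []
    # Match an identifier, or else any single non-space character.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\S", source):
        if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)      # install a new identifier
            tokens.append(f"<id,{symbol_table.index(lexeme) + 1}>")
        else:
            tokens.append(f"<{lexeme}>")         # operator or parenthesis
    return tokens, symbol_table

tokens, table = tokenize("X=a*b+(c*d)/e")
print(" ".join(tokens))
for entry, lexeme in enumerate(table, start=1):
    print(entry, lexeme, "id")
```

The sketch stores each lexeme exactly as written, so the first entry is `X` rather than the lower-case `x` shown on the slide.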
Syntax Analyzer

Syntax tree for X=a*b+(c*d)/e:

=
├── x
└── +
    ├── *
    │   ├── a
    │   └── b
    └── /
        ├── *
        │   ├── c
        │   └── d
        └── e

The same tree in terms of symbol-table entries:

=
├── id,1
└── +
    ├── *
    │   ├── id,2
    │   └── id,3
    └── /
        ├── *
        │   ├── id,4
        │   └── id,5
        └── id,6

Semantic Analysis

The output of this phase is the same tree, in terms of symbol-table entries.


Intermediate Code Generation

Using symbol-table entries    Using source names
t1 = id4 * id5                t1 = c * d
t2 = id2 * id3                t2 = a * b
t3 = t1 / id6                 t3 = t1 / e
t4 = t2 + t3                  t4 = t2 + t3
x = t4                        x = t4
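One common way to obtain this kind of intermediate code is a post-order walk of the syntax tree that emits one instruction per operator. The sketch below is an assumption about how that could look, not the method the slides prescribe; the tuple encoding of the tree and the names `gen_tac` and `translate_assignment` are hypothetical.

```python
import itertools

def gen_tac(node, code, temps):
    """Post-order walk: emit code for both operands, then for the operator."""
    if isinstance(node, str):              # leaf: a name such as "a" or "id2"
        return node
    op, left, right = node
    left_place = gen_tac(left, code, temps)
    right_place = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"               # fresh temporary for this result
    code.append(f"{temp} = {left_place} {op} {right_place}")
    return temp

def translate_assignment(target, expr):
    code = []
    result = gen_tac(expr, code, itertools.count(1))
    code.append(f"{target} = {result}")
    return code

# Syntax tree of a*b + (c*d)/e as nested (operator, left, right) tuples.
tree = ("+", ("*", "a", "b"), ("/", ("*", "c", "d"), "e"))
for line in translate_assignment("x", tree):
    print(line)
```

Because this walk visits the left operand first, it numbers the temporaries differently from the slide (a*b becomes t1 and c*d becomes t2), but the computation is the same.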
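Machine-Independent Code Optimization

The temporary t4 is eliminated: the sum is assigned directly to x.

Using symbol-table entries    Using source names
t1 = id4 * id5                t1 = c * d
t2 = id2 * id3                t2 = a * b
t3 = t1 / id6                 t3 = t1 / e
x = t2 + t3                   x = t2 + t3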
Code Generation

Optimized intermediate code:
t1 = id4 * id5
t2 = id2 * id3
t3 = t1 / id6
x = t2 + t3

Generated code:
MOV R1, id4
MOV R2, id5
MUL R1, R2        (t1 = id4 * id5)
MOV R3, id2
MOV R4, id3
MUL R3, R4        (t2 = id2 * id3)
MOV R5, id6
DIV R1, R5        (t3 = t1 / id6)
ADD R3, R1
STR X, R3         (x = t2 + t3)
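The slides do not define the instruction set, so the following is only a sanity check under stated assumptions: each two-operand instruction is taken to store its result in the first operand, MOV loads a memory value into a register, and STR stores a register into memory. The interpreter `run` and the sample values are hypothetical.

```python
def run(program, memory):
    """Interpret the two-operand instructions, assuming 'OP dst, src'."""
    regs = {}
    for line in program.strip().splitlines():
        opcode, operands = line.split(None, 1)
        dst, src = [part.strip() for part in operands.split(",")]
        if opcode == "MOV":        # load a memory value into a register
            regs[dst] = memory[src]
        elif opcode == "MUL":      # dst = dst * src
            regs[dst] *= regs[src]
        elif opcode == "DIV":      # dst = dst / src
            regs[dst] /= regs[src]
        elif opcode == "ADD":      # dst = dst + src
            regs[dst] += regs[src]
        elif opcode == "STR":      # store a register into memory
            memory[dst] = regs[src]
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return memory

PROGRAM = """
MOV R1, id4
MOV R2, id5
MUL R1, R2
MOV R3, id2
MOV R4, id3
MUL R3, R4
MOV R5, id6
DIV R1, R5
ADD R3, R1
STR X, R3
"""

# Sample values for a, b, c, d, e (id2 .. id6); chosen only for illustration.
values = {"id2": 2.0, "id3": 3.0, "id4": 8.0, "id5": 5.0, "id6": 4.0}
result = run(PROGRAM, dict(values))
print(result["X"])    # equals a*b + (c*d)/e for the sample values
```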
Explain the various phases of compilation for the input string
Y=(x+z)/m*n
• The evolution of programming languages
• The science of building a compiler
• Application of compiler technology
1. Optimization for high-level programming languages
2. Optimization for computer architecture
3. Designing of new computer architecture
4. Program translation
5. Software productivity tools
Role

Lexical Analysis
Analogy
• Alphabets (A, B, C, …, Z and a, b, c, …, z) → Words → Dictionary
• Symbols → Lexemes → Symbol Table
• System: interconnected components
• Compiler

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token (the words of the source program).
Example

Token      Informal description                     Sample lexemes
if         Characters i, f                          if
else       Characters e, l, s, e                    else
relation   < or > or <= or >= or == or !=           <=, !=
id         Letter followed by letters and digits    pi, score, D2
number     Any numeric constant                     3.14159, 0, 6.02e23
literal    Anything but ", surrounded by "s         "core dumped"
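The token table above maps naturally onto a table of regular expressions. The following is a minimal sketch using Python's `re` module; the exact patterns (for example, the form of `number`) and the names `TOKEN_SPEC` and `tokens` are assumptions chosen to cover the sample lexemes, not definitions taken from the slides.

```python
import re

TOKEN_SPEC = [
    ("number",   r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("literal",  r'"[^"]*"'),
    ("relation", r"<=|>=|==|!=|<|>"),      # two-character operators first
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("ws",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "else"}

def tokens(text):
    """Yield (token name, lexeme) pairs, skipping white space."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "ws":
            continue
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme          # a reserved word is its own token name
        yield kind, lexeme

for kind, lexeme in tokens('if score <= 6.02e23 else "core dumped"'):
    print(kind, lexeme)
```

Reserved words are recognized by first matching the `id` pattern and then looking the lexeme up in a keyword set, which keeps the pattern table small.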

Roles of the Lexical analyzer

The lexical analyzer performs the following tasks:

• Identifies tokens and enters them into the symbol table
• Removes white space and comments from the source program
• Correlates error messages with the source program
• Expands macros if any are found in the source program
• Reads input characters from the source program
Lexical Analysis

Lexical Analyzer and Parser Communication

Source Program → Lexical Analyzer → Token → Parser → Rest of the Compiler
Parser → Get next Token → Lexical Analyzer
Lexical Analyzer ↔ Symbol Table ↔ Parser

Why separate lexical analysis and parsing?

o Simplicity of design

o Improving compiler efficiency

o Enhancing compiler portability

Lexical Errors

• Lexical: name of some identifier typed incorrectly
• Syntactic: missing semicolon or unbalanced parenthesis
• Semantic: incompatible value assignment
• Logical: code not reachable, infinite loop

Lexical Errors
Panic mode recovery: In this method, successive characters from the input are
removed one at a time until one of a designated set of synchronizing tokens is
found. Synchronizing tokens are delimiters such as ; or }
• The advantage is that it is easy to implement and is guaranteed not to go
into an infinite loop
• The disadvantage is that a considerable amount of input is skipped without
checking it for additional errors

Other error recovery options:

1. Delete one character from the remaining input
2. Insert a missing character into the remaining input
3. Replace a character by another character
4. Transpose two adjacent characters
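Panic mode recovery can be sketched in a few lines. The set of legal characters below is an assumption made only so the example has something to reject; the function name `scan_with_panic_mode` is hypothetical. On an illegal character, the scanner drops characters one at a time until it reaches a synchronizing delimiter.

```python
import string

SYNC = {";", "}"}          # synchronizing delimiters named in the text
LEGAL = set(string.ascii_letters + string.digits + "_=+-*/(){}; \t\n")

def scan_with_panic_mode(text):
    """Return (kept_text, errors); each error is (position, skipped_text)."""
    kept, errors = [], []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in LEGAL:
            kept.append(ch)
            pos += 1
            continue
        # Panic mode: remove characters one at a time until a synchronizing
        # delimiter (or the end of the input) is reached.
        start = pos
        while pos < len(text) and text[pos] not in SYNC:
            pos += 1
        errors.append((start, text[start:pos]))
    return "".join(kept), errors

kept, errors = scan_with_panic_mode("x = 1 @# y; z = 2;")
print(repr(kept))
print(errors)
```

In the sample input the identifier `y` is discarded along with the illegal characters, which illustrates the stated disadvantage: input is skipped without being checked. The loop always advances by at least one character, so it cannot run forever.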

Using Buffer to Enhance Efficiency

Buffer contents: E = M * C * * 2 eof
Pointers:
• lexeme beginning: marks the start of the current token
• forward: scans ahead to find a pattern match

if forward at end of first buffer then begin
    reload second buffer ;          /* block I/O */
    forward := forward + 1
end
else if forward at end of second buffer then begin
    reload first buffer ;           /* block I/O */
    move forward to beginning of first buffer
end
else forward := forward + 1 ;

Algorithm: Buffered I/O with Sentinels

First buffer:  E = M * eof
Second buffer: C * * 2 eof eof
Pointers:
• lexeme beginning: marks the start of the current token
• forward: scans ahead to find a pattern match

forward := forward + 1 ;
if forward is at eof then begin
    if forward at end of first buffer then begin
        reload second buffer ;      /* block I/O */
        forward := forward + 1
    end
    else if forward at end of second buffer then begin
        reload first buffer ;       /* block I/O */
        move forward to beginning of first buffer
    end
    else /* eof within buffer signifying end of input */
        terminate lexical analysis
end

2nd eof ⇒ no more input!
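The sentinel scheme can be simulated as follows. This is a sketch under assumptions: the class name `TwoBufferReader`, the tiny buffer size, and the use of the NUL character as the eof sentinel are all illustrative choices, and the source text is assumed never to contain that character.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text

class TwoBufferReader:
    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        # Layout: [0..n-1] first half, [n] sentinel,
        #         [n+1..2n] second half, [2n+1] sentinel.
        self.buf = [EOF] * (2 * n + 2)
        self._reload(0)
        self.forward = -1

    def _reload(self, start):
        """Block read of up to n characters into the half starting at start."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # A short read leaves an eof inside the half: end of input.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Advance forward; return the next character, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:            # the only test per character
            if self.forward == self.n:               # end of first buffer
                self._reload(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:     # end of second buffer
                self._reload(0)
                self.forward = 0
            else:                                    # eof within a buffer
                return None
            if self.buf[self.forward] == EOF:        # reloaded half is empty
                return None
        return self.buf[self.forward]

reader = TwoBufferReader(io.StringIO("E=M*C**2"), n=4)
chars = []
while (ch := reader.next_char()) is not None:
    chars.append(ch)
print("".join(chars))
```

In the common case only one comparison (against the sentinel) is made per character; the buffer-boundary checks run only when a sentinel is actually seen. The sketch expects callers to stop at the first `None`.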
Strings and Languages
Terms for parts of strings
Regular Expression
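• Specification of Tokens
• A regular expression is a specific pattern that provides a concise and
flexible means to "match" (specify and recognize) strings of text
1) a|b -> {a, b}
2) (a|b)(a|b) -> {aa, ab, ba, bb}
3) a* -> {Ɛ, a, aa, aaa, ...}
4) (a|b)* -> all strings of a's and b's, including Ɛ
5) a|a*b -> {a, b, ab, aab, aaab, ...}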
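The languages denoted by the five expressions above can be explored by brute force: enumerate every string over {a, b} up to a small length and keep those that the pattern matches in full. The helper name `language` and the length bound are assumptions for illustration.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """All strings over the alphabet, up to max_len, that the pattern matches."""
    matched = []
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if re.fullmatch(pattern, candidate):
                matched.append(candidate)
    return matched

for pattern in ["a|b", "(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(pattern, "->", language(pattern))
```

The empty string in the output corresponds to Ɛ. Infinite languages such as a* are only sampled up to `max_len`.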
Algebraic laws for RE
The C Identifiers

What will be the C Identifiers?


Unsigned Numbers

What will be C Unsigned Numbers?


2380, 0.0123, 6.34E34, 12.3E-12
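One possible answer to the two questions above, written as Python regular expressions. These patterns are the usual textbook forms and are an assumption here, since the slides' own definitions did not survive: an identifier is a letter or underscore followed by letters, digits, or underscores, and an unsigned number is a digit string with an optional fraction and an optional exponent.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# digits, optional fraction, optional exponent
UNSIGNED_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

for text in ["2380", "0.0123", "6.34E34", "12.3E-12"]:
    print(text, bool(UNSIGNED_NUMBER.fullmatch(text)))

for text in ["count", "_tmp1", "2fast"]:
    print(text, bool(IDENTIFIER.fullmatch(text)))
```

The identifier pattern does not exclude C keywords; a lexer would normally separate those with a keyword lookup after matching.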
Extensions of Regular Expressions
• One or more instances: the positive closure r+ (the Kleene closure r* allows zero or more instances)
• r* = r+ | Ɛ and r+ = rr* = r*r
• Zero or one instance: r? is equivalent to r | Ɛ
• Character classes: [a-z] is shorthand for a|b|...|z
Using the Extension
Recognition of Tokens
• Our discussion will make use of the following running example.
Recognition of Tokens
• For relop, we use the comparison operators of languages like Pascal
or SQL, where = is “equals” and <> is “not equals,” because it presents
an interesting structure of lexemes.
• The terminals of the grammar, which are if, then, else, relop, id, and
number, are the names of tokens as far as the lexical analyzer is
concerned
Example

• The patterns for these tokens are described using regular definitions.
Example
• The lexical analyzer strips out whitespace by recognizing the “token”
ws defined by:
ws → (blank | tab | newline)+
• Here, blank, tab, and newline are abstract symbols that we use to
express the ASCII characters of the same names.
• ws is not returned to the parser.
• We rather restart the lexical analysis from the character that follows
the whitespace.
• Our goal for the lexical analyzer is summarized in figure.
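Transition diagrams
• Transition diagram for relop

State 0 (start): on < go to state 1; on = go to state 5; on > go to state 6
State 1: on = go to state 2; on > go to state 3; on other go to state 4
State 2: Return(relop,LE)
State 3: Return(relop,NE)
State 4 *: Return(relop,LT)
State 5: Return(relop,EQ)
State 6: on = go to state 7; on other go to state 8
State 7: Return(relop,GE)
State 8 *: Return(relop,GT)

* marks a state in which the forward pointer is retracted one position, because the "other" character is not part of the lexeme.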
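The relop diagram above can be coded directly, one branch per state. This is a sketch: the function name `relop`, the returned tuple shape, and the use of an empty string to stand for "other" at end of input are assumptions.

```python
def relop(text, pos=0):
    """Follow the relop transition diagram starting at text[pos].

    Returns ((token, attribute), next_pos), or None if no relop starts here.
    """
    def peek(i):
        return text[i] if i < len(text) else ""   # "" acts as 'other' at end of input

    c = peek(pos)                                  # state 0
    if c == "<":                                   # state 1
        c = peek(pos + 1)
        if c == "=":
            return ("relop", "LE"), pos + 2        # state 2
        if c == ">":
            return ("relop", "NE"), pos + 2        # state 3
        return ("relop", "LT"), pos + 1            # state 4 *: retract
    if c == "=":
        return ("relop", "EQ"), pos + 1            # state 5
    if c == ">":                                   # state 6
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2        # state 7
        return ("relop", "GT"), pos + 1            # state 8 *: retract
    return None

for sample in ["<=", "<>", "<a", "=", ">=", ">1"]:
    print(sample, relop(sample))
```

Retraction shows up as `next_pos` advancing by one instead of two: the lookahead character was examined but is left for the next token.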
Transition diagrams (cont.)
• Transition diagram for reserved words and identifiers

State 1 (start): on letter go to state 2
State 2: on letter or digit stay in state 2; on other go to state 3
State 3 *: Return(getToken(), installID())
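A sketch of this diagram in code follows. Here a keyword set stands in for getToken and a list-based table stands in for installID; both are simplifying assumptions, as are the names `install_id` and `scan_identifier` and the choice of keywords.

```python
KEYWORDS = {"if", "then", "else"}   # assumed reserved words
symbol_table = []

def install_id(lexeme):
    """Return the symbol-table index of lexeme, adding it if it is new."""
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)

def scan_identifier(text, pos=0):
    """States 1 to 3 of the diagram; returns ((token, attribute), next_pos) or None."""
    if pos >= len(text) or not text[pos].isalpha():     # state 1 needs a letter
        return None
    end = pos + 1
    while end < len(text) and text[end].isalnum():      # state 2: letter/digit loop
        end += 1
    lexeme = text[pos:end]       # state 3 *: the 'other' character is not consumed
    if lexeme in KEYWORDS:
        return (lexeme, None), end
    return ("id", install_id(lexeme)), end

print(scan_identifier("then x"))
print(scan_identifier("D2 + 1"))
```

Note that `str.isalpha` and `str.isalnum` also accept non-ASCII letters and digits, which is broader than the letter and digit classes a C-like language would use.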

Transition diagrams (cont.)
• Transition diagram for unsigned numbers

Transition diagrams (cont.)
• Transition diagram for whitespace
