Lesson 08 2

Uploaded by

my5911319

LESSON 08

Overview of Previous Lesson(s)

Overview
• Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar.
• An attribute is any quantity associated with a programming construct.
• A translation scheme is a notation for attaching program fragments to the productions of a grammar.

Overview..
• In an abstract syntax tree for an expression, each interior node represents an operator; the children of the node represent the operands of the operator.

[Figure: abstract syntax tree for 9-5+2]

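The tree for 9-5+2 can be sketched in code. This is a minimal illustration, not part of the original slides: the class names `Num` and `BinOp` and the function `evaluate` are hypothetical, and the grouping (9-5)+2 assumes that - and + are left-associative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: an operand."""
    value: int


@dataclass
class BinOp:
    """Interior node: an operator whose children are its operands."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

# 9-5+2 groups as (9-5)+2, so '+' is the root and '-' is its left child.
tree = BinOp("+", BinOp("-", Num(9), Num(5)), Num(2))


def evaluate(node):
    """Evaluate the tree bottom-up: operands first, then the operator."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left - right


print(evaluate(tree))  # 6
```
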
Overview…
• Structure of our compiler
  Source program (character stream) → Lexical analyzer → Token stream → Syntax-directed translator → Java bytecode
  Syntax definition (BNF grammar) and JVM specification → Develop parser and code generator for translator

Overview…
• Typical tasks performed by the lexical analyzer:
  ◦ Removing white space and comments
  ◦ Encoding constants as tokens
  ◦ Recognizing keywords
  ◦ Recognizing identifiers
  ◦ Storing identifier names in a global symbol table

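The tasks listed above can be sketched as one small function. This is only an illustration under stated assumptions: the keyword set, the `//` comment syntax, and the names `KEYWORDS`, `symbol_table`, and `tokenize` are hypothetical and do not come from the slides.

```python
import re

KEYWORDS = {"if", "else", "while"}   # assumed keyword set
symbol_table = {}                    # global table: identifier name -> entry index

TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|//[^\n]*)     # white space and // comments are discarded
  | (?P<num>\d+)               # integer constants
  | (?P<word>[A-Za-z_]\w*)     # keywords and identifiers
  | (?P<op>.)                  # any other single character
""", re.VERBOSE)


def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                                  # remove white space and comments
        if kind == "num":
            tokens.append(("num", int(lexeme)))       # encode the constant as a token
        elif kind == "word" and lexeme in KEYWORDS:
            tokens.append((lexeme,))                  # keyword: the name says it all
        elif kind == "word":
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append(("id", index))              # identifier: store name in the table
        else:
            tokens.append((lexeme,))
    return tokens


tokens = tokenize("if x1 // note\n  y = x1 + 42")
print(tokens)
```

Both occurrences of x1 map to the same symbol-table index, which is what lets later phases treat them as the same identifier.
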
TODAY’S LESSON

Contents
• Symbol Tables
  ◦ Symbol Table Per Scope
  ◦ The Use of Symbol Tables
  ◦ Intermediate Code Generator
• Syntax Directed Translator Flow
• Role of the Lexical Analyzer
  ◦ Tokens, Patterns & Lexemes
  ◦ Attributes for Tokens
  ◦ Lexical Errors
  ◦ Input Buffering
  ◦ Buffer Pairs

Symbol Tables
• Symbol tables are data structures that compilers use to hold information about source-program constructs.
• Information is put into the symbol table when the declaration of an identifier is analyzed.
• Entries in the symbol table contain information about an identifier, such as its character string (or lexeme), its type, its position in storage, and any other relevant information.
• The information is collected incrementally.

Symbol Table Per Scope
• If the language being compiled supports nested scopes, the lexer can only construct the <lexeme, token> pairs.
• The parser converts these pairs into a true symbol table that reflects the nested scopes.
• If the language is flat, the scanner can produce the symbol table.
• The key idea is that a new symbol table is created whenever a block is entered.
• Each such table points to the table of the immediately enclosing scope.

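The chained tables described above can be sketched as follows. The class name `Env` and its methods `put` and `get` are assumptions made for illustration; the slides do not prescribe an interface.

```python
class Env:
    """Symbol table for one scope, linked to the table of the enclosing scope."""

    def __init__(self, prev=None):
        self.table = {}
        self.prev = prev          # the immediately outer table, or None at top level

    def put(self, name, info):
        self.table[name] = info

    def get(self, name):
        """Search this scope first, then each enclosing scope in turn."""
        env = self
        while env is not None:
            if name in env.table:
                return env.table[name]
            env = env.prev
        return None


top = Env()
top.put("x", "int")
top.put("y", "char")

inner = Env(top)                  # entering a block creates a new table
inner.put("x", "bool")            # this x hides the outer x inside the block

print(inner.get("x"), inner.get("y"), top.get("x"))  # bool char int
```

Leaving the block amounts to discarding `inner` and continuing with `top`, so the outer declaration of x becomes visible again.
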
Use of Symbol Table
• A semantic action gets information from the symbol table when the identifier is subsequently used, for example, as a factor in an expression.
• There are two important forms of intermediate code:
  ◦ Trees, especially parse trees and syntax trees.
  ◦ Linear representations, especially three-address code.

Intermediate Code Generator
• Static checking refers to checks performed during compilation, whereas dynamic checking refers to checks performed at run time.
• Examples of static checks include:
  ◦ Syntactic checks, such as ensuring that an identifier is not declared more than once in the same scope.
  ◦ Type checks.

Intermediate Code Generator..
• L-values and r-values
  Consider Q = Z; or A[f(x)+B*D] = g(B+C*h(x,y));
• Three tasks:
  ◦ Evaluate the left-hand side (LHS) to obtain an l-value.
  ◦ Evaluate the right-hand side (RHS) to obtain an r-value.
  ◦ Perform the assignment.
• An l-value corresponds to an address or a location.
• An r-value corresponds to a value.
• Neither 12 nor s+t can be used as an l-value, but both are legal r-values.

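A static check for l-values might look like the sketch below. The tuple-based node shapes (`"id"`, `"index"`, `"num"`, `"+"`) and the function names are hypothetical; they only illustrate the rule that names and array elements denote locations, while constants and arithmetic results do not.

```python
def is_lvalue(node):
    """Names and array elements have addresses; other expressions only have values."""
    return node[0] in ("id", "index")


def check_assignment(lhs, rhs):
    """Reject an assignment whose left-hand side is not an l-value."""
    if not is_lvalue(lhs):
        raise TypeError(f"cannot assign to a {lhs[0]!r} expression: not an l-value")
    return ("assign", lhs, rhs)


# Q = Z is accepted: Q names a location.
ok = check_assignment(("id", "Q"), ("id", "Z"))

# 12 = Z and s+t = Z are rejected at compile time.
for bad in (("num", 12), ("+", ("id", "s"), ("id", "t"))):
    try:
        check_assignment(bad, ("id", "Z"))
    except TypeError as err:
        print(err)
```
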
Intermediate Code Generator...
• Static checking is used to ensure that r-values do not appear on the LHS of an assignment.
• Type checking ensures that the types of the operands are what the operator expects, and reports an error if they are not.
• Coercion: the automatic conversion of one type to another.
• Overloading: the same symbol can have different meanings depending on the types of the operands.

Three Address Code
• These are primitive instructions that have one operator and up to three operands, all of which are addresses.
• One address is the destination, which receives the result of the operation.
• The other two addresses are the sources of the values to be operated on.
• Ex.
  ADD x y z
  MULT a b c
  ARRAY_L q r s
  ifTrueGoto x L

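As a rough sketch, instructions in this style can be generated by walking an expression tree. The function `gen_tac`, the tuple-based tree, and the temporary names t1, t2 are assumptions for illustration; the sketch also assumes that the first address after the operator is the destination.

```python
import itertools


def gen_tac(node, code, temps):
    """Append instructions for node to code; return the address holding its value."""
    if isinstance(node, str):            # a name is already an address
        return node
    op, left, right = node
    src1 = gen_tac(left, code, temps)    # code for the operands comes first
    src2 = gen_tac(right, code, temps)
    dest = f"t{next(temps)}"             # fresh temporary for the result
    code.append(f"{op} {dest} {src1} {src2}")
    return dest


# a + b * c
code = []
result = gen_tac(("ADD", "a", ("MULT", "b", "c")), code, itertools.count(1))
print("\n".join(code))
# MULT t1 b c
# ADD t2 a t1
```
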
Syntax Directed Translator Flow
• The starting point for a syntax-directed translator is a grammar for the source language.
• A grammar describes the hierarchical structure of programs.
• It is defined in terms of elementary symbols called terminals and variable symbols called nonterminals.
• These symbols represent language constructs.

Syntax Directed Translator Flow..
• Each production of a grammar consists of a nonterminal, called the left side of the production, and a sequence of terminals and nonterminals, called the right side of the production.
• One nonterminal is designated as the start symbol.
• A lexical analyzer reads the input one character at a time and produces a stream of tokens as output.
• A token consists of a terminal symbol along with additional information in the form of attribute values.

Syntax Directed Translator Flow...
• Parsing is the problem of figuring out how a string of terminals can be derived from the start symbol of the grammar by repeatedly replacing a nonterminal with the body of one of its productions.
• Efficient parsers can be built using a top-down method called predictive parsing.
• A syntax-directed definition attaches rules to productions; the rules compute attribute values.

Syntax Directed Translator Flow....
• A translation scheme embeds program fragments called semantic actions in production bodies.
• The actions are executed in the order in which productions are used during syntax analysis.
• The result of syntax analysis is a representation of the source program, called intermediate code.
• An abstract syntax tree has nodes for programming constructs; the children of a node give the meaningful subconstructs.

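The flow above can be sketched as a small predictive parser with embedded semantic actions. The grammar used here is an assumption chosen for illustration (expr → term rest; rest → + term rest | - term rest | ε; term → digit), the actions emit postfix notation, and the names `Parser` and `to_postfix` are hypothetical.

```python
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.out = []                     # output produced by the semantic actions

    def lookahead(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match(self, ch):
        if self.lookahead() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        # expr -> term rest, with the tail recursion of rest written as a loop
        self.term()
        while self.lookahead() in ("+", "-"):
            op = self.lookahead()
            self.match(op)
            self.term()
            self.out.append(op)           # semantic action: emit the operator

    def term(self):
        ch = self.lookahead()
        if ch is None or not ch.isdigit():
            raise SyntaxError(f"expected a digit at position {self.pos}")
        self.match(ch)
        self.out.append(ch)               # semantic action: emit the digit


def to_postfix(text):
    parser = Parser(text)
    parser.expr()
    if parser.lookahead() is not None:
        raise SyntaxError(f"unexpected {parser.lookahead()!r}")
    return "".join(parser.out)


print(to_postfix("9-5+2"))  # 95-2+
```

The single character of lookahead is enough to choose a production at every step, which is what makes the parser predictive.
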
Role of Lexical Analyzer

Role of Lexical Analyzer..
• Sometimes, lexical analyzers are divided into a cascade of two processes:
  ◦ Scanning consists of the simple processes that do not require tokenization of the input, such as deleting comments and compacting consecutive whitespace characters into one.
  ◦ Lexical analysis proper is the more complex portion, which produces the sequence of tokens from the output of the scanner.

Lexical Analysis Vs Parsing
• There are a number of reasons why the analysis portion of a compiler is normally separated into lexical analysis and parsing.
• The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks.
• Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing.
• Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.

Tokens, Patterns & Lexemes
• A token is a pair consisting of a token name and an optional attribute value.
  ◦ The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier.
• A pattern is a description of the form that the lexemes of a token may take.
  ◦ In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
• A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

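Tokens, Patterns & Lexemes..
• In many programming languages, the following classes cover most or all of the tokens:
  ◦ One token for each keyword.
  ◦ Tokens for the operators, either individually or in classes.
  ◦ One token representing all identifiers.
  ◦ One or more tokens representing constants, such as numbers and literal strings.
  ◦ Tokens for each punctuation symbol, such as parentheses, comma, and semicolon.
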
Attributes for Tokens
• For tokens corresponding to keywords, attributes are not needed since the name of the token tells everything.
• But consider the token corresponding to integer constants. Just knowing that we have a constant is not enough; subsequent stages of the compiler need to know the value of the constant.
• Similarly, for the token identifier we need to distinguish one identifier from another.
  ◦ The normal method is for the attribute to specify the symbol-table entry for this identifier.

Attributes for Tokens..
• Ex. The token names and associated attribute values for the Fortran statement E = M * C ** 2 are as follows:

  <id, pointer to symbol-table entry for E>
  <assign_op>
  <id, pointer to symbol-table entry for M>
  <mult_op>
  <id, pointer to symbol-table entry for C>
  <exp_op>
  <number, integer value 2>

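The token list above can be reproduced with a table of patterns. This is a sketch under stated assumptions: the regular expressions, the name `tokens_for`, and the use of a dictionary entry in place of a pointer are all illustrative choices, not part of the slides.

```python
import re

# (token name, pattern) pairs. exp_op is listed before mult_op so that
# ** is matched as one lexeme rather than as two * lexemes.
TOKEN_SPEC = [
    ("exp_op", r"\*\*"),
    ("mult_op", r"\*"),
    ("assign_op", r"="),
    ("number", r"\d+"),
    ("id", r"[A-Za-z]\w*"),
    ("ws", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokens_for(statement):
    """Return (tokens, symtab). Characters matching no pattern are skipped."""
    symtab = {}
    tokens = []
    for m in PATTERN.finditer(statement):
        name, lexeme = m.lastgroup, m.group()
        if name == "ws":
            continue
        if name == "id":
            entry = symtab.setdefault(lexeme, {"lexeme": lexeme})
            tokens.append(("id", entry))          # attribute: the symbol-table entry
        elif name == "number":
            tokens.append(("number", int(lexeme)))  # attribute: the integer value
        else:
            tokens.append((name,))                # operators need no attribute
    return tokens, symtab


tokens, symtab = tokens_for("E = M * C ** 2")
print([t[0] for t in tokens])
# ['id', 'assign_op', 'id', 'mult_op', 'id', 'exp_op', 'number']
```
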
Lexical Errors
• Without the aid of other components, a lexical analyzer cannot always tell that the source code contains an error.
• Ex. The string fi is encountered for the first time in a program in the context:
  fi ( a == f (x) ) …
• A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier.
• Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let the parser, in this case, handle the error due to the transposition of the letters.

Lexical Errors..
• Suppose a lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is then "panic mode" recovery.
• In this strategy we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.

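Panic-mode recovery can be sketched as a scanning loop that drops one character whenever no pattern matches. The token patterns and the name `scan` are assumptions for illustration only.

```python
import re

# Assumed token patterns: integers, identifiers, and a few one-character operators.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[-+*/=()]")


def scan(text):
    """Return (tokens, skipped); skipped holds the characters deleted during recovery."""
    tokens, skipped, pos = [], [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
        else:
            skipped.append(text[pos])   # panic mode: delete one character and retry
            pos += 1
    return tokens, skipped


print(scan("x = 3 @# + y"))
# (['x', '=', '3', '+', 'y'], ['@', '#'])
```

A real lexical analyzer would also report the deleted characters, so the programmer learns where the input was malformed.
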
Lexical Errors...
• Other possible error-recovery actions are:
  ◦ Delete one character from the remaining input.
  ◦ Insert a missing character into the remaining input.
  ◦ Replace a character by another character.
  ◦ Transpose two adjacent characters.

Input Buffering
• Determining the next lexeme often requires reading the input beyond the end of that lexeme.
• Ex.
  ◦ Determining the end of an identifier normally requires reading the first character after it that cannot belong to the identifier, such as a whitespace character.
  ◦ Likewise, reading > alone does not determine the lexeme, since it could also be the start of >=.
• Once the current lexeme is determined, the characters read beyond it may need to be read again to determine the next lexeme.

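The > versus >= case can be sketched with one character of lookahead. The function name `relop` and its return convention are hypothetical.

```python
def relop(text, pos):
    """Return (lexeme, next_pos) for the operator that starts with > at pos."""
    assert text[pos] == ">"
    # Peek one character past > before deciding which lexeme was read.
    if pos + 1 < len(text) and text[pos + 1] == "=":
        return ">=", pos + 2
    # The peeked character is not consumed; it begins the next lexeme.
    return ">", pos + 1


print(relop("a>=b", 1))  # ('>=', 3)
print(relop("a>b", 1))   # ('>', 2)
```
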
Buffer Pairs
• Specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character.
• An important scheme involves two buffers that are alternately reloaded.

Buffer Pairs..
• Both buffers have the same size N, and N is usually the size of a disk block, e.g., 4096 bytes.
• With one system read command we can read N characters into a buffer, rather than making one system call per character.
• If fewer than N characters remain in the input file, a special character, eof, marks the end of the source file.
• Two pointers to the input are maintained:
  ◦ Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
  ◦ Pointer forward scans ahead until a pattern match is found.

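The two-buffer scheme can be sketched as below. This is a simplified illustration with several assumptions: the class `BufferPair` and its methods are hypothetical, positions are kept as absolute indices rather than raw pointers, `None` stands in for the eof character, and no lexeme is assumed to be longer than N characters.

```python
import io


class BufferPair:
    def __init__(self, stream, n=4096):
        self.stream, self.n = stream, n
        self.halves = ["", ""]     # the two buffers, reloaded alternately
        self.blocks_read = 0       # number of N-character blocks loaded so far
        self.lexeme_begin = 0      # start of the current lexeme
        self.forward = 0           # next character to scan

    def _char_at(self, pos):
        block = pos // self.n
        while block >= self.blocks_read:
            # One read call fills a whole buffer, replacing its older contents.
            self.halves[self.blocks_read % 2] = self.stream.read(self.n)
            self.blocks_read += 1
        half, offset = self.halves[block % 2], pos % self.n
        return half[offset] if offset < len(half) else None   # None marks eof

    def advance(self):
        ch = self._char_at(self.forward)
        if ch is not None:
            self.forward += 1
        return ch

    def retract(self):
        self.forward -= 1          # give back the one character read too far

    def take_lexeme(self):
        chars = [self._char_at(p) for p in range(self.lexeme_begin, self.forward)]
        self.lexeme_begin = self.forward
        return "".join(chars)


# A tiny N makes the identifier straddle the boundary between the two buffers.
buf = BufferPair(io.StringIO("count = 10"), n=4)
while (ch := buf.advance()) is not None and ch.isalnum():
    pass
if ch is not None:
    buf.retract()
lexeme = buf.take_lexeme()
print(lexeme)  # count
```
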
Thank You
