Lesson 08
Overview of Previous Lesson(s)

Overview
Syntax-directed translation is done by attaching rules or program
fragments to productions in a grammar.
Overview..
In an abstract syntax tree for an expression, each interior node
represents an operator, and the children of the node represent the
operands of the operator.
Overview…
Structure of our Compiler

source program (character stream) → Lexical analyzer → token stream → Syntax-directed translator → Java bytecode

The parser and code generator for the translator are developed from the syntax definition (a BNF grammar) and the JVM specification.
Overview…
Typical tasks performed by lexical analyzer
TODAY’S LESSON
Contents
Symbol Tables
Symbol Table Per Scope
The Use of Symbol Tables
Intermediate Code Generator
Symbol Tables
Symbol tables are data structures that are used by compilers to
hold information about source-program constructs.
Symbol Table Per Scope
If the language you are compiling supports nested scopes, the lexer
can only construct the <lexeme,token> pairs.
The parser converts these pairs into a true symbol table that reflects
the nested scopes.
If the language is flat, the scanner can produce the symbol table.
The key idea is that when a block is entered, a new symbol table is created.
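The per-scope idea can be sketched as a chain of tables, each holding a pointer to the table of its enclosing block. This is a minimal sketch; the names `Env`, `put`, `get`, and `prev` are illustrative, not from the lesson:

```python
class Env:
    """One symbol table per scope, chained to the enclosing scope."""
    def __init__(self, prev=None):
        self.table = {}   # lexeme -> information (e.g., its type)
        self.prev = prev  # symbol table of the enclosing block

    def put(self, lexeme, info):
        self.table[lexeme] = info

    def get(self, lexeme):
        # Search the current scope first, then each enclosing scope.
        env = self
        while env is not None:
            if lexeme in env.table:
                return env.table[lexeme]
            env = env.prev
        return None

# Entering a block creates a new table chained to the old one.
globals_env = Env()
globals_env.put("x", "int")
inner = Env(prev=globals_env)
inner.put("x", "bool")      # the inner declaration hides the outer one
print(inner.get("x"))       # bool
print(inner.prev.get("x"))  # int
```

A lookup that fails in the innermost table falls through to the enclosing scopes, which is exactly how a semantic action later finds an identifier declared in an outer block.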
Use of Symbol Table
A semantic action gets information from the symbol table when
the identifier is subsequently used, for example, as a factor in an
expression.
This leads to the intermediate code generator, which involves two important topics: static checking and three-address code.
Intermediate Code Generator
Static checking refers to checks performed during compilation,
whereas dynamic checking refers to checks performed at run time.
An important example of static checking is type checking.
Intermediate Code Generator..
L-values and R-values
Consider Q = Z; or A[f(x)+B*D] = g(B+C*h(x,y));
Three tasks:
Evaluate the left hand side (LHS) to obtain an l-value.
Evaluate the RHS to obtain an r-value.
Perform the assignment.
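The three tasks can be sketched interpreter-style for the simple case Q = Z; (the environment dictionary and helper names here are assumptions for illustration):

```python
env = {"Z": 5}  # run-time storage, keyed by variable name

def lvalue(name):
    # Evaluating the LHS yields a location (here, just the key).
    return name

def rvalue(name):
    # Evaluating the RHS yields the value stored at a location.
    return env[name]

# Q = Z;  -- the three tasks, in order:
loc = lvalue("Q")  # 1. evaluate the LHS to obtain an l-value
val = rvalue("Z")  # 2. evaluate the RHS to obtain an r-value
env[loc] = val     # 3. perform the assignment
print(env["Q"])    # 5
```

For the harder case A[f(x)+B*D] = g(B+C*h(x,y)); the same three steps apply, but the l-value is a computed array location rather than a simple name.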
Intermediate Code Generator...
Static checking is used to ensure that r-values do not appear on the
LHS.
Type checking ensures that the types of the operands are appropriate
for the operator, and reports an error when they are not.
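A minimal static type check for a binary operator could be sketched as follows; the type names and the `types` table are assumptions for illustration:

```python
types = {"x": "int", "y": "int", "s": "string"}  # from the symbol table

def check_binary(op, left, right):
    """Return the result type, or a message if the operand
    types do not suit the operator."""
    lt, rt = types[left], types[right]
    if op in ("+", "-", "*", "/") and not (lt == rt == "int"):
        return f"type error: {op} applied to {lt} and {rt}"
    return "int"

print(check_binary("*", "x", "y"))  # int
print(check_binary("+", "x", "s"))  # type error: + applied to int and string
```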
Three Address Code
These are primitive instructions that have one operator and (up to)
three operands, all of which are addresses.
Ex.
ADD x y z
MULT a b c
ARRAY_L q r s
ifTrueGoto x L
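Generating three-address code for an expression amounts to emitting one such instruction per operator, introducing a fresh temporary for each intermediate result. A sketch, assuming the result is written as the last address and temporaries are named t1, t2, …:

```python
code = []
_counter = 0

def new_temp():
    """Return a fresh temporary name t1, t2, ..."""
    global _counter
    _counter += 1
    return f"t{_counter}"

def emit(op, a, b):
    """Emit 'op a b t', where t is a fresh temporary holding
    the result, and return t."""
    t = new_temp()
    code.append(f"{op} {a} {b} {t}")
    return t

# x + y * z  becomes two instructions:
t1 = emit("MULT", "y", "z")
t2 = emit("ADD", "x", t1)
for instr in code:
    print(instr)  # MULT y z t1 / ADD x t1 t2
```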
Syntax Directed Translator Flow
The starting point for a syntax-directed translator is a grammar for
the source language.
Syntax Directed Translator Flow..
The productions of a grammar consist of a nonterminal, called the
left side of the production, and a sequence of terminals and
nonterminals, called the right side of the production.
Syntax Directed Translator Flow...
Parsing is the problem of figuring out how a string of terminals can
be derived from the start symbol of the grammar by repeatedly
replacing a nonterminal by the body of one of its productions.
Syntax Directed Translator Flow...
A translation scheme embeds program fragments called semantic
actions in production bodies.
The actions are executed in the order that productions are used
during syntax analysis.
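The classic example of a translation scheme is infix-to-postfix translation, where a production such as expr → expr + term carries the action {print('+')} after its operands. A recursive-descent sketch for single digits and the operators + and -, executing each action at the point where its production is used (the function names are illustrative):

```python
def infix_to_postfix(s):
    """Translate e.g. '9-5+2' to '95-2+' by executing the embedded
    semantic actions (appending a symbol) as productions are applied."""
    out = []
    pos = 0

    def term():
        # term -> digit { append the digit }
        nonlocal pos
        if pos < len(s) and s[pos].isdigit():
            out.append(s[pos])  # semantic action for the digit
            pos += 1
        else:
            raise SyntaxError("digit expected")

    # expr -> term { ('+'|'-') term { append the operator } }
    term()
    while pos < len(s) and s[pos] in "+-":
        op = s[pos]
        pos += 1
        term()
        out.append(op)  # action runs after both operands are translated

    return "".join(out)

print(infix_to_postfix("9-5+2"))  # 95-2+
```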
Role of Lexical Analyzer
Sometimes, lexical analyzers are divided into a cascade of two
processes: scanning, which performs simple tasks such as deleting
comments and compacting consecutive whitespace characters, and
lexical analysis proper, which produces tokens from the output of
the scanner.
Lexical Analysis vs. Parsing
There are a number of reasons why the analysis portion is normally
separated into lexical analysis and parsing.
Tokens, Patterns & Lexemes
A token is a pair consisting of a token name and an optional
attribute value.
The token name is an abstract symbol representing a kind of lexical
unit, e.g., a particular keyword, or a sequence of input characters
denoting an identifier.
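A token as a <name, attribute> pair can be sketched as a small class; the class and field names here are illustrative:

```python
class Token:
    """A token: an abstract token name plus an optional attribute value."""
    def __init__(self, name, attribute=None):
        self.name = name
        self.attribute = attribute

    def __repr__(self):
        if self.attribute is None:
            return f"<{self.name}>"
        return f"<{self.name}, {self.attribute}>"

print(Token("if"))           # keyword: the token name says everything
print(Token("id", "count"))  # identifier: attribute identifies which one
```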
Attributes for Tokens
For tokens corresponding to keywords, attributes are not needed
since the name of the token tells everything.
Attributes for Tokens..
Ex. The token names and associated attribute values for the
Fortran statement E = M * C2 are as follows:
<id, pointer to symbol-table entry for E> <assign_op>
<id, pointer to symbol-table entry for M> <mult_op>
<id, pointer to symbol-table entry for C2>
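A toy lexer producing such pairs for an assignment statement could look like this, treating C2 as an ordinary identifier; the regular expression and the token names are assumptions for illustration:

```python
import re

def tokenize(stmt):
    """Return <token name, attribute> pairs for ids and operators."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z]\w*|[=*+\-/]", stmt):
        if lexeme == "=":
            tokens.append(("assign_op", None))
        elif lexeme in "+-*/":
            tokens.append(("mult_op" if lexeme == "*" else "add_op", None))
        else:
            # the attribute would be a pointer to the symbol-table entry;
            # the lexeme itself stands in for that pointer here
            tokens.append(("id", lexeme))
    return tokens

print(tokenize("E = M * C2"))
# [('id', 'E'), ('assign_op', None), ('id', 'M'), ('mult_op', None), ('id', 'C2')]
```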
Lexical Errors
A lexical analyzer cannot always detect errors in the source code
without the aid of other components.
Ex. If fi appears where if was intended, the lexical analyzer cannot
tell whether fi is a misspelled keyword or an undeclared identifier.
Since fi is a valid lexeme for the token id, the lexical analyzer must
return the token id to the parser and let the parser handle the error
due to the transposition of the letters.
Input Buffering
Determining the next lexeme often requires reading the input
beyond the end of that lexeme.
Ex.
To determine the end of an identifier normally requires reading the
first whitespace character after it.
Also, just reading > does not determine the lexeme, since it could
also be >=.
When you determine the current lexeme, the characters you read
beyond it may need to be read again to determine the next lexeme.
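The need to read past a lexeme and then back up can be sketched with a forward pointer and a retract step; the function name and return convention are illustrative:

```python
def next_relop(text, pos):
    """Recognize > or >= starting at pos; return (lexeme, new_pos).

    We must read one character beyond '>' to decide; if that
    character is not '=', we retract so it can be read again."""
    if text[pos] != ">":
        return None, pos
    forward = pos + 1
    if forward < len(text) and text[forward] == "=":
        return ">=", forward + 1
    return ">", forward  # retract: the extra character stays unconsumed

print(next_relop("a>=b", 1))  # ('>=', 3)
print(next_relop("a>b", 1))   # ('>', 2)
```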
Buffer Pairs
Specialized buffering techniques have been developed to reduce
the amount of overhead required to process a single input
character.
Buffer Pairs..
Two buffers are alternately reloaded; each buffer is of the same
size N, and N is usually the size of a disk block, e.g., 4096 bytes.
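A sketch of the two-buffer scheme: when the read position runs off the end of one buffer, the other buffer is reloaded from the input. N is kept tiny here only to force several reloads; in practice it would be a disk-block size such as 4096:

```python
import io

N = 4  # buffer size; in practice the disk-block size, e.g. 4096

def read_all(source):
    """Read a character stream through a pair of N-character buffers."""
    buffers = ["", ""]
    current, pos, out = 0, 0, []
    buffers[0] = source.read(N)
    while pos < len(buffers[current]):
        out.append(buffers[current][pos])
        pos += 1
        if pos == N:  # end of this buffer: reload the other one
            other = 1 - current
            buffers[other] = source.read(N)
            current, pos = other, 0
    return "".join(out)

print(read_all(io.StringIO("a>=b+c*d")))  # a>=b+c*d
```

Because the lexeme being scanned is never split across more than the two buffers, the lexical analyzer can back up within them without re-reading the input file.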
Thank You