
Lesson 08 2

Uploaded by

my5911319
Copyright
© © All Rights Reserved

LESSON 08

Overview of Previous Lesson(s)
Overview
 Syntax-directed translation is done by attaching rules or program
fragments to productions in a grammar.

 An attribute is any quantity associated with a programming
construct.

 A translation scheme is a notation for attaching program fragments
to the productions of a grammar.

Overview..
 In an abstract syntax tree for an expression, each interior node
represents an operator, and the children of the node represent the
operands of the operator.

Abstract Syntax tree for 9-5+2
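
The tree for 9-5+2 can be written out as nested nodes. This is only a minimal sketch: the class names `Num` and `BinOp` and the helper `evaluate` are made up for illustration and do not come from the lesson.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: an operand."""
    value: int


@dataclass
class BinOp:
    """Interior node: an operator with its two operand subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def evaluate(node):
    """Walk the tree bottom-up and compute the value of the expression."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left - right


# + and - are left-associative, so 9-5+2 groups as (9-5)+2:
# the root is "+", its left child is the subtree for 9-5.
tree = BinOp("+", BinOp("-", Num(9), Num(5)), Num(2))
```

Note that the root of the tree is the operator applied last, not the one written first.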

Overview…
 Structure of our Compiler
[Figure: source program (character stream) -> lexical analyzer -> token stream -> syntax-directed translator -> Java bytecode. The parser and code generator for the translator are developed from the syntax definition (BNF grammar) and the JVM specification.]

Overview…
 Typical tasks performed by lexical analyzer

 Removal of white space and comments


 Encode constants as tokens
 Recognize keywords
 Recognize identifiers
 Store identifier names in a global symbol table

TODAY’S LESSON

Contents
 Symbol Tables
 Symbol Table Per Scope
 The Use of Symbol Tables
 Intermediate Code Generator

 Syntax Directed Translator Flow


 Role of the Lexical Analyzer
 Tokens, Patterns & Lexemes
 Attributes for Tokens
 Lexical Errors
 Input Buffering
 Buffer Pairs

Symbol Tables
 Symbol tables are data structures that are used by compilers to
hold information about source-program constructs.

 Information is put into the symbol table when the declaration of an
identifier is analyzed.

 Entries in the symbol table contain information about an identifier
such as its character string (or lexeme), its type, its position in
storage, and any other relevant information.

 The information is collected incrementally.

Symbol Table Per Scope
 If the language you are compiling supports nested scopes, the lexer
can only construct the <lexeme,token> pairs.

 The parser converts these pairs into a true symbol table that reflects
the nested scopes.
 If the language is flat, the scanner can produce the symbol table.

 The key idea is that a new symbol table is created whenever a block
is entered.

 Each such table points to the table of the immediately enclosing
scope.
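
A minimal sketch of one symbol table per scope, chained to the enclosing scope. The class name `Env` and its methods `put` and `get` are assumptions chosen for this example; the entry contents (a dictionary with a `type` field) are likewise illustrative.

```python
class Env:
    """One symbol table for one scope, linked to the enclosing scope's table."""

    def __init__(self, outer=None):
        self.table = {}      # lexeme -> information about the identifier
        self.outer = outer   # table of the immediately enclosing scope

    def put(self, lexeme, info):
        """Record a declaration in the current scope."""
        self.table[lexeme] = info

    def get(self, lexeme):
        """Search the current scope first, then each enclosing scope in turn."""
        env = self
        while env is not None:
            if lexeme in env.table:
                return env.table[lexeme]
            env = env.outer
        return None          # not declared in any visible scope


top = Env()                        # outermost scope
top.put("x", {"type": "int"})
top.put("y", {"type": "bool"})

inner = Env(top)                   # entering a block creates a new table
inner.put("x", {"type": "char"})   # this x hides the outer x inside the block
```

Looking up `x` from `inner` finds the inner declaration, while `y` is found by following the link to `top`. Leaving the block amounts to going back to using `top`.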

Use of Symbol Table
 A semantic action gets information from the symbol table when
the identifier is subsequently used, for example, as a factor in an
expression.

 The intermediate code generator can produce two important forms of
intermediate representation:

 Trees, especially parse trees and syntax trees.

 Linear, especially three-address code

Intermediate Code Generator
 Static checking refers to checks performed during compilation,
whereas dynamic checking refers to those performed at run time.

 Examples of static checks include

 Syntactic checks such as avoiding multiple declarations of the same
identifier in the same scope.

 Type checks.

Intermediate Code Generator..
 L-values and R-values
Consider Q = Z; or A[f(x)+B*D] = g(B+C*h(x,y));

 Three tasks:
 Evaluate the left hand side (LHS) to obtain an l-value.
 Evaluate the RHS to obtain an r-value.
 Perform the assignment.

 An l-value corresponds to an address or a location.


 An r-value corresponds to a value.
 Neither 12 nor s+t can be used as an l-value, but both are legal
r-values.
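
The l-value rule can be checked statically on a small expression tree. The sketch below assumes a toy set of node classes (`Id`, `Const`, `Index`, `BinOp`) and a helper `check_assignment`; none of these names come from the lesson.

```python
from dataclasses import dataclass


@dataclass
class Id:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class Index:
    """An array element such as A[i]."""
    array: object
    index: object


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def is_lvalue(node):
    """In this toy language only names and array elements denote locations."""
    return isinstance(node, (Id, Index))


def check_assignment(lhs, rhs):
    """Static check: reject an assignment whose left side has no address."""
    if not is_lvalue(lhs):
        raise TypeError("left-hand side of assignment is not an l-value")
    return ("assign", lhs, rhs)
```

With these definitions `Q = Z` passes the check, while `12 = Z` and `s+t = Z` are rejected before any code is generated.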

Intermediate Code Generator...
 Static checking is used to ensure that r-values do not appear on the
LHS.

 Type checking ensures that the types of the operands match what the
operator expects, and reports an error otherwise.

 Coercion: the automatic conversion of one type to another.

 Overloading: the same symbol can have different meanings depending
on the types of the operands.
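
Type checking, coercion and overloading can be illustrated together in one small function. The type names (`int`, `float`, `string`) and the rules below are assumptions for a toy language, not rules stated in the lesson.

```python
def result_type(op, left, right):
    """Return the type of `left op right`, or raise TypeError if ill-typed."""
    numeric = {"int", "float"}
    if op in {"+", "-", "*", "/"} and left in numeric and right in numeric:
        # Coercion: when int and float are mixed, the int operand is
        # converted automatically and the result is float.
        return "float" if "float" in (left, right) else "int"
    if op == "+" and left == "string" and right == "string":
        # Overloading: the same symbol + means concatenation for strings.
        return "string"
    # Type check failed: report the error.
    raise TypeError(f"operator {op!r} is not defined for {left} and {right}")
```

For example, `result_type("*", "int", "float")` gives `float` through coercion, and `result_type("-", "string", "int")` is reported as a type error.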

Three Address Code
 These are primitive instructions that have one operator and (up to)
three operands, all of which are addresses.

 One address is the destination, which receives the result of the
operation.
 The other two addresses are the sources of the values to be operated on.

 Ex.
ADD x y z
MULT a b c
ARRAY_L q r s
ifTrueGoto x L
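
A minimal sketch of generating such instructions from an expression tree. It assumes that the first address of each instruction is the destination (the slide does not say which position holds it), that expressions are nested tuples `(op, left, right)`, and that temporaries are named `t1`, `t2`, and so on. The function name `gen` is made up for this example.

```python
import itertools

OPCODES = {"+": "ADD", "*": "MULT"}   # mnemonics as used in the example above


def gen(expr, code, temps):
    """Append instructions for expr to code; return the address of its value."""
    if isinstance(expr, str):         # a variable name is already an address
        return expr
    op, left, right = expr
    a = gen(left, code, temps)        # address holding the left operand
    b = gen(right, code, temps)       # address holding the right operand
    dest = f"t{next(temps)}"          # fresh temporary for the result
    # Assumption: destination first, then the two sources.
    code.append(f"{OPCODES[op]} {dest} {a} {b}")
    return dest


code = []
result = gen(("+", "a", ("*", "b", "c")), code, itertools.count(1))
# code is now ["MULT t1 b c", "ADD t2 a t1"] and result is "t2"
```

Each instruction has exactly one operator, so the nested expression a + b * c is flattened into two instructions linked by the temporary `t1`.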

Syntax Directed Translator Flow
 The starting point for a syntax-directed translator is a grammar for
the source language.

 A grammar describes the hierarchical structure of programs.

 It is defined in terms of elementary symbols called terminals and
variable symbols called nonterminals.

 These symbols represent language constructs.

Syntax Directed Translator Flow..
 Each production of a grammar consists of a nonterminal, called the
left side of the production, and a sequence of terminals and
nonterminals, called the right side of the production.

 One nonterminal is designated as the start symbol.

 A lexical analyzer reads the input one character at a time and
produces as output a stream of tokens.
 A token consists of a terminal symbol along with additional
information in the form of attribute values.

Syntax Directed Translator Flow...
 Parsing is the problem of figuring out how a string of terminals can
be derived from the start symbol of the grammar by repeatedly
replacing a nonterminal by the body of one of its productions.

 Efficient parsers can be built using a top-down method called
predictive parsing.

 A syntax-directed definition attaches rules to productions; the rules
compute attribute values.

Syntax Directed Translator Flow...
 A translation scheme embeds program fragments called semantic
actions in production bodies.
 The actions are executed in the order that productions are used
during syntax analysis.

 The result of syntax analysis is a representation of the source
program, called intermediate code.

 An abstract syntax tree has nodes for programming constructs; the
children of a node give the meaningful subconstructs.
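
Predictive parsing and semantic actions can be seen together in a small translator from infix to postfix. The sketch assumes the grammar expr -> term rest, rest -> + term rest | - term rest | empty, term -> digit, with single-digit operands only. The class `Parser` and the function `to_postfix` are names chosen for this example.

```python
class Parser:
    """Predictive parser whose semantic actions emit postfix notation."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.out = []        # output produced by the semantic actions

    def lookahead(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match(self, ch):
        if self.lookahead() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        # expr -> term rest, with the tail recursion of rest turned into a loop
        self.term()
        while self.lookahead() in ("+", "-"):
            op = self.lookahead()
            self.match(op)
            self.term()
            self.out.append(op)   # action: emit the operator after its operands

    def term(self):
        ch = self.lookahead()
        if ch is None or not ch.isdigit():
            raise SyntaxError(f"digit expected at position {self.pos}")
        self.match(ch)
        self.out.append(ch)       # action: emit the digit


def to_postfix(text):
    parser = Parser(text)
    parser.expr()
    if parser.lookahead() is not None:
        raise SyntaxError("unexpected input after expression")
    return "".join(parser.out)
```

Calling `to_postfix("9-5+2")` yields `95-2+`. The actions run in the order in which the productions are used, which is what places each operator after both of its operands.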

Role of Lexical Analyzer

Role of Lexical Analyzer..
 Sometimes, lexical analyzers are divided into a cascade of two
processes:

 Scanning consists of the simple processes that do not require
tokenization of the input, such as deletion of comments and
compaction of consecutive whitespace characters into one.

 Lexical analysis is the more complex portion, where the scanner
produces the sequence of tokens as output.
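
The scanning stage can be sketched as a text-to-text pre-pass. The example assumes `//` line comments, which the lesson does not specify, and the function name `scan` is made up. It also ignores string literals, which a real scanner would have to leave untouched.

```python
import re


def scan(source):
    """Pre-pass: delete // comments and compact whitespace runs to one blank."""
    without_comments = re.sub(r"//[^\n]*", "", source)
    # Leading and trailing blanks are also dropped here for convenience.
    return re.sub(r"\s+", " ", without_comments).strip()


cleaned = scan("x  =  1; // set x\n\ty = 2;")
# cleaned is "x = 1; y = 2;"
```

The output is still plain characters. Grouping them into tokens is left to the second stage.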

Lexical Analysis Vs Parsing
 There are a number of reasons why the analysis portion is normally
separated into lexical analysis and parsing.

 The separation of lexical and syntactic analysis often allows us to
simplify at least one of these tasks.

 Compiler efficiency is improved. A separate lexical analyzer allows us
to apply specialized techniques that serve only the lexical task, not
the job of parsing.

 Compiler portability is enhanced. Input-device-specific peculiarities
can be restricted to the lexical analyzer.

Tokens, Patterns & Lexemes
 A token is a pair consisting of a token name and an optional
attribute value.
 The token name is an abstract symbol representing a kind of lexical
unit, e.g., a particular keyword, or sequence of input characters
denoting an identifier.

 A pattern is a description of the form that the lexemes of a token
may take.
 In the case of a keyword as a token, the pattern is just the sequence of
characters that form the keyword.

 A lexeme is a sequence of characters in the source program that
matches the pattern for a token and is identified by the lexical
analyzer as an instance of that token.
Tokens, Patterns & Lexemes..
 In many programming languages, the following classes cover most
or all of the tokens:

Attributes for Tokens
 For tokens corresponding to keywords, attributes are not needed
since the name of the token tells everything.

 But consider the token corresponding to integer constants. Just
knowing that we have a constant is not enough; subsequent
stages of the compiler need to know the value of the constant.

 Similarly, for the token identifier we need to distinguish one
identifier from another.
 The normal method is for the attribute to specify the symbol table
entry for this identifier.

Attributes for Tokens..
 Ex. The token names and associated attribute values for the
Fortran statement E = M * C ** 2 are as follows:

<id, pointer to symbol-table entry for E>


<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
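
A minimal sketch of a tokenizer that produces this kind of output for the statement E = M * C ** 2. The token names follow the listing above; the regular expressions, the function name `tokenize`, and the use of a dictionary index as the "pointer to symbol-table entry" are assumptions made for the example.

```python
import re

TOKEN_SPEC = [
    ("number",    r"\d+"),
    ("id",        r"[A-Za-z][A-Za-z0-9]*"),
    ("exp_op",    r"\*\*"),      # listed before mult_op so ** is tried first
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("ws",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source, symtab):
    """Return a list of tokens; identifiers carry their symbol-table index."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"no token matches at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                           # whitespace yields no token
        if kind == "id":
            # Attribute: index of the identifier's symbol-table entry.
            tokens.append(("id", symtab.setdefault(lexeme, len(symtab))))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))   # attribute: the value
        else:
            tokens.append((kind,))             # operators need no attribute
    return tokens


symtab = {}
tokens = tokenize("E = M * C ** 2", symtab)
```

The operators carry no attribute because the token name alone identifies them, while `id` and `number` need one to tell their instances apart.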

Lexical Errors
 A lexical analyzer cannot always detect errors in source code without
the aid of other components.

 Ex. The string fi is encountered for the first time in a program in
the context:
fi ( a == f (x) ) …
 A lexical analyzer cannot tell whether fi is a misspelling of the
keyword if or an undeclared function identifier.

 Since fi is a valid lexeme for the token id, the lexical analyzer must
return the token id to the parser and let the parser handle the error
caused by the transposition of the letters.

Lexical Errors..

 A lexical analyzer may be unable to proceed because none of the
patterns for tokens matches any prefix of the remaining input. The
simplest recovery strategy in this situation is "panic mode" recovery.

 In this strategy we delete successive characters from the remaining
input until the lexical analyzer can find a well-formed token at the
beginning of what input is left.
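
Panic-mode recovery can be sketched in a few lines. The token patterns and the function name `tokenize` below are assumptions for a toy language; only the recovery loop reflects the strategy described above.

```python
import re

TOKEN = re.compile(
    r"(?P<ws>\s+)|(?P<number>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/=()])"
)


def tokenize(source):
    """Return (tokens, errors); errors lists each deleted run of characters."""
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m:
            if m.lastgroup != "ws":
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
            continue
        # Panic mode: no pattern matches here, so delete characters one by
        # one until some pattern matches at the start of the remaining input.
        start = pos
        while pos < len(source) and not TOKEN.match(source, pos):
            pos += 1
        errors.append((start, source[start:pos]))
    return tokens, errors


tokens, errors = tokenize("x = 3 @#$ y")
```

Here the characters `@#$` match no pattern, so they are skipped and recorded, and scanning resumes at the following whitespace.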

Lexical Errors...

 Other possible error-recovery actions are:

 Delete one character from the remaining input.


 Insert a missing character into the remaining input.
 Replace a character by another character.
 Transpose two adjacent characters.
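
Two of these actions, deletion and transposition, can be sketched as functions that list every string reachable by a single edit. The keyword set and all function names are assumptions for illustration; insertion and replacement are left out to keep the example short.

```python
KEYWORDS = {"if", "else", "while"}    # assumed keyword set for a toy language


def deletions(lexeme):
    """All strings obtained by deleting one character."""
    return {lexeme[:i] + lexeme[i + 1:] for i in range(len(lexeme))}


def transpositions(lexeme):
    """All strings obtained by swapping two adjacent characters."""
    return {
        lexeme[:i] + lexeme[i + 1] + lexeme[i] + lexeme[i + 2:]
        for i in range(len(lexeme) - 1)
    }


def keyword_repairs(lexeme):
    """Keywords that are exactly one deletion or transposition away."""
    return sorted((deletions(lexeme) | transpositions(lexeme)) & KEYWORDS)
```

For instance, `keyword_repairs("fi")` returns `["if"]`. Whether such a repair should actually be applied is a separate decision, since a string like fi may equally be a legitimate identifier.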

Input Buffering
 Determining the next lexeme often requires reading the input
beyond the end of that lexeme.

 Ex.
 Determining the end of an identifier normally requires reading the
first whitespace character after it.
 Similarly, reading > alone does not determine the lexeme, since it
could be the start of >=.
 When you determine the current lexeme, the characters you read
beyond it may need to be read again to determine the next lexeme.
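
The > versus >= case can be sketched with one character of lookahead. The function name `scan_relop` and the token name `relop` are assumptions for this example.

```python
def scan_relop(source, pos):
    """Return (token, next_pos) for a < or > operator starting at pos."""
    ch = source[pos]
    if ch not in ("<", ">"):
        raise ValueError(f"no relational operator at position {pos}")
    # Look one character past the operator without consuming it yet.
    lookahead = source[pos + 1] if pos + 1 < len(source) else ""
    if lookahead == "=":
        return ("relop", ch + "="), pos + 2    # lookahead belongs to the lexeme
    return ("relop", ch), pos + 1              # lookahead is left for next time
```

In `a>b` the character `b` is inspected while scanning `>`, but the returned position still points at `b`, so it is read again as the start of the next lexeme.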

Buffer Pairs
 Specialized buffering techniques have been developed to reduce
the amount of overhead required to process a single input
character.

 An important scheme involves two buffers that are alternately
reloaded.

Buffer Pairs
 Each buffer is of the same size N , and N is usually the size of a disk
block, e.g., 4096 bytes.

 Using one system read command we can read N characters into a
buffer, rather than using one system call per character.

 If fewer than N characters remain in the input file, then a special
character, eof, marks the end of the source file.

 Two pointers to the input are maintained:

 Pointer lexemeBegin, marks the beginning of the current lexeme,
whose extent we are attempting to determine.
 Pointer forward scans ahead until a pattern match is found.
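
A minimal sketch of the two-buffer scheme with both pointers. Several details are assumptions made for the example: the class name `BufferPair`, a tiny default N of 8 so that reloads happen on short inputs, the NUL character standing in for eof, and the restriction that no lexeme is longer than N characters. The helper `words` is only a demonstration driver.

```python
import io

EOF = "\0"    # assumed sentinel; the input itself must not contain it


class BufferPair:
    """Two buffers of n characters each, reloaded alternately."""

    def __init__(self, stream, n=8):
        self.stream, self.n = stream, n
        self.buf = [EOF] * (2 * n)
        self.lexeme_begin = 0     # start of the current lexeme
        self.forward = 0          # scans ahead looking for the lexeme's end
        self._load(0)

    def _load(self, half):
        chunk = self.stream.read(self.n)   # one read call fills a whole buffer
        start = half * self.n
        for i in range(self.n):
            # Positions past the end of the input are filled with eof.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def peek(self):
        return self.buf[self.forward]

    def advance(self):
        self.forward += 1
        if self.forward == self.n:            # left the first buffer
            self._load(1)
        elif self.forward == 2 * self.n:      # left the second buffer
            self._load(0)
            self.forward = 0

    def take_lexeme(self):
        """Return the characters from lexeme_begin up to forward."""
        chars, i = [], self.lexeme_begin
        while i != self.forward:
            chars.append(self.buf[i])
            i = (i + 1) % (2 * self.n)        # a lexeme may wrap around
        self.lexeme_begin = self.forward
        return "".join(chars)


def words(text, n=8):
    """Demonstration: split blank-separated words using the buffer pair."""
    bp, found = BufferPair(io.StringIO(text), n), []
    while bp.peek() != EOF:
        if bp.peek() == " ":
            bp.advance()
            bp.take_lexeme()                  # discard the blank
            continue
        while bp.peek() not in (" ", EOF):
            bp.advance()
        found.append(bp.take_lexeme())
    return found
```

With N = 8, the input `alpha beta gamma delta` forces both buffers to be reloaded, and the words `beta` and `gamma` each straddle a buffer boundary, yet they are still recovered intact.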

Thank You
