Chapter 7 Semantic Processing

Prof Chung.

1 113/04/12
Outlines
 7.1 Syntax-directed Translation
 7.1.1 Using a Syntax Tree Representation of a Parse
 7.1.2 Compiler Organization Alternatives
 7.1.3 Parsing, Checking, and Translation in a Single Pass

 7.2 Semantic Processing Techniques


 7.2.1 LL Parsers and Action Symbols
 7.2.2 LR Parsers and Action Symbols
 7.2.3 Semantic Record Representations
 7.2.4 Implementing Action-controlled Semantic Stacks
 7.2.5 Parser-controlled Semantic Stacks

 7.3 Intermediate Representations and Code Generation
 7.3.1 Intermediate Representations vs. Direct Code Generation
 7.3.2 Forms of Intermediate Representations
 7.3.3 A Tuple Language

7.1
Syntax-directed Translation
7.1.1 Using a Syntax Tree Representation of a Parse
7.1.2 Compiler Organization Alternatives
7.1.3 Parsing, Checking, and Translation in a Single Pass

Syntax-Directed Translation
 Almost all modern compilers are syntax-directed
 The compilation process is driven by the syntactic structure of a
source program, as recognized by the parser
 Semantic Routines
 The parts of the compiler that interpret the meaning (semantics) of a
program
 Perform analysis tasks: static semantic checking, such as verifying variable
declarations and detecting type errors
 Perform synthesis tasks: IR or actual code generation
 Semantic actions are attached to productions
(or to subtrees of a syntax tree).

7.1.1
Using a Syntax Tree Representation of a Parse (1)
Parsing:
 build the parse tree
 Non-terminals for operator precedence
and associativity are included.

parse tree
<assign>
    <target>
        id
    :=
    <exp>
        <exp>
            <term>
                <term>
                    <factor>
                        const
                *
                <factor>
                    id
        +
        <term>
            <factor>
                id

Semantic processing:
 build and decorate the Abstract Syntax Tree (AST)
 Non-terminals used for ease of parsing
may be omitted in the abstract syntax tree.

abstract syntax tree
:=
    id
    +
        *
            const
            id
        id

7.1.1
Using a Syntax Tree Representation of a Parse (2)
 Semantic routines traverse the AST in post-order,
computing attributes of its nodes.
 Initially, only the leaves (i.e. terminals, e.g. const, id) have
attributes.

Ex. Y := 3*X + I
:=
    id(Y)
    +
        *
            const(3)
            id(X)
        id(I)

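The post-order traversal of the AST for Y := 3*X + I can be sketched in C as follows. This is only an illustrative sketch: the Node type and the functions post_order and example_traversal are hypothetical names, and the "attribute computation" is reduced to recording the order in which nodes are visited.

```c
#include <string.h>

/* One AST node; leaves have no children. */
typedef struct Node {
    const char *label;          /* e.g. ":=", "+", "id(Y)" */
    struct Node *left, *right;  /* NULL for leaves */
} Node;

/* Visit both children first, then the node itself (post-order),
   appending each label to out. The caller supplies a large enough buffer. */
void post_order(const Node *n, char *out) {
    if (n == NULL) return;
    post_order(n->left, out);
    post_order(n->right, out);
    strcat(out, n->label);
    strcat(out, " ");
}

/* Build the example tree for  Y := 3*X + I  and list it in post-order. */
void example_traversal(char *out) {
    Node c3 = {"const(3)", NULL, NULL}, x = {"id(X)", NULL, NULL};
    Node i = {"id(I)", NULL, NULL}, y = {"id(Y)", NULL, NULL};
    Node mul = {"*", &c3, &x};
    Node add = {"+", &mul, &i};
    Node assign = {":=", &y, &add};
    out[0] = '\0';
    post_order(&assign, out);
}
```

Every operator node is reached only after both of its operands, so the attributes of the children are already available when the parent is processed.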
7.1.1
Using a Syntax Tree Representation of a Parse (3)
 The attributes are then propagated to other nodes using
some functions, e.g.
 build the symbol table
 attach attributes to nodes
 check types, etc.
 Propagation may be bottom-up or top-down.

[Figure: a <program> tree with a declaration subtree and a <stmt> subtree (:= with children id and +; + with children * and id; * with children const and id). The declarations feed the symbol table, and each expression node carries an exp. type attribute.]
 Check types: is * an integer * or a floating *?
 Need to consult the symbol table for the types of the ids.

7.1.1
Using a Syntax Tree Representation of a Parse (4)
 After attribute propagation is done,
the tree is decorated and ready for code generation;
another pass over the decorated AST generates the code.
 Actually, these steps can be combined in a single pass:
 Build the AST
 Decorate the AST
 Generate the target code
 What we have described is essentially
an attribute grammar (AG) (details in chap. 14).

7.1.2
Compiler Organization Alternatives (1)
 A single pass for analysis and synthesis
 Interleaved in a single pass:
 Scanning
 Parsing
 Checking
 Translation
 No explicit IR is generated
 Ex. Micro Compiler (chap. 2)
 Since code generation is limited to looking at one tuple at a time,
few optimizations are possible
 Ex. Register allocation
 requires a more global view of the AST.

7.1.2
Compiler Organization Alternatives (2)
 We would like the code generator
to hide machine details completely, so that semantic routines
become machine-independent.
 However,
this separation is sometimes violated in order to produce better code.
 Suppose there are several classes of registers,
each for a different purpose.
 Then register allocation is better done
by the semantic routines than by the code generator,
since the semantic routines have a broader view of the AST.

7.1.2
Compiler Organization Alternatives (3)
 One-pass compiler + peephole optimization
 One pass for code generation
 One pass for peephole optimization
 Peephole: looking at only a few instructions at a time
 Simple but effective
 Simplifies the code generator,
 since there is a post-processing pass.

7.1.2
Compiler Organization Alternatives (4)
 One pass analysis and IR synthesis + code gen pass
 1st pass : Analysis and IR
 2nd pass : Code generation

 (+) flexible design for the code generator


 (+) may use optimizations for IR
 (+) greater independence of target machines
(the front-end is quite independent of target machines.)
 (+) re-targeting is easier.

7.1.2
Compiler Organization Alternatives (5)
 Multipass analysis
 For a limited address space, each of these four is a separate pass:
 Scanner
 Parser
 Declaration
 Static checking
 Complete separation of analysis and synthesis.

7.1.2
Compiler Organization Alternatives (6)
 Multipass synthesis
 IR
 Machine-independent optimization passes
 Machine-dependent optimization passes
 Code gen passes
 Peephole
 …….

 Many complicated optimization and code generation algorithms
require multiple passes.

7.1.2
Compiler Organization Alternatives (7)
 Multi-language and multi-target compilers
 Components may be shared and parameterized.
 Ex: Ada uses Diana (language-dependent IR)
 Ex: GCC uses two IRs.
 one is high-level and tree-oriented
 the other (RTL) is more machine-oriented

[Figure: front ends for FORTRAN, PASCAL, ADA, C, ... produce language- and machine-independent IRs; machine-independent optimization is applied to them; back ends target SUN, PC, main-frame, ...]

7.1.3
Single Pass (1)
 In Micro of chap 2, scanning, parsing and semantic
processing are interleaved in a single pass.
 (+) simple front-end
 (+) less storage if no explicit trees
 (-) immediately available information is limited since no complete
tree is built.

 Relationships

[Figure: the parser calls the scanner, which supplies tokens; the parser calls semantic routines 1, 2, ..., k, which communicate through semantic records.]

7.1.3
Single Pass (2)
 Each terminal and non-terminal has a semantic record.
 Semantic records may be considered
as the attributes of the terminals and non-terminals.
 Terminals
 the semantic records are created by the scanner.
 Non-terminals
 the semantic records are created by a semantic routine when a production
is recognized.

 Semantic records are transmitted
among semantic routines
via a semantic stack.
 Ex. A → B C D #SR

[Figure: a tree with root A and children B, C, D and #SR.]

7.1.3
Single Pass (3)
1 pass = 1 post-order traversal of the parse tree
 parsing actions -- build parse trees
 semantic actions -- post-order traversal

Ex. A := B + 1, using the productions
<assign> → ID := <exp>
<exp> → <exp> + <term>

parse tree
<assign>
    ID (A)
    :=
    <exp>
        <exp>
            id (B)
        +
        <term>
            const (1)

[Figure: successive semantic stack contents (bottom to top) as the parse proceeds: A; A B; A B +; A B + 1.]
gencode(+,B,1,tmp1)
gencode(:=,A,tmp1)

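The example A := B + 1 can be replayed with a small action-controlled semantic stack in C. This is a minimal sketch under stated assumptions: semantic records are reduced to plain names, the helper names (push, pop, action_add, action_assign, compile_example) are hypothetical, and the generated tuples are collected in a string instead of being handed to a real gencode routine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STACK 32
#define MAX_NAME  16

static char sem_stack[MAX_STACK][MAX_NAME]; /* semantic records: just names here */
static int  sp = 0;                          /* number of records on the stack */
static int  next_tmp = 1;                    /* counter for temporaries */
static char code[256];                       /* generated tuples, one per line */

static void push(const char *name) {
    assert(sp < MAX_STACK);
    snprintf(sem_stack[sp++], MAX_NAME, "%s", name);
}

static const char *pop(void) {
    assert(sp > 0);
    return sem_stack[--sp];
}

/* Action for <exp> -> <exp> + <term>:
   pops the term, the operator and the exp, then pushes a temporary. */
void action_add(void) {
    char right[MAX_NAME], op[MAX_NAME], left[MAX_NAME], tmp[MAX_NAME], line[80];
    strcpy(right, pop());
    strcpy(op, pop());
    strcpy(left, pop());
    snprintf(tmp, sizeof tmp, "tmp%d", next_tmp++);
    snprintf(line, sizeof line, "(%s,%s,%s,%s)\n", op, left, right, tmp);
    strcat(code, line);
    push(tmp);
}

/* Action for <assign> -> ID := <exp>: pops the exp and the target. */
void action_assign(void) {
    char value[MAX_NAME], target[MAX_NAME], line[80];
    strcpy(value, pop());
    strcpy(target, pop());
    snprintf(line, sizeof line, "(:=,%s,%s)\n", target, value);
    strcat(code, line);
}

/* Replays the order in which a one-pass parser would see A := B + 1. */
const char *compile_example(void) {
    sp = 0; next_tmp = 1; code[0] = '\0';
    push("A"); push("B"); push("+"); push("1");
    action_add();
    action_assign();
    return code;
}
```

Calling compile_example() yields the two tuples (+,B,1,tmp1) and (:=,A,tmp1), and the stack is empty afterwards.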
7.2
Semantic Processing Techniques
7.2.1 LL Parsers and Action Symbols
7.2.2 LR Parsers and Action Symbols
7.2.3 Semantic Record Representations
7.2.4 Implementing Action-controlled Semantic Stacks
7.2.5 Parser-controlled Semantic Stacks

7.2
Semantic Processing
 Semantic routines may be invoked in two ways:
 <1> By parsing procedures
 as in the recursive descent parser in chap 2

 <2> By the parser driver
 as in LL and LR parsers.

7.2.1
LL(1)
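 Some productions
 have no action symbols; others may have several.
 Semantic routines
 are called when action symbols appear on stack top.

Ex. <exp> → <term> + <exp> #add

[Figure: successive parse stack contents (top first): <exp>; <term> + <exp> #add; + <exp> #add; <exp> #add; #add. Meanwhile the semantic stack accumulates the records for <term>, + and <exp>; when #add reaches the top of the parse stack, they are replaced by a single <exp> record.]
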
7.2.2
LR(1) - (1)
 Semantic routines
 are invoked only when a structure is recognized.

 LR parsing
 a structure is recognized when the RHS is reduced to LHS.

 Therefore, action symbols must be placed at the end.


Ex:
<stmt> → if <cond> then <stmt> end #ifThen
<stmt> → if <cond> then <stmt> else <stmt> end #ifThenElse

7.2.2
LR(1) - (2)
 After shifting "if <cond>",
 the parser cannot decide
which of #ifThen and #ifThenElse should be invoked.
 cf. In LL parsing,
 the structure is recognized when a non-terminal is expanded.

7.2.2
LR(1) - (3)
 However, sometimes we do need to perform semantic
actions in the middle of a production.
Ex:
<stmt> → if <exp> then <stmt> end
Code is generated for <exp> and then for <stmt>.
A conditional jump is needed between them, right after <exp>.

Solution: Use two productions:
<stmt> → <if head> then <stmt> end #finishIf
<if head> → if <exp> #startIf
<if head> is a semantic hook (used only for semantic processing).

7.2.2
LR(1) - (4)
 Another problem
 What if the action is not at the end?
 Ex:
 <prog> → #start begin <stmt> end
 We need to call #start.
 Solution: Introduce a new non-terminal.
 <prog> → <head> begin <stmt> end
 <head> → #start
 YACC automatically performs such transformations.

7.2.3
Semantic Record Representation - (1)
 Since we need to use a stack to store semantic records, all
semantic records must have the same type.
 variant record in Pascal
 union type in C

 Ex:
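/* Tag that records which member of the union is in use. */
enum kind {OP, EXP, STMT, ERROR};
typedef struct {
    enum kind tag;
    union {                      /* anonymous union (C11) */
        op_rec_type OP_REC;      /* the *_rec_type types are defined elsewhere */
        exp_rec_type EXP_REC;
        stmt_rec_type STMT_REC;
        /* ...... */
    };
} sem_rec_type;
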
7.2.3
Semantic Record Representation - (2)
 How to handle errors?

 Ex.
 A semantic routine
needs to create a record for each identifier in an expression.
 What if the identifier is not declared?
 The solutions follow on the next page.

7.2.3
Semantic Record Representation - (3)
 Solution 1: make a bogus record
 This method may create a chain of
meaningless error messages due to this bogus record.
 Solution 2: create an ERROR semantic record
 No further error message is printed
when an ERROR record is encountered.
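
Solution 2 can be sketched in C as follows. The record layout and the routines is_declared, process_id and process_add are hypothetical, and the symbol table is faked by a lookup that knows only "x" and "y"; the point is only that an ERROR record propagates without printing further messages.

```c
#include <stdio.h>
#include <string.h>

enum kind { EXP, ERROR };
enum type { T_INT, T_FLOAT };

typedef struct {
    enum kind tag;
    enum type type;   /* meaningful only when tag == EXP */
} sem_rec;

static int messages = 0;   /* number of diagnostics printed so far */

/* Stand-in for a symbol table lookup: only x and y are declared. */
static int is_declared(const char *name) {
    return strcmp(name, "x") == 0 || strcmp(name, "y") == 0;
}

/* Create the record for an identifier; an undeclared one is reported once. */
sem_rec process_id(const char *name) {
    sem_rec r = { EXP, T_INT };
    if (!is_declared(name)) {
        fprintf(stderr, "error: %s is not declared\n", name);
        messages++;
        r.tag = ERROR;
    }
    return r;
}

/* Combine two operands; an ERROR operand propagates silently. */
sem_rec process_add(sem_rec a, sem_rec b) {
    sem_rec r = { EXP, T_INT };
    if (a.tag == ERROR || b.tag == ERROR) {
        r.tag = ERROR;   /* no new message: the cause was already reported */
        return r;
    }
    if (a.type != b.type) {
        fprintf(stderr, "error: operand types differ\n");
        messages++;
        r.tag = ERROR;
    }
    return r;
}
```

For x + z + y with z undeclared, only the "not declared" message is printed; both additions see an ERROR operand and stay quiet.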

 Who controls the semantic stack?
 action routines
 parser

7.2.4
Action-controlled semantic stack - (1)
 Action routines take parameters
 directly from the semantic stack and push results onto the stack.
 Implementing stacks:
 1. array
 2. linked list
 Usually, the stack is transparent: any record
in the stack may be accessed by the semantic routines.
 (-) difficult to change

7.2.4
Action-controlled semantic stack - (2)
 Two other disadvantages:
 (-) Action routines
need to manage the stack.
 (-) Control of the stack
is distributed among action routines.
 Each action routine
pops some records and pushes 0 or 1 record.
 If any action routine
makes a mistake, the whole stack is corrupted.
 The solutions follow on the next page.

7.2.4
Action-controlled semantic stack - (3)
 Solution 1: Let the parser control the stack
 Solution 2: Introduce additional stack routines
 Ex:
 Parser → Stack routines → Parameter-driven action routines
 If action routines
do not control the stack, we can use an opaque (or abstract)
stack: only push() and pop() are provided.
 (+) clean interface
 (-) less efficient

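An opaque semantic stack that exposes only push and pop might look like the following in C. This is a sketch, not a definitive design: the names sem_stack, stack_new, stack_push, stack_pop and stack_free are hypothetical, sem_rec is a placeholder record type, and in a real project the struct definition would be hidden in the implementation file so that callers see only the incomplete type.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int value; } sem_rec;   /* placeholder record type */

/* Callers would see only this incomplete type in the header. */
typedef struct sem_stack sem_stack;

/* Representation: a growable array, private to the stack routines. */
struct sem_stack {
    sem_rec *items;
    size_t   count, capacity;
};

sem_stack *stack_new(void) {
    sem_stack *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->items = NULL;
    s->count = s->capacity = 0;
    return s;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
int stack_push(sem_stack *s, sem_rec r) {
    if (s->count == s->capacity) {
        size_t cap = s->capacity ? 2 * s->capacity : 8;
        sem_rec *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL) return -1;
        s->items = p;
        s->capacity = cap;
    }
    s->items[s->count++] = r;
    return 0;
}

/* Popping an empty stack is treated as a programming error. */
sem_rec stack_pop(sem_stack *s) {
    assert(s->count > 0);
    return s->items[--s->count];
}

void stack_free(sem_stack *s) {
    if (s != NULL) { free(s->items); free(s); }
}
```

Because no routine can index into the stack, the array could later be swapped for a linked list without touching any caller; the price is that every access goes through a function call and copies a record.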
7.2.5
parser-controlled stack - (1)
 LR
 The semantic stack and the parse stack operate in parallel (they shift and
reduce in the same way).
 Ex: <stmt> → if <exp> then <stmt> end

[Figure: the parser stack holds if, <exp>, then, <stmt> (bottom to top), and the semantic stack holds the corresponding records in the same positions; the two stacks may be combined.]
 Ex:
 YACC generates such a parser-controlled semantic stack.
 <exp> → <exp> + <term>
 { $$.value = $1.value + $3.value; }

7.2.5
parser-controlled stack - (2)
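 LL parser-controlled semantic stack
 Every time a production A → B C D is predicted,
A on the parse stack is replaced by B C D, and semantic records for
B, C and D are placed on the semantic stack.

[Figure: the parse stack before (A) and after (B C D, top first) the prediction; the semantic stack with A at position 7, B, C, D at positions 9, 10, 11, and position 12 marked top. The figure also marks the left, right and current pointers.]

Need four pointers for the semantic stack (left, right, current, top).
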
7.2.5
parser-controlled stack - (3)
 However, when a new production B → E F G is predicted,
the four pointers will be overwritten.
 Therefore, create a new EOP record for the four pointers
on the parse stack.
 When the EOP record appears on stack top,
restore the four pointers, which essentially pops
records off the semantic stack.
 An example is on the next page.

7.2.5
parser-controlled stack - (4)

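[Figure: the parse stack and the semantic stack in three stages. Initially A is on the parse stack, and its record is at semantic stack position 7. After A → B C D is predicted, the parse stack holds B C D EOP(...) (top first), and the semantic stack holds B, C, D at positions 9 to 11, with top at 12. After B → E F G is predicted, the parse stack holds E F G EOP(7,9,9,12) C D EOP(...), and the semantic stack additionally holds E, F, G at positions 12 to 14, with top at 15.]
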
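The pointer bookkeeping behind EOP(7,9,9,12) can be sketched in C. Several details here are assumptions rather than something the slides state: the update rule in predict (left = current, right = top, current = right, top grows by the RHS length), the step that advances current after an EOP record is processed, the starting values (A's record at position 7, first free position 9), and all the names (Ptrs, predict, end_of_production).

```c
#include <assert.h>

/* The four semantic-stack indices, in the order (left, right, current, top). */
typedef struct { int left, right, current, top; } Ptrs;

#define MAX_EOP 64
static Ptrs eop_stack[MAX_EOP];   /* stands in for EOP records on the parse stack */
static int  eop_count = 0;
static Ptrs p = { 0, 0, 7, 9 };   /* assumed start: A's record at 7, first free slot 9 */

/* Called when a production with n_rhs right-hand-side symbols is predicted. */
void predict(int n_rhs) {
    assert(eop_count < MAX_EOP);
    eop_stack[eop_count++] = p;   /* save the old pointers in an EOP record */
    p.left = p.current;           /* the LHS record is the symbol being expanded */
    p.right = p.top;              /* RHS records start at the old top */
    p.current = p.right;
    p.top += n_rhs;               /* reserve one record per RHS symbol */
}

/* Called when an EOP record reaches the top of the parse stack. */
void end_of_production(void) {
    assert(eop_count > 0);
    p = eop_stack[--eop_count];   /* restoring top discards the RHS records */
    p.current++;                  /* assumption: move on to the next RHS symbol */
}
```

Predicting A → B C D and then B → E F G saves (7,9,9,12) in the second EOP record, matching the slide; processing that EOP record restores top to 12, which discards the records of E, F and G.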
7.2.5
parser-controlled stack - (5)
 Note
 All push() and pop() operations are done by the parser,
 not by the action routines.
 Semantic records
 are passed to the action routines as parameters.
 Example
 <primary> → ( <exp> ) #copy($2,$$)

7.2.5
parser-controlled stack - (6)
 Initial information
 is stored in the semantic record of the LHS.
 After the RHS is processed, the resulting information
 is stored back in the semantic record of the LHS.

[Figure: information flow (attributes). Initially the semantic stack holds A's record; while the RHS is processed, the records for B, C and D sit above it; finally only A's record remains.]

7.2.5
parser-controlled stack - (7)
 (-) Semantic stack may grow very big.
 <fix>
 Certain non-terminals never use semantic records,
 e.g. <stmt list> and <id list>.
 We may insert #reuse
 before the last non-terminal in each of their productions.
 Example
 <stmt list> → <stmt> #reuse <stmt tail>
 <stmt tail> → <stmt> #reuse <stmt tail>
 <stmt tail> → λ
 Evaluation
 A parser-controlled semantic stack is easy with LR, but not so with LL.

7.3
Intermediate Representations
and Code Generation
7.3.1 Intermediate Representations vs. Direct Code Generation
7.3.2 Forms of Intermediate Representations
7.3.3 A Tuple Language

7.3
Intermediate representation and code generation
 Two possibilities:

1. ..... → semantic routines + code generation → Machine code
(+) no extra pass for code generation
(+) allows simple 1-pass compilation

2. ..... → semantic routines → IR → code generation → Machine code
(+) allows higher-level operations, e.g. open block, call procedures.
(+) better optimization because the IR is at a higher level.
(+) machine dependence is isolated in code generation.

7.3
Intermediate representation and code generation
 IR
 good for optimization and portability

 Machine Code
 simple

7.3.2 - (1)
 1. postfix form
 Example
a+b ab+
(a+b)*c ab+c*
a+b*c abc*+
a:=b*c+b*d abc*bd*+:=

 (+) simple and concise
 (+) good for driving an interpreter
 (-) not good for optimization or code generation

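To illustrate why postfix form is convenient for driving an interpreter, here is a minimal stack evaluator in C. It is a sketch under simplifying assumptions: operands are single lowercase letters whose values come from a caller-supplied table, only + and * are supported (not :=), the input is well formed, and the function name eval_postfix is hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Evaluate a postfix string over single-letter variables a..z.
   vars[0] holds the value of a, vars[1] of b, and so on
   (this indexing assumes a contiguous alphabet, as in ASCII). */
int eval_postfix(const char *postfix, const int vars[26]) {
    int stack[64];
    int sp = 0;
    for (const char *p = postfix; *p != '\0'; p++) {
        if (islower((unsigned char)*p)) {
            assert(sp < 64);
            stack[sp++] = vars[*p - 'a'];      /* operand: push its value */
        } else if (*p == '+' || *p == '*') {
            assert(sp >= 2);
            int right = stack[--sp];           /* operator: pop two operands */
            int left  = stack[--sp];
            stack[sp++] = (*p == '+') ? left + right : left * right;
        }
    }
    assert(sp == 1);                           /* exactly one result remains */
    return stack[0];
}
```

With a = 1, b = 2 and c = 3, the strings ab+, ab+c* and abc*+ evaluate to 3, 9 and 7. No parentheses or precedence rules are needed, because the operand order already encodes them.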
7.3.2 - (2)
 2. 3-addr code
 Triple: op arg1 arg2
 Quadruple: op arg1 arg2 arg3

Ex. a := b*c + b*d

Triples              Quadruples
(1) (* b c)          (1) (* b c t1)
(2) (* b d)          (2) (* b d t2)
(3) (+ (1) (2))      (3) (+ t1 t2 t3)
(4) (:= (3) a)       (4) (:= t3 a _)

In triples, intermediate results are referenced by the instruction #;
quadruples use temporary names.

 Triple: more concise
 But what if instructions are deleted,
moved or added during optimization?
 Triples and quadruples
 are more similar to machine code.

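The quadruples for a := b*c + b*d can be built with a small table in C. This is an illustrative sketch: the Quad layout, the fixed field sizes and the helpers emit, emit_binary and build_example are all hypothetical choices, not something the slides prescribe.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    char op[4];
    char arg1[8], arg2[8], arg3[8];   /* "_" marks an unused field */
} Quad;

#define MAX_QUADS 16
static Quad quads[MAX_QUADS];
static int  n_quads = 0;
static int  n_temps = 0;

/* Append one quadruple and return its index, or -1 if the table is full. */
int emit(const char *op, const char *a1, const char *a2, const char *a3) {
    if (n_quads == MAX_QUADS) return -1;
    Quad *q = &quads[n_quads];
    snprintf(q->op, sizeof q->op, "%s", op);
    snprintf(q->arg1, sizeof q->arg1, "%s", a1);
    snprintf(q->arg2, sizeof q->arg2, "%s", a2);
    snprintf(q->arg3, sizeof q->arg3, "%s", a3);
    return n_quads++;
}

/* Emit a binary operation into a fresh temporary and report its name. */
void emit_binary(const char *op, const char *a1, const char *a2, char result[8]) {
    snprintf(result, 8, "t%d", ++n_temps);
    emit(op, a1, a2, result);
}

/* Quadruples for  a := b*c + b*d. */
void build_example(void) {
    char t1[8], t2[8], t3[8];
    n_quads = n_temps = 0;
    emit_binary("*", "b", "c", t1);
    emit_binary("*", "b", "d", t2);
    emit_binary("+", t1, t2, t3);
    emit(":=", t3, "a", "_");
}
```

Because operands are named temporaries rather than instruction numbers, an optimizer can delete or move an entry of the table without renumbering the references in the other entries, which is the weakness of triples noted above.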
7.3.2 - (3)
 More detailed 3-addr code
 Add type information

 Example a := b*c + b*d
 Suppose b, c are integer type, d is float type.

Triples                 Quadruples
(1) (I* b c)            (I* b c t1)
(2) (FLOAT b _)         (FLOAT b t2 _)
(3) (F* (2) d)          (F* t2 d t3)
(4) (FLOAT (1) _)       (FLOAT t1 t4 _)
(5) (F+ (4) (3))        (F+ t4 t3 t5)
(6) (:= (5) a)          (:= t5 a _)

7.3.2 - (4)
 Sometimes,
the number of arguments to operators may vary.

 The generalized 3-addr code is called tuples.

 Example
(I* b c t1)
(FLOAT b t2)
(F* t2 d t3)
(FLOAT t1 t4)
(F+ t4 t3 t5)
( := t5 a)
7.3.2 - (5)
 3. Sometimes, we use trees or DAGs
Ex: a := b*c + b*d

[Figure: the tree for a := b*c + b*d (:= with children a and +; + with two * children, whose leaves are b, c and b, d) beside the corresponding DAG, in which the two * nodes share a single b leaf.]
 More generally, we may use the AST as the IR.
 Machine-independent optimization
is implemented as tree transformations.
 Ex. Ada uses Diana.
.....