
CTCD Unit 4

unit 4 ctcd

Uploaded by

Ranjit47 H
Copyright
© © All Rights Reserved

UNIT IV – RUN TIME ENVIRONMENTS AND CODE GENERATION

ISSUES IN THE DESIGN OF A CODE GENERATOR

The following issues arise during the code generation phase:

1. Input to code generator


2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to code generator:


 The input to the code generation consists of the intermediate representation of the source
program produced by the front end, together with information in the symbol table to determine
the run-time addresses of the data objects denoted by the names in the intermediate
representation.

 Intermediate representation can be:


a. Linear representation such as postfix notation
b. Three address representation such as quadruples
c. Virtual machine representation such as stack machine code
d. Graphical representations such as syntax trees and DAGs.

 Prior to code generation, the source program must have been scanned, parsed and translated
into intermediate representation by the front end, along with the necessary type checking.
Therefore, the input to code generation is assumed to be error-free.

2. Target program:
 The output of the code generator is the target program. The output may be:
a. Absolute machine language
- It can be placed in a fixed memory location and can be executed immediately.
b. Relocatable machine language
- It allows subprograms to be compiled separately.

c. Assembly language
- Code generation is made easier.

3. Memory management:
 Names in the source program are mapped to addresses of data objects in run-time
memory by the front end and code generator.

 It makes use of symbol table, that is, a name in a three-address statement refers to a
symbol-table entry for the name.

 Labels in three-address statements have to be converted to addresses of instructions.


For example,
j: goto i generates a jump instruction as follows:
 If i < j, the jump is backward. A jump instruction is generated whose target address is the
location of the code for quadruple i, which is already known.
 If i > j, the jump is forward. We must store on a list for quadruple i the location
of the first machine instruction generated for quadruple j (the jump itself). When quadruple i is
processed, its machine location is filled in for all instructions that are forward jumps to i.
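
The backward/forward jump rule above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `resolve_jumps`, the tuple encoding of quadruples, and the idea that an ordinary quadruple expands to a known number of machine instructions are all hypothetical and not part of the notes.

```python
def resolve_jumps(quads):
    """Lay out machine code and resolve the targets of goto quadruples.

    quads: list of tuples, either ("goto", i) with i a 1-based quadruple
    number, or ("op", n) for a statement assumed to expand to n instructions.
    Returns a list of machine instructions; every JMP holds the 0-based
    address of the first instruction of its target quadruple.
    """
    code = []      # generated machine instructions
    start = {}     # quadruple number -> address of its first instruction
    pending = {}   # quadruple number -> addresses of JMPs waiting for it

    for j, quad in enumerate(quads, start=1):
        start[j] = len(code)
        # Quadruple j is being placed now: fill in every forward jump to it.
        for addr in pending.pop(j, []):
            code[addr] = ("JMP", start[j])
        if quad[0] == "goto":
            i = quad[1]
            if i <= j:
                # Backward jump: the target address is already known.
                code.append(("JMP", start[i]))
            else:
                # Forward jump: emit a placeholder and remember it on i's list.
                pending.setdefault(i, []).append(len(code))
                code.append(("JMP", None))
        else:
            code.extend(("OP", j) for _ in range(quad[1]))
    return code


quads = [("op", 2), ("goto", 4), ("op", 1), ("op", 3), ("goto", 1)]
code = resolve_jumps(quads)
print(code[2])    # ('JMP', 4)  forward jump, filled in when quadruple 4 was reached
print(code[-1])   # ('JMP', 0)  backward jump to quadruple 1
```

A jump to a quadruple that never appears would keep its `None` placeholder; a real generator would report that as an error.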

4. Instruction selection:
 The instructions of target machine should be complete and uniform.

 Instruction speeds and machine idioms are important factors when efficiency of target
program is considered.

 The quality of the generated code is determined by its speed and size.

 A three-address statement can be translated into a target instruction sequence using a code
skeleton for that statement type. [The illustrating example is missing from the source.]

5. Register allocation
 Instructions involving register operands are shorter and faster than those involving
operands in memory.

 The use of registers is subdivided into two subproblems:


 Register allocation– the set of variables that will reside in registers at a point in
the program is selected.
 Register assignment– the specific register that a variable will reside in is
picked.

 Certain machines require even-odd register pairs for some operands and results.
For example, consider the division instruction of the form:
D x, y

where, x – the even register of the even/odd register pair that holds the dividend
y – the divisor
After the division, the even register holds the remainder and
the odd register holds the quotient.

6. Evaluation order
 The order in which the computations are performed can affect the efficiency of the
target code. Some computation orders require fewer registers to hold intermediate
results than others.

TARGET MACHINE

 Familiarity with the target machine and its instruction set is a prerequisite for designing a
good code generator.
 The target computer is a byte-addressable machine with 4 bytes to a word.
 It has n general-purpose registers, R0, R1, . . . , Rn-1 .
 It has two-address instructions of the form:
op source, destination
where op is an op-code, and source and destination are data fields.

 It has the following op-codes:


MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
 The source and destination of an instruction are specified by combining registers
and memory locations with address modes.

Address modes with their assembly-language forms

MODE               FORM     ADDRESS                      ADDED COST

absolute           M        M                            1
register           R        R                            0
indexed            c(R)     c + contents(R)              1
indirect register  *R       contents(R)                  0
indirect indexed   *c(R)    contents(c + contents(R))    1
literal            #c       c                            1
 For example : MOV R0, M stores contents of Register R0 into memory location M ;
 MOV 4(R0), M stores the value contents(4 + contents(R0)) into M.

Instruction costs :

 Instruction cost = 1+cost for source and destination address modes. This cost corresponds
to the length of the instruction.
 Address modes involving registers have cost zero.
 Address modes involving memory location or literal have cost one.
 Instruction length should be minimized if space is important. Doing so also minimizes the
time taken to fetch and perform the instruction.

 In order to generate good code for target machine, we must utilize its addressing
capabilities efficiently.
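
The cost rule above (1 plus the added cost of each address mode) can be checked mechanically. The following Python sketch is an illustration only: the names `operand_cost` and `instruction_cost` are hypothetical, and it assumes registers are always written R0, R1, ... so that any other bare name is treated as an absolute memory address.

```python
import re

# Added cost of each address mode, following the table of address modes above.
# Order matters: register forms are tested before the absolute-address form.
_MODE_COSTS = [
    (re.compile(r"^\*?R\d+$"), 0),                                 # R and *R
    (re.compile(r"^\*?(?:-?\d+|[A-Za-z_]\w*)\(R\d+\)$"), 1),       # c(R) and *c(R)
    (re.compile(r"^#-?\d+$"), 1),                                  # literal #c
    (re.compile(r"^[A-Za-z_]\w*$"), 1),                            # absolute M
]


def operand_cost(operand):
    """Return the added cost of one source or destination operand."""
    for pattern, cost in _MODE_COSTS:
        if pattern.match(operand):
            return cost
    raise ValueError(f"unrecognised operand: {operand!r}")


def instruction_cost(instruction):
    """Cost = 1 + added cost of the source and destination address modes."""
    _opcode, _, rest = instruction.strip().partition(" ")
    operands = [part.strip() for part in rest.split(",")]
    return 1 + sum(operand_cost(op) for op in operands)


print(instruction_cost("MOV R0, R1"))           # 1
print(instruction_cost("MOV R5, M"))            # 2
print(instruction_cost("ADD #1, R3"))           # 2
print(instruction_cost("SUB 4(R0), *12(R1)"))   # 3

# One possible translation of a := b + c and its total cost.
sequence = ["MOV b, R0", "ADD c, R0", "MOV R0, a"]
print(sum(instruction_cost(ins) for ins in sequence))   # 6
```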

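Problems
1. Generate code for the following three-address statements assuming all variables are stored in memory
locations. (The answers use load/store notation with three-operand instructions: LD reg, src;
ST dst, reg; OP dst, src1, src2. This differs from the two-address MOV machine described above.)
1. x = 1
2. x = a
3. x = a + 1
4. x = a + b
5. The two statements
o x = b * c
o y = a + x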
answer
1. LD R1, #1
ST x, R1

2. LD R1, a
ST x, R1

3. LD R1, a
ADD R1, R1, #1
ST x, R1

4. LD R1, a
LD R2, b
ADD R1, R1, R2
ST x, R1

5. LD R1, b
LD R2, c
MUL R1, R1, R2
ST x, R1
LD R3, a
ADD R3, R3, R1
ST y, R3

2. Generate code for the following three-address statements assuming a and b are arrays whose elements are 4-byte
values.
1. The four-statement sequence
i. x = a[i]
ii. y = b[j]
iii. a[i] = y
iv. b[j] = x
2. The three-statement sequence
i. x = a[i]
ii. y = b[i]
iii. z=x*y
3. The three-statement sequence
i. x = a[i]
ii. y = b[x]
iii. a[i] = y
answer (x and y are kept in registers only and are not stored back to memory)
1. LD R1, i
MUL R1, R1, #4
LD R2, a(R1)
LD R3, j
MUL R3, R3, #4
LD R4, b(R3)
ST a(R1), R4
ST b(R3), R2

2. LD R1, i
MUL R1, R1, #4
LD R2, a(R1)
LD R1, b(R1)
MUL R1, R2, R1
ST z, R1

3. LD R1, i
MUL R1, R1, #4
LD R2, a(R1)
MUL R2, R2, #4
LD R2, b(R2)
ST a(R1), R2

3. Generate code for the following three-address sequence assuming that p and q are in memory locations:
y = *q
q=q+4
*p = y
p=p+4
answer (y is kept in register R2 only)
LD R1, q
LD R2, 0(R1)
ADD R1, R1, #4
ST q, R1
LD R1, p
ST 0(R1), R2
ADD R1, R1, #4
ST p, R1

4. Generate code for the following sequence assuming that x, y, and z are in memory locations:
if x < y goto L1
z=0
goto L2
L1: z = 1
answer
LD R1, x
LD R2, y
SUB R1, R1, R2
BLTZ R1, L1
LD R1, #0
ST z, R1
BR L2
L1: LD R1, #1
ST z, R1

5. Generate code for the following sequence assuming that n is in a memory location:
s=0
i=0
L1: if i > n goto L2
s=s+i
i=i+1
goto L1
L2:
answer
Long version:

LD R1, #0
ST s, R1
ST i, R1
L1: LD R1, i
LD R2, n
SUB R2, R1, R2
BGTZ R2, L2
LD R2, s
ADD R2, R2, R1
ST s, R2
ADD R1, R1, #1
ST i, R1
BR L1
L2:

Short version (s and i are kept in registers R2 and R1 and are never stored to memory):

LD R2, #0
LD R1, R2
LD R3, n
L1: SUB R4, R1, R3
BGTZ R4, L2
ADD R2, R2, R1
ADD R1, R1, #1
BR L1
L2:

6. Determine the costs of the following instruction sequences:


1. LD R0, y
LD R1, z
ADD R0, R0, R1
ST x, R0

2. LD R0, i
MUL R0, R0, 8
LD R1, a(R0)
ST b, R1

3. LD R0, c
LD R1, i
MUL R1, R1, 8
ST a(R1),R0

4. LD R0, p
LD R1, 0(R0)
ST x, R1

5. LD R0, p
LD R1, x
ST 0(R0), R1

6. LD R0, x
LD R1, y
SUB R0, R0, R1
BLTZ *R3, R0
answer
1. 2 + 2 + 1 + 2 = 7
2. 2 + 2 + 2 + 2 = 8
3. 2 + 2 + 2 + 2 = 8
4. 2 + 2 + 2 = 6
5. 2 + 2 + 2 = 6
6. 2 + 2 + 1 + 1 = 6

BASIC BLOCKS AND FLOW GRAPHS

Basic Blocks

 A basic block is a sequence of consecutive statements in which flow of control enters at
the beginning and leaves at the end without any halt or possibility of branching except at
the end.
 The following sequence of three-address statements forms a basic block:
t1 := a * a
t2 := a * b
t3 := 2 * t2
t4 := t1 + t3
t5 := b * b
t6 := t4 + t5

Basic Block Construction:


Algorithm: Partition into basic blocks

Input: A sequence of three-address statements

Output: A list of basic blocks with each three-address statement in exactly one block

Method:

1. We first determine the set of leaders, the first statements of basic blocks. The
rules we use are the following:
a. The first statement is a leader.
b. Any statement that is the target of a conditional or unconditional goto is a
leader.
c. Any statement that immediately follows a goto or conditional goto statement
is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
 Consider the following source code for dot product of two vectors a and b of length 20

begin

prod :=0;

i:=1;

do begin

prod :=prod+ a[i] * b[i];

i :=i+1;

end

while i <= 20

end

 The three-address code for the above source program is given as :


(1) prod := 0

(2) i := 1

(3) t1 := 4* i

(4) t2 := a[t1] /*compute a[i] */

(5) t3 := 4* i

(6) t4 := b[t3] /*compute b[i] */

(7) t5 := t2*t4

(8) t6 := prod+t5

(9) prod := t6

(10) t7 := i+1

(11) i := t7

(12) if i<=20 goto (3)

Basic block 1: Statement (1) to (2)

Basic block 2: Statement (3) to (12)
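
The leader rules and the partition step can be tried out on the twelve statements above. The Python sketch below is a minimal illustration; the function names `find_leaders` and `basic_blocks` are hypothetical, and it assumes a non-empty program in which every jump target is written as `goto (n)` with n a statement number.

```python
import re


def find_leaders(statements):
    """Return the sorted 1-based numbers of the leader statements."""
    leaders = {1}                                  # rule (a): first statement
    for number, text in enumerate(statements, start=1):
        match = re.search(r"goto \((\d+)\)", text)
        if match:
            leaders.add(int(match.group(1)))       # rule (b): target of a goto
            if number < len(statements):
                leaders.add(number + 1)            # rule (c): follows a goto
    return sorted(leaders)


def basic_blocks(statements):
    """Return one (first, last) statement range per basic block."""
    leaders = find_leaders(statements)
    # Each block runs up to, but not including, the next leader.
    ends = [nxt - 1 for nxt in leaders[1:]] + [len(statements)]
    return list(zip(leaders, ends))


dot_product = [
    "prod := 0", "i := 1", "t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i",
    "t4 := b[t3]", "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
    "t7 := i + 1", "i := t7", "if i <= 20 goto (3)",
]
print(find_leaders(dot_product))   # [1, 3]
print(basic_blocks(dot_product))   # [(1, 2), (3, 12)]
```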


Flow Graphs

 Flow graph is a directed graph containing the flow-of-control information for the set of
basic blocks making up a program.
 The nodes of the flow graph are basic blocks. It has a distinguished initial node.
 E.g.: Flow graph for the vector dot product is given as follows:

B1:  prod := 0
     i := 1
       |
       v
B2:  t1 := 4 * i            <---+
     t2 := a[t1]                |
     t3 := 4 * i                |
     t4 := b[t3]                |
     t5 := t2 * t4              |
     t6 := prod + t5            |
     prod := t6                 |
     t7 := i + 1                |
     i := t7                    |
     if i <= 20 goto B2     ----+

 B1 is the initial node. B2 immediately follows B1, so there is an edge from B1 to B2. The target
of the jump in the last statement of B2 is the first statement of B2, so there is also an edge from
B2 (last statement) back to B2 (first statement).
 B1 is the predecessor of B2, and B2 is a successor of B1.
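
The edges described above can be derived from the block boundaries. The following Python sketch is an assumption-laden illustration: `flow_edges` is a hypothetical helper, blocks are given as (first, last) statement ranges in program order, jumps are written `goto (n)`, and every jump target is assumed to be the leader of some block.

```python
import re


def flow_edges(statements, blocks):
    """Return the sorted list of edges (Bi, Bj) between 1-based block numbers."""
    block_of_leader = {first: index
                       for index, (first, _last) in enumerate(blocks, start=1)}
    edges = set()
    for index, (_first, last) in enumerate(blocks, start=1):
        text = statements[last - 1]          # last statement of the block
        match = re.search(r"goto \((\d+)\)", text)
        if match:
            # Edge to the block whose leader is the jump target.
            edges.add((index, block_of_leader[int(match.group(1))]))
        unconditional = text.startswith("goto")
        if not unconditional and index < len(blocks):
            # Control can fall through to the block that follows in program order.
            edges.add((index, index + 1))
    return sorted(edges)


dot_product = [
    "prod := 0", "i := 1", "t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i",
    "t4 := b[t3]", "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
    "t7 := i + 1", "i := t7", "if i <= 20 goto (3)",
]
edges = flow_edges(dot_product, [(1, 2), (3, 12)])
print(edges)   # [(1, 2), (2, 2)]
```

The edge (2, 2) is the back edge that makes B2 a loop.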

Loops

 A loop is a collection of nodes in a flow graph such that


1. All nodes in the collection are strongly connected.
2. The collection of nodes has a unique entry.
 A loop that contains no other loops is called an inner loop.

NEXT-USE INFORMATION

 If the name in a register is no longer needed, then we remove the name from the register
and the register can be used to store some other names.
Input: Basic block B of three-address statements

Output: At each statement i: x = y op z, we attach to i the liveness and next-uses of x,
y and z.

Method: We start at the last statement of B and scan backwards. For each statement
i: x = y op z, perform the following steps in order:

1. Attach to statement i the information currently found in the symbol table
regarding the next-use and liveness of x, y and z.
2. In the symbol table, set x to “not live” and “no next use”.
3. In the symbol table, set y and z to “live”, and next-uses of y and z to i.

Steps 2 and 3 must not be interchanged, because x may be the same name as y or z.

Symbol Table:

Names   Liveness   Next-use

x       not live   no next-use
y       live       i
z       live       i
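
The backward scan can be sketched in Python as follows. This is a minimal illustration: the function name `next_use_info`, the (x, y, z) triple encoding, and the `live_on_exit` parameter (the names assumed live when the block ends) are assumptions made for the example.

```python
def next_use_info(block, live_on_exit):
    """Attach liveness and next-use information to each statement of a block.

    block: list of (x, y, z) triples, one per statement x = y op z.
    live_on_exit: names assumed to be live after the block.
    Returns a list with one dict per statement mapping each of x, y, z to
    (is_live, next_use), where next_use is a 1-based statement number or None.
    """
    # Symbol table state at the end of the block.
    table = {name: (True, None) for name in live_on_exit}
    info = [None] * len(block)
    for i in range(len(block), 0, -1):          # scan backwards: n, n-1, ..., 1
        x, y, z = block[i - 1]
        # Step 1: attach what the symbol table currently says about x, y and z.
        info[i - 1] = {name: table.get(name, (False, None)) for name in (x, y, z)}
        # Step 2: x is redefined here, so its old value is dead above this point.
        table[x] = (False, None)
        # Step 3: y and z are used here (done after step 2 in case x is y or z).
        table[y] = (True, i)
        table[z] = (True, i)
    return info


block = [("t", "a", "b"), ("u", "a", "c"), ("v", "t", "u"), ("d", "v", "u")]
info = next_use_info(block, live_on_exit={"a", "b", "c", "d"})
print(info[0]["t"])   # (True, 3)     t is next used by statement 3
print(info[3]["u"])   # (False, None) u is dead after statement 4
```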

A SIMPLE CODE GENERATOR

 A code generator generates target code for a sequence of three-address statements and
effectively uses registers to store operands of the statements.

 For example: consider the three-address statement a := b + c

It can have the following sequence of codes:

ADD Rj, Ri       Cost = 1   // if Ri contains b and Rj contains c

(or)

ADD c, Ri        Cost = 2   // if c is in a memory location

(or)

MOV c, Rj        Cost = 3   // move c from memory to Rj and add
ADD Rj, Ri

Register and Address Descriptors:

 A register descriptor keeps track of what is currently in each register. Initially, the
register descriptors show that all the registers are empty.
 An address descriptor stores the location where the current value of a name can be
found at run time.
A code-generation algorithm:

The algorithm takes as input a sequence of three-address statements constituting a basic block.
For each three-address statement of the form x := y op z, perform the following actions:

1. Invoke a function getreg to determine the location L where the result of the computation y op
z should be stored.

2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the
register for y’ if the value of y is currently both in memory and a register. If the value of y is
not already in L, generate the instruction MOV y’, L to place a copy of y in L.

3. Generate the instruction OP z’, L where z’ is a current location of z. Prefer a register to a
memory location if z is in both. Update the address descriptor of x to indicate that x is in
location L. If L is a register, update its descriptor to indicate that it contains x, and remove x
from all other register descriptors.

4. If the current values of y or z have no next uses, are not live on exit from the block, and are in
registers, alter the register descriptor to indicate that, after execution of x := y op z, those
registers will no longer contain y or z.

Generating Code for Assignment Statements:

 The assignment d := (a-b) + (a-c) + (a-c) might be translated into the following three-address
code sequence:
t := a - b
u := a - c
v := t + u
d := v + u
with d live at the end.

Code sequence for the example is:

Statements    Code Generated    Register descriptor    Address descriptor

                                registers empty

t := a - b    MOV a, R0         R0 contains t          t in R0
              SUB b, R0

u := a - c    MOV a, R1         R0 contains t          t in R0
              SUB c, R1         R1 contains u          u in R1

v := t + u    ADD R1, R0        R0 contains v          u in R1
                                R1 contains u          v in R0

d := v + u    ADD R1, R0        R0 contains d          d in R0
              MOV R0, d                                d in R0 and memory
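
The following Python sketch shows how register and address descriptors can drive code generation for the example block. It is a simplified illustration, not the full algorithm: the function name `generate` is hypothetical, each register holds at most one name, getreg only reuses the register of a dead operand y or takes an empty register, and spilling is not modelled (the sketch raises an error instead).

```python
def generate(block, live_on_exit, num_registers=2):
    """block: list of (x, y, op, z) tuples for statements x := y op z."""
    opcodes = {"+": "ADD", "-": "SUB", "*": "MUL"}

    # Backward scan: needed_after[i] = names still needed after statement i.
    needed = set(live_on_exit)
    needed_after = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, _op, z = block[i]
        needed_after[i] = set(needed)
        needed.discard(x)
        needed.update((y, z))

    registers = {f"R{k}": None for k in range(num_registers)}  # register descriptor
    location = {}   # address descriptor: name -> register holding its current value
    code = []

    def where(name):
        # Prefer a register copy; otherwise use the name's own memory location.
        return location.get(name, name)

    for i, (x, y, op, z) in enumerate(block):
        # Simplified getreg: reuse y's register if y is dead after this statement.
        if y in location and y not in needed_after[i]:
            L = location[y]
        else:
            empty = [r for r, held in registers.items() if held is None]
            if not empty:
                raise RuntimeError("no free register; spilling is not modelled")
            L = empty[0]
            code.append(f"MOV {where(y)}, {L}")
        code.append(f"{opcodes[op]} {where(z)}, {L}")

        # L now holds x: drop whatever L held before and any stale copy of x.
        for name in [n for n, r in location.items() if r == L or n == x]:
            registers[location.pop(name)] = None
        registers[L] = x
        location[x] = L

        # Free the registers of operands that are dead after this statement.
        for name in (y, z):
            if name in location and name != x and name not in needed_after[i]:
                registers[location.pop(name)] = None

    # At the end of the block, store live names that exist only in registers.
    for name, reg in location.items():
        if name in live_on_exit:
            code.append(f"MOV {reg}, {name}")
    return code


block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
code = generate(block, live_on_exit={"d"})
print("\n".join(code))
```

For this block the sketch emits the same seven instructions as the table: MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; ADD R1, R0; MOV R0, d.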
Generating Code for Indexed Assignments

The table shows the code sequences generated for the indexed assignment statements
a := b[i] and a[i] := b, assuming the index i is in register Ri:

Statements    Code Generated    Cost

a := b[i]     MOV b(Ri), R      2
a[i] := b     MOV b, a(Ri)      3

Generating Code for Pointer Assignments

The table shows the code sequences generated for the pointer assignments
a := *p and *p := a, assuming the pointer p is in register Rp:

Statements    Code Generated    Cost

a := *p       MOV *Rp, a        2
*p := a       MOV a, *Rp        2

Generating Code for Conditional Statements

Statement          Code

if x < y goto z    CMP x, y
                   CJ< z       /* jump to z if condition code is negative */

x := y + z         MOV y, R0
if x < 0 goto z    ADD z, R0
                   MOV R0, x
                   CJ< z
RUN-TIME ENVIRONMENTS
Parameter Passing
The communication medium among procedures is known as parameter passing. The values of the
variables from a calling procedure are transferred to the called procedure by some mechanism. Before
moving ahead, first go through some basic terminologies pertaining to the values in a program.
r-value
The value of an expression is called its r-value. The value contained in a single variable also becomes
an r-value if it appears on the right-hand side of the assignment operator. r-values can always be
assigned to some other variable.
l-value
The location in memory (address) where the value of an expression is stored is known as the l-value of that
expression. Only an expression that has an l-value may appear on the left-hand side of an assignment operator.

For example:

day = 1;
week = day * 7;
month = 1;
year = month * 12;

From this example, we understand that constant values like 1, 7, 12, and variables like day, week, month
and year, all have r-values. Only variables have l-values as they also represent the memory location
assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.

Formal Parameters
Variables that take the information passed by the caller procedure are called formal parameters. These
variables are declared in the definition of the called function.

Actual Parameters
Variables whose values or addresses are being passed to the called procedure are called actual
parameters. These variables are specified in the function call as arguments.

Example:
fun_one()
{
   int actual_parameter = 10;
   call fun_two(actual_parameter);
}
fun_two(int formal_parameter)
{
   print formal_parameter;
}
Formal parameters hold the information of the actual parameter, depending upon the parameter passing
technique used. It may be a value or an address.

Pass by Value
In pass by value mechanism, the calling procedure passes the r-value of actual parameters and the
compiler puts that into the called procedure’s activation record. Formal parameters then hold the values
passed by the calling procedure. If the values held by the formal parameters are changed, it should have
no impact on the actual parameters.

Pass by Reference
In pass by reference mechanism, the l-value of the actual parameter is copied to the activation record
of the called procedure. This way, the called procedure now has the address (memory location) of the
actual parameter and the formal parameter refers to the same memory location. Therefore, if the value
pointed by the formal parameter is changed, the impact should be seen on the actual parameter as they
should also point to the same value.

Pass by Copy-restore
This parameter passing mechanism works similar to ‘pass-by-reference’ except that the changes to
actual parameters are made only when the called procedure ends. Upon function call, the values of actual
parameters are copied into the activation record of the called procedure. Formal parameters, if manipulated,
have no immediate effect on actual parameters (as the callee works on its own copies), but when the called
procedure ends, the final values of the formal parameters are copied back to the l-values of the actual parameters.
Example:
int y;
calling_procedure()
{
   y = 10;
   copy_restore(y);   // value of y is copied in; its l-value is kept for the copy back
   printf y;          // prints 99
}
copy_restore(int x)
{
   x = 99;   // y still has value 10 (unaffected)
   y = 0;    // y is now 0
}
When this function ends, the value of formal parameter x is copied back to the actual parameter y. Even if
the value of y is changed before the procedure ends, the final value of x overwrites it, so y ends up as 99.
Under true call by reference, x and y would name the same location and y would end up as 0.
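
The difference can be simulated in Python. This is only a model of the two mechanisms, not how a compiler implements them: the dictionary `memory` stands in for the caller's storage, and the function names `call_copy_restore`, `body` and `call_by_reference` are hypothetical.

```python
def call_copy_restore(memory, procedure, actual_name):
    """Call `procedure` with the variable `actual_name` passed by copy-restore."""
    frame = {"x": memory[actual_name]}   # copy in: value of the actual parameter
    procedure(memory, frame)             # the body works on its own copy x
    memory[actual_name] = frame["x"]     # copy out: restore into the actual


def body(memory, frame):
    frame["x"] = 99    # x = 99; y still has value 10
    memory["y"] = 0    # y = 0;  y is now 0


def call_by_reference(memory, actual_name):
    """Same body, but x is an alias for the actual parameter."""
    memory[actual_name] = 99   # x = 99 writes straight through to y
    memory["y"] = 0            # y = 0


memory = {"y": 10}
call_copy_restore(memory, body, "y")
print(memory["y"])       # 99

by_ref = {"y": 10}
call_by_reference(by_ref, "y")
print(by_ref["y"])       # 0
```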

Pass by Name
Languages like Algol provide a parameter passing mechanism that works like the preprocessor
in the C language. In the pass by name mechanism, the call is treated as if it were replaced by the body of
the called procedure. Pass-by-name textually substitutes the argument expressions in a procedure call for the
corresponding parameters in the body of the procedure so that it can now work on actual parameters,
much like pass-by-reference. An argument expression is re-evaluated each time the parameter is used.
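
Pass by name is commonly modelled with thunks: parameterless functions that evaluate the argument expression in the caller's environment every time the parameter is used. The Python sketch below illustrates the idea; the names `sum_by_name`, `env` and `set_i` are hypothetical, and Python itself does not pass parameters this way.

```python
def sum_by_name(term, set_index, low, high):
    """Sum `term` for index = low..high, where `term` is passed by name."""
    total = 0
    for k in range(low, high + 1):
        set_index(k)       # assign to the caller's index variable
        total += term()    # the argument expression is re-evaluated here
    return total


env = {"i": 0}   # the caller's variable i


def set_i(value):
    env["i"] = value


# The argument expression i * i is wrapped in a thunk instead of being
# evaluated once at the call site.
result = sum_by_name(lambda: env["i"] * env["i"], set_i, 1, 3)
print(result)   # 14, that is 1*1 + 2*2 + 3*3
```

Had the argument been passed by value, `i * i` would have been evaluated once (to 0) before the call, and the sum would be 0.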

Symbol Table
Symbol table is an important data structure created and maintained by compilers in order to store
information about the occurrence of various entities such as variable names, function names, objects,
classes, interfaces, etc. Symbol table is used by both the analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes depending upon the language in hand:

 To store the names of all entities in a structured form at one place.


 To verify if a variable has been declared.
 To implement type checking, by verifying assignments and expressions in the source code are
semantically correct.
 To determine the scope of a name (scope resolution).

A symbol table is simply a table which can be either linear or a hash table. It maintains an entry for
each name in the following format:

<symbol name, type, attribute>


For example, if a symbol table has to store information about the following variable declaration:

static int interest;

then it should store the entry such as:

<interest, int, static>

The attribute clause contains the entries related to the name.

Implementation
If a compiler is to handle a small amount of data, then the symbol table can be implemented as an
unordered list, which is easy to code but suitable only for small tables. A symbol table can be
implemented in one of the following ways:

 Linear (sorted or unsorted) list


 Binary Search Tree
 Hash table

Among all, symbol tables are mostly implemented as hash tables, where the source code symbol itself
is treated as a key for the hash function and the return value is the information about the symbol.

Operations

A symbol table, either linear or hash, should provide the following operations.

insert()
This operation is more frequently used by analysis phase, i.e., the first half of the compiler where tokens
are identified and names are stored in the table. This operation is used to add information in the symbol
table about unique names occurring in the source code. The format or structure in which the names are
stored depends upon the compiler in hand.
An attribute for a symbol in the source code is the information associated with that symbol. This
information contains the value, state, scope, and type about the symbol. The insert() function takes the
symbol and its attributes as arguments and stores the information in the symbol table.

For example:

int a;

should be processed by the compiler as:

insert(a, int);

lookup()

lookup() operation is used to search a name in the symbol table to determine:


 if the symbol exists in the table.
 if it is declared before it is being used.
 if the name is used in the scope.
 if the symbol is initialized.
 if the symbol is declared multiple times.
The format of lookup() function varies according to the programming language. The basic format
should match the following:

lookup(symbol)
This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the
symbol table, it returns its attributes stored in the table.
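
A hash-table symbol table with these two operations can be sketched in Python, where the built-in dict plays the role of the hash table. This is a minimal illustration: the class name `SymbolTable` and the tuple layout of an entry are assumptions, and `lookup` returns None rather than 0 for a missing symbol, which is the more idiomatic choice in Python.

```python
class SymbolTable:
    """Maps a symbol name to its (type, attributes) entry."""

    def __init__(self):
        self._entries = {}   # dict is a hash table keyed by the symbol name

    def insert(self, name, type_, *attributes):
        """Add a new symbol; reject a second declaration of the same name."""
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        self._entries[name] = (type_, attributes)

    def lookup(self, name):
        """Return the stored entry, or None if the symbol is not in the table."""
        return self._entries.get(name)


table = SymbolTable()
table.insert("a", "int")                    # int a;
table.insert("interest", "int", "static")   # static int interest;

print(table.lookup("interest"))   # ('int', ('static',))
print(table.lookup("rate"))       # None
```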

Scope Management

A compiler maintains two types of symbol tables: a global symbol table which can be accessed by all
the procedures and scope symbol tables that are created for each scope in the program.
To determine the scope of a name, symbol tables are arranged in hierarchical structure as shown in the
example below:
...
int value=10;

void pro_one()
{
int one_1;
int one_2;

{ \
int one_3; |_ inner scope 1
int one_4; |
} /

int one_5;

{ \
int one_6; |_ inner scope 2
int one_7; |
} /
}

void pro_two()
{
int two_1;
int two_2;

{ \
int two_3; |_ inner scope 3
int two_4; |
} /

int two_5;
}
...
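The above program can be represented in a hierarchical structure of symbol tables:

global: value, pro_one, pro_two
   pro_one: one_1, one_2, one_5
      inner scope 1: one_3, one_4
      inner scope 2: one_6, one_7
   pro_two: two_1, two_2, two_5
      inner scope 3: two_3, two_4
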
The global symbol table contains names for one global variable (int value) and two procedure names,
which should be available to all the child nodes shown above. The names mentioned in the pro_one
symbol table (and all its child tables) are not available for pro_two symbols and its child tables.
This symbol table data structure hierarchy is stored in the semantic analyzer and whenever a name needs
to be searched in a symbol table, it is searched using the following algorithm:
 first a symbol will be searched in the current scope, i.e. current symbol table.
 if a name is found, then search is completed, else it will be searched in the parent symbol table
until,
 either the name is found or global symbol table has been searched for the name.
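
The search through the hierarchy can be sketched in Python by giving every scope table a link to its parent. This is a minimal illustration; the class name `Scope` and its methods are hypothetical, and only a few of the names from the example program are inserted.

```python
class Scope:
    """One symbol table in the hierarchy, linked to its parent scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent     # None for the global symbol table
        self.symbols = {}

    def insert(self, symbol, type_):
        self.symbols[symbol] = type_

    def lookup(self, symbol):
        """Search the current scope first, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope.name, scope.symbols[symbol]
            scope = scope.parent
        return None              # not found even in the global table


global_scope = Scope("global")
global_scope.insert("value", "int")

pro_one = Scope("pro_one", parent=global_scope)
pro_one.insert("one_1", "int")
inner_1 = Scope("inner scope 1", parent=pro_one)
inner_1.insert("one_3", "int")

pro_two = Scope("pro_two", parent=global_scope)
inner_3 = Scope("inner scope 3", parent=pro_two)
inner_3.insert("two_3", "int")

print(inner_1.lookup("one_3"))   # ('inner scope 1', 'int')
print(inner_1.lookup("one_1"))   # ('pro_one', 'int')
print(inner_1.lookup("value"))   # ('global', 'int')
print(inner_3.lookup("one_3"))   # None, pro_one's names are not visible in pro_two
```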
