Chapter-6 (Compiler Design and Construction)
The front end translates the source program into an intermediate representation, from
which the back end generates target code. Intermediate code is machine independent,
yet it is close in form to machine instructions.
Intermediate Representations
There are three kinds of intermediate representations:
1. Graphical representations (e.g. syntax tree or DAG)
2. Postfix notation: operations on values stored on an operand stack (similar to JVM bytecode)
3. Three-address code (e.g. triples and quadruples): a sequence of statements of the form x = y op z
Syntax tree:
A syntax tree is a graphical representation of the given source program; it is a condensed
variant of the parse tree. A tree in which each leaf represents an operand and each interior
node represents an operator is called a syntax tree.
Example: Syntax tree for the expression a*(b + c)/d
        /
       / \
      *   d
     / \
    a   +
       / \
      b   c
[Figure: a second tree diagram with nodes +, *, *, d, a, -, b, c; edge layout not recoverable]
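The syntax tree for a*(b + c)/d can also be built and walked in code. The following is a minimal Python sketch, not a definitive implementation: the class name Node, the function gen_tac and the temporary names t1, t2, ... are assumptions made for illustration. It visits the tree bottom-up and emits one statement of the form x = y op z for each interior node.

```python
from dataclasses import dataclass
from typing import Optional
import itertools


@dataclass
class Node:
    """Syntax-tree node: a leaf holds an operand, an interior node an operator."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gen_tac(root):
    """Return three-address statements for the expression tree rooted at root."""
    code = []
    counter = itertools.count(1)  # numbers the temporaries t1, t2, ...

    def visit(node):
        # A leaf is an operand: its own name is the "address" of its value.
        if node.left is None and node.right is None:
            return node.label
        lhs = visit(node.left)
        rhs = visit(node.right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} = {lhs} {node.label} {rhs}")
        return temp

    visit(root)
    return code


# Syntax tree for a * (b + c) / d
tree = Node("/",
            Node("*", Node("a"), Node("+", Node("b"), Node("c"))),
            Node("d"))
tac = gen_tac(tree)
for line in tac:
    print(line)
```

Because operands are visited before their operator, the innermost subexpression b + c is the first one to receive a temporary.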
Postfix notation
The representation of an expression in which each operator follows its operands is called
the postfix notation of that expression. In general, if x and y are any two postfix
expressions and OP is a binary operator, then the result of applying OP to x and y is
written in postfix notation as "x y OP".
Examples:
1. (a + b) * c in postfix notation is: a b + c *
2. a * (b + c) in postfix notation is: a b c + *
Postfix notation is a useful form of intermediate code when the source language consists
mainly of expressions.
Postfix notation is also called 'suffix notation' or 'reverse Polish notation'.
Postfix notation is a linear representation of a syntax tree.
In postfix notation, any expression can be written unambiguously without
parentheses.
The ordinary (infix) way of writing the sum of x and y places the operator in the middle:
x + y. In postfix notation, we place the operator at the right end: x y +.
In postfix notation, the operator follows its operands.
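Two of the points above lend themselves to a short illustration: postfix is a linearised syntax tree, and it needs no parentheses because it can be evaluated with an operand stack. The Python sketch below is only an illustration. The tuple encoding of trees, the function names to_postfix and eval_postfix, and the sample values for a, b and c are all assumptions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def to_postfix(tree):
    """Post-order walk: left operand, right operand, then the operator."""
    if isinstance(tree, str):          # a leaf is just an operand name
        return [tree]
    op, left, right = tree
    return to_postfix(left) + to_postfix(right) + [op]


def eval_postfix(tokens, env):
    """Evaluate postfix tokens with an operand stack; env maps names to values."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()        # the right operand was pushed last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])
    return stack.pop()


expr1 = ("*", ("+", "a", "b"), "c")    # (a + b) * c
expr2 = ("*", "a", ("+", "b", "c"))    # a * (b + c)
post1 = " ".join(to_postfix(expr1))
post2 = " ".join(to_postfix(expr2))
print(post1)                           # a b + c *
print(post2)                           # a b c + *

env = {"a": 2, "b": 3, "c": 4}         # sample values, chosen arbitrarily
print(eval_postfix(to_postfix(expr1), env))   # 20
print(eval_postfix(to_postfix(expr2), env))   # 14
```

The two expressions differ only in their parentheses in infix form; in postfix form the difference shows up purely as token order.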
t1 = B + A
t2 = Y - t1
t3 = t1 * t2
Solution:
t1 = 2
t2 = t1 * n
t3 = t2 + k
i = t3
L1: if i = 0 goto L2
t4 = i - k
i = t4
goto L1
L2: ………………..
Production                    Semantic rules

B → B1 || B2                  B1.true = B.true
                              B1.false = newlabel()
                              B2.true = B.true
                              B2.false = B.false
                              B.code = B1.code || label(B1.false) || B2.code

B → B1 && B2                  B1.true = newlabel()
                              B1.false = B.false
                              B2.true = B.true
                              B2.false = B.false
                              B.code = B1.code || label(B1.true) || B2.code

B → ! B1                      B1.true = B.false
                              B1.false = B.true
                              B.code = B1.code

S → if ( B ) S1               B.true = newlabel()
                              B.false = S.next
                              S1.next = S.next
                              S.code = B.code || label(B.true) || S1.code

S → if ( B ) S1 else S2       B.true = newlabel()
                              B.false = newlabel()
                              S1.next = S.next
                              S2.next = S.next
                              S.code = B.code || label(B.true) || S1.code
                                       || gen(goto S.next) || label(B.false) || S2.code

S → while ( B ) S1            begin = newlabel()
                              B.true = newlabel()
                              B.false = S.next
                              S1.next = begin
                              S.code = label(begin) || B.code || label(B.true) || S1.code || gen(goto begin)
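The semantic rules for Boolean expressions and control-flow statements can be read almost directly as a recursive code generator. The inherited attributes true, false and next become function parameters, and the || in the code attributes (concatenation, not logical OR) becomes list concatenation. The Python sketch below makes several assumptions: the tuple encoding of the input, the names gen_bool and gen_stmt, the label Lnext, and the base case for a relational test, which is not covered by the rules in this section.

```python
import itertools

_labels = itertools.count(1)


def newlabel():
    """Return a fresh label name: L1, L2, ..."""
    return f"L{next(_labels)}"


def gen_bool(b, true, false):
    """Code for Boolean b that jumps to `true` or `false` (inherited attributes)."""
    kind = b[0]
    if kind == "or":                       # B -> B1 || B2
        b1_false = newlabel()
        return (gen_bool(b[1], true, b1_false) + [f"{b1_false}:"]
                + gen_bool(b[2], true, false))
    if kind == "and":                      # B -> B1 && B2
        b1_true = newlabel()
        return (gen_bool(b[1], b1_true, false) + [f"{b1_true}:"]
                + gen_bool(b[2], true, false))
    if kind == "not":                      # B -> ! B1: swap the two targets
        return gen_bool(b[1], false, true)
    # Assumed base case ("rel", text): a relational test with two jumps.
    return [f"if {b[1]} goto {true}", f"goto {false}"]


def gen_stmt(s, next_label):
    """Code for statement s; next_label plays the role of S.next."""
    kind = s[0]
    if kind == "if":
        b_true = newlabel()
        return (gen_bool(s[1], b_true, next_label) + [f"{b_true}:"]
                + gen_stmt(s[2], next_label))
    if kind == "ifelse":
        b_true, b_false = newlabel(), newlabel()
        return (gen_bool(s[1], b_true, b_false) + [f"{b_true}:"]
                + gen_stmt(s[2], next_label)
                + [f"goto {next_label}", f"{b_false}:"]
                + gen_stmt(s[3], next_label))
    if kind == "while":
        begin, b_true = newlabel(), newlabel()
        return ([f"{begin}:"] + gen_bool(s[1], b_true, next_label)
                + [f"{b_true}:"] + gen_stmt(s[2], begin) + [f"goto {begin}"])
    return [s[1]]                          # assumed ("assign", text) leaf


# while (i < n && !(x == 0)) i = i + 1
loop = ("while",
        ("and", ("rel", "i < n"), ("not", ("rel", "x == 0"))),
        ("assign", "i = i + 1"))
code = gen_stmt(loop, "Lnext")
print("\n".join(code))
```

Note that the negation emits no code of its own: it only exchanges the true and false targets passed down to its operand, exactly as in the rule for B → ! B1.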
Fig: If--then
Example 1: Convert the following switch statement into three-address code:
switch (i + j)
{
case 1: x = y + z
case 2: u = v + w
case 3: p = q * w
default: s = u / v
}
Solution (each case is taken to leave the switch after its statement, with no fall-through):
t = i + j
L1: if t == 1 goto L2
    goto L3
L2: x = y + z
    goto L8
L3: if t == 2 goto L4
    goto L5
L4: u = v + w
    goto L8
L5: if t == 3 goto L6
    goto L7
L6: p = q * w
    goto L8
L7: s = u / v
    goto L8
L8: …………………..
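The solution follows a regular pattern: the test for the k-th case sits at label L(2k-1), its body at L(2k), the default after the last test, and a common exit label at the end. A small Python sketch can generate the same chain of tests for any number of cases. The function name gen_switch and its argument layout are assumptions, and, as in the solution, every case is assumed to jump to the exit rather than fall through.

```python
def gen_switch(selector, cases, default):
    """Return three-address code for a switch without fall-through.

    cases is a list of (constant, statement) pairs; default is a statement.
    """
    k = len(cases)
    end = f"L{2 * k + 2}"                  # common exit label
    code = [f"t = {selector}"]             # evaluate the selector once
    for i, (value, stmt) in enumerate(cases, start=1):
        test, body, nxt = f"L{2 * i - 1}", f"L{2 * i}", f"L{2 * i + 1}"
        code += [
            f"{test}: if t == {value} goto {body}",
            f"goto {nxt}",                 # test failed: try the next case
            f"{body}: {stmt}",
            f"goto {end}",                 # leave the switch
        ]
    code += [f"L{2 * k + 1}: {default}", f"goto {end}", f"{end}:"]
    return code


switch_code = gen_switch(
    "i + j",
    [(1, "x = y + z"), (2, "u = v + w"), (3, "p = q * w")],
    "s = u / v",
)
print("\n".join(switch_code))
```

For the three cases of the example this yields the labels L1 to L8, with L7 holding the default statement and L8 marking the exit.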
i.e., A[i] = i * w + C
Example: Let A be a 10 × 20 array with 4 bytes per word, and assume low1 = low2 = 1.
Solution: Let X = A[Y, Z]
Using the formula for a two-dimensional array,
((i1 * n2) + i2) * w + baseA - ((low1 * n2) + low2) * w
= ((Y * 20) + Z) * 4 + baseA - ((1 * 20) + 1) * 4
= ((Y * 20) + Z) * 4 + baseA - 84
We can convert the above expression into three-address code as below:
T1 = Y * 20
T1 = T1 + Z
T2 = T1 * 4
T3 = baseA - 84
T4 = T2 + T3
X = T4
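The point of splitting the formula is that the term baseA - ((low1 * n2) + low2) * w does not depend on the indices, so a compiler can compute it once. The Python sketch below checks the split form against the direct row-major computation. The function names and the base address 1000 are assumptions chosen only for illustration.

```python
def element_address(i1, i2, base, n2, w, low1=1, low2=1):
    """Address of A[i1, i2] using the split formula from the example."""
    variable_part = (i1 * n2 + i2) * w                # depends on the indices
    constant_part = base - (low1 * n2 + low2) * w     # computable at compile time
    return variable_part + constant_part


def direct_address(i1, i2, base, n2, w, low1=1, low2=1):
    """Same address computed directly from the row-major layout."""
    return base + ((i1 - low1) * n2 + (i2 - low2)) * w


BASE = 1000   # assumed base address of A
N2, W = 20, 4 # 20 columns, 4 bytes per word, as in the example

print(element_address(1, 1, BASE, N2, W))   # 1000: the first element sits at the base
print(element_address(2, 3, BASE, N2, W))   # 1088
```

With these values the constant part is 1000 - 84 = 916, which matches the constant 84 derived in the example.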
4. Procedure Calls
The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time
routines that handle procedure argument passing, calls and returns are part of the run-time support
package.
5. Backpatching
The easiest way to implement the syntax-directed definitions for Boolean expressions is to
use two passes: first construct a syntax tree for the input, then walk the tree in
depth-first order, computing the translations. The main problem with generating code for
Boolean expressions and flow-of-control statements in a single pass is that we may not yet
know the labels that control must go to at the time the jump statements are generated.
Hence, a series of branching statements is generated with the targets of the jumps left
unspecified. Each such statement is put on a list of goto statements whose labels will be
filled in when the proper label can be determined. This subsequent filling in of labels
is called backpatching.
To manipulate lists of labels, we use three functions:
1. makelist(i) creates a new list containing only i, an index into the array of
quadruples; makelist returns a pointer to the list it has made.
2. merge(p1,p2) concatenates the lists pointed to by p1 and p2, and returns a pointer to
the concatenated list.
3. backpatch(p,i) inserts i as the target label for each of the statements on the list
pointed to by p.
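A minimal Python sketch of the three functions follows, applied to the expression a < b || c < d. The quadruple array is modelled as a list of [instruction, target] pairs, and the helper emit, the variable names and the final targets 100 and 200 are assumptions. The way the lists are combined for || follows the usual textbook scheme, which this section does not spell out, so treat it as an assumption as well.

```python
quads = []   # the array of quadruples; each entry is [instruction, target]


def emit(instr):
    """Append a jump with an unfilled target and return its index."""
    quads.append([instr, None])
    return len(quads) - 1


def makelist(i):
    """Create a new list containing only the quadruple index i."""
    return [i]


def merge(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2


def backpatch(p, i):
    """Insert i as the target of every jump on list p."""
    for index in p:
        quads[index][1] = i


# B1: a < b
b1_true = makelist(emit("if a < b goto"))
b1_false = makelist(emit("goto"))
m_quad = len(quads)                 # index of the first quadruple of B2
# B2: c < d
b2_true = makelist(emit("if c < d goto"))
b2_false = makelist(emit("goto"))

# B -> B1 || B2: if B1 is false, control must continue with B2.
backpatch(b1_false, m_quad)
b_true = merge(b1_true, b2_true)    # either operand being true makes B true
b_false = b2_false

# Later, once the surrounding statement knows the real targets
# (100 and 200 are placeholders for illustration only):
backpatch(b_true, 100)
backpatch(b_false, 200)
for n, (instr, target) in enumerate(quads):
    print(n, instr, target)
```

Only the jump out of a false B1 can be resolved immediately; the remaining jumps stay on b_true and b_false until the enclosing construct supplies their targets.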
If we decide to generate the three-address code for a given syntax-directed definition
in a single pass, the main problem is deciding the addresses of the labels. The 'goto'
statements refer to these labels, and in one pass it is difficult to know their
locations in advance. The idea of backpatching is to leave the label unspecified and
fill it in later, when we know what it will be.
If we use two passes instead of one, then in the first pass we can leave these
addresses unspecified and in the second pass the missing information can be filled in.