Unit 5 - Compiler Design - WWW - Rgpvnotes.in
Subject Name: Compiler Design
Subject Code: CS-603
Semester: 6th
______________________________________________________________________________________
UNIT- V:
Introduction to Code optimization: sources of optimization of basic blocks, loops in flow graphs, dead code
elimination, loop optimization, Introduction to global data flow analysis, Code Improving transformations,
Data flow analysis of structure flow graph Symbolic debugging of optimized code.
______________________________________________________________________________________
1. Code Optimization
The code optimization phase is an optional phase of a compiler; it can run either before code generation (machine-independent optimization) or after it (machine-dependent optimization). This chapter focuses on the types of optimizer and the techniques available for optimizing.
The code produced by straightforward compiling algorithms can often be made to run faster or take less space, or both. This improvement is achieved by program transformations that are traditionally called optimizations. Compilers that apply code-improving transformations are called optimizing compilers.
The transformation must preserve the meaning of programs. That is, the optimization must not change the
output produced by a program for a given input, or cause an error such as division by zero, that was not
present in the original source program. At all times we take the “safe” approach of missing an opportunity
to apply a transformation rather than risk changing what the program does.
A transformation must, on average, speed up programs by a measurable amount. We are also interested in reducing the size of the compiled code, although size matters less than it once did. Not every transformation succeeds in improving every program; occasionally an “optimization” may even slow a program down slightly.
The transformation must be worth the effort. It does not make sense for a compiler writer to expend the intellectual effort to implement a code-improving transformation, and to have the compiler expend the additional time compiling source programs, if this effort is not repaid when the target programs are executed. “Peephole” transformations are simple enough and beneficial enough to be included in any compiler.
There are a number of ways in which a compiler can improve a program without changing the function it
computes.
The transformations
Common sub-expression elimination,
Copy propagation,
Dead-code elimination, and
Constant folding
are common examples of such function-preserving transformations. The other transformations come up primarily when global optimizations are performed.
Frequently, a program will include several calculations of the same value, such as an offset in an array.
Some of the duplicate calculations cannot be avoided by the programmer because they lie below the level
of detail accessible within the source language.
For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common sub-expression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common sub-expression t4 := 4*i is eliminated, since its value is already computed in t1 and the value of i has not changed between that definition and this use.
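This elimination can be sketched as a simple local pass over three-address tuples. The encoding below is hypothetical: each instruction is a (dest, op, arg1, arg2) tuple, and the pass assumes, as in the example, that no operand is redefined inside the block.

```python
def local_cse(block):
    # block: list of (dest, op, arg1, arg2) three-address tuples.
    # Minimal sketch: assumes no operand is redefined within the block.
    available = {}   # (op, arg1, arg2) -> temp that already holds the value
    replaced = {}    # eliminated temp -> surviving temp
    out = []
    for dest, op, a1, a2 in block:
        a1 = replaced.get(a1, a1)        # rewrite uses of eliminated temps
        a2 = replaced.get(a2, a2)
        key = (op, a1, a2)
        if key in available:
            replaced[dest] = available[key]   # reuse earlier computation
        else:
            available[key] = dest
            out.append((dest, op, a1, a2))
    return out

code = [
    ("t1", "*", "4", "i"),
    ("t2", "index", "a", "t1"),    # t2 := a[t1]
    ("t3", "*", "4", "j"),
    ("t4", "*", "4", "i"),         # same expression as t1
    ("t5", "load", "n", None),     # t5 := n
    ("t6", "index+", "t4", "t5"),  # t6 := b[t4] + t5 (schematic)
]
result = local_cse(code)
```

Running the sketch on the example drops t4 := 4*i and rewrites the last instruction to use t1, matching the optimized code above.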
Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g; that is, to use one variable in place of another. This may not appear to be an improvement by itself, but as we shall see it gives us an opportunity to eliminate x.
For example:
x = Pi;
……
A = x*r*r;
The optimization using copy propagation can be done as follows:
A = Pi*r*r;
Here the variable x is eliminated.
2.5 Dead-Code Elimination:
A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that
point. A related idea is dead or useless code, statements that compute values that never get used. While the
programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous
transformations. An optimization can be done by eliminating dead code.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the ‘if’ statement is dead code because the condition i==1 can never be satisfied.
Constant folding evaluates constant expressions at compile time. For example,
a=3.14157/2 can be replaced by
a=1.570785, thereby eliminating a division operation.
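A minimal constant-folding sketch can be written over Python's standard ast module; this is an illustration, not a full folder, and only fully-constant arithmetic expressions are folded.

```python
import ast
import operator

# Map AST operator nodes to their runtime meaning.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def fold(expr):
    """Return the folded value of a constant expression, or the
    original string if the expression is not fully constant."""
    def eval_node(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = eval_node(node.left), eval_node(node.right)
            if left is not None and right is not None:
                return OPS[type(node.op)](left, right)
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        return None   # contains a variable: cannot fold

    value = eval_node(ast.parse(expr, mode="eval").body)
    return value if value is not None else expr

fold("3.14157 / 2")   # folded at "compile time" to a single constant
fold("x * 2")         # not constant: left unchanged
```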
The transformations on basic blocks fall into two classes: structure-preserving transformations and algebraic transformations.
Structure-Preserving Transformations:
The primary structure-preserving transformations on basic blocks are common sub-expression elimination, dead-code elimination, renaming of temporary variables, and interchange of two independent adjacent statements. For example, the statements
t1 := b+c
t2 := x+y
can be interchanged (reordered) in the basic block when the value of t1 does not affect the value of t2.
Natural Loop:
One application of dominator information is in determining the loops of a flow graph suitable for
improvement.
The properties of loops are:
A loop must have a single entry point, called the header. This entry point dominates all nodes in the loop, or it would not be the sole entry to the loop.
There must be at least one way to iterate the loop, i.e., at least one path back to the header.
One way to find all the loops in a flow graph is to search for edges whose heads dominate their tails. If a→b is an edge, b is the head and a is the tail. Such edges are called back edges.
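The back-edge test can be sketched directly: compute dominators by fixed-point iteration, flag each edge a→b where b dominates a, and collect the natural loop of a back edge by walking backwards from its tail. The flow graph below is a hypothetical example.

```python
def dominators(succ, entry):
    # dom[n] = {n} ∪ ⋂ dom[p] over predecessors p, iterated to a fixed point.
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = ({n} | set.intersection(*(dom[p] for p in pred[n]))) if pred[n] else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def natural_loop(succ, tail, head):
    # The natural loop of back edge tail→head: head plus all nodes that
    # can reach tail without passing through head.
    pred = {n: [m for m in succ if n in succ[m]] for n in succ}
    loop, stack = {head, tail}, [tail]
    while stack:
        for p in pred[stack.pop()]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop

# Hypothetical flow graph: 1→2→3→4, back edge 4→2, exit 4→5.
cfg = {1: [2], 2: [3], 3: [4], 4: [2, 5], 5: []}
dom = dominators(cfg, 1)
back_edges = [(a, b) for a in cfg for b in cfg[a] if b in dom[a]]
```

For this graph the only back edge is 4→2, and its natural loop is {2, 3, 4}.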
5. Dead-Code Elimination
Dead code consists of one or more statements that are:
Either never executed (unreachable),
Or, if executed, compute values that are never used.
Thus, dead code plays no role in any program operation and can simply be eliminated.
Partially dead code
There are some code statements whose computed values are used only under certain circumstances, i.e., sometimes the values are used and sometimes they are not. Such code is known as partially dead code. For instance, a statement a = x*y inside a loop is partially dead if, on some paths, a is overwritten (say by a = z) before the product is ever used.
Likewise, in
a=1; b=10;
if (a>b)
{
……………
}
the conditional statement is always false, implying that the code written in the true case will never be executed; hence it can be removed.
5.1 Partial Redundancy
Redundant expressions are computed more than once along parallel paths, without any change in operands, whereas partially redundant expressions are computed more than once along some path, again without any change in operands. Loop-invariant code, for example, is partially redundant and can be eliminated by using a code-motion technique.
Another example of a partially redundant code can be:
if (condition)
{
a = y OP z;
}
else
{
...
}
c = y OP z;
We assume that the values of the operands (y and z) are not changed between the assignment to variable a and the assignment to variable c. Here, if the condition is true, then y OP z is computed twice, otherwise once. Code motion can be used to eliminate this redundancy, as shown below:
if (condition)
{
...
tmp = y OP z;
a = tmp;
...
}
else
{
...
tmp = y OP z;
}
c = tmp;
Here, whether the condition is true or false, y OP z is computed only once.
6. Global Data Flow Analysis
A typical data-flow equation has the form
out[S] = gen[S] ∪ (in[S] − kill[S])
This equation can be read as “the information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement.”
The details of how data-flow equations are set up and solved depend on three factors:
1. The notions of generating and killing depend on the desired information, i.e., on the data-flow analysis problem to be solved. Moreover, for some problems, instead of proceeding along the flow of control and defining out[S] in terms of in[S], we need to proceed backwards and define in[S] in terms of out[S].
2. Since data flows along control paths, data-flow analysis is affected by the constructs in a program. In fact, when we write out[S] we implicitly assume that there is a unique end point where control leaves the statement; in general, equations are set up at the level of basic blocks rather than statements, because blocks do have unique end points.
3. There are subtleties that go along with such statements as procedure calls, assignments through pointer variables, and even assignments to array variables.
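At the level of basic blocks the equations become out[B] = gen[B] ∪ (in[B] − kill[B]), with in[B] the union of out[P] over the predecessors P of B, and they can be solved by simple iteration to a fixed point. A minimal sketch on a hypothetical two-block graph (B1: i:=0 generating d1; B2, a loop body: i:=i+1 generating d2):

```python
def reaching_definitions(succ, gen, kill):
    # Iterate out[B] = gen[B] ∪ (in[B] − kill[B]) to a fixed point,
    # where in[B] = ⋃ out[P] over predecessors P of B.
    pred = {b: [p for p in succ if b in succ[p]] for b in succ}
    IN = {b: set() for b in succ}
    OUT = {b: set(gen[b]) for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            IN[b] = set().union(*(OUT[p] for p in pred[b]))
            new = gen[b] | (IN[b] - kill[b])
            if new != OUT[b]:
                OUT[b] = new
                changed = True
    return IN, OUT

# Hypothetical graph: B1 → B2, and B2 loops back to itself.
succ = {"B1": ["B2"], "B2": ["B2"]}
gen  = {"B1": {"d1"}, "B2": {"d2"}}
kill = {"B1": {"d2"}, "B2": {"d1"}}   # both define i, so each kills the other
IN, OUT = reaching_definitions(succ, gen, kill)
```

Both d1 and d2 reach the start of B2 (d1 from B1, d2 around the loop), but only d2 survives to B2's exit because B2's own definition of i kills d1.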
Now let us take a global view and consider all the points in all the blocks. A path from p1 to pn is a sequence of points p1, p2, …, pn such that for each i between 1 and n−1, either
1. pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or
2. pi is the end of some block and pi+1 is the beginning of a successor block.
A definition of variable x is a statement that assigns, or may assign, a value to x. The most common forms of definition are assignments to x and statements that read a value from an i/o device and store it in x. These statements certainly define a value for x, and they are referred to as unambiguous definitions of x. There are certain kinds of statements that may define a value for x; they are called ambiguous definitions:
1. A call of a procedure with x as a parameter, or a procedure that can access x because x is in the scope of the procedure.
2. An assignment through a pointer that could refer to x. For example, the assignment *q := y is a definition of x if it is possible that q points to x; if we cannot tell what q may point to, we must assume that an assignment through a pointer is a definition of every variable.
We say a definition d reaches a point p if there is a path from the point immediately following d to p such that d is not “killed” along that path. Thus a point can be reached by an unambiguous definition and by an ambiguous definition of the same variable appearing later along one path.
Expressions in this language are similar to those in the intermediate code, but the flow graphs for
statements have restricted forms.
We define a portion of a flow graph called a region to be a set of nodes N that includes a header, which
dominates all other nodes in the region. All edges between nodes in N are in the region, except for some
that enter the header. The portion of flow graph corresponding to a statement S is a region that obeys the
further restriction that control can flow to just one outside block when it leaves the region.
We say that the beginning points of the dummy blocks at the entry and exit of a statement’s region are the beginning and end points, respectively, of the statement. The equations are an inductive, or syntax-directed, definition of the sets in[S], out[S], gen[S], and kill[S] for all statements S. gen[S] is the set of definitions “generated” by S, while kill[S] is the set of definitions that never reach the end of S.
Algorithms for performing the code improving transformations rely on data-flow information. Here
we consider common sub-expression elimination, copy propagation and transformations for moving
loop invariant computations out of loops and for eliminating induction variables.
Global transformations are not a substitute for local transformations; both must be performed.
Not all changes made by the algorithm are improvements. We might wish to limit the number of different evaluations reaching s found in step (1), probably to one.
The algorithm will miss the fact that a*z and c*z must have the same value in
a := x+y        c := x+y
b := a*z   vs   d := c*z
because this simple approach to common sub-expressions considers only the literal expressions themselves, rather than the values computed by expressions.
To eliminate a copy statement s: x := y:
1. Determine those uses of x that are reached by this definition of x, namely s: x := y.
2. Determine whether, for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover, no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B] then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.
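A block-local version of these steps can be sketched as follows. The tuple encoding is hypothetical, and the global c_in test is replaced by the simpler local condition that neither x nor y is redefined between the copy and the use:

```python
def propagate_copies(block):
    # block: list of (dest, op, args) tuples; op "copy" means dest := args[0].
    copies = {}   # x -> y for copies x := y that are still valid
    out = []
    for dest, op, args in block:
        # step 1/3: rewrite uses reached by an active copy
        args = tuple(copies.get(a, a) for a in args)
        # step 2: a redefinition of x or y invalidates the copy x := y
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if op == "copy":
            copies[dest] = args[0]
        out.append((dest, op, args))
    return out

block = [("x", "copy", ("Pi",)),
         ("A", "*", ("x", "r", "r"))]   # A := x * r * r
result = propagate_copies(block)
```

After propagation, A := Pi * r * r no longer uses x, so a subsequent dead-code elimination pass can remove the copy x := Pi, as in the earlier example.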
Move, in the order found by the loop-invariant algorithm, each statement s found in (1) and meeting conditions (2i), (2ii), and (2iii) to a newly created pre-header, provided any operands of s that are defined in loop L have previously had their definition statements moved to the pre-header.
To understand why no change to what the program computes can occur, note that conditions (2i) and (2ii) of this algorithm assure that the value of x computed at s must be the value of x after any exit block of L. When we move s to a pre-header, s will still be the definition of x that reaches the end of any exit block of L. Condition (2iii) assures that any uses of x within L did, and will continue to, use the value of x computed by s.
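The effect of the algorithm can be seen on a small hypothetical example: x*y is loop-invariant (its operands never change inside the loop), and moving it to a pre-header preserves the computed result while performing the multiplication only once.

```python
def before(x, y, n):
    # Original loop: x * y is recomputed on every iteration,
    # even though x and y never change inside the loop.
    total = 0
    for _ in range(n):
        t = x * y          # loop-invariant computation
        total += t
    return total

def after(x, y, n):
    # After code motion: the invariant computation is hoisted
    # into the pre-header and performed exactly once.
    t = x * y
    total = 0
    for _ in range(n):
        total += t
    return total
```

The hoisting here is safe because x * y has no side effects; for computations that could fault or have side effects, the conditions above must be checked before moving the statement.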
Symbolic debuggers are system development tools that can accelerate the validation speed of behavioral
specifications by allowing a user to interact with an executing code at the source level [Hen82]. Symbolic
debugging must ensure that in response to a user inquiry, the debugger is able to retrieve and display the
value of a source variable in a manner consistent with respect to a breakpoint in the source code. Code optimization usually makes symbolic debugging harder: while optimizing transformations must have the property that the optimized code is functionally equivalent to the un-optimized code, they may produce a different execution sequence from the source statements and alter intermediate results. Debugging un-optimized rather than optimized code is not acceptable for several reasons, including:
an error may be undetectable in the un-optimized code yet detectable in the optimized code,
optimization may be necessary for a program to execute at all due to hardware limitations, and
a symbolic debugger for optimized code is often the only tool for finding errors in an optimization tool.
A design-for-debugging (DfD) approach enables retrieval of source values for a globally optimized behavioral specification. The goal of the DfD technique is to modify the original code in a pre-synthesis step such that every variable of the source code is controllable and observable in the optimized program.