Effective Representation of Aliases and Indirect Memory Operations in SSA Form
Abstract. This paper addresses the problems of representing aliases and indirect
memory operations in SSA form. We propose a method that prevents explosion in
the number of SSA variable versions in the presence of aliases. We also present a
technique that allows indirect memory operations to be globally commonized.
The result is a precise and compact SSA representation based on global value
numbering, called HSSA, that uniformly handles both scalar variables and indi-
rect memory operations. We discuss the capabilities of the HSSA representation
and present measurements that show the effects of implementing our techniques
in a production global optimizer.
1 Introduction
The Static Single Assignment (SSA) form [CFR+91] is a popular and efficient representation for performing analyses and optimizations involving scalar variables. Effective algorithms based on SSA have been developed to perform constant propagation, redundant computation detection, dead code elimination, induction variable recognition, and others [AWZ88, RWZ88, WZ91, Wolfe92]. But until now, SSA has mostly been used only for distinct variable names in the program. When applied to indirect variable constructs, the representation is not straightforward, and it results in added complexity in the optimization algorithms that operate on the representation [CFR+91, CG93, CCF94]. This has prevented SSA from being widely used in production compilers.
In this paper, we devise a scheme that allows indirect memory operations to be represented together with ordinary scalar variables in SSA form. Since indirect memory accesses also affect scalar variables due to aliasing, we start by defining notations to model the effects of aliasing for scalars in SSA form. We also describe a technique to reduce the overhead in SSA representation in the presence of aliases. Next, we introduce the concept of virtual variables, which model indirect memory operations as if they are scalars. We then show how virtual variables can be used to derive identical or distinct versions for indirect memory operations, effectively putting them in SSA form together with the scalar variables. This SSA representation in turn reduces the cost of analyzing the scalar variables that have aliases by taking advantage of the versioning applied to the indirect memory operations aliased with the scalar variables. We then present a method that builds a uniform SSA representation of all the scalar and indirect memory operations of the program based on global value numbering, which we call the Hashed SSA representation (HSSA). The various optimizations for scalars can then automatically extend to indirect memory operations under this framework. Finally, we present measurements that show the effectiveness of our techniques in a production global optimizer that uses HSSA as its internal representation.

In this paper, indirect memory operations cover both the uses of arrays and accesses to memory locations through pointers. Our method is applicable to commonly used languages like C and FORTRAN.
1. If they exactly overlap, our representation will handle them as a single variable.
2. In [CCF94], MustDefs and MayDefs are called Killing Defs and Preserving Defs respectively, while in [Steen95], they are called Strong Updates and Weak Updates.
Original Program:        SSA Representation:

  i ← 2                  i1 ← 2
  if (j)                 if (j1)
    *p ← 3                 *p ← 3
                           i2 ← χ(i1)
                         i3 ← φ(i1, i2)
                         µ(i3)
  call func()            call func()
  return i               return i3

Fig. 1. Example of µ, χ and φ
same variable from being referenced later in the program. On the use side, in addition to real uses of the variable, there are places in the program where there are potential references to the variable that need to be taken into account in analyzing the program. We call these potential references MayUse.

To accommodate the MayDefs, we use the idea from [CCF94] in which SSA edges for the same variable are factored over its MayDefs. This is referred to as location-factored SSA form in [Steen95]. We model this effect by introducing the χ assignment operator in our SSA representation. χ links up the use-def edges through MayDefs. The operand of χ is the last version of the variable, and its result is the version after the potential definition. Thus, if variable i may be modified, we annotate the code with i2 = χ(i1), where i1 is the current version of the variable.

To model MayUses, we introduce the µ operator in our SSA representation. µ takes as its operand the version of the variable that may be referenced, and produces no result. Thus, if variable i may be referenced, we annotate the code with µ(i1), where i1 is the current version of the variable.
In our internal representation, expressions cannot have side effects. Memory locations can only be modified by statements, which include direct and indirect store statements and calls. Thus, χ can only be associated with store and call statements. µ is associated with any dereferencing operation, like the unary operator * in C, which can happen within an expression. Thus, µ arises at both statements and expressions. We also mark return statements with µ for non-local variables to represent their liveness at function exits. Separating MayDef and MayUse allows us to model the effects of calls precisely. For example, a call that only references a variable will only cause a µ but no χ. For our modeling purpose, the µ takes effect just before the call, and the χ takes effect right after the call. Figure 1 gives an example of the use of µ, χ together with φ in our SSA representation. In the example, function func uses but does not modify variable i.
The inclusion of µ and χ in the SSA form does not impact the complexity of the algorithm that computes SSA form. A pre-pass inserts unnamed µ’s and χ’s for the aliased variables at the points of aliases in the program. In applying the SSA creation algorithm described in [CFR+91], the operands of the µ and χ are treated as uses and the χ’s are treated as additional assignments. The variables in the µ and χ are renamed together with the rest of the program variables.
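The renaming pre-pass can be sketched as follows for a straight-line block. The statement encoding and helper names below are our own illustration, not the paper's implementation: a µ operand is treated as an ordinary use, while each χ is treated as an assignment that both uses the old version and creates a new one.

```python
# Hypothetical statement forms: ("def", v) real assignment, ("use", v) real use,
# ("mu", v) potential use, ("chi", v) potential definition.

def rename(block):
    counter = {}   # highest version number created so far, per variable
    current = {}   # current (reaching) version, per variable

    def new_version(v):
        counter[v] = counter.get(v, 0) + 1
        current[v] = counter[v]
        return counter[v]

    def cur(v):
        if v not in current:       # first occurrence: version 1 at entry
            new_version(v)
        return current[v]

    out = []
    for kind, v in block:
        if kind in ("use", "mu"):          # a mu operand is just a use
            out.append((kind, v, cur(v)))
        elif kind == "def":                # real assignment defines a new version
            out.append((kind, v, new_version(v)))
        else:                              # chi: uses old version, defines new one
            old = cur(v)
            out.append(("chi", v, new_version(v), old))
    return out

# i <- 2; mu(i); a may-def of i (chi); then a real use of i
renamed = rename([("def", "i"), ("mu", "i"), ("chi", "i"), ("use", "i")])
```

Running this yields i1 for the real assignment, µ(i1), i2 = χ(i1) for the may-def, and the real use reads i2, mirroring the annotations described above.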
Transformations performed on SSA form have to take aliasing into account in order
to preserve the safety of the optimization. In our SSA representation, it means taking
into account the µ and χ annotations. For example, in performing dead code elimination
using the algorithm presented in [CFR+91], a store can be deleted only if the store itself
and all its associated χ’s are not marked live.
and only use-def information is available.1 The algorithm identifies the versions of variables that can be made zero-version and resets their versions to 0.
Algorithm 1. Compute Zero Versions:
1. Initialize flag HasRealOcc for each variable version created by SSA renaming to
false.
2. Pass over the program. On visiting a real occurrence, set the HasRealOcc flag for the
variable version to true.2
3. For each program variable, create NonZeroPhiList and initialize to empty.
4. Iterate through all variable versions:
a. If HasRealOcc is false and it is defined by χ, set version to 0.
b. If HasRealOcc is false and it is defined by φ:
• If the version of one of the operands of the φ is 0, set version to 0.
• Else if the HasRealOcc flag of all of the operands of the φ is true, set HasRealOcc to true.
• Else add version to NonZeroPhiList for the variable.
5. For each program variable, iterate until its NonZeroPhiList no longer changes:
a. For each version in NonZeroPhiList:
• If the version of one of the operands of the φ is 0, set version to 0 and remove
from NonZeroPhiList.
• Else if the HasRealOcc flag of all the operands of the φ is true, set HasRealOcc to true and remove from NonZeroPhiList.
The first iteration through all the variable versions, represented by Step 4, completely processes all variable versions except those that are the results of φ whose operands have not yet been processed. These are collected into NonZeroPhiList. After the first iteration of Step 5, the versions still remaining in NonZeroPhiList all have at least one operand defined by φ. The upper bound on the number of iterations in Step 5 corresponds to the longest chain of contiguous φ assignments for the variable in the program. When no more zero versions can be propagated through each φ, the algorithm terminates.
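Algorithm 1 can be rendered as a runnable sketch. The Version class and its fields below are an illustrative data model of our own, not the paper's actual structures:

```python
class Version:
    """One SSA version of a variable (hypothetical data model)."""
    def __init__(self, var, defined_by, phi_operands=None):
        self.var = var
        self.defined_by = defined_by          # "real", "chi" or "phi"
        self.phi_operands = phi_operands or []
        self.has_real_occ = False             # Step 1
        self.number = None                    # set to 0 for zero versions

def compute_zero_versions(versions, real_occurrences):
    for v in real_occurrences:                # Step 2: mark real occurrences
        v.has_real_occ = True
    nonzero_phi = {}                          # Step 3: NonZeroPhiList per variable
    for v in versions:                        # Step 4
        if v.has_real_occ:
            continue
        if v.defined_by == "chi":
            v.number = 0
        elif v.defined_by == "phi":
            if any(op.number == 0 for op in v.phi_operands):
                v.number = 0
            elif all(op.has_real_occ for op in v.phi_operands):
                v.has_real_occ = True
            else:
                nonzero_phi.setdefault(v.var, []).append(v)
    for worklist in nonzero_phi.values():     # Step 5: iterate to a fixed point
        changed = True
        while changed:
            changed = False
            for v in list(worklist):
                if any(op.number == 0 for op in v.phi_operands):
                    v.number = 0
                    worklist.remove(v)
                    changed = True
                elif all(op.has_real_occ for op in v.phi_operands):
                    v.has_real_occ = True
                    worklist.remove(v)
                    changed = True
```

For the pattern of Figure 1, where i2 is defined by χ with no real occurrence and i3 is defined by φ(i1, i2), both i2 and i3 become zero versions, while i1, which occurs for real, keeps its version.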
Because zero versions can have multiple assignments statically, they do not have fixed or known values, so two zero versions of the same variable cannot be assumed to be the same. The occurrence of a zero version breaks the use-def chain. Since the results of χ’s have unknown values, zero versions do not affect the performance of optimizations that propagate known values, like constant propagation [WZ91], because such values cannot be propagated across points of MayDefs to the variables. Optimizations that
Original Program:     Zero Versioning Off:     Zero Versioning On:

  a ←                   a1 ←                     a1 ←
  call func()           µ(a1)                    µ(a1)
                        call func()              call func()
                        a2 ← χ(a1)               a0 ← χ(a1)

Fig. 2. Example of zero versioning
[Figure: tree representations of indirect memory operations, formed from the dereference operator *, the operator +, and leaves such as p, &a and i.]

Fig. 3. Examples of Indirect Memory Operations
Original Program:      SSA Representation:

  . . *p . .             . . (*p1)1 . .
  p ← p + 1              p2 ← p1 + 1
  . . *p . .             . . (*p2)2 . .

Fig. 4. Renaming Indirect Variables
(a) Original Program    (b) Using 1 virtual         (c) Using 2 virtual variables
                            variable v*                 va[i] and v*p

  a[i] ←                  (a[i1])1 ←                  (a[i1])1 ←
                          v*2 ← χ(v*1)                va[i]2 ← χ(va[i]1)
  *p ←                    (*p1)1 ←                    (*p1)1 ←
                          v*3 ← χ(v*2)                v*p2 ← χ(v*p1)
  i ← i + 1               i2 ← i1 + 1                 i2 ← i1 + 1
  . . a[i] . .            µ(v*4)                      µ(va[i]2)
                          . . (a[i3])2 . .            . . (a[i3])2 . .

                          (4 versions of v*)          (3 versions of v*p,
                                                       2 versions of va[i])

Indirect variables a[i] and *p do not alias with each other

Fig. 5. SSA for indirects using different numbers of virtual variables
indirect variable. Each occurrence of an indirect variable can then be given the version number of the virtual variable that annotates it in the µ or χ, except that new versions need to be generated when the address expression contains different versions of variables. We can easily handle this by regarding indirect variables whose address expressions have been renamed differently as different indirect variables, even though they share the same virtual variable due to similar alias behavior. In Figure 4, after p has been renamed, *p1 and *p2 are regarded as different indirect variables. This causes different version numbers to be assigned to the *p’s, (*p1)1 and (*p2)2, even though the version of the virtual variable v*p has not changed.
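One way to picture this rule is that the version of an indirect variable is keyed on both its virtual variable's version and the SSA versions appearing in its address expression. A sketch with illustrative names:

```python
def indirect_version_key(vvar_version, address_versions):
    """Two occurrences of an indirect variable share a version only if both
    components agree (illustrative sketch, not the paper's implementation)."""
    return (vvar_version, tuple(address_versions))

# Figure 4: *p before and after p <- p + 1.  v*p is unchanged (version 1),
# but p itself was renamed from p1 to p2, so the two *p's get distinct versions.
first_use  = indirect_version_key(1, [("p", 1)])
second_use = indirect_version_key(1, [("p", 2)])
assert first_use != second_use
```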
It is possible to cut down the number of virtual variables by making each virtual variable represent more indirect variables. This is referred to as assignment factoring in [Steen95]. It has the effect of replacing multiple use-def chains belonging to different virtual variables with one use-def chain that has more nodes, and thus more versions of the virtual variable, at the expense of higher analysis cost while going up the use-def chain. In the extreme case, one virtual variable can be used to represent all indirect variables in the program. Figure 5 gives examples of using different numbers of virtual variables in a program. In that example, a[i1] and a[i2] are determined to be different versions because of the different versions of i used in them, and we assign versions 1 and 2 (shown as subscripts after the parentheses that enclose them) respectively. In part (b) of Figure 5, when we use a single virtual variable v* for both a[i] and *p, even though they do not alias with each other, the single use-def chain has to pass through the appearances of both of them, and is thus less accurate.
In practice, we do not use assignment factoring among variables that do not alias
among themselves, so that we do not have to incur the additional cost of analyzing the
alias relationship between different variables while traversing the use-def chains. For
example, we assign distinct virtual variables to a[i] and b[i] where arrays a and b do not
overlap with each other. While traversing the use-def chains, we look for the presence of
non-aliasing in indirects by analyzing their address expressions. For example, a[i1] does
not alias with a[i1+1] even though they share the same virtual variable.
Zero versions can also be applied to virtual variables, in which virtual variables
appearing in the µ and χ next to their corresponding indirect variables are regarded as
real occurrences. This also helps to reduce the number of versions for virtual variables.
1. The identification of each variable version is its unique node in the hash table, and the version number can be discarded.
traversing the code. But with GVN, this method cannot be used because GVN is flow-insensitive. One possibility is to give up commonizing the indirect operators by always creating a new value number for each indirect operator. This approach is undesirable, since it decreases the optimality and effectiveness of the GVN. To solve this problem, we apply the method described in the previous section of renaming indirect operations. During GVN, we map a value number to each unique version of indirect operations that our method has determined.
Our GVN has some similarity to that described in [Click95], in that expressions are hashed into the hash table bottom-up. However, our representation is in the form of expression trees, instead of triplets. Since we do not use triplets, variables are distinct from operators. Statements are not value-numbered. Instead, they are linked together on a per-block basis to represent the execution flow of each basic block. The DAG structure allows us to provide use-def information cheaply and succinctly, via a single link from each variable node to its defining statement. The HSSA representation by default does not provide def-use chains.
Our HSSA representation has five types of nodes. Three of them are leaf nodes:
const for constants, addr for addresses and var for variables. Type op represents general
expression operators. Indirect variables are represented by nodes of type ivar. Type ivar
is a hybrid between type var and type op. It is like type op because it has an expression
associated with it. It is like type var because it represents memory locations. The ivar
node corresponds to the C dereferencing operator *. Both var and ivar nodes have links
to their defining statements.
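A mock-up of the five node kinds might look like this in Python. The field names are our own, and the links from var and ivar nodes to their defining statements are elided for brevity:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Const:                 # leaf: constant
    value: int

@dataclass(frozen=True)
class Addr:                  # leaf: address of a symbol, e.g. &a
    symbol: str

@dataclass(frozen=True)
class Var:                   # leaf: one version of a scalar variable
    name: str
    version: int

@dataclass(frozen=True)
class Op:                    # general expression operator over value numbers
    opcode: str
    kids: Tuple[int, ...]

@dataclass(frozen=True)
class Ivar:                  # hybrid: has an address expression like an Op,
    address: int             # but names a memory location like a Var
    vvar_version: int

# Structural equality is what makes these nodes suitable for hash-consing:
assert Var("p", 1) == Var("p", 1)
assert Var("p", 1) != Var("p", 2)
```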
We now outline the steps to build the HSSA representation of the program:
Algorithm 2. Build HSSA:
1. Assign virtual variables to indirect variables in the program.
2. Perform alias analysis and insert µ and χ for all scalar and virtual variables.
3. Insert φ using the algorithm described in [CFR+91], including the χ as assignments.
4. Rename all scalar and virtual variables using the algorithm described in [CFR+91].
5. Perform the following simultaneously:
a. Perform dead code elimination to eliminate dead assignments, including φ and χ,
using the algorithm described in [CFR+91].
b. Perform Steps 1 and 2 of the Compute Zero Version algorithm described in Section 3 to set the HasRealOcc flag for all variable versions.
6. Perform Steps 3, 4 and 5 of the Compute Zero Version algorithm to set variable versions to 0.
7. Hash a unique value number and create the corresponding hash table var node for each scalar and virtual variable version that is determined to be live in Step 5a. Only one node needs to be created for the zero versions of each variable.
8. Conduct a pre-order traversal of the dominator tree of the control flow graph of the
program and apply global value numbering to the code in each basic block:
a. Hash expression trees bottom up into the hash table, searching for any existing
matching entry before creating each new value number and entry. At a var node,
use the node created in Step 7 for that variable version.
b. For two ivar nodes to match, two conditions must be satisfied: (1) their address
expressions have the same value number, and (2) the versions of their virtual
variables are either the same, or are separated by definitions that do not alias with
the ivar.
c. For each assignment statement, including φ and χ, represent its left hand side by making the statement point to the var or ivar node for direct and indirect assignments respectively. Also make the var or ivar node point back to its defining statement.
d. Update all φ, µ and χ operands and results to make them refer to entries in the
hash table.
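Step 8a is essentially hash-consing: expression trees are hashed bottom-up, reusing an existing entry whenever one matches. A minimal sketch, using our own encoding of expressions as nested tuples:

```python
class ValueTable:
    def __init__(self):
        self.table = {}      # hash key -> value number
        self.nodes = []      # value number -> key, for lookup by number

    def value_number(self, key):
        vn = self.table.get(key)
        if vn is None:
            vn = len(self.nodes)          # no existing entry: create one
            self.nodes.append(key)
            self.table[key] = vn
        return vn

    def hash_expr(self, expr):
        # expr: ("const", c) | ("var", name, version) | (opcode, kid, kid, ...)
        if expr[0] in ("const", "var"):
            return self.value_number(expr)
        kids = tuple(self.hash_expr(kid) for kid in expr[1:])
        return self.value_number((expr[0],) + kids)

vt = ValueTable()
a = vt.hash_expr(("+", ("var", "p", 1), ("const", 1)))
b = vt.hash_expr(("+", ("var", "p", 1), ("const", 1)))
c = vt.hash_expr(("+", ("var", "p", 2), ("const", 1)))
assert a == b      # identical trees are commonized to one value number
assert a != c      # a different version of p yields a different tree
```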
The second condition of Step 8b requires us to go up the use-def chain of the virtual variable starting from the current version to look for occurrences of the same ivar node that are unaffected by stores associated with the same virtual variable. For example, a store to a[i1+1] after a use of a[i1] does not invalidate a[i1]. Because we have to go up the use-def chain, processing the program in a pre-order traversal of the dominator tree of the control flow graph guarantees that we have always processed the earlier definitions.
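This use-def walk can be sketched as follows; the defs map and the may_alias predicate are illustrative stubs standing in for the real use-def chain and alias test:

```python
def same_ivar_version(use_version, occ_version, defs, may_alias):
    """Walk the virtual variable's use-def chain from use_version back to
    occ_version; the two ivar occurrences match only if no intervening store
    may alias the ivar (sketch, not the paper's implementation)."""
    v = use_version
    while v != occ_version:
        if v not in defs:
            return False             # cannot see past this definition
        store_address, prev = defs[v]
        if may_alias(store_address):
            return False             # intervening store may overlap the ivar
        v = prev
    return True

# A store to a[i1+1] (creating virtual-variable version 2 from version 1)
# does not invalidate an earlier occurrence of a[i1]:
defs = {2: ("a[i1+1]", 1)}
assert same_ivar_version(2, 1, defs, may_alias=lambda addr: addr == "a[i1]")
```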
Once the entire program has been represented in HSSA form, the original input program can be deleted. Figure 6 gives a conceptual HSSA representation for the example of Figure 4. In our actual implementation, each entry of the hash table uses a linked list
 2   var    p
 3   var    p    (def)
 :
11   const  1
 :
24   ivar   kid1 = 2
 :
42   op +   kid1 = 2, kid2 = 11
 :
55   ivar   kid1 = 3
 :

Statement:  Store  lhs = 3, rhs = 42

Fig. 6. HSSA representation for the example in Figure 4
for all the entries whose hash numbers collide, and the value number is represented by
the pair <index, depth>.
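The <index, depth> encoding can be sketched with a small chained hash table; the bucket count and key encoding here are illustrative:

```python
class ChainedTable:
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def value_number(self, key):
        index = hash(key) % len(self.buckets)
        chain = self.buckets[index]          # entries whose hashes collide
        for depth, entry in enumerate(chain):
            if entry == key:                 # existing entry: reuse its number
                return (index, depth)
        chain.append(key)
        return (index, len(chain) - 1)       # value number is <index, depth>

t = ChainedTable()
assert t.value_number(("const", 1)) == t.value_number(("const", 1))
assert t.value_number(("const", 1)) != t.value_number(("const", 2))
```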
6 Using HSSA
The HSSA form is more memory efficient than ordinary representations because of the structure sharing that results from DAGs. Compared to ordinary SSA form, HSSA also uses less memory because it does not provide def-use information, while use-def information is much less expensive because multiple uses are represented by a common node. Many optimizations can run faster on HSSA because they need to be applied only once on the shared nodes. The various optimizations can also take advantage of the fact that it is trivial to check whether two expressions compute the same value in HSSA.
An indirect memory operation is a hybrid between expression and variable, because it is not a leaf node but operates on memory. Our HSSA representation captures this property, so that it can benefit from optimizations applied to either expressions or variables.
Optimizations developed for scalar variables in SSA form can now be applied to
indirect memory operations. With the use-def information for indirect variables, we can
substitute indirect loads with their known values, performing constant propagation or
forward substitution. We can recognize and exploit equivalences among expressions
that include indirect memory operations. We can also remove useless direct and indirect
stores at the same time while performing dead code elimination.
The most effective way to optimize indirect memory operations is to promote them to scalars when possible. We call this optimization indirection removal, which refers to the conversion of an indirect store and its subsequent indirect loads to a direct store and loads. This promotion to scalar form enables the variable to be allocated to a register, thus eliminating accesses to memory. An indirect variable can be promoted to a scalar whenever it is free of aliases. This can be verified by checking for the presence of its virtual variable in µ’s and χ’s. Promotion of an ivar node can be effected by overwriting it with the contents of the new var node, thus avoiding rehashing its ancestor nodes in the DAG representation.
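The alias-freedom test for promotion thus reduces to checking whether the indirect variable's virtual variable ever appears in a µ or χ. A one-line sketch, with illustrative names:

```python
def promotable(virtual_var, mu_chi_operands):
    """An indirect variable can be promoted to a scalar when its virtual
    variable never occurs in any mu or chi (illustrative sketch)."""
    return virtual_var not in mu_chi_operands

# If v*p never appears in a mu or chi, *p can be promoted to a scalar;
# va[i] does appear, so a[i] cannot.
assert promotable("v*p", {"va[i]"})
assert not promotable("va[i]", {"va[i]"})
```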
Optimization opportunities in indirect memory operations depend heavily on the quality or extent of alias analysis. Implementing the above techniques in an optimizer enables programs to benefit directly from any improvement in the results of the alias analyzer.
7 Measurements
We now present measurements that show the effects of applying the techniques we described in this paper. The measurements are made in the production global optimizer WOPT, a main component of the compiler suite that will be shipped on MIPS R10000-based systems in May of 1996. In addition to the optimizations described in Section 6,
Routine     Language   Description
tomcatv     FORTRAN    101.tomcatv, SPECfp95
loops       FORTRAN    subroutine loops from 103.su2cor, SPECfp95
kernel      FORTRAN    routine containing the 24 Lawrence Livermore Kernels
twldrv      FORTRAN    subroutine twldrv from 145.fpppp, SPECfp95
Data_path   C          function Data_path from 124.m88ksim, SPECint95
compress    C          function compress from 129.compress, SPECint95
Query_Ass   C          function Query_AssertOnObject from 147.vortex, SPECint95
eval        C          function eval from 134.perl, SPECint95

Table 1. Description of routines used in measurements
WOPT also performs bit-vector-based partial redundancy elimination and strength reduction. From the input program, it builds HSSA and uses it as its only internal program representation until it finishes performing all its optimizations. We focus on the effects that zero versioning and the commonizing of indirect variables have on the HSSA representation in the optimizer. We have picked a set of 8 routines, 7 of which are from the SPEC95 benchmark suites. Table 1 describes these routines. We have picked progressively larger routines to show the effects of our techniques as the size of the routines increases. The numbers shown do not include the effects of inter-procedural alias analysis.
We characterize the HSSA representation by the number of nodes in the hash table needed to represent the program. The different types of nodes were described earlier in Section 5. Without zero versioning, Table 2 shows that var nodes can account for up to 94% of all the nodes in the hash table. Applying zero versioning decreases the number of var nodes by 41% to 90%. The number of nodes needed to represent the programs is reduced by 30% to 85%. Note that the counts for non-var nodes remain constant, since only variables without real occurrences are converted to zero versions. Having fewer nodes to deal with, the time spent in performing global optimization is reduced by 2% to 45%. The effect of zero versioning depends on the amount of aliasing in the program. Zero versioning also has bigger effects on large programs, since there are more variables affected by each alias. We have found no noticeable difference in the running time of the benchmarks due to zero versioning.
With zero versioning being applied, Table 3 shows the effects of commonizing indirect variables on the HSSA representation. Ivar nodes account for from 6% to 21% of the total number of nodes in our sample programs. Commonizing ivar nodes reduces the ivar nodes by 3% to 58%. In each case, the total number of nodes decreases more than the number of ivar nodes, showing that commonizing the ivar nodes in turn enables other operators that operate on them to be commonized. Though the change in the total number of hash table nodes is not significant, the main effect of this technique is in preventing missed optimizations, like global common subexpressions, among indirect memory operations.
                           number of nodes
            zero version off    zero version on    percentage reduction   compilation
routines    all      vars       all      vars      all      vars          speedup
tomcatv     1803     1399       844      440       53%      69%           4%
loops       7694     6552       2493     1351      68%      79%           9%
kernel      9303     8077       2974     1748      68%      78%           6%
twldrv      33683    31285      6297     3899      81%      88%           11%
Data_path   489      365        340      216       30%      41%           2%
compress    759      647        367      255       52%      61%           4%
Query_Ass   5109     4653       1229     773       76%      83%           12%
eval        62966    59164      9689     5887      85%      90%           45%

Table 2. Effects of zero versioning
                           number of nodes
            ivar commoning off      ivar commoning on       percentage reduction
routines    all nodes  ivar nodes   all nodes  ivar nodes   all nodes  ivar nodes
tomcatv     844        124          828        111          2%         10%
loops       2493       453          2421       381          3%         16%
kernel      2974       398          2854       306          4%         23%
twldrv      6297       506          6117       333          3%         34%
Data_path   340        44           320        30           6%         32%
compress    367        21           365        19           0.5%       10%
Query_Ass   1229       183          1218       173          1%         5%
eval        9689       1994         9114       1504         6%         25%

Table 3. Effects of the global commonizing of ivar nodes
8 Conclusion
We have presented practical methods that efficiently model aliases and indirect memory operations in SSA form. Zero versioning prevents large variations in the representation overhead due to the amount of aliasing in the program, with minimal impact on the quality of the optimizations. The HSSA form captures the benefits of SSA while efficiently representing program constructs using global value numbering. Under HSSA, the integration of alias handling, SSA and global value numbering enables indirect memory operations to be globally commonized. Generalizing SSA to indirect memory operations in turn allows them to benefit from optimizations developed for scalar variables. We believe that these are all necessary ingredients for SSA to be used in a production-quality global optimizer.
Acknowledgement
The authors would like to thank Peter Dahl, Earl Killian and Peng Tu for their helpful
comments in improving the quality of this paper. Peter Dahl also contributed to the
work in this paper.
References
[AWZ88] Alpern, B., Wegman, M. and Zadeck, K. Detecting Equality Of Variables in
Programs. Conference Record of the 15th ACM Symposium on the Principles of
Programming Languages, Jan. 1988.
[CWZ90] Chase, D., Wegman, M. and Zadeck, K. Analysis of Pointers and Structures.
Proceedings of the SIGPLAN ‘90 Conference on Programming Language Design and
Implementation, June 1990.
[Chow83] Chow, F. “A Portable Machine-independent Global Optimizer — Design and
Measurements,” Ph.D. Thesis and Technical Report 83-254, Computer System Lab,
Stanford University, Dec. 1983.
[Click95] Click, C. Global Code Motion/Global Value Numbering. Proceedings of the SIGPLAN ‘95 Conference on Programming Language Design and Implementation, June 1995.
[CBC93] Choi, J., Burke, M. and Carini, P. Efficient Flow-Sensitive Interprocedural
Computation of Pointer-Induced Aliases and Side Effects. Conference Record of the
20th ACM Symposium on the Principles of Programming Languages, Jan. 1993.
[CCF94] Choi, J., Cytron, R. and Ferrante, J. On the Efficient Engineering of Ambitious
Program Analysis. IEEE Transactions on Software Engineering, February 1994, pp.
105-113.
[CS70] Cocke, J. and Schwartz, J. Programming Languages and Their Compilers. Courant
Institute of Mathematical Sciences, New York University, April 1970.
[CFR+91] Cytron, R., Ferrante, J., Rosen B., Wegman, M. and Zadeck, K., Efficiently
Computing Static Single Assignment Form and the Control Dependence Graph. ACM
Transactions on Programming Languages and Systems, October 1991, pp. 451-490.
[CG93] Cytron, R. and Gershbein, R. Efficient Accommodation of May-alias Information in SSA Form. Proceedings of the SIGPLAN ‘93 Conference on Programming Language Design and Implementation, June 1993.
[RWZ88] Rosen, B., Wegman, M. and Zadeck K. Global Value Numbers and Redundant
Computation. Conference Record of the 15th ACM Symposium on the Principles of
Programming Languages, Jan. 1988.
[Ruf95] Ruf, E. Context-Insensitive Alias Analysis Reconsidered. Proceedings of the
SIGPLAN ‘95 Conference on Programming Language Design and Implementation,
June 1995.
[Steen95] Steensgaard, B. Sparse Functional Stores for Imperative Programs. Proceedings of
the SIGPLAN ‘95 Workshop on Intermediate Representations, Jan. 1995.
[WL95] Wilson, B. and Lam, M. Efficient Context Sensitive Pointer Analysis for C Programs.
Proceedings of the SIGPLAN ‘95 Conference on Programming Language Design and
Implementation, June 1995.
[WZ91] Wegman, M. and Zadeck, K. Constant Propagation with Conditional Branches. ACM
Transactions on Programming Languages and Systems, April 1991, pp. 181-210.
[Wolfe92] Wolfe, M. Beyond induction variables. Proceedings of the SIGPLAN ‘92 Conference
on Programming Language Design and Implementation, June 1992.