Skip to content

SLP Vectorizer: some issues with getSpillCost() #26916

Closed
@ayalz

Description

@ayalz
Bugzilla Link 26542
Version trunk
OS Windows NT
Attachments ll version of the example
CC @ayalz,@hfinkel

Extended Description

BoUpSLP::getSpillCost() is said to do the following:
// Walk from the bottom of the tree to the top, tracking which values are
// live. When we see a call instruction that is not part of our tree,
// query TTI to see if there is a cost to keeping values live over it
// (for example, if spills and fills are required).

Some issues (with the implementation, not the documentation ;-):

  1. The first sentence above assumes that a traversal of the VectorizableTree[] array from its first entry (0, the root) to its last entry, corresponds to a bottom-up traversal of the associated instructions in program order, thereby supporting liveness data-flow analysis. However, the entries in this array are not guaranteed to appear in program order. In the example below involving a subtraction the minuend appears closer to the root of the VectorizableTree than the subtrahend, contrary to the order in which they appear inside the basic block. This messes up the liveness scan. Thanks to Michael Kuperstein for a lively discussion on this.

  2. The second sentence above indicates that we accumulate a spill/fill cost per call instruction. However, if the associated calling conventions provide callee-saved vector registers, we may be able to pay this cost once. Furthermore, even if all associated vector registers are caller-saved, if a live range between two VectorizableTree entries contains several calls, as in the example below, we can spill/fill once for all.

  3. In order to support SLP Vectorizer: Partial Tree Vectorization #18119 (SLP Vectorizer: Partial Tree Vectorization), we should attribute an individual spill cost to each VectorizableTree entry.

  4. A live range may span more than two basic blocks. If we want to keep checking only two (at both ends of the live range) in the interest of compile time, add a comment.

  5. Finally, targets that do not override the default getCostOfKeepingLiveOverCall() which returns 0, should avoid searching for calls throughout the live ranges of all values in the SLP chain, just to accumulate zero costs.

Consider the following example, SLP vectorizing to VF=4:

void slp (int, int);

void slp_call (int *a, int * b, int n)
{
int a0 = a[1];
int a1 = a[2];
int a2 = a[3];
int a3 = a[4];
slp(n, *a);
slp(n, *a);
slp(n, *a);
int b0 = b[2];
int b1 = b[3];
int b2 = b[4];
int b3 = b[5];
a0 = a0 - b0;
a1 = a1 - b1;
a2 = a2 - b2;
a3 = a3 - b3;
b[2] = a0;
b[3] = a1;
b[4] = a2;
b[5] = a3;
}

Clearly only one vector register needs to be spilled and filled before and after the 3 calls to slp() (if at all, considering remat, but that's another story). But as we start to traverse up the tree, we first account for the single live value feeding the store, then the 2 live values feeding the subtract (erasing the former single value); next we will reach the loads from a[1:4] (not the loads from b[2:5]), and before doing so we encounter the 3 calls and accumulate a cost of spilling/filling the above 2 live values each.

This order is evident by running -slp-vectorizer -dump:
SLP: #LV: 1 sub, Looking at %sub = sub nsw i32 %0, %7
SLP: #LV: 2 , Looking at %0 = load i32, i32* %arrayidx, align 4, !tbaa !​1
SLP: #LV: 1 , Looking at %7 = load i32, i32* %arrayidx4, align 4, !tbaa !​1

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions