Background
I keep repeating this to multiple people, so I thought I should just get it out in the open so we can discuss this together. @rolfmorel @asiemien @mshahid @javedabsar @MaheshRavishankar @banach-space @ftynse
There is a disconnect between the Linalg operations being declared and upstreamed by various groups, and we need to get the house in order. Some are very generic (pun intended), others are very specific. Named ops have been added with restrictive semantics, which we're trying to fix with the new scheme.
We have discussed this between ourselves and I think we mostly agree on the overall idea, but we need to get to a specific level of detail to help @mshahid propose `batch_matmul` and `batch_reduce_matmul`, @rolfmorel propose `contraction`, @javedabsar propose `element-wise`, and @MaheshRavishankar propose `convolution`. We also don't need to create this tree in one go, and can have multiple steps for different branches.
Tree Proposal
An operation tree, where `generic` is at the root, and the operations get more and more specialized. For example, `generic` → `contract` → `matmul`. When doing tiling, we want to avoid having to coerce a `matmul` all the way down to a `generic` if we can stop half-way through as a `contract`.
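To make the "stop half-way" idea concrete, here is a sketch of a named `matmul` next to its contraction form. This assumes the proposed `linalg.contract` takes explicit indexing maps; the exact syntax is hypothetical until the RFC lands.

```mlir
// A named matmul: the semantics are implicit in the op name.
%0 = linalg.matmul ins(%A, %B : tensor<16x8xf32>, tensor<8x32xf32>)
                   outs(%C : tensor<16x32xf32>) -> tensor<16x32xf32>

// Half-way generalization: still known to be a contraction, but with
// explicit indexing maps (hypothetical syntax for the proposed op).
%1 = linalg.contract
       indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                        affine_map<(m, n, k) -> (k, n)>,
                        affine_map<(m, n, k) -> (m, n)>]
       ins(%A, %B : tensor<16x8xf32>, tensor<8x32xf32>)
       outs(%C : tensor<16x32xf32>) -> tensor<16x32xf32>
```

A tiling that permutes or splits dimensions can stay at the `contract` level by rewriting the maps, instead of dropping all the way to `generic` and losing the "this is a contraction" fact.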
The reason to have the intermediate operations has been widely discussed (and agreed): to help pattern matchers, by persisting semantics (memoization) across multiple transforms instead of re-checking generics for every possible combination in every transform/pass of a schedule/pipeline. It also helps match intermediate semantics for vectorization or micro-kernel lowering once we support more than just the standard matmuls, but don't want to create a gazillion variations.
Here’s the idea for a tree:
generic
|- contract
| |- matmul
| |- vecmat
| |- matvec
| |- batch_...
| |- batch_reduce_...
| |- quantized_... (we need to solve this problem later)
| |- mmt4d
| |- dot
|- element_wise
| |- add
| |- max
| |- select
| |- (all compute unary, binary, ternary)
|- convolution
| |- conv_3d_ncdhw_fcdhw
| |- conv_2d_...
| |- conv_...
| |- depthwise_conv_...
|- pooling
| |- pooling_nchw_max
| |- pooling_...
|- winograd
| |- winograd_...
...
A class of operations that don't have a common ancestor are the memory ops (`copy`, `transpose`, `broadcast`, `reduce`, `pack`/`unpack` (soon)). They can all map to generics, so if they branch straight off `generic`, they'd be directly coerced into it by transforms that change the shapes.
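As an illustration of that direct coercion, here is `linalg.transpose` next to the `linalg.generic` it generalizes to; a shape-changing transform would jump straight to the generic form (a sketch, so details like SSA names are illustrative):

```mlir
// A memory op that branches straight off generic.
%t = linalg.transpose ins(%in : tensor<8x16xf32>)
                      outs(%init : tensor<16x8xf32>) permutation = [1, 0]

// Its generalized form: the permutation becomes an indexing map.
%g = linalg.generic
       {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>,
                         affine_map<(d0, d1) -> (d0, d1)>],
        iterator_types = ["parallel", "parallel"]}
       ins(%in : tensor<8x16xf32>) outs(%init : tensor<16x8xf32>) {
  ^bb0(%arg0: f32, %arg1: f32):
    linalg.yield %arg0 : f32
} -> tensor<16x8xf32>
```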
There are some left-overs that cannot be converted to generics:
- `softmax`, which can be lowered to multiple generics and can be thought of as a meta operation. This would not be in the tree at all.
- `index` and `yield`, which are used inside the payload region and do not have a lowering per se. Also meta operations, also outside of the tree.
Current Implementation
Right now, with OpDSL, named ops are essentially aliases to `linalg.generic`, so tiling and fusion can happen on them without additional code. But this also restricts what we can do with the ops, so we started moving them to ODS (TableGen).
We have discussed creating the operations as aliases, similar to OpDSL, but that means different things to different people. Some may think this is a table-gen alias definition, others may think they just implement the same interfaces, etc. So we need to find a common language, which in this case is reasonably simple because they all mean the same thing in the end.
Also note that not all current operations need to be named in the new tree. For example, having a `matmul` operation is really useful, but perhaps a `batch_reduce_matmul` can be matched from a `contract` instead. The same goes for element-wise operations (do we really need `linalg.add` if we already have `linalg.element_wise add`?) and the near-infinite variations of `convolution`s.
This is another strong motivation for doing this tree approach, replacing the flat grassland we have today.
Implementation Proposal
The criteria we need for the aliases are:
- A transform on an operation should try to still employ the same operation. Ex: tiling a `linalg.add` should create a loop of `add`s.
- A transform on an operation that cannot use the same op (ex. some `batch_matmul` tiling) should coerce it to its immediate parent in the tree. Failing that, its grand-parent, until reaching the root, `linalg.generic`.
- An optional coercion can be made in cases where another sibling operation provides the appropriate semantics. For example, batch-tiling a `batch_matmul` into a loop over the batch dimension using `matmul`.
- Transforms should operate on interfaces, and operations should implement those interfaces. But you can still have a transform that is specific to a particular operation where an interface doesn't make sense.
- There is a trivial path to generalize operations (towards `generic`).
- There can be a path to specialize operations (towards the leaves) with enough pattern matching. Those matchers are expensive, which is why we want to persist as much as possible in IR for future transforms. Note: not every `generic` can be specialized.
- There can be helper functions that match features of more generic operations that mean "the same thing" as a more specific operation. This is helpful if you know you won't be handling the op after this transform (ex. lowering). These should use the same matcher infrastructure as de-generalization transforms.
A naive approach is to encode these relationships in TableGen as def-aliases. But we still need to implement the functionality in C++ that covers the bridges between these representations. Technically, because we'll implement those anyway, having def-aliases is not required and may get in the way.
Having a back-end for converting def-aliases to C++ code that de/generalizes and guarantees interface implementation is outside of the scope for now and should not be attempted before we know the actual semantics we'll go for.
Next Steps
None of these changes will happen now. But this is the design we're going for in the linalg dialect in the short term, so the new operations will follow this path towards a more flexible transform infrastructure, keeping the power to represent what we need without an op explosion.
RFCs on the aforementioned operations will start popping up soon, and this is the context we're operating in.