# 'vector' Dialect

[TOC]

MLIR supports multi-dimensional `vector` types and custom operations on those
types. A generic, retargetable, higher-order `vector` type (`n-D` with `n > 1`)
is a structured type that carries semantic information useful for
transformations. This document discusses the retargetable abstractions that
exist in MLIR today and operate on SSA values of type `vector`, along with the
pattern rewrites and lowerings that enable targeting specific instructions on
concrete targets. These abstractions serve to separate concerns between
operations on `memref` (a.k.a. buffers) and operations on `vector` values. This
is not a new proposal but rather textual documentation of existing MLIR
components along with a rationale.

## Positioning in the Codegen Infrastructure

The following diagram, recently presented with the
[StructuredOps abstractions](https://drive.google.com/corp/drive/u/0/folders/1sRAsgsd8Bvpm_IxREmZf2agsGU2KvrK-),
captures the current codegen paths implemented in MLIR in the various existing
lowering paths.

The following diagram seeks to isolate `vector` dialects from the complexity of
the codegen paths and focus on the payload-carrying ops that operate on std and
`vector` types. This diagram is not to be taken as set in stone or as fully
representative of what exists today; rather, it illustrates the layering of
abstractions in MLIR.

This separates concerns related to (a) defining efficient operations on
`vector` types from (b) program analyses and transformations on `memref`, loops
and other types of structured ops (be they `HLO`, `LHLO`, `Linalg` or other).
Looking a bit forward in time, we can put a stake in the ground and venture
that the higher the level of `vector`-level primitives we build and target from
codegen (or from some user/language level), the simpler our task will be, the
more complex the patterns that can be expressed, and the better the resulting
performance.

## Components of a Generic Retargetable Vector-Level Dialect

The existing MLIR `vector`-level dialects are related to the following
bottom-up abstractions:

1.  Representation in `LLVMIR` via data structures, instructions and
    intrinsics. This is referred to as the `LLVM` level.
2.  Set of machine-specific operations and types that are built to translate
    almost 1-1 with the HW ISA. This is referred to as the Hardware Vector
    level, a.k.a. `HWV`. For instance, we have (a) the `NVVM` dialect (for
    `CUDA`) with tensor core ops, (b) accelerator-specific dialects
    (internal), (c) a potential (future) `CPU` dialect to capture `LLVM`
    intrinsics more closely, and other dialects for specific hardware. Ideally
    this should be auto-generated as much as possible from the `LLVM` level.
3.  Set of virtual, machine-agnostic operations that are informed by costs at
    the `HWV` level. This is referred to as the Virtual Vector level, a.k.a.
    `VV`. This is the level that higher-level abstractions (codegen, automatic
    vectorization, a potential vector language, ...) target.

The existing generic, retargetable, `vector`-level dialect is related to the
following top-down rewrites and conversions:

1.  MLIR Rewrite Patterns applied by the MLIR `PatternRewrite` infrastructure
    to progressively lower to implementations that match closer and closer to
    the `HWV`. Some patterns are "in-dialect" `VV -> VV` and some are
    conversions `VV -> HWV`.
2.  `Virtual Vector -> Hardware Vector` lowering is specified as a set of MLIR
    lowering patterns that are specified manually for now.
3.  `Hardware Vector -> LLVM` lowering is a mechanical process that is written
    manually at the moment and that should be automated, following the
    `LLVM -> Hardware Vector` ops generation as closely as possible.

## Short Description of the Existing Infrastructure

### LLVM level

On CPU, the `n-D` `vector` type currently lowers to `!llvm<array<vector>>`.
More concretely, `vector<4x8x128xf32>` lowers to
`!llvm<[4 x [ 8 x [ 128 x float ]]]>`. There are tradeoffs involved related to
how one can access subvectors and how one uses `llvm.extractelement`,
`llvm.insertelement` and `llvm.shufflevector`. A
[deeper dive section](#DeeperDive) discusses the current lowering choices and
tradeoffs.
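As an illustration of this nested-aggregate form, reading a single scalar out
of the lowered value requires peeling the aggregate levels with static indices
before a `1-D` element extraction can apply. The sketch below is illustrative
only: the LLVM dialect's type and op syntax has evolved over time, so the exact
spelling may differ from any given MLIR revision.

```mlir
// Illustrative sketch (LLVM dialect syntax approximated, not authoritative).
// %v has the lowered type of a vector<4x8x128xf32>, i.e. a 4x8 aggregate of
// 1-D <128 x float> vectors.
%a   = llvm.extractvalue %v[1, 5] : !llvm.array<4 x array<8 x vector<128xf32>>>
%idx = llvm.mlir.constant(7 : i64) : i64
%s   = llvm.extractelement %a[%idx : i64] : vector<128xf32>
```

Note that the aggregate indices (`1`, `5`) must be static, while the element
index `%idx` within the innermost `1-D` vector may be dynamic.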

### Hardware Vector Ops

Hardware Vector Ops are implemented as one dialect per target. For internal
hardware, we are auto-generating the specific HW dialects. For `GPU`, the
`NVVM` dialect adds operations such as `mma.sync`, `shfl` and tests. For
`CPU`, things are somewhat in flight because the abstraction is close to
`LLVMIR`. The jury is still out on whether a generic `CPU` dialect is
concretely needed, but it seems reasonable to have the same levels of
abstraction for all targets and to perform cost-based lowering decisions in
MLIR even for `LLVM`. Specialized `CPU` dialects that would capture specific
features not well captured by LLVM peephole optimizations, or that operate on
different types than core MLIR supports (e.g. Scalable Vectors), are welcome
future extensions.

### Virtual Vector Ops

Some existing Arith and Vector dialect operations on `n-D` `vector` types
include:

```mlir
%2 = arith.addf %0, %1 : vector<3x7x8xf32>    // -> vector<3x7x8xf32>
%2 = arith.mulf %0, %1 : vector<3x7x8xf32>    // -> vector<3x7x8xf32>
%2 = vector.splat %1 : vector<3x7x8xf32>      // -> vector<3x7x8xf32>

%1 = vector.extract %0[1]: vector<3x7x8xf32>                 // -> vector<7x8xf32>
%1 = vector.extract %0[1, 5]: vector<3x7x8xf32>              // -> vector<8xf32>
%2 = vector.outerproduct %0, %1: vector<4xf32>, vector<8xf32>      // -> vector<4x8xf32>
%3 = vector.outerproduct %0, %1, %2: vector<4xf32>, vector<8xf32>  // fma when adding %2
%3 = vector.strided_slice %0
    {offsets = [2, 2], sizes = [2, 2], strides = [1, 1]}:
  vector<4x8x16xf32>                          // Returns a slice of type vector<2x2x16xf32>

%2 = vector.transfer_read %A[%0, %1] {permutation_map = (d0, d1) -> (d0)}:
  memref<7x?xf32>, vector<4xf32>

vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3]
    {permutation_map = (d0, d1, d2, d3) -> (d3, d1, d0)}:
  vector<5x4x3xf32>, memref<?x?x?x?xf32>
```

The list of Vector ops is currently evolving and is best kept track of by
following the evolution of the
[VectorOps.td](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td)
ODS file (markdown documentation is automatically generated locally when
building and populates the
[Vector doc](https://github.com/llvm/llvm-project/blob/main/mlir/docs/Dialects/Vector.md)).
Recent extensions are driven by concrete use cases of interest. A notable such
use case is the `vector.contract` op which applies principles of the
StructuredOps abstraction to `vector` types.
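A sketch of `vector.contract` expressing a `matmul`-shaped contraction follows.
The attribute spelling mirrors the examples in VectorOps.td but should be
checked against the ODS file, which remains the source of truth.

```mlir
#contraction_accesses = [
  affine_map<(i, j, k) -> (i, k)>,
  affine_map<(i, j, k) -> (k, j)>,
  affine_map<(i, j, k) -> (i, j)>
]
#contraction_trait = {
  indexing_maps = #contraction_accesses,
  iterator_types = ["parallel", "parallel", "reduction"]
}
// C += A * B as a contraction over vector values.
%c = vector.contract #contraction_trait %a, %b, %acc
  : vector<4x8xf32>, vector<8x16xf32> into vector<4x16xf32>
```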

### Virtual Vector Rewrite Patterns

The following rewrite patterns exist at the `VV->VV` level:

1.  The now-retired `MaterializeVector` pass used to legalize ops on a
    coarse-grained virtual `vector` to a finer-grained virtual `vector` by
    unrolling. This has been rewritten as a retargetable unroll-and-jam
    pattern on `vector` ops and `vector` types.
2.  The lowering of `vector_transfer` ops legalizes `vector` load/store ops to
    permuted loops over scalar load/stores. This should evolve to loops over
    `vector` load/stores + `mask` operations as they become available `vector`
    ops at the `VV` level.
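The second pattern above can be sketched for a simple `1-D` read as follows.
This is a hypothetical sketch, not the actual lowering implementation:
masking and out-of-bounds padding are omitted, and `%base`, `%pad` and the
constants are assumed to be defined earlier.

```mlir
// Hypothetical sketch: legalizing a 1-D vector.transfer_read into a loop of
// scalar loads that progressively builds up the vector value.
%init = vector.splat %pad : vector<4xf32>
%read = scf.for %i = %c0 to %c4 step %c1
    iter_args(%acc = %init) -> (vector<4xf32>) {
  %idx = arith.addi %base, %i : index
  %s   = memref.load %A[%idx] : memref<?xf32>
  %v   = vector.insertelement %s, %acc[%i : index] : vector<4xf32>
  scf.yield %v : vector<4xf32>
}
```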

The general direction is to add more Virtual Vector level ops and implement
more useful `VV -> VV` rewrites as composable patterns that the PatternRewrite
infrastructure can apply iteratively.

### Virtual Vector to Hardware Vector Lowering

For now, `VV -> HWV` lowerings are specified in C++ (see for instance the
[SplatOpLowering for n-D vectors](https://github.com/tensorflow/mlir/commit/0a0c4867c6a6fcb0a2f17ef26a791c1d551fe33d)
or the
[VectorOuterProductOp lowering](https://github.com/tensorflow/mlir/commit/957b1ca9680b4aacabb3a480fbc4ebd2506334b8)).

Simple
[conversion tests](https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir)
are available for the `LLVM` target starting from the Virtual Vector level.

## Rationale

### Hardware as `vector` Machines of Minimum Granularity

Higher-dimensional `vector`s are ubiquitous in modern HPC hardware. One way to
think about the Generic Retargetable `vector`-level dialect is that it
operates on `vector` types that are multiples of a "good" `vector` size so the
HW can efficiently implement a set of high-level primitives (e.g.
`vector<8x8x8x16xf32>` when the HW `vector` size is, say, `vector<4x8xf32>`).

Some notable `vector` sizes of interest include:

1.  CPU: `vector<HW_vector_size * k>`, `vector<core_count * k' x
    HW_vector_size * k>` and `vector<socket_count x core_count * k' x
    HW_vector_size * k>`,
2.  GPU: `vector<warp_size * k>`, `vector<warp_size * k x float4>` and
    `vector<warp_size * k x 4 x 4 x 4>` for tensor_core sizes,
3.  Other accelerators: n-D `vector` as first-class citizens in the HW.

Depending on the target, ops on sizes that are not multiples of the HW
`vector` size may either produce slow code (e.g. by going through `LLVM`
legalization) or may not legalize at all (e.g. some unsupported accelerator X
combination of ops and types).

### Transformation Problems Avoided

A `vector<16x32x64xf32>` virtual `vector` is a coarse-grained type that can be
"unrolled" to HW-specific sizes. The multi-dimensional unrolling factors are
carried in the IR by the `vector` type. After unrolling, traditional
instruction-level scheduling can be run.

The following key transformations (along with the supporting analyses and
structural constraints) are completely avoided by operating on a `vector`
SSA-value abstraction:

1.  Loop unroll and unroll-and-jam.
2.  Loop and load-store restructuring for register reuse.
3.  Load-to-store forwarding and mem2reg.
4.  Coarsening (raising) from finer-grained `vector` form.

Note that "unrolling" in the context of `vector`s corresponds to partial loop
unroll-and-jam and not full unrolling. As a consequence, this is expected to
compose with SW pipelining where applicable and does not result in ICache
blow-up.
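This kind of "unrolling" can be pictured as plain pattern rewrites on `vector`
values. The following is an illustrative sketch of one tile of a larger
decomposition, assuming a HW size of `vector<4x8xf32>` (op names as used
elsewhere in this document):

```mlir
// One [4, 8] tile of unrolling a vector<16x32xf32> addition; the full rewrite
// repeats this for each tile and recombines the results.
%a0 = vector.strided_slice %a
    {offsets = [0, 0], sizes = [4, 8], strides = [1, 1]}
  : vector<16x32xf32>                         // -> vector<4x8xf32>
%b0 = vector.strided_slice %b
    {offsets = [0, 0], sizes = [4, 8], strides = [1, 1]}
  : vector<16x32xf32>                         // -> vector<4x8xf32>
%s0 = arith.addf %a0, %b0 : vector<4x8xf32>   // HW-sized op
```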

### The Big Out-Of-Scope Piece: Automatic Vectorization

One important piece not discussed here is automatic vectorization
(automatically raising from scalar to n-D `vector` ops and types). The TL;DR
is that when the first "super-vectorization" prototype was implemented, MLIR
was nowhere near as mature as it is today. As we continue building more
abstractions in `VV -> HWV`, there is an opportunity to revisit vectorization
in MLIR.

Since this topic touches on codegen abstractions, it is technically out of the
scope of this survey document, but there is a lot to discuss in light of
structured op type representations and how a vectorization transformation can
be reused across dialects. In particular, MLIR allows the definition of
dialects at arbitrary levels of granularity and lends itself favorably to
progressive lowering. The argument can be made that automatic vectorization on
a loops + ops abstraction is akin to raising structural information that has
been lost. Instead, it is possible to revisit vectorization as simple pattern
rewrites, provided the IR is in a suitable form. For instance, vectorizing a
`linalg.generic` op whose semantics match a `matmul` can be done
[quite easily with a pattern](https://github.com/tensorflow/mlir/commit/bff722d6b59ab99b998f0c2b9fccd0267d9f93b5).
In fact this pattern is trivial to generalize to any type of contraction when
targeting the `vector.contract` op, as well as to any field (`+/*`, `min/+`,
`max/+`, `or/and`, `logsumexp/+`, ...). In other words, by operating on a
higher level of generic abstractions than affine loops, non-trivial
transformations become significantly simpler and composable at a finer
granularity.

Irrespective of the existence of an auto-vectorizer, one can build a notional
vector language based on the VectorOps dialect and build end-to-end models by
expressing `vector`s in the IR directly and applying simple pattern rewrites.
[EDSC](https://github.com/llvm/llvm-project/blob/main/mlir/docs/EDSC.md)s
provide a simple way of driving such a notional language directly in C++.

## Bikeshed Naming Discussion

There are arguments against naming an n-D level of abstraction `vector`
because most people associate it with 1-D `vector`s. On the other hand,
`vector`s are first-class n-D values in MLIR. The alternative name Tile has
been proposed, which conveys higher-D meaning. But it also is one of the most
overloaded terms in compilers and hardware. For now, we generally use the
`n-D` `vector` name and are open to better suggestions.

## DeeperDive

This section describes the tradeoffs involved in lowering the MLIR n-D vector
type and operations on it to LLVM-IR. Putting aside the
[LLVM Matrix](http://lists.llvm.org/pipermail/llvm-dev/2018-October/126871.html)
proposal for now, this assumes LLVM only has built-in support for 1-D
vectors. The relationship with the LLVM Matrix proposal is discussed at the
end of this document.

MLIR does not currently support dynamic vector sizes (i.e. SVE style) so the
discussion is limited to static rank and static vector sizes (e.g.
`vector<4x8x16x32xf32>`). This section discusses operations on vectors in LLVM
and MLIR.

LLVM instructions are prefixed by the `llvm.` dialect prefix (e.g.
`llvm.insertvalue`). Such ops operate exclusively on 1-D vectors and
aggregates following the [LLVM LangRef](https://llvm.org/docs/LangRef.html).
MLIR operations are prefixed by the `vector.` dialect prefix (e.g.
`vector.insertelement`). Such ops operate exclusively on MLIR `n-D` `vector`
types.

### Alternatives For Lowering an n-D Vector Type to LLVM

Consider a vector of rank n with static sizes `{s_0, ... s_{n-1}}` (i.e. an
MLIR `vector<s_0x...s_{n-1}xf32>`). Lowering such an `n-D` MLIR vector type to
an LLVM descriptor can be done by either:

1.  Flattening to a `1-D` vector: `!llvm<"(s_0*...*s_{n-1})xfloat">` in the
    MLIR LLVM dialect.
2.  Nested aggregate type of `1-D` vector:
    `!llvm<"[s_0x[s_1x[...<s_{n-1}xf32>]]]">` in the MLIR LLVM dialect.
3.  A mix of both.

There are multiple tradeoffs involved in choosing one or the other that we
discuss. It is important to note that "a mix of both" immediately reduces to
"nested aggregate type of 1-D vector" with a `vector.cast %0:
vector<4x8x16x32xf32> to vector<4x4096xf32>` operation that flattens the `k`
most minor dimensions.

### Constraints Inherited from LLVM (see LangRef)

The first constraint was already mentioned: LLVM only supports `1-D` `vector`
types natively. Additional constraints are related to the difference in LLVM
between vector and aggregate types: `"Aggregate Types are a subset of derived
types that can contain multiple member types. Arrays and structs are
aggregate types. Vectors are not considered to be aggregate types."`

This distinction is also reflected in some of the operations. For `1-D`
vectors, the operations `llvm.extractelement`, `llvm.insertelement`, and
`llvm.shufflevector` apply, with direct support for dynamic indices. For
`n-D` vectors with `n > 1`, and thus aggregate types at the LLVM level, the
more restrictive operations `llvm.extractvalue` and `llvm.insertvalue` apply,
which only accept static indices. There is no direct shuffling support for
aggregate types.

The next sentence illustrates a recurrent tradeoff, also found in MLIR,
between "value types" (subject to SSA use-def chains) and "memory types"
(subject to aliasing and side-effects): `"Structures in memory are accessed
using 'load' and 'store' by getting a pointer to a field with the
llvm.getelementptr instruction. Structures in registers are accessed using
the llvm.extractvalue and llvm.insertvalue instructions."`

When transposing this to MLIR, `llvm.getelementptr` works on pointers to
`n-D` vectors in memory. For `n-D` vector values that live in registers we
can use `vector.extract` and `vector.insert`, which do not accept dynamic
indices. Note that this is consistent with hardware considerations as
discussed below.
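For instance, with the syntax used elsewhere in this document (all indices
static):

```mlir
%row = vector.extract %v[2] : vector<4x8xf32>       // -> vector<8xf32>
%elt = vector.extract %v[2, 5] : vector<4x8xf32>    // -> f32
%new = vector.insert %row, %v[3] : vector<8xf32> into vector<4x8xf32>
```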
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 308 | |
| 309 | An alternative is to use an LLVM `1-D` `vector` type for which one can use |
| 310 | `llvm.extractelement`, `llvm.insertelement` and `llvm.shufflevector`. These |
| 311 | operations accept dynamic indices. The implication is that one has to use a |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 312 | flattened lowering of an MLIR n-D vector to an LLVM 1-D vector. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 313 | |
| 314 | There are multiple tradeoffs involved that mix implications on the programming |
| 315 | model, execution on actual HW and what is visible or hidden from codegen. They |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 316 | are discussed in the following sections. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 317 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 318 | ### Nested Aggregate |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 319 | |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 320 | Pros: |
| 321 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 322 | 1. Natural encoding n-D vector -> (n-1)-D aggregate over 1-D vector. |
| 323 | 2. No need for linearization / delinearization logic inserted everywhere. |
| 324 | 3. `llvm.insertvalue`, `llvm.extractvalue` of `(n-k)-D` aggregate is natural. |
| 325 | 4. `llvm.insertelement`, `llvm.extractelement`, `llvm.shufflevector` over `1-D` |
| 326 | vector type is natural. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 327 | |
| 328 | Cons: |
| 329 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 330 | 1. `llvm.insertvalue` / `llvm.extractvalue` does not accept dynamic indices but |
| 331 | only static ones. |
| 332 | 2. Dynamic indexing on the non-most-minor dimension requires roundtrips to |
| 333 | memory. |
| 334 | 3. Special intrinsics and native instructions in LLVM operate on `1-D` vectors. |
| 335 | This is not expected to be a practical limitation thanks to a `vector.cast |
| 336 | %0: vector<4x8x16x32xf32> to vector<4x4096xf32>` operation, that flattens |
| 337 | the most minor dimensions (see the bigger picture in implications on |
| 338 | codegen). |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 339 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 340 | ### Flattened 1-D Vector Type |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 341 | |
| 342 | Pros: |
| 343 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 344 | 1. `insertelement` / `extractelement` / `shufflevector` with dynamic indexing |
| 345 | is possible over the whole lowered `n-D` vector type. |
| 346 | 2. Supports special intrinsics and native operations. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 347 | |
Michal Terepeta | c47108c | 2021-11-26 07:14:07 | [diff] [blame] | 348 | Cons: |
| 349 | |
| 350 | 1. Requires linearization/delinearization logic everywhere, translations are |
| 351 | complex. |
| 352 | 2. Hides away the real HW structure behind dynamic indexing: at the end of the |
| 353 | day, HW vector sizes are generally fixed and multiple vectors will be needed |
| 354 | to hold a vector that is larger than the HW. |
| 355 | 3. Unlikely peephole optimizations will result in good code: arbitrary dynamic |
| 356 | accesses, especially at HW vector boundaries unlikely to result in regular |
| 357 | patterns. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 358 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 359 | ### Discussion |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 360 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 361 | #### HW Vectors and Implications on the SW and the Programming Model |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 362 | |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 363 | As of today, the LLVM model only support `1-D` vector types. This is |
| 364 | unsurprising because historically, the vast majority of HW only supports `1-D` |
| 365 | vector registers. We note that multiple HW vendors are in the process of |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 366 | evolving to higher-dimensional physical vectors. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 367 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 368 | In the following discussion, let's assume the HW vector size is `1-D` and the SW |
| 369 | vector size is `n-D`, with `n >= 1`. The same discussion would apply with `2-D` |
| 370 | HW `vector` size and `n >= 2`. In this context, most HW exhibit a vector |
| 371 | register file. The number of such vectors is fixed. Depending on the rank and |
| 372 | sizes of the SW vector abstraction and the HW vector sizes and number of |
| 373 | registers, an `n-D` SW vector type may be materialized by a mix of multiple |
| 374 | `1-D` HW vector registers + memory locations at a given point in time. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 375 | |
The implication of the physical HW constraints on the programming model is that
one cannot index dynamically across hardware registers: a register file can
generally not be indexed dynamically. This is because the register number is
fixed and one either needs to unroll explicitly to obtain fixed register numbers
or go through memory. This is a constraint familiar to CUDA programmers:
declaring `private float a[4];` and subsequently indexing it with a *dynamic*
value results in so-called **local memory** usage (i.e. roundtripping to
memory).
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 384 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 385 | #### Implication on codegen |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 386 | |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 387 | MLIR `n-D` vector types are currently represented as `(n-1)-D` arrays of `1-D` |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 388 | vectors when lowered to LLVM. This introduces the consequences on static vs |
| 389 | dynamic indexing discussed previously: `extractelement`, `insertelement` and |
| 390 | `shufflevector` on `n-D` vectors in MLIR only support static indices. Dynamic |
| 391 | indices are only supported on the most minor `1-D` vector but not the outer |
| 392 | `(n-1)-D`. For other cases, explicit load / stores are required. |
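To make this concrete, the following sketch shows a `2-D` vector type, its
LLVM-level representation, and the static vs. dynamic indexing distinction.
The exact op and type syntax varies across MLIR versions, so treat this as
illustrative rather than normative:

```mlir
// An MLIR 2-D vector type:
//   vector<4x8xf32>
// lowers to an LLVM-dialect (n-1)-D array of 1-D vectors:
//   !llvm.array<4 x vector<8xf32>>

// Extraction at a *static* position from the n-D vector is supported:
%row = vector.extract %v[3] : vector<4x8xf32>              // yields vector<8xf32>
// *Dynamic* indexing is only available on the most minor 1-D vector:
%elt = vector.extractelement %row[%i : index] : vector<8xf32>
```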
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 393 | |
| 394 | The implications on codegen are as follows: |
| 395 | |
1.  Loops around `vector` values are indirect addressing of vector values; they
    must operate on explicit load / store operations over `n-D` vector types.
| 398 | 2. Once an `n-D` `vector` type is loaded into an SSA value (that may or may not |
| 399 | live in `n` registers, with or without spilling, when eventually lowered), |
| 400 | it may be unrolled to smaller `k-D` `vector` types and operations that |
| 401 | correspond to the HW. This level of MLIR codegen is related to register |
| 402 | allocation and spilling that occur much later in the LLVM pipeline. |
3.  HW may support >1-D vectors with intrinsics for indirect addressing within
    these vectors. These can be targeted thanks to explicit `vector.cast`
    operations from MLIR `k-D` vector types and operations to LLVM `1-D`
    vectors + intrinsics.
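As a sketch of step 2, an operation on a coarse-grained SSA `vector` value may
be unrolled into HW-sized `1-D` pieces. The shapes, the `%init` accumulator
value, and the exact op syntax below are hypothetical and version-dependent:

```mlir
// A coarse-grained SW-level operation:
%sum = arith.addf %a, %b : vector<4x8xf32>

// may be unrolled into 1-D HW-sized operations, e.g. for row 0:
%a0 = vector.extract %a[0] : vector<4x8xf32>      // vector<8xf32>
%b0 = vector.extract %b[0] : vector<4x8xf32>      // vector<8xf32>
%s0 = arith.addf %a0, %b0 : vector<8xf32>
%r0 = vector.insert %s0, %init[0] : vector<8xf32> into vector<4x8xf32>
// ... and similarly for rows 1, 2 and 3.
```

Register allocation and spilling for the resulting `1-D` values then happen
much later, in the LLVM pipeline.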
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 407 | |
Alternatively, we argue that directly lowering to a linearized abstraction hides
away the codegen complexities related to memory accesses by giving a false
impression of magical dynamic indexing across registers. Instead we prefer to
make those very explicit in MLIR and allow codegen to explore tradeoffs.
Different HW will require different tradeoffs in the sizes involved in steps 1,
2 and 3 above.
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 414 | |
Decisions made at the MLIR level will have implications at a much later stage in
LLVM (after register allocation). We do not envision exposing concerns related
to modeling of register allocation and spilling to MLIR explicitly. Instead,
each target will expose a set of "good" target operations and `n-D` vector
types, associated with costs that `PatternRewriter`s at the MLIR level will be
able to target. Such costs at the MLIR level will be abstract and used for
ranking, not for accurate performance modeling. In the future, such costs will
be learned.
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 423 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 424 | #### Implication on Lowering to Accelerators |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 425 | |
| 426 | To target accelerators that support higher dimensional vectors natively, we can |
| 427 | start from either `1-D` or `n-D` vectors in MLIR and use `vector.cast` to |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 428 | flatten the most minor dimensions to `1-D` `vector<Kxf32>` where `K` is an |
| 429 | appropriate constant. Then, the existing lowering to LLVM-IR immediately |
| 430 | applies, with extensions for accelerator-specific intrinsics. |
| 431 | |
It is the role of an accelerator-specific vector dialect (see codegen flow in
the figure above) to lower the `vector.cast`. Accelerator -> LLVM lowering would
then consist of a set of `Accelerator -> Accelerator` rewrites to perform the
casts, composed with `Accelerator -> LLVM` conversions + intrinsics that operate
on `1-D` `vector<Kxf32>`.
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 437 | |
Some of those rewrites may need extra handling, especially if a reduction is
involved. For example, `vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32>`
when `K != K1 * … * Kn`, or some arbitrary irregular cast such as `vector.cast
%0: vector<4x4x17xf32> to vector<Kxf32>`, may introduce masking and intra-vector
shuffling that may not be worthwhile or even feasible, i.e. have infinite cost.
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 443 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 444 | However `vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32>` when `K = K1 * |
| 445 | … * Kn` should be close to a noop. |
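To illustrate the cheap case (a sketch; `vector.cast` syntax and availability
differ across MLIR versions, and the op has since been renamed in the upstream
dialect):

```mlir
// Close to a noop: the total element count is preserved (4 * 8 = 32), so the
// cast only reinterprets the shape and requires no masking or shuffling.
%flat = vector.cast %v : vector<4x8xf32> to vector<32xf32>
```

When the element counts do not match, the same op would instead imply the
masking and intra-vector shuffling discussed above.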
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 446 | |
| 447 | As we start building accelerator-specific abstractions, we hope to achieve |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 448 | retargetable codegen: the same infra is used for CPU, GPU and accelerators with |
| 449 | extra MLIR patterns and costs. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 450 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 451 | #### Implication on calling external functions that operate on vectors |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 452 | |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 453 | It is possible (likely) that we additionally need to linearize when calling an |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 454 | external function. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 455 | |
### Relationship to LLVM matrix type proposal
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 457 | |
The LLVM matrix proposal was formulated 1 year ago but seemed to be somewhat
stalled until recently. In its current form, it is limited to `2-D` matrix types
and operations are implemented with LLVM intrinsics. In contrast, MLIR sits at a
higher level of abstraction and allows the lowering of generic operations on
generic `n-D` vector types from MLIR to aggregates of `1-D` LLVM vectors. In the
future, it could make sense to lower to the LLVM matrix abstraction for CPU as
well, even though MLIR will continue to need higher-level abstractions.
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 465 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 466 | On the other hand, one should note that as MLIR is moving to LLVM, this document |
Michal Terepeta | c47108c | 2021-11-26 07:14:07 | [diff] [blame] | 467 | could become the unifying abstraction that people should target for 1-D vectors |
| 468 | and the LLVM matrix proposal can be viewed as a subset of this work. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 469 | |
Jacques Pienaar | 1842fd5 | 2020-02-17 21:38:25 | [diff] [blame] | 470 | ### Conclusion |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 471 | |
The flattened `1-D` vector design in the LLVM matrix proposal is good in a
HW-specific world with special intrinsics. It is a good abstraction for
register allocation, instruction-level parallelism and
software pipelining / modulo scheduling optimizations at the register level.
However, MLIR codegen operates at a higher level of abstraction, where we want
to target operations on coarser-grained vectors than the HW size, on which
unroll-and-jam is applied and patterns across multiple HW vectors can be
matched.
| 480 | |
| 481 | This makes “nested aggregate type of 1-D vector” an appealing abstraction for |
| 482 | lowering from MLIR because: |
| 483 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 484 | 1. it does not hide complexity related to the buffer vs value semantics and the |
| 485 | memory subsystem and |
| 486 | 2. it does not rely on LLVM to magically make all the things work from a too |
| 487 | low-level abstraction. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 488 | |
Mogball | a54f4ea | 2021-10-12 23:14:57 | [diff] [blame] | 489 | The use of special intrinsics in a `1-D` LLVM world is still available thanks to |
| 490 | an explicit `vector.cast` op. |
Nicolas Vasilache | a932f03 | 2020-01-03 18:05:44 | [diff] [blame] | 491 | |
River Riddle | 1a083f0 | 2020-03-24 18:57:13 | [diff] [blame] | 492 | ## Operations |
Nicolas Vasilache | c9d5f34 | 2019-03-29 18:48:20 | [diff] [blame] | 493 | |
River Riddle | 1a083f0 | 2020-03-24 18:57:13 | [diff] [blame] | 494 | [include "Dialects/VectorOps.md"] |