# Shape Inference

Shape inference as discussed here is considered a specific instance of type
inference for [ShapedType][ShapedType]. Type constraints are along (at least)
three axes: 1) elemental type, 2) rank (including static or dynamic), 3)
dimensions. While some operations have no compile-time fixed shape (e.g., output
shape is dictated by data), we could still have some knowledge of
constraints/bounds in the system for that operation (e.g., the output of a
`tf.where` is at most the size of the input data). That is, there are additional
valuable constraints that could be captured even without full knowledge of the
shape.
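As a minimal sketch of the three axes and of a bound that survives without a full shape, the toy model below may help. `ToyShapedType`, `upper_bounds` and `where_like_result` are hypothetical names introduced for illustration; they are not part of MLIR, and the where-like op only approximates the behaviour described above.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple

Dim = Optional[int]  # None stands for a dimension unknown at compile time


@dataclass(frozen=True)
class ToyShapedType:
    """Toy stand-in for a shaped type; this is not the MLIR ShapedType API."""

    element_type: str                               # axis 1: elemental type
    dims: Optional[Tuple[Dim, ...]]                 # None means unranked
    upper_bounds: Optional[Tuple[Dim, ...]] = None  # optional per-dimension bounds

    @property
    def rank(self) -> Optional[int]:                # axis 2: rank
        return None if self.dims is None else len(self.dims)

    def is_static(self) -> bool:                    # axis 3: dimensions
        return self.dims is not None and all(d is not None for d in self.dims)


def where_like_result(condition: ToyShapedType) -> ToyShapedType:
    """Result type of a hypothetical where-like op: one row per selected element."""
    if not condition.is_static():
        # Without a static input shape no bound can be derived.
        return ToyShapedType("i64", (None, condition.rank))
    total = prod(condition.dims)
    # The row count is data dependent, but it can never exceed the input size.
    return ToyShapedType("i64", (None, condition.rank),
                         upper_bounds=(total, condition.rank))
```

For a static `4x8` input the result has a dynamic first dimension, yet the sketch still records that it is at most 32.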

Type inference is currently modelled executionally for operation creation using the
[`InferTypeOpInterface`][InferTypeOpInterface], while
`InferShapedTypeOpInterface` is used to implement the shape and element type
inference. The return type can often be derived from the inferred return shape
and elemental type (queryable from `InferShapedTypeOpInterface`), and so type
inference for tensor types can be implemented with `InferShapedTypeOpInterface`.

## Shape functions

The C++ interfaces are the base mechanism whereby shape inference is queried and
executed, but not the intended way to specify shape constraints in general.

Initially the shape inference will be declaratively specified using:

* Constraints on the operands of an operation directly. For example,
  constraining the input type to be tensor/vector elements or that the
  elemental type be of a specific type (e.g., output of computing the size
  of a value is of elemental type `i1`) or class (e.g., float-like).
* Constraints across operands and results of an operation.

    - For example, specifying equality constraints on type/constituents of a
      type (shape and elemental type) between operands and results (e.g., the
      output type of an add is the same as those of the input operands).
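The two kinds of constraint listed above can be sketched as plain predicates. This is an illustrative model only: `ToyType`, `FLOAT_LIKE`, `is_float_like` and `infer_same_operands_and_result` are assumed names, not ODS or MLIR constructs.

```python
from typing import Optional, Sequence, Tuple

# Toy type: (element type, static shape). Not an MLIR data structure.
ToyType = Tuple[str, Tuple[int, ...]]

FLOAT_LIKE = {"f16", "bf16", "f32", "f64"}  # illustrative, not exhaustive


def is_float_like(ty: ToyType) -> bool:
    """Operand constraint: the element type belongs to a class of types."""
    return ty[0] in FLOAT_LIKE


def infer_same_operands_and_result(operands: Sequence[ToyType]) -> Optional[ToyType]:
    """Equality constraint: the result type equals the shared operand type.

    Returns None when the operands disagree, i.e. the constraint is violated.
    """
    if not operands or any(op != operands[0] for op in operands[1:]):
        return None
    return operands[0]
```

With an add-like op, two `f32` operands of shape `2x3` yield an `f32` result of shape `2x3`, while mismatched operands yield no result type.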

NOTE: The C++ shape functions are an intermediate step until the shape dialect
is more full-fledged, at which point the C++ functions should become the
exceptional case.

## Testing

Shape inference is currently tested alongside type inference by
`TestReturnTypeDriver` in the test dialect. This driver performs two checks:

1. Verification that the return types specified match the inferred types. This
   explicit check will be removed and made part of Op verification instead.
2. Test the creation of Ops without specifying the return type explicitly in
   function `testCreateFunctions` by creating new binary Ops (Op classes
   specified in `TestReturnTypeDriver`) using 1) all operands to
   `testCreateFunctions` as both operands, and 2) using combinations of input
   operands of the function.

## Shape dialect

This section details the shape type inference dialect (`shape`). The initial
focus will be on describing shape functions in a form that could be used by
both the runtime and the compiler (for construction of ops, refinement of
shapes, and reification of dynamic allocations for dialects including TF,
TFLite, XLA & the tensor compute dialect under discussion).

This will focus on the shape functions (e.g., determine the rank and dimensions
of the output shape). As shown in the shaped container type, shape will be one
of 3 components, the others being elemental type and attribute (which is
currently left open with the intention of supporting extensions such as layouts
or bounded shapes at a later point). This allows for decoupling of these:

* Not all the information is needed for all analyses;
* Not all shape functions need to provide all the information (e.g., one could
  define a base class function that only populates element type but composes
  with the others);
* It allows reusing the constraints between, say, Tensor and Memref
  representations of an operation.

An argument could be made that these are metadata functions instead of shape
functions, with some considering shape and elemental types different and some
considering them both as part of shape. But `shape function` is IMHO
descriptive, and metadata can span too large a range of potential uses/values.

### Requirements

The requirements for the shape inference functions are determined by the
requirements of shape inference, but we believe the requirements below still
allow freedom to consider different shape inference approaches and so we do not
impose a particular shape inference approach here.

#### Shape inference functions

* **Expressiveness** shape functions need to support programs where tensors
  have shapes that are not known statically (for example, `tensor<16x?xf32>`
  or `tensor<*xf32>`);
* **Shape error detection** Many operations will have constraints on their
  operands. If the constraints are not satisfied, or cannot be determined
  statically to be satisfied, then a runtime check/assertion could be
  generated.

    * This also aligns with the requirement that the shape function description
      should be usable by both the compiler and runtime.
    * Shape errors should be easy to understand, at least in terms of which
      constraint of the operation is violated. This also requires that shape
      function error messages be configurable by the author of the
      shape function (e.g., the author would be able to give the semantic
      constraint invalidated rather than the low-level check that failed).
    * The static analysis may be used to eliminate run-time checks that are
      guaranteed to pass.
        * Ideally all would eventually (see section
          [Inlining shape checking](#inline)) be elided.
    * Only report errors which are guaranteed to occur at runtime. If an error
      is only possible (rather than guaranteed), then we use a runtime assertion
      to fail and produce an error message with the invariant violated.

* Shape functions usable by compiler and runtime.

    * This does not mean the exact same C++ function, but rather the
      description should be consumable by either.
    * Shape function description should not be constrained by either runtime
      or compiler's type system to handle types only used for analysis. That
      is, these two type systems differ and both should be supported, but the
      intersection of the two should not be required. As a particular example,
      if a compiler only wants to differentiate exact shapes vs dynamic
      shapes, then it need not consider a more generic shape lattice even
      though the shape description supports it.

* Declarative (e.g., analyzable at compile time, possible to generate
  different versions for different use cases)

    * This may not strictly be a requirement, but a way to handle the former:
      a declarative specification could be reused by both while avoiding a
      need to map to or from a 3rd representation, given these two systems
      have, and will continue to have, different types.

* Shape inference functions are expressible at runtime

    * Users can define a shape function for a new operation dynamically at
      runtime; this allows vendors to describe an operation and its shape
      function dynamically.

      This requirement is on the wishlist.

* Doesn't require graph-wide shape information (e.g., only require local
  information)

    * Shape functions should be cheap to invoke on each kernel launch.
    * Shape function can be dictated by arguments (operands, attributes and regions)
      only (e.g., same operands as the corresponding operation could be
      constructed & invoked with).
    * Shape information that needs higher-level/graph information should use
      richer types (e.g., `TensorList<F32>`);
    * The function should be invocable before/while constructing an op (e.g.,
      can't rely on the op being constructed).

* Shape functions should be pure functions.

* Should support functions whose type is only known dynamically (e.g.,
  `read_from_file` op)

    * Without needing to invoke the op (e.g., reading a file once to
      determine the shape and then a second time to actually consume the
      output of the file).

* The shape function operation dialect should be interoperable with non-shape
  function dialect operations.

    * There may be a common set of operations that satisfy most uses (e.g., merge,
      equal_type, arithmetic expressions, slice, concat, pattern matching on
      attributes such as padding etc.) that will be discovered and could cover
      a large percentage of the use cases. Among these there will be some
      which carry extra semantic info that could be used for symbolic
      constraints (e.g., checking equality of two dimensions resulting in
      setting an equality constraint) and higher-order interpretation for
      constraint solving.

      It is therefore beneficial (but not required) to reuse operations,
      especially as for statically known shapes, arbitrary arithmetic
      computations could still be performed. This means that the computations
      performed statically may or may not be supported by an arbitrary solver,
      but would still be allowed.

* The shape function should be expandable such that symbolic equality and
  upper bound constraints (say) could be represented and may be propagated by
  shape inference.

    * E.g., the shape functions may contain more information that is only
      useful when used from shape inference;

* Shape functions are allowed to fail and report an error. The error reporting
  should report the location of the operation that failed with, where
  possible, a user-actionable error message.

    * These failures could become inlined and become runtime failures with
      runtime values and error messages.
    * Reporting errors should be optional, e.g., the same function
      may be used to query validity without reporting an error.
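Several of the requirements above (pure functions over local information, reporting only guaranteed errors, deferring uncertain cases to runtime, and optional error reporting) can be illustrated together. The sketch below is one possible reading, not a proposed API: `Verdict`, `check_equal_dims` and the `report` callback are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional

Dim = Optional[int]  # None stands for a dimension only known at runtime


class Verdict(Enum):
    SATISFIED = "satisfied"                      # proven statically
    VIOLATED = "violated"                        # guaranteed to fail at runtime
    NEEDS_RUNTIME_CHECK = "needs-runtime-check"  # cannot be decided statically


def check_equal_dims(lhs: Dim, rhs: Dim, description: str,
                     report: Optional[Callable[[str], None]] = None) -> Verdict:
    """Pure check on two dimensions; only a guaranteed failure is reported."""
    if lhs is None or rhs is None:
        # The error is merely possible, so defer to a runtime assertion.
        return Verdict.NEEDS_RUNTIME_CHECK
    if lhs == rhs:
        return Verdict.SATISFIED
    if report is not None:
        # The author-provided description names the semantic constraint.
        report(f"{description}: {lhs} != {rhs}")
    return Verdict.VIOLATED
```

Calling the function without `report` turns it into a silent validity query, which matches the requirement that error reporting be optional.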

#### Non-goals

1. The shape dialect is an IR representation and not a programming language;
    * While the functions should be readable, the dialect doesn't carry the
      conveniences of a programming language. Deciding how people write these
      things, e.g. a mini DSL, a C++ API that generates them, extracting them
      programmatically from `SetShapeFn` calls, etc., is still TBD.
1. Describe the shape inference approach that will use the shape functions;
    * The goal is that the shape functions and the constraints one could
      obtain from them are general enough that they would be useful for
      various analyses. But whether we follow a very simple approach (e.g.,
      only fully static information is used for shape output, unranked for
      everything else) or a very advanced one (e.g., expression trees of
      symbolic constants) can be evaluated independently of this proposal and
      with concrete benefit analysis.
1. Describe the approach whereby error messages will be generated;
    * While the shape functions will be able to emit errors optionally, it
      will be possible to dictate when they emit an error. This enables
      deciding whether or which error to emit: there have been proposals in
      the literature that the iteration order for shape inference affects the
      quality of the error message produced, and the shape functions do not
      mandate that.
1. Flow sensitive shape functions;
    * To enable scalable/cheap shape inference, the shape functions do not
      intend to provide flow sensitive information. This facility could
      potentially be built as part of some higher order analysis that reuses
      the shape functions/constraints due to the shape functions.
1. All static functions are usable for dynamic/unknown shapes;
    * More involved computations can be performed with statically known shapes
      than what can be sensibly analyzed with unknown/symbolic variables.
| 224 | ### Discussion |
| 225 | |
| 226 | #### Inline shape inference checks {#inline} |
| 227 | |
| 228 | Shape functions should be lowerable to runtime checks for validity. E.g. verify |
| 229 | as much as possible statically, but enable generating instructions to compute the |
| 230 | shape dynamically and or falling back to runtime checks for attributes not |
| 231 | verifiable at compile time. These checks inserted should ideally only check that |
| 232 | which could not have been verified statically. |
| 233 | |
| 234 | These inlined calls could interfere with optimization patterns/passes (e.g., |
| 235 | shape inference should not insert constructs that interfere with optimization |
| 236 | patterns) and so could be delayed until later (with another round of |
| 237 | optimizations, constant folding, CSE, etc., that should remove redundant runtime |
| 238 | operations). |
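One way to picture "only check that which could not have been verified statically" is to compute the residual set of checks for a constraint. The sketch below assumes a simple equal-shape constraint; `residual_equality_checks` is a hypothetical helper, and the returned indices stand in for whatever runtime instructions a real lowering would emit.

```python
from typing import List, Optional, Tuple

Dim = Optional[int]  # None stands for a dimension only known at runtime
Shape = Tuple[Dim, ...]


def residual_equality_checks(lhs: Shape, rhs: Shape) -> List[int]:
    """Return the dimension indices that still need a runtime equality check.

    Dimensions proven equal statically produce no check; a static mismatch
    is rejected immediately instead of being deferred to runtime.
    """
    if len(lhs) != len(rhs):
        raise ValueError(f"rank mismatch: {len(lhs)} vs {len(rhs)}")
    pending = []
    for index, (a, b) in enumerate(zip(lhs, rhs)):
        if a is None or b is None:
            pending.append(index)  # undecidable now, verify at runtime
        elif a != b:
            raise ValueError(f"dimension {index} differs: {a} vs {b}")
    return pending
```

For `(4, ?, 8)` against `(4, 16, 8)` only dimension 1 needs a runtime check, and two fully static matching shapes need none.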

### Possibly Asked Questions

#### What about ODS specifications of operations?

In ODS we have been recording the constraints for the operands & attributes of
an operation. Where these are sufficient to constrain the output shape (e.g.,
`SameOperandsAndResultType` or broadcastable) we should generate the shape
function from those. Where not, an explicit shape function should be specified
(spelling TBD but currently considering using the MLIR textual form as
serialization approach).

#### Why not extract the shape function from reference implementation?

This could be done in future! The extracted shape function would use the shape
inference dialect, so we are starting there. Especially for operations described in a
structured way, one could autogenerate the shape function.

#### How/in what language will the shape functions be authored?

TBD. We are open to many approaches and suggestions; the priority of this
proposal is the IR produced by whatever language is chosen.

#### What shape inference approach is being suggested here?

None. There are multiple different shape inference approaches that we could
layer on top of these. From the most basic (always return unranked), to more
useful (return fixed shape for constant inputs/arguments) to the more advanced
(create logical conjunctions of algebraic statements between symbolic named
values).

### Open points

1. Should shape functions that produce dynamic outputs given all statically
   shaped inputs be marked specially? E.g., read from file.

TODO: Add examples here.

## WIP/Future considerations

Shape functions are determined by attributes and could be arbitrarily
complicated with a wide range of specification possibilities. Equality
relationships are common (e.g., the elemental type of the output matches the
primitive type of the inputs, both inputs have exactly the same type [primitive
type and shape]) and so these should be easy to specify. Algebraic relationships
would also be common (e.g., a concat of `[n,m]` and `[n,m]` matrix along axis 0
is `[n+n, m]` matrix), while some ops only have defined shapes under certain
cases (e.g., matrix multiplication of `[a,b]` and `[c,d]` is only defined if
`b == c`).
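The concat and matrix multiplication relationships above can be written as small shape functions. This is a sketch under simplifying assumptions: dimensions are integers or `None` for unknown, no symbolic names are tracked, and `concat_shape` and `matmul_shape` are illustrative names rather than existing functions.

```python
from typing import Optional, Tuple

Dim = Optional[int]  # None stands for a dimension unknown at compile time
Shape = Tuple[Dim, ...]


def concat_shape(lhs: Shape, rhs: Shape, axis: int) -> Shape:
    """Algebraic relationship: sizes add along `axis` and match elsewhere."""
    if len(lhs) != len(rhs):
        raise ValueError("concat requires operands of equal rank")
    out = []
    for index, (a, b) in enumerate(zip(lhs, rhs)):
        if index == axis:
            out.append(None if a is None or b is None else a + b)
        elif a is None or b is None:
            out.append(a if b is None else b)  # refine to the known size
        elif a != b:
            raise ValueError(f"dimension {index} differs: {a} vs {b}")
        else:
            out.append(a)
    return tuple(out)


def matmul_shape(lhs: Shape, rhs: Shape) -> Shape:
    """Conditional relationship: [a,b] x [c,d] is only defined if b == c."""
    if len(lhs) != 2 or len(rhs) != 2:
        raise ValueError("this sketch only handles rank-2 operands")
    (a, b), (c, d) = lhs, rhs
    if b is not None and c is not None and b != c:
        raise ValueError(f"inner dimensions differ: {b} vs {c}")
    return (a, d)
```

When an inner dimension is unknown, `matmul_shape` still returns `(a, d)`; verifying `b == c` would then fall to a runtime check.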

Instead of specifying an additional mechanism to specify a shape transfer
function, the reference implementation of the operation will be used to derive
the shape function. The reference implementation is general and can support the
arbitrary computations needed to specify output shapes.

[InferTypeOpInterface]: https://github.com/llvm/llvm-project/tree/main/mlir/include/mlir/Interfaces/InferTypeOpInterface.td
[ShapedType]: https://github.com/llvm/llvm-project/tree/main/mlir/include/mlir/IR/BuiltinTypes.h