# LLVM IR Target

This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1.  **conversion** of the IR to a set of dialects translatable to LLVM IR, for
    example the [LLVM Dialect](Dialects/LLVM.md) or one of the
    hardware-specific dialects derived from LLVM IR intrinsics such as
    [AMX](Dialects/AMX.md), [X86Vector](Dialects/X86Vector.md) or
    [ArmNeon](Dialects/ArmNeon.md);
2.  **translation** of MLIR dialects to LLVM IR.

This flow allows the non-trivial transformations to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency of MLIR on LLVM IR libraries and
reduces churn when LLVM IR changes.

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).

[TOC]

## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact. For example,
the `-convert-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.
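
For example, a module mixing `memref`, `arith`, and `func` operations can be
lowered by running several such passes in sequence. A hedged sketch (pass names
are current as of this writing and may change across versions):

```
mlir-opt input.mlir \
  -convert-memref-to-llvm \
  -convert-arith-to-llvm \
  -convert-func-to-llvm
```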

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations. After multiple partial
conversions to the LLVM dialect are performed, the cast operations that became
no-ops can be removed by the `-reconcile-unrealized-casts` pass. The latter pass
is not specific to the LLVM dialect and can remove any no-op casts.
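
For illustration, below is a hedged sketch of the intermediate state after the
function signature has been converted but before the dialect of
`"consumer.op"` (an illustrative unconverted operation) has been lowered:

```mlir
// The converted function receives a descriptor, but the unconverted consumer
// still expects a memref value, so a cast is materialized between them.
llvm.func @f(%arg0: !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                  array<1 x i64>, array<1 x i64>)>) {
  %m = builtin.unrealized_conversion_cast %arg0
     : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
       to memref<?xf32>
  "consumer.op"(%m) : (memref<?xf32>) -> ()
  llvm.return
}
// Once the consumer is converted as well, this cast is paired with a cast in
// the opposite direction; the pair becomes a no-op and is deleted by
// -reconcile-unrealized-casts.
```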

### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and
extend this type converter to support other types. Extra care must be taken if
the conversion rules for built-in types are overridden: all conversions must
use the same type converter.

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

Complex type is converted into an LLVM dialect literal structure type with two
elements:

-   real part;
-   imaginary part.

The elemental type is converted recursively using these rules.

Example:

```mlir
complex<f32>
// ->
!llvm.struct<(f32, f32)>
```

#### Index Type

Index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to i64. This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
conversion passes.

Example:

```mlir
index
// -> on x86_64
i64
```

#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format. Memrefs with other, less
trivial layouts should be converted into the strided form first, e.g., by
materializing the non-trivial address remapping due to layout as `affine.apply`
operations.

The default memref descriptor is a struct with the following fields:

1.  The pointer to the data buffer as allocated, referred to as "allocated
    pointer". This is only useful for deallocating the memref.
2.  The pointer to the properly aligned data pointer that the memref indexes,
    referred to as "aligned pointer".
3.  A converted `index`-type integer containing the distance in number of
    elements between the beginning of the (aligned) buffer and the first
    element to be accessed through the memref, referred to as "offset".
4.  An array containing as many converted `index`-type integers as the rank of
    the memref: the array represents the size, in number of elements, of the
    memref along the given dimension.
5.  A second array containing as many converted `index`-type integers as the
    rank of the memref: the second array represents the "stride" (in the
    tensor abstraction sense), i.e. the number of consecutive elements of the
    underlying buffer one needs to jump over to get to the next logically
    indexed element.

For constant memref dimensions, the corresponding size entry is a constant
whose runtime value matches the static value. This normalization serves as an
ABI for the memref type to interoperate with externally linked functions. In
the particular case of rank `0` memrefs, the size and stride arrays are
omitted, resulting in a struct containing the two pointers and the offset.

Examples:

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types.
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>,
                                             ptr<vector<4 x f32>>, i64,
                                             array<2 x i64>, array<2 x i64>)>
```

#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *unranked descriptor*. It contains:

1.  a converted `index`-typed integer representing the dynamic rank of the
    memref;
2.  a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
    the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or
on the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack
allocations (`llvm.alloca`) being performed in a loop and not reclaimed until
the end of the current function.
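
Following these rules, and assuming `index` is converted to i64, an unranked
memref type converts as follows:

```mlir
memref<*xf32>
// ->
!llvm.struct<(i64, ptr<i8>)>
```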

#### Function Types

Function types are converted to LLVM dialect function types as follows:

-   function argument and result types are converted recursively using these
    rules;
-   if a function type has multiple results, they are wrapped into an LLVM
    dialect literal structure type since LLVM function types must have exactly
    one result;
-   if a function type has no results, the corresponding LLVM dialect function
    type will have one `!llvm.void` result since LLVM function types must have
    a result;
-   function types used in arguments of another function type are wrapped in
    an LLVM dialect pointer type to comply with LLVM IR expectations;
-   the structs corresponding to `memref` types, both ranked and unranked,
    appearing as function arguments are unbundled into individual function
    arguments to allow for specifying metadata such as aliasing information on
    individual pointers;
-   the conversion of `memref`-typed arguments is subject to
    [calling conventions](TargetLLVMIR.md#calling-conventions).

Examples:

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately.
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its results aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>
```

Conversion patterns are available to convert built-in function operations and
standard call operations targeting those functions using these conversion
rules.

#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR. In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the
same size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array
types of one-dimensional vectors.

Examples:

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
              i64, array<1 x i64>, array<1 x i64>)>
```

#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors
must be [bufferized](Bufferization.md) before being converted.
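
For illustration, a hedged sketch of this step (the exact ops depend on the
bufferization pipeline in use):

```mlir
// Before: operates on tensor values; not convertible to the LLVM dialect.
%0 = tensor.extract %t[%i] : tensor<4xf32>

// After bufferization: operates on memrefs and is convertible as described
// above.
%m = bufferization.to_memref %t : memref<4xf32>
%1 = memref.load %m[%i] : memref<4xf32>
```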

### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of
function and function call operations without changing how individual types
are handled elsewhere. They are implemented simultaneously by the default type
converter and by the conversion patterns for the relevant operations.

#### Function Result Packing

In case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site. This transformation is a part of the conversion and is transparent to
the definitions and uses of the values being returned.

Example:

```mlir
func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
  return
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
  llvm.return
}
```

#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs
[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them
into individual scalar arguments.

This convention is implemented in the conversion of `builtin.func` and
`func.call` to the LLVM dialect, with the former unpacking the descriptor into
a set of individual values and the latter packing those values back into a
descriptor so as to make it transparently usable by other operations.
Conversions from other dialects should take this convention into account.

This specific convention is motivated by the necessity to specify alignment
and aliasing attributes on the raw pointers underpinning the memref.

Examples:

```mlir
func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
```

#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note
that while the *calling convention* does not require allocation, *casting* to
unranked memref does since one cannot take an address of an SSA value
containing the ranked memref, which must be stored in some memory instead. The
caller is in charge of ensuring the thread safety and management of the
allocated memory, in particular the deallocation.

Example:

```mlir
func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,              // Rank.
               %arg1: !llvm.ptr<i8>) {  // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
```

```mlir
func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention,
this memory is allocated on the stack and has the lifetime of the function.
(*Note:* due to function-length lifetime, creation of multiple unranked memref
descriptors, e.g., in a loop, may lead to stack overflows.) If an unranked
descriptor has to be returned from a function, the ranked descriptor it points
to is copied into dynamically allocated memory, and the pointer in the
unranked descriptor is updated accordingly. The allocation happens immediately
before returning. It is the responsibility of the caller to free the
dynamically allocated memory. The default conversion of `func.call` and
`func.call_indirect` copies the ranked descriptor to newly allocated memory on
the caller's stack. Thus, the convention of the ranked memref descriptor
pointed to by an unranked memref descriptor being stored on the stack is
respected.
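
A hedged sketch of the code emitted before such a return, assuming a declared
`llvm.func @malloc(i64) -> !llvm.ptr<i8>` and with `%stack_desc` standing for
the type-erased pointer to the stack-allocated ranked descriptor (the size
constant is illustrative):

```mlir
// Copy the ranked descriptor to the heap immediately before returning.
%size = llvm.mlir.constant(48 : i64) : i64
%heap = llvm.call @malloc(%size) : (i64) -> !llvm.ptr<i8>
%false = llvm.mlir.constant(false) : i1
"llvm.intr.memcpy"(%heap, %stack_desc, %size, %false)
    : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
// The pointer field of the returned unranked descriptor is set to %heap; the
// caller is responsible for freeing it.
```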

#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function
arguments to a *single* pointer to the aligned data. Note that this does *not*
apply to uses of `memref` outside of function signatures; the default
descriptor structures are still used there. This convention further restricts
the supported cases to the following.

-   `memref` types with default layout.
-   `memref` types with all dimensions statically known.
-   `memref` values allocated in such a way that the allocated and aligned
    pointers match. Alternatively, the same function must handle allocation
    and deallocation since only one pointer is passed to any callee.

Examples:

```mlir
func @callee(memref<2x4xf32>)

func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
  return
}

// ->

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointers are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values; a
  // contiguous 2x4 memref has sizes [2, 4] and strides [4, 1].
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
  llvm.return
}
```

#### Bare Pointer Calling Convention For Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as
their shape cannot be known at compile time.

### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions
with a single argument corresponding to a memref. When interfacing with LLVM
IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct
compatible with the data types produced by Clang when compiling C sources. The
generation of such wrapper functions can additionally be controlled at a
function granularity by setting the `llvm.emit_c_interface` unit attribute.
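
For example, attaching the attribute to a function requests the wrapper for
that function only:

```mlir
func @foo(%arg0: memref<?xf32>) attributes { llvm.emit_c_interface } {
  return
}
```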

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function,
where `T` is the converted element type and `N` is the memref rank. This type
is compatible with that produced by Clang for the following C++ structure
template instantiations or their equivalents in C.

```cpp
template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
```

Furthermore, we also rewrite function results to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option does the following for *external* functions declared in
the MLIR module.

1.  Declare a new function `_mlir_ciface_<original name>` where memref
    arguments are converted to pointer-to-struct and the remaining arguments
    are converted as usual. Results are converted to a special argument if
    they are of struct type.
2.  Add a body to the original function (making it non-external) that
    1.  allocates memref descriptors,
    2.  populates them,
    3.  potentially allocates space for the result struct, and
    4.  passes the pointers to these into the newly declared interface
        function, then
    5.  collects the result of the call (potentially from the result struct),
        and
    6.  returns it to the caller.

For (non-external) functions defined in the MLIR module.

1.  Define a new function `_mlir_ciface_<original name>` where memref
    arguments are converted to pointer-to-struct and the remaining arguments
    are converted as usual. Results are converted to a special argument if
    they are of struct type.
2.  Populate the body of the newly defined function with IR that
    1.  loads descriptors from pointers;
    2.  unpacks the descriptors into individual non-aggregate values;
    3.  passes these values into the original function;
    4.  collects the results of the call, and
    5.  either copies the results into the result struct or returns them to
        the caller.

Examples:

```mlir
func @qux(%arg0: memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                  array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
     : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                          array<2xi64>, array<2xi64>)>>) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
```

```mlir
func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                             array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
        i64, i64) -> ()
  llvm.return
}
```

```mlir
func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                             array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr,
                            %arg1: !llvm.memref_2d_ptr) {
  %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64)
       -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d_ptr
  llvm.return
}
```

Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it will minimize the
effect of C compatibility on intra-module calls or calls between
MLIR-generated functions. In particular, when calling external functions from
an MLIR module in a (parallel) loop, storing a memref descriptor on the stack
can lead to stack exhaustion and/or concurrent access to the same address. The
auxiliary interface function serves as an allocation scope in this case.
Furthermore, when targeting accelerators with separate memory spaces such as
GPUs, stack-allocated descriptors passed by pointer would have to be
transferred to the device memory, which introduces significant overhead. In
such situations, auxiliary interface functions are executed on the host and
only pass the values through the device function invocation mechanism.

### Address Computation

Accesses to a memref element are transformed into an access to an element of
the buffer pointed to by the descriptor. The position of the element in the
buffer is calculated by linearizing memref indices in row-major order (the
lexically first index is the slowest varying, similar to C, but accounting for
strides). The computation of the linear address is emitted as arithmetic
operations in the LLVM IR dialect. Strides are extracted from the memref
descriptor.

Examples:

An access to a memref with indices:

```mlir
%0 = memref.load %m[%1, %2, %3, %4] : memref<?x?x4x8xf32, offset: ?>
```

is transformed into the equivalent of the following code (with `%m` standing
for the descriptor value corresponding to the memref after conversion):

```mlir
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes
// are dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue %m[4, 0]
    : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue %m[2]
    : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue %m[1]
    : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>

// Get the address of the data pointer.
%ptr = llvm.getelementptr %aligned[%addr7]
    : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>

// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr<f32>
```

For stores, the address computation code is identical and only the actual
store operation is different.

Note: the conversion does not perform any sort of common subexpression
elimination when emitting memref accesses.
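
For instance, two loads from the same location each emit their own copy of the
address arithmetic; a separate pass such as `-cse` has to be run if this is
undesirable:

```mlir
%0 = memref.load %m[%i] : memref<?xf32>
%1 = memref.load %m[%i] : memref<?xf32>
// -> two identical extractvalue/muli/addi/getelementptr chains after
// conversion, unless cleaned up by a later pass.
```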

### Utility Classes

Utility classes common to many conversions to the LLVM dialect can be found
under `lib/Conversion/LLVMCommon`. They include the following.

-   `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
-   `LLVMTypeConverter` implements the default type conversion as described
    above.
-   `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
    dialect-specific functionality.
-   `VectorConvertOpToLLVMPattern` extends the previous class to automatically
    unroll operations on higher-dimensional vectors into lists of operations
    on one-dimensional vectors.
-   `StructBuilder` provides a convenient API for building IR that creates or
    accesses values of LLVM dialect structure types; it is specialized by
    `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexStructBuilder`
    for the built-in types convertible to LLVM dialect structure types.

## Translation to LLVM IR

MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
operations can be translated to LLVM IR modules using the following scheme.

-   Module-level globals are translated to LLVM IR global values.
-   Module-level metadata are translated to LLVM IR metadata, which can be
    later augmented with additional metadata defined on specific ops.
-   All functions are declared in the module so that they can be referenced.
-   Each function is then translated separately and has access to the complete
    mappings between MLIR and LLVM IR globals, metadata, and functions.
-   Within a function, blocks are traversed in topological order and
    translated to LLVM IR basic blocks. In each basic block, PHI nodes are
    created for each of the block arguments, but not connected to their source
    blocks.
-   Within each block, operations are translated in their order. Each
    operation has access to the same mappings as the function and additionally
    to the mapping of values between MLIR and LLVM IR, including PHI nodes.
    Operations with regions are responsible for translating the regions they
    contain.
-   After operations in a function are translated, the PHI nodes of blocks in
    this function are connected to their source values, which are now
    available (a small example follows this list).
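
For instance, translating the following function requires a PHI node for the
block argument `%m` (the LLVM IR details described below are a sketch):

```mlir
llvm.func @max(%a: i64, %b: i64) -> i64 {
  %0 = llvm.icmp "sgt" %a, %b : i64
  llvm.cond_br %0, ^exit(%a : i64), ^other
^other:
  llvm.br ^exit(%b : i64)
^exit(%m: i64):
  llvm.return %m : i64
}
```

During translation, `^exit` becomes an LLVM IR basic block with a `phi i64`
node for `%m`; the PHI is created when the block is first visited, and its
incoming values (`%a` and `%b` from the respective predecessors) are connected
only after all operations in the function have been translated.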

The translation mechanism provides extension hooks for translating custom
operations to LLVM IR via a dialect interface
`LLVMTranslationDialectInterface`:

-   `convertOperation` translates an operation that belongs to the current
    dialect to LLVM IR given an `IRBuilderBase` and various mappings;
-   `amendOperation` performs additional actions on an operation if it
    contains a dialect attribute that belongs to the current dialect, for
    example sets up instruction-level metadata.

Dialects containing operations or attributes that want to be translated to
LLVM IR must provide an implementation of this interface and register it with
the system. Note that registration may happen without creating the dialect,
for example, in a separate library to avoid the need for the "main" dialect
library to depend on LLVM IR libraries. The implementations of these methods
may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them which holds the state of the translation and contains
numerous utilities.

Note that this extension mechanism is *intentionally restrictive*. LLVM IR has
a small, relatively stable set of instructions and types that MLIR intends to
model fully. Therefore, the extension mechanism is provided only for LLVM IR
constructs that are more often extended -- intrinsics and metadata. The
primary goal of the extension mechanism is to support sets of intrinsics, for
example those representing a particular instruction set. The extension
mechanism does not allow for customizing type or block translation, nor does
it support custom module-level operations. Such transformations should be
performed within MLIR and target the corresponding MLIR constructs.

## Translation from LLVM IR

An experimental flow allows one to import a substantially limited subset of
LLVM IR into MLIR, producing LLVM dialect operations.

```
mlir-translate -import-llvm filename.ll
```
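
For instance, importing an LLVM IR function `define i64 @add(i64 %a, i64 %b)`
that returns the sum of its arguments yields roughly the following (a sketch;
the importer may attach additional attributes):

```mlir
llvm.func @add(%arg0: i64, %arg1: i64) -> i64 {
  %0 = llvm.add %arg0, %arg1 : i64
  llvm.return %0 : i64
}
```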