blob: 131e527a54a0c36ce140e25e255732be0717e36f [file] [log] [blame] [view]
Sean Silvae2d7d3c2021-01-09 00:25:471# Buffer Deallocation - Internals
2
3This section covers the internal functionality of the BufferDeallocation
4transformation. The transformation consists of several passes. The main pass
5called BufferDeallocation can be applied via “-buffer-deallocation on MLIR
6programs.
7
8## Requirements
9
Mogballa54f4ea2021-10-12 23:14:5710In order to use BufferDeallocation on an arbitrary dialect, several control-flow
11interfaces have to be implemented when using custom operations. This is
12particularly important to understand the implicit control-flow dependencies
13between different parts of the input program. Without implementing the following
14interfaces, control-flow relations cannot be discovered properly and the
15resulting program can become invalid:
Sean Silvae2d7d3c2021-01-09 00:25:4716
Mogballa54f4ea2021-10-12 23:14:5717* Branch-like terminators should implement the `BranchOpInterface` to query
18 and manipulate associated operands.
19* Operations involving structured control flow have to implement the
20 `RegionBranchOpInterface` to model inter-region control flow.
21* Terminators yielding values to their parent operation (in particular in the
22 scope of nested regions within `RegionBranchOpInterface`-based operations),
23 should implement the `ReturnLike` trait to represent logical value
24 returns”.
Sean Silvae2d7d3c2021-01-09 00:25:4725
Mogballa54f4ea2021-10-12 23:14:5726Example dialects that are fully compatible are the std and scf dialects with
27respect to all implemented interfaces.
Sean Silvae2d7d3c2021-01-09 00:25:4728
Julian Grossea51e7d2021-01-27 14:26:0729During Bufferization, we convert immutable value types (tensors) to mutable
30types (memref). This conversion is done in several steps and in all of these
Mogballa54f4ea2021-10-12 23:14:5731steps the IR has to fulfill SSA like properties. The usage of memref has to be
32in the following consecutive order: allocation, write-buffer, read- buffer. In
33this case, there are only buffer reads allowed after the initial full buffer
34write is done. In particular, there must be no partial write to a buffer after
35the initial write has been finished. However, partial writes in the initializing
36is allowed (fill buffer step by step in a loop e.g.). This means, all buffer
37writes needs to dominate all buffer reads.
Julian Grossea51e7d2021-01-27 14:26:0738
39Example for breaking the invariant:
40
41```mlir
42func @condBranch(%arg0: i1, %arg1: memref<2xf32>) {
43 %0 = memref.alloc() : memref<2xf32>
River Riddleace01602022-02-04 04:59:4344 cf.cond_br %arg0, ^bb1, ^bb2
Julian Grossea51e7d2021-01-27 14:26:0745^bb1:
River Riddleace01602022-02-04 04:59:4346 cf.br ^bb3()
Julian Grossea51e7d2021-01-27 14:26:0747^bb2:
48 partial_write(%0, %0)
River Riddleace01602022-02-04 04:59:4349 cf.br ^bb3()
Julian Grossea51e7d2021-01-27 14:26:0750^bb3():
Alexander Belyaev465b9a42021-03-31 07:34:0351 test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
Julian Grossea51e7d2021-01-27 14:26:0752 return
53}
54```
55
56The maintenance of the SSA like properties is only needed in the bufferization
57process. Afterwards, for example in optimization processes, the property is no
58longer needed.
59
Sean Silvae2d7d3c2021-01-09 00:25:4760## Detection of Buffer Allocations
61
62The first step of the BufferDeallocation transformation is to identify
63manageable allocation operations that implement the `SideEffects` interface.
64Furthermore, these ops need to apply the effect `MemoryEffects::Allocate` to a
65particular result value while not using the resource
66`SideEffects::AutomaticAllocationScopeResource` (since it is currently reserved
67for allocations, like `Alloca` that will be automatically deallocated by a
Mogballa54f4ea2021-10-12 23:14:5768parent scope). Allocations that have not been detected in this phase will not be
69tracked internally, and thus, not deallocated automatically. However,
70BufferDeallocation is fully compatible with hybrid setups in which tracked and
71untracked allocations are mixed:
Sean Silvae2d7d3c2021-01-09 00:25:4772
73```mlir
74func @mixedAllocation(%arg0: i1) {
Mogballa54f4ea2021-10-12 23:14:5775 %0 = memref.alloca() : memref<2xf32> // aliases: %2
76 %1 = memref.alloc() : memref<2xf32> // aliases: %2
River Riddleace01602022-02-04 04:59:4377 cf.cond_br %arg0, ^bb1, ^bb2
Sean Silvae2d7d3c2021-01-09 00:25:4778^bb1:
79 use(%0)
River Riddleace01602022-02-04 04:59:4380 cf.br ^bb3(%0 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:4781^bb2:
82 use(%1)
River Riddleace01602022-02-04 04:59:4383 cf.br ^bb3(%1 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:4784^bb3(%2: memref<2xf32>):
85 ...
86}
87```
88
89Example of using a conditional branch with alloc and alloca. BufferDeallocation
90can detect and handle the different allocation types that might be intermixed.
91
92Note: the current version does not support allocation operations returning
93multiple result buffers.
94
95## Conversion from AllocOp to AllocaOp
96
97The PromoteBuffersToStack-pass converts AllocOps to AllocaOps, if possible. In
98some cases, it can be useful to use such stack-based buffers instead of
99heap-based buffers. The conversion is restricted to several constraints like:
100
Mogballa54f4ea2021-10-12 23:14:57101* Control flow
102* Buffer Size
103* Dynamic Size
Sean Silvae2d7d3c2021-01-09 00:25:47104
Mogballa54f4ea2021-10-12 23:14:57105If a buffer is leaving a block, we are not allowed to convert it into an alloca.
106If the size of the buffer is large, we could convert it, but regarding stack
107overflow, it makes sense to limit the size of these buffers and only convert
108small ones. The size can be set via a pass option. The current default value is
1091KB. Furthermore, we can not convert buffers with dynamic size, since the
110dimension is not known a priori.
Sean Silvae2d7d3c2021-01-09 00:25:47111
112## Movement and Placement of Allocations
113
114Using the buffer hoisting pass, all buffer allocations are moved as far upwards
115as possible in order to group them and make upcoming optimizations easier by
Mogballa54f4ea2021-10-12 23:14:57116limiting the search space. Such a movement is shown in the following graphs. In
117addition, we are able to statically free an alloc, if we move it into a
118dominator of all of its uses. This simplifies further optimizations (e.g. buffer
119fusion) in the future. However, movement of allocations is limited by external
120data dependencies (in particular in the case of allocations of dynamically
121shaped types). Furthermore, allocations can be moved out of nested regions, if
122necessary. In order to move allocations to valid locations with respect to their
123uses only, we leverage Liveness information.
Sean Silvae2d7d3c2021-01-09 00:25:47124
125The following code snippets shows a conditional branch before running the
126BufferHoisting pass:
127
128![branch_example_pre_move](/includes/img/branch_example_pre_move.svg)
129
130```mlir
131func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
River Riddleace01602022-02-04 04:59:43132 cf.cond_br %arg0, ^bb1, ^bb2
Sean Silvae2d7d3c2021-01-09 00:25:47133^bb1:
River Riddleace01602022-02-04 04:59:43134 cf.br ^bb3(%arg1 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47135^bb2:
Alexander Belyaev465b9a42021-03-31 07:34:03136 %0 = memref.alloc() : memref<2xf32> // aliases: %1
Sean Silvae2d7d3c2021-01-09 00:25:47137 use(%0)
River Riddleace01602022-02-04 04:59:43138 cf.br ^bb3(%0 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47139^bb3(%1: memref<2xf32>): // %1 could be %0 or %arg1
Alexander Belyaev465b9a42021-03-31 07:34:03140 test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
Sean Silvae2d7d3c2021-01-09 00:25:47141 return
142}
143```
144
145Applying the BufferHoisting pass on this program results in the following piece
146of code:
147
148![branch_example_post_move](/includes/img/branch_example_post_move.svg)
149
150```mlir
151func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
Alexander Belyaev465b9a42021-03-31 07:34:03152 %0 = memref.alloc() : memref<2xf32> // moved to bb0
River Riddleace01602022-02-04 04:59:43153 cf.cond_br %arg0, ^bb1, ^bb2
Sean Silvae2d7d3c2021-01-09 00:25:47154^bb1:
River Riddleace01602022-02-04 04:59:43155 cf.br ^bb3(%arg1 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47156^bb2:
157 use(%0)
River Riddleace01602022-02-04 04:59:43158 cf.br ^bb3(%0 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47159^bb3(%1: memref<2xf32>):
Alexander Belyaev465b9a42021-03-31 07:34:03160 test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
Sean Silvae2d7d3c2021-01-09 00:25:47161 return
162}
163```
164
165The alloc is moved from bb2 to the beginning and it is passed as an argument to
166bb3.
167
Mogballa54f4ea2021-10-12 23:14:57168The following example demonstrates an allocation using dynamically shaped types.
169Due to the data dependency of the allocation to %0, we cannot move the
Sean Silvae2d7d3c2021-01-09 00:25:47170allocation out of bb2 in this case:
171
172```mlir
173func @condBranchDynamicType(
174 %arg0: i1,
175 %arg1: memref<?xf32>,
176 %arg2: memref<?xf32>,
177 %arg3: index) {
River Riddleace01602022-02-04 04:59:43178 cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
Sean Silvae2d7d3c2021-01-09 00:25:47179^bb1:
River Riddleace01602022-02-04 04:59:43180 cf.br ^bb3(%arg1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47181^bb2(%0: index):
Alexander Belyaev465b9a42021-03-31 07:34:03182 %1 = memref.alloc(%0) : memref<?xf32> // cannot be moved upwards to the data
Sean Silvae2d7d3c2021-01-09 00:25:47183 // dependency to %0
184 use(%1)
River Riddleace01602022-02-04 04:59:43185 cf.br ^bb3(%1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47186^bb3(%2: memref<?xf32>):
Alexander Belyaev465b9a42021-03-31 07:34:03187 test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
Sean Silvae2d7d3c2021-01-09 00:25:47188 return
189}
190```
191
Alexander Belyaev465b9a42021-03-31 07:34:03192## Introduction of Clones
Sean Silvae2d7d3c2021-01-09 00:25:47193
194In order to guarantee that all allocated buffers are freed properly, we have to
195pay attention to the control flow and all potential aliases a buffer allocation
196can have. Since not all allocations can be safely freed with respect to their
197aliases (see the following code snippet), it is often required to introduce
198copies to eliminate them. Consider the following example in which the
199allocations have already been placed:
200
201```mlir
202func @branch(%arg0: i1) {
Alexander Belyaev465b9a42021-03-31 07:34:03203 %0 = memref.alloc() : memref<2xf32> // aliases: %2
River Riddleace01602022-02-04 04:59:43204 cf.cond_br %arg0, ^bb1, ^bb2
Sean Silvae2d7d3c2021-01-09 00:25:47205^bb1:
Alexander Belyaev465b9a42021-03-31 07:34:03206 %1 = memref.alloc() : memref<2xf32> // resides here for demonstration purposes
Sean Silvae2d7d3c2021-01-09 00:25:47207 // aliases: %2
River Riddleace01602022-02-04 04:59:43208 cf.br ^bb3(%1 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47209^bb2:
210 use(%0)
River Riddleace01602022-02-04 04:59:43211 cf.br ^bb3(%0 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47212^bb3(%2: memref<2xf32>):
213
214 return
215}
216```
217
218The first alloc can be safely freed after the live range of its post-dominator
Mogballa54f4ea2021-10-12 23:14:57219block (bb3). The alloc in bb1 has an alias %2 in bb3 that also keeps this buffer
220alive until the end of bb3. Since we cannot determine the actual branches that
221will be taken at runtime, we have to ensure that all buffers are freed correctly
222in bb3 regardless of the branches we will take to reach the exit block. This
223makes it necessary to introduce a copy for %2, which allows us to free %alloc0
224in bb0 and %alloc1 in bb1. Afterwards, we can continue processing all aliases of
225%2 (none in this case) and we can safely free %2 at the end of the sample
226program. This sample demonstrates that not all allocations can be safely freed
227in their associated post-dominator blocks. Instead, we have to pay attention to
228all of their aliases.
Sean Silvae2d7d3c2021-01-09 00:25:47229
230Applying the BufferDeallocation pass to the program above yields the following
231result:
232
233```mlir
234func @branch(%arg0: i1) {
Alexander Belyaev465b9a42021-03-31 07:34:03235 %0 = memref.alloc() : memref<2xf32>
River Riddleace01602022-02-04 04:59:43236 cf.cond_br %arg0, ^bb1, ^bb2
Sean Silvae2d7d3c2021-01-09 00:25:47237^bb1:
Alexander Belyaev465b9a42021-03-31 07:34:03238 %1 = memref.alloc() : memref<2xf32>
Alexander Belyaev57470ab2021-11-25 10:42:16239 %3 = bufferization.clone %1 : (memref<2xf32>) -> (memref<2xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03240 memref.dealloc %1 : memref<2xf32> // %1 can be safely freed here
River Riddleace01602022-02-04 04:59:43241 cf.br ^bb3(%3 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47242^bb2:
243 use(%0)
Alexander Belyaev57470ab2021-11-25 10:42:16244 %4 = bufferization.clone %0 : (memref<2xf32>) -> (memref<2xf32>)
River Riddleace01602022-02-04 04:59:43245 cf.br ^bb3(%4 : memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47246^bb3(%2: memref<2xf32>):
247
Alexander Belyaev465b9a42021-03-31 07:34:03248 memref.dealloc %2 : memref<2xf32> // free temp buffer %2
249 memref.dealloc %0 : memref<2xf32> // %0 can be safely freed here
Sean Silvae2d7d3c2021-01-09 00:25:47250 return
251}
252```
253
254Note that a temporary buffer for %2 was introduced to free all allocations
255properly. Note further that the unnecessary allocation of %3 can be easily
Mogballa54f4ea2021-10-12 23:14:57256removed using one of the post-pass transformations or the canonicalization pass.
Sean Silvae2d7d3c2021-01-09 00:25:47257
Alexander Belyaev465b9a42021-03-31 07:34:03258The presented example also works with dynamically shaped types.
Sean Silvae2d7d3c2021-01-09 00:25:47259
260BufferDeallocation performs a fix-point iteration taking all aliases of all
261tracked allocations into account. We initialize the general iteration process
262using all tracked allocations and their associated aliases. As soon as we
263encounter an alias that is not properly dominated by our allocation, we mark
Mogballa54f4ea2021-10-12 23:14:57264this alias as *critical* (needs to be freed and tracked by the internal
265fix-point iteration). The following sample demonstrates the presence of critical
266and non-critical aliases:
Sean Silvae2d7d3c2021-01-09 00:25:47267
268![nested_branch_example_pre_move](/includes/img/nested_branch_example_pre_move.svg)
269
270```mlir
271func @condBranchDynamicTypeNested(
272 %arg0: i1,
273 %arg1: memref<?xf32>, // aliases: %3, %4
274 %arg2: memref<?xf32>,
275 %arg3: index) {
River Riddleace01602022-02-04 04:59:43276 cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
Sean Silvae2d7d3c2021-01-09 00:25:47277^bb1:
River Riddleace01602022-02-04 04:59:43278 cf.br ^bb6(%arg1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47279^bb2(%0: index):
Alexander Belyaev465b9a42021-03-31 07:34:03280 %1 = memref.alloc(%0) : memref<?xf32> // cannot be moved upwards due to the data
Sean Silvae2d7d3c2021-01-09 00:25:47281 // dependency to %0
282 // aliases: %2, %3, %4
283 use(%1)
River Riddleace01602022-02-04 04:59:43284 cf.cond_br %arg0, ^bb3, ^bb4
Sean Silvae2d7d3c2021-01-09 00:25:47285^bb3:
River Riddleace01602022-02-04 04:59:43286 cf.br ^bb5(%1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47287^bb4:
River Riddleace01602022-02-04 04:59:43288 cf.br ^bb5(%1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47289^bb5(%2: memref<?xf32>): // non-crit. alias of %1, since %1 dominates %2
River Riddleace01602022-02-04 04:59:43290 cf.br ^bb6(%2 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47291^bb6(%3: memref<?xf32>): // crit. alias of %arg1 and %2 (in other words %1)
River Riddleace01602022-02-04 04:59:43292 cf.br ^bb7(%3 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47293^bb7(%4: memref<?xf32>): // non-crit. alias of %3, since %3 dominates %4
Alexander Belyaev465b9a42021-03-31 07:34:03294 test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
Sean Silvae2d7d3c2021-01-09 00:25:47295 return
296}
297```
298
299Applying BufferDeallocation yields the following output:
300
301![nested_branch_example_post_move](/includes/img/nested_branch_example_post_move.svg)
302
303```mlir
304func @condBranchDynamicTypeNested(
305 %arg0: i1,
306 %arg1: memref<?xf32>,
307 %arg2: memref<?xf32>,
308 %arg3: index) {
River Riddleace01602022-02-04 04:59:43309 cf.cond_br %arg0, ^bb1, ^bb2(%arg3 : index)
Sean Silvae2d7d3c2021-01-09 00:25:47310^bb1:
Alexander Belyaev465b9a42021-03-31 07:34:03311 // temp buffer required due to alias %3
Alexander Belyaev57470ab2021-11-25 10:42:16312 %5 = bufferization.clone %arg1 : (memref<?xf32>) -> (memref<?xf32>)
River Riddleace01602022-02-04 04:59:43313 cf.br ^bb6(%5 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47314^bb2(%0: index):
Alexander Belyaev465b9a42021-03-31 07:34:03315 %1 = memref.alloc(%0) : memref<?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47316 use(%1)
River Riddleace01602022-02-04 04:59:43317 cf.cond_br %arg0, ^bb3, ^bb4
Sean Silvae2d7d3c2021-01-09 00:25:47318^bb3:
River Riddleace01602022-02-04 04:59:43319 cf.br ^bb5(%1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47320^bb4:
River Riddleace01602022-02-04 04:59:43321 cf.br ^bb5(%1 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47322^bb5(%2: memref<?xf32>):
Alexander Belyaev57470ab2021-11-25 10:42:16323 %6 = bufferization.clone %1 : (memref<?xf32>) -> (memref<?xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03324 memref.dealloc %1 : memref<?xf32>
River Riddleace01602022-02-04 04:59:43325 cf.br ^bb6(%6 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47326^bb6(%3: memref<?xf32>):
River Riddleace01602022-02-04 04:59:43327 cf.br ^bb7(%3 : memref<?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47328^bb7(%4: memref<?xf32>):
Alexander Belyaev465b9a42021-03-31 07:34:03329 test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
330 memref.dealloc %3 : memref<?xf32> // free %3, since %4 is a non-crit. alias of %3
Sean Silvae2d7d3c2021-01-09 00:25:47331 return
332}
333```
334
335Since %3 is a critical alias, BufferDeallocation introduces an additional
336temporary copy in all predecessor blocks. %3 has an additional (non-critical)
337alias %4 that extends the live range until the end of bb7. Therefore, we can
338free %3 after its last use, while taking all aliases into account. Note that %4
Alexander Belyaev465b9a42021-03-31 07:34:03339does not need to be freed, since we did not introduce a copy for it.
Sean Silvae2d7d3c2021-01-09 00:25:47340
341The actual introduction of buffer copies is done after the fix-point iteration
342has been terminated and all critical aliases have been detected. A critical
343alias can be either a block argument or another value that is returned by an
344operation. Copies for block arguments are handled by analyzing all predecessor
345blocks. This is primarily done by querying the `BranchOpInterface` of the
346associated branch terminators that can jump to the current block. Consider the
Mogballa54f4ea2021-10-12 23:14:57347following example which involves a simple branch and the critical block argument
348%2:
Sean Silvae2d7d3c2021-01-09 00:25:47349
350```mlir
351 custom.br ^bb1(..., %0, : ...)
352 ...
353 custom.br ^bb1(..., %1, : ...)
354 ...
355^bb1(%2: memref<2xf32>):
356 ...
357```
358
359The `BranchOpInterface` allows us to determine the actual values that will be
360passed to block bb1 and its argument %2 by analyzing its predecessor blocks.
361Once we have resolved the values %0 and %1 (that are associated with %2 in this
Mogballa54f4ea2021-10-12 23:14:57362sample), we can introduce a temporary buffer and clone its contents into the new
363buffer. Afterwards, we rewire the branch operands to use the newly allocated
364buffer instead. However, blocks can have implicitly defined predecessors by
365parent ops that implement the `RegionBranchOpInterface`. This can be the case if
366this block argument belongs to the entry block of a region. In this setting, we
367have to identify all predecessor regions defined by the parent operation. For
368every region, we need to get all terminator operations implementing the
369`ReturnLike` trait, indicating that they can branch to our current block.
370Finally, we can use a similar functionality as described above to add the
371temporary copy. This time, we can modify the terminator operands directly
372without touching a high-level interface.
Sean Silvae2d7d3c2021-01-09 00:25:47373
374Consider the following inner-region control-flow sample that uses an imaginary
Mogballa54f4ea2021-10-12 23:14:57375custom.region_if operation. It either executes the then or else region and
376always continues to the join region. The custom.region_if_yield operation
377returns a result to the parent operation. This sample demonstrates the use of
378the `RegionBranchOpInterface` to determine predecessors in order to infer the
379high-level control flow:
Sean Silvae2d7d3c2021-01-09 00:25:47380
381```mlir
382func @inner_region_control_flow(
383 %arg0 : index,
384 %arg1 : index) -> memref<?x?xf32> {
Alexander Belyaev465b9a42021-03-31 07:34:03385 %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47386 %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
387 then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1
388 custom.region_if_yield %arg2 : memref<?x?xf32>
389 } else(%arg3 : memref<?x?xf32>) { // aliases: %arg4, %1
390 custom.region_if_yield %arg3 : memref<?x?xf32>
391 } join(%arg4 : memref<?x?xf32>) { // aliases: %1
392 custom.region_if_yield %arg4 : memref<?x?xf32>
393 }
394 return %1 : memref<?x?xf32>
395}
396```
397
398![region_branch_example_pre_move](/includes/img/region_branch_example_pre_move.svg)
399
400Non-block arguments (other values) can become aliases when they are returned by
401dialect-specific operations. BufferDeallocation supports this behavior via the
402`RegionBranchOpInterface`. Consider the following example that uses an scf.if
403operation to determine the value of %2 at runtime which creates an alias:
404
405```mlir
406func @nested_region_control_flow(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
Mogballa54f4ea2021-10-12 23:14:57407 %0 = arith.cmpi "eq", %arg0, %arg1 : index
Alexander Belyaev465b9a42021-03-31 07:34:03408 %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47409 %2 = scf.if %0 -> (memref<?x?xf32>) {
410 scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1
411 } else {
Alexander Belyaev465b9a42021-03-31 07:34:03412 %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // nested allocation in a div.
Sean Silvae2d7d3c2021-01-09 00:25:47413 // branch
414 use(%3)
415 scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1
416 }
417 return %2 : memref<?x?xf32>
418}
419```
420
421In this example, a dealloc is inserted to release the buffer within the else
Mogballa54f4ea2021-10-12 23:14:57422block since it cannot be accessed by the remainder of the program. Accessing the
423`RegionBranchOpInterface`, allows us to infer that %2 is a non-critical alias of
424%1 which does not need to be tracked.
Sean Silvae2d7d3c2021-01-09 00:25:47425
426```mlir
427func @nested_region_control_flow(%arg0: index, %arg1: index) -> memref<?x?xf32> {
Mogballa54f4ea2021-10-12 23:14:57428 %0 = arith.cmpi "eq", %arg0, %arg1 : index
Alexander Belyaev465b9a42021-03-31 07:34:03429 %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47430 %2 = scf.if %0 -> (memref<?x?xf32>) {
431 scf.yield %1 : memref<?x?xf32>
432 } else {
Alexander Belyaev465b9a42021-03-31 07:34:03433 %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47434 use(%3)
Alexander Belyaev465b9a42021-03-31 07:34:03435 memref.dealloc %3 : memref<?x?xf32> // %3 can be safely freed here
Sean Silvae2d7d3c2021-01-09 00:25:47436 scf.yield %1 : memref<?x?xf32>
437 }
438 return %2 : memref<?x?xf32>
439}
440```
441
442Analogous to the previous case, we have to detect all terminator operations in
443all attached regions of scf.if that provides a value to its parent operation
Mogballa54f4ea2021-10-12 23:14:57444(in this sample via scf.yield). Querying the `RegionBranchOpInterface` allows us
445to determine the regions that return a result to their parent operation. Like
446before, we have to update all `ReturnLike` terminators as described above.
Sean Silvae2d7d3c2021-01-09 00:25:47447Reconsider a slightly adapted version of the custom.region_if example from
448above that uses a nested allocation:
449
450```mlir
451func @inner_region_control_flow_div(
452 %arg0 : index,
453 %arg1 : index) -> memref<?x?xf32> {
Alexander Belyaev465b9a42021-03-31 07:34:03454 %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47455 %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
456 then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1
457 custom.region_if_yield %arg2 : memref<?x?xf32>
458 } else(%arg3 : memref<?x?xf32>) {
Alexander Belyaev465b9a42021-03-31 07:34:03459 %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // aliases: %arg4, %1
Sean Silvae2d7d3c2021-01-09 00:25:47460 custom.region_if_yield %2 : memref<?x?xf32>
461 } join(%arg4 : memref<?x?xf32>) { // aliases: %1
462 custom.region_if_yield %arg4 : memref<?x?xf32>
463 }
464 return %1 : memref<?x?xf32>
465}
466```
467
468Since the allocation %2 happens in a divergent branch and cannot be safely
469deallocated in a post-dominator, %arg4 will be considered a critical alias.
Mogballa54f4ea2021-10-12 23:14:57470Furthermore, %arg4 is returned to its parent operation and has an alias %1. This
471causes BufferDeallocation to introduce additional copies:
Sean Silvae2d7d3c2021-01-09 00:25:47472
473```mlir
474func @inner_region_control_flow_div(
475 %arg0 : index,
476 %arg1 : index) -> memref<?x?xf32> {
Alexander Belyaev465b9a42021-03-31 07:34:03477 %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47478 %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
479 then(%arg2 : memref<?x?xf32>) {
Alexander Belyaev57470ab2021-11-25 10:42:16480 %4 = bufferization.clone %arg2 : (memref<?x?xf32>) -> (memref<?x?xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47481 custom.region_if_yield %4 : memref<?x?xf32>
482 } else(%arg3 : memref<?x?xf32>) {
Alexander Belyaev465b9a42021-03-31 07:34:03483 %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
Alexander Belyaev57470ab2021-11-25 10:42:16484 %5 = bufferization.clone %2 : (memref<?x?xf32>) -> (memref<?x?xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03485 memref.dealloc %2 : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47486 custom.region_if_yield %5 : memref<?x?xf32>
487 } join(%arg4: memref<?x?xf32>) {
Alexander Belyaev57470ab2021-11-25 10:42:16488 %4 = bufferization.clone %arg4 : (memref<?x?xf32>) -> (memref<?x?xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03489 memref.dealloc %arg4 : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47490 custom.region_if_yield %4 : memref<?x?xf32>
491 }
Alexander Belyaev465b9a42021-03-31 07:34:03492 memref.dealloc %0 : memref<?x?xf32> // %0 can be safely freed here
Sean Silvae2d7d3c2021-01-09 00:25:47493 return %1 : memref<?x?xf32>
494}
495```
496
497## Placement of Deallocs
498
499After introducing allocs and copies, deallocs have to be placed to free
500allocated memory and avoid memory leaks. The deallocation needs to take place
501after the last use of the given value. The position can be determined by
502calculating the common post-dominator of all values using their remaining
503non-critical aliases. A special-case is the presence of back edges: since such
Mogballa54f4ea2021-10-12 23:14:57504edges can cause memory leaks when a newly allocated buffer flows back to another
505part of the program. In these cases, we need to free the associated buffer
506instances from the previous iteration by inserting additional deallocs.
Sean Silvae2d7d3c2021-01-09 00:25:47507
508Consider the following scf.for use case containing a nested structured
509control-flow if:
510
511```mlir
512func @loop_nested_if(
513 %lb: index,
514 %ub: index,
515 %step: index,
516 %buf: memref<2xf32>,
517 %res: memref<2xf32>) {
518 %0 = scf.for %i = %lb to %ub step %step
519 iter_args(%iterBuf = %buf) -> memref<2xf32> {
Mogballa54f4ea2021-10-12 23:14:57520 %1 = arith.cmpi "eq", %i, %ub : index
Sean Silvae2d7d3c2021-01-09 00:25:47521 %2 = scf.if %1 -> (memref<2xf32>) {
Alexander Belyaev465b9a42021-03-31 07:34:03522 %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias due to a
Sean Silvae2d7d3c2021-01-09 00:25:47523 // divergent allocation
524 use(%3)
525 scf.yield %3 : memref<2xf32>
526 } else {
527 scf.yield %iterBuf : memref<2xf32>
528 }
529 scf.yield %2 : memref<2xf32>
530 }
Alexander Belyaev465b9a42021-03-31 07:34:03531 test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
Sean Silvae2d7d3c2021-01-09 00:25:47532 return
533}
534```
535
Mogballa54f4ea2021-10-12 23:14:57536In this example, the *then* branch of the nested scf.if operation returns a
Sean Silvae2d7d3c2021-01-09 00:25:47537newly allocated buffer.
538
539Since this allocation happens in the scope of a divergent branch, %2 becomes a
Mogballa54f4ea2021-10-12 23:14:57540critical alias that needs to be handled. As before, we have to insert additional
541copies to eliminate this alias using copies of %3 and %iterBuf. This guarantees
542that %2 will be a newly allocated buffer that is returned in each iteration.
543However, returning %2 to its alias %iterBuf turns %iterBuf into a critical
544alias as well. In other words, we have to create a copy of %2 to pass it to
545%iterBuf. Since this jump represents a back edge, and %2 will always be a new
546buffer, we have to free the buffer from the previous iteration to avoid memory
547leaks:
Sean Silvae2d7d3c2021-01-09 00:25:47548
549```mlir
550func @loop_nested_if(
551 %lb: index,
552 %ub: index,
553 %step: index,
554 %buf: memref<2xf32>,
555 %res: memref<2xf32>) {
Alexander Belyaev57470ab2021-11-25 10:42:16556 %4 = bufferization.clone %buf : (memref<2xf32>) -> (memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47557 %0 = scf.for %i = %lb to %ub step %step
558 iter_args(%iterBuf = %4) -> memref<2xf32> {
Mogballa54f4ea2021-10-12 23:14:57559 %1 = arith.cmpi "eq", %i, %ub : index
Sean Silvae2d7d3c2021-01-09 00:25:47560 %2 = scf.if %1 -> (memref<2xf32>) {
Alexander Belyaev465b9a42021-03-31 07:34:03561 %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias
Sean Silvae2d7d3c2021-01-09 00:25:47562 use(%3)
Alexander Belyaev57470ab2021-11-25 10:42:16563 %5 = bufferization.clone %3 : (memref<2xf32>) -> (memref<2xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03564 memref.dealloc %3 : memref<2xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47565 scf.yield %5 : memref<2xf32>
566 } else {
Alexander Belyaev57470ab2021-11-25 10:42:16567 %6 = bufferization.clone %iterBuf : (memref<2xf32>) -> (memref<2xf32>)
Sean Silvae2d7d3c2021-01-09 00:25:47568 scf.yield %6 : memref<2xf32>
569 }
Alexander Belyaev57470ab2021-11-25 10:42:16570 %7 = bufferization.clone %2 : (memref<2xf32>) -> (memref<2xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03571 memref.dealloc %2 : memref<2xf32>
572 memref.dealloc %iterBuf : memref<2xf32> // free backedge iteration variable
Sean Silvae2d7d3c2021-01-09 00:25:47573 scf.yield %7 : memref<2xf32>
574 }
Alexander Belyaev465b9a42021-03-31 07:34:03575 test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
576 memref.dealloc %0 : memref<2xf32> // free temp copy %0
Sean Silvae2d7d3c2021-01-09 00:25:47577 return
578}
579```
580
581Example for loop-like control flow. The CFG contains back edges that have to be
582handled to avoid memory leaks. The bufferization is able to free the backedge
583iteration variable %iterBuf.
584
585## Private Analyses Implementations
586
587The BufferDeallocation transformation relies on one primary control-flow
588analysis: BufferPlacementAliasAnalysis. Furthermore, we also use dominance and
589liveness to place and move nodes. The liveness analysis determines the live
590range of a given value. Within this range, a value is alive and can or will be
591used in the course of the program. After this range, the value is dead and can
592be discarded - in our case, the buffer can be freed. To place the allocs, we
593need to know from which position a value will be alive. The allocs have to be
594placed in front of this position. However, the most important analysis is the
595alias analysis that is needed to introduce copies and to place all
596deallocations.
597
598# Post Phase
599
600In order to limit the complexity of the BufferDeallocation transformation, some
601tiny code-polishing/optimization transformations are not applied on-the-fly
Alexander Belyaev465b9a42021-03-31 07:34:03602during placement. Currently, a canonicalization pattern is added to the clone
603operation to reduce the appearance of unnecessary clones.
Sean Silvae2d7d3c2021-01-09 00:25:47604
605Note: further transformations might be added to the post-pass phase in the
606future.
607
Alexander Belyaev465b9a42021-03-31 07:34:03608## Clone Canonicalization
Sean Silvae2d7d3c2021-01-09 00:25:47609
Alexander Belyaev465b9a42021-03-31 07:34:03610During placement of clones it may happen, that unnecessary clones are inserted.
611If these clones appear with their corresponding dealloc operation within the
612same block, we can use the canonicalizer to remove these unnecessary operations.
613Note, that this step needs to take place after the insertion of clones and
Mogballa54f4ea2021-10-12 23:14:57614deallocs in the buffer deallocation step. The canonicalization inludes both, the
615newly created target value from the clone operation and the source operation.
Sean Silvae2d7d3c2021-01-09 00:25:47616
Alexander Belyaev465b9a42021-03-31 07:34:03617## Canonicalization of the Source Buffer of the Clone Operation
Sean Silvae2d7d3c2021-01-09 00:25:47618
Alexander Belyaev465b9a42021-03-31 07:34:03619In this case, the source of the clone operation can be used instead of its
620target. The unused allocation and deallocation operations that are defined for
621this clone operation are also removed. Here is a working example generated by
622the BufferDeallocation pass that allocates a buffer with dynamic size. A deeper
Sean Silvae2d7d3c2021-01-09 00:25:47623analysis of this sample reveals that the highlighted operations are redundant
624and can be removed.
625
626```mlir
627func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
Alexander Belyaev465b9a42021-03-31 07:34:03628 %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
Alexander Belyaev57470ab2021-11-25 10:42:16629 %2 = bufferization.clone %1 : (memref<?x?xf32>) -> (memref<?x?xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03630 memref.dealloc %1 : memref<?x?xf32>
631 return %2 : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47632}
633```
634
635Will be transformed to:
636
637```mlir
638func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
Alexander Belyaev465b9a42021-03-31 07:34:03639 %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
640 return %1 : memref<?x?xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47641}
642```
643
Alexander Belyaev465b9a42021-03-31 07:34:03644In this case, the additional copy %2 can be replaced with its original source
645buffer %1. This also applies to the associated dealloc operation of %1.
Sean Silvae2d7d3c2021-01-09 00:25:47646
Alexander Belyaev465b9a42021-03-31 07:34:03647## Canonicalization of the Target Buffer of the Clone Operation
Sean Silvae2d7d3c2021-01-09 00:25:47648
Alexander Belyaev465b9a42021-03-31 07:34:03649In this case, the target buffer of the clone operation can be used instead of
650its source. The unused deallocation operation that is defined for this clone
651operation is also removed.
Sean Silvae2d7d3c2021-01-09 00:25:47652
Alexander Belyaev465b9a42021-03-31 07:34:03653Consider the following example where a generic test operation writes the result
Mogballa54f4ea2021-10-12 23:14:57654to %temp and then copies %temp to %result. However, these two operations can be
655merged into a single step. Canonicalization removes the clone operation and
656%temp, and replaces the uses of %temp with %result:
Sean Silvae2d7d3c2021-01-09 00:25:47657
658```mlir
659func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
Alexander Belyaev465b9a42021-03-31 07:34:03660 %temp = memref.alloc() : memref<2xf32>
661 test.generic {
Sean Silvae2d7d3c2021-01-09 00:25:47662 args_in = 1 : i64,
663 args_out = 1 : i64,
664 indexing_maps = [#map0, #map0],
665 iterator_types = ["parallel"]} %arg0, %temp {
666 ^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
Mogballa54f4ea2021-10-12 23:14:57667 %tmp2 = math.exp %gen2_arg0 : f32
Alexander Belyaev465b9a42021-03-31 07:34:03668 test.yield %tmp2 : f32
Sean Silvae2d7d3c2021-01-09 00:25:47669 }: memref<2xf32>, memref<2xf32>
Alexander Belyaev57470ab2021-11-25 10:42:16670 %result = bufferization.clone %temp : (memref<2xf32>) -> (memref<2xf32>)
Alexander Belyaev465b9a42021-03-31 07:34:03671 memref.dealloc %temp : memref<2xf32>
Sean Silvae2d7d3c2021-01-09 00:25:47672 return
673}
674```
675
676Will be transformed to:
677
678```mlir
679func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
Alexander Belyaev465b9a42021-03-31 07:34:03680 test.generic {
Sean Silvae2d7d3c2021-01-09 00:25:47681 args_in = 1 : i64,
682 args_out = 1 : i64,
683 indexing_maps = [#map0, #map0],
684 iterator_types = ["parallel"]} %arg0, %result {
685 ^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
Mogballa54f4ea2021-10-12 23:14:57686 %tmp2 = math.exp %gen2_arg0 : f32
Alexander Belyaev465b9a42021-03-31 07:34:03687 test.yield %tmp2 : f32
Sean Silvae2d7d3c2021-01-09 00:25:47688 }: memref<2xf32>, memref<2xf32>
689 return
690}
691```
692
Sean Silvae2d7d3c2021-01-09 00:25:47693## Known Limitations
694
Alexander Belyaev465b9a42021-03-31 07:34:03695BufferDeallocation introduces additional clones from memref dialect
Alexander Belyaev57470ab2021-11-25 10:42:16696(“bufferization.clone”). Analogous, all deallocations use the memref
697dialect-free operation memref.dealloc”. The actual copy process is realized
698using test.copy”. Furthermore, buffers are essentially immutable after their
699creation in a block. Another limitations are known in the case using
700unstructered control flow.