# Buffer Deallocation - Internals

This section covers the internal functionality of the BufferDeallocation
transformation. The transformation consists of several passes. The main pass,
called BufferDeallocation, can be applied to MLIR programs via
`-buffer-deallocation`.

## Requirements

In order to use BufferDeallocation on an arbitrary dialect, several control-flow
interfaces have to be implemented when using custom operations. This is
particularly important to understand the implicit control-flow dependencies
between different parts of the input program. Without implementing the following
interfaces, control-flow relations cannot be discovered properly and the
resulting program can become invalid:

*   Branch-like terminators should implement the `BranchOpInterface` to query
    and manipulate associated operands.
*   Operations involving structured control flow have to implement the
    `RegionBranchOpInterface` to model inter-region control flow.
*   Terminators yielding values to their parent operation (in particular in the
    scope of nested regions within `RegionBranchOpInterface`-based operations)
    should implement the `ReturnLike` trait to represent logical “value
    returns”.
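
To make the roles concrete, the following sketch (a hypothetical function
mixing the upstream `cf` and `scf` dialects; `%r` is unused and exists only for
illustration) marks which of these interfaces each operation implements:

```mlir
func @interfaces(%cond: i1, %a: memref<2xf32>, %b: memref<2xf32>) {
  // `cf.cond_br` and `cf.br` implement the `BranchOpInterface`.
  cf.cond_br %cond, ^bb1, ^bb2
^bb1:
  cf.br ^bb3(%a : memref<2xf32>)
^bb2:
  cf.br ^bb3(%b : memref<2xf32>)
^bb3(%m: memref<2xf32>):
  // `scf.if` implements the `RegionBranchOpInterface`.
  %r = scf.if %cond -> (memref<2xf32>) {
    // `scf.yield` is a `ReturnLike` terminator.
    scf.yield %m : memref<2xf32>
  } else {
    scf.yield %m : memref<2xf32>
  }
  return
}
```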

The “std” and “scf” dialects are examples of dialects that are fully compatible
with respect to all implemented interfaces.

During bufferization, we convert immutable value types (tensors) to mutable
types (memrefs). This conversion is done in several steps, and in all of these
steps the IR has to fulfill SSA-like properties. The usage of a memref has to
follow this consecutive order: allocation, write buffer, read buffer. Only
buffer reads are allowed after the initial full buffer write is done. In
particular, there must be no partial write to a buffer after the initial write
has been finished. However, partial writes during initialization are allowed
(e.g. filling the buffer step by step in a loop). This means that all buffer
writes need to dominate all buffer reads.
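
For illustration, the following hypothetical snippet obeys this order: the
buffer is allocated, fully initialized by partial writes inside a loop, and
only read afterwards. `partial_write` is a placeholder operation in the style
of the examples below:

```mlir
func @validOrder(%res: memref<4xf32>) {
  %0 = memref.alloc() : memref<4xf32>  // 1) allocation
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  scf.for %i = %c0 to %c4 step %c1 {
    partial_write(%0, %i)              // 2) initializing partial writes
  }
  // 3) reads only after the buffer has been fully written
  test.copy(%0, %res) : (memref<4xf32>, memref<4xf32>) -> ()
  return
}
```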

The following example breaks this invariant:

```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>) {
  %0 = memref.alloc() : memref<2xf32>
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  cf.br ^bb3()
^bb2:
  partial_write(%0, %0)
  cf.br ^bb3()
^bb3():
  test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

Maintaining these SSA-like properties is only required during the bufferization
process. Afterwards, for example in optimization passes, the property is no
longer needed.

## Detection of Buffer Allocations

The first step of the BufferDeallocation transformation is to identify
manageable allocation operations that implement the `SideEffects` interface.
Furthermore, these ops need to apply the effect `MemoryEffects::Allocate` to a
particular result value while not using the resource
`SideEffects::AutomaticAllocationScopeResource` (since it is currently reserved
for allocations, like `Alloca`, that will be automatically deallocated by a
parent scope). Allocations that have not been detected in this phase will not be
tracked internally, and thus, not deallocated automatically. However,
BufferDeallocation is fully compatible with “hybrid” setups in which tracked and
untracked allocations are mixed:

```mlir
func @mixedAllocation(%arg0: i1) {
  %0 = memref.alloca() : memref<2xf32>  // aliases: %2
  %1 = memref.alloc() : memref<2xf32>   // aliases: %2
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb2:
  use(%1)
  cf.br ^bb3(%1 : memref<2xf32>)
^bb3(%2: memref<2xf32>):
  ...
}
```

This example uses a conditional branch with both alloc and alloca.
BufferDeallocation can detect and handle the different allocation types that
might be intermixed.

Note: the current version does not support allocation operations returning
multiple result buffers.

## Conversion from AllocOp to AllocaOp

The PromoteBuffersToStack pass converts AllocOps to AllocaOps, if possible. In
some cases, it can be useful to use such stack-based buffers instead of
heap-based buffers. The conversion is restricted by several constraints:

*   Control flow
*   Buffer size
*   Dynamic size

If a buffer escapes its block, we are not allowed to convert it into an alloca.
Large buffers could be converted as well, but to avoid stack overflows it makes
sense to limit the size of these buffers and only convert small ones. The size
limit can be set via a pass option; the current default value is 1KB.
Furthermore, we cannot convert buffers with dynamic size, since the dimensions
are not known a priori.
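
As an illustration, consider the following hypothetical before/after sketch: a
small, statically sized, block-local buffer is promoted (assuming it fits the
size limit), whereas the dynamically sized buffer is left untouched:

```mlir
// Before the PromoteBuffersToStack pass:
func @promotion(%n: index) {
  %small = memref.alloc() : memref<2xf32>    // small and block-local: promotable
  %dyn   = memref.alloc(%n) : memref<?xf32>  // dynamic size: not promotable
  use(%small)
  use(%dyn)
  return
}

// After the pass, %small becomes a stack allocation:
func @promotion(%n: index) {
  %small = memref.alloca() : memref<2xf32>   // freed automatically on scope exit
  %dyn   = memref.alloc(%n) : memref<?xf32>
  use(%small)
  use(%dyn)
  return
}
```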

## Movement and Placement of Allocations

Using the buffer hoisting pass, all buffer allocations are moved as far upwards
as possible in order to group them and make upcoming optimizations easier by
limiting the search space. Such a movement is shown in the following graphs. In
addition, we are able to statically free an alloc if we move it into a
dominator of all of its uses. This simplifies further optimizations (e.g.
buffer fusion) in the future. However, the movement of allocations is limited
by external data dependencies (in particular in the case of allocations of
dynamically shaped types). Furthermore, allocations can be moved out of nested
regions, if necessary. In order to move allocations to valid locations with
respect to their uses only, we leverage liveness information.

The following code snippet shows a conditional branch before running the
BufferHoisting pass:

![branch_example_pre_move](/includes/img/branch_example_pre_move.svg)

```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  cf.br ^bb3(%arg1 : memref<2xf32>)
^bb2:
  %0 = memref.alloc() : memref<2xf32>  // aliases: %1
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):  // %1 could be %0 or %arg1
  test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

Applying the BufferHoisting pass on this program results in the following piece
of code:

![branch_example_post_move](/includes/img/branch_example_post_move.svg)

```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
  %0 = memref.alloc() : memref<2xf32>  // moved to bb0
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  cf.br ^bb3(%arg1 : memref<2xf32>)
^bb2:
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):
  test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

The alloc is moved from bb2 to the beginning of the function and is passed as an
argument to bb3.

The following example demonstrates an allocation using dynamically shaped types.
Due to the data dependency of the allocation on %0, we cannot move the
allocation out of bb2 in this case:

```mlir
func @condBranchDynamicType(
  %arg0: i1,
  %arg1: memref<?xf32>,
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:
  cf.br ^bb3(%arg1 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>  // cannot be moved upwards due to the
                                         // data dependency on %0
  use(%1)
  cf.br ^bb3(%1 : memref<?xf32>)
^bb3(%2: memref<?xf32>):
  test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
  return
}
```

## Introduction of Clones

In order to guarantee that all allocated buffers are freed properly, we have to
pay attention to the control flow and all potential aliases a buffer allocation
can have. Since not all allocations can be safely freed with respect to their
aliases (see the following code snippet), it is often required to introduce
copies to eliminate them. Consider the following example in which the
allocations have already been placed:

```mlir
func @branch(%arg0: i1) {
  %0 = memref.alloc() : memref<2xf32>  // aliases: %2
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  %1 = memref.alloc() : memref<2xf32>  // resides here for demonstration
                                       // purposes; aliases: %2
  cf.br ^bb3(%1 : memref<2xf32>)
^bb2:
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%2: memref<2xf32>):
  ...
  return
}
```

The first alloc can be safely freed after the live range of its post-dominator
block (bb3). The alloc in bb1 has an alias %2 in bb3 that also keeps this buffer
alive until the end of bb3. Since we cannot determine the actual branches that
will be taken at runtime, we have to ensure that all buffers are freed correctly
in bb3 regardless of the branches we will take to reach the exit block. This
makes it necessary to introduce a copy for %2, which allows us to free %0 in bb0
and %1 in bb1. Afterwards, we can continue processing all aliases of %2 (none in
this case) and we can safely free %2 at the end of the sample program. This
sample demonstrates that not all allocations can be safely freed in their
associated post-dominator blocks. Instead, we have to pay attention to all of
their aliases.

Applying the BufferDeallocation pass to the program above yields the following
result:

```mlir
func @branch(%arg0: i1) {
  %0 = memref.alloc() : memref<2xf32>
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  %1 = memref.alloc() : memref<2xf32>
  %3 = bufferization.clone %1 : (memref<2xf32>) -> (memref<2xf32>)
  memref.dealloc %1 : memref<2xf32>  // %1 can be safely freed here
  cf.br ^bb3(%3 : memref<2xf32>)
^bb2:
  use(%0)
  %4 = bufferization.clone %0 : (memref<2xf32>) -> (memref<2xf32>)
  cf.br ^bb3(%4 : memref<2xf32>)
^bb3(%2: memref<2xf32>):
  ...
  memref.dealloc %2 : memref<2xf32>  // free temp buffer %2
  memref.dealloc %0 : memref<2xf32>  // %0 can be safely freed here
  return
}
```

Note that a temporary buffer for %2 was introduced to free all allocations
properly. Note further that the unnecessary allocation of %3 can easily be
removed using one of the post-pass transformations or the canonicalization pass.

The presented example also works with dynamically shaped types.

BufferDeallocation performs a fix-point iteration that takes all aliases of all
tracked allocations into account. We initialize the general iteration process
using all tracked allocations and their associated aliases. As soon as we
encounter an alias that is not properly dominated by our allocation, we mark
this alias as *critical* (it needs to be freed and tracked by the internal
fix-point iteration). The following sample demonstrates the presence of critical
and non-critical aliases:

![nested_branch_example_pre_move](/includes/img/nested_branch_example_pre_move.svg)

```mlir
func @condBranchDynamicTypeNested(
  %arg0: i1,
  %arg1: memref<?xf32>,  // aliases: %3, %4
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:
  cf.br ^bb6(%arg1 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>  // cannot be moved upwards due to the
                                         // data dependency on %0
                                         // aliases: %2, %3, %4
  use(%1)
  cf.cond_br %arg0, ^bb3, ^bb4
^bb3:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb4:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>):  // non-crit. alias of %1, since %1 dominates %2
  cf.br ^bb6(%2 : memref<?xf32>)
^bb6(%3: memref<?xf32>):  // crit. alias of %arg1 and %2 (in other words %1)
  cf.br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):  // non-crit. alias of %3, since %3 dominates %4
  test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
  return
}
```

Applying BufferDeallocation yields the following output:

![nested_branch_example_post_move](/includes/img/nested_branch_example_post_move.svg)

```mlir
func @condBranchDynamicTypeNested(
  %arg0: i1,
  %arg1: memref<?xf32>,
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3 : index)
^bb1:
  // temp buffer required due to alias %3
  %5 = bufferization.clone %arg1 : (memref<?xf32>) -> (memref<?xf32>)
  cf.br ^bb6(%5 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>
  use(%1)
  cf.cond_br %arg0, ^bb3, ^bb4
^bb3:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb4:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>):
  %6 = bufferization.clone %1 : (memref<?xf32>) -> (memref<?xf32>)
  memref.dealloc %1 : memref<?xf32>
  cf.br ^bb6(%6 : memref<?xf32>)
^bb6(%3: memref<?xf32>):
  cf.br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):
  test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
  memref.dealloc %3 : memref<?xf32>  // free %3, since %4 is a non-crit. alias
  return
}
```

Since %3 is a critical alias, BufferDeallocation introduces an additional
temporary copy in all predecessor blocks. %3 has an additional (non-critical)
alias %4 that extends the live range until the end of bb7. Therefore, we can
free %3 after its last use, while taking all aliases into account. Note that %4
does not need to be freed, since we did not introduce a copy for it.

The actual introduction of buffer copies is done after the fix-point iteration
has terminated and all critical aliases have been detected. A critical alias can
be either a block argument or another value that is returned by an operation.
Copies for block arguments are handled by analyzing all predecessor blocks. This
is primarily done by querying the `BranchOpInterface` of the associated branch
terminators that can jump to the current block. Consider the following example,
which involves a simple branch and the critical block argument %2:

```mlir
  custom.br ^bb1(..., %0, : ...)
  ...
  custom.br ^bb1(..., %1, : ...)
  ...
^bb1(%2: memref<2xf32>):
  ...
```

The `BranchOpInterface` allows us to determine the actual values that will be
passed to block bb1 and its argument %2 by analyzing its predecessor blocks.
Once we have resolved the values %0 and %1 (that are associated with %2 in this
sample), we can introduce a temporary buffer and clone its contents into the new
buffer. Afterwards, we rewire the branch operands to use the newly allocated
buffer instead. However, blocks can have implicitly defined predecessors by
parent ops that implement the `RegionBranchOpInterface`. This can be the case if
this block argument belongs to the entry block of a region. In this setting, we
have to identify all predecessor regions defined by the parent operation. For
every region, we need to get all terminator operations implementing the
`ReturnLike` trait, indicating that they can branch to our current block.
Finally, we can use similar functionality as described above to add the
temporary copy. This time, we can modify the terminator operands directly
without touching a high-level interface.
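
Applied to the branch example above, the rewiring could look roughly as follows
(a sketch only; it reuses the imaginary “custom.br” operation and elided
operands from that snippet):

```mlir
  // Clone in each predecessor, then pass the clone to ^bb1 instead.
  %3 = bufferization.clone %0 : (memref<2xf32>) -> (memref<2xf32>)
  custom.br ^bb1(..., %3, : ...)
  ...
  %4 = bufferization.clone %1 : (memref<2xf32>) -> (memref<2xf32>)
  custom.br ^bb1(..., %4, : ...)
  ...
^bb1(%2: memref<2xf32>):
  ...
  memref.dealloc %2 : memref<2xf32>  // %2 can now be freed after its last use
```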

Consider the following inner-region control-flow sample that uses an imaginary
“custom.region_if” operation. It either executes the “then” or “else” region and
always continues to the “join” region. The “custom.region_if_yield” operation
returns a result to the parent operation. This sample demonstrates the use of
the `RegionBranchOpInterface` to determine predecessors in order to infer the
high-level control flow:

```mlir
func @inner_region_control_flow(
  %arg0 : index,
  %arg1 : index) -> memref<?x?xf32> {
  %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
    then(%arg2 : memref<?x?xf32>) {  // aliases: %arg4, %1
      custom.region_if_yield %arg2 : memref<?x?xf32>
    } else(%arg3 : memref<?x?xf32>) {  // aliases: %arg4, %1
      custom.region_if_yield %arg3 : memref<?x?xf32>
    } join(%arg4 : memref<?x?xf32>) {  // aliases: %1
      custom.region_if_yield %arg4 : memref<?x?xf32>
    }
  return %1 : memref<?x?xf32>
}
```

![region_branch_example_pre_move](/includes/img/region_branch_example_pre_move.svg)

Non-block arguments (other values) can become aliases when they are returned by
dialect-specific operations. BufferDeallocation supports this behavior via the
`RegionBranchOpInterface`. Consider the following example that uses an “scf.if”
operation to determine the value of %2 at runtime, which creates an alias:

```mlir
func @nested_region_control_flow(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = arith.cmpi "eq", %arg0, %arg1 : index
  %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %2 = scf.if %0 -> (memref<?x?xf32>) {
    scf.yield %1 : memref<?x?xf32>  // %2 will be an alias of %1
  } else {
    %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>  // nested allocation in a
                                                       // divergent branch
    use(%3)
    scf.yield %1 : memref<?x?xf32>  // %2 will be an alias of %1
  }
  return %2 : memref<?x?xf32>
}
```

In this example, a dealloc is inserted to release the buffer within the else
block, since it cannot be accessed by the remainder of the program. Accessing
the `RegionBranchOpInterface` allows us to infer that %2 is a non-critical alias
of %1 which does not need to be tracked.

```mlir
func @nested_region_control_flow(%arg0: index, %arg1: index) -> memref<?x?xf32> {
  %0 = arith.cmpi "eq", %arg0, %arg1 : index
  %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %2 = scf.if %0 -> (memref<?x?xf32>) {
    scf.yield %1 : memref<?x?xf32>
  } else {
    %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
    use(%3)
    memref.dealloc %3 : memref<?x?xf32>  // %3 can be safely freed here
    scf.yield %1 : memref<?x?xf32>
  }
  return %2 : memref<?x?xf32>
}
```

Analogous to the previous case, we have to detect all terminator operations in
all attached regions of “scf.if” that provide a value to their parent operation
(in this sample via scf.yield). Querying the `RegionBranchOpInterface` allows us
to determine the regions that “return” a result to their parent operation. Like
before, we have to update all `ReturnLike` terminators as described above.
Reconsider a slightly adapted version of the “custom.region_if” example from
above that uses a nested allocation:

```mlir
func @inner_region_control_flow_div(
  %arg0 : index,
  %arg1 : index) -> memref<?x?xf32> {
  %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
   then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1
    custom.region_if_yield %arg2 : memref<?x?xf32>
   } else(%arg3 : memref<?x?xf32>) {
    %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // aliases: %arg4, %1
    custom.region_if_yield %2 : memref<?x?xf32>
   } join(%arg4 : memref<?x?xf32>) { // aliases: %1
    custom.region_if_yield %arg4 : memref<?x?xf32>
   }
  return %1 : memref<?x?xf32>
}
```

Since the allocation %2 happens in a divergent branch and cannot be safely
deallocated in a post-dominating block, %arg4 will be considered a critical
alias. Furthermore, %arg4 is returned to its parent operation and has an alias
%1. This causes BufferDeallocation to introduce additional copies:

```mlir
func @inner_region_control_flow_div(
  %arg0 : index,
  %arg1 : index) -> memref<?x?xf32> {
  %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
   then(%arg2 : memref<?x?xf32>) {
    %4 = bufferization.clone %arg2 : (memref<?x?xf32>) -> (memref<?x?xf32>)
    custom.region_if_yield %4 : memref<?x?xf32>
   } else(%arg3 : memref<?x?xf32>) {
    %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
    %5 = bufferization.clone %2 : (memref<?x?xf32>) -> (memref<?x?xf32>)
    memref.dealloc %2 : memref<?x?xf32>
    custom.region_if_yield %5 : memref<?x?xf32>
   } join(%arg4: memref<?x?xf32>) {
    %4 = bufferization.clone %arg4 : (memref<?x?xf32>) -> (memref<?x?xf32>)
    memref.dealloc %arg4 : memref<?x?xf32>
    custom.region_if_yield %4 : memref<?x?xf32>
   }
  memref.dealloc %0 : memref<?x?xf32> // %0 can be safely freed here
  return %1 : memref<?x?xf32>
}
```

## Placement of Deallocs

After introducing allocs and copies, deallocs have to be placed to free
allocated memory and avoid memory leaks. The deallocation needs to take place
after the last use of the given value. The position can be determined by
calculating the common post-dominator of all values using their remaining
non-critical aliases. Back edges are a special case: such edges can cause
memory leaks when a newly allocated buffer flows back to another part of the
program. In these cases, we need to free the associated buffer instances from
the previous iteration by inserting additional deallocs.
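
As a rough illustration of this placement rule (a minimal sketch, not the
actual MLIR implementation), the legal dealloc positions can be found by
computing post-dominators with a standard iterative data-flow analysis and
intersecting the sets of all blocks that use the buffer or one of its aliases:

```python
def post_dominators(succs, exit_block):
    """Iteratively compute the set of post-dominators of every block."""
    blocks = list(succs)
    postdom = {b: set(blocks) for b in blocks}
    postdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == exit_block:
                continue
            # A block is post-dominated by itself and by everything that
            # post-dominates all of its successors.
            new = {b} | set.intersection(*(postdom[s] for s in succs[b]))
            if new != postdom[b]:
                postdom[b] = new
                changed = True
    return postdom

def common_post_dominators(postdom, use_blocks):
    """Blocks that post-dominate every use are legal dealloc positions."""
    return set.intersection(*(postdom[b] for b in use_blocks))

# Diamond CFG: entry -> {then, else} -> exit. A buffer used in both branches
# can only be freed in their common post-dominator "exit".
succs = {"entry": ["then", "else"], "then": ["exit"],
         "else": ["exit"], "exit": []}
pd = post_dominators(succs, "exit")
print(common_post_dominators(pd, ["then", "else"]))  # {'exit'}
```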

Consider the following “scf.for” use case containing a nested structured
control-flow if:

```mlir
func @loop_nested_if(
  %lb: index,
  %ub: index,
  %step: index,
  %buf: memref<2xf32>,
  %res: memref<2xf32>) {
  %0 = scf.for %i = %lb to %ub step %step
    iter_args(%iterBuf = %buf) -> memref<2xf32> {
    %1 = arith.cmpi "eq", %i, %ub : index
    %2 = scf.if %1 -> (memref<2xf32>) {
      %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias due to a
                                          // divergent allocation
      use(%3)
      scf.yield %3 : memref<2xf32>
    } else {
      scf.yield %iterBuf : memref<2xf32>
    }
    scf.yield %2 : memref<2xf32>
  }
  test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

In this example, the *then* branch of the nested “scf.if” operation returns a
newly allocated buffer.

Since this allocation happens in the scope of a divergent branch, %2 becomes a
critical alias that needs to be handled. As before, we have to eliminate this
alias by inserting copies of %3 and %iterBuf. This guarantees that %2 will be a
newly allocated buffer that is returned in each iteration. However, “returning”
%2 to its alias %iterBuf turns %iterBuf into a critical alias as well. In other
words, we have to create a copy of %2 to pass it to %iterBuf. Since this jump
represents a back edge, and %2 will always be a new buffer, we have to free the
buffer from the previous iteration to avoid memory leaks:

```mlir
func @loop_nested_if(
  %lb: index,
  %ub: index,
  %step: index,
  %buf: memref<2xf32>,
  %res: memref<2xf32>) {
  %4 = bufferization.clone %buf : (memref<2xf32>) -> (memref<2xf32>)
  %0 = scf.for %i = %lb to %ub step %step
    iter_args(%iterBuf = %4) -> memref<2xf32> {
    %1 = arith.cmpi "eq", %i, %ub : index
    %2 = scf.if %1 -> (memref<2xf32>) {
      %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias
      use(%3)
      %5 = bufferization.clone %3 : (memref<2xf32>) -> (memref<2xf32>)
      memref.dealloc %3 : memref<2xf32>
      scf.yield %5 : memref<2xf32>
    } else {
      %6 = bufferization.clone %iterBuf : (memref<2xf32>) -> (memref<2xf32>)
      scf.yield %6 : memref<2xf32>
    }
    %7 = bufferization.clone %2 : (memref<2xf32>) -> (memref<2xf32>)
    memref.dealloc %2 : memref<2xf32>
    memref.dealloc %iterBuf : memref<2xf32> // free backedge iteration variable
    scf.yield %7 : memref<2xf32>
  }
  test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
  memref.dealloc %0 : memref<2xf32> // free temp copy %0
  return
}
```

This is an example of loop-like control flow: the CFG contains back edges that
have to be handled to avoid memory leaks. The bufferization is able to free the
back-edge iteration variable %iterBuf.

## Private Analyses Implementations

The BufferDeallocation transformation relies on one primary control-flow
analysis: BufferPlacementAliasAnalysis. Furthermore, we also use dominance and
liveness information to place and move nodes. The liveness analysis determines
the live range of a given value. Within this range, the value is alive and can
or will be used in the course of the program; after it, the value is dead and
can be discarded, meaning in our case that the buffer can be freed. To place
the allocs, we need to know the position from which a value will be alive. The
allocs have to be placed in front of this position. However, the most important
analysis is the alias analysis that is needed to introduce copies and to place
all deallocations.
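
For intuition, the liveness query that drives dealloc placement within one
block boils down to finding the last operation that uses a value. A toy version
over a hypothetical operation representation (a plain list of dicts, not MLIR's
actual API) could look like this:

```python
def last_use_index(ops, value):
    """Walk the block backwards and return the index of the last operation
    that uses `value`; a dealloc can be inserted right after that point."""
    for i in range(len(ops) - 1, -1, -1):
        if value in ops[i]["operands"]:
            return i
    return None  # value is never used in this block

# A single block modeled as a list of hypothetical operations.
block = [
    {"name": "memref.alloc", "operands": [],     "results": ["%0"]},
    {"name": "use",          "operands": ["%0"], "results": []},
    {"name": "memref.load",  "operands": ["%0"], "results": ["%1"]},
    {"name": "return",       "operands": ["%1"], "results": []},
]
print(last_use_index(block, "%0"))  # 2: the dealloc goes after the load
```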

# Post Phase

In order to limit the complexity of the BufferDeallocation transformation, some
small code-polishing and optimization transformations are not applied on-the-fly
during placement. Currently, a canonicalization pattern is added to the clone
operation to reduce the appearance of unnecessary clones.

Note: further transformations might be added to the post-pass phase in the
future.

## Clone Canonicalization

During the placement of clones, it may happen that unnecessary clones are
inserted. If these clones appear with their corresponding dealloc operation
within the same block, we can use the canonicalizer to remove these unnecessary
operations. Note that this step needs to take place after the insertion of
clones and deallocs in the buffer deallocation step. The canonicalization
includes both the newly created target value of the clone operation and its
source.

## Canonicalization of the Source Buffer of the Clone Operation

In this case, the source of the clone operation can be used instead of its
target. The unused allocation and deallocation operations that are defined for
this clone operation are also removed. The following working example, generated
by the BufferDeallocation pass, allocates a buffer with dynamic size. A closer
analysis of this sample reveals that the clone and dealloc operations are
redundant and can be removed.

```mlir
func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
  %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = bufferization.clone %1 : (memref<?x?xf32>) -> (memref<?x?xf32>)
  memref.dealloc %1 : memref<?x?xf32>
  return %2 : memref<?x?xf32>
}
```

Will be transformed to:

```mlir
func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
  %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
  return %1 : memref<?x?xf32>
}
```

In this case, the additional clone %2 is replaced with its original source
buffer %1, and the associated dealloc operation of %1 is removed as well.
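
The rewrite above can be mimicked with a toy pattern (again over a hypothetical
list-of-dicts operation representation, not MLIR's actual pattern rewriter)
that folds a clone whose source buffer is deallocated immediately afterwards:

```python
def fold_source_clone(ops):
    """Fold `%t = clone %s; dealloc %s` pairs: drop both operations and
    replace every later use of %t with %s directly."""
    out, replace, skip = [], {}, set()
    for i, op in enumerate(ops):
        if i in skip:
            continue
        if (op["name"] == "bufferization.clone"
                and i + 1 < len(ops)
                and ops[i + 1]["name"] == "memref.dealloc"
                and ops[i + 1]["operands"] == op["operands"]):
            # The clone result now refers to the source buffer directly.
            replace[op["results"][0]] = op["operands"][0]
            skip.add(i + 1)  # drop the dealloc of the source as well
            continue
        out.append({**op,
                    "operands": [replace.get(v, v) for v in op["operands"]]})
    return out

block = [
    {"name": "memref.alloc",        "operands": [],     "results": ["%1"]},
    {"name": "bufferization.clone", "operands": ["%1"], "results": ["%2"]},
    {"name": "memref.dealloc",      "operands": ["%1"], "results": []},
    {"name": "return",              "operands": ["%2"], "results": []},
]
print([op["name"] for op in fold_source_clone(block)])
# ['memref.alloc', 'return']
```

This toy version only matches a dealloc that directly follows its clone; the
real canonicalization works on use-def chains within a block.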

## Canonicalization of the Target Buffer of the Clone Operation

In this case, the target buffer of the clone operation can be used instead of
its source. The unused deallocation operation that is defined for this clone
operation is also removed.

Consider the following example where a generic test operation writes the result
to %temp and then copies %temp to %result. However, these two operations can be
merged into a single step. Canonicalization removes the clone operation and
%temp, and replaces the uses of %temp with %result:

```mlir
func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
  %temp = memref.alloc() : memref<2xf32>
  test.generic {
    args_in = 1 : i64,
    args_out = 1 : i64,
    indexing_maps = [#map0, #map0],
    iterator_types = ["parallel"]} %arg0, %temp {
  ^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
    %tmp2 = math.exp %gen2_arg0 : f32
    test.yield %tmp2 : f32
  }: memref<2xf32>, memref<2xf32>
  %result = bufferization.clone %temp : (memref<2xf32>) -> (memref<2xf32>)
  memref.dealloc %temp : memref<2xf32>
  return
}
```

Will be transformed to:

```mlir
func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
  test.generic {
    args_in = 1 : i64,
    args_out = 1 : i64,
    indexing_maps = [#map0, #map0],
    iterator_types = ["parallel"]} %arg0, %result {
  ^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
    %tmp2 = math.exp %gen2_arg0 : f32
    test.yield %tmp2 : f32
  }: memref<2xf32>, memref<2xf32>
  return
}
```

## Known Limitations

BufferDeallocation introduces additional clones of buffers via the
“bufferization.clone” operation. Analogously, all deallocations use the
“memref.dealloc” operation. The actual copy process is realized using
“test.copy”. Furthermore, buffers are essentially immutable after their
creation in a block. Further limitations are known in the case of unstructured
control flow.