# Buffer Deallocation - Internals

This section covers the internal functionality of the BufferDeallocation
transformation. The transformation consists of several passes. The main pass,
called BufferDeallocation, can be applied via “-buffer-deallocation” on MLIR
programs.

## Requirements

In order to use BufferDeallocation on an arbitrary dialect, several control-flow
interfaces have to be implemented when using custom operations. This is
particularly important for understanding the implicit control-flow dependencies
between different parts of the input program. Without implementing the following
interfaces, control-flow relations cannot be discovered properly and the
resulting program can become invalid:

* Branch-like terminators should implement the `BranchOpInterface` to query
  and manipulate associated operands.
* Operations involving structured control flow have to implement the
  `RegionBranchOpInterface` to model inter-region control flow.
* Terminators yielding values to their parent operation (in particular in the
  scope of nested regions within `RegionBranchOpInterface`-based operations)
  should implement the `ReturnLike` trait to represent logical “value
  returns”.

Example dialects that are fully compatible are the “std” and “scf” dialects with
respect to all implemented interfaces.

During Bufferization, we convert immutable value types (tensors) to mutable
types (memref). This conversion is done in several steps, and in all of these
steps the IR has to fulfill SSA-like properties. The usage of a memref has to
follow this order: allocation, buffer write, buffer read. Only buffer reads are
allowed after the initial full buffer write is done. In particular, there must
be no partial write to a buffer after the initial write has finished. However,
partial writes during initialization are allowed (e.g. filling the buffer step
by step in a loop). This means that all buffer writes need to dominate all
buffer reads.

Example for breaking the invariant:

```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>) {
  %0 = memref.alloc() : memref<2xf32>
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  cf.br ^bb3()
^bb2:
  partial_write(%0, %0)
  cf.br ^bb3()
^bb3():
  test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

The SSA-like properties only need to be maintained during the bufferization
process. Afterwards, for example in optimization processes, the property is no
longer needed.

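The invariant can be illustrated with a small block-level check. This is only a sketch in Python, not part of MLIR: the CFG is modeled as a plain dictionary, the helper names `dominators` and `writes_dominate_reads` are hypothetical, and the ordering of operations inside a single block is ignored.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it} for a CFG given as
    {block: [successors]}, using the classic iterative data-flow scheme."""
    blocks = set(cfg)
    preds = {b: [p for p in cfg if b in cfg[p]] for b in blocks}
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def writes_dominate_reads(cfg, entry, write_blocks, read_blocks):
    """Block-level check: every block that writes the buffer must dominate
    every block that reads it."""
    dom = dominators(cfg, entry)
    return all(w in dom[r] for w in write_blocks for r in read_blocks)


# CFG of @condBranch from the example above.
cfg = {"bb0": ["bb1", "bb2"], "bb1": ["bb3"], "bb2": ["bb3"], "bb3": []}

# partial_write happens in bb2, test.copy reads %0 in bb3.
print(writes_dominate_reads(cfg, "bb0", ["bb2"], ["bb3"]))  # False
# A write placed in bb0 instead would satisfy the invariant.
print(writes_dominate_reads(cfg, "bb0", ["bb0"], ["bb3"]))  # True
```

The first query fails because bb3 can also be reached through bb1, so the write in bb2 does not dominate the read.
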
## Detection of Buffer Allocations

The first step of the BufferDeallocation transformation is to identify
manageable allocation operations that implement the `SideEffects` interface.
Furthermore, these ops need to apply the effect `MemoryEffects::Allocate` to a
particular result value while not using the resource
`SideEffects::AutomaticAllocationScopeResource` (since it is currently reserved
for allocations, like `Alloca`, that will be automatically deallocated by a
parent scope). Allocations that have not been detected in this phase will not be
tracked internally, and thus, not deallocated automatically. However,
BufferDeallocation is fully compatible with “hybrid” setups in which tracked and
untracked allocations are mixed:

```mlir
func @mixedAllocation(%arg0: i1) {
  %0 = memref.alloca() : memref<2xf32>  // aliases: %2
  %1 = memref.alloc() : memref<2xf32>  // aliases: %2
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb2:
  use(%1)
  cf.br ^bb3(%1 : memref<2xf32>)
^bb3(%2: memref<2xf32>):
  ...
}
```

The example above uses a conditional branch with both alloc and alloca.
BufferDeallocation can detect and handle the different allocation types that
might be intermixed.

Note: the current version does not support allocation operations returning
multiple result buffers.

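The detection rule can be sketched as a simple filter. The following Python model is an illustration only: `Op`, the effect triples, and the string constants are hypothetical stand-ins for the `SideEffects` interface, `MemoryEffects::Allocate`, and `SideEffects::AutomaticAllocationScopeResource`, and do not mirror the real C++ API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the effect and resource kinds named in the text.
ALLOCATE = "allocate"
DEFAULT_RESOURCE = "default"
AUTOMATIC_SCOPE = "automatic-allocation-scope"


@dataclass
class Op:
    name: str
    results: list
    # (effect kind, affected value, resource) triples reported by the op.
    effects: list = field(default_factory=list)


def tracked_allocations(ops):
    """Collect result values whose Allocate effect is not tied to the
    automatic allocation scope resource."""
    tracked = []
    for op in ops:
        for kind, value, resource in op.effects:
            if (kind == ALLOCATE and value in op.results
                    and resource != AUTOMATIC_SCOPE):
                tracked.append(value)
    return tracked


# The two allocations of @mixedAllocation plus an op without effects.
ops = [
    Op("memref.alloca", ["%0"], [(ALLOCATE, "%0", AUTOMATIC_SCOPE)]),
    Op("memref.alloc", ["%1"], [(ALLOCATE, "%1", DEFAULT_RESOURCE)]),
    Op("use", [], []),
]
print(tracked_allocations(ops))  # ['%1']
```

Only `%1` is tracked. The alloca result `%0` is left to its parent scope, which matches the “hybrid” setup described above.
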
## Conversion from AllocOp to AllocaOp

The PromoteBuffersToStack pass converts AllocOps to AllocaOps, if possible. In
some cases, it can be useful to use such stack-based buffers instead of
heap-based buffers. The conversion is subject to several constraints:

* Control flow
* Buffer size
* Dynamic size

If a buffer leaves its block, we are not allowed to convert it into an alloca.
If the buffer is large, we could convert it, but to avoid stack overflows it
makes sense to limit the size of these buffers and only convert small ones. The
size limit can be set via a pass option. The current default value is 1KB.
Furthermore, we cannot convert buffers with dynamic size, since the dimension
is not known a priori.

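The three constraints can be combined into a single predicate. The Python sketch below is a simplified model rather than the pass implementation: the function name, its parameters, and the inclusive comparison against the limit are assumptions made for illustration.

```python
from math import prod

MAX_ALLOCA_BYTES = 1024  # the text above gives 1KB as the current default


def can_promote_to_stack(shape, element_bytes, escapes_block,
                         max_bytes=MAX_ALLOCA_BYTES):
    """Decide whether a heap allocation may become a stack allocation.

    shape: list of dimension sizes, with None marking a dynamic dimension.
    element_bytes: size of one element in bytes.
    escapes_block: True if the buffer leaves the block that allocates it.
    """
    if escapes_block:
        return False  # control-flow constraint
    if any(dim is None for dim in shape):
        return False  # dynamic size is not known a priori
    # Buffer-size constraint (inclusive limit is an assumption of this sketch).
    return prod(shape) * element_bytes <= max_bytes


print(can_promote_to_stack([2], 4, escapes_block=False))     # True
print(can_promote_to_stack([1024], 4, escapes_block=False))  # False
print(can_promote_to_stack([None], 4, escapes_block=False))  # False
print(can_promote_to_stack([2], 4, escapes_block=True))      # False
```
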
## Movement and Placement of Allocations

Using the buffer hoisting pass, all buffer allocations are moved as far upwards
as possible in order to group them and make upcoming optimizations easier by
limiting the search space. Such a movement is shown in the following graphs. In
addition, we are able to statically free an alloc if we move it into a
dominator of all of its uses. This simplifies further optimizations (e.g. buffer
fusion) in the future. However, movement of allocations is limited by external
data dependencies (in particular in the case of allocations of dynamically
shaped types). Furthermore, allocations can be moved out of nested regions, if
necessary. In order to move allocations to valid locations with respect to their
uses only, we leverage liveness information.

The following code snippet shows a conditional branch before running the
BufferHoisting pass:

![branch_example_pre_move](/includes/img/branch_example_pre_move.svg)

```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  cf.br ^bb3(%arg1 : memref<2xf32>)
^bb2:
  %0 = memref.alloc() : memref<2xf32>  // aliases: %1
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):  // %1 could be %0 or %arg1
  test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

Applying the BufferHoisting pass on this program results in the following piece
of code:

![branch_example_post_move](/includes/img/branch_example_post_move.svg)

```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
  %0 = memref.alloc() : memref<2xf32>  // moved to bb0
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  cf.br ^bb3(%arg1 : memref<2xf32>)
^bb2:
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):
  test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

The alloc is moved from bb2 to the beginning, and it is still passed as an
argument to bb3.

The following example demonstrates an allocation using dynamically shaped types.
Due to the data dependency of the allocation on %0, we cannot move the
allocation out of bb2 in this case:

```mlir
func @condBranchDynamicType(
  %arg0: i1,
  %arg1: memref<?xf32>,
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:
  cf.br ^bb3(%arg1 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>  // cannot be moved upwards due to the
                                         // data dependency to %0
  use(%1)
  cf.br ^bb3(%1 : memref<?xf32>)
^bb3(%2: memref<?xf32>):
  test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
  return
}
```

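The difference between the two examples can be modeled by walking up the dominator tree until a data dependency blocks further movement. This Python sketch is a strong simplification: `hoist_target` and its arguments are hypothetical, the immediate dominators are given by hand, and the liveness and alias information used by the actual pass is ignored.

```python
def hoist_target(alloc_block, idom, dependency_blocks=()):
    """Walk up the dominator tree from alloc_block and return the highest
    block the allocation can move to.

    idom maps each block to its immediate dominator (None for the entry).
    dependency_blocks are the blocks that define the allocation's operands,
    for example its dynamic sizes.
    """
    block = alloc_block
    while block not in dependency_blocks and idom[block] is not None:
        block = idom[block]
    return block


# Immediate dominators shared by @condBranch and @condBranchDynamicType.
idom = {"bb0": None, "bb1": "bb0", "bb2": "bb0", "bb3": "bb0"}

# Static shape: no operands, so the alloc moves from bb2 up to bb0.
print(hoist_target("bb2", idom))  # bb0
# Dynamic shape: the size %0 is a block argument of bb2, so the alloc stays.
print(hoist_target("bb2", idom, dependency_blocks={"bb2"}))  # bb2
```
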
## Introduction of Clones

In order to guarantee that all allocated buffers are freed properly, we have to
pay attention to the control flow and all potential aliases a buffer allocation
can have. Since not all allocations can be safely freed with respect to their
aliases (see the following code snippet), it is often required to introduce
copies to eliminate them. Consider the following example in which the
allocations have already been placed:

```mlir
func @branch(%arg0: i1) {
  %0 = memref.alloc() : memref<2xf32>  // aliases: %2
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  %1 = memref.alloc() : memref<2xf32>  // resides here for demonstration purposes
                                       // aliases: %2
  cf.br ^bb3(%1 : memref<2xf32>)
^bb2:
  use(%0)
  cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%2: memref<2xf32>):
  …
  return
}
```

The first alloc can be safely freed after the live range of its post-dominator
block (bb3). The alloc in bb1 has an alias %2 in bb3 that also keeps this buffer
alive until the end of bb3. Since we cannot determine the actual branches that
will be taken at runtime, we have to ensure that all buffers are freed correctly
in bb3 regardless of the branches we will take to reach the exit block. This
makes it necessary to introduce a copy for %2, which allows us to free %0
(allocated in bb0) and %1 (allocated in bb1) independently of the alias.
Afterwards, we can continue processing all aliases of %2 (none in this case)
and we can safely free %2 at the end of the sample program. This sample
demonstrates that not all allocations can be safely freed in their associated
post-dominator blocks. Instead, we have to pay attention to all of their
aliases.

Applying the BufferDeallocation pass to the program above yields the following
result:

```mlir
func @branch(%arg0: i1) {
  %0 = memref.alloc() : memref<2xf32>
  cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
  %1 = memref.alloc() : memref<2xf32>
  %3 = bufferization.clone %1 : (memref<2xf32>) -> (memref<2xf32>)
  memref.dealloc %1 : memref<2xf32>  // %1 can be safely freed here
  cf.br ^bb3(%3 : memref<2xf32>)
^bb2:
  use(%0)
  %4 = bufferization.clone %0 : (memref<2xf32>) -> (memref<2xf32>)
  cf.br ^bb3(%4 : memref<2xf32>)
^bb3(%2: memref<2xf32>):
  …
  memref.dealloc %2 : memref<2xf32>  // free temp buffer %2
  memref.dealloc %0 : memref<2xf32>  // %0 can be safely freed here
  return
}
```

Note that a temporary buffer for %2 was introduced to free all allocations
properly. Note further that the unnecessary allocation of %3 can be easily
removed using one of the post-pass transformations or the canonicalization pass.

The presented example also works with dynamically shaped types.

BufferDeallocation performs a fix-point iteration taking all aliases of all
tracked allocations into account. We initialize the general iteration process
using all tracked allocations and their associated aliases. As soon as we
encounter an alias that is not properly dominated by our allocation, we mark
this alias as critical (it needs to be freed and tracked by the internal
fix-point iteration). The following sample demonstrates the presence of critical
and non-critical aliases:

![nested_branch_example_pre_move](/includes/img/nested_branch_example_pre_move.svg)

```mlir
func @condBranchDynamicTypeNested(
  %arg0: i1,
  %arg1: memref<?xf32>,  // aliases: %3, %4
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:
  cf.br ^bb6(%arg1 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>  // cannot be moved upwards due to the
                                         // data dependency to %0
                                         // aliases: %2, %3, %4
  use(%1)
  cf.cond_br %arg0, ^bb3, ^bb4
^bb3:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb4:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>):  // non-crit. alias of %1, since %1 dominates %2
  cf.br ^bb6(%2 : memref<?xf32>)
^bb6(%3: memref<?xf32>):  // crit. alias of %arg1 and %2 (in other words %1)
  cf.br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):  // non-crit. alias of %3, since %3 dominates %4
  test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
  return
}
```

Applying BufferDeallocation yields the following output:

![nested_branch_example_post_move](/includes/img/nested_branch_example_post_move.svg)

```mlir
func @condBranchDynamicTypeNested(
  %arg0: i1,
  %arg1: memref<?xf32>,
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3 : index)
^bb1:
  // temp buffer required due to alias %3
  %5 = bufferization.clone %arg1 : (memref<?xf32>) -> (memref<?xf32>)
  cf.br ^bb6(%5 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>
  use(%1)
  cf.cond_br %arg0, ^bb3, ^bb4
^bb3:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb4:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>):
  %6 = bufferization.clone %1 : (memref<?xf32>) -> (memref<?xf32>)
  memref.dealloc %1 : memref<?xf32>
  cf.br ^bb6(%6 : memref<?xf32>)
^bb6(%3: memref<?xf32>):
  cf.br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):
  test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
  memref.dealloc %3 : memref<?xf32>  // free %3, since %4 is a non-crit. alias of %3
  return
}
```

Since %3 is a critical alias, BufferDeallocation introduces an additional
temporary copy in all predecessor blocks. %3 has an additional (non-critical)
alias %4 that extends the live range until the end of bb7. Therefore, we can
free %3 after its last use, while taking all aliases into account. Note that %4
does not need to be freed, since we did not introduce a copy for it.

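The classification of %2, %3, and %4 can be reproduced with a small worklist model. The Python sketch below is an approximation under stated assumptions: dominance is checked between defining blocks only, the alias relation is supplied by hand, and the names `dominators` and `critical_aliases` are hypothetical rather than taken from the pass.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it}."""
    blocks = set(cfg)
    preds = {b: [p for p in cfg if b in cfg[p]] for b in blocks}
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def critical_aliases(cfg, entry, def_block, direct_aliases, allocations):
    """Return the aliases that are not dominated by the value they stem from.

    def_block: {value: block that defines it}
    direct_aliases: {value: [values it is forwarded to]}
    allocations: tracked allocation results that seed the iteration
    """
    dom = dominators(cfg, entry)
    critical = set()
    worklist = list(allocations)
    while worklist:
        root = worklist.pop()
        pending, seen = list(direct_aliases.get(root, [])), set()
        while pending:
            alias = pending.pop()
            if alias in seen:
                continue
            seen.add(alias)
            if def_block[root] in dom[def_block[alias]]:
                # Dominated by the root: non-critical, keep following it.
                pending.extend(direct_aliases.get(alias, []))
            elif alias not in critical:
                # Not dominated: critical, tracked like an allocation from now on.
                critical.add(alias)
                worklist.append(alias)
    return critical


# Model of @condBranchDynamicTypeNested.
cfg = {
    "bb0": ["bb1", "bb2"], "bb1": ["bb6"], "bb2": ["bb3", "bb4"],
    "bb3": ["bb5"], "bb4": ["bb5"], "bb5": ["bb6"], "bb6": ["bb7"], "bb7": [],
}
def_block = {"%1": "bb2", "%2": "bb5", "%3": "bb6", "%4": "bb7"}
direct_aliases = {"%1": ["%2"], "%2": ["%3"], "%3": ["%4"]}

print(critical_aliases(cfg, "bb0", def_block, direct_aliases, ["%1"]))  # {'%3'}
```

In this model %2 is dominated by the block of %1 and stays non-critical, %3 is not and becomes critical, and %4 is then checked against %3 instead of %1.
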
The actual introduction of buffer copies is done after the fix-point iteration
has terminated and all critical aliases have been detected. A critical
alias can be either a block argument or another value that is returned by an
operation. Copies for block arguments are handled by analyzing all predecessor
blocks. This is primarily done by querying the `BranchOpInterface` of the
associated branch terminators that can jump to the current block. Consider the
following example which involves a simple branch and the critical block argument
%2:

```mlir
  custom.br ^bb1(..., %0, : ...)
  ...
  custom.br ^bb1(..., %1, : ...)
  ...
^bb1(%2: memref<2xf32>):
  ...
```

The `BranchOpInterface` allows us to determine the actual values that will be
passed to block bb1 and its argument %2 by analyzing its predecessor blocks.
Once we have resolved the values %0 and %1 (that are associated with %2 in this
sample), we can introduce a temporary buffer and clone its contents into the new
buffer. Afterwards, we rewire the branch operands to use the newly allocated
buffer instead. However, blocks can have implicitly defined predecessors by
parent ops that implement the `RegionBranchOpInterface`. This can be the case if
this block argument belongs to the entry block of a region. In this setting, we
have to identify all predecessor regions defined by the parent operation. For
every region, we need to get all terminator operations implementing the
`ReturnLike` trait, indicating that they can branch to our current block.
Finally, we can use functionality similar to that described above to add the
temporary copy. This time, we can modify the terminator operands directly
without touching a high-level interface.

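The rewiring step for explicit branches can be sketched on a toy IR. This Python model is illustrative only: blocks are plain dictionaries, the predecessor names `bbA` and `bbB` and the `%clone` value names are hypothetical, and the real pass goes through the `BranchOpInterface` instead of inspecting terminators directly.

```python
from itertools import count


def insert_clones(blocks, target, arg_index):
    """For every branch into `target`, clone the operand that feeds the
    critical block argument at position arg_index and rewire the branch."""
    fresh = count()
    for block in blocks.values():
        term = block["terminator"]
        if term["dest"] != target:
            continue
        source = term["operands"][arg_index]
        clone = f"%clone{next(fresh)}"
        # The clone is placed right before the terminator of the predecessor.
        block["ops"].append(f"{clone} = bufferization.clone {source}")
        term["operands"][arg_index] = clone
    return blocks


# Two predecessors forward %0 and %1 to the critical argument %2 of bb1.
blocks = {
    "bbA": {"ops": [], "terminator": {"dest": "bb1", "operands": ["%0"]}},
    "bbB": {"ops": [], "terminator": {"dest": "bb1", "operands": ["%1"]}},
    "bb1": {"ops": [], "terminator": {"dest": None, "operands": []}},
}
insert_clones(blocks, "bb1", 0)
for name, block in blocks.items():
    print(name, block["ops"], block["terminator"]["operands"])
```

After the call, both predecessors pass a fresh clone to bb1, so the original values %0 and %1 can be freed independently of %2.
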
Consider the following inner-region control-flow sample that uses an imaginary
“custom.region_if” operation. It either executes the “then” or “else” region and
always continues to the “join” region. The “custom.region_if_yield” operation
returns a result to the parent operation. This sample demonstrates the use of
the `RegionBranchOpInterface` to determine predecessors in order to infer the
high-level control flow:

```mlir
func @inner_region_control_flow(
  %arg0 : index,
  %arg1 : index) -> memref<?x?xf32> {
  %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
   then(%arg2 : memref<?x?xf32>) {  // aliases: %arg4, %1
    custom.region_if_yield %arg2 : memref<?x?xf32>
   } else(%arg3 : memref<?x?xf32>) {  // aliases: %arg4, %1
    custom.region_if_yield %arg3 : memref<?x?xf32>
   } join(%arg4 : memref<?x?xf32>) {  // aliases: %1
    custom.region_if_yield %arg4 : memref<?x?xf32>
   }
  return %1 : memref<?x?xf32>
}
```

![region_branch_example_pre_move](/includes/img/region_branch_example_pre_move.svg)

Values that are not block arguments can also become aliases when they are
returned by dialect-specific operations. BufferDeallocation supports this
behavior via the `RegionBranchOpInterface`. Consider the following example that
uses an “scf.if” operation to determine the value of %2 at runtime, which
creates an alias:

```mlir
func @nested_region_control_flow(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = arith.cmpi "eq", %arg0, %arg1 : index
  %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %2 = scf.if %0 -> (memref<?x?xf32>) {
    scf.yield %1 : memref<?x?xf32>   // %2 will be an alias of %1
  } else {
    %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>  // nested allocation in a div.
                                                       // branch
    use(%3)
    scf.yield %1 : memref<?x?xf32>   // %2 will be an alias of %1
  }
  return %2 : memref<?x?xf32>
}
```

In this example, a dealloc is inserted to release the buffer %3 within the else
block, since it cannot be accessed by the remainder of the program. Accessing
the `RegionBranchOpInterface` allows us to infer that %2 is a non-critical alias
of %1, which does not need to be tracked.

```mlir
func @nested_region_control_flow(%arg0: index, %arg1: index) -> memref<?x?xf32> {
  %0 = arith.cmpi "eq", %arg0, %arg1 : index
  %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %2 = scf.if %0 -> (memref<?x?xf32>) {
    scf.yield %1 : memref<?x?xf32>
  } else {
    %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
    use(%3)
    memref.dealloc %3 : memref<?x?xf32>  // %3 can be safely freed here
    scf.yield %1 : memref<?x?xf32>
  }
  return %2 : memref<?x?xf32>
}
```

Analogous to the previous case, we have to detect all terminator operations in
all attached regions of “scf.if” that provide a value to the parent operation
(in this sample via scf.yield). Querying the `RegionBranchOpInterface` allows us
to determine the regions that “return” a result to their parent operation. Like
before, we have to update all `ReturnLike` terminators as described above.
Reconsider a slightly adapted version of the “custom.region_if” example from
above that uses a nested allocation:

```mlir
func @inner_region_control_flow_div(
  %arg0 : index,
  %arg1 : index) -> memref<?x?xf32> {
  %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
   then(%arg2 : memref<?x?xf32>) {  // aliases: %arg4, %1
    custom.region_if_yield %arg2 : memref<?x?xf32>
   } else(%arg3 : memref<?x?xf32>) {
    %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>  // aliases: %arg4, %1
    custom.region_if_yield %2 : memref<?x?xf32>
   } join(%arg4 : memref<?x?xf32>) {  // aliases: %1
    custom.region_if_yield %arg4 : memref<?x?xf32>
   }
  return %1 : memref<?x?xf32>
}
```

Since the allocation %2 happens in a divergent branch and cannot be safely
deallocated in a post-dominator, %arg4 will be considered a critical alias.
Furthermore, %arg4 is returned to its parent operation and has an alias %1. This
causes BufferDeallocation to introduce additional copies:

```mlir
func @inner_region_control_flow_div(
  %arg0 : index,
  %arg1 : index) -> memref<?x?xf32> {
  %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
  %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
   then(%arg2 : memref<?x?xf32>) {
    %4 = bufferization.clone %arg2 : (memref<?x?xf32>) -> (memref<?x?xf32>)
    custom.region_if_yield %4 : memref<?x?xf32>
   } else(%arg3 : memref<?x?xf32>) {
    %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
    %5 = bufferization.clone %2 : (memref<?x?xf32>) -> (memref<?x?xf32>)
    memref.dealloc %2 : memref<?x?xf32>
    custom.region_if_yield %5 : memref<?x?xf32>
   } join(%arg4: memref<?x?xf32>) {
    %4 = bufferization.clone %arg4 : (memref<?x?xf32>) -> (memref<?x?xf32>)
    memref.dealloc %arg4 : memref<?x?xf32>
    custom.region_if_yield %4 : memref<?x?xf32>
   }
  memref.dealloc %0 : memref<?x?xf32>  // %0 can be safely freed here
  return %1 : memref<?x?xf32>
}
```

## Placement of Deallocs

After introducing allocs and copies, deallocs have to be placed to free
allocated memory and avoid memory leaks. The deallocation needs to take place
after the last use of the given value. The position can be determined by
calculating the common post-dominator of the value and all of its remaining
non-critical aliases. Back edges are a special case: such edges can cause memory
leaks when a newly allocated buffer flows back to another part of the program.
In these cases, we need to free the associated buffer instances from the
previous iteration by inserting additional deallocs.

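The common post-dominator idea described above can be illustrated with a small
toy model. The following Python sketch is not the pass's implementation (which
relies on MLIR's dominance infrastructure); it assumes a hypothetical CFG given
as a dictionary from block name to successor list, with a single exit block and
at least one successor for every other block. The names `post_dominators`,
`common_post_dominator` and `cfg` are made up for this illustration.

```python
def post_dominators(succ, exit_block):
    """Return {block: set of blocks that post-dominate it} (toy fixed point)."""
    blocks = set(succ)
    pdom = {b: set(blocks) for b in blocks}
    pdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for b in blocks - {exit_block}:
            # A block is post-dominated by itself and by everything that
            # post-dominates all of its successors.
            new = {b} | set.intersection(*(pdom[s] for s in succ[b]))
            if new != pdom[b]:
                pdom[b] = new
                changed = True
    return pdom


def common_post_dominator(succ, exit_block, use_blocks):
    """Return the nearest block that post-dominates every block in use_blocks."""
    pdom = post_dominators(succ, exit_block)
    common = set.intersection(*(pdom[b] for b in use_blocks))
    # Common post-dominators form a chain; the nearest one has the most
    # post-dominators of its own.
    return max(common, key=lambda b: len(pdom[b]))


# Hypothetical diamond-shaped CFG: entry -> {then, else} -> merge -> exit.
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": ["exit"],
    "exit": [],
}
# A buffer (or one of its aliases) used in both branches can be freed in "merge".
dealloc_block = common_post_dominator(cfg, "exit", ["then", "else"])
```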
Consider the following “scf.for” use case containing a nested structured
control-flow if:

```mlir
func @loop_nested_if(
  %lb: index,
  %ub: index,
  %step: index,
  %buf: memref<2xf32>,
  %res: memref<2xf32>) {
  %0 = scf.for %i = %lb to %ub step %step
      iter_args(%iterBuf = %buf) -> memref<2xf32> {
    %1 = arith.cmpi "eq", %i, %ub : index
    %2 = scf.if %1 -> (memref<2xf32>) {
      %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias due to a
                                          // divergent allocation
      use(%3)
      scf.yield %3 : memref<2xf32>
    } else {
      scf.yield %iterBuf : memref<2xf32>
    }
    scf.yield %2 : memref<2xf32>
  }
  test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
  return
}
```

In this example, the then branch of the nested “scf.if” operation returns a
newly allocated buffer.

Since this allocation happens in the scope of a divergent branch, %2 becomes a
critical alias that needs to be handled. As before, we have to insert additional
copies of %3 and %iterBuf to eliminate this alias. This guarantees that %2 is a
newly allocated buffer in each iteration. However, “returning” %2 to its alias
%iterBuf turns %iterBuf into a critical alias as well. In other words, we have
to create a copy of %2 to pass it to %iterBuf. Since this jump represents a back
edge, and %2 will always be a new buffer, we have to free the buffer from the
previous iteration to avoid memory leaks:

```mlir
func @loop_nested_if(
  %lb: index,
  %ub: index,
  %step: index,
  %buf: memref<2xf32>,
  %res: memref<2xf32>) {
  %4 = bufferization.clone %buf : (memref<2xf32>) -> (memref<2xf32>)
  %0 = scf.for %i = %lb to %ub step %step
      iter_args(%iterBuf = %4) -> memref<2xf32> {
    %1 = arith.cmpi "eq", %i, %ub : index
    %2 = scf.if %1 -> (memref<2xf32>) {
      %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias
      use(%3)
      %5 = bufferization.clone %3 : (memref<2xf32>) -> (memref<2xf32>)
      memref.dealloc %3 : memref<2xf32>
      scf.yield %5 : memref<2xf32>
    } else {
      %6 = bufferization.clone %iterBuf : (memref<2xf32>) -> (memref<2xf32>)
      scf.yield %6 : memref<2xf32>
    }
    %7 = bufferization.clone %2 : (memref<2xf32>) -> (memref<2xf32>)
    memref.dealloc %2 : memref<2xf32>
    memref.dealloc %iterBuf : memref<2xf32> // free backedge iteration variable
    scf.yield %7 : memref<2xf32>
  }
  test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
  memref.dealloc %0 : memref<2xf32> // free temp copy %0
  return
}
```

This is an example of loop-like control flow. The CFG contains back edges that
have to be handled to avoid memory leaks. The transformation is able to free the
back-edge iteration variable %iterBuf.

## Private Analyses Implementations

The BufferDeallocation transformation relies on one primary control-flow
analysis: BufferPlacementAliasAnalysis. Furthermore, we also use dominance and
liveness to place and move nodes. The liveness analysis determines the live
range of a given value. Within this range, a value is alive and can or will be
used in the course of the program. After this range, the value is dead and can
be discarded; in our case, the buffer can be freed. To place the allocs, we
need to know from which position a value will be alive. The allocs have to be
placed in front of this position. However, the most important analysis is the
alias analysis, which is needed to introduce copies and to place all
deallocations.

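To make the interplay of liveness and aliasing concrete, here is a minimal
Python sketch for a single straight-line block. It is a toy model under stated
assumptions, not the analysis used by the pass: operations are plain
`(name, result, operands)` tuples, the alias set is passed in by hand, and the
op names (`view`, `use`) as well as the function `dealloc_position` are
hypothetical. It shows why the end of the live range has to take aliases into
account before a buffer can be freed.

```python
def dealloc_position(ops, buffer, aliases):
    """Return the index at which a dealloc of `buffer` could be inserted.

    `ops` models ONE straight-line block as (name, result, operands) tuples.
    The buffer stays live until the last use of itself or any of its aliases.
    Returns None if neither the buffer nor an alias is used at all.
    """
    tracked = {buffer} | set(aliases)
    last_use = None
    for index, (_name, _result, operands) in enumerate(ops):
        if tracked & set(operands):
            last_use = index
    return None if last_use is None else last_use + 1


# Hypothetical block: %1 is an alias (a view) of the allocated buffer %0.
ops = [
    ("alloc", "%0", ()),
    ("view", "%1", ("%0",)),
    ("use", None, ("%0",)),
    ("use", None, ("%1",)),
    ("return", None, ()),
]
with_aliases = dealloc_position(ops, "%0", {"%1"})   # after the use of %1
without_aliases = dealloc_position(ops, "%0", set())  # too early: %1 is still used
```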
# Post Phase

In order to limit the complexity of the BufferDeallocation transformation, some
small code-polishing and optimization transformations are not applied on-the-fly
during placement. Currently, a canonicalization pattern is added to the clone
operation to reduce the number of unnecessary clones.

Note: further transformations might be added to the post-pass phase in the
future.

## Clone Canonicalization

During the placement of clones, unnecessary clones may be inserted. If such a
clone appears in the same block as its corresponding dealloc operation, we can
use the canonicalizer to remove these unnecessary operations. Note that this
step needs to take place after the insertion of clones and deallocs in the
buffer deallocation step. The canonicalization covers both the newly created
target value of the clone operation and its source operation.

## Canonicalization of the Source Buffer of the Clone Operation

In this case, the source of the clone operation can be used instead of its
target. The unused allocation and deallocation operations that are defined for
this clone operation are also removed. Here is a working example generated by
the BufferDeallocation pass that allocates a buffer with dynamic size. A deeper
analysis of this sample reveals that the clone and dealloc operations are
redundant and can be removed.

```mlir
func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
  %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = bufferization.clone %1 : (memref<?x?xf32>) -> (memref<?x?xf32>)
  memref.dealloc %1 : memref<?x?xf32>
  return %2 : memref<?x?xf32>
}
```

This will be transformed to:

```mlir
func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
  %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
  return %1 : memref<?x?xf32>
}
```

In this case, the additional copy %2 can be replaced with its original source
buffer %1. The associated dealloc operation of %1 is removed as well.

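The rewrite described in this section can be sketched on a toy representation
of a single block. The Python below is only an illustration under simplifying
assumptions and does not reproduce the actual canonicalization pattern or its
legality checks: operations are `(name, result, operands)` tuples, the function
name `fold_clone_into_source` is hypothetical, and the sketch conservatively
gives up as soon as the source buffer is used between the clone and its
dealloc.

```python
def fold_clone_into_source(ops):
    """Fold `clone` + dealloc of its source (same block) into the source."""
    for i, (name, result, operands) in enumerate(ops):
        if name != "clone":
            continue
        source = operands[0]
        # Search for the next operation after the clone that touches the source.
        for j in range(i + 1, len(ops)):
            later_name, _later_result, later_operands = ops[j]
            if source not in later_operands:
                continue
            if later_name != "dealloc":
                break  # The source is still used after the clone: give up.
            # Drop the clone and the dealloc, then redirect every use of the
            # clone result to the original source buffer.
            kept = [op for k, op in enumerate(ops) if k not in (i, j)]
            return [
                (n, r, tuple(source if o == result else o for o in opnds))
                for n, r, opnds in kept
            ]
    return ops


# Toy version of the @dynamic_allocation example from this section.
before = [
    ("alloc", "%1", ("%arg0", "%arg1")),
    ("clone", "%2", ("%1",)),
    ("dealloc", None, ("%1",)),
    ("return", None, ("%2",)),
]
after = fold_clone_into_source(before)
```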
## Canonicalization of the Target Buffer of the Clone Operation

In this case, the target buffer of the clone operation can be used instead of
its source. The unused deallocation operation that is defined for this clone
operation is also removed.

Consider the following example, where a generic test operation writes its result
to %temp, which is then cloned into %result. These two operations can be merged
into a single step. Canonicalization removes the clone operation and %temp, and
replaces the uses of %temp with %result:

```mlir
func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
  %temp = memref.alloc() : memref<2xf32>
  test.generic {
    args_in = 1 : i64,
    args_out = 1 : i64,
    indexing_maps = [#map0, #map0],
    iterator_types = ["parallel"]} %arg0, %temp {
  ^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
    %tmp2 = math.exp %gen2_arg0 : f32
    test.yield %tmp2 : f32
  }: memref<2xf32>, memref<2xf32>
  %result = bufferization.clone %temp : (memref<2xf32>) -> (memref<2xf32>)
  memref.dealloc %temp : memref<2xf32>
  return
}
```

This will be transformed to:

```mlir
func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
  test.generic {
    args_in = 1 : i64,
    args_out = 1 : i64,
    indexing_maps = [#map0, #map0],
    iterator_types = ["parallel"]} %arg0, %result {
  ^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
    %tmp2 = math.exp %gen2_arg0 : f32
    test.yield %tmp2 : f32
  }: memref<2xf32>, memref<2xf32>
  return
}
```

## Known Limitations

BufferDeallocation introduces additional clones using the “bufferization”
dialect (“bufferization.clone”). Analogously, all deallocations use the “memref”
dialect's free operation “memref.dealloc”. The actual copy process is realized
using “test.copy”. Furthermore, buffers are essentially immutable after their
creation in a block. Further limitations are known when unstructured control
flow is used.