msdebug/mlir/test/Dialect/Bufferization/Transforms · Ascend/MindStudio-Debugger - AtomGit

GitHub: [mlir] Fix block merging (#97697)

c63125d4 · created on July 18, 2024

File	Last commit	Last updated
OwnershipBasedBufferDeallocation	[mlir] Fix block merging (#97697) With this PR I am trying to address: https://github.com/llvm/llvm-project/issues/63230. What changed: - While merging identical blocks, don't add a block argument if it is "identical" to another block argument. I.e., if the two block arguments refer to the same `Value`. The operands of the operations in the block will point to the argument we already inserted. This needs to happen to all the arguments we pass to the different successors of the parent block - After merging the blocks, get rid of "unnecessary" arguments. I.e., if all the predecessors pass the same block argument, there is no need to pass it as an argument. - This last simplification clashed with `BufferDeallocationSimplification`. The reason, I think, is that the two simplifications are clashing. I.e., `BufferDeallocationSimplification` contains an analysis based on the block structure. If we simplify the block structure (by merging and/or dropping block arguments) the analysis is invalid. The solution I found is to do a more prudent simplification when running that pass. Note: this is a rework of #96871. I ran all the integration tests (`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.	1 year ago
buffer-deallocation-simplification.mlir	[mlir][bufferization] Add `BufferOriginAnalysis` (#86461) This commit adds the `BufferOriginAnalysis`, which can be queried to check if two buffer SSA values originate from the same allocation. This new analysis is used in the buffer deallocation pass to fold away or simplify `bufferization.dealloc` ops more aggressively. The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`, which collects buffer SSA value "same buffer" dependencies. E.g., given IR such as: `%0 = memref.alloc() %1 = memref.subview %0 %2 = memref.subview %1` The `BufferViewFlowAnalysis` will report the following "reverse" dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all buffer SSA values in the reverse use-def chain that originate from the same allocation as `%2`. The `BufferOriginAnalysis` is built on top of that. It handles only simple cases at the moment and may conservatively return "unknown" around certain IR with branches, memref globals and function arguments. This analysis enables additional simplifications during `-buffer-deallocation-simplification`. In particular, "regular" scf.for loop nests, that yield buffers (or reallocations thereof) in the same order as they appear in the iter_args, are now handled much more efficiently. Such IR patterns are generated by the sparse compiler.	2 years ago
buffer-deallocation.mlir	Revert "[mlir][bufferization] Improve buffer deallocation pass" This reverts commit 1bebb60a7565e5197d23120528f544b886b4d905. This caused problems in downstream projects. We are reverting to give them more time for integration.	2 years ago
buffer-hoisting.mlir	[mlir][bufferize] Remove hoisting functionality from One-Shot Bufferize The same functionality is already provided by `-buffer-hoisting` and `-buffer-loop-hoisting`. Differential Revision: https://reviews.llvm.org/D126251	4 years ago
buffer-loop-hoisting.mlir	Avoid buffer hoisting from parallel loops (#90735) This change corrects an invalid behavior in pass `--buffer-loop-hoisting`. The pass is in charge of extracting buffer allocations (e.g., `memref.alloca`) from loop regions (e.g., `scf.for`) when possible. This works OK for loops with sequential execution semantics. However, a buffer allocated in the body of a parallel loop may be concurrently accessed by multiple threads to store their local data. Extracting such a buffer from the loop causes all threads to wrongly share the same memory region. In the following example, dimension 1 of the input tensor is reversed. Dimension 0 is traversed with a parallel loop. func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> { %c0 = index.constant 0 %c1 = index.constant 1 %c2 = index.constant 2 %c3 = index.constant 3 %output = memref.alloc() : memref<2x3xf32> scf.parallel (%index) = (%c0) to (%c2) step (%c1) { // Create subviews for working input and output slices %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>> %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>> // Copy the input slice into this temporary buffer. This intermediate // copy is unnecessary, but is used for illustration purposes. %temp = memref.alloc() : memref<1x3xf32> memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32> // Copy temporary buffer into output slice memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>> scf.reduce } return %output : memref<2x3xf32> } The patch submitted here prevents `%temp = memref.alloc() : memref<1x3xf32>` from being hoisted when the containing op is `scf.parallel` or `scf.forall`. A new op trait called `HasParallelRegion` is introduced and assigned to these two ops to indicate that their regions have parallel execution semantics. @joker-eph @ftynse @nicolasvasilache @sabauma	2 years ago
finalizing-bufferize.mlir	[mlir] use strided layout in structured codegen-related tests All relevant operations have been switched to primarily use the strided layout, but still support the affine map layout. Update the relevant tests to use the strided format instead for compatibility with how ops now print by default. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D134045	3 years ago
lower-deallocations-func.mlir	[mlir][test] Fix filecheck annotation typos (#92897) Moved fixes for mlir from https://github.com/llvm/llvm-project/pull/91854, plus a few additional ones in a second commit. --------- Co-authored-by: klensy <nightouser@gmail.com>	2 years ago
lower-deallocations.mlir	[mlir] [bufferize] fix bufferize deallocation error in nest symbol table (#98476) With nested symbol tables, the dealloc_helper function generated by the lower deallocations pass was incorrectly positioned, causing calls to fail. This patch fixes this issue.	1 year ago
one-shot-bufferize-allow-return-allocs.mlir	[mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964) One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as: `Yield operand #0 is not equivalent to the corresponding iter bbArg scf.yield %0 : tensor<5xf32>` One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place. This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator. Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit.	2 years ago
one-shot-bufferize-analysis-bottom-up-from-terminators.mlir	[mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964) One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as: `Yield operand #0 is not equivalent to the corresponding iter bbArg scf.yield %0 : tensor<5xf32>` One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place. This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator. Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit.	2 years ago
one-shot-bufferize-analysis-empty-tensor-elimination.mlir	[mlir][bufferization] skip empty tensor elimination if they have different element type (#96998) In the original implementation, empty tensor elimination adds a `tensor.cast` and eliminates the tensor even if the two tensors have different element types (f32, bf16). This change adds a check on the element type and skips the elimination if the types differ.	1 year ago
one-shot-bufferize-analysis.mlir	[mlir][bufferization] `MaterializeInDestinationOp`: Support memref destinations (#68074) Extend `bufferization.materialize_in_destination` to support memref destinations. This op can now be used to indicate that a tensor computation should materialize in a given buffer (that may have been allocated by another component/runtime). The op still participates in "empty tensor elimination". Example: `func.func @test(%out: memref<10xf32>) { %t = tensor.empty() : tensor<10xf32> %c = linalg.generic ... outs(%t: tensor<10xf32>) -> tensor<10xf32> bufferization.materialize_in_destination %c in restrict writable %out : (tensor<10xf32>, memref<10xf32>) -> () return }` After "empty tensor elimination", the above IR can bufferize without an allocation: `func.func @test(%out: memref<10xf32>) { linalg.generic ... outs(%out: memref<10xf32>) return }` This change also clarifies the meaning of the `restrict` unit attribute on `bufferization.to_tensor` ops.	2 years ago
one-shot-bufferize-compat.mlir	[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619) This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the `allow-return-allocs` pass option will default to true now, `create-deallocs` defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. A new `allow-return-allocs-from-loops` option is added as a temporary workaround for some bufferization limitations.	2 years ago
one-shot-bufferize-empty-tensor-elimination.mlir	[MLIR] Generalize expand_shape to take shape as explicit input (#90040) This patch generalizes tensor.expand_shape and memref.expand_shape to consume the output shape as a list of SSA values. This enables us to implement generic reshape operations with dynamic shapes using collapse_shape/expand_shape pairs. The output_shape input to expand_shape follows the static/dynamic representation that's also used in `tensor.extract_slice`. Differential Revision: https://reviews.llvm.org/D140821 --------- Signed-off-by: Gaurav Shukla<gaurav.shukla@amd.com> Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com> Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>	2 years ago
one-shot-bufferize-memory-space-invalid.mlir	[mlir][bufferize] Infer memory space in all bufferization patterns This change updates all remaining bufferization patterns (except for scf.while) and the remaining bufferization infrastructure to infer the memory space whenever possible instead of falling back to "0". (If a default memory space is set in the bufferization options, we still fall back to that value if the memory space could not be inferred.) Differential Revision: https://reviews.llvm.org/D128423	3 years ago
one-shot-bufferize-partial.mlir	[mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964) One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as: `Yield operand #0 is not equivalent to the corresponding iter bbArg scf.yield %0 : tensor<5xf32>` One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place. This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator. Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit.	2 years ago
one-shot-bufferize-pass-statistics.mlir	[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619) This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the `allow-return-allocs` pass option will default to true now, `create-deallocs` defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. A new `allow-return-allocs-from-loops` option is added as a temporary workaround for some bufferization limitations.	2 years ago
one-shot-bufferize.mlir	Bufferization with ControlFlow Asserts (#95868) Fixed incorrect bufferization interaction with cf.assert - reordered bufferization condition checking - fixed hasNeitherAllocateNorFreeSideEffect checking bug - implemented memory interface for cf.assert --------- Co-authored-by: McCowan Zhang <mccowan.z@ssi.samsung.com>	1 year ago
one-shot-module-bufferize-allow-return-allocs.mlir	[mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964) One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as: `Yield operand #0 is not equivalent to the corresponding iter bbArg scf.yield %0 : tensor<5xf32>` One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place. This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator. Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit.	2 years ago
one-shot-module-bufferize-analysis.mlir	[mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964) One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as: `Yield operand #0 is not equivalent to the corresponding iter bbArg scf.yield %0 : tensor<5xf32>` One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place. This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator. Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit.	2 years ago
one-shot-module-bufferize-force-copy-before-write.mlir	[mlir][bufferization] Fix failing lit test Checks were too strict and by the time the patch was submitted, the output of the test changed. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D142969	3 years ago
one-shot-module-bufferize-invalid.mlir	[mlir][bufferization] Allow cyclic function graphs without tensors (#68632) Cyclic function call graphs are generally not supported by One-Shot Bufferize. However, they can be allowed when a function does not have tensor arguments or results. This is because it is then no longer necessary that the callee will be bufferized before the caller.	2 years ago
one-shot-module-bufferize-out-params.mlir	[mlir][Bufferization] castOrReallocMemRefValue: Use BufferizationOptions (#89175) This allows configuring both the op used for allocation and the op used for copying memrefs. It also changes the default behavior because the default allocation in `BufferizationOptions` creates `memref.alloc` with `alignment = 64` where we used to create `memref.alloca` without any alignment before. Fixes `// TODO: Use alloc/memcpy callback from BufferizationOptions if called via // BufferizableOpInterface impl of ToMemrefOp.`	2 years ago
one-shot-module-bufferize.mlir	[mlir][Bufferization] castOrReallocMemRefValue: Use BufferizationOptions (#89175) This allows configuring both the op used for allocation and the op used for copying memrefs. It also changes the default behavior because the default allocation in `BufferizationOptions` creates `memref.alloc` with `alignment = 64` where we used to create `memref.alloca` without any alignment before. Fixes `// TODO: Use alloc/memcpy callback from BufferizationOptions if called via // BufferizableOpInterface impl of ToMemrefOp.`	2 years ago
tensor-copy-insertion-memory-space-invalid.mlir	[mlir][bufferization] Make `TensorCopyInsertionPass` a test pass TensorCopyInsertion should not have been exposed as a pass. This was a flaw in the original design. It is a preparation step for bufferization and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and actual rewrite to MemRef ops. Therefore, even if broken down as two separate steps internally, they should be exposed as a single pass. This change affects the sparse compiler, which uses `TensorCopyInsertionPass`. A new `SparsificationAndBufferizationPass` is added to replace all passes in the sparse tensor pipeline from `TensorCopyInsertionPass` until the actual bufferization (rewrite to memref/non-tensor). It is generally unsafe to run arbitrary passes in-between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains along tensor ops. Differential Revision: https://reviews.llvm.org/D138915	3 years ago
tensor-copy-insertion-memory-space.mlir	[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619) This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the `allow-return-allocs` pass option will default to true now, `create-deallocs` defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. A new `allow-return-allocs-from-loops` option is added as a temporary workaround for some bufferization limitations.	2 years ago
tensor-copy-insertion.mlir	[MLIR][Bufferization] Choose default memory space in tensor copy insertion (#88500) Tensor copy insertion currently uses memory_space = 0 when creating a tensor copy using alloc_tensor. This memory space should instead be the default memory space provided in bufferization options.	2 years ago
transform-ops.mlir	[mlir][Bufferization] Add support for controlled bufferization of alloc_tensor (#70957) This revision adds support to `transform.structured.bufferize_to_allocation` to bufferize `bufferization.alloc_tensor()` ops. This is useful as a means to control the bufferization of `tensor.empty` ops that have been previously `bufferization.empty_tensor_to_alloc_tensor`'ed.	2 years ago
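
The `BufferOriginAnalysis` commit message in the table above (#86461) describes collecting "reverse" dependencies along the use-def chain and answering "unknown" conservatively. The sketch below is a toy Python model of that idea only, not MLIR's API: the names `DERIVED_FROM`, `ALLOCATIONS`, `resolve_reverse` and `same_origin` are hypothetical, and the dictionary stands in for the subview chain `%0 -> %1 -> %2` from the commit message.

```python
from collections import deque

# Toy model (an assumption, not MLIR's data structures): each view-like
# value maps to the values it was derived from.
DERIVED_FROM = {
    "%1": ["%0"],  # %1 = memref.subview %0
    "%2": ["%1"],  # %2 = memref.subview %1
}
ALLOCATIONS = {"%0"}  # %0 = memref.alloc()


def resolve_reverse(value, derived_from):
    """Collect `value` and everything on its reverse use-def chain."""
    seen = {value}
    worklist = deque([value])
    while worklist:
        current = worklist.popleft()
        for source in derived_from.get(current, []):
            if source not in seen:
                seen.add(source)
                worklist.append(source)
    return seen


def same_origin(a, b, derived_from, allocations):
    """Return True/False when both origins are known, else None ("unknown")."""
    origins_a = resolve_reverse(a, derived_from) & allocations
    origins_b = resolve_reverse(b, derived_from) & allocations
    if len(origins_a) != 1 or len(origins_b) != 1:
        return None  # conservative answer, e.g. for a function argument
    return origins_a == origins_b


print(sorted(resolve_reverse("%2", DERIVED_FROM)))
print(same_origin("%1", "%2", DERIVED_FROM, ALLOCATIONS))
print(same_origin("%2", "%arg0", DERIVED_FROM, ALLOCATIONS))
```

In this model, a value with no known allocation on its chain (such as the hypothetical function argument `%arg0`) yields `None`, mirroring the conservative "unknown" result the commit message mentions for function arguments.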