File / Last commit / Last updated
[mlir] Fix block merging (#97697) With this PR I am trying to address: https://github.com/llvm/llvm-project/issues/63230. What changed:
- While merging identical blocks, don't add a block argument if it is "identical" to another block argument, i.e., if the two block arguments refer to the same Value. The operands of the operations in the block will point to the argument we already inserted. This needs to happen for all the arguments we pass to the different successors of the parent block.
- After merging the blocks, get rid of "unnecessary" arguments, i.e., if all the predecessors pass the same value, there is no need to pass it as an argument.
- This last simplification clashed with BufferDeallocationSimplification. I think the reason is that BufferDeallocationSimplification contains an analysis based on the block structure: if we simplify the block structure (by merging blocks and/or dropping block arguments), the analysis becomes invalid. The solution I found is to do a more prudent simplification when running that pass.
**Note**: this is a rework of #96871. I ran all the integration tests (-DMLIR_INCLUDE_INTEGRATION_TESTS=ON) and they passed. (1 year ago)
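A toy Python model of the two argument simplifications may help. Each predecessor edge is encoded as a tuple of value names, one per argument slot of the merged block. A slot whose values match an earlier slot's on every edge reuses that argument, and a slot that receives the same value on every edge is dropped entirely. The function name simplify_block_args and the "slot<k>" encoding are hypothetical; this is a sketch of the idea, not MLIR's block-merging code.

```python
from typing import Dict, List, Sequence, Tuple

# One predecessor edge: the values it passes, one per argument slot of the merged block.
Edge = Tuple[str, ...]


def simplify_block_args(edges: Sequence[Edge]) -> Tuple[List[int], Dict[int, str]]:
    """Decide which argument slots of a merged block survive.

    Returns (kept_slots, replacements):
      kept_slots: slot indices that remain real block arguments.
      replacements: slot index -> what its uses are rewired to, either
        "slot<k>" (an earlier, identical argument) or a concrete value
        that every predecessor passes anyway.
    Assumes at least one predecessor edge.
    """
    num_slots = len(edges[0])
    kept: List[int] = []
    replacements: Dict[int, str] = {}
    first_slot_for_column: Dict[Tuple[str, ...], int] = {}
    for slot in range(num_slots):
        column = tuple(edge[slot] for edge in edges)  # values flowing into this slot
        if len(set(column)) == 1:
            # All predecessors pass the same value: no argument needed.
            replacements[slot] = column[0]
        elif column in first_slot_for_column:
            # Same values as an argument we already kept: reuse that argument.
            replacements[slot] = f"slot{first_slot_for_column[column]}"
        else:
            first_slot_for_column[column] = slot
            kept.append(slot)
    return kept, replacements


kept, replacements = simplify_block_args([("%a", "%a", "%c"), ("%b", "%b", "%c")])
print(kept, replacements)  # [0] {1: 'slot0', 2: '%c'}
```

Comparing whole columns is what makes the reuse safe: two slots can share one argument only if they agree on every predecessor edge.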
[mlir][bufferization] Add BufferOriginAnalysis (#86461) This commit adds the BufferOriginAnalysis, which can be queried to check if two buffer SSA values originate from the same allocation. This new analysis is used in the buffer deallocation pass to fold away or simplify bufferization.dealloc ops more aggressively. The BufferOriginAnalysis is based on the BufferViewFlowAnalysis, which collects buffer SSA value "same buffer" dependencies. E.g., given IR such as:

    %0 = memref.alloc()
    %1 = memref.subview %0
    %2 = memref.subview %1

The BufferViewFlowAnalysis will report the following "reverse" dependencies (resolveReverse) for %2: {%2, %1, %0}, i.e., all buffer SSA values in the reverse use-def chain that originate from the same allocation as %2. The BufferOriginAnalysis is built on top of that. It handles only simple cases at the moment and may conservatively return "unknown" around certain IR with branches, memref globals and function arguments. This analysis enables additional simplifications during -buffer-deallocation-simplification. In particular, "regular" scf.for loop nests that yield buffers (or reallocations thereof) in the same order as they appear in the iter_args are now handled much more efficiently. Such IR patterns are generated by the sparse compiler. (2 years ago)
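The reverse-dependency idea can be sketched in Python on a straight-line chain of views. This toy model only follows "view of" edges back to a root; it does not handle branches or memref globals, and the encoding and the names resolve_reverse and same_allocation are hypothetical, not the MLIR C++ API. Like the analysis described above, it answers True, False, or "unknown" (None).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of buffer-producing ops:
#   ("alloc",)        a fresh allocation (e.g. memref.alloc)
#   ("view", source)  a view of another buffer (e.g. memref.subview)
#   ("arg",)          a function argument, whose origin is unknown
Defs = Dict[str, Tuple[str, ...]]


def resolve_reverse(defs: Defs, value: str) -> List[str]:
    """Values on the reverse use-def chain that share value's buffer."""
    chain = [value]
    while defs[value][0] == "view":
        value = defs[value][1]
        chain.append(value)
    return chain


def same_allocation(defs: Defs, a: str, b: str) -> Optional[bool]:
    """True/False when provable, None for "unknown"."""
    root_a = resolve_reverse(defs, a)[-1]
    root_b = resolve_reverse(defs, b)[-1]
    if root_a == root_b:
        return True
    if defs[root_a][0] == "alloc" and defs[root_b][0] == "alloc":
        return False  # two distinct allocations cannot be the same buffer
    return None  # e.g. a function argument could alias anything


defs: Defs = {
    "%0": ("alloc",),
    "%1": ("view", "%0"),
    "%2": ("view", "%1"),
    "%3": ("alloc",),
    "%arg0": ("arg",),
}
print(resolve_reverse(defs, "%2"))  # ['%2', '%1', '%0']
```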
Revert "[mlir][bufferization] Improve buffer deallocation pass" This reverts commit 1bebb60a7565e5197d23120528f544b886b4d905. This caused problems in downstream projects. We are reverting to give them more time for integration. (2 years ago)
[mlir][bufferize] Remove hoisting functionality from One-Shot Bufferize The same functionality is already provided by -buffer-hoisting and -buffer-loop-hoisting. Differential Revision: https://reviews.llvm.org/D126251 (4 years ago)
Avoid buffer hoisting from parallel loops (#90735) This change corrects an invalid behavior in pass --buffer-loop-hoisting. The pass is in charge of extracting buffer allocations (e.g., memref.alloca) from loop regions (e.g., scf.for) when possible. This works correctly for loops with sequential execution semantics. However, a buffer allocated in the body of a parallel loop may be concurrently accessed by multiple threads to store their local data. Extracting such a buffer from the loop causes all threads to wrongly share the same memory region. In the following example, dimension 1 of the input tensor is reversed. Dimension 0 is traversed with a parallel loop.

    func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> {
      %c0 = index.constant 0
      %c1 = index.constant 1
      %c2 = index.constant 2
      %c3 = index.constant 3
      %output = memref.alloc() : memref<2x3xf32>
      scf.parallel (%index) = (%c0) to (%c2) step (%c1) {
        // Create subviews for working input and output slices
        %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>>
        %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
        // Copy the input slice into this temporary buffer. This intermediate
        // copy is unnecessary, but is used for illustration purposes.
        %temp = memref.alloc() : memref<1x3xf32>
        memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32>
        // Copy temporary buffer into output slice
        memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
        scf.reduce
      }
      return %output : memref<2x3xf32>
    }

The patch submitted here prevents %temp = memref.alloc() : memref<1x3xf32> from being hoisted when the containing op is scf.parallel or scf.forall. A new op trait called HasParallelRegion is introduced and assigned to these two ops to indicate that their regions have parallel execution semantics. @joker-eph @ftynse @nicolasvasilache @sabauma (2 years ago)
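The hoisting guard can be modeled in a few lines of Python. This is a toy sketch, not the pass itself: it assumes the allocation has no operands defined inside the loops and ignores every other legality check the real pass performs. It only shows the rule this commit adds: hoisting walks outward through sequential loops and stops at the first op whose region has parallel semantics. The function hoisting_levels and the op-name sets are hypothetical.

```python
from typing import Sequence

# Ops whose regions have parallel execution semantics; per the commit these two
# carry the new HasParallelRegion trait.
PARALLEL_REGION_OPS = {"scf.parallel", "scf.forall"}
# Illustrative set of sequential loops an allocation may be hoisted out of.
SEQUENTIAL_LOOP_OPS = {"scf.for"}


def hoisting_levels(enclosing_ops: Sequence[str]) -> int:
    """How many enclosing loops an allocation may be hoisted out of.

    enclosing_ops lists the ops around the allocation, innermost first.
    """
    levels = 0
    for op in enclosing_ops:
        if op in PARALLEL_REGION_OPS:
            break  # each parallel iteration needs its own buffer
        if op not in SEQUENTIAL_LOOP_OPS:
            break  # not a loop: nothing further to hoist out of in this model
        levels += 1
    return levels


# %temp from the example above sits directly inside scf.parallel, so it stays put.
print(hoisting_levels(["scf.parallel", "func.func"]))  # 0
```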
[mlir] use strided layout in structured codegen-related tests All relevant operations have been switched to primarily use the strided layout, but still support the affine map layout. Update the relevant tests to use the strided format instead for compatibility with how ops now print by default. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D134045 (3 years ago)
[mlir][test] Fix filecheck annotation typos (#92897) Moved fixes for mlir from https://github.com/llvm/llvm-project/pull/91854, plus a few additional ones in a second commit. --------- Co-authored-by: klensy <nightouser@gmail.com> (2 years ago)
[mlir] [bufferize] fix bufferize deallocation error in nest symbol table (#98476) In nested symbol tables, the dealloc_helper function generated by the lower-deallocations pass was incorrectly positioned, causing calls to it to fail. This patch fixes this issue. (1 year ago)
[mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964) One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as:

    Yield operand #0 is not equivalent to the corresponding iter bbArg
    scf.yield %0 : tensor<5xf32>

One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place. This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator. Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit. (2 years ago)
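The ordering idea behind the new heuristic can be sketched in Python. Starting from the terminator's operands, the sketch walks the reverse use-def chain, staying inside the region, and schedules the ops it reaches first. Visiting both the preferred ops and the remaining ops in bottom-up order is an assumption of this sketch, as are the names analysis_order and Op; it is not One-Shot Bufferize's implementation.

```python
from typing import List, Sequence, Tuple

# One op in the region: (result name, operand names), listed in program order.
Op = Tuple[str, Sequence[str]]


def analysis_order(ops: Sequence[Op], terminator_operands: Sequence[str]) -> List[str]:
    """Order ops for analysis: ops feeding the terminator first, then the rest."""
    producer = {result: index for index, (result, _) in enumerate(ops)}
    reached = set()
    worklist = list(terminator_operands)
    while worklist:
        index = producer.get(worklist.pop())
        # Block arguments and values defined outside the region end the walk.
        if index is None or index in reached:
            continue
        reached.add(index)
        worklist.extend(ops[index][1])  # keep walking up the use-def chain
    preferred = sorted(reached, reverse=True)  # bottom-up within the preferred set
    rest = [i for i in reversed(range(len(ops))) if i not in reached]
    return [ops[i][0] for i in preferred + rest]


# Loop body: %arg is the iter_arg; %w1 and %w2 both write into it, but only
# %w1 (via %r) is fed into the terminator.
body = [("%w1", ["%arg"]), ("%r", ["%w1"]), ("%w2", ["%arg"])]
print(analysis_order(body, ["%r"]))  # ['%r', '%w1', '%w2']
```

Plain bottom-up order would analyze %w2 before %w1; the terminator-first order gives the write that actually reaches scf.yield the first chance to bufferize in-place.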
[mlir][bufferization] skip empty tensor elimination if they have different element type (#96998) In the original implementation, empty tensor elimination would add a tensor.cast and eliminate the tensor even if the two tensors had different element types (e.g., f32 and bf16). This patch adds an element-type check and skips the elimination if the element types differ. (1 year ago)
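The decision this fix introduces can be modeled as a small Python function. tensor.cast only converts between compatible shapes with the same element type, so a mismatch in element type means elimination must be skipped. The TensorType class and elimination_plan are hypothetical names for this sketch, which also assumes the shapes are already cast-compatible.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorType:
    shape: Tuple[Optional[int], ...]  # None models a dynamic dimension ("?")
    element_type: str


def elimination_plan(empty: TensorType, replacement: TensorType) -> str:
    """Decide how (or whether) to replace a tensor.empty with another value.

    Shapes are assumed to be cast-compatible; only element types are checked.
    """
    if empty.element_type != replacement.element_type:
        # tensor.cast cannot change the element type, so skip elimination.
        return "skip"
    if empty != replacement:
        return "replace-with-cast"  # same element type, different shape info
    return "replace"


f32 = TensorType((4, 8), "f32")
print(elimination_plan(f32, TensorType((4, 8), "bf16")))  # skip
```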
[mlir][bufferization] MaterializeInDestinationOp: Support memref destinations (#68074) Extend bufferization.materialize_in_destination to support memref destinations. This op can now be used to indicate that a tensor computation should materialize in a given buffer (that may have been allocated by another component/runtime). The op still participates in "empty tensor elimination". Example:

    func.func @test(%out: memref<10xf32>) {
      %t = tensor.empty() : tensor<10xf32>
      %c = linalg.generic ... outs(%t: tensor<10xf32>) -> tensor<10xf32>
      bufferization.materialize_in_destination %c in restrict writable %out : (tensor<10xf32>, memref<10xf32>) -> ()
      return
    }

After "empty tensor elimination", the above IR can bufferize without an allocation:

    func.func @test(%out: memref<10xf32>) {
      linalg.generic ... outs(%out: memref<10xf32>)
      return
    }

This change also clarifies the meaning of the restrict unit attribute on bufferization.to_tensor ops. (2 years ago)
[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619) This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs, as this should be handled entirely by the ownership-based-buffer-deallocation pass going forward. This means the allow-return-allocs pass option now defaults to true and create-deallocs defaults to false; both options, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. A new allow-return-allocs-from-loops option is added as a temporary workaround for some bufferization limitations. (2 years ago)
[MLIR] Generalize expand_shape to take shape as explicit input (#90040) This patch generalizes tensor.expand_shape and memref.expand_shape to consume the output shape as a list of SSA values. This enables us to implement generic reshape operations with dynamic shapes using collapse_shape/expand_shape pairs. The output_shape input to expand_shape follows the static/dynamic representation that's also used in tensor.extract_slice. Differential Revision: https://reviews.llvm.org/D140821 --------- Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com> Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com> (2 years ago)
[mlir][bufferize] Infer memory space in all bufferization patterns This change updates all remaining bufferization patterns (except for scf.while) and the remaining bufferization infrastructure to infer the memory space whenever possible instead of falling back to "0". (If a default memory space is set in the bufferization options, we still fall back to that value if the memory space could not be inferred.) Differential Revision: https://reviews.llvm.org/D128423 (3 years ago)
Bufferization with ControlFlow Asserts (#95868) Fixed incorrect bufferization interaction with cf.assert:
- reordered bufferization condition checking
- fixed hasNeitherAllocateNorFreeSideEffect checking bug
- implemented memory interface for cf.assert
--------- Co-authored-by: McCowan Zhang <mccowan.z@ssi.samsung.com> (1 year ago)
[mlir][bufferization] Fix failing lit test The checks were too strict, and by the time the patch was submitted, the output of the test had changed. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D142969 (3 years ago)
[mlir][bufferization] Allow cyclic function graphs without tensors (#68632) Cyclic function call graphs are generally not supported by One-Shot Bufferize. However, they can be allowed when a function does not have tensor arguments or results, because it is then no longer necessary for the callee to be bufferized before the caller. (2 years ago)
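A toy Python model shows why tensor-free functions break the ordering constraint. Only calls to callees with tensor arguments or results force "callee before caller"; edges to tensor-free callees are dropped, so a cycle that passes through such a function no longer blocks ordering. This uses the standard-library graphlib.TopologicalSorter (Python 3.9+); bufferization_order and its parameters are hypothetical names, and this is a sketch of the rule, not One-Shot Bufferize's code.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional


def bufferization_order(calls: Dict[str, List[str]],
                        has_tensors: Dict[str, bool]) -> Optional[List[str]]:
    """Return an order with callees before callers, or None on an unsupported cycle.

    calls maps each caller to its callees; has_tensors must cover every function.
    """
    sorter = TopologicalSorter()
    for caller, callees in calls.items():
        # Only callees with tensor arguments/results must be bufferized first.
        must_come_first = [c for c in callees if has_tensors[c]]
        sorter.add(caller, *must_come_first)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# a and b call each other, but b has no tensors, so a -> b imposes no constraint.
print(bufferization_order({"a": ["b"], "b": ["a"]}, {"a": True, "b": False}))  # ['a', 'b']
```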
[mlir][Bufferization] castOrReallocMemRefValue: Use BufferizationOptions (#89175) This allows configuring both the op used for allocation and the op used for copying memrefs. It also changes the default behavior, because the default allocation in BufferizationOptions creates memref.alloc with alignment = 64, whereas we used to create memref.alloca without any alignment before. Fixes:

    // TODO: Use alloc/memcpy callback from BufferizationOptions if called via
    // BufferizableOpInterface impl of ToMemrefOp.

(2 years ago)
[mlir][bufferization] Make TensorCopyInsertionPass a test pass TensorCopyInsertion should not have been exposed as a pass. This was a flaw in the original design. It is a preparation step for bufferization, and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and the actual rewrite to MemRef ops. Therefore, even if broken down into two separate steps internally, they should be exposed as a single pass. This change affects the sparse compiler, which uses TensorCopyInsertionPass. A new SparsificationAndBufferizationPass is added to replace all passes in the sparse tensor pipeline from TensorCopyInsertionPass until the actual bufferization (rewrite to memref/non-tensor). It is generally unsafe to run arbitrary passes in between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains along tensor ops. Differential Revision: https://reviews.llvm.org/D138915 (3 years ago)
[MLIR][Bufferization] Choose default memory space in tensor copy insertion (#88500) Tensor copy insertion currently uses memory_space = 0 when creating a tensor copy using alloc_tensor. This memory space should instead be the default memory space provided in bufferization options. (2 years ago)
[mlir][Bufferization] Add support for controlled bufferization of alloc_tensor (#70957) This revision adds support in transform.structured.bufferize_to_allocation for bufferizing bufferization.alloc_tensor() ops. This is useful as a means to control the bufferization of tensor.empty ops that have previously been rewritten by bufferization.empty_tensor_to_alloc_tensor. (2 years ago)