| Refactor shared load/store utilities. (#4141)
(This commit message is written about loads, but everything also applies
to stores.)
Prior to this PR we had two ways of loading from shared memory within
the same CTA.
1. LLVM::LoadOp. This supported vector loads, but not predication.
2. TargetInfo::loadShared. This supported predication, but not vector loads.
Loads from shared memory in different CTAs were accessible only through
an NVIDIA-specific header. These did not support predication, and although
they supported vector loads, they worked slightly differently than
LLVM::LoadOp (namely, you had to know you were loading a vector and unwrap
the type before passing it to the function).
This PR reworks all of this. Now:
1. TargetInfo::loadShared and TargetInfo::loadDShared have the same API.
2. They both support predication and vectors, and the vectors work like
LLVM::LoadOp.
3. They share code; they both just emit PTX.
4. Because we're emitting PTX directly from loadDShared, we can delete
the NVIDIA::LoadDSmem op.
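As a rough illustration of points 1-3 (not the code in this PR), here is a
minimal sketch of how a single helper could build the PTX text for both the
same-CTA and cross-CTA cases, with optional predication and vectorization.
The function name buildSharedLoadPtx, its parameters, and the placeholder
operand names (%p, %r0, %addr) are all hypothetical; only the PTX spellings
ld.shared, ld.shared::cluster, .v2/.v4, and the @pred guard are taken from
the PTX ISA.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (NOT the Triton API): returns the PTX text for a load
// from shared memory. The operand names are placeholders for illustration.
//   crossCta   - true selects the cluster-wide (distributed) shared window
//   vec        - number of elements loaded at once: 1, 2, or 4
//   bits       - width of each element in bits, e.g. 32
//   predicated - true guards the instruction with predicate %p
std::string buildSharedLoadPtx(bool crossCta, unsigned vec, unsigned bits,
                               bool predicated) {
  assert((vec == 1 || vec == 2 || vec == 4) && "PTX vectors are v2 or v4");
  std::ostringstream os;
  if (predicated)
    os << "@%p ";
  // Same code path for both cases; only the state space differs.
  os << "ld." << (crossCta ? "shared::cluster" : "shared");
  if (vec > 1)
    os << ".v" << vec;
  os << ".b" << bits << " ";
  // Vector loads take a brace-enclosed list of destination registers.
  if (vec > 1)
    os << "{";
  for (unsigned i = 0; i < vec; ++i) {
    if (i != 0)
      os << ", ";
    os << "%r" << i;
  }
  if (vec > 1)
    os << "}";
  os << ", [%addr];";
  return os.str();
}
```

With this sketch, a scalar same-CTA load and a predicated 4-wide cross-CTA
load differ only in the arguments passed, which is the property the shared
API is meant to have.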
In general I think a logical operation should have either
A. A function createFoo() that emits one or more MLIR operations, or
B. An MLIR op FooOp that lowers to one or more MLIR operations.
But for distributed shmem loads, we had both (A) and (B). This was a
redundant layer of indirection.
This is used in a future linear layouts (LLs) patch. | 1 year ago |