File | Last commit | Last updated
[NVIDIA] Replace some NVGPU ops with equivalent NVVM ops (part 2) (#7471). This change replaces some NVGPU ops with the corresponding NVVM ops, in line with the earlier discussion in PR #7420. For some ops, such as NVGPU::FenceAsyncSharedOp, there is no corresponding intrinsic, and LLVM will also generate PTX. In the long run, however, I think it is better to hand responsibility for code generation over to LLVM than to hard-code PTX at the NVGPU layer. The ConvertNVVMToLLVMPass has been added to the pipeline and build system so that NVVM ops are correctly lowered to LLVM IR. | 10 months ago
[FRONTEND] make some cuda-specific functions more general; remove triton-translate (#2811) | 2 years ago