文件最后提交记录最后更新时间
[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - proton.start(..., mode="instrumentation", ...) - Introduce both high- and low- level scope APIs - For Gluon DSL: pl.scope, pl.enter_scope, and pl.exit_scope. Profiling API for Triton DSL is disabled by default. - For TTGIR: proton.record start and proton.record end - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - proton.start(..., data="trace", ...) is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>9 个月前
[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - proton.start(..., mode="instrumentation", ...) - Introduce both high- and low- level scope APIs - For Gluon DSL: pl.scope, pl.enter_scope, and pl.exit_scope. Profiling API for Triton DSL is disabled by default. - For TTGIR: proton.record start and proton.record end - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - proton.start(..., data="trace", ...) is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>9 个月前
[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - proton.start(..., mode="instrumentation", ...) - Introduce both high- and low- level scope APIs - For Gluon DSL: pl.scope, pl.enter_scope, and pl.exit_scope. Profiling API for Triton DSL is disabled by default. - For TTGIR: proton.record start and proton.record end - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - proton.start(..., data="trace", ...) is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>9 个月前