| [PROTON] Intra kernel profiling (#7258)
### Instrumentation & Runtime
- Introduce a dedicated instrumentation mode
  - proton.start(..., mode="instrumentation", ...)
- Introduce both high- and low-level scope APIs
  - For Gluon DSL: pl.scope, pl.enter_scope, and pl.exit_scope.
    The profiling API for the Triton DSL is disabled by default.
  - For TTGIR: proton.record start and proton.record end
- Inject profiling buffers for each Triton kernel at codegen time and
pass them to the Proton runtime so kernels can push data directly from
the device to the host
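
As a rough mental model of the scope APIs and the injected buffer, the following pure-Python sketch may help. It does not use Proton or Triton at all: `Record`, `ProfileBuffer`, `scope`, and `durations` are hypothetical names invented for illustration, the clock is a fake counter, and the drop-when-full policy is an assumption rather than Proton's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Record:
    scope: str      # scope name
    is_start: bool  # True for a start record, False for an end record
    clock: int      # timestamp in arbitrary ticks


class ProfileBuffer:
    """Fixed-capacity list standing in for a per-kernel profiling buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []

    def push(self, record):
        # Assumed policy for this sketch: drop records once the buffer is full.
        if len(self.records) < self.capacity:
            self.records.append(record)


class scope:
    """Context manager that writes one start and one end record."""

    def __init__(self, buffer, name, clock):
        self.buffer, self.name, self.clock = buffer, name, clock

    def __enter__(self):
        self.buffer.push(Record(self.name, True, self.clock()))
        return self

    def __exit__(self, *exc):
        self.buffer.push(Record(self.name, False, self.clock()))
        return False


def durations(buffer):
    """Pair start/end records per scope and sum the elapsed ticks."""
    open_starts, totals = {}, {}
    for rec in buffer.records:
        if rec.is_start:
            open_starts.setdefault(rec.scope, []).append(rec.clock)
        elif open_starts.get(rec.scope):
            start = open_starts[rec.scope].pop()
            totals[rec.scope] = totals.get(rec.scope, 0) + rec.clock - start
    return totals


# Fake clock that advances 10 ticks on every read.
ticks = iter(range(0, 1000, 10))
buf = ProfileBuffer(capacity=16)
with scope(buf, "load", lambda: next(ticks)):
    pass
with scope(buf, "compute", lambda: next(ticks)):
    with scope(buf, "mma", lambda: next(ticks)):
        pass
print(durations(buf))
```

The context-manager form loosely mirrors the high-level scope API, while the explicit start and end records loosely mirror the low-level record pair; in the real system the records are written by the device and decoded on the host.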
### Proton Dialect & Lowering
- Add Proton → ProtonGPU → LLVM pipelines, including passes for
shared-memory allocation, profile scratch allocation, and a few
optimizations for reduced overhead or improved accuracy.
### Tracing
- proton.start(..., data="trace", ...) is supported for both fine- and
coarse-grained events.
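
To illustrate how coarse-grained (whole-kernel) and fine-grained (intra-kernel scope) events can share one timeline, here is a minimal sketch that folds start/end records into Chrome Trace Event style "complete" events. The output format is an assumption made for this example, since this description does not state which trace format Proton writes; `to_trace_events` and the sample records are hypothetical.

```python
import json


def to_trace_events(records, pid=0, tid=0):
    """Turn (name, phase, timestamp) tuples into complete ("X") trace events."""
    stack, events = [], []
    for name, phase, ts in records:
        if phase == "start":
            stack.append((name, ts))
            continue
        # An end record must close the most recently opened scope.
        open_name, begin = stack.pop()
        if open_name != name:
            raise ValueError(f"end of {name!r} does not match open scope {open_name!r}")
        events.append({"name": name, "ph": "X", "ts": begin,
                       "dur": ts - begin, "pid": pid, "tid": tid})
    if stack:
        raise ValueError(f"unclosed scopes: {[n for n, _ in stack]}")
    return events


# One coarse-grained kernel event enclosing two fine-grained scopes.
sample = [
    ("kernel", "start", 0),
    ("load", "start", 2),
    ("load", "end", 5),
    ("mma", "start", 5),
    ("mma", "end", 9),
    ("kernel", "end", 10),
]
trace = {"traceEvents": to_trace_events(sample)}
print(json.dumps(trace, indent=2))
```

Events are emitted in the order their scopes close, so nested scopes appear before the enclosing kernel event; trace viewers that accept this format order by `ts`, not by position in the list.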
---------
Co-authored-by: Yuanwei Fang <fywkevin@gmail.com>
Co-authored-by: Yuanwei Fang <fywkevin@fb.com>
Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>
Co-authored-by: peterbell10 <peterbell10@openai.com>
Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com>
Co-authored-by: Corbin Robeck <robeck@meta.com>
Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com>
Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com>
Co-authored-by: Shawn Zhong <github@shawnzhong.com>
Co-authored-by: Shawn Zhong <shawnzhong@fb.com>
Co-authored-by: 鐘天楽 <a844379248@icloud.com> | 9 months ago |