triton-ascend/third_party/proton/test · Ascend/triton-ascend - AtomGit

GGitHub[PROTON] Actively initialize main thread context (#7884 )

dd1bbc52创建于 2025年8月18日历史提交

文件	最后提交记录	最后更新时间
examples	[Proton][AMD] Fix peak TB/s and support gfx950 specs (#7175) Using `2 * bus_width * memory_clock_rate * 1e3 / 8` as the formula cannot deduce the proper max TB/s on AMD devices; the method is more involved on AMD. For now we just hardcode the TB/s result to get correct result and unblock supporting of gfx950.	11 个月前
unittest	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
CMakeLists.txt	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
helper.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
helper_kernels.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
override_helper.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
test_api.py	[PROTON] Use context vars for hook name and id (#7858)	9 个月前
test_cmd.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
test_instrumentation.py	[PROTON] Use context vars for hook name and id (#7858)	9 个月前
test_lib.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
test_override.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前
test_profile.py	[PROTON] Actively initialize main thread context (#7884) Initialize mainContextStack in the ShadowContextSource constructor and mark the thread context as the main for the current session. This guarantees a single context even when the actual main thread doesn’t encounter any scoped regions. Previously, if the main thread never entered a scope, the first worker thread to encounter a scope could “win” the race to create the main context stack. That led to an incorrect context tree and unstable ordering. Before (incorrect): threadA_0 - threadB_0 - threadB_1 threadA_0 threadA_1 After (correct): threadB_0 threadB_1 threadA_0 threadA_1	9 个月前
test_viewer.py	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 个月前