File | Last commit | Last updated
[PYTHON] Added frontend to print sass using turingas disasm.py (#109) | 4 years ago
[FRONTEND] use standard plugin interface for CUDA (#2887) | 2 years ago
[PROTON] Intra kernel profiling (#7258) | 9 months ago
  ### Instrumentation & Runtime
  - Introduce a dedicated instrumentation mode
    - proton.start(..., mode="instrumentation", ...)
  - Introduce both high- and low-level scope APIs
    - For Gluon DSL: pl.scope, pl.enter_scope, and pl.exit_scope. Profiling API for Triton DSL is disabled by default.
    - For TTGIR: proton.record start and proton.record end
  - Inject profiling buffers for each Triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host
  ### Proton Dialect & Lowering
  - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy.
  ### Tracing
  - proton.start(..., data="trace", ...) is supported for both fine- and coarse-grained events.
  ---------
  Co-authored-by: Yuanwei Fang <fywkevin@gmail.com>
  Co-authored-by: Yuanwei Fang <fywkevin@fb.com>
  Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>
  Co-authored-by: peterbell10 <peterbell10@openai.com>
  Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com>
  Co-authored-by: Corbin Robeck <robeck@meta.com>
  Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com>
  Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com>
  Co-authored-by: Shawn Zhong <github@shawnzhong.com>
  Co-authored-by: Shawn Zhong <shawnzhong@fb.com>
  Co-authored-by: 鐘天楽 <a844379248@icloud.com>
RFC [python] Rename config.py > knobs.py (#6641) | 1 year ago
  In https://github.com/triton-lang/triton/pull/6467 I didn't realize that triton.Config exists. Having both triton.config and triton.Config is confusing. On the one hand, we could rename triton.Config to triton.AutotunerConfig so that it's more descriptive, but that would come at the cost of a non-trivial API name change. Instead, since triton.config is so new, I think it's more reasonable to rename that module. Qualitatively, I've denoted the different variables as 'knobs', so renaming the module to knobs seems reasonable.
  Of note: this is an RFC, so if this seems silly / pedantic feel free to shut this down (and close the PR).
fix(bug): allow using npu-smi info to get the device info | 2 months ago
  Co-authored-by: 刘风昇 <liufengsheng2@huawei.com>
  # message auto-generated for no-merge-commit merge:
  !1338 merge acl2 into main
  fix(bug): allow using npu-smi info to get the device info
  Created-by: meloliu12327
  Commit-by: 刘风昇
  Merged-by: ascend-robot
  Description: Add a way to obtain the hardware model information using npu-smi info.
  See merge request: Ascend/triton-ascend!1338
fix make_get_num_algos_def bug in link.py (#3330) | 2 years ago
[Blackwell][TUTORIALS] Add tutorial 10-block-scaled-matmul.py (#5813) | 1 year ago
  This tutorial demos Triton support for block scaled matrix multiply on Blackwell's 5th generation tensor core with low precision FP4 and FP8 datatypes. Planned followups include optimized TMA loads for block scale factors, and mixed precision support.
  Additional changes
  * Moves MX dtype helper classes to triton/tools/mxfp.py for use in tutorials as well as test code.
  @ThomasRaoux @pawelszczerbuk @masahi @mbrookhart @binarybana
[triton_kernels] use static TMAs in matmul_ogs.py (#7803) | 9 months ago
  Co-authored-by: apgoucher <apgoucher@openai.com>
Add support for padding option to TMA loads (#7993) | 9 months ago
  Closes #7364
  builds on top of #7364 from @jhapradip and addresses remaining comments, as well as implements the padding option in the fallback RewriteTensorDescriptorToPointer path.
  - support for passing padding = "nan" on TMA descriptor creation for both host and device TMAs
  - forwards this argument down to TMA descriptor creation
  - implement the NaN other value in the TMA fallback path
  ---------
  Co-authored-by: Pradip Jha <pradipjha@hotmail.com>