MindSpeed/mindspeed/te/pytorch/module · Ascend/MindSpeed - AtomGit

File	Last commit	Last updated
ops	feat(fp8): quant mm add. Co-authored-by: Muu <koimuu@163.com>. Merge request !3272 (feature_quant_mm_add into master); Created-by: Muuyo; Commit-by: Muu; Merged-by: ascend-robot. Description: 1. the mxfp8 strategy supports the npu_add_quant_matmul operator; 2. tensorwise supports the mc2 strategy; 3. fix the failure to obtain the device of FP8Tensor2D during backward. Self-test report: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI2026030210253506 See merge request: Ascend/MindSpeed!3272	2 months ago
__init__.py	!2791 [feat!!!]te support v2 Merge pull request !2791 from yangjie/master	8 months ago
checkpoint.py	feature(fp8): te checkpoint. Co-authored-by: Muu <koimuu@163.com>. Merge request !3162 (feature_checkpoint into master); Created-by: Muuyo; Commit-by: Muu; Merged-by: ascend-robot. Description: 1. introduce te checkpoint to eliminate redundant quantization operations during recomputation; 2. refactor(blockwise): remove the 128128 blockwise strategy and keep the 1 128\|128 * 128 strategy as its replacement; 3. perf(hif8): remove redundant casts; 4. fix(delayed): fix the delayed algorithm; 5. refactor(recipe 2x): refactor data storage and access for the blockwise and mxfp8 strategies to simplify later operator adaptation; 6. eliminate string literals, replacing them with enums. Verification report: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI202601139775970 See merge request: Ascend/MindSpeed!3162	4 months ago
grouped_linear.py	feat: mxfp8-32x32 quant. Co-authored-by: kyle_zhangchi <zhangchi158@huawei.com>. Merge request !3471 (feat_mxfp8-32x32 into master); Created-by: kyle_zhangchi; Commit-by: kyle_zhangchi; Merged-by: ascend-robot. Description: What this PR does / why we need it: adds the mxfp8-32x32 quantization operator under the Megatron framework to reduce weight memory usage. Does this PR introduce any user-facing change: --fp8-recipe gains a new mxfp8-32x32 option, see https://gitcode.com/Ascend/MindSpeed/commit/e065cbca6873bfc02661d088b07d90224333e87d?ref=feat_mxfp8-32x32&prId=3471 How was this patch tested: verification document at https://wiki.huawei.com/domains/170864/wiki/367830/WIKI2026051111046509 See merge request: Ascend/MindSpeed!3471	7 days ago
layernorm.py	!2715 adjust LayerNorm patch Merge pull request !2715 from wangyuansheng8/master	9 months ago
layernorm_column_parallel_linear.py	fix:layer norm bias init zero. Co-authored-by: Liz_ <lizhi166@huawei.com>. Merge request !3475 (master into master); Created-by: Liz_; Commit-by: Liz_; Merged-by: ascend-robot. Description: the PR template was left unfilled. See merge request: Ascend/MindSpeed!3475	15 days ago
linear.py	feat: mxfp8-32x32 quant. Co-authored-by: kyle_zhangchi <zhangchi158@huawei.com>. Merge request !3471 (feat_mxfp8-32x32 into master); Created-by: kyle_zhangchi; Commit-by: kyle_zhangchi; Merged-by: ascend-robot. Description: What this PR does / why we need it: adds the mxfp8-32x32 quantization operator under the Megatron framework to reduce weight memory usage. Does this PR introduce any user-facing change: --fp8-recipe gains a new mxfp8-32x32 option, see https://gitcode.com/Ascend/MindSpeed/commit/e065cbca6873bfc02661d088b07d90224333e87d?ref=feat_mxfp8-32x32&prId=3471 How was this patch tested: verification document at https://wiki.huawei.com/domains/170864/wiki/367830/WIKI2026051111046509 See merge request: Ascend/MindSpeed!3471	7 days ago
performance_grouped_linear.py	fix: bugfix for gmm. Co-authored-by: wuweiqiang24 <wuweiqiang11@huawei.com>. Merge request !3487 (bugfix_for_gmm into master); Created-by: wuweiqiang24; Commit-by: wuweiqiang24; Merged-by: ascend-robot. Description: 1. align the partition_dim initialization with the GPU implementation; 2. align the grad filtering of expert weights in TP scenarios with the GPU implementation. See merge request: Ascend/MindSpeed!3487	8 days ago