File | Last commit | Last updated
fix(torch/cp): use sbnd format before all2all Co-authored-by: clc2025<chenlucong@huawei.com> # message auto-generated for no-merge-commit merge: !3282 merge fixbug_ulysses_tnd into master fix(torch/cp): use sbnd format before all2all Created-by: clc2025 Commit-by: clc2025 Merged-by: ascend-robot Description: fixbug for ulysses tnd See merge request: Ascend/MindSpeed!3282 | 3 months ago
!2716 fix: async_log_allreduce Merge pull request !2716 from 邓佳/core_r0.12.1_fix_v3 | 9 months ago
Add offline pad_data Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !2938 merge revise_preprocess_data into master Add offline pad_data Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: Adds offline preprocessing for pack datasets: data can be padded in advance to a multiple of 2\*CP, which saves the time spent on padding when the CP feature is used online * Accuracy differs somewhat from the non-offline-padding version ![2.png](https://raw.gitcode.com/user-images/assets/7404741/07e65a36-a1cd-4f79-ab62-832febdfa052/2.png '2.png') * With Llama2-7b, single node, 16k, GBS=8, performance improves by 4.8% ![性能提升.png](https://raw.gitcode.com/user-images/assets/7404741/e38f0a9b-e0b2-498a-a42b-c8b59ec05e87/性能提升.png '性能提升.png') See merge request: Ascend/MindSpeed!2938 | 6 months ago
perf(verl ckpt): ckpt load and save acceleration Co-authored-by: 李鸣沼<lmztju@126.com> # message auto-generated for no-merge-commit merge: !3074 merge verl_load_and_save_ckpt into master perf(verl ckpt): ckpt load and save acceleration Created-by: lmztju Commit-by: 李鸣沼;l30057177 Merged-by: ascend-robot Description: **Test scenario 1:** verl + megatron backend, dapo-qwen3-30b, 910A3, two nodes, ckpt load and save, local storage ![image.png](https://raw.gitcode.com/user-images/assets/7404741/cea9c052-2550-4d2c-9dd7-bcfe0cb33d99/image.png 'image.png') See merge request: Ascend/MindSpeed!3074 | 5 months ago
docs:fix docs/zh mistakes Co-authored-by: Keilo_W<wangkaiyu11@h-partners.com> # message auto-generated for no-merge-commit merge: !3318 merge master into master docs:fix docs/zh mistakes Created-by: Keilo_W Commit-by: Keilo_W Merged-by: ascend-robot Description: Fixed some comments and code that had been changed by mistake See merge request: Ascend/MindSpeed!3318 | 2 months ago
fix: fix the alltoall_seq token dispatcher Nan bug Co-authored-by: guofanfeng<guofanfeng1@huawei.com> # message auto-generated for no-merge-commit merge: !3249 merge bug_fix into master fix: fix the alltoall_seq token dispatcher Nan bug Created-by: guofanfeng23 Commit-by: guofanfeng Merged-by: ascend-robot Description: fix the alltoall_seq token dispatcher Nan bug https://wiki.huawei.com/domains/152732/wiki/307991/WIKI2026020210028614 See merge request: Ascend/MindSpeed!3249 | 3 months ago
fix hccl for mla Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: merge hccl_fix into master fix hccl for mla Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: 1. remove the constraint of group_query_attention in hccl-group-buffer-adaptive for diverse CP conditions 2. add validation for op-cal-tflops See merge request: Ascend/MindSpeed!2855 | 8 months ago
docs:fix docs/zh mistakes Co-authored-by: Keilo_W<wangkaiyu11@h-partners.com> # message auto-generated for no-merge-commit merge: !3318 merge master into master docs:fix docs/zh mistakes Created-by: Keilo_W Commit-by: Keilo_W Merged-by: ascend-robot Description: Fixed some comments and code that had been changed by mistake See merge request: Ascend/MindSpeed!3318 | 2 months ago
fix: TE + recompute_norm refix Co-authored-by: yulelanmei<huangyijie8@huawei.com> # message auto-generated for no-merge-commit merge: !3325 merge master into master fix: TE + recompute_norm refix Created-by: yulelanmei Commit-by: yulelanmei Merged-by: ascend-robot Description: What this PR does / why we need it? refix TE + recompute_norm Does this PR introduce any user-facing change? No How was this patch tested? Test using MindSpeed-Core ST cases and LLM+core case See merge request: Ascend/MindSpeed!3325 | 2 months ago
quant fp8 optimizer | 6 months ago
disttrain intervl2 out-of-the-box adaptation Co-authored-by: gcw_amOUPDs9<fuyuefeng@huawei.com> # message auto-generated for no-merge-commit merge: !2915 merge master into master disttrain intervl2 out-of-the-box adaptation Created-by: gcw_amOUPDs9 Commit-by: gcw_amOUPDs9 Merged-by: ascend-robot Description: disttrain intervl2 out-of-the-box adaptation disttrain intervl2 out-of-the-box adaptation See merge request: Ascend/MindSpeed!2915 | 6 months ago
Low-precision optimizer debug Co-authored-by: w30064656<wangzhuangzhuang8@h-partners.com> # message auto-generated for no-merge-commit merge: !3105 merge master into master Low-precision optimizer debug Created-by: w30064656 Commit-by: w30064656 Merged-by: ascend-robot Description: Fix for mxfp8 performance degradation in the low-precision optimizer ![](http://image.huawei.com/tiny-lts/v1/images/mdstorm/350bca8b9f463dcdb58e8844b22194c8_2081x815.png) See merge request: Ascend/MindSpeed!3105 | 5 months ago
!2635 Security: file path validation/permissions Merge pull request !2635 from glhyy/secmaster | 10 months ago
NoOpLayers support additional parameters Co-authored-by: JialiZheng<jializheng@huawei.com> # message auto-generated for no-merge-commit merge: !3234 merge master into master NoOpLayers support additional parameters Created-by: JialiZheng1 Commit-by: JialiZheng Merged-by: ascend-robot Description: NoOpLayers support additional parameters See merge request: Ascend/MindSpeed!3234 | 3 months ago
feat: add w4a16 quant Co-authored-by: xusiyang<xusiyang2@huawei.com> # message auto-generated for no-merge-commit merge: !3334 merge master into master feat: add w4a16 quant Created-by: weixin_44492126 Commit-by: xusiyang;weixin_44492126 Merged-by: ascend-robot Description: What this PR does / why we need it? QAT supports W4A16 fake quantization Does this PR introduce any user-facing change? For details, see docs/zh/features/qat_quant.md How was this patch tested? Fake quantization is enabled when the argument "--qat-scheme w4a16-mxf4" is added See merge request: Ascend/MindSpeed!3334 | 2 months ago
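feat(ut/qos/torch): add UTs and fix bugs missed in the code Co-authored-by: Klayyy<wanglei886@h-partners.com> # message auto-generated for no-merge-commit merge: !3309 merge master into master feat(ut/qos/torch): add UTs and fix bugs missed in the code Created-by: Klayyy Commit-by: Klayyy Merged-by: ascend-robot Description: 1. Add feature UTs for the AI QOS feature 2. While adding UTs, self-review the code and fix bugs 2.1 Rename the call to torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options() 2.2 Improve the raise ValueError messages in qos_feature.py 2.3 In qos.py, in the part that assigns priorities for the minimum-conflict combination, remove duplicated code, remove unused imports, and add a missing comma in _PARALLEL_TYPES 2.4 In qos.py, the handling that should apply to sdma qos mistakenly used roce 3. Add H2D QOS use of the PCIE asynchronous channel; add a new set_h2d_qos interface on top of the DCMI interface for Python to call 4. Update the aiQos Readme on calling the DCMI interface, and add the build method for the DCMI interface SO See merge request: Ascend/MindSpeed!3309 | 2 months ago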
feature(fp8): te checkpoint Co-authored-by: Muu<koimuu@163.com> # message auto-generated for no-merge-commit merge: !3162 merge feature_checkpoint into master feature(fp8): te checkpoint Created-by: Muuyo Commit-by: Muu Merged-by: ascend-robot Description: 1. Introduce te checkpoint to eliminate redundant quantization operations during recomputation 2. refactor(blockwise): remove the 128*128 blockwise strategy, keeping the 1 * 128|128 * 128 strategy as its replacement 3. perf(hif8): remove redundant casts 4. fix(delayed): fix the delayed algorithm 5. refactor(recipe 2x): refactor data storage and access for the blockwise and mxfp8 strategies to simplify later operator adaptation 6. Replace string literals with enums Verification report: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI202601139775970 See merge request: Ascend/MindSpeed!3162 | 4 months ago
feat: fp8_reuse_quant_w Co-authored-by: Jia_Austin<dengjia6@huawei.com> # message auto-generated for no-merge-commit merge: !3358 merge feat_fp8_reuse_quant_w into master feat: fp8_reuse_quant_w Created-by: Jia_Austin Commit-by: Jia_Austin Merged-by: ascend-robot Description: What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed!3358 | 2 months ago
!359 change ascendspeed to mindspeed Merge pull request !359 from 邓佳/master | 1 year ago
feature(fp8): te checkpoint Co-authored-by: Muu<koimuu@163.com> # message auto-generated for no-merge-commit merge: !3162 merge feature_checkpoint into master feature(fp8): te checkpoint Created-by: Muuyo Commit-by: Muu Merged-by: ascend-robot Description: 1. Introduce te checkpoint to eliminate redundant quantization operations during recomputation 2. refactor(blockwise): remove the 128*128 blockwise strategy, keeping the 1 * 128|128 * 128 strategy as its replacement 3. perf(hif8): remove redundant casts 4. fix(delayed): fix the delayed algorithm 5. refactor(recipe 2x): refactor data storage and access for the blockwise and mxfp8 strategies to simplify later operator adaptation 6. Replace string literals with enums Verification report: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI202601139775970 See merge request: Ascend/MindSpeed!3162 | 4 months ago
!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824 | 1 year ago
!2552 [fix bug] hccl_buff cannt be zero Merge pull request !2552 from 李宝奎/master_wtd | 10 months ago
!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824 | 1 year ago
!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824 | 1 year ago
!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824 | 1 year ago
fixbug for auto_settings Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !3003 merge auto_settings into master fixbug for auto_settings Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: 1. fixbug for auto_settings 2. add readme See merge request: Ascend/MindSpeed!3003 | 5 months ago
!1216 Security review fixes Merge pull request !1216 from huangzhenyu/master-icsl-no-ootb | 1 year ago