MindSpeed/mindspeed/core · Ascend/MindSpeed - AtomGit

File	Last commit	Last updated
context_parallel	fix(torch/cp): use sbnd format before all2all Co-authored-by: clc2025<chenlucong@huawei.com> # message auto-generated for no-merge-commit merge: !3282 merge fixbug_ulysses_tnd into master fix(torch/cp): use sbnd format before all2all Created-by: clc2025 Commit-by: clc2025 Merged-by: ascend-robot Description: fixbug for ulysses tnd See merge request: Ascend/MindSpeed!3282	3 months ago
data_parallel	!2716 fix: async_log_allreduce Merge pull request !2716 from 邓佳/core_r0.12.1_fix_v3	9 months ago
datasets	Add offline pad_data Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !2938 merge revise_preprocess_data into master Add offline pad_data Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: Adds offline preprocessing for packed datasets: data can be padded in advance to a multiple of 2*CP, which saves the padding time when the CP feature is used online. Accuracy differs somewhat from the non-offline padding version. ![2.png](https://raw.gitcode.com/user-images/assets/7404741/07e65a36-a1cd-4f79-ab62-832febdfa052/2.png '2.png') * Llama2-7b, single node, 16k, GBS=8: 4.8% performance improvement ![性能提升.png](https://raw.gitcode.com/user-images/assets/7404741/e38f0a9b-e0b2-498a-a42b-c8b59ec05e87/性能提升.png '性能提升.png') See merge request: Ascend/MindSpeed!2938	6 months ago
dist_checkpointing	perf(verl ckpt): ckpt load and save acceleration Co-authored-by: 李鸣沼<lmztju@126.com> # message auto-generated for no-merge-commit merge: !3074 merge verl_load_and_save_ckpt into master perf(verl ckpt): ckpt load and save acceleration Created-by: lmztju Commit-by: 李鸣沼;l30057177 Merged-by: ascend-robot Description: Test scenario 1: verl + megatron backend, dapo-qwen3-30b, loading and saving ckpt on two 910A3 machines with local storage ![image.png](https://raw.gitcode.com/user-images/assets/7404741/cea9c052-2550-4d2c-9dd7-bcfe0cb33d99/image.png 'image.png') See merge request: Ascend/MindSpeed!3074	5 months ago
distributed	docs:fix docs/zh mistakes Co-authored-by: Keilo_W<wangkaiyu11@h-partners.com> # message auto-generated for no-merge-commit merge: !3318 merge master into master docs:fix docs/zh mistakes Created-by: Keilo_W Commit-by: Keilo_W Merged-by: ascend-robot Description: Fixed some comments and code that had been changed by mistake See merge request: Ascend/MindSpeed!3318	2 months ago
fusions	fix: fix the alltoall_seq token dispatcher Nan bug Co-authored-by: guofanfeng<guofanfeng1@huawei.com> # message auto-generated for no-merge-commit merge: !3249 merge bug_fix into master fix: fix the alltoall_seq token dispatcher Nan bug Created-by: guofanfeng23 Commit-by: guofanfeng Merged-by: ascend-robot Description: fix the alltoall_seq token dispatcher Nan bug https://wiki.huawei.com/domains/152732/wiki/307991/WIKI2026020210028614 See merge request: Ascend/MindSpeed!3249	3 months ago
hccl_buffer	fix hccl buffer errors for verl cases Co-authored-by: quancs001<quancs@qq.com> # message auto-generated for no-merge-commit merge: !3478 merge fix_hccl_buffer_for_verl into master fix hccl buffer errors for verl cases Created-by: quancs001 Commit-by: quancs001 Merged-by: ascend-robot Description: What this PR does / why we need it? When running RL exps with verl, several errors are raised, e.g.: 1. `megatron.training.get_args` raises an exception 2. the args for hccl_buffer with ";" could not be parsed by hydra, and the error LexerNoViableAltException is raised. This PR is proposed to solve the errors. Does this PR introduce any user-facing change? No. How was this patch tested? The code is tested and verified locally. See merge request: Ascend/MindSpeed!3478	15 days ago
megatron_basic	docs:fix docs/zh mistakes Co-authored-by: Keilo_W<wangkaiyu11@h-partners.com> # message auto-generated for no-merge-commit merge: !3318 merge master into master docs:fix docs/zh mistakes Created-by: Keilo_W Commit-by: Keilo_W Merged-by: ascend-robot Description: Fixed some comments and code that had been changed by mistake See merge request: Ascend/MindSpeed!3318	2 months ago
memory	fix: TE + recompute_norm refix Co-authored-by: yulelanmei<huangyijie8@huawei.com> # message auto-generated for no-merge-commit merge: !3325 merge master into master fix: TE + recompute_norm refix Created-by: yulelanmei Commit-by: yulelanmei Merged-by: ascend-robot Description: What this PR does / why we need it? refix TE + recompute_norm Does this PR introduce any user-facing change? No How was this patch tested? Test using MindSpeed-Core ST cases and LLM+core case See merge request: Ascend/MindSpeed!3325	2 months ago
models	quant fp8 optimizer	6 months ago
multi_modal	disttrain intervl2 out-of-the-box adaptation Co-authored-by: gcw_amOUPDs9<fuyuefeng@huawei.com> # message auto-generated for no-merge-commit merge: !2915 merge master into master disttrain intervl2 out-of-the-box adaptation Created-by: gcw_amOUPDs9 Commit-by: gcw_amOUPDs9 Merged-by: ascend-robot Description: disttrain intervl2 out-of-the-box adaptation See merge request: Ascend/MindSpeed!2915	6 months ago
optimizer	feat: SwapMuon add save/load ckpt support Co-authored-by: JialiZheng<jializheng@huawei.com> # message auto-generated for no-merge-commit merge: !3518 merge master into master feat: SwapMuon add save/load ckpt support Created-by: JialiZheng1 Commit-by: JialiZheng Merged-by: ascend-robot Description: SwapMuon add save/load ckpt support RFC: https://gitcode.com/Ascend/MindSpeed/issues/164 See merge request: Ascend/MindSpeed!3518	11 hours ago
performance	!2635 Security: file path validation/permissions Merge pull request !2635 from glhyy/secmaster	10 months ago
pipeline_parallel	feat: add custom pp layout Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !3496 merge add_pp_layout into master feat: add custom pp layout Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: Adds the pipeline-model-parallel-layout feature, which supports customizing the layer layout of each PP stage. Verification link: https://wiki.huawei.com/domains/137239/wiki/268925/WIKI2026052611233549 issue: https://gitcode.com/Ascend/MindSpeed/issues/166 See merge request: Ascend/MindSpeed!3496	9 hours ago
qat	add w8a16 quant Co-authored-by: wangjunhang<wangjunhang7@huawei.com> # message auto-generated for no-merge-commit merge: !3460 merge Dev_W8A16 into master add w8a16 quant Created-by: goodflower9 Commit-by: wangjunhang Merged-by: ascend-robot Description: What this PR does / why we need it? This PR adds W8A16 MXFP8 QAT support based on the existing QAT flow. Does this PR introduce any user-facing change? Yes. Users can enable W8A16 MXFP8 QAT with: ``` --qat-scheme w8a16-mxfp8 ``` Related doc: ``` docs/zh/features/qat_quant.md ``` How was this patch tested? Enable fake quantization by adding the parameter `--qat-scheme w8a16-mxfp8` https://wiki.huawei.com/domains/76578/wiki/233229/WIKI2026051211063362 See merge request: Ascend/MindSpeed!3460	3 days ago
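qos	feat(ut/qos/torch): add UTs and fix bugs missed in the code Co-authored-by: Klayyy<wanglei886@h-partners.com> # message auto-generated for no-merge-commit merge: !3309 merge master into master feat(ut/qos/torch): add UTs and fix bugs missed in the code Created-by: Klayyy Commit-by: Klayyy Merged-by: ascend-robot Description: 1. Add feature UTs for the AI QOS feature 2. Self-review the code while adding the UTs and fix bugs 2.1 Correct the call name of torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options() 2.2 Improve the ValueError message raised in qos_feature.py 2.3 In qos.py, remove duplicated code in the priority assignment for the minimum-conflict combination, remove unused imports, and add a comma that was missing in _PARALLEL_TYPES 2.4 In qos.py, the handling that should apply to sdma qos mistakenly used roce 3. Add H2D QOS use of the PCIE asynchronous channel; add a new set_h2d_qos interface for the DCMI interface, callable from Python 4. Update the DCMI interface calls in the aiQos Readme and add the build method for the DCMI interface SO See merge request: Ascend/MindSpeed!3309	2 months ago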
tensor_parallel	feature(fp8): te checkpoint Co-authored-by: Muu<koimuu@163.com> # message auto-generated for no-merge-commit merge: !3162 merge feature_checkpoint into master feature(fp8): te checkpoint Created-by: Muuyo Commit-by: Muu Merged-by: ascend-robot Description: 1. Introduce te checkpoint to eliminate redundant quantization operations during recomputation 2. refactor(blockwise): remove the 128128 blockwise strategy and keep the 1 128\|128 * 128 strategy as its replacement 3. perf(hif8): remove redundant casts 4. fix(delayed): fix the delayed algorithm 5. refactor(recipe 2x): refactor data access for the blockwise and mxfp8 strategies to simplify later operator adaptation 6. Replace string literals with enums. Verification report: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI202601139775970 See merge request: Ascend/MindSpeed!3162	4 months ago
transformer	feat: add custom pp layout Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !3496 merge add_pp_layout into master feat: add custom pp layout Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: Adds the pipeline-model-parallel-layout feature, which supports customizing the layer layout of each PP stage. Verification link: https://wiki.huawei.com/domains/137239/wiki/268925/WIKI2026052611233549 issue: https://gitcode.com/Ascend/MindSpeed/issues/166 See merge request: Ascend/MindSpeed!3496	9 hours ago
__init__.py	!359 change ascendspeed to mindspeed Merge pull request !359 from 邓佳/master	1 year ago
fp8_utils.py	feat: mxfp8-32x32 quant Co-authored-by: kyle_zhangchi<zhangchi158@huawei.com> # message auto-generated for no-merge-commit merge: !3471 merge feat_mxfp8-32x32 into master feat: mxfp8-32x32 quant Created-by: kyle_zhangchi Commit-by: kyle_zhangchi Merged-by: ascend-robot Description: ## What this PR does / why we need it? Adds an mxfp8-32x32 quantization operator under the Megatron framework to reduce the device memory used by weights ## Does this PR introduce any user-facing change? --fp8-recipe gains a new mxfp8-32x32 option https://gitcode.com/Ascend/MindSpeed/commit/e065cbca6873bfc02661d088b07d90224333e87d?ref=feat_mxfp8-32x32&prId=3471 ## How was this patch tested? Verification document: https://wiki.huawei.com/domains/170864/wiki/367830/WIKI2026051111046509 See merge request: Ascend/MindSpeed!3471	7 days ago
mindspeed_parallel_group.py	!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824	1 year ago
parallel_state.py	!2552 [fix bug] hccl_buff cannt be zero Merge pull request !2552 from 李宝奎/master_wtd	10 months ago
simple_parallel_cfg.py	!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824	1 year ago
singleton_meta.py	!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824	1 year ago
tensor_parallel_y_union_cp.py	!699 2D tensor parallelism Merge pull request !699 from liujianxing/2d_tensor_0824	1 year ago
training.py	fixbug for auto_settings Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !3003 merge auto_settings into master fixbug for auto_settings Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: 1. fixbug for auto_settings 2. add readme See merge request: Ascend/MindSpeed!3003	5 months ago
weight_grad_store.py	!1216 Changes from security review Merge pull request !1216 from huangzhenyu/master-icsl-no-ootb	1 year ago
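
The datasets entry in the listing above describes padding packed data in advance to a multiple of 2*CP. The length arithmetic behind that can be sketched as follows. This is only an illustration, assuming CP denotes the context-parallel size; the helper names `padded_length` and `pad_to_multiple` and the pad value 0 are hypothetical and are not taken from MindSpeed.

```python
from typing import List


def padded_length(seq_len: int, cp_size: int) -> int:
    """Return the smallest length >= seq_len that is a multiple of 2 * cp_size."""
    if seq_len < 0 or cp_size < 1:
        raise ValueError("seq_len must be >= 0 and cp_size must be >= 1")
    multiple = 2 * cp_size
    # Ceiling division written with integer arithmetic only.
    return -(-seq_len // multiple) * multiple


def pad_to_multiple(tokens: List[int], cp_size: int, pad_id: int = 0) -> List[int]:
    """Append pad_id (a hypothetical pad token) until the length is a multiple of 2 * cp_size."""
    target = padded_length(len(tokens), cp_size)
    return tokens + [pad_id] * (target - len(tokens))


if __name__ == "__main__":
    sample = list(range(1, 11))           # 10 tokens
    padded = pad_to_multiple(sample, cp_size=4)
    print(len(padded))                    # 16, the next multiple of 2 * 4
```

Doing this once offline means the same padding does not have to be recomputed for every batch at training time, which is the saving the commit description refers to.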