File | Last commit | Last updated
Update ParamAndGradBuffer init Co-authored-by: JialiZheng <jializheng@huawei.com> # message auto-generated for no-merge-commit merge: !3264 merge master into master Update ParamAndGradBuffer init Created-by: JialiZheng1 Commit-by: JialiZheng Merged-by: ascend-robot Description: Update ParamAndGradBuffer init to fix buffer offset bug. See merge request: Ascend/MindSpeed!3264 | 3 months ago
!2738 [feat.] Support the Megatron Custom FSDP feature. Merge pull request !2738 from yuqi/custom_fsdp | 9 months ago
docs: fix docs/zh mistakes Co-authored-by: Keilo_W <wangkaiyu11@h-partners.com> # message auto-generated for no-merge-commit merge: !3318 merge master into master docs: fix docs/zh mistakes Created-by: Keilo_W Commit-by: Keilo_W Merged-by: ascend-robot Description: Fixed some comments and code that had been changed by mistake. See merge request: Ascend/MindSpeed!3318 | 2 months ago
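Reorder bucket groups; when the model definition order and execution order do not match, this noticeably improves throughput with overlap param gather enabled Co-authored-by: zjuCC <caizeyong1@huawei.com> # message auto-generated for no-merge-commit merge: merge feature/reset_bucket_group into master Reorder bucket groups; when the model definition order and execution order do not match, this noticeably improves throughput with overlap param gather enabled Created-by: zjuCC Commit-by: zjuCC Merged-by: ascend-robot Description: In large-model training, a mismatch between a model's definition order and its execution order is very common, especially when common transformer components are redefined or when multimodal large models are used. With overlap-param-gather enabled, this mismatch directly causes accuracy problems and forces computation and communication to run serially. The current Megatron 0.12.1 approach fixes the accuracy problem, but computation and communication still inevitably run serially. For the experiment design, see https://clouddocs.huawei.com/wapp/doc/a5f778cd-fa07-49fc-9b23-3d14c4c67c85 See merge request: Ascend/MindSpeed!2840 | 7 months ago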
[Bugfix] Fix Megatron checkpoint saving & loading compatibility for torch_dcp format Co-authored-by: 林明哲 <linmingzhe3@huawei.com> # message auto-generated for no-merge-commit merge: !3077 merge fix1202 into master [Bugfix] Fix Megatron checkpoint saving & loading compatibility for torch_dcp format Created-by: LinMingZhe Commit-by: 林明哲 Merged-by: ascend-robot Description: Fix Megatron checkpoint saving & loading compatibility for torch_dcp format See merge request: Ascend/MindSpeed!3077 | 5 months ago
!461 Fix MLP bug. Merge pull request !461 from 徐伟/master | 1 year ago
quant fp8 optimizer | 6 months ago