File  Last commit  Last updated
!728 Add TFLOPS calculation to operator statistics to simplify computing MFU and HFU Merge pull request !728 from yangcheng/master 1 year ago
!443 Custom operator FFN supports single-operator mode & documentation update Merge pull request !443 from chenjiazhong/ffn_cpp 1 year ago
【bugfix!!!】security update & info update Co-authored-by: EX_mitsu<yangjie409@h-partners.com> # message auto-generated for no-merge-commit merge: !2960 merge master into master 【bugfix!!!】security update & info update Created-by: EX_mitsuX Commit-by: EX_mitsu Merged-by: ascend-robot Description: 1. Add eps to prevent errors in floating-point comparisons (security) 2. Remove the zero-memory restriction under fbov vpp 3. Correct the erroneous description in the version compatibility table See merge request: Ascend/MindSpeed!2960 6 months ago
!1576 FLOPs calculation adds support for the gmm_add and gmm interfaces Merge pull request !1576 from yangcheng/master 1 year ago
!841 Mc2 Moe Bug Fix Merge pull request !841 from yayahello/master 1 year ago
!2341 add swap optimizer to core_r0.12.0 Merge pull request !2341 from wangyuansheng8/master 11 months ago
!1603 add ema adamw optimizer Merge pull request !1603 from wangyuansheng8/master 1 year ago
!1151 MOE iteration: improve the output field descriptions in the torch interface documentation Merge pull request !1151 from 陈建军/master 1 year ago
【bugfix!!!】security update & info update Co-authored-by: EX_mitsu<yangjie409@h-partners.com> # message auto-generated for no-merge-commit merge: !2960 merge master into master 【bugfix!!!】security update & info update Created-by: EX_mitsuX Commit-by: EX_mitsu Merged-by: ascend-robot Description: 1. Add eps to prevent errors in floating-point comparisons (security) 2. Remove the zero-memory restriction under fbov vpp 3. Correct the erroneous description in the version compatibility table See merge request: Ascend/MindSpeed!2960 6 months ago
!359 change ascendspeed to mindspeed Merge pull request !359 from 邓佳/master 1 year ago
Feat: adaptor for DeepSeek V4 Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !3427 merge master into master Feat: adaptor for DeepSeek V4 Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: What this PR does / why we need it? Adaptor for DeepSeek V4!!! Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed!3427 1 month ago
!359 change ascendspeed to mindspeed Merge pull request !359 from 邓佳/master 1 year ago
!359 change ascendspeed to mindspeed Merge pull request !359 from 邓佳/master 1 year ago
!615 feat: use-fused-moe-token-permute-and-unpermute fused operator Merge pull request !615 from 邓佳/dev_moe 1 year ago
!741 fix(npu_moe_token_unpermute.cpp): restore_shape; fix(megatron_adaptor.py): initialize_model_parallel for tp-cp Merge pull request !741 from 邓佳/dev_moe 1 year ago
!1759 Long-sequence RingAttention adapted to TND + fused operator adaptation Merge pull request !1759 from zengshu/master 1 year ago
!564 Change the npu_rotary_positon_embedding operator to a C++ binding Merge pull request !564 from rui_everAfter/cpp_bound 1 year ago
fix npu_sparse_attn_sharedkv ops in Mindspeed Co-authored-by: boes129<chenqi185@huawei.com> # message auto-generated for no-merge-commit merge: !3477 merge chenqi_f1 into master fix npu_sparse_attn_sharedkv ops in Mindspeed Created-by: boes129 Commit-by: boes129 Merged-by: ascend-robot Description: What this PR does / why we need it? 1. The aclnn interface of the new npu_sparse_attn_sharedkv operator has changed: it adds orikv stride and cmpkv stride parameters to accommodate different memory frameworks. MindSpeed was not aware of the change and still called the operator through the old interface, which caused a core dump during deepseekv4 flash pretraining. This PR fixes the problem, following the new operator interface example at https://gitcode.com/cann/cann-recipes-infer/pull/387/diffs and the ops-transformer repository PR https://gitcode.com/cann/ops-transformer/commit/fc04f943c12b87c6581527bd558fbe38cee31879?ref=master . 2. The original file did not meet the .clang-format requirements and could not pass the pipeline, so it was reformatted according to the project's .clang-format file. Does this PR introduce any user-facing change? NA How was this patch tested? With the new operator package and the modified npu_sparse_attn_shared_kv.cpp swapped in, deepseekv4 flash pretraining was completed with MindSpeed, confirming that the fix takes effect. See merge request: Ascend/MindSpeed!3477 8 days ago
Feat: adaptor for DeepSeek V4 Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com> # message auto-generated for no-merge-commit merge: !3427 merge master into master Feat: adaptor for DeepSeek V4 Created-by: wuweiqiang24 Commit-by: wuweiqiang24 Merged-by: ascend-robot Description: What this PR does / why we need it? Adaptor for DeepSeek V4!!! Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed!3427 1 month ago
!634 Add quantize gmm ops Merge pull request !634 from zqh/0809 1 year ago
!968 Add weight quantize gmm ops Merge pull request !968 from 洪炜杰/hong0927 1 year ago