File | Last commit | Last updated
!1698 Adaptive-CP: fix security check warnings. Merge pull request !1698 from Zhenghao/master. 1 year ago
fix(torch/atb): fixbug for atb RunAtbCmd
Co-authored-by: clc2025<chenlucong@huawei.com>
# message auto-generated for no-merge-commit merge: !3288 merge fixbug_atb_RunAtbCmd into master
Created-by: clc2025
Commit-by: clc2025
Merged-by: ascend-robot
Description: fixbug for atb RunAtbCmd
Verification Report: https://wiki.huawei.com/domains/84233/wiki/316024/WIKI2026030210253433
See merge request: Ascend/MindSpeed!3288
2 months ago
【bugfix!!!】security update & info update
Co-authored-by: EX_mitsu<yangjie409@h-partners.com>
# message auto-generated for no-merge-commit merge: !2960 merge master into master
Created-by: EX_mitsuX
Commit-by: EX_mitsu
Merged-by: ascend-robot
Description:
1. Add eps to guard against errors in floating-point comparisons (security).
2. Remove the zero-memory restriction under fbov vpp.
3. Correct the erroneous description in the version compatibility table.
See merge request: Ascend/MindSpeed!2960
6 months ago
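Item 1 of the description above refers to adding an eps when comparing floating-point numbers. The following is a minimal sketch of that general technique only; `EPS`, `float_equal`, and `float_less` are hypothetical names chosen for illustration, and the tolerance value is an assumption, not the one used in the merge request.

```python
# Hypothetical tolerance; the actual value used in the merge request is not stated.
EPS = 1e-9


def float_equal(a, b, eps=EPS):
    """Treat a and b as equal when they differ by at most eps."""
    return abs(a - b) <= eps


def float_less(a, b, eps=EPS):
    """Return True only when a is smaller than b by more than eps."""
    return a < b - eps


total = 0.1 + 0.2
print(total == 0.3)             # exact comparison is affected by rounding
print(float_equal(total, 0.3))  # tolerant comparison
```

An absolute tolerance like this suits values of similar magnitude; for values spanning many orders of magnitude, a relative tolerance such as `math.isclose` may be a better fit.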
!1769 2dtp: support lcoc fused operators. Merge pull request !1769 from kirliavc/2dtp_backward. 1 year ago
feat(smart-swap): simplify the use of smart-swap
Co-authored-by: ChenDonYY<caichendong2@huawei.com>
# message auto-generated for no-merge-commit merge: !2833 merge master into master
Created-by: ChenDonYY
Commit-by: ChenDonYY
Merged-by: ascend-robot
Description: fix: simplify the use of smart-swap
1. The experiments compare loss accuracy, mean throughput, and memory usage before and after enabling the feature. The relative error of the loss over 2000 steps must stay within 2%.
- Dense model test case: tests_extend/system_tests/feature_tests/coc.sh
  - Throughput comparison:
    swap0 recompute1: 80.7
    swap0 recompute0: 87.7
    swap1 recompute0: 87.0
  - Memory comparison:
    swap0 recompute1
    [Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 27669.36279296875 | reserved: 30404.0 | max reserved: 30404.0
    [Rank 1] memory (MB) | allocated: 15604.52587890625 | max allocated: 27669.36279296875 | reserved: 30404.0 | max reserved: 30404.0
    [Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 25036.85986328125 | reserved: 26344.0 | max reserved: 26344.0
    [Rank 5] memory (MB) | allocated: 16116.654296875 | max allocated: 25036.85986328125 | reserved: 26344.0 | max reserved: 26344.0
    swap0 recompute0
    [Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0
    [Rank 1] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0
    [Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 | reserved: 35164.0 | max reserved: 35164.0
    [Rank 5] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 | reserved: 35164.0 | max reserved: 35164.0
    swap1 recompute0
    [Rank 0] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0
    [Rank 1] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0
    [Rank 4] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 | reserved: 33732.0 | max reserved: 33732.0
    [Rank 5] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 | reserved: 33732.0 | max reserved: 33732.0
  - Loss comparison: ![coc_swap_compare.PNG](https://raw.gitcode.com/user-images/assets/7404741/bba011fd-8710-497b-9ace-19cac98111d9/coc_swap_compare.PNG 'coc_swap_compare.PNG')
- MOE model test case: tests_extend/system_tests/feature_tests/deepseek_mla.sh
  - Throughput comparison:
    swap0: 55.2
    swap1: 56.0
  - Memory comparison:
    swap0
    [Rank 0] memory (MB) | allocated: 16443.3466796875 | max allocated: 26676.16259765625 | reserved: 32442.0 | max reserved: 32442.0
    [Rank 4] memory (MB) | allocated: 25676.61572265625 | max allocated: 36900.34814453125 | reserved: 43500.0 | max reserved: 43500.0
    swap1
    [Rank 0] memory (MB) | allocated: 16518.9033203125 | max allocated: 27864.86279296875 | reserved: 32240.0 | max reserved: 32240.0
    [Rank 4] memory (MB) | allocated: 25781.51123046875 | max allocated: 38881.0888671875 | reserved: 41112.0 | max reserved: 41112.0
  - Loss comparison: ![deepseek_mla_swap_compare.PNG](https://raw.gitcode.com/user-images/assets/7404741/9212a78b-f179-419b-9761-b8b8deb128f3/deepseek_mla_swap_compare.PNG 'deepseek_mla_swap_compare.PNG')
2. An integration example for custom cpp operators (for example atb). See docs/features/smart_swap.md.
See merge request: Ascend/MindSpeed!2833
5 months ago
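The comparison described in this entry (peak memory per rank, and loss relative error within 2%) can be sketched as follows. This is only an illustration under stated assumptions: `parse_memory_line` and `max_relative_error` are hypothetical helpers, the regular expression is written against the log lines quoted above, the loss values are made up, and reading the 2% requirement as a per-step maximum is an assumption (the entry does not say whether it applies per step or to an average).

```python
import re

# Pattern written against the "[Rank N] memory (MB) | ..." lines quoted in this entry.
MEM_LINE = re.compile(
    r"\[Rank (\d+)\] memory \(MB\) \| allocated: ([\d.]+) \| max allocated: ([\d.]+)"
    r" \| reserved: ([\d.]+) \| max reserved: ([\d.]+)"
)


def parse_memory_line(line):
    """Parse one memory report line into a dict of floats (hypothetical helper)."""
    match = MEM_LINE.search(line)
    if match is None:
        raise ValueError(f"not a memory report line: {line!r}")
    allocated, max_allocated, reserved, max_reserved = map(float, match.groups()[1:])
    return {
        "rank": int(match.group(1)),
        "allocated": allocated,
        "max_allocated": max_allocated,
        "reserved": reserved,
        "max_reserved": max_reserved,
    }


def max_relative_error(baseline, candidate):
    """Largest per-step relative error; assumes no baseline value is zero."""
    if len(baseline) != len(candidate):
        raise ValueError("loss curves must have the same number of steps")
    return max(abs(c - b) / abs(b) for b, c in zip(baseline, candidate))


# Rank 0 lines from the Dense case: swap0 recompute0 versus swap1 recompute0.
base = parse_memory_line(
    "[Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: "
    "35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0"
)
swap = parse_memory_line(
    "[Rank 0] memory (MB) | allocated: 15672.38427734375 | max allocated: "
    "28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0"
)
saved_mb = base["max_allocated"] - swap["max_allocated"]
print(f"Rank 0 peak allocated memory reduced by {saved_mb:.1f} MB")

# Made-up loss values, only to show the 2% check.
baseline_loss = [2.00, 1.50, 1.00]
swap_loss = [2.02, 1.49, 1.01]
print(max_relative_error(baseline_loss, swap_loss) <= 0.02)
```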
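feat(ut/qos/torch): add UTs and fix overlooked bugs
Co-authored-by: Klayyy<wanglei886@h-partners.com>
# message auto-generated for no-merge-commit merge: !3309 merge master into master
Created-by: Klayyy
Commit-by: Klayyy
Merged-by: ascend-robot
Description:
1. Add feature UTs for the AI QOS feature.
2. While adding the UTs, self-review the code and fix bugs:
2.1 Correct the call name of torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options().
2.2 Improve the ValueError messages raised in qos_feature.py.
2.3 In qos.py, in the priority assignment for the minimum-conflict combination, remove duplicated code and unused imports, and add the missing comma in _PARALLEL_TYPES.
2.4 In qos.py, the code path meant to handle sdma qos mistakenly used roce.
3. Add H2D QOS use of the PCIE asynchronous channel; add a new set_h2d_qos interface to the DCMI interface for Python to call.
4. Update the aiQos Readme on how the DCMI interface is called, and add build instructions for the DCMI interface SO.
See merge request: Ascend/MindSpeed!3309
2 months ago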