MindSpeed/mindspeed/core/memory/smart_swap @ master
Latest commit: feat(smart-swap): simplify the use of smart-swap (a86ca5b5, ascend-robot), created December 17, 2025
File
Last commit
Last updated
__init__.py
add feature smart swap (Merge pull request !1386 from ChenDonYY/master)
1 year ago
hooks.py
feat(smart-swap): simplify the use of smart-swap
Co-authored-by: ChenDonYY <caichendong2@huawei.com>
# message auto-generated for no-merge-commit merge:
!2833 merge master into master
feat(smart-swap): simplify the use of smart-swap
Created-by: ChenDonYY
Commit-by: ChenDonYY
Merged-by: ascend-robot
Description: fix: simplify the use of smart-swap
1. Experiments must compare loss accuracy, mean throughput, and memory usage before and after enabling the feature. The relative error of the loss over 2000 steps must be within 2%.
- Dense model test case: tests_extend/system_tests/feature_tests/coc.sh
  - Throughput comparison:
    swap0 recompute1: 80.7
    swap0 recompute0: 87.7
    swap1 recompute0: 87.0
  - Memory comparison:
    swap0 recompute1
    [Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 27669.36279296875 | reserved: 30404.0 | max reserved: 30404.0
    [Rank 1] memory (MB) | allocated: 15604.52587890625 | max allocated: 27669.36279296875 | reserved: 30404.0 | max reserved: 30404.0
    [Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 25036.85986328125 | reserved: 26344.0 | max reserved: 26344.0
    [Rank 5] memory (MB) | allocated: 16116.654296875 | max allocated: 25036.85986328125 | reserved: 26344.0 | max reserved: 26344.0
    swap0 recompute0
    [Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0
    [Rank 1] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0
    [Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 | reserved: 35164.0 | max reserved: 35164.0
    [Rank 5] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 | reserved: 35164.0 | max reserved: 35164.0
    swap1 recompute0
    [Rank 0] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0
    [Rank 1] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0
    [Rank 4] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 | reserved: 33732.0 | max reserved: 33732.0
    [Rank 5] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 | reserved: 33732.0 | max reserved: 33732.0
  - Loss comparison: (figure not preserved)
- MoE model test case: tests_extend/system_tests/feature_tests/deepseek_mla.sh
  - Throughput comparison:
    swap0: 55.2
    swap1: 56.0
  - Memory comparison:
    swap0
    [Rank 0] memory (MB) | allocated: 16443.3466796875 | max allocated: 26676.16259765625 | reserved: 32442.0 | max reserved: 32442.0
    [Rank 4] memory (MB) | allocated: 25676.61572265625 | max allocated: 36900.34814453125 | reserved: 43500.0 | max reserved: 43500.0
    swap1
    [Rank 0] memory (MB) | allocated: 16518.9033203125 | max allocated: 27864.86279296875 | reserved: 32240.0 | max reserved: 32240.0
    [Rank 4] memory (MB) | allocated: 25781.51123046875 | max allocated: 38881.0888671875 | reserved: 41112.0 | max reserved: 41112.0
  - Loss comparison: (figure not preserved)
2. Example of integrating custom C++ operators (such as atb). See docs/features/smart_swap.md.
See merge request: Ascend/MindSpeed !2833
5 months ago
policy_generator.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_adaptor.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_arranger.py
add feature smart swap (Merge pull request !1386 from ChenDonYY/master)
1 year ago
swap_cpp_adaptor.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_engine.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_manager.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_megatron_adaptor.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_policy_config.py
feat(smart-swap): simplify the use of smart-swap (merge request !2833)
5 months ago
swap_utils.py
add feature smart swap (Merge pull request !1386 from ChenDonYY/master)
1 year ago
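
The description of merge request !2833 reports per-rank memory lines for runs with and without smart-swap and requires the loss to stay within 2% relative error over 2000 steps. The following is a minimal sketch of how such a comparison could be scripted. It is not part of MindSpeed: the helper names `parse_memory_log`, `peak_reduction`, and `max_relative_loss_error` are hypothetical, and the regular expression assumes only the log line format quoted in that description. The two sample strings reuse the Rank 0 and Rank 4 lines reported for the dense model with recompute disabled.

```python
import re

# Matches the per-rank line format quoted in the merge request, e.g.
# [Rank 0] memory (MB) | allocated: 1.0 | max allocated: 2.0 | reserved: 3.0 | max reserved: 3.0
MEMORY_LINE = re.compile(
    r"\[Rank (?P<rank>\d+)\] memory \(MB\) "
    r"\| allocated: (?P<allocated>\d+(?:\.\d+)?) "
    r"\| max allocated: (?P<max_allocated>\d+(?:\.\d+)?) "
    r"\| reserved: (?P<reserved>\d+(?:\.\d+)?) "
    r"\| max reserved: (?P<max_reserved>\d+(?:\.\d+)?)"
)


def parse_memory_log(text):
    """Return {rank: {field: value in MB}} for every memory line in text."""
    report = {}
    for match in MEMORY_LINE.finditer(text):
        fields = match.groupdict()
        rank = int(fields.pop("rank"))
        report[rank] = {name: float(value) for name, value in fields.items()}
    return report


def peak_reduction(baseline, candidate, field="max_allocated"):
    """Fractional reduction of `field` per rank, relative to the baseline run."""
    return {
        rank: 1.0 - candidate[rank][field] / baseline[rank][field]
        for rank in baseline
        if rank in candidate
    }


def max_relative_loss_error(reference, measured):
    """Largest per-step relative error between two loss curves of equal length."""
    return max(abs(m - r) / abs(r) for r, m in zip(reference, measured))


# Lines copied from the merge request (dense model, swap0 recompute0).
baseline_log = (
    "[Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 "
    "| reserved: 37984.0 | max reserved: 37984.0 "
    "[Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 "
    "| reserved: 35164.0 | max reserved: 35164.0"
)
# Lines copied from the merge request (dense model, swap1 recompute0).
swap_log = (
    "[Rank 0] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 "
    "| reserved: 36132.0 | max reserved: 36132.0 "
    "[Rank 4] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 "
    "| reserved: 33732.0 | max reserved: 33732.0"
)

reductions = peak_reduction(parse_memory_log(baseline_log), parse_memory_log(swap_log))
for rank, fraction in sorted(reductions.items()):
    print(f"rank {rank}: max allocated reduced by {fraction:.1%}")
```

The loss check would be applied to the two 2000-step loss curves, which are not included in this listing; a run would pass the stated criterion when `max_relative_loss_error(reference, measured) <= 0.02`.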