| [Bugfix]fix: memory reorganization during concurrent fast host dispatch and multi-stream reuse.
Co-authored-by: MoCuishle-M<1656631289@qq.com>
# message auto-generated for no-merge-commit merge:
!1750 merge master into master
[Bugfix]fix: memory reorganization during concurrent fast host dispatch and multi-stream reuse.
Created-by: MoCuishle-M
Commit-by: MoCuishle-M
Merged-by: ascend-robot
Description: ## Motivation
Fixes the memory reorganization problem triggered when fast host dispatch and tensor multi-stream reuse occur simultaneously.
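With the current implementation, memory reorganization occurs, and at the same GBS a single training iteration takes 60+ s.
After reverting the code, a single iteration takes about 44 s.
With a synchronization operation added on top of the current code, a single iteration takes about 42 s.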
## Modification
Please briefly describe what modification is made in this PR.
## Self-test (Optional)
If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content has been self-checked; the wording is clear and the writing is standardized.
**After PR**:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
- [x] The ci-pipeline is passed, Code Check is passed.
See merge request: Ascend/MindSpeed-MM!1750 | 6 months ago |