| feat(pytorch): Support torch_dist and async_save
Co-authored-by: z__y <z4t155664@163.com>
# message auto-generated for no-merge-commit merge:
!4361 merge async_save_pretrain_no_generate into master
feat(pytorch): Support torch_dist and async_save
Created-by: z__y
Commit-by: z__y
Merged-by: ascend-robot
Description: ## What this PR does / why we need it?
This PR adds support for torch_dist format checkpoint saving and asynchronous checkpoint saving.
Notably, asynchronous checkpoint saving is supported only with the torch_dist format.
## Does this PR introduce any user-facing change?
No. This PR only adds new checkpoint saving features without changing existing user workflows or APIs.
## How was this patch tested?
Tests verify that torch_dist format checkpoint saving and asynchronous checkpoint saving work correctly,
and that async_save works only with the torch_dist format, as expected.
See merge request: Ascend/MindSpeed-LLM!4361 | 1 month ago |