| feat(pytorch): add DeepSeek V4 fine-tuning trainer
Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>
# message auto-generated for no-merge-commit merge:
!4452 merge dsv4 into master
feat(pytorch): add DeepSeek V4 fine-tuning trainer
Created-by: HANHU1CHEN
Commit-by: HanhuiChen
Merged-by: ascend-robot
Description: ## What this PR does / why we need it?
The DeepSeek V4 (dsv4) model has a unique structure, so a new trainer is needed to load it.
## Does this PR introduce any user-facing change?
No. Users can still start training with posttrain_gpt.py.
## How was this patch tested?
We have been running long-duration training on the dataset, and the training loss is converging normally.
See merge request: Ascend/MindSpeed-LLM!4452 | 22 days ago |