MindSpeed-LLM/mindspeed_llm/tasks/posttrain/sft · Ascend/MindSpeed-LLM - AtomGit

ascend-robot · feat(pytorch): add DeepSeek V4 fine-tuning trainer

Commit 21d61979 · created 22 days ago

File	Last commit	Last updated
__init__.py	feat(pytorch): add DeepSeek V4 fine-tuning trainer. Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>. Message auto-generated for no-merge-commit merge !4452 (merge dsv4 into master). Created-by: HANHU1CHEN. Commit-by: HanhuiChen. Merged-by: ascend-robot. Description: What this PR does / why we need it: the dsv4 model has a unique structure, so a new trainer is needed to load it. Does this PR introduce any user-facing change? No; users can still start training with posttrain_gpt.py. How was this patch tested? Long-running training on the dataset shows the training loss converging normally. See merge request: Ascend/MindSpeed-LLM!4452	22 days ago
sft_trainer.py	feat(pytorch): add DeepSeek V4 fine-tuning trainer (merge request Ascend/MindSpeed-LLM!4452)	22 days ago