feat(pytorch): add DeepSeek V4 fine-tuning trainer

Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>

# message auto-generated for no-merge-commit

merge: !4452 merge dsv4 into master

feat(pytorch): add DeepSeek V4 fine-tuning trainer

Created-by: HANHU1CHEN
Commit-by: HanhuiChen
Merged-by: ascend-robot

Description:

## What this PR does / why we need it?
Because the dsv4 model has a unique structure, a new trainer is needed to load it.

## Does this PR introduce any user-facing change?
No. Users can still start training with posttrain_gpt.py.

## How was this patch tested?
We have run long-term training on the dataset, and the training loss converges normally.

See merge request: Ascend/MindSpeed-LLM!4452

Last updated: 22 days ago