MindSpeed-LLM/mindspeed_llm/tasks/checkpoint · Ascend/MindSpeed-LLM - AtomGit

ascend-robot · revert "transformer from 4 upgrade to 5"

File	Last commit	Last updated
__init__.py	!1998 rename: repo package name from modellink to mindspeed_llm (merge pull request !1998 from MeiFei/master-package-rename)	1 year ago
convert.py	[pytroch][feature]add enable mg2hf convert in train. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: add enable mg2hf. See merge request: Ascend/MindSpeed-LLM!4096	4 months ago
convert_ckpt_deepseek4.py	feat(pytorch): add dsv4 mg2hf. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: (1) adds dsv4 mg2hf conversion, which supports only PP and only ETP = 1 or TP = 1; (2) fixes dsv4 hf2mg with VPP. User-facing change: when running mg2hf conversion on a dsv4 base model, set --model-type-hf to deepseek4_base. See merge request: Ascend/MindSpeed-LLM!4458	19 days ago
convert_ckpt_longcat.py	[pytorch][model]longcat model fix. Co-authored-by: guihaowen666<guihaowen@huawei.com>; merged br_master_longcat_fix into master; Created-by: guihaowen666; Merged-by: ascend-robot. Description: longcat model fix. See merge request: Ascend/MindSpeed-LLM!4251	3 months ago
convert_ckpt_mamba2.py	[pytroch][feature]add enable mg2hf convert in train. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: add enable mg2hf. See merge request: Ascend/MindSpeed-LLM!4096	4 months ago
convert_hf2mg.py	fix(pytorch):Ensure no PP/VPP stage contains only empty layers during LoRA fine-tuning. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: ensures that no PP/VPP stage contains only empty layers during LoRA fine-tuning; also fixes train-from-hf argument handling in convert-checkpoint. See merge request: Ascend/MindSpeed-LLM!4364	1 month ago
convert_mg2hf.py	fix(pytorch):Ensure no PP/VPP stage contains only empty layers during LoRA fine-tuning. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: ensures that no PP/VPP stage contains only empty layers during LoRA fine-tuning; also fixes train-from-hf argument handling in convert-checkpoint. See merge request: Ascend/MindSpeed-LLM!4364	1 month ago
convert_param.py	refactor(megatron):update coverage script. Co-authored-by: guihaowen666<guihaowen@huawei.com>; merged br_master_coverage_fix_0313 into master; Created-by: guihaowen666; Merged-by: ascend-robot. Description (coverage analysis script update): updates the repository's run_coverage.sh script to fix incomplete file scanning during coverage analysis. User-facing change: none; basic repository functionality is unaffected, and the change only improves coverage analysis. Testing: self-tested on a blue-zone machine. See merge request: Ascend/MindSpeed-LLM!4295	2 months ago
loader_hf.py	revert "transformer from 4 upgrade to 5". Co-authored-by: wanggangguo<wanggangguo@huawei.com>; merged upgrade into master; Created-by: isfrapples; Commit-by: wanggangguo; Merged-by: ascend-robot. See merge request: Ascend/MindSpeed-LLM!4518	5 days ago
loader_mg.py	fix: fix multiple potential bugs to improve code robustness. Co-authored-by: 王姜奔<wangjiangben@huawei.com>; Created-by: wangjiangben; Commit-by: 王姜奔; Merged-by: ascend-robot. Description: this PR fixes several potential bugs found in the repository to improve robustness and stability. (1) Bare except statements: a bare `except:` catches every exception, including system exceptions, which can make problems hard to debug; changed to `except Exception:` so that only standard exceptions are caught. Affected files: mindspeed_llm/tasks/checkpoint/loader_hf.py, mindspeed_llm/tasks/checkpoint/loader_mg.py. (2) Divide-by-zero check logic: `check_divisible_by_zero` had a logic error in which the original condition let non-integer divisors go straight to the division; simplified to `if divisor != 0:`, which handles all numeric types correctly. Affected file: mindspeed_llm/tasks/utils/error_utils.py. (3) DPO trainer divide-by-zero risk: `chosen_log_probs / chosen_length` divides by zero when `chosen_length` is 0; now uses `torch.clamp(chosen_length, min=1)` for safe division. Affected file: mindspeed_llm/tasks/posttrain/dpo/dpo_trainer.py. (4) BBH evaluation divide-by-zero risk: `loss_values.sum(-1).cpu().numpy() / token_ids.size(1)` divides by zero when the token sequence is empty; now uses `max(token_ids.size(1), 1)`. Affected file: mindspeed_llm/tasks/evaluation/eval_impl/bbh_eval.py. Test plan: [x] code changes completed; [x] changes committed to the local repository; [x] changes pushed to the remote repository; [x] waiting for CI tests to pass; [ ] awaiting code review. Scope of impact: exception handling, numerical safety, and edge-case handling. All changes are defensive programming and do not alter behavior in normal cases. See merge request: Ascend/MindSpeed-LLM!4362	1 month ago
model_builder.py	fix(pytorch): fix some bugs of ds4 ckpt. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: (1) strictly validates the correspondence between compress_ratios and the number of layers; (2) changes how compress_ratios is passed in; (3) rejects non-bf16 weights; (4) raises an error on unknown parameters. See merge request: Ascend/MindSpeed-LLM!4442	25 days ago
models.py	[pytroch][feature]add enable mg2hf convert in train. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: add enable mg2hf. See merge request: Ascend/MindSpeed-LLM!4096	4 months ago
saver.py	[pytorch][bugfix]fix lora_target_modules in ckpt save_lora_to_hf. Co-authored-by: qyzqyz<quyueze@h-partners.com>; Created-by: qyzqyz; Merged-by: ascend-robot. Description: fix lora_target_modules in ckpt save_lora_to_hf. See merge request: Ascend/MindSpeed-LLM!4010	4 months ago
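
The defensive patterns described in the loader_mg.py commit (merge request !4362) can be illustrated with a minimal sketch. This is plain Python written for illustration, not the repository's code: the helper names `parse_int_or_default`, `safe_divide`, and `safe_mean_log_prob` are hypothetical, and `max(length, 1)` stands in for the scalar case of the `torch.clamp(chosen_length, min=1)` call named in the commit description.

```python
def parse_int_or_default(text, default=0):
    """Parse an integer, falling back to a default on ordinary errors."""
    try:
        return int(text)
    except Exception:
        # Unlike a bare `except:`, this does not swallow KeyboardInterrupt
        # or SystemExit, which derive from BaseException rather than Exception.
        return default


def safe_divide(dividend, divisor, default=0.0):
    """Divide only when the divisor is non-zero, for ints and floats alike."""
    if divisor != 0:
        return dividend / divisor
    return default


def safe_mean_log_prob(log_prob_sum, length):
    """Average a summed log-probability, treating a zero length as 1."""
    return log_prob_sum / max(length, 1)
```

For example, `safe_mean_log_prob(-6.0, 3)` averages over three tokens, while an empty sequence (`length == 0`) returns the sum unchanged instead of dividing by zero.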