| [Feature] The transformers_model supports two loss calculation methods: token level and sample level
Co-authored-by: zhangxubin<1656631289@qq.com>
# message auto-generated for no-merge-commit merge:
!1602 merge master into master
[Feature] The transformers_model supports two loss calculation methods: token level and sample level
Created-by: MoCuishle-M
Commit-by: zhangxubin
Merged-by: ascend-robot
Description: ## Motivation
The transformers_model supports two loss calculation methods: token level and sample level.
1. Loss calculation options for transformers_model:
   - Enable token-level loss calculation using '--calculate-per-token-loss'.
   - Enable sample-level loss calculation using '--calculate-per-sample-loss'.
   - Validate the arguments so that '--calculate-per-token-loss' and '--calculate-per-sample-loss' cannot be enabled at the same time.
   - When neither argument is enabled, the loss computation behaves as in the current implementation.
2. Modify the description of the default loss calculation behavior for vlm_model.
3. Move the compute_token_level_loss function into the utility file.
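The mutual-exclusion check described in item 1 could look roughly like the sketch below. It uses plain `argparse` rather than the project's own argument handling, and the helper names `build_parser` and `resolve_loss_mode` are hypothetical; only the two flag names come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser exposing only the two loss flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--calculate-per-token-loss", action="store_true")
    parser.add_argument("--calculate-per-sample-loss", action="store_true")
    return parser


def resolve_loss_mode(args: argparse.Namespace) -> str:
    """Return 'token', 'sample', or 'default'; reject enabling both flags."""
    if args.calculate_per_token_loss and args.calculate_per_sample_loss:
        raise ValueError(
            "--calculate-per-token-loss and --calculate-per-sample-loss "
            "cannot be enabled at the same time"
        )
    if args.calculate_per_token_loss:
        return "token"
    if args.calculate_per_sample_loss:
        return "sample"
    # Neither flag set: keep the existing loss computation behavior.
    return "default"


# Illustrative usage with an explicit argument list.
mode = resolve_loss_mode(build_parser().parse_args(["--calculate-per-sample-loss"]))
```

Raising on the conflicting combination, instead of silently preferring one flag, makes a misconfigured launch script fail before training starts.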
## Modification
The transformers_model supports two loss calculation methods: token level and sample level
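To illustrate how the two methods differ, here is a minimal sketch in plain Python, assuming per-token losses and a 0/1 loss mask are already available as nested lists. The function names are hypothetical and this is not the `compute_token_level_loss` implementation from the PR; it only shows the usual meaning of the two reductions.

```python
from typing import List

Matrix = List[List[float]]


def token_level_loss(losses: Matrix, mask: Matrix) -> float:
    """Average over all unmasked tokens in the batch (each token weighs the same)."""
    total = sum(l * m for row_l, row_m in zip(losses, mask) for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / max(count, 1)


def sample_level_loss(losses: Matrix, mask: Matrix) -> float:
    """Average each sample over its own unmasked tokens, then average the samples."""
    per_sample = []
    for row_l, row_m in zip(losses, mask):
        n_tokens = sum(row_m)
        if n_tokens == 0:
            continue  # skip samples with no supervised tokens
        per_sample.append(sum(l * m for l, m in zip(row_l, row_m)) / n_tokens)
    return sum(per_sample) / max(len(per_sample), 1)


# One long sample (4 tokens, loss 1.0 each) and one short sample (1 token, loss 4.0).
losses = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
mask = [[1, 1, 1, 1], [1, 0, 0, 0]]
token_loss = token_level_loss(losses, mask)    # (4 * 1.0 + 4.0) / 5 tokens
sample_loss = sample_level_loss(losses, mask)  # (1.0 + 4.0) / 2 samples
```

In this example the token-level result is dominated by the longer sample, while the sample-level result gives both samples equal weight, which is why the two options can produce different loss values on batches with uneven sequence lengths.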
## Self-test (Optional)
If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [ ] The PR content has been self-checked; the wording is clear and the writing is standardized.
**After PR**:
- [ ] CLA has been signed and all committers have signed the CLA in this PR.
- [ ] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1602 | 7 months ago |