| fix grpo loss&reward
Co-authored-by: may_feimei<meifei5@huawei.com>
# message auto-generated for no-merge-commit merge:
merge master into master
fix grpo loss&reward
Created-by: may_feimei
Commit-by: may_feimei
Merged-by: ascend-robot
Description: Fix training collapse
See merge request: Ascend/MindSpeed-RL!669 | 7 months ago |