feat(pytorch): support deepseekv4_flash in mcore backend
Co-authored-by: dingzicha1997 <dingzilin@huawei.com>
# message auto-generated for no-merge-commit merge:
!4420 merge geneva2 into master
feat(pytorch): support deepseekv4_flash in mcore backend
Created-by: dingzicha1997
Commit-by: dingzicha1997
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-LLM!4420