MindSpeed-LLM/mindspeed_llm/core · Ascend/MindSpeed-LLM - AtomGit

ascend-robot	test(megatron): pipeline ut testcase fix

File	Last commit	Last updated
context_parallel	feat(pytorch): support deepseekv4_flash in mcore backend (merge request Ascend/MindSpeed-LLM!4420, geneva2 into master; Co-authored-by: dingzicha1997 <dingzilin@huawei.com>; created and committed by dingzicha1997, merged by ascend-robot)	1 month ago
datasets	feat(pytorch): add DeepSeek4 fine-tuning template (merge request Ascend/MindSpeed-LLM!4436, dsv4 into master; Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>; created by HANHU1CHEN, committed by HanhuiChen, merged by ascend-robot). Adds a fine-tuning template for the DeepSeek4 model series to support its specific prompt format, including thinking mode, tool calling (DSML format), and reasoning effort control. User-facing change: yes; users can now select `--prompt-type deepseek4` to fine-tune DeepSeek4 models. The following options are also exposed: `--enable-thinking` controls thinking vs chat mode; `--reasoning-effort {max,high}` inserts a max-effort instruction prefix and is only valid when thinking is enabled; `--drop-thinking` controls whether reasoning content is kept in each turn. Testing: byte-level alignment against the official encoding_dsv4 script.	25 days ago
distributed	feat(pytorch): support deepseekv4_flash in mcore backend (merge request Ascend/MindSpeed-LLM!4420, geneva2 into master; Co-authored-by: dingzicha1997 <dingzilin@huawei.com>; created and committed by dingzicha1997, merged by ascend-robot)	1 month ago
fusions	feat(torch): Add swiglu func with limit (merge request Ascend/MindSpeed-LLM!4428, swiglu426 into master; Co-authored-by: iansheng <shengjiayi@huawei.com>; created and committed by iansheng, merged by ascend-robot)	1 month ago
high_availability	fix(pytorch):fix TP_DP and TP_DP_CP group not being rebuilt during ARF (merge request Ascend/MindSpeed-LLM!4500, bugfix_tp_cp into master; Co-authored-by: wangguoyan <wangguoyan6@h-partners.com>; created by guoywang, committed by wangguoyan, merged by ascend-robot). Fixes the TP_DP and TP_DP_CP groups not being rebuilt during ARF. User-facing change: N/A. Testing: N/A.	10 days ago
layerwise_disaggregated_training	test(megatron): pipeline ut testcase fix (merge request Ascend/MindSpeed-LLM!4559, 20260602_ut_fix into master; Co-authored-by: xuguoliang3 <xuguoliang3@huawei.com>; created and committed by xuguoliang3, merged by ascend-robot). Fixes the pipeline UT test case posttrain/ldt_sft/test_initialize.py and corrects the package path in core/layerwise_disaggregated_training/initialize.py. The affected branch exists for compatibility with the original Megatron process and is not executed in the scenarios supported by LDT. User-facing change: none. Testing: CI.	14 hours ago
models	fix: remove aten_to operation for dsk_v4 (merge request Ascend/MindSpeed-LLM!4466, opt_dskv4 into master; Co-authored-by: wuweiqiang24 <1005334931@qq.com>, wuweiqiang24 <wuweiqiang11@huawei.com>; created and committed by wuweiqiang24, merged by ascend-robot)	17 days ago
optimizer	feat(pytorch): Add MindSpeed Muon feature (merge request Ascend/MindSpeed-LLM!4549, master into master; Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>; created by HANHU1CHEN, committed by HanhuiChen, merged by ascend-robot). Replaces the in-repo self-maintained Muon optimizer with MindSpeed's native Muon implementation, removing the legacy code and adapting the patch registration accordingly. User-facing change: none to the Muon usage interface; existing Muon training scripts and arguments continue to work, and the underlying implementation is switched to MindSpeed's native version. Testing: precision verified; training with the native Muon optimizer was aligned against the previous self-maintained implementation, with consistent loss and grad-norm behavior.	17 hours ago
pipeline_parallel	add global aux loss to mindspeed-llm (merge request Ascend/MindSpeed-LLM!3938, add_global_aux_loss4 into master; Co-authored-by: 乌兰娜仁 <wulannarenzhao@gmail.com>; created by hid88941705, committed by 乌兰娜仁, merged by ascend-robot)	5 months ago
ssm	[pytorch][feature] pytorch model mamba cp algorithm support (merge request !2916 from Kingsleyandher/master)	10 months ago
tensor_parallel	fix(torch): fix checkpoint compatibility issue (merge request Ascend/MindSpeed-LLM!4253, bugfix into master; Co-authored-by: zhyebin01 <zhangyebin@h-partners.com>; created and committed by zhyebin01, merged by ascend-robot)	2 months ago
transformer	refactor(pytorch): update deepseek4 shell (merge request Ascend/MindSpeed-LLM!4423, master into master; Co-authored-by: dingzicha1997 <dingzilin@huawei.com>; created and committed by dingzicha1997, merged by ascend-robot)	1 month ago
__init__.py	feat(pytorch): support deepseekv4_flash in mcore backend (merge request Ascend/MindSpeed-LLM!4420, geneva2 into master; Co-authored-by: dingzicha1997 <dingzilin@huawei.com>; created and committed by dingzicha1997, merged by ascend-robot)	1 month ago
fp8_utils.py	[pytorch][feature]support transformer_engine and FP8 training (merge request Ascend/MindSpeed-LLM!3565, 0917merge_master into master; Co-authored-by: mingzhenwang <wangmingzhen4@huawei.com>; created and committed by mingzhenwang, merged by ascend-robot). 1. LLM supports transformer_engine. 2. DeepSeek/Qwen MoE supports the TELinear layer.	5 months ago
optimizer_param_scheduler.py	[pytorch][feature][mindcluster] Integration of elastic-training-related callback code (merge request Ascend/MindSpeed-LLM!3661, elastic_training into master; Co-authored-by: 李鸣沼 <limingzhao3@h-partners.com>; created by lmztju, committed by lmztju and 李鸣沼, merged by ascend-robot). Moves the takd elastic-training callback to MindSpeed-LLM.	6 months ago
parallel_state.py	docs: Add comprehensive docstrings for core modules (merge request Ascend/MindSpeed-LLM!4397, feature/add-docstrings into master; Co-authored-by: wangjiangben <wangjiangben@huawei.com>; created and committed by wangjiangben, merged by ascend-robot). Summary: adds detailed English docstrings for key functions and classes across multiple core modules to improve code documentation and maintainability. Core modules (`mindspeed_llm/core/`): context_parallel (`CPDotProductAttention`: context parallel dot product attention implementation; `attention_init_wrapper`: attention initialization with Ulysses and hybrid CP support); datasets (`need_to_build_dataset`: determine which ranks need to build datasets; `build_generic_dataset`: build distributed datasets); distributed (`start_grad_sync_wrapper`: gradient synchronization with distributed optimizer support; `recover_gradient_scaling_factors`: restore gradient scaling factors); models (`get_gpt_layer_local_spec_wrapper`: GPT layer spec with custom normalization; `build_layers_wrapper`: layer building with MC2 optimization for MoE); parallel_state (`initialize_model_parallel_decorator`: model parallel initialization with expert parallel support); transformer (`get_num_layers_to_build`: calculate layers for the current pipeline stage; `get_layer_offset_wrapper`: layer offset with custom distribution support; `transformer_block_init_wrapper`: TransformerBlock initialization). Operators (`mindspeed_llm/ops/`): triton (`get_npu_properties`: get NPU device properties; `rms_norm_ref`: reference implementation of RMS normalization with gating). Transformer Engine (`mindspeed_llm/te/`): `do_kvallgather_context_parallel`: context parallel attention with KV AllGather strategy. Training (`mindspeed_llm/training/`): arguments (`extra_args_provider_decorator`: add MindSpeed-LLM specific arguments; `parse_args_decorator`: parse arguments with MindSpeed-LLM processing; `core_transformer_config_from_args_wrapper`: create TransformerConfig with extensions; `validate_args_v2_decorator`: validate arguments with MindSpeed-LLM extensions); checkpointing (`_load_base_checkpoint_wrapper`: load checkpoint with LoRA support; `load_checkpoint_wrapper`: load checkpoint with loose loading support); initialize (`_compile_dependencies`: compile dataset index builder dependencies); training (`_enable_npu_datadump_step_end`: enable NPU data dump; `update_save_checkpoint_chmod`: update checkpoint file permissions); utils (`_disable_gc`: context manager to disable garbage collection; `temporal_async_caller_schedule_async_call`: schedule async call with GC disabled). Documentation standards: all docstrings follow the standard Python format, with a brief description of the function or class purpose, `Args` (parameter descriptions with types), `Returns` (return value description), and `Note` (important usage notes and constraints, where applicable). Statistics: 13 files changed, 443 lines added, 6 lines removed. Testing: all docstrings are written in English and accurately describe function behavior; no functional code changes, only documentation improvements. Related issues: improves code documentation and developer experience for MindSpeed-LLM core modules.	1 month ago
timers.py	[pytorch][feature][mindcluster] Integration of elastic-training-related callback code (merge request Ascend/MindSpeed-LLM!3661, elastic_training into master; Co-authored-by: 李鸣沼 <limingzhao3@h-partners.com>; created by lmztju, committed by lmztju and 李鸣沼, merged by ascend-robot). Moves the takd elastic-training callback to MindSpeed-LLM.	6 months ago
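
The commit message in the `datasets` row describes a constraint between flags: `--reasoning-effort {max,high}` is only valid when thinking is enabled. The sketch below shows one way such a rule could be checked with the standard `argparse` module. It is not the MindSpeed-LLM implementation: the function names `build_parser` and `parse_flags` are hypothetical, and treating `--enable-thinking` and `--drop-thinking` as plain boolean switches is an assumption, since the listing does not say whether they take values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the flags named in merge request !4436."""
    parser = argparse.ArgumentParser(
        description="Illustrative DeepSeek4 template flags (not the real CLI)"
    )
    parser.add_argument("--prompt-type", default=None)
    # Assumption: both thinking-related flags are boolean switches.
    parser.add_argument("--enable-thinking", action="store_true")
    parser.add_argument("--drop-thinking", action="store_true")
    # The listing gives exactly two accepted values: max and high.
    parser.add_argument("--reasoning-effort", choices=["max", "high"], default=None)
    return parser


def parse_flags(argv):
    """Parse argv and enforce that reasoning effort requires thinking mode."""
    args = build_parser().parse_args(argv)
    if args.reasoning_effort is not None and not args.enable_thinking:
        raise ValueError("--reasoning-effort is only valid with --enable-thinking")
    return args


if __name__ == "__main__":
    flags = parse_flags(
        ["--prompt-type", "deepseek4", "--enable-thinking", "--reasoning-effort", "max"]
    )
    print(flags)
```

Raising `ValueError` rather than calling `parser.error` keeps the check easy to test, because `parser.error` exits the process; a real command-line entry point might prefer the latter.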