File | Last commit | Last updated
feat(pytorch): add DeepSeek4 fine-tuning template
Co-authored-by: HanhuiChen<chenhanhui1@h-partners.com>
# message auto-generated for no-merge-commit merge:
!4436 merge dsv4 into master
feat(pytorch): add DeepSeek4 fine-tuning template
Created-by: HANHU1CHEN
Commit-by: HanhuiChen
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Adds a fine-tuning template for the DeepSeek4 model series to support its specific prompt format, including thinking mode, tool calling (DSML format), and reasoning effort control.
## Does this PR introduce any user-facing change?
Yes. Users can now select --prompt-type deepseek4 to fine-tune DeepSeek4 models. These flags are also exposed:
- --enable-thinking controls thinking vs chat mode
- --reasoning-effort {max,high} inserts a max-effort instruction prefix; only valid when thinking is enabled
- --drop-thinking controls whether reasoning content is kept in each turn
## How was this patch tested?
Tested with byte-level alignment against the official encoding_dsv4 script.
See merge request: Ascend/MindSpeed-LLM!4436
25 days ago

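The constraint in the commit message above (--reasoning-effort is only valid when thinking is enabled) can be illustrated with a small argparse sketch. This is not the MindSpeed-LLM implementation: the flag names come from the commit message, but the helper names (`build_parser`, `validate`, `parse`), the flag types (boolean switches), and the defaults are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags named in the commit message.

    Flag types and defaults are assumed; the real project may define them differently.
    """
    parser = argparse.ArgumentParser(description="DeepSeek4 fine-tuning flags (illustrative)")
    parser.add_argument("--prompt-type", type=str, default=None)
    # Assumed to be a boolean switch: present means thinking mode, absent means chat mode.
    parser.add_argument("--enable-thinking", action="store_true")
    parser.add_argument("--reasoning-effort", choices=["max", "high"], default=None)
    # Assumed to be a boolean switch as well.
    parser.add_argument("--drop-thinking", action="store_true")
    return parser


def validate(args: argparse.Namespace) -> argparse.Namespace:
    """Reject --reasoning-effort when thinking mode is not enabled."""
    if args.reasoning_effort is not None and not args.enable_thinking:
        raise ValueError("--reasoning-effort is only valid when --enable-thinking is set")
    return args


def parse(argv: list) -> argparse.Namespace:
    """Parse and validate a list of command-line tokens."""
    return validate(build_parser().parse_args(argv))
```

Calling `parse(["--prompt-type", "deepseek4", "--enable-thinking", "--reasoning-effort", "max"])` succeeds, while `parse(["--reasoning-effort", "high"])` raises `ValueError` because thinking mode is off.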
!2945 [pytorch][refactor] refactor the patch framework
Merge pull request !2945 from yanzhixiao/refactor-0628
10 months ago

docs: Add comprehensive docstrings for core modules
Co-authored-by: wangjiangben<wangjiangben@huawei.com>
# message auto-generated for no-merge-commit merge:
!4397 merge feature/add-docstrings into master
docs: Add comprehensive docstrings for core modules
Created-by: wangjiangben
Commit-by: wangjiangben
Merged-by: ascend-robot
Description:
## Summary
This PR adds detailed English docstrings for key functions and classes across multiple core modules to improve code documentation and maintainability.
## Changes
### Core Modules (mindspeed_llm/core/)
- **context_parallel**: Add docstrings for context parallel attention and wrapper functions
  - CPDotProductAttention: Context parallel dot product attention implementation
  - attention_init_wrapper: Attention initialization with Ulysses and hybrid CP support
- **datasets**: Add docstrings for dataset building utilities
  - need_to_build_dataset: Determine which ranks need to build datasets
  - build_generic_dataset: Build distributed datasets
- **distributed**: Add docstrings for gradient sync and buffer management
  - start_grad_sync_wrapper: Gradient synchronization with distributed optimizer support
  - recover_gradient_scaling_factors: Restore gradient scaling factors
- **models**: Add docstrings for GPT layer specifications
  - get_gpt_layer_local_spec_wrapper: GPT layer spec with custom normalization
  - build_layers_wrapper: Layer building with MC2 optimization for MoE
- **parallel_state**: Add docstrings for parallel initialization
  - initialize_model_parallel_decorator: Model parallel initialization with expert parallel support
- **transformer**: Add docstrings for transformer block functions
  - get_num_layers_to_build: Calculate layers for current pipeline stage
  - get_layer_offset_wrapper: Layer offset with custom distribution support
  - transformer_block_init_wrapper: TransformerBlock initialization
### Operators (mindspeed_llm/ops/)
- **triton**: Add docstrings for NPU optimization functions
  - get_npu_properties: Get NPU device properties
  - rms_norm_ref: Reference implementation of RMS normalization with gating
### Transformer Engine (mindspeed_llm/te/)
- Add docstrings for transformer engine attention
  - do_kvallgather_context_parallel: Context parallel attention with KV AllGather strategy
### Training (mindspeed_llm/training/)
- **arguments**: Add docstrings for argument parsing
  - extra_args_provider_decorator: Add MindSpeed-LLM specific arguments
  - parse_args_decorator: Parse arguments with MindSpeed-LLM processing
  - core_transformer_config_from_args_wrapper: Create TransformerConfig with extensions
  - validate_args_v2_decorator: Validate arguments with MindSpeed-LLM extensions
- **checkpointing**: Add docstrings for checkpoint management
  - _load_base_checkpoint_wrapper: Load checkpoint with LoRA support
  - load_checkpoint_wrapper: Load checkpoint with loose loading support
- **initialize**: Add docstrings for initialization
  - _compile_dependencies: Compile dataset index builder dependencies
- **training**: Add docstrings for training utilities
  - _enable_npu_datadump_step_end: Enable NPU data dump
  - update_save_checkpoint_chmod: Update checkpoint file permissions
- **utils**: Add docstrings for utility functions
  - _disable_gc: Context manager to disable garbage collection
  - temporal_async_caller_schedule_async_call: Schedule async call with GC disabled
## Documentation Standards
All docstrings follow Python standard format:
- Brief description of function/class purpose
- Args: Parameter descriptions with types
- Returns: Return value description
- Note: Important usage notes and constraints (where applicable)
## Statistics
- **Files changed**: 13
- **Lines added**: 443
- **Lines removed**: 6
## Testing
- All docstrings are written in English
- Docstrings accurately describe function behavior
- No functional code changes, only documentation improvements
## Related Issues
Improves code documentation and developer experience for MindSpeed-LLM core modules.
See merge request: Ascend/MindSpeed-LLM!4397
1 month ago

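The documentation standards listed in the commit message above (brief description, Args, Returns, Note) might look like the following on a small function. The function `layers_per_stage` is hypothetical and written only to show the docstring layout; it is not the repository's `get_num_layers_to_build`, and the even-split rule is an assumption.

```python
def layers_per_stage(num_layers: int, pipeline_parallel_size: int) -> int:
    """Return the number of transformer layers assigned to each pipeline stage.

    Args:
        num_layers (int): Total number of transformer layers in the model.
        pipeline_parallel_size (int): Number of pipeline stages.

    Returns:
        int: Number of layers per stage under an even split.

    Note:
        Illustrative only. This sketch handles just the evenly divisible
        case and raises ValueError for anything else.
    """
    if pipeline_parallel_size <= 0:
        raise ValueError("pipeline_parallel_size must be positive")
    if num_layers % pipeline_parallel_size != 0:
        raise ValueError("num_layers must be divisible by pipeline_parallel_size")
    return num_layers // pipeline_parallel_size
```

For instance, `layers_per_stage(32, 4)` returns 8, and `help(layers_per_stage)` prints the docstring in the section order the commit message describes.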
[pytorch][feature][mindcluster] Integration of elastic-training-related callback code
Co-authored-by: 李鸣沼<limingzhao3@h-partners.com>
# message auto-generated for no-merge-commit merge:
!3661 merge elastic_training into master
[pytorch][feature][mindcluster] Integration of elastic-training-related callback code
Created-by: lmztju
Commit-by: lmztju;李鸣沼
Merged-by: ascend-robot
Description: move takd elastic-training callback to MindSpeed-LLM
See merge request: Ascend/MindSpeed-LLM!3661
6 months ago

refactor(pytorch): update deepseek4 shell
Co-authored-by: dingzicha1997<dingzilin@huawei.com>
# message auto-generated for no-merge-commit merge:
!4423 merge master into master
refactor(pytorch): update deepseek4 shell
Created-by: dingzicha1997
Commit-by: dingzicha1997
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-LLM!4423
1 month ago
