# MindSpeed-MM/examples/magistral-2509/fsdp2_config.yaml
# MindSpeed-MM: a multimodal large-model training suite for Ascend chips

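# Size of the FSDP sharding group; "auto" leaves the choice to the framework.
sharding_size: auto
# Free the gathered (unsharded) parameters after each forward pass and gather them again for backward.
reshard_after_forward: True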
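# Submodules that are each wrapped as a separate FSDP unit.
# "{*}" appears to act as a wildcard over the layer index (e.g. layers.0, layers.1, ...).
sub_modules_to_wrap: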
  - language_model.embed_tokens
  - language_model.layers.{*}
  - vision_tower.transformer.layers.{*}
  - multi_modal_projector.linear_1
  - multi_modal_projector.linear_2
  - lm_head
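# Submodules whose activations are recomputed during backward instead of being stored (activation checkpointing).
recompute_modules: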
  - language_model.layers.{*}
  - vision_tower.transformer.layers.{*}
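# Mixed precision: compute with bf16 parameters and reduce gradients in bf16.
param_dtype: "bf16"
reduce_dtype: "bf16"
# Cast floating-point forward inputs to param_dtype.
cast_forward_inputs: True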
# Number of wrapped modules to prefetch ahead of the current one in forward and in backward.
num_to_forward_prefetch: 1
num_to_backward_prefetch: 1
# Do not offload sharded state to CPU memory.
offload_to_cpu: False
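
# Illustrative variant, not part of the original config: a minimal sketch for
# memory-constrained runs. It assumes the loader accepts the opposite boolean
# for the existing key; check this against the MindSpeed-MM documentation
# before use. CPU offload generally trades step time for device memory.
#
# offload_to_cpu: True   # assumption: moves sharded state to host memory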