ascend-robotmodify：remove examples of old models

文件	最后提交记录	最后更新时间
1.3b	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
14b	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
feature_extract	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
README.md	modify：remove examples of old models Co-authored-by: htwang<wanghaitao60@huawei.com> # message auto-generated for no-merge-commit merge: !2587 merge refactor into master modify：remove examples of old models Created-by: htwang Commit-by: htwang Merged-by: ascend-robot Description: ## What this PR does / why we need it? 夕阳计划，下掉老旧模型，清单参考：https://gitcode.com/Ascend/MindSpeed-MM/issues/311 ## Does this PR introduce any user-facing change? 已提前发布版本计划。当前先日落大部分模型的脚本和相关测试用例，提高CI执行效率。Qwen2VL/InternVL和diffusers系列相关模型在下一阶段日落（近期内）。本次PR日落模型： StepVideo Lumnia Whisper JanusPro Vae Videoalign Qihoo-T2X Ming 删除的ST： Qwen2VL系列 InternVL系列 CogvideoX系列 Hunyuanvideo OpenSoraPlan1.3 ## How was this patch tested? 移除已有模型的启动脚本和看护用例，不涉及测试 See merge request: Ascend/MindSpeed-MM!2587	4 小时前
flf2v_input_first_frame.png	!1001 【feature】flf2v infer for wan2.1(First-Last-Frame to Video) Merge pull request !1001 from chenpeizhe/master	11 个月前
flf2v_input_last_frame.png	!1001 【feature】flf2v infer for wan2.1(First-Last-Frame to Video) Merge pull request !1001 from chenpeizhe/master	11 个月前
fsdp2_config.yaml	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
histogram_generator.py	modify：remove examples of old models Co-authored-by: htwang<wanghaitao60@huawei.com> # message auto-generated for no-merge-commit merge: !2587 merge refactor into master modify：remove examples of old models Created-by: htwang Commit-by: htwang Merged-by: ascend-robot Description: ## What this PR does / why we need it? 夕阳计划，下掉老旧模型，清单参考：https://gitcode.com/Ascend/MindSpeed-MM/issues/311 ## Does this PR introduce any user-facing change? 已提前发布版本计划。当前先日落大部分模型的脚本和相关测试用例，提高CI执行效率。Qwen2VL/InternVL和diffusers系列相关模型在下一阶段日落（近期内）。本次PR日落模型： StepVideo Lumnia Whisper JanusPro Vae Videoalign Qihoo-T2X Ming 删除的ST： Qwen2VL系列 InternVL系列 CogvideoX系列 Hunyuanvideo OpenSoraPlan1.3 ## How was this patch tested? 移除已有模型的启动脚本和看护用例，不涉及测试 See merge request: Ascend/MindSpeed-MM!2587	4 小时前
i2v_input.jpg	!794 add wan2.1 model t2v and i2v inference scripts Merge pull request !794 from 陈小龙/master	1 年前
samples_flf2v_images.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
samples_flf2v_prompts.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
samples_i2v_images.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
samples_i2v_prompts.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
samples_t2v_prompts.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
samples_v2v_prompts.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
samples_v2v_videos.txt	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前
v2v_input.mp4	!914 【feature】v2v for wan2.1 Merge pull request !914 from J石页/wan	1 年前
zero_config.yaml	style: pre-commit autofix cleancode (base check) Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2616 merge master into master style: pre-commit autofix cleancode (base check) Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2616	11 小时前

Wan2.1 使用指南

Wan2.1 使用指南

版本说明

参考实现

T2V I2V LoRA微调任务

url=https://github.com/modelscope/DiffSynth-Studio.git
commit_id=03ea278

FLF2V推理

url=https://github.com/huggingface/diffusers.git
commit_id=f8d4a1e

变更记录

2025.03.27: 首次支持Wan2.1模型

任务支持列表

模型大小	任务类型	预训练	lora微调	在线T2V推理	在线I2V推理	在线FLF2V推理	在线V2V推理
1.3B	t2v	✔	✔	✔			✔
1.3B	i2v	✔
14B	t2v	✔	✔	✔			✔
14B	i2v	✔	✔		✔
14B	flf2v					✔

环境安装

【模型开发时推荐使用配套的环境版本】

请参考安装指南

仓库拉取

git clone https://gitcode.com/Ascend/MindSpeed-MM.git
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout core_v0.12.1
cp -r megatron ../MindSpeed-MM/
cd ../MindSpeed-MM

环境搭建

# python3.10
conda create -n test python=3.10
conda activate test

# 安装 torch 和 torch_npu，注意要选择对应python版本、x86或arm的torch、torch_npu及apex包
pip install torch-2.7.1-cp310-cp310-manylinux_2_28_aarch64.whl
pip install torch_npu-2.7.1*-cp310-cp310-manylinux_2_28_aarch64.whl
# apex for Ascend 参考 https://gitcode.com/Ascend/apex
# 建议从原仓编译安装

# 将shell脚本中的环境变量路径修改为真实路径，下面为参考路径
source /usr/local/Ascend/cann/set_env.sh

# 安装加速库
git clone https://gitcode.com/Ascend/MindSpeed.git
cd MindSpeed
# checkout commit from MindSpeed core_r0.12.1
git checkout 93c45456c7044bacddebc5072316c01006c938f9
pip install -r requirements.txt
pip install -e .
cd ..

# 安装其余依赖库
pip install -e .

# 源码安装Diffusers
pip install diffusers==0.33.1

Decord搭建

【X86版安装】

pip install decord==0.6.0

【ARM版安装】

apt方式安装请参考链接

yum方式安装请参考脚本

权重下载及离线转换

Diffusers权重下载

模型	Hugging Face下载链接
T2V-1.3B	https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers
T2V-14B	https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers
I2V-14B-480P	https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
I2V-14B-720P	https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
FLF2V-14B-720P	https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P-Diffusers

权重转换

需要对下载后的Wan2.1模型权重transformer部分进行权重转换，运行权重转换脚本：

mm-convert WanConverter hf_to_mm \
 --cfg.source_path <./weights/Wan-AI/Wan2.1-{T2V/I2V/FLF2v}-{1.3/14}B-Diffusers/transformer/> \
 --cfg.target_path <./weights/Wan-AI/Wan2.1-{T2V/I2V/FLF2v}-{1.3/14}B-Diffusers/transformer/> \
 --cfg.target_parallel_config.pp_layers <pp_layers>

权重转换脚本的参数说明如下：

参数	含义	默认值
--cfg.source_path	原始权重路径	/
--cfg.target_path	转换或切分后权重保存路径	/
--pp_layers	PP/VPP层数	开启PP时, 使用PP和VPP需要指定各stage的层数并转换, 默认为`[]`，即不使用

如需转回Hugging Face格式，需运行权重转换脚本：

注：如进行layer zero进行训练，则需首先进行其训练权重后处理，再进行如下操作：

mm-convert WanConverter mm_to_hf \
 --cfg.source_path <path for your saved weight/> \
 --cfg.target_path <./converted_weights/Wan-AI/Wan2.1-{T2V/I2V/FLF2v}-{1.3/14}B-Diffusers/transformer/> \
 --cfg.hf_dir <weights/Wan-AI/Wan2.1-{T2V/I2V/FLF2v}-{1.3/14}B-Diffusers/transformer/>

权重转换脚本的参数说明如下：

参数	含义	默认值
--cfg.source_path	MindSpeed MM保存的权重路径	/
--cfg.target_path	转换后的Hugging Face权重路径	/
--cfg.hf_dir	原始Hugging Face权重路径，需要从该目录下获取原始Hugging Face配置文件	/

权重下载及在线加载

Diffusers权重下载

模型(已验证)	Hugging Face下载链接
T2V-1.3B	https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers
T2V-14B	https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers

在线加载

如果需要用在线权重加载进行模型训练的话，只需将下载的huggingface原始权重赋于examples/wan2.1/14b/t2v/pretrain_fsdp2.sh中的LOAD_PATH参数：

LOAD_PATH="./weights/Wan-AI/Wan2.1-T2V-14B-Diffusers/transformer/"

同时，将examples/wan2.1/14b/t2v/pretrain_fsdp2.sh中的bridge_patch置为true

    "patch": {
        "bridge_patch": true
    }

预训练

数据预处理

将数据处理成如下格式

</dataset>
  ├──data.json
  ├──videos
  │  ├──video0001.mp4
  │  ├──video0002.mp4

其中，videos/下存放视频，data.json中包含该数据集中所有的视频-文本对信息，具体示例如下：

[
    {
        "path": "videos/video0001.mp4",
        "cap": "Video discrimination1.",
        "num_frames": 81,
        "fps": 24,
        "resolution": {
            "height": 480,
            "width": 832
        }
    },
    {
        "path": "videos/video0002.mp4",
        "cap": "Video discrimination2.",
        "num_frames": 81,
        "fps": 24,
        "resolution": {
            "height": 480,
            "width": 832
        }
    },
    ......
]

修改examples/wan2.1/feature_extract/data.txt文件，其中每一行表示一个数据集，第一个参数表示数据文件夹的路径，第二个参数表示data.json文件的路径，用,分隔

特征提取

准备工作

在开始之前，请确认环境准备、模型权重和数据集预处理已经完成

参数配置

检查模型权重路径、数据集路径、提取后的特征保存路径等配置是否完成

配置文件	修改字段	修改说明
examples/wan2.1/feature_extract/data.json	num_frames	最大的帧数，超过则随机选取其中的num_frames帧
examples/wan2.1/feature_extract/data.json	max_height, max_width	最大的长宽，超过则centercrop到最大分辨率
examples/wan2.1/feature_extract/data.json	from_pretrained	修改为下载的tokenizer的权重所对应的路径
examples/wan2.1/feature_extract/feature_extraction.sh	NPUS_PER_NODE	卡数
examples/wan2.1/feature_extract/feature_extraction.sh	MM_MODEL	修改为目标task的的模型文件路径，如model_t2v.json
examples/wan2.1/feature_extract/model_{task}.json	from_pretrained	修改为下载的权重所对应路径（包括vae, text_encoder）
mindspeed_mm/tools/tools.json	save_path	提取后的特征保存路径

启动特征提取

bash examples/wan2.1/feature_extract/feature_extraction.sh

训练

准备工作

在开始之前，请确认环境准备、模型权重下载、特征提取已完成。

参数配置

检查模型权重路径、并行参数配置等是否完成

配置文件	修改字段	修改说明
examples/wan2.1/{model_size}/{task}/feature_data.json	basic_parameters	数据集路径，`data_path`和`data_folder`分别配置提取后的特征的文件路径和目录
examples/wan2.1/{model_size}/{task}/pretrain.sh	NPUS_PER_NODE	每个节点的卡数
examples/wan2.1/{model_size}/{task}/pretrain.sh	NNODES	节点数量
examples/wan2.1/{model_size}/{task}/pretrain.sh	LOAD_PATH	权重转换后的预训练权重路径
examples/wan2.1/{model_size}/{task}/pretrain.sh	SAVE_PATH	训练过程中保存的权重路径
examples/wan2.1/{model_size}/{task}/pretrain.sh	CP	训练时的CP size（建议根据训练时设定的分辨率调整）

【并行化配置参数说明】：

当调整模型参数或者视频序列长度时，需要根据实际情况启用以下并行策略，并通过调试确定最优并行策略。

CP: 序列并行。
- 使用场景：在视频序列（分辨率X帧数）较大时，可以开启来降低内存占用。
- 使能方式：在启动脚本中设置 CP > 1，如：CP=2；
- 限制条件：head 数量需要能够被CP整除（在examples/wan2.1/{model_size}/{task}/pretrain_model.json中配置，参数为num_heads）
- 默认使能方式为Ulysses序列并行。
- DiT-RingAttention：DiT RingAttention序列并行请参考文档
- DiT-USP: DiT USP混合序列并行（Ulysses + RingAttention）请参考文档
- FPDT(Fully Pipelined Distributed Transformer): Ulysses Offload 并行请参考文档
- 注：wan2.1使用full attention，对应general，即--attention-mask-type general。
layer_zero
- 使用场景：在模型参数规模较大时，单卡上无法承载完整的模型，可以通过开启layerzero降低静态内存。
- 使能方式：examples/wan2.1/{model_size}/{task}/pretrain.sh的GPT_ARGS中加入--layerzero和--layerzero-config ${layerzero_config}
- 训练权重后处理：使用该特性训练时，保存的权重需要使用下面的转换脚本进行后处理才能用于推理：
```
# 根据实际情况修改 ascend-toolkit 路径
source /usr/local/Ascend/cann/set_env.sh
mm-convert WanConverter layerzero_to_mm \
 --cfg.source_path <./save_ckpt/wan2.1/> \
 --cfg.target_path <./save_ckpt/wan2.1_megatron_ckpt/>
```
PP：流水线并行

目前支持将predictor模型切分流水线。
- 使用场景：模型参数较大时候，通过流水线方式切分并行，降低训练内存占用
- 使能方式：
  - 修改在 pretrain_model.json 文件中的"pipeline_num_layers", 类型为list。该list的长度即为 pipeline rank的数量，每一个数值代表rank_i中的层数。例如，[7, 8, 8, 7]代表有4个pipeline stage，每个容纳7/8个dit layers。注意list中所有的数值的和应该和总num_layers字段相等。此外，pp_rank==0的stage中除了包含dit层数以外，还会容纳text_encoder和ae，因此可以酌情减少第0个stage的dit层数。注意保证PP模型参数配置和模型转换时的参数配置一致。
  - 此外使用pp时需要在运行脚本GPT_ARGS中打开以下几个参数
```
PP = 4 # PP > 1 开启
GPT_ARGS="
--optimization-level 2 \
--use-multiparameter-pipeline-model-parallel \  #使用PP或者VPP功能必须要开启
--variable-seq-lengths \  #按需开启，动态shape训练需要加此配置，静态shape不要加此配置
"
```
VP: 虚拟流水线并行

目前支持将predictor模型切分虚拟流水线并行。
- 使用场景：对流水线并行进行进一步切分，通过虚拟化流水线，降低空泡
- 使能方式:
  - 如果想要使用虚拟流水线并行，请将pretrain_model.json文件中的"pipeline_num_layers"一维数组改造为两维，其中第一维表示虚拟并行的数量，二维表示流水线并行的数量，例如：[[3, 4, 4, 4], [3, 4, 4, 4]]，其中第一维两个数组表示vp为2, 第二维的stage个数为4表示流水线数量pp为3或4。
  - 需要在pretrain.sh当中修改如下变量，需要注意的是，VP仅在PP大于1的情况下生效:
```
PP=4
VP=2

GPT_ARGS="
  --pipeline-model-parallel-size ${PP} \
  --virtual-pipeline-model-parallel-size ${VP} \
  --optimization-level 2 \
  --use-multiparameter-pipeline-model-parallel \  #使用PP或者VPP功能必须要开启
  --variable-seq-lengths \  #按需开启，动态shape训练需要加此配置，静态shape不要加此配置
"
```
选择性重计算 + FA激活值offload
- 如果显存比较充裕，可以开启选择性重计算（self-attention不进行重计算）以提高吞吐，建议同步开启FA激活值offload，将FA的激活值异步卸载至CPU
- 选择性重计算
  - 在examples/wan2.1/{model_size}/{task}/pretrain.sh中，添加参数--recompute-skip-core-attention和--recompute-num-layers-skip-core-attention x可以开启选择性重计算，其中--recompute-num-layers-skip-core-attention后的数字表示跳过self attention计算的层数，--recompute-num-layers后的数字表示全重计算的层数，建议调小recompute-num-layers的同时增大recompute-num-layers-skip-core-attention直至显存打满。
```
GPT_ARGS="
    --recompute-granularity full \
    --recompute-method block \
    --recompute-num-layers 0 \
    --recompute-skip-core-attention \
    --recompute-num-layers-skip-core-attention 40 \
"
```
- 不进行重计算的self-attention激活值异步offload
  - 在examples/wan2.1/{model_size}/{task}/pretrain_model.json中，通过attention_async_offload字段可以开启异步offload，建议开启该功能，节省更多的显存
fsdp2
- 使用场景：在模型参数规模较大时，可以通过开启fsdp2降低静态内存。
- 使能方式：examples/wan2.1/{model_size}/{task}/pretrain_fsdp2.sh的GPT_ARGS中加入--use-torch-fsdp2，--fsdp2-config-path ${fsdp2_config}，--untie-embeddings-and-output-weights以及--ckpt-format torch_dist，其中fsdp2_config配置请参考：FSDP2说明

启动训练

bash examples/wan2.1/{model_size}/{task}/pretrain.sh

或

bash examples/wan2.1/{model_size}/{task}/pretrain_fsdp2.sh

lora 微调

准备工作

数据处理、特征提取、权重下载及转换同预训练章节

参数配置

参数配置同训练章节，除此之外，中涉及lora微调特有参数：

配置文件	修改字段	修改说明
examples/wan2.1/{model_size}/{task}/finetune_lora.sh	lora-r	lora更新矩阵的维度
examples/wan2.1/{model_size}/{task}/finetune_lora.sh	lora-alpha	lora-alpha 调节分解后的矩阵对原矩阵的影响程度
examples/wan2.1/{model_size}/{task}/finetune_lora.sh	lora-target-modules	应用lora的模块列表

启动微调

bash examples/wan2.1/{model_size}/{task}/finetune_lora.sh

微调完成后，可以使用权重转换工具，将训练好的lora权重与原始权重进行合并

mm-convert WanConverter merge_lora_to_base \
 --cfg.source_path <./converted_weights/Wan-AI/Wan2.1-{T2V/I2V}-{1.3/14}B-Diffusers/transformer/> \
 --cfg.target_path <./converted_weights/Wan-AI/Wan2.1-{T2V/I2V}-{1.3/14}B-Diffusers/transformer_merge/> \
 --cfg.lora_path <lora_save_path> \
 --lora_alpha 64 \
 --lora_rank 64

DPO训练

目前仅支持i2v任务的DPO基础训练，更多功能待后续完善。

环境准备

参考docs/zh/features/vbench-evaluate.md中的环境安装指导完成vbench及依赖三方件的安装
将VBench的 t2v json 下载到MM代码根路径"./vbench/VBench_full_info.json"

生成视频样本

修改推理配置文件：

参数配置文件	修改字段	修改说明
examples/wan2.1/14b/i2v/inference_model.json	from_pretrained	修改为下载的权重所对应路径（包括vae、tokenizer、text_encoder）
examples/wan2.1/14b/i2v/inference_model.json	num_inference_videos_per_sample	每个prompt生成的视频样本数量，建议至少大于2
examples/wan2.1/14b/i2v/inference_model.json	save_path	生成视频的保存路径
examples/wan2.1/14b/i2v/inference.sh	LOAD_PATH	转换之后的transformer部分权重路径

i2v prompts配置文件	修改字段	修改说明
examples/wan2.1/samples_i2v_images.txt	文件内容	图片路径
examples/wan2.1/samples_i2v_prompts.txt	文件内容	自定义prompt

启动推理流程生成视频样本：

bash examples/wan2.1/14b/i2v/inference.sh

删除视频样本保存路径下的video_grid.mp4，最终视频样本数量为：prompt条数 * $num_inference_videos_per_sample

生成偏好数据集

执行如下命令，为生成的视频样本打分，并生成偏好数据文件

python examples/wan2.1/histogram_generator.py --prompt_file <prompt文件路径> --videos_path <视频样本路径> --num_inference_videos_per_sample <每个prompt生成的视频样本数量>

生成偏好数据集脚本的参数说明如下：

参数	含义	如何配置
--prompt_file	prompt文件路径	与生成视频样本时，推理配置文件中的prompt字段值一致
--videos_path	视频样本路径	与生成视频样本时，推理配置文件中的save_path字段值一致
--num_inference_videos_per_sample	每个prompt生成的视频样本数量	与生成视频样本时，推理配置文件中的num_inference_videos_per_sample字段值一致

执行脚本后，会生成偏好数据集文件"data.jsonl"和评分概率直方图文件"video_score_histogram.json"，默认与视频样本目录平级

data.jsonl中包含成对的视频偏好数据和文本信息，具体示例如下：

[
    {
        "file": "video_0.mp4",
        "file_rejected": "video_2.mp4",
        "captions": "prompt1",
        "score": 0.646468401,
        "score_rejected": 0.5799660087
    },
    {
        "file": "video_4.mp4",
        "file_rejected": "video_5.mp4",
        "captions": "prompt2",
        "score": 0.7914018631,
        "score_rejected": 0.69968328357
    },
    ......
]

训练参数配置

在开始之前，请确认环境准备、模型权重准备、偏好数据准备已完成。

权重配置

需根据实际任务情况在启动脚本文件posttrain.sh中的LOAD_PATH="your_converted_dit_ckpt_dir"变量中添加转换后的权重的实际路径，如LOAD_PATH="./weights/Wan-AI/Wan2.1-I2V-14B-Diffusers/transformer/",其中./weights/Wan-AI/Wan2.1-I2V-14B-Diffusers/transformer/为转换后的权重的实际路径。LOAD_PATH变量中填写的完整路径一定要正确，填写错误的话会导致权重无法加载但运行并不会提示报错。根据需要填写SAVE_PATH变量中的路径，用以保存训练后的权重。
偏好数据集路径配置

根据实际情况修改feature_data.json中的偏好数据集路径，分别为"data_path": "./sora_features/data.jsonl"替换为实际的data.jsonl所在路径,"data_folder": "./sora_features/"替换"/data_path/"为实际的视频样本所在路径。
VAE及text_encoder、tokenizer路径配置

根据实际情况修改inference_model.json文件中from_pretrained字段配置vae、text_encoder、tokenizer路径。
dpo参数配置

根据实际情况修改posttrain_model.json中的直方图文件路径，即将histogram_path的值配置为执行生成偏好数据集脚本后，生成的"video_score_histogram.json"文件路径

启动DPO训练

bash examples/wan2.1/14b/i2v/posttrain.sh

推理

准备工作

在开始之前，请确认环境准备、模型权重下载已完成

参数配置

检查模型权重路径、并行参数等配置是否完成

配置文件	修改字段	修改说明
examples/wan2.1/{model_size}/{task}/inference_model.json	from_pretrained	修改为下载的权重所对应路径（包括vae、tokenizer、text_encoder）
examples/wan2.1/samples_t2v_prompts.txt	文件内容	T2V推理任务的prompt，可自定义，一行为一个prompt
examples/wan2.1/samples_i2v_prompts.txt	文件内容	I2V推理任务的prompt，可自定义，一行为一个prompt
examples/wan2.1/samples_i2v_images.txt	文件内容	I2V推理任务的首帧图片路径，可自定义，一行为一个图片路径
examples/wan2.1/samples_flf2v_prompts.txt	文件内容	FLF2V推理任务的prompt，可自定义，一行为一个prompt
examples/wan2.1/samples_flf2v_images.txt	文件内容	FLF2V推理任务的首、尾帧图片路径，可自定义，一行为两张图片（首、尾帧）路径，用", "隔开
examples/wan2.1/samples_v2v_prompts.txt	文件内容	V2V推理任务的prompt，可自定义，一行为一个prompt
examples/wan2.1/samples_v2v_videos.txt	文件内容	V2V推理任务的首个视频路径，可自定义，一行为一个视频路径
examples/wan2.1/{model_size}/{task}/inference_model.json	save_path	生成视频的保存路径
examples/wan2.1/{model_size}/{task}/inference_model.json	dual_image	双帧推理输入，仅在FLF2V任务中设置为true，其他任务可不配置
examples/wan2.1/{model_size}/{task}/inference_model.json	input_size	生成视频的分辨率，格式为 [t, h, w]
examples/wan2.1/{model_size}/{task}/inference_model.json	flow_shift	scheduler参数，480P推荐shift=3.0，720P推荐shift=5.0，FLF2V任务推荐shift=16.0
examples/wan2.1/{model_size}/{task}/inference.sh	LOAD_PATH	转换之后的transformer部分权重路径

启动推理

bash examples/wan2.1/{model_size}/{task}/inference.sh

环境变量声明

环境变量	描述	取值说明
`ASCEND_SLOG_PRINT_TO_STDOUT`	是否开启日志打印	`0`: 关闭日志打屏 `1`: 开启日志打屏
`ASCEND_GLOBAL_LOG_LEVEL`	设置应用类日志的日志级别及各模块日志级别，仅支持调试日志	`0`: 对应DEBUG级别 `1`: 对应INFO级别 `2`: 对应WARNING级别 `3`: 对应ERROR级别 `4`: 对应NULL级别，不输出日志
`TASK_QUEUE_ENABLE`	用于控制开启task_queue算子下发队列优化的等级	`0`: 关闭 `1`: 开启Level 1优化 `2`: 开启Level 2优化
`COMBINED_ENABLE`	设置combined标志。设置为0表示关闭此功能；设置为1表示开启，用于优化非连续两个算子组合类场景	`0`: 关闭 `1`: 开启
`CPU_AFFINITY_CONF`	控制CPU端算子任务的处理器亲和性，即设定任务绑核	设置`0`或未设置: 表示不启用绑核功能 `1`: 表示开启粗粒度绑核 `2`: 表示开启细粒度绑核
`HCCL_CONNECT_TIMEOUT`	用于限制不同设备之间socket建链过程的超时等待时间	需要配置为整数，取值范围`[120,7200]`，默认值为`120`，单位`s`
`PYTORCH_NPU_ALLOC_CONF`	控制缓存分配器行为	`expandable_segments:<value>`: 使能内存池扩展段功能，即虚拟内存特征
`HCCL_EXEC_TIMEOUT`	控制设备间执行时同步等待的时间，在该配置时间内各设备进程等待其他设备执行通信同步	需要配置为整数，取值范围`[68,17340]`，默认值为`1800`，单位`s`
`ACLNN_CACHE_LIMIT`	配置单算子执行API在Host侧缓存的算子信息条目个数	需要配置为整数，取值范围`[1, 10,000,000]`，默认值为`10000`
`TOKENIZERS_PARALLELISM`	用于控制Hugging Face的transformers库中的分词器（tokenizer）在多线程环境下的行为	`False`: 禁用并行分词 `True`: 开启并行分词
`MULTI_STREAM_MEMORY_REUSE`	配置多流内存复用是否开启	`0`: 关闭多流内存复用 `1`: 开启多流内存复用
`NPU_ASD_ENABLE`	控制是否开启Ascend Extension for PyTorch的特征值检测功能	设置`0`或未设置: 关闭特征值检测 `1`: 表示开启特征值检测，只打印异常日志，不告警 `2`:开启特征值检测，并告警 `3`:开启特征值检测，并告警，同时会在device侧info级别日志中记录过程数据
`ASCEND_LAUNCH_BLOCKING`	控制算子执行时是否启动同步模式	`0`: 采用异步方式执行 `1`: 强制算子采用同步模式运行
`NPUS_PER_NODE`	配置一个计算节点上使用的NPU数量	整数值（如 `1`, `8` 等）