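Open-Sora-Plan V1.2 Quantization Guide

Quantizing Open-Sora-Plan V1.2 inference depends on the inference repository MindIE/open_sora_planv1_2. After setting up that repository, use the example code below to run quantization.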

Prerequisites

Install the msModelSlim tool. For details, see the msModelSlim Installation Guide.

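Supported model versions and quantization strategies

Model family	Model version	HuggingFace link	W8A8	W8A16	W4A16	W4A4	Sparse quantization	KV Cache	Attention	Timestep quantization	FA3 quantization	Outlier suppression quantization	Quantization command
Open-Sora-Plan	Open-Sora-Plan v1.2	Open-Sora-Plan v1.2	✅										W8A8 static quantization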

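Notes:

✅ indicates that the quantization strategy has been officially verified by msModelSlim: it is functionally complete, performs stably, and is the recommended first choice.
A blank cell indicates that the quantization strategy has not yet been officially verified by msModelSlim. Users may try configuring it as needed, but quantization quality and functional stability are not officially guaranteed.
Click the link in the Quantization command column to jump to the corresponding quantization command.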

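Usage example

Open-Sora-Plan V1.2 W8A8 static quantization

Quantization launch command

A complete example quantization launch script is provided at OpenSoraPlanV1_2/inference.py. It can be launched as follows (make sure beforehand that the permissions of calib_prompts.txt are no more permissive than '0o640'):

# Set the multi-card environment variables and nproc_per_node to match the number of cards in use; the following uses 8 cards as an example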
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:False"
export TASK_QUEUE_ENABLE=2
export HCCL_OP_EXPANSION_MODE="AIV"
torchrun --nnodes=1 --nproc_per_node 8  --master_port 29503 \
    /the/absolute/path/of/example/multimodal_sd/OpenSoraPlanV1_2/inference.py \
    --model_path /path/to/checkpoint-xxx/model_ema \
    --num_frames 93 \
    --height 720 \
    --width 1280 \
    --cache_dir "../cache_dir" \
    --text_encoder_name google/mt5-xxl \
    --text_prompt "example/multimodal_sd/OpenSoraPlanV1_2/calib_prompts.txt" \
    --ae CausalVAEModel_D4_4x8x8 \
    --ae_path "/path/to/causalvideovae" \
    --fps 24 \
    --guidance_scale 7.5 \
    --num_sampling_steps 100 \
    --tile_overlap_factor 0.125 \
    --max_sequence_length 512 \
    --dtype bf16 \
    --use_cfg_parallel \
    --algorithm "dit_cache" \
    --save_img_path "./results/quant/images" \
    --do_quant \
    --quant_weight_save_folder "./results/quant/safetensors" \
    --quant_dump_calib_folder "./results/quant/cache" \
    --quant_type "w8a8"
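
The permission requirement on calib_prompts.txt mentioned above can be checked before launching. The following is a minimal sketch, not part of msModelSlim: the helper name `permission_ok` is hypothetical, and it assumes that "not greater than 0o640" means the file grants no permission bit outside 0o640.

```python
import os
import stat
import tempfile

MAX_MODE = 0o640  # upper bound stated in this guide


def permission_ok(path, max_mode=MAX_MODE):
    """Return True if `path` grants no permission bit outside `max_mode`.

    Assumption: the tool compares permission bits this way; the real check
    inside msModelSlim may differ.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & ~max_mode) == 0


if __name__ == "__main__":
    # Demonstrate on a temporary file rather than the real prompt file.
    fd, demo_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    for demo_mode in (0o640, 0o600, 0o644):
        os.chmod(demo_path, demo_mode)
        print(oct(demo_mode), permission_ok(demo_path))
    os.remove(demo_path)
```

If the check fails for the real file, tightening it with `chmod 640 calib_prompts.txt` before running torchrun should satisfy the stated requirement.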

Example code for dumping calibration data and running quantization

import os
import torch

from ascend_utils.common.security.pytorch import safe_torch_load
from msmodelslim.quant import quant_model, SessionConfig
from msmodelslim.quant import W8A8ProcessorConfig, W8A8QuantConfig, SaveProcessorConfig
from example.multimodal_sd.utils import get_disable_layer_names, get_rank, DumperManager, get_rank_suffix_file

DUMP_CALIB_FOLDER = './results/quant/cache'  # Folder for the calibration data
SAFE_TENSOR_FOLDER = './results/quant/safe_tensor'  # Folder for the quantized model

rank = get_rank()
is_distributed = rank >= 0  # Whether this is a distributed environment

dump_data_path = os.path.join(DUMP_CALIB_FOLDER, get_rank_suffix_file(base_name="calib_data", ext="pth",
                                                                      is_distributed=is_distributed, rank=rank))

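############################ Load the model ############################
model_path = './model'  # Model path

def load_t2v_checkpoint(model_path):
    # Placeholder: implement model loading as in the inference repository and return the pipeline
    raise NotImplementedError("load_t2v_checkpoint is a placeholder in this example")


pipeline = load_t2v_checkpoint(model_path)  # Load the model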

model = pipeline.transformer

############################ Dump calibration data ############################
def run_model_and_save_images(pipeline, *args, **kwargs):
    # Placeholder for the original model's inference procedure
    pass

if not os.path.exists(dump_data_path):  # Dump calibration data only if it does not already exist
    # Add a forward hook to dump the model's forward inputs
    dumper_manager = DumperManager(model, capture_mode='args')

    # Run inference with the floating-point model
    run_model_and_save_images(
        pipeline,
        ...
    )
    # Save the calibration data
    dumper_manager.save(dump_data_path)

############################ Start quantization ############################
# Load the calibration data, which must have been dumped beforehand
calib_dataset = safe_torch_load(dump_data_path, map_location=f'npu:{rank if is_distributed else 0}')
safetensors_name = get_rank_suffix_file(base_name='quant_model_weight_w8a8', ext='safetensors',
                                        is_distributed=is_distributed, rank=rank)
json_name = get_rank_suffix_file(base_name='quant_model_description_w8a8', ext='json',
                                 is_distributed=is_distributed, rank=rank)
# Quantization configuration
session_cfg = SessionConfig(
    processor_cfg_map={
        "w8a8": W8A8ProcessorConfig(
            cfg=W8A8QuantConfig(
                act_method='minmax'
            ),
            disable_names=get_disable_layer_names(model, layer_include=None,
                                                    layer_exclude=('*net.2*', '*adaln_single*'))
        ),
        "save": SaveProcessorConfig(
            output_path=SAFE_TENSOR_FOLDER,
            safetensors_name=safetensors_name,
            json_name=json_name,
            save_type=['safe_tensor'],
            part_file_size=None
        )
    },
    calib_data=calib_dataset,
    device='npu'
)

# Data type validation provided by the pydantic library
session_cfg.model_validate(session_cfg)

# Quantize the model
quant_model(model, session_cfg)
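
The configuration above selects `act_method='minmax'` for activations. As a rough illustration of what a min-max calibration method generally does, the sketch below derives an int8 scale and zero point from the observed range of some calibration values. This is the textbook asymmetric per-tensor scheme written in plain Python, not msModelSlim's implementation, which may differ (for example in symmetry, granularity, or rounding); all function names here are hypothetical.

```python
def minmax_qparams(values, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the min/max of calibration values."""
    lo = min(min(values), 0.0)  # include zero so it is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


# Illustrative calibration values (not real model activations)
calib = [-1.0, 0.0, 0.5, 3.0]
scale, zero_point = minmax_qparams(calib)
codes = quantize(calib, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
```

Because the range is taken directly from the observed minimum and maximum, a single outlier in the calibration data widens the range and coarsens the resolution for all other values, which is why the choice of calibration prompts matters.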

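The `layer_exclude` argument in the example uses wildcard patterns such as `'*net.2*'` and `'*adaln_single*'`. The sketch below shows how shell-style wildcards of this form select names, using Python's standard `fnmatch` module. It assumes the patterns follow `fnmatch` semantics, and the layer names are made up for illustration; how the selected names are then turned into the disabled-layer list is decided by `get_disable_layer_names`, which this sketch does not reproduce.

```python
from fnmatch import fnmatchcase

# Patterns taken from the example configuration above
EXCLUDE_PATTERNS = ('*net.2*', '*adaln_single*')


def matches_any(name, patterns):
    """Return True if `name` matches at least one wildcard pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Hypothetical layer names, for illustration only
layer_names = [
    "transformer_blocks.0.attn1.to_q",
    "transformer_blocks.0.ff.net.2",
    "adaln_single.linear",
]

matched = [name for name in layer_names if matches_any(name, EXCLUDE_PATTERNS)]
```

Note that `*` matches any run of characters, so `'*net.2*'` would also match a name containing `net.20`; a narrower pattern may be needed if such layers exist.
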
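Appendix

Runtime parameters

The following describes the parameters used when running inference quantization of the Open-Sora-Plan V1.2 model with OpenSoraPlanV1_2/inference.py. For parameters not covered by the quantization launch command, see the Open-Sora-Plan V1.2 inference repository MindIE/open_sora_planv1_2.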

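Parameter	Meaning	Constraints
model_path	Path to the original floating-point Open-Sora-Plan V1.2 model	Required. Type: string. No default.
num_frames	Total number of frames to generate	Optional. Type: integer. Default: 93.
height	Height of the generated video	Optional. Type: integer. Default: 720.
width	Width of the generated video	Optional. Type: integer. Default: 1280.
dtype	Data type used for inference	Optional. Type: string. Default: 'bf16'. Allowed values: 'bf16' or 'fp16'.
cache_dir	Cache directory for storing temporary files	Optional. Type: string. Default: './cache_dir'.
ae	Video compression specification of the VAE	Optional. Type: string. Default: 'CausalVAEModel_4x8x8'.
ae_path	Path to the VAE model weights and configuration	Optional. Type: string. Default: 'CausalVAEModel_4x8x8'.
text_encoder_name	Path to the text_encoder weights and configuration	Optional. Type: string. Default: 'google/mt5-xxl'.
save_img_path	Path where generated videos are saved	Optional. Type: string. Default: "./sample_videos/t2v".
guidance_scale	Guidance scale, which controls how strongly the negative text influences video generation	Optional. Type: float. Default: 7.5.
num_sampling_steps	Number of sampling steps, which controls the diversity of the generated video	Optional. Type: integer. Default: 50.
fps	Frame rate of the generated video	Optional. Type: integer. Default: 24.
batch_size	Batch size, which controls how many videos are generated at once	Optional. Type: integer. Default: 1.
max_sequence_length	Maximum sequence length, which controls the input length of the text encoder	Optional. Type: integer. Default: 512.
text_prompt	Text prompt: a single string, a list of strings, or the path to a text file containing multiple strings	Required. Type: string, list of strings, or txt file. No default.
tile_overlap_factor	Overlap ratio for VAE tiling decode, which controls the detail of the generated video	Optional. Type: float. Default: 0.25.
algorithm	Algorithm to use	Optional. Type: string. Default: None. Allowed values: None, 'dit_cache', or 'sampling_optimize'.
use_cfg_parallel	Whether to use cfg parallelism, which controls how the model's computation is parallelized	Optional. Type: boolean. Default: False. Becomes True only when --use_cfg_parallel is passed explicitly.
test_time	Whether to enable performance testing	Optional. Type: boolean. Default: False. Becomes True only when --test_time is passed explicitly.
seed	Random seed	Optional. Type: integer. Default: 1234.
vae_parallel	Whether to enable VAE parallel computation	Optional. Type: boolean. Default: False. Becomes True only when --vae_parallel is passed explicitly.
do_quant	Whether to run quantization	Required. Type: boolean. Default: False (quantization is not started). Becomes True only when --do_quant is passed explicitly; this flag must be enabled when running inference quantization of the Open-Sora-Plan v1.2 model.
quant_type	Quantization type	Optional. Type: string. Default: "w8a8". Allowed values: "w8a8".
quant_weight_save_folder	Path where the quantized model weights are saved	Required. Type: string. No default.
quant_dump_calib_folder	Path where the quantization calibration data is saved	Required. Type: string. No default.
do_save_video	Whether to save the inference videos	Optional. Type: boolean. Default: False (videos are not saved). Becomes True only when --do_save_video is passed explicitly, which enables video saving.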
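
Several boolean parameters in the table default to False and become True only when the flag is passed. That behavior matches `argparse`'s `store_true` action, so a plausible reading is sketched below. It assumes inference.py uses `argparse` in this way, which the guide does not confirm; `build_parser` is a hypothetical helper covering only three of the parameters.

```python
import argparse


def build_parser():
    """Build a small parser mirroring a few rows of the parameter table."""
    parser = argparse.ArgumentParser(description="Illustrative subset of the parameters")
    # Boolean flags: absent -> False, present -> True
    parser.add_argument("--do_quant", action="store_true")
    parser.add_argument("--use_cfg_parallel", action="store_true")
    # String option with a default and a restricted set of allowed values
    parser.add_argument("--quant_type", type=str, default="w8a8", choices=["w8a8"])
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["--do_quant"])
```

With this style of flag, writing `--do_quant False` on the command line would not disable quantization; the flag takes no value and is simply omitted to keep the default.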