
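SD3-Medium Quantization Guide

Currently, only W8A8 static quantization of the transformer component of the SD3 model is supported.

Prerequisites

Install the msModelSlim tool. For details, see the msModelSlim Installation Guide.

Supported Model Versions and Quantization Strategies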

Model Series	Model Version	HuggingFace Link	W8A8	W8A16	W4A16	W4A4	Timestep Quantization	FA3 Quantization	Outlier Suppression Quantization	Quantization Command
SD3	SD3-Medium	SD3-Medium	✅							W8A8

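Notes:

✅ indicates that the quantization strategy has been officially verified by msModelSlim. It is fully functional and stable in performance, and is the recommended choice.
A blank cell indicates that the quantization strategy has not yet been officially verified by msModelSlim. Users may try configuring it as needed, but quantization quality and functional stability are not officially guaranteed.
Click the link in the Quantization Command column to jump to the corresponding quantization command.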

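Usage Examples

SD3-Medium W8A8 Quantization

A complete quantization launch script is provided in SD3/sd3_inference.py. It can be launched with the command below (make sure beforehand that the permissions of calib_prompts.txt are no more permissive than '0o640'):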

python /the/absolute/path/of/example/multimodal_sd/SD3/sd3_inference.py \
    --sd3_model_path "/path/to/stable-diffusion-3-medium-diffusers" \
    --prompt_path "example/multimodal_sd/SD3/calib_prompts.txt" \
    --width 1024 \
    --height 1024 \
    --infer_steps 28 \
    --seed 42 \
    --device "npu" \
    --save_path "./results/quant/images" \
    --do_quant \
    --quant_weight_save_folder "./results/quant/safetensors" \
    --quant_dump_calib_folder "./results/quant/cache" \
    --quant_type "w8a8"
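
The permission requirement on calib_prompts.txt can be checked before launching the script. The following is a minimal sketch, assuming that "no greater than 0o640" means the file carries no permission bits beyond those set in 0o640. The helper name permission_within is hypothetical, and the check actually performed inside msModelSlim may differ.

```python
import os
import stat


def permission_within(path, max_mode=0o640):
    """Return True if `path` has no permission bits outside `max_mode`.

    Assumption: "not greater than 0o640" is read as a bitwise subset check.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & ~max_mode) == 0


if __name__ == "__main__":
    prompt_file = "example/multimodal_sd/SD3/calib_prompts.txt"
    if os.path.exists(prompt_file) and not permission_within(prompt_file):
        # Report only; tightening the mode (e.g. chmod 640) is left to the user.
        print(f"{prompt_file} is too permissive; consider: chmod 640 {prompt_file}")
```

A mode of 0o600 or 0o640 passes this check, while 0o644 fails it because the "other read" bit is not part of 0o640.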

Example code for dumping calibration data and running quantization

# Import the required libraries
import os
import torch
from diffusers import StableDiffusion3Pipeline

from ascend_utils.common.security.pytorch import safe_torch_load
from msmodelslim.quant import quant_model, SessionConfig
from msmodelslim.quant import W8A8ProcessorConfig, W8A8QuantConfig, SaveProcessorConfig
from example.multimodal_sd.utils import get_disable_layer_names, get_rank, DumperManager, get_rank_suffix_file

DUMP_CALIB_FOLDER = './results/quant/cache'  # Folder for the dumped calibration data
SAFE_TENSOR_FOLDER = './results/quant/safetensors'  # Folder for the quantized model

rank = get_rank()
is_distributed = rank >= 0  # Whether we are running in a distributed environment

dump_data_path = os.path.join(DUMP_CALIB_FOLDER, get_rank_suffix_file(base_name="calib_data", ext="pth",
                                                                      is_distributed=is_distributed, rank=rank))

############################ Load the model ############################
def load_t2v_checkpoint(model_path):
    pipeline = StableDiffusion3Pipeline.from_pretrained(model_path, torch_dtype=torch.float16).to('npu')
    return pipeline


pipeline = load_t2v_checkpoint("/path/to/stable-diffusion-3-medium-diffusers")  # Load the model

model = pipeline.transformer

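############################ Dump calibration data ############################
if not os.path.exists(dump_data_path):  # Dump calibration data only if it does not already exist
    # Register forward hooks that capture the inputs to the model's forward pass
    dumper_manager = DumperManager(model, capture_mode='args')

    # Run inference with the floating-point model.
    # StableDiffusion3Pipeline.__call__ takes `prompt` / `negative_prompt` (singular).
    pipeline(
        prompt=["A photo of an astronaut riding a horse on mars"],
        negative_prompt=[""],
        width=1024,
        height=1024,
        num_inference_steps=28,
        # ... other pipeline arguments as needed
    )
    # Save the calibration data
    dumper_manager.save(dump_data_path)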

############################ Run quantization ############################
# Load the calibration data, which must have been dumped beforehand
calib_dataset = safe_torch_load(dump_data_path, map_location=f'npu:{rank if is_distributed else 0}')
safetensors_name = get_rank_suffix_file(base_name='quant_model_weight_w8a8', ext='safetensors',
                                        is_distributed=is_distributed, rank=rank)
json_name = get_rank_suffix_file(base_name='quant_model_description_w8a8', ext='json',
                                 is_distributed=is_distributed, rank=rank)
# Quantization configuration
session_cfg = SessionConfig(
    processor_cfg_map={
        "w8a8": W8A8ProcessorConfig(
            cfg=W8A8QuantConfig(
                act_method='minmax'
            ),
            disable_names=['context_embedder']
        ),
        "save": SaveProcessorConfig(
            output_path=SAFE_TENSOR_FOLDER,
            safetensors_name=safetensors_name,
            json_name=json_name,
            save_type=['safe_tensor'],
            part_file_size=None
        )
    },
    calib_data=calib_dataset,
    device='npu'
)

# Validate field types using pydantic's built-in validation
session_cfg.model_validate(session_cfg)

# Quantize the model
quant_model(model, session_cfg)
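
To make the act_method='minmax' setting in the configuration above more concrete, here is a generic sketch of min-max static int8 quantization. The scale and zero point are derived once from the minimum and maximum observed in the calibration data and then reused at inference time. This illustrates the general technique only, not msModelSlim's implementation; the asymmetric scheme and all function names are assumptions.

```python
def minmax_int8_params(calib_values):
    """Derive an asymmetric int8 scale and zero point from calibration min/max."""
    lo = min(min(calib_values), 0.0)  # make sure 0.0 is representable
    hi = max(max(calib_values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point):
    """Map floats to int8, clamping to the range [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


if __name__ == "__main__":
    calib = [-1.0, 0.0, 0.5, 3.0]  # stand-in for dumped activations
    scale, zp = minmax_int8_params(calib)
    q = quantize(calib, scale, zp)
    print(q, dequantize(q, scale, zp))
```

Because the parameters are fixed from calibration data ("static" quantization), the dumped inputs need to be representative of the activations seen during real inference.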

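Appendix

Runtime Parameters

The following parameters apply when using SD3/sd3_inference.py for SD3 model inference and quantization.

Parameter	Description	Constraints
sd3_model_path	Path to the original floating-point SD3 model	Required. Type: string. No default.
prompt_path	Path to the input prompt file	Optional. Type: string. Default: "./calib_prompts.txt".
width	Width of the generated image	Optional. Type: integer. Default: 1024.
height	Height of the generated image	Optional. Type: integer. Default: 1024.
infer_steps	Number of inference steps	Optional. Type: integer. Default: 28.
seed	Random seed for the prompts	Optional. Type: integer. Default: 42.
device	Device on which the model runs	Optional. Type: string. Default: "npu". Only npu is currently supported.
save_path	Path where generated images are saved	Optional. Type: string. Default: "./results". Takes effect only when do_save_img is enabled.
do_quant	Whether to run quantization	Required for quantization. Type: boolean. Defaults to False (quantization disabled) and becomes True only when --do_quant is passed explicitly. This flag must be enabled when running SD3 inference quantization.
quant_type	Quantization type	Optional. Type: string. Default: "w8a8". Only "w8a8" is currently supported.
quant_weight_save_folder	Path where quantized weights are saved	Required. Type: string. No default.
quant_dump_calib_folder	Path where quantization calibration data is saved	Required. Type: string. No default.
do_save_img	Whether to save generated images	Optional. Type: boolean. Defaults to False (images are not saved) and becomes True only when --do_save_img is passed explicitly.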
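
The parameter table can be read as an argparse definition. The sketch below is a hypothetical reconstruction based only on the names, types, and defaults listed above; the parser in SD3/sd3_inference.py may be organized differently, and treating the two boolean switches as store_true flags is an assumption drawn from the "passed explicitly" wording.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the parameter table (not the script's own code)."""
    p = argparse.ArgumentParser(description="SD3 inference / quantization (sketch)")
    p.add_argument("--sd3_model_path", type=str, required=True)
    p.add_argument("--prompt_path", type=str, default="./calib_prompts.txt")
    p.add_argument("--width", type=int, default=1024)
    p.add_argument("--height", type=int, default=1024)
    p.add_argument("--infer_steps", type=int, default=28)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--device", type=str, default="npu", choices=["npu"])
    p.add_argument("--save_path", type=str, default="./results")
    # Boolean switches: False unless the flag is present on the command line.
    p.add_argument("--do_quant", action="store_true")
    p.add_argument("--do_save_img", action="store_true")
    p.add_argument("--quant_type", type=str, default="w8a8", choices=["w8a8"])
    p.add_argument("--quant_weight_save_folder", type=str, required=True)
    p.add_argument("--quant_dump_calib_folder", type=str, required=True)
    return p


if __name__ == "__main__":
    # Illustrative invocation with placeholder paths.
    args = build_parser().parse_args([
        "--sd3_model_path", "/path/to/stable-diffusion-3-medium-diffusers",
        "--quant_weight_save_folder", "./results/quant/safetensors",
        "--quant_dump_calib_folder", "./results/quant/cache",
        "--do_quant",
    ])
    print(args)
```

With this layout, omitting --do_save_img leaves image saving off, so save_path has no effect, which matches the constraint listed in the table.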