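LLaVA Quantization Guide

Model Introduction

LLaVA (Large Language and Vision Assistant) is a multimodal large model released jointly by researchers from the University of Wisconsin-Madison, Microsoft Research, and Columbia University. It handles tasks such as image captioning, visual question answering, image queries, and writing code from an image. It can also be used for multimodal chat and science question answering, helping to interpret image content and generate the corresponding natural-language text.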

Prerequisites

Install the msModelSlim tool. For details, see the msModelSlim Tool Installation Guide.
transformers must be installed at version 4.37.2:
```
pip install transformers==4.37.2
```
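
To confirm that the pinned version is the one actually installed before running quantization, a small check like the following can help. This is a minimal sketch using only the Python standard library; the helper name `check_package_version` is hypothetical and not part of msModelSlim.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

REQUIRED_TRANSFORMERS = "4.37.2"  # version stated in this guide


def check_package_version(package: str, required: str) -> tuple[bool, str]:
    """Return (ok, message) describing whether `package` is installed at `required`.

    Hypothetical helper for illustration only.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False, f"{package} is not installed; expected {required}"
    if installed != required:
        return False, f"{package} {installed} found; expected {required}"
    return True, f"{package} {installed} OK"


if __name__ == "__main__":
    ok, message = check_package_version("transformers", REQUIRED_TRANSFORMERS)
    print(message)
```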

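Quantization Methods Verified for LLaVA

| Model | Original floating-point weights | Quantization method | Inference framework support | Quantization command |
| --- | --- | --- | --- | --- |
| LLaVA-v1.5-7B | llava-1.5-7b-hf | W8A8 static quantization | MindIE: not currently supported; vLLM Ascend: not currently supported | W8A8 static quantization |

Note:

Each entry in the Quantization command column links to the corresponding quantization command.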

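Generating Quantized Weights

All quantized weights are generated with the quant_llava.py script. The quick-start commands below generate quantized weights for LLaVA models.

Usage Examples

To quantize on multiple NPU cards, first set the following environment variables to enable multi-card quantization (Atlas 300I Duo series products do not support multi-card quantization):

# Select the cards to use as appropriate; the following uses two cards as an example: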
export ASCEND_RT_VISIBLE_DEVICES=0,1
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
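
When the script is launched from Python rather than from a shell, the same two variables can be placed in the child process environment. The following is a minimal sketch under that assumption; `multi_card_env` is a hypothetical helper, not an msModelSlim API.

```python
import os


def multi_card_env(device_ids, base_env=None):
    """Return a copy of the environment with the multi-card variables from this guide set.

    Hypothetical helper for illustration only. `device_ids` is a sequence of
    NPU card indices, for example [0, 1] for the two-card case.
    """
    if not device_ids:
        raise ValueError("at least one device id is required")
    env = dict(os.environ if base_env is None else base_env)
    env["ASCEND_RT_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    env["PYTORCH_NPU_ALLOC_CONF"] = "expandable_segments:False"
    return env


if __name__ == "__main__":
    env = multi_card_env([0, 1])
    print(env["ASCEND_RT_VISIBLE_DEVICES"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the quantization script.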

When loading a custom model, pass trust_remote_code=True to from_pretrained so that the modified custom code files are loaded correctly. (Make sure the custom code files you load are safe.)

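1. LLaVA-v1.5-7B

LLaVA-v1.5-7B W8A8 static quantization

Generate W8A8 quantized weights for the LLaVA-v1.5-7B model, using the m2 algorithm for outlier suppression and running on an NPU. Replace {float_weight_path} and {quant_weight_save_path} with your actual paths. {calib_image_path} defaults to "../calibImages"; you can substitute other images to suit your scenario.

python quant_llava.py --model_path {float_weight_path} --calib_images {calib_image_path} --save_directory {quant_weight_save_path} --w_bit 8 --a_bit 8 --device_type npu --trust_remote_code True --mindie_format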
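
For scripted runs, it can be convenient to assemble this command as an argument list instead of a hand-edited string. The sketch below only mirrors the flags shown in the command above; `build_quant_command` and the `/path/to/...` values are hypothetical placeholders.

```python
import shlex


def build_quant_command(model_path, save_directory,
                        calib_images="../calibImages",
                        script="quant_llava.py"):
    """Build the W8A8 command from this guide as an argument list.

    Hypothetical helper for illustration only; it does not run anything.
    """
    return [
        "python", script,
        "--model_path", model_path,
        "--calib_images", calib_images,
        "--save_directory", save_directory,
        "--w_bit", "8",
        "--a_bit", "8",
        "--device_type", "npu",
        "--trust_remote_code", "True",
        "--mindie_format",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with real directories.
    cmd = build_quant_command("/path/to/llava-1.5-7b-hf", "/path/to/quant_output")
    print(shlex.join(cmd))
```

An argument list avoids shell quoting problems when a path contains spaces, and it can be handed directly to `subprocess.run(cmd, check=True)`.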

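Appendix

Quantization Parameters

| Parameter | Meaning | Default | Usage |
| --- | --- | --- | --- |
| model_path | Path to the floating-point weights | No default | Required. Directory containing the original LLaVA floating-point weights. |
| calib_images | Path to the calibration images | ../calibImages | Optional. Directory containing the calibration dataset. The images in this example come from the public COCO dataset, and the example uses 2 of them. You can substitute other images to suit your scenario. |
| save_directory | Path for the quantized weights | No default | Required. Output path for the quantized weights. |
| part_file_size | Size of a quantized weight file, in GB | None (no limit on the size of a single weight file; only one quantized weight file is generated) | Optional. Upper limit, chosen by the user, on the size of a single quantized weight file. |
| w_bit | Weight quantization bit width | 8 | Optional. In the LLaVA quantization scenario, 8 is the supported value. |
| a_bit | Activation quantization bit width | 8 | Optional. In the LLaVA quantization scenario, 8 is the supported value. |
| device_type | Device type used for quantization | 'npu' | Optional. Allowed values: ['cpu', 'npu']. |
| trust_remote_code | Whether to trust custom code | False | Optional. Set `trust_remote_code=True` so that the modified custom code files are loaded correctly (make sure the custom code files come from a reliable source to avoid potential security risks). |
| mindie_format | Whether the weight configuration file of the quantized multimodal understanding model is compatible with the current MindIE version | False | When `mindie_format` is enabled, the quantized weights are saved in a format compatible with the current MindIE version. |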
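
To make the defaults and allowed values in the table concrete, here is an illustrative `argparse` parser that mirrors them. It is a sketch only, not the actual parser in quant_llava.py: `str2bool` and `build_parser` are hypothetical names, and treating `part_file_size` as an integer is an assumption.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings, as in `--trust_remote_code True`."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameter table (not the real script)."""
    parser = argparse.ArgumentParser(description="Illustrative LLaVA quantization arguments")
    parser.add_argument("--model_path", required=True)
    parser.add_argument("--calib_images", default="../calibImages")
    parser.add_argument("--save_directory", required=True)
    # Assumption: the size in GB is given as an integer.
    parser.add_argument("--part_file_size", type=int, default=None)
    parser.add_argument("--w_bit", type=int, default=8, choices=[8])
    parser.add_argument("--a_bit", type=int, default=8, choices=[8])
    parser.add_argument("--device_type", default="npu", choices=["cpu", "npu"])
    parser.add_argument("--trust_remote_code", type=str2bool, default=False)
    parser.add_argument("--mindie_format", action="store_true")
    return parser


if __name__ == "__main__":
    # Placeholder paths; replace with real directories.
    args = build_parser().parse_args(
        ["--model_path", "/path/to/weights", "--save_directory", "/path/to/output"]
    )
    print(args)
```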

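For further parameter requirements, see the QuantConfig parameters configured during quantization and the Calibrator quantization parameter configuration class.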