
toc_depth: 3

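Quantization Algorithm Overview

msModelSlim supports a range of advanced quantization algorithms, covering every stage from outlier suppression to low-bit optimization. The tables below summarize the core algorithms currently supported, grouped by category, along with their main characteristics.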

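Outlier Suppression Algorithms

Outlier suppression algorithms smooth the distribution of activation values to reduce the accuracy loss caused by quantization.

| Algorithm | Core idea | Applicable scenario | Details |
| --- | --- | --- | --- |
| QuaRot | Applies an orthogonal rotation matrix to smooth the activation distribution | Suppressing activation outliers and improving accuracy | View details |
| Adapt Rotation | Builds on QuaRot by iteratively optimizing the Hadamard rotation matrix on calibration data | Optimizing the rotation matrix to further improve low-bit quantization accuracy | View details |
| SmoothQuant | Jointly scales activations and weights to smooth outliers | Suppressing activation outliers | View details |
| Iterative Smooth | Iterative smoothing and scaling for finer-grained distribution adjustment | Accuracy optimization under complex distributions | View details |
| Flex Smooth Quant | Two-stage grid search that automatically finds the optimal alpha/beta | Flexible adaptation to different architectures | View details |
| Flex AWQ SSZ | Combines AWQ and SSZ, using the real quantizer to evaluate error | Automatic search for optimal smoothing parameters | View details |
| KV Smooth | Smoothing and suppression algorithm targeting the KV Cache | Reducing KV Cache memory usage | View details |
| AWQ | Grid search for the optimal scaling factor based on activation statistics | Automatic search for optimal smoothing parameters | View details |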
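To make the "jointly scales activations and weights" idea concrete, here is a minimal sketch of SmoothQuant-style smoothing in plain Python. It uses the per-channel scale from the SmoothQuant paper, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha); the helper names (`smooth_scales`, `matmul`) and the toy data are assumptions for illustration and are not msModelSlim APIs.

```python
def smooth_scales(act_absmax, weight_absmax, alpha=0.5):
    """Per-input-channel scale s_j = a_j**alpha / w_j**(1 - alpha)."""
    return [
        (a ** alpha) / (w ** (1.0 - alpha))
        for a, w in zip(act_absmax, weight_absmax)
    ]


def matmul(x, w):
    """Naive (tokens, in) @ (in, out) product on nested lists."""
    return [
        [sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]


# Toy data (hypothetical): input channel 1 of the activations carries an outlier.
x = [[0.5, 40.0], [-0.25, -32.0]]   # (tokens, in_channels)
w = [[0.2, -0.4], [0.05, 0.1]]      # (in_channels, out_channels)
n_in = len(w)

act_absmax = [max(abs(row[k]) for row in x) for k in range(n_in)]
weight_absmax = [max(abs(v) for v in w[k]) for k in range(n_in)]
s = smooth_scales(act_absmax, weight_absmax)

# Divide activations by s and multiply the matching weight rows by s.
x_smooth = [[row[k] / s[k] for k in range(n_in)] for row in x]
w_smooth = [[v * s[k] for v in w[k]] for k in range(n_in)]

y_ref = matmul(x, w)
y_smooth = matmul(x_smooth, w_smooth)

print("activation absmax before:", act_absmax)
print("activation absmax after: ",
      [max(abs(row[k]) for row in x_smooth) for k in range(n_in)])
```

The layer output is mathematically unchanged, while the activation range shrinks and part of the quantization difficulty moves into the weights. With alpha = 0.5 the two ranges are balanced per channel.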

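Quantization Algorithms

This category covers weight quantization, activation quantization, and quantization schemes for specific structures.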

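| Algorithm | Type | Core idea | Applicable scenario | Details |
| --- | --- | --- | --- | --- |
| AutoRound | Weight quantization optimization | Optimizes rounding offsets with SignSGD to reduce reconstruction error | Ultra-low-bit quantization such as 4-bit | View details |
| FA3 Quant | Activation quantization | Per-head INT8 quantization of attention activations | Long sequences, MLA-architecture models | View details |
| GPTQ | Weight quantization optimization | Minimizes quantization error through column-by-column optimization and error compensation | High-accuracy weight quantization | View details |
| KVCache Quant | KV Cache quantization | Quantization scheme targeting the KV Cache | Improving long-sequence inference efficiency | View details |
| Linear Quant | Basic quantization | Weight and activation quantization for linear layers | Basic quantization scenarios | View details |
| PDMIX | Mixed-stage quantization | Dynamic quantization during prefilling, static quantization during decoding | Large-model inference acceleration, balancing accuracy and performance | View details |
| Histogram | Activation quantization | Analyzes the histogram distribution and searches for the optimal clipping range | Filtering outliers to improve accuracy | View details |
| MinMax | Basic quantization | Tracks minimum and maximum values to determine the quantization range | Basic quantization scenarios, low computational overhead | View details |
| SSZ | Weight quantization | Iteratively searches for the optimal scaling factor and offset | Accuracy optimization for unevenly distributed weights | View details |
| LAOS | Low-bit quantization | Optimization for extremely low-bit settings such as W4A4 | Extreme compression requirements | View details |
| Float Sparse | Sparsification | Floating-point model sparsity based on the ADMM algorithm | High compression ratio requirements | View details |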
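As a rough illustration of the MinMax row, the sketch below derives asymmetric uint8 quantization parameters from the observed minimum and maximum, then quantizes and dequantizes a toy tensor. The function names and the uint8 layout are assumptions for illustration; msModelSlim's actual implementation and defaults may differ.

```python
def minmax_qparams(values, num_bits=8):
    """Derive (scale, zero_point) from the observed min and max."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)   # keep 0.0 representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping at the ends."""
    qmax = 2 ** num_bits - 1
    return [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


# Toy tensor (hypothetical): the single value 3.0 stretches the range.
values = [-1.0, -0.5, 0.0, 0.25, 3.0]
scale, zero_point = minmax_qparams(values)
q = quantize(values, scale, zero_point)
restored = dequantize(q, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(values, restored))

print("scale:", scale, "zero_point:", zero_point)
print("quantized:", q)
print("max abs error:", max_err)
```

Because the range is set by the extremes, one large value widens the step size for every other value. That is the motivation for clipping-based approaches such as the Histogram row above.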

Auto-Tuning Strategies

These strategies search for the optimal quantization configuration automatically.

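| Algorithm | Core idea | Applicable scenario | Details |
| --- | --- | --- | --- |
| Standing High | Combines outlier-suppression strategies and uses binary search to minimize the number of fallback layers while still meeting the accuracy requirement | Cases needing fine-grained control over templates and strategies; a complete quantization configuration must be provided | View details |
| Standing High With Experience | Needs only the quantization type and structure configuration; generates the quantization configuration automatically from expert experience | Users familiar with the model structure; no complete quantization configuration is required | View details |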
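The binary-search idea behind Standing High can be sketched as follows. This is only an illustration under two assumptions that the source does not state: layers are already ranked from most to least sensitive, and accuracy does not get worse as more of the top-ranked layers are kept at higher precision. `min_fallback_layers` and `meets_accuracy` are hypothetical names, not msModelSlim APIs.

```python
from typing import Callable, List, Optional, Sequence


def min_fallback_layers(
    ranked_layers: Sequence[str],
    meets_accuracy: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Return the shortest prefix of ranked_layers whose fallback meets the target.

    ranked_layers: layer names, most sensitive first (assumed).
    meets_accuracy: hypothetical callback that quantizes the model with the
        given layers left unquantized and reports whether accuracy is acceptable.
    """
    if not meets_accuracy(list(ranked_layers)):
        return None  # even falling back every layer is not enough

    lo, hi = 0, len(ranked_layers)   # answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_accuracy(list(ranked_layers[:mid])):
            hi = mid        # mid layers suffice, try fewer
        else:
            lo = mid + 1    # need more fallback layers
    return list(ranked_layers[:lo])


# Toy stand-in for a real evaluation: at least two fallback layers are needed.
ranked = ["layers.3", "layers.7", "layers.1", "layers.5"]
result = min_fallback_layers(ranked, lambda fallback: len(fallback) >= 2)
print(result)
```

Each call to the callback stands for a full quantize-and-evaluate run, so binary search needs on the order of log2(N) evaluations instead of N.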

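Sensitive-Layer Analysis Algorithms

Sensitive-layer analysis uses `msmodelslim analyze` to measure, on calibration data, how sensitive each layer or substructure is to quantization. The resulting ranking helps guide fallback decisions and YAML parameter tuning.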

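| Algorithm | Analysis scope | Core idea | Applicable scenario | Details |
| --- | --- | --- | --- | --- |
| Std | linear (linear layers) | Characterizes sensitivity by the ratio of the activation dynamic range to the standard deviation | Coarse screening of linear layers before quantization; one of the default strategies | View details |
| Quantile | linear (linear layers) | Builds a score from quantiles and the IQR, relatively robust to outliers | Heavy activation tails, where outlier dominance should be reduced | View details |
| Kurtosis | linear (linear layers) | Estimates activation kurtosis to identify the influence of sharp peaks and extreme values | Peaked distributions; used together with fallback or mixed precision | View details |
| Attention MSE (mse) | attn (attention structure) | MSE of the attention output under floating-point versus quantized weights | Sensitivity of attention weight quantization (requires an adapter interface) | View details |
| Layer-wise MSE (mse_layer_wise) | layer (Decoder block) | Per-block mean of the MSE over the outputs of the selected submodules in the block | Fallback of a whole layer or block (for example, an MLP or attention segment) | View details |
| Model-wise MSE (mse_model_wise) | layer (chained forward pass) | MSE of the model's final output under layer-by-layer quantization perturbation | Layer sensitivity viewed from the final hidden state | View details |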
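The three linear-layer metrics can be illustrated with a small, self-contained sketch. The table gives only one-line descriptions, so the exact formulas below are assumptions: range over standard deviation for Std, a 1%/99% quantile spread over the IQR for Quantile, and the fourth standardized moment for Kurtosis. The layer names and activation samples are made up.

```python
import statistics

EPS = 1e-12  # guards against division by zero on constant activations


def std_score(acts):
    """Dynamic range divided by the standard deviation."""
    return (max(acts) - min(acts)) / (statistics.pstdev(acts) + EPS)


def _quantile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted data, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def quantile_score(acts, tail=0.01):
    """Tail-quantile spread divided by the interquartile range (assumed form)."""
    s = sorted(acts)
    spread = _quantile(s, 1.0 - tail) - _quantile(s, tail)
    iqr = _quantile(s, 0.75) - _quantile(s, 0.25)
    return spread / (iqr + EPS)


def kurtosis_score(acts):
    """Fourth standardized moment (Pearson kurtosis, about 3.0 for a Gaussian)."""
    mean = statistics.fmean(acts)
    var = statistics.fmean([(a - mean) ** 2 for a in acts])
    m4 = statistics.fmean([(a - mean) ** 4 for a in acts])
    return m4 / (var ** 2 + EPS)


# Hypothetical activation samples: one well-behaved layer, one with an outlier.
calm = [-1.0, -0.5, 0.0, 0.5, 1.0] * 4
spiky = calm[:-1] + [20.0]
layers = {"layers.0.mlp": calm, "layers.1.mlp": spiky}

# Rank layers from most to least sensitive by the kurtosis score.
ranking = sorted(layers, key=lambda name: kurtosis_score(layers[name]), reverse=True)
for name in ranking:
    acts = layers[name]
    print(name, std_score(acts), quantile_score(acts), kurtosis_score(acts))
```

In this toy case all three scores place the layer with the outlier first. On real activations they can disagree, which is why the table lists them as separate options.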

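Algorithm Selection Advice

  • Beginners: start with One-Click Quantization (V1), which automatically integrates a suitable combination of algorithms.
  • Sensitive layers and fallback: before finalizing the YAML, use Sensitive-Layer Analysis with the metrics in the table above to rank layers or structures. For linear, Kurtosis is a good first choice; for layer, prefer mse_layer_wise.
  • Maximum accuracy: try combining QuaRot + AutoRound.
  • Long-sequence inference: enabling FA3 Quant and KVCache Quant is recommended.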