MindStudio ModelSlim
Simple, fast, and lean—msModelSlim is all you need.
Ascend Model Compression Tool
🌐 Project homepage |
📖 Documentation |
🔥 What's New |
🤔 Issue
🔥 What's New
🗓️ March 2026
- Added support for GLM-4.6V W8A8 quantization.
🗓️ February 2026
- Added support for Qwen3-Omni-30B-A3B-Thinking and Qwen3-Omni-30B-A3B-Instruct W8A8 quantization.
- Added support for Qwen2.5-Omni-7B W8A8 quantization.
- Added support for Qwen3.5-397B-A17B W8A8 quantization.
- Added support for GLM-5 W4A8 quantization.
- Optimized configuration recommendations for quick quantization scenarios.
🗓️ January 2026
- Added support for Qwen3-VL-32B-Instruct W8A8 quantization.
📋 Change History (Click to Expand)
🗓️ December 2025
- Added support for automatic tuning using quantization accuracy feedback, enabling automatic searches for optimal quantization configurations based on accuracy targets.
- Added support for quantization of custom multimodal understanding models, enabling the integration of quantization workflows for these models.
- Added support for multi-device execution during quick quantization, enabling distributed layer-wise quantization to increase foundation model quantization efficiency.
- Added support for DeepSeek-V3.2 W8A8 quantization, requiring only a single device with 64 GB GPU memory and 100 GB system memory.
- Added support for DeepSeek-V3.2-Exp W4A8 quantization, requiring only a single device with 64 GB GPU memory and 100 GB system memory.
- Added support for Qwen3-VL-235B-A22B W8A8 quantization.
🗓️ November 2025
- Added support for plugin-based model adaptation and configuration registration alongside dependency pre-checks.
🗓️ October 2025
- Added support for Qwen3-235B-A22B W4A8 and Qwen3-30B-A3B W4A8 quantization, with quantized model inference and deployment support on the vLLM Ascend framework.
🗓️ September 2025
- Added support for DeepSeek-V3.2-Exp W8A8 quantization, requiring only a single device with 64 GB GPU memory and 100 GB system memory.
- Resolved an issue where abnormal tokens (such as "game copy") frequently occurred during Qwen3-235B-A22B W8A8 quantization operations, as detailed in the Qwen3-MoE quantization best practices.
- Added support for DeepSeek R1 W4A8 per_channel quantization
[Prototype]. - Added support for sensitivity analysis of foundation model quantization layers.
🗓️ August 2025
- Added support for quick quantization of the Wan2.1 model.
- Added support for layer-wise quantization of foundation models, significantly reducing memory consumption during quantization workflows.
- Added support for SSZ weight quantization algorithm of foundation models, improving quantization accuracy by iteratively searching for optimal scaling factors and offsets.
Note: Features labeled with
[Prototype]are not fully verified, meaning they can be unstable or contain bugs. Features labeled with[Beta]represent non-commercial capabilities.
📖 Overview
The Ascend model compression tool MindStudio ModelSlim (msModelSlim) is a compression tool dedicated to hardware acceleration, leveraging compression technologies natively optimized for Ascend architectures. It integrates a suite of inference optimization technologies (such as quantization and compression) designed to accelerate dense foundation models, Mixture of Experts (MoE) models, multimodal understanding models, and multimodal generative models.
Ascend AI model developers can call Python APIs to adapt algorithms and models, optimize accuracy and performance, and export models in different formats. The models can run on Ascend AI Processors through inference frameworks such as MindIE and vLLM Ascend.
🗂️ Directory Structure
The following list describes the key project directories. For a comprehensive breakdown, see Directory Structure.
├─config # Configuration files
├─docs # Documentation directory
├─example # Examples directory
├─lab_calib # Calibration dataset
├─lab_practice # Best practices
├─msmodelslim
│ ├─app # Application module
│ ├─cli # Command-line interface
│ ├─core # Other quantization modules and components
│ ├─infra # Quantization infrastructure
│ ├─model # Model adaptation layer
│ ├─ir # Quantization mode
│ ├─processor # Algorithm
│ └─utils # General utility infrastructure
└─test # Test directory
🧾 Release Notes
The release notes of msModelSlim include the software version mapping, software package download, and feature updates of each version. For details, see Release Notes.
🛠️ Environment Setup
For details about the installation procedure, see the msModelSlim Installation Guide.
🚀 Quick Start
This section helps you quickly get started with the quick quantization of foundation models.
For details, see Quick Start.
✨ Feature Description
🧩 Model Support Matrix
The model support matrix presents the adaptation status of different features and models in various scenarios in a table format.
For details, see Model Support Matrix.
📘 Feature Guide
The feature guide provides feature introductions and usage instructions based on the features supported by msModelSlim across different architectures.
For details, see Tool Documentation. In the navigation tree on the left, select the feature you want to view.
⚙️ Custom Model Quantization
This section provides guidance for developers who need to connect their own models to msModelSlim and perform quick quantization.
For details about model connection, see LLM Model Integration Guide and Multimodal Understanding Model Integration Guide.
🧪 Cases Studies
The case collection provides text descriptions and code samples based on actual application scenarios, aiming to help users quickly get familiar with the usage of msModelSlim in specific scenarios, including accuracy tuning methods. msModelSlim will continuously improve the case collection.
| Case Category | Case Name | Description |
|---|---|---|
| v1 framework quantization accuracy tuning | v1 Framework Quantization Accuracy Tuning Guide | |
| v1 framework Qwen3-32B accuracy tuning | v1 Framework Qwen3-32B w8a8 Accuracy Tuning Case | |
| Weight Conversion | Guide for Using msModelSlim Quantized Weights with AutoAWQ and AutoGPTQ | Quantized weight format conversion guide |
| Inference and Deployment | Quantized Weight Usage Cases in Acceleration Library and MindIE-Torch Scenarios | Usage methods of quantized weights in inference acceleration libraries |
❓ FAQ
For details about the frequently asked questions, see FAQ.
🤝 Contribution Guide
For details, see Contribution Guide.
📌 Related Statements
🛡️ Security Statement
Describes the security hardening information, public network address information, and communication matrix of msModelSlim. For details, see msModelSlim Security Statement.
⚠️ Disclaimer
👤 To msModelSlim Users
- This tool is intended solely for debugging and development. You are responsible for any risks and should carefully review the following information:
- msModelSlim depends on third-party open-source software such as Transformers and PyTorch, which is provided and maintained by their respective communities. Resolution of issues in these dependencies relies on community contributions and feedback. Please notice that the msModelSlim repository does not guarantee fixes for issues in third-party software, nor does it guarantee testing or correction of all vulnerabilities or errors in such software.
- When you use msModelSlim, it reads model weights from local storage based on provided command-line parameters or configuration files. Using untrusted model weights may cause unknown security risks. You are advised to use methods such as SHA256 verification to ensure model weights are trusted before passing them to the tool.
- To ensure security and implement the principle of least privilege, you are advised to use msModelSlim as a standard user rather than a high-privilege user (such as root).
- Adhere to the principle of least privilege. For example, prevent other users from writing data by disabling permissions such as
666and777. - Ensure the
umaskvalue of the execution user is greater than or equal to0027to prevent excessive permissions on generated quantized model directories and files.- To check the
umaskvalue, run theumaskcommand. - To change the
umaskvalue, run theumask new_valuecommand.
- To check the
- Ensure that original model data and quantized model data are stored in the current user directory without symbolic links to avoid potential security issues.
- Adhere to the principle of least privilege. For example, prevent other users from writing data by disabling permissions such as
- Data processing and deletion: Users are responsible for managing and deleting any data generated while using this tool. You are advised to promptly delete any related data after use to prevent information leaks.
- Data confidentiality and transmission: Users understand and agree not to share or transmit any data generated by this tool. Neither the tool nor its developers are responsible for any information leaks, data breaches, or other negative consequences.
- User input security: Users are responsible for the security of any commands they enter and for any risks or losses resulting from improper input. The tool and its developers are not liable for issues caused by incorrect command usage.
- Disclaimer scope: This disclaimer applies to all individuals and entities using this tool. By using the tool, you acknowledge and accept this statement and assume all risks and responsibilities arising from its use. If you do not agree, please stop using the tool immediately.
- Before using this tool, please read and understand the preceding disclaimer. If you have any questions, contact the developer.
📦 To Data Owners
If you do not want your dataset to be mentioned in the models of msModelSlim, or if you wish to update its description, please submit an issue on Gitcode. msModelSlim will delete or update your dataset description according to your request. Thank you for your understanding and contribution to msModelSlim.
📜Contribution Statement
- Error report submission: If you discover a vulnerability in msModelSlim that is not a security issue, first search the Issues in the msModelSlim repository to avoid submitting duplicates. If the vulnerability is not listed, create a issue. If you discover a security-related issue, do not disclose it publicly. Please refer to the security handling guidelines for details. All error reports must include complete information about the issue.
- Security issue handling: For guidance on handling security issues in this project, please contact the core team via email for instructions.
- Resolving existing issues: Browse open Issues to identify issues that need attention, and attempt to fix them.
- Proposing new features: Use the Feature label when creating an issue for a new feature. We will review and confirm proposals periodically.
- How to contribute: a. Fork the repository of the project. b. Clone it to your local machine. c. Create a development branch. d. Conduct local testing. All unit tests, including any new test cases, must pass before submission. e. Submit your code. f. Create a pull request (PR). g. Code review: Modify the code according to review comments and resubmit your changes. This process may involve multiple rounds of iterations. h. After your PR is approved by the required number of reviewers, the committer will conduct the final review. i. After your PR is approved and all tests pass, the CI system will merge it into the project's main branch.
📄 LICENSE
For the license of msModelSlim, see LICENSE.
Documents in the docs directory of msModelSlim are licensed under CC-BY 4.0. For details, see LICENSE.
💬 Suggestions and Feedback
You are welcome to contribute to the community. If you have any questions or suggestions, please submit Issues. We will reply as soon as possible. Thank you for your support.
🙏 Acknowledgements
msModelSlim is jointly developed by the following Huawei departments and Ascend ecosystem partners:
Huawei:
- Computing Product Line
- 2012 Labs
Thank you to everyone in the community for your PRs. We warmly welcome your contributions.
👥 About the MindStudio Team
The Huawei MindStudio full-pipeline development toolchain team is dedicated to providing an end-to-end solution for building Ascend AI applications, accelerating the processes of training, inference, and operator development. You can learn more about the Huawei MindStudio team through the following channels:
Send "communication group" to the official account to obtain the QR code of the technical communication group.