Debug Mode User Guide
Overview
Debug mode is an advanced feature provided by the quick quantization function. It is used to save key information during quantization to help developers and advanced users gain in-depth insights into the quantization process, troubleshoot problems, or conduct algorithm research.
After debug mode is enabled, the tool automatically saves the context information during quantization. This information includes the intermediate results, statistics, and tensor information of each processor for subsequent analysis and debugging.
Usage
Syntax
To enable debug mode, add the --debug option to the quick quantization command:
msmodelslim quant --debug [arguments]
Command-line Options
| Option | Purpose | Mandatory (Yes/No) | Description |
|---|---|---|---|
| --debug | Enables debug mode. | No | After this option is added, the context information is automatically saved to the debug directory when quantization is complete. |
Examples
Example 1: Basic Debug Mode
Perform W8A8 quantization on the Qwen2.5-7B-Instruct model and enable debug mode:
msmodelslim quant \
--model_path ${MODEL_PATH} \
--save_path ${SAVE_PATH} \
--device npu \
--model_type Qwen2.5-7B-Instruct \
--quant_type w8a8 \
--trust_remote_code True \
--debug
Example 2: Multi-Device Quantization + Debug Mode
Perform distributed quantization by using multiple devices and enable debug mode:
msmodelslim quant \
--model_path ${MODEL_PATH} \
--save_path ${SAVE_PATH} \
--device npu:0,1,2,3 \
--model_type Qwen2.5-7B-Instruct \
--quant_type w8a8 \
--trust_remote_code True \
--debug
Example 3: Custom Configuration + Debug Mode
Perform quantization by using a custom configuration file and enable debug mode:
msmodelslim quant \
--model_path ${MODEL_PATH} \
--save_path ${SAVE_PATH} \
--device npu \
--model_type ${MODEL_TYPE} \
--config_path ${CONFIG_PATH} \
--trust_remote_code True \
--debug
Debug Information Output
Output Directory Structure
After debug mode is enabled, the tool creates the debug_info subdirectory under the quantization weight save path (save_path) to store debug information.
${SAVE_PATH}/
├── config.json                           # Original model configuration file
├── generation_config.json                # Original generation configuration file
├── quant_model_description.json          # Description file for quantized weights
├── quant_model_weight_w8a8.safetensors   # Quantized weight file
├── tokenizer_config.json                 # Original tokenizer configuration file
├── tokenizer.json                        # Original tokenizer vocabulary
└── debug_info/                           # Debug information directory
    ├── debug_info.json                   # Debug metadata in JSON format
    └── debug_info.safetensors            # Debug tensor data in SafeTensors format
Output File Description
debug_info.json
Stores non-tensor data and tensor metadata generated during the quantization process. The data is organized by namespace.
File Structure
{
  "namespace_key_1": {
    "field_1": "value",
    "field_2": 123,
    "tensor_field": {
      "_type": "tensor",
      "_file": "debug_info.safetensors",
      "_key": "tensor_0"
    }
  },
  "namespace_key_2": {
    ...
  }
}
Field Description
- Namespace (namespace): An independent namespace is created by each processor or module to isolate the debug information of different phases.
- Common fields: record scalar values such as integers, floating-point numbers, strings, and Boolean values directly.
- Tensor fields: store reference information for PyTorch tensors.
  - _type: indicates that the field is a tensor reference, with the fixed value "tensor".
  - _file: indicates the name of the file where the tensor data is stored (debug_info.safetensors).
  - _key: indicates the key name of the tensor in the SafeTensors file.
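The reference convention above can be resolved mechanically. The following is a minimal sketch, assuming the metadata follows the structure shown under File Structure. The helper name iter_tensor_refs and the sample data are illustrative and are not part of msModelSlim.

```python
from typing import Any, Iterator, Tuple


def iter_tensor_refs(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str, str]]:
    """Yield (dotted_path, file_name, tensor_key) for every tensor reference.

    A tensor reference is assumed to be a dict whose "_type" is "tensor",
    as described in the field description.
    """
    if isinstance(node, dict):
        if node.get("_type") == "tensor":
            yield ".".join(path), node["_file"], node["_key"]
            return
        for name, child in node.items():
            yield from iter_tensor_refs(child, path + (name,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_tensor_refs(child, path + (str(index),))


# Sample metadata mirroring the File Structure example (values are made up).
sample_meta = {
    "namespace_key_1": {
        "field_1": "value",
        "field_2": 123,
        "tensor_field": {
            "_type": "tensor",
            "_file": "debug_info.safetensors",
            "_key": "tensor_0",
        },
    },
}

for dotted_path, file_name, tensor_key in iter_tensor_refs(sample_meta):
    print(f"{dotted_path} -> {file_name}[{tensor_key}]")
```

Each tensor_key found this way can then be used to look up the corresponding tensor in the SafeTensors file named by file_name.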
debug_info.safetensors
Stores all tensor data generated during the quantization process in SafeTensors format, including:
- Quantization parameters such as scale and zero_point
- Statistics such as minimum values, maximum values, and histograms
- Intermediate result tensors
- Other tensors used for debugging
Features
- Efficient storage: The SafeTensors format supports fast loading and memory mapping.
- Cross-platform compatibility: The file can be shared across different frameworks and platforms.
- Security: Unlike the pickle format, SafeTensors files cannot execute arbitrary code when loaded, which avoids code injection risks.
Saved Information
Debug mode saves the debug information generated by each processor during quantization. The specific content depends on the quantization configuration and processor type used. Typical debug information includes:
Quantization processor (linear_quant)
- Quantization parameters such as scale and zero_point for each layer
- Activation statistics such as min, max, and histogram
- Weight statistics
- Error analysis before and after quantization
Outlier suppression processors (such as iter_smooth and flex_smooth)
- Smoothing factors
- Intermediate results generated during iteration
- Smoothing effect evaluation metrics for each layer
Other processors
- Processor-specific configuration parameters
- Intermediate calculation results
- Performance statistics
Debug Information Usage
Loading Debug Information
You can use a Python script to load and analyze the debug information.
import json
from safetensors import safe_open

# Load JSON metadata
with open("debug_info/debug_info.json", "r") as f:
    debug_meta = json.load(f)

# Load SafeTensors tensor data
with safe_open("debug_info/debug_info.safetensors", framework="pt") as f:
    # Obtain the key names of all tensors
    tensor_keys = f.keys()
    # Load specific tensors
    for key in tensor_keys:
        tensor = f.get_tensor(key)
        print(f"{key}: shape={tensor.shape}, dtype={tensor.dtype}")
Analyzing Quantization Effects
Perform the following analysis by using the debug information:
1. View the distribution of quantization parameters
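# Analyze the quantization parameters in a namespace
# (adjust the namespace and field names to match your debug_info.json)
namespace = debug_meta["linear_quant_namespace"]
# View the reference of the scale parameter
if "scales" in namespace and namespace["scales"]["_type"] == "tensor":
    scale_key = namespace["scales"]["_key"]
    # Load actual data from SafeTensors for analysis
    with safe_open("debug_info/debug_info.safetensors", framework="pt") as f:
        scales = f.get_tensor(scale_key)
    print(f"scales: min={scales.min().item()}, max={scales.max().item()}")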
2. Compare statistics of different layers
# Traverse all namespaces and collect statistics
for ns_name, ns_data in debug_meta.items():
    if "layer_stats" in ns_data:
        print(f"Layer: {ns_name}")
        print(f"Stats: {ns_data['layer_stats']}")
3. Troubleshoot accuracy issues
Locate the cause of accuracy degradation by comparing activation value distributions before and after quantization and checking outlier suppression effects.
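To make the before-and-after comparison concrete, the sketch below quantizes a list of values to int8 with a given scale, dequantizes them, and reports the mean squared error. It assumes symmetric per-tensor quantization, which may differ from the scheme msModelSlim actually applies. The function names are illustrative, and plain Python lists stand in for tensors loaded from debug_info.safetensors.

```python
from typing import List


def fake_quantize_int8(values: List[float], scale: float) -> List[float]:
    """Quantize to int8 and dequantize again (symmetric, per-tensor: an assumption)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    dequantized = []
    for value in values:
        q = round(value / scale)
        q = max(-128, min(127, q))  # clamp to the int8 range
        dequantized.append(q * scale)
    return dequantized


def mean_squared_error(reference: List[float], approx: List[float]) -> float:
    """Average squared difference between two equally long, non-empty lists."""
    if not reference or len(reference) != len(approx):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)


# Made-up activation values with one outlier that stretches the scale.
activations = [0.02, -0.5, 1.3, 12.0]
scale = max(abs(v) for v in activations) / 127
restored = fake_quantize_int8(activations, scale)
print(f"scale={scale:.6f}, mse={mean_squared_error(activations, restored):.3e}")
```

A single outlier enlarges the scale and therefore the rounding error of the small values, which is the kind of effect outlier suppression is meant to reduce.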
Precautions
Storage Space
- The debug information can occupy significant drive space, typically 10% to 50% of the model size. Ensure that sufficient storage space is available.
- For ultra-large models (100B+ parameters), the debug information can reach dozens of GB.
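To check how much space the debug information takes in a given run, the directory sizes can be compared directly. The following is a minimal sketch that assumes the output layout shown in Output Directory Structure. The function names are illustrative, and the demo uses small made-up files in a temporary directory instead of a real quantization output.

```python
import tempfile
from pathlib import Path


def directory_size_bytes(directory: Path) -> int:
    """Sum the sizes of all regular files under directory (recursively)."""
    return sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())


def debug_info_ratio(save_path: Path) -> float:
    """Return the debug_info size divided by the size of everything else in save_path."""
    debug_bytes = directory_size_bytes(save_path / "debug_info")
    other_bytes = directory_size_bytes(save_path) - debug_bytes
    if other_bytes <= 0:
        raise ValueError("no files found outside debug_info")
    return debug_bytes / other_bytes


# Demo with placeholder files; replace the temporary directory with a real save_path.
with tempfile.TemporaryDirectory() as tmp:
    save_path = Path(tmp)
    (save_path / "debug_info").mkdir()
    (save_path / "quant_model_weight_w8a8.safetensors").write_bytes(b"\0" * 1000)
    (save_path / "debug_info" / "debug_info.safetensors").write_bytes(b"\0" * 250)
    demo_ratio = debug_info_ratio(save_path)

print(f"debug_info is {demo_ratio:.0%} of the remaining output size")
```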
Performance Impact
- Enabling debug mode slightly increases quantization time, typically by 5% to 10%.
- The main overhead is the time required to serialize the debug information and write it to the drive.
Security
- The debug information may contain sensitive information about the model (such as quantization parameters and statistics).
- Keep the debug information file secure to prevent unauthorized disclosure.
Compatibility
- The format of the debug information may change with version updates.
- Use msModelSlim of the same version to load and analyze the debug information.
Application Scenarios
Debug mode is applicable to the following scenarios:
1. Quantization accuracy tuning
Analyze the debug information to address suboptimal model accuracy after quantization:
- Identify layers with large quantization errors.
- Verify if outlier suppression algorithms are effective.
- Evaluate if the distribution of quantization parameters is reasonable.
2. Algorithm research and development
Researchers can use the debug information to perform the following tasks:
- Analyze the effects of different quantization algorithms.
- Compare quantization results across different configurations.
- Develop new quantization algorithms or optimization strategies.
3. Troubleshooting and reporting
When a quantization issue occurs, the debug information can help with the following tasks:
- Quickly locate the cause of the issue.
- Provide detailed diagnostic information to technical support.
- Reproduce and verify the issue.
4. Model analysis and optimization
Use the debug information to perform the following tasks:
- Understand the activation distribution characteristics of each model layer.
- Identify quantization-sensitive layers.
- Establish a basis for formulating mixed-precision quantization strategies.
FAQ
Q1: What Should I Do If the Debug Information Occupies Too Much Space?
A: Consider the following methods:
- Analyze and delete unnecessary debug information promptly after quantization is complete.
- Enable debug mode only when in-depth analysis is required.
- Archive the debug information directory by using a compression tool.
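Archiving can also be scripted. The sketch below uses the Python standard library to pack the debug_info directory into a gzip-compressed tar file next to it. The function name and the remove_original parameter are illustrative, and how much space compression saves depends on the data. The demo runs on a placeholder directory.

```python
import shutil
import tempfile
from pathlib import Path


def archive_debug_info(save_path: Path, remove_original: bool = False) -> Path:
    """Compress save_path/debug_info into save_path/debug_info.tar.gz."""
    debug_dir = save_path / "debug_info"
    if not debug_dir.is_dir():
        raise FileNotFoundError(f"{debug_dir} does not exist")
    archive = shutil.make_archive(
        base_name=str(save_path / "debug_info"),  # ".tar.gz" is appended automatically
        format="gztar",
        root_dir=str(save_path),
        base_dir="debug_info",
    )
    if remove_original:
        shutil.rmtree(debug_dir)  # only after the archive was written successfully
    return Path(archive)


# Demo with a placeholder directory; replace it with a real save_path.
with tempfile.TemporaryDirectory() as tmp:
    save_path = Path(tmp)
    (save_path / "debug_info").mkdir()
    (save_path / "debug_info" / "debug_info.json").write_text("{}")
    archive_path = archive_debug_info(save_path)
    archive_name = archive_path.name
    archive_created = archive_path.is_file()

print(f"created {archive_name}: {archive_created}")
```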
Q2: How Do I Determine Whether to Enable Debug Mode?
A: Enabling debug mode is recommended in the following cases:
- Accuracy decreases significantly after quantization and the cause must be located.
- New quantization configurations or algorithm combinations are tested.
- Algorithm research or model analysis is performed.
- Detailed information is required when an issue is reported to technical support.
Q3: What Should I Do If the Debug Information Fails to Be Saved?
A: Possible causes and solutions include:
- Insufficient drive space: Clear the drive space or change the save path.
- Permission issues: Ensure that you have write permissions for the save path.
- Invalid path: Ensure that the path specified by the save_path parameter is valid.
If saving fails, the tool outputs a warning message but does not interrupt the quantization process. Quantized weights are still saved normally.
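The three causes above can be checked before a long quantization run starts. The following is a minimal sketch using only the Python standard library. The function name check_save_path and the required_bytes estimate are illustrative and are not part of msModelSlim.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def check_save_path(save_path: Path, required_bytes: int) -> List[str]:
    """Return a list of problems that could prevent debug information from being saved."""
    problems: List[str] = []
    # Walk up to the nearest existing ancestor, since save_path may not exist yet.
    existing = save_path
    while not existing.exists() and existing != existing.parent:
        existing = existing.parent
    if not existing.is_dir():
        problems.append(f"{existing} is not a directory")
        return problems
    if not os.access(existing, os.W_OK | os.X_OK):
        problems.append(f"no write permission for {existing}")
    free_bytes = shutil.disk_usage(existing).free
    if free_bytes < required_bytes:
        problems.append(f"only {free_bytes} bytes free, {required_bytes} required")
    return problems


# Demo on a temporary directory: one usable path and one path that is a file.
with tempfile.TemporaryDirectory() as tmp:
    ok_problems = check_save_path(Path(tmp) / "new_output_dir", required_bytes=1024)
    file_path = Path(tmp) / "not_a_dir"
    file_path.write_text("x")
    file_problems = check_save_path(file_path, required_bytes=1024)

print(ok_problems, file_problems)
```

An empty list means none of the three checks found a problem. It does not guarantee that saving will succeed.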
Q4: Can Debug Information Be Shared Across Different Devices?
A: Yes. The debug information is in standard JSON and SafeTensors formats, allowing it to be shared across different devices and platforms. However, note the following:
- Ensure that a compatible version of msModelSlim is used.
- SafeTensors files can be large. Pay attention to network bandwidth during transmission.
Q5: Does Debug Mode Affect Quantization Results?
A: No. Debug mode only records context information after quantization is complete. It does not change the execution logic or results of the quantization algorithm. The generated quantized weights are identical regardless of whether debug mode is enabled.