Debug Mode User Guide

Overview

Debug mode is an advanced feature of quick quantization. It saves key information during quantization so that developers and advanced users can inspect the quantization process, troubleshoot problems, or conduct algorithm research.

After debug mode is enabled, the tool automatically saves the context information during quantization. This information includes the intermediate results, statistics, and tensor information of each processor for subsequent analysis and debugging.

Usage

Syntax

To enable debug mode, add the --debug option to the quick quantization command:

msmodelslim quant --debug [arguments]

Command-line Options

Option	Purpose	Mandatory (Yes/No)	Description
--debug	Enables debug mode.	No	After this option is added, the context information is automatically saved to the debug_info directory under save_path when quantization is complete.

Examples

Example 1: Basic Debug Mode

Perform W8A8 quantization on the Qwen2.5-7B-Instruct model and enable debug mode:

msmodelslim quant \
  --model_path ${MODEL_PATH} \
  --save_path ${SAVE_PATH} \
  --device npu \
  --model_type Qwen2.5-7B-Instruct \
  --quant_type w8a8 \
  --trust_remote_code True \
  --debug

Example 2: Multi-Device Quantization + Debug Mode

Perform distributed quantization by using multiple devices and enable debug mode:

msmodelslim quant \
  --model_path ${MODEL_PATH} \
  --save_path ${SAVE_PATH} \
  --device npu:0,1,2,3 \
  --model_type Qwen2.5-7B-Instruct \
  --quant_type w8a8 \
  --trust_remote_code True \
  --debug

Example 3: Custom Configuration + Debug Mode

Perform quantization by using a custom configuration file and enable debug mode:

msmodelslim quant \
  --model_path ${MODEL_PATH} \
  --save_path ${SAVE_PATH} \
  --device npu \
  --model_type ${MODEL_TYPE} \
  --config_path ${CONFIG_PATH} \
  --trust_remote_code True \
  --debug

Debug Information Output

Output Directory Structure

After debug mode is enabled, the tool creates the debug_info subdirectory under the quantization weight save path (save_path) to store debug information.

${SAVE_PATH}/
├── config.json                          # Original model configuration file
├── generation_config.json               # Original generation configuration file
├── quant_model_description.json         # Description file for quantized weights
├── quant_model_weight_w8a8.safetensors  # Quantized weight file
├── tokenizer_config.json                # Original tokenizer configuration file
├── tokenizer.json                       # Original tokenizer vocabulary
└── debug_info/                          # Debug information directory
    ├── debug_info.json                  # Debug metadata in JSON format
    └── debug_info.safetensors           # Debug tensor data in SafeTensors format
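
Before starting an analysis, it can help to confirm that both debug files are present. The following is a minimal sketch that assumes only the directory layout shown above. The helper name locate_debug_files is hypothetical and is not part of msModelSlim.

```python
from pathlib import Path


def locate_debug_files(save_path):
    """Return (json_path, safetensors_path) under <save_path>/debug_info.

    Hypothetical helper, not part of msModelSlim. It relies only on the
    directory layout documented above.
    """
    debug_dir = Path(save_path) / "debug_info"
    json_path = debug_dir / "debug_info.json"
    tensor_path = debug_dir / "debug_info.safetensors"
    # Collect the names of any expected files that are not on disk.
    missing = [p.name for p in (json_path, tensor_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {debug_dir}: {', '.join(missing)}")
    return json_path, tensor_path
```

Call it with the same value that was passed to --save_path; a FileNotFoundError indicates that debug mode was not enabled or that saving failed.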

Output File Description

`debug_info.json`

Stores non-tensor data and tensor metadata generated during the quantization process. The data is organized by namespace.

File Structure

{
  "namespace_key_1": {
    "field_1": "value",
    "field_2": 123,
    "tensor_field": {
      "_type": "tensor",
      "_file": "debug_info.safetensors",
      "_key": "tensor_0"
    }
  },
  "namespace_key_2": {
    ...
  }
}

Field Description

Namespace (namespace): Each processor or module creates its own namespace to isolate the debug information of different phases.
Common fields: record scalar values such as integers, floating-point numbers, strings, and Boolean values directly.
Tensor fields: store reference information for PyTorch tensors.
- _type: marks the field as a tensor reference. The value is always "tensor".
- _file: indicates the name of the file where the tensor data is stored (debug_info.safetensors).
- _key: indicates the key name of the tensor in the SafeTensors file.
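
The reference fields above can be used to list every tensor that the metadata points to. The following sketch assumes that tensor references appear as top-level fields of a namespace, as in the structure shown above. The function name iter_tensor_refs and the sample data are illustrative only.

```python
def iter_tensor_refs(debug_meta):
    """Yield (namespace, field, safetensors_key) for every tensor reference.

    Only top-level fields of each namespace are inspected, matching the
    structure shown above. Nested layouts are not assumed.
    """
    for ns_name, ns_data in debug_meta.items():
        if not isinstance(ns_data, dict):
            continue
        for field, value in ns_data.items():
            if isinstance(value, dict) and value.get("_type") == "tensor":
                yield ns_name, field, value["_key"]


# Illustrative metadata mirroring the structure above (values are made up).
sample = {
    "namespace_key_1": {
        "field_1": "value",
        "field_2": 123,
        "tensor_field": {
            "_type": "tensor",
            "_file": "debug_info.safetensors",
            "_key": "tensor_0",
        },
    },
    "namespace_key_2": {},
}

for ns, field, key in iter_tensor_refs(sample):
    print(f"{ns}.{field} -> {key}")  # namespace_key_1.tensor_field -> tensor_0
```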

`debug_info.safetensors`

Stores all tensor data generated during the quantization process in SafeTensors format, including:

Quantization parameters such as scale and zero_point
Statistics such as minimum values, maximum values, and histograms
Intermediate result tensors
Other tensors used for debugging

Features

Efficient storage: The SafeTensors format supports fast loading and memory mapping.
Cross-platform compatibility: The file can be shared across different frameworks and platforms.
Security: Unlike the pickle format, SafeTensors does not execute code when a file is loaded, which avoids code injection risks.
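
Because a SafeTensors file begins with an 8-byte little-endian header length followed by a JSON header, the list of stored tensors can be inspected without loading any tensor data. The following sketch uses only the Python standard library. The function names are illustrative, and the sketch does not depend on how msModelSlim names its tensors.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a SafeTensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(path):
    """Print the name, dtype, and shape of each stored tensor."""
    for name, info in sorted(read_safetensors_header(path).items()):
        print(f"{name}: dtype={info['dtype']}, shape={info['shape']}")
```

For example, summarize("debug_info/debug_info.safetensors") gives a quick inventory before you decide which tensors to load.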

Saved Information

Debug mode saves the debug information generated by each processor during quantization. The specific content depends on the quantization configuration and processor type used. Typical debug information includes:

Quantization processor (linear_quant)

Quantization parameters such as scale and zero_point for each layer
Activation statistics such as min, max, and histogram
Weight statistics
Error analysis before and after quantization

Outlier suppression processors (such as iter_smooth and flex_smooth)

Smoothing factors
Intermediate results generated during iteration
Smoothing effect evaluation metrics for each layer

Other processors

Processor-specific configuration parameters
Intermediate calculation results
Performance statistics

Debug Information Usage

Loading Debug Information

You can use a Python script to load and analyze the debug information.

import json
from safetensors import safe_open

# Load JSON metadata (paths are relative to ${SAVE_PATH})
with open("debug_info/debug_info.json", "r", encoding="utf-8") as f:
    debug_meta = json.load(f)

# Load SafeTensors tensor data
with safe_open("debug_info/debug_info.safetensors", framework="pt") as f:
    # Obtain the key names of all tensors
    tensor_keys = f.keys()
    
    # Load specific tensors
    for key in tensor_keys:
        tensor = f.get_tensor(key)
        print(f"{key}: shape={tensor.shape}, dtype={tensor.dtype}")

Analyzing Quantization Effects

Perform the following analysis by using the debug information:

1. View the distribution of quantization parameters

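# Analyze the quantization parameters in a namespace.
# The namespace and field names here are illustrative and may differ in your debug_info.json.
namespace = debug_meta.get("linear_quant_namespace", {})

# Resolve the reference of the scale parameter
scales_ref = namespace.get("scales")
if isinstance(scales_ref, dict) and scales_ref.get("_type") == "tensor":
    scale_key = scales_ref["_key"]
    # Load the actual data from SafeTensors for analysis
    with safe_open("debug_info/debug_info.safetensors", framework="pt") as f:
        scales = f.get_tensor(scale_key)
    print(f"scales: min={scales.min().item():.6g}, max={scales.max().item():.6g}")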

2. Compare statistics of different layers

# Traverse all namespaces and collect statistics
for ns_name, ns_data in debug_meta.items():
    if "layer_stats" in ns_data:
        print(f"Layer: {ns_name}")
        print(f"Stats: {ns_data['layer_stats']}")

3. Troubleshoot accuracy issues

Locate the cause of accuracy degradation by comparing activation value distributions before and after quantization and checking outlier suppression effects.
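
When the metadata contains a per-namespace scalar that reflects quantization error, sorting by that scalar is a quick way to find suspicious layers. The following is a sketch only. The field name is passed in by the caller because the fields actually present depend on the processors used, and "quant_error" in the sample data is a hypothetical name, not one the tool is known to emit.

```python
def rank_namespaces(debug_meta, field, top=5):
    """Sort namespaces by a numeric scalar field, largest first.

    `field` is whatever scalar name your debug_info.json actually contains.
    This sketch does not assume that any particular name exists.
    """
    scored = []
    for ns_name, ns_data in debug_meta.items():
        value = ns_data.get(field) if isinstance(ns_data, dict) else None
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            scored.append((ns_name, value))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Made-up metadata; "quant_error" is a hypothetical field name.
example_meta = {
    "layer_0": {"quant_error": 0.02},
    "layer_1": {"quant_error": 0.31},
    "layer_2": {"note": "no error recorded"},
}
print(rank_namespaces(example_meta, "quant_error"))
# [('layer_1', 0.31), ('layer_0', 0.02)]
```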

Precautions

Storage Space

The debug information can occupy significant disk space, typically 10% to 50% of the model size. Ensure that sufficient storage space is available.
For ultra-large models (100B+ parameters), the debug information can reach tens of GB.
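
The 10% to 50% range above can be turned into a rough pre-flight check. The following sketch assumes that the range applies to your model and that the save path already exists. The function names are illustrative, and the space needed for the quantized weights themselves is not counted.

```python
import shutil


def debug_space_estimate(model_bytes, low_pct=10, high_pct=50):
    """Return the (low, high) debug-info size estimate in bytes."""
    return model_bytes * low_pct // 100, model_bytes * high_pct // 100


def has_room_for_debug(save_path, model_bytes):
    """True if free space at save_path covers the upper estimate.

    save_path must already exist. The quantized weights also need space
    and are not counted here.
    """
    _, upper = debug_space_estimate(model_bytes)
    return shutil.disk_usage(save_path).free >= upper
```

For a model that occupies 15 GB on disk, debug_space_estimate(15 * 10**9) returns an estimate of 1.5 GB to 7.5 GB.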

Performance Impact

Enabling debug mode slightly increases quantization time, typically by 5% to 10%.
The main overhead is the time required to serialize the debug information and write it to disk.

Security

The debug information may contain sensitive information about the model (such as quantization parameters and statistics).
Keep the debug information files secure to prevent unauthorized disclosure.

Compatibility

The format of the debug information may change with version updates.
Use msModelSlim of the same version to load and analyze the debug information.

Application Scenarios

Debug mode is applicable to the following scenarios:

1. Quantization accuracy tuning

Analyze the debug information to address suboptimal model accuracy after quantization:

Identify layers with large quantization errors.
Verify whether outlier suppression algorithms are effective.
Evaluate whether the distribution of quantization parameters is reasonable.

2. Algorithm research and development

Researchers can use the debug information to perform the following tasks:

Analyze the effects of different quantization algorithms.
Compare quantization results across different configurations.
Develop new quantization algorithms or optimization strategies.

3. Troubleshooting and reporting

When a quantization issue occurs, the debug information can help with the following tasks:

Quickly locate the cause of the issue.
Provide detailed diagnostic information to technical support.
Reproduce and verify the issue.

4. Model analysis and optimization

Use the debug information to perform the following tasks:

Understand the activation distribution characteristics of each model layer.
Identify quantization-sensitive layers.
Establish a basis for mixed-precision quantization strategies.

FAQ

Q1: What Should I Do If the Debug Information Occupies Too Much Space?

A: Consider the following methods:

Analyze and delete unnecessary debug information promptly after quantization is complete.
Enable debug mode only when in-depth analysis is required.
Archive the debug information directory by using a compression tool.
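
Archiving can be done with the Python standard library. The following sketch assumes the debug_info directory sits directly under the save path, as described in this guide. The function name archive_debug_info is illustrative.

```python
import shutil
from pathlib import Path


def archive_debug_info(save_path, remove_original=False):
    """Pack <save_path>/debug_info into debug_info.tar.gz next to it."""
    save_path = Path(save_path)
    debug_dir = save_path / "debug_info"
    archive = shutil.make_archive(
        str(save_path / "debug_info"),  # archive path without extension
        "gztar",
        root_dir=str(save_path),
        base_dir="debug_info",
    )
    if remove_original:
        # Delete the uncompressed directory only after the archive exists.
        shutil.rmtree(debug_dir)
    return Path(archive)
```

Pass remove_original=True only after you have finished analyzing the debug information.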

Q2: How Do I Determine Whether to Enable Debug Mode?

A: Enabling debug mode is recommended in the following cases:

Accuracy decreases significantly after quantization and you need to locate the cause.
You are testing new quantization configurations or algorithm combinations.
You are performing algorithm research or model analysis.
You need detailed information to report an issue to technical support.

Q3: What Should I Do If the Debug Information Fails to Be Saved?

A: Possible causes and solutions include:

Insufficient disk space: Free up disk space or change the save path.
Permission issues: Ensure that you have write permissions for the save path.
Invalid path: Ensure that the path specified by the save_path parameter is valid.

If saving fails, the tool outputs a warning message but does not interrupt the quantization process. Quantized weights are still saved normally.
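
The permission and path causes above can be checked before a long quantization run. The following is a sketch that relies only on the standard library. The function name check_save_path is hypothetical, and the check tests the nearest existing ancestor directory because this guide does not state whether the tool creates a missing save path.

```python
import os
from pathlib import Path


def check_save_path(save_path):
    """Return a list of problems that would likely block saving; empty if none."""
    path = Path(save_path)
    problems = []
    if path.exists() and not path.is_dir():
        problems.append(f"{path} exists but is not a directory")
        return problems
    # Walk up to the nearest existing ancestor to test write permission.
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    if not os.access(probe, os.W_OK | os.X_OK):
        problems.append(f"no write permission on {probe}")
    return problems
```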

Q4: Can Debug Information Be Shared Across Different Devices?

A: Yes. The debug information is in standard JSON and SafeTensors formats, allowing it to be shared across different devices and platforms. However, note the following:

Ensure that a compatible version of msModelSlim is used.
SafeTensors files can be large. Pay attention to network bandwidth during transmission.

Q5: Does Debug Mode Affect Quantization Results?

A: No. Debug mode only records context information and saves it after quantization is complete. It does not change the execution logic or results of the quantization algorithm. The generated quantized weights are identical regardless of whether debug mode is enabled.
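
One way to confirm this on your own setup is to quantize twice into different save paths, once with --debug and once without, and compare the weight files. The following sketch assumes that quantization and file serialization are deterministic across runs. If the hashes differ, compare the tensors themselves before concluding that the weights differ. The function names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so that large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_content(path_a, path_b):
    """True if both files are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Apply same_file_content to the quantized weight file (for example, quant_model_weight_w8a8.safetensors) from each of the two save paths.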