MindStudio Probe

🚀 All-scenario Ascend AI precision debugging tool


📢 What's New

[2026.03.28]: Notice of Deprecation: ADump Module in the msProbe Repository

[2026.03.20]: Released the Foundation Model Training Accuracy Debugging Guide, Foundation Model Inference Accuracy Debugging Guide, and Common Framework Tool Instructions.

[2025.12.31]: Released the open-source version of MindStudio Probe.

📌 Overview

MindStudio Probe (msProbe) is a full-scenario precision toolchain for Ascend. It is designed for precision debugging during model development and helps you locate model precision problems far more efficiently.

🔍 Directory Structure

The key directories are as follows. For details, see Project Directory.

MindStudio-probe
├── ccsrc            # C/C++ source code directory
├── cmake            # CMake files for the C-based components of msProbe
├── docs             # Documentation directory
├── examples         # Directory for tool configuration examples
├── output           # Directory for generated deliverables
├── plugins          # Entry to plugin code
├── python           # Python source code directory
├── scripts          # Directory for storing installation, uninstallation, and upgrade scripts
├── test             # Test code directory
├── setup.py         # E2E packaging and build script
├── README.md        # Repository description
└── LICENSE          # License file

📝 Version Description

| Version | Supported PyTorch Version | Supported MindSpore Version | Supported Python Version | Supported CANN Version |
|---|---|---|---|---|
| 26.0.0 (under development) | 2.1/2.2/2.5/2.6/2.7/2.8/2.9 | 2.4.0/2.5.0/2.6.0/2.7.1 | 3.8-3.12 | CANN 8.3.RC1 or later |
| 26.0.0-alpha.2 | 2.1/2.2/2.5/2.6/2.7/2.8/2.9 | 2.4.0/2.5.0/2.6.0/2.7.1 | 3.8-3.12 | CANN 8.3.RC1 or later |
| 26.0.0-alpha.1 | 2.1/2.2/2.5/2.6/2.7/2.8 | 2.4.0/2.5.0/2.6.0/2.7.1 | 3.8-3.11 | CANN 8.3.RC1 or later |
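If you script environment checks, you can encode the compatibility table above as data. The following is a minimal sketch. `SUPPORTED` and `is_supported` are hypothetical helpers that are not part of msProbe. The under-development release is keyed simply as "26.0.0". CANN versions are deliberately not checked, because names such as 8.3.RC1 do not sort as plain numbers.

```python
# Hypothetical helper (not part of msProbe): the version table above encoded as data.
_PT_26 = {"2.1", "2.2", "2.5", "2.6", "2.7", "2.8", "2.9"}
_MS = {"2.4.0", "2.5.0", "2.6.0", "2.7.1"}

SUPPORTED = {
    # "26.0.0" is the release marked "under development" in the table.
    "26.0.0": {"pytorch": _PT_26, "mindspore": _MS, "python": ((3, 8), (3, 12))},
    "26.0.0-alpha.2": {"pytorch": _PT_26, "mindspore": _MS, "python": ((3, 8), (3, 12))},
    "26.0.0-alpha.1": {"pytorch": _PT_26 - {"2.9"}, "mindspore": _MS, "python": ((3, 8), (3, 11))},
}


def is_supported(msprobe_version, python_version, pytorch_version=None, mindspore_version=None):
    """Return True if the given versions appear in the table row for msprobe_version."""
    entry = SUPPORTED.get(msprobe_version)
    if entry is None:
        raise KeyError(f"unknown msProbe version: {msprobe_version}")
    # Compare Python as a (major, minor) tuple so that "3.12.1" counts as 3.12.
    py = tuple(int(part) for part in python_version.split(".")[:2])
    low, high = entry["python"]
    if not low <= py <= high:
        return False
    # PyTorch is listed by major.minor only, so drop any patch or local suffix.
    if pytorch_version is not None:
        if ".".join(pytorch_version.split(".")[:2]) not in entry["pytorch"]:
            return False
    # MindSpore is listed with full version numbers.
    if mindspore_version is not None and mindspore_version not in entry["mindspore"]:
        return False
    return True


print(is_supported("26.0.0-alpha.2", "3.12.1", pytorch_version="2.9.0"))  # True
print(is_supported("26.0.0-alpha.1", "3.12"))  # False: alpha.1 stops at Python 3.11
```

Keeping the table as plain data means a new release needs only one new dictionary entry, not new logic.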

🛠️ Environment Setup

Install msProbe by following the msProbe Installation Guide.

🚀 Quick Start

An executable sample demonstrates the precision data collection and comparison functions of msProbe to help you get started quickly. For details, see Quick Start of msProbe in the PyTorch Scenario or Quick Start of msProbe in the MindSpore Scenario.
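Conceptually, the comparison step computes per-tensor metrics between a benchmark dump and a target dump, such as one taken on an NPU. The sketch below is our own pure-Python illustration and not msProbe's implementation. The function name `compare_tensors` and the exact metric definitions (cosine similarity, maximum absolute error, maximum relative error) are assumptions, and the metrics and thresholds msProbe actually reports may differ.

```python
import math
from typing import Dict, Sequence


def compare_tensors(bench: Sequence[float], target: Sequence[float], eps: float = 1e-12) -> Dict[str, float]:
    """Compare a benchmark tensor with a target tensor, both already flattened.

    Illustrative metrics only; msProbe's own metric definitions may differ.
    """
    if len(bench) != len(target):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(b * t for b, t in zip(bench, target))
    norm_b = math.sqrt(sum(b * b for b in bench))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Cosine similarity: 1.0 means the two tensors point in the same direction.
    cosine = dot / (norm_b * norm_t) if norm_b and norm_t else float("nan")
    abs_errs = [abs(b - t) for b, t in zip(bench, target)]
    # eps guards against division by zero where the benchmark value is 0.
    rel_errs = [e / max(abs(b), eps) for e, b in zip(abs_errs, bench)]
    return {
        "cosine": cosine,
        "max_abs_err": max(abs_errs, default=0.0),
        "max_rel_err": max(rel_errs, default=0.0),
    }


result = compare_tensors([1.0, 2.0, 3.0], [1.0, 2.0, 3.003])
print({name: round(value, 6) for name, value in result.items()})
```

As a rough reading, a cosine similarity close to 1 together with small absolute and relative errors suggests that the two runs agree for that tensor. Cosine similarity alone ignores scale, which is why it is paired here with the error metrics.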

📖 Functions

| Scenario | Sub-mode/Sub-scenario | Function | Description | Reference |
|---|---|---|---|---|
| vLLM inference | Eager mode | Data collection | Collect msProbe precision data. | Data Collection |
| | | Data comparison | Compare the precision data dumped by msProbe to locate precision issues, using either graph comparison in hierarchical visualization or precision comparison. | Graph Comparison in Hierarchical Visualization; Precision Comparison |
| | ACLGraph mode | Data collection | Collect precision data by using the acl_save API. | Data Collection |
| | TorchAir graph mode | Data collection | Collect precision data by using the set_ge_dump_config API. | Data Collection |
| | | Precision comparison | Compare the precision data dumped by msProbe to locate precision issues. | Precision Comparison |
| SGLang inference | Eager mode | Data collection | Collect msProbe precision data. | Data Collection |
| | | Data comparison | Compare the precision data dumped by msProbe to locate precision issues. | Graph Comparison in Hierarchical Visualization; Precision Comparison |
| ATB inference | - | Data collection | Before running an ATB model, load the ATB dump module to collect precision data while the model runs. | Data Collection |
| | | Precision comparison | Compare the precision data dumped by ATB to locate precision issues. | Precision Comparison |
| | | Data conversion | Convert the precision data dumped by ATB into NumPy (.npy) or PyTorch tensor (.pt) files. | Data Conversion |
| Offline model inference | - | Data collection | Collect msProbe precision data. | Data Collection |
| | | Precision comparison | Provide one-click comparison of offline models: simply input a model, with no data collection in advance, and results are generated quickly. | Precision Comparison |
| | | Offline model data precision comparison | Compare the precision of an offline model by using its dump data as input. | Offline Model Data Precision Comparison |
| | | Data conversion | Convert the dump data of an offline model into NumPy (.npy) or PyTorch tensor (.pt) files. | Data Conversion |
| PyTorch training | - | Configuration check before training | Before training or precision comparison, compare the configuration differences between two environments that may affect training precision. | Configuration Check Before Training |
| | | Data collection | Configure the config.json file to collect msProbe precision data. | Data Collection |
| | | Precision pre-check | Scan all APIs in a training model running on Ascend NPUs and provide diagnostic and analytical insights into precision. | Precision Pre-check |
| | | Graph comparison in hierarchical visualization | Parse the precision data dumped by msProbe to restore the model graph structure and compare the precision data of each model layer. | Graph Comparison in Hierarchical Visualization |
| | | Precision comparison | Compare the precision data dumped by msProbe to locate precision issues. | Precision Comparison |
| | | Training status monitoring | Collect and aggregate the intermediate values of network layers, optimizers, and communication operators during model training to help diagnose exceptions in computation, communication, and optimization. | Training Status Monitoring |
| | | Checkpoint comparison | During or after training, compare two checkpoints to evaluate model similarity. | Checkpoint Comparison |
| | | First network overflow/underflow node analysis | In multi-rank scenarios, use dumped data to find the first node where NaN or INF occurs. | First Network Overflow/Underflow Node Analysis |
| | | Trend visualization | Visualize the data collected by msProbe, or the training status monitoring statistics, across iterations, ranks, and tensors. | Trend Visualization |
| MindSpore training | - | Configuration check before training | Before training or precision comparison, compare the configuration differences between two environments that may affect training precision. | Configuration Check Before Training |
| | | Data collection | Configure the config.json file to collect msProbe precision data. | Data Collection |
| | | Precision pre-check | Scan all APIs in a training model running on Ascend NPUs and provide diagnostic and analytical insights into precision. | Precision Pre-check |
| | | Graph comparison in hierarchical visualization | Parse the precision data dumped by msProbe to restore the model graph structure and compare the precision data of each model layer. | Graph Comparison in Hierarchical Visualization |
| | | Precision comparison | Compare the precision data dumped by msProbe to locate precision issues. | Precision Comparison |
| | | Training status monitoring | Collect and aggregate the intermediate values of network layers, optimizers, and communication operators during model training to help diagnose exceptions in computation, communication, and optimization. | Training Status Monitoring |
| | | Overflow/Underflow detection and parsing | Overflow/underflow detection collects precision data from APIs/modules with overflow/underflow issues, and overflow/underflow parsing analyzes this data to determine whether the overflow/underflow is expected. We recommend using the data collection function to collect statistics and detect overflow/underflow problems. | Overflow/Underflow Detection and Parsing; Data Collection |
| | | Checkpoint comparison | During or after training, compare two checkpoints to evaluate model similarity. | Checkpoint Comparison |
| | | Trend visualization | Visualize the data collected by msProbe, or the training status monitoring statistics, across iterations, ranks, and tensors. | Trend Visualization |
| MSAdapter scenario | - | Data collection | Configure the config.json file to collect msProbe precision data. | Data Collection |
| | | Checkpoint comparison | During or after training, compare two checkpoints to evaluate model similarity. | Checkpoint Comparison |
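The first network overflow/underflow node analysis listed above rests on a simple observation. Once NaN or INF appears, it tends to spread to later nodes and, through communication, to other ranks, so the earliest anomalous node is the most useful place to start investigating. The following sketch illustrates only that idea. The `DumpRecord` layout, the helper names, and the assumption that timestamps are comparable across ranks are all hypothetical. They do not describe msProbe's dump format or its actual algorithm.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class DumpRecord:
    # Hypothetical record layout; msProbe's real dump files are structured differently.
    rank: int
    timestamp: float  # assumed to be comparable across ranks
    name: str  # API or module name
    stats: Dict[str, float] = field(default_factory=dict)  # e.g. max/min/mean


def has_nan_or_inf(record: DumpRecord) -> bool:
    """True if any summary statistic is NaN or +/-INF."""
    return any(not math.isfinite(value) for value in record.stats.values())


def first_anomaly(records: Iterable[DumpRecord]) -> Optional[DumpRecord]:
    """Return the earliest record (by timestamp) whose statistics contain NaN or INF."""
    first = None
    for record in records:
        if has_nan_or_inf(record) and (first is None or record.timestamp < first.timestamp):
            first = record
    return first


records = [
    DumpRecord(rank=0, timestamp=3.0, name="layer2.matmul", stats={"max": float("inf"), "min": 0.0}),
    DumpRecord(rank=1, timestamp=2.0, name="layer1.add", stats={"max": float("nan"), "min": -1.0}),
    DumpRecord(rank=1, timestamp=1.0, name="layer0.relu", stats={"max": 1.0, "min": 0.0}),
]
culprit = first_anomaly(records)
print(culprit.rank, culprit.name)  # 1 layer1.add
```

In this toy data, rank 0 reports INF, but rank 1 hit NaN earlier. The sketch therefore points at rank 1's `layer1.add` rather than at the rank where the problem was first noticed.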

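Checkpoint comparison, which the table lists for several training scenarios, can be approximated as a per-parameter similarity score. In the minimal sketch below, a checkpoint is modeled as a plain dictionary that maps parameter names to flattened value lists, and cosine similarity serves as the score. The data layout, the metric choice, and the helper names are our assumptions, not a description of how msProbe compares checkpoints.

```python
import math
from typing import Dict, List, Tuple

Checkpoint = Dict[str, List[float]]  # hypothetical: parameter name -> flattened values


def cosine_similarity(a: List[float], b: List[float]) -> float:
    if len(a) != len(b):
        raise ValueError("parameter shapes differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Undefined for all-zero tensors; report NaN instead of dividing by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else float("nan")


def compare_checkpoints(ckpt_a: Checkpoint, ckpt_b: Checkpoint) -> Tuple[Dict[str, float], List[str], List[str]]:
    """Score shared parameters and list the parameters present in only one checkpoint."""
    shared = sorted(ckpt_a.keys() & ckpt_b.keys())
    scores = {name: cosine_similarity(ckpt_a[name], ckpt_b[name]) for name in shared}
    only_a = sorted(ckpt_a.keys() - ckpt_b.keys())
    only_b = sorted(ckpt_b.keys() - ckpt_a.keys())
    return scores, only_a, only_b


scores, only_a, only_b = compare_checkpoints(
    {"fc.weight": [1.0, 0.0], "fc.bias": [0.5], "head.weight": [1.0]},
    {"fc.weight": [1.0, 0.0], "fc.bias": [-0.5]},
)
print(scores)  # {'fc.bias': -1.0, 'fc.weight': 1.0}
print(only_a, only_b)  # ['head.weight'] []
```

Reporting parameters that exist in only one checkpoint is as useful as the scores, because a missing or renamed layer often explains a large divergence before any numeric comparison is needed.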
📚 Supplementary Materials

💬 FAQs

The FAQs summarize problems that you may encounter when using msProbe.

📝 Additional Information

💬 Suggestions and Feedback

You are welcome to contribute to the community. If you have any questions or suggestions, please submit issues. We will reply as soon as possible. Thank you for your support.

📱 Follow the MindStudio WeChat Account

Scan the QR code to follow us and get the latest updates.

💬 Communication and Support Channels

💡 Join the WeChat group:
Follow the WeChat account and reply "communication group" to obtain the QR code for joining the group.

🛠️ Other channels:

🤝 Acknowledgments

msProbe is jointly developed by the following Huawei departments:

  • Ascend Computing MindStudio Development Department
  • Parallel Distributed Computing Laboratory

Thank you to everyone in the community for your PRs. We warmly welcome contributions to msProbe!