MindStudio Monitor
Overview
MindStudio Monitor (msMonitor) is a one-stop online monitoring tool that provides end-to-end performance monitoring and fault locating capabilities in cluster scenarios.
Built on dynolog, msMonitor provides the nputrace and npu-monitor functions by using the dynamic collection capabilities of the AI framework profilers (Ascend PyTorch Profiler and MindSpore Profiler) and of MSPTI.
- npu-monitor: a lightweight, resident background tool that monitors the time consumption of key operators.
- nputrace: obtains detailed profile data of the framework, CANN, and devices.

As shown in the preceding figure, msMonitor consists of three parts:
- Dynolog daemon: Each node has only one dynolog daemon process, which is responsible for receiving RPC requests from the dyno CLI, triggering the nputrace and npu-monitor functions, processing reported data, and displaying final data. For details about dynolog, see dynolog.
- Dyno CLI: It is a dyno client that provides nputrace and npu-monitor subcommands for users. It can be installed on any node. For details about dyno, see dyno.
- MSPTI Monitor: It is a monitoring submodule implemented based on MSPTI. It obtains profile data by calling the MSPTI APIs and reports the data to the dynolog daemon.
Directory Structure
The key directories are as follows. For details, see Project Directory.
├── docs # Project document directory
│ └── en # English document directory
├── dynolog_npu # Code directory of the dynolog_npu module
├── plugin # Code directory of the plugin module
├── scripts # Directory for storing build and test scripts
│ ├── build.sh # dynolog_npu build script
│ ├── run_st.sh # System test script
│ └── run_ut.sh # Unit test script
├── test # Test code
│ ├── st # System test cases
│ └── ut # Unit test cases
├── third_party # Third-party dependency library
└── README.md # Project description document
Version Description
msMonitor consists of three files, as shown in the following table.
The dyno and dynolog files can be packaged into DEB or RPM packages. Currently, msMonitor can run on the PyTorch and MindSpore frameworks. For details about the latest software packages, see Version Mapping.
| File Name | Description |
|---|---|
| dyno | Binary files of the dyno client |
| dynolog | Binary files of the dynolog server |
| mindstudio_monitor-{mindstudio_version}-cp{python_version}-cp{python_version}-linux_{system_architecture}.whl | Common capability tool packages such as MSPTI Monitor and IPC. {mindstudio_version} indicates the MindStudio version, {python_version} indicates the Python version, and {system_architecture} indicates the CPU architecture. |
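The placeholders in the .whl file name can be filled in mechanically. The following is a minimal sketch, not part of msMonitor: the `wheel_name` helper is hypothetical, and it assumes that {mindstudio_version} is the msMonitor version from the version mapping table (for example, 8.3.0), that {python_version} is the interpreter's major and minor version without a dot (for example, 310), and that {system_architecture} matches the output of `uname -m`. Check the result against the files actually contained in the downloaded package.

```shell
# Hypothetical helper (not part of msMonitor): print the wheel file name
# for MindStudio version $1, Python tag digits $2 (e.g. 310), and CPU
# architecture $3 (e.g. aarch64).
wheel_name() {
    printf 'mindstudio_monitor-%s-cp%s-cp%s-linux_%s.whl\n' "$1" "$2" "$2" "$3"
}

# Assumption: the tag digits are the interpreter's major and minor version,
# and the architecture string matches the output of `uname -m`.
if command -v python3 >/dev/null 2>&1; then
    py_tag=$(python3 -c 'import sys; print("%d%d" % sys.version_info[:2])')
    wheel_name 8.3.0 "$py_tag" "$(uname -m)"
fi
```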
The following table lists the msMonitor version mapping and software package download links.
| msMonitor Version | Release Date | CANN Version | torch_npu Version | MindSpore Version | Download Link | Checksum |
|---|---|---|---|---|---|---|
| 8.3.0(aarch64) | 2025-12-29 | 8.3.RC1 or later | v7.3.0 or later | 2.7.2 or later | aarch64_8.3.0.zip | 2c675ae346dfc1c70f5e9c7103d6f8c7e53be00dca28ed5f9cc577ac59e4bc44 |
| 8.3.0(x86) | 2025-12-29 | 8.3.RC1 or later | v7.3.0 or later | 2.7.2 or later | x86_8.3.0.zip | 1a38cc141e67c50eb09ebdc757c1fd3ed54439f227459e71292b2d18bb78e7f0 |
| 8.1.0(aarch64) | 2025-07-11 | 8.1.RC1 or later | v7.1.0 or later | 2.7.0-rc1 or later | aarch64_8.1.0.zip | ce136120c0288291cc0a7803b1efc8c8416c6105e9d54c17ccf2e2510869fada |
| 8.1.0(x86) | 2025-07-11 | 8.1.RC1 or later | v7.1.0 or later | 2.7.0-rc1 or later | x86_8.1.0.zip | 097d11c7994793b6389b19259269ceb3b6b7ac5ed77da3949b3f09da2103b7f2 |
Environment Setup
msMonitor can be installed from the software package or built from source. For details, see msMonitor Installation Guide.
Installing from the software package is recommended. The procedure is as follows:
- Select the software package based on Version Mapping and download it to the Linux installation environment.
- Verify the package integrity.
  Go to the directory containing the .zip package and run the following command:
      sha256sum {name}.zip
  {name} indicates the name of the .zip package.
  If the checksum in the command output matches the one listed for the corresponding .zip package version, the correct installation package has been downloaded. Example:
      2c675ae346dfc1c70f5e9c7103d6f8c7e53be00dca28ed5f9cc577ac59e4bc44 aarch64_8.3.0.zip
- Install the .whl package.
      # Decompress the package.
      mkdir x86
      unzip x86_8.3.0.zip -d x86
      # Go to the directory of the decompressed installation package.
      cd x86
      # Install the .whl package that matches the Python version in the current environment.
      pip install mindstudio_monitor-{mindstudio_version}-cp{python_version}-cp{python_version}-linux_{system_architecture}.whl
  If the installation is successful, the following information is displayed:
      Successfully installed mindstudio_monitor-<version> pybind11-<version>
- Install dynolog.
  Select an installation method based on the server OS.
  - Method 1: Use the .deb software package (applicable to Debian/Ubuntu).
        dpkg -i --force-overwrite dynolog*.deb
  - Method 2: Use the .rpm software package (applicable to Red Hat/Fedora/openSUSE).
        rpm -ivh dynolog*.rpm --nodeps
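The package integrity check in the procedure above can be scripted so that a mismatch is reported before anything is installed. The following is a minimal sketch, assuming that GNU coreutils `sha256sum` is available; the `verify_sha256` helper is hypothetical and not part of msMonitor. The package name and checksum are taken from the version mapping table.

```shell
# Hypothetical helper (not part of msMonitor): succeeds only if the
# SHA-256 digest of file $1 equals the expected hex digest $2.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Check the downloaded package, if it is present in the current directory.
pkg=aarch64_8.3.0.zip
expected=2c675ae346dfc1c70f5e9c7103d6f8c7e53be00dca28ed5f9cc577ac59e4bc44
if [ -f "$pkg" ]; then
    if verify_sha256 "$pkg" "$expected"; then
        echo "checksum OK: $pkg"
    else
        echo "checksum MISMATCH: $pkg" >&2
    fi
fi
```

Comparing only the digest field avoids depending on the file name that `sha256sum` prints after it.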
Quick Start
For details about the npu-monitor and nputrace functions, see Feature Description. The following describes the common application scenarios of msMonitor:
- Use the npu-monitor function to obtain the time consumed by key operators.
- If the time consumed by key operators deteriorates, use the nputrace function to collect detailed profile data for analysis.
Procedure
- Start the dynolog daemon process.
  Example commands:
      # Enable the dynolog daemon in CLI mode.
      dynolog --enable-ipc-monitor --certs-dir /home/server_certs
      # To use TensorBoard to display data, pass --metric_log_dir to specify the output path of the TensorBoard file.
      dynolog --enable-ipc-monitor --certs-dir /home/server_certs --metric_log_dir /tmp/metric_log_dir
      # The log path of dynolog daemon is /var/log/dynolog.log.
- Configure the msMonitor environment variable.
      export MSMONITOR_USE_DAEMON=1
- Set LD_PRELOAD to start MSPTI (required to start npu-monitor).
      # Example of the default path: export LD_PRELOAD=/usr/local/Ascend/ascend-toolkit/latest/lib64/libmspti.so
      export LD_PRELOAD=<CANN_toolkit_installation_path>/ascend-toolkit/latest/lib64/libmspti.so
- Start a training or inference job.
      bash run_ai_task.sh
- Run the dyno command to trigger npu-monitor to monitor the time consumed by key operators.
      # Enable npu-monitor, set the reporting period to 30s, and set the reported data type to Kernel.
      dyno --certs-dir /home/client_certs npu-monitor --npu-monitor-start --report-interval-s 30 --mspti-activity-kind Kernel
      # Disable npu-monitor.
      dyno --certs-dir /home/client_certs npu-monitor --npu-monitor-stop
- Run the dyno command to trigger nputrace to collect detailed trace data. (nputrace can be triggered only after npu-monitor is disabled.)
      # Collect data of two steps starting from the 10th step, including the framework, CANN, and device data. After the collection is complete, the data is automatically analyzed and not simplified. The output path is /tmp/profile_data.
      dyno --certs-dir /home/client_certs nputrace --start-step 10 --iterations 2 --activities CPU,NPU --analyse --data-simplification false --log-file /tmp/profile_data
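Because nputrace can be triggered only after npu-monitor is disabled, the last two steps are often run back to back. The following is a minimal sketch of a wrapper that does this, using only the dyno options shown above. The `switch_to_nputrace` function and the `CERTS_DIR` variable are hypothetical and not part of msMonitor. The sketch also assumes that dyno returns a non-zero exit status when a request fails outright; this is only a coarse check, and the JSON response should still be inspected.

```shell
# Hypothetical wrapper (not part of msMonitor). CERTS_DIR is an assumed
# variable holding the client certificate directory.
CERTS_DIR=${CERTS_DIR:-/home/client_certs}

# Stop npu-monitor, then request an nputrace collection.
# $1 = start step, $2 = number of iterations, $3 = output path.
switch_to_nputrace() {
    dyno --certs-dir "$CERTS_DIR" npu-monitor --npu-monitor-stop || return 1
    dyno --certs-dir "$CERTS_DIR" nputrace \
        --start-step "$1" --iterations "$2" \
        --activities CPU,NPU --analyse \
        --data-simplification false --log-file "$3"
}

# Example (same parameters as the command above):
#   switch_to_nputrace 10 2 /tmp/profile_data
```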
Feature Description
NOTE
Due to low-level resource restrictions, the npu-monitor and nputrace functions cannot be enabled at the same time.
After the nputrace or npu-monitor command is executed, a JSON response string is displayed. The commandStatus field in the string indicates whether the command takes effect: effective indicates that the command takes effect, and ineffective indicates that the command is invalid. Other fields are native dynolog fields (available only when the status is effective).
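A script can test the commandStatus field before it continues. The following is a minimal sketch, not part of msMonitor: the `command_took_effect` helper is hypothetical, and the exact layout of the response (whitespace, field order, any surrounding text) is an assumption, so the helper strips whitespace and matches only the one field described above. Where a JSON parser is available, parsing the response is more robust than pattern matching.

```shell
# Hypothetical helper (not part of msMonitor): succeeds if the JSON
# response in $1 reports commandStatus "effective".
command_took_effect() {
    # Remove spaces, tabs, and newlines so the match ignores formatting.
    compact=$(printf '%s' "$1" | tr -d ' \t\n')
    case "$compact" in
        *'"commandStatus":"effective"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, assuming the response text has been captured in $response:
#   command_took_effect "$response" || echo "command did not take effect" >&2
```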
Status Query
    dyno --certs-dir <CERT_DIR> status
    # The value of --certs-dir must be the same in dyno and dynolog. <CERT_DIR> can be set to the certificate path. If the TLS certificate key is not used, set this parameter to NO_CERTS.
After the preceding command is executed, a JSON string is printed, for example, {"current_step":1,"npumonitor":"Idle","nputrace":"Ready","start_step":5,"stop_step":10}.
State Description
- Uninitialized: The program has not started or has not yet reached dynolog init.
- Idle: No command is delivered.
- Ready: The command has been delivered but the specified step is not reached.
- Running: Data is being collected.
Additional notes:
- The nputrace field has four states: Uninitialized, Idle, Running, and Ready. The npumonitor field has three states: Uninitialized, Idle, and Running.
- start_step and stop_step indicate the range of collection steps. In the PyTorch framework, the valid collection range is [start_step, stop_step), that is, start_step is included but stop_step is not included. In the MindSpore framework, the valid collection range is [start_step, stop_step], that is, stop_step is included.
- start_step and stop_step are printed only when nputrace is in the Running or Ready state.
- The default value of current_step is -1.
- In the MindSpore framework, nputrace does not have the Ready state.
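The two frameworks interpret stop_step differently, so the last collected step has to be derived per framework. The following is a minimal sketch that uses the status example above. The `json_int` and `last_collected_step` helpers are hypothetical and not part of msMonitor, and the sketch assumes the compact single-line JSON formatting shown in that example. Where a JSON parser is available, it is a more robust choice than sed.

```shell
# Hypothetical helpers (not part of msMonitor).

# Print the integer value of key $2 in the single-line JSON string $1.
# Assumes the compact formatting shown in the status example.
json_int() {
    printf '%s\n' "$1" | sed -n 's/.*"'"$2"'":\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Print the last step that is actually collected.
# $1 = framework (pytorch or mindspore), $2 = status JSON.
last_collected_step() {
    stop=$(json_int "$2" stop_step)
    # stop_step is printed only in the Running or Ready state.
    [ -n "$stop" ] || return 1
    case "$1" in
        pytorch)   echo $(( $stop - 1 )) ;;  # [start_step, stop_step)
        mindspore) echo "$stop" ;;           # [start_step, stop_step]
        *) echo "unknown framework: $1" >&2; return 1 ;;
    esac
}

status='{"current_step":1,"npumonitor":"Idle","nputrace":"Ready","start_step":5,"stop_step":10}'
echo "PyTorch collects steps $(json_int "$status" start_step)..$(last_collected_step pytorch "$status")"
echo "MindSpore collects steps $(json_int "$status" start_step)..$(last_collected_step mindspore "$status")"
```

For the example status, the PyTorch range ends at step 9 and the MindSpore range ends at step 10.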
npu-monitor
The npu-monitor feature provides lightweight monitoring of key metrics. It is built on MSPTI and lets users view the execution time of compute and communication operators while a model is running. For details about how to use npu-monitor, see npu-monitor. For details about how to use msMonitor in the MindSpore framework, see Using msMonitor in MindSpore.
nputrace
The nputrace feature lets users dynamically trigger the AI framework profilers (Ascend PyTorch Profiler and MindSpore Profiler) to collect and parse data. That is, after a model is started, the profiler can be triggered multiple times with different configurations to collect and parse data, without interrupting the running model. The collected profile data can be visualized using MindStudio Insight, as shown in the following figure.
For details, see nputrace. For details about how to use msMonitor in the MindSpore framework, see Using msMonitor in MindSpore.

Monitor
The Monitor feature provides easy-to-use APIs to collect profile data of compute operators, communication operators, APIs, Runtime APIs, and MSTX. You can select the metrics to be collected as needed. For details, see Monitor.
API Reference
The API reference of the mindstudio_monitor module covers the following feature APIs:
- APIs for interacting with the dynolog component. For details, see "PyDynamicMonitorProxy APIs".
- Monitor feature APIs. For details, see "Monitor Feature APIs".
FAQs
FAQs summarize the problems that may occur when you use msMonitor. For details, see FAQs.
How to Contribute
For instructions on reporting issues, requesting features, and contributing code to msMonitor, see Contributing to MindStudio Monitor.
Contact Us
Security Statement
For details about the security hardening information and public network address information of MindStudio Monitor, see Security Statement.
Disclaimer
- This tool is intended solely for debugging and development. Users are responsible for any risks and should carefully review the following information:
- Data processing and deletion: Users are responsible for managing and deleting any data generated while using this tool. Users are advised to delete such data promptly after use to prevent information leakage.
- Data confidentiality and transmission: Users understand and agree not to share or transmit any data generated by this tool. Neither the tool nor its developers are responsible for any information leaks, data breaches, or other negative consequences.
- User input security: Users are responsible for the security of any commands they enter and for any risks or losses resulting from improper input. The tool and its developers are not liable for issues caused by incorrect command usage.
- Disclaimer scope: This disclaimer applies to all individuals and entities using this tool. By using the tool, you acknowledge and accept this statement and assume all risks and responsibilities arising from its use. If you do not agree, please stop using the tool immediately.
- Before using this tool, please read and understand the preceding disclaimer. If you have any questions, contact the developer.
License
For details about the license of MindStudio Monitor, see LICENSE.
Documents in the docs directory of msMonitor are licensed under CC-BY 4.0. For details, see LICENSE.
Acknowledgments
msMonitor is jointly developed by the following Huawei departments:
Huawei:
- Ascend Computing MindStudio Development Department
Thank you to everyone in the community for your PRs. We warmly welcome contributions to msMonitor.