npu-monitor
Overview
The npu-monitor tool is a lightweight, resident background tool that monitors the time consumption of key operators.
Preparations
The npu-monitor tool is enabled by running the npu-monitor subcommand in dyno CLI.
dyno --certs-dir <CERT_DIR> npu-monitor [SUBCOMMANDS]
Constraints
- The value of
--certs-dirmust be the same in dyno and dynolog. CERT_DIRcan be set to the certificate path. If the TLS certificate key is not used, setCERT_DIRtoNO_CERTS.
npu-monitor Functions
Function
Enables npu-monitor performance monitoring.
Syntax
dyno npu-monitor [SUBCOMMANDS] --help
The following table lists the subcommands of npu-monitor.
Subcommand Description
| Subcommand | Type | Description | Supported by PyTorch (Y/N) | Supported by MindSpore (Y/N) | Mandatory (Y/N) |
|---|---|---|---|---|---|
| --npu-monitor-start | action | Enables performance monitoring. This feature is disabled by default and only takes effect after configuration. | Y | Y | N |
| --npu-monitor-stop | action | Stops performance monitoring. This feature is disabled by default and only takes effect after configuration. | Y | Y | N |
| --report-interval-s | u32 | Indicates the interval for reporting profile data, in seconds. This subcommand needs to be set during startup. The default value is 60. |
Y | Y | N |
| --duration | f32 | Indicates the duration for collecting profile data, in seconds. This subcommand needs to be set during startup. The default value is 0.0, indicating that the collection duration is not limited. This subcommand can be set only when npu_monitor is started and cannot be modified during running. |
Y | Y | N |
| --mspti-activity-kind | String | Indicates the type of profile data to be reported. You can set one or more types. Multiple types must be separated by commas (,). The type is updated globally each time you set this subcommand. The value range is [Marker, Kernel, API, Hccl, Memory, MemSet, MemCpy, Communication, AclAPI, NodeAPI, 'RuntimeAPI']. The default value is Marker. |
Y | Y | N |
| --log-file | String | Indicates the path for outputting profile data to drives. Currently, only the data of the following types can be exported: Marker, Kernel, API, Communication, AclAPI, NodeAPI, and RuntimeAPI. The output data format can be DB or JSONL (for details, see the description of export-type). The default value is empty, indicating that data is not output to drives. |
Y | Y | N |
| --export-type | String | Indicates the format of the profile data to be output to drives. This parameter is valid only when log-file is set. The value range is [DB, Jsonl]. The default value is DB. This subcommand can be set only when npu_monitor is started and cannot be modified during running.1. DB: The output data is in DB format and the file name is msmonitor_{process_id}_{timestamp}_{rank_id}.db. For details about the content, see description of data exported from msprof in DB Format. MindStudio Insight can be used to visualize the data. (This tool does not support the display of msmonitor.db data collected in the single-process multi-device scenario.)2. Jsonl: The output data is in JSONL format. The file name is msmonitor_{process_id}_{timestamp}_{rank_id}.jsonl. Each line in the JSONL file contains a complete profile data record in JSON format. You can set the following environment variables to adjust the output process:MSMONITOR_JSONL_BUFFER_CAPACITY: size of the ring buffer for outputting data to drives. The value must be a power of 2 (2n2^{n}). The default value is 524288 (2192^{19}). The value range is [8192, 2097152] (2132^{13} to 2212^{21}).MSMONITOR_JSONL_MAX_DUMP_INTERVAL: maximum interval for outputting data to drives, in milliseconds. If the interval between the current time and the last time data is output to drives exceeds the threshold, data output is automatically triggered. The default value is 30000 ms, and the minimum value is 1000 ms.MSMONITOR_JSONL_ROTATE_LOG_LINES: maximum number of JSON data records in a single JSONL file. If the number is exceeded, a new file is created and output to drives. The value ranges from [100, 500000]. The default value is 10000.MSMONITOR_JSONL_ROTATE_LOG_FILES: maximum number of JSONL files that can be output to drives at a time. If the number is exceeded, the earliest files are deleted. The default value is -1 (this function is disabled). The minimum value is 2. |
Y | Y | N |
| --filter | String | Filters profile data by the name of the data to be collected. Different data types are separated by semicolons (😉, different data names are separated by commas (,), and the data type and name are separated by colons (😃. Fuzzy match is supported. You only need to set the keywords. If the value contains semicolons (😉, enclose the entire value with double quotation marks (""). Example: --filter "<activity_kind>:<data>[,<data>][;<activity_kind>:<data>[,<data>]]". The options of activity_kind are as follows: Marker, Kernel, API, Communication, AclAPI, NodeAPI, and RuntimeAPI. By default, all data is retained. |
Y | Y | N |
-
Start the dynolog daemon process. For details, see dynolog.
# Enable the dynolog daemon in CLI mode. dynolog --enable-ipc-monitor --certs-dir /home/server_certs # To use TensorBoard to display data, pass --metric_log_dir to specify the output path of the TensorBoard file. # Example: dynolog --enable-ipc-monitor --certs-dir /home/server_certs --metric_log_dir /tmp/metric_log_dir -
Configure the dynolog environment variable.
export MSMONITOR_USE_DAEMON=1 -
(Optional) Configure the msMonitor log path. The default path is
msmonitor_login the current directory.export MSMONITOR_LOG_PATH=<LOG PATH> # Example: export MSMONITOR_LOG_PATH=/tmp/msmonitor_log -
Set
LD_PRELOADto enable MSPTI.export LD_PRELOAD=<CANN_Toolkit_installation_path>/cann/lib64/libmspti.so # Example: export LD_PRELOAD=/usr/local/Ascend/cann/lib64/libmspti.so -
Start a training or inference job.
# The PyTorch optimizer or native optimizer is required in the training job. bash train.sh -
Use the dyno CLI to start npu-monitor.
# Example 1: Enable performance monitoring and use the default configuration. dyno --certs-dir /home/client_certs npu-monitor --npu-monitor-start # Example 2: Pause performance monitoring. dyno --certs-dir /home/client_certs npu-monitor --npu-monitor-stop # Example 3: Modify the configuration during performance monitoring. # Set the reporting period to 30s and the reported data type to Marker and Kernel, and retain only data of the Kernel type whose operator name contains the keyword "Mul". dyno --certs-dir /home/client_certs npu-monitor --report-interval-s 30 --mspti-activity-kind Marker,Kernel --filter Kernel:Mul # Example 4: Modify the configuration when performance monitoring is enabled. # Set the reporting period to 30s and the reported data type to Marker and Kernel, and retain only data of the Kernel type whose operator name contains the keyword "Mul". dyno --certs-dir /home/client_certs npu-monitor --npu-monitor-start --report-interval-s 30 --mspti-activity-kind Marker,Kernel --filter Kernel:Mul # Example 5: Modify the configuration when performance monitoring is enabled and enable data output. # The data output path is /tmp/msmonitor_db, the output period is 30s, and the collected data types are Marker, Kernel, and Communication. dyno --certs-dir /home/client_certs npu-monitor --npu-monitor-start --report-interval-s 30 --mspti-activity-kind Marker,Kernel,Communication --log-file /tmp/msmonitor_db # Example 6: Modify the configuration when performance monitoring is enabled in the multi-server scenario. # In the multi-server scenario, send parameter information to a specific server x.x.x.x. The parameters indicate that the reporting period is 30s and the reported data type is Marker and Kernel. dyno --certs-dir /home/client_certs --hostname x.x.x.x npu-monitor --npu-monitor-start --report-interval-s 30 --mspti-activity-kind Marker,Kernel
Output File Description
Observe the data reported by TensorBoard.
# Ensure that TensorBoard is installed.
pip install tensorboard
# Run TensorBoard.
tensorboard --logdir=<metric_log_dir> # metric_log_dir is the path specified by --metric_log_dir in the dynolog command line.
# Access http://localhost:6006 from the browser to view the corresponding visualization chart. localhost indicates the IP address of the server, and 6006 indicates the default port of TensorBoard.
For details about TensorBoard, visit https://github.com/tensorflow/tensorboard.