Service Performance Split Tool
Overview
Based on performance data collected by msServiceProfiler, the Service Performance Split Tool breaks down the time consumed in each phase of service batch execution, including batching, input transfer, model execution, and output retrieval. This breakdown helps developers identify performance bottlenecks and optimize the framework.
Concepts
- prefill batch: The prefill phase is the initial stage, in which the model processes the entire input prompt to compute and generate the first output token. The batch processed in this phase is called a prefill batch.
- decode batch: In the decode phase, the model generates the subsequent output tokens one by one. Each iteration in this phase involves less computation than the prefill phase, but the decode phase may consist of many iterations because tokens are generated sequentially. The batch processed in this phase is called a decode batch.
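The difference between the two phases can be illustrated with a toy model. The sketch below is only an illustration of the concept, not how any inference framework schedules work: the function name `schedule_phases` is hypothetical, and it assumes one prefill step over the whole prompt followed by one decode step per remaining output token.

```python
def schedule_phases(prompt_tokens, max_new_tokens):
    """Return the (phase, tokens_processed) steps for a single request.

    Toy assumption: the prefill step consumes the whole prompt and yields
    the first output token; every later output token needs one decode step.
    """
    if max_new_tokens < 1:
        return []
    steps = [("prefill", len(prompt_tokens))]
    steps += [("decode", 1) for _ in range(max_new_tokens - 1)]
    return steps


steps = schedule_phases(["The", "quick", "brown", "fox"], max_new_tokens=4)
for phase, tokens in steps:
    print(f"{phase}: {tokens} token(s) processed")
```

In this toy model, a request that produces four output tokens has one heavy prefill step and three light decode steps, which is why the two phases are broken down separately.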
Product Support
Note
For details about Ascend product models, see Ascend Product Models.
| Product Type | Supported (Yes/No) |
|---|---|
| Atlas A3 training products and Atlas A3 inference products | Yes |
| Atlas A2 training products and Atlas A2 inference products | Yes |
| Atlas 200I/500 A2 inference products | Yes |
| Atlas inference products | Yes |
| Atlas training products | No |
Note
For Atlas A2 training products/Atlas A2 inference products, only the Atlas 800I A2 inference server is supported. For Atlas inference products, only the Atlas 300I Duo inference card and Atlas 800 inference server (model 3000) are supported.
Preparations
Environment Setup
Install msServiceProfiler.
Version Mapping
| Service Performance Split Tool | CANN | MindIE |
|---|---|---|
| Dependency Version | ≥ CANN 8.2.RC1 | ≥ MindIE 2.1.RC1 |
Function Description
Function
Performs a fine-grained breakdown of service profile data.
Precaution
None
Syntax
msserviceprofiler split
--input-path /path/to/input
[--output-path /path/to/output]
[--log-level level]
[--prefill-number prefill_number]
[--decode-number decode_number]
{
--prefill-batch-size prefill_batch_size
--prefill-rid prefill_rid
--decode-batch-size decode_batch_size
--decode-rid decode_rid
}
Optional fields are enclosed in square brackets ([]), and mandatory fields are enclosed in braces ({}).
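When the tool is driven from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only the flags listed in the syntax above; the helper name `build_split_command` and its keyword arguments are hypothetical and are not part of msServiceProfiler.

```python
import shlex


def build_split_command(input_path, output_path=None, log_level=None,
                        prefill_batch_size=None, prefill_number=None,
                        prefill_rid=None, decode_batch_size=None,
                        decode_number=None, decode_rid=None):
    """Build the argv list for `msserviceprofiler split` (hypothetical helper)."""
    argv = ["msserviceprofiler", "split", "--input-path", str(input_path)]
    optional = [
        ("--output-path", output_path),
        ("--log-level", log_level),
        ("--prefill-batch-size", prefill_batch_size),
        ("--prefill-number", prefill_number),
        ("--prefill-rid", prefill_rid),
        ("--decode-batch-size", decode_batch_size),
        ("--decode-number", decode_number),
        ("--decode-rid", decode_rid),
    ]
    # Only emit the flags the caller actually set, so the tool's own
    # defaults apply to everything else.
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv


cmd = build_split_command("/path/to/input", output_path="/path/to/output/",
                          prefill_batch_size=1, prefill_number=100)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where msServiceProfiler is installed. Building a list instead of a single string avoids shell quoting problems with paths that contain spaces.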
Parameter Description
| Parameter | Description | Mandatory (Yes/No) |
|---|---|---|
| --input-path | Specifies the path to the profile data. | Yes |
| --output-path | Specifies the output directory where the breakdown result files will be saved. It defaults to the output folder in the current directory. | No |
| --log-level | Sets the log level. The options are debug, info, warning, error, fatal, and critical. It defaults to info. | No |
| --prefill-batch-size | Specifies the batch_size value for prefill batch breakdown. This value can be obtained from the batch_size field in batch.csv. It defaults to 0 (to disable prefill performance breakdown). | No |
| --prefill-number | Specifies the number of prefill batches to break down. It defaults to 1 and is used to calculate the max, min, average, and standard deviation of execution time. | No |
| --prefill-rid | Specifies the request ID for prefill batch breakdown. This value can be obtained from the http_rid field in request.csv. It defaults to -1 (to disable prefill performance breakdown). | No |
| --decode-batch-size | Specifies the batch_size value for decode batch breakdown. This value can be obtained from the batch_size field in batch.csv. It defaults to 0 (to disable decode performance breakdown). | No |
| --decode-number | Specifies the number of decode batches to break down. It defaults to 1 and is used to calculate the max, min, average, and standard deviation of execution time. | No |
| --decode-rid | Specifies the request ID for decode batch breakdown. This value can be obtained from the http_rid field in request.csv. It defaults to -1 (to disable decode performance breakdown). | No |
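Because `--prefill-batch-size` and `--decode-batch-size` take their values from the batch_size field in batch.csv, it can help to see which batch sizes occur in the profile data before choosing one. The sketch below counts them with the Python standard library. It assumes only that batch.csv has a `batch_size` column; the sample rows are made up for illustration, and the sketch does not distinguish prefill batches from decode batches.

```python
import csv
import io
from collections import Counter


def count_batch_sizes(csv_file):
    """Count how often each batch_size value appears in a batch.csv-like file."""
    reader = csv.DictReader(csv_file)
    return Counter(
        int(float(row["batch_size"]))  # tolerate values written as "10.0"
        for row in reader
        if row.get("batch_size")       # skip rows with an empty field
    )


# Hypothetical sample data; a real batch.csv has additional columns.
sample = io.StringIO("batch_size\n1\n10\n10\n1\n10\n")
counts = count_batch_sizes(sample)
for size, n in counts.most_common():
    print(f"batch_size={size}: {n} batches")
```

To run this against real data, replace the `io.StringIO` sample with `open("batch.csv", newline="")`.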
Examples
- Scenario 1: Specify the `batch_size` value for breakdown.
  - To break down 100 `prefill batch` data records with `batch_size` set to `1`:
    `msserviceprofiler split --input-path=/path/to/input --output-path=/path/to/output/ --prefill-batch-size=1 --prefill-number=100`
    After the execution is complete, the output file `prefill.csv` is generated in the result path.
  - To break down 50 `decode batch` data records with `batch_size` set to `10`:
    `msserviceprofiler split --input-path=/path/to/input --output-path=/path/to/output/ --decode-batch-size=10 --decode-number=50`
    After the execution is complete, the output file `decode.csv` is generated in the result path.
- Scenario 2: Specify the `rid` value for breakdown.
  - To break down data for the prefill phase:
    `msserviceprofiler split --input-path=/path/to/input --output-path=/path/to/output/ --prefill-rid=efcas2d`
    After the execution is complete, the output file `prefill.csv` is generated in the result path.
  - To break down data for the decode phase:
    `msserviceprofiler split --input-path=/path/to/input --output-path=/path/to/output/ --decode-rid=efcas2d`
    After the execution is complete, the output file `decode.csv` is generated in the result path.
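When several batches are broken down at once (for example, `--prefill-number=100`), the tool reports the max, min, average, and standard deviation of execution time. The sketch below shows how such a summary can be computed over a list of durations. The durations are made-up values, and the use of the population standard deviation is an assumption: this document does not state which variant the tool uses.

```python
import statistics


def summarize(durations_ms):
    """Summarize execution times (in ms) across several batches."""
    return {
        "max": max(durations_ms),
        "min": min(durations_ms),
        "mean": statistics.fmean(durations_ms),
        # Assumption: population standard deviation. Swap in
        # statistics.stdev for the sample standard deviation.
        "std": statistics.pstdev(durations_ms),
    }


# Hypothetical execution times of one event across four batches.
summary = summarize([12.0, 15.0, 11.0, 14.0])
for key, value in summary.items():
    print(f"{key}: {value:.3f} ms")
```

A large standard deviation relative to the mean suggests that the event's cost varies strongly between batches, so a single batch may not be representative.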
Output Description
- prefill.csv

  | Field | Description |
  |---|---|
  | name | Labels a batch event. |
  | during_time(ms) | Execution time of the current batch event, in ms |
  | max | Maximum event execution time, in ms |
  | min | Minimum event execution time, in ms |
  | mean | Average event execution time, in ms |
  | std | Standard deviation of the event execution time, in ms |
  | pid | Process ID of the event |
  | tid | Thread ID of the event |
  | start_time(ms) | Start time of the current batch event, displayed as a timestamp, in ms |
  | end_time(ms) | End time of the current batch event, displayed as a timestamp, in ms |
  | rid | Request ID. |

- decode.csv

  The format is the same as that of prefill.csv, except that decode.csv does not contain the rid column.

Mapping Between Domains and Parsed Results
| Parsed Result | Collection Domain |
|---|---|
| prefill.csv | "Request; BatchSchedule; ModelExecute" |
| decode.csv | "BatchSchedule; ModelExecute" |
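Once prefill.csv or decode.csv has been generated, sorting the events by mean execution time is a quick way to find the most expensive phase. The sketch below does this with the Python standard library. It assumes the CSV header matches the field names listed above; the event names and all values in the sample are hypothetical placeholders, not real tool output.

```python
import csv
import io

# Hypothetical sample in the prefill.csv layout described above.
SAMPLE = """name,during_time(ms),max,min,mean,std,pid,tid,start_time(ms),end_time(ms),rid
event_a,2.0,3.0,1.0,2.0,0.5,100,200,1000.0,1002.0,efcas2d
event_b,40.0,55.0,35.0,42.0,4.0,100,201,1002.0,1042.0,efcas2d
event_c,5.0,6.0,4.0,5.0,0.4,100,200,1042.0,1047.0,efcas2d
"""


def rank_by_mean(csv_file):
    """Return the rows sorted by the mean column, slowest event first."""
    rows = list(csv.DictReader(csv_file))
    return sorted(rows, key=lambda row: float(row["mean"]), reverse=True)


ranked = rank_by_mean(io.StringIO(SAMPLE))
for row in ranked:
    print(f'{row["name"]}: mean={row["mean"]} ms, std={row["std"]} ms')
```

The same function works for decode.csv, because it reads only the `name`, `mean`, and `std` columns and does not depend on `rid`. For real data, pass `open("prefill.csv", newline="")` instead of the in-memory sample.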