Service Profiling Advisor
Overview
- Service Profiling Advisor, built on the MindIE Service framework, provides one-click parameter tuning recommendations based on benchmark output to help users optimize performance. It analyzes the `instance` output of the benchmark, the MindIE Service `config.json` configuration, NPU memory utilization, and the model deployment, and then recommends values for `config.json` parameters such as `maxBatchSize` and `maxPrefillBatchSize`. These recommendations aim to improve performance metrics such as Time to First Token (TTFT) and throughput.
- Note: Because hardware (such as CPU and memory), network environments, and model parameter configurations vary, the recommended values are not guaranteed to improve performance. Verify them by applying the changes and measuring the result.
Supported Products
Note: For details about Ascend product models, see Ascend Product Models.
| Product Type | Supported (Yes/No) |
|---|---|
| Atlas A3 Training Products and Atlas A3 Inference Products | Yes |
| Atlas A2 Training Series Product | Yes |
| Atlas 200I/500 A2 inference products | Yes |
| Atlas Inference Products | Yes |
| Atlas training products | No |
Before You Start
Environment Setup
- Prepare an Ascend Atlas 800I A2 training server with NPUs.
- Prepare the Python environment: Python 3.10 or later is required.
- Install dependencies.
pip install scipy loguru pandas psutil # Install necessary dependencies.
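The environment requirements above can be checked before installing the tool. The following is a minimal sketch, not part of Service Profiling Advisor; the helper names `python_is_supported` and `missing_modules` are hypothetical, and the module list simply mirrors the `pip install` command above.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 10)
REQUIRED_MODULES = ("scipy", "loguru", "pandas", "psutil")


def python_is_supported(version_info=None, minimum=REQUIRED_PYTHON):
    """Return True if the interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing modules:", missing_modules())
```

An empty list from `missing_modules()` means that all four dependencies are importable in the current environment.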
Data Preparation
- Ensure that the MindIE benchmark produces the expected output. Place the generated `instance` folder under `/your/path/instance/`.
- Ensure that the MindIE Service `config.json` file is correctly configured. This file is typically located under `/usr/local/Ascend/mindie/latest/mindie-service/conf/`.
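Both inputs can be sanity-checked before running the advisor. The sketch below is an illustration only: `check_inputs` is a hypothetical helper, and it verifies only that the `instance` folder exists and is non-empty and that `config.json` parses as JSON. It does not validate the benchmark data or the configuration values themselves.

```python
import json
from pathlib import Path


def check_inputs(instance_path, config_path):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    instance_dir = Path(instance_path)
    if not instance_dir.is_dir():
        problems.append(f"instance folder not found: {instance_dir}")
    elif not any(instance_dir.iterdir()):
        problems.append(f"instance folder is empty: {instance_dir}")

    config_file = Path(config_path)
    if not config_file.is_file():
        problems.append(f"config file not found: {config_file}")
    else:
        try:
            json.loads(config_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"config file is not valid JSON: {exc}")

    return problems
```

For example, `check_inputs("/your/path/instance/", "/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json")` returns an empty list when both inputs are in place.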
Tool Installation
- Installation from Source

  Service Profiling Advisor requires msServiceProfiler as its entry point. If msServiceProfiler is not installed, install it first. For details, see msServiceProfiler.

      git clone https://gitcode.com/Ascend/msserviceprofiler.git # Skip this if the repository is already cloned.
      cd msserviceprofiler/msservice_advisor
      pip install .
      msserviceprofiler advisor -h
Tool Uninstallation
pip uninstall msservice_advisor
Function Description
Functions
- Scenario 1: Recommend `maxBatchSize` for the decode phase based on the NPU memory, input/output lengths, and model size.
  - You need to provide the `instance` folder, or manually specify the input/output token lengths using `-in, --input_token_num` and `-out, --output_token_num`.
  - To obtain the model path, you need to provide the MindIE Service `config.json` file.
  - The memory usage of the NPUs listed in `npuDeviceIds` in the MindIE Service `config.json` file should reflect the actual usage before MindIE Service starts. Alternatively, explicitly specify `npuMemSize` in the MindIE Service `config.json` file.
- Scenario 2: Fit a model to the `instance` output data from benchmark tests and recommend `maxBatchSize` and `maxPrefillBatchSize` values.
  - The `instance` folder must be provided and must contain at least 1,000 samples.
  - If the data required for Scenario 1 is also provided, the `maxBatchSize` recommendation from Scenario 1 is taken into account.
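To give an intuition for why NPU memory, token lengths, and model size bound `maxBatchSize` in Scenario 1, the sketch below computes a rough KV-cache-based upper bound. This is a generic back-of-the-envelope estimate and is not the formula used by Service Profiling Advisor. The function name and all parameters (`free_mem_bytes`, `num_layers`, `num_kv_heads`, `head_dim`, `dtype_bytes`) are assumptions introduced for illustration, and the estimate ignores activations, fragmentation, and any memory reserved by the service.

```python
def estimate_max_batch_size(free_mem_bytes, num_layers, num_kv_heads, head_dim,
                            input_tokens, output_tokens, dtype_bytes=2):
    """Rough upper bound on the number of sequences whose KV cache fits in memory.

    `free_mem_bytes` is the memory left after the model weights are loaded.
    """
    values = (free_mem_bytes, num_layers, num_kv_heads, head_dim,
              input_tokens, output_tokens, dtype_bytes)
    if any(v <= 0 for v in values):
        raise ValueError("all arguments must be positive")

    # Each token stores one key vector and one value vector per layer.
    kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    # A sequence occupies cache for its prompt plus everything it generates.
    kv_bytes_per_sequence = kv_bytes_per_token * (input_tokens + output_tokens)
    return free_mem_bytes // kv_bytes_per_sequence


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dimension 128, FP16 cache,
    # 20 GiB of free NPU memory, 4096 input tokens and 256 output tokens.
    print(estimate_max_batch_size(20 * 1024**3, 32, 8, 128, 4096, 256))  # 37
```

Longer sequences or less free memory shrink the bound, which is why the advisor needs either the `instance` data or explicit `-in`/`-out` values together with the NPU memory state.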
Precautions
Security warning: Do not run this tool as user root. Executing operations with elevated privileges may compromise system security. It is advised to use a regular user account.
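A wrapper script can enforce this precaution before it invokes the tool. The following is a minimal sketch for Unix-like systems; `is_root` and `ensure_not_root` are hypothetical helpers and are not provided by Service Profiling Advisor.

```python
import os


def is_root(euid=None):
    """Return True if the effective user ID is 0 (root)."""
    if euid is None:
        # os.geteuid() is only available on Unix-like systems.
        euid = os.geteuid()
    return euid == 0


def ensure_not_root(euid=None):
    """Raise PermissionError when running with root privileges."""
    if is_root(euid):
        raise PermissionError(
            "Do not run this tool as root; switch to a regular user account."
        )
```

Calling `ensure_not_root()` at the top of a wrapper script stops execution early when the script is started as root.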
Syntax
# Specify the input and output token lengths (`-in, --input_token_num` and `-out, --output_token_num`).
msserviceprofiler advisor -in 4096 -out 256
# Provide the 'instance' folder.
msserviceprofiler advisor -i /your/path/instance/
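When the advisor is driven from a script, the two command forms above can be assembled programmatically. The sketch below uses only the `-i`, `-in`, and `-out` options documented in this section; `build_advisor_command` and `run_advisor` are hypothetical helper names, and `run_advisor` assumes that `msserviceprofiler` is installed and on `PATH`.

```python
import subprocess


def build_advisor_command(instance_path=None, input_token_num=None,
                          output_token_num=None):
    """Build the argument list for `msserviceprofiler advisor`."""
    command = ["msserviceprofiler", "advisor"]
    if instance_path is not None:
        command += ["-i", str(instance_path)]
    for flag, value in (("-in", input_token_num), ("-out", output_token_num)):
        if value is None:
            continue
        # The token lengths must be positive integers.
        if int(value) <= 0:
            raise ValueError(f"{flag} must be a positive integer, got {value!r}")
        command += [flag, str(int(value))]
    return command


def run_advisor(**options):
    """Run the advisor and return the completed process, including captured output."""
    return subprocess.run(build_advisor_command(**options), check=True,
                          capture_output=True, text=True)
```

For example, `build_advisor_command(input_token_num=4096, output_token_num=256)` yields the same arguments as the first command above.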
Parameter Description
| Parameters | Mandatory (Yes/No) | Description |
|---|---|---|
| -i or --instance_path | No | Path to the benchmark instance output. If this parameter is not specified, related information will not be used in the analysis by default. |
| -s or --service_config_path | No | Path to the MindIE Service installation directory or to its config.json file. The default value is taken from the MindIE Service environment variable MIES_INSTALL_PATH. If neither is configured, /usr/local/Ascend/mindie/latest/mindie-service is used. |
| -t or --target | No | Metric to optimize. The options are as follows: · ttft: Time to First Token (TTFT). · firsttokentime: TTFT. · throughput: throughput. The default value is ttft. |
| -m or --target_metrics | No | Specific statistic of the metric to optimize. The options are as follows: · average: average value. · max: maximum value. · min: minimum value. · P75: 75th percentile. · P90: 90th percentile. · SLO_P90: 90th percentile under the specific Service Level Objective (SLO) constraints. · P99: 99th percentile. · N: Nth percentile. The default value is average. |
| -l or --log_level | No | Log level. The options are as follows: · debug: debug level. · info: information level. · warning: warning level. · error: error level. · fatal: fatal level. · critical: critical level. The default value is info. |
| -in or --input_token_num | No | Request input length. The value must be a positive integer. If this parameter is not specified, the value is obtained from the benchmark instance results by default. |
| -out or --output_token_num | No | Request output length. The value must be a positive integer. If this parameter is not specified, the value defaults to the maxIterTimes value in the MindIE Service config.json. |
| -tp or --tp | No | Tensor Parallelism (TP) domain size. The value must be a positive integer. If this parameter is not specified, the value is obtained from the MindIE Service config.json file. If not found, 1 is used by default. |
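The fallback order described for `-s, --service_config_path` can be pictured with the sketch below. It is an assumption about how such a lookup might be written, not the tool's actual implementation: `resolve_service_config` is a hypothetical name, and the `conf/config.json` location inside the installation directory is inferred from the typical path given in Data Preparation.

```python
import os
from pathlib import Path

DEFAULT_INSTALL_PATH = "/usr/local/Ascend/mindie/latest/mindie-service"


def resolve_service_config(service_config_path=None, environ=None):
    """Return the config.json path: explicit value, then MIES_INSTALL_PATH, then the default."""
    environ = os.environ if environ is None else environ
    base = (service_config_path
            or environ.get("MIES_INSTALL_PATH")
            or DEFAULT_INSTALL_PATH)
    path = Path(base)
    # A path that already names a JSON file is used as-is.
    if path.suffix == ".json":
        return path
    # Otherwise treat it as the installation directory (assumed layout).
    return path / "conf" / "config.json"
```

Passing an explicit `config.json` path bypasses both the environment variable and the default directory.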
Usage Example
- Scenario 1
  - Specify the `-i` or `-in` parameter. Make sure that the related parameters in the MindIE Service `config.json` file are correctly configured and that the available NPU memory for the serving configuration is not 0 (see the third point of Scenario 1 in Function Description).

        msserviceprofiler advisor -in 4096 -out 256

- Scenario 2
  - Fit a model to the benchmark `instance` output data from the path specified by `-i, --instance_path` and recommend `maxBatchSize` and `maxPrefillBatchSize` values. The `maxBatchSize` recommendation from Scenario 1 is taken into account.

        msserviceprofiler advisor -i /your/path/instance/
Output Description
After the advisor command completes, tuning recommendations are displayed as follows:
- Scenario 1
  - The output indicates the recommended range for the `maxBatchSize` value in the MindIE Service `config.json` file. The range is calculated based on the available NPU memory, model architecture, MindIE Service `config.json` configuration, and input/output token lengths.
  - You are advised to set `maxBatchSize` in the MindIE Service `config.json` file to the average value and set `maxPrefillBatchSize` to half of `maxBatchSize`. Then restart the service and check whether performance improves.
  - If performance improves, gradually move toward the maximum value of the range while monitoring the performance metrics.
  - If `maxBatchSize` is set too high, the model may fail to start. In this case, decrease the value toward the lower end of the range until the model starts successfully.
  - `maxPrefillBatchSize` is typically set to half of `maxBatchSize`.

        # msservice_advisor_logger - INFO - </think>
        # msservice_advisor_logger - INFO -
        # msservice_advisor_logger - INFO - <advice>
        # msservice_advisor_logger - INFO - [config] maxBatchSize
        # msservice_advisor_logger - INFO - [advice] The value range is [xx, xx], and the average value is xx.
        # Based on current NPU memory usage, it is advised to set maxBatchSize to the average value and gradually adjust toward the upper end of the range to fully utilize NPU memory.
        # msservice_advisor_logger - INFO - </advice>
- Scenario 2
  - The output provides recommendations for `maxBatchSize` and `maxPrefillBatchSize`. The `maxBatchSize` recommendation takes into account the value recommended in Scenario 1.
  - You can apply the recommended values to the `config.json` file and check whether performance improves. If the result is not satisfactory, apply only one of the two values and verify the inference performance again.
  - In this mode, a plot of the fitted data is also generated to help you assess whether the fit is reasonable.

        # msservice_advisor_logger - INFO - fitted plot path: func_curv_031734.png
        # msservice_advisor_logger - INFO - <think>
        # ...
        # msservice_advisor_logger - INFO - </think>
        # msservice_advisor_logger - INFO -
        # msservice_advisor_logger - INFO - <advice>
        # msservice_advisor_logger - INFO - [config] maxBatchSize
        # msservice_advisor_logger - INFO - [advice] Try setting to 25 (original value: 50).
        # Based on latency data for different batch sizes and function fitting analysis, this is the recommended optimal batch size.
        # msservice_advisor_logger - INFO -
        # msservice_advisor_logger - INFO - [config] maxPrefillBatchSize
        # msservice_advisor_logger - INFO - [advice] Try setting to 50 (original value: 100).
        # Based on latency data for different batch sizes and function fitting analysis, this is the recommended optimal batch size.
        # msservice_advisor_logger - INFO -
        # msservice_advisor_logger - INFO - </advice>
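Applying the recommendations by hand is straightforward, but the step can also be scripted. The sketch below parses `[config]`/`[advice]` pairs of the form shown in the Scenario 2 sample output and writes the recommended values into a configuration dictionary loaded from `config.json`. It is an illustration under stated assumptions: the log wording is taken from the sample above and may differ between tool versions, the helper names are hypothetical, and because the nesting of `config.json` is not described here, the update simply replaces each key wherever it already exists.

```python
import copy
import re

# Matches advice text such as "Try setting to 25 (original value: 50)".
ADVICE_RE = re.compile(r"Try setting to (\d+) \(original value: (\d+)\)")


def parse_advice(log_text):
    """Map each [config] name to a (recommended, original) tuple."""
    advice = {}
    current = None
    for line in log_text.splitlines():
        if "[config]" in line:
            current = line.split("[config]", 1)[1].strip()
        elif "[advice]" in line and current is not None:
            match = ADVICE_RE.search(line)
            if match:
                advice[current] = (int(match.group(1)), int(match.group(2)))
    return advice


def set_key_everywhere(node, key, value):
    """Set `key` in every nested dict that already contains it; return the count."""
    count = 0
    if isinstance(node, dict):
        for name in node:
            if name == key:
                node[name] = value
                count += 1
            else:
                count += set_key_everywhere(node[name], key, value)
    elif isinstance(node, list):
        for item in node:
            count += set_key_everywhere(item, key, value)
    return count


def apply_recommendations(config, recommendations):
    """Return a copy of `config` with each recommended value applied."""
    updated = copy.deepcopy(config)
    for key, value in recommendations.items():
        if set_key_everywhere(updated, key, value) == 0:
            raise KeyError(f"{key} not found in configuration")
    return updated


SAMPLE_LOG = """\
msservice_advisor_logger - INFO - <advice>
msservice_advisor_logger - INFO - [config] maxBatchSize
msservice_advisor_logger - INFO - [advice] Try setting to 25 (original value: 50).
msservice_advisor_logger - INFO - [config] maxPrefillBatchSize
msservice_advisor_logger - INFO - [advice] Try setting to 50 (original value: 100).
msservice_advisor_logger - INFO - </advice>
"""
```

For Scenario 1, where the output is a range rather than a single value, the same `apply_recommendations` helper can be called with the average value for `maxBatchSize` and half of it for `maxPrefillBatchSize`. Keep a backup of the original `config.json` and restart the service after each change.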