hiperf

hiperf is a command line tool that integrates multiple performance analysis capabilities, enabling you to identify system bottlenecks, locate software hotspots, optimize code efficiency, and collect and analyze runtime performance data.

You can use hiperf through DevEco Studio or SmartPerf to collect function call stacks, obtain the execution time of each function on the call stack, and view the call chain information in a swimlane diagram for performance analysis. For details, see Basic Time Analysis: Time and hiperf Usage. To choose the event, sampling period, collection duration, or CPU cores yourself, run hiperf directly from the command line. The resulting perf.data file can be opened in SmartPerf and displayed as a flame graph.
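The collect-then-view workflow above can also be driven from the host. The following is a minimal sketch, assuming hdc is on the host PATH, exactly one device is connected, and the target is a debug-type application. The function name collect_perf and the local destination path are hypothetical, and the HDC variable exists only so that the commands can be dry-run.

```shell
# Minimal sketch (hypothetical helper): record a process on the device with
# hiperf, then copy perf.data to the host so it can be opened in SmartPerf.
# Assumes hdc is installed on the host and one device is connected.
HDC=${HDC:-hdc}   # override (for example, HDC=echo) for a dry run

collect_perf() {
    pid=$1                                    # target process ID (debug app)
    seconds=$2                                # collection duration
    local_file=${3:-./perf.data}              # destination on the host
    remote_file=/data/local/tmp/perf.data     # device path used in this topic

    # Sample the process for the given duration (fp unwinding by default).
    $HDC shell "hiperf record -p $pid -d $seconds -o $remote_file" || return 1
    # Copy the binary sampling file to the host.
    $HDC file recv "$remote_file" "$local_file"
}
```

For example, collect_perf 1234 10 ./perf.data records process 1234 for 10 seconds and copies the file to the current host directory.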

This topic describes how to use hiperf to perform performance analysis.

Environment Setup

  • The environment for OpenHarmony Device Connector (hdc) has been set up. For details, see Environment Setup.

  • The device is properly connected, and you have entered the device shell by running hdc shell.
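Before running hiperf, you can confirm the connection from the host. This is a minimal sketch that assumes hdc list targets prints one device per line and prints [Empty] when no device is connected; device_connected is a hypothetical helper name.

```shell
# Minimal sketch (hypothetical helper): succeed only if hdc reports at least
# one connected device. Assumes "[Empty]" is printed when none is connected.
HDC=${HDC:-hdc}   # override for a dry run

device_connected() {
    # Strip carriage returns in case hdc emits CRLF line endings.
    targets=$($HDC list targets 2>/dev/null | tr -d '\r')
    [ -n "$targets" ] && [ "$targets" != "[Empty]" ]
}
```

For example, device_connected && hdc shell opens the device shell only when a device is present.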

Command Syntax

Run the hiperf --help command to list all hiperf options and commands, including dump, list, record, report, and stat.

$ hiperf --help
Option/Command Description
--hilog Records logs to HiLog instead of a log file.
--logpath Sets the save path of log files. You can set the output file path to /data/local/tmp/ and customize the file name.
--logtag Enables logs of the specified level for the specified tags (for example, Record:D).
--debug Records debug logs.
--verbose Records verbose logs.
--much Records extremely detailed debug logs.
--nodebug Disables debug logs.
--mixlog Mixes logs into the command output.
-h/--help Displays the help information.
dump Converts the performance data file (for example, perf.data) into a readable format.
list Displays the performance event types supported by the system.
record Collects performance data.
report Generates a report from the sampling data in a perf.data file.
stat Collects statistics on performance data.

Example

$ hiperf --help
Usage: hiperf [options] command [args for command]
options:
        --debug                 show debug log, usage format: --debug [command] [args]
        --help                  show help
        --hilog                 use hilog not file to record log
        --logpath               log file name full path, usage format: --logpath [filepath] [command] [args]
        --logtag                enable log level for HILOG_TAG, usage format: --logtag <tag>[:level][,<tag>[:level]] [command] [args]
                                tag: Dump, Report, Record, Stat... level: D, V, M...
                                example: hiperf --verbose --logtag Record:D [command] [args]
        --mixlog                mix the log in output, usage format: --mixlog [command] [args]
        --much                  show extremely much debug log, usage format: --much [command] [args]
        --nodebug               disable debug log, usage format: --nodebug [command] [args]
        --verbose               show debug log, usage format: --verbose [command] [args]     
        -h                      show help
command:
        dump:   Dump content of a perf data file, like perf.data
        help:   Show more help information for hiperf
        list:   List the supported event types.
        record: Collect performance sample information
        report: report sampling information from perf.data format file
        stat:   Collect performance counter information

See 'hiperf help [command]' for more information on a specific command.
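Global options are placed before the subcommand. The sketch below runs a subcommand with debug logs written to a file. It assumes --debug and --logpath can be combined in the same way that --verbose and --logtag are combined in the help example above; hiperf_with_log is a hypothetical helper to run in the device shell.

```shell
# Minimal sketch (hypothetical helper): run a hiperf subcommand with debug logs
# written to a file. Assumes global options go before the subcommand and that
# --debug and --logpath can be combined.
HIPERF=${HIPERF:-hiperf}   # override for a dry run

hiperf_with_log() {
    log_file=$1            # for example, /data/local/tmp/hiperf.log
    shift
    $HIPERF --debug --logpath "$log_file" "$@"
}
```

For example, hiperf_with_log /data/local/tmp/hiperf.log stat -p 1745 -d 3. For the options of one subcommand, run hiperf help record (or another command name).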

Common Commands

Recording Performance Data

  1. Sample the process 1234 for 10 seconds. Set the stack unwinding mode to fp, sampling frequency to 1000 times per second, event types to hw-cpu-cycles and hw-instructions, and save the sampling file to /data/local/tmp/perf.data.

    $ hiperf record -p 1234 -s fp -f 1000 -d 10 -e hw-cpu-cycles,hw-instructions -o /data/local/tmp/perf.data
    Profiling duration is 10.000 seconds.
    Start Profiling...
    Timeout exit (total 10335 ms)
    Process and Saving data...
    Hiperf is not running as root mode. Do not need load kernel syms
    [ hiperf record: Captured 3.014 MB perf data. ]
    [ Sample records: 1293, Non sample records: 855 ]
    [ Sample lost: 0, Non sample lost: 0 ]
    

    The collected data is saved as a perf.data file in binary format, which contains the sampling data, process information, symbol table, and function calls required for performance analysis. You can use a flame graph script to convert the sampling data into a flame graph to identify system performance bottlenecks, locate software hotspots, and optimize code efficiency.

  2. Sample the application com.example.insight_test_stage. Set the sampling duration to 10s, stack unwinding mode to dwarf (debug information table), sampling period to 1000 events, event types to hw-cpu-cycles and hw-instructions, and use the default output path.

    $ hiperf record --app com.example.insight_test_stage -d 10 -s dwarf --period 1000 -e hw-cpu-cycles,hw-instructions
    Profiling duration is 10.000 seconds.
    Start Profiling...
    Timeout exit (total 10000 ms)
    Process and Saving data...
    Hiperf is not running as root mode. Do not need load kernel syms
    [ hiperf record: Captured 0.296 MB perf data. ]
    [ Sample records: 0, Non sample records: 2640 ]
    [ Sample lost: 0, Non sample lost: 0 ]
    

    The collected data is saved to the default path /data/local/tmp/perf.data.

Collecting Performance Statistics

  1. Collect statistics on processes 1745 and 1910 for 10 seconds.

    $ hiperf stat -d 10 -p 1745,1910
    Profiling duration is 10.000 seconds.
    Start Profiling...
    Timeout exit (total 10000 ms)
                        count  name                           | comment                          | coverage
                      148,450  hw-branch-instructions         | 26.404 M/sec                     | (100%)
                      49,833  hw-branch-misses               | 33.568878 miss rate              | (100%)
                    8,986,523  hw-cpu-cycles                  | 1.598409 GHz                     | (100%)
                    1,283,596  hw-instructions                | 7.001053 cycles per instruction  | (100%)
                          63  sw-context-switches            | 11.206 K/sec                     | (100%)
                            0  sw-page-faults                 | 0.000 /sec                       | (100%)
                    5,622,169  sw-task-clock                  | 0.000562 cpus used               | (100%)
    
  2. Collect statistics on processes 1745 and 1910 for 10 seconds, with event types set to hw-cpu-cycles, hw-instructions, and sw-task-clock, and a print interval of 3000 ms.

    $ hiperf stat -d 10 -p 1745,1910 -e hw-cpu-cycles,hw-instructions,sw-task-clock -i 3000
    Profiling duration is 10.000 seconds.
    Start Profiling...
    Report at 3000 ms (6999 ms left):
                        count  name                           | comment                          | coverage
                    2,534,675  hw-cpu-cycles                  | 1.717114 GHz                     | (100%)
                      324,279  hw-instructions                | 7.816340 cycles per instruction  | (100%)
                    1,476,125  sw-task-clock                  | 0.000492 cpus used               | (100%)
    Report at 6000 ms (3999 ms left):
                        count  name                           | comment                          | coverage
                    5,112,570  hw-cpu-cycles                  | 1.724259 GHz                     | (100%)
                      648,303  hw-instructions                | 7.886081 cycles per instruction  | (100%)
                    2,965,083  sw-task-clock                  | 0.000494 cpus used               | (100%)
    Report at 9000 ms (999 ms left):
                        count  name                           | comment                          | coverage
                    7,870,422  hw-cpu-cycles                  | 1.724897 GHz                     | (100%)
                      994,407  hw-instructions                | 7.914689 cycles per instruction  | (100%)
                    4,562,835  sw-task-clock                  | 0.000507 cpus used               | (100%)
    Timeout exit (total 10000 ms)
    
  3. Collect statistics on process 1910 for 3 seconds, with event types set to hw-cpu-cycles and hw-instructions, and print detailed information.

    $ hiperf stat -d 3 -p 1910 -e hw-cpu-cycles,hw-instructions --verbose
    Profiling duration is 3.000 seconds.
    Start Profiling...
    Timeout exit (total 3000 ms)
    hw-cpu-cycles id:1342(c-1:p1910) timeEnabled:133583 timeRunning:133583 value:255740
    hw-cpu-cycles id:1343(c-1:p1988) timeEnabled:0 timeRunning:0 value:0
    hw-cpu-cycles id:1344(c-1:p1989) timeEnabled:0 timeRunning:0 value:0
    hw-cpu-cycles id:1345(c-1:p1990) timeEnabled:187833 timeRunning:187833 value:331425
    ...
    hw-instructions id:1375(c-1:p1910) timeEnabled:133583 timeRunning:133583 value:36485
    hw-instructions id:1376(c-1:p1988) timeEnabled:0 timeRunning:0 value:0
    hw-instructions id:1377(c-1:p1989) timeEnabled:0 timeRunning:0 value:0
    hw-instructions id:1378(c-1:p1990) timeEnabled:187833 timeRunning:187833 value:47816
    ...
                        count  name                           | comment                          | coverage
                      669,850  hw-cpu-cycles                  |                                  | (100%)
                      94,903  hw-instructions                | 7.058259 cycles per instruction  | (100%)
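The comment column in these reports appears to be derived from the raw counts. In example 1, 8,986,523 cycles / 5,622,169 gives 1.598409, matching the GHz figure if sw-task-clock is in nanoseconds; 8,986,523 / 1,283,596 gives the 7.001053 cycles per instruction; and 49,833 / 148,450 × 100 gives the 33.568878 miss rate. The shell functions below are a minimal sketch for checking such figures and post-processing stat output on the host with awk. The function names are hypothetical, and the nanosecond unit, the cumulative interval reports, and the per-thread summing are inferences from the sample output, not documented behavior.

```shell
# Minimal sketches (hypothetical helpers) for reading `hiperf stat` output on
# the host. Counts may be given with thousands separators, e.g. 8,986,523.

# Recompute two comment-column figures from raw counts. Assumes sw-task-clock
# is in nanoseconds, so cycles / task-clock is cycles per ns, i.e. GHz.
stat_comments() {
    cycles=$(printf '%s' "$1" | tr -d ,)        # hw-cpu-cycles
    instructions=$(printf '%s' "$2" | tr -d ,)  # hw-instructions
    task_ns=$(printf '%s' "$3" | tr -d ,)       # sw-task-clock
    awk -v c="$cycles" -v i="$instructions" -v t="$task_ns" 'BEGIN {
        printf "%.6f GHz\n", c / t                     # cycles per nanosecond
        printf "%.6f cycles per instruction\n", c / i  # CPI
    }'
}

# Branch misses as a percentage of branch instructions.
branch_miss_rate() {
    b=$(printf '%s' "$1" | tr -d ,)   # hw-branch-instructions
    m=$(printf '%s' "$2" | tr -d ,)   # hw-branch-misses
    awk -v b="$b" -v m="$m" 'BEGIN { printf "%.6f miss rate\n", m / b * 100 }'
}

# Per-interval increments of one event from `stat -i` output on stdin.
# Assumes each "Report at" block is cumulative since collection started.
interval_deltas() {
    awk -v ev="$1" '
        { sub(/\r$/, "") }                   # tolerate CRLF line endings
        /^[ \t]*Report at/ { at = $3 }       # "Report at 3000 ms ..." -> 3000
        $2 == ev {
            n = $1; gsub(/,/, "", n)         # "2,534,675" -> 2534675
            printf "%s ms: %.0f (+%.0f)\n", at, n, n - prev
            prev = n
        }'
}

# Sum the per-thread "value:" fields of `stat --verbose` output on stdin,
# grouped by event name and sorted by name.
sum_verbose_values() {
    awk '
        $2 ~ /^id:/ && $NF ~ /^value:/ {
            v = $NF; sub(/^value:/, "", v)   # "value:255740" -> 255740
            total[$1] += v
        }
        END { for (ev in total) printf "%s %.0f\n", ev, total[ev] }' | sort
}
```

For example, hdc shell "hiperf stat -d 10 -p 1745,1910 -e hw-cpu-cycles -i 3000" | interval_deltas hw-cpu-cycles prints the cycles counted in each interval instead of the running total.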
    

list

Displays the performance event types supported by the system. These event names can be passed to the -e option of the record and stat commands.

Parameters

Name Description
-h/--help Displays the help information.
hw Lists the hardware events.
The following events are supported:
- hw-cpu-cycles
- hw-instructions
- hw-cache-references
- hw-cache-misses
- hw-branch-instructions
- hw-branch-misses
- hw-bus-cycles
- hw-stalled-cycles-frontend
- hw-stalled-cycles-backend
sw Lists the software events.
tp Lists the tracepoint events.
cache Lists the hardware cache events.
raw Lists the raw performance monitoring unit (PMU) events.

Example

Usage: hiperf list [event type name]

Query the supported hardware event types.

$ hiperf list hw
event not support hw-ref-cpu-cycles

Supported events for hardware:
        hw-cpu-cycles
        hw-instructions
        hw-cache-references
        hw-cache-misses
        hw-branch-instructions
        hw-branch-misses
        hw-bus-cycles
        hw-stalled-cycles-frontend
        hw-stalled-cycles-backend
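Because the supported events differ between devices (as the event not support line above shows), a script can check the list output before passing an event to -e. This is a minimal sketch for the host; event_supported is a hypothetical helper that reads hiperf list output on stdin.

```shell
# Minimal sketch (hypothetical helper): succeed if the event named in $1 is the
# first word of a line in `hiperf list` output read from stdin. A line such as
# "event not support hw-ref-cpu-cycles" does not count as a match.
event_supported() {
    awk -v e="$1" '
        { sub(/\r$/, "") }              # tolerate CRLF line endings
        $1 == e { found = 1 }
        END { exit !found }'
}
```

For example, hdc shell "hiperf list hw" | event_supported hw-cache-misses && echo "hw-cache-misses can be used with -e".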

record

Collects performance data of a specified process or application, such as CPU cycles, instruction counts, and function call stacks, and saves the sampling data to a specified file. (For the default path, run the hiperf record -h/--help command and see the description of the -o parameter.)

NOTE

The process to be collected must belong to a debug-type application.

Parameters of the record command

Parameter Description
-h/--help Displays the help information.
-c Sets the ID of the CPU whose data is collected.
--cpu-limit Sets the maximum CPU usage during collection. The value ranges from 1 to 100. The default value is 25.
-d Sets the collection duration, in seconds. This parameter cannot be used together with --control.
-f Sets the sampling frequency, in samples per second. The default value is 4000. This parameter cannot be used together with --period.
--period Sets the sampling period, that is, the number of events between two consecutive samples. This parameter cannot be used together with -f.
-e Sets the event to collect. Multiple event types are supported; separate them with commas. You can run the list command to obtain the supported event types.
-g Specifies the event groups to collect, which are separated by commas (,).
--no-inherit Does not collect data of child processes.
-p Specifies the process ID to collect. Multiple process IDs are supported; separate them with commas (,). This parameter cannot be used together with -a.
-t Specifies the thread ID to collect. Multiple thread IDs are supported; separate them with commas (,). This parameter cannot be used together with -a.
--exclude-tid Specifies the thread ID not to collect. Multiple thread IDs are supported; separate them with commas (,). This parameter cannot be used together with -a.
--exclude-thread Specifies the thread name not to collect. Multiple thread names are supported; separate them with commas (,). This parameter cannot be used together with -a.
--offcpu Traces the time during which a thread is off the CPU (not scheduled).
-j Samples branch stacks. The following filters are supported: any, any_call, any_ret, ind_call, ind_jmp, cond and call.
-s/--callstack Sets the stack unwinding mode, which can be fp (frame pointer) or dwarf (debug information table). The default mode is fp.
--kernel-callchain Collects kernel-mode stacks. This parameter must be used together with the -s parameter.
--callchain-useronly Collects only user stacks.
--delay-unwind Delays call stack unwinding until after recording when the stack mode is set to dwarf.
--disable-unwind Disables call stack unwinding after recording when the stack mode is set to dwarf.
--disable-callstack-expand Merges the call stacks using the cached thread stack when the stack mode is set to dwarf.
--enable-debuginfo-symbolic Parses the symbols in the .gnu_debugdata section of ELF files when -s fp/dwarf is set. By default, these symbols are not parsed.
--clockid Sets the collection clock type, which can be monotonic or monotonic_raw. Some events support the boottime, realtime, and clock_tai clock types.
--symbol-dir Sets the symbol table file path, which is used for symbolization during collection.
-m Sets the number of mmap pages. The value ranges from 2 to 1024. The default value is 1024.
--app Sets the application names to collect. Use commas (,) to separate them. The application must already be running. If it has not started, the command waits up to 20s and then exits automatically. This parameter cannot be used together with -a.
--chkms Sets the query interval, in milliseconds. The value ranges from 1 to 200. The default value is 10.
--data-limit Sets the limit of the output data size. When this limit is reached, the collection stops. By default, there is no limit.
-o Sets the output file path. You can customize the file name.
-z Compresses the output data into a .gz file.
--restart Collects performance metrics about application startup. If the process is not started within 30 seconds, the collection stops.
--verbose Outputs a more detailed report.
--control [command] Controls the collection operation. The following commands are supported: prepare/start/pause/resume/output/stop. This parameter cannot be used together with -d.
--dedup_stack Deletes duplicate stacks from the record.
--cmdline-size Sets the value of the /sys/kernel/tracing/saved_cmdlines_size node, that is, the number of saved command-line entries. The value ranges from 512 to 4096.
--report Collects the backtrace report.
--backtrack Collects data from a period before the collection is triggered. This parameter must be used together with --control prepare.
--backtrack-sec Sets the length of the previous period to collect, in seconds. The value ranges from 5 to 30. The default value is 10. This parameter must be used together with --backtrack.
--dumpoptions Displays the collection parameter details.
-a Collects performance data of the entire device.
--exclude-hiperf Excludes the performance data of the hiperf process. This parameter must be used together with -a.
--exclude-process Specifies the process name not to collect. This parameter must be used together with -a.
--pipe_input Establishes a command input pipe when the client process calls hiperf in device development. For details about how to use this capability, see hiperf. This parameter is not required for application development.
--pipe_output Establishes an output pipe when the client process calls hiperf in device development. For details about how to use this capability, see hiperf. This parameter is not required for application development.
--append-smo-data Appends the original .so file name to the packed .so file name.
Note: This parameter is supported since API version 23.

Example

Usage: hiperf record [options] [command [command-args]]

Sample the process 267 for 10 seconds and use dwarf to unwind the stack.

$ hiperf record -p 267 -d 10 -s dwarf
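For scenarios that do not fit a fixed -d duration, --control splits collection into separate steps. The following is a minimal sketch for the device shell, assuming the collection options (including -o) are passed with prepare and that start and stop take no further options; record_begin and record_end are hypothetical helpers.

```shell
# Minimal sketch (hypothetical helpers): control a record session manually
# instead of using -d. Assumes all collection options go with "prepare".
HIPERF=${HIPERF:-hiperf}   # override for a dry run

record_begin() {
    pid=$1
    out=${2:-/data/local/tmp/perf.data}
    $HIPERF record --control prepare -p "$pid" -s dwarf -o "$out" || return 1
    $HIPERF record --control start
}

record_end() {
    $HIPERF record --control stop
}
```

For example, run record_begin 267, reproduce the scenario to be analyzed, and then run record_end. The parameter table above also lists pause, resume, and output for finer control.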

stat

Monitors the specified application and periodically prints the values of performance counters.

NOTE

The process to be collected must belong to a debug-type application.

Parameters of the stat command

Parameter Description
-h/--help Displays the help information.
-c Sets the ID of the CPU whose data is collected.
-d Sets the collection duration, in seconds. This parameter cannot be used together with --control.
-i Sets the interval for printing stat information, in milliseconds.
-e Specifies the events to collect. Multiple events are supported; use commas (,) to separate them.
-g Specifies the event groups to collect, which are separated by commas (,). You can run the list command to obtain the supported event types.
--no-inherit Does not collect data of child processes.
-p Specifies the process ID to collect. Multiple process IDs are supported; separate them with commas (,). This parameter cannot be used together with -a.
-t Specifies the thread ID to collect. Multiple thread IDs are supported; separate them with commas (,). This parameter cannot be used together with -a.
--app Sets the application names to collect. Use commas (,) to separate them. The application must already be running. If it has not started, the command waits up to 20s and then exits automatically. This parameter cannot be used together with -a.
--chkms Sets the query interval, in milliseconds. The value ranges from 1 to 200. The default value is 10.
--per-core Prints the counts of each CPU core separately.
--per-thread Prints the counts of each thread separately.
--restart Collects performance metrics about application startup. If the process is not started within 30 seconds, the collection exits. This parameter must be used together with --app.
--verbose Outputs detailed information.
--dumpoptions Displays details about all options in the list.
--control [command] Controls the collection operation. The commands include prepare, start, and stop. This parameter cannot be used together with -d.
Note: This parameter is supported since API version 20.
-o Sets the output file path. You can customize the file name.
For the default path, run the hiperf stat -h/--help command to view the description of the -o parameter.
This parameter must be used together with --control prepare and cannot be used with the other --control commands.
Note: This parameter is supported since API version 20.
-a Collects performance data of the entire device.

Example

Usage: hiperf stat [options] [command [command-args]]

Run the stat command to monitor the performance data of process 1745 on CPU 0 for three seconds.

$ hiperf stat -p 1745 -d 3 -c 0
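Since API version 20, stat also accepts --control. The sketch below runs a workload command between start and stop so that counting covers only that window. It assumes -p and -o are given with prepare, as the -o description above requires; stat_around and the output file name are hypothetical.

```shell
# Minimal sketch (hypothetical helper): count a process only while a given
# command runs. Assumes collection options, including -o, go with "prepare".
HIPERF=${HIPERF:-hiperf}                               # override for a dry run
STAT_OUT=${STAT_OUT:-/data/local/tmp/stat_result.txt}  # hypothetical file name

stat_around() {
    pid=$1
    shift
    $HIPERF stat --control prepare -p "$pid" -o "$STAT_OUT" || return 1
    $HIPERF stat --control start || return 1
    "$@"                 # run the workload to be measured
    status=$?
    $HIPERF stat --control stop
    return "$status"     # report the workload's exit status
}
```

For example, stat_around 1745 sleep 5 counts process 1745 while sleep 5 runs and writes the results to the file named by STAT_OUT.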

dump

Converts performance data files in different formats (for example, perf.data) into plain text so that you can check the correctness of the original sampling data.

Parameters of the dump command

Parameter Description
-h/--help Displays the help information.
--head Outputs only the data header and attributes.
-d Outputs only the data segment.
-f Outputs only the additional feature sections.
--sympath Specifies the path of the symbol table file.
-i Specifies the path of the sampling file.
-o Sets the output file path. You can set the output file path to /data/local/tmp/ and customize the file name. If this parameter is not set, the data is output to the CLI.
--elf Converts the ELF file to readable plain text.
--proto Converts the .proto file to readable plain text.
--export Splits the user stack data into multiple files.

Example

Usage: hiperf dump [option] <filename>

Run the dump command to read the /data/local/tmp/perf.data file and export it to the /data/local/tmp/perf.dump file.

$ hiperf dump -i /data/local/tmp/perf.data -o /data/local/tmp/perf.dump
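Because --head limits the output to the data header and attributes, it can serve as a quick check that a capture is readable before you dump it in full. This is a minimal sketch for the device shell, assuming --head can be combined with -i; dump_header is a hypothetical helper.

```shell
# Minimal sketch (hypothetical helper): print only the header and attributes
# of a sampling file. Assumes --head and -i can be used together.
HIPERF=${HIPERF:-hiperf}   # override for a dry run

dump_header() {
    $HIPERF dump --head -i "${1:-/data/local/tmp/perf.data}"
}
```

For example, dump_header /data/local/tmp/perf.data prints the header of the file produced by the record examples.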

report

Converts the sampling data (perf.data) to the specified format (such as JSON or ProtoBuf), groups samples belonging to the same process, thread, or function into individual sample entries, sorts these entries by event count, and displays them in a report.

Parameters of the report command

Parameter Description
-h/--help Displays the help information.
--symbol-dir Specifies the path of the symbol table file.
--limit-percent Filters performance data whose share is at least the specified percentage (1 to 100). Only entries meeting this threshold are included in the report.
-s Displays call stacks in the report.
--call-stack-limit-percent Displays only the call stacks whose share is at least the specified percentage. The value ranges from 1 to 100.
-i Specifies the input file path. The default value is perf.data.
-o Sets the output file path. You can set the output file path to /data/local/tmp/ and customize the file name. If this parameter is not set, the data is output to the CLI.
--proto Outputs data in ProtoBuf format.
--json Outputs data in JSON format.
--diff Displays the differences between the source file and the converted file. This parameter cannot be used together with --proto, --json, or -s.
--branch Displays the branches based on the function address.
--<keys> <keyname1>[,keyname2][,...] Specifies the keywords, which can be comms, pids, tids, dsos, funcs, from_dsos or from_funcs, for example, --comms hiperf.
--sort [key1],[key2],[...] Sorts the data by keyword.
--hide_count Hides event counts in the report.
--dumpoptions Displays details about all options in the list.

Example

Usage: hiperf report [option] <filename>

Extract the data that has a significant impact on performance (a share of at least 1%) from the perf.data file and display it in a report.

$ hiperf report -i /data/local/tmp/perf.data --limit-percent 1
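To analyze a report on the host, you can write it in JSON format and copy it off the device. This is a minimal sketch, assuming --json, -o, and --limit-percent can be combined in one report command; export_report_json and the perf.json file names are hypothetical.

```shell
# Minimal sketch (hypothetical helper): generate a JSON report on the device
# and copy it to the host. Assumes the options below can be combined.
HDC=${HDC:-hdc}   # override for a dry run

export_report_json() {
    input=${1:-/data/local/tmp/perf.data}   # sampling file on the device
    dest=${2:-./perf.json}                  # destination on the host
    remote=/data/local/tmp/perf.json        # hypothetical intermediate file
    # Keep entries with a share of at least 1%, written as JSON on the device.
    $HDC shell "hiperf report -i $input --limit-percent 1 --json -o $remote" || return 1
    $HDC file recv "$remote" "$dest"
}
```

For example, export_report_json with no arguments reports /data/local/tmp/perf.data and saves ./perf.json on the host.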

FAQs

What should I do if hiperf fails to collect applications without the debug certificate signature?

Symptom

Only applications with the debug certificate signature can be collected. The message "only support debug application" is displayed.

Possible Causes and Solution

Causes

The application does not have the debug certificate signature.

Solution

When the hiperf record/stat -p [pid] command is used, the process to be collected must be that of an application signed with the debug certificate.

Run the hdc shell "bm dump -n bundlename | grep appProvisionType" command to check whether the application specified in the command is a debug-type application. The expected output is "appProvisionType": "debug".

For example, run the following command to check the application whose bundle name is com.example.myapplication:

hdc shell "bm dump -n com.example.myapplication | grep appProvisionType"

If the application is a debug-type application, the following information is displayed:

"appProvisionType": "debug",

To build a debug-type application, you need to use a debug certificate for signature. For details about how to request and use the debug certificate, see Requesting a Debug Certificate.
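The appProvisionType check described above can be scripted on the host. This is a minimal sketch; is_debug_app is a hypothetical helper that succeeds only when the appProvisionType line reports "debug".

```shell
# Minimal sketch (hypothetical helper): succeed only if the bundle's
# appProvisionType is "debug". Runs bm dump on the device through hdc.
HDC=${HDC:-hdc}   # override for a dry run

is_debug_app() {
    $HDC shell "bm dump -n $1 | grep appProvisionType" | grep -q '"debug"'
}
```

For example, is_debug_app com.example.myapplication || echo "Not a debug-type application" >&2.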