# Profiling support in runtime

The AIC100 stack provides a mechanism to collect performance metrics at
key milestones throughout the inference cycle, both to triage low
performance and for general monitoring.

There are two broad classifications of profiling data:

1. Host metrics: An inference passes through several software layers on the
host before it reaches the network on the device. Examining host-side
performance helps identify whether the network pre- and post-processing
stages or the host-side multi-threading knobs need tuning.
2. Network/device metrics: Network metrics cover a wide range of
information, from whole-network execution time to detailed
operator-level performance. Use this information to
make the most of the existing network or to optimize the network itself.

    You must enable network performance collection at network compilation time because it is not built into the network by
default. The level of network performance detail available depends on the parameters
passed to the AIC compiler.

## Profiling report types

You can use the AIC100 software stack to request the following types (layouts) of profiling information:

### Latency type

Latency type is a CSV-style table of key milestones in the AIC100 stack. It contains both
host-side and device-side information.

### Trace type

Trace type is JSON-formatted Chrome trace data. It can be viewed in
any tool that consumes Chrome traces. It contains both host-side and
device-side information.
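
Because the trace type follows the Chrome trace event format, it can also be post-processed with ordinary JSON tooling. The following is a minimal sketch that sums the duration of each event name. The event names (`host_enqueue`, `device_execute`) and the sample data are hypothetical, and the sketch assumes the trace uses complete (`"ph": "X"`) events with a `dur` field; a real AIC100 trace may use other event phases and names.

```python
import json
from collections import defaultdict

# Hypothetical sample data. Real traces from the AIC100 stack use their own
# event names and contain many more events.
SAMPLE_TRACE = json.dumps({
    "traceEvents": [
        {"name": "host_enqueue", "ph": "X", "ts": 0, "dur": 120, "pid": 1, "tid": 1},
        {"name": "device_execute", "ph": "X", "ts": 120, "dur": 900, "pid": 1, "tid": 2},
        {"name": "host_enqueue", "ph": "X", "ts": 1100, "dur": 80, "pid": 1, "tid": 1},
    ]
})


def total_duration_by_name(trace_text):
    """Sum the 'dur' field (microseconds) of complete ('X') events per name."""
    data = json.loads(trace_text)
    # A Chrome trace is either a bare list of events or an object that
    # holds the list under the "traceEvents" key.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(int)
    for event in events:
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)


totals = total_duration_by_name(SAMPLE_TRACE)
```

A per-name total like this gives a quick view of where time is spent before opening the full trace in a viewer.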

## Profiling report collection methods

There are two ways to request the profiling information from the AIC100 stack:

1. Num-iter based profiling
2. Duration-based profiling

The core content is the same with both methods; only the delivery mechanism differs.

### Num-iter based profiling

Alias: Legacy profiling

With num-iter based profiling, you create a profiling handle that specifies the program to profile and the number of inferences
to sample. A given profiling handle can profile only one program. To profile multiple
programs, create multiple profiling handles, one for each program.

To create a profiling handle, specify the following:

1. The program to profile.
2. Number of samples to collect.
3. Profiling callback.
4. Profiling output type (latency or trace).

After you start profiling, it stops and invokes
the callback you specified at handle creation when either of the following occurs:

1. The requested number of samples has been collected.
2. You explicitly stop profiling. In this case, the number of samples
collected might be less than the number specified during profiling handle
creation.

After profiling stops, you can start it again
using the same handle. Profiling then behaves the same as when the
handle was started for the first time.
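
The lifecycle described above can be modeled with a small toy class. This is only an illustrative sketch, not the HPP API: the class `NumIterProfiler`, its method names, and the callback signature are all hypothetical, and the samples are plain numbers standing in for real profiling data.

```python
class NumIterProfiler:
    """Toy model of a num-iter profiling handle (hypothetical, not the real API)."""

    def __init__(self, program, num_samples, callback, output_type="latency"):
        self.program = program          # the single program this handle profiles
        self.num_samples = num_samples  # number of inferences to sample
        self.callback = callback        # invoked when profiling stops
        self.output_type = output_type  # "latency" or "trace"
        self._samples = []
        self._active = False

    def start(self):
        # Every start begins with an empty sample set, as on the first start.
        self._samples = []
        self._active = True

    def record(self, sample):
        """Called once per completed inference of the profiled program."""
        if not self._active:
            return
        self._samples.append(sample)
        if len(self._samples) >= self.num_samples:
            self.stop()  # requested number of samples reached

    def stop(self):
        # An explicit stop may deliver fewer samples than requested.
        if not self._active:
            return
        self._active = False
        self.callback(self.program, self.output_type, list(self._samples))


reports = []
profiler = NumIterProfiler("resnet50", 3, lambda prog, kind, s: reports.append(s))

profiler.start()
for latency_us in (410, 395, 402, 399):  # the fourth inference is not sampled
    profiler.record(latency_us)

profiler.start()      # reuse the same handle
profiler.record(388)
profiler.stop()       # explicit stop: the callback receives a partial report
```

In this model the first run ends automatically after three samples, and the second run ends on the explicit stop with a single sample.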

Refer to the `ProfilingHandle` section for the HPP API interface.

### Duration-based profiling

Alias: Stream profiling

For duration-based profiling, create a profiling handle and specify the following:

1. Reporting rate
2. Sampling rate
3. Callback
4. RegEx
5. Profiling output type (latency or trace)

Note that there is no condition that ends profiling
automatically. Once you start profiling,
samples are collected until you explicitly stop it. Also note that you
do not specify the program to profile at handle creation. Instead, you add and remove
programs at runtime (even after profiling has started) using the appropriate
API. This allows more than one program to be profiled by the same
profiling handle.

#### Reporting rate

Duration-based profiling invokes the callback at every reporting-rate boundary.
For example, assume the reporting rate is set to 500 ms and profiling
starts at the 4 seconds 400 ms time-point. The first callback is then
invoked at the 4 seconds 500 ms time-point: not 500 ms after the start,
but at the next 500 ms boundary. The next callback is at the 5 seconds
time-point, then at the 5 seconds 500 ms time-point, and so on every 500 ms until profiling is stopped.
Each callback delivers profiling data for the inferences that took place between
the previous report and the current report.
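
The boundary arithmetic in this example can be sketched as follows. The function name is hypothetical, and two points are assumptions rather than documented behavior: that boundaries are whole multiples of the reporting rate on the same clock as the start time, and that a start falling exactly on a boundary waits for the next one.

```python
def report_times_ms(start_ms, rate_ms, count):
    """Return the first `count` callback time-points, in milliseconds.

    Callbacks fire on multiples of `rate_ms`, not at `start_ms + rate_ms`.
    Assumption: a start exactly on a boundary reports at the following one.
    """
    first = (start_ms // rate_ms + 1) * rate_ms
    return [first + i * rate_ms for i in range(count)]


# Profiling starts at 4 s 400 ms with a 500 ms reporting rate.
times = report_times_ms(4400, 500, 3)  # [4500, 5000, 5500]
```

The first report therefore covers only 100 ms of activity, while every later report covers a full 500 ms window.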

#### Sampling rate

If you are not interested in the performance of every single inference,
you can use the sampling rate to record data for only every second,
fourth, eighth, or sixteenth inference. Sampling gives you an overview of performance
rather than the details of each inference.
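
As a rough illustration of the effect of the sampling rate, the sketch below keeps one inference out of every N. The function name is hypothetical, and which inference within each group is recorded (here, the last one) is an assumption, not documented behavior.

```python
ALLOWED_RATES = (2, 4, 8, 16)  # every second, fourth, eighth, or sixteenth


def sample_inferences(inferences, every_nth):
    """Keep one inference out of each group of `every_nth`.

    Assumption: the last inference of each group is the one recorded.
    """
    if every_nth not in ALLOWED_RATES:
        raise ValueError(f"sampling rate must be one of {ALLOWED_RATES}")
    return inferences[every_nth - 1::every_nth]


inference_ids = list(range(1, 17))            # inferences 1..16
kept = sample_inferences(inference_ids, 4)    # [4, 8, 12, 16]
```

With a rate of four, sixteen inferences produce four samples, so the report volume drops by the same factor.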

#### RegEx

You might want to profile all programs running under the process
whose names match a regular expression, for example `Resnet50*`. If you create a
profiling handle with a regular expression, any newly created program
whose name passes the RegEx filter is automatically added to the handle
and profiled. When you release the program, it
is automatically removed from the profiling handle's list of
programs.

Note: Adding or removing a program can cause a report to be generated
later or earlier than scheduled.

Note: The RegEx engine uses ECMAScript syntax.
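
The automatic add and remove behavior can be modeled as below. This is a hedged sketch only: the class and method names are hypothetical, Python's `re` module stands in for the ECMAScript engine (the two agree for a simple pattern like this one), and matching anywhere in the name rather than against the whole name is an assumption. Note that `Resnet50*` is a regular expression, not a shell glob: `0*` means "zero or more `0` characters".

```python
import re


class RegexProgramFilter:
    """Toy model of RegEx-driven program tracking (hypothetical, not the real API)."""

    def __init__(self, pattern):
        self._pattern = re.compile(pattern)
        self.profiled = set()

    def on_program_created(self, name):
        # Assumption: a match anywhere in the program name is sufficient.
        if self._pattern.search(name):
            self.profiled.add(name)

    def on_program_released(self, name):
        # A released program is dropped from the profiled set.
        self.profiled.discard(name)


tracker = RegexProgramFilter("Resnet50*")
for program_name in ("Resnet50_fp16", "Mobilenet_v2", "Resnet50_int8"):
    tracker.on_program_created(program_name)

profiled_after_create = set(tracker.profiled)
tracker.on_program_released("Resnet50_fp16")
profiled_after_release = set(tracker.profiled)
```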

Refer to the `StreamProfilingHandle` section for the HPP API interface.

Last Published: May 01, 2026
