# Execute the QPC

After [compiling the model](https://docs.qualcomm.com/doc/80-99100-3/topic/index_model-compilation.html#reference-to-compile-the-model) into a QPC, the next step is to run inference on the device using the precompiled QPC. There are four ways to call inference on the Qualcomm Cloud AI 100 device:

- Use `qaic-runner`
- Use Python APIs
- Use C++ APIs
- Use ONNX Runtime

This section describes how to use `qaic-runner` to call inference and how to configure the profiling options to collect [inference performance](https://docs.qualcomm.com/doc/80-99100-3/topic/index_model-execution.html#reference-to-inference-profiling) metrics. See Next steps for information on how to use the other methods to call inference and execute the QPC.

## Use `qaic-runner`

`qaic-runner` is a command-line interface (CLI) tool for inference performance and benchmarking analysis. Use `qaic-runner` primarily for performance testing. For actual inference tasks, use the [Python APIs](https://docs.qualcomm.com/doc/80-99100-3/topic/index_Python-API.html#reference-to-python-api) or [C++ APIs](https://docs.qualcomm.com/doc/80-99100-3/topic/index_Cpp-API.html#reference-to-cpp-api), depending on your preferred technology stack.

### Prerequisites

The examples in this section assume that a [QPC](https://docs.qualcomm.com/doc/80-99100-3/topic/index_model-compilation.html#reference-to-compile-the-model) is already generated.

### Usage examples

The examples in this section show how to:

- Run inference in `qaic-runner` with random inputs
- Run inference on a set of inputs that you provide
- Generate dumps for latency capture

#### Run inference in `qaic-runner` with random inputs

This example runs inference in `qaic-runner` with random inputs. Because no inputs are provided, `qaic-runner` feeds the device a randomly generated input whose dimensions and type are inferred from the QPC. You can use `qaic-runner` in this configuration to measure performance.

    sudo /opt/qti-aic/exec/qaic-runner -t /path/to/qpc -a 3 -n 5000 -d 0 -v

- `-a 3`: Number of activations. Activations here refers to the number of instances of the network that you want to run on the device. In this case, three copies of the network can run in parallel on the device.
    - If each network was compiled with 4 cores, the device needs at least 12 (3 x 4) free cores.
        - To find the number of cores, run `sudo /opt/qti-aic/tools/qaic-qpc validate -i /path/to/qpc/programqpc.bin` and look for `Number of NSP required` in the output.
- `-n 5000`: Number of iterations. The single randomly generated input is used for 5000 inferences.
- `-d 0`: The device ID. Use `/opt/qti-aic/tools/qaic-util -q` to find the device ID.
- `-v`: Enables verbose logging.
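
The core arithmetic behind `-a` can be checked before launching a run. The following is a minimal sketch; the free-core count is a hypothetical value entered by hand, not something read from the device, and the script only prints the command rather than executing it.

```shell
# Sketch: check the core budget for a chosen -a value, then print the command.
# All numbers are illustrative assumptions, not values read from a device or QPC.
activations=3          # value passed to -a
cores_per_network=4    # "Number of NSP required" reported by qaic-qpc validate
free_cores=16          # hypothetical number of free cores on the device

required_cores=$((activations * cores_per_network))

if [ "$required_cores" -le "$free_cores" ]; then
    fits="yes"
else
    fits="no"
fi
echo "required cores: $required_cores, free cores: $free_cores, fits: $fits"

# Command from the example above, assembled from the variables (printed only).
cmd="sudo /opt/qti-aic/exec/qaic-runner -t /path/to/qpc -a $activations -n 5000 -d 0 -v"
echo "$cmd"
```

If the check reports `fits: no`, lowering `activations` is the simplest adjustment.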

#### Run inference on a set of inputs

Before running inference, convert the inputs to the appropriate format based on the input size and type. For worked examples, see these [Jupyter notebook examples](https://github.com/quic/cloud-ai-sdk/blob/1.10/tutorials/NLP/Model-Onboarding-Beginner).
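
As a rough illustration of the size requirement, the sketch below creates a zero-filled placeholder file and checks its byte count. It assumes the input is a flat binary buffer; the input name `input_ids`, the 1 x 128 shape, and the 8-byte element size are hypothetical and should be replaced with the values for your model. The script prints the `qaic-runner` command instead of running it.

```shell
# Sketch: create a placeholder raw input file and verify its size.
# Assumptions (hypothetical): one input named input_ids, shape 1 x 128,
# 8 bytes per element, stored as a flat binary buffer.
workdir=$(mktemp -d)
batch=1
seq_len=128
bytes_per_element=8
expected_bytes=$((batch * seq_len * bytes_per_element))

# Zero-filled placeholder; real data would come from your preprocessing step.
head -c "$expected_bytes" /dev/zero > "$workdir/input_ids.raw"

actual_bytes=$(wc -c < "$workdir/input_ids.raw" | tr -d ' ')
if [ "$actual_bytes" -eq "$expected_bytes" ]; then
    echo "input_ids.raw has the expected size: $actual_bytes bytes"
else
    echo "size mismatch: expected $expected_bytes, got $actual_bytes"
fi

# The output directory must exist before qaic-runner writes to it.
mkdir -p "$workdir/outputs"

# Printed only; repeat -i once per input file for multi-input networks.
echo "/opt/qti-aic/exec/qaic-runner -t /path/to/qpc -i $workdir/input_ids.raw --write-output-dir $workdir/outputs"
```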

#### Generate dumps for latency capture

The following example command shows how to generate latency stats:

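    /opt/qti-aic/exec/qaic-runner -t ./BERT_LARGE -a 8 -S 1 -d 0 \
        --aic-profiling-format latency --aic-profiling-out-dir ./BERT_LARGE_STATS \
        --aic-profiling-start-iter 100 --aic-profiling-num-samples 99999 --time 20

To run on your own data instead of random inputs, add `-i inputFiles/input.raw` to the command. Note that the parameter table in this section lists `--aic-profiling-format` as deprecated, and that `--aic-profiling-type` also accepts `latency`.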

- `--aic-profiling-out-dir`: Output directory for latency capture (must exist before this command is run).
- `--aic-profiling-start-iter`: Set this value high enough to start capturing samples after device warmup.
- `--aic-profiling-num-samples`: Number of samples to capture. Can be set greater than the number of inferences.

For a complete example, see this [Jupyter notebook](https://github.com/quic/cloud-ai-sdk/tree/1.10/tutorials/NLP/Profiler-Intermediate).
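
Because the output directory must exist beforehand, it can help to wrap the capture in a small script. The sketch below is one possible arrangement, not a prescribed workflow: it uses `--aic-profiling-type latency` in place of `--aic-profiling-format`, which the parameter table marks as deprecated, and it falls back to printing the command when `qaic-runner` is not installed at the assumed path.

```shell
# Sketch: create the profiling output directory first, then run the latency capture.
qpc_dir=./BERT_LARGE
stats_dir=./BERT_LARGE_STATS
runner=/opt/qti-aic/exec/qaic-runner

# The output directory must exist before qaic-runner starts.
mkdir -p "$stats_dir"

# Same options as the example above, with --aic-profiling-type in place of
# the deprecated --aic-profiling-format.
set -- -t "$qpc_dir" -a 8 -S 1 -d 0 \
    --aic-profiling-type latency \
    --aic-profiling-out-dir "$stats_dir" \
    --aic-profiling-start-iter 100 \
    --aic-profiling-num-samples 99999 \
    --time 20

if [ -x "$runner" ]; then
    "$runner" "$@"
else
    # Dry run on machines without the SDK installed.
    echo "would run: $runner $*"
fi
```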

### Parameters and default values

The following table lists the `qaic-runner` parameters and default values:

<details class="sd-sphinx-override sd-dropdown sd-card sd-mb-3 sd-border-1">
<summary class="sd-summary-title sd-card-header sd-bg-light sd-bg-text-light">
<code>qaic-runner</code> parameters
</summary><div class="sd-summary-content sd-card-body docutils">
<table class="docutils align-default">
<colgroup>
<col style="width: 46.0%">
<col style="width: 46.0%">
<col style="width: 8.0%">
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p class="sd-card-text">Parameter</p></th>
<th class="head"><p class="sd-card-text">Description</p></th>
<th class="head"><p class="sd-card-text">Default</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-d,</span> <span class="pre">--aic-device-id</span> <span class="pre">&lt;id&gt;</span></code></p></td>
<td><p class="sd-card-text">Specify AIC device ID.</p></td>
<td><p class="sd-card-text">0</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-D,</span> <span class="pre">--dev-list</span> <span class="pre">&lt;qid&gt;[:&lt;qid&gt;]</span></code></p></td>
<td><p class="sd-card-text">Map of device IDs for a multi-device network.</p></td>
<td><p class="sd-card-text">0[:1]</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-d,</span> <span class="pre">--aic-device-id</span> <span class="pre">&lt;id&gt;</span></code></p></td>
<td><p class="sd-card-text">AIC device ID, default Auto-pick</p></td>
<td><p class="sd-card-text">0</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-D,</span> <span class="pre">--aic-device-map</span> <span class="pre">&lt;qid&gt;[:&lt;qid&gt;]</span></code></p></td>
<td><p class="sd-card-text">Map of device IDs for a multi-device network.</p></td>
<td><p class="sd-card-text">0[:1]</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-t,</span> <span class="pre">--test-data</span> <span class="pre">&lt;path&gt;</span></code></p></td>
<td><p class="sd-card-text">Location of program binaries</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-i,</span> <span class="pre">--input-file</span> <span class="pre">&lt;path&gt;</span></code></p></td>
<td><p class="sd-card-text">Input filename from which to load input data. Specify multiple times for each input file. If no -i is given, look for available inputs in bindings.json at the -t directory. If bindings.json is not available, random input will be generated.</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-n,</span> <span class="pre">--num-iter</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Number of iterations</p></td>
<td><p class="sd-card-text">40</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--time</span> <span class="pre">&lt;t&gt;</span></code></p></td>
<td><p class="sd-card-text">Duration (in seconds) for which to submit inferences</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-l,</span> <span class="pre">--live-reporting</span></code></p></td>
<td><p class="sd-card-text">Enable live reporting at 1-second intervals</p></td>
<td><p class="sd-card-text">off</p></td>
</tr>
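<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-r,</span> <span class="pre">--live-reporting-period</span></code></p></td>
<td><p class="sd-card-text">Set the live reporting period (in ms)</p></td>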
<td><p class="sd-card-text">1000</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-s,</span> <span class="pre">--stats</span></code></p></td>
<td><p class="sd-card-text">Enable Live Profiling Stats reporting periodically at 1 sec interval</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-a,</span> <span class="pre">--aic-num-of-activations</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Number of activations</p></td>
<td><p class="sd-card-text">1</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-start-iter</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling Start Iteration</p></td>
<td><p class="sd-card-text">0</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-start-delay</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling Start delay (in milliseconds). Profiling will start after given delay period has elapsed</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-num-samples</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling Num Samples to save to file</p></td>
<td><p class="sd-card-text">1</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-format</span> <span class="pre">&lt;level&gt;</span></code></p></td>
<td><p class="sd-card-text">Deprecated</p></td>
<td><p class="sd-card-text">DEF</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-type</span> <span class="pre">&lt;type&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling type: 'stats' | 'trace' | 'latency' for legacy profiling, and 'trace_stream' | 'latency_stream' for stream profiling. Set multiple times for multiple formats</p></td>
<td><p class="sd-card-text">none</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-duration</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling duration to run profiling for (in ms). After starting profiling, it will stop at the expiry of profiling duration</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-sampling-rate</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling sampling rate [full/half/fourth/eighth/sixteenth]. Programs will generate profiling samples at the requested rate. To profile all samples select full, for every second sample select half and so on</p></td>
<td><p class="sd-card-text">full</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-reporting-rate</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Profiling report generation rate (in ms) [500/1000/2000/4000]. Profiling report will be generated at every requested interval for profiling duration</p></td>
<td><p class="sd-card-text">500</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-profiling-out-dir</span> <span class="pre">&lt;path&gt;</span></code></p></td>
<td><p class="sd-card-text">Location to save files, dir should exist and be writable</p></td>
<td><p class="sd-card-text">‘.’</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--write-output-start-iter</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Write outputs start iteration</p></td>
<td><p class="sd-card-text">0</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--write-output-num-samples</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Number of outputs to write</p></td>
<td><p class="sd-card-text">1</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--write-output-dir</span> <span class="pre">&lt;path&gt;</span></code></p></td>
<td><p class="sd-card-text">Location to save output files, dir should exist and be writable</p></td>
<td><p class="sd-card-text">‘.’</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-lib-path</span> <span class="pre">DEPRECATED</span></code></p></td>
<td><p class="sd-card-text">Deprecated, please set env variable QAIC_LIB to the full path of the custom library, by default loads libQAic.so from install location</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-batch-input-directory</span></code></p></td>
<td><p class="sd-card-text">Batch mode: process all files from input directory. Only the networks with single input file are currently supported</p></td>
<td><p class="sd-card-text">DEF</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-batch-input-file-list</span></code></p></td>
<td><p class="sd-card-text">Batch mode: Specify an input file containing comma-separated absolute path for buffers. Each line is 1 inference and must have number and size of buffers required by program</p></td>
<td><p class="sd-card-text">DEF</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--aic-batch-max-memory</span> <span class="pre">&lt;mb&gt;</span></code></p></td>
<td><p class="sd-card-text">Batch mode: Limit memory usage when loading files, provide parameter in Mb</p></td>
<td><p class="sd-card-text">1024</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--submit-timeout</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Time to wait for an inference request completion on kernel. default 0 ms. When 0, kernel defaults to 5000ms</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--submit-retry-count</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Number of wait-call retries when an inference request times out.</p></td>
<td><p class="sd-card-text">5</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--unbound-random</span></code></p></td>
<td><p class="sd-card-text">When populating random values in a buffer, do not consider the input buffer format and fill each byte with a random value between 0 and 255. This can result in unexpected behavior from certain networks.</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--dump-input-buffers</span></code></p></td>
<td><p class="sd-card-text">Dump input buffers used in benchmarking mode</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-S,</span> <span class="pre">--set-size</span> <span class="pre">&lt;num&gt;</span></code></p></td>
<td><p class="sd-card-text">Set Size for inference loop execution, min:1</p></td>
<td><p class="sd-card-text">10</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-T,</span> <span class="pre">--aic-threads-per-queue</span></code></p></td>
<td><p class="sd-card-text">Number of threads per queue</p></td>
<td><p class="sd-card-text">4</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">--auto-batch-input</span></code></p></td>
<td><p class="sd-card-text">Automatically batch inputs to meet batchsize requirements of network. Inputs should be for Batch size 1</p></td>
<td><p class="sd-card-text">1</p></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-p,</span> <span class="pre">--pre-post-processing</span></code></p></td>
<td><p class="sd-card-text">Pre-post processing [on|off]</p></td>
<td><p class="sd-card-text">on</p></td>
</tr>
<tr class="row-even"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-v,</span> <span class="pre">--verbose</span></code></p></td>
<td><p class="sd-card-text">Verbose log from program</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p class="sd-card-text"><code class="docutils literal notranslate"><span class="pre">-h,</span> <span class="pre">--help</span></code></p></td>
<td><p class="sd-card-text">help</p></td>
<td></td>
</tr>
</tbody>
</table>
</div>
</details>
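
The file list described for `--aic-batch-input-file-list` can be generated with a short script. The following sketch assumes a hypothetical two-input network; the directory, the file names, and the empty placeholder files are illustrative only. Real files must hold buffers of the sizes the program requires.

```shell
# Sketch: build a file list in the format described for --aic-batch-input-file-list.
# Assumptions (hypothetical): a two-input network and three inferences.
input_dir=$(mktemp -d)            # mktemp returns an absolute path
list_file="$input_dir/file_list.txt"
: > "$list_file"

for i in 0 1 2; do
    ids="$input_dir/input_ids_$i.raw"
    mask="$input_dir/attention_mask_$i.raw"
    # Empty placeholders; real files hold buffers of the size the program requires.
    : > "$ids"
    : > "$mask"
    # One line per inference: comma-separated absolute paths, one per input buffer.
    printf '%s,%s\n' "$ids" "$mask" >> "$list_file"
done

line_count=$(wc -l < "$list_file" | tr -d ' ')
echo "wrote $line_count inference lines to $list_file"
cat "$list_file"
```

The table does not show the argument form for this option, so confirm with `qaic-runner -h` how the list file is passed.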

## Profile inference time

Cloud AI supports both system-level and device-level profiling to help you identify performance bottlenecks. System profiling requires no changes to the QPC, whereas device profiling requires that you recompile the model.

### Profile the system

System-level profiling breaks down the inference time across the application, the Linux runtime, the kernel-mode driver, and device processing. This breakdown shows how much of each inference is spent on the host versus the device, which you can use to optimize your application or the model.

See the [Profiler notebook](https://github.com/quic/cloud-ai-sdk/tree/1.20/tutorials/NLP/Profiler-Intermediate) for an example of the complete system-level profiling workflow using `qaic-runner`.

Profiling using the [Python APIs](https://docs.qualcomm.com/doc/80-99100-3/topic/index_Python-API.html#reference-to-python-api) is also supported.

### Profile the device

Device-level profiling identifies bottlenecks in inference execution on the device. This profiling requires a good understanding of the AI core and SoC architecture. Three key features are of interest to developers:

- Memory metrics: Provides a compiler estimate of the usage of on-board DDR vs. VTCM (vector tightly coupled memory) for a model.
- Summary view: Provides a histogram of the operations, the total time taken by every operation, where the operands are stored (DDR vs. VTCM), and the effective usage of the individual IP blocks in the AI cores. Use this view for debugging only, because it can impact performance depending on the size of the model.
- Timeline view: Provides a timeline of all the operations executing across all IP blocks from the start of an inference to the end. Use this view primarily to zoom in on operations and understand bottlenecks.

See the [Profiler notebooks](https://github.com/quic/cloud-ai-sdk/tree/1.20/tutorials/NLP/Profiler-Intermediate) for an example of the complete device-level profiling workflow for a model using `qaic-exec` and `qaic-runner`.

## Next steps

- To use the Python APIs to execute the QPC and run inference, refer to [Python API](https://docs.qualcomm.com/doc/80-99100-3/topic/index_Python-API.html#reference-to-python-api).
- To use the C++ APIs to execute the QPC and run inference, refer to [C++ API](https://docs.qualcomm.com/doc/80-99100-3/topic/index_Cpp-API.html#reference-to-cpp-api).
- To use ONNX Runtime to execute the QPC and run inference, see [using Qualcomm Cloud AI 100 as execution provider in ONNX Runtime](https://docs.qualcomm.com/doc/80-99100-3/topic/index_onnxruntime.html).

Last Published: May 01, 2026
