# Tutorial - Benchmarking the Qualcomm® AI Engine Direct Delegate

This tutorial demonstrates how to benchmark models running through the Qualcomm® AI Engine Direct
Delegate using the TFLite benchmark\_model application.

## Prerequisites

The following list of prerequisites must be met before starting this tutorial:

1. Complete the QNN Setup. Note that this is different from the Delegate Setup section.
The QNN Setup section can be accessed through $QNN\_SDK\_ROOT/docs/QNN/index.html.
After opening index.html, the Setup section for QNN appears on the left side.
2. A Qualcomm device with an ADB connection.
3. Read the [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/overview.html) and Setup pages to
understand the different components of the Qualcomm® AI Engine Direct Delegate.
4. The TFLite [native benchmark_model application](https://www.tensorflow.org/lite/performance/measurement#native_benchmark_binary).
It is possible to use a precompiled version of the *benchmark\_model*
application through the External Delegate interface.
5. Set the environment variable `TENSORFLOW_HOME` to point to the location where the TensorFlow package is installed.
TensorFlow 2.10.1 has been tested and is compatible with this tutorial.
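
Before continuing, it can help to confirm that the environment is in the expected state. The following is a minimal sketch, not part of the SDK: `check_prereqs` is a hypothetical helper name, and it only checks that the two environment variables mentioned in this tutorial are set and that `adb` can be found on `PATH`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the SDK): report unmet prerequisites.
check_prereqs() {
    rc=0
    # $QNN_SDK_ROOT is used by the commands in this tutorial.
    if [ -z "${QNN_SDK_ROOT:-}" ]; then
        echo "QNN_SDK_ROOT is not set"
        rc=1
    fi
    # $TENSORFLOW_HOME should point to the TensorFlow installation.
    if [ -z "${TENSORFLOW_HOME:-}" ]; then
        echo "TENSORFLOW_HOME is not set"
        rc=1
    fi
    # adb is needed to reach the device.
    if ! command -v adb >/dev/null 2>&1; then
        echo "adb not found on PATH"
        rc=1
    fi
    return "$rc"
}
```

Calling `check_prereqs` prints one line per problem found and returns a non-zero status if anything is missing. It does not verify that a device is actually connected.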

## Setup

This tutorial will use the same *inception\_v3\_quant* model used in
Tutorial qtld-net-run Setup to perform benchmarking.

Follow the instructions below to download the model and un-tar it. If you have
already performed this setup, these commands can be skipped.

    $ python3 $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d
    $ python3 $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/scripts/convert_inceptionv3_tflite.py

There should be a file called *inception\_v3\_quant.tflite* under $QNN\_SDK\_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant. This
is the model that will be used to run benchmarking.

Push the *benchmark\_model* application, the TFLite model, the Qualcomm® AI Engine Direct Delegate, and
the Qualcomm® AI Engine Direct backend libraries to the device using ADB.

    $ adb shell mkdir -p /data/local/tmp/qnn_delegate/inception_v3_quant
    $ adb push <PATH_TO_BENCHMARK_MODEL>/benchmark_model /data/local/tmp/qnn_delegate/
    $ adb push inception_v3_quant.tflite /data/local/tmp/qnn_delegate/inception_v3_quant/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnTFLiteDelegate.so /data/local/tmp/qnn_delegate/
    # push QNN libraries
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV69Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV73Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v69/unsigned/libQnnHtpV69Skel.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so /data/local/tmp/qnn_delegate/

Note that input data for the model does not need to be pushed to the device,
because the *benchmark\_model* application automatically generates random input
data for benchmarking.
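
A failed `adb push` is easy to miss in a long command list, so it can be useful to check that the host-side files exist first. The sketch below assumes the same library list as the push commands above; `missing_qnn_files` is a hypothetical helper name, and it only inspects the host file system, not the device.

```shell
#!/bin/sh
# Hypothetical helper: print each file from the push list above that is
# missing under the given SDK root (normally "$QNN_SDK_ROOT").
missing_qnn_files() {
    sdk_root=$1
    # Delegate and backend libraries that do not depend on the HTP version.
    for name in libQnnTFLiteDelegate.so libQnnSystem.so libQnnHtp.so libQnnHtpPrepare.so; do
        [ -f "$sdk_root/lib/aarch64-android/$name" ] ||
            echo "$sdk_root/lib/aarch64-android/$name"
    done
    # Stub and skel libraries for the HTP versions listed in this tutorial.
    for v in 68 69 73; do
        stub="$sdk_root/lib/aarch64-android/libQnnHtpV${v}Stub.so"
        skel="$sdk_root/lib/hexagon-v${v}/unsigned/libQnnHtpV${v}Skel.so"
        [ -f "$stub" ] || echo "$stub"
        [ -f "$skel" ] || echo "$skel"
    done
}
```

Running `missing_qnn_files "$QNN_SDK_ROOT"` prints nothing when every file in the list is present.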

## Running the Benchmark

Run the following command to benchmark the *inception\_v3\_quant* model.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
                 /data/local/tmp/qnn_delegate/benchmark_model \
                 --graph=inception_v3_quant.tflite \
                 --external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
                 --external_delegate_options="backend_type:htp"'

To enable per-operator profiling measurements from the benchmark, add
the `--enable_op_profiling=true` *benchmark\_model* option along with the
`profiling:2` delegate option. See
[External Delegate Options](https://docs.qualcomm.com/doc/80-63442-10/topic/options.html#external-delegate-options) for more information. Run the following
command to get per-operator profiling measurements. You can also save the results to a CSV file
with the `profiling_output_csv_file` option of *benchmark\_model*.

Warning

The profiling behavior of the Qualcomm® AI Engine Direct Delegate is subject to change in the near future. Please use it with caution.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
                 /data/local/tmp/qnn_delegate/benchmark_model \
                 --graph=inception_v3_quant.tflite \
                 --external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
                 --external_delegate_options="backend_type:htp;profiling:2" \
                 --enable_op_profiling=true'

Note that the *benchmark\_model* application displays the unit as a time measurement,
whereas the HTP backend captures cycle events.

The HTP backend provides performance modes that can be configured to achieve
the best performance. Adding the `htp_performance_mode:1` delegate option
enables the maximum performance mode. Details on the other performance mode
options can be found in [External Delegate Options](https://docs.qualcomm.com/doc/80-63442-10/topic/options.html#external-delegate-options).
Run the following command to enable the maximum performance mode.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
                 /data/local/tmp/qnn_delegate/benchmark_model \
                 --graph=inception_v3_quant.tflite \
                 --external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
                 --external_delegate_options="backend_type:htp;htp_performance_mode:1"'

Note that this option is only for the HTP backend. Other backends handle
performance mode configurations internally.
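
In the commands above, the value of `--external_delegate_options` is a list of `key:value` pairs separated by `;`. When scripting several benchmark variants, assembling that string by hand is error-prone. The sketch below shows one possible way to build it; `join_delegate_options` is a hypothetical helper name, and it does not validate option names or values.

```shell
#!/bin/sh
# Hypothetical helper: join "key:value" arguments with ';', the separator
# used in the --external_delegate_options values shown in this tutorial.
join_delegate_options() {
    joined=""
    for opt in "$@"; do
        if [ -z "$joined" ]; then
            joined=$opt
        else
            joined="$joined;$opt"
        fi
    done
    printf '%s\n' "$joined"
}
```

For example, `join_delegate_options backend_type:htp htp_performance_mode:1` prints `backend_type:htp;htp_performance_mode:1`, which can then be passed to *benchmark\_model*.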

Congratulations, you have just benchmarked the Qualcomm® AI Engine Direct Delegate!

## Generating a Model Cache or Restoring from One

By specifying `model_token` and `cache_dir` in `--external_delegate_options`, the
model passed in through the `--graph` option will either be saved to a
folder for future use, or loaded from the cache file if that file is present.
Building on the example above:

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
                 /data/local/tmp/qnn_delegate/benchmark_model \
                 --graph=inception_v3_quant.tflite \
                 --external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
                 --external_delegate_options="backend_type:htp;htp_performance_mode:1;cache_dir:/data/local/tmp/;model_token:qnn_delegate_model"'

If the cache directory doesn’t exist, model caching will fail.
If it exists but contains no caching data, model caching operates in *SAVE MODE*, meaning
the prepared model is saved to cache files in that directory. These files are created by the Qualcomm® AI Engine Direct Delegate.
The effect of caching can be seen by executing `benchmark_model`
again with the same options: its console output should indicate that caching is operating
in *RESTORE MODE*.

The activation of the model caching feature is logged as INFO / WARNING logs.

One way to tell whether the model is being restored from the cache file, rather than prepared from the TFLite
model file, is to compare the time taken to initialize the session. The time appears on the
`Initialized session` line of the console output. For moderate or large models, there should be a noticeable
difference.
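
The comparison described above can be scripted by extracting the initialization time from the console output of two consecutive runs. The sketch below assumes the line has the form `Initialized session in <number>ms`, which may differ between *benchmark\_model* versions; `session_init_ms` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read benchmark_model console output on stdin and print
# the number from the first "Initialized session in <number>ms" line.
# The exact line format is an assumption; adjust the pattern if it differs.
session_init_ms() {
    sed -n 's/.*Initialized session in \([0-9.][0-9.]*\) *ms.*/\1/p' | head -n 1
}

# Possible usage, with the benchmark command from this tutorial:
#   first=$(adb shell '<benchmark command>' 2>&1 | session_init_ms)
#   second=$(adb shell '<benchmark command>' 2>&1 | session_init_ms)
#   echo "save run: ${first} ms, restore run: ${second} ms"
```

If the helper prints nothing, the expected line was not found in the output, and the pattern likely needs to be adapted.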

Note that the Qualcomm® AI Engine Direct Delegate is designed to continue serving inference
requests even when the caching feature fails. Here are some of the ways
that the caching feature can fail:

- Files in `cache_dir` are not readable when restoring, or not writable when saving
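
Because the delegate keeps running when caching fails, a missing or inaccessible cache directory can go unnoticed. The sketch below checks the directory on the device before benchmarking. `cache_dir_state` and the `ADB_SHELL` override are hypothetical names introduced for illustration. The check covers only the directory itself, not the individual cache files inside it.

```shell
#!/bin/sh
# Hypothetical helper: report whether a cache directory on the device exists
# and is readable and writable. ADB_SHELL is a hypothetical override for the
# command prefix used to run the check (default: "adb shell").
cache_dir_state() {
    dir=$1
    # $ADB_SHELL is left unquoted on purpose so "adb shell" splits into words.
    # Paths containing single quotes are not supported by this sketch.
    ${ADB_SHELL:-adb shell} "if [ ! -d '$dir' ]; then echo missing; elif [ ! -w '$dir' ] || [ ! -r '$dir' ]; then echo not-accessible; else echo ok; fi"
}
```

For the example above, `cache_dir_state /data/local/tmp/` should print `ok` before `cache_dir:/data/local/tmp/` is passed to the delegate.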

Last Published: Jul 02, 2026
