# Tutorial - Running Inference Using the Qualcomm® AI Engine Direct Delegate

This tutorial demonstrates how to run the TFLite *inception\_v3\_quant* model
using the Qualcomm® AI Engine Direct Delegate on the HTP backend.

## Prerequisites

The following list of prerequisites must be met before starting this tutorial:

1. Complete the QNN Setup. Note that this is different from the Delegate Setup section.
The QNN Setup section can be accessed through $QNN\_SDK\_ROOT/docs/QNN/index.html.
After opening index.html, the Setup section for QNN appears on the left side.
2. A Qualcomm device with an ADB connection.
3. Read the [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/overview.html) and Setup pages to
understand the different components of the Qualcomm® AI Engine Direct Delegate.
4. A python3 environment with the *numpy* package installed.
5. Set the environment variable `TENSORFLOW_HOME` to point to the location where the TensorFlow package is installed.
TensorFlow 2.10.1 has been tested and is compatible with this tutorial.
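
The host-side prerequisites above can be checked with a short script before starting. The following is a minimal sketch, not part of the SDK: the function name `missing_prerequisites` is hypothetical, and it only verifies that `QNN_SDK_ROOT` and `TENSORFLOW_HOME` point to existing directories, that *numpy* is importable, and that `adb` is on the `PATH`. It does not verify the TensorFlow version or the device connection.

```python
import importlib.util
import os
import shutil


def missing_prerequisites(env=None):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    # Environment variables that this tutorial refers to.
    for name in ("QNN_SDK_ROOT", "TENSORFLOW_HOME"):
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name} does not point to a directory: {path}")
    # numpy must be importable in the current python3 environment.
    if importlib.util.find_spec("numpy") is None:
        problems.append("numpy is not installed")
    # adb must be on PATH to reach the device.
    if shutil.which("adb") is None:
        problems.append("adb was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```

Running the script prints one `MISSING:` line per failed check and prints nothing when every check passes.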

## Setup

In this tutorial, the *inception\_v3\_quant* model will be used to run inference
with the delegate. The Qualcomm® AI Engine Direct Delegate comes with some artifacts for this model
under `$QNN_SDK_ROOT/examples/Models/InceptionV3`.

First, to get the model file and images, run:

    $ python3 $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d

Notice the following items under `$QNN_SDK_ROOT/examples/Models/InceptionV3`:

- `data/cropped`: the jpg and the preprocessed versions of the images.
- `data/target_raw_list.txt`: The list of paths of the preprocessed images.
- `tensorflow/inception_v3_2016_08_28_frozen_opt.pb`: the frozen TensorFlow model, which is converted to TFLite in the next step.

Run the following script to convert inception\_v3\_2016\_08\_28\_frozen\_opt.pb into inception\_v3\_quant.tflite.

    $ python3 $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/scripts/convert_inceptionv3_tflite.py

The output *inception\_v3\_quant.tflite* is located at $QNN\_SDK\_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant.
This is the model used to run inference in the rest of this tutorial.

Next, push the model, the `cropped` directory, and `target_raw_list.txt` to the device using adb.

    $ adb shell mkdir -p /data/local/tmp/qnn_delegate/inception_v3_quant
    $ adb push $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/inception_v3_quant.tflite /data/local/tmp/qnn_delegate/inception_v3_quant/
    $ adb push $QNN_SDK_ROOT/examples/Models/InceptionV3/data/cropped /data/local/tmp/qnn_delegate/inception_v3_quant/
    $ adb push $QNN_SDK_ROOT/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/qnn_delegate/inception_v3_quant/

This tutorial uses the qtld-net-run application to run inference
through the delegate. Push this application and the Qualcomm® AI Engine Direct Delegate to the
device.

    $ adb push $QNN_SDK_ROOT/bin/aarch64-android/qtld-net-run /data/local/tmp/qnn_delegate/

Finally, push the Qualcomm® AI Engine Direct HTP backend libraries to the device.
Notice that for the HTP and DSP backends, there are two libraries that need to be pushed,
the Stub library that will run on the CPU and the Skel library that will run on
the HTP or DSP.

Here is an example for the HTP backend.

    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV69Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV73Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV75Stub.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v69/unsigned/libQnnHtpV69Skel.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so /data/local/tmp/qnn_delegate/
    $ adb push $QNN_SDK_ROOT/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so /data/local/tmp/qnn_delegate/
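
The Stub and Skel file names above follow a regular pattern per Hexagon architecture, so the push list can also be generated. The sketch below is an illustration only: the helper `htp_push_commands` is hypothetical, and it simply reproduces the path patterns shown above. Which architecture a given device needs is not covered here, so the architecture list is left to the caller.

```python
def htp_push_commands(sdk_root, archs, dest="/data/local/tmp/qnn_delegate/"):
    """Build adb push commands: common HTP libraries plus one Stub/Skel pair per arch."""
    host_lib = f"{sdk_root}/lib/aarch64-android"
    # Libraries pushed regardless of the Hexagon architecture.
    sources = [
        f"{host_lib}/{name}"
        for name in ("libQnnSystem.so", "libQnnHtp.so", "libQnnHtpPrepare.so")
    ]
    for arch in archs:  # e.g. "v73"
        tag = arch.upper()  # "v73" -> "V73"
        # Stub library: runs on the CPU.
        sources.append(f"{host_lib}/libQnnHtp{tag}Stub.so")
        # Skel library: runs on the HTP.
        sources.append(f"{sdk_root}/lib/hexagon-{arch}/unsigned/libQnnHtp{tag}Skel.so")
    return [["adb", "push", src, dest] for src in sources]


if __name__ == "__main__":
    import os

    root = os.environ.get("QNN_SDK_ROOT", "$QNN_SDK_ROOT")
    for command in htp_push_commands(root, ["v68", "v69", "v73", "v75"]):
        print(" ".join(command))
```

With all four architectures, the helper yields the same eleven files as the commands above, grouped per architecture instead of Stub-first. Each returned list can be passed to `subprocess.run` to perform the push.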

## Running the inception\_v3\_quant Model

Now that all artifacts are on the device, inference can be run on the
*inception\_v3\_quant* model using the qtld-net-run application.

Run the following command to execute inference with qtld-net-run. See the
qtld-net-run tool documentation for a reference on all the supported command line
options.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
                 /data/local/tmp/qnn_delegate/qtld-net-run \
                 --model inception_v3_quant.tflite \
                 --input target_raw_list.txt \
                 --output output \
                 --backend htp'

The output should look something like the following. If there are any errors,
revisit the instructions above.

    TFLite model: [inception_v3_quant.tflite]
    Input list file: [target_raw_list.txt]
    Total number of inferences: [4]
    Using QNN Backend: [htp]
    Loaded model successfully.

    INFO: Initialized TensorFlow Lite runtime.
    INFO: TfLiteQnnDelegate delegate: 128 nodes delegated out of 128 nodes with 1 partitions.

    === Pre-invoke Interpreter State ===
    Line 720: Allocated 1 input tensor(s)
    Line 730: Allocated 1 output tensor(s)

    === Invoking Interpreter ===
    Line 894: About to fout.write() output tensors with 4004 bytes
    === Invoking Interpreter ===
    Line 894: About to fout.write() output tensors with 4004 bytes
    === Invoking Interpreter ===
    Line 894: About to fout.write() output tensors with 4004 bytes
    === Invoking Interpreter ===
    Line 894: About to fout.write() output tensors with 4004 bytes

Notice the line *X nodes delegated out of Y nodes with N partitions*. This is an
info log from the TFLite framework stating how many nodes in the graph were
successfully delegated to the Qualcomm® AI Engine Direct Delegate. If the Qualcomm® AI Engine Direct Delegate does not support
an operator in the model, it will not be delegated but will instead fall back to
other supported runtimes, creating multiple partitions.
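
When the log is captured to a file or a string, the delegation line can be checked automatically. The following is a minimal sketch that assumes the log wording matches the sample output in this tutorial. The function name `delegation_summary` is hypothetical, and the regular expression may need adjusting if the message text differs in other versions.

```python
import re

# Matches e.g. "128 nodes delegated out of 128 nodes with 1 partitions".
_DELEGATED = re.compile(r"(\d+) nodes delegated out of (\d+) nodes with (\d+) partitions")


def delegation_summary(log_text):
    """Return (delegated, total, partitions) from the first matching line, or None."""
    match = _DELEGATED.search(log_text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


line = "INFO: TfLiteQnnDelegate delegate: 128 nodes delegated out of 128 nodes with 1 partitions."
delegated, total, partitions = delegation_summary(line)
print(f"{delegated}/{total} nodes delegated, fully delegated: {delegated == total}")
# prints: 128/128 nodes delegated, fully delegated: True
```

A result where `delegated` is smaller than `total` indicates that some operators fell back to another runtime.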

After qtld-net-run has completed running, the output results can be pulled from
the device and inspected.

    $ cd $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant
    $ adb pull /data/local/tmp/qnn_delegate/inception_v3_quant/output ./

Notice that the output folder contains four result folders, one for
each input image.

QNN provides a script, `show_inceptionv3_classifications.py`, to view the results.
Under $QNN\_SDK\_ROOT/examples/Models/InceptionV3/scripts, launch the script `convert_output.sh` to convert
the output directory into a format that `show_inceptionv3_classifications.py` can read.

    $ ./scripts/convert_output.sh

The converted output will be stored inside the folder `output_android`.
Next, execute `show_inceptionv3_classifications.py` with the following:

    $ python3 $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py \
                -i $QNN_SDK_ROOT/examples/Models/InceptionV3/data/cropped/raw_list.txt \
                -o $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/output_android/ \
                -l $QNN_SDK_ROOT/examples/Models/InceptionV3/data/imagenet_slim_labels.txt

The classification results should look similar to the following:

    ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/trash_bin.raw   0.695312 413 ashcan
    ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/plastic_cup.raw 0.996094 648 measuring cup
    ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/notice_sign.raw 0.175781 459 brass
    ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/chairs.raw      0.410156 832 studio couch
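
To compare a run against the reference results, the printed lines can be parsed into structured records. This sketch assumes each line has the four whitespace-separated fields seen in the sample (path, probability, class index, label) and that paths contain no spaces. The function name `parse_classifications` and the 0.5 threshold are arbitrary choices for illustration, not part of the SDK.

```python
def parse_classifications(text):
    """Parse lines of the form '<raw path> <probability> <class index> <label>'."""
    results = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # the label itself may contain spaces
        if len(parts) != 4:
            continue  # skip blank lines and lines with too few fields
        path, probability, index, label = parts
        results.append((path, float(probability), int(index), label.strip()))
    return results


sample = """\
data/cropped/trash_bin.raw   0.695312 413 ashcan
data/cropped/notice_sign.raw 0.175781 459 brass
"""
for path, probability, index, label in parse_classifications(sample):
    # Flag predictions below an arbitrary, illustrative confidence threshold.
    flag = "" if probability >= 0.5 else "  (low confidence)"
    print(f"{label}: {probability:.2f}{flag}")
```

In the sample, the `notice_sign` entry is flagged because its top probability is only about 0.18.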

Congratulations, you have just run your first inference with the Qualcomm® AI Engine Direct Delegate!

## Get Profile Result from Inference

Run the following command to profile the *inception\_v3\_quant* model.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
                 /data/local/tmp/qnn_delegate/qtld-net-run \
                 --model inception_v3_quant.tflite \
                 --input target_raw_list.txt \
                 --output output \
                 --profiling 1 \
                 --profiling_output_dir profile_results \
                 --backend htp'

For other profiling options, refer to the qtld-net-run tool documentation.
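
The plain and profiling invocations differ only in the two profiling options, so both can be produced by one helper when scripting runs from the host. The following is a sketch under stated assumptions: the function name `net_run_shell_command` and its parameters are hypothetical, and it emits only the options that appear in this tutorial, with the same device paths.

```python
import shlex

DEVICE_DIR = "/data/local/tmp/qnn_delegate"


def net_run_shell_command(profiling=None, profiling_output_dir=None, backend="htp"):
    """Return the on-device shell command string used in this tutorial."""
    args = [
        f"{DEVICE_DIR}/qtld-net-run",
        "--model", "inception_v3_quant.tflite",
        "--input", "target_raw_list.txt",
        "--output", "output",
    ]
    # Profiling options are added only when requested.
    if profiling is not None:
        args += ["--profiling", str(profiling)]
    if profiling_output_dir is not None:
        args += ["--profiling_output_dir", profiling_output_dir]
    args += ["--backend", backend]
    steps = [
        # Library search paths for the CPU-side and DSP-side libraries.
        f"export LD_LIBRARY_PATH={DEVICE_DIR}/:$LD_LIBRARY_PATH",
        f'export ADSP_LIBRARY_PATH="{DEVICE_DIR}/"',
        f"cd {DEVICE_DIR}/inception_v3_quant/",
        shlex.join(args),
    ]
    return " && ".join(steps)


print(net_run_shell_command(profiling=1, profiling_output_dir="profile_results"))
```

The returned string can be passed to the device as a single argument, for example with `subprocess.run(["adb", "shell", command], check=True)`.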

Warning

The profiling behavior of the Qualcomm® AI Engine Direct Delegate is subject to change in the near future. Please use it with caution.

## View Profiling Result by qtld-profile-viewer

After qtld-net-run has completed running, the profiling output results can be pulled from
the device.

    $ adb pull /data/local/tmp/qnn_delegate/inception_v3_quant/profile_results/qnn_delegate_profiling_result.bin ./

The binary file can be converted into a .txt file with the following command.

    $ $QNN_SDK_ROOT/bin/aarch64-android/qtld-profile-viewer \
       --input_profile_data qnn_delegate_profiling_result.bin \
       --topK <topK> \
       --output ./profiling_output.txt \
       --num_warmup 1

The options are described below.

- `--input_profile_data`: A binary input file that contains the profiling result.
- `--topK`: Only used in detailed profiling mode. Number of events to be printed under the Top by Computation Time section. Default is 5.
- `--output`: An output text file that contains the human-readable profiling result. If it is not specified, the profiling output is printed to standard output.
- `--num_warmup`: Number of initializations/executions to be counted as warmups. Default is 1.

Last Published: Jun 04, 2026
