# Tutorial - Use Mixed-Precision Models with Qualcomm® AI Engine Direct Delegate

Floating-point models usually produce more accurate predictions than quantized models.
Quantized models, on the other hand, significantly reduce model size and
computational requirements, which lowers inference latency relative to the corresponding floating-point model.
To balance the accuracy of floating-point models against
the computational efficiency of quantized models, we can use a mixed-precision model.
This approach is a compromise: it achieves a reasonable level of accuracy while keeping much of the efficiency gain.

This tutorial demonstrates how to run a mixed-precision model with the Qualcomm® AI Engine Direct Delegate.
It also presents a MobileNet v3 case study that highlights the benefits of mixed-precision models.

## Prerequisites

The following list of prerequisites must be met before starting this tutorial:

1. To generate a mixed-precision model, refer to the
[Quantization Debugger](https://www.tensorflow.org/lite/performance/quantization_debugger) tutorial on the TensorFlow website.
2. Read the [Tutorial for qtld-net-run](https://docs.qualcomm.com/doc/80-63442-10/topic/tutorial_qtld_net_run.html) and the [Tutorial for benchmark\_model](https://docs.qualcomm.com/doc/80-63442-10/topic/tutorial_benchmark_model.html)
to understand how to run inference and benchmarks using the Qualcomm® AI Engine Direct Delegate.

## Running a mixed-precision model with qtld-net-run

[Tutorial for qtld-net-run](https://docs.qualcomm.com/doc/80-63442-10/topic/tutorial_qtld_net_run.html) demonstrates how to run the TFLite model using the Qualcomm® AI Engine Direct Delegate on the HTP backend.

Set `--htp_precision=1` when using a mixed-precision model.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH=/data/local/tmp/qnn_delegate/ &&
                 /data/local/tmp/qnn_delegate/qtld-net-run \
                 --model=/data/local/tmp/qnn_delegate/mix_precision_model.tflite \
                 --input=/data/local/tmp/qnn_delegate/input_list.txt \
                 --output=/data/local/tmp/qnn_delegate/tensor_dump_output \
                 --htp_precision=1 \
                 --backend=htp'
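
The command above expects the model and the input list to already be in `/data/local/tmp/qnn_delegate/` on the device. The following is a minimal sketch of staging those files and fetching the tensor dump afterwards with `adb push` and `adb pull`. The helper names (`run`, `stage_files`, `fetch_outputs`) and the `DRY_RUN` switch are assumptions made for illustration and are not part of the delegate tooling.

```shell
#!/bin/sh
# Sketch: stage files on the device and fetch the tensor dump afterwards.
# DEVICE_DIR matches the directory used in this tutorial; the helper names
# (run, stage_files, fetch_outputs) are hypothetical.
DEVICE_DIR=/data/local/tmp/qnn_delegate

# With DRY_RUN=1 (the default) each command is printed instead of executed,
# so the script can be checked without a connected device.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Push the model ($1) and the input list ($2) to the device directory.
stage_files() {
  run adb push "$1" "$DEVICE_DIR/"
  run adb push "$2" "$DEVICE_DIR/"
}

# Pull the output directory written by qtld-net-run to the host path $1.
fetch_outputs() {
  run adb pull "$DEVICE_DIR/tensor_dump_output" "$1"
}

stage_files mix_precision_model.tflite input_list.txt
fetch_outputs ./tensor_dump_output
```

Setting `DRY_RUN=0` before calling the helpers makes them execute the `adb` commands instead of printing them.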

## Running a mixed-precision model with benchmark\_model

[Tutorial for benchmark](https://docs.qualcomm.com/doc/80-63442-10/topic/tutorial_benchmark_model.html) demonstrates how to benchmark models
running through the Qualcomm® AI Engine Direct Delegate using the TFLite benchmark\_model application.

Set `htp_precision:1` in `--external_delegate_options` when using a mixed-precision model.

    $ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
                 export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
                 /data/local/tmp/qnn_delegate/benchmark_model \
                 --graph=mix_precision_model.tflite \
                 --external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
                 --external_delegate_options="backend_type:htp;htp_precision:1"'
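
The `--external_delegate_options` value is a list of `key:value` pairs separated by `;`. As a small sketch, the string can be assembled from individual pairs so that the precision setting is easy to toggle in a script. The function name `join_delegate_options` is hypothetical; only the two option keys shown in the command above are used.

```shell
#!/bin/sh
# Join "key:value" pairs with ';' to form the delegate options string.
# join_delegate_options is a hypothetical helper, not part of benchmark_model.
join_delegate_options() {
  out=""
  for kv in "$@"; do
    if [ -z "$out" ]; then
      out="$kv"
    else
      out="$out;$kv"
    fi
  done
  printf '%s\n' "$out"
}

OPTS=$(join_delegate_options "backend_type:htp" "htp_precision:1")
# Print the flag exactly as it would appear on the benchmark_model command line.
echo "--external_delegate_options=\"$OPTS\""
```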

## Experiments

Here, we use two experiments to show that the mixed-precision model produces more accurate
predictions than the quantized model and runs with lower latency than the corresponding floating-point model.

### Testing environment

Here is our testing environment.

- Base model: We download the MobileNet v3 model from TensorFlow Hub.
    - MobileNet v3: [https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/5](https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/5)
- Representative dataset:
    - The first 100 images of the TensorFlow dataset imagenet\_v2.
- Testing dataset:
    - The first 1000 images of the TensorFlow dataset imagenet\_v2.
- Testing device: We test the MobileNet v3 model at different precisions on Snapdragon 8+ Gen 1.
- TensorFlow version: v2.10.0

### Testing the top-K accuracy

We use top-K accuracy to verify that the mixed-precision model is more accurate than the fully quantized model.

The results show that the mixed-precision model has much higher top-K accuracy than the fully quantized model, although it remains below the floating-point model.

| 1000 test images | MobileNet v3 (full float) | MobileNet v3 (full quantization) | MobileNet v3 (mixed precision) |
| --- | --- | --- | --- |
| Top-1 accuracy on HTP (delegated by Qualcomm® AI Engine Direct) | 59.2% | 15.7% | **51.3%** |
| Top-1 accuracy on CPU | 59.3% | 19.5% | **53.7%** |
| Top-5 accuracy on HTP (delegated by Qualcomm® AI Engine Direct) | 83.2% | 33.0% | **74.1%** |
| Top-5 accuracy on CPU | 83.3% | 37.8% | **76.4%** |
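
Top-K accuracy counts a sample as correct when its true label appears among the K highest-ranked predictions. The sketch below computes it with `awk` from a text file. The file layout (one sample per line: the true label followed by the predicted labels in descending rank) and the function name `topk_accuracy` are assumptions made for illustration; this is not the format produced by qtld-net-run.

```shell
#!/bin/sh
# topk_accuracy K FILE
# FILE layout (hypothetical): "<true_label> <pred_1> <pred_2> ..." per line,
# with predictions ordered from most to least likely.
topk_accuracy() {
  k="$1"
  file="$2"
  awk -v k="$k" '
    NF > 0 {
      total++
      # Fields 2..k+1 hold the k highest-ranked predictions.
      for (i = 2; i <= k + 1 && i <= NF; i++) {
        if ($i == $1) { hits++; break }
      }
    }
    END { if (total > 0) printf "%.1f\n", 100 * hits / total }
  ' "$file"
}

# Four made-up samples, purely to exercise the function.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
3 3 1 2
5 1 5 2
7 1 2 3
9 2 3 9
EOF

topk_accuracy 1 "$SAMPLE"   # prints 25.0
topk_accuracy 3 "$SAMPLE"   # prints 75.0
rm -f "$SAMPLE"
```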

### Testing the benchmark

We employ the TFLite benchmark\_model application used in [this tutorial](https://docs.qualcomm.com/doc/80-63442-10/topic/tutorial_benchmark_model.html) to check if the mixed-precision model operates with less latency than the floating-point model.

Results show that the mixed-precision model runs with lower latency than the corresponding floating-point model.

| Inference timings (µs) | MobileNet v3 (full float) | MobileNet v3 (full quantization) | MobileNet v3 (mixed precision) |
| --- | --- | --- | --- |
| Init | 967562 | 576929 | **606613** |
| Inference (avg) | 4384.77 | 2944.59 | **3148.82** |
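
To put the average inference timings in relative terms, the sketch below computes the percentage change between two latencies with `awk`, using the values from the table. The function name `relative_change` is hypothetical.

```shell
#!/bin/sh
# relative_change BASE NEW: percentage change of NEW relative to BASE.
# A negative result means NEW is faster than BASE.
relative_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (b - a) / a }'
}

FLOAT_US=4384.77   # full float, Inference (avg)
QUANT_US=2944.59   # full quantization, Inference (avg)
MIXED_US=3148.82   # mixed precision, Inference (avg)

relative_change "$FLOAT_US" "$MIXED_US"   # prints -28.2
relative_change "$QUANT_US" "$MIXED_US"   # prints 6.9
```

In this measurement the mixed-precision model is about 28% faster than the floating-point model and about 7% slower than the fully quantized model.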

## Conclusion

On this page, we ran inference on the MobileNet v3 model at different precisions. The experiments show that the mixed-precision model has higher top-K accuracy than the quantized model and runs with lower latency than the corresponding floating-point model.

Last Published: Jun 04, 2026
