# Acceleration Support

The Qualcomm® AI Engine Direct Delegate provides acceleration on Qualcomm platforms using the Qualcomm® AI Engine Direct
SDK. The following sections describe the operators and features that the delegate
supports.

## GPU

In general, the Qualcomm® AI Engine Direct GPU backend supports float32 and float16 operators and
activations. The activation and operator compute precisions used by the accelerator core
can be configured; see the `TfLiteQnnDelegateGpuBackendOptions` structure.
For TFLite operator support, see [Supported Operators](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#supported-operators). The library for
this backend is *libQnnGpu.so*.

There is one option to set the accelerator's performance mode; see
`TfLiteQnnDelegateGpuBackendOptions` for all the option enums.
The performance levels of the modes are ordered as follows (from high to low): High > Normal > Low.
The Default mode follows the GPU backend's default setting.

## HTP

In general, the Qualcomm® AI Engine Direct HTP backend supports quantized 8-bit fixed-point activations and
operators. For TFLite operator support, see [Supported Operators](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#supported-operators); operator-specific
restrictions are listed in [Operator Restrictions](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#operator-restrictions). The libraries for
this backend are *libQnnHtp.so*, *libQnnHtpPrepare.so*, *libQnnHtp\*Stub.so* and *libQnnHtp\*Skel.so*.

There is one option to set the accelerator's performance mode; see
`TfLiteQnnDelegateHtpBackendOptions` for all the option enums.
When `performance_mode` is set, the delegate votes for the given
performance level at initialization. Users can change the vote on the fly through the C APIs;
see [C Interface](https://docs.qualcomm.com/doc/80-63442-10/topic/c_interface.html). Alternatively, the delegate supports another strategy,
`kHtpPerfCtrlAuto`, which votes automatically during inference and then
returns to a relaxed vote after the inference has completed.

The performance levels of the modes are ordered as follows (from high to low):
Burst > Sustained High Performance > High Performance > Balanced > Low Balanced > High Power Saver > Power Saver > Low Power Saver.
In terms of power saving, the order is reversed. Default mode is the one exception and sits
outside this order: the client casts no vote, and the HTP automatically votes for the optimal mode itself.
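
The `kHtpPerfCtrlAuto` strategy follows a familiar pattern: raise the vote just before work starts and release it once the work finishes. The Python sketch below models only that pattern. `vote` is a hypothetical callback standing in for the delegate's internal voting, and using Burst for the up vote and Power Saver for the relaxed vote is an assumption for illustration, not documented delegate behavior.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical vote callback: stands in for whatever actually submits a
# performance vote to the HTP. This is not a real QNN or delegate API.
VoteFn = Callable[[str], None]


@contextmanager
def auto_perf_vote(vote: VoteFn,
                   inference_mode: str = "Burst",
                   relaxed_mode: str = "Power Saver") -> Iterator[None]:
    """Hold a raised vote for the duration of one inference, then relax it.

    The mode names used here are assumptions chosen for illustration.
    The relaxed vote is cast in `finally`, so it happens even if the
    inference raises an exception.
    """
    vote(inference_mode)
    try:
        yield
    finally:
        vote(relaxed_mode)


if __name__ == "__main__":
    log = []
    with auto_perf_vote(log.append):
        log.append("run inference")  # placeholder for the real inference call
    print(log)  # ['Burst', 'run inference', 'Power Saver']
```

Placing the relaxed vote in `finally` is the design choice that matters: a failed inference should not leave the accelerator pinned at a high-power level.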

| Performance mode | Corner on up vote |
| --- | --- |
| Default | <ul class="simple"><br><li><p>Does not cast any specific vote.</p></li><br></ul> |
| Burst | <ul class="simple"><br><li><p>Votes for the TURBO Plus corner.</p></li><br></ul> |
| Sustained High Performance | <ul class="simple"><br><li><p>Sustains the TURBO corner vote.</p></li><br></ul> |
| High Performance | <ul class="simple"><br><li><p>Votes for the TURBO corner.</p></li><br></ul> |
| Balanced | <ul class="simple"><br><li><p>Votes for the NOMINAL Plus corner.</p></li><br></ul> |
| Low Balanced | <ul class="simple"><br><li><p>Votes for the NOMINAL corner.</p></li><br></ul> |
| High Power Saver | <ul class="simple"><br><li><p>Votes for the SVS Plus corner.</p></li><br></ul> |
| Power Saver | <ul class="simple"><br><li><p>Votes for the SVS corner.</p></li><br></ul> |
| Low Power Saver | <ul class="simple"><br><li><p>Votes for the SVS2 corner.</p></li><br></ul> |
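
One way to keep the mode ordering and the corner mapping together is to encode them as data. This is a sketch only: the mode strings are the descriptive labels used on this page, not the enum values defined in `TfLiteQnnDelegateHtpBackendOptions`, and `corner_for` and `higher_performance` are hypothetical helpers.

```python
from typing import Optional

# HTP performance modes ordered from highest to lowest performance (power
# saving runs the other way), each paired with the corner it votes for on an
# up vote. Labels are descriptive, not the delegate's enum names.
HTP_MODES = [
    ("Burst", "TURBO Plus"),
    ("Sustained High Performance", "TURBO (sustained)"),
    ("High Performance", "TURBO"),
    ("Balanced", "NOMINAL Plus"),
    ("Low Balanced", "NOMINAL"),
    ("High Power Saver", "SVS Plus"),
    ("Power Saver", "SVS"),
    ("Low Power Saver", "SVS2"),
]

_RANK = {mode: rank for rank, (mode, _) in enumerate(HTP_MODES)}
_CORNER = dict(HTP_MODES)


def corner_for(mode: str) -> Optional[str]:
    """Corner voted for on an up vote; None for Default, which casts no vote."""
    if mode == "Default":
        return None
    return _CORNER[mode]


def higher_performance(a: str, b: str) -> bool:
    """True if mode a ranks above mode b. Default is unranked (KeyError)."""
    return _RANK[a] < _RANK[b]
```

Default is deliberately left out of `HTP_MODES`, because it hands the choice to the HTP rather than occupying a fixed place in the order.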

On certain SoCs, the Qualcomm® AI Engine Direct HTP backend supports 16-bit floating-point precision.
To enable it, set the
`TfLiteQnnDelegateHtpBackendOptions.precision` option to
`TfLiteQnnDelegateHtpPrecision.kHtpFp16`. This requires the
.tflite model to have floating-point tensors. Note that fp32 models
can still be delegated, but the underlying math is performed in 16-bit precision.
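
To see what running fp32 values through 16-bit math can mean, the sketch below rounds values to IEEE 754 half precision using the `'e'` format of Python's `struct` module. It illustrates fp16 representation in general and assumes round-to-nearest conversion; it does not reproduce the HTP's actual arithmetic.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    struct's 'e' format packs binary16; unpacking it again returns the value
    that 16-bit math would actually see. Raises OverflowError when |x| is far
    beyond the fp16 range (largest finite value: 65504).
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for value in (0.1, 2049.0, 65504.0):
        print(value, "->", to_fp16(value))
```

With this sketch, 0.1 becomes 0.0999755859375, 2049.0 rounds to 2048.0 (fp16 integers are exact only up to 2048), and values well beyond 65504 overflow. Overflow is worth checking for when a model's fp32 activations have a wide range.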

Note that `kHtpFp16` is supported only on a limited set of chips. Currently,
Snapdragon 8 Gen 1 and later Snapdragon 8 generations support `kHtpFp16`.

## DSP

The Qualcomm® AI Engine Direct DSP backend supports legacy chipsets with Hexagon DSP hardware, as
opposed to the newer HTP hardware. The DSP backend supports only
quantized uint8 activations and operators. The Qualcomm® AI Engine Direct Delegate supports only the V66
generation of the Qualcomm® AI Engine Direct DSP backend. For TFLite
operator support, see [Supported Operators](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#supported-operators). The libraries for this backend are *libQnnDspV66Stub.so* and
*libQnnDspV66Skel.so*.

`dsp_performance_mode` uses the same voting options and performance order as
`htp_performance_mode`; see [HTP](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#htp).

Note that the DSP backend reports per-operator profiling events in cycles,
whereas the other backends may report them as time measurements
(microseconds).
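
To compare DSP profiling output with the other backends, cycle counts can be converted to time if the clock frequency is known: one MHz is one cycle per microsecond, so time in microseconds equals cycles divided by the clock in MHz. The frequency is an input you must supply, since this page does not state it, and because performance votes change the clock, a single fixed frequency is only an approximation. `cycles_to_microseconds` is a hypothetical helper.

```python
def cycles_to_microseconds(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds at a fixed clock frequency.

    1 MHz = 1 cycle per microsecond, so time_us = cycles / clock_mhz.
    Assumes the DSP ran at one known, constant clock for the whole event.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return cycles / clock_mhz


if __name__ == "__main__":
    # 1,000,000 cycles at an assumed 500 MHz clock is 2000 microseconds.
    print(cycles_to_microseconds(1_000_000, 500.0))
```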

## Supported Operators

The following table shows the supported TFLite operators. See
[Operator Restrictions](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#operator-restrictions) for limitations and restrictions of operators supported
by this delegate.

| Operators |
| --- |
| Abs |
| Add |
| AddN |
| ArgMax |
| ArgMin |
| AveragePool2d |
| BatchMatMul |
| BatchToSpaceNd |
| Broadcast\_to |
| Cast |
| Ceil |
| Concatenation |
| Conv2d |
| Conv3d |
| Conv3dTranspose |
| Cos |
| Cumsum |
| DepthToSpace |
| DepthwiseConv2d |
| Dequantize |
| DetectionPostprocess |
| Div |
| Elu |
| EmbeddingLookup |
| Equal |
| Exp |
| ExpandDims |
| Floor |
| FullyConnected |
| Gather |
| GatherNd |
| Gelu |
| Greater |
| GreaterEqual |
| HardSwish |
| L2Normalization |
| L2Pool2d |
| LeakyRelu |
| Less |
| LessEqual |
| LocalResponseNormalization |
| Log |
| LogicalAnd |
| LogicalNot |
| LogicalOr |
| Logistic |
| LogSoftmax |
| Lstm |
| MaxPool2d |
| Maximum |
| Mean |
| Minimum |
| MirrorPad |
| Mul |
| Neg |
| NotEqual |
| OneHot |
| Pack |
| Pad |
| PadV2 |
| Pow |
| Prelu |
| Quantize |
| ReduceMax |
| ReduceMin |
| ReduceProd |
| Relu |
| Relu0To1 |
| Relu6 |
| ReluN1To1 |
| Reshape |
| ResizeBilinear |
| ResizeNearestNeighbor |
| ReverseV2 |
| Round |
| Rsqrt |
| ScatterNd |
| SegmentSum |
| Select |
| SelectV2 |
| Sin |
| Slice |
| Softmax |
| SpaceToBatchNd |
| SpaceToDepth |
| Split |
| SplitV |
| Sqrt |
| Square |
| SquaredDifference |
| Squeeze |
| StridedSlice |
| Sub |
| Sum |
| Tanh |
| Tile |
| TopkV2 |
| Transpose |
| TransposeConv |
| Unpack |
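
Before delegating a model, it can help to check its operator names against the table above. The sketch below is a hypothetical helper, not part of the delegate. It compares names case-insensitively and ignores underscores, so TFLite builtin names such as `CONV_2D` or `FULLY_CONNECTED` match the table's `Conv2d` and `FullyConnected`. That normalization is a heuristic (custom ops, for example, use other names), and passing this check says nothing about the per-operator restrictions listed in the next section.

```python
from typing import Iterable, List

# Operator names copied from the Supported Operators table.
_SUPPORTED_NAMES = """
Abs Add AddN ArgMax ArgMin AveragePool2d BatchMatMul BatchToSpaceNd
Broadcast_to Cast Ceil Concatenation Conv2d Conv3d Conv3dTranspose Cos
Cumsum DepthToSpace DepthwiseConv2d Dequantize DetectionPostprocess Div Elu
EmbeddingLookup Equal Exp ExpandDims Floor FullyConnected Gather GatherNd Gelu
Greater GreaterEqual HardSwish L2Normalization L2Pool2d LeakyRelu Less
LessEqual LocalResponseNormalization Log LogicalAnd LogicalNot LogicalOr
Logistic LogSoftmax Lstm MaxPool2d Maximum Mean Minimum MirrorPad Mul Neg
NotEqual OneHot Pack Pad PadV2 Pow Prelu Quantize ReduceMax ReduceMin
ReduceProd Relu Relu0To1 Relu6 ReluN1To1 Reshape ResizeBilinear
ResizeNearestNeighbor ReverseV2 Round Rsqrt ScatterNd SegmentSum Select
SelectV2 Sin Slice Softmax SpaceToBatchNd SpaceToDepth Split SplitV Sqrt
Square SquaredDifference Squeeze StridedSlice Sub Sum Tanh Tile TopkV2
Transpose TransposeConv Unpack
""".split()


def _normalize(name: str) -> str:
    # Case-insensitive and underscore-insensitive, so "Padv2", "PadV2" and
    # "PADV2" all map to the same key.
    return name.replace("_", "").lower()


SUPPORTED = frozenset(_normalize(n) for n in _SUPPORTED_NAMES)


def unsupported_ops(model_ops: Iterable[str]) -> List[str]:
    """Distinct op names missing from the table, in first-seen order."""
    seen = set()
    missing = []
    for op in model_ops:
        key = _normalize(op)
        if key not in SUPPORTED and key not in seen:
            seen.add(key)
            missing.append(op)
    return missing
```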

## Operator Restrictions

The following table lists the operator restrictions imposed by the delegate.
All other operator restrictions are determined at runtime by the Qualcomm® AI Engine Direct backend.
See the Qualcomm® AI Engine Direct SDK documentation for backend-specific limitations and restrictions.

| Operators | Restriction |
| --- | --- |
| AddN | <ul class="simple"><br><li><p>Inputs can only be float32 or int32</p></li><br></ul> |
| ArgMax | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| ArgMin | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| BatchToSpaceNd | <ul class="simple"><br><li><p>block shape tensor must be constant</p></li><br><li><p>crops tensor must be constant</p></li><br></ul> |
| Ceil | <ul class="simple"><br><li><p>in[0]: currently supported only by the HTPFP16 backend</p></li><br></ul> |
| Conv3d | <ul class="simple"><br><li><p>currently supported only by the HTPFP16 backend</p></li><br><li><p>input, in[0]: float32</p></li><br><li><p>filter, in[1]: float32</p></li><br><li><p>bias, in[2]: same type as input</p></li><br></ul> |
| Conv3dTranspose | <ul class="simple"><br><li><p>currently supported only by the HTPFP16 backend</p></li><br><li><p>filter, in[1]: must be constant</p></li><br><li><p>bias, in[2]: must be provided</p></li><br></ul> |
| Cos | <ul class="simple"><br><li><p>in[0]: supports float32</p></li><br></ul> |
| Cumsum | <ul class="simple"><br><li><p>currently supported only by the HTP backend</p></li><br></ul> |
| Elu | <ul class="simple"><br><li><p>alpha = 1; in[0]: only float32 and int8 are supported</p></li><br></ul> |
| ExpandDims | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| GatherNd | <ul class="simple"><br><li><p>currently supported only by the HTP/DSP backends</p></li><br><li><p>params, in[0]: int32/uint8/int8</p></li><br><li><p>indices, in[1]: int32</p></li><br></ul> |
| L2Pool2d | <ul class="simple"><br><li><p>in[0]: only float32 is supported</p></li><br></ul> |
| LeakyRelu | <ul class="simple"><br><li><p>alpha must be in the range [0, 1]</p></li><br></ul> |
| Mean | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| OneHot | <ul class="simple"><br><li><p>currently supported only by the HTP/HTPFP16 backends</p></li><br><li><p>in[0]: int32, values must be in the range [0, depth-1]</p></li><br><li><p>out[0], on_value, off_value: float32, or quantized uint8/int8</p></li><br><li><p>depth, on_value, off_value must be static tensors</p></li><br></ul> |
| Pad | <ul class="simple"><br><li><p>paddings tensor must be constant</p></li><br><li><p>constant values tensor must be constant</p></li><br></ul> |
| PadV2 | <ul class="simple"><br><li><p>paddings tensor must be constant</p></li><br><li><p>constant values tensor must be constant</p></li><br><li><p>in[0]: supports uint8, int8</p></li><br></ul> |
| Relu0To1 | <ul class="simple"><br><li><p>in[0]: supports uint8, int8, and float32</p></li><br></ul> |
| ReverseV2 | <ul class="simple"><br><li><p>axis must be an INT32 constant tensor with exactly one element</p></li><br></ul> |
| Round | <ul class="simple"><br><li><p>input, in[0]: float32 (limited by the TFLite op model); supported by GPU/GPU16</p></li><br></ul> |
| ScatterNd | <ul class="simple"><br><li><p>currently supported only by HTP/HTPFP16</p></li><br><li><p>indices, in[0]: int32</p></li><br><li><p>updates, in[1]: float32/uint32</p></li><br><li><p>shape, in[2]: int32, must be static (constant)</p></li><br></ul> |
| SegmentSum | <ul class="simple"><br><li><p>currently supported only by HTP/HTPFP16</p></li><br><li><p>input, in[0]: float32/int32, must be static (constant)</p></li><br><li><p>segment ID, in[1]: int32, must be a 1D static (constant) tensor</p></li><br></ul> |
| Slice | <ul class="simple"><br><li><p>begin and size tensors must be constant</p></li><br></ul> |
| SpaceToBatchNd | <ul class="simple"><br><li><p>block shape tensor must be constant</p></li><br><li><p>paddings tensor must be constant</p></li><br></ul> |
| Split | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| StridedSlice | <ul class="simple"><br><li><p>begin, end, and strides tensors must be constant</p></li><br><li><p>ellipsis is not supported</p></li><br></ul> |
| Sum | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| Tile | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
| Transpose | <ul class="simple"><br><li><p>axis tensor must be constant</p></li><br></ul> |
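
Many of the restrictions above require a particular input to be a constant tensor. The sketch below encodes a subset of those rules and flags nodes that break them. `OpNode`, `CONSTANT_INPUTS` and `constant_violations` are hypothetical, the input labels are descriptive rather than TFLite tensor names, and data-type and backend restrictions are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# A subset of the "must be constant" rules from the Operator Restrictions
# table, keyed by operator name. Input labels are descriptive only.
CONSTANT_INPUTS: Dict[str, List[str]] = {
    "ArgMax": ["axis"],
    "ArgMin": ["axis"],
    "BatchToSpaceNd": ["block_shape", "crops"],
    "ExpandDims": ["axis"],
    "Mean": ["axis"],
    "Pad": ["paddings"],
    "Slice": ["begin", "size"],
    "SpaceToBatchNd": ["block_shape", "paddings"],
    "StridedSlice": ["begin", "end", "strides"],
    "Sum": ["axis"],
}


@dataclass
class OpNode:
    """Hypothetical minimal description of one operator in a model graph."""
    op: str
    # Maps each named input to whether it is a constant tensor.
    constant: Dict[str, bool] = field(default_factory=dict)


def constant_violations(node: OpNode) -> List[str]:
    """Inputs the rules require to be constant that are not.

    An input missing from `node.constant` is treated as non-constant.
    Operators with no entry in CONSTANT_INPUTS never report a violation.
    """
    required = CONSTANT_INPUTS.get(node.op, [])
    return [name for name in required if not node.constant.get(name, False)]
```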

Last Published: Jun 04, 2026
