# Acceleration Support

Qualcomm® AI Engine Direct Delegate provides acceleration on Qualcomm platforms using the Qualcomm® AI Engine Direct SDK. The following sections describe the operators and features that the Qualcomm® AI Engine Direct Delegate supports.

## GPU

In general, the Qualcomm® AI Engine Direct GPU backend supports float32 and float16 operators and activations. The activation and operator compute precisions used by the accelerator core can be set through the `TfLiteQnnDelegateGpuBackendOptions` structure. For TFLite operator support, see [Supported Operators](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#supported-operators).

The library for this backend is *libQnnGpu.so*.

One option sets the accelerator's performance mode. See `TfLiteQnnDelegateGpuBackendOptions` for all of the option enums. From highest to lowest, the performance levels of the modes are High > Normal > Low. Default mode follows the GPU backend's default setting.

## HTP

In general, the Qualcomm® AI Engine Direct HTP backend supports quantized 8-bit fixed-point activations and operators. For TFLite operator support, see [Supported Operators](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#supported-operators). For additional per-operator limitations, see [Operator Restrictions](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#operator-restrictions).

The libraries for this backend are *libQnnHtp.so*, *libQnnHtpPrepare.so*, *libQnnHtp\*Stub.so* and *libQnnHtp\*Skel.so*.

One option sets the accelerator's performance mode. See `TfLiteQnnDelegateHtpBackendOptions` for all of the option enums. When `performance_mode` is set, the delegate votes for the given performance level at initialization. Users can change the vote on the fly through the C APIs; see [C Interface](https://docs.qualcomm.com/doc/80-63442-10/topic/c_interface.html). Alternatively, the delegate supports a strategy called `kHtpPerfCtrlAuto`. This strategy votes up automatically during inference and returns to a relaxed vote once the inference has completed.
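The `kHtpPerfCtrlAuto` strategy follows a scoped-vote pattern. It votes up immediately before an inference and falls back to a relaxed vote once the inference finishes, even if it fails. The following Python sketch models only that control flow. `PerfVoter`, `auto_perf_vote`, `run_inference` and `fake_model` are hypothetical names for illustration. They are not part of the delegate or its C interface, where the actual voting happens internally.

```python
from contextlib import contextmanager


class PerfVoter:
    """Hypothetical stand-in for the backend's performance voting."""

    def __init__(self):
        self.state = "relaxed"
        self.history = []  # every vote cast, in order, for inspection

    def vote_up(self):
        self.state = "up"
        self.history.append("up")

    def vote_relaxed(self):
        self.state = "relaxed"
        self.history.append("relaxed")


@contextmanager
def auto_perf_vote(voter):
    """Hold an 'up' vote for the duration of one inference, then relax it.

    The finally block restores the relaxed vote even if inference raises.
    """
    voter.vote_up()
    try:
        yield
    finally:
        voter.vote_relaxed()


def run_inference(voter, model_fn, inputs):
    # model_fn is a placeholder for invoking the interpreter.
    with auto_perf_vote(voter):
        return model_fn(inputs)


voter = PerfVoter()
observed = []


def fake_model(x):
    # Placeholder "inference" that records the vote state it runs under.
    observed.append(voter.state)
    return x * 2


result = run_inference(voter, fake_model, 21)
print(result, observed, voter.state)  # 42 ['up'] relaxed
```

With this pattern, the cost of a high vote is paid only while an inference is in flight. A fixed `performance_mode` keeps its vote from initialization until the application changes it.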
From highest to lowest, the performance levels of the modes are Burst > Sustained High Performance > High Performance > Balanced > Low Balanced > High Power Saver > Power Saver > Low Power Saver. Power consumption follows the opposite order. The one exception is Default mode. It means the client casts no vote, and HTP chooses the optimal mode automatically.

| Performance mode | Corner on up vote |
| --- | --- |
| Default | Does not perform any specific voting. |
| Burst | Votes for the TURBO Plus corner. |
| Sustained High Performance | Sustains the TURBO corner vote. |
| High Performance | Votes for the TURBO corner. |
| Balanced | Votes for the NOMINAL Plus corner. |
| Low Balanced | Votes for the NOMINAL corner. |
| High Power Saver | Votes for the SVS Plus corner. |
| Power Saver | Votes for the SVS corner. |
| Low Power Saver | Votes for the SVS2 corner. |

On certain SoCs, the Qualcomm® AI Engine Direct HTP backend supports 16-bit floating-point precision. To enable it, set the `TfLiteQnnDelegateHtpBackendOptions.precision` option to `TfLiteQnnDelegateHtpPrecision.kHtpFp16`. This requires the .tflite model to have floating-point tensors. fp32 models can still be delegated, but the underlying math is performed in 16-bit precision. `kHtpFp16` is supported only on a limited set of chips: currently Snapdragon 8 Gen 1 and newer Snapdragon 8 generations.

## DSP

The Qualcomm® AI Engine Direct DSP backend supports legacy chipsets with Hexagon DSP hardware, as opposed to the newer HTP hardware. The DSP backend supports only quantized uint8 activations and operators. The Qualcomm® AI Engine Direct Delegate supports only the V66 generation of the Qualcomm® AI Engine Direct DSP. For TFLite operator support, see [Supported Operators](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#supported-operators).

The libraries for this backend are *libQnnDspV66Stub.so* and *libQnnDspV66Skel.so*.

`dsp_performance_mode` uses the same voting options and performance order as `htp_performance_mode`; see [HTP](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#htp).

Note that the DSP backend reports per-operator profiling events in cycles. The other backends may report them as time measurements (microseconds).

## Supported Operators

The supported TFLite operators are listed in the table below. See [Operator Restrictions](https://docs.qualcomm.com/doc/80-63442-10/topic/support.html#operator-restrictions) for limitations and restrictions on the operators supported by this delegate.
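The table below can also be checked mechanically when auditing a model. The following Python sketch is an illustration, not a delegate tool. It copies the operator names from the table and reports which of a model's operators are not listed. It assumes the model's operators are given in TFLite `BuiltinOperator` style (for example `CONV_2D`). It matches them with a heuristic that ignores case and underscores, so custom ops and names that do not normalize cleanly still need manual review. A listed operator can also still be rejected by the operator restrictions or by the backend at runtime.

```python
# Operator names copied from the Supported Operators table.
SUPPORTED_OPS = """
Abs Add AddN ArgMax ArgMin AveragePool2d BatchMatMul BatchToSpaceNd
Broadcast_to Cast Ceil Concatenation Conv2d Conv3d Conv3dTranspose Cos
Cumsum DepthToSpace DepthwiseConv2d Dequantize DetectionPostprocess Div
Elu Exp EmbeddingLookup ExpandDims Equal Floor FullyConnected Gather
GatherNd Gelu Greater GreaterEqual HardSwish L2Normalization L2Pool2d
LeakyRelu Less LessEqual LocalResponseNormalization Log LogicalAnd
LogicalNot LogicalOr Logistic LogSoftmax Lstm MaxPool2d Maximum Mean
Minimum MirrorPad Mul Neg NotEqual OneHot Pack Pad Padv2 Pow Prelu
Quantize ReduceMax ReduceMin ReduceProd Relu Relu0To1 Relu6 ReluN1To1
Reshape ResizeBilinear ResizeNearestNeighbor ReverseV2 Round Rsqrt
ScatterNd SegmentSum Select SelectV2 Sin Slice Softmax SpaceToBatchNd
SpaceToDepth Split SplitV Sqrt Square SquaredDifference Squeeze
StridedSlice Sub Sum Tanh Tile TopkV2 Transpose TransposeConv Unpack
""".split()


def _normalize(name):
    # Heuristic: ignore case and underscores, so "CONV_2D", "Conv2d"
    # and "conv2d" all map to the same key "conv2d".
    return name.replace("_", "").lower()


_SUPPORTED = {_normalize(n) for n in SUPPORTED_OPS}


def unlisted_ops(model_ops):
    """Return ops not found in the table, in first-seen order, without duplicates."""
    seen = set()
    missing = []
    for op in model_ops:
        key = _normalize(op)
        if key not in _SUPPORTED and key not in seen:
            seen.add(key)
            missing.append(op)
    return missing


print(unlisted_ops(["CONV_2D", "DEPTHWISE_CONV_2D", "RELU6",
                    "UNIDIRECTIONAL_SEQUENCE_LSTM", "SOFTMAX"]))
# ['UNIDIRECTIONAL_SEQUENCE_LSTM']
```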
| Operators |
| --- |
| Abs |
| Add |
| AddN |
| ArgMax |
| ArgMin |
| AveragePool2d |
| BatchMatMul |
| BatchToSpaceNd |
| Broadcast\_to |
| Cast |
| Ceil |
| Concatenation |
| Conv2d |
| Conv3d |
| Conv3dTranspose |
| Cos |
| Cumsum |
| DepthToSpace |
| DepthwiseConv2d |
| Dequantize |
| DetectionPostprocess |
| Div |
| Elu |
| Exp |
| EmbeddingLookup |
| ExpandDims |
| Equal |
| Floor |
| FullyConnected |
| Gather |
| GatherNd |
| Gelu |
| Greater |
| GreaterEqual |
| HardSwish |
| L2Normalization |
| L2Pool2d |
| LeakyRelu |
| Less |
| LessEqual |
| LocalResponseNormalization |
| Log |
| LogicalAnd |
| LogicalNot |
| LogicalOr |
| Logistic |
| LogSoftmax |
| Lstm |
| MaxPool2d |
| Maximum |
| Mean |
| Minimum |
| MirrorPad |
| Mul |
| Neg |
| NotEqual |
| OneHot |
| Pack |
| Pad |
| Padv2 |
| Pow |
| Prelu |
| Quantize |
| ReduceMax |
| ReduceMin |
| ReduceProd |
| Relu |
| Relu0To1 |
| Relu6 |
| ReluN1To1 |
| Reshape |
| ResizeBilinear |
| ResizeNearestNeighbor |
| ReverseV2 |
| Round |
| Rsqrt |
| ScatterNd |
| SegmentSum |
| Select |
| SelectV2 |
| Sin |
| Slice |
| Softmax |
| SpaceToBatchNd |
| SpaceToDepth |
| Split |
| SplitV |
| Sqrt |
| Square |
| SquaredDifference |
| Squeeze |
| StridedSlice |
| Sub |
| Sum |
| Tanh |
| Tile |
| TopkV2 |
| Transpose |
| TransposeConv |
| Unpack |

## Operator Restrictions

The following table lists the operator restrictions imposed by the delegate. All other operator restrictions are determined at runtime by the Qualcomm® AI Engine Direct backend. See the Qualcomm® AI Engine Direct SDK documentation for backend-specific limitations and restrictions.

| Operators | Restriction |
| --- | --- |
| AddN | inputs can only be float32 or int32 |
| ArgMax | axis tensor must be constant |
| ArgMin | axis tensor must be constant |
| BatchToSpaceNd | block shape tensor must be constant<br>crops tensor must be constant |
| Ceil | in[0]: currently supported only by the HTP FP16 backend |
| Conv3d | currently supported only by the HTP FP16 backend<br>input, in[0]: float32<br>filter, in[1]: float32<br>bias, in[2]: same type as input |
| Conv3dTranspose | currently supported only by the HTP FP16 backend<br>filter, in[1]: must be constant<br>bias, in[2]: must be provided |
| Cos | in[0]: float32 supported |
| Cumsum | currently supported only by the HTP backend |
| Elu | alpha must be 1<br>in[0]: only float32 and int8 supported |
| ExpandDims | axis tensor must be constant |
| GatherNd | currently supported only by the HTP and DSP backends<br>params, in[0]: int32, uint8, or int8<br>indices, in[1]: int32 |
| L2Pool2d | in[0]: only float32 supported |
| LeakyRelu | alpha must be in the range [0, 1] |
| Mean | axis tensor must be constant |
| OneHot | currently supported only by the HTP and HTP FP16 backends<br>in[0]: int32, values must be in the range [0, depth-1]<br>out[0], on_value, off_value: float32, or uint8/int8 (must be quantized)<br>depth, on_value, off_value: must be static tensors |
| Pad | paddings tensor must be constant<br>constant values tensor must be constant |
| Padv2 | paddings tensor must be constant<br>constant values tensor must be constant<br>in[0]: uint8 and int8 supported |
| Relu0To1 | in[0]: uint8, int8, and float32 supported |
| ReverseV2 | axis must be a constant INT32 tensor with exactly one element |
| Round | input, in[0]: float32 (limited by the TFLite op model); supported by the GPU and GPU FP16 backends |
| ScatterNd | currently supported only by the HTP and HTP FP16 backends<br>indices, in[0]: int32<br>updates, in[1]: float32 or uint32<br>shape, in[2]: int32, must be static (constant) |
| SegmentSum | currently supported only by the HTP and HTP FP16 backends<br>input, in[0]: float32 or int32, must be static (constant)<br>segment id, in[1]: int32, must be a 1D static (constant) tensor |
| Slice | begin and size tensors must be constant |
| SpaceToBatchNd | block shape tensor must be constant<br>paddings tensor must be constant |
| Split | axis tensor must be constant |
| StridedSlice | begin, end, and strides tensors must be constant<br>ellipsis is not supported |
| Sum | axis tensor must be constant |
| Tile | axis tensor must be constant |
| Transpose | axis tensor must be constant |

Last Published: Jun 04, 2026