# HTA

This section provides information specific to the QNN HTA backend.

## API Specializations

This section describes the API specializations for the HTA backend. All QNN HTA
backend specializations are available under the `<QNN_SDK_ROOT>/include/QNN/HTA/` directory.

The current version of the QNN HTA backend API is:

- QNN\_HTA\_API\_VERSION\_MAJOR 2
- QNN\_HTA\_API\_VERSION\_MINOR 0
- QNN\_HTA\_API\_VERSION\_PATCH 0
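
A client that builds against these headers may want to confirm that the API version it was compiled with is the one it expects. The following is a minimal sketch, not SDK code: the three macros are redefined locally as stand-ins with the values listed above so that the snippet compiles without the SDK, `hta_api_satisfies` and `hta_api_version_string` are hypothetical helper names, and the compatibility rule (same major version, minor version at least the required one) is an assumed semantic-versioning convention rather than a documented QNN policy.

```c
#include <stdio.h>

/* Stand-ins for the macros provided by the SDK's HTA headers.
 * The values mirror the version listed in this section; when the real
 * headers are included first, these fallbacks are skipped. */
#ifndef QNN_HTA_API_VERSION_MAJOR
#define QNN_HTA_API_VERSION_MAJOR 2
#endif
#ifndef QNN_HTA_API_VERSION_MINOR
#define QNN_HTA_API_VERSION_MINOR 0
#endif
#ifndef QNN_HTA_API_VERSION_PATCH
#define QNN_HTA_API_VERSION_PATCH 0
#endif

/* Hypothetical helper: returns 1 when the compiled-in HTA API version has
 * the required major version and at least the required minor version
 * (assumed semantic-versioning rule), otherwise 0. */
static int hta_api_satisfies(int required_major, int required_minor) {
    if (QNN_HTA_API_VERSION_MAJOR != required_major) {
        return 0;
    }
    return QNN_HTA_API_VERSION_MINOR >= required_minor;
}

/* Hypothetical helper: formats the compiled-in version as "major.minor.patch".
 * Returns the number of characters snprintf would have written. */
static int hta_api_version_string(char *buf, size_t len) {
    return snprintf(buf, len, "%d.%d.%d",
                    QNN_HTA_API_VERSION_MAJOR,
                    QNN_HTA_API_VERSION_MINOR,
                    QNN_HTA_API_VERSION_PATCH);
}
```

A caller could invoke `hta_api_satisfies(2, 0)` once at start-up and refuse to load the backend library when it returns 0.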

## QNN HTA Supported Operations

QNN HTA supports running quantized 8-bit and quantized 16-bit networks.
The operations supported by the QNN HTA Quant runtime are listed in the Backend Support HTA column of
[Supported Operations](https://docs.qualcomm.com/doc/80-63442-50/topic/SupportedOps.html#supported-operations).

## QNN HTA 16-bit Integer Support Limitations

To enable 16-bit integer inference, set the activation quantization bit width to 16 and keep
the weight bit width at 8. The input and output data formats should be defined as 16-bit.

> - Pass `--act_bw 16 --weight_bw 8` to the QNN converter tools to generate a model with 16-bit activations and 8-bit weights.
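
Because the input and output tensors are 16-bit, a client typically has to convert floating-point data to and from 16-bit integers itself. The sketch below illustrates one way to do that under stated assumptions: it assumes unsigned 16-bit storage and the scale/offset convention `real = scale * (quantized + offset)`, and the function names `quantize_u16` and `dequantize_u16` are hypothetical, not part of the QNN API. The actual encoding for a given tensor should be read from the model's quantization parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps a float to an unsigned 16-bit value, assuming
 * real = scale * (quantized + offset). Results are rounded to the nearest
 * integer and saturated to the range [0, 65535]. */
static uint16_t quantize_u16(float value, float scale, int32_t offset) {
    assert(scale > 0.0f); /* a non-positive scale makes the mapping meaningless */
    float q = value / scale - (float)offset;
    if (q < 0.0f) {
        q = 0.0f;
    }
    if (q > 65535.0f) {
        q = 65535.0f;
    }
    /* q is non-negative here, so adding 0.5 and truncating rounds to nearest. */
    return (uint16_t)(q + 0.5f);
}

/* Hypothetical helper: inverse mapping under the same assumed convention. */
static float dequantize_u16(uint16_t q, float scale, int32_t offset) {
    return scale * (float)((int32_t)q + offset);
}
```

With `scale = 0.5f` and `offset = -10`, for example, the real value 5.0 maps to the stored value 20, and values outside the representable range saturate at 0 or 65535 instead of wrapping around.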

## QNN HTA Performance Infrastructure API

After loading the QNN HTA library, clients can invoke QnnBackend\_getPerfInfrastructure
and then call the methods declared in
[QnnHtaPerfInfrastructure.h](https://docs.qualcomm.com/doc/80-63442-50/topic/api-rst_file_include_QNN_HTA_QnnHtaPerfInfrastructure_h.html##file-include-qnn-hta-qnnhtaperfinfrastructure-h). These APIs let a client control
the HTA accelerator's system settings, giving fine-grained control over the accelerator.
Use cases include:

1. Set up the power mode of the accelerator.

Last Published: Oct 10, 2025
