# Run LiteRT Model on NPU

The Qualcomm AI Runtime delegate is a proprietary delegate that
accelerates LiteRT models on Qualcomm hardware.

It’s based on the [external delegate interface](https://ai.google.dev/edge/litert/performance/implementing_delegate#option_2_leverage_external_delegate)
of LiteRT. You can use the Qualcomm AI Runtime delegate to offload
part or all of a LiteRT model to specialized Qualcomm hardware,
such as the Adreno GPU and the NPU.

This delegate improves model execution performance and power
efficiency by reducing the CPU workload. It also uses the existing
Qualcomm AI Runtime APIs and available back ends to speed up
models. For more information about these APIs, see
[Qualcomm AI Runtime (QAIRT) SDK](https://docs.qualcomm.com/doc/80-63442-10).
The Qualcomm AI Runtime delegate can run models in both 32-bit floating-point
precision and int8 precision on the available hardware.

You can build applications using the following interfaces:

- Qualcomm AI Runtime delegate interface
- LiteRT external delegate interface

You can access both interfaces when using a standalone LiteRT
application. However, if you deploy your LiteRT models using the
Qualcomm IM SDK, the `qtimltflite` GStreamer plug-in for Qualcomm
TensorFlow Lite uses the QNN delegate. For more information, see
[Leverage external delegate](https://ai.google.dev/edge/litert/performance/implementing_delegate#option_2_leverage_external_delegate).
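
For a standalone application, the external delegate route might look like the following minimal Python sketch. It is an illustration under stated assumptions, not the documented integration. The library name `libQnnTFLiteDelegate.so`, the `backend_type` option and its `htp`/`gpu` values, and the use of the `tflite_runtime` package are assumptions that are not stated on this page, so verify them against your QAIRT SDK release. The helper names are hypothetical. If the delegate cannot be loaded, the sketch falls back to CPU execution.

```python
from typing import Any, Dict, Optional

# Assumption: shared-library name of the delegate as shipped in the QAIRT SDK.
QNN_DELEGATE_LIBRARY = "libQnnTFLiteDelegate.so"

# Assumption: backend names accepted by the delegate ("htp" targets the NPU).
SUPPORTED_BACKENDS = ("htp", "gpu")


def build_delegate_options(backend: str = "htp") -> Dict[str, str]:
    """Build the key/value options passed to the external delegate."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    # Assumption: the delegate selects its back end via "backend_type".
    return {"backend_type": backend}


def load_qnn_delegate(
    backend: str = "htp", library: str = QNN_DELEGATE_LIBRARY
) -> Optional[Any]:
    """Try to load the delegate; return None if it is unavailable."""
    try:
        from tflite_runtime.interpreter import load_delegate
    except ImportError:
        return None  # LiteRT Python runtime is not installed.
    try:
        return load_delegate(library, options=build_delegate_options(backend))
    except (ValueError, OSError, RuntimeError):
        return None  # Library missing or not loadable on this platform.


def make_interpreter(model_path: str, backend: str = "htp") -> Any:
    """Create an interpreter that uses the delegate when it loads."""
    from tflite_runtime.interpreter import Interpreter

    delegate = load_qnn_delegate(backend)
    # With an empty delegate list, the interpreter runs on the CPU.
    delegates = [delegate] if delegate is not None else []
    interpreter = Interpreter(
        model_path=model_path, experimental_delegates=delegates
    )
    interpreter.allocate_tensors()
    return interpreter
```

On a device where the delegate library is on the loader path, calling `make_interpreter` with the path to your own `.tflite` file would return an interpreter whose supported operations are offloaded to the selected back end. Operations the delegate does not support stay on the CPU.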

Last Published: May 14, 2026
