# QNN Gen AI Transformer

The following tutorial demonstrates running the Llama 2 7B model on the QNN Gen AI Transformer backend using
[genie-t2t-run](https://docs.qualcomm.com/doc/80-63442-10/topic/genie-t2t-run.html#genie-t2t-run).

Genie provides the QNN GenAITransformer backend, which represents an entire LLM as a single op and runs inference
on the host CPU. Genie packages a prebuilt QnnGenAiTransformerModel model library. The corresponding source for this
model library can be found at `${SDK_ROOT}/examples/Genie/Model/model.cpp`. Because the QNN GenAITransformer backend
model is prebuilt, preparation for this backend uses the `qnn-genai-transformer-composer` tool.

## Model download

Download Llama-2-7b-chat-hf from [https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main).
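
As an alternative to downloading the files through the browser, the download can be scripted. The following is a minimal sketch, assuming the third-party `huggingface_hub` Python package is installed and that your Hugging Face account has been granted access to this gated repository. The `HF_TOKEN` environment variable, the helper names `download_model` and `missing_files`, and the list of files checked are illustrative assumptions, not part of the Genie SDK.

```python
import os
from pathlib import Path

# Repository named in the download link above.
REPO_ID = "meta-llama/Llama-2-7b-chat-hf"


def download_model(target_dir, token=None):
    """Download the repository snapshot into target_dir and return its path.

    Assumption: huggingface_hub is installed (pip install huggingface_hub).
    The import is local so the rest of this sketch works without it.
    """
    from huggingface_hub import snapshot_download

    # Assumption: the access token is supplied via the HF_TOKEN variable.
    token = token or os.environ.get("HF_TOKEN")
    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=str(target_dir),
        token=token,
    )


def missing_files(model_dir, required=("config.json", "tokenizer.model")):
    """Return the names in `required` that are absent from model_dir.

    The default file names are an assumption about the repository layout;
    adjust them to whatever your preparation steps expect.
    """
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]
```

A typical use would be to call `download_model(Path("Llama-2-7b-chat-hf"))` once, then confirm that `missing_files(Path("Llama-2-7b-chat-hf"))` returns an empty list before moving on to the preparation tutorials.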

## Preparation tutorials

Choose the offline preparation host platform:

- [Linux](https://docs.qualcomm.com/doc/80-63442-10/topic/linux.html)
- [Windows](https://docs.qualcomm.com/doc/80-63442-10/topic/windows.html)

Last Published: Jun 04, 2026
