# Windows

This tutorial walks through offline preparation and inference with the QNN Gen AI Transformer backend
on the Windows platform.

## Offline Preparation

The QNN Gen AI Transformer backend uses the [qnn-genai-transformer-composer](https://docs.qualcomm.com/doc/80-63442-10/topic/qnn-genai-transformer-composer.html#qnn-genai-transformer-composer)
utility to prepare models for inference.

Open `Developer PowerShell for VS2022` on a Windows host and run:

    # Make sure the environment is set up as per the setup instructions, or cd into the bin folder on the Windows host
    cd <QNN_SDK_ROOT>\bin\x86_64-windows-msvc
    python .\qnn-genai-transformer-composer --quantize Z4 `
                                            --outfile <output filename with complete path>.bin `
                                            --model <path-to-downloaded-LLama-model-directory>
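If the preparation step needs to be scripted, the composer invocation can be assembled programmatically. The following is a minimal sketch, not part of the SDK: `build_composer_command` is a hypothetical helper, it uses only the three flags shown in the command above, and it assumes the composer should run under the same Python interpreter as the calling script.

```python
import sys
from typing import List


def build_composer_command(model_dir: str, outfile: str, quantize: str = "Z4") -> List[str]:
    """Assemble the argument list for qnn-genai-transformer-composer.

    Hypothetical helper: only --quantize, --outfile and --model are used,
    because those are the flags shown in this tutorial.
    """
    if not outfile.endswith(".bin"):
        # The tutorial command writes a .bin file, so enforce that here.
        raise ValueError("outfile should end in .bin")
    return [
        sys.executable,  # assumption: reuse the current Python interpreter
        "qnn-genai-transformer-composer",
        "--quantize", quantize,
        "--outfile", outfile,
        "--model", model_dir,
    ]
```

The returned list can be passed to `subprocess.run(cmd, cwd=bin_dir, check=True)`, where `bin_dir` is the `bin\x86_64-windows-msvc` folder of the SDK. Passing a list rather than a single string avoids quoting problems when paths contain spaces.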

## Inference

Open `Developer PowerShell for VS2022` on a Windows on Snapdragon host and run:

    # Make sure the environment is set up as per the setup instructions, or cd into the bin folder on the Windows on Snapdragon host
    cd <QNN_SDK_ROOT>\bin\aarch64-windows-msvc
    .\genie-t2t-run.exe -c <path to llama2-7b-genaitransformer.json> -p "Tell me about Qualcomm"
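The inference command above can also be driven from a script, for example to send several prompts in a row. The sketch below is illustrative only: `build_genie_command` and `run_genie` are hypothetical helper names, and only the `-c` and `-p` options shown in this tutorial are used.

```python
import os
import subprocess
from typing import List


def build_genie_command(config_path: str, prompt: str) -> List[str]:
    """Return the argument list for genie-t2t-run.exe (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # A list keeps the prompt as one argument, so no manual quoting is needed.
    return ["genie-t2t-run.exe", "-c", config_path, "-p", prompt]


def run_genie(config_path: str, prompt: str, bin_dir: str) -> str:
    """Run genie-t2t-run.exe from bin_dir and return its standard output."""
    cmd = build_genie_command(config_path, prompt)
    # Use the full path to the executable instead of relying on PATH lookup.
    cmd[0] = os.path.join(bin_dir, cmd[0])
    result = subprocess.run(
        cmd, cwd=bin_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Here `bin_dir` would be the `bin\aarch64-windows-msvc` folder of the SDK on the Windows on Snapdragon host. With `check=True`, a non-zero exit code raises `subprocess.CalledProcessError` rather than being silently ignored.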

Last Published: Jun 04, 2026
