# Appendix

## Gen AI Builder Model Support on HTP

The list below contains models that have been verified with Gen AI Builder on the HTP backend using
a Snapdragon Android mobile device. For the complete list of preconfigured model architectures, see
[`SupportedLLMs`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-gen-ai-builder-factory.html#qairt.gen_ai_api.gen_ai_builder_factory.SupportedLLMs).

- Key Points:
    - At minimum, the entry point to Gen AI Builder is an ONNX model along with quantization encodings obtained using AIMET. See the HTP section in [`qairt.gen_ai_api.gen_ai_builder_factory.GenAIBuilderFactory`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-api-gen-ai-builder-factory.html#qairt.gen_ai_api.gen_ai_builder_factory.GenAIBuilderFactory) for more details on the expected inputs.
    - Obtain `tokenizer.json` and `config.json` from the model's Hugging Face repository unless otherwise noted.
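
The inputs named in the key points above can be checked before starting a build. The following is a minimal sketch, not part of the Gen AI Builder API: the helper name `missing_inputs`, the single-directory layout, and the `*.onnx` / `*.encodings` file extensions are all assumptions made for illustration.

```python
from pathlib import Path


def missing_inputs(model_dir):
    """Return the expected inputs that are absent from model_dir.

    Hypothetical helper: assumes all inputs sit in one directory and that
    the ONNX graph and AIMET encodings use the extensions shown below.
    """
    model_dir = Path(model_dir)
    missing = []
    # Assumption: one or more *.onnx files hold the model graph.
    if not any(model_dir.glob("*.onnx")):
        missing.append("*.onnx")
    # Assumption: AIMET quantization encodings are stored as *.encodings.
    if not any(model_dir.glob("*.encodings")):
        missing.append("*.encodings")
    # These two file names come from the key points above.
    for name in ("tokenizer.json", "config.json"):
        if not (model_dir / name).is_file():
            missing.append(name)
    return missing
```

An empty list means every expected input was found; otherwise the list names what is still missing.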

| Model Name | Supported Platforms | Notes |
| --- | --- | --- |
| Baichuan2-7B-Instruct | Android (Mobile) | Tokenizer can be obtained from [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/baichuan2_7b_quantized/v2/tokenizer.json) |
| IndusQ-1.1B | Android (Mobile) |  |
| JAIS-6p7B-Chat | Android (Mobile) |  |
| Llama 3-8B-Instruct | Android (Mobile) |  |
| Llama 3.1-8B-Instruct | Android (Mobile) |  |
| Llama 3.2 1B-Instruct | Android (Mobile) |  |
| Llama 3.2 3B-Instruct | Android (Mobile) |  |
| Ministral-3B | Android (Mobile) |  |
| Mistral-3B | Android (Mobile) |  |
| Phi-3.5-Instruct | Android (Mobile) |  |
| PlaMo-1B | Android (Mobile) |  |
| Qwen2-7B-Instruct | Android (Mobile) | May require `protobuf==3.20.2` and `onnx<=1.17.0`. Issues have been observed with other protobuf versions. |
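
The version pins noted for Qwen2-7B-Instruct can be verified in the active Python environment before a build. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the version parser only handles plain numeric versions such as `3.20.2` (it stops at the first non-numeric component).

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.20.2' into (3, 20, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_qwen2_environment(protobuf_version, onnx_version):
    """Return a list of problems; an empty list means both pins are met."""
    problems = []
    # Pin from the table note: protobuf==3.20.2
    if parse_version(protobuf_version) != (3, 20, 2):
        problems.append(f"protobuf is {protobuf_version}, expected 3.20.2")
    # Pin from the table note: onnx<=1.17.0
    if parse_version(onnx_version) > (1, 17, 0):
        problems.append(f"onnx is {onnx_version}, expected <= 1.17.0")
    return problems


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

In practice, the two arguments would come from `installed_version("protobuf")` and `installed_version("onnx")`, after confirming that neither is `None`.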

## GGUF Model Support on HTP

The list below contains GGUF models that have been verified with Gen AI Builder on the HTP backend using
a Snapdragon Android mobile device.

- For reasoning models, the [`sampler parameters`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-gen-ai-modules-genie-execution.html#qairt.modules.genie_execution.genie_config.Sampler) may need to be adjusted in the **gen\_ai\_config.json** file after the GenAIContainer has been built.
- An example configuration is shown below:

```json
"sampler_params": {
    "version": 1,
    "seed": 42,
    "temp": 0.85,
    "top-k": 30,
    "top-p": 0.8
}
```
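
The adjustment to **gen\_ai\_config.json** can also be scripted. The sketch below is one possible approach, not a Gen AI Builder API: the function names are hypothetical, and because this page does not state where `sampler_params` sits inside the file, the code searches the whole JSON document for objects with that key.

```python
import json
from pathlib import Path


def update_sampler_params(config, overrides):
    """Update every "sampler_params" object in a nested config, in place.

    Returns the number of objects updated. Searching recursively is an
    assumption made because the key's exact location is not documented here.
    """
    updated = 0
    if isinstance(config, dict):
        for key, value in config.items():
            if key == "sampler_params" and isinstance(value, dict):
                value.update(overrides)
                updated += 1
            else:
                updated += update_sampler_params(value, overrides)
    elif isinstance(config, list):
        for item in config:
            updated += update_sampler_params(item, overrides)
    return updated


def patch_config_file(path, overrides):
    """Load a JSON config, apply the sampler overrides, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    count = update_sampler_params(config, overrides)
    if count == 0:
        raise KeyError(f"no 'sampler_params' object found in {path}")
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return count
```

For example, `patch_config_file("gen_ai_config.json", {"temp": 0.85, "top-k": 30, "top-p": 0.8})` would apply the values from the example configuration while leaving other keys such as `version` and `seed` unchanged.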

| Model Name | Supported Platforms | Notes |
| --- | --- | --- |
| Llama-3-8B-Instruct | Android (Mobile) |  |
| Llama-3.1-8B-Instruct | Android (Mobile) |  |
| DeepSeek-R1-Distill-Llama-8B | Android (Mobile) |  |
| Llama-3.2-Instruct (1B, 3B) | Android (Mobile) |  |
| Qwen2-7B-Instruct | Android (Mobile) |  |
| DeepSeek-R1-Distill-Qwen-1.5B | Android (Mobile) | The Q2\_K model exhibits degraded performance compared to other quantization formats. |
| Qwen2.5-1.5B-Instruct | Android (Mobile) |  |
| Qwen3 (0.6B, 1.7B, 4B, 8B) | Android (Mobile) |  |
| DeepSeek-R1-0528-Qwen3-8B | Android (Mobile) |  |
| Phi-3-mini-instruct | Android (Mobile) |  |
| Phi-3.5-mini-instruct | Android (Mobile) |  |
| Phi-4-mini-instruct | Android (Mobile) |  |
| Phi-4-mini-reasoning | Android (Mobile) |  |
| Mistral-7B-Instruct (v0.1, v0.2, v0.3) | Android (Mobile) |  |
| Mathstral-7B-Instruct-v0.1 | Android (Mobile) |  |
| Ministral-3 (3B, 8B) | Android (Mobile) |  |
| Gemma-2B-it | Android (Mobile) |  |
| Gemma-2-2B-it | Android (Mobile) |  |
| Gemma-3-1B-it | Android (Mobile) |  |

## How to create a quantized GGUF model with llama.cpp

- Build llama.cpp [from source](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md).
- Use the [convert_hf_to_gguf.py](https://github.com/ggml-org/llama.cpp/blob/master/convert_hf_to_gguf.py) script in the llama.cpp repo to create a float GGUF model.

```shell
# Choose one output type for --outtype: f32 or f16.
python convert_hf_to_gguf.py \
    --outfile <output_name>.gguf \
    --outtype <f32_or_f16> \
    <path_to_HF_model>
```

- Use the **llama-quantize** tool (located in `<build_folder>/bin`) to create a quantized GGUF model from the float GGUF model.

```shell
./llama-quantize \
    <float_model.gguf> \
    <quantized_model.gguf> \
    <quant_dtype>
```
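
The two steps above can be chained from a single script. The following is a minimal sketch rather than an official workflow: the function names and default paths are hypothetical, and only the command-line arguments shown on this page are used. It assumes the script runs from the llama.cpp repo root and that `llama-quantize` was built into the given `bin_dir`.

```python
import subprocess
import sys
from pathlib import Path


def convert_command(hf_model_dir, float_gguf, outtype="f16",
                    script="convert_hf_to_gguf.py"):
    """Build the argv list for the Hugging Face to float GGUF conversion."""
    # This page lists f32 and f16 as the float output types.
    if outtype not in ("f32", "f16"):
        raise ValueError("outtype must be 'f32' or 'f16'")
    return [sys.executable, str(script),
            "--outfile", str(float_gguf),
            "--outtype", outtype,
            str(hf_model_dir)]


def quantize_command(float_gguf, quantized_gguf, quant_dtype, bin_dir="."):
    """Build the argv list for llama-quantize (assumed to live in bin_dir)."""
    return [str(Path(bin_dir) / "llama-quantize"),
            str(float_gguf), str(quantized_gguf), quant_dtype]


def run_pipeline(hf_model_dir, float_gguf, quantized_gguf, quant_dtype,
                 outtype="f16", bin_dir="."):
    """Run both steps in order; raises CalledProcessError if either fails."""
    subprocess.run(convert_command(hf_model_dir, float_gguf, outtype),
                   check=True)
    subprocess.run(quantize_command(float_gguf, quantized_gguf, quant_dtype,
                                    bin_dir),
                   check=True)
```

A call such as `run_pipeline("path/to/hf_model", "model-f16.gguf", "model-q4.gguf", "Q4_K_M", bin_dir="build/bin")` would produce a quantized file; `Q4_K_M` is used here only as an example llama.cpp quantization type, and the file names are placeholders.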

Last Published: May 26, 2026
