# 0.8.0

## Release Information

- OS & Framework support: tested with the following configurations:
    - Non-Gen AI:
        - Frameworks: ONNX, TFLite, PyTorch
        - Host Platforms: Linux-x86\_64 (Ubuntu 22.04, Ubuntu 24.04), Linux-aarch64 (Ubuntu 24.04, Python 3.12), Windows-x86\_64 (10+), Windows-arm64 (10+)
        - Target Platforms: Linux-x86\_64 (Ubuntu 22.04, Ubuntu 24.04), Linux-aarch64 (Ubuntu 24.04), Windows-arm64 (10+), Android-arm64, QNX
    - Gen AI:
        - Frameworks: ONNX, GGUF
        - Host Platforms: Linux-x86\_64 (Ubuntu 22.04, Ubuntu 24.04)
        - Target Platforms: Android-arm64, Linux-aarch64
- QAIRT SDK: tested with versions 2.45.0, 2.46.0, and 2.47.0 (default)

Note: Other supported QAIRT SDK versions can be listed using `qairt-vm fetch --list`.

## Highlights

- Default QAIRT SDK updated to 2.47.0
- New features in QAIRT Dev Python APIs (compatible with QAIRT SDK 2.47.0+, unless noted otherwise)
    - Platform Support
        - Python 3.12 support for Linux-x86\_64 and Linux-aarch64 (Ubuntu 24.04)
        - Linux-aarch64 as dev environment: `qairt-dev` now supports end-to-end convert/compile/execute workflows natively on Linux-aarch64, similar to Linux-x86\_64 (Non-Gen AI models; see [ResNet50 Native Inference on arm-linux](https://docs.qualcomm.com/doc/80-87189-2/topic/native_inference.html#native-inference-oelinux))
    - Gen AI Builder Enhancements
        - `attach_model_for_arn()` on the HTP Gen AI builder attaches a child builder for prefill/decode (or any per-AR variant) that is compiled weight-shared with the parent; see [Advanced Features](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_advanced_features.html#genai-advanced-features) for details
        - Qwen3 (dense) model support in Gen AI Builder: `Qwen3ForCausalLM` architectures (e.g. `Qwen/Qwen3-4B`) are now handled by `GenAIBuilderFactory.create()` on the HTP backend
        - `engine_config` parameter on `LLMContainer.export()` and `LLMContainer.get_executor()` accepts an `EngineConfig` so that runtime engine settings (such as HTP `allow_async_init` and `enable_graph_switching`) can be tuned at execution time, separately from build-time configuration
        - `GenerationConfig` separates per-call generation parameters (`temperature`, `top_k`, `top_p`, `seed`, `greedy`, `max_num_tokens`) from build-time configuration, so sampling behavior can be varied per inference request without rebuilding the model
        - Human-readable builder cache: cache artifacts are now grouped under a per-model directory with `<operation>_<hash>` stage subdirectories, and each stage writes a `builder_cache_info.json` sidecar recording its source, configuration, and hash for offline inspection and selective stage replay (see [Understanding the Build Cache](https://docs.qualcomm.com/doc/80-87189-2/topic/genai_overview.html#genai-builder-cache))
    - Multimodal
        - Qwen2.5-VL image-to-text workflow on HTP: build a vision encoder and text generator into a single `WorkflowContainer` via `WorkflowBuilder` (using `GenAIBuilderFactory.create_vision_encoder()` and a `WorkflowGraph`) and run multimodal inference through `ImageT2TExecutor`
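
The human-readable builder cache listed under Gen AI Builder Enhancements lends itself to simple offline inspection. The following is a minimal sketch, using only the Python standard library, that walks one model's cache directory and loads each stage's `builder_cache_info.json` sidecar. The helper name `list_cache_stages` and the cache location passed to it are assumptions for illustration. The sidecar's field names are not listed in these notes, so the sketch returns the parsed JSON as-is rather than reading specific keys. It also assumes the hash contains no underscore, so the operation name is everything before the final underscore.

```python
import json
from pathlib import Path


def list_cache_stages(model_cache_dir):
    """Return (operation, hash, info) for each stage under one model's cache.

    Assumes stage directories are named `<operation>_<hash>` and that each
    contains a `builder_cache_info.json` sidecar. This is a hypothetical
    helper for illustration, not part of the QAIRT Dev API.
    """
    stages = []
    for sidecar in sorted(Path(model_cache_dir).glob("*/builder_cache_info.json")):
        # Split on the last underscore (assumes the hash has no underscore).
        operation, _, stage_hash = sidecar.parent.name.rpartition("_")
        with sidecar.open(encoding="utf-8") as f:
            info = json.load(f)  # field names are not assumed here
        stages.append((operation, stage_hash, info))
    return stages
```

Stage directories without a sidecar are skipped, which keeps the listing limited to stages that recorded their own metadata.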

## Resolved Issues

- Gen AI Builder multi-graph validation no longer rejects a child builder that sets `multi_graph=True` when its parent builder also has `multi_graph=True`. Previously this over-strict check blocked valid parent/child multi-graph configurations from building.

## Deprecation Notices

All items below will be removed once the default QAIRT SDK version moves past 2.48.

- The ONNX Model interface (`OnnxModel` and its public methods `OnnxModel.load()`, `OnnxModel.split()`, `OnnxModel.mha2sha_v2()`) is deprecated. It exists only for compatibility with the QAIRT SDK. Migrate to `GraphContext` together with the simple optimizer APIs (`convert_mha_to_sha`, `split_llm`, `adapt_moe`).
- `GenAIConfig.allow_async_init` and `GenAIConfig.enable_graph_switching` are deprecated. Set these through `EngineConfig.htp.allow_async_init` and `EngineConfig.htp.enable_graph_switching` and pass the `EngineConfig` to `get_executor()` / `T2TExecutor` instead.
- `GenAIExecutor` (`qairt.gen_ai_api.executors.gen_ai_executor`) is deprecated and has been renamed to `GenAIExecutable` (`qairt.gen_ai_api.executors.gen_ai_executable`). Update imports to the new module.
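
For code that has to run against both this release and earlier ones during the migration, one option is a small compatibility import for the renamed executor class. This is a sketch rather than an official recommendation: the two module paths and class names are taken from the notice above, the helper `load_gen_ai_executable` is hypothetical, and the sketch assumes the renamed class can be used wherever the old one was.

```python
import importlib


def load_gen_ai_executable():
    """Return the GenAIExecutable class, falling back to the deprecated name.

    Hypothetical helper for illustration. Returns None if neither module can
    be imported (for example, when the package is not installed).
    """
    candidates = (
        # New module and class name.
        ("qairt.gen_ai_api.executors.gen_ai_executable", "GenAIExecutable"),
        # Deprecated module and class name.
        ("qairt.gen_ai_api.executors.gen_ai_executor", "GenAIExecutor"),
    )
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return cls
    return None
```

Trying the new module first means the fallback stops being exercised as soon as the environment is upgraded, so the shim can be deleted once older releases no longer need support.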

## Known Issues

- Gen AI Builder is temporarily unsupported on Windows-arm64. Use Linux-x86\_64 as the host platform for Gen AI workflows.
- Models with batch size greater than 1
    - The `qnn-net-run` CLI and the Python API prepare input data differently for batch sizes greater than 1. The CLI automatically groups individual tensor paths from an input list into a batch, whereas the Python API requires inputs to be pre-batched (concatenated into a single raw file) before execution.
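
One way to prepare pre-batched input for the Python API is to concatenate the per-sample raw files in order. The sketch below uses only the standard library. It assumes every sample file holds one tensor of the same shape and data type in a contiguous layout with batch as the outermost dimension, so that byte-level concatenation yields a valid batch. The helper name `concat_raw_files` is hypothetical.

```python
from pathlib import Path


def concat_raw_files(sample_paths, output_path):
    """Concatenate per-sample raw tensor files into one pre-batched raw file.

    Assumes all samples share shape and dtype, and that the batch dimension
    is the outermost one, so appending bytes in order forms the batch.
    Returns the number of bytes written.
    """
    sample_paths = [Path(p) for p in sample_paths]
    sizes = {p.stat().st_size for p in sample_paths}
    if len(sizes) != 1:
        # Covers both an empty list and samples of mismatched size.
        raise ValueError(f"expected same-sized sample files, got sizes: {sorted(sizes)}")
    with open(output_path, "wb") as out:
        for path in sample_paths:
            out.write(path.read_bytes())
    return len(sample_paths) * sizes.pop()
```

The size check is a cheap guard against mixing samples of different shapes, which would otherwise produce a file that loads without error but is misaligned.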

Last Published: Jun 19, 2026
