# 0.7.0

## Release Information

- OS & Framework support: Tested with the following configurations:
  - Non-Gen AI:
    - Frameworks: ONNX, TFLite, PyTorch
    - Host Platforms: Linux-x86\_64 (Ubuntu 22.04), Linux-aarch\_64 (Ubuntu 22.04), Windows-x86\_64 (10+), Windows-arm64 (10+)
    - Target Platforms: Linux-x86\_64 (Ubuntu 22.04), Linux-aarch\_64 (Ubuntu 22.04), Windows-arm64 (10+), Android-arm64, QNX
  - Gen AI:
    - Frameworks: ONNX, GGUF
    - Host Platforms: Linux-x86\_64 (Ubuntu 22.04)
    - Target Platforms: Android-arm64, Linux-aarch64
- QAIRT SDK: Tested with versions 2.45.0, 2.46.0 (default), and 2.47.0

**Note:** To list the other supported QAIRT SDK versions, run `qairt-vm fetch --list`.

## Highlights

- Default QAIRT SDK updated to 2.46.0
- New features in the QAIRT Dev Python APIs (compatible with QAIRT SDK 2.46.0+ unless noted otherwise):
  - ONNX Optimizer: new passes and APIs
    - `convert_mha_to_sha`: typed convenience API for converting Multi-Head Attention (MHA) to Single-Head Attention (SHA). It replaces the legacy `OnnxModel.mha2sha_v2()` workflow with a single `GraphContext`-based call.
    - `split_llm`: typed convenience API for splitting an LLM ONNX model into a configurable number of graph splits, with optional embedding and LM-head extraction. It replaces `OnnxModel.split()`.
    - Consistent three-level public namespace for the ONNX optimizer: simple APIs under `qairt.optimizer.onnx`, pass classes under `qairt.optimizer.onnx.passes`, and configs and data classes under `qairt.optimizer.onnx.passes.config`.
  - Gen AI Builder enhancements
    - `GGUFCalibrator` generates activation-encoding calibrations for GGUF models. This enables tighter integer-kernel selection at compile time, improving on-device throughput and reducing latency (see [GGUF Calibration for Activation Encodings](https://docs.qualcomm.com/doc/80-87189-2/topic/gguf_calibration.html#gguf-calibration)). Requires QAIRT SDK 2.47.0+.
  - Resource Profiler
    - Programmatic `enable_profiling` / `disable_profiling` controls and a `profiling_scope` context manager scope memory and wall-clock measurements to specific code paths without restarting the workflow (see [Resource Profiler](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-resource-profiler.html#qairt-resource-profiler)).
    - Concurrent `@resource_profile` calls are now thread-safe, so parallel build stages can share the profiler store without dropping samples.
  - Logging
    - `QAIRTLogger.redirect_logger_to_file`: runtime API that redirects the log output of any registered area to a file without re-initializing the logging system.

## Resolved Issues

- Gen AI Builder LoRA workflow: `lora_adapter_bin` selection during workflow execution now resolves the correct adapter file. Previously, workflows could load the wrong adapter when multiple bin files were present, which required manually editing cache files (reported in 0.5.0).
- MHA2SHA conversion is now restricted to the self-attention start point for AR=1 Qwen3.5 hybrid-attention models. This eliminates spurious rewrites at non-self-attention nodes that previously caused build failures on these models.
- Gen AI Builder now applies `transform_options` before `native_kv` in the build stage. Previously, transform options were silently ignored when `native_kv` was enabled because they were applied after the KV-cache rewrites had already run.
- OE-Linux Gen AI execution no longer fails when pushing artifacts to devices with read-only fastrpc paths. The genie execution path now skips fastrpc folder creation when the target location is not writable.

## Deprecation Notices

- The ONNX Model interface (`OnnxModel` and its public methods `OnnxModel.load()`, `OnnxModel.split()`, `OnnxModel.mha2sha_v2()`) is deprecated. It exists only for compatibility with the QAIRT SDK and will be removed once the default QAIRT SDK version moves past 2.48. Migrate to `GraphContext` together with the simple optimizer APIs (`convert_mha_to_sha`, `split_llm`, `adapt_moe`).

## Known Issues

- Gen AI Builder is temporarily unsupported on Windows-arm64. Use Linux-x86\_64 as the host platform for Gen AI workflows.
- Models with batch size greater than 1:
  - The qnn-net-run CLI and the Python API prepare data differently for batch sizes greater than 1. The CLI automatically groups the individual tensor paths from an input list into a batch, whereas the Python API requires inputs to be pre-batched (concatenated into a single raw file) before execution.
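
Until this discrepancy is resolved, per-sample raw files can be pre-batched by hand before they are passed to the Python API. The following is a minimal sketch, not part of the QAIRT API: the helper name `concat_raw_inputs` and the file names are hypothetical. It assumes that each raw file holds one headerless sample of the same input tensor (same dtype and shape) and that batch is the outermost dimension, so concatenating the bytes in order yields the batched tensor.

```python
import os
from pathlib import Path
from typing import Sequence


def concat_raw_inputs(sample_paths: Sequence[str], out_path: str) -> int:
    """Concatenate per-sample raw tensor files into one pre-batched raw file.

    Hypothetical helper (not a QAIRT API). Assumes every file is one headerless
    sample of the same tensor with batch as the outermost dimension.
    Returns the number of samples written.
    """
    if not sample_paths:
        raise ValueError("need at least one sample file")

    # Cheap sanity check before touching the output: all samples of one tensor
    # must have the same byte size.
    sizes = {os.path.getsize(p) for p in sample_paths}
    if len(sizes) != 1:
        raise ValueError(f"sample files differ in size: {sorted(sizes)}")

    # Write samples back to back in list order; sample i becomes batch index i.
    with open(out_path, "wb") as out:
        for p in sample_paths:
            out.write(Path(p).read_bytes())
    return len(sample_paths)
```

For example, `concat_raw_inputs(["sample_0.raw", "sample_1.raw"], "batch_2.raw")` writes a two-sample batch to a single file. The size check catches obviously mismatched samples early, but it cannot detect a dtype or shape mismatch that happens to produce the same byte count.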

Last Published: Jul 08, 2026
