# 0.4.0

## Release Information

- OS & Framework support - Tested with the following configurations:
    - Non-Gen AI:
        - Frameworks: ONNX, TFLite, PyTorch
        - Host Platforms: Linux-x86\_64 (Ubuntu 22.04), Windows-x86\_64 (10+), Windows-arm64 (10+)
        - Target Platforms: Linux-x86\_64 (Ubuntu 22.04), Linux-aarch\_64 (Ubuntu 22.04), Windows-arm64 (10+), Android-arm64, QNX
    - Gen AI:
        - Frameworks: ONNX, GGUF
        - Host Platforms: Linux-x86\_64 (Ubuntu 22.04)
        - Target Platforms: Windows-arm64 (10+), Android-arm64
- QAIRT SDK - Tested with versions 2.42.0 and 2.43.0 (default)

Note

Other supported QAIRT SDK versions can be listed using `qairt-vm fetch --list`.

## Highlights

- Default QAIRT SDK updated to 2.43.0
- Using QAIRT SDK 2.43.0 adds the following features to the QAIRT Dev Python APIs:
    - Speculative decoding technique support
        - Added support for speculative decoding techniques: SSD, LADE, and Eaglet
        - See [Speculative Decoding Tutorial](https://docs.qualcomm.com/doc/80-87189-2/topic/speculative_decoding_tutorial.html#speculative-decoding-tutorial) for more details

## Known Issues

- Models with batch size greater than 1
    - A discrepancy exists between the `qnn-net-run` CLI and the Python API in how input data is prepared for batch sizes greater than 1. The CLI automatically groups individual tensor paths from an input list into a batch, whereas the Python API requires inputs to be pre-batched (concatenated into a single raw file) before execution.
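
As a workaround for the batching discrepancy described above, per-sample raw files can be concatenated into a single pre-batched file before they are handed to the Python API. The following is a minimal sketch, not part of the QAIRT SDK: the helper name `concat_raw_batch` and the file names are hypothetical, and it assumes that each `.raw` file is a flat row-major dump of one sample, that all samples share the same shape and data type, and that the batch dimension is the outermost dimension of the model input. Under those assumptions, batching reduces to byte-wise concatenation.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def concat_raw_batch(sample_paths: Iterable[PathLike], out_path: PathLike) -> int:
    """Concatenate per-sample .raw files into one pre-batched .raw file.

    Hypothetical helper (not a QAIRT API). Assumes every input file holds
    exactly one sample in row-major order and that the batch dimension is
    the outermost dimension. Returns the number of bytes written.
    """
    paths = [Path(p) for p in sample_paths]
    if not paths:
        raise ValueError("at least one sample file is required")

    # All samples must have the same byte size, otherwise the batch is ragged.
    expected_size = paths[0].stat().st_size
    written = 0
    with open(out_path, "wb") as out:
        for p in paths:
            data = p.read_bytes()
            if len(data) != expected_size:
                raise ValueError(
                    f"{p} has {len(data)} bytes, expected {expected_size}"
                )
            written += out.write(data)
    return written
```

For example, `concat_raw_batch(["sample_0.raw", "sample_1.raw"], "batch_2.raw")` would produce one file holding a batch of two samples. If the model's batch dimension is not the outermost one, plain concatenation is not sufficient and the data would need to be rearranged first.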

Last Published: Jul 08, 2026
