# Revision History


This page contains the revision history starting from QAIRT SDK v2.34.0. For earlier releases, refer to ReleaseNotes.txt in QNN\_SDK\_ROOT for the QNN revision history.

| Version | Date | Description |
| --- | --- | --- |
| 2.47.0 | May 2026 | <ul class="simple"><br><li><p>Core: Added support for the center_point_box parameter in the NonMaxSuppression Op. {171194}</p></li><br><li><p>Core: Added System Level Cache (SLC) allocator support to snpe-dlc-graph-prepare. {182200}</p></li><br><li><p>Core: Improved qairt-dlc-diff attribute comparison for otherwise identical Ops. {160768}</p></li><br><li><p>Genie: Added a dialog pause/resume tutorial to the SDK documentation. {167467}</p></li><br><li><p>Genie: Added support for linear attention. {165312}</p></li><br><li><p>Genie: Enabled the GPU engine on Windows. {134277}</p></li><br><li><p>LPAI: Introduced Hexagon Simulator package support for the QNN-LPAI backend, enabling development and validation without physical hardware. {133768}</p></li><br><li><p>Tool: Added LoRA model support to the QAIRT Accuracy Debugger, including LoRA-aware inference and accuracy debugging. {155624}</p></li><br><li><p>SDK: Added Ubuntu 24.04 as a supported host OS. {182575}</p></li><br><li><p>Tool: Added Python 3.12 support on Linux x86. For conversion, only the ONNX and TFLite frameworks are supported with Python 3.12. {182575}</p></li><br><li><p>Tool: Added ARM-Linux as a supported development host environment with Ubuntu 24.04 and Python 3.12 across qairt-* tools, including conversion, quantization, compilation, and accuracy debugging. {182575}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 and QNN_DATATYPE_UFIXED_POINT_8 for AvgPool2D, MaxPool2D, and L2Pool2D Ops. {176135}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 and QNN_DATATYPE_UFIXED_POINT_8 for the Concat Op. {177166}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 and QNN_DATATYPE_UFIXED_POINT_8 for the Gather Op. {176139}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for the Conv2d Op. 
{164369}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for ReLU Min-Max, Sigmoid, Hard Sigmoid, and PReLU Ops. {175169}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for the Softmax Op. {176136}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for the TransposeConv Op. {176098}</p></li><br><li><p>Op:HTP: Added support for INT16 Hadamard Transform. {161464}</p></li><br><li><p>Op:HTP: Added support for W2FP16a block quantization for Conv2d, FullyConnected, and MatMul Ops. {180118}</p></li><br><li><p>Op:HTP: Added support for W4FP16a block quantization for FullyConnected and MatMul Ops. {179184}</p></li><br><li><p>Tool:Converter: Added the AdjustNorm optimization pass to the converter. {161626}</p></li><br><li><p>Tool:Converter: Added the expand_elementwise_op_structure optimization pass to the converter. {166493}</p></li><br><li><p>Tool:Converter: Added the ExpandInverse optimization pass to the converter. {168752}</p></li><br><li><p>Tool:Converter: Added the Squash Constant Transpose optimization pass to the converter. {177494}</p></li><br><li><p>Tool:Converter: Improved LPBQ quantization handling for MatMul Ops with transB=1. {165302}</p></li><br><li><p>Core: Fixed an SNPE segfault that occurred when memory-mapped I/O buffers were requested during graph preparation but not registered during inference. {168054}</p></li><br><li><p>Core: Fixed elevated power consumption that could occur after inference in certain configurations. {180830}</p></li><br><li><p>Core: Fixed the SNPE HMX timeout configuration by adding Perf API controls for the timeout interval. {182001}</p></li><br><li><p>DSP: Fixed a race condition in the HexNNv2 backend performance driver that could cause a SIGSEGV during SNPE instance teardown or runtime performance profile switching. {163626}</p></li><br><li><p>Genie: Fixed an issue where backend extension configs with multiple contexts were not correctly propagated. 
{181206}</p></li><br><li><p>Genie: Fixed an issue where KV cache rewind did not work correctly with an embedding LUT. {152694}</p></li><br><li><p>Genie: Fixed an issue where the pipeline text generator state was not fully reset. {177268}</p></li><br><li><p>GPU: Fixed an inference failure with the Softmax Op. {180133}</p></li><br><li><p>GPU: Fixed a context creation failure when using tuning mode. {177871}</p></li><br><li><p>GPU: Fixed timeout errors that could occur during inference for certain models. {141256}</p></li><br><li><p>HTP: Fixed a cDSP crash in the Softmax Op implementation observed during inference. {178668}</p></li><br><li><p>HTP: Fixed a stability issue caused by an undersized DDR save buffer. {181281}</p></li><br><li><p>HTP: Fixed an accuracy issue with certain elementwise multiplication Ops. {179412}</p></li><br><li><p>HTP: Fixed an indefinite hang when two DLCs were used in parallel to initialize two independent contexts. {180809}</p></li><br><li><p>HTP: Fixed an issue where LoRA weight sharing RAM preload could trigger unexpected callback behavior and invalid offset handling when loading contexts through the callback-based flow. {183762}</p></li><br><li><p>HTP: Fixed slower-than-expected inference performance for certain encoder models. {176492}</p></li><br><li><p>Tool:Converter: Fixed a Convert Op issue in the mixed-precision stage. {165230}</p></li><br><li><p>Tool:Quantizer: Fixed the default bias_bitwidth setting in qairt-quantizer. {167357}</p></li><br></ul> |
| 2.46.0 | Apr 2026 | <ul class="simple"><br><li><p>Genie: Added a genie-app command to clear the command history. {177605}</p></li><br><li><p>Genie: Added genie-app dialog command options for setOemKey and setPriority. {127420}</p></li><br><li><p>Genie: Added the GENIE_NODE_WILDCARD option for connecting hidden states between nodes in a pipeline. {165273}</p></li><br><li><p>Genie: Added GenieDialog_releaseLoraMemory to release memory allocated for a LoRA. Added GENIE_DIALOG_PARAM_APPLIED_LORA_ADAPTER to retrieve the actively applied LoRA. {138865}</p></li><br><li><p>GPU: Added documentation for the GPU offline prepare feature for Android targets. {171008}</p></li><br><li><p>HTP: Enabled VTCM virtual address optimization in the HTP backend. {168583}</p></li><br><li><p>API:GPU: Added support for aliasing memory objects. {171571}</p></li><br><li><p>Op:GPU: Added INT8 support for activation functions, including the ReLU Op. {167736}</p></li><br><li><p>Op:GPU: Added INT8 support for the StridedSlice Op. {171906}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_INT_32 inputs to the ElementwiseNeuron Op. {172297}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_4 for the Gather Op. {167304}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 and QNN_DATATYPE_UFIXED_POINT_8 for the Resize Op with linear interpolation mode. {171905}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for the ElementwiseBinary Op with the ADD and MINIMUM operations. {167734}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for MatMul and FullyConnected Ops. {171909}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_SFIXED_POINT_8 for the ReduceMean Op. {175155}</p></li><br><li><p>Op:GPU: Added support for QNN_OP_RESIZE_TRANSFORMATION_MODE_PYTORCH_HALF_PIXEL in the Resize Op. {169077}</p></li><br><li><p>Op:HTP: Added support for W4FP16a asymmetric block quantization for FullyConnected and MatMul Ops. 
{178535}</p></li><br><li><p>Tool:Converter: Added optimization passes to improve model conversion performance and efficiency. {161663}</p></li><br><li><p>Tool:Converter: Added RandomNormalLike Op support. {147408}</p></li><br><li><p>Genie: Fixed an issue where GeniePipeline_reset did not reset the KV cache state. {178827}</p></li><br><li><p>Genie: Fixed an issue where the genie-app async command did not work inside a loop statement. {147298}</p></li><br><li><p>Genie: Fixed an issue where the token generation rate was reported as too high when the number of generated tokens was small. {159119}</p></li><br><li><p>HTP: Fixed a crash and improper abort handling that could occur during init cancellation when using the signal feature. {177676}</p></li><br><li><p>HTP: Fixed a latency issue when migrating models across QNN versions. {168926}</p></li><br><li><p>HTP: Fixed a performance issue when changing performance profiles after initialization from higher to lower profiles. {169760}</p></li><br><li><p>HTP: Fixed a performance issue with LayerNorm by removing redundant transpose operations. {172195}</p></li><br><li><p>HTP: Fixed a performance issue with transpose and block-transform Op sequences. {171175}</p></li><br><li><p>HTP: Fixed an accuracy issue with certain elementwise multiplication Ops. {179412}</p></li><br><li><p>HTP: Fixed an accuracy issue with LoRA models using weight sharing by temporarily disabling shared memory pooling for graphs with far weights. {168730}</p></li><br><li><p>HTP: Fixed an accuracy issue with Relu6 quantization by adding a validation warning when Conv-Relu fusion is disabled due to out-of-range encodings. {166898}</p></li><br><li><p>HTP: Fixed an execution performance issue for certain depthwise convolution patterns. {170951}</p></li><br><li><p>HTP: Fixed an execution performance issue for FP16 Argmax and Argmin operations. {171176}</p></li><br><li><p>HTP: Fixed an inference time issue affecting FP16 models. 
{170721}</p></li><br><li><p>Tool: Fixed an issue in qnn-context-binary-generator where calling finish() would unexpectedly overwrite the input DLC file. {178907}</p></li><br><li><p>Tool: Fixed an issue where IR graphs were stripped from the output DLC when using qnn-context-binary-generator with multiple input DLCs. {176055}</p></li><br><li><p>Op:CPU: Fixed an integer underflow in output dimension calculations for AvgPool2D and GlobalAveragePool Ops, which could cause validation failures with certain input and filter size combinations. {169544}</p></li><br><li><p>Op:GPU: Fixed the Split Op to support input rank 5 with axis 0. {151892}</p></li><br><li><p>Tool:Converter: Fixed GRU translation to correctly propagate quantization encodings. {174425}</p></li><br><li><p>Tool:Converter: Fixed incorrect initial_h and initial_c handling for LSTM. {174333}</p></li><br><li><p>Tool:Converter: Fixed incorrect offset computation during float fallback. {161782}</p></li><br></ul> |
| 2.45.0 | Mar 2026 | <ul class="simple"><br><li><p>Genie: Removed the QnnGenAITransformerBE OpPackage dependency. The QnnGenAiTransformerCpuOpPkg library has been removed from the SDK. {157103}</p></li><br><li><p>GPU: Added support for offline graph preparation using the Adreno Offline Compiler for select Android devices. {168225}</p></li><br><li><p>HTP: Added support for updateable quantized tensors in TransposeConv2d. {151571}</p></li><br><li><p>API:GPU: Added support for asynchronous dispatch using semaphore file descriptors via the new QnnGpuGraph_FenceConfig_t structure. {148590}</p></li><br><li><p>API:GPU: Added support for the QnnGraph_executeAsync API. {9307}</p></li><br><li><p>Op:GPU: Added INT8 support (QNN_DATATYPE_SFIXED_POINT_8 and QNN_DATATYPE_UFIXED_POINT_8) for the DepthToSpace Op. {167735}</p></li><br><li><p>Op:GPU: Added QNN_DATATYPE_SFIXED_POINT_8 and QNN_DATATYPE_UFIXED_POINT_8 support for the Transpose Op. {165347}</p></li><br><li><p>Op:HTP: Added support for INT32 ScatterElements with the None reduction type. {171658}</p></li><br><li><p>Op:HTP: Added support for signed INT8 and INT16 data types for the ScatterND Op. {125562}</p></li><br><li><p>Tool:Converter: Added a graph optimization pass to expand GRU Ops into primitive operations. {161504}</p></li><br><li><p>Tool:Converter: Added a graph optimization pass to simplify negation patterns. {161579}</p></li><br><li><p>Tool:Converter: Added a graph optimization pass to unroll multi-timestep GRU Ops into single-timestep GRU Ops. {161507}</p></li><br><li><p>Tool:Converter: Added a graph optimization pass to unroll multi-timestep LSTM Ops into single-timestep LSTM Ops. {161509}</p></li><br><li><p>Genie: Fixed a segmentation fault in the GenAiTransformer backend when using the kv-share and engine-share features together. {166506}</p></li><br><li><p>HTP: Fixed a rare process hang that could occur during HTP device cleanup when a DSP subsystem error was encountered. 
{167096}</p></li><br><li><p>HTP: Fixed an HVX thread configuration mismatch between offline and on-device graph preparation. {171561}</p></li><br><li><p>Tool: Fixed the snpe-dlc-diff weight comparison behavior when either DLC has stripped weights. {165305}</p></li><br><li><p>Tool:Converter: Fixed a converter failure for some PyTorch models when running the remove-disconnected-nodes optimization pass on an empty graph. {168257}</p></li><br><li><p>Tool:Converter: Fixed a LoRA import failure by adding the missing disable_defer_loading argument in the converter module. {168661}</p></li><br><li><p>Tool:Converter: Fixed a model conversion failure for some BF16 models by marking QNN_DATATYPE_FLOAT_16 as a quantizable datatype. {168562}</p></li><br><li><p>Tool:Converter: Resolved an issue where model conversion would fail during BatchNormalization Op validation for some BF16 models. {167399}</p></li><br><li><p>Tool:Converter: Resolved an issue where overridden encodings did not take effect for RmsNorm matching. {166943}</p></li><br><li><p>Tool:Quantizer: Fixed an accuracy regression in the quantizer tool when using the --adjust_bias_encoding flag on a pre-quantized model. {167796}</p></li><br><li><p>Tool:Quantizer: Fixed incorrect 32-bit offset shifting for static tensors. {168056}</p></li><br></ul> |
| 2.44.0 | Feb 2026 | <ul class="simple"><br><li><p>Core: Added --deferred_init support to SNPE libs and snpe-net-run. {158344}</p></li><br><li><p>CPU: Enhanced MatMul kernel reliability on ARM Cortex-A55 cores by rerouting to an optimized implementation. {165001}</p></li><br><li><p>Genie: Added the <cite>genie-app</cite> source code as an example in the SDK. {123560}</p></li><br><li><p>OpDef: Added the Op definition for ElementWiseMux. {166667}</p></li><br><li><p>API:Genie: Added the GenieNode_train and GenieNode_saveLora APIs. {155439}</p></li><br><li><p>Op:HTP: Added support for the ONNX RotaryEmbedding (RoPE) Op. {147230}</p></li><br><li><p>Op:HTP: Enhanced PoolMax2d Op edge-window handling when <cite>rounding_mode</cite> is set to <cite>ceil</cite>. {160577}</p></li><br><li><p>Tool:Converter: Added a new graph optimization for the GroupNorm Op that reshapes wide tensors to a more performant format, improving HTP performance. {159606}</p></li><br><li><p>Tool:Converter: Implemented a new <cite>remove_disconnected_nodes</cite> optimization pass to automatically prune unused nodes from the graph, which can help reduce model size. {158568}</p></li><br><li><p>Genie: Fixed an issue where the <cite>genie-app</cite> tool could return a success code even if an underlying Genie API call failed. {166170}</p></li><br><li><p>Genie: Fixed an issue where the token query API would always fail for dialogs with embedding LUT encoders. {167327}</p></li><br><li><p>GPU: Fixed graph prepare failures on certain Adreno GPU tiers caused by incorrect compatibility checks for image2darray. {165815}</p></li><br><li><p>HTP: Fixed a timeout in multi-threaded scenarios caused by a resource hang during cooperative pre-emption. {164978}</p></li><br><li><p>HTP: Fixed an issue where grouped LoRA adapter application failed on WoS. {161545}</p></li><br><li><p>LPAI: Fixed incorrect per-layer profiling time data caused by timestamp truncation. 
{165829}</p></li><br><li><p>SDK: Fixed an issue in the QNN sample app where graph finalization was unconditionally called on deserialized graphs, causing failures on backends that do not support this step. {163438}</p></li><br><li><p>SNPE: Fixed a crash that occurred when running the SNPE sample application on certain hardware targets. {165960}</p></li><br><li><p>Op:HTP: Fixed a context binary creation failure caused by missing INT16 to INT32 casting support in the Cast Op. {152878}</p></li><br><li><p>Op:HTP: Fixed an accuracy issue in the GatherND kernel. {163397}</p></li><br><li><p>Tool:Converter: Fixed a data type inference issue affecting multiple Ops during conversion, particularly when using 16-bit float precision. {164079}</p></li><br><li><p>Tool:Converter: Fixed an issue in the <cite>FoldMultipleTranspose</cite> pass where consecutive Transpose Ops were incorrectly pruned when their combined permutation was not an identity. {164340}</p></li><br><li><p>Tool:Converter: Fixed an issue in the <cite>matmul_to_fc</cite> optimization where the buffer shape of the bias was not correctly updated, causing conversion failures. {164520}</p></li><br><li><p>Tool:Converter: Fixed an issue in the Cast Op translation where BF16 data types were not handled correctly. {164517}</p></li><br><li><p>Tool:Converter: Fixed an issue where a graph output was incorrectly pruned by the <cite>remove-disconnected-nodes</cite> pass after LSTM unrolling. {166306}</p></li><br><li><p>Tool:Converter: Fixed an issue with Concat Op quantization where its output bit-width was incorrectly calculated, leading to model conversion failures. 
{159401}</p></li><br><li><p>Tool:Converter: Resolved an issue where model conversion would fail during BatchNormalization Op validation for some BF16 models. {165682}</p></li><br><li><p>Tool:Quantizer: Fixed a regression in a graph optimization pass that caused model conversion failures for models with a Transpose Op of 7 or more dimensions with non-consecutive axes. {164567}</p></li><br></ul> |
| 2.43.0 | Jan 2026 | <ul class="simple"><br><li><p>API:HTP: Introduced a new Python API, <cite>tuner.optimize</cite>, to enable compiler option tuning to control tiling granularity for HTP backends. {127018}</p></li><br><li><p>CPU: Added support for the RoPE Op in the CPU backend. {147231}</p></li><br><li><p>Docs: Added documentation with instructions for executing models on the LPAI backend on Windows on Snapdragon (WoS) platforms. {156078}</p></li><br><li><p>Genie: Added support for weight-shared LoRA adapters via the <cite>weight-shared-lora</cite> JSON configuration option. {155687}</p></li><br><li><p>Genie: Added the GenieDialog_getValue API and the GENIE_DIALOG_PARAM_CONTEXT_OCCUPANCY option. {145722}</p></li><br><li><p>Genie: Added the <cite>dialog rewindQuery</cite> and <cite>dialog setStopSequence</cite> commands to the <cite>genie-app</cite> tool. {163223}</p></li><br><li><p>Op:HTP: Added support for the 5D GeLU Op for FP16 and FP32 data types on the HTP backend. {138861}</p></li><br><li><p>Op:LPAI: Added support for 32-bit integer data types for the Split and Concat Ops on the v5 platform. {164355}</p></li><br><li><p>Op:LPAI: Added support for 32-bit integer data types for the Split and Concat Ops on the v6 platform. {164353}</p></li><br><li><p>Tool:Converter:ONNX: Added support for the BF16 datatype in the converter. {151133}</p></li><br><li><p>Tool:Genie: Added the ability to write output to a file for <cite>pipeline execute</cite> commands in the <cite>genie-app</cite> tool. {162608}</p></li><br><li><p>Tool:Quantizer: Added support for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, and Q8_0 GGUF data types for quantization on the HTP backend. {129186}</p></li><br><li><p>Genie: Fixed a segmentation fault that could occur when using the save/restore KV cache feature with the GPU backend. 
{158549}</p></li><br><li><p>Genie: Fixed an issue in the QNN GenAiTransformer engine’s GGUF file reader that could cause parsing errors and segmentation faults. {163156}</p></li><br><li><p>HTP: Fixed an accuracy regression for the ReduceSum Op when using FP16 precision on certain hardware targets. {163097}</p></li><br><li><p>Op:CPU: Added INT8 and UINT8 support to CPU FP32 Transpose execution. {155986}</p></li><br><li><p>Op:HTP: Added support for casting from INT16 to INT32 data types for the Cast Op on the HTP backend. {152878}</p></li><br><li><p>SDK: Fixed an issue in the SNPE Android sample APK where repeatedly building a network with User-Defined Operations (UDOs) could cause a crash or an “Invalid OpPackage” error. {163340}</p></li><br><li><p>Tool:Converter: Added support for the <cite>remove_unused_inputs</cite> parameter to the converter Python API, matching existing command-line functionality. {151089}</p></li><br><li><p>Tool:Converter: Fixed a model context saving failure by ensuring quantization encodings are preserved for the RMSNorm gamma parameter when it has multiple consumers. {154731}</p></li><br><li><p>Tool:Converter: Fixed an accuracy issue where quantization encodings were dropped during optimizations involving a static Slice Op. {154605}</p></li><br><li><p>Tool:Converter: Fixed an issue in the Logit Op’s data type inference that could cause Op validation failures on the backend, particularly when using a separate quantization step. {153054}</p></li><br><li><p>Tool:Converter: Fixed an issue in the ONNX converter where an incorrect graph optimization would fold a <cite>Mul -&gt; Softmax</cite> pattern, causing accuracy degradation in certain models. {116260}</p></li><br><li><p>Tool:Converter: Fixed an issue where the <cite>qairt-lora-adapter-bin-updater</cite> tool would fail when processing models that did not contain multiple graph splits. 
The tool’s logic for determining graph execution order now correctly handles models with a single graph. {161292}</p></li><br><li><p>Tool:Quantizer: Fixed a regression in a graph optimization pass that caused model conversion failures for models with a Transpose Op of 7 or more dimensions with non-consecutive axes. {164567}</p></li><br></ul> |
| 2.42.0 | Dec 2025 | <ul class="simple"><br><li><p>API:HTP: Added support for a monolithic LSTM feature, configurable via a new graph option, to improve performance for certain LSTM model structures. {146918}</p></li><br><li><p>API:SNPE: Added a new HTP backend option, <cite>monolithic_lstm</cite> (default: false). This flag can be controlled during offline preparation with <cite>snpe-dlc-graph-prepare</cite> or during online preparation through platform configuration options. {161369}</p></li><br><li><p>Docs: Added documentation for LPAI core control. {155151}</p></li><br><li><p>Docs: Updated the LPAI Sample App tutorial to remove an incorrect signing step for Android artifacts that could break the executables. {158732}</p></li><br><li><p>Genie: Added GenieDialog_tokenQuery support in genie-app. {135259}</p></li><br><li><p>Genie: Added a configuration option for controlling the KeyDiff anchor weight. {146125}</p></li><br><li><p>Genie: Added look-ahead decoding dialog support for GenieDialog_tokenQuery. {160627}</p></li><br><li><p>Genie: Added support for 16 KB page sizes in the Android sample application build to ensure compatibility with Android 15. {160693}</p></li><br><li><p>Genie: Added support for YaRN RoPE. {138733}</p></li><br><li><p>Genie: Added support for cross-attention. {154798}</p></li><br><li><p>Genie: Added support for linear RoPE. {157494}</p></li><br><li><p>Op:CPU: Resolved an issue that caused mismatched output for the FullyConnected Op when using a dynamic bias by correcting the kernel selection logic. {160281}</p></li><br><li><p>Op:HTP: Added support for 3D inputs for the LSTM Op. {143117}</p></li><br><li><p>Op:HTP: Added support for the INT16 data type for the ElementWiseDivide Op. {152779}</p></li><br><li><p>Op:HTP: Added support for the INT16 data type for the ElementWiseMaximum Op. {152780}</p></li><br><li><p>SDK: Updated the QNN sample apps to optionally compose a graph from a DLC file. 
{157809}</p></li><br><li><p>Tool: The qnn-context-binary-generator tool now includes a new profiling event to capture the time taken for finalizing a graph after tensor updates. {156608}</p></li><br><li><p>Tool:Converter: Added layout transformation support for IsInf, Convert, Dequantize, Quantize, CombinedNms, and Unpack Ops. {147648}</p></li><br><li><p>Tool:Converter: Added support for using the Quantizer v2 engine for calibration via the <cite>--use_quantize_v2</cite> flag in <cite>qnn-onnx-converter</cite>. {142674}</p></li><br><li><p>Tool:Converter: During quantization, the Buffer Op now copies the quantization encodings from its input to its output to ensure consistency. {153288}</p></li><br><li><p>CPU: Fixed a memory leak on the CPU backend by ensuring that memory allocated by the underlying cpuinfo library is properly de-initialized. {147345}</p></li><br><li><p>Core: Fixed a race condition during the de-initialization sequence that could occur when multiple SNPE instances were used in different threads. {161975}</p></li><br><li><p>Genie: Fixed a bug in perplexity calculations with FP32 models or tokenized inputs. {154447}</p></li><br><li><p>Genie: Fixed an issue that caused a build error in the sample code on Windows. {160178}</p></li><br><li><p>Genie: Fixed an issue where grouped LoRA adapters would fail to be applied. {154778}</p></li><br><li><p>HTP: Fixed a Windows-specific execution failure for low-priority graphs that occurred when burst mode, high-performance mode, or RPC polling was enabled. {142747}</p></li><br><li><p>HTP: Fixed a bug in the HTP backend where an ArgMax kernel could return an uninitialized index value under certain data conditions. {159115}</p></li><br><li><p>HTP: Improved window partition operations and multi-batch single-head attention in some transformer models. {148501}</p></li><br><li><p>HTP: Fixed a prepare failure caused by a large broadcast Op. 
{157143}</p></li><br><li><p>Op: Added 6D support for INT32 Mul and 5D support for INT32 Pow with broadcasting. {159296}</p></li><br><li><p>Op:CPU: Fixed an accuracy issue with the ReluMinMax Op on ARMv7 (32-bit) devices. {155120}</p></li><br><li><p>Op:CPU: Fixed an issue in the ElementWiseMultiply Op. {160547}</p></li><br><li><p>Op:HTP: Fixed a graph finalization failure for 6D StridedSlice when the batch dimension equals one. {159012}</p></li><br><li><p>Op:HTP: Fixed an issue with the conversion of 5D PReLU to 4D for FP16 models. {158971}</p></li><br><li><p>Tool:Converter: Enabled the MatMul + Add fusion for the LPAI backend, which squashes the pattern into a single MatMul Op with bias. {155659}</p></li><br><li><p>Tool:Converter: Fixed a TFLite model conversion failure caused by incorrect handling of quantization offsets in pre-quantized models. {157117}</p></li><br><li><p>Tool:Converter: Fixed a model context saving failure by ensuring quantization encodings are preserved for the RMSNorm gamma parameter when it has multiple consumers. {154731}</p></li><br><li><p>Tool:Converter: Fixed an ‘index out of bounds’ error in the <cite>fold-concat</cite> graph optimization pass. {148528}</p></li><br><li><p>Tool:Converter: Fixed an issue where GGUF model conversion failed during context binary creation due to an incorrect datatype and quantization schema. {156493}</p></li><br><li><p>Tool:Converter: Fixed an issue where using the <cite>--preserve_io</cite> flag with a quantized ONNX model could cause an unexpected <cite>Convert</cite> Op to be added to the graph output. {137974}</p></li><br><li><p>Tool:Converter: Fixed a qairt-quantizer dequantization failure when the graph output tensor has consumers. {152814}</p></li><br></ul> |
| 2.41.0 | Nov 2025 | <ul class="simple"><br><li><p>SNPE Core: Added support for the new “qhas” profiling level for advanced HTP chrometrace profiling. The workflow is documented under Benchmarking and Accuracy, Benchmarking, QHAS profile. {144800}</p></li><br><li><p>SNPE Core: Added the <cite>--use_native_input_files</cite> argument to <cite>snpe-throughput-net-run</cite>, similar to snpe-net-run. {149784}</p></li><br><li><p>API: Added support for the BF16 datatype to the core QNN API. {151153}</p></li><br><li><p>API:Genie: Added a new C/C++ API (<cite>GenieAccuracy.h</cite>) for calculating accuracy metrics for models. {115211}</p></li><br><li><p>Genie: Added multimodal RoPE support. {145055}</p></li><br><li><p>HTP: Added support for the BF16 datatype for runtime execution on the HTP backend. {152435}</p></li><br><li><p>HTP: Improved performance of the MobileBERT model. {155606}</p></li><br><li><p>Op:HTP: Added support for converting tensors from <cite>sfxp16</cite> to <cite>fp16</cite> datatype within the Convert Op on the HTP backend. {157470}</p></li><br><li><p>QNN Core: Fixed error logs at qnn-net-run exit related to destroying the power config ID on some platforms. {146296}</p></li><br><li><p>SNPE Core: Fixed a backward compatibility issue where using a cached model with a resized input dimension could lead to incorrect predictions. {144435}</p></li><br><li><p>SNPE DSP: Fixed an issue where models with a 5D Split Op would fail on the DSP backend. {151456}</p></li><br><li><p>API:Python: Fixed an issue in the LoRA model transformation process where the optimizer would cause a <cite>KeyError</cite> when searching for LoRA encodings in a model split that contained no updatable tensors. {157941}</p></li><br><li><p>CPU: Resolved an accuracy issue with the MatMul Op for quantized models on certain targets. 
{149047}</p></li><br><li><p>Core: Resolved an issue in <cite>qnn-net-run</cite> where continuous profiling for the LPAI backend could incorrectly report a minimum execution time of zero. {153294}</p></li><br><li><p>DSP: Fixed a bug in the DSP backend that caused model execution to fail for certain depthwise convolution patterns that had zero left padding. {150291}</p></li><br><li><p>GPU: Resolved an inference failure in the Conv2d Op on some targets. {156699}</p></li><br><li><p>GenAiTransformer: Enabled log support for custom Ops. {147362}</p></li><br><li><p>Genie: Fixed an issue that could cause poor-quality output from certain 8-bit quantized decoder models by improving the detection logic for the prefill stage of the decoder. {156683}</p></li><br><li><p>Genie: Fixed an issue where the HTP engine did not mmap serialized binaries when the use-mmap flag was enabled. {157330}</p></li><br><li><p>Genie: Fixed an issue with building the example code on Windows platforms. {154894}</p></li><br><li><p>HTP: Fixed a LoRA-related checksum issue that occurred during context binary generation for models with a QINT32 zero-point bias. {155550}</p></li><br><li><p>HTP: Fixed a segmentation fault that could occur during context binary generation when using the <cite>concurrent_deserialize_patch=measure</cite> option for very small graphs. {153476}</p></li><br><li><p>HTP: Fixed an accuracy issue with the Dequantize Op when converting from uint8 to fp16 on the HTP backend. {154787}</p></li><br><li><p>HTP: Fixed an issue where context binary generation would fail for models with auxiliary graphs. {155944}</p></li><br><li><p>HTP: Improved the <cite>qnn-context-binary-generator</cite> tool by adding an informational log message. {137064}</p></li><br><li><p>HTP: Resolved a rare race condition that could cause a crash when executing graphs in asynchronous mode across multiple threads. {152851}</p></li><br><li><p>Op:CPU: Improved the performance of the FullyConnected Op. 
{152835}</p></li><br><li><p>Op:CPU: Resolved an integer overflow issue in the tensor size calculation for the MatMul Op, which could cause execution failures. {152348}</p></li><br><li><p>Op:HTP: Added support for fusing <cite>Conv3D</cite> with <cite>Relu</cite> and <cite>ElementWiseNeuron</cite> operations on the HTP backend. {152639}</p></li><br><li><p>Op:HTP: Added support for the <cite>Conv3D</cite> + <cite>ReLU</cite> supergroup (fused operation) on the HTP backend. This resolves converter errors that previously occurred when quantizing models with this pattern. {148609}</p></li><br><li><p>Op:HTP: Improved the accuracy of the FP16 LogSoftmax Op for inputs with large variance. {148540}</p></li><br><li><p>Op:HTP: Resolved a corner-case failure during the creation of GroupedConv2D layers. {156026}</p></li><br><li><p>Op:HTP: Resolved a numerical accuracy issue with the Pad Op on the HTP backend. {154337}</p></li><br><li><p>Op:HTP: Resolved an execution failure for stateful FP16 LSTM models when the reset tensor was null. {146825}</p></li><br><li><p>SDK: Fixed a compilation error in the LPAI sample application for Android builds. {149435}</p></li><br><li><p>SDK: Improved the performance of the <cite>qnn-model-lib-generator</cite> tool on Windows. {136223}</p></li><br><li><p>SDK: Resolved an issue where an asynchronous graph execution failure could lead to incorrect error codes and incomplete resource cleanup. {156488}</p></li><br><li><p>Tool: Fixed a bug in the model preparation step where custom Op detection would fail. 
{154956}</p></li><br><li><p>Tool: Resolved an issue where the <cite>qairt-dlc-info</cite> tool failed on DLC files that were updated using <cite>qnn-context-binary-generator</cite>.<br>{153977}</p></li><br><li><p>Tool: The <cite>qairt-lora-model-creator</cite> tool now correctly handles LoRA weight shapes when attached to a Grouped Convolution layer.<br>{148214}</p></li><br><li><p>Tool:Converter: Fixed an issue in the TFLite converter where fused activations for TransposeConv2D and TransposeConv3D Ops were not<br>being handled correctly. {150806}</p></li><br><li><p>Tool:Converter: Resolved a GRU quantization issue by ensuring the Op uses its specific quantization logic instead of a common<br>default. {151608}</p></li><br><li><p>Tool:Converter: Resolved an LSTM-related accuracy issue caused by an incorrect schema setting in the quantization optimization module.<br>{155539}</p></li><br><li><p>Tool:Converter: Resolved an issue where the output layout was not preserved when using the <cite>--preserve_io</cite> option for ONNX models<br>ending with a Softmax Op. {150766}</p></li><br></ul> |
| 2.40.0 | Oct 2025 | <ul class="simple"><br><li><p>API:SNPE: Added two new HTP backend options, <cite>advanced_activation_fusion</cite> (default: true) and <cite>high_precision_sigmoid</cite> (default:<br>false). These flags can be controlled during offline preparation with <cite>snpe-dlc-graph-prepare</cite> or during online preparation through<br>platform configuration options. {153694}</p></li><br><li><p>CPU: Added context binary support to the QNN CPU backend. {142754}</p></li><br><li><p>Genie: Aligned the attention mask behavior for Sliding Window Attention layers with Hugging Face implementations to improve model<br>compatibility. {151861}</p></li><br><li><p>HTP: Added support for multi-graph switching on HNRD. {147026}</p></li><br><li><p>SNPE: Added new low-level performance voting corners, TURBO_L4 and TURBO_L5, for HTP. {152587}</p></li><br><li><p>SNPE: Added support for QMX in the CPU runtime via the new SNPE Builder API SNPEBuilder::setCpuQmxMode() /<br>Snpe_SNPEBuilder_SetCpuQmxMode(). A new flag, --enable_cpu_qmx, has also been added to the net-run apps. {150785}</p></li><br><li><p>Tool:Converter: Added support for signed asymmetric quantization in the ONNX converter and quantizer. {111082}</p></li><br><li><p>Tool:Quantizer: Added support for Quantizer v2 for calibration in the QNN Quantizer tool, enabled via the <cite>--use_quantize_v2</cite> flag.<br>{144453}</p></li><br><li><p>Tool:qairt-accuracy-debugger: Enhanced the accuracy debugger to allow comparison between two different backend configurations. This<br>feature uses the one-shot algorithm and supports starting from pre-compiled DLC files. {149818}</p></li><br><li><p>Added the advanced_activation_fusion flag to enable or disable the fusion of certain activations with a preceding convolution layer. {147374}</p></li><br><li><p>QNN Core: Fixed memory leaks in qnn-throughput-net-run during the cleanup phase.
{145228}</p></li><br><li><p>API: Resolved an IR graph serialization failure that occurred when converting certain LoRA models using the Python API. {151851}</p></li><br><li><p>CPU: Added support for the Convert Op with float input and output data types. {146089}</p></li><br><li><p>CPU: Fixed the LSTM flow to save the input_gate, forget_gate, cell_gate, output_gate, and hidden_state values from scratch memory to<br>dedicated output memory. {149330}</p></li><br><li><p>CPU: Fixed padding for the count_pad_for_edges parameter. {143281}</p></li><br><li><p>CPU: Fixed the cell gate calculation to preserve scratch memory for the final LSTM output. {149464}</p></li><br><li><p>Core: Fixed a bug in the snpe-dlc-diff and qairt-dlc-diff tools when the --compare_layers argument is passed. {150517}</p></li><br><li><p>Core: Resolved warnings related to the priority management library in HTP non-RPC mode by ensuring the correct API symbol is used.<br>{152714}</p></li><br><li><p>Core: The <cite>qairt-dlc-info</cite> and <cite>snpe-dlc-info</cite> tools now list input tensors in an order that follows the topological sort of the<br>graph. {147076}</p></li><br><li><p>GPU: Resolved inference failures on QCM2290 in models containing an AvgPool Op. {150565}</p></li><br><li><p>Genie: Fixed a memory deallocation issue that could cause a crash when executing a model with memory-mapped tensors enabled.<br>{149468}</p></li><br><li><p>HTP: Added an option to disable Conv+Activation fusion to resolve accuracy issues with the GRU Op in certain models. {148860}</p></li><br><li><p>HTP: Fixed a bug that prevented some models from executing for multiple iterations in qnn-net-run or qnn-throughput-net-run. {145419}</p></li><br><li><p>HTP: Fixed a logging issue where a failed context deserialization was incorrectly reported as successful in verbose logs. {148549}</p></li><br><li><p>HTP: Fixed an issue related to a data type mismatch error during graph preparation.
{152793}</p></li><br><li><p>HTP: Resolved an issue that caused model failures when using weight-sharing with multicore configurations due to an incorrect<br>calculation of shared weight blobs. The recommended combination is udma=on + weight-sharing + multicore, without LoRA. {154732}</p></li><br><li><p>HTP: Resolved an issue where a signed PD session was not correctly configured during backend initialization; it is now enabled<br>as expected. {152031}</p></li><br><li><p>Op:CPU: Added support for signed 8-bit fixed-point weights in the FullyConnected Op. {149602}</p></li><br><li><p>Op:CPU: Fixed an issue in the Division Op to prevent potential division-by-zero with certain quantized inputs. {150626}</p></li><br><li><p>Op:CPU: Fixed an issue in the Relu Op implementation for Armv7 that caused incorrect outputs for uint8 quantized models. {149048}</p></li><br><li><p>Op:GPU: Resolved a performance regression for the Convolution Op on QCS8250. {139732}</p></li><br><li><p>Op:HTP: Fixed an accuracy issue for certain Concat operations that follow a Gather Op. {153063}</p></li><br><li><p>QNN: Reduced verbosity by suppressing ‘Bad quantization: zero scale!’ log messages, improving terminal readability. {123651}</p></li><br><li><p>Tool: Fixed an issue in the quantizer where the encoding offset could go out of range. {153153}</p></li><br><li><p>Tool: Resolved a Python error in the quantizer that occurred during an optimization pass by ensuring correct encoding information is<br>used for activations. {153154}</p></li><br><li><p>Tool:Converter: Added support for converting the GroupNorm layer from TensorFlow models. {146973}</p></li><br><li><p>Tool:Converter: Fixed a bug in the Where Op translation that produced an incorrect output shape when the inputs were broadcastable<br>but had different shapes.
{143835}</p></li><br><li><p>Tool:Converter: For the LPAI backend, disabled the <cite>matmul_to_fc</cite> optimization and the automatic insertion of Convert Ops before<br>MatMul to better support mixed-precision models and avoid performance issues. {152658}</p></li><br><li><p>Tool:Converter: Resolved a segmentation fault that occurred during model conversion for certain backends by implementing different<br>schema selection strategies. {153271}</p></li><br><li><p>Tool:Converter: Resolved an accuracy regression that caused incorrect outputs in some Generative AI models by reverting a change<br>related to asymmetric quantization for signed data types. {154206}</p></li><br><li><p>Tool:Converter: Resolved an issue where a graph’s output tensor could be incorrectly removed when its producer was squashed. {152011}</p></li><br><li><p>Tool:Converter: Resolved an issue where models with Conv2d ops failed on the HTP backend due to unsupported input or output data<br>types. {153277}</p></li><br><li><p>Tool:Converter: Resolved an issue where the LayerNorm Op failed validation due to an unsupported data type. {153276}</p></li><br><li><p>Tool:Converter: The <cite>qairt-lora-model-creator</cite> tool no longer restricts the quantization bitwidth of tensors in the LoRA branch.<br>{148088}</p></li><br><li><p>Tool:Converter:ONNX: Added two patterns that map to RMSNorm to prevent Ops from falling back to float during quantization. {149931}</p></li><br></ul> |
| 2.39.0 | Sep 2025 | <ul class="simple"><br><li><p>API:Genie: Added the <cite>GenieDialog_embeddingTokenQuery</cite> API. {148803}</p></li><br><li><p>API:Genie: Added the <cite>GenieDialog_setMaxNumTokens</cite> API. {146820}</p></li><br><li><p>API:HTP: Added a new HTP-specific property to support a detachable buffers feature. {148227}</p></li><br><li><p>API:HTP: Enhanced profiling capabilities to expose detailed timing information for each component during the graph preparation phase<br>(<cite>QnnGraph_finalize</cite>). {143804}</p></li><br><li><p>API:HTP: Implemented a feature allowing read-only weights buffers to be detached and unmapped. {141354}</p></li><br><li><p>API:HTP: Introduced new APIs and configuration options to support a detachable buffers feature. {143832}</p></li><br><li><p>API:SNPE: Added a new builder API for enabling accelerated HTP initialization with a pre-prepared cache:<br>Snpe_SNPEBuilder_SetAcceleratedInit() / SNPEBuilder::setAcceleratedInit(). Support has also been added to snpe-net-run,<br>snpe-throughput-net-run, and snpe-parallel-run via the command-line argument --enable_htp_accelerated_init. {149873}</p></li><br><li><p>Docs: Updated documentation for <cite>qairt-accuracy-debugger</cite> to include support for the Windows on Snapdragon (WoS) platform, including<br>updated help sections and sample commands. {149286}</p></li><br><li><p>Docs: Updated the LPAI documentation to include a summary of the required steps for model preparation. {142076}</p></li><br><li><p>Genie: Added a new profiling option for collecting detailed trace events. {133638}</p></li><br><li><p>Genie: Added the <cite>GENIE_STATUS_ERROR_CONTEXT_EXCEEDED</cite> error code to provide a specific status when a prompt exceeds the model’s<br>context length limit.
{145721}</p></li><br><li><p>HTP: Added support for multi-graph switching, which allows multiple graphs to be loaded and retained in memory simultaneously.<br>{139603}</p></li><br><li><p>HTP: Added support for several operator fusion patterns on the HTP backend, including combinations like Conv-Relu and<br>Conv-Batchnorm-HardSwish. {125633}</p></li><br><li><p>HTP: Added support for the BFloat16 data type by including the necessary header and definitions in the HTP backend. {140994}</p></li><br><li><p>HTP: Made minor performance improvements for benchmark models. {147751}</p></li><br><li><p>LPAI: Fixed an issue where the quantization process would incorrectly modify the offset specified in a <cite>quant.json</cite> file. {145916}</p></li><br><li><p>LPAI: Resolved an accuracy issue with audio context detection models on the LPAI backend. The issue was caused by incorrect bias<br>quantization settings for convolution and GEMM operations. {146710}</p></li><br><li><p>Op:GPU: Added support for QNN_DATATYPE_INT_32 inputs to the StridedSlice Op. {142629}</p></li><br><li><p>Op:HTP: Added support for 6D variants of Cast, GatherElements, Pad, and StridedSlice with certain constraints. For GatherElements,<br>input and index shapes must match except along the axis dimension. For Pad, padding is limited to dimensions 5D or smaller. For<br>StridedSlice, slicing is limited to dimensions 5D or smaller, and some axis parameters are not supported. {147157}</p></li><br><li><p>Op:HTP: Enabled support for the SFIXED_POINT_16 data type for the Sqrt Op in the QNN HTP Op validation flow. {142710}</p></li><br><li><p>OpDef: Added support for the <cite>RandomUniformLike</cite> Op. This includes the ONNX to QNN IR translation in the converter and the backend<br>implementation. {138616}</p></li><br><li><p>OpDef: Updated the NonZero Op definition to clarify that it outputs -1 for padded values in static shapes.
Also updated Gather and<br>Scatter Ops to restrict index tensors to non-negative values, allowing -1 only as a sentinel value for indices generated from other<br>Ops. {142505}</p></li><br><li><p>QNN: TFLite Delegate: Added support for the Broadcast_to Op. {149782}</p></li><br><li><p>Tool: Added native support for WoS to the Accuracy Evaluator tool. This includes updates to handle platform-specific file paths and<br>resolves a file permission error in the SQuAD evaluation script on Windows. {136566}</p></li><br><li><p>Tool: Added support for multi-graph switching in <cite>qnn-net-run</cite> and <cite>qnn-throughput-net-run</cite> via the new custom configuration option<br><cite>graphs_retention_order</cite>. {145979}</p></li><br><li><p>Tool: Enabled support for the Windows on Snapdragon (WoS) platform in the accuracy debugger. Users can now debug models on WoS using<br>both the CLI and Python API interfaces. {147963}</p></li><br><li><p>Tool:Converter: Added reference implementations for static tensor manipulation Ops, including <cite>Add</cite>, <cite>Mul</cite>, <cite>Sub</cite>, <cite>Div</cite>,<br><cite>Transpose</cite>, and <cite>Reshape</cite>. {133602}</p></li><br><li><p>Tool:Converter: Fixed a segmentation fault in <cite>qairt-converter</cite> that occurred during float fallback for models with external data.<br>{147000}</p></li><br><li><p>Tool:Converter: Fixed an issue where FP16 constant tensors were not correctly interpreted at the Python layer. {147009}</p></li><br><li><p>Tool:Converter: Introduced new flags to provide fine-grained control over the IR optimizer passes. {135982}</p></li><br><li><p>Tool:Converter: RMSNorm node names now use either the common prefix of all matched nodes in the pattern or, if no common prefix<br>exists, the output buffer name of the pattern. This replaces the previous rms_norm_i naming based on topological order. {146838}</p></li><br><li><p>Tool:Converter: Removed exception handling for 6D tensors in the converter. 
{144599}</p></li><br><li><p>API:HTA: Resolved an application crash that occurred when calling the QNN API to get the HTA device infrastructure for performance<br>tuning. {146157}</p></li><br><li><p>DLC: Fixed issues within the DLC format when per-channel block quantization is employed on a multi-graph DLC. {138853}</p></li><br><li><p>GPU: Improved performance by updating heuristics for Pooling and Reduction Ops to better utilize hardware resources, addressing<br>inference time regressions on some models. {147242}</p></li><br><li><p>Genie: Fixed an accuracy bug with cross-layer attention networks when the decoder block is a single context binary. {150908}</p></li><br><li><p>Genie: Fixed an issue that caused incorrect calculation of KV cache tensor sizes on the HTP backend, which could lead to<br>segmentation faults. {148675}</p></li><br><li><p>Genie: Fixed an issue where no output was generated for certain models when the prompt prefill phase required multiple graph<br>executions. {145896}</p></li><br><li><p>HTP: Enabled support for using the <cite>ScatterElements</cite> Op within LoRA-updatable models. {147845}</p></li><br><li><p>HTP: Fixed a checksum mismatch error that could occur during graph finalization for models using LoRA. {147901}</p></li><br><li><p>HTP: Fixed a crash that could occur during long-running stress tests involving VTCM sharing. {148064}</p></li><br><li><p>HTP: Fixed a graph finalization failure by adjusting the optimization pass order for certain Ops like Split and Unpack. {141064}</p></li><br><li><p>HTP: Fixed a memory leak that occurred in the HTP backend during repeated inference runs when performance profiling was enabled.<br>{146627}</p></li><br><li><p>HTP: Fixed an Op package deregistration failure that could occur in specific multi-core use cases. {143977}</p></li><br><li><p>HTP: Fixed an issue preventing context binary generation for models using LoRA adapters where a MatMul operation of size 16x16 was<br>present.
{149711}</p></li><br><li><p>HTP: Fixed an issue that caused graph finalization failures for certain large models on specific SoCs. {147402}</p></li><br><li><p>HTP: Fixed an issue that caused incorrect error code translation when writing shared weight buffers. {147793}</p></li><br><li><p>HTP: Fixed an issue where applying a LoRA adapter binary would fail for multicore scenarios or float-precision graphs. {149995}</p></li><br><li><p>HTP: Fixed an issue where requesting a signed PD would fail on x86 simulation environments. The configuration is now ignored for<br>x86, as it makes no difference in that context. {145651}</p></li><br><li><p>HTP: Fixed an occasional VTCM memory allocation error that could occur during context binary generation. {145879}</p></li><br><li><p>HTP: Optimized performance for a text encoder model by successfully applying MHA-to-SHA transformations, converting MatMuls to<br>Convolutions, and ensuring correct quantization settings. {136947}</p></li><br><li><p>HTP: Resolved a failure in on-device context binary generation when using custom Ops. {147187}</p></li><br><li><p>HTP: Resolved an error where applying a LoRA adapter failed with the message “Apply cannot happen as context bin did not have<br>serialized bin.” {149992}</p></li><br><li><p>HTP: Resolved an issue where using Op packages in multi-threaded applications could cause a<br><cite>QNN_OP_PACKAGE_ERROR_LIBRARY_ALREADY_INITIALIZED</cite> error, halting execution. {147431}</p></li><br><li><p>HTP: Resolved memory leaks observed under specific stress scenarios. {145181}</p></li><br><li><p>LPAI: Fixed an issue that caused the ADSP driver to fail to load on certain Windows on Snapdragon platforms. {149188}</p></li><br><li><p>Op:CPU: Fixed the Mod Op to align its calculation with the behavior of standard frameworks. 
{147060}</p></li><br><li><p>Op:CPU: Resolved an issue that caused model failures on the CPU backend when a quantized Div Op encountered a zero-valued divisor.<br>{150630}</p></li><br><li><p>SDK: Optimized specific library functions on Windows by replacing parts of the C++ standard library with native Windows API calls,<br>reducing the overall binary size. {150497}</p></li><br><li><p>SNPE:DSP: Resolved an issue where executing a model with a UDO package on the DSP backend could fail with a<br><cite>QNN_OP_PACKAGE_ERROR_LIBRARY_ALREADY_INITIALIZED</cite> error. {135967}</p></li><br><li><p>Tool: Fixed an input parsing issue in the ModelModifierArchChecker tool. {144884}</p></li><br><li><p>Tool: Resolved an issue where <cite>qnn-accuracy-debugger</cite> would fail with a <cite>FileNotFoundError</cite> when using a compiled model (<cite>--stage compiled</cite>). {149891}</p></li><br><li><p>Tool:Compiler: Fixed an issue in the context binary generator where a SpaceToDepth Op adjacent to a graph input could cause an<br>error. {147548}</p></li><br><li><p>Tool:Converter: Enabled support for dynamic 16-bit weights by default in <cite>qairt-converter</cite> and <cite>qairt-quantizer</cite>. This resolves an<br>issue where an unnecessary <cite>Convert</cite> Op was inserted for <cite>MatMul</cite> weights, which previously led to increased model size and reduced<br>accuracy. A new <cite>--disable_dynamic_16_bit_weights</cite> flag has been added to revert to 8-bit conversion if needed. {147008}</p></li><br><li><p>Tool:Converter: Fixed a bug in the quantizer where node-squashing logic could fail for nodes that were both a graph output and had<br>inputs with multiple consumers. {136028}</p></li><br><li><p>Tool:Converter: Fixed a bug that could cause a ‘Duplicate buffer name’ error during certain graph optimizations. {145690}</p></li><br><li><p>Tool:Converter: Fixed a fatal “access violation” exception that occurred when running the ONNX converter on WoS devices.
{149750}</p></li><br><li><p>Tool:Converter: Fixed an issue with generating quantization encodings for models containing LSTM or GRU layers. {146424}</p></li><br><li><p>Tool:Converter: Fixed an issue with handling dynamic inputs for the slope tensor in the PReLU Op. {145599}</p></li><br><li><p>Tool:Converter: Fixed an issue with the LoRA model conversion flow where certain graph optimization passes were not being applied<br>consistently. {150868}</p></li><br><li><p>Tool:Converter: Fixed incorrect weight broadcasting behavior in the RMSNorm and LayerNorm fusion patterns within the ONNX converter.<br>{124105}</p></li><br><li><p>Tool:Converter: Resolved an issue where certain graph optimizations could incorrectly remove a tensor that was also a graph output.<br>{150933}</p></li><br><li><p>Tool:qairt-tool: Added support for the Clip, SpaceToDepth, and Relu Ops in mha2sha-v2. {149759}</p></li><br><li><p>Models with very large buffers (~1 GB or more) can abort during execution with “Could not create context from binary” due to FastRPC<br>mapping failures. {148198}</p></li><br></ul> |
| 2.38.0 | Aug 2025 | <ul class="simple"><br><li><p>API: Generalized the <cite>qairt.transform</cite> API to support multiple, interchangeable transformation implementations. {138775}</p></li><br><li><p>API:GPU: Added support for the <cite>QNN_GPU_PRECISION_USER_PROVIDED</cite> precision mode to the GPU backend extension API, allowing users to<br>specify custom precision settings for a graph. {142096}</p></li><br><li><p>API:Genie: Added GENIE_NODE_IMAGE_ENCODER_IMAGE_FULL_ATTN_MASK and GENIE_NODE_IMAGE_ENCODER_IMAGE_WINDOW_ATTN_MASK node inputs.<br>{145051}</p></li><br><li><p>Genie: Added a source code example for genie-t2e-run to the SDK. {144427}</p></li><br><li><p>Genie: Added embeddingQuery support for offline embeddings in genie-app. {146044}</p></li><br><li><p>Genie: Added engine sharing support for models used across different dialogs, currently available for the HTP backend and applicable<br>to basic and SSD dialogs. {147585}</p></li><br><li><p>Genie: Added support for encoder-decoder models in Gen AI Transformer. {136070}</p></li><br><li><p>HTP: Improved performance and reduced memory usage for certain vision models by removing redundant <cite>space_rearrange</cite> operations from<br>the graph. {141570}</p></li><br><li><p>HTP: Removed the <cite>-ffast-math</cite> compiler flag from the build configuration to prevent potential numerical inconsistencies and improve<br>accuracy alignment for floating-point operations. {139547}</p></li><br><li><p>Op:CPU: Added support for the Logit Op. {136656}</p></li><br><li><p>Op:GPU: Added support for <cite>INT32</cite> data type inputs to the <cite>ArgMax</cite> Op on the GPU backend. {133989}</p></li><br><li><p>Op:GPU: Added support for the CumulativeSum Op. {38682}</p></li><br><li><p>Op:HTP: Added backend support for the <cite>STFT</cite> Op. {134956}</p></li><br><li><p>Op:HTP: Added documentation for dynamic dimension constraints in HTP Op definitions. 
{143878}</p></li><br><li><p>Op:HTP: Added support for Int32 ElementWiseAbs and for ElementWiseUnary with the Abs operation. {138856}</p></li><br><li><p>Op:HTP: Added support for the signed int16 data type in Unpack Op validation. {142708}</p></li><br><li><p>Op:HTP: Enabled support for the 5D Cast Op. {143121}</p></li><br><li><p>Op:HTP: Enabled support for the 5D GatherElements Op with non-zero axis values. {143123}</p></li><br><li><p>Op:HTP: Enabled support for the 5D Pad Op with a constant padding scheme for FP16 and FP32 data types. {143122}</p></li><br><li><p>OpDef: Added the Op definition for STFT. {134955}</p></li><br><li><p>OpDef: Added support for <cite>int32</cite> and <cite>UFIXEDPOINT8</cite> data types for the <cite>RandomUniformLike</cite> Op. {146810}</p></li><br><li><p>QNN: TFLite Delegate: Added support for the Broadcast_to Op. {138848}</p></li><br><li><p>QNN:HTP: Enabled the Quantize and Dequantize Ops between FP32 and QINT16 in the Op validator. {141056}</p></li><br><li><p>SDK: Added a new <cite>RandomUniformLike</cite> Op definition and reference implementation to align with the ONNX specification. {134859}</p></li><br><li><p>SDK: Enhanced OEM control over QNN priority levels, allowing more flexible configuration of graph execution priorities on the HTP<br>backend. {126262}</p></li><br><li><p>SNPE: Added documentation for low-level performance APIs under “Tutorials and Examples”, “Application Tips”. {145899}</p></li><br><li><p>Tool: Added the ability to debug a specific subgraph by introducing two new command-line options: <cite>--debug_subgraph_inputs</cite> and<br><cite>--debug_subgraph_outputs</cite>. These options allow specifying the input and output tensors that define the subgraph to be analyzed.<br>{127762}</p></li><br><li><p>Tool: Introduced a new Network Specialization module and API to programmatically convert and optimize models with multiple graph<br>configurations into a single DLC file. This replaces the previous command-line-only workflow.
{108571}</p></li><br><li><p>Tool:Converter: Added support for the Logit Op. {138107}</p></li><br><li><p>Tool:Converter: Added support for the ONNX RandomUniformLike Op. {134348}</p></li><br><li><p>Tool:Converter: Added support for the ONNX STFT Op. {134349}</p></li><br><li><p>Tool:Converter: Added support for the <cite>STFT</cite> Op in the ONNX converter. {138613}</p></li><br><li><p>Tool:Converter: Added support for the <cite>buffer_padding</cite> parameter in the Buffer Op. {128998}</p></li><br><li><p>Tool:Converter: Enhanced the converter to automatically apply a float-fallback quantization behavior for models that contain<br>Quantize-Dequantize nodes or are provided with quantization overrides (e.g., for LoRA). {139341}</p></li><br><li><p>Tool:Converter: Released the first version (v0.1) of the QAIRT Quantization Specification, which supports schema version 2.0.0 for the<br>quantization overrides file. {114160}</p></li><br><li><p>DSP: Significantly improved performance for models with a batch size greater than one by optimizing the 5D Reshape-Transpose-Gather<br>pattern in the backend. {140837}</p></li><br><li><p>GPU: Improved inference performance for select models in GPU FP16 mode on certain chipsets. {144204}</p></li><br><li><p>Genie: Added the missing ‘type’ field to the sampler.json configuration example. {138004}</p></li><br><li><p>Genie: Fixed a regression in the Eaglet token generation rate. {145608}</p></li><br><li><p>Genie: Fixed a segmentation fault caused by uninitialized variables. {144692}</p></li><br><li><p>Genie: Fixed a segmentation fault that occurred when running LLM models with the <cite>genie-t2t-run</cite> tool. {147760}</p></li><br><li><p>Genie: Fixed an issue with loading lm_head or LoRA adapters on Windows platforms. {143661}</p></li><br><li><p>Genie: Fixed an issue where paused queries with LUT encoder models could not resume.
{145135}</p></li><br><li><p>Genie: Fixed an issue where prompt templates were not applied when GenieEmbedding_generate outputs were truncated. {143445}</p></li><br><li><p>Genie: Fixed memory leaks occurring during GenieDialog_applyLora. {136542}</p></li><br><li><p>HTP: Added support for casting from <cite>uint8</cite> to <cite>fp16</cite> to resolve an accuracy issue where <cite>uint8</cite> was incorrectly interpreted during<br>a cast to a float type. {135317}</p></li><br><li><p>HTP: Enabled support for asynchronous context initialization in multi-core environments. {138427}</p></li><br><li><p>HTP: Fixed a memory corruption crash that could occur in multi-threaded applications during deinitialization. {144587}</p></li><br><li><p>HTP: Fixed a segmentation fault that occurred when using asynchronous initialization on multi-core HTP configurations. {138335}</p></li><br><li><p>HTP: Fixed an accuracy issue that produced incorrect output when using LPBQ. {146380}</p></li><br><li><p>HTP: Fixed an issue where models would crash or hang on the HTP backend when the inference batch size was greater than one. {144574}</p></li><br><li><p>HTP: Fixed an issue where the <cite>deviceGetPlatformInfo</cite> API returned incorrect SoC information when using the non-RPC path. {141569}</p></li><br><li><p>HTP: Implemented a fix to prevent a CDSP crash when virtual address space is exhausted during memory allocation. {145909}</p></li><br><li><p>HTP: Resolved an intermittent failure in asynchronous execution mode that could lead to errors. {138318}</p></li><br><li><p>HTP: Resolved an issue on certain platforms where a failure to lock the HMX context could cause a DMA execution failure. {138289}</p></li><br><li><p>HTP: Resolved execution failures for certain models in Gen AI corner cases.
{129730}</p></li><br><li><p>HTP: Significantly improved performance for models using grouped <cite>TransposeConv2d</cite> by enabling an optimization that was previously<br>restricted to operations with zero padding. {143544}</p></li><br><li><p>Op:HTP: Added support for FP32 weight-only quantization in fully connected layers. {131398}</p></li><br><li><p>Op:HTP: Fixed a NullRequant Op registration failure when using w16 and per-channel quantization. {145523}</p></li><br><li><p>Op:HTP: Fixed a crash in the PoolAvg2d Op when reducing NxM inputs to 1x1 with padding and count_pad=0. {131311}</p></li><br><li><p>Op:HTP: Fixed a crash occurring during GroupNorm fusion. {130501}</p></li><br><li><p>Op:HTP: Fixed a runtime failure during context creation when a <cite>spill_fill_buffer</cite> was configured. {143863}</p></li><br><li><p>Op:HTP: Fixed an accuracy issue in the ElementWiseAdd Op when broadcasting a constant zero. {143254}</p></li><br><li><p>Op:HTP: Fixed an accuracy issue in FP16 models caused by a faulty <cite>SlicePad_shape-&gt;Transpose</cite> graph optimization rule. {145638}</p></li><br><li><p>Op:HTP: Improved performance of the <cite>ReduceSum</cite> Op for FP16 data types by ensuring a faster, optimized implementation is used.<br>{143158}</p></li><br><li><p>Op:HTP: Resolved a performance regression affecting model execution. {145191}</p></li><br><li><p>Op:HTP: Resolved an accuracy issue in the Gather Op for depth=1 cases. {134448}</p></li><br><li><p>Op:HTP: Resolved performance regressions for select models. {143809}</p></li><br><li><p>SNPE: Added support for the --optimization_preset option in snpe-dlc-graph-prepare and enabled online preparation via platform<br>options. {135223}</p></li><br><li><p>SNPE: Fixed an issue where setting HTP graph optimization levels in online preparation did not support distinct optimization levels<br>for different SNPE instances.
{142940}</p></li><br><li><p>SNPE: The snpe-dlc-info tool now displays input, output, and unconsumed tensors in topologically sorted order. {146793}</p></li><br><li><p>Tool: Fixed an accuracy regression that could occur in certain models due to an incorrect start index calculation in a transpose<br>operation. {144858}</p></li><br><li><p>Tool: Fixed an issue where block-quantized convolution with special dimensions could cause preparation failures. {144994}</p></li><br><li><p>Tool: Resolved an issue where <cite>snpe-parallel-run-cpp</cite> would crash when used with the <cite>--userbuffer_memorymapped</cite> argument. {119102}</p></li><br><li><p>Tool:Converter: Fixed a bug in the Expand Op translation caused by incorrect data type population. {141810}</p></li><br><li><p>Tool:Converter: Fixed a bug in the sink_transpose optimization where a transpose node could be consumed twice by the same node. {140535}</p></li><br><li><p>Tool:Converter: Fixed a bug that introduced redundant Convert nodes before LSTM/GRU nodes during mixed precision conversion.<br>{145617}</p></li><br><li><p>Tool:Converter: Fixed an axis tracking issue in the ONNX PRelu Op that could cause incorrect broadcasting. {142728}</p></li><br><li><p>Tool:Converter: Fixed an issue where 0D tensors were incorrectly retained as 1D tensors; scalar tensor information is now propagated as<br>needed. {141899}</p></li><br><li><p>Tool:Converter: Fixed an issue where models with extremely small, near-zero quantization scale values (e.g., 1e-35) would fail<br>during inference on the CPU backend. {127367}</p></li><br><li><p>Tool:Converter: Fixed an issue where the --float_bitwidth option could incorrectly update non-quantizable tensors. {145723}</p></li><br><li><p>Tool:Converter: Fixed an issue where the second input tensor of MatMul nodes from QDQ models was not correctly quantized. {136049}</p></li><br><li><p>Tool:Converter: Fixed an issue with encoding population in LayerNorm pattern matching.
{141265}</p></li><br><li><p>Tool:Converter: Fixed an issue where squashable elementwise operations following convolution operations caused errors when encodings of<br>the convolution’s weights/bias were provided. {85485}</p></li><br><li><p>Tool:Converter: Improved validation in the Resize optimization to prevent errors when invalid scale values are provided. {138778}</p></li><br><li><p>Tool:Converter: Resolved a model conversion failure for large ONNX models caused by excessive memory consumption. {122217}</p></li><br><li><p>Tool:Converter: Resolved an issue where recent updates to the model converter caused excessive memory consumption during graph<br>serialization, leading to failures when creating context binaries for large models. {136952}</p></li><br><li><p>Tool:Converter: Squashed identity Expand and Tile nodes in the graph to remove redundant operations. {144693}</p></li><br><li><p>Tool:Converter: Updated the logic for matching RmsNorm patterns to improve pattern recognition. {146093}</p></li><br></ul> |
| 2.37.0 | July 2025 | <ul class="simple"><br><li><p>Updated the QNN HTP OpDef supplement documentation to describe the use of the QNN_DEFINITION_IMPL_GENERATED encoding definition. {127977}</p></li><br><li><p>API:GPU: Added support for the Qnn_DeviceHandle_t argument in the QnnContext_create API. {123584}</p></li><br><li><p>API:GPU: Added support for the Qnn_GlobalConfig API. {135731}</p></li><br><li><p>Genie: Added an async command to genie-app, allowing asynchronous statements to be executed. {137243}</p></li><br><li><p>Genie: Added support for non-updatable quantization (NUQ) and grouped LoRA adapters. {138782}</p></li><br><li><p>Genie: Added the cache-groups JSON configuration option, enabling the sliding window attention (SWA) cache management policy.<br>{135552}</p></li><br><li><p>Genie: Introduced the SSD dialog “branch-mode” config option with “top-1” and “all-expand” supported values. {134925}</p></li><br><li><p>Genie: Added Eaglet dialog support for dual-head draft models. {134373}</p></li><br><li><p>Genie:API: Added GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_SIN and GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_COS node inputs. {133935}</p></li><br><li><p>HTP: Added support for LoRA weight sharing by extracting updatable weights across all graphs into a shared blob. {126930}</p></li><br><li><p>HTP: Added support for the QAIRT Block Ops Stateful LSTM, Stateful GRU, and Buffer Ops for FP16 precision. {125048}</p></li><br><li><p>HTP: Added support for VA Reservation on Windows platforms. {138341}</p></li><br><li><p>HTP: Added support for LoRA weight sharing by extracting updatable weights across all graphs into a shared blob. {128558}</p></li><br><li><p>Op:GPU: Added support for the GatherND Op on the GPU backend. {61057}</p></li><br><li><p>OpDef: Added Op definition for IsNaN. {135847}</p></li><br><li><p>QNN: Fixed broken links in the HTML documentation to the SNPE documentation (“Qualcomm Neural Processing SDK”) under Overview -&gt; Integration<br>workflow and in the Utilizing DLCs tutorial. 
{143420}</p></li><br><li><p>Tool:LoRA Creator: Added support for any Conv kernel shape in the LoRA branch, removing the previous limitation where only 1x1 Conv was<br>supported. {140575}</p></li><br><li><p>Tool:Converter: Added support for SparseConvolution2D. {118014}</p></li><br><li><p>Tool:Converter: Optimized the LoRA Importer for non-updatable quantization (NUQ). {127586}</p></li><br><li><p>Tool:Converter: Resolved a performance regression on CPU/DSP backends by removing redundant clip operations in the TFLite converter;<br>clip is now added only when required based on fused_activation_function. {123581}</p></li><br><li><p>Tool:Genie: Added support for GenieEmbedding APIs in genie-app. {123549}</p></li><br><li><p>Fixed incorrect freeing of RPC memory allocated for a LoRA adapter when the context contains multiple graphs. {138835}</p></li><br><li><p>Fixed an issue where LoRA weight tensor names could not be found when graph transformations were involved. {136062}</p></li><br><li><p>QNN Docs: Corrected the qnn-net-run command-line argument in the HTML docs from --output to --output_dir. {144805}</p></li><br><li><p>SNPE Tools: Fixed snpe/qairt dlc-info to display the correct graph optimization level for HTP cache records generated via the<br>Snpe_SNPEBuilder_SetInitCacheMode() / SNPEBuilder::setInitCacheMode() API or the net-run --enable_init_cache option. {142514}</p></li><br><li><p>Added support for Conv2D Ops with the reuse_sparse_indices parameter defined, preventing prepare/graph finalization failures.<br>{143040}</p></li><br><li><p>Tool:Converter: Fixed performance regressions on CPU/DSP backends by removing redundant clip operations<br>in the TFLite converter; clip is now added only when required based on fused_activation_function. {141085}</p></li><br><li><p>Fixed an updatable attribute tracking error for PyTorch models. {145158}</p></li><br><li><p>CPU: Fixed quantization issues for large models by correcting the softmax Op implementation. 
{140260}</p></li><br><li><p>CPU: Resolved an issue with axis permutation for BW_AXIS_SCALE_OFFSET quantization encoding in Conv operations. {138266}</p></li><br><li><p>DLC: Fixed a small memory leak in DLC-based initialization in SNPE and QNN. {135810}</p></li><br><li><p>Genie: Fixed a crash when running SSD or SPD dialog types on certain Linux platforms. {137954}</p></li><br><li><p>Genie: Fixed an out-of-bounds read issue observed with uint16 embedding LUTs. {144801}</p></li><br><li><p>Genie: Fixed an issue where the first context binary split did not contain sufficient information about graph variants to properly<br>initialize the KV$ Manager. {136530}</p></li><br><li><p>Genie: Fixed an issue where the draft model EOS token was not set, causing an Eaglet initialization failure. {145057}</p></li><br><li><p>Genie: Fixed minor memory leaks. {136813}</p></li><br><li><p>Genie: Fixed a segmentation fault when graph switching is enabled along with memory mapping. {143826}</p></li><br><li><p>HTP: Fixed a deadlock issue that could cause the qnn-throughput-net-run application to hang under stress conditions. {142471}</p></li><br><li><p>KI: In the QNN HTP backend, an update to the prepare sequence causes a regression on some specific models. This will be fixed in the next<br>release (2.36). {136438}</p></li><br><li><p>Op:HTP: Optimized the qu16 Dequantize Op. {136231}</p></li><br><li><p>Op:HTP: Optimized the TransposeConv2d-Dequantize pattern near the output. {134467}</p></li><br><li><p>Op:HTP: Optimized the TransposeConv2d-Dequantize pattern near the output. {136219}</p></li><br><li><p>Op:HTP: Reduced preparation time for 5D operations with large batch sizes. 
{130280}</p></li><br><li><p>SNPE: Fixed a crash in snpe-throughput-net-run when the container argument was not specified before certain optional arguments.<br>{141598}</p></li><br><li><p>Tool: Added handling of calibration input validation, quantizer params, and input type conversion for the HTP memory pipeline. {138064}</p></li><br><li><p>Tool: Fixed a failure in the memory pipeline when filtered inference schemas were non-sequential. {142391}</p></li><br><li><p>Tool: Ordered ONNX Runtime outputs based on output name to resolve issues in memory pipeline inference. {136967}</p></li><br><li><p>Tool: Removed backend_info from quantizer params to resolve an issue in memory pipeline compilation. {136586}</p></li><br><li><p>Tool: Updated how params are accessed on the pydantic object to resolve a preserve_io_datatype issue in the memory pipeline. {144331}</p></li><br><li><p>Tool:Converter: Added support for LayerNorm with multiple normalization dimensions. {137898}</p></li><br><li><p>Tool:Converter: Added support for matching new GeLU Op patterns that include Reshape operations to address an issue where semantic<br>search models failed conversion with AutoMHA2SHA. {139465}</p></li><br><li><p>Tool:Converter: Fixed a bug in the Conv/MatMul quantizer optimization to ensure safe indexing. {142845}</p></li><br><li><p>Tool:Converter: Resolved a performance regression on CPU/DSP backends by removing redundant clip operations in the TFLite converter;<br>clip is now added only when required based on fused_activation_function. 
{140762}</p></li><br><li><p>Tool:Converter: Updated the Conv node’s weight/bias naming during BatchNorm fusion to resolve quantization parameter naming conflicts.<br>{139997}</p></li><br><li><p>Tool:Converter: Added support for a new pattern in RmsNorm pattern matching. {134922}</p></li><br><li><p>Tool:Converter: Removed injected Ops that were blocking supergroups. {134113}</p></li><br><li><p>Tool:Converter: Fixed an accuracy drop in models with shared biases. {134589}</p></li><br><li><p>Tool:Converter: Updated tensor name sanitization logic. {141135}</p></li><br><li><p>Tool:Converter: Updated the gamma and beta shapes of the ONNX LayerNorm Op. {130934}</p></li><br><li><p>Tool:Converter:TFLite: Added support for int64 quantized bias. {140882}</p></li><br><li><p>Tool:Converter: Fixed a LayerNorm pattern mismatch issue. {137459}</p></li><br><li><p>Tool:Converter: Added support for dynamic bias in the Conv Op. {142223}</p></li><br><li><p>Tool:qairt-accuracy-evaluator: Fixed the inclusion of converter params in the execution summary. {140752}</p></li><br><li><p>Tool:qairt-accuracy-evaluator: Limited parallel QNN x86 evaluations to 1. {138075}</p></li><br><li><p>Tool:snpe-net-run: Fixed a dynamic resizing issue in the Conv Op when using the --input_dimensions option. {142139}</p></li><br><li><p>Tool:Converter: Reduced conversion time for large models with more than 10,000 Ops. {135822}</p></li><br></ul> |
| 2.36.0 | June 2025 | <ul class="simple"><br><li><p>API: Added LLM support in the Python API. {118016}</p></li><br><li><p>API: Added support for quantizer-specific options in the Converter Python API, including parameters for <cite>act_quantizer_schema</cite>,<br><cite>param_quantizer_schema</cite>, and <cite>target_backend</cite>. These options are now available through the <cite>CalibrationConfig</cite> object, improving<br>feature parity with the command-line interface. {136135}</p></li><br><li><p>API: Added support for the Baichuan2-7b model through the high-level Generative AI Python API, enabling both builder and executor<br>workflows. {126702}</p></li><br><li><p>API: Added support for the Phi-3.5-mini model through the high-level Generative AI Python API, enabling both builder and executor<br>workflows. {138126}</p></li><br><li><p>API: Added support for the Qwen2-7b model through the high-level Generative AI Python API, enabling both builder and executor<br>workflows. {132444}</p></li><br><li><p>API: Enabled the generation and consumption of JSON profiling data on Windows platforms. Users can now utilize the profiling<br>capabilities of the Python API on Windows on Snapdragon (WoS) systems. {138647}</p></li><br><li><p>API: Introduced a model conversion capability to modify the Auto-Regression (AR) number and Context Length (CL) of ONNX-based<br>language models. This allows for flexible adaptation of models to different deployment requirements. {123570}</p></li><br><li><p>API:Genie: Introduced Genie Dialog and Embedding APIs to set and get performance policy. {137070}</p></li><br><li><p>API:HTP: Added support for <cite>ContextFinalize</cite> for the HTP backend, enhancing context management capabilities. {136699}</p></li><br><li><p>API:HTP: Implemented a URI Builder abstraction to simplify the programmatic construction of FastRPC URIs used for opening sessions<br>with the HTP backend. 
{110797}</p></li><br><li><p>Core: Added custom Op support to the <cite>oe-gcc11.2</cite> and <cite>oe-gcc9.3</cite> toolchains for QNN Op package support on LE targets for HTP. {130471}</p></li><br><li><p>Docs: Updated the LoRAv2 tutorial to indicate support for Windows operating systems in both offline and online workflows. {138772}</p></li><br><li><p>Genie: Added the <cite>skip-lora-validation</cite> option to reduce LoRA adapter switch time by allowing LoRA CRC checks to be skipped on QnnHtp<br>engines. {134913}</p></li><br><li><p>Genie: Added experimental support for the <cite>arm64x-windows-msvc</cite> platform. {129093}</p></li><br><li><p>Genie: Added support for Non-Updateable Quantization (NUQ) and Grouped LoRA, allowing LoRA adapter groups to share encoding bins and<br>supporting non-updateable quant adapters. {138782}</p></li><br><li><p>Genie: Added support for pausing and resuming active queries using a signal API, introducing an architecture for resuming paused<br>queries in SSD and basic dialogs. {119704}</p></li><br><li><p>Genie: Added support for profiling and logging of GenieEngine APIs, enabling measurement of switch time, creation time, and other<br>metrics. {131908}</p></li><br><li><p>Genie: Added support for repetition penalties in sampling within the Genie Sampler. {118081}</p></li><br><li><p>HTP: Added support for setting the HTP online graph preparation optimization level via platform options. {138420}</p></li><br><li><p>HTP: Added validation to reject Per-Graph-Execution (PGE) configurations that specify incompatible features such as shared<br>spill/fill buffers or VTCM backup sharing. A warning is now issued to prevent these unsupported setups. {128832}</p></li><br><li><p>HTP: Enabled 64-bit UDMA support in QNN HTP, allowing access to memory beyond 4GB for large neural networks, and implemented<br>shared-weights far mapping. {91520}</p></li><br><li><p>HTP: Enabled multi-context spill/fill buffer sharing for QNX. 
{128061}</p></li><br><li><p>HTP: Enhanced the HTP backend polling mechanism to support separate polling contexts and threads for each execution priority level.<br>This design improves performance and resource management for multithreaded applications that concurrently run graphs with different<br>priorities. {131859}</p></li><br><li><p>LPAI: Added support for LPAI backend RPC mode and <cite>QNN_GRAPH_ERROR_EARLY_TERMINATION</cite> in <cite>qnn-throughput-net-run</cite>. {121599}</p></li><br><li><p>Op:CPU: Added support for Sparse Convolution 2D. {120883}</p></li><br><li><p>Op:CPU: Updated the Cast Op to correctly map NaN (Not a Number) inputs to <cite>True</cite> when casting floating-point values to <cite>BOOL8</cite>,<br>aligning with ONNX implementation. {136649}</p></li><br><li><p>Op:HTP: Added support for the MaskedSoftmax Op on the HTP backend for LLM use cases. {110661}</p></li><br><li><p>Op:LPAI: Added support for the <cite>frame_pad</cite> parameter to the Buffer Op on the LPAI backend. {128999}</p></li><br><li><p>OpDef: Added an optional parameter <cite>reuse_sparse_indices</cite> to the Conv2d Op, with default support for AIC, GPU, HTA, and LPAI<br>backends. {118012}</p></li><br><li><p>SDK: Introduced <cite>QAIRT_SDK_ROOT</cite> as the new primary environment variable for setting the SDK path. The previous <cite>QNN_SDK_ROOT</cite> and<br><cite>SNPE_ROOT</cite> variables are now deprecated and will be removed in a future release. For backward compatibility, they are currently set<br>based on <cite>QAIRT_SDK_ROOT</cite>. {121206}</p></li><br><li><p>Tool: Enhanced layerwise debugging tools to accept externally provided “golden” reference outputs for comparison. This allows users<br>to supply their own reference data. A new option to disable layout transformation during this process has also been added to<br>accommodate various data sources. 
{122717}</p></li><br><li><p>Tool:Converter: Added support for the new Einsum equation <cite>nkctv,kvw-&gt;nctw</cite>, expanding the range of supported ONNX models. {126231}</p></li><br><li><p>Tool:Converter: Added support to serialize disconnected model inputs (dangling inputs) from the source framework into the DLC file.<br>{139058}</p></li><br><li><p>Tool:Converter: Defer loading is now enabled by default for the ONNX converter to improve memory usage and processing time. To<br>disable this feature, use the new <cite>--onnx_disable_defer_loading</cite> flag for the QAIRT converter or the <cite>--disable_defer_loading</cite> flag<br>for the QNN/SNPE ONNX converter. {139858}</p></li><br><li><p>Tool:Converter: Enabled support for the <cite>--defer_loading</cite> option in the QNN ONNX converter when generating C++/binary outputs. This<br>feature, which was previously unsupported for this output format, helps reduce memory consumption and processing time during<br>conversion. {139859}</p></li><br><li><p>Tool:Converter: Removed a limitation in the ONNX converter that previously prevented using defer loading (<cite>--onnx_defer_loading</cite>)<br>and ONNX model simplification in the same conversion. Both features can now be used simultaneously. {116422}</p></li><br><li><p>Tool:Converter:ONNX: Added support for the ONNX Size Op, which outputs the total number of elements of an input tensor as an int64<br>scalar. {138523}</p></li><br><li><p>API: Fixed a bug in the converter input configuration where the data type of the first input was incorrectly applied to all other<br>inputs. {137113}</p></li><br><li><p>API: Fixed a bug in the model-level API where a typo in an internal variable could cause issues with input list file generation.<br>{137830}</p></li><br><li><p>API: Fixed an issue in the Quantizer API where parsing an input list file containing comment lines (e.g., lines starting with ‘%’)<br>could fail. 
{136414}</p></li><br><li><p>API: Fixed an issue where the GenAIExecutor would return invalid performance metrics, such as -1 or 0 for timing and tokens per<br>second. {137575}</p></li><br><li><p>API: Reduced excessive warning messages generated by <cite>qairt.compile</cite> by correcting an internal log level configuration. {137628}</p></li><br><li><p>API: Refactored the Python API to ensure model configuration files (<cite>config.json</cite>) can be loaded correctly using standard methods<br>like <cite>autoconfig.from_pretrained</cite>. {131057}</p></li><br><li><p>API:CPU: Fixed an issue where graph composition for the CPU backend would fail with an OpConfig validation error for the Transpose<br>Op, particularly when using the <cite>float_precision=16</cite> conversion option. {138242}</p></li><br><li><p>CPU: Fixed an issue where certain models failed during inference due to an invalid layer parameter value resulting from a GroupNorm<br>operation failure. {135924}</p></li><br><li><p>Core: Improved model initialization time on the HTP backend by optimizing internal system calls during runtime setup. {136899}</p></li><br><li><p>Genie: Fixed LM head execution for split LEQ models during the last iteration of prefill. {139824}</p></li><br><li><p>Genie: Fixed a memory leak in the tokenizer implementation observed when running <cite>genie-t2t-run</cite> with the LoRA adapter. {130865}</p></li><br><li><p>Genie: Fixed an issue where LLM inference could produce random or incorrect output. {124867}</p></li><br><li><p>Genie: Fixed sampling for float16 models which would produce nonsensical response text. {134604}</p></li><br><li><p>Genie: Reduced peak RAM by removing unnecessary copies for embedding LUT encoders when running embeddings on CPU, addressing high<br>memory usage for longer prompts. {134506}</p></li><br><li><p>Genie: Resolved a crash in the Genie runtime that occurred when using non-empty stop sequences in a dialogue query. 
{138311}</p></li><br><li><p>HTA: Fixed a segmentation fault that could occur when executing a cached model on the HTA backend if a subgraph fell back to the DSP<br>backend. {127808}</p></li><br><li><p>HTP: Fixed a performance regression on the HTP backend that affected certain transformer models, including those using masked<br>softmax. {137554}</p></li><br><li><p>HTP: Fixed an accuracy regression for models using the ResizeNearestNeighbour Op. The fix adapts the HTP backend to handle updated<br>quantization parameters resulting from an improved CPU backend implementation of the Op. {116566}</p></li><br><li><p>HTP: Fixed an issue that prevented the DSP driver from loading correctly for multicore execution on Android. {135235}</p></li><br><li><p>HTP: Fixed memory deregistration failures in GenAI use cases by deallocating unused tensor buffers after inference completion in<br>async mode. {129731}</p></li><br><li><p>HTP: Resolved a performance regression on the HTP backend that affected both synchronous and asynchronous inference modes for<br>certain models. {137386}</p></li><br><li><p>Op:HTP: Fixed an ElementwiseFloorDiv name mismatch. {135158}</p></li><br><li><p>LPAI: Fixed an accuracy regression for models using asymmetric parameter quantization. A change was introduced to correctly handle<br>the <cite>--param_quantizer_schema</cite> flag, which may require users to update their quantization settings. When a tensor’s encoding is<br>symmetric, the quantizer schema must now be set to <cite>unsignedsymmetric</cite> to ensure correct behavior. {138453}</p></li><br><li><p>Op:CPU: Fixed a dynamic bias issue in the DepthwiseConv2d Op that caused a segmentation fault with the QNN CPU backend. {137313}</p></li><br><li><p>Op:CPU: Fixed a memory leak in the Expand Dims Op by freeing the memory allocated for axis data. {138049}</p></li><br><li><p>Op:CPU: Fixed an issue by adding INT8 support for the GroupNorm Op. 
{135932}</p></li><br><li><p>Op:DSP: Fixed a performance regression by preventing an unnecessary Reshape Op from being added by the LogSoftmax implementation<br>when its input and output shapes are identical. {137013}</p></li><br><li><p>Op:HTP: Added 5D rank constraints for Softmax and Conv Ops, resolving an issue with ExecuTorch QNN Delegate model preparation.<br>{137462}</p></li><br><li><p>Op:HTP: Fixed an accuracy drop in the HTP backend’s <cite>GridSample</cite> Op that occurred with multi-batch inputs (batch size &gt; 1). {134663}</p></li><br><li><p>Op:HTP: Fixed an accuracy regression in the HTP backend implementation of the <cite>DepthToSpace</cite> Op. This change restores the behavior<br>to align with previous versions, resolving potential output deviations for models utilizing this operation. {139578}</p></li><br><li><p>Op:HTP: Resolved an accuracy issue where models using the <cite>Concat</cite> Op on the HTP backend could produce different and less accurate<br>results when running without the <cite>--debug</cite> flag in <cite>qnn-net-run</cite>. {134084}</p></li><br><li><p>Tool: Fixed an issue where an incorrect offset was generated during the dequantization of tensors with signed symmetric, per-channel<br>encodings. {137056}</p></li><br><li><p>Tool: Resolved a segmentation fault that could occur in the <cite>qnn-context-binary-generator</cite> tool during the <cite>QnnContext_free</cite> call.<br>{139746}</p></li><br><li><p>Tool:Converter: Added support for GRU Op quantization, specifically enabling quantization for the LPAI backend by optimizing static<br>inputs. {126350}</p></li><br><li><p>Tool:Converter: Corrected an issue that could lead to accuracy regressions on the LPAI backend for models using 4-bit activation<br>quantization. The SDK now correctly enforces the use of 8-bit activation quantization, as 4-bit is not supported on the LPAI<br>backend. 
{137976}</p></li><br><li><p>Tool:Converter: Enabled the <cite>enableQnnQuant</cite> flag for the Resize Op in-out optimization, resolving issues with Nearest Neighbor and Bilinear<br>modes. {137641}</p></li><br><li><p>Tool:Converter: Fixed a bug in the Converter tool to ensure the correct order of input and output tensors in the QNN graph JSON<br>file during serialization, aligning them with the IR graph. {118500}</p></li><br><li><p>Tool:Converter: Fixed a corner case in the Expand Op pattern matching, specifically resolving an issue in the Squash Tile Unsqueeze<br>optimization that led to incorrect shape inference for multi-consumer cases. {136864}</p></li><br><li><p>Tool:Converter: Fixed a log print format issue that affected accuracy when converting LLM models with <cite>maskedsoftmax</cite>. {137471}</p></li><br><li><p>Tool:Converter: Fixed an issue where Batch Normalization (BN) scales and offsets were not correctly obtained from QDQ models,<br>ensuring proper application of BN parameter encodings. {129578}</p></li><br><li><p>Tool:Converter: Fixed an issue where ONNX Logsoftmax Opset11 would add unnecessary reshapes, leading to extra transpose operations,<br>even when input/output shapes were identical. {137545}</p></li><br><li><p>Tool:Converter: Fixed an issue where per-Block/per-Channel encodings were not correctly applied for weights during QAIRT conversion,<br>which prevented quantizing DLCs with 4-bit BQ weights. {134363}</p></li><br><li><p>Tool:Converter: Fixed an issue where using multiple Static Tensor nodes in a single graph would fail due to duplicate output tensor<br>names. {136080}</p></li><br><li><p>Tool:Converter: Fixed an issue with merging <cite>Mul</cite> and <cite>Add</cite> operations into <cite>Batchnorm</cite> by correcting pattern definitions and adding<br>validation checks. {136756}</p></li><br><li><p>Tool:Converter: Reduced converter memory and time usage by avoiding unnecessary access to tensor weights. 
{137665}</p></li><br><li><p>Tool:Converter: Removed the <cite>beartype</cite> import in the PyTorch converter. {134045}</p></li><br><li><p>Tool:Converter: Resolved an issue in the Layout Transform post-optimization where a node could be incorrectly squashed multiple<br>times, causing incorrect broadcasted output shapes for certain <cite>Reshape</cite> and <cite>Transpose</cite> operations. {139382}</p></li><br><li><p>Tool:Converter: Updated tensor name sanitization logic to ensure uniqueness and prevent conflicts, resolving issues like “Compose<br>Graph failed: Sigmoid Tensor already exists”. {135409}</p></li><br><li><p>Tool:Converter:ONNX: Enhanced support for the <cite>If</cite> Op in the ONNX converter to allow subgraphs with multiple outputs. {136721}</p></li><br><li><p>Tool:Converter:ONNX: Resolved a <cite>NameError</cite> in the quantizer tool that occurred due to a missing internal logging function. {140893}</p></li><br><li><p>Tool:Quantizer: Resolved an issue in the quantizer to correctly apply per-channel quantization for grouped ConvTranspose Ops.<br>{136585}</p></li><br><li><p>Tool:qnn-context-binary-generator: Enhanced <cite>qnn-context-binary-generator</cite> to precompute and validate adaptation weight metadata<br>paths, allowing early error detection for erroneous LoRA config contents and avoiding long wait times. {126629}</p></li><br><li><p>Tool:qnn-model-lib-generator: Redirected error logs to <cite>stderr</cite> and all other logs to <cite>stdout</cite>. {135807}</p></li><br></ul> |
| 2.35.0 | May 2025 | <ul class="simple"><br><li><p>API: Added LLM support in the Python API. {118016}</p></li><br><li><p>API:Genie: Added a data-alignment-size configuration option for dialog and embeddings APIs. {130270}</p></li><br><li><p>API:Genie: Introduced the GeniePipeline.h and GenieNode.h APIs, providing multimodal support. {123389}</p></li><br><li><p>API:Genie: Introduced the GenieTokenizer.h API. {126408}</p></li><br><li><p>API:HTP: Added support for new memory buffer types (<cite>QNN_HTP_MEM_WEIGHTS_BUFFER</cite> and <cite>QNN_HTP_MEM_SCRATCH_BUFFER</cite>) in the<br><cite>QnnMem_register</cite> and <cite>QnnMem_deregister</cite> APIs. {121766}</p></li><br><li><p>API:HTP: Introduced API changes to support external weights and spill/fill buffers. {121760}</p></li><br><li><p>CPU: Added Phi 3 and Phi 3.5 model configurations to the Genie SDK. {134117}</p></li><br><li><p>CPU: Added support for dangling inputs in graphs. {134280}</p></li><br><li><p>Core: Added platform information to the JSON output of the context binary utility. {129905}</p></li><br><li><p>Docs: Updated QNN/SNPE documentation to include QCS8625 in the list of supported Snapdragon devices. {134450}</p></li><br><li><p>Genie: Added support for use-mmap on Windows platforms. {116519}</p></li><br><li><p>Genie: Enabled support for multi-modal inference with low latency through the Genie pipeline, supporting various input/output<br>modalities and utilizing shared embedding weights. {120507}</p></li><br><li><p>Genie: Removed printing of KPIs to stdout in favor of GenieProfile. {123352}</p></li><br><li><p>HTP: Added initial support for multi-core weight sharing during deserialization, including functions to handle VA allocation for<br>weights per core and passing multi-core metadata. {124612}</p></li><br><li><p>HTP: Added multicore weight sharing support during deserialization to map shared weights to different cores without requiring VA<br>reservations. 
{135411}</p></li><br><li><p>HTP: Added support for configuring extended_udma prepare time. {136435}</p></li><br><li><p>HTP: Added support for measuring end-to-end latency in the runtime. {98570}</p></li><br><li><p>HTP: Added support for the <cite>QNN_HTP_CONTEXT_CONFIG_OPTION_DEFER_GRAPH_INIT</cite> context configuration option to postpone graph-related<br>tasks. {130605}</p></li><br><li><p>HTP: Added support for the <cite>QNN_HTP_CONTEXT_GET_PROP_BUFFER_START_ALIGNMENT</cite> context property to retrieve buffer start alignment.<br>{134678}</p></li><br><li><p>HTP: Added support for using external weights and scratch buffers on the HTP backend. {121767}</p></li><br><li><p>HTP: Added support to save the transport result for multicore transport during async execution. {132146}</p></li><br><li><p>HTP: Enabled support for dynamic input and output resolution for SD3 on the HTP backend. {105781}</p></li><br><li><p>HTP: Enabled the mmap budget feature for WoS to reduce peak RAM usage during context initialization for GenAI use cases. {131070}</p></li><br><li><p>HTP: Extended binary format support for spill/fill to include external buffers. {136017}</p></li><br><li><p>HTP: Implemented buffer size calculations for the HTP backend, including consideration for graph selection and calculation of<br>maximum spill/fill buffer size. {121765}</p></li><br><li><p>HTP: Updated the Throughput Net Run (TNR) application to utilize thread_pool utilities for thread management. {113123}</p></li><br><li><p>Op:CPU: Added dynamic dimension support for AvgPool2D. {126775}</p></li><br><li><p>Op:CPU: Added dynamic dimension support for the InstanceNorm Op. {101384}</p></li><br><li><p>Op:CPU: Added support for the ‘frame_pad’ parameter in the Buffer Op. {133242}</p></li><br><li><p>Op:GPU: Added support for the Cast operation from INT64 to INT32 on Windows. {132750}</p></li><br><li><p>Op:HTP: Added INT16 support for the ElementWiseAsin Op on the HTP backend. 
{114479}</p></li><br><li><p>Op:HTP: Added support for the MaskedSoftmax Op on the HTP backend for LLM use cases. {110661}</p></li><br><li><p>Op:HTP: Implemented performance optimizations for the Score Filter and NMS operations on the HTP backend. {134740}</p></li><br><li><p>OpDef: Added Op definition for IsInf. {125370}</p></li><br><li><p>SDK: Added an option to enable optrace profiling in the TNR application. {135588}</p></li><br><li><p>SDK: Enabled SNPE, QNN, and QNN delegate support for the QCM8550 platform. {129533}</p></li><br><li><p>Tool:Converter: Added dynamic weights support for the Deconv Op in TensorFlow models. {109713}</p></li><br><li><p>Tool:Converter: Added support for Add, Subtract, Multiply, and Divide operations in Float32 precision for static tensor<br>manipulation within the G2G IR. {125540}</p></li><br><li><p>Tool:Converter: Added support for ONNX 1.16.1 in the Ubuntu 20.04 (Focal) environment. {134975}</p></li><br><li><p>Tool:Converter: Added support for the Size operation and updated Relu opset versions in the ONNX converter to address unsupported<br>operations in certain models. {133472}</p></li><br><li><p>Tool:Genie: Introduced the genie-app command-line utility. {123548}</p></li><br><li><p>Tool:HTP: Added support for the HTP MCP Binary format in the <cite>QnnHtpBinaryBufferPrinter</cite> tool, enabling proper parsing and<br>printing of MCP binaries. {128507}</p></li><br><li><p>API: Allowed passing extra arguments through the Python API’s <cite>ConverterConfig</cite> to underlying modules. {133985}</p></li><br><li><p>API: Fixed an encodings path issue during the build phase with GenAI models using the Python API. {133815}</p></li><br><li><p>API: Fixed an issue where quantized and compiled models failed during execution with the Python API when using default<br><cite>CalibrationConfig</cite> values. 
{134858}</p></li><br><li><p>API: Fixed an issue where the QAIRT Python API failed to load backend libraries (<cite>QnnCpu.dll</cite>/<cite>QnnHtp.dll</cite>) on certain devices. {134461}</p></li><br><li><p>API: Fixed an issue with the JSON reader setting in QNN profiling on Windows. {134565}</p></li><br><li><p>CPU: Fixed a memory management issue for xnnpack Conv2D nodes. {132710}</p></li><br><li><p>CPU: Fixed an issue where certain models failed during inference due to an invalid layer parameter value resulting from a GroupNorm operation failure. {135924}</p></li><br><li><p>Core: Fixed cross-SoC compatibility issues caused by unsynchronized GpuInfo fields between SocServer and SocUtility. {135786}</p></li><br><li><p>DSP: Fixed a context binary generation issue on the OE Linux platform. {124376}</p></li><br><li><p>DSP: Fixed an issue where <cite>snpe-net-run</cite> failed due to an unavailable runtime. {135399}</p></li><br><li><p>DSP: Fixed inference time regressions observed on HTP_FP16 and HTP backends by propagating DSP architecture characteristics to the HTP core. {133777}</p></li><br><li><p>GPU: Resolved model verification failures related to Conv Kernel processing that occurred with certain CNN models on the GPU backend. {130041}</p></li><br><li><p>Genie: Fixed an asynchronous initialization issue for Windows platforms. {135904}</p></li><br><li><p>Genie: Fixed an issue where GenieDialog_save/restore could not be used with GENIE_DIALOG_SENTENCE_REWIND. {135558}</p></li><br><li><p>Genie: Fixed an issue where GenieProfiling data could report an invalid initialization time. {134498}</p></li><br><li><p>Genie: Fixed an issue where stop sequences did not work with GenieDialog_embeddingQuery. {134592}</p></li><br><li><p>HTP: Adjusted max PD size calculation to correctly account for far weights, resolving an issue with unexpected secondary PD triggers during specific test conditions. 
{127268}</p></li><br><li><p>HTP: Fixed a stability issue with Llama 3 3B multicore models by updating the method for setting the mc_spill_fill buffer. {135253}</p></li><br><li><p>HTP: Fixed a crash occurring in multicore graphs due to incorrect identification of spillfill memory pools by the Hexagon NN API. {135543}</p></li><br><li><p>HTP: Fixed an issue where <cite>qnn-net-run</cite> failed to open a session due to library loading and device transport instance creation errors. {135028}</p></li><br><li><p>HTP: Fixed an issue where core information was not correctly captured in optrace for multicore execution. {133797}</p></li><br><li><p>HTP: Fixed an out-of-memory issue occurring when running Llama 3 8B models on a single core without splitting. {134696}</p></li><br><li><p>HTP: Fixed async execution failures observed while running certain models in a multicore configuration with shared buffers. {135047}</p></li><br><li><p>HTP: Fixed a logic error in graph switching. {133794}</p></li><br><li><p>HTP: Fixed multicore async inference failures, including issues observed with zero-copy. {134701}</p></li><br><li><p>HTP: Improved model execution time performance on SM8750, addressing an issue where the execution time KPI was not being met. {128145}</p></li><br><li><p>HTP: Resolved a graph execution failure issue observed during the async_group_init_llama7b_graph_switch_no_shared_resources test. {126402}</p></li><br><li><p>HTP: Resolved an issue causing incorrect mapping of test failures in nightly reports. {125884}</p></li><br><li><p>HTP: Resolved an issue leading to a “Failed to deregister ion memory with the backend” log message during multi-threaded HTP binary execution with shared buffers. {129716}</p></li><br><li><p>HTP: Resolved differences in adapter switch time between Genie and <cite>qnn-net-run</cite> by addressing issues related to graph switching and power settings. 
{131776}</p></li><br><li><p>Op:CPU: Fixed TransposeConv2d for asymmetric kernels in Float execution. {133778}</p></li><br><li><p>Op:CPU: Fixed an issue by adding INT8 support for the GroupNorm Op. {135932}</p></li><br><li><p>Op:GPU: Fixed accuracy errors with the ReduceSum operation when used with Image2DArray for non-Mean ops and specific dimensions. {131616}</p></li><br><li><p>Op:GPU: Fixed inference failures in models with Argmax/Argmin Ops. {133052}</p></li><br><li><p>Op:HTP: Added support for LayerNorm when the constant input is FP16 converted to FP32. {131420}</p></li><br><li><p>Op:HTP: Enabled UINT_8 datatype support for the StridedSlice Op on the HTP backend, resolving model conversion and graph preparation failures. {125597}</p></li><br><li><p>Op:HTP: Fixed an accuracy issue for the GatherNd Op. {110126}</p></li><br><li><p>Op:HTP: Fixed an accuracy issue with LPBQ convolution for MOE on v73. {133134}</p></li><br><li><p>Op:HTP: Fixed an issue where the Genie output entered an infinite loop on WoS by updating the prompt file. {134680}</p></li><br><li><p>Op:HTP: Fixed high power consumption for the DepthwiseConv Op with asymmetric stride by optimizing the pattern on the HTP backend. {133635}</p></li><br><li><p>Op:HTP: Improved accuracy of the Swish Op. {133898}</p></li><br><li><p>Op:HTP: Improved performance of the MatMul Op running on HVX. {135210}</p></li><br><li><p>Op:HTP: Improved the performance of the 5D GridSample Op on the HTP backend for W8A16 quantization. {122831}</p></li><br><li><p>Op:HTP: Improved the performance of the GridSample Op on the HTP backend by addressing tiling and scheduling issues. {126462}</p></li><br><li><p>SDK: Fixed an issue where some models failed at the concat operation during graph preparation. {132887}</p></li><br><li><p>Tool: Added a validation check for float fallback to prevent quantizer failures when encodings or calibration lists are not provided. 
{133463}</p></li><br><li><p>Tool: Added support for the <cite>--onnx_batch</cite> and <cite>--tensorflow_batch</cite> options in Hypertuner after QAIRT converter changes. {131064}</p></li><br><li><p>Tool: Eliminated a misleading warning message “Function not called, PrepareLib isn’t loaded!” that would appear when running <cite>qnn-net-run</cite> successfully on HTP. {122382}</p></li><br><li><p>Tool: Fixed an issue where the <cite>is_symmetric</cite> value for 32-bit bias tensors was incorrectly reset during Float Fallback, causing failures when the output DLC was passed back to the quantizer. {135379}</p></li><br><li><p>Tool: Fixed the quantizer to insert a Convert Op for LayerNorm weights with external encoding. {134466}</p></li><br><li><p>Tool: Resolved an issue where <cite>snpe-dlc-graph-prepare</cite> failed for certain models due to incompatible float bitwidths when QParams were present, particularly in the float fallback path. {130558}</p></li><br><li><p>Tool:Converter: Fixed a bug in LayerNorm squeeze_axes. {126234}</p></li><br><li><p>Tool:Converter: Added a pattern that maps to the Expand Op to reduce inference time. {132363}</p></li><br><li><p>Tool:Converter: Added a warning message for the NonZero Op when the output shape is dynamic. {126185}</p></li><br><li><p>Tool:Converter: Added support for a new einsum equation, expanding the range of supported ONNX models. {133824}</p></li><br><li><p>Tool:Converter: Converter-generated FullyConnected Ops now have 2D input and 2D output. {127049}</p></li><br><li><p>Tool:Converter: Ensured that <cite>ApplyEncodings</cite> is called by the quantizer when <cite>--use_quantize_v2</cite> is provided internally, even if it is not on the command line. {133705}</p></li><br><li><p>Tool:Converter: Fixed JSON dumping for 4-bit quantized tensors. {133481}</p></li><br><li><p>Tool:Converter: Fixed KernelScale expansion for scalars in TFLite DeConv dequantization. 
{128978}</p></li><br><li><p>Tool:Converter: Fixed a bug in NonZero Op translation constant folding. {127165}</p></li><br><li><p>Tool:Converter: Fixed a bug in the squash_node_into_nn_node optimization. {126354}</p></li><br><li><p>Tool:Converter: Fixed a conversion error that occurred when <cite>--float_bitwidth 16</cite> was provided on the command line with existing quantization parameters. {134716}</p></li><br><li><p>Tool:Converter: Fixed a corner case in the DCE process in the converter to correctly handle node removal based on the number of consumers of output tensors. {129704}</p></li><br><li><p>Tool:Converter: Fixed an error in the squash_node_into_nn_node optimization. {132836}</p></li><br><li><p>Tool:Converter: Fixed an issue where output nodes for BatchMatMul and BatchMatMulV2 Ops were missing, by adding support for converting them to the FullyConnected Op. {127139}</p></li><br><li><p>Tool:Converter: Fixed an issue where the converter failed when using the <cite>--desired_input_layout</cite> argument with the new layout transform algorithm, by unifying its behavior with <cite>custom_io</cite>. {136144}</p></li><br><li><p>Tool:Converter: Fixed an issue with 6D support for Concat and Constant Ops in the frontend, resolving a core dump error during quantization. {117698}</p></li><br><li><p>Tool:Converter: Fixed incorrect population of the “is_symmetric” flag, ensuring encodings are dumped correctly. {134673}</p></li><br><li><p>Tool:Converter: Fixed an issue observed when several GRU Ops share one initial hidden state, and added a unit test for bidirectional GRU. {91127}</p></li><br><li><p>Tool:Converter: Resolved an accuracy regression issue related to the <cite>squash_batchnorm</cite> optimization in the converter by ensuring the optimization correctly handles encodings. {130130}</p></li><br><li><p>Tool:Converter: Skipped adding dummy weights and bias tensors during LayerNorm pattern matching. 
{128870}</p></li><br><li><p>Tool:Converter:ONNX: Fixed axis_format handling in matmul_to_fc translation. {118318}</p></li><br><li><p>Tool:Converter:ONNX: Fixed a model conversion issue with the Resize operation in the ONNX converter. {131677}</p></li><br><li><p>Tool:Converter:ONNX: Fixed an ONNX conversion failure for the Sam2 Image Encoder model by addressing layout format issues for MatMul node inputs and outputs. {131098}</p></li><br><li><p>Tool:Op:HTP: Optimized the DepthwiseConv Op with asymmetric stride to improve performance for specific models. {132474}</p></li><br><li><p>Tool:accuracy_debugger: Corrected a tensor shape issue for the oneshot algorithm with ONNX batch=1; the onnx_batch override option is no longer accessible. {133915}</p></li><br><li><p>Tool:qairt-accuracy-evaluator: Removed the preproc-file option from the Accuracy Evaluator CLI because it is no longer valid following the deprecation of minimal mode. {129278}</p></li><br><li><p>Tool:qnn-onnx-converter: Fixed an issue where static tensor framework trace information was missing for some tensors. {120982}</p></li><br><li><p>Tool:qnn-tensorflow-converter: Added logic to ensure the min-max range in TensorFlow FakeQuantPerChannel nodes is symmetric. {118672}</p></li><br><li><p>Tool:quantizer: Fixed an issue with 2-bit weight quantization calculation, resolving incorrect output values. {132048}</p></li><br></ul> |
| 2.34.0 | April 2025 | <ul class="simple"><br><li><p>API:Genie: Added the GenieSampler_registerUserDataCallback API, which adds a userData argument to the sampler custom callback. {130164}</p></li><br><li><p>API:Genie: Added <cite>GenieEngine.h</cite>, <cite>GenieDialog_getEngine</cite>, and <cite>GenieDialog_bindEngine</cite> APIs. {126715}</p></li><br><li><p>API:SNPE: Added Java API <cite>setUnconsumedTensorsOutput()</cite>, equivalent to the C/C++ builder API <cite>Snpe_SNPEBuilder_SetUnconsumedTensorsAsOutputs()</cite> / <cite>SNPEBuilder::setUnconsumedTensorsAsOutputs()</cite>. {125891}</p></li><br><li><p>CPU: Added BOOL support in CPU Concat Op. {130940}</p></li><br><li><p>CPU: Added axes parameter support in L2Norm. {121463}</p></li><br><li><p>DSP:SNPE: Added the ability to display the exact priority of the HVX thread in the log to help identify potential issues related to HVX concurrency scenarios. {117790}</p></li><br><li><p>Genie: Added KV quantization support for the GenAiTransformer backend. {123438}</p></li><br><li><p>Genie: Added a LoRAv3 reference/sample Genie configuration to the SDK examples. {130008}</p></li><br><li><p>Genie: Added the Eaglet dialog type. {126452}</p></li><br><li><p>Genie: Added token-acceptance-rate to the GenieProfile output for some dialog types. {123350}</p></li><br><li><p>Genie: Introduced a performance optimization where logits are sampled directly in the model’s native output datatype. {121359}</p></li><br><li><p>HTP: Deprecated optrace collection via debug configuration files. Use optrace via profiling instead. {124739}</p></li><br><li><p>HTP: Fixed an issue where the number of items was missing in the multicore callback. {129636}</p></li><br><li><p>HTP: Implemented a service call to perform dspqueue_close in multicore environments. 
{126381}</p></li><br><li><p>HTP: Introduced parallel graph execution, enabling multiple graphs to run concurrently on a single HTP core to improve throughput and resource utilization. {89181}</p></li><br><li><p>HTP: Improved performance of the Softmax Op for 32 or fewer channels. {130819}</p></li><br><li><p>Op:GPU: Added support for the GridSample Op. {127898}</p></li><br><li><p>Op:HTP: Optimized DepthWiseConv2d Op execution by ensuring it runs on HMX. {128655}</p></li><br><li><p>Op:HTP: Optimized DepthwiseConv Op performance for an ASR model on SM8750 HTP W8A16. {129860}</p></li><br><li><p>OpDef: Added dynamic shape support for the FullyConnected Op. {116235}</p></li><br><li><p>OpDef: Added the optional parameter <cite>buffer_padding</cite> to the Buffer Op. {125962}</p></li><br><li><p>Tool:Converter: Added support for BQ and LPBQ in the JSON serializer and deserializer. {132650}</p></li><br><li><p>Tool:Converter: Added support for quantized DLC files as input to the quantizer module: (1) if all tensors are already quantized or overridden to float, the quantizer returns directly; (2) for a half-quantized DLC, the fixed-point tensors are dequantized back to float before quantization; (3) all float tensors are then quantized. {129135}</p></li><br><li><p>Tool:Converter: Added support for triggering the Quantizer in float_fallback mode. {129131}</p></li><br><li><p>Tool:Converter: Fixed handling of dynamic input shapes with a more informative error message. {127631}</p></li><br><li><p>Tool:Converter: Introduced a new Converter argument, <cite>--export_format</cite>, to select the Converter output export format (<cite>DLC_DEFAULT</cite> or <cite>DLC_STRIP_QUANT</cite>). {129132}</p></li><br><li><p>Tool:Converter: QAIRT Quantizer now skips quantization steps if float_fallback is specified for an input quantized DLC. 
{130397}</p></li><br><li><p>Tool:qnn-onnx-converter: Added the <cite>--preserve_onnx_output_order</cite> option to maintain ONNX output order in the converted graph. {126070}</p></li><br><li><p>QNN Core: Fixed an issue where QNN Savecontext failed for multiple models on Windows platforms because the graph could not be found in the DLC. {130104}</p></li><br><li><p>CPU: Added INT32 datatype support for ScatterElements. {126766}</p></li><br><li><p>CPU: Fixed L2Norm to handle multiple axes. {127053}</p></li><br><li><p>CPU: Fixed verifier failures for single-layer resize models on the ONNX16 framework. {124524}</p></li><br><li><p>CPU: Implemented a deep copy of <cite>opConfig</cite> in CPU to prevent model failures. {128204}</p></li><br><li><p>DSP: Fixed an SNPE inference failure due to QnnContext_createFromBinary failing with a memory allocation error. {127804}</p></li><br><li><p>DSP: Fixed an SNPE inference failure where multiple models failed due to errors obtaining input tensor names. {127809}</p></li><br><li><p>DSP: Fixed inference failures for specific models on HTP due to network partition issues. {131151}</p></li><br><li><p>GPU: Fixed an accuracy error in QnnGpuOperationTestActivationAndroid. {125640}</p></li><br><li><p>GPU: Fixed an accuracy error in QnnGpuOperationTestTransposeConvAndroid. {125992}</p></li><br><li><p>GPU: Fixed inference regressions in models with a Convolution Op in <cite>gpu_fp16</cite> mode on some devices. {120026}</p></li><br><li><p>Genie: Fixed an issue in genie-t2t-run where dialog de-initialization data was not saved. {132621}</p></li><br><li><p>Genie: Fixed an issue where GenieEmbedding_generate would return a rank of 0. {131581}</p></li><br><li><p>Genie: Fixed an issue where quantized values could overflow or underflow. {125929}</p></li><br><li><p>HTP: Addressed inference time regressions on multiple chipsets for HTP and HTP_FP16 configurations. 
{128165}</p></li><br><li><p>HTP: Corrected the TransportResult resize function to properly set the number of cores. {132311}</p></li><br><li><p>HTP: Fixed a LayerNorm validation failure by checking the rank of the bias only if it is present in the LayerNorm Op. {106186}</p></li><br><li><p>HTP: Fixed a Windows compatibility issue related to non-shared weight VA reservation. {130567}</p></li><br><li><p>HTP: Fixed a crash in libQnnHtp.so that occurred in graph switch scenarios involving spill-fill buffer sharing. {131575}</p></li><br><li><p>HTP: Fixed a deadlock in <cite>allocateAndMapPersistentSpillFillBuffer()</cite> that occurred due to locking conflicts. {132488}</p></li><br><li><p>HTP: Fixed a hang in GenAI TNR tests when using asynchronous group initialization together with weight sharing and spill-fill sharing. {132586}</p></li><br><li><p>HTP: Fixed a multithreaded concurrency issue with LLM and small models that caused a ‘memHandles registration failure’. {131051}</p></li><br><li><p>HTP: Fixed a performance regression for a MobileBERT model that was introduced in a previous release. {132111}</p></li><br><li><p>HTP: Fixed a prepare failure for the L2Norm Op with FP16 when the relaxed_precision_flag is not set during the converter stage. {129566}</p></li><br><li><p>HTP: Fixed an issue where QNN HTP inference failed during MC detailed profiling. {132564}</p></li><br><li><p>HTP: Fixed an issue where multiple VA sharing groups caused the error ‘Unable to map reserved buffer for non-shared weights’. {131009}</p></li><br><li><p>HTP: Fixed an issue where qnn-context-binary-generator would hang, consuming excessive CPU and memory. {126833}</p></li><br><li><p>HTP: Fixed intermittent hangs that occurred during the creation of a context from a binary in concurrent scenarios. {131049}</p></li><br><li><p>HTP: Fixed the checker failures related to the OpPackage example by correcting the include path. 
{130707}</p></li><br><li><p>HTP: Improved performance to address inference time regressions observed on multiple chipsets. {131073}</p></li><br><li><p>HTP: Resolved an issue related to spill-fill buffer sharing, which caused incorrect output. {124544}</p></li><br><li><p>HTP: Resolved x86_prepare failures during savecontext and addressed high CPU utilization during graph preparation. {125093}</p></li><br><li><p>HTP: Resolved failures in LoRA v2 test cases due to DSP transport call issues, impacting multi-model context and graph switch scenarios. {130142}</p></li><br><li><p>HTP: Resolved inference time regressions on SM8750 by avoiding broadcast overhead on mul_op, improving performance of uint16 elementwise multiplication. {125746}</p></li><br><li><p>HTP: Reverted the enablement of the 64-bit flag to address reported hangs. {130301}</p></li><br><li><p>HTP: Updated the PGE support check to use the supported features of the SoC model. {127754}</p></li><br><li><p>LPAI: Fixed a failure in LPAI direct mode. {131750}</p></li><br><li><p>LPAI: Fixed an issue where LPAI single-layer models were failing. {130729}</p></li><br><li><p>Op:DSP: Added LayerNorm support by modifying a hard-coded check. {122112}</p></li><br><li><p>Op:HTP: Added 5D support for float Sigmoid. {128867}</p></li><br><li><p>Op:HTP: Addressed performance issues when converting models with W8A16 compared to W8A8 on SM8350 by optimizing MatMul and Gemm Ops. {121404}</p></li><br><li><p>Op:HTP: Fixed a ReduceMax FP16 compilation error. {127900}</p></li><br><li><p>Op:HTP: Fixed a QNN context-binary-generator failure due to a TCM insufficient tile error when processing a custom model. {129510}</p></li><br><li><p>Op:HTP: Fixed context binary generation failures for ArgMin/ArgMax Ops due to TCM overflow. {108763}</p></li><br><li><p>Op:HTP: Fixed model validation errors during context saving, specifically addressing issues with the DepthToSpace Op. 
{131083}</p></li><br><li><p>Op:HTP: Fixed a numerical issue for DepthwiseConv2d -&gt; HardSwish in a MobileNetV3 model. {128158}</p></li><br><li><p>Op:HTP: Fixed rank constraints of the Op replacement rule. {130194}</p></li><br><li><p>Op:HTP: Improved DepthwiseConv2D performance. {126421}</p></li><br><li><p>Op:HTP: Optimized Reshape Ops when PCQ is enabled on constant tensors going into a MatMul Op, improving performance. {130415}</p></li><br><li><p>Op:HTP: Registered QInt16 for the Concat Op to resolve graph preparation failures when using QuantInt16 tensors. {125735}</p></li><br><li><p>Op:HTP: Resolved an issue where context binary size calculation failed during graph preparation. {124130}</p></li><br><li><p>Op:HTP: Resolved an on-device hang during execution of Dynamic MobileNet V2, specifically during the Transpose Op. {126806}</p></li><br><li><p>Op:HTP: Resolved context binary generation failures for the BevFormer model with AMP encodings. {129991}</p></li><br><li><p>SDK: Fixed build issues in Qnn SampleApp, Qnn SampleAppAsyncExecution, and Qnn SampleAppSharedBuffer. {131442}</p></li><br><li><p>SDK: Removed “pytorch to onnx conversion avoidance suggestions” from QNN SDK Docs. {132125}</p></li><br><li><p>SDK: Renamed <cite>ReleaseNotes.txt</cite> to <cite>QAIRT_ReleaseNotes.txt</cite>; it now contains release notes for both Unix and WoS. {127817}</p></li><br><li><p>SNPE: Fixed API <cite>Snpe_SNPEBuilder_SetInitCacheMode()</cite>/<cite>SNPEBuilder::setInitCacheMode()</cite> breakage for non-HTP backends when using the <cite>snpe-net-run</cite> option <cite>--enable_init_cache</cite>. {129545}</p></li><br><li><p>SNPE: Fixed the <cite>--enable_init_cache</cite> option (API <cite>SNPEBuilder::setInitCacheMode()</cite>/<cite>Snpe_SNPEBuilder_SetInitCacheMode()</cite>) in <cite>net-run</cite> for the AIP runtime. 
{131929}</p></li><br><li><p>Tool:Converter: Corrected an issue where qnn-context-binary-generator logged an incorrect QPC path when the <cite>--backend_binary</cite> option was used. {126169}</p></li><br><li><p>Tool:Converter: Corrected the allowed length for pad amounts for 4D tensors in the emitter. {132185}</p></li><br><li><p>Tool:Converter: Enabled data-invariant optimizations for the Tile Op. If the input of the Tile Op is quantized, the input dataType and qInfo are copied to the output. {126372}</p></li><br><li><p>Tool:Converter: Fixed Layout Transform to avoid unintentionally loading deferred weights. {132173}</p></li><br><li><p>Tool:Converter: Fixed a segfault in IrJsonDeserializer during deserialization of newly generated model JSON files. {129816}</p></li><br><li><p>Tool:Converter: Fixed an issue where Accuracy Evaluator runs failed at the Netrun stage. {129997}</p></li><br><li><p>Tool:Converter: Fixed an issue where FOLD_MULTIPLE_TRANSPOSE was incorrectly pruning graph outputs. {127963}</p></li><br><li><p>Tool:Converter: Fixed an issue where context binary generation failed with a ‘Graph Finalize failure’ when using multi-Qranium pipelined partitioning. {124908}</p></li><br><li><p>Tool:Converter: Fixed an issue where qnn-context-binary generation failed for LVM UNet models due to tensor updateability and GroupNorm Op validation errors with the HTP backend. {127887}</p></li><br><li><p>Tool:Converter: Fixed an issue where the qnn-context-binary-generator tool failed on Windows-X86 when processing LoRAv3 models. {130894}</p></li><br><li><p>Tool:Converter: Fixed an index error in the remove-identity optimization. {125867}</p></li><br><li><p>Tool:Converter: Fixed folding of multiple transposes to retain graph output names. 
{128685}</p></li><br><li><p>Tool:Converter: Resolved a serialization issue with MatMul Ops involving int16*int16 data types when using dynamic 16-bit weights. {129733}</p></li><br><li><p>Tool:Converter:ONNX: Added support for dynamic inputs for the Clip Op. {124203}</p></li><br><li><p>Tool:Converter:ONNX: Fixed an issue in the Converter to ensure correct name sanitization following C++ naming conventions. {129356}</p></li><br><li><p>Tool:Converter:ONNX: Fixed axis tracking in ScatterElements. {118614}</p></li><br><li><p>Tool:Converter:ONNX: Fixed an issue in the reverse GRU Op to ensure the correct order of input names for the first output. {130544}</p></li><br><li><p>Tool:Converter:ONNX: Updated the ExpandOp translation to reduce inference time. {127065}</p></li><br><li><p>Tool:qairt-accuracy-evaluator: Fixed an issue where the input list was incorrectly passed to the quantizer. {130537}</p></li><br><li><p>Tool:qairt-accuracy-evaluator: Added support for the ‘algorithms’ quantizer parameter in the evaluator, and now provides the input shape to the converter for PyTorch models. {126291}</p></li><br><li><p>Tool:qnn-accuracy-debugger: Enhanced the qnn-accuracy-debugger tool to provide more meaningful metrics for intermediate tensor cosine similarity. {126437}</p></li><br><li><p>Tool:qnn-net-run: Resolved an issue in accuracy evaluator runs where the error “‘Namespace’ object has no attribute ‘preserve_graph_output_order’” was encountered. {132180}</p></li><br><li><p>Tool:qnn-onnx-converter: Aligned the ONNX Resize Op translator’s behavior with ONNX definitions. {123092}</p></li><br><li><p>Tool:snpe-architecture-checker: Fixed an issue where snpe-architecture-checker would fail due to an uninitialized variable. {126778}</p></li><br><li><p>Tool:snpe-stress-net-run: Fixed a memory leak when loading QNN models. {128498}</p></li><br></ul> |

Last Published: Jun 04, 2026
