# QNN LPAI Memory Management

This document describes how the QNN Low-Power AI (LPAI) runtime uses and manages
memory. The runtime relies on user-allocated buffers that must obey backend-provided
alignment constraints. Incorrect alignment or insufficient memory will cause
initialization or execution failures.

- [Overview of Memory Types](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#overview-of-memory-types)
- [Get Memory Alignment Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#get-memory-alignment-requirements)
- [Scratch Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#scratch-memory)

    - [Key Properties](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#key-properties)
    - [Querying Scratch Memory Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#querying-scratch-memory-requirements)
    - [Allocating and Configuring Scratch Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#allocating-and-configuring-scratch-memory)
- [Persistent Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#persistent-memory)

    - [Key Properties](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id2)
    - [Querying Persistent Memory Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#querying-persistent-memory-requirements)
    - [Allocating and Configuring Persistent Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#allocating-and-configuring-persistent-memory)
- [IO Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#io-memory)

    - [Key Properties](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id3)
    - [Querying IO Memory Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#querying-io-memory-requirements)
    - [Allocating and Configuring IO Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#allocating-and-configuring-io-memory)
    - [Shared Buffers in the LPAI Backend](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#shared-buffers-in-the-lpai-backend)
- [Memory Lifetime and Allocation Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#memory-lifetime-and-allocation-requirements)
- [Recommended Workflow](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#recommended-workflow)
- [TCM Memory Support in LPAIBackendExtensions (ADSP Direct Mode)](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#tcm-memory-support-in-lpaibackendextensions-adsp-direct-mode)

    - [Allocation Details](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#allocation-details)
    - [Configuring TCM Memory via JSON](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#configuring-tcm-memory-via-json)
    - [Invalid `mem_type` Configuration](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#invalid-mem-type-configuration)
    - [Testing TCM Memory Support](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#testing-tcm-memory-support)

## [Overview of Memory Types](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id4)

The LPAI runtime uses **three distinct memory pools**, each required for correct graph
execution:

1. [Scratch Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#qnn-lpai-scratch-memory)
2. [Persistent Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#qnn-lpai-persistent-memory)
3. [IO Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#qnn-lpai-io-memory)

Each type has unique allocation rules, lifetime characteristics, and backend alignment
requirements.

- **Scratch Memory**: temporary and overwritable tensors.
- **Persistent Memory**: long-lived tensors such as RNN state.
- **IO Memory**: input/output tensors; may be user-provided or automatically placed
into scratch memory.

All memory pools must be correctly aligned according to backend requirements.

## [Get Memory Alignment Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id5)

Before allocating any memory, clients must retrieve backend alignment constraints.
These constraints apply to:

- Scratch memory
- Persistent memory
- User-provided IO buffers

To query backend alignment requirements:

```c
QnnLpaiBackend_BufferAlignmentReq_t bufferAlignmentReq;

QnnLpaiBackend_CustomProperty_t customBackendProp;
customBackendProp.option   = QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ;
customBackendProp.property = &bufferAlignmentReq;

QnnBackend_Property_t backendProp;
backendProp.option         = QNN_BACKEND_PROPERTY_OPTION_CUSTOM;
backendProp.customProperty = &customBackendProp;

// NULL-terminated array of property pointers
QnnBackend_Property_t *backendPropPtrs[2] = {0};
backendPropPtrs[0] = &backendProp;

// Capture the return code so the result is only read on success
Qnn_ErrorHandle_t error = QnnBackend_getProperty(backendHandle, backendPropPtrs);

// startAddrAlignment and sizeAlignment are caller-provided output pointers
if (error == QNN_SUCCESS) {
  *startAddrAlignment = bufferAlignmentReq.startAddrAlignment;
  *sizeAlignment      = bufferAlignmentReq.sizeAlignment;
}
```
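The two values returned by this query constrain both where a buffer starts and how large it is. The following is a minimal host-side sketch of how they might be applied, not part of the QNN SDK: `round_up_size` and `alloc_aligned_buffer` are hypothetical helper names, the alignments are assumed to be nonzero powers of two, and the C11 `aligned_alloc` function stands in for whatever allocator the target platform actually requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round size up to the next multiple of alignment.
 * Assumes alignment is a nonzero power of two. */
static size_t round_up_size(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: allocate a buffer whose start address honors
 * startAddrAlignment and whose usable size is padded to sizeAlignment.
 * On success, *paddedSize (if non-NULL) receives the padded size.
 * Returns NULL on failure; release the buffer with free(). */
static void *alloc_aligned_buffer(size_t startAddrAlignment,
                                  size_t sizeAlignment,
                                  size_t size,
                                  size_t *paddedSize) {
  if (size == 0) {
    return NULL;
  }
  size_t padded = round_up_size(size, sizeAlignment);
  /* aligned_alloc expects the size to be a multiple of the alignment */
  size_t allocSize = round_up_size(padded, startAddrAlignment);
  void *ptr = aligned_alloc(startAddrAlignment, allocSize);
  if (ptr != NULL && paddedSize != NULL) {
    *paddedSize = padded;
  }
  return ptr;
}
```

With a start-address alignment of 64 and a size alignment of 32, a 100-byte request yields a 128-byte buffer on a 64-byte boundary. The padded size, not the requested size, is the value to report to the backend in this sketch.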

## [Scratch Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id6)

Scratch memory holds temporary intermediate results that the runtime can overwrite and
reuse during execution.

### [Key Properties](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id7)

- Used for intermediate tensors across graph execution.
- Fully memory-planned offline by the backend.
- Size must be queried from the graph.
- Must be provided before `QnnGraph_finalize()`.
- May be replaced at runtime but must always exist.

### [Querying Scratch Memory Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id8)

```c
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option   = QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE;
// property must point to storage that receives the required size;
// scratchSize is used as a pointer to that storage here
customGraphProp.property = scratchSize;

QnnGraph_Property_t graphProp;
graphProp.option         = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

// NULL-terminated array of property pointers
QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;

QnnGraph_getProperty(graphHandle, graphPropPtrs);
```

### [Allocating and Configuring Scratch Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id9)

```c
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;        // memory type of the buffer, chosen by the caller
lpaiGraphMem.size    = scratchSize;    // size value obtained from the scratch memory query
lpaiGraphMem.addr    = scratchBuffer;  // aligned buffer allocated by the caller

QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM;
customGraphCfg.config = &lpaiGraphMem;

QnnGraph_Config_t graphConfig;
graphConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

// NULL-terminated array of config pointers
QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;

QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
```

## [Persistent Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id10)

Persistent memory stores intermediate tensors that **cannot be overwritten**, because
they must persist across operations. Examples include RNN state tensors.

### [Key Properties](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id11)

- Holds long-lived intermediate data.
- User must allocate memory after querying required size.
- Must follow backend alignment constraints.
- Must remain valid until `QnnContext_free()`.

### [Querying Persistent Memory Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id12)

```c
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option   = QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE;
// property must point to storage that receives the required size;
// persistentSize is used as a pointer to that storage here
customGraphProp.property = persistentSize;

QnnGraph_Property_t graphProp;
graphProp.option         = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

// NULL-terminated array of property pointers
QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;

QnnGraph_getProperty(graphHandle, graphPropPtrs);
```

### [Allocating and Configuring Persistent Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id13)

```c
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;           // memory type of the buffer, chosen by the caller
lpaiGraphMem.size    = persistentSize;    // size value obtained from the persistent memory query
lpaiGraphMem.addr    = persistentBuffer;  // aligned buffer allocated by the caller

QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM;
customGraphCfg.config = &lpaiGraphMem;

QnnGraph_Config_t graphConfig;
graphConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

// NULL-terminated array of config pointers
QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;

QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
```

## [IO Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id14)

IO memory contains all graph input and output tensors.

### [Key Properties](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id15)

- Can be user-provided; otherwise, IO tensors are mapped into scratch memory by default.
- User-provided IO buffers must follow alignment requirements.
- Must remain valid during graph execution.

### [Querying IO Memory Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id16)

```c
// QnnSystemInterface is defined in ${QNN_SDK_ROOT}/include/QNN/System/QnnSystemInterface.h
QnnSystemInterface qnnSystemInterface;

// Initialize the QNN system interface (omitted here).
// See the ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code.

// Extract QNN binaryInfo from the context binary
const QnnSystemContext_BinaryInfo_t* binaryInfo;
Qnn_ContextBinarySize_t binaryInfoSize;
qnnSystemInterface.systemContextGetBinaryInfo(qnnSystemCtxHandle,
                                              contextBinaryBuffer,
                                              contextBinaryBufferSize,
                                              &binaryInfo,
                                              &binaryInfoSize);

// Extract graph info from QNN binaryInfo, assuming only one graph in the context
QnnSystemContext_GraphInfo_t* graphInfos = binaryInfo->contextBinaryInfoV1.graphs;
QnnSystemContext_GraphInfo_t* graphInfo  = &(graphInfos[0]);

// Extract tensor info from graphInfo
Qnn_Tensor_t* inputs     = graphInfo->graphInfoV1.graphInputs;
Qnn_Tensor_t* outputs    = graphInfo->graphInfoV1.graphOutputs;
size_t numInputs         = graphInfo->graphInfoV1.numGraphInputs;
size_t numOutputs        = graphInfo->graphInfoV1.numGraphOutputs;
```

### [Allocating and Configuring IO Memory](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id17)

```c
// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
Qnn_Tensor_t tensors[numTensors];

// Retrieve buffer start address and size alignment requirements
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code
size_t startAddrAlignment, sizeAlignment;

// calculate_tensor_size() and allocate_aligned_memory() are
// application-defined helpers, not QNN SDK functions
for (uint32_t i = 0; i < numTensors; i++) {
   Qnn_Tensor_t* tensor = &tensors[i];
   tensor->v1.memType   = QNN_TENSORMEMTYPE_RAW;
   uint32_t dataSize    = calculate_tensor_size(tensor->v1);
   tensor->v1.clientBuf.data =
      allocate_aligned_memory(startAddrAlignment, sizeAlignment, dataSize);
   tensor->v1.clientBuf.dataSize = dataSize;
}
```
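The snippet above calls `calculate_tensor_size()` without defining it. One plausible shape for such a helper is sketched below, under the assumption that the tensor is dense and that its byte size is the product of its dimensions and its element width. `tensor_size_bytes` is a hypothetical name, and it takes plain arguments so that it compiles without the QNN headers; in an application the rank, dimensions, and element width would be read from the tensor descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte size of a dense tensor.
 * dims            - array of `rank` dimension extents
 * bytesPerElement - element width derived from the tensor data type
 * Returns 0 if dims is NULL, rank is 0, or the result does not fit in
 * uint32_t (the width assumed here for the buffer size field). */
static uint32_t tensor_size_bytes(const uint32_t *dims,
                                  uint32_t rank,
                                  uint32_t bytesPerElement) {
  /* A zero element width indicates an unmapped data type in the caller. */
  assert(bytesPerElement != 0);
  if (dims == NULL || rank == 0) {
    return 0;
  }
  uint64_t total = bytesPerElement;
  for (uint32_t i = 0; i < rank; i++) {
    total *= dims[i];
    if (total > UINT32_MAX) {
      return 0; /* overflow: caller should treat this as an error */
    }
  }
  return (uint32_t)total;
}
```

For example, a `1 x 224 x 224 x 3` tensor with one byte per element comes to 150,528 bytes before any size-alignment padding is applied.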

### [Shared Buffers in the LPAI Backend](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id18)

In the LPAI backend, *shared buffers* offer an efficient mechanism for moving data
between the host CPU and the LPAI accelerator without requiring additional memory
copies. Shared buffers allow both domains to reference the same underlying memory,
enabling:

- **Zero-copy tensor transfers**
- **Reduced latency during graph execution**
- **Avoidance of redundant CPU-to-accelerator buffer duplication**
- **Improved overall memory efficiency**

Shared buffers are especially valuable when frequently updating input tensors or
retrieving output tensors at high frame rates.

The following tutorial explains how to register and use shared buffers within the
LPAI backend, covering the required API calls and expected memory constraints:

- [Allocate and Use Shared Buffers](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend.html#qnn-lpai-shared-buffer-tutorial)

## [Memory Lifetime and Allocation Requirements](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id19)

- Scratch and persistent memory must be allocated and provided before
`QnnGraph_finalize()`.
- Persistent memory must remain accessible for the entire lifetime of the LPAI context.
- Scratch memory may be replaced dynamically but must always exist.
- IO memory must remain valid throughout execution.

## [Recommended Workflow](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id20)

1. Query backend alignment requirements.
2. Query scratch memory size.
3. Query persistent memory size.
4. Allocate aligned memory buffers.
5. Pass scratch and persistent memory to the graph using `QnnGraph_setConfig()`.
6. Call `QnnGraph_finalize()`.
7. Optionally provide user-defined IO buffers.
8. Execute the graph.

## [TCM Memory Support in LPAIBackendExtensions (ADSP Direct Mode)](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id21)

The `LPAIBackendExtensions` library supports allocating I/O tensor buffers and the
model binary in **Tightly Coupled Memory (TCM)** when running in
[ADSP Direct Mode](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_execution_direct_tutorial.html#qnn-lpai-direct-mode-backend-type). TCM is fast on-chip
memory with lower access latency than main system memory (DDR).

By default, buffers are allocated in DDR. When TCM is selected, allocations are
served from a fixed on-chip memory pool of up to **2,035,712 bytes** (~1.94 MB).

**Note:** TCM memory support is only available in ADSP Direct Mode, with a maximum pool size
of **2,035,712 bytes** (~1.94 MB). It is not supported on ARM or x86 simulation
platforms.

The following `mem_type` values are supported for I/O tensor buffers and the model binary:

| `mem_type` value | Description |
| --- | --- |
| `"ddr"` | **(Default)** Allocates buffers in main system memory (DDR). |
| `"tcm"` | Allocates buffers in fast on-chip memory (TCM). Lowest latency; limited to ~1.94 MB total. |

### [Allocation Details](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id22)

- **I/O tensor buffers** and **model binary**: allocated in the memory type specified
by `mem_type`.
- **Scratch and persistent memory**: always allocated in DDR, regardless of the
configured `mem_type`.
- Memory type cannot be mixed: I/O tensor buffers and the model binary always use
the same type.

### [Configuring TCM Memory via JSON](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id23)

Set `mem_type` to `"tcm"` in the `lpai_graph/execute` section of
`eaiParams_direct.conf`:

```json
{
  "lpai_graph": {
    "execute": {
      "mem_type": "tcm"
    }
  }
}
```

Refer to [QNN LPAI Backend Configuration Parameters](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_setup_configuration.html#qnn-lpai-configuration-parameters)
for the full list of supported configuration keys.

### [Invalid `mem_type` Configuration](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id24)

If an unrecognized value is provided for `mem_type`, the config parser logs:

```text
Invalid memory type
```

and resolves the type to `QNN_LPAI_MEM_TYPE_UNDEFINED`. When the allocator
encounters this undefined type, it logs:

```text
Memory type only supports DDR & TCM
```

and returns `NULL`, causing tensor buffer or model binary allocation to fail and
triggering an initialization error. Ensure `mem_type` is set to `"ddr"` or
`"tcm"`.
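Because an unrecognized value only surfaces as an allocation failure during initialization, an application that generates the configuration file may prefer to validate the string first. The sketch below illustrates that idea only: `MemTypeSketch` and `parse_mem_type` are hypothetical names rather than SDK symbols, and exact lowercase matching is an assumption, since this page does not state whether the real parser is case-sensitive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical application-side mirror of the documented memory types. */
typedef enum {
  MEM_TYPE_SKETCH_UNDEFINED = 0, /* unrecognized value */
  MEM_TYPE_SKETCH_DDR,           /* "ddr", the documented default */
  MEM_TYPE_SKETCH_TCM            /* "tcm" */
} MemTypeSketch;

/* Map a mem_type string to the sketch enum.
 * A NULL value models an absent key and resolves to the DDR default. */
static MemTypeSketch parse_mem_type(const char *value) {
  if (value == NULL) {
    return MEM_TYPE_SKETCH_DDR;
  }
  if (strcmp(value, "ddr") == 0) {
    return MEM_TYPE_SKETCH_DDR;
  }
  if (strcmp(value, "tcm") == 0) {
    return MEM_TYPE_SKETCH_TCM;
  }
  return MEM_TYPE_SKETCH_UNDEFINED;
}
```

Rejecting the undefined result before writing the configuration file lets the application report the problem at the point where the value was chosen.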

### [Testing TCM Memory Support](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend_memory_allocations.html#id25)

#### Prerequisites

- A device with Hexagon ADSP.
- QNN SDK version `v6` or later.
- An offline-prepared LPAI model binary (`*.serialized.bin`).
- Total required TCM memory less than **2,035,712 bytes** (~1.94 MB).
- The required libraries and binaries. Refer to the LPAI entry in the
[Available QNN SDK Backend libraries](https://docs.qualcomm.com/doc/80-63442-10/topic/backend.html#qnn-sdk-backends-table) section.

#### Prepare the TCM Configuration File

Create `eaiParams_direct.conf` with TCM enabled:

```json
{
  "lpai_graph": {
    "execute": {
      "mem_type": "tcm"
    }
  }
}
```

Create `eaiParams_direct.json` for `qnn-net-run`:

```json
{
  "backend_extensions": {
    "shared_library_path": "/data/local/tmp/libQnnLpaiNetRunExtensions.so",
    "config_file_path": "/data/local/tmp/eaiParams_direct.conf"
  },
  "context_configs": {
    "is_persistent_binary": true
  }
}
```

#### Run the Model

```shell
$ ./qnn-net-run \
    --input_list model/input_list.txt \
    --backend /data/local/tmp/libQnnLpai.so \
    --direct_mode adsp \
    --config_file model/eaiParams_direct.json \
    --retrieve_context model/tmp.bin
```

#### Verify TCM Allocation

With `--log_level debug` or higher, the following `QNN_INFO` messages confirm
successful pool registration and buffer allocation:

```text
TCM pool current size: <current> max_size: <max>
Allocate size <N> from tcm pool 0
```

The first message is emitted once when the TCM pool is registered on the initial
allocation. The second is printed for each subsequent buffer allocation.

If a TCM allocation fails, a `QNN_ERROR` message is emitted:

```text
TCM memory allocation failure: required size: <N>
```

#### Limitations and Error Behavior

- TCM is only available in **ADSP Direct Mode**; ARM and x86 simulation modes are
not supported.
- Scratch and persistent memory are always allocated in DDR.
- I/O tensor buffers and the model binary always use the same memory type; mixed
usage across memory types is not supported.
- The TCM pool maximum size is **2,035,712 bytes** (~1.94 MB); there is no
dynamic resizing. If the combined size of all I/O tensor buffers and the model
binary exceeds this limit, the allocator returns `NULL` and emits
`TCM memory allocation failure: required size: <N>`, causing an initialization
failure. Reduce model or tensor buffer sizes, or switch `mem_type` to `"ddr"`.
- The TCM allocator supports a maximum of **32** individual buffer allocations.
Exceeding this limit causes a system fatal error. Reduce the number of individual
tensor buffer allocations, or switch `mem_type` to `"ddr"`.
- In all failure cases, there is no automatic fallback to DDR; the application must
handle the failure explicitly.
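Since there is no fallback to DDR, it can be useful to estimate up front whether a model is likely to fit in TCM. The following is a rough preflight sketch rather than a guarantee: `fits_in_tcm` is a hypothetical helper, the two limits are taken from this page, and it assumes that the model binary counts as one of the 32 allocations and that the allocator adds no padding or bookkeeping overhead, neither of which is stated here.

```c
#include <stdbool.h>
#include <stddef.h>

#define TCM_POOL_MAX_BYTES  2035712u /* pool limit quoted on this page */
#define TCM_MAX_ALLOCATIONS 32u      /* allocation count limit quoted on this page */

/* Hypothetical preflight check: returns true if the model binary plus all
 * I/O tensor buffers stay within the documented TCM limits.
 * Assumption: the model binary consumes one allocation slot. */
static bool fits_in_tcm(const size_t *bufferSizes,
                        size_t numBuffers,
                        size_t modelBinarySize) {
  /* numBuffers I/O buffers plus one model binary must not exceed the limit */
  if (numBuffers >= TCM_MAX_ALLOCATIONS) {
    return false;
  }
  if (modelBinarySize > TCM_POOL_MAX_BYTES) {
    return false;
  }
  size_t total = modelBinarySize;
  for (size_t i = 0; i < numBuffers; i++) {
    /* Compare against the remaining space to avoid overflowing total */
    if (bufferSizes[i] > TCM_POOL_MAX_BYTES - total) {
      return false;
    }
    total += bufferSizes[i];
  }
  return true;
}
```

If the check fails, the application can write `"ddr"` into the configuration instead of `"tcm"`. A passing result still needs to be confirmed on the device using the log messages described above.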

Last Published: Jun 04, 2026