# Multi-SoC DLC with Reference Weight Sharing

- [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#overview)

    - [Key Terms](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#key-terms)
- [Reference Weight Sharing](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#reference-weight-sharing)

    - [Limitations](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#limitations)
- [API and Configuration Reference](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#api-and-configuration-reference)

    - [QNN_HTP_CONTEXT_CONFIG_OPTION_REFERENCE_WEIGHT_SHARING_ENABLED](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#qnn-htp-context-config-option-reference-weight-sharing-enabled)
    - [QnnContext_addToDlc()](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#qnncontext-addtodlc)
- [Generating a Multi-SoC DLC](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#generating-a-multi-soc-dlc)

    - [Step 1: Create Per-SoC Backend Config Files](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#step-1-create-per-soc-backend-config-files)
    - [Step 2: Create the Main Config File](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#step-2-create-the-main-config-file)
    - [Step 3: Run qnn-context-binary-generator](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#step-3-run-qnn-context-binary-generator)
- [Loading a Multi-SoC DLC at Runtime](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#loading-a-multi-soc-dlc-at-runtime)

## [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id1)

A context binary prepared through the standard offline workflow is tied to a single SoC.
This limits deployment flexibility for clients who need to support multiple target platforms
from a single artifact.

This feature enables clients to prepare context binaries for multiple SoCs in a single
offline pass on x86. All resulting context binaries are embedded into a single
Deep Learning Container (DLC), producing one portable artifact that can be deployed across
different target platforms. At runtime, the HTP backend automatically detects the DLC
format and loads the context binary that is compatible with the target SoC.

To minimize ROM size, reference weight sharing can optionally be enabled. When active,
the weights from the first prepared context binary serve as a shared reference for all
subsequent context binaries, eliminating weight duplication across SoCs.

The DLC also supports incremental extension — new context binaries prepared with a newer
SDK or for additional SoCs can be appended to an existing DLC at any time.

Note

1. Multi-SoC context binary generation and reference weight sharing are supported
exclusively through the DLC workflow. The traditional context binary format
supports only a single SoC per binary and doesn’t support these capabilities.
2. The DLC serialization workflow is not limited to multi-SoC use cases — it can also
be used to serialize a single SoC context binary into a DLC.
3. When targeting multiple SoCs, if the source model is quantized (e.g., using AIMET),
it should be quantized with a common denominator architecture — one that is forward
compatible with all target architectures. This ensures the same quantized model can
be used to prepare context binaries for the targeted SoCs.
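
One simple way to choose the architecture to quantize for is to take the lowest HTP architecture version among the targets. The sketch below assumes that the lowest version is the common denominator, which this page does not state explicitly, so treat it as an illustration only. `lowest_dsp_arch` is a hypothetical helper, and the `"v79"`-style strings follow the `dsp_arch` values used in the backend config files on this page.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a dsp_arch string such as "v79" into its numeric part (79).
 * Returns -1 if the string is not 'v' followed only by digits. */
static int parse_dsp_arch(const char *arch) {
    assert(arch != NULL);
    if (arch[0] != 'v' || !isdigit((unsigned char)arch[1])) return -1;
    char *end = NULL;
    long value = strtol(arch + 1, &end, 10);
    if (*end != '\0' || value <= 0 || value > 100000) return -1;
    return (int)value;
}

/* Return the lowest architecture version in the list.
 * Returns -1 if the list is empty or any entry is malformed.
 * Assumption: the lowest version is the common denominator for quantization. */
int lowest_dsp_arch(const char *const *archs, size_t count) {
    assert(archs != NULL);
    int lowest = -1;
    for (size_t i = 0; i < count; ++i) {
        int version = parse_dsp_arch(archs[i]);
        if (version < 0) return -1;
        if (lowest < 0 || version < lowest) lowest = version;
    }
    return lowest;
}
```

Called with the two architectures from the example configs, `"v79"` and `"v81"`, the helper returns 79.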

Note

`libQnnSystem.so` (or `QnnSystem.dll` on Windows) is required only during offline
context binary preparation. Both the client and the HTP backend dynamically load this
library to read and write DLC records at preparation time. This library is not required
on the target device for inference.

### [Key Terms](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id2)

- **DLC (Deep Learning Container)**

    The Qualcomm representation of a neural network, packaging its topology, weights, and
parameters into a single portable file. A DLC is produced by QAIRT converters and
consumed by QAIRT SDKs. In the context of this feature, a DLC also serves as the
container for one or more SoC-specific context binaries, along with their associated
shared weight blobs and metadata, all stored as separate records within the file.
- **Reference Weight Sharing**

    An optimization where the weight blob from the first prepared context binary is stored
once in the DLC as a shared reference. All subsequent context binaries reference this
shared blob rather than storing their own copy, reducing overall DLC ROM size.
- **Multi-SoC DLC**

    A DLC containing context binaries for more than one target SoC. It can be extended at
any time by appending context binaries prepared for additional SoCs or with newer SDKs.

## [Reference Weight Sharing](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id3)

Without reference weight sharing, each SoC’s context binary carries its own independent
weight blob. A DLC targeting N SoCs therefore contains N weight blobs. With reference
weight sharing, a single shared weight blob is stored in the DLC and referenced by all
subsequent SoC context binaries, significantly reducing overall ROM size.

If less than 100% of the weights can be shared, the unshared portion still contributes
proportionally to the DLC’s ROM size, regardless of DLC stripping.
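
As a rough illustration of the size argument, the following sketch estimates the total weight footprint for N SoCs. It assumes every context binary has the same weight size and that the same fraction of weights is shared by each non-reference binary. Neither assumption comes from the SDK, the function names are hypothetical, and the model ignores non-weight data and DLC metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Without sharing: each of the N context binaries carries its own weight blob. */
double weights_without_sharing(size_t num_socs, double weight_bytes) {
    return (double)num_socs * weight_bytes;
}

/* With sharing: one shared blob from the first context binary, plus the
 * unshared remainder for each subsequent one. shared_fraction is in [0, 1]. */
double weights_with_sharing(size_t num_socs, double weight_bytes,
                            double shared_fraction) {
    assert(shared_fraction >= 0.0 && shared_fraction <= 1.0);
    if (num_socs == 0) return 0.0;
    double unshared = (1.0 - shared_fraction) * weight_bytes;
    return weight_bytes + (double)(num_socs - 1) * unshared;
}
```

Under this model, three SoCs with 100 MB of weights each come to 100 MB with full sharing instead of 300 MB, and to 200 MB when only half of the weights can be shared.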

For optimal weight sharing, prepare the same set of graphs for all target SoCs.
Preparing a different set of graphs across SoCs reduces the effectiveness of weight sharing.

### [Limitations](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id4)

| Scenario | Behavior |
| --- | --- |
| 64-bit uDMA context binaries | Fully supported. Far and near shared blobs from the first context serve as the shared reference for subsequent context binaries. |
| Mixed uDMA and non-uDMA context binaries | Not optimal. Weight sharing effectiveness may be reduced, which can increase ROM size compared to the fully shared case. |
| DLBC (Deep Learning Bandwidth Compression) | Not supported. When DLBC is enabled for a context, reference weight sharing is not applied for that context. |
| Multicore binaries | Not supported across multicore binaries, or between single-core and multicore binaries. |
| LoRA weight sharing | Not supported alongside reference weight sharing. |
| One context binary per SoC | A DLC can contain only one context binary per SoC. Calling `QnnContext_addToDlc()` for a SoC that already has a binary in the DLC will fail. |
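
Because a second `QnnContext_addToDlc()` call for the same SoC fails, a custom workflow may want to catch duplicates before preparing a context at all. The following is a minimal client-side sketch, assuming the client identifies SoCs by the numeric `soc_id` from its backend config. `soc_tracker` and `soc_tracker_claim` are hypothetical names, not part of the SDK, and the tracker only knows about SoCs claimed in the current run, not binaries already present in an existing DLC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_SOCS 16

/* Client-side record of SoC IDs already added to one DLC (hypothetical). */
typedef struct {
    uint32_t soc_ids[MAX_TRACKED_SOCS];
    size_t count;
} soc_tracker;

/* Record soc_id if it is new. Returns false if the SoC was already claimed
 * or the tracker is full, in which case the caller should skip the add. */
bool soc_tracker_claim(soc_tracker *tracker, uint32_t soc_id) {
    assert(tracker != NULL);
    for (size_t i = 0; i < tracker->count; ++i) {
        if (tracker->soc_ids[i] == soc_id) return false;
    }
    if (tracker->count == MAX_TRACKED_SOCS) return false;
    tracker->soc_ids[tracker->count++] = soc_id;
    return true;
}
```

A zero-initialized `soc_tracker` starts empty; claiming the same `soc_id` twice returns `false` on the second call.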

## [API and Configuration Reference](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id5)

### [QNN_HTP_CONTEXT_CONFIG_OPTION_REFERENCE_WEIGHT_SHARING_ENABLED](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id6)

This HTP context configuration option enables reference weight sharing across
context binaries embedded in a DLC. It is disabled by default and applies only
to the DLC workflow.

The corresponding field in `QnnHtpContext_CustomConfig_t` is `referenceWeightSharingEnabled`.

    QnnHtpContext_CustomConfig_t customConfig = QNN_HTP_CONTEXT_CUSTOM_CONFIG_INIT;
    customConfig.option                       = QNN_HTP_CONTEXT_CONFIG_OPTION_REFERENCE_WEIGHT_SHARING_ENABLED;
    customConfig.referenceWeightSharingEnabled = true;

    QnnContext_Config_t config = QNN_CONTEXT_CONFIG_INIT;
    config.option       = QNN_CONTEXT_CONFIG_OPTION_CUSTOM;
    config.customConfig = &customConfig;

    const QnnContext_Config_t *configs[] = {&config, NULL};

When this option is enabled:

- If the DLC doesn’t yet contain a shared weight blob, the backend generates one
along with associated metadata and stores them as separate records in the DLC.
- If the DLC already contains a shared weight blob, the backend reads and reuses it
to deduplicate weights for the new context binary being appended.

### [QnnContext_addToDlc()](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id7)

This context-level API serializes a finalized context binary into a DLC. It must be
called after all graphs in the context have been finalized. `qnn-context-binary-generator`
uses it internally as part of the DLC workflow, and clients can also call it directly
in custom workflows.

    Qnn_ErrorHandle_t QnnContext_addToDlc(Qnn_ContextHandle_t   context,
                                          QnnSystemDlc_Handle_t dlcHandle);

**Parameters**

- `context` — Handle to a finalized context whose binary is to be saved.
- `dlcHandle` — Handle to an existing DLC, created via `QnnSystemDlc_createFromFile()`
or `QnnSystemDlc_createFromBinary()`.

**Return codes**

The following are common return codes. Other error codes may be returned depending on
internal state and backend implementation.

| Error Code | Description |
| --- | --- |
| `QNN_SUCCESS` | Context binary successfully written to the DLC. |
| `QNN_CONTEXT_ERROR_UNSUPPORTED_FEATURE` | Feature is not supported by the backend. |
| `QNN_CONTEXT_ERROR_INVALID_HANDLE` | `context` is NULL, not registered, or otherwise invalid; or `dlcHandle` is NULL or not a valid DLC handle. |
| `QNN_COMMON_ERROR_OPERATION_NOT_PERMITTED` | `context` was created via `QnnContext_createFromBinary()` or a related deserialization API. Serializing a deserialized context is not permitted. |
| `QNN_CONTEXT_ERROR_GET_BINARY_FAILED` | Serialization of the context binary failed internally. |
| `QNN_COMMON_ERROR_GENERAL` | `QnnSystem` library could not be loaded or initialized. Ensure `libQnnSystem.so` (or `QnnSystem.dll` on Windows) is accessible at preparation time. |

After calling `QnnContext_addToDlc()`, the client serializes the DLC to the output
location using `QnnSystemDlc_save()` and releases the handle with `QnnSystemDlc_free()`.
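
The call order described here (add, then save, then free) can be sketched as a small driver that always releases the handle, even when an earlier step fails. The callback type below is hypothetical and is NOT the SDK signature of any of these functions; it only stands in for wrappers around `QnnContext_addToDlc()`, `QnnSystemDlc_save()`, and `QnnSystemDlc_free()`. Skipping the save after a failed add is a design choice of this sketch, not documented SDK behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback shape standing in for a wrapped SDK call.
 * It returns true on success; user carries whatever state the wrapper needs. */
typedef bool (*dlc_step_fn)(void *user);

typedef struct {
    dlc_step_fn add_to_dlc; /* would wrap QnnContext_addToDlc() */
    dlc_step_fn save;       /* would wrap QnnSystemDlc_save() */
    dlc_step_fn release;    /* would wrap QnnSystemDlc_free() */
} dlc_steps;

/* Run add -> save, then always release the handle.
 * Returns true only if every step that ran succeeded. */
bool serialize_to_dlc(const dlc_steps *steps, void *user) {
    assert(steps && steps->add_to_dlc && steps->save && steps->release);
    bool ok = steps->add_to_dlc(user);
    if (ok) ok = steps->save(user);       /* do not save a DLC after a failed add */
    bool released = steps->release(user); /* release regardless of outcome */
    return ok && released;
}

/* Example callbacks for experimentation: each bumps an int call counter. */
bool step_succeeds(void *user) { ++*(int *)user; return true; }
bool step_fails(void *user) { ++*(int *)user; return false; }
```

In a real client, each callback would wrap the corresponding SDK call and map a `QNN_SUCCESS` result to `true`.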

## [Generating a Multi-SoC DLC](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id8)

The `qnn-context-binary-generator` tool supports multi-SoC DLC generation in a single
invocation. The tool iterates over each per-SoC config, prepares the corresponding context
binary, and appends it to the output DLC. The output DLC retains the original source model
and accumulates all prepared context binaries.

### [Step 1: Create Per-SoC Backend Config Files](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id9)

Create one backend extension config file per target SoC. Set
`reference_weight_sharing_enabled` to `true` in the `context` section of each file.

**soc1\_config.json**

    {
        "graphs": [
            {
                "graph_names": ["qnn_model"],
                ...
            }
        ],
        "devices": [
            {
                ...
                "dsp_arch": "v79",
                "soc_id": 69
            }
        ],
        "context": {
            "reference_weight_sharing_enabled": true
        }
    }

**soc2\_config.json**

    {
        "graphs": [
            {
                "graph_names": ["qnn_model"],
                ...
            }
        ],
        "devices": [
            {
                ...
                "dsp_arch": "v81",
                "soc_id": 87
            }
        ],
        "context": {
            "reference_weight_sharing_enabled": true
        }
    }

### [Step 2: Create the Main Config File](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id10)

Create a top-level config file that points to each per-SoC config via the
`per_soc_config_file_path` array under `backend_extensions`. The tool processes the
SoC configs in the order they are listed. The first entry becomes the reference context
binary for weight sharing.

**multi\_soc\_config.json**

    {
        "backend_extensions": {
            "shared_library_path": "lib/x86_64-linux-clang/libQnnHtpNetRunExtensions.so",
            "per_soc_config_file_path": [
                "soc1_config.json",
                "soc2_config.json"
            ]
        }
    }
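
When the list of target SoCs comes from a build script, the main config can be generated rather than written by hand. The following is a minimal sketch in C. `write_main_config` is a hypothetical helper, not an SDK function. It emits only the two keys shown in this step, preserves the order of the entries (so the first path is the reference context binary), and rejects paths that would need JSON escaping rather than escaping them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if the string contains a quote or backslash, which this sketch
 * does not escape. */
static int needs_escaping(const char *s) {
    return strpbrk(s, "\"\\") != NULL;
}

/* Write a main config listing the per-SoC config paths, in order, into out.
 * Returns the number of characters written (excluding the terminator), or
 * -1 if out is too small or a path needs JSON escaping. */
int write_main_config(char *out, size_t out_size, const char *shared_library_path,
                      const char *const *soc_configs, size_t count) {
    assert(out != NULL && shared_library_path != NULL && soc_configs != NULL);
    if (needs_escaping(shared_library_path)) return -1;

    int n = snprintf(out, out_size,
                     "{\n  \"backend_extensions\": {\n"
                     "    \"shared_library_path\": \"%s\",\n"
                     "    \"per_soc_config_file_path\": [",
                     shared_library_path);
    if (n < 0 || (size_t)n >= out_size) return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; ++i) {
        if (needs_escaping(soc_configs[i])) return -1;
        n = snprintf(out + used, out_size - used, "%s\n      \"%s\"",
                     i == 0 ? "" : ",", soc_configs[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, out_size - used, "\n    ]\n  }\n}\n");
    if (n < 0 || (size_t)n >= out_size - used) return -1;
    return (int)(used + (size_t)n);
}
```

The caller supplies the output buffer and writes the resulting string to `multi_soc_config.json` with ordinary file I/O.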

Note

`per_soc_config_file_path` and the `--htp_socs` command-line flag are mutually
exclusive. Use `per_soc_config_file_path` when per-SoC configuration is required,
such as enabling reference weight sharing.

Alternatively, reference weight sharing can be enabled for all contexts via the
`--reference_weight_sharing_enabled_override` flag, which takes precedence over the
per-SoC config files:

    --reference_weight_sharing_enabled_override true
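
The precedence rule can be written down as a tiny decision function. This is a sketch of the rule as stated here, not the tool's implementation. The names are hypothetical, and the assumption that passing `false` to the flag also overrides a per-SoC value of `true` is not confirmed by this page.

```c
#include <assert.h>
#include <stdbool.h>

/* State of --reference_weight_sharing_enabled_override on the command line. */
typedef enum { OVERRIDE_UNSET, OVERRIDE_FALSE, OVERRIDE_TRUE } override_state;

/* The override, when given, wins over the value from the per-SoC config file.
 * Assumption: an explicit false override also disables a per-SoC true. */
bool effective_weight_sharing(override_state override, bool per_soc_value) {
    assert(override == OVERRIDE_UNSET || override == OVERRIDE_FALSE ||
           override == OVERRIDE_TRUE);
    if (override == OVERRIDE_UNSET) return per_soc_value;
    return override == OVERRIDE_TRUE;
}
```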

### [Step 3: Run qnn-context-binary-generator](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id11)

Provide the input DLC (converter output), the main config file, and the output DLC path.
The tool prepares a context binary for each SoC listed in the config and appends it to
the output DLC.

    ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
        --backend    ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
        --model      ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
        --dlc_path   model.dlc \
        --output_dlc model_multi_soc.dlc \
        --output_dir output/ \
        --config_file multi_soc_config.json \
        --log_level  warn

The resulting `model_multi_soc.dlc` contains:

- The QNN source model (QNN graph and weights from the converter)
- A context binary for each target SoC
- Weights shared across context binaries (when reference weight sharing is enabled)

The `qairt-dlc-info` tool can be used to inspect the contents of the generated DLC,
including all embedded context binaries and their associated metadata. This is useful
for verifying the output before proceeding to deployment.

    qairt-dlc-info -i model_multi_soc.dlc

Note

1. See the `qairt-dlc-info` tool documentation for the full list of available
options and output details.
2. The `qnn-context-binary-utility` tool doesn’t support DLC inspection. Use
`qairt-dlc-info` to inspect the contents of a DLC.

## [Loading a Multi-SoC DLC at Runtime](https://docs.qualcomm.com/doc/80-63442-10/topic/htp_multi_soc_dlc.html#id12)

The HTP backend is DLC-aware at runtime. A multi-SoC DLC can be passed directly to any
of the standard context loading APIs in the same way as a traditional context binary.
The backend automatically detects the DLC format, selects the context binary
compatible with the target SoC, and loads it. No changes to the loading call are required.

The following APIs all support DLC input transparently:

- `QnnContext_createFromBinary()`
- `QnnContext_createFromBinaryWithSignal()`
- `QnnContext_createFromBinaryWithCallback()`
- `QnnContext_createFromBinaryListAsync()`

**Example**

    // Load the DLC buffer (same as loading a traditional context binary)
    void  *dlcBuffer     = /* pointer to DLC data */;
    size_t dlcBufferSize = /* size of DLC data in bytes */;

    Qnn_ContextHandle_t context = NULL;
    Qnn_ErrorHandle_t error = QnnContext_createFromBinary(
        backendHandle,
        deviceHandle,
        contextConfigs,
        dlcBuffer,
        dlcBufferSize,
        &context,
        profileHandle);
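
The example leaves open how the DLC data gets into memory. One straightforward option, assuming the DLC is a regular file small enough to hold in RAM, is to read it whole with the C standard library. `read_file_to_buffer` is a hypothetical helper and not part of the SDK.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a heap buffer. On success, returns the buffer and
 * stores its length in *size_out; the caller must free() it. Returns NULL if
 * the file cannot be opened or read, or if it is empty. */
void *read_file_to_buffer(const char *path, size_t *size_out) {
    assert(path != NULL && size_out != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    void *buffer = NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long length = ftell(fp);
        if (length > 0 && fseek(fp, 0, SEEK_SET) == 0) {
            buffer = malloc((size_t)length);
            /* Discard the buffer on a short read. */
            if (buffer != NULL &&
                fread(buffer, 1, (size_t)length, fp) != (size_t)length) {
                free(buffer);
                buffer = NULL;
            }
            if (buffer != NULL) *size_out = (size_t)length;
        }
    }
    fclose(fp);
    return buffer;
}
```

The returned pointer and size can then be passed as `dlcBuffer` and `dlcBufferSize`. Whether the buffer may be freed immediately after the context is created is not covered on this page, so keep it alive until that is confirmed for your SDK version.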

Last Published: Jun 04, 2026
