# Develop an AI application using QAIRT C++ APIs

Note

Support for this section on Ubuntu will be available soon.

The Qualcomm AI Runtime (QAIRT) SDK provides C++ APIs for sample
application development. Samples are available for both Qualcomm AI Engine Direct (QNN)
and Qualcomm Neural Processing Engine SDK (SNPE). The samples help you begin application
development. The following instructions describe how to build, run, and navigate the
source code. They demonstrate the workflow for utilizing QNN or SNPE APIs to run a model.

## Build and run the QNN sample app

The `qnn-sample-app` is located at `${QNN_SDK_ROOT}/examples/QNN/SampleApp`,
where `QNN_SDK_ROOT` refers to the path where the QNN SDK has been extracted.

### Set up the QAIRT SDK

To set up the toolchain for the QNN sample app, do the following:

1. [Download the Qualcomm AI Runtime SDK](https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.43.0.260128/v2.43.0.260128.zip).
2. Extract and unzip the SDK.

        unzip v2.43.0.260128.zip
        cd qairt/2.43.0.260128
        export QNN_SDK_ROOT=`pwd`

3. Install the eSDK.

    Follow the [Qualcomm IM SDK quickstart](https://docs.qualcomm.com/doc/80-80022-51/topic/install-sdk.html#section-b5c-z3k-5bc) to install the eSDK, which contains the
required cross-compiler toolchain.

    - For Yocto Scarthgap devices, the libraries are compiled with GCC-11.2.
    - Set the `ESDK_PATH` environment variable to the eSDK installation path. Later steps use this path (`/path/to/extracted/toolchain`) for compilation.

            export ESDK_PATH="/path/to/extracted/toolchain"

### Build the QNN sample app

Follow these steps to build the QNN sample app.

1. Go to the sample app directory.

        cd ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/

2. Set the environment variable for the GCC toolchain.

        export QNN_AARCH64_LINUX_OE_GCC_112=$ESDK_PATH

3. Build the application.

        make CXX="$ESDK_PATH/tmp/sysroots/x86_64/usr/bin/aarch64-qcom-linux/aarch64-qcom-linux-g++ --sysroot=$ESDK_PATH/tmp/sysroots/qcs6490-rb3gen2-vision-kit/" all_linux_oe_aarch64_gcc112

    This creates two folders.

    - `bin`: Contains `qnn-sample-app` binaries for each platform within their respective directories.
    - `obj`: Contains all object files used in building and linking the executable.

### Run the QNN sample app on Linux (Yocto-based)

The built `qnn-sample-app` executable can run a model with any
QNN backend. For Yocto Scarthgap-based devices, backends are available
for `aarch64-oe-linux-gcc11.2`.

1. Push the artifacts to the target device.

        scp ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/bin/aarch64-oe-linux-gcc11.2/qnn-sample-app root@<ip-addr>:/etc/apps/qnn-sample-app

Note

Create the `/etc/apps/` directory if it doesn’t already exist on the device.
2. On the host computer, use [AI Hub](https://docs.qualcomm.com/doc/80-80022-15B/topic/ai-hub.html) to export a model.

    For example, to export the InceptionV3 QNN model, run the following commands:

        pip3 install qai-hub-models
        python -m qai_hub_models.models.inception_v3.export --quantize w8a8 --target-runtime=qnn_context_binary --device="Dragonwing RB3 Gen 2 Vision Kit" --compile-options="qairt_version 2.43" --profile-options "--qairt_version 2.43"

Note

Generate the context binary for the same SDK version in use on the target device.
3. Push the exported InceptionV3 QNN model to the target device.

    The exported model is saved under `export_assets/inception_v3-qnn_context_binary-w8a8-<CHIPSET>`. The following example uses `QCS6490` as the chipset.

        scp export_assets/inception_v3-qnn_context_binary-w8a8-qualcomm_qcs6490/inception_v3.bin root@<ip-addr>:/etc/apps/

    When prompted for the password, enter `oelinux123`.

4. On the host machine, generate a dummy input file to be used for inference and transfer it to the target device.

    1. Run the following commands in a Python environment.

            python3
            import numpy as np
            np.random.random((1, 3, 224, 224)).astype(np.float32).tofile("input.raw")

    2. Transfer the `input.raw` file to the target device:

            scp input.raw root@<IP_ADDRESS_OF_TARGET_DEVICE>:/etc/apps

5. From the host computer, SSH into the target device.

        ssh root@<IP_ADDRESS_OF_TARGET_DEVICE>
        cd /etc/apps

6. Create `input_list.txt`.

        echo "input.raw" > /etc/apps/input_list.txt

7. Run the app.

        chmod +x qnn-sample-app
        ./qnn-sample-app --retrieve_context inception_v3.bin \
                         --backend libQnnHtp.so \
                         --input_list input_list.txt \
                         --system_library libQnnSystem.so

Note

Update the model name and input list to match the selected model.

    For help, run:

        ./qnn-sample-app --help
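
Step 4 above creates the dummy input with NumPy. Where Python is not available on the host, a file with the same raw float32 layout could be written from C++. The following is a minimal sketch under that assumption: `writeRandomRawInput` is a hypothetical helper, not part of the SDK, and the element count simply mirrors the `(1, 3, 224, 224)` shape used in step 4.

```cpp
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper (not part of the QAIRT SDK).
// Writes `count` float32 values drawn uniformly between 0 and 1 to `path`
// in native byte order, with no header, like NumPy's tofile().
// Returns true if the file was written successfully.
bool writeRandomRawInput(const std::string& path, std::size_t count, unsigned seed = 0) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);

  std::vector<float> data(count);
  for (float& value : data) {
    value = dist(rng);
  }

  std::ofstream out(path, std::ios::binary);
  if (!out) {
    return false;
  }
  out.write(reinterpret_cast<const char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
  return static_cast<bool>(out);
}
```

Calling `writeRandomRawInput("input.raw", 1 * 3 * 224 * 224)` writes 150,528 floats, which is 602,112 bytes on disk.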

Command line arguments

- **Required arguments**
    - `--model`: Path to the QNN network model. Mutually exclusive with `--retrieve_context`.
    - `--retrieve_context`: Path to a cached binary for loading a saved context and execution graphs. Mutually exclusive with `--model`.
    - `--backend`: Path to a QNN backend to run the model.
    - `--input_list`: Path to a file listing network inputs. For multiple graphs, provide a comma-separated list of input files.

- **Optional arguments**
    - `--debug`: Save output from all network layers.
    - `--output_dir`: Directory for outputs (default: `./output`).
    - `--output_data_type`: Output data type (`float_only`, `native_only`, or `float_and_native`).
    - `--input_data_type`: Input data type (`float` or `native`).
    - `--op_packages`: Comma-separated list of op packages and interface providers.
    - `--profiling_level`: Profiling level (`basic` or `detailed`).
    - `--save_context`: Save backend context and graph metadata to a binary file.
    - `--num_inferences`: Number of inferences to perform.
    - `--log_level`: Maximum logging level (`error`, `warn`, `info`, or `verbose`).
    - `--system_library`: Path to `libQnnSystem.so` for reflection APIs during context loading.
    - `--version`: Print the QNN SDK version.
    - `--help`: Display the help message.
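
The rules for the required arguments can be checked before any library is loaded. The sketch below is one hypothetical way to express them; `validateRequiredArgs` is an illustrative name and is not the parser used by `qnn-sample-app`. It assumes the options have already been collected into a map from option name to value.

```cpp
#include <map>
#include <string>

// Hypothetical validator (not the sample app's own parser).
// `args` maps an option name such as "--model" to its value.
// Returns an empty string when the required options are consistent,
// otherwise a short description of the first problem found.
std::string validateRequiredArgs(const std::map<std::string, std::string>& args) {
  const bool hasModel   = args.count("--model") > 0;
  const bool hasContext = args.count("--retrieve_context") > 0;

  // The two options are mutually exclusive, and one of them is needed.
  if (hasModel == hasContext) {
    return "Specify exactly one of --model or --retrieve_context.";
  }
  if (args.count("--backend") == 0) {
    return "Missing --backend.";
  }
  if (args.count("--input_list") == 0) {
    return "Missing --input_list.";
  }
  return "";
}
```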

## Workflow and API usage

Use the following recommended pattern to develop C++ applications using QNN APIs.

1. [Load prerequisite shared libraries.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#loading-pre-requisite-shared-libraries)
2. [Use QNN APIs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#usage-of-qnn-apis)

    1. [Use QNN interface to obtain function pointers.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#use-qnn-interface-to-obtain-function-pointers)
    2. [Set up logging.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#set-up-logging)
    3. [Initialize backend.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#initialize-backend)
    4. [Initialize profiling.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#initialize-profiling)
    5. [Create device.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#create-device)
    6. [Register op packages.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#register-op-packages)
    7. [Create context.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#create-context)
    8. [Prepare graphs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#prepare-graphs)
    9. [Finalize graphs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#finalize-graphs)
    10. [Save context into a binary.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#save-context-into-a-binary)
    11. [Load context from a cached binary.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#load-context-from-a-cached-binary)
    12. [Run graphs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#execute-graphs)
    13. [Free context.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#free-context)
    14. [Terminate backend.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#terminate-backend)

### Load prerequisite shared libraries

The QNN SDK provides shared libraries for accessing backends.
Applications load the libraries they need at run time to run a network.

Create a network in QNN in one of the following ways.

- Build the network directly in your application using QNN APIs.
- Use QNN converters to produce a shared library of a QNN network.

`qnn-sample-app` uses the shared library option. This network can
be produced using one of the QNN converters available in the SDK, and
compiled into a shared library using `qnn-model-lib-generator`.

Note

For Windows users, replace all `.so` files with the analogous
`.dll` file in the following instructions. For more details, see
platform differences.

#### Loading a backend

Shared libraries for various backends, including CPU, GPU, HTP, and DSP,
are available in the QNN SDK. Every backend that implements QNN APIs
exposes the necessary symbols, which can be accessed through dynamic
loading.

Consider a sample backend shared library
named *libQnnSampleBackend.so*, which can be dynamically loaded as shown
below:

    void* libBackendHandle = pal::dynamicloading::dlOpen(
        "libQnnSampleBackend.so", pal::dynamicloading::DL_NOW | pal::dynamicloading::DL_LOCAL);

    if (nullptr == libBackendHandle) {
      QNN_ERROR("Unable to load backend. pal::dynamicloading::dlError(): %s",
                pal::dynamicloading::dlError());
      return StatusCode::FAIL_LOAD_BACKEND;
    }

To load a model as a shared library, let’s consider a sample model
shared library named *libQnnSampleModel.so*, which can be dynamically
loaded as shown below:

void* libModelHandle = pal::dynamicloading::dlOpen(
        "libQnnSampleModel.so", pal::dynamicloading::DL_NOW | pal::dynamicloading::DL_LOCAL);
    
    if (nullptr == libModelHandle) {
      QNN_ERROR("Unable to load model. pal::dynamicloading::dlError(): %s",
                pal::dynamicloading::dlError());
      return StatusCode::FAIL_LOAD_MODEL;
    }

Optionally, to create a context from a cached binary and execute graphs,
applications can make use of QnnSystem API to retrieve metadata
associated with the context. QnnSystem API can be accessed by loading
the *libQnnSystem.so* shared library as shown below:

void* systemLibraryHandle = pal::dynamicloading::dlOpen(
      "libQnnSystem.so", pal::dynamicloading::DL_NOW | pal::dynamicloading::DL_LOCAL);
    
    if (nullptr == systemLibraryHandle) {
      QNN_ERROR("Unable to load system library. pal::dynamicloading::dlError(): %s",
                pal::dynamicloading::dlError());
      return StatusCode::FAIL_LOAD_SYSTEM_LIB;
    }
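
The `pal::dynamicloading` calls above belong to the sample app's platform abstraction layer. On Linux, a comparable effect can be had with POSIX `dlopen` directly. The sketch below assumes that `DL_NOW` and `DL_LOCAL` correspond to `RTLD_NOW` and `RTLD_LOCAL`, which this page does not confirm. `LibraryHandle` and `loadLibrary` are hypothetical names; the smart pointer is a design choice that closes the library automatically when the handle goes out of scope.

```cpp
#include <dlfcn.h>

#include <memory>
#include <string>

// Deleter that closes a handle returned by dlopen().
struct DlCloser {
  void operator()(void* handle) const {
    if (handle != nullptr) {
      dlclose(handle);
    }
  }
};

// Owning handle for a dynamically loaded library (hypothetical helper type).
using LibraryHandle = std::unique_ptr<void, DlCloser>;

// Loads a shared library, resolving all symbols immediately and keeping them
// local to the returned handle. On failure, returns an empty handle and
// stores the dlerror() text in `error`.
LibraryHandle loadLibrary(const std::string& path, std::string& error) {
  dlerror();  // Clear any stale error state.
  void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* message = dlerror();
    error = (message != nullptr) ? message : "unknown dlopen failure";
  }
  return LibraryHandle(handle);
}
```

A caller would test the result with `if (!handle)` and report `error`, in the same way the snippets above return a `FAIL_LOAD_*` status code.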

#### Resolving symbols in shared libraries

After the shared libraries are successfully loaded, we can proceed to
resolve all necessary symbols to access QNN APIs.

The below code snippet shows a template to resolve a symbol in a shared
library:

    // A generic function to resolve symbols in a library
    template <class T>
    static inline T resolveSymbol(void* libHandle, const char* symName) {
      T ptr = (T)pal::dynamicloading::dlSym(libHandle, symName);
      if (ptr == nullptr) {
        QNN_ERROR("Unable to access symbol [%s]. pal::dynamicloading::dlError(): %s",
                  symName,
                  pal::dynamicloading::dlError());
      }
      return ptr;
    }

    // Template for resolving a function of type SampleFnHandleType_t
    typedef ReturnType_t (*SampleFnHandleType_t)(FunctionParameterTypes_t ...);
    SampleFnHandleType_t sampleFnHandle = nullptr;
    sampleFnHandle = resolveSymbol<SampleFnHandleType_t>(libBackendHandle, "QnnSample_API");
    if (nullptr == sampleFnHandle) {
      // Error code indicating failure in symbol resolution
      return StatusCode::FAIL_SYM_FUNCTION;
    }

The below code snippet shows an example of how to resolve an actual QNN
API:

/* Resolve the symbol for Qnn_ErrorHandle_t QnnInterface_getProviders(const QnnInterface_t*** providerList,
                                                                         uint32_t* numProviders)
       API */
    
    typedef Qnn_ErrorHandle_t (*QnnInterfaceGetProvidersFn_t)(const QnnInterface_t*** providerList,
                                                             uint32_t* numProviders);
    
    QnnInterfaceGetProvidersFn_t getInterfaceProviders {nullptr};
    
    getInterfaceProviders =
    resolveSymbol<QnnInterfaceGetProvidersFn_t>(libBackendHandle, "QnnInterface_getProviders");
    if (nullptr == getInterfaceProviders) {
    return StatusCode::FAIL_SYM_FUNCTION;
    }
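
The same resolve-and-check pattern can be tried without the QNN SDK by using POSIX `dlsym` on a symbol that is known to exist. The sketch below is illustrative only: `resolveSymbolPosix` and `StrlenFn_t` are hypothetical names, and it assumes a Linux host where the C library is dynamically linked, so that `strlen` is visible through the handle returned by `dlopen(nullptr, ...)`.

```cpp
#include <dlfcn.h>

#include <cstddef>
#include <cstdio>

// POSIX counterpart of the resolveSymbol template shown earlier: looks up
// `symName` in `libHandle` and converts the result to the requested
// function-pointer type. Returns nullptr and logs if the lookup fails.
template <class T>
T resolveSymbolPosix(void* libHandle, const char* symName) {
  dlerror();  // Clear stale errors so the next dlerror() refers to this lookup.
  T ptr = reinterpret_cast<T>(dlsym(libHandle, symName));
  if (ptr == nullptr) {
    const char* message = dlerror();
    std::fprintf(stderr, "Unable to access symbol [%s]: %s\n", symName,
                 message != nullptr ? message : "symbol not found");
  }
  return ptr;
}

// Function-pointer type matching the C library's strlen().
using StrlenFn_t = std::size_t (*)(const char*);
```

With a handle from `dlopen(nullptr, RTLD_NOW)`, `resolveSymbolPosix<StrlenFn_t>(handle, "strlen")` yields a callable pointer, while an unknown name yields `nullptr`.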

In *qnn-sample-app* source code, all necessary symbols are resolved and
stored in a struct of type QnnFunctionPointers shown below:

typedef struct QnnFunctionPointers {
      // APIs from model output from converters
      // QnnModel_composeGraphs
      ComposeGraphsFnHandleType_t composeGraphsFnHandle;
      // QnnModel_freeGraphsInfo
      FreeGraphInfoFnHandleType_t freeGraphInfoFnHandle;
      // QNN Interface function table containing pointers to all necessary QNN APIs
      // in a backend
      QNN_INTERFACE_VER_TYPE qnnInterface;
      // QNN System Interface function table containing pointers to all QNN System APIs
      QNN_SYSTEM_INTERFACE_VER_TYPE qnnSystemInterface;
    } QnnFunctionPointers;

The above structure can be found
in `${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/src/SampleApp.hpp`.
The rest of the tutorial assumes a variable
named *m\_qnnFunctionPointers* of type *QnnFunctionPointers* that
contains valid function pointers.

### Usage of QNN APIs

This section demonstrates the usage of QNN APIs in a client application.

#### Use QNN Interface to obtain function pointers

The QNN Interface mechanism sets up a table of function pointers to the
QNN APIs in a backend, so the application does not have to resolve the
symbol for each API manually. QNN Interface can be used as shown below:

QnnInterface_t** interfaceProviders{nullptr};
    uint32_t numProviders{0};
    // Query for all available interfaces
    if (QNN_SUCCESS !=
       getInterfaceProviders((const QnnInterface_t***)&interfaceProviders, &numProviders)) {
    QNN_ERROR("Failed to get interface providers.");
    return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    // Check for validity of returned interfaces
    if (nullptr == interfaceProviders) {
    QNN_ERROR("Failed to get interface providers: null interface providers received.");
    return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    if (0 == numProviders) {
    QNN_ERROR("Failed to get interface providers: 0 interface providers.");
    return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    bool foundValidInterface{false};
    // Loop through all available interface providers and pick the one that suits the current API
    // version
    for (size_t pIdx = 0; pIdx < numProviders; pIdx++) {
    if (QNN_API_VERSION_MAJOR == interfaceProviders[pIdx]->apiVersion.coreApiVersion.major &&
          QNN_API_VERSION_MINOR <= interfaceProviders[pIdx]->apiVersion.coreApiVersion.minor) {
       foundValidInterface                 = true;
       m_qnnFunctionPointers.qnnInterface = interfaceProviders[pIdx]->QNN_INTERFACE_VER_NAME;
       break;
    }
    }
    if (!foundValidInterface) {
    QNN_ERROR("Unable to find a valid interface.");
    libBackendHandle = nullptr;
    return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
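
The selection rule in the loop above is: the provider's major version must equal the version the application was built against, and its minor version must be at least as new. The sketch below isolates that rule using a simplified stand-in struct. `ProviderVersion` and `selectProvider` are hypothetical names, not QNN types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the core API version fields of an interface provider.
struct ProviderVersion {
  uint32_t majorVersion;
  uint32_t minorVersion;
};

// Returns the index of the first provider whose major version equals
// `wantMajor` and whose minor version is at least `wantMinor`,
// or -1 if no provider qualifies.
int selectProvider(const std::vector<ProviderVersion>& providers,
                   uint32_t wantMajor,
                   uint32_t wantMinor) {
  for (std::size_t idx = 0; idx < providers.size(); ++idx) {
    if (providers[idx].majorVersion == wantMajor &&
        providers[idx].minorVersion >= wantMinor) {
      return static_cast<int>(idx);
    }
  }
  return -1;
}
```

Requiring an exact major match while accepting a newer minor version reflects the usual convention that minor releases add functionality without breaking existing calls; whether a given backend follows that convention is something to confirm in its release notes.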

QNN System Interface can be used to resolve all symbols related to QNN
System APIs as shown below:

typedef Qnn_ErrorHandle_t (*QnnSystemInterfaceGetProvidersFn_t)(
       const QnnSystemInterface_t*** providerList, uint32_t* numProviders);
    
    QnnSystemInterfaceGetProvidersFn_t getSystemInterfaceProviders{nullptr};
    getSystemInterfaceProviders = resolveSymbol<QnnSystemInterfaceGetProvidersFn_t>(
       systemLibraryHandle, "QnnSystemInterface_getProviders");
    if (nullptr == getSystemInterfaceProviders) {
       return StatusCode::FAIL_SYM_FUNCTION;
    }
    QnnSystemInterface_t** systemInterfaceProviders{nullptr};
    uint32_t numProviders{0};
    if (QNN_SUCCESS != getSystemInterfaceProviders(
                         (const QnnSystemInterface_t***)&systemInterfaceProviders, &numProviders)) {
       QNN_ERROR("Failed to get system interface providers.");
       return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    if (nullptr == systemInterfaceProviders) {
       QNN_ERROR("Failed to get system interface providers: null interface providers received.");
       return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    if (0 == numProviders) {
       QNN_ERROR("Failed to get interface providers: 0 interface providers.");
       return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    bool foundValidSystemInterface{false};
    for (size_t pIdx = 0; pIdx < numProviders; pIdx++) {
       if (QNN_SYSTEM_API_VERSION_MAJOR == systemInterfaceProviders[pIdx]->systemApiVersion.major &&
          QNN_SYSTEM_API_VERSION_MINOR <= systemInterfaceProviders[pIdx]->systemApiVersion.minor) {
       foundValidSystemInterface = true;
       m_qnnFunctionPointers.qnnSystemInterface =
          systemInterfaceProviders[pIdx]->QNN_SYSTEM_INTERFACE_VER_NAME;
       break;
       }
    }

#### Set up logging

Logging can be set up after a backend shared library has been
dynamically loaded and before the backend is initialized.

To initialize logging, a callback of type *QnnLog\_Callback\_t* has to be
defined. An example is defined below:

void logStdoutCallback(const char* fmt,
                            QnnLog_Level_t level,
                            uint64_t timestamp,
                            va_list argp) {
       const char* levelStr = "";
       switch (level) {
       case QNN_LOG_LEVEL_ERROR:
       levelStr = " ERROR ";
       break;
       case QNN_LOG_LEVEL_WARN:
       levelStr = "WARNING";
       break;
       case QNN_LOG_LEVEL_INFO:
       levelStr = "  INFO ";
       break;
       case QNN_LOG_LEVEL_DEBUG:
       levelStr = " DEBUG ";
       break;
       case QNN_LOG_LEVEL_VERBOSE:
       levelStr = "VERBOSE";
       break;
       case QNN_LOG_LEVEL_MAX:
       levelStr = "UNKNOWN";
       break;
       }
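       // Convert the timestamp to milliseconds (assumes the backend reports nanoseconds)
       double ms = (double)timestamp / 1000000.0;
       fprintf(stdout, "%8.1fms [%-7s] ", ms, levelStr);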
       vfprintf(stdout, fmt, argp);
       fprintf(stdout, "\n");
    }
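
The formatting done by the callback can be exercised on its own, without a backend, by writing to a string instead of `stdout`. The sketch below does that. `formatLogLine` and `formatLog` are hypothetical helpers, and the conversion assumes the timestamp is in nanoseconds, which this page does not state.

```cpp
#include <cstdarg>
#include <cstdint>
#include <cstdio>
#include <string>

// Builds one log line in the same "<time>ms [<level>] <message>" layout.
// Assumption: `timestampNs` is in nanoseconds.
std::string formatLogLine(const char* levelStr,
                          uint64_t timestampNs,
                          const char* fmt,
                          va_list argp) {
  const double ms = static_cast<double>(timestampNs) / 1000000.0;

  char prefix[64];
  std::snprintf(prefix, sizeof(prefix), "%8.1fms [%-7s] ", ms, levelStr);

  char body[512];
  std::vsnprintf(body, sizeof(body), fmt, argp);  // Long messages are truncated.

  return std::string(prefix) + body;
}

// Variadic wrapper so the formatter can be called like printf.
std::string formatLog(const char* levelStr, uint64_t timestampNs, const char* fmt, ...) {
  va_list argp;
  va_start(argp, fmt);
  std::string line = formatLogLine(levelStr, timestampNs, fmt, argp);
  va_end(argp);
  return line;
}
```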

The above callback can be registered with the backend along with a
maximum log level. Sample code to initialize with a max log level of
QNN\_LOG\_LEVEL\_INFO:

Qnn_LogHandle_t logHandle;
    if (QNN_SUCCESS !=
          m_qnnFunctionPointers.qnnInterface.logCreate(logStdoutCallback, QNN_LOG_LEVEL_INFO, &logHandle)) {
    QNN_ERROR("Unable to initialize logging in the backend.");
    return StatusCode::FAILURE;
    }

#### Initialize backend

Once logging has been successfully initialized, the backend can be
initialized as shown below:

    Qnn_BackendHandle_t backendHandle;
    // Null-terminated list of backend configurations (empty here)
    const QnnBackend_Config_t* backendConfigs{nullptr};
    /* Set up any necessary backend configurations */
    if (QNN_BACKEND_NO_ERROR != m_qnnFunctionPointers.qnnInterface.backendCreate(logHandle,
                                                                                 &backendConfigs,
                                                                                 &backendHandle)) {
      QNN_ERROR("Could not initialize backend");
      return StatusCode::FAILURE;
    }

#### Initialize profiling

If profiling is desired, after the backend is initialized, a profile
handle can be set up. This profile handle can be used at a later point
in any API that supports profiling.

A profile handle can be created in the backend with basic profiling
level as shown below:

Qnn_ProfileHandle_t profileHandle;
    if (QNN_PROFILE_NO_ERROR != m_qnnFunctionPointers.qnnInterface.profileCreate(
                                      backendHandle, QNN_PROFILE_LEVEL_BASIC, &profileHandle)) {
      QNN_WARN("Unable to create profile handle in the backend.");
      return StatusCode::FAILURE;
    }

#### Create device

A device can be created as shown below:

Qnn_DeviceHandle_t deviceHandle {nullptr};
    const QnnDevice_Config_t* devConfigArray[] = {&devConfig, nullptr};
    Qnn_ErrorHandle_t ret = m_qnnFunctionPointers.qnnInterface.deviceCreate(logHandle,
                                                                            devConfigArray,
                                                                            &deviceHandle);
    if (QNN_SUCCESS != ret) {
       QNN_ERROR("Failed to create device: %u", ret);
       return StatusCode::FAILURE;
    }

Set `devConfig` as defined in the QNN HTP backend API.

#### Register op packages

Op packages are a way to supply libraries containing ops to backends. They
can be registered as shown below:

uint32_t opPackageCount;
    char* opPackagePath[opPackageCount];
    char* opPackageInterfaceProvider[opPackageCount];
    /* Set up required op package paths and interface providers as necessary */
    for(uint32_t idx = 0; idx < opPackageCount; idx++) {
      if (QNN_BACKEND_NO_ERROR !=
            m_qnnFunctionPointers.qnnInterface.backendRegisterOpPackage(backendHandle,
                                                                        opPackagePath[idx],
                                                                        opPackageInterfaceProvider[idx])) {
        QNN_ERROR("Could not register Op Package: %s and interface provider: %s",
                opPackagePath[idx],
                opPackageInterfaceProvider[idx]);
        return StatusCode::FAILURE;
      }
    }

#### Create context

A context can be created in a backend as shown below:

Qnn_ContextHandle_t context;
    Qnn_DeviceHandle_t deviceHandle {nullptr};
    const QnnContext_Config_t* contextConfigs{nullptr};
    /* Set up any context configs that are necessary */
    if (QNN_CONTEXT_NO_ERROR !=
           m_qnnFunctionPointers.qnnInterface.contextCreate(backendHandle,
                                                            deviceHandle,
                                                            &contextConfigs,
                                                            &context)) {
       QNN_ERROR("Could not create context");
       return StatusCode::FAILURE;
    }

#### Prepare graphs

*qnn-sample-app* relies on the output of one of the converters to
create a QNN network in the backend. *composeGraphsFnHandle* is mapped
to the *QnnModel\_composeGraphs* API in the model shared library, which
takes a `qnn_wrapper_api::GraphInfo_t***` as one of its parameters.
Calling *composeGraphsFnHandle* makes the necessary backend calls to
create one or more networks. It also writes the information required to
execute each graph, such as details about its input and output tensors,
into the *graphsInfo* structure, as shown in the following code block:

/* Structure to retrieve information about graphs, like graph name,
       details about input and output tensors present in libQnnSampleModel.so */
    qnn_wrapper_api::GraphInfo_t** graphsInfo;
    // No. of graphs present in libQnnSampleModel.so
    uint32_t graphsCount;
    // true to enable intermediate outputs, false for network outputs only
    bool debug;
    if (qnn_wrapper_api::ModelError_t::MODEL_NO_ERROR !=
            m_qnnFunctionPointers.composeGraphsFnHandle(backendHandle,
                                                        m_qnnFunctionPointers.qnnInterface,
                                                        context,
                                                        &graphsInfo,
                                                        &graphsCount,
                                                        debug)) {
      QNN_ERROR("Failed in composeGraphs()");
      return StatusCode::FAILURE;
    }

At this point, the context will contain all the graphs that were present
in *libQnnSampleModel.so*.

#### Finalize graphs

Graphs that were added in the previous step can be finalized as shown
below:

// information about graphs obtained in the previous step
    qnn_wrapper_api::GraphInfo_t** graphsInfo;
    // No. of graphs obtained in the previous step
    uint32_t graphsCount;
    /* A valid profile handle if profiling is desired,
       nullptr if profiling is not needed */
    Qnn_ProfileHandle_t profileHandle;
    
    for (size_t graphIdx = 0; graphIdx < graphsCount; graphIdx++) {
      if (QNN_GRAPH_NO_ERROR !=
        m_qnnFunctionPointers.qnnInterface.graphFinalize(
            (*graphsInfo)[graphIdx].graph, profileHandle, nullptr)) {
        return StatusCode::FAILURE;
      }
      /* Extract profiling information if desired and if a valid handle was supplied to finalize
         graphs API */
    }
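
The indexing expression `(*graphsInfo)[graphIdx]` only works if `*graphsInfo` points at a contiguous array of graph records. The sketch below shows one memory layout that satisfies this. It is an assumption made for illustration, not a description of how the converter output allocates `graphsInfo`, and `MockGraphInfo`, `allocateGraphsInfo`, and `freeGraphsInfo` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdlib>

// Simplified stand-in for qnn_wrapper_api::GraphInfo_t.
struct MockGraphInfo {
  const char* graphName;
  void* graph;
};

// Allocates `count` zero-initialized records as one contiguous array, plus a
// table of pointers into it, so that table[i] and &(*table)[i] both address
// record i. Returns nullptr if count is 0 or allocation fails.
MockGraphInfo** allocateGraphsInfo(uint32_t count) {
  if (count == 0) {
    return nullptr;
  }
  auto** table   = static_cast<MockGraphInfo**>(std::calloc(count, sizeof(MockGraphInfo*)));
  auto* records = static_cast<MockGraphInfo*>(std::calloc(count, sizeof(MockGraphInfo)));
  if (table == nullptr || records == nullptr) {
    std::free(table);
    std::free(records);
    return nullptr;
  }
  for (uint32_t idx = 0; idx < count; ++idx) {
    table[idx] = records + idx;
  }
  return table;
}

// Releases storage created by allocateGraphsInfo().
void freeGraphsInfo(MockGraphInfo** table) {
  if (table == nullptr) {
    return;
  }
  std::free(table[0]);  // table[0] is the start of the contiguous record array.
  std::free(table);
}
```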

#### Save context into a binary

After all the graphs in a context are finalized, the user application
may choose to save the context into a binary for future use. The
advantage of saving a context is that it can be retrieved in the future
for execution of graphs contained within it without having to finalize
them again. This will save considerable time for initialization during
execution of a network.

The context can be saved as shown below:

// Get the expected size of the buffer from the backend in which the context can be saved
    if (QNN_CONTEXT_NO_ERROR !=
      m_qnnFunctionPointers.qnnInterface.contextGetBinarySize(context, &requiredBufferSize)) {
      QNN_ERROR("Could not get the required binary size.");
      return StatusCode::FAILURE;
    }
    
    // Allocate a buffer of the required size
    saveBuffer = (uint8_t*)malloc(requiredBufferSize * sizeof(uint8_t));
    if (nullptr == saveBuffer) {
      QNN_ERROR("Could not allocate buffer to save binary.");
      return StatusCode::FAILURE;
    }
    
    auto status = StatusCode::SUCCESS;
    uint32_t writtenBufferSize{0};
    // Pass the allocated buffer and obtain a copy of the context binary written into the buffer
    if (QNN_CONTEXT_NO_ERROR !=
      m_qnnFunctionPointers.qnnInterface.contextGetBinary(context,
                                                          reinterpret_cast<void*>(saveBuffer),
                                                          requiredBufferSize,
                                                          &writtenBufferSize)) {
     QNN_ERROR("Could not get binary.");
     status = StatusCode::FAILURE;
    }
    
    // Check if the supplied buffer size is at least as big as the amount of data written by the backend
    if (requiredBufferSize < writtenBufferSize) {
      QNN_ERROR(
        "Illegal written buffer size [%d] bytes. Cannot exceed allocated memory of [%d] bytes",
        writtenBufferSize,
        requiredBufferSize);
      status = StatusCode::FAILURE;
    }
    
    // Use caching utility to save metadata along with the binary buffer from the backend
    if (status == StatusCode::SUCCESS &&
      tools::datautil::StatusCode::SUCCESS != tools::datautil::writeBinaryToFile(outputPath,
                                                                                 saveBinaryName + ".bin",
                                                                                 (uint8_t*)saveBuffer,
                                                                                 writtenBufferSize)) {
      QNN_ERROR("Could not serialize to file.");
      status = StatusCode::FAILURE;
    }
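
The save sequence above follows a common two-call pattern: ask for the required size, allocate, fill the buffer, validate the reported size, then write the file. The sketch below reproduces that control flow with the backend replaced by callbacks so it can run without the SDK. `GetSizeFn`, `GetBinaryFn`, and `saveBinary` are hypothetical names, and the 64-bit size type is an assumption. Using `std::vector` instead of `malloc` is a design choice so that the buffer is released on every return path.

```cpp
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callbacks standing in for the backend's "get binary size" and
// "get binary" calls. Each returns true on success.
using GetSizeFn   = std::function<bool(uint64_t& requiredSize)>;
using GetBinaryFn = std::function<bool(uint8_t* buffer, uint64_t bufferSize, uint64_t& writtenSize)>;

// Queries the size, fills a buffer, validates the written size, and writes
// the result to `path`. Returns true only if every step succeeds.
bool saveBinary(const GetSizeFn& getSize, const GetBinaryFn& getBinary, const std::string& path) {
  uint64_t required = 0;
  if (!getSize(required) || required == 0) {
    return false;
  }

  std::vector<uint8_t> buffer(required);
  uint64_t written = 0;
  if (!getBinary(buffer.data(), required, written)) {
    return false;
  }
  if (written > required) {
    return false;  // The backend reported more data than was allocated.
  }

  std::ofstream out(path, std::ios::binary);
  if (!out) {
    return false;
  }
  out.write(reinterpret_cast<const char*>(buffer.data()), static_cast<std::streamsize>(written));
  return static_cast<bool>(out);
}
```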

#### Load context from a cached binary

A context that was saved into a binary, like in the previous step, can
be loaded as an alternative to creating a new context every time. The
code snippet below demonstrates this step:

auto returnStatus   = StatusCode::SUCCESS;
    std::shared_ptr<uint8_t> buffer{nullptr};
    uint32_t graphsCount {0};
    buffer = std::shared_ptr<uint8_t>(new uint8_t[bufferSize], std::default_delete<uint8_t[]>());
    if (!buffer) {
        QNN_ERROR("Failed to allocate memory.");
        return StatusCode::FAILURE;
    }
    
    if (tools::datautil::StatusCode::SUCCESS !=
        tools::datautil::readBinaryFromFile(
            cachedBinaryPath, reinterpret_cast<uint8_t*>(buffer.get()), bufferSize)) {
        QNN_ERROR("Failed to read binary file.");
        returnStatus = StatusCode::FAILURE;
    }
    
    /* Create a QnnSystemContext handle to access system context APIs. */
    QnnSystemContext_Handle_t sysCtxHandle{nullptr};
    if (QNN_SUCCESS != m_qnnFunctionPointers.qnnSystemInterface.systemContextCreate(&sysCtxHandle)) {
      QNN_ERROR("Could not create system handle.");
      returnStatus = StatusCode::FAILURE;
    }
    
    /* Retrieve metadata from the context binary through QNN System Context API. */
    QnnSystemContext_BinaryInfo_t* binaryInfo{nullptr};
    uint32_t binaryInfoSize{0};
    if (StatusCode::SUCCESS == returnStatus &&
        QNN_SUCCESS != m_qnnFunctionPointers.qnnSystemInterface.systemContextGetBinaryInfo(
                         sysCtxHandle,
                         static_cast<void*>(buffer.get()),
                         bufferSize,
                         &binaryInfo,
                         &binaryInfoSize)) {
        QNN_ERROR("Failed to get context binary info");
        returnStatus = StatusCode::FAILURE;
    }
    
    qnn_wrapper_api::GraphInfo_t** graphsInfo;
    /* Make a copy of the metadata. */
    if (StatusCode::SUCCESS == returnStatus &&
        !copyMetadataToGraphsInfo(binaryInfo, graphsInfo, graphsCount)) {
      QNN_ERROR("Failed to copy metadata.");
      returnStatus = StatusCode::FAILURE;
    }
    
    /* Release resources associated with previously created QnnSystemContext handle. */
    m_qnnFunctionPointers.qnnSystemInterface.systemContextFree(sysCtxHandle);
    sysCtxHandle = nullptr;
    
    /* buffer contains the binary data that was previously obtained from a backend. Pass this
       cached binary data to the backend to recreate the same context. */
    if (StatusCode::SUCCESS == returnStatus &&
        m_qnnFunctionPointers.qnnInterface.contextCreateFromBinary(backendHandle,
                                                                   deviceHandle,
                                                                   (const QnnContext_Config_t**)&contextConfig,
                                                                   static_cast<void*>(buffer.get()),
                                                                   bufferSize,
                                                                   &context,
                                                                   profileBackendHandle)) {
      QNN_ERROR("Could not create context from binary.");
      returnStatus = StatusCode::FAILURE;
    }
    
    // Optionally, extract profiling numbers if desired
    if (ProfilingLevel::OFF != m_profilingLevel) {
      extractBackendProfilingInfo(profileBackendHandle);
    }
    
    /* Obtain and save graph handles for each graph present in the context based on the saved graph
       names in the metadata */
    if (StatusCode::SUCCESS == returnStatus) {
      for (size_t graphIdx = 0; graphIdx < graphsCount; graphIdx++) {
        if (QNN_SUCCESS !=
            m_qnnFunctionPointers.qnnInterface.graphRetrieve(
                context, (*graphsInfo)[graphIdx].graphName, &((*graphsInfo)[graphIdx].graph))) {
          QNN_ERROR("Unable to retrieve graph handle for graph Idx: %d", graphIdx);
          returnStatus = StatusCode::FAILURE;
        }
      }
    }
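
The snippet above gates each step on `returnStatus`, so a failure skips the remaining work while the system context handle is still released. A minimal sketch of that control flow follows. The names `Step` and `runGated` are hypothetical, and the callables are stand-ins rather than real QNN calls.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class StatusCode { SUCCESS, FAILURE };

// Hypothetical description of one step in the load sequence.
struct Step {
  std::string name;
  bool alwaysRun;             // true for cleanup-style steps such as freeing a handle
  std::function<bool()> run;  // returns true on success
};

// Runs the steps in order. After the first failure, only steps marked
// alwaysRun are executed. Names of executed steps are appended to `executed`.
StatusCode runGated(const std::vector<Step>& steps, std::vector<std::string>& executed) {
  StatusCode status = StatusCode::SUCCESS;
  for (const Step& step : steps) {
    assert(step.run && "every step needs a callable");
    if (!step.alwaysRun && status != StatusCode::SUCCESS) {
      continue;  // an earlier step failed, so skip this one
    }
    executed.push_back(step.name);
    if (!step.run()) {
      status = StatusCode::FAILURE;
    }
  }
  return status;
}
```

If the first step fails, only that step and the cleanup-style steps run, and the overall status is `FAILURE`.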

#### Run graphs

Once a context holds finalized graphs, either because the graphs were
added and finalized in a newly created context or because the context was
retrieved from a binary, one or more of those graphs can be executed.

Running a graph involves:

1. Setting up input and output tensors.
2. Populating input data into input tensors.
3. Calling the execute method in the backend.
4. Obtaining outputs and saving them.

This is demonstrated using the code snippet below:

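// Run each graph in this context in turn. The break statements below exit this loop,
    // and the final closing brace of this snippet ends it.
    for (uint32_t graphIdx = 0; graphIdx < graphsCount; graphIdx++) {
    QNN_DEBUG("Starting execution for graphIdx: %d", graphIdx);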
    Qnn_Tensor_t* inputs  = nullptr;
    Qnn_Tensor_t* outputs = nullptr;
    // IOTensor utility is used to set up input and output tensor structures
    if (iotensor::StatusCode::SUCCESS !=
          ioTensor.setupInputAndOutputTensors(&inputs, &outputs, (*graphsInfo)[graphIdx])) {
      QNN_ERROR("Error in setting up Input and output Tensors for graphIdx: %d", graphIdx);
      returnStatus = StatusCode::FAILURE;
      break;
    }
    
    // Grab input raw file paths to read input data
    auto inputFileList = inputFileLists[graphIdx];
    auto graphInfo     = (*graphsInfo)[graphIdx];
    if (!inputFileList.empty()) {
      /* qnn-sample-app reads data based on the batch size until the whole buffer is filled.
         If there isn't sufficient data, it pads the rest with zeroes. */
      size_t totalCount = inputFileList[0].size();
      while (!inputFileList[0].empty()) {
        size_t startIdx = (totalCount - inputFileList[0].size());
    
        // IOTensor utility is used to populate input tensors with input data
        if (iotensor::StatusCode::SUCCESS !=
              ioTensor.populateInputTensors(
                graphIdx, inputFileList, inputs, graphInfo, inputDataType)) {
          returnStatus = StatusCode::FAILURE;
        }
    
        if (StatusCode::SUCCESS == returnStatus) {
          // Execute the graph in the backend with optional profile handle
          QNN_DEBUG("Successfully populated input tensors for graphIdx: %d", graphIdx);
          Qnn_ErrorHandle_t executeStatus = QNN_GRAPH_NO_ERROR;
          executeStatus = m_qnnFunctionPointers.qnnInterface.graphExecute(graphInfo.graph,
                                                                          inputs,
                                                                          graphInfo.numInputTensors,
                                                                          outputs,
                                                                          graphInfo.numOutputTensors,
                                                                          profileBackendHandle,
                                                                          nullptr);
          if (QNN_GRAPH_NO_ERROR != executeStatus) {
            returnStatus = StatusCode::FAILURE;
          }
          if (StatusCode::SUCCESS == returnStatus) {
            QNN_DEBUG("Successfully executed graphIdx: %d ", graphIdx);
            // IOTensor utility is used to write output tensors to raw files
            if (iotensor::StatusCode::SUCCESS !=
                  ioTensor.writeOutputTensors(graphIdx,
                                              startIdx,
                                              graphInfo.graphName,
                                              outputs,
                                              graphInfo.outputTensors,
                                              graphInfo.numOutputTensors,
                                              outputDataType,
                                              graphsCount,
                                              outputPath)) {
              returnStatus = StatusCode::FAILURE;
            }
          }
        }
        if (StatusCode::SUCCESS != returnStatus) {
          QNN_ERROR("Execution of Graph: %d failed!", graphIdx);
          break;
        }
      }
    }
    
    // Clean up all the tensors after execution is completed
    ioTensor.tearDownInputAndOutputTensors(
        inputs, outputs, graphInfo.numInputTensors, graphInfo.numOutputTensors);
    inputs  = nullptr;
    outputs = nullptr;
    if (StatusCode::SUCCESS != returnStatus) {
      break;
    }
    }
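
The comment in the snippet above says that input data is read one batch at a time and that an incomplete final batch is padded with zeroes. The sketch below illustrates that behavior only. It is not the IOTensor implementation, and `Sample` and `fillBatch` are hypothetical names for data that is assumed to be already loaded into memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for one input sample already read from a raw file.
using Sample = std::vector<float>;

// Fills `buffer` with up to `batchSize` samples taken from the front of `pending`.
// The buffer always holds batchSize * sampleSize elements; slots for which no
// sample is left remain zero. Returns the number of samples consumed.
std::size_t fillBatch(std::deque<Sample>& pending,
                      std::size_t batchSize,
                      std::size_t sampleSize,
                      std::vector<float>& buffer) {
  buffer.assign(batchSize * sampleSize, 0.0f);  // zero padding by default
  std::size_t consumed = 0;
  while (consumed < batchSize && !pending.empty()) {
    const Sample& sample = pending.front();
    assert(sample.size() == sampleSize && "each sample must match the per-item size");
    std::copy(sample.begin(),
              sample.end(),
              buffer.begin() + static_cast<std::ptrdiff_t>(consumed * sampleSize));
    pending.pop_front();
    ++consumed;
  }
  return consumed;
}
```

Calling `fillBatch` repeatedly until `pending` is empty mirrors the `while (!inputFileList[0].empty())` loop: each call consumes one batch, and the last call may return fewer samples than `batchSize`.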

IOTensor is a utility provided with the source code
at `${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/src/Utils/IOTensor.cpp`.
It exposes a few methods that help with the execution of a graph, which
were used in the previous code snippet:

1. *setupInputAndOutputTensors* to set up structures related to input
and output tensors.
2. *populateInputTensors* to copy input data into input tensor
structures.
3. *writeOutputTensors* to write output tensors to raw files.
4. *tearDownInputAndOutputTensors* to clean up resources associated with
input and output tensors.

Refer to the IOTensor source code for more details about these APIs.
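
The execution snippet calls the teardown method by hand after the inner loop, which works only as long as no new exit path bypasses that call. One way to guarantee teardown on every path is a scope guard. The following is a minimal sketch under that assumption: `ScopeExit` and `runOneGraph` are hypothetical and not part of the SDK, and the counter stands in for the real teardown call.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs the given callable when it goes out of scope.
class ScopeExit {
 public:
  explicit ScopeExit(std::function<void()> onExit) : m_onExit(std::move(onExit)) {
    assert(m_onExit && "a cleanup callable is required");
  }
  ~ScopeExit() { m_onExit(); }
  ScopeExit(const ScopeExit&)            = delete;
  ScopeExit& operator=(const ScopeExit&) = delete;

 private:
  std::function<void()> m_onExit;
};

// Stand-in for the per-graph work. `tearDownCalls` counts how often the
// cleanup ran, in place of a real call that releases the tensors.
bool runOneGraph(bool failEarly, int& tearDownCalls) {
  ScopeExit tearDown([&tearDownCalls] { ++tearDownCalls; });
  if (failEarly) {
    return false;  // the guard still runs the cleanup on this early exit
  }
  // ... populate inputs, execute, write outputs ...
  return true;
}
```

The cleanup runs exactly once per call to `runOneGraph`, whether the function returns early or completes normally.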

#### Free context

After all graph executions are complete, free the context as shown
below:

if (QNN_CONTEXT_NO_ERROR !=
          m_qnnFunctionPointers.qnnInterface.contextFree(context, profileBackendHandle)) {
      QNN_ERROR("Could not free context");
      return StatusCode::FAILURE;
    }

#### Terminate backend

The backend can be terminated as shown below:

if (QNN_BACKEND_NO_ERROR != m_qnnFunctionPointers.qnnInterface.backendFree(backendHandle)) {
      QNN_ERROR("Could not free backend");
      return StatusCode::FAILURE;
    }
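
This walkthrough frees the context first and the backend second, the reverse of the order in which they were created. C++ destroys local objects in reverse order of declaration, so owning wrappers can keep that order without explicit calls. The sketch below shows the idea with placeholder integer handles. `Releaser`, `Handle`, `makeHandle`, and `runSession` are hypothetical names, and the log entries stand in for the real free calls.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using ReleaseLog = std::vector<std::string>;

// Hypothetical deleter: records which release call would be made, then
// frees the placeholder handle.
struct Releaser {
  ReleaseLog* log;
  std::string name;
  void operator()(int* handle) const {
    assert(log != nullptr);
    log->push_back(name);
    delete handle;
  }
};

using Handle = std::unique_ptr<int, Releaser>;

Handle makeHandle(ReleaseLog& log, const std::string& name) {
  return Handle(new int(0), Releaser{&log, name});
}

// Creates the backend before the context, so the context is released first
// when the inner scope ends.
ReleaseLog runSession() {
  ReleaseLog log;
  {
    Handle backend = makeHandle(log, "backendFree");
    Handle context = makeHandle(log, "contextFree");
    // ... execute graphs ...
  }
  return log;
}
```

The returned log lists `contextFree` before `backendFree`, matching the order of the two sections above.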

## SNPE sample app

For C++ API and sample app execution using SNPE, see the [Qualcomm AI Runtime SDK documentation](https://docs.qualcomm.com/nav/home/usergroup8.html?product=1601111740009302).

Last Published: May 14, 2026
