# Develop an AI application using QAIRT C++ APIs

Note: Support for this section on Ubuntu will be available soon.

The Qualcomm AI Runtime (QAIRT) SDK provides C++ APIs for application development. Samples are available for both Qualcomm AI Engine Direct (QNN) and the Qualcomm Neural Processing Engine SDK (SNPE) to help you begin application development. The following instructions describe how to build, run, and navigate the source code. They demonstrate the workflow for using the QNN or SNPE APIs to run a model.

## Build and run the QNN sample app

The `qnn-sample-app` is located at `${QNN_SDK_ROOT}/examples/QNN/SampleApp`, where `QNN_SDK_ROOT` refers to the path where the QNN SDK has been extracted.

### Set up the QAIRT SDK

To set up the toolchain for the QNN sample app, do the following:

1. [Download the Qualcomm AI Runtime SDK](https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.43.0.260128/v2.43.0.260128.zip).
2. Unzip the SDK and set `QNN_SDK_ROOT` to the extracted directory.

       unzip v2.43.0.260128.zip
       cd qairt/2.43.0.260128
       export QNN_SDK_ROOT=`pwd`

3. Install the eSDK. Follow the [Qualcomm IM SDK quickstart](https://docs.qualcomm.com/doc/80-80022-51/topic/install-sdk.html#section-b5c-z3k-5bc) to install the eSDK, which contains the required cross-compiler toolchain.
   - For Yocto Scarthgap devices, the libraries are compiled with GCC-11.2.
   - Set the `ESDK_PATH` environment variable to the eSDK installation path. Later steps use the installation path (`/path/to/extracted/toolchain`) for the compilation.

         export ESDK_PATH="/path/to/extracted/toolchain"

### Build the QNN sample app

Follow the steps below to build the QNN sample app.

1. Go to the sample app directory.

       cd ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/

2. Set the environment variable for the GCC toolchain.

       export QNN_AARCH64_LINUX_OE_GCC_112=$ESDK_PATH

3. Build the application.

       make CXX="$ESDK_PATH/tmp/sysroots/x86_64/usr/bin/aarch64-qcom-linux/aarch64-qcom-linux-g++ --sysroot=$ESDK_PATH/tmp/sysroots/qcs6490-rb3gen2-vision-kit/" all_linux_oe_aarch64_gcc112

   This creates two folders.
   - `bin`: Contains `qnn-sample-app` binaries for each platform within their respective directories.
   - `obj`: Contains all object files used in building and linking the executable.

### Run the QNN sample app on Linux (Yocto-based)

The built `qnn-sample-app` executable can run a model with any QNN backend. For Yocto Scarthgap-based devices, backends are available for `aarch64-oe-linux-gcc11.2`.

1. Push the artifacts to the target device.

       scp ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/bin/aarch64-oe-linux-gcc11.2/qnn-sample-app root@[ip-addr]:/etc/apps/qnn-sample-app

   Note: Create the `/etc/apps/` directory if it doesn't already exist on the device.

2. On the host computer, use [AI Hub](https://docs.qualcomm.com/doc/80-80022-15B/topic/ai-hub.html) to export a model. For example, to export the InceptionV3 QNN model, run the following commands:

       pip3 install qai-hub-models
       python -m qai_hub_models.models.inception_v3.export --quantize w8a8 --target-runtime=qnn_context_binary --device="Dragonwing RB3 Gen 2 Vision Kit" --compile-options="qairt_version 2.43" --profile-options "--qairt_version 2.43"

   Note: Generate the context binary for the same SDK version in use on the target device.

3. Push the exported InceptionV3 QNN model to the target device. The exported model is saved under `export_assets/inception_v3-qnn_context_binary-w8a8-`. The following example uses `QCS6490` as the chipset.

       scp export_assets/inception_v3-qnn_context_binary-w8a8-qualcomm_qcs6490/inception_v3.bin root@:/etc/apps/

   When prompted for the password, enter `oelinux123`.

4. On the host machine, generate a dummy input file to be used for inference and transfer it to the target device.
   1. Run the following commands in the Python environment.

          python3
          import numpy as np
          ((np.random.random((1,3,224,224)).astype(np.float32))).tofile("input.raw")

   2. Transfer the `input.raw` file to the target device:

          scp input.raw root@:/etc/apps

5. From the host computer, SSH into the target device.

       ssh root@
       cd /etc/apps

6. Create `input_list.txt`.

       echo "input.raw" > /etc/apps/input_list.txt

7. Run the app.

       chmod +x qnn-sample-app
       ./qnn-sample-app --retrieve_context inception_v3.bin \
         --backend libQnnHtp.so \
         --input_list input_list.txt \
         --system_library libQnnSystem.so

   Note: Update the model name and `input_list` to match the selected model.

   For help, run:

       ./qnn-sample-app --help

Command line arguments

- **Required arguments**
  - `--model`: Path to the QNN network model. Mutually exclusive with `--retrieve_context`.
  - `--retrieve_context`: Path to a cached binary for loading a saved context and execution graphs. Mutually exclusive with `--model`.
  - `--backend`: Path to a QNN backend to run the model.
  - `--input_list`: Path to a file listing network inputs. For multiple graphs, provide a comma-separated list of input files.
- **Optional arguments**
  - `--debug`: Save output from all network layers.
  - `--output_dir`: Directory for outputs (default: `./output`).
  - `--output_data_type`: Output data type (`float_only`, `native_only`, `float_and_native`).
  - `--input_data_type`: Input data type (`float` or `native`).
  - `--op_packages`: Comma-separated list of op packages and interface providers.
  - `--profiling_level`: Profiling level (`basic` or `detailed`).
  - `--save_context`: Save backend context and graph metadata to a binary file.
  - `--num_inferences`: Number of inferences to perform.
  - `--log_level`: Max logging level (`error`, `warn`, `info`, `verbose`).
  - `--system_library`: Path to `libQnnSystem.so` for reflection APIs during context loading.
  - `--version`: Print QNN SDK version.
  - `--help`: Display help message.

## Workflow and API usage

Use the following recommended pattern to develop C++ applications using QNN APIs.

1. [Load prerequisite shared libraries.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#loading-pre-requisite-shared-libraries)
2. [Use QNN APIs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#usage-of-qnn-apis)
   1. [Use QNN interface to obtain function pointers.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#use-qnn-interface-to-obtain-function-pointers)
   2. [Set up logging.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#set-up-logging)
   3. [Initialize backend.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#initialize-backend)
   4. [Initialize profiling.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#initialize-profiling)
   5. [Create device.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#create-device)
   6. [Register op packages.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#register-op-packages)
   7. [Create context.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#create-context)
   8. [Prepare graphs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#prepare-graphs)
   9. [Finalize graphs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#finalize-graphs)
   10. [Save context into a binary.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#save-context-into-a-binary)
   11. [Load context from a cached binary.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#load-context-from-a-cached-binary)
   12. [Run graphs.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#execute-graphs)
   13. [Free context.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#free-context)
   14. [Terminate backend.](https://docs.qualcomm.com/nav/home/sample_app.html?product=1601111740009302#terminate-backend)

### Load prerequisite shared libraries

The QNN SDK provides shared libraries for accessing backends. Applications load the libraries they need to run a network. Create a network in QNN in one of the following ways.

- Build the network directly in your application using QNN APIs.
- Use QNN converters to produce a shared library of a QNN network.

`qnn-sample-app` uses the shared library option. This network can be produced using one of the QNN converters available in the SDK, and compiled into a shared library using `qnn-model-lib-generator`.

Note: For Windows users, replace all `.so` files with the analogous `.dll` file in the following instructions. For more details, see platform differences.

#### Loading a backend

Shared libraries for various backends including CPU, GPU, HTP, and DSP are available in the QNN SDK. Every backend that implements the QNN APIs exposes the necessary symbols, which can be accessed through dynamic loading. Consider a sample backend shared library named *libQnnSampleBackend.so*, which can be dynamically loaded as shown below:

    void* libBackendHandle = pal::dynamicloading::dlOpen(
        "libQnnSampleBackend.so", pal::dynamicloading::DL_NOW | pal::dynamicloading::DL_LOCAL);
    if (nullptr == libBackendHandle) {
      QNN_ERROR("Unable to load backend. pal::dynamicloading::dlError(): %s",
                pal::dynamicloading::dlError());
      return StatusCode::FAIL_LOAD_BACKEND;
    }

To load a model as a shared library, consider a sample model shared library named *libQnnSampleModel.so*, which can be dynamically loaded as shown below:

    void* libModelHandle = pal::dynamicloading::dlOpen(
        "libQnnSampleModel.so", pal::dynamicloading::DL_NOW | pal::dynamicloading::DL_LOCAL);
    if (nullptr == libModelHandle) {
      QNN_ERROR("Unable to load model. pal::dynamicloading::dlError(): %s",
                pal::dynamicloading::dlError());
      return StatusCode::FAIL_LOAD_MODEL;
    }

Optionally, to create a context from a cached binary and execute graphs, applications can use the QnnSystem API to retrieve metadata associated with the context. The QnnSystem API can be accessed by loading the *libQnnSystem.so* shared library as shown below:

    void* systemLibraryHandle = pal::dynamicloading::dlOpen(
        "libQnnSystem.so", pal::dynamicloading::DL_NOW | pal::dynamicloading::DL_LOCAL);
    if (nullptr == systemLibraryHandle) {
      QNN_ERROR("Unable to load system library. pal::dynamicloading::dlError(): %s",
                pal::dynamicloading::dlError());
      return StatusCode::FAIL_LOAD_SYSTEM_LIB;
    }

#### Resolving symbols in shared libraries

After the shared libraries are successfully loaded, resolve the symbols needed to access the QNN APIs. The following code snippet shows a template to resolve a symbol in a shared library:

    // A generic function to resolve symbols in a library
    template <class T>
    static inline T resolveSymbol(void* libHandle, const char* symName) {
      T ptr = (T)pal::dynamicloading::dlSym(libHandle, symName);
      if (ptr == nullptr) {
        QNN_ERROR("Unable to access symbol [%s]. pal::dynamicloading::dlError(): %s",
                  symName,
                  pal::dynamicloading::dlError());
      }
      return ptr;
    }

    // Template for resolving a function of type SampleFnHandleType_t
    typedef ReturnType_t (*SampleFnHandleType_t)(FunctionParameterTypes_t ...);
    SampleFnHandleType_t sampleFnHandle = nullptr;
    sampleFnHandle = resolveSymbol<SampleFnHandleType_t>(libBackendHandle, "QnnSample_API");
    if (nullptr == sampleFnHandle) {
      // Error code indicating failure in symbol resolution
      return StatusCode::FAIL_SYM_FUNCTION;
    }

The following code snippet shows an example of how to resolve an actual QNN API:

    /* Resolve the symbol for
       Qnn_ErrorHandle_t QnnInterface_getProviders(const QnnInterface_t*** providerList,
                                                   uint32_t* numProviders) API */
    typedef Qnn_ErrorHandle_t (*QnnInterfaceGetProvidersFn_t)(const QnnInterface_t*** providerList,
                                                              uint32_t* numProviders);
    QnnInterfaceGetProvidersFn_t getInterfaceProviders{nullptr};
    getInterfaceProviders = resolveSymbol<QnnInterfaceGetProvidersFn_t>(
        libBackendHandle, "QnnInterface_getProviders");
    if (nullptr == getInterfaceProviders) {
      return StatusCode::FAIL_SYM_FUNCTION;
    }

In the *qnn-sample-app* source code, all necessary symbols are resolved and stored in a struct of type *QnnFunctionPointers*, shown below:

    typedef struct QnnFunctionPointers {
      // APIs from model output from converters
      // QnnModel_composeGraphs
      ComposeGraphsFnHandleType_t composeGraphsFnHandle;
      // QnnModel_freeGraphsInfo
      FreeGraphInfoFnHandleType_t freeGraphInfoFnHandle;
      // QNN Interface function table containing pointers to all necessary QNN APIs
      // in a backend
      QNN_INTERFACE_VER_TYPE qnnInterface;
      // QNN System Interface function table containing pointers to all QNN System APIs
      QNN_SYSTEM_INTERFACE_VER_TYPE qnnSystemInterface;
    } QnnFunctionPointers;

The above structure can be found in `${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/src/SampleApp.hpp`. The rest of this tutorial assumes a variable named *m\_qnnFunctionPointers* of type *QnnFunctionPointers* that contains valid function pointers.
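To try the symbol-resolution pattern outside the SDK, the following minimal sketch uses the POSIX `dlopen`/`dlsym` calls directly instead of the `pal::dynamicloading` wrappers. It assumes a dynamically linked Linux-like host. `StrlenFn_t` and `lengthViaResolvedSymbol` are hypothetical names introduced only for illustration, and the C library's `strlen` stands in for a QNN API such as `QnnInterface_getProviders`.

```cpp
#include <dlfcn.h>  // POSIX dlopen, dlsym, dlerror, dlclose

#include <cassert>
#include <cstddef>
#include <cstdio>

// Resolve symName in libHandle and cast it to the function-pointer type T.
// Logs to stderr and returns nullptr when the symbol cannot be found.
template <typename T>
T resolveSymbol(void* libHandle, const char* symName) {
  assert(symName != nullptr);
  dlerror();  // clear any stale error state before the lookup
  void* raw = dlsym(libHandle, symName);
  if (raw == nullptr) {
    const char* err = dlerror();
    std::fprintf(stderr, "Unable to access symbol [%s]: %s\n", symName,
                 err != nullptr ? err : "unknown error");
  }
  return reinterpret_cast<T>(raw);
}

// Signature of strlen, used here as a stand-in for a QNN API signature.
using StrlenFn_t = std::size_t (*)(const char*);

// Hypothetical helper: look up strlen at run time and call it through the
// resolved pointer. Returns 0 if loading or resolution fails.
std::size_t lengthViaResolvedSymbol(const char* text) {
  // A null file name yields a handle to the running program and the
  // libraries it was started with (assumption: dynamically linked build).
  void* self = dlopen(nullptr, RTLD_NOW | RTLD_LOCAL);
  if (self == nullptr) {
    return 0;
  }
  StrlenFn_t fn = resolveSymbol<StrlenFn_t>(self, "strlen");
  const std::size_t length = (fn != nullptr) ? fn(text) : 0;
  dlclose(self);
  return length;
}
```

Calling `lengthViaResolvedSymbol("qnn")` goes through the same load, resolve, check, and call sequence as the SDK snippets. Older toolchains may need `-ldl` at link time.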
### Usage of QNN APIs

This section demonstrates the usage of QNN APIs in a client application.

#### Use QNN Interface to obtain function pointers

The QNN Interface mechanism sets up a table of function pointers to the QNN APIs in a backend, so the application does not have to resolve each API symbol manually. QNN Interface can be used as shown below:

    QnnInterface_t** interfaceProviders{nullptr};
    uint32_t numProviders{0};
    // Query for all available interfaces
    if (QNN_SUCCESS !=
        getInterfaceProviders((const QnnInterface_t***)&interfaceProviders, &numProviders)) {
      QNN_ERROR("Failed to get interface providers.");
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    // Check for validity of returned interfaces
    if (nullptr == interfaceProviders) {
      QNN_ERROR("Failed to get interface providers: null interface providers received.");
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    if (0 == numProviders) {
      QNN_ERROR("Failed to get interface providers: 0 interface providers.");
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    bool foundValidInterface{false};
    // Loop through all available interface providers and pick the one that suits the current API
    // version
    for (size_t pIdx = 0; pIdx < numProviders; pIdx++) {
      if (QNN_API_VERSION_MAJOR == interfaceProviders[pIdx]->apiVersion.coreApiVersion.major &&
          QNN_API_VERSION_MINOR <= interfaceProviders[pIdx]->apiVersion.coreApiVersion.minor) {
        foundValidInterface = true;
        m_qnnFunctionPointers.qnnInterface = interfaceProviders[pIdx]->QNN_INTERFACE_VER_NAME;
        break;
      }
    }
    if (!foundValidInterface) {
      QNN_ERROR("Unable to find a valid interface.");
      libBackendHandle = nullptr;
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }

QNN System Interface can be used to resolve all symbols related to QNN System APIs as shown below:

    typedef Qnn_ErrorHandle_t (*QnnSystemInterfaceGetProvidersFn_t)(
        const QnnSystemInterface_t*** providerList, uint32_t* numProviders);
    QnnSystemInterfaceGetProvidersFn_t getSystemInterfaceProviders{nullptr};
    getSystemInterfaceProviders = resolveSymbol<QnnSystemInterfaceGetProvidersFn_t>(
        systemLibraryHandle, "QnnSystemInterface_getProviders");
    if (nullptr == getSystemInterfaceProviders) {
      return StatusCode::FAIL_SYM_FUNCTION;
    }
    QnnSystemInterface_t** systemInterfaceProviders{nullptr};
    uint32_t numProviders{0};
    if (QNN_SUCCESS != getSystemInterfaceProviders(
                           (const QnnSystemInterface_t***)&systemInterfaceProviders,
                           &numProviders)) {
      QNN_ERROR("Failed to get system interface providers.");
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    if (nullptr == systemInterfaceProviders) {
      QNN_ERROR("Failed to get system interface providers: null interface providers received.");
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    if (0 == numProviders) {
      QNN_ERROR("Failed to get system interface providers: 0 interface providers.");
      return StatusCode::FAIL_GET_INTERFACE_PROVIDERS;
    }
    bool foundValidSystemInterface{false};
    for (size_t pIdx = 0; pIdx < numProviders; pIdx++) {
      if (QNN_SYSTEM_API_VERSION_MAJOR == systemInterfaceProviders[pIdx]->systemApiVersion.major &&
          QNN_SYSTEM_API_VERSION_MINOR <= systemInterfaceProviders[pIdx]->systemApiVersion.minor) {
        foundValidSystemInterface = true;
        m_qnnFunctionPointers.qnnSystemInterface =
            systemInterfaceProviders[pIdx]->QNN_SYSTEM_INTERFACE_VER_NAME;
        break;
      }
    }

#### Set up logging

Logging can be set up after a backend shared library has been dynamically loaded and before the backend is initialized. To initialize logging, a callback of type *QnnLog\_Callback\_t* has to be defined.
An example is defined below:

    void logStdoutCallback(const char* fmt,
                           QnnLog_Level_t level,
                           uint64_t timestamp,
                           va_list argp) {
      const char* levelStr = "";
      switch (level) {
        case QNN_LOG_LEVEL_ERROR:
          levelStr = " ERROR ";
          break;
        case QNN_LOG_LEVEL_WARN:
          levelStr = "WARNING";
          break;
        case QNN_LOG_LEVEL_INFO:
          levelStr = " INFO ";
          break;
        case QNN_LOG_LEVEL_DEBUG:
          levelStr = " DEBUG ";
          break;
        case QNN_LOG_LEVEL_VERBOSE:
          levelStr = "VERBOSE";
          break;
        case QNN_LOG_LEVEL_MAX:
          levelStr = "UNKNOWN";
          break;
      }
      // Convert the timestamp to milliseconds (assumes the backend reports nanoseconds)
      double ms = (double)timestamp / 1000000.0;
      fprintf(stdout, "%8.1fms [%-7s] ", ms, levelStr);
      vfprintf(stdout, fmt, argp);
      fprintf(stdout, "\n");
    }

The above callback can be registered with the backend along with a maximum log level. Sample code to initialize logging with a maximum log level of `QNN_LOG_LEVEL_INFO`:

    Qnn_LogHandle_t logHandle;
    if (QNN_SUCCESS != m_qnnFunctionPointers.qnnInterface.logCreate(
                           logStdoutCallback, QNN_LOG_LEVEL_INFO, &logHandle)) {
      QNN_ERROR("Unable to initialize logging in the backend.");
      return StatusCode::FAILURE;
    }

#### Initialize backend

Once logging has been successfully initialized, the backend can be initialized as shown below:

    Qnn_BackendHandle_t backendHandle;
    const QnnBackend_Config_t* backendConfigs;
    /* Set up any necessary backend configurations */
    if (QNN_BACKEND_NO_ERROR != m_qnnFunctionPointers.qnnInterface.backendCreate(
                                    logHandle, &backendConfigs, &backendHandle)) {
      QNN_ERROR("Could not initialize backend");
      return StatusCode::FAILURE;
    }

#### Initialize profiling

If profiling is desired, a profile handle can be set up after the backend is initialized. This profile handle can be used at a later point in any API that supports profiling.
A profile handle can be created in the backend with the basic profiling level as shown below:

    Qnn_ProfileHandle_t profileHandle;
    if (QNN_PROFILE_NO_ERROR != m_qnnFunctionPointers.qnnInterface.profileCreate(
                                    backendHandle, QNN_PROFILE_LEVEL_BASIC, &profileHandle)) {
      QNN_WARN("Unable to create profile handle in the backend.");
      return StatusCode::FAILURE;
    }

#### Create device

A device can be created as shown below:

    Qnn_DeviceHandle_t deviceHandle{nullptr};
    const QnnDevice_Config_t* devConfigArray[] = {&devConfig, nullptr};
    Qnn_ErrorHandle_t ret =
        m_qnnFunctionPointers.qnnInterface.deviceCreate(logHandle, devConfigArray, &deviceHandle);
    if (QNN_SUCCESS != ret) {
      QNN_ERROR("Failed to create device: %u", ret);
      return StatusCode::FAILURE;
    }

Set `devConfig` as defined in the QNN HTP Backend API.

#### Register op packages

Op packages are a way to supply libraries containing ops to backends. They can be registered as shown below:

    uint32_t opPackageCount;
    char* opPackagePath[opPackageCount];
    char* opPackageInterfaceProvider[opPackageCount];
    /* Set up required op package paths and interface providers as necessary */
    for (uint32_t idx = 0; idx < opPackageCount; idx++) {
      if (QNN_BACKEND_NO_ERROR != m_qnnFunctionPointers.qnnInterface.backendRegisterOpPackage(
                                      backendHandle,
                                      opPackagePath[idx],
                                      opPackageInterfaceProvider[idx])) {
        QNN_ERROR("Could not register Op Package: %s and interface provider: %s",
                  opPackagePath[idx],
                  opPackageInterfaceProvider[idx]);
        return StatusCode::FAILURE;
      }
    }

#### Create context

A context can be created in a backend as shown below:

    Qnn_ContextHandle_t context;
    Qnn_DeviceHandle_t deviceHandle{nullptr};
    const QnnContext_Config_t* contextConfigs;
    /* Set up any context configs that are necessary */
    if (QNN_CONTEXT_NO_ERROR != m_qnnFunctionPointers.qnnInterface.contextCreate(
                                    backendHandle, deviceHandle, &contextConfigs, &context)) {
      QNN_ERROR("Could not create context");
      return StatusCode::FAILURE;
    }

#### Prepare graphs

*qnn-sample-app* relies on the output from one of the converters to create a QNN network in the backend. *composeGraphsFnHandle* is mapped to the *QnnModel\_composeGraphs* API in the model shared library, which takes `qnn_wrapper_api::GraphInfo_t***` as one of its parameters. The function *composeGraphsFnHandle* makes the necessary calls to the backend to create one or more networks. It also writes the information required to execute a graph, such as details about its input and output tensors, into the structure *graphsInfo* as shown in the following code block:

    /* Structure to retrieve information about graphs, like graph name, details about
       input and output tensors present in libQnnSampleModel.so */
    qnn_wrapper_api::GraphInfo_t** graphsInfo;
    // No. of graphs present in libQnnSampleModel.so
    uint32_t graphsCount;
    // true to enable intermediate outputs, false for network outputs only
    bool debug;
    if (qnn_wrapper_api::ModelError_t::MODEL_NO_ERROR !=
        m_qnnFunctionPointers.composeGraphsFnHandle(backendHandle,
                                                    m_qnnFunctionPointers.qnnInterface,
                                                    context,
                                                    &graphsInfo,
                                                    &graphsCount,
                                                    debug)) {
      QNN_ERROR("Failed in composeGraphs()");
      return StatusCode::FAILURE;
    }

At this point, the context contains all the graphs that were present in *libQnnSampleModel.so*.

#### Finalize graphs

Graphs that were added in the previous step can be finalized as shown below:

    // Information about graphs obtained in the previous step
    qnn_wrapper_api::GraphInfo_t** graphsInfo;
    // No. of graphs obtained in the previous step
    uint32_t graphsCount;
    /* A valid profile handle if profiling is desired,
       nullptr if profiling is not needed */
    Qnn_ProfileHandle_t profileHandle;
    for (size_t graphIdx = 0; graphIdx < graphsCount; graphIdx++) {
      if (QNN_GRAPH_NO_ERROR != m_qnnFunctionPointers.qnnInterface.graphFinalize(
                                    (*graphsInfo)[graphIdx].graph, profileHandle, nullptr)) {
        return StatusCode::FAILURE;
      }
      /* Extract profiling information if desired and if a valid handle was supplied
         to the graph finalize API */
    }

#### Save context into a binary

After all the graphs in a context are finalized, the application may save the context into a binary for future use. A saved context can later be retrieved to execute the graphs it contains without finalizing them again, which saves considerable initialization time when running a network. The context can be saved as shown below:

    // Get the expected size of the buffer from the backend in which the context can be saved
    if (QNN_CONTEXT_NO_ERROR !=
        m_qnnFunctionPointers.qnnInterface.contextGetBinarySize(context, &requiredBufferSize)) {
      QNN_ERROR("Could not get the required binary size.");
      return StatusCode::FAILURE;
    }
    // Allocate a buffer of the required size
    saveBuffer = (uint8_t*)malloc(requiredBufferSize * sizeof(uint8_t));
    if (nullptr == saveBuffer) {
      QNN_ERROR("Could not allocate buffer to save binary.");
      return StatusCode::FAILURE;
    }
    auto status = StatusCode::SUCCESS;
    uint32_t writtenBufferSize{0};
    // Pass the allocated buffer and obtain a copy of the context binary written into the buffer
    if (QNN_CONTEXT_NO_ERROR !=
        m_qnnFunctionPointers.qnnInterface.contextGetBinary(context,
                                                            reinterpret_cast<void*>(saveBuffer),
                                                            requiredBufferSize,
                                                            &writtenBufferSize)) {
      QNN_ERROR("Could not get binary.");
      status = StatusCode::FAILURE;
    }
    // Check that the backend did not write more data than the allocated buffer can hold
    if (requiredBufferSize < writtenBufferSize) {
      QNN_ERROR(
          "Illegal written buffer size [%d] bytes. Cannot exceed allocated memory of [%d] bytes",
          writtenBufferSize,
          requiredBufferSize);
      status = StatusCode::FAILURE;
    }
    // Use caching utility to save metadata along with the binary buffer from the backend
    if (status == StatusCode::SUCCESS &&
        tools::datautil::StatusCode::SUCCESS !=
            tools::datautil::writeBinaryToFile(
                outputPath, saveBinaryName + ".bin", (uint8_t*)saveBuffer, writtenBufferSize)) {
      QNN_ERROR("Could not serialize to file.");
      status = StatusCode::FAILURE;
    }
    // Release the temporary buffer once the binary has been written
    free(saveBuffer);

#### Load context from a cached binary

A context that was saved into a binary, as in the previous step, can be loaded as an alternative to creating a new context every time. The code snippet below demonstrates this step:

    auto returnStatus = StatusCode::SUCCESS;
    std::shared_ptr<uint8_t> buffer{nullptr};
    uint32_t graphsCount{0};
    buffer = std::shared_ptr<uint8_t>(new uint8_t[bufferSize], std::default_delete<uint8_t[]>());
    if (!buffer) {
      QNN_ERROR("Failed to allocate memory.");
      return StatusCode::FAILURE;
    }
    if (tools::datautil::StatusCode::SUCCESS !=
        tools::datautil::readBinaryFromFile(
            cachedBinaryPath, reinterpret_cast<uint8_t*>(buffer.get()), bufferSize)) {
      QNN_ERROR("Failed to read binary file.");
      returnStatus = StatusCode::FAILURE;
    }
    /* Create a QnnSystemContext handle to access system context APIs. */
    QnnSystemContext_Handle_t sysCtxHandle{nullptr};
    if (QNN_SUCCESS != m_qnnFunctionPointers.qnnSystemInterface.systemContextCreate(&sysCtxHandle)) {
      QNN_ERROR("Could not create system handle.");
      returnStatus = StatusCode::FAILURE;
    }
    /* Retrieve metadata from the context binary through QNN System Context API. */
    QnnSystemContext_BinaryInfo_t* binaryInfo{nullptr};
    uint32_t binaryInfoSize{0};
    if (StatusCode::SUCCESS == returnStatus &&
        QNN_SUCCESS != m_qnnFunctionPointers.qnnSystemInterface.systemContextGetBinaryInfo(
                           sysCtxHandle,
                           static_cast<void*>(buffer.get()),
                           bufferSize,
                           &binaryInfo,
                           &binaryInfoSize)) {
      QNN_ERROR("Failed to get context binary info");
      returnStatus = StatusCode::FAILURE;
    }
    qnn_wrapper_api::GraphInfo_t** graphsInfo;
    /* Make a copy of the metadata. */
    if (StatusCode::SUCCESS == returnStatus &&
        !copyMetadataToGraphsInfo(binaryInfo, graphsInfo, graphsCount)) {
      QNN_ERROR("Failed to copy metadata.");
      returnStatus = StatusCode::FAILURE;
    }
    /* Release resources associated with previously created QnnSystemContext handle. */
    m_qnnFunctionPointers.qnnSystemInterface.systemContextFree(sysCtxHandle);
    sysCtxHandle = nullptr;
    /* buffer contains the binary data that was previously obtained from a backend.
       Pass this cached binary data to the backend to recreate the same context. */
    if (StatusCode::SUCCESS == returnStatus &&
        QNN_SUCCESS != m_qnnFunctionPointers.qnnInterface.contextCreateFromBinary(
                           backendHandle,
                           deviceHandle,
                           (const QnnContext_Config_t**)&contextConfig,
                           static_cast<void*>(buffer.get()),
                           bufferSize,
                           &context,
                           profileBackendHandle)) {
      QNN_ERROR("Could not create context from binary.");
      returnStatus = StatusCode::FAILURE;
    }
    // Optionally, extract profiling numbers if desired
    if (ProfilingLevel::OFF != m_profilingLevel) {
      extractBackendProfilingInfo(profileBackendHandle);
    }
    /* Obtain and save graph handles for each graph present in the context based on the
       saved graph names in the metadata */
    if (StatusCode::SUCCESS == returnStatus) {
      for (size_t graphIdx = 0; graphIdx < graphsCount; graphIdx++) {
        if (QNN_SUCCESS != m_qnnFunctionPointers.qnnInterface.graphRetrieve(
                               context,
                               (*graphsInfo)[graphIdx].graphName,
                               &((*graphsInfo)[graphIdx].graph))) {
          QNN_ERROR("Unable to retrieve graph handle for graph Idx: %zu", graphIdx);
          returnStatus = StatusCode::FAILURE;
        }
      }
    }

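The snippet above uses `bufferSize` without showing where it comes from. One way to obtain it is to query the file size before allocating. The following minimal sketch uses only the C++ standard library rather than the SDK's `tools::datautil` helpers. `readWholeFile` is a hypothetical helper name, not an SDK function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the QNN SDK): read an entire file, such as
// a cached context binary, into memory. Returns an empty vector on failure.
std::vector<uint8_t> readWholeFile(const std::string& path) {
  assert(!path.empty());
  // Open positioned at the end so that tellg() reports the file size.
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) {
    return {};
  }
  const std::streamsize size = in.tellg();
  if (size <= 0) {
    return {};
  }
  std::vector<uint8_t> buffer(static_cast<std::size_t>(size));
  // Rewind and read the whole file in one call.
  in.seekg(0, std::ios::beg);
  if (!in.read(reinterpret_cast<char*>(buffer.data()), size)) {
    return {};
  }
  return buffer;
}
```

With a helper like this, `buffer.data()` and `buffer.size()` could play the roles of the binary pointer and `bufferSize` in the calls above, and the vector releases the memory automatically when it goes out of scope.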
#### Run graphs

After a context has been created and its graphs have been added and finalized, or after a context has been retrieved from a binary, one or more graphs in the context can be executed. Running a graph involves:

1. Setting up input and output tensors.
2. Populating input data into input tensors.
3. Calling the execute method in the backend.
4. Obtaining outputs and saving them.

This is demonstrated using the code snippet below:

    // Iterate over the graphs in this context; graphsInfo and graphsCount come from
    // the earlier steps
    for (uint32_t graphIdx = 0; graphIdx < graphsCount; graphIdx++) {
      QNN_DEBUG("Starting execution for graphIdx: %d", graphIdx);
      Qnn_Tensor_t* inputs = nullptr;
      Qnn_Tensor_t* outputs = nullptr;
      // IOTensor utility is used to set up input and output tensor structures
      if (iotensor::StatusCode::SUCCESS !=
          ioTensor.setupInputAndOutputTensors(&inputs, &outputs, (*graphsInfo)[graphIdx])) {
        QNN_ERROR("Error in setting up Input and output Tensors for graphIdx: %d", graphIdx);
        returnStatus = StatusCode::FAILURE;
        break;
      }
      // Grab input raw file paths to read input data
      auto inputFileList = inputFileLists[graphIdx];
      auto graphInfo = (*graphsInfo)[graphIdx];
      if (!inputFileList.empty()) {
        /* qnn-sample-app reads data based on the batch size until the whole buffer
           is filled. If there isn't sufficient data, it pads the rest with zeroes. */
        size_t totalCount = inputFileList[0].size();
        while (!inputFileList[0].empty()) {
          size_t startIdx = (totalCount - inputFileList[0].size());
          // IOTensor utility is used to populate input tensors with input data
          if (iotensor::StatusCode::SUCCESS !=
              ioTensor.populateInputTensors(
                  graphIdx, inputFileList, inputs, graphInfo, inputDataType)) {
            returnStatus = StatusCode::FAILURE;
          }
          if (StatusCode::SUCCESS == returnStatus) {
            // Execute the graph in the backend with optional profile handle
            QNN_DEBUG("Successfully populated input tensors for graphIdx: %d", graphIdx);
            Qnn_ErrorHandle_t executeStatus = QNN_GRAPH_NO_ERROR;
            executeStatus =
                m_qnnFunctionPointers.qnnInterface.graphExecute(graphInfo.graph,
                                                                inputs,
                                                                graphInfo.numInputTensors,
                                                                outputs,
                                                                graphInfo.numOutputTensors,
                                                                profileBackendHandle,
                                                                nullptr);
            if (QNN_GRAPH_NO_ERROR != executeStatus) {
              returnStatus = StatusCode::FAILURE;
            }
            if (StatusCode::SUCCESS == returnStatus) {
              QNN_DEBUG("Successfully executed graphIdx: %d ", graphIdx);
              // IOTensor utility is used to write output tensors to raw files
              if (iotensor::StatusCode::SUCCESS !=
                  ioTensor.writeOutputTensors(graphIdx,
                                              startIdx,
                                              graphInfo.graphName,
                                              outputs,
                                              graphInfo.outputTensors,
                                              graphInfo.numOutputTensors,
                                              outputDataType,
                                              graphsCount,
                                              outputPath)) {
                returnStatus = StatusCode::FAILURE;
              }
            }
          }
          if (StatusCode::SUCCESS != returnStatus) {
            QNN_ERROR("Execution of Graph: %d failed!", graphIdx);
            break;
          }
        }
      }
      // Clean up all the tensors after execution is completed
      ioTensor.tearDownInputAndOutputTensors(
          inputs, outputs, graphInfo.numInputTensors, graphInfo.numOutputTensors);
      inputs = nullptr;
      outputs = nullptr;
      if (StatusCode::SUCCESS != returnStatus) {
        break;
      }
    }

IOTensor is a utility provided with the source code at `${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleApp/src/Utils/IOTensor.cpp`. It exposes a few methods that help with the execution of a graph, which were used in the previous code snippet:

1. *setupInputAndOutputTensors* to set up structures related to input and output tensors.
2. *populateInputTensors* to copy input data into input tensor structures.
3. *tearDownInputAndOutputTensors* to clean up resources associated with input and output tensors.

Refer to the IOTensor source code for more details about these APIs.

#### Free context

After all execution is completed, the context can be freed as shown below:

    if (QNN_CONTEXT_NO_ERROR !=
        m_qnnFunctionPointers.qnnInterface.contextFree(context, profileBackendHandle)) {
      QNN_ERROR("Could not free context");
      return StatusCode::FAILURE;
    }

#### Terminate backend

The backend can be terminated as shown below:

    if (QNN_BACKEND_NO_ERROR != m_qnnFunctionPointers.qnnInterface.backendFree(backendHandle)) {
      QNN_ERROR("Could not free backend");
      return StatusCode::FAILURE;
    }

## SNPE sample app

For C++ API and sample app execution using SNPE, see the [Qualcomm AI Runtime SDK documentation](https://docs.qualcomm.com/nav/home/usergroup8.html?product=1601111740009302).

Last Published: Jun 23, 2026