# QNN GPU QnnMem API Tutorial

## Introduction

This tutorial demonstrates the usage of the QnnMem API for the QNN GPU backend.
This feature allows for data sharing between processing domains in the QNN GPU
backend.

The QnnMem API for the QNN GPU backend supports the following type of shared memory:

| Qnn\_MemDescriptor\_t Type | QnnGpu\_MemType\_t Type | Description |
| --- | --- | --- |
| QNN\_MEM\_TYPE\_CUSTOM | QNN\_GPU\_MEM\_OPENCL | <ul><li>Each tensor is mapped to its own OpenCL buffer</li><li>One-to-one relationship between OpenCL buffer and memory handle</li></ul> |

Note

This tutorial focuses only on QNN GPU Mem OpenCL buffer usage. Some prerequisites in the
SDK example code are not discussed in detail here. For those, refer to the corresponding part of the
QNN documentation or to the SampleApp.

SampleApp documentation: general/sample\_app:Sample App Tutorial

SampleApp code: ${QNN\_SDK\_ROOT}/examples/QNN/SampleApp

## Using QNN\_MEM\_TYPE\_CUSTOM with the QNN API

This section demonstrates the custom memory type feature of the QnnMem API for the QNN
GPU backend, which allows a user to allocate, manage, and register their own OpenCL buffers for input
and output tensors. Doing so achieves zero-copy: data no longer has to be copied between GPU
memory and host memory before and after execution, which reduces inference time.

The following diagram shows how OpenCL buffers are used, where each tensor has its own OpenCL
buffer and memory handle.

![QNN GPU OpenCL buffers](../../_static/resources/qnn_gpu_opencl_buffers.png)
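Because each tensor gets its own buffer, the size of each buffer follows from that tensor alone: the product of its dimensions multiplied by the size of one element. The sketch below is a minimal, self-contained illustration of that arithmetic. The helper names `tensorByteSize` and `elementCount` are hypothetical and are not part of the QNN or OpenCL APIs, and the sketch assumes a densely packed tensor with no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not a QNN API): computes the byte size of a densely
// packed tensor. Returns false if the shape is empty, any dimension or the
// element size is zero, or the product would overflow size_t.
bool tensorByteSize(const std::vector<std::uint32_t>& dims,
                    std::size_t bytesPerElement,
                    std::size_t& outBytes) {
  if (dims.empty() || bytesPerElement == 0) return false;
  std::size_t total = bytesPerElement;
  for (std::uint32_t d : dims) {
    if (d == 0) return false;
    if (total > std::numeric_limits<std::size_t>::max() / d) return false;
    total *= d;
  }
  outBytes = total;
  return true;
}

// Hypothetical helper: converts a byte size back to an element count, which is
// what pointer arithmetic on a typed pointer (e.g. float*) needs.
std::size_t elementCount(std::size_t byteSize, std::size_t bytesPerElement) {
  assert(bytesPerElement != 0 && byteSize % bytesPerElement == 0);
  return byteSize / bytesPerElement;
}
```

For a tensor with dimensions {64, 128} and 4-byte elements, this gives 32768 bytes, or 8192 elements. Keeping the byte size and the element count as separate values helps avoid passing one where the other is expected, for example when mapping a buffer (bytes) versus iterating over a typed pointer (elements).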

Below is an example implementation using this API. It assumes that the user is already familiar
with OpenCL and its APIs.

GPU OpenCL Buffer Example

    // QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h
    QnnInterface_t qnnInterface;
    // Init qnn interface ......
    // See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code

    // Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
    Qnn_Tensor_t outputTensor;
    // Set up common settings for outputTensor ......
    /* There are 2 specific settings for a custom QnnMem buffer
     * (see the outputTensor assignments just before registration below):
     *  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE;
     *  2. union member memHandle should be used instead of clientBuf, and it
     *     should be set to nullptr.
     */

    // Allocate a buffer, for example for a float tensor with dimensions {64, 128} for the GPU runtime
    size_t bufferSize = 64 * 128 * sizeof(float);
    auto outputTensorBuffer = static_cast<float*>(malloc(bufferSize));

    // Generate OpenCL context
    auto clContext = ...;
    // A non-null host pointer requires CL_MEM_USE_HOST_PTR (or CL_MEM_COPY_HOST_PTR)
    const cl_mem_flags memFlags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR;
    cl_int clStatus;
    auto outputTensorCLBuffer = new cl::Buffer(*clContext, memFlags, bufferSize, outputTensorBuffer, &clStatus);
    if (clStatus != CL_SUCCESS) {
        // Handle error
    }

    // Fill the info of Qnn_MemDescriptor_t and register the buffer to QNN
    // Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
    Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
    memDescriptor.memShape = {outputTensor.rank, outputTensor.dimensions, nullptr};
    memDescriptor.dataType = outputTensor.dataType;
    memDescriptor.memType = QNN_MEM_TYPE_CUSTOM;

    // Fill the info of QnnGpu_MemInfoCustom_t to apply to the Qnn_MemDescriptor_t.
    QnnGpu_MemInfoCustom_t customInfo = QNN_GPU_MEM_INFO_CUSTOM_INIT;
    customInfo.memType = QNN_GPU_MEM_OPENCL;
    // Register the OpenCL buffer object rather than the host allocation
    // (assumption: confirm the expected handle form in QnnGpuMem.h)
    customInfo.buffer = reinterpret_cast<QnnGpuMem_Buffer_t>(outputTensorCLBuffer);
    // Assumes Qnn_MemInfoCustom_t is an opaque pointer type; check QnnMem.h
    memDescriptor.customInfo = &customInfo;
    outputTensor.memType = QNN_TENSORMEMTYPE_MEMHANDLE;
    outputTensor.memHandle = nullptr;

    Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
    // To obtain QNN context handle:
    // For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
    // For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary
    Qnn_MemHandle_t memHandles[1];
    auto result = QnnMem_register(context, &memDescriptor, 1u, memHandles);
    if (QNN_SUCCESS != result) {
        // handle errors
    }
    // Attach the registered handle to the tensor
    outputTensor.memHandle = memHandles[0];

    /**
     * At this point, the allocation and registration of the OpenCL buffer are complete.
     * On the user side, this buffer can be accessed through outputTensorCLBuffer.
     */

    // Load the input data for the graph ......

    // Execute QNN graph with input tensor and output tensor ......

    // Get output data for example
    auto openCLCommandQueue = ...; // Get cl::CommandQueue instance
    auto mappedPtr =
        static_cast<float*>(openCLCommandQueue->enqueueMapBuffer(*outputTensorCLBuffer,
                                                                 CL_TRUE,
                                                                 CL_MAP_READ,
                                                                 0,
                                                                 bufferSize, // size in bytes
                                                                 nullptr,
                                                                 nullptr,
                                                                 &clStatus));
    if (clStatus != CL_SUCCESS) {
        // handle error
    }

    // Access contents of mappedPtr e.g.
    // bufferSize is in bytes, so convert it to an element count first
    std::vector<float> contents(mappedPtr, mappedPtr + bufferSize / sizeof(float));
    for (size_t i = 0u; i < contents.size(); i++) {
        // Read data
    }

    // On completion, unmap the mappedPtr
    clStatus = openCLCommandQueue->enqueueUnmapMemObject(*outputTensorCLBuffer, mappedPtr, nullptr, nullptr);
    if (clStatus != CL_SUCCESS) {
        // handle error
    }

    // Deregister and free all buffers once they are no longer in use
    result = QnnMem_deregister(memHandles, 1u);
    if (QNN_SUCCESS != result) {
        // handle errors
    }

    // Deallocate memory
    delete outputTensorCLBuffer;
    free(outputTensorBuffer);
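The cleanup steps at the end of the example (deregistering the memory handle and freeing the buffers) need to run on every exit path, including the error branches that the example leaves as comments. One common C++ way to arrange this is a scope guard. The sketch below is illustrative only: `ScopedCleanup`, `FakeRegistry`, `fakeRegister`, and `fakeDeregister` are hypothetical stand-ins so the code compiles without the QNN SDK. In a real application the guard's callback would wrap the actual deregistration and deallocation calls.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs a cleanup callback exactly once when it goes out of scope.
class ScopedCleanup {
 public:
  explicit ScopedCleanup(std::function<void()> cleanup)
      : cleanup_(std::move(cleanup)) {}
  ~ScopedCleanup() {
    if (cleanup_) cleanup_();
  }
  ScopedCleanup(const ScopedCleanup&) = delete;
  ScopedCleanup& operator=(const ScopedCleanup&) = delete;

  // Cancels the cleanup, e.g. when ownership is handed elsewhere.
  void dismiss() { cleanup_ = nullptr; }

 private:
  std::function<void()> cleanup_;
};

// Hypothetical stand-ins for memory registration and deregistration.
// They only count calls so the pairing can be observed.
struct FakeRegistry {
  int registered = 0;
  int deregistered = 0;
};

bool fakeRegister(FakeRegistry& r) {
  ++r.registered;
  return true;
}

void fakeDeregister(FakeRegistry& r) {
  assert(r.deregistered < r.registered);  // never deregister more than was registered
  ++r.deregistered;
}

// Registers, runs `work`, and deregisters on every exit path.
bool runWithRegisteredBuffer(FakeRegistry& r, const std::function<bool()>& work) {
  if (!fakeRegister(r)) return false;
  ScopedCleanup guard([&r] { fakeDeregister(r); });
  return work();  // the guard fires when this function exits, whether work succeeds or fails
}
```

With this shape, an early `return` from an error branch no longer skips deregistration, because the guard's destructor runs when the enclosing scope ends.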

Last Published: Jun 04, 2026
