# Creating a UDO Package

This section describes the process of creating a UDO package
from a simple text specification of a user-defined operation
using the
[snpe-udo-package-generator](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_tools.html#snpe-udo-package-generator).
From the Qualcomm® Neural Processing SDK API standpoint, a UDO package consists of a
registration library and one or more implementation libraries.
As such, while a user can create a UDO package independent of
this prescription, this section describes the process of
creating a partially defined UDO package which can be easily
implemented and compiled to produce the relevant libraries.

## Generating UDO Skeleton Code

To generate a package using Qualcomm® Neural Processing SDK tools, it is necessary to
create a UDO configuration describing the operation and the
package details. See [Defining a UDO
Package](https://docs.qualcomm.com/doc/80-63442-10/topic/udo_package_definition.html) for more information.
Once a configuration has been specified to adequately represent
the desired UDO, it can be supplied as an argument to the Qualcomm® Neural Processing SDK
UDO package generator tool described in
[snpe-udo-package-generator](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_tools.html#snpe-udo-package-generator).
The intention of the tool is to generate partial skeleton code
to aid rapid prototyping. This section describes the usage of
the package generator tool and the artifacts it generates.

To run the
[snpe-udo-package-generator](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_tools.html#snpe-udo-package-generator),
first complete the setup instructions at
[Qualcomm® Neural Processing SDK Setup](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_setup.html). The tool also depends on
the Mako Template Library, which can be downloaded from
[https://www.makotemplates.org/download.html](https://www.makotemplates.org/download.html). In addition,
generating the skeleton code requires an extracted Qualcomm® AI Direct SDK; the SDK
only needs to be unzipped, not set up. For details, refer to the
Qualcomm® AI Direct SDK documentation at `$QNN_SDK_ROOT/docs/index.html`,
where `QNN_SDK_ROOT` is the location of the unzipped Qualcomm® AI Direct SDK.
Set the `QNN_SDK_ROOT` environment variable to that location.
Once setup is complete, the following command can be
used to generate a package:

    snpe-udo-package-generator -p $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/config/Softmax_Htp.json -o <my-dir>

The above command creates a UDO package: a directory of
skeleton code and build files that can be used to compile the
package contents into stand-alone shared libraries. The config
file referenced in [UDO
Tutorial](https://docs.qualcomm.com/doc/80-63442-10/topic/tutorial_inceptionv3_udo.html) was used to
generate the UDO package contents below:

    |-- Makefile
    |-- common.mk
    |-- config
    |   `-- Softmax_Htp.json
    |-- include
    |   `-- utils
    |       |-- IUdoOpDefinition.hpp
    |       |-- UdoMacros.hpp
    |       `-- UdoUtil.hpp
    `-- jni
        |-- Android.mk
        |-- Application.mk
        `-- src
            |-- CPU
            |   |-- Makefile
            |   |-- makefiles
            |   |   |-- Android.mk
            |   |   |-- Application.mk
            |   |   `-- Makefile.linux-x86_64
            |   `-- src
            |       |-- CpuCustomOpPackage.cpp
            |       |-- SoftmaxUdoPackageInterface.cpp
            |       |-- ops
            |       |   `-- Softmax.cpp
            |       `-- utils
            |           |-- BackendUtils.hpp
            |           |-- CPU
            |           |   |-- CpuBackendUtils.cpp
            |           |   `-- CpuBackendUtils.hpp
            |           `-- CustomOpUtils.hpp
            |-- DSP_V68
            |   |-- Makefile
            |   `-- src
            |       |-- SoftmaxUdoPackageInterface.cpp
            |       `-- ops
            |           `-- Softmax.cpp
            |-- GPU
            |   |-- Makefile
            |   |-- include
            |   |   |-- GpuCustomOpPackage.hpp
            |   |   `-- Operation.hpp
            |   |-- makefiles
            |   |   |-- Android.mk
            |   |   `-- Application.mk
            |   `-- src
            |       |-- GpuCustomOpPackage.cpp
            |       |-- SoftmaxUdoPackageInterface.cpp
            |       `-- ops
            |           `-- Softmax.cpp
            |-- reg
            |   |-- Makefile
            |   `-- SoftmaxUdoPackageRegLib.cpp
            `-- utils
                `-- UdoUtil.cpp

### Contents of a UDO package

- The package can be compiled using the make build system for
a Linux host machine or the Android NDK build system for an
Android device. Briefly, the make system is configured using
the top-level **Makefile**, **common.mk**, and the individual
makefiles in each runtime directory. The Android build
system is configured using **jni/Android.mk** and
**jni/Application.mk**. See [Compiling a UDO
package](https://docs.qualcomm.com/doc/80-63442-10/topic/compiling_udo_package.html) for more compilation
details.
- The config directory contains the JSON configuration used to
create the package.
- The include directory contains three kinds of files: headers
from the Qualcomm® Neural Processing SDK UDO API, header files specific to the UDO
package and its operations, and a directory of C++ helper
utils which wrap the Qualcomm® Neural Processing SDK UDO API calls. Users should note
that the utils API is included simply for convenience in
creating implementation source code. The use of the utils is
not a prerequisite for constructing or executing a UDO
package.
- The relevant source files for the package are organized
under the **jni/src** directory. There will be a
sub-directory for each core-type specified in the config.
The registration (reg) directory contains files necessary to
create the registration library, which is generally the
point of entry for the Qualcomm® Neural Processing SDK API. There is also source code
from the previously mentioned C++ helper utils. In general,
users are only expected to edit code contained in
runtime-specific or registration directories.

## Generated Source Code

This section and the following sub-sections cover the source
code generated in a package using the package contents
displayed in [Generating UDO Skeleton
Code](https://docs.qualcomm.com/doc/80-63442-10/topic/creating_udo_package.html#generating-udo-implementation-stubs).
When finalized, a UDO package is expected to contain a
registration library and one or more implementation libraries.
To produce the registration library, the source code in
**jni/src/reg** is compiled. Each implementation library is
compiled from the source code in its core-type specific
directory. Recall that the package created by the tool
still needs to be implemented. The following subsections
address the files that need to be implemented. All generated
source code will have the tag **Auto-generated** in the header.
The source code is considered partially complete in the
generation stage, and it is the user’s responsibility to
implement certain files as needed to ensure proper
compatibility and functionality with the Qualcomm® Neural Processing SDK API. All code to
be implemented will have the tag **add code here** in the body
to indicate that it needs to be implemented. Note that all
libraries link against the C++ utils source code.

### Completing the Registration Skeleton Code

As mentioned previously, the registration library is created
from source code in **jni/src/reg**. The directory contains a
Makefile to compile the package and a package-specific file,
**SoftmaxUdoPackageRegLib.cpp**, which contains the function
symbols that are resolved by the Qualcomm® Neural Processing SDK UDO API when the library
is opened. The registration library file contains API calls
that provide the Qualcomm® Neural Processing SDK UDO API with information about the nature
of the operations in the model, as well as the implementation
libraries they belong to.

### Completing the Implementation Skeleton Code

The implementation library is created per core-type, from
source code that lives under the core-type specific directory
within **jni/src**. Using the CPU runtime as an example, the
**jni/src/CPU** directory contains a Makefile to build the CPU
implementation library, a package-specific source file:
**SoftmaxUdoPackageInterface.cpp** for all operations to be
contained in the library, and a per operation source file:
**Softmax.cpp** that should contain the runtime implementation.
As in the registration case, the package-specific source file
generally should not be edited. It likewise
contains methods that return information about the operations
contained in the implementation library, and methods that act
as a layer of indirection above the code that is ultimately
executed in the per operation file. In the CPU case, the user
is responsible for editing the three methods in **Softmax.cpp**:
**finalize**, **execute**, and **free**. These
methods create the operation, execute its implementation, and
free the operation, respectively, so their behavior is completely
determined by the user. A sample of the generated
implementation source is included below:

    Qnn_ErrorHandle_t execute(CustomOp* operation) {

      /**
       * Add code here
       **/

      return QNN_SUCCESS;
    }

    Qnn_ErrorHandle_t finalize(const CustomOp* operation) {
      QNN_CUSTOM_BE_ENSURE_EQ(operation->numInput(), 1, QNN_OP_PACKAGE_ERROR_VALIDATION_FAILURE)
      QNN_CUSTOM_BE_ENSURE_EQ(operation->numOutput(), 1, QNN_OP_PACKAGE_ERROR_VALIDATION_FAILURE)

      /**
       * Add code here
       **/

      return QNN_SUCCESS;
    }

    Qnn_ErrorHandle_t free(CustomOp& operation) {

      /**
       * Add code here
       **/

      return QNN_SUCCESS;
    }
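The **Add code here** body of **execute** is where the operation's arithmetic goes. As a minimal sketch only, the numeric core of a softmax could look like the helper below. It assumes float32 data laid out as contiguous rows; `softmaxRows` and its parameters are hypothetical names, not part of the generated package, and obtaining the input and output buffers from the `CustomOp` object is deliberately not shown because that depends on the generated utility headers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not generated by the tool): numerically stable softmax
// applied independently to `rows` contiguous rows of `depth` floats each.
// Performs no heap allocation.
void softmaxRows(const float* in, float* out, std::size_t rows, std::size_t depth) {
  assert(in != nullptr && out != nullptr);
  if (depth == 0) {
    return;  // nothing to normalize
  }
  for (std::size_t r = 0; r < rows; ++r) {
    const float* x = in + r * depth;
    float* y = out + r * depth;

    // Subtract the row maximum before exponentiating to avoid overflow.
    float maxVal = x[0];
    for (std::size_t i = 1; i < depth; ++i) {
      if (x[i] > maxVal) {
        maxVal = x[i];
      }
    }

    // Store the exponentials and accumulate their sum.
    float sum = 0.0f;
    for (std::size_t i = 0; i < depth; ++i) {
      y[i] = std::exp(x[i] - maxVal);
      sum += y[i];
    }

    // Normalize so that each row sums to 1.
    for (std::size_t i = 0; i < depth; ++i) {
      y[i] /= sum;
    }
  }
}
```

In a completed package, **execute** would call a helper of this kind with the input and output buffers of the operation and then return `QNN_SUCCESS`.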

For good performance and stability, avoid heap memory
allocation in the completed op execution functions, which run
during graph execution. These functions are:

- **&lt;op\_name&gt;Impl** for DSP V68 and later
- **&lt;op\_name&gt;\_executeOp** for DSP V66 / V65
- **&lt;op\_name&gt;Operation** for GPU
- **execute** for CPU

Heap memory allocation includes, but is not
limited to, calling `malloc` or `operator new`,
constructing STL container objects such as `std::vector`
with the default allocator, and adding items to such
containers, for example by calling
`std::vector::push_back`.

Heap memory allocation should be avoided because the
time it takes is unbounded and can vary widely.
On DSP and HTP in particular, heap
memory allocation can trigger a CPU request in some cases and
significantly reduce inference speed. Heap
memory allocation can also fail, returning a null pointer or throwing
an exception, and in that case there is usually no good way to
continue execution. In applications with strict
functional safety requirements, heap memory allocation after
initialization is not permitted at all.
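One way to check that a code path stays free of heap allocation is to count calls to the global `operator new` in a host-side test build. The sketch below is an assumption-laden illustration, not part of the SDK: `g_newCalls`, `countAllocations`, `sumWithVector`, and `sumWithArray` are hypothetical names, and the counter only observes allocations that go through `operator new` (not direct `malloc` calls).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical test-only counter, incremented on every global operator new.
static std::size_t g_newCalls = 0;

// Replace the global allocation functions so that allocations can be counted.
void* operator new(std::size_t size) {
  ++g_newCalls;
  if (void* p = std::malloc(size == 0 ? 1 : size)) {
    return p;
  }
  throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Runs fn and returns how many times operator new was called while it ran.
template <typename Fn>
std::size_t countAllocations(Fn&& fn) {
  const std::size_t before = g_newCalls;
  fn();
  return g_newCalls - before;
}

// Copies the input into a std::vector first: push_back allocates on the heap.
float sumWithVector(const float* in, std::size_t n) {
  std::vector<float> tmp;
  for (std::size_t i = 0; i < n; ++i) {
    tmp.push_back(in[i]);
  }
  float sum = 0.0f;
  for (float v : tmp) {
    sum += v;
  }
  return sum;
}

// Same computation with a fixed-size std::array on the stack.
// Precondition: n <= 16 (an assumed upper bound for this sketch).
float sumWithArray(const float* in, std::size_t n) {
  std::array<float, 16> tmp{};
  assert(n <= tmp.size());
  for (std::size_t i = 0; i < n; ++i) {
    tmp[i] = in[i];
  }
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum += tmp[i];
  }
  return sum;
}
```

Wrapping a call in `countAllocations` is expected to report at least one allocation for `sumWithVector` and none for `sumWithArray`. A check of this kind belongs in a test harness, not in the shipped implementation library.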

If a working buffer is required to carry out the op
computation, here are some potential alternatives:

- **construct std::array instead of std::vector for local variables**:
Unlike `std::vector`, `std::array` uses stack
memory. This works if the maximum memory size can be known
in advance and the size is not large.
- **use output tensor space as scratch memory**: Each
execution function has at least one output tensor. You
can use the space of the output tensor as the scratch buffer
before you fill in the real output data. Please note that
the output tensor space can only be safely written in the
execution function which owns the output tensor.
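Both alternatives can be sketched on a simple computation that needs an intermediate buffer: subtracting the mean from a vector and then scaling it to unit L2 norm. This is an illustration under stated assumptions, not SDK code. The function names, the `kMaxDepth` bound, and the use of raw float pointers in place of tensor objects are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed upper bound on the vector length; 256 floats is 1 KiB of stack.
constexpr std::size_t kMaxDepth = 256;

// Variant 1: std::array on the stack holds the centered values.
// Returns false if the input does not fit in the fixed-size scratch buffer.
bool centerAndScaleWithArray(const float* in, float* out, std::size_t depth) {
  if (depth == 0 || depth > kMaxDepth) {
    return false;
  }
  std::array<float, kMaxDepth> scratch;
  float mean = 0.0f;
  for (std::size_t i = 0; i < depth; ++i) {
    mean += in[i];
  }
  mean /= static_cast<float>(depth);

  float sumSq = 0.0f;
  for (std::size_t i = 0; i < depth; ++i) {
    scratch[i] = in[i] - mean;
    sumSq += scratch[i] * scratch[i];
  }
  const float norm = std::sqrt(sumSq);
  for (std::size_t i = 0; i < depth; ++i) {
    out[i] = (norm > 0.0f) ? scratch[i] / norm : 0.0f;
  }
  return true;
}

// Variant 2: the output buffer itself holds the centered values until the
// final results overwrite them, so no size limit and no extra buffer is needed.
bool centerAndScaleInOutput(const float* in, float* out, std::size_t depth) {
  if (depth == 0) {
    return false;
  }
  float mean = 0.0f;
  for (std::size_t i = 0; i < depth; ++i) {
    mean += in[i];
  }
  mean /= static_cast<float>(depth);

  float sumSq = 0.0f;
  for (std::size_t i = 0; i < depth; ++i) {
    out[i] = in[i] - mean;  // intermediate value stored in the output space
    sumSq += out[i] * out[i];
  }
  const float norm = std::sqrt(sumSq);
  for (std::size_t i = 0; i < depth; ++i) {
    out[i] = (norm > 0.0f) ? out[i] / norm : 0.0f;  // final value
  }
  return true;
}
```

The first variant fails cleanly when the input exceeds the fixed bound, which is the trade-off of a stack buffer. The second variant has no such limit but is only valid inside the execution function that owns the output tensor.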

### Notes

- In the general case, the package should only require
functional edits that enable proper execution. The initial
un-implemented package is guaranteed to compile.
- One subtle distinction is that the generated DSP V65 or DSP
V66 implementation source code expects one operation per
implementation library, whereas in the CPU, GPU, and DSP V68
or later cases, there may be an arbitrary number of
operations in a library.
- There are differences between the implementation source
files for each runtime. In the GPU case, the **execute**
workflow is already implemented and the user is only
expected to implement the **&lt;OpName&gt;Operation** and
**setKernelInfo** methods. In contrast to CPU and GPU, DSP
uses an API that does not depend on the C++ helper utils discussed
in the [Generated Source
Code](https://docs.qualcomm.com/doc/80-63442-10/topic/creating_udo_package.html#udo_generated_source_code)
section. This means that certain helper methods and
constructors may not be available in the DSP case. In the DSP
case, the user is expected to implement the **softmaxImpl**
method.

Last Published: Jun 04, 2026
