# Compiling a UDO package

## Introduction

This section provides information about compiling UDO packages
for all supported runtimes in Qualcomm® Neural Processing SDK.

As explained in [Overview of UDO](https://docs.qualcomm.com/doc/80-63442-2/topic/udo_overview.html), a set
of registration and implementation libraries is collectively
referred to as a UDO package. Users have complete control
over building these libraries for their desired runtimes using
compatible toolchains. Alternatively, the Qualcomm® Neural Processing SDK provides tools
and utilities to create and compile a UDO package. For more
information about the tool used to create a UDO package, refer
to [Creating a UDO package](https://docs.qualcomm.com/doc/80-63442-2/topic/creating_udo_package.html). This
section explains UDO package compilation based on the directory
structure provided by the package generator.

## Implementing a User-defined operation

Fundamentally, a UDO must be developed using the
set of APIs defined in the header files located at
`$SNPE_ROOT/include/SNPE/SnpeUdo/`. Each runtime may impose
additional requirements and provide options for customizing
the implementation to suit the runtime. Details of the UDO
APIs can be found in the Qualcomm® Neural Processing SDK API
documentation.

This section assumes that a UDO package was generated using
the UDO package generator tool described in [Creating a UDO
package](https://docs.qualcomm.com/doc/80-63442-2/topic/creating_udo_package.html), which produces a
partial implementation skeleton based on the UDO
specification configured by the user.

## Make Targets for Package Compilation

The UDO package generator tool creates a makefile to compile
the package for a specific runtime and target platform
combination. The makefile is intended to provide a simple
interface to compile for platforms that use make natively or
require ndk-build. Using the provided makefile also allows for
per-library compilation for various targets.

The general form of each make target is `<runtime>_<platform>`.
Targets of the form `<runtime>` alone build that runtime for all
of its supported platforms. For instance, running

    make cpu

compiles the CPU implementation library for both the x86 and Android (arm64-v8a) platforms.
A comprehensive table of available make targets is presented
[below](https://docs.qualcomm.com/doc/80-63442-2/topic/compiling_udo_package.html#table-of-make-targets).
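As an illustration of the naming scheme only, the following sketch splits a target name into its two parts. The function name `split_target` and the use of `all` to stand for "every platform" are assumptions made for this example; they are not part of the generated makefile.

```shell
# Split a make target of the form <runtime>_<platform> into its two parts.
# A bare <runtime> target covers every supported platform, which this
# sketch reports as "all" (a label chosen here, not used by the makefile).
split_target() {
    target=$1
    case "$target" in
        *_*) runtime=${target%%_*}; platform=${target#*_} ;;
        *)   runtime=$target; platform=all ;;
    esac
    printf 'runtime=%s platform=%s\n' "$runtime" "$platform"
}

split_target cpu_android
split_target cpu
```

Note that the makefile also has `all` and `reg` target families, which select every runtime or only the registration library rather than naming a runtime.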

**Note:** Use of the makefile is optional and not required to
generate libraries.

**Note:** In the following examples, the Android artifacts
shown are for the arm64-v8a target.

## Implementing a UDO for CPU

A CPU UDO implementation library based on the core UDO APIs is
required to run a UDO package on the CPU runtime. The UDO package
generator tool creates a skeleton containing blank
constructs in the required format, but the core logic for
creating and executing the operation must be filled in
by the user. This is done by completing the implementations
of the `finalize()`, `execute()`, and `free()` functions in
the **&lt;OpName&gt;.cpp** file generated by the UDO package
generator tool.

For good performance and stability, heap memory allocation must
be avoided in the completed `execute()` functions.
Heap memory allocation includes, but is not limited to, calling
`malloc` or `operator new`, constructing STL container objects
such as `std::vector` with the default allocator, and adding items
to such containers, for example by calling `std::vector::push_back`.
Refer to the heap memory allocation guidance in
[Creating a UDO package](https://docs.qualcomm.com/doc/80-63442-2/topic/creating_udo_package.html#avoid-using-heap-memory-allocation)
for more information.

**Note:** The Qualcomm® Neural Processing SDK does not provide the
tensor data for the inputs and outputs of a UDO directly, but as
an opaque pointer.
The UDO implementation is expected to obtain the
raw tensor pointers using the methods of the CustomOp operation
object issued by the Qualcomm® Neural Processing SDK at
execution time. The CPU runtime operates only on floating point
activation tensors. Therefore, CPU UDO implementations should
receive and produce only floating point tensors and set
the `data_type` field in the config file to `FLOAT_32`. All other data
types will be ignored. Refer to [Defining a UDO](https://docs.qualcomm.com/doc/80-63442-2/topic/udo_operator_definition.html)
for more details.

Compiling and running the UDO package on the host is required for
the Qualcomm® Neural Processing SDK model quantization tool,
[snpe-dlc-quantize](https://docs.qualcomm.com/doc/80-63442-2/topic/tools.html#snpe-dlc-quantize). To run a
UDO layer that has at least one non-float input on the DSP, the
model must first be quantized using snpe-dlc-quantize.

## Compiling a UDO for CPU on host

The steps to compile the CPU UDO implementation library on the host x86
platform are as follows:

1. Set the environment variable `$SNPE_UDO_ROOT`.

       export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Run the following make command in the UDO package directory to compile the UDO package:

       make cpu_x86

The expected artifacts after compiling for the host CPU are:

- The UDO CPU implementation library:
&lt;UDO-Package&gt;/libs/x86-64\_linux\_clang/libUdo&lt;UDO-Package&gt;ImplCpu.so
- The UDO package registration library:
&lt;UDO-Package&gt;/libs/x86-64\_linux\_clang/libUdo&lt;UDO-Package&gt;Reg.so

**Note:** The command must be run from the package root.
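A quick way to confirm that the build produced both host libraries is to test for the two files listed above. The sketch below is one possible check; the function name `check_host_artifacts` and its two arguments (package root and package name) are hypothetical and are not provided by the SDK.

```shell
# check_host_artifacts <package_root> <package_name>
# Reports each expected host library as found or missing, and returns 0
# only if both libraries from the artifact list above are present.
check_host_artifacts() {
    libdir=$1/libs/x86-64_linux_clang
    status=0
    for lib in "libUdo${2}ImplCpu.so" "libUdo${2}Reg.so"; do
        if [ -f "$libdir/$lib" ]; then
            echo "found:   $libdir/$lib"
        else
            echo "missing: $libdir/$lib"
            status=1
        fi
    done
    return "$status"
}
```

For example, after `make cpu_x86` in the package root, `check_host_artifacts "$PWD" <UDO-Package>` would be expected to report both files as found.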

## Compiling a UDO for CPU on device

The steps to compile the CPU UDO implementation library on the Android
platform are as follows:

1. Set the environment variable `$SNPE_UDO_ROOT`.

       export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Set `$ANDROID_NDK_ROOT` for the Android NDK build toolchain.

       export ANDROID_NDK_ROOT=<absolute_path_to_android_ndk_directory>

3. Run the following make command in the UDO package directory to compile the UDO package:

       make cpu_android

   The shared C++ standard library is required for the NDK
   build to run. Make sure `libc++_shared.so` is present on the
   device in a directory listed in `LD_LIBRARY_PATH`.
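To check the `libc++_shared.so` requirement above, one option is to search each directory of a colon-separated path list for the file. The helper below is a hedged sketch: `find_in_path_list` is a hypothetical name, and it assumes the directory names contain no shell wildcard characters.

```shell
# find_in_path_list <file_name> <colon_separated_dirs>
# Prints the first matching path and succeeds, or fails if no directory
# in the list contains the file. The body runs in a subshell so that the
# changed IFS does not leak into the caller.
find_in_path_list() (
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -f "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)
```

Run in the device shell, `find_in_path_list libc++_shared.so "$LD_LIBRARY_PATH"` would print where the library was found, or return a non-zero status if it is not on the path.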

The expected artifacts after compiling for Android CPU are:

- The UDO CPU implementation library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;ImplCpu.so
- The UDO package registration library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;Reg.so
- A copy of shared standard C++ library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libc++\_shared.so

## Implementing a UDO for GPU

Similar to the CPU runtime, a GPU UDO implementation library
based on the core UDO APIs is required to run a UDO package on the GPU
runtime. The UDO package generator tool creates a skeleton
containing blank constructs in the required format, but the
core logic for creating and executing the operation must
be filled in by the user. This is done by completing the
implementations of the `setKernelInfo()` and
`<OpName>Operation()` functions and adding the GPU kernel
implementations in the **&lt;OpName&gt;.cpp** file generated by the
UDO package generator tool.

For good performance and stability, heap memory allocation must
be avoided in the completed `<OpName>Operation()`
functions. Heap memory allocation includes, but is not limited
to, calling `malloc` or `operator new`, constructing STL
container objects such as `std::vector` with the default allocator,
and adding items to such containers, for example by calling
`std::vector::push_back`. Refer to the heap memory allocation guidance in
[Creating a UDO package](https://docs.qualcomm.com/doc/80-63442-2/topic/creating_udo_package.html#avoid-using-heap-memory-allocation)
for more information.

The Qualcomm® Neural Processing SDK GPU UDO supports 16-bit floating point activations in the
network. Users should expect the input/output OpenCL buffer memory
from the GPU UDO to use the 16-bit floating point (OpenCL
half) data format as the storage type. For increased accuracy,
users may choose to implement the internal math operations of the
kernel using 32-bit floating point data, converting from half
precision when reading input buffers and to half precision when
writing output buffers in the UDO kernel.

**Note:** The Qualcomm® Neural Processing SDK does not provide the tensor data for the
inputs and outputs of a UDO directly, but as an opaque pointer.
The UDO implementation is expected to convert it to `Qnn_Tensor_t`,
which holds the OpenCL memory pointer for the tensor.

## Compiling a UDO for GPU on device

The steps to compile the GPU UDO implementation library on the Android
platform are as follows:

1. Set the environment variable `$SNPE_UDO_ROOT`.

       export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Set `$ANDROID_NDK_ROOT` for the Android NDK build toolchain.

       export ANDROID_NDK_ROOT=<absolute_path_to_android_ndk_directory>

3. Set `$CL_LIBRARY_PATH` to the location of the `libOpenCL.so` library.

       export CL_LIBRARY_PATH=<absolute_path_to_OpenCL_library>

   The OpenCL shared library is not distributed as part of the Qualcomm® Neural Processing SDK.

4. Run the following make command in the UDO package directory to compile the UDO package:

       make gpu_android

**Note:** The shared OpenCL library is target specific. It
should be discoverable in `CL_LIBRARY_PATH`.
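Because the GPU build depends on three environment variables, it can help to validate them before invoking make. The following sketch assumes that each variable, including `CL_LIBRARY_PATH`, names a directory; `require_dir_vars` is a hypothetical helper and not part of the SDK.

```shell
# require_dir_vars VAR...
# Succeeds only if every named environment variable is set and points at
# an existing directory; prints one error per failing variable.
# The variable names are expanded with eval, so pass trusted names only.
require_dir_vars() {
    status=0
    for var in "$@"; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "error: $var is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "error: $var=$value is not a directory" >&2
            status=1
        fi
    done
    return "$status"
}
```

A possible use is `require_dir_vars SNPE_UDO_ROOT ANDROID_NDK_ROOT CL_LIBRARY_PATH && make gpu_android`, optionally preceded by `[ -f "$CL_LIBRARY_PATH/libOpenCL.so" ]` to confirm that the OpenCL library itself is present.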

The expected artifacts after compiling for Android GPU are:

- The UDO GPU implementation library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;ImplGpu.so
- The UDO package registration library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;Reg.so
- A copy of shared standard C++ library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libc++\_shared.so

## Implementing a UDO for DSP V65 and V66

The Qualcomm® Neural Processing SDK uses the Qualcomm® AI Direct SDK to run UDO layers on the DSP. Therefore, a DSP
implementation library based on the Qualcomm® AI Direct SDK APIs is required to run
a UDO package on the DSP runtime. The UDO package generator tool
creates the template file **&lt;OpName&gt;.cpp**, and the user
must implement the execution logic in the
`<OpName>_executeOp()` function in the template file.

For good performance and stability, heap memory allocation must
be avoided in the completed `<OpName>_executeOp()`
functions. Heap memory allocation includes, but is not limited
to, calling `malloc` or `operator new`, constructing STL
container objects such as `std::vector` with the default allocator,
and adding items to such containers, for example by calling
`std::vector::push_back`. Refer to the heap memory allocation guidance in
[Creating a UDO package](https://docs.qualcomm.com/doc/80-63442-2/topic/creating_udo_package.html#avoid-using-heap-memory-allocation)
for more information.

The Qualcomm® Neural Processing SDK UDO supports multi-threading of the
operation using worker threads, Hexagon Vector Extensions (HVX)
code, and VTCM.

The DSP runtime propagates only unsigned 8-bit activation
tensors between the network layers, but it can
de-quantize data to floating point if required. Users
developing DSP kernels can therefore expect either `UINT_8` or `FLOAT_32`
activation tensors in and out of the operation, and can
set the `data_type` field in the config file to one of these two
settings. Refer to [Defining a
UDO](https://docs.qualcomm.com/doc/80-63442-2/topic/udo_operator_definition.html) for more details.

## Compiling a UDO for DSP V65 and V66 on device

This Qualcomm® Neural Processing SDK release supports building UDO DSP implementation
libraries using Hexagon-SDK 3.5.x.

1. Set the environment variable `$SNPE_UDO_ROOT`.

       export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Hexagon-SDK must be installed and set up. For details,
   follow the setup instructions on the
   `$HEXAGON_SDK_ROOT/docs/readme.html` page, where
   `$HEXAGON_SDK_ROOT` is the location of the Hexagon-SDK
   installation. Make sure `$HEXAGON_SDK_ROOT` is set to use
   the Hexagon-SDK build toolchain. Also set
   `$HEXAGON_TOOLS_ROOT` and `$SDK_SETUP_ENV`.

       export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
       export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07
       export ANDROID_NDK_ROOT=<path to Android NDK installation>
       export SDK_SETUP_ENV=Done

3. Set `$ANDROID_NDK_ROOT` for the Android NDK build toolchain.

       export ANDROID_NDK_ROOT=<absolute_path_to_android_ndk_directory>

4. Run the following make command in the UDO package directory to compile the UDO DSP
   implementation library:

       make dsp

The expected artifacts after compiling for DSP are:

- The UDO DSP implementation library:
&lt;UDO-Package&gt;/libs/dsp\_&lt;dsp\_arch\_type&gt;/libUdo&lt;UDO-Package&gt;ImplDsp.so
- The UDO package registration library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;Reg.so

**Note:** The command must be run from the package root. A `dsp_v60` folder is created for all DSP architectures earlier than v68.
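The directory rule in the note can be expressed as a small lookup. The sketch below is illustrative only: `dsp_libs_dir` is a hypothetical helper, and the `dsp_v68` result for v68 and later is an assumption based solely on the artifact path shown in the V68-or-later section of this page.

```shell
# dsp_libs_dir <arch>
# Prints the libs/ subdirectory expected for a DSP architecture such as
# v65, v66, or v68. Architectures earlier than v68 share dsp_v60; v68 and
# later are assumed here to use dsp_v68.
dsp_libs_dir() {
    num=${1#v}
    case "$num" in
        ''|*[!0-9]*)
            echo "error: expected an architecture like v66, got '$1'" >&2
            return 2
            ;;
    esac
    if [ "$num" -lt 68 ]; then
        echo dsp_v60
    else
        echo dsp_v68
    fi
}

dsp_libs_dir v66
```

With this rule, a V66 build would be expected to place `libUdo<UDO-Package>ImplDsp.so` under `<UDO-Package>/libs/dsp_v60/`.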

## Implementing a UDO for DSP V68 or later

The Qualcomm® Neural Processing SDK uses the Qualcomm® AI Direct SDK to run UDO layers on DSP v68 or later.
Therefore, a DSP implementation library based on the Qualcomm® AI Direct SDK APIs
is required to run a UDO package on the DSP runtime. The UDO
package generator tool creates the template file
**&lt;OpName&gt;ImplLibDsp.cpp**, and the user must implement
the execution logic in the `<OpName>Impl()` function in the
template file.

For good performance and stability, heap memory allocation must
be avoided in the completed `<OpName>Impl()`
functions. Heap memory allocation includes, but is not limited
to, calling `malloc` or `operator new`, constructing STL
container objects such as `std::vector` with the default allocator,
and adding items to such containers, for example by calling
`std::vector::push_back`. Refer to the heap memory allocation guidance in
[Creating a UDO package](https://docs.qualcomm.com/doc/80-63442-2/topic/creating_udo_package.html#avoid-using-heap-memory-allocation)
for more information.

The Qualcomm® Neural Processing SDK UDO supports Hexagon Vector Extensions
(HVX) code and cost-based scheduling.

The DSP runtime propagates unsigned 8-bit or unsigned 16-bit
activation tensors between the network layers, but it can
de-quantize data to floating point if required.
Users developing DSP kernels can therefore expect
`UINT_8`, `UINT_16`, or `FLOAT_32` activation tensors in and out of
the operation, and can set the `data_type` field in the
config file to one of these three settings. Refer to the Qualcomm® AI Direct SDK
documentation for more details.

## Compiling a UDO for DSP V68 or later on device

This Qualcomm® Neural Processing SDK release supports building UDO DSP implementation
libraries using Hexagon-SDK 4.x and Qualcomm® AI Direct SDK.

1. Set the environment variable `$SNPE_UDO_ROOT`.

       export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Hexagon-SDK 4.0+ must be installed and set up. For
   Hexagon-SDK details, follow the setup instructions on the
   `$HEXAGON_SDK4_ROOT/docs/readme.html` page, where
   `$HEXAGON_SDK4_ROOT` is the location of the Hexagon-SDK
   installation. Make sure `$HEXAGON_SDK4_ROOT` is set to use
   the Hexagon-SDK build toolchain. Also set
   `$HEXAGON_TOOLS_ROOT` and `$SDK_SETUP_ENV`. Additionally,
   an extracted Qualcomm® AI Direct SDK is needed to build the
   libraries (no Qualcomm® AI Direct SDK setup is required). For
   Qualcomm® AI Direct SDK details, refer to the
   Qualcomm® AI Direct SDK documentation at `$QNN_SDK_ROOT/docs/QNN/index.html`,
   where `$QNN_SDK_ROOT` is the location of the Qualcomm® AI Direct SDK
   installation. Set `$QNN_SDK_ROOT` to the unzipped
   Qualcomm® AI Direct SDK location.

       export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
       export HEXAGON_SDK4_ROOT=<path to hexagon sdk 4.x installation>
       export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.4.09
       export QNN_SDK_ROOT=<path to QNN sdk installation>
       export ANDROID_NDK_ROOT=<path to Android NDK installation>
       export SDK_SETUP_ENV=Done

3. Set `$ANDROID_NDK_ROOT` for the Android NDK build toolchain.

       export ANDROID_NDK_ROOT=<absolute_path_to_android_ndk_directory>

4. Run the following make command in the UDO package directory to compile the UDO DSP
   implementation library:

       make dsp

5. Run the following make command in the UDO package directory to generate a library for
   offline cache generation:

       make dsp_x86 X86_CXX=<path_to_x86_64_clang>

6. Run the following make command in the UDO package directory to generate a library for **Android ARM architecture**:

       make dsp_aarch64

   **Note:** This step applies only to Linux-based devices. Do not run it for Windows-based devices.

The expected artifacts after compiling for DSP are:

- The UDO DSP implementation library:
&lt;UDO-Package&gt;/libs/dsp\_v68/libUdo&lt;UDO-Package&gt;ImplDsp.so
- The UDO package registration library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;Reg.so

The expected artifact after compiling for offline cache
generation is:

- The UDO DSP implementation library:
&lt;UDO-Package&gt;/libs/x86-64\_linux\_clang/libUdo&lt;UDO-Package&gt;ImplDsp.so

The expected artifact after compiling for Android ARM
architecture is:

- The UDO DSP implementation library:
&lt;UDO-Package&gt;/libs/arm64-v8a/libUdo&lt;UDO-Package&gt;ImplDsp\_AltPrep.so

**Note:** The commands must be run from the package root.
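The three DSP make invocations above can be chained so that a failure stops the sequence. The wrapper below is a sketch under stated assumptions: `build_dsp_v68` is a hypothetical name, the required environment variables are assumed to be exported already, and the `MAKE` override exists only so the commands can be previewed (for example with `MAKE=echo`).

```shell
# build_dsp_v68 <package_root> <path_to_x86_64_clang>
# Runs the DSP, offline cache, and Android ARM targets in order from the
# package root, stopping at the first failure. The body runs in a
# subshell so the caller's working directory is unchanged.
build_dsp_v68() (
    make_cmd=${MAKE:-make}
    cd "$1" || exit 1
    $make_cmd dsp &&
    $make_cmd dsp_x86 X86_CXX="$2" &&
    $make_cmd dsp_aarch64
)
```

When targeting a Windows-based device, the final `dsp_aarch64` step would need to be dropped, in line with the note on that step.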

## Table of Make Targets

| Make Target | Runtime | Platform | Misc. |
| --- | --- | --- | --- |
| all | CPU, GPU, DSP | x86, arm64-v8a |  |
| all\_x86 | CPU | x86 |  |
| all\_android | CPU, GPU, DSP | arm64-v8a |  |
| reg |  | x86, arm64-v8a |  |
| reg\_x86 |  | x86 |  |
| reg\_android |  | arm64-v8a |  |
| cpu | CPU | x86, arm64-v8a |  |
| cpu\_x86 | CPU | x86 | Same as all\_x86 |
| cpu\_android | CPU | arm64-v8a |  |
| gpu | GPU | arm64-v8a |  |
| gpu\_android | GPU | arm64-v8a | Same as gpu |
| dsp | DSP |  |  |
| dsp\_android | DSP |  | Same as dsp |
| dsp\_x86 | DSP |  |  |
| dsp\_aarch64 | DSP |  |  |

**Note:** By default, compiling for a runtime additionally
compiles the corresponding registration library.

Last Published: Oct 02, 2025
