# QNN HTP-FP16 Op Package - Relu Op Example

## Overview

HTP-FP16 ops are typically used to accelerate QNN float32 graphs. These graphs generally have tensors of TensorType QNN\_TENSOR\_TYPE\_APP\_WRITE and QNN\_TENSOR\_TYPE\_APP\_READ and DataType QNN\_DATATYPE\_FLOAT32 (both enums are defined in QnnTypes.h). The input and output activation tensors defined by the operations of these graphs also have DataType QNN\_DATATYPE\_FLOAT32.

The client is expected to set up QNN graphs with float32 tensors and the QNN HTP accelerator will finalize and execute those QNN graphs using float16 math. See general/htp\_backend/qnn-htp-precision.
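To give a feel for what running a float32 graph with float16 math implies numerically, the following sketch rounds a float32 value to the 11-bit significand of IEEE 754 binary16. The function name `roundToFp16Precision` is hypothetical, and the sketch makes no claim about how the HTP accelerator performs its arithmetic: it ignores float16's narrower exponent range (overflow above 65504 and subnormals) and only models the loss of significand bits.

```cpp
#include <cmath>

// Hypothetical helper for illustration only; not part of the QNN SDK.
// Rounds x to the nearest value that fits in an 11-bit significand
// (1 implicit + 10 stored bits), the precision of IEEE 754 binary16.
// Simplification: float16's exponent range is ignored.
float roundToFp16Precision(float x) {
    if (x == 0.0f || !std::isfinite(x)) {
        return x;  // zeros, infinities and NaNs pass through unchanged
    }
    int exponent = 0;
    // x == mantissa * 2^exponent with |mantissa| in [0.5, 1)
    float mantissa = std::frexp(x, &exponent);
    // Shift 11 significant bits into the integer part.
    float scaled = std::ldexp(mantissa, 11);
    // Round to integer; ties go to even under the default rounding mode.
    float rounded = std::nearbyint(scaled);
    return std::ldexp(rounded, exponent - 11);
}
```

With this model, `roundToFp16Precision(2049.0f)` yields `2048.0f`, because consecutive float16 values in that range are 2 apart. Small differences of this kind between a float32 reference run and the float16 execution are to be expected.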

This document outlines how to write ops in a QNN HTP-FP16 op package, using a basic relu op as an example. The source code for this example is located at examples/OpPackage/HTP/ExampleOpPackageReluFp16.cpp.

For detailed descriptions of writing op implementations, defining optimization
rules, and specifying op parameter orders, see HTP/implementing\_ops:Implementing Ops. In addition,
HTP/optimization\_grammar:Optimization Grammar provides more information on defining optimization rules.

## Writing Relu Op with HTP-FP16

ExampleOpPackageReluFp16.cpp contains a generic relu op and two specializations
(relu1 and reluX).
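As a rough illustration of what a generic relu computes, the sketch below applies the operation element by element. `MockTensor` and `reluSketch` are hypothetical stand-ins introduced for this illustration; the actual implementation in ExampleOpPackageReluFp16.cpp is written against the HTP core tensor types and is not reproduced here. The relu1 and reluX specializations are not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor; not the HTP core tensor class.
struct MockTensor {
    std::vector<float> data;
};

// Generic relu templated on the tensor type: negative elements become
// zero, all other elements are copied unchanged. Returns 0 on success.
template <typename TensorT>
int reluSketch(TensorT &out, const TensorT &in) {
    out.data.resize(in.data.size());
    for (std::size_t i = 0; i < in.data.size(); ++i) {
        out.data[i] = std::max(in.data[i], 0.0f);
    }
    return 0;
}
```

Templating on the tensor type mirrors the way the registration example instantiates one implementation function for a specific tensor type.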

### Op Registration

Op implementation functions need to be registered with an op name, op cost and flags.
Op registration can be achieved using HTP core macros listed below, and these macros
should be placed in global scope in individual op implementation source files.

The following macro registers an op with a user-specified cost value and flags.

**Syntax**

    /*
     * F    - op implementation function
     *
     * OP   - op name
     *
     * COST - pre-defined cost value names, one of GLACIAL, SNAIL, FAST, FREE
     *        (listed in descending order of value).
     *        The op implementation with the lower cost is chosen, provided all
     *        other criteria are met.
     *
     * ...  - zero or more flags, available flags include IS_CONST, INHIBIT_CONST_PROP,
     *        RESOURCE_HVX.
     *        IS_CONST marks an op that should be treated as a constant op.
     *        INHIBIT_CONST_PROP marks an op that should not participate in constant propagation.
     *        RESOURCE_HVX marks an op that uses HVX resources.
     */
    DEF_PACKAGE_OP_AND_COST_AND_FLAGS(F,OP,COST,...)

**Example**

    DEF_PACKAGE_OP_AND_COST_AND_FLAGS((reluImplFp<PlainFloat16Tensor>), "Relufp16", FAST)
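The cost argument matters when more than one implementation is registered for the same op. The sketch below models that choice under stated assumptions: the `Cost` values are placeholders that only preserve the documented ordering, and `Candidate`, its `eligible` field (standing in for "all other criteria are met"), and `pickImplementation` are hypothetical names, not HTP core APIs.

```cpp
#include <string>
#include <vector>

// Cost names from the registration macro. The numeric values are
// placeholders for this sketch; only their relative order
// (GLACIAL > SNAIL > FAST > FREE) comes from the documentation.
enum class Cost { FREE = 0, FAST = 1, SNAIL = 2, GLACIAL = 3 };

// Hypothetical record describing one registered implementation.
struct Candidate {
    std::string implName;
    Cost cost;
    bool eligible;  // assumption: summarizes every non-cost criterion
};

// Returns the eligible candidate with the lowest cost, or nullptr if
// no candidate is eligible. The first of several equal-cost candidates wins.
const Candidate *pickImplementation(const std::vector<Candidate> &candidates) {
    const Candidate *best = nullptr;
    for (const Candidate &c : candidates) {
        if (!c.eligible) {
            continue;
        }
        if (best == nullptr || c.cost < best->cost) {
            best = &c;
        }
    }
    return best;
}
```

In this model, an ineligible FREE candidate never beats an eligible FAST one, which is the sense in which cost only decides among implementations that already satisfy the other criteria.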

### Optimization Rule Definition

To handle operations that use float16 math correctly, op writers must add a DEF\_PACKAGE\_OPTIMIZATION\_WITH\_FLAGS macro at GRAPH\_CLEANUP priority that converts the appropriate float32 tensors to float16 tensors. This macro defines an optimization rule that is applied to the graph during the graph optimization phase, which runs as part of QnnGraph\_finalize(). The rule inserts a QNN\_Cast from float32 to float16 on the inputs of the operation and a QNN\_Cast from float16 to float32 on its outputs.

In a later graph optimization pass, any QNN\_Cast from float16 to float32 that is immediately followed by a QNN\_Cast from float32 to float16 between consecutive FP ops is cancelled out. The resulting QNN graph contains only a QNN\_Cast from float32 to float16 at the graph inputs and a QNN\_Cast from float16 to float32 at the graph outputs.
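The two-step process described above can be modelled on a simple linear chain of ops. This is a minimal sketch, not the optimizer's actual algorithm: node labels are plain strings chosen for illustration, the graph is assumed to be a straight chain, and `wrapWithCasts` and `cancelCastPairs` are hypothetical helper names.

```cpp
#include <string>
#include <vector>

// Illustrative node labels; these are not real QNN op names.
const std::string kToF16 = "Cast(f32->f16)";
const std::string kToF32 = "Cast(f16->f32)";

// Step 1: surround every float op with casts, which is the effect the
// GRAPH_CLEANUP rule has on each op it matches.
std::vector<std::string> wrapWithCasts(const std::vector<std::string> &ops) {
    std::vector<std::string> chain;
    for (const std::string &op : ops) {
        chain.push_back(kToF16);
        chain.push_back(op + ".fp16");
        chain.push_back(kToF32);
    }
    return chain;
}

// Step 2: remove every f16->f32 cast that is immediately followed by an
// f32->f16 cast, since the pair has no net effect between two FP ops.
std::vector<std::string> cancelCastPairs(const std::vector<std::string> &chain) {
    std::vector<std::string> result;
    for (const std::string &node : chain) {
        if (node == kToF16 && !result.empty() && result.back() == kToF32) {
            result.pop_back();  // drop the pending f16->f32 and skip this f32->f16
        } else {
            result.push_back(node);
        }
    }
    return result;
}
```

For a chain of two relu ops, step 1 produces six nodes and step 2 reduces them to four: one cast at the input, the two float16 ops, and one cast at the output.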

**Syntax**

    /*
     * PRIORITY       - unsigned integer value indicating the optimization pass number;
     *                  a smaller number indicates an earlier optimization pass.
     *                  Predefined values include GRAPH_CLEANUP(0), EARLY(2000), MIDDLE(3000),
     *                  LATE(4000).
     *
     * FLAGS          - used to trigger all rules containing that flag.
     *                  relaxed_precision_flag - if the overall flag for relaxed precision is
     *                  enabled, all rules containing this flag are triggered.
     *
     * MATCHCODE      - subgraph matching pattern to which this optimization rule applies
     *
     * CONSTRAINTCODE - constraints applied to the match pattern
     *
     * REPLACECODE    - new subgraph pattern that replaces the matching pattern if the
     *                  constraints are met
     */
    DEF_PACKAGE_OPTIMIZATION_WITH_FLAGS(PRIORITY,FLAGS,MATCHCODE,CONSTRAINTCODE,REPLACECODE)

**Example**

    DEF_PACKAGE_OPTIMIZATION_WITH_FLAGS(
       GRAPH_CLEANUP,          // priority
       relaxed_precision_flag, // flag: run this op with relaxed float math, i.e. float16 math
       Op(QNN_OP_RELU, "In"),  // matchcode

       // constraintcode
       AND(EQ(DTYPE_OF("In"), DType::Float32), EQ(DTYPE_OF("*"), DType::Float32)),

       // replacecode
       WITH_OUTPUT_TYPE(DType::Float32, 0, 1.0f,
          Op(FROM_DEFAULT_PACKAGE("Cast"),
             WITH_SIZE("*",
                WITH_OUTPUT_TYPE(DType::Float16, 0, 1.0f,
                   Op(OP,
                      WITH_SIZE("In",
                         WITH_OUTPUT_TYPE(DType::Float16, 0, 1.0f,
                            Op(FROM_DEFAULT_PACKAGE("Cast"), "In")
                         )
                      )
                   )
                )
             )
          )
       )
    )

Last Published: Jul 02, 2026
