# QNN HTP Op Package - Relu Op Example

## Overview

This document outlines how to write ops in a QNN HTP op package, using a basic relu op
as an example. Here we will go through the procedures of writing op implementations,
registering ops, defining optimization rules, and registering optimization rules.
The source code for this example is located at
examples/OpPackage/HTP/ExampleOpPackageRelu.cpp in the QNN SDK.

For detailed descriptions of writing op implementations, defining optimization
rules, and specifying op parameter orders, please read `implementing_ops.html`. In addition,
`optimization_grammar.html` provides more information on defining optimization rules.

## Writing Relu Op

ExampleOpPackageRelu.cpp contains a standard Relu op, one variation of Relu called
ReluMinMax, which clips data to a specified range, and another variation called
ReluTableGen, which can be used with tableLookup. The standard Relu op
demonstrates the basics of a reference op implementation, such as tensor reading
and writing. The ReluMinMax op has an optimized implementation that relies heavily
on HVX. ReluTableGen serves as a faster alternative to the Relu op: it generates a
lookup table that enables fast per-element lookups. In addition, there are many
optimization rules associated with Relu; these rules convert, split, and otherwise
optimize the graph around Relu so that the best performance can be achieved.

This document focuses only on the standard Relu op and some of its basic optimization
rules.

### Op Implementation Function

```cpp
/*
 * @brief                  implementation of relu op
 *
 * @param[out] out         output HTP tensor
 *
 * @param[in] in           input HTP tensor
 *
 * @return GraphStatus     error code
 */
template <typename T_Ttype>
GraphStatus reluImpl(T_Ttype &out, const T_Ttype &in);
```

This example uses a function template, and the template parameter takes an HTP core
tensor type. The op implementation function parameter list consists of a series of
HTP core tensors in the following order: *outputs, inputs, parameters*. Input tensors
and parameter tensors shall be marked as const. Note that in implementation
functions there is no separation between input tensors and parameters. Also, both
QNN scalar and tensor parameters are converted into HTP core tensors. In addition,
HTP core tensors always have four dimensions, and the layout is always bhwc. QNN tensors
with lower dimensions are backfilled into 4-dimensional HTP core tensors. Op
implementation functions shall return GraphStatus, which is an enum defined in
include/HTP/core/graph_status.h in the QNN SDK.
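
To make the ordering concrete, consider a clipping variant such as ReluMinMax, which
takes min and max scalar parameters. The signature sketch below is hypothetical (the
function and parameter names are assumptions, not copied from the SDK), but it follows
the ordering and const rules described above:

```cpp
#include "HTP/core/graph_status.h"  // GraphStatus
#include "HTP/core/tensor.h"        // HTP core tensor types

/*
 * Hypothetical signature illustrating the "outputs inputs parameters" order.
 * The QNN scalar parameters (min/max) arrive as const HTP core tensors and are
 * indistinguishable from the input tensor in the parameter list.
 */
template <typename T_Ttype>
GraphStatus reluMinMaxImpl(T_Ttype &out,          // output tensor
                           const T_Ttype &in,     // input tensor
                           const Tensor &minVal,  // parameter (converted QNN scalar)
                           const Tensor &maxVal); // parameter (converted QNN scalar)
```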

#### HTP Core Tensor Types

HTP core has a base tensor type `Tensor` and a set of `ConcreteTensor` types.
`ConcreteTensor` types are derived from the base `Tensor`, and each `ConcreteTensor`
type has a fixed rank, memory layout, and data type. The base `Tensor` can be used
in generic op implementations and serves as a fallback option. `ConcreteTensor`
types can be used to specialize op implementations for better performance.

For a list of HTP core tensor types supported in op packages, please refer to AllTensors,
defined in include/HTP/core/template_help_tensor_ext.h in the QNN SDK. For details about
HTP core tensors' usage and their accessor functions, please refer to
include/HTP/core/tensor.h.
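
As a quick reference (not an exhaustive list), the sketch below collects the accessors
used by the Relu implementation later in this document; `accessorDemo` is a made-up
name for illustration only:

```cpp
#include "HTP/core/tensor.h"  // HTP core tensor types and accessors

// Quick reference for the tensor accessors used in this example; see
// include/HTP/core/tensor.h for the full interface.
template <typename T_Ttype>
void accessorDemo(T_Ttype &out, const T_Ttype &in) {
  out.set_dims(in);          // size the output tensor like the input
  Idx batches = in.dim(0);   // dimensions are indexed 0..3 in (b, h, w, d) order
  float v = in(0, 0, 0, 0);  // element read at coordinates (b, h, w, d)
  out(0, 0, 0, 0) = v;       // element write at the same coordinates
  (void)batches;             // silence unused-variable warnings in this sketch
}
```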

More details about HTP memory layouts and tensors can be found in
`tensors_and_memory_layout.html`.

#### Relu Op Implementation

The relu op computes the following function:

```
f(x) = max(x, 0)
```

The implementation is as follows:

```cpp
template <typename T_Ttype>
GraphStatus reluImpl(T_Ttype &out, const T_Ttype &in) {
  out.set_dims(in);  // set the output tensor dimensions to match the input tensor
  // loop through each input and output tensor element via the four dimensions
  for (Idx b = 0; b < in.dim(0); b++) {
    for (Idx h = 0; h < in.dim(1); h++) {
      for (Idx w = 0; w < in.dim(2); w++) {
        for (Idx d = 0; d < in.dim(3); d++) {
          // read the input tensor element located at coordinates (b, h, w, d)
          float inval = in(b, h, w, d);
          // find max(inval, 0) and assign it to the output tensor at (b, h, w, d)
          out(b, h, w, d) = fmaxf(inval, 0.0f);
        }
      }
    }
  }
  return GraphStatus::Success;  // report success, per the GraphStatus return contract
}
```

### Op Registration

Op implementation functions need to be registered with an op name, an op cost, and flags.
Op registration can be achieved using the HTP core macros listed below; these macros
should be placed at global scope in individual op implementation source files.
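
As an orientation aid, the sketch below shows where a registration typically sits
within an op package source file. The include path and the BEGIN/END scoping macros
follow the conventions used in the SDK example files; consult ExampleOpPackageRelu.cpp
for the authoritative layout:

```cpp
#include "HTP/core/simple_reg.h"  // registration macros (path assumed per the SDK examples)

BEGIN_PKG_OP_DEFINITION(PKG_Relu);  // opens this file's op definition scope

template <typename T_Ttype>
GraphStatus reluImpl(T_Ttype &out, const T_Ttype &in) {
  /* ... implementation as shown above ... */
  return GraphStatus::Success;
}

// registration macros sit at global scope, after the implementation
DEF_PACKAGE_OP((reluImpl<Tensor>), "Relu")

END_PKG_OP_DEFINITION(PKG_Relu);
```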

#### Method 1

Registration with the default cost value (i.e., GLACIAL) and the default flag (Flags::RESOURCE_HVX).

**Syntax**

```cpp
/*
 * F  - op implementation function
 *
 * OP - op name
 */
DEF_PACKAGE_OP(F,OP)
```

**Example**

```cpp
DEF_PACKAGE_OP((reluImpl<Tensor>), "Relu")
```

#### Method 2

Registration with a user-specified cost value and flags.

**Syntax**

```cpp
/*
 * F    - op implementation function
 *
 * OP   - op name
 *
 * COST - pre-defined cost value names, one of GLACIAL, SNAIL, FAST, FREE
 *        (listed in descending order of value).
 *        The op implementation with the relatively lower cost will be chosen,
 *        given all other criteria are met.
 *
 * ...  - zero or more flags; available flags include IS_CONST, INHIBIT_CONST_PROP,
 *        RESOURCE_HVX.
 *        IS_CONST marks that an op should be treated as a constant op.
 *        INHIBIT_CONST_PROP marks that an op should not participate in constant propagation.
 *        RESOURCE_HVX marks that an op will use HVX resources.
 */
DEF_PACKAGE_OP_AND_COST_AND_FLAGS(F,OP,COST,...)
```

**Example**

```cpp
DEF_PACKAGE_OP_AND_COST_AND_FLAGS((reluImpl<PlainFloatTensor>), "Relu", SNAIL, Flags::RESOURCE_HVX)
```
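
Together with the Method 1 registration above, the "Relu" op now has two candidate
implementations: the generic `reluImpl<Tensor>` fallback at the default GLACIAL cost,
and the `reluImpl<PlainFloatTensor>` specialization at SNAIL. Because SNAIL is the
lower cost, the specialized implementation is chosen whenever the plain float tensor
type matches.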

#### Method 3

Registration with a user-specified cost function and flags (not shown in the relu op example).

**Syntax**

```cpp
/*
 * F      - op implementation function
 *
 * OP     - op name
 *
 * COST_F - user-defined cost function
 *          cost function pointer type: typedef float (*cost_function) (const Op * op);
 *          The op implementation with the relatively lower cost will be chosen,
 *          given all other criteria are met.
 *
 * ...    - zero or more flags; available flags include IS_CONST, INHIBIT_CONST_PROP,
 *          RESOURCE_HVX.
 *          IS_CONST marks that an op should be treated as a constant op.
 *          INHIBIT_CONST_PROP marks that an op should not participate in constant propagation.
 *          RESOURCE_HVX marks that an op will use HVX resources.
 */
DEF_PACKAGE_OP_AND_COST_F_AND_FLAGS(F,OP,COST_F,...)
```

**Example**

```cpp
float reluCost(const Op *op) {
  // can use properties of the op to determine its cost
  return 0.0f;
}

DEF_PACKAGE_OP_AND_COST_F_AND_FLAGS((reluImpl<PlainFloatTensor>), "Relu", reluCost, Flags::RESOURCE_HVX)
```
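
A cost function is useful when the cost should depend on properties of the particular
op instance, for example its tensor shapes, rather than on one of the fixed
pre-defined cost values.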

### Defining Optimization Rules

Optimization rules are intended for graph-level transformations and are applied
in passes during graph preparation in QNN context finalization. The Relu optimization
rules contain examples of splitting an op into smaller chunks, moving data to and from
VTCM, converting one op into another, more optimized op, and more. These rules transform
the graph around Relu in order to obtain the best performance.

Optimization rules can be defined with the HTP core macro listed below, and this macro
should be placed at global scope in individual op implementation source files.

#### Syntax

```cpp
/*
 * PRIORITY       - optimization pass priority; a smaller number is applied earlier.
 *                  Predefined values include EARLY(2000), MIDDLE(3000), LATE(4000).
 *
 * MATCHCODE      - matching pattern for the transformation to occur
 *
 * CONSTRAINTCODE - constraints that limit the conditions under which the
 *                  transformation occurs
 *
 * REPLACECODE    - transformed pattern which replaces the original matching pattern
 */
DEF_PACKAGE_OPTIMIZATION(PRIORITY,MATCHCODE,CONSTRAINTCODE,REPLACECODE)
```

**Example**

```cpp
DEF_PACKAGE_OPTIMIZATION(
    EARLY,
    Op("Relu", "X"),
    IS_QUANT_TYPE("X"),
    Op("ReluMinMax", "X", gen_ConstScalar_f32(0.0f), gen_ConstScalar_f32(INF)))
```

This optimization rule is applied at the EARLY optimization pass during graph
finalization. The pattern it matches is a “Relu” op with one input, which we
temporarily call “X”, and the constraint is that the input data must be quantized.
If, at the EARLY optimization pass, there is a “Relu” op with one quantized input
in the graph, that “Relu” op is converted into a “ReluMinMax” op with three inputs:
the first input is “X”, and the next two are 0.0f and float32 infinity.
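
For reference, here is the same rule once more, with each argument mapped back to the
macro parameters described above (an annotated restatement of the example, not
additional SDK code):

```cpp
DEF_PACKAGE_OPTIMIZATION(
    EARLY,                         // PRIORITY: run during the EARLY pass
    Op("Relu", "X"),               // MATCHCODE: any "Relu" op; bind its single input as "X"
    IS_QUANT_TYPE("X"),            // CONSTRAINTCODE: input "X" must be quantized
    Op("ReluMinMax",               // REPLACECODE: rebuild as "ReluMinMax" ...
       "X",                        //   ... keeping the original input "X", with
       gen_ConstScalar_f32(0.0f),  //   min clamp = 0.0f and
       gen_ConstScalar_f32(INF)))  //   max clamp = float32 infinity
```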

HTP core provides replacement functions and constraint macros for op packages to use.
For more information about optimization rules, please refer to `optimization_grammar.html`.

## Next Steps

This is a basic yet helpful example that outlines how to write an op.

Please continue with `implementing_ops.html`, and read the `Relu`, `Max Pool`,
and `Softmax` example files to view the code in more detail.
