# QNN HTP Op Package - Relu Op Example

## Overview

This document outlines how to write ops in a QNN HTP op package, using a basic relu op
as an example. Here we will go through the procedures of writing op implementations,
registering ops, defining optimization rules, and registering optimization rules.
The source code for this example is located at
examples/OpPackage/HTP/ExampleOpPackageRelu.cpp in the QNN SDK.

For detailed descriptions of writing op implementations, defining optimization
rules, and specifying op parameter orders, please read `implementing_ops.html`. In addition,
`optimization_grammar.html` provides more information on defining optimization rules.

## Writing Relu Op

ExampleOpPackageRelu.cpp contains a standard Relu op, one variation of Relu called
ReluMinMax, which clips data to a specified range, and another variation called
ReluTableGen, which can be used with tableLookup. The standard Relu op
demonstrates the basics of a reference op implementation, such as tensor reading
and writing. The ReluMinMax op has an optimized implementation that relies heavily
on HVX. ReluTableGen serves as a faster alternative to the Relu op: it generates a
lookup table that enables fast per-element lookups. In addition, there are many
optimization rules associated with Relu; these rules convert, split, and otherwise
optimize the graph around Relu so that the best performance can be achieved.

This document focuses only on the standard Relu op and some of its basic optimization
rules.

### Op Implementation Function

```cpp
/*
 * @brief                  implementation of relu op
 *
 * @param[out] out         output HTP tensor
 *
 * @param[in] in           input HTP tensor
 *
 * @return GraphStatus     error code
 */
template <typename T_Ttype>
GraphStatus reluImpl(T_Ttype &out, const T_Ttype &in);
```

This example uses a function template, and the template parameter takes an HTP core
tensor type. The op implementation function parameter list consists of a series of
HTP core tensors in the following order: *outputs, inputs, parameters*. Input tensors
and parameter tensors shall be marked as const. Note that in implementation
functions there is no separation between input tensors and parameters. Also, both
QNN scalar and tensor parameters are converted into HTP core tensors. In addition,
HTP core tensors always have four dimensions, and the layout is always bhwc. QNN tensors
with lower dimensions are backfilled into 4-dimensional HTP core tensors. Op
implementation functions shall return GraphStatus, which is an enum defined in
include/HTP/core/graph_status.h in the QNN SDK.
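
To make the ordering concrete, consider a clipping variant such as ReluMinMax, which
takes min and max scalar parameters. The signature sketch below is hypothetical (the
function and parameter names are assumptions, not copied from the SDK), but it follows
the ordering and const rules described above:

```cpp
#include "HTP/core/graph_status.h"  // GraphStatus
#include "HTP/core/tensor.h"        // HTP core tensor types

/*
 * Hypothetical signature illustrating the "outputs inputs parameters" order.
 * The QNN scalar parameters (min/max) arrive as const HTP core tensors and are
 * indistinguishable from the input tensor in the parameter list.
 */
template <typename T_Ttype>
GraphStatus reluMinMaxImpl(T_Ttype &out,          // output tensor
                           const T_Ttype &in,     // input tensor
                           const Tensor &minVal,  // parameter (converted QNN scalar)
                           const Tensor &maxVal); // parameter (converted QNN scalar)
```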

#### HTP Core Tensor Types

HTP core has a base tensor type `Tensor` and a set of `ConcreteTensor` types.
`ConcreteTensor` types are derived from the base `Tensor`, and each `ConcreteTensor`
type has a fixed rank, memory layout, and data type. The base `Tensor` can be used
in generic op implementations and serves as a fallback option. `ConcreteTensor`
types can be used to specialize op implementations for better performance.

For a list of HTP core tensor types supported in op packages, please refer to AllTensors,
defined in include/HTP/core/template_help_tensor_ext.h in the QNN SDK. For details about
HTP core tensors' usage and their accessor functions, please refer to
include/HTP/core/tensor.h.
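
As a quick reference (not an exhaustive list), the sketch below collects the accessors
used by the Relu implementation later in this document; `accessorDemo` is a made-up
name for illustration only:

```cpp
#include "HTP/core/tensor.h"  // HTP core tensor types and accessors

// Quick reference for the tensor accessors used in this example; see
// include/HTP/core/tensor.h for the full interface.
template <typename T_Ttype>
void accessorDemo(T_Ttype &out, const T_Ttype &in) {
  out.set_dims(in);          // size the output tensor like the input
  Idx batches = in.dim(0);   // dimensions are indexed 0..3 in (b, h, w, d) order
  float v = in(0, 0, 0, 0);  // element read at coordinates (b, h, w, d)
  out(0, 0, 0, 0) = v;       // element write at the same coordinates
  (void)batches;             // silence unused-variable warnings in this sketch
}
```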

More details about HTP memory layouts and tensors can be found in
`tensors_and_memory_layout.html`.

#### Relu Op Implementation

The relu op computes the following function:

```
f(x) = max(x, 0)
```

The implementation is as follows:

```cpp
template <typename T_Ttype>
GraphStatus reluImpl(T_Ttype &out, const T_Ttype &in) {
  out.set_dims(in);  // set the output tensor dimensions to match the input tensor
  // loop through each input and output tensor element via the four dimensions
  for (Idx b = 0; b < in.dim(0); b++) {
    for (Idx h = 0; h < in.dim(1); h++) {
      for (Idx w = 0; w < in.dim(2); w++) {
        for (Idx d = 0; d < in.dim(3); d++) {
          // read the input tensor element located at coordinates (b, h, w, d)
          float inval = in(b, h, w, d);
          // find max(inval, 0) and assign it to the output tensor at (b, h, w, d)
          out(b, h, w, d) = fmaxf(inval, 0.0f);
        }
      }
    }
  }
  return GraphStatus::Success;  // report success, per the GraphStatus return contract
}
```

### Op Registration

Op implementation functions need to be registered with an op name, an op cost, and flags.
Op registration can be achieved using the HTP core macros listed below; these macros
should be placed at global scope in individual op implementation source files.
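
As an orientation aid, the sketch below shows where a registration typically sits
within an op package source file. The include path and the BEGIN/END scoping macros
follow the conventions used in the SDK example files; consult ExampleOpPackageRelu.cpp
for the authoritative layout:

```cpp
#include "HTP/core/simple_reg.h"  // registration macros (path assumed per the SDK examples)

BEGIN_PKG_OP_DEFINITION(PKG_Relu);  // opens this file's op definition scope

template <typename T_Ttype>
GraphStatus reluImpl(T_Ttype &out, const T_Ttype &in) {
  /* ... implementation as shown above ... */
  return GraphStatus::Success;
}

// registration macros sit at global scope, after the implementation
DEF_PACKAGE_OP((reluImpl<Tensor>), "Relu")

END_PKG_OP_DEFINITION(PKG_Relu);
```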

#### Method 1

Registration with the default cost value (i.e., GLACIAL) and the default flag (Flags::RESOURCE_HVX).

**Syntax**

```cpp
/*
 * F  - op implementation function
 *
 * OP - op name
 */
DEF_PACKAGE_OP(F,OP)
```

**Example**

```cpp
DEF_PACKAGE_OP((reluImpl<Tensor>), "Relu")
```

#### Method 2

Registration with a user-specified cost value and flags.

**Syntax**

```cpp
/*
 * F    - op implementation function
 *
 * OP   - op name
 *
 * COST - pre-defined cost value names, one of GLACIAL, SNAIL, FAST, FREE
 *        (listed in descending order of value).
 *        The op implementation with the relatively lower cost will be chosen,
 *        given all other criteria are met.
 *
 * ...  - zero or more flags; available flags include IS_CONST, INHIBIT_CONST_PROP,
 *        RESOURCE_HVX.
 *        IS_CONST marks that an op should be treated as a constant op.
 *        INHIBIT_CONST_PROP marks that an op should not participate in constant propagation.
 *        RESOURCE_HVX marks that an op will use HVX resources.
 */
DEF_PACKAGE_OP_AND_COST_AND_FLAGS(F,OP,COST,...)
```

**Example**

```cpp
DEF_PACKAGE_OP_AND_COST_AND_FLAGS((reluImpl<PlainFloatTensor>), "Relu", SNAIL, Flags::RESOURCE_HVX)
```
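
Together with the Method 1 registration above, the "Relu" op now has two candidate
implementations: the generic `reluImpl<Tensor>` fallback at the default GLACIAL cost,
and the `reluImpl<PlainFloatTensor>` specialization at SNAIL. Because SNAIL is the
lower cost, the specialized implementation is chosen whenever the plain float tensor
type matches.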

#### Method 3

Registration with a user-specified cost function and flags (not shown in the relu op example).

**Syntax**

```cpp
/*
 * F      - op implementation function
 *
 * OP     - op name
 *
 * COST_F - user-defined cost function
 *          cost function pointer type: typedef float (*cost_function) (const Op * op);
 *          The op implementation with the relatively lower cost will be chosen,
 *          given all other criteria are met.
 *
 * ...    - zero or more flags; available flags include IS_CONST, INHIBIT_CONST_PROP,
 *          RESOURCE_HVX.
 *          IS_CONST marks that an op should be treated as a constant op.
 *          INHIBIT_CONST_PROP marks that an op should not participate in constant propagation.
 *          RESOURCE_HVX marks that an op will use HVX resources.
 */
DEF_PACKAGE_OP_AND_COST_F_AND_FLAGS(F,OP,COST_F,...)
```

**Example**

```cpp
float reluCost(const Op *op) {
  // can use properties of the op to determine its cost
  return 0.0f;
}

DEF_PACKAGE_OP_AND_COST_F_AND_FLAGS((reluImpl<PlainFloatTensor>), "Relu", reluCost, Flags::RESOURCE_HVX)
```
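
A cost function is useful when the cost should depend on properties of the particular
op instance, for example its tensor shapes, rather than on one of the fixed
pre-defined cost values.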

### Defining Optimization Rules

Optimization rules are intended for graph-level transformations and are applied
in passes during graph preparation in QNN context finalization. The Relu optimization
rules contain examples of splitting an op into smaller chunks, moving data to and from
VTCM, converting one op into another, more optimized op, and more. These rules transform
the graph around Relu in order to obtain the best performance.

Optimization rules can be defined with the HTP core macro listed below, and this macro
should be placed at global scope in individual op implementation source files.

#### Syntax

```cpp
/*
 * PRIORITY       - optimization pass priority; a smaller number is applied earlier.
 *                  Predefined values include EARLY(2000), MIDDLE(3000), LATE(4000).
 *
 * MATCHCODE      - matching pattern for the transformation to occur
 *
 * CONSTRAINTCODE - constraints that limit the conditions under which the
 *                  transformation occurs
 *
 * REPLACECODE    - transformed pattern which replaces the original matching pattern
 */
DEF_PACKAGE_OPTIMIZATION(PRIORITY,MATCHCODE,CONSTRAINTCODE,REPLACECODE)
```

**Example**

```cpp
DEF_PACKAGE_OPTIMIZATION(
    EARLY,
    Op("Relu", "X"),
    IS_QUANT_TYPE("X"),
    Op("ReluMinMax", "X", gen_ConstScalar_f32(0.0f), gen_ConstScalar_f32(INF)))
```

This optimization rule is applied at the EARLY optimization pass during graph
finalization. The pattern it matches is a “Relu” op with one input, which we
temporarily call “X”, and the constraint is that the input data must be quantized.
If, at the EARLY optimization pass, there is a “Relu” op with one quantized input
in the graph, that “Relu” op is converted into a “ReluMinMax” op with three inputs:
the first input is “X”, and the next two are 0.0f and float32 infinity.
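
For reference, here is the same rule once more, with each argument mapped back to the
macro parameters described above (an annotated restatement of the example, not
additional SDK code):

```cpp
DEF_PACKAGE_OPTIMIZATION(
    EARLY,                         // PRIORITY: run during the EARLY pass
    Op("Relu", "X"),               // MATCHCODE: any "Relu" op; bind its single input as "X"
    IS_QUANT_TYPE("X"),            // CONSTRAINTCODE: input "X" must be quantized
    Op("ReluMinMax",               // REPLACECODE: rebuild as "ReluMinMax" ...
       "X",                        //   ... keeping the original input "X", with
       gen_ConstScalar_f32(0.0f),  //   min clamp = 0.0f and
       gen_ConstScalar_f32(INF)))  //   max clamp = float32 infinity
```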

HTP core provides replacement functions and constraint macros for op packages to use.
For more information about optimization rules, please refer to `optimization_grammar.html`.

## Next Steps

This is a basic yet helpful example that outlines how to write an op.

Please continue with `implementing_ops.html`, and read the `Relu`, `Max Pool`,
and `Softmax` example files to view the code in more detail.
