# Quantized vs Non-Quantized Models

Overview

- Non-quantized DLC files use 32 bit floating point
representations of network parameters.
- Quantized DLC files use fixed point representations of
network parameters, generally 8 bit weights and 8 or 32 bit
biases. The fixed point representation is the same as that used in
TensorFlow quantized models.

ONNX

The default output of
snpe-onnx-to-dlc is a
non-quantized model. This means that all the network parameters
are left in the 32 bit floating point representation as present
in the original ONNX model. To quantize the model to 8 bit
fixed point, see
snpe-dlc-quantize.
Note that models that are intended to be quantized using
snpe-dlc-quantize must
have their batch dimension set to 1. A different batch
dimension can be used during inference, by
resizing the network during
initialization.

TensorFlow

The default output of
snpe-tensorflow-to-dlc
is a non-quantized model. This means that all the network
parameters are left in the 32 bit floating point representation
as present in the original TensorFlow model. To quantize the
model to 8 bit fixed point, see
snpe-dlc-quantize.
Note that models that are intended to be quantized using
snpe-dlc-quantize must
have their batch dimension set to 1. A different batch
dimension can be used during inference, by
resizing the network during
initialization.

Choosing Between a Quantized or Non-Quantized
Model

Summary

| Runtime | Quantized DLC | Non-Quantized DLC |
| --- | --- | --- |
| CPU | **Compatible.** If CPU fixed-point mode is enabled, the model can be passed<br>directly to the runtime. Otherwise the model is dequantized by the runtime,<br>increasing network initialization time. Accuracy may be impacted. | **Compatible.** The model is native format for this runtime. Model can<br>be passed directly to the runtime. May be more accurate than a quantized model. |
| GPU | **Compatible.** The model is dequantized by the runtime, increasing<br>network initialization time. Accuracy may be impacted. | **Compatible.** The model is native format for this runtime. Model can<br>be passed directly to the runtime. May be more accurate than a quantized model. |
| DSP | **Compatible.** The model is native format for the DSP runtime. Model can be<br>passed directly to the runtime. Accuracy may be different than a<br>non-quantized model. | **Compatible.** The model is quantized by the runtime, increasing<br>network initialization time. Accuracy may be different than a quantized model. |
| AIP | **Compatible.** The model is in supported format for AIP runtime. Model<br>can be passed directly to the runtime. | **Incompatible.** Non-quantized models are not supported by the AIP runtime. |

Details

- CPU

    - The CPU by default uses floating point (non-quantized) network parameters.
    - Using quantized DLC files with CPU runtime is supported. To use quantized network parameters directly,
CPU fixed-point mode should be enabled. If it is not enabled, network initialization time will
dramatically increase as Qualcomm® Neural Processing SDK will automatically
de-quantize the network parameters in order to run on CPU.
    - Quantization of the DLC file does introduce noise, as quantization is lossy.
    - The network performance during execute is not impacted by the
choice of quantized vs non-quantized DLC files.
- GPU

    - The GPU always uses floating point (non-quantized) network parameters.
    - Using quantized DLC files with GPU runtime is supported. Network
initialization time will dramatically increase as Qualcomm® Neural Processing SDK will automatically
de-quantize the network parameters in order to run on GPU.
    - If network initialization time is a concern, it is recommended to
use non-quantized DLC files (default) for GPU.
    - Quantization of the DLC file does introduce noise, as quantization is lossy.
    - The network performance during execute is not impacted by the
choice of quantized vs non-quantized DLC files.
- DSP

    - The DSP always uses quantized network parameters.
    - Using a non-quantized DLC file on the DSP is supported. Network
initialization time will dramatically increase as Qualcomm® Neural Processing SDK will automatically
quantize the network parameters in order to run on the DSP.
    - It is generally recommended to use quantized DLC files for running
on the DSP. In addition to faster network initialization time,
using quantized models also reduces peak memory usage during
initialization, and decreases DLC file size.
- AIP

    - The AIP runtime always uses quantized network parameters.
    - Passing through snpe-dlc-quantize is mandatory for generating the
binaries for HTA subnets.
    - Using a non-quantized DLC file with the AIP runtime is not supported.
    - HTA subnets use the quantized parameters in the DLC.
    - HNN (Hexagon NN) subnets use the quantization parameters in the
same way DSP runtime does.
- Balancing DLC file size, network initialization time and accuracy

    - If the network will mainly run on the GPU and CPU it is
recommended to try both quantized and non-quantized models during
development. If a quantized model provides enough accuracy, then
it may be used directly for CPU using CPU fixed-point mode. For GPU,
it may be used at the expense of increased network initialization
time. The benefit is a much smaller DLC file. The tradeoff between
accuracy, network initialization time, and DLC file size is
application specific.
    - If the network will mainly run on the DSP, there is no benefit to
using a non-quantized model. As previously stated it will
dramatically increase network initialization time and DLC file
size, but provide no accuracy benefit.

**Quantization Algorithm**

This section describes the concepts behind the quantization algorithm used in Qualcomm® Neural Processing SDK.
These concepts are used by snpe-dlc-quantize and are also
used by Qualcomm® Neural Processing SDK for input quantization when using the DSP runtime.

Overview

**Note:** Qualcomm® Neural Processing SDK supports multiple quantization modes. The basics of the
quantization, regardless of mode, are described here. See Quantization
Modes for more
information.

> 
> 
> - Quantization converts floating point data to TensorFlow-style 8-bit fixed point format.
> - The following requirements are satisfied:
> 
>     - Full range of input values is covered.
>     - Minimum range of 0.0001 is enforced.
>     - Floating point zero is exactly representable.
> - Quantization algorithm inputs:
> 
>     - Set of floating point values to be quantized.
> - Quantization algorithm outputs:
> 
>     - Set of 8-bit fixed point values.
>     - Encoding parameters:
> 
>         - encoding-min : minimum floating point value representable (by fixed point value 0).
>         - encoding-max : maximum floating point value representable (by fixed point value 255).
> - Algorithm
> 
>     1. Compute the true range (min, max) of input data.
>     2. Compute the encoding-min and encoding-max.
>     3. Quantize the input floating point values.
>     4. Output:
> 
>         - fixed point values
>         - encoding-min and encoding-max parameters

Details

This section outlines more information regarding the quantization process.

> 
> 
> 1. Compute the true range of the input floating point data.
> 
>     - finds the smallest and largest values in the input data
>     - represents the true range of the input data
> 2. Compute the encoding-min and encoding-max.
> 
>     - These parameters are used in the quantization step.
>     - These parameters define the range and floating point values that
> will be representable by the fixed point format.
> 
>         - encoding-min: specifies the smallest floating point value that
> will be represented by the fixed point value of 0
>         - encoding-max: specifies the largest floating point value that
> will be represented by the fixed point value of 255
>         - floating point values at every step size, where step size =
> (encoding-max - encoding-min) / 255, will be representable
> 
> 
>     1. encoding-min and encoding-max are first set to the true min and
> true max computed in the previous step
>     2. First requirement: encoding range must be at least a minimum of 0.0001
> 
>         - encoding-max is adjusted to max(true max, true min + 0.0001)
>     3. Second requirement: floating point value of 0 must be exactly
> representable
> 
>         - encoding-min or encoding-max may be further adjusted
> 3. Handling 0.
> 
>     1. Case 1: Inputs are strictly positive
> 
>         - the encoding-min is set to 0.0
>         - zero floating point value is exactly representable by smallest
> fixed point value 0
>         - e.g. input range = [5.0, 10.0]
> 
>             - encoding-min = 0.0, encoding-max = 10.0
>     2. Case 2: Inputs are strictly negative
> 
>         - encoding-max is set to 0.0
>         - zero floating point value is exactly representable by the
> largest fixed point value 255
>         - e.g. input range = [-20.0, -6.0]
> 
>             - encoding-min = -20.0, encoding-max = 0.0
>     3. Case 3: Inputs are both negative and positive
> 
>         - encoding-min and encoding-max are slightly shifted to make the
> floating point zero exactly representable
>         - e.g. input range = [-5.1, 5.1]
> 
>             - encoding-min and encoding-max are first set to -5.1 and 5.1,
> respectively
>             - encoding range is 10.2 and the step size is 10.2/255 = 0.04
>             - zero value is currently not representable. The closest
> values representable are -0.02 and +0.02 by fixed point
> values 127 and 128, respectively
>             - encoding-min and encoding-max are shifted by -0.02. The new
> encoding-min is -5.12 and the new encoding-max is 5.08
>             - floating point zero is now exactly representable by the
> fixed point value of 128
> 4. Quantize the input floating point values.
> 
>     - encoding-min and encoding-max parameter determined in the previous
> step are used to quantize all the input floating values to their
> fixed point representation
>     - Quantization formula is:
> 
>         - quantized value = round(255 \* (floating point value -
> encoding.min) / (encoding.max - encoding.min))
>     - quantized value is also clamped to be within 0 and 255
> 5. Outputs
> 
>     - the fixed point values
>     - encoding-min and encoding-max parameters

Quantization Example

- Inputs:

    - input values = [-1.8, -1.0, 0, 0.5]
- encoding-min is set to -1.8 and encoding-max to 0.5
- encoding range is 2.3, which is larger than the required 0.0001
- encoding-min is adjusted to −1.803922 and encoding-max to 0.496078 to
make zero exactly representable
- step size (delta or scale) is 0.009020
- Outputs:

    - quantized values are [0, 89, 200, 255]

Dequantization Example

- Inputs:

    - quantized values = [0, 89, 200, 255]
    - encoding-min = −1.803922, encoding-max = 0.496078
- step size is 0.009020
- Outputs:

    - dequantized values = [−1.8039, −1.0011, 0.0000, 0.4961]
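The documented steps can be sketched in a few lines of Python. This is an illustrative re-implementation, not the SDK's actual code (Python's `round()` uses banker's rounding, so tie-breaking may differ from the SDK's), and it reproduces the example values above:

```python
def quantize(values, steps=255):
    """Sketch of the default 8-bit quantization algorithm described above."""
    true_min, true_max = min(values), max(values)
    # Requirement 1: enforce the minimum encoding range of 0.0001.
    enc_min, enc_max = true_min, max(true_max, true_min + 0.0001)
    # Requirement 2: make floating point 0.0 exactly representable.
    if enc_min > 0.0:            # strictly positive inputs
        enc_min = 0.0
    elif enc_max < 0.0:          # strictly negative inputs
        enc_max = 0.0
    else:                        # shift the grid so a grid point lands on 0.0
        step = (enc_max - enc_min) / steps
        zero_offset = enc_min + round(-enc_min / step) * step  # grid point nearest 0
        enc_min -= zero_offset
        enc_max -= zero_offset
    scale = (enc_max - enc_min) / steps
    q = [max(0, min(steps, round((v - enc_min) / scale))) for v in values]
    return q, enc_min, enc_max

def dequantize(q, enc_min, enc_max, steps=255):
    scale = (enc_max - enc_min) / steps
    return [enc_min + qi * scale for qi in q]

q, lo, hi = quantize([-1.8, -1.0, 0.0, 0.5])
assert q == [0, 89, 200, 255]        # matches the quantization example above
```

Running `dequantize(q, lo, hi)` recovers the values from the dequantization example, with zero reconstructed exactly.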

Bias BitWidth

Qualcomm® Neural Processing SDK currently supports a default quantization bit width of 8 for both
weights and biases. The bias bitwidth, however, can be overridden to use
32 bit quantization by specifying the command line option
“--bias\_bitwidth 32” to snpe-dlc-quantize.
For some models, using 32 bit biases may give a small improvement in accuracy.
Unfortunately it is difficult to predict which models may benefit from
this since model architectures, weight distributions, etc all have an
impact on quantization performance.
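To see why a wider bias bitwidth can help, compare the worst-case rounding error at 8 and 32 bits over the same value range. The snippet below is purely illustrative (the bias values are made up) and does not reflect the SDK's internals:

```python
# Illustrative comparison of 8-bit vs 32-bit fixed point rounding error
# for a hypothetical bias vector.
biases = [-0.37, 0.0, 0.125, 0.9]
lo, hi = min(biases), max(biases)

def fixed_point_error(values, lo, hi, bitwidth):
    """Largest absolute reconstruction error after quantize/dequantize."""
    steps = 2 ** bitwidth - 1
    scale = (hi - lo) / steps
    return max(abs(v - (lo + round((v - lo) / scale) * scale)) for v in values)

err8 = fixed_point_error(biases, lo, hi, 8)    # step ~0.005, error up to ~0.0025
err32 = fixed_point_error(biases, lo, hi, 32)  # step ~3e-10, essentially exact
assert err32 < err8
```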

Activation BitWidth

Qualcomm® Neural Processing SDK also supports a quantization bitwidth of 16 for activations (see Notes).

To enable 16-bit fixed point inference, specify a quantization bitwidth of
16 for activations while keeping that of weights at 8. Passing the
command line options “--act\_bitwidth 16 --weights\_bitwidth 8” to
snpe-dlc-quantize will generate quantized model files with 16-bit
activations and 8-bit weights.

It is recommended to use UserBuffer TF16 as the input/output data format for
better efficiency. In this case, users of Qualcomm® Neural Processing SDK need to quantize/dequantize
input/output data on their own if floating point data are used. When
testing with snpe-net-run, the command line option “--userbuffer\_tfN 16” can
be used to select UserBuffer TF16 mode. The ITensor and UserBuffer floating point
formats can still be used with 16-bit integer inference, with less efficient
quantization applied internally.
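The precision gain is easy to see from the step sizes: for the same encoding range, 16-bit activations have a step 257 times smaller than 8-bit ones. A small illustrative calculation (the [0, 6.0] range is an arbitrary example, not an SDK default):

```python
# Step size comparison for an example encoding range of [0, 6.0].
enc_min, enc_max = 0.0, 6.0
step8 = (enc_max - enc_min) / 255      # 8-bit:  255 steps, step ~0.0235
step16 = (enc_max - enc_min) / 65535   # 16-bit: 65535 steps, step ~0.0000916
assert step16 < step8                  # 65535 / 255 = 257x finer resolution
```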

Packed 4-bit quantization

In packed 4-bit quantization, two 4-bit quantized tensors can be stored in a single 8-bit buffer. The lower nibble
stores the first value while the higher nibble stores the second value. This can be enabled by providing the
“--pack\_4\_bit\_weights” option to snpe-dlc-quantize. For the quantized values (10, 4), the
unpacked and packed representation is given below.

- Unpacked = (0000 1010, 0000 0100)
- Packed   = (0100 1010)

In case of per-channel/per-row quantization, the quantized values are packed along each channel/row. For a tensor of size
(3,3,3,32) containing 32 output channels and 27 values per channel, the unpacked and packed representation will take the
following amount of memory for the 27 quantized values per channel.

- Unpacked = (3\*3\*3) = 27 bytes
- Packed   = ceil((3\*3\*3)/2) = 14 bytes
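The nibble packing above can be sketched in Python; `pack_4bit` is a hypothetical helper written to match the lower-nibble-first layout described, not an SDK API:

```python
import math

def pack_4bit(values):
    """Pack pairs of 4-bit quantized values into single bytes, lower nibble first."""
    packed = []
    for i in range(0, len(values), 2):
        low = values[i] & 0x0F
        high = (values[i + 1] & 0x0F) << 4 if i + 1 < len(values) else 0
        packed.append(high | low)
    return packed

# (0000 1010, 0000 0100) packs into (0100 1010), as in the example above.
assert pack_4bit([10, 4]) == [0b01001010]
# 27 values per channel pack into ceil(27/2) = 14 bytes.
assert len(pack_4bit([1] * 27)) == math.ceil(27 / 2)
```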

**Note** Packed 4-bit tensors are stored with QNN\_DATATYPE\_SFIXED\_POINT\_4/QNN\_DATATYPE\_UFIXED\_POINT\_4 datatypes while
unpacked 4-bit tensors are stored with QNN\_DATATYPE\_SFIXED\_POINT\_8/QNN\_DATATYPE\_UFIXED\_POINT\_8 datatypes. Please refer
to the backend supplements to find ops which support 4-bit packed tensors.

Quantization Modes

Qualcomm® Neural Processing SDK supports multiple quantization modes; the difference is in how
quantization parameters are chosen.

Default Quantization Mode

The default mode has been described above, and uses the true min/max of
the data being quantized, followed by an adjustment of the range to
ensure a minimum range and to ensure 0.0 is exactly quantizable.

Enhanced Quantization Mode

Enhanced quantization mode (invoked by using the
“use\_enhanced\_quantizer” parameter to
snpe-dlc-quantize) uses an
algorithm to try to determine a better set of quantization parameters to
improve accuracy. The algorithm may pick a different min/max value than
the default quantizer, and in some cases it may set the range such that
some of the original weights and/or activations cannot fall into that
range. However, this range does produce better accuracy than simply
using the true min/max. The enhanced quantizer can be enabled
independently for weights and activations by appending either “weights”
or “activations” after the option.

This is useful for some models where the weights and/or activations may
have “long tails”. (Imagine a range with most values between -100 and
1000, but a few values much greater than 1000 or much less than -100.)
In some cases these long tails can be ignored and the range -100, 1000
can be used more effectively than the full range.

Enhanced quantizer still enforces a minimum range and ensures 0.0 is
exactly quantizable.
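The long-tail intuition can be demonstrated with a toy experiment. Percentile-style clipping below is only a stand-in for the enhanced quantizer's (unpublished) range search; the point is simply that ignoring outliers reduces error on the bulk of the values:

```python
# Toy demo: quantization error on the bulk of a long-tailed distribution.
inliers = [i * 1.1 - 100.0 for i in range(1001)]   # spread over [-100, 1000]
data = inliers + [25000.0, 30000.0]                # a few extreme outliers

def mean_abs_error(values, enc_min, enc_max, steps=255):
    """Mean |x - dequantize(quantize(x))| over values for a given range."""
    scale = (enc_max - enc_min) / steps
    total = 0.0
    for v in values:
        q = max(0, min(steps, round((v - enc_min) / scale)))
        total += abs(v - (enc_min + q * scale))
    return total / len(values)

full = mean_abs_error(inliers, min(data), max(data))  # range covers the tail
clipped = mean_abs_error(inliers, -100.0, 1000.0)     # tail ignored
assert clipped < full   # far less error on the values that dominate the data
```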

Adjusted Weights Quantization Mode

This mode is used only for quantizing weights to 8 bit fixed
point (invoked by using the “use\_adjusted\_weights\_quantizer” parameter to
snpe-dlc-quantize). Rather than the true min/max, or the min/max that
excludes the long tail, it uses an adjusted min or max of the data being
quantized. This has been verified to provide an accuracy benefit for
denoising models specifically. Using this quantizer, the max will be
expanded or the min will be decreased if necessary.

Adjusted weights quantizer still enforces a minimum range and ensures
0.0 is exactly quantizable.

Enhanced Quantization Techniques

Quantization can be a difficult problem to solve due to the myriad of
training techniques, model architectures, and layer types. To
mitigate quantization problems, a model preprocessing technique has been
added to snpe-dlc-quantize that may improve quantization performance on
models which exhibit sharp drops in accuracy upon quantization.

The technique introduced is CLE (Cross Layer Equalization).

CLE works by scaling the convolution weight ranges in the network,
making use of a scale-equivariance property of activation functions. In
addition, the process absorbs high biases, which may result from the
weight scaling, from one convolution layer into a subsequent convolution
layer.
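The scale-equivariance idea can be sketched for two fully-connected layers with a ReLU between them (relu(s·x) = s·relu(x) for s > 0). This follows the cross-layer equalization rule from Nagel et al., not SNPE's actual implementation, and uses dense weights rather than convolutions for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two dense layers with very different per-channel weight ranges.
W1 = rng.normal(size=(4, 8)) * np.array([0.01, 0.1, 1.0, 10.0])[:, None]
b1 = rng.normal(size=4)
W2 = rng.normal(size=(3, 4))

def forward(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)   # ReLU between the layers

# Per-channel ranges: r1 over W1's output channels, r2 over W2's input channels.
r1 = np.abs(W1).max(axis=1)
r2 = np.abs(W2).max(axis=0)
s = np.sqrt(r1 * r2) / r2                      # per-channel scale factors
W1_eq, b1_eq, W2_eq = W1 / s[:, None], b1 / s, W2 * s[None, :]

x = rng.normal(size=8)
# The network function is unchanged (ReLU is positive-scale equivariant)...
assert np.allclose(forward(x, W1, b1, W2), forward(x, W1_eq, b1_eq, W2_eq))
# ...but the per-channel ranges of the two layers are now equalized.
assert np.allclose(np.abs(W1_eq).max(axis=1), np.abs(W2_eq).max(axis=0))
```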

Enhanced Quantization Techniques: Limitations

In many cases, CLE may enable quantized models to return to close to
their original floating-point accuracy. There are some
caveats/limitations to the current algorithms:

> 
> 
> 1. CLE operates on specific patterns of operations that all exist in a
> single branch (outputs cannot be consumed by more than one op). The
> matched operation patterns (r=required, o=optional) are:
> 
>     - Conv(r)->Batchnorm(r)->activation(o)->Conv(r)->Batchnorm(r)->activation(o)
>     - Conv(r)->Batchnorm(r)->activation(o)->DepthwiseConv(r)->Batchnorm(r)->activation(o)->Conv(r)->Batchnorm(r)->activation(o)
> 2. The CLE algorithm currently only supports Relu activations. Any Relu6
> activations will be automatically changed to Relu and any activations
> other than these will cause the algorithm to ignore the preceding
> convolution. Typically the switch from Relu6 to Relu is harmless and
> does not cause any degradation in accuracy, however some models may
> exhibit a slight degradation of accuracy. In this case, CLE can
> only recover accuracy to that degraded level, and not to the original
> float accuracy.
> 3. CLE requires batchnorms (specifically detectable batchnorm beta/gamma
> data) to be present in the original model before conversion to DLC for
> the complete algorithm to run and to regain maximum accuracy. For
> TensorFlow, the beta and gamma can sometimes still be found even with
> folded batchnorms, so long as the folding didn’t fold the parameters
> into the convolution’s static weights and bias. If the required
> information is not detected, you may see a message that looks like:
> “Invalid model for HBA quantization algorithm.” This indicates the
> algorithm will only partially run, and accuracy issues may be present.

To run CLE, pass “--optimizations cle” to
snpe-dlc-quantize.

The original converted float model should always be used as input to
snpe-dlc-quantize. Passing quantized models back to the quantizer is not
supported and will result in undefined behavior.

More information about the algorithms can be found
here: [https://arxiv.org/abs/1906.04721](https://arxiv.org/abs/1906.04721)

Quantization Impacts

Quantizing a model and/or running it in a quantized runtime (like the
DSP) can affect accuracy. Some models may not work well when quantized,
and may yield incorrect results. The metrics for measuring the impact of
quantization on a classification model are typically “Mean
Average Precision”, “Top-1 Error” and “Top-5 Error”. These metrics are
published in Qualcomm® Neural Processing SDK release notes for various models.

Mixed Precision and FP16 Support

Mixed Precision enables specifying different bit widths (e.g. 8 or 16) or
datatypes (integer or floating point) for different operations within the same graph.
Data type conversion operations are automatically inserted when activation precision or
data type is different between successive operations. Graphs can have a mix of floating-point
and fixed-point data types. Each operation can have different precision for weights and
activations. However, for a particular operation, either all inputs, outputs and parameters
(weights/biases) will be floating-point or all will be fixed-point format.

Quantization Overrides

If the option --quantization\_overrides is provided during model
conversion, the user can provide a json file with parameters to use for
quantization. These will be cached along with the model and can be used
to override any quantization data carried over from conversion (e.g. TF fake
quantization) or calculated during the normal quantization process in
snpe-dlc-quantize. To override the params during snpe-dlc-quantize, the
option --override\_params must be passed, and the cached values will be
used instead. The json format is defined as per the AIMET specification and
can be found below.

There are two sections in the json, a section for overriding operator
output encodings called “activation\_encodings” and a section for
overriding parameter (weight and bias) encodings called
“param\_encodings”. Both must be present in the file, but can be empty if
no overrides are desired.

An example with all of the currently supported options:

{
      "activation_encodings": {
          "Conv1:0": [
              {
                  "bitwidth": 8,
                  "max": 12.82344407824954,
                  "min": 0.0,
                  "offset": 0,
                  "scale": 0.050288015993135454
              }
          ],
          "input:0": [
              {
                  "bitwidth": 8,
                  "max": 0.9960872825108046,
                  "min": -1.0039304197656937,
                  "offset": 127,
                  "scale": 0.007843206675594112
              }
          ]
      },
      "param_encodings": {
          "Conv2d/weights": [
              {
                  "bitwidth": 8,
                  "max": 1.700559472933134,
                  "min": -2.1006477158567995,
                  "offset": 140,
                  "scale": 0.01490669485799974
              }
          ]
      }
    }

Under “activation\_encodings” the names (e.g. “Conv1:0”) represent the
output tensor names where quantization should be overridden. Under
“param\_encodings” the names represent the weights or biases for which
the encodings will be specified. A brief breakdown of the common
parameters:

- bitwidth (int, required) - The bitwidth to use for quantization. Note
that this must match the existing bitwidth support for the runtime on
which the model will be run.
- max (float, required) - The largest number in the distribution or
desired range.
- min (float, required) - The smallest number in the distribution or
desired range.
- offset (int) - The integer offset indicating the zero point (i.e. the
point at which 0 is exactly represented).
- scale (float) - The step size: the desired distribution range divided
by the number of integer steps (e.g. 255 for 8-bit).

Note that it is not required to provide scale (also referred to as
delta) and offset (zero point), but bitwidth, min, and max must be provided.
Scale and offset will be calculated from the provided bitwidth, min, and
max parameters regardless of whether they are provided.
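The scale values in the example json above follow directly from bitwidth, min, and max. The offset formula shown is a common zero-point convention and an assumption here, since the SDK's exact rounding is not documented:

```python
def derive_encoding(bitwidth, enc_min, enc_max):
    """Derive scale and offset from bitwidth/min/max (offset convention assumed)."""
    steps = 2 ** bitwidth - 1
    scale = (enc_max - enc_min) / steps
    offset = round(-enc_min / scale)   # common zero-point convention (assumption)
    return scale, offset

# Reproduces the "input:0" scale from the example json above.
scale, _ = derive_encoding(8, -1.0039304197656937, 0.9960872825108046)
assert abs(scale - 0.007843206675594112) < 1e-12

# And the "Conv2d/weights" scale.
scale, _ = derive_encoding(8, -2.1006477158567995, 1.700559472933134)
assert abs(scale - 0.01490669485799974) < 1e-12
```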

Note: A quantization bit width of 16 for activations is supported from
Snapdragon 865/765 onwards on certain runtimes, and is currently not
enabled for all ops.

Float16 (half-precision) support additionally enables converting entire models to FP16, or selecting between FP16 and FP32
data types for the float ops in mixed precision graphs that contain a mix of floating point and integer ops. The
different modes of using mixed precision are described below.

- No override: If no --quantization\_overrides flag is given with an encoding file, all activations are quantized as per
--act\_bitwidth (default 8) and parameters are quantized as per --weight\_bitwidth/--bias\_bitwidth (default 8/8)
respectively.
- Full override: If the --quantization\_overrides flag is given along with an encoding file specifying encodings for all ops in
the model, the bitwidth will be set as per the JSON for all ops, each defined as integer or float per the encoding
file (dtype=’int’ or dtype=’float’ in the encoding json).
- Partial override: If the --quantization\_overrides flag is given along with an encoding file specifying partial encodings
(i.e. encodings are missing for some ops), the following will happen.

> 
> 
> - Layers for which encodings are NOT available in the json file are encoded in the same manner as the no override case,
> i.e. defined as integer with bitwidth defined as per --act\_bitwidth/--weight\_bitwidth/--bias\_bitwidth
> (or their default values 8/8/8).
> For some ops (Conv2d, Conv3d, TransposeConv2d, DepthwiseConv2d, FullyConnected, MatMul), even if any of the
> output/weights/bias are specified as float in the encoding file, all three of them will be overridden to float.
> The float bitwidth used will be the same as the float bitwidth of the overriding tensor in the encodings file. The
> bitwidth of the bias tensor in such a case (if encodings for it are absent in the encodings
> json and present for output/weights) can also be controlled manually with the --float\_bias\_bitwidth (16/32) flag.
> - Layers for which encodings are available in the json are encoded in the same manner as the full override case.

We show a sample json for a network with 3 Conv2d ops. The first and third Conv2d ops are INT8, while the second Conv2d op
is marked as FP32. The FP32 op (namely conv2\_1) is sandwiched between two INT8 ops in “activation\_encodings”, hence
convert ops will be inserted before and after it. The corresponding weights and biases for conv2\_1 are also
marked as floating-point in the JSON in “param\_encodings”.

{
           "activation_encodings": {
               "data_0": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ],
               "conv1_1": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ],
               "conv2_1": [
                   {
                       "bitwidth": 32,
                       "dtype": "float"
                   }
               ],
               "conv3_1": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ]
           },
           "param_encodings": {
               "conv1_w_0": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ],
               "conv1_b_0": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ],
               "conv2_w_0": [
                   {
                       "bitwidth": 32,
                       "dtype": "float"
                   }
               ],
               "conv2_b_0": [
                   {
                       "bitwidth": 32,
                       "dtype": "float"
                   }
               ],
               "conv3_w_0": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ],
               "conv3_b_0": [
                   {
                       "bitwidth": 8,
                       "dtype": "int"
                   }
               ]
           }
        }

The ops that are not present in the json will be assumed to be fixed-point, and the bit widths will be selected according to
--act\_bitwidth/--weight\_bitwidth/--bias\_bitwidth respectively. For example, in the json below only conv2\_1 is overridden to float:

{
           "activation_encodings": {
               "conv2_1": [
                   {
                       "bitwidth": 32,
                       "dtype": "float"
                   }
               ]
           },
           "param_encodings": {
               "conv2_w_0": [
                   {
                       "bitwidth": 32,
                       "dtype": "float"
                   }
               ],
               "conv2_b_0": [
                   {
                       "bitwidth": 32,
                       "dtype": "float"
                   }
               ]
           }
        }

The following quantized mixed-precision graph will be generated based on the JSON shown above. Please note that the
convert operations are added appropriately to convert between float and int types and vice-versa.

![Quantized mixed-precision graph with convert ops](../images/snpe_quantization_mp_graph.png)
RBynKjC+jVHOvl5oxHWXmnCtFKxHL2XzJ7Hr8W8N+xHWZdlWVJDHS1i1cAMyGVMqPW3vnBP1/9YzHjoYajZ4QIGOYgHjpiHVdKJD776Es7lrL/khbwsTXdf49cxQr+ssQCI/Klx3nvxMCHjud6p7VAzp0hf1QCuXWjdBw1Q+lYI9r97D2EZRKffbSl7D/Ontdz/z32g9E7cJcksQvWVkMeo2zPT2ScRwQMMhTaC+TcGfJluUwdePq4RdvwcxUNgDiPqcIeEJdJfPbRlrH/3gYGImu//wybMxetpUgSucBRqq2apEUYAFPDVj73w0LPMegB4QI5b4YsvfZdB704leRtICSX+ViylW6neQgj25stk/Tssy1h/70D8DGJNd9/A6BH55Txlx5aqI/RIS634yshm6MoEzGeKkdmUtzcftvIAXvbmL2/QC5WyAXU3nOgeFsBXWovtwPlSJzPGEBUKuHZZ1vC/hs5hii/f9NtrQl246WI2gkS+rBzHaAdK7KeumRtTIiU8U7ufCFDl5ySltTCTGQzQ5ZV+4eCUwOjiHac5qCfK1Aq2dmHW8L+OzRRfv8k9qGJrG8WN9V/YjxIxFSKFFPI1iZ1MbvxLxZV+0lkj89FFVecfTLz0kLvdxLveTya654CCVtZn/wnUcfjaSHnlWeipIiqJeq/Uu4fsiQRV8mxHgVWR03cW6sqJX/0gFhq9rE3TFuOVxDt/Gw3nE93pszmpgIALGRl+Lryw068/vhPekX2xY62AcAnm62xZqj5TNgvvIreyfy3HS9r/+1e5OjRrimqUvJHD4iVCpUzSpUjuUzs2DzJf28ePKjcrlvsAjttQMVUw2YbQJgeQk3WHv/FDlp2ksNQc56XPStgRwA3Ls7IgV/0F7doUS9isqT9t3vjTT06NUVVSv7oAbFQYwdBKqObRwW2O216liIn5LCinYMibRIDY/L47KbCJK+YbAIZarLu+M8WOU/EBSf6/HJMvrfvoDuH/uBm/taL7uha3X/ekvbfuuzu+7ZZZh+7+m34CxIBo2JTN/uxmSCXHbS1A4cGDV1ytnvHcadrjv8OgTiPR8CALimQMTCghnJzGEVijoIEgH6Vw3L23wTz6bQ4gT+gM/4POx7CVKzvOsrndEyJ8gF4Qgl8oDVTzRxZ9nLu+64yOtw3TctgFYL1xn/cBjp+NPFZM6Ahxtmw5bjNUcYOOFe0G2pMfY7IL22R9DzO+jFdBt7kgU5nBvXfxSvaa0+7lqnjZNEXXz/N1EVHm3CHaDHJAbyQKNbGf4E3iQI31umbL/9kMF2cWm/8BxLDSDfgnoGOcaK9OHvHQo+hfDjzWz82XTHsgV0AmdZNzZcRENJdZ81/F62QfRUziu1QIOk42oFYm8rERcv3oMZ5yOitrFUK45lCwLeTFEDXct/RTt7a87aBwHzddKjPArrW+C/qOejbZIYKjm1nJoLgsXLEmm8OFOcgse8C8MUEWqMoyLZk9Be3mND+j6+cVqL776KVkYOxmZGQJcX6h/Xt0odzzjWkPdqu66BFP6wtXhOt7+K0Qp0eEOgt4XCd8Z8+R5AFQvOUSQDwZqzDdAg098zTYRvA2ICb0NEx0wx2Cdk9UqHBfxep6BsUM6SVMuNB9Ud919meFtCc9XXZCBMxens0iXygZ3j+kBh+Otb1xX85WPOc38eTkesgmnP+xaSngIhgMq2R2EX7kHhR999FM0OqKwIC6Xt60J8AwoHybR55s+D7WgMXAu3UQLAuYVKT97NxvK05Z2Jw4Uj/YdzaI5kHD66eiua/C2VM7YpsopPz4T2l/dir+YxnJJE1NndyBLSS9Y1pHFm9u2Eev8z5sFUAstOx5gLo6su+oGUwNx1HGU60HUd6pAJwP0klh67+Kz7og98DwljRZ8O4+Jriv3xGQl1HPqUNf5qfxRpcmKQ+7a3Mja+Pimr+u1iFtkXFac0I6Hg2518NTDquuaiJfuMtRaYVShiYZs8SGY64WsCuL/4zQ4xjAV
8ssEeFdUR6RW/9sZe9VZwp9awHFdOFjTX/Xajy149eFYUBj2XqAtt+sUn+H5EJVUBIfzJYZiGCdkCDqusCa4z//gPK09zW0W7zoGLTSFeR7cscEOd19ohmz1dAN9X8d7HKn18AKQj5Xhgm+e9IWgJdn9x4y3AbAXrB8GYOLvw1xn//e0VsuLVDV4FqWBgTF2iLxDv6r31B/XfxysRBX4pGZI9LneQ/2wIl9jjnngIFou8gXov8Zx1YSdhgHLvoe4MkmuzYWjb57+IV6Zf0pwAqiZLmSV1v/Pcfi4xX9e8CFgkc2KpgFXjd8V8Kq8DphSwSOMOqcKgGa4//ruI1/bugReLz8W5k4xLyS9Ym/00jm5xr4iR9bcYJzv5f/qP/9J/+03/6T//pP/2n//Tfb6D+/PX65at/KpFbXfYJvvLHr9Yt3/jjF0vkVpd9Apa//LVmg//tZYlxq8v+vW72V/pP/+k//af/9J/+03/6T//pP/2n/3zRu799vwS/xn/88tfvj+B3v/qvN/793afviwj85xsA)

Per-channel Quantization Overrides

Per-channel quantization should be used for tensors that are weight inputs to Conv consumers (Conv2d, Conv3d,
TransposeConv2d, DepthwiseConv2d). This section provides examples of manually overriding the per-channel encodings
for these Conv-based op weight tensors.
Per-channel quantization is used when multiple encodings (one per output channel) are provided for a
given tensor.
The following cases show example encodings for a convolution weight tensor.

- Case 1: Asymmetric encodings without per-channel quantization

{
    "features.9.conv.3.weight": [
        {
            "bitwidth": 8,
            "is_symmetric": "False",
            "max": 3.0387749017453665,
            "min": -2.059169834735364,
            "offset": -103,
            "scale": 0.019991940143061618
        }
    ]
}
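The scale and offset fields in Case 1 are consistent with the standard affine (TensorFlow-style) fixed-point scheme mentioned in the Overview. The sketch below is an illustration of that arithmetic, not the tool's documented algorithm; in particular it assumes no additional min/max nudging is applied. It reproduces the Case 1 fields from the recorded min and max:

```python
def asymmetric_encoding(t_min, t_max, bitwidth=8):
    """Derive scale and offset for an asymmetric fixed-point encoding
    from an observed floating-point range (illustrative sketch)."""
    num_steps = 2 ** bitwidth - 1        # 255 representable steps for 8 bit
    scale = (t_max - t_min) / num_steps  # real-valued size of one step
    offset = round(t_min / scale)        # zero-point, stored as a negative int
    return scale, offset

# The min/max from the Case 1 encoding above:
scale, offset = asymmetric_encoding(-2.059169834735364, 3.0387749017453665)
# scale  ~= 0.019991940143061618  (the "scale" field above)
# offset == -103                  (the "offset" field above)
```

Note that with this scheme, dequantization is `real = scale * (q + offset)`, so `min == scale * offset`, which also matches the Case 1 numbers.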

- Case 2: Per-channel quantization encodings with 3 output channels

{
    "features.8.conv.3.weight": [
        {
            "bitwidth": 8,
            "is_symmetric": "True",
            "max": 0.7011175155639648,
            "min": -0.7066381259227362,
            "offset": -128.0,
            "scale": 0.005520610358771377
        },
        {
            "bitwidth": 8,
            "is_symmetric": "True",
            "max": 0.5228064656257629,
            "min": -0.5269230519692729,
            "offset": -128.0,
            "scale": 0.004116586343509945
        },
        {
            "bitwidth": 8,
            "is_symmetric": "True",
            "max": 0.7368279099464417,
            "min": -0.7426297045129491,
            "offset": -128.0,
            "scale": 0.005801794566507415
        }
    ]
}

**Note:** Per-channel quantization must use a symmetric representation with offset == -2^(bitwidth-1). Per-channel
encodings always have `is_symmetric` set to True.
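Consistent with the note above, each per-channel encoding in Case 2 can be derived from that channel's absolute-maximum weight value, with the offset fixed at -2^(bitwidth-1). The following sketch is an inference from the Case 2 numbers, not the tool's documented algorithm, and the per-channel abs-max values are taken from the min fields above:

```python
def symmetric_channel_encoding(abs_max, bitwidth=8):
    """Build one symmetric per-channel encoding entry from the channel's
    absolute-maximum weight value (illustrative sketch)."""
    half = 2 ** (bitwidth - 1)          # 128 for 8 bit
    scale = abs_max / half              # step size; abs_max maps to -128
    return {
        "bitwidth": bitwidth,
        "is_symmetric": "True",
        "max": scale * (half - 1),      # largest positive representable value
        "min": -scale * half,           # == -abs_max
        "offset": float(-half),         # always -2^(bitwidth-1)
        "scale": scale,
    }

# One encoding per output channel, matching the three Case 2 entries:
per_channel = [symmetric_channel_encoding(m)
               for m in (0.7066381259227362,
                         0.5269230519692729,
                         0.7426297045129491)]
```

This reproduces, for example, the first channel's scale of 0.005520610358771377 and max of 0.7011175155639648.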

INT32 Overrides

INT32 overrides can also be provided to run an op in INT32 precision. To support running
an op in INT32 precision, INT32 overrides must be provided for all of its inputs and outputs.
For a quantized model, this injects a Dequantize op followed by a Cast (to: INT32) op at each input of the
op, and a Cast (to: FP32) op followed by a Quantize op at each output. The sample
graph below shows an op "Op2" whose input and output tensors are overridden to INT32 through
external overrides. This in turn generates the second graph, which supports the INT32 overrides through
the use of Dequantize, Cast (to: INT32), Cast (to: FP32), and Quantize ops.
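The injected op sequence can be illustrated numerically. The sketch below walks a single activation value through Dequantize, Cast (to: INT32), a toy INT32 op standing in for "Op2", Cast (to: FP32), and Quantize; the encoding values and the op itself are made up for illustration and are not from the document:

```python
def dequantize(q, scale, offset):
    """Fixed-point -> float, affine scheme: real = scale * (q + offset)."""
    return (q + offset) * scale

def quantize(x, scale, offset, bitwidth=8):
    """Float -> fixed-point, clamped to the unsigned range."""
    q = round(x / scale) - offset
    return max(0, min(2 ** bitwidth - 1, q))

scale, offset = 0.05, -128   # example 8-bit encoding (illustrative only)
q_in = 200                   # quantized uint8 activation entering the subgraph

x = dequantize(q_in, scale, offset)   # Dequantize      -> ~3.6
i = int(x)                            # Cast (to: INT32) (truncation here)
y = i + 1                             # "Op2" runs in INT32 (toy op)
f = float(y)                          # Cast (to: FP32)
q_out = quantize(f, scale, offset)    # Quantize back to uint8 -> 208
```

The surrounding graph stays in the quantized domain; only the overridden op sees INT32 values.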

![Graphs showing the original op "Op2" with INT32 overrides, and the generated graph with injected Dequantize, Cast, and Quantize ops](../images/quantization_int32_graph.png)

**Note:** INT32 overrides are only supported for ops that do not have weights and biases.

Last Published: May 06, 2026
