# Quantization

This page describes the general quantization process and supported algorithms and features.

- [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#overview)
- [Quantization](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id1)

    - [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id2)
- [Quantization Schema](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantization-schema)

    - [Quantization - Further Details](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantization-further-details)
    - [Quantization Example](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantization-example)
    - [Dequantization Example](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#dequantization-example)
    - [Bitwidth Selection](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#bitwidth-selection)
    - [Packed 4-bit Quantization](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#packed-4-bit-quantization)
- [Quantization Modes](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantization-modes)

    - [TF](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#tf)
    - [Symmetric](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#symmetric)
    - [Enhanced](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#enhanced)
    - [TF Adjusted](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#tf-adjusted)
- [Enhanced Quantization Techniques](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#enhanced-quantization-techniques)

    - [Enhanced Quantization Techniques: Limitations](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#enhanced-quantization-techniques-limitations)
- [Quantization Impacts](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantization-impacts)
- [Quantization Overrides](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantization-overrides)
- [Per-channel Quantization Overrides](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#per-channel-quantization-overrides)
- [INT32 Overrides](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#int32-overrides)
- [Quantizing a Model](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantizing-a-model)

    - [Examples](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#examples)
- [Mixed Precision and FP16 Support](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#mixed-precision-and-fp16-support)

    - [Non-quantized Mode](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#non-quantized-mode)
    - [Quantized Mode](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#quantized-mode)
- [qairt-quantizer](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#qairt-quantizer)

    - [Additional details](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#additional-details)

## [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id3)

Non-quantized model files use 32-bit floating point representations of network parameters.
Quantized model files use fixed point representations of network parameters, generally 8-bit weights and 8- or 32-bit biases.
The fixed point representation is the same as that used in TensorFlow quantized models.

Choosing Between a Quantized or Non-Quantized Model

- CPU - Choose a non-quantized model. Quantized models are currently incompatible with the CPU backend.
- DSP - Choose a quantized model. Quantized models are required when running on the DSP backend.
- GPU - Choose a non-quantized model. Quantized models are currently incompatible with the GPU backend.
- HTP - Choose a quantized model. Quantized models are required when running on the HTP backend.
- HTA - Choose a quantized model. Quantized models are required when running on the HTA backend.

## [Quantization](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id4)

This section describes the concepts behind the quantization algorithm used in QNN.  These concepts are used by the converters when the developer decides to quantize a graph.

### [Overview](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id5)

QNN supports multiple quantization modes.  The basics of the quantization, regardless of mode, are described here.

- Quantization converts floating point data to fixed point format using a provided bitwidth.
- Bitwidth refers to the number of bits used to represent each float value.
- Quantization equation:

    - During quantization, a set of float values is mapped to a set of integers. If \((x\_{min}, x\_{max})\) is the range of the given data and \((q\_{min}, q\_{max})\) is the integer range onto which \((x\_{min}, x\_{max})\) is mapped, then for a data point \(x\) the following holds, where \(\hat{x}\) is the fixed point representation of \(x\).
    - \(\hat{x} = \left\lfloor \frac{x}{scale} - offset \right\rceil\) (1)
    - Where: \(scale = \frac{x\_{max} - x\_{min}}{q\_{max} - q\_{min}}\) and \(offset = \frac{x\_{min}}{scale} - q\_{min}\)      (2)
    - Note: \(\left\lfloor x \right\rceil\) indicates \(x\) rounded to the nearest integer.
- The following requirements are satisfied:

    - Full range of input values is covered.
    - Minimum range of 0.0001 is enforced.
    - Floating point zero is exactly representable.
- Quantization algorithm inputs:

    - Set of floating point values to be quantized.
- Quantization algorithm outputs:

    - Set of 8-bit fixed point values.
    - Encoding parameters:

        - encoding-min - minimum floating point value representable (by fixed point value 0)
        - encoding-max - maximum floating point value representable (by fixed point value 255)
        - scale - the step size for the given range: (max - min) / (2^bw-1)
        - offset - the integer value which exactly represents 0: round(min/scale)
- Algorithm

    1. Compute the true range (min, max) of input data.
    2. Compute the encoding-min and encoding-max.
    3. Quantize the input floating point values.
    4. Output:

        - fixed point values
        - encoding-min and encoding-max parameters
        - scale and offset parameters
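Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration only, not the QNN implementation: the function name `quantize_eq` and the final clamp are assumptions added for the example.

```python
def quantize_eq(x, x_min, x_max, q_min, q_max):
    """Apply equations (1) and (2) to a single float value (illustrative sketch)."""
    scale = (x_max - x_min) / (q_max - q_min)    # equation (2)
    offset = x_min / scale - q_min               # equation (2)
    # Equation (1). Python's round() resolves exact ties to the nearest even integer.
    x_hat = round(x / scale - offset)
    # Assumed clamp so that values outside (x_min, x_max) stay in the integer range.
    return max(q_min, min(q_max, x_hat))


# 8-bit unsigned mapping of the float range [0.0, 10.0] onto [0, 255].
print(quantize_eq(0.0, 0.0, 10.0, 0, 255))    # 0
print(quantize_eq(2.0, 0.0, 10.0, 0, 255))    # 51
print(quantize_eq(10.0, 0.0, 10.0, 0, 255))   # 255
```

With these definitions, `x_min` always maps to `q_min` and `x_max` to `q_max`, which is the property the equations are built to guarantee.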

## [Quantization Schema](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id6)

In the equations (1) and (2), different values of \(q\_{min}\) and \(q\_{max}\) correspond to different quantization schema.
QNN supports 4 quantization schemas: signed symmetric, unsigned symmetric, signed asymmetric and unsigned asymmetric.

| Schema | Characteristics | Offset | \(q\_{min}\) | \(q\_{max}\) |
| --- | --- | --- | --- | --- |
| Unsigned Asymmetric | In this schema, the min and max need<br>not be equal in absolute value. We<br>set the \(q\_{min}\) to 0<br>and \(q\_{max}\) to<br>\(2^{bw} - 1\). This is the<br>default schema in quantization for<br>weights and activations. | \(-(2^{bw}-1)\) to 0 | 0 | \(2^{bw}-1\) |
| Signed Asymmetric | In this schema, the min and max need<br>not be equal in absolute value. We<br>set the \(q\_{min}\) to<br>\(-2^{bw-1}\) and \(q\_{max}\) to<br>\(2^{bw-1} - 1\). | \(-2^{bw-1}+1\) to<br>\(2^{bw-1}\)<br>Offset in signed asymmetric<br>schema = Offset in unsigned<br>asymmetric schema +<br>\(2^{bw-1}\) | \(-2^{bw-1}\) | \(2^{bw-1}-1\) |
| Signed Symmetric | In this schema, the min and max need to be<br>equal in magnitude and have opposite signs.<br>In this schema, zero of fixed point maps<br>to zero of floating point. | 0 | \(-2^{bw-1}\) | \(2^{bw-1}-1\) |
| Unsigned Symmetric | In this schema, the min and max of the<br>float data must be equal in magnitude,<br>and have opposite signs. | \(-2^{bw-1}\) | 0 | \(2^{bw}-1\) |
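As a quick illustration of how the integer range depends on bitwidth and signedness, the sketch below computes \(q\_{min}\) and \(q\_{max}\) as described for the unsigned and signed schemas. The helper name `integer_range` is hypothetical.

```python
def integer_range(bw, signed):
    """Return (q_min, q_max) for a given bitwidth (illustrative helper)."""
    if signed:
        return -(2 ** (bw - 1)), 2 ** (bw - 1) - 1
    return 0, 2 ** bw - 1


for bw in (4, 8, 16):
    print(bw, integer_range(bw, signed=False), integer_range(bw, signed=True))
# 4 (0, 15) (-8, 7)
# 8 (0, 255) (-128, 127)
# 16 (0, 65535) (-32768, 32767)
```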

### [Quantization - Further Details](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id7)

1. Compute the true range of the input floating point data.

    - finds the smallest and largest values in the input data
    - represents the true range of the input data

2. Compute the encoding-min and encoding-max.

    - These parameters are used in the quantization step.
    - These parameters define the range and floating point values that will be representable by the fixed point format.

        - encoding-min: specifies the smallest floating point value that will be represented by the fixed point value of 0
        - encoding-max: specifies the largest floating point value that will be represented by the fixed point value of 255
        - floating point values at every step size, where step size = (encoding-max - encoding-min) / (2^bw-1), will be representable
        - offset: the fixed point value at which zero is exactly represented
    - encoding-min and encoding-max are first set to the true min and true max computed in the previous step
    - Requirements

        1. Encoding range must be at least 0.0001

            - encoding-max is adjusted to max(true max, true min + 0.0001)
        2. Floating point value of 0 must be exactly representable

            - encoding-min or encoding-max may be further adjusted

3. Cases - Handling 0

    1. Inputs are strictly positive

        - the encoding-min is set to 0.0
        - zero floating point value is exactly representable by smallest fixed point value 0
        - e.g. input range = [5.0, 10.0]

            - encoding-min = 0.0, encoding-max = 10.0
    2. Inputs are strictly negative

        - encoding-max is set to 0.0
        - zero floating point value is exactly representable by the largest fixed point value 255
        - e.g. input range = [-20.0, -6.0]

            - encoding-min = -20.0, encoding-max = 0.0
    3. Inputs are both negative and positive

        - encoding-min and encoding-max are slightly shifted to make the floating point zero exactly representable
        - e.g. input range = [-5.1, 5.1]

            - encoding-min and encoding-max are first set to -5.1 and 5.1, respectively
            - encoding range is 10.2 and the step size is 10.2/255 = 0.04
            - zero value is currently not representable. The closest values representable are -0.02 and +0.02 by fixed point values 127 and 128, respectively
            - encoding-min and encoding-max are shifted by -0.02. The new encoding-min is -5.12 and the new encoding-max is 5.08
            - floating point zero is now exactly representable by the fixed point value of 128

4. Quantize the input floating point values.

    - encoding-min and encoding-max parameters determined in the previous step are used to quantize all the input floating point values to their fixed point representation
    - Quantization formula is:

        - quantized value = round(255 \* (floating point value - encoding.min) / (encoding.max - encoding.min))
    - quantized value is also clamped to be within 0 and 2^bw-1

5. Outputs

    - the fixed point values
    - encoding-min, encoding-max, scale, and offset parameters
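The range computation in steps 1 to 3 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the converter's actual code: the function name `compute_encodings` is hypothetical, the data is treated as unsigned `bw`-bit, and the returned offset is the non-negative fixed point value that represents 0.0.

```python
def compute_encodings(values, bw=8, min_range=0.0001):
    """Return (encoding_min, encoding_max, scale, offset) for unsigned bw-bit data."""
    levels = 2 ** bw - 1
    true_min, true_max = min(values), max(values)

    # Strictly positive or strictly negative inputs: extend the range to include 0.0.
    enc_min = min(true_min, 0.0)
    enc_max = max(true_max, 0.0)

    # Enforce the minimum encoding range.
    enc_max = max(enc_max, enc_min + min_range)

    scale = (enc_max - enc_min) / levels

    # Fixed point value chosen to represent floating point zero (assumed convention).
    offset = round(-enc_min / scale)

    # Shift the range so that 0.0 falls exactly on that fixed point value.
    enc_min = -offset * scale
    enc_max = enc_min + levels * scale
    return enc_min, enc_max, scale, offset


enc_min, enc_max, scale, offset = compute_encodings([-1.8, -1.0, 0.0, 0.5])
print(round(enc_min, 6), round(enc_max, 6), round(scale, 6), offset)
```

For the input `[-1.8, -1.0, 0.0, 0.5]` the sketch yields an offset of 200 and a range of roughly (-1.803922, 0.496078). Inputs whose zero point lands exactly halfway between two fixed point values depend on the rounding rule, which this sketch does not try to match to any particular implementation.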

### [Quantization Example](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id8)

1. Inputs:

    - input values = [-1.8, -1.0, 0, 0.5]
    - encoding-min is set to -1.8 and encoding-max to 0.5
    - encoding range is 2.3, which is larger than the required 0.0001
    - encoding-min is adjusted to −1.803922 and encoding-max to 0.496078 to make zero exactly representable
    - step size is 0.009020
    - offset is 200

2. Outputs:

    - quantized values are [0, 89, 200, 255]
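The example can be reproduced with the quantization formula from the previous section. The sketch below is illustrative only. The function name `quantize` is an assumption, and the encoding values are copied from the example rather than recomputed.

```python
def quantize(values, enc_min, enc_max, bw=8):
    """Quantize floats to unsigned bw-bit fixed point using the given encodings."""
    levels = 2 ** bw - 1
    quantized = []
    for v in values:
        q = round(levels * (v - enc_min) / (enc_max - enc_min))
        quantized.append(max(0, min(levels, q)))   # clamp to [0, 2^bw - 1]
    return quantized


print(quantize([-1.8, -1.0, 0.0, 0.5], enc_min=-1.803922, enc_max=0.496078))
# [0, 89, 200, 255]
```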

### [Dequantization Example](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id9)

1. Inputs:

    - quantized values = [0, 89, 200, 255]
    - encoding-min = −1.803922, encoding-max = 0.496078
    - step size is 0.009020
    - offset is 200

2. Outputs:

    - dequantized values = [−1.8039, −1.0012, 0.0000, 0.4961]
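Dequantization is the inverse mapping: each fixed point value is multiplied by the step size and shifted by encoding-min. A minimal sketch, assuming the hypothetical helper name `dequantize` and the encodings from the example:

```python
def dequantize(q_values, enc_min, enc_max, bw=8):
    """Map unsigned bw-bit fixed point values back to floats."""
    scale = (enc_max - enc_min) / (2 ** bw - 1)   # step size
    return [enc_min + q * scale for q in q_values]


values = dequantize([0, 89, 200, 255], enc_min=-1.803922, enc_max=0.496078)
print([f"{v:.4f}" for v in values])
```

The original inputs -1.8, -1.0, and 0.5 come back as approximately -1.8039, -1.0012, and 0.4961, which shows the rounding error introduced by quantization. The value 0.0 is recovered to within the precision of the rounded encodings.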

### [Bitwidth Selection](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id10)

QNN currently supports a default quantization bit width of 8 for both weights and biases. The weight, bias, and activation bit widths
can, however, be overridden by passing `--weight_bw`, `--bias_bw`, and/or `--act_bw` followed by the bit width. Please see the
converter documentation [here](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html) for more details on the command line options.

### [Packed 4-bit Quantization](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id11)

In packed 4-bit quantization, two 4-bit quantized values can be stored in a single 8-bit buffer. The lower nibble
stores the first value while the higher nibble stores the second value. This can be enabled by providing the
`--pack_4_bit_weights` flag. For the quantized values (10, 4), the unpacked and packed representations are given below.

- Unpacked = (0000 1010, 0000 0100)
- Packed   = (0100 1010)

In case of per-channel/per-row quantization, the quantized values are packed along each channel/row. For a tensor of size
(3,3,3,32) containing 32 output channels and 27 values per channel, the unpacked and packed representation will take the
following amount of memory for the 27 quantized values per channel.

- Unpacked = (3\*3\*3) = 27 bytes
- Packed   = ceil((3\*3\*3)/2) = 14 bytes
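The nibble layout and the byte counts above can be checked with a short sketch. The function name `pack_4bit` is hypothetical, and leaving the unused high nibble of an odd-length input as zero is an assumption made for this example.

```python
def pack_4bit(values):
    """Pack 4-bit values two per byte, first value in the lower nibble."""
    packed = bytearray()
    for i in range(0, len(values), 2):
        low = values[i] & 0x0F
        # Assumption: an odd-length input leaves the final higher nibble as zero.
        high = (values[i + 1] & 0x0F) if i + 1 < len(values) else 0
        packed.append((high << 4) | low)
    return bytes(packed)


print(format(pack_4bit([10, 4])[0], "08b"))   # 01001010
print(len(pack_4bit([0] * 27)))               # 14
```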

**Note** Packed 4-bit tensors are stored with QNN\_DATATYPE\_SFIXED\_POINT\_4/QNN\_DATATYPE\_UFIXED\_POINT\_4 datatypes while
unpacked 4-bit tensors are stored with QNN\_DATATYPE\_SFIXED\_POINT\_8/QNN\_DATATYPE\_UFIXED\_POINT\_8 datatypes. Please refer
to the backend supplements to find ops which support 4-bit packed tensors.

## [Quantization Modes](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id12)

QNN supports four quantization modes: tf, symmetric, enhanced, and adjusted. The primary difference between them is how they select the quantization range to be used.

### [TF](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id13)

The default mode has been described above, and uses the true min/max of the data being quantized, followed by an adjustment of the range to ensure a minimum range and to ensure 0.0 is exactly quantizable.

### [Symmetric](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id14)

Symmetric quantization follows the same basic principles as TF quantization but adjusts the range to be symmetric. It does this by selecting a new max from the original
range such that new\_max=max(abs(min), abs(max)) and adjusting the range to be (-new\_max, new\_max), so that the range is symmetric around 0. This is typically only used for weights as it helps
to reduce computation overhead at runtime. This mode is enabled by passing `--param_quantizer symmetric` to one of the [converters](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html).
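The range adjustment can be written directly from the formula above. This is a small illustrative sketch, and the function name `symmetric_range` is an assumption.

```python
def symmetric_range(true_min, true_max):
    """Return the symmetric range (-new_max, new_max) covering the true range."""
    new_max = max(abs(true_min), abs(true_max))
    return -new_max, new_max


print(symmetric_range(-1.8, 0.5))   # (-1.8, 1.8)
print(symmetric_range(-0.2, 3.0))   # (-3.0, 3.0)
```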

### [Enhanced](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id15)

Enhanced quantization mode (invoked by passing “enhanced” to either the `--param_quantizer` or `--act_quantizer` options in one of the [converters](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html)) uses an algorithm to try to determine a better
set of quantization parameters to improve accuracy. The algorithm may pick a different min/max value than the default quantizer, and in some cases it may set
the range such that some of the original weights and/or activations fall outside that range. However, this range is intended to produce better accuracy than simply using
the true min/max. The enhanced quantizer can be enabled independently for weights and activations by appending either “weights” or “activations” after the option.

This is useful for some models where the weights and/or activations may have “long tails”. (Imagine a range with most values between -100 and 1000, but a few values much greater than 1000 or much less than -100.)  In some cases these long tails can be ignored and the range -100, 1000 can be used more effectively than the full range.

Enhanced quantizer still enforces a minimum range and ensures 0.0 is exactly quantizable.
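To see why ignoring long tails can help, compare the step size of an 8-bit encoding over a full range with the step size over the narrower range from the example above. The sketch does not reproduce the enhanced algorithm itself. The tail values -5000 and 20000 are hypothetical, and `step_size` is an illustrative helper.

```python
def step_size(enc_min, enc_max, bw=8):
    """Step size of a bw-bit encoding over the given range."""
    return (enc_max - enc_min) / (2 ** bw - 1)


full = step_size(-5000.0, 20000.0)    # hypothetical range including the long tails
clipped = step_size(-100.0, 1000.0)   # range that ignores the long tails
print(f"full range step: {full:.2f}, clipped range step: {clipped:.2f}")
```

The smaller step size means that values inside the clipped range are represented more precisely, at the cost of clamping the few values in the tails.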

### [TF Adjusted](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id16)

This mode is used only for quantizing weights to 8-bit fixed point (invoked by passing “adjusted” to either the `--param_quantizer` or `--act_quantizer` options in one of the [converters](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html)).
It uses an adjusted min or max of the data being quantized, rather than the true min/max or the min/max
that excludes the long tail. This has been verified to provide an accuracy benefit for denoise models specifically. With this
quantizer, the max will be expanded or the min will be decreased if necessary.

Adjusted weights quantizer still enforces a minimum range and ensures 0.0 is exactly quantizable.

## [Enhanced Quantization Techniques](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id17)

Quantization can be a difficult problem to solve due to the myriad of training techniques, model architectures, and layer types. In an attempt to mitigate quantization problems,
model preprocessing techniques have been added to the quantizer that may improve quantization performance on models which exhibit sharp drops in accuracy upon quantization.

The primary technique introduced is CLE (Cross Layer Equalization).

CLE works by scaling the convolution weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the process absorbs high biases, which may result from weight scaling, from one convolution layer into a subsequent convolution layer.
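The weight-scaling idea can be illustrated on two small fully connected layers with a Relu between them. This is a sketch only: it uses fully connected layers instead of convolutions, it follows the per-channel scaling rule from the cross-layer equalization paper referenced at the end of this section, it omits bias absorption, and it is not the QNN implementation. All function names and values are hypothetical.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, x, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def equalize(w1, b1, w2):
    """Rescale two consecutive layers so matching channel ranges become equal."""
    w1_eq, b1_eq = [], []
    w2_eq = [list(row) for row in w2]
    for i in range(len(w1)):
        r1 = max(abs(v) for v in w1[i])        # range of output channel i in layer 1
        r2 = max(abs(row[i]) for row in w2)    # range of input channel i in layer 2
        s = math.sqrt(r1 * r2) / r2            # per-channel scale, s > 0
        w1_eq.append([v / s for v in w1[i]])   # divide layer 1 channel by s
        b1_eq.append(b1[i] / s)
        for row in w2_eq:
            row[i] *= s                        # multiply layer 2 channel by s
    return w1_eq, b1_eq, w2_eq


w1 = [[0.1, -0.2], [8.0, 4.0]]
b1 = [0.05, -1.0]
w2 = [[2.0, 0.01], [-1.0, 0.02]]
b2 = [0.0, 0.0]
x = [2.0, 0.5]

w1_eq, b1_eq, w2_eq = equalize(w1, b1, w2)
before = matvec(w2, relu(matvec(w1, x, b1)), b2)
after = matvec(w2_eq, relu(matvec(w1_eq, x, b1_eq)), b2)
print(before, after)   # the two outputs agree up to floating point error
```

Because Relu satisfies relu(s·x) = s·relu(x) for s > 0, the network output is unchanged, while the per-channel weight ranges of the two layers are brought to the same magnitude and are therefore easier to cover with a single quantization range.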

### [Enhanced Quantization Techniques: Limitations](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id18)

In many cases CLE may enable quantized models to return to close to their original floating-point accuracy. There are some caveats/limitations to the current algorithms:

- CLE operates on specific patterns of operations that all exist in a single branch (outputs cannot be consumed by more than one op). The matched operation patterns (r=required, o=optional) are:

    - Conv(r)->Batchnorm(r)->activation(o)->Conv(r)->Batchnorm(r)->activation(o)
    - Conv(r)->Batchnorm(r)->activation(o)->DepthwiseConv(r)->Batchnorm(r)->activation(o)->Conv(r)->Batchnorm(r)->activation(o)
- The CLE algorithm currently only supports Relu activations. Any Relu6 activations will be automatically changed to Relu, and any activations other than
  these will cause the algorithm to ignore the preceding convolution. Typically the switch from Relu6->Relu is harmless and does not cause any degradation in accuracy; however, some
  models may exhibit a slight degradation of accuracy. In this case, CLE can only recover accuracy to that degraded level, and not to the original float
  accuracy.
- CLE requires batchnorms (specifically detectable batchnorm beta/gamma data) to be present in the original model before conversion to DLC for the complete
  algorithm to be run and to regain maximum accuracy. For TensorFlow, the beta and gamma can sometimes still be found even with folded batchnorms, so long as
  the folding didn’t fold the parameters into the convolution’s static weights and bias. If the required information is not detected, you may see a message
  that looks like: “Invalid model for HBA quantization algorithm.” This indicates that the algorithm will only partially run and that accuracy issues are likely.

To run CLE, simply add the option `--algorithms cle` to the converter command line.

More information about the algorithms can be found here: [https://arxiv.org/abs/1906.04721](https://arxiv.org/abs/1906.04721)

## [Quantization Impacts](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id19)

Quantizing a model and/or running it in a quantized runtime (like the HTP) can affect accuracy.  Some models may not work well when quantized, and may yield incorrect results.
The metrics for measuring impact of quantization on a model that does classification are typically “Mean Average Precision”, “Top-1 Error” and “Top-5 Error”.

## [Quantization Overrides](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id20)

If the `--quantization_overrides` option is provided, the user may supply a JSON file with parameters to use for quantization. These will override any
quantization data carried from conversion (e.g. TF fake quantization) or calculated during the normal quantization process. The format is defined as per the AIMET specification.

There are two sections in the json, a section for overriding operator output encodings called “activation\_encodings” and a section for overriding parameter (weight and bias) encodings
called “param\_encodings”. Both must be present in the file, but can be empty if no overrides are desired. An example with all of the currently supported options:

    {
       "activation_encodings": {
           "Conv1:0": [
               {
                   "bitwidth": 8,
                   "max": 12.82344407824954,
                   "min": 0.0,
                   "offset": 0,
                   "scale": 0.050288015993135454
               }
           ],
           "input:0": [
               {
                   "bitwidth": 8,
                   "max": 0.9960872825108046,
                   "min": -1.0039304197656937,
                   "offset": 127,
                   "scale": 0.007843206675594112
               }
           ]
       },
       "param_encodings": {
           "Conv2d/weights": [
               {
                   "bitwidth": 8,
                   "max": 1.700559472933134,
                   "min": -2.1006477158567995,
                   "offset": 140,
                   "scale": 0.01490669485799974
               }
           ]
       }
    }

Note that it is not required to provide scale and offset, but bitwidth, min, and max should be provided. Scale and offset will
be calculated from the provided bitwidth, min, and max parameters regardless of whether they are provided.
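A minimal overrides file can also be generated programmatically. The sketch below builds a file with a single activation override that supplies only bitwidth, min, and max, and an empty `param_encodings` section. The tensor name is taken from the example above. The min/max values are hypothetical.

```python
import json

# Both top-level sections must be present, even if one of them is empty.
overrides = {
    "activation_encodings": {
        "input:0": [
            {"bitwidth": 8, "max": 1.0, "min": -1.0},   # hypothetical range
        ],
    },
    "param_encodings": {},
}

text = json.dumps(overrides, indent=4)
print(text)
```

The resulting text can be saved to a file and passed with the overrides option described above.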

## [Per-channel Quantization Overrides](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id21)

Per-channel quantization should be used for tensors that are weight inputs to Conv consumers (Conv2d, Conv3d, TransposeConv2d, DepthwiseConv2d). This section provides examples of manually overriding per-channel encodings for these Conv-based op weight tensors.
Per-channel quantization is used when multiple encodings (equal in number to the channels) are provided for the given tensor.
The following cases show example encodings for a convolution weight.

- Case 1: Asymmetric encodings without per-channel quantization

    {
        "features.9.conv.3.weight": [
            {
                "bitwidth": 8,
                "is_symmetric": "False",
                "max": 3.0387749017453665,
                "min": -2.059169834735364,
                "offset": -103,
                "scale": 0.019991940143061618
            }
        ]
    }

- Case 2: Per-channel quantization encodings with 3 output channels

    {
        "features.8.conv.3.weight": [
            {
                "bitwidth": 8,
                "is_symmetric": "True",
                "max": 0.7011175155639648,
                "min": -0.7066381259227362,
                "offset": -128.0,
                "scale": 0.005520610358771377
            },
            {
                "bitwidth": 8,
                "is_symmetric": "True",
                "max": 0.5228064656257629,
                "min": -0.5269230519692729,
                "offset": -128.0,
                "scale": 0.004116586343509945
            },
            {
                "bitwidth": 8,
                "is_symmetric": "True",
                "max": 0.7368279099464417,
                "min": -0.7426297045129491,
                "offset": -128.0,
                "scale": 0.005801794566507415
            }
        ]
    }

**Note:** Per-channel quantization must use symmetric representation with offset == -2^(bitwidth-1). Per-channel always has is\_symmetric = True.
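Per-channel symmetric encodings of this form can be derived from the weights of each output channel. The sketch below is one plausible derivation and not necessarily how the converter or AIMET computes them. It reproduces the scale and min of the first encoding in Case 2 when given that encoding's max as the largest absolute weight. The function name and sample weights are hypothetical.

```python
def per_channel_encodings(channels, bitwidth=8):
    """Build one symmetric encoding per output channel (illustrative sketch).

    channels: list of weight lists, one list per output channel.
    """
    q_max = 2 ** (bitwidth - 1) - 1
    offset = -(2 ** (bitwidth - 1))       # required offset for per-channel
    encodings = []
    for weights in channels:
        abs_max = max(abs(w) for w in weights)
        scale = abs_max / q_max           # assumed: largest weight maps to q_max
        encodings.append({
            "bitwidth": bitwidth,
            "is_symmetric": "True",
            "max": q_max * scale,
            "min": offset * scale,
            "offset": offset,
            "scale": scale,
        })
    return encodings


# Two hypothetical output channels.
for enc in per_channel_encodings([[0.7011175155639648, -0.3], [0.1, -0.5]]):
    print(enc)
```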

## [INT32 Overrides](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id22)

INT32 overrides can also be provided to make an op run in INT32 precision. To support running
an op in INT32 precision, INT32 overrides should be provided for all of its inputs and outputs.
For a quantized model, this will inject a Dequantize op followed by a Cast (to: INT32) op at the inputs of the op, and a Cast
(to: FP32) op followed by a Quantize op at the output of the op. The sample
graph below shows the op “Op2” with its input and output tensors overridden to INT32 through the use of
external overrides. This in turn generates the second graph, which supports the INT32 overrides through
the use of Dequantize, Cast (to: INT32), Cast (to: FP32), and Quantize ops.

![INT32 overrides sample graph](../_static/resources/quantization_int32_graph.png)

**Note:** INT32 overrides are only supported for ops which do not have weights and bias.

## [Quantizing a Model](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id23)

To enable quantization, pass the `--input_list` option along with a text file containing raw data inputs to the network. Note that the inputs specified in this file
should match exactly the inputs in the .cpp file generated by conversion. In most cases, these inputs can be obtained directly from the source framework model. However,
in rare cases, such as when the inputs are pruned by the converter, these inputs can differ. The file uses a single line for each set of inputs to the network:

    <inputFile0>
    <inputFile1>
    <inputFile2>

If a network contains multiple inputs, they are all listed on a single line, separated by a space
and each prefaced with the input name and a “:=”.

    <inputNameA>:=<inputFile0a> <inputNameB>:=<inputFile0b>
    <inputNameA>:=<inputFile1a> <inputNameB>:=<inputFile1b>
    <inputNameA>:=<inputFile2a> <inputNameB>:=<inputFile2b>

### [Examples](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id24)

For a graph containing a single input, the input text file would contain something like:

    /path/to/file/chair.raw
    /path/to/file/mongoose.raw
    /path/to/file/honeybadger.raw

For a network containing multiple graph inputs:

    input_left_eye:=left0.rawtensor input_right_eye:=right0.rawtensor
    input_left_eye:=left1.rawtensor input_right_eye:=right1.rawtensor
    input_left_eye:=left2.rawtensor input_right_eye:=right2.rawtensor
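The file format described above is simple enough to generate from a script. The following is a minimal sketch, not part of the SDK: the helper name `format_input_list` is hypothetical, and the paths are the illustrative ones from the examples above.

```python
from typing import Mapping, Sequence, Union

# One sample is either a single file path (single-input network) or a
# mapping of input name -> file path (multi-input network).
Sample = Union[str, Mapping[str, str]]


def format_input_list(samples: Sequence[Sample]) -> str:
    """Return input-list text with one line per set of network inputs."""
    lines = []
    for sample in samples:
        if isinstance(sample, str):
            lines.append(sample)
        else:
            # Multiple inputs share one line as "<name>:=<file>" pairs
            # separated by a single space.
            lines.append(" ".join(f"{name}:={path}" for name, path in sample.items()))
    return "\n".join(lines) + "\n"


single = format_input_list(["/path/to/file/chair.raw", "/path/to/file/mongoose.raw"])
multi = format_input_list([
    {"input_left_eye": "left0.rawtensor", "input_right_eye": "right0.rawtensor"},
    {"input_left_eye": "left1.rawtensor", "input_right_eye": "right1.rawtensor"},
])
print(multi, end="")
```

Writing the returned text to a file gives something that can be passed through `--input_list`.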

## [Mixed Precision and FP16 Support](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id25)

Mixed Precision enables specifying different bit widths (e.g. INT8 or INT16) or datatypes (integer or floating point) for different ops within the same graph.
Data type conversion ops are automatically inserted when activation precision or data type is different between successive ops.
Graphs can have a mix of floating-point and fixed-point data types. Each op can have different precision for weights and activations.
However, for a particular op, either all inputs, outputs and parameters (weights/biases) will be floating-point or all will be integer type.
Please refer to the backend supplements for the supported weight/activation bit widths for a particular op.

- [CPU](https://docs.qualcomm.com/doc/80-63442-10/topic/CpuOpDefSupplement.html)
- [GPU](https://docs.qualcomm.com/doc/80-63442-10/topic/GpuOpDefSupplement.html)
- [HTP](https://docs.qualcomm.com/doc/80-63442-10/topic/HtpOpDefSupplement.html)
- HTP FP16

FP16 (half-precision) additionally enables converting entire models to FP16, or selecting between FP16 and FP32 data types for the float ops in mixed precision graphs that combine floating-point and integer ops.
The different modes of using mixed precision are described below.

### [Non-quantized Mode](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id26)

In this mode, no calibration images are given to the converter (the `--input_list` flag is not given). The converted QNN model has only float tensors for both activations and weights.

- Non-quantized FP16: If `--float_bw 16` is added to the command line, all activation and weight/bias tensors are converted to FP16.
- Non-quantized FP32: If `--float_bw` is absent from the command line or `--float_bw 32` is given, all activation and weight/bias tensors use FP32 format.

### [Quantized Mode](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id27)

In this mode, calibration images are given to the converter (`--input_list` is given). The converted QNN model has fixed-point tensors for activations and weights.

- No override: If no `--quantization_overrides` flag is given with an encoding file, all activations are quantized as per `--act_bw` (default 8), and parameters are quantized as per `--weight_bw`/`--bias_bw` (default 8/8) respectively.
- Full override: If the `--quantization_overrides` flag is given along with an encoding file that specifies encodings for all ops in the model, the bitwidth is set as per the JSON for all ops, and each op is defined as integer or float as per the encoding file (dtype='int' or dtype='float' in the encoding JSON).
- Partial override: If the `--quantization_overrides` flag is given along with an encoding file that specifies partial encodings (i.e. encodings are missing for some ops), the following happens:

    - Layers for which encodings are NOT available in the JSON file are encoded in the same manner as in the no override case, i.e. defined as integer with bitwidth as per `--act_bw`/`--weight_bw`/`--bias_bw` (or their default values 8/8/8).
      For some ops (Conv2d, Conv3d, TransposeConv2d, DepthwiseConv2d, FullyConnected, MatMul), if any of the output/weights/bias is specified as float in the encoding file, all three are overridden to float.
      The float bitwidth used is the same as the float bitwidth of the overriding tensor in the encodings file. In this case, the bitwidth of the bias tensor can also be controlled manually with the `--float_bias_bw` (16/32) flag, provided its encodings are absent from the encodings JSON and present for output/weights.
    - Layers for which encodings are available in the JSON are encoded in the same manner as in the full override case.

The following sample JSON describes a network with three Conv2d ops. The first and third Conv2d ops are INT8, while the second Conv2d op is marked as FP32.
The FP32 op (namely conv2\_1) is sandwiched between two INT8 ops in `activation_encodings`, hence convert ops are inserted before and after the FP32 op.
The corresponding weights and biases for conv2\_1 are also marked as floating-point in the JSON in `param_encodings`.

    {
       "activation_encodings": {
           "data_0": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ],
           "conv1_1": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ],
           "conv2_1": [
               {
                   "bitwidth": 32,
                   "dtype": "float"
               }
           ],
           "conv3_1": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ]
       },
       "param_encodings": {
           "conv1_w_0": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ],
           "conv1_b_0": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ],
           "conv2_w_0": [
               {
                   "bitwidth": 32,
                   "dtype": "float"
               }
           ],
           "conv2_b_0": [
               {
                   "bitwidth": 32,
                   "dtype": "float"
               }
           ],
           "conv3_w_0": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ],
           "conv3_b_0": [
               {
                   "bitwidth": 8,
                   "dtype": "int"
               }
           ]
       }
    }

Ops that are not present in the JSON are assumed to be fixed-point, and their bit widths are selected according to `--act_bw`/`--weight_bw`/`--bias_bw` respectively. In the following JSON, only conv2\_1 and its parameters are specified:

    {
       "activation_encodings": {
           "conv2_1": [
               {
                   "bitwidth": 32,
                   "dtype": "float"
               }
           ]
       },
       "param_encodings": {
           "conv2_w_0": [
               {
                   "bitwidth": 32,
                   "dtype": "float"
               }
           ],
           "conv2_b_0": [
               {
                   "bitwidth": 32,
                   "dtype": "float"
               }
           ]
       },
       "version": "0.5.0"
    }
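The fallback rules for a partial override can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not the converter's implementation: the function names `resolve` and `resolve_conv` are hypothetical, and reusing the bitwidth of the first float tensor found is an assumption for the case where several tensors of one op are overridden.

```python
import json

# Converter defaults for tensors without an override (defaults quoted in the text).
DEFAULT_BW = {"act": 8, "weight": 8, "bias": 8}

# The partial override shown above: only conv2_1 and its parameters are float.
OVERRIDES = json.loads("""
{
  "activation_encodings": {"conv2_1": [{"bitwidth": 32, "dtype": "float"}]},
  "param_encodings": {
    "conv2_w_0": [{"bitwidth": 32, "dtype": "float"}],
    "conv2_b_0": [{"bitwidth": 32, "dtype": "float"}]
  }
}
""")


def resolve(name, kind, overrides, defaults=DEFAULT_BW):
    """Return (dtype, bitwidth) for one tensor; kind is 'act', 'weight' or 'bias'."""
    section = "activation_encodings" if kind == "act" else "param_encodings"
    entry = overrides.get(section, {}).get(name)
    if entry:
        # Listed in the JSON: treated as in the full override case.
        return entry[0]["dtype"], entry[0]["bitwidth"]
    # Missing from the JSON: treated as in the no override case.
    return "int", defaults[kind]


def resolve_conv(output, weight, bias, overrides, float_bias_bw=None):
    """Apply the 'any float makes all three float' rule for conv-like ops."""
    enc = {
        "output": resolve(output, "act", overrides),
        "weight": resolve(weight, "weight", overrides),
        "bias": resolve(bias, "bias", overrides),
    }
    float_bws = [bw for dtype, bw in enc.values() if dtype == "float"]
    if not float_bws:
        return enc
    for key, (dtype, _) in list(enc.items()):
        if dtype != "float":
            # Assumption: reuse the bitwidth of the first float tensor found.
            enc[key] = ("float", float_bws[0])
    bias_listed = bias in overrides.get("param_encodings", {})
    if float_bias_bw is not None and not bias_listed:
        # Mirrors --float_bias_bw: only applies when the bias has no encoding.
        enc["bias"] = ("float", float_bias_bw)
    return enc


for op in ("conv1", "conv2", "conv3"):
    print(op, resolve_conv(f"{op}_1", f"{op}_w_0", f"{op}_b_0", OVERRIDES))
```

With the JSON above, conv1 and conv3 resolve to 8-bit integer tensors and conv2 resolves to 32-bit float tensors.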

The following quantized mixed-precision graph will be generated based on the JSON shown above. Please note that the convert operations are added appropriately to convert between float and int types and vice-versa.

![Quantized mixed-precision graph](../_static/resources/qnn_quantization_mp_graph.png)
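To make the placement of convert ops concrete, the sketch below walks a linear chain of ops and adds a conversion wherever the activation data type or bitwidth changes between successive ops. It is an illustration under simplifying assumptions (a purely sequential graph, a hypothetical `Convert[...]` label, and a hypothetical helper name `insert_converts`), not the converter's actual logic.

```python
def insert_converts(chain, encodings, default=("int", 8)):
    """Return the op chain with a Convert entry wherever the activation
    (dtype, bitwidth) differs between two successive ops."""
    result = []
    prev = None
    for op in chain:
        # Ops without an override fall back to the default activation precision.
        cur = encodings.get(op, default)
        if prev is not None and cur != prev:
            result.append(f"Convert[{prev[0]}{prev[1]}->{cur[0]}{cur[1]}]")
        result.append(op)
        prev = cur
    return result


# Activation names from the sample JSON; only conv2_1 is overridden to FP32.
chain = ["data_0", "conv1_1", "conv2_1", "conv3_1"]
encodings = {"conv2_1": ("float", 32)}
graph = insert_converts(chain, encodings)
print(" -> ".join(graph))
```

For the sample network, this places one conversion before conv2\_1 and one after it, matching the description of the FP32 op being sandwiched between two INT8 ops.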

## [qairt-quantizer](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id28)

The [qairt-converter](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html#qairt-converter) tool converts non-quantized models into a non-quantized or quantized
DLC file, depending on the overrides provided during the converter step. The [qairt-quantizer](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html#qairt-quantizer) tool
quantizes the model to one of the supported fixed-point formats. It can be used to quantize all the tensors that are
missing encodings after the `qairt-converter` step (fill in the gaps), or to calibrate the provided encodings through a list of images.

For example, the following command will convert an Inception v3 DLC file into a quantized Inception v3 DLC file.

    $ qairt-quantizer --input_dlc inception_v3.dlc \
                      --input_list image_file_list.txt \
                      --output_dlc inception_v3_quantized.dlc

To properly calculate the ranges for the quantization parameters, a representative set of input data must be passed
to [qairt-quantizer](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html#qairt-quantizer) using the `--input_list` parameter.
The `--input_list` parameter specifies paths to raw image files to be used for calibration during quantization.
For details on supported input formats, refer to the `--input_list` argument of [qnn-net-run](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html#qnn-net-run).
To calculate output activation encoding information for all layers, **do not** include the line
that specifies desired outputs.

The tool requires the batch dimension of the DLC input file to be set to 1 during model conversion. The batch dimension
can be changed to a different value for inference, by resizing the network during initialization.
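The command shown earlier in this section can also be driven from a script. The sketch below only uses the three flags from that command; the helper name `build_quantizer_command` is hypothetical, and the tool is invoked only if it is found on `PATH` and the input files exist.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_quantizer_command(input_dlc, input_list, output_dlc):
    """Return the qairt-quantizer invocation as an argument list."""
    return [
        "qairt-quantizer",
        "--input_dlc", input_dlc,
        "--input_list", input_list,
        "--output_dlc", output_dlc,
    ]


cmd = build_quantizer_command(
    "inception_v3.dlc", "image_file_list.txt", "inception_v3_quantized.dlc"
)
print(shlex.join(cmd))

# Run the tool only when it is installed and both input files are present.
inputs_present = Path("inception_v3.dlc").is_file() and Path("image_file_list.txt").is_file()
if shutil.which("qairt-quantizer") and inputs_present:
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when file paths contain spaces.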

### [Additional details](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html#id29)

- `qairt-quantizer` is largely similar to `snpe-dlc-quant`, with the following differences:

    - `qairt-quantizer` can now generate encodings from the calibration dataset provided via the `--input_list` flag for tensors in the following scenarios:

        - Fill in the gaps: a tensor is missing an encoding after the `qairt-converter` step, i.e. no override is specified for it in `--quantization_overrides` or in the source model encodings (QAT).
        - Encodings are not specified for all the tensors via overrides or QAT encodings.
    - HTP is set as the default backend in the QAIRT quantizer, which may enable certain HTP-specific behaviors that would not be triggered by default in legacy quantizers, where the backend is left empty. This difference can affect how some backend-dependent features behave during conversion/quantization.

        - For example, during quantization, an optimization called `IntBiasUpdates` is applied to the FullyConnected op in SNPE only if the backend is set to `HTP`, whereas it is always applied in QAIRT.
    - The external overrides and source model encodings (QAT) are now applied during the `qairt-converter` stage by default. The quantizer options to ignore the overrides and source model encodings, `--ignore_encodings` (legacy) and `--ignore_quantization_overrides`, are therefore now no-ops.
    - An alternative is the `--export_format=DLC_STRIP_QUANT` flag of `qairt-converter`. When it is specified, the converter ignores/removes all the encodings in the source model and outputs a float model, which can be recalibrated using `qairt-quantizer` and the `--input_list` flag.
    - Another alternative is to use the `qairt-quantizer` options `--input_list` and `--ignore_quantization_overrides` in combination, which signals the quantizer to ignore all the encodings applied during conversion and generate encodings using the calibration dataset provided via `--input_list`.
    - The float fallback feature, controlled via the command-line option `--enable_float_fallback` (present as `--float_fallback` in legacy quantizers), is also a no-op for `qairt-quantizer` and can be skipped. Float fallback was added to produce a fully quantized or mixed precision graph by applying encoding overrides or source model encodings, propagating encodings across data-invariant ops, and falling back the missing tensors to the float datatype. To simplify the steps, this is now handled by `qairt-converter`, which applies the overrides and encodings; tensors that are missing encodings fall back to the default float datatype.
    - To summarize, the `qairt-quantizer` command-line arguments `--ignore_quantization_overrides` and `--enable_float_fallback` are now no-ops, and their behavior is applied by default during the `qairt-converter` step itself.

    **Note:** `--enable_float_fallback` and `--input_list` are mutually exclusive options. One of them is
a mandatory argument for the quantizer.
- Outputs can be specified for `qairt-quantizer` by modifying the input\_list in the following ways:

        #<output_layer_name>[<space><output_layer_name>]
        %<output_tensor_name>[<space><output_tensor_name>]
        <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]

    **Note:** Output tensors and layers can be specified individually, but when both are specified, they must
appear in the order shown.
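The three line types above can be told apart by their first character. The following parser is a minimal sketch of that reading, not the tool's own parser: the function name `parse_input_list` and the layer and tensor names in the usage example are hypothetical.

```python
def parse_input_list(text):
    """Return (output_layers, output_tensors, input_sets) parsed from input-list text."""
    layers, tensors, input_sets = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Layer names must be given before tensor names when both are used.
            if tensors:
                raise ValueError("'#' layer line must come before the '%' tensor line")
            layers.extend(line[1:].split())
        elif line.startswith("%"):
            tensors.extend(line[1:].split())
        else:
            entries = []
            for token in line.split():
                name, sep, path = token.partition(":=")
                # Tokens without ":=" are bare paths (single-input network).
                entries.append((name, path) if sep else (None, token))
            input_sets.append(entries)
    return layers, tensors, input_sets


example = (
    "#layer_a layer_b\n"
    "%tensor_a\n"
    "input_left_eye:=left0.rawtensor input_right_eye:=right0.rawtensor\n"
)
layers, tensors, input_sets = parse_input_list(example)
print(layers, tensors, input_sets)
```

A check like this can flag an input list that still contains output lines before it is used for calibration of all layers.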
- `qairt-quantizer` also supports quantization using AIMET, in place of the default quantizer,
when the `--use_aimet_quantizer` command-line option is provided. To use the AIMET quantizer,
install AIMET torch through pip. More information about AIMET torch installation can be found at [https://pypi.org/project/aimet-torch/](https://pypi.org/project/aimet-torch/).
- Advanced AIMET algorithms (AdaRound, AMP, and AutoQuant) are also supported in `qairt-quantizer`. The user needs to provide a YAML
config file through the command-line option `--config` and specify the algorithm, "adaround", "amp", or "autoquant",
through `--apply_algorithms`, along with the `--use_aimet_quantizer` flag.
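As a rough sketch, the AIMET-related options can be appended to an existing `qairt-quantizer` argument list as follows. The helper name `with_aimet` and the file names are hypothetical, and passing the algorithm name as a separate argument after `--apply_algorithms` is an assumption about the command-line syntax.

```python
SUPPORTED_ALGORITHMS = ("adaround", "amp", "autoquant")


def with_aimet(base_cmd, algorithm, config=None):
    """Return a copy of base_cmd with the AIMET options appended."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    cmd = list(base_cmd) + ["--use_aimet_quantizer", "--apply_algorithms", algorithm]
    if config is not None:
        # YAML config file describing datasets and algorithm arguments.
        cmd += ["--config", config]
    return cmd


base = ["qairt-quantizer", "--input_dlc", "model.dlc",
        "--input_list", "input_list.txt", "--output_dlc", "model_quantized.dlc"]
aimet_cmd = with_aimet(base, "adaround", config="adaround_config.yaml")
print(" ".join(aimet_cmd))
```

Leaving `config` unset corresponds to running an algorithm without a YAML file, which this page describes for AdaRound's default mode.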
- **AdaRound**

    - The template for the YAML config file for AdaRound is shown below:

            datasets:
                <dataset_name>:
                    dataloader_callback: '<path/to/unlabeled/dataloader/callback/function>'
                    dataloader_kwargs: {arg1: val, arg2: val2}

            adaround:
                dataset: <dataset_name>
                num_batches: 1

    - The required arguments for AdaRound are specified below.

        - *dataloader\_callback* sets the path of a callback function that returns an unlabeled dataloader of type torch.utils.data.DataLoader. The data should be in the source network input format.
        - *dataloader\_kwargs* is an optional dictionary through which the user can provide keyword arguments for the callback function defined above.
        - *dataset* specifies the name of the dataset that has been defined above.
        - *num\_batches* specifies the number of batches to be used for the AdaRound iteration.
    - Besides the required arguments above, there are a few optional arguments that have default values. The user can specify a non-default value through *optional\_adaround\_args* in the config file as a keyword dictionary. The supported optional arguments are specified below.

        - *default\_param\_bw*: [int] Default bitwidth (4-31) to use for quantizing layer parameters.
        - *param\_bw\_override\_list*: [List of list] Each list is a module and the corresponding parameter bitwidth to be used for that module.
        - *ignore\_quant\_ops\_list*: [List of str] Ops listed here are skipped during the quantization needed for AdaRounding. Do not specify Conv and Linear modules in this list; doing so will affect accuracy.
        - *default\_quant\_scheme*: [str] Quantization scheme. Supported options are post\_training\_tf and post\_training\_tf\_enhanced.
        - *default\_config\_file*: [str] Default configuration file path for model quantizers.
    - AdaRound can also run in default mode, without a config file, by passing "adaround" to the command-line option `--apply_algorithms` along with the `--use_aimet_quantizer` flag. This flow uses the data provided through the input\_list option to make rounding decisions.

- **AMP**

    - The template for the YAML config file for AMP is shown below:

            datasets:
                <dataset_name>:
                    dataloader_callback: '<path/to/unlabeled/dataloader/callback/function>'
                    dataloader_kwargs: {arg1: val, arg2: val2}

            amp:
                dataset: <dataset_name>
                candidates: [[[8, 'int'], [16, 'int']], [[16, 'float'], [16, 'float']]]
                allowed_accuracy_drop: 0.02
                eval_callback_for_phase2: '<path/to/evaluator/callback/function>'

    - The required arguments for AMP are specified below.

        - *dataloader\_callback* sets the path of a callback function that returns a labeled dataloader of type torch.utils.data.DataLoader. The data should be in the source network input format.
        - *dataloader\_kwargs* is an optional dictionary through which the user can provide keyword arguments for the callback function defined above.
        - *dataset* specifies the name of the dataset that has been defined above.
        - *candidates* is a list of lists of all possible bitwidth values for activations and parameters.
        - *allowed\_accuracy\_drop* specifies the maximum allowed drop in accuracy from the FP32 baseline. The pareto front curve is plotted only up to the point where the allowable accuracy drop is met.
        - *eval\_callback\_for\_phase2* sets the path of the evaluator function, which takes the predicted batch as the first argument and the ground-truth batch as the second argument and returns the calculated metric as a float value.
    - A sample eval callback function for computing top-k accuracy metrics:

            def accuracy(output, target):
                """Computes the accuracy over the k top predictions for the specified values of k"""

                topk = (1,)
                maxk = max(topk)
                batch_size = target.size(0)

                # Indices of the maxk highest-scoring classes per sample, transposed to (maxk, batch).
                _, pred = output.topk(maxk, 1, True, True)
                pred = pred.t()
                correct = pred.eq(target.view(1, -1).expand_as(pred))

                res = []
                for k in topk:
                    # Percentage of samples whose target is among the top-k predictions.
                    correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
                    res.append(correct_k.mul_(100.0 / batch_size))

                return res

    - Besides the required arguments above, there are a few optional arguments that have default values. The user can specify a non-default value through *optional\_amp\_args* in amp as a keyword dictionary. The supported optional arguments are specified below.

        - *eval\_callback\_for\_phase1*: [str] Path of the eval function, which takes only the model as an argument and returns the calculated metric as a float value. This function is used to measure the sensitivity of each quantizer group during phase 1. Phase 1 involves finding the accuracy list/sensitivity of each module, so a user might want to run phase 1 with a smaller dataset.
        - *clean\_start*: [bool] If true, any cached information from previous runs is deleted prior to starting the mixed-precision analysis. If false, prior cached information is used if applicable.
        - *forward\_pass\_callback*: [str] Path of the function that takes only the model as an argument and runs the forward pass on this model.
        - *use\_all\_amp\_candidates*: [bool] Using the "supported\_kernels" field in the config file (under the defaults and op\_type sections), a list of supported candidates can be specified. Some of the AMP candidates passed through the "candidates" field may not be supported according to the data passed through "supported\_kernels". When "use\_all\_amp\_candidates" is set to True, the AMP algorithm ignores "supported\_kernels" in the config file and continues to use all candidates.
        - *phase2\_reverse*: [bool] If set to True, phase 1 of the AMP algorithm (calculating the accuracy list) is unchanged, whereas phase 2 (generating the pareto list) changes: it starts with all quantizer groups in the lowest candidate and moves nodes one by one to a higher candidate until the target accuracy is met.
        - *amp\_search\_algo*: [str] Defines the search algorithm to be used for phase 2 of AMP. Supported algorithms are Binary, Interpolation, and BruteForce.

- **AutoQuant**

    - The template for the YAML config file for AutoQuant is shown below:

            datasets:
                <dataset_name>:
                    dataloader_callback: '<path/to/unlabeled/dataloader/callback/function>'
                    dataloader_kwargs: {arg1: val, arg2: val2}
                <eval_dataset_name>:
                    dataloader_callback: '<path/to/labeled/dataloader/callback/function>'
                    dataloader_kwargs: {arg1: val, arg2: val2}

            autoquant:
                dataset: <dataset_name>
                eval_callback: "qti.aisw.converters.aimet.aimet_utils.accuracy"
                eval_dataset: <eval_dataset_name>
                allowed_accuracy_drop: 0.07
                amp_candidates: [[[16,'int'],[16,'int']], [[16,'int'],[8,'int']], [[8,'int'],[16,'int']], [[8,'int'],[8,'int']]]

    - The required arguments for AutoQuant are specified below.

        - *dataloader\_callback* sets the path of a callback function that returns an unlabeled dataloader of type torch.utils.data.DataLoader. The data should be in the source network input format.
        - *dataloader\_kwargs* is an optional dictionary through which the user can provide keyword arguments for the callback function defined above.
        - *dataset* specifies the name of the dataset that has been defined above.
        - *eval\_callback* sets the path of the evaluator function, which takes the predicted value batch as the first argument and the ground-truth batch as the second argument and returns the calculated metric as a float value.
        - *eval\_dataset* specifies the name of the labeled dataset defined above that is used for model evaluation.
        - *allowed\_accuracy\_drop* specifies the maximum allowed drop in accuracy from the FP32 baseline.
        - *amp\_candidates* is a list of lists of all possible bitwidth values for activations and parameters.
    - Besides the required arguments above, there are a few optional arguments that have default values. The user can specify a non-default value through *optional\_autoquant\_args* in autoquant as a keyword dictionary.

        - The supported optional arguments for *optional\_autoquant\_args* are specified below.

            - *param\_bw*: [int] Parameter bitwidth.
            - *output\_bw*: [int] Output bitwidth.
            - *quant\_scheme*: [str] Quantization scheme.
            - *rounding\_mode*: [str] Rounding mode.
            - *config\_file*: [str] Path to the configuration file for model quantizers.
            - *cache\_id*: [str] ID associated with cache results.
            - *strict\_validation*: [bool] Flag set to True by default. If False, AutoQuant proceeds with execution and handles errors internally if possible. This may produce suboptimal or unintuitive results.
        - The supported optional arguments for *optional\_amp\_args* are specified below.

            - *num\_samples\_for\_phase\_1*: [int] Number of samples to be used for performance evaluation in AMP phase 1.
            - *forward\_fn*: [Callable] Callback function that performs a forward pass given a model and inputs yielded from the data loader. The function expects the model as the first argument and the inputs to the model as the second argument.
            - *num\_samples\_for\_phase\_2*: [int] Number of samples to be used for performance evaluation in AMP phase 2.
        - In AutoQuant, num\_batches is supported as an optional argument, so a non-default value can be provided through *optional\_adaround\_args*, along with the other optional AdaRound arguments, in autoquant as a keyword dictionary. The argument descriptions can be found above in the AdaRound algorithm section.

Last Published: Jun 04, 2026
