# Mixed precision

Source: [https://docs.qualcomm.com/doc/80-PT790-993B/topic/qaic-mixed-precision.html](https://docs.qualcomm.com/doc/80-PT790-993B/topic/qaic-mixed-precision.html)

The mixed precision feature lets a user execute a network whose nodes run in a combination of FP32, FP16, and INT8 precision. Specific node instances of each node type can be set to FP16 precision using the "<var class="keyword varname">-node-precision-info</var>" option. The "<var class="keyword varname">-node-precision-info</var>" option can be combined with the Qaic compiler's profile guided quantization and "<var class="keyword varname">-keep-original-precision-for-nodes</var>" to execute a network in mixed precision (FP32/FP16/INT8).

## Interoperability with "`-keep-original-precision-for-nodes`"

1. The "<var class="keyword varname">-keep-original-precision-for-nodes</var>" and "<var class="keyword varname">-node-precision-info</var>" options can be used together to create a mixed precision graph (FP32/FP16).
2. "<var class="keyword varname">-keep-original-precision-for-nodes</var>" executes all instances of the specified node kind in their original precision (if the original precision is FP32, the nodes remain FP32).
3. Setting node instances to FP32 is not supported with "<var class="keyword varname">-node-precision-info</var>".

## Node precision info input file

Operator instances that must run in FP16 are identified by the operator's first output name. The user provides a YAML file that lists the first output name of each such operator instance under the field "<var class="keyword varname">FP16NodeInstanceNames</var>".

Example: sample YAML file content listing the first output names of the node instances required in FP16.

    FP16NodeInstanceNames: [conv0, bn0, relu0]
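
A file in this shape can also be generated from a list of first output names. The following is a minimal sketch, not part of the Qaic tooling: the helper names `render_node_precision_yaml` and `write_node_precision_file` are hypothetical, and it assumes the output names are plain strings that need no YAML quoting.

```python
def render_node_precision_yaml(first_output_names):
    """Return YAML text listing the names under FP16NodeInstanceNames.

    Assumption: every name is a plain scalar (no colons, commas, brackets,
    or leading/trailing spaces), so no YAML quoting is applied.
    """
    # Drop duplicates while keeping the original order.
    unique_names = list(dict.fromkeys(first_output_names))
    return "FP16NodeInstanceNames: [" + ", ".join(unique_names) + "]\n"


def write_node_precision_file(path, first_output_names):
    """Write the YAML text to `path`, for example node_precision.yaml."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_node_precision_yaml(first_output_names))


# Reproduces the sample file content shown above.
print(render_node_precision_yaml(["conv0", "bn0", "relu0"]), end="")
```

Calling `write_node_precision_file("node_precision.yaml", ["conv0", "bn0", "relu0"])` would produce a file that can be passed to `-node-precision-info`.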

## Assumptions and dependencies

1. Currently supported for ONNX, Caffe2, and PyTorch models.
2. Node instances required to run in FP16 are identified by the operator's first output name.
3. When used with profile guided quantization, the model's quantization profile must be generated with "<var class="keyword varname">-node-precision-info</var>".
4. During quantization profiling, node instances required to run in FP16 precision should have an FP16 kernel implementation for the interpreter backend.

## Usage with qaic-exec

Step 1: Generate quantization profile with `-node-precision-info`.

    $ /opt/qti-aic/exec/qaic-exec -m=./path-to-model -input-list-file=list.txt -node-precision-info=node_precision.yaml -dump-profile=pgq.yaml

    Quantization Profile is being generated.
    Quantization profile is dumped at pgq.yaml

Step 2: Run inference using the PGQ profile generated in Step 1.

    $ /opt/qti-aic/exec/qaic-exec -m=./path-to-model -input-list-file=list.txt -node-precision-info=node_precision.yaml -load-profile=pgq.yaml

    Model is compiled with Int8 precision using PGQ.
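
Both steps share every argument except the profile flag, so the two command lines can be built from one set of inputs. The sketch below only assembles the argument lists and does not run `qaic-exec`. The helper `build_pgq_commands` is hypothetical, and the only flags used are the ones shown in the two steps above.

```python
import shlex

# Executable path as used in the commands above.
QAIC_EXEC = "/opt/qti-aic/exec/qaic-exec"


def build_pgq_commands(model, input_list, node_precision, profile):
    """Return (profile_cmd, inference_cmd) as argument lists; nothing is executed."""
    # Arguments shared by both steps. The same node precision file must be
    # passed when the profile is generated and when it is loaded.
    common = [
        QAIC_EXEC,
        f"-m={model}",
        f"-input-list-file={input_list}",
        f"-node-precision-info={node_precision}",
    ]
    profile_cmd = common + [f"-dump-profile={profile}"]    # Step 1
    inference_cmd = common + [f"-load-profile={profile}"]  # Step 2
    return profile_cmd, inference_cmd


dump_cmd, load_cmd = build_pgq_commands(
    "./path-to-model", "list.txt", "node_precision.yaml", "pgq.yaml"
)
print(shlex.join(dump_cmd))
print(shlex.join(load_cmd))
```

On a machine where `qaic-exec` is installed, each list could be passed to `subprocess.run(cmd, check=True)` in order.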

## Usage with QAic graph API

Set the graph configuration option <var class="keyword varname">QAicGraphConfig.quantizationConfig.nodePrecisionInfo</var> to force the execution of specific operator instances with FP16 precision. This option is supported for ONNX, Caffe2, and PyTorch models loaded through the "<var class="keyword varname">qaicAddNodesToGraphFromModel</var>" API.

Note: When selecting a Convolution node instance to run in FP16 precision, also set its BatchNorm node (if there is one) to FP16 precision so that the Convolution and BatchNorm can be fused.
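
The note above can be applied mechanically when building the list of FP16 node instances. The following sketch assumes a hypothetical, simplified graph description (a list of dicts with `op`, `inputs`, and `outputs`) and ONNX-style operator type names (`Conv`, `BatchNormalization`). It is not a Qaic API, and the helper `add_fused_batchnorms` is illustrative only.

```python
def add_fused_batchnorms(nodes, fp16_names):
    """Return fp16_names extended with BatchNorm outputs fed by selected Convolutions.

    `nodes` is a hypothetical graph description: a list of dicts with "op"
    (operator type), "inputs", and "outputs" (lists of tensor names).
    """
    selected = list(fp16_names)
    # First output names of the Convolution instances already selected for FP16.
    conv_outputs = {
        node["outputs"][0]
        for node in nodes
        if node["op"] == "Conv" and node["outputs"][0] in selected
    }
    for node in nodes:
        if node["op"] != "BatchNormalization" or not node["inputs"]:
            continue
        # The data input of the BatchNorm is assumed to be its first input.
        if node["inputs"][0] in conv_outputs:
            bn_name = node["outputs"][0]
            if bn_name not in selected:
                selected.append(bn_name)
    return selected


graph = [
    {"op": "Conv", "inputs": ["data", "w0"], "outputs": ["conv0"]},
    {"op": "BatchNormalization",
     "inputs": ["conv0", "scale", "bias", "mean", "var"],
     "outputs": ["bn0"]},
    {"op": "Relu", "inputs": ["bn0"], "outputs": ["relu0"]},
]
print(add_fused_batchnorms(graph, ["conv0"]))  # prints ['conv0', 'bn0']
```

The returned list holds first output names, so it can be written directly under `FP16NodeInstanceNames`.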

**Parent Topic:** [QAic executor](https://docs.qualcomm.com/doc/80-PT790-993B/topic/qaic-executor.html)

Last Published: Jul 26, 2023