# 5 Network compilation


## 5.1 QAic executor

Source: [https://docs.qualcomm.com/doc/80-PT790-993B/topic/network-compilation.html](https://docs.qualcomm.com/doc/80-PT790-993B/topic/network-compilation.html)

The QAic executor (qaic-exec) tool runs on x86 platforms and supports the following:

- Loading and executing the following model frameworks:
    - ONNX (.onnx, .onnxtxt; external data is also supported)
    - TensorFlow frozen graph (.pb, .pbtxt)
    - Caffe
    - Caffe2
    - PyTorch TorchScript (.pt)
- Generating a quantization profile, loading it, and compiling the model in one command.
- Dumping a quantization profile.
- Loading a quantization profile and compiling a model.
- Multiple inputs and multiple outputs.
- Compiling and executing in both simulator mode and hardware mode. Hardware mode execution uses the QAic runtime APIs provided by the Platform SDK.
- Performing AIC100 pre- and postprocessing on the host.
- Optionally running pre- and postprocessing on AIC100.
- Compiling a model of arbitrary batch size to any other specified batch size.
- Generating uniform random inputs and feeding them to a network if no input files are specified.
- Listing on the console, during a failure, all framework operators in a model that are not supported.

Source code for QAic executor can be found in the following locations:

- `/opt/qti-aic/examples/apps/qaic-exec/QAicExec.cpp`
- `/opt/qti-aic/examples/apps/qaic-exec/QAicExecParser.cpp`
- `/opt/qti-aic/examples/apps/qaic-exec/QAicExecParser.h`

The following table shows the options applicable to QAic executor.

Table : QAic Executor options

| Flag | Use | Example Usage |
| --- | --- | --- |
| `-u` | Detailed help with options and defaults. | `-u` |
| `-m` | Specifies the path of the ONNX, Caffe2, TensorFlow (TF), or PyTorch TorchScript (\*.pt) model file. | `-m=/tmp/models/test_1/network.onnx` |
| `-model-input` | Specifies the inputs of the graph for Caffe2 models. Optional for ONNX and TF models.<br>The format is: &lt;inputName1&gt;,&lt;inputType1&gt;,&lt;inputShape1&gt;. | `-model-input=input_03_data,float,[1]` |
| `-output-node-name` | Set this flag to the output node name of the model for intermediate layer outputs.<br>This option is required for TensorFlow models. | `-output-node-name=out` |
| `-aic-num-cores ` | Number of AIC cores to be used for inference. | `-aic-num-cores=16` |
| `-aic-hw ` | Runs inference in Hardware (HW) mode. Without this option, the default is Simulator mode. | `-aic-hw` |
| `-run-on-interpreter` | Runs inference on the interpreter.<br>Default is Simulator mode. | `-run-on-interpreter` |
| `-aic-hw-version ` | AIC100 hardware version to generate model binary. | `-aic-hw-version=2.0` |
| `-ols` | Factor by which to increase the splitting of network operations. | `-ols=1` |
| `-mos` | Effort level to reduce the on-chip memory usage. | `-mos=1` |
| `-allocator-dealloc-delay` | Increases buffer lifetime. Range: 0 to 10. | `-allocator-dealloc-delay=1` |
| `-size-split-granularity` | Sets the maximum tile size in KiB. Range: 512 to 2048. | `-size-split-granularity=1024` |
| `-vtcm-working-set-limit-ratio` | Ratio of fast memory that an instruction can use. Range: 0 to 1. | `-vtcm-working-set-limit-ratio=0.25` |
| `-convert-to-fp16 ` | Specifies the model being compiled/executed requires FP16 format. | `-convert-to-fp16` |
| `-execute-nodes-in-fp16 ` | Runs all instances of the operators in this list with FP16 precision while in quantization. | `-execute-nodes-in-fp16 ` |
| `-node-precision-info` | Load the model loader precision file, which contains the first output name of each operator instance that must be executed in FP16.<br>Currently supported for ONNX models only. | `-node-precision-info=node_precision.yaml` |
| `-keep-original-precision-for-nodes` | Run all instances of the operators in this list with the original precision during generation of quantized precision model even if the operator is supported in Int8 precision. | `-keep-original-precision-for-nodes ` |
| `-custom-IO-list-file` | Custom I/O configuration file in YAML format containing layout, precision scale and offset for each input and output of the model. | `-custom-IO-list-file=custom_IO_config.yaml ` |
| `-dump-custom-IO-config-template` | Dumps the YAML template for custom I/O configuration. This file can be edited as required and passed using the `-custom-IO-list-file` option. A model I/O PGQ profile file can also be supplied with this option so that the dumped YAML template contains proper scale and offset values for the desired activation quantization schema and calibration. | `-dump-custom-IO-config-template=<template>`<br>`-dump-custom-IO-config-template=<template>,<modelIoPgqProfile>` |
| `-external-quantization` | Load the externally generated quantization profile. The file maps the first output name of each operator to its quantization parameter information.<br>Currently supported for ONNX models only. | `-external-quantization=external_quantization_profile.yaml` |
| `-quantization-schema-activations` | Specify which quantization schema to use for activations:<ul><li>asymmetric (asymmetric ranges)</li><li>symmetric (symmetric ranges)</li><li>symmetric_with_uint8 (symmetric ranges with potentially uint8 ranges)</li><li>symmetric_with_power2_scale (symmetric ranges with power of 2 scaling factor)</li></ul> | `-quantization-schema-activations=symmetric_with_uint8` |
| `-quantization-schema-constants` | Specify which quantization schema to use for constants such as weights and bias:<ul><li>asymmetric (asymmetric ranges)</li><li>symmetric (symmetric ranges)</li><li>symmetric_with_uint8 (symmetric ranges with potentially uint8 ranges)</li><li>symmetric_with_power2_scale (symmetric ranges with power of 2 scaling factor)</li></ul> | `-quantization-schema-constants=symmetric_with_uint8` |
| `-quantization-calibration` | Specify which quantization calibration to use.<ul><li>None (min-max calibration) (default)</li><li>KLMinimization</li><li>Percentile</li></ul>Glow PGQ calibration:<ul><li>MSE</li><li>SQNR</li><li>KLMinimizationV2</li></ul> | `-quantization-calibration=None`<br>`-quantization-calibration=KLMinimization`<br>`-quantization-calibration=Percentile`<br>`-quantization-calibration=MSE`<br>`-quantization-calibration=SQNR`<br>`-quantization-calibration=KLMinimizationV2` |
| `-percentile-calibration-value` | Specify the percentile value to be used when percentile calibration is chosen. It can take any float value between 90 and 100. | `-percentile-calibration-value=99.99` |
| `-num-histogram-bins` | Sets the number of histogram bins used for profiling nodes. | `-num-histogram-bins=512` |
| `-quantization-precision ` | Specify which quantization precision to use. Int8 (default) is the only supported precision for now. | `-quantization-precision=Int8 ` |
| `-quantization-precision ` | Specify which quantization precision option to use, Int8 (default), Int16. | `-quantization-precision=Int8 ` |
| `-quantization-precision-bias` | Specify which quantization precision bias to use, Int8, Int32 (default). | `-quantization-precision-bias=Int32 ` |
| `-enable-rowwise ` | Enable rowwise quantization of FullyConnected and SparseLengthsSum ops. | `-enable-rowwise ` |
| `-enable-channelwise ` | Enable channelwise quantization of convolution op. | `-enable-channelwise ` |
| `-dump-profile` | Specifies the path and PGQ (profile guided quantization) YAML file.<br>Use this option to create the PGQ YAML file to run the model with quantization enabled. | `-dump-profile=./resnet50_profile.yaml` |
| `-load-profile` | Specifies the path and PGQ (profile guided quantization) YAML file to run the model quantized. | `-load-profile=./resnet50_profile.yaml` |
| `-convert-to-quantize` | If `-load-profile` is not provided, then input data is profiled and run in Quantized mode. | `-convert-to-quantize` |
| `-load-embedding-tables` | Load embedding tables for DLRM or RecSys models from the specified .zip file.<br>The .zip file should be generated using `-dump-embedding-tables`. Works only for PyTorch. It cannot be used with `-convert-to-quantize` or `-dump-profile`. | `-load-embedding-tables=embedding_tables.zip` |
| `-dump-embedding-tables ` | Dump embedding tables from DLRM or RecSys PyTorch models to the .zip file specified. It cannot be used with `-convert-to-quantize` or `-dump-profile`. | `-dump-embedding-tables=embedding_tables.zip ` |
| `-aic-binary-dir ` | Stores model binaries at directory location provided. | `-aic-binary-dir=/tmp/models/test_1/` |
| `-compile-only ` | Compiles a model and generates binaries at the location provided through `-aic-binary-dir`. | `-compile-only ` |
| `-host-preproc ` | Executes pre- and postprocessing operations on the host. | `-host-preproc ` |
| `-aic-preproc ` | Executes pre- and postprocessing operations on AIC100 instead of on the host. | `-aic-preproc` |
| `-aic-enable-depth-first ` | Enables depth-first compiler optimizations. The depth-first memory size is picked by the compiler based on heuristics. | `-aic-enable-depth-first ` |
| `-aic-depth-first-mem ` | Sets the depth-first memory size. Memory size must be chosen from {16,20,24,28,32} and depth-first must be enabled for this. | `-aic-enable-depth-first -aic-depth-first-mem=20 ` |
| `-batchsize ` | Set the number of batches to be used for execution. | `-batchsize=1` |
| `-stats-batchsize ` | This option is used to normalize performance statistics to be per inference. | `-stats-batchsize=1 ` |
| `-onnx-define-symbol ` | Define an ONNX symbol with its value. | `-onnx-define-symbol=batchsize,1 ` |
| `-onnxlib ` | Path to an ONNX library. | `-onnxlib=.` |
| `-always-expand-onnx-functions ` | Force ONNX function inlining. | `-always-expand-onnx-functions ` |
| `-register-custom-op` | Register a custom op using this configuration file. | `-register-custom-op=customop_config.yaml ` |
| `-compiler-help ` | List compiler-specific help options. | `-compiler-help ` |
| `-input-list-file` | Name of the file (.txt) containing the list of inputs (one line per input). | `-input-list-file=/tmp/models/test_1/inputlistfile.txt` |
| `-use-random-input-data` | Generates random data for model input as per the distribution type provided. | `-use-random-input-data=gaussian` |
| `-num-iter` | Number of iterations to perform. | `-num-iter=100` |
| `-time` | Duration (in seconds) for which to submit inferences. | `-time=1` |
| `-enable-debug ` | Enables debug mode during model compilation. | `-enable-debug` |
| `-time-passes ` | Enable printing of compile-time statistics. | `-time-passes` |
| `-io-crc ` | Enable CRC check for inputs and outputs of the network. | `-io-crc` |
| `-io-crc-stride` | Specifies the size of the stride to calculate CRC in the stride section (default: 256). | `-io-crc-stride=256` |
| `-sdp-cluster-sizes` | Enables single device partitioning and sets the cluster configuration. | `-sdp-cluster-sizes=7,7` |
| `-mdp-dump-partition-config` | Specifies the location to write a manual partition configuration file, which can be modified by a user to enable multi-device partitioning. | `-mdp-dump-partition-config=/tmp/nn_partition_config.json` |
| `-mdp-load-partition-config` | Specifies the location of a manual partition configuration file used to enable multi-device partitioning. | `-mdp-load-partition-config=/tmp/nn_partition_config.json` |
| `-profiling-threads ` | Sets the number of parallel threads for speeding up profile generation. Default: 1. | `-profiling-threads=5 ` |
| `-compile-threads ` | Sets the number of parallel threads used for compilation. Default: # of concurrent threads supported by host. | `-compile-threads=1 ` |
| `-use-producer-dma` | Initiate NSP DMAs from the thread that produces the data being transferred. | `-use-producer-dma ` |
| `-aic-perf-warnings ` | Print performance warning messages. | `-aic-perf-warnings ` |
| `-aic-perf-metrics ` | Print compiler performance metrics. | `-aic-perf-metrics ` |
| `-aic-pmu-recipe` | Enable the PMU selection based on a built-in recipe: AxiRd, AxiWr, AxiRdWr, KernelUtil, HmxMacs. | `-aic-pmu-recipe=AxiRd` |
| `-aic-pmu-events` | Track events in NSP cores. Up to 8 events are supported. | `-aic-pmu-events=205,63,206,65,64,74,85,70 ` |
| `-multicast-weights ` | Reduce DDR bandwidth by loading weights used on multiple cores only once and multicasting to the other cores. | `-multicast-weights ` |
| `-direct-api` | Used to enable a platform-specific shared memory API.<br>Not supported for the Cloud AI 100. | `-direct-api` |
| `-stats-level ` | Used to enable inference and operator level statistics. | `-stats-level=3` |
| `-ddr-stats ` | Used to collect DDR traffic details on a per core level. | `-ddr-stats` |
| `-combine-inputs ` | When enabled, combines inputs into fewer buffers for transfer to the device. | `-combine-inputs ` |
| `-combine-outputs ` | When enabled, combines outputs into a single buffer for transfer to the host. | `-combine-outputs ` |
| `-device-id ` | Device ID on which inference should run. | `-device-id=0` |
| `-aic-num-of-instances ` | Number of instances used for inference. | `-aic-num-of-instances=1 ` |
| `-aic-profiling-start-iter` | Profiling start iteration. | `-aic-profiling-start-iter=1` |
| `-aic-profiling-start-delay` | Profiling start delay (in milliseconds).<br>Profiling starts after the given delay period has elapsed. | `-aic-profiling-start-delay=1` |
| `-aic-profiling-num-samples` | Number of profiling samples to save to file. | `-aic-profiling-num-samples=1` |
| `-aic-profiling-format <level>`<br>**Note:** Soon to be deprecated. Refer to `-aic-profiling-type`. | Profiling format: `ascii`, `json`, or `latency`.<br>Set as many formats as required.<br>**Note:** Use `stats` instead of `ascii`, and `trace` instead of `json`. Both `ascii` and `json` are soon to be deprecated. | `-aic-profiling-format=latency` |
| `-aic-profiling-type <type>` | Profiling type: `stats`, `trace`, or `latency` for legacy profiling, and `trace_stream` or `latency_stream` for stream profiling.<br>Set multiple times for multiple formats. Default: none. | `-aic-profiling-type=latency_stream` |
| `-aic-profiling-duration` | Duration to run profiling for (in ms). After profiling starts, it stops when the profiling duration expires.<br>**Note:** This option works only with stream profiling, not with legacy (num-iter-based) profiling. | `-aic-profiling-duration=2000` |
| `-aic-profiling-sampling-rate` | Profiling sampling rate: `full`, `half`, `fourth`, `eighth`, or `sixteenth`. Programs generate profiling samples at the requested rate.<br>**Note:** This option works only with stream profiling, not with legacy (num-iter-based) profiling. | `-aic-profiling-sampling-rate=full` |
| `-aic-profiling-reporting-rate` | Profiling report generation rate (in ms): `500`, `1000`, `2000`, or `4000`. A profiling report is generated at every requested interval for the profiling duration.<br>**Note:** This option works only with stream profiling, not with legacy (num-iter-based) profiling. | `-aic-profiling-reporting-rate=500` |
| `-json-input-file=<path>` | Use a JSON file to specify input/output data files. This option also allows a user to specify the data type and dimensions of the data buffers. The JSON format is:<br>`{ "IO-files": [ [ {"path": "path-to-data-file", "dims": [1,16,128,..], "data-type": "uint64_t", "elem-size": 8, "io-direction": "in"}, .. ], ... ] }`<br>where `"io-direction"` is either `"in"` or `"out"`.<br>**Note:**<ul><li>"elem-size" overrides data-type if specified.</li><li>Valid data types and related sizes: "int": 4, "float": 4, "uint": 4, "float16": 2, "uint8_t": 1, "int8_t": 1, "int16_t": 2, "uint16_t": 2, "int64_t": 8, "uint64_t": 8.</li></ul> | `-json-input-file=input.json` |
| `-write-output-dir ` | Location to save output files. | `-write-output-dir=./` |
| `-submit-timeout ` | Kernel time out for each inference in milliseconds. | `-submit-timeout=5000 ` |
| `-submit-retry-count ` | Number of retries when an inference request times out. | `-submit-retry-count=0 ` |
| `-aic-batch-max-memory` | Batch mode: Limit memory usage when loading files, provide parameter in Mb. | `-aic-batch-max-memory=1000` |
| `-auto-batch-input ` | Automatically batch inputs to meet the batchsize requirements of the network. Inputs should be for batchsize = 1. | `-auto-batch-input ` |
| `-unbound-random` | When populating random values in a buffer, do not consider the input buffer format and fill each byte with a random value between 0 and 255. **Note:** This can result in unexpected behavior from certain networks. | `-unbound-random` |
| `-dump-input-buffers` | Dump input buffers used in benchmarking mode. | `-dump-input-buffers` |
| `-network-specialization-config ` | Instructs the compiler to compile multiple network specializations using the configurations found in the passed .json file. | `-network-specialization-config=configuration.json` |
| `-v -vv -vvv ` | Specify different log verbosity levels in increasing order. | `-v` |
| `-version` | Prints QAic Graph API version. | `-version` |
| `-version-extended` | Prints QAic Graph API version along with the compiler SHA information. | `-version-extended ` |
| `-operators-supported` | Dumps the list of all supported operators for a given model type (`onnx`, `tensorflow`, `pytorch`, `caffe`, or `caffe2`) into a file named `<type>SupportedOperators.txt` in the current directory. | `-operators-supported=onnx` |
| `-h -help --help` | Lists help options. | `-h` |
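
Nearly every option in the table takes the form `-name` or `-name=value`, so a command line can be assembled programmatically. The following is a minimal Python sketch, not part of the SDK: `build_qaic_exec_args` is a hypothetical helper, and the option names and values are taken from the example column of the table. It only builds the argument list; it does not validate options or run the tool.

```python
import shlex


def build_qaic_exec_args(options, executable="/opt/qti-aic/exec/qaic-exec"):
    """Turn a mapping of option names to values into an argument list.

    Hypothetical helper (not part of the SDK):
      True          -> bare switch, for example -aic-hw
      False / None  -> option omitted
      list / tuple  -> option repeated once per value
      anything else -> -name=value
    """
    args = [executable]
    for name, value in options.items():
        if value is None or value is False:
            continue
        if value is True:
            args.append(f"-{name}")
        elif isinstance(value, (list, tuple)):
            args.extend(f"-{name}={item}" for item in value)
        else:
            args.append(f"-{name}={value}")
    return args


# Values below are the ones shown in the example column of the table.
example_args = build_qaic_exec_args({
    "m": "/tmp/models/test_1/network.onnx",
    "aic-hw": True,
    "aic-num-cores": 16,
    "compile-only": True,
    "aic-binary-dir": "/tmp/models/test_1/",
})

# Print a shell-quoted command line for inspection.
print(shlex.join(example_args))
```

The resulting list can be passed to `subprocess.run` on a host where the tool is installed.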


## 5.1.1 Network specialization


Network specialization is a compilation and runtime strategy to select an appropriately sized network to run based on the shapes of the inputs provided to the network.

Network specialization packages multiple networks that are compiled with different settings for symbolic variables into the same binary. Each logically different network within the binary is called a network specialization. At inference time, one of these network specializations is selected to run based on the shapes of the network’s inputs and on which specialization is best optimized for those shapes.

This feature requires host pre-/postprocessing, and so is not supported with the `-aic-preproc` option.

Table : Network specialization

| Option | Description |
| --- | --- |
| `-network-specialization-config=<configuration.json> ` | Instructs the compiler to compile multiple network specializations using the configurations found in the passed configuration.json file. |

## Example commands

qaic-exec command:

    /opt/qti-aic/exec/qaic-exec -m=./ResNet18_Dynamic.onnx -aic-hw -aic-num-cores=14 -convert-to-fp16 -compile-only -aic-binary-dir=output-specialization -network-specialization-config=configuration.json

Users need to supply a network specialization configuration JSON file that tells the compiler how many separate network specializations to create and provides the substitute values for undefined symbols to use within each specialization. The `-aic-hw` flag may be omitted to run the specialized networks on the native path.

spec.json file:

    { 
        "specializations": [ 
            { 
                "batch": "1", 
                "height": "56", 
                "width": "56" 
            }, 
            { 
                "batch": "1", 
                "height": "112", 
                "width": "112" 
            }, 
            { 
                "batch": "2", 
                "height": "56", 
                "width": "56" 
            } 
        ] 
    }
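
A file like the one above can also be generated rather than written by hand. The following is a minimal Python sketch under stated assumptions: `make_specialization_config` is a hypothetical helper, and the symbol names `batch`, `height`, and `width` are simply the ones used in the example, so they must match the undefined symbols of the actual model. Values are written as strings, as in the example.

```python
import json


def make_specialization_config(shapes):
    """Build a specialization config from (batch, height, width) tuples.

    Hypothetical helper; the symbol names mirror the example file and
    must match the undefined symbols in the model being compiled.
    """
    return {
        "specializations": [
            {"batch": str(batch), "height": str(height), "width": str(width)}
            for batch, height, width in shapes
        ]
    }


config = make_specialization_config([(1, 56, 56), (1, 112, 112), (2, 56, 56)])
config_text = json.dumps(config, indent=4)

# Write config_text to the file passed via -network-specialization-config.
print(config_text)
```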

To supply data for network specialization, a user can use the `-json-input-file` option to describe the dimensions of the data buffers. Refer to [Table : QAic Executor options](https://docs.qualcomm.com/doc/80-PT790-993B/topic/network-compilation.html#QAic_executor_concept__table_qaic_exec_options) for details on this option. Currently, the user must use the `-json-input-file` format; the `-input-list-file` format used for non-specialized execution is not supported.

## Quantization with network specialization

To perform quantization on specialized networks, the user needs to run qaic-exec twice: once to dump the quantization profile file and a second time to use it.

To create the quantization profile, the user should first run qaic-exec with the `-dump-profile` flag:

    /opt/qti-aic/exec/qaic-exec -m=./ResNet18_Dynamic.onnx -aic-hw -network-specialization-config=configuration.json -json-input-file=io_shapes.json -dump-profile=profile.pgq

To use the quantization profile, the user should then run the same command but replace `-dump-profile` with `-load-profile`:

    /opt/qti-aic/exec/qaic-exec -m=./ResNet18_Dynamic.onnx -aic-hw -network-specialization-config=configuration.json -json-input-file=io_shapes.json -load-profile=profile.pgq

Note that the file passed to `-json-input-file` must contain at least one input for each of the specializations in the specialization configuration.
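
Because the second run differs from the first only in the profile flag, the second argument list can be derived from the first. The following is a minimal Python sketch: `to_load_profile_command` is a hypothetical helper, and the argument values are copied from the commands above.

```python
def to_load_profile_command(dump_args):
    """Return a copy of dump_args with -dump-profile=<path> rewritten
    to -load-profile=<path>. Hypothetical helper, not part of the SDK."""
    prefix = "-dump-profile="
    load_args = []
    replaced = False
    for arg in dump_args:
        if arg.startswith(prefix):
            load_args.append("-load-profile=" + arg[len(prefix):])
            replaced = True
        else:
            load_args.append(arg)
    if not replaced:
        raise ValueError("no -dump-profile=<path> argument found")
    return load_args


dump_args = [
    "/opt/qti-aic/exec/qaic-exec",
    "-m=./ResNet18_Dynamic.onnx",
    "-aic-hw",
    "-network-specialization-config=configuration.json",
    "-json-input-file=io_shapes.json",
    "-dump-profile=profile.pgq",
]
load_args = to_load_profile_command(dump_args)

# On a host with the tool installed, the two passes could be run in order
# with subprocess.run(dump_args, check=True) followed by
# subprocess.run(load_args, check=True).
print(" ".join(load_args))
```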

## Multiple compile options with network specialization

The user may also specify compiler options on a per-specialization basis in the specialization configuration file. Using the above example, the spec.json file could be updated to:

    { 
        "specializations": [ 
            { 
                "batch": "1", 
                "height": "56", 
                "width": "56",
                "compile_opts": "-ols=1" 
            }, 
            { 
                "batch": "1", 
                "height": "112", 
                "width": "112",
                "compile_opts": "-ols=2"  
            }, 
            { 
                "batch": "2", 
                "height": "56", 
                "width": "56",
                "compile_opts": "-ols=4" 
            } 
        ] 
    }

Currently, `-ols` is the only option that can be specified on a per-specialization basis.
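
Since only `-ols` is accepted per specialization, a configuration can be checked before it is handed to the compiler. The following is a minimal Python sketch: `check_compile_opts` is a hypothetical helper, and how the compiler itself reacts to other options is not described here, so this check is only a convenience.

```python
import shlex

# Per the text above, -ols is currently the only per-specialization option.
ALLOWED_PER_SPECIALIZATION = {"-ols"}


def check_compile_opts(config):
    """Return a list of problems found in the "compile_opts" entries.

    Hypothetical helper; an empty list means every entry uses only
    options from ALLOWED_PER_SPECIALIZATION.
    """
    problems = []
    for index, spec in enumerate(config.get("specializations", [])):
        for token in shlex.split(spec.get("compile_opts", "")):
            flag = token.split("=", 1)[0]
            if flag not in ALLOWED_PER_SPECIALIZATION:
                problems.append(
                    f"specialization {index}: unsupported option {flag}"
                )
    return problems


good_config = {
    "specializations": [
        {"batch": "1", "height": "56", "width": "56", "compile_opts": "-ols=1"},
        {"batch": "1", "height": "112", "width": "112", "compile_opts": "-ols=2"},
        {"batch": "2", "height": "56", "width": "56", "compile_opts": "-ols=4"},
    ]
}
print(check_compile_opts(good_config))
```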


## 5.1.2 Single device partitioning


Single device partitioning is a workload mapping strategy in the compiler where available cores on the device are viewed as clusters of cores and the network is partitioned into subgraphs. Each subgraph is mapped to a cluster on the device and the cluster runs only its portion of inference. Once done, it copies the outputs to the next cluster for further processing. This pipelined manner of execution results in the device processing multiple inferences at a time.

Partitioning helps with the throughput performance for the following reasons:

- Only a subset of the cores executes the same operation, which reduces di/dt events and limit violations.
- Reduces the cross-core communication and synchronization.
- With the partitioned graphs, the MOS setting can be better exploited. For example, the first half of vgg16 can use MOS 1 and the second half can use MOS 4.

Table : Partitioning

| Option | Description |
| --- | --- |
| `-sdp-cluster-sizes=<x,y,z,w>` | Specifies the cluster configuration to be used for single device partitioning (SDP). When set, this option enables SDP. Dual cluster configurations are 8/8, 4/4, 2/2, and 1/1. Quad cluster configurations are 4/4/4/4, 2/2/2/2, 1/1/1/1, and 4/4/4/2 (for the 14-core setup). x+y+z+w must equal the number of cores set. |
| `-mos=<num>` | Maximum output channel split (MOS): the effort level to reduce on-chip memory usage. The compiler optimizes for on-chip memory usage by mapping the network to the on-chip memory. Increasing the effort level holds more of the network inside on-chip memory but may lead to higher communication overhead. There may be a sweet spot for optimum performance that depends on the actual network being run. This value should be less than or equal to the number of cores. If this option is not set, the compiler sets it using its internal heuristics.<br>The maximum number of partitions/clusters supported is 4. |
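
The two numeric constraints in the table (cluster sizes must sum to the number of cores, and at most four clusters are supported) can be checked before launching a compile. The following is a minimal Python sketch: `sdp_cluster_flag` is a hypothetical helper, and it deliberately checks only those two rules, not whether a combination is one of the listed dual or quad configurations.

```python
def sdp_cluster_flag(cluster_sizes, num_cores, max_clusters=4):
    """Validate cluster sizes and format the -sdp-cluster-sizes option.

    Hypothetical helper. Checks only the rules stated in the table:
    at most max_clusters clusters, and sizes summing to num_cores.
    """
    if len(cluster_sizes) > max_clusters:
        raise ValueError(f"at most {max_clusters} clusters are supported")
    if any(size < 1 for size in cluster_sizes):
        raise ValueError("every cluster needs at least one core")
    if sum(cluster_sizes) != num_cores:
        raise ValueError(
            f"cluster sizes sum to {sum(cluster_sizes)}, expected {num_cores}"
        )
    return "-sdp-cluster-sizes=" + ",".join(str(size) for size in cluster_sizes)


# Two clusters of 7 cores each on a 14-core setup.
print(sdp_cluster_flag([7, 7], num_cores=14))
```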

**Example commands**

qaic-exec command

    /opt/qti-aic/exec/qaic-exec -m=./generatedModels/ONNX/vgg16.onnx -convert-to-quantize -aic-hw -aic-num-cores=14 -input-list-file=list.txt -num-iter=5000 -aic-num-of-instances=1 -ols=4 -quantization-schema-activations=symmetric_with_uint8 -quantization-schema-constants=symmetric_with_uint8 -quantization-precision=Int8 -aic-profiling-format=ascii -aic-profiling-format=json -aic-profiling-out-dir=./vgg16_onnx_int8_ppp_host_elfs -aic-profiling-num-samples=5 -aic-profiling-start-iter=10 -batchsize=1 -mos=1 -v -mos=1,4 -sdp-cluster-sizes=7,7

model\_configurator command

    python3 /opt/qti-aic/scripts/qaic-model-configurator/model_configurator.py ./generatedModels/ONNX/vgg16.onnx onnx -iter 5000 -list-configs -batchsize 1 -cores 15 -mos 1,2,4,8 -ols 2,4 -instance 1,2 -input-list-generate -image-dir ./inputFiles -width 224 -height 224 -reuse-single-file -enable-single-device-partitioning

Configuration output

        cores  bs  ols  mos  instances  dealloc-dly  split-size  limit-vtcm-percent  sd_partition  mos_combinations
    1      15   1    2    1          1            3        2048                 100            []                []
    2      15   1    2    0          1            3        2048                 100  [4, 4, 4, 3]      [1, 1, 1, 1]
    3      15   1    2    0          1            3        2048                 100  [4, 4, 4, 3]      [1, 1, 2, 2]
    4      15   1    2    0          1            3        2048                 100  [4, 4, 4, 3]      [2, 2, 1, 1]
    5      15   1    2    0          1            3        2048                 100  [4, 4, 4, 3]      [2, 2, 2, 2]
    6      15   1    2    0          1            3        2048                 100  [4, 4, 4, 3]      [4, 4, 1, 1]
    7      15   1    2    0          1            3        2048                 100  [4, 4, 4, 3]      [4, 4, 2, 2]
    8      15   1    2    0          1            3        2048                 100        [8, 7]            [1, 1]
    9      15   1    2    0          1            3        2048                 100        [8, 7]            [1, 2]
    10     15   1    2    0          1            3        2048                 100        [8, 7]            [1, 4]
    11     15   1    2    0          1            3        2048                 100        [8, 7]            [2, 1]
    …


## 5.1.3 Multi-QAic (limited to preview/demo usage; not for commercialization)


Multi-QAic is a new feature that enables execution of large neural networks across multiple Cloud AI 100 cards connected to the same host. Multiple topologies are supported with or without a PCIe switch. Cards connected to a PCIe switch with peer-to-peer communication enabled provide the best performance. The network is partitioned, and the sub-networks are executed on multiple Cloud AI 100 cards.

## Limitations

Multi-QAic is an experimental feature and has the following limitations:
- The Multi-QAic feature is functional, but no performance tuning has been done.
- Only a limited set of networks (BERT-BASE and BERT-LARGE) has been tested.
- Compiling a network with single device partitioning (`-sdp-cluster-sizes`) and Multi-QAic at the same time is not supported.
- With ONNXRT, only running a network for Multi-QAic is supported. Compiling for Multi-QAic using ONNXRT is not supported.
- BAR4/5 card configuration is only supported on bare metal Ubuntu 18/20. Do not attempt to configure BAR4/5 through Hyper-V.
- By default, networks are scheduled to run on QID 0, 1, 2, and so on with qaic-exec. Use qaic-runner with the -D option to run a network on selected QIDs.
- Oversubscription is not supported for networks running with Multi-QAic, and therefore data plane switching is not supported.

## Command line options

| Option | Description |
| --- | --- |
| `-mdp-dump-partition-config` | Specifies the location to write a manual partition configuration file, which can be modified by a user to enable multi-device partitioning. |
| `-mdp-load-partition-config` | Specifies the location of a manual partition configuration file used to enable multi-device partitioning. |

## Partition configuration file

JSON schema
- `connections`: Specifies the connection type (such as peer-to-peer or via the host) between two or more logical devices. The connections must define a pipelined partition. If no connection type is defined between partitions (for example, between deviceId 0 and deviceId 1), then the connection defaults to going through the host.
    - `devices`: List of two or more logical device IDs.
    - `type`: Connection type (either `host` or `p2p`).
- `partitions`: An array of partition objects. Each entry contains:
    - `name`: The name of the partition (for example, Partition0).
    - `devices`: An array of one or more devices assigned to the partition. More than one device enables tensor slicing partitioning within the partition.
        - `deviceId`: Logical device ID.
        - `numCores`: The number of cores the partition is compiled for.
    - `nodeList`: List of node names that have been assigned to this partition. Names should match those in the original graph.

As a starting point, developers can use the `-mdp-dump-partition-config` option to emit a sample partition configuration file. The file contains a single partition that lists all the node names in the network. These node names should correspond to the names in the original network. The developer must then split the nodes across two or more partitions and specify the connections between the partitions.

Example pipelined partition configuration file:

    {
        "connections": [
            {
                "devices": [0,1,2],
                "type": "p2p"
            },
            {
                "devices": [2,3],
                "type": "host"
            }
        ],
        "partitions": [
            {
                "name": "Partition0",
                "devices": [
                    {
                        "deviceId": 0,
                        "numCores": 10
                    }
                ],
                "nodeList": [
                    "Add_1105",
                    "Div_1115",
                    "Add_1106"
                ]
            },
            {
                "name": "Partition1",
                "devices": [
                    {
                        "deviceId": 1,
                        "numCores": 8
                    }
                ],
                "nodeList": [
                    "Add_1789",
                    "Div_1799",
                    "Add_1790"
                ]
            },
            {
                "name": "Partition2",
                "devices": [
                    {
                        "deviceId": 2,
                        "numCores": 16
                    }
                ],
                "nodeList": [
                    "Add_2473",
                    "Div_2483",
                    "Add_2474"
                ]
            },
            {
                "name": "Partition3",
                "devices": [
                    {
                        "deviceId": 3,
                        "numCores": 1
                    }
                ],
                "nodeList": [
                    "Add_3157",
                    "Div_3167",
                    "Add_3158",
                    "Add_3172"
                ]
            }
        ]
    }
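
The manual split described above can be scripted. The following is a minimal sketch, assuming the dumped file follows the schema described in this section and that its node list is in graph order. The helper name `split_into_pipeline` and the even, contiguous split policy are hypothetical; a real partition would be chosen from the model's structure.

```python
import json


def split_into_pipeline(dumped, cores_per_device, connection_type="p2p"):
    """Split the single dumped partition into one partition per device."""
    nodes = dumped["partitions"][0]["nodeList"]
    n = len(cores_per_device)
    if n < 2 or len(nodes) < n:
        raise ValueError("need at least two devices and one node per device")
    # Hypothetical policy: contiguous, near-equal chunks in the dumped order.
    base, extra = divmod(len(nodes), n)
    partitions, start = [], 0
    for dev_id, cores in enumerate(cores_per_device):
        size = base + (1 if dev_id < extra else 0)
        partitions.append({
            "name": f"Partition{dev_id}",
            "devices": [{"deviceId": dev_id, "numCores": cores}],
            "nodeList": nodes[start:start + size],
        })
        start += size
    # One connection entry covering every device in the pipeline.
    connections = [{"devices": list(range(n)), "type": connection_type}]
    return {"connections": connections, "partitions": partitions}


# Stand-in for a file written by -mdp-dump-partition-config (node names invented).
dumped = {"partitions": [{"name": "Partition0",
                          "devices": [{"deviceId": 0, "numCores": 16}],
                          "nodeList": ["Add_1", "Div_2", "Add_3", "Add_4", "Div_5"]}]}
config = split_into_pipeline(dumped, cores_per_device=[10, 8])
print(json.dumps(config, indent=4))
```

The printed JSON can be saved and passed back through `-mdp-load-partition-config`.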

Example tensor sliced partition configuration file:

    {
        "connections": [
            {
                "devices": [0,1],
                "type": "p2p"
            }
        ],
        "partitions": [
            {
                "name": "Partition0",
                "devices": [
                    {
                        "deviceId": 0,
                        "numCores": 8
                    },
                    {
                        "deviceId": 1,
                        "numCores": 8
                    }
                ]
            }
        ]
    }
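
Before handing a hand-edited file to the compiler, it can help to check it against the schema rules listed earlier. This is a sketch only: the function name `check_partition_config` is hypothetical, and the rule that every connected device must appear in some partition is an assumption, not something this section states.

```python
def check_partition_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    assigned = set()
    for part in config.get("partitions", []):
        name = part.get("name", "<unnamed>")
        devices = part.get("devices", [])
        if not devices:
            problems.append(f"{name}: needs at least one device")
        assigned.update(dev["deviceId"] for dev in devices)
    for conn in config.get("connections", []):
        ids = conn.get("devices", [])
        if len(ids) < 2:
            problems.append(f"connection {ids}: needs two or more devices")
        if conn.get("type") not in ("host", "p2p"):
            problems.append(f"connection {ids}: type must be 'host' or 'p2p'")
        # Assumption: a connection should only reference partitioned devices.
        for dev_id in ids:
            if dev_id not in assigned:
                problems.append(f"connection {ids}: device {dev_id} is not in any partition")
    return problems


tensor_sliced = {
    "connections": [{"devices": [0, 1], "type": "p2p"}],
    "partitions": [{"name": "Partition0",
                    "devices": [{"deviceId": 0, "numCores": 8},
                                {"deviceId": 1, "numCores": 8}]}],
}
ok_report = check_partition_config(tensor_sliced)
print(ok_report)

# A deliberately broken variant: unknown connection type and an unassigned device.
broken = {"connections": [{"devices": [0, 2], "type": "pcie"}],
          "partitions": tensor_sliced["partitions"]}
bad_report = check_partition_config(broken)
print(bad_report)
```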


## 5.1.4 Custom I/O


The custom I/O feature allows a user to specify the desired layout and precision for the inputs and outputs while loading a network. Instead of compiling the network for the inputs and outputs specified in the model, the network is compiled for the inputs and outputs described in the custom configuration. This feature is useful when the user preprocesses the input data (on a GPU, a CDSP, or by any other method) or processes it offline (as allowed by MLCommons) and wants to skip some input-processing steps. A user who knows the input preprocessing steps can avoid redundant transposes and datatype conversions. Similarly, on the postprocessing side, if the model output feeds the next stage of a pipeline, the desired format and type can be configured as the output of the current stage.

- In case of layout, the user can choose either 'NHWC' or 'NCHW' if the rank of the tensor is four.
- In case of precision, the user can choose either 'float', 'float16', or 'int8' datatypes. For the 'int8' datatype, quantization parameters should also be provided.

This feature is compatible with other features such as mixed precision, external quantization, and quantization profile generation. In this section, the term "model I/O" refers to the input and output datatypes and formats of the original model. The term "custom I/O" refers to the input and output datatypes and formats desired by the user.

## Custom I/O configuration file

- YAML schema
          
    Custom I/O can be applied using a configuration YAML file that contains the following fields for each input and output that needs to be modified:

    - IOName: Name of the input or output that needs to be loaded as per the custom requirement.
    - Layout: NCHW and NHWC layouts are supported. This field is optional and can be skipped for an input or output if rank is not equal to four or if layout customization is not required.
    - Precision: float, float16, and int8 datatypes are supported. This field is optional and can be skipped for an input or output if datatype customization is not required.
    - Scale: scale value. This field is mandatory if the 'Precision' is specified as 'int8'. This field is optional for other datatypes, and it will be ignored even if provided.
    - Offset: offset value. This field is mandatory if the 'Precision' is specified as 'int8'. This field is optional for other datatypes, and it will be ignored even if provided.
- Custom I/O configuration example
          
    Consider an ONNX model with three inputs and three outputs and with the original model I/O and custom I/O configuration as shown in the following table.

|  | I/O Name | Model I/O | Custom I/O |
| --- | --- | --- | --- |
| *Inputs* | input\_0 | float NCHW | int8 NHWC |
| *Inputs* | input\_1 | float NCHW | float NHWC |
| *Inputs* | input\_2 | int64 NCHW | int64 NHWC |
| *Outputs* | output\_0 | float NCHW | float16 NHWC |
| *Outputs* | output\_1 | float (rank != 4) | float16 (No layout change) |
| *Outputs* | output\_2 | int64 (rank != 4) | No change |

Then, the content of the custom I/O configuration YAML file that should be provided is:

    - IOName: input_0 
      Layout: NHWC 
      Precision: int8 
      Scale: 0.12 
      Offset: 3 
      
    - IOName: input_1 
      Layout: NHWC 
      
    - IOName: input_2 
      Layout: NHWC 
      
    - IOName: output_0 
      Layout: NHWC 
      Precision: float16 
      
    - IOName: output_1 
      Precision: float16
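
The field rules above can be checked mechanically before the file is written. The following sketch renders a list of entries into the YAML layout shown in this section and rejects entries that break the documented rules. The helper name `render_custom_io` is hypothetical, and the renderer is plain string formatting rather than a YAML library.

```python
LAYOUTS = {"NCHW", "NHWC"}
PRECISIONS = {"float", "float16", "int8"}
FIELD_ORDER = ("IOName", "Layout", "Precision", "Scale", "Offset")


def render_custom_io(entries):
    """Validate custom I/O entries and render them as YAML text."""
    lines = []
    for entry in entries:
        if "IOName" not in entry:
            raise ValueError("every entry needs an IOName")
        name = entry["IOName"]
        if "Layout" in entry and entry["Layout"] not in LAYOUTS:
            raise ValueError(f"{name}: Layout must be NCHW or NHWC")
        precision = entry.get("Precision")
        if precision is not None and precision not in PRECISIONS:
            raise ValueError(f"{name}: unsupported Precision {precision}")
        # Scale and Offset are mandatory only when Precision is int8.
        if precision == "int8" and not ("Scale" in entry and "Offset" in entry):
            raise ValueError(f"{name}: int8 requires Scale and Offset")
        fields = [f"{key}: {entry[key]}" for key in FIELD_ORDER if key in entry]
        lines.append("- " + fields[0])
        lines.extend("  " + field for field in fields[1:])
        lines.append("")
    return "\n".join(lines)


entries = [
    {"IOName": "input_0", "Layout": "NHWC", "Precision": "int8", "Scale": 0.12, "Offset": 3},
    {"IOName": "input_1", "Layout": "NHWC"},
    {"IOName": "input_2", "Layout": "NHWC"},
    {"IOName": "output_0", "Layout": "NHWC", "Precision": "float16"},
    {"IOName": "output_1", "Precision": "float16"},
]
yaml_text = render_custom_io(entries)
print(yaml_text)
```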

**Notes:**

1. If no change is required for an input or output, it can be skipped in the configuration file.
2. This feature currently does not support layouts other than NCHW and NHWC. For other layouts, the '`Layout`' field should be skipped in the configuration file.
3. Precision can be modified using the custom I/O feature only if the model input or output datatype is float, float16, or int8. For other datatypes, the '`Precision`' field should be skipped in the configuration file.
4. For the int8 datatype, quantization parameters must be provided. They can be obtained for the desired activations quantization schema using the min-max quantization method. With the SQNR, KL divergence minimization, or percentile calibration methods, the min and max values will differ from those of the float32 inputs.
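
As an illustration of note 4, the sketch below derives int8 parameters from the min and max of some float data. It assumes a common asymmetric convention, `real = scale * (q - offset)` with `q` in [-128, 127]. This section does not state the compiler's exact convention or schema, so treat the formulas and function names as assumptions and prefer the values produced by `-dump-custom-IO-config-template` or an external tool.

```python
def min_max_int8_params(values):
    """Derive (scale, offset) from the data range (assumed asymmetric convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("empty range; cannot derive a scale")
    scale = (hi - lo) / 255.0            # 255 steps between -128 and 127
    offset = round(-128 - lo / scale)    # maps lo to -128
    return scale, offset


def quantize(values, scale, offset):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(-128, min(127, round(v / scale) + offset)) for v in values]


def dequantize(codes, scale, offset):
    """Map int8 codes back to floats using real = scale * (q - offset)."""
    return [scale * (q - offset) for q in codes]


samples = [-1.0, 0.0, 0.5, 1.55]
scale, offset = min_max_int8_params(samples)
codes = quantize(samples, scale, offset)
restored = dequantize(codes, scale, offset)
print(scale, offset)
print(codes)
print(restored)
```

The difference between `samples` and `restored` is the quantization error for this data and these parameters.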

## Usage with QAic exec

The QAic exec option '`-custom-IO-list-file`' can be used to provide the custom I/O configuration file as follows:

    $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml

This feature is compatible with other QAic exec options such as `-aic-preproc`. If all transformations need to be pushed to the device, the `-aic-preproc` option can be passed to qaic-exec along with custom I/O.

To obtain the custom I/O configuration template file, use the `-dump-custom-IO-config-template` option with QAic exec as follows:

    $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml

To have the scale and offset fields in the template file filled with proper values, use the `-dump-custom-IO-config-template` option together with the quantization profile generated with model I/O, the activations quantization schema, and the calibration method as follows:

    $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml,pgq_profile_model_IO.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration>

- **Custom I/O with -convert-to-fp16**
    1. Obtain the custom I/O configuration template.

            $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml
    2. Edit the template file `custom_IO_config.yaml` obtained in step 1 and provide the desired layout and precision fields for the inputs and outputs.
    3. Apply custom I/O along with the `-convert-to-fp16` option.

            $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -convert-to-fp16
- **Custom I/O with external quantization**
    1. Obtain the custom I/O configuration template.

            $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml
    2. Edit the template file `custom_IO_config.yaml` obtained in step 1 and provide the desired layout and precision fields for the inputs and outputs. The scale and offset fields will be ignored even if provided in the custom I/O configuration file. If precision is set to int8, scale and offset will be obtained from the external quantization profile file.
    3. Apply custom I/O along with the external quantization feature.

            $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -external-quantization=external_quantization_profile.yaml
- **Custom I/O with profile guided quantization**
          
    In case of profile guided quantization, the same custom I/O configuration file should be used while dumping the profile and loading the profile.

    - **If scale and offset of I/O are known to the user:**
        1. Obtain the custom I/O configuration template.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml
        2. Edit the template file `custom_IO_config.yaml` obtained in step 1 and provide the desired layout and precision fields for the inputs and outputs. Fill the scale and offset fields with proper values known to the user if the corresponding I/O precision is set to int8.
        3. Apply custom I/O and perform profiling using the input data in the format as per the custom I/O configuration.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -dump-profile=pgq_profile_custom_IO.yaml
        4. Apply custom I/O and load the quantization profile file generated in step 3.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -load-profile=pgq_profile_custom_IO.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration> -quantization-schema-constants=<desired_schema>
    - **If scale and offset of I/O are not known to the user:**
              
        In case the user does not have the scale and offset for the I/O that should be used in the custom I/O configuration file, the user can use any external tool such as AIMET for obtaining the scale and offset and follow the steps provided in the previous section.

        Alternatively, the scale and offset can be derived by performing PGQ on model I/O. Then use that scale and offset to process the data into custom I/O datatype and format and perform profiling with the preprocessed input data. These steps are as follows:

        1. Perform profiling with input data as per model I/O, without using the custom I/O configuration. This step can be skipped if a profile file with model I/O is already available.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_model_IO> -dump-profile=pgq_profile_model_IO.yaml
        2. Using the profile file dumped in step 1, obtain the scale and offset in the template file by using the `-dump-custom-IO-config-template` option and providing the desired activations quantization schema and calibration method.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml,pgq_profile_model_IO.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration>
        3. Edit the template file `custom_IO_config.yaml` obtained in step 2 and provide the desired layout and precision fields for the inputs and outputs. Note that the scale and offset fields are already filled with proper values based on the given quantization schema and calibration. They are used if the corresponding I/O precision is set to int8 and ignored otherwise.
        4. Perform profiling with the input data as per the custom I/O configuration file.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -dump-profile=pgq_profile_custom_IO.yaml -custom-IO-list-file=custom_IO_config.yaml
        5. Load the model using the `-load-profile` option. Note that the activations quantization schema and calibration should be the same as those used in step 2 while computing scale and offset.

                $ /opt/qti-aic/exec/qaic-exec -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -load-profile=pgq_profile_custom_IO.yaml -custom-IO-list-file=custom_IO_config.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration> -quantization-schema-constants=<desired_schema>
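
The five-step flow for unknown scale and offset uses the same schema and calibration values in several commands, so it can be convenient to generate the command lines from one place. The sketch below only builds and prints the commands; it does not run them. The function name and its parameters are hypothetical, the flags are the ones shown in the steps above, and the placeholder values must be replaced with real paths and settings.

```python
import shlex

QAIC_EXEC = "/opt/qti-aic/exec/qaic-exec"


def pgq_custom_io_commands(model, model_io_inputs, custom_io_inputs,
                           schema, calibration, constants_schema):
    """Return the qaic-exec argument lists for steps 1, 2, 4, and 5."""
    template = "custom_IO_config.yaml"
    model_profile = "pgq_profile_model_IO.yaml"
    custom_profile = "pgq_profile_custom_IO.yaml"
    quant = [f"-quantization-schema-activations={schema}",
             f"-quantization-calibration={calibration}"]
    return [
        # Step 1: profile with model I/O inputs.
        [QAIC_EXEC, f"-model={model}", f"-input-list-file={model_io_inputs}",
         f"-dump-profile={model_profile}"],
        # Step 2: dump a template with scale and offset filled in.
        [QAIC_EXEC, f"-model={model}",
         f"-dump-custom-IO-config-template={template},{model_profile}", *quant],
        # Step 3 is a manual edit of the template file.
        # Step 4: profile again with custom I/O inputs.
        [QAIC_EXEC, f"-model={model}", f"-input-list-file={custom_io_inputs}",
         f"-dump-profile={custom_profile}", f"-custom-IO-list-file={template}"],
        # Step 5: load the custom I/O profile with the same schema and calibration.
        [QAIC_EXEC, f"-model={model}", f"-input-list-file={custom_io_inputs}",
         f"-load-profile={custom_profile}", f"-custom-IO-list-file={template}",
         *quant, f"-quantization-schema-constants={constants_schema}"],
    ]


commands = pgq_custom_io_commands(
    model="<path_to_model>",
    model_io_inputs="<input_files_list_model_IO>",
    custom_io_inputs="<input_files_list_custom_IO>",
    schema="<desired_schema>",
    calibration="<desired_calibration>",
    constants_schema="<desired_schema>",
)
for argv in commands:
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the placeholders are filled in.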

**Notes:**

1. The Uint8 datatype is currently not supported with the custom I/O feature. Conversion from Uint8 to Int8 should be done by LRT or the device.
2. Int64, Int32, and Int16 are currently not supported with the custom I/O feature.
3. When a low-precision scale (fewer decimal places) and offset are used to preprocess input data into custom I/O, and the same values are then used for dequantization during quantization profile generation for custom I/O, some accuracy is lost.


## 5.1.5 Mixed precision


The mixed precision feature allows a user to execute a network with nodes in a combination of FP32, FP16, and INT8 precision. Specific node instances of each node type can be set to FP16 precision using the `-node-precision-info` option. The `-node-precision-info` option can be used together with the QAic compiler's profile guided quantization and the `-keep-original-precision-for-nodes` option to execute a network in mixed precision (FP32/FP16/INT8).

## Interoperability with "`-keep-original-precision-for-nodes`"

1. The `-keep-original-precision-for-nodes` and `-node-precision-info` options can be used together to create a graph in mixed precision (FP32/FP16).
2. `-keep-original-precision-for-nodes` executes all instances of the specified node kind in their original precision (for example, nodes that are originally FP32 remain FP32).
3. Setting node instances to FP32 is not supported with `-node-precision-info`.

## Node precision info input file

Operator instances that must run in FP16 are identified by the operator's first output name. Provide a YAML file that lists the first output names of those operator instances under the `FP16NodeInstanceNames` field.

Example: Sample YAML file content containing output name of node instances required in FP16.

    FP16NodeInstanceNames: [conv0, bn0, relu0]
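
When the list of FP16 operators is long, the file can be generated from a Python list. This is a small sketch using plain string formatting; the helper name `render_node_precision_info` is hypothetical, and it assumes the output names contain no characters that would need YAML quoting (such as commas or brackets).

```python
def render_node_precision_info(output_names):
    """Render first-output names as a one-line FP16NodeInstanceNames YAML entry."""
    if not output_names:
        raise ValueError("at least one output name is required")
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(output_names))
    return "FP16NodeInstanceNames: [" + ", ".join(unique) + "]\n"


text = render_node_precision_info(["conv0", "bn0", "relu0", "conv0"])
print(text, end="")

# Write the file that is later passed to -node-precision-info.
with open("node_precision.yaml", "w") as handle:
    handle.write(text)
```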

## Assumptions and dependencies

1. Supported for ONNX, Caffe2, and PyTorch models currently.
2. Node instances required to run in FP16 are identified via operator’s first output name.
3. When used with profile guided quantization, the model quantization profile needs to be generated with `-node-precision-info`.
4. During quantization profiling, node instances required to run in FP16 precision should have FP16 kernel implementation for interpreter backend.

## Usage with qaic-exec

Step 1: Generate a quantization profile with `-node-precision-info`.

    $ /opt/qti-aic/exec/qaic-exec -m=./path-to-model -input-list-file=list.txt -node-precision-info=node_precision.yaml -dump-profile=pgq.yaml

    Quantization Profile is being generated.
    Quantization profile is dumped at pgq.yaml

Step 2: Run inference using the PGQ profile generated in Step 1.

    $ /opt/qti-aic/exec/qaic-exec -m=./path-to-model -input-list-file=list.txt -node-precision-info=node_precision.yaml -load-profile=pgq.yaml

    Model is compiled with Int8 precision using PGQ.

## Usage with QAic graph API

Set the graph configuration option `QAicGraphConfig.quantizationConfig.nodePrecisionInfo` to force the execution of specific operator instances with FP16 precision. This option is supported for ONNX, Caffe2, and PyTorch models loaded through the `qaicAddNodesToGraphFromModel` API.

Note: When selecting a Convolution node instance to run in FP16 precision, set BatchNorm node (if there is any) as well as Convolution to FP16 precision to allow fusion of Convolution and BatchNorm.


## 5.1.6 Get intermediate layer outputs


The `-output-node-name` option can be used to save the output of an intermediate model layer. To use this option, pass the name of the output that should be saved as it appears in the model graph. Multiple output names can be passed, separated by commas. If a valid name is passed, the tensor is saved along with the original outputs of the model. This option can be used with Caffe, Caffe2, TensorFlow, PyTorch, and ONNX models.

Sample command for usage:

    $ /opt/qti-aic/exec/qaic-exec -m=./resnet50.onnx -input-list-file=onnx_resnet50/list.txt -output-node-name=gpu_0/res2_0_branch2a_1,gpu_0/res2_2_branch2c_bn_1 -write-output-dir=out

For TensorFlow models, the output names should be passed as they appear in the model graph. If an operator has multiple outputs, the output names should be passed in the format `LayerName:OutputIndex`. For example, to get all the outputs of a node named `Output/CombinedNonMaxSuppression`, the user should specify `-output-node-name=Output/CombinedNonMaxSuppression,Output/CombinedNonMaxSuppression:1,Output/CombinedNonMaxSuppression:2,Output/CombinedNonMaxSuppression:3`.
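
The `LayerName:OutputIndex` pattern can be generated rather than typed by hand. The sketch below follows the example above, where the first output is passed by the bare layer name and later outputs carry an index starting at 1. The helper name `tf_output_node_names` is hypothetical.

```python
def tf_output_node_names(layer_name, num_outputs):
    """Build the -output-node-name argument for every output of one TensorFlow node."""
    if num_outputs < 1:
        raise ValueError("num_outputs must be at least 1")
    # First output: bare name. Remaining outputs: name:index, starting at 1.
    names = [layer_name] + [f"{layer_name}:{i}" for i in range(1, num_outputs)]
    return "-output-node-name=" + ",".join(names)


flag = tf_output_node_names("Output/CombinedNonMaxSuppression", 4)
print(flag)
```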

Intermediate layer output names can be found with applications such as Netron. However, because a PyTorch-traced model uses the SSA format, use the following script to list its output names:

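    import torch

    # Load the traced TorchScript model and switch it to inference mode.
    model = torch.jit.load("model.pt")
    model.eval()
    # Freeze the module so parameters and attributes are folded into the graph.
    model._c = torch._C._freeze_module(model._c)
    # Inline submodule calls on a copy of the graph, then print it.
    # The value names in the printed graph are the names to look up.
    g = model.graph.copy()
    torch._C._jit_pass_inline(g)
    print(g)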

**Limitations**

- Using this option will prevent certain optimizations from occurring. For example, layers whose outputs are saved will not be fused with any other operators.
- If this option is used along with quantization options, the quantization profile of the model must be regenerated.


## 5.2 QAic QPC tool


The QAic QPC tool (qaic-qpc) packs the artifacts generated by the compiler into a single QPC binary that can be passed to the QAic runtime, and provides an unpack feature to extract the individual files from a QPC binary. The tool can also validate the network descriptor, the network descriptor against the metadata, and the metadata against the hardware installed on the system, and it can provide model information.

Precompiled network binaries such as those located in the /opt/qti-aic/test-data folder or generated by the Cloud AI 100 Apps SDK can be used to test.

Test files for precompiled network binaries:

    /opt/qti-aic/test-data/aic100/v2

## Examples

- Extract QPC binary:

        sudo /opt/qti-aic/tools/qaic-qpc extract --input-file /opt/qti-aic/test-data/aic100/v2/14nsp/14nsp-quant-resnet50/programqpc.bin --output-dir ./
- Validate QPC binary:

        sudo /opt/qti-aic/tools/qaic-qpc validate --input-file /opt/qti-aic/test-data/aic100/v2/14nsp/14nsp-quant-resnet50/programqpc.bin
- Add CRC to QPC binary:

        sudo /opt/qti-aic/tools/qaic-qpc addCRC32 --input-file /opt/qti-aic/test-data/aic100/v2/14nsp/14nsp-quant-resnet50/programqpc.bin --output-dir ./ programqpc.bin
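
For scripted use, the three example invocations can be assembled from a small helper. This is a sketch that only builds the argument list, using the subcommands and long options shown in the examples above. The function name `qpc_command` and the `use_sudo` switch are hypothetical.

```python
import shlex

QAIC_QPC = "/opt/qti-aic/tools/qaic-qpc"


def qpc_command(action, input_file, output_dir=None, use_sudo=True):
    """Build the argument list for a qaic-qpc extract, validate, or addCRC32 call."""
    if action not in ("extract", "validate", "addCRC32"):
        raise ValueError(f"unsupported action: {action}")
    argv = (["sudo"] if use_sudo else []) + [QAIC_QPC, action, "--input-file", input_file]
    if action == "validate":
        # The validate example takes only an input file.
        if output_dir is not None:
            raise ValueError("validate does not take an output directory")
    else:
        if output_dir is None:
            raise ValueError(f"{action} needs an output directory")
        argv += ["--output-dir", output_dir]
    return argv


qpc = "/opt/qti-aic/test-data/aic100/v2/14nsp/14nsp-quant-resnet50/programqpc.bin"
validate_argv = qpc_command("validate", qpc)
extract_argv = qpc_command("extract", qpc, output_dir="./")
print(shlex.join(validate_argv))
print(shlex.join(extract_argv))
```

On a host with the SDK installed, an argument list can be executed with `subprocess.run(argv, check=True)`.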

## Argument details

Table : qaic-qpc create (pack) arguments

| Argument | Description |
| --- | --- |
| -i, --input-dir &lt;directory name&gt; | Input directory containing all the files that need to be packed. |
| -n, --input-networkelf &lt;filename&gt; | Input network elf file name. |
| -d, --input-networkdesc &lt;filename&gt; | Input networkdesc filename. |
| -c, --input-constant &lt;filename&gt; | Input constant filename. |
| -e, --input-constantdesc &lt;filename&gt; | Input constantdesc filename. |
| -b, --input-networkbin &lt;filename&gt; | Input networkbin filename. |
| -o, --output-file &lt;filename&gt; | Output filename. All the input files are packed into this single QPC output file. |
| -h, --help | Help |

Table : qaic-qpc extract (unpack) arguments

| Argument | Description |
| --- | --- |
| -i, --input-file &lt;filename&gt; | Packed QPC file as an input file. |
| -o, --output-dir &lt;directory name&gt; | Output directory name where all the files will get unpacked. |

Table : qaic-qpc validate arguments

| Argument | Description |
| --- | --- |
| -i, --input-file &lt;filename&gt; | Packed QPC file as an input file. |

Table : qaic-qpc addCRC32 arguments

| Argument | Description |
| --- | --- |
| -i, --input-file &lt;filename&gt; | Packed QPC file as an input file. |
| -o, --output-dir &lt;directory name&gt; | Output directory name where all the files will get unpacked. |


Last Published: Jul 26, 2023
