# 6 Network execution

Source: [https://docs.qualcomm.com/doc/80-PT790-993B/topic/network-execution.html](https://docs.qualcomm.com/doc/80-PT790-993B/topic/network-execution.html)

## 6.1 QAic runner

QAic runner (`qaic-runner`) is used to run precompiled network binaries, such as those located in the `/opt/qti-aic/test-data` folder or generated by the Cloud AI 100 Apps SDK.

Test data for precompiled workloads:

    /opt/qti-aic/test-data/aic100/v2

Run the quantized ResNet-50 model:

    sudo /opt/qti-aic/exec/qaic-runner -t /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ -i /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/user_idx0_input_tensor_1.bin -a 3 -n 5000 -d 0 -v

Table : qaic-runner argument details

| Argument | Description | Default |
| --- | --- | --- |
| `-d, --aic-device-id <id>` | Specify AIC device ID. | 0 |
| `-D, --dev-list <qid>[:<qid>]` | Map of device IDs for a multi-device network. | 0[:1] |
| `-t, --test-data <path>` | Location of network binaries. |  |
| `-i, --input-file <path>` | Input filename from which to load input data.<br>Specify multiple times for each input file.<br>If no `-i` is given, the system looks for a `bindings.json` in the `-t` directory. |  |
| `-a, --aic-num-of-activations <num>` | Number of activations. | 1 |
| `-n, --num-iter <num>` | Number of iterations. | 40 |
| `--time <t>` | Duration (in seconds) for which to submit inferences. |  |
| `-l, --live-reporting` | Enable periodic live reporting at a 1-second interval. | Off |
| `-r, --live-reporting-period` | Set live reporting period in ms. | 1000 |
| `-s, --stats` | Enable device latency stats collection. |  |
| `--aic-profiling-start-iter` | Profiling start iteration. |  |
| `--aic-profiling-num-samples <num>` | Number of profiling samples to save to file. |  |
| `--aic-profiling-format <level>`<br>**Note:** Will be deprecated soon. Refer to `--aic-profiling-type`. | Profiling format: `ascii`, `json`, or `latency`. Set as many formats as required.<br>**Note:** Use `stats` instead of `ascii` and `trace` instead of `json`; `ascii` and `json` will be deprecated soon. |  |
| `--aic-profiling-out-dir <path>` | Location to save files; the directory should exist and be writable. |  |
| `--aic-profiling-type <type>` | Profiling type: `stats`, `trace`, or `latency` for legacy profiling; `trace_stream` or `latency_stream` for stream profiling. Set multiple times for multiple formats. | none |
| `--aic-profiling-duration` | The duration for running profiling (in ms). After profiling starts, it stops when the profiling duration expires.<br>**Note:** This option only works with stream profiling and not with legacy (num iter based) profiling. | 2000 ms |
| `--aic-profiling-sampling-rate` | Profiling sampling rate [`full/half/fourth/eighth/sixteenth`]. Programs generate profiling samples at the requested rate.<br>**Note:** This option only works with stream profiling and not with legacy (num iter based) profiling. | Full |
| `--aic-profiling-reporting-rate` | Profiling report generation rate (in ms) [`500/1000/2000/4000`]. The profiling report is generated at every requested interval for the profiling duration.<br>**Note:** This option only works with stream profiling and not with legacy (num iter based) profiling. | 500 |
| `-S, --set-size <num>` | Set size for inference loop execution. Minimum: 1. | 10 |
| `--write-output-start-iter <num>` | Write outputs start iteration. |  |
| `--write-output-num-samples <num>` | Number of outputs to write. |  |
| `--write-output-dir <path>` | Location to save output files; the directory should exist and be writable. |  |
| `--aic-lib-path DEPRECATED` | Deprecated. Set env variable QAIC\_LIB to the full path of the custom library. Loads libQAic.so from install location by default. |  |
| `--aic-batch-input-directory` | Batch mode: Process all files from input directory. |  |
| `--aic-batch-input-file-list` | Batch mode: Process all files specified in input file list with relative or absolute paths. |  |
| `--aic-batch-max-memory <mb>` | Batch mode: Limit memory usage when loading files.<br>Provide the parameter in MB. | 1024 |
| `--submit-timeout <num>` | Time to wait for an inference request completion on kernel. Defaults to 0 ms. When 0, kernel defaults to 5000 ms. | 5000 ms |
| `--submit-retry-count <num>` | Number of wait-call retries when an inference request times out. | 5 |
| `--unbound-random` | When populating random values in a buffer, do not consider the input buffer format and fill each byte with a random value between 0 and 255. **Note:** This can result in unexpected behavior from certain networks. |  |
| `--dump-input-buffers` | Dump input buffers used in Benchmarking mode. |  |
| `--auto-batch-input` | Automatically batch inputs to meet the batch size requirements of the network. Inputs should be for batch size 1. |  |
| `-v, --verbose` | Verbose log from program. |  |
| `-T, --aic-threads-per-queue` | Number of threads per queue. | 4 |
| `-p, --pre-post-processing` | Pre-/postprocessing: `on` or `off`. | on |
| `-h, --help` | Help | Default |
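
The flags in the table can also be assembled programmatically, for example when sweeping activation counts from a script. The following is a minimal sketch, not part of the SDK: the helper name `build_qaic_runner_cmd` is hypothetical, and only flags listed in the table above are used. It reproduces the ResNet-50 command shown earlier.

```python
import shlex

QAIC_RUNNER = "/opt/qti-aic/exec/qaic-runner"

def build_qaic_runner_cmd(test_data, input_files=(), activations=1,
                          num_iter=40, device_id=0, verbose=False):
    """Return a qaic-runner argument list (hypothetical helper)."""
    cmd = [QAIC_RUNNER, "-t", test_data]
    for path in input_files:  # -i is repeated once per input file
        cmd += ["-i", path]
    cmd += ["-a", str(activations), "-n", str(num_iter), "-d", str(device_id)]
    if verbose:
        cmd.append("-v")
    return cmd

model_dir = "/opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/"
cmd = build_qaic_runner_cmd(
    model_dir,
    input_files=[model_dir + "user_idx0_input_tensor_1.bin"],
    activations=3, num_iter=5000, device_id=0, verbose=True,
)
# Print the command; pass ["sudo"] + cmd to subprocess.run to execute it.
print(shlex.join(["sudo"] + cmd))
```

Building the command as a list avoids shell-quoting problems if a path contains spaces.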

## 6.2 QAic oversubscription

Oversubscription schedules multiple networks to run on a group of NSPs. The goal is to optimize the transition/switching between networks. The `qaic-program-group-app` application is part of the SDK release and can be found at `/opt/qti-aic/tools/`.

By default, the switching of networks is done on the control plane. Data plane switching (DPS) is also available to optimize the switching time of the networks by enabling the switching over the AIC100 virtual channel. DPS yields better latency/performance results.

**Configuring the switching modes**

The mode of switching the networks is controlled by two parameters:

1. Program group configuration
2. Device configuration

**Program group configuration:**

The user can specify the preferred mode of switching during program group creation through the property `QAicProgramGroupPropertiesProtocol_t`.

    QAicProgramGroupPropertiesProtocol_t
        QAIC_PROGRAM_GROUP_PROTOCOL_CONTROL_PATH  [Default]
        QAIC_PROGRAM_GROUP_PROTOCOL_DATA_PATH

**Device configuration:**

The AIC100 can be configured to turn on or turn off the Data Plane Switching mode using the PVS variable ENABLE\_DPS.

The following table lists the operation mode matrix for all possible configurations.

Table : Operation mode matrix

| Program group property | ENABLE\_DPS | DPS Mode |
| --- | --- | --- |
| X | 0 | OFF |
| QAIC\_PROGRAM\_GROUP\_PROTOCOL\_CONTROL\_PATH | 1 | OFF |
| QAIC\_PROGRAM\_GROUP\_PROTOCOL\_DATA\_PATH | 1 | ON |
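
The matrix reduces to a single rule: DPS is on only when the device has ENABLE\_DPS set to 1 and the program group requests the data path. The sketch below encodes that rule for illustration; the function `dps_enabled` is hypothetical and is not an SDK API.

```python
CONTROL_PATH = "QAIC_PROGRAM_GROUP_PROTOCOL_CONTROL_PATH"
DATA_PATH = "QAIC_PROGRAM_GROUP_PROTOCOL_DATA_PATH"

def dps_enabled(protocol: str, enable_dps: int) -> bool:
    """Return True when the operation mode matrix yields DPS mode ON."""
    if protocol not in (CONTROL_PATH, DATA_PATH):
        raise ValueError(f"unknown protocol: {protocol}")
    # With ENABLE_DPS = 0 the program group property is ignored ("X" row).
    return enable_dps == 1 and protocol == DATA_PATH

for protocol in (CONTROL_PATH, DATA_PATH):
    for flag in (0, 1):
        state = "ON" if dps_enabled(protocol, flag) else "OFF"
        print(f"{protocol} ENABLE_DPS={flag} -> DPS {state}")
```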

Run the application with an oversubscription configuration file:

    $ sudo /opt/qti-aic/tools/qaic-program-group-app --aic-lib-path /opt/qti-aic/dev/lib/x86_64/libQAic.so -i ./<configuration file name>.json

Note:
- `qaic-program-group-app` and `libQAic.so` are part of the SDK release.
- The oversubscription configuration file is not part of the SDK and will be provided separately.
- The configuration file decides the number of program groups, the number of programs per program group, and the resources required.

**Example:**

    $ sudo /opt/qti-aic/tools/qaic-program-group-app --aic-lib-path /opt/qti-aic/dev/lib/x86_64/libQAic.so -i ./oversubscription-config-2x4x1.json

    ProgramStats:'ProgGroup-1-pg1' Samples:6 Enable Time: Min:0us Max:41271us   Avg:26388us
    ProgramStats:'ProgGroup-1-pg1' Samples:6 EnqueueLatencyTime Time: Min:7us  Max:9us  Avg:8us
    ProgramStats:'ProgGroup-1-pg2' Samples:6 Enable Time: Min:0us Max:21570us   Avg:11930us
    ProgramStats:'ProgGroup-1-pg2' Samples:6 EnqueueLatencyTime Time: Min:1us  Max:3us  Avg:2us
    ...

**Oversubscription configuration file example:**

    {
      "program_group": [
        {
          "programGroupId": 1,
          "name": "ProgGroup-1",
          "qid": 0,
          "frequency": 45,
          "profilingProp": {
            "streamProfilingReportingRate": 1000,
            "streamProfilingSamplingRate": "full",
            "streamProfilingDurationMs": 5000,
            "enableTraceProfiling": false,
            "enableLatencyProfiling": true
          },
          "stats_collection_frequency": 500,
          "programs": [
            {
              "numExecObj": 1,
              "name": "pg1",
              "qpc": "../../../opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/programqpc.bin"
            },
            {
              "numExecObj": 1,
              "name": "pg2",
              "qpc": "../../../opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-conv-hmx/programqpc.bin"
            }
          ]
        }
      ]
    }

Note: `profilingProp` is an optional key. To profile a program group, add the `profilingProp` key and its corresponding keys to control profiling. Currently, `qaic-program-group-app` supports collecting profiling information only via stream profiling.

The above configuration file creates:

- A program group named "ProgGroup-1" with two programs named "pg1" and "pg2".
- Each program, "pg1" and "pg2", requires four NSPs to run.
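
A configuration file with this shape can be generated from a script, which helps when the number of programs varies. The sketch below is illustrative only: `make_program_group` is a hypothetical helper, the key names and values are copied from the example above, and the meaning of fields such as `frequency` is not described in this section.

```python
import json

def make_program_group(group_id, name, qid, programs, profiling=None,
                       frequency=45, stats_collection_frequency=500):
    """Build one program-group entry; `programs` is a list of (name, qpc) pairs."""
    group = {"programGroupId": group_id, "name": name, "qid": qid,
             "frequency": frequency}
    if profiling is not None:  # profilingProp is optional
        group["profilingProp"] = profiling
    group["stats_collection_frequency"] = stats_collection_frequency
    group["programs"] = [
        {"numExecObj": 1, "name": prog_name, "qpc": qpc}
        for prog_name, qpc in programs
    ]
    return group

base = "../../../opt/qti-aic/test-data/aic100/v2/4nsp"
config = {"program_group": [make_program_group(
    1, "ProgGroup-1", 0,
    programs=[("pg1", f"{base}/4nsp-quant-resnet50/programqpc.bin"),
              ("pg2", f"{base}/4nsp-conv-hmx/programqpc.bin")],
    profiling={"streamProfilingReportingRate": 1000,
               "streamProfilingSamplingRate": "full",
               "streamProfilingDurationMs": 5000,
               "enableTraceProfiling": False,
               "enableLatencyProfiling": True},
)]}

text = json.dumps(config, indent=2)
print(text)  # write this to a .json file and pass it with -i
```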

Table : qaic-program-group-app argument details

| Argument | Description |
| --- | --- |
| `-i, --input-file <path> ` | Input JSON file path from which to load input data. |
| `--aic-lib-path DEPRECATED ` | Deprecated; set env variable QAIC\_LIB to the full path of the custom library. By default, loads `libQAic.so` from the install location. |
| `-s, --duration(s)` | Total duration, in seconds, for which to run the program. Default: `60`. |
| `-p, --protocol` | Protocol type, `control` or `data`. Default: `control`. |
| `-v, --verbose ` | Verbose log from program. |
| `-h, --help ` | Help |

## 6.3 ONNX runtime support

ONNX Runtime is a cross-platform machine-learning inference accelerator that gives users a common frontend for running models on a range of supported execution providers. The Apps SDK provides AIC backend support for ONNX Runtime v1.10.0.

The QAic execution provider (QAic EP) provides the following features:

- Supports execution of ONNX models using the AIC backend.
- Supports execution and compilation of ONNX models in INT8 and FP16 precisions.
- Supports parallel inference using multiple activations.
- Supports intake of the best parameters from the model-settings YAML file generated by the inference toolkit pipeline.
- Supports intake of the relative path for `aic-binary-dir` in the model-settings YAML file, if the relative-path parameter is set to true.
- Supports compilation and execution of ONNX models with external data format.
- Supports execution of ONNX models using precompiled binaries.
- Supports creation of multiple ONNX runtime sessions provided the device resources are available.
- Supports execution of split ONNX networks on Multi-QAic (MQ) using precompiled binaries.

This version has the following limitations:

- Running subgraphs is not supported. It is assumed that every operator in the ONNX model being executed with ONNX runtime on AIC is supported by the AIC SDK.
- PGQ profile generation is not supported.
- QAic int8 quantization is supported with the following assumptions:
    - PGQ profile is pregenerated and is provided in the model settings file.
    - ONNX runtime optimization is disabled.

## 6.3.1 Steps to build ONNX runtime

Prerequisite software versions:

- Operating system – Ubuntu 20.04
- gcc/g++ toolchain 9
- git 2.17
- CMake 3.18+
- libpci-dev, libudev, libssl-dev, libncurses5, libpng-dev, libgl1-mesa-glx, libglib2.0-0
- Python 3.6+
- Python packages – numpy, wheel, flatbuffers-2.0, pyyaml, opencv-python

The AIC backend support code is provided as a patch along with the SDK. Set up the environment and build ONNXRT with QAic as follows:

    export QAIC_LIB=/opt/qti-aic/dev/lib/x86_64/libQAic.so
    export QAIC_COMPILER_LIB=/opt/qti-aic/dev/lib/x86_64/libQAicCompiler.so
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/qti-aic/dev/lib/x86_64/

    cd /opt/qti-aic/integrations/qaic_onnxrt
    ./build_onnxrt_qaic.sh

This checks out the ONNX runtime repo (tag 1.10.0, as indicated in the QAIC\_ONNXRT\_RELEASE\_VERSION file) in the same location, applies the patch (qaic\_onnxrt.patch), and builds the ONNX runtime distribution with QAic EP enabled.

To rebuild the project, use build.sh within onnxruntime\_qaic:

    ./build.sh --use_qaic --qaic_home /opt/qti-aic/examples/apps/ --build_wheel --parallel --build_shared_lib --config [Release|Debug] [--skip-tests]

## 6.3.2 QAic execution provider options

QAic EP can be configured during session creation using the following provider options:

Table : QAic EP options

| Parameter | Description |
| --- | --- |
| `config` | (string) Path to the model settings file.<br>The file contains the model configuration for the best performance and accuracy. |
| `aic_device_id` | (uint32) Device ID on which inference should run. Default is 0. |

## Example session for QAic execution provider

C++:

    OrtSessionOptionsAppendExecutionProvider_QAic(session_options, config, aic_device_id);

Python:

    import onnxruntime

    # `config`, `aic_device_id`, `sess_options`, and `providers` are assumed to be defined earlier.
    provider_options = []
    qaic_provider_options = {}
    qaic_provider_options['config'] = config
    qaic_provider_options['device_id'] = aic_device_id
    provider_options.append(qaic_provider_options)
    session = onnxruntime.InferenceSession(config['model-path'], sess_options, providers, provider_options)

## 6.3.3 Inference modes

## Compile and inference

In this mode, the QAic EP compiles the model during session initialization. The compiled model is used to run the inference. Provide an empty path to the `aic-binary-dir` in the model settings file to run inference in this mode.

## Inference only

In this mode, the QAic EP only runs the inference. Follow these steps for this mode:

1. Compile the model using the QAic executor.
2. Provide the path to the compiled model in the model settings file.
3. Run inference using the model settings file.
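
The choice between the two modes depends only on whether `aic-binary-dir` is set. The following sketch shows that decision under the assumption that the model settings have already been loaded into a flat dictionary; the function name and the returned labels are hypothetical and are not QAic EP identifiers.

```python
def select_inference_mode(settings: dict) -> str:
    """Return which mode a given model settings mapping would select."""
    binary_dir = (settings.get("aic-binary-dir") or "").strip()
    # An empty aic-binary-dir means the EP compiles during session init.
    return "inference-only" if binary_dir else "compile-and-inference"

print(select_inference_mode({"aic-binary-dir": ""}))
print(select_inference_mode({"aic-binary-dir": "/path/to/qpc"}))
```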

## 6.3.3.1 Example – Running ResNet-50v1 model

End-to-end Python and C++ tests for running a ResNet-50 model with QAic EP are available at `/opt/qti-aic/integrations/qaic_onnxrt/tests/resnet50`.

## Running the ResNet C++ sample

- Compile the sample ResNet C++ test using the `build_tests.sh` script. By default, the test is built using libs from the `onnxruntime_qaic` release build. To enable debugging, rebuild the `onnxruntime_qaic` project in the debug configuration and run `./build_tests.sh` with the `--debug` flag.

        build_tests.sh [--release|--debug]
- Run the executable. The following commands set the environment and run the ResNet-50 model with the provided image on QAic or CPU backend. The program outputs the most probable prediction class index for each iteration.

        cd build/release
        ./qaic-onnxrt-resnet50-test -i <path/to/input/png/image> -m ../../resnet50/resnet50.yaml

Table : ResNet C++ test options

| Flag | Description |
| --- | --- |
| `-m, --model-config` | [Required] Path to the model-setting YAML file. |
| `-i, --input-path` | [Required] Path to the input PNG image file. |
| `-b, --backend` | [Optional, default: `qaic`] Specify `qaic` or `cpu` as backend. |
| `-d, --device-id` | [Optional, default: 0] Specify qaic device ID. |
| `-n, --num-iter` | [Optional, default: 100] Specify the number of iterations for the test. |

## Running the ResNet Python sample

Run `test_resnet.py` at `/opt/qti-aic/integrations/qaic_onnxrt/tests/resnet50`.

    python test_resnet.py --model_config ./resnet50/resnet50.yaml --input_file </path/to/png/image>

Table : ResNet Python test options

| Flag | Description |
| --- | --- |
| `model_config` | [Required] Path to the model-setting YAML file. |
| `input_file` | [Required] Path to the input PNG image file. |
| `backend` | [Optional, default: `qaic`] Specify `qaic` or `cpu` as backend. |
| `device_id` | [Optional, default: 0] Specify qaic device ID. |
| `num_iter` | [Optional, default: 100] Specify the number of iterations for the test. |

## 6.3.3.2 Example – Running models with generic QAic EP test

Run `test_qaic_ep.py` at `/opt/qti-aic/integrations/qaic_onnxrt/tests/`.

    python test_qaic_ep.py --model_config ./resnet50/resnet50.yaml --input_file_list </path/to/input/list>

Table : QAic EP test options

| Flag | Description |
| --- | --- |
| `model_config` | [Required] Path to the model-setting YAML file. |
| `input_file_list` | [Required] Path of the file (.txt) containing a list of batched inputs in .raw format. |
| `backend` | [Optional, default: `qaic`] Specify `qaic` or `cpu` as backend. |
| `device_id` | [Optional, default: 0] Specify qaic device ID. |
| `num_iter` | [Optional, default: 100] Specify the number of iterations for the test. |
| `max_threads` | [Optional, default: 1000] Maximum number of threads to run inferences. |
| `log_level` | [Optional, default: 4] ONNX Runtime log severity level (0-4). |

## 6.3.3.3 Generating model settings file

- The file contains the model configuration for the best performance and accuracy. This file is used to set the AIC configuration in the QAic execution provider of ONNX runtime.
- Refer to [model_preparation_params](https://docs.qualcomm.com/doc/80-PT790-993B/topic/qinference-optimizer-configuration-model-preparation-params.html) to generate the model settings file for your model.

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| **Runtime parameters** |  |  |  |
| `aic-binary-dir` | Absolute path, or path relative to the parent directory of the model settings file, to the directory containing programqpc.bin. | "" | Required to skip compilation. |
| `device-id` | AIC device ID. | 0 | Optional. |
| `set-size` | Set size for inference loop execution. | 10 | Optional. |
| `aic-num-of-activations` | Number of activations. | 1 | Optional. |
| **qaicRegisterCustomOp – Compiler C API** |  |  |  |
| `register-custom-op` | Register custom op using this configuration file. |  | Required if the model has AIC custom ops; vector of string. |
| **Graph config – Compiler API** |  |  |  |
| `aic-depth-first-mem` | Sets DFS memory size. | Set by compiler | Optional.<br>Used in compilation with `aic-enable-depth-first`. |
| `aic-enable-depth-first` | Enables DFS with default memory size;<br>"True", "False" | Set by compiler | Optional.<br>Used in compilation. |
| `aic-num-cores` | Number of AIC cores to be used for inference. | 1 | Optional.<br>Used in compilation. |
| `allocator-dealloc-delay` | Option to increase buffer lifetime.<br>0 - 10, Example: 1. | Set by compiler | Optional.<br>Used in compilation. |
| `Batchsize` | Sets the number of batches to be used for execution. | 1 | Optional.<br>Used in compilation. |
| `convert-to-fp16` | Run all floating-point in fp16;<br>"True", "False" | "False" | Optional.<br>Used in compilation. |
| `enable-channelwise` | Enable channelwise quantization of Convolution op;<br>"True", "False" | Set by compiler | Optional.<br>Used in compilation with PGQ-profile. |
| `enable-rowwise` | Enable rowwise quantization of FullyConnected and SparseLengthsSum ops;<br>"True", "False" | Set by compiler | Optional.<br>Used in compilation with PGQ-profile. |
| `execute-nodes-in-fp16` | Run all instances of the operators in this list with FP16;<br>"True", "False" | Set by compiler | Optional.<br>Used in compilation with PGQ-profile for mixed precision. |
| `hwVersion` | HW version of AIC100. | QAIC\_HW\_V2\_0 | Cannot be configured; set to QAIC\_HW\_V2\_0. |
| `keep-original-precision-for-nodes` | Run operators in this list with original precision at generation. |  | Optional.<br>Used in compilation with PGQ-profile for mixed precision. |
| `Mos` | Effort level to reduce the on-chip memory; Example: "1" | Set by compiler | Optional.<br>Used in compilation. |
| `multicast-weights` | Reduce DDR bandwidth by loading weights used on multiple cores only once and multicasting to the other cores. |  |  |
| `ols` | Factor to increase splitting of the network for parallelism. | Set by compiler | Optional.<br>Used in compilation. |
| `quantization-calibration` | Specify quantization calibration: "None", "KLMinimization", "Percentile", "MSE", "SQNR", "KLMinimizationV2" | "None" | Optional.<br>Used in compilation with PGQ-profile. |
| `quantization-schema-activations` | Specify quantization schema: "asymmetric", "symmetric", "symmetric\_with\_uint8", "symmetric\_with\_power2\_scale" | "symmetric\_with\_uint8" | Optional.<br>Used in compilation with PGQ-profile. |
| `quantization-schema-constants` | Specify quantization schema: "asymmetric", "symmetric", "symmetric\_with\_uint8", "symmetric\_with\_power2\_scale" | "symmetric\_with\_uint8" | Optional.<br>Used in compilation with PGQ-profile. |
| `size-split-granularity` | Sets the maximum tile size, in KiB, between 512 and 2048.<br>Example: 1024. | Set by compiler | Optional.<br>Used in compilation. |
| **Model params – Compiler API** |  |  |  |
| `model-path` | Path to model file |  | Required.<br>Used in compilation, ONNXRT framework. |
| `onnx-define-symbol` | Define an ONNX symbol with its value. Pairs of ONNX symbol key,value separated by space. |  | Required.<br>Used in compilation, ONNXRT framework. |
| `external-quantization` | Path to load the externally generated quantization profile |  | Optional. |
| `node-precision-info` | Path to load model loader precision file for setting node instances to FP16 or FP32 |  | Optional.<br>Used in compilation with PGQ-profile for mixed precision. |
| **Common** |  |  |  |
| `relative-path` | `aic-binary-dir` absolute path will be constructed using base-path of model-settings file;<br>"True", "False" | "False" | Optional.<br>Set to true to allow relative-path for `aic-binary-dir`. |
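
The interaction between `relative-path` and `aic-binary-dir` can be illustrated with a short path computation. This is a sketch of the rule as described in the table, not the QAic EP implementation: the function `resolve_binary_dir` is hypothetical, and it assumes the settings are available as a flat mapping with the option names used above.

```python
from pathlib import PurePosixPath

def resolve_binary_dir(settings_file, settings):
    """Return the directory expected to hold programqpc.bin, or None if unset."""
    binary_dir = settings.get("aic-binary-dir", "")
    if not binary_dir:
        return None  # empty value: the model is compiled at session creation
    path = PurePosixPath(binary_dir)
    relative = str(settings.get("relative-path", "False")).lower() == "true"
    if relative and not path.is_absolute():
        # Resolve against the parent directory of the model settings file.
        path = PurePosixPath(settings_file).parent / path
    return path

# Hypothetical locations used only for illustration.
print(resolve_binary_dir(
    "/models/resnet50/resnet50.yaml",
    {"aic-binary-dir": "resnet50_qpc", "relative-path": "True"},
))
```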

## 6.4 Triton support

The Apps SDK includes a release for Triton 22.02 with support for:

1. An AIC ONNXRT backend (qaic-onnxrt backend) that integrates AIC100 runtime and compiler support leveraging an ONNX runtime framework.
2. A custom C++ AIC backend (qaic backend) for a Triton inference server, which uses AIC100 runtime APIs to execute inferencing on inputs through a precompiled network of a given model.

## 6.4.1 Features and limitations

## qaic-onnxrt backend

- Supports compile + inference and inference only modes.
- Supports sharing an ONNXRT session for a performance matchup against qaic-runner.
- Supports intake of the best parameters from the model-settings YAML file generated by the inference toolkit pipeline.
- Supports intake of relative path for the model-settings YAML file in the configuration protobuf text file of the model.
- Supports the following Triton features: ensemble and shared memory.
- Does not support the Triton feature of dynamic batching.
- Additional features and limitations are common with ONNX runtime. Refer to [ONNX runtime support](https://docs.qualcomm.com/doc/80-PT790-993B/topic/onnxruntime-support-feature.html).

## qaic backend

- Does not support compilation. Uses program binaries compiled with qaic-exec to execute inference.
- Supports the following Triton features: ensemble, dynamic batching, and shared memory.

## 6.4.1.1 Steps to build Triton server

A Docker configuration file is used to build a Triton server container with qaic-onnxrt and qaic backend support.

    {Unzipped Apps SDK Path}/tools/docker-build/config/x86-64/Dockerfile.triton

## 6.4.1.1.1 Build Triton server image

    cd {Unzipped Apps SDK Path}/tools/docker-build/
    ./build_image.sh --apps-sdk [Path to zipped Apps SDK] --platform-sdk [Path to zipped Platform SDK] --tag [triton-image] --os triton

- This builds a docker image with:
    - Triton server tag: `nvcr.io/nvidia/tritonserver:22.02-py3`
    - ONNXRT version: 1.10.0
    - Backend API version: 1.8
    - Backends: onnxruntime (with qaic EP), qaic
    - Default model repository generated for distil\_bert, yolov5m, ensemble to run on AIC backend at `/opt/qti-aic/aic-triton-model-repo`

## 6.4.1.1.2 Prepare model repository

- The default model repository is created within the Triton server:

        /opt/qti-aic/aic-triton-model-repo
- You can skip this section for qaic backend validation with the default model repository.
- The model repository path should be mapped while launching the Triton container.

## Directory structure

Place models that need to be loaded and the corresponding configuration files as per the directory structure:

    <model-repo-onnx>
    ├── <model_1>_onnx
    │   ├── 1
    │   │   └── aic100
    │   │       ├── <model_1>_qpc
    │   │       │   └── programqpc.bin [required to skip compilation]
    │   │       ├── model.onnx
    │   │       └── <model_settings>.yaml [required for qaic-onnxrt backend]
    │   └── config.pbtxt
    ├── <model_2>_onnx ...
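
The layout above can be prepared with a short script before copying the model files into place. The sketch below is a hypothetical helper, not an SDK tool: it creates only the directories and returns the locations where `config.pbtxt`, `model.onnx`, and `programqpc.bin` are expected. The model name `resnet50` and the temporary repository root are placeholders.

```python
import tempfile
from pathlib import Path

def create_model_entry(repo_root, model_name, version="1"):
    """Create the directory skeleton for one model and return expected file paths."""
    model_dir = Path(repo_root) / f"{model_name}_onnx"
    aic_dir = model_dir / version / "aic100"
    qpc_dir = aic_dir / f"{model_name}_qpc"
    qpc_dir.mkdir(parents=True, exist_ok=True)  # also creates the parent dirs
    return {
        "config": model_dir / "config.pbtxt",
        "model": aic_dir / "model.onnx",
        "qpc": qpc_dir / "programqpc.bin",  # needed only to skip compilation
    }

with tempfile.TemporaryDirectory() as repo:
    for role, path in create_model_entry(repo, "resnet50").items():
        print(role, path.relative_to(repo).as_posix())
```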

## Generating model settings file

See [Configuration details for Single Model use case](https://docs.qualcomm.com/doc/80-PT790-993B/topic/configuration-details-modelprep.html) and [Configuration details for Model Chaining use case](https://docs.qualcomm.com/doc/80-PT790-993B/topic/configuration-details-modelchaining.html).

## Customize config.pbtxt

For a template, refer to `config.pbtxt` in:

[https://github.com/triton-inference-server/server/tree/main/docs/examples/model_repository/densenet_onnx](https://github.com/triton-inference-server/server/tree/main/docs/examples/model_repository/densenet_onnx)

## Enable execution on AIC100

To enable execution on the AIC100 through the qaic/qaic-onnxrt backend, set the parameters shown in the following table within the configuration protobuf text files of the model repository.

Table : Triton – AIC registration

| No. | Parameter | Value |
| --- | --- | --- |
| 1 | default\_model\_filename | "aic100/model.onnx" |
| 2 | parameters: use\_qaic | { string\_value: "true" } |
| 3 | instance\_group: kind | KIND\_MODEL |

## 6.4.1.1.3 Configure a model to load with the qaic-onnxrt backend

## Configure QAIC EP endpoint parameters

Configure the model\_settings file path and device-id to be passed on to ONNXRT, as shown in the following table.

Table : Triton – OnnxRT - QAic EP configuration

| No. | Parameter | Value |
| --- | --- | --- |
| 1 | platform | "onnxruntime\_onnx" |
| 2 | backend | "onnxruntime" [Not required if platform is configured as above] |
| 3 | parameters: config | { string\_value: "1/aic100/[model\_settings].yaml" } |
| 4 | parameters: device\_id | { string\_value: "[QID]" } |
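
The table rows might translate into a `config.pbtxt` fragment along these lines. This is a sketch only: `MODEL_REPO`, the `bert_qa_onnx` model name, the `model_settings.yaml` file name, and the device ID `0` are illustrative placeholders, not values taken from the SDK.

```shell
# Hypothetical repository location; defaults to a scratch directory.
MODEL_REPO="${MODEL_REPO:-$(mktemp -d)}"
MODEL_DIR="$MODEL_REPO/bert_qa_onnx"    # placeholder model name
SETTINGS_FILE="model_settings.yaml"     # placeholder settings file name
QID=0                                   # placeholder device ID
mkdir -p "$MODEL_DIR"

# Unquoted heredoc so the shell substitutes SETTINGS_FILE and QID.
cat > "$MODEL_DIR/config.pbtxt" <<EOF
platform: "onnxruntime_onnx"
parameters {
  key: "config"
  value { string_value: "1/aic100/${SETTINGS_FILE}" }
}
parameters {
  key: "device_id"
  value { string_value: "${QID}" }
}
EOF
```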

## Enable ORT session sharing

To enable session sharing across an instance group, set the parameter shown in the following table in the model repository's protobuf text configuration file (`config.pbtxt`).

Table : Triton – ORT session sharing

| No. | Parameter | Value |
| --- | --- | --- |
| 1 | parameters: config | { string\_value: "1/aic100/[model\_settings].yaml" } |

## 6.4.1.1.4 Configure a model to load with qaic backend

Set the backend to "qaic" and configure the backend parameters shown in the following table.

Table : Triton – qaic backend configuration

| No. | Parameter | Value |
| --- | --- | --- |
| 1 | backend | "qaic" |
| 2 | parameters: qpc\_path | { string\_value: "&lt;absolute path of the folder containing the programqpc.bin file&gt;" } |
| 3 | parameters: device\_id | { string\_value: "[QID]" } |
| 4 | parameters: set\_size | { string\_value: "&lt;Set size for inference loop execution&gt;" } |
| 5 | parameters: no\_of\_activations | { string\_value: "&lt;count of runtime activations&gt;" } |
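
For illustration, the five rows could be rendered into a `config.pbtxt` fragment as follows. All values here are assumptions chosen for the sketch: `MODEL_REPO`, the `resnet50_qaic` model name, the `/path/to/qpc` folder, and the numeric settings should be replaced with values for the actual deployment.

```shell
# Hypothetical repository location; defaults to a scratch directory.
MODEL_REPO="${MODEL_REPO:-$(mktemp -d)}"
MODEL_DIR="$MODEL_REPO/resnet50_qaic"   # placeholder model name
QPC_DIR="/path/to/qpc"                  # placeholder: folder holding programqpc.bin
QID=0                                   # placeholder device ID
SET_SIZE=10                             # illustrative value only
ACTIVATIONS=1                           # illustrative value only
mkdir -p "$MODEL_DIR"

{
  printf 'backend: "qaic"\n'
  # Emit one parameters block per key=value pair.
  for kv in "qpc_path=$QPC_DIR" "device_id=$QID" \
            "set_size=$SET_SIZE" "no_of_activations=$ACTIVATIONS"; do
    printf 'parameters {\n  key: "%s"\n  value { string_value: "%s" }\n}\n' \
      "${kv%%=*}" "${kv#*=}"
  done
} > "$MODEL_DIR/config.pbtxt"
```

Every parameter value is passed as a string, matching the `string_value` form used in the table.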

## 6.4.1.1.5 Run Triton server

**Launch server container:**

    docker run -it --rm --privileged --shm-size=1g --net=host -v [Path to model_repo_onnx]:/opt/qti-aic/triton-models qran-triton-x86_64:[triton-image tag] bash

**Run server with mapped model repository:**

    /opt/tritonserver/bin/tritonserver --model-repository=/opt/qti-aic/triton-models

**Run server with default model repository:**

    /opt/tritonserver/bin/tritonserver --model-repository=/opt/qti-aic/aic-triton-model-repo
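
Before sending requests, it can help to wait until the server reports that it is ready. The helper below is a sketch, not part of the SDK: `wait_for_triton` is a hypothetical function name, and the URL assumes this build keeps upstream Triton's default HTTP port (8000) and `/v2/health/ready` endpoint, reachable on localhost because the container uses `--net=host`.

```shell
# Poll the readiness endpoint until it answers or the attempts run out.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
wait_for_triton() {
  url="${1:-http://localhost:8000/v2/health/ready}"   # assumed default endpoint
  attempts="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors, i.e. server not ready yet.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling `wait_for_triton` with no arguments polls once per second for up to 30 attempts. A non-zero return status means the server never reported ready within that window.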

## 6.4.1.1.6 Launch Triton client container

    docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:22.02-py3-sdk

## 6.4.1.2 Running inference on the client

## 6.4.1.2.1 image\_client

Example:

    /workspace/install/bin/image_client -m densenet_onnx -c 1 -s INCEPTION /workspace/images/mug.jpg

## 6.4.1.2.2 perf\_analyzer

Example:

    perf_analyzer -m bert_qa_onnx

## 6.4.1.3 Inference modes (for qaic-onnxrt backend)

## 6.4.1.3.1 Compile and inference

In this mode, the QAic EP compiles the model during session initialization and then uses the compiled model to run inference. To select this mode, leave the `aic-binary-dir` path empty in the model settings file.

## 6.4.1.3.2 Inference only

In this mode, the QAic EP only runs inference, using a model that was compiled beforehand. To use this mode:

1. Compile the model using QAic executor.
2. Provide the path to the compiled model in the model settings file.
3. Run inference using the model settings file.
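
A quick way to check which of the two modes a settings file would select is to inspect its `aic-binary-dir` entry. The helper below is a rough sketch under two assumptions that the text does not confirm: that the key appears on its own line in `key: value` form, and that the same key holds the compiled-model path in inference-only mode. The function name `inference_mode` is hypothetical.

```shell
# Print the mode implied by a settings file's aic-binary-dir entry.
inference_mode() {
  # Take the text after "aic-binary-dir:" and drop quotes and spaces.
  dir=$(sed -n 's/^[[:space:]]*aic-binary-dir:[[:space:]]*//p' "$1" \
        | head -n 1 | tr -d "\"' ")
  if [ -z "$dir" ]; then
    echo "compile-and-inference"
  else
    echo "inference-only"
  fi
}

# Example with a throwaway settings file whose path is empty.
settings=$(mktemp)
printf 'aic-binary-dir: ""\n' > "$settings"
inference_mode "$settings"
```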

Last Published: Jul 26, 2023
