# Profiling Models with QAIRT Visualizer

This tutorial shows how to generate profiling reports on QAIRT backends. It assumes that you have
completed the [Mobilenet V2 Inference on HTP](https://docs.qualcomm.com/doc/80-87189-2/topic/on_device_inference.html#on-device-inference-android) tutorial and already have a working environment set up.

Note

The expected flow of this tutorial is to execute each snippet step by step. If you would like to skip
the breakdown, you can find the full tutorial scripts at:

> - For basic profiling see: `<SDK>/examples/QAIRT/python/profiling_tutorial.py`
> - For performance reports see: `<SDK>/examples/QAIRT/python/op_trace_profiling_tutorial.py`

The parameters for this tutorial are as follows:

> - Framework: PyTorch
> - Model: [InceptionV3](https://pytorch.org/hub/pytorch_vision_inception_v3)
> - Configurations:
> 
>     - Host OS: Linux (x86\_64)
>     - Target Devices: **Snapdragon Android Device**
>     - Processor: Qualcomm Neural Processing Unit (NPU)
>     - Backend: HTP

Note

This example is not compatible with Windows targets.

Tip

This tutorial creates some temporary files as part of the workflow. To customize the temporary file
location, set the `QAIRT_TMP_DIR` environment variable to a location of your choosing.
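
As a minimal sketch, the variable can also be set from Python. The directory name below is a hypothetical choice, and setting the variable before importing `qairt` is an assumption made for safety, since this tutorial does not state when the SDK reads it.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical location chosen for illustration; any writable directory works.
tmp_dir = Path(tempfile.gettempdir()) / "qairt_tutorial_tmp"
tmp_dir.mkdir(parents=True, exist_ok=True)

# Assumption: set the variable before importing qairt so that it is visible
# regardless of when the SDK reads it.
os.environ["QAIRT_TMP_DIR"] = str(tmp_dir)

print(f"QAIRT_TMP_DIR={os.environ['QAIRT_TMP_DIR']}")
```

Exporting the variable in the shell before launching Python has the same effect.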

## Step 1. Prerequisites

This example uses the QAIRT Visualizer Python package. You can obtain it here:
[QAIRT Visualizer](https://docs.qualcomm.com/bundle/publicresource/topics/80-87189-1/getting-started.html?product=1601111740009302#setup).

Following installation, please ensure you can import the visualizer by running the following command in a shell:

    python -c "from qairt_visualizer import view"

If you would rather run the code without visualization, you can skip this breakdown and go
directly to the tutorial scripts in the SDK.

In that case, disable the visualizer by setting the following flag at the top of each file:

    ENABLE_QAIRT_VISUALIZER = False
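
If you prefer not to hardcode the flag, one option is to derive it from whether the package can be found. This is only a sketch: the helper `visualizer_available` is a hypothetical name and is not part of the SDK scripts, which set the flag by hand.

```python
import importlib.util


def visualizer_available(package: str = "qairt_visualizer") -> bool:
    """Return True if the given top-level package can be located for import."""
    return importlib.util.find_spec(package) is not None


# Derive the flag instead of hardcoding it (hypothetical alternative to
# editing the flag manually at the top of each tutorial file).
ENABLE_QAIRT_VISUALIZER = visualizer_available()
print(f"ENABLE_QAIRT_VISUALIZER = {ENABLE_QAIRT_VISUALIZER}")
```

`find_spec` only locates the package and does not import it, so the check stays cheap even when the visualizer is installed.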

## Step 2. Setup

    import json
    import os
    import platform
    from pathlib import Path
    from typing import Any

    import numpy as np
    import torch
    import torchvision.models as models

    import qairt
    from qairt import CompileConfig, Device, DevicePlatformType, Profiler
    from qairt.api.common.backends.htp import HtpDeviceConfig, HtpGraphConfig, PerfProfile
    from qairt.api.configs.device import DeviceFactory, RemoteDeviceIdentifier

    # Needed for the view() calls in the later steps
    from qairt_visualizer import view

## Step 3. Get an InceptionV3 model

Load a pretrained InceptionV3 model and export it to ONNX:

    # Create a directory for artifacts
    inceptionv3_artifacts = Path("./inceptionv3_artifacts")
    inceptionv3_artifacts.mkdir(exist_ok=True)

    # Load the pretrained model and switch it to inference mode
    model = models.inception_v3(weights="Inception_V3_Weights.DEFAULT")
    model.eval()

    # Export the PyTorch model as an ONNX model
    # Prepare the dummy input
    dummy_input = torch.rand((1, 3, 224, 224), dtype=torch.float32)
    inceptionv3_model_path = str(inceptionv3_artifacts / "inceptionv3.onnx")

    torch.onnx.export(
        model,
        (dummy_input,),
        inceptionv3_model_path,
        input_names=["input"],
        output_names=["output"],
        opset_version=11,
    )

## Step 4. Convert the model

Convert the model and set the `enable_framework_trace` argument to `True`. The argument creates a mapping
from framework operations to their corresponding QAIRT operations, which is used during profiling.

    converted_model = qairt.convert(inceptionv3_model_path, enable_framework_trace=True)

## Step 5. Set up an Android device

Reuse the Android setup instructions from Step 4 of [Mobilenet V2 Inference on HTP](https://docs.qualcomm.com/doc/80-87189-2/topic/on_device_inference.html#on-device-inference-android):

    android_serial = os.getenv("ANDROID_SERIAL")
    android_hostname = os.getenv("ANDROID_HOSTNAME")

    device_id = RemoteDeviceIdentifier(serial_id=android_serial, hostname=android_hostname)
    android_device = Device(identifier=device_id, type=DevicePlatformType.ANDROID)
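
Because `os.getenv` silently returns `None` for unset variables, it can help to validate them before building the device identifier. The sketch below assumes you want to fail fast when `ANDROID_SERIAL` is missing. Whether QAIRT itself requires the serial is not stated in this tutorial, and `read_android_target` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional, Tuple


def read_android_target(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (serial, hostname) read from the environment mapping.

    Assumption: a missing ANDROID_SERIAL is treated as an error here so that
    the problem surfaces before any device call is made.
    """
    serial = env.get("ANDROID_SERIAL")
    if not serial:
        raise RuntimeError("ANDROID_SERIAL is not set; export the device serial first.")
    # An empty hostname is normalized to None.
    hostname = env.get("ANDROID_HOSTNAME") or None
    return serial, hostname


# Usage with an explicit mapping; the serial below is a placeholder value.
serial, hostname = read_android_target({"ANDROID_SERIAL": "0123456789ABCDEF"})
print(serial, hostname)
```

The returned values can then be passed to `RemoteDeviceIdentifier` exactly as in the snippet above.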

## Step 6. Generating Profiling Reports

Profiling reports provide insight into the performance of the model on your target device. Reports
can be generated at different levels to provide finer-grained details on metrics such as bandwidth and latency.
The following sections will guide you through the process of generating different profiling reports for your model.

### Basic Reports

To generate a profiling report, you can instantiate a [`qairt.Profiler`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-core-api.html#qairt.Profiler) instance. The profiler
acts as a context manager that gathers profiling data from compile and execution calls. You can generate
basic, detailed, client, or backend reports by setting the `level` parameter in the context manager.

Generating a report is as simple as performing inference, compilation, or both within a profiler context:

    input_array = np.random.randn(1, 3, 224, 224).astype(np.float32)
    
    # You may set the backend here:
    desired_backend = "HTP"
    
    with Profiler(context={"level": "basic"}) as profiler:
        # Execute the model directly without compiling.
        _ = converted_model(input_array, device=android_device, backend=desired_backend)
    
        # Generate the profiler report
        basic_model_report = profiler.generate_report()
    
    basic_report_path = str(inceptionv3_artifacts / "profiling_report.json")
    basic_model_report.dump(basic_report_path)
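
The pattern used here, where a context manager collects data from calls made inside the `with` block and then produces a report, can be illustrated in isolation with a toy stand-in. `ToyProfiler` below is hypothetical and is not how `qairt.Profiler` is implemented. In particular, the toy needs an explicit `record` call, whereas the real profiler gathers data from compile and execute calls on its own.

```python
import time
from typing import Any, Callable, Dict, List


class ToyProfiler:
    """Toy stand-in that times calls made inside a `with` block."""

    def __init__(self, level: str = "basic") -> None:
        self.level = level
        self.events: List[Dict[str, Any]] = []

    def __enter__(self) -> "ToyProfiler":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Returning False lets exceptions raised inside the block propagate.
        return False

    def record(self, name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Run fn, store its wall-clock duration, and return its result."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_us = (time.perf_counter() - start) * 1e6
        self.events.append({"name": name, "duration_us": elapsed_us})
        return result

    def generate_report(self) -> Dict[str, Any]:
        return {"level": self.level, "events": list(self.events)}


with ToyProfiler(level="basic") as profiler:
    total = profiler.record("sum", sum, range(1000))
    report = profiler.generate_report()

print(total, report["level"], len(report["events"]))
```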

You can view the report using the `view` function. This opens the report in a QAIRT Visualizer window.

    view(reports=basic_report_path)

Note

Generating detailed reports may take slightly longer due to intermediate output tensor generation.

### Step 7. Detailed Reports

    # Create the data as before
    input_array = np.random.randn(1, 3, 224, 224).astype(np.float32)

    # You may set the backend here
    desired_backend = "HTP"

    # Generating a detailed report is as simple as changing the context level
    with Profiler(context={"level": "detailed"}) as profiler:
        _ = converted_model(input_array, device=android_device, backend=desired_backend)

        # Generate the profiler report
        detailed_model_report = profiler.generate_report()

    detailed_report_path = str(inceptionv3_artifacts / "detailed_report.json")
    detailed_model_report.dump(detailed_report_path)
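
If the visualizer is disabled, a dumped report can still be inspected as ordinary JSON. The report schema is not described in this tutorial, so the sketch below only lists the top-level keys and their types. `summarize_report` is a hypothetical helper, and the demo file holds placeholder content rather than real profiler output.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Union


def summarize_report(path: Union[str, Path]) -> Dict[str, str]:
    """Map each top-level key of a JSON file to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list) are reported under a single key.
    return {"<root>": type(data).__name__}


# Demo on a placeholder file; the keys below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "placeholder_report.json"
    placeholder.write_text(json.dumps({"metadata": {}, "events": []}), encoding="utf-8")
    summary = summarize_report(placeholder)

print(summary)
```

In the tutorial flow you would call `summarize_report(detailed_report_path)` after the report has been dumped.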

You can view the detailed report and model using the visualizer.
This view enables you to interact with the model and report simultaneously.

    view(converted_model, reports=detailed_report_path)

Note

The following section is HTP specific.

### Step 8. Performance Reports

In this section, we explore the impact of applying a few HTP backend settings on the generated
profiling reports. We’ll specialize the compile configuration by enabling fp16 precision and
adding SoC-specific details to improve performance.

First, we’ll retrieve the chipset of the Android device and set the SoC details.

    # Retrieve the chipset
    chipset = android_device.get_chipset()
    print(f"Device Chipset: {chipset}\n")

Assuming the device is a Snapdragon 8 Elite Android device, the code above prints:

    Device Chipset: SM8750

If you see the value `UNKNOWN`, your chipset could not be detected automatically. You can manually
identify the chipset for your device from this table: [Chipsets](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html#supported-snapdragon-devices)
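
A small fallback helper can make the manual override explicit. This is a sketch that assumes the detected value can be compared as a string after `str()` conversion. `resolve_chipset` is a hypothetical name, and the fallback must be a value you looked up yourself in the table above.

```python
from typing import Any


def resolve_chipset(detected: Any, fallback: str) -> str:
    """Return the detected chipset, or the fallback if detection failed."""
    name = str(detected).strip() if detected is not None else ""
    if not name or name.upper() == "UNKNOWN":
        return fallback
    return name


# Detection failed, so the manually chosen value is used.
chipset = resolve_chipset("UNKNOWN", fallback="SM8750")
print(f"Device Chipset: {chipset}")
```

In the tutorial flow, the first argument would be the result of `android_device.get_chipset()`.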

Next, we’ll set the SoC details for the HTP backend, assuming SM8750. Change the chipset below if you are using a different device.

    # Then set the SoC details
    soc_details = DeviceFactory.get_device_soc_details("HTP", chipset)

    print(f"Device SoC Details: {soc_details.model_dump_json(indent=4)}\n")

You can inspect the SoC details to see additional fields such as `num_of_hvx_threads` and `vtcm_size_in_mb`:

    Device SoC Details: {
        "chipset": "SM8750",
        "model": "69",
        "dsp_arch": 79,
        "vtcm_size_in_mb": 8,
        "num_of_hvx_threads": 6,
        "supports_fp16": true
    }
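
To show how these fields can feed the graph configuration, the sketch below parses the JSON printed above and derives a plain dictionary of settings. Tying `fp16_relaxed_precision` to `supports_fp16` is an assumption made for illustration, not something this tutorial prescribes, and the dictionary is a stand-in rather than an `HtpGraphConfig` instance.

```python
import json

# The JSON below is the sample output shown above for SM8750.
soc_details_json = """
{
    "chipset": "SM8750",
    "model": "69",
    "dsp_arch": 79,
    "vtcm_size_in_mb": 8,
    "num_of_hvx_threads": 6,
    "supports_fp16": true
}
"""

soc = json.loads(soc_details_json)

# Assumption: only request fp16 relaxed precision when the SoC reports fp16 support.
graph_settings = {
    "name": "inceptionv3",
    "fp16_relaxed_precision": bool(soc.get("supports_fp16", False)),
    "vtcm_size_in_mb": soc["vtcm_size_in_mb"],
    "hvx_threads": soc["num_of_hvx_threads"],
}

print(graph_settings)
```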

We’ll use the SoC details to configure the graph and enable fp16 relaxed precision:

    # Create the graph config
    htp_graph_config = HtpGraphConfig(
        name="inceptionv3",
        fp16_relaxed_precision=True,
        vtcm_size_in_mb=soc_details.vtcm_size_in_mb,
        hvx_threads=soc_details.num_of_hvx_threads,
    )

    # Create the compile config
    compile_config = CompileConfig(
        backend="HTP", soc_details=f"chipset:{chipset}", graph_custom_configs=[htp_graph_config]
    )

    # Set the HTP performance profile to burst. Defaults to HIGH_PERFORMANCE
    compile_config.device_custom_configs[0].cores[0].perf_profile = PerfProfile.BURST

Next, we’ll compile the model and generate an Op Trace report.

#### Op Trace Reports

Op trace provides internal graph execution details in the form of per-operation cycle counts across each hardware thread.
This information is useful for identifying performance bottlenecks at an operation level.

To generate an op trace, we need to compile and execute the model in a profiler context. The profiler
gathers the report from each call to compile and execute.

    # Create a random input array
    input_array = np.random.randn(1, 3, 224, 224).astype(np.float32)
    
    with Profiler(context={"level": "detailed", "option": "optrace"}) as profiler:
        # Compile the model
        compiled_model = qairt.compile(converted_model, config=compile_config)
    
        # Execute the model
        _ = compiled_model(input_array, device=android_device)
    
        # Generate the op trace report
        op_trace_report = profiler.generate_report()
    
        # Save the profiler report as a .json file
        op_trace_report_path = str(inceptionv3_artifacts / "op_trace_report.json")
        op_trace_report.dump(op_trace_report_path)

Note

A compile config instance should be passed to the compile function to generate an op trace report.

The op trace output is a JSON file that can be viewed with the visualizer. You can interact with the model and
observe the op trace nodes graphically by viewing both in the same window.

    view(converted_model, reports=op_trace_report_path)

Note

To view them in separate windows, pass `options=DisplayOptions(use_same_workspace=False)` to `view`.
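
The idea of using per-operation cycle counts to find bottlenecks can be sketched independently of the report format. The op trace schema is not described in this tutorial, so the example below works on made-up `(op_name, thread_id, cycles)` tuples. Both the record layout and the helper `rank_ops_by_cycles` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_ops_by_cycles(
    records: Iterable[Tuple[str, int, int]], top_n: int = 3
) -> List[Tuple[str, int]]:
    """Sum cycles per op across threads and return the top_n most expensive ops."""
    totals: Dict[str, int] = defaultdict(int)
    for op_name, _thread_id, cycles in records:
        totals[op_name] += cycles
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Made-up records for illustration only; these are not real measurements.
sample_records = [
    ("op_a", 0, 1200),
    ("op_a", 1, 800),
    ("op_b", 0, 500),
    ("op_c", 0, 2500),
    ("op_c", 2, 100),
]

ranking = rank_ops_by_cycles(sample_records, top_n=2)
print(ranking)
```

Summing across threads gives total work per operation. Taking the maximum per thread instead would approximate the contribution to the critical path, which may be the more useful view for latency.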

#### Qualcomm Hexagon Analysis Summary (QHAS) Report

A QHAS report includes a summary of overall HTP resource utilization, active cycles, and the dominant cycle path,
along with tracing back to the original QNN graph.

To view the QHAS report, we can save the report as a .json file and then view it using the visualizer.

    qhas_report_path = str(inceptionv3_artifacts / "qhas_report.json")
    op_trace_report.summary.dump(qhas_report_path)
    view(reports=qhas_report_path)
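
After running all steps, the artifacts directory should hold the four report files written in this tutorial. A quick way to confirm this is to list the JSON files and their sizes. `list_reports` is a hypothetical helper, and the demo uses a temporary directory with empty placeholder files instead of real reports.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple, Union


def list_reports(artifacts_dir: Union[str, Path]) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each .json file, sorted by name."""
    return sorted((p.name, p.stat().st_size) for p in Path(artifacts_dir).glob("*.json"))


# Demo with placeholder files named like the reports dumped in this tutorial.
report_names = (
    "profiling_report.json",
    "detailed_report.json",
    "op_trace_report.json",
    "qhas_report.json",
)
with tempfile.TemporaryDirectory() as tmp:
    for name in report_names:
        (Path(tmp) / name).write_text("{}", encoding="utf-8")
    reports = list_reports(tmp)

for name, size in reports:
    print(f"{name}: {size} bytes")
```

In the tutorial flow you would call `list_reports(inceptionv3_artifacts)` instead.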

Last Published: Jul 08, 2026
