# Tuning Models

This tutorial shows, step by step, how to use the Tuner API to improve inference performance by tuning a model
for lower latency on a Snapdragon device.

Note

If you want a shorter version, you can find it in the QAIRT SDK at:

- `examples/QAIRT/python/tuning_tutorial.py`

Tutorial setup:

- Framework: PyTorch
- Host OS: Linux (x86\_64)
- Model: [ResNet18](https://docs.pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html)
- Target Device: Snapdragon Android Device SM8750
- Processor: Qualcomm NPU
- Backend: HTP

Tip

This tutorial creates temporary files. To change where these files are saved, set the *QAIRT\_TMP\_DIR* environment variable.
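
As a minimal sketch, the variable can also be set from Python at the top of the script. The directory name `qairt_tmp` is a hypothetical choice, and the sketch assumes the variable is read when QAIRT creates its temporary files, so it is set before `qairt` is imported.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical location; any writable directory works.
tmp_dir = Path(tempfile.gettempdir()) / "qairt_tmp"
tmp_dir.mkdir(parents=True, exist_ok=True)

# Assumption: QAIRT reads this variable when it creates temporary files,
# so set it before importing or calling qairt.
os.environ["QAIRT_TMP_DIR"] = str(tmp_dir)
```

Exporting the variable in the shell before launching the script has the same effect for that process.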

## Step 1: Setup

First, import the libraries and create a directory for output files.

    import json
    import os
    from pathlib import Path
    
    import numpy as np
    import torch
    from torchvision import models
    
    import qairt
    from qairt import CompileConfig, Device, DevicePlatformType
    
    ENABLE_QAIRT_VISUALIZER = False
    
    tuned_artifacts_dir = Path("./tuned_artifacts")
    tuned_artifacts_dir.mkdir(exist_ok=True)

## Step 2: Get a ResNet model

Load a pre-trained ResNet18 model and export it to ONNX format.

    resnet_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    resnet_model.eval()
    dummy_input = torch.randn(1, 3, 32, 32)
    resnet_model_path = "resnet18.onnx"
    torch.onnx.export(
        resnet_model,
        dummy_input,
        resnet_model_path,
        input_names=["input"],
    )

## Step 3: Connect to an Android device

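Identify your Android device through environment variables: `ANDROID_SERIAL` (required) and `ANDROID_HOSTNAME` (optional, for a device attached to a remote ADB host).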

    android_serial = os.getenv("ANDROID_SERIAL")
    android_hostname = os.getenv("ANDROID_HOSTNAME")
    
    android_device = None
    
    if android_serial:
        print(f"INFO: ANDROID_SERIAL : {android_serial} was set. Enabling tutorial for Android")
        device_id = f"{android_serial}@{android_hostname}" if android_hostname else android_serial
        android_device = Device(identifier=device_id, type=DevicePlatformType.ANDROID)
    
    if not android_device:
        print("INFO: ANDROID_SERIAL was not set. Exiting")
        raise SystemExit(1)
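
The identifier format used above can be isolated in a small helper, which makes it easy to check without a device attached. This is only a sketch: `build_device_id` is a hypothetical name, and the serial and hostname values are placeholders.

```python
from typing import Optional


def build_device_id(serial: str, hostname: Optional[str] = None) -> str:
    """Return "<serial>@<hostname>" when a hostname is given, else the serial."""
    return f"{serial}@{hostname}" if hostname else serial


# Placeholder values for illustration only.
local_id = build_device_id("abc123")
remote_id = build_device_id("abc123", "lab-host")

print(local_id)   # abc123
print(remote_id)  # abc123@lab-host
```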

## Step 4: Create the first QHAS report

Convert and compile the model, then run it and generate a QHAS report.

    resnet_model_converted = qairt.convert(
        resnet_model_path, input_tensor_config=[dict(name="input", shape=(1, 3, 32, 32))]
    )
    resnet_model_converted.save(tuned_artifacts_dir / "resnet_model_converted.dlc")
    
    compile_config = CompileConfig(backend="HTP", soc_details=f"chipset:{android_device.get_chipset()}")
    
    with qairt.Profiler(context=dict(level="detailed", option="optrace")) as pf:
        compiled_model = qairt.compile(resnet_model_converted, config=compile_config)
    
        input_array = np.random.randn(1, 3, 32, 32).astype(np.float32)
        _ = compiled_model(inputs=input_array, device=android_device)
    
        op_trace_report = pf.generate_report()
        qhas_report_path = tuned_artifacts_dir / "qhas_report.json"
        qhas_report = op_trace_report.summary.dump(qhas_report_path)

Note

Sometimes, `get_chipset` may not return the correct value. In that case, you can query the chipset with this command on your host:

`adb -H ${ANDROID_HOSTNAME} -s $ANDROID_SERIAL shell getprop ro.soc.model`
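
The same query can be issued from Python with `subprocess`. The sketch below is illustrative rather than part of the SDK: `getprop_command` and `query_chipset` are hypothetical helper names, and it assumes `adb` is on your `PATH`.

```python
import subprocess
from typing import List, Optional


def getprop_command(serial: str, hostname: Optional[str] = None) -> List[str]:
    """Build the adb command that reads the ro.soc.model property."""
    cmd = ["adb"]
    if hostname:
        # -H selects the host where the adb server is running.
        cmd += ["-H", hostname]
    cmd += ["-s", serial, "shell", "getprop", "ro.soc.model"]
    return cmd


def query_chipset(serial: str, hostname: Optional[str] = None) -> str:
    """Run the command and return the reported value (requires adb and a device)."""
    result = subprocess.run(
        getprop_command(serial, hostname),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if adb fails
    )
    return result.stdout.strip()
```

The returned string could then replace `android_device.get_chipset()` in the `soc_details` value of Step 4, assuming the property value is in the form the compiler expects.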

    def print_or_view_qhas(report_path):
        """Open the report in the QAIRT Visualizer, or print its overall summary."""
        if ENABLE_QAIRT_VISUALIZER:
            from qairt_visualizer import view
    
            # Use the function argument, not the global report path
            view(reports=str(report_path))
        else:
            # Close the file as soon as the JSON has been read
            with open(report_path) as f:
                qhas_report = json.load(f)
            data = qhas_report["data"]["htp_overall_summary"]["data"][0]
    
            # Keep every summary entry except "htp_resources"
            qhas_summary = {}
            for key, value in data.items():
                if key != "htp_resources":
                    qhas_summary[key] = value
    
            print("QHAS Summary: \n")
            print(json.dumps(qhas_summary, indent=4))
    
    print_or_view_qhas(qhas_report_path)

The report shows *total\_dram* (DDR bandwidth in bytes) and *time\_us* (latency in microseconds).

The next section introduces the Tuner API, which can help reduce bandwidth and latency.

The Tuner API tries different options to find the fastest model. It compiles and runs the model several times to find the best result.
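
The compile-and-measure search described above can be pictured with a toy loop. This is a conceptual sketch only, not the tuner's actual implementation: `pick_fastest`, the option names, and the simulated latencies are all hypothetical stand-ins for real compile and run steps.

```python
from typing import Callable, Dict, Tuple


def pick_fastest(
    candidates: Dict[str, dict],
    measure: Callable[[dict], float],
) -> Tuple[str, float]:
    """Measure every candidate and return the name and latency of the fastest."""
    if not candidates:
        raise ValueError("no candidates to evaluate")
    best_name, best_latency = "", float("inf")
    for name, options in candidates.items():
        latency = measure(options)  # stands in for compile + run on device
        if latency < best_latency:
            best_name, best_latency = name, latency
    return best_name, best_latency


# Simulated latencies in microseconds (made-up numbers for illustration).
simulated = {"default": 1200.0, "option_a": 1100.0, "option_b": 1150.0}
candidates = {name: {"label": name} for name in simulated}


def fake_measure(options: dict) -> float:
    return simulated[options["label"]]


best = pick_fastest(candidates, fake_measure)
print(best)  # ('option_a', 1100.0)
```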

## Step 5: Tune the model

Use the Tuner API to reduce latency (*time\_us*). The code below shows how to use it.

    from qairt.api.compiler.backends.common import tuner
    
    input_array = np.random.randn(1, 3, 32, 32).astype(np.float32)
    best_compiled_model, report = tuner.optimize(
        resnet_model_converted,
        criteria="latency",
        compile_args=dict(config=compile_config),
        execution_args=dict(inputs=input_array, device=android_device),
    )

Note that `compile_args` and `execution_args` are dictionaries passed to `qairt.compile` and `CompiledModel.__call__`, respectively.
The tuner simply forwards these arguments to the respective APIs.

To illustrate the connection, an example call to each API is shown below:

    # Compile the model with the same arguments passed to the tuner
    compiled_model = qairt.compile(
        resnet_model_converted,
        config=compile_config,
    )
    
    # Run the compiled model with the same arguments passed to the tuner
    outputs = compiled_model(inputs=input_array, device=android_device)

Refer to the `qairt.compile` and `CompiledModel.__call__` documentation for details
on additional arguments that can be passed.

Following the optimize call, you should see messages in your console indicating progress and improvements.

Example output:

    2025-07-18 13:18:15,005 - qairt.tuner - INFO - Improvement in criteria (latency) observed: 10.67%

After tuning, view the new QHAS report:

    assert report.summary is not None
    # This run tuned for latency, so name the report accordingly
    latency_tuned_qhas_report_path = tuned_artifacts_dir / "latency_tuned_model_qhas.json"
    latency_tuned_qhas = report.summary.dump(latency_tuned_qhas_report_path)
    
    print_or_view_qhas(latency_tuned_qhas_report_path)

On a Snapdragon SM8750 device, you should see about 10% lower *total\_time (us)*. Results may vary by device and run. You may need to tune several times to see a clear improvement.
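
To quantify the change yourself, you can compare the latency field of the two reports. The sketch below assumes both JSON files follow the layout read by `print_or_view_qhas` and expose a `time_us` entry. The helper names are hypothetical and the numbers are made up for illustration.

```python
def overall_summary(report: dict) -> dict:
    """Return the first overall-summary entry of a loaded QHAS report."""
    return report["data"]["htp_overall_summary"]["data"][0]


def percent_improvement(baseline: float, tuned: float) -> float:
    """Relative reduction from baseline to tuned, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - tuned) / baseline * 100.0


# Stand-ins for json.load() of the baseline and tuned report files.
baseline_report = {"data": {"htp_overall_summary": {"data": [{"time_us": 1000.0}]}}}
tuned_report = {"data": {"htp_overall_summary": {"data": [{"time_us": 900.0}]}}}

gain = percent_improvement(
    overall_summary(baseline_report)["time_us"],
    overall_summary(tuned_report)["time_us"],
)
print(f"{gain:.2f}%")  # 10.00%
```

With real data, load `qhas_report.json` and `latency_tuned_model_qhas.json` from the `tuned_artifacts` directory with `json.load` in place of the stand-in dictionaries.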

Finally, you can save the tuned model as a context binary:

    best_compiled_model.save(tuned_artifacts_dir / "resnet_model_tuned.bin")

Note

You can also tune for lower bandwidth (*total\_dram*) by setting `criteria="bandwidth"`.

Last Published: Jul 08, 2026
