# ResNet50 Native Inference on arm-linux

This tutorial shows how to prepare and execute a ResNet50 model entirely on an
[arm-linux device](https://docs.qualcomm.com/doc/80-87189-2/topic/tutorials.html#arm-linux-devices) using the QAIRT HTP backend. Because development is now
supported directly on arm-linux devices, the same machine that runs inference can also prepare the
model: you convert, quantize, compile, and execute it natively, without a remote host connection.
This makes the workflow well suited to edge deployment scenarios.

The tutorial provides a step-by-step breakdown of the process. You can copy each snippet into a
single script to run the tutorial end to end.

Note

This tutorial assumes you are running the Python script directly on an arm-linux device.
For remote inference from a host machine, see the [Mobilenet V2 Remote Inference on arm-linux](https://docs.qualcomm.com/doc/80-87189-2/topic/remote_inference.html#remote-inference-oelinux) tutorial.

The parameters for this tutorial are as follows:

- Framework: ONNX
- Model: [ResNet50](https://aihub.qualcomm.com/iot/models/resnet50)
- Configurations:

    - Host OS: arm-linux (Ubuntu 24.04 or QLI)
    - Target device: Local arm-linux device (Linux-aarch64 platforms, for example [QCS8275](https://www.qualcomm.com/internet-of-things/products/iq8-series))
    - Processor: Qualcomm NPU
    - Backend: HTP

Note

On some devices (for example, QCS6490), HTP backend execution requires a quantized model.
This tutorial includes the quantization step for reference.

Tip

This tutorial creates some temporary files as part of the workflow. To customize the temporary file
location, set the environment variable `QAIRT_TMP_DIR` to a directory of your choosing.
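If you prefer to set the variable from inside the tutorial script rather than in the shell, a minimal sketch follows. It assumes QAIRT reads `QAIRT_TMP_DIR` at runtime when it creates its temporary files, so the variable is set before `qairt` is imported; the directory name `qairt_tmp` is only an example.

```python
import os
from pathlib import Path

# Example location (hypothetical); any writable directory works.
tmp_dir = Path("qairt_tmp").resolve()
tmp_dir.mkdir(parents=True, exist_ok=True)

# Set before importing qairt, assuming QAIRT reads the variable at runtime.
os.environ["QAIRT_TMP_DIR"] = str(tmp_dir)
print("QAIRT_TMP_DIR =", os.environ["QAIRT_TMP_DIR"])
```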

## Prerequisites

Before starting this tutorial, ensure you have:

1. An arm-linux device (Ubuntu 24.04 or QLI) with `qairt-dev` and the required QAIRT libraries
   available:

    - **QLI (Qualcomm Linux)**: The flashed device image already includes the QAIRT Python APIs
      and required libraries.
    - **Ubuntu 24.04**: Install `qairt-dev` directly on the device by following the
      [setup instructions](https://docs.qualcomm.com/bundle/publicresource/topics/80-87189-2/setup.html).
      `qairt-dev` can install the required libraries for you.

   Note: Installing `qairt-dev` on a Linux-aarch64 device requires Python 3.12.

2. A ResNet50 ONNX model. Download it from [Qualcomm AI Hub](https://aihub.qualcomm.com/iot/models/resnet50?chipsets=qualcomm-qcs8275)
   using the **Download Model** option on the model page.
3. An input list file for calibration and inference. Each line points to a raw input file in the
   model’s native input format.
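If you do not yet have calibration data, the sketch below writes random float32 tensors as headerless raw files and an input list with one path per line. It assumes the model's native input is a single float32 tensor of shape 1x3x224x224 (the shape used in Step 6) stored in the device's native byte order; the function name `write_random_input_list` and the file names are hypothetical. Random data only exercises the pipeline. For meaningful calibration, replace it with preprocessed real images, as Step 3 recommends.

```python
import random
from array import array
from pathlib import Path

# Assumed ResNet50 input: one float32 tensor in NCHW layout.
INPUT_SHAPE = (1, 3, 224, 224)


def write_random_input_list(out_dir, num_samples=10, seed=0):
    """Write num_samples raw float32 files plus an input list that references them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 1
    for dim in INPUT_SHAPE:
        count *= dim
    rng = random.Random(seed)
    paths = []
    for i in range(num_samples):
        # array("f") stores C floats (4-byte float32 on aarch64) in native byte order.
        data = array("f", (rng.random() for _ in range(count)))
        raw_path = out_dir / f"input_{i}.raw"
        with open(raw_path, "wb") as f:
            data.tofile(f)
        paths.append(str(raw_path.resolve()))
    # One absolute path per line.
    list_path = out_dir / "input_list.txt"
    list_path.write_text("\n".join(paths) + "\n")
    return list_path
```

For example, `write_random_input_list("calib", num_samples=10)` returns a path you could use as `input_list_path` in Step 2.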

## Step 1. Setup

Import the necessary libraries. This tutorial uses the QAIRT Python API to prepare and execute
models natively.

```python
from pathlib import Path

import numpy as np

import qairt
from qairt.api.converter.converter_config import CalibrationConfig
```
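Because the prerequisites call for a Linux-aarch64 device and, for installing `qairt-dev`, Python 3.12, you might add a quick sanity check at the top of the script. The helper `check_native_environment` below is hypothetical and uses only the standard library. It warns rather than exits, since the Python 3.12 requirement is stated for installing `qairt-dev` and your environment (for example, a QLI image) may differ.

```python
import platform
import sys


def check_native_environment():
    """Return a list of warnings about the current machine and interpreter."""
    warnings = []
    machine = platform.machine().lower()
    if machine not in ("aarch64", "arm64"):
        warnings.append(f"expected an aarch64 machine, found {machine!r}")
    if sys.version_info[:2] != (3, 12):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        warnings.append(f"installing qairt-dev on Linux-aarch64 requires Python 3.12, found {found}")
    return warnings


for warning in check_native_environment():
    print("Warning:", warning)
```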

## Step 2. Prepare model and inputs

Set up paths to your ResNet50 ONNX model and input list, and create an output directory for
the artifacts produced during preparation.

```python
# Set paths to your model and input list
model_path = "path/to/your/resnet50.onnx"
input_list_path = "path/to/your/input_list.txt"

# Create the output directory for preparation artifacts
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)
```
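A bad entry in the input list usually surfaces only partway through calibration, so it can help to check the list up front. The sketch below is a hypothetical helper, `validate_input_list`, that assumes plain-path entries (one per line) and resolves relative paths against the current working directory, which may differ from how QAIRT resolves them. If your list uses another format, adapt the parsing.

```python
from pathlib import Path


def validate_input_list(list_path, expected_bytes=None):
    """Return the raw file paths listed in list_path, raising if any entry is unusable."""
    list_path = Path(list_path)
    entries = [line.strip() for line in list_path.read_text().splitlines() if line.strip()]
    if not entries:
        raise ValueError(f"{list_path} contains no input files")
    for entry in entries:
        raw = Path(entry)
        if not raw.is_file():
            raise FileNotFoundError(f"listed input not found: {entry}")
        size = raw.stat().st_size
        # Optionally check the byte size, e.g. 1 * 3 * 224 * 224 * 4 for one float32 tensor.
        if expected_bytes is not None and size != expected_bytes:
            raise ValueError(f"{entry} is {size} bytes, expected {expected_bytes}")
    return entries
```

For example, `validate_input_list(input_list_path, expected_bytes=1 * 3 * 224 * 224 * 4)` checks every file against the assumed float32 input size.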

## Step 3. Convert and quantize the model

Convert and quantize the model into a DLC directly on the device. The `convert` API accepts
ONNX, TensorFlow, and TFLite source models. Calibration data is provided via the input list
file; set `use_native_input_files=True` when the listed files are already in the model’s
native input format. For best quantization quality, use real representative inputs for
calibration.

```python
# Convert and quantize using the calibration input list
calib_config = CalibrationConfig(dataset=input_list_path, use_native_input_files=True)
quantized_model: qairt.Model = qairt.convert(model_path, calibration_config=calib_config)

# Save the quantized DLC for later use
quantized_model_path = output_dir / "resnet50_quantized.dlc"
quantized_model.save(str(quantized_model_path))
print(f"Quantized model saved to: {quantized_model_path}")
```

The `convert` API with a calibration config produces a quantized [`qairt.Model`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-core-api.html#qairt.Model) object
that can be compiled, executed, and saved to disk. Using multiple diverse samples for
calibration helps improve the quantization accuracy.
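To see why diverse calibration samples matter, consider a generic asymmetric uint8 min/max scheme, sketched below. This is an illustration only, not necessarily the algorithm or encoding convention QAIRT uses internally; the function names are hypothetical. The calibrated range fixes the scale and zero point, so values seen at inference time but missing from calibration get clipped.

```python
def calibrate_uint8(samples):
    """Derive an asymmetric uint8 scale and zero point from calibration samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # Keep 0.0 inside the range so it is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for all-zero data
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    # Values outside the calibrated range are clipped to [0, 255].
    return max(0, min(255, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


narrow = calibrate_uint8([[0.0, 0.5, 1.0]])
diverse = calibrate_uint8([[0.0, 0.5, 1.0], [0.2, 1.4, 2.0]])
for name, (scale, zp) in (("narrow", narrow), ("diverse", diverse)):
    restored = dequantize(quantize(2.0, scale, zp), scale, zp)
    print(f"{name}: 2.0 -> {restored:.3f}")
```

With the narrow calibration, 2.0 is clipped to roughly 1.0; adding the second sample extends the range so the value survives the round trip.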

## Step 4. Compile the model

Compile the quantized model for the HTP backend. When compiling natively on the device, the
default configuration targets the local device’s SoC, so no explicit SoC details are required.

```python
# Compile for HTP; the default configuration targets the local device's SoC
compiled_model: qairt.CompiledModel = qairt.compile(quantized_model, backend="HTP")
print("Compiled model:", compiled_model)

# (Optional) Save the compiled context binary for reuse
compiled_model_path = output_dir / "resnet50_htp.bin"
compiled_model.save(str(compiled_model_path))
print(f"Compiled model saved to: {compiled_model_path}")
```

Tip

To compile for a specific SoC explicitly, build a `qairt.CompileConfig` with
`soc_details` and per-graph HTP options. See the [Mobilenet V2 Remote Inference on arm-linux](https://docs.qualcomm.com/doc/80-87189-2/topic/remote_inference.html#remote-inference-oelinux) tutorial
for an example that retrieves SoC details and constructs an `HtpGraphConfig`.

## Step 5. Load an existing DLC (optional)

If you already have a quantized DLC (prepared earlier or transferred from a host), load it
directly with `qairt.load()` instead of repeating Steps 3-4.

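```python
# quantized_model_path is defined in Step 3; if you skipped that step,
# set it to the path of your own quantized DLC instead.
# The name compiled_model is kept so Steps 6-8 run unchanged.
compiled_model = qairt.load(quantized_model_path)

# See model information
print("Model DLC information:")
print(compiled_model.module.info)
```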

The model information will display details about the model’s inputs, outputs, and configuration.

## Step 6. Single input execution

Execute the model with a single input. You can generate random input data or use your own
preprocessed data.

```python
def generate_input_data() -> np.ndarray:
    # Generate random data matching the ResNet50 input shape (1, 3, 224, 224), float32.
    return np.random.rand(1, 3, 224, 224).astype(np.float32)

# Execute the model with a single input
print("Executing model with single input:")
exec_result = compiled_model(inputs=generate_input_data(), backend="HTP")
print(exec_result)
```
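To turn raw scores into predictions, you can apply a softmax and keep the top classes. The sketch below assumes you have already extracted ResNet50's output as a flat sequence of 1000 ImageNet class scores (logits); how to pull that tensor out of `exec_result` depends on the result object's structure, which this tutorial does not show. The helper name `top_k_predictions` is hypothetical, and mapping indices to class names requires an ImageNet label file you supply.

```python
import math


def top_k_predictions(logits, k=5):
    """Return the k (class_index, probability) pairs with the highest softmax probability."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy scores for illustration; real ResNet50 output has 1000 entries.
print(top_k_predictions([0.1, 2.5, -1.0, 1.7], k=2))
```

With a NumPy output array, flattening it first (for example, `scores.ravel().tolist()`) gives the flat sequence this helper expects.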

## Step 7. Batch execution with input list

To process multiple inputs, pass the input list file instead of a single array. Running the whole
list in one call is typically more efficient than issuing a separate call for each input.

```python
# Execute using the input list
print("Execution result for running the model using an input list:")
exec_result_batch = compiled_model(inputs=input_list_path, backend="HTP")
print(exec_result_batch)
```
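To check the efficiency difference on your own device, you can time both approaches. The helper `time_calls` below is hypothetical and uses only the standard library. It measures the wall-clock time of whatever callable you pass, including any per-call setup that callable performs.

```python
import statistics
import time


def time_calls(run, repeats=10, warmup=1):
    """Call run() warmup + repeats times; return per-call latencies in ms (warmup excluded)."""
    for _ in range(warmup):
        run()
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Stand-in workload so the sketch runs anywhere; swap in a model call on the device.
samples = time_calls(lambda: sum(range(100_000)), repeats=5)
print(f"median {statistics.median(samples):.2f} ms over {len(samples)} runs")
```

On the device, you might time `lambda: compiled_model(inputs=generate_input_data(), backend="HTP")` and compare it with one input-list call divided by the number of lines in the list.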

## Step 8. Stream execution (initialize once, run multiple times)

For optimal performance when running multiple inferences, use stream execution. This initializes
the backend once and reuses it for multiple runs, significantly reducing overhead.

Note

Stream execution provides significant performance benefits:

- **Initialize**: Set up the backend once (`initialize()`)
- **Run**: Execute multiple inferences without re-initialization
- **Destroy**: Clean up resources when done (`destroy()`)

```python
print("\nStream execution example:")
print("Initializing backend once and running model multiple times")
# Initialize the backend once
compiled_model.initialize(backend="HTP")
# Run multiple inferences using the initialized backend
for _ in range(3):
    exec_result_stream = compiled_model(inputs=generate_input_data())
    print(exec_result_stream)
# Clean up backend resources
compiled_model.destroy()
```

## Next Steps

This tutorial demonstrated an end-to-end native workflow where model preparation and
execution both run directly on the arm-linux device.
If you want to control inference remotely from a host machine, see the
[Mobilenet V2 Remote Inference on arm-linux](https://docs.qualcomm.com/doc/80-87189-2/topic/remote_inference.html#remote-inference-oelinux) tutorial.

Last Published: Jul 08, 2026
