# MobileNetV2 Remote Inference on arm-linux

This tutorial shows how to execute a MobileNetV2 model on a remote arm-linux device using the QAIRT HTP Backend.
The tutorial provides a step-by-step breakdown of the process. You can copy each snippet into a single script to run the tutorial end to end.

The parameters for this tutorial are as follows:

> - Framework: PyTorch
> - Model: [MobileNetV2](https://pytorch.org/hub/pytorch_vision_mobilenet_v2)
> - Configurations:
>     - Host OS: Linux (x86\_64)
>     - Target devices: arm-linux device (Linux-aarch64 platforms, e.g., [QCS6490](https://www.qualcomm.com/internet-of-things/products/q6-series))
>     - Processor: Qualcomm NPU
>     - Backend: HTP

Note

arm-linux refers to Linux-aarch64 platforms (Ubuntu 24.04 or QLI). The model preparation
steps below (convert, quantize, and compile) are the same as those used for on-device
development in the [ResNet50 Native Inference on arm-linux](https://docs.qualcomm.com/doc/80-87189-2/topic/native_inference.html#native-inference-oelinux) tutorial; here they run on the x86\_64
host and execution is driven remotely on the target device.

Note

This remote *prepare + run* flow is not limited to arm-linux targets. The same steps work
when targeting **linux-x86** (`DevicePlatformType.X86_64_LINUX`), which is useful when an
arm-linux device is not available.

Note

On some devices (for example, QCS6490), HTP backend execution requires a quantized model.
This tutorial includes the quantization step for reference.

Tip

This tutorial creates some temporary files as part of the workflow. To customize where they are
written, set the environment variable `QAIRT_TMP_DIR` to a location of your choosing.
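
As a minimal sketch, the variable can be set from Python at the top of the script. The directory name `qairt_tutorial_tmp` is a hypothetical choice, and setting the variable before importing `qairt` is an assumption made here to be safe, not documented behavior.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical location; any writable directory works.
tmp_root = Path(tempfile.gettempdir()) / "qairt_tutorial_tmp"
tmp_root.mkdir(parents=True, exist_ok=True)

# Assumption: set the variable before `import qairt` so it is visible
# whenever the SDK decides where to place temporary files.
os.environ["QAIRT_TMP_DIR"] = str(tmp_root)

print(f"QAIRT temporary files will be written under: {os.environ['QAIRT_TMP_DIR']}")
```

Exporting the variable in the shell before launching the script has the same effect.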

## Step 1. Setup

    import json
    import os
    import platform
    from pathlib import Path
    
    import numpy as np
    import requests
    import torch
    import torchvision.transforms as transforms
    from PIL import Image
    
    import qairt
    from qairt import CompileConfig, Device, DevicePlatformType, ExecutionResult
    from qairt.api.converter.converter_config import CalibrationConfig
    from qti.aisw.tools.core.utilities.devices.api.device_factory import DeviceFactory

## Step 2. Get a MobileNetV2 model

Download the MobileNetV2 model from PyTorch Hub.

    pytorch_model = torch.hub.load("pytorch/vision:v0.10.0", "mobilenet_v2", pretrained=True)
    pytorch_model.eval()
    
    # Create a directory for artifacts
    artifacts_dir = Path("./mobilenetv2_artifacts").resolve()
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    onnx_model_path = str(artifacts_dir / "mobilenet_v2.onnx")
    
    # Export the PyTorch model as an ONNX model
    dummy_input = torch.rand((1, 3, 224, 224), dtype=torch.float32)
    
    torch.onnx.export(
        pytorch_model,
        (dummy_input,),
        onnx_model_path,
        input_names=["input"],
        output_names=["output"],
        opset_version=11,
    )

## Step 3. Prepare input preprocessing function

Define a preprocessing function that will be used for both calibration and inference.

    # ImageNet-specific preprocessing parameters
    IMAGENET_INPUT_SIZE = 224
    IMAGENET_MEAN = [0.485, 0.456, 0.406]
    IMAGENET_STD = [0.229, 0.224, 0.225]
    
    # Load an image from disk and convert it to a normalized NCHW float32 array.
    def preprocess_input(image: str) -> np.ndarray:
        # Force three channels so grayscale or RGBA files do not break Normalize
        image_obj = Image.open(image).convert("RGB")
    
        preprocess = transforms.Compose(
            [
                transforms.Resize(IMAGENET_INPUT_SIZE),
                transforms.CenterCrop(IMAGENET_INPUT_SIZE),
                transforms.ToTensor(),
                transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
            ]
        )
        tensor = preprocess(image_obj).unsqueeze(0)
        return tensor.numpy()
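
To make the arithmetic behind `ToTensor` and `Normalize` concrete, here is a NumPy-only sketch of the same scaling, per-channel normalization, and layout change. The helper name `normalize_hwc_uint8` and the synthetic white image are illustrative assumptions. The sketch skips the resize and crop steps, so it is not a drop-in replacement for `preprocess_input`.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_hwc_uint8(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB array to a normalized 1x3xHxW float32 array."""
    scaled = image_hwc.astype(np.float32) / 255.0           # scale to [0, 1], as ToTensor does
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD    # per-channel, broadcast over the last axis
    chw = np.transpose(normalized, (2, 0, 1))               # HWC -> CHW
    return chw[np.newaxis, ...]                             # add the batch dimension


# Synthetic all-white image, used only to illustrate shapes and value ranges
white = np.full((224, 224, 3), 255, dtype=np.uint8)
batch = normalize_hwc_uint8(white)
print(batch.shape, batch.dtype)
```

The resulting shape matches the `(1, 3, 224, 224)` dummy input used for the ONNX export.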

## Step 4. Convert and quantize the model

Once the model is exported, you can proceed to convert and quantize it using QAIRT. For best quantization quality,
**use real representative images for calibration**.

    image_location = os.path.join(os.environ["QAIRT_SDK_ROOT"], "examples", "QAIRT", "python", "images")
    IMAGE_DATASET = {
        "african elephant": os.path.join(image_location, "african_elephant.jpg"),
        "samoyed": os.path.join(image_location, "samoyed.jpg"),
        "sea lion": os.path.join(image_location, "sea_lion.jpg"),
    }
    
    # Split dataset: african elephant for calibration, others for prediction
    CALIBRATION_DATASET = {
        "african elephant": IMAGE_DATASET["african elephant"],
    }
    
    PREDICTION_DATASET = {
        "samoyed": IMAGE_DATASET["samoyed"],
        "sea lion": IMAGE_DATASET["sea lion"]
    }
    
    calibration_data_list = []
    
    for label, image_path in CALIBRATION_DATASET.items():
        preprocessed_data = preprocess_input(image_path)
        calibration_data_list.append(preprocessed_data)
    
    # Stack all calibration samples into a single array
    calibration_data = np.vstack(calibration_data_list).astype(np.float32)
    print(f"Created calibration dataset with {len(calibration_data_list)} images")
    
    # Use the calibration data for quantization
    calib_config = CalibrationConfig(dataset=calibration_data)
    quantized_model: qairt.Model = qairt.convert(onnx_model_path, calibration_config=calib_config)
    
    # (Optional) Save the quantized model for later use or native execution
    quantized_model_path = artifacts_dir / "mobilenet_v2_quantized.dlc"
    quantized_model.save(str(quantized_model_path))
    print(f"Quantized model saved to: {quantized_model_path}")

Calling `convert` with a calibration config produces a quantized [`qairt.Model`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-core-api.html#qairt.Model) object that can be executed and saved
to disk. This tutorial calibrates with a single image to keep the example short; using multiple diverse samples generally improves quantization accuracy.
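
One way to gather a larger calibration set is to collect image paths from a directory and feed each one through `preprocess_input` before stacking. The sketch below covers only the path collection. The helper name `collect_calibration_images`, the extension list, and the default limit are assumptions for illustration, not part of the QAIRT API.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_calibration_images(image_dir: str, limit: int = 100) -> List[str]:
    """Return up to `limit` image paths from image_dir, in sorted order."""
    extensions = {".jpg", ".jpeg", ".png"}
    paths = sorted(
        str(p)
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    if not paths:
        raise FileNotFoundError(f"No calibration images found in {image_dir}")
    return paths[:limit]


# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("b.jpg", "a.PNG", "notes.txt"):
        (Path(demo_dir) / name).touch()
    found = [Path(p).name for p in collect_calibration_images(demo_dir)]

print(found)
```

Each returned path can then be preprocessed and appended to `calibration_data_list` exactly as the single calibration image is in this step.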

## Step 5. Create an arm-linux device

Before you can compile and run the model on the device, you need to create a `qairt.api.device.Device` object.

The object encapsulates the connection to the arm-linux device via SSH. You need to provide the serial number
and hostname of your IoT device.

    # Set your device serial and hostname
    iot_serial = os.getenv("IOT_SERIAL", "your_device_serial")
    iot_hostname = os.getenv("IOT_HOSTNAME", "your_device_hostname")
    
    armlinux_device = Device(
        type=DevicePlatformType.LINUX_EMBEDDED,
        identifier=f"{iot_serial}@{iot_hostname}"
    )
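
Because the snippet above falls back to placeholder strings when the environment variables are unset, a connection attempt can fail later with a less obvious error. The following sketch fails early instead. `build_device_identifier` is a hypothetical helper; only the `IOT_SERIAL` and `IOT_HOSTNAME` names and the `serial@hostname` format come from this tutorial.

```python
import os
from typing import Mapping

# Default strings used by the tutorial snippet when a variable is unset
PLACEHOLDERS = {"your_device_serial", "your_device_hostname"}


def build_device_identifier(env: Mapping[str, str] = os.environ) -> str:
    """Build the 'serial@hostname' identifier, rejecting missing or placeholder values."""
    serial = env.get("IOT_SERIAL", "").strip()
    hostname = env.get("IOT_HOSTNAME", "").strip()
    for name, value in (("IOT_SERIAL", serial), ("IOT_HOSTNAME", hostname)):
        if not value or value in PLACEHOLDERS:
            raise ValueError(f"Set the {name} environment variable to a real value")
    return f"{serial}@{hostname}"


# Example values are made up for the demonstration
identifier = build_device_identifier({"IOT_SERIAL": "abc123", "IOT_HOSTNAME": "192.168.1.50"})
print(identifier)  # abc123@192.168.1.50
```

The returned string can be passed as the `identifier` argument when constructing the `Device`.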

## Step 6. Retrieve device information

Retrieve the chipset information from the device and get the SoC details for compilation.

    # Retrieve the chipset
    chipset = armlinux_device.get_chipset()
    print(f"Target chipset: {chipset}")
    
    # Get the SoC details
    soc_details = DeviceFactory.get_device_soc_details("HTP", chipset)
    
    # Inspect the SoC details
    print(f"Device SoC details: {soc_details.model_dump_json(indent=4)}\n")

The SoC details provide important information about the device capabilities. For example, on a QCS6490 device, you should see:

    Device SoC details: {
        "chipset": "QCS6490",
        "model": "93",
        "dsp_arch": 68,
        "vtcm_size_in_mb": 2,
        "num_of_hvx_threads": 4,
        "supports_fp16": false
    }

The SoC details include:

- **chipset**: Specific chipset model (e.g., QCS6490)
- **model**: SoC model number
- **dsp\_arch**: DSP architecture version
- **vtcm\_size\_in\_mb**: Vector Tightly Coupled Memory (VTCM) size in megabytes
- **num\_of\_hvx\_threads**: Number of Hexagon Vector eXtensions (HVX) threads available
- **supports\_fp16**: Whether the device supports FP16 operations
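
As a rough illustration of how these fields might be used, the sketch below parses the JSON shown above and prints a one-line summary. Treating `supports_fp16: false` as "quantized models only" is an assumption based on the QCS6490 note earlier in this tutorial, and `summarize_soc` is a hypothetical helper.

```python
import json

# JSON copied from the example output above
soc_json = """
{
    "chipset": "QCS6490",
    "model": "93",
    "dsp_arch": 68,
    "vtcm_size_in_mb": 2,
    "num_of_hvx_threads": 4,
    "supports_fp16": false
}
"""
soc = json.loads(soc_json)


def summarize_soc(soc: dict) -> str:
    """Return a one-line, human-readable summary of the SoC capabilities."""
    # Assumption: without FP16 support, only quantized models can run on HTP
    precision = "FP16 or quantized" if soc["supports_fp16"] else "quantized only"
    return (
        f"{soc['chipset']}: DSP arch v{soc['dsp_arch']}, "
        f"{soc['vtcm_size_in_mb']} MB VTCM, "
        f"{soc['num_of_hvx_threads']} HVX threads, {precision}"
    )


print(summarize_soc(soc))
```

In the tutorial script, the same dictionary could be obtained with `json.loads(soc_details.model_dump_json())`.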

## Step 7. Compile the model

Compile the model for the target device using the retrieved SoC details.

    from qairt.api.common.backends.htp import HtpGraphConfig
    
    # Create HTP graph configuration
    htp_graph_config = HtpGraphConfig(
        name=quantized_model.module.info.graphs[0].name,
        fp16_relaxed_precision=soc_details.supports_fp16,
        vtcm_size_in_mb=soc_details.vtcm_size_in_mb,
        hvx_threads=soc_details.num_of_hvx_threads,
    )
    
    # Create the compile config
    compile_config = CompileConfig(
        backend="HTP",
        soc_details=soc_details,
        graph_custom_configs=[htp_graph_config]
    )
    
    # Compile the model
    compiled_model: qairt.CompiledModel = qairt.compile(quantized_model, config=compile_config)
    
    # Print the information from the model
    print(json.dumps(compiled_model.module.info.as_dict(), indent=4))
    
    # (Optional) Save the compiled model
    compiled_model_path = artifacts_dir / "mobilenet_v2_compiled.bin"
    compiled_model.save(str(compiled_model_path))
    print(f"Compiled model saved to: {compiled_model_path}")

The compilation process produces a [`qairt.api.compiled_model.CompiledModel`](https://docs.qualcomm.com/doc/80-87189-2/topic/qairt-core-api.html#qairt.CompiledModel) object. Printing its info should
produce output similar to the following:

    {
        "name": "mobilenet_v2",
        "graphs": [
            {
                "name": "mobilenet_v2",
                "inputs": [
                    {
                        "name": "input",
                        "dimensions": [
                            1,
                            3,
                            224,
                            224
                        ],
                        "data_type": "QNN_DATATYPE_UFIXED_POINT_8"
                    }
                ],
                "outputs": [
                    {
                        "name": "output",
                        "dimensions": [
                            1,
                            1000
                        ],
                        "data_type": "QNN_DATATYPE_UFIXED_POINT_8"
                    }
                ]
            }
        ],
        "soc_name": "93",
        "backend": "HTP",
        "backend_info": {
            "arch": 68,
            "vtcm_size": 2,
            "optimization_level": 0
        }
    }
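
The info dictionary can also serve as a sanity check before execution. The sketch below uses a trimmed copy of the output above to look up the expected input shapes and compare them with the shape of a preprocessed array. `expected_input_shapes` and `check_input_shape` are hypothetical helpers, not QAIRT functions.

```python
# Trimmed copy of the compiled model info shown above
info = {
    "name": "mobilenet_v2",
    "graphs": [
        {
            "name": "mobilenet_v2",
            "inputs": [{"name": "input", "dimensions": [1, 3, 224, 224]}],
            "outputs": [{"name": "output", "dimensions": [1, 1000]}],
        }
    ],
}


def expected_input_shapes(info: dict, graph_index: int = 0) -> dict:
    """Map each input tensor name of the selected graph to its shape tuple."""
    graph = info["graphs"][graph_index]
    return {tensor["name"]: tuple(tensor["dimensions"]) for tensor in graph["inputs"]}


def check_input_shape(info: dict, name: str, shape: tuple) -> None:
    """Raise ValueError if `shape` does not match the compiled input `name`."""
    expected = expected_input_shapes(info)[name]
    if tuple(shape) != expected:
        raise ValueError(f"Input '{name}' has shape {tuple(shape)}, expected {expected}")


shapes = expected_input_shapes(info)
check_input_shape(info, "input", (1, 3, 224, 224))  # passes silently
print(shapes)  # {'input': (1, 3, 224, 224)}
```

In the tutorial script, `info` would be `compiled_model.module.info.as_dict()` and the shape would come from the array returned by `preprocess_input`.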

## Step 8. Execute on device

Execute the model remotely from your host machine. The model runs on the device, but the execution is controlled from the host.

    outputs = []
    
    # Initialize the model for REMOTE execution (with device parameter)
    compiled_model.initialize(device=armlinux_device)
    
    for label, image_path in PREDICTION_DATASET.items():
        image: np.ndarray = preprocess_input(image_path)
    
        result: ExecutionResult = compiled_model(image)
    
        _, output_tensors = compiled_model.output_tensors[0]
    
        outputs.append((result[output_tensors[0].name], label))
    
    # Clean up resources
    compiled_model.destroy()
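
If an inference call raises, the `destroy()` call in the snippet above is skipped and the remote resources may stay allocated. One possible pattern is to wrap initialization and cleanup in a context manager. The sketch below assumes only that the model exposes `initialize(device=...)` and `destroy()`, as used in this step. `initialized` and `FakeModel` are illustrative names, and `FakeModel` is a stand-in rather than a QAIRT class.

```python
from contextlib import contextmanager


@contextmanager
def initialized(model, device=None):
    """Initialize `model`, yield it, and always call destroy() afterwards."""
    model.initialize(device=device)
    try:
        yield model
    finally:
        model.destroy()


class FakeModel:
    """Stand-in used only to demonstrate the pattern; not a QAIRT class."""

    def __init__(self):
        self.events = []

    def initialize(self, device=None):
        self.events.append("initialize")

    def destroy(self):
        self.events.append("destroy")


fake = FakeModel()
try:
    with initialized(fake, device="demo-device"):
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass

print(fake.events)  # ['initialize', 'destroy']
```

With the real objects, the loop over `PREDICTION_DATASET` would sit inside `with initialized(compiled_model, device=armlinux_device):`.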

## Step 9. Post-processing

For post-processing, use the ImageNet class labels published by [Qualcomm AI Hub](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/apidoc/imagenet_classes.txt).

The following function computes softmax probabilities from the model output, prints the top-5 predictions with their class
labels, and compares the top prediction with the expected label.

    def postprocess_and_evaluate(output_ndarray: np.ndarray, expected_label: str) -> None:
        """Postprocess classification results and evaluate prediction"""
        # Softmax over the class axis; keepdims keeps the division correct for any batch size
        exp_output = np.exp(output_ndarray)
        on_device_probabilities = exp_output / np.sum(exp_output, axis=1, keepdims=True)
    
        # Read the ImageNet class labels
        sample_classes = "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/apidoc/imagenet_classes.txt"
        response = requests.get(sample_classes, stream=True, timeout=5)
        response.raw.decode_content = True
        categories = [s.strip().decode("utf-8") for s in response.raw]
    
        # Print the top five predictions
        print("Top-5 predictions:")
        top5_classes = np.argsort(on_device_probabilities[0], axis=0)[-5:]
        prediction = categories[top5_classes[-1]]
        for c in reversed(top5_classes):
            print(f"{c} {categories[c]:20s} {on_device_probabilities[0][c]:>6.1%}")
        print()
    
        # Evaluate the prediction
        prediction_lower = prediction.lower()
        expected_lower = expected_label.lower()
        if prediction_lower == expected_lower:
            print(f"Successful prediction: {prediction_lower}\n")
        else:
            print(f"Failed prediction: {prediction_lower}. Expected {expected_lower}\n")
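
For models whose logits can be large, exponentiating them directly may overflow. A common variant subtracts the per-row maximum first, which leaves the probabilities unchanged. The sketch below shows that variant along with a small top-k helper. `stable_softmax`, `top_k`, and the sample logits are illustrative and not taken from the tutorial.

```python
import numpy as np


def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax for a (batch, classes) array, shifted by the row maximum."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)


def top_k(probabilities: np.ndarray, k: int = 5) -> list:
    """Return (class_index, probability) pairs for one row, most likely first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(int(i), float(probabilities[i])) for i in order]


# Made-up logits; the first row would overflow np.exp without the shift
logits = np.array([[1000.0, 999.0, 0.0], [0.0, 1.0, 2.0]])
probs = stable_softmax(logits)

for row in probs:
    print(top_k(row, k=2))
```

Each row of `probs` sums to 1, so the values can be formatted as percentages in the same way as in `postprocess_and_evaluate`.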

The code below prints predictions for each image in the dataset.

    for arr, label in outputs:
        # Postprocess and evaluate the results
        postprocess_and_evaluate(arr, label)

### Expected Output

When running inference, you should see output similar to:

    Top-5 predictions:
    258 Samoyed               47.4%
    259 Pomeranian            18.7%
    332 Angora                 7.4%
    279 Arctic fox             6.6%
    261 keeshond               4.0%
    
    Successful prediction: samoyed
    
    Top-5 predictions:
    150 sea lion              98.3%
    147 grey whale             0.5%
    360 otter                  0.4%
    146 albatross              0.2%
    148 killer whale           0.1%
    
    Successful prediction: sea lion

The model correctly predicts the label for each image provided, which indicates that it is
executing correctly on the target device.

## Next Steps

Note

This tutorial demonstrated **remote execution** where the model runs on the device but is controlled from the host machine.
If you want to run inference **natively on the device** (without a host connection), see the [ResNet50 Native Inference on arm-linux](https://docs.qualcomm.com/doc/80-87189-2/topic/native_inference.html#native-inference-oelinux) tutorial.

Last Published: Jul 08, 2026
