# Run a LiteRT model on NPU

Use the [LiteRT](https://ai.google.dev/edge/litert) runtime to run existing
quantized LiteRT models on the NPU of Qualcomm® Dragonwing™
devices.

## Prerequisites

Before you run the LiteRT sample applications, complete the following prerequisites: connect to the target device over SSH, download the model and label files, and set up the Python environment. You can then run the reference Python application with output to a file or to a connected display.

1. Set up SSH and connect the target device to the network. For detailed instructions, see:

    - [Sign in using SSH for Qualcomm Linux](https://docs.qualcomm.com/bundle/publicresource/topics/80-70022-254/how_to.html#use-ssh)
    - [Sign in using SSH for Ubuntu Server](https://docs.qualcomm.com/bundle/publicresource/topics/80-90441-1/Use_Ubuntu_on_RB3_Gen2_3.html#sign-in-to-the-rb3-gen-2-console-using-ssh)

    Note: If SSH is already set up and Wi-Fi is connected, skip this step.

2. Sign in to the target device using SSH:

    - For Qualcomm Linux:

            ssh root@<IP ADDRESS OF THE TARGET DEVICE>

    - For Ubuntu Server:

            ssh ubuntu@<IP ADDRESS OF THE TARGET DEVICE>

3. On the target device, obtain the `download_artifacts.sh` script, set executable permissions, and run it with the required arguments to download the model and label files to the device.

        curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/download_artifacts.sh

        chmod +x download_artifacts.sh

        ./download_artifacts.sh

4. Set up the Python environment on the target device by installing the LiteRT runtime and other dependencies.

    - For Qualcomm Linux, install the LiteRT runtime, Pillow, and OpenCV packages:

            pip3 install ai-edge-litert==1.3.0 Pillow opencv-python

    - For Ubuntu Server:

        1. Install Python pip and the virtual environment module.

                sudo apt install python3-pip python3-venv

        2. Create a new virtual environment, and install the LiteRT runtime, Pillow, and OpenCV packages.

                python3 -m venv .venv-litert-demo --system-site-packages
                source .venv-litert-demo/bin/activate
                pip3 install ai-edge-litert==1.3.0 Pillow
                pip3 install opencv-python

        3. Install the necessary Python 3 and GTK packages.

                sudo apt install -y python3-gi python3-gi-cairo gir1.2-gtk-3.0 python3-full pkg-config cmake libcairo2-dev libgirepository1.0-dev gir1.2-glib-2.0 build-essential python3-dev python3-pip pkg-config meson
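
Before moving on, it can help to confirm that the packages installed above are visible to the Python interpreter you intend to use. The following is a minimal sketch, not part of the sample applications: the helper name `missing_packages` is hypothetical, and the module names (`ai_edge_litert`, `PIL`, `cv2`) are the import names that correspond to the pip packages listed in step 4.

```python
import importlib.util

# Import names mapped to the pip package names used in the install step.
REQUIRED_MODULES = {
    "ai_edge_litert": "ai-edge-litert",
    "PIL": "Pillow",
    "cv2": "opencv-python",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of required packages that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

On Ubuntu Server, run the check inside the activated `.venv-litert-demo` environment so that it inspects the same interpreter the application will use.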

## Run an object detection application

The following Python application performs real-time object detection on a
video file using a quantized YOLOX LiteRT model, and either writes the
annotated frames to a file or streams them to a Wayland display. It is
optimized for edge AI scenarios and uses hardware acceleration through the
QNN LiteRT delegate.

1. Create and go to the `/etc/apps/` directory.

        mkdir -p /etc/apps/ && cd /etc/apps/

2. Download the `object_detection.py` file.

        curl -L https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/applications/LiteRT/object_detection.py -o /etc/apps/object_detection.py

    To create your own local copy of `object_detection.py`, see [Create `object_detection.py`](https://docs.qualcomm.com/doc/80-70022-15B/topic/run-a-litert-model-using-delegate.html#id1).

3. Run the application:

    - To write the output to a file:

            python3 object_detection.py --output file

    - To send the output to a display, do the following in the terminal of the target device:

        1. Activate the display:

            - For Qualcomm Linux:

                    export XDG_RUNTIME_DIR=/dev/socket/weston && export WAYLAND_DISPLAY=wayland-1

            - For Ubuntu:

                    export XDG_RUNTIME_DIR=/run/user/$(id -u ubuntu)/ && export WAYLAND_DISPLAY=wayland-1

        2. Run the object detection application:

                python3 object_detection.py --output wayland
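
If you prefer to launch the application from another Python script instead of typing the exports by hand, the commands above can be assembled programmatically. The following is a rough sketch under stated assumptions: the helpers `wayland_env` and `build_run` and the distro labels `"qualcomm-linux"` and `"ubuntu"` are hypothetical names, and the default `ubuntu_uid=1000` is only a guess at what `id -u ubuntu` returns on your device.

```python
import os


def wayland_env(distro, ubuntu_uid=1000):
    """Return the display variables from the steps above for one distro."""
    if distro == "qualcomm-linux":
        runtime_dir = "/dev/socket/weston"
    elif distro == "ubuntu":
        # Mirrors /run/user/$(id -u ubuntu)/; the uid is passed in, not looked up.
        runtime_dir = f"/run/user/{ubuntu_uid}/"
    else:
        raise ValueError(f"unknown distro: {distro!r}")
    return {"XDG_RUNTIME_DIR": runtime_dir, "WAYLAND_DISPLAY": "wayland-1"}


def build_run(output, distro="qualcomm-linux", ubuntu_uid=1000):
    """Return (command, environment) for one run of object_detection.py."""
    if output not in ("file", "wayland"):
        raise ValueError(f"unknown output mode: {output!r}")
    env = dict(os.environ)
    if output == "wayland":
        env.update(wayland_env(distro, ubuntu_uid))
    return ["python3", "object_detection.py", "--output", output], env


# Example use on the target device (not executed here):
#   import subprocess
#   cmd, env = build_run("wayland", distro="ubuntu")
#   subprocess.run(cmd, env=env, cwd="/etc/apps", check=True)
```

Keeping the environment in a dictionary leaves the calling shell untouched, so a file-output run and a display run can be started from the same session.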

## Create `object_detection.py`

To create an application similar to the object detection application described in the previous section, create an `object_detection.py` file as follows:

1. In the `/etc/apps/` folder, create an `object_detection.py` file.
2. Add the following code to your `object_detection.py` file.

    Note: The postprocessing in the following code is compatible with object detection models from AI Hub. For custom models, you must update the postprocessing logic to match the model's output format and specific requirements.

    1. Import the required packages:

            #!/usr/bin/env python3
            import cv2
            import numpy as np
            import argparse
            import ai_edge_litert.interpreter as tflite
            import gi
            gi.require_version('Gst', '1.0')
            from gi.repository import Gst

    2. Handle output arguments:

            parser = argparse.ArgumentParser(description="Run object detection and output to file or Wayland.")
            parser.add_argument("--output", choices=["file", "wayland"], default="file",
                                help="Choose output mode: 'file' (default) or 'wayland'")
            args = parser.parse_args()

    3. Initialize and configure model parameters:

            MODEL_PATH = "/etc/models/yolox_quantized.tflite"
            LABEL_PATH = "/etc/labels/coco_labels.txt"
            VIDEO_IN = "/etc/media/video.mp4"
            VIDEO_OUT = "output_object_detection.mp4"
            DELEGATE_PATH = "libQnnTFLiteDelegate.so"

            FRAME_W, FRAME_H = 1600, 900
            FPS_OUT = 30
            CONF_THRES = 0.25
            NMS_IOU_THRES = 0.50
            BOX_SCALE = 3.2108588218688965
            BOX_ZP = 31.0
            SCORE_SCALE = 0.0038042240776121616

    4. Load the model and set up the LiteRT delegate:

            delegate_options = {'backend_type': 'htp'}
            delegate = tflite.load_delegate(DELEGATE_PATH, delegate_options)
            interpreter = tflite.Interpreter(model_path=MODEL_PATH, experimental_delegates=[delegate])
            interpreter.allocate_tensors()

            in_det = interpreter.get_input_details()
            out_det = interpreter.get_output_details()
            in_h, in_w = in_det[0]["shape"][1:3]

            # Read one label per line and close the file when done.
            with open(LABEL_PATH) as label_file:
                labels = [line.strip() for line in label_file]

    5. Set up video capture and preprocessing:

            cap = cv2.VideoCapture(VIDEO_IN)
            sx, sy = FRAME_W / in_w, FRAME_H / in_h
            frame_rs = np.empty((FRAME_H, FRAME_W, 3), np.uint8)
            input_tensor = np.empty((1, in_h, in_w, 3), np.uint8)

    6. Set up the output: a video writer for file output, or a GStreamer pipeline that streams frames to the Wayland display:

            if args.output == "file":
                fourcc = cv2.VideoWriter_fourcc(*"mp4v")
                out_writer = cv2.VideoWriter(VIDEO_OUT, fourcc, FPS_OUT, (FRAME_W, FRAME_H))
            else:
                Gst.init(None)
                # Enables real-time display of processed frames.
                pipeline = Gst.parse_launch(
                    'appsrc name=src is-live=true block=true format=time caps=video/x-raw,format=BGR,width=1600,height=900,framerate=30/1 ! videoconvert ! waylandsink'
                )
                appsrc = pipeline.get_by_name('src')
                pipeline.set_state(Gst.State.PLAYING)

            frame_cnt = 0

    7. Run the main loop to read each video frame, run inference on it, and draw bounding boxes on the output:

            # -------------------- Main Loop --------------------
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                frame_cnt += 1

                # Resizes and preprocesses each frame.
                cv2.resize(frame, (FRAME_W, FRAME_H), dst=frame_rs)
                cv2.resize(frame_rs, (in_w, in_h), dst=input_tensor[0])

                # Runs inference on each frame.
                interpreter.set_tensor(in_det[0]['index'], input_tensor)
                interpreter.invoke()

                boxes_q = interpreter.get_tensor(out_det[0]['index'])[0]
                scores_q = interpreter.get_tensor(out_det[1]['index'])[0]
                classes_q = interpreter.get_tensor(out_det[2]['index'])[0]

                # Dequantizes the model outputs using predefined scales and zero-points.
                boxes = BOX_SCALE * (boxes_q.astype(np.float32) - BOX_ZP)
                scores = SCORE_SCALE * scores_q.astype(np.float32)
                classes = classes_q.astype(np.int32)

                # Applies a confidence threshold to filter low-probability detections.
                mask = scores >= CONF_THRES
                if np.any(mask):
                    boxes_f = boxes[mask]
                    scores_f = scores[mask]
                    classes_f = classes[mask]

                    # Converts (x1, y1, x2, y2) boxes to (x, y, width, height) for OpenCV.
                    x1, y1, x2, y2 = boxes_f.T
                    boxes_cv2 = np.column_stack((x1, y1, x2 - x1, y2 - y1))

                    # Uses non-maximum suppression (NMS) to remove overlapping boxes.
                    idx_cv2 = cv2.dnn.NMSBoxes(
                        bboxes=boxes_cv2.tolist(),
                        scores=scores_f.tolist(),
                        score_threshold=CONF_THRES,
                        nms_threshold=NMS_IOU_THRES
                    )

                    if len(idx_cv2):
                        idx = idx_cv2.flatten()
                        sel_boxes = boxes_f[idx]
                        sel_scores = scores_f[idx]
                        sel_classes = classes_f[idx]

                        # Scales boxes from model input coordinates to output frame coordinates.
                        sel_boxes[:, [0, 2]] *= sx
                        sel_boxes[:, [1, 3]] *= sy
                        sel_boxes = sel_boxes.astype(np.int32)

                        # Clips boxes to the frame boundaries.
                        sel_boxes[:, [0, 2]] = np.clip(sel_boxes[:, [0, 2]], 0, FRAME_W - 1)
                        sel_boxes[:, [1, 3]] = np.clip(sel_boxes[:, [1, 3]], 0, FRAME_H - 1)

                        for (x1i, y1i, x2i, y2i), sc, cl in zip(sel_boxes, sel_scores, sel_classes):
                            # Draws bounding boxes and labels on the frame using OpenCV.
                            cv2.rectangle(frame_rs, (x1i, y1i), (x2i, y2i), (0, 255, 0), 2)
                            lab = labels[cl] if cl < len(labels) else str(cl)
                            cv2.putText(frame_rs, f"{lab} {sc:.2f}", (x1i, max(10, y1i - 5)),
                                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

                # Output based on mode
                # Write output to a file.
                if args.output == "file":
                    out_writer.write(frame_rs)
                # Stream output to a Wayland display.
                else:
                    data = frame_rs.tobytes()
                    # Converts frames to GStreamer buffers and pushes them to the pipeline with timestamps for smooth playback.
                    buf = Gst.Buffer.new_allocate(None, len(data), None)
                    buf.fill(0, data)
                    buf.duration = Gst.util_uint64_scale_int(1, Gst.SECOND, FPS_OUT)
                    timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) * Gst.MSECOND
                    buf.pts = buf.dts = int(timestamp)
                    appsrc.emit('push-buffer', buf)

    8. Release the video resources and pipeline, and notify the user of completion.

            cap.release()
            if args.output == "file":
                out_writer.release()
                print(f"Done - processed video saved to {VIDEO_OUT}")
            else:
                appsrc.emit('end-of-stream')
                pipeline.set_state(Gst.State.NULL)
                print("Done - video streamed to Wayland sink")
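
The dequantization in the main loop maps each quantized integer `q` to a real value with `scale * (q - zero_point)`, using the constants hardcoded in the model parameters step. The following pure-Python sketch works through that arithmetic on a few made-up values. It also shows one possible way to read the parameters from a tensor-details dictionary instead of hardcoding them. This assumes each entry returned by `get_output_details()` carries a `quantization` pair of `(scale, zero_point)`, which you should verify for your model; the helper names and the sample values are hypothetical.

```python
# Constants copied from the script above.
BOX_SCALE = 3.2108588218688965
BOX_ZP = 31.0
SCORE_SCALE = 0.0038042240776121616


def dequantize(q_values, scale, zero_point=0.0):
    """Map quantized integers to real values: scale * (q - zero_point)."""
    return [scale * (float(q) - zero_point) for q in q_values]


def quant_params(detail):
    """Read (scale, zero_point) from one tensor-details dictionary.

    Assumption: the dictionary has a `quantization` entry holding
    (scale, zero_point).
    """
    scale, zero_point = detail["quantization"]
    return float(scale), float(zero_point)


# Hypothetical details entry standing in for out_det[0].
example_detail = {"index": 0, "quantization": (BOX_SCALE, 31)}
scale, zero_point = quant_params(example_detail)

# Made-up quantized box (x1, y1, x2, y2) and two made-up quantized scores.
box = dequantize([31, 131, 231, 255], scale, zero_point)
scores = dequantize([66, 255], SCORE_SCALE)

print("box:", box)
print("scores:", scores)
```

The script itself applies the same formula to whole NumPy arrays at once. Reading the parameters from the model rather than hardcoding them may make it easier to swap in a different quantized model.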

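The rest of the postprocessing in the main loop filters detections by confidence, suppresses overlapping boxes, and rescales the surviving boxes to the output frame. The sketch below illustrates those steps in plain Python so the logic is easy to follow. It is a simplified stand-in for `cv2.dnn.NMSBoxes`, not OpenCV's implementation, and the function names, the sample boxes, and the 640×640 model input size are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thres=0.25, iou_thres=0.50):
    """Return indices kept after confidence filtering and greedy NMS."""
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in kept):
            kept.append(i)
    return kept


def scale_box(box, sx, sy, frame_w, frame_h):
    """Scale a box to frame coordinates, truncate to int, clip to the frame."""
    x1, y1, x2, y2 = box
    xs = [min(max(int(v * sx), 0), frame_w - 1) for v in (x1, x2)]
    ys = [min(max(int(v * sy), 0), frame_h - 1) for v in (y1, y2)]
    return (xs[0], ys[0], xs[1], ys[1])


# Made-up detections in model input coordinates (assumed 640x640 input).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 50, 260, 120), (0, 0, 5, 5)]
scores = [0.90, 0.80, 0.60, 0.10]

kept = filter_and_nms(boxes, scores)
sx, sy = 1600 / 640, 900 / 640
frame_boxes = [scale_box(boxes[i], sx, sy, 1600, 900) for i in kept]

print(kept)          # [0, 2]
print(frame_boxes)   # [(25, 14, 275, 154), (500, 70, 650, 168)]
```

In this example the second box overlaps the first with an IoU of about 0.92, so it is suppressed, and the last box falls below the confidence threshold. For a custom model whose outputs use a different box layout, this is the part of the logic that would need to change.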
Last Published: Nov 28, 2025
