# Run a LiteRT model on NPU

Note

This section isn’t applicable for this release.

Use the [LiteRT](https://ai.google.dev/edge/litert) runtime to run existing
quantized LiteRT models on the NPU of Qualcomm® Dragonwing™
devices.

## Prerequisites

Before running the LiteRT sample applications, complete the prerequisites in this section, such as
connecting to the target device with SSH and downloading the reference Python application and models. You can then run the application with output to a file or to a connected display.

1. Set up SSH access to the target device. For detailed instructions, see
[Sign in using SSH](https://docs.qualcomm.com/doc/80-80022-254/topic/how_to.html#use-ssh) for Qualcomm Linux.

    Note

    If SSH is already set up and Wi-Fi is connected, skip this step.
2. Sign in to the target device using SSH:

        ssh root@<IP ADDRESS OF THE TARGET DEVICE>
3. If device setup isn’t complete, follow the
[Qualcomm IM SDK quickstart instructions](https://docs.qualcomm.com/doc/80-80022-51/topic/install-sdk.html) to set up the device.
4. On the target device, download the `download_artifacts.sh` script, set executable permissions,
and run it with the required arguments to download the model and label files to the device.

        cd /tmp/
        curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/download_artifacts.sh
        chmod +x download_artifacts.sh
        ./download_artifacts.sh
5. Set up the Python environment on the target device by installing the LiteRT runtime, Pillow, and OpenCV packages:

        pip3 install ai-edge-litert==1.3.0 Pillow opencv-python==4.10.0.84
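
To confirm that the packages installed in the previous step are visible to Python, a quick check such as the following can help. It's a minimal sketch and not part of the sample application: the `missing_modules` helper and the `REQUIRED_MODULES` list are illustrative names, and the check only looks up the top-level import names (`ai_edge_litert`, `PIL`, `cv2`) without verifying versions.

```python
import importlib.util

# Import names (not pip names) of the packages installed above:
# ai-edge-litert -> ai_edge_litert, Pillow -> PIL, opencv-python -> cv2
REQUIRED_MODULES = ["ai_edge_litert", "PIL", "cv2"]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If any module is reported as missing, rerun the `pip3 install` command before continuing.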

## Run an object detection application

The following Python application performs object detection in real time on a
video file using a quantized YoloX LiteRT model. It either writes the annotated
frames to a file or shows them on a Wayland display. It’s optimized for edge AI scenarios
and uses hardware acceleration through the QNN LiteRT delegate.

1. Create and go to the `/etc/apps/` directory.

        mkdir -p /etc/apps/ && cd /etc/apps/
2. Download the `object_detection.py` file.

        curl -L https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/applications/LiteRT/object_detection.py -o /etc/apps/object_detection.py

    To create your own local copy of `object_detection.py`, see [Create `object_detection.py`](https://docs.qualcomm.com/doc/80-80022-15B/topic/run-a-litert-model-using-delegate.html#id1).
3. Run the application with output to a file or to a display:

    Output to file:

        python3 object_detection.py --output file

    Download the output video to the host machine to check the output:

        scp root@<IP ADDRESS OF THE TARGET DEVICE>:/etc/apps/output_object_detection.mp4 .

    When prompted to enter the password, enter `oelinux123`.

    Output to display:

    In the terminal of the target device, run the object detection application:

        python3 object_detection.py --output wayland

## Create `object_detection.py`

To create an application similar to the object detection application described in the previous section,
create an `object_detection.py` file as follows:

1. In the `/etc/apps/` folder, create an `object_detection.py` file.
2. Add the following code to your `object_detection.py` file.

Note

The postprocessing in the following code is compatible with object detection models from AI Hub.
For custom models, you must update the postprocessing logic to align with the model’s output format
and specific requirements.

    1. Import the required packages:

            #!/usr/bin/env python3
            import cv2
            import numpy as np
            import argparse
            import ai_edge_litert.interpreter as tflite
            import gi
            gi.require_version('Gst', '1.0')
            from gi.repository import Gst
    2. Handle output arguments:

            parser = argparse.ArgumentParser(description="Run object detection and output to file or Wayland.")
            parser.add_argument("--output", choices=["file", "wayland"], default="file",
                                help="Choose output mode: 'file' (default) or 'wayland'")
            args = parser.parse_args()
    3. Initialize and configure model parameters:

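            # Paths to the model, labels, input video, output video, and QNN delegate library.
            MODEL_PATH = "/etc/models/yolox_quantized.tflite"
            LABEL_PATH = "/etc/labels/coco_labels.txt"
            VIDEO_IN = "/etc/media/video.mp4"
            VIDEO_OUT = "output_object_detection.mp4"
            DELEGATE_PATH = "libQnnTFLiteDelegate.so"

            # Output frame size and frame rate.
            FRAME_W, FRAME_H = 1600, 900
            FPS_OUT = 30
            # Confidence and NMS IoU thresholds.
            CONF_THRES = 0.25
            NMS_IOU_THRES = 0.50
            # Predefined scales and zero-point used to dequantize the model outputs.
            BOX_SCALE = 3.2108588218688965
            BOX_ZP = 31.0
            SCORE_SCALE = 0.0038042240776121616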
    4. Load the model and set up the LiteRT delegate:

            # Loads the QNN delegate with the HTP backend and attaches it to the interpreter.
            delegate_options = {'backend_type': 'htp'}
            delegate = tflite.load_delegate(DELEGATE_PATH, delegate_options)
            interpreter = tflite.Interpreter(model_path=MODEL_PATH, experimental_delegates=[delegate])
            interpreter.allocate_tensors()

            in_det = interpreter.get_input_details()
            out_det = interpreter.get_output_details()
            in_h, in_w = in_det[0]["shape"][1:3]

            # Reads one label per line and closes the file afterward.
            with open(LABEL_PATH) as f:
                labels = [line.strip() for line in f]
    5. Set up video capture and preprocessing:

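            cap = cv2.VideoCapture(VIDEO_IN)
            # Scale factors from model-input coordinates to output-frame coordinates.
            sx, sy = FRAME_W / in_w, FRAME_H / in_h
            # Preallocated buffers that are reused for every frame.
            frame_rs = np.empty((FRAME_H, FRAME_W, 3), np.uint8)
            input_tensor = np.empty((1, in_h, in_w, 3), np.uint8)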
    6. Set up the output: a video writer for file output, or a GStreamer pipeline that streams frames to the Wayland display:

            if args.output == "file":
                fourcc = cv2.VideoWriter_fourcc(*"mp4v")
                out_writer = cv2.VideoWriter(VIDEO_OUT, fourcc, FPS_OUT, (FRAME_W, FRAME_H))
            else:
                Gst.init(None)
                # Enables real-time display of processed frames.
                pipeline = Gst.parse_launch(
                    'appsrc name=src is-live=true block=true format=time '
                    f'caps=video/x-raw,format=BGR,width={FRAME_W},height={FRAME_H},framerate={FPS_OUT}/1 '
                    '! videoconvert ! waylandsink'
                )
                appsrc = pipeline.get_by_name('src')
                pipeline.set_state(Gst.State.PLAYING)

            frame_cnt = 0
    7. Run the main loop to read each video frame, run inference on it, and draw bounding boxes on the output:

            # -------------------- Main Loop --------------------
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                frame_cnt += 1

                # Resizes and preprocesses each frame.
                cv2.resize(frame, (FRAME_W, FRAME_H), dst=frame_rs)
                cv2.resize(frame_rs, (in_w, in_h), dst=input_tensor[0])

                # Runs inference on each frame.
                interpreter.set_tensor(in_det[0]['index'], input_tensor)
                interpreter.invoke()

                boxes_q = interpreter.get_tensor(out_det[0]['index'])[0]
                scores_q = interpreter.get_tensor(out_det[1]['index'])[0]
                classes_q = interpreter.get_tensor(out_det[2]['index'])[0]

                # Dequantizes the model outputs using predefined scales and zero-points.
                boxes = BOX_SCALE * (boxes_q.astype(np.float32) - BOX_ZP)
                scores = SCORE_SCALE * scores_q.astype(np.float32)
                classes = classes_q.astype(np.int32)

                # Applies a confidence threshold to filter low-probability detections.
                mask = scores >= CONF_THRES
                if np.any(mask):
                    boxes_f = boxes[mask]
                    scores_f = scores[mask]
                    classes_f = classes[mask]

                    # Converts corner boxes (x1, y1, x2, y2) to (x, y, width, height) for OpenCV NMS.
                    x1, y1, x2, y2 = boxes_f.T
                    boxes_cv2 = np.column_stack((x1, y1, x2 - x1, y2 - y1))

                    # Uses non-maximum suppression (NMS) to remove overlapping boxes.
                    idx_cv2 = cv2.dnn.NMSBoxes(
                        bboxes=boxes_cv2.tolist(),
                        scores=scores_f.tolist(),
                        score_threshold=CONF_THRES,
                        nms_threshold=NMS_IOU_THRES
                    )

                    if len(idx_cv2):
                        idx = idx_cv2.flatten()
                        sel_boxes = boxes_f[idx]
                        sel_scores = scores_f[idx]
                        sel_classes = classes_f[idx]

                        # Scales boxes from model-input to frame coordinates and clips them to the frame.
                        sel_boxes[:, [0, 2]] *= sx
                        sel_boxes[:, [1, 3]] *= sy
                        sel_boxes = sel_boxes.astype(np.int32)

                        sel_boxes[:, [0, 2]] = np.clip(sel_boxes[:, [0, 2]], 0, FRAME_W - 1)
                        sel_boxes[:, [1, 3]] = np.clip(sel_boxes[:, [1, 3]], 0, FRAME_H - 1)

                        # Draws bounding boxes and labels on the frame using OpenCV.
                        for (x1i, y1i, x2i, y2i), sc, cl in zip(sel_boxes, sel_scores, sel_classes):
                            cv2.rectangle(frame_rs, (x1i, y1i), (x2i, y2i), (0, 255, 0), 2)
                            lab = labels[cl] if cl < len(labels) else str(cl)
                            cv2.putText(frame_rs, f"{lab} {sc:.2f}", (x1i, max(10, y1i - 5)),
                                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

                # Writes the frame to a file or streams it to a Wayland display, depending on the output mode.
                if args.output == "file":
                    out_writer.write(frame_rs)
                else:
                    data = frame_rs.tobytes()
                    # Converts frames to GStreamer buffers and pushes them to the pipeline with timestamps for smooth playback.
                    buf = Gst.Buffer.new_allocate(None, len(data), None)
                    buf.fill(0, data)
                    buf.duration = Gst.util_uint64_scale_int(1, Gst.SECOND, FPS_OUT)
                    timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) * Gst.MSECOND
                    buf.pts = buf.dts = int(timestamp)
                    appsrc.emit('push-buffer', buf)
    8. Release the resources and notify the user of completion:

            cap.release()
            if args.output == "file":
                out_writer.release()
                print(f"Done - processed video saved to {VIDEO_OUT}")
            else:
                appsrc.emit('end-of-stream')
                pipeline.set_state(Gst.State.NULL)
                print("Done - video streamed to Wayland sink")
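
The dequantization step in the main loop maps quantized integers to real values as `real = scale * (quantized - zero_point)`, with the scale and zero-point hard-coded in `BOX_SCALE`, `BOX_ZP`, and `SCORE_SCALE`. For a custom model these values differ. The following sketch shows the same arithmetic in plain Python so that it runs without NumPy or a device. It assumes that each entry returned by `get_output_details()` carries a `quantization` tuple of `(scale, zero_point)`; the `dequantize` and `quant_params` helpers and the `example_detail` dictionary are hypothetical and aren't part of the sample application.

```python
def dequantize(q_values, scale, zero_point=0.0):
    """Map quantized integers to real values: real = scale * (q - zero_point)."""
    return [scale * (float(q) - zero_point) for q in q_values]


def quant_params(tensor_detail):
    """Return (scale, zero_point) from one tensor-detail dictionary."""
    scale, zero_point = tensor_detail["quantization"]
    return float(scale), float(zero_point)


# Hypothetical stand-in for one element of interpreter.get_output_details().
# On the device, pass out_det[0] (or another output entry) instead.
example_detail = {"quantization": (0.5, 10)}

scale, zero_point = quant_params(example_detail)
print(dequantize([10, 12, 20], scale, zero_point))  # [0.0, 1.0, 5.0]
```

Reading the parameters from the model in this way would avoid updating the constants by hand when the model changes, provided the model reports per-tensor quantization.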

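The box handling in the main loop has two parts: corner-format boxes `(x1, y1, x2, y2)` are converted to `(x, y, width, height)` for OpenCV NMS, and the selected boxes are scaled from model-input coordinates to frame coordinates and clipped to the frame. The following sketch illustrates both steps on a single box in plain Python. The `xyxy_to_xywh` and `scale_and_clip` helpers and the 640x640 model input size are assumptions for illustration only; the application reads the actual input size from the model.

```python
def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) corners to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def scale_and_clip(box, sx, sy, frame_w, frame_h):
    """Scale a corner-format box to frame pixels and clip it to the frame."""
    x1, y1, x2, y2 = box
    # int() truncates toward zero, like astype(np.int32) in the application.
    xs = [min(max(int(x * sx), 0), frame_w - 1) for x in (x1, x2)]
    ys = [min(max(int(y * sy), 0), frame_h - 1) for y in (y1, y2)]
    return (xs[0], ys[0], xs[1], ys[1])


# Hypothetical values: a 640x640 model input and the 1600x900 output frame.
in_w, in_h = 640, 640
frame_w, frame_h = 1600, 900
sx, sy = frame_w / in_w, frame_h / in_h

box = (64.0, 128.0, 320.0, 700.0)
print(xyxy_to_xywh(box))                              # (64.0, 128.0, 256.0, 572.0)
print(scale_and_clip(box, sx, sy, frame_w, frame_h))  # (160, 180, 800, 899)
```

In the example, the bottom edge scales to 984 pixels and is clipped to 899, the last valid row of a 900-pixel-high frame.
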
Last Published: May 14, 2026
