# Run a LiteRT model on NPU

Note: This section isn’t applicable for this release.

Use the [LiteRT](https://ai.google.dev/edge/litert) runtime to run existing quantized LiteRT models on the NPU of Qualcomm® Dragonwing™ devices.

## Prerequisites

Before running the LiteRT sample applications, complete the following prerequisites: connect to the device with SSH, download the reference Python application and models, and install the Python dependencies. You can then run the application with output to a file or to a connected display.

1. Sign in with SSH and connect to the target device. For detailed instructions, see: Sign in using [SSH](https://docs.qualcomm.com/doc/80-80022-254/topic/how_to.html#use-ssh) for Qualcomm Linux.

   Note: If SSH is already set up and Wi-Fi is connected, skip this step.

2. Sign in to the target device using SSH:

   ```shell
   ssh root@
   ```

3. If device setup isn’t complete, follow the [Qualcomm IM SDK quickstart instructions](https://docs.qualcomm.com/doc/80-80022-51/topic/install-sdk.html) to set up the device.

4. On the target device, download the `download_artifacts.sh` script, set executable permissions, and run it with the required arguments to download the model and label files to the device.

   ```shell
   cd /tmp/
   curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/download_artifacts.sh
   chmod +x download_artifacts.sh
   ./download_artifacts.sh
   ```

5. Set up the Python environment on the target device by installing the LiteRT runtime, Pillow, and OpenCV packages.

   ```shell
   pip3 install ai-edge-litert==1.3.0 Pillow opencv-python==4.10.0.84
   ```

## Run an object detection application

The following Python application performs real-time object detection on a video file using a quantized YoloX LiteRT model and sends the annotated frames to a file or a Wayland display.
It’s optimized for edge AI scenarios and uses hardware acceleration through the QNN LiteRT delegate.

1. Create and go to the `/etc/apps/` directory.

   ```shell
   mkdir -p /etc/apps/ && cd /etc/apps/
   ```

2. Download the `object_detection.py` file.

   ```shell
   curl -L https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/applications/LiteRT/object_detection.py -o /etc/apps/object_detection.py
   ```

   To create your own local copy of `object_detection.py`, see [create object_detection.py](https://docs.qualcomm.com/doc/80-80022-15B/topic/run-a-litert-model-using-delegate.html#id1).

3. Run the application in one of the following output modes.

   Output to file:

   ```shell
   python3 object_detection.py --output file
   ```

   Download the output video to the host machine to check the output:

   ```shell
   scp root@:/etc/apps/output_object_detection.mp4 .
   ```

   When prompted to enter the password, enter `oelinux123`.

   Output to display: in the terminal of the target device, run the object detection application:

   ```shell
   python3 object_detection.py --output wayland
   ```

## Create `object_detection.py`

To create an application similar to the object detection application described in the previous section, create an `object_detection.py` file as follows:

1. In the `/etc/apps/` folder, create an `object_detection.py` file.

2. Add the following code to your `object_detection.py` file.

   Note: The postprocessing in the following code is compatible with object detection models from AI Hub. For custom models, you must update the postprocessing logic to align with the model’s output format and specific requirements.

   1. Import the required packages:

      ```python
      #!/usr/bin/env python3
      import cv2
      import numpy as np
      import argparse
      import ai_edge_litert.interpreter as tflite
      import gi
      gi.require_version('Gst', '1.0')
      from gi.repository import Gst
      ```

   2. Handle output arguments:

      ```python
      parser = argparse.ArgumentParser(description="Run object detection and output to file or Wayland.")
      parser.add_argument("--output", choices=["file", "wayland"], default="file",
                          help="Choose output mode: 'file' (default) or 'wayland'")
      args = parser.parse_args()
      ```

   3. Initialize and configure model parameters:

      ```python
      MODEL_PATH = "/etc/models/yolox_quantized.tflite"
      LABEL_PATH = "/etc/labels/coco_labels.txt"
      VIDEO_IN = "/etc/media/video.mp4"
      VIDEO_OUT = "output_object_detection.mp4"
      DELEGATE_PATH = "libQnnTFLiteDelegate.so"

      # Output frame size and frame rate.
      FRAME_W, FRAME_H = 1600, 900
      FPS_OUT = 30

      # Detection thresholds.
      CONF_THRES = 0.25
      NMS_IOU_THRES = 0.50

      # Quantization parameters of the model outputs.
      BOX_SCALE = 3.2108588218688965
      BOX_ZP = 31.0
      SCORE_SCALE = 0.0038042240776121616
      ```

   4. Load the model and set up the LiteRT delegate:

      ```python
      # Load the QNN delegate with the HTP (NPU) backend.
      delegate_options = {'backend_type': 'htp'}
      delegate = tflite.load_delegate(DELEGATE_PATH, delegate_options)

      interpreter = tflite.Interpreter(model_path=MODEL_PATH, experimental_delegates=[delegate])
      interpreter.allocate_tensors()

      in_det = interpreter.get_input_details()
      out_det = interpreter.get_output_details()
      in_h, in_w = in_det[0]["shape"][1:3]

      # Read one class label per line.
      with open(LABEL_PATH) as f:
          labels = [line.strip() for line in f]
      ```

   5. Set up video capture and preprocessing:

      ```python
      cap = cv2.VideoCapture(VIDEO_IN)

      # Scale factors from model input coordinates to output frame coordinates.
      sx, sy = FRAME_W / in_w, FRAME_H / in_h

      # Preallocated buffers for the resized frame and the model input.
      frame_rs = np.empty((FRAME_H, FRAME_W, 3), np.uint8)
      input_tensor = np.empty((1, in_h, in_w, 3), np.uint8)
      ```

   6. Set up the output: a video writer for file output, or a GStreamer pipeline that streams frames to the Wayland display:

      ```python
      if args.output == "file":
          fourcc = cv2.VideoWriter_fourcc(*"mp4v")
          out_writer = cv2.VideoWriter(VIDEO_OUT, fourcc, FPS_OUT, (FRAME_W, FRAME_H))
      else:
          Gst.init(None)
          # Enables real-time display of processed frames.
          pipeline = Gst.parse_launch(
              'appsrc name=src is-live=true block=true format=time '
              'caps=video/x-raw,format=BGR,width=1600,height=900,framerate=30/1 ! '
              'videoconvert ! waylandsink'
          )
          appsrc = pipeline.get_by_name('src')
          pipeline.set_state(Gst.State.PLAYING)

      frame_cnt = 0
      ```

   7. Run the main loop to read the video, run inference on each frame, and draw bounding boxes on the output:

      ```python
      # -------------------- Main Loop --------------------
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          frame_cnt += 1

          # Resizes and preprocesses each frame.
          cv2.resize(frame, (FRAME_W, FRAME_H), dst=frame_rs)
          cv2.resize(frame_rs, (in_w, in_h), dst=input_tensor[0])

          # Runs inference on each frame.
          interpreter.set_tensor(in_det[0]['index'], input_tensor)
          interpreter.invoke()
          boxes_q = interpreter.get_tensor(out_det[0]['index'])[0]
          scores_q = interpreter.get_tensor(out_det[1]['index'])[0]
          classes_q = interpreter.get_tensor(out_det[2]['index'])[0]

          # Dequantizes the model outputs using predefined scales and zero-points.
          boxes = BOX_SCALE * (boxes_q.astype(np.float32) - BOX_ZP)
          scores = SCORE_SCALE * scores_q.astype(np.float32)
          classes = classes_q.astype(np.int32)

          # Applies a confidence threshold to filter low-probability detections.
          mask = scores >= CONF_THRES
          if np.any(mask):
              boxes_f = boxes[mask]
              scores_f = scores[mask]
              classes_f = classes[mask]

              # Converts (x1, y1, x2, y2) boxes to the (x, y, w, h) form that NMSBoxes expects.
              x1, y1, x2, y2 = boxes_f.T
              boxes_cv2 = np.column_stack((x1, y1, x2 - x1, y2 - y1))

              # Uses non-maximum suppression (NMS) to remove overlapping boxes.
              idx_cv2 = cv2.dnn.NMSBoxes(
                  bboxes=boxes_cv2.tolist(),
                  scores=scores_f.tolist(),
                  score_threshold=CONF_THRES,
                  nms_threshold=NMS_IOU_THRES
              )

              if len(idx_cv2):
                  idx = idx_cv2.flatten()
                  sel_boxes = boxes_f[idx]
                  sel_scores = scores_f[idx]
                  sel_classes = classes_f[idx]

                  # Scales boxes to the output frame size and clips them to the frame.
                  sel_boxes[:, [0, 2]] *= sx
                  sel_boxes[:, [1, 3]] *= sy
                  sel_boxes = sel_boxes.astype(np.int32)
                  sel_boxes[:, [0, 2]] = np.clip(sel_boxes[:, [0, 2]], 0, FRAME_W - 1)
                  sel_boxes[:, [1, 3]] = np.clip(sel_boxes[:, [1, 3]], 0, FRAME_H - 1)

                  for (x1i, y1i, x2i, y2i), sc, cl in zip(sel_boxes, sel_scores, sel_classes):
                      # Draws bounding boxes and labels on the frame using OpenCV.
                      cv2.rectangle(frame_rs, (x1i, y1i), (x2i, y2i), (0, 255, 0), 2)
                      lab = labels[cl] if cl < len(labels) else str(cl)
                      cv2.putText(frame_rs, f"{lab} {sc:.2f}", (x1i, max(10, y1i - 5)),
                                  cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

          # Output based on mode
          if args.output == "file":
              # Write output to a file.
              out_writer.write(frame_rs)
          else:
              # Stream output to a Wayland display.
              data = frame_rs.tobytes()
              # Converts frames to GStreamer buffers and pushes them to the
              # pipeline with timestamps for smooth playback.
              buf = Gst.Buffer.new_allocate(None, len(data), None)
              buf.fill(0, data)
              buf.duration = Gst.util_uint64_scale_int(1, Gst.SECOND, FPS_OUT)
              timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) * Gst.MSECOND
              buf.pts = buf.dts = int(timestamp)
              appsrc.emit('push-buffer', buf)
      ```

   8. Release the resources, stop the pipeline, and notify the user of completion:

      ```python
      cap.release()
      if args.output == "file":
          out_writer.release()
          print(f"Done - processed video saved to {VIDEO_OUT}")
      else:
          appsrc.emit('end-of-stream')
          pipeline.set_state(Gst.State.NULL)
          print("Done - video streamed to Wayland sink")
      ```

Last Published: Jun 23, 2026
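
The `BOX_SCALE`, `BOX_ZP`, and `SCORE_SCALE` constants in `object_detection.py` apply the usual affine dequantization, `real = scale * (quantized - zero_point)`, with values that are specific to one model. The following is a minimal sketch of reading those parameters from the output details instead of hardcoding them. It assumes each dictionary returned by `get_output_details()` carries a `quantization` entry of the form `(scale, zero_point)`, as in the TensorFlow Lite Python interpreter, and that a scale of `0.0` marks an unquantized tensor. The helper names `dequantize` and `dequantize_output` and the sample values are hypothetical and not part of the sample application.

```python
import numpy as np


def dequantize(q, scale, zero_point):
    """Affine dequantization: real = scale * (q - zero_point)."""
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)


def dequantize_output(tensor, detail):
    """Dequantize a tensor using the (scale, zero_point) pair in its detail dict.

    Assumption: a scale of 0.0 means the tensor is not quantized, so the
    values are only cast to float32.
    """
    scale, zero_point = detail["quantization"]
    if scale == 0.0:
        return np.asarray(tensor, dtype=np.float32)
    return dequantize(tensor, scale, zero_point)


# Hypothetical stand-in for one entry of interpreter.get_output_details(),
# filled with the box scale and zero-point used by the sample application.
box_detail = {"quantization": (3.2108588218688965, 31)}

# Hypothetical quantized box coordinates for a single detection.
boxes_q = np.array([[31, 41, 131, 231]], dtype=np.uint8)

boxes = dequantize_output(boxes_q, box_detail)
print(boxes)
```

In the application, the same helper could be called as `dequantize_output(boxes_q, out_det[0])`, which would keep the postprocessing in step with the model file if the model is re-exported with different quantization parameters. Verify the contents of `out_det` on the target device before relying on this.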