# Object detection and display with ONNX

This use case uses an ONNX object detection model to identify objects in a scene from a single camera stream. The detected bounding boxes are either overlaid on the video output or composed into the rendered stream and displayed.

Run the use case on the target device:

    export WAYLAND_DISPLAY=wayland-1 && export XDG_RUNTIME_DIR=/dev/socket/weston && \
        gst-launch-1.0 -v filesrc location=/etc/media/video.mp4 ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! tee name=split \
        split. ! queue ! qtivcomposer name=mixer ! queue ! waylandsink fullscreen=true \
        split. ! queue ! qtimlvconverter ! queue ! qtimlonnx model=/etc/models/model.onnx execution-provider=qnn backend-path="/usr/lib/libQnnHtp.so" ! queue ! qtimlpostprocess module=yolov8 labels=/etc/labels/yolox.json ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

Note

When using an `ONNXModel`, place the model weight file in the same directory as the ONNX model file and name it `model.data`.
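
The placement rule in this note can be checked before the pipeline is launched. The following is a minimal sketch, not part of the Qualcomm tooling: `check_model_files` is a hypothetical helper name, and the sketch assumes the model ships with a separate weight file that must be called `model.data`.

```shell
#!/bin/sh
# check_model_files MODEL_PATH   (hypothetical helper, for illustration only)
# Returns 0 when the ONNX model file and a weight file named model.data
# both exist in the same directory. Otherwise it reports what is missing
# on stderr and returns 1.
check_model_files() {
    model=$1
    weights="$(dirname "$model")/model.data"
    status=0
    if [ ! -f "$model" ]; then
        echo "missing model: $model" >&2
        status=1
    fi
    if [ ! -f "$weights" ]; then
        echo "missing weights: $weights" >&2
        status=1
    fi
    return "$status"
}
```

For the paths used on this page, the check could be run as `check_model_files /etc/models/model.onnx` before starting `gst-launch-1.0`.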

To stop the use case, use **CTRL + C**.
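
The launch command on this page hard-codes the video, model, and label paths. As a hedged sketch, the same pipeline description can be generated from three arguments so that other files can be tried without editing the command. `build_pipeline` is a hypothetical helper name; every element and property it emits is copied from the command above, and it assumes the paths contain no spaces or wildcard characters.

```shell
#!/bin/sh
# build_pipeline VIDEO MODEL LABELS   (hypothetical helper, for illustration only)
# Prints the pipeline description from this page on one line, with the
# three file paths substituted. It does not start GStreamer itself.
build_pipeline() {
    video=$1
    model=$2
    labels=$3
    # Each argument is printed followed by a single space; '!' is quoted
    # so that no shell treats it as history expansion.
    printf '%s ' \
        filesrc "location=$video" '!' qtdemux '!' h264parse '!' \
        v4l2h264dec capture-io-mode=4 output-io-mode=4 '!' \
        video/x-raw,format=NV12 '!' tee name=split \
        split. '!' queue '!' qtivcomposer name=mixer '!' queue '!' \
        waylandsink fullscreen=true \
        split. '!' queue '!' qtimlvconverter '!' queue '!' \
        qtimlonnx "model=$model" execution-provider=qnn \
        backend-path=/usr/lib/libQnnHtp.so '!' queue '!' \
        qtimlpostprocess module=yolov8 "labels=$labels" '!' \
        video/x-raw,format=BGRA,width=640,height=360 '!' queue '!' mixer.
}

# Usage on the target device (the substitution is left unquoted on purpose
# so that the shell splits it back into separate arguments):
#   gst-launch-1.0 -v $(build_pipeline /etc/media/video.mp4 \
#       /etc/models/model.onnx /etc/labels/yolox.json)
```

Printing the description instead of running it makes it possible to inspect the generated pipeline before handing it to `gst-launch-1.0`.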

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream that comes from a camera source.
2. Overlays bounding boxes over the detected objects using overlaylib.
3. Displays the results.

<svg xmlns="http://www.w3.org/2000/svg" width="940" height="349.974597930908203" viewbox="0 0 940 349.974597930908203" aria-label="../../_images/image_object_detection_onnx.svg">
  <defs>
    <style>.svg-1 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-1 .cls-2 { fill: #fff; font-size: 16px }
.svg-1 .cls-2,.svg-1 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-1 .cls-4 { fill: #007884 }
.svg-1 .cls-5 { fill: #d2d7e1 }
.svg-1 .cls-6 { fill: #2a2aea }
.svg-1 .cls-3 { font-size: 14px }
.svg-1 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect class="cls-7" x=".5" y=".499774932861328" width="939" height="348.974609375" rx="7.5" ry="7.5"></rect>
      <path class="cls-5" d="M932,1c3.859741210939319,0,7,3.140233993530273,7,7v333.974597930908203c0,3.8597412109375-3.140258789060681,7-7,7H8c-3.859771728515625,0-7-3.1402587890625-7-7V8c0-3.859766006469727,3.140228271484375-7,7-7h924M932,0H8C3.581771850585938,0,0,3.581764221191406,0,8v333.974597930908203c0,4.418212890625,3.581771850585938,8,8,8h924c4.418334960939319,0,8-3.581787109375,8-8V8c0-4.418235778808594-3.581665039060681-8-8-8h0Z"></path>
    </g>
    <g>
      <g>
        <text class="cls-3" transform="translate(741.492431640625 326.066074371337891)"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect class="cls-6" x="721.241394846414551" y="313.974597930908203" width="16" height="16" rx="2" ry="2"></rect>
      </g>
      <g>
        <text class="cls-3" transform="translate(840.07421875 326.066074371337891)"><tspan x="0" y="0">Open source</tspan></text>
        <rect class="cls-4" x="819.823176259271349" y="313.974597930908203" width="16" height="16" rx="2" ry="2"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <g>
          <rect class="cls-4" x="20" y="20.000005443602277" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(73.429725646972656 48.506705760955811)"><tspan x="0" y="0">camsrc</tspan></text>
        </g>
        <g>
          <line class="cls-1" x1="180" y1="45.000003814697266" x2="199.976654052734375" y2="45.000003814697266"></line>
          <polygon points="198.955352783203125 48.490436553955078 205 45.000003814697266 198.955352783203125 41.509574890136719 198.955352783203125 48.490436553955078"></polygon>
        </g>
        <g>
          <line class="cls-1" x1="285" y1="68.974582672119141" x2="285" y2="88.951221466064453"></line>
          <polygon points="281.50958251953125 87.929927825927734 285 93.974582672119141 288.49041748046875 87.929927825927734 281.50958251953125 87.929927825927734"></polygon>
        </g>
        <g>
          <rect class="cls-4" x="205" y="20.000005443602277" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(273.910198211669922 48.506705760955811)"><tspan x="0" y="0">tee</tspan></text>
        </g>
        <g>
          <rect class="cls-6" x="205" y="93.974581043215949" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(229.527484893798828 123.650539398193359)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
        </g>
        <g>
          <line class="cls-1" x1="285" y1="143.974582672119141" x2="285" y2="163.951221466064453"></line>
          <polygon points="281.50958251953125 162.929920196533203 285 168.974582672119141 288.49041748046875 162.929920196533203 281.50958251953125 162.929920196533203"></polygon>
        </g>
        <g>
          <rect class="cls-6" x="205" y="168.974581043215949" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(249.589984893798828 198.650547027587891)"><tspan x="0" y="0">qtimlonnx</tspan></text>
        </g>
        <g>
          <line class="cls-1" x1="285" y1="218.974582672119141" x2="285" y2="238.951221466064453"></line>
          <polygon points="281.50958251953125 237.929920196533203 285 243.974567413330078 288.49041748046875 237.929920196533203 281.50958251953125 237.929920196533203"></polygon>
        </g>
        <g>
          <rect class="cls-6" x="205" y="243.974581043215949" width="160" height="49.999999999998181" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(222.722797393798828 273.650547027587891)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
        </g>
        <g>
          <line class="cls-1" x1="365" y1="45.000003814697266" x2="384.97662353515625" y2="45.000003814697266"></line>
          <polygon points="383.955322265625 48.490436553955078 390 45.000003814697266 383.955322265625 41.509574890136719 383.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect class="cls-6" x="390" y="20.000005443602277" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(427.296905517578125 48.506705760955811)"><tspan x="0" y="0">qtimetamux</tspan></text>
        </g>
        <g>
          <line class="cls-1" x1="550" y1="45.000003814697266" x2="569.97662353515625" y2="45.000003814697266"></line>
          <polygon points="568.955322265625 48.490436553955078 575 45.000003814697266 568.955322265625 41.509574890136719 568.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect class="cls-6" x="575" y="20.000005443602277" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(616.562530517578125 48.506705760955811)"><tspan x="0" y="0">qtivoverlay</tspan></text>
        </g>
        <g>
          <line class="cls-1" x1="735" y1="45.000003814697266" x2="754.97662353515625" y2="45.000003814697266"></line>
          <polygon points="753.955322265625 48.490436553955078 760 45.000003814697266 753.955322265625 41.509574890136719 753.955322265625 48.490436553955078"></polygon>
        </g>
        <g>
          <rect class="cls-4" x="760" y="20.000005443602277" width="160" height="50" rx="4" ry="4"></rect>
          <text class="cls-2" transform="translate(796.09771728515625 48.506705760955811)"><tspan x="0" y="0">waylandsink</tspan></text>
        </g>
      </g>
      <g>
        <polyline class="cls-1" points="365 271.338306427001953 470 271.338306427001953 470 74.922946929931641"></polyline>
        <polygon points="473.49041748046875 75.944240570068359 470 69.899585723876953 466.50958251953125 75.944240570068359 473.49041748046875 75.944240570068359"></polygon>
      </g>
    </g>
  </g>
</svg>
**Figure: Pipeline for object detection with ONNX**

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70029-50/topic/qtiqmmfsrc.html) | <ol class="arabic simple"><li><p>Collects the video stream (source) and creates two copies of it:</p><ul class="simple"><li><p>One stream is sent to the <a href="https://docs.qualcomm.com/doc/80-70029-50/topic/qtimetamux.html"><span class="doc">qtimetamux</span></a> plugin to retain the video stream.</p></li><li><p>The other stream is sent to the ML inferencing pipeline.</p></li></ul></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70029-50/topic/qtimlvconverter.html) | <ol class="arabic"><li><p>Receives the video stream on its sink pad.</p></li><li><p>Performs preprocessing:</p><ul class="simple"><li><p>Color conversion</p></li><li><p>Scaling up or down</p></li><li><p>Normalization of the stream data when the model expects floating-point values as input</p></li></ul></li><li><p>Converts the video stream to a tensor stream on its source pad.</p><p>The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| [qtimlonnx](https://docs.qualcomm.com/doc/80-70029-50/topic/qtimlonnx.html) | <ol class="arabic simple"><li><p>Loads the object detection model.</p></li><li><p>Modifies the graph for the chosen execution provider.</p></li><li><p>Receives the tensor stream on its sink pad.</p></li><li><p>Runs the inference and produces a tensor stream with the object detection results on its source pad.</p></li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="arabic"><li><p>Receives the inference tensors from object detection.</p></li><li><p>Converts the inference tensors on its sink pad into formats, such as video or text, that the multimedia plugins can process later.</p></li><li><p>Applies the threshold to the chosen number of results.</p></li><li><p>Loads the corresponding modules for detection models.</p><p>In this use case, qtimlpostprocess does the following:</p><ol class="loweralpha simple"><li><p>Loads the YOLOv8 submodule.</p></li><li><p>Produces results as structures of text.</p></li><li><p>Sends them to the sink pad of qtimetamux.</p></li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70029-50/topic/qtimetamux.html) | <ol class="arabic simple"><li><p>Receives the video stream and the corresponding text stream with bounding box results on its sink pads.</p></li><li><p>Produces GST buffers with the contents of the video stream from its sink pad.</p></li><li><p>Adds the bounding boxes from the data sink pad to the GST buffer metadata as GstVideoRegionOfInterest (meta muxing) on its source pad.</p></li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70029-50/topic/qtioverlay.html) | <ol class="arabic simple"><li><p>Receives the multiplexed stream.</p></li><li><p>Overlays the bounding boxes on the video frame using CL.</p></li><li><p>Produces GST buffers with overlays on its source pad.</p></li></ol> |
| **Output** | |
| [waylandsink](https://docs.qualcomm.com/doc/80-70029-50/topic/waylandsink.html) | <ol class="arabic simple"><li><p>Receives the video stream on its sink pad.</p></li><li><p>Submits the video stream to Weston.</p></li><li><p>Weston renders the video stream and the bounding boxes generated for the objects in that scene on a local display device.</p></li></ol> |

Last Published: Apr 02, 2026


Source: [https://docs.qualcomm.com/doc/80-70029-50/topic/object-detection-with-onnx.html](https://docs.qualcomm.com/doc/80-70029-50/topic/object-detection-with-onnx.html)