# Object detection and display with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-object-detection-and-display.html](https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-object-detection-and-display.html)

These use cases use a YOLOX LiteRT model to identify objects in a scene. Each use case either overlays or composes bounding boxes over the detected objects, and then displays the results.

## Use qtivoverlay plugin to apply bounding box overlay

Run the use case on the target device:

```shell
gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! \
queue ! tee name=split split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true split. ! queue ! \
qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolox_quantized.tflite ! queue ! \
qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json ! text/x-raw ! queue ! metamux.
```

To stop the use case, press CTRL + C.

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream coming from a camera source.
2. Overlays bounding boxes over the detected objects using overlaylib.
3. Displays the results.

Figure: Pipeline for bounding box overlay

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70023-50/topic/qtiqmmfsrc.html) | Collects the video stream (source) from the camera. The `tee` element then creates two copies of the stream: one is sent to the qtimetamux plugin to retain the video stream, and the other is sent to an ML inferencing pipeline. |
| qtimlvconverter | Receives the video stream on its sink pad. Performs preprocessing: color conversion, scaling down or up, and normalization of the stream data when the model expects floating-point input. Converts the video stream to a tensor stream on its source pad, which the object detection model uses for inferencing. |
| qtimltflite | Loads the object detection model and modifies the graph for the chosen delegate. Receives the tensor stream on its sink pad, runs the inference, and produces a tensor stream with the object detection results on its source pad. |
| qtimlpostprocess | Receives the inference tensors from the object detection model on its sink pad and converts them into formats, such as video or text, that downstream multimedia plugins can process. Applies the confidence threshold (`settings`) and the result limit (`results`). Loads the module that corresponds to the detection model. In this use case, qtimlpostprocess loads the YOLOv8 submodule, produces the results as text structures, and sends them to the sink pad of qtimetamux. |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimetamux.html) | Receives, on its sink pads, the video stream and a text stream with the bounding box results for that video stream. Produces GST buffers with the contents of the video stream, and adds the bounding boxes from its data sink pad to the buffer metadata as `GstVideoRegionOfInterestMeta` (meta muxing) on its source pad. |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70023-50/topic/qtioverlay.html) | Receives the multiplexed stream, overlays the bounding boxes on each video frame using CL, and produces GST buffers with the overlays on its source pad. |
| waylandsink | Receives the video stream on its sink pad and submits it to Weston. Weston renders the video stream, together with the bounding boxes generated for the objects in the scene, on a local display device. |

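The overlay command hard-codes the model path, labels file, confidence threshold, and result count. The following is a minimal sketch, assuming bash is available on the target device, that wraps the same pipeline in a shell function so these values can be overridden per run. The function name `run_overlay_pipeline` and the `MODEL`, `LABELS`, `CONFIDENCE`, `RESULTS`, and `DRY_RUN` variables are hypothetical. The elements and properties are exactly those in the documented command. Holding the arguments in a bash array avoids the backslash continuations and the nested `\"` escaping in `settings`.

```shell
#!/usr/bin/env bash
# Sketch: wrapper around the documented qtivoverlay pipeline.
# MODEL, LABELS, CONFIDENCE, RESULTS and DRY_RUN are hypothetical
# environment variables; the defaults match the documented command.

run_overlay_pipeline() {
  local model="${MODEL:-/etc/models/yolox_quantized.tflite}"
  local labels="${LABELS:-/etc/labels/yolox.json}"
  local confidence="${CONFIDENCE:-75.0}"
  local results="${RESULTS:-10}"

  # One array element per gst-launch-1.0 argument, so no backslash
  # continuations or nested quote escaping are needed.
  local -a cmd=(
    gst-launch-1.0 -e
    qtiqmmfsrc name=camsrc
    ! "video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1"
    ! queue ! tee name=split
    split. ! queue ! qtimetamux name=metamux
    ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true
    split. ! queue ! qtimlvconverter
    ! queue ! qtimltflite delegate=external
    external-delegate-path=libQnnTFLiteDelegate.so
    "external-delegate-options=QNNExternalDelegate,backend_type=htp;"
    "model=${model}"
    ! queue ! qtimlpostprocess
    "settings={\"confidence\": ${confidence}}"
    "results=${results}" module=yolov8 "labels=${labels}"
    ! text/x-raw ! queue ! metamux.
  )

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the arguments, one per line, instead of starting the camera.
    printf '%s\n' "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, after sourcing the script, `DRY_RUN=1 run_overlay_pipeline` prints one gst-launch-1.0 argument per line so you can check the final pipeline. `RESULTS=5 run_overlay_pipeline` starts the pipeline with a different result limit.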
## Use qtivcomposer to mix original frame with bounding box mask

Run the use case on the target device:

```shell
gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split split. ! \
queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true split. ! queue ! qtimlvconverter ! queue ! \
qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
model=/etc/models/yolox_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json \
! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.
```

To stop the use case, press CTRL + C.

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream coming from a camera source.
2. Composes the following using qtivcomposer:
   1. Bounding boxes over the detected objects.
   2. Original video stream.
3. Displays the results.

Figure: Pipeline for bounding box mask with qtivcomposer

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70023-50/topic/qtiqmmfsrc.html) | Collects the video stream (source) from the camera. The `tee` element then creates two copies of the stream: one is sent to the qtivcomposer plugin to retain the video stream, and the other is sent to an ML inferencing pipeline. |
| qtimlvconverter | Receives the video stream on its sink pad. Performs preprocessing: color conversion, scaling down or up, and normalization of the stream data when the model expects floating-point input. Converts the video stream to a tensor stream on its source pad, which the object detection model uses for inferencing. |
| qtimltflite | Loads the object detection model and modifies the graph for the chosen delegate. Receives the tensor stream on its sink pad, runs the inference, and produces a tensor stream with the object detection results on its source pad. |
| qtimlpostprocess | Receives the inference tensors from the object detection model on its sink pad and converts them into formats, such as video or text, that downstream multimedia plugins can process. Applies the confidence threshold (`settings`) and the result limit (`results`). Loads the module that corresponds to the detection model. In this use case, qtimlpostprocess loads the YOLOv8 submodule, produces video frames (640x360 BGRA) that contain only the bounding boxes so they can be overlaid on the objects, and sends them to the sink pad of qtivcomposer. |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70023-50/topic/qtivcomposer.html) | Receives the original video stream and the video stream with bounding boxes on its sink pads. On its source pad, produces content composed from the video streams received on its sink pads. |
| waylandsink | Receives the video stream on its sink pad and submits it to Weston. Weston displays, on a local display device, the video stream captured from the camera with bounding boxes drawn over the allowed number of objects identified in the scene. |

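If one of the plugins is not installed, gst-launch-1.0 reports an erroneous-pipeline error instead of starting the use case. The following is a minimal pre-flight sketch, assuming bash and the standard `gst-inspect-1.0` tool are available on the target. It checks every element that the qtivcomposer pipeline uses. The `check_elements` function and the `COMPOSER_ELEMENTS` array are hypothetical names. `gst-inspect-1.0 --exists` exits with status 0 only when the named element is registered. For the overlay pipeline, replace qtivcomposer with qtimetamux and qtivoverlay.

```shell
#!/usr/bin/env bash
# Sketch: report which elements of the qtivcomposer pipeline are missing.
# check_elements and COMPOSER_ELEMENTS are hypothetical names.

check_elements() {
  local element status=0
  for element in "$@"; do
    # gst-inspect-1.0 --exists exits with 0 only if the element is registered;
    # a missing gst-inspect-1.0 binary also counts as "missing".
    if ! gst-inspect-1.0 --exists "$element" >/dev/null 2>&1; then
      echo "missing: $element"
      status=1
    fi
  done
  return "$status"
}

# Elements used by the qtivcomposer pipeline in this topic.
COMPOSER_ELEMENTS=(qtiqmmfsrc queue tee qtivcomposer waylandsink
  qtimlvconverter qtimltflite qtimlpostprocess)

# Usage: check_elements "${COMPOSER_ELEMENTS[@]}" && echo "all elements found"
```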
Last Published: Mar 27, 2026