# Object detection and display with Neural Processing SDK

Source: [https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-display-with-mobilenet-v2-ssd.html](https://docs.qualcomm.com/doc/80-70022-50/topic/single-camera-stream-with-object-detection-and-display-with-mobilenet-v2-ssd.html)

These use cases run a yolox.dlc object detection model with the Qualcomm Neural Processing SDK to identify objects in a camera stream. They overlay or compose bounding boxes over the detected objects, and then display the results.

Download the [YOLOX](https://aihub.qualcomm.com/iot/models/yolox?searchTerm=yolox%29) Qualcomm AI runtime w8a8 precision model from AI hub. The YOLOX model uses the YOLOv8 postprocessing module.

## Use qtivoverlay plugin to apply bounding box overlay

Run the use case on the target device:

```shell
gst-launch-1.0 -e \
qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/yolox-yolo-x-w8a8.dlc layers="" ! queue ! \
qtimlpostprocess settings="{\"confidence\": 70.0}" results=5 module=yolov8 labels=/etc/labels/yolox.json ! text/x-raw ! queue ! metamux.
```

To stop the use case, press CTRL + C.

The following figure shows the flow of the use case execution:

- Identify an object in a scene from a video stream coming through a camera source.
- Overlay the bounding boxes over the detected objects using overlaylib.
- Display the results on a local display.

Figure: Pipeline for bounding box overlay

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) |

Collects the video stream (source) and creates two copies of
the source:
- One stream is sent to the qtimetamux plugin to retain
  the video stream.
- The other stream is sent to an ML inferencing
  pipeline.

| | qtimlvconverter |

Receives the video stream on its sink pad.

Performs preprocessing:
- Color conversion
- Scaling down/up
- Normalization of the stream data when the model
  expects floating-point values as input

Converts the video stream to a tensor stream on its source
pad. The object detection model uses this tensor stream for
inferencing.

| | qtimlsnpe |

Loads the object detection model.

Modifies the graph for the chosen delegate.

Receives the tensor stream on its sink pad.

Runs the inference and produces a tensor stream with the
object detection results on its source pad.

| | qtimlpostprocess |

Receives the inference tensors from the object detection
model.

Converts the inference tensors on its sink pad into formats
such as video or text that the multimedia plugins can process
downstream.

Applies the confidence threshold and limits the output to the chosen number of results.

Loads the module that corresponds to the detection model.
In this use case, qtimlpostprocess does the following:
1. Loads the YOLOv8 submodule.
2. Produces the results as text structures.
3. Sends them to the sink pad of qtimetamux.

| | [qtimetamux](https://docs.qualcomm.com/doc/80-70022-50/topic/qtimetamux.html) |

Receives the video stream and the text stream with the
corresponding bounding box results on its sink pads.

Produces GST buffers with contents of the video stream from
its sink pad.

Adds the bounding boxes from the data sink pad as
GstVideoRegionOfInterest metadata to the GST buffers
(meta muxing) on its source pad.

| | [qtivoverlay](https://docs.qualcomm.com/doc/80-70022-50/topic/qtioverlay.html) |

Receives the multiplexed stream.

Overlays the bounding boxes on the VideoFrame using CL.

Produces GST buffers with overlays on its source pad.

| | waylandsink |

Receives the video stream on its sink pad.

Submits the video stream to Weston.

Weston renders the video stream and bounding boxes generated
for the objects in that scene on a local display
device.

|

## Use qtivcomposer to mix original frame with bounding box mask

Run the use case on the target device:

```shell
gst-launch-1.0 -e --gst-debug=2 \
qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp model=/etc/models/yolox-yolo-x-w8a8.dlc layers="" ! queue ! \
qtimlpostprocess settings="{\"confidence\": 70.0}" results=5 module=yolov8 labels=/etc/labels/yolox.json ! video/x-raw,width=640,height=360 ! queue ! mixer.
```

To stop the use case, press CTRL + C.

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream coming through a camera source.
2. Composes the following using qtivcomposer:
   1. Bounding boxes over the detected objects.
   2. Original video stream.
3. Displays the results.

Figure: Pipeline for bounding box mask with qtivcomposer

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70022-50/topic/qtiqmmfsrc.html) |

Collects the video stream (source) and creates two copies of
the source:
1. One stream is sent to the qtivcomposer plugin to retain
  the video stream.
2. The other stream is sent to an ML inferencing
  pipeline.

| | qtimlvconverter |

Receives the video stream on its sink pad.

Performs preprocessing:
- Color conversion.
- Scaling down/up.
- Normalization of the stream data when the model
  expects floating-point values as input.

Converts the video stream to a tensor stream on its source
pad. The object detection model uses this tensor stream for
inferencing.

| | qtimlsnpe |

Loads the object detection model.

Modifies the graph for the chosen delegate.

Receives the tensor stream on its sink pad.

Runs the inference and produces a tensor stream with the
object detection results on its source pad.

| | qtimlpostprocess |

Receives the inference tensors from the object detection
model.

Converts the inference tensors on its sink pad into formats
such as video or text that the multimedia plugins can process
downstream.

Applies the confidence threshold and limits the output to the chosen number of results.

Loads the module that corresponds to the detection model.
In this use case, qtimlpostprocess does the following:
1. Loads the YOLOv8 submodule.
2. Produces video frames with only bounding boxes that
  can be overlaid on objects.
3. Sends them to the sink pad of qtivcomposer.

| | [qtivcomposer](https://docs.qualcomm.com/doc/80-70022-50/topic/qtivcomposer.html) |

Receives the original video stream and the video stream with
bounding boxes on its sink pads.

On its source pads, produces content composed from the
video streams received on its sink pads.

| | waylandsink |

Receives the video stream on its sink pad.

Submits the video stream to Weston.

Weston displays the following on a local display device:
1. The video stream captured from the camera.
2. The bounding boxes drawn over the allowed number
  of objects identified in that scene.

|

## Known issue

The current model in AI hub isn't giving the expected output. The issue will be fixed in a future release.

**Parent Topic:** [Qualcomm Neural Processing SDK use cases](https://docs.qualcomm.com/doc/80-70022-50/topic/qualcomm-neural-processing-sdk-use-cases.html)

Last Published: Feb 20, 2026
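
Before launching either gst-launch-1.0 pipeline on this page, it may help to confirm that the model file, the labels file, and the GStreamer elements named in the commands are present on the target device. The following is a minimal sketch and not part of the documented use case. The helper names `missing_files` and `missing_elements` are hypothetical, and the sketch assumes `gst-inspect-1.0` (shipped with GStreamer) is on the PATH. The file paths and element names in the commented example are the ones used in the commands on this page.

```shell
#!/bin/sh
# Hypothetical helper: print one line per path that is not a regular file.
# Returns non-zero if at least one path is missing.
missing_files() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical helper: print one line per GStreamer element that
# gst-inspect-1.0 cannot find. If gst-inspect-1.0 itself is not installed,
# every element is reported as missing.
missing_elements() {
    status=0
    for element in "$@"; do
        if ! gst-inspect-1.0 --exists "$element" >/dev/null 2>&1; then
            echo "missing element: $element"
            status=1
        fi
    done
    return "$status"
}

# Example invocation on the target device:
# missing_files /etc/models/yolox-yolo-x-w8a8.dlc /etc/labels/yolox.json
# missing_elements qtiqmmfsrc qtimlvconverter qtimlsnpe qtimlpostprocess \
#     qtimetamux qtivoverlay qtivcomposer waylandsink
```

If both helpers print nothing and return zero, the files and elements that the commands reference are in place. This does not verify that the model produces correct detections.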