# Object detection and display with Neural Processing SDK

These use cases run the `foot_track_net-person-foot-detection-w8a8.dlc` object detection model with the Qualcomm Neural Processing SDK to detect objects in a video stream read from a file. The bounding boxes of the detected objects are then either overlaid on the video or composed with it, and the results are displayed.

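Download the [Foot Track Net](https://aihub.qualcomm.com/iot/models/foot_track_net) model in Qualcomm AI Runtime format with w8a8 precision from Qualcomm AI Hub. The commands below expect it at `/etc/models/foot_track_net-person-foot-detection-w8a8.dlc`.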
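
Before you launch either pipeline, it helps to confirm that the model, label, settings, and video files are where the commands expect them. The following Bash sketch uses the default paths from this topic's commands. `check_assets` and `DEFAULT_ASSETS` are hypothetical names introduced only for illustration and are not part of the Qualcomm tooling.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the Qualcomm tooling).
# Paths mirror the ones used by the gst-launch-1.0 commands in this topic;
# adjust them if you copied the files elsewhere.
DEFAULT_ASSETS=(
  /etc/media/video.mp4
  /etc/models/foot_track_net-person-foot-detection-w8a8.dlc
  /etc/labels/foot_track_net.json
  /etc/labels/foot_track_net_settings.json
)

# check_assets FILE...
# Reports every missing or unreadable file on stderr and returns the number
# of such files (0 means everything is in place).
check_assets() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing: $f" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

On the target, `check_assets "${DEFAULT_ASSETS[@]}" && echo "all assets found"` prints the confirmation only when every file is readable.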

## Use qtivoverlay plugin to apply bounding box overlay

Run the use case on the target device:

```shell
gst-launch-1.0 --gst-debug=2 filesrc location=/etc/media/video.mp4 ! qtdemux ! queue ! h264parse ! \
    v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! tee name=split \
    split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true \
    split. ! queue ! qtimlvconverter ! queue ! \
    qtimlsnpe delegate=dsp tensors="<heatmap,bbox,landmark,landmark_visibility>" model=/etc/models/foot_track_net-person-foot-detection-w8a8.dlc ! queue ! \
    qtimlpostprocess name=stage_01_postproc results=10 module=qpd labels=/etc/labels/foot_track_net.json settings=/etc/labels/foot_track_net_settings.json ! \
    text/x-raw ! queue ! metamux.
```
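
If the video, model, or label files are stored elsewhere, you can keep the same pipeline in a Bash array and substitute each path without re-editing the long command. This is a sketch. The element names and properties are copied unchanged from the command above. The `VIDEO`, `MODEL`, `LABELS`, and `SETTINGS` variables and the `OVERLAY_PIPELINE` array are hypothetical names. Each array element becomes one argument to gst-launch-1.0, in the same way the shell splits the original command.

```shell
#!/usr/bin/env bash
# Hypothetical variables; defaults are the paths from the command above.
VIDEO=${VIDEO:-/etc/media/video.mp4}
MODEL=${MODEL:-/etc/models/foot_track_net-person-foot-detection-w8a8.dlc}
LABELS=${LABELS:-/etc/labels/foot_track_net.json}
SETTINGS=${SETTINGS:-/etc/labels/foot_track_net_settings.json}

# One array element per gst-launch-1.0 argument. Quoting keeps values such as
# the tensors list intact, exactly as the quotes in the original command do.
OVERLAY_PIPELINE=(
  filesrc "location=$VIDEO" ! qtdemux ! queue ! h264parse
  ! v4l2h264dec capture-io-mode=4 output-io-mode=4
  ! video/x-raw,format=NV12 ! queue ! tee name=split
  split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay
  ! queue ! waylandsink fullscreen=true
  split. ! queue ! qtimlvconverter ! queue
  ! qtimlsnpe delegate=dsp "tensors=<heatmap,bbox,landmark,landmark_visibility>" "model=$MODEL"
  ! queue ! qtimlpostprocess name=stage_01_postproc results=10 module=qpd
  "labels=$LABELS" "settings=$SETTINGS"
  ! text/x-raw ! queue ! metamux.
)

# On the target device:
#   gst-launch-1.0 --gst-debug=2 "${OVERLAY_PIPELINE[@]}"
```

For example, exporting `VIDEO=/path/to/clip.mp4` before you source the script replaces only the input file. The path is a placeholder.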

To stop the use case, press **CTRL + C**.
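
For unattended runs, you can send the same interrupt that CTRL + C delivers (SIGINT) automatically after a fixed time. The following is a minimal sketch that assumes the `timeout` utility from GNU coreutils is available on the target. `run_for` is a hypothetical helper. The gst-launch-1.0 option `-e` (`--eos-on-shutdown`) makes the pipeline send end-of-stream before it shuts down on the interrupt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and interrupt it after SECONDS with
# SIGINT, as pressing CTRL + C would.
# Assumes GNU coreutils timeout, which exits with 124 when the limit is hit.

# run_for SECONDS COMMAND [ARGS...]
run_for() {
  local secs=$1 rc
  shift
  timeout -s INT "$secs" "$@"
  rc=$?
  # Hitting the time limit is the expected way to stop, so report success.
  if [ "$rc" -eq 124 ]; then
    return 0
  fi
  # Otherwise pass through the command's own exit status.
  return "$rc"
}

# Example on the target (pipeline arguments from the command above):
#   run_for 30 gst-launch-1.0 -e --gst-debug=2 filesrc location=/etc/media/video.mp4 ! qtdemux ! <remaining pipeline>
```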

The following figure shows the flow of the use case execution:

- Identifies objects in the scene from a video stream read through a file source.
- Overlays the bounding boxes on the detected objects using overlaylib.
- Displays the results on a local display.

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="755" height="444.133985996246338" viewbox="0 0 755 444.133985996246338" aria-label="../../_images/pipeline_bounding_overlay.svg">
  <defs>
    <style>.svg-1 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-1 .cls-2 { fill: #fff; font-size: 16px }
.svg-1 .cls-2,.svg-1 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-1 .cls-4 { fill: #007884 }
.svg-1 .cls-5 { fill: #d2d7e1 }
.svg-1 .cls-6 { fill: #2a2aea }
.svg-1 .cls-3 { font-size: 14px }
.svg-1 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g>
    <rect class="cls-7" x=".5" y=".500091552734375" width="754" height="443.1337890625" rx="7.499999999999983" ry="7.499999999999983"></rect>
    <path class="cls-5" d="M747,1c3.859802246090112,0,7,3.14019775390625,7,7v428.133986473083496c0,3.859799385070801-3.140197753909888,6.999999523162842-7,6.999999523162842H8c-3.85980224609375,0-7-3.140200138092041-7-6.999999523162842V8c0-3.85980224609375,3.14019775390625-7,7-7h739M747,0H8C3.581695556640625,0,0,3.581695556640625,0,8v428.133986473083496c0,4.418299674987793,3.581695556640625,7.999999523162842,8,7.999999523162842h739c4.418273925779431,0,8-3.581699848175049,8-7.999999523162842V8c0-4.418304443359375-3.581726074220569-8-8-8h0Z"></path>
  </g>
  <g>
    <g>
      <text class="cls-3" transform="translate(557.492919921875 420.225494384765625)"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect class="cls-6" x="537.241901596624302" y="408.134010127494548" width="16" height="16" rx="2" ry="2"></rect>
    </g>
    <g>
      <text class="cls-3" transform="translate(656.07470703125 420.225494384765625)"><tspan x="0" y="0">Open source</tspan></text>
      <rect class="cls-4" x="635.823683009481101" y="408.134010127494548" width="16" height="16" rx="2" ry="2"></rect>
    </g>
  </g>
  <g>
    <g>
      <line class="cls-1" x1="100" y1="163.134002685545966" x2="100" y2="182.393035888670966"></line>
      <polygon points="96.01092529296875 181.225830078125 100 188.134002685545966 103.98907470703125 181.225830078125 96.01092529296875 181.225830078125"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="20" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(88.910197734832764 142.6661376953125)"><tspan x="0" y="0">tee</tspan></text>
    </g>
    <g>
      <rect class="cls-6" x="20" y="188.134008498590447" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(44.527485370635986 217.809967041015625)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="100" y1="238.134002685545966" x2="100" y2="257.393020629882812"></line>
      <polygon points="96.01092529296875 256.225830078125 100 263.134002685545966 103.98907470703125 256.225830078125 96.01092529296875 256.225830078125"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="20" y="263.134008498590447" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(64.679829120635986 292.809967041015625)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="100" y1="313.134002685545966" x2="100" y2="332.393028259277344"></line>
      <polygon points="96.01092529296875 331.225830078125 100 338.134010314941406 103.98907470703125 331.225830078125 96.01092529296875 331.225830078125"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="20" y="338.134008498589537" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(37.722797870635986 367.809967041015625)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="180" y1="139.159423828125" x2="199.259002685546875" y2="139.159423828125"></line>
      <polygon points="198.091827392578125 143.14849853515625 205 139.159423828125 198.091827392578125 135.170379638670966 198.091827392578125 143.14849853515625"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="205" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(242.296905517578125 143.835296630859375)"><tspan x="0" y="0">qtimetamux</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="365" y1="139.159423828125" x2="384.259033203125" y2="139.159423828125"></line>
      <polygon points="383.091796875 143.14849853515625 390 139.159423828125 383.091796875 135.170379638670966 383.091796875 143.14849853515625"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="390" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(431.56256103515625 142.6661376953125)"><tspan x="0" y="0">qtivoverlay</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="550" y1="139.159423828125" x2="569.259033203125" y2="139.159423828125"></line>
      <polygon points="568.091796875 143.14849853515625 575 139.159423828125 568.091796875 135.170379638670966 568.091796875 143.14849853515625"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="575" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(611.09771728515625 142.6661376953125)"><tspan x="0" y="0">waylandsink</tspan></text>
    </g>
    <g>
      <polyline class="cls-1" points="180 365.497726440429688 285 365.497734069824219 285 169.79998779296875"></polyline>
      <polygon points="288.98907470703125 170.967193603515625 285 164.05902099609375 281.01092529296875 170.967193603515625 288.98907470703125 170.967193603515625"></polygon>
    </g>
    <rect class="cls-4" x="20" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(78.082072734832764 52.620574951171875)"><tspan x="0" y="0">filesrc</tspan></text>
    <g>
      <line class="cls-1" x1="180" y1="49.113861083984375" x2="199.259002685546875" y2="49.113861083984375"></line>
      <polygon points="198.091827392578125 53.102935791015625 205 49.113861083984375 198.091827392578125 45.12481689453125 198.091827392578125 53.102935791015625"></polygon>
    </g>
    <rect class="cls-4" x="205" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(253.703155517578125 52.620574951171875)"><tspan x="0" y="0">qtdemux</tspan></text>
    <g>
      <line class="cls-1" x1="365" y1="49.113861083984375" x2="384.259033203125" y2="49.113861083984375"></line>
      <polygon points="383.091796875 53.102935791015625 390 49.113861083984375 383.091796875 45.12481689453125 383.091796875 53.102935791015625"></polygon>
    </g>
    <rect class="cls-4" x="390" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(432.20703125 52.620574951171875)"><tspan x="0" y="0">h264parse</tspan></text>
    <g>
      <line class="cls-1" x1="550" y1="49.113861083984375" x2="569.259033203125" y2="49.113861083984375"></line>
      <polygon points="568.091796875 53.102935791015625 575 49.113861083984375 568.091796875 45.12481689453125 568.091796875 53.102935791015625"></polygon>
    </g>
    <g>
      <polyline class="cls-1" points="99.999755859375 108.41845703125 99.999755859375 94.159423828125 655 94.159423828125 655 74.159423828125"></polyline>
      <polygon points="103.98883056640625 107.251251220703125 99.999755859375 114.159423828125 96.010711669921875 107.251251220703125 103.98883056640625 107.251251220703125"></polygon>
    </g>
    <rect class="cls-4" x="575" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(609.37896728515625 52.620574951171875)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
  </g>
</svg>

**Figure: Pipeline for bounding box overlay**

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| File source: filesrc | <ol class="arabic simple"><br><li><p>Reads the video stream from the file using filesrc, followed by qtdemux, which demultiplexes the stream.</p></li><br><li><p>Uses tee, after the decoder, to split the decoded stream into a display branch and an inferencing branch.</p></li><br></ol> |
| h264parse | Parses the H.264 video. |
| [v4l2h264dec](https://docs.qualcomm.com/doc/80-80022-50/topic/v4l2h264dec.html) | Decodes the video. |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimlvconverter.html) | <ol class="arabic"><br><li><p>Receives the video stream on its sink pad.</p></li><br><li><p>Performs preprocessing:</p><ul class="simple"><br><li><p>Color conversion</p></li><br><li><p>Scaling down/up</p></li><br><li><p>Normalization on the stream data when the model expects the floating point values as input</p></li><br></ul><br></li><br><li><p>Converts the video stream to a tensor stream on its source pad.</p><br><p>The object detection model uses this tensor stream for inferencing.</p><br></li><br></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimlsnpe.html) | <ol class="arabic simple"><br><li><p>Loads the object detection model.</p></li><br><li><p>Modifies the graph for the chosen delegate.</p></li><br><li><p>Receives the tensor stream on its sinkpad.</p></li><br><li><p>Runs the inference and produces a tensor stream with the object detection results on its source pad.</p></li><br></ol> |
| **Postprocessing** | **Postprocessing** |
| [qtimlpostprocess](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimlpostprocess.html) | <ol class="arabic"><br><li><p>Receives the inference tensors from the object detection model.</p></li><br><li><p>Converts the inference tensors on its sinkpad into formats, such as video or text, that the multimedia plugins can process later.</p></li><br><li><p>Applies the configured threshold and limits the output to the chosen number of results (<code>results=10</code> in this pipeline).</p></li><br><li><p>Loads the corresponding module for the detection model.</p><br><p>In this use case, qtimlpostprocess does the following:</p><ol class="loweralpha simple"><br><li><p>Loads the qpd submodule (<code>module=qpd</code>).</p></li><br><li><p>Produces the results as text structures.</p></li><br><li><p>Sends them to the data sink pad of qtimetamux.</p></li><br></ol><br></li><br></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimetamux.html) | <ol class="arabic simple"><br><li><p>Receives, on its sink pads, the video stream and a text stream that carries the bounding box results for that video stream.</p></li><br><li><p>Produces GST buffers that contain the video frames from its video sink pad.</p></li><br><li><p>Attaches the bounding boxes from its data sink pad to these buffers as GstVideoRegionOfInterest metadata (meta muxing) and outputs them on its source pad.</p></li><br></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-80022-50/topic/qtioverlay.html) | <ol class="arabic simple"><br><li><p>Receives the multiplexed stream.</p></li><br><li><p>Draws the bounding boxes from the buffer metadata onto the video frame using CL.</p></li><br><li><p>Produces GST buffers with the overlays on its source pad.</p></li><br></ol> |
| **Output** | **Output** |
| [waylandsink](https://docs.qualcomm.com/doc/80-80022-50/topic/waylandsink.html) | <ol class="arabic simple"><br><li><p>Receives the video stream on its sinkpad.</p></li><br><li><p>Submits the video stream to Weston.</p></li><br><li><p>Weston renders the video stream, with the bounding boxes drawn for the objects in the scene, on a local display device.</p></li><br></ol> |
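
The pipeline starts only if every element in the table is registered with GStreamer on the target. The following sketch checks for them. It relies on gst-inspect-1.0 returning a non-zero exit status when it cannot find an element. `OVERLAY_ELEMENTS` and `missing_elements` are hypothetical names.

```shell
#!/usr/bin/env bash
# Elements used by the overlay pipeline in this section.
OVERLAY_ELEMENTS=(filesrc qtdemux h264parse v4l2h264dec tee queue
  qtimlvconverter qtimlsnpe qtimlpostprocess qtimetamux qtivoverlay waylandsink)

# missing_elements NAME...
# Prints, one per line, each name that gst-inspect-1.0 cannot resolve
# (also when gst-inspect-1.0 itself is not installed).
missing_elements() {
  local el
  for el in "$@"; do
    gst-inspect-1.0 "$el" >/dev/null 2>&1 || echo "$el"
  done
}

# On the target, no output means every element is available:
#   missing_elements "${OVERLAY_ELEMENTS[@]}"
```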

## Use qtivcomposer to mix original frame with bounding box mask

Run the use case on the target device:

```shell
gst-launch-1.0 --gst-debug=2 filesrc location=/etc/media/video.mp4 ! qtdemux ! queue ! h264parse ! \
    v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! tee name=split \
    split. ! queue ! qtivcomposer name=mixer ! queue ! waylandsink sync=true fullscreen=true \
    split. ! queue ! qtimlvconverter ! queue ! \
    qtimlsnpe delegate=dsp tensors="<heatmap,bbox,landmark,landmark_visibility>" model=/etc/models/foot_track_net-person-foot-detection-w8a8.dlc ! queue ! \
    qtimlpostprocess name=stage_01_postproc results=10 module=qpd labels=/etc/labels/foot_track_net.json settings=/etc/labels/foot_track_net_settings.json ! \
    video/x-raw,format=BGRA,width=960,height=540 ! queue ! mixer.
```
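
You can also keep this pipeline in a Bash array so that its paths and the size of the bounding box frames are easy to change. This is a sketch. `VIDEO`, `MODEL`, `LABELS`, `SETTINGS`, `MASK_WIDTH`, `MASK_HEIGHT`, and `COMPOSER_PIPELINE` are hypothetical names, and everything else is copied from the command above. The original command uses only 960x540 for the frames that qtimlpostprocess passes to qtivcomposer, so treat other sizes as untested.

```shell
#!/usr/bin/env bash
# Hypothetical variables; defaults reproduce the command above.
VIDEO=${VIDEO:-/etc/media/video.mp4}
MODEL=${MODEL:-/etc/models/foot_track_net-person-foot-detection-w8a8.dlc}
LABELS=${LABELS:-/etc/labels/foot_track_net.json}
SETTINGS=${SETTINGS:-/etc/labels/foot_track_net_settings.json}
MASK_WIDTH=${MASK_WIDTH:-960}
MASK_HEIGHT=${MASK_HEIGHT:-540}

# One array element per gst-launch-1.0 argument.
COMPOSER_PIPELINE=(
  filesrc "location=$VIDEO" ! qtdemux ! queue ! h264parse
  ! v4l2h264dec capture-io-mode=4 output-io-mode=4
  ! video/x-raw,format=NV12 ! queue ! tee name=split
  split. ! queue ! qtivcomposer name=mixer ! queue ! waylandsink sync=true fullscreen=true
  split. ! queue ! qtimlvconverter ! queue
  ! qtimlsnpe delegate=dsp "tensors=<heatmap,bbox,landmark,landmark_visibility>" "model=$MODEL"
  ! queue ! qtimlpostprocess name=stage_01_postproc results=10 module=qpd
  "labels=$LABELS" "settings=$SETTINGS"
  # Caps of the bounding-box frames handed to the mixer.
  ! "video/x-raw,format=BGRA,width=$MASK_WIDTH,height=$MASK_HEIGHT" ! queue ! mixer.
)

# On the target device:
#   gst-launch-1.0 --gst-debug=2 "${COMPOSER_PIPELINE[@]}"
```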

To stop the use case, press **CTRL + C**.

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream read through a file source.
2. Composes the following using qtivcomposer:

    1. The bounding boxes over the detected objects.
    2. The original video stream.
3. Displays the results on a local display.

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="755" height="444.133987426757812" viewbox="0 0 755 444.133987426757812" aria-label="../../_images/pipeline_bounding_mask_qtivcomposer.svg">
  <defs>
    <style>.svg-2 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-2 .cls-2 { fill: #fff; font-size: 16px }
.svg-2 .cls-2,.svg-2 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-2 .cls-4 { fill: #007884 }
.svg-2 .cls-5 { fill: #d2d7e1 }
.svg-2 .cls-6 { fill: #2a2aea }
.svg-2 .cls-3 { font-size: 14px }
.svg-2 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g>
    <rect class="cls-7" x=".50006103515625" y=".49981689453125" width="753.99951171875" height="443.13427734375" rx="7.499999999999982" ry="7.499999999999982"></rect>
    <path class="cls-5" d="M747,1c3.85980224609375,0,7,3.14019775390625,7,7v428.133987426757812c0,3.85980224609375-3.14019775390625,7-7,7H8c-3.85980224609375,0-7-3.14019775390625-7-7V8c0-3.85980224609375,3.14019775390625-7,7-7h739M747,0H8C3.581695556640625,0,0,3.581695556640625,0,8v428.133987426757812c0,4.418304443359375,3.581695556640625,8,8,8h739c4.41827392578125,0,8-3.581695556640625,8-8V8c0-4.418304443359375-3.58172607421875-8-8-8h0Z"></path>
  </g>
  <g>
    <g>
      <text class="cls-3" transform="translate(557.492917268173187 420.225477514799422)"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect class="cls-6" x="537.241898942922489" y="408.134023585559589" width="16" height="16" rx="2" ry="2"></rect>
    </g>
    <g>
      <text class="cls-3" transform="translate(656.074704377548187 420.225477514799422)"><tspan x="0" y="0">Open source</tspan></text>
      <rect class="cls-4" x="635.823680355779288" y="408.134023585559589" width="16" height="16" rx="2" ry="2"></rect>
    </g>
  </g>
  <g>
    <g>
      <line class="cls-1" x1="99.999997346298187" y1="163.134016333158797" x2="99.999997346298187" y2="182.393049536283797"></line>
      <polygon points="96.010922639266937 181.225843725736922 99.999997346298187 188.134016333158797 103.989072053329437 181.225843725736922 96.010922639266937 181.225843725736922"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="19.999997346298187" y="114.159446357043635" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(88.910195557968109 142.666151342924422)"><tspan x="0" y="0">tee</tspan></text>
    </g>
    <g>
      <rect class="cls-6" x="19.999997346298187" y="188.134021956656397" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(44.527493684188812 217.809980688627547)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="99.999997346298187" y1="238.134016333158797" x2="99.999997346298187" y2="257.393034277494735"></line>
      <polygon points="96.010922639266937 256.225843725736922 99.999997346298187 263.134016333158797 103.989072053329437 256.225843725736922 96.010922639266937 256.225843725736922"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="19.999997346298187" y="263.134021956656397" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(64.679837434188812 292.809980688627547)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="99.999997346298187" y1="313.134016333158797" x2="99.999997346298187" y2="332.393034277494735"></line>
      <polygon points="96.010922639266937 331.225843725736922 99.999997346298187 338.134023962553329 103.989072053329437 331.225843725736922 96.010922639266937 331.225843725736922"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="19.999997346298187" y="338.134021956656397" width="160" height="49.999999999998181" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(37.722806184188812 367.809980688627547)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="179.999997346298187" y1="139.159437475736922" x2="199.259000031845062" y2="139.159437475736922"></line>
      <polygon points="198.091824738876312 143.148512182768172 204.999997346298187 139.159437475736922 198.091824738876312 135.170393286283797 198.091824738876312 143.148512182768172"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="204.999997346298187" y="114.159446357043635" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(236.140652863876312 143.835340796049422)"><tspan x="0" y="0">qtivcomposer</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="364.999997346298187" y1="139.159437475736922" x2="384.259030549423187" y2="139.159437475736922"></line>
      <polygon points="383.091794221298187 143.148512182768172 389.999997346298187 139.159437475736922 383.091794221298187 135.170393286283797 383.091794221298187 143.148512182768172"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="389.999997346298187" y="114.159446357043635" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(426.097714631454437 142.666151342924422)"><tspan x="0" y="0">waylandsink</tspan></text>
    </g>
    <g>
      <polyline class="cls-1" points="179.999997346298187 365.49774008804161 284.999997346298187 365.497747717436141 284.999997346298187 169.800001440580672"></polyline>
      <polygon points="288.989072053329437 170.967207251127547 284.999997346298187 164.059034643705672 281.010922639266937 170.967207251127547 288.989072053329437 170.967207251127547"></polygon>
    </g>
    <rect class="cls-4" x="19.999997346298187" y="24.113881388891059" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(78.082070557968109 52.620588598783797)"><tspan x="0" y="0">filesrc</tspan></text>
    <g>
      <line class="cls-1" x1="179.999997346298187" y1="49.113881983997089" x2="199.259000031845062" y2="49.113881983997089"></line>
      <polygon points="198.091824738876312 53.102945246936542 204.999997346298187 49.113881983997089 198.091824738876312 45.124816813709003 198.091824738876312 53.102945246936542"></polygon>
    </g>
    <rect class="cls-4" x="204.999997346298187" y="24.113881388891059" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(253.703152863876312 52.620588598783797)"><tspan x="0" y="0">qtdemux</tspan></text>
    <g>
      <line class="cls-1" x1="364.999997346298187" y1="49.113881983997089" x2="384.259030549423187" y2="49.113881983997089"></line>
      <polygon points="383.091794221298187 53.102945246936542 389.999997346298187 49.113881983997089 383.091794221298187 45.124816813709003 383.091794221298187 53.102945246936542"></polygon>
    </g>
    <rect class="cls-4" x="389.999997346298187" y="24.113881388891059" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(432.207028596298187 52.620588598783797)"><tspan x="0" y="0">h264parse</tspan></text>
    <g>
      <line class="cls-1" x1="549.999997346298187" y1="49.113881983997089" x2="569.259030549423187" y2="49.113881983997089"></line>
      <polygon points="568.091794221298187 53.102945246936542 574.999997346298187 49.113881983997089 568.091794221298187 45.124816813709003 568.091794221298187 53.102945246936542"></polygon>
    </g>
    <g>
      <polyline class="cls-1" points="99.999753205673187 108.418462672472742 99.999753205673187 94.159446635486347 654.999997346298187 94.159446635486347 654.999997346298187 74.159446397067768"></polyline>
      <polygon points="103.988827912704437 107.251272120715839 99.999753205673187 114.159444728138624 96.010693757431 107.251272120715839 103.988827912704437 107.251272120715839"></polygon>
    </g>
    <rect class="cls-4" x="574.999997346298187" y="24.113881388891059" width="160" height="50" rx="4.000000000000019" ry="4.000000000000019"></rect>
    <text class="cls-2" transform="translate(609.378964631454437 52.620588598783797)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
  </g>
</svg>

**Figure: Pipeline for bounding box mask with qtivcomposer**

The following table provides the sequential processing stages of the pipeline execution:

| Process | Description |
| --- | --- |
| File source: filesrc | <ol class="arabic simple"><br><li><p>Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.</p></li><br><li><p>Uses tee to split the stream for inferencing.</p></li><br></ol> |
| h264parse | Parses the H.264 video. |
| [v4l2h264dec](https://docs.qualcomm.com/doc/80-80022-50/topic/v4l2h264dec.html) | Decodes the video. |
| **Preprocessing** | **Preprocessing** |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimlvconverter.html) | <ol class="arabic"><br><li><p>Receives the video stream on its sink pad.</p></li><br><li><p>Performs preprocessing:</p><ul class="simple"><br><li><p>Color conversion</p></li><br><li><p>Scaling down/up</p></li><br><li><p>Normalization on the stream data when the model expects the floating point values as input</p></li><br></ul><br></li><br><li><p>Converts the video stream to a tensor stream on its source pad.</p><br><p>The object detection model uses this tensor stream for inferencing.</p><br></li><br></ol> |
| **Inferencing** | **Inferencing** |
| [qtimlsnpe](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimlsnpe.html) | <ol class="arabic simple"><br><li><p>Loads the object detection model.</p></li><br><li><p>Modifies the graph for the chosen delegate.</p></li><br><li><p>Receives the tensor stream on its sinkpad.</p></li><br><li><p>Runs the inference and produces a tensor stream with the object detection results on its source pad.</p></li><br></ol> |
| **Postprocessing** | **Postprocessing** |
| [qtimlpostprocess](https://docs.qualcomm.com/doc/80-80022-50/topic/qtimlpostprocess.html) | <ol class="arabic"><br><li><p>Receives the inference tensors from the object detection model.</p></li><br><li><p>Converts the inference tensors on its sinkpad into formats, such as video or text, that the multimedia plugins can process later.</p></li><br><li><p>Applies the configured threshold and limits the output to the chosen number of results (<code>results=10</code> in this pipeline).</p></li><br><li><p>Loads the corresponding module for the detection model.</p><br><p>In this use case, qtimlpostprocess does the following:</p><ol class="loweralpha simple"><br><li><p>Loads the qpd submodule (<code>module=qpd</code>).</p></li><br><li><p>Produces BGRA video frames (960x540 in this pipeline) that contain only the bounding boxes, so that they can be composed over the objects in the original frames.</p></li><br><li><p>Sends them to a sink pad of qtivcomposer.</p></li><br></ol><br></li><br></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-80022-50/topic/qtivcomposer.html) | <ol class="arabic simple"><br><li><p>Receives the original video stream and the bounding box video stream on its sink pads.</p></li><br><li><p>Composes the two streams into a single video stream on its source pad.</p></li><br></ol> |
| **Output** | **Output** |
| [waylandsink](https://docs.qualcomm.com/doc/80-80022-50/topic/waylandsink.html) | <ol class="arabic simple"><br><li><p>Receives the video stream on its sinkpad.</p></li><br><li><p>Submits the video stream to Weston.</p></li><br><li><p>Weston displays the following on a local display device:</p><ol class="loweralpha simple"><br><li><p>The video stream decoded from the file.</p></li><br><li><p>The bounding boxes drawn over up to the configured number of objects identified in the scene.</p></li><br></ol><br></li><br></ol> |

Last Published: May 14, 2026


Source: [https://docs.qualcomm.com/doc/80-80022-50/topic/single-camera-stream-with-object-detection-and-display-with-mobilenet-v2-ssd.html](https://docs.qualcomm.com/doc/80-80022-50/topic/single-camera-stream-with-object-detection-and-display-with-mobilenet-v2-ssd.html)