# Object detection and display with LiteRT

Source: [https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-object-detection-and-display.html](https://docs.qualcomm.com/doc/80-70023-50/topic/single-camera-stream-with-object-detection-and-display.html)

These use cases run a YOLOX LiteRT model to detect objects in a scene. Each use case either overlays the bounding boxes on the detected objects or composes them with the original frames, and then displays the results.

## Use qtivoverlay plugin to apply bounding box overlay

Run the use case on the target device:

    gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! \
    queue ! tee name=split split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true split. ! queue ! \
    qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
    external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolox_quantized.tflite ! queue ! \
    qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json ! text/x-raw ! queue ! metamux.

To stop the use case, press CTRL + C.
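Before launching, it can help to confirm that every element named in the command is installed on the target. The following is a minimal sketch, assuming `gst-inspect-1.0` is available on the device; the `check_elements` helper and the `ELEMENTS` list are illustrative names, not part of the SDK.

```shell
#!/bin/sh
# Elements named in the overlay pipeline above.
ELEMENTS="qtiqmmfsrc tee queue qtimetamux qtivoverlay waylandsink qtimlvconverter qtimltflite qtimlpostprocess"

# Print "ok <name>" or "missing <name>" for each element given as an argument.
# Returns 1 if any element is missing or if gst-inspect-1.0 is not installed.
check_elements() {
    status=0
    for element in "$@"; do
        if command -v gst-inspect-1.0 >/dev/null 2>&1 &&
           gst-inspect-1.0 --exists "$element" >/dev/null 2>&1; then
            echo "ok $element"
        else
            echo "missing $element"
            status=1
        fi
    done
    return "$status"
}

# Word splitting of ELEMENTS is intended here.
check_elements $ELEMENTS || echo "Install or enable the missing plugins before launching the pipeline."
```

If any element is reported as missing, the `gst-launch-1.0` command is likely to fail while parsing the pipeline description.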

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream coming from a camera source.
2. Overlays bounding boxes over the detected objects using overlaylib.
3. Displays the results.
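The three steps above map onto three parts of the launch command: the camera source with its `tee`, a display branch, and an inference branch that feeds results back into `qtimetamux`. The sketch below rebuilds the same command from named pieces to make that topology easier to read. The function and variable names are illustrative; the element properties are copied from the command above.

```shell
#!/bin/sh
# Sketch: the overlay pipeline split into named pieces (names are illustrative).
build_overlay_pipeline() {
    # Camera capture followed by a tee that feeds two branches.
    source_part='qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split'

    # Display branch: attach detection metadata, draw the boxes, show the frames.
    display_branch='split. ! queue ! qtimetamux name=metamux ! queue ! qtivoverlay ! queue ! waylandsink fullscreen=true'

    # Inference branch: preprocess, infer, postprocess to text, hand over to qtimetamux.
    infer_branch='split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/etc/models/yolox_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json ! text/x-raw ! queue ! metamux.'

    printf '%s %s %s\n' "$source_part" "$display_branch" "$infer_branch"
}

# Print the full command for review. To launch it instead, use:
#   eval "gst-launch-1.0 -e $(build_overlay_pipeline)"
printf 'gst-launch-1.0 -e %s\n' "$(build_overlay_pipeline)"
```

Keeping the branches in separate variables makes it simpler to change one stage, for example the model path, without retyping the whole command.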

Figure: Pipeline for bounding box overlay

<svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" width="755" height="444.133985996246338" viewbox="0 0 755 444.133985996246338">
  <defs>
    <style>.svg-1 .cls-1 { fill: none; stroke: #000; stroke-miterlimit: 10 }
.svg-1 .cls-2 { fill: #fff; font-size: 16px }
.svg-1 .cls-2,.svg-1 .cls-3 { font-family: Roboto-Regular, Roboto }
.svg-1 .cls-4 { fill: #007884 }
.svg-1 .cls-5 { fill: #d2d7e1 }
.svg-1 .cls-6 { fill: #2a2aea }
.svg-1 .cls-3 { font-size: 14px }
.svg-1 .cls-7 { fill: #fafafa }</style>
  </defs>
  <g>
    <rect class="cls-7" x=".5" y=".500091552734375" width="754" height="443.1337890625" rx="7.499999999999983" ry="7.499999999999983"></rect>
    <path class="cls-5" d="M747,1c3.859802246090112,0,7,3.14019775390625,7,7v428.133986473083496c0,3.859799385070801-3.140197753909888,6.999999523162842-7,6.999999523162842H8c-3.85980224609375,0-7-3.140200138092041-7-6.999999523162842V8c0-3.85980224609375,3.14019775390625-7,7-7h739M747,0H8C3.581695556640625,0,0,3.581695556640625,0,8v428.133986473083496c0,4.418299674987793,3.581695556640625,7.999999523162842,8,7.999999523162842h739c4.418273925779431,0,8-3.581699848175049,8-7.999999523162842V8c0-4.418304443359375-3.581726074220569-8-8-8h0Z"></path>
  </g>
  <g>
    <g>
      <text class="cls-3" transform="translate(557.492919921875 420.225494384765625)"><tspan x="0" y="0">Qualcomm </tspan></text>
      <rect class="cls-6" x="537.241901596624302" y="408.134010127494548" width="16" height="16" rx="2" ry="2"></rect>
    </g>
    <g>
      <text class="cls-3" transform="translate(656.07470703125 420.225494384765625)"><tspan x="0" y="0">Open source</tspan></text>
      <rect class="cls-4" x="635.823683009481101" y="408.134010127494548" width="16" height="16" rx="2" ry="2"></rect>
    </g>
  </g>
  <g>
    <g>
      <line class="cls-1" x1="100" y1="163.134002685545966" x2="100" y2="182.393035888670966"></line>
      <polygon points="96.01092529296875 181.225830078125 100 188.134002685545966 103.98907470703125 181.225830078125 96.01092529296875 181.225830078125"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="20" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(88.910197734832764 142.6661376953125)"><tspan x="0" y="0">tee</tspan></text>
    </g>
    <g>
      <rect class="cls-6" x="20" y="188.134008498590447" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(44.527485370635986 217.809967041015625)"><tspan x="0" y="0">qtimlvconverter</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="100" y1="238.134002685545966" x2="100" y2="257.393020629882812"></line>
      <polygon points="96.01092529296875 256.225830078125 100 263.134002685545966 103.98907470703125 256.225830078125 96.01092529296875 256.225830078125"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="20" y="263.134008498590447" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(64.679829120635986 292.809967041015625)"><tspan x="0" y="0">qtimlsnpe</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="100" y1="313.134002685545966" x2="100" y2="332.393028259277344"></line>
      <polygon points="96.01092529296875 331.225830078125 100 338.134010314941406 103.98907470703125 331.225830078125 96.01092529296875 331.225830078125"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="20" y="338.134008498589537" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(37.722797870635986 367.809967041015625)"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="180" y1="139.159423828125" x2="199.259002685546875" y2="139.159423828125"></line>
      <polygon points="198.091827392578125 143.14849853515625 205 139.159423828125 198.091827392578125 135.170379638670966 198.091827392578125 143.14849853515625"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="205" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(242.296905517578125 143.835296630859375)"><tspan x="0" y="0">qtimetamux</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="365" y1="139.159423828125" x2="384.259033203125" y2="139.159423828125"></line>
      <polygon points="383.091796875 143.14849853515625 390 139.159423828125 383.091796875 135.170379638670966 383.091796875 143.14849853515625"></polygon>
    </g>
    <g>
      <rect class="cls-6" x="390" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(431.56256103515625 142.6661376953125)"><tspan x="0" y="0">qtivoverlay</tspan></text>
    </g>
    <g>
      <line class="cls-1" x1="550" y1="139.159423828125" x2="569.259033203125" y2="139.159423828125"></line>
      <polygon points="568.091796875 143.14849853515625 575 139.159423828125 568.091796875 135.170379638670966 568.091796875 143.14849853515625"></polygon>
    </g>
    <g>
      <rect class="cls-4" x="575" y="114.159432898976775" width="160" height="50" rx="4" ry="4"></rect>
      <text class="cls-2" transform="translate(611.09771728515625 142.6661376953125)"><tspan x="0" y="0">waylandsink</tspan></text>
    </g>
    <g>
      <polyline class="cls-1" points="180 365.497726440429688 285 365.497734069824219 285 169.79998779296875"></polyline>
      <polygon points="288.98907470703125 170.967193603515625 285 164.05902099609375 281.01092529296875 170.967193603515625 288.98907470703125 170.967193603515625"></polygon>
    </g>
    <rect class="cls-4" x="20" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(78.082072734832764 52.620574951171875)"><tspan x="0" y="0">filesrc</tspan></text>
    <g>
      <line class="cls-1" x1="180" y1="49.113861083984375" x2="199.259002685546875" y2="49.113861083984375"></line>
      <polygon points="198.091827392578125 53.102935791015625 205 49.113861083984375 198.091827392578125 45.12481689453125 198.091827392578125 53.102935791015625"></polygon>
    </g>
    <rect class="cls-4" x="205" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(253.703155517578125 52.620574951171875)"><tspan x="0" y="0">qtdemux</tspan></text>
    <g>
      <line class="cls-1" x1="365" y1="49.113861083984375" x2="384.259033203125" y2="49.113861083984375"></line>
      <polygon points="383.091796875 53.102935791015625 390 49.113861083984375 383.091796875 45.12481689453125 383.091796875 53.102935791015625"></polygon>
    </g>
    <rect class="cls-4" x="390" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(432.20703125 52.620574951171875)"><tspan x="0" y="0">h264parse</tspan></text>
    <g>
      <line class="cls-1" x1="550" y1="49.113861083984375" x2="569.259033203125" y2="49.113861083984375"></line>
      <polygon points="568.091796875 53.102935791015625 575 49.113861083984375 568.091796875 45.12481689453125 568.091796875 53.102935791015625"></polygon>
    </g>
    <g>
      <polyline class="cls-1" points="99.999755859375 108.41845703125 99.999755859375 94.159423828125 655 94.159423828125 655 74.159423828125"></polyline>
      <polygon points="103.98883056640625 107.251251220703125 99.999755859375 114.159423828125 96.010711669921875 107.251251220703125 103.98883056640625 107.251251220703125"></polygon>
    </g>
    <rect class="cls-4" x="575" y="24.113867930825108" width="160" height="50" rx="4" ry="4"></rect>
    <text class="cls-2" transform="translate(609.37896728515625 52.620574951171875)"><tspan x="0" y="0">v4l2h264dec</tspan></text>
  </g>
</svg>

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70023-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_l2f_zgm_vbc"><li class="li">Captures the video stream (source) from the camera. The tee element then creates two copies of the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-display__ol_m2f_zgm_vbc"><li class="li">One stream is sent to the <a href="https://docs.qualcomm.com/doc/80-70023-50/topic/qtimetamux.html">qtimetamux</a> plugin to retain the video stream.</li><li class="li">The other stream is sent to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_xsf_q5l_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-display__ul_ff2_twl_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_ufn_2lm_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="ol"><li class="li">Receives the inference tensors from the object detection model on its sink pad.</li><li class="li">Converts the inference tensors into formats, such as video or text, that the multimedia plugins can process later.</li><li class="li">Applies the confidence threshold and keeps up to the chosen number of results.</li><li class="li">Loads the corresponding module for the detection model.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-display__ol_jcd_wnk_5bc"><li class="li">Loads the YOLOv8 submodule.</li><li class="li">Produces results as text structures.</li><li class="li">Sends them to the sink pad of qtimetamux.</li></ol></li></ol> |
| [qtimetamux](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimetamux.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_ll3_x5l_vbc"><li class="li">Receives the video stream and the text stream with the corresponding bounding box results on its sink pads.</li><li class="li">Produces GST buffers with the contents of the video stream from its sink pad.</li><li class="li">Adds the bounding boxes from the data sink pad as GstVideoRegionOfInterest to the GST buffer metadata (meta muxing) on its source pad.</li></ol> |
| [qtivoverlay](https://docs.qualcomm.com/doc/80-70023-50/topic/qtioverlay.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_wst_y5l_vbc"><li class="li">Receives the multiplexed stream.</li><li class="li">Overlays the bounding boxes on the video frame using CL.</li><li class="li">Produces GST buffers with overlays on its source pad.</li></ol> |
| **Output** | |
| [waylandsink](https://docs.qualcomm.com/doc/80-70023-50/topic/waylandsink.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_dyv_1lm_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Submits the video stream to Weston.</li><li class="li">Weston renders the video stream and the bounding boxes generated for the objects in that scene on a local display device.</li></ol> |
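The threshold and result limit that qtimlpostprocess applies come from the `settings` and `results` properties in the launch command. The sketch below shows one way to build that fragment from shell variables so the JSON quoting stays intact. It assumes, based on the value `75.0` in the command, that `confidence` is a percentage; the `make_postprocess` helper is an illustrative name.

```shell
#!/bin/sh
# Print the qtimlpostprocess fragment of the pipeline.
# $1: confidence threshold (assumed to be a percentage), $2: maximum number of results
make_postprocess() {
    # \\" in the printf format emits a literal \" so the JSON survives shell parsing later.
    printf 'qtimlpostprocess settings="{\\"confidence\\": %s}" results=%s module=yolov8 labels=/etc/labels/yolox.json\n' "$1" "$2"
}

# Same values as the command in this section.
make_postprocess 75.0 10
```

The printed fragment can be pasted into the `gst-launch-1.0` command in place of the original `qtimlpostprocess` stage.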

## Use qtivcomposer to mix original frame with bounding box mask

Run the use case on the target device:

    gst-launch-1.0 -e qtiqmmfsrc name=camsrc ! video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1 ! queue ! tee name=split split. ! \
    queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true split. ! queue ! qtimlvconverter ! queue ! \
    qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
    model=/etc/models/yolox_quantized.tflite ! queue ! qtimlpostprocess settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/etc/labels/yolox.json \
    ! video/x-raw,format=BGRA,width=640,height=360 ! queue ! mixer.

To stop the use case, press CTRL + C.

The following figure shows the flow of the use case execution:

1. Identifies objects in the scene from a video stream coming from a camera source.
2. Composes the following using qtivcomposer:
    1. Bounding boxes over objects detected.
    2. Original video stream.
3. Displays the results.
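In the command for this use case, the camera stream is 1280x720 and the bounding box mask requested from qtimlpostprocess is 640x360, so both share the same 16:9 aspect ratio. The sketch below derives mask caps for a different camera resolution or mask width. Preserving the aspect ratio is an assumption made here for illustration, and `mask_caps` is a hypothetical helper name.

```shell
#!/bin/sh
# Print BGRA mask caps whose height preserves the camera aspect ratio.
# $1: camera width, $2: camera height, $3: desired mask width (all integers)
mask_caps() {
    mask_height=$(( $3 * $2 / $1 ))
    printf 'video/x-raw,format=BGRA,width=%s,height=%s\n' "$3" "$mask_height"
}

# Values from the command in this section.
mask_caps 1280 720 640   # prints video/x-raw,format=BGRA,width=640,height=360
```

The arithmetic uses integer division, so widths that do not divide evenly round the height down.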

Figure: Pipeline for bounding box mask with qtivcomposer

<svg xmlns="http://www.w3.org/2000/svg" width="755" height="349.974590301513672" viewbox="0 0 755 349.974590301513672">
  <g id="Layer_1" data-name="Layer 1">
    <g>
      <rect x=".5" y=".499797821044922" width="754" height="348.974609375" rx="7.499999999999944" ry="7.499999999999944" style="fill: #fafafa;"></rect>
      <path d="M747,1c3.85980224609375,0,7,3.140201568603516,7,7v333.974590301513672c0,3.85980224609375-3.14019775390625,7-7,7H8c-3.85980224609375,0-7-3.14019775390625-7-7V8c0-3.859798431396484,3.14019775390625-7,7-7h739M747,0H8C3.581695556640625,0,0,3.581701278686523,0,8v333.974590301513672c0,4.418304443359375,3.581695556640625,8,8,8h739c4.41827392578125,0,8-3.581695556640625,8-8V8c0-4.418298721313477-3.58172607421875-8-8-8h0Z" style="fill: #d2d7e1;"></path>
    </g>
    <g>
      <g>
        <text transform="translate(564.49249267578125 325.765972137451172)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Qualcomm </tspan></text>
        <rect x="544.241457639080181" y="313.674465179443359" width="16" height="16" rx="2" ry="2" style="fill: #2a2aea;"></rect>
      </g>
      <g>
        <text transform="translate(663.07427978515625 325.765972137451172)" style="font-family: Roboto-Regular, Roboto; font-size: 14px;"><tspan x="0" y="0">Open source</tspan></text>
        <rect x="642.823239051933342" y="313.674465179443359" width="16" height="16" rx="2" ry="2" style="fill: #007884;"></rect>
      </g>
    </g>
  </g>
  <g id="Layer_2" data-name="Layer 2">
    <g>
      <g>
        <rect x="20" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(73.429729461669922 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">camsrc</tspan></text>
      </g>
      <g>
        <line x1="180" y1="45.000011444091797" x2="199.976654052734375" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="198.955352783203125 48.490444183349609 205 45.000011444091797 198.955352783203125 41.50958251953125 198.955352783203125 48.490444183349609"></polygon>
      </g>
      <g>
        <line x1="285" y1="68.974590301513672" x2="285" y2="88.951229095458984" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 87.929935455322266 285 93.974590301513672 288.49041748046875 87.929935455322266 281.50958251953125 87.929935455322266"></polygon>
      </g>
      <g>
        <rect x="205" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #007884;"></rect>
        <text transform="translate(273.910202026367188 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">tee</tspan></text>
      </g>
      <g>
        <rect x="205" y="93.97459047333632" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(229.527496337890625 123.650547027587891)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlvconverter</tspan></text>
      </g>
      <g>
        <line x1="285" y1="143.974590301513672" x2="285" y2="163.951229095458984" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 162.929943084716797 285 168.974590301513672 288.49041748046875 162.929943084716797 281.50958251953125 162.929943084716797"></polygon>
      </g>
      <g>
        <rect x="205" y="168.97459047333632" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(250.820465087890625 198.650554656982422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimltflite</tspan></text>
      </g>
      <g>
        <line x1="285" y1="218.974590301513672" x2="285" y2="238.951244354248047" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="281.50958251953125 237.929943084716797 285 243.974590301513672 288.49041748046875 237.929943084716797 281.50958251953125 237.929943084716797"></polygon>
      </g>
      <g>
        <rect x="205" y="243.97459047333632" width="160" height="49.999999999999091" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(222.722808837890625 273.650554656982422)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtimlpostprocess</tspan></text>
      </g>
      <g>
        <line x1="365" y1="45.000011444091797" x2="384.97662353515625" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="383.955322265625 48.490444183349609 390 45.000011444091797 383.955322265625 41.50958251953125 383.955322265625 48.490444183349609"></polygon>
      </g>
      <g>
        <rect x="390" y="20.000014873722648" width="160" height="50" rx="4" ry="4" style="fill: #2a2aea;"></rect>
        <text transform="translate(421.140655517578125 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">qtivcomposer</tspan></text>
      </g>
      <g>
        <line x1="550" y1="45.000011444091797" x2="569.97662353515625" y2="45.000011444091797" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></line>
        <polygon points="568.955322265625 48.490444183349609 575 45.000011444091797 568.955322265625 41.50958251953125 568.955322265625 48.490444183349609"></polygon>
      </g>
      <g>
        <rect x="575" y="20.000014873722648" width="160" height="50" rx="4.000000000000019" ry="4.000000000000019" style="fill: #007884;"></rect>
        <text transform="translate(611.09771728515625 48.506718635559082)" style="fill: #fff; font-family: Roboto-Regular, Roboto; font-size: 16px;"><tspan x="0" y="0">waylandsink</tspan></text>
      </g>
    </g>
    <g>
      <polyline points="365 268.974590301513672 470 268.974590301513672 470 75.847972869873047" style="fill: none; stroke: #000; stroke-miterlimit: 10;"></polyline>
      <polygon points="473.49041748046875 76.869258880615234 470 70.824611663818359 466.50958251953125 76.869258880615234 473.49041748046875 76.869258880615234"></polygon>
    </g>
  </g>
</svg>

The following table describes the sequential processing stages of the pipeline:

| Process | Description |
| --- | --- |
| [qtiqmmfsrc](https://docs.qualcomm.com/doc/80-70023-50/topic/qtiqmmfsrc.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_z2v_xlm_vbc"><li class="li">Captures the video stream (source) from the camera. The tee element then creates two copies of the source:<ul class="ul" id="single-camera-stream-with-object-detection-and-display__ol_afv_xlm_vbc"><li class="li">One stream is sent to the qtivcomposer plugin to retain the video stream.</li><li class="li">The other stream is sent to the ML inferencing pipeline.</li></ul></li></ol> |
| **Preprocessing** | |
| [qtimlvconverter](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimlvconverter.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_bfv_xlm_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Performs preprocessing:<ul class="ul" id="single-camera-stream-with-object-detection-and-display__ul_cfv_xlm_vbc"><li class="li">Color conversion</li><li class="li">Scaling up or down</li><li class="li">Normalization of the stream data when the model expects floating-point values as input</li></ul></li><li class="li">Converts the video stream to a tensor stream on its source pad.<p class="p">The object detection model uses this tensor stream for inferencing.</p></li></ol> |
| **Inferencing** | |
| [qtimltflite](https://docs.qualcomm.com/doc/80-70023-50/topic/qtimltflite.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_dfv_xlm_vbc"><li class="li">Loads the object detection model.</li><li class="li">Modifies the graph for the chosen delegate.</li><li class="li">Receives the tensor stream on its sink pad.</li><li class="li">Runs the inference and produces a tensor stream with the object detection results on its source pad.</li></ol> |
| **Postprocessing** | |
| qtimlpostprocess | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_efv_xlm_vbc"><li class="li">Receives the inference tensors from the object detection model on its sink pad.</li><li class="li">Converts the inference tensors into formats, such as video or text, that the multimedia plugins can process later.</li><li class="li">Applies the confidence threshold and keeps up to the chosen number of results.</li><li class="li">Loads the corresponding module for the detection model.<p class="p">In this use case, qtimlpostprocess does the following:</p><ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-display__ol_ffv_xlm_vbc"><li class="li">Loads the YOLOv8 submodule.</li><li class="li">Produces video frames that contain only the bounding boxes, which can be overlaid on the objects.</li><li class="li">Sends them to the sink pad of qtivcomposer.</li></ol></li></ol> |
| [qtivcomposer](https://docs.qualcomm.com/doc/80-70023-50/topic/qtivcomposer.html) | <ol class="ol"><li class="li">Receives the original video stream and the video stream with bounding boxes on its sink pads.</li><li class="li">On its source pad, produces content composed of the video streams received on its sink pads.</li></ol> |
| **Output** | |
| [waylandsink](https://docs.qualcomm.com/doc/80-70023-50/topic/waylandsink.html) | <ol class="ol" id="single-camera-stream-with-object-detection-and-display__ol_ifv_xlm_vbc"><li class="li">Receives the video stream on its sink pad.</li><li class="li">Submits the video stream to Weston.</li><li class="li">Weston displays the following on a local display device:<ol class="ol" type="a" id="single-camera-stream-with-object-detection-and-display__ol_nv1_xzm_vbc"><li class="li">The video stream captured from the camera.</li><li class="li">The bounding boxes drawn over the allowed number of objects identified in that scene.</li></ol></li></ol> |

**Parent Topic:** [LiteRT use cases](https://docs.qualcomm.com/doc/80-70023-50/topic/tensorflow-lite-use-cases.html)

Last Published: Mar 27, 2026
